Jonathan Gleason
Trigger warning: Mathematics contained herein.
Contents
1 What is a number?
1.1 Cardinality of sets and the natural numbers
1.1.1 The natural numbers as a set
1.1.2 The natural numbers as an integral crig
1.1.3 The natural numbers as a well-ordered set
1.1.4 The natural numbers as a well-ordered rig
An axiomatization of N
The Peano axioms
1.2 Additive inverses and the integers
1.3 Multiplicative inverses and the rationals
1.4 Least upper-bounds and the real numbers
1.4.1 Least upper-bounds and why you should care
1.4.2 Dedekind-cuts and the real numbers
1.5 Concluding remarks
5 Integration
5.1 Measure theory
5.1.1 Measures
5.1.2 Measures on topological and uniform spaces
Measures on topological spaces
Measures on uniform spaces
A summary
5.1.3 The Haar-Howes Theorem
5.1.4 The product measure
5.1.5 Lebesgue measure
Generalized Cantor sets
5.2 The integral
5.2.1 Char., simple, Borel, and integrable functions
5.2.2 The integral itself
5.2.3 Properties of the integral
Fubini’s Theorem and its consequences
5.2.4 Riemann integrability
5.3 $L^p$ spaces
6 Differentiation
6.1 Tensors and abstract index notation
6.1.1 Motivation and introduction
6.1.2 The dual-space
6.1.3 Tensors and the tensor product
6.2 The definition
6.2.1 The definition itself
6.3 A smooth version of $\mathrm{Mor}_{\mathbf{Top}}(\mathbb{R}^d, \mathbb{R})$: $C^\infty(\mathbb{R}^d)$
6.4 Calculus
6.4.1 Tools for calculation
6.4.2 The Mean Value Theorem
6.4.3 The Fundamental Theorem of Calculus
6.4.4 Change of variables
6.4.5 Functions defined by their derivatives
Analytic functions
The definitions
Their properties
The Multinomial Theorem and the Power Rule
The p-series test
The irrationality of e
6.5 Differentiability and continuity
6.6 The Inverse Function Theorem
Appendices
That is, it is a net made up from terms of the original net that has
the property that any set which eventually contains the original net
eventually contains the subnet. I personally find this to be a much
cleaner definition than the ones involving awkward conditions on the
indices of the nets: The indices themselves don’t matter—it is only
what eventually happens that matters.
Uniform spaces
My inclusion of uniform spaces was essentially a replacement for a
treatment of metric spaces (which, for some reason, seems standard).
Unfortunately, metric spaces are just not enough. For example, if you
care about continuous functions on the real line,11 you cannot get
away with metric spaces. You could generalize to semimetric spaces,
but I find this to not be as clean as uniform spaces. Uniform spaces
also include all topological groups, and so by generalizing to uniform
spaces, you include another hugely important example of topological
spaces. For some reason, the theory of uniform spaces seems to be
very obscure in standard undergraduate and graduate curricula, and I
myself was never formally taught it in my own education. Despite this,
I’ve found knowledge of uniform spaces to be incredibly useful,12 in
particular in functional analysis, and they deserve more attention than
they get in courses in analysis and general topology.
σ-algebras
In general, I try to place a relatively large amount of emphasis on
motivation, and so when I first started writing down my development
of measure theory, I decided from the beginning that I would first
introduce the notion of an outer-measure, and then introduce the
notion of an abstract σ-algebra once I had shown that Carathéodory’s
Theorem (Theorem 5.1.1.23) says that the measurable sets form a
σ-algebra. That is, you don’t have to pull the definition out of thin
air: the theory gives it to you. As I continued to develop the theory,
however, I found that I never really needed to make use of the notion;
working with measure spaces (sets equipped with a measure) was
sufficient: all I had to do was add in the hypothesis “measurable” in
certain places. I found that this made the theory quite a bit cleaner. For
one thing, it was easier to teach, but even in terms of the mathematics
itself, it made some things a bit easier. For example, when discussing
product measures, you run into the issue of whether or not the product
σ-algebra agrees with the σ-algebra of measurable sets on the product.
If I recall, in general this fails, but that doesn’t matter: if you don’t use
σ-algebras, it’s just not an issue, because Carathéodory’s Theorem always
tells you what the measurable sets are.
11 If you’re a mathematician or planning on becoming one, chances are you do, at
least a little.
12 For example, there is a generalization of Haar measure to uniform spaces (see
the Haar-Howes Theorem), which, among other things, gives isometric invariance of
Lebesgue measure for free.
Measure theory and the Lebesgue integral
Since the first day you learned the ‘definition’ of the integral in your
first calculus class,13 you have almost certainly thought of the integral
as the ‘(signed) area under the curve’. That’s what it is. Measure
theory allows one to make this the definition of the integral. There
is no need for these awkward approximations by rectangles.14 The
counter-argument to this is “Okay, sure, but that just reduces the
complication of the Lebesgue integral to measure theory.”. To be fair,
this is correct—if you take this route, you have to develop measure
theory, whereas, with the Riemann integral, you can start talking
about partitions, etc., almost immediately. That being said, I take
my time developing measure theory more fully because I don’t like
approaches which ‘run straight to the finish line’, but in principle,
if you really don’t like the amount of build-up that measure theory
requires, you can probably cut-down the length of my development
by two-thirds or so to get to the definition of the integral. To those
who would still object to even a minimal measure theoretic treatment,
I would argue the following. There are two types of students: those
who want to become mathematicians and those who do not. The
former need to learn the Lebesgue integral anyways, and so I don’t see
much point dicking around with the Riemann integral when they’ll
have to learn the Lebesgue integral in a graduate analysis course (if
not earlier) anyways. In the latter case, simply stating informally “It’s
the area under the curve.” should suffice. In fact, that’s not even
really a lie, or even a stretch of the truth: if you set things up right,
that’s literally what it is.
13 Or when self-teaching, for the particularly precocious amongst you.
14 Of course, in the spirit of defining things by the properties they uniquely specify,
I don’t quite state it like this. Instead I say something like “The area under the curve
is the unique extended real-valued function on nonnegative Borel functions such
that...”; see Definition 5.2.2.1.
Tangent spaces in differentiation in $\mathbb{R}^d$
I actually found differentiation to be quite a bit easier when studying
it on manifolds as opposed to in $\mathbb{R}^d$. I think this is because it is easy
to mix up objects that should be thought of as points and objects that
should be thought of as vectors in $\mathbb{R}^d$, something that is just plain
impossible to do in a general manifold. In an attempt to circumvent
this potential confusion, I introduce the term tangent space as well as
notation for it, $T_x(\mathbb{R}^d)$. This is really only for conceptual clarity: it is
not the definition one would use on a manifold. In fact, it would be
essentially impossible to do this because the usual definition of the
tangent space on a manifold requires one to already have a theory of
differentiation on $\mathbb{R}^d$. Instead, I just take $T_x(\mathbb{R}^d)$ as a metric vector
space whereas the ‘entire space’ $\mathbb{R}^d$ is considered as a metric space.15
I also made a point to introduce index notation and tensor fields.
The motivation for this one really comes from my physics background:
index notation and tensor calculus are mandatory background if one
wants to do theoretical physics. But this is not the only reason for
introducing it of course—I personally find index notation to sometimes
be very useful, even in pure Riemannian geometry. In particular, I find
that index notation can make some equations much more transparent.
For example, compare16
$R_{ab}{}^{c}{}_{d} := \nabla_a \nabla_b - \nabla_b \nabla_a$
versus
15 Note the implicit elementary categorical ideas coming into play here. Also
note the two distinct uses of the term “metric”. Who the hell came-up with this
terminology anyways? Mathematicians need to get better at naming things... Surely
you can find a more descriptive adjective than “normal” or “regular” for your fancy
new idea, yes?
16 At least for a torsion-free covariant derivative.
Jonathan Gleason
Ph. D. Student, Department of Mathematics, University of California,
Berkeley
First draft (of preface): 16 September 2015
16 September 2015
1. What is a number?
So we’ve determined what it means for two sets to have the same
cardinality, but what actually is a cardinality? The trick is to identify
a cardinal with the collection of all sets which have that cardinality.
1 Set is the category of sets. If you are reading this linearly and you see something
you don’t recognize, chances are it’s in the appendix. Use the index and index of
notation at the end of the notes to find exactly where.
The idea then is that the natural numbers are precisely those
cardinals which are finite. We thus must now answer the question
“What does it mean to be ‘finite’?”. This is actually a tad bit tricky.
Of course, we don’t have a precise definition yet, but everyone
has an intuitive idea of what it means to be infinite. So, consider
an ‘infinite set’ $X$. Now remove one element $x_0 \in X$ to form the set
$U := X \setminus \{x_0\}$. For any reasonable definition of “infinite”, removing
a single element from an infinite set should not change the fact that
it is infinite, and so U should still be infinite. In fact, more should
be true. Not only should U still be infinite, but it should still have
the same cardinality as X.2 It is this idea that we take as the defining
property of being infinite.
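To get a feel for this idea, here is a small Python sanity check (a sketch only, with function names made up for illustration; it proves nothing). It confirms by brute force that a small finite set admits no injection into itself with one point removed, whereas the successor map n ↦ n + 1 sends the natural numbers injectively into the natural numbers with 0 removed. Of course, we can only inspect an initial segment of the natural numbers.

```python
from itertools import permutations

def injects_into_self_minus_point(xs):
    """Brute force: is there an injection X -> X \\ {x0} for a finite set X?"""
    xs = list(xs)
    for x0 in xs:
        smaller = [x for x in xs if x != x0]
        # permutations(smaller, r) lists every injective choice of r values
        # from `smaller`; it yields nothing when r > len(smaller).
        if any(True for _ in permutations(smaller, len(xs))):
            return True
    return False

def successor(n):
    """The map n -> n + 1, which never takes the value 0."""
    return n + 1

finite_case = injects_into_self_minus_point({"a", "b", "c"})

sample = range(1000)  # an initial segment standing in for the naturals
images = [successor(n) for n in sample]
successor_is_injective_on_sample = len(set(images)) == len(images)
successor_misses_zero_on_sample = 0 not in images
```

The finite check fails for every choice of removed point, which is the behaviour the definition is designed to capture.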
$F_X := \{S \subseteq X : S \text{ is finite}\}$. (1.1.1.6)
2 We will see in the next chapter that there are infinite sets which are not of the
same cardinality. That is, in this sense, there is more than one type of infinity.
$m + n := |M \sqcup N|$ and $mn := |M \times N|$ (1.1.2.2)
are well-defined.
$|M_1 \sqcup N_1| = |M_2 \sqcup N_2|$, (1.1.2.4)
$$\begin{aligned}
(m + n) + o &= (|M| + |N|) + |O| \\
&= |M \sqcup N| + |O| \\
&= |(M \sqcup N) \sqcup O| \\
&= |M \sqcup (N \sqcup O)| \\
&= m + (n + o).
\end{aligned}$$ (1.1.2.9)
$m + 0 = |M \sqcup \emptyset| = |M| = m = 0 + m$. (1.1.2.10)
$M \times N = \bigsqcup_{y \in N} M$. (1.1.2.14)
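For finite sets, the identities above can be checked directly. The following Python sketch does so on one example; the helper names are hypothetical, and the tagging trick used to make unions disjoint is just one convenient choice of model for the disjoint union.

```python
def disjoint_union(M, N):
    """Model M ⊔ N by tagging elements, so overlaps between M and N do not collapse."""
    return {(0, x) for x in M} | {(1, y) for y in N}

def cartesian_product(M, N):
    """Model M × N as a set of ordered pairs."""
    return {(x, y) for x in M for y in N}

def copies_of_M_indexed_by_N(M, N):
    """Model the disjoint union of one copy of M for each y in N."""
    return {(y, x) for y in N for x in M}

M = {"a", "b", "c"}
N = {"b", "c"}  # overlaps with M on purpose
O = {0, 1, 2, 3}

sum_card = len(disjoint_union(M, N))        # m + n
prod_card = len(cartesian_product(M, N))    # mn
assoc_left = len(disjoint_union(disjoint_union(M, N), O))
assoc_right = len(disjoint_union(M, disjoint_union(N, O)))
copies_card = len(copies_of_M_indexed_by_N(M, N))
```

Note that a plain union `M | N` would have only three elements here, which is exactly why the definition of addition uses the disjoint union.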
As for what the definition of that preorder should be, recall our
explanation of thinking of a function f : X → Y as ‘labeling elements
of Y with elements of X’—see the beginning of Subsection 1.1.1 The
natural numbers as a set. We argued that our definition of ‘same
number of elements’ should have the properties that (i) every element
of Y is labeled and (ii) no element of Y is labeled more than once.
Similarly, our definition of “Y has at least as many elements as X”
should have the property that we are not forced to label an element of Y
more than once (i.e. that f is injective), but not necessarily that every
element of Y is labeled.
Definition 1.1.3.1 Let m, n ∈ ℵ and let M and N be sets such
that m = | M | and n = | N |. Then, we define m ≤ n iff there is
an injective map from M to N.
You might be thinking “Ah, that makes sense. But why use
injective? Couldn’t we also say that | X | ≥ |Y | iff there is a surjective
function X → Y ?”. Unfortunately, this is only almost correct.
Exercise 1.1.3.3
(i). Show that if X and Y are nonempty sets, then | X | ≤ |Y |
iff there is a surjective function from Y to X.
(ii). On the other hand, show how this might fail without the
assumption of nonemptiness.
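Before attempting the exercise, it can help to experiment with small finite sets. The Python sketch below (helper names are hypothetical) enumerates all functions between two finite sets by brute force and compares the existence of an injection X → Y with the existence of a surjection Y → X. It is an experiment, not a proof, and it only covers finite sets.

```python
from itertools import product

def functions(X, Y):
    """Yield every function X -> Y as a dict (finite sets only)."""
    X = list(X)
    for values in product(list(Y), repeat=len(X)):
        yield dict(zip(X, values))

def exists_injection(X, Y):
    return any(len(set(f.values())) == len(f) for f in functions(X, Y))

def exists_surjection(X, Y):
    return any(set(f.values()) == set(Y) for f in functions(X, Y))

# The two conditions agree on all nonempty sets of size at most 3.
sizes = range(1, 4)
agree_on_nonempty = all(
    exists_injection(range(a), range(b)) == exists_surjection(range(b), range(a))
    for a in sizes for b in sizes
)

# With X empty and Y nonempty: the empty function X -> Y is injective,
# but there is no function at all from Y to X.
empty_injects = exists_injection(set(), {"y"})
nonempty_surjects_onto_empty = exists_surjection({"y"}, set())
```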
The next result is perhaps the first theorem we have come to that
has a nontrivial amount of content to it.
Theorem 1.1.3.5 — Bernstein-Cantor-Schröder Theorem.
⟨ℵ, ≤⟩ is a partially-ordered set.
Proof. a Step 1: Recall what it means to be a partial-order
Recall that being a partial-order just means that ≤ is an anti-
symmetric preorder. We have just shown that ≤ is a preorder
(see Definition A.3.3.6), so all that remains to be seen is that
≤ is antisymmetric.
Step 2: Determine what explicitly we need to show
Let m, n ∈ ℵ and let M, N be sets such that m = | M | and
n = | N |. Suppose that m ≤ n and n ≤ m. By definition, this
means that there is an injection f : M → N and an injection
g : N → M.
Step 3: Note the existence of left-inverse to both f and g
If M is empty, then as N injects into M, N must also be empty,
and we are done. Likewise, if N is empty, we are also done.
Thus, we may as well assume that M and N are both nonempty.
We can now use the result of Equation (A.3.26) which says
that both f and g have left inverses.b Denote these inverses by
$f^{-1} : N \to M$ and $g^{-1} : M \to N$ respectively, so that
$f^{-1} \circ f = \mathrm{id}_M$ and $g^{-1} \circ g = \mathrm{id}_N$.c (1.1.3.6)
Step 4: Define $C_x$
Fix an element $x \in M$ and define
$C_x := \{\ldots, g^{-1}(f^{-1}(g^{-1}(x))), f^{-1}(g^{-1}(x)), g^{-1}(x), x, f(x), g(f(x)), f(g(f(x))), \ldots\} \subseteq M \sqcup N$.d (1.1.3.7)
In other words, $x_1 \in C_{x_2}$. Not only this, but $f(x_1) \in C_{x_2}$ as well
because $f(x_1) = f([g \circ f]^{l-k}(x_2))$. Similarly, $g^{-1}(x_1) \in C_{x_2}$,
as well as
$X_1 := M \cap A$, $Y_1 := N \cap A$, (1.1.3.12a)
$X_2 := M \cap A^c$, $Y_2 := N \cap A^c$. (1.1.3.12b)
Step 7: Show that $f|_{X_1} : X_1 \to Y_1$ is a bijection
We claim that $f|_{X_1} : X_1 \to Y_1$ is a bijection. First of all, note
that if $x \in X_1$, then in fact $f(x) \in Y_1$, so that this statement
Step 8: Show that $g|_{Y_2} : Y_2 \to X_2$ is a bijection
We now show that $g|_{Y_2} : Y_2 \to X_2$ is a bijection. Once again,
all we must show is surjectivity, so let $x \in X_2 = M \cap A^c$. From
the definition of $A$ (1.1.3.11), it thus cannot be the case that
$C_x \cap N$ is contained in $f(M)$, so that there is some $y \in C_x \cap N$
such that $y \notin f(M)$. By virtue of (1.1.3.10), we have that
$C_x = C_y$, and in particular $x \in C_y$. From the definition of $C_y$
(1.1.3.7), it follows that either (i) $x = y$, (ii) $x$ is in the image
of $f^{-1}$, or (iii) $x$ is in the image of $g$ (the other possibilities are
excluded because $x \in M$). Of course it cannot be the case that
$x = y$ because $x \in M$ and $y \in N$. Likewise, it cannot be the
case that $x$ is in the image of $f^{-1}$ because $x \in A^c$. Thus, we
must have that $x = g(y')$ for some $y' \in N$. Once again, we still
must show that $y' \in Y_2$. However, we have that $y' = g^{-1}(x)$, so
that $y' \in C_x$. Furthermore, as $C_x \cap N$ is not contained in $f(M)$,
from (1.1.3.13) it follows that $C_x \subseteq A^c$. Thus, $y' \in C_x \subseteq A^c$,
and so $y' \in Y_2$. Thus, $g|_{Y_2} : Y_2 \to X_2$ is a bijection.
Step 9: Construct the bijection from M to N
Finally, we can define the bijection from M to N. We define
$h : M \to N$ by
$$h(x) := \begin{cases} f(x) & \text{if } x \in X_1 \\ g^{-1}(x) & \text{if } x \in X_2. \end{cases}$$ (1.1.3.14)
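To see the construction of h in action, here is a Python sketch on one concrete example. Everything specific in it is an assumption made for illustration: M and N are both taken to be the natural numbers, f and g are both the successor map, and the left inverses are represented as partial functions returning None outside the image (rather than being extended arbitrarily). The function `chain_stops_in_N` decides whether x lies in X_2 by walking the chain of x backwards; this walk terminates here because the values decrease, but it would not terminate for chains that are cyclic or infinite in the backward direction.

```python
def f(x):
    """Assumed injection M -> N (here both are the naturals)."""
    return x + 1

def g(y):
    """Assumed injection N -> M."""
    return y + 1

def f_inv(y):
    """Partial left inverse of f; None when y is not in f(M)."""
    return y - 1 if y >= 1 else None

def g_inv(x):
    """Partial left inverse of g; None when x is not in g(N)."""
    return x - 1 if x >= 1 else None

def chain_stops_in_N(x):
    """Walk the chain of x in M backwards; True iff it ends at an element
    of N that is not in f(M), i.e. iff x belongs to X_2."""
    current, in_M = x, True
    while True:
        previous = g_inv(current) if in_M else f_inv(current)
        if previous is None:
            return not in_M
        current, in_M = previous, not in_M

def h(x):
    """Use f on X_1 and the inverse of g on X_2."""
    return g_inv(x) if chain_stops_in_N(x) else f(x)

values = [h(x) for x in range(10)]
```

In this example h simply swaps 0 with 1, 2 with 3, and so on, so it is visibly a bijection even though neither f nor g is surjective.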
Proof. a Step 1: Conclude that it suffices to show that every nonempty subset has a smallest element
Step 2: Define $\mathcal{T}$ as a preordered set
So, let $S \subseteq \aleph$ be a nonempty collection of cardinals and for
each $m \in S$ write $m = |M_m|$ for some set $M_m$. Define
$M := \prod_{m \in S} M_m$ (1.1.3.16)
and
$\mathcal{T} := \{T \subseteq M : T \in \mathrm{Obj}(\mathbf{Set}); \text{ for all } x, y \in T, \text{ if } x \neq y \text{ it follows that } x_m \neq y_m \text{ for all } m \in S\}$. (1.1.3.17)
Order $\mathcal{T}$ by inclusion.
Step 3: Verify that $\mathcal{T}$ satisfies the hypotheses of Zorn’s Lemma
We wish to apply Zorn’s Lemma (Theorem A.3.5.9) to $\mathcal{T}$. To
do that of course, we must first verify the hypotheses of Zorn’s
Lemma. $\mathcal{T}$ is a partially-ordered set by Exercise A.3.3.11. Let
$\mathcal{W} \subseteq \mathcal{T}$ be a well-ordered subset and define
$W := \bigcup_{T \in \mathcal{W}} T$. (1.1.3.18)
Step 4: Conclude the existence of a maximal element
The hypotheses of Zorn’s Lemma being verified, we deduce
that there is a maximal element $T_0 \in \mathcal{T}$.
Step 5: Show that there is some projection whose restriction to the maximal element is surjective
Let $\pi_m : M \to M_m$ be the canonical projection. We claim that
there is some $m_0 \in S$ such that $\pi_{m_0}(T_0) = M_{m_0}$. To show this,
we proceed by contradiction: suppose that for all $m \in S$ there
is some element $x_m \in M_m \setminus \pi_m(T_0)$. Then, for $x := (x_m)_{m \in S} \in M$,
the set $T_0 \cup \{x\} \in \mathcal{T}$ is
strictly larger than $T_0$: a contradiction of maximality. Therefore,
there is some $m_0 \in S$ such that $\pi_{m_0}(T_0) = M_{m_0}$.
Step 6: Construct an injection from $M_{m_0}$ to $M_m$ for all $m \in S$
The defining condition of $\mathcal{T}$, (1.1.3.17), is simply the statement
that $\pi_m|_T : T \to M_m$ is injective for all $T \in \mathcal{T}$. In particular,
by the previous step, $\pi_{m_0}|_{T_0} : T_0 \to M_{m_0}$ is a bijection. And
therefore, the composition $\pi_m \circ (\pi_{m_0}|_{T_0})^{-1} : M_{m_0} \to M_m$ is an
injection from $M_{m_0}$ to $M_m$. Therefore,
$m_0 = |M_{m_0}| \le |M_m| = m$ (1.1.3.19)
$-1$ if $x < 0$.
One thing to note about preordered rngs is that, to define the order,
it suffices only to be able to compare everything with 0 (and in fact,
some will even take this as their definition).
An axiomatization of N
We have attempted to introduce the natural numbers in such a way that
feels, well, natural. Arguably counting is the very first mathematics
all of us learn, and it is the natural numbers which allow us to do
exactly this. On the other hand, you will find that this treatment is
not exactly analogous to our development of the integers, rationals, or
reals. Of course, in all cases, I chose to develop the number systems in
the way which I felt to be most natural—but this doesn’t mean that all
the developments will be analogous. One might like to see a result for
N analogous to the defining results for Z, Q, and R (Theorems 1.2.1,
1.3.4 and 1.4.2.9). Unfortunately,4 the analogous result is not true.
4 Or perhaps fortunately, as this just solidifies our claim that the presentation in
terms of finite cardinals is a natural one.
Proof. Step 1: Define everything
Define $\mathbb{N}' := \mathbb{N}$, $s(m) := m + 1$, and $0' := 0$.
Step 2: Prove (i)
Step 3: Prove (ii)
Step 4: Prove (iii)
Let $S \subseteq \mathbb{N}'$ have the properties that (iii.a) $0 \in S$ and (iii.b)
$m \in S$ implies $m + 1 \in S$. We wish to show that $S = \mathbb{N}'$. We
proceed by contradiction: suppose that $S \neq \mathbb{N}'$. Then, $S^c$ is
nonempty, and as $\mathbb{N}$ is well-ordered, $S^c$ has a least element
$m_0 \in S^c$. As $0 \in S$, it cannot be the case that $m_0 = 0$. Then,
by (i), there is some $n_0 \in \mathbb{N}'$ such that $s(n_0) = m_0$. As we have
defined $s(n_0) := n_0 + 1$, we in particular have that $n_0 < m_0$. As
$n_0$ is less than $m_0$ and $m_0$ is the least element of $S^c$, we cannot
have $n_0 \in S^c$. Therefore, $n_0 \in S$. But then, by hypothesis,
$m_0 = s(n_0) \in S$: a contradiction (as $m_0 \in S^c$). Hence, we
must have that $S = \mathbb{N}'$.
Proof. Step 1: Define an equivalence relation on $\mathbb{N} \times \mathbb{N}$
The idea behind constructing the integers is to think of a pair of
natural numbers $\langle m, n \rangle$ as representing what should be $m - n$.
One issue with this, however, is that, if we do this, then it
should be the case that $\langle m + 1, n + 1 \rangle$ and $\langle m, n \rangle$ both represent
the same number. Thus, we put an equivalence relation on
$\mathbb{N} \times \mathbb{N}$.
Define $\langle m_1, n_1 \rangle \sim \langle m_2, n_2 \rangle$ iff $m_1 + n_2 = m_2 + n_1$. We came
up with this of course because the statement $\langle m_1, n_1 \rangle \sim \langle m_2, n_2 \rangle$
should be $m_1 - n_1 = m_2 - n_2$. Of course, this itself doesn’t
make sense, and so we write this same thing as something that
does make sense given what we have already defined, namely
$m_1 + n_2 = m_2 + n_1$.
$m_1 + n_2 + m_2 + n_3 = m_2 + n_1 + m_3 + n_2$. (1.2.3)
Step 3: Define $\mathbb{Z}$ as a set
We define
$\mathbb{Z} := (\mathbb{N} \times \mathbb{N})/{\sim}$. (1.2.4)
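To make the quotient concrete, here is a minimal Python sketch of integers as pairs of natural numbers. The class name `PairInt` is hypothetical. Equality implements the relation m1 + n2 = m2 + n1 from Step 1, and the product follows the formula (m1 m2 + n1 n2) − (m1 n2 + n1 m2) obtained by expanding (m1 − n1)(m2 − n2). The componentwise sum and the swap used for negation are the usual choices, assumed here rather than quoted.

```python
class PairInt:
    """A formal difference m - n of natural numbers, stored as the pair <m, n>."""

    def __init__(self, m, n):
        if m < 0 or n < 0:
            raise ValueError("both components must be natural numbers")
        self.m, self.n = m, n

    def __eq__(self, other):
        # <m1, n1> ~ <m2, n2> iff m1 + n2 = m2 + n1
        return self.m + other.n == other.m + self.n

    def __add__(self, other):
        # assumed definition: add componentwise
        return PairInt(self.m + other.m, self.n + other.n)

    def __mul__(self, other):
        # (m1 - n1)(m2 - n2) = (m1 m2 + n1 n2) - (m1 n2 + n1 m2)
        return PairInt(self.m * other.m + self.n * other.n,
                       self.m * other.n + self.n * other.m)

    def __neg__(self):
        # assumed definition: swap the components
        return PairInt(self.n, self.m)

    def __repr__(self):
        return f"<{self.m}, {self.n}>"

minus_two_a = PairInt(3, 5)
minus_two_b = PairInt(0, 2)
nine = PairInt(2, 5) * PairInt(1, 4)   # (-3) * (-3)
zero = PairInt(4, 1) + (-PairInt(4, 1))
```

Only natural-number addition and multiplication are used inside the class, which is the point of the construction: subtraction never appears.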
Step 4: Define addition and multiplication on $\mathbb{Z}$
We define
and
Step 5: Define the identities
We define
Step 6: Show that $\mathbb{Z}$ is an integral cring
Step 7: Define a preorder on $\mathbb{Z}$
We define
Step 8: Show that ≤ is a total-order
Step 9: Show that $\langle \mathbb{Z}, +, \cdot, 0, 1, -, \le \rangle$ is a totally-ordered integral cring
Step 13: Show that $\mathbb{Z}$ is the ‘smallest’ ring which contains $\mathbb{N}$
Here, we are referring to the “Furthermore, ...” part of the
statement of the theorem. Of course, we have already shown
that $\mathbb{Z}$ is a ring which ‘contains’ $\mathbb{N}$. Furthermore, if we
can show (ii′), then the same argument as before will show
uniqueness up to unique isomorphism of rings. So, let $Z$ be
some other ring such that $\mathbb{N} \subseteq Z$, that is, for which there
is a unique embedding (i.e. injective homomorphism; see
Exercise B.2.18) $\iota : \mathbb{N} \to Z$ of rigs. Similarly as before, define
$i : \mathbb{Z} \to Z$ by
a To see how we came up with these definitions, recall that we are thinking
of $\langle m_1, n_1 \rangle$ and $\langle m_2, n_2 \rangle$ respectively as $m_1 - n_1$ and $m_2 - n_2$. Thus, the product
of these two integers ‘is’ $(m_1 - n_1)(m_2 - n_2) = (m_1 m_2 + n_1 n_2) - (m_1 n_2 + n_1 m_2)$.
b Precisely, we must show that there is a unique embedding $\mathbb{Z} \to Z$.
Nevertheless, we continue to use the term “copy” informally in this sense.
You will notice a common theme through these notes, and indeed,
throughout all of mathematics: if ever we want something that isn’t
there, throw it in. For example, we had the natural numbers, but
wanted additive inverses, and so we came up with Z by just ‘adjoining’
the additive inverses of all the elements of N. Similarly, we want
multiplicative inverses, and so by throwing them in, we obtain Q.
Doing the same with limits gives us R, and in turn doing the same
with roots of polynomials gives us C.5
We now know that Z is a totally-ordered integral cring. Of course,
Z satisfies other properties as well (a lot of which follow from the
fact that Z is a totally-ordered integral cring).
a Contrast this with the rationals in which there are many more rationals
than just Z and the multiplicative inverses of elements of Z.
(iii). m + 1 ≤ n.
Proof. Step 1: Introduce notation
Let $Z$ be some other nonzero totally-ordered integral cring.
Let us write $0_Z$ and $1_Z$ respectively for the additive and
multiplicative identity in $Z$, to distinguish them from $0, 1 \in \mathbb{Z}$.
Step 2: Define $\iota : \mathbb{Z} \to Z$
Propositions 1.1.4.10 and 1.2.23a imply that every element of
$\mathbb{Z}$ is of the form $\underbrace{1 + \cdots + 1}_{m}$ or $-(\underbrace{1 + \cdots + 1}_{m})$ for some $m \in \mathbb{N}$.
Note that any homomorphism of rings $\mathbb{Z} \to Z$ must send
$\underbrace{1 + \cdots + 1}_{m}$ to $\underbrace{1_Z + \cdots + 1_Z}_{m}$ (and similarly for the additive
inverse). There is thus a unique homomorphism $\iota : \mathbb{Z} \to Z$
defined in exactly this way. We wish to show that $\iota$ is an
embedding of totally-ordered rings.
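The definition of ι can be tried out numerically. The Python sketch below is an illustration under assumptions: the target Z is taken to be the rationals, modelled by `fractions.Fraction`, and the function name `iota` and its parameters are made up for this example. It builds ι(m) by repeatedly adding the multiplicative identity of the target, and then checks on a finite sample that the result respects addition and multiplication and is strictly increasing.

```python
from fractions import Fraction

def iota(m, zero, one):
    """Send the integer m to one + ... + one (|m| times), negated when m < 0."""
    total = zero
    for _ in range(abs(m)):
        total = total + one
    return total if m >= 0 else -total

ZERO, ONE = Fraction(0), Fraction(1)  # identities of the assumed target ring
sample = range(-6, 7)

respects_addition = all(
    iota(a + b, ZERO, ONE) == iota(a, ZERO, ONE) + iota(b, ZERO, ONE)
    for a in sample for b in sample
)
respects_multiplication = all(
    iota(a * b, ZERO, ONE) == iota(a, ZERO, ONE) * iota(b, ZERO, ONE)
    for a in sample for b in sample
)
images = [iota(m, ZERO, ONE) for m in sample]
strictly_increasing = all(x < y for x, y in zip(images, images[1:]))
```

A finite sample cannot establish injectivity in general, which is what the argument in the proof is for.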
Step 3: Show that $\iota$ is nondecreasing
We leave this as an exercise.
Exercise 1.2.28 Show that $\iota$ is nondecreasing.
Step 4: Show that $\iota$ is injective
We check that $\iota$ is injective. For $m \in \mathbb{Z}$, let us write $m_Z \in Z$
to represent $\underbrace{1_Z + \cdots + 1_Z}_{m}$ if $m \ge 0$ and $-(\underbrace{1_Z + \cdots + 1_Z}_{-m})$ if
$m \le 0$, or more concisely, $m_Z = \operatorname{sgn}(m)(\underbrace{1_Z + \cdots + 1_Z}_{|m|})$. Thus,
in this notation, $\iota(m) = m_Z$. Now, suppose that $m_Z = n_Z$.
Without loss of generality, $n \le m$. From this equation, it
follows that $(m - n)_Z = 0_Z$.
If $m = n$, we are done, so suppose that $m - n \ge 1$. From
the axioms of a preordered rig, it follows that $1_Z \le (m - n)_Z$.
But then $0_Z \le 1_Z \le (m - n)_Z = 0_Z$, and so $0_Z = 1_Z$.
By Exercise A.4.21, it follows that $Z$ is the zero cring: a
contradiction. Hence, $\iota$ is injective.
Step 6: Deduce that $\iota$ is an embedding of preordered rings
We have shown that $\iota$ is an injective nondecreasing ring homomorphism
such that $\iota(m) \le \iota(n)$ iff $m \le n$, and so by
Exercises B.2.18 and B.2.19, it follows that $\iota$ is an embedding
of preordered rings.
Step 7: Show that $\iota$ is the unique embedding of $\mathbb{Z}$ into $Z$
We mentioned before that there is a unique homomorphism
$\mathbb{Z} \to Z$, and so in particular this embedding is unique.
We end this section with an example that is a bit off track: when
writing this section, the question came to mind whether it was true or
not that a totally-ordered cring was necessarily integral (in particular,
I wanted to know whether we could get integrality for free from other
properties of the integers). It turns out this is false.
define
as well as
That is, the only ring in which 0 is invertible is the 0 ring. On the
other hand, there are many interesting rigs in which 0 is invertible.
Just as with the natural numbers (Proposition 1.1.4.10) and the
integers (Proposition 1.2.23), we would like to know that this agrees
with our naive idea of what the rational numbers are.
Proposition 1.3.8 For all $x \in \mathbb{Q}$, there exist unique $m \in \mathbb{Z}$
and $n \in \mathbb{Z}^+$ such that (i) $\gcd(m, n) = 1$ and (ii) $x = \frac{m}{n}$.
a More generally, you might say that any integers $m, n \in \mathbb{Z}$ for which
$x = \frac{m}{n}$ are numerators and denominators of $x$ respectively, regardless of
whether or not $n$ is positive or $\gcd(m, n) = 1$.
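Computationally, the unique pair in the proposition is what one gets by dividing out the greatest common divisor and moving any sign into the numerator. Here is a short Python sketch using the standard library's `math.gcd`; the function name `lowest_terms` is made up for illustration.

```python
from math import gcd

def lowest_terms(m, n):
    """Return the pair (m', n') with n' > 0, gcd(m', n') = 1 and m'/n' = m/n."""
    if n == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    if n < 0:
        m, n = -m, -n      # move the sign into the numerator
    d = gcd(m, n)          # math.gcd is nonnegative, and positive here since n > 0
    return m // d, n // d

examples = [lowest_terms(6, -4), lowest_terms(0, 5), lowest_terms(-10, -15)]
```

Different numerator and denominator pairs for the same rational, such as (6, −4) and (−9, 6), are all sent to the same pair, which reflects the uniqueness claim.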
∈ Q : m, n ∈ Z, m , 0 . (1.3.9)
n
QB m
R
a In general, we don’t want to consider ±∞ as elements of X, and if you
like, we are extending the order on X to the set $X \sqcup \{\pm\infty\}$ in such a way so
that ±∞ are the maximum and minimum of X respectively. On the other
hand, if X already had a maximum or minimum, then you should consider
±∞ as synonyms for the previously existing maximum and minimum.
So that’s what suprema (and infima) are, but why should you care?
At least one reason is the following. Consider the set
$S := \{(1 + \tfrac{1}{n})^n : n \in \mathbb{Z}^+\} \subseteq \mathbb{Q}$. (1.4.1.7)
This set has two key properties (i) it is bounded above, and (ii) for
every $x \in S$, there is some $x' \in S$ with $x < x'$. Draw yourself a
picture of what this must look like. Though we don’t know what a
limit is yet, from the picture we see that pretty much any reasonable
definition of a limit should have the property that
$\lim_n (1 + \tfrac{1}{n})^n = \sup(S)$. (1.4.1.8)
Though we don’t even have the definition to make sense of it yet, you’ll
recall that the answer to the left-hand side should be $e := \exp(1)$,
which of course is not rational.6 Thus, despite the fact that $S \subseteq \mathbb{Q}$,
$\sup(S) \notin \mathbb{Q}$, and in fact, for us, $\sup(S)$ just doesn’t make sense (yet).
Ultimately, because we want to do calculus, we want to be able to take
limits. In particular, because of things like (1.4.1.8), we had better
be able to take suprema as well. It turns out that throwing in all least
upper-bounds gives us all the limits we were missing. This is really
the motivation for demanding the existence of least upper-bounds: we
want to be able to take limits.
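If you would rather compute than draw, the following Python sketch looks at the first fifty elements of S using exact rational arithmetic from the standard library's `fractions` module. The helper name `s` and the cutoff of fifty are arbitrary choices for illustration. The comparison with `math.e` uses floating point and is only an approximation; nothing here proves the two key properties.

```python
from fractions import Fraction
import math

def s(n):
    """The element (1 + 1/n)^n of S, computed exactly as a rational number."""
    return (1 + Fraction(1, n)) ** n

elements = [s(n) for n in range(1, 51)]

# Property (ii) on the sample: each element is followed by a larger one.
strictly_increasing = all(a < b for a, b in zip(elements, elements[1:]))

# Property (i) on the sample: 3 is an upper-bound for these elements.
bounded_by_three = all(x < 3 for x in elements)

# Every element is rational, yet the values creep up towards e.
gap_to_e = math.e - float(elements[-1])
```

Each element of the sample is an honest rational number, while the gap to e keeps shrinking without ever closing.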
Thus, the desired property which Q lacks is the following.
6 Of course, we haven’t proven that e is not rational, but right now, as we are only
concerned with motivation, simply knowing that it is not rational is enough to justify
the desire to have suprema.
R It is not uncommon to see others using facts like “$\sqrt{2}$ is
not rational.” as motivation for the introduction of the real
numbers. This is stupid. If all we really cared about were
numbers like $\sqrt{2}$, then we shouldn’t be going from $\mathbb{Q}$ to
$\mathbb{R}$, but rather from $\mathbb{Q}$ to $\mathbb{A}$, the algebraic numbers (see
Definition 2.1.13). The point of $\mathbb{R}$ is not to be able to take
square-roots; the point is to be able to take limits.
$D_{x_0} := \{x \in \mathbb{Q} : x \le x_0\}$. (1.4.2.1)
$D^U := \{u \in X : u \text{ is an upper-bound of } D\}$
$D^L := \{l \in X : l \text{ is a lower-bound of } D\}$. (1.4.2.3)
$D = (D^U)^L$. (1.4.2.4)
D = { x ∈ X : x ≤ sup(D)} . (1.4.2.6)
Proof. Step 1: Prove a useful lemma
Step 2: Define $\mathbb{R}$ as a set
Define
$\mathbb{R} := \{D \in 2^{\mathbb{Q}} : D \text{ is a Dedekind-cut}\}$. (1.4.2.11)
Step 3: Define a preorder on $\mathbb{R}$
We define
$D \le E$ iff $D \subseteq E$. (1.4.2.12)
Step 4: Show that ≤ is a total-order
≤ is automatically a partial-order because set-inclusion is
always a partial-order. To show totality, let D, E ∈ R. If
D ≤ E, we are done, so suppose this is not the case. We
would like to show that E ≤ D, i.e., that E ⊆ D, so let
e ∈ E. As D is a Dedekind-cut, it suffices to show that e is a
lower-bound of every upper-bound of D. So, let u ∈ Q be an
upper-bound of D. We wish to show that e ≤ u. We proceed by
Step 5: Show that ≤ is Dedekind-complete
As ≤ is a total-order, it suffices simply to show that ≤ has the
least upper-bound property (by Proposition 1.4.1.11). To show
this, let $\mathcal{S} \subseteq \mathbb{R}$ be nonempty and bounded above. Define
$S := \left(\left(\bigcup_{D \in \mathcal{S}} D\right)^{U}\right)^{L}$. (1.4.2.13)
Step 6: Define addition
Addition of elements of $\mathbb{R}$ is just set addition:a
$D + E := \{d + e : d \in D, e \in E\}$. (1.4.2.15)
Step 7: Define the additive identity and inverses
The cut
$0 := \{x \in \mathbb{Q} : x \le 0\}$ (1.4.2.17)
Step 8: Show that $\langle \mathbb{R}, +, 0, - \rangle$ is a commutative group
Associativity and commutativity follow from the fact that set
addition is associative and commutative.
$D + (-D) = \{d + (x - y) : d \in D, x \le 0, y \notin D\}$.
$D_0^+ := \{d \in D : 0 \le d\}$. (1.4.2.23)
$1 := \{x \in \mathbb{Q} : x \le 1\}$ (1.4.2.27)
Step 12: Show that the additive inverse distributes
We wish to show that
$y - M = d' + e' = d + (e' - (d - d'))$. (1.4.2.32)
As $d - d' \ge 0$, $e := e' - (d - d') \in E$. Of course,
y − (M − ε) = d + (e + ε), (1.4.2.33)
D · (E + F). (1.4.2.35)
$D \cdot (E + F) := D_0^+ [E + F]_0^+ \cup 0$
D·E+D·F
= D · ((E + F) + (−F)) + D · F
= D · (E + F) + D · (−F) + D · F (1.4.2.37)
C D · (E + F) + D · F + (−(D · F))
= D · (E + F),
Step 16: Show that $\langle \mathbb{R}, +, 0, -, \cdot, 1 \rangle$ is a Dedekind-complete totally-ordered field
All that remains to be shown is that 0 ≤ D, E implies 0 ≤ D · E.
Of course, if d ∈ D, d ≥ 0 and e ∈ E, e ≥ 0, then de ∈ D · E,
so that 0 ≤ D · E.
Step 17: Show that Q ⊆ R
That Q ⊆ R follows immediately from Theorem 1.3.7.
Step 18: Show that every nonzero Dedekind-complete totally-ordered field which contains Q contains a unique copy of R
Let R be a nonzero Dedekind-complete totally-ordered field
which contains Q. By Theorem 1.3.7, R contains a copy of
Q ⊆ R (as well as R itself, Q ⊆ R.). Let x ∈ X. Thus,
because both R and R contain Q, abusing notation, we may
consider
D_x := {q ∈ Q : q ≤ x} (1.4.2.42)
Step 19: Show that ⟨R, +, 0, −, ·, 1⟩ is unique up to unique isomorphism of preordered fields
We leave this as an exercise.
Exercise 1.4.2.45 Prove this yourself.
a If ever you’re wondering how to come up with these definitions, just
think of what the answer should be for sets of the form (1.4.2.1).
b Of course this is abuse of notation. It should not cause any confusion
as one is a subset of Q and the other is an element of Q.
c While it’s perhaps not quite so clear a priori why this has no loss of
generality, if you look at the following computation, you will see that you
can swap their roles if instead we had d ≤ d′.
d If you want to be pedantic about things, there is a subfield Q ⊆ R
together with an isomorphism of preordered fields ψ : Q → Q, where
Q ⊆ R.
8 Once again, not just the smallest, like with the integers and rationals, but the
only.
D_x := {q ∈ Q : q ≤ x} (1.4.2.50)
a If you want to be pedantic about things, there is a subfield Q ⊆ R
together with an isomorphism of totally-ordered fields ψ : Q → Q, where
Q ⊆ R.
R To clarify,
Our first order of business is to decide what other sets besides the
natural numbers are countably-infinite.
2 Basics of the real numbers 74
You (hopefully) just showed that 2ℵ_0 = ℵ_0, but what about ℵ_0^2?
is countable.
X_m := {⟨i, j⟩ ∈ N × N : i + j = m} (2.1.10)
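The anti-diagonals X_m suggest a concrete way to list all of N × N: walk through X_0, X_1, X_2, ... in turn. The following is a minimal sketch of that idea; the function names `antidiagonal` and `enumerate_pairs` are made up for illustration, and the order chosen within each X_m is an arbitrary assumption.

```python
from itertools import count, islice

def antidiagonal(m):
    """Return X_m = {(i, j) in N x N : i + j = m} as a list of m + 1 pairs."""
    return [(i, m - i) for i in range(m + 1)]

def enumerate_pairs():
    """Yield every pair of natural numbers exactly once, one anti-diagonal at a time."""
    for m in count(0):
        yield from antidiagonal(m)

# The first ten pairs in this (assumed) listing order.
first_ten = list(islice(enumerate_pairs(), 10))
print(first_ten)
```

Each X_m is finite, so every pair is reached after finitely many steps, which is the point of splitting N × N this way.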
This result might seem a bit crazy at first. I mean, just ‘look’ at
the number line, right? There’s like bajillions more rationals than
naturals. Surely it can’t be the case that there are no more rationals
than natural numbers, can it? Well, yes, in fact that can be, and in
fact is, precisely the case—despite what your silly intuition might be
telling you, there are no more rational numbers than there are natural
numbers.
We mentioned briefly before the algebraic numbers in the remark
right before Subsection 1.4.2 Dedekind-cuts and the real numbers. We
go ahead and define them now because they provide a good exercise
in cardinality.
a I know technically we have not yet defined √2, though as I only intend
to convince you (as opposed to prove it for you) that there are algebraic
numbers which are not rational, this is not an issue.
So, we’ve now done both Z and Q, but what about R? At first,
you might have declared it obvious that there are more real numbers
than natural numbers, but perhaps the result about Q has now given
you some doubt. In fact, it does turn out that there are more real
numbers than there are natural numbers.
Theorem 2.1.15 — Cantor’s Cardinality Theorem. Let X
be a set. Then, |X| < |2^X|.
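For a small finite set, the usual diagonal argument behind this theorem can be checked by brute force. The sketch below assumes the standard argument (for any f : X → 2^X, the set {x ∈ X : x ∉ f(x)} is not in the image of f); the helper name `diagonal_set` is hypothetical, and a finite check is of course only an illustration, not a proof.

```python
from itertools import product

def diagonal_set(X, f):
    """Return {x in X : x not in f(x)}, where f maps elements of X to subsets of X."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
# All 2^3 = 8 subsets of X.
subsets = [
    frozenset(x for x, keep in zip(X, bits) if keep)
    for bits in product([0, 1], repeat=len(X))
]

# Try every one of the 8^3 functions f : X -> 2^X and check that the
# diagonal set is never one of the values of f.
all_missed = all(
    diagonal_set(X, dict(zip(X, images))) not in images
    for images in product(subsets, repeat=len(X))
)
print(all_missed)
```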
B_ε(x_0) := {x ∈ R : |x − x_0| < ε}
D_ε(x_0) := {x ∈ R : |x − x_0| ≤ ε}. (2.2.3)
S := {m ∈ N : m < x} (2.3.4)
m_1 < m_2 ≤ m_0. (2.3.6)
But then |m_1 − m_2| < 1/2 (because they are both strictly between
m_0 − 1/2 and m_0), and so m_1 = m_2 (by Proposition 2.2.5): a
contradiction (of the fact that m_1 < m_2). Hence, it must have
been the case that m_0 = m_1 ∈ N.
We cannot have that m_0 + 1 ∈ S because otherwise m_0
would not be an upper-bound of S. Therefore, m_0 + 1 ∈ N \ S =
{m ∈ N : x ≤ m}, and so x ≤ m_0 + 1, and so x < m_0 + 2.
a You aren’t supposed to know what this means yet of course. This
doesn’t come until Definition 4.4.7.
R It turns out that this is also the case for the irrational
numbers, Q^c, but at the moment, we don’t even know how to
construct a single irrational number, so we will have to return
to this at a later date (Theorem 2.4.3.64).ᵃ
First of all, we know that n_0 − 1 ∉ S, and so (n_0 − 1)/m_0 < b. On
the other hand,
(n_0 − 1)/m_0 = n_0/m_0 − 1/m_0 ≥ b − 1/m_0 > b − ε = b − (b − a) = a, (2.3.20)
and so (n_0 − 1)/m_0 ∈ (a, b).
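The computation above can be turned into a small procedure for producing a rational in (a, b). This is only a sketch under stated assumptions: it assumes 0 ≤ a < b, that ε = b − a, that m_0 is chosen with 1/m_0 < ε, and that n_0 is the smallest natural number with n_0/m_0 ≥ b; the function name `rational_between` is invented here.

```python
from fractions import Fraction
from math import ceil, floor

def rational_between(a, b):
    """Return a rational strictly between a and b, assuming 0 <= a < b."""
    a, b = Fraction(a), Fraction(b)
    if not 0 <= a < b:
        raise ValueError("this sketch assumes 0 <= a < b")
    eps = b - a
    m0 = floor(1 / eps) + 1   # guarantees 1/m0 < eps
    n0 = ceil(b * m0)         # smallest natural number n with n/m0 >= b
    return Fraction(n0 - 1, m0)

q = rational_between(Fraction(1, 3), Fraction(1, 2))
print(q)
```

Exact `Fraction` arithmetic is used so that the strict inequalities are not blurred by floating-point rounding.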
spaces with the same notion of sequential convergence—see Example 3.2.48. The
significance of this is that, if you want to define a topology by saying what it means
to converge (and you almost certainly will at some point if you plan to become a
mathematician), you cannot use sequences—see the remark in Theorem 3.4.2.1.
2.4 Nets, sequences, and limits 87
2.4.2 Limits
We now define what it means to be the limit of a net. To make the
definition as clean and concise as possible, we introduce the concept
of nets eventually doing something.
Eventuality
We often use the word eventually in the context of nets and sequences.
For example, we might say “The net λ ↦ x_λ is eventually XYZ.”, where
“XYZ” is some sort of property. This means that there is some λ_0
such that if λ ≥ λ_0 it follows that λ ↦ x_λ is XYZ. For example, a fact
that will come in handy is that convergent nets (and in fact, Cauchy
nets) are eventually bounded (Proposition 2.4.3.9). We make this
formal.
it’s fair to say that the definition that uses the term
“eventually” is both more intuitive and concise.
a Divergence is not the same as nonconvergence. We will define divergence
momentarily.
Exercise 2.4.2.11 Show that lim_m 1/m = 0.
Similarly, we write
∏_{{x_m, ..., x_n}} x_k := ∏_{k=m}^{n} x_k := x_m · x_{m+1} · ··· · x_{n−1} · x_n.ᵃ (2.4.2.14)
We have
b^m = (1 + (b − 1))^m = Σ_{k=0}^{m} (m choose k) (b − 1)^k ≥ 1 + m(b − 1).ᵃ
Exercise 2.4.2.17
(i). What is an example of a net which neither converges
nor diverges?
(ii). What is an example of a net which diverges, but doesn’t
diverge to either of ±∞?
You probably recall that this should converge to e, but what the hell
is e? It has not yet been defined. In fact, we will later define
e := lim_m Σ_{k=0}^{m} 1/k!,⁵ (2.4.2.19)
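To get a feel for this limit, one can compute the first few partial sums Σ_{k=0}^{m} 1/k! exactly. This is a rough numerical illustration only; the name `s` for the partial sum and the comparison against Python's built-in `math.e` are choices made here, not part of the text.

```python
from fractions import Fraction
from math import e, factorial

def s(m):
    """Exact partial sum 1/0! + 1/1! + ... + 1/m!."""
    return sum(Fraction(1, factorial(k)) for k in range(m + 1))

for m in (0, 1, 2, 3, 5, 10, 15):
    # Print the exact value as a float next to its distance from math.e.
    print(m, float(s(m)), abs(float(s(m)) - e))
```

The partial sums are strictly increasing, and the gap to `math.e` shrinks very quickly because the terms 1/k! do.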
4 In case you’re wondering what this crazy new symbol “!” means, for n ∈ N, n!,
the factorial of n, is defined inductively by 0! := 1 and (n + 1)! := (n + 1) · n!.
7 Actually we will define e := exp(1), but this will of course turn out to be exactly
this limit.
m ↦ s_m := Σ_{k=0}^{m} 1/k! (2.4.3.6)
k =m +1
2
1
follows that
sm − sm ≤ 2−m1 < ε. (2.4.3.8)
1 2
where x0 is the center of the ball B. The reason for the 1 + |x0 |
is as follows. For m ≥ m0 , we have that
|x_m| = |(x_m − x_0) + x_0| ≤ |x_m − x_0| + |x_0| ≤ 1 + |x_0|. (2.4.3.11)
a The common index is supposed to indicate that the two nets have the
same domain.
x_λ > x_∞ − (1/2)x_∞ = (1/2)x_∞. (2.4.3.16)
|x_λ^{−1} − x_∞^{−1}| = |x_∞ − x_λ| / |x_λ x_∞| < (1/M^2) |x_λ − x_∞| < (1/M^2) ε. (2.4.3.17)
a Here we are making use of the trick we mentioned before about
‘throwing away’ the beginning terms of a net—see the end of Eventuality.
b That is, forget our previous definition of λ_0.
x_∞ := sup_λ {x_λ} ∈ R. (2.4.3.22)
Taking the inf of this inequality (and using the fact that
inf(−S) = − sup(S)—see Exercise 1.4.2.54), we find
Pick something larger than both λ_0 and λ_0′ and change notation
so that this new larger thing is called λ_0. Thus, now, for λ ≥ λ_0,
both sets of inequalities hold, and so
a No, you are not expected to know what a uniform space is yet (though
see Definition 4.1.2.1 if you can’t wait to find out).
can define
e := lim_m Σ_{k=0}^{m} 1/k!. (2.4.3.48)
But let’s not. Given what we currently know, it would seem like
we would just be pulling this definition out of our ass. This series
is not why e matters. In fact, I would argue that e itself doesn’t
matter—it is the function exp that arises naturally in calculus (being
the unique (up to scalar multiples) function equal to its own derivative).
Thus, we refrain from ‘officially’ defining e until we have defined the
exponential, which of course we won’t be able to do until we know
about differentiation—see Definition 6.4.5.28.
Nevertheless, it would be nice to ‘officially’ know that there is at
least some real number that is not rational.
Square-roots
I mentioned back right before Subsection 1.4.2 Dedekind-cuts and
the real numbers that sometimes people introduce the real numbers
so that we can take square-roots of numbers, but that this logic is a
bit silly, because if that were the objective, then we should really be
extending our number system from Q to A, not from Q to R. That
being said, even though our justification for the introduction of the
reals was really so that we can do calculus (take limits), it does turn
out that we do get square-roots of all nonnegative reals. This should
be viewed more as “icing on the cake” instead of the raison d’être.
x_{m+1} := (1/2)(x_m + x/x_m) (2.4.3.50)
x_∞ = (1/2)(x_∞ + x/x_∞), (2.4.3.51)
x_m^2 = (1/4)(x_{m−1} + x/x_{m−1})^2
= x + [(1/4)(x_{m−1}^2 + 2x + x^2/x_{m−1}^2) − x] (2.4.3.54)
= x + (1/4)(x_{m−1} − x/x_{m−1})^2 ≥ x.
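The recursion (2.4.3.50) is easy to run by hand or by machine. The sketch below iterates it with exact rational arithmetic for x = 2; the starting value x_0 = 1 and the function name `sqrt_iterates` are assumptions made for the illustration, since the starting value is not shown in this excerpt.

```python
from fractions import Fraction

def sqrt_iterates(x, x0, steps):
    """Return [x_0, x_1, ..., x_steps] where x_{m+1} = (x_m + x / x_m) / 2."""
    xs = [Fraction(x0)]
    for _ in range(steps):
        xm = xs[-1]
        xs.append((xm + Fraction(x) / xm) / 2)
    return xs

xs = sqrt_iterates(2, 1, 5)
for m, xm in enumerate(xs):
    # x_m^2 - 2 is nonnegative for m >= 1, in line with (2.4.3.54).
    print(m, xm, float(xm * xm - 2))
```

Consistent with (2.4.3.54), every iterate after the starting value has square at least x, and the iterates then decrease toward the square-root.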
R As I’m sure you’re aware, if m = 2, it is common to
write √x := ²√x.
R If x < 0, then we consider the symbol ᵐ√x to be
undefined.
We would have been able to show almost from the very beginning
that, if m ∈ Z+ is not a perfect square, then there is no element x ∈ Q
such that x^2 = m. What we haven’t been able to show until just
now, however, is that there is a real number x ∈ R such that x^2 = m.
We now finally check that indeed there is no rational number whose
square is m. Thus, we will have finally established the existence of
real numbers which are not rational.
Proposition 2.4.3.58 Let m ∈ Z+ . Then, if m is not a perfect
square, then there is no x ∈ Q such that x^2 = m.
a^2 = nb^2, (2.4.3.59)
and hence
We now show that all intervals in R are exactly what you think
they are (confer Definition A.3.3.3). We could have done this awhile
ago now, but we wanted to wait until we could definitively provide an
9 By “what you would expect”, we mean something like [a, b], or [a, b), etc.
‘Density’ of Q^c in R
We mentioned back when we showed the ‘density’ of Q in R (Theorem
2.3.18) that it was also true that Q^c was ‘dense’ in R. At the
time, however, we could not even construct a single real number that
we could prove was irrational. Now, however, we have done so, and
so we can return to the issue of the ‘density’ of Q^c in R.
Counter-examples
In this small subsubsection, we present a couple of counter-examples,
which might seem quite surprising if you’ve never thought too much
about these things before.
Thus:
Even if there is some number x_∞ such that
(i) for all λ, lim_µ x_{λ,µ} = x_∞, and (ii) for all
µ, lim_λ x_{λ,µ} = x_∞, it need not be the case that
lim_λ x_{λ,λ} = x_∞.
Then,
lim_m x_{m,n} = 0 / (0 + 1/n) = 0, (2.4.3.72)
and hence
lim_n lim_m x_{m,n} = 0. (2.4.3.73)
and hence
lim_m lim_n x_{m,n} = 1. (2.4.3.75)
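A quick numerical look makes the two iterated limits plausible. The definition of x_{m,n} is not visible in this excerpt, so the sketch below assumes x_{m,n} = (1/m) / (1/m + 1/n) = n/(m + n) for m, n ∈ Z+, which is consistent with (2.4.3.72), (2.4.3.73), and (2.4.3.75) but may differ from the definition actually used.

```python
from fractions import Fraction

def x(m, n):
    """Assumed double sequence x_{m,n} = (1/m) / (1/m + 1/n) = n / (m + n)."""
    return Fraction(n, m + n)

big = 10**6
# Fix n and let m grow: the values approach 0.
print(float(x(big, 5)))
# Fix m and let n grow: the values approach 1.
print(float(x(5, big)))
# Along the diagonal the value is constantly 1/2.
print(x(7, 7), x(big, big))
```

Under this assumption the two iterated limits are 0 and 1, while the diagonal is constantly 1/2, so the order in which the limits are taken matters.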
Thus:
2.4.4 Series
As you probably know from calculus, a series is an ‘infinite sum’. In
other words, it is a limit of a finite sum.
Definition 2.4.4.1 — Series Let m ↦ a_m be a sequence and
define s_m := Σ_{k=0}^{m} a_k. Then,
Σ_{k∈N} a_k := Σ_{k=0}^{∞} a_k := lim_m Σ_{k=0}^{m} a_k (2.4.4.2)
is a series and the s_m s are the partial sums of this series.ᵃ The
series is said to converge iff the sequence m ↦ s_m converges
and similarly for diverge to ±∞ and diverge. The series
converges absolutely iff Σ_{k=0}^{m} |a_k| converges and similarly for
diverges absolutely.ᵇ The series converges conditionally iff
it converges but not absolutely, and similarly for diverges
conditionally.
a Of course this limit need not exist. In that case, the notation Σ_{k∈N} a_k
is meaningless, or at least, doesn’t represent any real number.
b Of course, there is no “diverges to ±∞ absolutely” because this can
only diverge to +∞. Also note that “diverges absolutely” is the same as
“doesn’t converge absolutely”.
c Why?
(ii). Σ_{m∈Z+} (−1)^{m+1}/m converges, but not absolutely (that is, is
conditionally convergent).
(iii). Σ_{m∈Z+} m diverges to +∞.
(iv). Σ_{m∈Z+} (−2)^{m+1} diverges, but not to ±∞.
(v). Σ_{m∈Z+} (−1)^{m+1}/m diverges absolutely, but doesn’t diverge
(that is, is conditionally divergent).
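The behaviours listed above can be seen in the partial sums. The following sketch tabulates a few of them with floating-point arithmetic; it is numerical evidence only, and the comparison value log 2 for the alternating series is a standard fact assumed here rather than something established in this excerpt. The helper name `partial_sums` is made up.

```python
import math

def partial_sums(term, count):
    """Return [s_1, ..., s_count] where s_m = term(1) + ... + term(m)."""
    total = 0.0
    sums = []
    for m in range(1, count + 1):
        total += term(m)
        sums.append(total)
    return sums

alternating = partial_sums(lambda m: (-1) ** (m + 1) / m, 10_000)
harmonic = partial_sums(lambda m: 1 / m, 10_000)
geometric = partial_sums(lambda m: (-2) ** (m + 1), 10)

print(alternating[-1], math.log(2))  # settles down near log 2
print(harmonic[-1])                  # keeps growing, slowly
print(geometric)                     # swings with growing size and alternating sign
```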
the above is all you need to convince yourself that the statement
is true.a
Let ε > 0 and choose m0 so that whenever m0 ≤ m ≤ n it
follows that
Σ_{k=0}^{n} |a_k| − Σ_{k=0}^{m} |a_k| < ε. (2.4.4.12)
Now that the partial sums are Cauchy follows from the fact
that limm am = 0.
a Depending on whether m − n is even or odd, the last term might
instead just be −a_n instead of −(a_{n−1} − a_n). Either way, the same inequality
holds.
Proof. (i) follows from the fact that the operation of taking lim-
its (in this case, applied to the partial sums) is nondecreasing—
see Exercise 2.4.3.19.
Exercise 2.4.4.18 Prove (ii).
Exercise 2.4.4.19
(i). Find an example of sequences m ↦ a_m and m ↦
b_m with (i) |a_m| ≤ |b_m|, (ii) Σ_{m∈N} a_m divergent, but
(iii) Σ_{m∈N} b_m not divergent.
(ii). Find an example of sequences m ↦ a_m and m ↦ b_m
with (i) |a_m| ≤ |b_m|, (ii) Σ_{m∈N} b_m convergent, but
(iii) Σ_{m∈N} a_m not convergent.
Proof. Define L := lim_m a_m/b_m > 0. Let ε > 0 be less than L,
and choose m_0 ∈ N such that, whenever m ≥ m_0, it follows
thatᵃ
|a_m/b_m − L| < ε, (2.4.4.21)
that is,
−ε < a_m/b_m − L < ε, (2.4.4.22)
or rather
a It is implicit in this assumption that we also assume that m ↦ a_m is
eventually nonzero, so that this hypothesis actually makes sense.
Proof. Suppose that r := lim_m |a_{m+1}/a_m| < 1. Let ε > 0 be less
than 1 − r. Then, there is some m_0 such that, whenever m ≥ m_0,
it follows that
|a_{m+1}/a_m| − r ≤ | |a_{m+1}/a_m| − r | < ε, (2.4.4.25)
so that |a_{m+1}/a_m| < r + ε =: C < 1, so that |a_{m+1}| < C|a_m|, so
Exercise 2.4.4.26 Prove the case where lim_m |a_{m+1}/a_m| > 1.
W Warning: The lim sup_m |a_m|^{1/m} case fails if you replace
“doesn’t converge” with “diverges”. For example,
take the sequence of terms
Proof. Suppose that u := lim sup_m |a_m|^{1/m} < 1. Let ε > 0
be less than 1 − u > 0. Then, there is some m_0 such that,
whenever m ≥ m_0, it follows that
sup_{n≥m} {|a_n|^{1/n}} − u < ε, (2.4.4.29)
so that sup_{n≥m} {|a_n|^{1/n}} < ε + u < 1, so that |a_n|^{1/n} < r :=
ε + u < 1 for all n ≥ m_0, so that |a_n| < r^n for all n ≥ m_0. It
= 1 + 1/2 + 1/2 + 1/2 + · · · = ∞.
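The grouping that produces the lower bound 1 + 1/2 + 1/2 + · · · can be checked on the first few blocks. The sketch below assumes the usual grouping argument, in which the partial sum up to 2^k is bounded below by 1 + k/2; the function name `harmonic` is chosen here for illustration.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum 1 + 1/2 + ... + 1/n of the Harmonic Series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

# Compare the partial sum up to 2^k with the lower bound 1 + k/2.
bounds = [(k, harmonic(2**k), 1 + Fraction(k, 2)) for k in range(8)]
for k, h, lower in bounds:
    print(k, float(h), float(lower))
```

Since the lower bound grows without limit, so do the partial sums, although only logarithmically fast.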
In fact, there is a test that will tell us that the Harmonic Series
diverges immediately, called the p-series Test. It says that Σ_{n∈Z+} 1/n^p
converges (absolutely) iff p > 1. While I suppose it is possible to
prove it at the moment,¹⁰ I think the most natural proof makes use of
the integral, and so we postpone the proof—see Proposition 6.4.5.76.
More specifically, this proof will make use of the Integral Test, which,
obviously, first requires we introduce the integral, and so we likewise
postpone this as well—see Proposition 5.2.3.15.
Decimal expansions
We now return to the issue left unfinished from Chapter 1 having to
do with your “naive” idea of the real numbers, namely, as decimal
expansions. In mathematics, it is almost never a good idea to prove
something about the real numbers using decimal expansions. On the
other hand, I suppose it’s nice to know that this new object does in
fact correspond to what you are familiar with.
In any case, without further ado, here is the precise statement of
what is meant by “decimal expansions”.
Proof. Step 1: Replace x with |x|
We always have that x = sgn(x)|x|. Thus, what we actually
want to do is find a unique x_0 ∈ N and x_k ∈ {0, . . . , r − 1} for
k ∈ Z+ such that
(i). |x| = x_0 + Σ_{k=1}^{∞} x_k/r^k; (2.4.4.35)
and
(ii). k ↦ x_k is not eventually the constant r − 1.
Step 2: Define x_0
First of all, let x_0 ∈ N be the largest integer less-than-or-equal
to x.ᵃ Define y_0 := x − x_0 ∈ [0, 1).
Step 3: Define x_1
Note that d/r ∈ [0, 1) for d ∈ {0, . . . , r − 1}. As the set
{d ∈ {0, . . . , r − 1} : d/r ≤ y_0} is nonempty and finite, it has a
Step 4: Define x_k for k ∈ Z+ inductively
Note again that d/r^2 ∈ [0, 1/r) for d ∈ {0, . . . , r − 1}, and so,
similarly as before, there is a largest x_2 ∈ {0, . . . , r − 1} such
that x_2/r^2 ≤ y_1.
Continue this process inductively defining x_k for all k ∈ Z+ .
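Steps 2 through 4 describe a greedy procedure: at stage k, take the largest digit d with d/r^k no bigger than what is left over. The sketch below follows that description for a nonnegative rational input; it assumes the remainders are updated as y_k := y_{k−1} − x_k/r^k, which is not shown in this excerpt, and the function name `digits` is invented.

```python
from fractions import Fraction
from math import floor

def digits(x, r=10, count=10):
    """Return (x_0, [x_1, ..., x_count]) for a rational x >= 0 in base r."""
    x = Fraction(x)
    if x < 0:
        raise ValueError("replace x with |x| first, as in Step 1")
    x0 = floor(x)          # largest integer <= x
    y = x - x0             # remainder y_0 in [0, 1)
    expansion = []
    for k in range(1, count + 1):
        d = floor(y * r**k)        # largest d with d / r^k <= y
        expansion.append(d)
        y -= Fraction(d, r**k)     # assumed update of the remainder
    return x0, expansion

print(digits(Fraction(22, 7), 10, 6))
print(digits(Fraction(1, 2), 2, 4))
```

Because the largest admissible digit is taken at every stage, 1/2 in base 2 comes out as 0.1000... rather than 0.0111..., in line with condition (ii).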
Step 5: Verify (i) and (ii)
We now check that k ↦ x_k is not eventually the constant r − 1.
To see this, note that
Σ_{k=k_0}^{∞} (r − 1)/r^k = 1/r^{k_0 − 1} (2.4.4.39)
Step 6: Show that x_0 = x_0′ in uniqueness
It remains to check uniqueness. So, let x_0′ ∈ Z and x_k′ ∈
{0, . . . , r − 1} be such that
x = x_0′ + Σ_{k=1}^{∞} x_k′/r^k (2.4.4.40)
and hence
|x_0 − x_0′| ≤ Σ_{k=1}^{∞} |x_k′ − x_k|/r^k. (2.4.4.42)
Step 7: Show that x_1 = x_1′ in uniqueness
Now that we know x_0 = x_0′,
x_0 + Σ_{k=1}^{∞} x_k/r^k = x = x_0′ + Σ_{k=1}^{∞} x_k′/r^k (2.4.4.47)
simplifies to
x_1 − x_1′ = Σ_{k=2}^{∞} (x_k′ − x_k)/r^{k−1} (2.4.4.48)
Step 8: Show that x_k = x_k′ for k ∈ Z+ in uniqueness
Applying this argument inductively, we find that x_k = x_k′ for
all k ∈ Z+ , finishing the proof of uniqueness.
a By the Archimedean Property, there is some integer larger than x. As N
is well-ordered, there is a smallest integer strictly larger than x. That integer
minus one will be the largest integer less-than-or-equal to x.
ψ(x) := {q ∈ Q : q ≤ x}. (2.4.4.54)
And now we turn to the precise statement of the idea that “addition
of infinitely many real numbers is ‘noncommutative’”.
a_m^+ := (1/2)(|a_m| + a_m) and a_m^− := (1/2)(|a_m| − a_m). (2.4.4.61)
Then,
(i). if Σ_{m∈N} a_m converges conditionally, then
Σ_{m∈N} a_m^+ = ∞ = Σ_{m∈N} a_m^−; (2.4.4.62)
and
(ii). if
Σ_{m∈N} a_m^+ = ∞ = Σ_{m∈N} a_m^−, (2.4.4.63)
∞ = Σ_{m∈N} |a_m| = Σ_{m∈N} [a_m^+ + a_m^−], (2.4.4.64)
∞ > Σ_{m∈N} a_m = Σ_{m∈N} [a_m^+ − a_m^−], (2.4.4.65)
−am .
−
ak+ < y,
Õ
(2.4.4.69)
k =0
so that
0 ≤ Σ_{k=0}^{m_0} a_k^+ − y < a_{m_0}^+. (2.4.4.70)
and
Σ_{k=0}^{m_0} a_k^+ − Σ_{k=0}^{n_0} a_k^− + Σ_{k=m_0+1}^{m_1} a_k^+ − Σ_{k=n_0+1}^{n_1} a_k^− ≤ x (2.4.4.74)
a Adapted from [Rud76].
b Note in the statement that we are allowed to take x = −∞ and y = +∞!
c Note that I can replace m0 with m ≤ m0 on the left-hand side of
(2.4.4.70) and the inequality still remains valid. Similarly, I can replace
n0 with m ≤ n0 on the left-hand side of (2.4.4.72) and the inequality still
remains valid.
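The back-and-forth construction in the proof can be imitated numerically. The sketch below is a simplified variant, not the construction of the proof itself: it takes the conditionally convergent series Σ (−1)^{m+1}/m, uses a single target in place of the pair x ≤ y, and adds positive terms while the running sum is at or below the target and negative terms otherwise. The function name `rearranged_partial_sums` is made up.

```python
from itertools import count

def rearranged_partial_sums(target, steps):
    """Partial sums of a rearrangement of sum (-1)^(m+1)/m steered toward target."""
    positives = (1 / m for m in count(1, 2))    # 1, 1/3, 1/5, ...
    negatives = (-1 / m for m in count(2, 2))   # -1/2, -1/4, -1/6, ...
    total = 0.0
    sums = []
    for _ in range(steps):
        # Overshoot the target from below, then undershoot it from above.
        total += next(positives) if total <= target else next(negatives)
        sums.append(total)
    return sums

print(rearranged_partial_sums(2.0, 100_000)[-1])
print(rearranged_partial_sums(-1.0, 100_000)[-1])
```

Each crossing of the target overshoots by at most the size of the term just added, and those terms tend to 0, which is why the same terms can be steered toward different values.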
a You are neither supposed to know what these are yet nor why this is
significant.
b Or at least, this is the impression that I have gotten.
{x_{λ_µ} : µ ≥ µ_0} ⊆ {x_λ : λ ≥ λ_0}. (2.4.5.3)
{x_{λ_µ} : µ ≥ µ_0} ⊆ {x_λ : λ ≥ λ_0}. (2.4.5.4)
{x_{λ_µ} : µ ≥ µ_0} ⊆ {x_λ : λ ≥ λ_0}. (2.4.5.5)
a 6 July 2015
b Or maybe even impossible? I don’t know because I don’t use this
definition.
Example 2.4.5.13
(i). ⟨0, 0, 0, . . .⟩ is a subsequence of ⟨0, 1, 0, 1, 0, 1, . . .⟩.
{λ ∈ Λ : xλ is XYZ.} (2.4.5.15)
Define
(x^i)_∞ := lim_λ (x^i)_λ. (2.4.5.24)
|(x^i)_{λ^i} − (x^∞)_∞| ≤ |(x^i)_{λ^i} − (x^i)_∞| + |(x^i)_∞ − (x^∞)_∞| < ε + ε = 2ε. (2.4.5.26)
lim inf_λ x_λ ≤ lim inf_µ x_{λ_µ} ≤ lim sup_µ x_{λ_µ} ≤ lim sup_λ x_λ.
{x_{λ_µ} : µ ≥ µ_0} ⊆ {x_λ : λ ≥ λ_0}. (2.4.5.28)
Hence
2.5.1 Continuity
Roughly speaking, a continuous function is a function which ‘preserves
limits’, that is, if you take the limit as x goes to a, then f (x) goes to
f (a). Thus, to formulate the definition of continuity, we must first say
what we mean by limit of a function (so far, we have only said what it
means to take the limit of a net). In turn, the notion of a limit point
will make this definition slightly easier to formulate (it’s also quite an
important concept in its own right).
12 Though upon generalization we lose a lot of the nice properties we had in R!
2.5 Basic topology of the Euclidean space 151
of the Algebraic Limit Theorems and the Order Limit Theorem for
limits of functions.
Proposition 2.5.1.6 — Algebraic Limit Theorems (for
functions) Let D ⊆ R^d, let x_0 ∈ R^d be a limit point of D,
and let f, g : D → R^e both have limits at x_0. Then,
(i). lim_{x→x_0} [f(x) + g(x)] = lim_{x→x_0} f(x) + lim_{x→x_0} g(x);
(ii). lim_{x→x_0} [f(x)g(x)] = (lim_{x→x_0} f(x)) (lim_{x→x_0} g(x));
(iii). lim_{x→x_0} 1/f(x) = 1/(lim_{x→x_0} f(x)) if lim_{x→x_0} f(x) ∈ [R^e]^×;ᵃ
and
(iv). lim_{x→x_0} [α f(x)] = α lim_{x→x_0} f(x) for α ∈ R.
a Recall that ((A.4.17)) the notation [R^e]^× means the set of invertible elements
in R^e. Concretely, this means that every component of lim_{x→x_0} f(x)
is nonzero.
a I suppose for functions with values in R^d for d ≥ 2, you could take the
lim sup and lim inf ‘componentwise’, but that I think is a bit awkward.
a The reason for this definition is nontrivial, and we will have to wait for
Lebesgue measure to justify it. That being said, you can see intuitively why
we make this definition simply by graphing the function ⟨x, y⟩ ↦ x^y on
R+ × R+ . Despite our reasons, I should probably note that I imagine most
simply leave 0^0 undefined.
Proof. Step 1: Define a^m for m ∈ N
For m ∈ N, a^m is defined by
a^m := a · · · · · a (m factors). (2.5.1.21)
Step 2: Define a^{1/n} for n ∈ Z+
For n ∈ Z+ , a^{1/n} is defined to be the unique positive real number
that has the property that (a^{1/n})^n = a (see Proposition 2.4.3.56).
Step 3: Define a^r for r ∈ Q
For p/q ∈ Q with p ≠ 0, q ∈ Z+ , and gcd(p, q) = 1, a^{p/q} is
defined by
a^{p/q} := (a^p)^{1/q}. (2.5.1.22)
16 Did you really think you knew what something like √2^{√3} was? What are you
going to do? Multiply √2 by itself √3 times? Good luck with that.
Step 4: Define a^x for x ∈ R
For x ∈ R, let λ ↦ x_λ ∈ Q be some net converging to x.ᵃ
Then, a^x is defined by
a^x := lim_λ a^{x_λ}. (2.5.1.23)
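Steps 1 through 4 can be imitated numerically: approximate x by rationals p/q, and compute each a^{p/q} from integer powers and q-th roots only. The sketch below is a floating-point illustration under several assumptions: roots are found by bisection, the quantity computed is (a^{1/q})^p rather than (a^p)^{1/q} (assumed equal, and used here only to avoid overflow), and the rational approximations come from `Fraction.limit_denominator`. All function names are invented.

```python
import math
from fractions import Fraction

def nth_root(a, n, iterations=200):
    """Approximate the positive n-th root of a > 0 by bisection."""
    lo, hi = 0.0, max(1.0, a)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        try:
            too_big = mid**n >= a
        except OverflowError:
            too_big = True   # mid**n overflowed, so it certainly exceeds a
        if too_big:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def rational_power(a, r):
    """Approximate a^(p/q) for a > 0 as (a^(1/q))^p, using only roots and integer powers."""
    r = Fraction(r)
    return nth_root(a, r.denominator) ** r.numerator

def real_power_approximations(a, x, max_digits=3):
    """Approximate a^x along rationals with denominators at most 10, 100, ..."""
    return [
        rational_power(a, Fraction(x).limit_denominator(10**k))
        for k in range(1, max_digits + 1)
    ]

print(real_power_approximations(2.0, math.sqrt(2)))
print(2.0 ** math.sqrt(2))
```

The successive values settle down as the rational exponents approach x, which is what the limit in (2.5.1.23) asks for.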
Step 5: Check properties
a^1 = a by definition.
Step 6: Show uniqueness
a Note that some such net exists by ‘Density’ of Q in R.
17 Okay, so this is almost true, but not quite (see Definition 3.1.3.1 and
Exercise 3.1.3.3).
(⇐) Suppose that the preimage of every open set is open. Let
x ∈ R^d and ε > 0. B_ε(f(x)) is open by Exercise 2.5.2.4, and
so f^{−1}(B_ε(f(x))) is open. As this set contains x, there is some
δ > 0 such that B_δ(x) ⊆ f^{−1}(B_ε(f(x))), and so f(B_δ(x)) ⊆
B_ε(f(x)). Thus, f is continuous by Exercise 2.5.1.18.(iv).
a Example?
The next couple of results are incredibly important for the same
reason that Theorem 2.5.2.5 (the characterization of continuity in
terms of open sets) was: they are neither deep nor difficult (in fact,
they’re quite trivial), but they will be defining requirements of open
sets in the more general setting of topological spaces.
is open.
is open.
is closed.
is closed.
Proof. Define
Cls(S) := ⋂_{C ⊆ R closed, S ⊆ C} C. (2.5.2.39)
One might ask “What about the smallest open set which contains
S?” (and “dually”). In general, however, there is no smallest open set
that contains S.
Example 2.5.2.43 — A set for which there is not a
smallest open set which contains S Define S B {0} ⊆ R.
Suppose there is a smallest open set U which contains 0. As U
Exercise 2.5.2.51 Let 𝒮 ⊆ 2^{R^d} be a collection of subsets of
R^d. Show that the following are true.
(i). ⋂_{S∈𝒮} Cls(S) ⊇ Cls(⋂_{S∈𝒮} S).
(ii). ⋃_{S∈𝒮} Int(S) ⊆ Int(⋃_{S∈𝒮} S).
(iii). ⋃_{S∈𝒮} Cls(S) ⊆ Cls(⋃_{S∈𝒮} S).
(iv). ⋂_{S∈𝒮} Int(S) ⊇ Int(⋂_{S∈𝒮} S).
2.5.3 Quasicompactness
You will find with experience that closed intervals on the real line are
particularly nice to work with. For example, you’ll probably recall
from calculus that, on a closed interval, every continuous function
attains a maximum and minimum (the Extreme Value Theorem
(Corollary 3.8.2.5)). In particular, continuous functions are bounded
on closed intervals. This is not just true of all closed intervals, however,
but is in fact true about any closed bounded subset of Rd .
The objective then is to characterize closed bounded sets in
such a way that will generalize to arbitrary topological spaces. The
characterization we are looking for is what is called quasicompactness.
Definition 2.5.3.1 — Cover Let S ⊆ R^d and let 𝒰 ⊆ 2^{R^d}.
Then, 𝒰 is a cover of S iff S ⊆ ⋃_{U∈𝒰} U. 𝒰 is an open cover
iff every U ∈ 𝒰 is open. A subcover of 𝒰 is a subset 𝒱 ⊆ 𝒰
that is still a cover of S.
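A standard example may help fix these definitions: the open intervals (1/n, 1) for n ≥ 2 cover the open interval (0, 1), but no finite subfamily does. This example is not taken from the text above; it is an assumed illustration, and the helper names `covered` and `uncovered_point` are hypothetical.

```python
from fractions import Fraction

def covered(x, indices):
    """Is x in the union of the intervals (1/n, 1) for n in indices?"""
    return any(Fraction(1, n) < x < 1 for n in indices)

def uncovered_point(indices):
    """Return a point of (0, 1) missed by the finite subfamily with the given indices."""
    return Fraction(1, max(indices))

finite_subfamily = [2, 5, 17]
point = uncovered_point(finite_subfamily)
print(point, covered(point, finite_subfamily))   # missed by the finite subfamily
print(covered(point, range(2, 19)))              # but caught once (1/18, 1) is allowed
```

The point 1/N, with N the largest index used, always escapes a finite subfamily, which matches the fact that (0, 1) is bounded but not closed.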
a You are not expected to know what T_2 means (see the next chapter for
details, specifically Definition 3.6.2.18), though keep in mind, the details
don’t matter at the moment.
b Actually, since first writing this remark, I found that Bourbaki [Bou66,
pg. 83] uses exactly this terminology, so perhaps the root of this terminology
goes back further.
And now of course we had better check that this condition properly
characterizes “closed and bounded” in the real numbers, as desired.
Proof.ᵃ (⇒) Step 1: Assume hypotheses
Suppose that S is closed and bounded. Let U be an open cover
of S. We proceed by contradiction: suppose that U has no
finite subcover.
[0, M/2] ∩ S or [M/2, M] ∩ S has no finite subcover. Proceeding
Step 3: Construct an element in S ∩ ⋂_{k∈N} R_k
Step 4: Deduce the contradiction
As x∞ ∈ S, there is some U ∈ U such that x∞ ∈ U. Then,
because the side lengths of the rectangles converge to 0 and
U is open, there must be some Im0 such that x∞ ∈ Im0 ⊆ U.
But then, {U} is a finite open cover of Im0 ∩ S: a contradiction.
Therefore, S is quasicompact.
(⇐)
Step 5: Assume hypotheses
Suppose that S is quasicompact.
Step 6: Show that S is bounded
The cover
Step 7: Show that S is closed
Let λ ↦ x_λ ∈ S be a net converging to x_∞ ∈ R^d. We must
show that x_∞ ∈ S (see Theorem 2.5.2.17). We proceed by
contradiction: suppose that x_∞ ∉ S. Then, for each s ∈ S,
because s ≠ x_∞, there is some ε_s > 0 such that x_∞ ∉ B_{ε_s}(s).ᵈ
The collection {B_{ε_s}(s) : s ∈ S} is certainly an open cover of S,
and so there is some finite subcover {B_{ε_{s_1}}(s_1), . . . , B_{ε_{s_m}}(s_m)}.
Define ε_0 := min_{1≤k≤m} {|s_k − x_∞| − ε_k}. As lim_λ x_λ = x_∞,
there is some x_{λ_0} ∈ B_{ε_0}(x_∞). However, by the Reverse Triangle
Inequality (Exercise 2.2.4.(v)),
|x_{λ_0} − s_k| = |(x_{λ_0} − x_∞) + (x_∞ − s_k)|
≥ | |x_{λ_0} − x_∞| − |x_∞ − s_k| |
≥ |x_∞ − s_k| − |x_{λ_0} − x_∞|
> |x_∞ − s_k| − ε_0
≥ |x_∞ − s_k| − (|s_k − x_∞| − ε_k)
= ε_k, (2.5.3.6)
and so, x_{λ_0} ∉ B_{ε_{s_k}}(s_k), a contradiction of the fact that
{B_{ε_{s_1}}(s_1), . . . , B_{ε_{s_m}}(s_m)} is a cover of S. Therefore, lim_λ x_λ =
x_∞ ∈ S, and we are done.
a Proof adapted from [Abb02, pg. 87–89].
b In general, we want S ⊆ [−M, M] × · · · × [−M, M].
c In higher dimensions, [−M, M] × · · · × [−M, M] breaks up into not
2 cases, but 2^d, all with side length M. The same logic gives us that S
intersected with at least one of these ‘quadrants’ has no finite subcover.
d For what it’s worth, this step does not work in general.
a Recall De Morgan’s Laws (Exercise A.2.3.12).
C_λ := Cls({x_µ : µ ≥ λ}) (2.5.3.11)
Define
Let ε_1, ε_2 > 0 and let λ_1, λ_2 ∈ Λ be such that x_{λ_k} ∈ B_{ε_k}(x).
Take ε := min{ε_1, ε_2} and let λ_3 be at least as large as λ_1 and
λ_2. By (2.5.3.13), there is some λ ≥ λ_3 such that x_λ ∈ B_ε(x).
Then, ⟨ε, λ⟩ ∈ Λ′ and ⟨ε, λ⟩ ≥ ⟨ε_k, λ_k⟩, so that Λ′ is directed.
Now, for ⟨ε, µ⟩ ∈ Λ′, pick some λ_{ε,µ} such that (i) x_{λ_{ε,µ}} ∈
B_ε(x) and (ii) λ_{ε,µ} ≥ µ. Of course ⟨ε, µ⟩ ↦ x_{λ_{ε,µ}} converges
to x, but we still need to check that it is a subnet.
Let λ_0 be an arbitrary index. Then, if ⟨ε, µ⟩ ≥ ⟨1, λ_0⟩, it
follows that λ_{ε,µ} ≥ µ ≥ λ_0, and so, by Proposition 2.4.5.9, this
is indeed a subnet.
2.6 Summary
This has been a rather long chapter and we have covered many different
properties of the real numbers. For convenience, we summarize here
some of the main points we have covered.
(i). The real numbers are the unique (up to isomorphism of totally-
ordered fields) nonzero Dedekind-complete totally-ordered
field.
(ii). The real numbers are Cauchy-complete (Theorem 2.4.3.40).
(iii). The real numbers have cardinality 2^{ℵ_0} > ℵ_0 (Theorems 2.1.15
and 2.4.4.49).
(iv). The real numbers are Archimedean (the natural numbers are
unbounded)—see Theorem 2.3.3.
(i). ∅, X ∈ 𝒞;
(ii). ⋂_{C∈𝒟} C ∈ 𝒞 if 𝒟 ⊆ 𝒞; and
(iii). ⋃_{k=1}^{m} C_k ∈ 𝒞 if C_k ∈ 𝒞 for 1 ≤ k ≤ m.
(⇐) Suppose that (i) B covers X, and that (ii) for every
x ∈ X and B_1, B_2 ∈ B with x ∈ B_1 ∩ B_2, there is some
3 Topological spaces 192
The discrete topology is the ‘finest’ topology you can have (finer
means more open sets). The ‘coarsest’ topology you can have is the
indiscrete topology.
3.1.3 Continuity
We now finally vastly generalize our definition of continuity.
and f1 is continuous.
about them. But by the same token, because there is so much structure,
unless you go through the proofs yourself in detail, it can be difficult
to recall why things are true—does property XYZ hold for the real
numbers because they are a totally-ordered field or because they are
a metric space or because they are a topological vector space or
because. . . ? Instead, however, if we step back and only study R as a
topological space, it becomes clearer why certain things are true. This
allows us to tell easily what is true about the real numbers because
they are a topological space, as opposed to what is true about the real
numbers because they are a field, etc.
For example, consider the following.
3.2 A review
Quite a few things that we did with the real numbers in the last
chapter carry over to general topological spaces without any problem. For
convenience, we present here all definitions and theorems which carry
over nearly verbatim to topological spaces. We will try to point out
when things do not carry over identically (for example, limits no
longer need be unique (Example 3.2.7)).
One important thing to keep in mind is: Open neighborhoods of a
point in a general topological space play the same role that ε-balls did
in R. Not only do we use this to determine appropriate generalizations,
but if you are feeling a bit uncomfortable with topological spaces, this
should help your intuition.
For the entirety of this section, X and Y will be general topo-
logical spaces. We recommend that this section be used mainly as
a reference—the pedagogy of the concepts is contained in the last
chapter.
{x_{λ_µ} : µ ≥ µ_0} ⊆ {x_λ : λ ≥ λ_0}. (3.2.11)
{m ∈ N : x_m ≠ x_∞} (3.2.47)
(⇐) Step 1: Make hypotheses
Suppose that every cover by elements of S has a finite subcover.
To show that X is quasicompact, we prove the contrapositive
of the defining condition of quasicompactness. That is, we
show that every collection of open sets that has the property
that no finite subset covers X, also does not cover X. So, let U
be a collection of open sets that has the property that no finite
subset covers X. We show that U itself does not cover X.
Step 2: Enlarge U to a maximal collection
Let Ũ be the collection of all collections of open sets which
(i) contain U and (ii) also have the property that no finite subset
covers X. This is a set that is partially-ordered by inclusion,
and so we intend to apply Zorn’s Lemma (Theorem A.3.5.9) to
extract a maximal such element. So, let W̃ be a well-ordered
subset of Ũ and define
W_0 := ⋃_{W ∈ W̃} W. (3.2.1.6)
so that
U ∪ ⋃_{k=1}^{m} ⋃_{l=1}^{n_k} U_{kl} = X, (3.2.1.8)
Step 4: Deduce that U_0 does not cover X
Define
V_0 := U_0 ∩ S. (3.2.1.9)
V_0 = U_0 ∩ (S ∪ {X}). (3.2.1.10)
opposed to, for example, the one currently⁵ given on Wikipedia, and
(ii) it is something that is important enough that you should probably
at least be aware of its existence and will almost certainly eventually
encounter it if you decide to become a mathematician.
Filter bases are actually an alternative to nets. In principle, one
could do the entirety of topology never speaking of nets and instead
using only filter bases. This would be one motivation for introducing
them (though we have decided to primarily stick to nets).
The relation between nets and filter bases is given by the following.
5 5 July 2015
a *cough*—Wikipedia—*cough*
a As of 16 July 2015
3.3 Filter bases 229
also converges to x_∞.
a “P” is for principal, the etymology being from the use of the word
“principal” in the context of ideals in ring theory (to the best of my knowledge
anyways).
b Indeed, for derived filter bases, this is equivalent to being an (ordinary)
subnet—see Proposition 3.3.9.
c Or rather, I am not aware of a way to state the axioms without at least
implicitly using nets.
Proof. Step 1: Make hypotheses
Suppose that (i) C(∅) = ∅, (ii) S ⊆ C(S), (iii) C(S) = C (C(S)),
and (iv) C(S ∪ T) = C(S) ∪ C(T).
Step 3: Define what should be the closed sets
The idea of the proof is that the closed sets should be precisely
the sets that are equal to their closure (Proposition 3.2.43). We
thus make the definition
C := {C ∈ 2^X : C = C(C)}. (3.4.1.2)
Step 4: Verify that this defines a topology
By (i), the empty-set is closed (by which we mean it is an
element of C ). By (ii), we have
X ⊆ C(X) ⊆ X, (3.4.1.3)
Step 6: Demonstrate uniqueness
If another topology V satisfies this property, that is, ClsV (S) =
C(S), then we have that ClsV (S) = Cls(S) (no subscript indi-
cates the closure in the topology defined by C ). It follows that
a set S is closed with respect to V iff it is equal to ClsV (S) iff
it is equal to Cls(S) iff it is closed in the topology defined by
C . Thus, the two topologies have the same closed sets, and
hence are the same.
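To see the construction in this proof at work, here is a minimal sketch on a finite set. It checks the four hypotheses (i) to (iv) on a closure operator by brute force and then reads off the closed sets as the fixed points, exactly as in (3.4.1.2). The set X = {0, 1, 2}, the particular operator C, and the helper names are all assumptions made up for illustration; they do not come from the text.

```python
from itertools import chain, combinations

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [frozenset(s) for s in subsets]

def satisfies_closure_axioms(X, C):
    """Brute-force check of hypotheses (i)-(iv) for a candidate closure operator C."""
    subsets = powerset(X)
    if C(frozenset()) != frozenset():          # (i)   C(empty) = empty
        return False
    for S in subsets:
        if not S <= C(S):                      # (ii)  S is contained in C(S)
            return False
        if C(S) != C(C(S)):                    # (iii) C(S) = C(C(S))
            return False
        for T in subsets:
            if C(S | T) != C(S) | C(T):        # (iv)  C(S u T) = C(S) u C(T)
                return False
    return True

def closed_sets(X, C):
    """The sets equal to their own closure, as in (3.4.1.2)."""
    return {S for S in powerset(X) if C(S) == S}

# Hypothetical example: a nonempty S is closed up to everything >= min(S).
X = frozenset({0, 1, 2})

def C(S):
    if not S:
        return frozenset()
    return frozenset(x for x in X if x >= min(S))

closed = closed_sets(X, C)
opens = {X - S for S in closed}   # the open sets are the complements
```

For this choice of C the closed sets come out as ∅, {2}, {1, 2}, and X, so the open sets are X, {0, 1}, {0}, and ∅.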
a This is the exercise that tells us we can define a topology by specifying
the closed sets.
b Because Cls(S) ∈ C , as it must be because Cls(S) is closed.
Proof. Step 1: Make hypotheses
Suppose that (i) (λ ↦ x_∞) → x_∞; (ii) if (λ ↦ x_λ) →
x_∞, then (µ ↦ x_{λ_µ}) → x_∞ for every subnet µ ↦ x_{λ_µ} of
λ ↦ x_λ; (iii) if every cofinal subnet µ ↦ x_{λ_µ} has in turn a
subnet ν ↦ x_{λ_{µ_ν}} such that (ν ↦ x_{λ_{µ_ν}}) → x_∞, then (λ ↦
x_λ) → x_∞; and (iv) for all directed sets I and nets x^i :
Λi → X, if x^i → (x^i)_∞ and (i ↦ (x^i)_∞) → (x^∞)_∞, then
(I × ∏_{i ∈ I} Λi ∋ ⟨i, λ⟩ ↦ (x^i)_{λi}) → (x^∞)_∞.
Step 2: Define the notion of an adherent point
For a subset S ⊆ X and x ∈ X, we say that x is an adherent
pointa of S iff there is some net λ 7→ xλ ∈ S such that
(λ 7→ xλ ) → x.
Step 3: Define what should be the closure
Λ_S := {λ ∈ Λ : x_λ ∈ S} and Λ_T := {λ ∈ Λ : x_λ ∈ T}.
Step 5: Show that λ ↦ x_λ converges to x_∞ iff
(λ ↦ x_λ) → x_∞
Step 6: Demonstrate uniqueness
Recall that sets are closed iff they contain all their limit points.
If we have two topologies with the same notion of convergence,
then the set of limit points of a given set in each topology are
Proof. Step 1: Make hypotheses
Suppose that (i) P_x → x, where P_x := {U ⊆ X : x ∈ U}; (ii)
if F → x, then G → x for every filtering G of F; (iii) if
every filtering G ⊇ F has in turn a filtering H such that
H → x, then F → x; and (iv) for all directed sets I and
Step 2: Define the notion of an adherent point
For a subset S ⊆ X and x ∈ X, we say that x is an adherent
pointa of S iff there is a filter base F such that F ⊆ S for all F ∈ F
with F → x.
Step 3: Define what should be the closure
For a subset S ⊆ X, we define C(S) to be the set of adherent
points of S.
G_∞ := {U ∩ S : U ∈ F_∞} (3.4.2.8)
K_∞ := {U ∩ G : G ∈ G} (3.4.2.10)
Step 7: Demonstrate uniqueness
Note that x ∈ X is an adherent point of S ⊆ X iff there is a filter
base F such that F ⊆ S for all F ∈ F with F converging to
x. Recall that sets are closed iff they contain all their adherent
points.b If we have two topologies with the same notion of
filter convergence, then the adherent points of a given set in
each topology are the same, and consequently, a set is closed
in one iff it is closed in the other.
a This of course will turn out to agree with the notion of adherent point
for the topology.
b Because an adherent point of S is either an element of S or an
accumulation point of S.
generates U .
R In particular,
Initial topologies have the nice property that you can determine
whether functions into X are continuous or not by looking at their
composition with each fY .
Proof. Define
3.4.4 Summary
We now quickly recap all the ways in which we know how to specify
a topology on a set.
(i). We can specify the open sets (Definition 3.1.1).
(ii). We can specify the closed sets (Exercise 3.1.2).
(iii). We can specify a base (Definition 3.1.1.1).
(iv). We can specify a neighborhood base (Definition 3.1.1.6).
(v). We can generate a topology (Proposition 3.1.1.13).
(vi). We can define the closure of each set (Theorem 3.4.1.1).
(vii). We can define the interior of each set (Theorem 3.4.1.7).
(viii). We can define convergence of nets (Theorem 3.4.2.1).
(ix). We can define convergence of filters (Theorem 3.4.2.5).
(x). We can declare that functions on the space are continuous (the
initial topology—see Proposition 3.4.3.1).
(xi). We can declare that functions into the space are continuous (the
final topology—see Proposition 3.4.3.9).
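As a small illustration of item (v), the following sketch generates a topology on a finite set from an arbitrary collection of subsets by first taking all finite intersections (to get a base) and then all unions. The function name and the example collection are hypothetical, and the brute-force approach is only sensible for very small sets.

```python
from itertools import combinations

def generate_topology(X, S):
    """Topology on the finite set X generated by the collection S of subsets."""
    X = frozenset(X)
    S = [frozenset(s) for s in S]
    # Finite intersections of elements of S; the empty intersection is X itself.
    base = {X}
    for r in range(1, len(S) + 1):
        for combo in combinations(S, r):
            base.add(frozenset.intersection(*combo))
    # Arbitrary unions of base elements; the empty union is the empty set.
    base = list(base)
    topology = {frozenset()}
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            topology.add(frozenset.union(*combo))
    return topology

# Hypothetical example on X = {1, 2, 3}.
topology = generate_topology({1, 2, 3}, [{1, 2}, {2, 3}])
```

Here the generated topology consists of ∅, {2}, {1, 2}, {2, 3}, and {1, 2, 3}: the set {2} appears as the intersection of the two generators even though it was not listed.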
U = {U ∩ S : U ⊆ X is open}. (3.5.1.2)
9And indeed, you can probably quite easily find a proof that doesn’t use it.
K ∩ C ⊆ U1 ∪ · · · ∪ Um, (3.5.1.8)
that is, an uncountable product (precisely, a product over the power set
of N) of the two-element set {0, 1}. Obviously {0, 1} is quasicompact
(all finite spaces are), and then a theorem called Tychonoff’s Theorem
(Theorem 3.5.3.14), which says that arbitrary products of quasicom-
pact spaces are quasicompact, will tell us that X is quasicompact.
So before we do anything else then, we must first define the
product topology.
Proof. All of this follows from the definition of the initial topol-
ogy, its characterization in terms of continuity of functions
into initial topologies, and the defining result of generating
collections (Propositions 3.1.1.13, 3.4.3.1 and 3.4.3.8).
is open in the box topology (by definition), but not open in the
product topology by the theorem above.
10At least compared to what we’ve been doing, though really the ‘meat’ is contained
in the Alexander Subbase Theorem (Theorem 3.2.1.5).
Define
that is, a product of 2^ℵ0 copies of the two element set {0, 1}.
In other words, it is the set of all functions from 2^N into
{0, 1}—in fact, for most of this example, we shall think
of this space as the collection of functions. This is
quasicompact by Tychonoff’s Theorem (and because finite sets are
quasicompact—see Exercise 3.2.54). On the other hand, we
may define a sequence m ↦ x_m ∈ X as follows.
x_m(S) := 1 if m ∈ S, and x_m(S) := 0 if m ∉ S. (3.5.3.18)
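The sequence in (3.5.3.18) is easy to play with on a computer. The sketch below is not taken from the text: it illustrates the standard reason one expects no subsequence of m ↦ x_m to converge, namely that, given any subsequence, one can choose a coordinate S along which the values alternate. The particular subsequence and the variable names are assumptions for illustration.

```python
def x(m, S):
    """The S-coordinate of x_m, as in (3.5.3.18): 1 if m is in S, else 0."""
    return 1 if m in S else 0

# A hypothetical (finite initial piece of a) subsequence m_0 < m_1 < m_2 < ...
subsequence = [3, 7, 8, 20, 41, 100]

# Pick S to contain every other index of the subsequence.
S = set(subsequence[::2])

# Along the subsequence, the S-coordinate alternates between 1 and 0.
values = [x(m, S) for m in subsequence]
```

Since convergence in a product is coordinate-wise, an alternating coordinate rules out convergence of that subsequence; the same choice of S works for any subsequence.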
11Says the person who decided to include the most detailed account I’m aware of
(except for probably [SJ70]) . . . .
12That is, the case when the sets are singletons.
U := {∅, X}. (3.6.1.3)
arated, etc.), this is just the first one that is not completely obvious, which
is why it is the only one we have asked about.
T := {1/2} × {r√2 ∈ (0, 1) : r ∈ Q}
= {⟨1/2, r√2⟩ : r ∈ Q, r√2 ∈ (0, 1)} (3.6.1.19)
and
B_⟨x,y⟩ := {U ⊆ S : U is open in S}b if ⟨x, y⟩ ∈ S,
B_⟨x,y⟩ := {U^m_⟨x,y⟩ : m ∈ Z+} if ⟨x, y⟩ ∈ T or x = ⟨0, 0⟩, ⟨1, 0⟩, (3.6.1.21)
where
U^m_⟨0,0⟩ := {⟨0, 0⟩} ∪ {⟨x, y⟩ ∈ (0, 1/4) × (0, 1/m) : x, y ∈ Q}
U^m_⟨1,0⟩ := {⟨1, 0⟩} ∪ {⟨x, y⟩ ∈ (3/4, 1) × (0, 1/m) : x, y ∈ Q}
U^m_⟨1/2, r√2⟩ := {⟨x, y⟩ ∈ (1/4, 3/4) × (r√2 − 1/m, r√2 + 1/m) : x, y ∈ Q} (3.6.1.22)
for m ∈ Z+ with (r√2 − 1/m, r√2 + 1/m) ⊆ (0, 1).
a Up to homeomorphism.
is open, and so
⋂_{y ≠ x} U_y^c = {x} (3.6.2.14)
is closed.
contradiction.
14 To remember where the condition “quasicompact subsets are closed” fits into the
series of implications (that is, strictly between T1 and T2), I secretly remember this
condition as “T_{1¾}” (T_{1½} is inappropriate because the ½ should be reserved for some
sort of “separation by closed neighborhoods”). I hesitate to make this an ‘official’
term because, besides the fact that no one else uses it, I don’t see how this property
is really a separation axiom.
(This is the rationals in the open unit square together with the
bottom-left and bottom-right corners.) We define a topology
on X by defining a neighborhood base at each point—see
Proposition 3.1.1.9. For ⟨x, y⟩ ∈ X, there are three cases:
(i) ⟨x, y⟩ = ⟨0, 0⟩, (ii) ⟨x, y⟩ = ⟨1, 0⟩, and (iii) ⟨x, y⟩ ∈ S. We
define
B_⟨x,y⟩ := {U ⊆ S : U is open in S}b if ⟨x, y⟩ ∈ S,
B_⟨x,y⟩ := {U^m_⟨x,y⟩ : m ∈ Z+} otherwise,
where
U^m_⟨0,0⟩ := {⟨0, 0⟩} ∪ {⟨x, y⟩ ∈ (0, 1/2) × (0, 1/m) : x, y ∈ Q}
U^m_⟨1,0⟩ := {⟨1, 0⟩} ∪ {⟨x, y⟩ ∈ (1/2, 1) × (0, 1/m) : x, y ∈ Q}.
For any point ⟨x, y⟩ distinct from ⟨0, 0⟩ and ⟨1, 0⟩, we
can separate ⟨x, y⟩ from ⟨0, 0⟩ with neighborhoods by taking
U^m_⟨0,0⟩ ∋ ⟨0, 0⟩ for m ∈ Z+ with 1/m < y and any ε-ball about
⟨x, y⟩ with radius less than y − 1/m. Similarly for ⟨x, y⟩ and
⟨1, 0⟩. Of course, any two points distinct from both ⟨0, 0⟩ and
⟨1, 0⟩ are separated by neighborhoods because they can be
separated by neighborhoods in S. Thus, X is indeed T2.
f(x) := 0 if x = x_1, 1 if x = x_2, and 1/2 otherwise. (3.6.2.41)
R ⊆ f^{-1}([0, 1/n)).
⟨0, 0, …⟩ ∈ U^n_{i^n_1} × ⋯ × U^n_{i^n_{m_n}} × ∏_{i ∈ I, i ≠ i^n_1, …, i^n_{m_n}} R
⋂_{n ∈ Z+} (U^n_{i^n_1} × ⋯ × U^n_{i^n_{m_n}} × ∏_{i ∈ I, i ≠ i^n_1, …, i^n_{m_n}} R) = {⟨0, 0, …⟩}.
fi1 + · · · + fim0
!
Ö
f B gk · .ab (3.6.2.46)
k ∈S
m 0
! ∞
!
Ö Õ
f B gk · 2 fik − 1 . (3.6.2.47)
k ∈S k =1
But now we’ve gone through all the separation properties, right?
How could T3 be a thing? Well, now we enter a collection of separation
axioms of a different nature—now we will focus on separating closed
sets. One unfortunate fact, however, is that, if points are not closed,
then these new separation axioms are not strictly stronger than the
separation axioms we just presented. There are thus two versions of
the following separation axioms: ones in which the points are closed
and ones in which they aren’t.
Stupid examples like this are why we almost always care about the
case when we in addition impose the condition of being T1. This
finally brings us to the definition of T3.
Then, define
Y_0 := ⋃_{m ∈ 2Z} L_m
Y_1 := ⋃_{n ∈ 1+2Z, k ∈ Z, k ≥ 2} {p_{n,k}}
Y_2 := ⋃_{n ∈ 1+2Z, k ∈ Z, k ≥ 2} T_{n,k} (3.6.2.55)
Y := Y_0 ∪ Y_1 ∪ Y_2.
where
S_m := f^{-1}((α − 1/m, α + 1/m)) ∩ T_{n,k} (3.6.2.59)
{p_1} = ⋂_{α ∈ R} (U^α_{p_2})^c. (3.6.2.62)
f(x) := 0 if x = x_1, 1 if x = x_2, and φ(x) otherwise. (3.6.2.65)
The preimage of any closed set in [0, 1] which contains neither 0 nor
1 will be closed because φ is a homeomorphism. If it does
contain 0 or 1, then the preimage will be the union of a closed
set in the usual topology on R and a finite set, and hence will
be closed. Thus, f is continuous, and so X is perfectly-T2. We
now check that X is not T3.
Q ⊆ X is closed because it is countable. Any open neighborhood
of Q must be of the form of a usual open neighborhood
of Q with countably many points removed. Of course, every
usual open neighborhood of Q is dense in R, and so it meets
every usual open neighborhood of √2 in a nonempty usual open
set, which is uncountable. On the other hand, √2 ∉ Q and any open
neighborhood of √2 is a usual open neighborhood with countably
many points removed. Thus, as removing countably many points
from an uncountable set cannot empty it, these
two must intersect, and so you cannot separate Q and √2
with neighborhoods. Thus, X is not T3.
a Such a homeomorphism exists by Definition 6.4.5.58 (hopefully you
can construct a homeomorphism between (−π/2, π/2) and (0, 1)).
Thus, once you hit T_{2½}, you can increase your separation in one
of two noncomparable ways: you can make your space completely-T2
or you can make your space T3. At the end of the section, we will
summarize how all the separation axioms relate to each other.
C ⊆ U ⊆ Cls(U) ⊆ V c ⊆ W1 (3.6.2.70)
and
f(x) := 1 if x ∈ C, 0 if x = x_0, and 1/2 otherwise. (3.6.2.80)
Note that the same example (Example 3.6.2.43) that showed that
an arbitrary product of perfectly-T2 spaces need not be perfectly-T2
also shows that an arbitrary product of perfectly-T3 spaces need not
be even perfectly-T2 , much less perfectly-T3 . However, in fact, it’s
even worse: not even a finite product of perfectly-T3 spaces need be
perfectly-T3 . And now we return to the issue of the product of two
perfectly-T3 spaces.
Define
∆ := {⟨a, a⟩ ∈ A × A : a ∈ A}. (3.6.2.87)
Stupid examples like this are why we almost always care about the
case when we in addition impose the condition of being T1.
There are at least two relatively large families of spaces that are T4 :
metric spaces (which are in fact perfectly-T4 —see Proposition 4.3.1.18)
and compact spaces.
Recall (Example 3.6.2.52) that there are T3 spaces that are not
completely-T2. This does not happen with T4 and completely-T3:
it turns out that every T4 space is completely-T3, as we shall see
in a moment. In fact, T4 spaces are what is called completely-T4 (see
Theorem 3.6.2.106). In particular, we shouldn’t try to look for a space
that is T4 but not completely-T3. On the other hand, there are spaces
that are perfectly-T3 but not T4.
For m ∈ Z+, define
S_m := {x ∈ R \ Q : ε_x > 1/m}. (3.6.2.95)
Then,
R = ⋃_{m ∈ Z+} S_m ∪ ⋃_{x ∈ Q} {x}, (3.6.2.96)
and so
R = ⋃_{m ∈ Z+} Cls_R(S_m) ∪ ⋃_{x ∈ Q} {x}.c (3.6.2.97)
x ∈ V ⊆ Cls(V) ⊆ W c ⊆ U. (3.6.2.103)
x ∈ V ⊆ Cls(V) ⊆ C c, (3.6.2.104)
(ii)
Proof. a
and so C is a G δ .
f(x) := ∑_{m ∈ N} f_m(x) / 2^{m+1}. (3.6.2.119)
f(x) := 0 if x ∈ C1, 1 if x ∈ C2, and 1/2 otherwise. (3.6.2.121)
The preimage of 0 is C1, which is closed, the preimage of 1 is C2, which is
closed, and the preimage of 1/2 contains 0 ∈ X and so is closed.
Thus, this function is continuous. Now suppose that 0 ∈ C1.
Then, we may define f : X → [0, 1] by
f(x) := 0 if x ∈ C1, and 1 if x ∈ C2. (3.6.2.122)
and
and
For m ∈ Z+, define
S_m := {x ∈ Q^c : ε_x > 1/m}.f (3.6.2.133)
Then,
R = ⋃_{m ∈ Z+} S_m ∪ ⋃_{x ∈ Q} {x}, (3.6.2.134)
and so
R = ⋃_{m ∈ Z+} Cls(S_m) ∪ ⋃_{x ∈ Q} {x}. (3.6.2.135)
3.6.3 Summary
We summarize what we have covered so far in this section.
First of all, there are several levels of separation between two
different objects in a space
Closed sets and points, as well as closed sets and closed sets,
are automatically separated, and so there are no separation axioms
analogous to T0 and T1 for families (ii) and (iii).
For the other two “families”, in order for them to be directly
comparable with the first, we require that points be closed (that is, we
explicitly require spaces in the second two “families” to be T1 —see
Proposition 3.6.2.12). Without this extra assumption, the separation
axioms are called “regular” and “normal” respectively.
Thus, for the second family:17
(i). Closed sets and points being separated by neighborhoods is T3
(Definition 3.6.2.50).
(ii). Closed sets and points being separated by closed neighborhoods
is T_{3½} (Definition 3.6.2.68).
(iii). Closed sets and points being completely-separated is
completely-T3 (Definition 3.6.2.72).
(iv). Closed sets and points being perfectly-separated is perfectly-T3
(Definition 3.6.2.76).
And similarly for the third family:18
(i). Closed sets being separated by neighborhoods is T4
(Definition 3.6.2.50).
(ii). Closed sets being separated by closed neighborhoods is T_{4½}
(Definition 3.6.2.100).
(iii). Closed sets being completely-separated is completely-T4
(Definition 3.6.2.101).
(iv). Closed sets being perfectly-separated is perfectly-T4
(Definition 3.6.2.115).
We now illustrate how all these axioms are related to each other.
All implications are strict unless otherwise indicated by a ⇔; for
each strict implication, the offending counter-example is given in the
indicated footnote. Perhaps the only real surprise is the equivalence of T_{4½} and
[Diagram (3.6.3.1): the implications among T0, T1, T2, T_{2½}, T3, T_{3½}, T4, and T_{4½}, together with completely-T2, completely-T3, completely-T4, perfectly-T2, perfectly-T3, and perfectly-T4; the arrows carry footnote markers 20 through 36.]
K ⊆ V ⊆ Cls(V) ⊆ U. (3.7.7)
and hence
V_x = V_x ∩ Cls(B_x) ⊆ W_x^c ∩ Cls(B_x) ⊆ U_x ∩ Cls(B_x) = U_x, (3.7.9)
We will see that the proper way to interpret this statement is that the
image of f is connected, so that the image must contain everything
in-between f (a) and f (b) as well. Of course, in order to make this
precise, we have to first define what it means to be connected.
X0 = V ∪ W (3.8.1.5)
U_V := U ∩ V and U_W := U ∩ W, (3.8.1.6)
so that
U = UV ∪ UW (3.8.1.7)
Proof. Define
U_x := ⋃_{U ⊆ X, U connected, x ∈ U} U. (3.8.1.12)
a I suppose you could equip R^d with the product order, but this is not
totally-ordered for d ≥ 2, and so you cannot define the order-topology.
S := {x ∈ I : [(a, x) ⊆ U}. (3.8.1.22)
a Because every equivalence relation determines a partition—see Corol-
lary A.3.2.11.
and so
and hence
f(K) ⊆ f(f^{-1}(U_1) ∪ ⋯ ∪ f^{-1}(U_m))
=a f(f^{-1}(U_1)) ∪ ⋯ ∪ f(f^{-1}(U_m)) (3.8.2.4)
⊆b U_1 ∪ ⋯ ∪ U_m,
Characterizations of quasicompactness
What follows is a summary of all the equivalent characterizations of
quasicompactness we are aware of. We have seen several
characterizations so far, but we have one more to go. IMHO, it is a bit more
difficult to understand, and not as useful as the others, so we have
procrastinated dealing with it until this subsection on quasicompactness.
While the real objective of this subsection was to do the Extreme
Value Theorem, it probably makes more sense to summarize these
characterizations here than anywhere else.
First of all, we will need one last definition before getting to our
final characterization.
a Disclaimer: I didn’t actually check this because it doesn’t matter for us,
so I don’t guarantee that this statement is exactly true on the nose.
Proof. a Step 1: Define S
Let F̃ be the collection of all filter bases F that contain F_{λ↦x_λ}
and have the property that λ ↦ x_λ is frequently contained in
F for all F ∈ F. Note that F_{λ↦x_λ} ∈ F̃ (this is the derived
filter base of λ ↦ x_λ (Definition 3.3.2)).
Step 2: Obtain a maximal element F_0 of F̃
Regard F̃ as a partially-ordered set with respect to inclusion.
Let W̃ ⊆ F̃ be a well-ordered subset, and define
F := ⋃_{W ∈ W̃} W. (3.8.2.12)
Step 3: Find a subnet µ ↦ x_{λ_µ} eventually contained in each element of F_0
Denote the index set of λ ↦ x_λ by Λ. Order F_0 by reverse
inclusion and equip Λ × F_0 with the product order
(Definition 2.4.5.19). For λ_0 ∈ Λ and F ∈ F_0, as λ ↦ x_λ is frequently
contained in F, there is some λ_{λ_0,F} ≥ λ_0 such that x_{λ_{λ_0,F}} ∈ F.
We claim that
Step 4: Show that µ ↦ x_{λ_µ} is an ultranet
Let S ⊆ X. We wish to show that µ ↦ x_{λ_µ} is eventually
contained in S or S^c. If S ∈ F_0, then by the previous part,
µ ↦ x_{λ_µ} is eventually contained in S, and we are done, so
suppose that S ∉ F_0. This means that (Meta-proposition 3.2.5),
λ ↦ x_λ is frequently in S. Thus, there must be some F ∈ F_0
such that S ∩ F = ∅, otherwise F_0 ∪ {S} would be strictly larger
than F_0 in F̃. S ∩ F = ∅ implies that F ⊆ S^c, and so λ ↦ x_λ
is frequently contained in S^c as it is frequently contained in F.
But then (Meta-proposition 3.2.5 again) λ ↦ x_λ is eventually
contained in S, and we are done.
a Proof adapted from [How91, pg. 90].
A uniform space is the most general context in which one can talk
about concepts such as uniform continuity, uniform convergence,
Cauchyness, completeness, etc. To formalize this notion, we will
equip a set with a distinguished set of covers, called uniform covers.
The example you should always keep in the back of your mind is the
collection of all ε-balls for a fixed ε: U_ε := {B_ε(x) : x ∈ R}. The
idea is that, somehow, all of the sets in the same uniform cover are of
the ‘same size’.
Having specified the uniform covers, we will then be able to say
things like a net λ 7→ xλ is Cauchy iff for every uniform cover U ,
there is some U ∈ U such that λ 7→ xλ is eventually contained in U.
In the case that the collection of uniform covers is {Uε : ε > 0},1 you
can check that this is precisely the definition of Cauchyness we had in
R (Definition 2.4.3.1).
Moreover, the generalization from R to uniform spaces is not a
needless abstraction. Indeed, I am required to cover metric spaces
(Definition 4.2.1.6) in this course, and this is just a very special type
of uniform space. Indeed, essentially every topological space we look
1 Disclaimer: The collection of all the U_ε is not actually a uniformity but rather a
uniform base—see Definition 4.1.3.1.
4.1.1 Star-refinements
Definition 4.1.1.1 — Star Let X be a set, let S ⊆ X, and let
U be a collection of subsets of X. Then, the star of S with
respect to U, Star_U(S), is defined by
Star_U(S) := ⋃_{U ∈ U, U ∩ S ≠ ∅} U. (4.1.1.2)
To help guide your intuition, stars play a role similar to the one ε-balls played
in R^d. For example, if U_ε := {B_ε(x) : x ∈ R^d} is the cover of R^d
by all ε-balls, then the star ‘centered’ at a point x ∈ R^d is
Star_{U_ε}(x) = B_{2ε}(x). (4.1.1.3)
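The definition of the star translates directly into a few lines of code. The sketch below computes stars for covers of a finite set and then checks a discrete stand-in for (4.1.1.3): on a window of the integers with ε = 1.5, the star of the point 0 with respect to the cover by ε-balls is the ball of radius 2ε. The integer window, the value of ε, and the function names are assumptions chosen for illustration; in a general metric space the triangle inequality only guarantees that the star is contained in the 2ε-ball.

```python
def star(cover, S):
    """Star of S with respect to cover: the union of all members meeting S."""
    S = frozenset(S)
    members = [frozenset(U) for U in cover if frozenset(U) & S]
    return frozenset().union(*members)

# Hypothetical discrete example: a window of the integers with the usual distance.
X = range(-10, 11)

def ball(x, radius):
    """Open ball of the given radius about x, computed inside X."""
    return frozenset(y for y in X if abs(y - x) < radius)

eps = 1.5
cover = [ball(x, eps) for x in X]   # the cover by all eps-balls

star_at_0 = star(cover, {0})        # the star 'centered' at the point 0
```

Here `star_at_0` is {−2, −1, 0, 1, 2}, which is exactly `ball(0, 2 * eps)`.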
We will certainly be wanting to take the ‘image’ and ‘preimage’
of a cover.
Definition 4.1.1.4 — Image and preimage of a cover Let
f : X → Y be a function, let U be a cover of X, and let V be a
cover of Y. Then, the image of U, f(U), is defined by
f(U) := {f(U) : U ∈ U}, (4.1.1.5)
and the preimage of V, f^{-1}(V), is defined by
f^{-1}(V) := {f^{-1}(V) : V ∈ V}. (4.1.1.6)
a Actually, given a function f : X → Y, we obtain a function from 2X to
2Y and a function from 2Y to 2X : f : 2X → 2Y (the function that sends
a set to its image) and f −1 : 2Y → 2X (the function that sends a set to
its preimage) respectively. The preimage of a cover is the image of the
cover with respect to the preimage function f −1 : 2Y → 2X . Likewise, the
image of a cover is the image of the cover with respect to the image function
f : 2X → 2Y .
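For finite sets, the image and preimage of a cover can be computed element by element, exactly as the definition (and the footnote's point of view) suggests. The following is only a sketch; the function f, the sets, and the covers are hypothetical examples.

```python
def image_of_cover(f, cover):
    """The image f(U) of a cover U: the collection of images of its members."""
    return {frozenset(f(x) for x in U) for U in cover}

def preimage_of_cover(f, X, cover):
    """The preimage of a cover V of the codomain: the collection of preimages."""
    return {frozenset(x for x in X if f(x) in V) for V in cover}

# Hypothetical example: f sends an element of X = {0, 1, 2, 3} to its parity.
X = {0, 1, 2, 3}

def f(x):
    return x % 2

cover_of_X = [{0, 1}, {2, 3}]
cover_of_Y = [{0}, {1}]

image = image_of_cover(f, cover_of_X)
preimage = preimage_of_cover(f, X, cover_of_Y)
```

In this example both members of the cover of X have image {0, 1}, so the image cover has a single element, while the preimage cover is {{0, 2}, {1, 3}}.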
and so
and so
f (U ) = {[m + y, m + 1 + y] : m ∈ Z, y ∈ R} (4.1.1.16)
(iii).
(iv).
Star_{f(U)}(T) = f(Star_U(f^{-1}(T))). (4.1.1.22)
=a f^{-1}(⋃_{V ∈ V, f^{-1}(V) ∩ f^{-1}(T) ≠ ∅} V)
⊆b f^{-1}(⋃_{V ∈ V, V ∩ T ≠ ∅} V) (4.1.1.23)
=: f^{-1}(Star_V(T)).
U ∩ S = f^{-1}(f(U)) ∩ f^{-1}(f(S))
= f^{-1}(f(U) ∩ f(S)), (4.1.1.24)
and hence
f(Star_U(S)) := f(⋃_{U ∈ U, U ∩ S ≠ ∅} U)
=c ⋃_{U ∈ U, U ∩ S ≠ ∅} f(U)
⊆d ⋃_{U ∈ U, f(U) ∩ f(S) ≠ ∅} f(U) (4.1.1.26)
=: Star_{f(U)}(f(S)).
= f^{-1}(⋃_{V ∈ V, V ∩ f(S) ≠ ∅} V) (4.1.1.27)
=: f^{-1}(Star_V(f(S))).
Star_{f(U)}(T) := ⋃_{U ∈ U, f(U) ∩ T ≠ ∅} f(U) = f(⋃_{U ∈ U, U ∩ f^{-1}(T) ≠ ∅} U) (4.1.1.28)
=: f(Star_U(f^{-1}(T))).
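The identity in (4.1.1.28) can be sanity-checked by brute force on a small example. This is not a proof, just a check under made-up assumptions: the sets X and Y, the function f, and the cover U below are all hypothetical.

```python
from itertools import chain, combinations

def star(cover, S):
    """Star of S with respect to cover: the union of all members meeting S."""
    S = frozenset(S)
    members = [frozenset(U) for U in cover if frozenset(U) & S]
    return frozenset().union(*members)

def image(f, A):
    return frozenset(f(a) for a in A)

def preimage(f, X, B):
    return frozenset(x for x in X if f(x) in B)

# Hypothetical example: parity on X = {0, 1, 2, 3}, with values in Y = {0, 1}.
X = {0, 1, 2, 3}
Y = {0, 1}

def f(x):
    return x % 2

U = [frozenset({0}), frozenset({1, 2}), frozenset({3})]   # a cover of X
fU = [image(f, A) for A in U]                             # its image, a cover of Y

subsets_of_Y = [
    frozenset(c)
    for c in chain.from_iterable(combinations(sorted(Y), r) for r in range(len(Y) + 1))
]

# Compare both sides of (4.1.1.28) for every subset T of Y.
identity_holds = all(
    star(fU, T) == image(f, star(U, preimage(f, X, T))) for T in subsets_of_Y
)
```

The check passes for every T here, which is what one expects, since f(U) meets T exactly when U meets f^{-1}(T).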
a Exercise A.3.27.(i)
b Here we are using the fact that f −1 (V) intersects f −1 (T) implies that
V intersects T. Also note that we have equality here if f is surjective.
c Exercise A.3.27.(iii)
d Here we are using the fact that U intersects S implies that f (U) intersects
f (S). Also note that we have equality here if f is injective.
U ∧ V := {U ∩ V : U ∈ U and V ∈ V}. (4.1.1.35)
Star_U(x) ∩ Star_V(x) := (⋃_{U ∈ U, x ∈ U} U) ∩ (⋃_{V ∈ V, x ∈ V} V)
= ⋃_{U ∈ U s.t. x ∈ U, V ∈ V s.t. x ∈ V} U ∩ V (4.1.1.40)
= ⋃_{U ∈ U, V ∈ V, x ∈ U ∩ V} U ∩ V =: Star_{U∧V}(x).
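The meet of two covers and the computation in (4.1.1.40) can likewise be checked on a finite example. In the sketch below, the set X and the two covers are hypothetical, and the meet is allowed to contain the empty set, which does not affect the star of a point.

```python
def meet(U, V):
    """The meet of two covers: all pairwise intersections, as in (4.1.1.35)."""
    return {frozenset(u) & frozenset(v) for u in U for v in V}

def star_of_point(cover, x):
    """Star of the point x: the union of all members of the cover containing x."""
    members = [frozenset(A) for A in cover if x in A]
    return frozenset().union(*members)

# Hypothetical covers of X = {0, 1, 2, 3}.
X = {0, 1, 2, 3}
U = [{0, 1}, {1, 2, 3}]
V = [{0, 1, 2}, {2, 3}]

# Both sides of (4.1.1.40), computed at every point of X.
lhs = {x: star_of_point(U, x) & star_of_point(V, x) for x in X}
rhs = {x: star_of_point(meet(U, V), x) for x in X}
```

For instance, at x = 1 both sides come out as {0, 1, 2}.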
Then,
f^{-1}(U) ∧ f^{-1}(V)
:= {f^{-1}(U) ∩ f^{-1}(V) : U ∈ U, V ∈ V}
= {f^{-1}(U ∩ V) : U ∈ U, V ∈ V} (4.1.1.50)
=: f^{-1}(U ∧ V).
Thus, f(U) star-refines f(V).
R The elements of Ũ are uniform covers.
We wish to show that U ⊆ Star_{U_1}(x) ∩ Star_{U_2}(x) such that (i)
x ∈ U and (ii) for every y ∈ U there is some V ∈ Ũ such that
Star_V(y) ⊆ U—see Proposition 3.1.1.9 (the result which tells
us how to define a topology by defining a neighborhood base).
To show that U ⊆ Star_{U_1}(x) ∩ Star_{U_2}(x), we show that
as desired.
x ∈ U as tautologically Star_{U_3}(x) ⊆ Star_{U_3}(x).
Now, let y ∈ U. We wish to find a cover V ∈ Ũ such
that Star_V(y) ⊆ U. By definition, there is a cover W ∈ Ũ
such that Star_W(y) ⊆ Star_{U_3}(x). Let V ∈ Ũ be a
star-refinement of W. We wish to show that Star_V(y) ⊆ U. So,
let z ∈ Star_V(y). To show that z ∈ U, it suffices to show that
Star_V(z) ⊆ Star_{U_3}(x). As Star_W(y) ⊆ Star_{U_3}(x), it in turn
suffices to show that Star_V(z) ⊆ Star_W(y). So, let V ∈ V be
such that z ∈ V. We wish to show that V ⊆ Star_W(y). As
V star-refines W, there is some W ∈ W such that Star_V(V) ⊆ W.
As V ⊆ Star_V(V), it suffices to show that W ⊆ Star_W(y),
that is, it suffices to show that y ∈ W. As Star_V(V) ⊆ W, it
suffices to show that y ∈ Star_V(V). However, as z ∈ Star_V(y),
y ∈ Star_V(z) ⊆ Star_V(V), as desired.
R Note that this proof did not make use of the upward-
closed axiom. Thus, in fact, uniform bases (see below
in Definition 4.1.3.1) suffice to define the uniform
topology as well.
And just like with bases, the real reason uniform bases are
important is because they allow us to define uniformities. Thus, same
4Unfortunately, the way in which one might like to define a uniformity that is anal-
ogous to generating collections for topologies doesn’t work—see Example 4.1.3.11.
5Of course one cannot hope to make a statement like this precise (What does
it mean for a method of defining a uniformity to be “analogous to” a method of
defining a topology?), but hopefully the intuition is clear. All elements in a uniform
cover, no matter where they are in the space, are supposed to be thought of as the
same size. How could one hope to encode the idea of two sets living ‘far away’ are
of the same size by the specification of local information alone?
if S̃ is a nonempty collection of covers on X, then there is a
unique uniformity Ũ such that (i) every cover in S̃ is uniform
with respect to Ũ, and (ii) that if Ũ′ is any other uniformity
and
b Star_{U_1}({2}) = {1, 2} and Star_{U_2}({2}) = {2, 3}, and so V would have
to contain a set which contains {1, 2} and a set which contains {2, 3}. This
implies that the star of either of these sets with respect to V is {1, 2, 3},
which is not contained in any element of U.
Proof. Let C̃_Y be any uniform base for Y and let C̃ be the
collection of all finite meets of covers of the form f_Y^{-1}(V) for
V ∈ C̃_Y and Y ∈ Y. We wish to check that this is a uniform
base.
So, let f_{Y_1}^{-1}(V_1) ∧ ⋯ ∧ f_{Y_m}^{-1}(V_m) and
f_{Y_{m+1}}^{-1}(V_{m+1}) ∧ ⋯ ∧ f_{Y_{m+n}}^{-1}(V_{m+n}) be two elements of C̃.
Let W_k be a star-refinement of V_k in C̃_{Y_k}. Then, because preimage preserves
star-refinement (Proposition 4.1.1.46) and the meet is ‘compatible’
with the relation of star-refinement (Exercise 4.1.1.45), it
follows that
f_{Y_1}^{-1}(W_1) ∧ ⋯ ∧ f_{Y_{m+n}}^{-1}(W_{m+n}) (4.1.3.16)
for all x ∈ K.
Let U be in turn a star-refinement of U_0. We show that
U ∧ {K} is a star-refinement of f^{-1}(V) ∧ {K}. So, let U ∈ U.
Let U_0 ∈ U_0 be such that Star_U(U) ⊆ U_0. Let x ∈ U and pick
k so that (4.1.3.29) holds. Then,
and
Ũ := {U_ε : ε > 0}. (4.1.3.33)
and so
Exercise 4.2.1.14
(i). Show that Star_{B_ε}(B_ε(x)) = B_{3ε}(x) in R^d.
(ii). Find a metric space in which this fails.
It’s worth noting that, in a metric space, for every closed subset,
the distance (as defined below in (4.2.1.16)) from a point to the closed
subset is a continuous function of that point.
and hence
Now that we’ve gotten that out of the way, we are ready to equip
semimetric spaces with a topology and uniformity.
where
B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m} := {B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m}(x) : x ∈ X} (4.2.1.23)
and
B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m}(x) := {y ∈ X : |y, x|_1 < ε_1, …, |y, x|_m < ε_m}. (4.2.1.24)
is just like the ε-balls you know and love, but now,
as we have more than one semimetric, we don’t just
have ε-balls, but instead we have hε1, . . . , εm i-balls
for all m ∈ Z+ . Then, we just do the same as we
did when there was just one metric: for each choice
of hε1, . . . , εm i and corresponding semimetric, there
is a uniform cover consisting of hε1, . . . , εm i-balls
centered at each point, and the collection of all of
these uniform covers is the uniform base.
(⇐) Suppose that for every |·,·| ∈ E and ε > 0, there are some
|·,·|_1, …, |·,·|_m ∈ D and δ_1, …, δ_m > 0 such that, whenever
|x_1, x_2|_k < δ_k for 1 ≤ k ≤ m, it follows that |f(x_1), f(x_2)| < ε.
Let |·,·|_1, …, |·,·|_m ∈ E and let ε_1, …, ε_m > 0. We wish to
show that f^{-1}(B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m}) is a uniform cover.a
By hypothesis, for each k, there are finitely many
|·,·|_{k,1}, …, |·,·|_{k,m_k} ∈ D and δ_{k,1}, …, δ_{k,m_k} > 0 such that,
whenever |x_1, x_2|_{k,l} < δ_{k,l} for 1 ≤ l ≤ m_k, it follows that
|f(x_1), f(x_2)|_k < ε_k/2. We claim that
B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}} star-refines
f^{-1}(B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m}), which will complete the proof.
So, let B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}(x) ∈ B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}. We claim
that
Star_{B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}}(B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}(x)) ⊆
f^{-1}(B^{|·,·|_1,…,|·,·|_m}_{ε_1,…,ε_m}(f(x))), (4.2.1.28)
some x″ ∈ X with x″ ∈ B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}(x′) and x″ ∈
B^{|·,·|_{1,1},…,|·,·|_{m,m_m}}_{δ_{1,1},…,δ_{m,m_m}}(x). As |x″, x′|_{k,l} < δ_{k,l} for 1 ≤ l ≤ m_k
(and similarly for |x″, x|_{k,l}), it follows from our hypothesis
that |f(x″), f(x′)|_k < ε_k/2 and |f(x″), f(x)|_k < ε_k/2 for
1 ≤ k ≤ m. Hence,
a We have not defined what precisely this means for categories, but with
a little mathematical maturity, you can probably figure it out. In any case,
it’s okay if you don’t know the precise definition.
{q + nZ : q ∈ G, n ∈ Z, n ≠ 0} (4.2.2.4)
S := {⟨x, y⟩ ∈ G × G : xy ≠ 1}, (4.2.2.6)
B̃_{G,L} := {B_{U,L} : U ∋ 1 is open}. (4.2.2.17)
and
B̃_{G,R} := {B_{U,R} : U ∋ 1 is open}. (4.2.2.18)
where
B_{U,L} := {gU : g ∈ G} (4.2.2.19a)
B_{U,R} := {Ug : g ∈ G}. (4.2.2.19b)
are the same. However, we have yet another potential problem
to deal with here—R is both a metric space and a topological group
(with respect to +), so we could use either the (semi)metric uniformity
or the topological group uniformity.6 Fortunately, we needn’t worry
about this, because the two uniformities are the same.
Proof. Step 1: Make hypotheses
Suppose that f is continuous at x_0.
Step 2: Show that f is continuous.
We first show that f is continuous (as opposed to just contin-
uous at x0 ). To show that, we show that f is continuous at
so that
x x_0^{-1} f^{-1}(f(x_0) f(x)^{-1} V) ⊆ f^{-1}(V), (4.2.2.24)
Step 3: Show that f is uniformly-continuous.
To show that f is uniformly-continuous, we apply
Proposition 4.1.3.7 (it suffices to check uniform-continuity on a
uniform base). So, let V ⊆ H be an open neighborhood of the
identity and consider the uniform cover U_V := {hV : h ∈ H}.
To show that f^{-1}(U_V) is a uniform cover, it suffices to find
an open neighborhood U ⊆ G of the identity such that
U_U star-refines f^{-1}(U_V). Take U′ to be an open neighborhood of the
identity such that U′U′ ⊆ f^{-1}(V), and then in turn take U to
be an open neighborhood of the identity such that (i) UU ⊆ U′
and (ii) U = U^{-1} (which we may do by Proposition 4.2.2.12).
We wish to show that
Star_{U_U}(U) := ⋃_{x ∈ G, xU ∩ U ≠ ∅} xU ⊆ f^{-1}(V). (4.2.2.25)
so that U_U star-refines f^{-1}(U_V).
a The juxtaposition here is being used to denote multiplication in the
group. Be careful not to confuse preimages with inverse elements (even
though the same symbol is used, the context makes the notation unambigu-
ous).
|v_1, v_2| := ‖v_1 − v_2‖. (4.2.3.4)
‖f(v)‖′ ≤ K_1 ‖v‖_1 + ⋯ + K_m ‖v‖_m (4.2.3.9)
for all v ∈ V.
We’re just about to put all these definitions to use for the purpose
of introducing, among other things, the notion of uniform convergence.
It will be important to be able to contrast this with the weaker notion
of pointwise convergence.
O_{K_1,a_1,ε_1;…;K_n,a_n,ε_n} := {f ∈ C(X; F) : f(K_k) ⊆ B_{ε_k}(a_k)} (4.2.3.29)
for every ε > 0, the set O_{{x},1,ε;{y},0,ε} is open in the locally
convex topology, and hence contains a set of the form
f + B^{K_1,…,K_n}_{ε_1,…,ε_n}, (4.2.3.30)
(f − f(y)) / (f(x) − f(y)) (4.2.3.31)
is continuous, is 0 at y, and 1 at x. Thus, X is completely-T2.
but we now also have the strict inequality | h(x) − f (x)| < δk
for all x ∈ Kk .
For each K_k and x ∈ K_k, let U_x ⊆ K_k be open and such that f is strictly within (ε_k − δ_k)/2 of f(x) on U_x. Then, there is a finite cover; denote the closure (in K_k) of each element of this finite cover by L_{k,1}, …, L_{k,n_k}, with x_{k,l} ∈ L_{k,l} being the x of U_x. Thus, K_k is covered by L_{k,1}, …, L_{k,n_k} and f is strictly within ε_k − δ_k of f(x_{k,l}) on L_{k,l}. Do this for each K_k to get compact sets L_{k,l} for 1 ≤ k ≤ n and 1 ≤ l ≤ n_k. As h is within δ_k of f on all of K_k, it will certainly be within δ_k of f(x_{k,l}) for all l as x_{k,l} ∈ K_k. Thus,
\[ h \in O_{L_{1,1},\, f(x_{1,1}),\, \delta_1;\, \ldots;\, L_{n,n_n},\, f(x_{n,n_n}),\, \delta_n}. \tag{4.2.3.33} \]
Similarly, if g is an element of this “O-set”, we have that
sup |g(x) − f (x)| : x ∈ Lk,l
Step 1: Construct a sequence of star-refinements of U
Let us write U0 := U. Then, we take a star-refinement U1 of U0, in turn another star-refinement U2 of U1, and so on.
Step 4: Define the semimetric
Finally, we define
Step 5: Show that this is in fact a semimetric
From the definition, we have that |·,·| is symmetric (this is
the reason for putting the “or” in the definition—note that
the definition of ℓ(x1, x2) is not manifestly symmetric). The
triangle inequality follows from the fact that a path from x1 to
x3 and a path from x3 to x2 gives us a path from x1 to x2 , with
the length of this new path being the sum of the lengths of the
other two. Thus, |·,·| is in fact a semimetric.
Step 6: Construct Y
Define x1 ∼ x2 iff | x1, x2 | = 0. That this is an equivalence
relation follows from the fact that |·,·| is a semimetric. Thus,
we may define
Y := X/∼. (4.3.2.5)
Step 7: Construct the metric on Y
We abuse notation and write the induced metric on Y with the
same symbol |·,·| as the semimetric on X:
Step 8: Show that this is in fact a metric
|·,·| on Y is automatically symmetric and satisfies the tri-
angle inequality because |·,·| on X does. Furthermore, if
|[x1 ]∼, [x2 ]∼ | = 0, then | x1, x2 | = 0, and so x1 ∼ x2 by the
definition of ∼. Thus, |·,·| is indeed a metric on Y .
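Steps 6 through 8 can be tried out on a finite toy example. The sketch below is only an illustration under my own assumptions: the semimetric (distance between first coordinates of points in the plane) and all function names are hypothetical, chosen so that distinct points can be at distance 0.

```python
from itertools import combinations


def semimetric(p, q):
    """Hypothetical semimetric on points of the plane: ignores the second coordinate."""
    return abs(p[0] - q[0])


def quotient_classes(points, d):
    """Group points into the classes of the relation x ~ y iff d(x, y) == 0."""
    classes = []
    for p in points:
        for cls in classes:
            # Comparing with one representative suffices because ~ is an
            # equivalence relation (this uses the triangle inequality).
            if d(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return classes


def induced_metric(cls_a, cls_b, d):
    """Distance between two classes, computed on representatives.

    This is well-defined: if d(x, y) == 0, the triangle inequality forces
    d(x, z) == d(y, z) for every z.
    """
    return d(cls_a[0], cls_b[0])


points = [(0, 0), (0, 5), (1, 2), (1, -3), (4, 1)]
classes = quotient_classes(points, semimetric)

# Distinct classes are at strictly positive distance, so we get a metric.
for a, b in combinations(classes, 2):
    assert induced_metric(a, b, semimetric) > 0
```

Here the five points collapse to three classes, one per distinct first coordinate, and the induced distance between classes does not depend on the representatives chosen.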
Step 9: Define q : X → Y
4 Uniform spaces 420
so that
and
Define f : X → [0, 1] by
that is, the set of all functions that are injective ‘modulo
sending more than one point to m’. We show that P0 and
4.3 T0 uni. spaces are uni.-completely-T3 425
And now we present the result that was the motivation for the
introduction of the condition “totally-bounded” in the first place: an
Proof. Step 1: Prove the result for X quasicompact
We first take X to be quasicompact. In this case, the uniform structure on Mor_Top(X, R) is the same as that generated by the single norm ‖·‖_X (Exercise 4.2.3.25). Therefore, by Exercise 4.4.3 (Cauchyness in semimetric spaces), to show that Mor_Top(X, R) is complete, it suffices to show that every net that is Cauchy with respect to ‖·‖_X converges. So, suppose that λ ↦ f_λ ∈ Mor_Top(X, R) is Cauchy. As, for every x ∈ X,
\[ |f_{\lambda_1}(x) - f_{\lambda_2}(x)| \leq \|f_{\lambda_1} - f_{\lambda_2}\|_X, \tag{4.4.1.2} \]
| f∞ (x) − f∞ (x∞ )|
≤ f∞ (x) − fλ1 (x) + fλ1 (x) − fλ0 (x)
< 5ε.
Thus, f∞ is continuous.
4.4 Cauchyness and completeness 435
{Ux : x ∈ X } (4.4.1.4)
| fλ (x) − f∞ (x)|
(4.4.1.5)
≤ | fλ (x) − fλ (x1 )| + | fλ (x1 ) + f∞ (x1 )| < 2ε.
Step 2: Prove the result in general
We now do the general case, in which case X is not necessarily quasicompact. So, let λ ↦ f_λ ∈ Mor_Top(X, R) be Cauchy. Then, by Exercise 4.4.3 again, we must have that λ ↦ f_λ|_K ∈ Mor_Top(K, R) is Cauchy for each quasicompact subset K ⊆ X. Therefore, by the quasicompact case, λ ↦ f_λ converges to its pointwise limit f∞ on each quasicompact subset. As f∞ is continuous on each quasicompact subset, the intersection of the preimage of every open set with every quasicompact subset of X is open in that subset, and hence, by hypothesis, the preimage itself is open. Therefore, f∞ is continuous. Furthermore, λ ↦ f_λ converges to f∞ in Mor_Top(X, R) because it converges to f∞ on each quasicompact subset.
Now that we’ve just proven what goes right, we turn to the more
interesting side of things—what can go wrong.
The key result of this subsection was that MorTop (X, R) is com-
plete for X quasicompactly-generated. It turns out that the converse
of this is false.
In conclusion:
Proof. Step 1: Prove the result for X quasicompact
Suppose that λ ↦ f_λ(x) is uniformly eventually monotone. By definition, this means that there is some λ0 such that, whenever λ ≥ λ0, λ ↦ f_λ(x) is monotone for all x ∈ X. Define g_λ := |f_λ − f∞|. It follows that, whenever λ ≥ λ0, λ ↦ g_λ is nonincreasing. Without loss of generality, suppose that λ ↦ g_λ is actually nonincreasing. Note that λ ↦ g_λ converges pointwise to 0.
Denote the index set by Λ. Let ε > 0. Then, X = ⋃_{λ∈Λ} g_λ⁻¹([0, ε)) because λ ↦ g_λ converges pointwise to 0.
Step 2: Prove the result in general
We now do the general case, in which case X is not necessarily quasicompact. So, suppose that λ ↦ f_λ is uniformly eventually monotone. Then certainly λ ↦ f_λ|_K is uniformly eventually monotone for all quasicompact K ⊆ X, and so by the previous step, λ ↦ f_λ|_K converges to f∞|_K in Mor_Top(K, R), and hence λ ↦ f_λ converges to f∞ in Mor_Top(X, R).
Proof. Step 1: Make hypotheses
Suppose that for every x1, x2 ∈ X with x1 ≠ x2 there is some f ∈ A such that f(x1) ≠ f(x2).
Step 2: Reduce to the quasicompact case
Step 5: Show that Cls(A) is also a subalgebra
We leave this step as an exercise.
Step 6: Show that if f ∈ A, then |f| ∈ Cls(A)
c Suppose that f ∈ A. As X is quasicompact, by the Extreme
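\[ h_m := h_{m-1} + \tfrac{1}{2}\bigl(g^2 - h_{m-1}^2\bigr). \tag{4.4.1.17} \]
\[ h_\infty = h_\infty + \tfrac{1}{2}\bigl(g^2 - h_\infty^2\bigr), \tag{4.4.1.18} \]
\[ h_1 := 0 + \tfrac{1}{2}\bigl(g^2 - 0\bigr) = \tfrac{1}{2} g^2 \geq 0 =: h_0. \tag{4.4.1.19} \]
\begin{align*}
h_{m+1} &:= h_m + \tfrac{1}{2}\bigl(g^2 - h_m^2\bigr) \\
&= h_m + \tfrac{1}{2}(|g| + h_m)(|g| - h_m) \\
&\leq h_m + \tfrac{1}{2}(|g| + |g|)(|g| - h_m) \\
&= h_m + |g|(|g| - h_m) \\
&\leq^{d} h_m + |g| - h_m \\
&= |g|. \tag{4.4.1.21}
\end{align*}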
Thus, m ↦ h_m is a nondecreasing sequence in Mor_Top(X, R) with 0 ≤ h_m ≤ |g|. By Dini’s Theorem (Proposition 4.4.1.10), the sequence m ↦ h_m converges. If the limit is denoted h∞, then as each h_m ≥ 0, h∞ ≥ 0 as well, and so as we have g² = h∞², we indeed have that h∞ = |g|, as desired.
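The recursion h_m = h_{m−1} + ½(g² − h_{m−1}²) can be watched numerically at a single point. The following is a rough sketch, assuming (as the estimate above needs) that |g| ≤ 1; the function name and the sample value g = −0.6 are hypothetical choices of mine.

```python
def abs_by_iteration(g, steps):
    """Iterate h_m = h_{m-1} + (g**2 - h_{m-1}**2) / 2 from h_0 = 0.

    Assumes |g| <= 1. Returns the list [h_0, h_1, ..., h_steps].
    """
    h = 0.0
    history = [h]
    for _ in range(steps):
        h = h + 0.5 * (g * g - h * h)
        history.append(h)
    return history


if __name__ == "__main__":
    history = abs_by_iteration(-0.6, 60)
    # The iterates climb monotonically toward |g| = 0.6.
    print(history[:4], history[-1])
```

Up to floating-point rounding, the iterates are nondecreasing, never exceed |g|, and approach |g|, which is what the chain of inequalities above predicts pointwise.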
Step 8: Show that if f, g ∈ Cls(A), then max(f, g), min(f, g) ∈ Cls(A)
Suppose that f , g ∈ Cls(A). Note that
a Recall that (Theorem 4.4.1.12) this means that for every x1, x2 ∈ X distinct, there is some p ∈ R[x] such that p(x1) ≠ p(x2).
7 Of course, we already know that Q is not complete (see Propositions 2.4.3.49 and 2.4.3.58), and so in general there will most certainly be some ‘completing’ to be done.
X′ := {λ ↦ x_λ ∈ X : λ ↦ x_λ is Cauchy}. (4.4.2.2)
Step 3: Define Cmp(X) as a set
We define
Step 4: Define a uniformity on X′.
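Denote the uniformity on X by $\widetilde{\mathcal{U}}$. For every uniform cover $\mathcal{U} \in \widetilde{\mathcal{U}}$, we define a corresponding cover $\mathcal{U}'$ of X′ as follows. For $\mathcal{U} \in \widetilde{\mathcal{U}}$ and $U \in \mathcal{U}$, define
\begin{align*}
U' &:= \{(\lambda \mapsto x_\lambda) \in X' : \lambda \mapsto x_\lambda \text{ is eventually contained in } U\} \\
\mathcal{U}' &:= \{U' : U \in \mathcal{U}\} \tag{4.4.2.4} \\
\widetilde{\mathcal{U}}' &:= \{\mathcal{U}' : \mathcal{U} \in \widetilde{\mathcal{U}}\}.
\end{align*}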
Step 5: Show that the quotient map q : X′ → Cmp(X) satisfies q⁻¹(q(U′)) = U′
As it is always the case that U′ ⊆ q⁻¹(q(U′)), it suffices to show that q⁻¹(q(U′)) ⊆ U′. So, let (λ ↦ x_λ) ∈ q⁻¹(q(U′)). Then,
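Step 6: Define a uniform base on Cmp(X)
For every uniform cover $\mathcal{U} \in \widetilde{\mathcal{U}}$, we define a corresponding cover $\operatorname{Cmp}(\mathcal{U})$ of Cmp(X). For $\mathcal{U} \in \widetilde{\mathcal{U}}$ and $U \in \mathcal{U}$, defineᵃ
\begin{align*}
\operatorname{Cmp}(U) &:= q(U') \\
\operatorname{Cmp}(\mathcal{U}) &:= q(\mathcal{U}') := \{\operatorname{Cmp}(U) : U \in \mathcal{U}\} \tag{4.4.2.5} \\
\operatorname{Cmp}(\widetilde{\mathcal{U}}) &:= q(\widetilde{\mathcal{U}}') := \{\operatorname{Cmp}(\mathcal{U}) : \mathcal{U} \in \widetilde{\mathcal{U}}\}.
\end{align*}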
Step 7: Show that Cmp(X) is complete.
Let λ ↦ q(x^λ) be Cauchy in Cmp(X). By definition of Cauchyness and our uniformity on Cmp(X), this means that, for every uniform cover $\operatorname{Cmp}(\mathcal{U}) \in \operatorname{Cmp}(\widetilde{\mathcal{U}})$, there is some $\operatorname{Cmp}(U) \in \operatorname{Cmp}(\mathcal{U})$ such that λ ↦ q(x^λ) is eventually contained in Cmp(U) := q(U′). Thus, for each x^λ ∈ X′ for λ sufficiently large, there is some y^λ ∈ U′ with x^λ ∼ y^λ. However, by definition of U′, U eventually contains y^λ, and hence, because x^λ ∼ y^λ, eventually contains x^λ, and so in fact x^λ ∈ U′. Thus, we have a net λ ↦ x^λ ∈ X′ that has the
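\[ \Lambda \times \prod_{\lambda \in \Lambda} M^{\lambda} \ni \langle \lambda, \mu \rangle \mapsto [x^{\lambda}]_{\mu^{\lambda}} \in X. \tag{4.4.2.6} \]
\[ (x^{\lambda})_{\mu^{\lambda}} \in U \tag{4.4.2.7} \]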
Step 8: Show that Cmp(X) contains X
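\begin{align*}
\operatorname{Cmp}(\mathcal{U}) \wedge \{\iota(X)\} &:= \{q(U') \cap q(c(X)) : U \in \mathcal{U}\} \\
&= \{q(U' \cap c(X)) : U \in \mathcal{U}\} \tag{4.4.2.12} \\
&= \{\iota(U) : U \in \mathcal{U}\},
\end{align*}
\[ x_\infty := \lim_m x_m. \tag{4.4.3.6} \]
\[ X := \{ f : \mathbb{Z}^+ \to [0, 1] : f(m) = 0 \text{ for all but finitely many } m \in \mathbb{N} \}.^{b} \tag{4.4.3.8} \]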
and so
\[ |y_1, y_2| \leq \frac{1}{1 - M}\bigl(|y_1, f(y_1)| + |f(y_2), y_2|\bigr) \tag{4.4.3.22} \]
Also note that
So, first things first—fuck the Riemann integral. Seriously. The only
argument pro-Riemann integral is that it is easier. What a ridiculous
argument. This is math, dude. If you choose to do things because
they’re easy, you’re in the wrong subject. Moreover, I would argue
that this is not even true—if you set things up right, you can literally
define the (Lebesgue) integral to be the area (measure) under the curve.
Or, if you prefer, you can take a limit over the size of a partition of
the sum of the areas of the rectangles corresponding to the subsets
of the partition (the Riemann integral). Are you really going to sit
here and try to argue that this is easier to teach? I call bullshit. And
besides, if you’re going to become a mathematician, you have to learn
the Lebesgue integral at some point anyways. . . why learn something
only to have to relearn it later?
Okay, so now that my rant is out of the way, let’s actually do some
mathematics.
5.1.1 Measures
The intuition behind measure is actually quite easy: a measure is just an axiomatization of our intuition about notions like length, area, and volume. Before we define “measure”, however, it will be convenient to introduce a couple of terms.
a Concretely, this means that m(S) ≤ m(T) if S ⊆ T.
b Incidentally, as we shall see shortly, m restricted to the collection of measurable sets (Definition 5.1.1.19) will be additive (Carathéodory’s Theorem (Theorem 5.1.1.23)), that is, a “measure” in their sense of the term. That is, every measure in our sense of the word determines a “measure” in their sense of the term, and so the distinction is not really that big of a deal; I just find it quite a bit cleaner to be working with m defined on all of 2^X than merely just the measurable sets.
Dear god, this example is even more interesting than the last
one.
In measure theory, things almost always matter only ‘up to’ sets
of measure 0. This concept is so important that there is a term for it.
for all S ⊆ X.
Proof. Step 1: Show (iv)
The definition of measurability is S ↔ Sᶜ symmetric, so (iv) is automatically true.
Step 2: Show (v)
The empty-set is measurable by the previous exercise, and
hence by (iv), X is measurable as well, which establishes (v).
Step 4: Prove (ii) for finite unions
We first show that the union of finitely many measurable sets is
measurable. It suffices of course to then just show that the union
of two measurable sets is measurable. So, let M1, M2 ⊆ X be
5 Integration 476
Step 5: Complete the proof of (ii)
Let {M_m : m ∈ N} be a countable collection of measurable sets. We wish to show that
\[ \bigcup_{m \in \mathbb{N}} M_m \tag{5.1.1.24} \]
Thus, as
\[ \bigcup_{m \in \mathbb{N}} M_m = \bigcup_{m \in \mathbb{N}} M'_m, \tag{5.1.1.27} \]
\begin{align*}
m(S \cap N_m) &=^{d} m(S \cap N_m \cap M_m) + m(S \cap N_m \cap M_m^{c}) \\
&=^{e} m(S \cap M_m) + m(S \cap N_{m-1}) \tag{5.1.1.29} \\
&=^{f} \sum_{k=0}^{m} m(S \cap M_k).
\end{align*}
Thus,
Step 6: Prove (i)
\[ m\left(\bigcap_{k=0}^{\infty} M_k\right) = \lim_m m\left(\bigcap_{k=0}^{m} M_k\right) = \lim_m m(M_m). \tag{5.1.1.38} \]
\[ \infty \cdot 0 := 0 =: 0 \cdot \infty. \tag{5.1.1.43} \]
Therefore, one of the open sets {x1 } or {x2 } must have been
nonmeasurable.
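\[ m(U \setminus M) \leq \sum_{m \in \mathbb{N}} m(U_m \setminus M_m) < 2\varepsilon. \tag{5.1.2.19} \]
\begin{align*}
m(U \setminus V^{c}) &= m(U \cap V) \\
&\leq m(U \cap V \cap M) + m(U \cap V \cap M^{c}) \tag{5.1.2.21} \\
&\leq m(V \setminus M^{c}) + m(U \setminus M) < 4\varepsilon.
\end{align*}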
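\begin{align*}
m(S) - m(L_{m,\varepsilon}) &=^{b} \sum_{k=0}^{\infty} m(S_k) - \sum_{k=0}^{m} m(K_{k,\varepsilon}) \\
&= \sum_{k=m+1}^{\infty} m(S_k) + \sum_{k=0}^{m} \bigl[ m(S_k) - m(K_{k,\varepsilon}) \bigr] \\
&\leq \sum_{k=m+1}^{\infty} m(S_k) + \sum_{k=0}^{m} \bigl[ m(U_{k,\varepsilon}) - m(K_{k,\varepsilon}) \bigr] \tag{5.1.2.29} \\
&< \sum_{k=m+1}^{\infty} m(S_k) + \sum_{k=0}^{m} \frac{\varepsilon}{2^{k-1}} \\
&\leq \sum_{k=m+1}^{\infty} m(S_k) + 4\varepsilon
\end{align*}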
and so m is inner-regular on S.
Step 2: Prove the result for sets of infinite measure
Let S ⊆ X be measurable. Suppose that m(S) = ∞. This time, write X = ⋃_{m∈N} K_m for K_m ⊆ X compact and K_m ⊆ K_{m+1}.ᶜ Define S_m := S ∩ K_m. Then, K_m \ S_m has finite measure, and so by outer-regularity, there is some open U_m ⊆ X such that K_m \ S_m ⊆ U_m and m(U_m \ (K_m \ S_m)) < 1.ᵈ Now define L_m := K_m ∩ U_mᶜ. L_m is compact and L_m ⊆ S_m. Furthermore,
\begin{align*}
m(S_m \setminus L_m) &:= m\bigl(S_m \cap (K_m^{c} \cup U_m)\bigr) \\
&= m(S_m \cap U_m) \tag{5.1.2.31} \\
&\leq m\bigl(U_m \setminus (K_m \setminus S_m)\bigr) < 1.
\end{align*}
5.1 Measure theory 491
\[ \lim_m m(S_m) = m\left(\bigcup_{m \in \mathbb{N}} S_m\right) = m(S) = \infty, \tag{5.1.2.32} \]
as desired.
a Here we are applying Exercise 5.1.1.33 together with (5.1.2.27) and (5.1.2.28). This is why we need K′_{m,ε} to be measurable.
b Here we are using the fact that the F_k s are disjoint, so that the S_k s and K_{k,ε} s are in turn disjoint.
c By definition, σ-compact just means we can write X = ⋃_{m∈N} K′_m for K′_m ⊆ X compact, not necessarily with K′_m ⊆ K′_{m+1}. Given this, we may define K_m := ⋃_{k=0}^{m} K′_k so that now X = ⋃_{m∈N} K_m, K_m ⊆ X is compact, and K_m ⊆ K_{m+1}.
d Here, we have taken ε = 1.
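have that
\[ \infty = m(S) = m\left(\bigcup_{m \in \mathbb{N}} S_m\right) \leq \sum_{m \in \mathbb{N}} m(S_m), \tag{5.1.2.35} \]
and
(ii).
\[ S = \bigcup_{m \in \mathbb{N}} K_m \cup Z, \tag{5.1.2.38} \]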
One likewise might expect that nonempty open sets have positive
measure in topological measure spaces. This is not true.
\[ m(S) := \begin{cases} 0 & \text{if } 0 \notin S \\ 1 & \text{if } 0 \in S. \end{cases} \tag{5.1.2.47} \]
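A tiny sketch of this point-mass measure may help fix ideas. Sets are represented here by their membership tests, which is purely a convenience of mine (the helper names are hypothetical); the point is only that a nonempty open interval missing 0 gets measure 0.

```python
def dirac_at_zero(contains):
    """m(S) for a set S given by its membership test: 1 if 0 is in S, else 0."""
    return 1 if contains(0) else 0


def open_interval(a, b):
    """Membership test of the open interval (a, b)."""
    return lambda x: a < x < b


if __name__ == "__main__":
    print(dirac_at_zero(open_interval(1, 2)))   # nonempty and open, yet measure 0
    print(dirac_at_zero(open_interval(-1, 1)))  # contains 0, so measure 1
```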
m(U) = 1
(5.1.2.48)
= sup{m(K) : K ⊆ U, K quasicompact}.
m(U) = 0
(5.1.2.49)
= sup{m(K) : K ⊆ U, K quasicompact}.
Thus, m is inner-regular on opens. Now let S ⊆ R be arbitrary.
If 0 ∈ S, then every open set which contains S will also contain
0, and so we definitely have
Hence,
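\begin{align*}
m(A) &> m(U_\varepsilon) - \varepsilon \geq^{b} m(K_\varepsilon \cup L_\varepsilon) - \varepsilon \\
&=^{c} m(K_\varepsilon) + m(L_\varepsilon) - \varepsilon \\
&> m(U \cap U_\varepsilon) + m(U_\varepsilon \cap K_\varepsilon^{c}) - 3\varepsilon \tag{5.1.2.61} \\
&\geq^{d} m(U \cap U_\varepsilon) + m(U_\varepsilon \cap U^{c}) - 3\varepsilon \\
&\geq^{e} m(A \cap U) + m(A \cap U^{c}) - 3\varepsilon.
\end{align*}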
(ii). m is uniformly-additive.
(iii). m is Borel.
A summary
Before finally moving on to the Haar-Howes Theorem, let us briefly
summarize.
(i). Borel means that open sets are measurable—see Defini-
tion 5.1.2.1.
(ii). Regular means quasicompact sets have finite measure, inner-
regularity on opens, and outer-regularity on everything—see
Definition 5.1.2.2.
1 Warning: This is the name I have chosen to call it, because I am unaware of
another name for this result. In particular, don’t expect others to know what you’re
talking about if you reference this theorem by name. (It is a generalization of the
existence and essential uniqueness of Haar measure, in case you are more familiar
with that.) While the proof of the result I cobbled together from other sources, the
formulation of the result I found essentially in [How91], in which he claims “the
author” established essential uniqueness, as well as existence, which he and another
mathematician (Izkowitz) established independently.
B
eΦ B {BU } (5.1.3.2)
where
{BB : B ∈ B} (5.1.3.4)
And finally now the key result that will allow us to define basically
every measure we work with in these notes (and much more).
Proof. Step 1: Make hypotheses and introduce notation
Let K0 ⊆ X be a quasicompact subset with nonempty interior. Denote the uniform topology on X by $\mathcal{U}$ and denote the collection of all quasicompact subsets of X by $\mathcal{K}$.
Step 2: Define (K : U) for $K \in \mathcal{K}$ and $U \in \mathcal{U}$
The cover $\mathcal{B}_U := \{\varphi(U) : \varphi \in \Phi\}$ is an open cover of K, and so there is a finite subcover. Let (K : U) denote the cardinality of the smallest such subcover.
Step 3: Define $H_U : \mathcal{K} \to \mathbb{R}_0^+$
For $U \in \mathcal{U}$, define $H_U : \mathcal{K} \to \mathbb{R}_0^+$ by
\[ H_U(K) := \frac{(K : U)}{(K_0 : U)}.^{a} \tag{5.1.3.12} \]
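To get a feel for the ratio H_U(K), here is a small sketch in the simplest special case I could think of, which is an assumption of mine and not the generality of the proof: X is the real line, Φ is the group of translations, K0 = [0, 1], K = [0, L], and U = (0, δ). In that case the least number of translates of U covering [0, L] is ⌊L/δ⌋ + 1 (n open intervals of length δ can cover [0, L] exactly when nδ > L). The function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def covering_number(length, delta):
    """Least number of translates of the open interval (0, delta) covering [0, length]."""
    return floor(Fraction(length) / Fraction(delta)) + 1


def haar_ratio(length, delta):
    """H_U(K) = (K : U) / (K_0 : U) with K = [0, length], K_0 = [0, 1], U = (0, delta)."""
    return Fraction(covering_number(length, delta), covering_number(1, delta))


if __name__ == "__main__":
    for denominator in (10, 100, 1000):
        delta = Fraction(1, denominator)
        print(delta, haar_ratio(Fraction(5, 2), delta))
```

As δ shrinks, the ratio approaches L, the length of K normalized so that K0 has length 1, which is the behavior one hopes for from a translation-invariant measure.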
Step 5: Define $H : \mathcal{K} \to \mathbb{R}_0^+$
and
\[ \mathcal{C} := \{C_U : U \in \mathcal{U}\}. \tag{5.1.3.15} \]
Step 10: Define the measure m on all open subsets of X
For U ⊆ X open, define
Step 11: Extend m to all subsets of X
Now, for an arbitrary subset S of X, define
Step 12: Show that m is a measure
Define
Thus, m is a measure on X.
Step 13: Show that each $\mathcal{B}_U$ is uniformly-measurable with respect to m
Let φ ∈ Φ. We want to show that m(φ(U)) = m(U). Then, for any other φ′ ∈ Φ, we will have that m(φ(U)) = m(U) = m(φ′(U)), so that indeed every element of $\mathcal{B}_U$ has the same measure.
However, K is a quasicompact set contained in U iff φ(K) is a quasicompact set contained in φ(U). Therefore, by the definition of m(U) (5.1.3.18) it suffices to show that H(K) = H(φ(K)) for all $K \in \mathcal{K}$. To show this, we first show that H_U(K) = H_U(φ(K)) for all $U \in \mathcal{U}$. That is, we would like to show that (K : U) = (φ(K) : U). However, every cover of K by elements of $\mathcal{B}_U$, φ1(U), …, φm(U), gives a cover of φ(K) by elements of $\mathcal{B}_U$ of the same cardinality, φ(φ1(U)), …, φ(φm(U)). It thus follows that H_U(K) = H_U(φ(K)).
For φ ∈ Φ fixed, consider the map from H to R that sends f to f(φ(K)) − f(K). We just showed that this is 0 on each H_U ∈ T, and so it is 0 on C_U, and so it is 0 on H, that is, H(K) = H(φ(K)).
Step 14: Show that if S and T are uniformly-separated, then m(S ∪ T) = m(S) + m(T)
Note that we always have that m(S ∪ T) ≤ m(S) + m(T), and so it suffices to show that m(S ∪ T) ≥ m(S) + m(T).
We first prove this for open sets. So, let $U, V \in \mathcal{U}$. If either U or V has infinite measure, then this inequality just reads ∞ ≥ ∞, and so is automatically satisfied. Thus, without loss
and so
\begin{align*}
m(U \cup V) &\geq H(K \cup L) = H(K) + H(L) \tag{5.1.3.30} \\
&> m(U) + m(V) - 2\varepsilon.
\end{align*}
as desired.
In particular, we have now shown that $\widetilde{\mathcal{B}}_\Phi$ is a uniformly-measurable base for m and that m is additive for uniformly-separated sets, so that indeed m is a uniformly-additive measure on X.
It remains to show that m is regular.
Step 16: Show that m is regular
The first thing we check is that m(K) < ∞ for K quasicompact.
By Proposition 3.7.6, there is some open U ⊆ X containing
Step 19: Define S ⋐ T
For the rest of this proof, let us write S ⋐ T iff there is some uniform cover $\mathcal{B} \in \widetilde{\mathcal{B}}_\Phi$ for which $\operatorname{Star}_{\mathcal{B}}(S) \subseteq T$.
Step 21: Show that m is unique
Let m′ be another regular uniformly-additive measure on X with uniformly-measurable base $\widetilde{\mathcal{B}}_\Phi$ and m′(K0) = 1. As both m and m′ are outer-regular, it suffices to show that they agree on
Z(G ) B
S ∈ 2X : for every ε > 0 there are Cε closed
for $U \in \mathcal{G}$ open (the exact same proof will work for m′). Of course, we only need to show the ≤ inequality (because m(U) ≥ m(V) for V ⋐ U). As m is inner-regular on open sets (and so in particular on elements of $\mathcal{G}$), we have that
Define
\[ W \subseteq \operatorname{Star}_{\mathcal{B}}(V^{c})^{c} \subseteq (V^{c})^{c} = V, \tag{5.1.3.40} \]
m0(U0 ) (U0 : V)
0
≤ (5.1.3.44)
m0(U) (U 0 : V)
whenever StarBU 0 (U0 ) ⊆ U00 .
m0(U0 ) (U00 : V)
≥ (5.1.3.46)
m0(U 00) (U 0 : V)
whenever StarBU 0 (U00 ) ⊆ U0 .
a K0 is nonempty, and so cannot be covered by anything empty. Therefore, (K0 : U) ≥ 1, and in particular, is not 0.
b That was sort of the point of the previous step.
c For each $K_1, K_2 \in \mathcal{K}$ with K1 ⊆ K2, we have such a map.
d The first map from H into R × R is the projection of f ∈ H onto the
K1th coordinate in the first coordinate and the projection of f ∈ H onto
K2th coordinate in the second coordinate. This map is continuous because
it is continuous in each coordinate (each coordinate is continuous because
projections are continuous). The first map is followed by the map from
R × R into R given by subtraction, which is continuous because we know
that ⟨R, +, 0, −⟩ is a topological group.
e This is the definition of uniformly-separated—see Definition 4.3.1.3.
f Note that every B is an open cover—both B and C are secretly of
U
the form BU and BV for some U, V ⊆ X open—so there is no need to say
“open” here—it is just to clarify. (We don’t write BU or BV to simplify the
notation (and also because we will want to write U for something else).)
g Because C and C 0 intersect.
h Because B contains C 0 , which intersects S.
i H is nondecreasing by Step 6.
j Recall that H is always finite, by definition.
k Take any open set U with compact closure (which exists by local
compactness). Then, BU will be a cover whose elements are in G . Take
a common star-refinement of this and B . The closures of elements of this
new cover will be contained in the elements of BU , which themselves will
be compact as closed subsets of compact sets are compact.
Φ := {φ : X → X : φ an isometry}. (5.1.3.49)
2 Though this is a bit like using a sledgehammer (or maybe a nuke?) to ‘swat’ a fly.
3 You can do things much more generally than this, but for us, there is no need.
As a rule, it is usually best to do things in the nicest theory that encompasses every
example you’re interested in, and for us, we will not be interested in measures that
are not topological (except perhaps when it comes to counter-examples).
4 At least when f is nonnegative; we’ll have to work just a teensy bit harder to
take the ‘signed area’ if the function is negative somewhere.
5 Of course, you don’t have to do things this way, and indeed, this is not how
it’s usually done. Doing things the other way, however, is arguably one reason why
people think the Lebesgue integral is “not geometric”. On the other hand, I can’t
believe anybody would argue that the “area under the curve” is not geometric or
intuitive—this is in fact how we explain things to high school students, after all.
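(ii).
\[ [m_1 \times m_2](U) = \sup\left\{ \sum_{k=0}^{m} m_1(K_{1,k})\, m_2(K_{2,k}) : \begin{array}{l} m \in \mathbb{N},\ K_{i,k} \subseteq X_i \text{ compact}, \\ \{K_{1,k} \times K_{2,k} : 0 \leq k \leq m\} \text{ is disjoint}, \\ \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \subseteq U \end{array} \right\} \tag{5.1.4.3} \]
for U ⊆ X1 × X2 open;
and
(iii).
\[ [m_1 \times m_2](S) = \inf\left\{ \sum_{m \in \mathbb{N}} m_1(U_{1,m})\, m_2(U_{2,m}) : U_{i,m} \subseteq X_i \text{ open},\ S \subseteq \bigcup_{m \in \mathbb{N}} U_{1,m} \times U_{2,m} \right\}. \tag{5.1.4.4} \]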
a That is, you assume that ‘area is base times height’ for quasicompact
rectangles, and then you get that ‘area is base times height’ for all rectangles—
you don’t even need S1 and S2 to be measurable.
Proof. Step 1: Check that X1 × X2 is σ-compact
You (hopefully) already showed in Exercise 5.1.2.10 that the
product of two σ-compact spaces is σ-compact, and so X1 × X2
is σ-compact.
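Step 2: Define m1 × m2
We define
\begin{align*}
m_K(K_1 \times K_2) &:= m_1(K_1)\, m_2(K_2) \quad \text{for } K_1 \subseteq X_1,\ K_2 \subseteq X_2 \text{ compact} \\
m_U(U) &:= \sup\left\{ \sum_{k=0}^{m} m_K(K_{1,k} \times K_{2,k}) : \begin{array}{l} m \in \mathbb{N},\ K_{1,k} \subseteq X_1,\ K_{2,k} \subseteq X_2 \text{ compact}, \\ \{K_{1,k} \times K_{2,k} : 0 \leq k \leq m\} \text{ is disjoint}, \\ \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \subseteq U \end{array} \right\} \quad \text{for } U \subseteq X_1 \times X_2 \text{ open} \tag{5.1.4.5} \\
[m_1 \times m_2](S) &:= \inf\{ m_U(U) : S \subseteq U,\ U \text{ open} \}.
\end{align*}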
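The middle definition, m_U as a supremum over finite disjoint families of compact rectangles, can be explored numerically. The sketch below is only an illustration under assumptions of my own: both factors are the real line with length as the measure, U is the open unit disc, and the rectangles are slightly shrunken grid squares (shrunken so that neighbours really are disjoint). The function name and the `shrink` parameter are hypothetical.

```python
import math


def inner_area_of_disc(n, shrink=1e-3):
    """Sum of 'base times height' over disjoint closed squares inside the open unit disc.

    The squares come from an n-by-n grid on [-1, 1]^2; each one is shrunk
    slightly so that neighbouring squares do not share edges.
    """
    h = 2.0 / n
    side = h * (1.0 - shrink)
    total = 0.0
    for i in range(n):
        for j in range(n):
            x0, y0 = -1.0 + i * h, -1.0 + j * h
            x1, y1 = x0 + side, y0 + side
            # The square lies in the disc iff its corner farthest from 0 does.
            far_x = max(abs(x0), abs(x1))
            far_y = max(abs(y0), abs(y1))
            if far_x * far_x + far_y * far_y < 1.0:
                total += side * side
    return total


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, inner_area_of_disc(n), math.pi)
```

Every such sum is a lower bound for the area of the disc, and refining the grid pushes the lower bound toward π, in line with defining the measure of an open set as a supremum of this kind.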
Step 3: Show that m1 × m2 is nondecreasing
(m1 (U1 )+m2 (U2 ))·min{m1 (U1 ), m2 (U2 )} > ε > 0 (5.1.4.12)
Let ε > 0. Choose δ > 0 such that δ(m1(S1) + m2(S2)) + δ² < ε.
Because mi is regular, there is some open Ui ⊆ Xi with Si ⊆ Ui
and
so that
\[ K_{i,k} = \bigcup_{S \subseteq S_k} K_{i,k,S} \tag{5.1.4.21} \]
and
\[ L_i := \bigcup_{k=0}^{m} K_{i,k} = \bigcup_{k=0}^{m} \bigcup_{S \subseteq S_k} K_{i,k,S} \tag{5.1.4.22} \]
Step 9: Show that m1 × m2 is a measure
That [m1 × m2 ](∅) = 0 follows immediately from the definition.
We already showed that it is nondecreasing.
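Step 10: Show that m1 × m2 is regular
That it is outer-regular follows immediately from the definition.ʲ Inner-regularity also follows quite easily from the definition.
\begin{align*}
[m_1 \times m_2](U) &:= \sup\left\{ \sum_{k=0}^{m} [m_1 \times m_2](K_{1,k} \times K_{2,k}) : \begin{array}{l} m \in \mathbb{N},\ K_{1,k} \subseteq X_1,\ K_{2,k} \subseteq X_2 \text{ compact}, \\ \{K_{1,k} \times K_{2,k} : 0 \leq k \leq m\} \text{ is disjoint}, \\ \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \subseteq U \end{array} \right\} \\
&\leq \sup\{ [m_1 \times m_2](K) : K \subseteq U,\ K \text{ compact} \} \\
&\leq^{k} [m_1 \times m_2](U),
\end{align*}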
Step 11: Show that m1 × m2 is additive on finite disjoint unions of compact rectangles
\[ [m_1 \times m_2]\left( \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \right) = \sum_{k=0}^{m} m_1(K_{1,k})\, m_2(K_{2,k}). \tag{5.1.4.24} \]
Then,
\begin{align*}
[m_1 \times m_2]\left( \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \right) &> [m_1 \times m_2](U) - \varepsilon \tag{5.1.4.25} \\
&\geq^{l} \sum_{k=0}^{m} m_1(K_{1,k})\, m_2(K_{2,k}) - \varepsilon,
\end{align*}
Step 12: Show that m1 × m2 is Borel
Now, let U ⊆ X1 × X2 be open. We wish of course to show that U is measurable. Write X_i = ⋃_{m∈N} J_{i,m} for J_{i,m} compact and define U_{k,l} := U ∩ (J_{1,k} × J_{2,l}). Then, as U = ⋃_{k,l∈N} U_{k,l}, it suffices to show that U_{k,l} is measurable for k, l ∈ N. Thus, for this step, without loss of generality, suppose that X1 and X2 themselves are compact.
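\[ [m_1 \times m_2]\left( U \cap \left( \bigcup_{k=0}^{m} L_{1,k} \times L_{2,k} \right)^{c} \right) - \varepsilon < \sum_{k=0}^{n} [m_1 \times m_2](M_{1,k} \times M_{2,k}). \tag{5.1.4.29} \]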
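Hence,
\begin{align*}
[m_1 \times m_2](S) &> [m_1 \times m_2](U) - \varepsilon \\
&\geq [m_1 \times m_2]\left( \bigcup_{k=0}^{m} L_{1,k} \times L_{2,k} \cup \bigcup_{k=0}^{n} M_{1,k} \times M_{2,k} \right) - \varepsilon \\
&=^{m} \sum_{k=0}^{m} [m_1 \times m_2](L_{1,k} \times L_{2,k}) + \sum_{k=0}^{n} [m_1 \times m_2](M_{1,k} \times M_{2,k}) - \varepsilon \\
&> [m_1 \times m_2]\bigl( U \cap (K_1 \times K_2)^{c} \bigr) + [m_1 \times m_2]\left( U \cap \left( \bigcup_{k=0}^{m} L_{1,k} \times L_{2,k} \right)^{c} \right) - 3\varepsilon \\
&\geq [m_1 \times m_2]\bigl( U \cap (K_1 \times K_2)^{c} \bigr) + [m_1 \times m_2]\bigl( U \cap (K_1 \times K_2) \bigr) - 3\varepsilon.
\end{align*}
As ε > 0 was arbitrary, we obtain the desired inequality.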
As X1 and X2 are compact, [m1 × m2](X1 × X2) = m1(X1) m2(X2) < ∞, and so [m1 × m2](U) < ∞. So, let ε > 0, and let K_{i,k} ⊆ X_i be compact such that {K_{1,k} × K_{2,k} : 0 ≤ k ≤ m} is disjoint and
\[ [m_1 \times m_2](U) - \varepsilon < \sum_{k=0}^{m} [m_1 \times m_2](K_{1,k} \times K_{2,k}) \leq [m_1 \times m_2](U). \tag{5.1.4.30} \]
Applying Exercise 5.1.1.33 and using the fact that ⋃_{k=0}^{m} K_{1,k} × K_{2,k} is measurable, we have that
\[ [m_1 \times m_2]\left( U \setminus \left( \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \right) \right) < \varepsilon. \tag{5.1.4.31} \]
Step 13: Show uniqueness
To show that two regular measures agree, by outer-regularity, it suffices to show that they agree on open sets. By inner-regularity, to show this, it suffices to show that
\[ \sup\{ m(K) : K \subseteq U,\ K \text{ quasicompact} \} = \sup\left\{ \sum_{k=0}^{m} [m_1 \times m_2](K_{1,k} \times K_{2,k}) : \begin{array}{l} m \in \mathbb{N},\ K_{1,k} \subseteq X_1,\ K_{2,k} \subseteq X_2 \text{ quasicompact}, \\ \{K_{1,k} \times K_{2,k} : 0 \leq k \leq m\} \text{ is disjoint}, \\ \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \subseteq U \end{array} \right\}.^{n} \tag{5.1.4.32} \]
Step 14: Show ‘rectangle outer-regularity’
We now seek to show (5.1.4.4), which we reproduce here for convenience.
\[ [m_1 \times m_2](S) = \inf\left\{ \sum_{m \in \mathbb{N}} m_1(U_{1,m})\, m_2(U_{2,m}) : U_{i,m} \subseteq X_i \text{ open},\ S \subseteq \bigcup_{m \in \mathbb{N}} U_{1,m} \times U_{2,m} \right\}. \tag{5.1.4.33} \]
If we have proven the result for U open, then there are U_{i,m} ⊆ X_i open such that U ⊆ ⋃_{m∈N} U_{1,m} × U_{2,m} and
\[ [m_1 \times m_2](U) \leq \sum_{m \in \mathbb{N}} m_1(U_{1,m})\, m_2(U_{2,m}) < [m_1 \times m_2](U) + \varepsilon. \tag{5.1.4.35} \]
Hence,
Thus, for every ε > 0, we can pick U_{m₀} with measure at most ε. Now, using the fact that we have proven the result for open sets of finite measure, cover U_{m₀} with open rectangles the sum of whose measures is within ε of [m1 × m2](U_{m₀}). Then, the sum of these measures will be within 2ε of 0, proving the result for G, thereby completing the proof of this step.
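\[ P_x := \{ y \in X_2 : \langle x, y \rangle \in U \}. \tag{5.1.4.38} \]
\[ P_x \supseteq \{ y \in X_2 : \langle x, y \rangle \in S_1 \times S_2 \} = \begin{cases} S_2 & \text{if } x \in S_1 \\ \emptyset & \text{if } x \notin S_1. \end{cases} \tag{5.1.4.39} \]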
Define
and
\[ U_1 := \bigcup_{x \in S_1} U_x. \tag{5.1.4.41} \]
\[ K_M \subseteq U_{x_1} \cup \cdots \cup U_{x_m}. \tag{5.1.4.43} \]
where φ1 × φ2 : X1 × X2 → X1 × X2 is defined by
BU1 ×U2 :
(5.1.4.54)
Ui ⊆ Xi nonempty open.}
Exercise 5.1.4.57
(i). Find an example of a topological space that is locally
compact but not σ-quasicompact.
(ii). Find an example of a topological space that is σ-compact
but not locally quasicompact.
(i).
and
(ii). if S is measurable, then aS is measurable.
7 We already know that countable sets must have Lebesgue measure zero. The existence of the Cantor Set thus shows that the converse is false.
8 We already know (Why?) that sets of Lebesgue measure zero must have empty interior. The existence of the Cantor-Smith-Volterra Set thus shows that the converse is false.
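Both footnotes come down to bookkeeping of removed lengths, which is easy to check exactly. The sketch below assumes the usual textbook recipes, which may differ in parametrization from the generalized construction in the text: the middle-thirds Cantor set, and the Smith-Volterra-Cantor set in which step j removes 2^(j−1) open intervals of length 4^(−j) each. The function names are hypothetical.

```python
from fractions import Fraction


def cantor_length(k):
    """Total length left after k steps of removing open middle thirds from [0, 1]."""
    return Fraction(2, 3) ** k


def fat_cantor_length(k):
    """Total length left after k steps of the Smith-Volterra-Cantor recipe.

    Step j removes 2**(j-1) open intervals, each of length 4**(-j).
    """
    removed = sum(Fraction(2 ** (j - 1), 4 ** j) for j in range(1, k + 1))
    return 1 - removed


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, float(cantor_length(k)), float(fat_cantor_length(k)))
```

The first sequence of lengths tends to 0 while the second stays above 1/2, even though in both cases what survives contains no interval.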
and define
Ck+1 B
2k h i h i (5.1.5.23)
ai +bi
− αk L ∪ ai +b
Ø
ai, 2 2
i
+ αk L, b i .
i =1
Note that Ck+1 is again the disjoint union of 2k+1 intervals all
of the same length 1−α2 L, and so this recursive definition is
k
Thus, in summary,
\[ m\bigl( (S \cap N) \cup (S \cap N^{c}) \bigr) = m(S) \neq m(S \cap N) + m(S \cap N^{c}). \tag{5.1.5.43} \]
is a disjoint union.
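We proceed by contradiction: suppose that every subset of S is measurable. Then, the previous equality implies that
\[ m(S) = \sum_{r \in \mathbb{Q}^d} m\bigl(S \cap (r + N)\bigr) \tag{5.1.5.47} \]
\[ 1 \geq \sum_{m \in \mathbb{Z}} m\bigl(q(R_m(M'))\bigr) = \sum_{m \in \mathbb{Z}} m\bigl(q(R_0(M'))\bigr) = \sum_{m \in \mathbb{Z}} m(M), \tag{5.1.5.53} \]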
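recursively:
\[ f_m(x) := \begin{cases} \tfrac{1}{2} f_{m-1}(3x) & \text{if } x \in [0, \tfrac{1}{3}] \\ \tfrac{1}{2} & \text{if } x \in [\tfrac{1}{3}, \tfrac{2}{3}] \\ \tfrac{1}{2} + \tfrac{1}{2} f_{m-1}(3x - 2) & \text{if } x \in [\tfrac{2}{3}, 1]. \end{cases} \tag{5.1.5.55} \]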
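The recursion is easy to run with exact rational arithmetic. The sketch below assumes the base case f_0(x) = x, which is the usual choice but is my assumption here since the base case is not visible above; the function names are hypothetical. It also measures the largest gap between consecutive iterates on a grid of triadic rationals.

```python
from fractions import Fraction


def cantor_iterate(m, x):
    """m-th approximation f_m(x) on [0, 1], with f_0(x) = x (assumed base case)."""
    if m == 0:
        return x
    third = Fraction(1, 3)
    if x <= third:
        return cantor_iterate(m - 1, 3 * x) / 2
    if x <= 2 * third:
        return Fraction(1, 2)
    return Fraction(1, 2) + cantor_iterate(m - 1, 3 * x - 2) / 2


def sup_gap(m, depth=6):
    """max |f_{m+1}(x) - f_m(x)| over the grid k / 3**depth, 0 <= k <= 3**depth."""
    grid = [Fraction(k, 3 ** depth) for k in range(3 ** depth + 1)]
    return max(abs(cantor_iterate(m + 1, x) - cantor_iterate(m, x)) for x in grid)


if __name__ == "__main__":
    for m in range(4):
        print(m, sup_gap(m))
```

With this base case the gap halves at every step, so the iterates form a uniformly Cauchy sequence of continuous functions, which is the mechanism behind the convergence argument that follows.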
This completes the base case. As for the inductive case, fix
m ≥ 1 and assume that (5.1.5.61) holds for 0 ≤ n < m. We
prove that it holds for m as well. From the definition (5.1.5.55)
again, we have that
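\[ |f_{m+1}(x) - f_m(x)| = \tfrac{1}{2} \cdot \begin{cases} |f_m(3x) - f_{m-1}(3x)| & \text{if } x \in [0, \tfrac{1}{3}] \\ 0 & \text{if } x \in [\tfrac{1}{3}, \tfrac{2}{3}] \\ |f_m(3x - 2) - f_{m-1}(3x - 2)| & \text{if } x \in [\tfrac{2}{3}, 1] \end{cases} \]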
and
are measurable.
a For example, just as we need not have m(S ∪ T) = m(S) + m(T) for S and T disjoint in general unless S and T are measurable, so too we need not have that ∫_X dm(x) [f(x) + g(x)] = ∫_X dm(x) f(x) + ∫_X dm(x) g(x) in general unless f and g are Borel.
5.2 The integral 565
is measurable iff
(i). f ∈ Bor(X).
(ii). f −1 (U) is measurable for every U ⊆ [−∞, ∞] open.
(iii). f −1 (C) is measurable for every C ⊆ [−∞, ∞] closed.
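and
\[ h_m(x) := \begin{cases} \inf_{\langle x, y \rangle \in (X \times \mathbb{R}) \setminus K_m} \{y\} & \text{if } K_m \cap (\{x\} \times [0, \infty]) \neq \emptyset \\ 0 & \text{if } K_m \cap (\{x\} \times [0, \infty]) = \emptyset. \end{cases} \tag{5.2.1.16} \]
Thus,
\[ \Gamma_f = \bigcup_{m \in \mathbb{N}} \Gamma_f \cap (M_m \times [0, m)) \cup \bigcup_{m \in \mathbb{N}} \Gamma_f \cap (M_m \times \{\infty\}). \tag{5.2.1.23} \]
Note that
\begin{align*}
\Gamma_f \cap (M_m \times [n, n+1)) &= \{ \langle x, y \rangle \in X \times [-\infty, \infty] : y < f(x),\ x \in M_m,\ n \leq y < n + 1 \} \tag{5.2.1.24} \\
&= \Gamma_{f_m},
\end{align*}
where f_m : X → R is defined by
\[ f_m(x) := \begin{cases} f(x) & \text{if } x \in M_m \text{ and } f(x) < m \\ m & \text{if } x \in M_m \text{ and } f(x) \geq m \\ 0 & \text{otherwise.} \end{cases} \tag{5.2.1.25} \]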
Γ_{f_m} is measurable because Γ_f is. Furthermore, f_m is bounded and vanishes outside a set of finite measure, and so the previous case applies, and we have that the preimages of open sets under f_m are measurable.
and
\[ I_-(f) := [m \times m_L]\bigl( \{ \langle x, y \rangle \in X \times [-\infty, \infty] : f(x) \leq y < 0 \} \bigr), \tag{5.2.2.3} \]
R For S ⊆ X, we define
\[ \int_S dm(x)\, f(x) := \int_X dm(x)\, \chi_S(x) f(x). \tag{5.2.2.5} \]
R Note that
\[ \int_X dm(x)\, f(x) = \int_X dm(x)\, f_+(x) - \int_X dm(x)\, f_-(x) \]
a By this point, it probably goes without saying that I prefer the termi-
nology I do because it allows me to be more precise. For example, I now
have terms for four classes of functions: integrable, ∞-integrable, Borel
integrable, and Borel ∞-integrable; whereas with the other convention we
would only have the terminology to speak of just one class of functions.
b For two reasons: (i) it tells you immediately what the variable you are
integrating with respect to is (just a slight convenience, especially when the
integrand is complicated) and (ii) it is more analogous to how we write the
derivative: no one writes d f(x)/dx; people write (d/dx) f(x).
c The so-called Henstock integral is such an example.
d We mention this because in general you do need f and g to be Borel to have ∫_X dm(x) [f(x) + g(x)] = ∫_X dm(x) f(x) + ∫_X dm(x) g(x); see Theorem 5.2.2.7.
e In fact, I know of no definition of the integral which strictly generalizes
the Lebesgue integral (in a way that makes no direct reference to the
Lebesgue integral). The parenthetical comment refers to the fact that you
can define integrals with values in more general spaces (e.g. the Gelfand-
Pettis integral for topological vector spaces), but in order to define such
integrals you ultimately reduce their definitions to the Lebesgue integral.
Proof.ᵃ Step 1: Introduce notation
As we will be making use of the set quite a bit, it will be useful
to introduce the notation
Step 2: Define I
Let $f : X \to \mathbb{R}_0^+$ and define
\[ I(f) := \int_X dm(x)\, f(x) = [m \times m_L](\Gamma_f). \tag{5.2.2.10} \]
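Since I(f) is by definition the product measure of the region under the graph, one can approximate it by literally counting grid cells. The sketch below is a rough illustration under my own assumptions: X = [0, 1] with Lebesgue measure, f takes values in [0, 1], and Γ_f is taken to be the set of ⟨x, y⟩ with 0 ≤ y < f(x). The function name and the sampling scheme (cell centres) are hypothetical.

```python
from fractions import Fraction


def area_under_graph(f, n):
    """Approximate the measure of {(x, y) : 0 <= y < f(x)} for f : [0, 1] -> [0, 1].

    Counts the cells of an n-by-n grid on the unit square whose centre lies
    strictly below the graph, and multiplies by the area of one cell.
    """
    cell = Fraction(1, n * n)
    count = 0
    for i in range(n):
        x = Fraction(2 * i + 1, 2 * n)
        fx = f(x)
        for j in range(n):
            y = Fraction(2 * j + 1, 2 * n)
            if y < fx:
                count += 1
    return count * cell


if __name__ == "__main__":
    print(area_under_graph(lambda x: x, 100))       # close to 1/2
    print(area_under_graph(lambda x: x * x, 100))   # close to 1/3
```

No partitions of the domain or limits of sums are needed to state what is being approximated: the target is just the area of a set in the plane.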
Step 3: Show that I(χ_S) = m(S)
and so
Step 4: Show that I is additive
We wish to show that I( f + g) = I( f ) + I(g). If either I( f ) or
I(g) is infinite, then this is trivially satisfied as this equation
then reads ∞ = ∞. Thus, without loss of generality, we may
assume that I( f ), I(g) < ∞.
For f : X → [0, ∞], define τf : X × [0, ∞] → X × [0, ∞] by
Γ f +g = Γ f ∪ τf (Γg ) (5.2.2.14)
τf (Γg ) = Γ f +g \ Γ f , (5.2.2.15)
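Furthermore,
\begin{align*}
\tau_g(\Gamma_f) &= \tau_g\bigl( \{ \langle x, y \rangle \in \Gamma_f : x \in M_X \} \cup \{ \langle x, y \rangle \in \Gamma_f : x \notin M_X \} \bigr) \tag{5.2.2.18} \\
&= \{ \langle x, y + b \rangle \in \Gamma_f : x \in M_X \} \cup \{ \langle x, y \rangle \in \Gamma_f : x \notin M_X \},
\end{align*}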
= [m × mL ](Γ f ),
and hence (by (5.2.2.17) and the fact that these sets (in particular,
$\tau_f(M)$) are measurable)
$[m \times m_L](\tau_f(M)) = [m \times m_L](\tau_f(\Gamma_g)) - [m \times m_L](\tau_f(\Gamma_h)) = [m \times m_L](\Gamma_g) - [m \times m_L](\Gamma_h) = [m \times m_L](\Gamma_g \setminus \Gamma_h) = [m \times m_L](M).$ (5.2.2.23)
$\sum_{m \in \mathbb{N}} m(U_{1,m})\, m_L(U_{2,m}) - \varepsilon < [m \times m_L](M).$ (5.2.2.24)
Similarly,
Hence,
[m × mL ](τf (M)) + [m × mL ] τf ((MX × [n, n + 1)) \ M)
= [m × mL ](τf (M))
+ [m × mL ] τf ((MX × [n, n + 1)) \ M) .
Hence,
0 = [m × mL ](M X × [n, n + 1)) − [m × mL ](τ f (M X × [n, n + 1)))
= ([m × mL ](M) + [m × mL ]((MX × [n, n + 1)) \ M))
− [m × mL ](τ f (M)) + [m × mL ](τ f (M X × [n, n + 1)) \ τ f (M))
Note that
$\Gamma_{af} := \{\langle x, y\rangle \in X \times \mathbb{R} : y < a f(x)\} = \{\langle x, ay\rangle \in X \times \mathbb{R} : y < f(x)\} = a\Gamma_f.$ (5.2.2.29)
Then, it follows from Exercise 5.1.5.14 that
$I(af) := m(\Gamma_{af}) = m(a\Gamma_f) = a\, m(\Gamma_f) =: aI(f).$ (5.2.2.30)
Step 6: Prove Lebesgue’s Monotone Convergence Theorem
Let $m \mapsto f_m$ be a nondecreasing sequence of functions (it is
implicit that it is nondecreasing almost-everywhere). Let
us define $f_\infty : X \to [0, \infty]$ by $f_\infty(x) := \lim_m f_m(x)$. For almost
every x ∈ X, the sequence $m \mapsto f_m(x)$ is nondecreasing, and
so by the usual Monotone Convergence Theorem, this limit
$f_\infty(x)$ exists in [0, ∞]d for almost-every x. For all the points
where we do not have monotonicity, we may without loss of
generality take $f_\infty$ to be infinite there.
We wish to show that
$\lim_m I(f_m) = I(f_\infty).$ (5.2.2.31)
Step 7: Show uniqueness
Now, let $I : \mathrm{Bor}^+_0(X) \to [0, \infty]$ be an additive nonnegative-homogeneous
map that satisfies Lebesgue’s Monotone Convergence
Theorem and $I(\chi_S) = m(S)$. It follows immediately that
I and the integral agree on Borel simple functions. However, we already
know (Proposition 5.2.1.35) that any Borel function can be
written as the monotone limit of a sequence of simple
Borel functions, so that, by Lebesgue’s Monotone Convergence
Theorem, we actually have equality of I with the integral on
all Borel functions.
a Proof adapted from [Pug02, pg. 377].
b This requires the use of the strict inequality in the definition of Γ f and
Γg .
c Because every open set can be written as the countable disjoint union
of closed-open intervals—see Exercise 5.1.5.13.
d If the sequence is bounded, the Monotone Convergence Theorem tells
us it converges in R. Otherwise, it converges to ∞.
e I find it fascinating how relatively trivial this is compared to additivity,
despite this having a name attached to it and additivity practically taken for
granted. On the contrary: Lebesgue’s Monotone Convergence Theorem
requires just basic theory of measures—additivity on the other hand requires
a reasonably well-developed theory of topological measure spaces!
as desired.
It thus remains to prove the result in the nonnegative case.
So, let f : X → [−∞, ∞] be nonnegative and Borel. We wish
to show that
$\int_X dm(x)\, f(x) = \lim_{K \in \mathcal{K}} \int_K dm(x)\, f(x).$ (5.2.2.38)
To show this, we show that every cofinal subnet $\lambda \mapsto \int_{K_\lambda} dm(x)\, f(x)$
has in turn a subnet that converges to $\int_X dm(x)\, f(x)$. So, let
$\lambda \mapsto \int_{K_\lambda} dm(x)\, f(x)$ be a cofinal
as desired.
Similarly,
$\sum_{m \in \mathbb{N}} \int_{N_m} dm(x)\, f_-(x) = \infty.$ (5.2.2.41)
Define
$a_m := \int_{P_m} dm(x)\, f_+(x) - \int_{N_m} dm(x)\, f_-(x).$ (5.2.2.42)
and
$\int_{N_m} dm(x)\, f_-(x) = \lim_n \int_{L_{m,n}} dm(x)\, f_-(x).$ (5.2.2.45)
and
∫ ∫
ε
d m(x) f− (x) − d m(x) f− (x) < (5.2.2.47)
Nk 2m
L k, n m
One crucial fact is that the integral ‘doesn’t care’ about sets of
measure 0.
Proposition 5.2.2.49 Let $\langle X, m\rangle$ be a topological measure
space and let $f, g : X \to [-\infty, \infty]$ be ∞-integrable and
Borel. Then, if f(x) = g(x) almost-everywhere, then
$\int_X dm(x)\, f(x) = \int_X dm(x)\, g(x)$.
a Another way to see this is from the fact that $\int_X dm(x)\, \chi_S(x) = m(S)$
almost-everywhere, and so
$\int_X dm(x)\, f(x) = \int_X dm(x)\, f_+(x) - \int_X dm(x)\, f_-(x) = \int_X dm(x)\, g_+(x) - \int_X dm(x)\, g_-(x) = \int_X dm(x)\, g(x).$ (5.2.2.50)
Step 2: Prove the case where f and g are nonnegative
Suppose that f and g are nonnegative, ∞-integrable, and Borel.
Then, by definition
$\int_X dm(x)\, f(x) = [m \times m_L](\Gamma_f),$ (5.2.2.51)
where
and so
$[m \times m_L](\Gamma_f \cap \Gamma_g^c) \leq [m \times m_L](S \times [0, \infty]) = 0 \cdot \infty := 0.$ (5.2.2.54)
Hence,
= d m(x) | f (x)| .
X
a This requires that f be Borel.
is integrable and
$\lim_\lambda \int_X dm(x)\, |f_\lambda(x) - f_\infty(x)| = 0.$ (5.2.3.6)
On the other hand, we have that $\int_0^1 dx\, f_m(x) = 1$ for all m ∈ N,
and so
$\lim_m \int_0^1 dx\, f_m(x) = 1.$ (5.2.3.22)
We mentioned a while back that sums are really just integrals with respect to
the counting measure.
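As a small sanity check of that slogan, here is a sketch on a finite set, where a measure is just an assignment of a nonnegative weight to each point. The dictionary representation and the name `integrate` are hypothetical choices made for illustration only.

```python
def integrate(f, measure):
    """Integrate f over a finite set.

    `measure` maps each point to its (nonnegative) weight, so the integral
    is the weighted sum of the values of f.
    """
    return sum(f(x) * weight for x, weight in measure.items())


# Counting measure on {0, 1, 2, 3, 4}: every point has weight 1.
counting = {x: 1 for x in range(5)}
```

With the counting measure, `integrate(lambda x: x * x, counting)` is literally the sum 0 + 1 + 4 + 9 + 16; changing the weights changes the measure but not the recipe.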
a By additivity and nonnegative-homogeneity.
and
$\int_{M_1 \times M_2} d[m_1 \times m_2](x)\, f(x) = \int_{M_2} dm_2(x_2) \int_{M_1} dm_1(x_1)\, f(x_1, x_2)$ (5.2.3.29)
Proof. Step 1: Make hypotheses
Suppose that f is Borel or characteristic. First, consider the
case where f is Borel. Ultimately (Step 11), we reduce this
case to the case of f characteristic, and so that step will
finish the characteristic case as well.
Step 2: Reduce to the case of (5.2.3.28)
By 1 ↔ 2 symmetry, it suffices to prove (5.2.3.28).
as desired.
Step 6: Define a measure m on X1 × X2
Define $m : 2^{X_1 \times X_2} \to [0, \infty]$ by
$m(S) := \int_{X_1} dm_1(x_1) \int_{X_2} dm_2(x_2)\, \chi_{S_{x_1}}(x_2),$ (5.2.3.31)
where
we have $\chi_{[K_1 \times K_2]_{x_1}}(x_2) = \chi_{K_1}(x_1)\, \chi_{K_2}(x_2)$, and hence indeed
m(K1 × K2 ) B X . Thus, it suffices to show that this defines a
∫
1
regular Borel measure.
Step 8: Show that m is a measure
It is immediate that m(∅) = 0.
m is nondecreasing by Exercise 5.2.3.14.
Let $M_m \subseteq X_1 \times X_2$ for m ∈ N. Define $S_m := \bigcup_{k=0}^{m} M_k$ for
m ∈ N. Then
Step 9: Show that m is regular
We first check outer-regularity. Let S ⊆ X1 × X2 . If m(S) = ∞,
then
Without loss of generality,a assume that the $U_{1,m}$s and the $U_{2,m}$s
are disjoint. Then,
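$m\Bigl(\bigcup_{m \in \mathbb{N}} U_{1,m} \times U_{2,m}\Bigr) := \int_{X_1} dm_1(x_1) \int_{X_2} dm_2(x_2)\, \chi_{[\bigcup_{m \in \mathbb{N}} U_{1,m} \times U_{2,m}]_{x_1}}(x_2)$
$\overset{b}{=} \int_{X_1} dm_1(x_1) \int_{X_2} dm_2(x_2) \sum_{m \in \mathbb{N}} \chi_{U_{1,m}}(x_1)\, \chi_{U_{2,m}}(x_2)$
$\overset{c}{=} \sum_{m \in \mathbb{N}} \int_{X_1} dm_1(x_1)\, \chi_{U_{1,m}}(x_1) \int_{X_2} dm_2(x_2)\, \chi_{U_{2,m}}(x_2)$
$= \sum_{m \in \mathbb{N}} m_1(U_{1,m})\, m_2(U_{2,m}).$ (5.2.3.37)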
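$[m_1 \times m_2](U) := \sup\Bigl\{ \sum_{k=0}^{m} m_1(K_{1,k})\, m_2(K_{2,k}) : m \in \mathbb{N},\ K_{i,k} \subseteq X_i \text{ quasicompact},\ \{K_{1,k} \times K_{2,k} : 0 \leq k \leq m\} \text{ is disjoint},\ \bigcup_{k=0}^{m} K_{1,k} \times K_{2,k} \subseteq U \Bigr\}$ (5.2.3.38)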
$[m_1 \times m_2](S) := \inf\{[m_1 \times m_2](U) : S \subseteq U,\ U \text{ open}\};$ (5.2.3.39)
Step 10: Show that m is Borel
Let U ⊆ X1 × X2 be open. The argument for inner-regularity
on open sets using (5.2.3.38) and (5.2.3.39) will show that
the same equation holds with m1 × m2 on the left-hand side
replaced with m. Thus, modulo a set of measure 0 (which is
measurable), U is a countable disjoint union of quasicompact
rectangles. Thus, it suffices to show that K1 × K2 is
measurable for Ki ⊆ Xi quasicompact.
So, let S ⊆ X1 × X2 be arbitrary. We wish to show that
Hence,
as desired.
where
$S_{x_1} := \{x_2 \in X_2 : \langle x_1, x_2\rangle \in S\}$ (5.2.3.48)
and
$S_{x_2} := \{x_1 \in X_1 : \langle x_1, x_2\rangle \in S\}$ (5.2.3.49)
for xi ∈ Xi .
R Cavalieri’s Principle is the statement that the ‘volume’ of
an object can be computed by integrating the ‘area’
of the cross-section as a function of the ‘height’. For
example, this is most likely what you would use to
show that the volume of a cone with radius r and height
h is $\frac{1}{3}\pi r^2 h$.a
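The cone computation can be sketched numerically. The code below assumes the cone has its base at height 0 and its tip at height h, so that the cross-section at height z is a disc of radius r(1 − z/h); the function names are hypothetical and the midpoint rule stands in for the actual integral.

```python
import math


def volume_by_slices(area, height, n=10_000):
    """Integrate the cross-sectional area over [0, height] with the midpoint rule."""
    dz = height / n
    return sum(area((k + 0.5) * dz) for k in range(n)) * dz


def cone_volume(r, h, n=10_000):
    """Volume of a cone of base radius r and height h via Cavalieri's Principle.

    Assumes the cross-section at height z is a disc of radius r * (1 - z / h).
    """
    return volume_by_slices(lambda z: math.pi * (r * (1 - z / h)) ** 2, h, n)
```

For r = 2 and h = 3 this should agree with (1/3)πr²h = 4π up to a small discretization error.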
$[m_1 \times m_2](S) = \int_{X_1 \times X_2} d[m_1 \times m_2](x)\, \chi_S(x)$
$= \int_{X_1} dm_1(x_1) \int_{X_2} dm_2(x_2)\, \chi_S(x_1, x_2)$
$= \int_{X_1} dm_1(x_1) \int_{X_2} dm_2(x_2)\, \chi_{S_{x_1}}(x_2)$
$= \int_{X_1} dm_1(x_1)\, m_2(S_{x_1}).$ (5.2.3.50)
and
$X_2 \ni x_2 \mapsto \int_{X_1} dm_1(x_1)\, f(x_1, x_2) \in [-\infty, \infty]$ (5.2.3.57)
are Borel.
Proof. Step 1: Make hypotheses
Suppose that f is Borel and ∞-integrable.
Step 2: Reduce to the case of $x_1 \mapsto f(x_1, x_2)$
By 1 ↔ 2 symmetry, it suffices to prove that
$X_1 \ni x_1 \mapsto \int_{X_2} dm_2(x_2)\, f(x_1, x_2) \in [-\infty, \infty]$ is Borel.
Step 4: Reduce to the case where $f = \chi_M$ for M ⊆ X1 × X2 measurable
Step 5: Reduce to the case where $f = \chi_M$ for M ⊆ X1 × X2 measurable of finite measure
Suppose we have proven the result for characteristic functions of
measurable sets with finite measure, and let M ⊆ X1 × X2
be measurable (not necessarily of finite measure). Write
$X_i = \bigcup_{m \in \mathbb{N}} K_{i,m}$ for $K_{i,m} \subseteq X_i$ compact and define
$L_m := \bigcup_{k, l = 0}^{m} K_{1,k} \times K_{2,l},$ (5.2.3.58)
Step 6: Prove the result for $f = \chi_M$ for M ⊆ X1 × X2 measurable of finite measure
Let M ⊆ X1 × X2 be measurable of finite measure. Let ε > 0,
and, using “rectangle outer regularity” ((5.1.4.4)), choose
$U^\varepsilon_{i,m} \subseteq X_i$ open such that $M \subseteq \bigcup_{m \in \mathbb{N}} U^\varepsilon_{1,m} \times U^\varepsilon_{2,m} =: U^\varepsilon$ and
$\sum_{m \in \mathbb{N}} m_1(U^\varepsilon_{1,m})\, m_2(U^\varepsilon_{2,m}) - \varepsilon < [m_1 \times m_2](M).$ (5.2.3.59)
and so
$m_2(U^\varepsilon_{x_1}) = \sum_{m \in \mathbb{N}} m_2(U^\varepsilon_{2,m})\, \chi_{U^\varepsilon_{1,m}}(x_1),$ (5.2.3.61)
then
and
is Borel.
(i).
$\int_X dm(x)\, [f(x) + g(x)] \leq \int_X dm(x)\, f(x) + \int_X dm(x)\, g(x)$
a This is where we use the fact that f is integrable, so that $\int_X dm(x)\, g(x)$
is finite.
b Note that we may not have equality here if S is not measurable; however,
the inequality still holds by virtue of Theorem 5.2.3.68.
c Because $g(x) - g_{m_0}(x) \geq 0$ and $g_{m_0}$ is bounded above by $m_0$.
Finally, define
$v := \sum_{m=0}^{\infty} c_m \chi_{U_m}$ and $u := \sum_{m=0}^{m_0} c_m \chi_{K_m}.$ (5.2.3.101)
a Proof adapted from [Rud87, pg. 56].
5.3 L p spaces
For X a topological measure space, $L^p(X)$ will wind up being a
certain collection of Borel functions defined on X. While there are
many reasons one might be interested in such objects, one possible
motivation is that, if you care about topological measure spaces, then
you can obtain information about X by a study of $L^p(X)$.10 In this
case, the space is a topological measure space, and so the relevant
functions will be the Borel functions. In a perfect world, I suppose
we could use the integral to put (semi)norms on Bor(X) itself, but
unlike in the continuous case where we had theorems (namely the
Extreme Value Theorem) to guarantee the finiteness of the supremum
seminorms on quasicompact sets, there are no such theorems in this
case—there’s no getting around the fact that some Borel function will
have infinite integral on any set of positive measure. And so, instead
of studying all of the Borel functions at once, we study certain subsets
of them. The $L^p$ spaces are certain nice spaces of Borel functions,
namely the ones whose pth power is integrable.
Before we do anything else, we define the so-called $L^p$ norms.
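$\|f\|_p := \begin{cases} \left(\int_X dm(x)\, |f(x)|^p\right)^{\frac{1}{p}} & \text{if } p \neq \infty \\ \sup\{y \in \mathbb{R} : m(f^{-1}([y, \infty])) > 0\} & \text{if } p = \infty. \end{cases}$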
10In fact, the idea of studying spaces by instead studying functions defined on
those spaces is quite a pervasive theme in all of mathematics.
a $\mathbf{Hil}_{\mathbb{R}}$ is the category of (real) Hilbert spaces. If you know what this is,
great—here’s another cookie *gives yet another cookie to the reader*. If
not, don’t worry about it for now—as long as you intuitively see why this is
the same as the Euclidean norm, you’re fine.
$\lim_{p \to \infty} \|f\|_p = \|f\|_\infty.$ (5.3.4)
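To see (5.3.4) in action, here is a minimal sketch on a finite set carrying a weighted measure of total mass 1. Both the finite setting and the helper names are assumptions made purely for illustration; in particular the supremum norm is computed as the largest |f(x)| over points of positive weight.

```python
def p_norm(values, weights, p):
    """(sum_x w(x) |f(x)|^p)^(1/p) for a function on a finite set."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)


def sup_norm(values, weights):
    """Largest |f(x)| over the points that carry positive weight."""
    return max(abs(v) for v, w in zip(values, weights) if w > 0)


values = [1.0, -3.0, 2.0]
weights = [0.5, 0.25, 0.25]   # a probability measure on three points
```

Here `p_norm(values, weights, p)` climbs toward `sup_norm(values, weights)`, which is 3, as p grows, although the convergence is fairly slow.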
$\|fg\|_1 \leq \|f\|_p \|g\|_q.$ (5.3.8)
as desired.
Exercise 5.3.13 Prove the result for p = 1 and q = ∞.
a I feel as if this notation, while consistent, is a bit obtuse—|f| is
the function defined by $x \mapsto |f(x)|$, and $\|f\|_1$ is the number defined by
$\int_X dm(x)\, |f(x)|$.
b Because we have assumed that | f | = 1 = | f | .
p q
$\|f + g\|_p \leq \|f\|_p + \|g\|_p.$ (5.3.18)
$\|(f + g)^p\|_1 = \|f(f + g)^{p-1} + g(f + g)^{p-1}\|_1$
$\overset{a}{\leq} \|f(f + g)^{p-1}\|_1 + \|g(f + g)^{p-1}\|_1$
$\overset{b}{\leq} \|f\|_p \|(f + g)^{p-1}\|_q + \|g\|_p \|(f + g)^{p-1}\|_q$
$= \|(f + g)^{p-1}\|_q \left(\|f\|_p + \|g\|_p\right),$
so that
$\|f + g\|_p := \|(f + g)^p\|_1^{\frac{1}{p}} = \|(f + g)^p\|_1^{1 - \frac{1}{q}} \leq \|f\|_p + \|g\|_p,$ (5.3.20)
as desired.
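A quick numerical check of Hölder's and Minkowski's Inequalities can be reassuring. The sketch below assumes a finite set with arbitrary nonnegative weights and conjugate exponents 1/p + 1/q = 1 with 1 < p < ∞; the helper names are hypothetical.

```python
def p_norm(values, weights, p):
    """(sum_x w(x) |f(x)|^p)^(1/p) for a function on a finite weighted set."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)


def holder_gap(f, g, weights, p):
    """||f||_p ||g||_q - ||fg||_1 with q the conjugate exponent of p."""
    q = p / (p - 1)
    product = [a * b for a, b in zip(f, g)]
    return p_norm(f, weights, p) * p_norm(g, weights, q) - p_norm(product, weights, 1)


def minkowski_gap(f, g, weights, p):
    """||f||_p + ||g||_p - ||f + g||_p."""
    total = [a + b for a, b in zip(f, g)]
    return p_norm(f, weights, p) + p_norm(g, weights, p) - p_norm(total, weights, p)
```

Both gaps should come out nonnegative for any choice of data, which is exactly what the two inequalities assert.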
Exercise 5.3.21 Do the cases p = 1 and p = ∞.
a By the usual Triangle Inequality.
b By Hölder’s Inequality.
This is just the statement that the ‘norm’ $\|\cdot\|_p$ is in fact a norm,
the Triangle Inequality being the only nonobvious axiom that needs
to be satisfied. Now that we know this, we are free to define the $L^p$
spaces.
$a_0 := \langle 1, 0, 0, 0, \ldots\rangle$
$a_1 := \langle 0, 1, 0, 0, \ldots\rangle$
$a_2 := \langle 0, 0, 1, 0, \ldots\rangle$ (5.3.29)
etc.
$v \mapsto (\varphi \mapsto \varphi(v))$ (6.1.2.4)
is a natural isomorphism.
R
For this reason, in this context, we do
not distinguish between [V † ]† and V.
Theorem D.3.2.6 for the “official” version. For us, it will be useful to
think of tensor products in the following way.
Let V and W be real vector spaces, and let v ∈ V and w ∈ W.
Then, there is a ‘thing’ v ⊗ w, the tensor product of v and w, and
V ⊗ W is the vector space spanned by these things. Students often
ask “Okay, but what actually is v ⊗ w?”. The answer is that V ⊗ W is
uniquely defined by a property it satisfies (just as was the case with N,
Z, Q, and R), and that space is spanned by elements of the form v ⊗ w.
For example, if you asked me “But what actually is a real number?”,
I could tell you “A real number is a Dedekind cut.”, but not only is
this a borderline lie, it’s also just not useful. We used Dedekind cuts
to construct the real numbers, and, several hundred pages of theory
later, we haven’t needed to use them since. Dedekind cuts served
the purpose of proving that there was an object R that satisfied the
properties we claimed it did. Similarly, a more explicit construction
of v ⊗ w really only serves to prove that an object V ⊗ W exists that
satisfies the properties we claim it does. After that, however, this
explicit construction doesn’t matter, and the only things used are those
properties which uniquely define V ⊗ W.
As it is difficult to reproduce the definition of the tensor product in
a way that is easy to digest, we refrain from reproducing the definition
here. You should consider V ⊗ W and v ⊗ w to be “officially” defined
as in Theorem D.3.2.6, and here we will only focus on the facts that
you need to know.
(ii).
v ⊗ (w1 + w2 ) = v ⊗ w1 + v ⊗ w2 (6.1.3.3)
(v1 + v2 ) ⊗ w = v1 ⊗ w + v2 ⊗ w (6.1.3.4)
V ⊗ W = Span{v ⊗ w : v ∈ V, w ∈ W }. (6.1.3.6)
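R In finite dimensions, Theorem 6.1.3.27 will tell us
that $\bigotimes^k_l V$ is naturally isomorphic to
$V^{k\otimes,\, l\otimes\dagger} := V^{k\otimes} \otimes [V^\dagger]^{l\otimes} := \underbrace{V \otimes \cdots \otimes V}_{k} \otimes \underbrace{V^\dagger \otimes \cdots \otimes V^\dagger}_{l}.$ (6.1.3.9)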
By now you’re well aware of the fact that people often abuse
notation and write “ f (x)” for the function f . Strictly speaking, f (x)
is the value of f at x, but it is very useful to use this notation as it
gives you some information about the domain. For instance, if context
is clear, you can infer that the domain of f (x, y) is R2 but the domain
of g(x, y, z) is R3 .
Similarly for a tensor T, we are going to ‘decorate’ “T” with
symbols that tell us what the domain of T is.4 In fact, we’ll also want
to “decorate” “T” with symbols to indicate the codomain as well.
This might make more sense to you if you use Theorem 6.1.3.27(iii)
to view a rank $\langle k, l\rangle$ tensor as a function with domain
$\underbrace{V^\dagger \times \cdots \times V^\dagger}_{k} \times \underbrace{V \times \cdots \times V}_{l},$ (6.1.3.10)
$T \in \operatorname{Mor}\Bigl(\underbrace{V \otimes \cdots \otimes V}_{k},\ \underbrace{V \otimes \cdots \otimes V}_{l}\Bigr)$ (6.1.3.12)
a If you don’t know what Einstein index notation is, don’t worry—
this remark is only to be helpful to those who might already have some
familiarity.
is denoted
$[T_1 \otimes T_2]^{a_1 \cdots a_{k_1} c_1 \cdots c_{k_2}}{}_{b_1 \cdots b_{l_1} d_1 \cdots d_{l_2}} =: [T_1]^{a_1 \ldots a_{k_1}}{}_{b_1 \ldots b_{l_1}}\, [T_2]^{c_1 \ldots c_{k_2}}{}_{d_1 \ldots d_{l_2}}.$ (6.1.3.15)
T ab v c = v c T ab . (6.1.3.16)
T abb = v a ”. (6.1.3.22)
Example 6.1.3.23
(i). Vectors (written $v^a$) themselves are tensors of type
⟨1, 0⟩.
(ii). Covectors (or linear functionals) (written $\omega_a$) are of
type ⟨0, 1⟩. For ω a linear functional and v a vector,
ω(v) is written as $\omega_a v^a$.
(iii). The dot product (written temporarily as $g_{ab}$) is an
example of a tensor of type ⟨0, 2⟩—it takes in two
vectors and spits out a number, written $v \cdot w = g_{ab} v^a w^b$.
(iv). Linear-transformations ($T^a{}_b$) are tensors of type ⟨1, 1⟩—they
take in a single vector and spit out another vector
(written $v^a \mapsto T^a{}_b v^b$).
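If the index gymnastics feel abstract, it may help to see that, once a basis is chosen, each contraction in the example above is just a sum over a repeated index. The sketch below assumes components are stored in plain Python lists (the first list index playing the role of the first tensor index); the function names are hypothetical.

```python
def pair(omega, v):
    """omega_a v^a: a covector eating a vector."""
    return sum(omega[a] * v[a] for a in range(len(v)))


def dot(g, v, w):
    """g_ab v^a w^b: a type <0, 2> tensor eating two vectors."""
    return sum(g[a][b] * v[a] * w[b] for a in range(len(v)) for b in range(len(w)))


def apply(T, v):
    """T^a_b v^b: a type <1, 1> tensor sending a vector to a vector."""
    return [sum(T[a][b] * v[b] for b in range(len(v))) for a in range(len(T))]
```

With g the identity matrix, `dot` reduces to the usual dot product, and `apply` is ordinary matrix-vector multiplication.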
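(i).
$\bigotimes^k_l V.$ (6.1.3.28)
(ii).
$\underbrace{V \otimes \cdots \otimes V}_{k} \otimes \underbrace{V^\dagger \otimes \cdots \otimes V^\dagger}_{l}.$ (6.1.3.29)
$\underbrace{V^\dagger \times \cdots \times V^\dagger}_{k} \times \underbrace{V \times \cdots \times V}_{l} \to K.$ (6.1.3.30)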
V † ⊗ V † 3 v a w b 7→ T axy v x w x ∈ V, (6.1.3.31)
T abc (6.1.3.32)
V † ⊗V ⊗V 3 ωa v b w c 7→ T xyz ωx v y w z . (6.1.3.33)
v a 7→ T abx v x . (6.1.3.34)
v a 7→ T axb v x . (6.1.3.35)
and a point. The problem in Rd is that the space of points and the
space of vectors are effectively the same thing: they are both given by
a d-tuple of real numbers, when in fact, they are really playing quite
different roles. The points tell you “where” we are and the vectors tell
you “what direction” to go in. In a general manifold, the points that
tell you “where” form the points of the space itself and the vectors do
not live in the entire space itself, but rather the tangent spaces.
Thus, while we have no intention of doing manifold theory in
general,6 we will make use of some suggestive notation that comes
from the theory.
$\lim_{v \to 0} \frac{|f(x + v) - f(x) - df_x(v)|}{\|v\|} = 0.$ (6.2.1.6)
Before moving on, I want to mention that, for all the fuss we’re
making about different types of differentiability, in practice, one
doesn’t need to worry. The reason for this is that, usually, either you
are interested in continuous things or you are interested in smooth
(Meta-definition 6.2.1.19) things—functions which are differentiable
and just differentiable are not as commonly used.
We’ve talked a while about directional derivatives and differentia-
bility, but we still have yet to define the derivative itself! It’s about
time we do this.
Proposition 6.2.1.14 — Derivative (of a function) Let D ⊆
$\mathbb{R}^d$ and let f : D → R. Then, if f is linearly-differentiable,
then there is a unique covector field on D, $\nabla_a f$, the derivative
of f, such that $v^a \nabla_a f(x) = D_v g(x)$ for all linearly-differentiable
extensions g of f, all x ∈ D, and all $v \in T_x(D)$.
f is continuously-differentiable iff it is linearly-differentiable
and $\nabla_a f$ is continuous.a
6 Differentiation 656
Tx (D) B v ∈ Tx (Rd ) : x + εv ∈ D
v a [ω1 ]a1 · · · [ωk ]ak [v1 ]b1 · · · [vl ]bl ∇aT a1 ...akb1 ...bl =
v a ∇a [ω1 ]a1 · · · [ωk ]ak [v1 ]b1 · · · [vl ]bl T a1 ...akb1 ...bl
$f(\langle x, y\rangle) := \begin{cases} 0 & \text{if } \langle x, y\rangle = \langle 0, 0\rangle \\ \frac{x^2 y}{x^2 + y^2} & \text{otherwise.} \end{cases}$ (6.2.1.23)
$v \mapsto \frac{v_x^2 v_y}{v_x^2 + v_y^2} =: D_v f(\langle 0, 0\rangle)$ (6.2.1.25)
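Here is a small numerical sketch of what (6.2.1.25) is saying: every directional derivative of the function in (6.2.1.23) exists at the origin, but the map v ↦ D_v f(⟨0, 0⟩) is not linear. The one-sided difference quotient and the function names are assumptions of the sketch.

```python
def f(x, y):
    """The function of (6.2.1.23): x^2 y / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x * x + y * y)


def directional_derivative_at_origin(vx, vy, eps=1e-6):
    """One-sided difference quotient (f(eps * v) - f(0)) / eps at the origin."""
    return (f(eps * vx, eps * vy) - f(0.0, 0.0)) / eps
```

The derivatives in the directions ⟨1, 0⟩ and ⟨0, 1⟩ both vanish, while the one in the direction ⟨1, 1⟩ is 1/2, so additivity in v fails.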
6.2 The definition 661
In fact, things that are arguably even worse than this can happen—
it can happen that all directional derivatives exist and furthermore the
map that sends a vector to the directional derivative in that direction
is linear, and yet the function is not even continuous. Unfortunately,
we will have to postpone such a counter-example until after having
defined the exponential function—see Example 6.5.1.
And now we present some familiar examples of derivatives of
tensors.
Example 6.2.1.26 — Gradient, divergence, curl, and
Laplacian The gradient, divergence, curl, and Laplacian
from multivariable calculus are all just special cases of deriva-
tives of tensors (along with other tensorial constructions such
as contraction).
Throughout this example, let f be a smooth function on $\mathbb{R}^d$
and let $v^a$ be a vector field on $\mathbb{R}^d$.
The gradient of f is simply $\nabla_a f$. You’ve already seen this
guy.
The divergence of $v^a$ is $\nabla_a v^a$. Note how in multivariable
calculus you might write this as $\vec{\nabla} \cdot \vec{v}$, the ‘dot product’ of the
gradient with the vector field $\vec{v}$. The index contraction here
hopefully makes obvious how this is the divergence that you
know and (maybe) love.
The curl is trickier, but this is not surprising, as it was
trickier than the rest in multivariable calculus as well. First
of all, the curl of a vector field is in general not a vector
6.4 Calculus
6.4.1 Tools for calculation
And now we come to the results which you are probably most familiar
with from calculus.
Proposition 6.4.1.1 — Algebraic Derivative Theorems
Let $f, g : \mathbb{R}^d \to \mathbb{R}$ be linearly-differentiablea and let α ∈ R.
Then,
(i). (Linearity) $\nabla_a(f + g) = \nabla_a f + \nabla_a g$;
(ii). (Homogeneity) $\nabla_a(\alpha f) = \alpha \nabla_a f$;
(iii). (Product Rule) $\nabla_a(fg) = (\nabla_a f)g + f(\nabla_a g)$;b and
(iv). (Quotient Rule) $\nabla_a \frac{f}{g} = \frac{(\nabla_a f)g - f(\nabla_a g)}{g^2}$ whenever g ≠ 0.
a Everything works just as well (with the same proof) if you just assume
it is differentiable in some direction at some point, but the notation is more
tedious.
b I would get in the habit of not mixing-up the order of f and g (e.g. by
writing $(\nabla_a f)g + (\nabla_a g)f$ or something of the like). It won’t matter for
us, but it can and will later when you’re working with things that are not
commutative (the cross-product of vectors is probably the most elementary
example).
$v^a \nabla_a [g \circ f](x) = \lim_{h \to 0} \frac{g(f(x + hv)) - g(f(x))}{h}$ (6.4.1.6)
and on the other hand
$(\nabla_\alpha g(f(x)))\,(v^a \nabla_a f^\alpha(x)) = \lim_{h \to 0} \frac{1}{h}\bigl[g(f(x) + h v^a \nabla_a f(x)) - g(f(x))\bigr]$ (6.4.1.7)
and so
f (x0 +hv) = f (x0 )+hK > f (x0 ) and f (x0 −hv) = f (x0 )−hL < f (x0 ), (6.4.2.4)
a Well,
of course the statement of the theorem doesn’t care what you’re
trying to do, but in practice this is used when you’re trying to maximize f
subject to the constraint g.
a In your calculus class, you may have seen it required that f , g, h are
continuous on [a, b] and differentiable on (a, b). However, recall that (see the
definition, Meta-definition 6.2.1.10) a function being differentiable means
that it is differentiable on a neighborhood of the domain. In particular, by
Proposition 6.5.8, our hypothesis implies that f , g, h are all continuous on
[a, b].
and f : [−1, 1] → R2 by
$f(x) := \begin{cases} x - \frac{1}{4} & \text{if } x \in \mathbb{Q} \\ x^2 & \text{if } x \in \mathbb{Q}^c. \end{cases}$ (6.4.2.23)
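We first show that the derivative at $\frac{1}{2}$ is 1.
$\frac{f(\frac{1}{2} + \varepsilon) - f(\frac{1}{2})}{\varepsilon} = \begin{cases} \frac{(\frac{1}{2} + \varepsilon) - \frac{1}{4} - \frac{1}{4}}{\varepsilon} & \text{if } \varepsilon \in \mathbb{Q} \\ \frac{(\frac{1}{2} + \varepsilon)^2 - \frac{1}{4}}{\varepsilon} & \text{if } \varepsilon \in \mathbb{Q}^c \end{cases} = \begin{cases} 1 & \text{if } \varepsilon \in \mathbb{Q} \\ 1 + \varepsilon & \text{if } \varepsilon \in \mathbb{Q}^c. \end{cases}$ (6.4.2.24)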
$\lim_{\varepsilon \to 0^+} \frac{f(\frac{1}{2} + \varepsilon) - f(\frac{1}{2})}{\varepsilon} = 1 > 0.$ (6.4.2.25)
the fact that $\frac{1}{2} + \delta < \frac{1}{2} + \varepsilon$, we nevertheless have that
Let us write
$L := \lim_{x \to b} \frac{\nabla f(x)}{\nabla g(x)}.$ (6.4.2.29)
Now, for every x ∈ (a, b), define
$m(x) := \inf_{t \in (x, b)} \frac{\nabla f(t)}{\nabla g(t)} \in [-\infty, \infty)$ (6.4.2.30)
$M(x) := \sup_{t \in (x, b)} \frac{\nabla f(t)}{\nabla g(t)} \in (-\infty, \infty]$ (6.4.2.31)
It follows that $\limsup_{x \to b} \frac{f(x)}{g(x)} = L$. Similarly,
$\liminf_{x \to b} \frac{f(x)}{g(x)} = L$, and hence $\lim_{x \to b} \frac{f(x)}{g(x)}$ exists and is
equal to L by Proposition 2.4.3.31.
a The basic idea of the entire proof was that (i) it suffices to show that
$m(x) \leq \frac{f(x)}{g(x)} \leq M(x)$ and (ii) this holds by the Ratio Mean Value Theorem
and the fact that f(y) and g(y) are ‘negligible’. Thus, the remainder of
the proof boils down to showing that f(y) and g(y) are ‘negligible’ in an
appropriate sense.
b We must use lim sup because we do not know a priori that this limit
exists.
Well, sort of. If we’re not careful about what we mean, this can
completely fail. That’s right—your calculus teachers lied to you. You
should sue them.13
Example 6.4.3.2 — A uniformly-continuous function on
[0, 1] that is 0 at 0, 1 at 1, but whose derivative is 0 almost-
everywhere—the Devil’s Staircase We’ve seen this bad boy
before—see Example 5.1.5.54.a The construction is relatively
long, and so we don’t repeat it here, but if it helps you remember,
it’s that function which is constant on the components of the
complement of the Cantor Set in [0, 1]. In fact, as the Cantor
Set has measure 0, it follows from this fact that the derivative
of the Devil’s Staircase is 0 almost-everywhere. Hence, on
13I’m exaggerating for dramatic effect. (So dramatic, isn’t it?) When you first took
calculus, the idea that a derivative would be defined only almost-everywhere was
probably just not a thing.
one hand we have that $\int_0^1 dx\, \frac{d}{dx} f(x) = 0$, but on the other
hand we have f(1) − f(0) = 1. Turns out that a lot can happen
on a set of measure 0!
In fact, you can modify this by adding x to obtain a uniform-
homeomorphism (the Cantor Function—see the same example,
Example 5.1.5.54) which has derivative 1 almost-everywhere,
but yet goes from 0 to not 1, but 2, on [0, 1]. The point
is, you can have ridiculously nice functions for which the
Fundamental Theorem of Calculus fails if the derivative is
only required to exist almost-everywhere.
a There, it was used to generate an example of a uniform-homeomorphism
of R that preserves neither measurability nor measure 0.
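For the curious, the Devil's Staircase is easy to evaluate numerically. The sketch below uses the standard ternary-digit description of the function, which is an assumption here since the construction in Example 5.1.5.54 is not repeated in this section; the function names are hypothetical, and floating-point inputs are only approximations of the real numbers they stand for.

```python
def devils_staircase(x, depth=40):
    """Evaluate the Devil's Staircase on [0, 1] by reading ternary digits of x.

    Digits 0 and 2 become binary digits 0 and 1; the first ternary digit 1
    means x lies in a removed middle third, where the function is constant.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:
            return total + scale
        total += scale * (digit // 2)
        scale /= 2
    return total


def stretched_staircase(x):
    """The staircase plus x: goes from 0 to 2 on [0, 1]."""
    return devils_staircase(x) + x
```

The staircase is flat across the whole middle third (so 0.4 and 0.6 give the same value, 1/2), yet it still climbs from 0 to 1.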
$\frac{d}{dx} \int_a^x dt\, f(t) = f(x)$ (6.4.3.4)
16Take any function f, any point x ∈ R, and any ‘value’ y ∈ R—you can redefine
f to be equal to y at x, and you will have no effect on its integral, so that F(x) will
not change, but f(x) will, and so we will no longer have that $\frac{d}{dx} F(x) = f(x)$.
Then, if $t \mapsto \frac{d}{dt} f(t)$ is integrable on (a, b), then
$f(x) - f(a) = \int_a^x dt\, \frac{d}{dt} f(t).$ (6.4.3.7)
R If $\frac{d}{dx} F(x) = f(x)$, then F is an antiderivative or
primitive of f. The practical use of this result, as
described in the previous remark, is that you can
compute integrals by finding an antiderivative of the
integrand.
Proof.a Suppose that $x \mapsto \frac{d}{dx} f(x)$ is integrable on (a, b).
Without loss of generality, assume that x = b. Let ε > 0.
Then, by the Carathéodory-Vitali Theorem (Theorem 5.2.3.95),
there is a lower-semicontinuous function g : (a, b) → R such
that (i) $\frac{d}{dt} f \leq g$ and (ii)
$\int_a^b dt\, g(t) < \int_a^b dt\, \frac{d}{dt} f(t) + \varepsilon.$ (6.4.3.8)
$g(t) > \frac{d}{dt} f(x)$ and $\frac{f(t) - f(x)}{t - x} < \frac{d}{dt} f(x) + \eta$ (6.4.3.10)
for all t ∈ (x, x + δx ). Thus, for t ∈ (x, x + δx ), we have
$F_\eta(t) - F_\eta(x) = \int_x^t ds\, g(s) - [f(t) - f(x)] + \eta(t - x)$
$> (t - x)\frac{d}{dt} f(x) - (t - x)\left[\frac{d}{dt} f(x) + \eta\right] + \eta(t - x)$
$= 0.$
Define
and so
$f(b) - f(a) \leq \int_a^b dt\, \frac{d}{dt} f(t).$ (6.4.3.13)
Hence,
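$\int_D d\langle x, y\rangle\, f(x, y) = \int_{[0,1] \times [0,1]} d\langle x, y\rangle\, \chi_D(x, y)\, f(x, y)$
$= \int_0^1 dx \int_0^1 dy\, \chi_{[0, x]}(y)\, f(x, y)$
$=: \int_0^1 dx \int_0^x dy\, x^3 y$
$= \int_0^1 dx\, x^3 \left[\tfrac{1}{2} y^2\right]_0^x$
$= \tfrac{1}{2} \int_0^1 dx\, x^5$
$= \tfrac{1}{2} \left[\tfrac{1}{6} x^6\right]_0^1$
$= \tfrac{1}{12}.$ (6.4.3.18)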
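If you don't trust the computation above, you can ask a computer. The sketch below assumes, as the limits of integration in (6.4.3.18) suggest, that D is the triangle {⟨x, y⟩ : 0 ≤ y ≤ x ≤ 1} and that the integrand is x³y; it mimics the step that inserts the characteristic function of D and integrates over the whole square. The function name is hypothetical.

```python
def integrate_over_triangle(f, n=400):
    """Midpoint Riemann sum of chi_D * f over [0, 1] x [0, 1], D = {0 <= y <= x <= 1}."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if y <= x:              # the characteristic function of D
                total += f(x, y)
    return total * h * h
```

Calling `integrate_over_triangle(lambda x, y: x ** 3 * y)` should return something close to 1/12 ≈ 0.0833, with a small error coming from the cells straddling the diagonal.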
Of course, you already know the answer—or so you think you do.
Can you define exp(x)? For what it’s worth, you do have the tools
to do so at this point, but it’s quite likely that someone told you
once upon a time that e ≈ 2.718 . . . and then $\exp(x) := e^x$. If it’s
not clear to you at this point that this is just complete and utter
nonsense, then apparently I’m not very good at writing mathematical
exposition.17 What you could do, however, is define exp by its power-series,
$\exp(x) := \sum_{m \in \mathbb{N}} \frac{x^m}{m!}$, but where did that formula come from?
The proper way to define the exponential function is that it is the
17Or maybe you’re just stupid. (In case it’s not obvious, I’m not being serious. . . .)
unique function from R to R that (i) is equal to its own derivative and
(ii) is 1 at 0.18 Of course, as always, we can’t just go around asserting
things like this exist willy-nilly. Who can even comprehend the chaos
that might ensue? We must prove that such a thing exists, and that
only one such thing exists. The tool that will allow us to do this is a
theorem (Theorem 6.4.5.15) concerning the existence and uniqueness
of solutions to (ordinary) differential equations. This result doesn’t just assert the
existence of any old functions, however. It asserts the existence of
analytic functions, and so before we get to the result about solutions
to differential equations, we first briefly investigate analytic functions.
Analytic functions
.
is
18The unique function that is equal to its own derivative and is equal to 0 at 0
is the function that is everywhere 0. 1 is the next most obvious choice, as opposed to,
say, $\sqrt{\pi}$.
is the unique real number $r_0$ such that (i) the series converges
absolutely for all x ∈ R with $|x - x_0| < r_0$ and (ii) diverges for all
x ∈ Rd with $|x - x_0| > r_0$.
Furthermore,
$r_0 = \frac{1}{\limsup_m |c_m|^{1/m}}$ (6.4.5.6)
You’ll recall from calculus that a lot of the functions you worked
with (e.g. ln, arctan, etc.) were defined as inverses to other functions.
It will thus be of interest to us to know that inverses of analytic
functions are analytic.
Step 2: Show smoothness given existence
For the proof, we do not make use of index notation.
Step 3: Show existence
Define $\gamma_m : \mathbb{R} \to \mathbb{R}^d$ by
$\gamma_m(t) := \sum_{k=0}^{m} \frac{A^k t^k}{k!} \gamma_0.$ (6.4.5.17)
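To get a feel for the partial sums in (6.4.5.17), here is a minimal sketch that evaluates them for a concrete 2 × 2 matrix. The particular A and γ_0 are assumptions chosen so that the exact solution of γ′ = Aγ is known in closed form (it is ⟨cos t, −sin t⟩); the function names are hypothetical.

```python
import math


def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def gamma_m(A, gamma0, t, m):
    """Partial sum sum_{k=0}^{m} (A^k t^k / k!) gamma0, built one term at a time."""
    term = list(gamma0)                                # the k = 0 term
    total = list(gamma0)
    for k in range(1, m + 1):
        term = [t / k * c for c in mat_vec(A, term)]   # now A^k t^k / k! applied to gamma0
        total = [s + c for s, c in zip(total, term)]
    return total


A = [[0.0, 1.0], [-1.0, 0.0]]
gamma0 = [1.0, 0.0]
```

Already `gamma_m(A, gamma0, 1.0, 25)` agrees with ⟨cos 1, −sin 1⟩ to many decimal places, which is the convergence the proof goes on to establish in general.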
Ak k (6.4.5.19)
m−1 m
Õ Ak−1 Õ
= A lim t k−1 = A lim t
m
k =1
(k − 1)! m
k =0
k!
= Aγ.
And the other condition follows easily from the fact that
γm (0) = γ0 for all m ∈ N. By construction,a γ is the limit
of its Taylor series in C ∞ (R), and therefore, by the previous
exercise, is globally-analytic.
Step 4: Show uniqueness
by
$[T(f)](t) := \gamma_0 + A \int_0^t ds\, f(s).$ (6.4.5.22)
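The map T in (6.4.5.22) can be iterated by hand in the simplest case. The sketch below assumes the scalar case d = 1 with A = 1 and γ_0 = 1, and represents polynomials by their lists of coefficients (exact rationals); the function name and this representation are hypothetical choices.

```python
from fractions import Fraction


def picard_step(coeffs, gamma0=Fraction(1), A=Fraction(1)):
    """Apply T(f)(t) = gamma0 + A * int_0^t f(s) ds to a polynomial.

    coeffs[k] is the coefficient of t^k; integrating term by term shifts
    each coefficient up one degree and divides by the new degree.
    """
    integrated = [A * c / (k + 1) for k, c in enumerate(coeffs)]
    return [gamma0] + integrated


f = [Fraction(1)]        # start from the constant function gamma0 = 1
for _ in range(4):
    f = picard_step(f)
```

After four steps `f` holds the coefficients 1, 1, 1/2, 1/6, 1/24, that is, the iterates of T are exactly the Taylor partial sums of exp in this case.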
L
≤L· · k f − gk = 12 k f − gk,
2
so that this is indeed a contraction mapping.
a Almost—you have to check that this is actually the Taylor series of the
function.
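Proof. For 0 ≤ k ≤ m − 1, define $g_k := \frac{d^k}{dx^k} f$. Then,
(6.4.5.25) together with these definitions gives us a system of
m first-order differential equations:
$\frac{d}{dt} \begin{pmatrix} g_0 \\ \vdots \\ g_{m-1} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\frac{a_0}{a_m} & -\frac{a_1}{a_m} & -\frac{a_2}{a_m} & \cdots & -\frac{a_{m-1}}{a_m} \end{pmatrix} \begin{pmatrix} g_0 \\ \vdots \\ g_{m-1} \end{pmatrix}$ (6.4.5.26)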
upon defining $g_0 := f$ and $g_1 := \frac{d}{dx} f = \frac{d}{dx} g_0$, then our system
would read
$\frac{d}{dx} g_0 = 1 \cdot g_1$
$\frac{d}{dx} g_1 = -2 \cdot g_0 + 3 \cdot g_1.$ (6.4.5.27)
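The reduction to a first-order system is mechanical enough to write down as code. The sketch below assumes the original equation has the form a_m f^(m) + ... + a_1 f′ + a_0 f = 0 with a_m ≠ 0, which is inferred from the last row of the matrix in (6.4.5.26) rather than quoted from (6.4.5.25); the function name is hypothetical.

```python
def companion(a):
    """Matrix of the first-order system for a_m f^(m) + ... + a_1 f' + a_0 f = 0.

    a = [a_0, ..., a_m] with a_m != 0. Row i (for i < m - 1) encodes
    g_i' = g_{i+1}; the last row solves the equation for g_{m-1}'.
    """
    m = len(a) - 1
    rows = [[1 if j == i + 1 else 0 for j in range(m)] for i in range(m - 1)]
    rows.append([-a[j] / a[m] for j in range(m)])
    return rows
```

For instance, `companion([2, -3, 1])` (that is, f″ − 3f′ + 2f = 0, an equation assumed here because it reproduces the coefficients in (6.4.5.27)) gives the rows [0, 1] and [−2, 3].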
The definitions
Definition 6.4.5.28 — Exponential function The exponential
function, exp, is the unique function exp : R → R such
that (i) $\frac{d}{dx} \exp(x) = \exp(x)$ and (ii) exp(0) = 1.
Looks like we’re just going to have to come up with a new name for
such a function. “Cosine” sounds rad. Let’s call it that.
Definition 6.4.5.31 — Cosine and sine The cosine function,
cos, is the unique function cos : R → R such that
(i) $\frac{d^2}{dx^2} \cos(x) = -\cos(x)$, (ii) cos(0) = 1, and (iii) $\frac{d}{dx} \cos(0) = 0$.
The sine function, sin, is the unique function sin : R → R
such that (i) $\frac{d^2}{dx^2} \sin(x) = -\sin(x)$, (ii) sin(0) = 0, and
(iii) $\frac{d}{dx} \sin(0) = 1$.
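Since cosine and sine are defined here purely by a differential equation and initial conditions, one can check that this really does pin down the functions you know by solving the equation numerically. The sketch below uses the classical fourth-order Runge-Kutta method on the equivalent first-order system; the method and the function name are choices made for this illustration, not part of the definition.

```python
import math


def solve_f_double_prime_equals_minus_f(f0, df0, x, steps=2000):
    """Approximate f(x) where f'' = -f, f(0) = f0, f'(0) = df0, using RK4.

    The equation is rewritten as the system (g0, g1)' = (g1, -g0).
    """
    h = x / steps
    g0, g1 = f0, df0
    for _ in range(steps):
        k1 = (g1, -g0)
        k2 = (g1 + h / 2 * k1[1], -(g0 + h / 2 * k1[0]))
        k3 = (g1 + h / 2 * k2[1], -(g0 + h / 2 * k2[0]))
        k4 = (g1 + h * k3[1], -(g0 + h * k3[0]))
        g0 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        g1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return g0
```

Initial conditions (1, 0) reproduce the library cosine and (0, 1) reproduce the library sine to high accuracy.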
Corollary 6.4.5.33
(i). exp(x + y) = exp(x) exp(y) for all x, y ∈ R.
(ii). $\exp(x)^y = \exp(xy)$ for all x, y ∈ R.
for all m ∈ N.
In particular, as $\frac{d}{dx} \exp = \exp > 0$, exp is increasing, and in
particular, injective. This allows us to make the following definition.
Exercise 6.4.5.38
(i). Show that ln(x y) = ln(x) + ln(y) for all x, y ∈ R+ .
(ii). Show that $x \ln(y) = \ln(y^x)$ for all x ∈ R and y ∈ R+ .
Exercise 6.4.5.39 Show that $\frac{d}{dx} \ln(x) = \frac{1}{x}$.
Show that $\frac{d^m}{dx^m} f(0) = 0$ for all m ∈ N.
Exercise 6.4.5.44 Show that $\frac{d}{dx} \sin = \cos$ and $\frac{d}{dx} \cos = -\sin$.
for all x, y ∈ R.
$\cos\left(x + \frac{\tau}{4}\right) = -\sin(x)$
$\sin\left(x + \frac{\tau}{4}\right) = \cos(x).$ (6.4.5.49)
Proof. It follows from the definition that $\frac{d^2}{dx^2} \cos(0) = -\cos(0) = -1$.
We first show that $\frac{d^2}{dx^2} \cos$ is not everywhere
negative. If this were the case, then $\frac{d}{dx} \cos$ would be (strictly)
decreasing. As $\frac{d}{dx} \cos(0) = 0$, there would then be some
$x_0 > 0$ with $\frac{d}{dx} \cos(x_0) =: y_0 < 0$. Then, because the derivative
is decreasing again, we would have that $\frac{d}{dx} \cos(t) \leq y_0$ for
all $t \geq x_0$. Hence,
$\cos(x) - \cos(0) = \int_0^x dt\, \frac{d}{dt} \cos(t)$
$= \int_{x_0}^x dt\, \frac{d}{dt} \cos(t) + \int_0^{x_0} dt\, \frac{d}{dt} \cos(t)$
$\leq y_0 \cdot (x - x_0) + \int_0^{x_0} dt\, \frac{d}{dt} \cos(t).$ (6.4.5.50)
By taking x sufficiently large, we would obtain the existence
of some point x ∈ R with cos(x) < −1: a contradiction.
Therefore, the set $\left\{x \in \mathbb{R}^+ : \frac{d^2}{dx^2} \cos(x) = 0\right\}$ is nonempty,
and so we may define
$\theta := \inf\left\{x \in \mathbb{R}^+ : \frac{d^2}{dx^2} \cos(x) = 0\right\}.$ (6.4.5.51)
By continuity, we have that $\frac{d^2}{dx^2} \cos(\theta) = 0$, and so cos(θ) = 0
addition formula’
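$1 = \cos(0) = \cos(T) = \cos\left(\tfrac{T}{4} + \tfrac{T}{4} + \tfrac{T}{4} + \tfrac{T}{4}\right)$
$\overset{a}{=} \cos^4\left(\tfrac{T}{4}\right) - 6 \cos^2\left(\tfrac{T}{4}\right) \sin^2\left(\tfrac{T}{4}\right) + \sin^4\left(\tfrac{T}{4}\right)$
$= C^4 - 6C^2(1 - C^2) + (1 - C^2)^2 = 8C^4 - 8C^2 + 1,$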
C 2 − 1 = 0, (6.4.5.52)
a Algebra omitted. It’s tedious.
Definition 6.4.5.55 — tan, cot, sec, and csc Define $\tan, \sec : \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \to \mathbb{R}$
and cot, csc : (0, π) → R by
(i). $\tan := \frac{\sin}{\cos}$;
(ii). $\cot := \frac{\cos}{\sin}$;
(iii). $\sec := \frac{1}{\cos}$; and
(iv). $\csc := \frac{1}{\sin}$.
Exercise 6.4.5.57 Show that the image of both tan and cot is
R.
R Combinatorially, $\binom{m}{k_1, \ldots, k_n}$ represents the number
of ways in which you can place m objects into n
boxes, with $k_1$ objects in the first box, $k_2$ objects in
the second box, etc.
.
For α ∈ Nn and x ∈ Rn , we write
.
For $\alpha \in \mathbb{N}^n$ and m ∈ N, we write
$\binom{m}{\alpha} := \binom{m}{\alpha_1, \ldots, \alpha_n}.$ (6.4.5.66)
by
a That said, it’s worth noting that if you do allow the complex numbers,
this can be made to work for all x, y ∈ R (and all a ∈ C). This is why we
have included the absolute values here, even though they are not strictly
necessary—they will be necessary when you go to generalize.
$\int_1^\infty dx\, \frac{1}{x^p} = \left[\frac{1}{1 - p} x^{1-p}\right]_1^\infty = 0 - \frac{1}{1 - p} = \frac{1}{p - 1},$ (6.4.5.77)
The irrationality of e
We mentioned a long time ago way back in Chapter 2 that e was
irrational. Now that we (finally!) know what e is, it is time to return
to this issue.
Theorem 6.4.5.79. e is irrational.
Proof. Step 1: Express e as a series
As $\exp(x)$ is globally-analytic and $\frac{d}{dx}\exp(x) = \exp(x)$, the
Taylor series of exp centered at 0 gives in particular
\[
e := \exp(1) = \sum_{m \in \mathbb{N}} \frac{1}{m!}. \tag{6.4.5.80}
\]
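Step 2: Show that $e \notin \mathbb{Z}$
We first check that e is not an integer. To do that, we show that
it is strictly larger than 2 and strictly less than 3. That it is
strictly greater than 2 is obvious: e = 1 + 1 + positive stuff > 2.
On the other hand, to see that e < 3, note that $2^m \le m!$ for
$m \ge 4$, so that
\[
\begin{aligned}
e &= 1 + 1 + \frac{1}{2} + \frac{1}{6} + \sum_{m=4}^{\infty} \frac{1}{m!} \\
&\le 1 + 1 + \frac{1}{2} + \frac{1}{6} + \sum_{m=4}^{\infty} \frac{1}{2^m} \\
&= 1 + 1 + \frac{1}{2} + \frac{1}{6} + 2^{-4} \sum_{m=0}^{\infty} \frac{1}{2^m} \\
&= 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{2^{-4}}{1 - \frac{1}{2}} \\
&= 1 + 1 + \frac{1}{2} + \frac{1}{6} + 2^{-3} = 1 + 1 + \frac{19}{24} < 3.
\end{aligned}
\tag{6.4.5.81}
\]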
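The arithmetic in this step is easy to confirm with exact rational numbers. The sketch below is our own illustration (the names `partial_sum`, `upper_bound`, and `termwise_ok` are hypothetical); it checks the termwise comparison $2^m \le m!$ on a finite range only, and compares partial sums of the series, not e itself, against the bound.

```python
from fractions import Fraction
from math import factorial


def partial_sum(n):
    """Exact value of sum_{m=0}^{n} 1/m!."""
    return sum(Fraction(1, factorial(m)) for m in range(n + 1))


# The upper bound derived in the text: 1 + 1 + 1/2 + 1/6 + 2^{-3}.
upper_bound = 1 + 1 + Fraction(1, 2) + Fraction(1, 6) + Fraction(1, 8)

# Termwise comparison used in the text, checked for 4 <= m < 40 only.
termwise_ok = all(2**m <= factorial(m) for m in range(4, 40))
```

Every partial sum of the series stays below `upper_bound`, which in turn is strictly less than 3.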
Step 3: Proceed by contradiction
Suppose that $e = \frac{m}{n}$ for $m \in \mathbb{Z}$, $n \in \mathbb{Z}^+$, and $\gcd(m, n) = 1$.
Note that, as we now know that e is not an integer, $n \ge 2$.
Step 4: Define $x \in \mathbb{R}$
Define
\[
x := \sum_{k=n+1}^{\infty} \frac{n!}{k!}. \tag{6.4.5.82}
\]
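Step 5: Show that $x \in \mathbb{Z}$
Note that
\[
\begin{aligned}
x &:= n! \sum_{k=n+1}^{\infty} \frac{1}{k!} \\
&= n! \left( \sum_{k=0}^{\infty} \frac{1}{k!} - \sum_{k=0}^{n} \frac{1}{k!} \right) = n! \left( e - \sum_{k=0}^{n} \frac{1}{k!} \right) \\
&= n! \left( \frac{m}{n} - \sum_{k=0}^{n} \frac{1}{k!} \right) \\
&= m(n-1)! - \sum_{k=0}^{n} \frac{n!}{k!} \in \mathbb{Z}.
\end{aligned}
\tag{6.4.5.83}
\]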
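Hence,
\[
\begin{aligned}
x := \sum_{k=n+1}^{\infty} \frac{n!}{k!} &\le \sum_{k=n+1}^{\infty} \frac{1}{(n+1)^{k-n}} \\
&= \sum_{k=0}^{\infty} \frac{1}{(n+1)^{k+1}} = \frac{1}{n+1} \cdot \frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n} < 1.
\end{aligned}
\tag{6.4.5.85}
\]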
Step 7: Finish the proof
The last two steps give us our contradiction, and so it must
have been that $e \notin \mathbb{Q}$.
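The estimate in (6.4.5.85) can be sanity-checked on truncated sums. This is a sketch under our own assumptions: `tail` and `geometric_bound` are hypothetical names, and both series are cut off after finitely many terms, so the check illustrates the termwise comparison and does not replace the proof.

```python
from fractions import Fraction
from math import factorial


def tail(n, terms=30):
    """Truncation of sum_{k=n+1}^infinity n!/k! after `terms` summands."""
    return sum(
        Fraction(factorial(n), factorial(k))
        for k in range(n + 1, n + 1 + terms)
    )


def geometric_bound(n, terms=30):
    """Same truncation of the comparison series sum 1/(n+1)^(k-n)."""
    return sum(
        Fraction(1, (n + 1) ** (k - n))
        for k in range(n + 1, n + 1 + terms)
    )
```

For each $n \ge 2$ one expects `0 < tail(n) <= geometric_bound(n) < Fraction(1, n)`, so the quantity x cannot be an integer.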
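\[
f(\langle x, y \rangle) := \begin{cases} 0 & \text{if } x = 0 \\[4pt] \dfrac{\exp\left(-\frac{1}{x^2}\right) y}{\exp\left(-\frac{2}{x^2}\right) + y^2} & \text{otherwise.} \end{cases} \tag{6.5.2}
\]
\[
f(x_t) = f\left(\left\langle t, \exp\left(-\tfrac{1}{t^2}\right) \right\rangle\right) = \frac{\exp\left(-\frac{2}{t^2}\right)}{\exp\left(-\frac{2}{t^2}\right) + \exp\left(-\frac{2}{t^2}\right)} = \frac{1}{2} \ne 0 = f(\langle 0, 0 \rangle), \tag{6.5.3}
\]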
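\[
\frac{1}{\varepsilon}\, \frac{\exp\left(-\frac{1}{(\varepsilon v_x)^2}\right) (y + \varepsilon v_y)}{\exp\left(-\frac{2}{(\varepsilon v_x)^2}\right) + (y + \varepsilon v_y)^2}
= \frac{1}{\varepsilon}\, \frac{y + \varepsilon v_y}{\exp\left(-\frac{1}{(\varepsilon v_x)^2}\right) + (y + \varepsilon v_y)^2 \exp\left(\frac{1}{(\varepsilon v_x)^2}\right)}.
\]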
Hence,
\[
D_v f(\langle 0, y \rangle) := \lim_{\varepsilon \to 0^+} \frac{f(\langle 0, y \rangle + \varepsilon v) - f(\langle 0, y \rangle)}{\varepsilon} = 0. \tag{6.5.5}
\]
Proof. Consider
\[
f(x) - f(x_0) = \frac{f(x) - f(x_0)}{x - x_0}\, (x - x_0). \tag{6.5.9}
\]
\[
f(x) := \begin{cases} x \sin\left(\frac{1}{x}\right) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \tag{6.5.15}
\]
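\[
f_0(x) := \begin{cases} x & \text{if } x \in [0, \frac{1}{2}] \\ 1 - x & \text{if } x \in [\frac{1}{2}, 1]. \end{cases} \tag{6.5.22}
\]
and extend $f_0 : \mathbb{R} \to \mathbb{R}$ periodically with period 1. For $k \in \mathbb{N}$,
now define $f_k : \mathbb{R} \to \mathbb{R}$ by
\[
f_k(x) := \frac{1}{4^k} f_0(4^k x). \tag{6.5.23}
\]
Then define $s_m : \mathbb{R} \to \mathbb{R}$ by
\[
s_m(x) := \sum_{k=0}^{m} f_k(x). \tag{6.5.24}
\]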
and will automatically have that s ∈ MorTop (R, R) (that is, that
f is continuous). We will then show that f is not differentiable
anywhere.
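We thus verify Cauchyness. For n > m, we have
\[
\begin{aligned}
|s_n(x) - s_m(x)| &= \left| \sum_{k=m+1}^{n} f_k(x) \right| < \sum_{k=m+1}^{n} \frac{1}{4^k} \\
&\le \sum_{k=m+1}^{\infty} \frac{1}{4^k} = \frac{1}{1 - \frac{1}{4}} - \sum_{k=0}^{m} \frac{1}{4^k} \\
&= \frac{1}{1 - \frac{1}{4}} - \frac{1 - \left(\frac{1}{4}\right)^{m+1}}{1 - \frac{1}{4}} = \frac{4}{3} \cdot \frac{1}{4^{m+1}}.
\end{aligned}
\tag{6.5.26}
\]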
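The Cauchy estimate can be observed numerically. The following sketch uses our own function names (`f0`, `fk`, `s`, `cauchy_bound`) for the sawtooth, its rescalings, the partial sums, and the right-hand side of (6.5.26); it works in floating point, so it is an illustration and not a proof.

```python
import math


def f0(x):
    """Sawtooth: x on [0, 1/2], 1 - x on [1/2, 1], extended with period 1."""
    r = x - math.floor(x)
    return r if r <= 0.5 else 1.0 - r


def fk(k, x):
    """f_k(x) = 4^{-k} f0(4^k x)."""
    return f0(4**k * x) / 4**k


def s(m, x):
    """Partial sum s_m(x) = f_0(x) + ... + f_m(x)."""
    return sum(fk(k, x) for k in range(m + 1))


def cauchy_bound(m):
    """Right-hand side of the estimate: (4/3) * 4^{-(m+1)}."""
    return (4.0 / 3.0) / 4 ** (m + 1)
```

For sample points x and indices n > m, `abs(s(n, x) - s(m, x))` should not exceed `cauchy_bound(m)`.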
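whenever $|\varepsilon'_m| \le |\varepsilon_m|$. Define $r_m := 4^m x - \lfloor 4^m x \rfloor$, so that
\[
\varepsilon_m := \frac{1}{4^{m+1}} \cdot \begin{cases} 1 & \text{if } r_m \in [0, \frac{1}{4}) \text{ or } r_m \in [\frac{1}{2}, \frac{3}{4}) \\ -1 & \text{if } r_m \in [\frac{1}{4}, \frac{1}{2}) \text{ or } r_m \in [\frac{3}{4}, 1). \end{cases} \tag{6.5.29}
\]
Obviously $|\varepsilon_m| = \frac{1}{4^{m+1}}$. For the other fact, notice that
\[
f_0(4^m x) = \begin{cases} r_m & \text{if } r_m \in [0, \frac{1}{2}) \\ 1 - r_m & \text{if } r_m \in [\frac{1}{2}, 1), \end{cases} \tag{6.5.30}
\]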
Thus,
\[
\begin{aligned}
f_0(4^m (x + \varepsilon_m)) - f_0(4^m x) &= f_0(r_m + 4^m \varepsilon_m) - f_0(r_m) \\
&= f_0(r_m \pm \tfrac{1}{4}) - f_0(r_m).
\end{aligned}
\tag{6.5.31}
\]
\[
|f_0(4^m (x + \varepsilon'_m)) - f_0(4^m x)| = 4^m |\varepsilon'_m|. \tag{6.5.32}
\]
\[
{} = \frac{1}{4^n} f_0\left(4^n x \pm 4^{n-(m+1)}\right) \overset{b}{=} \frac{1}{4^n} f_0(4^n x) = f_n(x). \tag{6.5.33}
\]
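Thus,
\[
f(x + \varepsilon_m) - f(x) = \sum_{k=0}^{\infty} [f_k(x + \varepsilon_m) - f_k(x)] = \sum_{k=0}^{m} [f_k(x + \varepsilon_m) - f_k(x)]. \tag{6.5.34}
\]
and so
\[
\begin{aligned}
|f_n(x + \varepsilon_m) - f_n(x)| &= 4^{m-n} \left| f_m\left( \frac{x}{4^{m-n}} + \frac{\varepsilon_m}{4^{m-n}} \right) - f_m\left( \frac{x}{4^{m-n}} \right) \right| \\
&\overset{c}{=} 4^{m-n} \cdot \frac{|\varepsilon_m|}{4^{m-n}} = |\varepsilon_m|
\end{aligned}
\tag{6.5.36}
\]
Thus,
\[
\Delta_m := \frac{f(x + \varepsilon_m) - f(x)}{\varepsilon_m} = \sum_{k=0}^{m} \frac{f_k(x + \varepsilon_m) - f_k(x)}{\varepsilon_m}. \tag{6.5.37}
\]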
In fact, if you thought this was bad, we can do even worse than this.
(This also wraps up a loose end when we motivated the introduction
of the topology on C ∞ (Rd ).)
x.
a Recall that $T_x(\mathbb{R}^d)$ is a metric vector space, with the metric being the
dot product. This metric induces a norm (the usual Euclidean norm), which
in turn induces a metric, which in turn induces a uniformity, which in turn
induces a topology.
While the next example doesn’t have to do with continuity per se,
I think it makes most sense to place it here, along with a series of other
counter-examples.
and define f : R2 → R by
(
sin y12 if x > 0 and x 2 < y < 3x 2
f (x, y) B (6.5.47)
0 otherwise.
g is certainly Fréchet-differentiable.
is not differentiable at t = 0.
v a ∇a (w b ∇b f ) (x)
Show that
22Though we present a stronger result, what we call simply the Highest Derivative
Test.
Appendices
A Basic set theory . . . . . . . . . . . . . . . . . . . . 731
A.1 What is a set?
A.2 The absolute basics
Some comments on logical implication
A bit about proofs
Iff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
The following are equivalent . . . . . . . . . . 738
For all. . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
The contrapositive and proof by contradiction . . . 739
Without loss of generality. . . . . . . . . . . . . . 741
If XYZ we are done, so suppose that ¬XYZ . 742
Proving two sets are equal . . . . . . . . . . . . 742
Induction . . . . . . . . . . . . . . . . . . . . . . . . 742
Sets
A.3 Relations, functions, and orders
Arbitrary disjoint-unions and products
Equivalence relations
Preorders
Well-founded induction
Zorn’s Lemma
A.4 Sets with algebraic structure
Quotient groups and quotient rngs
\[
X := \{ Y : Y \notin Y \}. \tag{A.1.1}
\]
\[
X := \{ Y \in U : Y \notin Y \} \tag{A.1.2}
\]
for some fixed set U.² Russell’s Paradox now becomes the statement
that $X \notin U$.
1Hopefully you have seen notation like this before. If not, really quickly skip
ahead to Appendix A.2 The absolute basics to look up the meaning of this notation.
2“U” is for “universe”.
(i). m < n.
(ii). m ≤ n − 1.
(iii). m + 1 ≤ n.
To prove this, you need to prove that (i) iff (ii), (i) iff (iii), and (ii) iff
(iii)—this is exactly what it means for all the three statements to be
logically-equivalent to one another. On the face of it, it seems like
this would mean we would have to do 2 × 3 = 6 proofs. Not so. In
fact, it is enough to prove (i) implies (ii), (ii) implies (iii), and (iii)
implies (i). Using these three implications alone, you can go from
any one statement to any other. For example, (ii) implies (i) because,
if (ii), then (iii), and hence (i).
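The bookkeeping behind this shortcut can be made explicit: chaining the three proved implications reaches every ordered pair of statements. The sketch below is our own illustration (the function name `implication_closure` and the string labels are hypothetical).

```python
def implication_closure(statements, proved):
    """All pairs (a, b) such that a => b follows by chaining proved implications."""
    reachable = {a: {a} for a in statements}
    changed = True
    while changed:
        changed = False
        for a, b in proved:
            for start in statements:
                if a in reachable[start] and b not in reachable[start]:
                    reachable[start].add(b)
                    changed = True
    return {(a, b) for a in statements for b in reachable[a]}


statements = ["i", "ii", "iii"]
cycle = [("i", "ii"), ("ii", "iii"), ("iii", "i")]
closure = implication_closure(statements, cycle)
all_equivalent = all((a, b) in closure for a in statements for b in statements)
```

Dropping any one of the three implications from `cycle` breaks the chain, which is why all three are needed.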
8Keep in mind that in the following subsubsections we will often make use
of examples to illustrate concepts for which we technically have not yet developed
the mathematics. First of all, you needn’t worry: because we are just using the
examples for the purposes of illustration, this doesn’t make our development circular.
Secondly, if you can’t follow an example because you haven’t seen it before, don’t
worry: just get what you can out of it and move on.
9If you are fine with proofs, you can probably safely skip to the next subsection,
Appendix A.2.3 Sets.
For all. . .
If the statement you are trying to prove is of the form “∀ x ∈ X, P(x)”,
you should almost certainly start your proof with something like
“Let x ∈ X be arbitrary.” You then prove P(x) itself. Pretty
self-explanatory.
The contrapositive and proof by contradiction
Proof by contradiction and proof by contraposition are two closely
related proof techniques. In fact, in a sense to be explained below,
they’re the same proof technique. Before we get there, however, let us
first explain what these two techniques refer to.
First, we explain “contradiction”. Assume you want to prove the
statement “A implies B.”. Of course, you first assume that A is true.
You now try to prove that B is true. Sometimes doing this directly
can prove difficult, and in such cases, you can try what is referred to
as proof by contradiction: Suppose that ¬B is true. Now, using A
and ¬B, try to prove something you already know to be false. As the
only assumption you made was ¬B, that assumption must have been
incorrect, and therefore ¬¬B is true, and hence B is true.10
On the other hand, proof by contraposition refers to nothing more
than an application of Proposition A.2.1.4. That is, if you would like
to prove that “A implies B”, you instead prove that “¬B implies ¬A”.
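The logical fact behind proof by contraposition can be confirmed with a four-row truth table. This is a minimal sketch with our own names (`implies`, `truth_table`); it treats "implies" as material implication.

```python
from itertools import product


def implies(p, q):
    """Material implication: p => q is false only when p is true and q is false."""
    return (not p) or q


# Rows are (A, B, A => B, not B => not A).
truth_table = [
    (a, b, implies(a, b), implies(not b, not a))
    for a, b in product([False, True], repeat=2)
]
contrapositive_equivalent = all(
    direct == contra for _, _, direct, contra in truth_table
)
# By contrast, the converse B => A is not equivalent to A => B.
converse_differs = any(
    implies(a, b) != implies(b, a) for a, b in product([False, True], repeat=2)
)
```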
All that remains in this subsubsection is an explanation of the rela-
tionship between proof by contraposition and proof by contradiction.
As this relationship is not particularly important, feel free to skip to
the next subsubsection.
Superficially, proof by contradiction and proof by contraposition
appear to be distinct, but related techniques. On the other hand,
they are equivalent in a sense to be described as follows.11 First
of all, we have to be precise about what we mean by “proof by
10This logic implicitly uses what is called the Principle of the Excluded Middle,
which says that, if A is a statement, then A is true or A is false. Some mathemati-
cians reject this as valid (or so Wikipedia claims). They are crazy. Such crazy
mathematicians thus cannot use proofs by contradiction. Pro-tip: don’t be crazy.
11The explanation of exactly in what sense these two proof techniques are equivalent
is not particularly useful. Certainly, I find it highly unlikely that what follows in this
subsubsection will be of significant use in actual proof writing. Thus, feel free to
skip to the end of the following proof unless you are particularly curious.
fact not actually so. This is similar to how the Principle of Strong
Induction seems stronger than (just) the Principle of Induction, but in
fact they are equivalent statements (see below).
Finally, we end with a quick comment on usage. First of all,
it is very easy to rephrase a proof by contraposition as a proof by
contradiction (as explained above), and so, if you like, you needn’t
worry about proof by contraposition at all. Furthermore, proof by
contradiction tends to be most useful when “B” in “A implies B.” is
somehow ‘already negated’. For example, in the proof of Cantor’s
Cardinality Theorem (Theorem 2.1.15), we wish to show that there is
no surjection from $X$ to $2^X$. Here, we could proceed by contradiction,
and say “Suppose there exists a surjection $f : X \to 2^X$.”, and then
use this to produce a contradiction.¹³
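For a small finite set, the nonexistence of a surjection onto the power set can be checked exhaustively. The sketch below uses the standard diagonal set $\{x \in X : x \notin f(x)\}$, which is an assumption on our part about how the contradiction is produced (the text only says that one is produced); the names `power_set` and `diagonal_set` are ours.

```python
from itertools import combinations, product


def power_set(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(X, f):
    """The set {x in X : x not in f(x)} for a function f given as a dict."""
    return frozenset(x for x in X if x not in f[x])


X = [0, 1, 2]
subsets = power_set(X)
# Run over every function f : X -> 2^X; none of them hits its own diagonal set.
no_surjection = all(
    diagonal_set(X, dict(zip(X, images))) not in images
    for images in product(subsets, repeat=len(X))
)
```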
Without loss of generality. . .
You will often see the phrase “Without loss of generality. . . ” used in
proofs. It is easiest to demonstrate what this means, and how to use it
yourself, with an example.
The definition of integral (Definition A.4.22) reads
A rg $\langle X, +, 0, \cdot \rangle$ is integral iff it has the property that,
whenever $x \cdot y = 0$, it follows that either $x = 0$ or $y = 0$.
So, imagine you were doing a proof, and you know that $\langle X, +, 0, \cdot \rangle$
is an integral rg,¹⁴ and that $x \cdot y = 0$ for $x, y \in X$. The definition
of integral implies that x = 0 or y = 0. At this point, you could say,
“Without loss of generality, suppose that x = 0.”. You then continue
the proof using the fact that x = 0.
Logically, you would first have to assume that x = 0, finish the
proof in that case, and then go back to the case where y = 0, and finish
the proof again. However, if the proofs are essentially identical15 in
these two cases, you are ‘allowed’ to cut your work in half with the
phrase “Without loss of generality. . . ”—there is no point in repeating
the same logic with a different letter twice.
13We don’t quite phrase it like that, but this is essentially what we do.
14You shouldn’t need to know what a rg is to understand the explanation of
“Without loss of generality. . . ”.
15Obviously, if the proof needed to do the case x = 0 is significantly different from
the proof needed to do the case y = 0 (i.e. is more than just a letter swap x ↔ y), you
should not use this phrase and instead write up the proofs of both cases individually.
So, let p ∈ Z and suppose that you want to prove that p is prime. To
prove this condition, you would say “Let m, n ∈ Z and suppose that
p | (mn).” From here, you now want to prove that p | m or p | n. At
this point, you can say “If p | m we are done, so suppose that p ∤ m.”.
Hopefully, the logic of this is pretty self-explanatory. If it helps,
however, you can view this as essentially the same as proof by
contradiction. If we were to proceed by contradiction, instead we
would say “Suppose that p ∤ m and p ∤ n.”, and from that deduce a
contradiction. Instead, in this case, we shall assume p ∤ m, and use
that to prove that p | n.
Proving two sets are equal
A lot of proofs require you to show that two different sets are in
fact one and the same. The way to prove this is made precise with
Exercise A.2.3.4. In the meantime, however, we can say the following.
This logic is very much identical to that used for proving “iff”
statements.
Induction
Induction
Let Pm be a statement for m ∈ N. The Principle of Induction says
that
16We have omitted part of the definition in the “. . . ” that is irrelevant for us at the
moment.
\[
f(0) := \langle 0, 0 \rangle, \qquad
f(m+1) := \begin{cases} \langle f(m)_x - 1,\ f(m)_y + 1 \rangle & \text{if } f(m)_x \ge 1 \\ \langle f(m)_x + f(m)_y + 1,\ 0 \rangle & \text{otherwise.}^{20} \end{cases} \tag{A.2.2.2}
\]
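The recursive definition (A.2.2.2) can be unrolled directly. In this sketch the function name `enumerate_pairs` is ours, pairs are represented as Python tuples, and `count` is assumed to be at least 1.

```python
def enumerate_pairs(count):
    """First `count` values of f, with f(0) = (0, 0) and the two-case rule for f(m+1)."""
    x, y = 0, 0
    values = [(x, y)]
    for _ in range(count - 1):
        if x >= 1:
            x, y = x - 1, y + 1  # step down the current diagonal
        else:
            x, y = x + y + 1, 0  # hop to the top of the next diagonal
        values.append((x, y))
    return values
```

The first few values trace out the diagonals one after another, and no pair is repeated.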
Strong Induction
Equally valid, for the exact same reason, is what is sometimes
referred to as the Principle of Strong Induction, which says that
The key difference between this and ‘regular’ induction is that, in the
induction step, you don’t just assume $P_m$, but instead, you assume
$P_0$, and $P_1$, and $P_2$, and . . . $P_m$. Superficially, this does indeed seem
stronger,²³ but in fact ‘regular’ induction and strong induction are
17(i) is referred to as the initial case or the initial step, and (ii) is referred to as the
inductive step.
18Incidentally, the passage from “A and A ⇒ B” to B is called modus ponens,
which itself is short for “modus pōnendō pōnēns”, which is Latin for (literally) “the
method to be affirmed, affirms”.
19Though it won’t be able to prove P∞ ! For example, a common fake proof
that π is rational essentially proves by induction that, for all m ∈ Z+ , the decimal
approximation of π with m digits is rational, and ‘therefore’ π is rational. Sorry, but
that’s not how induction works.
22The picture is that you go ‘down the diagonal’, unless you ‘hit the edge’, in
which case you ‘hop to the top of the next diagonal’.
23Because during the inductive step, you don’t get to assume just the single
statement Pm , but rather the m + 1 statements P0 , and P1 , and P2 , and . . . Pm .
A.2.3 Sets
The idea of a set is something that contains other things.
If X is a set which contains an element x, then we
write x ∈ X.²⁶ Two sets are equal iff they contain
the same elements. (A.2.3.1)
24For some reason, not too many people seem to know of well-founded induction.
30Sometimes we will also write $X \ni x$ if it happens to be more convenient to write
it in that order (for example, in $\mathbb{R} \ni x \mapsto x^2$).
{ x ∈ X : P(x)}
{ x ∈ X |P(x)} ,
but my personal opinion is that this can look ugly (or even
slightly confusing) if, for example, P(x) contains an absolute
value in it, e.g.
{ x ∈ R || x| < 1} .
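\[
X \setminus Y := \{ x \in X : x \notin Y \}. \tag{A.2.3.7}
\]
\[
A \cup B := \{ x \in X : x \in A \text{ or } x \in B \}. \tag{A.2.3.9}
\]
\[
A \cap B := \{ x \in X : x \in A \text{ and } x \in B \}. \tag{A.2.3.10}
\]
and
\[
\bigcap_{S \in \mathcal{S}} S := \{ x \in X : \forall S \in \mathcal{S},\ x \in S \}.
\]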
a Technically, the term collection is just synonymous with the term “set”,
though it tends to be used in cases when the elements of the set itself are to
be thought of as other sets (e.g. here where the elements of S are subsets
of X).
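\[
X \times Y := \{ \langle x, y \rangle : x \in X,\ y \in Y \}. \tag{A.2.3.18}
\]
\[
2^X := \{ A : A \subseteq X \}. \tag{A.2.3.23}
\]
\[
S \circ R := \{ \langle x, z \rangle \in X \times Z : \exists\, y \in Y \text{ such that } \langle x, y \rangle \in R \text{ and } \langle y, z \rangle \in S \}. \tag{A.3.3}
\]
\[
(T \circ S) \circ R = T \circ (S \circ R). \tag{A.3.5}
\]
\[
f|_S := f \circ \{ \langle s, x \rangle \in S \times X : s = x \}.^{a} \tag{A.3.7}
\]
\[
f|^T := \{ \langle y, t \rangle \in Y \times T : y = t \} \circ f.^{b} \tag{A.3.8}
\]
\[
\mathbb{R} \ni x \mapsto 3x^2 - 5 \in \mathbb{R} \tag{A.3.10}
\]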
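Composition of relations as in (A.3.3), and its associativity (A.3.5), can be tried out on small finite relations. This is a sketch only: relations are modelled as Python sets of pairs, and the sample relations `R`, `S`, `T` are made up for illustration.

```python
def compose(S, R):
    """S o R = {(x, z) : there is y with (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}


# Hypothetical sample relations: R from X to Y, S from Y to Z, T from Z to W.
R = {(1, "a"), (2, "a"), (2, "b")}
S = {("a", True), ("b", False)}
T = {(True, "yes"), (False, "no")}
```

Note the order of the arguments: `compose(S, R)` applies `R` first and then `S`, matching the notation $S \circ R$.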
a Modulo the stupid case when the domain is the empty-set—see the
remark in Equation (A.3.26).
\[
\coprod_{X \in \mathcal{X}} X := \{ \langle x, X \rangle : X \in \mathcal{X},\ x \in X \}. \tag{A.3.1.2}
\]
ιX B [idÝ X ∈X ]X , (A.3.1.5)
\[
\prod_{X \in \mathcal{X}} X := \left\{ f : \mathcal{X} \to \coprod_{X \in \mathcal{X}} X \ :\ f(X) \in X \right\}. \tag{A.3.1.7}
\]
hxi : i ∈ I hB x, (A.3.1.8)
a Like the term “collection”, the term indexing set is technically just
synonymous with the word “set”. There is nothing mathematically different
about it. This term is only used to clarify to the human readers out there
how one should intuitively think of the set, specifically that it is a set whose
elements are being used to “index” other things.
πX B [idÎ X ∈X X ]X , (A.3.1.11)
XI →
Ö
X. (A.3.1.14)
i ∈I
\[
[x_0]_\sim := \{ x \in X : x \sim x_0 \} = \{ x \in X : x_0 \sim x \}. \tag{A.3.2.5}
\]
etc..
of Z.
set mod 5 is
Z/ (mod5) = [0] ( mod 5), [1] ( mod 5), [2] ( mod 5),
It is quite common for us, after having defined the quotient set,
to want to define operations on the quotient set itself. For example,
we would like to be able to add integers modulo 24 (we do this when
telling time). In this example, we could make the following definition.
\[
[x]_{(\mathrm{mod}\ 24)} + [y]_{(\mathrm{mod}\ 24)} := [x + y]_{(\mathrm{mod}\ 24)}. \tag{A.3.2.20}
\]
This is okay, but before we proceed, we have to check that this
definition is well-defined. That is, there is a potential problem here,
and we have to check that this potential problem doesn’t actually
happen. I will try to explain what the potential problem is.
Suppose we want to add 3 and 5 modulo 7. On one hand, we could
just do the obvious thing 3 + 5 = 8. But because we are working with
equivalence classes, I should just as well be able to add 10 and 5 and
get the same answer. In this case, I get 10 + 5 = 15. At first glance, it
might seem we got different answers, but, alas, while 8 and 15 are not
the same integer, they ‘are’ the same equivalence class modulo 7.
In symbols, if I take two integers x1 and x2 and add them, and you
take two integers y1 and y2 with y1 equivalent to x1 and y2 equivalent
to x2 , it had better be the case that x1 + x2 is equivalent to y1 + y2 .
That is, the answer should not depend on the “representative” of the
equivalence class we chose to do the addition with.
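The independence from the choice of representative can be tested on a finite window of integers. The sketch below is ours (the names `residue_class` and `addition_is_well_defined` are hypothetical), and it only examines representatives between -30 and 30, so it illustrates the property and does not prove it.

```python
def residue_class(x, n, window=range(-30, 31)):
    """Representatives of [x] (mod n) inside a finite window of integers."""
    return [y for y in window if (y - x) % n == 0]


def addition_is_well_defined(n, window=range(-30, 31)):
    """Check that [x1 + x2] does not depend on the representatives chosen."""
    for x1 in range(n):
        for x2 in range(n):
            for y1 in residue_class(x1, n, window):
                for y2 in residue_class(x2, n, window):
                    if (y1 + y2 - (x1 + x2)) % n != 0:
                        return False
    return True
```

With n = 7, the representatives 3 and 10 lie in the same class, and 3 + 5 = 8 and 10 + 5 = 15 land in the same class as well.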
A.3.3 Preorders
Definition A.3.3.1 — Preorder A preorder on a set X is a
relation ≤ on X that is reflexive and transitive. A set equipped
with a preorder is a preordered set.
\[
\begin{aligned}
[x_1, x_2] &:= \{ x \in X : x_1 \le x \le x_2 \} \\
(x_1, x_2) &:= \{ x \in X : x_1 < x < x_2 \} \\
[x_1, x_2) &:= \{ x \in X : x_1 \le x < x_2 \} \\
(x_1, x_2] &:= \{ x \in X : x_1 < x \le x_2 \}.
\end{aligned}
\]
There are two preorders that you can define on any set. They are
not terribly useful, except perhaps for producing counter-examples.
R ≤D is the discrete-order on X.
R ≤I is the indiscrete-order on X.
trouble because, somehow, this process will never stop, not even if
you ‘go on forever’: give Zorn’s Lemma a try.
A B
C D (A.3.5.5)
Proof.ᵃ Step 1: Make hypotheses
Suppose that every well-ordered subset has an upper bound.
We proceed by contradiction: suppose that X has no maximal
element.
Step 2: Show that every well-ordered subset has an upper-bound not contained in it
Let S ⊆ X be a well-ordered subset, and let u be some upper-
bound of S. If there were no element in X strictly greater than
u, then u would be a maximal element of X. Thus, there is
some u′ > u. It cannot be the case that u′ ∈ S because then
we would have u′ ≤ u because u is an upper-bound of S. But
then the fact that u′ ≤ u and u ≤ u′ would imply that u = u′: a
contradiction.
Step 3: Define u(S)
For each well-ordered subset S ⊆ X, denote by u(S) some
upper-bound of S not contained in S.
Step 4: Define the notion of a u-set
We will say that a well-ordered subset S ⊆ X is a u-set iff
$x_0 = u(\{ x \in S : x < x_0 \})$ for all $x_0 \in S$.
Step 6: Define U
Define
\[
U := \bigcup_{\substack{S \subseteq X \\ S \text{ is a } u\text{-set}}} S. \tag{A.3.5.11}
\]
Step 7: Finish the proof by showing that U is a u-set
We first need to check that U is well-ordered. Let A ⊆ U be
nonempty. For S ⊆ X a u-set, define $A_S := A \cap S$. For each
$A_S$ that is nonempty, denote by $a_S$ the smallest element in $A_S$
(which exists as S is in particular well-ordered). Let T ⊆ X be
some other u-set. Then, by Step 5, without loss of generality, S
is downward-closed in T. In particular, S ⊆ T so that $a_S \in T$.
Hence, $a_T \le a_S$. Then, because S is downward-closed in T,
R This is what justifies the ‘trick’ (if you can call it that)
of doing the same thing to both sides of an equation
that is so common in algebra.
R In a rig R, we write
\[
R^\times := \{ r \in R : r \text{ has a multiplicative inverse} \}. \tag{A.4.17}
\]
a But not fields. This is why the definitions are stated in such a way that
the additive inverses for rings are regarded as structure, whereas the multi-
plicative inverses for fields are regarded as properties—homomorphisms
should preserve all “structure”. This is a subtle and, for now, unimportant
point, and so if this doesn’t make sense, you can ignore it for the time being.
31I think it’s fair to say that quotient algebraic structures are among the most difficult
things students encounter when first beginning algebra, and so it is worthwhile to
take some extra time to step back and think about what one is actually trying to
accomplish.
g1 g2 (modH), (A.4.1.4)
and
objects themselves, but also with maps between them that ‘preserve’
the relevant structure. In the case of a set, there is no extra structure
to preserve, and so the relevant maps are all the functions. In contrast,
however, for topological spaces, we will see that the relevant maps are
not all the functions, but instead all continuous functions.2 Similarly,
the relevant maps between monoids are not all the functions but rather
the homomorphisms. The idea then is to come up with a definition
that deals with both the objects and the relevant maps, or morphisms,
simultaneously. This is the motivating idea of the definition of a
category.
R We write
\[
\mathrm{Mor}_C := \bigsqcup_{A, B \in \mathrm{Obj}(C)} \mathrm{Mor}_C(A, B) \tag{B.1.2}
\]
2You might say that the entire point of the notion of a topological space is that it is
one of the most general contexts in which the notion of continuity makes sense. We
will see exactly how this works later in the body of the text.
a No, we do not require that $\mathrm{Mor}_C(A, B)$ be a (small) set. (This comment
is really intended for those who have seen this definition elsewhere: oftentimes
authors fix a universe U, whose elements are referred to as the small
sets, and in the definition of a category they require that the morphisms
form small sets. We make no such requirement.)
b In case you’re wondering, the quotes around “associative” are used
because usually the word “associative” refers to a property that a binary
operation has. A binary operation on a set S is, by definition, a function
from S × S into S. Composition, however, is in general a function from
X × Y into Z for $X := \mathrm{Mor}_C(B, C)$, $Y := \mathrm{Mor}_C(A, B)$, and $Z := \mathrm{Mor}_C(A, C)$,
and hence not a binary operation.
The intuition here is that the objects Obj(C) are the objects you
are interested in studying (for example, topological spaces), and
for objects A, B ∈ Obj(C), the morphisms MorC (A, B) are the maps
relevant to the study of the objects in C (for example, continuous
functions from A to B). For us, it will usually be the case that every
element of Obj(C) is a set equipped with extra structure (e.g. a binary
operation) and the morphisms are just the functions that ‘preserve’
this structure (e.g. homomorphisms). In fact, there is a term for such
categories—see Definition B.1.7.
At the moment, this might seem a bit abstract because of the
lack of examples. As you continue through the main text, you will
encounter more examples of categories, which will likely elucidate
The idea here is that the only structure on a preordered set is the
preorder, and that the precise notion of what it means to ‘preserve’
this structure is to be nondecreasing. Of course, we could everywhere
replace the word “preorder” (or its obvious derivatives) with “partial-
order” or “total-order” and everything would make just as much sense.
Upon doing so, we would obtain the category of partially-ordered sets
and the category of totally-ordered sets respectively.
We also have the category of magmas.
Similarly, the idea here is that the only structure here is that of
the binary operation (and possibly an identity element) and that it is
the homomorphisms which preserve this structure. Of course, we
could everywhere here replace the word “magma” with “semigroup”,
“monoid”, “group”, etc. and everything would make just as much
sense. Upon doing so, we would obtain the category of semigroups,
the category of monoids, and the category of groups respectively.
Finally we have the category of rgs.
which the objects are “sets equipped with extra structure” and the
morphisms are “functions which ‘preserve’ this structure”. The term
for such categories is concrete.
While not terribly important for us, as you might now be wondering
“What could a nonconcrete category possibly look like?”, we present
the following example.
\[
\begin{array}{c|cc} + & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 0 \end{array} \tag{B.2.14}
\]
\[
\begin{array}{c|cc} \cdot & 1 & -1 \\ \hline 1 & 1 & -1 \\ -1 & -1 & 1 \end{array}. \tag{B.2.15}
\]
(Feel free to check that this does indeed satisfy the axioms of
a group (in fact, commutative group) if you like, but this is not
so crucial.)
Now, the key thing to notice is the following: aside from a
relabeling of symbols, the tables in (B.2.14) and (B.2.15)
are identical. Explicitly, the relabeling is $0 \mapsto 1$, $1 \mapsto -1$,
and $+ \mapsto \cdot$. The precise way of saying this is: the function
$\varphi : \mathbb{Z}/2\mathbb{Z} \to C_2$ defined by $\varphi(0) := 1$ and $\varphi(1) := -1$ is an
isomorphism in Grp (or, to say the same thing in slightly
different language, is an isomorphism of groups).
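The claim that the relabeling is an isomorphism amounts to two finite checks: the map is a bijection, and it turns addition mod 2 into multiplication. A minimal sketch, with our own names (`Z2`, `C2`, `add_mod2`, `mul`, `phi`):

```python
Z2 = [0, 1]    # elements of Z/2Z
C2 = [1, -1]   # elements of C_2 under multiplication


def add_mod2(a, b):
    """Addition in Z/2Z."""
    return (a + b) % 2


def mul(a, b):
    """Multiplication in C_2."""
    return a * b


phi = {0: 1, 1: -1}  # the relabeling from the text

is_bijection = sorted(phi.values()) == sorted(C2)
is_homomorphism = all(
    phi[add_mod2(a, b)] == mul(phi[a], phi[b]) for a in Z2 for b in Z2
)
```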
While not always literally true, depending on your category,
I think at an intuitive level it is safe to think of two objects that
are isomorphic as being ‘the same’ up to a relabeling of the
elements. This is why, in mathematics, it is very common to
not distinguish between objects which are isomorphic. This
would be like making a distinction between the two equations
$x^2 + 5x - 3 = 0$ and $y^2 + 5y - 3 = 0$ in elementary algebra:
the name of the variable in question doesn’t have any serious
effect on the mathematics; it’s just a name.
a Note that we can regard $\mathbb{Z}/2\mathbb{Z}$ as a ring, explicitly $\langle \mathbb{Z}/2\mathbb{Z}, +, 0, -, \cdot \rangle$,
but we don’t. We’re forgetting about the extra binary operation, and upon
doing so, we obtain the group $\langle \mathbb{Z}/2\mathbb{Z}, +, 0, - \rangle$.
3We take our category to be concrete because, to the best of my knowledge, there
is no definition of embedding/quotient that is satisfactory for all (not-necessarily-
concrete) categories.
4See Example 3.1.3.12.
5That is, if X is a subset of Y and Y is a subset of Z, then of course I can consider
X as a subset of Z as well.
Now that we have the precise definition in hand, we can prove that
being an injective morphism is not enough.
Exercise B.2.19
(i). Show that a morphism f in Pre is an embedding iff it
is injective and has the property that f (x1 ) ≤ f (x2 ) iff
x1 ≤ x2 .
(ii). Show that a morphism f in Pre is a quotient iff it is
surjective and has the property that f (x1 ) ≤ f (x2 ) iff
x1 ≤ x2 .
such that
(i). f(g ◦ f ) = f(g) ◦ f( f ) for all composable morphisms
f , g ∈ MorC ; and
(ii). f(id A) = idf(A) for all A ∈ Obj(C).
a Of course, this function depends on A and B, but by abuse of notation
we omit this dependence.
a And also give this remark because we feel guilty about being sloppy.
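\[
f \circ^{\mathrm{op}} g := g \circ f; \tag{B.3.1.9}
\]
and
(IV). for $A \in \mathrm{Obj}(C^{\mathrm{op}})$, $\mathrm{id}_A := \mathrm{id}_A$.
\[
\mathrm{Obj}(K\text{-Mod}) \ni V \mapsto V^\dagger \in \mathrm{Obj}(K\text{-Mod})
\]
\[
\mathrm{Mor}_{K\text{-Mod}}(V, W) \ni T \mapsto T^\dagger \in \mathrm{Mor}_{K\text{-Mod}}(W^\dagger, V^\dagger)
\]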
B.3.2 Natural-transformations
We can compose the dual-space functor with itself to obtain
the “double-dual-space” functor $V \mapsto [V^\dagger]^\dagger$. Recall (Theorem
D.3.1.8) that, for every K-module V, we actually have a
linear-transformation $V \to [V^\dagger]^\dagger$, and in fact, this is an isomorphism in the
context of finite-dimensional vector spaces.
On the other hand, if V is a finite-dimensional vector space,
then so is $V^\dagger$. Furthermore, they have the same dimension, and
hence are isomorphic.⁷ The isomorphisms $V \cong V^\dagger$ and $V \cong [V^\dagger]^\dagger$
are fundamentally different, however. The latter is what is called a
natural isomorphism. To see more clearly how these isomorphisms
are different, let us recall more explicitly what they are.
First of all, the map $V \to [V^\dagger]^\dagger$ is given by
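\[
\begin{array}{ccc}
\mathbf{f}(A) & \xrightarrow{\ \mathbf{f}(f)\ } & \mathbf{f}(B) \\
\downarrow{\scriptstyle \eta_A} & & \downarrow{\scriptstyle \eta_B} \\
\mathbf{g}(A) & \xrightarrow[\ \mathbf{g}(f)\ ]{} & \mathbf{g}(B)
\end{array}
\tag{B.3.2.4}
\]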
Note that, unlike the previous two appendices, this one requires some
prerequisites from the main text. In particular, you should have
technically read through the section defining the integers.1
On one hand, these are notes on analysis, and so it might seem a
bit odd to include an appendix on number theory. On the other hand,
there are places throughout the notes where we will need to make
use of elementary number-theoretic concepts (e.g., prime numbers,
divisibility, greatest common divisors, etc.). There’s a good chance
you know a lot of this already, but in the spirit of building
everything from the ground up, these basic concepts deserve to be
treated somewhere.
The first thing we must do is what is called the Division Algorithm,
and it is simply a precise statement of how division (with remainder)
works.
1Though I realize that, in practice, it’s quite likely you’ll be able to understand the
contents of this appendix without having seen a formal development of the integers.
Theorem C.1 — Division Algorithm. Let $m \in \mathbb{Z}$ and let
$n \in \mathbb{Z}^+$. Then, there are unique integers $q, r \in \mathbb{Z}$ such that
(i). $m = qn + r$; and
(ii). $0 \le r < n$.
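The statement can be tried out directly. In the sketch below, the wrapper `division_algorithm` is our own name; it relies on the fact that Python's built-in `divmod` uses floor division, so for a positive divisor the remainder already satisfies $0 \le r < n$, including when m is negative.

```python
def division_algorithm(m, n):
    """Return (q, r) with m == q * n + r and 0 <= r < n, for an integer m and n > 0."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    q, r = divmod(m, n)  # floor division: for n > 0 this gives 0 <= r < n
    return q, r
```

For example, for m = -7 and n = 3 the pair is q = -3 and r = 2, since -7 = (-3)(3) + 2.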
a And I suppose, more generally, for what are called GCD domains.
Our first objective is to show that these two definitions are equiva-
lent. Before we do so, however, we’re going to need a couple of extra
tools, one of which is the greatest common divisor.
(ii). d | n; and
(iii). if k | m, n, then k | d.
If m = 0 = n, then we declare the greatest common divisor to
be 0.a
(i). m | l;
(ii). n | l; and
(iii). if m, n | k, then l | k.
It turns out that any two integers have a greatest common divisor,
though this is not immediate.
Proof. Step 1: Define d
Define $D := \{ xm + yn \in \mathbb{Z}^+ : x, y \in \mathbb{Z} \}$.
Step 2: Show that every element of D is a multiple of d
Let a ∈ D. By the Division Algorithm, there are unique
q, r ∈ Z such that a = qd + r and 0 ≤ r < d. It suffices to
show that r = 0. We proceed by contradiction: suppose that
r > 0. Note that a is an integral linear combination of m and
n and so is qd, and hence r = a − qd is likewise an integral
linear combination of m and n. Hence, as r > 0, r ∈ D. As
r < d, this is a contradiction of the fact that d was the smallest
element of D.
Step 4: Show that d is the greatest common divisor of m and n
Let $d' \in \mathbb{Z}$ be another common divisor of m and n. As
$d = x_d m + y_d n$, it follows that $d'$ in turn divides d, as desired.
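The characterization of d as the smallest element of D can be compared against a library gcd on examples. This is a sketch under stated assumptions: the function name `smallest_positive_combination` is ours, m and n are assumed not both zero, and the search is restricted to coefficients with absolute value at most `bound`, which suffices for small inputs.

```python
from math import gcd


def smallest_positive_combination(m, n, bound=50):
    """Least element of D = {x*m + y*n > 0}, searching |x|, |y| <= bound."""
    candidates = [
        x * m + y * n
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if x * m + y * n > 0
    ]
    return min(candidates)
```

On pairs such as (12, 18) or (35, 21), the result should coincide with `math.gcd`.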
Step 5: Show uniqueness
Okay, now that that caveat is out of the way, let us return to
the definition of a vector space. A vector space is (i) a set, the
elements of which are called vectors2; together with (ii) rules for
adding and scaling the vectors.3 This is the intuition anyways. In
this way, the definition of vector spaces in general is an abstraction
of the set of column vectors Rd that you know and love. This is not
unlike how we ‘abstracted’ the definition of a general topological
space (Definition 3.1.1) from the properties of open sets in Rd
(Exercises 2.5.2.3 and 2.5.2.4 and Theorems 2.5.2.28 and 2.5.2.30).
To specify the scaling operation, however, we first need to say
what the “scalars” are. For vector spaces, the scalars are taken to be a
field. That said, the axioms of a vector space make no reference to
the existence of multiplicative inverses, and so in fact the definition
makes just as much sense if you allow your field to be a more general
1Or possibly finite-dimensional vector spaces if you prefer—it’s not as if there is
a strict definition of what the term “linear algebra” refers to.
2Though they are most definitely not necessarily “actual” column vectors; we just
spent the last couple of paragraphs harping on this point.
3Subject to various axioms of course.
D.1 Basic definitions 829
Exercise D.1.2 Let R be a ring and let M be an R-module.
(i). Show that 0 = 0 · v for all v ∈ M.
(ii). Show that −v = (−1) · v for all v ∈ M.
\[ n \cdot v := \operatorname{sgn}(n)\,\bigl(\underbrace{v + \cdots + v}_{|n|}\bigr). \tag{D.1.4} \]
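As an illustration of (D.1.4), here is a minimal sketch, assuming the formula is meant to define scaling by an integer n on a commutative group written additively. The function names and the choice of the integers modulo 7 as the example group are assumptions made for illustration only.

```python
def sgn(n):
    """Sign of an integer: -1, 0, or 1."""
    return (n > 0) - (n < 0)


def int_scale(n, v, add, neg, zero):
    """Compute n . v := sgn(n)(v + ... + v), with |n| summands.

    `add`, `neg`, and `zero` are the group operation, inversion, and
    identity of the (hypothetical) commutative group containing v.
    """
    total = zero
    for _ in range(abs(n)):
        total = add(total, v)
    return neg(total) if sgn(n) < 0 else total


# Example group: the integers modulo 7 under addition.
def add_mod7(a, b):
    return (a + b) % 7


def neg_mod7(a):
    return (-a) % 7
```

With this definition, `int_scale(0, v, ...)` returns the identity and `int_scale(-1, v, ...)` returns the inverse of v, in line with the two parts of Exercise D.1.2.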
D.1.1 Bimodules
In fact, we would like the morphism sets MorK-Mod (V, W) themselves
to furnish examples of K-modules. Unfortunately, if K is not commu-
tative, we can’t quite do this. To see this, we would need to be able to
define a notion of scaling of linear-transformations, and essentially
the only thing one could write down is
[α · T](v) := α · T(v). (D.1.1.1)
Unfortunately, however, this is not linear in general:
\[ [\alpha \cdot T](\beta \cdot v) := \alpha \cdot T(\beta \cdot v) = \alpha\beta \cdot T(v) \neq \beta \cdot (\alpha \cdot T(v)) =: \beta \cdot [\alpha \cdot T](v). \tag{D.1.1.2} \]
In order for what we have written as an inequality to be an equality,
we would need to know that K is commutative. There is a silly way
to fix this, however—instead, let us try scaling T on the right:
[T · α](β · v) := T(β · v) · α = β · T(v) · α = β · [T · α](v), (D.1.1.3)
and so T · α is again K-linear (on the left).5
5Note that the scaling in W has to be written on the right in order for this to make
sense.
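To see the failure of linearity concretely, here is a minimal sketch, assuming K is the ring of 2×2 integer matrices, V = W = K regarded as a module over itself, and T the identity map. The helper names `matmul`, `left_scaled`, and `right_scaled` are illustrative and not from the text.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as tuples of row tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def T(v):
    """The identity map, which is K-linear on the left."""
    return v


def left_scaled(alpha, T):
    """The attempted scaling [alpha . T](v) := alpha . T(v)."""
    return lambda v: matmul(alpha, T(v))


def right_scaled(T, alpha):
    """The scaling on the right, [T . alpha](v) := T(v) . alpha."""
    return lambda v: matmul(T(v), alpha)


# Two matrices that do not commute, and a vector to test on.
alpha = ((0, 1), (0, 0))
beta = ((0, 0), (1, 0))
v = ((1, 0), (0, 1))

# Left scaling: [alpha . T](beta . v) versus beta . [alpha . T](v).
left_lhs = left_scaled(alpha, T)(matmul(beta, v))
left_rhs = matmul(beta, left_scaled(alpha, T)(v))

# Right scaling: [T . alpha](beta . v) versus beta . [T . alpha](v).
right_lhs = right_scaled(T, alpha)(matmul(beta, v))
right_rhs = matmul(beta, right_scaled(T, alpha)(v))
```

Here `left_lhs` and `left_rhs` differ because αβ ≠ βα, whereas `right_lhs` and `right_rhs` coincide by associativity of matrix multiplication.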
(α · v) · ζ = α · (v · ζ) (D.1.1.5)
a Of course, the two scaling operations here are not the same, even though
we abuse notation and simply write “·” for both.
v · α := α · v. (D.1.1.8)
D Linear algebra 834
a Note that V is still an R-S-bimodule as in the previous example, but
now W is a T-S-bimodule (before it was an R-T-bimodule).
and
for α_v ∈ K.
α1 · v1 + · · · + αm · vm (D.2.3)
for α1, . . . , αm ∈ K.
α1 · v1 + · · · + αm · vm (D.2.6)
implies that α1 = 0, . . . , αm = 0.
Furthermore, even if S is not necessarily finite,
you may prefer to use the equivalent definition: “S
is linearly-independent iff for every m ∈ N and
v1, . . . , vm ∈ S
α1 · v1 + · · · + αm · vm = 0 (D.2.7)
implies
α1 = 0, . . . , αm = 0. (D.2.8)
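For finitely many vectors with rational coordinates, the condition in (D.2.7) and (D.2.8) can be tested mechanically. The sketch below assumes K = Q and that each vector is given as a tuple of coordinates; the names `rank` and `is_linearly_independent` are illustrative. It uses Gaussian elimination with exact arithmetic: the vectors are linearly-independent exactly when the rank of the matrix they form equals the number of vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors, over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_linearly_independent(vectors):
    """True iff the only vanishing linear combination is the trivial one."""
    return rank(vectors) == len(vectors)
```

For example, (1, 2) and (2, 4) fail the test because 2 · (1, 2) − 1 · (2, 4) = 0 is a nontrivial vanishing combination.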
α1 · s1 + · · · + αm · sm = 0. (D.2.9)
a Why?
w = w^1 · v1 + · · · + w^m · vm. (D.2.12)
Suppose that
\[ 0 = \sum_{v \in S} \alpha_v \cdot v \tag{D.2.17} \]
\[ \operatorname{Span}(v_1, \ldots, v_m) := \operatorname{Span}(S) = \{\alpha_1 \cdot v_1 + \cdots + \alpha_m \cdot v_m : \alpha_k \in K\}. \tag{D.2.20} \]
D.2 Linear-ind., spanning, bases, and dimension 841
It follows that
\[ \sum_{b \in B} c^{b} \cdot b - 1 \cdot c = 0, \tag{D.2.29} \]
It follows that
\[ \sum_{c \in C} b^{c} \cdot c - b = 0, \tag{D.2.33} \]
a Using this terminology, (i) becomes the statement that modules over
division rings are free.
Proof. Step 1: Introduce notation
Denote the ground division ring by F. To simplify notation by
getting rid of some subscripts, let us instead write B := B_1
and C := B_2.
Step 2: Prove (i)
The idea of the proof is to show that V has a maximal linearly-
independent set. This will be a basis by Proposition D.2.27.
This is the strategy because we have a result that asserts
the existence of maximal things: Zorn’s Lemma
(Theorem A.3.5.9).
So, define
ℬ := {S ⊆ V : S is linearly-independent}. (D.2.37)
We consider ℬ as a partially-ordered set with respect to the
relation of inclusion. As just explained, a maximal element of
ℬ will be a basis. Furthermore, Zorn’s Lemma states that ℬ
will have a maximal element if every well-ordered subset of
ℬ has an upper-bound in ℬ, and so it suffices to show this.
So, let 𝒮 ⊆ ℬ be a well-ordered subset of ℬ. Define
\[ B := \bigcup_{S \in \mathcal{S}} S \tag{D.2.38} \]
α1 · b1 + · · · + αm · bm = 0 (D.2.39)
and hence
\[ V = \operatorname{Span}(b_1, \ldots, b_d) \subseteq \operatorname{Span}\Bigl(\bigcup_{k=1}^{d} C_k\Bigr) \subseteq \operatorname{Span}(C) = V, \tag{D.2.41} \]
and so C is finite.
Step 4: Prove (ii) in the case one is finite
Without loss of generality, suppose that B is finite. By the
previous step, C must also be finite. Write B = {b1, . . . , bd }
and C = {c1, . . . , ce }. We wish of course to show that d = e.
Without loss of generality, suppose that d ≤ e.
b1 ∈ Span(C), so we may write
Step 5: Prove (ii) in the case both are infinite
Now suppose that the cardinalities of both B and C are infinite.
We wish to show that |B| ≤ |C|. If we can show this, then
by B ↔ C symmetry, the same argument can be used to
show that |C| ≤ |B|, whence we will have |B| = |C| by the
Bernstein-Cantor-Schröder Theorem (Theorem 1.1.3.5), as
desired.
So, we would like to show that |B| ≤ |C|. Let c ∈ C. As B
spans V, there are b1, . . . , bm ∈ B and c^1, . . . , c^m ∈ F nonzero
such that
c = c^1 · b1 + · · · + c^m · bm. (D.2.50)
C_F := {c ∈ C : B_c = F}. (D.2.51)
We have that
\[ C = \bigcup_{\substack{F \subseteq B \\ F \text{ finite}}} C_F. \tag{D.2.52} \]
here, one does not need to know the technical details7 to work with
tensors. In particular:
One reason the term “dual” is used is that, while on the one hand
we can obviously view elements of V† as linear-functionals on V, we
can “dually” view elements of V as linear-functionals on V†.
D.3 The tensor product and dual space 853
v ↦ (φ ↦ φ(v)) (D.3.1.9)
as desired.
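The map (D.3.1.9) is simply "evaluate at v". As a minimal sketch, assuming V = Q^d with linear-functionals represented by coefficient rows, the following shows that each vector determines a function on functionals. The names `functional` and `evaluate_at` are illustrative and not from the text.

```python
from fractions import Fraction


def functional(coeffs):
    """The linear-functional on Q^d given by a row of coefficients."""
    return lambda v: sum(Fraction(a) * Fraction(x) for a, x in zip(coeffs, v))


def evaluate_at(v):
    """The image of v under v |-> (phi |-> phi(v))."""
    return lambda phi: phi(v)


v = (2, 3)
phi = functional((1, -1))
psi = functional((4, 5))
phi_plus_psi = functional((5, 4))  # coefficient rows add componentwise

ev = evaluate_at(v)
```

In this example `ev(phi_plus_psi)` equals `ev(phi) + ev(psi)`, which is one instance of the linearity of `ev` as a functional on the dual space.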
\[ T(v_1, \ldots, v_k \cdot \alpha, v_{k+1}, \ldots, v_m) = T(v_1, \ldots, v_k, \alpha \cdot v_{k+1}, \ldots, v_m) \tag{D.3.2.2} \]
for α ∈ K_k.
\[ \underbrace{V \otimes_K \cdots \otimes_K V}_{0} := K. \tag{D.3.2.8} \]
[Diagram (D.3.2.9), with vertices V × W and V ⊗_S W]
a The other big motivation is that you will need to learn tensor products
in this level of generality at some point in your mathematical life, so you may as
well learn it now.
basis for V and a bunch of c_l's that are a basis for W, then the collection
of all b_k ⊗ c_l's forms a basis for V ⊗ W. Thus, the elements of V ⊗ W
are precisely those things that can be written (uniquely) as linear
combinations of b_k ⊗ c_l's.
{b ⊗ c : b ∈ B, c ∈ C} (D.3.2.12)
is a basis for V ⊗ W.
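As a small worked instance of this statement, suppose (purely for illustration) that V has basis {b_1, b_2} and W has basis {c_1, c_2, c_3}. Then the basis (D.3.2.12) has 2 · 3 = 6 elements, and a pure tensor expands in it by bilinearity of ⊗:

```latex
\[
  \{\, b_1 \otimes c_1,\; b_1 \otimes c_2,\; b_1 \otimes c_3,\;
       b_2 \otimes c_1,\; b_2 \otimes c_2,\; b_2 \otimes c_3 \,\}
\]
\[
  (b_1 + 2\, b_2) \otimes (c_1 - c_3)
    = b_1 \otimes c_1 - b_1 \otimes c_3
      + 2\, b_2 \otimes c_1 - 2\, b_2 \otimes c_3 .
\]
```

The coefficients (1, 0, −1, 2, 0, −2) on the right are the unique coordinates of this element with respect to the six basis tensors.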
V ⊗S W = Span {v ⊗ w : v ∈ V, w ∈ W } . (D.3.2.17)
\[ \operatorname{Mor}_{R\text{-Mod-}T}(U \otimes_S V, W) \leftarrow \operatorname{Mor}_{S\text{-Mod-}T}(V, \operatorname{Mor}_{R\text{-Mod}}(U, W)) \tag{D.3.2.20} \]
defined by
(u ⊗ v ↦ [φ(v)](u)) ← φ (D.3.2.21)
\[ \operatorname{Mor}_{R\text{-Mod-}T}(U \otimes_S V, W) \leftarrow \operatorname{Mor}_{R\text{-Mod-}S}(U, \operatorname{Mor}_{\text{Mod-}T}(V, W)) \tag{D.3.2.22} \]
defined by
(u ⊗ v ↦ [φ(u)](v)) ← φ (D.3.2.23)
f_g(u ⊗ v) := [g(v)](u). (D.3.2.30)
defined by
and
and
\[ \bigotimes\nolimits_{l} V := \bigotimes\nolimits^{0}_{l} V \tag{D.3.2.44} \]
\[ \underbrace{V \times \cdots \times V}_{l} \times \underbrace{V^{\dagger} \times \cdots \times V^{\dagger}}_{k} \to K. \]
R Note that the notation ⨂^k_l V is nonstandard (though
based on the standard notation Λ^l V for something
new which we will become acquainted with later on).
D.4.1 Permutations
Definition D.4.1.1 — Symmetric group Let S be a set. Then,
the symmetric group of S is AutSet (S).
1 ↦ 1 (D.4.1.3a)
2 ↦ 5 (D.4.1.3b)
3 ↦ 2 (D.4.1.3c)
4 ↦ 4 (D.4.1.3d)
5 ↦ 3. (D.4.1.3e)
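The permutation (D.4.1.3) can be handled concretely as a finite lookup table. The following is a minimal sketch, assuming a permutation of a finite set is stored as a Python dictionary; the names `cycles`, `sign`, and `inverse` are illustrative, and the sign is computed from the cycle count rather than from whichever definition the text adopts.

```python
def cycles(perm):
    """Decompose a permutation (a dict point -> image) into disjoint cycles."""
    seen = set()
    result = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        nxt = perm[start]
        while nxt != start:
            cycle.append(nxt)
            seen.add(nxt)
            nxt = perm[nxt]
        result.append(tuple(cycle))
    return result


def sign(perm):
    """Sign of the permutation: (-1)^(n - number of cycles), fixed points included."""
    return (-1) ** (len(perm) - len(cycles(perm)))


def inverse(perm):
    """The inverse permutation, obtained by swapping points and images."""
    return {image: point for point, image in perm.items()}


# The permutation of {1, ..., 5} from (D.4.1.3).
sigma = {1: 1, 2: 5, 3: 2, 4: 4, 5: 3}
```

Here `sigma` fixes 1 and 4 and cycles 2 → 5 → 3 → 2, so it consists of a single 3-cycle together with two fixed points.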
for all
R We write
\[ \bigvee\nolimits^{k} V := \bigvee\nolimits^{k}_{0} V \tag{D.4.2.7} \]
D.4 The determinant 875
and
\[ \bigvee\nolimits_{l} V := \bigvee\nolimits^{0}_{l} V \tag{D.4.2.8} \]
and
\[ \bigwedge\nolimits_{l} V := \bigwedge\nolimits^{0}_{l} V \tag{D.4.2.10} \]
Bibliography