Preliminary Version
Last Revised: January 21, 2008
http://pages.pomona.edu/sg064747
stephan.garcia@pomona.edu
Contents
Lecture 1. Introduction
1.1. Preliminaries
1.2. Hippasus' Theorem
1.3. A Nonconstructive Proof
1.4. A Third Proof of Hippasus' Theorem
Lecture 5. Bijections
5.1. Counting Without Counting
5.2. Galileo's Paradox
5.3. Injections
5.4. Surjections
5.5. Bijections
Lecture 6. Cardinality
6.1. Cardinality
6.2. Countable Sets
LECTURE 1
Introduction
1.1. Preliminaries
Since the real number system (denoted by R) is basic to real analysis, we need to know
exactly what real numbers are. As we will see, this is a far less trivial problem than it first
appears and it deserves serious consideration.
Although to some, the rigorous construction of the real number system can be endlessly fascinating, to others it may appear tedious and pedantic. There is a certain undeniable beauty to seeing the real number system built from the ground up, using logic and set
theory alone. On the other hand, while the grand scheme may be inspiring, many of the
details are quite mechanical and uninteresting. We will content ourselves with some sort
of middle ground, leaving some of the details to the homework and later lectures, while
omitting others altogether.
Since we all believe that R exists, in some form or another, we will introduce the
real numbers somewhat axiomatically. This means that we will not go into the details of
justifying why such a number system exists; in proving the existence of R we would stray
too far into the realm of set theory and away from real analysis itself. We will simply state
the basic properties of R as axioms and highlight their importance.
Among other things, the real number system contains a number of distinguished subsets:
Definition. $\mathbb{R}$ denotes the set of real numbers, $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ denotes the set of natural numbers, $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ denotes the set of integers (see footnote 1), and $\mathbb{Q}$ denotes
the set of rational numbers (fractions).
In particular, note that our definition of N includes the number 0. This is somewhat
standard, as far as set theory and logic go, but in other branches of mathematics you may
see N introduced starting from 1. This is not a major mathematical issue, but it is important
to point out the notation that we will be using.
1.2. Hippasus' Theorem
The complement of Q in R is the set
\[ I = \mathbb{R} \setminus \mathbb{Q} = \{x \in \mathbb{R} : x \notin \mathbb{Q}\} \]
of all irrational numbers (i.e., real numbers which are not rational). It turns out that $I \neq \emptyset$
(the empty set); in other words, irrational numbers exist. In particular, we will demonstrate
that $\sqrt{2}$ (the length of the diagonal of a unit square) is irrational. While it appears that the
original proof of this fact was due to Hippasus of Metapontum (who was a Pythagorean),
the irrationality of $\sqrt{2}$ is often attributed to Pythagoras himself. Regardless of who thought
of it, the proof is a standard example of proof by contradiction.
1. The letter Z comes from the German word Zahlen (numbers).
Theorem 1.1 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.
Proof #1. Suppose toward a contradiction that $\sqrt{2}$ is rational. Let us write $\sqrt{2} = a/b$
where the fraction is reduced to lowest terms (so that the greatest common divisor $\gcd(a, b)$
of a and b is 1). Squaring the preceding equation, we obtain $2b^2 = a^2$. This shows that
$a^2$ is even, whence a is even as well (since the square of an odd number is odd). Writing
$a = 2c$, we find that $2b^2 = (2c)^2 = 4c^2$ and thus $b^2 = 2c^2$. This shows that $b^2$, and
hence b itself, is even. Therefore a and b are both divisible by 2, a contradiction to the
hypothesis that the fraction a/b was reduced to lowest terms.
It is important to note that the preceding proof implicitly relied on the Fundamental
Theorem of Arithmetic. There are several other basic properties of N that we often take
for granted. Chief among these is the following:
Theorem 1.2 (Well-Ordering Property of N). Every nonempty subset of N contains a
smallest element. In other words, if $S \subseteq \mathbb{N}$ and $S \neq \emptyset$, then there exists $n \in S$ such
that $n \leq m$ for every $m \in S$.
The Well-Ordering Property of N can be proved using the Principle of Mathematical
Induction, which can itself be proved from the axioms of set theory. However, we will not
concern ourselves with such details. We now give another proof of Theorem 1.1 which
relies on a minimality argument:
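whence $\sqrt{2} \leq 1$, an absurdity. A few algebraic manipulations lead us to another representation of $\sqrt{2}$ as a rational number:
\[
\sqrt{2} = \sqrt{2} \left( \frac{\sqrt{2} - 1}{\sqrt{2} - 1} \right)
         = \frac{2 - \sqrt{2}}{\sqrt{2} - 1}
         = \frac{2 - \frac{a}{b}}{\frac{a}{b} - 1}
         = \frac{2b - a}{a - b}.
\]
Since $a - b > 0$ and $\sqrt{2} > 0$, it follows from the preceding that $2b - a > 0$. However,
$a > b$ and $2b - a > 0$ mean that
\[ 0 < a - b < b, \]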
(ii) If c is irrational, then let $a = c$ and $b = \sqrt{2}$. In this case, the usual rules for
manipulating exponents yield
\[
a^b = \left( \sqrt{2}^{\sqrt{2}} \right)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}} = \sqrt{2}^{2} = 2.
\]
In this case, $a^b$ is rational while a and b are irrational.
Since both cases lead to the conclusion that there are irrational numbers a and b such that
$a^b$ is rational, the proof is finished.
Observe that the preceding proof does not tell us whether c is irrational or not. It turns
out that c is irrational; this follows from the famed Gelfond-Schneider Theorem, a deep
and difficult result in the theory of transcendental numbers.
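As a numerical sanity check of case (ii), the following sketch evaluates the tower of exponents in floating point. It assumes, as the displayed computation suggests, that c denotes $\sqrt{2}^{\sqrt{2}}$; the variable names are illustrative only, and a floating-point computation says nothing about rationality, it merely confirms the exponent arithmetic.

```python
import math

# Assumption: c is sqrt(2) raised to the power sqrt(2), as in the display above.
root2 = math.sqrt(2)
c = root2 ** root2

# Case (ii): a = c and b = sqrt(2), so a**b = sqrt(2)**(sqrt(2)*sqrt(2)) = sqrt(2)**2.
power = c ** root2

# Up to rounding error, the result should agree with 2.
difference = abs(power - 2)
```

The quantity `difference` is at the level of floating-point rounding error, which is consistent with the identity $a^b = 2$.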
1.4. A Third Proof of Hippasus' Theorem
Before giving our third proof of Hippasus' Theorem, we need a few preliminaries. A
particularly familiar application of the Well-Ordering Property is the following:
Theorem 1.4 (Division Algorithm). Given $a, b \in \mathbb{Z}$ with $b > 0$, there exist unique $q, r \in \mathbb{Z}$
such that $a = qb + r$ and $0 \leq r < b$.
In other words, when you divide a by b, you wind up with a quotient q and a remainder
r which satisfies $0 \leq r < b$. Thus the Division Algorithm is just a familiar fact from grade
school arithmetic. Although we omit the proof, it is important to mention that the Division
Algorithm can be proved from more primitive notions. In particular, almost everything in
mathematics can be built up from the basic axioms of set theory.
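The statement of the Division Algorithm can be illustrated with a short sketch. The helper name `division_algorithm` is hypothetical; it relies on Python's built-in `divmod`, which for a positive divisor returns a remainder in the range $0 \leq r < b$ even when a is negative.

```python
def division_algorithm(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == q*b + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("the divisor b must be positive")
    # For b > 0, divmod uses floor division, so the remainder is never negative.
    q, r = divmod(a, b)
    return q, r


# Two sample divisions, one with a negative dividend.
positive_case = division_algorithm(17, 5)
negative_case = division_algorithm(-17, 5)
```

Note that for a = -17 and b = 5 the quotient is -4 rather than -3, precisely so that the remainder 3 lands in the required range.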
If a, b are two nonzero integers, then gcd(a, b) will denote the greatest common divisor
of a and b. An important fact about the greatest common divisor is the following theorem,
the proof of which demonstrates both minimality and maximality arguments.
Theorem 1.5 (Linear Representation of GCD). Let a, b be nonzero integers. If $g =
\gcd(a, b)$, then there exist integers $x_0$ and $y_0$ such that $g = ax_0 + by_0$. In other words, the
greatest common divisor of a and b is an integral linear combination of a and b.
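The proof in these notes is a minimality argument and does not say how to find $x_0$ and $y_0$. One standard way to compute them, which is a different technique from the one used in the notes, is the extended Euclidean algorithm. The sketch below is a minimal version of it; the function name `extended_gcd` is an assumption made for illustration.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = gcd(a, b) > 0 and g == a*x + b*y.

    Assumes a and b are nonzero integers, as in Theorem 1.5.
    """
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    # Normalize the sign so that the returned divisor is positive.
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


g, x0, y0 = extended_gcd(240, 46)
```

Here `g` is the greatest common divisor of 240 and 46, and `x0`, `y0` satisfy `240*x0 + 46*y0 == g`. The coefficients are not unique; the algorithm simply produces one valid pair.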
Proof. Without loss of generality, we may assume that $a, b > 0$. The set
\[ S = \{ax + by : x, y \in \mathbb{Z}\} \]
contains positive integers (as well as 0). By the Well-Ordering Property of N, there exist
$x_0$ and $y_0$ such that $l = ax_0 + by_0$ is the smallest positive integer in S. It will turn out that
l is the greatest common divisor of a and b, that is, $l = g$ where $g = \gcd(a, b)$. Notice that
\[ 0 < l \leq a, \qquad 0 < l \leq b \]
by the definition of l (since a and b themselves belong to S).
We first need to show that l is a common divisor of a and b. By the Division Algorithm,
we may write $a = lq + r$ where $0 \leq r < l$ (i.e., q is the quotient and r is the remainder
when a is divided by l). Therefore
\[
r = a - lq = a - q(ax_0 + by_0) = a(1 - qx_0) + b(-qy_0).
\]
Since r is of the form $ax + by$, it follows that $r \in S$. Since $0 \leq r < l$ and l is the smallest
positive element of S, we see that $r = 0$. In other words, l evenly divides a (since the
We now present yet another proof of Hippasus' Theorem. In fact, we prove that $\sqrt{n}$ is
irrational when n is not a perfect square. The following approach is not as well-known as
the others and it has a completely different flavor altogether:
Theorem 1.6 (Hippasus of Metapontum). If a natural number n is not a perfect square,
then $\sqrt{n}$ is irrational.
Proof #3. Suppose that $\sqrt{n} = a/b$ where the fraction a/b has been reduced to lowest
terms. In other words, a and b share no common factors and hence $\gcd(a, b) = 1$. By the
linear representation of the GCD, there exist integers x, y so that $1 = ax + by$. It therefore
follows that
\[
\sqrt{n} = \sqrt{n}(ax + by) = (\sqrt{n}\,a)x + (\sqrt{n}\,b)y = bnx + ay
\]
LECTURE 2
The proof that R enjoys the Archimedean Property requires the Least Upper Bound
Principle, which we will discuss relatively soon. It is important to note that there are
mathematical structures (i.e., ordered fields; see "Notes on Fields") which are similar to R,
yet which do not enjoy the Archimedean Property.
The Archimedean Property is often used in the following form:
Corollary 1. For every $\epsilon > 0$, there exists $n \in \mathbb{N}$ such that
\[ \frac{1}{n} < \epsilon. \]
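Theorem 2.3 (Density of Q in R). If $a < b$, then there exists $x \in \mathbb{Q}$ so that $a < x < b$.
Proof. Since $b - a > 0$, the Archimedean Property asserts that there exists $n \in \mathbb{N}$ such
that $(b - a)n > 1$. Since $bn - an > 1$, it follows that there exists an integer m such that
\[ an < m < bn. \tag{2.2} \]
Dividing (2.2) through by n, we obtain $a < \frac{m}{n} < b$, so we may take $x = \frac{m}{n}$.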
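The argument for the density of Q is effectively a recipe: pick n with $(b - a)n > 1$, then pick an integer m strictly between an and bn. The sketch below follows that recipe using exact rational arithmetic; the function name `rational_between` and the particular choices of n and m are assumptions made for illustration, not the only valid ones.

```python
import math
from fractions import Fraction


def rational_between(a, b) -> Fraction:
    """Return a rational x with a < x < b, following the proof of Theorem 2.3."""
    a, b = Fraction(a), Fraction(b)  # floats are converted exactly
    if not a < b:
        raise ValueError("need a < b")
    # Choose n so that (b - a) * n > 1 (Archimedean Property).
    n = math.floor(1 / (b - a)) + 1
    # m is the smallest integer exceeding a*n; then a*n < m <= a*n + 1 < b*n.
    m = math.floor(a * n) + 1
    return Fraction(m, n)


x = rational_between(Fraction(1, 3), Fraction(1, 2))
```

With $a = 1/3$ and $b = 1/2$ this choice gives $n = 7$ and $m = 3$, so `x` is 3/7.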
The same statement holds for the set of irrational numbers I = R\Q:
Theorem 2.4 (Density of I in R). If $a < b$, then there exists $x \in I$ such that $a < x < b$.
Proof. By the preceding theorem, there exists a rational number y such that
\[ \frac{a}{\sqrt{2}} < y < \frac{b}{\sqrt{2}}, \]
or equivalently,
\[ a < \underbrace{\sqrt{2}\,y}_{x} < b. \]
holds for any integer $n \geq 1$ and any real numbers x, y. Moreover, the binomial coefficient
\[ \binom{n}{k} = \frac{n!}{k!\,(n - k)!} \]
is always an integer.
For a proof of the Binomial Theorem, you can consult the Notes on Induction. As
an immediate consequence of the Binomial Theorem, we obtain the following:
Theorem 2.6 (Bernoulli's Inequalities). The inequalities
\[ (1 + a)^n \geq 1 + na \qquad \text{(Weak Version)} \]
\[ (1 + a)^n \geq 1 + na + \frac{n(n - 1)}{2} a^2 \qquad \text{(Strong Version)} \]
hold for all $a \geq 0$ and $n \in \mathbb{N}$.
Proof #1. The right hand sides of these inequalities are simply the first two and three
terms, respectively, in the binomial expansion of $(1 + a)^n$. Since each term in the binomial
expansion is $\geq 0$, the desired result follows.
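A quick exact check of both inequalities on a small grid of values can help build confidence before reading the proofs. This is only a spot check under the stated hypotheses ($a \geq 0$, $n \in \mathbb{N}$), not a proof; the helper names and the sample grid are hypothetical.

```python
from fractions import Fraction


def weak_bound(a: Fraction, n: int) -> Fraction:
    """Right-hand side of the weak version: 1 + n*a."""
    return 1 + n * a


def strong_bound(a: Fraction, n: int) -> Fraction:
    """Right-hand side of the strong version: 1 + n*a + n(n-1)/2 * a^2."""
    return 1 + n * a + Fraction(n * (n - 1), 2) * a ** 2


# Sample values a = 0, 1/4, 1/2, ..., 3 and n = 0, 1, ..., 10, in exact arithmetic.
samples = [Fraction(k, 4) for k in range(13)]
all_hold = all(
    (1 + a) ** n >= strong_bound(a, n) >= weak_bound(a, n)
    for a in samples
    for n in range(11)
)
```

Using `Fraction` rather than floats avoids any doubt about rounding when the two sides are equal, as happens for n = 0 and n = 1.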
Both versions of Bernoulli's Inequality can be proved by Mathematical Induction. For
instance, here is an inductive proof of the weak version of Bernoulli's Inequality:
Proof #2. Let $a > 0$ and let P(n) be the statement
\[ (\forall a > -1)\big( (1 + a)^n \geq 1 + na \big). \tag{2.3} \]
We will use mathematical induction to show that the statements P(0), P(1), . . . are all true.
BASE CASE: Clearly P(0) is true since the desired inequality reduces to $1 \geq 1$, which is
obviously true.
INDUCTIVE STEP: Suppose that P(n) is true for some value of n. In other words, suppose
that (2.3) is true for this specific value of n. Multiplying the inequality in (2.3) through by
$1 + a$ we find that
\[
\begin{aligned}
(1 + a)^{n+1} &= (1 + a)^n (1 + a) \\
              &\geq (1 + na)(1 + a) \\
              &= 1 + na + a + na^2 \\
              &= 1 + (n + 1)a + na^2 \\
              &\geq 1 + (n + 1)a.
\end{aligned}
\]
In other words, the statement P(n + 1) is also true; we have established that $P(n) \Rightarrow
P(n + 1)$. This completes the inductive step.
CONCLUSION: By mathematical induction, it follows that P(n) is true for n = 0, 1, 2, . . .
and hence $(1 + a)^n \geq 1 + na$ for $a > 0$ and every integer $n \geq 0$.
As a consequence of the weak version of Bernoulli's Inequality, we can prove the
following well-known and useful result:
Theorem 2.7. If $x > 1$ and $M > 0$, then there exists $n \in \mathbb{N}$ such that $x^n > M$. Similarly,
if $0 \leq x < 1$ and $\epsilon > 0$, then there exists $n \in \mathbb{N}$ such that $0 \leq x^n < \epsilon$.
Proof. Since the second assertion follows immediately from the first, we prove only the
first statement in the theorem. If $x > 1$, then write $x = 1 + a$ where $a > 0$ and use the
weak form of Bernoulli's Inequality:
\[ x^n = (1 + a)^n \geq 1 + na. \]
Theorem 2.8 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.
Proof #4. Assume toward a contradiction that $\sqrt{2} = p/q$ where p, q are integers and
$q \geq 1$. Define the numbers $e_n$ via the formula
\[ e_n = \left\langle \sqrt{2} \right\rangle^n = (\sqrt{2} - 1)^n. \]
Observe that
\[ 0 < \sqrt{2} - 1 < \tfrac{1}{2}. \tag{2.4} \]
Indeed, elementary arithmetic shows that the preceding inequality is equivalent to the obvious inequality $1 < 2 < \frac{9}{4}$ (i.e., we can establish (2.4) without the use of a calculator or
decimal expansions). It follows from (2.4) and the definition of $e_n$ that
\[ 0 < e_n < \frac{1}{2^n} \tag{2.5} \]
for all $n \in \mathbb{N}$. Now observe that for each $n \in \mathbb{N}$ there exist integers $a_n, b_n$ such that
\[ e_n = a_n + b_n \sqrt{2}. \tag{2.6} \]
Although this statement can be proved by Mathematical Induction, it can also be proved directly from the Binomial Theorem:
\[
e_n = (\sqrt{2} - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (\sqrt{2})^k (-1)^{n-k}.
\]
Since the binomial coefficients $\binom{n}{k}$ are integers and since $(\sqrt{2})^k$ is either an integer or an
integer times $\sqrt{2}$, the desired formula (2.6) follows immediately. By (2.6), we have
\[
e_n = a_n + b_n \sqrt{2} = a_n + b_n \frac{p}{q} = \frac{a_n q + b_n p}{q} = \frac{c_n}{q}
\]
where $c_n$ is an integer. Since $e_n > 0$, it follows that $c_n \geq 1$ whence $e_n \geq 1/q$. Putting this
together with (2.5), we find that
\[ \frac{1}{q} \leq e_n < \frac{1}{2^n} \]
for every $n \in \mathbb{N}$. However, the resulting inequality $2^n < q$ fails for sufficiently large n by
Theorem 2.7. This contradiction shows that $\sqrt{2}$ must be irrational.
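The integers $a_n$ and $b_n$ in (2.6) can be generated without expanding the binomial sum. Multiplying $a + b\sqrt{2}$ by $\sqrt{2} - 1$ gives $(2b - a) + (a - b)\sqrt{2}$, which yields a simple recurrence. The sketch below uses that recurrence (an alternative to the Binomial Theorem argument in the proof, chosen here for brevity); the function name `coefficients` is hypothetical.

```python
import math


def coefficients(n: int) -> tuple[int, int]:
    """Return integers (a_n, b_n) with (sqrt(2) - 1)**n == a_n + b_n*sqrt(2)."""
    a, b = 1, 0  # (sqrt(2) - 1)**0 = 1 + 0*sqrt(2)
    for _ in range(n):
        # (a + b*sqrt(2)) * (sqrt(2) - 1) = (2b - a) + (a - b)*sqrt(2)
        a, b = 2 * b - a, a - b
    return a, b


root2 = math.sqrt(2)
# Compare the integer representation with a direct floating-point evaluation.
errors = [
    abs(a + b * root2 - (root2 - 1) ** n)
    for n in range(11)
    for a, b in [coefficients(n)]
]
```

For instance, `coefficients(2)` returns `(3, -2)`, matching $(\sqrt{2} - 1)^2 = 3 - 2\sqrt{2}$. The entries of `errors` are at the level of floating-point rounding.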
LECTURE 3
In terms of the real line visualization of R, an upper bound for A is simply a point
that lies to the right of the entire set A.
Definition. We call a real number s a least upper bound (or supremum) for A if
(i) s is an upper bound for A,
(ii) if t is any upper bound for A, then $s \leq t$.
This is written $s = \sup A$, where sup stands for supremum. If A is not bounded above,
then we say that $\sup A = \infty$.
The corresponding notion of greatest lower bound (also called the infimum) of a set A
(denoted inf A) is defined analogously.
Note that sup A, when it exists, is uniquely determined. Indeed, if $s_1, s_2$ are two least
upper bounds for A, then $s_1, s_2$ are both upper bounds. Since $s_1$ is the least upper bound,
it follows that $s_1 \leq s_2$. Similarly, we find that $s_2 \leq s_1$ and hence $s_1 = s_2$. Thus we can
speak of the least upper bound of a set.
Example 3.1. If A is a finite subset of R, then sup A is simply the largest element of A
and inf A is the smallest.
Example 3.2. $\sup \mathbb{N} = \infty$ since N is not bounded above (this follows from the Archimedean
property).
Example 3.3. $\sup [0, 1) = 1$, where [0, 1) denotes the half-open interval:
\[ [0, 1) = \{x \in \mathbb{R} : 0 \leq x < 1\}. \]
Clearly 1 is an upper bound for [0, 1), so condition (i) in the definition is satisfied.
Now let us check condition (ii). We claim that
\[ x \text{ is an upper bound for } [0, 1) \implies x \geq 1. \tag{3.1} \]
If x is any number smaller than 1, we can say that $x = 1 - \epsilon$, where $\epsilon > 0$. But then
$x < 1 - \frac{\epsilon}{2} \in [0, 1)$ and hence x is not an upper bound for [0, 1). This proves (3.1) and thus
1 is the least upper bound for [0, 1). This proves that $\sup [0, 1) = 1$.
Example 3.4. Consider the set
\[ A = \{1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots\}. \]
The set A is bounded above, since every element of A is $\leq 2$. In other words, 2 is an upper
bound for A. Of course, we also recognize that A contains a list of better and better rational
approximations to
\[ \sqrt{2} = 1.414213562\ldots. \]
The problem with the rational number system Q is that it has "holes" which must be repaired.
Our intuition tells us that the sequence
\[ 1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots \]
is increasing up to $\sqrt{2}$. That is, there should be some real number that is the least upper
bound for A. In other words, the sequence
above is approaching a "hole" in Q. This hole
\[ 2 < 4 \leq x^2, \]
it follows that A is bounded above by 2. By the Least Upper Bound Principle, the set A
has a least upper bound in R. Let $s = \sup A$ and note that $s \geq 1$.
By the Trichotomy Law, there are three possible cases to check:
\[ s^2 < 2, \qquad s^2 > 2, \qquad s^2 = 2. \]
If we can show that the first two cases lead to contradictions, then we can conclude that
$s^2 = 2$, as desired.
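(i) If $s^2 < 2$, then we claim that $(s + \frac{1}{n})^2 < 2$ holds for sufficiently large $n \in \mathbb{N}$.
Since
\[
\begin{aligned}
\left( s + \frac{1}{n} \right)^2 &= s^2 + \frac{2s}{n} + \frac{1}{n^2} \\
                                 &\leq s^2 + \frac{2s}{n} + \frac{1}{n} \\
                                 &= s^2 + \frac{2s + 1}{n},
\end{aligned}
\]
if we make n large enough so that
\[ s^2 + \frac{2s + 1}{n} < 2, \]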
\[ s^2 - \frac{2s}{n} > 2, \]
\[ \frac{2s}{s^2 - 2} \]
Since (i) and (ii) led to contradictions, it follows from the Trichotomy Law that $s^2 = 2$.
In other words, $\sqrt{2}$ exists in R.
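The supremum $s = \sup A$ can be approximated numerically by bisection: keep one endpoint whose square is below 2 (so it lies in A) and one endpoint whose square is at least 2 (so it is an upper bound for A). This is a floating-point illustration only, not a substitute for the Least Upper Bound Principle; the function name and the default step count are assumptions.

```python
def approximate_sup(target: float = 2.0, steps: int = 60) -> tuple[float, float]:
    """Bracket sup{x >= 0 : x*x < target} between a member and an upper bound."""
    lo, hi = 1.0, 2.0  # 1*1 < 2, so lo is in A; 2*2 >= 2, so hi bounds A above
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < target:
            lo = mid   # mid belongs to A, so the supremum is at least mid
        else:
            hi = mid   # mid is an upper bound for A
    return lo, hi


lo, hi = approximate_sup()
```

After the loop, `lo` and `hi` agree to within floating-point precision, and both are close to $\sqrt{2}$.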
LECTURE 4
One of the most important consequences of the Least Upper Bound Property is the
so-called Monotone Sequence Property, which asserts that an increasing sequence which
is bounded above must converge:
Theorem 4.1 (Monotone Sequence Property). If $a_n$ is a sequence of real numbers which
is monotonically increasing (i.e., $a_n \leq a_{n+1}$ for all n) and which is bounded above (i.e.,
there exists $M \in \mathbb{R}$ so that $a_n \leq M$ for all n), then $a_n$ is convergent.
The proof of the preceding theorem is requested on an upcoming homework assignment.
Definition. An infinite series $\sum_{i=0}^{\infty} a_i$ of real numbers is said to converge to S if the
sequence of partial sums
\[ S_n = \sum_{i=0}^{n} a_i \]
tends to S:
\[ \sum_{i=0}^{\infty} a_i = S \quad \text{means that} \quad \lim_{n \to \infty} S_n = S. \]
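\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \tag{4.1} \]
Proof. If $|x| < 1$ and $\epsilon > 0$ are given, then let $N \in \mathbb{N}$ be so large that
\[ |x|^N < \epsilon\,|1 - x|. \]
We are guaranteed that such an N exists since $|x| < 1$. Recalling that
\[ S_n = 1 + x + x^2 + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x}, \]
we find that for $n \geq N$,
\[ \left| S_n - \frac{1}{1 - x} \right| = \frac{|x|^n}{|1 - x|} \leq \frac{|x|^N}{|1 - x|} < \epsilon. \]
Thus
\[ \lim_{n \to \infty} S_n = \frac{1}{1 - x}, \tag{4.2} \]
which is equivalent to the desired formula (4.1).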
The preceding is nothing more than the familiar recipe for dealing with repeating
decimal expansions. For example:
Example 4.1.
\[
\frac{4}{9} = 0.\overline{4}, \qquad
\frac{47}{99} = 0.\overline{47}, \qquad
\frac{476}{999} = 0.\overline{476}, \ \ldots.
\]
\[ \sum_{i=1}^{\infty} \frac{d_i}{b^i} \tag{4.3} \]
\[ (0.d_1 d_2 d_3 d_4 \ldots)_b. \tag{4.4} \]
Proof. First note that if $d_i \in \{0, 1, \ldots, b - 1\}$ for each i, then the formula for the summation of a geometric series tells us that the partial sums of (4.3) are bounded above by
\[
\sum_{i=1}^{\infty} \frac{b - 1}{b^i}
  = (b - 1) \cdot \frac{\frac{1}{b}}{1 - \frac{1}{b}}
  = (b - 1) \cdot \frac{1}{b - 1}
  = 1.
\]
By the preceding theorem, it follows that any series of the form (4.3) converges to a real
number in [0, 1].
Suppose now that $x \in [0, 1)$ and let $d_1$ be the largest natural number such that
\[ \frac{d_1}{b} \leq x. \tag{4.5} \]
In other words, let $d_1 = [xb]$. Observe that $0 \leq d_1 \leq b - 1$ since $d_1 \geq b$ would contradict
the fact that $x < 1$. Similarly, let $d_2$ be the largest natural number such that
\[ \frac{d_1}{b} + \frac{d_2}{b^2} \leq x \tag{4.6} \]
(this can again be defined in terms of the greatest integer function) and observe that $0 \leq
d_2 \leq b - 1$ since $d_2 \geq b$ would violate the maximality of $d_1$ in (4.5). Proceeding in
this manner, we obtain a sequence $d_1, d_2, d_3, \ldots$ of base-b digits of x which satisfy the
inequality
\[ \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} \leq x \tag{4.7} \]
for each n = 1, 2, 3, . . .. Let
\[ A = \left\{ \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} : n = 1, 2, 3, \ldots \right\}. \]
Since $A \neq \emptyset$ and x is an upper bound for A, it follows that $s = \sup A$ exists. The
definition of sup A implies that $s \leq x$. We claim now that $s = x$.
Suppose toward a contradiction that $s < x$. Let $m \in \mathbb{N}$ be so large that
\[ \frac{1}{b^m} < x - s. \]
By the definition of $d_m$, it follows that
\[
x < \underbrace{\frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_m}{b^m}}_{\leq\, s} + \frac{1}{b^m}
  \leq s + \frac{1}{b^m}
  < x.
\]
However, this implies that $x < x$, a contradiction. Since $s \leq x$, it follows that we must
actually have $x = s$, as claimed. The remainder of the proof (the fact that the series (4.3)
converges to x and the uniqueness assertion) is left to the reader.
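The greedy construction of the digits $d_1, d_2, \ldots$ in the proof translates directly into a procedure: at step n, take the largest natural number $d_n$ that keeps the partial sum at or below x. The sketch below does this with exact rational arithmetic; the function name `base_b_digits` and its argument names are hypothetical.

```python
import math
from fractions import Fraction


def base_b_digits(x, b: int, count: int) -> list[int]:
    """Return the first `count` base-b digits of x in [0, 1), chosen greedily."""
    x = Fraction(x)
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    if b < 2:
        raise ValueError("the base b must be at least 2")
    digits = []
    partial = Fraction(0)  # d_1/b + ... + d_{n-1}/b^{n-1}
    for n in range(1, count + 1):
        # Largest natural number d with partial + d/b^n <= x, as in (4.5)-(4.7).
        d = math.floor((x - partial) * b ** n)
        digits.append(d)
        partial += Fraction(d, b ** n)
    return digits


thirds = base_b_digits(Fraction(1, 3), 10, 5)
binary = base_b_digits(Fraction(5, 8), 2, 4)
```

Here `thirds` is `[3, 3, 3, 3, 3]` and `binary` is `[1, 0, 1, 0]`, reflecting $1/3 = (0.3333\ldots)_{10}$ and $5/8 = (0.101)_2$.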
An important fact about infinite decimal expansions is that they can help us better understand the relationship between rational numbers and irrational numbers. The following
theorem precisely characterizes rational and irrational numbers according to their infinite
decimal expansions:
Theorem 4.5. A real number has an eventually repeating decimal expansion if and only if
it is rational. In other words, a real number x is a rational number if and only if its infinite
decimal expansion is of the form
\[ x = A.B\overline{C} \]
where A, B, C are finite blocks of decimal digits.
Proof. Suppose that the real number x has a repeating decimal expansion:
\[ x = A.B\overline{C} \]
where A, B, C are blocks of digits of lengths a, b, c, respectively. Clearly,
\[ x - A = 0.B\overline{C} \]
whence
\[ 10^b (x - A) = B.\overline{C} = B + 0.\overline{C}. \]
The identity
\[ 0.\overline{C} = \frac{C}{10^c - 1} \]
leads to
\[ x = A + 10^{-b} B + \frac{10^{-b}\, C}{10^c - 1}, \]
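The closing formula $x = A + 10^{-b} B + 10^{-b} C / (10^c - 1)$ can be evaluated exactly. In the sketch below, the blocks B and C are passed as digit strings so that their lengths b and c (including leading zeros) are preserved; the function name `repeating_to_fraction` and this calling convention are assumptions made for illustration, and A is taken to be a nonnegative integer.

```python
from fractions import Fraction


def repeating_to_fraction(A: int, B: str, C: str) -> Fraction:
    """Convert A.B(C repeating) to an exact fraction.

    B may be empty; C must be a nonempty string of decimal digits.
    """
    if not C:
        raise ValueError("the repeating block C must be nonempty")
    b, c = len(B), len(C)
    B_value = int(B) if B else 0
    C_value = int(C)
    # x = A + 10^(-b) * B + 10^(-b) * C / (10^c - 1)
    return A + Fraction(B_value, 10 ** b) + Fraction(C_value, 10 ** b * (10 ** c - 1))


example = repeating_to_fraction(0, "", "47")   # 0.474747...
sixth = repeating_to_fraction(0, "1", "6")     # 0.1666...
```

The value `example` equals 47/99, in agreement with Example 4.1, and `sixth` equals 1/6.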
LECTURE 5
Bijections
5.1. Counting Without Counting
The set
A = {apple, bird, cat}
has three elements. What do we mean by three? This is a philosophical question, but
clearly there is some property that the set A shares with the set
B = {a, b, c}.
There is some abstract notion of the number three that A and B share and we instantly
recognize this property even though we cannot define it.
We know that the sets A and B above have the same number of elements since they
both have three elements. Unfortunately, this procedure will not work for infinite sets, the
types of sets that are of interest in real analysis. Nevertheless, by pairing up elements
(apple, a),
(bird, b),
(cat, c)
and noting that there are no elements of A or B left over, we can conclude that A and B
have the same number of elements without counting. In other words, to see that A and B
have the same number of elements does not actually require us to count to three (or to even
know what three is). For finite sets A and B, we observe that
If there is a one-to-one correspondence between the elements
of two finite sets A and B, then A and B have the same number
of elements.
In order to carry over this scheme of counting without counting to more general
sets, we need to discuss functions and their properties. However, let us first examine what
happens if we naively try to count infinite sets.
5.2. Galileo's Paradox
In his final book The Discourses and Mathematical Demonstrations Relating to Two
New Sciences (1638), Galileo has a dialogue between two characters about infinite sets.
They discuss what is now known as Galileo's Paradox. Galileo did not have permission
from the Inquisition to publish this book; after a heresy trial based on an earlier book, the
Roman Inquisition banned Galileo from publishing anything. After failed attempts to publish his book in Germany, France, and Poland, it was finally published in the Netherlands.
Let
S = {0, 1, 4, 9, 16, . . .}
denote the set of perfect squares. Clearly S is a proper subset of N. In other words, $S \subseteq \mathbb{N}$
and $S \neq \mathbb{N}$ since clearly there are natural numbers (like 3) that are not perfect squares.
N:  0  1  2  3   4   5   6   7  ...
S:  0  1  4  9  16  25  36  49  ...
Galileo's Paradox is the apparent contradiction that although S is much "smaller" than
N, we can still pair off elements of N with elements of S. According to our intuition
obtained from studying finite sets, we might say that N and S have the same number of
elements.
In more precise terminology, Galileo's Paradox is essentially the observation that the
set N properly contains
\[ S = \{0, 1, 4, 9, 16, 25, \ldots\}, \]
N:  0  1  2  3   4   5   6   7  ...
S:  0  1  4  9  16  25  36  49  ...
\[ a_1 \neq a_2 \implies f(a_1) \neq f(a_2). \tag{5.1} \]
Since both cases lead to the conclusion that x = y, it follows that f is injective.
Observe that the preceding proof did not automatically assume that f is invertible
(i.e., we did not make any use of the square-root function). Using the square-root function
would be inappropriate here since otherwise our reasoning would have been circular.
Another proof that f is injective can be based upon the contrapositive (5.1) of the
definition. If $x_1 \neq x_2$ and $x_1, x_2 \in [0, \infty)$, then without loss of generality suppose that
$0 \leq x_1 < x_2$. It follows from this that $x_1^2 < x_2^2$ whence $f(x_1) \neq f(x_2)$. To be really
picky, one should prove that $0 \leq x_1 < x_2$ implies that $x_1^2 < x_2^2$. Let $x_2 = x_1 + \epsilon$ where
$\epsilon = x_2 - x_1 > 0$. It follows that
\[ x_2^2 = x_1^2 + 2x_1\epsilon + \epsilon^2 > x_1^2 \]
since $\epsilon > 0$ and $x_1 \geq 0$.
The following theorem is useful for constructing various examples:
Theorem 5.1. If f is differentiable on an open interval I and $f'(x) \neq 0$ for all $x \in I$,
then f is injective on I.
Proof. Suppose toward a contradiction that $a, b \in I$, $a < b$, and $f(a) = f(b)$. By the
Mean Value Theorem from Calculus I, there exists some c with $a < c < b$ such that
\[ f(b) - f(a) = f'(c)(b - a). \]
Since $f(b) - f(a) = 0$ and $f'(c) \neq 0$, it follows that $b - a = 0$ whence $a = b$. This
contradiction proves that f is injective.
5.4. Surjections
Definition. For a function $f : A \to B$, the set A is called the domain of f. The set B is
sometimes called the target set of f. The range of f is defined by
\[ \operatorname{Ran} f = \{b \in B : (\exists a \in A)(b = f(a))\}. \]
The range of f is also sometimes called the image of f and denoted f(A).
Note that the range f(A) of f is not always equal to B.
Example 5.5. We can define a function $f : \mathbb{R} \to \mathbb{R}$ via the formula $f(x) = x^2$. Here
A = B = R so that the domain and target set of f are both R. The range of f, however, is
the interval $[0, \infty) = \{x \in \mathbb{R} : 0 \leq x\}$. This is because not every element of the target set
R is "hit" by the function. This points out the distinction between the target set B and the
range of a function. The target set is what you are aiming for and the range is what you
hit.
Definition. A function $f : A \to B$ is called surjective (with respect to B) if for every
$b \in B$, there exists an $a \in A$ such that $f(a) = b$. A surjective function is called a
surjection.
In symbols, the definition reads:
\[ (\forall b \in B)(\exists a \in A)(f(a) = b). \]
Another commonly used terminology (which you may have heard in your calculus class)
is onto.
Observe that the target set B is of fundamental importance in the definition of surjectivity. By definition, a function is surjective if and only if $\operatorname{Ran} f = B$. That is, if and only
if the range of the function equals the entire target set B.
To say that a function is surjective is the same as saying that it
hits its entire target set.
Whether a function is surjective or not depends heavily on the target set B.
Example 5.6. The function $f : \mathbb{N} \to \mathbb{N}$ defined by $f(n) = n + 1$ is not surjective since
$f(n) \neq 0$ for any $n \in \mathbb{N}$.
Example 5.7. The function $f : \mathbb{Z} \to \mathbb{Z}$ defined by $f(n) = n + 1$ is surjective. Indeed, for
any $b \in \mathbb{Z}$, there exists an $a \in \mathbb{Z}$ (namely $a = b - 1$) so that $f(a) = b$.
The preceding example illustrates the following rule:
A function $f : A \to B$ is surjective if and only if the equation
f(a) = b has a solution for every b in B.
Example 5.8. The function $f : \mathbb{R} \to [-1, 1]$ defined by $f(x) = \sin x$ is surjective. Note
that the equation $f(x) = y$ for $y \in [-1, 1]$ has infinitely many solutions. In particular,
surjectivity does not guarantee that solutions to f(a) = b are necessarily unique.
5.5. Bijections
Definition. If $f : A \to B$ is both injective and surjective, then we say that f is bijective.
Bijections are special. If $f : A \to B$ is a bijection, then we can define an inverse
function $f^{-1} : B \to A$ by setting $f^{-1}(b) = a$ whenever $f(a) = b$. This is well-defined
since f is both surjective and injective. Indeed, if f is not surjective, then $f^{-1}(b)$ cannot
be defined for those $b \in B \setminus f(A)$. If f is not injective, then there may be two distinct
$a_1, a_2 \in A$ such that $f(a_1) = f(a_2) = b$ and hence $f^{-1}(b)$ does not make sense.
Note that
\[ f^{-1} \circ f : A \to A, \qquad f \circ f^{-1} : B \to B \]
and hence $f^{-1} \circ f$ and $f \circ f^{-1}$ are different functions (they have different domains) unless
A = B. Thus $f^{-1} \circ f = I_A$ and $f \circ f^{-1} = I_B$ where $I_A$ and $I_B$ denote the identity
functions on A and B, respectively.
Example 5.9. The table below covers a number of examples:

f(x) =      Range             Injective   Surjective   Bijection
x + 1       {1, 2, 3, ...}    Yes         No           No
x + 1       Z                 Yes         Yes          Yes
sin x       [-1, 1]           No          No           No
x^3 - x     R                 No          Yes          No
tan x       R                 Yes         Yes          Yes

Most of the entries in the table are relatively self-explanatory. A few are worth mentioning
specifically, however. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3 - x$ is a surjection
but not an injection. It is a surjection since $\lim_{x \to \pm\infty} f(x) = \pm\infty$ and f is continuous
(hence by the Intermediate Value Theorem from Calculus I, its range is R). It is not an
injection since $f(1) = f(-1) = 0$.
Definition. Suppose that $f : A \to B$ and $g : B \to C$ are two functions. The composition
$g \circ f$ is the function $g \circ f : A \to C$ defined by
\[ (g \circ f)(a) = g(f(a)). \]
Observe that function composition is associative. Indeed, if $h : C \to D$, then
\[
\begin{aligned}
(h \circ (g \circ f))(a) &= h((g \circ f)(a)) \\
                         &= h(g(f(a))) \\
                         &= (h \circ g)(f(a)) \\
                         &= ((h \circ g) \circ f)(a)
\end{aligned}
\]
for all $a \in A$ and hence we may write $h \circ g \circ f$ without parentheses.
From our perspective, the most important property of function composition is that it
respects the properties of injectivity, surjectivity, and bijectivity:
Theorem 5.2. Let $f : A \to B$ and $g : B \to C$ be functions.
(i) If f and g are injections, then $g \circ f : A \to C$ is an injection,
(ii) If f and g are surjections, then $g \circ f : A \to C$ is a surjection,
(iii) If f and g are bijections, then $g \circ f : A \to C$ is a bijection,
(iv) If $f : A \to B$ is a bijection, then $f^{-1} : B \to A$ is also a bijection.
and hence $g \circ f$ is injective. The first two implications hold because g and f are injections, respectively. Now for (ii). If $c \in C$, then we must find some $a \in A$ such that $(g \circ f)(a) = c$.
Since g is surjective, there exists some $b \in B$ such that $g(b) = c$. Since f is surjective,
there exists some $a \in A$ such that $f(a) = b$. Hence
\[ (g \circ f)(a) = g(f(a)) = g(b) = c \]
and $g \circ f$ is surjective. The proof of (iii) follows immediately from (i) and (ii). Statement
(iv) was discussed above when we defined inverse functions.
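For finite sets, the definitions of injectivity and surjectivity can be checked by brute force, which makes Theorem 5.2 easy to experiment with. The sketch below is an illustration only: the helper names and the sample sets are hypothetical, functions are represented as Python dictionaries, and each domain is assumed to be given as a list of distinct elements.

```python
def is_injective(f: dict, domain: list) -> bool:
    """True if distinct elements of the domain have distinct images."""
    images = [f[a] for a in domain]
    return len(set(images)) == len(images)


def is_surjective(f: dict, domain: list, target: list) -> bool:
    """True if every element of the target set is hit by f."""
    return set(target) <= {f[a] for a in domain}


def compose(g: dict, f: dict, domain: list) -> dict:
    """Return g o f as a dictionary on the given domain."""
    return {a: g[f[a]] for a in domain}


A = ["apple", "bird", "cat"]
B = ["a", "b", "c"]
C = [1, 2, 3]
f = {"apple": "a", "bird": "b", "cat": "c"}   # a bijection A -> B
g = {"a": 2, "b": 3, "c": 1}                  # a bijection B -> C
gf = compose(g, f, A)

composition_is_bijection = is_injective(gf, A) and is_surjective(gf, A, C)
```

Replacing `g` by a non-injective map such as `{"a": 1, "b": 1, "c": 2}` makes both checks on the composition fail, which is consistent with the hypotheses of the theorem being needed.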
LECTURE 6
Cardinality
6.1. Cardinality
Definition. Let A and B be sets. If there exists a bijection $f : A \to B$, then A and B
are said to have equal cardinality (or stated: A and B are of the same cardinality). This is
written $A \cong B$.
Example 6.1. For finite sets, $A \cong B$ just means that A and B have the same number of
elements. For instance, the sets
\[ A = \{\text{apple}, \text{bird}, \text{cat}\}, \qquad B = \{a, b, c\}. \]
Proof. (i) follows from the fact that the identity function $I_A : A \to A$ is a bijection. (ii)
follows from the fact that a bijection $f : A \to B$ has an inverse function $f^{-1} : B \to A$
which is also a bijection. (iii) follows from the fact that the composition of bijections is
also a bijection.
The concept of cardinality allows us to divide up the universe of sets into various
categories. Some important definitions are:
Definition. We say that a set A is
(i) finite if $A = \emptyset$ or $A \cong \{1, 2, \ldots, n\}$ for some $n \in \mathbb{N}$
(ii) infinite if A is not finite
(iii) countable if A is finite or $A \cong \mathbb{N}$
(iv) uncountable if A is not countable.
A countable infinite set is sometimes called countably infinite.
There is an alternate definition of infinite which is sometimes used. We state it in the
form of a theorem (without proof):
Theorem 6.2. A set A is infinite if and only if there exists a proper subset $B \subsetneq A$ such
that $A \cong B$.
6.2. Countable Sets
Let us discuss some examples of countable sets and various methods for constructing
them.
Example 6.2. $\mathbb{N} \cong \mathbb{N}$. Indeed, the identity function $I : \mathbb{N} \to \mathbb{N}$ defined by $I(n) = n$ for
all $n \in \mathbb{N}$ is clearly a bijection.
Theorem 6.3. Any subset of a countable set is countable.
Sketch of Pf. If A is a countable set, then we may list the elements of A:
a0 , a1 , a2 , . . . .
If $B \subseteq A$, then we simply make a new list by crossing out those elements of A which do
not belong to B. This produces a new list which provides a recipe for a bijection.
Example 6.3. $S \cong \mathbb{N}$, where $S = \{0, 1, 4, 9, 16, \ldots\}$. Indeed, the function $f(n) = n^2$ is a
bijection from N onto S.
If A is a countably infinite set, then there is a bijection $f : \mathbb{N} \to A$ which provides a
list of the elements of A:
\[ f(0), \quad f(1), \quad f(2), \quad f(3), \ \ldots. \]
\[ a_0, \quad a_1, \quad a_2, \quad a_3, \quad a_4, \ \ldots \]
\[ b_0, \quad b_1, \quad b_2, \quad b_3, \quad b_4, \ \ldots. \]
We can simply interlace the two lists to obtain a listing of every element of $A \cup B$:
\[ a_0, b_0, a_1, b_1, a_2, b_2, a_3, b_3, a_4, b_4, \ldots. \]
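\[ 0, 1, -1, 2, -2, 3, -3, 4, -4, \ldots \]

n:     0   1   2   3   4   5   6   7  ...
f(n):  0   1  -1   2  -2   3  -3   4  ...

\[
f(n) =
\begin{cases}
-\dfrac{n}{2} & n \text{ even} \\[1ex]
\dfrac{n + 1}{2} & n \text{ odd.}
\end{cases}
\]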
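The piecewise formula above and its inverse can be written out and tested on an initial segment. This is a sketch under the reading that the formula defines a map from N to Z sending even n to $-n/2$ and odd n to $(n + 1)/2$; the function names are hypothetical.

```python
def nat_to_int(n: int) -> int:
    """Map 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ..."""
    if n < 0:
        raise ValueError("n must be a natural number")
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2


def int_to_nat(z: int) -> int:
    """Inverse map: positive z goes to the odd number 2z - 1, the rest to -2z."""
    return 2 * z - 1 if z > 0 else -2 * z


first_values = [nat_to_int(n) for n in range(8)]
```

The list `first_values` reproduces the second row of the table, and composing the two functions in either order returns the input, which is what it means for the map to be a bijection.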
Example 6.5. $\mathbb{N}^2 \cong \mathbb{N}$. This example is so important that we explain it in two different
ways. First, consider Figure 1, which illustrates a definite procedure for listing every element of
$\mathbb{N}^2$. In fact, one can find
a polynomial in two variables which accomplishes this task (we leave the derivation of this
formula to the homework).
FIGURE 1. A listing of $\mathbb{N}^2$.
On a completely different note, there is also a brief number-theoretic argument which
provides another bijection $f : \mathbb{N}^2 \to \mathbb{N}$. We claim that the function
\[ f(a, b) = 2^a (2b + 1) - 1 \]
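The formula $f(a, b) = 2^a(2b + 1) - 1$ can be inverted by observing that every positive integer factors uniquely as a power of 2 times an odd number. The sketch below implements the map and a candidate inverse and checks them on a finite range; this is evidence, not a proof, and the names `pair` and `unpair` are hypothetical.

```python
def pair(a: int, b: int) -> int:
    """The map f(a, b) = 2^a (2b + 1) - 1 from N^2 to N."""
    return 2 ** a * (2 * b + 1) - 1


def unpair(n: int) -> tuple[int, int]:
    """Recover (a, b) from n by factoring n + 1 as 2^a times an odd number."""
    m = n + 1
    a = 0
    while m % 2 == 0:
        m //= 2
        a += 1
    # Now m is odd, so m = 2b + 1 for a unique natural number b.
    return a, (m - 1) // 2


round_trip_ok = all(
    unpair(pair(a, b)) == (a, b) for a in range(12) for b in range(12)
)
```

For example, `pair(0, 0)` is 0, `pair(1, 0)` is 1, and `pair(0, 1)` is 2, so the first few natural numbers are each hit exactly once.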
LECTURE 7
Cantor's Theorem
7.1. Constructions with Countable Sets
Theorem 7.1. If $A_n$ is countable for each $n \in \mathbb{N}$, then $\bigcup_{n \in \mathbb{N}} A_n$ is countable. In other
words, the countable union of countable sets is countable.
Sketch of Pf. Without loss of generality, suppose that each of the $A_n$ is countably infinite
and that $A_i \cap A_j = \emptyset$ if $i \neq j$. For each $n \in \mathbb{N}$, arrange the elements of each $A_n$ in a list:
\[ A_n = \{a_{0n}, a_{1n}, a_{2n}, a_{3n}, \ldots\}. \]
The function
\[ f(i, j) = \text{$i$th element in the listing of $A_j$} = a_{ij} \]
defines a bijection $f : \mathbb{N}^2 \to \bigcup_{n \in \mathbb{N}} A_n$. It therefore follows that
\[ \bigcup_{n \in \mathbb{N}} A_n \cong \mathbb{N}^2 \cong \mathbb{N}. \]
and each set $A_n = \{(a_n, b) : b \in B\}$ is countable, it follows from the preceding theorem
that $A \times B$ is countable.
Example 7.1. $\mathbb{Z}^2 \cong \mathbb{N}$. Indeed, $\mathbb{Z} \cong \mathbb{N}$ (i.e., $\mathbb{Z}$ is countable) and hence the preceding
theorem tells us that $\mathbb{Z}^2$ is countable as well. This can also be established via a "snake
eating the dots" argument. First regard $\mathbb{Z}^2$ as a subset of the Euclidean plane. Starting at
$(0, 0)$, trace out a square spiral pattern which hits every lattice point $(a, b) \in \mathbb{Z}^2$.
Example 7.2. $\mathbb{Q} \cong \mathbb{N}$. To each point $(a, b) \in \mathbb{Z}^2$, we can associate the fraction $a/b$.
Some of these will be meaningless (if $b = 0$) and many will be repeats, since $1/2 = 2/4 =
3/6 = \cdots$, for example. We can, however, produce a list of all of $\mathbb{Q}$ by using the snake
argument from the preceding example: we list the fractions $a/b$ in the order in which the snake
visits the points $(a, b)$, skipping the meaningless ones and the repeats.
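The snake argument can be carried out mechanically. The sketch below is one possible implementation, not the only one: `square_spiral` walks a square spiral through $\mathbb{Z}^2$ starting at $(0, 0)$, and `rationals` reads each lattice point $(a, b)$ as $a/b$, skipping $b = 0$ and anything already listed. Both function names, and the particular direction in which the spiral turns, are our own choices.

```python
from fractions import Fraction
from itertools import islice


def square_spiral():
    """Yield every lattice point of Z^2 exactly once, spiralling out from (0, 0)."""
    x = y = 0
    yield (x, y)
    step = 1
    while True:
        # right and up by `step`, then left and down by `step + 1`
        for dx, dy, length in ((1, 0, step), (0, 1, step),
                               (-1, 0, step + 1), (0, -1, step + 1)):
            for _ in range(length):
                x, y = x + dx, y + dy
                yield (x, y)
        step += 2


def rationals():
    """List Q without repeats by reading (a, b) as a/b along the spiral."""
    seen = set()
    for a, b in square_spiral():
        if b == 0:
            continue              # a/0 is meaningless
        q = Fraction(a, b)        # Fraction reduces, so 1/2 and 2/4 coincide
        if q not in seen:
            seen.add(q)
            yield q


print(list(islice(rationals(), 10)))
```

Since every pair $(a, b)$ is eventually visited by the spiral, every rational number eventually shows up in the list, which is exactly what the argument requires.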
Another way to prove that $\mathbb{Q} \cong \mathbb{N}$ is to use the fact that $\mathbb{Q} \cap [0, 1] \cong \mathbb{N}$ and employ
some of our theorems on constructing countable sets. We leave this as an exercise.
One might begin to suspect that bijections between infinite sets are essentially meaningless and that all infinities are the same. Shockingly, it turns out that this is not the
case. The following remarkable theorem is due to Georg Cantor:
Theorem 7.3 (Cantor). $\mathbb{R}$ is uncountable. In other words, there does not exist a bijection
$f : \mathbb{N} \to \mathbb{R}$.
Proof. Suppose toward a contradiction that a bijection $f : \mathbb{N} \to \mathbb{R}$ exists (in fact, we will
prove that no surjection $f : \mathbb{N} \to \mathbb{R}$ exists). We will use the fact that any real number can
be written uniquely as a sequence of decimal digits$^1$. Since the function $f$ is supposed to
be a bijection from $\mathbb{N}$ to $\mathbb{R}$, we obtain a complete listing
\[ f(0), f(1), f(2), f(3), \ldots \]
of all of $\mathbb{R}$. Let us write this list as an array:
\[
\begin{aligned}
f(0) &= d_{00}.d_{01}d_{02}d_{03}d_{04}\ldots \\
f(1) &= d_{10}.d_{11}d_{12}d_{13}d_{14}\ldots \\
f(2) &= d_{20}.d_{21}d_{22}d_{23}d_{24}\ldots \\
f(3) &= d_{30}.d_{31}d_{32}d_{33}d_{34}\ldots \\
f(4) &= d_{40}.d_{41}d_{42}d_{43}d_{44}\ldots \\
&\;\;\vdots
\end{aligned}
\]
where the $d_{i0}$'s are integers and $d_{ij} \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ for $j \geq 1$. We will
take the "diagonal number"
\[ d_{00}.d_{11}d_{22}d_{33}d_{44}\ldots \]
and tweak it so that the resulting number cannot possibly be on our list. This will be our
desired contradiction.
Consider the new number
\[ x = D_0.D_1D_2D_3\ldots \]
where the new digits $D_n$ are defined by
\[
D_n = \begin{cases} 4 & d_{nn} \neq 4 \\ 7 & d_{nn} = 4. \end{cases}
\]
Note that for each $n \in \mathbb{N}$, the $n$th decimal place of $x$ is different from the $n$th decimal place
of $f(n)$. In other words, $x$ cannot be any of the $f(n)$ and hence the function $f : \mathbb{N} \to \mathbb{R}$ is
not surjective, a contradiction.
The numbers 4 and 7 in the preceding proof are not important. We just do not want to
use 9's in either case since otherwise the number $x$ produced might end in all 9's, which
would cause a problem since we are using decimal expansions that do not trail off in all
9's.
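The diagonal construction is completely mechanical, which a short Python sketch makes plain. Here the "list" is only a finite block of made-up digits (row $n$ holds $d_{n0}, d_{n1}, d_{n2}, \ldots$, with the integer part assumed to be a single digit for simplicity), and the function name `diagonal_digits` is ours.

```python
def diagonal_digits(rows):
    """Return D_0, D_1, ... where D_n = 4 if d_nn != 4 and D_n = 7 if d_nn = 4."""
    return [4 if row[n] != 4 else 7 for n, row in enumerate(rows)]


# Hypothetical sample data: row n lists the digits d_n0, d_n1, d_n2, ...
rows = [
    [3, 1, 4, 1, 5],
    [0, 4, 9, 9, 9],
    [2, 7, 4, 8, 2],
    [1, 0, 0, 0, 0],
    [4, 4, 4, 4, 4],
]

new_digits = diagonal_digits(rows)
print(new_digits)
# The new number disagrees with row n in position n, for every n.
print(all(new_digits[n] != rows[n][n] for n in range(len(rows))))
```

No matter which rows are supplied, the number built from `new_digits` differs from the $n$th row in its $n$th digit, so it cannot coincide with any row.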
MORAL: R is so much larger than N that it belongs to a higher
class of infinite sets. In other words, there are different levels
of infinity.
$^1$If we agree not to end in all 9's. For instance: $0.50000\ldots = 0.4999999\ldots$.
Corollary 3. The set R\Q of irrational numbers is uncountable. In particular, there are
more irrational numbers than rational numbers.
Proof. Recall that $\mathbb{Q}$ is countable. Since the union of two countable sets is countable, if
$\mathbb{R} \setminus \mathbb{Q}$ were countable, then $\mathbb{R} = \mathbb{Q} \cup (\mathbb{R} \setminus \mathbb{Q})$ would be countable too. This would
contradict Cantor's Theorem.
Corollary 4. Every subinterval (a, b) of R is uncountable.
Sketch of Pf. It suffices to find a bijection between $(a, b)$ and $\mathbb{R}$. This can be done, for
instance, by composing an appropriate linear function $f(x) = mx + c$ (with $m \neq 0$) with a
continuous, monotone increasing function with two vertical asymptotes, such as $g(x) =
\tan x$ on the interval $(-\pi/2, \pi/2)$ or $h(x) = x/(1 - x^2)$ on the interval $(-1, 1)$.
The previous corollary asserts that not only does R contain vastly more elements than
N, any tiny subinterval of R, no matter how small, does as well. This may seem paradoxical
at first, and it takes a long time to digest. Try to think of this in the context of the fact that
between any two rational numbers, there is an irrational number and that between any two
irrational numbers, there is a rational number.
LECTURE 8
Example 8.2. Describing the power set of infinite sets is much trickier. For instance, $\mathcal{P}(\mathbb{N})$
contains every possible subset of $\mathbb{N}$ and hence contains the sets
\[ \emptyset, \{1\}, \{5, 23\}, \{2, 4, 6, 8, \ldots\}, \{100, 101, 102, \ldots\}, \{2, 3, 5, 7, 11, 13, \ldots\}. \]
It turns out that P(N) is much larger than N itself. In fact, Cantor showed that there
are infinitely many levels of infinity:
Theorem 8.1 (Cantor). If $S$ is any set, then there does not exist a bijection $f : S \to \mathcal{P}(S)$.
In other words, $\mathcal{P}(S)$ is of a strictly larger cardinality than $S$.
Proof. Assume toward a contradiction that $f : S \to \mathcal{P}(S)$ is a bijection. For each $x \in S$,
we have $f(x) \subseteq S$ and hence either $x \in f(x)$ or $x \notin f(x)$. Let
\[ E = \{x \in S : x \notin f(x)\}. \]
Since $f$ is a bijection, there exists a $z \in S$ such that $f(z) = E$. However,
\[
\begin{aligned}
z \in E &\iff z \notin f(z) && \text{Def. of } E \\
&\iff z \notin E && \text{Since } f(z) = E,
\end{aligned}
\]
which is a contradiction.
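For a small finite set one can simply try every function $f : S \to \mathcal{P}(S)$ and confirm that the set $E$ from the proof is never one of its values. The following Python sketch does this for a three-element set; the helper names `power_set` and `cantor_witness` are ours.

```python
from itertools import combinations, product


def power_set(s):
    """All subsets of s, each represented as a frozenset."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cantor_witness(s, f):
    """The set E = {x in S : x not in f(x)} from the proof."""
    return frozenset(x for x in s if x not in f[x])


S = {0, 1, 2}
subsets = power_set(S)

missed_every_time = True
for values in product(subsets, repeat=len(S)):   # all 8^3 functions S -> P(S)
    f = dict(zip(sorted(S), values))
    if cantor_witness(S, f) in f.values():
        missed_every_time = False

print(len(subsets), missed_every_time)
```

The brute-force search is only feasible for tiny sets, but the same set $E$ works for every $S$, finite or infinite, which is the whole point of the proof.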
The preceding theorem shows that if S is an infinite set, then P(S) is much bigger
than S itself, so much bigger that it bumps up to a higher level of infinity. Moreover, we
can obtain a chain of ever larger infinite sets:
\[ S, \quad \mathcal{P}(S), \quad \mathcal{P}(\mathcal{P}(S)), \quad \ldots. \]
The expression
\[ P(x) = (x \text{ is a set}) \wedge (x \notin x) \]
is quite unambiguous. An object $x$ should either be a set or not a set. An object $x$ should
either be an element of itself or not be an element of itself. Thus $P(x)$ looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to
think about the set $R$.
Russell then asks: Does $R$ contain itself or not? Unfortunately, the definition of $R$
implies that
\[ R \in R \iff R \notin R. \]
Neither $R \in R$ nor $R \notin R$ is logically possible! This means that we cannot treat $R$
as a set; it is simply too large of an idea to be considered in a logically sound manner.
In other words, we cannot logically consider the set of all sets that are not elements of
themselves without running into paradoxes. We just cannot; it is a law of the universe.
Russell's Paradox shows that the General Comprehension Principle is not correct.
Russell discovered this paradox and sent it to Gottlob Frege (1848-1925) as Frege was
finishing his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the
laws of arithmetic from supposedly logical axioms. Russell's Paradox invalidated much of
Frege's work. Indeed, Frege noted:
A scientist can hardly meet with anything more undesirable than to have the
foundation give way just as the work is finished. I was put in this position
by a letter from Mr. Bertrand Russell when the work was nearly through the
press.
There are many other logical paradoxes that have been discovered throughout the
years, but Russell's paradox is one of the most important. It forced mathematicians and
logicians to completely reevaluate mathematics and logic from the ground up. Russell's
Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic
fashion. The rules would have to be explicitly stated in such a way that Russell's Paradox
would not occur in the universe of Axiomatic Set Theory. Although we will not discuss
axiomatic set theory in this course, it is important to be aware that sets and set theory are
not as simple as they sound.
Here are a couple of paradoxes which are somewhat similar in spirit:
Example 8.3. A car is equipped with a "Russell light" on its dashboard. The light turns on
to warn the driver if a light has burnt out. What happens when the Russell light burns out?
Example 8.4. The following paradox of Eubulides of Miletus$^1$ (4th century BCE) indicates
that self-reference can be troublesome:
"This statement is false."
This is a troublesome sentence (call it $P$) since
\[ P \text{ is true} \iff P \text{ is false}. \]
Thus Eubulides' statement is not a logical proposition. This paradox is similar to the liar
paradox: "I am lying."
8.3. The Continuum Hypothesis
Question. If $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then is it necessarily the case that either $A \cong \mathbb{N}$ or $A \cong \mathbb{R}$?
Phrased another way, are there intermediate cardinalities between that of $\mathbb{N}$ and that of
$\mathbb{R}$?
The Continuum Hypothesis (CH) asserts that if $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then either $A \cong \mathbb{N}$
or $A \cong \mathbb{R}$. Georg Cantor believed CH to be true, and spent years attempting to prove
it. David Hilbert, one of the greatest mathematicians in history, placed it first on his list
of open questions presented to the 1900 International Congress of Mathematicians in Paris.
Surprisingly, the question of whether CH is true or false cannot be settled using the
standard axioms of set theory.
In 1940, Kurt Gödel proved that CH cannot be disproved from the axioms of set theory. Specifically, he showed that CH cannot be disproved using the Zermelo-Fraenkel (ZF)
axioms or using the Zermelo-Fraenkel axioms with the addition of the (at one time controversial) Axiom of Choice (AC). This extended axiom system is denoted ZFC. In 1963, Paul
Cohen demonstrated that CH cannot be proved from ZFC either and hence CH is logically
independent of ZFC: it can be neither proved nor disproved from the standard axioms of set
theory (of course, the results of Gödel and Cohen rely on the assumption that ZFC is
itself consistent).
Using the standard (ZFC) axioms of set theory, one can add CH or its negation to
obtain two different versions of mathematics, one in which CH is true and one in which
CH is false. Each universe is as valid as the other; the truth or falsehood of CH is therefore
a matter of opinion, since it cannot be proved or disproved from ZFC. This seems bizarre,
but it is easier to understand if we examine a similar situation that occurred in classical
geometry.
$^1$I have actually been to Miletus (now known as Milet, in modern Turkey). There are many fascinating
Roman-era ruins, partially sunken below a swamp, which are open to the public. There are, however, few tourists
who visit the site.
from Postulates 1-4 and call it a theorem instead. This is precisely what people tried to do
(unsuccessfully) for 2000 years.
Given only Postulates 1-4, it is impossible to prove or to disprove Postulate 5. In
other words, Euclid's 5th Postulate is neither true nor false in the mathematical universe
generated by Postulates 1-4. This is not a statement about universal truth and universal
falsehood, which is reserved for philosophy. It means only that if we are given only Postulates 1-4 as true, we cannot logically deduce the truth or falsehood of Postulate 5. One
says that the 5th Postulate is logically independent of Postulates 1-4.
This opens up two possible mathematical universes, each as valid as the other. If one
assumes that Euclid's 5th Postulate is true and proceeds to prove theorems based
on Postulates 1-5, then one is proving theorems about Euclidean (flat) geometry. If one
assumes that Euclid's 5th Postulate is false and proceeds to prove theorems in this setting,
then one is proving theorems about hyperbolic geometry (a type of curved geometry). The
existence of curved geometries is not surprising to us in the 21st century, since we are used
to hearing of relativity and curved spacetime. Many years ago, however, this was an
extremely radical thought. Indeed, the philosopher Immanuel Kant went so far as to say
that Euclidean geometry is "the inevitable necessity of thought."
LECTURE 9
for all $u \in V$.
(iv) ADDITIVE INVERSE: For every $u \in V$, there exists a $v \in V$ such that $u + v = 0$.
(v) MULTIPLICATIVE IDENTITY: $1u = u$ for all $u \in V$.
(vi) DISTRIBUTIVITY:
\[ a(u + v) = au + av, \qquad (a + b)u = au + bu \]
for all $a, b \in \mathbb{R}$ and $u, v \in V$.
Note that in general we do not have a rule that lets us multiply two vectors (e.g., like
a "cross product"). An important theorem for constructing and identifying new vector
spaces is the following:
Theorem 9.1. A nonempty subset $W$ of a vector space $V$ is itself a vector space (with the operations
inherited from $V$) if and only if $c_1 w_1 + c_2 w_2 \in W$ for all $c_1, c_2 \in \mathbb{R}$ and $w_1, w_2 \in W$.
Example 9.1. R itself is a vector space. The operations are simply the usual operations of
addition and multiplication.
Example 9.2. The simplest and most important nontrivial example of a vector space is
$n$-dimensional Euclidean space, $\mathbb{R}^n$, with the usual operations of vector addition
\[ (x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n) \]
and scalar multiplication
\[ a(x_1, \ldots, x_n) = (ax_1, \ldots, ax_n). \]
Example 9.3. The set $M_n(\mathbb{R})$ of all $n \times n$ matrices is a vector space. Indeed, matrices
can be added and multiplied by constants (one can check that the vector space axioms are
satisfied). In fact, $M_n(\mathbb{R})$ is really a disguised version of $\mathbb{R}^{n^2}$.
Example 9.4. Let $P_n(\mathbb{R})$ denote the set of polynomials of degree $\leq n$:
\[ a_0 + a_1 x + \cdots + a_n x^n. \]
Notice the similarity between $P_n(\mathbb{R})$ and $\mathbb{R}^{n+1}$. Each polynomial in $P_n(\mathbb{R})$ is uniquely
determined by an $(n + 1)$-tuple $(a_0, a_1, \ldots, a_n)$ of real numbers.
Example 9.5. If $X \subseteq \mathbb{R}^n$, then the set $C(X)$ of all continuous real-valued functions
$f : X \to \mathbb{R}$ is a vector space. Using the fact that the sum of continuous functions is
continuous, one can verify that $C(X)$ is closed under the operations of addition and scalar
multiplication. In particular, note that the zero function plays the role of the zero vector in
$C(X)$.
Example 9.6. The set
\[ V = \{f : \mathbb{R} \to \mathbb{R} : (\forall x \in \mathbb{R})(f''(x) + f(x) = 0)\} \]
of solutions to the differential equation
\[ y''(x) + y(x) = 0 \tag{9.1} \]
is a vector space (with the regular multiplication and function addition playing the roles of
scalar multiplication and vector addition).
Recall from Calculus I that every differentiable function is automatically continuous.
It therefore follows that $V \subseteq C(\mathbb{R})$, a known vector space. By Theorem 9.1, to show that
$V$ is a vector space, we need only show that if $y_1$ and $y_2$ are two solutions to (9.1) and
$c_1, c_2 \in \mathbb{R}$, then $c_1 y_1 + c_2 y_2$ also satisfies the differential equation (9.1):
\[
\begin{aligned}
(c_1 y_1 + c_2 y_2)'' + (c_1 y_1 + c_2 y_2) &= c_1 y_1'' + c_2 y_2'' + c_1 y_1 + c_2 y_2 \\
&= c_1 (y_1'' + y_1) + c_2 (y_2'' + y_2) \\
&= c_1 \cdot 0 + c_2 \cdot 0 \\
&= 0.
\end{aligned}
\]
In any case, the main point of this discussion is to explain that vector spaces composed
of functions often arise in natural settings.
9.2. Norms on Vector Spaces
Definition. A norm on a vector space $V$ is any function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the
following conditions:
(i) $\|v\| \geq 0$ for all $v \in V$ and $\|v\| = 0$ if and only if $v = 0$,
(ii) $\|av\| = |a| \|v\|$ for any $a \in \mathbb{R}$ and $v \in V$,
(iii) $\|u + v\| \leq \|u\| + \|v\|$ for all $u, v \in V$.
The inequality (iii) in the preceding definition is known as the Triangle Inequality.
Example 9.7. $\mathbb{R}$ is a normed linear space when equipped with the norm $\|a\| = |a|$. In fact,
norms are generalizations of the absolute value function to vector spaces. Also observe
that if $\alpha > 0$, then $\|a\| = \alpha |a|$ is also a norm on $\mathbb{R}$.
Often there are several possible norms on a given vector space. In that case, we should
be specific about stating which norm we are using.
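Example 9.8. There are many different norms on $\mathbb{R}^n$. For instance, the following norms
on $\mathbb{R}^n$ are extremely important:
\[
\|v\|_1 = \sum_{i=1}^n |v_i|, \qquad
\|v\|_2 = \sqrt{\sum_{i=1}^n |v_i|^2}, \qquad
\|v\|_\infty = \max_{1 \leq i \leq n} |v_i|.
\]
Example 9.9. The function
\[ \|v\| = \frac{1}{n} \sum_{i=1}^n |v_i| \]
is a norm on $\mathbb{R}^n$. It is $1/n$ times the 1-norm $\|\cdot\|_1$ on $\mathbb{R}^n$. Observe that this new norm is
simply the mean of the absolute values of the entries of $v$. In particular, it is not hard to
see how this norm would come up in statistics.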
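To get a feel for these norms, it helps to compute a few by hand or by machine. The Python sketch below implements the 1-norm, the 2-norm, and the rescaled norm that is $1/n$ times the 1-norm, and spot-checks the Triangle Inequality on one pair of vectors; the function names and the sample vectors are ours.

```python
import math


def norm_1(v):
    """1-norm: sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)


def norm_2(v):
    """2-norm: square root of the sum of the squared absolute values."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def mean_abs(v):
    """(1/n) times the 1-norm: the mean absolute value of the entries."""
    return norm_1(v) / len(v)


def add(u, v):
    """Entrywise vector addition."""
    return [a + b for a, b in zip(u, v)]


u, v = [3.0, -4.0], [1.0, 2.0]
for norm in (norm_1, norm_2, mean_abs):
    lhs, rhs = norm(add(u, v)), norm(u) + norm(v)
    # small tolerance guards against floating-point rounding
    print(norm.__name__, norm(u), lhs <= rhs + 1e-12)
```

A single numerical check proves nothing, of course, but it is a quick way to catch a formula that is not a norm at all.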
Example 9.10. If $V = C([a, b])$, the vector space of continuous functions on $[a, b]$, then we
have a choice of many possible norms. The following norms on $C([a, b])$ are all extremely
important:
\[
\|f\|_1 = \int_a^b |f(x)| \, dx, \qquad
\|f\|_2 = \sqrt{\int_a^b |f(x)|^2 \, dx}, \qquad
\|f\|_\infty = \sup_{a \leq x \leq b} |f(x)|.
\]
Observe that since the functions we are considering are continuous, the preceding norms are
actually well-defined. For instance, if $f$ is continuous, then $|f|$ is continuous as well and has
an absolute maximum on $[a, b]$ by the Extreme Value Theorem from Calculus I, so $\|f\|_\infty$ is well-defined. In fact, you have actually been using the $\infty$-norm (also called the sup norm) for
most of your mathematical career since
\[ \|f\|_\infty = \text{the absolute maximum of } |f(x)| \text{ on } [a, b]. \]
Example 9.11. If $V$ denotes the vector space of all possible functions $f : [a, b] \to \mathbb{R}$,
then there are no useful norms that can be defined on $V$. Indeed, the norms from the
preceding examples are no longer well-defined since without any restrictions on $f$, the
integrals defining the prospective norms need not exist (i.e., they can "blow up" or be
undefined). In other words, this vector space is simply "too large" to have any useful
geometric structure.
LECTURE 10
Metric Spaces
10.1. Metric Spaces
Having seen that normed linear spaces and inner product spaces (see Notes on Inner
Products) are natural generalizations of Rn , we now turn to metric spaces, which can be
loosely characterized as anything that we can have a halfway decent notion of distance
in.
Definition. A metric space is a set $M$, whose elements are called points, endowed with a
metric $d : M \times M \to \mathbb{R}$ that satisfies the following properties:
(i) $d(x, y) \geq 0$ for all $x, y \in M$. Moreover, $d(x, y) = 0$ if and only if $x = y$,
\[
\begin{aligned}
&= \|(x - y) + (y - z)\| \\
&\leq \|x - y\| + \|y - z\|
\end{aligned}
\]
In particular, $\mathbb{R}^n$ is a metric space in several different ways (recall that there are many
different ways to place a norm on $\mathbb{R}^n$). When confusion might occur, we should be specific
about which metric we are using.
Example 10.2. The set
\[ M_n(\mathbb{R}) = \{A : A \text{ is an } n \times n \text{ matrix}\} \]
is a metric space. Indeed, it can be viewed as $\mathbb{R}^{n^2}$ and hence we can equip $M_n(\mathbb{R})$ with
any of the metrics that we place on $\mathbb{R}^{n^2}$. For instance, if $a_{ij}$ denotes the $ij$th entry of $A$,
then
\[ \|A\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij}|^2} \]
defines a norm on $M_n(\mathbb{R})$. Thus if $A$ and $B$ are $n \times n$ matrices with entries $a_{ij}$ and $b_{ij}$, then
\[ d_2(A, B) = \|A - B\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij} - b_{ij}|^2} \]
\[ \sum_{i,j=1}^n |a_{ij} - b_{ij}|, \]
One important thing to note is that, unlike normed vector spaces (which include inner
product spaces), there is no notion of adding and scalar multiplication in general metric
spaces.
Definition. If $X$ is a nonempty set, then we define the discrete metric on $X$ to be the
metric defined by
\[
d(x, y) = \begin{cases} 0 & x = y \\ 1 & x \neq y. \end{cases}
\]
The discrete metric comes up occasionally in graph theory and computer science. If
X is a metric space equipped with the discrete metric, then all points of X lie at a unit
distance from all other points. This is a difficult thing to visualize. For instance, picture
what the discrete metric on R looks like.
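One way to build some intuition is to test the usual metric axioms (nonnegativity, vanishing exactly on the diagonal, symmetry, and the triangle inequality) on a finite sample of points. The Python sketch below does so; the checker `is_metric` and the sample of real numbers are our own choices, and passing on a finite sample is evidence rather than proof.

```python
from itertools import product


def discrete_metric(x, y):
    """Distance 0 from a point to itself and 1 between distinct points."""
    return 0 if x == y else 1


def is_metric(points, d):
    """Check the metric axioms on a finite collection of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False          # nonnegativity / d = 0 iff x = y fails
        if d(x, y) != d(y, x):
            return False          # symmetry fails
        if d(x, z) > d(x, y) + d(y, z):
            return False          # triangle inequality fails
    return True


sample = [0.0, 0.5, 1.0, 3.14, -2.0]   # a finite sample of real numbers
print(is_metric(sample, discrete_metric))
# By contrast, the squared distance fails the triangle inequality:
print(is_metric(sample, lambda x, y: (x - y) ** 2))
```

The triangle inequality for the discrete metric holds for a simple reason: the right-hand side $d(x, y) + d(y, z)$ can be 0 only when $x = y = z$.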
Definition. If $X$ is a metric space and $Y \subseteq X$, then $Y$ is also a metric space (when
equipped with the metric inherited from $X$). We say that $Y$ is a subspace of $X$.$^1$
Example 10.3. A Möbius strip in $\mathbb{R}^3$ (with the standard metric) is also a metric space.
The set of all invertible $n \times n$ matrices is a metric subspace of $M_n(\mathbb{R})$, although it is not a
vector subspace (it is not closed under addition).
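Example 10.4. The spiral
\[ Y = \{(r \cos r, r \sin r) \in \mathbb{R}^2 : r \geq 0\} \]
\[
n \geq N_x \implies d(x_n, x) < \frac{\epsilon}{2}
\qquad \text{and} \qquad
n \geq N_y \implies d(x_n, y) < \frac{\epsilon}{2}.
\]
Let $N = \max\{N_x, N_y\}$. If $n \geq N$, then it follows from the triangle inequality that
\[ d(x, y) \leq d(x, x_n) + d(x_n, y) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Thus $0 \leq d(x, y) < \epsilon$ for any $\epsilon > 0$, whence $d(x, y) = 0$ by the $\epsilon$-Principle. By the
definition of a metric, it follows that $x = y$, as desired.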
Another important property of convergent sequences is that they are always bounded:
Theorem 10.2. A convergent sequence is bounded. In other words, if $\lim_{n \to \infty} x_n = x$,
then there exists $R > 0$ so that $d(x_n, x) \leq R$ for all $n \in \mathbb{N}$.
Proof. Letting $\epsilon = 1$ in the definition of convergence, we find that there exists $N \in \mathbb{N}$ so
that $d(x_n, x) < 1$ whenever $n \geq N$. Since
\[ d(x_0, x), d(x_1, x), \ldots, d(x_{N-1}, x) \]
\[ n = 0, 1, 2, \ldots, N - 1. \]
Geometrically speaking, the preceding theorem asserts that the open ball
\[ B_R(x) = \{y \in M : d(x, y) < R\} \]
LECTURE 11
Subsequences, Continuity
11.1. Subsequences
Definition. If $x_n$ is a sequence in a metric space $(M, d)$, then we say that $y_k$ is a subsequence of $x_n$ if there is a sequence $n_k$ of natural numbers such that
\[ 0 \leq n_0 < n_1 < \cdots \qquad \text{and} \qquad y_k = x_{n_k}. \]
Observe that the terms in the subsequence $y_k$ must appear in the same order that they
appeared in $x_n$. Furthermore, note that $k \leq n_k$ for all $k \in \mathbb{N}$.
Example 11.1. The sequence
\[ 1, \frac{1}{2}, \frac{1}{3}, \ldots \]
converges to 0 in the metric space $(\mathbb{R}, d)$, where $d(x, y) = |x - y|$ is the usual metric. The
sequence
\[ 1, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \frac{1}{9}, \ldots \]
is a subsequence of the original sequence. On the other hand,
\[ \frac{1}{5}, \frac{1}{3}, \frac{1}{3}, \frac{1}{9}, 1, \ldots \]
is not a subsequence for a variety of reasons. First, the terms do not appear in the same
order as they did in the original sequence. Second, $\frac{1}{3}$ appears twice, but occurs only
once in the original sequence.
Theorem 11.1. Every subsequence of a convergent sequence converges and it converges
to the same limit as the original sequence does.
Proof. Suppose that $x_n \to x$ and let $y_k = x_{n_k}$ be a subsequence of $x_n$. If $\epsilon > 0$, let $N \in \mathbb{N}$ be so large that
$d(x_n, x) < \epsilon$ for $n \geq N$. Since $k \leq n_k$ for all $k \in \mathbb{N}$, it follows that $d(y_k, x) =
d(x_{n_k}, x) < \epsilon$ whenever $k \geq N$. Thus $y_k \to x$.
Keep in mind the following examples of sequences and subsequences.
Example 11.2. The sequence $x_n = (-1)^n$ in $\mathbb{R}$ (with the usual metric) does not converge.
However, the subsequences $1, 1, 1, \ldots$ and $-1, -1, -1, \ldots$ do converge (to different limits).
Example 11.3. The sequence
\[
x_n = \begin{cases} 1/n & n \text{ even} \\ n & n \text{ odd} \end{cases}
\]
in $\mathbb{R}$ (with the standard metric) does not converge. However, one can show that every
subsequence of $x_n$ which does converge converges to 0.
The principal objects of study in analysis are functions. In particular, we are interested in continuous functions. The following definition is a generalization of the notion of
continuity encountered in calculus:
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. We say that a function $f : A \to B$
is continuous at a point $x_0 \in A$ if
\[ (\forall \epsilon > 0)(\exists \delta > 0)(d_A(x, x_0) < \delta \implies d_B(f(x), f(x_0)) < \epsilon). \]
In fact, this is essentially the Euclidean metric on $\mathbb{R}^{n^2}$! Thus the determinant function
$\det : M_n(\mathbb{R}) \to \mathbb{R}$ is continuous (with respect to any of the metrics above) since $\det$
is a polynomial in the $n^2$ real variables $a_{ij}$ (where $1 \leq i, j \leq n$) and we know from
Multivariable Calculus that polynomial functions are continuous functions.
The following theorem is an example of a standard continuity argument:
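Theorem 11.2. Let $(A, d_A)$ be a metric space and let $B$ be a normed vector space with
norm $\|\cdot\|_B$.$^1$ If $f : A \to B$ and $g : A \to B$ are continuous, then $f + g$ is also continuous.
Proof. This is another $\epsilon/2$ argument. Fix $x \in A$, let $\epsilon > 0$ be given and note that the definition of
continuity gives us $\delta_1 > 0$ and $\delta_2 > 0$ so that
\[
d_A(x, y) < \delta_1 \implies \|f(x) - f(y)\|_B < \frac{\epsilon}{2},
\qquad
d_A(x, y) < \delta_2 \implies \|g(x) - g(y)\|_B < \frac{\epsilon}{2}.
\]
If $\delta = \min\{\delta_1, \delta_2\}$ and $d_A(x, y) < \delta$, then the Triangle Inequality yields
\[
\|(f + g)(x) - (f + g)(y)\|_B \leq \|f(x) - f(y)\|_B + \|g(x) - g(y)\|_B < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\]
Thus $f + g$ is continuous.
$^1$Recall that $B$ is automatically a metric space with metric $d_B(x, y) = \|x - y\|_B$.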
Example 11.6. If we take A = B = R with the usual metric, then the preceding theorem
simply states that the sum of two continuous functions is continuous.
LECTURE 12
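\[ a_n \xrightarrow{A} a \implies f(a_n) \xrightarrow{B} f(a). \tag{12.1} \]
Proof. ($\Rightarrow$) Suppose that $f$ is continuous at $a$ and that $a_n \xrightarrow{A} a$. Let $\epsilon > 0$ and use the
definition of continuity to obtain a $\delta > 0$ so that
\[ d_A(a, a') < \delta \implies d_B(f(a), f(a')) < \epsilon. \]
\[ n \geq N \implies d_A(a_n, a) < \delta. \]
\[ \frac{1}{2^n} \]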
By the Squeeze Theorem, $a_n \xrightarrow{A} a$ whence $f(a_n) \xrightarrow{B} f(a)$ by hypothesis. However, this
contradicts the fact that $d(f(a_n), f(a)) \geq \epsilon$ for all $n \in \mathbb{N}$. Thus $f$ must actually be
continuous at $a$, as desired.
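$g : B \to C$ be two continuous functions. Let $a_n \xrightarrow{A} a$ be a convergent sequence in $A$.
Since $f$ is continuous, it follows that
\[ f(a_n) \xrightarrow{B} f(a). \]
Since $g$ is continuous, it follows in turn that
\[ g(f(a_n)) \xrightarrow{C} g(f(a)). \]
Therefore $(g \circ f)(a_n) \xrightarrow{C} (g \circ f)(a)$ whenever $a_n \xrightarrow{A} a$. By the previous theorem, we can
conclude that the composition $g \circ f : A \to C$ is continuous.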
LECTURE 13
Closed Sets
13.1. Limit, Accumulation, and Isolated Points
Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$.
(i) A point $x \in M$ is a limit point of $S$ if there exists a sequence $x_n$ in $S$ so that
$x_n \to x$.
(ii) A point $x \in M$ is an accumulation point of $S$ if there exists a sequence $x_n$ of
distinct points of $S$ so that $x_n \to x$.
(iii) A point $x \in M$ is called an isolated point of $S$ if there exists $\epsilon > 0$ such that
$B_\epsilon(x) \cap S = \{x\}$.
Here $B_\epsilon(x)$ denotes the open ball of radius $\epsilon$ centered at $x$.
In particular, note that
• An accumulation point of $S$ is automatically a limit point of $S$,
• An isolated point of $S$ automatically belongs to $S$.
The following example illustrates more basic facts about limit, accumulation, and isolated
points:
Example 13.1. Let $M = \mathbb{R}$ and $d(x, y) = |x - y|$. If
(i) $S = [0, 1)$, then 1 is a limit point and an accumulation point of $S$. In particular,
note that neither a limit point nor an accumulation point of $S$ need actually
belong to $S$.
(ii) $S = \{0\}$, then 0 is an isolated point of $S$. On the other hand, 0 is also a limit
point of $S$ since the sequence $0, 0, 0, \ldots$ of points of $S$ converges to 0.
Theorem 13.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. If $a$ is an isolated point of $A$
and $f : A \to B$ is any function, then $f$ is continuous at $a$.
Proof. Since $a$ is an isolated point of $A$, there exists $\delta > 0$ such that $d_A(x, a) < \delta$ implies
that $x = a$. If $\epsilon > 0$ is given, then observe that
\[ d_A(x, a) < \delta \implies x = a \implies d_B(f(x), f(a)) = 0 < \epsilon. \]
Thus $f$ is continuous at $a$.
LECTURE 14
Open Sets
14.1. Closed Sets
Definition. The set of all limit points of $S$ is denoted $\overline{S}$ and called the closure of $S$ (with
respect to $M$ and $d$).
Since each element $x \in S$ is a limit point of $S$ (i.e., $x$ is the limit of the sequence
$x, x, x, \ldots$), it follows that
\[ S \subseteq \overline{S}. \tag{14.1} \]
Definition. A subset $S$ of a metric space $(M, d)$ is called closed (with respect to $M$ and
$d$) if every limit point of $S$ belongs to $S$. In other words, $S$ is closed if and only if
\[ S = \overline{S}. \]
Example 14.1. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are both closed sets. Indeed, the
closure of $\emptyset$ is $\emptyset$ since there are no elements of $\emptyset$ to make sequences with. On the other
hand, $M$ is closed since any convergent sequence in $M$ converges to a point of $M$. Thus
$M$ contains all of its limit points.
Theorem 14.1. If $(M, d)$ is a metric space and $S \subseteq M$, then
\[ \overline{\overline{S}} = \overline{S}. \]
In other words, the closure of a set is a closed set.
Proof. By (14.1), we need only prove that
\[ \overline{\overline{S}} \subseteq \overline{S}. \tag{14.2} \]
\[ n \geq N \implies d(x, y_n) < \frac{\epsilon}{2}. \]
On the other hand, since $y_n$ belongs to $\overline{S}$, it follows that there exists $x_n \in S$ such that
$d(x_n, y_n) < \frac{\epsilon}{2}$. Putting this all together we find that $n \geq N$ implies that
\[ d(x, x_n) \leq d(x, y_n) + d(y_n, x_n) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]
Thus $x_n$ is a sequence in $S$ which converges to $x$, whence $x \in \overline{S}$. This establishes (14.2)
and completes the proof.
Example 14.2. This example illustrates that one needs to make sure that the "big metric
space" is declared beforehand. For instance, $\mathbb{Q}$ is a closed subset of the metric space $(\mathbb{Q}, d)$,
where $d$ denotes the standard metric $d(x, y) = |x - y|$. When using $\mathbb{Q}$ as the big metric
space, it is as if we are assuming that irrational numbers no longer exist; as far as $\mathbb{Q}$ is
concerned, they do not.
On the other hand, $\mathbb{Q}$ is not closed when considered as a subset of the metric space
$(\mathbb{R}, d)$. For example, $\sqrt{2}$ is a limit point of $\mathbb{Q}$ in $\mathbb{R}$ and $\sqrt{2} \notin \mathbb{Q}$. In fact, the density of $\mathbb{Q}$ in
$\mathbb{R}$ implies that $\overline{\mathbb{Q}} = \mathbb{R}$. We therefore see that the property of being closed depends strongly
on the big metric space which the set in question belongs to.
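Example 14.3. Consider $C([a, b])$ with the infinity metric:
\[ d_\infty(f, g) = \|f - g\|_\infty = \sup_{a \leq x \leq b} |f(x) - g(x)|. \]
\[ d(y_n, y) < \epsilon. \]
Since this holds for every $\epsilon > 0$, it follows from the $\epsilon$-Principle that $d(x, y) \leq \delta$ whence
$y \in C_\delta(x)$.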
Example 14.4. It is not true in general that
\[ \overline{B_\delta(x)} = C_\delta(x) = \{y \in M : d(x, y) \leq \delta\}. \]
Let $M$ be any set which contains at least two points, let $d$ be the discrete metric
on $M$, and let $\delta = 1$. In this case
\[ B_\delta(x) = \{y \in M : d(x, y) < 1\} = \{x\} \]
whence $\overline{B_\delta(x)} = \{x\}$. On the other hand,
\[ \{y \in M : d(x, y) \leq 1\} = M \neq \{x\} \]
since $M$ contains at least two points.
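On a finite set this example can be checked by direct computation. The Python sketch below uses a three-point set with the discrete metric; all of the function names are ours, and the `closure` helper relies on the simplifying assumption that the ambient space is finite, as explained in its docstring.

```python
def discrete_metric(x, y):
    """Distance 0 from a point to itself and 1 between distinct points."""
    return 0 if x == y else 1


def open_ball(M, d, x, r):
    return {y for y in M if d(x, y) < r}


def closed_ball(M, d, x, r):
    return {y for y in M if d(x, y) <= r}


def closure(M, d, S):
    """Closure of S inside a FINITE metric space M.

    In a finite space, a sequence in S can only converge to a point at
    distance 0 from some element of S, so this simple test suffices.
    """
    return {x for x in M if any(d(x, s) == 0 for s in S)}


M = {"p", "q", "r"}
ball = open_ball(M, discrete_metric, "p", 1)
print(closure(M, discrete_metric, ball))          # closure of the open ball
print(closed_ball(M, discrete_metric, "p", 1))    # the closed ball itself
```

The closure of the open unit ball is the single point at its center, whereas the closed unit ball is the entire space.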
Corollary 5. The interval [a, b] (where a < b) is a closed subset of R (equipped with the
standard metric).
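\[ [a, b] = \left\{ y \in \mathbb{R} : \left| y - \frac{b+a}{2} \right| \leq \frac{b-a}{2} \right\}, \]
the closed ball in $\mathbb{R}$ of radius $(b - a)/2$ centered at $(b + a)/2$. Indeed, the inequalities
\[ -\frac{b-a}{2} \leq y - \frac{b+a}{2} \leq \frac{b-a}{2} \]
are equivalent to $a \leq y \leq b$.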
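Example 14.5. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are open subsets of $M$.
We must be careful with the term "open ball" since we now have a technical definition
for the term "open." Is what we call an open ball actually an open set, according to our
definition? Fortunately, the answer is yes.
Theorem 14.3. Let $(M, d)$ be a metric space. For each $x \in M$ and each $\epsilon > 0$, the subset
\[ B_\epsilon(x) = \{y \in M : d(x, y) < \epsilon\} \]
is an open subset of $M$.
Proof. Let $x \in M$ and let $\epsilon > 0$. If $y \in B_\epsilon(x)$, then $d(x, y) < \epsilon$ and hence
\[ r = \epsilon - d(x, y) > 0. \]
If $z \in B_r(y)$, then the triangle inequality gives
\[ d(x, z) \leq d(x, y) + d(y, z) < d(x, y) + r = \epsilon. \]
Therefore $z \in B_\epsilon(x)$ whence $B_r(y) \subseteq B_\epsilon(x)$. By the definition of open sets, it follows
that $B_\epsilon(x)$ is open.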
LECTURE 15
Pf. of (ii). Let $S \subseteq M$ be closed and suppose toward a contradiction that $S^c$ is not open. In
other words, suppose that there exists some $x \in S^c$ such that $B_\epsilon(x) \not\subseteq S^c$ for every $\epsilon > 0$.
It follows that for every $n \in \mathbb{N}$, there exists $x_n \in S$ such that $x_n \in B_{1/2^n}(x)$. This implies
that $d(x_n, x) < \frac{1}{2^n}$ whence $x_n \to x$ by the Squeeze Theorem. Since $S$ is closed, it follows
that $x$ belongs to $S$. However, this contradicts the fact that $x$ belongs to $S^c$.
Example 15.1. The half-open intervals $[a, b)$ and $(a, b]$ are neither open nor closed in $\mathbb{R}$.
On the other hand, $\emptyset$ and $\mathbb{R}$ are both closed and open in $\mathbb{R}$.
Definition. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is called clopen if $S$ is both
open and closed.
Example 15.2. In a metric space $(M, d)$, the sets $\emptyset$ and $M$ are clopen. There are sometimes other clopen sets in a metric space; we will learn more about clopen sets when we
discuss connectivity.
15.2. Set Operations with Open and Closed Sets
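It turns out that open and closed sets have relatively nice properties, set-theoretically
speaking. The following theorem tells us how open sets react to the usual set-theoretic
operations:
\[ \epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}, \]
\[ B_\epsilon(x) \subseteq B_{\epsilon_i}(x) \subseteq A_i \]
for $i = 1, 2, \ldots, n$. Therefore $B_\epsilon(x) \subseteq \bigcap_{i=1}^n A_i$ and hence $\bigcap_{i=1}^n A_i$ is an open set.
\[
\left( \bigcup_{i \in I} A_i \right)^c = \bigcap_{i \in I} A_i^c,
\qquad
\left( \bigcap_{i \in I} A_i \right)^c = \bigcup_{i \in I} A_i^c,
\]
which are valid for any index set $I$ (whether finite or infinite).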
A useful theorem (which we shall not prove, at least yet) is the following:
Theorem 15.3. An open set $S \subseteq \mathbb{R}$ can be uniquely expressed as a countable union of
disjoint open intervals in such a way that the endpoints of these intervals do not belong to
$S$.
LECTURE 16
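$f(a) \in Y$.
It is rather convenient that inverse images work well with the standard set operations:
Theorem 16.1. If $f : A \to B$ is a function and $C, D \subseteq B$, then
(i) $f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D)$,
Pf. of (i). Since
\[
\begin{aligned}
x \in f^{-1}(C \cup D) &\iff f(x) \in C \cup D && \text{def. inv. img.} \\
&\iff (f(x) \in C) \vee (f(x) \in D) && \text{def. of } \cup \\
&\iff (x \in f^{-1}(C)) \vee (x \in f^{-1}(D)) && \text{def. inv. img.} \\
&\iff x \in f^{-1}(C) \cup f^{-1}(D) && \text{def. of } \cup,
\end{aligned}
\]
it follows that the conditions for membership in $f^{-1}(C \cup D)$ and $f^{-1}(C) \cup f^{-1}(D)$ are
identical.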
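Pf. of (ii). This is similar to the previous proof:
\[
\begin{aligned}
x \in f^{-1}(C \cap D) &\iff f(x) \in C \cap D && \text{def. inv. img.} \\
&\iff (f(x) \in C) \wedge (f(x) \in D) && \text{def. of } \cap \\
&\iff (x \in f^{-1}(C)) \wedge (x \in f^{-1}(D)) && \text{def. inv. img.} \\
&\iff x \in f^{-1}(C) \cap f^{-1}(D) && \text{def. of } \cap.
\end{aligned}
\]
Since the conditions for membership in $f^{-1}(C \cap D)$ and $f^{-1}(C) \cap f^{-1}(D)$ are identical,
it follows that $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$.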
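These identities are easy to test on finite sets. In the Python sketch below, the helper `preimage`, the sample function $x \mapsto x^2$, and the particular sets are all our own choices; the domain and codomain are finite stand-ins for $A$ and $B$.

```python
def preimage(f, domain, Y):
    """The inverse image f^{-1}(Y) = {x in domain : f(x) in Y}."""
    return {x for x in domain if f(x) in Y}


def square(x):
    return x * x


A = set(range(-5, 6))        # stand-in domain
B = set(range(0, 26))        # stand-in codomain containing every value of square on A
C, D = {0, 1, 4}, {4, 9, 16}

union_ok = preimage(square, A, C | D) == preimage(square, A, C) | preimage(square, A, D)
inter_ok = preimage(square, A, C & D) == preimage(square, A, C) & preimage(square, A, D)
# Complements are taken in B on the left and in A on the right.
compl_ok = preimage(square, A, B - C) == A - preimage(square, A, C)
print(union_ok, inter_ok, compl_ok)
```

Note that the corresponding statement for direct images fails for intersections in general, which is one reason inverse images are the more convenient tool.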
$^1$Observe that $Y^c \subseteq B$ and that $f^{-1}(Y)^c \subseteq A$. In other words, the complements are with respect to $B$
and $A$, respectively.
(iv) $f(a_n) \xrightarrow{B} f(a)$ whenever $a_n \xrightarrow{A} a$.
Since the sequence $f(x_n)$ belongs to the closed subset $Y$, it follows that $f(x)$ also belongs
to $Y$. Thus $x$ belongs to $f^{-1}(Y)$ and therefore $f^{-1}(Y)$ is a closed set.
(ii) $\Leftrightarrow$ (iii): This follows immediately from the fact that
\[ f^{-1}(Y)^c = f^{-1}(Y^c) \]
for $Y \subseteq B$.
(iii) $\Rightarrow$ (i): Let $x \in A$ and let $\epsilon > 0$ and note that $B_\epsilon(f(x))$ is an open subset of $B$. By
condition (iii), the set $f^{-1}(B_\epsilon(f(x)))$ is an open subset of $A$ which contains $x$. Therefore
there exists $\delta > 0$ so that
\[ B_\delta(x) \subseteq f^{-1}(B_\epsilon(f(x))). \]
In other words,
\[ y \in B_\delta(x) \implies y \in f^{-1}(B_\epsilon(f(x))) \implies f(y) \in B_\epsilon(f(x)) \]
Example 16.1. The theorem does not say that $f(X)$ is open if $X$ is open. Indeed, let
$A = B = \mathbb{R}$ and let $f$ be the zero function. Then $f(X) = \{0\}$ for any nonempty subset $X$ of $\mathbb{R}$,
regardless of whether $X$ is open or not.
Example 16.2. The theorem does not say that $f(X)$ is closed if $X$ is closed. Indeed,
let $A = B = \mathbb{R}$ and let $f(x) = \tan^{-1} x$. Note that $X = \mathbb{R}$ is a closed subset of $\mathbb{R}$ but that
$f(X) = (-\pi/2, \pi/2)$, which is not closed.
In addition to providing an elegant topological characterization of continuity, the
preceding theorem is extremely useful since it often provides a shortcut to proving that
sets are open or closed.
$a, b, c > 0$
is an open subset of $\mathbb{R}^2$ (with respect to the usual metric). Indeed, the function $f : \mathbb{R}^2 \to \mathbb{R}$
defined by
\[ f(x, y) = ax^2 + by^2 \]
is clearly continuous (it is the sum of the continuous functions $g(x, y) = ax^2$ and $h(x, y) =
by^2$, which are themselves products of continuous functions . . . ). Since the set $(-\infty, c)$ is
an open subset of $\mathbb{R}$, it follows that
\[ S = f^{-1}((-\infty, c)) \]
is an open subset of $\mathbb{R}^2$. Similar arguments apply to other planar regions defined in terms
of strict inequalities.
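Example 16.5. Consider $\mathbb{R}^3$, equipped with the usual metric. Recall that a plane $P$ is a subset of the form
$$P = \{(x, y, z) \in \mathbb{R}^3 : ax + by + cz = d\}$$
where $a, b, c, d \in \mathbb{R}$ are constants. Since the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by
$$f(x, y, z) = ax + by + cz$$
is continuous, it follows that $P = f^{-1}(\{d\})$ is a closed subset of $\mathbb{R}^3$ since $\{d\}$ is a closed subset of $\mathbb{R}$. Similar arguments show that any surface in $\mathbb{R}^3$ defined by an equation $g(x, y, z) = d$, with $g$ continuous, is a closed set.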
Example 16.6. There does not exist a continuous function $f : \mathbb{R} \to \mathbb{R}$ (with respect to the usual metric on $\mathbb{R}$) so that $f(x) \geq 0$ if $x \in \mathbb{Q}$ and $f(x) < 0$ if $x \notin \mathbb{Q}$. Indeed, if such an $f$ existed, then $f^{-1}([0, \infty))$ would be a closed subset of $\mathbb{R}$. However, $f^{-1}([0, \infty)) = \mathbb{Q}$ is not closed.
2. Although $S$ is a union of the closed sets $\{n\}$ ($n \in \mathbb{Z}$), this union is not a finite union.
LECTURE 17
Cauchy Sequences
17.1. Cauchy Sequences
Definition. Let $(M, d)$ be a metric space. A sequence $x_n$ in $M$ satisfies the Cauchy condition if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ so that
$$m, n \geq N \implies d(x_n, x_m) < \epsilon.$$
If the sequence $x_n$ satisfies the Cauchy condition, then we say that $x_n$ is a Cauchy sequence.
In more descriptive terms, one might say that a Cauchy sequence is a sequence whose
terms eventually get arbitrarily close to each other. One important fact about Cauchy sequences is the following:
Theorem 17.1. Every convergent sequence is a Cauchy sequence.
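Proof. Let $(M, d)$ be a metric space and suppose that $x_n \to x$. If $\epsilon > 0$, then let $N \in \mathbb{N}$ be so large that $n \geq N$ implies that $d(x_n, x) < \epsilon/2$. Therefore if $n, m \geq N$ we have
$$d(x_n, x_m) \leq d(x_n, x) + d(x, x_m) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$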
Using the preceding theorem we can easily prove that the harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \tag{17.1}$$
diverges.
Theorem 17.2. The harmonic series (17.1) diverges.
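Proof. Suppose toward a contradiction that the series (17.1) converges. In other words, suppose that the sequence
$$S_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$
of partial sums converges. Since $\lim_{n\to\infty} S_n$ exists, it follows that $S_n$ is a Cauchy sequence. Thus if $\epsilon = \frac{1}{2}$, there exists a corresponding $N \in \mathbb{N}$ such that
$$n, m \geq N \implies |S_m - S_n| < \tfrac{1}{2}.$$
In particular, taking $m = 2N$ and $n = N$ yields $|S_{2N} - S_N| < \frac{1}{2}$. However,
$$S_{2N} - S_N = \frac{1}{N+1} + \frac{1}{N+2} + \cdots + \frac{1}{2N} \geq \underbrace{\frac{1}{2N} + \frac{1}{2N} + \cdots + \frac{1}{2N}}_{N \text{ times}} = \frac{1}{2},$$
which is a contradiction.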
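The key estimate in the preceding proof, namely that the block $S_{2N} - S_N$ never drops below $\frac{1}{2}$, can be checked with exact rational arithmetic. The following is a minimal sketch; the function names `harmonic_partial_sum` and `block_difference` are illustrative choices and not part of the notes.

```python
from fractions import Fraction


def harmonic_partial_sum(n):
    """Return S_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def block_difference(N):
    """Return S_{2N} - S_N = 1/(N+1) + ... + 1/(2N) as an exact fraction."""
    return harmonic_partial_sum(2 * N) - harmonic_partial_sum(N)


if __name__ == "__main__":
    # Each block is at least 1/2, so the partial sums cannot be Cauchy.
    for N in (1, 2, 5, 10, 100):
        diff = block_difference(N)
        print(N, float(diff), diff >= Fraction(1, 2))
```

Of course, a finite check of this kind only illustrates the inequality; the proof above is what establishes it for every $N$.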
The following example illustrates that there are Cauchy sequences which do not converge:
Example 17.1. In the metric space Q, endowed with the usual metric, the sequence
1, 1.4, 1.41, 1.414, . . .
(17.2)
LECTURE 18
Completeness
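Lemma 3. The metrics $d_1$, $d_2$, $d_\infty$ on $\mathbb{R}^n$ satisfy
$$d_\infty(x, y) \leq d_2(x, y) \leq d_1(x, y) \leq n\, d_\infty(x, y) \tag{18.1}$$
for all $x, y$ in $\mathbb{R}^n$.
Proof. Each of the inequalities can be verified directly from the definitions of $d_1$, $d_2$, $d_\infty$.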
Theorem 18.1. A subset $A \subseteq \mathbb{R}^n$ is open (resp. closed) with respect to $d_1$, $d_2$, or $d_\infty$ if and only if it is open (resp. closed) with respect to either of the others. In other words, the metrics $d_1$, $d_2$, $d_\infty$ are equivalent in the sense that they induce the same open and closed sets.
Sketch of Pf. One verifies that the inequality (18.1) implies the equivalence of the following statements:
(i) $x_i \xrightarrow{d_1} x$
(ii) $x_i \xrightarrow{d_2} x$
(iii) $x_i \xrightarrow{d_\infty} x$,
where $x_i$ denotes a sequence in $\mathbb{R}^n$ and $x \in \mathbb{R}^n$. For instance, let us prove that (iii) implies (i).
If $x_i \xrightarrow{d_\infty} x$, then for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$i \geq N \implies d_\infty(x, x_i) < \frac{\epsilon}{n}.$$
By (18.1), this implies that
$$i \geq N \implies d_1(x, x_i) < \epsilon,$$
whence $x_i \xrightarrow{d_1} x$.
It follows from the equivalence of (i), (ii), and (iii) that the closures with respect to $d_1$, $d_2$, and $d_\infty$ of a subset $S$ of $\mathbb{R}^n$ are identical. Since a set is closed if and only if it equals its closure, it follows that the metric spaces $(\mathbb{R}^n, d_1)$, $(\mathbb{R}^n, d_2)$, $(\mathbb{R}^n, d_\infty)$ have exactly the same closed sets. Since the complement of a closed (resp. open) set is open (resp. closed), it follows that these metric spaces also have precisely the same open sets.
Theorem 18.2. $\mathbb{R}^n$ is complete with respect to any of the metrics $d_1$, $d_2$, $d_\infty$.
Proof. By (18.1), a sequence in $\mathbb{R}^n$ which is Cauchy (resp. convergent) with respect to $d_1$, $d_2$, or $d_\infty$ is automatically Cauchy (resp. convergent) with respect to the other two metrics. It therefore suffices to prove that $\mathbb{R}^n$ is complete with respect to the metric $d_\infty$.
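If a sequence $x_j$ is Cauchy with respect to $d_\infty$, then the $i$th entries $x_j(i)$ and $x_k(i)$ of the $j$th and $k$th vectors $x_j$ and $x_k$ satisfy
$$|x_j(i) - x_k(i)| \leq \max\{ |x_j(1) - x_k(1)|, \ldots, |x_j(n) - x_k(n)| \} = d_\infty(x_j, x_k)$$
for $i = 1, 2, \ldots, n$. Consequently, for each fixed $i$ the sequence $x_j(i)$ of $i$th entries is a Cauchy sequence in $\mathbb{R}$ and hence converges to some real number $x(i)$, since $\mathbb{R}$ is complete. Let $x = (x(1), x(2), \ldots, x(n))$.
Our next goal is to show that the sequence $x_j$ in $\mathbb{R}^n$ converges to $x$ with respect to the metric $d_\infty$.
Let $\epsilon > 0$ be given and let $M_1, M_2, \ldots, M_n \in \mathbb{N}$ be so large that
$$j \geq M_i \implies |x_j(i) - x(i)| < \epsilon$$
for $i = 1, 2, 3, \ldots, n$. If
$$j \geq N = \max\{M_1, M_2, \ldots, M_n\},$$
then
$$d_\infty(x_j, x) = \max\{ |x_j(1) - x(1)|, |x_j(2) - x(2)|, \ldots, |x_j(n) - x(n)| \} < \epsilon.$$
Thus $x_j \xrightarrow{d_\infty} x$.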
The proof of the theorem is somewhat lengthy and the difficulties are mostly notational (although the concepts are interesting). The basic idea is to construct the completion of $M$ from equivalence classes of Cauchy sequences in $M$. We will not go into the details of the construction, however.
Example 18.1. Consider the metric space $(\mathbb{Q}, d)$ where $d(x, y) = |x - y|$. It can be shown that the completion of $(\mathbb{Q}, d)$ is essentially $\mathbb{R}$ (with the usual metric). In fact, some real analysis textbooks begin by constructing $\mathbb{R}$ explicitly from $\mathbb{Q}$ using this technique.
Example 18.2. It turns out that $C([a, b])$ is complete with respect to the metric $d_\infty$ (we will prove this later in the course), but not with respect to $d_1$ or $d_2$. The completions of $C([a, b])$ with respect to $d_1$ and $d_2$ turn out to be the Lebesgue spaces $L^1[a, b]$ and $L^2[a, b]$, respectively. To go into more detail would require a long digression on measure theory and the Lebesgue integral.
LECTURE 19
Infinite Series
19.1. Cauchy Criterion for Series
Definition. An infinite series $\sum_{n=0}^\infty a_n$ in a normed vector space $V$ is said to converge to a vector $S \in V$ if the partial sums $S_m = \sum_{n=0}^m a_n$ tend to $S$:
$$\sum_{n=0}^\infty a_n = S \iff \lim_{m\to\infty} S_m = S.$$
Later in the course, we will prove that $(C([a, b]), d_\infty)$ is indeed complete. In graduate analysis or differential equations you will encounter many other Banach spaces, including the Lebesgue spaces and the Sobolev spaces. For now, we will mostly be concerned with $\mathbb{R}^n$ and $M_n(\mathbb{R})$.
In a Banach space, we have the Cauchy Convergence Criterion for series:
Theorem 19.1. A series $\sum_{n=0}^\infty a_n$ in a Banach space $V$ converges if and only if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$k \geq j \geq N \implies \left\| \sum_{n=j}^{k} a_n \right\| < \epsilon.$$
Proof. Since $V$ is complete, the given series converges if and only if the sequence
$$S_m = \sum_{n=0}^{m} a_n$$
of partial sums is a Cauchy sequence. If $S_m$ is a Cauchy sequence, then for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that $k \geq j \geq N$ implies that
$$\left\| \sum_{n=j}^{k} a_n \right\| = \| S_k - S_{j-1} \| < \epsilon.$$
On the other hand, if the preceding condition holds, then the partial sums $S_m$ form a Cauchy sequence. Since $V$ is complete, it follows that $\lim_{m\to\infty} S_m$ exists.
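$$\lim_{m\to\infty} \sum_{n=m+1}^{\infty} a_n = 0.$$
Here 0 denotes the zero vector in $V$. In other words, the tail end of a convergent series tends to zero.
Proof. Let $\epsilon > 0$ and find $N \in \mathbb{N}$ such that
$$m \geq N \implies \underbrace{\|S - S_m\|}_{d(S, S_m)} < \epsilon.$$
This is the same as saying that $\|(S - S_m) - 0\| < \epsilon$, whence $S - S_m$ converges to the vector 0. However, this is just another way of saying that
$$\lim_{m\to\infty} \left( \sum_{n=m+1}^{\infty} a_n \right) = \lim_{m\to\infty} (S - S_m) = 0.$$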
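Proof. Since (ii) is the contrapositive of (i), it suffices to prove (i). By the Cauchy Criterion for Series (applied with $j = k = n$), we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \|a_n\| < \epsilon.$$
Putting this all together, we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \|a_n - 0\| < \epsilon,$$
whence $a_n \to 0$.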
Another important consequence of the Cauchy Criterion for Series is the following
generalization of the Comparison Theorem from Calculus II:
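Theorem 19.4. Let $\sum_{n=0}^\infty a_n$ be a series in a Banach space $(V, \|\cdot\|)$.
(i) If $\sum_{n=0}^\infty b_n$ is a convergent series of nonnegative real numbers and if there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \|a_n\| \leq b_n,$$
then $\sum_{n=0}^\infty a_n$ converges. In particular, if $\sum_{n=0}^\infty \|a_n\|$ converges, then $\sum_{n=0}^\infty a_n$ converges.
(ii) If $\sum_{n=0}^\infty c_n$ is a divergent series of nonnegative real numbers and if there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies 0 \leq c_n \leq \|a_n\|,$$
then $\sum_{n=0}^\infty \|a_n\|$ diverges (that is, $\sum_{n=0}^\infty a_n$ does not converge absolutely).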
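Proof. If $\sum_{n=0}^\infty b_n$ is a convergent series of nonnegative real numbers, then for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ (which we may take to be at least as large as the $N$ in the hypothesis) so that
$$k \geq j \geq N \implies \sum_{n=j}^{k} b_n < \epsilon.$$
For such $j$ and $k$, the triangle inequality yields
$$\left\| \sum_{n=j}^{k} a_n \right\| \leq \sum_{n=j}^{k} \|a_n\| \leq \sum_{n=j}^{k} b_n < \epsilon.$$
By the Cauchy Criterion for Series in a Banach Space, it follows that the series $\sum_{n=0}^\infty a_n$ converges in $V$. This establishes (i). We leave the proof of (ii) to the reader.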
The importance of the preceding theorem is that it allows us to conclude that a series of vectors in a Banach space converges if a corresponding numerical series converges. Needless to say, it is usually much easier to test for the convergence of a series of real numbers than a series of vectors in a normed vector space.
Definition. A series $\sum_{n=0}^\infty a_n$ in a Banach space is called absolutely convergent if $\sum_{n=0}^\infty \|a_n\|$ converges in $\mathbb{R}$.
Example 19.1. Not every convergent series (even in $\mathbb{R}$) converges absolutely. For instance, it can be shown that the Alternating Harmonic Series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
converges (in fact to the value $\ln 2$). However, the corresponding series of positive terms is simply the harmonic series, which diverges.
Example 19.2. If $A$ is any $n \times n$ matrix, then we may define the exponential of $A$ by the series
$$\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}.$$
Since $M_n(\mathbb{R})$ is complete with respect to $d_2$ (and $d_1$, $d_\infty$ as well), we need only show that the terms of the preceding series are bounded in norm by the terms of a convergent numerical series. Using the submultiplicativity of the 2-norm we find that
$$\left\| \frac{A^n}{n!} \right\|_2 = \frac{1}{n!} \|A^n\|_2 \leq \frac{1}{n!} \|A\|_2^n.$$
Since
$$\sum_{n=0}^{\infty} \frac{\|A\|_2^n}{n!}$$
is a convergent series in $\mathbb{R}$ (via the Ratio Test from Calculus II), it therefore follows that
$$\sum_{n=0}^{\infty} \frac{A^n}{n!}$$
converges in $M_n(\mathbb{R})$.
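Consider the system of linear differential equations
$$y' = Ay, \qquad y(0) = y_0,$$
where
$$y = \begin{pmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{pmatrix}, \qquad y' = \begin{pmatrix} y_1'(t) \\ \vdots \\ y_n'(t) \end{pmatrix},$$
and $y_0 \in \mathbb{R}^n$ is a constant vector of initial values. One can show that the solution is given by the simple formula
$$y(t) = e^{At} y_0.$$
This is entirely analogous to the fact that the solution to
$$y'(t) = a y(t), \qquad y(0) = y_0$$
is given by
$$y(t) = e^{at} y_0.$$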
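The series definition of the matrix exponential translates directly into a computation. The following is a minimal sketch in plain Python, assuming small square matrices stored as lists of rows; the helper names `mat_mul`, `mat_exp`, and `solve_linear_system`, and the cutoff of 30 terms, are illustrative choices rather than anything prescribed by the notes. For the matrix $A$ with rows $(0, 1)$ and $(-1, 0)$ one has $A^2 = -I$, so $e^{At} = (\cos t) I + (\sin t) A$, which gives an independent check.

```python
import math


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=30):
    """Approximate exp(A) by the partial sum of A^k / k! for k = 0, ..., terms - 1."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # holds A^k / k!, starting with k = 0
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def solve_linear_system(A, y0, t):
    """Return exp(At) y0, the stated solution of y' = Ay, y(0) = y0, at time t."""
    E = mat_exp([[t * entry for entry in row] for row in A])
    return [sum(E[i][j] * y0[j] for j in range(len(y0))) for i in range(len(y0))]


if __name__ == "__main__":
    A = [[0.0, 1.0], [-1.0, 0.0]]
    # With y0 = (1, 0) the solution should be (cos t, -sin t).
    print(solve_linear_system(A, [1.0, 0.0], 1.0))
    print([math.cos(1.0), -math.sin(1.0)])
```

Truncating the series is reasonable here because the terms are dominated by $\|A\|_2^n / n!$, which decays extremely quickly; for matrices of large norm a more careful method would be needed.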
LECTURE 20
Infinite Series
20.1. An Extended Example
Matrices are particularly interesting to study because their algebraic structure (e.g.,
multiplication and inversion) is closely related to their analytic structure (e.g., metrics and
convergence). This example highlights a few such connections.
The following algebraic lemma is quite useful. You have seen it before in the case of $1 \times 1$ matrices (i.e., real numbers):
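Lemma 4. The formulae
$$I - A^m = (I - A)(I + A + A^2 + \cdots + A^{m-1}) = (I + A + A^2 + \cdots + A^{m-1})(I - A)$$
hold for all $A \in M_n(\mathbb{R})$ and all $m \in \mathbb{N}$. Here we use the notation
$$A^0 = I, \quad A^1 = A, \quad A^2 = AA, \quad A^3 = AAA, \ldots.$$
Proof. Powers of $A$ commute with one another since
$$A^j A^k = \underbrace{(AA \cdots A)}_{j} \underbrace{(AA \cdots A)}_{k} = \underbrace{(AA \cdots A)}_{j+k} = \underbrace{(AA \cdots A)}_{k+j} = \underbrace{(AA \cdots A)}_{k} \underbrace{(AA \cdots A)}_{j} = A^k A^j.$$
Since any power of $A$ (including $A^0 = I$) commutes with any other power of $A$, the identities
$$(I - A)(I + A + \cdots + A^{m-1}) = I - A^m$$
and
$$(I + A + \cdots + A^{m-1})(I - A) = I - A^m$$
can be proved using the same arguments used in the real case.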
Lemma 5. If $\|A\|_2 < 1$, then $\sum_{n=0}^\infty A^n$ converges (the limit being taken with respect to the $d_2$ metric).
Proof. It suffices to prove that the sum is absolutely convergent. Note that we cannot assume that the series sums to $(I - A)^{-1}$ since we do not know yet whether $I - A$ is invertible.
Since we will be dealing only with the 2-norm, we will simply drop the subscript 2 in the following. The sum is absolutely convergent since
$$\sum_{n=0}^{\infty} \|A^n\| \leq \sum_{n=0}^{\infty} \|A\|^n = \frac{1}{1 - \|A\|}.$$
Indeed, the preceding shows that the partial sums of the real series $\sum_{n=0}^\infty \|A^n\|$ (which has nonnegative terms) are bounded above. Also observe that we used the fact that $\|A\| < 1$ to sum the resulting real geometric series. Moreover, the inequality $\|A^n\| \leq \|A\|^n$ follows from the submultiplicativity of the 2-norm. Since $M_n(\mathbb{R})$ is complete, we know that every absolutely convergent series in $M_n(\mathbb{R})$ converges in $M_n(\mathbb{R})$ and therefore the given series converges to some matrix $S$.
Lemma 6. Matrix inverses are unique, when they exist. In other words, if $X, Y, Z \in M_n(\mathbb{R})$ satisfy $XY = YX = I$ and $XZ = ZX = I$, then $Y = Z$.
Proof. Using the fact that matrix multiplication is associative, we see that
$$Y = YI = Y(XZ) = (YX)Z = IZ = Z.$$
Theorem 20.1. If $\|A\|_2 < 1$, then $I - A$ is invertible and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n.$$
Proof. By Lemma 5, the series converges to some matrix $S$. Let
$$S_m = \sum_{n=0}^{m-1} A^n$$
denote the $m$th partial sum of the series we are concerned with. Since
$$I - A^m = (I - A) S_m = S_m (I - A)$$
by a preceding lemma, we may pass to the limit and use the fact that $A^m \xrightarrow{d_2} 0$ (the zero matrix) since $\|A\|_2 < 1$. Using the fact that multiplication by $I - A$ is continuous with respect to $d_2$, we have
$$I = (I - A)S = S(I - A)$$
from which it follows (using the uniqueness of inverses) that $S = (I - A)^{-1}$.
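The identity $(I - A)S_m = I - A^m$ from Lemma 4 can be observed numerically: the residual $(I - A)S_m - I = -A^m$ shrinks as $m$ grows. The sketch below is illustrative only; the helper names and the sample matrix are assumptions, and the norm computed is the entrywise Euclidean norm (the square root of the sum of squared entries), which is assumed here to be the norm that the notes call the 2-norm on $M_n(\mathbb{R})$.

```python
import math


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def euclidean_norm(A):
    """Square root of the sum of the squares of the entries of A."""
    return math.sqrt(sum(entry * entry for row in A for entry in row))


def neumann_partial_sum(A, m):
    """Return S_m = I + A + ... + A^(m-1)."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    power = identity(n)  # A^0
    for _ in range(m):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)
    return total


def residual(A, m):
    """Largest absolute entry of (I - A) S_m - I, which equals that of A^m."""
    n = len(A)
    I = identity(n)
    I_minus_A = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    P = mat_mul(I_minus_A, neumann_partial_sum(A, m))
    return max(abs(P[i][j] - I[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    A = [[0.2, 0.1], [0.0, 0.3]]
    print(euclidean_norm(A))  # less than 1, so the theorem applies
    for m in (1, 5, 20, 60):
        print(m, residual(A, m))
```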
LECTURE 21
Integral Test
21.1. The Harmonic Series and Integral Test
In this section, we consider only series with real terms. In other words, we have $V = \mathbb{R}$ and our metric is implicitly given by $d(x, y) = |x - y|$.
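Example 21.1. Recall that the harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n}$$
diverges. In particular, if $a_n = \frac{1}{n}$, then
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} \frac{1}{n} = 0,$$
but $\sum_{n=1}^\infty a_n$ diverges. Thus the implication that $\lim_{n\to\infty} a_n = 0$ implies the convergence of $\sum_{n=1}^\infty a_n$ is false in general. Make sure you remember this.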
Although we have already proved that the harmonic series diverges (Lecture 17), a second proof of the divergence of the harmonic series is requested on an upcoming homework assignment. A cheap$^1$ way of seeing that the harmonic series diverges is the Integral Test from Calculus II:
Theorem 21.1 (The Integral Test). Suppose that $f : [1, \infty) \to \mathbb{R}$ is continuous, positive, and decreasing. If $a_n = f(n)$, then $\int_1^\infty f(x)\,dx$ converges if and only if $\sum_{n=1}^\infty a_n$ converges.
Proof. By considering the graph of $f(x)$ and interpreting the partial sums as the sum of the areas of boxes, one obtains the inequalities
$$\int_1^n f(x)\,dx \leq a_1 + a_2 + \cdots + a_n \leq a_1 + \int_1^n f(x)\,dx. \tag{21.1}$$
In particular, the convergence of either the improper integral or the series will imply the convergence of the other.
Since we have not introduced the integral, let alone improper integrals, in a respectable
manner yet, please do not use the integral test on the homework (until we cover integrals
more formally). In any case, back to the harmonic series:
Example 21.2. The estimates (21.1) from the integral test (with $f(x) = 1/x$) imply that
$$\ln n \leq 1 + \frac{1}{2} + \cdots + \frac{1}{n} \leq 1 + \ln n. \tag{21.2}$$
1. In the sense that we have not introduced integrals in a rigorous manner.
In particular, the partial sums of the harmonic series tend to infinity, but extremely slowly. For instance, the sum of the first million terms satisfies
$$13.815\ldots < \sum_{n=1}^{10^6} \frac{1}{n} < 14.815\ldots,$$
and the sum of the first billion terms satisfies
$$20.7233\ldots < \sum_{n=1}^{10^9} \frac{1}{n} < 21.7233\ldots.$$
In particular, observe that it would be very difficult indeed to conclude that the harmonic series diverges based on purely numerical evidence. In fact, by (21.2) we would have to add on the order of $10^{43}$ terms to get a partial sum of the harmonic series to be greater than 100 (the first $e^{100} \approx 2.688 \times 10^{43}$ terms certainly suffice).
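The bounds in (21.2) are easy to test numerically for moderate values of $n$. The sketch below sums the first million terms in floating point; the function names `harmonic` and `integral_test_bounds` are illustrative, and `math.fsum` is used only to limit accumulated rounding error.

```python
import math


def harmonic(n):
    """Return the n-th partial sum 1 + 1/2 + ... + 1/n in floating point."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def integral_test_bounds(n):
    """Return the lower and upper bounds ln n and 1 + ln n from (21.2)."""
    return math.log(n), 1.0 + math.log(n)


if __name__ == "__main__":
    n = 10 ** 6
    lower, upper = integral_test_bounds(n)
    print(lower, harmonic(n), upper)
```

Even after a million terms the partial sum is still below 15, which illustrates how misleading purely numerical evidence would be here.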
It follows from (21.2) that
$$0 \leq \underbrace{1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n}_{=F(n)} \leq 1.$$
This implies that the sequence $F(n)$ is nonnegative (i.e., bounded below by 0). Moreover, $F(n)$ is a decreasing sequence since:
$$F(n) - F(n+1) = \left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) - \left(1 + \frac{1}{2} + \cdots + \frac{1}{n+1} - \ln(n+1)\right)$$
$$= \ln(n+1) - \ln(n) - \frac{1}{n+1}$$
$$= \int_n^{n+1} \frac{dx}{x} - \frac{1}{n+1}$$
$$> 0.$$
The inequality follows from the fact that
$$\frac{1}{x} > \frac{1}{n+1}$$
on the interval $[n, n+1)$ and hence the area under the graph of $f(x) = 1/x$ from $x = n$ to $x = n+1$ must be greater than $1/(n+1)$.
It follows from the preceding computations that
$$F(n) = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n$$
is a decreasing sequence of real numbers which is bounded below. This implies that $\lim_{n\to\infty} F(n)$ exists. This limit is called the Euler-Mascheroni Constant and it is denoted $\gamma$ (lowercase gamma):
$$\gamma = \lim_{n\to\infty} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) \approx 0.5772156649\ldots.$$
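The monotone convergence of $F(n)$ to $\gamma$ can be watched numerically. The sketch below is illustrative only: the name `F` mirrors the notation above, the reference value `GAMMA` is the known decimal expansion of the constant typed in by hand, and the comparison with $\gamma + \frac{1}{2n}$ relies on a standard asymptotic estimate that is not proved in these notes.

```python
import math

# Known decimal value of the Euler-Mascheroni constant, used only for comparison.
GAMMA = 0.5772156649015329


def F(n):
    """Return F(n) = 1 + 1/2 + ... + 1/n - ln n in floating point."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10 ** 5):
        # F(n) decreases toward GAMMA; the gap behaves roughly like 1/(2n).
        print(n, F(n), F(n) - GAMMA)
```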
After 0, 1, $\pi$, $e$, and the imaginary unit $i$, $\gamma$ is perhaps the most important mathematical constant. The Euler-Mascheroni constant appears, among other places, in number theory and complex analysis. For instance, Dirichlet proved that if $d(n)$ denotes the number of divisors of an integer $n$, then
$$\lim_{n\to\infty} \left( \frac{1}{n} \sum_{i=1}^{n} d(i) - \ln n \right) = 2\gamma - 1.$$
This is an interesting statement about the average number of divisors of positive integers.
LECTURE 22
Alternating Series
22.1. The Alternating Series Test
The following lemma is useful in a variety of scenarios:
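Lemma 7. If $a_n$ is a sequence in a metric space $(M, d)$ such that $\lim_{n\to\infty} a_{2n} = L$ and $\lim_{n\to\infty} a_{2n+1} = L$, then $\lim_{n\to\infty} a_n = L$.
Proof. Considering $a_{2n}$ and $a_{2n+1}$ as sequences in their own right, it follows that for each $\epsilon > 0$ there exist $N_1, N_2 \in \mathbb{N}$ such that
$$n \geq \tfrac{1}{2} N_1 \implies d(a_{2n}, L) < \epsilon,$$
$$n \geq \tfrac{1}{2} N_2 \implies d(a_{2n+1}, L) < \epsilon.$$
Hence if $n \geq \max\{N_1, N_2 + 1\}$, then
$$d(a_n, L) < \epsilon,$$
regardless of whether $n$ is even or odd. Therefore $\lim_{n\to\infty} a_n = L$.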
The Alternating Series Test applies to series of real numbers whose terms alternate
between positive and negative values.
Theorem 22.1 (Alternating Series Test). If $a_n \geq a_{n+1} > 0$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} a_n = 0$, then the alternating series
$$\sum_{n=0}^{\infty} (-1)^n a_n = a_0 - a_1 + a_2 - a_3 + \cdots$$
converges.
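Proof. Let $S_m$ denote the $m$th partial sum of the given series. Observe that
$$S_0 = a_0 \geq 0,$$
$$S_2 = a_0 - a_1 + a_2 = S_0 + (a_2 - a_1) \leq S_0,$$
$$S_4 = a_0 - a_1 + a_2 - a_3 + a_4 = S_2 + (a_4 - a_3) \leq S_2,$$
and so on. Moreover, $S_{2n} = (a_0 - a_1) + (a_2 - a_3) + \cdots + (a_{2n-2} - a_{2n-1}) + a_{2n} \geq 0$ for every $n$.
Since the evenly indexed partial sums $S_{2n}$ form a decreasing sequence which is bounded below, it follows that $\lim_{n\to\infty} S_{2n}$ exists. Let us denote this limit by $S$. A similar argument shows that $\lim_{n\to\infty} S_{2n+1}$ exists as well. By the preceding lemma, it suffices to show that $\lim_{n\to\infty} S_{2n+1} = S$:
$$\lim_{n\to\infty} S_{2n+1} = \lim_{n\to\infty} (S_{2n} - a_{2n+1}) = S - 0 = S.$$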
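$$\sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots.$$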
The Alternating Series Test asserts that the alternating harmonic series converges, but we can say much more. In fact, it is possible to find the sum of the alternating harmonic series explicitly. In particular, this eliminates the need to appeal to the Alternating Series Test in the first place: we can show that the alternating harmonic series converges without it and we can compute the sum exactly.
Theorem 22.2. The alternating harmonic series converges to $\ln 2$:
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.1}$$
Proof. Let
$$H_m = 1 + \frac{1}{2} + \cdots + \frac{1}{m}$$
denote the $m$th partial sum of the harmonic series and let
$$S_m = 1 - \frac{1}{2} + \cdots + \frac{(-1)^{m+1}}{m}$$
denote the $m$th partial sum of the alternating harmonic series. Recall that the Euler-Mascheroni constant is defined by the limit
$$\gamma = \lim_{m\to\infty} (H_m - \ln m) \approx 0.577\ldots.$$
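In particular, we proved that the preceding limit exists. A clever trick now shows that the evenly indexed partial sums $S_{2m}$ of the alternating harmonic series converge to $\ln 2$. Observe that
$$S_{2m} = 1 - \frac{1}{2} + \frac{1}{3} - \cdots + \frac{1}{2m-1} - \frac{1}{2m}$$
$$= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - 2\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots + \frac{1}{2m}\right)$$
$$= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}\right)$$
$$= H_{2m} - H_m$$
$$= [H_{2m} - \ln(2m)] - [H_m - \ln m] + \ln(2m) - \ln m$$
$$= [H_{2m} - \ln(2m)] - [H_m - \ln m] + \ln 2.$$
Passing to the limit, we find that
$$\lim_{m\to\infty} S_{2m} = \gamma - \gamma + \ln 2 = \ln 2.$$
For the oddly indexed partial sums, we have
$$\lim_{m\to\infty} S_{2m+1} = \lim_{m\to\infty} \left(S_{2m} + \frac{1}{2m+1}\right) = \lim_{m\to\infty} S_{2m} + \lim_{m\to\infty} \frac{1}{2m+1} = \ln 2 + 0 = \ln 2.$$
Since $\lim_{m\to\infty} S_{2m} = \lim_{m\to\infty} S_{2m+1} = \ln 2$, it follows that $\lim_{n\to\infty} S_n = \ln 2$. In other words, we have proved (22.1).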
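The algebraic identity $S_{2m} = H_{2m} - H_m$ at the heart of this proof can be confirmed with exact rational arithmetic, and the convergence to $\ln 2$ can be observed in floating point. The following is a small sketch; all function names are illustrative and not taken from the notes.

```python
import math
from fractions import Fraction


def harmonic_exact(m):
    """Return H_m = 1 + 1/2 + ... + 1/m as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))


def alternating_exact(m):
    """Return S_m = 1 - 1/2 + ... + (-1)^(m+1)/m as an exact fraction."""
    return sum((Fraction((-1) ** (k + 1), k) for k in range(1, m + 1)), Fraction(0))


def alternating_float(m):
    """Return S_m in floating point, suitable for larger values of m."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, m + 1))


if __name__ == "__main__":
    # The identity S_{2m} = H_{2m} - H_m holds exactly for each m tested.
    for m in (1, 2, 3, 10):
        print(m, alternating_exact(2 * m) == harmonic_exact(2 * m) - harmonic_exact(m))
    # The partial sums approach ln 2, although slowly.
    print(alternating_float(2000), math.log(2))
```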
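Theorem 22.3. If $\sum_{n=1}^\infty a_n = A$ and $\sum_{n=1}^\infty b_n = B$ are convergent series in a normed vector space, then
$$\sum_{n=1}^{\infty} (a_n + b_n) = A + B$$
and
$$\sum_{n=1}^{\infty} c\,a_n = cA$$
for any $c \in \mathbb{R}$.
Proof. We prove only the first portion of the theorem. The proof of the second statement is considerably easier. If $\epsilon > 0$ is given, then since the partial sums
$$S_m = \sum_{n=1}^{m} a_n, \qquad T_m = \sum_{n=1}^{m} b_n$$
converge to $A$ and $B$, respectively, there exist $N_1, N_2 \in \mathbb{N}$ such that
$$m \geq N_1 \implies \|S_m - A\| < \frac{\epsilon}{2}, \qquad m \geq N_2 \implies \|T_m - B\| < \frac{\epsilon}{2}.$$
If $m \geq N = \max\{N_1, N_2\}$, then
$$\left\| \sum_{n=1}^{m} (a_n + b_n) - (A + B) \right\| = \left\| \left[\left(\sum_{n=1}^{m} a_n\right) - A\right] + \left[\left(\sum_{n=1}^{m} b_n\right) - B\right] \right\|$$
$$\leq \left\| \left(\sum_{n=1}^{m} a_n\right) - A \right\| + \left\| \left(\sum_{n=1}^{m} b_n\right) - B \right\|$$
$$= \|S_m - A\| + \|T_m - B\|$$
$$< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$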
In other words, we may add convergent series together (or multiply them by constants) without affecting convergence. Things work as we would expect, one might say. On the other hand, taking products of infinite series is tricky indeed. First of all, products are not defined in most complete normed vector spaces, since there is no notion of multiplication. Second, even for $M_n(\mathbb{R})$, which does have a notion of multiplication, we have the additional difficulty that multiplication is not commutative. Indeed, the situation is delicate enough when using only real numbers, as we shall shortly see.
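Example 22.2. We know that
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.2}$$
Multiplying (22.2) by $\frac{1}{2}$ we find that
$$\frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \cdots = \tfrac{1}{2} \ln 2. \tag{22.3}$$
Inserting zeros between the terms of (22.3) we find that
$$0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + \cdots = \tfrac{1}{2} \ln 2. \tag{22.4}$$
Adding (22.2) and (22.4) we find that
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots = \tfrac{3}{2} \ln 2.$$
In other words, by rearranging the terms of the series (22.2) so that each negative term occurs after a pair of positive terms we have changed the sum.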
Theorem 22.4. The sum of the rearrangement of the alternating harmonic series (22.2) consisting of blocks of $p$ positive terms followed by $q$ negative terms (the positive terms and the negative terms each stay in the same relative order) is equal to $\log 2 + \frac{1}{2}(\log p - \log q)$.
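Proof. Let $H_n$ denote the $n$th partial sum of the harmonic series and observe that
$$H_n = \log n + \frac{1}{n} + \gamma_n$$
where $\gamma_n$ is a sequence converging to the Euler-Mascheroni constant $\gamma$. Indeed, note that
$$H_n - \log n = H_n - \int_1^n \frac{dx}{x}$$
$$= \sum_{k=1}^{n-1} \left( \frac{1}{k} - \int_k^{k+1} \frac{dx}{x} \right) + \frac{1}{n}$$
$$= \underbrace{\sum_{k=1}^{n-1} \int_k^{k+1} \left( \frac{1}{k} - \frac{1}{x} \right) dx}_{\gamma_n} + \frac{1}{n}$$
$$= \gamma_n + \frac{1}{n}.$$
Since $\lim_{n\to\infty} (H_n - \log n) = \gamma$, it follows that $\lim_{n\to\infty} \gamma_n = \gamma$.
Consider only partial sums that consist of complete blocks of terms. Since the pattern has period $p + q$ we consider the partial sums $S_{(p+q)n}$. Since the sum for each successive block of $p + q$ terms tends to zero, it suffices to prove that
$$\lim_{n\to\infty} S_{(p+q)n} = \log 2 + \frac{1}{2}(\log p - \log q). \tag{22.5}$$
Since
$$S_{(p+q)n} = \left( \frac{1}{1} + \frac{1}{3} + \cdots + \frac{1}{2pn-1} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn} \right)$$
$$= \left( \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2pn} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2pn} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn} \right)$$
$$= H_{2pn} - \frac{1}{2} H_{pn} - \frac{1}{2} H_{qn}$$
$$= \left[ \log(2pn) + \gamma_{2pn} + \frac{1}{2pn} \right] - \frac{1}{2} \left[ \log(pn) + \gamma_{pn} + \frac{1}{pn} \right] - \frac{1}{2} \left[ \log(qn) + \gamma_{qn} + \frac{1}{qn} \right]$$
$$= \log 2 + \log p + \log n - \frac{1}{2} \log p - \frac{1}{2} \log n - \frac{1}{2} \log q - \frac{1}{2} \log n + \gamma_{2pn} - \frac{1}{2} \gamma_{pn} - \frac{1}{2} \gamma_{qn} - \frac{1}{2qn}$$
$$= \log 2 + \frac{1}{2}(\log p - \log q) + \gamma_{2pn} - \frac{1}{2} \gamma_{pn} - \frac{1}{2} \gamma_{qn} - \frac{1}{2qn},$$
and since $\gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} \to \gamma - \frac{1}{2}\gamma - \frac{1}{2}\gamma = 0$, (22.5) follows upon passing to the limit.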
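The formula $\log 2 + \frac{1}{2}(\log p - \log q)$ for the rearranged sum lends itself to a numerical experiment. The sketch below builds the partial sum of the rearrangement block by block; the function names and the number of blocks are illustrative assumptions, and the comparison is in floating point, so only approximate agreement should be expected.

```python
import math


def rearranged_partial_sum(p, q, blocks):
    """Sum `blocks` blocks of the rearranged alternating harmonic series.

    Each block consists of the next p positive terms 1/1, 1/3, 1/5, ...
    followed by the next q negative terms -1/2, -1/4, -1/6, ...
    """
    total = 0.0
    next_odd, next_even = 1, 2
    for _ in range(blocks):
        for _ in range(p):
            total += 1.0 / next_odd
            next_odd += 2
        for _ in range(q):
            total -= 1.0 / next_even
            next_even += 2
    return total


def predicted_sum(p, q):
    """Return log 2 + (log p - log q) / 2, the value given by the theorem."""
    return math.log(2) + 0.5 * (math.log(p) - math.log(q))


if __name__ == "__main__":
    for p, q in ((1, 1), (2, 1), (1, 4), (3, 2)):
        print(p, q, rearranged_partial_sum(p, q, 100000), predicted_sum(p, q))
```

Taking $p = 2$ and $q = 1$ recovers the value $\frac{3}{2}\ln 2$ from Example 22.2, while $p = 1$ and $q = 4$ gives a rearrangement whose sum is 0.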
LECTURE 23
Rearrangements of Series
23.1. Rearrangements of Series
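$b_n = a_{\sigma(n)}$ for all $n \in \mathbb{N}$. It suffices to show that $\sum_{n=0}^\infty b_n \leq \sum_{n=0}^\infty a_n$. Indeed, if we can prove this then the reverse inequality $\sum_{n=0}^\infty a_n \leq \sum_{n=0}^\infty b_n$ will follow since $a_n$ is also a rearrangement of $b_n$ (i.e., $a_n = b_{\sigma^{-1}(n)}$). For each $N \in \mathbb{N}$, we have
$$\sum_{n=0}^{N} b_n = \sum_{n=0}^{N} a_{\sigma(n)} \leq \sum_{n=0}^{M} a_n \leq \sum_{n=0}^{\infty} a_n$$
where
$$M = \max\{\sigma(0), \sigma(1), \ldots, \sigma(N)\}.$$
Since $b_n \geq 0$ for all $n \in \mathbb{N}$ and the partial sums of the series $\sum_{n=0}^\infty b_n$ are bounded above by $\sum_{n=0}^\infty a_n$, it follows that $\sum_{n=0}^\infty b_n$ converges and that $\sum_{n=0}^\infty b_n \leq \sum_{n=0}^\infty a_n$.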
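If $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect that term-by-term multiplication (the so-called Cauchy product$^1$ of the two series)
$$\left( \sum_{i=0}^{\infty} a_i \right) \left( \sum_{j=0}^{\infty} b_j \right) = (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots)$$
$$= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots$$
$$= c_0 + c_1 + c_2 + \cdots$$
$$= \sum_{n=0}^{\infty} c_n,$$
where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$, should yield a convergent series.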
The following theorem of Mertens implies that products of series can be taken term-by-term as long as at least one of the series is absolutely convergent.
Theorem 23.4 (Mertens). If $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ are both convergent (with sums $A$ and $B$, respectively) and if at least one of the two series is absolutely convergent, then the product of the two series may be taken term-by-term:
$$\sum_{n=0}^{\infty} c_n = AB$$
where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$
In other words, an absolutely convergent series and a convergent series can be multiplied together term-by-term via the Cauchy formula.
We will not prove Mertens' theorem here, since it is more important to understand this particular result than to reproduce its proof. A related theorem of N.H. Abel is the following:
Theorem 23.5 (Abel). If $\sum_{n=0}^\infty a_n = A$, $\sum_{n=0}^\infty b_n = B$, $\sum_{n=0}^\infty c_n = C$ are convergent series of real numbers (where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$), then $C = AB$.
In other words, Abel's theorem says that if the term-by-term product series $\sum_{n=0}^\infty c_n$ converges (without the absolute convergence assumption that Mertens' theorem requires), then the sum must actually be $AB$. The proof requires a clever argument based on partial summation (a discrete analog of integration by parts) and a theorem on the boundary behavior of power series near their circle of convergence.
1. The reason for introducing this method of multiplication is that we need to add every possible term $a_i b_j$. We cannot sum with respect to $i$ first since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to $j$ first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many ". . ."s. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.
LECTURE 24
Products of Series
24.1. Cauchy Products of Series
If $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect that term-by-term multiplication (the so-called Cauchy product$^1$ of the two series)
$$\left( \sum_{i=0}^{\infty} a_i \right) \left( \sum_{j=0}^{\infty} b_j \right) = (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots)$$
$$= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots$$
$$= c_0 + c_1 + c_2 + \cdots$$
$$= \sum_{n=0}^{\infty} c_n$$
where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}$$
should yield a convergent series. As the following example shows, this is not always the case.
24.2. The Cauchy Product of Convergent Series Can Diverge!
Consider the series
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}}.$$
By the Alternating Series Test, this series converges to some value $A$. What happens when we square this series and perform term-by-term multiplication? In other words, let
$$a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}$$
and consider the Cauchy product of the two series $\sum_{n=0}^\infty a_n$ and $\sum_{n=0}^\infty b_n$.
1. The reason for introducing this method of multiplication is that we need to add every possible term $a_i b_j$. We cannot sum with respect to $i$ first since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to $j$ first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many ". . ."s. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.
The formula for the new terms $c_n$ tells us that
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}$$
$$= \sum_{k=0}^{n} \frac{(-1)^k}{\sqrt{k+1}} \cdot \frac{(-1)^{n-k}}{\sqrt{n-k+1}}$$
$$= \sum_{k=0}^{n} \frac{(-1)^k (-1)^{n-k}}{\sqrt{(n-k+1)(k+1)}}$$
$$= (-1)^n \sum_{k=0}^{n} \frac{1}{\sqrt{(n-k+1)(k+1)}}.$$
Claim. The inequality
$$(n - k + 1)(k + 1) \leq \left( \frac{n}{2} + 1 \right)^2$$
holds for $0 \leq k \leq n$.
Pf. of Claim. The inequality can be verified by simply multiplying it all out and checking to see whether we are led to a true inequality:
$$(n - k + 1)(k + 1) \overset{?}{\leq} \left( \frac{n}{2} + 1 \right)^2$$
$$nk + n - k^2 - k + k + 1 \overset{?}{\leq} \frac{n^2}{4} + n + 1$$
$$nk - k^2 \overset{?}{\leq} \frac{n^2}{4}$$
$$0 \overset{?}{\leq} \frac{n^2}{4} - nk + k^2$$
$$0 \leq \left( \frac{n}{2} - k \right)^2. \quad \text{(TRUE)}$$
This last inequality is clearly true, and hence so is our desired inequality (i.e., working backward from the last inequality yields the desired inequality).
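Returning to the formula for the terms $c_n$ we find that
$$|c_n| = \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}}$$
$$\geq \sum_{k=0}^{n} \frac{1}{\sqrt{\left(\frac{n}{2} + 1\right)^2}}$$
$$= \sum_{k=0}^{n} \frac{1}{\frac{n}{2} + 1}$$
$$= \sum_{k=0}^{n} \frac{2}{n+2}$$
$$= (n+1) \cdot \frac{2}{n+2}$$
$$= \frac{2n+2}{n+2}.$$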
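From this it is clear that $|c_n| \geq \frac{2n+2}{n+2} \geq 1$ for all $n$, so that $c_n$ does not tend to 0, and therefore the series $\sum_{n=0}^\infty c_n$ does not converge (by the so-called Divergence Test). In other words, attempting to compute
$$\left( \sum_{j=0}^{\infty} \frac{(-1)^j}{\sqrt{j+1}} \right) \left( \sum_{k=0}^{\infty} \frac{(-1)^k}{\sqrt{k+1}} \right)$$
by multiplying term-by-term leads to a divergent series.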
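The lower bound on $|c_n|$ can be checked directly by computing the Cauchy product terms. The sketch below is illustrative; the function names are not from the notes, and the computation is done in floating point, so a small tolerance is appropriate when comparing against the bound.

```python
import math


def cauchy_product_term(n):
    """Return c_n = sum_{k=0}^{n} a_k a_{n-k} for a_k = (-1)^k / sqrt(k+1)."""
    return sum(
        ((-1) ** k / math.sqrt(k + 1)) * ((-1) ** (n - k) / math.sqrt(n - k + 1))
        for k in range(n + 1)
    )


def lower_bound(n):
    """Return (2n + 2) / (n + 2), the lower bound for |c_n| derived above."""
    return (2 * n + 2) / (n + 2)


if __name__ == "__main__":
    # The terms alternate in sign and stay at least 1 in absolute value,
    # so they cannot tend to zero.
    for n in (0, 1, 2, 10, 100, 1000):
        print(n, cauchy_product_term(n), lower_bound(n))
```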
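The geometric series
$$\sum_{n=0}^{\infty} \frac{1}{p^{zn}} = \sum_{n=0}^{\infty} \left( \frac{1}{p^z} \right)^n = \frac{1}{1 - \frac{1}{p^z}}$$
converges absolutely since $|1/p^z| < 1$. Since Mertens' theorem says that we may multiply absolutely convergent series term-by-term, it follows (using $p = 2, 3$ in the above) that
$$\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} = \left( 1 + \frac{1}{2^z} + \frac{1}{2^{2z}} + \cdots \right) \left( 1 + \frac{1}{3^z} + \frac{1}{3^{2z}} + \cdots \right) \tag{24.1}$$
$$= \left( 1 + \frac{1}{2^z} + \frac{1}{4^z} + \cdots \right) \left( 1 + \frac{1}{3^z} + \frac{1}{9^z} + \cdots \right)$$
$$= 1 + \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{6^z} + \frac{1}{8^z} + \frac{1}{9^z} + \frac{1}{12^z} + \cdots$$
where the last sum includes terms corresponding exactly to those numbers whose prime factorizations only use 2 and 3. To see why, consider that we must multiply each term $1/2^{jz}$ with each term $1/3^{kz}$ when expanding out the multiplication on the right hand side of (24.1). Similarly, one can show that the product
$$\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} \cdot \frac{1}{1 - \frac{1}{5^z}} \tag{24.2}$$
equals the sum of the terms $1/n^z$ over those numbers $n$ whose prime factorizations only use 2, 3, and 5. This suggests considering the Riemann zeta function
$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \tag{24.3}$$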
defined (at the moment) for real variable $z > 1$. The Euler Product Formula relates the series (24.3) for the $\zeta$ function to an infinite product indexed by the prime numbers:
$$\sum_{n=1}^{\infty} \frac{1}{n^z} = \prod_{p \in P} \frac{1}{1 - \frac{1}{p^z}}. \tag{24.4}$$
Here $P = \{2, 3, 5, 7, \ldots\}$ denotes the set of all prime numbers. We will not go into the details of the proof here, although we will mention that it involves the Fundamental Theorem of Arithmetic, which tells us that each term $1/n^z$ can be written in the form
$$\frac{1}{n^z} = \frac{1}{(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r})^z} = \left( \frac{1}{p_1^{a_1}} \right)^z \left( \frac{1}{p_2^{a_2}} \right)^z \cdots \left( \frac{1}{p_r^{a_r}} \right)^z$$
in exactly one way. From (24.4), we can see exactly why the $\zeta$ function is important to number theorists. It connects analysis (i.e. infinite series and later functions of a complex variable) to the prime numbers.
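The Euler Product Formula can be explored numerically by comparing a truncated product over small primes with a partial sum of the series. The following sketch uses a basic sieve; the function names and the truncation limits are illustrative assumptions, and the value $\zeta(2) = \pi^2/6$ is quoted only as a known reference point.

```python
import math


def primes_up_to(limit):
    """Return the list of primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]


def zeta_partial_sum(z, terms):
    """Return the partial sum of 1/n^z for n = 1, ..., terms."""
    return math.fsum(1.0 / n ** z for n in range(1, terms + 1))


def euler_partial_product(z, prime_limit):
    """Return the product of 1 / (1 - p^(-z)) over the primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1.0 / (1.0 - p ** (-z))
    return product


if __name__ == "__main__":
    z = 2
    print(zeta_partial_sum(z, 10000))       # partial sum of the series (24.3)
    print(euler_partial_product(z, 10000))  # truncated product from (24.4)
    print(math.pi ** 2 / 6)                 # known value of zeta(2)
```

Using only the primes 2 and 3 with $z = 2$ gives $\frac{4}{3} \cdot \frac{9}{8} = \frac{3}{2}$, the sum of $1/n^2$ over those $n$ whose prime factorizations only use 2 and 3.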
24.4. Euler's Refinement of Euclid's Theorem
Recall that Euclid (over 2300 years ago) showed that the set of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem, was proved in Book IX of Euclid's Elements. One proof of Euclid's theorem is in Lecture 1.$^2$ An 18th century proof based on the Euler Product Formula (24.4) is given below:
Theorem 24.1 (Euclid's Theorem). The number of primes is infinite.
Pf. 1 (Euler, 1737). If the set $P$ of primes were finite, then the Euler Product Formula (24.4) would have only finitely many terms. Hence the product would remain bounded as $z \to 1^+$, which contradicts the fact that the series is unbounded (since the harmonic series $\sum_{n=1}^\infty 1/n$ diverges) as $z \to 1^+$.
Using infinite series techniques, Euler proved a much sharper version of Euclid's theorem. Euler's version roughly tells us that there are enough primes to make the series of prime reciprocals
$$\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \frac{1}{13} + \cdots \tag{24.5}$$
diverge. Compare this with the series of reciprocals of perfect squares
$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots \tag{24.6}$$
which converges by the integral test (Euler also proved that (24.6) converges to $\pi^2/6$, an important result which we will discuss later). Although there are infinitely many primes and infinitely many perfect squares, the primes are packed close enough together to make (24.5) diverge while the perfect squares are far enough apart to make (24.6) converge.
The recent proof of Euler's theorem presented below is due to Clarkson:
Theorem 24.2 (Euler, 1737). If $p_n$ denotes the $n$th prime number, then the series
$$\sum_{n=1}^{\infty} \frac{1}{p_n} \tag{24.7}$$
diverges.
Pf. (Clarkson, 1966). Suppose toward a contradiction that the series (24.7) converges. It follows that there exists a positive integer $k$ such that
$$\sum_{m=k+1}^{\infty} \frac{1}{p_m} < \frac{1}{2}. \tag{24.8}$$
This is because the left hand side of (24.8) is the tail end of a convergent series and hence tends to 0 as $k \to \infty$ (i.e. we are letting $\epsilon = \frac{1}{2}$). Now let
$$Q = p_1 p_2 \cdots p_k$$
and note that all of the numbers
$$1 + nQ, \qquad n = 1, 2, 3, \ldots$$
are not divisible by any of the primes $p_1, \ldots, p_k$. This follows from the same trick used in Euclid's original proof (see Lecture 1). Hence the prime factors$^3$ of each number $1 + nQ$ all belong to the set $\{p_{k+1}, p_{k+2}, \ldots\}$.
For each $N \geq 1$ we have
$$\sum_{n=1}^{N} \frac{1}{1 + nQ} \leq \sum_{j=0}^{\infty} \left( \sum_{m=k+1}^{\infty} \frac{1}{p_m} \right)^j. \tag{24.9}$$
The reason for the inequality is that the sum on the right hand side of (24.9), when expanded, includes every term on the left hand side. Now observe that (24.8) tells us that the right hand side of (24.9) is dominated by the convergent geometric series
$$\sum_{j=0}^{\infty} \left( \frac{1}{2} \right)^j.$$
This implies that
$$\sum_{n=1}^{\infty} \frac{1}{1 + nQ}$$
converges, since it is a series of positive terms which has bounded partial sums. The integral test, however, reveals that this is false.$^4$ This contradiction shows that the original series (24.7) diverges.
There are many variants and further refinements of this theorem. For instance, a sharpened form says that
$$\lim_{x\to\infty} \left( \sum_{p \leq x} \frac{1}{p} - \log\log x \right) = B_1$$
where $B_1 \approx 0.2614972\ldots$ is called Mertens' Constant. This was first demonstrated (independently) by Meissel in 1866 and by Mertens in 1874. A shocking refinement of Euler's theorem is Brun's theorem (1919). This theorem states that the series of reciprocals of twin primes converges. In fact
$$\left( \frac{1}{3} + \frac{1}{5} \right) + \left( \frac{1}{5} + \frac{1}{7} \right) + \left( \frac{1}{11} + \frac{1}{13} \right) + \cdots \approx 1.9021606\ldots$$
It is not known whether the constant $1.9021606\ldots$ is rational or irrational. Furthermore, it is not even known whether or not there are infinitely many twin primes.
3. We are implicitly using
Theorem 24.3 (Fundamental Theorem of Arithmetic). Every integer $n > 1$ can be expressed as a product of primes. Specifically $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$ where the $p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer $n > 1$ into primes is unique, apart from the order of the prime factors.
4. i.e. $\int_1^\infty \frac{dx}{1 + xQ}$ diverges.
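The slow divergence of the series of prime reciprocals, and its comparison with $\log\log x$, can be observed numerically. The sketch below is illustrative: the function names are not from the notes, the sieve is a basic one, and the quoted value of Mertens' constant is used only as a reference point for the difference $\sum_{p \leq x} 1/p - \log\log x$.

```python
import math


def primes_up_to(limit):
    """Return the list of primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]


def prime_reciprocal_sum(x):
    """Return the sum of 1/p over all primes p <= x."""
    return math.fsum(1.0 / p for p in primes_up_to(x))


def mertens_difference(x):
    """Return (sum of 1/p for primes p <= x) - log(log(x))."""
    return prime_reciprocal_sum(x) - math.log(math.log(x))


if __name__ == "__main__":
    for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        # The sum grows very slowly while the difference settles near 0.2615.
        print(x, prime_reciprocal_sum(x), mertens_difference(x))
```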
LECTURE 25
Compactness
25.1. Compactness
Definition. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is called compact$^1$ if every sequence $x_n$ in $S$ has a subsequence $x_{n_k}$ which converges to a point in $S$.
Example 25.1. The empty set $\emptyset$ is a compact subset of any metric space.
Example 25.2. Any finite subset of a metric space is compact. Indeed, let $(M, d)$ be a metric space and let $S = \{a_1, a_2, \ldots, a_m\}$ be a finite subset of $M$. If $x_n$ is a sequence in $S$, then there exists $i \in \{1, 2, \ldots, m\}$ so that $x_n = a_i$ for infinitely many $n$. In other words, there exists a sequence $n_k$ of natural numbers such that the corresponding subsequence $x_{n_k}$ of $x_n$ is constant (each term is $a_i$). In particular, the subsequence $x_{n_k}$ converges to $a_i$.
Example 25.3. The set $S = \{ \frac{1}{n} : n \in \mathbb{N} \}$ is not compact in $\mathbb{R}$ (with respect to the usual metric). Indeed, every subsequence of the sequence $x_n = \frac{1}{n}$ in $S$ converges to 0. Since this limit point is not an element of $S$, $S$ is not compact. On the other hand, $S \cup \{0\}$ is a compact subset of $\mathbb{R}$.
Theorem 25.1. Every closed interval $[a, b] \subset \mathbb{R}$ is compact.$^2$
Proof. Without loss of generality, suppose that $x_n$ is a sequence in [0, 1]. Let $I_0 = [0, 1]$ and select any $x_{n_0} \in I_0$. Now observe that $x_n \in [0, \frac{1}{2}]$ or $x_n \in [\frac{1}{2}, 1]$ for infinitely many values of n. Let $I_1$ denote one of these subintervals which contains $x_n$ for infinitely many values of n and select $x_{n_1} \in I_1$ where $n_0 < n_1$. Continuing this bisection procedure, we construct a sequence of subintervals $I_k$ such that
$$\cdots \subseteq I_{k+1} \subseteq I_k \subseteq \cdots \subseteq I_1 \subseteq I_0 = [0, 1]$$
and
$$\operatorname{Length}(I_k) = \frac{1}{2^k},$$
and corresponding points $x_{n_k} \in I_k$ such that $n_0 < n_1 < \cdots$. The subsequence $x_{n_k}$ so constructed satisfies the condition
$$j, k \ge N \implies |x_{n_k} - x_{n_j}| \le \frac{1}{2^N}$$
since the terms $x_{n_k}$ and $x_{n_j}$ are restricted to lie in the interval $I_N$. Therefore the subsequence $x_{n_k}$ is Cauchy. Since [0, 1] is complete (it is a closed subset of the complete metric space R), it follows that the subsequence $x_{n_k}$ converges to a limit in [0, 1].
1 The term "sequentially compact" is sometimes used to distinguish this concept from "covering compactness," which we will discuss later. It turns out that in metric spaces the two concepts are equivalent (this is a major theorem). Therefore we can safely use the term "compact" without worrying too much in the long run.
2 By definition [a, b] is of finite length $b - a$. In other words, we do not mean to include closed intervals of the form $[a, \infty)$ or $(-\infty, b]$.
Proof. We prove the theorem in $\mathbb{R}^2$. The general case (where n > 2) is similar, but the notation is more cumbersome. If $(x_n, y_n)$ is a sequence in the box $[a_1, b_1] \times [a_2, b_2]$, then $x_n$ is a sequence in the compact subset $[a_1, b_1]$ of R. By the compactness of $[a_1, b_1]$ (as a subset of R), there exists a subsequence $x_{n_k}$ of $x_n$ so that $x_{n_k}$ converges to some x in $[a_1, b_1]$. Now $y_{n_k}$ lives in the compact set $[a_2, b_2]$ and hence has a convergent subsequence $y_{n_{k_j}}$ which converges to some point y in $[a_2, b_2]$. Therefore the subsubsequence $(x_{n_{k_j}}, y_{n_{k_j}})$ converges to the point (x, y), which belongs to the box. This proves that the box $[a_1, b_1] \times [a_2, b_2]$ is compact.
Theorem 25.3. Every compact subset of a metric space (M, d) is closed and bounded.
Proof. Let S be a compact subset of a metric space (M, d). Suppose that $x_n$ is a sequence in S which converges in M. In other words, there exists $x \in M$ so that $x_n \to x$. Since S is compact, there is a subsequence $x_{n_k}$ of $x_n$ that converges to some point $y \in S$. But subsequences of a convergent sequence must converge to the original limit. Therefore $x = y \in S$ and hence S is closed (since S contains all of its limit points).
To see that S is bounded, fix $x \in M$. Either S is bounded (i.e., there exists $R > 0$ so that $S \subseteq B_R(x)$) or else for each $n \in \mathbb{N}$ there exists $x_n \in S$ so that $d(x, x_n) > n$. Suppose toward a contradiction that the latter holds. Since S is compact, there exists a subsequence $x_{n_k}$ of $x_n$ which converges to a point $y \in S$. However,
$$n_k < d(x_{n_k}, x) \le d(x_{n_k}, y) + d(y, x)$$
for all $k \in \mathbb{N}$. Since $x_{n_k} \to y$ and d(y, x) is constant, the right hand side of the preceding inequality is bounded while the left hand side is not, a contradiction. Thus S is bounded, as claimed.
Theorem 25.4. Every closed subset of a compact metric space (M, d) is compact.
Proof. Let S be a closed subset of a compact metric space (M, d). If xn is a sequence
in S, then the compactness of M yields a subsequence xnk which converges to a point x
in M . However, xnk is a sequence in the closed set S and hence the limit point x must
belong to S. In particular, this means that every sequence in S has a subsequence which
converges in S. Thus S is compact.
Corollary 8. The arbitrary intersection of compact sets is compact. In other words, if $A_i$ is a compact subset of (M, d) for all $i \in I$, then $\bigcap_{i \in I} A_i$ is also compact.
Proof. Recall that the arbitrary intersection of closed sets is closed. Since each compact set $A_i$ is closed, it therefore follows that $\bigcap_{i \in I} A_i$ is a closed subset of a compact set (namely any of the $A_i$) and is thus compact by the preceding theorem.
It is also true that the union of finitely many compact sets is also compact.
25.2. Compact Sets in Rn
Theorem 25.5 (Bolzano-Weierstrass). A bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.$^3$
3 Note that the limit point does not have to be a member of the sequence. The sequence $1, \frac{1}{2}, \frac{1}{3}, \ldots$ in R is bounded and has many convergent subsequences. However, the limit point 0 does not belong to the original sequence.
Proof. A bounded sequence in Rn is contained in a box. Since boxes are compact, some
subsequence converges to a limit contained in the box.
The preceding theorem is sometimes stated without mention of sequences:
Theorem 25.6 (Bolzano-Weierstrass). A bounded, infinite subset S of $\mathbb{R}^n$ has an accumulation point in $\mathbb{R}^n$.$^4$
Recall that if a subset S in a metric space (M, d) is compact, then S must be closed
and bounded. In general, the converse is false (i.e., it is possible for S to be closed and
bounded, but not compact). However, Rn is particularly nice in the sense that the converse
is true for subsets of Rn :5
Theorem 25.7 (Heine-Borel). A subset S of $\mathbb{R}^n$ is compact if and only if S is closed and
bounded.
Proof. We have already proved that a compact set must be closed and bounded. Indeed,
this holds in any metric space (M, d), not just Rn .
On the other hand, suppose that S is a closed and bounded subset of Rn . Since S is
bounded, it follows that S is contained in a box $B = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n] \subseteq \mathbb{R}^n$.
Since B is compact, it follows that every sequence xn in S has a subsequence xnk which
converges to a limit x in B. However, since S is closed, it follows that x belongs to S.
Therefore S is compact.
with respect to one of $d_1, d_2, d_\infty$ if and only if it converges with respect to the other two metrics, it turns out that the following theorem holds regardless of which of the three metrics $d_1, d_2, d_\infty$ one uses.
LECTURE 26
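$$\begin{aligned}
C_0 &= [0, 1] \\
C_1 &= [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1] \\
C_2 &= [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1] \\
&\;\;\vdots
\end{aligned}$$
where $C_{n+1}$ is obtained from $C_n$ by removing the open middle third of every closed interval contained in $C_n$. To be more specific: $C_n$ consists of $2^n$ closed intervals of length $\frac{1}{3^n}$.
$$C = \bigcap_{n=0}^{\infty} C_n.$$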
In other words, C is the set that is left over after removing the open intervals
$$(\tfrac{1}{3}, \tfrac{2}{3}), \quad (\tfrac{1}{9}, \tfrac{2}{9}), \quad (\tfrac{7}{9}, \tfrac{8}{9}), \quad \ldots$$
from [0, 1]. One might at first think that C is empty. Nothing could be further from the truth. In fact, it is immediately clear that C is infinite since $0, 1, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \ldots$ all belong to C. This is because each of these numbers belongs to every $C_n$ (i.e., these numbers are never removed from [0, 1] during the construction of C). In other words, the endpoints of any of the closed subintervals that belong to any of the $C_n$ also belong to the Cantor Set C.
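The stages of the construction can be generated mechanically. Below is a minimal sketch, assuming each stage is stored as a list of closed intervals with exact rational endpoints; the function names `next_stage` and `cantor_stage` are illustrative only.

```python
from fractions import Fraction


def next_stage(intervals):
    """Remove the open middle third of each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))   # left closed third
        result.append((b - third, b))   # right closed third
    return result


def cantor_stage(n):
    """Return the closed intervals whose union is C_n, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals
```

Calling `cantor_stage(2)` yields the four intervals with endpoints 0, 1/9, 2/9, 1/3, 2/3, 7/9, 8/9, 1. Every endpoint produced at one stage survives in all later stages, which matches the observation that these endpoints belong to C.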
It turns out that C is a highly nontrivial example of a compact set:
Theorem 26.1. C is compact.
Proof. Each set $C_n$ is a finite union of closed intervals and hence closed. It follows that C, being the intersection of closed sets, is itself closed. Since C is also bounded, it follows that C is compact by the Heine-Borel theorem.
One of the first things one notices about C is that it is somewhat sparse. Although we
have not discussed the concept of Lebesgue measure (a Math 137-138 topic), the following
theorem is too curious to pass up:
Theorem 26.2. C has measure zero. In other words, the length of C is zero.
Proof. We will show that the complement $[0, 1] \setminus C$ of the Cantor set has length 1. This is a much more intuitive thing to do since $[0, 1] \setminus C$ is simply a union of open intervals, each of which has a well-defined length. Therefore the length of $[0, 1] \setminus C$ is simply the sum of the lengths of the intervals that are removed in the formation of C. Recalling that the set $C_n$ from the nth stage of the construction of the Cantor set consists of $2^n$ closed intervals of length $\frac{1}{3^n}$, we compile the following table:

Stage    Intervals removed    Total length removed
0        0                    0
1        1                    1/3
2        2                    2/9
3        4                    4/27
...      ...                  ...
n        2^(n-1)              2^(n-1)/3^n
...      ...                  ...

$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \frac{1}{3}\left(1 + \frac{2}{3} + \frac{4}{9} + \cdots\right) = \frac{1}{3}\sum_{n=0}^{\infty} \left(\tfrac{2}{3}\right)^n = \frac{1}{3} \cdot \frac{1}{1 - \frac{2}{3}} = 1.$$
Thus $[0, 1] \setminus C$ has length 1, which implies that C has zero length.
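The geometric series above can be checked with exact arithmetic. This sketch assumes that stage k (for k at least 1) removes $2^{k-1}$ open intervals of length $3^{-k}$; the function name `removed_length` is hypothetical.

```python
from fractions import Fraction


def removed_length(stages):
    """Total length removed in stages 1..stages: sum of 2**(k-1) / 3**k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, stages + 1))
```

The partial sums equal $1 - (2/3)^n$ exactly, so what remains after n stages has total length $(2/3)^n$, which tends to 0.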
Of course we may have expected the preceding since C seems so sparse and spread
out. However, in another sense C is quite large:
Theorem 26.3. C is uncountable. In fact, C has the same cardinality as [0, 1] itself.
Sketch of Pf. One notes that a number $x \in [0, 1]$ belongs to C if and only if x has a base-3 expansion which contains only the digits 0 and 2. In other words, each $x \in C$ can be represented uniquely in the form
$$x = \frac{a_0}{3} + \frac{a_1}{3^2} + \frac{a_2}{3^3} + \cdots$$
where $a_i \in \{0, 2\}$ for each i. Thus C has the same cardinality as the set of all infinite strings of 0s and 2s. However, this is the same cardinality as that of the set of all infinite binary strings (strings which use only the symbols 0 and 1). The set of all infinite binary strings corresponds to [0, 1] itself, however, which is uncountable.
The preceding theorem is quite remarkable, since it says that C has the same cardinality as [0, 1] (and hence R itself) despite the fact that the length of C, by any reasonable
standard of measurement, is zero.
Recall that the Cantor set C is a peculiar compact subset of [0, 1] which is uncountable, yet is of measure 0. Unlike other sets that we have encountered in our cardinality
discussions, the Cantor set is closed in R. In other words, C contains all of its limit points.1
In fact, it turns out that even more is true:
Theorem 26.4. Every point of C is an accumulation point of C.
Proof. If $x \in C$ and $\epsilon > 0$ are given, then let $n \in \mathbb{N}$ be so large that $3^{-n} < \epsilon$. Since $x \in C_n$, there exists a closed interval I of length $3^{-n}$ such that $x \in I \subseteq C_n$. If y is an endpoint of I which is distinct from x, then $y \in C$ and $0 < |x - y| \le 3^{-n} < \epsilon$. This implies that x is an accumulation point of C.
Thus C is an uncountable subset of [0, 1] of measure zero which is somehow so dense
in itself that every point of C is the limit of a sequence of distinct points of C. Moreover,
C is totally disconnected, in the sense that between any two points $x, y \in C$, there exists
a point z in between them which is not an element of C. To show this, it suffices to prove
that C contains no intervals.
Theorem 26.5. C contains no intervals.
Proof. Let I be a subinterval of [0, 1] of length $\epsilon > 0$. Let $n \in \mathbb{N}$ be so large that $3^{-n} < \epsilon$. Then $C_n$ consists entirely of intervals of length $< \epsilon$ which implies that $I \not\subseteq C_n$. In particular, this means that $I \not\subseteq C$.
26.2. The Cantor Ternary Function
Using the Cantor set, one can create a host of pathological functions. For instance,
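consider the Cantor ternary function (a.k.a. the "Devil's Staircase") $f : [0, 1] \to [0, 1]$ defined by
$$f(x) = \begin{cases}
\frac{1}{2} & \text{if } x \in [\frac{1}{3}, \frac{2}{3}] \\
\frac{1}{4} & \text{if } x \in [\frac{1}{9}, \frac{2}{9}] \\
\frac{3}{4} & \text{if } x \in [\frac{7}{9}, \frac{8}{9}] \\
\;\vdots & \;\vdots
\end{cases}$$
(see Figure 1). Clearly this is well defined if $x \notin C$. If x does belong to C, then recall that $x \in C$ if and only if there exists a sequence $a_n$ of 0s and 2s so that
$$x = \sum_{n=1}^{\infty} \frac{a_n}{3^n}.$$
For such x, we define
$$f\left(\sum_{n=1}^{\infty} \frac{a_n}{3^n}\right) = \sum_{n=1}^{\infty} \frac{a_n}{2^{n+1}}.$$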
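The two-part definition above can be turned into a digit-by-digit procedure. What follows is a sketch under stated assumptions, not the notes' own algorithm: it reads off ternary digits of x, stops at the first digit 1 (where x lies in a removed middle third, or at its left endpoint), and otherwise halves each digit to obtain a binary digit. The name `cantor_function` and the truncation parameter `digits` are hypothetical.

```python
from fractions import Fraction


def cantor_function(x, digits=40):
    """Approximate the Cantor ternary function at x in [0, 1].

    The result is exact when a ternary digit 1 is met or the expansion
    terminates; otherwise the truncation error is at most 2**(-digits).
    """
    x = Fraction(x)
    if not 0 <= x <= 1:
        raise ValueError("x must lie in [0, 1]")
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    scale = Fraction(1, 2)          # weight of the current binary digit
    for _ in range(digits):
        x *= 3
        digit = int(x)              # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            value += scale          # f is constant on this middle third
            break
        value += scale * (digit // 2)   # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value
```

For example, `cantor_function(Fraction(1, 2))` returns 1/2 and `cantor_function(Fraction(7, 9))` returns 3/4, in agreement with the values listed on the removed intervals.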
Theorem 26.6. The Cantor ternary function $f : [0, 1] \to [0, 1]$ is continuous and increasing and satisfies $f'(x) = 0$ for all $x \notin C$.
Sketch of Pf. Since f is constant on each of the open intervals removed during the construction of C, we need only show that f is continuous at each point of C itself. Let $\epsilon > 0$ be given and let $n \in \mathbb{N}$ be so large that $1/2^n < \epsilon$. If $|x - y| < \delta = \frac{1}{3^n}$, then there are
1 For instance, contrast this with $\mathbb{Q} \cap [0, 1]$ which is certainly not closed.
[Figure 1: the Cantor ternary function; vertical axis ticks at 1/4, 1/2, 3/4.]
$$\frac{1}{2^n} < \epsilon.$$
In other words, the Cantor ternary function is flat almost everywhere yet is still
increasing. Using such tricks, one can create even more bizarre functions:
Example 26.1. Let f denote the Cantor ternary function. The function $g : [0, 1] \to \mathbb{R}$ defined by
$$g(x) = x - 2f(x)$$
$$g(1) = -1.$$
In other words, all compact metric spaces are continuous images of the Cantor set.
This does not contradict anything we have learned about connectedness. Although the
continuous image of a connected set is connected, the continuous image of a disconnected
set can certainly be connected.
Another byproduct of the construction of the Cantor Set is the construction of Peano curves (space-filling curves). For example:
Theorem 26.9. There exists a continuous function $f : [0, 1] \to [0, 1] \times [0, 1]$ which is surjective.
Finally, we leave off with the shocking fact that a nontrivial metric space can be homeomorphic to its own Cartesian product:
Theorem 26.10. C is homeomorphic to $C \times C$.
LECTURE 27
holds for all $x \in A$. We claim that f attains the absolute maximum value $y = \sup f(A)$ at some point of A (the proof that f attains its absolute minimum value $\inf f(A)$ is similar).
1 In other words, let $x_n$ belong to $f^{-1}(\{y_n\})$.
Example 27.2. There is a hottest place on Earth. If we imagine the surface of the earth to be the sphere $S = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$, we see that S is compact (it is closed and bounded). Since the function T(x, y, z) describing the temperature at any point of S is continuous (one would assume), the preceding theorem says that there is a point on S at which the temperature is an absolute maximum.
27.2. Uniform Continuity
Recall that a function $f : A \to B$ is said to be continuous on A if f is continuous at each point x in A. In particular, observe that if f is continuous on A, then the $\delta$ in the definition of continuity is allowed to depend upon x. In other words, given $\epsilon > 0$, the same $\delta$ is not guaranteed to work for each x in A. Some x will require smaller $\delta$'s than others.
There is a stronger notion of continuity that is extremely important in analysis:
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A function $f : A \to B$ is called uniformly continuous if
$$(\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y \in A)\big(d_A(x, y) < \delta \implies d_B(f(x), f(y)) < \epsilon\big).$$
The key difference between uniform continuity and continuity is that once $\epsilon > 0$ is fixed, the same $\delta > 0$ must work for all $x, y \in A$. Let us make this distinction explicit. A function $f : A \to B$ is continuous on A if
$$(\forall x \in A)(\forall \epsilon > 0)(\exists \delta > 0)(\forall y \in A)\big(d_A(x, y) < \delta \implies d_B(f(x), f(y)) < \epsilon\big).$$
Example 27.3. The function $f : [0, 1] \to \mathbb{R}$ defined by $f(x) = x^2$ is uniformly continuous. One can prove this directly from the definition, but we prefer a more clever approach. Consider the following:
$$|f(x) - f(y)| = |x^2 - y^2| = |x + y|\,|x - y| \le 2|x - y|.$$
It follows that if $\epsilon > 0$ is given then we may take $\delta = \epsilon/2$ for all $x, y \in [0, 1]$. Indeed, $|x - y| < \delta$ immediately implies, by the preceding inequalities, that $|f(x) - f(y)| < \epsilon$.
The following example indicates that the domain of the function plays a significant
role in whether a continuous function is uniformly continuous:
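Example 27.4. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is continuous, but not uniformly continuous. What does it mean to not be uniformly continuous? Negating the definition of uniform continuity quantifier by quantifier, it follows that f is not uniformly continuous if and only if the following holds$^2$:
$$(\exists \epsilon > 0)(\forall \delta > 0)(\exists x, y)\big(|x - y| < \delta \;\wedge\; |f(x) - f(y)| \ge \epsilon\big).$$
Thus we must prove that the preceding statement is satisfied by our given function. Indeed, if $\epsilon = 1$ and $\delta > 0$ is given, we wish to find x, y such that $|x - y| < \delta$ and $|x^2 - y^2| \ge 1$. We claim that if x is sufficiently large and if $y = x + \frac{\delta}{2}$, then both conditions will hold. We therefore wish to find $x \ge 0$ such that the inequality
$$1 \le |f(x + \tfrac{\delta}{2}) - f(x)| = |(x + \tfrac{\delta}{2})^2 - x^2| = 2x(\tfrac{\delta}{2}) + (\tfrac{\delta}{2})^2 = x\delta + \frac{\delta^2}{4}$$
holds. Solving for x, we see that any $x \ge \max\{0, \frac{1}{\delta} - \frac{\delta}{4}\}$ will do. In particular, this shows that f is not uniformly continuous.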
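The choice of witnesses can be checked with exact rational arithmetic. This is a sketch only: it assumes the simple choice $x = 1/\delta$ (one of many that work) together with $y = x + \delta/2$, and the function name `witness` is hypothetical.

```python
from fractions import Fraction


def witness(delta):
    """Return (x, y) with |x - y| < delta and |x**2 - y**2| >= 1.

    Assumes delta > 0. With x = 1/delta and y = x + delta/2 the difference
    of squares equals x*delta + delta**2/4 = 1 + delta**2/4.
    """
    delta = Fraction(delta)
    x = 1 / delta
    y = x + delta / 2
    return x, y
```

No matter how small `delta` is, the pair returned is within `delta` of each other while the squares differ by more than 1, which is exactly the failure of uniform continuity with $\epsilon = 1$.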
LECTURE 28
Uniform Continuity
Theorem 28.1. A continuous function on a compact set is uniformly continuous.
Proof. Let $(A, d_A)$ be a compact metric space, let $(B, d_B)$ be a metric space, and let $f : A \to B$ be continuous. Suppose toward a contradiction that f is not uniformly continuous. Hence there exists $\epsilon > 0$ so that no matter how small $\delta > 0$ is, there exist $x, y \in A$ so that $d_A(x, y) < \delta$ but $d_B(f(x), f(y)) \ge \epsilon$.
Letting $\delta = \frac{1}{2^n}$ for $n \in \mathbb{N}$, we may therefore find sequences $x_n, y_n$ in A so that
$$d_A(x_n, y_n) < \frac{1}{2^n}$$
$$f(x) = \frac{e^{x^2} \sin x + 47 + \cos(\sin(\cos(\sqrt[3]{x} + 47)))}{e^{\sqrt[47]{x}} + 47},$$
then the preceding theorem asserts that f is uniformly continuous on A. Clearly this is not
something you would want to ever verify by direct computation.
28.1. Nested Compact Sets
Theorem 28.2. Let (M, d) be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of M such that
$$A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots$$
then $A = \bigcap_{n=0}^{\infty} A_n$ is compact and nonempty.
Proof. Recall that any closed subset of a compact set is compact. Since the arbitrary
intersection of closed sets is closed, it follows that A is a closed subset of A0 whence A is
compact. We must now show that A is nonempty.
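For each $n \in \mathbb{N}$, select a point $x_n \in A_n$. Since $A_0$ is compact and since $x_n \in A_0$ for all $n \in \mathbb{N}$, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a limit $x \in A_0$.
We wish to show that the limit point x of this subsequence also belongs to each $A_n$. To do this, note that for each $n \in \mathbb{N}$ the tail sequence
$$x_n, x_{n+1}, x_{n+2}, \ldots$$
is contained in $A_n$ (since the sets are nested). Because $A_n$ is closed and all but finitely many terms of the subsequence $x_{n_k}$ belong to this tail, it follows that the limit point x belongs to $A_n$ for each $n \in \mathbb{N}$. Therefore $x \in \bigcap_{n=0}^{\infty} A_n = A$ and $A \ne \emptyset$.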
Definition. Let (M, d) be a metric space. The diameter diam(S) of a subset $S \subseteq M$ is defined by
$$\operatorname{diam}(S) = \sup\{d(x, y) : x, y \in S\}.$$
In other words, the diameter of a set is the supremum of the distances between points of S.
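Theorem 28.3. Let (M, d) be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of M satisfying $A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots$ and
$$\lim_{n \to \infty} \operatorname{diam}(A_n) = 0,$$
then $A = \bigcap_{n=0}^{\infty} A_n$ consists of a single point.
$$0 \le d(x, y) \le \operatorname{diam}(A_n) \to 0.$$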
LECTURE 29
$$d(Tx, Ty) \le \alpha\, d(x, y)$$
$$T^n = \underbrace{T \circ T \circ \cdots \circ T}_{n}.$$
Proof. We need to show that a fixed point p of T exists and that it is unique.
EXISTENCE: Since $T : M \to M$ is a uniformly strict contraction, there exists a constant $\alpha \in [0, 1)$ so that $d(Tx, Ty) \le \alpha\, d(x, y)$ for all $x, y \in M$. Fix any $x_0 \in M$ and set $x_n = T^n x_0$ for $n \ge 1$. It follows that
$$\begin{aligned}
d(x_{n+1}, x_n) &= d(T^{n+1} x_0, T^n x_0) \\
&= d(T(T^n x_0), T(T^{n-1} x_0)) \\
&= d(T x_n, T x_{n-1}) \\
&\le \alpha\, d(x_n, x_{n-1}).
\end{aligned}$$
$$Tx = \lim_{n \to \infty} T x_n = \lim_{n \to \infty} x_{n+1} = x.$$
Thus $Tx = x$ and x is a fixed point of T. However, it appears that x might depend upon the initial point $x_0$. We must show that this is not the case; in other words, we must show that T can have at most one fixed point.
UNIQUENESS: If y is a fixed point of T then
$$0 \le d(x, y) = d(Tx, Ty) \le \alpha\, d(x, y),$$
which forces $d(x, y) = 0$ since $\alpha < 1$, whence $x = y$.
The following example demonstrates the power of the Contraction Mapping Principle
and of the entire metric space machinery that we have built up. In particular, the Contraction Mapping Principle can provide proofs that certain complicated differential and integral
equations have unique solutions. Moreover, it also provides an algorithm by which these
solutions can be computed.
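The algorithm alluded to is simply repeated application of T to any starting point. Here is a minimal sketch for real-valued maps; the function name `fixed_point`, the tolerance, and the iteration cap are assumptions made for illustration. As a test case one may take $T = \cos$ on [0, 1], which maps [0, 1] into itself and satisfies $|\cos x - \cos y| \le (\sin 1)|x - y|$ there by the Mean Value Theorem.

```python
import math


def fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until successive terms differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge within max_iter steps")


# Both starting points lead to the same fixed point of cos on [0, 1].
p_from_zero = fixed_point(math.cos, 0.0)
p_from_one = fixed_point(math.cos, 1.0)
```

That the two runs agree (up to the tolerance) reflects the uniqueness half of the theorem: the limit does not depend on the initial point.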
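Example 29.1. If $g : [0, 1] \to \mathbb{R}$ is continuous, then there exists a continuous real-valued function $f : [0, 1] \to \mathbb{R}$ such that
$$f(x) - \int_0^x f(x - t) e^{-t^2}\, dt = g(x). \tag{29.1}$$
$$\sup_{0 \le x \le 1} \left| \int_0^x \big[f_1(x - t) - f_2(x - t)\big] e^{-t^2}\, dt \right| \le \|f_1 - f_2\|_\infty \sup_{0 \le x \le 1} \int_0^x e^{-t^2}\, dt \le \|f_1 - f_2\|_\infty \int_0^1 e^{-t^2}\, dt = \alpha \|f_1 - f_2\|_\infty$$
where
$$\alpha = \int_0^1 e^{-t^2}\, dt = 0.746824\ldots < 1.$$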
The fact that $0 < \alpha < 1$ can be seen by considering the graph of $e^{-x^2}$ for $0 \le x \le 1$ (see Figure 1). An alternate approach might be to use the fact that the series expansion for $e^{-x^2}$ is alternating:
[FIGURE 1. Graph of $y = e^{-x^2}$.]
$$e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6} + \cdots \le 1 - x^2 + \frac{x^4}{2}$$
so that
$$\int_0^1 e^{-t^2}\, dt \le \int_0^1 \left(1 - t^2 + \frac{t^4}{2}\right) dt = \frac{23}{30} = 0.7\overline{6}.$$
By the Contraction Mapping Principle, it follows that T has a unique fixed point. In other words, there exists a unique continuous real-valued function $f : [0, 1] \to \mathbb{R}$ such that (29.1) holds.
The preceding example shows the power of this abstract approach. It also indicates that we need to have a better understanding of integrals, derivatives, infinite series, and the $d_\infty$ metric in order to handle some of the sophisticated problems that one encounters in other branches of mathematics and in applications.
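The numerical value 0.746824... quoted in the example can be double-checked in two independent ways. The sketch below assumes the standard identity $\int_0^1 e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$ and compares it with a plain midpoint rule; the function names are illustrative.

```python
import math


def contraction_constant():
    """Value of the integral of exp(-t**2) over [0, 1], via the error function."""
    return math.sqrt(math.pi) / 2 * math.erf(1.0)


def midpoint_rule(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


approx = midpoint_rule(lambda t: math.exp(-t * t), 0.0, 1.0)
```

Both computations give a value below the series bound 23/30, which is all that the contraction argument requires.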
LECTURE 30
Derivatives
The following notion of convergence is commonly used in Calculus:
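Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$, and let $f : A \setminus \{a\} \to B$ be a function. We say that
$$\lim_{x \to a} f(x) = y$$
if for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$0 < d_A(x, a) < \delta \implies d_B(f(x), y) < \epsilon.$$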
The following theorem asserts that the preceding definition of limits agrees with our
original definition and also agrees quite well with the definition of continuity:
Theorem 30.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$ and let $f : A \setminus \{a\} \to B$ be a function. The following are equivalent:
(i) $\lim_{x \to a} f(x) = y$,
(ii) $\lim_{n \to \infty} f(x_n) = y$ for any sequence $x_n$ in $A \setminus \{a\}$ which converges to a.
(iii) f can be extended continuously to a by setting f(a) = y. In other words, the function $\tilde{f} : A \to B$ defined by
$$\tilde{f}(x) = \begin{cases} f(x) & x \ne a \\ y & x = a \end{cases}$$
is continuous at a.
30.1. Derivatives
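Definition. A function $f : (a, b) \to \mathbb{R}$ is said to be differentiable at $x_0 \in (a, b)$ if there exists a real number L such that for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$0 < |x - x_0| < \delta \implies \left| \frac{f(x) - f(x_0)}{x - x_0} - L \right| < \epsilon. \tag{30.1}$$
We call L the derivative of f at $x_0$, denoted $f'(x_0)$. In other words,
$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = L$$
or equivalently
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = L.$$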
(30.2)
so that
In other words, the error term E(x) goes to zero faster than $x - x_0$ does:
$$\lim_{x \to x_0} \frac{E(x)}{x - x_0} = 0.$$
The astute reader will observe that there is a slight problem defining $E(x_0)$. However, one notes that (30.2) implies that E(x) is uniformly continuous near $x_0$ and hence can be extended continuously to $x_0$ (see HW).
Summing things up, a differentiable function f is well approximated by the linear function $f(x_0) + f'(x_0)(x - x_0)$ near $x_0$.
Theorem 30.2. If $f : (a, b) \to \mathbb{R}$ is a constant function, then $f'(x) = 0$ for all $x \in (a, b)$.
Proof. Since the difference quotient is always 0, the theorem follows immediately from
the definition of the derivative.
30.2. Basic Theorems
Theorem 30.3. If f is differentiable at x0 , then f is continuous at x0 .
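Proof. This follows from the fact that
$$\begin{aligned}
\lim_{x \to x_0} |f(x) - f(x_0)| &= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| |x - x_0| \\
&= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| \cdot \lim_{x \to x_0} |x - x_0| \\
&= |f'(x_0)| \lim_{x \to x_0} |x - x_0| \\
&= 0.
\end{aligned}$$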
LECTURE 31
f (x) f (x0 )
0
x x0
f (x) f (x0 )
0
x x0
whence f (x0 ) 0. Putting this all together, we find that f (x0 ) = 0, as desired.
(31.1)
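Proof. Let
$$S = \frac{f(b) - f(a)}{b - a}$$
denote the slope of the secant of the graph of f from (a, f(a)) to (b, f(b)). Next define the auxiliary function
$$g(x) = f(x) - Sx$$
and note that g(a) = g(b) since the net rise of both f(x) and Sx are the same over the interval [a, b]. Let us make this more precise:
$$\begin{aligned}
g(a) &= f(a) - Sa \\
&= f(a) - a\,\frac{f(b) - f(a)}{b - a} \\
&= \frac{(b - a) f(a) - a f(b) + a f(a)}{b - a} \\
&= \frac{b f(a) - a f(b)}{b - a}
\end{aligned}$$
and
$$\begin{aligned}
g(b) &= f(b) - Sb \\
&= f(b) - b\,\frac{f(b) - f(a)}{b - a} \\
&= \frac{(b - a) f(b) - b f(b) + b f(a)}{b - a} \\
&= \frac{b f(a) - a f(b)}{b - a}
\end{aligned}$$
whence
$$g(a) = g(b) = \frac{b f(a) - a f(b)}{b - a}.$$
There are two cases to investigate: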
(i) Suppose that g is constant on [a, b]. Then $g'(x) = f'(x) - S = 0$ for all $x \in (a, b)$ whence $f'(x_0) = S$ for every $x_0 \in (a, b)$. In particular, this proves (31.1).
(ii) Suppose now that g is not constant. Since g is continuous on [a, b], it follows from the Extreme Value Theorem that g assumes an absolute maximum and absolute minimum on [a, b]. Since g is not constant, either the absolute maximum or absolute minimum value of g on [a, b] is attained at some $x_0 \in (a, b)$. It follows that $g(x_0)$ is either a local maximum or local minimum whence $g'(x_0) = 0$. However, this implies that $f'(x_0) = S$ whence (31.1) follows.
An immediate corollary of the Mean Value Theorem is Rolle's Theorem:
Theorem 31.3 (Rolle's Theorem). If $f : [a, b] \to \mathbb{R}$ is continuous on [a, b], differentiable on (a, b), and f(a) = f(b), then there exists $x_0 \in (a, b)$ so that $f'(x_0) = 0$.
Example 31.1. If $f(x) = x^3 + px + q$ where p > 0, then f has a unique real root. First let us observe that at least one root exists. Since $\lim_{x \to \pm\infty} f(x) = \pm\infty$, it follows that f assumes both positive and negative values whence a root must exist by the Intermediate Value Theorem. Now suppose toward a contradiction that f has two distinct real roots, so that there exist a < b such that f(a) = f(b) = 0. By Rolle's Theorem there would exist a $c \in (a, b)$ such that $0 = f'(c) = 3c^2 + p > 0$. This is a contradiction.
Theorem 31.4. If f is differentiable on (a, b) and $f'(x) \ge 0$ for all $x \in (a, b)$, then f is increasing on (a, b). In other words, $x \le y$ implies that $f(x) \le f(y)$.
Proof. Let x < y (the case x = y is trivial) and note that the Mean Value Theorem gives us $c \in (x, y)$ so that
$$f(y) - f(x) = f'(c)(y - x) \ge 0.$$
Example 31.2. The Mean Value Theorem can be used to prove various interesting inequalities for everyday functions. For example, given distinct $a, b \in (-\pi/2, \pi/2)$ there exists c strictly between a and b so that
$$\tan b - \tan a = (\sec^2 c)(b - a)$$
from which it follows that
$$|\tan b - \tan a| \ge |b - a|$$
since $\sec^2 c \ge 1$ for all $c \in (-\pi/2, \pi/2)$.
Example 31.3. The function $f(x) = e^{x^2}$ is uniformly continuous on [0, 1]. Although this is guaranteed from the fact that f is continuous and [0, 1] is compact, we can prove this directly. Given $0 \le x < y \le 1$, the Mean Value Theorem asserts that there exists $c \in (x, y)$ so that
$$e^{x^2} - e^{y^2} = 2c e^{c^2}(x - y).$$
Since $0 \le c \le 1$, we have $|2c e^{c^2}| \le 2e$ and it follows that
$$|x - y| < \delta = \frac{\epsilon}{2e} \implies |f(x) - f(y)| < \epsilon.$$
Since this $\delta$ depends only upon $\epsilon$, it follows that $f(x) = e^{x^2}$ is uniformly continuous on [0, 1].
The following theorem (which we state without proof) asserts that the derivative of a differentiable function satisfies the intermediate value property:
Theorem 31.5. Let $f : [a, b] \to \mathbb{R}$ be differentiable at each point of [a, b]. If $f'(a) < y < f'(b)$ or $f'(b) < y < f'(a)$, then there exists $x \in (a, b)$ such that $f'(x) = y$.
Example 31.4. There does not exist a function $F : \mathbb{R} \to \mathbb{R}$ such that $F'(x) = [x]$ for all $x \in \mathbb{R}$. Since the greatest integer function certainly does not have the intermediate value
property, it is clear from the preceding theorem that it cannot be the derivative of another
function.
LECTURE 32
and
$$\lim_{x \to 0^+} \left| \frac{f(x) - f(0)}{x - 0} \right| = \lim_{x \to 0^+} \left| x \sin \frac{1}{x} \right| = 0$$
whence $f'(0) = 0$. Since $f'(x)$ oscillates wildly (with amplitude approaching 1 as $x \to 0$), it follows that $\lim_{x \to 0} f'(x)$ does not exist (in particular, it does not equal $f'(0) = 0$) and hence $f'$ is discontinuous at x = 0. In particular, note that $f'$ exists everywhere and the discontinuity at x = 0 is not a jump discontinuity.
Example 32.2. An even more bizarre function can be constructed by modifying the preceding example. Consider the function (see Figure 2)
$$g(x) = \begin{cases} x^{3/2} \sin \frac{1}{x} & x > 0 \\ 0 & x \le 0. \end{cases}$$
By reasoning similar to that of the preceding example, we see that $g'(0) = 0$. However, the standard derivative formulas from Calculus I tell us that
$$g'(x) = \frac{3}{2} \sqrt{x} \sin \frac{1}{x} - \frac{1}{\sqrt{x}} \cos \frac{1}{x}$$
for x > 0. Thus g is differentiable at x = 0 but $g'$ oscillates with increasing frequency and unbounded amplitude as x approaches zero (see Figure 1). In particular, $g'$ exists everywhere but is discontinuous at 0 in an extreme way.
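The unbounded oscillation of the derivative can be observed by sampling it at well-chosen points. This sketch evaluates the formula for $g'(x)$ given above at the (assumed) sample points $x_n = \frac{1}{2\pi n}$, where the sine term vanishes and the cosine term equals 1, so that $g'(x_n) = -\sqrt{2\pi n}$ up to rounding. The function names are illustrative.

```python
import math


def g_prime(x):
    """Derivative of x**1.5 * sin(1/x) for x > 0."""
    if x <= 0:
        raise ValueError("this formula is only valid for x > 0")
    return 1.5 * math.sqrt(x) * math.sin(1.0 / x) - math.cos(1.0 / x) / math.sqrt(x)


def sample_point(n):
    """Point 1/(2*pi*n) at which cos(1/x) = 1 and sin(1/x) = 0."""
    return 1.0 / (2.0 * math.pi * n)
```

As n grows the sample points approach 0 while the derivative values grow like $\sqrt{2\pi n}$ in absolute value, so $g'$ is unbounded on every interval $(0, \delta)$.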
[Figure: graph of $\frac{3}{2}\sqrt{x}\sin(1/x) - \frac{1}{\sqrt{x}}\cos(1/x)$.]
Example 32.3. This example should dispel the common misconception that if $f'(x_0) > 0$, then f must be increasing in some neighborhood of $x_0$. Using similar reasoning to the preceding examples, one can show that the function (see Figure 5)
$$h(x) = \begin{cases} x + 2x^2 \sin \frac{1}{x} & x \ne 0 \\ 0 & x = 0 \end{cases}$$
satisfies $h'(0) = 1 > 0$ and hence one might naively assume that h is increasing in some small neighborhood of 0 (this is not guaranteed by any theorem; read your calculus text more closely). This turns out to be false. Indeed, the derivative of h is given by
$$h'(x) = \begin{cases} 1 + 4x \sin \frac{1}{x} - 2 \cos \frac{1}{x} & x \ne 0 \\ 1 & x = 0 \end{cases}$$
which oscillates between positive and negative values infinitely often as x approaches 0. Thus h is not increasing on any open interval that contains 0.
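The sign changes of the derivative can be located explicitly. In this sketch (function name hypothetical) the formula for $h'(x)$ above is evaluated at $x = \frac{1}{2\pi n}$, where $\cos\frac{1}{x} = 1$ and the derivative is approximately $-1$, and at $x = \frac{1}{(2n+1)\pi}$, where $\cos\frac{1}{x} = -1$ and the derivative is approximately 3.

```python
import math


def h_prime(x):
    """Derivative of h(x) = x + 2*x**2*sin(1/x), with h'(0) = 1."""
    if x == 0:
        return 1.0
    return 1.0 + 4.0 * x * math.sin(1.0 / x) - 2.0 * math.cos(1.0 / x)


def negative_point(n):
    """Point where cos(1/x) = 1, so h'(x) is close to -1."""
    return 1.0 / (2.0 * math.pi * n)


def positive_point(n):
    """Point where cos(1/x) = -1, so h'(x) is close to 3."""
    return 1.0 / ((2 * n + 1) * math.pi)
```

Both families of points accumulate at 0, so every open interval containing 0 contains points where $h' < 0$ and points where $h' > 0$.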
Example 32.4. Another common misconception is that if a function has a local minimum or maximum at a point, then the derivative of that function must undergo a simple change of sign at that point. Consider the function (see Figure 6)
$$k(x) = \begin{cases} x^4 (2 + \sin \frac{1}{x}) & x \ne 0 \\ 0 & x = 0. \end{cases}$$
In particular, we note that k attains its absolute minimum at k(0) = 0. Using the definition of the derivative, we can see that
$$k'(x) = \begin{cases} 4x^3 (2 + \sin \frac{1}{x}) - x^2 \cos \frac{1}{x} & x \ne 0 \\ 0 & x = 0. \end{cases}$$
In particular, $k'(0) = 0$ as expected. However, the formula for $k'$ shows that $k'(x)$ assumes both positive and negative values in any neighborhood of 0 and hence k(x) is not monotonic on any interval $(0, \epsilon)$ or $(-\epsilon, 0)$ for any $\epsilon > 0$.
LECTURE 33
Uniform Convergence
33.1. Pointwise Convergence
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges pointwise to a function $f : A \to B$ if
$$\lim_{n \to \infty} f_n(a) = f(a)$$
for each $a \in A$.
In other words, a sequence $f_n$ converges pointwise to f if and only if it converges point-by-point. Unfortunately, pointwise convergence is of limited use since it does not respect continuity. Consider the following example:
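Example 33.1. The functions $f_n : [0, 1] \to \mathbb{R}$ defined by $f_n(x) = x^n$ (see Figure 1) are each continuous on [0, 1], yet their pointwise limit
$$\lim_{n \to \infty} f_n(x) = \begin{cases} 0 & \text{if } 0 \le x < 1 \\ 1 & \text{if } x = 1 \end{cases}$$
is not continuous at x = 1. On the other hand, if $x, y \in [0, 1]$ and $|x - y| < \delta = \epsilon/n$, then
$$|f_n(x) - f_n(y)| = |x - y|\,|x^{n-1} + x^{n-2} y + \cdots + x y^{n-2} + y^{n-1}| \le \underbrace{(1 + 1 + \cdots + 1 + 1)}_{n}\,|x - y| = n|x - y| < \epsilon$$
so that each $f_n$ is uniformly continuous. Thus even the pointwise limit of uniformly continuous functions need not be continuous. Although our $\delta$'s do not depend on x, y, they do seem to depend on n and $\epsilon$.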
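The failure of the limit to be continuous shows up numerically as follows: however large n is, there is still a point of [0, 1) at which $f_n$ has only dropped to 1/2, even though the pointwise limit there is 0. A small sketch (the function name `half_point` is hypothetical):

```python
def half_point(n):
    """Return the x in (0, 1) with x**n = 1/2, namely x = 2**(-1/n)."""
    return 0.5 ** (1.0 / n)
```

The points `half_point(n)` creep toward 1 as n increases, so the graphs of the $f_n$ never get uniformly close to the limit function on all of [0, 1].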
33.2. Uniform Convergence
Since pointwise convergence does not preserve continuity, we need a stronger, more
restrictive notion of convergence. Fortunately, we have already laid some of the groundwork for this in our discussion of normed vector spaces.
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges uniformly to a function $f : A \to B$ if for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that for all $a \in A$
$$n \ge N \implies d_B(f_n(a), f(a)) < \epsilon.$$
The main way to picture uniform convergence (at least in the special case of functions from some closed interval [a, b] to R) is via "tubes" (see Figure 2). Based upon
Proof. This is due to the fact that $|f_n(x) - f(x)| < \epsilon$ for all $x \in [a, b]$ if and only if $\sup\{|f_n(x) - f(x)| : x \in [a, b]\} < \epsilon$ if and only if $d_\infty(f_n, f) < \epsilon$.
An important fact about uniform convergence is that it preserves continuity:
Theorem 33.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. If $f_n : A \to B$ is a sequence of continuous functions which converges uniformly to $f : A \to B$, then f is continuous. In other words: the uniform limit of continuous functions is continuous.
Proof. Suppose that $f_n : A \to B$ converges uniformly to some function $f : A \to B$. We wish to show that the limit function f is continuous on A. It therefore suffices to show that f is continuous at each point $x \in A$. To this end, let $\epsilon > 0$ and let $x \in A$. Since $f_n$ converges to f uniformly on A, there exists $N \in \mathbb{N}$ so that
$$d_B(f_N(y), f(y)) < \frac{\epsilon}{3} \quad \text{for all } y \in A. \tag{33.1}$$
Since $f_N$ is continuous at x, there exists $\delta > 0$ so that
$$d_A(x, y) < \delta \implies d_B(f_N(x), f_N(y)) < \frac{\epsilon}{3}. \tag{33.2}$$
Putting (33.1) and (33.2) together we find that
$$d_B(f(x), f(y)) \le d_B(f(x), f_N(x)) + d_B(f_N(x), f_N(y)) + d_B(f_N(y), f(y)) < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.$$
In other words, given any $\epsilon > 0$ and any $x \in A$, we can find a corresponding $\delta > 0$ so that
$$d_A(x, y) < \delta \implies d_B(f(x), f(y)) < \epsilon.$$
LECTURE 34
Uniform Convergence
34.1. Completeness of C(X)
Definition. Let (X, d) be a compact metric space. C(X) denotes the normed vector space of all continuous functions $f : X \to \mathbb{R}$ endowed with the norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$.
By the Extreme Value Theorem, it follows that $\|f\|_\infty$ is finite for each $f \in C(X)$ (this is why we need X to be compact). Being a normed vector space, C(X) is automatically a metric space when equipped with the associated metric
$$d_\infty(f, g) = \sup_{x \in X} |f(x) - g(x)|.$$
Theorem 34.1. If (X, d) is a compact metric space, then C(X) is a complete metric space.
Proof. Let $f_n$ be a Cauchy sequence in C(X). It follows from the fact that
$$|f_n(x) - f_m(x)| \le \sup_{t \in X} |f_n(t) - f_m(t)| = d_\infty(f_n, f_m)$$
that $f_n(x)$ is Cauchy in R for each $x \in X$. Thus the sequence $f_n$ converges pointwise and we may define a function $f : X \to \mathbb{R}$ by the formula
$$f(x) = \lim_{n \to \infty} f_n(x).$$
Let $\epsilon > 0$ be given. Since $f_n$ is Cauchy in C(X), there exists $N \in \mathbb{N}$ so that
$$n, m \ge N \implies d_\infty(f_n, f_m) < \frac{\epsilon}{2}.$$
Since $f_n$ converges to f pointwise, it follows that for each $x \in X$ there exists an $m(x) \ge N$ so that $|f_{m(x)}(x) - f(x)| < \frac{\epsilon}{2}$. Therefore, if $n \ge N$, then
$$|f_n(x) - f(x)| \le |f_n(x) - f_{m(x)}(x)| + |f_{m(x)}(x) - f(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$
for any $x \in X$. This implies that the sequence $f_n$ converges to f uniformly on X. Since the uniform limit of continuous functions is continuous, it follows that f is continuous (and hence belongs to C(X)). Therefore C(X) is complete.
Another fact (which we shall not prove) is the following:
LECTURE 35
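Weierstrass M-test
35.1. Weierstrass M-Test
Theorem 35.1 (Weierstrass M-Test). Let (X, d) be a metric space and let $f_n : X \to \mathbb{R}$ be a sequence of functions satisfying
$$|f_n(x)| \le M_n$$
for all $x \in X$ and for all $n \in \mathbb{N}$. If $\sum_{n=0}^{\infty} M_n$ converges, then $\sum_{n=0}^{\infty} f_n$ converges uniformly (and absolutely for each $x \in X$). In particular, if each $f_n$ is continuous, then the limit function $f : X \to \mathbb{R}$ is continuous.
Proof. For each $x \in X$, the numerical series $\sum_{n=0}^{\infty} f_n(x)$ converges by comparison with $\sum_{n=0}^{\infty} M_n$. Let f(x) denote its sum. Given $\epsilon > 0$, there exists $N \in \mathbb{N}$ so that $n \ge N$ implies
$$\sum_{j=n+1}^{\infty} M_j < \epsilon.$$
This follows from the fact that the tail end of a convergent series goes to zero. For each $x \in X$ and each $n \ge N$ we now have
$$\left| f(x) - \sum_{j=0}^{n} f_j(x) \right| = \left| \sum_{j=n+1}^{\infty} f_j(x) \right| \le \sum_{j=n+1}^{\infty} |f_j(x)| \le \sum_{j=n+1}^{\infty} M_j < \epsilon.$$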
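The key estimate in the proof, that the gap between two partial sums is controlled by a tail of $\sum M_j$ independently of x, can be tested numerically. The sketch below uses an assumed example not taken from the notes: $f_j(x) = \sin(jx)/j^2$ with $M_j = 1/j^2$, indexed from j = 1 rather than 0. The function names are illustrative.

```python
import math


def partial_sum(x, n):
    """Sum of sin(j*x) / j**2 for j = 1..n."""
    return sum(math.sin(j * x) / j ** 2 for j in range(1, n + 1))


def tail_bound(n, m):
    """Sum of the bounds M_j = 1 / j**2 for j = n+1..m."""
    return sum(1.0 / j ** 2 for j in range(n + 1, m + 1))
```

For every x, the difference `partial_sum(x, m) - partial_sum(x, n)` is bounded in absolute value by `tail_bound(n, m)`, and this bound does not depend on x. That uniformity is the whole point of the test.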
$\sum \|f_n\|_\infty$ converges, then $f =$
The Weierstrass M Test also furnishes a way for producing somewhat bizarre continuous functions. For example, one can show that a sequence of everywhere differentiable
functions can converge uniformly to a nowhere differentiable function.
Example 35.2. Start with a sawtooth $w : \mathbb{R} \to \mathbb{R}$ defined by
\[ w(x) = 1 - |2\langle x \rangle - 1| \]
where $\langle x \rangle$ denotes the fractional part of $x$ (see Figure 1). The Weierstrass nowhere differentiable function is defined by
\[ W(x) = \sum_{n=0}^\infty \left(\tfrac{3}{4}\right)^n w(4^n x). \]
FIGURE 1. Graphs of the sawtooth function.
By the Weierstrass M-Test, one sees that the series $W(x)$ converges uniformly on $[0, 1]$. In
particular $W(x)$ is continuous on $[0, 1]$ (and hence uniformly continuous).
However, one can show that $W(x)$ does not have a derivative at any point of $[0, 1]$. In
light of how spiky the graphs of the functions $\sum_{n=0}^{3} (\tfrac{3}{4})^n w(4^n x)$ and $\sum_{n=0}^{4} (\tfrac{3}{4})^n w(4^n x)$
are (see Figure 2), it is not surprising that the limit function $W$ is not differentiable anywhere.
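The uniform convergence claimed above can be checked numerically. The following is a minimal sketch, assuming the sawtooth has the form $w(x) = 1 - |2\langle x \rangle - 1|$ (so that $0 \le w \le 1$) and that the series uses the weights $(3/4)^n$; the helper names `w`, `partial_sum` and `tail_bound` are ours. It compares a partial sum against a much longer one on a grid and checks that the gap never exceeds the M-Test tail $\sum_{j>n} (3/4)^j = 4(3/4)^{n+1}$.

```python
import math


def w(x: float) -> float:
    """Sawtooth w(x) = 1 - |2<x> - 1|, where <x> is the fractional part (assumed form)."""
    frac = x - math.floor(x)
    return 1.0 - abs(2.0 * frac - 1.0)


def partial_sum(x: float, n: int) -> float:
    """Partial sum  sum_{j=0}^{n} (3/4)^j w(4^j x)  of the Weierstrass series."""
    return sum((0.75 ** j) * w((4.0 ** j) * x) for j in range(n + 1))


def tail_bound(n: int) -> float:
    """M-Test tail: sum_{j>n} (3/4)^j = 4 (3/4)^(n+1), valid because 0 <= w <= 1."""
    return 4.0 * 0.75 ** (n + 1)


if __name__ == "__main__":
    grid = [k / 1000 for k in range(1001)]
    for n in (3, 4, 8):
        # Use the 25th partial sum as a stand-in for W itself.
        gap = max(abs(partial_sum(x, 25) - partial_sum(x, n)) for x in grid)
        print(f"n={n}: sup-gap on grid = {gap:.6f}, tail bound = {tail_bound(n):.6f}")
```

The bound is independent of $x$, which is exactly what uniform convergence requires; the grid only samples the supremum and proves nothing by itself.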
FIGURE 2. Graphs of the fourth and fifth partial sums of the Weierstrass series.
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \tag{35.1} \]
Setting g(x) = x on [a, b] yields the standard version of the Mean Value Theorem.
Proof. Define an auxiliary function
\[ h(x) = f(x) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g(x) \]
and observe that
\[ h(a) = h(b) = \frac{f(a)g(b) - f(b)g(a)}{g(b) - g(a)} \]
(this is a straightforward, but slightly tedious computation). Since $h$ is continuous on $[a, b]$
and differentiable on $(a, b)$, it follows from Rolle's Theorem that there exists $c \in (a, b)$
such that $h'(c) = 0$. In other words,
\[ 0 = f'(c) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g'(c). \]
The preceding equation immediately implies (35.1).
1 Observe that the denominator $g(b) - g(a)$ is nonzero. Indeed, if $g(a) = g(b)$, then by Rolle's Theorem
there exists $x_0 \in (a, b)$ such that $g'(x_0) = 0$. This contradicts the hypothesis of the theorem.
LECTURE 36
L'Hôpital's Rule
and Taylor's Theorem
36.1. L'Hôpital's Rule
An important consequence of Cauchy's Mean Value Theorem is the following:
Theorem 36.1 (L'Hôpital's Rule). If
(i) $f, g$ are differentiable on $(a, b)$,
(ii) $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = 0$,
(iii) $g(x) \ne 0$ and $g'(x) \ne 0$ for all $x \in (a, b)$,
(iv) $f'(x)/g'(x)$ tends to a finite limit $L$ as $x \to a^+$,
then
\[ \lim_{x \to a^+} \frac{f(x)}{g(x)} = \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L. \tag{36.1} \]
Similar statements hold in the cases where $x \to \pm\infty$ and/or $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \pm\infty$.
Proof. Observe that (ii) ensures that $f$ and $g$ extend continuously to $[a, b)$
by setting
\[ f(a) = g(a) = 0. \]
Let $x_n$ be a sequence in $(a, b)$ tending to $a$. By Cauchy's Mean Value Theorem, there exists
a sequence $c_n$ such that $a < c_n < x_n$ for all $n \in \mathbb{N}$ and such that
\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n) - f(a)}{g(x_n) - g(a)}. \]
Since $f(a) = g(a) = 0$, it follows that
\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n)}{g(x_n)} \]
for all $n \in \mathbb{N}$. As $x_n \to a^+$, it follows from the Squeeze Theorem that $c_n \to a^+$ whence
\[ \lim_{n \to \infty} \frac{f(x_n)}{g(x_n)} = \lim_{n \to \infty} \frac{f'(c_n)}{g'(c_n)} = L. \]
By (iv), the limit $L$ is independent of the sequence $c_n$. In particular, the preceding holds for
every sequence $x_n$ in $(a, b)$ tending to $a$, from which the desired result (36.1) follows.
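Example 36.1. Condition (iv) is essential. Consider the functions
\[ f(x) = x + \sin x, \qquad g(x) = x. \]
Clearly
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} (1 + \cos x), \]
which does not exist, whereas
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{x + \sin x}{x} = 1 + \lim_{x \to \infty} \frac{\sin x}{x} = 1 + 0 = 1. \]
Similarly, for the functions $f(x) = x + \cos x \sin x$ and $g(x) = e^{\sin x}(x + \cos x \sin x)$, whose derivative $g'$ vanishes at points arbitrarily far out, a formal computation gives
\begin{align*}
\lim_{x \to \infty} \frac{f'(x)}{g'(x)}
&= \lim_{x \to \infty} \frac{\frac{d}{dx}(x + \cos x \sin x)}{\frac{d}{dx}\, e^{\sin x}(x + \cos x \sin x)} \\
&= \lim_{x \to \infty} \frac{2\cos^2 x}{2e^{\sin x}\cos^2 x + e^{\sin x}\cos x\,(x + \cos x \sin x)} \\
&= \lim_{x \to \infty} \frac{2\cos x}{e^{\sin x}(x + \sin x \cos x + 2\cos x)} \\
&= 0,
\end{align*}
since the last quotient is bounded in absolute value by $2e/(x - 3)$ for $x > 3$, whereas
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{1}{e^{\sin x}} \]
does not exist. An incorrect application of L'Hôpital's rule in this instance leads to the
wrong answer.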
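A quick numerical look at the pair $f(x) = x + \sin x$, $g(x) = x$ makes the failure concrete. This is only an illustrative sketch (the function names `ratio` and `derivative_ratio` are ours): the quotient $f/g$ settles down to 1, while $f'/g' = 1 + \cos x$ keeps returning to both 2 and 0 along suitable sequences tending to infinity.

```python
import math


def ratio(x: float) -> float:
    """f(x)/g(x) for f(x) = x + sin x and g(x) = x."""
    return (x + math.sin(x)) / x


def derivative_ratio(x: float) -> float:
    """f'(x)/g'(x) = 1 + cos x for the same pair of functions."""
    return 1.0 + math.cos(x)


if __name__ == "__main__":
    for k in range(1, 7):
        x = 10.0 ** k
        print(f"x = 1e{k}: f/g = {ratio(x):.8f}")
    for k in (10, 100, 1000):
        even = 2 * k * math.pi        # cos = 1 here, so f'/g' is (nearly) 2
        odd = (2 * k + 1) * math.pi   # cos = -1 here, so f'/g' is (nearly) 0
        print(f"k = {k}: f'/g' = {derivative_ratio(even):.6f} and {derivative_ratio(odd):.6f}")
```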
36.2. Taylor's Theorem
An important generalization of the Mean Value Theorem is Taylor's Theorem:
Theorem 36.2 (Taylor's Theorem). Let $n \ge 0$ and $f : [a, b] \to \mathbb{R}$. If
(i) $f, f', \ldots, f^{(n)}$ are continuous on $(a, b)$,
(ii) $f^{(n+1)}$ exists on $(a, b)$,
(iii) $x, x_0 \in (a, b)$,
then there exists $\xi$ strictly between $x$ and $x_0$ such that
\[ f(x) = \underbrace{f(x_0) + f'(x_0)(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n}_{P_n(x)} + \underbrace{\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}}_{R_n(x)}. \]
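Proof. Fix $x \ne x_0$ and define the number $r_n$ by
\[ f(x) = P_n(x) + \frac{r_n}{(n+1)!}(x - x_0)^{n+1} \]
(for this specific value of $x$; we are certainly not asserting that $f$ is a polynomial of degree
$n + 1$). We wish to show that $r_n = f^{(n+1)}(\xi)$ for some $\xi$ lying strictly between $x$ and $x_0$.
Define the auxiliary function
\[ F(t) = \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!}(x - t)^k + \frac{r_n}{(n+1)!}(x - t)^{n+1}. \]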
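The remaining steps can be sketched as follows. This is the standard way to finish such an argument and not necessarily the author's wording; we assume that $r_n$ is the number defined by $f(x) = P_n(x) + \frac{r_n}{(n+1)!}(x - x_0)^{n+1}$ and that $F(t) = \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!}(x-t)^k + \frac{r_n}{(n+1)!}(x-t)^{n+1}$.

```latex
\begin{align*}
F(x) &= f(x), \qquad
F(x_0) = P_n(x) + \frac{r_n}{(n+1)!}(x - x_0)^{n+1} = f(x), \\
F'(t) &= \sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!}(x - t)^{k}
       - \sum_{k=1}^{n} \frac{f^{(k)}(t)}{(k-1)!}(x - t)^{k-1}
       - \frac{r_n}{n!}(x - t)^{n} \\
      &= \frac{f^{(n+1)}(t) - r_n}{n!}\,(x - t)^{n}.
\end{align*}
```

Since $F(x) = F(x_0)$, Rolle's Theorem yields a point $\xi$ strictly between $x$ and $x_0$ with $F'(\xi) = 0$. Because $\xi \ne x$, the factor $(x - \xi)^n$ is nonzero, and so $r_n = f^{(n+1)}(\xi)$.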
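The expression
\[ P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]
is called the $n$th Taylor polynomial of $f$ centered at $x_0$, and
\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1} \]
is the corresponding remainder.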
LECTURE 37
Taylor Series
Theorem 37.1 (Taylor's Inequality). If $|f^{(n+1)}(x)| \le M$ for $|x - x_0| < r$, then the
remainder $R_n(x)$ of the Taylor series satisfies
\[ |R_n(x)| \le \frac{M}{(n+1)!}\,|x - x_0|^{n+1} \]
for $|x - x_0| < r$.
Proof. This follows from Taylor's Theorem and the fact that $|f^{(n+1)}(\xi)| \le M$ for all $\xi$
between $x$ and $x_0$.
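Taylor's Inequality is easy to test on a concrete function. The sketch below uses $f(x) = e^x$ centered at $x_0 = 0$ on $|x| < r$ with $r = 1$, where every derivative is $e^x$ and so $M = e^r$ works for every $n$. The choice of function, radius, and the names `taylor_poly_exp` and `taylor_bound` are ours, made only for illustration.

```python
import math


def taylor_poly_exp(x: float, n: int) -> float:
    """n-th Taylor polynomial of exp centered at 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def taylor_bound(x: float, n: int, r: float = 1.0) -> float:
    """Right-hand side of Taylor's Inequality with M = e^r (valid for |x| < r)."""
    big_m = math.exp(r)
    return big_m * abs(x) ** (n + 1) / math.factorial(n + 1)


if __name__ == "__main__":
    for x in (-0.9, 0.3, 0.9):
        for n in (1, 3, 6):
            remainder = abs(math.exp(x) - taylor_poly_exp(x, n))
            print(f"x={x:+.1f}, n={n}: |R_n| = {remainder:.3e} <= {taylor_bound(x, n):.3e}")
```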
Definition. If $f$ is infinitely differentiable at $x_0$, then
\[ \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]
is called the Taylor series for $f$ centered at $x_0$. We do not claim that the series converges
nor that it converges to $f$ on some open interval containing $x_0$.
Theorem 37.2 (Taylor Expansion Theorem). Suppose that
(i) the Taylor series for $f$ centered at $x_0$ converges1 for $|x - x_0| < r$,
(ii) $\lim_{n \to \infty} R_n(x) = 0$ for $|x - x_0| < r$,
then
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]
for $|x - x_0| < r$. In other words, the Taylor series for $f(x)$ centered at $x_0$ converges to the
value $f(x)$.
Proof. Let $|x - x_0| < r$ and recall that $f(x) - P_n(x) = R_n(x)$, where the remainder $R_n(x)$ involves a point $\xi$ strictly between $x$
and $x_0$. Taking absolute values and applying (ii) we find that $\lim_{n \to \infty} |f(x) - P_n(x)| = 0$.
Since $P_n(x)$ is simply the $n$th partial sum of the Taylor series for $f(x)$ centered at $x_0$, the
desired result follows.
To most students of Calculus II, it is surprising that both (i) and (ii) are required for
the conclusion of the theorem to hold. We will consider several bizarre examples shortly.
1 In particular, this implies that $f$ is infinitely differentiable at $x_0$.
\[ C^\infty = \bigcap_{n \in \mathbb{N}} C^n. \]
$f_1(x) = x|x|$ is $C^1$ but not $C^2$
$f_2(x) = |x|^3$ is $C^2$ but not $C^3$
$\vdots$
\[ f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n \]
its center. In particular, condition (ii) in the Taylor Expansion theorem cannot be ignored.
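To be more specific, we claim that the function $f : \mathbb{R} \to \mathbb{R}$
\[ f(x) = \begin{cases} e^{-1/x^2} & x \ne 0, \\ 0 & x = 0 \end{cases} \]
is $C^\infty$ but not $C^\omega$ (see Figure 1). That is, we claim that $f$ is infinitely differentiable and that
\[ f^{(n)}(0) = 0, \qquad n = 0, 1, 2, \ldots. \]
FIGURE 1. Graph of $f$.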
In other words, we are claiming that the Taylor series of $f(x)$ centered at $x_0 = 0$ is the
zero function. In particular, $f$ is an example of an infinitely differentiable function which
does not equal its own Taylor series on any open interval containing 0. This shows that $C^\omega$
is a proper subset of $C^\infty$.
By the standard differentiation formulas, it follows that $f$ is infinitely differentiable
at $x_0$ as long as $x_0 \ne 0$. It therefore remains to show that $f^{(n)}(0)$ exists for each $n =
0, 1, 2, \ldots$ (in particular, we will show that $f^{(n)}(0) = 0$). We do this by induction.
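BASE CASE: Since $f^{(0)}(0) = f(0) = 0$ by definition, the base case is trivial.
INDUCTIVE STEP: Suppose that we have already shown that $f^{(n)}(0) = 0$.
For $x \ne 0$, we see that
\[ f'(x) = \frac{2}{x^3}\, e^{-1/x^2}, \qquad f''(x) = \left( \frac{4}{x^6} - \frac{6}{x^4} \right) e^{-1/x^2}, \qquad \ldots \]
and, more generally, we observe that $f^{(n)}(x)$ is a polynomial $p_n$ in $1/x$ times $e^{-1/x^2}$:
\[ f^{(n)}(x) = p_n(1/x)\, e^{-1/x^2}, \qquad x \ne 0. \]
Therefore
\[ f^{(n+1)}(0) = \lim_{x \to 0} \frac{f^{(n)}(x) - f^{(n)}(0)}{x - 0} = \lim_{x \to 0} \frac{1}{x}\, p_n(1/x)\, e^{-1/x^2} = 0 \]
since $e^{-t^2}$ tends to zero faster than any polynomial in $t = 1/x$ can blow up. Thus $f^{(n)}(0) = 0$ for
all $n = 0, 1, 2, \ldots$ as claimed. In particular, the Taylor series for $f(x)$ at $x_0 = 0$ is the zero
function!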
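The key limit in the inductive step, namely that $e^{-1/x^2}$ beats every power of $1/x$ as $x \to 0$, can be observed numerically. The sketch below is only an illustration (floating-point evaluation proves nothing, and the helper name `scaled` is ours): it prints $f(x)/x^k$ for a few exponents $k$ as $x$ shrinks.

```python
import math


def f(x: float) -> float:
    """The flat function: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0


def scaled(x: float, k: int) -> float:
    """f(x) / x^k, which should tend to 0 as x -> 0 for every fixed k."""
    return f(x) / x ** k


if __name__ == "__main__":
    for k in (1, 5, 10):
        values = [scaled(x, k) for x in (0.5, 0.2, 0.1, 0.05)]
        print(f"k={k}: " + ", ".join(f"{v:.3e}" for v in values))
```

Even after dividing by $x^{10}$, the values collapse toward zero, which is consistent with every derivative of $f$ vanishing at the origin.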
Definition. Let $f : \mathbb{R} \to \mathbb{R}$ be a function. The support of $f$ is the set
\[ \operatorname{supp}(f) = \overline{\{x \in \mathbb{R} : f(x) \ne 0\}}. \]
In other words, $\operatorname{supp}(f)$ is the closure of the set upon which $f$ does not vanish. In particular, $\operatorname{supp}(f)$ is always closed.
Example 37.2. In this example, we construct a $C^\infty$ function with compact support. Consider the
bump function
\[ f(x) = \begin{cases} e^{-1/(1 - x^2)} & \text{if } |x| < 1, \\ 0 & \text{otherwise}. \end{cases} \]
An argument similar to that used in Example 37.1 shows that $f \in C^\infty(\mathbb{R})$. See Figure 2.
We say that $f$ has compact support since $\operatorname{supp}(f) = [-1, 1]$ is compact. Keep in mind that
\[ g(x) \begin{cases} = 0 & \text{if } x \le -1, \\ \in (0, 1) & \text{if } -1 < x < 1, \\ = 1 & \text{if } x \ge 1. \end{cases} \]
Thus we have a $C^\infty$ ramp function.
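One common way to obtain a ramp with these three properties is to integrate the bump function and normalize, $g(x) = \int_{-1}^{x} f(t)\,dt \big/ \int_{-1}^{1} f(t)\,dt$. That formula is our assumption for the sketch below, not a definition taken from these notes, and the integral is only approximated with a composite trapezoid rule.

```python
import math


def bump(x: float) -> float:
    """Bump function exp(-1/(1 - x^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def integrate(func, a: float, b: float, steps: int = 20000) -> float:
    """Composite trapezoid rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


TOTAL = integrate(bump, -1.0, 1.0)  # normalizing constant


def ramp(x: float) -> float:
    """Assumed ramp: normalized integral of the bump from -1 to x."""
    if x <= -1:
        return 0.0
    if x >= 1:
        return 1.0
    return integrate(bump, -1.0, x) / TOTAL


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"g({x:+.1f}) = {ramp(x):.6f}")
```

Because the bump is even, the ramp built this way passes through $1/2$ at $x = 0$.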
The following theorem shows that condition (i) in the Taylor Expansion Theorem
cannot be ignored:
Theorem 37.3. For each sequence $a_0, a_1, \ldots$ of real numbers, there exists a function $f \in
C^\infty(\mathbb{R})$ such that
\[ \frac{f^{(n)}(0)}{n!} = a_n \]
for all $n \in \mathbb{N}$. In particular, for each prospective Taylor series $\sum_{n=0}^\infty a_n x^n$, there exists
a function $f$ whose Taylor coefficients are precisely $a_n$. Moreover, the choice $a_n = n^n$
yields a $C^\infty$ function whose Taylor series diverges whenever $x \ne 0$.
Sketch of Pf. Let $\varphi$ be a $C^\infty$ function supported in $[-2, 2]$ and such that $\varphi(x) = 1$ if $x \in [-1, 1]$
(it takes some work to justify the existence of such a function). In particular, observe that
$\varphi^{(n)}(0) = 0$ for $n = 1, 2, 3, \ldots$ since $\varphi$ is constant on $[-1, 1]$. Now let
\[ f_n(x) = a_n x^n \varphi(\lambda_n x) \]
where $\lambda_n$ is a sequence of positive numbers to be defined later. Now observe that
\[ f_n^{(j)}(0) = \begin{cases} n!\, a_n & \text{if } j = n, \\ 0 & \text{if } j \ne n. \end{cases} \]
LECTURE 38
Consider the initial value problem
\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{38.1} \]
where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function
of $x, y$. Many standard problems in pure and applied mathematics are of this form.
Theorem 38.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood
of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a
unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the
initial value problem (38.1) and for which $|y(x) - y_0| < \epsilon$ for all $x \in I$.
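Proof. By the hypotheses of the theorem, there exists a closed rectangle $R$ (a compact set)
centered at $(x_0, y_0)$ and constants $M_0, M_1 > 0$ such that
\[ |F(x, y)| \le M_0, \qquad (x, y) \in R, \]
\[ \left| \frac{\partial F}{\partial y}(x, y) \right| \le M_1, \qquad (x, y) \in R. \]
Applying the Mean Value Theorem to $F$ with respect to $y$, we obtain
\[ |F(x, y_1) - F(x, y_2)| \le M_1 |y_1 - y_2|, \qquad (x, y_1), (x, y_2) \in R. \]
By considering an even smaller rectangle, we may also presume that $R$ has width $2\delta$ where
$\delta > 0$ is sufficiently small so that
\[ \delta M_0 \le \epsilon, \qquad \delta M_1 < 1. \]
Let $I = [x_0 - \delta, x_0 + \delta]$ and let $E$ denote the closed $\epsilon$-ball in $C(I)$ centered at the
constant function $y_0$. In other words,
\[ E = \{ f \in C(I) : \|f - y_0\|_\infty \le \epsilon \}. \]
Since $E$ is a closed subset of the complete metric space $(C(I), d_\infty)$, it follows that $(E, d_\infty)$
is itself a complete metric space.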
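Define the integral operator $T$ on $E$ by
\[ [T\varphi](x) = y_0 + \int_{x_0}^{x} F(t, \varphi(t))\,dt, \qquad \varphi \in E. \]
We claim that $T(E) \subseteq E$. In other words, $T$ maps $E$ into itself and can be regarded as a
function $T : E \to E$. Indeed, let $\varphi \in E$ (i.e., $\|\varphi - y_0\|_\infty \le \epsilon$). For each $x \in I$ it follows
that
\[ |[T\varphi](x) - y_0| = \left| \int_{x_0}^{x} F(t, \varphi(t))\,dt \right| \le M_0 |x - x_0| \le M_0 \delta \le \epsilon. \]
Thus $\|T\varphi - y_0\|_\infty \le \epsilon$ and $T\varphi \in E$.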
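Having established that $T$ maps $E$ into $E$, we now show that $T : E \to E$ is a strict
uniform contraction. In fact, we will show that the contraction constant is
$\delta M_1 < 1$.
Let $\varphi_1, \varphi_2 \in E$ and note that
\begin{align*}
\|T\varphi_1 - T\varphi_2\|_\infty
&= \sup_{x \in I} \left| \int_{x_0}^{x} F(t, \varphi_1(t))\,dt - \int_{x_0}^{x} F(t, \varphi_2(t))\,dt \right| \\
&\le \sup_{x \in I} \left| \int_{x_0}^{x} |F(t, \varphi_1(t)) - F(t, \varphi_2(t))|\,dt \right| \\
&\le \sup_{x \in I} M_1 \left| \int_{x_0}^{x} |\varphi_1(t) - \varphi_2(t)|\,dt \right| \\
&\le M_1 \|\varphi_1 - \varphi_2\|_\infty \sup_{x \in I} \left| \int_{x_0}^{x} dt \right| \\
&= \delta M_1 \|\varphi_1 - \varphi_2\|_\infty.
\end{align*}
By the Contraction Mapping Principle, $T$ has a unique fixed point $y \in E$, that is,
\[ y(x) = y_0 + \int_{x_0}^{x} F(t, y(t))\,dt, \qquad x \in I. \tag{38.2} \]
Setting $x = x_0$ shows that
\[ y(x_0) = y_0. \]
Moreover, taking the derivative of (38.2) and using the Fundamental Theorem of Calculus
it follows that $y' = F(x, y)$ for all $x \in I$. In other words, our fixed point $y \in E$ is a
solution to the initial value problem (38.1). Moreover, it is the only solution in $E$.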
LECTURE 39
Picard Iteration
39.1. Initial Value Problems
Consider the initial value problem
\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{39.1} \]
where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function
of $x, y$. Many standard problems in pure and applied mathematics are of this form.
Theorem 39.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a
unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the
initial value problem (39.1) and for which $|y(x) - y_0| < \epsilon$ for all $x \in I$.
Let us recall a few things about the method of proof of the Existence and Uniqueness
Theorem. First, we obtained the $\delta > 0$ at the beginning of the proof. Recall that the size
of $\delta$ (and consequently the length of the interval upon which we expect a solution to (39.1)
to exist) was determined by the behavior of $F$ and $\frac{\partial F}{\partial y}(x, y)$. Once $\delta > 0$ was determined,
we defined $I = [x_0 - \delta, x_0 + \delta]$ and defined $E$ to be the closed $\epsilon$-ball in $C(I)$ centered at
the constant function $y_0$.
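Next, we noticed that $y(x)$ is a solution to the initial value problem (39.1) if and only
if
\[ y(x) - y(x_0) = \int_{x_0}^{x} y'(t)\,dt = \int_{x_0}^{x} F(t, y(t))\,dt. \]
Since $y(x_0) = y_0$,
it follows that $y \in C(I)$ is a solution to (39.1) if and only if $y$ is a fixed point of the integral
operator
\[ [T\varphi](x) = y_0 + \int_{x_0}^{x} F(t, \varphi(t))\,dt. \]
Consider, for example, the initial value problem
\[ y' = 2x(1 + y), \qquad y(0) = 0. \]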
Since the equation is both separable and linear, it is easy to solve (assuming you have taken
an elementary course in differential equations):
\[ y(x) = e^{x^2} - 1. \]
In fact, it is easy to check that the above is indeed a solution to the initial value problem.
Let us see the Contraction Mapping Principle in action. Here $x_0 = y_0 = 0$ and
$F(x, y) = 2x(1 + y)$. The corresponding initial value problem can be rewritten as the
integral equation:
\[ \varphi(x) = \int_{0}^{x} 2t[1 + \varphi(t)]\,dt. \]
Starting from an initial approximation $\varphi_0$, define $\varphi_{n+1} = T\varphi_n$.
The sequence $\varphi_n$ should approach (with respect to $d_\infty$) the actual solution to our initial
value problem (at least on some neighborhood of $x_0 = 0$). This method is known as Picard
iteration.
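With the initial approximation $\varphi_0(x) = y_0 = 0$, it follows that
\begin{align*}
\varphi_1(x) &= [T\varphi_0](x) \\
&= \int_{0}^{x} 2t[1 + 0]\,dt \\
&= \int_{0}^{x} 2t\,dt \\
&= x^2.
\end{align*}
Similarly,
\begin{align*}
\varphi_2(x) &= [T\varphi_1](x) \\
&= \int_{0}^{x} 2t[1 + \varphi_1(t)]\,dt \\
&= \int_{0}^{x} 2t[1 + t^2]\,dt \\
&= \int_{0}^{x} (2t + 2t^3)\,dt \\
&= x^2 + \frac{x^4}{2}.
\end{align*}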
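Computing again
\begin{align*}
\varphi_3(x) &= [T\varphi_2](x) \\
&= \int_{0}^{x} 2t[1 + \varphi_2(t)]\,dt \\
&= \int_{0}^{x} 2t\left[ 1 + t^2 + \frac{t^4}{2} \right] dt \\
&= \int_{0}^{x} (2t + 2t^3 + t^5)\,dt \\
&= x^2 + \frac{x^4}{2} + \frac{x^6}{6}.
\end{align*}
Continuing in this manner, we find that
\[ \varphi_n(x) = x^2 + \frac{x^4}{2!} + \cdots + \frac{x^{2n}}{n!}, \]
so that
\[ \lim_{n \to \infty} \varphi_n(x) = \sum_{n=1}^{\infty} \frac{x^{2n}}{n!} = e^{x^2} - 1. \]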
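The iteration above is mechanical enough to automate. The following is a minimal sketch for this particular problem only, $F(x, y) = 2x(1 + y)$ with $x_0 = y_0 = 0$: polynomials are stored as lists of exact rational coefficients (index $k$ holds the coefficient of $x^k$), and the names `picard_step` and `picard` are ours.

```python
from fractions import Fraction


def picard_step(phi):
    """Apply T to a polynomial phi: [T phi](x) = integral from 0 to x of 2t(1 + phi(t)) dt."""
    one_plus = [Fraction(c) for c in phi] or [Fraction(0)]
    one_plus[0] += 1                                   # 1 + phi(t)
    # Multiplying by 2t shifts every coefficient up one degree and doubles it.
    integrand = [Fraction(0)] + [2 * c for c in one_plus]
    # Integrate term by term from 0 to x: c t^k becomes c x^(k+1) / (k+1).
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]


def picard(n):
    """Return the coefficient list of phi_n, starting from phi_0 = 0."""
    phi = [Fraction(0)]
    for _ in range(n):
        phi = picard_step(phi)
    return phi


if __name__ == "__main__":
    for n in range(1, 5):
        terms = [f"{c} x^{k}" for k, c in enumerate(picard(n)) if c != 0]
        print(f"phi_{n}(x) = " + " + ".join(terms))
```

The nonzero coefficients produced are $1, 1/2, 1/6, 1/24, \ldots$, matching the partial sums of $\sum_{n \ge 1} x^{2n}/n!$.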
APPENDIX A
Basic Logic
A.1. Primitive Concepts
To do meaningful mathematics one needs to start out with various primitive concepts. There are many things that we cannot adequately define without some form of
self-reference. For example, try to define the following without referring to other concepts
that require further definitions:
Idea
Statement
True, false
Sets, objects
Everything, nothing
There are a host of words that we use every day that we simply cannot define without
reference to other, equally hard-to-define concepts. You might say, "a set is a collection of
objects." But what is a collection? What are objects? Simply put, to convey information to
someone, you must both already have a common language and several primitive concepts
that both parties understand beforehand.
Another interesting example is that of numbers. What exactly is 2? What is a whole
number, exactly? Can you define it? Of course, one might just say that this is silly: we all
know what numbers are, don't we? It turns out, however, that some languages only have
words for "one" and "many" but no words to express the concept of "two," "three," etc.
There are certain ideas (such as sets, true, false, etc.) which mathematicians use freely,
without worrying about any of the philosophical difficulties involved. On the other hand,
many philosophers are not satisfied with this situation and seek further to clarify the meaning of some of these words (in analytic philosophy). In keeping with our main theme
(learning about real analysis), we will not be overly picky with the philosophical details.
The terms sentence and statement will be used interchangeably in these notes to refer
to an expression that is well-formed in the rules of the language in which it is written. This
brings up the ideas of languages and of what exactly constitutes meaning (these are issues
that are discussed in the realms of computer science and philosophy). There are expressions like "i(&*#dfs9[{" and "at the the up" that have no meaningful interpretation
in the language in which they are written and we will not consider these to be sentences.
Sentences are classified according to their truth value:
Example A.1. Some sentences are true. The sentences
"1 + 1 = 2"
and
"There are infinitely many prime numbers"
are true. The fact that the second statement (known as Euclid's theorem) is true is not
obvious; it requires proof.
Example A.2. Some sentences are false. For instance
0>1
and
One can get an A in Math 131 without doing the homework
are false statements.
We also adopt the conventions that a statement cannot be simultaneously true and
false, although a sentence can be neither true nor false.1 A proposition is a statement
which has a definite truth value (it is either true or false). For example, 1 + 1 = 3 is
a proposition (which is false). Of course, there are many propositions whose exact truth
value is unknown to us. For instance:
(i) There are infinitely many pairs of twin primes.2
(ii) There exists an odd perfect number.3
Nevertheless, either an odd perfect number exists or one does not. The sentence "There
exists an odd perfect number" is a proposition. Unfortunately, we have not
been able to determine its exact truth value at this point in time.
A.2. Negation (NOT)
There are several basic operations which allow us to create new propositions from
old ones. The simplest of these operations is called negation, which simply reverses the
truth value of its argument. The negation $\neg P$ (read "not $P$") of a proposition $P$ is the
proposition
"It is not the case that $P$."
When negating English sentences, one can often write things in a more elegant fashion.
Example A.3. If $P$ is the proposition
"Class meets at 9am,"
then $\neg P$ would be
\[ \underbrace{\text{It is not the case that}}_{\neg} \ \underbrace{\text{class meets at 9am}}_{P}. \]
1 This does not occur often in practice, but it does come up when considering metamathematical issues. We
will consider only one such example in this course.
2 Twin primes are prime numbers like (17 and 19) or (29 and 31) which differ from each other by 2.
3 A natural number $n$ is called perfect if $n$ is equal to the sum of its proper divisors. For instance, 6 =
1 + 2 + 3, so 6 is a perfect number. The next largest perfect number is 28 since 28 = 1 + 2 + 4 + 7 + 14. Can
you find more?
A proposition $P$ and its negation $\neg P$ are related by the following truth table:
$P$ | $\neg P$
T | F
F | T
Moreover, it is not hard to see that the expressions $P$ and $\neg\neg P$ have the same truth table:
$P$ | $\neg P$ | $\neg\neg P$
T | F | T
F | T | F
The importance of this observation is that the roles of $P$ and $\neg\neg P$ can be interchanged in
mathematical arguments. We say that $P$ and $\neg\neg P$ are equivalent and write
\[ P \Longleftrightarrow \neg\neg P. \]
For example, let $P(x)$ denote the proposition
"$x$ is rational." (A.1)
The truth value of $P(x)$ depends, of course, on $x$. Since a real number which is not rational
is called irrational, we can write $\neg\neg P(x)$ as
"$x$ is not irrational." (A.2)
Clearly, (A.1) and (A.2) are saying the same thing in two different ways.
A.3. Conjunction (AND)
If $P$ and $Q$ are propositions, then the new proposition $P \wedge Q$ is interpreted as "$P$ and $Q$,"
just as in English. In other words, the sentence $P \wedge Q$ is true if and only if both statements
$P$ and $Q$ are true. Therefore the truth value of $P \wedge Q$ is related to $P$ and $Q$ via the following
table:
$P$ | $Q$ | $P \wedge Q$
T | T | T
T | F | F
F | T | F
F | F | F
Example A.7. If
$P$ = "It is Thursday"
$Q$ = "It is raining today,"
then $P \wedge Q$ is the proposition
\[ \underbrace{\text{It is Thursday}}_{P} \ \underbrace{\text{and}}_{\wedge} \ \underbrace{\text{it is raining today}}_{Q}. \]
The proposition $P \wedge Q$ is therefore true only on rainy Thursdays (when $P$ and $Q$ are both
true).
4 A rational number is a fraction, a ratio $a/b$ of integers $a$ and $b$, where $b \ne 0$.
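Example A.8. Using truth tables, we can derive the associative law for $\wedge$:
\[ P \wedge (Q \wedge R) \Longleftrightarrow (P \wedge Q) \wedge R. \]
Indeed, we merely need to produce the truth tables for $P \wedge (Q \wedge R)$ and $(P \wedge Q) \wedge R$ and
compare entries. Since these expressions have three propositional variables ($P, Q, R$), our
truth table will have $8 = 2^3$ rows since there are two possibilities for each variable (namely
T or F).
$P$ | $Q$ | $R$ | $P \wedge Q$ | $(P \wedge Q) \wedge R$ | $Q \wedge R$ | $P \wedge (Q \wedge R)$
T | T | T | T | T | T | T
T | T | F | T | F | F | F
T | F | T | F | F | F | F
T | F | F | F | F | F | F
F | T | T | F | F | T | F
F | F | T | F | F | F | F
F | T | F | F | F | F | F
F | F | F | F | F | F | F
Since the truth tables for $P \wedge (Q \wedge R)$ and $(P \wedge Q) \wedge R$ are the same, they are equivalent
statements.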
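Comparing truth tables row by row is exactly the kind of bookkeeping a short program does well. The sketch below (helper names `truth_table` and `equivalent` are ours) enumerates all $2^n$ assignments with `itertools.product` and checks the associative law for $\wedge$.

```python
from itertools import product


def truth_table(expr, num_vars):
    """Evaluate expr on every assignment of True/False to its variables."""
    return [expr(*values) for values in product([True, False], repeat=num_vars)]


def equivalent(expr1, expr2, num_vars):
    """Two sentential forms are equivalent when their truth tables agree."""
    return truth_table(expr1, num_vars) == truth_table(expr2, num_vars)


def left(p, q, r):
    return p and (q and r)      # P and (Q and R)


def right(p, q, r):
    return (p and q) and r      # (P and Q) and R


if __name__ == "__main__":
    for values in product([True, False], repeat=3):
        print(values, left(*values), right(*values))
    print("equivalent:", equivalent(left, right, 3))
```

The same `equivalent` helper can be pointed at any pair of forms in this appendix, for instance the two sides of de Morgan's laws.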
A.4. Disjunction (OR)
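If $P$ and $Q$ are propositions, then $P \vee Q$ is the new proposition
"$P$ or $Q$"
where the word "or" is to be interpreted as an inclusive or (see below). Specifically, the
truth value of the proposition $P \vee Q$ is related to $P$ and $Q$ via the following table:
$P$ | $Q$ | $P \vee Q$
T | T | T
T | F | T
F | T | T
F | F | F
For the propositions $P$ and $Q$ of Example A.7, $P \vee Q$ is the proposition
\[ \underbrace{\text{It is Thursday}}_{P} \ \underbrace{\text{or}}_{\vee} \ \underbrace{\text{it is raining today}}_{Q}. \]
This proposition is false only on sunny days that are not Thursday.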
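\[ \neg(P \wedge Q) \Longleftrightarrow \neg P \vee \neg Q \tag{A.3} \]
\[ \neg(P \vee Q) \Longleftrightarrow \neg P \wedge \neg Q. \tag{A.4} \]
A short computation shows that the expressions $\neg(P \wedge Q)$ and $\neg P \vee \neg Q$ have the same
truth tables, which establishes (A.3):
$P$ | $Q$ | $P \wedge Q$ | $\neg(P \wedge Q)$ | $\neg P$ | $\neg Q$ | $\neg P \vee \neg Q$
T | T | T | F | F | F | F
T | F | F | T | F | T | T
F | T | F | T | T | F | T
F | F | F | T | T | T | T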
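We could do something similar to show that (A.4) is correct, but there is a better way. Since
(A.3) holds for any two propositions $P$ and $Q$, we can insert $\neg P$ and $\neg Q$ in their place to
obtain
\[ \neg(\neg P \wedge \neg Q) \Longleftrightarrow \neg(\neg P) \vee \neg(\neg Q) \Longleftrightarrow P \vee Q. \]
Negating both sides gives
\[ \neg(P \vee Q) \Longleftrightarrow \neg P \wedge \neg Q, \]
which is precisely (A.4).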
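Let
$P(x)$ = "$x$ is prime"
$Q(x)$ = "$x$ is odd."
Thus
$P(x) \wedge Q(x)$ = "$x$ is prime and $x$ is odd"
= "$x$ is an odd prime."
The proposition $P(x) \wedge Q(x)$ is therefore true for $x = 3, 5, 7, 11, 13, \ldots$
and false for all other integers. By (A.3), the first of de Morgan's laws, it follows that the
negation of the proposition "$x$ is an odd prime" is
\begin{align*}
\neg(\text{$x$ is an odd prime})
&\Longleftrightarrow \neg\big( \underbrace{(\text{$x$ is prime})}_{P(x)} \ \underbrace{\text{and}}_{\wedge} \ \underbrace{(\text{$x$ is odd})}_{Q(x)} \big) \\
&\Longleftrightarrow \underbrace{\neg(\text{$x$ is prime})}_{\neg P(x)} \vee \underbrace{\neg(\text{$x$ is odd})}_{\neg Q(x)} \\
&\Longleftrightarrow (\text{$x$ is composite}) \vee (\text{$x$ is even}) \\
&\Longleftrightarrow \text{$x$ is composite or $x$ is even} \\
&\Longleftrightarrow \text{$x$ is composite or even.}
\end{align*}
(Recall that an integer $n$ is composite if it is divisible by a positive integer other than 1 and
$n$).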
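De Morgan's first law can be tested against this example directly. The sketch below checks, for each integer $x$ in a small range of our choosing, that "not ($x$ is prime and $x$ is odd)" and "($x$ is not prime) or ($x$ is not odd)" always agree; the helper names are ours, and the trial-division primality test is only meant for small inputs.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_odd(n: int) -> bool:
    return n % 2 == 1


def negation_two_ways(x: int):
    """Return both sides of de Morgan's first law for P = 'x is prime', Q = 'x is odd'."""
    lhs = not (is_prime(x) and is_odd(x))
    rhs = (not is_prime(x)) or (not is_odd(x))
    return lhs, rhs


if __name__ == "__main__":
    print("odd primes below 30:", [x for x in range(2, 30) if is_prime(x) and is_odd(x)])
    print("law holds on 2..99:", all(l == r for l, r in map(negation_two_ways, range(2, 100))))
```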
A.6. Implication ($P \Rightarrow Q$)
The proposition $\neg P \vee Q$ (called an implication) is read
"If $P$, then $Q$"
or
"$P$ implies $Q$"
and is commonly denoted
$P \Rightarrow Q$
or
$Q \Leftarrow P$.
The proposition $P$ is called the hypothesis of the implication and the proposition $Q$ is
called the conclusion. Be careful with the order since $\neg P \vee Q$ and $\neg(P \vee Q)$ are quite
different expressions. Always remember that $\neg$ takes priority over $\wedge$ and $\vee$.
The truth table
$P$ | $Q$ | $\neg P$ | $\neg P \vee Q$
T | T | F | T
T | F | F | F
F | T | T | T
F | F | T | T
(A.5)
for $\neg P \vee Q$ shows that the only case where $P \Rightarrow Q$ is false is when $P$ is true and $Q$ is
false.
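In some texts, $P \Rightarrow Q$ is defined to be
\[ \neg(P \wedge \neg Q). \]
$P$ | $Q$ | $\neg Q$ | $P \wedge \neg Q$ | $\neg(P \wedge \neg Q)$
T | T | F | F | T
T | F | T | T | F
F | T | F | F | T
F | F | T | F | T
The last column agrees with the last column of (A.5), so the two definitions are equivalent:
\[ (\neg P \vee Q) \Longleftrightarrow \neg(P \wedge \neg Q). \]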
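$P$ | $Q$ | $P \Rightarrow Q$ | $Q \Rightarrow P$ | $P \Leftrightarrow Q$
T | T | T | T | T
T | F | F | T | F
F | T | T | F | F
F | F | T | T | T
$(P \Leftrightarrow Q)$.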
This does not conflict with our earlier usage of $\Longleftrightarrow$. For instance, we wrote de Morgan's
first rule (A.3) as:
\[ \neg(P \wedge Q) \Longleftrightarrow \neg P \vee \neg Q. \tag{A.6} \]
The preceding can itself be thought of as a statement, as opposed to simply relating the
truth values of the two statements $\neg(P \wedge Q)$ and $\neg P \vee \neg Q$. A short computation shows
that the expression (A.6) has the following truth table:
$P$ | $Q$ | $\neg(P \wedge Q)$ | $\neg P \vee \neg Q$ | $\neg(P \wedge Q) \Leftrightarrow \neg P \vee \neg Q$
T | T | F | F | T
T | F | T | T | T
F | T | T | T | T
F | F | T | T | T
In other words, the statement (A.6) is true regardless of the truth value (or meaning) of $P$
and $Q$. Such statements (in more precise terminology, sentential forms) are called tautologies.
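A.9. Contrapositive
The contrapositive of an implication $P \Rightarrow Q$ is defined to be
\[ \neg Q \Rightarrow \neg P. \]
The reason that contrapositives are so important is because they are equivalent to their
original implications:
\[ (P \Rightarrow Q) \Longleftrightarrow (\neg Q \Rightarrow \neg P). \]
$P$ | $Q$ | $P \Rightarrow Q$ | $\neg Q$ | $\neg P$ | $\neg Q \Rightarrow \neg P$ | $(P \Rightarrow Q) \Leftrightarrow (\neg Q \Rightarrow \neg P)$
T | T | T | F | F | T | T
T | F | F | T | F | F | T
F | T | T | F | T | T | T
F | F | T | T | T | T | T
Thus if one wants to prove that the statement $P \Rightarrow Q$ is true, one can prove $\neg Q \Rightarrow \neg P$
instead.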
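A tautology is a form that evaluates to true on every row of its truth table, which is a one-line check in code. In the sketch below, `implies` encodes $P \Rightarrow Q$ as $\neg P \vee Q$, and the helper names (`is_tautology`, `contrapositive_law`, `swapped_law`) are ours. As a contrast, the form obtained by merely swapping hypothesis and conclusion is included; it is not a tautology.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """P => Q, encoded as (not P) or Q."""
    return (not p) or q


def is_tautology(form, num_vars: int) -> bool:
    """True when form evaluates to True on every truth assignment."""
    return all(form(*values) for values in product([True, False], repeat=num_vars))


def contrapositive_law(p, q):
    # (P => Q) <=> (not Q => not P)
    return implies(p, q) == implies(not q, not p)


def swapped_law(p, q):
    # (P => Q) <=> (Q => P): fails when P and Q have different truth values
    return implies(p, q) == implies(q, p)


if __name__ == "__main__":
    print("contrapositive law is a tautology:", is_tautology(contrapositive_law, 2))
    print("swapped law is a tautology:", is_tautology(swapped_law, 2))
```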
Example A.15. A positive integer $x \ge 2$ is called perfect if $x$ equals the sum of its proper
divisors. For instance 6 and 28 are perfect numbers since
6 = 1 + 2 + 3
28 = 1 + 2 + 4 + 7 + 14.
Let $x$ denote a positive integer $\ge 2$ and let
$P(x)$ = "$x$ is perfect"
$Q(x)$ = "$x$ is even."
In plain English, we might say that
$(P(x) \Rightarrow Q(x))$: "If $x$ is perfect, then $x$ is even."
$(\neg Q(x) \Rightarrow \neg P(x))$: "If $x$ is odd, then $x$ is not perfect."
Note that these two propositions mean exactly the same thing, but in different ways. It is
unknown whether an odd perfect number exists (an unsolved problem for over 2000 years),
so the truth value of the propositions above is unknown.
APPENDIX B
This definition is somewhat circular and it underlines one of the obstacles in talking about
sets. One cannot define a set as a "collection" without first knowing what a collection
is. After all, how does one define a collection? We will simply have to accept that the
student understands what is meant by the term "set." We do not have the time to grapple
with the deep philosophical issues that are clearly at hand.
Sets have elements, also known as members. If $A$ is a set, then $x \in A$ stands for the
proposition
"$x$ belongs to $A$"
or
"$x$ is an element of $A$."
For example, $2 \in \{0, 1, 2\}$ is a true proposition. One way to describe a set is by just
writing out its members between the set brackets { and }. The proposition $\neg(x \in A)$,
which translates as
"$x$ is not an element of $A$,"
is usually written $x \notin A$.
Example B.1. According to the definition, we have
\[ 2 \notin \{0, \text{penguin}, \{0, 1, 2\}\}. \]
This example shows a couple of things. First, the elements of a set do not have to be the
same type of thing. Second, a set (namely $\{0, 1, 2\}$) can be an element of another set. If
one thinks of a set as being a box in which objects are placed, then one sees that it is not
unreasonable for a box to contain some items and possibly another box.
Two sets $A$ and $B$ are called equal, written $A = B$, if and only if they have exactly the
same elements. If two sets $A$ and $B$ are not equal, we write $A \ne B$ (which literally means
$\neg(A = B)$). This is the case whenever $A$ contains an object that $B$ does not, or vice versa.
Example B.2. According to our definition of set equality, a set is completely determined
by its members. For instance,
\[ \{\pi, e, e, \pi\} = \{\pi, \pi, \pi, e\} = \{\pi, e\} = \{e, \pi\}. \]
Repetition and order do not matter when listing the members of a set. Also observe that
\[ \{\pi, e, \{e\}\} \ne \{\pi, e\} \]
since $\{e\}$ and $e$ are not the same thing. One way to think about this is that $e$ and a box
containing $e$ are not the same thing.
The set
\[ \emptyset = \{\} \]
is called the empty set. It has no elements; it contains nothing. One can think of it as an
empty box. There is one catch, however, for the empty set is considered to be unique: it
is the only set with no elements.
Example B.3. Using the definition of set equality, we see that
\[ \emptyset \ne \{\emptyset\} \]
since $\emptyset \in \{\emptyset\}$ while $\emptyset \notin \emptyset$ (and therefore $\emptyset$ and $\{\emptyset\}$ do not have exactly the same
elements). Think of it this way: A box with an empty box inside is not the same thing as
an empty box.
Example B.4. The following sets
\[ \emptyset \]
\[ \{\emptyset\} \]
\[ \{\emptyset, \{\emptyset\}\} \]
\[ \{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}\} \]
\[ \vdots \]
are all distinct from one another. In fact, each successive set in our list contains all of the
preceding ones as elements. They are all created from nothing, using only the primitive
notion of sets. You can therefore build quite complicated sets without assuming that actual
objects exist! In fact, if one wants to axiomatize set theory and construct all of mathematics
rigorously from the basic principle of a set, one can take the sequence of sets above as
a starting point for defining the natural numbers.
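This tower of sets can be built literally, starting from the empty set. The sketch below uses Python's immutable `frozenset` (which, unlike `set`, may itself be an element of a set) and forms each new set as $s \cup \{s\}$; the function names `successor` and `build` are ours.

```python
def successor(s: frozenset) -> frozenset:
    """Return s together with s itself as one additional element: s U {s}."""
    return s | frozenset([s])


def build(n: int):
    """Return the list [empty set, {empty}, {empty, {empty}}, ...] with n + 1 entries."""
    sets = [frozenset()]
    for _ in range(n):
        sets.append(successor(sets[-1]))
    return sets


if __name__ == "__main__":
    tower = build(4)
    print("sizes:", [len(s) for s in tower])
    print("all distinct:", len(set(tower)) == len(tower))
    print("each contains its predecessors:",
          all(tower[i] in tower[j] for j in range(len(tower)) for i in range(j)))
```

The $k$th set in the list has exactly $k$ elements, which hints at why this sequence can stand in for the natural numbers.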
Since this is a mathematics course, we will obviously be talking about numbers (of
various sorts) quite often. Some important sets of numbers which we will frequently refer
to are the following:
$\mathbb{P} = \{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, \ldots\}$, the set of prime numbers.
$\mathbb{N} = \{1, 2, 3, \ldots\}$, the set of natural numbers.
$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, the set of integers.
$\mathbb{Q}$, the set of rational numbers (fractions).
$\mathbb{R}$, the set of real numbers.
Although all of the above can be rigorously constructed from the basic axioms of
set theory, we will not do so here. In this course, we make the bold assumption that
$\mathbb{P}$, $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ exist. The real numbers, however, are a different matter altogether. We will
explore the nature and structure of $\mathbb{R}$ shortly. The real numbers, it turns out, are much more
complicated than you might think.
B.2. Using Properties to define Sets
We can use propositions to define sets using the so-called set builder notation. If $P(x)$
is a proposition (whose truth value depends on the variable object $x$), we define
\[ \{x : P(x)\} \]
to be the set of all $x$ such that $P(x)$ is true. We are overlooking a few fine points of logic
here,2 but this definition will be sufficient for most purposes (although we will see how
unrestricted use of the set builder notation can lead to logical paradoxes).
Example B.5. The set $\mathbb{P}$ of all prime numbers can be written as
\[ \{2, 3, 5, 7, 11, 13, \ldots\} = \{x : x \text{ is a prime number}\} = \{y : y \text{ is a prime number}\}. \]
Note that the particular symbol used as a variable is irrelevant. It is a dummy variable as
in calculus: $\int_0^1 f(x)\,dx = \int_0^1 f(y)\,dy$.
Example B.6. When using the set builder notation, we must be careful to use conditions
that are unambiguous. For instance
{x : x is a lucky number}
are not possible to explicitly produce since we do not know the entire decimal expansion
of $\pi$ (for it is an infinite string of digits without any apparent pattern). We know that
certain strings of digits, like 1415 and 535, belong to the set above, but in general it is not
a set that we can grasp in its entirety.
B.3. Russell's Paradox
Having worked with sets a little bit, you might be surprised to learn that our approach
to sets is not logically sound. In fact, it is called naive set theory to distinguish it from
the rigorous axiomatic approach used in formal set theory. A startlingly simple logical
paradox due to Bertrand Russell immediately shows that the basis of this approach to sets
is unsound.
One of the basic principles of naive set theory is the General Comprehension Principle, which we implicitly used above. In the early days of set theory (around 1873–1900),
mathematicians and logicians had always assumed that you can always define a set if you
have a definite property $P(x)$. In other words, given a reasonable statement $P(x)$, the set
of all $x$ for which $P(x)$ is true should exist, logically speaking. Essentially, they assumed
that
\[ \{x : P(x)\} \]
should always exist and be something that we are allowed to think about and discuss logically. Surprisingly, this is not the case.
2 For instance, what is an object? Is $x$ allowed to be any object? Clearly $x$ should be restricted to all objects
The death blow to naive set theory came in 1901 and it is called Russell's Paradox.
Russell begins by letting
\[ R = \{x : (x \text{ is a set}) \wedge (x \notin x)\}. \]
In other words,
"$R$ is the set of all sets that are not elements of themselves."
The expression
\[ P(x) = (x \text{ is a set}) \wedge (x \notin x) \]
is quite unambiguous. An object $x$ should either be a set or not a set. An object $x$ should
either be an element of itself or not be an element of itself. Thus $P(x)$ looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to
think about the set $R$.
Russell then asks: Does $R$ contain itself or not? This is a simple yes/no question and
there are clearly only two possibilities.
CASE 1: If $R \in R$ is true, then $R \notin R$ is true by the definition of $R$. However, this is not
logically possible since $R \notin R$ is false when $R \in R$ is true.
CASE 2: If $R \notin R$ is true, then $R \in R$ is true by the definition of $R$. However, this is not
logically possible since $R \in R$ is false when $R \notin R$ is true.
Neither $R \in R$ nor $R \notin R$ is logically possible! This means that we cannot treat $R$
as a set: it is simply too large an idea to be considered in a logically sound manner.
In other words, we cannot logically consider the set of all sets that are not elements of
themselves without running into paradoxes. We just cannot: it is a law of the universe.
Russell's Paradox shows that the General Comprehension Principle is not correct. Russell
discovered this paradox and sent it to Gottlob Frege (1848–1925) as Frege was finishing
his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the laws of
arithmetic from supposedly logical axioms. Russell's Paradox invalidated much of Frege's
work. Indeed, Frege noted:
A scientist can hardly meet with anything more undesirable than to have the
foundation give way just as the work is finished. I was put in this position
by a letter from Mr. Bertrand Russell when the work was nearly through the
press.
There are many other logical paradoxes that have been discovered throughout the
years, but Russell's paradox is one of the most important. It forced mathematicians and
logicians to completely reevaluate mathematics and logic from the ground up. Russell's
Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic fashion. The rules would have to be explicitly stated in such a way that Russell's
Paradox would not occur in the universe of axiomatic set theory. Although we will not
discuss axiomatic set theory in this course, it is important to be aware that sets and set
theory are not as simple as they sound.
Example B.8. A car is equipped with a "Russell light" on its dashboard. The light turns on
to warn the driver if a light has burnt out. What happens when the Russell light burns out?
Example B.9. The following paradox of Eubulides of Miletus (4th century BCE) indicates that self-reference can be troublesome:
"This statement is false."
This is a troublesome sentence (call it $P$) since
\[ P \text{ is true} \Longleftrightarrow P \text{ is false}. \]
Thus Eubulides' statement is not a logical proposition. This paradox is similar to the liar
paradox: "I am lying."
B.4. Quantifiers
In mathematics, we often deal with propositions which depend on variables. The
special symbols $\forall$ and $\exists$ (called quantifiers) will help us. The symbol $\forall$ stands for
"for all," "for every," or "for each" (depending on which makes more grammatical
sense). The symbol $\exists$ stands for "there exists."
There are many ways to use quantifiers and various ways to combine them with other
symbols. The best way to understand how to read and translate sentences with quantifiers
is to study a number of examples.
Example B.10. The statement
\[ (\forall x > 0)(\exists y)(x = e^y) \]
can be translated in a number of ways:
For every $x > 0$, there exists a $y$ such that $x = e^y$.
\[ (\forall x, y \in \mathbb{Q})\,[((x + y) \in \mathbb{Q}) \wedge (xy \in \mathbb{Q})] \]
translates as
(B.1)
[(x)P (x)]
(x)(P (x))
(x)(P (x)).
If the quantifiers have additional symbols attached, the rules are the same. For instance:
[(x A)P (x)]
(x A)(P (x))
(x A)(P (x)).
(B.2)
It turns out that (B.2) is false. In fact, $\sqrt{2}$ is an irrational number and hence cannot be
written in the form $a/b$ where $a$ and $b$ are integers.$^5$
We can negate (B.2) to obtain a true statement. In words, we might say:
There does not exist a rational number $x$ such that $x^2 = 2$.
5According to legend, this was discovered by the Pythagorean philosopher Hippasus of Metapontum (an
ancient Greek colony in southern Italy) around 500 BCE. The numbers $e$ and $\pi$ were proved to be irrational by
Euler and Lambert in 1737 and 1760, respectively.
We would like to write this in terms of quantifiers ("there does not exist" is not a
quantifier). According to the rules for negating propositions with quantifiers, the negation
of (B.2) is:
\[ \neg(\exists x \in \mathbb{Q})(x^2 = 2). \]
There are several ways to interpret this:
• $(\forall x \in \mathbb{Q})(\neg(x^2 = 2))$
• $(\forall x \in \mathbb{Q})(x^2 \neq 2)$
• For every $x$ in $\mathbb{Q}$, $x^2 \neq 2$.
• For each rational number $x$, it is the case that $x^2 \neq 2$.
• $\sqrt{2}$ is irrational.
This is a true statement, known as Hippasus' theorem (and often attributed to Pythagoras).
B.6. Subsets
Definition (Set Inclusion). If $A$ and $B$ are sets, then we say that $A \subseteq B$ (read: "$A$ is a
subset of $B$") if every member of $A$ is also a member of $B$. In other words,
\[ (A \subseteq B) \iff (\forall x)(x \in A \Rightarrow x \in B). \tag{B.3} \]
When we write $\forall x$, the variable $x$ actually lives in some universal set $U$. Typically, $U$
will be a set of numbers, functions, or other mathematical objects. Moreover, exactly what
the universal set U is will typically be clear from context.
The following theorem can be proved from the basic definitions and logical principles
(although we will not prove it in these notes):
Theorem B.1. $(A = B) \iff [(A \subseteq B) \wedge (B \subseteq A)]$
The importance of the theorem above is that if we wish to prove that $A = B$, it suffices
to prove $A \subseteq B$ and $B \subseteq A$ separately. This is sometimes easier than proving $A = B$
directly.
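The double-inclusion strategy can be tried out on finite sets. The following is only a minimal Python sketch (it is not part of the original notes): the helper names `is_subset` and `equal_by_double_inclusion` and the sample sets are our own illustrative choices, and a computer check on finite sets proves nothing in general.

```python
def is_subset(a, b):
    """Return True if every member of a is also a member of b (Definition B.3)."""
    return all(x in b for x in a)


def equal_by_double_inclusion(a, b):
    """Decide whether A = B by checking A ⊆ B and B ⊆ A separately."""
    return is_subset(a, b) and is_subset(b, a)


# Two descriptions of the same set: multiples of 6 below 50.
A = {n for n in range(1, 50) if n % 6 == 0}
B = {n for n in range(1, 50) if n % 2 == 0 and n % 3 == 0}
# The even numbers below 50: A is a subset of C, but not conversely.
C = {n for n in range(1, 50) if n % 2 == 0}

print(equal_by_double_inclusion(A, B))    # True
print(is_subset(A, C), is_subset(C, A))   # True False
```

Here each inclusion is checked on its own, which mirrors how such proofs are usually organized on paper.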
Example B.17. $\{0, 1\} \subseteq \{0, 1, 2\} \subseteq \mathbb{R}$. Indeed, $\{0, 1\}$ is a subset of $\{0, 1, 2\}$ since
every element in $\{0, 1\}$ (namely the numbers 0 and 1) also belongs to $\{0, 1, 2\}$. We also
see that $\{0, 1, 2\}$ is a subset of $\mathbb{R}$ since 0, 1, and 2 all are elements of $\mathbb{R}$ (the set of real
numbers).
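Example B.18. Observe that $A \subseteq A$ holds for any set $A$. In other words, every set is a
subset of itself. Indeed, the proposition
\[ (\forall x)((x \in A) \Rightarrow (x \in A)) \]
is true for any $x$ in our universal space. To see this, simply write out a truth table for the
implication $P \Rightarrow Q$ where $P = (x \in A)$ and $Q = (x \in A)$:
\[
\begin{array}{c|c|c}
x \in A & x \in A & (x \in A) \Rightarrow (x \in A) \\ \hline
T & T & T \\
F & F & T
\end{array}
\]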
\[
\begin{array}{c}
(x \in \emptyset) \Rightarrow (x \in A) \\ \hline
T \\
T
\end{array}
\]
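\begin{align*}
\neg[(\forall x)(x \in A \Rightarrow x \in B)] &\iff (\exists x)[\neg(x \in A \Rightarrow x \in B)] \\
&\iff (\exists x)[\neg(\neg(x \in A) \vee (x \in B))] \\
&\iff (\exists x)[(x \in A) \wedge (x \notin B)].
\end{align*}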
Here we have added C, the set of complex numbers, to our list of number sets.6
6We might also add $\mathbb{C} \subseteq \mathbb{H} \subseteq \mathbb{O}$ where $\mathbb{H}$ is the quaternion number system and $\mathbb{O}$ is the octonion number
system. The quaternions are a noncommutative 4-dimensional number system (whose elements are of the form
$a + bi + cj + dk$ where $a, b, c, d \in \mathbb{R}$) discovered by the Irish mathematician William Rowan Hamilton. The
idea came to him while he was walking to a meeting of the Irish Academy. Hamilton scratched the fundamental
formulas $i^2 = j^2 = k^2 = ijk = -1$ on the stone of Brougham Bridge (Dublin). Hamilton's graffiti remains to
this day a mathematical tourist attraction. The octonions $\mathbb{O}$ are a bizarre number system (also called the Cayley
numbers) which we do not describe here in any further detail.
Example B.22. Let A = {0, 1, {a, b}} and B = {a, b, 1}. Then A\B = {0, {a, b}}. On
the other hand B\A = {a, b}.
Typically, one works inside of some universal set $U$. In terms of Venn diagrams, the
complement of $A$ in $U$ is just the "outside" of $A$. If the universal set is declared beforehand
(or obvious from context), then we sometimes denote the complement of $A$ in $U$ by $\overline{A}$ or
$A^c$.
Example B.23. If the universal set $U = \mathbb{Z}$ and $A = \mathbb{N}$, then
\[ \mathbb{N}^c = \mathbb{Z} \setminus \mathbb{N} = \{\ldots, -3, -2, -1\}, \]
the set of negative integers.
Definition (Union). If $A$ and $B$ are sets then the union of $A$ and $B$ is the set
\[ A \cup B = \{x : (x \in A) \vee (x \in B)\}. \]
Definition (Intersection). If $A$ and $B$ are sets then the intersection of $A$ and $B$ is the set
\[ A \cap B = \{x : (x \in A) \wedge (x \in B)\}. \]
There are many laws about how unions and intersections interact. They can be derived
from the rules for $\wedge$ and $\vee$. For example:
Proof. One way to show that the two sets are equal A (B C) and (A B) (A C)
is to show that the conditions for membership in them are logically equivalent. We will
therefore try to show that the statements x [A (B C)] and x [(A B) (A C)]
are logically equivalent:8
x [A (B C)]
(x A) [x (B C)]
(x A) [(x B) (x C)]
def. of
def. of
def. of
def. of
Although Venn diagrams are not always accurate (they are limited by the constraints
of being in two dimensions and are hence unsuitable for picturing complex relationships
between large numbers of sets), they are generally a good tool for getting the feel of a
statement. For instance, draw some Venn diagrams to convince yourself that the theorem
above is true.
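Besides Venn diagrams, identities of this kind can be tested by brute force on a small universal set. The sketch below is not from the original notes; it assumes that the theorem in question is one of the distributive laws relating unions and intersections, and it checks both of them. The function names `subsets` and `distributive_laws_hold` are hypothetical helpers, and a finite check is only evidence, not a proof.

```python
from itertools import combinations


def subsets(universe):
    """List every subset of a finite universe."""
    items = list(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def distributive_laws_hold(universe):
    """Check both distributive laws for all triples (A, B, C) of subsets."""
    all_subsets = subsets(universe)
    for a in all_subsets:
        for b in all_subsets:
            for c in all_subsets:
                # A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
                if a & (b | c) != (a & b) | (a & c):
                    return False
                # A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
                if a | (b & c) != (a | b) & (a | c):
                    return False
    return True


print(distributive_laws_hold({1, 2, 3}))  # True
```

With a three-element universe there are 8 subsets and hence 512 triples to test, which is instantaneous.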
B.8. Ordered Pairs
Since we will typically be concerned with ordered pairs (and ordered $n$-tuples) of real
numbers, we do not need to go further into the subject of ordered tuples and Cartesian
products at this point. However, let us briefly mention the technical definition:
Definition (Ordered Pairs). The symbol (a, b) denotes an ordered pair. It has the property
that if (c, d) is another ordered pair then
\[ (a, b) = (c, d) \iff (a = c) \wedge (b = d). \]
It is important to note that the existence of a definition does not logically imply the
existence of the object defined. For instance, we might make the following definition:
Definition. A penguin p is called exceptional if p can fly.
It is clear that no exceptional penguins exist, despite the nice definition we made for
them. We have not actually proved that ordered pairs exist or that some structure satisfying
the definition can be constructed using sets. However, one can actually define the ordered
pair (a, b) to be the set
(a, b) = {{a}, {a, b}}
and then verify that this set satisfies the property of the definition. However, we will not
go through the (somewhat tedious) proof.
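Although we skip the proof, the defining property can at least be tested on a few small values. The following Python sketch is our own illustration (the name `ordered_pair` and the test range are assumptions, not part of the notes); it encodes $(a, b)$ as the set $\{\{a\}, \{a, b\}\}$ using `frozenset`, since Python sets can only contain hashable objects.

```python
from itertools import product


def ordered_pair(a, b):
    """Encode (a, b) as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# Check that (a, b) = (c, d) exactly when a = c and b = d, for small values.
values = range(4)
property_holds = all(
    (ordered_pair(a, b) == ordered_pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(values, repeat=4)
)

print(property_holds)                             # True
print(ordered_pair(1, 2) == ordered_pair(2, 1))   # False: order matters
```

Note that when $a = b$ the encoding collapses to $\{\{a\}\}$, which is one of the cases the full proof has to treat separately.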
B.9. Cartesian Products
Definition (Cartesian Product). If A and B are sets, then the Cartesian product of A and
B is the set
\[ A \times B = \{(a, b) : a \in A, b \in B\}. \]
Example B.24. If $A = \mathbb{R}$ and $B = \mathbb{R}$ then the Cartesian product $\mathbb{R} \times \mathbb{R}$ is denoted $\mathbb{R}^2$.
This is typically thought of as the $xy$-plane in analytic geometry.
You can probably see a pattern forming here. In this case, our intuition is correct:
corresponds to the binary string $10101000\cdots000$. Since there are $2^n$ possible strings,
there are $2^n$ possible subsets.
Another way to think of the preceding sketch is to consider how many choices one
has when creating a subset of $A$. To construct a subset of $A$, one has to choose whether to
include $a_1$ or not. Then one has to choose whether to include $a_2$ or not, and so forth. In all,
there will be $n$ choices to make and there are $2^n$ possible ways of doing this.
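The correspondence between subsets and binary strings translates directly into a short program. The sketch below is our own illustration (the function name `power_set` is an assumption): each integer from $0$ to $2^n - 1$ is read as a binary string of length $n$, and bit $i$ records the choice of whether to include the $i$th element.

```python
def power_set(elements):
    """Return all subsets of a finite collection, one per binary string."""
    elements = list(elements)
    n = len(elements)
    result = []
    for mask in range(2 ** n):
        # Bit i of mask answers the question "include elements[i] or not?"
        subset = {elements[i] for i in range(n) if mask & (1 << i)}
        result.append(subset)
    return result


subsets_of_A = power_set(["a1", "a2", "a3"])
print(len(subsets_of_A))  # 8, which is 2**3
```

Since distinct bit patterns give distinct subsets, the length of the returned list is $2^n$.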
Example B.31. Describing the power set of infinite sets is much trickier. For instance,
P(N) contains every possible subset of N and hence the sets
$\emptyset$, $\{1\}$, $\{5, 23\}$, $\{2, 4, 6, 8, \ldots\}$, $\{100, 101, 102, \ldots\}$, $\{2, 3, 5, 7, 11, 13, \ldots\}$
One of the cornerstones of Pythagorean philosophy was the assignment of mystical qualities to numbers. They chose to call numbers like 6, 28, 496, and 8128 "perfect numbers."
Later philosophers and theologians like St. Augustine and Alcuin of York would expound the special nature of such numbers. For instance, in the City of God, St. Augustine
(354–430) said:
Six is a number perfect in itself, and not because God created the world in six
days; rather the contrary is true. God created the world in six days because this
number is perfect, and it would remain perfect, even if the work of the six days
did not exist.
The fact that it takes 28 days for the Moon to travel round the Earth was also seen to
confirm the importance of perfect numbers.
In his book Introductio Arithmetica, Nicomachus of Gerasa (ca. 60–120 C.E.) conjectured that there is one perfect number with exactly $k$ digits for each $k \geq 1$ and that they
alternate ending in 6 and 8. Both of these claims are incorrect, since the fifth and sixth
perfect numbers are 33,550,336 and 8,589,869,056.
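These numerical claims are easy to test. The following Python sketch is ours, not part of the notes, and assumes the usual definition: a positive integer is perfect if it equals the sum of its proper divisors. The helper names `sum_proper_divisors` and `is_perfect` are illustrative.

```python
def sum_proper_divisors(n):
    """Sum of the divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1  # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:  # avoid counting a square root twice
                total += n // d
        d += 1
    return total


def is_perfect(n):
    return n > 1 and sum_proper_divisors(n) == n


small_perfect = [n for n in range(2, 10000) if is_perfect(n)]
print(small_perfect)              # [6, 28, 496, 8128]
print(is_perfect(33_550_336))     # True
print(is_perfect(8_589_869_056))  # True
```

The fifth perfect number has eight digits and the sixth has ten, and both end in 6, which is why each of the two claims of Nicomachus fails.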
After Euclid and until Euler, most mathematicians implicitly assumed that all perfect
numbers were generated by a formula due to Euclid and Euler$^9$. This formula produces
only even perfect numbers. Some, like Descartes and Mersenne, admitted that they saw no
reason why odd perfect numbers should not exist, despite the fact that no one had yet found
one.
Euler was one of the first to attack one of the most intriguing (and one of the oldest)
problems in number theory and proved an important theorem on odd perfect numbers.
Although no odd perfect numbers are known to exist, there are many conditions that a
hypothetical odd perfect number must satisfy. As Sylvester (1814–1897) noted:
. . . the existence of [an odd perfect number] – its escape, so to say, from the
complex web of conditions which hem it in on all sides – would be little short of
a miracle.
A (by no means complete) list of conditions that an odd perfect number $n$ must satisfy is
given below:
• $n$ has at least four distinct prime factors. (Cole, 1888)
• If $n$ is not divisible by 3, 5, or 7, then $n$ has at least 26 distinct prime factors. (Catalan, 1888)
• $n$ has at least 5 distinct prime factors and $n > 2 \cdot 10^6$. (Turcanov, 1908)
• $n$ has at least 6 distinct prime factors. (Gradshtein, 1925)
• Not all of the even exponents $k_i$ can be 2. (Steuerwald, 1937)
• Not all of the even exponents $k_i$ can be 4. Nor is it possible for one of them to be 4 and the others 2. (Kanold, 1941)
• $n > 10^{20}$. (Kanold, 1957)
• Not all of the even exponents $k_i$ can be 6. (Haggis, McDaniel, 1972)
• If 3 does not divide $n$ then $n$ has at least 11 distinct prime factors. (Haggis, 1983)
9Marking a collaboration of Eu mathematicians 2000 years apart!
10Although in this case, fortunately, the aim of these theorems is to show that the properties that an odd
perfect number must satisfy are so restrictive that no number can satisfy them. In particular, no one actually
believes that odd perfect numbers exist.
APPENDIX C
Mathematical Induction
C.1. The Power Sum Problem
What is
\[ 1 + 2 + 3 + \cdots + 100? \tag{C.1} \]
According to mathematical folklore, the correct answer (namely 5050) was given immediately by the young Carl Friedrich Gauss (1777–1855) when his teachers assigned his class
this "busy work" problem. His teachers soon realized Gauss's prodigious talent and his
education was later sponsored by the Duke of Brunswick.
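The young Gauss found the sum (C.1) using the formula
\[ 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \tag{C.2} \]
This formula$^1$ can be derived by adding the two equations
\begin{align*}
1 + 2 + 3 + \cdots + n &= S \\
n + (n-1) + (n-2) + \cdots + 1 &= S
\end{align*}
column by column: each of the $n$ columns sums to $n + 1$, so that $n(n+1) = 2S$.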
\begin{align*}
1^2 &= 1 \\
1^2 + 2^2 &= 5 \\
1^2 + 2^2 + 3^2 &= 14 \\
1^2 + 2^2 + 3^2 + 4^2 &= 30, \\
1^2 + 2^2 + 3^2 + 4^2 + 5^2 &= 55,
\end{align*}
Example C.1. Let $p(n) = n^2 + n + 41$ so that $p(0) = 41$, $p(1) = 43$, $p(2) = 47$,
$p(3) = 53$, $p(4) = 61$, . . . . Do you notice a pattern? It appears that $p(0), p(1), p(2), \ldots$
are always primes. In fact, $p(n)$ is prime for $n = 0, 1, 2, \ldots, 39$ but $p(40)$ is composite.
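This example is easy to verify by machine. The sketch below is our own (the trial-division helper `is_prime` is an illustrative choice, not a method from the notes); it confirms the pattern for $n = 0, \ldots, 39$ and exhibits the factorization of $p(40)$.

```python
def is_prime(n):
    """Trial division: adequate for the small numbers in this example."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def p(n):
    return n * n + n + 41


print(all(is_prime(p(n)) for n in range(40)))  # True
print(p(40), is_prime(p(40)))                  # 1681 False
print(41 * 41)                                 # 1681
```

Indeed $p(40) = 40 \cdot 41 + 41 = 41^2$, so forty successful cases do not amount to a proof.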
1The formula (C.2) was known since ancient times (and hence merely rediscovered by the young Gauss).
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \tag{C.4} \]
\[ 1 = \frac{1 \cdot 2}{2}. \]
INDUCTIVE STEP: If $P(n)$ is true for some number $n$, does it follow that $P(n + 1)$ is also
true? We basically want to prove the statement "If $P(n)$ is true, then $P(n + 1)$
is true." In other words, we must show that
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \quad\Longrightarrow\quad 1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2}. \]
Therefore our goal is to derive the formula
\[ 1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2} \]
2It is possible to prove the principle of mathematical induction from the axioms of set theory, but we will
not do that here.
3Note that there is no infinity step and that it is improper to speak of a last step since there is no last
natural number.
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]
Adding $n + 1$ to both sides of the preceding formula gives
\begin{align*}
(1 + 2 + \cdots + n) + (n+1) &= \frac{n(n+1)}{2} + (n+1) \\
&= \frac{n(n+1) + 2(n+1)}{2} \\
&= \frac{(n+1)(n+2)}{2} \\
&= \frac{(n+1)((n+1)+1)}{2}.
\end{align*}
Therefore $P(n + 1)$ is true if $P(n)$ is true. By induction, the formula holds for all $n \in \mathbb{N}$.
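\[ 1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.5} \]
\begin{align*}
1^2 + 2^2 + \cdots + n^2 + (n+1)^2 &= \frac{n(n+1)(2n+1)}{6} + (n+1)^2 \\
&= \frac{n(n+1)(2n+1)}{6} + (n^2 + 2n + 1) \\
&= \frac{n(n+1)(2n+1)}{6} + \frac{6(n^2 + 2n + 1)}{6} \\
&= \tfrac{1}{6}\left((n^2 + n)(2n+1) + 6n^2 + 12n + 6\right) \\
&= \tfrac{1}{6}\left(2n^3 + 3n^2 + n + 6n^2 + 12n + 6\right) \\
&= \tfrac{1}{6}\left(2n^3 + 9n^2 + 13n + 6\right) \\
&= \tfrac{1}{6}\left((2n^3 + 7n^2 + 6n) + (2n^2 + 7n + 6)\right) \\
&= \tfrac{1}{6}(n+1)(2n^2 + 7n + 6) \\
&= \tfrac{1}{6}(n+1)(n+2)(2n+3) \\
&= \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}.
\end{align*}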
Hence if (C.5) holds for some value of $n$, it must also hold for $n + 1$. Since we
have established that the formula holds for n = 1, it follows that it also holds for 2. Since
it holds for n = 2, it must hold for n = 3, and so on. This is the essence of mathematical
induction.
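A computer can confirm the two closed formulas for many values of $n$, although such a finite check is no substitute for the induction argument, which covers every $n$ at once. The following is a minimal sketch of our own; the names `power_sum` and `formulas_hold_up_to` are illustrative.

```python
def power_sum(n, m):
    """Compute 1**m + 2**m + ... + n**m directly."""
    return sum(k ** m for k in range(1, n + 1))


def formulas_hold_up_to(limit):
    """Compare the direct sums with the closed formulas for n = 1, ..., limit."""
    for n in range(1, limit + 1):
        if power_sum(n, 1) != n * (n + 1) // 2:
            return False
        if power_sum(n, 2) != n * (n + 1) * (2 * n + 1) // 6:
            return False
    return True


print(power_sum(100, 1))         # 5050, the answer attributed to Gauss
print(formulas_hold_up_to(200))  # True
```

The program checks two hundred cases; the inductive step is what lets us pass from any one case to the next without stopping.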
Based on the facts that
\begin{align*}
1 + 1 + 1 + \cdots + 1 &= n \\
1 + 2 + 3 + \cdots + n &= \frac{n(n+1)}{2} \\
1^2 + 2^2 + 3^2 + \cdots + n^2 &= \frac{n(n+1)(2n+1)}{6}.
\end{align*}
(C.7)
(C.8)
(C.9)
(C.10)
Example C.2. Consider the set $S = \{a, b, c, d, e\}$. How many two-element subsets of $S$
are there? To make a two-element subset, we first need to choose one element, and there
are 5 ways of doing this. Let's say that we pick $a$:
\[ \{a\}. \]
Now we have to choose an additional element of $S$ to put into our subset. There are 4 additional
ways of doing this. Let's pick $b$:
\[ \{a, b\}. \]
We have produced a two-element subset of $S$. There were
\[ 5 \cdot 4 = \frac{5!}{3!} \]
ways of doing this. However, if we had chosen $b$ first and then $a$, we would have $\{b, a\}$
instead. But $\{a, b\} = \{b, a\}$ and we must therefore divide $\frac{5!}{3!}$ by the number of ways to
order a set with 2 objects, namely $2!$. Therefore the total number of 2-element subsets of
$S$ is $\binom{5}{2}$ or "5 choose 2." Of course this example does not prove anything, but it gives you
a little bit of the feel for the proof of the preceding theorem.
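The counting argument of Example C.2 can be replayed by listing everything explicitly. The sketch below is our own illustration using Python's standard `itertools` module; the variable names are assumptions made for the example.

```python
from itertools import combinations, permutations
from math import factorial

S = ["a", "b", "c", "d", "e"]

# Ordered choices: pick a first element, then a different second element.
ordered_choices = list(permutations(S, 2))

# Forget the order: (a, b) and (b, a) give the same two-element subset.
two_element_subsets = {frozenset(choice) for choice in ordered_choices}

print(len(ordered_choices))       # 20 = 5!/3!
print(len(two_element_subsets))   # 10 = 5!/(3! 2!)
```

Each subset arises from exactly $2!$ ordered choices, which is the division step in the example.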
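\[
\begin{array}{ccccccccccc}
 & & & & & 1 & & & & & \\
 & & & & 1 & & 1 & & & & \\
 & & & 1 & & 2 & & 1 & & & \\
 & & 1 & & 3 & & 3 & & 1 & & \\
 & 1 & & 4 & & 6 & & 4 & & 1 & \\
1 & & 5 & & 10 & & 10 & & 5 & & 1
\end{array}
\tag{C.11}
\]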
From Pascal's Triangle, one can deduce Pascal's Rule, which describes the $(n + 1)$st row
of Pascal's Triangle in terms of the $n$th row.
Theorem C.4 (Pascal's Rule). For $n, k \geq 0$,
\[ \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}. \]
Proof. This is a straightforward computation:
\begin{align*}
\binom{n}{k} + \binom{n}{k+1} &= \frac{n!}{k!(n-k)!} + \frac{n!}{(k+1)!(n-k-1)!} \\
&= \frac{n!(k+1)}{(k+1)!(n-k)!} + \frac{n!(n-k)}{(k+1)!(n-k)!} \\
&= \frac{n!(k + 1 + n - k)}{(k+1)!((n+1) - (k+1))!} \\
&= \frac{(n+1)!}{(k+1)!((n+1) - (k+1))!} \\
&= \binom{n+1}{k+1}.
\end{align*}
Using Pascal's Rule we see that the entries $\binom{n+1}{k}$ in the $(n + 1)$st row of the triangle
are integers precisely because the entries $\binom{n}{k}$ in the preceding row are integers. Sometimes
Pascal's Rule is written in the form:
\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k} \]
for $n \geq 1$ and $1 \leq k \leq n$.
Corollary 12. $\binom{n}{k}$ is always an integer.
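Pascal's Rule also gives a practical way to compute binomial coefficients using additions only, which makes the corollary plausible at a glance. The sketch below is ours (the function name `pascal_rows` is an illustrative choice); it builds each row from the preceding one and compares the result with the factorial formula as implemented by `math.comb` in the Python standard library (Python 3.8 or later).

```python
from math import comb


def pascal_rows(num_rows):
    """Return rows 0, 1, ..., num_rows - 1 of Pascal's Triangle."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Pascal's Rule: each interior entry is the sum of the two above it.
        middle = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows


rows = pascal_rows(6)
print(rows[5])  # [1, 5, 10, 10, 5, 1]

# Every entry agrees with n! / (k! (n - k)!).
print(all(rows[n][k] == comb(n, k) for n in range(6) for k in range(n + 1)))  # True
```

Since only integer additions are performed, every entry produced this way is an integer.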
\begin{align*}
(x + y)^1 &= x + y \\
(x + y)^2 &= x^2 + 2xy + y^2 \\
(x + y)^3 &= x^3 + 3x^2 y + 3xy^2 + y^3 \\
(x + y)^4 &= x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4
\end{align*}
(x + y)
The binomial theorem says that this pattern (based on Pascal's triangle) continues indefinitely:
\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \tag{C.12} \]
INDUCTIVE STEP: Now we must prove the statement "If $P(n)$ is true, then $P(n + 1)$ is true."
In other words, we must show that
\[ \left[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \right] \Longrightarrow \left[ (x + y)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} x^k y^{(n+1)-k} \right]. \]
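\begin{align*}
(x + y)^{n+1} &= (x + y)(x + y)^n \\
&= (x + y) \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} x^{k+1} y^{n-k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k+1} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} x^k y^{n-(k-1)} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{(n+1)-k} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} x^k y^{(n+1)-k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{(n+1)-k} \\
&= \binom{n}{0} x^0 y^{n+1} + \binom{n}{n} x^{n+1} y^0 + \sum_{k=1}^{n} \left( \binom{n}{k-1} + \binom{n}{k} \right) x^k y^{(n+1)-k} \\
&= \binom{n+1}{0} x^0 y^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} x^k y^{(n+1)-k} + \binom{n+1}{n+1} x^{n+1} y^0 \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} x^k y^{(n+1)-k}.
\end{align*}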
\[ 2^n = (1 + 1)^n = \sum_{k=0}^{n} \binom{n}{k} \]
$\binom{n}{k}$ is
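\[ (x + 1)^{m+1} - x^{m+1} = 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m. \]
Since this holds for $x = 1, 2, 3, \ldots, n$, we may add this equation to itself as $x$ goes from 1
to $n$ to obtain
\[ \sum_{x=1}^{n} \left[ (x + 1)^{m+1} - x^{m+1} \right] = \sum_{x=1}^{n} \left[ 1 + \binom{m+1}{1} x + \binom{m+1}{2} x^2 + \cdots + \binom{m+1}{m} x^m \right]. \]
The sum on the left telescopes to $(n + 1)^{m+1} - 1$, and the sum on the right can be evaluated term by term
in terms of power sums, where
\[ S_m(n) = 1^m + 2^m + \cdots + n^m \]
denotes the sum of the first $n$ $m$th powers. All of these computations yield Bernoulli's
formula
\[ (n + 1)^{m+1} - 1 = n + \binom{m+1}{1} S_1(n) + \binom{m+1}{2} S_2(n) + \cdots + \binom{m+1}{m} S_m(n). \]
This is a recursive formula for $S_m(n)$. In other words, if we have formulas for $S_k(n)$ for
$k = 1, 2, \ldots, m - 1$ we can solve the equation above for $S_m(n)$.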
Example C.4. Recall that our experimentation suggested that
\[ S_3(n) = 1^3 + 2^3 + \cdots + n^3 = \left( \frac{n(n+1)}{2} \right)^2. \]
This formula can be derived from Bernoulli's recursive procedure. Indeed, we have
\[ S_1(n) = \frac{n(n+1)}{2}, \qquad S_2(n) = \frac{n(n+1)(2n+1)}{6} \]
and hence setting $m = 3$ in Bernoulli's formula we see that
\[ (n + 1)^4 - 1 = n + \binom{4}{1} \frac{n(n+1)}{2} + \binom{4}{2} \frac{n(n+1)(2n+1)}{6} + \binom{4}{3} S_3(n). \]
Expanding out both sides of the preceding equation yields
\[ n^4 + 4n^3 + 6n^2 + 4n = n + (2n^2 + 2n) + (2n^3 + 3n^2 + n) + 4 S_3(n). \]
Collecting common terms reduces the preceding to
\[ n^4 + 2n^3 + n^2 = 4 S_3(n) \]
from which it follows that
\[ S_3(n) = \left( \frac{n(n+1)}{2} \right)^2 \]
as desired. Although this formula could also be proved using mathematical induction, one
would first have to know the formula beforehand (i.e. via numerical computations and
guesswork, as we have done). The advantage of Bernoulli's method is that knowledge of
lower order power sums leads directly to formulas for higher order power sums, without
having to derive formulas from numerical computations and inspired guesswork.
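Bernoulli's recursion can be carried out numerically for a fixed $n$. The sketch below is our own and rests on one observation: since $\binom{m+1}{m} = m + 1$, the formula can be solved for $S_m(n)$ by moving the lower power sums to the left side and dividing by $m + 1$. The function name `power_sums` is an illustrative choice; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def power_sums(n, max_m):
    """Return {m: S_m(n)} for m = 1, ..., max_m via Bernoulli's recursion."""
    s = {}
    for m in range(1, max_m + 1):
        # Contribution of the already known lower power sums S_1, ..., S_{m-1}.
        lower = sum(comb(m + 1, k) * s[k] for k in range(1, m))
        numerator = (n + 1) ** (m + 1) - 1 - n - lower
        # The coefficient of S_m(n) is C(m+1, m) = m + 1; the division is exact.
        s[m] = numerator // (m + 1)
    return s


sums = power_sums(10, 3)
print(sums[1], sums[2], sums[3])  # 55 385 3025
print(sums[3] == sums[1] ** 2)    # True, as in Example C.4
```

The same loop continues to $S_4(n)$, $S_5(n)$, and beyond without any guesswork, which is exactly the advantage described above.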
APPENDIX D
Ordered Fields
D.1. Fields
The two prominent modern methods of constructing the real numbers (starting only
with the rational numbers, set theory, and logic) are Dedekind cuts and equivalence
classes of Cauchy sequences. We will briefly touch on these later on in the course and
through the homework. However, we will not dwell on them now. Rather, we will examine
the properties of the real numbers that make them what (we think) they are.$^1$
Let us assume for the moment that R exists. What type of object is R? Where does it
fit into the grand scheme of things? In algebraic terminology, the real numbers R form a
field, a type of generalized number system which shares many of the standard properties
of elementary arithmetic.
Definition. A field is a set $K$ endowed with two operations, denoted $+$ and $\cdot$, which satisfy
the following axioms:
(i) COMMUTATIVITY: $x + y = y + x$ and $x \cdot y = y \cdot x$ for every $x, y \in K$.
(ii) ASSOCIATIVITY: $(x + y) + z = x + (y + z)$ and $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for
every $x, y, z \in K$.
(iii) DISTRIBUTIVITY: $x \cdot (y + z) = x \cdot y + x \cdot z$ for every $x, y, z \in K$.
(iv) ADDITIVE AND MULTIPLICATIVE IDENTITIES: There are distinct elements
called 0 and 1 of $K$ such that $x + 0 = x$ and $1 \cdot x = x$ for every $x \in K$.
(v) ADDITIVE AND MULTIPLICATIVE INVERSES: For each $x \in K$, there exists an
element of $K$, denoted $-x$, such that $x + (-x) = 0$. For any nonzero $x \in K$,
there exists an element of $K$, denoted $x^{-1}$, such that $x \cdot x^{-1} = 1$.
It is important to be explicit about these axioms, for there are many algebraic systems
which do not obey all of the rules above. For instance, one can add and multiply $n \times n$
matrices, but matrix multiplication is not commutative, nor does every nonzero matrix have
an inverse.
Most of the rules of basic algebra that you are familiar with from grade school can be
proved from these basic axioms. Unless you have taken abstract algebra, you might not
have known that there are many other number systems that obey these rules too.
It is important to understand that many different fields exist, and that the operations
$+$ and $\cdot$ do not necessarily correspond to our usual understanding of addition and multiplication. Furthermore, the symbols 0 and 1 do not necessarily correspond to the numbers 0
and 1, in the usual sense. Consider the following examples:
1Fortunately, mathematicians have proved that the real number system exists and that it satisfies the properties of a complete ordered field. These properties are not assumed as axioms, rather they can be deduced
logically from either construction method referred to above.
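Example D.1. Let $K$ be a set containing the symbols 0 and 1 and define the operations $+$
and $\cdot$ by the following tables:
\[
\begin{array}{c|cc}
+ & 0 & 1 \\ \hline
0 & 0 & 1 \\
1 & 1 & 0
\end{array}
\qquad\qquad
\begin{array}{c|cc}
\cdot & 0 & 1 \\ \hline
0 & 0 & 0 \\
1 & 0 & 1
\end{array}
\]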
One can check that K = {0, 1}, equipped with the operations above, forms a field. In fact,
you are already familiar with this field since it corresponds to the algebra of even and odd
numbers (represented by 0 and 1).
Example D.2. One can sometimes make new fields from pieces of old number systems.
If $p$ is a prime number, then the set $\mathbb{Z}_p = \{0, 1, 2, \ldots, p - 1\}$ forms a field$^2$ when the
operations are defined by
\begin{align*}
x + y &= \text{remainder of } x + y \text{ when divided by } p \\
x \cdot y &= \text{remainder of } x \cdot y \text{ when divided by } p
\end{align*}
As expected, 0 and 1 play the role of additive and multiplicative identities in this field.
Note also that $\mathbb{Z}_2$ is simply the field from the previous example.
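The only field axiom that can fail for arithmetic with remainders is the existence of multiplicative inverses, and this is easy to examine by brute force. The sketch below is ours (the names `inverses_mod` and `every_nonzero_element_invertible` are illustrative); it searches for inverses modulo $m$ and compares prime and composite moduli.

```python
def inverses_mod(m):
    """Map each nonzero x in Z_m to a multiplicative inverse, when one exists."""
    table = {}
    for x in range(1, m):
        for y in range(1, m):
            if (x * y) % m == 1:
                table[x] = y
                break
    return table


def every_nonzero_element_invertible(m):
    """Check axiom (v) for multiplication modulo m (the other axioms always hold)."""
    return m >= 2 and len(inverses_mod(m)) == m - 1


print(inverses_mod(7))  # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
print(inverses_mod(4))  # {1: 1, 3: 3}  (2 has no inverse)
print([m for m in range(2, 20) if every_nonzero_element_invertible(m)])
# [2, 3, 5, 7, 11, 13, 17, 19]
```

In this range the moduli that pass the test are precisely the primes, in agreement with the example.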
Example D.3. The rational numbers Q, endowed with the standard operations, form a
field. It is a subfield of R.
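Example D.4. The set
\[ \mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}, \]
endowed with the usual operations of addition and multiplication, is also a field.
Example D.5. The complex number system $\mathbb{C}$ is a field. Notice also that
\[ \mathbb{Q} \subseteq \mathbb{Q}(\sqrt{2}) \subseteq \mathbb{R} \subseteq \mathbb{C}. \]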
Example D.6. R(x), the set of (real) rational functions, is a field (when endowed with
the usual addition and multiplication of functions). The constant functions 0 and 1 are the
additive and multiplicative identities.
MORAL: Although $\mathbb{R}$ is a field, the field axioms (i.e. standard
properties of commutativity, associativity, and distributivity) do
not narrow things down to the point where $\mathbb{R}$ is the only such
object. Can we list more properties of $\mathbb{R}$? In fact, can we find
a list of properties that characterize $\mathbb{R}$ completely?
D.2. Ordered Fields
One property that helps to distinguish R from other fields is the fact that R comes
equipped with an ordering. Specifically, the real numbers form what is called an ordered
field. In addition to the standard field axioms, an ordered field also satisfies the following:
Definition. A field $K$ is an ordered field if there is a subset $K^+$ of $K$ such that
(i) If $x, y \in K^+$, then $x + y \in K^+$ and $x \cdot y \in K^+$.
(ii) TRICHOTOMY: For each $x \in K$, one and only one of the following is true:
$x \in K^+$, $x = 0$, $-x \in K^+$.
2If $p$ is not a prime number, then $\mathbb{Z}_p$ is not a field. For instance, 2 has no multiplicative inverse in $\mathbb{Z}_4$.
One then says that $x < y$ if $y - x \in K^+$. The elements of $K^+$ are called positive and the
elements $x$ such that $-x \in K^+$ are called negative.
Example D.7. $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}$, endowed with the usual notions of positive and negative,
are ordered fields.
Example D.8. The field $\mathbb{R}(x)$ of all rational functions in the variable $x$ can be ordered.
Specifically, we say that $f >_{\mathbb{R}(x)} 0$ if $f$ is eventually positive. In other words
\[ (f >_{\mathbb{R}(x)} 0) \iff (\exists M \in \mathbb{R})(\forall x > M)(f(x) > 0). \]
Although this example may seem somewhat alien at first, it directly corresponds to the
intuitive notion of how "strong" a function is as $x \to \infty$. For instance, in the ordering of
$\mathbb{R}(x)$ we have $x^2 > x > 1/x$. Unlike $\mathbb{R}$, however, $\mathbb{R}(x)$ does not have the Archimedean
Property. Indeed, in $\mathbb{R}(x)$, we have $1 <_{\mathbb{R}(x)} x$ but $n \cdot 1 <_{\mathbb{R}(x)} x$ also holds for any $n \in \mathbb{N}$.
Every ordered field comes equipped with an absolute value, defined by:
\[
|x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0. \end{cases}
\]
It is not too hard to show that the absolute value enjoys the standard features that we all
expect it to. However, there are two important properties that are often forgotten:
\begin{align*}
|x + y| &\leq |x| + |y| && \text{(Triangle Inequality)} \\
\bigl| |x| - |y| \bigr| &\leq |x - y| && \text{(Reverse Triangle Inequality).}
\end{align*}
Know these inequalities well; you will use them many times in this course.
Another important consequence of the order axioms is the Trichotomy Law:
Theorem D.1 (Trichotomy Law). Let $K$ be an ordered field. Given $x, y \in K$, one
and only one of the following statements is true: $x < y$, $x = y$, $x > y$.
Example D.9. Ordered fields are much rarer than fields. For example, no finite field is
an ordered field. Furthermore, the complex number system C is not ordered. Indeed, if
C were an ordered field, then by the Trichotomy Law, either i > 0, i = 0, or i < 0.
Manipulating these inequalities quickly leads to contradictions (try it).
Example D.10. Some fields can be ordered in more than one way. For example, $\mathbb{Q}(\sqrt{2})$
sits inside $\mathbb{R}$, and as such has a natural ordering. However, one can
declare a new ordering by saying that $a + b\sqrt{2}$ is positive in the new sense if $a - b\sqrt{2}$ is positive in the
usual sense. It requires some checking, but it turns out that this gives $\mathbb{Q}(\sqrt{2})$ two possible
orderings. Fortunately, $\mathbb{Q}$ and $\mathbb{R}$ themselves can be ordered in one and only one way (this
requires checking too).
Adding the order axioms to the field axioms narrows things down a bit. We are closer
to obtaining a list of properties that characterizes $\mathbb{R}$. However, $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}(x)$ are
also ordered fields. We therefore need to add at least one more axiom to make sure that we
have completely characterized $\mathbb{R}$.
APPENDIX E
Prime Numbers
E.1. Euclid's Theorem
Recall that the prime numbers are the building blocks of all integers. You are probably at least informally acquainted (via grade school arithmetic) with many of their basic
properties.
Definition. An integer p > 1 is called a prime number if there is no (integer) divisor d of
p such that 1 < d < p. A positive integer that is not prime is called a composite number.
Example E.1. The integers 2, 3, 5, and 7 are primes and 4, 6, 8, and 9 are composites.
Less obvious examples are 1299709 (the 100000th prime number) and 1299711, which is
divisible by 3 and hence composite.
Theorem E.1 (Fundamental Theorem of Arithmetic). Every integer $n > 1$ can be expressed as a product of primes. Specifically, we may write $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$ where the
$p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer
$n > 1$ into primes is unique, apart from the order of the prime factors.
This theorem first appeared (somewhat vaguely) as Proposition 14 of Book IX of Euclid's book the Elements (ca. 300 BCE):
If a number be the least that is measured by prime numbers, it will not be
measured by any other prime except those originally measuring it.
However, C.F. Gauss (in his groundbreaking 1801 treatise Disquisitiones Arithmeticae) was the first to state and prove the Fundamental Theorem of Arithmetic in a rigorous
way. Incidentally, Gauss was also the first to prove the Fundamental Theorem of Algebra
in a rigorous way!
An important mathematical fact is that the set
P = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}
of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem,
was proved in Book IX of Euclid's book the Elements. Euclid's proof, along with the irrationality
of $\sqrt{2}$ (commonly attributed to Pythagoras, but most likely due to the Pythagorean
Hippasus of Metapontum), is considered one of the most mathematically elegant contributions of the ancient Greeks.
In his famous book A Mathematician's Apology, the great early 20th century English
mathematician G.H. Hardy stated that
I can hardly do better than go back to the Greeks. I will state and prove two of
the famous theorems of Greek mathematics. They are simple theorems, both
in idea and in execution, but there is no doubt at all about their being theorems
of the highest class. Each is as fresh and significant as when it was discovered
. . . two thousand years have not written a wrinkle on either of them . . . The first
Euclid's proof is startling in its simplicity and its elegant use of reductio ad absurdum
(proof by contradiction). As Hardy says:
Reductio ad absurdum, which Euclid loved so much, is one of a mathematician's
finest weapons. It is a far finer gambit than any chess play: a chess player may
offer the sacrifice of a pawn or even a piece, but a mathematician offers the
game.
\begin{align*}
(n + 1)! + 2 &\text{ is divisible by } 2 \\
(n + 1)! + 3 &\text{ is divisible by } 3 \\
&\;\;\vdots \\
(n + 1)! + n &\text{ is divisible by } n.
\end{align*}
Example E.2. For $n = 4$, the construction used in the proof of Theorem E.3 produces the
sequence
\begin{align*}
122 &= 2 \cdot 61 \\
123 &= 3 \cdot 41 \\
124 &= 4 \cdot 31 \\
125 &= 5 \cdot 25
\end{align*}
of four consecutive composite integers. However, 24, 25, 26, 27 and 32, 33, 34, 35 are
both much smaller sequences of composite integers. In general, the method of Theorem E.3
produces much larger sequences than necessary. This also illustrates the fact that although
a proof might work, it does not mean that the methods used are necessarily optimal.
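The construction in Example E.2 is simple to automate. The sketch below is our own reading of it: following the numbers in the example, for a given $n$ it lists $(n+1)! + k$ for $k = 2, 3, \ldots, n + 1$, each of which is divisible by the corresponding $k$. The function name `composite_run` is an illustrative choice.

```python
from math import factorial


def composite_run(n):
    """Return the n consecutive integers (n+1)! + 2, ..., (n+1)! + (n+1)."""
    base = factorial(n + 1)
    return [base + k for k in range(2, n + 2)]


run = composite_run(4)
print(run)  # [122, 123, 124, 125]

# Each term (n+1)! + k is divisible by k, since k divides (n+1)!.
print(all(m % k == 0 for k, m in zip(range(2, 6), run)))  # True
```

Because the factorial grows so quickly, `composite_run(10)` already starts near forty million, which illustrates how far from optimal the construction is.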
Legendre was the first to publicly make a significant conjecture regarding the large
scale distribution of prime numbers. In his Essai sur la Théorie des Nombres (1798), he
proposed that
\[ \lim_{x \to \infty} \frac{\pi(x)}{\left( \dfrac{x}{\log x - 1.08366} \right)} = 1 \]
where $\pi(x)$ denotes the number of primes $\leq x$ and $\log$ denotes the natural logarithm.
Based on numerical evidence, Gauss (as a child) conjectured that
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1 \tag{E.1} \]
and
\[ \lim_{x \to \infty} \frac{\pi(x)}{\operatorname{Li}(x)} = 1, \tag{E.2} \]
where
\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \]
is called the logarithmic integral. It appears that Gauss's work on the subject began in 1791
(at the age of fourteen), well before Legendre's book was written. The conjecture (E.1) is
true, and it is now known as the Prime Number Theorem. The proof of the Prime Number
Theorem would have to wait until the end of the 19th century.
A major step was taken in 1850, when the Russian mathematician Pafnuty Lvovich
Chebyshev proved that there exist constants $c_1, c_2$ such that
\[ c_1 \frac{x}{\log x} < \pi(x) < c_2 \frac{x}{\log x}. \]
He also showed that if the limit
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} \]
exists, then this limit must equal 1. Unfortunately, Chebyshev was not able to prove that
the limit actually exists.
In 1896, Hadamard and de la Vallée Poussin (independently) proved the celebrated
Prime Number Theorem:
Theorem E.4 (Prime Number Theorem).
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1. \]
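To get a feel for the theorem, one can tabulate $\pi(x)$ against $x/\log x$ for moderate values of $x$. The sketch below is our own and uses the sieve of Eratosthenes, a standard technique that is not discussed in these notes; the function name `prime_count` is an illustrative choice.

```python
from math import log


def prime_count(x):
    """Return pi(x), the number of primes <= x, via the sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, int(x ** 0.5) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(is_prime)


for x in (10**2, 10**3, 10**4, 10**5, 10**6):
    count = prime_count(x)
    print(x, count, round(count / (x / log(x)), 4))
```

For instance $\pi(100) = 25$, $\pi(1000) = 168$, and $\pi(10^6) = 78498$. The printed ratios drift toward 1 only slowly, which is consistent with the theorem being a statement about the limit rather than about any particular $x$.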
Their proofs are technical and involve the use of complex function theory and the
Riemann zeta function. In 1949, Selberg and Erdős succeeded in proving the Prime Number Theorem without using complex function theory. Their so-called "elementary" proof is
exceedingly complicated, but does not use advanced complex analysis.
It is interesting to note that the conjecture (E.2) of the fourteen year old Gauss is also
true and more accurate than the standard prime number theorem.
A result of Littlewood (1914) shows that the difference $\pi(x) - \operatorname{Li}(x)$ assumes both
positive and negative values infinitely often. However, the first value of $x$ for which $\pi(x) >
\operatorname{Li}(x)$ is not known. In 1933, Skewes proved that such an $x$ must occur before
\[ e^{e^{e^{79}}} \approx 10^{10^{10^{34}}} \tag{E.3} \]
The number (E.3) is called Skewes' number and is widely believed to be the largest number
that has ever appeared for a genuine purpose. Subsequently this extravagant bound has
been reduced to $1.165 \times 10^{1165}$ by Lehman (1966), $8.185 \times 10^{370}$ by te Riele (1987), and
it is now known to be somewhat less than $1.39822 \times 10^{316}$.
APPENDIX F
Galileo's Paradox
The following is the passage from The Discourses and Mathematical Demonstrations Relating to Two New Sciences concerning Galileo's Paradox:
SIMPLICIO: Here a difficulty presents itself which appears to me
insoluble. Since it is clear that we may have one line greater than
another, each containing an infinite number of points, we are forced
to admit that, within one and the same class, we may have something
greater than infinity, because the infinity of points in the long line is
greater than the infinity of points in the short line. This assigning to
an infinite quantity a value greater than infinity is quite beyond my
comprehension.
SALVIATI: This is one of the difficulties which arise when we attempt, with our finite minds, to discuss the infinite, assigning to it
those properties which we give to the finite and limited; but this I
think is wrong, for we cannot speak of infinite quantities as being the
one greater or less than or equal to another. To prove this I have in
mind an argument which, for the sake of clearness, I shall put in the
form of questions to Simplicio who raised this difficulty. I take it for
granted that you know which of the numbers are squares and which
are not.
SIMPLICIO: I am quite aware that a squared number is one which
results from the multiplication of another number by itself; thus 4, 9,
etc., are squared numbers which come from multiplying 2, 3, etc., by
themselves.
SALVIATI: Very well; and you also know that just as the products are
called squares so the factors are called sides or roots; while on the
other hand those numbers which do not consist of two equal factors
are not squares. Therefore if I assert that all numbers, including both
squares and nonsquares, are more than the squares alone, I shall
speak the truth, shall I not?
SIMPLICIO: Most certainly.
SALVIATI: If I should ask further how many squares there are one
might reply truly that there are as many as the corresponding number
of roots, since every square has its own root and every root its own
APPENDIX G
$$\begin{aligned}
&= x_1 x_1 + x_2 x_2 + x_3 x_3 \\
&= x_1^2 + x_2^2 + x_3^2 \\
&= \|x\|^2.
\end{aligned}$$
such that:
(i) (POSITIVITY) $\langle v, v \rangle \ge 0$ for all $v \in V$;
FIRST SLOT )
An inner product space is simply a vector space V equipped with an inner product.
There are a couple of additional properties that inner products have, which follow quickly
from the definitions. For example, combining (iii) and (v) yields:
$$\langle au + bv, w \rangle = a \langle u, w \rangle + b \langle v, w \rangle.$$
Example G.1. $\mathbb{R}^n$, when equipped with the dot product, is an inner product space. With
our new notation, we have
$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i.$$
defines an inner product on $\mathbb{R}^n$. Here $\langle Ax, Ay \rangle$ refers to the standard inner product on $\mathbb{R}^n$
from the preceding example. Let us briefly check that this satisfies properties (i) through
(v):
(i) If $x \in \mathbb{R}^n$, then $\langle x, x \rangle_A = \langle Ax, Ax \rangle \ge 0$ since the standard inner product (i.e.,
the dot product) satisfies (i). More geometrically, we note that
$$\|Ax\| = \sqrt{\langle Ax, Ax \rangle},$$
the Euclidean norm of the vector $Ax \in \mathbb{R}^n$.
(ii) If $\langle x, x \rangle_A = 0$, then $\langle Ax, Ax \rangle = 0$ whence $Ax = 0$ since the standard inner
product satisfies (ii). Since $A$ is invertible, it follows that $x = 0$ since the
homogeneous system $Ax = 0$ has only the trivial solution.
(iii) This is a straightforward computation using the fact that multiplication by $A$ is
linear:
$$\langle u + v, w \rangle_A = \langle A(u + v), Aw \rangle$$
(iv) Since the standard inner product satisfies (iv) it follows that
$$\langle u, v \rangle_A = \langle Au, Av \rangle = \langle Av, Au \rangle = \langle v, u \rangle_A.$$
(v) Using the fact that the standard inner product satisfies (v) along with the fact
that $A(au) = a(Au)$ for all $a \in \mathbb{R}$ and $u \in V$, we see that
$$\langle au, v \rangle_A = \langle A(au), Av \rangle = \langle aAu, Av \rangle = a \langle Au, Av \rangle = a \langle u, v \rangle_A.$$
In summary, there are many possible inner products on $\mathbb{R}^n$. It turns out that the inner
products described above are the only possible inner products on $\mathbb{R}^n$.
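The verification above can be spot-checked numerically. The sketch below picks one invertible $2 \times 2$ matrix and a few test vectors (all chosen by us purely for illustration) and confirms positivity, additivity, symmetry, and homogeneity for $\langle x, y \rangle_A = \langle Ax, Ay \rangle$. A numerical check on a few vectors is of course not a proof.

```python
import math

def dot(x, y):
    """Standard inner product (dot product) on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def mat_vec(A, x):
    """Matrix-vector product, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def inner_A(A, x, y):
    """The inner product <x, y>_A = <Ax, Ay>."""
    return dot(mat_vec(A, x), mat_vec(A, y))

A = [[2.0, 1.0], [0.0, 1.0]]   # illustrative choice; det = 2, so A is invertible
u, v, w = [1.0, -2.0], [3.0, 0.5], [-1.0, 4.0]
a = 2.5

u_plus_v = [ui + vi for ui, vi in zip(u, v)]
a_times_u = [a * ui for ui in u]

positive = inner_A(A, u, u) > 0
additive = math.isclose(inner_A(A, u_plus_v, w), inner_A(A, u, w) + inner_A(A, v, w))
symmetric = math.isclose(inner_A(A, u, v), inner_A(A, v, u))
homogeneous = math.isclose(inner_A(A, a_times_u, v), a * inner_A(A, u, v))
print(positive, additive, symmetric, homogeneous)
```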
G.3. Norms Defined by Inner Products
Recall that a norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the
following conditions:
(i) $\|v\| \ge 0$ for all $v \in V$ and $\|v\| = 0$ if and only if $v = 0$,
(ii) $\|av\| = |a| \|v\|$ for any $a \in \mathbb{R}$ and $v \in V$,
(iii) $\|v + w\| \le \|v\| + \|w\|$.
It turns out that an inner product space is always a normed vector space. In fact, the
following definition is a generalization of the fact that if $x = (x_1, x_2, x_3)$ is a vector in $\mathbb{R}^3$,
then its Euclidean length $\|x\|$ is given by $\|x\|^2 = x \cdot x$.
Definition. If $V$ is an inner product space and $v \in V$, then the norm on $V$ induced by the
inner product is defined by
$$\|v\| = \sqrt{\langle v, v \rangle}. \tag{G.1}$$
It turns out that (G.1) indeed defines a norm on $V$. In other words, one can verify that
the axioms for a norm are satisfied by the expression (G.1):
Theorem G.1. If $V$ is an inner product space, then $\|v\| = \sqrt{\langle v, v \rangle}$ defines a norm on $V$.
In particular, $\|v\|$ satisfies the axioms (i), (ii), and (iii) for a norm on $V$ and $V$ is thus a
normed vector space.
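Proof. Property (i) is easily verified: $\sqrt{\langle v, v \rangle} \ge 0$ for all $v \in V$ is automatic since
$\langle v, v \rangle \ge 0$ for all $v \in V$ by the definition of an inner product. If $\sqrt{\langle v, v \rangle} = 0$, then
$\langle v, v \rangle = 0$ whence $v = 0$ by the definition of an inner product.
Property (ii) is slightly trickier:
$$\begin{aligned}
\|av\|^2 &= \langle av, av \rangle \\
&= a \langle v, av \rangle \\
&= a \langle av, v \rangle \\
&= a^2 \langle v, v \rangle \\
&= a^2 \|v\|^2.
\end{aligned}$$
Make sure you see why each step was valid; look at the axioms for inner products to see
which rules we used. Taking square roots (and recalling that $\sqrt{a^2} = |a|$) yields the desired formula
$$\|av\| = |a| \|v\|.$$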
We postpone the proof of Property (iii), the Triangle Inequality, until later.
Example G.3. We can define an inner product on $C([a, b])$, the vector space of continuous
(real-valued) functions on the closed interval $[a, b]$, by defining
$$\langle f, g \rangle = \int_a^b f(x) g(x) \, dx.$$
The reason for using continuous functions is to ensure that the preceding integral exists
and is finite. Something that requires proof is that
$$\langle f, f \rangle = \int_a^b f(x)^2 \, dx$$
equals zero iff $f(x)$ is the zero function. We will overlook this for the moment.
The preceding product is not so bizarre. In fact, vectors in $\mathbb{R}^n$ are just functions, if
you think of them the right way. One usually thinks of a vector $f \in \mathbb{R}^n$ as an $n$-tuple
$$f = (a_1, a_2, \ldots, a_n),$$
but one can equally regard $f$ as a function on $\{1, 2, \ldots, n\}$
such that $f(x) = a_x$ for each $x \in \{1, 2, 3, \ldots, n\}$. From this point of view the inner
product on $\mathbb{R}^n$ is simply
$$\langle f, g \rangle = \sum_{x=1}^{n} a_x b_x = \sum_{x=1}^{n} f(x) g(x).$$
Keeping in mind that integration is a type of summation process (think Riemann sums),
one begins to see the relationship between the standard inner products on Rn and C([a, b]).
They are essentially the same, except that one is discrete and one is continuous. In light of
this revelation, we will begin using the symbols f, g to denote generic vectors (as opposed
to u, v, . . .).
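To make the discrete/continuous analogy concrete, one can sample two functions at $n$ points, form the ordinary dot product of the resulting vectors in $\mathbb{R}^n$, and scale by the grid spacing. The sketch below does this with midpoint samples; the helper name `sampled_inner` and the test functions $f(x) = x$, $g(x) = x^2$ on $[0, 1]$ are our own choices for illustration.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

def sampled_inner(f, g, a, b, n):
    """Approximate the integral of f*g over [a, b] by a scaled dot product.

    Both functions are sampled at the n midpoints of a uniform grid,
    giving two vectors in R^n; their dot product times dx is a Riemann sum.
    """
    dx = (b - a) / n
    xs = [a + (k + 0.5) * dx for k in range(n)]
    f_vec = [f(x) for x in xs]
    g_vec = [g(x) for x in xs]
    return dot(f_vec, g_vec) * dx

# The exact value of the integral of x * x^2 over [0, 1] is 1/4.
approx = sampled_inner(lambda x: x, lambda x: x * x, 0.0, 1.0, 1000)
print(approx)
```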
G.4. Orthogonal Vectors
Definition. Two vectors u, v V are called orthogonal if hu, vi = 0.
Example G.4. In the real inner product space $\mathbb{R}^n$, vectors $a = (a_1, a_2, \ldots, a_n)$ and $b =
(b_1, b_2, \ldots, b_n)$ are orthogonal iff
$$\langle a, b \rangle = \sum_{k=1}^{n} a_k b_k = 0.$$
Example G.5. If $m, n \in \mathbb{Z}$, then $\cos 2\pi n x$ and $\sin 2\pi m x$ are orthogonal in $C([0, 1])$ with
respect to the inner product $\langle f, g \rangle = \int_0^1 f(x) g(x) \, dx$. Indeed, the following integral can
be verified directly:
$$\int_0^1 \cos(2\pi n x) \sin(2\pi m x) \, dx = 0.$$
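As a complement to the direct verification, the integral can be approximated numerically for several small values of $m$ and $n$. This is only a sketch: the midpoint rule, the grid size, and the range of integers tested are our assumptions.

```python
import math

def integral_01(h, n=2000):
    """Midpoint-rule approximation of the integral of h over [0, 1]."""
    dx = 1.0 / n
    return sum(h((k + 0.5) * dx) for k in range(n)) * dx

def cos_sin_inner(n, m):
    """Approximate <cos(2 pi n x), sin(2 pi m x)> in C([0, 1])."""
    return integral_01(
        lambda x: math.cos(2 * math.pi * n * x) * math.sin(2 * math.pi * m * x)
    )

# Largest absolute value of the inner product over a small grid of integers.
largest = max(
    abs(cos_sin_inner(n, m)) for n in range(-3, 4) for m in range(-3, 4)
)
print(largest)  # zero up to floating-point rounding
```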
Given two perpendicular line segments which form the sides of a right triangle, the
Pythagorean theorem tells us how to find the length of the hypotenuse. Although this is
one of the most basic theorems in all of mathematics, a surprising number of math majors
do not know how to prove it from basic principles. Here is a simple proof:
Theorem G.2 (Classical Pythagorean Theorem). If $a, b, c$ are the lengths of the two sides
and hypotenuse of a right triangle, respectively, then $a^2 + b^2 = c^2$.
Proof. Put four copies of the triangle around a square of side $c$ to make a square of side
$a + b$. Comparing areas of the big square to the sum of the areas of the components we get:
$$(a + b)^2 = c^2 + 4\left(\tfrac{1}{2} ab\right).$$
Expanding and canceling terms shows that $a^2 + b^2 = c^2$.
Properly interpreted, the Pythagorean theorem suggests something about inner product
spaces. The Euclidean plane is simply the inner product space $\mathbb{R}^2$ and the sides of our
triangle are orthogonal vectors $u$ and $v$. In this form, the Pythagorean theorem states
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$
This is true in complete generality and it is one of the most fundamental properties of
abstract inner product spaces:
Theorem G.3 (Abstract Pythagorean Theorem). If $f$ and $g$ are orthogonal vectors in an
inner product space, then
$$\|f + g\|^2 = \|f\|^2 + \|g\|^2.$$
Proof. If $f, g$ are orthogonal, then $\langle f, g \rangle = 0$ by definition. Thus
$$\|f + g\|^2 = \langle f + g, f + g \rangle$$
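obviously holds for all $c \in \mathbb{R}$. It will be useful to find a constant $c$ such that
$\langle f - cg, cg \rangle = 0$. Expanding, we want
$$\begin{aligned}
0 &= \langle f - cg, cg \rangle \\
&= c \langle f, g \rangle - c^2 \langle g, g \rangle \\
&= c \left( \langle f, g \rangle - c \|g\|^2 \right),
\end{aligned}$$
which (for $g \neq 0$) is satisfied by
$$c = \frac{\langle f, g \rangle}{\|g\|^2}.$$
We obtain the orthogonal decomposition $f = cg + h$ where the vector
$$h = f - \frac{\langle f, g \rangle}{\|g\|^2} g$$
is orthogonal to $g$. Notice the important fact that $h = 0$ if and only if $f$ and $g$ are scalar
multiples of one another.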
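Here is a small worked instance of the orthogonal decomposition in $\mathbb{R}^3$ with the dot product. The vectors $f$ and $g$ below are arbitrary choices of ours; the sketch computes $c$ and $h$ and then checks that $h$ is orthogonal to $g$ and that the Pythagorean identity holds for $f = cg + h$.

```python
def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

f = [3.0, 1.0, 2.0]   # illustrative vectors, not taken from the notes
g = [1.0, 1.0, 0.0]

# c = <f, g> / ||g||^2 and h = f - c g
c = inner(f, g) / inner(g, g)
cg = [c * gi for gi in g]
h = [fi - cgi for fi, cgi in zip(f, cg)]

h_dot_g = inner(h, g)                       # should vanish
pythagoras_lhs = inner(f, f)                # ||f||^2
pythagoras_rhs = inner(cg, cg) + inner(h, h)  # ||cg||^2 + ||h||^2
print(c, h, h_dot_g, pythagoras_lhs, pythagoras_rhs)
```

For these particular vectors one finds $c = 2$ and $h = (1, -1, 2)$, so that $\|f\|^2 = 14 = 8 + 6$.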
G.5. The Cauchy-Schwarz-Bunyakowsky Inequality
One of the most useful inequalities in all of mathematics is the Cauchy-Schwarz-Bunyakowsky Inequality. In the west, the following has traditionally been known as the
Schwarz Inequality or the Cauchy-Schwarz Inequality. In Eastern Europe, it is frequently
called the Bunyakowsky Inequality. In light of this, many authors simply refer to it as the
CSB Inequality.
Theorem G.5 (Cauchy-Schwarz-Bunyakowsky Inequality). If $\langle \cdot, \cdot \rangle$ is an inner product on
$V$, then $|\langle f, g \rangle| \le \|f\| \|g\|$ for all $f, g \in V$. Equality holds if and only if $f$ and $g$ are scalar
multiples of one another.
Pf. #1. If either $f$ or $g$ is the zero vector, then the inequality is obviously true. Thus
it suffices to check the case where neither $f$ nor $g$ is zero. Write down the orthogonal
decomposition of $f$ with respect to $g$:
$$f = \frac{\langle f, g \rangle}{\|g\|^2} g + h.$$
Here the vector $h$ is orthogonal to $g$. The Pythagorean Theorem states that:
$$\begin{aligned}
\|f\|^2 &= \left\| \frac{\langle f, g \rangle}{\|g\|^2} g \right\|^2 + \|h\|^2 \\
&\ge \frac{|\langle f, g \rangle|^2 \|g\|^2}{\|g\|^4} \\
&= \frac{|\langle f, g \rangle|^2}{\|g\|^2}
\end{aligned}$$
which implies the CSB inequality. Equality holds in the CSB inequality if and only if
$h = 0$, which by the comment at the end of the preceding section implies that $f$ and $g$ are
scalar multiples of one another.
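The CSB inequality and its equality case can be illustrated numerically in $\mathbb{R}^5$ and $\mathbb{R}^3$ with the dot product. This is a sketch under our own assumptions (random test vectors with a fixed seed, a small floating-point tolerance); it illustrates the statement but proves nothing.

```python
import math
import random

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(inner(x, x))

random.seed(0)  # fixed seed so the run is reproducible
csb_holds = True
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # Allow a tiny tolerance for floating-point rounding.
    if abs(inner(x, y)) > norm(x) * norm(y) + 1e-12:
        csb_holds = False

# Equality case: g is a scalar multiple of f.
f = [1.0, 2.0, 2.0]
g = [-3.0 * t for t in f]
equality_gap = norm(f) * norm(g) - abs(inner(f, g))
print(csb_holds, equality_gap)
```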
Pf. #2. Let $f, g \in V$ and let $t \in \mathbb{R}$ be any real scalar. Furthermore, suppose that $f \neq 0$
and $g \neq 0$ to avoid any trivialities. Now observe that
$$p(t) = \|tf + g\|^2 \ge 0$$
is a real-valued function of the variable $t$ and furthermore $p(t) \ge 0$ for all $t$. We can use
the definition of the norm and some basic properties of inner products to derive an explicit
formula for $p(t)$:
$$\begin{aligned}
p(t) &= \|tf + g\|^2 \\
&= \langle tf + g, tf + g \rangle \\
&= \langle tf, tf \rangle + \langle tf, g \rangle + \langle g, tf \rangle + \langle g, g \rangle \\
&= t^2 \langle f, f \rangle + 2t \langle f, g \rangle + \langle g, g \rangle \\
&= \|f\|^2 t^2 + 2 \langle f, g \rangle t + \|g\|^2.
\end{aligned}$$
valid for all continuous functions $f, g$ on $[a, b]$. Try proving that directly!
i=1
i=1
Since $x, y \in \mathbb{R}^n$, we may use the CSB inequality for the standard inner product to get
$$|\langle x, y \rangle| \le \|x\| \|y\|,$$
which (when squared) yields exactly the strange inequality proposed above.
We proved that this proposed norm satisfies (i) and (ii) of the axioms for a norm, but we
never showed that (iii), the Triangle Inequality, was satisfied.
A fundamental theorem in plane geometry says that the sum of the lengths of two sides
of a triangle is always greater than the length of the other side. The following theorem
generalizes this idea to inner product spaces:
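Lemma 9 (Triangle Inequality). Let $V$ be an inner product space. If $f, g \in V$, then
$$\|f + g\| \le \|f\| + \|g\|.$$
Equality holds if and only if $f$ and $g$ are nonnegative scalar multiples of each other.
Proof. Using the CSB inequality in the next-to-last step, we have
$$\begin{aligned}
\|f + g\|^2 &= \langle f + g, f + g \rangle \\
&= \langle f, f \rangle + \langle f, g \rangle + \langle g, f \rangle + \langle g, g \rangle \\
&= \|f\|^2 + 2 \langle f, g \rangle + \|g\|^2 \\
&\le \|f\|^2 + 2 |\langle f, g \rangle| + \|g\|^2 \\
&\le \|f\|^2 + 2 \|f\| \|g\| + \|g\|^2 \\
&= (\|f\| + \|g\|)^2.
\end{aligned}$$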
APPENDIX H
Covering Compactness
It turns out that there is a completely different approach to the concept of compactness.
These notes give a brief introduction to this viewpoint.
H.1. Covering Compactness
Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$. We say that $S$ is covering
compact if, whenever $S$ is contained in the union of a collection of open subsets of $M$, $S$
is contained in the union of a finite number of these open subsets.
This definition is frequently stated as:
S is covering compact if every open cover of S has a finite subcover.
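Example H.1. Any finite set $S = \{x_1, x_2, \ldots, x_n\}$ in a metric space $(M, d)$ is covering
compact. Let $\{A_\alpha\}_{\alpha \in I}$ be an open cover of $S$. In other words, $I$ is an index set$^1$ and for
each $\alpha \in I$ we have an open subset $A_\alpha$ of $M$. Since
$$S \subseteq \bigcup_{\alpha \in I} A_\alpha,$$
it follows that each $x_i$ belongs to at least one of the $A_\alpha$. In other words, there exist
$\alpha_1, \alpha_2, \ldots, \alpha_n \in I$ so that $x_i \in A_{\alpha_i}$ for $i = 1, 2, \ldots, n$. In particular,
$$S \subseteq \bigcup_{i=1}^{n} A_{\alpha_i},$$
so that $\{A_{\alpha_1}, A_{\alpha_2}, \ldots, A_{\alpha_n}\}$ is a finite subcover of $S$.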
Example H.2. $(0, 1]$ is not covering compact since the open cover defined by
$$A_\varepsilon = (\varepsilon, 1 + \varepsilon), \qquad \varepsilon > 0$$
does not have a finite subcover which still covers $(0, 1]$. Indeed, take $n$ of the $A_\varepsilon$:
$$A_{\varepsilon_1}, A_{\varepsilon_2}, \ldots, A_{\varepsilon_n}$$
and note that
$$x < \min\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\} \quad \Longrightarrow \quad x \notin \bigcup_{i=1}^{n} A_{\varepsilon_i}.$$
In other words, the union of any finite number of the $A_\varepsilon$ excludes points of $(0, 1]$ which
are sufficiently close to zero. Since there exists an open cover of $(0, 1]$ which cannot be
refined to produce a finite subcover of $(0, 1]$, it follows that $(0, 1]$ is not covering compact.
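The argument in Example H.2 is easy to act out on a computer: given any finite list of values $\varepsilon_1, \ldots, \varepsilon_n$, one can exhibit a point of $(0, 1]$ that none of the corresponding intervals contains. In the sketch below the helper names and the particular list of $\varepsilon$ values are our own.

```python
def in_A(eps, x):
    """True if x lies in the open interval A_eps = (eps, 1 + eps)."""
    return eps < x < 1 + eps

def missed_point(eps_list):
    """A point of (0, 1] lying below every eps in the list.

    Half of the smallest eps works; the cap at 1.0 keeps the point inside
    (0, 1] even if every eps in the list is large.
    """
    return min(min(eps_list), 1.0) / 2

eps_list = [0.5, 0.1, 0.01]   # an arbitrary finite subfamily
x = missed_point(eps_list)

covered_by_finite_family = any(in_A(eps, x) for eps in eps_list)
# The full family still covers x: it lies in A_eps for eps = x / 2.
covered_by_full_family = in_A(x / 2, x)
print(x, covered_by_finite_family, covered_by_full_family)
```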
$^1$ The index set can be finite, countably infinite, or even uncountable; there are no restrictions.
forms an open cover of $S$. Since $S$ is covering compact, there exists a finite subcover
$$\{B_{\varepsilon_1}(a_1), B_{\varepsilon_2}(a_2), \ldots, B_{\varepsilon_n}(a_n)\}$$
of $S$. However, each $B_{\varepsilon_i}(a_i)$ contains only finitely many terms of the sequence $x_n$. On the
other hand, since
$$S \subseteq \bigcup_{i=1}^{n} B_{\varepsilon_i}(a_i),$$
it follows that the sequence $x_n$ assumes only finitely many values, a contradiction.
The proof that sequential compactness implies covering compactness is significantly
more difficult (it would take a couple pages) and is therefore omitted.
H.3. Total Boundedness
Definition. A set $S \subseteq M$ is totally bounded if for each $\varepsilon > 0$ there exists a finite covering
of $S$ by $\varepsilon$-balls. In other words, $S$ is totally bounded if there exist $x_1, x_2, \ldots, x_n \in M$
such that $S \subseteq \bigcup_{i=1}^{n} B_\varepsilon(x_i)$.
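For a concrete instance of the definition, take $S = [0, 1]$ in $\mathbb{R}$ with the usual metric. The sketch below builds a finite set of centers spaced $\varepsilon$ apart and checks on a grid of sample points that every sample lies in some open $\varepsilon$-ball. The helper names, the value $\varepsilon = 0.3$, and the sampling grid are assumptions made for illustration.

```python
import math

def epsilon_net(eps):
    """Centers 0, eps, 2*eps, ... continuing until the last one is at least 1."""
    count = math.ceil(1.0 / eps) + 1
    return [k * eps for k in range(count)]

def is_covered(x, centers, eps):
    """True if x lies in the open ball B_eps(c) for some center c."""
    return any(abs(x - c) < eps for c in centers)

eps = 0.3
centers = epsilon_net(eps)

# Check coverage on a fine grid of sample points in [0, 1].
samples = [j / 1000 for j in range(1001)]
all_covered = all(is_covered(x, centers, eps) for x in samples)
print(len(centers), all_covered)
```

Every point of $[0, 1]$ is within $\varepsilon / 2$ of one of these centers, which is why finitely many balls suffice for each fixed $\varepsilon$.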
Theorem H.2. Let $(M, d)$ be a metric space and let $S \subseteq M$. The following are equivalent:
(i) S is (sequentially) compact
$^2$ What we have been referring to as compactness.
If $M = \mathbb{R}^n$ and $d$ is the Euclidean metric, then the three conditions above are equivalent
to $S$ being closed and bounded.
There are essentially two totally different ways of looking at compactness. We have
chosen to use the sequential approach because it is somewhat more intuitive. The covering
approach is a little more abstract and difficult to motivate. Nevertheless, the concept of covering compactness is open to greater generalization. When one studies point-set topology
(typically in graduate school), one no longer considers metric spaces, but rather topological
spaces where open and closed sets exist, but there is no notion of distance. Consequently,
the notion of compactness one encounters there is actually covering compactness.
For each theorem about compactness which we proved using the sequential definition,
there is typically a corresponding proof which uses the covering definition. For example:
Theorem H.3. A continuous function on a compact metric space is uniformly continuous.
Pf. (via Covering Compactness). Let $(A, d_A)$ be compact and let $f : A \to B$ be continuous. For each $\varepsilon > 0$ and for each $x \in A$ there exists a number $\delta(x) > 0$ so that
$d_A(x, y) < \delta(x)$
The open balls $B_{\delta(x)/2}(x)$ form an open cover of $A$ since $x \in B_{\delta(x)/2}(x)$ for each $x$. Since
$A$ is (covering) compact, there exist $x_1, x_2, \ldots, x_n \in A$ so that
$$A \subseteq \bigcup_{i=1}^{n} B_{\delta(x_i)/2}(x_i).$$
Now let
(xi )
2
(xi )
2
+
+
(xi )
2
= (xi ).
Therefore
$d_A(x_i, y) < \delta(x_i)$ and $d_A(x, x_i) < \delta(x_i)$
which implies that
dB (f (xi ), f (y)) <
Putting this all together, we have shown that x y < implies that
Since this $\delta$ depends only upon $\varepsilon$ (and not $x$ or $y$) it follows that $f$ is uniformly continuous.
$$M = \bigcup_{n=1}^{\infty} A_n^c$$
and hence $\{A_n^c : n \in \mathbb{N}\}$ is an open cover of $M$. Since $(M, d)$ is compact, it follows that
the open cover $\{A_n^c : n \in \mathbb{N}\}$ has a finite subcover. In other words, there exist
$$n_1 < n_2 < \cdots < n_m$$
so that
Since
it follows that
whence
which is a contradiction.
Anm ,
$^3$ The important part of the theorem is the assertion that the intersection is nonempty!