
Introduction to Proof

Steven Spallone

Preface

What is mathematics? Two friends and I pondered this question and came up
with the statement "Mathematics is pattern recognition and deduction applied to
numbers and geometry." We believe we observe phenomena of various sorts, and
wish to convince ourselves and others that these phenomena are real. Perhaps we
notice that multiplying two odd numbers tends to produce another odd number, or
that two triangles with proportional sides tend to have the same angles as well. A
curious student should want to know, not only whether these phenomena are true,
but also why they are so. Part of our civilization's great heritage is the observation
and proof of such things. Over the years a standard language has arisen, which is
quite satisfactory to all but the most extreme skeptics. It is the purpose of this
book to introduce students to this language and methods of mathematical proof.

What is a proof? According to Steven Krantz [6], "A proof in mathematics is a
psychological device for convincing some person, or some audience, that a certain
mathematical assertion is true." Thus, it varies according to whom you're telling
it. For instance, if I wanted to convince a mathematics professor that "The nth
derivative of x^n is n!", I would simply say that it is true by induction. If I
wanted to prove this fact to a typical high school student, I would have to put in
considerably more work. Now there is an interesting issue here. It is plausible that
I could convince this student by demonstrating that it is true for n = 1, 2, 3, 4,
and 5, and perhaps subtly intimidating him into acquiescence. But that would be a
bad deed; of course recognizing a pattern is a different act than proving the pattern
persists. In this book we strive to only give good proofs.

The first chapter treats mathematical grammar, elementary logic, basic proof tech-
niques such as deduction and contradiction. This is followed by a review of (naive)
set theory, and then mathematical induction. At this point students have gained
some skill with proofs and are ready to learn theory building.

In Chapter 2, we no longer assume an easy familiarity with numbers, as we plan
to develop elementary number theory from very simple beginnings. We present the
Peano theory of the natural numbers N, based on only two simple axioms and the
principle of induction. Addition and multiplication are defined recursively and we
prove everything straight through to the Fundamental Theorem of Arithmetic.

Chapter 3 is a study of functions and relations. Particularly important is an
introduction to the theory of equivalence classes. The chapter ends with the beginnings
of cardinality theory.

The problems throughout the text are a compilation from old homework, exam,
and bonus problems, although I have stripped away hints and demands for rigor.
I feel that it is the instructor's place to adapt the problems to the class. The self-
studying student should be warned that many of the problems are difficult, and
should not get hung up on the toughies, which I place at the end of the chapters.

Many thanks are due to Ben Walter for teaching out of an earlier version of the
text, and for several of the problems. At present, these notes are being used by the
author for a course at the Indian Institute of Science Education and Research in
Pune.

If you find errata please e-mail them to me and I will thank you and try to update
the notes appropriately.

Steven Spallone
Contents

Preface

Chapter 1. Naive Logic
1. Introduction
2. Mathematical statements
3. Implication
4. Propositional Calculus of a Single Variable
5. Exploiting Symmetry in Proofs
6. Some Game Theory
7. Sets
8. Induction
9. Chapter 1 Wrap-up

Chapter 2. Arithmetic
1. Introduction
2. The Natural Numbers N
3. The Division Algorithm
4. The Division Algorithm
5. Superlatives
6. Euclidean Algorithm
7. Strong Induction
8. Place-Value Systems
9. The Fundamental Theorem of Arithmetic
10. Chapter 2 Wrap-up

Chapter 3. Functions and Relations
1. Relations
2. Composition of Relations
3. Functions
4. Functions as Relations
5. Partially Ordered Sets
6. Chapter 3 Wrap-up

Chapter 4. Cardinality
1. Finite and Infinite Sets
2. Countable Sets
3. Uncountable Sets
4. Interlude on Paradoxes
5. Some History
6. Chapter 4 Wrap-up

Chapter 5. Equivalence
1. Equivalence Relations
2. The Positive Rationals Q+

Chapter 6. Rings
1. Abstract Algebra
2. Rings
3. Abstract Linear Algebra
4. Chapter Wrap-Up

Chapter 7. Polynomials
1. Polynomials
2. Polynomials over a Field
3. Irreducibility in C[x]
4. Irreducibility in R[x]
5. Irreducibility in Q[x]
6. Z[x]
7. Rational Functions
8. Composition of Polynomials
9. Chapter 7 Wrap-Up

Chapter 8. Real Numbers
1. Constructing R
2. Ordered Fields
3. Decimal Expansions
4. Dedekind Cuts

Chapter 9. Miscellaneous
1. An ODE Proof
2. Pythagorean Triples

Bibliography
CHAPTER 1

Naive Logic


1. Introduction

2. Mathematical statements

One of the great features of mathematics is that every problem in the subject has
a correct answer. Every statement is either true or false. Here, for instance, are
some mathematical statements. Do you think they are true or false?

(1) 100^101 > 101^100.
(2) 1 + 1/2 + 1/3 + ··· + 1/10 = 3.
(3) A regular icosahedron has 30 edges.
(4) 2 = 10.

You may or may not know whether these statements are true or false, but you
should believe that there is a correct answer. Mathematicians presume that ev-
ery mathematical statement is either true or false. This philosophy goes back to
Aristotle, and is called the Law of Excluded Middle.

Compare this, for instance, to the statements "I have two hands." and "My dog
Diogi is friendly." I consider that these are true statements. Any normal person
would agree with the first statement. But many people, and certainly many other
dogs, would consider the second statement to be false. Moreover I'm afraid I would
be unlikely to convince them that it is a true statement.

Even with the first statement, one could imagine, for instance, a paranoid con-
spiracy theorist who believes that professors of mathematics have another hand
hidden somewhere. Most real-world statements are subjective, or debatable on
some level. One thinks of the philosopher Descartes who strives to prove that he
exists! Nonetheless, the logic we develop in this text applies so well to common
real-world situations that we will often spice up this chapter with nonmathematical
statements. Indeed, there are many applications of logic, such as law, which benefit
from this theory. The magnanimous reader will not be offended by the subjectivity
of my real-world examples.

What exactly is meant by the term "mathematical statement"? For the purposes
of this text, it means a grammatically correct English sentence which only con-
cerns mathematical objects. It should end with a period/full stop. There should
be a subject and a verb. (In the mathematical statements above, the verbs are
"is", "equals", "has", and "equals".) The audience should, in principle, understand
precisely what is meant when reading it. We say that putative (mathematical)
statements are not "well-formed" if they fail to impart precise meaning to their
audience. So for instance,

"3 + 7.",
"The real number 3 + 7 is awesome."

are not well-formed statements. The first fails because it is not even grammatically
a sentence (there is no verb), and the second fails because the audience presumably
does not know what makes a number awesome. This second sentence can be

remedied by preceding it with a sentence that explains the unfamiliar word. Con-
sider the two statements: "We say a real number is awesome, provided that it is
greater than 2. The real number 3 + 7 is awesome." The first sentence defines the
term "awesome", and now the second statement is well-formed.

Note: I will often omit the adjective "mathematical" from the word "statement",
when it is understood from context. The word "proposition" is a synonym for
"statement", although later on it takes on the connotation of a statement that
should be proved or disproved.

There is an important distinction I want to make: If a statement is well-formed, it
is true or false. So it is common to have a false well-formed statement. For instance,
"The real number 3 − 7 is positive." is a well-formed statement, even though it is
false. When discussing logic, about half of our statements ought to be false. Please
pay attention to context.

Note: In a more serious course in logic, meticulous care would be taken in expli-
cating the precise rules for making well-formed statements. There is good reason
for this, which we discuss in the Section Interlude on Paradoxes. However, this
is an enormous endeavor that we will not take on; we will be content with naive
logic.

2.1. Equality versus Equivalence

Let P and Q be statements. (I am using here, much as in algebra, a variable
to denote an entire statement. It is not meant to be a number, or any other
mathematical object; it is an entire statement.) Let us be very strict by saying that
P equals Q, written P = Q, provided that they are precisely the same statement,
word for word and symbol for symbol. So if P is "I like cats and dogs.", and Q is
"I like dogs and cats.", then P ≠ Q, although they mean the same thing. It is very
easy to tell whether two statements are equal.

We will rarely worry about whether two statements are literally equal, since it is
just too strict. A more useful notion is that of equivalence. We say that two
statements are equivalent provided that they mean the same thing. Let us write
P ≡ Q if P and Q are equivalent statements, as in the cats and dogs example
above. Certainly P and Q are not equivalent to the statement "I hate cats." The
statements "5! > 10^2." and "10^2 < 5!." are equivalent. The statements "2 ≠ 10."
and "2 < 10 or 2 > 10." are equivalent.

It is not always easy to tell whether two statements are equivalent. Equivalence is
something that needs to be proved. For instance, from the statement P: "5! > 10^2.",
one can deduce the statement Q: "4! > 20." by dividing both sides of the inequality
by 5, and one can deduce P from Q by multiplying both sides by 5. Since we can
deduce one from the other, we may conclude that P ≡ Q.

To define equivalence a bit more mathematically, if we are given statements P
and Q, we may consider P ≡ Q, or "P is equivalent to Q.", as a third statement,
whose truth value is given by the following table:

P   Q   P ≡ Q
T   T     T
T   F     F
F   T     F
F   F     T

This is an example of a truth table. Given truth values (true or false) of state-
ments, you read the table to find the truth value of the new statement. Caution:
Truth tables rely fundamentally on the Law of Excluded Middle, and so applying
this definition of equivalence to real world examples can lead to nonsense.
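For readers who like to experiment, a table like this can also be generated mechanically. Here is a minimal Python sketch (my own illustration, not part of the text's formal development): it treats each statement's truth value as a Boolean and prints the table for P ≡ Q, using the fact that equivalence holds exactly when the two truth values agree.

    from itertools import product

    def equiv(p, q):
        # P ≡ Q is true exactly when P and Q have the same truth value
        return p == q

    print("P      Q      P ≡ Q")
    for p, q in product([True, False], repeat=2):
        print(f"{str(p):6} {str(q):6} {equiv(p, q)}")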

Note that any two true (mathematical) statements are equivalent; for instance the
statement "1 + 1 = 2." is equivalent to the statement "A triangle has three sides."
Similarly, any two false statements are equivalent. Traditionally, we take "0 = 0."
as the simplest true statement and "0 = 1." as the simplest false statement. So,
any mathematical statement is equivalent to one of these.

2.2. Negation

Suppose that P is a given statement. Write Q for the statement "P is true.", and
R for the statement "P is false." The truth table for P and Q is:

P   Q = "P is true."
T   T
F   F

We can see from this table that Q is equivalent to P , since Q agrees with P whether
P is true or false.

On the other hand, R is certainly not equivalent to P. R is called the negation of
P; we use the notation R = ¬P. Another truth table gives the truth values of ¬P
in terms of those of P:

P   ¬P
T   F
F   T

You can read this table as, "If P is true then ¬P is false. If P is false then ¬P is
true."

For instance suppose P is the statement "100^101 > 101^100." Then ¬P is the
statement "100^101 > 101^100 is false." This is equivalent to the statement
"100^101 ≤ 101^100." If you are asked to negate a statement, you should find
a statement equivalent to its negation which is as simple as possible. Of course this
is a matter of taste as to what is simplest.

Here are two statements obviously equivalent to the negation of the statement
"2 = 10.":

(1) 2 ≠ 10.
(2) Either 2 < 10 or 2 > 10.

Remark: The statement is false, so strictly speaking any true statement is a nega-
tion here. But the given negations are good answers, assuming that the audience
doesn't know the statement is false. It is always easy to give a reasonable negation
of a mathematical statement, as we will see, whether or not we know it to be true.

If P is a statement, then what is a good negation of ¬P? Call the negation Q. If
P is true, then ¬P is false, and so Q is true. Similarly, if P is false, then Q is also
false. Looking over what we just said, we see that Q is equivalent to P. We have
proven our first theorem:
Theorem 1.1. Let P be a statement. Then ¬(¬P) ≡ P.

We can also prove this theorem more mechanically by constructing a truth table.

P   ¬P   ¬(¬P)
T   F      T
F   T      F

Note that this theorem relies heavily on the Law of Excluded Middle. In everyday
speech, we might say something like, "I'm not hungry but I'm not not hungry." to
indicate that the statement "I am hungry." is neither true nor false.

2.3. Conjunction and Disjunction

Mathematicians share a very precise language. Subtle ambiguities can creep into
the English language, for example with the word or. If you say,

Every day, Steven eats dahl or Steven drinks lassi.,

does this assertion include the possibility that on Saturday I might consume both
dahl and lassi?

In mathematics, we do include the possibility of both, with the word "or". If P
and Q are statements, then P ∨ Q is the statement "P or Q is true." Here is the
truth table:

P   Q   P ∨ Q
T   T     T
T   F     T
F   T     T
F   F     F

From the table, if P is true and Q is false, then the statement "P or Q" is true. If
P is false and Q is true, then the statement "P or Q" is true. If P is false and Q is
false, then the statement "P or Q" is false.

If P and Q are statements, then P ∨ Q is called the disjunction of P and Q. It is
a way of forming a new statement from two other statements.

Next is the mathematical "and", which is represented with the symbol ∧. This
combination works the way you'd expect; here is the truth table:

P   Q   P ∧ Q
T   T     T
T   F     F
F   T     F
F   F     F

Thus, P ∧ Q is true only when both P and Q are true. The statement P ∧ Q is
called the conjunction of P and Q.

2.4. Truth Table Proofs

Proposition 1.2. Let P, Q be statements. Then ¬(P ∧ Q) ≡ (¬P) ∨ (¬Q).

[Negation of and/or statements, tautology, absurdum]
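As a quick sanity check (not a substitute for the truth-table proof this subsection calls for), one can let a computer run through all four rows. A minimal Python sketch, assuming the reading of Proposition 1.2 given above:

    from itertools import product

    # Check that ¬(P ∧ Q) and (¬P) ∨ (¬Q) agree in every row of the truth table.
    rows_agree = all(
        (not (p and q)) == ((not p) or (not q))
        for p, q in product([True, False], repeat=2)
    )
    print(rows_agree)  # True: the two statements are equivalent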

2.5. Exercises

Use truth tables to prove the following identities, for any statements P, Q, R.

(1) ¬(¬P) ≡ P.
(2) (P ∨ Q) ≡ (Q ∨ P).
(3) Prove that P ∨ Q is equivalent to ¬((¬P) ∧ (¬Q)).
(4) Prove that (P ∨ Q) ∨ R is equivalent to P ∨ (Q ∨ R). Then the same for
∧ instead of ∨. Is it true if we replace ∨ with ⇒? [move]
(5) Find combinations of P, Q, ∨, ∧, ¬ which give each of the 16 possible
columns of truth values, given the four possible truth values of P and Q.
(This is meant as a group exercise.)
(6) Of the sixteen different combinations of P and Q from the previous prob-
lem, how many are both commutative and associative?
(7) Let P, Q, and R be statements. Prove that P ∧ (Q ∨ R) is equivalent to
(P ∧ Q) ∨ (P ∧ R), and that P ∨ (Q ∧ R) is equivalent to
(P ∨ Q) ∧ (P ∨ R).
The Vellerman exercise

3. Implication

The phrase "P implies Q", written P ⇒ Q, is typically confusing for students, who
may confuse it with its English usage. I recommend you substitute the phrase
"The truth of P implies the truth of Q." whenever you are perplexed. As with the
connectives ∨ and ∧, we may define it with a truth table:

P   Q   P ⇒ Q
T   T     T
T   F     F
F   T     T
F   F     T

Please note that whenever P is false, then the statement P ⇒ Q is true, contrary
to what you might think. So under this convention any false statement implies any
other statement. The only time that P ⇒ Q is false is when P is true and Q is
false. Under this convention, the following statements are true:

If this is the year 1986, then the Earth has two moons.
If this is the year 1986, then the Earth has one moon.
If the earth has one moon, then there are 24 hours in a day.

Here is a false statement: If there are 24 hours in a day, then the Earth has two
moons.

[ converse, contrapositive. Square of Opposition ]

P   Q   P ⇒ Q   Q ⇒ P   ¬P ⇒ ¬Q   ¬Q ⇒ ¬P
T   T     T       T        T          T
T   F     F
F   T     T
F   F     T

3.1. A Syllogism

Now we will prove a form of deduction using truth tables. We will prove that
the statement

[(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)

is always true. Before doing the proof, let us apply it to the following statements.

P : I eat an entire chocolate cake.

Q: I get sick.

R: I will not win the wrestling tournament.

Let's say that you believe that P implies Q, and also that Q implies R. (Do you?)
Then you should also believe that if I eat an entire chocolate cake, then I will not
win the wrestling tournament. Combining two implications in this way is one of the
logical exercises going back to Aristotle, called syllogisms. Presumably you have
already mastered this in some intuitive form.

Here we prove it using truth tables.



P  Q  R  P⇒Q  Q⇒R  P⇒R  (P⇒Q)∧(Q⇒R)  [(P⇒Q)∧(Q⇒R)]⇒(P⇒R)
T  T  T   T    T    T        T                T
T  T  F   T    F    F        F                T
T  F  T   F    T    T        F                T
T  F  F   F    T    F        F                T
F  T  T   T    T    T        T                T
F  T  F   T    F    T        F                T
F  F  T   T    T    T        T                T
F  F  F   T    T    T        T                T

As you can see, no matter what the values of P , Q, and R, the value of the final
column is always true. Reflect on this and see if you agree that this is a proof. A
tautology is a formula of propositional logic which is always true, regardless of the
truth values given to the propositional variables. The statement [(P ⇒ Q) ∧ (Q ⇒
R)] ⇒ (P ⇒ R) is a tautology. You will see more examples in the exercises below.
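The same brute-force idea gives a mechanical tautology check: run through all eight truth assignments and confirm that the final column is always true. A small Python sketch (an illustration of the method; the helper "implies" is mine, defined from the truth table above):

    from itertools import product

    def implies(p, q):
        # the truth table of P ⇒ Q: false only when P is true and Q is false
        return (not p) or q

    syllogism = all(
        implies(implies(p, q) and implies(q, r), implies(p, r))
        for p, q, r in product([True, False], repeat=3)
    )
    print(syllogism)  # True: [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R) is a tautology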

3.2. Exercises

(1) Use truth tables to check that the following are tautologies. Also explain
why they are true with common sense.
P ⇒ P.
P ⇒ P ∨ Q.
(P ∨ Q) ∧ (¬P ∨ R) ⇒ Q ∨ R.
[(P ⇒ Q) ∧ P] ⇒ Q.
Remark: This last one is another famous example of a syllogism.

4. Propositional Calculus of a Single Variable

4.1. Quantifying

The first thing a math student should learn is how to write a mathematical state-
ment. This is different from writing a mathematical expression or equation. It is
important that you introduce variables with care. For example, merely writing
(x + y)^2 = x^2 + y^2
is very bad, but not for the reason you might think. The main reason it is bad is
because we have not introduced the variables x and y. Are they real numbers?
Complex numbers? Matrices? A better statement, grammatically, would be

"For all positive numbers x and y, we have (x + y)^2 = x^2 + y^2."

We will call such a statement well-formed. It is false. For the moment, our
priority is to make grammatically correct statements, which may be true or false.
If a statement is not well-formed, then we do not ask whether it is true or false; we
send it back to the author asking for a revision. I hope when you read this book,
and other books, that you appreciate the care that is made to make well-formed
statements.

Here we make a true well-formed statement:

"Let x = 0 and y = 2; then (x + y)^2 = x^2 + y^2."

Please remember to always initialize your variables in such a fashion. This is
extremely important to do, both to communicate with your audience and also to
clarify your own thinking.

Here are some good ways to introduce, or quantify a variable:

Give it a specific value. ("Let x = 4.")
Allow it to be some element of some set. ("Let x be a real number.")
Allow the variable to represent all values in a given set, e.g., ("For all real
numbers x, the number x^2 is negative.")
State that what follows is true for some value in a given set, e.g., ("There
exist positive numbers x and y so that log(x + y) = log(x) + log(y).")

The expressions "For all" and "There exist" are ubiquitous in mathematics. They
are called quantifiers, and get the special symbols ∀ and ∃, respectively. You
should look for them, or their implication, throughout mathematics.

As another example, note that the constant C in Proposition 9.1 is introduced with
the ∃ quantifier.

Definition. We say f is even provided that ∀ real numbers x, we have
f(−x) = f(x). We say f is odd provided that ∀ real numbers x, we have
f(−x) = −f(x).

Here are some examples (which we'll study later more thoroughly) from the theory
of factorization:
Definition. Let d and n be integers. Then d divides n provided that
∃ an integer e so that de = n. Let n > 1 be an integer. Then n is prime
provided that ∀ divisors d > 0 of n, either d = 1 or d = n.

4.2. The First Principle of Analysis

It is important to understand how to combine these quantifiers, particularly in the
area of mathematics called analysis.

Let's start with a benign concept: When is f a constant function?

Definition. A function f is constant provided that ∃ C a real number
so that ∀ real numbers x, we have f(x) = C.

The order in which the quantifiers ∃, ∀ are used is crucial; a different order can
completely change the meaning of the statement. For example, which functions
f : R → R satisfy:

"For all real numbers x, there exists C a real number so that f(x) = C"?

All functions f : R → R do. For instance, consider f(x) = x^2. Then given a real
number x, there exists C (= x^2) so that f(x) = C.

This may be confusing to you, so let's look more carefully at these examples. Keep in
mind that once you quantify a variable in a (normal) sentence, it remains quantified
that way for the rest of the sentence. Let me rewrite these two sentences with
parentheses to clarify this.

(1) ∃ C a real number so that (∀ real numbers x, (we have f(x) = C)).
(2) ∀ real numbers x, (∃ C a real number so that (f(x) = C)).

In (1), the choice of C must be made before x is introduced; it must therefore hold
for all x at once. In (2), within the first set of parentheses, the choice of x has been
fixed, and we only need to choose C to work with this choice.
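For readers who like a computational analogy: over a finite domain, the two quantifier orders become nested any/all loops, and the difference in meaning is visible immediately. A small Python sketch (my own illustration, using a toy finite domain rather than all real numbers):

    # f(x) = x**2 on a small domain, standing in for a function on the reals
    domain = [-2, -1, 0, 1, 2]

    def f(x):
        return x ** 2

    values = [f(x) for x in domain]

    # (1) "∃ C so that ∀ x, f(x) = C": the same C must work for every x at once
    statement_1 = any(all(f(x) == C for x in domain) for C in values)

    # (2) "∀ x, ∃ C so that f(x) = C": C is allowed to depend on x
    statement_2 = all(any(f(x) == C for C in values) for x in domain)

    print(statement_1, statement_2)  # False True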

Let's do another example. Are the following statements true or false?

(1) ∀ positive real numbers x, ∃ a positive real number y so that y < x.
(2) ∃ a positive real number y so that ∀ positive real numbers x, we have
y ≤ x.

Certainly (1) is true; given x > 0 we may take y = x/2. What about (2)? Is there a
positive number so small that no other positive number is less? Certainly not, and
this fact is so important that we will give a formal proof. The proof will be our
first example of a Proof by Contradiction.

Proof Strategy: Proof by Contradiction (reductio ad absurdum)

In mathematics, either a statement is true or it is false. There is no middle ground.
Here is how a proof by contradiction works. We have a statement, and we want
to prove that it is true. But instead of directly deducing the statement, we take
another approach. We add as a hypothesis that the statement is false, and then
logically deduce an absurd (false) statement; this is the contradiction. Thus we
then see that the statement must be true.

This is best understood through examples.

Proposition 1.3. (First Principle of Analysis) Let x ≥ 0 be a real number. Sup-
pose that for all real numbers y > 0 we have x ≤ y. Then x = 0.

Proof. Suppose x > 0. Then (1/2)x > 0. By hypothesis, x ≤ (1/2)x (since x isn't
bigger than any positive number). Since x > 0 we may divide by x to obtain 1 ≤ 1/2,
which is absurd.

Therefore it is impossible that x > 0, and we conclude that x = 0. □

Read this short proof a few times to make sure you understand it.

Let us give an application of the First Principle of Analysis. Many people in the
world do not believe that 0.9999... = 1. (Do you?) Suppose you find yourself
locked in a debate with a nonbeliever; here is one argument to explain why it is so.
Let x = 1 − 0.9999...; we can call it the "niggling number". Get your opponent to
agree that x ≥ 0. Then get him to think about what the decimal expansion of x
must be. With some reflection, he should agree that it will begin x = 0.000..., and
that it will begin with as many 0s as you like. (He will start to give up, but may
still feel like something is happening at infinite places...) Now, if y is any positive
number, it can't be smaller than x, because it will have some nonzero decimal digit
somewhere. So your gracious opponent will agree that x ≤ y for all positive y. Now
you have him. You say that therefore x ≤ (1/2)x, and so if he still doesn't believe that
x = 0 you divide by x and show that his way leads to the madness of 1 ≤ 1/2. Done.

4.3. Exercises

(1) Negate the statement, "There is a real number x so that for all real num-
bers y, we have x ≤ y." Then prove your negated statement.
(2) Negate the statement "∀ even integers n > 2, there exist prime numbers
p1, p2 so that n = p1 + p2."
(3) Consider the statement, "For every real-valued function f : R → R there
is a constant k > 0 so that for all x ∈ R, we have f(x) ≤ |kx|." Is this
statement true or false?
(4) Suppose that f, g : R → R are functions satisfying f(x) = g(y) for all
numbers x, y. Prove that f and g are both constant functions.
(5) Suppose that f, g : R → R are twice differentiable functions which are
nowhere 0. Let u(x, y) = f(x)g(y). Suppose that
∂²u/∂x² + ∂²u/∂y² = 0.
Prove that there exists a constant C so that for all x, we have f''(x) =
Cf(x) and g''(y) = −Cg(y). (You should use the previous exercise.)

5. Exploiting Symmetry in Proofs

Proof Strategy: Without loss of generality. Often one is faced with two or
more possibilities in a proof. If all possibilities are symmetric, then we may say
"Without loss of generality we may assume" (WLOG WMA) to assume one of these
possibilities. The (implied) proofs for the other cases must be exactly the same,
except for the symmetry.

Here is an example of a proof with three usages of WLOG WMA. See if you un-
derstand it; after the proof we will examine the symmetry behind these usages.
Proposition 1.4. Suppose the complete graph on 6 vertices has each edge colored
red or blue. Then there must be either a red triangle or a blue triangle.

Proof. Label the vertices A, B, C, D, E, F as in the figure. [Sorry...please


make your own figure] There are five edges from A so there are either at least 3 red
edges from A, or there are at least 3 blue edges from A. (For otherwise there would
be no more than 4 edges from A, a contradiction.) WLOG WMA there are at least
three red edges from A. WLOG WMA that the edges AB, AC, and AD are red.
Now consider the edges BC, CD, and BD. Suppose one of these is red. WLOG
WMA that BC is red. Then △ABC is a red triangle. If none of these edges are
red, then they are all blue, and then △BCD is a blue triangle. Therefore in all
cases, there is a red or a blue triangle. □
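Proposition 1.4 can also be confirmed by sheer enumeration: there are only 2^15 = 32768 ways to color the 15 edges of the complete graph on 6 vertices. The following Python sketch (an independent check, not the proof above) tests every coloring for a monochromatic triangle.

    from itertools import combinations, product

    vertices = range(6)
    edges = list(combinations(vertices, 2))        # the 15 edges of K6
    triangles = list(combinations(vertices, 3))    # the 20 triangles

    def has_mono_triangle(coloring):
        color = dict(zip(edges, coloring))
        return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
                   for a, b, c in triangles)

    # True: every red/blue coloring contains a red or a blue triangle
    print(all(has_mono_triangle(c) for c in product("RB", repeat=len(edges))))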

Let's now examine the WLOG WMA symmetries.

If there are three blue edges from A, we may switch the roles of red/blue
for the rest of the proof. This justifies the first WLOG WMA.
If three red edges from A are connected to P1, P2, P3 instead of B, C, D,
then we may switch the roles P1 ↔ B, P2 ↔ C, and P3 ↔ D for the rest
of the proof. This justifies the second WLOG WMA.
If CD is red, we may apply the permutation
B → C → D → B,
and if BD is red, we may switch the roles via C ↔ D. This justifies the
third WLOG WMA.

5.1. Exercises

For the first 5 problems, suppose the complete graph on n vertices has each edge
colored red or blue.

(1) For which n is there necessarily a monochromatic triangle?


(2) If n = 5 and some vertex has 4 edges of the same color, then there is a
monochromatic quadrilateral.
(3) If n = 6 then there is a monochromatic quadrilateral. (Harder, use the
previous problem.)
(4) If n = 5 and every vertex has 2 red edges and 2 blue edges, then there is
a monochromatic pentagon.

(5) If n = 6 there are actually at least two monochromatic triangles.


(6) A magic square is an n×n grid in which each of the numbers {1, 2, 3, ..., n²}
is used once and the sum of each row, column, and diagonal is the same.
Find all possible 3×3 magic squares. Prove that you have done so. (What
is the common sum? What must the middle square be?)
(7) Suppose that 17 friends are standing in a circle holding strings so that
every pair of friends is sharing a string. Each string is colored either
red, yellow, or blue. Prove that there exists either a red triangle, a yellow
triangle, or a blue triangle in this arrangement.

6. Some Game Theory

The theory of the previous section suggests some pleasant graph games.

Activity: The Game of SIM


SIM requires a piece of paper, two players, and two writing utensils with
two different colors, say red and blue. First mark 6 dots, evenly dis-
tributed around a circle. Players take turns, with Player #1 going first.
Player #1 uses red and Player #2 uses blue. On a given turn, two dots
are connected with a straight red or blue line. You may not connect the
same two dots twice. If a red triangle is created, then Player #2 wins,
and if a blue triangle is created, then Player #1 wins.

By Proposition 1.4, eventually one of the players will win. (Compare this to Tic-
Tac-Toe, where a game between experienced players should end in a tie.)

There are many variations of SIM: one may vary the number of dots, or one may
pick other shapes besides triangles, or one may play MIS, in which the first person
to get a triangle (or other shape) wins instead of loses.

Note that it is possible for 5-dot SIM to end in a tie, since the outcome could simply
be a red pentagon and a blue pentagon.
Proposition 1.5. There is a strategy for Player #1 to always win 5-dot MIS.

Proof. Write the vertices as A, B, C, D, E. Player #1 may start with edge
AB. WLOG WMA that Player #2 plays either BC or CD.

Case I: Player #2 plays BC. In this case, Player #1 plays AD, forcing Player #2
to play BD. Player #1 plays CD, then Player #2 must play AC. Player #1 now
plays DE for the win, since on his/her next move, either AE or CE makes a red
triangle.

Case II: Player #2 plays CD. Player #1 plays AE, forcing Player #2 to play
BE. Player #1 now plays AD for the win, since on his/her next move, either DE
or BD makes a red triangle. □

Corollary 1.6. If n > 5, there is a strategy for Player #1 to always win n-dot
MIS.

Proof. Red simply picks 5 of the n dots and follows the previous strategy. □
Lemma 1.7. For any n there is either a strategy for Player #1 to always win, or a
strategy for Player #2 to always win n-dot SIM.

[Proof]
Proposition 1.8. For any n there is a strategy for Player #2 to always win n-dot
SIM.

[Proof: Strategy stealing]

[Explanation for how this proof would fail for chess.]


Proposition 1.9. For any n and any shape X, there is a strategy for Player #1
to always win n-dot X-MIS.

6.1. Exercises

(1) Find an example of a game of SIM in which the game doesn't end until
the 15th edge is drawn.
(2) Consider the strategy in chess (or checkers, or go) to always mirror your
opponent's move. Must the game end in a tie? Experiment with a friend.
(3) Sometimes a master of chess (in which white moves first) will play two
games with two opponents at the same time, alternating moves between
the boards. Suppose you are the second of these opponents, and that the
master is playing as black against you, and as white in the other game.
Describe a way to play so that the master does not win both of the games.
(4) Learn the games Dots and Boxes, Connect 4, Gomoku, and Hex. Like
SIM, these games have variations. Find some variations where ties are
impossible, and find some examples to which strategy-stealing arguments
apply.

7. Sets

In 1874, Mathematics received its "Theory of Everything", a theory on which all
other theories of the time (algebra, analysis, differential equations, ...) could
rest. This was the theory of sets, introduced by Cantor, and now taught to school
children around the world. We begin this section by first explaining why we will not
define the term "set" explicitly, but then introduce the basic set theory operations
in terms of mathematical logic.

7.1. Definition of a Set

"A set is a collection of elements."

This is a very well-known sentence. How do you feel about it? Do you think it
makes a good definition? In mathematics, a good definition precisely introduces a
word or phrase in terms of earlier established concepts. A mathematician reading
this will ask, what is a "collection"? What is an "element"? Where are the definitions
of those words? Without knowing precisely what are collections or elements, this
is simply not a definition in the mathematical sense.

However I'm never going to give you a definition of "set". There is an insurmountable
problem with trying to define the most basic objects in mathematics. To illustrate
this problem I will use the Oxford Dictionary [13] to attempt to define the word
"cat". Suppose that I don't know what any words in the English language mean.

Definition. cat: A small domesticated carnivorous mammal with soft
fur, a short snout, and retractile claws.

Hold on. Maybe I don't know what the word "a" means. Let's look that up!

Definition. a: Used when referring to someone or something for the
first time in a text or conversation.

Before I notice the fact that the word "a" is used in the definition of "a", I am again
lost because I don't know "used".

Definition. used: Having already been used.

Don't groan. We look up "having", et cetera.

Definition. having: possess, own, or hold.

Definition. possess: have as belonging to one; own.

Definition. have: possess, own, or hold.

Uh-oh. We are now totally stuck. The first word in the definition of "possess" is
"have", and the first word in the definition of "have" is "possess". We are unable to
understand the definitions of "possess" or "have" purely by using the dictionary, and

in some sense we can never understand the other words, including "cat", without
any of our own intuition. It's an interesting question, to what extent a picture
dictionary would help in this regard.

And this is how it goes. Certain concepts we treat as irreducible, in the sense
that we don't have a way to define them in terms of simpler notions. The notion
of a "set" is an irreducible concept. Here are some other definitions of "set", just to
be thorough.

Definition. By a set we mean a grouping into one entity of distinct
objects of our intuition or our thought. (Cantor)

Definition. A set consists of elements which are capable of possessing
certain properties and of having certain relations between themselves or
with elements of other sets. (Bourbaki)

Definition. A set is a collection of distinct objects, considered as an
object in its own right. (Wikipedia)

It is interesting to look at mathematical books and see what they treat as irreducible
concepts, and how they introduce these. Here are some definitions from Euclid's
Elements of notions that seem like they were really irreducible back then:

Definition. A point is that which has no part. A line is breadthless
length. A surface is that which has length and breadth only.

Anyway we're not going to precisely define a "set". We will say many things about
them however. And we will otherwise try to define things as well as we can.

7.2. Basic Set Theory Notions

Sets have members; we write x ∈ S if S is a set and x is an element of S. A
synonym for element is member. An example of a set is A = {2, 1, 7}. This
notation means that the numbers 2, 1, and 7 are elements of A, and nothing else
is an element of A. We write 3 ∉ A to indicate that 3 is not a member of A.

Definition. Let A, B be sets. Then B is a subset of A, written B ⊆ A,
provided that (x ∈ B) ⇒ (x ∈ A).

The notation A ⊇ B means the same thing. It is like with inequalities. For example
let B = {2, 1} and A = {2, 1, 7} again. Then B ⊆ A.

The following is actually an axiom of set theory:

Definition. Let A, B be sets. Then A = B provided that
(B ⊆ A) ∧ (A ⊆ B).

Equivalently, A = B means that (x ∈ A) ≡ (x ∈ B).



For instance, the sets {1, 2} and {1, 2, 1} are equal, even though 1 is presented twice
in the expression for the second set.

Of course A ≠ B means ¬(B = A).

Definition. Let A, B be sets. Then B ⊂ A provided that
(B ⊆ A) ∧ (B ≠ A). In this case, B is called a proper subset of A.

I should warn you that many eminent authors use the notation B ⊂ A to mean
B ⊆ A, and many use notation as written here. I like the analogy with inequality
for numbers, which is why we will use the above notation in this book.

Definition. The empty set ∅ is the set with no elements.

A set is determined by the answers to the question, "Is x ∈ A?" for various x. For
the empty set, the answer to the question, "Is x ∈ ∅?" is always "No!".

Convince yourself by using the definitions that if A is any set, then ∅ ⊆ A. You will
need to remember that if a statement P is false, then P ⇒ Q is true no matter
what the statement Q is.

Definition. Let A be a set. The power set ℘(A) is the set of all subsets
of A.

For example, let A = {a, b}, with a ≠ b. Then

℘(A) = {∅, {a}, {b}, A}.
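If you want to experiment with power sets of small finite sets, here is a minimal Python sketch (an illustration only; the function name power_set is mine):

    from itertools import chain, combinations

    def power_set(A):
        # all subsets of A, returned as frozensets so they can sit inside a set
        A = list(A)
        subsets = chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))
        return {frozenset(s) for s in subsets}

    print(power_set({"a", "b"}))
    # the four subsets: frozenset(), {'a'}, {'b'}, and {'a', 'b'}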

Let A be a set, and P(x) a statement involving members x ∈ A. Then we may
form a new set {x ∈ A | P(x)}, which is the set of all x ∈ A so that P(x) is true.
For example, the set {(x, y) ∈ R^2 | x^2 + y^2 = 1} is the unit circle, a subset of the
plane R^2.

Definition. Let X be a set, and A, B ⊆ X. Then
A ∩ B = {x ∈ X | (x ∈ A) ∧ (x ∈ B)},
A ∪ B = {x ∈ X | (x ∈ A) ∨ (x ∈ B)}.
We call A ∩ B the intersection of A and B and A ∪ B the union of A
and B.

What is A ∩ ∅? What is A ∪ ∅? What is A ∩ A? What is A ∪ A? What is A ∩ X?
What is A ∪ X?

Before the next definition, please note quietly that if X is a set, and A is a subset
of X, then {x ∈ X | x ∈ A} = A.

Definition. Let X be a set, and A ⊆ X. Then A^c = {x ∈ X | x ∉ A}.
We call A^c the complement of A in X, or X − A.

What is X^c? What is (A^c)^c? What is A ∩ A^c? What is A ∪ A^c?
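These questions are good to answer by hand, but Python's built-in sets mirror the operations and make it easy to test your answers on small examples (a sketch only; X and A below are arbitrary choices of mine, and the complement of A in X is the set difference X - A):

    X = {1, 2, 3, 4, 5, 6}      # the ambient set
    A = {1, 2, 3}

    complement_A = X - A        # A^c, the complement of A in X

    print(A & complement_A)     # A ∩ A^c  -> set(), the empty set
    print(A | complement_A)     # A ∪ A^c  -> {1, 2, 3, 4, 5, 6}, that is, X
    print(X - X)                # X^c      -> set()
    print(X - complement_A)     # (A^c)^c  -> {1, 2, 3}, that is, A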



7.3. Set Theory Applications of Propositional Logic

[To be written.]

7.4. Exercises

(1) Give an example of a subset A of R so that the statement
(A ⊆ Q) ∨ (Q ⊆ A) is false.
(2) Let B = {x ∈ R | 1 < x}. Give an example of a subset A of R so that
the statement (A ⊆ B) ∨ (B ⊆ A) is false, and use set builder notation
to describe A.
(3) Let B = {x ∈ R | 1 < x}. Give an example of a subset A of R so that
the statement (A ⊆ B) ∨ (B ⊆ A) is false, and use set builder notation
to describe A.
(4) Can you find a set whose power set has exactly three elements?
(5) Given objects a, b, and y and given that {a, b} = {a, y}, prove that b = y.
(6) List the elements of the set
{A ∈ ℘(N) | A ∪ {11, 6} = {6, 11}}.
(Here ℘(N) denotes the power set of the natural numbers.)
(7) Describe three different elements of the set
{A ∈ ℘(N) | A ∩ {11, 6} = {6, 11}}.
(8) Consider the two sets
{(x, z) ∈ R^2 | ∃ y ∈ R so that (x^2 + y^2 = 1) ∧ (y^2 + z^2 = 1)}
and
{(x, z) ∈ R^2 | (−1 ≤ x ≤ 1) ∧ (−1 ≤ z ≤ 1) ∧ (x = ±z)}.
Prove carefully that these two sets are equal.


(9) Let A, B, C be sets. Consider the statement
A ∩ (B ∪ C) = (A ∩ B) ∪ C.
(a) Use a Venn diagram to show the statement is true or untrue.
(b) Reduce the statement to propositional logic.
(10) Let A, B, C be subsets of some universal set X. Consider the statement
A ⊆ B ∪ C^c ⇒ C ∩ A ⊆ B.
(a) Use a Venn diagram to show the statement is true or untrue.
(b) Reduce the statement to propositional logic.
(11) Let A, B be subsets of some set X. Consider the statement
A ∩ B = B ∩ (B ∩ A^c)^c.
Recall that A^c denotes the complement of A in X.
(a) Use Venn diagrams to illustrate that the statement is generally true,
or sometimes false.
(b) Reduce the statement to propositional logic.
(12) Let A, B be subsets of some set X. Consider the statement
A ∪ B = B ∪ (B^c ∩ A).
Recall that B^c denotes the complement of B in X.
(a) Use Venn diagrams to illustrate that the statement is generally true,
or sometimes false.
(b) Reduce the statement to propositional logic, and use a truth table to
verify your answer above.

8. Induction

8.1. Standard Induction

The method of induction is suggested by problems of the following type. You want
to prove a proposition P (n) which involves a parameter n which is a natural num-
ber. As n varies, you get infinitely many different propositions P (1), P (2), P (3), . . .
Imagine that P (n) is easy when n is small but gets progressively more complex as
n grows. Then a reasonable idea is to try to prove the smaller ones first, and work
your way up to the bigger ones.

Suppose we have three propositions P, Q, and R. Recall that if P ⇒ Q and Q ⇒ R,
then P ⇒ R.

More generally, suppose we have a sequence of propositions P(1), ..., P(n), and for
every k from 1 to n − 1, we can show that P(k) implies P(k + 1). Then by iterating
the above idea we get that P(1) implies P(n):
P(1) ⇒ P(2) ⇒ P(3) ⇒ ··· ⇒ P(n − 1) ⇒ P(n)
This is the basic form of induction. (The word induction suggests an electrical
analogy. Think of each P (k) as being connected to P (k + 1) by a wire. Then if you
charge up P (1) with veracity, the charge will eventually get to P (n).)

Thus in practice, if you want to prove P(n) for all integers n ≥ 1 by induction,
then you must prove:

(1) P(1) is true.
(2) For all k ∈ N, if P(k) is true, then P(k + 1) is true.

Step 2 has some logical complexity to it, and is often misinterpreted. You do not
prove that P (k) is true. You show that if it were true, then P (k + 1) would also be
true.

Step 2 is usually the hardest. It's not going to work unless you see a relationship
between the various P (k). You need to see a way to make the step from each one
to the next. Warning: it is not always manageable to prove a proposition P (n)
with induction, as there may not be any tractable relationship present. Moreover,
one can often prove a proposition directly and more simply without induction. So
don't get too carried away with this.

Let's do some examples.

Proposition 1.10. For all n ∈ N, 1 + ··· + n = n(n + 1)/2.

Let us call the proposition P(n). It is healthy to always try writing out explicitly
a few of the smaller P(n)'s. For instance

P(1): 1 = 1(2)/2,
P(2): 1 + 2 = 2(3)/2,
P(3): 1 + 2 + 3 = 3(4)/2.
All these are easily verified; this suggests that we have correctly interpreted the
problem. Warning: P (n) is not a number! Do not say, for example, that P (2) = 3.
The P (n) are always mathematical statements, never numbers. In this case they
are equations.

Now there is an obvious relationship between the P (k)s as k grows. The left hand
side of P (k + 1) is obtained from the left hand side of P (k) by adding k + 1.

So step 2 goes like this:

Suppose P(k) is true. Thus

1 + 2 + ··· + k = k(k + 1)/2.

Add k + 1 to both sides. Then

1 + 2 + ··· + k + (k + 1) = k(k + 1)/2 + (k + 1)

is true.

We do some algebra to the right hand side and deduce that

1 + 2 + ··· + k + (k + 1) = (k + 1)(k + 2)/2

is true.

But this equation is exactly P(k + 1).

So that's it. We have checked that P(1) is true, and proven Step 2. Finish by
writing something like "Thus by standard induction P(n) is true."

I'd like to remark that this proof is a little unsatisfying, in that it never really
explains the formula. (Although it serves as a good example of induction.) There
are many proofs of this important result; here is an easy one:

Write S for the sum of the first n numbers. Then

S = 1 + 2 + ··· + n,
S = n + (n − 1) + ··· + 1.

Adding these equations yields

2S = (n + 1) + (n + 1) + ··· + (n + 1) = n(n + 1),

which yields the desired formula.

Our next example of induction I find much more satisfying. Let us prove the power
rule of calculus, that is

Proposition 1.11. If n ∈ N then d/dx(x^n) = n x^(n−1).

We will assume only the product rule for derivatives and the rule d/dx(x) = 1.

Proof. As before we write P(n): d/dx(x^n) = n x^(n−1). It is good to fo-
cus first on a few small cases. P(1) is the rule d/dx(x) = 1, which we have al-
ready assumed. P(2) is the rule d/dx(x^2) = 2x. Why is this true? Typically
one writes out lim_{h→0} ((x + h)^2 − x^2)/h, does some algebra and limit-logic to get P(2).
But we want to connect P(2) to P(1) and so will instead use the product rule:
d/dx(x^2) = d/dx(x · x) = x · d/dx(x) + x · d/dx(x) = 2x · d/dx(x) = 2x. Note that the last equal-
ity uses P(1). Can we make this connection more generally? You bet; using the
product rule:

d/dx(x^(k+1)) = d/dx(x^k · x) = x^k · d/dx(x) + x · d/dx(x^k).

We finish this off by applying P(1) and P(k):

= x^k + x(k x^(k−1)) = (k + 1) x^k.

Combining all the equalities yields P(k + 1): d/dx(x^(k+1)) = (k + 1) x^k. Thus P(n) is
true by induction. □

8.2. Recursive Definitions

A related idea to proof by induction is that of recursive definition. For example
n! may be familiar to you as the product of all numbers from one to n, or
n(n − 1) ··· 2 · 1. The recursive definition is:

Definition.
n! = 1                if n = 1,
n! = n · (n − 1)!     if n > 1.

For example if we want to know what 3! is, the definition says it is 3 · 2!. This
forces us to use the definition again to determine that 2! = 2 · 1!, and we need to
look once more at the definition to find that 1! = 1. We put this all together to get
3! = 3 · 2 · 1 = 6. The reader should believe that given any positive integer n, one
can in principle use this definition to compute n!.
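The recursive definition translates directly into a recursive program; the following Python sketch (my own transcription) computes n! exactly the way the definition prescribes, by calling itself on n − 1 until it reaches the base case.

    def factorial(n):
        # a direct transcription of the recursive definition of n!
        if n == 1:
            return 1
        return n * factorial(n - 1)

    print(factorial(3))  # 3 · 2! = 3 · 2 · 1! = 6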
As another example, consider the nth derivative of a function f, d^n f/dx^n, which
may be familiar as what you get when you differentiate f n times. The recursive
definition is

Definition.
d^n f/dx^n = df/dx                          if n = 1,
d^n f/dx^n = d^(n−1)/dx^(n−1) (df/dx)       if n > 1.

The advantage of using recursive definitions is that it does not require readers to
use their imagination about doing something n times. There are no "···"s, for
example; all the logic is laid out for you. This is particularly nice when these
concepts gang up on you. Here is a small example.

Proposition 1.12. d^n/dx^n (x^n) = n!

Proof. Induction on n. The statement for n = 1 is d/dx(x) = 1, a familiar fact.
Suppose the proposition is true for k. Then

d^(k+1)/dx^(k+1) (x^(k+1)) = d^k/dx^k ( d/dx (x^(k+1)) ) = d^k/dx^k ( (k + 1) x^k ),

using the recursive definition of d^n/dx^n and Proposition 1.11. One factors out the k + 1
and uses the inductive hypothesis:

= (k + 1) d^k/dx^k (x^k) = (k + 1) · k!.

Finally, using the recursive definition of n! this is equal to (k + 1)!. We are done
by induction. □

I hope you can see in the above example that recursive definitions mesh well with
proofs by induction. The resulting proof is clean, and does not ask the reader to
visualize, for example, a sequence of exponents coming down and being multiplied,
exactly as many times as the power of x, until we simultaneously have x^0 multiplied
by the product of integers from 1 to n. The latter, with some examples, is fine if you're
talking to someone and can't write things down. The inductive proof is clearer and
easier to check.

Here are a couple more recursive definitions. Let a_1, a_2, ..., a_n, ... be a sequence
of numbers. Then

Σ_{i=1}^n a_i = a_1                              if n = 1,
Σ_{i=1}^n a_i = (Σ_{i=1}^{n−1} a_i) + a_n        if n > 1

is a recursive way to define a_1 + a_2 + ··· + a_n.

Also commonly used is the notation

Π_{i=1}^n a_i = a_1                              if n = 1,
Π_{i=1}^n a_i = (Π_{i=1}^{n−1} a_i) · a_n        if n > 1

for the product of n numbers a_1 a_2 ··· a_n.

For example, Σ_{i=1}^n i = n(n + 1)/2 and Π_{i=1}^n i = n!.
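These definitions, too, can be transcribed directly into recursive code. A minimal Python sketch (illustrative only), together with a check against the closed forms just mentioned:

    def rec_sum(a):
        # recursive definition of a_1 + a_2 + ··· + a_n (a is a nonempty list)
        if len(a) == 1:
            return a[0]
        return rec_sum(a[:-1]) + a[-1]

    def rec_prod(a):
        # recursive definition of a_1 · a_2 ··· a_n
        if len(a) == 1:
            return a[0]
        return rec_prod(a[:-1]) * a[-1]

    n = 6
    print(rec_sum(list(range(1, n + 1))), n * (n + 1) // 2)  # 21 21
    print(rec_prod(list(range(1, n + 1))))                   # 720, which is 6!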

8.3. Induction Schemes

I'd like to codify the logic of induction from the previous section as (P(1) ∧
∀k(P(k) ⇒ P(k + 1))) ⇒ ∀n, P(n). The first ⇒ is something that you need to
prove in an induction proof, and the second ⇒ is the statement of induction. Thus
if you can manage to prove the items in parentheses, you have obtained the items
on the right of the second ⇒.

Sometimes you want to tweak the rules of induction. Consider the following prob-
lem. We have an intuitive understanding that factorials grow much faster than
polynomials, and want to prove that n! > n^2. Unfortunately this isn't true for
some small values of n. In fact for n = 2 and 3, the quantity n^2 is bigger than n!.
For n = 4, we finally have 4! = 24 > 16 = 4^2.

If we therefore try to literally apply standard induction, as presented in the previous
section, to the proposition P(n): n! > n^2, we will fail because it is not true for
P(1). So the scheme of our proof cannot be

(P(1), P(k) ⇒ P(k + 1) ∀k ≥ 1) ⇒ P(n) ∀n ≥ 1.

We will instead settle for

(P(4), P(k) ⇒ P(k + 1) ∀k ≥ 4) ⇒ P(n) ∀n ≥ 4.

Thus we will use the proposition for n = 4, and also prove that if P(k) is true for
k ≥ 4, then P(k + 1) is true.

So we will get
P(4) ⇒ P(5) ⇒ P(6) ⇒ ··· ⇒ P(n − 1) ⇒ P(n).
Proposition 1.13. If n is an integer greater than or equal to 4, then n! > n^2.

We will proceed by the induction scheme

(P(4) ∧ ∀k ≥ 4 (P(k) ⇒ P(k + 1))) ⇒ ∀n ≥ 4, P(n).

The statement P(4) is true since 4! = 24 > 16 = 4^2. Let's get to work on the
induction step. We need to find a relationship between P(k) and P(k + 1) which
will allow us to derive one from the other. The left hand sides seem to be easiest to
relate, since the LHS (left hand side) of P(k + 1) is k + 1 times the LHS of P(k). If
P(k) is true, then by multiplying both sides by k + 1 we see that (k + 1)! > k^2 (k + 1).
This is not P(k + 1), since the RHS is not exactly (k + 1)^2. However if we can prove
that k^2 (k + 1) is greater than (k + 1)^2, then we can combine the inequalities a la
(k + 1)! > k^2 (k + 1) > (k + 1)^2 to obtain P(k + 1). The inequality k^2 (k + 1) > (k + 1)^2
reduces to k^2 > k + 1. Bear in mind that we only need to prove this for k ≥ 4.
This proof can be done in any number of ways; I prefer k^2 > 2k > k + 1. The
first inequality is true since k > 2 and the second since k > 1. We are done by our
induction scheme.

The previous paragraph included some brainstorming. It is good to present a


final proof which does not include this, and is independent of the last paragraph.
Logically, it can be read immediately after the statement of the proposition.

Proof. We prove the proposition by induction. It is clear for n = 4. Assuming
the statement for k ≥ 4, then k! > k^2, so that (k + 1)! > k^2 (k + 1). Now as k > 2,
k^2 > 2k > k + 1, and therefore (k + 1)! > (k + 1)^2. We are done by induction. □

Once you know what to do, the proof need not be very long. The above proof
requires a sophisticated and active reader who understands inequalities well.
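The kind of experimentation that precedes such a proof is easy to automate. The following Python sketch (a sanity check, not a proof) shows where the inequality starts to hold, which is what suggests starting the induction at n = 4.

    import math

    for n in range(1, 10):
        print(n, math.factorial(n) > n ** 2)
    # False for n = 1, 2, 3 and True from n = 4 onward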

Here is another kind of problem. Suppose you want to convince yourself that you
can integrate any power of sin(x). We'll make P(n) the somewhat imprecise "I have
a formula for an antiderivative of sin^n(x)." This will be true for, say, n ≥ 0. (Is it
true for negative n?) Let's do a couple. An antiderivative of sin^0(x) = 1 is given
by x, and an antiderivative of sin(x) is given by −cos(x). What about sin^2(x)?
Here's one approach:

∫ sin^2(x) dx = ∫ (1 − cos^2(x)) dx = x − ∫ cos^2(x) dx.

Now integrate by parts, with u = cos(x) and dv = cos(x) dx. The latter integral
becomes

∫ cos^2(x) dx = sin(x) cos(x) + ∫ sin^2(x) dx.

The devout calculus student will recall that we put this all together to get:

∫ sin^2(x) dx = x − ( sin(x) cos(x) + ∫ sin^2(x) dx ),

and we can solve:

∫ sin^2(x) dx = (x − sin(x) cos(x)) / 2.

This same basic method works to write higher powers of sin in terms of lower
powers:

∫ sin^k(x) dx = ∫ sin^(k−2)(x) (1 − cos^2(x)) dx = ∫ sin^(k−2)(x) dx − ∫ sin^(k−2)(x) cos^2(x) dx.

Let u = cos(x) and dv = sin^(k−2)(x) cos(x) dx. The latter integral becomes

∫ sin^(k−2)(x) cos^2(x) dx = (1/(k−1)) sin^(k−1)(x) cos(x) + (1/(k−1)) ∫ sin^k(x) dx.

Putting this all together we get:

∫ sin^k(x) dx = ∫ sin^(k−2)(x) dx − ( (1/(k−1)) sin^(k−1)(x) cos(x) + (1/(k−1)) ∫ sin^k(x) dx ),

and we can solve for ∫ sin^k(x) dx:

∫ sin^k(x) dx = ((k−1)/k) ∫ sin^(k−2)(x) dx − (1/k) sin^(k−1)(x) cos(x).

Okay. So I hope that was a pleasant review of integration by parts. Where are
we? We have shown that if P(k − 2) is true, then so is P(k), since we can write a
formula for ∫ sin^k(x) dx in terms of ∫ sin^(k−2)(x) dx. If we want the integral for sin^6(x),
for example, we can use the above to reduce to sin^4(x) which reduces to sin^2(x),
and then to 1, which we know. If we wanted the integral for sin^7(x), we can use
the above to eventually reduce to sin(x), whose antiderivative has also been noted.
This suggests a new induction scheme. If we've proven P(0) and P(1) and also
proven that P(k) implies P(k + 2), then P(n) is true for all integers n ≥ 0. I will
codify this as:

(P(0) ∧ P(1) ∧ ∀k(P(k) ⇒ P(k + 2))) ⇒ ∀n P(n)

Here are some other useful induction schemes:

(∀a odd P(a) ∧ (∀k, P(k) ⇒ P(2k))) ⇒ ∀n P(n)
(∀p prime P(p) ∧ (∀k, ℓ ≥ 2, P(k) ∧ P(ℓ) ⇒ P(kℓ))) ⇒ ∀n ≥ 2 P(n)

Of course, not everything is an induction scheme. For example, the scheme

(P(1) ∧ (∀k, P(k) ⇒ P(k + 2))) ⇒ ∀n P(n)

is certainly not valid, because at no point do we obtain P(2), or P(n) for any even
number n.

Which schemes are valid? For now, use your common sense. Later we will give
proofs for the validity of other induction schemes based on the original one.

The mother of them all, though, is Strong Induction. This is the scheme
(P(a) ∧ ∀n ((∀ a ≤ k < n, P(k)) ⇒ P(n))) ⇒ ∀n ≥ a, P(n).

You are probably a little tired of induction now so we will postpone the discussion
of Strong Induction until later.

8.4. Exercises

Standard Induction

(1) Prove that if x ≥ −1 and n ∈ N, then
(1 + x)^n ≥ 1 + nx.
(2) We have seen that the nth triangular number t_n = Σ_{i=1}^n i is given by
t_n = n(n + 1)/2. The nth tetrahedral number T_n is defined by T_n = Σ_{i=1}^n t_i.
For example, T_3 = 1 + 3 + 6 = 10. Prove that the nth tetrahedral number is
(1/6) n(n + 1)(n + 2).
(3) Prove that n! = 1 + Σ_{i=0}^{n−1} i·(i!) for n ≥ 1. (The convention is that 0! = 1.)
(4) The nth Fermat number F_n is given by the formula 2^(2^n) + 1. For example,
F_0 = 3 and F_1 = 5. Prove the following.
F_0 F_1 ··· F_n = F_{n+1} − 2.
(5) Suppose that A is a convex subset of the plane. This means that, whenever
two points P, Q are in A, and t is a real number between 0 and 1, then
the point
t P + (1 − t) Q
is also in A. (These points fill out the line segment joining P and Q.)
Prove that if P_1, P_2, ..., P_n are n points in A, then the centroid
(1/n)(P_1 + P_2 + ··· + P_n)
is also in A.
(6) Use the recursive definition of summation to prove that if x ≠ 1, then
Σ_{i=1}^n x^i = (x^(n+1) − x)/(x − 1).

(7) Prove that you can solve the n towers of Hanoi problem in 2^n − 1 moves. Also
prove that this is the minimum number of moves required.
Induction Schemes
(8) Write h_k for the k-th Hemachandra number, starting with h_0 = 0 and
h_1 = h_2 = 1. Thus, h_{k+2} = h_k + h_{k+1} for k ≥ 1. Let φ = (1 + √5)/2 and
ψ = (1 − √5)/2. Use the induction scheme
(P(1) ∧ P(2) ∧ (∀k, P(k) ∧ P(k + 1) ⇒ P(k + 2))) ⇒ ∀n P(n)
to prove that
h_n = (φ^n − ψ^n)/√5.
In other words, check the formula is correct for n = 1 and 2, and
prove that if the formula is correct for n = k and n = k + 1, then it is
true for n = k + 2. (Algebra tip: Show that φ^2 = φ + 1 and ψ^2 = ψ + 1.)
(9) Prove that ∀a, b ∈ N, we have h_{a+b} = h_{a+1} h_b + h_a h_{b−1}. (Suggestion: Use
the same induction scheme; you're not meant to use the φ, ψ formula from
the previous problem.)
(10) Use a scheme mentioned in the text to prove that any positive fraction a/b
can be reduced to a fraction in which the numerator and denominator are
not both even.
(11) Xuande has a pile of 4- and 5-cent postage stamps. What are all the
postages he can pay? Give a proof. (Suggestion: After you figure out the
answer, come up with an appropriate induction scheme.)
(12) Which of the following are valid induction schemes? Explain.
(a) (P(1) ∧ (∀k ≥ 2, P(k) ⇒ P(k − 1))) ⇒ ∀n P(n).
(b) (P(1) ∧ (∀k, P(k) ⇒ (P(2k) ∧ P(2k + 1)))) ⇒ ∀n P(n).
(c) (P(0) ∧ P(1) ∧ (∀k, ℓ ∈ Z, (P(k) ∧ P(ℓ)) ⇒ P(k − ℓ))) ⇒ ∀n ∈ Z, P(n).
(d) (P(1) ∧ (∀k, P(k) ∧ P(k + 1) ⇒ P(k + 2))) ⇒ ∀n P(n).

9. Chapter 1 Wrap-up

9.1. Rubric for Chapter 1

In this chapter you should have learned

to quantify your variables


proof by contradiction
how to use truth tables to prove syllogisms in propositional calculus
how to use propositional calculus to prove facts in set theory
standard induction, and other induction schemes

9.2. Toughies for Chapter 1

(1) Suppose that f : R → R is a differentiable function so that f''(x) = f(x)
for all real numbers x. Prove that there are constants C, D ∈ R so that
f(x) = C e^x + D e^(−x) for all real numbers x.

(2) Let P and Q be statements. Prove that you cannot form the statement
"P exclusive or Q", or P + Q, using only the symbols P, Q, ∨, ∧.
(3) In this exercise we look at some of the Zermelo-Fraenkel Axioms of Set
Theory, written in the raw form of Propositional Logic. In principle they
do not need any explanation but in reality it is a challenge to understand
what they mean. Can you explain? x, y, z, u can be regarded as both sets
and elements.
(a) ∀x∀y∃z(x ∈ z ∧ y ∈ z).
(b) ∀x∀y(∀z(z ∈ x ≡ z ∈ y) ⇒ x = y).
(c) ∀x∃y∀z((∀u(u ∈ z ⇒ u ∈ x)) ⇒ z ∈ y).
(d) ∃x(x = x).
(e) ∀x∃y∀z∀u((u ∈ x ∧ z ∈ u) ⇒ z ∈ y).
(f) ∃x(∅ ∈ x ∧ ∀y(y ∈ x ⇒ y ∪ {y} ∈ x)).
Here is another statement apropos of Set Theory:

∃z∀x((x ∉ x) ≡ (x ∈ z)).

Can you interpret it? (Hint: It is related to a famous paradox.)


(4) Show how to inductively find antiderivatives of sin^m(x) cos^n(x) with m, n
any integers. They may be positive, negative, or zero. What happens for
fractions?
(5) Say you have some statements P, Q, R, ... and you form a formula F from
these using ∨, ∧, and ¬. Now make a new formula F^D by turning all
the ∨ into ∧ and all the ∧ into ∨. We call F^D the dual formula to F.
For instance, if F is the formula P ∨ (Q ∧ R), then F^D is the formula
P ∧ (Q ∨ R). Now you can also deal with formulas with ⇒ by turning
something like P ⇒ Q into Q ∨ ¬P. Show that the dual of P ⇒ Q is
¬(Q ⇒ P). Now find the duals of the following formulas:
P ∨ Q
P ⇒ P.
P ⇒ (P ∨ Q).
(P ∨ Q) ∧ (¬P ∨ R) ⇒ (Q ∨ R).
((P ⇒ Q) ∧ P) ⇒ Q.
These last four formulas were tautologies. If you did this correctly,
then the duals you found should be absurda, meaning that they are
false, no matter what truth values you put in for P, Q, R, ....
Here is the challenge. Prove that if a formula is a tautology, then its
dual is an absurdum.
Remark: A formula is an absurdum iff its negation is a tautology.
So, if you start with a tautology T, form the dual formula T^D, and then
take the negation ¬(T^D), you again get a tautology. Must ¬(T^D) be the same
formula as T?
(6) The following is quoted from [1]:
Let S be a set. Suppose that a subset A of S is obtained from other
subsets X, Y, Z, . . . of S by applying only the operations ∪, ∩, c (in
any order). Then the complement A^c can be obtained by replacing
the subsets X, Y, Z, . . . by their respective complements, and the op-
erations ∪, ∩ by ∩, ∪, respectively, while preserving the order of the
operations. This is the duality rule.

Let A = B be an equality of subsets of the above form, and con-
sider the equality A^c = B^c. If we replace A^c and B^c by the expres-
sions obtained by applying the duality rule, and if we then replace
X^c, Y^c, Z^c, . . . by X, Y, Z, respectively, and vice-versa, we obtain an
equality called the dual of A = B. We can do the same for the in-
clusion relation A ⊆ B, but then we must take care to replace ⊆ by
⊇.
For example, it is generally true that for any subsets X, Y, Z of a given
set S, we have X ∩ Z ⊆ (X ∪ Y) ∩ Z. The dual of this relation is (check!)
that, generally, X ∪ Z ⊇ (X ∩ Y) ∪ Z.
What is the dual of the identity X ∩ (Y ∪ Z)^c = (X ∩ Y^c) ∩ (X ∩ Z^c)?
Can you explain the above two points?
CHAPTER 2

Arithmetic


1. Introduction

In this chapter we will develop the basic properties of arithmetic, using as few
assumptions as possible.

In Section 2 we lay down the three Peano Axioms, and prove from them the rules
of addition and multiplication.

Arithmetic starts getting really interesting when we get to the idea of division with
remainder. In Section 4 we develop this concept and the related idea of a place-value
system.

In Section 3 we work out the theory of greatest common divisors. In particular we


deal with the idea of the greatest and least element of a set. An important tool
in understanding gcds is the Euclidean Algorithm, and along the way we upgrade
our induction toolkit by learning Strong Induction.

By Section 4 we are ready to treat the theory of prime numbers, and the Fundamen-
tal Theorem of Arithmetic. The FTA says that every number can be given unique
coordinates, with one component for each prime number. These coordinates
completely determine the multiplicative role of a number.

2. The Natural Numbers N

Admittedly we will not actually be able to construct the natural numbers N, since
we need a spark of life to get going. This spark takes the form of the existence of
an infinite set, which we assume has been organized into a certain linear shape.

Assuming the presence of this shape, we will be able to define the basic operations of
arithmetic and derive their basic properties. Moreover we will be able to construct
the other sets out of N.

2.1. Peano's Axioms

Children believe that there is a counting process to get to all numbers, starting with
1, in which every number is succeeded by another number. The rules are designed
so that different numbers have different successors, and you never get back to 1.

Let us codify this into mathematics.

Definition. The natural numbers N is a set with a successor func-
tion N → N written n ↦ n′, and an initial element 1 ∈ N satisfying
the following three properties:
(INJ) For m, n ∈ N, (m′ = n′) → (m = n).
(INF) ∀n ∈ N, n′ ≠ 1.
(IND) For S ⊆ N, ((1 ∈ S) ∧ ((n ∈ S) → (n′ ∈ S))) → (S = N).

Let us unravel some of these properties. The property (INJ) says that if two
numbers have the same successor, then they must be the same number. It may be
easier to understand its contrapositive, which for m, n ∈ N, is:

(m ≠ n) → (m′ ≠ n′),

or that different numbers have different successors. (Later we will say that a func-
tion f is injective if (f(x) = f(y)) → (x = y).)

Property (INF) is simple enough; it just means that 1 is not the successor of any
number. (In particular 0 ∉ N!)

To unravel the third axiom, let us make a quick definition:

Definition. Let S be a subset of N. Call S inductive if (n ∈ S) →
(n′ ∈ S).

For example, the set of odd numbers is not inductive, since 1 is odd but 1′ is not.
The set of numbers greater than 100 is inductive.

Then we can rewrite (IND) as:

(IND) If S is an inductive subset of N, and if 1 ∈ S, then S = N.

The reason this last axiom is called (IND) is that it actually allows us to use induc-
tion when proving things about N. Here's why. Let P(n) for n ∈ N be a sequence of
propositions as in the previous section. Write S = {n ∈ N | P(n) is true}. Suppose
we know that P(1) is true. Then 1 ∈ S. Suppose that we know that, for all k,
P(k) → P(k′). Then S is inductive. By (IND), we deduce that all natural numbers
are in S. This means that P(n) is true for all n, as desired.

Later we will deduce other induction schemes from (IND).

Let us deduce our first lemma.

Lemma 2.1. Let n ∈ N. If n ≠ 1, then there is a unique number m ∈ N so that
m′ = n.

Proof. Let S = {1} ∪ {m′ | m ∈ N}. Then 1 ∈ S, and certainly if s ∈ S then
s′ ∈ S, so that S is inductive. By (IND), S = N. Since n ∈ N it is therefore in S,
and since n ≠ 1, we conclude that n must be of the form m′ for some m ∈ N. To
see uniqueness, suppose that n = m_1′ and n = m_2′. Then m_1′ = m_2′, so by (INJ) we
see that m_1 = m_2. □

Until we have enough arithmetic to develop a place-value system, we will use Roman
numerals (except often for 1 itself) for elements of N. They are well-suited for Peano
arithmetic anyway. Thus the first few natural numbers {1, 1′, 1′′, 1′′′, 1′′′′, . . .} will
be denoted as {I, II, III, IV, V, . . .}. In other words,

Definition.
I = 1, II = I′, III = II′, IV = III′, V = IV′,
VI = V′, VII = VI′, VIII = VII′, IX = VIII′, X = IX′.

We will also have occasion to use larger Roman numerals without comment, but
they will be no larger than M = X^{III}.

Remark: There is no consensus in the mathematical community about whether 0
should be considered a natural number, and so other books may have the convention
that 0 ∈ N. However it is an important issue, and you will need to know that we
are excluding 0 in this course.

Definition. The operation of addition in N is defined recursively via

a + n = a′, if n = 1;
a + n = (a + m)′, if n = m′.

Example:
II + III = (II + II)′ = ((II + I)′)′ = ((II′)′)′ = V.

Please note that in particular, (a + m)′ = a + m′ for all a, m ∈ N.
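For readers who like to experiment, here is a minimal Python sketch (ours, not part of the formal development) of the recursion above; the names succ and add are our own, chosen only for illustration.

# A small sketch of Peano-style addition, for experimentation only.
# We model the natural number n by the Python integer n, and the
# successor by adding one, so the recursion mirrors the definition.

def succ(n):
    # the successor n'
    return n + 1

def add(a, n):
    # a + n by recursion on n: a + 1 = a', and a + m' = (a + m)'
    if n == 1:
        return succ(a)
    return succ(add(a, n - 1))

# Example from the text: II + III = V
print(add(2, 3))   # prints 5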

2.2. Properties of Addition

Theorem 2.2. For all numbers a, b, c, we have (a + b) + c = a + (b + c).

Proof. We fix a and b and use induction on c. If c = 1, the theorem says
(a + b) + 1 = a + (b + 1). By definition of adding 1, this is the same as (a + b)′ = a + b′,
which is the very definition of a + b′.

Suppose the theorem is true for some c. Taking successors of both sides yields
[(a + b) + c]′ = [a + (b + c)]′.

The definition of addition lets us move the prime within the sums on both sides of
the equation:
(a + b) + c′ = a + (b + c)′
= a + (b + c′).

Thus the theorem is then true for c′. □

Lemma 2.3. For all numbers n, we have n + 1 = 1 + n.

Proof. Exercise. □

Theorem 2.4. For all numbers a, b, we have a + b = b + a.

Proof. We fix a and use induction on b. If b = 1 this is the lemma. Suppose
a + k = k + a. Taking successors gives (a + k)′ = (k + a)′. Then we have

a + k′ = (a + k)′
= (k + a)′
= (k + a) + 1
= k + (a + 1)
= k + (1 + a)
= k′ + a.

The fourth and sixth equality used associativity, and the fifth used the lemma. The
rest follows from the definition of addition and the inductive hypothesis. Thus we
are done by induction. □

Theorem 2.5. (Cancellation Law of Addition) Let a, b, n ∈ N. If a + n = b + n,
then a = b.

This proof is a little bit complex logically. Before writing the formal proof, let us
brainstorm for a few minutes first. It will be an induction proof on n. The case
P(1) is the axiom (INJ). What about P(2)? Suppose a + II = b + II. Then I can
rewrite both sides as (a + 1)′ = (b + 1)′. To this we apply (INJ) to get the equality
a + 1 = b + 1, which by P(1) implies a = b. This logical deduction gives P(2).
Similarly, for P(3) we start with a + III = b + III, rewrite it as (a + II)′ = (b + II)′,
and apply (INJ) to get a + II = b + II. This shows that P(2) → P(3). Now we are
ready for the proof.

Proof. Induction on n. (INJ) is the case n = 1. Suppose the theorem is true
for some k. Then, suppose a + k′ = b + k′. This can be rewritten as (a + k)′ = (b + k)′.
Through (INJ) this implies that a + k = b + k, which by the inductive hypothesis
implies that a = b.

Thus the theorem is true for all n. □

Definition. Let a and b be numbers. Then a < b provided that there is a
number x so that a + x = b.

Note that a < a′ always; in this case x = 1.

Proposition 2.6. If a < b and b < c, then a < c.

Proof. The hypotheses imply that there are numbers x and y so that a+x = b
and b + y = c. Then we compute that a + (x + y) = (a + x) + y = b + y = c, thus
a < c. 

Proposition 2.7. If a < b then a + x < b + x.

Proof. Exercise. 

Definition. Let a, b ∈ N. We write a > b provided that b < a. We
also write a ≤ b provided that
(a < b) ∨ (a = b).

Lemma 2.8. For all n ∈ N, we have 1 ≤ n.

Proof. If n = 1 we are done. Otherwise n ≠ 1. Then by Lemma 2.1, there is
an m ∈ N so that m′ = n. Thus m + 1 = n, which shows that 1 < n. □

Note that by this lemma, any number n is either 1 or m′ for some m.


Lemma 2.9. (Creeping Lemma) If a < b, then a′ ≤ b.

Proof: Exercise. Use the previous lemma.


Theorem 2.10. (Weak Trichotomy) Let a, b ∈ N. Then either a < b, a > b, or
a = b.

Proof. Fix b, and write P(a) for the statement of the lemma. We will induct
on a, holding b constant. Lemma 2.8 gives us P(1). Now suppose P(k) is true,
giving three possible cases. We will show that each of these cases leads to a case of
P(k′), which will prove the theorem. If k < b then by the Creeping Lemma k′ ≤ b.
Thus k′ = b or k′ < b. If k > b then since k′ > k we conclude k′ > b. If k = b then
k′ > b. □

To prove that no more than one of these possibilities can hold, we need the following
proposition.
Proposition 2.11. ∀a, n ∈ N, we have a + n ≠ n.

Proof. Induction on n. (INF) says the proposition is true for n = 1. Suppose
the proposition were not true for the successor k′ of some k. Then a + k′ = k′. But
then by (INJ) we would have a + k = k, which means the proposition would not be
true for k. This is the contrapositive of P(k) → P(k′). Thus P(k) → P(k′), and
we are done by induction. □

Corollary 2.12. ∀n ∈ N, ¬(n < n).

The corollary says that a natural number cannot be less than itself. Do you see
why the corollary follows immediately from the Proposition? If not, go back and
reread the definition of a < b.
Theorem 2.13. (Strong Trichotomy) Let a, b N. Then exactly one of a < b,
a > b, or a = b holds.

Proof. Suppose a < b and a = b. Then a < a, contradicting the above


corollary. Suppose a < b and b < a. Then by transitivity we have again a < a. The
case a = b, a > b is similar. 

Definition. If a < b, then b − a is the number x so that a + x = b.

Thus a + (b − a) = b = (b − a) + a by definition.

Note that this number is uniquely determined, by the Cancellation Law of Addition.

Here is one of the Associative Laws of Subtraction:

Lemma 2.14. Let x, y, z ∈ N. If x > y, then (z + x) − y = z + (x − y).

Proof. The calculation

[z + (x − y)] + y = z + [(x − y) + y]
= z + x

shows that z + (x − y) is (z + x) − y. □

Can you guess what the two other Associative Laws of Subtraction are? (They're
given as exercises.)

2.3. Properties of Multiplication

That is as far as we will go with just addition. At this point we will freely use the
associative and commutative rules of addition without comment.

We turn to multiplication, which is of course repeated addition.

Definition. The operation of multiplication in N is defined recursively
via

a · n = a, if n = 1;
a · n = a · m + a, if n = m′.

Example:
II · III = II · II + II = (II · I + II) + II = (II + II) + II = IV + II = VI.
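As with addition, one can sketch this recursion in code. The following Python fragment (ours, purely illustrative) builds multiplication on top of the add function from the earlier sketch.

# Peano-style multiplication, built from the recursive addition above.

def succ(n):
    return n + 1

def add(a, n):
    # a + n by recursion on n, as in the definition of addition
    if n == 1:
        return succ(a)
    return succ(add(a, n - 1))

def mul(a, n):
    # a * n by recursion on n: a * 1 = a, and a * m' = a * m + a
    if n == 1:
        return a
    return add(mul(a, n - 1), a)

# Example from the text: II * III = VI
print(mul(2, 3))   # prints 6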
Theorem 2.15. If a, b, n ∈ N, then we have (a + b) · n = a · n + b · n.

Proof. Induction on n. Since a · 1 = a by the definition of multiplication, the
case n = 1 reduces to a + b = a + b on both sides. Now suppose the theorem is true
for k, so that (a + b) · k = a · k + b · k.

To get P(k′) we have
(a + b) · k′ = (a + b) · k + (a + b)
= a · k + b · k + a + b
= (a · k + a) + (b · k + b)
= a · k′ + b · k′,
as desired. We are done by induction. □
Theorem 2.16. If a, b ∈ N, then a · b = b · a.

Proof. Exercise. □

Corollary 2.17. If a, b, y ∈ N, then y · (a + b) = y · a + y · b.

Proof. Combining the previous two results, we have

y · (a + b) = (a + b) · y
= a · y + b · y
= y · a + y · b. □

Theorem 2.18. If a, b, n ∈ N, then we have (a · b) · n = a · (b · n).

Proof. Induction on n. If n = 1 then both sides are a · b. Suppose the theorem
is true for n = k. Thus (a · b) · k = a · (b · k).

To get P(k′) we have
(a · b) · k′ = (a · b) · k + a · b
= a · (b · k) + a · b
= a · (b · k + b)
= a · (b · k′),
as desired. (Can you justify each step?) We are done by induction. □
Proposition 2.19. If a < b, then na < nb. Moreover n(b − a) = nb − na.

Proof. There is a number x = b − a so that a + x = b. Multiplying this by n
yields na + nx = nb; thus nb > na and nx = nb − na as claimed. □

Proposition 2.20. (Cancellation Law of Multiplication) If na = nb, then a = b.

Proof. By contradiction and Strong Trichotomy. If a < b, the previous propo-
sition and S.T. shows that na ≠ nb. Similarly if a > b. □

These are the elementary properties of addition and multiplication.

2.4. Exercises

(1) Lemma 2.3, Proposition 2.7, the Creeping Lemma, and Theorem 2.16.
(2) Prove that IV − II = II using the definitions.
(3) If a > b prove that a^2 > b^2.
(4) If a > b prove that a^2 − b^2 = (a + b)(a − b).
(5) If a > b + c prove that a − (b + c) = (a − b) − c.
(6) If b > c and a + c > b prove that a − (b − c) = (a + c) − b.
(7) State a recursive definition of a^b for a, b ∈ N, agreeing with the usual
sense. Use your definition to prove that for a, b, c ∈ N,
a^b · a^c = a^{b+c}.
(8) Prove that a^{bc} = (a^b)^c.

(9) Prove that if b > 1, and r < s, then b^r < b^s.
(10) Prove that if b, r, s ∈ N, with b > 1 and b^r = b^s, then r = s.
(11) Prove that if b^e = c^e, then b = c.
(12) (Associativity for Exponentiation) For which a, b, c ∈ N is it true that
a^{(b^c)} = (a^b)^c?
(13) Define an operation ⋆ recursively on natural numbers via

a ⋆ b = a, if b = 1;
a ⋆ b = a^{(a ⋆ c)}, if b = c′.

Put the following numbers in order from least to greatest:
I ⋆ V, II ⋆ IV, III ⋆ III, IV ⋆ II, V ⋆ I, and II ⋆ V.

3. The Division Algorithm

3.1. Divisibility and Quotients

Divisibility is the multiplicative analogue of inequality.

Definition. Let a, b ∈ N. We say a divides b, b is a multiple of
a, and a|b, provided that there is a number x ∈ N so that ax = b.

Here are some basic properties of divisibility, which the reader should verify:

Proposition 2.21. Let a, b, c ∈ N. We have:

• 1|a and a|a.
• If a|b and b|c, then a|c.
• If a|b, then ac|bc, and conversely.
• If a|b, then a|bc.
• If a|b and a|c, then a|b + c.
• If a|b, a|c, and b > c, then a|(b − c).
Proposition 2.22. Let a, b ∈ N. If a|b, then a ≤ b.

Proof. The hypothesis implies that b = ac for some c ∈ N. Since 1 ≤ c by
Lemma 2.8, the proposition follows from Proposition 2.19. □

Let's prove something concrete.

Proposition 2.23. II does not divide III.

Proof. Suppose that II | III. We also have II | II, so by the last item in Propo-
sition 2.21, we may conclude that II | I. But this contradicts the previous proposi-
tion. □

Definition. Let a, b ∈ N with a|b. We define b ÷ a to be the number
q ∈ N so that qa = b.

Note that this number is uniquely determined, by the Cancellation Law of Multi-
plication. For example n ÷ 1 = n for all n ∈ N. Here is a typical proposition and
proof involving this definition:

Proposition 2.24. If a|b and c|d then ac|bd and bd ÷ ac = (b ÷ a) · (d ÷ c).

By popular demand, we give two proofs, one straightforward and another concep-
tual. First, the straightforward and unimaginative one.

Proof. Let's just write out all the divisibility definitions. Say aq = b and
cp = d. Then aqcp = bd, so by definition, (b ÷ a) · (d ÷ c) = qp = bd ÷ ac. □

Next, for the student with a crisp sense of the definitions.

Proof. It is enough to show that (b ÷ a)(d ÷ c) satisfies the defining property
of bd ÷ ac:

(ac)((b ÷ a)(d ÷ c)) = (a(b ÷ a))(c(d ÷ c)) = bd. □


Definition. Let Div(a) = {d ∈ N | d|a}.

Thus, Div(a) is the set of divisors of a. For example, Div(XV) = {I, III, V, XV}.
Here are translations of parts of Proposition 2.21:

• {1, a} ⊆ Div(a).
• If a ∈ Div(b) and b ∈ Div(c), then a ∈ Div(c).
• If a ∈ Div(b), then a ∈ Div(bx).

Can you translate the rest?

3.2. Including Zero

This is a good time to append the number 0 to our set of numbers.

Definition. The set of whole numbers N̄ is defined as the union of the
natural numbers N with a new element 0.

Thus if n ∈ N̄, then (n = 0) ∨ (n ∈ N). Now that we have two sets of numbers, we
must be careful to specify whether a variable n is in N or N̄.

We now describe how to extend succession, addition, inequality, multiplication, and
exponentiation to N̄. We put 0′ = 1. Note that now N̄ fails (INF); this is okay
because N̄ ≠ N. (Although if we just rewrote Peano's Axioms with 0 in place of 1,
we would have defined N̄!)

Definition. The operation of addition in N̄ is defined via

m + n = the same m + n, if m, n ∈ N;
m + n = m, if n = 0;
m + n = n, if m = 0.

Note that this is consistent if m = n = 0.

Inequality: As before, for a, b ∈ N̄, we define a < b provided that there is a
number x ∈ N so that a + x = b. For example 0 < 1.

The following basic properties of addition and inequality which we proved for N
also hold for N̄: Commutativity and Associativity of Addition, Trichotomy, the
Creeping Lemma. It is a bit tedious to verify that so many propositions we proved
for N still hold for whole numbers, so let us just give a sample, leaving the rest to
the reader.

Proposition 2.25. For a, b ∈ N̄ we have a + b = b + a.

Proof. If a, b ∈ N, then this has already been proved. If a = 0, then by
the definition above both sides of the equation are equal to b. Similarly if b = 0.
Therefore a + b = b + a in all cases. □

Obviously Lemma 2.8 fails and should be replaced with 0 ≤ n for all n ∈ N̄. Again,
the proof is just by two cases: either n ∈ N or n = 0.

Subtraction should now be extended to m − n for m, n ∈ N̄ satisfying n ≤ m. As
before m − n is defined as the number x so that n + x = m. Since we now have
n + 0 = n, this gives n − n = 0. Also note that x − 0 = x by the same token.

Multiplication should give no surprise:

Definition. The operation of multiplication in N̄ is defined via

m · n = the same m · n, if m, n ∈ N;
m · n = 0, if (m = 0) ∨ (n = 0).

Commutativity, Associativity, and Distributivity, and Proposition 2.22 [fix] are
still true in N̄ and quite easy to check. Unfortunately we must give up on the
Cancellation Law for Multiplication for whole numbers, since 0 · 0 = 0 · 1, for
example. A subtle ramification of this is that although the equation 0 · x = 0 has
solutions, it does not have a unique solution, so we do not define the expression
0 ÷ 0. We do have 0 ÷ n = 0 for n ∈ N. Before we forget, let us treat exponentiation,
which might give a mild surprise:

Definition. The operation of exponentiation in N̄ is defined via

m^n = the same m^n, if m, n ∈ N;
m^n = 1, if n = 0;
m^n = 0, if (m = 0) ∧ (n ≠ 0).

Note that this means 0^x = 0 except at x = 0, a kind of discontinuity. One can
check that this definition satisfies the usual exponentiation rules.

Remark: One reason for the definition 0^0 = 1 is for the facility of power series.
For example, e^x = ∑_{i=0}^{∞} x^i/i! evaluated at x = 0 gives 1 = 0^0/0!, which suggests that
we define both 0^0 and 0! to be 1. A more philosophical reason that 0^0 = 1 is as
follows. An expression like 0^0 or 1^0, or even 0!, is what's called an empty product,
in which nothing is being multiplied. Now an empty sum should be the neutral
number for addition, which is 0, but the empty product should be the neutral
number for multiplication, which is 1.

4. The Division Algorithm

Here is a very important theorem about integers which lies at the heart of arith-
metic.

Theorem 2.26. (Division Algorithm, Existence) Let a ∈ N and b ∈ N̄. Then there
are numbers q, r ∈ N̄ so that b = qa + r and r < a.

Proof. Fix a; we want to induct on b. Write P(b) for the statement of the
theorem. If b = 0 put q = r = 0. Suppose P(k) is true.

Then k = qa + r, with r < a. Then we have k′ = qa + r′. By the Creeping
Lemma, r′ ≤ a. If r′ < a, then we are done. If r′ = a, then by the definition of
multiplication, k′ = q′a = q′a + 0. Thus P(k′) is true, so we are done. □

Remark: We have used the induction scheme

(P(0) ∧ (∀k (P(k) → P(k′)))) → (∀n ∈ N̄, P(n)).

This relates to Standard Induction as follows. If P(0) is true, then by the hypoth-
esis, so is P(1). Then together with P(k) → P(k′) we recover the hypotheses of
Standard Induction, thus get P(n) for all n ∈ N. We also have P(0), so we finally
get P(n) for all n ∈ N̄.
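Computationally, the Division Algorithm is just repeated subtraction. Here is a small Python sketch (ours, not from the text) producing q and r with b = qa + r and r < a; the name divide is our own.

def divide(b, a):
    # Division algorithm: given b >= 0 and a >= 1, return (q, r)
    # with b = q*a + r and 0 <= r < a, by repeatedly subtracting a.
    q, r = 0, b
    while r >= a:
        r -= a
        q += 1
    return q, r

# For instance, dividing 17 by 5 gives quotient 3 and remainder 2.
print(divide(17, 5))   # prints (3, 2)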

Let us pause for a tiny application. For sanity's sake, let 2 = II.

Definition. Let n ∈ N. Then n is even provided that 2|n. On the other
hand, n is odd provided that 2 ∤ n.

Lemma 2.27. Let n ∈ N. Then n is odd if and only if ∃k ∈ N̄ with n = 2k + 1.

Proof. (⇒) Suppose that n is odd. Apply the division algorithm to n and 2,
so that n = 2k + r with r, k ∈ N̄ and r < 2. If r = 0, then 2|n, a contradiction.
The only possibility is then r = 1, which gives n = 2k + 1 as claimed. (⇐) Suppose
n = 2k + 1 is even. Then 2|2k + 1. Since plainly 2|2k, we also have 2 | (2k + 1) − 2k, thus
2|1. But this is impossible by Proposition 2.22 [fix] since 1 < 2. This contradiction
shows that n must be odd. □
Lemma 2.28. Let n ∈ N. If n^2 is even, then n is even.

Proof. We prove this by contraposition. If n is not even, then by the previous
lemma, we may write n = 2k + 1 for some k ∈ N̄. Then n^2 = 4k^2 + 4k + 1 =
2(2k^2 + 2k) + 1, which by the previous lemma implies that n^2 is odd. □

I want to break the rules and talk about rational numbers for just a moment, to
show one of the greatest mathematical proofs of all time, the irrationality of √2,
as an application of the work we've just done.
Theorem 2.29. There does not exist a rational number whose square is 2.

Proof. We prove this by contradiction. Suppose that there exists q ∈ Q with
q^2 = 2. We may assume q > 0. (Why?) By Exercise 10 [fix] in Section 7 of the
previous chapter, we may write q = a/b, with a, b ∈ N not both even. Then a^2/b^2 = 2,
so that a^2 = 2b^2. In particular, a^2 is even. By the previous lemma, this implies
that a is itself even. We may write a = 2k for some k ∈ N. Thus (2k)^2 = 2b^2,
and canceling 2s gives 2k^2 = b^2. So now b^2 is even, and again this means that b
is even. This contradicts the fact that at least one of a, b is odd, and we conclude
that such a q must not exist. □

Now we return to Peano Theory. For the uniqueness part of the Division Algo-
rithm, we generalize the (⇐) part of the proof of Lemma 2.25. [fix]

Theorem 2.30. (Division Algorithm, Uniqueness) Let a ∈ N. Suppose there are
q_1, q_2, r_1, r_2 ∈ N̄ with
q_1 a + r_1 = q_2 a + r_2,
and r_1, r_2 < a. Then q_1 = q_2 and r_1 = r_2.

Proof. The idea is simple enough, but working it out carefully is a good
benchmark for our Peano Theory. Consider Trichotomy for r_1 and r_2. If they
are unequal then one is greater than the other. Say r_1 < r_2. By definition of
subtraction, this means that q_1 a = (q_2 a + r_2) − r_1. By Lemma 2.14 [fix], q_1 a =
q_2 a + (r_2 − r_1). This shows that q_1 a > q_2 a, and indeed that q_1 a − q_2 a = r_2 − r_1. By
Proposition 2.19 [fix], we have a(q_1 − q_2) = r_2 − r_1. The left hand side is plainly a
nonzero multiple of a, thus by Proposition 2.22 [fix],
a ≤ r_2 − r_1 ≤ r_2 < a,
which contradicts Strong Trichotomy. Thus r_1 = r_2, which implies that q_1 a = q_2 a
by the Cancellation Law of Addition. By the Cancellation Law of Multiplication
we conclude that q_1 = q_2. □

4.1. Exercises

(1) Prove the two cancellation laws for division:
(a) If a, b, n ∈ N with n|a and n|b and a ÷ n = b ÷ n, then a = b.
(b) If a, b, n ∈ N with a|n and b|n and n ÷ a = n ÷ b, then a = b.
(2) Suppose b|a and d|c. Prove (a ÷ b) + (c ÷ d) = (ad + bc) ÷ (bd).
(3) Let a, m, n ∈ N with m ≤ n. Prove that a^{n−m} = a^n ÷ a^m.
(4) Let (h_n) denote the Hemachandra sequence. Prove that if d|n then h_d | h_n.
(Use Exercise 9 in Section 7.4. [fix])
(5) Suppose a, b_1, b_2 ∈ N satisfy a ≤ b_1 + b_2. Prove that there are a_1, a_2 ∈ N̄
so that a = a_1 + a_2, a_1 ≤ b_1, and a_2 ≤ b_2.
(6) Consider the set N_∞ = N ∪ {∞}, with addition and multiplication defined
via

m + n = the same m + n, if m, n ∈ N;
m + n = ∞, if (m = ∞) ∨ (n = ∞);

m · n = the same m · n, if m, n ∈ N;
m · n = ∞, if (m = ∞) ∨ (n = ∞).

Which of the following properties does addition/multiplication in N_∞
not satisfy? Commutativity, Associativity, Distributivity, Cancellation.
Just give counterexamples for any failing properties. Suppose we wanted
to extend N_∞ further to include 0, extending the rules of N̄. Clearly
0 + ∞ should be ∞. Are there any values we can give 0 · ∞ so that
no further properties (other than ones you've discarded in the previous
paragraph) fail?
(7) Let a ≥ 2 and b ∈ N. Prove that there are numbers q, r, e ∈ N̄ so that
b = qa^e + r and 0 < q < a and 0 ≤ r < a^e.
(Follow the proof of the Division Algorithm.)
(8) Let x, y ∈ N. Say that x ≼ y provided that there is an n ∈ N so that
x^n = y. Determine whether ≼ is transitive. In other words, if a ≼ b and
b ≼ c, is it necessarily true that a ≼ c?
(9) Let x, y ∈ N. Say that x ⊑ y provided that there is an n ∈ N so that
n^x = y. Determine whether ⊑ is transitive. In other words, if a ⊑ b and
b ⊑ c, is it necessarily true that a ⊑ c?
(10) Prove that there is no rational number whose square is 1/2.
(11) Prove that if n is a whole number and n^2 is divisible by 3, then n is
divisible by 3.
(12) Prove that there is no rational number whose square is 2, 3, 6, or 2/3.
(13) Prove that there is no rational number whose cube is 2.

5. Superlatives

Let us discuss the notion of the minimum and maximum member of a set of
numbers.
Definition. Let S ⊆ N. A number ℓ ∈ N is a lower bound of S
provided that (s ∈ S) → (ℓ ≤ s). A number u ∈ N is an upper bound
of S provided that (s ∈ S) → (u ≥ s).

For example, if S is the set of positive even numbers, then 1 and 2 are the only
natural numbers which are lower bounds of S, and S has no upper bounds.

If S is any subset of N, then certainly 1 is a lower bound of S. We will soon prove
that every nonempty set S of natural numbers has a lower bound m which is an
element of S. We call this the minimum of S. This is something special about
natural numbers, as compared to subsets of the real numbers. Denote by R_{>0} the
set of positive real numbers; then R_{>0} does not have a minimum. It has a greatest
lower bound, namely 0, but 0 ∉ R_{>0}.

If S = ∅, then pure logic dictates that every n ∈ N is both an upper and lower
bound of S. (Can you see that?) However since n ∉ S for all n, it obviously cannot
have a minimum element.
Theorem 2.31. (Well-Ordering, Min Form) Let S ⊆ N be nonempty. Then there
is an element m ∈ S which is a lower bound of S.

Proof. This will be a proof by contradiction. Suppose that the theorem is
false. This means that no lower bounds of S are themselves elements of S. Let T
be the set of lower bounds of S. That is,

T = {n ∈ N | n is a lower bound of S}.

Since S ≠ ∅, we know that 1 ∈ T (using Lemma 2.8). We will prove that T is
inductive. Let n ∈ T. This means that n ≤ s for every s ∈ S. Since we are
supposing the theorem is false, we cannot have n ∈ S. Therefore n ≠ s, so n < s.
By the Creeping Lemma, we know that for all s ∈ S, n′ ≤ s. Therefore n′ ∈ T.
This reasoning shows that T is inductive.

By (IND) we conclude that T = N, and therefore S = ∅. This is the contradiction,
which finishes the proof. □

Definition. The number m in the above theorem is called the minimum


of S, or min S.

Theorem 2.32. (Well-Ordering, Max Form) Let S ⊆ N be a nonempty subset of N
which is bounded above. Then there is an element M ∈ S which is an upper bound
of S.

Exercise; apply the Min Form to the set of upper bounds for S.

Definition. The number M in the above theorem is called the maxi-


mum of S, or max S.

Example: Let a, b ∈ N and put

S = {n ∈ N | bn ≤ a}.

Then a is an upper bound of S, since if n ∈ S, then n ≤ bn ≤ a. Therefore S has
a maximum. (How can you compute it?)

Example: Write Div(a, b) for the set of common divisors of a and b. In other
words, Div(a, b) = Div(a) ∩ Div(b), the intersection of the two sets.

This set is bounded above by a, and 1 ∈ Div(a, b), so it is not empty. Therefore it
has a maximum element, called the greatest common divisor of a and b.

Definition. Let gcd(a, b) = max Div(a, b); it is called the greatest


common divisor of a and b.

Example: Div(42) = {1, 2, 3, 6, 7, 14, 21, 42} and Div(24) = {1, 2, 3, 4, 6, 8, 12, 24},
so Div(42, 24) = {1, 2, 3, 6}. It is easy to see now that gcd(42, 24) = 6.

Similarly write Mult(a, b) for the set of common multiples of a and b. Since ab ∈
Mult(a, b) this is nonempty; thus it has a minimum element.

Definition. Let lcm(a, b) = min Mult(a, b); it is called the least com-
mon multiple of a and b.

Example: Mult(42) = {42, 84, 126, 168, 210, . . .} and Mult(24) = {24, 48, 72, 96, 120, 144, 168, 192, . . .},
so Mult(42, 24) = {168, . . .}. Therefore lcm(42, 24) = 168.
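These definitions can be tested by brute force. The short Python sketch below (ours, purely illustrative) builds Div(a) and Div(a, b) exactly as defined, and computes the gcd as the maximum of the common divisors and the lcm as the minimum common multiple (which is at most ab).

def Div(a):
    # the set of divisors of a
    return {d for d in range(1, a + 1) if a % d == 0}

def gcd(a, b):
    # greatest common divisor: maximum of the set of common divisors
    return max(Div(a) & Div(b))

def lcm(a, b):
    # least common multiple: ab is always a common multiple,
    # so it suffices to search up to ab
    return min(m for m in range(1, a * b + 1) if m % a == 0 and m % b == 0)

print(sorted(Div(42) & Div(24)))   # [1, 2, 3, 6]
print(gcd(42, 24), lcm(42, 24))    # 6 168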

More generally, let a1 , . . . , an be any finite set of numbers. Write Div(a1 , . . . , an )


for the set of numbers dividing each of a1 , . . . , an , and Mult(a1 , . . . , an ) for the set
of numbers divisible by each of a1 , . . . , an . As above we may let gcd(a1 , . . . , an ) =
max Div(a1 , . . . , an ) and lcm(a1 , . . . , an ) = min Mult(a1 , . . . , an ).

6. Euclidean Algorithm

If you are like most students, you have an old habit of thinking about the gcd of
two numbers as follows. You take your two numbers, factor them, and then for each
prime note the smaller exponent that occurs in the factorizations of both numbers.
The exponents of primes appearing in the factorization of the gcd will be these
smaller exponents.

While we will eventually derive this characterization of the gcd, you should forget
about it for a while for two reasons. One, it is usually inefficient to factor large
numbers. Two, at this point in the course we are trying to train you to under-
stand the logic of the definition of mins and maxes, as well as digest the theory of
divisibility.

Try to work through the following two lemmas, to break yourself from the afore-
mentioned habits.
Lemma 2.33. If b = qa + r, then gcd(a, b) = gcd(a, r).

Proof. This follows from the fact that Div(a, b) = Div(a, r), which the reader
should prove. 
Lemma 2.34. If a|b, then gcd(a, b) = a.

Proof. This is a good exercise for you to do right now. Use the definition of
gcd! 

These two lemmas allow us to compute the gcd of any two natural numbers. Con-
sider, for example, a = 51 and b = 36. (Allow me to use normal notation for
numbers for this example.) Applying the division algorithm yields
51 = 1 · 36 + 15.
By the first lemma, we conclude that gcd(51, 36) = gcd(36, 15). So we have simpli-
fied the problem. Next,
36 = 2 · 15 + 6.
Thus gcd(36, 15) = gcd(15, 6). Next,
15 = 2 · 6 + 3.
Thus gcd(15, 6) = gcd(6, 3). But since 3|6, we know gcd(3, 6) = 3 by the second
lemma. Thus gcd(51, 36) = 3.

This is a great algorithm for computing gcds, and originates in the first proposition
of Book VII of Euclid's Elements. It is described therein as antenaresis, or
repeated subtraction.
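In code, the two lemmas become a very short loop. The following Python sketch (ours, with the name gcd_euclid chosen only for illustration) replaces the pair (a, b) by (r, a), where r is the remainder of b upon division by a, until a divides b.

def gcd_euclid(a, b):
    # Euclidean algorithm: gcd(a, b) = gcd(r, a) where b = q*a + r,
    # and gcd(a, b) = a once a divides b (that is, once r == 0).
    while b % a != 0:
        a, b = b % a, a
    return a

print(gcd_euclid(36, 51))   # prints 3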

There is a second phase of this algorithm, which allows us to express gcd(a, b) as
the difference of a multiple of a and a multiple of b. We iteratively use the idea that
if b = qa + r, then r = b − qa. Thus we retrace the steps of the first algorithm, each
time writing the remainder as the dividend minus the quotient times the divisor.
In our present example we start with
3 = 15 − 2 · 6.

The next step is to start with the smaller of the underlined numbers on the right,
find the equation in which it is the remainder, and use that equation to substitute in
a difference of larger numbers.
3 = 15 − 2 · (36 − 2 · 15).
Then combine terms.
3 = 5 · 15 − 2 · 36.
Now the 15 is the smaller of the underlined numbers, so again substitute and com-
bine:
3 = 5 · (51 − 1 · 36) − 2 · 36,
3 = 5 · 51 − 7 · 36.

This expresses 3 as the difference of a multiple of 51 and a multiple of 36.

Very soon (after we learn Strong Induction), we will formally prove:

Theorem 2.35. (Euclidean Algorithm) If a, b ∈ N and d = gcd(a, b), then there
are m, n ∈ N so that ma − nb = d.

Meanwhile just practice with the algorithm; the proof won't be much more than
that.
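If you want to check your back-substitution, here is a recursive Python sketch (ours, not the book's algorithm verbatim): it steps outside N and allows negative coefficients for convenience, returning x and y with x·a + y·b = d.

def extended_gcd(a, b):
    # Returns (d, x, y) with x*a + y*b = d = gcd(a, b).
    # The coefficients x, y may be negative integers.
    if b == 0:
        return a, 1, 0
    d, x, y = extended_gcd(b, a % b)
    # Here x*b + y*(a % b) = d and a % b = a - (a // b)*b,
    # so d = y*a + (x - (a // b)*y)*b.
    return d, y, x - (a // b) * y

d, x, y = extended_gcd(51, 36)
print(d, x, y)   # prints 3 5 -7, i.e. 3 = 5*51 - 7*36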

7. Strong Induction

In Exercise 8 in Section 8.4 you were asked to prove a formula for the Hemachandra
numbers, using the induction scheme
(P(1) ∧ P(2) ∧ (∀k ((P(k) ∧ P(k+1)) → P(k+2)))) → ∀n P(n).
This meant that you were to verify the formula at n = 1 and 2, and then prove
that if the formula is correct for two consecutive numbers, then it is true for the
next number.

We are now in a position to prove that this is a valid induction scheme. It comes
down to the following proposition:

Proposition 2.36. Let S ⊆ N be a subset of N satisfying the following properties.

(1) 1, 2 ∈ S.
(2) ∀n ∈ N, we have (n, n + 1 ∈ S) → (n + 2 ∈ S).

Then S = N.

Before proving this, let me spell out the relationship with the above induction
scheme. Suppose you are given propositions P(n) as in the Hemachandra situation,
then let S = {n ∈ N | P(n)}. Then knowing P(1), P(2) and knowing that P(k)
and P(k + 1) together imply P(k + 2) tells you that S satisfies properties (1) and (2).
Therefore S = N and so P(n) is true for all n ∈ N.

Proof. Let T = S^c be the complement of S in N. Suppose T is nonempty.
Then T has a minimum, say m = min T. Since 1, 2 ∈ S we know they are not in
T. Therefore m ≠ 1 and m ≠ 2. Consider the numbers m − 1 and m − 2. Since
they are both less than m they are not in T. Therefore they are in S. By property
(2) m ∈ S, so it is not in T. This is a contradiction. Therefore we know T must be
empty and therefore S = N. □

I hope you get the feeling from the above proof that this min technique is very
powerful. It seems unfit to use it for such a random-looking induction scheme. In
fact, this same technique will take us all the way to Strong Induction.

The idea of Strong Induction is as follows: Again you have a sequence P (n) of
propositions, and know that P (1), say, is true. Suppose you can always reduce
P (k) to either

(1) some P(ℓ) for ℓ < k, or
(2) some combination of P(ℓ)'s with various ℓ < k.

Then you know P (n) is true for all n.

This should make sense to you, because you are always decreasing n until it finally
gets down to 1. You shouldn't have to worry exactly how it decreases, just that it
does.

We codify the above as

(P(1) ∧ (∀k > 1 ((P(1) ∧ · · · ∧ P(k − 1)) → P(k)))) → ∀n P(n).

Although you may not necessarily need all k between 1 and n, it's simpler to
suppose that you do.

7.1. The Chocolate Bar Problem

In this paragraph we step outside Peano Theory to give an example of a problem
resolved with Strong Induction.
resolved with Strong Induction.

Suppose I have a bar of chocolate with vertical and horizontal lines dividing it into
an m × n grid of segments, for some m, n ∈ N. I want to break the bar into mn
pieces, by breaking the bar along the lines. Let's say that a split is the act of
breaking a bar along a line to get two smaller bars. How many splits will it take?

Proposition 2.37. It always takes mn − 1 splits.

It's not hard to come up with an algorithm that breaks up the bar in some organized
fashion, and check that your algorithm takes mn − 1 splits. But the stronger fact
is that no matter how you break it up, it always takes the same number of splits.

[Someday, a diagram...]

Proof. We prove the proposition by strong induction on the number of seg-
ments, which is mn. In other words our statement P(N) is "If a bar has N segments,
then it always takes N − 1 splits to separate them into N pieces."

This is clear for P(1), because this is the case of only one piece, so no splits are
required.

Suppose N > 1. Then some split is possible. Suppose the splitting is as in the
diagram. Then the total number of splits required is
1 + (m_1 n − 1) + (m_2 n − 1)
by P(m_1 n) ∧ P(m_2 n). Since this is equal to mn − 1 = N − 1, we have proved the
inductive step. We are done by strong induction. □

7.2. Proof of Strong Induction

We now prove the validity of the strong induction scheme. It's just like the proof
of the validity of the Hemachandra induction scheme. As before, our proof will use
the Min Form of Well-ordering, which was proved using (IND). So really, standard
induction implies strong induction.

The validity of the scheme amounts, as usual, to a statement about subsets of N.
Remember, the set S below should be thought of as the set of n ∈ N so that P(n)
is true.

Theorem 2.38. (Strong Induction) Let S ⊆ N be a subset of N satisfying the
following properties.

(1) 1 ∈ S.
(2) ∀k > 1, we have (1, . . . , k − 1 ∈ S) → (k ∈ S). (In other words, if all
numbers less than k are in S, then k ∈ S.)

Then S = N.

Proof. Let T = S^c be the set of natural numbers not in S. If the theorem is not
true then S is not N and therefore T is nonempty. Let m = min T. Note m > 1
since 1 ∈ S. Also note that if k < m, then k ∉ T so k ∈ S. By hypothesis, m ∈ S,
which is a contradiction. □

7.3. Exercises

(1) Theorem 2.32, Lemma 2.33, and Lemma 2.34.



(2) Let a ∈ N. To compute Div(a), we need to check whether each number
n ≤ a divides a. Naturally we would do this in order starting with n = 1.
Write Div(a) = {d_1, d_2, d_3, . . .} with d_i < d_{i+1} for all i.
(a) Prove that a is either a perfect square, or the product of two consec-
utive divisors d_m, d_{m+1}.
(b) Suppose that a = d_m d_{m+1} is the product of two consecutive divisors
d_m, d_{m+1}. Prove that
Div(a) = {d_1, d_2, . . . , d_m, a ÷ d_m, a ÷ d_{m−1}, . . . , a ÷ d_1}.
(c) What happens if instead, d_m^2 = a?
(This helps compute Div(a).)
(3) For each of the following pairs of numbers a, b, find d = gcd(a, b) and
numbers m, n ∈ N so that ma − nb = d and numbers p, q ∈ N so that
pb − qa = d.
(a) a = 9409, b = 7081.
(b) a = 165, b = 224.
(4) If a = 2 in Theorem 2.35, and b is odd, what are m and n? What if a is
odd and b = 2?
(5) Find natural numbers x, y, z so that 35x + 15y − 21z = 1.
(6) Fix a number a ∈ N. Suppose S ⊆ N is a set of numbers satisfying the
following two properties.
(a) a ∈ S.
(b) Whenever n ∈ S then n′ ∈ S.
Prove that S contains all numbers greater or equal to a.
(7) Prove gcd(ac, bc) = c · gcd(a, b) for a, b, c ∈ N.
(8) If d = gcd(a, b) prove that gcd(2^a − 1, 2^b − 1) = 2^d − 1. (Use Problem 5
from Section ??.)
(9) Prove gcd(a + b, b) = gcd(a, b).
(10) Prove that ((∀a odd, P(a)) ∧ (∀k (P(k) → P(2k)))) → ∀n P(n) is a valid
induction scheme.
(11) Prove that every natural number can be expressed as a sum of distinct
Hemachandra numbers.
(12) Consider the following two player game, played using two piles of pennies:
Players take turns. In each turn a player picks one pile and removes
some (natural) number of pennies from that pile. The player removing
the last penny wins.
Prove that, as long as the two piles begin with an equal number of
pennies, the second player can always win.
(13) Write h_k for the k-th Hemachandra number, starting with h_1 = h_2 = 1.
Prove gcd(h_k, h_{k+1}) = 1 for all k.
(14) Prove that if h_d | h_n then d|n. (Suggestion: Use the previous problem,
problem 6, and problem ?? in Section ??.)

8. Place-Value Systems

There are very effective ways to represent numbers, using what is called the Place-
Value system. This is the way of expressing whole numbers by assembling a (finite)
string of digits. Let us begin with base X.

Our digits will be the familiar Hindu-Arabic numerals {1, 2, 3, 4, 5, 6, 7, 8, 9} in place
of {I, II, III, IV, V, VI, VII, VIII, IX}. Thus the definition of 2 is 1′, the definition of
3 is 2′, etc. We will also use the common words one, two, three, ..., nine to have
the usual meaning.

These numerals comprise a fixed set of digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} which is
used, in that order. The successor of each digit is another digit, unless the digit is
equal to 9.

Here is a number expressed in the most common place-value system.

2608 = 2 · X^3 + 6 · X^2 + 0 · X^1 + 8 · X^0.

How do you take the successor of a number in place-value language? If you have a
number N represented by a string of digits d_m d_{m−1} · · · d_1 d_0, then N′ is usually given
by d_m d_{m−1} · · · d_1 d_0′, where d_0′ is the digit coming after d_0. That rule doesn't work
if d_0 = 9, and in that case, the successor of N is usually given by d_m d_{m−1} · · · d_1′ 0.
And so forth. Either one reaches a digit unequal to 9, or all of the digits must be
equal to 9, in which case the successor of N is of course 100 · · · 0, with m + 1 zeroes.

(One could be more rigorous here. Any instance of "and so forth" in mathematics
can be replaced with an inductive argument.)
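The "and so forth" can be made concrete. Here is a short Python sketch (ours, purely illustrative) of the successor rule on decimal digit strings, working from the rightmost digit and carrying past 9s.

def successor(digits):
    # digits: a decimal string like "2608"; returns the string for N'
    ds = list(digits)
    i = len(ds) - 1
    while i >= 0 and ds[i] == '9':
        ds[i] = '0'                 # a 9 rolls over to 0, carry left
        i -= 1
    if i < 0:
        return '1' + ''.join(ds)    # all digits were 9: prepend a 1
    ds[i] = str(int(ds[i]) + 1)     # replace the digit by the next digit
    return ''.join(ds)

print(successor("2608"), successor("1999"))   # prints 2609 2000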

The place-value system is an invention of Indian astronomers who worked with cos-
mic units of time, like the kalpa, which is precisely 4,320,000,000 years. Certainly
this also necessitated the use of the digit 0, or shūnya to them, meaning void.
According to the tremendous book [4], the place-value system was in place by 458
CE, and certainly originated during the Gupta Dynasty. After you learn to take suc-
cessors, you learn to add and multiply. First you memorize addition/multiplication
tables, which tell you how to add/multiply digits. Then you learn inductive al-
gorithms for adding/multiplying entire strings. There are of course algorithms for
subtracting and long division as well. All of these algorithms of arithmetic can be
proved using the definition of the strings, and the distributive property. We will
not do this here.

A student of mathematics should learn to distinguish between the notion of number
and the strings of digits we use to represent them. A good way is to learn arithmetic
in another base. One can use any b > 1 to replace the role of X in the above. The
digits used are instead {0, 1, . . . , b − 1}. Base b = III is called ternary, and base
b = X is called decimal. The ternary digits are {0, 1, 2}, and the string
2011 = 2 · b^{10} + 0 · b^2 + 1 · b + 1 = LVIII,
in our neutral Roman numerals. (Note that the exponent 10 here is the usual
three.)

The first several ternary numbers are:


1, 2, 10, 11, 12, 20, 21, 22, 100, 101, 102, 110, 111, 112, 120, 121, 122, 200, 201, 202, 210, . . .
You should check as above using the exponential sums that they convert into the
right numbers. Note that this counting follows the same rules as for base X except
that 2 is the last digit instead of 9. Also notice analogues of facts in the decimal
system: The last digit of a ternary number is 0 iff the number is divisible by 3.
What does it mean if the last two digits are zeros?

As another example, say the base b = 7. With this convention, we have for instance
234 = 2 · b^2 + 3 · b + 4, which is CXXIII.

Remark: Sometimes one writes 234_7 to distinguish this from the base ten expres-
sion, which would be 234_{10} = CCXXXIV. But too much notation can be a headache,
so we will rely on context.

Ten is a conveniently sized number, and it is related to our anatomy, so it is easy to
learn as a child. But in many situations, for example computer science, the base II
and powers of II are common. It is also true that certain math problems (see the
exercises) are more easily solved in another base.

8.1. Practice with binary arithmetic

In this section we will compute the sum

∑_{n=1}^{1000} n!

with base b = II, thus in binary. Binary arithmetic is simpler than decimal in the
sense that rather than needing two nine-by-nine tables of addition and multiplication,
we only need to know that 1 + 1 = 10 and 1 · 1 = 1. (The rules for 0 being universal.)
The sum is equal to

1! + 10! + 11! + 100! + 101! + 110! + 111! + 1000!.

The exercise here is to try not to convert these into decimal, do the operation,
and convert back, but to do the entire computation in binary. We have of course
1! = 1 and 10! = 10. The next term is 11! = 11 · 10 · 1 = 11 · 10. We use long
multiplication:

      11
   ×  10
   -----
      00
  +  110
   -----
     110

It should be clear from this that the rule for multiplying by 10 is to simply append
a zero to the end of your digits. Similarly the rule for multiplying by 100 is to
append two zeros to the end. Thus 100! = 100 · 110 = 11000. Next we use again
long multiplication to compute 101! = 101 · 11000:

      11000
   ×    101
   --------
      11000
  + 1100000
   --------
    1111000

The next calculation, 110! = 110 · 1111000, is a little more interesting, since we
have some carrying of addition:

       1111000
   ×       110
   -----------
      11110000
  +  111100000
   -----------
    1011010000

Did you catch that? We used 1 + 1 = 10 in the sixth place, and 1 + 1 + 1 = 11
in places seven through nine, carrying the 1 each time. Similarly with 111! =
111 · 1011010000:

        1011010000
   ×           111
   ---------------
        1011010000
       10110100000
  +   101101000000
   ---------------
     1001110110000

Finally, 1000! = 1001110110000000.

Thus the sum ∑_{n=1}^{1000} n! is equal to:

                   1
                  10
                 110
               11000
             1111000
          1011010000
       1001110110000
  + 1001110110000000
  ------------------
    1011010010011001

as the reader should verify.

Strictly speaking, this was just an illustration of basic binary arithmetic. But I'd
like to point something out. Suppose we were to continue this computation, adding
on successively higher factorials. All higher factorials N! with N ≥ 1000 end in at
least seven zeros, since they are divisible by 1000!. Therefore the last seven digits
of the sum ∑_{n=1}^{N} n! for N ≥ 1000 are 0011001. Moreover, as n increases, n! ends in
more and more zeros. Thus more and more ending digits of the sum will stabilize.
For example, 10000! ends in fifteen zeros, so we conclude that the last fifteen digits
of ∑_{n=1}^{N} n! for N ≥ 10000 will be the same. (In fact, they are 111101000011001.)

This process gives us an infinite binary expansion going to the left. Does it even-
tually repeat? No one knows.
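If you would like to check the computation above, here is a short Python sketch (ours) that forms the sum of the first eight factorials and prints it in binary; bin() is Python's built-in base-two formatter.

from math import factorial

# Sum 1! + 2! + ... + 8! (the decimal names of the binary 1!, ..., 1000!)
total = sum(factorial(n) for n in range(1, 9))
print(bin(total)[2:])   # strip the "0b" prefix; prints 1011010010011001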

8.2. Subtraction and Long Division

You should be able to figure out how subtraction is done in other bases. For
instance, in ternary we have

    201210
  − 122212
  --------
      1221

Check! Of course there was some borrowing as you subtract from right to left.
As usual, you can doublecheck by adding 1221 + 122212 and see whether you get
201210.

Now you're ready for long division. Base ten long division, as taught in elementary
school, is a very mysterious-looking algorithm. It is a good question to ask why
it gives the correct quotient and remainder of the Division Algorithm. We
will not answer that question in this book, but instead we will demonstrate how to
perform long division in some other bases. We proceed by analogy.

Let us start with binary again, and in binary, divide 111011 by 101. Here's how
the final long division looks:

        1011  R 100
       ______
  101 | 111011
       -101
       ----
         1001
         -101
         ----
          1001
          -101
          ----
           100

Can you figure out what happened? In one way, this is easier than decimal long
division, because there are only two digits involved. In the first step, we ask: is
101 ≤ 1, 11, 111? And since 111 is the first part of the number 111011 which is
at least as big as 101, we put the digit 1 above the third 1, where we are forming
the quotient q. Then we subtract 101 from 111 to get 10. Now we bring down the
digit 0. The new number 100 is still less than 101, so we put the next digit 0 in the
quotient, and bring down another digit. And so on. When we are out of digits,
the remainder is 100, which we record next to the big R.

From our belief in long division, we conclude that 111011 = 1011 · 101 + 100.

[Add a ternary example.]

8.3. Converting numbers from one base to another

I've presented N in this book as the set {1, 1′, 1′′, . . .}. Later I agreed to use Ro-
man numerals {I, II, III, IV, . . .} for numbers less than four thousand. That's our
neutral way to express numbers, even though it is not far from decimal.

It is also fine to think of N as the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, . . .}, when we are
tacitly using decimal notation. But if binary is the convention, then it is fine to
think of N as the set {1, 10, 11, 100, 101, . . .}. And so on for the different bases. As
long as it's clear what 1 is, and what the successor function is, and Peano's axioms
are satisfied, it's just N with different notation.

To understand how this works, let us work through some examples of how to convert
from one base to another. The easiest is converting to decimal. The number written
as 25037 in base b = 8 can be computed in decimal as:

2 · 8^4 + 5 · 8^3 + 0 · 8^2 + 3 · 8^1 + 7 · 8^0 = 10783.

How to convert a decimal number like 343 into binary or base 5? There are two
methods.

Method I: Top-down. This method finds the digits from left to right.

In binary, you need to express your number as a sum of distinct powers of 2,
including 2^0. So, find the highest power of 2 less than your number. Here the
highest power is 256 = 2^8. Subtract it off and you get 87. The highest power of 2
less than 87 is 64 = 2^6. Subtract to get 23. Then subtract 16 = 2^4 to get 7, which
is 4 + 2 + 1.

Therefore
343 = 2^8 + 2^6 + 2^4 + 2^2 + 2^1 + 2^0.
Fill in the zeros and ones to prepare to write the binary expansion:

343 = 1 · 2^8 + 0 · 2^7 + 1 · 2^6 + 0 · 2^5 + 1 · 2^4 + 0 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0.

The binary representation is then 101010111.

The top-down method is less elegant in other bases. Let's try to convert the decimal
number 343 into base b = V. The highest power of 5 less than 343 is 125 = 5^3. So
we know the result will have four digits. But to figure out the first digit, we need
to determine the highest multiple of 125 which is less than or equal to 343. We
have 250 = 2 · 125 < 343 < 375 = 3 · 125, and so the first digit is 2. The next step
is to subtract off the 2 · 125 to get 93. Iterating, the highest power of 5 less than
93 is 25, and the highest multiple of 25 less than 93 is 75 = 3 · 25. The number 18
remains. It is easy to see what to do from here; 18 = 3 · 5 + 3.

Finally we have computed that

343 = 2 · 5^3 + 3 · 5^2 + 3 · 5^1 + 3 · 5^0.

Our conclusion is that 343 in decimal converts to 2333 in base 5.

Method II: Bottom-up. This method finds the digits from right to left.

The rightmost digit of a number N in base b is the remainder when you divide N
by b. For instance, when you divide 343 by 2 you get q = 171 and r = 1. So the
units place digit is 1. You then iterate this; divide 171 by 2 to get q = 85 and r = 1.
So far we have learned that the binary expansion of the decimal number 343 is:

(the binary expansion of 171)1 = (the binary expansion of 85)11

Continuing this way, we get

(the binary expansion of 42)111 = (21)0111
= (10)10111
= (5)010111
= (2)1010111
= 101010111

Note that we simply wrote (21) for the binary expansion of the decimal number
21, et cetera.

Similarly to convert the decimal number 343 into base 5, with similar notation we
compute:
(343) = (68)3
= (13)33
= (2)333
= 2333

With some practice you'll do fine.
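The bottom-up method is exactly a loop of division-with-remainder. The Python sketch below (ours; the name to_base is our own) converts a positive decimal integer to its digit string in any base from 2 through 10.

def to_base(n, b):
    # Bottom-up conversion: the last digit of n in base b is n % b;
    # repeat on the quotient n // b until it reaches 0.
    digits = ""
    while n > 0:
        digits = str(n % b) + digits
        n //= b
    return digits

print(to_base(343, 2), to_base(343, 5))   # prints 101010111 2333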

Hexadecimal Notation. Bases with b > X have occasional use. Of course one
needs some symbols for digits beyond the Hindu-Arabic numerals. Let us discuss
the hexadecimal system, which is base b = XVI. Here the convention is to take the
union of the symbols {0, 1, . . . , 9} and the symbols {A, B, C, D, E, F}. We have
A = 9′, B = A′, . . . , F = E′ = XV. Thus, for instance, if we wanted to convert the
hexadecimal 2FACE into decimal, we would compute, in decimal,

2 · 16^4 + (15) · 16^3 + (10) · 16^2 + (12) · 16^1 + 14 = 195278.

Let's convert 343 into hexadecimal. Following the notation above, we have
(343) = (21)7
= (1)57
= 157.
If remainders larger than 9 had occurred, then we would have used the letters. For
instance, 26 = 1A.
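Conversely, evaluating a hexadecimal string is just the exponential sum from the start of this paragraph. Here is a short Python sketch (ours, with the digit values spelled out rather than using the built-in int(s, 16)):

def hex_to_decimal(s):
    # Evaluate a hexadecimal string: each step multiplies by 16
    # and adds the value of the next digit.
    values = "0123456789ABCDEF"
    n = 0
    for ch in s:
        n = n * 16 + values.index(ch)
    return n

print(hex_to_decimal("2FACE"))   # prints 195278
print(hex_to_decimal("157"))     # prints 343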

Converting between Non-decimal bases

Finally, how do you convert between two bases like binary and ternary, neither of
which is decimal? There are multiple ways.

One way is to just use the division algorithm as above, but performing it in the
given base. You need to know how to do that. Another way is cheap: convert the
binary number into decimal, and then the decimal number into ternary. It's cheap,
but you're less likely to make a mistake.

As an example, let's convert the binary number 1011 into ternary. Long division
of 1011 by 11 in binary gives a quotient of 11 and a remainder of 10, which is 2 in
ternary. We write this symbolically as (1011) = (11)2. Then long division of 11 by
11 of course gives remainder 0 and quotient 1. So (1011) = (11)2 = 102 in ternary.

The cheap way is to note that 1011 in binary is 11 in decimal, and 11 = 1 · 9 + 2 · 1,
so we get 102 again.

Just learn one way that works for now, and eventually learn how to perform long
division in other bases.

8.4. Existence and Uniqueness of Place-value notation

Let us be more formal now and prove that any whole number can be written
uniquely in place-value notation with any integer b > 1 as a base. The exis-
tence proof will use the idea of the bottom-up approach above, since anyway
what we're doing is converting numbers into a given base place-value system. For
the uniqueness proofs, both bottom-up and top-down proofs are given.

Definition. Let b ∈ N with b > 1. A base-b-digit is an integer d ∈ N̄
with 0 ≤ d < b.

Proposition 2.39. (Existence of Place-value Representation) Fix a number b > 1,
and let N ∈ N̄. Then there is a number m ∈ N̄ and base-b-digits d_0, . . . , d_m so that

N = ∑_{i=0}^{m} d_i b^i.

Proof. Strong Induction on N. For N = 0, 1 let m = 0 and d_0 = 0, 1.

Assuming the proposition for numbers less than N, apply the division algorithm to
N and b, to yield N = qb + r with r a base-b-digit. In fact, r will be the units digit of
N. Since b > 1, we know q < N. So using the inductive hypothesis, q = ∑_{i=0}^{m} d_i b^i
for base-b-digits d_i. Since N = qb + r we have

N = (∑_{i=0}^{m} d_i b^{i+1}) + r.

If we now define e_0 = r and e_i = d_{i−1} for i ≥ 1, we have

N = ∑_{i=0}^{m+1} e_i b^i,

as required. □

Proposition 2.40. (Uniqueness of Place-value Representation) Let b be a natural
number greater than 1. Suppose that there is a number m ∈ N̄ and base-b-digits
d_0, d_1, . . . , d_m and d_0′, d_1′, . . . , d_m′ so that ∑_{i=0}^{m} d_i b^i = ∑_{i=0}^{m} d_i′ b^i. Prove that d_i = d_i′
for all 0 ≤ i ≤ m.

I give two proofs of this: one bottom up and the other top down.

Proof. (Bottom up) Standard Induction on m. If m = 0 it is clear, since
there is only d_0 = d_0′. The remainder of the LHS (resp. RHS) upon division by b
is d_0 (resp. d_0′). By the Uniqueness part of the division algorithm, we must have
d_0 = d_0′. Now subtract this remainder from both sides and divide by b. The result
is:

∑_{i=0}^{m−1} d_{i+1} b^i = ∑_{i=0}^{m−1} d_{i+1}′ b^i.

Note there is one fewer digit. By the inductive hypothesis, d_i = d_i′ for 0 < i ≤ m.
Therefore all the digits are equal, as claimed. We are done by induction. □

For the top-down proof, the idea is to successively show that the highest digits
are equal, and then cancel them one by one. If the highest digit on one side is bigger
than the other, then surely there is some contradiction. The key is the following
lemma:

Lemma 2.41. Let d_0, . . . , d_{m−1} be base-b-digits, and c ∈ N. Then

∑_{i=0}^{m−1} d_i b^i < c b^m.

Write b̄ for the digit b̄ = b − 1.

Proof. In fact this is seen by the inequalities

∑_{i=0}^{m−1} d_i b^i ≤ ∑_{i=0}^{m−1} b̄ b^i
< (∑_{i=0}^{m−1} b̄ b^i)′
= b^m
≤ c b^m.

We are using the successor rule that (b̄ b̄ · · · b̄)′ = 1 0 · · · 0, where the number of
0s is equal to the number of b̄'s. □

Now we give our second proof:

Proof. (Top down proof of Proposition 2.40)

Standard Induction on m. If m = 0 it is clear, since there is only d_0 = d_0′. Suppose
that d_m ≠ d_m′. We may assume that d_m < d_m′. But then, subtracting d_m b^m from
both sides gives

∑_{i=0}^{m−1} d_i b^i = (d_m′ − d_m) b^m + ∑_{i=0}^{m−1} d_i′ b^i ≥ (d_m′ − d_m) b^m.

Putting c = d_m′ − d_m we see this contradicts the lemma. Therefore d_m = d_m′. We
now subtract d_m b^m = d_m′ b^m from both sides to get ∑_{i=0}^{m−1} d_i b^i = ∑_{i=0}^{m−1} d_i′ b^i.

Note there is one fewer digit. By the inductive hypothesis, d_i = d_i′ for 0 ≤ i < m.
Therefore all the digits are equal, as claimed. We are done by induction. □

8.5. Exercises

(1) In base b = III, perform long division to divide 21110210 by 21.
(2) For n ∈ N write S_n for the sum of the digits (base X) of n. Prove that if
a, b ∈ N then 9|(S_a + S_b − S_{a+b}). Is the analogous statement true in any
base b?
(3) Prove that, if n > 2, there is no solution to n^x + n^y = n^z with x, y, z ∈ N.
(Suggestion: First think about n = X. Then think about the problem in a
general base n.)
(4) Let a, b ∈ N. Prove that 2^a − 1 | 2^b − 1 if and only if a|b. (Hint: think in
binary.)
(5) Let a, b ∈ N and suppose r is the remainder when you divide b by a. Show
that 2^r − 1 is the remainder when you divide 2^b − 1 by 2^a − 1.
(6) Compute the sum
∑_{n=1}^{100} n!
in the base b = III. (Note that 100 = IX.) Show all of your work; you
should start by writing out the little addition/multiplication tables base
III.
(7) Write each of the numbers CDLIII, DCLXXVII, and CMXI in the three
bases b = II, IV, and VIII. What is an easy way to convert a binary expan-
sion into a base IV expansion and a base VIII expansion? (If you don't
see the pattern, make more examples.)
(8) Base ten long division, as taught in elementary school, is a very mysterious-
looking algorithm. Explain why it gives the correct quotient and remain-
der of the Division Algorithm. Please note: I'm not just asking you to
articulate the algorithm. The problem is to explain why, not how.
(9) Let N ∈ N. Prove that there is a number m ∈ N̄ and digits d_0, . . . , d_m with
d_n ∈ N̄, d_n ≤ n so that
N = ∑_{i=0}^{m} d_i · i!.
Are these digits unique? (Hint: Recall Exercise 3 in Section 8.4.)

9. The Fundamental Theorem of Arithmetic

9.1. Euclidean Algorithm II

Now we turn to some unfinished business.

In a previous section we showed that if a = 51 and b = 36, then d = gcd(a, b) = 3
and moreover that 3 = 5·51 − 7·36. Are the factors 5 and 7 unique here? In fact, by
adding and subtracting 51·36, we also get the solution 3 = (5+36)·51 − (7+51)·36.
We can iterate that idea to get the solutions 3 = (5 + 36k)·51 − (7 + 51k)·36 for any
k ∈ N. But it doesn't stop there; one can go the other direction as well. You and
I know that there are such things as negative integers, and if we let, for instance,
k = −1, then we are led to the solution 3 = (5 − 36)·51 − (7 − 51)·36, which we
rewrite quickly as

    3 = 44·36 − 31·51,

before anyone notices we used negative numbers.

Let us prove the following proposition:


Proposition 2.42. Let a, b ∈ N and gcd(a, b) = d. Then there are m, n ∈ N so
that (ma − nb = d) ∨ (nb − ma = d).

Because of the trick from the previous paragraph, it is actually true that you can
express d in both ways. However for the main part of the proof, one first realizes
one way or the other.

How will we prove this important theorem? It will be an induction proof. One needs
to locate an integer quantity that will strictly decrease as we iterate the algorithm.
We use the remainders that occur in each iteration of the division algorithm.

Proof. We proceed by strong induction on a. Write P(a) for the statement
of the proposition. If a = 1, then d = 1 and 1·a − 0·b = 1.

Now we assume a > 1 and assume P(a') for all a' < a.

By the division algorithm, b = qa + r for some q, r ∈ N with 0 ≤ r < a. If r = 0
then a|b, so gcd(a, b) = a and we may use 1·a − 0·b = a.

If r > 0, then we have gcd(a, r) = gcd(a, b) = d. We may apply P(r). So there are
m_0, n_0 ∈ N so that

    m_0·a − n_0·r = d    ∨    n_0·r − m_0·a = d.

Eliminating the r gives

    (m_0·a − n_0·(b − qa) = d) ∨ (n_0·(b − qa) − m_0·a = d);

thus

    ((m_0 + n_0·q)·a − n_0·b = d) ∨ (n_0·b − (n_0·q + m_0)·a = d).

This proves P(a), and so we are done by induction. □
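For readers who enjoy computing, the following short Python sketch traces the recursion
in this proof: at each step it records whether the combination obtained so far has the
form ma − nb = d or nb − ma = d. The function name and the return convention are
ad hoc choices, not anything standard.

    def gcd_combination(a, b):
        # Return (d, m, n, positive) with d = gcd(a, b), m and n natural numbers,
        # and m*a - n*b = d if positive is True, or n*b - m*a = d otherwise.
        if a == 1:
            return 1, 1, 0, True                     # 1*a - 0*b = 1
        q, r = divmod(b, a)                          # division algorithm: b = q*a + r
        if r == 0:
            return a, 1, 0, True                     # a | b, so gcd(a, b) = a = 1*a - 0*b
        d, m0, n0, positive = gcd_combination(r, a)  # apply P(r) to the pair (r, a)
        if positive:
            # m0*r - n0*a = d; substitute r = b - q*a and regroup, as in the proof:
            return d, m0 * q + n0, m0, False         # now n*b - m*a = d
        else:
            # n0*a - m0*r = d; substitute r = b - q*a and regroup:
            return d, n0 + m0 * q, m0, True          # now m*a - n*b = d

    assert gcd_combination(51, 36) == (3, 5, 7, True)    # 3 = 5*51 - 7*36, as above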

We now show that the "or" is really not required:



Proof. (of Theorem 2.35)

Suppose we have m, n ∈ N so that nb − ma = d. Pick k large enough so that bk ≥ m
and ak ≥ n (for instance k = max(m, n)). Then (bk − m)a − (ak − n)b = d. □

9.2. Euclidean Applications

Euclids Algorithm has vast application.

Definition. Numbers a and b are called relatively prime provided that


gcd(a, b) = 1.

The word "coprime" is a synonym of "relatively prime". If two numbers a and b
are relatively prime, then by the Euclidean algorithm ma − nb = 1 for some choice
of m, n. If x is any number, we may multiply this equation by x to obtain (xm)a −
(xn)b = x; therefore any number may be written as an integral combination of a
and b.

Proposition 2.43. Let a, b, c ∈ N and suppose that a|bc, and gcd(a, b) = 1. Then
a|c.

Proof. By the Euclidean Algorithm, there exist m, n ∈ N so that ma − nb = 1.
Then mac − nbc = c. Since a|mac and a|nbc, we see that a divides the LHS. Thus
a divides the RHS, which is c. □
Proposition 2.44. Let a, b ∈ N be relatively prime. Then Mult(a, b) = Mult(ab).

Proof. It should be clear that Mult(ab) ⊆ Mult(a, b). Let ℓ ∈ Mult(a, b). By
the Euclidean algorithm, there are numbers m, n so that ma − nb = 1. Therefore
maℓ − nbℓ = ℓ. Because ℓ is a common multiple of a and b, it is easy to see that aℓ
and bℓ are divisible by ab. Therefore the LHS, and hence ℓ, is a multiple of ab. □
Proposition 2.45. Suppose a and b are relatively prime, and both divide some
number c. Then ab|c.

Proof. By the Euclidean Algorithm, there are numbers m and n so that
ma − nb = 1. Multiply this by c to get mac − nbc = c. Using the hypothesis we see
that both terms of the left hand side are divisible by ab, thus the right hand side
is as well. □
Proposition 2.46. Let a, b N. Then Div(a, b) = Div(gcd(a, b)).

Proof. Let d = gcd(a, b). We have ma − nb = d for some numbers m, n. If c
is a common divisor of a and b, then c divides the left hand side, and therefore c|d.
Conversely, since d ∈ Div(a, b), any divisor of d is also in Div(a, b). □

Here is a sort of converse to the Euclidean Algorithm.


Proposition 2.47. If a, b, m, n ∈ N and ma − nb = 1, then a and b are relatively
prime.

Proof. Let d be a common divisor of a and b. Then d divides the LHS and
therefore d|1. It follows that Div(a, b) = {1}. □

The reader should contrast the following definitions, which will be referred to in
the exercises:
Definition. Let a_1, . . . , a_n ∈ N. We say they are pairwise coprime
provided that for all i ≠ j, gcd(a_i, a_j) = 1. We say they are relatively
prime provided that gcd(a_1, . . . , a_n) = 1.

For example, the three numbers 10, 21, 121 are pairwise coprime and relatively
prime, and the three numbers 6, 15, 35 are relatively prime but not pairwise coprime.

Here are some propositions which use these notions. We will not need them in the
sequel, so we just present them as exercises.
Proposition 2.48. If a_1, . . . , a_n ∈ N are relatively prime, then there are numbers
c_1, . . . , c_n ∈ N so that

    ( \sum_{i=1}^{n-1} c_i a_i ) − c_n a_n = 1.

Proposition 2.49. If a_1, . . . , a_n ∈ N are pairwise coprime, then Mult(a_1, . . . , a_n) =
Mult(a_1 · · · a_n).

9.3. Exercises

(1) Let a, b_1, . . . , b_n ∈ N and suppose that gcd(a, b_i) = 1 for all i. Prove that
    gcd(a, b_1 · · · b_n) = 1.
(2) Let a, b ∈ N. Proposition 2.42 gives numbers m, n ∈ N so that either
    ma − nb = d or nb − ma = d. Prove that if one follows the Euclidean
    Algorithm, then actually m ≤ b and n ≤ a. (Strong Induction; follow the
    proof of Proposition 2.42.)
(3) Let a, b ∈ N be relatively prime. Let N ≥ (a − 1)(b − 1). By the Euclidean
    Algorithm we know there are m, n ∈ N so that ma − nb = N. The goal
    of this exercise is to prove that there exist c, d ∈ N so that ca + db = N.
    (The class size problem.) Let k_0 = max{k ∈ N | m ≥ bk}. Prove that
    ak_0 ≥ n. (Hint: Keep the Creeping Lemma handy.) Modify m, n using k_0
    to get c, d as desired.
(4) Let a, b ∈ N be relatively prime. Prove that there do not exist m, n ∈ N
    with ma + nb = ab − a − b. (Thus, using the previous exercise, ab − a − b
    is the largest such number.)
(5) Given a natural number n, write φ(n) ∈ N for the number of integers from
    1 to n which are relatively prime to n. For example φ(12) = 4 since there
    are four such numbers: {1, 5, 7, 11}. Compute φ(n) for all the numbers n
    from 1 to 25. Is it true that φ(mn) = φ(m)φ(n) for all m, n ∈ N?
(6) Prove that if a and b are relatively prime, and a|bc, then a|c.
(7) Let a, b ∈ N. Prove that lcm(a, b) divides every element of Mult(a, b).
(8) For which pairs of numbers d, ℓ do there exist a, b ∈ N so that d = gcd(a, b)
    and ℓ = lcm(a, b)?
(9) Let d = gcd(a, b). Prove that a/d and b/d are relatively prime.

(10) Let d = gcd(a, b) and ℓ = lcm(a, b). Prove that ab = dℓ. [Suggestion:
     First show ab/d ∈ Mult(a, b). Then show that if ℓ' ∈ Mult(a, b) then
     ab/d divides ℓ'. You may use the previous exercise.]
(11) Let a, b ∈ N be relatively prime. Suppose that there are m_1, m_2, n_1, n_2 ∈ N
     so that

         m_1 a + n_1 b = m_2 a + n_2 b.

     Suppose m_2 ≥ m_1. Prove that there is an integer k ∈ N so that m_2 =
     m_1 + bk.
(12) Prove that gcd(a, b, c) = gcd(a, gcd(b, c)). Use this to compute gcd(290177, 241133, 190747).
(13) Prove that lcm(a, b, c) = lcm(a, lcm(b, c)).
(14) Proposition 2.48 and Proposition 2.49.
(15) Prove gcd(a + b, b) = gcd(a, b).

9.4. Ords

If m > 1 and n are natural numbers we want to define ord_m(n) to be the maximum
number of times m divides n. For example, ord_3(18) should be 2. Suggestively,

    18 = 2^{ord_2(18)} · 3^{ord_3(18)}.

In fact, we will later prove that these ords are the exponents that occur in prime
factorizations of numbers.

Let us quickly check that these maximums exist. Remember, a set is guaranteed a
maximum as long as it has some upper bound, and is nonempty.
Proposition 2.50. Let m > 1 and n ∈ N. Then n is an upper bound for the set
{i ∈ N; m^i | n}.

Proof. We will prove by induction on n that m^n > n. Once we have done
this, if i is in this set, then m^i ≤ n < m^n, which by Exercise 9 in Section 2.4 implies
that i < n. Therefore n will be an upper bound of the set.

The case n = 1 is obvious. Suppose we know that m^k > k. Then multiplying
both sides by m we see that m^{k+1} > mk. Since m > 1 we know that mk > k and
therefore mk ≥ k + 1. Appending this to the previous inequality we conclude that
m^{k+1} > k + 1. Thus we are done by induction. □

Certainly 0 is in this set, so it is nonempty, and since it is bounded above, it has a


maximum. We can therefore make the following definition:

Definition. Let m > 1 and n ∈ N. Then ord_m(n) = max{i ∈ N; m^i | n}.

For example, ord_6(12) = 1, ord_6(100) = 0, and ord_2(48) = 4.

Remarks:

(1) This terminology comes from the world of analysis, where "ord" means
    "order of vanishing". For example, the order of vanishing of f(x) =
    x^2(x + 1) at x = 0 is 2, and at x = −1 is 1, so one would say ord_0(f) = 2,
    ord_{−1}(f) = 1 and ord_1(f) = 0.
(2) We will avoid defining ord_m(0), but the obvious choice is ord_m(0) = ∞.

The following is a convenient reformulation of the definition of ordm :


Proposition 2.51. Let m > 1 and n ∈ N. Then ord_m(n) = i if and only if there
is a number u ∈ N so that n = m^i u and m ∤ u.

Proof. (⇒) Suppose ord_m(n) = i. Then m^i | n, say n = m^i u. If m | u, then
m^{i+1} | n, contradicting the maximality of i. Thus m ∤ u.

(⇐) If n = m^i u then ord_m(n) ≥ i. If m^{i+1} | n we would have m | u. So since m ∤ u,
ord_m(n) < i + 1 and therefore ord_m(n) = i. □

The following is comparable to the triangle inequality in analysis:


Proposition 2.52. Let m > 1 and a, b ∈ N. Then ord_m(a + b) ≥ min(ord_m(a), ord_m(b)).
Moreover if ord_m(a) < ord_m(b) then ord_m(a + b) = ord_m(a).

Before working through this proof the reader should try a few examples.

Proof. Let i = ord_m(a) and j = ord_m(b). By the previous proposition we
may write a = m^i u and b = m^j v, with m ∤ u, v. If i ≤ j then

    a + b = m^i (u + v m^{j−i}).

If i < j it is easy to see that m ∤ (u + v m^{j−i}), so that ord_m(a + b) = ord_m(a) by the
previous proposition. If i = j then the same equation shows that m^i | (a + b). Since
ord_m(a + b) is the maximum of such exponents, we conclude that ord_m(a + b) ≥ i.
Of course the case i ≥ j is similar. □
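Here is a small Python sketch of ord_m, useful for trying Proposition 2.52 on examples
before (or after) reading the proof; the function name ordm is an ad hoc choice, picked to
avoid clashing with Python's built-in ord.

    def ordm(m, n):
        # ord_m(n): the largest i with m**i dividing n, for m > 1 and n >= 1.
        # Peel off factors of m until what remains is no longer divisible by m
        # (cf. Proposition 2.51).
        if m <= 1 or n < 1:
            raise ValueError("need m > 1 and n >= 1")
        i = 0
        while n % m == 0:
            n //= m
            i += 1
        return i

    assert ordm(6, 12) == 1 and ordm(6, 100) == 0 and ordm(2, 48) == 4
    # Proposition 2.52 on an example: ord_2(8 + 12) = min(ord_2(8), ord_2(12)) = 2.
    assert ordm(2, 8 + 12) == min(ordm(2, 8), ordm(2, 12)) == 2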

9.5. Prime Numbers

For every number n ∈ N, it is easy to see that 1, n ∈ Div(n). For some numbers,
these are the only elements of Div(n).

Definition. Let n > 1 be a natural number. We say that n is prime


provided that Div(n) = {1, n}. We say that n is composite provided that
it is not prime.

For example 8675309 and 314159 are prime.

Thus a number p > 1 is prime if whenever d|p, then d = 1 or p. Another way to say
this is as follows. One might call a divisor d of p a proper divisor if 1 < d < p.
A number greater than 1 is composite iff it has a proper divisor. Then, p is prime
if and only if it does not have a proper divisor.

Remark: The number 1 is neither considered a prime nor a composite. We do have


a name for such things; it is called a unit.

Activity: Prime Number Bee


Have all the students stand up, and pick some order to go around the
class. Successive students must recite the prime numbers 2, 3, 5, . . .. Any
student who gives a composite number, skips a prime, or takes more than
ten seconds must sit, and the last one standing receives a prize.
Proposition 2.53. The number 2 is prime.

Proof. If d ∈ N divides 2, then 1 ≤ d ≤ 2. Therefore d = 1 or d = 2. Thus
Div(2) = {1, 2}, and so 2 is prime. □
Proposition 2.54. If N ∈ N with N > 1, then N has a prime factor.

Proof. Strong Induction on N ≥ 2. By the previous proposition we have
the base case N = 2. Indeed, if N is prime then we are done. Otherwise, N is
composite, so it factors in some nontrivial way, say N = de, with d, e < N. By the
inductive hypothesis, d has a prime factor, which is therefore also a prime factor of
N. □

The next theorem has one of the most famous and treasured proofs in mathematics.
It goes back to Euclids Elements.
Theorem 2.55. There are infinitely many prime numbers.

Proof. By Contradiction. Suppose there are only finitely many primes p_1, . . . , p_m.
Consider the number N = p_1 · · · p_m + 1. By the previous proposition N must have
a prime factor. This prime factor must be one of the p_i and therefore divides the
LHS of

    N − p_1 · · · p_m = 1,

which is of course a contradiction. □

Remark: Note that the number N = 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30031 is not prime,
since 30031 = 59 · 509. The argument of the proof does not imply that N is prime,
only that it is divisible by a prime not in the list.
Proposition 2.56. Let N ∈ N with N > 1. Then there is an r ∈ N and prime
numbers p_1, . . . , p_r (not necessarily distinct) so that N = p_1 p_2 · · · p_r.

Proof. Strong Induction on N > 1. For N = 2 use Proposition 2.53 again. By
Proposition 2.54, there is a prime divisor p of N. If N = p we are done. Otherwise
1 < N/p < N and so by our inductive hypothesis, N/p is a product of primes:

    N/p = p_1 p_2 · · · p_r.

Thus N = p_1 p_2 · · · p_r p, as required. □
Corollary 2.57. (Fundamental Theorem of Arithmetic, Existence) Let N ∈ N
with N > 1. Then there is a number m ∈ N, distinct primes p_1, . . . , p_m, and
e_1, . . . , e_m ∈ N so that

    N = p_1^{e_1} p_2^{e_2} · · · p_m^{e_m}.

Proof. This follows from the previous proposition by gathering together iden-
tical prime factors. 
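A short Python sketch of the existence statement: trial division produces the primes p_i
and exponents e_i of Corollary 2.57. The function name factor is an ad hoc choice.

    def factor(N):
        # Return the dictionary {p: e} of Corollary 2.57, so that N = product of p**e,
        # by repeatedly stripping off the smallest prime factor (trial division).
        assert N > 1
        exponents = {}
        p = 2
        while p * p <= N:
            while N % p == 0:        # whenever p divides N it is the smallest prime factor
                exponents[p] = exponents.get(p, 0) + 1
                N //= p
            p += 1
        if N > 1:                    # whatever is left over is itself prime
            exponents[N] = exponents.get(N, 0) + 1
        return exponents

    assert factor(108) == {2: 2, 3: 3}          # 108 = 2^2 * 3^3
    assert factor(30031) == {59: 1, 509: 1}     # the example from the remark above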

Now we start to deal with the issue of the uniqueness of prime factorization. For
example, 1001 = 11 · 91 = 143 · 7. Does this violate unique factorization into
primes? (Thanks to [2] for this cute example.)
Proposition 2.58. Let p, a, b ∈ N with p prime. Then (p|ab) ⟹ ((p|a) ∨ (p|b)).

Proof. Suppose that p ∤ a. Then Div(a, p) = {1}, thus by the Euclidean Algo-
rithm there are numbers m, n so that ma − np = 1. Multiplying this by b yields
mab − npb = b. By hypothesis, p divides both terms of the left hand side, and
therefore it divides b. □

Let us analyze the preceding proof a bit more. Let P, Q, R be the statements:

P: (p is a prime) ∧ (p|ab)

Q: p|a

R: p|b.

The proof actually took the form (P ∧ ¬Q) ⟹ R, which by propositional logic is
equivalent to P ⟹ (Q ∨ R). Does that help you understand the proof better?

Note that we can use this proposition to analyze the putative prime factorizations
of 1001 above. For instance 7 is prime (check!) and divides the right-hand side. It
therefore divides 11 or 91, which should lead the reader to suspect that 91 is not a
prime number.
Proposition 2.59. Let p, a_1, . . . , a_n ∈ N with p prime. If p|(a_1 · · · a_n), then there
is some 1 ≤ i ≤ n so that p|a_i.

Proof. Induction on n. This is clear if n = 1; suppose it is true for n =
k. Then if p|(a_1 · · · a_{k+1}) = (a_1 · · · a_k) · a_{k+1}, we have p|(a_1 · · · a_k) or p|a_{k+1} by
Proposition 2.58. In the first case, p divides some a_i by the case n = k. In the
second case we are also done. □

The next proposition shows that if p is prime, then the function ord_p : N → N
behaves much like a logarithm.

Proposition 2.60. If p is prime and a, b ∈ N then ord_p(ab) = ord_p(a) + ord_p(b).

Proof. Let i = ord_p(a). Thus there is a number u so that a = p^i u, and p ∤ u.
Similarly if j = ord_p(b) there is a v so that b = p^j v and p ∤ v. So ab = p^i u · p^j v =
p^{i+j} uv. The contrapositive of Proposition 2.58 tells us that p ∤ uv. Therefore
ord_p(ab) = i + j. □
Corollary 2.61. If p is prime, a ∈ N and e ∈ N, then ord_p(a^e) = e · ord_p(a).

This next proposition is to impress upon you the power of ord2 :



Proposition 2.62. Let a, b ∈ N. Then a^2 ≠ 2b^2.

Proof. Suppose that a^2 = 2b^2. Taking ord_2 of both sides gives 2·ord_2(a) =
2·ord_2(b) + 1. The RHS is odd and the LHS is even, which is a contradiction. □

This is essentially the proof that √2 is irrational, i.e., Theorem ??. This
argument should strike you as much more powerful and direct than the classic proof
we gave earlier. See Proposition 2.71 below for the final word in such problems.
Lemma 2.63. If p and q are primes, and p|q, then p = q.

Proof. This is easy enough to do in your head. 

Proposition 2.64. Let p be prime and n ∈ N. Then Div(p^n) = {p^e | 0 ≤ e ≤ n}.

Proof. Suppose d ∈ Div(p^n), and let e = ord_p(d). Then d = p^e u, with p ∤ u.
If u ≠ 1 then by Proposition 2.54 it has a prime factor q, and q ≠ p since p ∤ u.
This implies that q|p^n, thus Proposition 2.59 implies q|p, contradicting the above
lemma. We conclude that u = 1 and thus d = p^e. It is easy to see that e ≤ n. □

Corollary 2.65. Let p, q be distinct primes, and m, n ∈ N. Then gcd(p^m, q^n) = 1.

Proof. The above proposition shows that if x ∈ Div(p^m, q^n) then x = p^e = q^f
for some e, f ∈ N. If e = 0 then x = 1. Otherwise f > 0 as well and we have p|q^f,
which by Proposition 2.59 implies p|q, so p = q again, a contradiction. □

Theorem 2.66. (Fundamental Theorem of Arithmetic, Uniqueness) Let N > 1,
and suppose N factors in some way as

    N = p_1^{f_1} p_2^{f_2} · · · p_r^{f_r},

with the p_i distinct prime numbers and f_i ∈ N. Then the p_i are all the prime
divisors of N, and f_i = ord_{p_i}(N).

Proof. Obviously the p_i at least form a subset of the prime divisors of N,
and the definition of ord implies that f_i ≤ ord_{p_i}(N). It follows that the RHS of the
equation in the corollary is no bigger than the RHS of the equation in the theorem,
and equality can only hold if we have equality of the e_i and f_i. □

For example, 108 = 2 · 2 · 3 · 3 · 3. If we group together like factors we obtain

    108 = 2^2 · 3^3.
Corollary 2.67. Let N > 1, and suppose p_1, . . . , p_r are the (distinct) prime
factors of N. Let e_i = ord_{p_i}(N). Then

    N = p_1^{e_1} p_2^{e_2} · · · p_r^{e_r}.

Proof. This Corollary is obtained by combining the Existence and Uniqueness


forms of the Fundamental Theorem of Arithmetic. 

9.6. More about ords

The Fundamental Theorem of Arithmetic says that knowing a number is equivalent
to knowing its ords. And given whole numbers {e_p} for every prime p, there is an
n with ord_p(n) = e_p for all p when all but finitely many of them are 0. Namely,
put n = \prod_p p^{e_p}, a finite product. Moreover, knowing the ords of a number tells us
how it behaves multiplicatively. To understand this, start with the following:
Proposition 2.68. Let a, b ∈ N. Then ab = c if and only if for all primes p,
ord_p(c) = ord_p(a) + ord_p(b).

Proof. The direction (⇒) is Proposition 2.60. We prove the other direction
here. Let p_1, . . . , p_ℓ be the list of all the primes dividing a, b, or c. Then

    a · b = (p_1^{e_1} p_2^{e_2} · · · p_ℓ^{e_ℓ}) · (p_1^{f_1} p_2^{f_2} · · · p_ℓ^{f_ℓ}),

where e_i = ord_{p_i}(a) and f_i = ord_{p_i}(b). Let g_i = ord_{p_i}(c). Since e_i + f_i = g_i this
product becomes

    p_1^{g_1} p_2^{g_2} · · · p_ℓ^{g_ℓ} = c,

as desired. □
Proposition 2.69. Let a, c ∈ N. Then a|c ⟺ ord_p(a) ≤ ord_p(c) for all primes p.

Proof. If a|c then the result follows from the "only if" part of the previous
proposition. Conversely, if ord_p(a) ≤ ord_p(c) for all primes p, then let p_1, . . . , p_ℓ
be all the primes dividing a or c. Then put

    b = \prod_{i=1}^{ℓ} p_i^{ord_{p_i}(c) − ord_{p_i}(a)}.

Then the "if" part of the previous proposition proves that ab = c. □

Thus we can characterize Div(n) as the set of numbers a so that for all primes
p, ord_p(a) ≤ ord_p(n). For every p, there are (ord_p(n) + 1) choices for ord_p(a), and
it follows that there are exactly \prod_p (ord_p(n) + 1) divisors of n.
Proposition 2.70. Let m, n ∈ N. Then there exists an integer r ∈ N with r^m = n
iff for all prime numbers p, we have m | ord_p(n).

Proof. This is left to the reader. □

Proposition 2.71. Let m, n ∈ N, and suppose there does not exist an integer r ∈ N
with r^m = n. Then there do not exist integers a, b ∈ N with a^m = n b^m.

Proof. In view of Proposition 2.70, the hypothesis implies that there is a
prime number p so that m ∤ ord_p(n). Applying ord_p to both sides of a^m = n b^m
gives

    m · ord_p(a) = m · ord_p(b) + ord_p(n).

Thus m does divide ord_p(n), a contradiction. □


Remark: This is really a proof that m n Q m n Z. Its related to the
application of the rational root test to the polynomial xm n.
Proposition 2.72. Let a, b ∈ N. If d = gcd(a, b), then ord_p(d) = min(ord_p(a), ord_p(b))
for all primes p. If ℓ = lcm(a, b), then ord_p(ℓ) = max(ord_p(a), ord_p(b)) for all
primes p.

Proof. Let p be a prime, and suppose i = ord_p(a) ≤ ord_p(b). Then p^i | a
and p^i | b, so p^i ∈ Div(a, b), which is equal to Div(d) by Proposition 2.46. There-
fore p^i | d. However, p^{i+1} ∤ a, so certainly p^{i+1} ∤ d. It follows that ord_p(d) = i =
min(ord_p(a), ord_p(b)) in this case. Obviously if ord_p(b) ≤ ord_p(a) a similar argu-
ment holds.

The statement about the least common multiple is an exercise. □

We now have a straightforward way to compute the least common multiple of two
numbers. For instance let a = 75 and b = 21. We factor to get a = 3 · 5^2 and
b = 3 · 7. The only nonzero ords are for p = 3, 5, 7. If ℓ = lcm(a, b) then we must
have ord_3(ℓ) = 1, ord_5(ℓ) = 2, and ord_7(ℓ) = 1. This determines ℓ = 3 · 5^2 · 7.

Here is another nice application of ords:


Proposition 2.73. Let a, b ∈ N. Then gcd(a, b) · lcm(a, b) = ab.

Proof. Given the above discussion, this reduces to proving that min(x, y) +
max(x, y) = x + y for all x, y ∈ N. This is obvious from the proper point of view,
or the skeptical reader may consider the trichotomy of x and y. □

This gives a way to compute lcm(a, b) without having to factor a and b. For
example, if a = 2000002 and b = 2000004 then the Euclidean Algorithm gives that
gcd(a, b) = 2 and therefore lcm(a, b) = ab/2 = 2000006000004.
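In Python this is exactly how one computes lcm without factoring, since gcd itself can be
found by the Euclidean Algorithm (here via the standard library); a minimal sketch:

    from math import gcd

    def lcm(a, b):
        # Proposition 2.73: gcd(a, b) * lcm(a, b) = a * b, so lcm(a, b) = a*b / gcd(a, b).
        return a * b // gcd(a, b)

    assert lcm(75, 21) == 525                        # 3 * 5^2 * 7, as computed above
    assert lcm(2000002, 2000004) == 2000006000004    # the example in the text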

9.7. Exercises

(1) The second half of Proposition 2.72, and Proposition 2.70.
(2) Let m > 1 and a, b ∈ N. Prove that ord_m(ab) ≥ ord_m(a) + ord_m(b).
(3) There is a formula relating ord_2(n!) and S_n, where S_n is the sum of the
    binary digits of n. Can you find it? Can you prove it?
(4) Recall Euler's totient function φ from Exercise 5 from Section 9.3. Explain
    why if p is prime, then φ(p) = p − 1. Find a formula for φ(p^2). Find a
    formula for φ(p^k), with k ∈ N. If q ≠ p is another prime, prove that
    φ(pq) = φ(p)φ(q).
(5) Consider the sequence 41, 43, 47, 53, . . . obtained by beginning with the
    number 41 and successively adding all positive even integers 2, 4, 6, . . ..
    Are all the numbers in this list prime? Give a proof or a counterexample.
    Also answer the same question starting with 11 or 17.
(6) Use Problem 4 in Section 8.4 to prove that the Fermat numbers are pair-
    wise coprime. Why does this imply that there are infinitely many prime
    numbers?
(7) Prove that 2^n − 1 is composite when n is composite. If n is prime, is 2^n − 1
    necessarily prime?
(8) For n ∈ N, write d(n) for the number of divisors of n, that is, d(n) =
    |Div(n)|. For m, n ∈ N, prove that d(mn) ≤ d(m) · d(n). When is this an
    equality?
(9) Prove that gcd(a^2, b^2) = gcd(a, b)^2 for a, b ∈ N. Is this true for other
    powers?
(10) Prove that gcd(a, bc) | gcd(a, b) · gcd(a, c).
(11) For which a, b, c is gcd(a, b, c) · lcm(a, b, c) = abc?
(12) If a|x and b|y prove that gcd(a, b) | gcd(x, y).
(13) Suppose that a|bc. Prove that there are numbers a_1, a_2 ∈ N so that
     a = a_1 a_2, a_1|b, and a_2|c. (Use factorization and Exercise ?? in Section
     ??.)
(14) Let m, n be relatively prime with m < n. If mn is even, then

         gcd(n^2 − m^2, n^2 + m^2) = 1 = gcd(2mn, n^2 + m^2).

     If mn is odd, then

         gcd((n^2 − m^2)/2, (n^2 + m^2)/2) = 1 = gcd(mn, (n^2 + m^2)/2).

(15) Prove that log_2 3 is irrational. (If it were rational and positive, then 2
     raised to some power would be equal to 3 raised to some nonzero power.)
(16) Prove that log_18 12 is irrational. Let p ≠ q be primes and m, n, x, y inte-
     gers. Under what conditions is

         log_{p^m q^n}(p^x q^y)

     irrational?

10. Chapter 2 Wrap-up

10.1. Rubric for Chapter 2

In this chapter you should have learned

• the Peano theory; how properties of arithmetic derive from just a few
  axioms
• how to work with different kinds of definitions, for example, inductive
  definitions, the definition of a ≤ b, and the definition of gcd(a, b)
• strong induction
• the role of the division algorithm and Euclidean Algorithm
• arithmetic in other bases
• the ord coordinates of numbers

10.2. Toughies for Chapter 2

(1) (Uniqueness of the Natural Numbers) Suppose M is a set with an element
    ⋆, and a successor function m ↦ m^# for m ∈ M satisfying the analogue
    of the Peano Axioms. That is to say,
    (INF) m^# ≠ ⋆ for all m ∈ M,
    (INJ) If m_1^# = m_2^#, then m_1 = m_2, and
    (IND) If S ⊆ M is a subset satisfying ⋆ ∈ S and m^# ∈ S whenever
    m ∈ S, then S = M. Define a bijection f : N → M so that for all n ∈ N,
    f(n′) = f(n)^#. [Suggestion: Define your function inductively.] Be sure
    to prove that your function is bijective. At what points do you use the
    axioms (INF), (INJ), (IND) for N and M? (You need all six.)
(2) For which a, b is it true that a^b = b^a? Let's see a proof!
(3) Let a, b ∈ N. Recall the statement that b is a power of a; this means
    there is a natural number n ∈ N with a^n = b. If a > 1 write log_a b = n
    in this situation. Is log_a b uniquely defined? Prove that if a, b, c ∈ N with
    a > 1, c a power of a, and b a power of c, then

        (log_a b) / (log_a c) = log_c b.
(4) Prove that the numbers q, r, e in Problem ?? in Section ?? are uniquely
determined.
(5) Let n N. Prove that in base X arithmetic there is a multiple of n which
is written as a string of 1s followed by a string of 0s. For example 11100
is a multiple of VI.
(6) Recall that h_n denotes the nth Hemachandra number. Let a, b ∈ N. Prove
    that gcd(h_a, h_b) = h_{gcd(a,b)}.
(7) Prove that given a number N one can find N consecutive numbers, each
having prime factors other than 2 or 3. Generalize this to any finite set
of primes.
(8) There is a formula relating ord_p(n!) and S_n, where S_n is the sum of the
    base p digits of n. Can you find it? Can you prove it?
(9) Prove that ord_p(\binom{n}{k}) is the number of carries that occur in the base p
    addition of k and n − k.
(10) Let m, n ∈ N. Put urd_m(n) = min{i ∈ N; n|m^i}, if this set is nonempty,
     and let urd_m(n) = ∞ otherwise. For example, urd_6(4) = 2 since 4|6^2
     but 4 ∤ 6^1, and urd_4(6) = ∞ since 6 ∤ 4^i for any i. Find and prove some
     interesting properties of urd.
(11) Find two sequences of base-ten digits a_1, a_2, . . . and b_1, b_2, . . ., with a_1 = 2
     and b_1 = 5, so that for any natural number n, the product of the two
     numbers (written in decimal) a_n a_{n−1} · · · a_2 a_1 and b_n b_{n−1} · · · b_2 b_1 ends with
     at least n zeros. For example, if your sequences started with a_1 = 2, a_2 =
     1, b_1 = 5, and b_2 = 2, then they would check out for n = 1 and n = 2,
     because 2 · 5 = 10 ends with a zero, and 12 · 25 = 300 ends with two
     zeros. Also, are there analogues in any base, not just 10?
CHAPTER 3

Functions and Relations


1. Relations

1.1. Introduction: Multivariable Propositional Logic

So far we have seen propositions P which do not depend on a variable, and propo-
sitions P (x), where x ranges over some set X.

We also might like to make propositions P (x, y), where x ranges over some given
set X, and y ranges over some other set Y .

For example, consider the statement, She has enough money to buy it. Here she
and it are variables. We may write the statement as P (she, it), where she ranges
over the set of females, and it ranges over the set of commodities (things you can
buy). Its truth value depends on the pair (she, it). Generally, the truth value of
P (x, y) will be true for certain pairs (x, y), and false for the rest.

Definition. If X and Y are sets, write X × Y for the set of (ordered)
pairs (x, y), with x ∈ X and y ∈ Y. The pairs (x_1, y_1), (x_2, y_2) ∈ X × Y
are equal provided that (x_1 = x_2) ∧ (y_1 = y_2).

For example, if X = {1, 2, 3} and Y = {A, B}, then

(1)    X × Y = { (1, A), (1, B),
                 (2, A), (2, B),
                 (3, A), (3, B) }.

If X and Y are finite, our convention is to display the product set as an array of
pairs, with the rows corresponding to elements of X and the columns corresponding
to elements of Y .

If X = Y = R, then X × Y = R^2 is the familiar Cartesian plane consisting of pairs
(x, y) of real numbers. If X = R^2 and Y = R, then X × Y is, strictly speaking,
the set of pairs (v, y), where v = (x_1, x_2) is a pair of real numbers, and y is a real
number. Naturally we view (v, y) as the triple (x_1, x_2, y) and thus identify X × Y
with three-dimensional space R^3.

Let A, A′ ⊆ X and B, B′ ⊆ Y. If A × B = A′ × B′ (as subsets of X × Y), is it
necessarily true that A = A′ and B = B′? Not quite, because if, say, A = ∅, then
A × B = ∅ (right?), and so the information of B gets lost. But this is essentially
the only counterexample, by the following lemma. I hope you appreciate our rigor.

Lemma 3.1. Let A, A′ ⊆ X and B, B′ ⊆ Y with A, B nonempty. If A × B = A′ × B′
as subsets of X × Y, then A = A′ and B = B′.

Proof. Let a ∈ A and b ∈ B. Then (a, b) ∈ A × B = A′ × B′, so a ∈ A′ and
b ∈ B′. Since this is true for all such a, b, we have A ⊆ A′ and B ⊆ B′. Reversing
the argument gives the opposite inclusion. □

For our bivariable statement P(x, y), its truth set is given by

    {(x, y) ∈ X × Y | P(x, y) is true}.

Any such statement is equivalent, then, to a relation in the following sense:

Definition. Let X, Y be sets. A subset R ⊆ X × Y is called a relation
from X to Y. Write Rel(X, Y) for the set of relations from X to Y. If
X = Y, then a relation from X to X is simply called a relation on X.
Write Rel(X) in this case for Rel(X, X).

Note that Rel(X, Y) = P(X × Y), the power set of X × Y.

This subset could be anything. For instance, if again X = {1, 2, 3} and Y = {A, B},
then a relation could be given by

(2)    R = { (1, B),
             (2, A),
             (3, A), (3, B) }.

A more efficient notation is to simply denote this by the asterisk-matrix

    A_R = [ 0 *
            * 0
            * * ].

The asterisk-matrix has the same number of rows and columns as the table corre-
sponding to X × Y, and we put an asterisk in the (x, y)-entry if (x, y) ∈ R and put
a 0 there if (x, y) ∉ R.

Of course this depends on the way we order the sets X and Y . For instance, if we
had ordered the columns as B, A instead of A, B, then we would need to accordingly
switch the columns of AR to fit this convention.

Among other things, asterisk-matrix notation gives us a nice way to catalogue finite
relations on finite sets. There are 16 relations on the set X = {a, b}. Here they are
represented as asterisk-matrices, with the first row and column corresponding to a
and the second row and column corresponding to b:

    [ 0 0 ]  [ * 0 ]  [ 0 * ]  [ 0 0 ]  [ 0 0 ]  [ * * ]  [ * 0 ]  [ * 0 ]
    [ 0 0 ], [ 0 0 ], [ 0 0 ], [ * 0 ], [ 0 * ], [ 0 0 ], [ * 0 ], [ 0 * ],

    [ 0 * ]  [ 0 * ]  [ 0 0 ]  [ * * ]  [ * * ]  [ * 0 ]  [ 0 * ]  [ * * ]
    [ * 0 ], [ 0 * ], [ * * ], [ * 0 ], [ 0 * ], [ * * ], [ * * ], [ * * ].

We have names for some of these matrices. For instance,

    A_∅ = [ 0 0        I_X = [ * 0        A_{X×X} = [ * *        E_{a,b} = [ 0 *
            0 0 ],             0 * ],                 * * ],                 0 0 ].

If X = Y = R, then any figure in the plane is a relation on R. Let us give names


to some pleasant relations on R, so we can use them later.

(1) Write S^1 for the circle {(x, y) ∈ R^2 | x^2 + y^2 = 1}.
(2) Write D^2 for the disc {(x, y) ∈ R^2 | x^2 + y^2 ≤ 1}.
(3) Given m ∈ R, write ℓ_m for the line {(x, mx) ∈ R^2 | x ∈ R} through the
    origin, with slope m.
(4) Write ℓ_∞ for the y-axis.

Let R be a relation from X to Y. For x ∈ X and y ∈ Y we write xRy for the
statement (x, y) ∈ R.

One always has the empty relation R = ∅, and the total relation R = X × Y.
In the former case xRy is always false, in the latter case xRy is always true. If
x_0 ∈ X and y_0 ∈ Y, write δ_{x_0,y_0} for the relation from X to Y consisting of the
single pair {(x_0, y_0)}. These are called Dirac-delta relations. In this case
xRy ⟺ (x = x_0) ∧ (y = y_0).

1.2. Reflexivity

The diagonal is an important relation.

Definition. The relation ∆_X = {(x, y) ∈ X × X | x = y} is called the
diagonal of X.

We have a ∆_X b ⟺ a = b. This can be thought of as the graph of the function
y = x. Of course this is equal to ℓ_1 when X = R.

If X is a finite set with say three elements, then we would naturally represent ∆_X
with the diagonal asterisk-matrix, which we write as I_X:

    I_X = [ * 0 0
            0 * 0
            0 0 * ].

Definition. A relation R on X is called reflexive provided that
∆_X ⊆ R.

This simply means that xRx is true for all x.

Example: The relation x|y on N is reflexive, but the relation x < y on N is not.

Example: Let L be the set of lines in the plane. Then parallelism is a reflexive
relation on L, but orthogonality is not.

1.3. Transposes of Relations

Not all relationships are symmetric. The relation R: "Person x is taller than person
y" is certainly not. The transpose of a relation is what you get when you switch
the roles of x and y. So, the transpose of R is the relation R^T: "Person y is taller
than person x."

Definition. Let X, Y be sets. If R is a relation from X to Y, then its
transpose R^T is the relation from Y to X defined via

    R^T = {(y, x) ∈ Y × X | (x, y) ∈ R}.

Say that a relation R on X is symmetric provided that R^T = R.

Example: Let X = N. The transpose of x < y is x > y. The transpose of


x = y is x = y.

Example: The transpose of the relation (2) is

    R^T = { (A, 2), (A, 3),
            (B, 1), (B, 3) },

whose asterisk-matrix is simply

    A_{R^T} = [ 0 * *
                * 0 * ].

It is generally true that the asterisk-matrix corresponding to the transpose relation
is the transpose of the asterisk-matrix corresponding to the original relation. In
other words,

    A_{R^T} = (A_R)^T.

Example: Take X = R. To find the transpose, one reflects through the line x = y.
This has the effect of switching the coordinates (x, y) to (y, x). Note that S^1 and
D^2 are symmetric. The transpose of the x-axis ℓ_0 is the y-axis ℓ_∞. What is the
transpose of ℓ_m with m ≠ 0? (Pssst: If R is the graph of a function f, and f is
invertible, then R^T is the graph of the inverse of f!) The transpose of the second
quadrant is the fourth quadrant. The first and third quadrants are symmetric.

Proposition 3.2. The following hold for relations R from X to Y:

• ∅^T = ∅.
• (X × Y)^T = Y × X.
• (R^T)^T = R.
• For x ∈ X and y ∈ Y, we have (δ_{x,y})^T = δ_{y,x}.

Can you prove each of these statements?
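These definitions are easy to experiment with on a computer, since a finite relation is
literally a set of pairs. Here is a minimal Python sketch (the helper names are ad hoc):

    X = {1, 2, 3}
    R = {(1, 'B'), (2, 'A'), (3, 'A'), (3, 'B')}     # the relation (2) from above

    def transpose(R):
        # R^T switches the roles of x and y.
        return {(y, x) for (x, y) in R}

    def is_reflexive(R, X):
        # R is reflexive on X exactly when the diagonal Delta_X is contained in R.
        return all((x, x) in R for x in X)

    def is_symmetric(R):
        return transpose(R) == R

    assert transpose(transpose(R)) == R                                 # (R^T)^T = R
    assert is_reflexive({(x, y) for x in X for y in X if x <= y}, X)    # "x <= y" is reflexive
    assert not is_symmetric({(x, y) for x in X for y in X if x < y})    # "x < y" is not symmetric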

1.4. Exercises

(1) Let X be a set with n elements. How many relations R on X are sym-
    metric? How many are reflexive? How many satisfy R^T ∩ R = ∅?

2. Composition of Relations

Definition. Let X, Y, and Z be three sets, R ∈ Rel(X, Y) and S ∈
Rel(Y, Z). The composition S ∘ R ∈ Rel(X, Z) is the relation defined as

    S ∘ R = {(x, z) ∈ X × Z | ∃ y ∈ Y so that ((x, y) ∈ R) ∧ ((y, z) ∈ S)}.

We say that a relation R on a set X is transitive provided that R ∘ R ⊆ R.

Thus x(S ∘ R)z ⟺ ∃ y so that xRy ∧ ySz. To say that a relation R on a set X is
transitive is the same as to say that whenever xRy and yRz are true for x, y, z ∈ X,
then we also have xRz is true.

Example: Let X = {1, 2, 3}, Y = {A, B}, and Z = {α, β, γ}. Let

    R = { (1, B),
          (2, A),
          (3, A), (3, B) }

and

    S = { (A, α), (A, β),
          (B, γ) }.

Please check that

    S ∘ R = { (1, γ),
              (2, α), (2, β),
              (3, α), (3, β), (3, γ) }.
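Compositions of finite relations can be computed mechanically. Here is a short Python
sketch that recomputes this example, with the Greek letters written out as strings (the
function name compose is an ad hoc choice):

    def compose(S, R):
        # S o R = {(x, z) : there is some y with (x, y) in R and (y, z) in S}.
        return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

    R = {(1, 'B'), (2, 'A'), (3, 'A'), (3, 'B')}
    S = {('A', 'alpha'), ('A', 'beta'), ('B', 'gamma')}

    assert compose(S, R) == {(1, 'gamma'),
                             (2, 'alpha'), (2, 'beta'),
                             (3, 'alpha'), (3, 'beta'), (3, 'gamma')}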

Example: Let X = Y = Z = N, and let R be the relation <. When is
(x, z) ∈ R ∘ R? Exactly when there is a y ∈ N so that x < y < z. Clearly, this is
possible, for integers, exactly when z − x ≥ 2. Thus x(R ∘ R)y ⟺ x + 2 ≤ y.

Example: (Squaring the Circle) Let R = S^1, the graph of the unit circle in R^2.
Then

    R ∘ R = {(x, z) ∈ R^2 | ∃ y ∈ R so that x^2 + y^2 = 1 = y^2 + z^2}.

Let us try to understand this relation well enough to sketch it. Certainly it implies
that x^2 = z^2, or that x = ±z. But if you have such a pair (x, z), there still may
not exist such a y, because you also need to solve x^2 + y^2 = 1. This is possible iff
−1 ≤ x ≤ 1. So our relation is:

    R ∘ R = {(x, z) ∈ R^2 | (x^2 = z^2) ∧ (−1 ≤ x ≤ 1)},

which is the graph of the two diagonals of the square [−1, 1] × [−1, 1].

2.1. Asterisk-Matrix Composition

The calculation of S ∘ R in the previous section can be reduced to the asterisk-
matrix multiplication of

    A_R = [ 0 *                  A_S = [ * * 0
            * 0        and               0 0 * ].
            * * ]

For this I ought to tell you how to add and multiply 0s and *s. The idea is
to treat * as "unknown". The sum of an unknown and anything is an unknown,
but the product of an unknown and 0 is still 0. This leads to the following addi-
tion/multiplication tables:

    + | 0 *              × | 0 *
    0 | 0 *      and     0 | 0 0
    * | * *              * | 0 *

Using these rules you can define matrix multiplication using the usual dot products.
For instance, in the above example, the product of A_R and A_S is given by

    A_R A_S = [ 0 0 *
                * * 0
                * * * ].

Note that this is precisely equal to A_{S∘R}, and therefore in this case,

    A_{S∘R} = A_R A_S.

Here the multiplication on the right hand side is asterisk-matrix multiplication.

In fact, this is true in general. We omit the proof, but isn't it curious that the order
of multiplication changed?
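If you want to check the rule on a computer, here is a sketch of asterisk-matrix
multiplication in Python, with '*' playing the role of the asterisk and 0 of zero (so the
"dot product" becomes an any-of-products test); the function name is ad hoc.

    def star_multiply(A, B):
        # Asterisk-matrix product: entry (i, k) is '*' exactly when some j has
        # A[i][j] = '*' and B[j][k] = '*', following the tables above.
        return [['*' if any(A[i][j] == '*' and B[j][k] == '*' for j in range(len(B)))
                 else 0
                 for k in range(len(B[0]))]
                for i in range(len(A))]

    A_R = [[0, '*'], ['*', 0], ['*', '*']]      # rows 1, 2, 3; columns A, B
    A_S = [['*', '*', 0], [0, 0, '*']]          # rows A, B; columns for the elements of Z

    assert star_multiply(A_R, A_S) == [[0, 0, '*'], ['*', '*', 0], ['*', '*', '*']]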

2.2. Properties of Composition

Proposition 3.3. (Associativity of Composition) Let X, Y, Z, W be sets. Let K ∈
Rel(X, Y), H ∈ Rel(Y, Z), and G ∈ Rel(Z, W). Then

    (G ∘ H) ∘ K = G ∘ (H ∘ K).

Proof. The left hand side is equal to

    {(x, w) ∈ X × W | ∃ y ∈ Y so that ((y, w) ∈ G ∘ H) ∧ ((x, y) ∈ K)}.

Using the definition of (y, w) ∈ G ∘ H, this breaks further into

    {(x, w) ∈ X × W | ∃ y ∈ Y, z ∈ Z so that ((y, z) ∈ H) ∧ ((z, w) ∈ G) ∧ ((x, y) ∈ K)}.

But this is the same as

    {(x, w) ∈ X × W | ∃ z ∈ Z so that ((z, w) ∈ G) ∧ ((x, z) ∈ H ∘ K)},

which is now equal to G ∘ (H ∘ K). Thus these two sets are equal. □

Remark: The above proof used the associativity of ∧; that is, that (P ∧ Q) ∧ R ⟺
P ∧ (Q ∧ R) for statements P, Q, R. Do you see where?

Proposition 3.4. (Composition and Transposes) Let X, Y, Z be sets. Let H ∈
Rel(X, Y) and G ∈ Rel(Y, Z). Then

    (G ∘ H)^T = H^T ∘ G^T.

Proof. Note that G ∘ H ⊆ X × Z and so (G ∘ H)^T ⊆ Z × X.

The left hand side of the equation in the proposition is equal to

    {(z, x) ∈ Z × X | (x, z) ∈ G ∘ H}.

By the definition of G ∘ H, this is equal to

    {(z, x) ∈ Z × X | ∃ y ∈ Y so that ((x, y) ∈ H) ∧ ((y, z) ∈ G)},

which is equal to

    {(z, x) ∈ Z × X | ∃ y ∈ Y so that ((y, x) ∈ H^T) ∧ ((z, y) ∈ G^T)}.

Let us rewrite this as

    {(z, x) ∈ Z × X | ∃ y ∈ Y so that ((z, y) ∈ G^T) ∧ ((y, x) ∈ H^T)}.

This set is now equal to H^T ∘ G^T. □

Remark: The above proof used the commutativity of ∧; that is, that P ∧ Q ⟺
Q ∧ P for statements P, Q. Did you see where?

2.3. Graphical Views of a Relation

Suppose you have a graph G consisting of vertices V and edges E between them.
Every edge e starts at some initial vertex v_1 and ends at some terminal vertex
v_2. In graph theory, it doesn't really matter if the edges are straight lines or not,
and it is fine to move the vertices and edges around, as long as the same vertices
are connected with the same edges. When we use computers to study graphs, the
essential information we upload is our vertex set, maybe V = {a, b, c}, and our edge
set. To describe an edge to a computer, we only need to say where it begins and
ends. So e above would be uploaded as e = (v_1, v_2) ∈ V × V. Thus, the edge set
E ⊆ V × V is simply a relation on V.

For example, consider the following graph on the vertex set V = {a, b, c}:

[There should be a nice picture here. Can you reconstruct it? The edges must all
have arrows.]

This corresponds to the relation


E = {(a, c), (b, a), (b, b), (c, a)}.

Let us note some phenomena above. The edge from b to b is called a loop;
generally a loop is an edge that begins and ends at the same vertex. With our
earlier terminology, the set of loops is exactly E ∩ ∆_V. Thus E will be reflexive if
there is a loop at every vertex.

Vertices a and c have two edges between them, but they are not considered the same
edge because they are going in different directions. When we have two vertices
joined in both directions by two edges, we should replace the two edges with a
simple edge with no arrows:

[Picture where we replaced the two edges with arrows from a to c with a single edge
without an arrow.]

The relation E is symmetric iff all the edges are now simple.

[I'd like to describe what I call the bipartite view of a relation between two sets.
This is where you draw two horizontal ovals with some dots in between them, and
draw arrows from the dots on the left to the dots on the right. For instance when
I spoke in class about functions I drew several of these.]

2.4. Exercises

(1) There are three relations on X = {a, b} which are not transitive. Find
    them.
(2) Find a relation R on X = {a, b} which is transitive, but for which R ∘ R ≠ R.
(3) Let X be a set, and R a relation on X. Prove that if R is reflexive and
    transitive, then R ∘ R = R.
(4) There are five equivalence relations on the set X = {a, b, c} of three dis-
    tinct elements. Find them, and write them as asterisk-matrices.
(5) Let P^1 be the set of lines in R^2 passing through the origin. On P^1, consider
    the relation ℓ_1 R ℓ_2 provided that ℓ_1 and ℓ_2 are orthogonal. Compute R ∘ R.
(6) Let P^2 be the set of lines in R^3 passing through the origin. On P^2, consider
    the relation ℓ_1 R ℓ_2 provided that ℓ_1 and ℓ_2 are orthogonal. Compute R ∘ R.
(7) Compose the relations ℓ_m, ℓ_∞, S^1, D^2 from the first section with each
    other. Can you find examples of relations which do not commute here?
    Also compose these relations on both sides with the total relation R^2.
(8) Consider the relation R ⊆ R^2 given by the square pictured below. The
    square has vertices (0, 0), (1, 0), (0, 1), and (1, 1), and it is the union of
    four closed intervals in the obvious way. Describe the relation R ∘ R.

    [Figure: the unit square in the (x, y)-plane, with the vertex (1, 1) marked.]

(9) If R′ ⊆ R and S′ ⊆ S, then R′ ∘ S′ ⊆ R ∘ S.
(10) Let S ⊆ Y × Z and R ⊆ X × Y be relations. Suppose that S = S_1 ∪ S_2.
     Prove that

         S ∘ R = (S_1 ∘ R) ∪ (S_2 ∘ R).

(11) Let R ⊆ X × Y be a relation. Prove that ∆_Y ∘ R = R and R ∘ ∆_X = R.
(12) What happens when you compose the elementary relations δ_{x,y} with other
     relations? (Think about both δ_{x,y} ∘ R and R ∘ δ_{x,y}.) Suggestion: Experi-
     ment with the R^2 relations.
(13) Let X, Y be sets, x, x′ ∈ X, and y, y′ ∈ Y. Give a formula for δ_{x,y} ∘ δ_{x′,y′},
     where x, x′ ∈ X and y, y′ ∈ Y.
(14) Square all 16 asterisk matrices in the previous section. Make a list of
     the 2 × 2 matrices which are squares. I think it is an interesting project
     to determine which n × n asterisk matrices are squares. This, of course,
     is equivalent to determining which relations on a set with n elements are
     squares.

3. Functions

The modern definition of a function was not enunciated until the middle of the
19th century, a considerable time after the advent of calculus, for instance. Math-
ematicians before this just basically dealt with expressions, usually power series or
rational functions (quotients of polynomials). Words like singularity were used
for a point not in the domain of a function. If you were an expert mathematician,
you knew what you were doing. But without the modern notion, it is difficult to
understand things like inversion. For instance, the inverse trig functions are difficult
to grasp without the proper vocabulary of domain and codomain.

Definition. Let X, Y be sets. A function f : X → Y is a rule which
assigns to every element x ∈ X a unique element y ∈ Y. We write
f(x) = y to indicate this rule. The set X is called the domain of f,
and the set Y is called the codomain of f.

The words function, transformation, and map are all synonyms. Some use
the word target as a synonym for codomain. The word range is used in-
consistently, sometimes meaning codomain and sometimes meaning image (see
below). It is best avoided.

For example, the function f : R → R given by f(x) = x^2 has domain R and also
codomain R.

You may rightfully argue that in the (standard) definition above, I have used an
undefined concept "rule". This is not insurmountable; I will later indicate how
one can alternately define "function" in terms of a certain kind of relation. (One
that passes the "vertical line test".) It is essential, however, that the function is
well-defined, or well-formed.

Example: The expression

    f(x) = { x^2 + x   if x < 1
           { 5         if x > −1

does not define a function f : R → R. The problem is that it defines for example
f(0) in two different ways; both as 0 and 5! If a putative function (or rule) gives
more than one answer it is called not well-defined. Actually to say that a function
is well-defined is redundant, but we say so anyway to emphasize this point.

Example: Writing f(x) = 1/(2 − x) does not define a function f : R → R; it is obviously
not defined at 2. You can't later say f(2) = ∞, unless you change the codomain to
include ∞. (Or −∞. Let us not talk about that.)

Example: Expecting f(z) = √z to define a function f : C → C is (very) bad.
What should √(−1) be?

3.1. Injectivity

The following is one of the most important definitions in mathematics:

Definition. Let f : X → Y be a function. We say that f is injective
provided that:

    ∀ x_1, x_2 ∈ X, we have (f(x_1) = f(x_2)) ⟹ (x_1 = x_2).

Using the contrapositive, we can equivalently say that f is injective iff

    ∀ x_1, x_2 ∈ X, we have (x_1 ≠ x_2) ⟹ (f(x_1) ≠ f(x_2)).

The word "one-to-one" is often used as a synonym for "injective", however it is
easily confused with the phrase "one-to-one correspondence" (see below), and so
I shan't use it. The noun form of this concept is "injection". A function is an
injection provided that it is injective.

Suppose you are grading a student's work, in which he is trying to demonstrate
that 2 − √2 = √2. It reads:

    To show: 2 − √2 = √2
    1 − √2 = √2 − 1
    (1 − √2)^2 = (√2 − 1)^2
    1 − 2√2 + 2 = 2 − 2√2 + 1
    3 − 2√2 = 3 − 2√2
    Hence, proved.

Where did the student go wrong? First of all, he is not really explaining his logic
with sentences. Hence, proved. is not useful. So we have to guess his thought
process. The first step is evidently subtracting 1 from both sides of the equation,
and the second step is squaring both sides of the equation. But, the implied logic
seems to be: I wanted to prove two things are equal, I apply some operations to
both of them and they become equal. Therefore the original two things must be
equal. This only works if the operations are injective, and that is exactly where
the putative proof breaks down. So, injectivity is an important notion to bear in
mind throughout mathematics.

Is f(x) = x^2 injective? Hold on; the question of injectivity really depends on the
domain of the function. If we mean f : R → R it is certainly not injective, since
f(1) = f(−1) = 1, but 1 ≠ −1. On the other hand, if we mean f : [0, ∞) → R then
it is injective. The domain is important.

If you look at the graph of a real-valued function, with domain some subset of R,
the function is injective iff it satisfies the "horizontal line test". This means that
any horizontal line must intersect the graph at most once if the function is to
be injective. Can you explain why?
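On a finite domain, injectivity can also be tested by machine: a function is injective
exactly when no output value is repeated. A small Python sketch (the names are ad hoc):

    def is_injective(f, domain):
        # Injective on a finite domain iff no two inputs share an output value.
        values = [f(x) for x in domain]
        return len(values) == len(set(values))

    assert not is_injective(lambda x: x * x, range(-3, 4))   # x^2 repeats values on [-3, 3]
    assert is_injective(lambda x: x * x, range(0, 4))        # but not on [0, 3]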

Suppose that f : I → R, where I is an interval. We know from calculus that if
f′(x) > 0 for all x ∈ I, then f is a strictly increasing function. If you don't know
what that means:

Definition. Let f : I → R. Then f is strictly increasing provided
that ∀ x_1, x_2 ∈ I, we have (x_1 < x_2) ⟹ (f(x_1) < f(x_2)).

Lemma 3.5. If f : I → R is a strictly increasing function, then f is injective.

Proof. Let x ≠ y be in I. We may assume that x < y. Then f(x) < f(y),
which implies that f(x) ≠ f(y). Therefore f is injective. □

Similarly, if f′(x) < 0 for all x ∈ I, then f is strictly decreasing, which also implies
that it is injective.

Corollary 3.6. If f : I → R has f′(x) > 0 for all x ∈ I, then f is injective. If
f′(x) < 0 for all x ∈ I, then f is also injective.

Caution: The domain needs to be an interval (i.e. connected). The function
f(x) = tan x satisfies f′(x) > 0 for all x in its natural domain, but an application
of the horizontal line test to the graph (which you should know by heart) shows
that it is certainly not injective.

3.2. Surjectivity

A function need not take on all the values in its codomain. Sometimes this set of
values is hard to calculate, as with f : R → R given by f(x) = x^4 − 7x + 1.

Definition. Let f : X → Y. The image of f is the set

    im f = {y ∈ Y | ∃ x ∈ X so that f(x) = y}.

For example, the function f : R → R given by f(x) = x^2 has im f = [0, ∞).

Let me state a powerful theorem from calculus, which combines the Extreme Value
Theorem and the Intermediate Value Theorem:
Theorem 3.7. Let a < b in R and f : [a, b] → R a continuous function. Put
m = min{f(x) | x ∈ [a, b]} and M = max{f(x) | x ∈ [a, b]}. Then the image of f
is equal to [m, M].

(The existence of the min and max is from the Extreme Value Theorem.)

Next we have an example from multivariable calculus.

Example: Let f : R → R^2 be the function f(t) = (cos(t), sin(t)). The image of f
is the unit circle in R^2. If we use the same formula for f, but view it as a function
f : [0, π/2) → R^2, then the image of f is the part of the unit circle in the first
quadrant, with one endpoint closed and the other open.

Definition. Let f : X → Y. We say that f is surjective provided that
Y = im f.

A synonym for "surjective" is "onto".

Example: If f : [a, b] → R is continuous then it is never surjective, by Theorem 3.7.

3.3. Bijectivity

Bijective functions are particularly important.

Definition. A function f : X → Y is bijective provided that it is both
injective and surjective.

Example: The function f : [0, ∞) → [0, ∞) given by f(x) = x^2 is a bijection.

Example: Let us see that the function f : R^2 → R^2 defined by f(x, y) = (x, x+y) is
a bijection. For injectivity, suppose that f(x_1, y_1) = f(x_2, y_2). Thus (x_1, x_1 + y_1) =
(x_2, x_2 + y_2), which is the statement that (x_1 = x_2) ∧ (x_1 + y_1 = x_2 + y_2). It follows
easily that (x_1, y_1) = (x_2, y_2), and so f is injective.

For surjectivity, let (a, b) ∈ R^2. We must determine whether there is an (x, y) so
that f(x, y) = (a, b). Thus we must solve (x, x + y) = (a, b), which is easily done.
Put x = a and y = b − a.

Definition. Let X be a set. The function f : X → X via

    f(x) = x   ∀ x ∈ X

is called the identity function on X, written f = id_X.

It is obvious but important that id_X is a bijection.

Here is what calculus/analysis says about continuous bijections of intervals:

Theorem 3.8. Let a, b ∈ R, and f : [a, b] → R a continuous function. Then f is
a bijection onto its image iff either f is strictly increasing on [a, b] or f is strictly
decreasing on [a, b].

3.4. Composition of functions

A basic notion in mathematics is the composition of functions.



Definition. Let X, Y, Z be sets, and f : X → Y, g : Y → Z functions.
The composition g ∘ f : X → Z is the function defined by

    (g ∘ f)(x) = g(f(x)).

Example: Continuing notation from the following section, if B is a matrix with
n rows and p columns, then L_B ∘ L_A = L_{BA}, where BA is defined by matrix
multiplication.

Composition is associative, since if h : Z → T is a third function, we have

    ((h ∘ g) ∘ f)(x) = h(g(f(x))) = (h ∘ (g ∘ f))(x).

Note that we also have

    f ∘ id_X = f,   and   id_Y ∘ f = f.
Proposition 3.9. With notation as above,

(1) If f, g are injective, then so is g ∘ f.
(2) If f, g are surjective, then so is g ∘ f.
(3) If f, g are bijective, then so is g ∘ f.
(4) If g ∘ f is injective, then so is f.
(5) If g ∘ f is surjective, then so is g.

To appreciate some of this, consider the following example: Let X = Z = {a, b}
and Y = {P, Q, R}. Let f be the map f(a) = P, f(b) = Q. Let g be the map
g(P) = a, g(Q) = b, g(R) = b. Then g ∘ f = id_X and is therefore a bijection, but f
is not a surjection and g is not injective.

Proof. For the first part, let x_1, x_2 ∈ X. If g(f(x_1)) = g(f(x_2)), then since g
is injective we have f(x_1) = f(x_2), and since f is injective we deduce that x_1 = x_2.
This shows that g ∘ f is injective.

For the second part, let z ∈ Z. Since g is surjective, there is a y ∈ Y so that
g(y) = z. Since f is surjective, there is an x ∈ X so that f(x) = y. Then
g(f(x)) = g(y) = z, which shows that g ∘ f is surjective.

The third part follows from the first and second parts, and the rest you should do
yourself. □

3.5. Inverses of functions

Definition. Let f : X → Y. A function g : Y → X is inverse to f
provided that (g ∘ f = id_X) ∧ (f ∘ g = id_Y).

Merely one of the conditions, e.g. g ∘ f = id_X alone, is not enough, by the example above.

Example: The function f : R^2 → R^2, f(x, y) = (x, x + y) has inverse g(x, y) =
(x, y − x), since:

    (g ∘ f)(x, y) = g(x, x + y) = (x, y),   (f ∘ g)(x, y) = f(x, y − x) = (x, y).

Proposition 3.10. (Uniqueness of Inverses) If g_1, g_2 : Y → X are both inverse to
f : X → Y, then g_1 = g_2.

Proof. We have

    g_1 = id_X ∘ g_1
        = (g_2 ∘ f) ∘ g_1
        = g_2 ∘ (f ∘ g_1)
        = g_2 ∘ id_Y
        = g_2.                □
Proposition 3.11. A function f : X → Y is bijective iff it has an inverse.

Proof. If f has an inverse g, then it is bijective by the last two parts of
Proposition 3.9. Conversely, suppose that f is injective and surjective. Define
g : Y → X by the rule:

    g(y) = the unique x ∈ X so that f(x) = y.

Note that such an x exists because f is surjective, and this x is uniquely determined
because f is injective. Now for all y ∈ Y,

    f(g(y)) = f(x) = y,

and for all x ∈ X, and y = f(x), we have

    g(f(x)) = g(y) = x.

(Think these equations through!) Therefore g is the inverse of f. □

Inverses depend very much on the domain and codomain of the function. For
instance f_1 : [0, ∞) → [0, ∞) given by f_1(x) = x^2 has inverse g_1 : [0, ∞) → [0, ∞)
given by g_1(x) = √x, but f_2 : (−∞, 0] → [0, ∞) given by f_2(x) = x^2 has inverse
g_2 : [0, ∞) → (−∞, 0] given by g_2(x) = −√x.

This is especially prominent for the inverse trig functions. For instance, the sine
function naturally has domain R and codomain R, but it is not injective or surjec-
tive as such. One typically restricts the domain to [−π/2, π/2] and the codomain
to [−1, 1] to obtain a bijection. Thus, one has an inverse arcsin : [−1, 1] →
[−π/2, π/2]. Note that with this convention arcsin(y) doesn't take the value π,
even though sin(π) = 0.

One could easily find other domains on which sin is injective, such as [π/2, 3π/2].
But the original one is more commonly taken, and numbers in this range are called
the principal value of the arcsine function.

A worse situation is trying to invert the cotangent function. [picture] Which do-
main should we take for cotangent? Different authorities make different choices.
Wikipedia, for instance, says that we should invert cotangent on (0, π) ⊆ R, but
Mathematica says we should restrict cotangent to a function

    cot : (−π/2, 0) ∪ (0, π/2] → R.

Both of these are bijections. What is arccot(−1), for instance? Wikipedia says it
should be 3π/4 and Mathematica says it should be −π/4.

This can be confusing if you don't have a handle on the domain/codomain concept.
Rather than trying to memorize conventions, try to understand what the logical
issue is.

3.6. Sections and Retractions of functions

Definition. Let f : X → Y be a function. A function r : Y → X is a
retraction of f provided that r ∘ f = id_X. A function s : Y → X is a
section of f provided that f ∘ s = id_Y.

Proposition 3.12. Let f : X → Y be a function. Then

(1) f is an injection iff there is a retraction of f.
(2) f is a surjection iff there is a section of f.
Theorem 3.13. Let X and Y be nonempty sets. Then there is an injection from
X to Y iff there is a surjection from Y to X.

Proof. Suppose there is an injection f : X → Y. Let r : Y → X be a
retraction of f, so that r ∘ f = id_X. Then r is the required surjection. Suppose
there is a surjection f : Y → X. Let s : X → Y be a section of f, so that
f ∘ s = id_X. Then s is the required injection. □

3.7. Exercises

(1) The last two parts of Proposition 3.9.
(2) Write down all the 3 × 2 *-matrices corresponding to injective functions
    (from a set with 2 elements to a set with 3 elements).
(3) Give an example of a function f : [0, 1] → (0, 1) which is injective but not
    surjective.
(4) Let a, b, c, d ∈ R. Consider the function L : R^2 → R^2 defined by L(x, y) =
    (ax + by, cx + dy), and the function L′ : R^2 → R^2 defined by L′(x, y) =
    (dx − by, −cx + ay). Compute L ∘ L′ and L′ ∘ L. Let D = ad − bc. If
    D = 0 prove that L is not injective. If D ≠ 0 prove that L is a bijection.
(5) Let X be the set of nonzero vectors in R^3. Consider the relation: (x, y, z) ∼
    (x′, y′, z′) provided that there exists λ ≠ 0 so that (x′, y′, z′) = λ(x, y, z).
    Check that this is an equivalence relation. Describe an equivalence class
    under this relation. Can you describe the quotient set?
(6) Let X = N^2. Say that two pairs (a, b) and (a′, b′) in X are proportional
    provided that ab′ = a′b. Check that proportionality is an equivalence rela-
    tion. Describe an equivalence class under this relation. Can you describe
    the quotient set?
(7) Let f : X → Y and g : Y → Z be functions. Suppose that g ∘ f : X → Z
    is surjective. Prove that g is surjective.
(8) Let f : R^2 → R be given by f(x, y) = x + y. Find two different sections
    of f.
(9) Let f : R → R^2 be the map defined by f(t) = (2t + 1, t − 7). Find two
    different retractions of f.

4. Functions as Relations

In this section we embed the concept of a function into the concept of a relation.

Definition. If R ∈ Rel(X, Y), and x ∈ X, put

    R(x) = {y ∈ Y | (x, y) ∈ R} ⊆ Y.

[Examples]

Definition. Let X and Y be sets. A relation R ∈ Rel(X, Y) is called a
function from X to Y provided that for all x ∈ X, the set R(x) is a
singleton.

Let us apply this definition when X = ∅. In this case

Rel(X, Y) = ℘(X × Y)
= ℘(∅ × Y)
= ℘(∅)
= {∅}.

Thus, Rel(X, Y) consists of only the empty relation R = ∅. Let us determine
whether this is a function from X = ∅ to Y. The question is whether it is true that
for all x ∈ X = ∅, the set R(x) is a singleton. Since it is never true that x ∈ ∅, the
statement
∀x ∈ X, R(x) is a singleton
is vacuously true. Therefore R is a function. It is called the empty function.
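A small Python sketch of this point of view (the helper names are mine, not the text's): a relation is a set of pairs, and it is a function exactly when each R(x) is a singleton. The empty relation on X = ∅ passes the test vacuously.

```python
def R_of(R, x):
    """The set R(x) = {y : (x, y) in R}."""
    return {y for (a, y) in R if a == x}

def is_function(R, X):
    return all(len(R_of(R, x)) == 1 for x in X)

R1 = {(0, 'a'), (1, 'b')}
R2 = {(0, 'a'), (0, 'b'), (1, 'b')}   # R2(0) has two elements

print(is_function(R1, {0, 1}))    # True
print(is_function(R2, {0, 1}))    # False
print(is_function(set(), set()))  # True: the empty function
```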

We now extend the notions of injective, surjective, and bijective to the context of
relations.

Definition. A relation R is injective provided that: for x, x′ ∈ X and
y ∈ Y,
(x, y), (x′, y) ∈ R ⇒ x = x′.
A relation R is surjective provided that ∀y ∈ Y ∃x ∈ X so that (x, y) ∈
R. A relation is bijective provided that it is injective and surjective.
Proposition 3.14. Let X be a set.

(1) There is a unique function ∅ → X; it is an injection.
(2) There is no function X → ∅, unless X = ∅.
(3) There is a unique bijection ∅ → ∅.

4.1. Exercises

(1) Write down all the 3 × 2 *-matrices corresponding to injective functions
(from a set with 2 elements to a set with 3 elements).
(2) Write down all the 2 × 3 *-matrices corresponding to surjective functions
(from a set with 3 elements to a set with 2 elements).
(3) Let X and Y be sets, and R ∈ Rel(X, Y). We say that R is injective
provided that, for x, x′ ∈ X and y ∈ Y, we have (x, y), (x′, y) ∈ R ⇒ x =
x′. Explain why the empty function is injective.
(4) Let X and Y be sets, and R ∈ Rel(X, Y). We say that R is surjective
provided that, for all y ∈ Y there exists x ∈ X so that (x, y) ∈ R. Give
an example of some R ∈ Rel(R, R) which is injective and surjective, but
is not a function.

5. Partially Ordered Sets

Definition. A relation R ∈ Rel(X) is a partial ordering relation on
X provided that:
(1) ∀x ∈ X, we have (x, x) ∈ R. (Reflexivity)
(2) ∀x, y, z ∈ X, we have (x, y), (y, z) ∈ R ⇒ (x, z) ∈ R. (Transitivity)
(3) ∀x, y ∈ X, we have (x, y), (y, x) ∈ R ⇒ x = y. (Antisymmetry:
R ∩ Rᵀ = ∆_X)
A set X considered with a partial ordering is called a partially ordered
set, or simply a poset.

We usually write x ≤_R y or simply x ≤ y to mean (x, y) ∈ R in this situation.

Examples:

Let X = N. The usual order on N corresponds to the relation R≤ =
{(a, b) | a ≤ b} on N. This is a total ordering. On the other hand,
the relation R| = {(a, b) | a|b} on N is just a partial ordering.
Let S be a set, and let ℘(S) be the power set of S. Write R⊆ for the relation
{(A, B) ∈ ℘(S) × ℘(S) | A ⊆ B} on ℘(S). This is the inclusion relation,
which is usually just a partial ordering.
Here are two natural partial orderings on N². The product order on N² is
the prescription that (a₁, b₁) ≤ (a₂, b₂) provided that a₁ ≤ a₂ and b₁ ≤ b₂.
For instance (4, 6) ≤ (7, 6). The lexicographic (or dictionary) order is the
rule that (a₁, b₁) ≤ (a₂, b₂) provided that (a₁ < a₂) ∨ (a₁ = a₂ ∧ b₁ ≤ b₂).
For instance (4, 6) ≤ (5, 1). In similar fashion one can take the product of
any two posets, or indeed any finite number of posets.
If X is any set, simple equality is a partial ordering.
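To see how these examples differ in practice, here is a small Python check (my own illustration) comparing divisibility on the positive integers, the product order on N², and the lexicographic order on N².

```python
def divides(a, b):
    return b % a == 0                     # divisibility on positive integers

def product_le(p, q):
    return p[0] <= q[0] and p[1] <= q[1]  # product order on N^2

def lex_le(p, q):
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])  # dictionary order

print(divides(3, 12), divides(3, 10))   # True False: only a partial order
print(product_le((4, 6), (7, 6)))       # True
print(product_le((4, 6), (5, 1)))       # False: incomparable in the product order
print(lex_le((4, 6), (5, 1)))           # True: comparable in the lexicographic order
```

The last two lines also illustrate the remark below that the product order is weaker than the lexicographic order.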

Definition. Let X be a set, and ≤₁, ≤₂ partial orderings on X. We say
≤₁ is weaker than ≤₂ provided that
∀x, y ∈ X, x ≤₁ y ⇒ x ≤₂ y.
We also say here that ≤₂ is stronger than ≤₁.

The equality relation is weaker than any other partial ordering. The product order
on N² is weaker than the lexicographic order.

Note that the relation "≤₁ is weaker than ≤₂" is itself a partial order on the set of
partial orders on a set.

Definition. A partial ordering ≤ on X is a total ordering provided
that, for all x, y ∈ X, we have (x ≤ y) ∨ (y ≤ x). A poset where the
ordering is a total ordering is called a toset.

If R is the corresponding relation, then this condition is equivalent to X × X = R ∪ Rᵀ.

The usual orderings on N, Q, and R are total orderings.

5.1. Exercises

(1) List all partial orders on J3. How many are total orders? How many are
well-orders?
(2) How many total orders are there on Jn?
(3) Show that on a set X, no partial orders are strictly stronger than total
orders.
(4) Let X be a toset. Let
X′ = ⋃_{x∈X} X_x.

Prove that exactly one of the following is true:

(a) X = X′.
(b) X − X′ is a singleton {M}, where M = max X.
(5) Let X be a poset. Show that there is a subposet Y ⊆ ℘(X) so that X is
order-isomorphic to Y.

6. Chapter 3 Wrap-up

6.1. Rubric for Chapter 3

In this chapter you should have learned:

What a relation is, and how to compose two of them.


The definition of an equivalence relation, what an equivalence class is, and
an idea of what the quotient set is.
What a function is, when they are injective, surjective, bijective. What
an inverse is.
What equivalence relations have to do with partitions.

A little about what ∼-invariant functions are, especially for defining functions on angles.

6.2. Toughies for Chapter 3

(1) Let X be a set. Which relations on X commute with all other relations?
(Relations R and S on X commute provided that R ∘ S = S ∘ R.)
(2) Let X be a set. We say that a relation S on X is a square root of a
relation R provided that S ∘ S = R. Does every relation have a square
root? Is the square root unique if it exists?
Does the unit circle relation on R have a square root?
CHAPTER 4

Cardinality


1. Finite and Infinite Sets

We now define when two sets have the same size:

Definition. Let X, Y be sets. We say that X and Y are equipotent
provided that there is a bijection from X to Y. We write X ≈ Y if they
are equipotent.

Note that:

X ≈ X by using id_X.
If X ≈ Y, then Y ≈ X. This is because the inverse of a bijection is
another bijection.
If (X ≈ Y) ∧ (Y ≈ Z), then X ≈ Z. This is because the composition of
two bijections is a bijection.

1.1. The sets Jn

Definition. Given n ∈ N let Jn = {0, 1, . . . , n − 1}, a subset of N. Let
J0 = ∅.

Lemma 4.1. Let n ≥ 1 and 0 ≤ j < n. Then there is a bijection from Jn − {j} to
J_{n−1}.

Proof. Define f : Jn − {j} → J_{n−1} by
f(i) = i if i < j, and f(i) = i − 1 if i > j;
it is certainly a bijection. □

Proposition 4.2. Let m, n ∈ N.

(1) There is an injection from Jm to Jn iff m ≤ n.
(2) There is a surjection from Jm to Jn iff m ≥ n > 0, or m = n = 0.
(3) There is a bijection from Jm to Jn iff m = n.
(4) A function f : Jn → Jn is an injection iff it is a surjection.

Proof. If m ≤ n, then inclusion Jm ⊆ Jn gives an injection. We will prove
the converse, "If there is an injection from Jm to Jn, then m ≤ n," by induction
on m. If m = 0, then m ≤ n, so that is settled. So suppose that m > 0, and
φ : Jm → Jn is an injection. Let j = φ(m − 1) ∈ Jn. (This implies n > 0.) Then
φ|_{J_{m−1}} : J_{m−1} → Jn − {j}
is an injection. Let f be the bijection from the lemma. The composition
J_{m−1} → Jn − {j} → J_{n−1}
is an injection, being the composition of two injections. By induction, we conclude
that m − 1 ≤ n − 1, which implies that m ≤ n.

For the second, use Theorem 3.13 together with the first part. The third part
follows from the first two parts.

Let f : Jn → Jn be a surjection. Suppose that f were not injective. Then there
would be unequal j, k ∈ Jn so that f(j) = f(k). (This implies n ≥ 2.) Note that
f|_{Jn − {j}} : Jn − {j} → Jn
would again be a surjection. (Why?) Composing this with a bijection from J_{n−1}
to Jn − {j} (via Lemma 4.1) we would obtain a surjection from J_{n−1} to Jn. This
would contradict part (ii), so the original f must be an injection.

We leave it to the reader to prove the converse, i.e. that an injection f : Jn → Jn
must be a surjection. □
Proposition 4.3. There is a bijection from Jm × Jn to J_{mn}. There is a bijection
from the power set of Jn to J_{2^n}.

Proof. If m or n is 0, the statements are clear. So assume m, n ≠ 0. For
the first statement, check that the function f(a, b) = na + b is a bijection. For the
second statement, use the function
f(A) = Σ_{a∈A} 2^a.


Example: Let n = 2. Then the bijection f : ℘(J2) → J4 is explicitly given by
f(∅) = 0, f({0}) = 1, f({1}) = 2, and f(J2) = 3.
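Here is a quick Python check of these two bijections on small values (purely illustrative): it verifies that f(a, b) = na + b and f(A) = Σ_{a∈A} 2^a each hit every target value exactly once.

```python
from itertools import combinations

m, n = 3, 4
values = sorted(n * a + b for a in range(m) for b in range(n))
print(values == list(range(m * n)))   # True: a bijection J_m x J_n -> J_{mn}

N = 3
subsets = [frozenset(c) for k in range(N + 1) for c in combinations(range(N), k)]
codes = sorted(sum(2 ** a for a in A) for A in subsets)
print(codes == list(range(2 ** N)))   # True: a bijection from P(J_N) to J_{2^N}
```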

1.2. Finite Sets

Definition. A set X is finite provided that ∃n ∈ N so that X is equipotent to Jn. A set is infinite provided that it is not finite.

We have just proved that Jm × Jn and the power set of Jn are finite sets.

Definition. If X is equipotent to Jn we write |X| = n and say that X
has cardinality n.

In particular, |∅| = 0.
Theorem 4.4. Let X, Y be finite sets. Say |X| = m and |Y| = n. Let f : X → Y
be a function. Then

(1) If f is injective, then m ≤ n.
(2) If f is surjective, then m ≥ n.
(3) If f is bijective, then m = n.
(4) If m = n, then f is injective iff f is surjective.
Proposition 4.5. Let X and Y be sets, with |X| = m and |Y| = n.

(1) The product set X × Y has cardinality mn.
(2) The power set of X has cardinality 2^m.

Proof. (First Part) By hypothesis, there are bijections α : X → Jm and
β : Y → Jn, and so we may define
α × β : X × Y → Jm × Jn
by
(α × β)(x, y) = (α(x), β(y)).
It is easy to check that α × β is a bijection. (The product of bijections
is a bijection.) Thus the composition
X × Y → Jm × Jn → J_{mn}
is a bijection. Thus |X × Y| = mn.

The second part is similar; if f : X → Y is a bijection, consider
℘(f) : ℘(X) → ℘(Y)
defined by ℘(f) : A ↦ f(A) for A ⊆ X. Check that ℘(f) is a bijection, and then
compose as in the first part.

Proposition 4.6. Let X be a set, and suppose there is an injection f : X → Jn.
Then X is finite, and |X| ≤ n.

Proof. We proceed by induction on n. If n = 0, then f : X → ∅, which
implies that X = ∅, so |X| = 0.

Supposing veracity of the proposition for n, consider an injection f : X → J_{n+1}.
If f is surjective, then it is a bijection and so |X| = n + 1. Otherwise there exists
j ∉ f(X). Thus we have an injection followed by a bijection:
X → J_{n+1} − {j} → Jn,
and so |X| ≤ n by induction. □

Corollary 4.7. Any subset of Jn is finite. Any subset of a finite set is finite. If
X ⊆ Y with Y finite and X ≠ Y, then |X| < |Y|. In particular, a finite set is not
equipotent to a proper subset of itself. If f : X → Y is a surjection, and X is finite,
then Y is also finite.

The reader familiar with linear algebra may appreciate the following theorem, which
is analogous to Theorem 4.4:
Theorem 4.8. Let V, W be finite-dimensional vector spaces, with dim V = m and
dim W = n. Let f : V → W be linear. Then

(1) If f is injective, then m ≤ n.
(2) If f is surjective, then m ≥ n.
(3) If f is bijective, then m = n.
(4) If m = n, then f is injective iff f is surjective.

All linear maps on finite-dimensional vector spaces are the following, up to notation:
Let A be a matrix with n rows and m columns, and let L_A(v) = Av (matrix-vector
multiplication). Then L_A : R^m → R^n is a linear map, and thus the previous
theorem applies. There are numbers called rank and nullity associated to matrices.
The rank of A is n iff L_A is surjective, and the nullity of A is 0 iff L_A is injective.
In fact, the theorem above is deduced from the rank-nullity theorem (which says
that rank(A) + nullity(A) = m).

1.3. Some Combinatorics

Proposition 4.9. Let X and Y be finite sets, with |X| = n and |Y| = m. Suppose
there is a map f : X → Y whose fibres all have the same cardinality d. Then
n = dm.

Proof. For each y ∈ Y, there is a bijection βy : f⁻¹(y) → Jd.

Define a map F : X → Y × Jd by
F(x) = (f(x), β_{f(x)}(x)).
Thus, if f(x) = y, then
F(x) = (y, βy(x)).

We claim that F is a bijection. Suppose that F(x) = F(x′). Then f(x) = f(x′) = y,
so x and x′ are both in f⁻¹(y). The fact that F(x) = F(x′) also gives βy(x) =
βy(x′). Since βy is injective, we have x = x′. Therefore F is injective. To show
that F is surjective, let y ∈ Y and j ∈ Jd. Let x be the element in the fibre of y
which maps to j under βy; it is easy to see that F(x) = (y, j).

Thus |X| = |Y × Jd| = md.

[Application to binomial coefficients.]


Proposition 4.10. Let X and Y be finite sets. Suppose there is a map f : X → Y
whose fibres all have cardinality less than or equal to d. Then |X| ≤ d|Y|.

Proof. Exercise. 

The contrapositive of this is:


Proposition 4.11. (Pigeon-hole Principle) Let X and Y be finite sets. If |X| >
d|Y| for some integer d ≥ 0, and f : X → Y is a map, then for some y ∈ Y we
have |f⁻¹(y)| > d.

Let X be a set of pigeons, and Y be a set of holes. We can think of putting pigeons
in holes in terms of a function from X to Y . The fibre over a hole is the set of
pigeons put into that hole. If |X| > d|Y |, then there must be at least one hole with
more than d pigeons.
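A tiny Python illustration of the principle (the numbers are arbitrary): with 13 pigeons and 4 holes we have 13 > 3 · 4, so some hole must receive more than 3 pigeons, and the code finds one.

```python
pigeons = range(13)               # |X| = 13
holes = range(4)                  # |Y| = 4, and 13 > 3 * 4
f = {p: p % 4 for p in pigeons}   # any map X -> Y will do

fibres = {y: [p for p in pigeons if f[p] == y] for y in holes}
big = [y for y in holes if len(fibres[y]) > 3]
print(big)   # at least one hole with more than 3 pigeons, here [0]
```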

1.4. Unions, Intersections, and Coproducts

Definition. Let A, B be subsets of a set X. The coproduct of A and
B, written A ⊔ B, is the subset of X × J2 given by
A ⊔ B = (A × {0}) ∪ (B × {1}).

Note in particular that X ⊔ X = X × J2.

There is a natural injection iA : A → A ⊔ B given by a ↦ (a, 0) and similarly an
injection iB : B → A ⊔ B. There is a natural surjection p : A ⊔ B → A ∪ B defined
by p(a, 0) = a and p(b, 1) = b. Note that p is a bijection iff A ∩ B = ∅, and that
p ∘ iA = id_A and p ∘ iB = id_B.

Proposition 4.12. Let m, n ≥ 0. The coproduct Jm ⊔ Jn is equipotent to J_{m+n}.

Proof. A bijection is given by f(j, 0) = j and f(j, 1) = m + j. □
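In Python one can model the coproduct literally as a set of tagged pairs (a sketch using the definition above) and check Proposition 4.12 for small m and n.

```python
def coproduct(A, B):
    """A ⊔ B = (A x {0}) ∪ (B x {1})."""
    return {(a, 0) for a in A} | {(b, 1) for b in B}

m, n = 3, 4
C = coproduct(range(m), range(n))
f = {(j, 0): j for j in range(m)}
f.update({(j, 1): m + j for j in range(n)})
print(sorted(f[c] for c in C) == list(range(m + n)))   # True: J_m ⊔ J_n ≈ J_{m+n}
```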

Corollary 4.13. The coproduct of two finite sets is finite.

Corollary 4.14. Let A, B be finite subsets of a set X. Then A ∪ B is finite.

Proof. As above, there is a surjection from the finite set A ⊔ B to A ∪ B. □

Proposition 4.15. A ⊔ B ≈ B ⊔ A.
(A ⊔ B) ⊔ C ≈ A ⊔ (B ⊔ C).

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi of
X be given. The union of the sets Xi is the subset ⋃_{i∈I} Xi defined by
⋃_{i∈I} Xi = {x ∈ X | ∃i ∈ I s.t. x ∈ Xi}.

[Example]

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi
of X be given. The intersection of the sets Xi is the subset ⋂_{i∈I} Xi
defined by
⋂_{i∈I} Xi = {x ∈ X | ∀i ∈ I, x ∈ Xi}.

[Example]

Definition. Let X and I be sets, and for each i ∈ I, let a subset Xi of
X be given. The coproduct of the sets Xi, written ⊔_{i∈I} Xi, is the subset
of X × I given by
⊔_{i∈I} Xi = ⋃_{i∈I} (Xi × {i}).

1.5. Infinite Sets

Proposition 4.16. Let X be a set. The following are equivalent:

(1) X is infinite.
(2) There is an injection from N to X.
(3) There is a surjection from X to N.
(4) X is equipotent to a proper subset of itself. (Dedekind's Criterion)

Proof. For (2) ⇒ (1), suppose that X is finite, with |X| = n, and there is
an injection from N to X. Then there is an injection J_{n+1} → N → X → Jn,
contradicting [earlier].

For (2) ⇒ (3), let f : N → X be an injection, and r : X → N a retraction of f,
meaning that r ∘ f is the identity on N. Then r is necessarily a surjection from X
to N. The proof of (3) ⇒ (2) is similar.

For (1) ⇒ (2), suppose that X is infinite. We will recursively define an injection
f : N → X, as follows. Since X is nonempty, there is some x0 ∈ X. Put f(0) = x0.
Now suppose that an injection f : Jn → X is given. If f is a surjection, then it is a
bijection, contradicting the infinitude of X. Otherwise, there is some xn ∈ X − f(Jn), and
we put f(n) = xn. Thus we recursively have distinct elements f(n) for all n ∈ N;
the resulting f is an injection.

By Corollary 4.7, (4) implies (1).

For (2) ⇒ (4), let f : N → X be an injection. Let xn = f(n) for all n and
Z = X − {x0, x1, . . .}. We define a bijection g : X → X − {x0} so that g is the
identity on Z and g(xn) = x_{n+1}.

1.6. Exercises

(1) Let n ≥ 0. Suppose that a function f : Jn → Jn is an injection. Prove
that it must also be a surjection.
(2) Let m, n ≥ 0. Prove that the map f : Jm × Jn → J_{mn} given by f(a, b) =
na + b is a bijection. Do this both by proving injectivity and surjectivity, and
also by giving an explicit inverse.
(3) Let n ≥ 0. Prove that the map f : ℘(Jn) → J_{2^n} given by f(A) = Σ_{a∈A} 2^a
is a bijection.
(4) Let A, B, C, D be nonempty sets, and let f : A → B and g : C → D be
two maps. Consider the product map f × g : A × C → B × D given by
(f × g)(a, c) = (f(a), g(c)). Prove that f and g are injections iff f × g is
an injection. Similarly for surjection and bijection.
(5) Let f : A → B be a map. Define in a natural way a map ℘(f) : ℘(A) →
℘(B). Prove that f is an injection iff ℘(f) is an injection, and similarly
for surjection and bijection.
(6) Prove that the power set of a finite set is necessarily finite.
(7) Prove that if X is a finite set and f : X → Y is a surjection, then Y is
also a finite set, and |Y| ≤ |X|. This was sketched in class; please fill in
the details.
(8) Prove that if X is a finite set, and Y ⊆ X but Y ≠ X, then |Y| < |X|.
(9) Proposition 4.10
(10) Let A, B be sets. Then there is a set A × B and two maps pA : A × B → A
and pB : A × B → B with the following (universal) property: If C is a
set, and fA : C → A, fB : C → B are maps, then there is a unique map
f : C → A × B so that pA ∘ f = fA and pB ∘ f = fB.
(11) Let A, B be sets. Then there is a set A ⊔ B and two maps iA : A → A ⊔ B
and iB : B → A ⊔ B with the following (universal) property: If C is a
set, and fA : A → C, fB : B → C are maps, then there is a unique map
f : A ⊔ B → C so that f ∘ iA = fA and f ∘ iB = fB.
(12) Given a natural number n, write nZ for the set of integer multiples of n.
Describe the intersection ⋂_{n∈N} nZ.
(13) Given a natural number n, write (1/n)Z ⊆ Q for the set of integer multiples
of the fraction 1/n. Describe the union ⋃_{e∈N} (1/2^e)Z. What do these numbers
look like in binary?
(14) Describe the sets ⋂_{n∈N} ⋃_{r∈Q} (r − 1/n, r + 1/n) and ⋃_{r∈Q} ⋂_{n∈N} (r − 1/n, r + 1/n).
Here (a, b) denotes the set of real numbers x with a < x < b.
(15) Show carefully using the definitions in class that the coproduct X ⊔ X is
equal to X × J2.
(16) Give an explicit bijection from the coproduct Jm ⊔ Jn to J_{m+n}.
(17) Prove carefully that the coproduct of two finite sets is finite, assuming the
previous exercise.
(18) Draw a picture of the coproduct ⊔_{n∈N} nZ as a subset of Z × N.
(19) Prove that if X and Y are sets, then the coproducts X ⊔ Y and Y ⊔ X
are equipotent.
Let B be the set of sequences, where each term is either 0 or 1. For
instance,
(0, 1, 1, 0, 0, 0, 1, 1, 0, 1, . . .) ∈ B.
(a) Prove that B is equipotent to B × B.
(b) Find a surjection from B to [0, 1] ⊆ R.

2. Countable Sets

Definition. We say that a set X is denumerable provided that it is
equipotent to N. We say that X is countable provided that it is either
finite or denumerable. We say that X is uncountable otherwise.

Definition. If X is denumerable we write |X| = ℵ0 and say that X
has cardinality aleph-naught.

The following sets are denumerable:

{2, 3, 4, . . .}
{2, 4, 6, 8, . . .}.

Z.

Can you find bijections to N?

Note that if a set X is denumerable, this means that the elements of X can be
expressed as a sequence {x1 , x2 , . . .} of distinct elements.
Proposition 4.17. Let n ≥ 1. Then the product N × Jn is denumerable.

Proof. A bijection f : N × Jn → N is given by f(m, r) = mn + r. Its inverse
g : N → N × Jn is given by
g(M) = ((M − r)/n, r),
where r is the remainder of M upon division by n. □
Corollary 4.18. The product of a denumerable set and a nonempty finite set is
denumerable.
Proposition 4.19. Z is denumerable.

Proof. A bijection f : N × J2 → Z is defined by the rule
f(n, 0) = −n − 1, f(n, 1) = n.
What is the inverse? □
Proposition 4.20. N × N is denumerable.

Proof. A bijection from N × N onto the set of positive integers is given by f(a, b) = 2^a(2b + 1). Its inverse
is given by
g(n) = (ord2(n), (n · 2^{−ord2(n)} − 1)/2).
Since the set of positive integers is equipotent to N (via k ↦ k − 1), it follows that N × N is denumerable.
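The following Python sketch checks the pairing and its inverse on a small range (illustrative only); since 2^a(2b + 1) runs through the positive integers, the code shifts by one so that the values land in {0, 1, 2, . . .}.

```python
def pair(a, b):
    """The map (a, b) -> 2^a (2b + 1) - 1, a bijection N x N -> N."""
    return 2 ** a * (2 * b + 1) - 1

def unpair(n):
    """Inverse: read off a = ord_2(n + 1) and recover b."""
    m, a = n + 1, 0
    while m % 2 == 0:
        m //= 2
        a += 1
    return (a, (m - 1) // 2)

assert all(unpair(pair(a, b)) == (a, b) for a in range(20) for b in range(20))
assert sorted(pair(a, b) for a in range(8) for b in range(8))[:10] == list(range(10))
```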

Corollary 4.21. The product of two denumerable sets is denumerable. The prod-
uct of two countable sets is countable.
Proposition 4.22. An infinite subset X of N is denumerable.

Proof. We define a function f : N → X recursively by putting
f(1) = min X,
and given f(1), f(2), . . . , f(n), put
f(n + 1) = min(X − {f(1), f(2), . . . , f(n)}).
It is straightforward to see that f is a bijection. Its inverse g : X → N could be
given by the prescription
g(x0) = |{x ∈ X | x ≤ x0}|.
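The recursion in this proof is easy to simulate in Python (an illustration only); here X is taken, for the sake of example, to be the set of perfect squares, and the first few values of f are produced by repeatedly taking the least element not yet listed.

```python
def enumerate_subset(contains, count):
    """List the first `count` elements of X = {n : contains(n)} in increasing order,
    mimicking f(n + 1) = min(X - {f(1), ..., f(n)})."""
    out, n = [], 0
    while len(out) < count:
        if contains(n):
            out.append(n)
        n += 1
    return out

print(enumerate_subset(lambda n: int(n ** 0.5) ** 2 == n, 8))
# [0, 1, 4, 9, 16, 25, 36, 49]
```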

Corollary 4.23. Let Y be a countable set. Then any subset of Y is countable. If
there is an injection from a set X to Y , then X is countable. If there is a surjection
from Y to X, then X is countable. The union of two countable sets is countable.

Corollary 4.24. The set Q of rational numbers is denumerable.

Proof. Since Z is denumerable, the product Z × N is also denumerable. The
function f : Z × N → Q given by f(z, n) = z/n is clearly surjective, and so Q is
countable. Since N is a subset of Q, it must be that Q is infinite. Therefore Q is
denumerable. □
Proposition 4.25. Let X and Y be sets, with Y countable. Suppose there is a
map f : X → Y whose fibres are all countable. Then X is countable.
Proposition 4.26. If I is countable, and each Xi is countable, then ⊔_{i∈I} Xi is
countable.

Proof. Since each Xi is countable, there is an injection fi : Xi → N. We may
therefore define a map
F : ⊔_{i∈I} Xi → N × I
via the prescription
F(x, i) = (fi(x), i).
Let us check that F is an injection. Suppose that (x, i) ∈ Xi × {i} and (y, j) ∈
Xj × {j}, and F(x, i) = F(y, j). Then (fi(x), i) = (fj(y), j), so i = j. We get
fi(x) = fi(y), and since fi is an injection we conclude that x = y.

Since there is an injection of ⊔_{i∈I} Xi into the countable set N × I, we conclude that
⊔_{i∈I} Xi is countable.


Corollary 4.27. If I is countable, and each Xi is countable, then ⋃_{i∈I} Xi is
countable.

Proof. The map p : ⊔_{i∈I} Xi → ⋃_{i∈I} Xi defined by p(xi, i) = xi is a surjection.
Since the domain is countable, the codomain must be as well. □

Example: The set Z[x] is denumerable: Let Z[x]d be the set of polynomials of
degree at most d. It is clearly equipotent to Z^{d+1} and hence denumerable. Now

Z[x] = ⋃_{d=0}^{∞} Z[x]d,

and is therefore denumerable.

To appeal to the corollary, one often says a countable union of countable sets is
countable.

3. Uncountable Sets

3.1. Uncountability of R

Theorem 4.28. R is uncountable.



Proof. (Cantor's Original Proof) Suppose it were countable. Then we could
express R as a sequence {x1, x2, . . .}. Consider the decimal representation of these
numbers. Form a new real number x by making its integer part 0, and for its nth
decimal place, look at the nth decimal place of xn and change it to the next larger
digit. We claim that this number x we have just formed is not in that
sequence. It's not equal to xn because their nth decimal places differ. This is a
contradiction. □

Cantor mailed his proof to another mathematician, Dedekind. Dedekind pointed
out a subtle flaw in his argument: Some numbers have more than one decimal
representation, like 0.99999 . . . and 1.000 . . .. So it's possible that the number x
that's formed in the proof really is equal to some xn, even though the digits are
different. This flaw is easily remedied. It suffices to ensure that none of the digits
of x are 0 or 9.

Proof. (Cantor's Fixed Proof) Suppose it were countable. Then we could
express R as a sequence {x1, x2, . . .}. Consider the decimal representation of these
numbers. Form a new real number x by making its integer part 0, and for its
nth decimal place, look at the nth decimal place of xn and change it to the next
larger digit. Except, if the nth decimal place of xn is 8 or 9, we change
it to 1 instead. We claim that this number x we have just formed is not in that
sequence. It's not equal to xn because their nth decimal places differ. This is a
contradiction. □
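Here is the diagonal construction as a Python sketch (illustrative only): given any list of decimal expansions, it produces digits that differ from the nth expansion in the nth place, avoiding 0 and 9 as in the fixed proof.

```python
def diagonal(digit_rows, length):
    """digit_rows[n][k] is the k-th decimal digit of x_{n+1}; return the first
    `length` digits of a number x differing from every listed x_n."""
    new_digits = []
    for n in range(length):
        d = digit_rows[n][n]
        new_digits.append(1 if d in (8, 9) else d + 1)   # never 0 or 9
    return new_digits

rows = [[(i * j + 3) % 10 for j in range(10)] for i in range(10)]  # arbitrary sample list
x = diagonal(rows, 10)
print(x)
print(all(x[n] != rows[n][n] for n in range(10)))   # True: differs on the diagonal
```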

3.2. Uncountability of ℘(N)

Theorem 4.29. Let X be a set. There is no surjection f : X → ℘(X).

Proof. Suppose there does exist such a surjection. Let
R = {x ∈ X | x ∉ f(x)}.
Since f is surjective, ∃r ∈ X so that f(r) = R. Here is the Question: Is r ∈ R?

If yes, then r ∉ f(r) = R, a contradiction.

If no, then r ∈ f(r) = R, a contradiction.

Thus anyway we have a contradiction, and we conclude that there does not exist
such a surjection. □
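For a finite set the same diagonal trick can be checked exhaustively in Python (a toy illustration): for every map f : X → ℘(X), the set R = {x : x ∉ f(x)} is never in the image.

```python
from itertools import product

X = [0, 1, 2]
subsets = [frozenset(s) for s in
           [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]]

def misses_R(f_values):
    f = dict(zip(X, f_values))
    R = frozenset(x for x in X if x not in f[x])
    return R not in f_values          # R is not a value of f

print(all(misses_R(fv) for fv in product(subsets, repeat=3)))   # True for all 8^3 maps
```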

Corollary 4.30. The set of subsets of N is uncountable.

Theorem 4.29 implies that there is no bijection from any set to its power set. In
some sense it means that, even if X is infinite, ℘(X) is still bigger.

Consider the following statement:

If X is an uncountable subset of R, then X is equipotent to R.



This statement is called The Continuum Hypothesis. It basically asks if there is
a set bigger than N but smaller than R. It is a very interesting question whether
the Continuum Hypothesis is true or false. A reasonable person might say it was
proved by the logician Kurt Gödel, but logicians would say that he and later Paul
Cohen merely resolved the situation. More specifically, they proved that it is
independent of the axioms of set theory. We will not say more here, but refer
the reader to [3].

3.3. Existence of Transcendental Numbers

I would like now to give an application to the theory of algebraic numbers.

Definition. Let α ∈ C. We say that α is an algebraic number, provided that
there is a nonzero polynomial p(x) ∈ Z[x] so that p(α) = 0. We say that α is
transcendental otherwise. Write Q̄ for the set of algebraic numbers.
Lemma 4.31. Let p ∈ Z[x] be a nonzero polynomial. Then the set
Z(p) = {α ∈ C | p(α) = 0}
is finite.

This follows from Theorem 7.34; please accept it for now to get to our application.
Proposition 4.32. The set of algebraic numbers is countable.

Proof. Let I = Z[x] − {0}. Then
Q̄ = ⋃_{p∈I} Z(p)
is a denumerable union of finite sets, and is therefore countable. □
Corollary 4.33. There exist transcendental numbers.

Proof. Since R is uncountable, it cannot be that R ⊆ Q̄. □

3.4. Exercises

(1) Express the set [0, 1] in R as a countable intersection of open intervals in
R.
(2) Which of the following sets are countable? Explain.
(a) Z³, the set of triplets of integers.
(b) (0, 1), the set of real numbers between 0 and 1.
(c) The set of real numbers with terminating decimal expansions.
(d) The set of square roots of positive rational numbers.
(e) The set P of polynomials whose coefficients are all 0's and 1's. For
example x^67 + x^4 + x, x^3 + 1 ∈ P.
(f) The set of all sequences, where every term is either 1 or −1. (This
set arises in probability theory, modeling coin flips.)
(g) The set of isosceles right triangles in R².
(h) The set of sequences (a0, a1, a2, . . .) of real numbers which satisfy
a_{n+2} = a_n + a_{n+1} for all n ≥ 0.
(i) The set of solutions to the differential equation f′ = f.
(j) The set of square roots of natural numbers (positive and negative).
(k) S¹ = {(x, y) ∈ R² | x² + y² = 1}.
(l) The set of integers which are congruent to 2 modulo 3.
(m) The set of real numbers with nonrepeating decimal expansion.
(n) The set of sequences of natural numbers.
(o) The set of all sequences of natural numbers which are eventually
constant.
(p) The set of all bounded sequences of natural numbers.
(q) The set of finite subsets of N.
(r) The set of rational functions (with integer coefficients), i.e. those of
the form p(x)/q(x) with p, q ∈ Z[x] and q ≠ 0. For instance
(x^2 + 7x − 4)/(x^11 − 2) is a rational function.
(3) Give an example of a function f : (0, 1) → [0, 1] which is bijective.
(4) Let X and Y be sets, with Y countable. Suppose there is a map f : X → Y
whose fibres are all countable. Prove that X is countable.
(5) Make a sketch of the Cantor set. This is the set of numbers in [0, 1] whose
base-3 decimal expansion doesn't have any 1's. (Start with the interval
[0, 1]. The middle third is the set of numbers whose first digit base 3 is
equal to 1. So erase the middle third; you are left with two intervals. The
middle thirds of these intervals are numbers whose second digit base 3 is
1. So erase those as well. Continue...) Is this a countable set?
(6) Let us say a book is a finite list of ASCII symbols, including letters,
spaces, fullstops, etc. Is the number of possible books countable?
(7) Say a real number is definable if one can specify it with a finite number
of words. For instance, "the positive square root of two" is a satisfactory
definition of √2. Is every rational number definable? Is every real number
definable?
(8) In the spirit of the previous problem, is every subset of N definable?
(9) What is the smallest positive integer not definable in fewer than twelve
words?
(10) Let X be an uncountable set, and Y ⊆ X a countable subset. Prove that
X − Y is equipotent to X.
(11) Prove that the closed interval [0, 1] is equipotent to R.
(12) Find a surjective map from ℘(N) to [0, 1] ⊆ R. What are the fibres of
your map? Prove that ℘(N) and R are equipotent. (Suggestion: use the
two problems above.)

4. Interlude on Paradoxes

We are soon going to run into some dangerous logical territory, with proofs by
contradiction on the verge of creating paradoxes in mathematics. This section is
likely to delight many of you but horrify the rest. Let's begin.

4.1. The Liar's Paradox

A great interview question, if you don't like the candidate, is "Is the answer to
this question 'No'?" If they say Yes, then they haven't answered the question
properly. If they say No, then the answer to the question was not No, so they have
not given the right answer. This paradox is a variation of the Liar's Paradox, which
is, "I am lying." Is that true or false? Again, both answers lead to a contradiction.
For us, the appropriate version is:

This statement is false.

Let us call this statement P. Thus P ⇒ ¬P and ¬P ⇒ P. Do you remember
earlier when I said: In mathematics, every statement is true or false? It seems that
this must be a statement P so that both P and ¬P are true.

This is very bad. Suppose that one has a statement P so that both P and ¬P are true.
That is, a contradiction in mathematics. Let Q be any other statement. Then by
the tautology
¬P ⇒ (P ⇒ Q),
Q must be true. Since Q was arbitrary, every possible statement is true. This
sounds like a disaster for mathematics.

How does one resolve this quandary? Well, first of all, what I actually said earlier was: In mathematics, every well-formed statement is true or false. This and
subsequent paradoxes point to the need to more rigorously define the notion of
well-formed. When one studies the subject of Logic, one takes great pains to say
exactly what is meant by this. For instance, Srivastava's book [12] starts by defining what different kinds of symbols are, what terms are made up of these symbols,
what formulas are made up of terms, etc. etc. The method is recursive, and one
does not see a way to form a self-referential statement. We will not give these details in these notes, but the resolution is essentially that self-referential statements
are not well-formed.

Here are some similar paradoxes of this nature to enjoy:

The next statement is true. The previous statement is false.

"Yields falsehood when preceded by its quotation" yields falsehood when
preceded by its quotation. (Quine's paradox)
If this statement is false, then you are a zombie. (Curry's paradox)

4.2. The Grelling-Nelson Paradox

Next is a linguistic paradox. I do this because I don't want you to go thinking that
paradoxes are entirely the fault of mathematicians...

Definition. An adjective is called autological if it describes itself. It
is called heterological if it does not describe itself.

For example, the word "noun" is a noun. So the word "noun" is autological. The
word "verb" is not itself a verb, so the word "verb" is heterological.

Here are some autological words: pentasyllabic, english, awkwardnessful, cutesy,
erudite.

Here are some heterological words: bisyllabic, incomplete, tree, red, long.

(Don't take this too seriously. Of course with many adjectives it is a grey area
whether they are one or the other.)

So here is the question: Is "heterological" a heterological word? Think about
it... Well, if it is, then "heterological" doesn't describe itself. Which means that it
must be autological. Which means that it must describe itself. Contradiction!
But if it isn't, then it does describe itself. Which means that it must be heterological.
Contradiction either way!

Therefore there is no correct yes or no answer to the question. Too bad for
linguistics.

Berry's paradox is both mathematical and linguistic:

"Let n be the smallest natural number not definable in fewer than twelve words."
But we just defined it with eleven words!

4.3. Russell's Paradox

Let S denote the set of all sets, sometimes called the Universal Set. Remember
that the empty set ∅ is the set so that ∀x, x ∉ ∅. Well, S is the set so that ∀x,
x ∈ S. Pretty simple to understand, right? One curious thing you'll notice about
S is that it is an element of itself, meaning S ∈ S. Can you think of other sets like
that? We have the set of infinite sets, say, I = {A ∈ S | A is infinite}; certainly I
is infinite, so I ∈ I? Does the set of abstract thought qualify?

Let R = {A ∈ S | A ∉ A}. This is the set of sets which are not members of
themselves. For instance N ∈ R, since N ∉ N. (Of course N ⊆ N, but that is a
different thing.)

Here is the question: Is R ∈ R?

Well, let's look at both cases. If R ∈ R, then R ∉ R. On the other hand, if R ∉ R,
then R ∈ R... Both ways lead to a contradiction!

Here's a way to think about it: S is not well-formed, because if it is itself to be a set,
then its definition is again self-referential. So not everything you can name qualifies
as a set. This paradox is very important historically; it called for a profound
reexamination of what qualifies as a set. Since sets are the foundation for all of
modern mathematics, many logicians worked very hard to articulate exactly what
should or what shouldn't be a set. In fact, we now no longer allow any sets to be
members of themselves. So there is no "set of all sets," or "set of all infinite sets."
We don't even have a "set of all finite sets," or indeed a "set of all singletons." I'll
show you soon how that last one leads to a paradox.

Most mainstream mathematicians accept the Zermelo-Fraenkel Axioms to describe


what should and shouldn't be a set, just like the Peano Axioms told us what N
should be. If one follows this theory, then one starts with the empty set, and
then allows taking of power sets, product sets, and subsets defined by well-formed
conditions. At no point does one reach anything like the set of all sets. We don't
believe that these axioms can lead to a contradiction, but indeed no one has proved
that they don't!

4.4. The Singleton Paradox

Ready for more? Here's a hair-raising one, that destroyed a monumental work of
Frege, one of the fathers of logic. Frege had a theory of number that came before
Peano. He had a wild idea for defining numbers. He said that the number n
should be the set of all sets X with |X| = n. For instance 1 should be the set
of all singletons. (By singleton I mean a set with only one element.) When the
set-theory paradoxes appeared, one of them undermined his opus Grundgesetze
der Arithmetik. We will describe this now.

Let S be the set of all singletons. Observe that every set X injects into S, because
one can define f : X → S to be f(x) = {x}. In particular, if we put X = ℘(S) we
obtain an injection f : ℘(S) → S.

Exercise: Let A, B be nonempty sets, and f : A → B an injection. Prove that f
has a left inverse g : B → A so that g ∘ f = id_A. In particular, g is a surjection.

By the exercise, there is a surjection g : S → ℘(S), which contradicts the theorem
above.

The resolution of this paradox is that there is no set of all singletons! This kind of
paradox is quite alarming, because there's nothing obviously self-referential in the
formation of S.

Exercise: Let n ∈ N be any number. Show that the notion of "the set of all sets with
n elements" leads to a paradox.

4.5. Unadoxes: Just for Fun

Most paradoxes of the sort mentioned above have a corresponding unadox. Recall
that a paradox is a statement which, when assigned either truth value, gives a
contradiction. A unadox is a statement which, when assigned either truth value,
does not give a contradiction. Each unadox could be true or false; there is no way
to tell, but neither true nor false would give any contradiction. It is associated to
the paradox by changing a key "false" somewhere (maybe implicit) to a "true."

Example: The Liar's paradox is: "This statement is false." The corresponding
unadox is: "This statement is true." If you declare that it is true, that is consistent
because the statement is then true. If you declare that it is false, that is consistent
because the statement is then false.

Do you get the idea? Can you find what the corresponding unadoxes for the following paradoxes should be?

Is the answer to this question "No"?

The next statement is true. The previous statement is false.
"Heterological" is a heterological word. (Grelling-Nelson paradox)
Is the set of all sets which are not members of themselves a member of
itself? (Russell's paradox)
"Yields falsehood when preceded by its quotation" yields falsehood when
preceded by its quotation. (Quine's paradox)
If this statement is false, then you are a zombie. (Curry's paradox)

5. Some History

1800 BCE Babylonian tablet with Pythagorean triples

600 BCE A Cretan says, "The Cretans are always liars"
529 BCE Pythagoreans prove that there are irrational quantities
285 BCE Publication of Euclid's Elements
400-500 Chinese Remainder Theorem
415 Death of Hypatia & of Classical Period
458 Place Value System in India
820 Al-Khwarizmi develops algebra
1350-1425 Madhava: infinite series
1673 Leibniz coins the word "function"
1687 Newton's Principia Mathematica
1834-7 Dirichlet and Lobachevsky clarify the definition of function
1873 Cantor's Theory of Sets and Cardinality
1889 Peano's Axioms
1902 Russell's Paradox
1908-1922 Zermelo-Fraenkel's Axioms of Set Theory
1931 Gödel's Incompleteness Theorem

(See page 26 of Krantz's book [6] for much more.)

Sometimes we have precise dates, sometimes that information is lost...

6. Chapter 4 Wrap-up

6.1. Rubric for Chapter 4

6.2. Toughies for Chapter 4

(1) Let m, n ≥ 0 be whole numbers. Prove the following:

(a) There is a linear injection from R^m to R^n iff m ≤ n.
(b) There is a linear surjection from R^m to R^n iff m ≥ n.
(c) There is a linear isomorphism from R^m to R^n iff m = n.
(d) A linear map f : R^n → R^n is an injection iff it is a surjection.
(Note: R^0 = {0}.)

(2) Is there a countable group, with uncountably many subgroups?

(3) For the following sets, which are denumerable? Which are equipotent to
R? Which are equipotent to ℘(R)?
(a) ℘(N)
(b) Seq(N) [Hint: Learn what a continued fraction of a real number
is.]
(c) R²
(d) The set of bijections from N to N.
(e) The set of open subsets of the real line.
CHAPTER 5

Equivalence


1. Equivalence Relations

There are some basic axioms of equality which we use all the time, usually without
thinking. No matter what elements a, b, c we have of some set, we always have:

a = a,
a = b ⇒ b = a,
((a = b) ∧ (b = c)) ⇒ (a = c).

In other words, the equality relation (which we called ∆_X earlier) is reflexive,


symmetric, and transitive. Often in life and in mathematics, we want to formulate
some notion of equivalence for various things, meaning that we want to treat
things as the same if they satisfy some criterion. The world is a big place, and
simplification makes our lives easier. For instance, we might consider triangles
equivalent if they have the same angles. This relation is called similarity. Or we
might consider triangles equivalent if they have the same side lengths, even though
their position in space may differ. That relation is called congruence, and it is a
stronger condition than similarity. When should a relation count as an equivalence?

Definition. Let R be a relation on a set X. Then R is an equivalence


relation provided that it is reflexive, symmetric, and transitive.

Similarity and Congruence of triangles are clearly equivalence relations. Let's do


an example where we have to work to prove it.

Definition. Let n ∈ Z. We say that two integers a, b ∈ Z are congruent
mod n provided that n | (a − b).

Proposition 5.1. Congruence mod n is an equivalence relation.

Proof. For reflexivity, we note that n | (a − a) for any a ∈ Z. For symmetry,
let a, b ∈ Z with n | (a − b). Then since b − a = (−1)(a − b), we also have n | (b − a).
This proves symmetry. For transitivity, let a, b, c ∈ Z with (n | (a − b)) ∧ (n | (b − c)).
Then certainly n | (a − b + b − c); that is n | (a − c). This proves transitivity. □

Note that if n = 0, then equivalence mod n is the same as equality. If n = 1, then


any a, b are equivalent mod n.

1.1. Partitions

How would a graph of vertices and edges look if it gives an equivalence relation on
the vertices? Since it is reflexive, there will be a loop at every vertex. Since it is
symmetric, every edge will be simple. Transitivity is the interesting property. If
there is a (simple) edge connecting v1 to v2 and another connecting v2 to v3 , then
there must be a third edge connecting v1 to v3 .

[Picture]

Try drawing graphs with this transitivity property. You'll quickly find that your
graphs are all disjoint unions of complete graphs. That is, no two of them intersect.
Here is an example of such:

[Picture]

This graph on the vertices {a, b, c, d, e, f, g, h, i, j, k} is a disjoint union of three


complete graphs.

Definition. Let X1, X2 be subsets of a set X. We say that X1 and X2
are disjoint provided that X1 ∩ X2 = ∅.

These complete graphs illustrate what are called equivalence classes.

Definition. Let X be a set with an equivalence relation ∼, and x ∈ X.
The equivalence class of x is the set of elements equivalent to x, i.e.,
{y ∈ X | y ∼ x}. It is written [x].

The triangle in the graph above can be expressed as [a] or [b] or [c]. And so [a] =
[b] = [c]. The entire graph is the union X = [a] ∪ [d] ∪ [f] of three equivalence classes.
This union can also be expressed in other ways, for instance: X = [b] ∪ [e] ∪ [f] of
course.

These three equivalence classes form what is called a partition of X.

Definition. Let X be a set. A partition of X is a family of pairwise
disjoint subsets of X whose union is X. These sets are called parts.

We will prove that if X is a set with an equivalence relation, then the set of
equivalence classes forms a partition of X. As another example take X = Z and the
mod 2 equivalence relation above. The corresponding partition of Z is

Z = {odd numbers} ∪ {even numbers}.

The set of odd numbers is the equivalence class containing 1. It is also the equivalence class containing 13. In modular arithmetic we usually write \bar{x} for [x]. So
in mod 2 equivalence \bar{0} denotes the set of even numbers, and \bar{1} the set of odd
numbers. But it is also true that \bar{4} is the set of even numbers, since any number
equivalent to 4 is even. Thus \bar{0} = \bar{4}. Similarly, \overline{13} = \bar{1}.

Remark: When using this symbolism, it is important that the mod 2 is understood.
Context lets you know that this is not mod 3, for instance.

Proposition 5.2. Let X be a set with an equivalence relation . Then the equiv-
alence classes [x] form a partition of X.

Proof. Reflexivity says that x ∈ [x], which shows in particular that X =
⋃_{x∈X} [x]. Symmetry says that if x ∈ [y], then y ∈ [x]. Suppose this is so. Then if
z is something else in [x], by transitivity, z ∈ [y]. Since everything in [x] is in [y],
we write [x] ⊆ [y]. The same argument starting with z ∈ [y] concludes that z ∈ [x],
thus [y] ⊆ [x]. This shows that [x] = [y]. Conclusion: if x ∈ [y], then [x] = [y].
This implies that equivalence classes don't overlap. If any element z were in the
overlap, i.e., z ∈ [x] and z ∈ [y], then [z] = [x] = [y]. Thus they are in fact the
same class. □

Conversely, if you have a partition of X, you can define an equivalence relation ∼ on
it by saying x ∼ y if x and y are in the same part. Then the equivalence classes for
∼ will be the parts of the original partition of X.
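Here is a small Python illustration of Proposition 5.2 (mine, not the text's): starting from congruence mod 3 on a finite set of integers, the equivalence classes are pairwise disjoint and cover the set.

```python
X = list(range(-5, 6))
equiv = lambda a, b: (a - b) % 3 == 0        # congruence mod 3

classes = []
for x in X:
    cls = frozenset(y for y in X if equiv(x, y))
    if cls not in classes:
        classes.append(cls)

print(classes)                                 # three classes
print(set().union(*classes) == set(X))         # their union is all of X
print(all(a == b or a.isdisjoint(b) for a in classes for b in classes))  # pairwise disjoint
```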

1.2. Quotient Set

Let's say you have a set with an equivalence relation. If you really think of equivalent things as being the same, in the sense that you replace x ∼ y with x = y,
then you are shrinking the set X down to what is called the quotient set.

Definition. Let X be a set and R an equivalence relation on X. We


write X/R for the set of equivalence classes for R.

The set X/R is the quotient of X by R.

In other words, X/R = {[x] | x ∈ X}.

Example: Consider Z with the mod 2 equivalence relation. Then Z/R = {[0], [1]}.
Traditionally, we write Z/2Z for Z/R and so the previous sentence could also be
written as Z/2Z = {\bar{0}, \bar{1}}.

Example: More generally, let n ∈ N and consider congruence mod n. Then
Z/R = Z/nZ = {\bar{0}, \bar{1}, . . . , \overline{n − 1}}. (These are the possible remainders upon division
by n.)

Example: Let I = [0, 1], the closed interval. Let me describe a partition of I with
infinitely many parts. If 0 < x < 1, then the singleton {x} will be a part. (A
singleton is a set with exactly one element.) Other than these singletons, I declare
{0, 1} to also be a part. This describes a partition. The equivalence relation
corresponding to this partition is x ∼ y provided that (x = y) ∨ (x = 0 ∧ y =
1) ∨ (x = 1 ∧ y = 0). The quotient set I/∼ naturally forms a circle. The idea is
that you can start at [0], move to the right through all the x ∈ (0, 1), and then end
up at [1]. But [1] = [0], so you have wound up where you started from. Just like
on a circle.

Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as
follows: If 0 < x < 1 and y ∈ [0, 1], then the singleton {(x, y)} will be a part. Other
than these, declare {(0, y), (1, y)} to be a part for every y ∈ [0, 1]. The equivalence
relation corresponding to this partition really just says that we call the two vertical
sides of the square equivalent. The quotient set X/∼ naturally forms a cylinder.
Can you see how?

Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows:
If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than
these, declare {(0, y), (1, y)} to be a part for every y ∈ (0, 1), declare {(x, 0), (x, 1)}
to be a part for every x ∈ (0, 1), and declare {(0, 0), (1, 0), (0, 1), (1, 1)} to be a part.
The equivalence relation corresponding to this partition really just says that we call
the two vertical sides of the square equivalent, and also the two horizontal sides of
the square equivalent. The quotient set X/∼ naturally forms a torus. Can you
see how?

Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows:
If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than
these, declare the union of all four sides of the square as one part. Thus, we shrink
the entire boundary down to one point in the quotient set. In fact, the quotient
set X/∼ naturally forms a sphere. Can you see how?

How can you make a Möbius strip?

Example: Let S be the set of sequences of decimal digits s = (d1, d2, d3, . . .) with
each di a digit in base ten. Consider the following partition of S: If s neither
ends in repeating nines nor repeating zeros, then the singleton {s} will be a part.

Other than these, declare a pair of sequences of the form
{(d1, d2, . . . , dn, 9, 9, 9, . . .), (d1, d2, . . . , dn + 1, 0, 0, 0, . . .)}
to be a part if dn ≠ 9. (Actually we will need one more singleton {(9, 9, 9, . . .)}.)
Thus we are really considering these sequences to be equal. The quotient set of S
by this equivalence relation naturally forms the interval [0, 1]. This can be used to
construct the set of real numbers.

Example: Let R[x] be the set of polynomials with real coefficients. Say that
polynomials f, g are equivalent provided that x² + 1 divides f(x) − g(x). Believe it
or not, the quotient set is the definition of the complex numbers C. Why don't you
ponder that for a while?
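To make the last example concrete, here is a Python sketch (an illustration, not a construction the text carries out): a polynomial's class mod x² + 1 is determined by its remainder b + a·x, and multiplying remainders reproduces the familiar rule of complex multiplication.

```python
def mult_mod(p, q):
    """Multiply the classes of b + a*x and d + c*x modulo x^2 + 1.
    Classes are written as pairs (b, a) standing for b + a*x."""
    (b, a), (d, c) = p, q
    # (b + a x)(d + c x) = bd + (bc + ad) x + ac x^2, and x^2 is equivalent to -1
    return (b * d - a * c, b * c + a * d)

print(mult_mod((0, 1), (0, 1)))     # (-1, 0): the class of x squares to -1
print(mult_mod((3, 2), (1, 5)))     # (-7, 17)
print((3 + 2j) * (1 + 5j))          # (-7+17j): matches complex multiplication
```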

1.3. Exercises

(1) Let n ∈ N. Prove that two numbers a, b ∈ N are equivalent mod n iff they
have the same remainder upon division by n.
(2) Let X be the set of functions from R to itself. Let g ∈ X, and write O(g)
for the set of functions f ∈ X so that
lim_{x→∞} f(x)/g(x) exists and is finite.

Consider the relation f ∼ g provided that f ∈ O(g). Is this an
equivalence relation? Explain. Now fix a function h ∈ X (for example, x²
or eˣ). Consider the relation f ∼ g if f − g ∈ O(h). Is this an equivalence
relation? Explain.

2. The Positive Rationals Q+

We are about to make our first great leap in mathematical thought: the construction
of (positive) rational numbers. One defect of the natural numbers is that one can solve
some division problems but not others. For example the theory does not include
any meaning for dividing 1 by 2. When we buy apples in a grocery store this doesn't
cause any problem because we only need to add and occasionally multiply them.
But when we want to share them with a friend or make muffins we may need to
speak of fractional parts of apples. Now if all our recipes were written in terms of
eighths of apples, for instance, we could do the following. We could write 1 for
an eighth of an apple, 8 for a full apple, and multiply all our previous tallies by
8. This is unpleasant for several reasons, aesthetic and practical. And if I wanted
to distribute an apple amongst a set of quintuplets I would be at a loss. In other
words, we would like a logical system in which we can add, multiply and divide
natural numbers.

2.1. Ratios

The solution to our problem is roughly like this: We form the set of division prob-
lems, decide which of them should be equivalent, and then mod out by this
equivalence relation. We would like to do arithmetic, so we will need to spend some
care defining addition and multiplication on these problems.

We will denote by R⁺ the set of positive ratios (a : b), where a and b are natural
numbers. Strictly speaking, a ratio is just an ordered pair of numbers, but you
should think of them as division problems. So (a : b) is the division problem of a by
b. Consider the ratios (2 : 1) and (4 : 2). Strictly speaking they ask two different
questions, although they both should have the same answer 2. We call two ratios
proportional if they should have the same answer.

Definition. Ratios (x : y) and (a : b) are proportional if x · b = a · y.
Write (x : y) ∼ (a : b) if they are proportional.

Although we have not defined fractions yet, you may intuitively think of the ratio
(x : y) as getting at the fraction x/y; this will help explain some of the formulas. For
example, (x : y) ∼ (a : b) if x/y = a/b.
Example: (2 : 6) ∼ (3 : 9); these ratios are proportional but not equal. Just to be
clear, two ratios (a : b) and (c : d) are equal only if a = c and b = d.

You should check that proportionality is an equivalence relation on R⁺.

Proposition 5.3. Proportionality is an equivalence relation on R+ .

Proof. Reflexivity: Since x · y = y · x, we have (x : y) ∼ (x : y).

Symmetry: Suppose that (x : y) ∼ (a : b). Then x · b = a · y, thus b · x = y · a and
therefore (a : b) ∼ (x : y). We will henceforth not be so careful about the order of
multiplication.

Transitivity: Suppose that (x : y) ∼ (a : b) and (a : b) ∼ (c : d). Then we have the
two equations
xb = ay, ad = bc.
Multiplying the first equation by c gives xbc = ayc. Using the second equation
gives xad = ayc. Using the cancellation law for multiplication gives xd = yc. This
implies that (x : y) ∼ (c : d). □

Now that we have an equivalence relation we can start thinking about equivalence
classes, such as [(1 : 1)]. In fact (a : b) ∈ [(1 : 1)] exactly when a = b.

[Explain what's bad about a/b + c/d = (a + c)/(b + d).]

We now present the proportion-invariant operations of addition and multiplication
on R⁺/∼.

Addition is defined via (a : b) + (c : d) = (ad + bc : bd) and multiplication via
(a : b) · (c : d) = (ac : bd). Check that these operations are ∼-invariant. Thus
they give operations on R⁺/∼.

Definition. The set R⁺/∼, with the above addition and multiplication
laws, is called Q⁺. We write a/b for the equivalence class [(a : b)].

Thus we have the familiar rules

a/b + c/d = (ad + bc)/(bd) and (a/b) · (c/d) = (ac)/(bd).
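Here is a Python sketch (illustrative only) of ratios, proportionality, and the two operations; it checks on a sample that the operations are ∼-invariant, which is the key point in making them well defined on Q⁺.

```python
def proportional(r, s):
    (x, y), (a, b) = r, s
    return x * b == a * y

def add(r, s):
    (a, b), (c, d) = r, s
    return (a * d + b * c, b * d)

def mul(r, s):
    (a, b), (c, d) = r, s
    return (a * c, b * d)

r1, r2 = (1, 2), (3, 6)      # proportional ratios
s = (2, 3)
print(proportional(r1, r2))                       # True
print(proportional(add(r1, s), add(r2, s)))       # True: + respects proportionality here
print(proportional(mul(r1, s), mul(r2, s)))       # True: * respects proportionality here
```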

One should check that Associativity, Commutativity, and Distributivity of Addition


and Multiplication are satisfied. They follow from the corresponding properties of
N.

[include proof of associativity.]

2.2. Relationship with N

We would like to identify certain fractions as being integers. Here is how one
formally makes this identification.

Define a function ι : N → Q⁺ via ι(n) = n/1. (We want to identify n with n/1.) First
note that if ι(m) = ι(n), then m = n, so we don't lose any information by applying
ι. Next, check that ι(m + n) = ι(m) + ι(n) and ι(mn) = ι(m) · ι(n). Thus the
identification preserves addition and multiplication. Another way of viewing this
is to say that the addition and multiplication on Q⁺ extend the addition and
multiplication on N. In summary, via ι, we may view Q⁺ as an extension of N.

Note, for instance, that a/b ∈ N exactly when b|a.

[Division.]

Definition. Let p, q ∈ N. A ratio (p : q) is called reduced if p and q
are relatively prime.

Proposition 5.4. Every ratio is proportional to a reduced ratio.

Proof. Let a, b ∈ N. Let d = gcd(a, b). It follows from Exercise 9 in Section
9.3 that gcd(a/d, b/d) = 1. It is easy to see that (a : b) ∼ (a/d : b/d). □

2.3. Exercises

For the first four exercises, use the equivalence class definitions.

(1) Check that the operations of addition and multiplication above are indeed
∼-invariant.
(2) In this exercise we will use the first quadrant of the Cartesian plane to
plot sets of ratios. To a ratio (a : b) associate the point (a, b).
(a) Plot the set of ratios equivalent to the ratio (1 : 2). It should be an
infinite sequence of collinear points, and determines a line. What is
its slope?
(b) Plot the set of ratios which are of the form (1 : 2) · (a : b), with
a, b ∈ N.
(c) Plot the set of ratios which are of the form (2 : 4) · (a : b), with
a, b ∈ N.
(d) Plot the set of ratios which are of the form (1 : 2) + (a : b), with
a, b ∈ N.
(3) Prove that if two ratios in R⁺ are proportional and reduced, then they
must be equal. (Note we are only using positive numbers.)
(4) Does the distributive law hold in Q⁺? Does it hold in the set R⁺ of
positive ratios? Give proofs or counterexamples.

The rest of the exercises do not focus on the equivalence class definitions.

(5) (Improper Fractions) Let m, n ∈ N. Prove that there are numbers a, b ∈ N
with b < n so that
m/n = a + b/n.
(6) Let p be a prime number, and a, n ∈ N. Prove that there are whole
numbers b, a1, . . . , an with
a/p^n = b + Σ_{i=1}^{n} a_i/p^i,
and each ai < p.
(7) Suppose a, b1, b2 ∈ Q⁺ satisfy a < b1 + b2. Prove that there are a1, a2 ∈ Q⁺
so that a = a1 + a2, a1 < b1, and a2 < b2.
(8) Suppose a, b1, b2 ∈ Q⁺ satisfy a < b1·b2. Prove that there are a1, a2 ∈ Q⁺
so that a = a1·a2, a1 < b1, and a2 < b2. (See Exercise 13 in Section 9.7.)
CHAPTER 6

Rings


1. Abstract Algebra

You probably learned a lot of algebra in school. You learned how to solve equations
like ax + b = c, and quadratic equations, you learned how to combine terms, that
you should add exponents when multiplying powers, you learned to solve systems
of equations, et cetera.

If you learned about imaginary and complex numbers, you didn't have to relearn
those rules or develop much more algebraic intuition. You had to learn the rule
i² = −1 and how to rationalize complex denominators, but you could still use all
the skill from before.

If you've had some linear algebra, you know that square matrices of the same size
can be treated much like numbers. They can be added, multiplied, raised to powers,
and you can often solve an equation of matrices AX = B by multiplying both sides
by A⁻¹. Again, much of what is true about the algebra of numbers is also true
for matrices. Of course, much of the training of linear algebra is to be cautious
with your intuition, since most of the time it is not true that AB = BA. You
have to distill out which of your intuition comes from commutativity and which is
independent of it. But you still want that overall algebraic intuition.

In working with modular arithmetic, you can use a great deal of the intuition from
those high school algebra days. As we saw, you can still add −b to both sides of an
equation to cancel a b. You can often divide both sides of an equation by a. If the
modulus is odd, the quadratic equation still works basically the same way.

There are tons of these different algebra systems in mathematics, and we're going to
focus for this chapter on one type of algebra system called a ring. Examples of rings
we have seen so far include Z, Q, R, and all of the Z/nZ's for n > 1. Philosophically,
knowing something is a ring means you can transfer a certain amount of algebraic
intuition to studying it. More practically, you can prove lots of results in the context
of abstract ring theory, and they will automatically be true for not only every ring
you've ever met, but every ring you'll meet in the future.

In the next section we introduce the concept of a ring, and study the abstract idea
of divisibility. In particular, when can the product of two things be 0 or 1? A field
is a special kind of ring, when everything is divisible by everything else (except 0).
We will discover ways to produce new rings from old. For instance, given a ring
R, you can talk about the ring of matrices Mn (R) whose entries are in R, and the
ring of polynomials R[x] whose coefficients are in R.

One of the greatest analogies in mathematics is that between the integers Z and the
ring of polynomials F [x] where F is a field. The most important themes of Chapter
1 carry over, in particular the Fundamental Theorem of Arithmetic, in the sense
that any nonzero polynomial factors into a product of irreducibles in essentially one
way. I hope you will find many other similarities as well.

Finally we demonstrate how to mod out in the general context of a ring. This is
an interesting way to create new rings satisfying (almost) whatever relations you

like. For instance we can force a ring to have an element x satisfying x² = −1; this
leads to the complex numbers.

2. Rings

2.1. Definition

Intuitively a ring is a place where you can add, subtract, and multiply. You can't
necessarily divide. That's what you tell your friends. Of course, you need to define
what you mean by add, subtract, and multiply, though the words are suggestive.

Definition. A ring is a set R with two operations + and · satisfying the following axioms.
(1) (Associativity of Addition) For a, b, c ∈ R, (a + b) + c = a + (b + c).
(2) (Commutativity of Addition) For a, b ∈ R, a + b = b + a.
(3) (Additive Identity) There is an element 0_R ∈ R so that for all a ∈ R, a + 0_R = a.
(4) (Additive Inverses) For a ∈ R there is an element −a ∈ R so that a + (−a) = 0_R.
(5) (Associativity of Multiplication) For a, b, c ∈ R, (a · b) · c = a · (b · c).
(6) (Multiplicative Identity) There is an element 1_R ∈ R so that for all a ∈ R, 1_R · a = a · 1_R = a.
(7) (Distributivity) For a, b, c ∈ R, a · (b + c) = a · b + a · c and (a + b) · c = a · c + b · c.
(8) (Nontriviality) 0_R ≠ 1_R.

So for instance, 0_{Z/nZ} = 0 and 0_Q = 0/1.

Definition. A ring R is said to be commutative provided that it also


satisfies
(Commutativity of Multiplication) For a, b ∈ R, a · b = b · a.

Most of our rings will be commutative; rings of matrices will be the main examples
of noncommutative rings.

The first four axioms give the fundamentals of addition and subtraction. They
imply you can always solve the x + a = b problem for x: If a, x, b ∈ R and x + a = b,
then adding −a to the right of both sides yields
(x + a) + (−a) = b + (−a)
x + (a + (−a)) = b + (−a)
x + 0_R = b + (−a)
x = b + (−a).
Above we have used associativity, the property of additive inverses, and the property
of the additive identity.

That proof reminds me to make a definition:



Definition. Let a, b ∈ R. Then a − b is defined as a + (−b).

So we have our first result in pure ring theory:


Proposition 6.1. If a, b, x ∈ R and a + x = b, then x = b − a.

I remark that we won't generally be so pedantic about the use of parentheses or
even mention that we're using associativity of addition and multiplication; we did
enough of that with the Peano Arithmetic. Also when convenient we will drop the
dot in a · b and write ab instead.

Note we have not included an axiom for a multiplicative inverse. This is intentional,
and part of what makes ring theory interesting. More on that later.

Here is another example of a proposition in pure ring theory.


Proposition 6.2. Let x ∈ R. Then 0_R · x = 0_R.

Proof. Since 0_R is the additive identity, x + 0_R = x. Multiplying this equation
by x yields x·x + 0_R·x = x·x. By the previous proposition, 0_R·x = x·x − x·x = 0_R,
as desired.  □

I am now reminded to make another definition.


Definition. Let R be a ring, a ∈ R and n ∈ N. Then

    a^n = a          if n = 1,
    a^n = a · a^m    if n = m′ (the successor of m).

The axiom of nontriviality really only excludes the trivial ring. This is because if
0_R = 1_R, then for any x ∈ R we have x = 1_R · x = 0_R · x = 0_R by the above Proposition,
so R = {0_R}. By the way, this is what we would get if we considered Z/nZ for n = 1.

Here are some more ring facts which the eager reader may enjoy proving.

(1) For a ∈ R, −(−a) = a.
(2) −a = (−1_R)·a.
(3) If b ∈ R, then ab = (−a)(−b).
(4) a^m · a^n = a^{m+n} for m, n ∈ N.
(5) If R is commutative, then (ab)^n = a^n·b^n.
(6) (a^m)^n = a^{mn}.

Remark: There are a couple of different conventions about what axioms a ring should
have. First, some authors do admit the trivial ring. Secondly, it is sometimes
interesting to study rings which don't have a multiplicative identity, like the even
integers. We will not pursue this.

2.2. Divisibility

In this section and the next R is a commutative ring.



Definition. Let a, b ∈ R. We say a divides b, or a|b, provided that there is a c ∈ R so that ac = b.

Basic properties of divisibility from Z carry over to any ring, for the same reasons.
Proposition 6.3. Let a, b, c, x, y ∈ R. Then:

If a|b and b|c, then a|c.
If a|b, then ac|bc.
If a|b, then a|bx.
If a|b and a|c, then a|bx + cy.
a|0_R.

Proof. Left to the reader. 

Definition. Let a ∈ R. Then a is a unit provided that a|1_R.

In other words, a is a unit if there is a b ∈ R so that ab = 1_R.

Definition. The element b in the above situation is called the inverse
of a, and we write b = a⁻¹.

Proposition 6.4. Let x, y, u ∈ R with u a unit. Then x|y ⟺ x|uy ⟺ xu|y.

Proof. We prove one direction, saving the others for the reader. Suppose that
x|uy. Then there is an element c ∈ R so that xc = uy. Multiply both sides by u⁻¹.
This gives x(cu⁻¹) = y, and therefore x|y.  □

Units are invaluable in solving the ax = b problem, as you may recall from our
study of this problem for modular arithmetic. If a is a unit, then the solution to
the problem is x = a⁻¹b.

The only units of Z are {1, −1}. Every nonzero element of Q and R is a unit. The
units of Z/nZ are the congruence classes a where a and n are relatively prime.
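For concreteness, the last claim can be checked by brute force. Here is a small Python sketch (an illustration only, not part of the original text; the function names are ours): it finds the units of Z/nZ directly from the definition and compares them with the classes a having gcd(a, n) = 1.

    from math import gcd

    def units_mod(n):
        """Elements a of Z/nZ that have a multiplicative inverse mod n."""
        return [a for a in range(n) if any(a * b % n == 1 for b in range(n))]

    n = 10
    print(units_mod(n))                              # [1, 3, 7, 9]
    print([a for a in range(n) if gcd(a, n) == 1])   # the same list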

Note that 1_R and −1_R are units in any ring. Since 0_R · x = 0_R for all x, 0_R is never
a unit. Sometimes it is the only nonunit of a ring, and such rings have a special
name.

Definition. A ring R is a field if every nonzero element of R is a unit.

Thus Q and R are fields. It is a good exercise to think through the following:
Proposition 6.5. The ring Z/nZ is a field if and only if n is prime.

The ring Z is not a field. The number 2 does not have an inverse in Z. It doesn't
matter that in the bigger ring Q it has an inverse.

Definition. Let R be a ring. Say that an element a R is associate


to b provided that a|b and b|a.

For example in Z, 3 and −3 are associate. Note this is an equivalence relation on
elements of R. If a ∈ R and u is a unit in R, then au is associate to a. Thus 2 and
2 · 5 = 4 in Z/6Z are associate.

2.3. Zero Divisors

Let's recall how to solve an equation like x² − 5x + 6 = 0 in algebra. The normal
thing to do is to factor it into (x − 2)(x − 3) = 0 and then argue that either x − 2 = 0
or x − 3 = 0. One then concludes that the only solutions are x = 2 or x = 3.

The most interesting step here is the one that comes after the factoring. Why,
exactly, is it that if two numbers multiply to 0 then one of them must be 0?

In the ring Z/10Z, for instance, the elements 5 and 6 multiply to 0, though neither
of them is itself 0 = 0_{Z/10Z}. And note that in Z/10Z, the congruence class 8 is
another solution to x² − 5x + 6 = 0, in addition to 2 and 3. Are there any others?...

We need to keep track of when nonzero elements can multiply to be 0R .

Definition. A nonzero element x of a ring R is a zero divisor if there


is a nonzero element y R so that xy = 0R .

Thus an element is a zero divisor if it is part of a nontrivial factorization of 0R .


Note that 0R itself is not considered a zero divisor.

Check that in Z/10Z the zero divisors are 2, 4, 5, 6, and 8.
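This check is easy to automate. The following Python sketch (an illustration, not part of the text; the helper name is ours) lists the zero divisors of Z/nZ directly from the definition.

    def zero_divisors(n):
        """Nonzero classes a in Z/nZ with a*b = 0 for some nonzero b."""
        return [a for a in range(1, n)
                if any(a * b % n == 0 for b in range(1, n))]

    print(zero_divisors(10))   # [2, 4, 5, 6, 8]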

How do we know the other elements aren't zero divisors?


Proposition 6.6. Let R be a ring, and u a unit of R. Then u is not a zero divisor
of R.

Proof. Suppose uy = 0_R. Multiplying both sides by u⁻¹ yields y = 0_R.  □

For the general ring Z/nZ, we have an easy characterization:


Proposition 6.7. Let n > 1 and 0 < a < n. Then a is a zero divisor of Z/nZ if
and only if a is not relatively prime to n.

Proof. If a is relatively prime to n then a is a unit, thus not a zero divisor by
the above. On the other hand, say d = gcd(a, n) > 1. Let e = n/d < n. Then
since a is a multiple of d, ae is a multiple of n, so a · e = 0 in Z/nZ. But neither a nor e is 0 in Z/nZ.
Thus a is a zero divisor.  □

Here is a practical way to think about zero divisors. Suppose a is not a zero
divisor, and ab = ac. Then a(b − c) = 0. This is a factorization of 0, so b − c = 0
and thus b = c. So even though we didn't use an inverse of a, since it was not a
zero divisor, we could cancel it from both sides of the equation.

Rings without zero divisors are important.



Definition. A ring is called an integral domain if it does not have


any zero divisors.

Since units and 0R are never zero divisors, any field is an integral domain. The ring
Z is an integral domain, but is not a field. So the converse does not hold.
Proposition 6.8. Let n > 1. The ring Z/nZ is an integral domain if and only if
n is prime.

The reader should think this through.

Remark: A finite ring is an integral domain if and only if it is a field. Can you
prove it?

2.4. Products of Rings

Here is a way to combine two rings.

Definition. Let R and S be rings. Write R × S for the set of pairs
{(r, s) | r ∈ R, s ∈ S}, with addition and multiplication on R × S defined
componentwise via (r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2) and (r_1, s_1) ·
(r_2, s_2) = (r_1·r_2, s_1·s_2).

For example, (Z/2Z) × (Z/2Z) has four elements which we will denote by 0 = (0, 0),
e_1 = (1, 0), e_2 = (0, 1), and 1 = (1, 1). Its addition/multiplication tables are:

[Make tables.]
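Since the tables are left to the reader here, the following Python sketch (an illustration of the componentwise definition, not part of the original text) prints them.

    from itertools import product

    elements = list(product(range(2), range(2)))   # (0,0), (0,1), (1,0), (1,1)

    def add(p, q):
        return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2)

    def mul(p, q):
        return ((p[0] * q[0]) % 2, (p[1] * q[1]) % 2)

    for name, op in (("+", add), ("*", mul)):
        print("Table for", name)
        for p in elements:
            print([op(p, q) for q in elements])

In particular, every element added to itself gives (0, 0), which is exactly the observation used in the next paragraph.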

This is a different ring than Z/4Z! You can tell, because every element of Z/2Z ×
Z/2Z added to itself is 0, whereas this is not the case for the ring Z/4Z. Later we
will study more systematically what it means for two rings to be different.

What are the units and zero divisors of Z/2Z × Z/2Z?

In fact in general R × S is a ring. [Check distributivity, as an example.] Its additive
identity is 0_{R×S} = (0_R, 0_S) and its multiplicative identity is 1_{R×S} = (1_R, 1_S). The
negative of (r, s) is (−r, −s).

In general, is R × S ever a field? An integral domain?

2.5. Rings of Functions

Here is another interesting way to derive new rings from old.

Definition. Let R be a ring, and X a set. Write F(X, R) for the set of
functions from X to R, with addition and multiplication defined point-
wise as follows. If f and g are two functions with domain X and values
in R, then f + g and f·g are defined via (f + g)(x) = f(x) + g(x) and
(f·g)(x) = f(x)g(x) for every point x of X.

F(X, R) is in fact a ring. [Check a ring property.] Its additive identity 0_{F(X,R)}
is the constant function f_0 whose value at every point of X is 0_R. In other words
f_0(x) = 0_R for all x. Its multiplicative identity 1_{F(X,R)} is the constant function f_1
whose value at every point of X is 1_R. In other words f_1(x) = 1_R for all x. The
negative of a function f is the function defined by (−f)(x) = −(f(x)).

Let's take X and R to both be the real numbers R. Then F(R, R) is just the set
of real-valued functions with domain R.

The function f(x) = x² + 1 is a unit because it has an inverse g(x) = 1/(x² + 1), so that
for all x, f(x)g(x) = 1. Thus f·g = f_1. The function f(x) = x is not a unit because
if g ∈ F(R, R) is any function, f(0)g(0) = 0, so f·g cannot be equal to f_1.

What are the units of F(R, R)?

The functions

    f(x) = 0 if x ≤ 0,    f(x) = 1 if x > 0

and

    g(x) = 1 if x ≤ 0,    g(x) = 0 if x > 0

satisfy f·g = f_0, though neither f nor g is f_0. So f and g are zero divisors.

What are the zero divisors of F(R, R)?

You should compare the rings R × R and F(X, R) if R is a ring and X is a set with
two elements.

2.6. Subrings

Checking all the ring axioms is a little tiresome. Luckily there is a way to generate
rings as subsets of other rings. For example we will say that Z is a subring of Q.
But you can't take any subset. For instance N is a subset of Z but doesn't have a
zero, or any negatives, inside it. As another example, the subset {−1, 0, 1} isn't a
subring, for the basic reason that the operation of addition from Z takes you outside
{−1, 0, 1}, so it isn't a ring in its own right. Here is the definition of a subring.

Definition. Let R be a ring, and S ⊆ R a subset of R. Then S is called
a subring of R provided that the following conditions are satisfied:
(1) If s_1, s_2 ∈ S, then s_1 + s_2 ∈ S.
(2) If s_1, s_2 ∈ S, then s_1·s_2 and s_2·s_1 ∈ S.
(3) If s ∈ S, then −s ∈ S.
(4) 1_R ∈ S.

In other words S is closed under addition, multiplication, subtraction, and contains


1.

Note that if S is a subring then 1_R ∈ S, so is −1_R, and thus 1_R + (−1_R) = 0_R ∈ S.



If S is a subring then it becomes a ring itself under the operations already defined in
R. The first two conditions just say those operations on S don't go outside of S. You
don't need to check associativity, commutativity, or distributivity because they're
true in R and in particular for elements of S.

Examples: Z is a subring of Q, which is itself a subring of R. The set C(R, R) of


continuous real-valued functions on R is a subring of F(R, R). The subset of differ-
entiable real-valued functions is a subring of C(R, R), and the subset of polynomial
functions is a subring of that. The following proposition is pretty easy.
Proposition 6.9. If S is a subring of R and T is a subring of S then T is a subring
of R.

Thus Z is a subring of R, etc.


Proposition 6.10. Let R be a ring. Then the diagonal ∆_R = {(r, r) | r ∈ R} is
a subring of R × R.

Proof. Let (a, a) and (b, b) be in ∆_R. Then (a, a) + (b, b) = (a + b, a + b) ∈ ∆_R,
(a, a)·(b, b) = (ab, ab), and −(a, a) = (−a, −a), which shows that ∆_R is closed under
addition, multiplication, and negation. Moreover 1_{R×R} = (1_R, 1_R) ∈ ∆_R.  □

A subring of an integral domain is an integral domain. You should think through


why that is so. But the converse is not true, as you can see from Proposition 6.10
with R an integral domain. A subring of a field is not necessarily a field, as the
example of Z ⊆ Q shows.

2.7. Exercises

(1) Propositions 6.3 and 6.4.


(2) (a) Let R be a commutative ring. Prove that being associate is an equivalence
relation on R. (Go back and read the boxed definition of a is
associate to b.)
(b) Let R = Z/12Z. Write out the equivalence classes for the above
relation.
(3) Let R be 3-space R³, with addition defined as usual: (x, y, z) + (x′, y′, z′) =
(x + x′, y + y′, z + z′). Define multiplication via the cross-product: (x, y, z) ×
(x′, y′, z′) = (yz′ − y′z, x′z − xz′, xy′ − x′y). Which ring axioms does R
satisfy? Be sure to specify what the additive and multiplicative identities
are, if they exist.
(4) The following addition and multiplication tables describe a ring R with
four elements. Which elements of R are units? Which are zero divisors?
[The addition and multiplication tables defining R are omitted in this copy of the text.]
(5) Show that if u, v are units in R then uv is a unit in R.

(6) Let R be an integral domain. Prove that if a|b and b|a, then there is a
unit u R so that b = au.
(7) Show the previous exercise may be false if R is not an integral domain.
(8) Let R be a commutative ring. Say an element x R is nilpotent if there
is an n N so that xn = 0. Prove that the sum of two nilpotent elements
is nilpotent.
(9) Let X be a set. Let R be the set of subsets of X, with the following
addition and multiplication laws. For A, B ∈ R, define multiplication via
A · B = A ∩ B. Define addition via

    A + B = (A − B) ∪ (B − A).

Here A − B = {a ∈ A | a ∉ B} denotes the set of elements in A which are
not in B.
Check that R is a ring under these operations. Be sure to specify
what the additive and multiplicative identities are.
Write out the multiplication and addition tables in the case where X
has two elements.
(10) Let R = Z × Z, with addition and multiplication defined componentwise,
i.e. (a, b) + (c, d) = (a + c, b + d) and (a, b) · (c, d) = (ac, bd). Determine
the units and zero divisors of R, and show your reasoning.
(11) Explain why the subset of real numbers with terminating decimal expan-
sions is a subring of R. What are the units?
(12) Let p be a prime. Let Z_(p) = {x ∈ Q | ord_p(x) ≥ 0}. Thus Z_(p) is the set
of fractions with no p's in the denominator. Check that it is in fact a
subring of Q, using properties of ord_p. What are the units? Is it a field?
(13) Let p be a prime. Let Z[1/p] = {x ∈ Q | ord_q(x) ≥ 0 for all primes q ≠ p}.
Thus Z[1/p] is the set of fractions with only p's in the denominator. For
instance 3/25 is in Z[1/5] but not in Z[1/3]. Check that it is also a subring of
Q. What are the units? Is it a field?
(14) Suppose 1_R + 1_R = 2_R is a unit in R. Prove that the equation x² + bx + c =
0 has a solution in R if and only if b² − 4c is the square of an element in
R.

For the next three problems, let R = F(R, R) be the ring of all functions from R
to R as above and S the subring of continuous functions.

(15) Prove that every function in R is either zero, a unit, or a zero divisor in
R.
(16) Find a function in S that is neither zero, a unit, nor a zero divisor in S.
(17) Find three different solutions to the equation f 2 + f = 6 in R. How many
are there in S?

3. Abstract Linear Algebra

3.1. Definition of Mn (R)

Let R be a commutative ring, and n ∈ N. In this section we will define a new ring
Mn(R) of n × n matrices with entries in R, and develop its properties. We hope
the reader is familiar with basic linear algebra.

An element X ∈ Mn(R) is an n × n array of elements of R, i.e.,

          a_11  a_12  ···  a_1n
    X =   a_21  a_22  ···  a_2n
           ···   ···  ···   ···
          a_n1  a_n2  ···  a_nn


It is called a matrix, and the element in the ith row and jth column is called the
(i, j)th entry of X. The addition rule for n × n matrices X and Y is defined as
follows. If the (i, j)th entry of X is a_ij and the (i, j)th entry of Y is b_ij, then the
(i, j)th entry of X + Y is a_ij + b_ij.

Since addition in R is commutative and associative, so is addition in Mn(R). The
zero element of Mn(R) is the n × n matrix all of whose entries are 0_R; thus

                0_R  0_R  ···  0_R
    0_Mn(R) =   0_R  0_R  ···  0_R
                 ···  ···  ···  ···
                0_R  0_R  ···  0_R

The negative of a matrix X ∈ Mn(R) is the n × n matrix whose entries are the
negatives of the entries of X; thus

           −a_11  −a_12  ···  −a_1n
    −X =   −a_21  −a_22  ···  −a_2n
             ···    ···  ···    ···
           −a_n1  −a_n2  ···  −a_nn

Certainly X + (−X) = 0_Mn(R).

Write R^n for n-tuples of elements of R. They are called vectors, and the elements are
called components. Thus a typical vector v ∈ R^n can be written v = (a_1, ..., a_n).

A given row or column of a matrix forms a vector as usual; for instance the second
row of X above is the vector (a_21, a_22, ..., a_2n).

If i ∈ N is not larger than n, write e_i for the vector whose ith component is 1_R
and whose other components are 0_R. It is called the ith standard basis vector.
Addition of vectors is performed componentwise in the usual way. We also require
the notion of scalar multiplication: if a ∈ R and v = (a_1, ..., a_n) then we will write
av = (a·a_1, ..., a·a_n).

Note that if v = (a_1, ..., a_n) then v = Σ_{i=1}^n a_i·e_i.

Definition. If v = (a_1, ..., a_n) and w = (b_1, ..., b_n) are two vectors in
R^n, then the dot product of v and w is given by v · w = Σ_{i=1}^n a_i·b_i. This
is an element of R.

Let u, v, w ∈ R^n, and a ∈ R. The following properties are easy to check:

u · (v + w) = u · v + u · w.
(u + v) · w = u · w + v · w.
v · e_i = e_i · v is the ith component of v.
v · (aw) = a(v · w).
u · v = v · u.

Definition. (MM1) If X ∈ Mn(R) and v ∈ R^n then the product Xv
is defined to be the vector in R^n whose ith component is the dot product
of the ith row of X with v.

Let X, Y ∈ Mn(R), a ∈ R, and v, w ∈ R^n. The following properties follow from the
above properties of the dot product:

(X + Y)v = Xv + Yv.
(F) X·e_i is the ith column of X.
X(v + w) = Xv + Xw.
X(av) = a(Xv).

The property (F) belongs to any linear algebraist's toolkit, and is worth meditating
on. It implies that X is determined by its multiplications against the standard basis.
In particular,

    X = [ Xe_1 | Xe_2 | ··· | Xe_n ]   (written column by column).

The last two properties can be iterated to show that if v = (a_1, ..., a_n) = Σ_{i=1}^n a_i·e_i,
then Xv = Σ_{i=1}^n a_i(Xe_i).

In other words, if

    X = [ v_1 | v_2 | ··· | v_n ],

then Xv = a_1·v_1 + a_2·v_2 + ··· + a_n·v_n.

Now we define matrix multiplication.

Definition. (MM2) If X, Y ∈ Mn(R) then the product XY is defined
to be the n × n matrix whose jth column is the product of X with the jth
column of Y.

Thus, if w_1, ..., w_n are the column vectors of Y, then

    X · [ w_1 | w_2 | ··· | w_n ] = [ Xw_1 | Xw_2 | ··· | Xw_n ].
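Here is a small Python sketch of (MM1) and (MM2) over Z/nZ (an illustration only, not part of the text; the function names are ours). Matrices are stored as lists of rows, and XY is built column by column, exactly as in the definition.

    def dot(v, w, n):
        return sum(a * b for a, b in zip(v, w)) % n

    def mat_vec(X, v, n):
        # (MM1): i-th component is the dot product of the i-th row of X with v
        return [dot(row, v, n) for row in X]

    def mat_mul(X, Y, n):
        # (MM2): j-th column of XY is X times the j-th column of Y
        cols = [mat_vec(X, [row[j] for row in Y], n) for j in range(len(Y))]
        return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols))]

    X = [[1, 2], [3, 4]]
    Y = [[0, 1], [1, 1]]
    print(mat_mul(X, Y, 5))   # multiplication in M2(Z/5Z): [[2, 3], [4, 2]]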

Let X, Y, Z ∈ Mn(R). Write I = 1_Mn(R) for the n × n identity matrix; this is the
square matrix whose diagonal elements a_ii are all 1_R and whose other elements are
0_R. Thus,

                1_R  0_R  ···  0_R
    1_Mn(R) =   0_R  1_R  ···  0_R
                 ···  ···  ···  ···
                0_R  0_R  ···  1_R

Note that the ith row is e_i, as is the ith column. Since R is nontrivial, I ≠ 0_Mn(R).

The following properties follow from the above properties of matrix-vector multi-
plication:

(X + Y )Z = XZ + Y Z.
X(Y + Z) = XY + XZ.
XI = IX = X.

Note that since we have defined matrix multiplication in terms of columns, we can
translate it by (F) into
X(Y ei ) = (XY )ei .
The RHS tells you what the ith column of XY should be, given the ith column,
Y ei , of Y . Associativity of multiplication sprouts out of this.
Proposition 6.11. If X, Y Mn (R) and v Rn then X(Y v) = (XY )v.

Proof. Let v = (a_1, ..., a_n) = Σ_{i=1}^n a_i·e_i. The properties we have developed
thus far show that

X(Yv) = X(Y(Σ_{i=1}^n a_i·e_i)) = X(Σ_{i=1}^n a_i·Y(e_i)) = Σ_{i=1}^n a_i·X(Ye_i)
      = Σ_{i=1}^n a_i·((XY)e_i) = Σ_{i=1}^n (XY)(a_i·e_i) = (XY)(Σ_{i=1}^n a_i·e_i) = (XY)v.  □

Theorem 6.12. If X, Y, Z Mn (R) then (XY )Z = X(Y Z).

Proof. It is enough to show that the columns of the LHS and RHS are the
same. The jth column of the LHS is (XY )zj , where zj is the jth column of Z. The
jth column of the RHS is X(Y zj ). By the previous proposition we are done. 

3.2. Noncommutative Rings

By the previous section we have our first example Mn(R) of a noncommutative
ring. We know M2(R), for example, is never commutative, because

    [1 0; 0 0] · [0 1; 0 0] = [0 1; 0 0]  ≠  [0 0; 0 0] = [0 1; 0 0] · [1 0; 0 0]

(writing [a b; c d] for the matrix with first row a, b and second row c, d).

Note that we are finally relaxing the subscripts of 1R and 0R .

I would like to make some general remarks now about the theory of divisibility,
units, and zero divisors in noncommutative rings.

What should we mean by a|b in a noncommutative ring R? It makes a difference
whether you say there exists a c ∈ R so that ac = b or so that ca = b! Consider
the following example, with R = M2(R). Let A = [1 0; 0 0] and B = [1 2; 0 0].
Does A|B? In fact, if we put C = B then you'll see that AC = B but CA ≠ B.
In fact, a moment's computation will show you that there is no C ∈ M2(R) with
CA = B! This motivates the following definition.

Definition. Let R be a ring, and a, b R. Then a is a left divisor


of b provided that there is a c R so that ac = b. We say a is a right
divisor of b provided that there is a c R so that ca = b.

Thus in the above example, A is a left divisor of B but not a right divisor of B. I
would propose the notations a |_r b and a |_ℓ b, but we shall not have much occasion to
use them.

What about units?


Definition. Let R be a ring, and a R. Then a is a unit of R provided
that it is both a right divisor and a left divisor of 1R .

This left/right nuance for units doesn't get noticed in linear algebra.

Fact 6.13. Let F be a field. Then a matrix A Mn (F ) is a left divisor of I if and


only if it is a right divisor of I.

As an exercise, see how your linear algebra textbook covers this.

(Anyone know about the case of Mn (R) for general R? Drop me a line.)

Units in matrix rings play an important role in linear algebra, but they usually are
called something else.

Definition. Let F be a field, and n N. Then a matrix A Mn (F ) is


called invertible or nonsingular provided that it is a unit in Mn (F ).

Zero divisors split up into left ones and right ones.



Definition. Let R be a ring, and a R. Then a is a left zero divisor


provided that there is a nonzero element b R so that ab = 0R . We say
a is a right zero divisor provided that there is a nonzero element b R
so that ba = 0R .
 
For example, the matrix A = [0 1; 0 0] ∈ M2(R), for any ring R, is both a left and
right zero divisor since A² = 0_M2(R).

Again this left/right nuance doesn't happen in a linear algebra class.


Fact 6.14. Let F be a field. Then a matrix A Mn (F ) is a left zero divisor if and
only if it is a right zero divisor.

Can you prove it?

(Anyone know about the case of Mn (R) for general R? Drop me a line.)

Facts 6.13 and 6.14 are not true for general noncommutative rings, but the examples
are a little heavy. Here is a sketch of an example. Consider a real vector space V
with an infinite basis of the form {e_1, e_2, ...}. The set R of linear transformations
L : V → V forms a ring, under pointwise addition and composition. Consider the
linear transformations α, β, and γ defined by
α(e_1) = 0, α(e_2) = e_1, α(e_3) = e_2, ... and
β(e_1) = e_2, β(e_2) = e_3, β(e_3) = e_4, ... and
γ(e_1) = e_1, γ(e_2) = 0, γ(e_3) = 0, ....
In other words, α moves the basis vectors to the left, β moves them to the
right, and γ sends all the basis vectors but e_1 to 0. Then the reader may check
that αβ = 1_R, the identity transformation, but since α(e_1) = 0, there is no linear
transformation α′ with α′α = 1_R. Thus α is a left unit but not a right unit.
Similarly, β is a right unit but not a left unit. Finally note that αγ = 0_R, so α
is a left zero divisor. If there were a δ ∈ R with δα = 0_R then, applying δ to both
sides of αβ = 1_R, we would have 0_R = (δα)β = δ(αβ) = δ, so α is not a right zero divisor.

3.3. The 2 2 Case

We can save ourselves some headache if we focus on the 2 × 2 case, which is plenty
big enough for our purposes. For every matrix X ∈ M2(R) there is an associated
element det(X) ∈ R, given by

    det [a b; c d] = ad − bc.

Here are some properties of the determinant, which can be checked directly.

If X, Y M2 (R), then det(XY ) = det(X) det(Y ).


det(I) = 1.
If a column of X is 0, then det(X) = 0.

The following formula is key to understanding the ring theory of M2(R):

    [a b; c d] · [d −b; −c a] = [d −b; −c a] · [a b; c d] = [ad−bc 0; 0 ad−bc] = (ad − bc)·I.
Proposition 6.15. Let X M2 (R).

(1) X is a unit in M2 (R) if and only if det(X) is a unit in R.


(2) X is zero or a zero divisor in M2 (R) if and only if det(X) is zero or a
zero divisor in R.
   
a b d b
Proof. Let X = , and Y = . The main formula is
c d c a
XY = Y X = det(X)I.

(1) If X is a unit in M2 (R), then there is an element A M2 (R) so that


XA = I. Taking determinants of both sides shows that det(X) det(A) =
1, thus det(X) is a unit in R.

 r R so that
Conversely, if det(X) is a unit, then there is an element 
dr br
det(X)r = 1. Then the main formula shows that rY = is
cr ar
inverse to X.

(2) If X = 0 then det(X) = 0.

If X is a zero divisor in M2(R), then there is a nonzero matrix A ∈
M2(R) so that XA = 0. Let v = (x, y) be a nonzero column of A. By
(MM2) we know that Xv = 0. Suppose x ≠ 0 (the case y ≠ 0 is similar).
Let Z be the matrix [x 0; y 1]; its determinant is x. By (MM2) the first
column of XZ is Xv = 0, so the determinant of XZ is 0. Thus det(X)·x = 0,
and it follows that det(X) is either 0 or a zero divisor.

If det(X) is zero, the main formula shows that XY = 0. Therefore
X is zero or a zero divisor. If det(X) is a zero divisor, there is a nonzero
element r ∈ R so that det(X)·r = 0. Then X(rY) = r·det(X)·I = 0.
If rY ≠ 0 then we see X is a zero divisor. If rY = r·[d −b; −c a] = 0, then

    [a b; c d] · [r 0; 0 r] = 0

as well, so that X is a zero divisor in this case as well.  □

4. Chapter Wrap-Up

4.1. Rubric for Chapter

In this chapter you should have learned



The definition of a ring, units, and zero divisors.


Ways of forming rings, including products, polynomial rings, and rings of
functions.
Simple ring-theoretic proofs.
Methods for determining units and zero divisors for a given ring.

4.2. Toughies

(1) There are four rings which contain exactly 4 elements. Find them, and
write out their addition/multiplication tables.
(2) Find all the subrings of Q.
CHAPTER 7

Polynomials


1. Polynomials

Polynomials are ubiquitous in mathematics, appearing for example as Taylor poly-


nomials in calculus, characteristic polynomials in linear algebra, knot polynomials
in topology, generating functions in combinatorics, and characteristic equations in
differential equation theory.

In calculus they are usually the first functions studied as they are the most amenable
to differentiation and integration. Indeed, they are closed under these operations.

It is often of interest in the above examples to know the roots of polynomials, and
how they factor. This leads to the study of the arithmetic of polynomials, which
we pursue in this section.

1.1. Basics of Polynomials over Rings

Let R be a commutative ring. We will define polynomials with coefficients in R.

[Authors Note: I need to replace arbitrary commutative ring with something


conceptually easier.]

Definition. A polynomial is an expression of the form

    f(x) = a_d·x^d + a_{d−1}·x^{d−1} + ··· + a_0·x^0,

where a_i ∈ R. The zero polynomial is the polynomial in which all
a_i = 0.

Remarks: For convenience, we will often write x^0 = 1.

Definition. Let f be a nonzero polynomial. The degree of f , or deg(f ),


is the highest power of x with a nonzero coefficient.

Thus in the previous definition, if a_d ≠ 0, then deg(f) = d.

Remark: If f = 0 is the zero polynomial, we do not define a degree. Some call the
degree −∞ and assume a calculus for such a symbol. We, however, feel this gives an
undue mysticism to −∞ and choose to deal with the zero polynomial separately.

If deg(f) = 0, f is a constant polynomial; of the form f(x) = a, with a ≠ 0.

If deg(f) = 1, f is a linear polynomial; of the form f(x) = ax + b, with a ≠ 0.

If n > deg(f), the convention will be that the coefficient a_n = 0.

Definition. If f is a nonzero polynomial, write LT(f ) for the highest


degree (leading) term of f .

Thus f = LT(f) + f_<, where f_< is a polynomial of degree less than deg(f), or the
zero polynomial. Note that LT(f) ≠ 0.

Definition. Addition of polynomials is performed in the obvious way: If
f(x) = a_d·x^d + ··· + a_0, and g(x) = b_e·x^e + ··· + b_0, and n ≥ d, e, then

    (f + g)(x) = (a_n + b_n)x^n + ··· + (a_0 + b_0).

It is easy to see that the additive ring axioms for R impose the same axioms for
this addition.
Lemma 7.1. (Degree Estimate for Addition) If f and g are nonzero polynomials,
with g ≠ −f, then deg(f + g) ≤ max{deg(f), deg(g)}. If deg(f) > deg(g) then this
is exactly deg(f), and LT(f + g) = LT(f).

Now we turn to multiplication.

1.2. Multiplication of Polynomials

In this section, we discuss the delicacies of polynomial multiplication. The idea is


very simple: use induction to whittle it down to monomial multiplication.

Definition. A monomial is an expression of the form a·x^n, with a ∈ R
and n a nonnegative integer.

Definition. Monomials multiply via a·x^m · b·x^n = ab·x^{m+n}.

Proposition 7.2. Monomial multiplication is commutative and associative.

Proof: This follows from commutativity and associativity of multiplication in the


ring R, and commutativity and associativity of addition of integers.

Definition. Monomials and polynomials multiply via

    a·x^m · f(x) = a·a_d·x^{d+m} + a·a_{d−1}·x^{d+m−1} + ··· + a·a_0·x^m.

Proposition 7.3. If f is a monomial and g and h are polynomials, then f (g+h) =


f g + f h.

Proof. This is perhaps best done with summation notation: Write f = a·x^l,
g = Σ_{i=0}^m b_i·x^i, h = Σ_{i=0}^n c_i·x^i, and let N ≥ m, n. Then

f·(g + h) = a·x^l · (Σ_{i=0}^N b_i·x^i + Σ_{i=0}^N c_i·x^i) = a·x^l · Σ_{i=0}^N (b_i + c_i)·x^i = Σ_{i=0}^N a(b_i + c_i)·x^{i+l}.

By distributivity of R, a(b_i + c_i) = a·b_i + a·c_i, so this is

a·x^l · (Σ_{i=0}^N b_i·x^i) + a·x^l · (Σ_{i=0}^N c_i·x^i) = f·g + f·h.  □



Definition. Since every polynomial is a sum of its leading term and its
lower degree terms, we can define polynomial multiplication recursively
via

    f · g = 0                      if f = 0,
    f · g = LT(f)·g + f_<·g        if f ≠ 0.

The induction will stop when f is a monomial, since then f< = 0. Note that if
deg(f ) = 0, then f is a monomial, so the induction process will certainly end.

Note that, by induction, f · 0 = 0.

Remark: Throughout this section when we say induction we are referring to


strong induction, since deg(f_<) may be less than deg(f) − 1.
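The recursion can be written out directly. The following Python sketch (an illustration, not part of the text; names are ours) multiplies polynomials over Z exactly as in the boxed definition, representing a polynomial as its list of coefficients [a_0, a_1, ..., a_d].

    def add(f, g):
        m = max(len(f), len(g))
        f = f + [0] * (m - len(f)); g = g + [0] * (m - len(g))
        return [a + b for a, b in zip(f, g)]

    def mono_times(a, k, g):
        # (a x^k) * g, as in the monomial-times-polynomial definition
        return [0] * k + [a * c for c in g]

    def mul(f, g):
        # f * g = 0 if f = 0, else LT(f)*g + f_< * g
        if not any(f):
            return [0]
        d = max(i for i, c in enumerate(f) if c != 0)   # degree of f
        f_less = f[:d]                                   # lower-degree terms
        return add(mono_times(f[d], d, g), mul(f_less, g))

    print(mul([1, 1], [-1, 1]))   # (1 + x)(-1 + x) = -1 + x^2  ->  [-1, 0, 1]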
Proposition 7.4. If f is a monomial and g is a polynomial, then f g = g f .

Proof. Clear if g = 0. Otherwise, use induction on deg(g). If deg(g) = 0,


then g is a (constant) monomial, and the result follows from commutativity of
monomials. By the previous proposition, f g = f LT(g) +f g< . We must compare
this to LT(g)f + g< f . But these are equal by commutativity of monomials and
the induction hypothesis.


Corollary 7.5. If f is a monomial and g and h are polynomials, then (g +h)f =
g f + h f.

Proof. Using previous results, (g + h)f = f (g + h) = f g + f h = gf + hf . 

Finally, we get some theorems.


Theorem 7.6. (Distribution) Let f, g, and h be polynomials. Then f (g + h) =
f g + f h.

Proof. Left as exercise. 


Theorem 7.7. (Commutativity) Let f and g be polynomials. Then f g = g f .

Proof. If f = 0 both sides are zero. Otherwise use induction on deg(f ). If


deg(f ) = 0 this is monomial-polynomial commutativity. By definition of f g, we
have
f g = LT(f ) g + f< g = g LT(f ) + g f< ,
using monomial-polynomial commutativity and the induction hypothesis. By dis-
tribution of g, this is equal to gf . 

We finish with associativity. The idea is simply three inductions, one for each term.
The reader should fill in the details.
Proposition 7.8. If f and g are monomials, and h is a polynomial, then (f g)h =
f (g h).

Proof. Induction on deg(h). Use distributivity, and associativity of mono-


mono-mono multiplication. 
Proposition 7.9. If f is a monomial and g and h are polynomials, then (f g)h =
f (g h).

Proof. Induction on deg(g). Use mono-mono-poly associativity and distribu-


tivity. 
Theorem 7.10. (Associativity) If f, g, h are polynomials, then (f g) h = f (g h).

Proof. Induction on deg(f ). Use mono-poly-poly associativity and distribu-


tivity. 

1.3. The Polynomial Ring R[x]

By the previous section, the addition and multiplication laws on the set of polyno-
mials make it into a ring.

Definition. The set of polynomials with coefficients in a ring R thus


forms a ring itself. We call this new ring R[x].

(Obviously the constant polynomial f1 (x) = 1 is the multiplicative identity.)

Remark: R may be viewed as a subset of R[x] via the constant polynomials. It is


then a subring.

In this section we will try to find the units and zero divisors of R[x] and succeed
when R is an integral domain. We start with a lemma.
Lemma 7.11. Let R be an integral domain, and f R[x] a nonzero monomial. If
g 6= 0, then f g 6= 0 and LT(f g) = LT(f ) LT(g).

Proof. This follows from a simple computation. 


Corollary 7.12. In the above situation, deg(f g) = deg(f ) + deg(g).
Theorem 7.13. Let R be an integral domain. Let f and g be nonzero polynomials.
Then f g 6= 0, deg(f g) = deg(f ) + deg(g), and LT(f g) = LT(f ) LT(g).

Proof. Induction on deg(f ). If f is a monomial the theorem follows from the


previous lemma. So we are done if deg(f ) = 0 or if f< = 0. We have LT(f g) =
LT(LT(f )g+f< g) by the definition of multiplication. By our inductive hypothesis,
deg(f< g) = deg(f< ) + deg(g) < deg(f ) + deg(g) = deg(LT(f ) g).
Therefore we may apply Lemma 7.1, to conclude that
LT(f g) = LT(LT(f ) g + f< g) = LT(LT(f ) g).
By the previous lemma again this is LT(LT(f )) LT(g). Since LT(f ) is monomial,
this yields LT(f ) LT(g), as desired. It follows that deg(f g) = deg(LT(f g)) =
deg(f ) + deg(g). 

Corollary 7.14. If R is an integral domain, R[x] is an integral domain.


Corollary 7.15. If R is an integral domain, the units of R[x] are the constant
polynomials equal to units in R.

Proof. If f g = 1, then deg(f ) + deg(g) = deg(1) = 0. Thus f and g must


both be constants, and the result follows. 

Example: This fails if R is not an integral domain. For example, if R = (Z/4Z),


then (1 + 2x)2 = 1, so 1 + 2x is a unit in R[x].
Corollary 7.16. In the integral domain situation, if f|g and g ≠ 0, then deg(f) ≤
deg(g).

1.4. Roots of Polynomials

It is often of importance to determine the roots of polynomials. In linear algebra,


the eigenvalues of a square matrix are the roots of its characteristic polynomial.

Definition. If f(x) = a_n·x^n + ··· + a_1·x + a_0, and c ∈ R, we define

    f(c) = a_n·c^n + ··· + a_1·c + a_0.

We will need the following:


Proposition 7.17. Let R be a ring, and f, g R[x]. If f +g = h, then f (c)+g(c) =
h(c). If f g = h, then f (c)g(c) = h(c).

Proof. We leave this as an exercise, with a hint. The first fact is easily seen
with a direct calculation. For the second, use the LT-method: make a direct
calculation when f is a monomial, and then use induction when f is a general polynomial.  □

Definition. An element c R is a root of f if f (c) = 0.

Note that if (x − c)|f(x), then c is a root of f. The converse is true; we will later
prove it in the case where R is a field.

1.5. Exercises

(1) Theorem 7.6, Lemmas 7.8, 7.9, Theorem 7.10, Lemma 7.11, and Proposi-
tion 7.17.
(2) Let R be an integral domain. Let f, g ∈ R[x] with f nonconstant and
g ≠ 0. Prove that the set {i ∈ N : f^i | g} is bounded above.
(3) In the above situation, define ord_f(g) = max{i ∈ N : f^i | g}. Prove that
ord_f(g) = i if and only if there is an h ∈ R[x] so that g = f^i·h and f ∤ h.
(4) How many functions are there from Z/3Z to Z/3Z? Find two different
polynomials in (Z/3Z)[x] which give the same function on Z/3Z. Also try
this exercise for other Z/nZs.
(5) R may not be an integral domain. Prove that in this case we still have
deg(f·g) ≤ deg(f) + deg(g) if f·g is nonzero.

(6) Find a quadratic polynomial in (Z/8Z)[x] with 4 roots. Are there any
with 5 roots?

For the next three problems fix a ring R, and consider polynomials in R[x]. Let
f ∈ R[x] be a nonzero polynomial. Define the order of f, written ν(f), to be the
lowest power of x with a nonzero coefficient.

For example, ν(x^5 + 2x^3) = 3. If f = 0 we define ν(f) = ∞.

(7) Show that ν(f) = ord_x(f) when f is nonconstant.
(8) Prove that if f, g ∈ R[x] then ν(f + g) ≥ min(ν(f), ν(g)).
(9) Let R be an integral domain. Prove that ν(f·g) = ν(f) + ν(g).
(10) Let L = Q[x, x⁻¹] be the set of rational Laurent polynomials, which are
of the form

    f(x) = a_{−m}·x^{−m} + ··· + a_{−1}·x^{−1} + a_0 + a_1·x + ··· + a_n·x^n,

with a_i ∈ Q. Then L is a ring under the usual addition and multiplication
rules. Write ν(f) for the least integer i so that a_i ≠ 0, and deg(f) for the
greatest integer i so that a_i ≠ 0. For example, if f = x^{−4} + 2x^{−3} − 4x,
then ν(f) = −4 and deg(f) = 1. Explain why if f, g ∈ L are nonzero,
then ν(f·g) = ν(f) + ν(g) and deg(f·g) = deg(f) + deg(g).
(11) Use the previous problem to show that the units of L are all of the form
λ·x^i, where λ is a nonzero rational number and i ∈ Z.

2. Polynomials over a Field

2.1. Some More Ring Theory

Much of the material from the first chapter can be generalized to an arbitrary
commutative ring R.

Definition. Write Div(a, b) for the set of common divisors of a and b,


and Mult(a, b) for the set of common multiples of a and b.

Definition. We say a divisor d of x is a proper divisor of x provided


that it is neither a unit nor the product of x and a unit.

Definition. We say an element x R is irreducible provided that it


is nonzero, not a unit, and has no proper divisors.

Example: When R = Z, the irreducible elements are the primes of N and their
negatives. Every irreducible element of Z(p) is associate to p.

2.2. Another Division Algorithm

The arithmetic of R[x] is much nicer when R is a field, and is very similar to that
of the ring Z. In this section we will show that, up to constants, any nonzero
polynomial factors uniquely into irreducible polynomials.

Let F be a field. By the work in the previous section, we know that F [x] is an
integral domain, and the units are exactly the polynomials of degree 0. Up to these
units, we will have unique factorization. We will follow the same basic path as with
N.

We have a Division Algorithm for Polynomials.


Theorem 7.18. (Division Algorithm, Weak Form) Let f, g ∈ F[x] be polynomials
with g ≠ 0. Then there are polynomials p, r with r = 0 or deg(r) < deg(g), so that
f = pg + r.

Proof: If f = 0 we put p = r = 0. Otherwise we use induction on deg(f). First note
that if deg(f) < deg(g) we may put p = 0, r = f, so we are done in that case. So we
may assume below that deg(f) ≥ deg(g). Secondly, note that if deg(g) = 0, then g
is a unit so we may put p = f·g⁻¹ and r = 0. In particular, these remarks settle
the case of deg(f) = 0. As usual consider the decompositions f = LT(f) + f_< and
g = LT(g) + g_<. Let LT(f) = a·x^m and LT(g) = b·x^n. By the assumption above,
m ≥ n. Thus we may take p_0 = (a/b)·x^{m−n}. (This will be LT(p).) What is left? Since
LT(f) = LT(p_0·g), either deg(f − p_0·g) < deg(f) or this difference is 0. In the
latter case, we may take p = p_0 and r = 0. In the former we apply the induction
hypothesis to f − p_0·g and g. We find p_1, r as in the statement of the theorem,
satisfying f − p_0·g = p_1·g + r. In this case, we may take p = p_0 + p_1 and the same r.
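The proof is effectively an algorithm: subtract (a/b)·x^{m−n} times g and repeat. Here is a Python sketch of it (an illustration only, not part of the text; names are ours), using Fraction to work over the field Q.

    from fractions import Fraction

    def degree(f):
        return max((i for i, c in enumerate(f) if c != 0), default=None)

    def divide(f, g):
        """Return (p, r) with f = p*g + r and r = 0 or deg(r) < deg(g).
        Polynomials are coefficient lists [a_0, a_1, ...] over Q."""
        p = [Fraction(0)] * (len(f) + 1)
        r = [Fraction(c) for c in f]
        while degree(r) is not None and degree(r) >= degree(g):
            m, n = degree(r), degree(g)
            coeff = r[m] / g[n]                  # (a/b) x^(m-n), as in the proof
            p[m - n] += coeff
            for i in range(n + 1):               # subtract coeff * x^(m-n) * g
                r[m - n + i] -= coeff * g[i]
        return p, r

    # divide x^3 + 1 by x - 2: quotient x^2 + 2x + 4, remainder 9
    print(divide([1, 0, 0, 1], [-2, 1]))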
Corollary 7.19. (Strong Form) In the above situation, the p and r are uniquely
determined.

Proof: Suppose pg + r = p′g + r′, with r and r′ each either 0 or of degree less than deg(g).
Then (p − p′)g = r′ − r. If the right hand side is not 0, its degree is less than the degree of g. But
it is divisible by g, a contradiction. So r = r′ and the RHS is 0. Since g is not a
zero divisor we conclude that p = p′ and are done.
Corollary 7.20. (Root Test) Let c ∈ F. The linear polynomial x − c divides f iff
c is a root of f.

Proof: We have already mentioned one direction. Apply the division algorithm to
f and x − c. If x − c does not divide f, then f(x) = p(x)(x − c) + r(x), with
deg(r) = 0. Then f(c) = r ≠ 0.

2.3. Synthetic Division

In this section, F denotes a field. Long division of polynomials is somewhat cum-


bersome because one has to do a great deal of redundant writing, with x's and bars.
With some practice you can omit the variables and only work with the coefficients.
If you are merely dividing a polynomial f (x) by the polynomial x c with c F ,
you can further simplify your work to a table with three rows of numbers, which
will quickly give you all the information of the division algorithm. The first row is
simply the coefficients of f , and the third row will be the coefficients of p and r.

Let f(x) = a_n·x^n + a_{n−1}·x^{n−1} + ··· + a_0 ∈ F[x], and c ∈ F. Set up the table:

        a_n   a_{n−1}   ···   a_0

    c | a_n

Here are the rules:

Every number in the third row should be multiplied by c and the result
should be put in the upper right entry of the second row.
Every number in the second row should be added to the entry above it
and the result should be put in the row below it.

So you wind up getting:

        a_n   a_{n−1}            a_{n−2}                       ···   a_0
              a_n·c              a_n·c² + a_{n−1}·c            ···   a_n·c^n + a_{n−1}·c^{n−1} + ··· + a_1·c
    c | a_n   a_n·c + a_{n−1}    a_n·c² + a_{n−1}·c + a_{n−2}  ···   a_n·c^n + a_{n−1}·c^{n−1} + ··· + a_1·c + a_0

Note that the last entry is f (c), which is the remainder. The other entries in the
third row are the coefficients of the quotient p, which will be one degree less than
f.

Here is an example, with f(x) = x⁴ + 5x³ + x + 3 and c = −2.

          1    5    0    1     3
              −2   −6   12   −26
    −2 |  1    3   −6   13   −23

Thus f(x) = (x³ + 3x² − 6x + 13)(x + 2) − 23.
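The table-filling rules are easy to mechanize. Here is a Python sketch (an illustration, not part of the text; the function name is ours) that reproduces the example above.

    def synthetic_division(coeffs, c):
        """coeffs are a_n, a_(n-1), ..., a_0 (highest degree first).
        Returns the third row: quotient coefficients followed by the remainder f(c)."""
        row = [coeffs[0]]
        for a in coeffs[1:]:
            row.append(a + c * row[-1])   # add c times the previous entry
        return row

    # f(x) = x^4 + 5x^3 + 0x^2 + x + 3, c = -2
    print(synthetic_division([1, 5, 0, 1, 3], -2))   # [1, 3, -6, 13, -23]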



Here's why this works. Following the rules sets up (the coefficients of) a polynomial
p and a number r so that the second row is (the coefficients of) c·p, and the
third row is x·p + r. (The factor of x accounts for the shift to the left.) Moreover, the
third row is the sum of the first two. Thus, f + c·p = x·p + r. Regroup this to get
f = (x − c)p + r.

Synthetic division is a fine way to determine whether c ∈ F is a root of f. It is a
root if and only if the last entry is 0. Of course, another way is to plug c directly into f.

root only if the last entry is 0. Of course, another way is to plug c directly into f .
One advantage of synthetic division is that if c is a root, then p is the quotient of
f by x c. Any further roots of f will also be roots of p, which has smaller degree.

Here is another benefit of synthetic division when F is the real numbers R (or Q).
You can often use the p to zoom in on possible roots of f. For instance consider
f(x) = x³ − 5x + 6 and c = 3:

         1   0   −5    6
             3    9   12
    3 |  1   3    4   18

So 3 is not a root, and f(x) = (x² + 3x + 4)(x − 3) + 18. Can f have any roots d
greater than 3? If so, then f(d) = (d² + 3d + 4)(d − 3) + 18. But since d > 3 > 0,
all these terms are positive, and so the result cannot be 0!

A general rule is:

Proposition 7.21. Let f R[x] and c R. If the third row of the synthetic
division table consists of positive numbers, then there are no positive roots of f
greater than c.

Proof. Under the given hypotheses, f(x) = p(x)(x − c) + r, with r > 0 and
the coefficients of p positive. Let d > c be positive. Then f(d) = p(d)(d − c) + r.
Since d is positive, so is p(d). Since d > c, d − c > 0. Thus f(d) > 0.  □

The condition that d be positive is important; watch what happens with f =
x² + 4x + 4, c = −3, and d = −2.

Here is the rule in the other direction, which the reader should enjoy proving:

Proposition 7.22. Let f R[x] and c R. If the entries of the third row of the
synthetic division table are nonzero and alternate sign, then there are no negative
roots of f less than c.

Remark: In this and the previous proposition, the c in the corner is not considered
part of the third row.

For example this happens with f(x) = x³ − 5x + 6 and c = −3:

          1    0   −5    6
              −3    9  −12
    −3 |  1   −3    4   −6

Thus we know all real roots of f lie between −3 and 3. In fact there is exactly one
real root, about −2.68. The third row of the synthetic division table for c = 1 has
a negative entry; these simple tests give only one-way information.

2.4. Rational Root Test

Let's specialize to the rational numbers Q. Say you're given a polynomial f ∈ Q[x]
and need to find its rational roots. For instance, f(x) = 3x⁵ − (17/2)x⁴ + (9/2)x³ + 4x²
− (11/2)x − 1. At first it seems there are infinitely many possibilities, and most of them
will not be roots. In this section we will use a little modular arithmetic to show
that there are actually only finitely many possibilities; in this case you need to
check ±1, ±2, ±1/2, ±1/3, ±2/3, ±1/6. Some fluency with synthetic division makes this
even easier.

Before we begin please note that any polynomial f ∈ Q[x] can be multiplied by a
constant N ∈ Z so that N·f ∈ Z[x]. For example, N could be the product of the
denominators of the coefficients of f. In the above example, 2f = 6x⁵ − 17x⁴ +
9x³ + 8x² − 11x − 2 ∈ Z[x]. The roots of f are of course the same as the roots of
N·f. So we may reduce to the case of integer polynomials.

Here's the theorem:

Theorem 7.23. (Rational Root Test) Let f(x) = a_n·x^n + a_{n−1}·x^{n−1} + ··· + a_1·x + a_0
∈ Z[x] be a nonzero polynomial. Let p/q be a reduced fraction in Q, thus gcd(p, q) = 1.
If p/q is a root of f, then p|a_0 and q|a_n.

Thus there are only finitely many possibilities for p and q, as long as a_0 and a_n are
nonzero. (What if they aren't?) In the above example, a_5 = 6 and a_0 = −2; this is
how the list of possible roots was made. In general you list every number which
may be written as a divisor of a_0 divided by a divisor of a_n.

Proof. If p/q is a root of f, then 0 = f(p/q) = a_n·(p/q)^n + a_{n−1}·(p/q)^{n−1} + ··· +
a_1·(p/q) + a_0. Multiplying by q^n we obtain

    a_n·p^n + a_{n−1}·p^{n−1}·q + ··· + a_1·p·q^{n−1} + a_0·q^n = 0.

So a_0·q^n ≡ 0 mod p. Since gcd(p, q) = 1, we know q is a unit mod p. So we can invert
it to get a_0 ≡ 0 mod p. This exactly says that p|a_0. Similarly, a_n·p^n ≡ 0 mod q
says that q|a_n.  □
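The theorem turns root-finding into a finite search. The following Python sketch (an illustration only, not part of the text; names are ours) lists the candidate roots of an integer polynomial and tests each one exactly.

    from fractions import Fraction

    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    def rational_roots(coeffs):
        """coeffs are a_0, a_1, ..., a_n with a_0, a_n nonzero."""
        a0, an = coeffs[0], coeffs[-1]
        candidates = {Fraction(s * p, q)
                      for p in divisors(a0) for q in divisors(an) for s in (1, -1)}
        def value(x):
            return sum(c * x**i for i, c in enumerate(coeffs))
        return sorted(x for x in candidates if value(x) == 0)

    # 2f = -2 - 11x + 8x^2 + 9x^3 - 17x^4 + 6x^5
    print(rational_roots([-2, -11, 8, 9, -17, 6]))   # [Fraction(-1, 6), Fraction(2, 1)]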

Let's use this to find the rational roots of our example. First try c = −1.

          6   −17     9     8   −11    −2
               −6    23   −32    24   −13
    −1 |  6   −23    32   −24    13   −15

Thus −1 is not a root; too bad. Since the entries of the third row have alternating
sign, by Proposition 7.22 we know that any roots must be greater than −1; this
rules out the −2. Let's check c = 1.

          6   −17     9     8   −11    −2
               6    −11    −2     6    −5
     1 |  6   −11    −2     6    −5    −7

Thus 1 is not a root. Since the third row is neither all positive nor alternating, we
can't use either Proposition 7.21 or Proposition 7.22 to rule out any more roots.
Too bad.

Next, c = −1/6.

            6   −17     9     8   −11    −2
                 −1     3    −2    −1     2
    −1/6 |  6   −18    12     6   −12     0

A root! We conclude that 2f = (x + 1/6)(6x⁴ − 18x³ + 12x² + 6x − 12) = (6x + 1)(x⁴ −
3x³ + 2x² + x − 2). Any more roots of f are thus also roots of x⁴ − 3x³ + 2x² + x − 2.
The rational root test now tells us that the possible roots are ±1, ±2. We already
know that 1, −1, and −2 are not roots, so the only possibility is 2:

         1   −3    2    1   −2
              2   −2    0    2
    2 |  1   −1    0    1    0

Thus 2 is a root, and in fact 2f = (6x + 1)(x − 2)(x³ − x² + 1). The only possible
roots of the cubic are ±1, and we have already checked that these are not roots.
So we are done; the only roots of the original f are 2 and −1/6.

Here are some corollaries of the theory, starting with an important definition:

Definition. Let f be a nonzero polynomial in F [x]. We say f is monic


if the coefficient of its highest degree term is 1.

Corollary 7.24. If f Z[x] is monic, then any rational root is an integer.

Indeed, if a_n = 1, then q|1, so q = ±1 and the root p/q = ±p is an integer.
Corollary 7.25. There is no rational square root of 2.

Indeed, the only possible rational roots of x² − 2 are ±1, ±2, and the squares of
these are not 2. Of course there is a real root of x² − 2.

2.5. Euclidean Algorithm

As in the case of integers, the Division Algorithm leads directly to a Euclidean


Algorithm. We must first discuss greatest common divisors. We will normalize
them to be monic, as defined above.

Here are some facts about monic polynomials we will need.


Proposition 7.26. If f is a nonzero polynomial, then f factors uniquely into a
nonzero constant and a monic polynomial.

Proof. Indeed, if f = a_n·x^n + ···, with a_n ≠ 0, then f = a_n · (f/a_n).  □

Proposition 7.27. If f and g are monic polynomials of the same degree, and f |g,
then f = g.

Proof. Exercise. 

We want to define a greatest common divisor of two nonzero polynomials f and
g. We don't have such a nice well-ordering as in N, but we do have subtraction. So
we use the following approach.

Let f, g ∈ F[x] be polynomials, not both 0. Consider the set I = {af + bg | a, b ∈ F[x]}.

By the Min Form of Well-Ordering, the set of degrees of polynomials in I has


a minimum. We would like a polynomial in I of minimum degree to be called
gcd(f, g). However if you multiply such a polynomial by a nonzero c F , it
will have the same degree and will still be in I. If we normalize by asking for a
monic polynomial of this degree, then there is only one such in I, by the following
proposition.
Proposition 7.28. There is a unique monic polynomial d in I of smallest degree.
The polynomial d divides both f and g. Moreover, if there is a polynomial e so that
e|f and e|g, then e|d. Therefore d is the unique monic polynomial of greatest degree
in Div(f, g).

Proof. Observe that I is closed under addition, and under multiplication by
elements of F[x]. Since f, g are not both 0, there are nonzero polynomials in I.
Dividing one of these by its leading coefficient shows that there are monic polyno-
mials in I. Suppose that d and d′ are monic polynomials in I of smallest degree.
By the division algorithm there are polynomials p and r so that d′ = pd + r, with
r = 0 or deg(r) < deg(d). But by the above observation, r = d′ − pd ∈ I. Thus r
must be 0, for otherwise its degree would be too small. Thus d|d′. By Proposition
7.27, d = d′.

Note that f and g are themselves members of I. By the division algorithm, f =


pd + r for some p, r. As in the previous proof, r I, and is therefore 0. So d|f and
similarly d|g. Suppose e|f and e|g. Since d I, we know that d = af + bg for some
a, b F [x]. Therefore e|d. 

Definition. Let f, g be polynomials, not both 0. The greatest common


divisor of f and g, written gcd(f, g), is the unique monic polynomial of
greatest degree in Div(f, g).

Corollary 7.29. (Bezout Identity) Let f, g be polynomials, not both 0. Then there
are polynomials a, b F [x] so that af + bg = gcd(f, g).

Proof. Indeed, by Proposition 7.28, gcd(f, g) I, so it can be written in this


form by definition. 

Proposition 7.30. Let f, g be nonconstant polynomials. Then there are polyno-


mials a, b F [x] with deg(a) < deg(g) or a = 0, and deg(b) < deg(f ) or b = 0 so
that af + bg = gcd(f, g).

Proof. Let d = gcd(f, g). The cases a = 0 or b = 0 occur when g|d or f|d; so
we may rule out these cases.

By the theorem, we know that there are a_0, b_0 ∈ F[x] so that a_0·f + b_0·g = d. Note
that for any p(x) ∈ F[x], we also have (a_0 − pg)f + (b_0 + pf)g = d. By the division algorithm,
there is a p so that if a = a_0 − pg, then deg(a) < deg(g). (Note that if a = 0 then
g|a_0, which implies that g|d, which we have ruled out.)

Let b = b_0 + pf. (If b = 0 then f|b_0, which implies f|d.) We can rewrite the above as
bg = d − af. Since d|f but f ∤ d, deg(af) > deg(d), and so deg(d − af) = deg(af).
Thus deg(b) + deg(g) = deg(bg) = deg(af) = deg(a) + deg(f) < deg(g) + deg(f),
and therefore deg(b) < deg(f).  □

This leads to the method of Undetermined Coefficients.

Example: Let us apply this to the polynomials f(x) = x² + 1 and g(x) = 3x − 1
in R[x]. Note that g(x) is irreducible and does not divide f, so Div(f, g) = {λ ∈
R | λ ≠ 0}. We set up the equation af + bg = 1, with deg(a) = 0 and deg(b) = 1.
We obtain:

    α(x² + 1) + (βx + γ)(3x − 1) = 1,
    (α + 3β)x² + (3γ − β)x + (α − γ) = 0x² + 0x + 1.

This is meant to be an identity of polynomials, and so each of the three coefficients
must agree:

    α + 3β = 0,    3γ − β = 0,    α − γ = 1.

This is easily solved to give α = 9/10, β = −3/10, and γ = −1/10. And the reader may
check that

    (9/10)(x² + 1) + (−(3/10)x − 1/10)(3x − 1) = 1.

Remark: There is a Euclidean Algorithm for polynomials analogous to the one for
N, but it is computationally cumbersome in practice.
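For completeness, here is one way such a Euclidean Algorithm can be written out, as a Python sketch (an illustration only, not part of the text; names are ours): repeatedly replace (f, g) by (g, remainder of f by g), then normalize the last nonzero remainder to be monic.

    from fractions import Fraction

    def deg(f):
        return max((i for i, c in enumerate(f) if c != 0), default=None)

    def rem(f, g):
        """Remainder of f divided by g (coefficient lists over Q, constant term first)."""
        r = [Fraction(c) for c in f]
        while deg(r) is not None and deg(r) >= deg(g):
            m, n = deg(r), deg(g)
            coeff = r[m] / g[n]
            for i in range(n + 1):
                r[m - n + i] -= coeff * Fraction(g[i])
        return r

    def poly_gcd(f, g):
        while deg(g) is not None:           # while g is not the zero polynomial
            f, g = g, rem(f, g)
        lead = f[deg(f)]
        return [c / lead for c in f]        # normalize to be monic

    # gcd(x^2 - 1, x^2 - 2x + 1) = x - 1
    print(poly_gcd([-1, 0, 1], [1, -2, 1]))   # coefficients of -1 + x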

Definition. Let f1 , . . . , fn be polynomials in F [x], not all 0. The great-


est common divisor of f1 , . . . , fn , written gcd(f1 , . . . , fn ), is the unique
monic polynomial of greatest degree in Div(f1 , . . . , fn ).

It was slightly bad of me to write down this definition before proving the following:
Proposition 7.31. There is a monic polynomial d so that Div(f1 , . . . , fn ) = Div(d).
Therefore this is the unique monic polynomial of greatest degree in Div(f1 , . . . , fn )

A monic common divisor of greatest possible degree certainly exists; this is the Max
form of Well-Ordering on the set of degrees. The interesting algebraic fact is that
there is only one such polynomial.

Proof. Induction on n; the case n = 1 is obvious; just divide f_1 by its leading
coefficient. Suppose the Proposition is true for k. Let f_1, ..., f_{k+1} be given, and
d_k = gcd(f_1, ..., f_k). Let d = gcd(d_k, f_{k+1}); we will show that Div(f_1, ..., f_{k+1}) =
Div(d). We know that d|d_k and d_k|f_i for i = 1, ..., k, so d ∈ Div(f_1, ..., f_{k+1}),
as is any divisor of d. So the RHS ⊆ LHS. Next, let g ∈ Div(f_1, ..., f_{k+1}) ⊆
Div(f_1, ..., f_k) = Div(d_k). So g ∈ Div(d_k, f_{k+1}) = Div(d).  □

The following plays a vital role in the theory of canonical forms in linear algebra,
so we include a proof.
Proposition 7.32. Let f1 , . . . , fn F [x], not all 0, with gcd(f1 , . . . fn ) = d. Then
there are polynomials a1 , . . . , an so that a1 f1 + + an fn = d.

Proof. Induction on n; the case n = 1 is again obvious. Suppose the propo-


sition is true for k. Let f1 , . . . , fk+1 be given, and that b1 f1 + + bk fk = dk =
gcd(f1 , . . . , fk ). As in the previous proof we have d = gcd(dk , fk+1 ). Let p, q F [x]
be polynomials so that pdk + qfk+1 = d. Then we may set ai = pbi for i = 1, . . . , k
and ak+1 = q. 

Definition. Let f1 , . . . , fn F [x] be nonzero polynomials. We say they


are pairwise coprime if for all i 6= j, gcd(fi , fj ) = 1. We say they are
relatively prime if gcd(f1 , . . . , fn ) = 1.
Proposition 7.33. Let f_1, ..., f_n ∈ F[x] be pairwise coprime. Then Mult(f_1, ..., f_n) =
Mult(f_1···f_n).

We omit the proof because it is just like the proof of Proposition 2.49 in Section
9.2.
Theorem 7.34. If f F [x] has degree d, then f has at most d distinct roots.

Proof. If c_1, c_2, ..., c_n are distinct roots of f, then by Corollary 7.20, f ∈
Mult(x − c_1, ..., x − c_n). Therefore by the previous proposition the product
(x − c_1)···(x − c_n) divides f(x). This implies that n ≤ d.  □

2.6. The Fundamental Theorem of Arithmetic for Polynomials

This section is closely related to the corresponding section for numbers. We leave out some proofs
which are identical to the proofs for numbers.

Definition. A divisor g of f is called a proper divisor provided that


its degree is neither 0 nor deg(f ).

Note that this agrees with the definition in a general ring, because polynomials of
degree 0 are exactly the units.

Definition. A nonzero polynomial f is irreducible provided that its de-


gree is greater than 0 and it has no proper divisors. It is called reducible
provided that its degree is greater than 0 and it has proper divisors.

Thus f is irreducible when Div(f ) consists only of polynomials of the form c or cf


where c is a nonzero constant.
Lemma 7.35. If f is irreducible, and f - g, then f and g are relatively prime.

Proof. If c 6= 0, then cf - g either. Therefore Div(f, g) consists of only


nonzero constants, and we conclude that gcd(f, g) = 1. 
Proposition 7.36. If deg(f ) = 1 then f is irreducible.

Proof. Suppose f factored in some way as f = gh. Then


1 = deg(f ) = deg(g) + deg(h).
Since the degree of a polynomial is a whole number in N, either deg(g) or deg(h)
must be 0, and therefore one of them is not a proper divisor. 
Lemma 7.37. A nonzero polynomial f has a root in F if and only if it has a divisor
of degree 1.

Proof. If f (c) = 0 for c F , then (x c)|f (x) by Corollary 7.20. On the


other hand, suppose that some polynomial ax + b divides f with a 6= 0. Then since
a is a unit in F [x], the polynomial x + ab divides f (x), and therefore ab is a root
of f by Corollary 7.20 again. 
Proposition 7.38. If deg(f ) = 2 or 3 then f is irreducible if and only if it does
not have any roots in F .

Proof. We give the proof for deg(f ) = 3; the other case is similar. By the
Lemma, if f is irreducible, it does not have any roots in F . Now suppose f doesnt
have any roots in F . Suppose f factored in some way as f = gh. Then
3 = deg(f ) = deg(g) + deg(h).
Since these are all whole numbers, the only possibilities for the degree of g are
0, 1, 2, 3. If the degree is 0 or 3 then g is not a proper divisor. If the degree is 1
then f has a root by the Lemma, a contradiction. If deg(g) = 2 then deg(h) = 1
and so f has a root by the Lemma again. Therefore f is irreducible. 

Example: f = (x2 + 1)2 R[x] is reducible although it doesnt have any roots.
Lemma 7.39. If f is irreducible and f |gh then f |g or f |h.

Proof. Suppose f - g. By Lemma 7.35, gcd(f, g) = 1. By the Euclidean


Algorithm, there are polynomials a, b so that af + bg = 1. Multiplying this by h
we obtain af h + bgh = h. Since f divides each part of the left hand side, it divides
h as well. 
Lemma 7.40. If f and g are irreducible monic polynomials and f |g then f = g.

Proof. Because g is irreducible, f = cg for some c 6= 0. By comparing leading


terms we see that c = 1 and therefore f = g. 
2. POLYNOMIALS OVER A FIELD 161

Proposition 7.41. If G(x) F [x] is nonconstant then G has a monic irreducible


factor.

Proof. Exercise. 
Proposition 7.42. Let f be irreducible and n N. Then
Div(f n ) = {cf (x)e ; e n, c 6= 0}.

Proof. Exercise. 
Corollary 7.43. Let f, g be distinct monic irreducible polynomials, and m, n N.
Then gcd(f m , g n ) = 1.

Proof. Exercise. 

Recall from Exercise 3 in Section 1.5 :

Definition. Let f, g F [x] with f nonconstant and g 6= 0. Then


ordf (g) = max{i N; f i |g}.

Proposition 7.44. If f is irreducible then g, h N then ordf (gh) = ordf (g) +


ordf (h).

There is a small issue in stating the Existence part of the Fundamental Theorem
of Arithmetic. In the natural number case, it was obvious that there were only a
finite number of primes dividing a given N , simply because such primes must be
less than N . In the polynomial case, all we know a priori is that a polynomial
dividing G must have a smaller degree. If F is infinite, there are infinitely many
monic irreducible polynomials, even of degree one!

So we pause for a lemma.


Lemma 7.45. Let G be a nonconstant polynomial. There are only finitely many
monic irreducible divisors of G.

Proof. Let n be the degree of G, and suppose f1 , . . . , fn+1 are distinct monic
irreducible divisors of G. By Lemmas 7.40 and 7.35, these are coprime. Therefore
by Proposition 7.33, we see that the product (f1 f2 fn+1 )|G. But the degree of
this divisor is clearly greater than the degree of G, so this is impossible. 
Theorem 7.46. (Existence) Let G be a nonconstant polynomial, and let f1 , . . . , fm
be the monic irreducible divisors of G. Let ei = ordfi (G) for all such i. Let c be
the leading coefficient of G. Then
G(x) = cf1 (x)e1 f2 (x)e2 fm (x)em .

Proof. Write M (x) = f1 (x)e1 f2 (x)e2 fm (x)em . It is easy to see that


ordf (M ) ordf (G) for all irreducibles f . We know that fiei divides G for all i,
and by Corollary 7.43 we know these factors are pairwise coprime. By Proposition
7.33 we deduce that M |G. Thus G = M u for some u F [x]. If u is nonconstant
162 7. POLYNOMIALS

then by Proposition 7.41 u is divisible by some monic irreducible f . But then


ordf (u) > 0 and ordf (G) = ordf (M ) + ordf (u), contradicting that ordf (M )
ordf (G). Therefore u is a nonzero constant. By comparing the leading coefficients
we conclude that u = c. 

Theorem 7.47. (Uniqueness) Let G be a nonconstant polynomial, and suppose G


factors in some way as
0 0 0
G = c0 f1 (x)e1 f2 (x)e2 fr (x)er ,
with c0 F , all the fi monic irreducibles and e0i N. Then c = c0 , the fi are all
the irreducible monic divisors of G, and e0i = ordfi (G).

Proof. Obviously the fi at least form a subset of the irreducible monic divisors
of G, and the definition of ord implies that e0i ordfi (G). It follows that the degree
of the RHS of the equation in the corollary is no bigger than the degree of the RHS
of the equation in the theorem, and equality of degrees can only hold if we have
equality of the ei and e0i . By comparing the leading coefficients we conclude that
c = c0 . 

2.7. Exercises

(1) Propositions 7.22,7.27, 7.41, 7.42 and Corollary 7.43.


(2) A standard exercise in an introductory course in analysis is the follow-
ing. You are given a polynomial f (x) R[x] and a number a R, and
must prove directly that limxa f (x) = f (a). This relies on relating the
quantities |f (x) f (a)| and |x a|. Prove that x a always divides
f (x) f (a).
(3) Let F be a field, and f (x), g(x) F [x]. Suppose that every root of f is also
a root of g. Does f necessarily divide g? Give a proof or a counterexample.
(4) Find all the rational roots of the following polynomials.
(a) 8x3 36x2 + 54x 27
(b) 30x3 31x2 + 10x 1
(c) 23 x6 + x5 21 4 3 2
2 x + 2x + x 2
21
3 4 3 2 11
(d) 2 x + 10x + 7x + 2 x 3
(5) Use the mod 2 Polynomial Sieve worksheet from the 453 website to find
all irreducible polynomials in (Z/2Z)[x] of degree six or less. Note the
shorthand of writing the coefficients without the variables. For instance,
11001 denotes the polynomial x4 + x3 + 1. See any patterns?
(6) Are there infinitely many irreducible polynomials in (Z/2Z)[x]? Prove or
disprove.
(7) Suppose F is a field and f F [x] is a polynomial of degree n. Must f
have n distinct roots? Can f have more than n roots?
(8) Let F3 = Z/3Z be the field with three elements. Here is a list of all the
monic quadratic polynomials in F3 [x] :
x2 x2 + 1 x2 + 2 x2 + x x2 + x + 1 x2 + x + 2 x2 + 2x x2 + 2x + 1 x2 + 2x + 2
Circle the irreducible polynomials and cross out the reducible ones.
(9) Here are some quartic polynomials in F3 [x] :
2. POLYNOMIALS OVER A FIELD 163

x4 x4 + 1 x4 + 2 x4 + x x4 + x + 1 x4 + x + 2 x4 + 2x x4 + 2x + 1 x4 + 2x + 2
Circle the irreducible polynomials and cross out the reducible ones.
(10) Prove the product of two monic polynomials is monic.
(11) Let f (x) = an xn + + a1 x + a0 R[x], and suppose f has at least n + 1
distinct roots. Use linear algebra, notably the theory of the Vandermonde
determinant, to prove that f = 0.
(12) Prove Corollary 7.20 for a general ring R.
(13) If q(x) = an xn + + a1 x + a0 R[x], and f is a sufficiently differentiable
real-valued function on R, let
q(D)f = an f (n) (x) + + a1 f 0 (x) + a0 f (x).
Here f (n) denotes the nth derivative of f . Prove that if q = q1 + q2 , then
q(D)f = q1 (D)f + q2 (D)f , and if q = q1 q2 , then q(D)f = q1 (D)(q2 (D)f ).
(14) Suppose that p R[x] is the product of two relatively prime polynomials
p = p1 p2 . Prove that any solution to the differential equation p(D)f = 0
is the sum of two solutions f = f1 +f2 , where p1 (D)f1 = 0 and p2 (D)f2 =
0. [Hint: Apply the Bezout Identity to p1 and p2 .]
164 7. POLYNOMIALS

3. Irreducibility in C[x]

We already know that if F is a field, any polynomial of degree one is irreducible. In


this section we will argue that the converse is true in C[x]. We borrow the following
result from complex analysis:

Theorem 7.48. (Fundamental Theorem of Algebra, Weak Form) Every non-constant


polynomial in C[x] has a complex root.

Proof. (Sketch) Suppose p(x) C[x] is a nonconstant polynomial without


1
any roots. Then f (z) = p(z) is a complex-differentiable function defined on all of
C. Simple estimates with the triangle inequality show that |p(z)| diverges as |z|
approaches infinity, and thus f is bounded. Liouvilles theorem (take a course in
Complex Analysis) says that any bounded complex-differentiable function defined
on all of C is constant. Thus f and therefore p is constant. This is a contradiction.


Thus if f C[x] and deg(f ) 1, there is some number C so that f () = 0.


We know this implies that x |f . So if deg(f ) > 1, then f cannot be irreducible.
We conclude that the only irreducible polynomials are linear. The only monic
irreducible polynomials are of the form x , with C.

Together with Unique Factorization, this gives us:

Theorem 7.49. (Fundamental Theorem of Algebra, Strong Form) Every nonzero


polynomial f C[x] of degree n factors uniquely as
f (x) = c (x 1 ) (x n ),
where c C is nonzero and the i are roots of f .

The i are of course not necessarily distinct. There are, however, no other roots
of f as one can check by evaluating the right hand side, using that C is an integral
domain.

[partial fractions for C]

4. Irreducibility in R[x]

In this section we will find the irreducible polynomials in R[x].

Let f R[x] be a nonconstant polynomial. Since R[x] is a subring of C[x], we know


from the previous section that f has a complex root C. If R, then x |f
and we are done.

By a previous proposition, we know that is also a root of f .

Suppose
/ R, and consider the polynomial g(x) = (x )(x ).
5. IRREDUCIBILITY IN Q[x] 165

Note that the coefficients of g are ( + ) and , both real numbers. (One
can compute this directly or note that they are fixed by complex conjugation.)
Therefore g R[x].

Apply the division algorithm in R[x] to f and g. We conclude that there are
r, p R[x] so that f = pg +r, with r a constant or linear polynomial. So r = f pg.
Now the right hand side of this has two complex roots, and . This means r
cant be linear or a nonzero constant, and we conclude that r = 0 so g|f .

Let us summarize. If f R[x] is a nonconstant polynomial, then it has a complex


root . If the root is real, then x |f . If the root is not real, then the qua-
dratic polynomial g|f . This means that every polynomial of degree 3 and higher is
reducible!

So, any irreducible polynomials in R[x] must be linear or quadratic. All the linear
ones are irreducible, and the quadratic formula tells us that if a 6= 0, then ax2 +bx+c
is irreducible if and only if b2 4ac is negative.
Theorem 7.50. The only irreducible polynomials in R[x] are:

(1) Linear polynomials


(2) Quadratic polynomials ax2 + bx + c, where b2 4ac < 0.
Corollary 7.51. Every nonconstant polynomial f R[x] factors into linear and
quadratic polynomials.

Can you find a factorization of x4 + 1 in R[x]?

[partial fractions for R]

5. Irreducibility in Q[x]

[This section and the next are somewhat out of order.]

The field Q is algebraically much more complicated and interesting than the fields
C and R. There are irreducible polynomials of every degree, for instance xn 2 is
irreducible for all n. There is no good algorithm for determining whether a rational
polynomial is irreducible, but we sketch one method in this section.

First note that if f Q[x], one can multiply by a divisible enough integer N to
clear the denominators of f to get g = N f Z[x]. Then f is associate to g, so
f is irreducible if and only if g is irreducible. So we may assume that the f we
started with has integer coefficients. The nice thing about this is that one can look
at it mod p for various primes p. We must proceed cautiously however, because
a polynomial may be reducible in Z[x] but not in Q[x], for instance f = 2x + 2 =
2(x+1) is reducible in Z[x] but not in Q[x]. Since we want to focus on irreducibility
in Q[x] here is a new definition.
Definition. A polynomial f Z[x] is quasiirreducible if it is irreducible when
viewed as an element of Q[x].
166 7. POLYNOMIALS

Thus 2x + 2 is a quasiirreducible polynomial in Z[x].

Here is a key fact called Gausss Lemma, whose proof we defer until the next section:
Lemma 7.52. Let f Z[x]. If f = gh with g, h Q[x], then there is a nonzero
rational number c so that cg, 1c h Z[x].

For example, x2 4 = (2x 4)( 12 x + 1) is a factorization in Q[x] which c = 1


2 turns
into a factorization x2 4 = (x 2)(x + 2) in Z[x].
Corollary 7.53. If f Z[x] is irreducible, then it is quasiirreducible.

The previous lemma implies that if f factors in Q[x] it also factors in Z[x].

The next step is to study irreducibility in Z[x]. This is a very messy ring, but
has lots of nice quotient rings. Consider the modding out by p homomorphism
: Z[x] Fp [x] for various primes p. Write f for f mod p. The following is a nice
way to find irreducible polynomials.
Proposition 7.54. Suppose f Z[x] is nonzero and p does not divide the leading
coefficient of f . Suppose further that f mod p is irreducible. Then f is quasiirre-
ducible.

Proof. We give a proof by contradiction, since we do not wish to untangle


the three hypotheses. Suppose f is not quasiirreducible. This means there are
nonconstant polynomials g0 , h0 Q[x] so that f = g0 h0 . Gausss Lemma implies
that there are nonconstant polynomials g, h Z[x] so that f = gh. Recall that
LT(f ) = LT(g) LT(h). Since p - LT(f ), we know p - LT(g) and p - LT(h). It follows
from this that g and h are nonconstant, thus not units. Since f = gh, we conclude
that f is reducible mod p. This is a contradiction, so f is quasiirreducible. 

For example, we know that p(x) = x4 + x3 + x2 + x + 1 is irreducible in F2 [x].


The above proposition implies that any quartic integer polynomial congruent to
p(x) mod 2, thus with all odd coefficients, is irreducible in Q[x]. For example,
1017x4 + x3 7x2 + x 33 Q[x] is irreducible. So if you look at your mod 2
polynomial sieve, you can now prove that many integral polynomials are irreducible
in Q[x].

However, it only goes one way. The polynomial x2 + 1, for example, factors as
(x + 1)2 mod 2, but is irreducible in Q[x].

Remark: The polynomial 2x2 x is is divisible by x but irreducible mod 2. This


is why it is important that p not divide the leading coefficient.

6. Z[x]

The arithmetic of the ring Z[x] is a rich interplay between numbers and polynomials.
Since Z is not a field, there are polynomials of degree 0 which are not units in Z[x],
for example f (x) = 11.
6. Z[x] 167

We no longer have a division algorithm.


Proposition 7.55. If there are polynomials p, r with x2 = 2p(x) + r(x), then
deg(r) 2.

Proof. Modulo 2, we have x2 = r(x). 

There is no longer a nice theory of greatest common divisors. For example, the
following proposition should give you pause.
Proposition 7.56. There do not exist polynomials f (x), g(x) Z[x] with 2f (x) +
xg(x) = 1.

Proof. Exercise. 

Therefore we cannot expect to use a Euclidean Algorithm in the same way as in Z


or, say, Q[x].

On the other hand, we still have a perfectly good function deg, which satisfies
deg(f g) = deg(f ) + deg(g) since Z is a domain. We will now develop a theory of
ordp for Q[x].

Definition. Let p be a prime, and f (x) = a0 + a1 x + + an xn Q[x].


Then ordp (f ) = mini {ordp (ai )}.

For example, if f (x) = 12 + 6x + 74 x2 , then ord2 (f ) = 2, ord3 (f ) = ord7 (f ) = 1,


and all other ords of f are 0. (Recall that ordp (0) = .)

Let us deduce some properties of the function ordp on Q[x].


Theorem 7.57. Let p be a prime, and f, g Q[x]. Then ordp (f g) = ordp (f ) +
ordp (g).

Proof. This is clear if either f or g is the zero polynomial, so assume this is


not the case. Let us first prove the theorem when ordp (f ) = ordp (g) = 0. Note that
this happens if and only if f, g Z[x] pZ[x]. Thus neither f nor g is congruent
to 0 mod p. Since the ring (Z/pZ)[x] is an integral domain, the product f g is not
congruent to 0 mod p. Whence f g Z[x] pZ[x], and therefore ordp (f g) = 0.

More generally, let f0 = p ordp (f ) f and g0 = p ordp (g) g. Then ordp (f0 ) = 0
and ordp (g0 ) = 0, so ordp (p ordp (f ) f p ordp (g) g) = 0. It is easy to see that
generally ordp (pk h) = k + ordp (h) for h Q[x], so the previous equation becomes
0 = ordp (f g) ordp (f ) ordp (g), giving the desired result. 
Theorem 7.58. Let p be a prime. If f, g Q[x], then ordp (f +g) min{ordp (f ), ordp (g)}.
Moreover if ordp (f ) < ordp (g) then ordp (f + g) = ordp (f ).

Proof. Let f (x) = i ai xi and g(x) = i bi xi . By Proposition ?? we know


P P
that for all i, ordp (ai + bi ) min{ordp (ai ), ordp (bi )}. By the definition of ordp (f )
168 7. POLYNOMIALS

and ordp (g), it follows that for all i, ordp (ai ) ordp (f ) and ordp (bi ) ordp (g).
Thus for all i,
ordp (ai + bi ) min{ordp (ai ), ordp (bi )} min{ordp (f ), ordp (g)}.
Since ordp (f + g) = mini ordp (ai + bi ) is equal to the LHS of these inequalities for
some i, the first part of the proposition holds.

Now suppose that ordp (f ) < ordp (g). Say that ordp (f ) = ordp (am ) for some m.
Then ordp (f + g) = mini {ordp (ai + bi )} ordp (am + bm ), which is ordp (am ) by
Proposition ??. This shows that ordp (f + g) ordp (f ) = min{ordp (f ), ordp (g)}.
But the first part of the proposition shows the other inequality, and therefore they
must be equal. 

6.1. Exercises

(1) Factor the polynomial x4 + x2 + 9 into irreducible polynomials in each of


the rings Q[x], R[x], C[x], F2 [x], F3 [x], and F5 [x]. For the finite fields Fp ,
the coefficients are considered mod p.
(2) Let n N and consider the polynomial f (x) = xn 2 Z[x]. Suppose
f factors into f = gh in Z[x], with deg(g), deg(h) 1 . By modding out
the coefficients by 2 we get f = gh F2 [x].
(a) What can you say about g and h?
(b) What does that say about g(0) and h(0)?
(c) Why does this give a contradiction?
(d) Prove that f Z[x] is irreducible; the above does most of the work.
(3) Find an antiderivative of x41+1 .
(4) Let p R[x] be a polynomial with p() > 0 for all R. Prove that
there are polynomials q1 , q2 R[x] so that p(x) = q1 (x)2 + q2 (x)2 .
7. RATIONAL FUNCTIONS 169

7. Rational Functions

Let F be a field. In this section we construct the field of rational functions with
coefficients in F from the polynomial ring F [x]. We leave many routine details to
the reader.

Definition. Write X for the set of pairs of polynomials (f : g) with


f, g F [x] and g 6= 0. Consider the following addition and multiplication
laws on X:
(a : b) + (f : g) = (ag + bf : bg)
(a : b) (f : g) = (af : bg).

Proposition 7.59. The following is an equivalence relation on X:


(f : g) (h : k) if f k = gh.
The addition and multiplication laws on X are - invariant.

Definition. Write F (x) for the set of equivalence classes of X under


the above relation. By the previous proposition, the addition and multi-
plication laws on X give addition and multiplication laws on F (x).

Proposition 7.60. F (x) is a field.

Definition. F (x) is called the field of rational functions on F .

Write X [0] for the pairs (f : g) X with f 6= 0.


Proposition 7.61. The function deg(f : g) = deg(f ) deg(g) from X [0] to Z
is - invariant.

f f
As with rational numbers, we usually write g for [(f : g)] and f for 1.

f
Proposition 7.62. Let F (x). Then there are polynomials q, r with r = 0 or
g
deg r < deg g so that
f r
=q+
g g

f
Definition. A rational function F (x) is called topheavy provided
g
that deg f deg g. It is called bottomheavy provided that deg f <
deg g.

f
Note that this notion does not depend on the representative of the equivalence
g
class, by Proposition 7.61.

Thus, every rational function may be expressed as a sum of a polynomial and a


bottomheavy rational function.
170 7. POLYNOMIALS

7.1. Exercises

(1) Prove the unproven statements above.


(2) X does not satisfy two of the ring axioms; which ones?
(3) Describe the equivalence class of X which gives the multiplicative identity
1F (x) .
(4) Prove that there is no rational function in F (x) whose square is x.
(5) Prove that the function (f : g) = (f ) (g)  from
 X [0] to Z is
f
-invariant. Thus we may consistently define = (f ) (g).
g

8. Composition of Polynomials

We define composition of polynomials analogously to how we defined multiplication


of polynomials. Let R be a ring, and f, g R[x]. We define the composition f g.
first when f is a monomial.

Definition. If f (x) = axn , then (f g)(x) = ag(x)n .

Here is the recursive definition.


Definition.
(
0 p.t. f = 0,
f g =
LT(f ) g + f< g 6 0
p.t. f =

Here are some basic properties of composition, which the reader may verify:

Proposition 7.63. We have:

(f + g) h = (f h) + (g h).
(f g) h = (f g) (f h).
(f g) h = f (g h).

Here is an important point.

Proposition 7.64. Let R be an integral domain, and f, g R[x]. Suppose f 6= 0


and deg(g) 1. Then f g is nonzero and its degree is deg(f ) deg(g).

For the rest of this section we assume that R is a field F .

Proposition 7.65. Suppose that u F [x] has degree 1. Then there is a polynomial
v F [x] of degree 1 so that (u v)(x) = (v u)(x) = x.

Write v = u1 in the above situation.

Polynomials of degree 1 play the role of units here. We now define an analogue to
primality or irreducibility, but relative to composition.
8. COMPOSITION OF POLYNOMIALS 171

Definition. Let f F [x] with deg(f ) > 1. We say that f is decom-


posable provided that we may write f = g h, with deg g, deg h < deg f .
We say that f is indecomposable otherwise.

We are ruling out compositional factors of degree 1 because for any degree 1 poly-
nomial u, one always has the trivial decomposition f = (f u) u1 .
Proposition 7.66. Let f (x) = x4 + x Q[x]. Then f is indecomposable.

Here are some basic properties of indecomposability, which the reader may verify.
Proposition 7.67. We have:

If f is indecomposable and deg u = 1, then f u and u f are indecom-


posable.
If deg(f ) is prime then f is indecomposable.

Here is food for thought.


Proposition 7.68. Let f F [x] with deg f > 1. There are indecomposable poly-
nomials g1 , . . . , gn with f = g1 gn .

It is an interesting question to ask how unique this decomposition is. Certainly,


one should distill out the interference of the unit polynomials u. For instance,
(g1 u) (u1 g2 ) = g1 g2 . Next one must deal with examples such as x2 x3 =
x3 x2 . We do not pursue this interesting question further here.

8.1. Exercises

(1) Let R be a ring, and f, g, h R[x]. Let a = (f ), b = (g), and c = (h).


Suppose that a 1, and b < c. Let d = b(a 1) + c. Prove that
f (g + h) f g mod xd .
What happens if a = 0?
(2) Let t(x) = x + 1 Q[x]. Prove that if f (x) Q[x] with f t = t f , then
f (x) = x + c, for some c Q.
(3) Prove that if f, g Q[x] are two quadratic polynomials with f g = g f ,
then f = g.
(4) Let t(x) = x + 1 F2 [x]. How many polynomials f F2 [x] can you find
with with f t = t f ?
(5) Let f (x) = x4 Q[x]. Find two different ways to express f as the
composition of two quadratic polynomials.
(6) List all the indecomposable quartic (degree 4) polynomials in F2 [x].
(7) Let f be a quadratic polynomial in Q[x]. Prove that there are degree 1
polynomials u1 , u2 Q[x] so that (u1 f u2 )(x) = x2 . Is the analogous
fact still true in F2 [x]?
(8) Prove that (f g) = (f ) (g) for nonzero polynomials f, g R[x],
when R is a domain.
(9) Let R be a ring, and g R[x]. Prove that the subset {f g | f R[x]} is
a subring of R[x].
172 7. POLYNOMIALS

(10) Let F be a field, and f, g, h F [x], with f, h nonconstant. Suppose that


the division algorithm for f g gives a quotient of q and a remainder of
r. Show that the division algorithm for (f h) (g h) gives a quotient
of q h and a remainder of r h.
(11) Let F be a field, f F [x] and r F (x), with both f and r nonconstant.
We may define f r exactly as above. Prove that f r F (x) is not
constant.
(12) Let F be a field, and r, s F (x), with s nonconstant. Define the composi-
tion r s. (Define it first for two members of X, then show your definition
is -invariant.)
(13) (Continuing) Give counterexamples to show that deg(r s) 6= deg(r)
deg(s), and (r s) 6= (r) (s) in general.

9. Chapter 6 Wrap-Up

9.1. Rubric for Chapter 6

In this chapter you should have learned

The basics of roots and factorization of polynomials.


How to construct the field of rational functions.
Basic properties of composition.

9.2. Toughies

(1) Find the nilpotent elements in the polynomial ring R = (Z/4Z)[x].


(2) Find all the units in R = (Z/4Z)[x].
(3) Find all the zero divisors in R = (Z/4Z)[x].
(4) Generalize the above three exercises to R = (Z/nZ)[x], where n > 1.
What about a more general ring than Z/nZ?
(5) The polynomial f (x) = 12 x2 + 21 x Q[x] has the property that f (z) Z
for all z Z. Find all polynomials in Q[x] with this property.
(6) You may have noticed when doing the mod 2 Polynomial Sieve that if a
polynomial a0 + a1 x + + an xn is irreducible, with a0 , an 6= 0 and n > 1
then so is its reverse an + an1 xn1 + + a0 xn . Prove that this is true
for polynomials in any field.
(7) Prove that there is an indecomposable polynomial of every degree (greater
than one) in Q[x].
(8) Let F be a field. Any nonzero r F (x) may be written as r = fg with
f and g relatively prime. In this case, we call max(deg(f ), deg(g)) the
height of r, or height(r). Suppose that r, s F (x) are nonconstant. Prove
that height(r s) = height(r) height(s). State and prove an analogue of
Proposition 7.68 for rational function composition.
(9) In the example of the synthetic division of 6x5 17x4 + 9x3 + 8x2 11x2
by x 61 , we saw that every coefficient of the quotient 6x4 18x3 + 12x2 +
6x 12 is an integer and divisible by 6. Prove the following general fact:
9. CHAPTER 6 WRAP-UP 173

Let f Z[x] and pq Q a reduced fraction. if f ( pq ) = 0 then every


coefficient of the polynomial f (x) (x pq ) is an integer divisible by q.
CHAPTER 8

Real Numbers

175
176 8. REAL NUMBERS

1. Constructing R

In an analysis class one needs axioms for the real numbers R. There are various
formulations of such axioms, but they all mean that R is a complete ordered field,
which we will define in the next section. But even after stating the properties you
want R to satisfy, there are still two logical quandaries. First, is there such a field?
Maybe setting up so many axioms leads eventually to a logical paradox. This
fear can only be assuaged if we construct an example of such a field. Second, is
there more than one such field? This question is subtle and requires the notion of
isomorphism, which we defer until later. [As of now unwritten.]

In this chapter we construct the real numbers in three different ways, via decimals,
Dedekind cuts, and equivalence classes of Cauchy sequences. Each of these has
their merit. Decimals are practical for computation and learned at a young age.
However operations with decimals are difficult to work into an axiomatic framework.
Dedekind cuts give a good framework for proofs, but are a little abstract. The
Cauchy sequence approach is the most abstract, but heres something interesting.
If you change the meaning of what convergence means, you may get an entirely
new field, the p-adic numbers!

All three of these approaches involve having some kind of analytic point of view.

2. Ordered Fields

In this section we define the phrase complete ordered field, together with other
important notions.

An ordered field is not simply an ordered set which happens to be a field; the
ordering must interact with the ring structure.

Definition. Let F be an totally ordered set which is also a field. Then


F is an ordered field provided that the following properties hold:
(1) If a, b, c F then a < b implies a + c < b + c.
(2) If a, b, c F then a < b and c > 0 implies that ac < bc.

Our only example of an ordered field so far is Q.

Definition. Let F be an ordered field. Then F is a complete ordered


field if every nonempty subset of F which is bounded above has a least
upper bound in F .

Theorem 8.1. Q is not a complete ordered field.

Proof. Consider the set C = {a Q | a2 < 2}. C is nonempty since 1 C.


It is bounded above by 2.

Suppose some rational number q = sup C, and suppose first that q 2 < 2. By the
first part of the lemma below there is a number q 0 > q with q 0 C, a contradiction
to q being an upper bound. Now suppose that 2 < q 2 . By the second part of
3. DECIMAL EXPANSIONS 177

the lemma below there is a rational number q 0 < q with q 0 an upper bound of C,
another contradiction.

The only remaining possibility is that q 2 = 2, but we know very well this is impos-
sible. 
Lemma 8.2. If a Q+ satisfies a2 < 2, then there is a Q+ so that (a + )2 < 2.
If b Q+ satisfies 2 < b2 , then there is a Q+ so that < b and 2 < (b )2 .


Proof. For the first part, put = 2 a2 Q+ and = min{1, 4a } Q+ .
2
Since 1, we have . Since a 1 we have 4a 4 . Therefore 4 .
2

Meanwhile, since 4a , we have 2a 2 . It follows that (a+)2 = a2 +2a + 2
2 2
a + 2 + 4 < a + = 2, as desired.

We leave the proof of the second part to the reader. 

3. Decimal Expansions

In this section we will discuss the decimal construction of the real numbers R.
The theory of place values and decimals does not lend itself to pleasant proofs,
so we will not give many. We assume the reader is acquainted with basic decimal
arithmetic.
Definition. Let D be the set of decimal expansions, i.e., expressions of the form
 = dn dn1 d0 .d1 d2 d3 , with ai , di {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The lead-
ing digit an should not be 0 if n N. We call di the ith digit of . We say 
is terminating if there is an ` Z so that if i < `, then di = 0. In this case d` is
called the last digit of .

For example, we have the decimal expansion 1.3333333 with repeating 30 s. Often
one writes this as 1.3 for brevity. This example corresponds to the rational number
4
3.

Remark: These will give nonnegative real numbers, which are complicated enough...

For k Z write ek for the expansion whose digits are zero except at the kth place.
For example e3 = 0.0010. (In fact ek = 10k .)
Definition. Let k N. If  = dn dn1 d0 .d1 d2 d3 is a decimal expansion,
then its kth truncation bck is the expansion dn dn1 dk 0.

For example, bc0 = 3 and bc2 = 3.14.

Certainly if k < `, and you know bc` , then you know bck as well, by truncating
sooner. In fact, bbc` ck = bck .

Consider an expansion  D. Since you can always truncate a truncation as above,


knowing the digits of  is the same as knowing the terminating decimals bck for
sufficiently large k. We will soon define addition in D by describing what the kth
truncations of a sum should be for k arbitrarily large.
178 8. REAL NUMBERS

Defining the addition of two decimal expansions is a little tricky. Suppose you have
two expansions  = dn dn1 d0 .d1 d2 and 0 = d0n d0n1 d00 .d01 d02 , and
want an expansion for  + 0 . The basic idea is to add the corresponding digits, but
if they add up to more than 9, then carrying is involved.

If  and 0 terminate, then addition as usual by starting with the smallest place
where one of them is nonzero and adding vertically as usual, possibly with carrying.
We do not give more detail here, but it is commutative and associative.

If neither  and 0 terminate, then we have to think a little. For example, consider
the addition problem
58.793
+ 41.206

The digits of the sum depend on the omitted digits to the right! If the other
digits are all 00 s, for instance, then the sum is the terminating expansion 99.9990
If the digits in the next place add to more than 10, then the expansion starts as
100.000 . . ..

In reality, these two potential sums correspond to rather close numbers, so the
difference is mild. But to write down a general addition rule with digits takes some
willpower. Here goes.
Definition. Fix two expansions  and 0 as above. Let i Z be a place. Let si be
the sum of the digits in the ith place of  and 0 . If si 8, then i is called simple.
If si 10, then i is called enhanced. If si = 9, then i is called precarious.

[Example]

Let  and 0 D, and k N. Suppose we want to specify b + 0 ck . If we specify


b + 0 c` for some ` > k, thats even better.

Case I: Some place ` > k is simple. In this case, put


b + 0 c`1 = bc`1 + bc`1 .

Case II: Some place ` > k is enhanced. In this case, put


b + 0 c`1 = (bc`1 + bc`1 ) + e`1 .

Case III: Every place ` > k is precarious. In this case, put


b + 0 ck = bck + bck .

Note that we can do all the additions on the right because the expansions involved
terminate.

The above gives an addition law on D. It is easy to see that it is commutative, since
it is commutative for terminating decimals, and since the sum si does not depend
on the order.

If someone has a nice argument for why the law is associative, drop me a line!
3. DECIMAL EXPANSIONS 179

What about multiplication? If we wish to multiply a decimal expansion  by a


single digit d in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, we can simply add  to itself d times (Or
get 0.0, if d = 0). (Alternatively one could develop a theory of multiplication of
digits and carrying as taught in school.)

Multiplication of  by one of the ei is performed by shifting  by i places [which


direction?].

If we wish to multiply a decimal P expansion  by a terminating expansion , we


can
P write as a finite sum i i i , where the di are digits. Then put  =
d e
(d
i i (e i )). Note this is perfectly defined by the above. One should in principle
check now various properties like distributivity, commutativity, and associativity
with the appropriate inputs (for instance, if the inputs are all terminating).

Some difficulty comes when we consider multiplying a nonterminating expansion


by a nonterminating expansion. Consider, for instance, the multiplication of 2.2
with 0.3:
2.22222
0.333333
0.666666 . . .
+ 0.066666 . . .
+ 0.006666 . . .
+ 0.000666 . . .
..
+ .
?

Do you know any of the digits of the result? It looks like the tenths place of the
result is going to be a 7, because the 6 + 6 in the hundredths place adds up to a
12, and one carries the 1. But what if theres a massive amount of carrying before
the hundredths place, and somehow an 8 gets carried, leading to 6 + 6 + 8 = 20?!
As you go to lower and lower places in this multiplication, you can see that the
carrying increases without bound. At the 10100 place, for example, the sum is at
least 600, which means that the carried number is 60!

I hope you can see from this example that developing an algorithm where you input
the (infinitely many) digits of two decimal expansions and output the digits of their
product would be a chore. You know something like

b 0 ck = bck bck

should be true in the limit , even if its not exactly true as it stands.

Addition was simple enough to trudge through with case analysis, but if you want
to have a real number system with clearly defined arithmetic operations one can
work with in an intelligible way, decimal expansions are not the way to go.

Remark: By the way, the above example is comprised of repeating decimals for
simplicity. The clever reader knows this problem is really 20 1
9 and 3 and so the
20
product is 27 = 0.740. But the point was to find a rule that works for all decimal
expansions.
180 8. REAL NUMBERS

Here is another problem. How do you subtract 0.9 from 1.0? We havent discussed
subtraction yet, but the only reasonable decimal expansion which could be the
difference is 0.0. On the other hand, the rule for addition gives 0.9 + 0.0 = 0.9.
At no point in our definition of D did we say that 0.9 = 1.0, so they are different
elements of D. This is not an isolated occurence either; any time you have a
terminating expansion like 0.340 there is a repeating 9s expansion like 0.339 lurking
in its shadow.

This is not an insurmountable problem. One can define an equivalence relation


on D by saying any terminating decimal is equivalent to its repeating 9s shadow.
For example, 45.60000 . . . 45.5999 . . .. Thus equivalence classes have either 1 or
2 expansions; 1 if its neither terminating nor repeating 9s, and 2 if it is. This is
the formal way of saying such expansions should be equal.

Then you can define the nonnegative real numbers R0 to be the set of equivalence
classes. One patiently checks that addition is -invariant, and if anyone gets around
to defining multiplication, subtraction, and division, they can check that those are
also well-defined. The next step is probably to define the notion of <, and then
define negative real numbers and their arithmetic.

At the end of the day youve done a lot of work, but youve kept in touch with your
roots as a student of the decimal system.

4. Dedekind Cuts

4.1. Positive Real Numbers

For simplicity we will actually construct the positive real numbers R+ first. One
can then form R just as we formed Z from N.

Definition. Let C be a subset of Q+ . Then an upper bound of C is


an element x Q+ so that c x for all c C. The set C is considered
bounded above if there is an upper bound of C. If there is an element
x C which is an upper bound of C, then x is called a maximum of C.

It is easy to see that a set has at most one maximum.

Definition. A cut is a nonempty subset C of Q+ which is bounded


above, but has no maximum, and satisfying the following property. If
a C and b is a positive rational number so that b < a, then b C.

We occasionally refer to the last condition as C being left closed.

In other words, a cut is an open rational line segment whose left endpoint is 0.
Intuitively, the real numbers are the right endpoints of these cuts.

Definition. If q Q+ let Cq) = {a Q+ | a < q}.

In fact, Cq) = (0, q), with the understanding that (0, q) = {x Q | 0 < x < q}.
4. DEDEKIND CUTS 181

Proposition 8.3. Cq) is a cut.

Proof. Note that Cq) is nonempty since 2q Cq) . Also, Cq) is bounded above
by q. If a Cq) then the average a+q a+q a+q
2 satisfies a < 2 < q, and therefore 2 Cq ,
so a is not the maximum of Cq) . Thus Cq) has no maximum. It is easy to see that
Cq) satisfies the rest of the definition of cut, by the transitivity of inequality. 

Are there any other cuts of Q+ ?


Proposition 8.4. Let C = {a Q+ | a2 < 2}. Then C is a cut, but is not equal
to Cq) for any positive rational q.

Proof. C is nonempty since 1 C. It is bounded above by 2. Suppose


1 a C. By Lemma 8.2, there is a positive number > 0 so that a + C.
Therefore a cannot be a maximum of C. It is easy to see that C satisfies the rest
of the definition of cut.

Suppose C = Cq) for some rational number q, and suppose first that q 2 < 2. By
the first part of Lemma 8.2 there is a number q 0 = q + with q 0
/ Cq) but q 0 C,
2
a contradiction. Now suppose that 2 < q . By the second part of Lemma 8.2 there
is a number q 0 = q Q+ with q 0 Cq) but q 0
/ C, another contradiction.

The only remaining possibility is that q 2 = 2, but we know very well this is impos-
sible. 

As you may have guessed, this cut C will correspond to the irrational number 2.
Lemma 8.5. Let C be a cut and Q+ . Then there are numbers p C and q
/C
so that 0 < q p < .

Proof. Since C is nonempty we have an element a C. Write a as a fraction


m
of integers . By multiplying numerator and denominator by a sufficiently large
n
integer, we may assume n > 1 . Now let
 
i
S= iN| C .
n
Since m S it is nonempty. Let b be an upper bound of C; then nb is an upper
bound of S. By Well-Ordering, S has a maximum element M . Then it is easy to
see that p = M
n and q =
M +1
n satisfy the conditions of the lemma. 

Definition. Write R+ for the set of all cuts.

First we show how R+ is an ordered set.

Definition. Say that C1 C2 if C1 is a subset of C2 .

Of course, we define C1 < C2 if C1 C2 but C1 6= C2 . We also define >, in the


usual way, using inclusion the other way around.
182 8. REAL NUMBERS

Proposition 8.6. (Trichotomy) If C1 and C2 are cuts, then C1 C2 or C2 < C1 .

Proof. If C1 is not a subset of C2 , there is an element x1 C1 which is not


in C2 . Let x2 C2 . If x2 x1 then x1 would be a member of C2 by the definition
of cut. Since this is impossible we must have x2 < x1 . So by the definition of cut,
x2 C1 . Therefore C2 < C1 . 

We will soon define addition, multiplication and division for R+ . But first we will
define the sup operation.
+
S S R is a nonempty set of cuts which is bounded above.
Theorem 8.7. Suppose

Then the union C = CS C is a cut.

Proof. Write C for the union. It is obviously nonempty and bounded above.
Suppose C had a maximum m C . There must be a cut C S with m C,
and it is easy to see that m is a maximum of C , a contradiction. This shows that
C does not have a maximum. Suppose a C , and b < a. Then a C for some
C S, and therefore b C C . This shows that C is left-closed. 

We will give C a better name: sup S.

Definition. If S R+ is a nonempty set of cuts which is bounded above,


then the union of C S is called the supremum of S. Write sup S R+
for this union.

Note that

C sup S for all C S.


If D is an upper bound of S, then sup S D.

Here is the reasoning for the second point: Recall that C SD means C D as
a set of rational numbers. If C D for all C S, then CS C D, which
translates to sup S D.

Definition. If C1 and C2 are cuts, then let C1 + C2 = {x1 + x2 | x1


C1 , x2 C2 }. Also let C1 C2 = {x1 x2 | x1 C1 , x2 C2 }.
Proposition 8.8. If C1 and C2 are cuts, then C1 + C2 is a cut.

Proof. It is easy to see that both sets are nonempty. If b1 and b2 are upper
bounds of C1 and C2 then b1 + b2 is an upper bound of C1 + C2 .

Suppose m is a maximum of C1 + C2 . Since m C1 + C2 , it may be written as


m = x1 + x2 with xi Ci (i = 1, 2). Since C1 is a cut, x1 is not an upper bound
of C1 , and therefore there is a number y1 > x1 in C1 . Then y1 + x2 C1 + C2 ,
contradicting the maximality of m.

Let x1 + x2 C1 + C2 with xi Ci , and suppose y < x1 + x2 . Then by Exercise 7


in Section 2.3, we may write y = y1 + y2 with y1 < x1 and y2 < x2 . Since C1 and
C2 are cuts, y1 C1 and y2 C2 , and therefore y C1 + C2 . 
4. DEDEKIND CUTS 183

Proposition 8.9. (Cancellation Law of Addition) Suppose C0 , C1 , and C2 R+ ,


and that C0 + C1 = C0 + C2 . Then C1 = C2 .

Proof. We argue by way of contradiction. Suppose that they are not equal.
By Trichotomy, we may assume that C1 < C2 (the case C2 < C1 is similar). Then
there is a number a C2 C1 . Since a is not the maximum of C2 , there is also
a number b C2 with a < b. Let = b a. By Lemma 8.5, there are numbers
x, y with x + < y, x C0 , and y / C0 . Note that x + b C0 + C2 . Since
C0 + C1 = C0 + C2 , we must be able to write x + b = x0 + x1 , with x0 C0 and
x1 C1 . Since a = b is not in C1 , we have x1 < b . Thus x + b x0 < b ,
or x + < x0 . However, x + > y > x0 , a contradiction. 

Let us check that for rational cuts Cq) , we recover addition in Q+ .


Proposition 8.10. If q1 , q2 Q+ , then Cq1 ) + Cq2 ) = Cq1 +q2 ) .

Proof. First we check that Cq1 ) + Cq2 ) Cq1 +q2 ) . If x Cq1 ) and y Cq1 ) ,
then x < q1 and y < q2 . Therefore x+y < q1 +q2 , which shows that x+y Cq1 +q2 ) .
To check the other inclusion, suppose that z Cq1 +q2 ) . Then z < q1 + q2 . By
Exercise 7 in Section 2.3 again, we may write z = x + y with x < q1 and y < q2 .
This shows that z Cq1 ) + Cq2 ) , as desired. 
Proposition 8.11. If C1 and C2 are cuts, then C1 C2 is a cut.

Remark: This Proposition would not be true if we allowed the negative numbers
in our cuts.

Proof. Exercise. 
Proposition 8.12. If q1 , q2 Q+ , then Cq1 ) Cq2 ) = Cq1 q2 ) .

Proof. Exercise. 
Proposition 8.13. Suppose that C1 C2 , and C0 R+ . Then C1 + C0 C2 + C0
and C1 C0 C2 C0 .

Proof. Exercise. 
Proposition 8.14. If C is a cut, then C C1) = C.

Proof. First we check that C C1) C. If x C and y C1) , then xy < x.


Since C is left-closed, this implies that xy C.

Next, suppose that x C. Since C does not have a maximum element, there is a
x1 C with x < x1 , and therefore xx1 C1) . Thus x = x1 xx1 C C1) . It follows
that C C C1) . 
Lemma 8.15. Let C be a cut, and write S = {C 0 R+ |CC 0 C1) }. Then S is
nonempty and bounded above.
184 8. REAL NUMBERS

Proof. Since C is nonempty, some x C. Then S is bounded above by x1 .


Suppose C is bounded above by b. Then Cb) S, so S is nonempty. 
Proposition 8.16. Let C be a cut, and C = sup S, where S is the set in the above
lemma. Then CC = C1) .

Proof. It is easy to see that CC = CC 0 C1) ; we must show the


S
C 0 S
other inclusion.

Let q C1) so that 0 < q < 1 and put = 1 q. Let x C and y / C as in the
lemma below. Since y / C the cut Cy1 ) is in S. Therefore xy CCy1 ) CC .
1

But then 1 xy < 1 q implies that q < xy 1 . Since CC is a cut, we must


1

have q CC as well.


Lemma 8.17. Let C be a cut, and Q+ . Then there are elements x C and
/ C so that 0 < 1 xy 1 < .
y

Proof. Since C is nonempty there is a number a C. By Lemma 8.5, there


are numbers x C and y / C so that 0 < y x < a. Since C is left-closed, a < y,
and therefore y x < y. Dividing this inequality by y gives the lemma. 

Definition. Let C be a cut. Then write C 1 for the cut C from the
previous proposition.

Thus C 1 is a cut so that CC 1 = C1) .


Proposition 8.18. Commutativity and Associativity of addition and multiplication
hold for cuts, as does Distribution.

4.2. Exercises

(1) The rest of Lemma 8.2 and Propositions 8.11 and 8.12.
(2) Let C1 and C2 be cuts with C1 < C2 . Prove that there is a rational
number q so that C1 < Cq) < C2 .
(3) Prove that if C is a cut, then C = supqC Cq) .
(4) If C is a cut, is the set {y Q+ |xy < 1 for all x C} necessarily a cut?
1
(5) Let q Q+ . Prove that Cq) = Cq1 ) .
(6) Let C be a cut. Prove that there is a cut D so that D2 = C.
(7) Definition 5 of Book V of Euclids Elements reads, Magnitudes are said
to be in the same ratio, the first to the second and the third to the
fourth, when, if any equimultiples whatever are taken of the first and
third, and any equimultiples whatever of the second and fourth, the former
equimultiples alike exceed, are alike equal to, or alike fall short of, the
latter equimultiples respectively taken in corresponding order. Explain
how this is essentially the notion of a cut. (Remark: Online commentary
on this topic is confusing and irrelevant to this problem, so dont bother
reading it.)
4. DEDEKIND CUTS 185

4.3. Additive Identity

In this section we add the element 0 to R+ . At this point you can forget the
definition of cuts; all we need is the properties weve been accumulating.

Definition. A set P with operations + and is called a prefield, if it


satisfies the following properties:
(1) Associativity, Commutativity, and Distributivity of Addition
and Multiplication
(2) Cancellation Law of Addition
(3) Existence of Multiplicative Identity 1
(4) Existence of Multiplicative Inverse
(5) Nontriviality (there is an element which is not 1)

For example, Q+ and R+ are prefields. If F is an ordered field, then the positive
elements form an ordered prefield.

Definition. A prefield P with a total ordering < is called an ordered


prefield if the following properties hold. Let x, y, z P .
If x < y then x + z < y + z.
If x < y then xz < yz.

If F is an ordered field, then the positive elements form an ordered prefield.

One of the properties a prefield is missing is an additive identity.

Actually the definition precludes such an element:


Lemma 8.19. Let P be a prefield, and x, y P . Then x + y 6= x.

Proof. Suppose this were the case. Multiplying by the inverse of x gives
1 + x1 y = x1 y. Since P is nontrivial, there is an element 1 6= z P . Adding z
to both sides of the equation gives 1 + x1 y + z = x1 y + z. By the Cancellation
Law we obtain
(3) 1 + z = z.
Adding 1 to both sides of this equation and cancelling zs gives 1 + 1 = 1. By
Distribution it follows that z + z = z 1 = z. Substituting z + z into the right
hand side of Equation (3) gives 1 + z = z + z. By Cancellation we obtain 1 = z, a
contradiction. 

Definition. Let P be a prefield. Then write P = P {0} for the set


obtained from P by adding a new element {0}. Extend the addition, and
multiplication from P to P as follows.
(1) 0 + x = x + 0 = x for all x P .
(2) 0 x = x 0 = 0 for all x P .

Proposition 8.20. If P is a prefield (resp. an ordered prefield), then the new set
P has all the properties of a prefield (resp. and ordered prefield), except that 0 does
not have a multiplicative inverse.
186 8. REAL NUMBERS

Proof. Let us check the Cancellation Law of Addition. If x + 0 = y + 0, then


clearly x = y. Suppose that 0 + x = y + x. Then by the above lemma, y / P and
therefore y = 0. The other properties the reader should check, with or without a
pencil. 

Definition. If P is an ordered prefield, we may extend the ordering on


P to P simply by saying 0 < x for all x P .

It is easy to see that this is an ordering on P .

4.4. Adding the Negatives

In this section we complete the construction of R. We are working in analogy with


the construction of the integers Z from N; the reader may wish to review that
section for motivation and proofs.

Definition. Let P be a prefield. Denote by A(P ) the set of arrows


ha, b], where a, b P . Arrows hx, y] and ha, b] A(P ) are congruent if
x + b = a + y.
Proposition 8.21. Congruence is an equivalence relation.

Proof. Exercise. Your proof should involve the Associative and Commutative
Laws for Addition, and the Cancellation Law. 

For convenience we have again the following proposition.


Proposition 8.22. Every arrow is congruent to an arrow where at least one of the
components is 0.

Proof. Exercise. 

Every arrow is thus equivalent to hx, 0], h0, x], or h0, 0], for some x P . We call
the arrows in the first case positive and in the second case negative.

We have the same rules for adding and multiplying arrows in A(P ):

Definition. For a, b, c, d P , put ha, b] + hc, d] = ha + c, b + d] and


.
ha, b] hc, d] = hac + bd, ad + bc]

Proposition 8.23. Addition and multiplication in A(P ) is congruence-invariant.

Proof. Omitted for now. 

Definition. The set of equivalence classes of A(P ) under = is denoted


by P [Z]. We give P [Z] the addition and multiplication laws coming from
those in A(P ).

Theorem 8.24. If P is a prefield then P [Z] is a field.


4. DEDEKIND CUTS 187

Proof. Omitted for now. 

For x P , write x for the equivalence class of hx, 0] and x for the class of h0, x].

As in the case of Z, P [Z] is the union of the xs, the xs, and 0. Therefore as a
set, P [Z] = P P {0}. This is how we would usually think of it.

Definition. Suppose P is an ordered prefield. We define an ordering on


P [Z] as follows: Let x, y P [Z]. If x, y P then use the ordering on P .
If x, y P then say x < y if and only if y < x. If x P and y P
then x < y.
Theorem 8.25. If P is an ordered prefield then P [Z] is an ordered field. If P is a
complete ordered prefield then P [Z] is a complete ordered field.

Proof. Omitted for now. 

Definition. Let P = R+ . Then we denote by R the complete ordered


field P [Z].

4.5. Exercises

(1) Let X be a nonempty set, and P = {f F(X, R)|f (x) > 0 for x X}.
Show that P is a prefield.
CHAPTER 9

Miscellaneous

189
190 9. MISCELLANEOUS

1. An ODE Proof

If a function is equal to its own derivative, what is it? Certainly f (x) = ex is


such a function, but are there any others? Of course f (x) = 0 will do, and if you
think about it, any multiple of ex is also equal to its own derivative. How about
f (x) = ex+1 ? Is that another example? Can clever people forever produce more
examples you havent thought of? It is a very important and applicable fact that
every such function is a multiple of ex . The simplest differential equations given to
us by the real world are all of the form y 0 = ky for k a constant, and it is crucial
to pin down solutions to this kind of differential equation.

Let us write our statement formally as a proposition:


Proposition 9.1. Let f : R R be a differentiable function so that f 0 (x) = f (x)
for all real numbers x. Then there is a real number C so that f (x) = Cex for all x.

We shall call this two-sentence assertion a Proposition. The proposition divides


into two parts: the first sentence is the hypothesis and the second statement is the
conclusion. The hypothesis describes the assumption we expect to use to prove the
conclusion.

f (x)
Proof. Consider the function g(x) = . Note that this division makes
ex
ex f 0 (x) f (x)ex
sense because ex is never zero. By the quotient rule, g 0 (x) = .
0 x 0 x
e2x
Since f (x) = f (x) for all x, we also have e f (x) f (x)e = 0 for all x. Therefore
f (x)
g 0 (x) = 0; therefore g is a constant function. Call the constant C; thus x = C,
e
and it follows that f (x) = Cex , as claimed. 

This puts the issue to rest. We do not need to worry about any clever people
thinking up other solutions, or more importantly whether our real-world phenom-
enon modeled by y 0 = y can be anything other than a multiple of the exponential
function. (What about f (x) = ex+1 ? If youre really stuck try using the proof to
find the C.)

I want to now present a bad proof of the proposition above.

Proof: Let y = f (x). We have


dy
= y,
dx
so
1
(4) dy = dx.
y
So, Z Z
1
dy = dx.
y
So, for some integration constant C1 ,
log |y| = x + C1 .
1. AN ODE PROOF 191

Thus,
|y| = ex+C1 .
So
y = ex eC1 ,
and therefore
y = Cex ,
where C = eC1 . 

This argument is well-known to many students, although at a key step it the


argument is nonsense. That step is at Equation (4), where we have multiplied both
sides by dx and divided both sides by y. Dividing both sides by y is a mild sin,
since it could be zero. But multiplying both sides by dx is the real sham. What
is dx? Is it a number? Is it a function? There is no answer to this question.
Proponents of this proof will tell you something like, It is an infinitesimal quantity
that approaches 0. They are usually confused about the 0/0 indeterminate form
of a limit, and are essentially multiplying both sides of the equation by 0. This
confusion is passed on to the students.

Is it harmful to treat the dx as a mathematical object that you can multiply or


divide by? Definitely; once we learn about partial derivatives, we encounter formulas
of the following kind. Suppose that f (x, y) is a real-valued function of two variables,
and x(t), y(t) are real-valued functions of one variable. View f = f (x(t), y(t)) as a
function of the variable t. Then:

f f x f y
= + .
t x t y t

If x and y were independent mathematical quantities, then one could cancel them
and be left with the paradoxical and very incorrect
f f
=2 .
t t

Thus, students who have learned the wrong proof have bad intuition and are
now completely confused about partial derivatives. It is better not to memorize
mumbo-jumbo, and best to learn real proofs.

1.1. Exercises

(1) Suppose that f : R R is a differentiable function so that f 0 (x) = f (x)


for all real numbers x. Prove that there is a real number C so that
f (x) = Cex for all real numbers x.
(2) Suppose that f is a real-valued differentiable function with domain (0, )
so that xf 0 (x) = f (x) for all x > 0. Prove that there is a real number C
so that f (x) = Cx for all positive real numbers x.
192 9. MISCELLANEOUS

2. Pythagorean Triples

Around 1800 B.C.E. a Babylonian clay tablet was found recording triples of numbers
such as (3, 4, 5), (8, 15, 17), (6, 8, 10), (5, 12, 13). Mathematicians recognize these as
integer solutions to the equation a2 +b2 = c2 . The triples of integers (a, b, c) solving
this equation are called pythagorean triples. I want to allow negative solutions as
well, so notice that any solution (a, b, c) also gives solutions (a, b, c) and also
(b, a, c).

How do you get all of the solutions? We will use a technique called algebraic
geometry to study this problem. This section will bring in more powerful ideas
from than you may be comfortable with. Some facts about integers will be proved
rigorously much later. Dont get too nervous, just sit back and enjoy the ride. It
is good to be occasionally exposed to deeper mathematical thought.

The first observation is that if (a, b, c) is a pythagorean triple with c 6= 0, then


(x, y) = ( ac , cb ) is a rational point on the circle C : x2 + y 2 = 1. (What happens if
c = 0?)

By a rational point P on the plane we mean a point whose coordinates are both
rational numbers.
Proposition 9.2. If P , Q are rational points in the plane, with different abscissas,
then the line connecting them has rational slope.

Proof. Let P = (x1 , y1 ) and Q = (x2 , y2 ). The slope of the line connecting
them is
y2 y1
m= .
x2 x1
Since x1 , x2 , y1 , y2 are all rational numbers, so is m. 

Let P = (−1, 0) ∈ C. If Q ≠ P, and Q ∈ C has rational coordinates, then the slope
of P Q is rational. The winning geometric idea is to go backwards: Let t ∈ ℚ and
consider the line ℓ through P with slope t. Then ℓ will intersect C in some other
point Q. Our calculation will show that Q is also a rational point, and all rational
points on C, besides P, are obtained this way. [Read this paragraph a few times if
you don't get it at first.]

Here is the calculation. A line ℓ through P with slope t is given by the equation
y = t(x + 1). If (x, y) satisfies both this equation and also x² + y² = 1, then
x² + t²(x + 1)² = 1. Solving this with the quadratic formula gives
$$x = -1 \quad \text{or} \quad x = \frac{1 - t^2}{1 + t^2}.$$
Of course x = −1 gives the point P which we already know. The other value gives
$$Q = \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right).$$

Let us record our work as a proposition.



 
Proposition 9.3. All rational points on the unit circle C are of the form $\left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right)$ for t ∈ ℚ, or (−1, 0).
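As a sanity check on the proposition (a small computational experiment, not a proof), one can pick a few rational slopes t and verify with exact arithmetic that the resulting points lie on C:

    from fractions import Fraction

    def point_from_slope(t):
        # Illustrative helper: the second intersection of the unit circle with
        # the line through P = (-1, 0) of slope t, as derived above.
        return ((1 - t**2) / (1 + t**2), 2 * t / (1 + t**2))

    for t in [Fraction(1, 2), Fraction(2, 3), Fraction(-5, 7)]:
        x, y = point_from_slope(t)
        assert x**2 + y**2 == 1   # exact check with rational arithmetic
        print(t, "->", (x, y))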

Any rational number is of the form t = m/n for m, n ∈ ℤ. More precisely, we
may take m, n to be in lowest terms, so that gcd(m, n) = 1. (We will talk more
carefully about gcds later, but you should have an idea what this means.) When
we substitute this for t in the formula we get

$$Q = \left( \frac{1 - \frac{m^2}{n^2}}{1 + \frac{m^2}{n^2}}, \frac{2\frac{m}{n}}{1 + \frac{m^2}{n^2}} \right) = \left( \frac{n^2 - m^2}{n^2 + m^2}, \frac{2mn}{n^2 + m^2} \right).$$

At this point it should be obvious to you how to find some (a, b, c) so that
$$\frac{a}{c} = \frac{n^2 - m^2}{n^2 + m^2} \quad \text{and} \quad \frac{b}{c} = \frac{2mn}{n^2 + m^2};$$
surely one takes a = n² − m², b = 2mn, and c = m² + n². And indeed you can
and should check that (n² − m²)² + (2mn)² = (m² + n²)² (the expansion is written
out after this paragraph). Do we have a proof that all pythagorean triples are of
the form (a, b, c) = (n² − m², 2mn, m² + n²), with m, n integers? If you reflect on
this for a moment, you'll notice something is amiss. To start with, we're not
getting the solution (4, 3, 5), because 3 is odd and 2mn obviously has to be even.
We're also not getting (3, 4, −5), because m² + n² is obviously positive. Thirdly,
this doesn't give the solution (9, 12, 15) either, since you can't write 15 as the
sum of two integer squares.
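Here is the expansion promised above; both sides work out to the same polynomial:

$$(n^2 - m^2)^2 + (2mn)^2 = n^4 - 2m^2n^2 + m^4 + 4m^2n^2 = n^4 + 2m^2n^2 + m^4 = (n^2 + m^2)^2.$$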

None of these exceptions is insurmountable, but we need to think more carefully
about our argument. Our debugging strategy will be to take these examples
and trace through the logic with these numbers. Let's start with (9, 12, 15). If we
convert this to a rational point on the circle, we get (9/15, 12/15), which is equal to (3/5, 4/5),
which is also the point corresponding to (3, 4, 5). And this forces us to notice that
(9, 12, 15) = (3·3, 3·4, 3·5). Now if (a, b, c) is a pythagorean triple, then so is
(n·a, n·b, n·c) for any integer n. Let us say that a pythagorean triple (a, b, c)
is primitive if the only common (positive) integer divisor of the three numbers
a, b, c is 1. Then every pythagorean triple (except (0, 0, 0)) is an integer multiple
of a primitive pythagorean triple. So nothing is lost if we find all primitive ones.
Similarly, let's now just look for triples of positive integers, since we can always
reinstate the possible signs later.
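Concretely, passing from a triple to the primitive triple underneath it is just a matter of dividing out the common divisor of the entries. A small illustrative helper (using the gcd notion informally invoked above):

    from math import gcd

    def primitive_part(a, b, c):
        # Divide a nonzero pythagorean triple by the gcd of its entries,
        # leaving the primitive triple it is a multiple of.
        d = gcd(gcd(abs(a), abs(b)), abs(c))
        return (a // d, b // d, c // d)

    print(primitive_part(9, 12, 15))   # (3, 4, 5)
    print(primitive_part(6, 8, 10))    # (3, 4, 5)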

We are still bothered by the counterexample (4, 3, 5), because it was supposed
to be covered by this geometric method. Let's go through the program with this
example. It corresponds to the point (4/5, 3/5) in C. The slope of the line connecting
this point to P is t = 1/3 (right?). So we were supposed to get this point from m = 1
and n = 3. Here's what happened: plugging these values into the last expression
for Q gives
$$Q = \left( \frac{3^2 - 1}{3^2 + 1}, \frac{2 \cdot 3}{3^2 + 1} \right) = \left( \frac{8}{10}, \frac{6}{10} \right).$$
So you see what happened? These fractions both reduce further to (4/5, 3/5), and then
we get our primitive pythagorean triple (4, 3, 5). The point is that even though

t = m/n was a reduced fraction, the formula for the coordinates of Q did not give
reduced fractions.
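In Python's exact rational arithmetic the same reduction happens automatically, which makes the phenomenon easy to see (illustration only):

    from fractions import Fraction

    # m = 1, n = 3: the formula produces the numerator/denominator pairs
    # (8, 10) and (6, 10), which are not in lowest terms.
    print(Fraction(8, 10), Fraction(6, 10))   # prints 4/5 3/5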
So here's what happens in general. There are two cases. Take t = m/n a reduced
fraction.

Case I: If m or n is even, then the fractions
$$\frac{n^2 - m^2}{n^2 + m^2}, \qquad \frac{2mn}{n^2 + m^2}$$
are in lowest terms. These correspond to the primitive pythagorean triples
$$(n^2 - m^2,\ 2mn,\ m^2 + n^2).$$

Case II: If m and n are both odd, then these fractions are in lowest terms only after
dividing numerator and denominator by 2. In other words,
$$Q = \left( \frac{\tfrac{1}{2}(n^2 - m^2)}{\tfrac{1}{2}(n^2 + m^2)}, \frac{mn}{\tfrac{1}{2}(n^2 + m^2)} \right).$$
These correspond to the primitive pythagorean triples
$$\left( \tfrac{1}{2}(n^2 - m^2),\ mn,\ \tfrac{1}{2}(m^2 + n^2) \right).$$
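Putting the two cases together, here is a small sketch of how one might generate primitive triples from reduced slopes t = m/n. The restriction 0 < m < n is added here only so that all entries come out positive; it is not part of the discussion above.

    from math import gcd

    def primitive_triple(m, n):
        # Illustrative sketch: primitive pythagorean triple attached to the
        # reduced slope t = m/n, following Cases I and II above.
        assert gcd(m, n) == 1 and 0 < m < n
        if m % 2 == 0 or n % 2 == 0:
            # Case I: m or n even
            return (n**2 - m**2, 2 * m * n, m**2 + n**2)
        else:
            # Case II: m and n both odd
            return ((n**2 - m**2) // 2, m * n, (m**2 + n**2) // 2)

    print(primitive_triple(1, 2))   # (3, 4, 5)
    print(primitive_triple(1, 3))   # (4, 3, 5), the triple recovered above
    print(primitive_triple(2, 3))   # (5, 12, 13)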

Exercise: Show that by substituting m = u − v, n = u + v we get the solutions
(2uv, u² − v², u² + v²).
Bibliography

[1] N. Bourbaki, Theory of Sets. Springer, 2004.


[2] J.H. Conway, R. Guy, The Book of Numbers, Copernicus, 1996.
[3] K. Ciesielski, Set Theory for the Working Mathematician.
[4] G. Ifrah, The Universal History of Numbers. Wiley, 2000.
[5] D. R. Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid.
[6] S. Krantz, The Proof is in the Pudding: A Look at the Changing Nature of Mathematical
Proof. Springer, 2011.
[7] S. Lang, Basic Mathematics.
[8] J. Lewin, An Interactive Introduction to Mathematical Analysis. Cambridge Univ. Press,
2014.
[9] Y.I. Manin, A Course in Mathematical Logic for Mathematicians, Springer GTM 53.
[10] E. Nagel, J.R. Newman, Gödel's Proof.
[11] S. Lipschutz, Schaum's Outline of Theory and Problems of Set Theory.
[12] S.M. Srivastava, A Course on Mathematical Logic. Springer, 2008.
[13] Oxford Dictionary.
[14] D.J. Velleman, How to Prove It: A Structured Approach.

