Steven Spallone
Preface
What is mathematics? Two friends and I pondered this question and came up with the statement "Mathematics is pattern recognition and deduction applied to numbers and geometry." We believe we observe phenomena of various sorts, and wish to convince ourselves and others that these phenomena are real. Perhaps we notice that multiplying two odd numbers tends to produce another odd number, or that two triangles with proportional sides tend to have the same angles as well. A curious student should want to know not only whether these phenomena are true, but also why they are so. Part of our civilization's great heritage is the observation and proof of such things. Over the years a standard language has arisen, which is quite satisfactory to all but the most extreme skeptics. It is the purpose of this book to introduce students to this language and the methods of mathematical proof.
The first chapter treats mathematical grammar, elementary logic, and basic proof techniques such as deduction and contradiction. This is followed by a review of (naive) set theory, and then mathematical induction. At this point students have gained some skill with proofs and are ready to learn theory building.
The problems throughout the text are a compilation from old homework, exam, and bonus problems, although I have stripped away hints and demands for rigor. I feel that it is the instructor's place to adapt the problems to the class. The self-studying student should be warned that many of the problems are difficult, and should not get hung up on the toughies, which I place at the end of the chapters.
Many thanks are due to Ben Walter for teaching out of an earlier version of the text, and for several of the problems. At present, these notes are being used by the author for a course at the Indian Institute of Science Education and Research in Pune.
If you find errata please e-mail them to me and I will thank you and try to update the notes appropriately.
Steven Spallone
Contents
Preface
Chapter 2. Arithmetic
1. Introduction
2. The Natural Numbers N
3. The Division Algorithm
4. The Division Algorithm
5. Superlatives
6. Euclidean Algorithm
7. Strong Induction
8. Place-Value Systems
9. The Fundamental Theorem of Arithmetic
10. Chapter 2 Wrap-up
Chapter 4. Cardinality
1. Finite and Infinite Sets
2. Countable Sets
3. Uncountable Sets
4. Interlude on Paradoxes
5. Some History
6. Chapter 4 Wrap-up
Naive Logic
1. Introduction
2. Mathematical statements
One of the great features of mathematics is that every problem in the subject has
a correct answer. Every statement is either true or false. Here, for instance, are
some mathematical statements. Do you think they are true or false?
You may or may not know whether these statements are true or false, but you should believe that there is a correct answer. Mathematicians presume that every mathematical statement is either true or false. This philosophy goes back to Aristotle, and is called the Law of Excluded Middle.
Compare this, for instance, to the statements "I have two hands." and "My dog Diogi is friendly." I consider that these are true statements. Any normal person would agree with the first statement. But many people, and certainly many other dogs, would consider the second statement to be false. Moreover I'm afraid I would be unlikely to convince them that it is a true statement.
Even with the first statement, one could imagine, for instance, a paranoid conspiracy theorist who believes that professors of mathematics have another hand hidden somewhere. Most real-world statements are subjective, or debatable on
some level. One thinks of the philosopher Descartes, who strove to prove that he exists! Nonetheless, the logic we develop in this text applies so well to common real-world situations that we will often spice up this chapter with nonmathematical statements. Indeed, there are many applications of logic, such as law, which benefit from this theory. The magnanimous reader will not be offended by the subjectivity of my real-world examples.
What exactly is meant by the term "mathematical statement"? For the purposes of this text, it means a grammatically correct English sentence which only concerns mathematical objects. It should end with a period (full stop). There should be a subject and a verb. (In the mathematical statements above, the verbs are "is," "equals," "has," and "equals.") The audience should, in principle, understand precisely what is meant when reading it. We say that putative (mathematical) statements are not well-formed if they fail to impart precise meaning to their audience. So for instance,
"3 + 7.",
"The real number 3 + 7 is awesome."
are not well-formed statements. The first fails because it is not even grammatically a sentence (there is no verb), and the second fails because the audience presumably does not know what makes a number awesome. This second sentence can be remedied by preceding it with a sentence that explains the unfamiliar word. Consider the two statements: "We say a real number is awesome, provided that it is greater than 2. The real number 3 + 7 is awesome." The first sentence defines the term "awesome," and now the second statement is well-formed.
Note: I will often omit the adjective "mathematical" from the word "statement" when it is understood from context. The word "proposition" is a synonym for "statement," although later on it takes on the connotation of a statement that should be proved or disproved.
Note: In a more serious course in logic, meticulous care would be taken in explicating the precise rules for making well-formed statements. There is good reason for this, which we discuss in the section "Interlude on Paradoxes." However, this is an enormous endeavor that we will not take on; we will be content with naive logic.
We will rarely worry about whether two statements are literally equal, since it is just too strict. A more useful notion is that of equivalence. We say that two statements are equivalent provided that they mean the same thing. Let us write P ⇔ Q if P and Q are equivalent statements, as in the cats and dogs example above. Certainly P and Q are not equivalent to the statement "I hate cats." The statements "5! > 10²." and "10² < 5!." are equivalent. The statements "2 ≠ 10." and "2 < 10 or 2 > 10." are equivalent.
It is not always easy to tell whether two statements are equivalent. Equivalence is something that needs to be proved. For instance, from the statement P: "5! > 10².", one can deduce the statement Q: "4! > 20." by dividing both sides of the inequality by 5, and one can deduce P from Q by multiplying both sides by 5. Since we can deduce one from the other, we may conclude that P ⇔ Q.
P | Q | P ⇔ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   T
This is an example of a truth table. Given truth values (true or false) of statements, you read the table to find the truth value of the new statement. Caution: Truth tables rely fundamentally on the Law of Excluded Middle, and so applying this definition of equivalence to real world examples can lead to nonsense.
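The table for equivalence can also be checked mechanically. Here is a small Python sketch (the function name iff and the printing loop are my own, not part of the text) that models statements as booleans and reproduces the table above:

```python
from itertools import product

def iff(p, q):
    """P <=> Q: true exactly when P and Q have the same truth value."""
    return p == q

# Walk the four rows of the truth table for equivalence.
for p, q in product([True, False], repeat=2):
    print(p, q, iff(p, q))
```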
Note that any two true (mathematical) statements are equivalent; for instance the statement "1 + 1 = 2." is equivalent to the statement "A triangle has three sides." Similarly, any two false statements are equivalent. Traditionally, we take "0 = 0." as the simplest true statement and "0 = 1." as the simplest false statement. So, any mathematical statement is equivalent to one of these.
2.2. Negation
Suppose that P is a given statement. Write Q for the statement "P is true.", and R for the statement "P is false." The truth table for P and Q is:
P | Q = "P is true."
T |        T
F |        F
We can see from this table that Q is equivalent to P, since Q agrees with P whether P is true or false.
The statement R is the negation of P, written ¬P; its truth table is:
P | ¬P
T |  F
F |  T
You can read this table as, "If P is true then ¬P is false. If P is false then ¬P is true."
For instance suppose P is the statement "100¹⁰¹ > 101¹⁰⁰." Then ¬P is the statement "The statement 100¹⁰¹ > 101¹⁰⁰ is false." This is equivalent to the statement "100¹⁰¹ ≤ 101¹⁰⁰." If you are asked to negate a statement, you should find a statement equivalent to its negation which is as simple as possible. Of course this is a matter of taste as to what is simplest.
Here are two statements obviously equivalent to the negation of the statement "2 = 10.":
(1) "2 ≠ 10."
(2) "Either 2 < 10 or 2 > 10."
Remark: The statement is false, so strictly speaking any true statement is a negation here. But the given negations are good answers, assuming that the audience doesn't know the statement is false. It is always easy to give a reasonable negation of a mathematical statement, as we will see, whether or not we know it to be true.
We can also prove this theorem more mechanically by constructing a truth table.
P | ¬P | ¬(¬P)
T |  F |   T
F |  T |   F
Note that this theorem relies heavily on the Law of Excluded Middle. In everyday speech, we might say something like, "I'm not hungry, but I'm not not hungry." to indicate that the statement "I am hungry." is neither true nor false.
Mathematicians share a very precise language. Subtle ambiguities can creep into the English language, for example with the word "or." If you say, "On Saturday I will eat dahl or drink lassi.", does this assertion include the possibility that on Saturday I might consume both dahl and lassi? In mathematics, "or" is always inclusive. The mathematical or is represented with the symbol ∨, with the following truth table:
P | Q | P ∨ Q
T | T |   T
T | F |   T
F | T |   T
F | F |   F
From the table, if P is true and Q is false, then the statement P ∨ Q is true. If P is false and Q is true, then the statement P ∨ Q is true. If P is false and Q is false, then the statement P ∨ Q is false.
Next is the mathematical and, which is represented with the symbol ∧. This combination works the way you'd expect; here is the truth table:
P | Q | P ∧ Q
T | T |   T
T | F |   F
F | T |   F
F | F |   F
Thus, P ∧ Q is true only when both P and Q are true. The statement P ∧ Q is called the conjunction of P and Q.
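Both connectives are easy to model on a computer. Here is a short Python sketch (the function names are my own) that prints the truth tables for ∨ and ∧ side by side:

```python
from itertools import product

def disjunction(p, q):
    """P or Q: false only when both are false."""
    return p or q

def conjunction(p, q):
    """P and Q: true only when both are true."""
    return p and q

for p, q in product([True, False], repeat=2):
    print(p, q, disjunction(p, q), conjunction(p, q))
```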
2.5. Exercises
Use truth tables to prove the following identities, for any statements P, Q, R.
(1) P ⇔ P.
(2) (P ∧ Q) ⇔ (Q ∧ P).
(3) Prove that P ∨ Q is equivalent to ¬((¬P) ∧ (¬Q)).
(4) Prove that (P ∧ Q) ∧ R is equivalent to P ∧ (Q ∧ R). Then the same for ∨ instead of ∧. Is it true if we replace ∧ with ⇒? [move]
(5) Find combinations of P, Q, ∧, ∨, ¬ which give each of the 16 possible truth values, given the four possible truth values of P and Q. (This is meant as a group exercise.)
(6) Of the sixteen different combinations of P and Q from the previous problem, how many are both commutative and associative?
(7) Let P, Q, and R be statements. Prove that P ∧ (Q ∨ R) is equivalent to (P ∧ Q) ∨ (P ∧ R), and that P ∨ (Q ∧ R) is equivalent to (P ∨ Q) ∧ (P ∨ R).
The Vellerman exercise
3. Implication
The statement "P implies Q" is written P ⇒ Q. Here is its truth table:
P | Q | P ⇒ Q
T | T |   T
T | F |   F
F | T |   T
F | F |   T
Please note that whenever P is false, then the statement P ⇒ Q is true, contrary to what you might think. So under this convention any false statement implies any other statement. The only time that P ⇒ Q is false is when P is true and Q is false. Under this convention, the following statements are true:
If this is the year 1986, then the Earth has two moons.
If this is the year 1986, then the Earth has one moon.
If the Earth has one moon, then there are 24 hours in a day.
Here is a false statement: If there are 24 hours in a day, then the Earth has two moons.
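One way to internalize the convention is to note that P ⇒ Q has the same truth table as (¬P) ∨ Q. A Python sketch (the name implies is mine, not the text's):

```python
def implies(p, q):
    """P => Q, encoded as (not P) or Q; false only when P is true and Q is false."""
    return (not p) or q

# A false hypothesis makes the implication true, whatever the conclusion:
print(implies(False, False))  # "If this is the year 1986, then the Earth has two moons." -> True
print(implies(False, True))   # "If this is the year 1986, then the Earth has one moon."  -> True
print(implies(True, False))   # a true hypothesis with a false conclusion -> False
```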
P | Q | P ⇒ Q | Q ⇒ P | (P ⇒ Q) ∧ (Q ⇒ P) | P ⇔ Q
T | T |   T   |   T   |         T         |   T
T | F |   F   |   T   |         F         |   F
F | T |   T   |   F   |         F         |   F
F | F |   T   |   T   |         T         |   T
3.1. A Syllogism
Now we will prove a form of deduction using truth tables. We will prove that the statement
[(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)
is always true. Before doing the proof, let us apply it to the following statements.
P: I eat an entire chocolate cake.
Q: I get sick.
R: I do not win the wrestling tournament.
Let's say that you believe that P implies Q, and also that Q implies R. (Do you?) Then you should also believe that if I eat an entire chocolate cake, then I will not win the wrestling tournament. Combining two implications in this way is one of the logical exercises going back to Aristotle, called syllogisms. Presumably you have already mastered this in some intuitive form.
P | Q | R | P ⇒ Q | Q ⇒ R | P ⇒ R | (P ⇒ Q) ∧ (Q ⇒ R) | [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R)
T | T | T |   T   |   T   |   T   |         T         |   T
T | T | F |   T   |   F   |   F   |         F         |   T
T | F | T |   F   |   T   |   T   |         F         |   T
T | F | F |   F   |   T   |   F   |         F         |   T
F | T | T |   T   |   T   |   T   |         T         |   T
F | T | F |   T   |   F   |   T   |         F         |   T
F | F | T |   T   |   T   |   T   |         T         |   T
F | F | F |   T   |   T   |   T   |         T         |   T
As you can see, no matter what the values of P, Q, and R, the value of the final column is always true. Reflect on this and see if you agree that this is a proof. A tautology is a formula of propositional logic which is always true regardless of the truth values given to the propositional variables. The statement [(P ⇒ Q) ∧ (Q ⇒ R)] ⇒ (P ⇒ R) is a tautology. You will see more examples in the exercises below.
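A computer can carry out the same brute-force check over all eight rows. A Python sketch (names mine):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Evaluate [(P => Q) and (Q => R)] => (P => R) at every truth assignment.
values = [
    implies(implies(p, q) and implies(q, r), implies(p, r))
    for p, q, r in product([True, False], repeat=3)
]
print(all(values))  # True: the syllogism is a tautology
```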
3.2. Exercises
(1) Use truth tables to check that the following are tautologies. Also explain why they are true with common sense.
P ⇒ P
P ⇒ (P ∨ Q)
[(P ∨ Q) ∧ (P ⇒ R)] ⇒ (Q ∨ R)
[(P ⇒ Q) ∧ P] ⇒ Q
Remark: This last one is another famous example of a syllogism.
4. Propositional Calculus of a Single Variable

4.1. Quantifying
The first thing a math student should learn is how to write a mathematical statement. This is different from writing a mathematical expression or equation. It is important that you introduce variables with care. For example, merely writing
(x + y)² = x² + y²
is very bad, but not for the reason you might think. The main reason it is bad is
because we have not introduced the variables x and y. Are they real numbers?
Complex numbers? Matrices? A better statement, grammatically, would be "For all real numbers x and y, we have (x + y)² = x² + y²." We will call such a statement well-formed. It is false. For the moment, our priority is to make grammatically correct statements, which may be true or false.
If a statement is not well-formed, then we do not ask whether it is true or false; we
send it back to the author asking for a revision. I hope when you read this book,
and other books, that you appreciate the care that is made to make well-formed
statements.
The expressions "For all" and "There exists" are ubiquitous in mathematics. They are called quantifiers, and get the special symbols ∀ and ∃, respectively. You should look for them, or their implication, throughout mathematics.
As another example, note that the constant C in Proposition 9.1 is introduced with the ∃ quantifier.
Here are some examples (which we'll study later more thoroughly) from the theory of factorization:
Definition. Let d and n be integers. Then d divides n provided that ∃ an integer e so that de = n. Let n > 1 be an integer. Then n is prime provided that ∀ divisors d > 0 of n, either d = 1 or d = n.
The order in which the quantifiers ∀, ∃ are used is crucial; a different order can completely change the meaning of the statement. For example, which functions f : R → R satisfy:
For all real numbers x, there exists C a real number so that f(x) = C?
All functions f : R → R do. For instance, consider f(x) = x². Then given a real number x, there exists C (= x²) so that f(x) = C.
This may be confusing to you, so let's look more carefully at these examples. Keep in mind that once you quantify a variable in a (normal) sentence, it remains quantified that way for the rest of the sentence. Let me rewrite these two sentences with parentheses to clarify this.
(1) ∃ C a real number so that (∀ real numbers x, (we have f(x) = C)).
(2) ∀ real numbers x, (∃ C a real number so that (f(x) = C)).
In (1), the choice of C must be made before x is introduced; it must therefore hold for all x at once. In (2), within the first set of parentheses, the choice of x has been fixed, and we only need to choose C to work with this choice.
Certainly (1) is true; given x > 0 we may take y = x/2. What about (2)? Is there a positive number so small that no other positive number is less? Certainly not, and this fact is so important that we will give a formal proof. The proof will be our first example of a Proof by Contradiction.
Read this short proof a few times to make sure you understand it.
Let us give an application of the First Principle of Analysis. Many people in the world do not believe that 0.9999... = 1. (Do you?) Suppose you find yourself locked in a debate with a nonbeliever; here is one argument to explain why it is so. Let x = 1 − 0.9999...; we can call it the niggling number. Get your opponent to agree that x ≥ 0. Then get him to think about what the decimal expansion of x must be. With some reflection, he should agree that it will begin x = 0.000..., and that it will begin with as many 0s as you like. (He will start to give up, but may still feel like something is happening at infinite places...) Now, if y is any positive number, it can't be smaller than x, because it will have some nonzero decimal digit somewhere. So your gracious opponent will agree that x ≤ y for all positive y. Now you have him. You say that therefore x ≤ x/2, and so if he still doesn't believe that x = 0 you divide by x and show that his way leads to the madness of 1 ≤ 1/2. Done.
4.3. Exercises
(1) Negate the statement, "There is a real number x so that for all real numbers y, we have x ≤ y." Then prove your negated statement.
(2) Negate the statement "∀ even integers n > 2, there exist prime numbers p1, p2 so that n = p1 + p2."
(3) Consider the statement, "For every real-valued function f : R → R there is a constant k > 0 so that for all x ∈ R, we have f(x) ≤ |kx|." Is this statement true or false?
(4) Suppose that f, g : R → R are functions satisfying f(x) = g(y) for all numbers x, y. Prove that f and g are both constant functions.
(5) Suppose that f, g : R → R are twice differentiable functions which are nowhere 0. Let u(x, y) = f(x)g(y). Suppose that
∂²u/∂x² + ∂²u/∂y² = 0.
Prove that there exists a constant C so that for all x, we have f″(x) = C·f(x) and g″(y) = −C·g(y). (You should use the previous exercise.)
Proof Strategy: Without loss of generality. Often one is faced with two or more possibilities in a proof. If all possibilities are symmetric, then we may say "Without loss of generality we may assume" (WLOG WMA) and treat just one of these possibilities. The (implied) proofs for the other cases must be exactly the same, except for the symmetry.
Here is an example of a proof with three usages of WLOG WMA. See if you understand it; after the proof we will examine the symmetry behind these usages.
Proposition 1.4. Suppose the complete graph on 6 vertices has each edge colored
red or blue. Then there must be either a red triangle or a blue triangle.
If there are three blue edges from A, we may switch the roles of red/blue for the rest of the proof. This justifies the first WLOG WMA.
If three red edges from A are connected to P1, P2, P3 instead of B, C, D, then we may switch the roles P1 ↔ B, P2 ↔ C, and P3 ↔ D for the rest of the proof. This justifies the second WLOG WMA.
If CD is red, we may apply the permutation B → C → D → B, and if BD is red, we may switch the roles via C ↔ D. This justifies the third WLOG WMA.
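Proposition 1.4 is also small enough to confirm by exhaustive search: there are only 2¹⁵ = 32768 red/blue colorings of the 15 edges of the complete graph on 6 vertices. A Python sketch of such a check (all names mine):

```python
from itertools import combinations, product

edges = list(combinations(range(6), 2))  # the 15 edges of K6

def has_monochromatic_triangle(coloring):
    """coloring: dict mapping each edge (a, b) with a < b to 'red' or 'blue'."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(6), 3)
    )

# Every one of the 2^15 colorings contains a red or a blue triangle.
print(all(
    has_monochromatic_triangle(dict(zip(edges, colors)))
    for colors in product(['red', 'blue'], repeat=len(edges))
))  # True
```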
5.1. Exercises
For the first 5 problems, suppose the complete graph on n vertices has each edge
colored red or blue.
The theory of the previous section suggests some pleasant graph games.
By Proposition 1.4, eventually one of the players will win. (Compare this to Tic-Tac-Toe, where a game between experienced players should end in a tie.)
There are many variations of SIM: one may vary the number of dots, or one may
pick other shapes besides triangles, or one may play MIS, in which the first person
to get a triangle (or other shape) wins instead of loses.
Note that it is possible for 5-dot SIM to end in a tie, since the outcome could simply be a red pentagon and a blue pentagon.
Proposition 1.5. There is a strategy for Player #1 to always win 5-dot MIS.
Case I: Player #2 plays BC. In this case, Player #1 plays AD, forcing Player #2
to play BD. Player #1 plays CD, then Player #2 must play AC. Player #1 now
plays DE for the win, since on his/her next move, either AE or CE makes a red
triangle.
Case II: Player #2 plays CD. Player #1 plays AE, forcing Player #2 to play
BE. Player #1 now plays AD for the win, since on his/her next move, either DE
or BD makes a red triangle.
Corollary 1.6. If n > 5, there is a strategy for Player #1 to always win n-dot
MIS.
Proof. Red simply picks 5 of the n dots and follows the previous strategy.
Lemma 1.7. For any n there is either a strategy for Player #1 to always win, or a
strategy for Player #2 to always win n-dot SIM.
[Proof]
Proposition 1.8. For any n there is a strategy for Player #2 to always win n-dot
SIM.
6.1. Exercises
(1) Find an example of a game of SIM in which the game doesn't end until the 15th edge is drawn.
(2) Consider the strategy in chess (or checkers, or go) to always mirror your opponent's move. Must the game end in a tie? Experiment with a friend.
(3) Sometimes a master of chess (in which white moves first) will play two
games with two opponents at the same time, alternating moves between
the boards. Suppose you are the second of these opponents, and that the
master is playing as black against you, and as white in the other game.
Describe a way to play so that the master does not win both of the games.
(4) Learn the games Dots and Boxes, Connect 4, Gomoku, and Hex. Like
SIM, these games have variations. Find some variations where ties are
impossible, and find some examples to which strategy-stealing arguments
apply.
7. Sets
"A set is a collection of elements." This is a very well-known sentence. How do you feel about it? Do you think it makes a good definition? In mathematics, a good definition precisely introduces a word or phrase in terms of earlier established concepts. A mathematician reading this will ask, what is a "collection"? What is an "element"? Where are the definitions of those words? Without knowing precisely what are collections or elements, this is simply not a definition in the mathematical sense.
Hold on. Maybe I don't know what the word "a" means. Let's look that up! Before I notice the fact that the word "a" is used in the definition of "a," I am again lost because I don't know "used."
Uh-oh. We are now totally stuck. The first word in the definition of "have" is "possess," and the first word in the definition of "possess" is "have." We are unable to understand the definitions of "possess" or "have" purely by using the dictionary, and in some sense we can never understand the other words, including "cat," without any of our own intuition. It's an interesting question, to what extent a picture dictionary would help in this regard.
And this is how it goes. Certain concepts we treat as irreducible, in the sense that we don't have a way to define them in terms of simpler notions. The notion of a set is an irreducible concept. Here are some other definitions of "set," just to be thorough.
It is interesting to look at mathematical books and see what they treat as irreducible concepts, and how they introduce these. Here are some definitions from Euclid's Elements of notions that seem like they were really irreducible back then:
Anyway, we're not going to precisely define a set. We will say many things about them, however. And we will otherwise try to define things as well as we can.
The notation A ⊇ B means the same thing. It is like with inequalities. For example let B = {2, 1} and A = {2, 1, 7} again. Then B ⊆ A.
For instance, the sets {1, 2} and {1, 2, 1} are equal, even though 1 is presented twice in the expression for the second set.
I should warn you that many eminent authors use the notation B ⊂ A to mean B ⊆ A, and many use the notation as written here. I like the analogy with inequality for numbers, which is why we will use the above notation in this book.
Convince yourself by using the definitions that if A is any set, then ∅ ⊆ A. You will need to remember that if a statement P is false, then P ⇒ Q is true no matter what the statement Q is.
Definition. Let A be a set. The power set P(A) is the set of all subsets of A.
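For a finite set the power set can be listed directly. A Python sketch (the helper name power_set is mine):

```python
from itertools import combinations

def power_set(A):
    """Return all subsets of the finite set A, as frozensets."""
    elems = list(A)
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

subsets = power_set({2, 1, 7})
print(len(subsets))  # 8 = 2^3 subsets, from the empty set up to {2, 1, 7} itself
```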
Before the next definition, please note quietly that if X is a set, and A is a subset of X, then {x ∈ X | x ∈ A} = A.
[To be written.]
7.4. Exercises
and
{(x, z) ∈ R² | (−1 ≤ x ≤ 1) ∧ (−1 ≤ z ≤ 1) ∧ (x = z)}.
(a) Use Venn diagrams to illustrate that the statement is generally true,
or sometimes false.
(b) Reduce the statement to propositional logic, and use a truth table to
verify your answer above.
8. Induction
The method of induction is suggested by problems of the following type. You want to prove a proposition P(n) which involves a parameter n, which is a natural number. As n varies, you get infinitely many different propositions P(1), P(2), P(3), ... Imagine that P(n) is easy when n is small but gets progressively more complex as n grows. Then a reasonable idea is to try to prove the smaller ones first, and work your way up to the bigger ones.
More generally, suppose we have a sequence of propositions P(1), ..., P(n), and for every k from 1 to n − 1, we can show that P(k) implies P(k + 1). Then by iterating the above idea we get that P(1) implies P(n):
P(1) ⇒ P(2) ⇒ P(3) ⇒ ··· ⇒ P(n − 1) ⇒ P(n)
This is the basic form of induction. (The word induction suggests an electrical
analogy. Think of each P (k) as being connected to P (k + 1) by a wire. Then if you
charge up P (1) with veracity, the charge will eventually get to P (n).)
Thus in practice, if you want to prove P(n) for all integers n ≥ 1 by induction, then you must prove:
Step 1: P(1) is true.
Step 2: For every k ≥ 1, if P(k) is true, then P(k + 1) is true.
Step 2 has some logical complexity to it, and is often misinterpreted. You do not
prove that P (k) is true. You show that if it were true, then P (k + 1) would also be
true.
Step 2 is usually the hardest. It's not going to work unless you see a relationship between the various P(k). You need to see a way to make the step from each one to the next. Warning: it is not always manageable to prove a proposition P(n) with induction, as there may not be any tractable relationship present. Moreover, one can often prove a proposition directly and more simply without induction. So don't get too carried away with this.
Let us call the proposition P(n); here P(n) is the equation 1 + 2 + ··· + n = n(n + 1)/2. It is healthy to always try writing out explicitly a few of the smaller P(n)s. For instance
P(1): 1 = 1(2)/2,
P(2): 1 + 2 = 2(3)/2,
P(3): 1 + 2 + 3 = 3(4)/2.
All these are easily verified; this suggests that we have correctly interpreted the problem. Warning: P(n) is not a number! Do not say, for example, that P(2) = 3. The P(n) are always mathematical statements, never numbers. In this case they are equations.
Now there is an obvious relationship between the P(k)s as k grows. The left hand side of P(k + 1) is obtained from the left hand side of P(k) by adding k + 1.
So that's it. We have checked that P(1) is true, and proven Step 2. Finish by writing something like "Thus by standard induction, P(n) is true."
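Induction, not computation, is the proof; still, a quick machine check of small cases (a sketch of my own, not from the text) guards against misreading the problem:

```python
# Verify P(n): 1 + 2 + ... + n = n(n + 1)/2 for the first thousand cases.
for n in range(1, 1001):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("P(n) checked for n = 1 .. 1000")
```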
I'd like to remark that this proof is a little unsatisfying, in that it never really explains the formula. (Although it serves as a good example of induction.) There are many proofs of this important result; here is an easy one:
Our next example of induction I find much more satisfying. Let us prove the power
rule of calculus, that is
Proposition 1.11. If n ∈ N then d/dx(x^n) = n·x^(n−1).
We will assume only the product rule for derivatives and the rule d/dx(x) = 1.
28 1. NAIVE LOGIC
Proof. As before we write P(n): d/dx(x^n) = n·x^(n−1). It is good to focus first on a few small cases. P(1) is the rule d/dx(x) = 1, which we have already assumed. P(2) is the rule d/dx(x²) = 2x. Why is this true? Typically one writes out lim_{h→0} ((x + h)² − x²)/h, does some algebra and limit-logic to get P(2). But we want to connect P(2) to P(1) and so will instead use the product rule: d/dx(x²) = d/dx(x·x) = x·d/dx(x) + x·d/dx(x) = 2x·d/dx(x) = 2x. Note that the last equality uses P(1). Can we make this connection more generally? You bet; using the product rule:
d/dx(x^(k+1)) = d/dx(x^k · x) = x^k · d/dx(x) + x · d/dx(x^k).
We finish this off by applying P(1) and P(k):
= x^k + x·(k·x^(k−1)) = (k + 1)·x^k.
Combining all the equalities yields P(k + 1): d/dx(x^(k+1)) = (k + 1)·x^k. Thus P(n) is true by induction.
Definition. n! = 1 if n = 1, and n! = n · (n − 1)! if n > 1.
For example if we want to know what 3! is, the definition says it is 3 · 2!. This forces us to use the definition again to determine that 2! = 2 · 1!, and we need to look once more at the definition to find that 1! = 1. We put this all together to get 3! = 3 · 2 · 1 = 6. The reader should believe that given any positive integer n, one can in principle use this definition to compute n!.
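The recursive definition transcribes directly into a program; here is a Python sketch (the function name is mine):

```python
def factorial(n):
    """n! following the recursive definition: 1! = 1 and n! = n * (n - 1)!."""
    if n == 1:
        return 1
    return n * factorial(n - 1)

# Unwinding the recursion: 3! = 3 * 2! = 3 * 2 * 1! = 6.
print(factorial(3))  # 6
```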
As another example, consider the n-th derivative of a function f, written d^n f/dx^n, which may be familiar as what you get when you differentiate f n times. The recursive definition is
Definition. d^n f/dx^n = df/dx if n = 1, and d^n f/dx^n = d^(n−1)/dx^(n−1) (df/dx) if n > 1.
The advantage of recursive definitions is that they do not require readers to use their imagination about doing something n times. There are no "···"s, for example; all the logic is laid out for you. This is particularly nice when these concepts gang up on you. Here is a small example.
Proposition 1.12. d^n/dx^n (x^n) = n!.

Proof. Induction on n. The statement for n = 1 is d/dx(x) = 1, a familiar fact. Suppose the proposition is true for k. Then
d^(k+1)/dx^(k+1) (x^(k+1)) = d^k/dx^k ( d/dx (x^(k+1)) ) = d^k/dx^k ( (k + 1)·x^k ),
using the recursive definition of d^n/dx^n and Proposition 1.11. One factors out the k + 1 and applies the inductive hypothesis to get (k + 1) · k!. Finally, using the recursive definition of n!, this is equal to (k + 1)!. We are done by induction.
I hope you can see in the above example that recursive definitions mesh well with proofs by induction. The resulting proof is clean, and does not ask the reader to visualize, for example, a sequence of exponents coming down and being multiplied, exactly as many times as the power of x, until we are left with x⁰ multiplied by the product of integers from 1 to n. The latter, with some examples, is fine if you're talking to someone and can't write things down. The inductive proof is clearer and easier to check.
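To see the recursion and the induction mesh concretely, we can represent a polynomial by its coefficient list and differentiate it recursively; this Python sketch (all names mine) confirms the conclusion of Proposition 1.12 for small n:

```python
import math

def derivative(coeffs):
    """Differentiate c0 + c1*x + c2*x^2 + ..., given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative(coeffs, n):
    """Recursive definition: the n-th derivative is the (n-1)-st derivative of df/dx."""
    if n == 1:
        return derivative(coeffs)
    return nth_derivative(derivative(coeffs), n - 1)

for n in range(1, 8):
    x_to_n = [0] * n + [1]  # coefficient list of x^n
    assert nth_derivative(x_to_n, n) == [math.factorial(n)]
print("d^n/dx^n (x^n) = n! checked for n = 1 .. 7")
```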
I'd like to codify the logic of induction from the previous section as (P(1) ∧ ∀k (P(k) ⇒ P(k + 1))) ⇒ ∀n, P(n). The first ⇒ is something that you need to prove in an induction proof, and the second ⇒ is the statement of induction. Thus if you can manage to prove the items in parentheses, you have obtained the item on the right of the second ⇒.
Sometimes you want to tweak the rules of induction. Consider the following problem. We have an intuitive understanding that factorials grow much faster than polynomials, and want to prove that n! > n². Unfortunately this isn't true for some small values of n. In fact for n = 2 and 3, the quantity n² is bigger than n!. For n = 4, we finally have 4! = 24 > 16 = 4².
Thus we will prove the proposition for n = 4, and also prove that if P(k) is true for some k ≥ 4, then P(k + 1) is true. So we will get
P(4) ⇒ P(5) ⇒ P(6) ⇒ ··· ⇒ P(n − 1) ⇒ P(n).
Proposition 1.13. If n is an integer greater than or equal to 4, then n! > n².
Once you know what to do, the proof need not be very long. The above proof requires a sophisticated and active reader who understands inequalities well.
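A quick machine check of the boundary behavior (a sketch of mine, not from the text):

```python
import math

# n^2 beats n! at n = 2 and 3 ...
for n in (2, 3):
    assert math.factorial(n) < n ** 2
# ... but from n = 4 onward the factorial wins, as Proposition 1.13 asserts.
for n in range(4, 200):
    assert math.factorial(n) > n ** 2
print("n! > n^2 confirmed for n = 4 .. 199")
```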
Here is another kind of problem. Suppose you want to convince yourself that you can integrate any power of sin(x). We'll make P(n) the somewhat imprecise "I have a formula for an antiderivative of sin^n(x)." This will be true for, say, n ≥ 0. (Is it true for negative n?) Let's do a couple. An antiderivative of sin⁰(x) = 1 is given by x, and an antiderivative of sin(x) is given by −cos(x). What about sin²(x)?
Here's one approach:
∫ sin²(x) dx = ∫ (1 − cos²(x)) dx = x − ∫ cos²(x) dx.
Now integrate by parts, with u = cos(x) and dv = cos(x) dx. The latter integral becomes
∫ cos²(x) dx = sin(x) cos(x) + ∫ sin²(x) dx.
The devout calculus student will recall that we put this all together to get:
∫ sin²(x) dx = x − ( sin(x) cos(x) + ∫ sin²(x) dx ),
and solving for ∫ sin²(x) dx gives ∫ sin²(x) dx = (x − sin(x) cos(x))/2.
This same basic method works to write higher powers of $\sin$ in terms of lower powers:
$$\int \sin^k(x)\,dx = \int \sin^{k-2}(x)(1 - \cos^2(x))\,dx = \int \sin^{k-2}(x)\,dx - \int \sin^{k-2}(x)\cos^2(x)\,dx.$$
Let $u = \cos(x)$ and $dv = \sin^{k-2}(x)\cos(x)\,dx$. The latter integral becomes
$$\int \sin^{k-2}(x)\cos^2(x)\,dx = \frac{1}{k-1}\sin^{k-1}(x)\cos(x) + \frac{1}{k-1}\int \sin^k(x)\,dx.$$
Putting this all together we get:
$$\int \sin^k(x)\,dx = \int \sin^{k-2}(x)\,dx - \left( \frac{1}{k-1}\sin^{k-1}(x)\cos(x) + \frac{1}{k-1}\int \sin^k(x)\,dx \right),$$
and we can solve for $\int \sin^k(x)\,dx$:
$$\int \sin^k(x)\,dx = \frac{k-1}{k}\int \sin^{k-2}(x)\,dx - \frac{1}{k}\sin^{k-1}(x)\cos(x).$$
Okay. So I hope that was a pleasant review of integration by parts. Where are we? We have shown that if $P(k-2)$ is true, then so is $P(k)$, since we can write a formula for $\int \sin^k(x)\,dx$ in terms of $\int \sin^{k-2}(x)\,dx$. If we want the integral for $\sin^6(x)$, for example, we can use the above to reduce to $\sin^4(x)$, which reduces to $\sin^2(x)$, and then to 1, which we know. If we wanted the integral for $\sin^7(x)$, we can use the above to eventually reduce to $\sin(x)$, whose antiderivative has also been noted.
This suggests a new induction scheme. If we've proven $P(0)$ and $P(1)$, and also proven that $P(k)$ implies $P(k+2)$, then $P(n)$ is true for all integers $n \ge 0$. I will codify this as:
$$(P(0) \wedge P(1) \wedge \forall k(P(k) \Rightarrow P(k+2))) \Rightarrow \forall n\, P(n).$$
Which schemes are valid? For now, use your common sense. Later we will give
proofs for the validity of other induction schemes based on the original one.
The mother of them all, though, is Strong Induction. This is the scheme
$$(P(a) \wedge \forall n((\forall\, a \le k < n\;\, P(k)) \Rightarrow P(n))) \Rightarrow \forall n \ge a\;\, P(n).$$
You are probably a little tired of induction now so we will postpone the discussion
of Strong Induction until later.
8.4. Exercises
Standard Induction
(7) Prove that you can solve the $n$-disc towers of Hanoi problem in $2^n - 1$ moves. Also prove that this is the minimum number of moves required.
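For experimentation (not a proof, and certainly not a proof of minimality), here is a sketch of the standard recursive solution; its move count matches $2^n - 1$. The function name is my own:

```python
def hanoi(n, src="A", aux="B", dst="C"):
    # Standard recursion: move n-1 discs aside, move the largest disc,
    # then move the n-1 discs back on top of it.
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi(n - 1, aux, src, dst))

for n in range(1, 10):
    assert len(hanoi(n)) == 2 ** n - 1
```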
Induction Schemes
(8) Write $h_k$ for the $k$-th Hemachandra number, starting with $h_0 = 0$ and $h_1 = h_2 = 1$. Thus, $h_{k+2} = h_k + h_{k+1}$ for $k \ge 1$. Let $\alpha = \frac{1+\sqrt{5}}{2}$ and $\beta = \frac{1-\sqrt{5}}{2}$. Use the induction scheme
$$(P(1) \wedge P(2) \wedge (\forall k(P(k) \wedge P(k+1) \Rightarrow P(k+2)))) \Rightarrow \forall n\, P(n)$$
to prove that
$$h_n = \frac{\alpha^n - \beta^n}{\sqrt{5}}.$$
In other words, check the formula is correct for $n = 1$ and $2$, and prove that if the formula is correct for $n = k$ and $n = k+1$, then it is true for $n = k+2$. (Algebra tip: Show that $\alpha^2 = \alpha + 1$ and $\beta^2 = \beta + 1$.)
(9) Prove that $\forall a, b \in \mathbf{N}$, we have $h_{a+b} = h_{a+1} h_b + h_a h_{b-1}$. (Suggestion: Use the same induction scheme; you're not meant to use the closed formula from the previous problem.)
(10) Use a scheme mentioned in the text to prove that any positive fraction $\frac{a}{b}$ can be reduced to a fraction in which the numerator and denominator are not both even.
(11) Xuande has a pile of 4- and 5-cent postage stamps. What are all the
postages he can pay? Give a proof. (Suggestion: After you figure out the
answer, come up with an appropriate induction scheme.)
(12) Which of the following are valid induction schemes? Explain.
(a) $(P(1) \wedge (\forall k \ge 2\,(P(k) \Rightarrow P(k-1)))) \Rightarrow \forall n\, P(n)$.
(b) $(P(1) \wedge (\forall k(P(k) \Rightarrow (P(2k) \wedge P(2k+1))))) \Rightarrow \forall n\, P(n)$.
(c) $(P(0) \wedge P(1) \wedge (\forall k, \ell \in \mathbf{Z}\,(P(k) \wedge P(\ell)) \Rightarrow P(k - \ell))) \Rightarrow \forall n \in \mathbf{Z}\; P(n)$.
(d) $(P(1) \wedge (\forall k\,(P(k) \wedge P(k+1) \Rightarrow P(k+2)))) \Rightarrow \forall n\, P(n)$.
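Before proving the formula in Exercise 8 by induction, it can be checked numerically. The sketch below (my own check; the names `h`, `alpha`, `beta` are my choices) compares the recurrence against the closed formula:

```python
import math

def h(n):
    # Hemachandra numbers via the recurrence h1 = h2 = 1,
    # h_{k+2} = h_k + h_{k+1}.
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

alpha = (1 + math.sqrt(5)) / 2
beta = (1 - math.sqrt(5)) / 2

for n in range(1, 20):
    closed = (alpha ** n - beta ** n) / math.sqrt(5)
    assert round(closed) == h(n)   # floating point, so round before comparing
```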
9. Chapter 1 Wrap-up
(2) Let $P$ and $Q$ be statements. Prove that you cannot form the statement "$P$ exclusive or $Q$", also written $P + Q$, using only the symbols $P, Q, \wedge, \vee$.
(3) In this exercise we look at some of the Zermelo-Fraenkel Axioms of Set Theory, written in the raw form of Propositional Logic. In principle they do not need any explanation, but in reality it is a challenge to understand what they mean. Can you explain? Here $x, y, z, u$ can be regarded as both sets and elements.
(a) $\forall x \forall y \exists z (x \in z \wedge y \in z)$.
(b) $\forall x \forall y (\forall z(z \in x \Leftrightarrow z \in y) \Rightarrow x = y)$.
(c) $\forall x \exists y \forall z ((\forall u(u \in z \Rightarrow u \in x)) \Rightarrow z \in y)$.
(d) $\exists x (x = x)$.
(e) $\forall x \exists y \forall z \forall u ((u \in x \wedge z \in u) \Rightarrow z \in y)$.
(f) $\exists x (\emptyset \in x \wedge \forall y (y \in x \Rightarrow y \cup \{y\} \in x))$.
Here is another statement apropos of Set Theory:
$$\exists z \forall x ((x \notin x) \Leftrightarrow (x \in z)).$$
Arithmetic
38 2. ARITHMETIC
1. Introduction
In this chapter we will develop the basic properties of arithmetic, using as few
assumptions as possible.
In Section 2 we lay down the three Peano Axioms, and prove from them the rules
of addition and multiplication.
Arithmetic starts getting really interesting when we get to the idea of division with
remainder. In Section 4 we develop this concept and the related idea of a place-value
system.
By Section 4 we are ready to treat the theory of prime numbers, and the Fundamental Theorem of Arithmetic. The FTA says that every number can be given unique
coordinates, with one component for each prime number. These coordinates
completely determine the multiplicative role of a number.
Admittedly we will not actually be able to construct the natural numbers N, since
we need a spark of life to get going. This spark takes the form of the existence of
an infinite set, which we assume has been organized into a certain linear shape.
Assuming the presence of this shape, we will be able to define the basic operations of
arithmetic and derive their basic properties. Moreover we will be able to construct
the other sets out of N.
Children believe that there is a counting process to get to all numbers, starting with
1, in which every number is succeeded by another number. The rules are designed
so that different numbers have different successors, and you never get back to 1.
Let us unravel some of these properties. The property (INJ) says that if two
numbers have the same successor, then they must be the same number. It may be
2. THE NATURAL NUMBERS N 39
Property (INF) is simple enough; it just means that 1 is not the successor of any number. (In particular $0 \notin \mathbf{N}$!)
For example, the set of odd numbers is not inductive, since 1 is odd but $1'$ is not.
The set of numbers greater than 100 is inductive.
The reason this last axiom is called (IND) is that it actually allows us to use induction when proving things about $\mathbf{N}$. Here's why. Let $P(n)$ for $n \in \mathbf{N}$ be a sequence of propositions as in the previous section. Write $S = \{n \in \mathbf{N} \mid P(n) \text{ is true}\}$. Suppose we know that $P(1)$ is true. Then $1 \in S$. Suppose that we know that, for all $k$, $P(k) \Rightarrow P(k')$. Then $S$ is inductive. By (IND), we deduce that all natural numbers are in $S$. This means that $P(n)$ is true for all $n$, as desired.
Until we have enough arithmetic to develop a place-value system, we will use Roman numerals (except often for 1 itself) for elements of $\mathbf{N}$. They are well-suited for Peano arithmetic anyway. Thus the first few natural numbers $\{1, 1', 1'', 1''', 1'''', \ldots\}$ will be denoted as $\{I, II, III, IV, V, \ldots\}$. In other words,
Definition.
$$I = 1,\ II = I',\ III = II',\ IV = III',\ V = IV',$$
$$VI = V',\ VII = VI',\ VIII = VII',\ IX = VIII',\ X = IX'.$$
40 2. ARITHMETIC
We will also have occasion to use larger Roman numerals without comment, but they will be no larger than $M = X^{III}$.
Example:
$$II + III = (II + II)' = ((II + I)')' = ((II')')' = V.$$
Suppose the theorem is true for some $c$. Taking successors of both sides yields
$$[(a + b) + c]' = [a + (b + c)]'.$$
The definition of addition lets us move the prime within the sums on both sides of the equation:
$$(a + b) + c' = a + (b + c)'$$
$$= a + (b + c').$$
Proof. Exercise.
$$a + k' = (a + k)'$$
$$= (k + a)'$$
$$= (k + a) + 1$$
$$= k + (a + 1)$$
$$= k + (1 + a)$$
$$= k' + a.$$
The fourth and sixth equality used associativity, and the fifth used the lemma. The
rest follows from the definition of addition and the inductive hypothesis. Thus we
are done by induction.
This proof is a little bit complex logically. Before writing the formal proof, let us
brainstorm for a few minutes first. It will be an induction proof on n. The case
P (1) is the axiom (INJ). What about P (2)? Suppose a + II = b + II. Then I can
rewrite both sides as $(a + 1)' = (b + 1)'$. To this we apply (INJ) to get the equality $a + 1 = b + 1$, which by $P(1)$ implies $a = b$. This logical deduction gives $P(2)$. Similarly, for $P(3)$ we start with $a + III = b + III$, rewrite it as $(a + II)' = (b + II)'$, and apply (INJ) to get $a + II = b + II$. This shows that $P(2) \Rightarrow P(3)$. Now we are
ready for the proof.
Proof. The hypotheses imply that there are numbers x and y so that a+x = b
and b + y = c. Then we compute that a + (x + y) = (a + x) + y = b + y = c, thus
a < c.
Proof. Exercise.
42 2. ARITHMETIC
Proof. Fix $b$, and write $P(a)$ for the statement of the lemma. We will induct on $a$, holding $b$ constant. Lemma 2.8 gives us $P(1)$. Now suppose $P(k)$ is true, giving three possible cases. We will show that each of these cases leads to a case of $P(k')$, which will prove the theorem. If $k < b$ then by the Creeping Lemma $k' \le b$. Thus $k' = b$ or $k' < b$. If $k > b$ then since $k' > k$ we conclude $k' > b$. If $k = b$ then $k' > b$.
To prove that no more than one of these possibilities can hold, we need the following
proposition.
Proposition 2.11. $\forall a, n \in \mathbf{N}$, we have $a + n \ne n$.
The corollary says that a natural number cannot be less than itself. Do you see
why the corollary follows immediately from the Proposition? If not, go back and
reread the definition of a < b.
Theorem 2.13. (Strong Trichotomy) Let a, b N. Then exactly one of a < b,
a > b, or a = b holds.
Thus $a + (b - a) = b = (b - a) + a$ by definition.
Note that this number is uniquely determined, by the Cancellation Law of Addition.
shows that $z + (x - y)$ is $(z + x) - y$.
Can you guess what the two other Associative Laws of Subtraction are? (Theyre
given as exercises.)
That is as far as we will go with just addition. At this point we will freely use the
associative and commutative rules of addition without comment.
Example:
$$II \cdot III = II \cdot II + II = (II \cdot I + II) + II = (II + II) + II = IV + II = VI.$$
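The successor-based definitions of addition and multiplication can be sketched in a few lines. Here is my own illustration (on Python integers, with `n + 1` standing in for the successor $n'$), replaying the two worked examples:

```python
def succ(n):
    return n + 1

def add(a, b):
    # Inductive definition of addition: a + 1 = a', and a + b' = (a + b)'.
    if b == 1:
        return succ(a)
    return succ(add(a, b - 1))

def mul(a, b):
    # Inductive definition of multiplication: a * 1 = a, and a * b' = a * b + a.
    if b == 1:
        return a
    return add(mul(a, b - 1), a)

assert add(2, 3) == 5   # II + III = V
assert mul(2, 3) == 6   # II . III = VI
```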
Theorem 2.15. If $a, b, n \in \mathbf{N}$, then we have $(a + b) \cdot n = a \cdot n + b \cdot n$.
To get $P(k')$ we have
$$(a + b) \cdot k' = (a + b) \cdot k + (a + b)$$
$$= a \cdot k + b \cdot k + a + b$$
$$= (a \cdot k + a) + (b \cdot k + b)$$
$$= a \cdot k' + b \cdot k',$$
as desired. We are done by induction.
Theorem 2.16. If $a, b \in \mathbf{N}$, then $a \cdot b = b \cdot a$.
Proof. Exercise.
Corollary 2.17. If $a, b, y \in \mathbf{N}$, then $y \cdot (a + b) = y \cdot a + y \cdot b$.
Proof. Induction on $n$. If $n = 1$ then both sides are $ab$. Suppose the theorem is true for $n = k$. Thus $(a \cdot b) \cdot k = a \cdot (b \cdot k)$.
To get $P(k')$ we have
$$(a \cdot b) \cdot k' = (a \cdot b) \cdot k + a \cdot b$$
$$= a \cdot (b \cdot k) + a \cdot b$$
$$= a \cdot (b \cdot k + b)$$
$$= a \cdot (b \cdot k'),$$
as desired. (Can you justify each step?) We are done by induction.
Proposition 2.19. If $a < b$, then $na < nb$. Moreover $n(b - a) = nb - na$.
2.4. Exercises
(1) Prove Lemma 2.3, Proposition 2.7, the Creeping Lemma, and Theorem 2.16.
(2) Prove that $IV - II = II$ using the definitions.
(3) If $a > b$ prove that $a^2 > b^2$.
(4) If $a > b$ prove that $a^2 - b^2 = (a + b)(a - b)$.
(5) If $a > b + c$ prove that $a - (b + c) = (a - b) - c$.
(6) If $b > c$ and $a + c > b$ prove that $a - (b - c) = (a + c) - b$.
(7) State a recursive definition of $a^b$ for $a, b \in \mathbf{N}$, agreeing with the usual sense. Use your definition to prove that for $a, b, c \in \mathbf{N}$,
$$a^b \cdot a^c = a^{b+c}.$$
(8) Prove that $a^{bc} = (a^b)^c$.
Here are some basic properties of divisibility, which the reader should verify:
Proposition 2.21. Let $a, b, c \in \mathbf{N}$. We have:
Proof. Suppose that II | III. We also have II | II, so by the last item in Proposition 2.21, we may conclude that II | I. But this contradicts the previous proposition.
Note that this number is uniquely determined, by the Cancellation Law of Multiplication. For example $n/1 = n$ for all $n \in \mathbf{N}$. Here is a typical proposition and proof involving this definition:
Proposition 2.24. If $a|b$ and $c|d$ then $ac|bd$ and $bd/ac = (b/a) \cdot (d/c)$.
By popular demand, we give two proofs, one straightforward and another conceptual. First, the straightforward and unimaginative one.
Proof. Let's just write out all the divisibility definitions. Say $aq = b$ and $cp = d$. Then $aqcp = bd$, so by definition, $(b/a) \cdot (d/c) = qp = bd/ac$.
Thus, Div(a) is the set of divisors of a. For example, Div(XV) = {I, III, V, XV}.
Here are translations of parts of Proposition 2.21:
$\{1, a\} \subseteq \mathrm{Div}(a)$.
If $a \in \mathrm{Div}(b)$ and $b \in \mathrm{Div}(c)$, then $a \in \mathrm{Div}(c)$.
If $a \in \mathrm{Div}(b)$, then $a \in \mathrm{Div}(bx)$.
The following basic properties of addition and inequality which we proved for $\mathbf{N}$ also hold for the whole numbers: Commutativity and Associativity of Addition, Trichotomy, the Creeping Lemma. It is a bit tedious to verify that so many propositions we proved for $\mathbf{N}$ still hold for whole numbers, so let us just give a sample, leaving the rest to the reader.
Proposition 2.25. For whole numbers $a, b$ we have $a + b = b + a$.
Obviously Lemma 2.8 fails and should be replaced with $0 \le n$ for all whole numbers $n$. Again, the proof is just by two cases: either $n \in \mathbf{N}$ or $n = 0$.
Remark: One reason for the definition $0^0 = 1$ is for the facility of power series. For example, $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$ evaluated at $x = 0$ gives $1 = \frac{0^0}{0!}$, which suggests that we define both $0^0$ and $0!$ to be 1. A more philosophical reason that $0^0 = 1$ is as follows.
Here is a very important theorem about integers which lies at the heart of arith-
metic.
4. THE DIVISION ALGORITHM 49
Proof. Fix $a$; we want to induct on $b$. Write $P(b)$ for the statement of the theorem. If $b = 0$ put $q = r = 0$. Suppose $P(k)$ is true.
This relates to Standard Induction as follows. If $P(0)$ is true, then by the hypothesis, so is $P(1)$. Then together with $P(k) \Rightarrow P(k')$ we recover the hypotheses of Standard Induction, and thus get $P(n)$ for all $n \in \mathbf{N}$. We also have $P(0)$, so we finally get $P(n)$ for all whole numbers $n$.
Let us pause for a tiny application. For sanity's sake, let $2 = II$.
I want to break the rules and talk about rational numbers for just a moment, to show one of the greatest mathematical proofs of all time, the irrationality of $\sqrt{2}$, as an application of the work we've just done.
Theorem 2.29. There does not exist a rational number whose square is 2.
that $a$ is itself even. We may write $a = 2k$ for some $k \in \mathbf{N}$. Thus $(2k)^2 = 2b^2$, and canceling 2s gives $2k^2 = b^2$. So now $b^2$ is even, and again this means that $b$ is even. This contradicts the fact that at least one of $a, b$ is odd, and we conclude that such a $q$ must not exist.
Now we return to Peano Theory. For the uniqueness part of the Division Algorithm, we generalize the uniqueness part of the proof of Lemma 2.25. [fix]
Theorem 2.30. (Division Algorithm, Uniqueness) Let $a \in \mathbf{N}$. Suppose there are $q_1, q_2, r_1, r_2$ with
$$q_1 a + r_1 = q_2 a + r_2,$$
and $r_1, r_2 < a$. Then $q_1 = q_2$ and $r_1 = r_2$.
Proof. The idea is simple enough, but working it out carefully is a good benchmark for our Peano Theory. Consider Trichotomy for $r_1$ and $r_2$. If they are unequal then one is greater than the other. Say $r_1 < r_2$. By definition of subtraction, this means that $q_1 a = (q_2 a + r_2) - r_1$. By Lemma 2.14 [fix], $q_1 a = q_2 a + (r_2 - r_1)$. This shows that $q_1 a > q_2 a$, and indeed that $q_1 a - q_2 a = r_2 - r_1$. By Proposition 2.19 [fix], we have $a(q_1 - q_2) = r_2 - r_1$. The left hand side is plainly a nonzero multiple of $a$, thus by Proposition 2.22 [fix],
$$a \le r_2 - r_1 \le r_2 < a,$$
which contradicts Strong Trichotomy. Thus $r_1 = r_2$, which implies that $q_1 a = q_2 a$ by the Cancellation Law of Addition. By the Cancellation Law of Multiplication we conclude that $q_1 = q_2$.
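The existence half of the Division Algorithm can be sketched as repeated subtraction, which is essentially how the induction on $b$ proceeds. My own illustration (the function name is a choice, not the text's notation):

```python
def divide(b, a):
    # Division with remainder by repeated subtraction: find q, r with
    # b = q*a + r and 0 <= r < a.  Theorem 2.30 says this pair is unique.
    q, r = 0, b
    while r >= a:
        q, r = q + 1, r - a
    return q, r

assert divide(59, 5) == (11, 4)   # 59 = 11*5 + 4
assert divide(6, 7) == (0, 6)
```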
4.1. Exercises
5. Superlatives
Let us discuss the notion of the minimum and maximum member of a set of
numbers.
Definition. Let $S \subseteq \mathbf{N}$. A number $\ell \in \mathbf{N}$ is a lower bound of $S$ provided that $\forall (s \in S)\ (\ell \le s)$. A number $u \in \mathbf{N}$ is an upper bound of $S$ provided that $\forall (s \in S)\ (u \ge s)$.
For example, if S is the set of positive even numbers, then 1 and 2 are the only
natural numbers which are lower bounds of S, and S has no upper bounds.
If $S = \emptyset$, then pure logic dictates that every $n \in \mathbf{N}$ is both an upper and lower bound of $S$. (Can you see that?) However since $n \notin S$ for all $n$, it obviously cannot have a minimum element.
Theorem 2.31. (Well-Ordering, Min Form) Let $S \subseteq \mathbf{N}$ be nonempty. Then there is an element $m \in S$ which is a lower bound of $S$.
Exercise; apply the Min Form to the set of upper bounds for S.
Example: Write $\mathrm{Div}(a, b)$ for the set of common divisors of $a$ and $b$. In other words, $\mathrm{Div}(a, b) = \mathrm{Div}(a) \cap \mathrm{Div}(b)$, the intersection of the two sets.
Example: Div(42) = {1, 2, 3, 6, 7, 14, 21, 42} and Div(24) = {1, 2, 3, 4, 6, 8, 12, 24},
so Div(42, 24) = {1, 2, 3, 6}. It is easy to see now that gcd(42, 24) = 6.
Similarly write $\mathrm{Mult}(a, b)$ for the set of common multiples of $a$ and $b$. Since $ab \in \mathrm{Mult}(a, b)$ this is nonempty; thus it has a minimum element.
Definition. Let lcm(a, b) = min Mult(a, b); it is called the least com-
mon multiple of a and b.
Example: Mult(42) = {42, 84, 126, 168, 210, . . .} and Mult(24) = {24, 48, 72, 96, 120, 144, 168, 192, . . .},
so Mult(42, 24) = {168, . . .}. Therefore lcm(42, 24) = 168.
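The set-theoretic definitions of gcd and lcm can be computed directly by brute force, which makes the worked examples easy to replay. A sketch of my own (inefficient on purpose, to mirror the definitions rather than the Euclidean algorithm of the next section):

```python
def Div(a):
    # The set of divisors of a.
    return {d for d in range(1, a + 1) if a % d == 0}

def common_divisors(a, b):
    return Div(a) & Div(b)        # Div(a, b) = Div(a) intersect Div(b)

def gcd(a, b):
    return max(common_divisors(a, b))

def lcm(a, b):
    # min of the common multiples; a*b is always one, so the search is finite.
    return min(m for m in range(1, a * b + 1) if m % a == 0 and m % b == 0)

assert common_divisors(42, 24) == {1, 2, 3, 6}
assert gcd(42, 24) == 6
assert lcm(42, 24) == 168
```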
6. Euclidean Algorithm
If you are like most students, you have an old habit of thinking about the gcd of
two numbers as follows. You take your two numbers, factor them, and then for each
prime note the smaller exponent that occurs in the factorizations of both numbers.
The exponents of primes appearing in the factorization of the gcd will be these
smaller exponents.
While we will eventually derive this characterization of the gcd, you should forget
about it for a while for two reasons. One, it is usually inefficient to factor large
numbers. Two, at this point in the course we are trying to train you to under-
stand the logic of the definition of mins and maxes, as well as digest the theory of
divisibility.
Try to work through the following two lemmas, to break yourself from the aforementioned habits.
Lemma 2.33. If b = qa + r, then gcd(a, b) = gcd(a, r).
Proof. This follows from the fact that Div(a, b) = Div(a, r), which the reader
should prove.
Lemma 2.34. If a|b, then gcd(a, b) = a.
Proof. This is a good exercise for you to do right now. Use the definition of
gcd!
These two lemmas allow us to compute the gcd of any two natural numbers. Consider, for example, $a = 51$ and $b = 36$. (Allow me to use normal notation for
numbers for this example.) Applying the division algorithm yields
$$51 = 1 \cdot 36 + 15.$$
By the first lemma, we conclude that $\gcd(51, 36) = \gcd(36, 15)$. So we have simplified the problem. Next,
$$36 = 2 \cdot 15 + 6.$$
Thus $\gcd(36, 15) = \gcd(15, 6)$. Next,
$$15 = 2 \cdot 6 + 3.$$
Thus $\gcd(15, 6) = \gcd(6, 3)$. But since $3|6$, we know $\gcd(3, 6) = 3$ by the second lemma. Thus $\gcd(51, 36) = 3$.
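The two lemmas translate directly into a loop. This sketch (my own phrasing of the standard algorithm) repeats the step just performed by hand:

```python
def gcd(a, b):
    # Each pass applies Lemma 2.33: writing b = q*a + r, we have
    # gcd(a, b) = gcd(r, a).  When r = 0, a | b, so gcd(a, b) = a
    # by Lemma 2.34.
    while b % a != 0:
        a, b = b % a, a
    return a

assert gcd(36, 51) == 3
```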
This is a great algorithm for computing gcds, and originates in the first proposition of Book VII of Euclid's Elements. It is described therein as "antenaresis", or repeated subtraction.
The next step is to start with the smaller of the underlined numbers on the right, find the equation in which it is the remainder, and use that equation to substitute in a difference of larger numbers:
$$3 = 15 - 2 \cdot (36 - 2 \cdot 15).$$
Then combine terms:
$$3 = 5 \cdot 15 - 2 \cdot 36.$$
Now the 15 is the smaller of the underlined numbers, so again substitute and combine:
$$3 = 5 \cdot (51 - 1 \cdot 36) - 2 \cdot 36,$$
$$3 = 5 \cdot 51 - 7 \cdot 36.$$
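This back-substitution process is usually called the extended Euclidean algorithm. A recursive sketch of my own, reproducing the identity just found:

```python
def extended_gcd(a, b):
    # Returns (g, x, y) with x*a + y*b = g = gcd(a, b), by running the
    # Euclidean algorithm and back-substituting, as in the worked example.
    if b % a == 0:
        return a, 1, 0
    q, r = divmod(b, a)            # b = q*a + r
    g, x, y = extended_gcd(r, a)   # g = x*r + y*a
    # Substitute r = b - q*a:  g = x*(b - q*a) + y*a = (y - x*q)*a + x*b.
    return g, y - x * q, x

g, x, y = extended_gcd(51, 36)
assert (g, x, y) == (3, 5, -7)     # 3 = 5*51 - 7*36
assert x * 51 + y * 36 == 3
```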
Meanwhile just practice with the algorithm; the proof won't be much more than that.
7. Strong Induction
In Exercise 8 in Section 8.4 you were asked to prove a formula for the Hemachandra
numbers, using the induction scheme
$$(P(1) \wedge P(2) \wedge (\forall k(P(k) \wedge P(k+1) \Rightarrow P(k+2)))) \Rightarrow \forall n\, P(n).$$
This meant that you were to verify the formula at n = 1 and 2, and then prove
that if the formula is correct for two consecutive numbers, then it is true for the
next number.
We are now in a position to prove that this is a valid induction scheme. It comes
down to the following proposition:
Proposition 2.36. Let $S \subseteq \mathbf{N}$ be a subset of $\mathbf{N}$ satisfying the following properties.
(1) $1, 2 \in S$.
(2) $\forall n \in \mathbf{N}$, we have $(n, n+1 \in S) \Rightarrow (n+2 \in S)$.
Then $S = \mathbf{N}$.
Before proving this, let me spell out the relationship with the above induction scheme. Suppose you are given propositions $P(n)$ as in the Hemachandra situation; then let $S = \{n \in \mathbf{N} \mid P(n)\}$. Then knowing $P(1)$, $P(2)$, and knowing that $P(k)$ and $P(k+1)$ together imply $P(k+2)$, tells you that $S$ satisfies properties (1) and (2). Therefore $S = \mathbf{N}$ and so $P(n)$ is true for all $n \in \mathbf{N}$.
I hope you get the feeling from the above proof that this min technique is very
powerful. It seems unfit to use it for such a random-looking induction scheme. In
fact, this same technique will take us all the way to Strong Induction.
The idea of Strong Induction is as follows: Again you have a sequence P (n) of
propositions, and know that P (1), say, is true. Suppose you can always reduce
P (k) to either
This should make sense to you, because you are always decreasing $n$ until it finally gets down to 1. You shouldn't have to worry exactly how it decreases, just that it does.
Although you may not necessarily need all $k$ between 1 and $n$, it's simpler to suppose that you do.
Suppose I have a bar of chocolate with vertical and horizontal lines dividing it into
an m n grid of segments, for some m, n N. I want to break the bar into mn
pieces, by breaking the bar along the lines. Let's say that a split is the act of
breaking a bar along a line to get two smaller bars. How many splits will it take?
Proposition 2.37. It always takes mn 1 splits.
It's not hard to come up with an algorithm that breaks up the bar in some organized fashion, and check that your algorithm takes $mn - 1$ splits. But the stronger fact is that no matter how you break it up, it always takes the same number of splits.
[Someday, a diagram...]
This is clear for P (1), because this is the case of only one piece, so no splits are
required.
Suppose $k > 1$. Then some split is possible. Suppose the splitting is as in the diagram. Then the total number of splits required is
$$1 + (m_1 n - 1) + (m_2 n - 1)$$
by $P(m_1 n) \wedge P(m_2 n)$. Since this is equal to $mn - 1 = N - 1$, we have proved the inductive step. We are done by strong induction.
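The claim that the number of splits is independent of the choices can be spot-checked by splitting at random. A sketch of my own:

```python
import random

def splits(m, n):
    # Break an m x n bar by repeated random splits, counting the breaks.
    if m == 1 and n == 1:
        return 0
    # Choose any legal split line, horizontal or vertical.
    cuts = [("h", i) for i in range(1, m)] + [("v", j) for j in range(1, n)]
    kind, i = random.choice(cuts)
    if kind == "h":
        return 1 + splits(i, n) + splits(m - i, n)
    return 1 + splits(m, i) + splits(m, n - i)

for _ in range(100):
    m, n = random.randint(1, 6), random.randint(1, 6)
    assert splits(m, n) == m * n - 1   # no matter how we split
```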
We now prove the validity of the strong induction scheme. It's just like the proof of the validity of the Hemachandra induction scheme. As before, our proof will use the Min Form of Well-Ordering, which was proved using (IND). So really, standard induction implies strong induction.
(1) $1 \in S$.
(2) $\forall k > 1$, we have $(1, \ldots, k-1 \in S) \Rightarrow (k \in S)$. (In other words, if all numbers less than $k$ are in $S$, then $k \in S$.)
Then $S = \mathbf{N}$.
Proof. Let $T = S^c$, the set of natural numbers not in $S$. If the theorem is not true, then $S$ is not $\mathbf{N}$ and therefore $T$ is nonempty. Let $m = \min T$. Note $m > 1$ since $1 \in S$. Also note that if $k < m$, then $k \notin T$, so $k \in S$. By hypothesis, $m \in S$, which is a contradiction.
7.3. Exercises
8. Place-Value Systems
There are very effective ways to represent numbers, using what is called the Place-Value system. This is the way of expressing whole numbers by assembling a (finite) string of digits. Let us begin with base X.
How do you take the successor of a number in place-value language? If you have a number $N$ represented by a string of digits $d_m d_{m-1} \cdots d_1 d_0$, then $N'$ is usually given by $d_m d_{m-1} \cdots d_1 d_0'$, where $d_0'$ is the digit coming after $d_0$. That rule doesn't work if $d_0 = 9$, and in that case, the successor of $N$ is usually given by $d_m d_{m-1} \cdots d_1' 0$. And so forth. Either one reaches a digit unequal to 9, or all of the digits must be equal to 9, in which case the successor of $N$ is of course $100\cdots0$, with $m+1$ zeroes.
(One could be more rigorous here. Any instance of "and so forth" in mathematics can be replaced with an inductive argument.)
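The rule just described can be sketched as a function on decimal digit strings. My own illustration (the inductive "and so forth" becomes the carry loop):

```python
def successor(digits):
    # Successor of a number given as a decimal digit string: bump the
    # last digit, carrying past any 9s.
    d = list(digits)
    i = len(d) - 1
    while i >= 0 and d[i] == "9":
        d[i] = "0"          # a 9 rolls over and the carry moves left
        i -= 1
    if i < 0:
        return "1" + "".join(d)   # all digits were 9
    d[i] = str(int(d[i]) + 1)
    return "".join(d)

assert successor("138") == "139"
assert successor("139") == "140"
assert successor("999") == "1000"
```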
The place-value system is an invention of Indian astronomers who worked with cosmic units of time, like the kalpa, which is precisely 4,320,000,000 years. Certainly this also necessitated the use of the digit 0, or śūnya to them, meaning "void". According to the tremendous book [4], the place-value system was in place by 458 CE, and certainly originated during the Gupta Dynasty. After you learn to take successors, you learn to add and multiply. First you memorize addition/multiplication tables, which tell you how to add/multiply digits. Then you learn inductive algorithms for adding/multiplying entire strings. There are of course algorithms for subtracting and long division as well. All of these algorithms of arithmetic can be proved using the definition of the strings, and the distributive property. We will not do this here.
system: The last digit of a ternary number is 0 iff the number is divisible by 3.
What does it mean if the last two digits are zeros?
As another example, say the base $b = 7$. With this convention, we have for instance $234 = 2 \cdot b^2 + 3 \cdot b + 4$, which is CXXIII.
Remark: Sometimes one writes $234_7$ to distinguish this from the base ten expression, which would be $234_{10} = $ CCXXXIV. But too much notation can be a headache, so we will rely on context.
with base $b = $ II, thus in binary. Binary arithmetic is simpler than decimal in the sense that rather than needing two nine-by-nine tables of addition and multiplication, we only need to know that $1 + 1 = 10$ and $1 \cdot 1 = 1$. (The rules for 0 being universal.)
The sum is equal to
The exercise here is to try not to convert these into decimal, do the operation, and convert back, but to do the entire computation in binary. We have of course $1! = 1$ and $10! = 10$. The next term is $11! = 11 \cdot 10 \cdot 1 = 11 \cdot 10$. We use long multiplication:
     11
  x  10
  -----
     00
  + 110
  -----
    110
It should be clear from this that the rule for multiplying by 10 is to simply append
a zero to the end of your digits. Similarly the rule for multiplying by 100 is to
append two zeros to the end. Thus 100! = 100 110 = 11000. Next we use again
long multiplication to compute 101! = 101 11000:
      11000
  x     101
  ---------
      11000
  + 1100000
  ---------
    1111000
The next calculation, $110! = 110 \cdot 1111000$, is a little more interesting, since we have some carrying in the addition:
        1111000
  x         110
  -------------
       11110000
  +   111100000
  -------------
     1011010000
Did you catch that? We used $1 + 1 = 10$ in the sixth place, and $1 + 1 + 1 = 11$ in places seven through nine, carrying the 1 each time. Similarly with $111! = 111 \cdot 1011010000$:

       1011010000
  x           111
  ---------------
       1011010000
      10110100000
  +  101101000000
  ---------------
    1001110110000
Finally, 1000! = 1001110110000000.
Thus the sum $\sum_{n=1}^{1000} n!$ is equal to:
                   1
                  10
                 110
               11000
             1111000
          1011010000
       1001110110000
  + 1001110110000000
  ------------------
    1011010010011001
as the reader should verify.
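If you would rather trust a machine, the whole computation can be replayed in a few lines of Python (my own check, using decimal factorials and the built-in binary formatter):

```python
import math

# Sum of n! for n = 1..8, i.e. n = 1 through 1000 in binary.
total = sum(math.factorial(n) for n in range(1, 9))

assert format(total, "b") == "1011010010011001"              # the sum above
assert format(math.factorial(8), "b") == "1001110110000000"  # 1000! in binary
assert format(math.factorial(6), "b") == "1011010000"        # 110! in binary
```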
Strictly speaking, this was just an illustration of basic binary arithmetic. But I'd like to point something out. Suppose we were to continue this computation, adding on successively higher factorials. All higher factorials $N!$ with $N \ge 1000$ end in at least seven zeros, since they are divisible by $1000!$. Therefore the last seven digits of the sum $\sum_{n=1}^{N} n!$ for $N \ge 1000$ are 0011001. Moreover, as $n$ increases, $n!$ ends in more and more zeros. Thus more and more ending digits of the sum will stabilize. For example, 10000! ends in fifteen zeros, so we conclude that the last fifteen digits of $\sum_{n=1}^{N} n!$ for $N \ge 10000$ will be the same. (In fact, they are 111101000011001.)
This process gives us an infinite binary expansion going to the left. Does it eventually repeat? No one knows.
You should be able to figure out how subtraction is done in other bases. For
instance, in ternary we have
    201210
  - 122212
  --------
      1221
Check! Of course there was some borrowing as you subtract from right to left. As usual, you can double-check by adding $1221 + 122212$ and see whether you get $201210$.
Now you're ready for long division. Base ten long division, as taught in elementary school, is a very mysterious-looking algorithm. It is a good question to ask: why does it give the correct quotient and remainder of the Division Algorithm? We will not answer that question in this book, but instead we will demonstrate how to perform long division in some other bases. We proceed by analogy.
Let us start with binary again, and in binary, divide 111011 by 101. Here's how the final long division looks:
          1011 R 100
        --------
  101 | 111011
         -101
         ----
          1001
          -101
          ----
           1001
           -101
           ----
            100
Can you figure out what happened? In one way, this is easier than decimal long division, because there are only two digits involved. In the first step, we ask: is $101 \le 1$? $\le 11$? $\le 111$? Since 111 is the first part of the number 111011 which is at least as big as 101, we put the digit 1 above the third 1, where we are forming the quotient $q$. Then we subtract 101 from 111 to get 10. Now we bring down the digit 0. The new number 100 is still less than 101, so we put the next digit 0 in the quotient, and bring down another digit. And so on. When we are out of digits, the remainder is 100, which we record next to the big "R".
From our belief in long division, we conclude that $111011 = 1011 \cdot 101 + 100$.
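The schoolbook procedure just performed can be sketched digit by digit. My own illustration, which works in any base:

```python
def long_division(n_digits, d, base=2):
    # Schoolbook long division of the digit string n_digits by d in the
    # given base, producing quotient digits left to right.
    q, r = [], 0
    for digit in n_digits:
        r = r * base + int(digit, base)   # "bring down" the next digit
        q.append(r // d)                  # next quotient digit
        r = r % d
    quotient = "".join(str(x) for x in q).lstrip("0") or "0"
    return quotient, r

q, r = long_division("111011", 0b101)
assert (q, format(r, "b")) == ("1011", "100")   # 111011 = 1011 * 101 + 100
```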
I've presented $\mathbf{N}$ in this book as the set $\{1, 1', 1'', \ldots\}$. Later I agreed to use Roman numerals $\{I, II, III, IV, \ldots\}$ for numbers less than four thousand. That's our neutral way to express numbers, even though it is not far from decimal.
It is also fine to think of $\mathbf{N}$ as the set $\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, \ldots\}$, when we are tacitly using decimal notation. But if binary is the convention, then it is fine to think of $\mathbf{N}$ as the set $\{1, 10, 11, 100, 101, \ldots\}$. And so on for the different bases. As long as it's clear what 1 is, and what the successor function is, and Peano's axioms are satisfied, it's just $\mathbf{N}$ with different notation.
To understand how this works, let us work through some examples of how to convert from one base to another. The easiest is converting to decimal. The number written as 25037 in base $b = 8$ can be computed in decimal as:
$$2 \cdot 8^4 + 5 \cdot 8^3 + 0 \cdot 8^2 + 3 \cdot 8^1 + 7 \cdot 8^0 = 10783.$$
How to convert a decimal number like 343 into binary or base 5? There are two methods.
Method I: Top-down. This method finds the digits from left to right.
Therefore
$$343 = 2^8 + 2^6 + 2^4 + 2^2 + 2^1 + 2^0.$$
Fill in the zeros and ones to prepare to write the binary expansion:
$$343 = 1 \cdot 2^8 + 0 \cdot 2^7 + 1 \cdot 2^6 + 0 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0.$$
The top-down method is less elegant in other bases. Let's try to convert the decimal number 343 into base $b = $ V. The highest power of 5 less than 343 is $125 = 5^3$. So we know the result will have four digits. But to figure out the first digit, we need to determine the highest multiple of 125 which is less than or equal to 343. We have $250 = 2 \cdot 125 < 343 < 375 = 3 \cdot 125$, and so the first digit is 2. The next step is to subtract off the $2 \cdot 125$ to get 93. Iterating, the highest power of 5 less than 93 is 25, and the highest multiple of 25 less than 93 is $75 = 3 \cdot 25$. The number 18 remains. It is easy to see what to do from here; $18 = 3 \cdot 5 + 3$. Thus
$$343 = 2 \cdot 5^3 + 3 \cdot 5^2 + 3 \cdot 5^1 + 3 \cdot 5^0.$$
Method II: Bottom-up. This method finds the digits from right to left.
The rightmost digit of a number $N$ in base $b$ is the remainder when you divide $N$ by $b$. For instance, when we divide 343 by 2 we get $q = 171$ and $r = 1$. So the units place digit is 1. You then iterate this; divide 171 by 2 to get $q = 85$ and $r = 1$.
Similarly, to convert the decimal number 343 into base 5, with similar notation we compute:

  (343) = (68)3
        = (13)33
        = (2)333
        = 2333
With some practice youll do fine.
Hexadecimal Notation. Bases with $b > $ X have occasional use. Of course one needs some symbols for digits beyond the Hindu-Arabic numerals. Let us discuss the hexadecimal system, which is base $b = $ XVI. Here the convention is to take the union of the symbols $\{0, 1, \ldots, 9\}$ and the symbols $\{A, B, C, D, E, F\}$. We have $A = 9'$, $B = A'$, \ldots, $F = E' = $ XV. Thus, for instance, if we wanted to convert the hexadecimal $2FACE$ into decimal, we would compute, in decimal,
$$2 \cdot 16^4 + 15 \cdot 16^3 + 10 \cdot 16^2 + 12 \cdot 16^1 + 14 \cdot 16^0 = 195278.$$
Let's convert 343 into hexadecimal. Following the notation above, we have

  (343) = (21)7
        = (1)57
        = 157.

If remainders larger than 9 had occurred, then we would have used the letters. For instance, $26 = 1A$.
Finally, how do you convert between two bases like binary and ternary, neither of which is decimal? There are multiple ways.
One way is to just use the division algorithm as above, but performing it in the given base. You need to know how to do that. Another way is cheap: convert the binary number into decimal, and then the decimal number into ternary. It's cheap, but you're less likely to make a mistake.
As an example, lets convert the binary number 1011 into ternary. Long division
of 1011 by 11 in binary gives a quotient of 11 and a remainder of 10, which is 2 in
Just learn one way that works for now, and eventually learn how to perform long
division in other bases.
Let us be more formal now and prove that any whole number can be written uniquely in place-value notation with any integer $b > 1$ as a base. The existence proof will use the idea of the bottom-up approach above, since anyway what we're doing is converting numbers into a given base place-value system. For the uniqueness proofs, both bottom-up and top-down proofs are given.
$$N = \sum_{i=0}^{m} d_i b^i,$$
as required.
I give two proofs of this: one bottom up and the other top down.
For the top-down proof, the idea is to successively show that the highest digits
are equal, and then cancel them one by one. If the highest digit on one side is bigger
than the other, then surely there is some contradiction. The key is the following
lemma:
Lemma 2.41. Let d_0, . . . , d_{m−1} be base b digits, and c ∈ N. Then
Σ_{i=0}^{m−1} d_i b^i < c · b^m.
Note there is one fewer digit. By the inductive hypothesis, d_i = d′_i for 0 ≤ i < m.
Therefore all the digits are equal, as claimed. We are done by induction.
8.5. Exercises
Because of the trick from the previous paragraph, it is actually true that you can
express d in both ways. However for the main part of the proof, one first realizes
one way or the other.
How will we prove this important theorem? It will be an induction proof. One needs
to locate an integer quantity that will strictly decrease as we iterate the algorithm.
We use the remainders that occur in each iteration of the division algorithm.
Proof. It should be clear that Mult(ab) ⊆ Mult(a, b). Let ℓ ∈ Mult(a, b). By
the Euclidean algorithm, there are numbers m, n so that ma − nb = 1. Therefore
maℓ − nbℓ = ℓ. Because ℓ is a common multiple of a and b, it is easy to see that aℓ
and bℓ are divisible by ab. Therefore the LHS, and hence ℓ, is a multiple of ab.
Proposition 2.45. Suppose a and b are relatively prime, and both divide some
number c. Then ab|c.
Proof. Let d be a common divisor of a and b. Then d divides the LHS and
therefore d|1. It follows that Div(a, b) = {1}.
The reader should contrast the following definitions, which will be referred to in
the exercises:
Definition. Let a_1, . . . , a_n ∈ N. We say they are pairwise coprime
provided that for all i ≠ j, gcd(a_i, a_j) = 1. We say they are relatively
prime provided that gcd(a_1, . . . , a_n) = 1.
For example, the three numbers 10, 21, 121 are pairwise coprime and relatively
prime, and the three numbers 6, 15, 35 are relatively prime but not pairwise coprime.
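Both notions are easy to test by machine. A Python sketch, using the standard-library gcd (the two helper functions are our own):

```python
from math import gcd
from functools import reduce
from itertools import combinations

def pairwise_coprime(nums):
    """gcd(a_i, a_j) = 1 for every pair i != j."""
    return all(gcd(a, b) == 1 for a, b in combinations(nums, 2))

def relatively_prime(nums):
    """gcd(a_1, ..., a_n) = 1."""
    return reduce(gcd, nums) == 1

print(pairwise_coprime([10, 21, 121]), relatively_prime([10, 21, 121]))  # True True
print(pairwise_coprime([6, 15, 35]), relatively_prime([6, 15, 35]))      # False True
```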
Here are some propositions which use these notions. We will not need them in the
sequel, so we just present them as exercises.
Proposition 2.48. If a_1, . . . , a_n ∈ N are relatively prime, then there are numbers
c_1, . . . , c_n ∈ N so that
(Σ_{i=1}^{n−1} c_i a_i) − c_n a_n = 1.
9.3. Exercises
(1) Let a, b_1, . . . , b_n ∈ N and suppose that gcd(a, b_i) = 1 for all i. Prove that
gcd(a, b_1 · · · b_n) = 1.
(2) Let a, b ∈ N. Proposition 2.42 gives numbers m, n ∈ N so that either
ma − nb = d or nb − ma = d. Prove that if one follows the Euclidean
Algorithm, then actually m ≤ b and n ≤ a. (Strong Induction; follow the
proof of Proposition 2.42.)
(3) Let a, b ∈ N be relatively prime. Let N ≥ (a − 1)(b − 1). By the Euclidean
Algorithm we know there are m, n ∈ N so that ma − nb = N. The goal
of this exercise is to prove that there exist c, d ∈ N so that ca + db = N.
(The class size problem.) Let k_0 = max{k ∈ N | m ≥ bk}. Prove that
ak_0 ≥ n. (Hint: Keep the Creeping Lemma handy.) Modify m, n using k_0
to get c, d as desired.
(4) Let a, b ∈ N be relatively prime. Prove that there do not exist m, n ∈ N
with ma + nb = ab − a − b. (Thus, using the previous exercise, ab − a − b
is the largest such number.)
(5) Given a natural number n, write φ(n) ∈ N for the number of integers from
1 to n which are relatively prime to n. For example φ(12) = 4 since there
are four such numbers: {1, 5, 7, 11}. Compute φ(n) for all the numbers n
from 1 to 25. Is it true that φ(mn) = φ(m)φ(n) for all m, n ∈ N?
(6) Prove that if a and b are relatively prime, and a|bc, then a|c.
(7) Let a, b ∈ N. Prove that lcm(a, b) divides every element of Mult(a, b).
(8) For which pairs of numbers d, ℓ do there exist a, b ∈ N so that d = gcd(a, b)
and ℓ = lcm(a, b)?
(9) Let d = gcd(a, b). Prove that a/d and b/d are relatively prime.
9.4. Ords
If m > 1 and n are natural numbers we want to define ord_m(n) to be the maximum
number of times m divides n. For example, ord_3(18) should be 2. Suggestively,
18 = 2^{ord_2(18)} · 3^{ord_3(18)}.
In fact, we will later prove that these ords are the exponents that occur in prime
factorizations of numbers.
Let us quickly check that these maximums exist. Remember, a set is guaranteed a
maximum as long as it has some upper bound, and is nonempty.
Proposition 2.50. Let m > 1 and n ∈ N. Then n is an upper bound for the set
{i ∈ N | m^i | n}.
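A quick Python sketch of this definition (the helper is called ord_ below, since ord is already a Python builtin):

```python
def ord_(m, n):
    """The maximum i with m**i dividing n, for m > 1 and n >= 1."""
    i = 0
    while n % m == 0:  # divide out one factor of m at a time
        n //= m
        i += 1
    return i

print(ord_(3, 18))  # 2
assert 18 == 2 ** ord_(2, 18) * 3 ** ord_(3, 18)
```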
Remarks:
(1) This terminology comes from the world of analysis, where ord means
order of vanishing. For example, the order of vanishing of f (x) =
Before working through this proof the reader should try a few examples.
Proof. Let i = ord_m(a) and j = ord_m(b). By the previous proposition we
may write a = m^i u and b = m^j v, with m ∤ u and m ∤ v. If i ≤ j then
a + b = m^i (u + v m^{j−i}).
If i < j it is easy to see that m ∤ (u + v m^{j−i}), so that ord_m(a + b) = ord_m(a) by the
previous proposition. If i = j then the same equation shows that m^i | (a + b). Since
ord_m(a + b) is the maximum of such exponents, we conclude that ord_m(a + b) ≥ i.
Of course the case i ≥ j is similar.
For every number n ∈ N, it is easy to see that 1, n ∈ Div(n). For some numbers,
these are the only elements of Div(n).
Thus a number p > 1 is prime if whenever d|p, then d = 1 or p. Another way to say
this is as follows. One might call a divisor d of n a proper divisor if 1 < d < n.
A number is composite iff it has a proper divisor. Then, p is prime if and only if it
does not have a proper divisor.
The next theorem has one of the most famous and treasured proofs in mathematics.
It goes back to Euclid's Elements.
Theorem 2.55. There are infinitely many prime numbers.
Proof. This follows from the previous proposition by gathering together iden-
tical prime factors.
Now we start to deal with the issue of the uniqueness of prime factorization. For
example, 1001 = 11 · 91 = 143 · 7. Does this violate unique factorization into
primes? (Thanks to [2] for this cute example.)
Proposition 2.58. Let p, a, b ∈ N with p prime. Then (p|ab) ⇒ ((p|a) ∨ (p|b)).
Let us analyze the preceding proof a bit more. Let P, Q, R be the statements:
P: (p is a prime) ∧ (p|ab)
Q: p|a
R: p|b.
Note that we can use this proposition to analyze the putative prime factorizations
of 1001 above. For instance 7 is prime (check!) and divides the right-hand side. It
therefore divides 11 or 91, which should lead the reader to suspect that 91 is not a
prime number.
Proposition 2.59. Let p, a_1, . . . , a_n ∈ N with p prime. If p|(a_1 · · · a_n), then there
is some 1 ≤ i ≤ n so that p|a_i.
The next proposition shows that if p is prime, then the function ord_p : N → N
behaves much like a logarithm.
Proposition 2.60. If p is prime and a, b ∈ N then ord_p(ab) = ord_p(a) + ord_p(b).
Proof. Suppose that a² = 2b². Taking ord₂ of both sides gives 2 ord₂(a) =
2 ord₂(b) + 1. The RHS is odd and the LHS is even, which is a contradiction.
This is essentially the proof that √2 is irrational, i.e., Theorem ??. This
argument should strike you as much more powerful and direct than the classic proof
we gave earlier. See Proposition 2.71 below for the final word in such problems.
Lemma 2.63. If p and q are primes, and p|q, then p = q.
Proof. If a|c then the result follows from the "only if" part of the previous
proposition. Conversely, if ord_p(a) ≤ ord_p(c) for all primes p, then let p_1, . . . , p_ℓ
be all the primes dividing a or c. Then put
b = ∏_{i=1}^{ℓ} p_i^{ord_{p_i}(c) − ord_{p_i}(a)}.
Thus we can characterize Div(n) as the set of numbers a so that for all primes
p, ord_p(a) ≤ ord_p(n). For every p, there are (ord_p(n) + 1) choices for ord_p(a), and
it follows that there are exactly ∏_p (ord_p(n) + 1) divisors of n.
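This count is easy to confirm computationally. A Python sketch (both helpers are our own, using naive trial division):

```python
def prime_ords(n):
    """Prime factorization of n as {p: ord_p(n)}, by trial division."""
    ords, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            ords[p] = ords.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        ords[n] = ords.get(n, 0) + 1
    return ords

def num_divisors(n):
    """Product of (ord_p(n) + 1) over the primes p dividing n."""
    prod = 1
    for e in prime_ords(n).values():
        prod *= e + 1
    return prod

print(prime_ords(360))    # {2: 3, 3: 2, 5: 1}
print(num_divisors(360))  # (3+1)(2+1)(1+1) = 24
```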
Proposition 2.70. Let m, n ∈ N. Then there exists an integer r ∈ N with r^m = n
iff for all prime numbers p, we have m | ord_p(n).
Remark: This is really a proof that n^{1/m} ∈ Q implies n^{1/m} ∈ Z. It's related
to the application of the rational root test to the polynomial x^m − n.
Proposition 2.72. Let a, b ∈ N. If d = gcd(a, b), then ord_p(d) = min(ord_p(a), ord_p(b))
for all primes p. If ℓ = lcm(a, b), then ord_p(ℓ) = max(ord_p(a), ord_p(b)) for all
primes p.
Proof. Let p be a prime, and suppose i = ord_p(a) ≤ ord_p(b). Then p^i | a
and p^i | b, so p^i ∈ Div(a, b), which is equal to Div(d) by Proposition 2.46. There-
fore p^i | d. However, p^{i+1} ∤ a, so certainly p^{i+1} ∤ d. It follows that ord_p(d) = i =
min(ord_p(a), ord_p(b)) in this case. Obviously if ord_p(b) ≤ ord_p(a) a similar argu-
ment holds.
We now have a straightforward way to compute the least common multiple of two
numbers. For instance let a = 75 and b = 21. We factor to get a = 3 · 5² and
b = 3 · 7. The only nonzero ords are for p = 3, 5, 7. If ℓ = lcm(a, b) then we must
have ord₃(ℓ) = 1, ord₅(ℓ) = 2, and ord₇(ℓ) = 1. This determines ℓ = 3 · 5² · 7 = 525.
Proof. Given the above discussion, this reduces to proving that min(x, y) +
max(x, y) = x + y for all x, y ∈ N. This is obvious from the proper point of view,
or the skeptical reader may consider the trichotomy of x and y.
This gives a way to compute lcm(a, b) without having to factor a and b. For
example, if a = 2000002 and b = 2000004 then the Euclidean Algorithm gives that
gcd(a, b) = 2 and therefore lcm(a, b) = ab/2 = 2000006000004.
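In Python, with the standard-library gcd (the lcm helper here is our own; Python 3.9+ also ships math.lcm):

```python
from math import gcd

def lcm(a, b):
    """lcm(a, b) = ab / gcd(a, b), computed without factoring a or b."""
    return a * b // gcd(a, b)

print(lcm(75, 21))            # 525, agreeing with the factorization method
print(lcm(2000002, 2000004))  # 2000006000004
```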
9.7. Exercises
the Peano theory; how properties of arithmetic derive from just a few
axioms
how to work with different kinds of definitions, for example, inductive
definitions, the definition of a ≤ b, and the definition of gcd(a, b).
strong induction
the role of the division algorithm and Euclidean Algorithm
arithmetic in other bases
the ord coordinates of numbers
(INJ) If m_1^# = m_2^#, then m_1 = m_2, and
(IND) If S ⊆ M is a subset containing the distinguished element of M and
satisfying m^# ∈ S whenever m ∈ S, then S = M. Define a bijection
f : N → M so that for all n ∈ N, f(n′) = f(n)^#. [Suggestion: Define
your function inductively.] Be sure to prove that your function is bijective.
At what points do you use the axioms (INF), (INJ), (IND) for N and M?
(You need all six.)
(2) For which a, b is it true that a^b = b^a? Let's see a proof!
(3) Let a, b ∈ N. Recall the definition of the statement "b is a power of a";
this means there is a natural number n ∈ N with a^n = b. If a > 1, write
log_a b = n in this situation. Is log_a b uniquely defined? Prove that if
a, b, c ∈ N with a > 1, c a power of a, and b a power of c, then
(log_a b)/(log_a c) = log_c b.
(4) Prove that the numbers q, r, e in Problem ?? in Section ?? are uniquely
determined.
(5) Let n ∈ N. Prove that in base X arithmetic there is a multiple of n which
is written as a string of 1's followed by a string of 0's. For example 11100
is a multiple of VI.
(6) Recall that hn denotes the nth Hemachandra number. Let a, b N. Prove
that gcd(ha , hb ) = hgcd(a,b) .
(7) Prove that given a number N one can find N consecutive numbers, each
having prime factors other than 2 or 3. Generalize this to any finite set
of primes.
(8) There is a formula relating ord_p(n!) and S_n, where S_n is the sum of the
base p digits of n. Can you find it? Can you prove it?
(9) Prove that ord_p of the binomial coefficient (n choose k) is the number of
carries that occur in the base p addition of k and n − k.
(10) Let m, n ∈ N. Put urd_m(n) = min{i ∈ N | n | m^i}, if this set is nonempty,
and let urd_m(n) = ∞ otherwise. For example, urd₆(4) = 2 since 4|6²
but 4 ∤ 6¹, and urd₄(6) = ∞ since 6 ∤ 4^i for any i. Find and prove some
interesting properties of urd.
(11) Find two sequences of base-ten digits a_1, a_2, . . . and b_1, b_2, . . ., with a_1 = 2
and b_1 = 5, so that for any natural number n, the product of the two
numbers (written in decimal) a_n a_{n−1} · · · a_2 a_1 and b_n b_{n−1} · · · b_2 b_1
ends with at least n zeros. For example, if your sequences started with
a_1 = 2, a_2 = 1, b_1 = 5, and b_2 = 2, then they would check out for n = 1
and n = 2, because 2 · 5 = 10 ends with a zero, and 12 · 25 = 300 ends
with two zeros. Also, are there analogues in any base, not just 10?
CHAPTER 3
Functions and Relations
1. Relations
So far we have seen propositions P which do not depend on a variable, and propo-
sitions P (x), where x ranges over some set X.
We also might like to make propositions P (x, y), where x ranges over some given
set X, and y ranges over some other set Y .
For example, consider the statement, She has enough money to buy it. Here she
and it are variables. We may write the statement as P (she, it), where she ranges
over the set of females, and it ranges over the set of commodities (things you can
buy). Its truth value depends on the pair (she, it). Generally, the truth value of
P (x, y) will be true for certain pairs (x, y), and false for the rest.
(1) X × Y = {(1, A), (1, B), (2, A), (2, B), (3, A), (3, B)}.
If X and Y are finite, our convention is to display the product set as an array of
pairs, with the rows corresponding to elements of X and the columns corresponding
to elements of Y .
For our bivariable statement P(x, y), its truth set is given by
{(x, y) ∈ X × Y | P(x, y) is true}.
This subset could be anything. For instance, if again X = {1, 2, 3} and Y = {A, B},
then a relation could be given by
(2) R = {(1, B), (2, A), (3, A), (3, B)}.
Of course this depends on the way we order the sets X and Y. For instance, if we
had ordered the columns as B, A instead of A, B, then we would need to accordingly
switch the columns of A_R to fit this convention.
Among other things, asterisk-matrix notation gives us a nice way to catalogue finite
relations on finite sets. There are 16 relations on the set X = {a, b}. Here they are
represented as asterisk-matrices, with the first row and column corresponding to a
and the second row and column corresponding to b:
[The sixteen 2 × 2 asterisk-matrices were displayed here, one for each subset of
X × X.]
We have names for some of these matrices. For instance, A_∅ is the matrix with
no asterisks, I_X is the matrix with asterisks exactly on the diagonal, A_{X×X} is
the matrix with an asterisk in every entry, and E_{a,b} is the matrix whose only
asterisk is in the (a, b) position.
One always has the empty relation R = ∅, and the total relation R = X × Y.
In the former case xRy is always false, in the latter case xRy is always true. If
x₀ ∈ X and y₀ ∈ Y, write δ_{x₀,y₀} for the relation from X to Y consisting of the
single pair {(x₀, y₀)}. These are called Dirac-delta relations. In this case xRy ⟺
(x = x₀) ∧ (y = y₀).
1.2. Reflexivity
If X is a finite set with say three elements, then we would naturally represent
equality on X with the diagonal asterisk-matrix, which we write as I_X:
I_X =
∗ · ·
· ∗ ·
· · ∗
Example: The relation x|y on N is reflexive, but the relation x < y on N is not.
Example: Let L be the set of lines in the plane. Then parallelism is a reflexive
relation on L, but orthogonality is not.
Not all relationships are symmetric. The relation R: "Person x is taller than person
y" is certainly not. The transpose of a relation is what you get when you switch
the roles of x and y. So, the transpose of R is the relation R^T: "Person y is taller
than person x."
[The asterisk-matrix A_{R^T} was displayed here.]
Example: Take X = R. To find the transpose, one reflects through the line y = x.
This has the effect of switching the coordinates (x, y) to (y, x). Note that S¹ and
D² are symmetric. The transpose of the x-axis ℓ₀ is the y-axis ℓ_∞. What is the
transpose of ℓ_m with m ≠ 0? (Pssst: If R is the graph of a function f, and f is
invertible, then R^T is the graph of the inverse of f!) The transpose of the second
quadrant is the fourth quadrant. The first and third quadrants are symmetric.
∅^T = ∅.
(X × Y)^T = Y × X.
(R^T)^T = R.
For x ∈ X and y ∈ Y, we have (δ_{x,y})^T = δ_{y,x}.
1.4. Exercises
(1) Let X be a set with n elements. How many relations R on X are sym-
metric? How many are reflexive? How many satisfy R^T ∩ R = ∅?
2. Composition of Relations
Thus x(S ∘ R)z iff there is a y so that xRy and ySz. To say that a relation R on a
set X is transitive is the same as to say that whenever xRy and yRz are true for
x, y, z ∈ X, then we also have xRz is true.
[The example relation S was displayed here.]
Example: (Squaring the Circle) Let R = S¹, the graph of the unit circle in R².
Then
R ∘ R = {(x, z) ∈ R² | ∃y ∈ R so that x² + y² = 1 = y² + z²}.
Let us try to understand this relation well enough to sketch it. Certainly it implies
that x² = z², or that x = ±z. But if you have such a pair (x, z), there still may
not exist such a y, because you also need to solve x² + y² = 1. This is possible iff
−1 ≤ x ≤ 1. So our relation is:
R ∘ R = {(x, z) ∈ R² | (x² = z²) ∧ (−1 ≤ x ≤ 1)},
which is the graph of the two diagonals of the square [−1, 1] × [−1, 1].
For this I ought to tell you how to add and multiply 0's and ∗'s. The idea is
to treat ∗ as unknown. The sum of an unknown and anything is an unknown,
but the product of an unknown and 0 is still 0. This leads to the following addi-
tion/multiplication tables:
+ | 0 ∗           · | 0 ∗
0 | 0 ∗    and    0 | 0 0
∗ | ∗ ∗           ∗ | 0 ∗
Using these rules you can define matrix multiplication using the usual dot products.
For instance, in the above example, the product of A_R and A_S may be computed
entry by entry. The result is precisely equal to A_{S∘R}, and therefore in this case,
A_{S∘R} = A_R A_S.
Here the multiplication on the right-hand side is asterisk-matrix multiplication.
In fact, this is true in general. We omit the proof, but isn't it curious that the order
of multiplication changed?
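Since the worked matrices above were garbled in this copy, here is a self-contained Python sketch with stand-in relations R and S of our own: it composes relations as sets of pairs, builds boolean asterisk-matrices, and checks that A_{S∘R} = A_R A_S, with the order reversed as noted.

```python
# Hypothetical small example; the relations here are stand-ins, not the text's.
X, Y, Z = [1, 2], ["A", "B"], ["u", "v"]
R = {(1, "A"), (2, "B")}      # relation from X to Y
S = {("A", "v"), ("B", "u")}  # relation from Y to Z

def compose(S, R):
    """S o R relates x to z iff there is some y with xRy and ySz."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def matrix(R, rows, cols):
    """Boolean asterisk-matrix: entry (i, j) is True iff (rows[i], cols[j]) in R."""
    return [[(r, c) in R for c in cols] for r in rows]

def mat_mult(A, B):
    """Boolean matrix product: 'or' of 'and's along the inner index."""
    return [[any(A[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

SR = compose(S, R)
# A_{S o R} equals A_R * A_S -- note the reversed order of multiplication.
assert matrix(SR, X, Z) == mat_mult(matrix(R, X, Y), matrix(S, Y, Z))
print(sorted(SR))  # [(1, 'v'), (2, 'u')]
```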
Suppose you have a graph G consisting of vertices V and edges E between them.
Every edge e starts at some initial vertex v₁ and ends at some terminal vertex
v₂. In graph theory, it doesn't really matter if the edges are straight lines or not,
and it is fine to move the vertices and edges around, as long as the same vertices
are connected with the same edges. When we use computers to study graphs, the
essential information we upload is our vertex set, maybe V = {a, b, c}, and our edge
set. To describe an edge to a computer, we only need to say where it begins and
ends. So e above would be uploaded as e = (v₁, v₂) ∈ V × V. Thus, the edge set
E ⊆ V × V is simply a relation on V.
For example, consider the following graph on the vertex set V = {a, b, c}:
[There should be a nice picture here. Can you reconstruct it? The edges must all
have arrows.]
Let us note some phenomena above. The edge from b to b is called a loop;
generally a loop is an edge that begins and ends at the same vertex. With our
earlier terminology, the set of loops is exactly E ∩ I_V. Thus E will be reflexive if
there is a loop at every vertex.
Vertices a and c have two edges between them, but they are not considered the same
edge because they are going in different directions. When we have two vertices
joined in both directions by two edges, we should replace the two edges with a
simple edge with no arrows:
[Picture where we replaced the two edges with arrows from a to c with a single edge
without an arrow.]
The relation E is symmetric iff all the edges are now simple.
[I'd like to describe what I call the bipartite view of a relation between two sets.
This is where you draw two horizontal ovals with some dots in between them, and
draw arrows from the dots on the left to the dots on the right. For instance when
I spoke in class about functions I drew several of these.]
2.4. Exercises
(1) There are three relations on X = {a, b} which are not transitive. Find
them.
(2) Find a relation R on X = {a, b} which is transitive, but for which R ∘ R ≠
R.
(3) Let X be a set, and R a relation on X. Prove that if R is reflexive and
transitive, then R ∘ R = R.
(4) There are five equivalence relations on the set X = {a, b, c} of three dis-
tinct elements. Find them, and write them as asterisk-matrices.
(5) Let P¹ be the set of lines in R² passing through the origin. On P¹, consider
the relation ℓ₁Rℓ₂ provided that ℓ₁ and ℓ₂ are orthogonal. Compute R ∘ R.
(6) Let P² be the set of lines in R³ passing through the origin. On P², consider
the relation ℓ₁Rℓ₂ provided that ℓ₁ and ℓ₂ are orthogonal. Compute R ∘ R.
(7) Compose the relations ℓ_m, ℓ_∞, S¹, D² from the first section with each
other. Can you find examples of relations which do not commute here?
Also compose these relations on both sides with the total relation R².
(8) Consider the relation R ⊆ R² given by the square pictured below. The
square has vertices (0, 0), (1, 0), (0, 1), and (1, 1), and it is the union of
four closed intervals in the obvious way. Describe the relation R ∘ R.
[Picture of the square with opposite corners (0, 0) and (1, 1).]
(9) If R′ ⊆ R and S′ ⊆ S, then R′ ∘ S′ ⊆ R ∘ S.
(10) Let S ⊆ Y × Z and R ⊆ X × Y be relations. Suppose that S = S₁ ∪ S₂.
Prove that S ∘ R = (S₁ ∘ R) ∪ (S₂ ∘ R).
(12) What happens when you compose the elementary relations δ_{x,y} with other
relations? (Think about both δ_{x,y} ∘ R and R ∘ δ_{x,y}.) Suggestion: Experi-
ment with the relations on R².
(13) Let X, Y be sets. Give a formula for δ_{x,y} ∘ δ_{x′,y′}, where x, x′ ∈ X and
y, y′ ∈ Y.
(14) Square all 16 asterisk matrices in the previous section. Make a list of
the 2 × 2 matrices which are squares. I think it is an interesting project
to determine which n × n asterisk matrices are squares. This, of course,
is equivalent to determining which relations on a set with n elements are
squares.
3. Functions
The modern definition of a function was not enunciated until the middle of the
19th century, a considerable time after the advent of calculus, for instance. Math-
ematicians before this basically dealt with expressions, usually power series or
rational functions (quotients of polynomials). Words like "singularity" were used
for a point not in the domain of a function. If you were an expert mathematician,
you knew what you were doing. But without the modern notion, it is difficult to
understand things like inversion. For instance, the inverse trig functions are difficult
to grasp without the proper vocabulary of domain and codomain.
The words function, transformation, and map are all synonyms. Some use
the word target as a synonym for codomain. The word range is used in-
consistently, sometimes meaning codomain and sometimes meaning image (see
below). It is best avoided.
For example, the function f : R R given by f (x) = x2 has domain R and also
codomain R.
You may rightfully argue that in the (standard) definition above, I have used an
undefined concept "rule". This is not insurmountable; I will later indicate how
one can alternately define function in terms of a certain kind of relation. (One
that passes the vertical line test.) It is essential, however, that the function is
well-defined, or well-formed.
Example: Writing f(x) = 1/(2 − x) does not define a function f : R → R; it is
obviously not defined at 2. You can't later say f(2) = ∞, unless you change the
codomain to include ∞. (Or −∞. Let us not talk about that.)
Example: Expecting f(z) = √z to define a function f : C → C is (very) bad.
What should √−1 be?
3.1. Injectivity
Suppose you are grading a student's work, in which he is trying to demonstrate
that 2 − √2 = √2. It reads:
To show: 2 − √2 = √2
1 − √2 = √2 − 1
(1 − √2)² = (√2 − 1)²
1 − 2√2 + 2 = 2 − 2√2 + 1
3 − 2√2 = 3 − 2√2
Hence, proved.
Where did the student go wrong? First of all, he is not really explaining his logic
with sentences. "Hence, proved." is not useful. So we have to guess his thought
process. The first step is evidently subtracting 1 from both sides of the equation,
and the second step is squaring both sides of the equation. But, the implied logic
seems to be: I wanted to prove two things are equal, I apply some operations to
both of them and they become equal. Therefore the original two things must be
equal. This only works if the operations are injective, and that is exactly where
the putative proof breaks down. So, injectivity is an important notion to bear in
mind throughout mathematics.
Is f(x) = x² injective? Hold on; the question of injectivity really depends on the
domain of the function. If we mean f : R → R it is certainly not injective, since,
for example, f(−1) = f(1) = 1.
If you look at the graph of a real-valued function, with domain some subset of R,
the function is injective iff it satisfies the horizontal line test. This means that
any horizontal line must only intersect the graph at most once if the function is to
be injective. Can you explain why?
Proof. Let x ≠ y be in I. We may assume that x < y. Then f(x) < f(y),
which implies that f(x) ≠ f(y). Therefore f is injective.
Similarly, if f′(x) < 0 for all x ∈ I, then f is strictly decreasing, which also implies
that it is injective.
Corollary 3.6. If f : I → R has f′(x) > 0 for all x ∈ I, then f is injective. If
f′(x) < 0 for all x ∈ I, then f is also injective.
3.2. Surjectivity
A function need not take on all the values in its codomain. Sometimes this set of
values is hard to calculate, as with f : R → R given by f(x) = x⁴ − 7x + 1.
Let me state a powerful theorem from calculus, which combines the Extreme Value
Theorem and the Intermediate Value Theorem:
Theorem 3.7. Let a < b in R and f : [a, b] → R a continuous function. Put
m = min{f(x) | x ∈ [a, b]} and M = max{f(x) | x ∈ [a, b]}. Then the image of f
is equal to [m, M].
(The existence of the min and max is from the Extreme Value Theorem.)
3.3. Bijectivity
Example: Let us see that the function f : R² → R² defined by f(x, y) = (x, x + y) is
a bijection. For injectivity, suppose that f(x₁, y₁) = f(x₂, y₂). Thus (x₁, x₁ + y₁) =
(x₂, x₂ + y₂), which is the statement that (x₁ = x₂) ∧ (x₁ + y₁ = x₂ + y₂). It follows
easily that (x₁, y₁) = (x₂, y₂), and so f is injective.
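A quick numerical sanity check (a snippet of our own, not the text's argument): the explicit inverse g(u, v) = (u, v − u) confirms bijectivity at sample points.

```python
# f(x, y) = (x, x + y) has explicit inverse g(u, v) = (u, v - u),
# which witnesses bijectivity; here we check it on a few sample points.
f = lambda x, y: (x, x + y)
g = lambda u, v: (u, v - u)

for (x, y) in [(0.0, 1.0), (-2.5, 3.0), (7.0, -7.0)]:
    assert g(*f(x, y)) == (x, y)  # g o f = id
    assert f(*g(x, y)) == (x, y)  # f o g = id
print("f is a bijection on these samples")
```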
Proof. For the first part, let x1 , x2 X. If g(f (x1 )) = g(f (x2 )), then since g
is injective we have f (x1 ) = f (x2 ), and since f is injective we deduce that x1 = x2 .
This shows that g f is injective.
The third part follows from the first and second parts, and the rest you should do
yourself.
Merely one of the conditions, i.e., g ∘ f = id_X, is not enough, by the example above.
Proof. We have
g₁ = id_X ∘ g₁
= (g₂ ∘ f) ∘ g₁
= g₂ ∘ (f ∘ g₁)
= g₂ ∘ id_Y
= g₂.
Inverses depend very much on the domain and codomain of the function. For
instance f₁ : [0, ∞) → [0, ∞) given by f₁(x) = x² has inverse g₁ : [0, ∞) → [0, ∞)
given by g₁(x) = √x, but f₂ : (−∞, 0] → [0, ∞) given by f₂(x) = x² has inverse
g₂ : [0, ∞) → (−∞, 0] given by g₂(x) = −√x.
This is especially prominent for the inverse trig functions. For instance, the sine
function naturally has domain R and codomain R, but it is not injective or surjec-
tive as such. One typically restricts the domain to [−π/2, π/2] and the codomain
to [−1, 1] to obtain a bijection. Thus, one has an inverse arcsin : [−1, 1] →
[−π/2, π/2]. Note that with this convention arcsin(y) doesn't take the value π,
even though sin(π) = 0.
One could easily find other domains on which sin is injective, such as [π/2, 3π/2].
But the original one is more commonly taken, and numbers in this range are called
the principal values of the arcsine function.
A worse situation is trying to invert the cotangent function. [picture] Which do-
main should we take for cotangent? Different authorities make different choices.
Wikipedia, for instance, says that we should invert cotangent on (0, π) ⊆ R, but
Mathematica says we should restrict cotangent to a function
cot : (−π/2, 0) ∪ (0, π/2] → R.
Both of these are bijections. What is arccot(−1), for instance? Wikipedia says it
should be 3π/4 and Mathematica says it should be −π/4.
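The two conventions can be written down directly (a Python sketch; the function names are our own): the first inverts cotangent on (0, π), the second uses arccot(x) = arctan(1/x) for x ≠ 0.

```python
from math import atan, pi

def arccot_wiki(x):
    """Convention inverting cot on (0, pi): values land in (0, pi)."""
    return pi / 2 - atan(x)

def arccot_mma(x):
    """Convention with values in (-pi/2, 0) or (0, pi/2]: arctan(1/x) for x != 0."""
    return atan(1 / x)

print(arccot_wiki(-1) / pi)  # 0.75,  i.e. 3*pi/4
print(arccot_mma(-1) / pi)   # -0.25, i.e. -pi/4
```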
This can be confusing if you don't have a handle on the domain/codomain concept.
Rather than trying to memorize conventions, try to understand what the logical
issue is.
3.7. Exercises
4. Functions as Relations
In this section we embed the concept of a function into the concept of a relation.
[Examples]
We now extend the notions of injective, surjective, and bijective to the context of
relations.
4.1. Exercises
Examples:
The equality relation is weaker than any other partial ordering. The product order
on N² is weaker than the lexicographic order.
Note that the relation "≤₁ is weaker than ≤₂" is itself a partial order on the set of
partial orders on a set.
5.1. Exercises
(1) List all partial orders on J3 . How many are total orders? How many are
well-orders?
(2) How many total orders are there on Jn ?
(3) Show that on a set X, no partial orders are strictly stronger than total
orders.
(4) Let X be a toset. Let
X₀ = ⋃_{x∈X} X_x.
6. Chapter 3 Wrap-up
A little about what ∼-invariant functions are, especially for defining func-
tions on angles.
(1) Let X be a set. Which relations on X commute with all other relations?
(Relations R and S on X commute provided that R ∘ S = S ∘ R.)
(2) Let X be a set. We say that a relation S on X is a square root of a
relation R provided that S ∘ S = R. Does every relation have a square
root? Is the square root unique if it exists?
Does the unit circle relation on R have a square root?
CHAPTER 4
Cardinality
Note that:
X ∼ X by using id_X.
If X ∼ Y, then Y ∼ X. This is because the inverse of a bijection is
another bijection.
If (X ∼ Y) and (Y ∼ Z), then X ∼ Z. This is because the composition of
two bijections is a bijection.
Lemma 4.1. Let n ≥ 1 and 0 ≤ j < n. Then there is a bijection from J_n − {j} to
J_{n−1}.
For the second, use Theorem 3.13 together with the first part. The third part
follows from the first two parts.
We have just proved that J_m × J_n and the power set of J_n are finite sets.
In particular, |∅| = 0.
Theorem 4.4. Let X, Y be finite sets. Say |X| = m and |Y| = n. Let f : X → Y
be a function. Then
Corollary 4.7. Any subset of J_n is finite. Any subset of a finite set is finite. If
X ⊆ Y with Y finite and X ≠ Y, then |X| < |Y|. In particular, a finite set is not
equipotent to a proper subset of itself. If f : X → Y is a surjection, and X is finite,
then Y is also finite.
The reader familiar with linear algebra may appreciate the following theorem, which
is analogous to Theorem 4.4:
Theorem 4.8. Let V, W be finite-dimensional vector spaces, with dim V = m and
dim W = n. Let f : V → W be linear. Then
All linear maps on finite-dimensional vector spaces are the following, up to notation:
Let A be a matrix with n rows and m columns, and let L_A(v) = Av (matrix-vector
multiplication). Then L_A : R^m → R^n is a linear map, and thus the previous
theorem applies. There are numbers called rank and nullity associated to matrices.
The rank of A is n iff LA is surjective, and the nullity of A is 0 iff LA is injective.
In fact, the theorem above is deduced from the rank-nullity theorem (which says
that rank(A) + nullity(A) = m).
Proposition 4.9. Let X and Y be finite sets, with |X| = n and |Y | = m. Suppose
there is a map f : X Y whose fibres all have the same cardinality d. Then
n = dm.
Define a map F : X → Y × J_d by
F(x) = (f(x), φ_{f(x)}(x)),
where, for each y ∈ Y, φ_y is a bijection from the fibre f⁻¹(y) to J_d.
Thus, if f(x) = y, then
F(x) = (y, φ_y(x)).
We claim that F is a bijection. Suppose that F(x) = F(x′). Then f(x) = f(x′) = y,
so x and x′ are both in f⁻¹(y). The fact that F(x) = F(x′) also gives φ_y(x) =
φ_y(x′). Since φ_y is injective, we have x = x′. Therefore F is injective. To show
that F is surjective, let y ∈ Y and j ∈ J_d. Let x be the element in the fibre of y
which maps to j under φ_y; it is easy to see that F(x) = (y, j).
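A concrete instance of the proposition (a toy example of our own, indexing from 0 for convenience): the map x ↦ x mod 4 on a 12-element set has all fibres of size d = 3, so 12 = 3 · 4.

```python
# f(x) = x mod 4 on {0, ..., 11} has four fibres, each of size d = 3,
# so |X| = d * |Y| as in the proposition.
X = range(12)
f = lambda x: x % 4
fibres = {y: [x for x in X if f(x) == y] for y in range(4)}
assert all(len(fib) == 3 for fib in fibres.values())
assert len(X) == 3 * 4
print(fibres[1])  # [1, 5, 9]
```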
Proof. Exercise.
Let X be a set of pigeons, and Y be a set of holes. We can think of putting pigeons
in holes in terms of a function from X to Y . The fibre over a hole is the set of
pigeons put into that hole. If |X| > d|Y |, then there must be at least one hole with
more than d pigeons.
Proposition 4.15. A ⊔ B ∼ B ⊔ A.
(A ⊔ B) ⊔ C ∼ A ⊔ (B ⊔ C).
[Example]
[Example]
(1) X is infinite.
(2) There is an injection from N to X.
(3) There is a surjection from X to N.
(4) X is equipotent to a proper subset of itself. (Dedekind's Criterion)
Proof. For (2) ⇒ (1), suppose that X is finite, with |X| = n, and there is
an injection from N to X. Then there is an injection J_{n+1} → N → X → J_n,
contradicting [earlier].
For (1) ⇒ (2), suppose that X is infinite. We will recursively define an injection
f : N → X, as follows. Since X is nonempty, there is some x₀ ∈ X. Put f(0) = x₀.
Now suppose that an injection f : J_n → X is given. If f is a surjection, then it is a
bijection, contradicting the infinitude of X. Otherwise, there is some x_n ∈ X not
in the image, and we put f(n) = x_n. Thus we recursively have distinct elements
f(n) for all n ∈ N; the resulting f is an injection.
For (2) ⇒ (4), let f : N → X be an injection. Let x_n = f(n) for all n and
Z = X − {x₀, x₁, . . .}. We define a bijection g : X → X − {x₀} so that g is the
identity on Z and g(x_n) = x_{n+1}.
1.6. Exercises
Here (a, b) denotes the set of real numbers x with a < x < b.
(15) Show carefully using the definitions in class that the coproduct X ⊔ X is
equal to X × J₂.
(16) Give an explicit bijection from the coproduct J_m ⊔ J_n to J_{m+n}.
(17) Prove carefully that the coproduct of two finite sets is finite, assuming the
previous exercise.
(18) Draw a picture of the coproduct ⊔_{n∈N} nZ as a subset of Z × N.
(19) Prove that if X and Y are sets, then the coproducts X ⊔ Y and Y ⊔ X
are equipotent.
Let B be the set of sequences, where each term is either 0 or 1. For
instance,
(0, 1, 1, 0, 0, 0, 1, 1, 0, 1, . . .) B.
(a) Prove that B is equipotent to B × B.
(b) Find a surjection from B to [0, 1] ⊆ R.
2. Countable Sets
{2, 3, 4, . . .}
{2, 4, 6, 8, . . .}.
Z.
Note that if a set X is denumerable, this means that the elements of X can be
expressed as a sequence {x1 , x2 , . . .} of distinct elements.
Proposition 4.17. Let n ≥ 1. Then the product N × Jn is denumerable.
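An explicit bijection witnessing this can be sketched in code, using 0-based indexing and division with remainder (the function names are mine, not the book's):

```python
def to_pair(k, n):
    """Bijection N -> N x Jn (0-based): k maps to (k // n, k % n)."""
    return (k // n, k % n)

def to_index(q, r, n):
    """Inverse map: (q, r) with 0 <= r < n maps back to q*n + r."""
    return q * n + r

n = 3
print([to_pair(k, n) for k in range(6)])
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
assert all(to_index(*to_pair(k, n), n) == k for k in range(100))
```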
Since there is an injection of ∐i∈I Xi into the countable set N × I, we conclude that
∐i∈I Xi is countable.
Corollary 4.27. If I is countable, and each Xi is countable, then ∪i∈I Xi is
countable.
Proof. The map p : ∐i∈I Xi → ∪i∈I Xi defined by p(xi, i) = xi is a surjection.
Since the domain is countable, the codomain must be as well.
Example: The set Z[x] is denumerable: Let Z[x]d be the set of polynomials of
degree at most d. It is clearly equipotent to Z^(d+1) and hence denumerable. Now
Z[x] = ∪d≥0 Z[x]d ,
To appeal to the corollary, one often says a countable union of countable sets is
countable.
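The enumeration behind this slogan is the usual walk along the anti-diagonals of N × N; a sketch (0-based indices, helper name mine):

```python
def diagonal(limit):
    """List the first `limit` pairs of N x N, walking anti-diagonals:
    (0,0), then (0,1), (1,0), then (0,2), (1,1), (2,0), ..."""
    out, s = [], 0
    while len(out) < limit:
        for i in range(s + 1):
            if len(out) == limit:
                break
            out.append((i, s - i))
        s += 1
    return out

print(diagonal(6))  # [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```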
3. Uncountable Sets
3.1. Uncountability of R
If yes, then r ∉ f(r) = R, a contradiction.
Thus either way we have a contradiction, and we conclude that there does not exist
such a surjection.
Theorem 4.29 implies that there is no bijection from any set to its power set. In
some sense it means that, even if X is infinite, ℘(X) is still bigger.
This follows from Theorem 7.34; please accept it for now to get to our application.
Proposition 4.32. The set of algebraic numbers is countable.
Then
Q̄ = ∪p∈I Z(p)
is a denumerable union of finite sets, and is therefore countable.
Corollary 4.33. There exist transcendental numbers.
3.4. Exercises
4. Interlude on Paradoxes
We are soon going to run into some dangerous logical territory, with proofs by
contradiction on the verge of creating paradoxes in mathematics. This section is
likely to delight many of you but horrify the rest. Let's begin.
112 4. CARDINALITY
A great interview question, if you don't like the candidate, is "Is the answer to
this question 'No'?" If they say "Yes," then they haven't answered the question
properly. If they say "No," then the answer to the question was not "No," so they have
not given the right answer. This paradox is a variation of the Liar's Paradox, which
is, "I am lying." Is that true or false? Again, both answers lead to a contradiction.
For us, the appropriate version is:
This is very bad. Suppose that one has a statement P so that both P and ¬P are true.
That is, a contradiction in mathematics. Let Q be any other statement. Then by
the tautology
¬P ⇒ (P ⇒ Q),
Q must be true. Since Q was arbitrary, every possible statement is true. This
sounds like a disaster for mathematics.
How does one resolve this quandary? Well, first of all, what I actually said earlier
was: In mathematics, every well-formed statement is true or false. This and
subsequent paradoxes point to the need to more rigorously define the notion of
well-formed. When one studies the subject of Logic, one takes great pains to say
exactly what is meant by this. For instance, Srivastava's book [12] starts by defining
what the different kinds of symbols are, what terms are made up of these symbols,
what formulas are made up of terms, etc. The method is recursive, and one
does not see a way to form a self-referential statement. We will not give these de-
tails in these notes, but the resolution is essentially that self-referential statements
are not well-formed.
Next is a linguistic paradox. I do this because I don't want you to go thinking that
paradoxes are entirely the fault of mathematicians...
For example, the word "noun" is a noun. So the word "noun" is autological. The
word "verb" is not itself a verb, so the word "verb" is heterological.
Here are some heterological words: "bisyllabic," "incomplete," "tree," "red," "long."
(Don't take this too seriously. Of course with many adjectives it is a grey area
whether they are one or the other.)
Therefore there is no correct yes or no answer to the question. Too bad for
linguistics.
Let n be the smallest natural number not definable in fewer than twelve words.
But we just defined it with eleven words!
Let S denote the set of all sets, sometimes called the Universal Set. Remember
that the empty set ∅ is the set so that ∀x, x ∉ ∅. Well, S is the set so that ∀x,
x ∈ S. Pretty simple to understand, right? One curious thing you'll notice about
S is that it is an element of itself, meaning S ∈ S. Can you think of other sets like
that? We have the set of infinite sets, say, I = {A ∈ S | A is infinite}; certainly I
is infinite, so I ∈ I? Does the set of abstract thoughts qualify?
Let R = {A ∈ S | A ∉ A}. This is the set of sets which are not members of
themselves. For instance N ∈ R, since N ∉ N. (Of course N ⊆ N, but that is a
different thing.)
Here's a way to think about it: S is not well-formed, because if it is itself to be a set,
then its definition is again self-referential. So not everything you can name qualifies
as a set. This paradox is very important historically; it called for a profound
reexamination of what qualifies as a set. Since sets are the foundation for all of
modern mathematics, many logicians worked very hard to articulate exactly what
should or what shouldn't be a set. In fact, we now no longer allow any sets to be
members of themselves. So there is no set of all sets, or set of all infinite sets.
We don't even have a set of all finite sets, or indeed a set of all singletons. I'll
show you soon how that last one leads to a paradox.
Ready for more? Here's a hair-raising one, that destroyed a monumental work of
Frege, one of the fathers of logic. Frege had a theory of number that came before
Peano. He had a wild idea for defining numbers. He said that the number n
should be the set of all sets X with |X| = n. For instance 1 should be the set
of all singletons. (By singleton I mean a set with only one element.) When the
set-theory paradoxes appeared, one of them undermined his opus Grundgesetze
der Arithmetik. We will describe this now.
Let S be the set of all singletons. Observe that every set X injects into S, because
one can define f : X → S to be f(x) = {x}. In particular, if we put X = ℘(S) we
obtain an injection f : ℘(S) → S. Such an injection would yield a surjection from
S onto ℘(S), contradicting Theorem 4.29.
The resolution of this paradox is that there is no set of all singletons! This kind of
paradox is quite alarming, because there's nothing obviously self-referential in the
formation of S.
Exercise: Let n ∈ N be any number. Show that the notion of the set of all sets with
n elements leads to a paradox.
Most paradoxes of the sort mentioned above have a corresponding unadox. Recall
that a paradox is a statement which, when assigned either truth value, gives a
contradiction. A unadox is a statement which, when assigned either truth value,
does not give a contradiction. Each unadox could be true or false; there is no way
to tell, but neither true nor false would give any contradiction. It is associated to
the paradox by changing a key false somewhere (maybe implicit) to a true.
Example: The Liar's paradox is: This statement is false. The corresponding
unadox is: This statement is true. If you declare that it is true, that is consistent
because the statement is then true. If you declare that it is false, that is consistent
because the statement is then false.
Do you get the idea? Can you find what the corresponding unadox for each of the
following paradoxes should be?
5. Some History
6. Chapter 4 Wrap-up
Equivalence
1. Equivalence Relations
There are some basic axioms of equality which we use all the time, usually without
thinking. No matter what elements a, b, c we have of some set, we always have:
a = a,
a = b ⇒ b = a,
((a = b) ∧ (b = c)) ⇒ (a = c).
1.1. Partitions
How would a graph of vertices and edges look if it gives an equivalence relation on
the vertices? Since it is reflexive, there will be a loop at every vertex. Since it is
symmetric, every edge will be simple. Transitivity is the interesting property. If
there is a (simple) edge connecting v1 to v2 and another connecting v2 to v3 , then
there must be a third edge connecting v1 to v3 .
[Picture]
Try drawing graphs with this transitivity property. You'll quickly find that your
graphs are all disjoint unions of complete graphs. That is, no two of them intersect.
Here is an example of such:
[Picture: a graph on the vertices a through k which is a disjoint union of three
complete graphs; the vertices a, b, c form a triangle.]
The triangle in the graph above can be expressed as [a] or [b] or [c]. And so [a] =
[b] = [c]. The entire graph is the union X = [a] ∪ [d] ∪ [f] of three equivalence classes.
This union can also be expressed in other ways, for instance: X = [b] ∪ [e] ∪ [f] of
course.
We will prove that if X is a set with an equivalence relation, then the set of
equivalence classes forms a partition of X. As another example take X = Z and the
mod 2 equivalence relation above. The corresponding partition of Z consists of the
set of even numbers and the set of odd numbers.
The set of odd numbers is the equivalence class containing 1. It is also the equiv-
alence class containing 13. In modular arithmetic we usually write x̄ for [x]. So
in mod 2 equivalence 0̄ denotes the set of even numbers, and 1̄ the set of odd
numbers. But it is also true that 4̄ is the set of even numbers, since any number
equivalent to 4 is even. Thus 0̄ = 4̄. Similarly, 13̄ = 1̄.
Remark: When using this symbolism, it is important that the mod 2 is understood.
Context lets you know that this is not mod 3, for instance.
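On a finite sample the classes can be computed directly; a quick sketch (the helper name is mine, and it relies on Python's % always returning a nonnegative remainder for a positive modulus):

```python
def classes_mod(n, universe):
    """Partition a finite sample of integers into equivalence
    classes mod n, keyed by remainder."""
    part = {}
    for x in universe:
        part.setdefault(x % n, []).append(x)
    return part

print(classes_mod(2, range(-4, 5)))
# {0: [-4, -2, 0, 2, 4], 1: [-3, -1, 1, 3]}
```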
Proposition 5.2. Let X be a set with an equivalence relation ∼. Then the equiv-
alence classes [x] form a partition of X.
Let's say you have a set with an equivalence relation. If you really think of equiv-
alent things as being the same, in the sense that you replace x ∼ y with x = y,
then you are shrinking the set X down to what is called the quotient set.
Example: Consider Z with the mod 2 equivalence relation. Then Z/R = {[0], [1]}.
Traditionally, we write Z/2Z for Z/R and so the previous sentence could also be
written as Z/2Z = {0̄, 1̄}.
Example: Let I = [0, 1], the closed interval. Let me describe a partition of I with
infinitely many parts. If 0 < x < 1, then the singleton {x} will be a part. (A
singleton is a set with exactly one element.) Other than these singletons, I declare
{0, 1} to also be a part. This describes a partition. The equivalence relation
corresponding to this partition is x ∼ y provided that (x = y) ∨ (x = 0 ∧ y =
1) ∨ (x = 1 ∧ y = 0). The quotient set I/∼ naturally forms a circle. The idea is
that you can start at [0], move to the right through all the x ∈ (0, 1), and then end
up at [1]. But [1] = [0], so you have wound up where you started from. Just like
on a circle.
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as
follows: If 0 < x < 1 and y ∈ [0, 1], then the singleton {(x, y)} will be a part. Other
than these, declare {(0, y), (1, y)} to be a part for every y ∈ [0, 1]. The equivalence
relation corresponding to this partition really just says that we call the two vertical
sides of the square equivalent. The quotient set X/∼ naturally forms a cylinder.
Can you see how?
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows:
If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than
these, declare {(0, y), (1, y)} to be a part for every y ∈ (0, 1), declare {(x, 0), (x, 1)}
to be a part for every x ∈ (0, 1), and declare {(0, 0), (1, 0), (0, 1), (1, 1)} to be a part.
The equivalence relation corresponding to this partition really just says that we call
the two vertical sides of the square equivalent, and also the two horizontal sides of
the square equivalent. The quotient set X/∼ naturally forms a torus. Can you
see how?
Example: Let X be the square [0, 1] × [0, 1]. Consider the partition of X as follows:
If 0 < x < 1 and 0 < y < 1, then the singleton {(x, y)} will be a part. Other than
these, declare the union of all four sides of the square to be one part. Thus, we shrink
the entire boundary down to one point in the quotient set. In fact, the quotient
set X/∼ naturally forms a sphere. Can you see how?
Example: Let R[x] be the set of polynomials with real coefficients. Say that
polynomials f, g are equivalent provided that x² + 1 divides f(x) − g(x). Believe it
or not the quotient set is the definition of the complex numbers C. Why don't you
ponder that for a while?
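One way to ponder it: computing in the quotient means reducing modulo x² + 1, i.e. substituting x² = −1. A sketch, with the pair (a, b) standing for the class of a + bx (the helper name is mine):

```python
def mult_mod(p, q):
    """Multiply a + b*x and c + d*x and reduce modulo x^2 + 1,
    i.e. substitute x^2 = -1."""
    a, b = p
    c, d = q
    # (a + bx)(c + dx) = ac + (ad + bc)x + bd*x^2 = (ac - bd) + (ad + bc)x
    return (a * c - b * d, a * d + b * c)

print(mult_mod((0, 1), (0, 1)))  # (-1, 0): the class of x squares to -1
print(mult_mod((2, 3), (4, 5)))  # (-7, 22), matching (2 + 3i)(4 + 5i)
```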
1.3. Exercises
(1) Let n N. Prove that two numbers a, b N are equivalent mod n iff they
have the same remainder upon division by n.
(2) Let X be the set of functions from R to itself. Let g ∈ X, and write O(g)
for the set of functions f ∈ X so that
lim_{x→∞} f(x)/g(x) exists and is finite.
We are about to make our first great leap in mathematical thought: the construction
of (positive) rational numbers. One defect of the natural numbers is one can solve
some division problems but not others. For example the theory does not include
any meaning for dividing 1 by 2. When we buy apples in a grocery store this doesn't
cause any problem because we only need to add and occasionally multiply them.
But when we want to share them with a friend or make muffins we may need to
speak of fractional parts of apples. Now if all our recipes were written in terms of
eighths of apples, for instance, we could do the following. We could write 1 for
an eighth of an apple, 8 for a full apple, and multiply all our previous tallies by
8. This is unpleasant for several reasons, aesthetic and practical. And if I wanted
to distribute an apple amongst a set of quintuplets I would be at a loss. In other
words, we would like a logical system in which we can add, multiply and divide
natural numbers.
2.1. Ratios
The solution to our problem is roughly like this: We form the set of division prob-
lems, decide which of them should be equivalent, and then mod out by this
equivalence relation. We would like to do arithmetic, so we will need to spend some
care defining addition and multiplication on these problems.
We will denote by R+ the set of positive ratios (a : b), where a and b are natural
numbers. Strictly speaking, a ratio is just an ordered pair of numbers, but you
should think of them as division problems. So (a : b) is the division problem of a by
b. Consider the ratios (2 : 1) and (4 : 2). Strictly speaking they ask two different
questions, although they both should have the same answer 2. We call two ratios
proportional if they should have the same answer.
Although we have not defined fractions yet, you may intuitively think of the ratio
(x : y) as getting at the fraction x/y; this will help explain some of the formulas. For
example, (x : y) ∼ (a : b) iff x/y = a/b.
Example: (2 : 6) ∼ (3 : 9); these ratios are proportional but not equal. Just to be
clear, two ratios (a : b) and (c : d) are equal only if a = c and b = d.
Now that we have an equivalence relation we can start thinking about equivalence
classes, such as [(1 : 1)]. In fact (a : b) ∈ [(1 : 1)] exactly when a = b.
[Explain what's bad about the rule a/b + c/d = (a + c)/(b + d).]
a/b + c/d = (ad + bc)/(bd) and a/b · c/d = (ac)/(bd).
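These operations can be spot-checked numerically; a sketch with the tuple (a, b) standing for the ratio (a : b) (all function names are mine):

```python
def equiv(p, q):
    """(a : b) ~ (c : d) iff a*d == b*c."""
    a, b = p
    c, d = q
    return a * d == b * c

def add(p, q):
    """(a : b) + (c : d) = (ad + bc : bd)."""
    a, b = p
    c, d = q
    return (a * d + b * c, b * d)

def mul(p, q):
    """(a : b) * (c : d) = (ac : bd)."""
    a, b = p
    c, d = q
    return (a * c, b * d)

# Spot-check ~-invariance: replacing (1 : 2) by the proportional
# ratio (2 : 4) changes the answers only up to proportionality.
assert equiv(add((1, 2), (1, 3)), add((2, 4), (1, 3)))
assert equiv(mul((1, 2), (1, 3)), mul((2, 4), (1, 3)))
```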
We would like to identify certain fractions as being integers. Here is how one
formally makes this identification.
Note for instance, that a/b ∈ N exactly when b|a.
[Division.]
2.3. Exercises
For the first four exercises, use the equivalence class definitions.
(1) Check that the operations of addition and multiplication above are indeed
∼-invariant.
(2) In this exercise we will use the first quadrant of the Cartesian plane to
plot sets of ratios. To a ratio (a : b) associate the point (a, b).
(a) Plot the set of ratios equivalent to the ratio (1 : 2). It should be an
infinite sequence of collinear points, and determines a line. What is
its slope?
(b) Plot the set of ratios which are of the form (1 : 2) · (a : b), with
a, b ∈ N.
(c) Plot the set of ratios which are of the form (2 : 4) · (a : b), with
a, b ∈ N.
(d) Plot the set of ratios which are of the form (1 : 2) + (a : b), with
a, b ∈ N.
(3) Prove that if two ratios in R+ are proportional and reduced, then they
must be equal. (Note we are only using positive numbers.)
(4) Does the distributive law hold in Q+ ? Does it hold in the set R+ of
positive ratios? Give proofs or counterexamples.
The rest of the exercises do not focus on the equivalence class definitions.
CHAPTER 6
Rings
1. Abstract Algebra
You probably learned a lot of algebra in school. You learned how to solve equations
like ax + b = c, and quadratic equations, you learned how to combine terms, that
you should add exponents when multiplying powers, you learned to solve systems
of equations, et cetera.
If you learned about imaginary and complex numbers, you didn't have to relearn
those rules or develop much more algebraic intuition. You had to learn the rule
i² = −1 and how to rationalize complex denominators, but you could still use all
the skills from before.
If you've had some linear algebra, you know that square matrices of the same size
can be treated much like numbers. They can be added, multiplied, raised to powers,
and you can often solve an equation of matrices AX = B by multiplying both sides
by A⁻¹. Again, much of what is true about the algebra of numbers is also true
for matrices. Of course, much of the training of linear algebra is to be cautious
with your intuition, since most of the time it is not true that AB = BA. You
have to distill out which of your intuition comes from commutativity and which is
independent of it. But you still want that overall algebraic intuition.
In working with modular arithmetic, you can use a great deal of the intuition from
those high school algebra days. As we saw, you can still add −b to both sides of an
equation to cancel a b. You can often divide both sides of an equation by a. If the
modulus is odd, the quadratic formula still works basically the same way.
There are tons of these different algebra systems in mathematics, and we're going to
focus for this chapter on one type of algebra system called a ring. Examples of rings
we have seen so far include Z, Q, R, and all of the Z/nZ's for n > 1. Philosophically,
knowing something is a ring means you can transfer a certain amount of algebraic
intuition to studying it. More practically, you can prove lots of results in the context
of abstract ring theory, and they will automatically be true for not only every ring
you've ever met, but every ring you'll meet in the future.
In the next section we introduce the concept of a ring, and study the abstract idea
of divisibility. In particular, when can the product of two things be 0 or 1? A field
is a special kind of ring, when everything is divisible by everything else (except 0).
We will discover ways to produce new rings from old. For instance, given a ring
R, you can talk about the ring of matrices Mn (R) whose entries are in R, and the
ring of polynomials R[x] whose coefficients are in R.
One of the greatest analogies in mathematics is that between the integers Z and the
ring of polynomials F [x] where F is a field. The most important themes of Chapter
1 carry over, in particular the Fundamental Theorem of Arithmetic, in the sense
that any nonzero polynomial factors into a product of irreducibles in essentially one
way. I hope you will find many other similarities as well.
Finally we demonstrate how to mod out in the general context of a ring. This is
an interesting way to create new rings satisfying (almost) whatever relations you
like. For instance we can force a ring to have an element x satisfying x² = −1; this
leads to the complex numbers.
2. Rings
2.1. Definition
Intuitively a ring is a place where you can add, subtract, and multiply. You can't
necessarily divide. That's what you tell your friends. Of course, you need to define
what you mean by add, subtract, and multiply, though the words are suggestive.
Most of our rings will be commutative; rings of matrices will be the main examples
of noncommutative rings.
The first four axioms give the fundamentals of addition and subtraction. They
imply you can always solve the x + a = b problem for x: If a, x, b ∈ R and x + a = b,
then adding −a to the right of both sides yields
(x + a) + (−a) = b + (−a)
x + (a + (−a)) = b + (−a)
x + 0R = b + (−a)
x = b + (−a).
Above we have used associativity, the property of additive inverses, and the property
of the additive identity.
Note we have not included an axiom for a multiplicative inverse. This is intentional,
and part of what makes ring theory interesting. More on that later.
The axiom of nontriviality really only excludes the trivial ring. This is because if
0R = 1R, then for x ∈ R, we have x = 1R x = 0R x = 0R by the above Proposition.
By the way, this is what we would get if we considered Z/nZ for n = 1.
Here are some more ring facts which the eager reader may enjoy proving.
Remark: There are a couple different conventions about what axioms a ring should
have. First, some authors do admit the trivial ring. Secondly it is sometimes
interesting to study rings which don't have a multiplicative identity, like the even
integers. We will not pursue this.
2.2. Divisibility
Basic properties of divisibility from Z carry over to any ring, for the same reasons.
Proposition 6.3. Let a, b, c, x, y ∈ R. Then:
Proof. We prove one direction, saving the others for the reader. Suppose that
x|uy. Then there is an element c ∈ R so that xc = yu. Multiply both sides by u⁻¹.
This gives x(cu⁻¹) = y, and therefore x|y.
Units are invaluable in solving the ax = b problem, as you may recall from our
study of this problem for modular arithmetic. If a is a unit, then the solution to
the problem is x = a⁻¹b.
The only units of Z are {1, −1}. Every nonzero element of Q and R is a unit. The
units of Z/nZ are the congruence classes ā where a and n are relatively prime.
Note that 1R and −1R are units in any ring. Since 0R x = 0R for all x, 0R is never
a unit. Sometimes it is the only nonunit of a ring, and such rings have a special
name.
Thus Q and R are fields. It is a good exercise to think through the following:
Proposition 6.5. The ring Z/nZ is a field if and only if n is prime.
The ring Z is not a field. The number 2 does not have an inverse in Z. It doesn't
matter that in the bigger ring Q it has an inverse.
The most interesting step here is the one that comes after the factoring. Why,
exactly, is it that if two numbers multiply to 0 then one of them must be 0?
In the ring Z/10Z, for instance, the elements 5 and 6 multiply to 0, though neither
of them is itself 0 = 0Z/10Z. And note that in Z/10Z, the congruence class 8̄ is
another solution to x² − 5x + 6 = 0, in addition to 2̄ and 3̄. Are there any others?...
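A brute-force search settles the question for Z/10Z; a quick sketch (variable names mine):

```python
n = 10
# zero divisors: nonzero a with a*b = 0 for some nonzero b
zero_divisors = [a for a in range(1, n)
                 if any(a * b % n == 0 for b in range(1, n))]
# solutions of x^2 - 5x + 6 = 0 in Z/10Z
roots = [x for x in range(n) if (x * x - 5 * x + 6) % n == 0]
print(zero_divisors)  # [2, 4, 5, 6, 8]
print(roots)          # [2, 3, 7, 8]: there is indeed a fourth solution
```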
Here is a practical way to think about a zero divisor. Suppose a is not a zero
divisor, and ab = ac. Then a(b − c) = 0. This is a factorization of 0, so b − c = 0
and thus b = c. So even though we didn't use an inverse of a, since it was not a
zero divisor, we could cancel it from both sides of the equation.
Since units and 0R are never zero divisors, any field is an integral domain. The ring
Z is an integral domain, but is not a field. So the converse does not hold.
Proposition 6.8. Let n > 1. The ring Z/nZ is an integral domain if and only if
n is prime.
Remark: A finite ring is an integral domain if and only if it is a field. Can you
prove it?
For example, (Z/2Z) × (Z/2Z) has four elements which we will denote by 0 = (0, 0),
e1 = (1, 0), e2 = (0, 1), and 1 = (1, 1). Its addition/multiplication tables are:
[Make tables.]
This is a different ring than Z/4Z! You can tell, because every element of Z/2Z ×
Z/2Z added to itself is 0, whereas this is not the case for the ring Z/4Z. Later we
will study more systematically what it means for two rings to be different.
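The distinguishing property is easy to check by machine; a sketch (helper name mine):

```python
from itertools import product

klein = list(product(range(2), repeat=2))  # elements of Z/2Z x Z/2Z

def k_add(u, v):
    """Componentwise addition mod 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

# In (Z/2Z) x (Z/2Z) every element added to itself is 0 ...
assert all(k_add(u, u) == (0, 0) for u in klein)
# ... but in Z/4Z we have 1 + 1 = 2, which is not 0.
assert (1 + 1) % 4 != 0
```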
Definition. Let R be a ring, and X a set. Write F(X, R) for the set of
functions from X to R, with addition and multiplication defined pointwise
as follows. If f and g are two functions with domain X and values
in R, then f + g and f g are defined via (f + g)(x) = f(x) + g(x) and
(f g)(x) = f(x)g(x) for every point of X.
F(X, R) is in fact a ring. [Check a ring property]. Its additive identity 0F(X,R)
is the constant function f0 whose value at every point of X is 0R. In other words
f0(x) = 0R for all x. Its multiplicative identity 1F(X,R) is the constant function f1
whose value at every point of X is 1R. In other words f1(x) = 1R for all x. The
negative of a function f is the function defined by (−f)(x) = −(f(x)).
Let's take X and R to both be the real numbers R. Then F(R, R) is just the set
of real-valued functions with domain R.
The function f(x) = x² + 1 is a unit because it has an inverse g(x) = 1/(x² + 1) so that
for all x, f(x)g(x) = 1. Thus f g = f1. The function f(x) = x is not a unit because
if g ∈ F(R, R) is any function, f(0)g(0) = 0, so f g cannot be equal to f1.
The functions
f(x) = 0 if x ≤ 0, and f(x) = 1 if x > 0,
and
g(x) = 1 if x ≤ 0, and g(x) = 0 if x > 0,
satisfy f g = f0, though neither f nor g is f0. So f and g are zero divisors.
You should compare the rings R × R and F(X, R) if R is a ring and X is a set with
two elements.
2.6. Subrings
Checking all the ring axioms is a little tiresome. Luckily there is a way to generate
rings as subsets of other rings. For example we will say that Z is a subring of Q.
But you can't take any subset. For instance N is a subset of Z but doesn't have a
zero, or any negatives inside N. As another example, the subset {−1, 0, 1} isn't a
subring for the basic reason that the operation of addition from Z takes you outside
{−1, 0, 1} so it isn't a ring in its own right. Here is the definition of a subring.
If S is a subring then it becomes a ring itself under the operations already defined in
R. The first two axioms just say those operations on S don't go outside of S. You
don't need to check associativity, commutativity, or distribution because they're
true in R and in particular for elements of S.
Proof. Let (a, a) and (b, b) be in R′. Then (a, a) + (b, b) = (a + b, a + b) ∈ R′,
(a, a)(b, b) = (ab, ab), and −(a, a) = (−a, −a), which shows that R′ is closed under
addition, multiplication, and negation. Moreover 1R×R = (1R, 1R) ∈ R′.
2.7. Exercises
(6) Let R be an integral domain. Prove that if a|b and b|a, then there is a
unit u ∈ R so that b = au.
(7) Show the previous exercise may be false if R is not an integral domain.
(8) Let R be a commutative ring. Say an element x ∈ R is nilpotent if there
is an n ∈ N so that xⁿ = 0. Prove that the sum of two nilpotent elements
is nilpotent.
(9) Let X be a set. Let R be the set of subsets of X, with the following
addition and multiplication laws. For A, B ∈ R, define multiplication via
A · B = A ∩ B. Define addition via
A + B = (A − B) ∪ (B − A).
Here A − B = {a ∈ A | a ∉ B} denotes the set of elements in A which are
not in B.
Check that R is a ring under these operations. Be sure to specify
what the additive and multiplicative identities are.
Write out the multiplication and addition tables in the case where X
has two elements.
(10) Let R = Z × Z, with addition and multiplication defined componentwise,
i.e. (a, b) + (c, d) = (a + c, b + d) and (a, b) · (c, d) = (ac, bd). Determine
the units and zero divisors of R, and show your reasoning.
(11) Explain why the subset of real numbers with terminating decimal expansions
is a subring of R. What are the units?
(12) Let p be a prime. Let Z(p) = {x ∈ Q | ordp(x) ≥ 0}. Thus Z(p) is the set
of fractions with no p's in the denominator. Check that it is in fact a
subring of Q, using properties of ordp. What are the units? Is it a field?
(13) Let p be a prime. Let Z[1/p] = {x ∈ Q | ordq(x) ≥ 0 for all primes q ≠ p}.
Thus Z[1/p] is the set of fractions with only p's in the denominator. For
instance 3/25 is in Z[1/5] but not in Z[1/3]. Check that it is also a subring of
Q. What are the units? Is it a field?
(14) Suppose 1R + 1R = 2R is a unit in R. Prove that the equation x² + bx + c =
0 has a solution in R if and only if b² − 4c is the square of an element in
R.
For the next three problems, let R = F(R, R) be the ring of all functions from R
to R as above and S the subring of continuous functions.
(15) Prove that every function in R is either zero, a unit, or a zero divisor in
R.
(16) Find a function in S that is neither zero, a unit, nor a zero divisor in S.
(17) Find three different solutions to the equation f² + f = 6 in R. How many
are there in S?
3. Abstract Linear Algebra
Let R be a commutative ring, and n ∈ N. In this section we will define a new ring
Mn(R) of n × n matrices with entries in R, and develop its properties. We hope
the reader is familiar with basic linear algebra.
A typical element X of Mn(R) is an n × n array:

    a11 a12 · · · a1n
X = a21 a22 · · · a2n
    ...
    an1 an2 · · · ann

The additive identity 0Mn(R) is the matrix all of whose entries are 0R:

          0R 0R · · · 0R
0Mn(R) =  ...
          0R 0R · · · 0R

The negative of a matrix X ∈ Mn(R) is the n × n matrix whose entries are the
negatives of the entries of X; thus

     −a11 −a12 · · · −a1n
−X = ...
     −an1 −an2 · · · −ann
Write Rⁿ for n-tuples of elements of R. They are called vectors, and the elements are
called components. Thus a typical vector v ∈ Rⁿ can be written v = (a1, . . . , an).
A given row or column of a matrix forms a vector as usual; for instance the second
column of X above is the vector (a12, a22, . . . , an2).
If i ∈ N is not larger than n, write ei for the vector whose ith component is 1R
and whose other components are 0R. It is called the ith standard basis vector.
Addition of vectors is performed componentwise in the usual way. We require
the notion of scalar multiplication. If a ∈ R and v = (a1, . . . , an) then we will write
av = (aa1, . . . , aan).
Note that if v = (a1, . . . , an) then v = a1 e1 + · · · + an en = Σi ai ei.
138 6. RINGS
u · (v + w) = u · v + u · w.
(u + v) · w = u · w + v · w.
v · ei = ei · v is the ith component of v.
v · (aw) = a(v · w).
u · v = v · u.
(X + Y)v = Xv + Yv.
(F) Xei is the ith column of X.
X(v + w) = Xv + Xw.
X(av) = a(Xv).
The property (F) belongs to any linear algebraist's toolkit, and is worth meditating
on. It implies that X is determined by its multiplications against the standard basis.
In particular,
X = ( Xe1 | Xe2 | · · · | Xen ).
Pn
The last twoPproperties can be iterated to show that if v = (a1 , . . . , an ) = i=1 a i ei ,
n
then Xv = i=1 ai (Xei ).
In other words, if
| | |
X = v1 v2 vn ,
| | |
then Xv = a1 v1 + a2 v2 + + an vn .
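This description of Xv as a combination of the columns of X is easy to make concrete; a sketch in plain Python (the helper name is mine):

```python
def mat_vec(X, v):
    """Compute Xv as the linear combination sum_i a_i * (column i of X),
    where X is a list of rows and v = (a_1, ..., a_n)."""
    n = len(v)
    cols = [[X[r][i] for r in range(n)] for i in range(n)]
    out = [0] * n
    for a, col in zip(v, cols):
        out = [o + a * c for o, c in zip(out, col)]
    return out

X = [[1, 2], [3, 4]]
print(mat_vec(X, [1, 0]))  # [1, 3]: the first column, illustrating (F)
print(mat_vec(X, [5, 6]))  # 5*[1, 3] + 6*[2, 4] = [17, 39]
```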
Let X, Y, Z ∈ Mn(R). Write I = 1Mn(R) for the n × n identity matrix; this is the
square matrix whose diagonal elements aii are all 1R and whose other elements are
0R. Thus,

          1R 0R · · · 0R
1Mn(R) =  0R 1R · · · 0R
          ...
          0R 0R · · · 1R

Note that the ith row is ei, as is the ith column. Since R is nontrivial, I ≠ 0Mn(R).
The following properties follow from the above properties of matrix-vector multi-
plication:
(X + Y )Z = XZ + Y Z.
X(Y + Z) = XY + XZ.
XI = IX = X.
Note that since we have defined matrix multiplication in terms of columns, we can
translate it by (F) into
X(Y ei ) = (XY )ei .
The RHS tells you what the ith column of XY should be, given the ith column,
Y ei , of Y . Associativity of multiplication sprouts out of this.
Proposition 6.11. If X, Y ∈ Mn(R) and v ∈ Rⁿ then X(Yv) = (XY)v.
Proof. Let v = (a1, . . . , an) = Σi ai ei. The properties we have developed
thus far show that
X(Yv) = X(Y(Σi ai ei)) = X(Σi ai Y(ei)) = Σi ai X(Yei)
= Σi ai ((XY)ei) = (XY)(Σi ai ei) = (XY)v.
Proof. It is enough to show that the columns of the LHS and RHS are the
same. The jth column of the LHS is (XY )zj , where zj is the jth column of Z. The
jth column of the RHS is X(Y zj ). By the previous proposition we are done.
( 1 0 )( 0 1 )   ( 0 1 )   ( 0 0 )   ( 0 1 )( 1 0 )
( 0 0 )( 0 0 ) = ( 0 0 ) ≠ ( 0 0 ) = ( 0 0 )( 0 0 ).
I would like to make some general remarks now about the theory of divisibility,
units, and zero divisors in noncommutative rings.
Thus in the above example, A is a left divisor of B but not a right divisor of B. I
would propose the notations a |r b and a |ℓ b, but we shall not have much occasion to
use this.
This left/right nuance for units doesn't get noticed in linear algebra.
(Anyone know about the case of Mn (R) for general R? Drop me a line.)
Units in matrix rings play an important role in linear algebra, but they usually are
called something else.
(Anyone know about the case of Mn (R) for general R? Drop me a line.)
Facts 6.13 and 6.14 are not true for general noncommutative rings, but the examples
are a little heavy. Here is a sketch of an example. Consider a real vector space V
with an infinite basis of the form {e1 , e2 , . . .}. The set R of linear transformations
L : V V forms a ring, under pointwise addition and composition. Consider the
linear transformations α, β, and γ defined by

    α(e1) = 0,  α(e2) = e1,  α(e3) = e2, ...,
    β(e1) = e2, β(e2) = e3,  β(e3) = e4, ..., and
    γ(e1) = e1, γ(e2) = 0,   γ(e3) = 0, ....

In other words, α moves the basis vectors to the left, β moves them to the
right, and γ sends all the basis vectors but e1 to 0. Then the reader may check
that αβ = 1_R, the identity transformation, but since α(e1) = 0, there is no linear
transformation α′ with α′α = 1_R. Thus α is a left unit but not a right unit.
Similarly, β is a right unit but not a left unit. Finally note that αγ = 0_R, so α
is a left zero divisor. If there were δ ∈ R with δα = 0_R then, applying δ to both
sides of αβ = 1_R we would have 0_R = δ, so α is not a right zero divisor.
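One can play with the two shift operators by representing a vector as its finite list of coordinates (v1, v2, ...), trailing zeros omitted. A small Python sketch (a rough model of the construction, names my own):

```python
def left_shift(v):
    # e1 -> 0, e_{i+1} -> e_i, on coordinate lists (v1, v2, ...)
    return v[1:]

def right_shift(v):
    # e_i -> e_{i+1}
    return [0] + v

def compose(f, g):
    return lambda v: f(g(v))

v = [3, -2, 5]
# left shift composed with right shift is the identity transformation ...
assert compose(left_shift, right_shift)(v) == v
# ... but the other composition kills the e1-component:
# here [0] represents the zero vector, not e1 = [1]
assert compose(right_shift, left_shift)([1]) == [0]
```

The asymmetry of the two compositions is exactly the left/right unit phenomenon described above.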
We can save ourselves some headache if we focus on the 2 × 2 case, which is plenty
big enough for our purposes. For every matrix X ∈ M_2(R) there is an associated
element det(X) ∈ R, given by

    det [ a b ; c d ] = ad − bc.
Here are some properties of the determinant, which can be checked directly.
Conversely, if det(X) is a unit, then there is an element r ∈ R so that
det(X)r = 1. Then the main formula shows that

    rY = [ dr −br ; −cr ar ]

is inverse to X.
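The formula can be tested concretely over Z/nZ. A hedged Python sketch (helper names are mine; the three-argument pow with exponent −1 computes a modular inverse and needs Python 3.8+):

```python
def det2(X):
    (a, b), (c, d) = X
    return a * d - b * c

def adjugate2(X):
    # the Y of the main formula: X * adjugate2(X) = det(X) * I
    (a, b), (c, d) = X
    return [[d, -b], [-c, a]]

def mul2_mod(A, B, n):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % n for j in range(2)]
            for i in range(2)]

def inverse2_mod(X, n):
    # only works when det(X) is a unit mod n
    r = pow(det2(X) % n, -1, n)          # r with det(X) * r = 1 mod n
    return [[(r * y) % n for y in row] for row in adjugate2(X)]

X = [[3, 1], [4, 5]]                      # det = 11, a unit mod 26
assert mul2_mod(X, inverse2_mod(X, 26), 26) == [[1, 0], [0, 1]]
```

If det(X) were not a unit mod n, the pow call would raise an error, matching the theory: X is invertible exactly when det(X) is a unit.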
4. Chapter Wrap-Up
4.2. Toughies
(1) There are four rings which contain exactly 4 elements. Find them, and
write out their addition/multiplication tables.
(2) Find all the subrings of Q.
CHAPTER 7
Polynomials
1. Polynomials
In calculus, polynomials are usually the first functions studied, as they are the most amenable
to differentiation and integration. Indeed, they are closed under these operations.
It is often of interest in the above examples to know the roots of polynomials, and
how they factor. This leads to the study of the arithmetic of polynomials, which
we pursue in this section.
Remark: If f = 0 is the zero polynomial, we do not define a degree. Some call the
degree −∞ and assume a calculus for such a symbol. We, however, feel this gives an
undue mysticism to −∞ and choose to deal with the zero polynomial separately.
Thus f = LT(f) + f<, where f< is a polynomial with degree less than f. Note that
LT(f) ≠ 0.
It is easy to see that the additive ring axioms for R impose the same axioms for
this addition.
Lemma 7.1. (Degree Estimate for Addition) If f and g are nonzero polynomials,
with g ≠ −f, then deg(f + g) ≤ max{deg(f), deg(g)}. If deg(f) > deg(g) then this
is exactly deg(f), and LT(f + g) = LT(f).
Proof. This is perhaps best done with summation notation: Write f = ax^l,
g = Σ_{i=0}^m b_i x^i, h = Σ_{i=0}^n c_i x^i, and let N ≥ m, n. Then

    f(g + h) = ax^l (Σ_{i=0}^m b_i x^i + Σ_{i=0}^n c_i x^i) = ax^l (Σ_{i=0}^N (b_i + c_i) x^i) = Σ_{i=0}^N a(b_i + c_i) x^{i+l}.
Definition. Since every polynomial is a sum of its leading term and its
lower degree terms, we can define polynomial multiplication recursively
via

    f · g = 0,                    if f = 0,
    f · g = LT(f) · g + f< · g,   if f ≠ 0.
The induction will stop when f is a monomial, since then f< = 0. Note that if
deg(f ) = 0, then f is a monomial, so the induction process will certainly end.
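The recursive definition translates directly into code. A Python sketch (representation and names my own): a polynomial is a dictionary sending each exponent to its nonzero coefficient, so the zero polynomial is the empty dictionary.

```python
def lt(f):
    # leading term, as a one-term polynomial
    d = max(f)
    return {d: f[d]}

def lower(f):
    # f_<: everything below the leading term
    d = max(f)
    return {e: c for e, c in f.items() if e != d}

def mono_mul(m, g):
    # multiply the monomial m into g
    ((d, a),) = m.items()
    return {d + e: a * c for e, c in g.items()}

def add(f, g):
    h = dict(f)
    for e, c in g.items():
        h[e] = h.get(e, 0) + c
    return {e: c for e, c in h.items() if c != 0}

def mul(f, g):
    if not f:                            # f = 0
        return {}
    return add(mono_mul(lt(f), g), mul(lower(f), g))

# (x + 1)(x - 1) = x^2 - 1
assert mul({1: 1, 0: 1}, {1: 1, 0: -1}) == {2: 1, 0: -1}
```

The recursion bottoms out when f is a monomial, exactly as the text observes.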
Corollary 7.5. If f is a monomial and g and h are polynomials, then (g +h)f =
g f + h f.
We finish with associativity. The idea is simply three inductions, one for each term.
The reader should fill in the details.
Proposition 7.8. If f and g are monomials, and h is a polynomial, then (f · g) · h =
f · (g · h).
By the previous section, the addition and multiplication laws on the set of polyno-
mials make it into a ring.
In this section we will try to find the units and zero divisors of R[x] and succeed
when R is an integral domain. We start with a lemma.
Lemma 7.11. Let R be an integral domain, and f ∈ R[x] a nonzero polynomial. If
g ≠ 0, then f · g ≠ 0 and LT(f · g) = LT(f) · LT(g).

Proof. We leave this as an exercise, with a hint. The first fact is easily seen
with a direct calculation. For the second, use the LT-method: make a
direct calculation when f is a monomial, and use induction when f is a general polynomial.
Note that if (x − c)|f(x), then c is a root of f. The converse is true; we will later
prove it in the case where R is a field.
1.5. Exercises
(1) Theorem 7.6, Lemmas 7.8, 7.9, Theorem 7.10, Lemma 7.11, and Proposi-
tion 7.17.
(2) Let R be an integral domain. Let f, g ∈ R[x] with f nonconstant and
g ≠ 0. Prove that the set {i ∈ N : f^i | g} is bounded above.
(3) In the above situation, define ord_f(g) = max{i ∈ N : f^i | g}. Prove that
ord_f(g) = i if and only if there is an h ∈ R[x] so that g = f^i h and f ∤ h.
(4) How many functions are there from Z/3Z to Z/3Z? Find two different
polynomials in (Z/3Z)[x] which give the same function on Z/3Z. Also try
this exercise for other Z/nZ's.
(5) Now suppose R is not necessarily an integral domain. Prove that we still have
deg(f · g) ≤ deg(f) + deg(g) if f · g is nonzero.
(6) Find a quadratic polynomial in (Z/8Z)[x] with 4 roots. Are there any
with 5 roots?
For the next three problems fix a ring R, and consider polynomials in R[x]. Let
f ∈ R[x] be a nonzero polynomial. Define the order of f, written ord(f), to be the
lowest power of x with a nonzero coefficient.
Much of the material from the first chapter can be generalized to an arbitrary
commutative ring R.
Example: When R = Z, the irreducible elements are the primes of N and their
negatives. Every irreducible element of Z(p) is associate to p.
The arithmetic of R[x] is much nicer when R is a field, and is very similar to that
of the ring Z. In this section we will show that, up to constants, any nonzero
polynomial factors uniquely into irreducible polynomials.
Let F be a field. By the work in the previous section, we know that F [x] is an
integral domain, and the units are exactly the polynomials of degree 0. Up to these
units, we will have unique factorization. We will follow the same basic path as with
N.
Proof: We have already mentioned one direction. Apply the division algorithm to
f and x − c. If x − c does not divide f, then f(x) = p(x)(x − c) + r(x), with
deg(r) = 0. Then f(c) = r ≠ 0.
To set up the table, write the coefficients a_n, ..., a_1, a_0 of f in the first row, write
c in the corner of the third row, and bring a_n down to the third row. Then repeat:
Every number in the third row should be multiplied by c and the result
should be put in the next open entry of the second row.
Every number in the second row should be added to the entry above it
and the result should be put in the row below it.
Note that the last entry is f(c), which is the remainder. The other entries in the
third row are the coefficients of the quotient p, which will be one degree less than
f.
For example, dividing f(x) = x⁴ + 5x³ + x + 3 by x + 2 (so c = −2):

         1    5    0    1    3
             −2   −6   12  −26
    −2 | 1    3   −6   13  −23
Here's why this works. Following the rules sets up (the coefficients of) a polynomial
p and a number r so that the second row is (the coefficients of) cp, and the third
row is xp + r. (The factor of x accounts for the shift to the left.) Moreover, the
third row is the sum of the first two. Thus, f + cp = xp + r. Regroup this to get
f = (x − c)p + r.
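The whole recipe is a few lines of code. A Python sketch (names mine); as a check, dividing f(x) = x³ − 5x + 6 by x − 3 gives quotient x² + 3x + 4 and remainder f(3) = 18:

```python
def synthetic_division(coeffs, c):
    """coeffs = [a_n, ..., a_0]; returns (quotient coefficients, remainder)
    with f(x) = (x - c) p(x) + r, following the table rules."""
    third_row = [coeffs[0]]                           # bring down a_n
    for a in coeffs[1:]:
        third_row.append(a + c * third_row[-1])       # multiply by c, add down
    return third_row[:-1], third_row[-1]

p, r = synthetic_division([1, 0, -5, 6], 3)
assert (p, r) == ([1, 3, 4], 18)     # f(x) = (x - 3)(x^2 + 3x + 4) + 18
assert r == 3**3 - 5 * 3 + 6         # the remainder is f(3)
```

The returned list is exactly the third row of the table, minus its last entry.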
Here is another benefit of synthetic division when F is the real numbers R (or Q).
You can often use the p to zoom in on possible roots of f. For instance consider
f(x) = x³ − 5x + 6 and c = 3:

        1    0   −5    6
             3    9   12
    3 | 1    3    4   18

So 3 is not a root, and f(x) = (x² + 3x + 4)(x − 3) + 18. Can f have any roots d
greater than 3? If so, then f(d) = (d² + 3d + 4)(d − 3) + 18. But since d > 3 > 0,
all these terms are positive, and so the result cannot be 0!
Proposition 7.21. Let f ∈ R[x] and c ∈ R. If the third row of the synthetic
division table consists of positive numbers, then there are no positive roots of f
greater than c.

Proof. Under the given hypotheses, f(x) = p(x)(x − c) + r, with r > 0 and
the coefficients of p positive. Let d > c be positive. Then f(d) = p(d)(d − c) + r.
Since d is positive, so is p(d). Since d > c, d − c > 0. Thus f(d) > 0.
Here is the rule in the other direction, which the reader should enjoy proving:
Proposition 7.22. Let f ∈ R[x] and c ∈ R. If the entries of the third row of the
synthetic division table are nonzero and alternate sign, then there are no negative
roots of f less than c.
Remark: In this and the previous proposition, the c in the corner is not considered
part of the third row.
For example, taking c = −3 for the same f:

         1    0   −5    6
             −3    9  −12
    −3 | 1   −3    4   −6
Thus we know all real roots of f lie between −3 and 3. In fact there is exactly one
real root, about −2.68. The third row of the synthetic division table for c = 1 has
a negative entry; these simple tests give only one-way information.
Let's specialize to the rational numbers Q. Say you're given a polynomial f ∈ Q[x]
and need to find its rational roots. For instance, f(x) = 3x⁵ − (17/2)x⁴ + (9/2)x³ + 4x²
− (11/2)x − 1. At first it seems there are infinitely many possibilities, and most of them
will not be roots. In this section we will use a little modular arithmetic to show
that there are actually only finitely many possibilities; in this case you need to
check ±1, ±2, ±1/2, ±1/3, ±2/3, ±1/6. Some fluency with synthetic division makes this
even easier.
Before we begin please note that any polynomial f Q[x] can be multiplied by a
constant N Z so that N f Z[x]. For example, N could be the product of the
denominators of the coefficients of f. In the above example, 2f = 6x⁵ − 17x⁴ +
9x³ + 8x² − 11x − 2 ∈ Z[x]. The roots of f are of course the same as the roots of
N f. So we may reduce to the case of integer polynomials.
Thus there are only finitely many possibilities for p and q, as long as a0 and an are
nonzero. (What if they aren't?) In the above example, a5 = 6 and a0 = −2; this is
how the list of possible roots was made. In general you list every number which
may be written as a divisor of a0 divided by a divisor of an, together with their negatives.
Proof. If p/q is a root of f, then 0 = f(p/q) = a_n (p/q)^n + a_{n−1} (p/q)^{n−1} + ··· +
a_1 (p/q) + a_0. Multiplying by q^n we obtain

    0 = a_n p^n + a_{n−1} p^{n−1} q + ··· + a_1 p q^{n−1} + a_0 q^n.

So a_0 q^n ≡ 0 mod p. Since gcd(p, q) = 1, we know q is a unit mod p. So we can invert
it to get a_0 ≡ 0 mod p. This exactly says that p|a_0. Similarly, a_n p^n ≡ 0 mod q
says that q|a_n.
Let's use this to find the rational roots of our example. First try c = −1.

         6  −17    9    8  −11   −2
             −6   23  −32   24  −13
    −1 | 6  −23   32  −24   13  −15

Thus −1 is not a root; too bad. Since the entries of the third row have alternating
sign, by Proposition 7.22 we know that any roots must be greater than −1; this
rules out −2. Let's check c = 1.
        6  −17    9    8  −11   −2
             6  −11   −2    6   −5
    1 | 6  −11   −2    6   −5   −7

Thus 1 is not a root. Since the third row is neither all positive nor alternating, we
can't use either Proposition 7.21 or Proposition 7.22 to rule out any more roots.
Too bad.
Next, c = −1/6.

           6  −17    9    8  −11   −2
               −1    3   −2   −1    2
    −1/6 | 6  −18   12    6  −12    0

Thus −1/6 is a root, and continuing to factor one finds in fact 2f = (6x + 1)(x − 2)(x³ − x² + 1),
so 2 is also a root. The only possible rational roots of the cubic are ±1, and we have already
checked that these are not roots. So we are done; the only rational roots of the original f are 2 and −1/6.
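The whole search is easily mechanized. A Python sketch (names mine) using exact arithmetic from the standard fractions module; it recovers the rational roots of the running example:

```python
from fractions import Fraction

def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def candidates(coeffs):
    # coeffs = [a_n, ..., a_0] in Z with a_n, a_0 nonzero:
    # every candidate is +/- (divisor of a_0) / (divisor of a_n)
    a_n, a_0 = coeffs[0], coeffs[-1]
    return {Fraction(s * p, q)
            for p in divisors(a_0) for q in divisors(a_n) for s in (1, -1)}

def value(coeffs, x):
    v = Fraction(0)
    for a in coeffs:                     # Horner evaluation
        v = v * x + a
    return v

two_f = [6, -17, 9, 8, -11, -2]          # 2f from the example above
roots = sorted(c for c in candidates(two_f) if value(two_f, c) == 0)
assert roots == [Fraction(-1, 6), Fraction(2)]
```

Exact fractions matter here: floating point evaluation could misclassify a near-root.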
Here are some corollaries of the theory, starting with an important definition:
Indeed, if a_n = 1, then q|1, so q = ±1 and the root p/q = ±p is an integer.
Corollary 7.25. There is no rational square root of 2.
Indeed, the only possible rational roots of x² − 2 are ±1 and ±2, and the squares of
these are not 2. Of course there is a real root of x² − 2.
Proposition 7.27. If f and g are monic polynomials of the same degree, and f |g,
then f = g.
Proof. Exercise.
Let f, g ∈ F[x] be polynomials, not both 0. Consider the set I = {af + bg | a, b ∈
F[x]}.
Corollary 7.29. (Bezout Identity) Let f, g be polynomials, not both 0. Then there
are polynomials a, b ∈ F[x] so that af + bg = gcd(f, g).
Proof. Let d = gcd(f, g). The cases a = 0 or b = 0 occur when g|d or f|d; so
we may rule out these cases.
Let b = b0 + pf. (If b = 0 then f|b0, which implies f|d.) We can rewrite the above as
bg = d − af. Since d|f but f ∤ d, deg(af) > deg(d) and so deg(d − af) = deg(af).
Thus deg(b) + deg(g) = deg(bg) = deg(af) = deg(a) + deg(f) < deg(g) + deg(f),
and therefore deg(b) < deg(f).
Remark: There is a Euclidean Algorithm for polynomials analogous to the one for
N, but it is computationally cumbersome in practice.
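Cumbersome by hand, but routine for a machine. Here is a hedged Python sketch of the polynomial Euclidean Algorithm over Q (representation and names my own: a polynomial is a dictionary from exponents to nonzero rational coefficients):

```python
from fractions import Fraction

def deg(f):
    return max(f) if f else -1

def pmul(f, g):
    h = {}
    for e1, c1 in f.items():
        for e2, c2 in g.items():
            h[e1 + e2] = h.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in h.items() if c != 0}

def psub(f, g):
    h = dict(f)
    for e, c in g.items():
        h[e] = h.get(e, 0) - c
    return {e: c for e, c in h.items() if c != 0}

def pdivmod(f, g):
    # division algorithm: f = q g + r with r = 0 or deg r < deg g
    q, r = {}, dict(f)
    while r and deg(r) >= deg(g):
        k, c = deg(r) - deg(g), Fraction(r[deg(r)]) / Fraction(g[deg(g)])
        q[k] = c
        r = psub(r, pmul({k: c}, g))
    return q, r

def bezout(f, g):
    # extended Euclid: returns (a, b, d) with a f + b g = d = gcd(f, g), d monic
    r0, r1 = dict(f), dict(g)
    a0, a1 = {0: Fraction(1)}, {}
    b0, b1 = {}, {0: Fraction(1)}
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, psub(a0, pmul(q, a1))
        b0, b1 = b1, psub(b0, pmul(q, b1))
    scale = {0: Fraction(1) / r0[deg(r0)]}        # make the gcd monic
    return pmul(scale, a0), pmul(scale, b0), pmul(scale, r0)

# f = x^2 - 1, g = (x - 1)(x - 2); the gcd is x - 1
f = {2: Fraction(1), 0: Fraction(-1)}
g = {2: Fraction(1), 1: Fraction(-3), 0: Fraction(2)}
a, b, d = bezout(f, g)
assert d == {1: Fraction(1), 0: Fraction(-1)}
assert psub(psub(d, pmul(a, f)), pmul(b, g)) == {}   # a f + b g = d
```

The "cumbersome" part in practice is the rapid growth of the rational coefficients, which exact Fractions absorb silently.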
It was slightly bad of me to write down this definition before proving the following:
Proposition 7.31. There is a monic polynomial d so that Div(f1, ..., fn) = Div(d).
Therefore d is the unique monic polynomial of greatest degree in Div(f1, ..., fn).
A monic polynomial of greatest possible degree certainly exists; this is the Max
form of Well-Ordering applied to the set of degrees. The interesting algebraic fact is that
there is only one such polynomial.
2. POLYNOMIALS OVER A FIELD 159
The following plays a vital role in the theory of canonical forms in linear algebra,
so we include a proof.
Proposition 7.32. Let f1, ..., fn ∈ F[x], not all 0, with gcd(f1, ..., fn) = d. Then
there are polynomials a1, ..., an so that a1 f1 + ··· + an fn = d.
We omit the proof because it is just like the proof of Proposition 2.49 in Section
9.2.
Theorem 7.34. If f F [x] has degree d, then f has at most d distinct roots.
This section is closely related to the section for numbers. We leave out some proofs
which are identical to the proofs for polynomials.
Note that this agrees with the definition in a general ring, because polynomials of
degree 0 are exactly the units.
Proof. We give the proof for deg(f) = 3; the case deg(f) = 2 is similar. By the
Lemma, if f is irreducible, it does not have any roots in F. Now suppose f doesn't
have any roots in F. Suppose f factored in some way as f = gh. Then
3 = deg(f ) = deg(g) + deg(h).
Since these are all whole numbers, the only possibilities for the degree of g are
0, 1, 2, 3. If the degree is 0 or 3 then g is not a proper divisor. If the degree is 1
then f has a root by the Lemma, a contradiction. If deg(g) = 2 then deg(h) = 1
and so f has a root by the Lemma again. Therefore f is irreducible.
Example: f = (x² + 1)² ∈ R[x] is reducible although it doesn't have any roots.
Lemma 7.39. If f is irreducible and f |gh then f |g or f |h.
Proof. Exercise.
Proposition 7.42. Let f be irreducible and n N. Then
Div(f^n) = {c f(x)^e : e ≤ n, c ≠ 0}.
Proof. Exercise.
Corollary 7.43. Let f, g be distinct monic irreducible polynomials, and m, n ∈ N.
Then gcd(f^m, g^n) = 1.
Proof. Exercise.
There is a small issue in stating the Existence part of the Fundamental Theorem
of Arithmetic. In the natural number case, it was obvious that there were only a
finite number of primes dividing a given N , simply because such primes must be
less than N . In the polynomial case, all we know a priori is that a polynomial
dividing G must have a smaller degree. If F is infinite, there are infinitely many
monic irreducible polynomials, even of degree one!
Proof. Let n be the degree of G, and suppose f1 , . . . , fn+1 are distinct monic
irreducible divisors of G. By Lemmas 7.40 and 7.35, these are coprime. Therefore
by Proposition 7.33, we see that the product (f1 f2 ··· f_{n+1})|G. But the degree of
this divisor is clearly greater than the degree of G, so this is impossible.
Theorem 7.46. (Existence) Let G be a nonconstant polynomial, and let f1 , . . . , fm
be the monic irreducible divisors of G. Let ei = ordfi (G) for all such i. Let c be
the leading coefficient of G. Then
G(x) = c f1(x)^{e1} f2(x)^{e2} ··· fm(x)^{em}.
Proof. Obviously the f_i at least form a subset of the irreducible monic divisors
of G, and the definition of ord implies that e′_i ≤ ord_{f_i}(G). It follows that the degree
of the RHS of the equation in the corollary is no bigger than the degree of the RHS
of the equation in the theorem, and equality of degrees can only hold if we have
equality of the e_i and e′_i. By comparing the leading coefficients we conclude that
c = c′.
2.7. Exercises
x⁴, x⁴ + 1, x⁴ + 2, x⁴ + x, x⁴ + x + 1, x⁴ + x + 2, x⁴ + 2x, x⁴ + 2x + 1, x⁴ + 2x + 2
Circle the irreducible polynomials and cross out the reducible ones.
(10) Prove the product of two monic polynomials is monic.
(11) Let f(x) = a_n x^n + ··· + a_1 x + a_0 ∈ R[x], and suppose f has at least n + 1
distinct roots. Use linear algebra, notably the theory of the Vandermonde
determinant, to prove that f = 0.
(12) Prove Corollary 7.20 for a general ring R.
(13) If q(x) = a_n x^n + ··· + a_1 x + a_0 ∈ R[x], and f is a sufficiently differentiable
real-valued function on R, let

    q(D)f = a_n f^(n)(x) + ··· + a_1 f′(x) + a_0 f(x).
Here f (n) denotes the nth derivative of f . Prove that if q = q1 + q2 , then
q(D)f = q1 (D)f + q2 (D)f , and if q = q1 q2 , then q(D)f = q1 (D)(q2 (D)f ).
(14) Suppose that p ∈ R[x] is the product of two relatively prime polynomials
p = p1 p2 . Prove that any solution to the differential equation p(D)f = 0
is the sum of two solutions f = f1 +f2 , where p1 (D)f1 = 0 and p2 (D)f2 =
0. [Hint: Apply the Bezout Identity to p1 and p2 .]
3. Irreducibility in C[x]
These roots are of course not necessarily distinct. There are, however, no other roots
of f, as one can check by evaluating the right hand side, using that C is an integral
domain.
4. Irreducibility in R[x]
Suppose α ∉ R, and consider the polynomial g(x) = (x − α)(x − ᾱ).
Note that the coefficients of g are −(α + ᾱ) and αᾱ, both real numbers. (One
can compute this directly, or note that they are fixed by complex conjugation.)
Therefore g ∈ R[x].
Apply the division algorithm in R[x] to f and g. We conclude that there are
r, p ∈ R[x] so that f = pg + r, with r a constant or linear polynomial. So r = f − pg.
Now the right hand side of this has two complex roots, α and ᾱ. This means r
can't be linear or a nonzero constant, and we conclude that r = 0, so g|f.
So, any irreducible polynomial in R[x] must be linear or quadratic. All the linear
ones are irreducible, and the quadratic formula tells us that if a ≠ 0, then ax² + bx + c
is irreducible if and only if b² − 4ac is negative.
Theorem 7.50. The only irreducible polynomials in R[x] are the linear polynomials, and the quadratic polynomials ax² + bx + c with b² − 4ac < 0.
5. Irreducibility in Q[x]
The field Q is algebraically much more complicated and interesting than the fields
C and R. There are irreducible polynomials of every degree, for instance xⁿ − 2 is
irreducible for all n. There is no good algorithm for determining whether a rational
polynomial is irreducible, but we sketch one method in this section.
First note that if f ∈ Q[x], one can multiply by a divisible enough integer N to
clear the denominators of f to get g = N f ∈ Z[x]. Then f is associate to g, so
f is irreducible if and only if g is irreducible. So we may assume that the f we
started with has integer coefficients. The nice thing about this is that one can look
at it mod p for various primes p. We must proceed cautiously however, because
a polynomial may be reducible in Z[x] but not in Q[x], for instance f = 2x + 2 =
2(x+1) is reducible in Z[x] but not in Q[x]. Since we want to focus on irreducibility
in Q[x] here is a new definition.
Definition. A polynomial f ∈ Z[x] is quasiirreducible if it is irreducible when
viewed as an element of Q[x].
Here is a key fact called Gauss's Lemma, whose proof we defer until the next section:
Lemma 7.52. Let f ∈ Z[x]. If f = gh with g, h ∈ Q[x], then there is a nonzero
rational number c so that cg, (1/c)h ∈ Z[x].
The previous lemma implies that if f factors in Q[x] it also factors in Z[x].
The next step is to study irreducibility in Z[x]. This is a very messy ring, but
has lots of nice quotient rings. Consider the "modding out by p" homomorphism
Z[x] → F_p[x] for various primes p. Write f̄ for f mod p. The following is a nice
way to find irreducible polynomials.
Proposition 7.54. Suppose f ∈ Z[x] is nonzero and p does not divide the leading
coefficient of f. Suppose further that f mod p is irreducible. Then f is quasiirreducible.
However, it only goes one way. The polynomial x2 + 1, for example, factors as
(x + 1)2 mod 2, but is irreducible in Q[x].
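Irreducibility over F_p can be tested by brute force: a polynomial of degree n is reducible exactly when it has a monic divisor of degree between 1 and n/2. A Python sketch (names mine; coefficients are listed [a_n, ..., a_0]):

```python
from itertools import product

def remainder_mod(f, g, p):
    # remainder of f upon division by a monic g in F_p[x]
    r = [a % p for a in f]
    while len(r) >= len(g):
        if r[0] == 0:
            r = r[1:]
            continue
        c = r[0]
        pad = list(g) + [0] * (len(r) - len(g))
        r = [(x - c * y) % p for x, y in zip(r, pad)][1:]
    return r

def irreducible_mod(f, p):
    # try every monic g with 1 <= deg g <= (deg f)/2 as a divisor
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for tail in product(range(p), repeat=d):
            g = [1] + list(tail)
            if not any(remainder_mod(f, g, p)):
                return False
    return True

assert irreducible_mod([1, 0, 1, 1], 2)        # x^3 + x + 1 is irreducible mod 2
assert not irreducible_mod([1, 0, 1], 2)       # x^2 + 1 = (x + 1)^2 mod 2
```

So, for instance, x³ + x + 1 ∈ Z[x] is quasiirreducible by Proposition 7.54, while the test is silent about x² + 1, exactly as the text warns.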
6. Z[x]
The arithmetic of the ring Z[x] is a rich interplay between numbers and polynomials.
Since Z is not a field, there are polynomials of degree 0 which are not units in Z[x],
for example f (x) = 11.
There is no longer a nice theory of greatest common divisors. For example, the
following proposition should give you pause.
Proposition 7.56. There do not exist polynomials f(x), g(x) ∈ Z[x] with 2f(x) +
xg(x) = 1.
Proof. Exercise.
On the other hand, we still have a perfectly good function deg, which satisfies
deg(f g) = deg(f ) + deg(g) since Z is a domain. We will now develop a theory of
ordp for Q[x].
More generally, let f0 = p^(−ord_p(f)) f and g0 = p^(−ord_p(g)) g. Then ord_p(f0) = 0
and ord_p(g0) = 0, so ord_p(p^(−ord_p(f)) f · p^(−ord_p(g)) g) = 0. It is easy to see that
generally ord_p(p^k h) = k + ord_p(h) for h ∈ Q[x], so the previous equation becomes
0 = ord_p(f g) − ord_p(f) − ord_p(g), giving the desired result.
Theorem 7.58. Let p be a prime. If f, g ∈ Q[x], then ord_p(f + g) ≥ min{ord_p(f), ord_p(g)}.
Moreover if ord_p(f) < ord_p(g) then ord_p(f + g) = ord_p(f).
Proof. Write f = Σ_i a_i x^i and g = Σ_i b_i x^i. Since ord_p(f) is the minimum of the ord_p(a_i),
and similarly for ord_p(g), it follows that for all i, ord_p(a_i) ≥ ord_p(f) and ord_p(b_i) ≥ ord_p(g).
Thus for all i,

    ord_p(a_i + b_i) ≥ min{ord_p(a_i), ord_p(b_i)} ≥ min{ord_p(f), ord_p(g)}.

Since ord_p(f + g) = min_i ord_p(a_i + b_i) is equal to the LHS of these inequalities for
some i, the first part of the proposition holds.
Now suppose that ord_p(f) < ord_p(g). Say that ord_p(f) = ord_p(a_m) for some m.
Then ord_p(f + g) = min_i{ord_p(a_i + b_i)} ≤ ord_p(a_m + b_m), which is ord_p(a_m) by
Proposition ??. This shows that ord_p(f + g) ≤ ord_p(f) = min{ord_p(f), ord_p(g)}.
But the first part of the proposition shows the other inequality, and therefore they
must be equal.
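ord_p is straightforward to compute. A Python sketch (names mine), checking the multiplicativity established above on one example:

```python
from fractions import Fraction

def ordp_int(n, p):
    # p-adic valuation of a nonzero integer
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def ordp(coeffs, p):
    # ord_p of a nonzero f in Q[x]: the minimum over its nonzero coefficients
    return min(ordp_int(c.numerator, p) - ordp_int(c.denominator, p)
               for c in map(Fraction, coeffs) if c != 0)

def pmul(f, g):
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(map(Fraction, f)):
        for j, b in enumerate(map(Fraction, g)):
            h[i + j] += a * b
    return h

f = [4, 2, 6]                    # 4x^2 + 2x + 6, with ord_2 = 1
g = [Fraction(1, 2), 3]          # x/2 + 3,       with ord_2 = -1
assert ordp(f, 2) == 1 and ordp(g, 2) == -1
# multiplicativity: ord_p(fg) = ord_p(f) + ord_p(g)
assert ordp(pmul(f, g), 2) == ordp(f, 2) + ordp(g, 2)
```

Negative values of ord_p arise exactly from denominators divisible by p, as in the g above.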
6.1. Exercises
7. Rational Functions
Let F be a field. In this section we construct the field of rational functions with
coefficients in F from the polynomial ring F [x]. We leave many routine details to
the reader.
As with rational numbers, we usually write f/g for [(f : g)], and f for f/1.
Proposition 7.62. Let f/g ∈ F(x). Then there are polynomials q, r with r = 0 or
deg r < deg g so that

    f/g = q + r/g.
Definition. A rational function f/g ∈ F(x) is called topheavy provided
that deg f ≥ deg g. It is called bottomheavy provided that deg f <
deg g.
Note that this notion does not depend on the representative f/g of the equivalence
class, by Proposition 7.61.
7.1. Exercises
8. Composition of Polynomials
Here are some basic properties of composition, which the reader may verify:
(f + g) ∘ h = (f ∘ h) + (g ∘ h).
(f · g) ∘ h = (f ∘ h) · (g ∘ h).
(f ∘ g) ∘ h = f ∘ (g ∘ h).
Proposition 7.65. Suppose that u ∈ F[x] has degree 1. Then there is a polynomial
v ∈ F[x] of degree 1 so that (u ∘ v)(x) = (v ∘ u)(x) = x.
Polynomials of degree 1 play the role of units here. We now define an analogue to
primality or irreducibility, but relative to composition.
We are ruling out compositional factors of degree 1 because for any degree 1 poly-
nomial u, one always has the trivial decomposition f = (f ∘ u) ∘ u⁻¹.
Proposition 7.66. Let f(x) = x⁴ + x ∈ Q[x]. Then f is indecomposable.
Here are some basic properties of indecomposability, which the reader may verify.
Proposition 7.67. We have:
8.1. Exercises
9. Chapter Wrap-Up
9.2. Toughies
Real Numbers
1. Constructing R
In an analysis class one needs axioms for the real numbers R. There are various
formulations of such axioms, but they all mean that R is a complete ordered field,
which we will define in the next section. But even after stating the properties you
want R to satisfy, there are still two logical quandaries. First, is there such a field?
Maybe setting up so many axioms leads eventually to a logical paradox. This
fear can only be assuaged if we construct an example of such a field. Second, is
there more than one such field? This question is subtle and requires the notion of
isomorphism, which we defer until later. [As of now unwritten.]
In this chapter we construct the real numbers in three different ways: via decimals,
Dedekind cuts, and equivalence classes of Cauchy sequences. Each of these has
its merit. Decimals are practical for computation and learned at a young age.
However, operations with decimals are difficult to work into an axiomatic framework.
Dedekind cuts give a good framework for proofs, but are a little abstract. The
Cauchy sequence approach is the most abstract, but here's something interesting:
if you change what convergence means, you may get an entirely
new field, the p-adic numbers!
All three of these approaches involve having some kind of analytic point of view.
2. Ordered Fields
In this section we define the phrase complete ordered field, together with other
important notions.
An ordered field is not simply an ordered set which happens to be a field; the
ordering must interact with the ring structure.
Suppose some rational number q = sup C, and suppose first that q² < 2. By the
first part of the lemma below there is a number q′ > q with q′ ∈ C, a contradiction
to q being an upper bound. Now suppose that 2 < q². By the second part of
the lemma below there is a rational number q′ < q with q′ an upper bound of C,
another contradiction.
The only remaining possibility is that q² = 2, but we know very well this is impossible.
Lemma 8.2. If a ∈ Q⁺ satisfies a² < 2, then there is an ε ∈ Q⁺ so that (a + ε)² < 2.
If b ∈ Q⁺ satisfies 2 < b², then there is an ε ∈ Q⁺ so that ε < b and 2 < (b − ε)².

Proof. For the first part, we may assume a ≥ 1 (if a < 1, an ε that works for
a = 1 also works for a). Put δ = 2 − a² ∈ Q⁺ and ε = min{1, δ/4a} ∈ Q⁺.
Since ε ≤ 1, we have ε² ≤ ε. Since a ≥ 1 we have 4a ≥ 4. Therefore ε ≤ δ/4.
Meanwhile, since ε ≤ δ/4a, we have 2aε ≤ δ/2. It follows that (a + ε)² = a² + 2aε + ε²
≤ a² + δ/2 + δ/4 < a² + δ = 2, as desired.
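The proof's choice of ε can be tested directly with exact rational arithmetic. A Python sketch (names mine; as in the proof, the inputs a are taken with 1 ≤ a and a² < 2):

```python
from fractions import Fraction

def eps_for(a):
    # the proof's choice: delta = 2 - a^2, eps = min(1, delta / 4a)
    delta = 2 - a * a
    return min(Fraction(1), delta / (4 * a))

for a in [Fraction(1), Fraction(7, 5), Fraction(141, 100)]:
    assert 1 <= a and a * a < 2
    e = eps_for(a)
    assert e > 0 and (a + e) ** 2 < 2
```

Running the check on better and better rational approximations of the square root of 2 shows ε shrinking toward 0, which is the point: no rational a gets all the way there.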
3. Decimal Expansions
In this section we will discuss the decimal construction of the real numbers R.
The theory of place values and decimals does not lend itself to pleasant proofs,
so we will not give many. We assume the reader is acquainted with basic decimal
arithmetic.
Definition. Let D be the set of decimal expansions, i.e., expressions of the form
α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} d_{−3} ···, with d_i ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The leading
digit d_n should not be 0 if n ∈ N. We call d_i the ith digit of α. We say α
is terminating if there is an ℓ ∈ Z so that if i < ℓ, then d_i = 0. In this case d_ℓ is
called the last digit of α.
For example, we have the decimal expansion 1.3333333··· with repeating 3's. Often
one writes this as 1.3̄ for brevity. This example corresponds to the rational number
4/3.
Remark: These will give nonnegative real numbers, which are complicated enough...
For k ∈ Z write e_k for the expansion whose digits are all zero except for a 1 at the kth place.
For example e_{−3} = 0.001. (In fact e_k = 10^k.)
Definition. Let k ∈ N. If α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} d_{−3} ··· is a decimal expansion,
then its kth truncation ⌊α⌋_k is the expansion d_n d_{n−1} ··· d_{−k} 000···.
Certainly if k < ℓ, and you know ⌊α⌋_ℓ, then you know ⌊α⌋_k as well, by truncating
sooner. In fact, ⌊⌊α⌋_ℓ⌋_k = ⌊α⌋_k.
Defining the addition of two decimal expansions is a little tricky. Suppose you have
two expansions α = d_n d_{n−1} ··· d_0 . d_{−1} d_{−2} ··· and α′ = d′_n d′_{n−1} ··· d′_0 . d′_{−1} d′_{−2} ···, and
want an expansion for α + α′. The basic idea is to add the corresponding digits, but
if they add up to more than 9, then carrying is involved.
If α and α′ terminate, then add as usual: start with the smallest place
where one of them is nonzero and add vertically, possibly with carrying.
We do not give more detail here, but this addition is commutative and associative.
If α and α′ do not both terminate, then we have to think a little. For example, consider
the addition problem

      58.793···
    + 41.206···

The digits of the sum depend on the omitted digits to the right! If the omitted
digits are all 0's, for instance, then the sum is the terminating expansion 99.999.
If the digits in the next place add to 10 or more, then a carry propagates and the
expansion starts as 100.000···.
In reality, these two potential sums correspond to rather close numbers, so the
difference is mild. But to write down a general addition rule with digits takes some
willpower. Here goes.
Definition. Fix two expansions α and α′ as above. Let i ∈ Z be a place. Let s_i be
the sum of the digits in the ith place of α and α′. If s_i ≤ 8, then i is called simple.
If s_i ≥ 10, then i is called enhanced. If s_i = 9, then i is called precarious.
[Example]
Note that we can do all the additions on the right because the expansions involved
terminate.
The above gives an addition law on D. It is easy to see that it is commutative, since
it is commutative for terminating decimals, and since the sum si does not depend
on the order.
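The classification is trivial to code, and the addition problem above is precisely the bad case: every visible place is precarious. A Python sketch (names mine):

```python
def classify(d, d_prime):
    s = d + d_prime                  # s_i: the sum of the digits in place i
    if s <= 8:
        return "simple"              # no carry out, whatever happens below
    if s >= 10:
        return "enhanced"            # carry out, whatever happens below
    return "precarious"              # s = 9: the carry depends on lower places

# 58.793... + 41.206...: every visible place is precarious, which is
# why the sum's digits depend on the unseen digits to the right
pairs = [(5, 4), (8, 1), (7, 2), (9, 0), (3, 6)]
assert all(classify(a, b) == "precarious" for a, b in pairs)
assert classify(5, 1) == "simple" and classify(8, 4) == "enhanced"
```

A simple or enhanced place settles the carry for everything above it; only an infinite run of precarious places is genuinely troublesome.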
If someone has a nice argument for why the law is associative, drop me a line!
Now consider the multiplication problem 2.222··· × 0.333···. Do you know any of the digits of the result? It looks like the tenths place of the
result is going to be a 7, because the 6 + 6 in the hundredths place adds up to a
12, and one carries the 1. But what if there's a massive amount of carrying before
the hundredths place, and somehow an 8 gets carried, leading to 6 + 6 + 8 = 20?!
As you go to lower and lower places in this multiplication, you can see that the
carrying increases without bound. At the 10^(−100) place, for example, the sum is at
least 600, which means that the carried number is 60!
I hope you can see from this example that developing an algorithm where you input
the (infinitely many) digits of two decimal expansions and output the digits of their
product would be a chore. You know something like

    ⌊α · α′⌋_k = ⌊α⌋_k · ⌊α′⌋_k

should be true in the limit, even if it's not exactly true as it stands.
Addition was simple enough to trudge through with case analysis, but if you want
to have a real number system with clearly defined arithmetic operations one can
work with in an intelligible way, decimal expansions are not the way to go.
Remark: By the way, the above example is comprised of repeating decimals for
simplicity. The clever reader knows this problem is really (20/9) · (1/3), and so the
product is 20/27 = 0.740740···. But the point was to find a rule that works for all decimal
expansions.
Here is another problem. How do you subtract 0.999··· from 1.000···? We haven't discussed
subtraction yet, but the only reasonable decimal expansion which could be the
difference is 0.000···. On the other hand, the rule for addition gives 0.999··· + 0.000··· = 0.999···.
At no point in our definition of D did we say that 0.999··· = 1.000···, so they are different
elements of D. This is not an isolated occurrence either; any time you have a
terminating expansion like 0.34000··· there is a repeating 9's expansion like 0.33999··· lurking
in its shadow.
Then you can define the nonnegative real numbers R_{≥0} to be the set of equivalence
classes. One patiently checks that addition is ∼-invariant, and if anyone gets around
to defining multiplication, subtraction, and division, they can check that those are
also well-defined. The next step is probably to define the notion of <, and then
define negative real numbers and their arithmetic.
At the end of the day you've done a lot of work, but you've kept in touch with your
roots as a student of the decimal system.
4. Dedekind Cuts
For simplicity we will actually construct the positive real numbers R+ first. One
can then form R just as we formed Z from N.
In other words, a cut is an open rational line segment whose left endpoint is 0.
Intuitively, the real numbers are the right endpoints of these cuts.
In fact, Cq) = (0, q), with the understanding that (0, q) = {x ∈ Q | 0 < x < q}.
Proof. Note that Cq) is nonempty since q/2 ∈ Cq). Also, Cq) is bounded above
by q. If a ∈ Cq) then the average (a + q)/2 satisfies a < (a + q)/2 < q, and therefore (a + q)/2 ∈ Cq),
so a is not the maximum of Cq). Thus Cq) has no maximum. It is easy to see that
Cq) satisfies the rest of the definition of cut, by the transitivity of inequality.
Suppose C = Cq) for some rational number q, and suppose first that q² < 2. By
the first part of Lemma 8.2 there is a number q′ = q + ε with q′ ∉ Cq) but q′ ∈ C,
a contradiction. Now suppose that 2 < q². By the second part of Lemma 8.2 there
is a number q′ = q − ε ∈ Q⁺ with q′ ∈ Cq) but q′ ∉ C, another contradiction.
The only remaining possibility is that q² = 2, but we know very well this is impossible.
As you may have guessed, this cut C will correspond to the irrational number √2.
Lemma 8.5. Let C be a cut and ε ∈ Q+. Then there are numbers p ∈ C and q ∉ C
so that 0 < q − p < ε.
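For a concrete cut such as C = {x ∈ Q+ | x² < 2}, such a tight pair (p, q) can be found by repeated bisection. The following Python sketch does this with exact rationals (the name `tight_pair` is mine, chosen for illustration):

```python
from fractions import Fraction

def tight_pair(eps):
    """For the cut C = {x in Q+ : x^2 < 2}, find rationals p in C and
    q not in C with 0 < q - p < eps, by repeated bisection."""
    p, q = Fraction(1), Fraction(2)   # 1 is in the cut, 2 is not
    while q - p >= eps:
        mid = (p + q) / 2
        if mid * mid < 2:
            p = mid   # midpoint is in the cut
        else:
            q = mid   # midpoint is outside the cut
    return p, q

p, q = tight_pair(Fraction(1, 1000))
print(p * p < 2 < q * q, q - p < Fraction(1, 1000))  # True True
```

Each pass halves the gap q − p, so the loop terminates after finitely many steps; no rational has square exactly 2, so the invariants p ∈ C, q ∉ C persist.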
We will soon define addition, multiplication and division for R+ . But first we will
define the sup operation.
Theorem 8.7. Suppose S ⊆ R+ is a nonempty set of cuts which is bounded above.
Then the union ⋃_{C∈S} C is a cut.
Proof. Write C for the union. It is obviously nonempty and bounded above.
Suppose C had a maximum m ∈ C. There must be a cut C′ ∈ S with m ∈ C′,
and it is easy to see that m is a maximum of C′, a contradiction. This shows that
C does not have a maximum. Suppose a ∈ C, and b < a. Then a ∈ C′ for some
C′ ∈ S, and therefore b ∈ C′ ⊆ C. This shows that C is left-closed.
Note that sup S = ⋃_{C∈S} C is an upper bound of S, and that sup S ≤ D for any
upper bound D of S.
Here is the reasoning for the second point: Recall that C ≤ D means C ⊆ D as
a set of rational numbers. If C ⊆ D for all C ∈ S, then ⋃_{C∈S} C ⊆ D, which
translates to sup S ≤ D.
Proof. It is easy to see that both sets are nonempty. If b1 and b2 are upper
bounds of C1 and C2 then b1 + b2 is an upper bound of C1 + C2 .
Proof. We argue by way of contradiction. Suppose that they are not equal.
By Trichotomy, we may assume that C1 < C2 (the case C2 < C1 is similar). Then
there is a number a ∈ C2 − C1. Since a is not the maximum of C2, there is also
a number b ∈ C2 with a < b. Let ε = b − a. By Lemma 8.5, there are numbers
x, y with 0 < y − x < ε, x ∈ C0, and y ∉ C0. Note that x + b ∈ C0 + C2. Since
C0 + C1 = C0 + C2, we must be able to write x + b = x0 + x1, with x0 ∈ C0 and
x1 ∈ C1. Since a = b − ε is not in C1, we have x1 < b − ε. Thus x + b − x0 < b − ε,
or x + ε < x0. However, x + ε > y > x0, a contradiction.
Proof. First we check that Cq1) + Cq2) ⊆ Cq1+q2). If x ∈ Cq1) and y ∈ Cq2),
then x < q1 and y < q2. Therefore x + y < q1 + q2, which shows that x + y ∈ Cq1+q2).
To check the other inclusion, suppose that z ∈ Cq1+q2). Then z < q1 + q2. By
Exercise 7 in Section 2.3 again, we may write z = x + y with x < q1 and y < q2.
This shows that z ∈ Cq1) + Cq2), as desired.
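The decomposition z = x + y invoked from Exercise 7 can be realized concretely by splitting the slack q1 + q2 − z equally between the two coordinates. A sketch in Python with exact rationals (the helper name `split_below` is mine, for illustration):

```python
from fractions import Fraction

def split_below(z, q1, q2):
    """Given rationals z < q1 + q2, produce x < q1 and y < q2 with
    x + y = z, by pushing half the slack onto each coordinate."""
    d = (q1 + q2 - z) / 2   # positive slack, shared equally
    return q1 - d, q2 - d

x, y = split_below(Fraction(7, 2), Fraction(2), Fraction(2))
print(x + y == Fraction(7, 2), x < 2, y < 2)  # True True True
```

Since d > 0 whenever z < q1 + q2, both outputs sit strictly below their respective bounds, which is exactly what the inclusion Cq1+q2) ⊆ Cq1) + Cq2) needs.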
Proposition 8.11. If C1 and C2 are cuts, then C1 C2 is a cut.
Remark: This Proposition would not be true if we allowed the negative numbers
in our cuts.
Proof. Exercise.
Proposition 8.12. If q1, q2 ∈ Q+, then Cq1) · Cq2) = Cq1q2).
Proof. Exercise.
Proposition 8.13. Suppose that C1 ≤ C2, and C0 ∈ R+. Then C1 + C0 ≤ C2 + C0
and C1C0 ≤ C2C0.
Proof. Exercise.
Proposition 8.14. If C is a cut, then C · C1) = C.
Next, suppose that x ∈ C. Since C does not have a maximum element, there is an
x1 ∈ C with x < x1, and therefore x/x1 ∈ C1). Thus x = x1 · (x/x1) ∈ C · C1).
It follows that C ⊆ C · C1).
Lemma 8.15. Let C be a cut, and write S = {C′ ∈ R+ | C · C′ ≤ C1)}. Then S is
nonempty and bounded above.
Let q ∈ C1), so that 0 < q < 1, and put ε = 1 − q. Let x ∈ C and y ∉ C be as in the
lemma below. Since y ∉ C, the cut C1/y) is in S. Therefore x/y ∈ C · C1/y) ⊆ C · C⁻¹.
Since 1 − x/y < ε = 1 − q, we have q < x/y, and since cuts are left-closed we
have q ∈ C · C⁻¹ as well.
Lemma 8.17. Let C be a cut, and ε ∈ Q+. Then there are elements x ∈ C and
y ∉ C so that 0 < 1 − x/y < ε.
Definition. Let C be a cut. Then write C⁻¹ for the cut sup S from the
previous proposition.
4.2. Exercises
(1) The rest of Lemma 8.2 and Propositions 8.11 and 8.12.
(2) Let C1 and C2 be cuts with C1 < C2. Prove that there is a rational
number q so that C1 < Cq) < C2.
(3) Prove that if C is a cut, then C = sup_{q∈C} Cq).
(4) If C is a cut, is the set {y ∈ Q+ | xy < 1 for all x ∈ C} necessarily a cut?
(5) Let q ∈ Q+. Prove that Cq)⁻¹ = C1/q).
(6) Let C be a cut. Prove that there is a cut D so that D² = C.
(7) Definition 5 of Book V of Euclid's Elements reads, "Magnitudes are said
to be in the same ratio, the first to the second and the third to the
fourth, when, if any equimultiples whatever are taken of the first and
third, and any equimultiples whatever of the second and fourth, the former
equimultiples alike exceed, are alike equal to, or alike fall short of, the
latter equimultiples respectively taken in corresponding order." Explain
how this is essentially the notion of a cut. (Remark: Online commentary
on this topic is confusing and irrelevant to this problem, so don't bother
reading it.)
In this section we add the element 0 to R+. At this point you can forget the
definition of cuts; all we need is the properties we've been accumulating.
For example, Q+ and R+ are prefields. If F is an ordered field, then the positive
elements form an ordered prefield.
Proof. Suppose this were the case. Multiplying by the inverse of x gives
1 + x⁻¹y = x⁻¹y. Since P is nontrivial, there is an element z ∈ P with z ≠ 1.
Adding z to both sides of the equation gives 1 + x⁻¹y + z = x⁻¹y + z. By the
Cancellation Law we obtain
(3) 1 + z = z.
Adding 1 to both sides of this equation and cancelling z's gives 1 + 1 = 1. By
Distribution it follows that z + z = (1 + 1)z = 1 · z = z. Substituting z + z into
the right hand side of Equation (3) gives 1 + z = z + z. By Cancellation we obtain
1 = z, a contradiction.
Proposition 8.20. If P is a prefield (resp. an ordered prefield), then the new set
P ∪ {0} has all the properties of a prefield (resp. an ordered prefield), except that 0 does
not have a multiplicative inverse.
Proof. Exercise. Your proof should involve the Associative and Commutative
Laws for Addition, and the Cancellation Law.
Proof. Exercise.
Every arrow is thus equivalent to ⟨x, 0], ⟨0, x], or ⟨0, 0], for some x ∈ P. We call
the arrows in the first case positive and in the second case negative.
We have the same rules for adding and multiplying arrows in A(P):
For x ∈ P, write x for the equivalence class of ⟨x, 0] and −x for the class of ⟨0, x].
As in the case of Z, P[Z] is the union of the x's, the −x's, and 0. Therefore as a
set, P[Z] = P ∪ (−P) ∪ {0}. This is how we would usually think of it.
4.5. Exercises
(1) Let X be a nonempty set, and P = {f ∈ F(X, R) | f(x) > 0 for x ∈ X}.
Show that P is a prefield.
CHAPTER 9
Miscellaneous
1. An ODE Proof
Proof. Consider the function g(x) = f(x)/e^x. Note that this division makes
sense because e^x is never zero. By the quotient rule,
g′(x) = (e^x f′(x) − f(x)e^x)/e^{2x}.
Since f′(x) = f(x) for all x, we also have e^x f′(x) − f(x)e^x = 0 for all x. Therefore
g′(x) = 0; therefore g is a constant function. Call the constant C; thus f(x)/e^x = C,
and it follows that f(x) = Ce^x, as claimed.
This puts the issue to rest. We do not need to worry about any clever people
thinking up other solutions, or more importantly whether our real-world phenom-
enon modeled by y′ = y can be anything other than a multiple of the exponential
function. (What about f(x) = e^{x+1}? If you're really stuck, try using the proof to
find the C.)
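Indeed, for f(x) = e^{x+1} the proof's constant works out to C = f(x)/e^x = e, independently of x. A quick numerical sanity check in Python:

```python
import math

# f(x) = e^(x+1) satisfies f'(x) = f(x), so by the proof it equals C*e^x.
# The constant is C = g(x) = f(x)/e^x = e^(x+1)/e^x = e, for every x.
values = [math.exp(x + 1) / math.exp(x) for x in (0.0, 1.0, -2.5)]
print(all(abs(C - math.e) < 1e-9 for C in values))  # True
```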
Thus,
|y| = e^{x+C1}.
So
y = ±e^x e^{C1},
and therefore
y = Ce^x,
where C = ±e^{C1}.
∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t).
If ∂x and ∂y were independent mathematical quantities, then one could cancel them
and be left with the paradoxical and very incorrect
∂f/∂t = 2 ∂f/∂t.
Thus, students who have learned the wrong proof have bad intuition and are
now completely confused about partial derivatives. It is better not to memorize
mumbo-jumbo, and best to learn real proofs.
1.1. Exercises
2. Pythagorean Triples
A Babylonian clay tablet from around 1800 B.C.E. was found recording triples of numbers
such as (3, 4, 5), (8, 15, 17), (6, 8, 10), (5, 12, 13). Mathematicians recognize these as
integer solutions to the equation a² + b² = c². The triples of integers (a, b, c) solving
this equation are called Pythagorean triples. I want to allow negative solutions as
well, so notice that any solution (a, b, c) also gives solutions (±a, ±b, ±c) and also
(b, a, c).
How do you get all of the solutions? We will use a technique called algebraic
geometry to study this problem. This section will bring in more powerful ideas
than you may be comfortable with. Some facts about integers will be proved
rigorously much later. Don't get too nervous; just sit back and enjoy the ride. It
is good to be occasionally exposed to deeper mathematical thought.
By a rational point P on the plane we mean a point whose coordinates are both
rational numbers.
Proposition 9.2. If P , Q are rational points in the plane, with different abscissas,
then the line connecting them has rational slope.
Proof. Let P = (x1, y1) and Q = (x2, y2). The slope of the line connecting
them is
m = (y2 − y1)/(x2 − x1).
Since x1, x2, y1, y2 are all rational numbers, so is m.
Here is the calculation. A line ℓ through P with slope t is given by the equation
y = t(x + 1). If (x, y) satisfies both this equation and also x² + y² = 1, then
x² + t²(x + 1)² = 1. Solving this with the quadratic formula gives
x = −1 or x = (1 − t²)/(1 + t²).
Of course x = −1 gives the point P which we already know. The other value gives
Q = ((1 − t²)/(1 + t²), 2t/(1 + t²)).
Proposition 9.3. All rational points on the unit circle C are of the form
((1 − t²)/(1 + t²), 2t/(1 + t²)), for t ∈ Q, or (−1, 0).
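One can test this parametrization with exact rational arithmetic. This Python sketch checks that a rational slope really lands on a rational point of the circle:

```python
from fractions import Fraction

def circle_point(t):
    """Rational point on the unit circle from the slope parameter t."""
    d = 1 + t * t
    return ((1 - t * t) / d, 2 * t / d)

x, y = circle_point(Fraction(1, 3))   # the slope from the (4,3,5) discussion
print((x, y), x * x + y * y == 1)
# (Fraction(4, 5), Fraction(3, 5)) True
```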
Q = ((1 − m²/n²)/(1 + m²/n²), (2m/n)/(1 + m²/n²)) = ((n² − m²)/(n² + m²), 2mn/(n² + m²)).
2 2
At this point it should be obvious to you how to find some (a, b, c) so that
a/c = (n² − m²)/(n² + m²) and b/c = 2mn/(n² + m²);
surely one takes a = n² − m², b = 2mn, and c = m² + n². And indeed you can
and should check that (n² − m²)² + (2mn)² = (m² + n²)². Do we have a proof
that all Pythagorean triples are of the form (a, b, c) = (n² − m², 2mn, m² + n²),
with m, n integers? If you reflect on this for a moment, you'll notice something
is amiss. To start with, we're not getting the solution (4, 3, 5), because 3 is odd
and 2mn obviously has to be even. We're also not getting (3, 4, −5), because
m² + n² is obviously positive. Thirdly, notice that this doesn't give the
solution (9, 12, 15), since you can't write 15 as the sum of two integer squares.
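The identity itself, and the list of triples the formula does produce for small m and n, are easy to check by machine. A Python sketch:

```python
# Check the identity (n^2 - m^2)^2 + (2mn)^2 = (m^2 + n^2)^2,
# and see which triples the formula actually produces for small m, n.
def triple(m, n):
    return (n * n - m * m, 2 * m * n, m * m + n * n)

for m in range(1, 4):
    for n in range(m + 1, 5):
        a, b, c = triple(m, n)
        assert a * a + b * b == c * c   # the identity holds
print(triple(1, 2), triple(1, 3), triple(2, 3))
# (3, 4, 5) (8, 6, 10) (5, 12, 13)
```

Note that (8, 6, 10) for m = 1, n = 3 is the unreduced form of the troublesome (4, 3, 5) discussed next.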
We are still bothered by the counterexample (4, 3, 5), because it was supposed
to be covered by this geometric method. Let's go through the program with this
example. It corresponds to the point (4/5, 3/5) in C. The slope of the line connecting
this point to P is t = 1/3 (right?). So we were supposed to get this point from m = 1
and n = 3. Here's what happened: Plugging these values into the last expression
for Q gives
Q = ((3² − 1)/(3² + 1), (2 · 3)/(3² + 1)) = (8/10, 6/10).
So you see what happened? These fractions both reduce further to (4/5, 3/5), and then
we get our primitive Pythagorean triple (4, 3, 5). The point is that even though
t = m/n was a reduced fraction, the formula for the coordinates of Q did not give
reduced fractions.
So here's what happens in general. There are two cases. Take t = m/n a reduced
fraction.
Case II: If m and n are both odd, then these fractions are in lowest terms after
dividing by 2. In other words,
Q = ( ((n² − m²)/2) / ((n² + m²)/2), mn / ((n² + m²)/2) ).
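For the odd-odd case, the halving step can be checked directly. This Python sketch (the helper name `case_ii_triple` is mine) recovers (4, 3, 5) from m = 1, n = 3:

```python
# Case II sketch: for m, n both odd, all three entries of the formula's
# triple are even, and halving yields the primitive triple.
def case_ii_triple(m, n):
    a, b, c = n * n - m * m, 2 * m * n, n * n + m * m
    assert a % 2 == 0 and b % 2 == 0 and c % 2 == 0
    return (a // 2, b // 2, c // 2)

t = case_ii_triple(1, 3)      # the (4, 3, 5) example from the text
print(t, t[0]**2 + t[1]**2 == t[2]**2)  # (4, 3, 5) True
```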