Discrete Mathematics Monash University

Contents
Lecture 1: What is MAT1830 about?

Lecture 2: Divisors and primes
Lecture 3: Congruences
Lecture 4: Logic
Lecture 5: Tautologies and logical equivalence
Lecture 6: Rules of inference
Lecture 7: Predicates and quantifiers
Lecture 8: Predicate logic
Lecture 9: Mathematical induction
Lecture 10: Induction and well-ordering
Lecture 11: Sets
Lecture 12: Operations on sets
Lecture 13: Functions
Lecture 14: Examples of functions
Lecture 15: Composition and inversion
Lecture 16: Relations
Lecture 17: Equivalence relations
Lecture 18: Order relations
Lecture 19: Selections and arrangements
Lecture 20: Pascal’s triangle
Lecture 21: Probability
Lecture 22: Conditional probability and Bayes’ theorem
Lecture 23: Random variables
Lecture 24: Expectation and variance
Lecture 25: Discrete distributions
Lecture 26: Recursion
Lecture 27: Recursive algorithms
Lecture 28: Recursion, lists and sequences
Lecture 29: Graphs
Lecture 30: Walks, paths and trails
Lecture 31: Degree
Lecture 32: Trees
Lecture 33: Trees, queues and stacks
Unit Information
MAT1830 – Semester 1 2020
Course coordinator/lecturer (Clayton)
• Name: A/Prof Daniel Horsley
• Office: 9Rnf/418
• Phone: 9905 4459
• Email: daniel.horsley@monash.edu
Lecturer (Clayton)
• Name: Dr. Mikhail Isaev
• Office: 9Rnf/435
• Email: mikhail.isaev@monash.edu
Lecturer (Clayton)
• Name: Prof. Ian Wanless
• Office: 9Rnf/420
• Phone: 9905 4442
• Email: ian.wanless@monash.edu
Lecturer (Malaysia)
• Name: Dr. Tham Weng Kee
• Email: tham.wengkee@monash.edu
Moodle Page
The course’s Moodle page can be accessed through the my.monash portal at
https://my.monash.edu.au/
Unit Guide
The full unit guide can be found on the Moodle page. It contains a topic outline and information on
assessment, special consideration, plagiarism etc.
Course Information
• Lectures: Three hours per week (lectures run in weeks 2–12).
• Applied classes: One per week as allocated (applied classes run in weeks 2–12).
• Required materials: Course notes booklet – available for less than $10 from the Clayton campus
bookshop or available as a pdf from the course Moodle page. Note that there is no required
textbook for the course.
Recordings of the lectures will be available through the Moodle page.
Assessment
• Applied class participation worth 5%
• Ten assignments worth 3.5% each (one due each week from week 3 to week 12).
• Final examination worth 60% (held in the examination period).
In order to pass the course you must receive at least 50% overall AND at least 45% of the exam marks
AND at least 45% of the other marks.
You receive the 5% participation marks if you participate in at least 8 of your 11 applied classes,
otherwise you receive 0%. The assignments will be issued in lectures and will be available from the
course Moodle page. Assignments are to be submitted to your tutor during your applied class. They
will then be marked and returned to you in your next applied class. No calculators or other materials
will be allowed in the final exam.
Mathematics Learning Centre

The Mathematics Learning Centre is open weekdays from 11–2 on the ground floor of 9Rnf (from
week 2 right through exams). You can drop in for help anytime during these hours.
Lecture 1: What is MAT1830 about?
Discrete mathematics studies objects which have distinct separated values (e.g. integers), as
opposed to objects which vary smoothly (e.g. real numbers). You can think of it as being
“digital” mathematics rather than “analogue” mathematics.
Discrete mathematics is particularly important in computer science and the two fields are
very closely linked.
This course covers a wide range of topics in discrete mathematics including the following:
• Numbers
• Logic
• Induction and recursion
• Sets, functions and relations
• Probability
• Graph theory
1.1 What to expect
What we do here might be a bit different to a lot of the maths you’ve done in the past. We’ll be
concentrating on really understanding the concepts, rather than simply learning how to solve
certain types of questions.
For a lot of the questions we ask, there won’t be a single fixed procedure you can apply to get
the answer. Instead, you’ll have to think carefully about what the question is asking and try
to work out what is really going on. Don’t be afraid to try different things, play around, and
look at examples.
We’ll also be emphasising the importance of proving results.
1.2 Proofs
A proof is essentially just a water-tight argument that a certain statement must be true. As
we’ll see, even if you are pretty sure that something is true, it can be really useful to have a
proof of it, for a number of reasons.
1.3 Maths in computer science
As we mentioned above, maths and computer science are very closely related. The topics in
this course all have many applications to computer science. For example:
• Number theory is used in cryptography to enable secure communication, identity
verification, online banking and shopping etc.
• Logic is used in digital circuit design and in program control flow.
• Induction and recursion are used to study algorithms and their effectiveness.
• Functions are important in the theory of programming and relations are vital in
database theory and design.
• Probability is vital for understanding randomised algorithms and for creating systems
to deal with uncertain situations.
• Graph theory is used in software which solves allocation and scheduling problems.
Questions
1.1 What maths that you’ve done in the past would count as discrete? What would
count as continuous instead? Are there grey areas?
1.2 Why might proofs be important to mathematicians and computer scientists?
1.3 Can you think of other links between maths and computer science?
Lecture 2: Divisors and primes
We say that integer a divides integer b if
b = qa for some integer q.
Example. 2 divides 6 because 6 = 3 × 2.
This is the same as saying that division with remainder gives remainder 0. Thus a does not
divide b when the remainder is ≠ 0.
Example. 3 does not divide 14 because it leaves remainder 2: 14 = 4 × 3 + 2.
When a divides b we also say:
• a is a divisor of b,
• a is a factor of b,
• b is divisible by a,
• b is a multiple of a.
2.1 Primes
A positive integer p > 1 is a prime if its only positive integer divisors are 1 and p. Thus the
first few prime numbers are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . .
The number 1 is not counted as a prime, as this would spoil the
Fundamental Theorem of Arithmetic.
Each integer > 1 can be expressed in exactly one way, up to order, as a product of primes.
Example. 210 = 2 × 3 × 5 × 7, and this is the only product of primes which equals 210.
This would not be true if 1 was counted as a prime, because many factorisations involve 1.
E.g.
210 = 1 × 2 × 3 × 5 × 7 = 1² × 2 × 3 × 5 × 7 = . . .
2.2 Recognising primes
If an integer n > 1 has a divisor other than 1 and itself, it has such a divisor ≤ √n, because
for any divisor a > √n we also have the divisor n/a, which is < √n.
Thus to test whether 10001 is prime, say, we only have to see whether any of the numbers
2, 3, 4, . . . ≤ 100 divide 10001, since √10001 < 101. (The least divisor found is in fact 73,
because 10001 = 73 × 137.)
This explains a common algorithm for recognising whether n is prime: try dividing n by
a = 2, 3, . . . while a ≤ √n.
The algorithm is written with a boolean variable prime, and n is prime if prime = T (true)
when the algorithm terminates.
assign a the value 2.
assign prime the value T.
while a ≤ √n and prime = T
    if a divides n
        give prime the value F
    else
        increase the value of a by 1.
2.3 Finding divisors
This algorithm also finds a prime divisor of n. Either
the least a ≤ √n which divides n,
or,
if we do not find a divisor among the a ≤ √n, n itself is prime.
2.4 The greatest common divisor of two numbers
It is remarkable that we can find the greatest common divisor of positive integers m and n,
gcd(m, n), without finding their prime divisors.
This is done by the famous Euclidean algorithm, which repeatedly divides the greater
number by the smaller, keeping the smaller number and the remainder.
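The trial-division test of Sections 2.2 and 2.3 can be sketched in Python as follows. This is only an illustration: the names least_divisor and is_prime are not from the notes, and math.isqrt (the integer square root) is used as an assumption so that the bound a ≤ √n is checked without floating-point arithmetic.

```python
from math import isqrt


def least_divisor(n: int) -> int:
    """Return the least divisor a >= 2 of n (this is n itself when n is prime)."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    a = 2
    # Only candidates a <= sqrt(n) need to be tried.
    while a <= isqrt(n):
        if n % a == 0:  # a divides n
            return a
        a += 1
    # No divisor was found among the a <= sqrt(n), so n is prime.
    return n


def is_prime(n: int) -> bool:
    """Trial-division primality test based on least_divisor."""
    return n > 1 and least_divisor(n) == n
```

For instance, least_divisor(10001) tries a = 2, 3, . . . and stops at 73, in line with the example in Section 2.2.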
Euclidean Algorithm.
Input: positive integers m and n with m > n
Output: gcd(m, n)
a := m, b := n
r := remainder when a is divided by b
while r ≠ 0 do
    a := b
    b := r
    r := remainder when a is divided by b
end
return b
Example. m = 237, n = 105
The first values are a = 237, b = 105,
so r = 237 − 2 × 105 = 27.
The next values are a = 105, b = 27,
so r = 105 − 3 × 27 = 24.
The next values are a = 27, b = 24,
so r = 27 − 1 × 24 = 3.
The next values are a = 24, b = 3,
so r = 24 − 8 × 3 = 0.
Thus the final value of b is 3, which is gcd(237, 105).
This can be set out more neatly:
237 = 2 × 105 + 27
105 = 3 × 27 + 24
27 = 1 × 24 + 3
24 = 8 × 3 + 0
2.5 The Euclidean algorithm works!
We start with the precondition m > n > 0. Then the division theorem tells us there is a
remainder r < b when a = m is divided by b = n. Repeating the process gives successively
smaller remainders, and hence the algorithm eventually returns a value.
That the returned value is actually gcd(m, n) relies on the following fact.
Fact. If a, b and k are integers, then
gcd(a − kb, b) = gcd(a, b).
By using this fact repeatedly, we can show that after each execution of the while loop in the
algorithm gcd(b, r) = gcd(m, n). When the algorithm terminates, this means
b = gcd(b, 0) = gcd(m, n). (Equivalently, in the neat set out given above, the gcd of the
numbers in the last two columns is always gcd(m, n).)
2.6 Extended Euclidean algorithm
If we have used the Euclidean algorithm to find that gcd(m, n) = d, we can “work backwards”
through its steps to find integers a and b such that am + bn = d.
Example. For our m = 237, n = 105 example above:
3 = 27 − 1×24
3 = 27 − 1(105 − 3×27) = −105 + 4×27
3 = −105 + 4(237 − 2×105) = 4×237 − 9×105
So we see that a = 4 and b = −9 is a solution in this case.
Our first line above was a rearrangement of the second last line of our original Euclidean
algorithm working. In the second line we made a substitution for 24 based on the second
line of our original Euclidean algorithm working. In the third line we made a substitution
for 27 based on the first line of our original Euclidean algorithm working.
Questions
2.1 Write down multiples of 13, and multiples of 21, until you find a multiple of 13 and
a multiple of 21 which differ by 1.
2.2 Can a multiple of 15 and a multiple of 21 differ by 1? If not, what is the smallest
positive difference between such multiples?
2.3 Find gcd(13, 21) and gcd(15, 21), and suggest how they are related to the results in
Questions 2.1 and 2.2.
2.4 Work out the prime factorisations of 999 and 1000.
2.5 You should find no common prime factor of 999 and 1000. How could you see this
without factorising the numbers? (Hint: a common divisor of 1000 and 999 is also a
common divisor of . . . what?)
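The Euclidean algorithm of Section 2.4 and the extended version of Section 2.6 can be sketched in Python as below. The function names are illustrative. Note one deliberate difference from the notes: rather than working backwards through the divisions, extended_gcd carries the coefficients forwards alongside the remainders. This is a common equivalent formulation, and positive inputs are assumed.

```python
def gcd(m: int, n: int) -> int:
    """Euclidean algorithm, following the pseudocode in Section 2.4."""
    a, b = m, n
    r = a % b  # remainder when a is divided by b
    while r != 0:
        a, b = b, r
        r = a % b
    return b


def extended_gcd(m: int, n: int):
    """Return (d, a, b) with d = gcd(m, n) and a*m + b*n == d."""
    # Invariants: old_r == old_a*m + old_b*n  and  r == a*m + b*n.
    old_r, r = m, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b
```

On the worked example, extended_gcd(237, 105) gives d = 3 with coefficients a = 4 and b = −9, matching 4×237 − 9×105 = 3.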
Lecture 3: Congruences
We’re used to classifying the integers as either even or odd. The even integers are those
that can be written as 2k for some integer k. The odd integers are those that can be written
as 2k + 1 for some integer k.
even    . . . , −6, −4, −2, 0, 2, 4, 6, . . .
odd     . . . , −5, −3, −1, 1, 3, 5, . . .
This classification is useful because even and odd integers have particular properties. For
example, the sum of any two odd integers is even.
Similarly we can split the integers into three classes: those that are 3k for some integer k,
those that are 3k + 1 for some integer k, and those that are 3k + 2 for some integer k.
3k      . . . , −9, −6, −3, 0, 3, 6, 9, . . .
3k + 1  . . . , −8, −5, −2, 1, 4, 7, 10, . . .
3k + 2  . . . , −7, −4, −1, 2, 5, 8, 11, . . .
These classes also have particular properties. For example, the sum of an integer in the
second class and an integer in the third class will always be in the first class.
We don’t have to stop with 3. We could divide integers into 4 different classes according to
their remainders when divided by 4, and so on.
3.1 Congruences
Let n ≥ 2 be an integer. We say integers a and b are congruent modulo n and write
a ≡ b (mod n)
when n divides a − b.
Example.
19 ≡ 13 (mod 6) because 6 divides 19 − 13
12 ≡ 20 (mod 4) because 4 divides 20 − 12
22 ≡ 13 (mod 3) because 3 divides 22 − 13
3.2 Working with congruences
When working with congruences modulo some fixed integer n, we can “substitute in” just like
we can with equalities.
If a ≡ b (mod n) and b ≡ c (mod n), then
a ≡ c (mod n).
Example. Suppose x ≡ 13 (mod 7). Then x ≡ 6 (mod 7) because 13 ≡ 6 (mod 7).
We can add, subtract and multiply congruences just like we can with equations.
If a1 ≡ b1 (mod n) and a2 ≡ b2 (mod n), then
• a1 + a2 ≡ b1 + b2 (mod n)
• a1 − a2 ≡ b1 − b2 (mod n)
• a1 a2 ≡ b1 b2 (mod n).
Example. If x ≡ 3 (mod 8) and y ≡ 2 (mod 8), then
• x + y ≡ 5 (mod 8)
• x − y ≡ 1 (mod 8)
• xy ≡ 6 (mod 8).
We can also deduce that x + 4 ≡ 7 (mod 8), that 4x ≡ 12 (mod 8) and so on, because
obviously 4 ≡ 4 (mod 8). Note as well that 4x ≡ 12 (mod 8) can be simplified to
4x ≡ 4 (mod 8).
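The definition of congruence and the three arithmetic rules of Section 3.2 can be spot-checked numerically. The sketch below is illustrative only: the helper names are not from the notes, and checking particular numbers is of course not a proof of the rules.

```python
def congruent(a: int, b: int, n: int) -> bool:
    """True when n divides a - b, that is, a ≡ b (mod n)."""
    return (a - b) % n == 0


def arithmetic_rules_hold(a1: int, b1: int, a2: int, b2: int, n: int) -> bool:
    """Given a1 ≡ b1 and a2 ≡ b2 (mod n), check the sum, difference and product rules."""
    if not (congruent(a1, b1, n) and congruent(a2, b2, n)):
        raise ValueError("the two given congruences do not hold")
    return (
        congruent(a1 + a2, b1 + b2, n)
        and congruent(a1 - a2, b1 - b2, n)
        and congruent(a1 * a2, b1 * b2, n)
    )
```

For example, x = 11 and y = −6 satisfy x ≡ 3 (mod 8) and y ≡ 2 (mod 8), so arithmetic_rules_hold(11, 3, -6, 2, 8) checks the three conclusions of the example above for these particular values.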
In some situations we can also “divide through” a congruence by an integer.
If a ≡ b (mod n) and d divides a, b and n, then
a/d ≡ b/d (mod n/d).
3.3 Solving linear congruences
Think of a congruence like 7x ≡ 5 (mod 9). This will hold if 9 divides 7x − 5 or in other
words if there is an integer y such that 7x − 5 = 9y. So to solve our original congruence we
can find an integer solution to 7x − 9y = 5.
Some congruences don’t have solutions. For example, there is no solution to
10x ≡ 6 (mod 20) because there are no integers x and y such that 10x − 20y = 6.
We can find an expression for all the integers x that satisfy a congruence like ax ≡ b (mod n)
in the following way:
1. Find d = gcd(a, n).
2. If d doesn’t divide b, then there are no solutions.
3. If d divides b, then divide through the congruence by d to get an equivalent congruence
(a/d)x ≡ b/d (mod n/d).
4. Find integers x′ and y′ such that (a/d)x′ − (n/d)y′ = b/d. The integers x that satisfy the
original congruence are exactly those for which x ≡ x′ (mod n/d).
Example. Find all integers x such that 36x ≡ 10 (mod 114).
Using the Euclidean algorithm we find gcd(36, 114) = 6. So 6 divides 36x − 114y for any
integers x and y, and consequently 36x − 114y ≠ 10. This means that there are no integers x
such that 36x ≡ 10 (mod 114).
Example. Find all integers x such that
24x ≡ 8 (mod 44).
Using the Euclidean algorithm we find gcd(24, 44) = 4. So we divide through by 4 to get the
equivalent congruence 6x ≡ 2 (mod 11). Using the extended Euclidean algorithm we see that
2×6 − 1×11 = 1, and hence 4×6 − 2×11 = 2. Thus the integers x such that 24x ≡ 8 (mod 44)
are exactly the integers x ≡ 4 (mod 11).
3.4 Modular inverses
A modular multiplicative inverse of an integer a modulo n is an integer x such that
ax ≡ 1 (mod n).
From the last section we know that such an inverse will exist if and only if gcd(a, n) = 1. If
inverses do exist then we can find them using the extended Euclidean algorithm (there will
be lots of inverses, but they will all be in one congruence class modulo n). These inverses
have important applications to cryptography and random number generation.
Example. 8 should have a multiplicative inverse modulo 45 because gcd(8, 45) = 1. Using
the extended Euclidean algorithm we see that −3 × 45 + 17 × 8 = 1. So 8 × 17 ≡ 1 (mod 45).
This means that 17 is a multiplicative inverse of 8 modulo 45.
Questions
3.1 Are the following true or false?
• 6 ≡ 3 (mod 3)
• 9 ≡ 18 (mod 8)
• 5x + 6 ≡ 2x (mod 3)
3.2 Prove all of the facts about congruences that were stated in this lecture (use the
definition of congruence modulo n and the definition of divides).
3.3 Find an expression for all the integers x that satisfy 9x ≡ 36 (mod 60).
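The four-step procedure of Section 3.3, and the modular inverses of Section 3.4, can be sketched in Python as follows. This is a minimal illustration assuming a and n are positive; the function names and the (x′, modulus) return convention are choices made here, not part of the notes.

```python
def extended_gcd(m: int, n: int):
    """Return (d, s, t) with d = gcd(m, n) and s*m + t*n == d."""
    old_r, r, old_s, s, old_t, t = m, n, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_linear_congruence(a: int, b: int, n: int):
    """Solve a*x ≡ b (mod n).

    Returns None if there is no solution; otherwise returns (x0, m),
    meaning the solutions are exactly the integers x ≡ x0 (mod m).
    """
    d, _, _ = extended_gcd(a, n)          # step 1: d = gcd(a, n)
    if b % d != 0:                        # step 2: no solutions
        return None
    a, b, m = a // d, b // d, n // d      # step 3: divide through by d
    _, s, _ = extended_gcd(a, m)          # step 4: s*a + t*m == 1
    return (s * b) % m, m


def modular_inverse(a: int, n: int):
    """Return an inverse of a modulo n in the range 0..n-1, or None if none exists."""
    solution = solve_linear_congruence(a, 1, n)
    if solution is None:
        return None
    return solution[0]
```

On the examples above, solve_linear_congruence(36, 10, 114) reports no solution, solve_linear_congruence(24, 8, 44) gives the class x ≡ 4 (mod 11), and modular_inverse(8, 45) gives 17.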
Lecture 4: Logic
The simplest and most commonly used part of logic is the logic of “and”, “or” and “not”,
which is known as propositional logic.
A proposition is any sentence which has a definite truth value (true = T or false = F), such
as
1 + 1 = 2, or
10 is a prime number.
but not
What is your name? or
This sentence is false.
Propositions are denoted by letters such as p, q, r, . . . , and they are combined into
compound propositions by connectives such as ∧ (and), ∨ (or) and ¬ (not).
4.1 Connectives ∧, ∨ and ¬
∧, ∨ and ¬ are called “connectives” because they can be used to connect two sentences p and q
into one. These particular connectives are defined so that they agree with the most common
interpretations of the words “and”, “or” and “not”.
To define p ∧ q, for example, we only have to say that p ∧ q is true only when p is true and q
is true.
We define ∧ by the following truth table:
p q p∧q
T T T
T F F
F T F
F F F
Similarly, p ∨ q is true when p is true or q is true, but now we have to be more precise,
because “or” has at least two meanings in ordinary speech.
We define ∨ by the truth table
p q p∨q
T T T
T F T
F T T
F F F
This is the inclusive sense of “p or q” (often written “p and/or q” and meaning at least one of
p, q is true).
Finally, “not” ¬ (also called negation) is defined as follows.
We define ¬ by the truth table
p ¬p
T F
F T
The connectives ∧, ∨ and ¬ are functions of the propositional variables p and q, which can take
the two values T and F. For this reason, ∧, ∨ and ¬ are also called truth functions.
4.2 Implication
Another important truth function is p → q, which corresponds to “if p then q” or “p implies
q” in ordinary speech.
In ordinary speech the value of p → q depends only on what happens when p is true. For
example to decide whether
MCG flooded → the cricket is off
it is enough to see what happens when the MCG is flooded. Thus we agree that p → q is true
when p is false.
We define → by the truth table
p q p→q
T T T
T F F
F T T
F F T
4.3 Other connectives
Two other important connectives are ↔ (“if and only if”) and ⊻ (“exclusive or”).
The sentence p ↔ q is true exactly when the truth values of p and q agree.
We define ↔ by the truth table
p q p↔q
T T T
T F F
F T F
F F T
We could also write p ↔ q as (p → q) ∧ (q → p). We’ll see how to prove this in the next lecture.
The sentence p ⊻ q is true exactly when the truth values of p and q disagree.
We define ⊻ by the truth table
p q p⊻q
T T F
T F T
F T T
F F F
4.4 Remarks
1. The symbols ∧ and ∨ are intentionally similar to the symbols ∩ and ∪ for set intersection
and union because
x ∈ A ∩ B ⇔ (x ∈ A) ∧ (x ∈ B)
x ∈ A ∪ B ⇔ (x ∈ A) ∨ (x ∈ B)
(We study sets later.)
2. The “exclusive or” function ⊻ is written XOR in some programming languages.
3. If we write 0 for F and 1 for T then ⊻ becomes the function
p q p⊻q
1 1 0
1 0 1
0 1 1
0 0 0
This is also known as the “mod 2 sum”, because 1 + 1 = 2 ≡ 0 (mod 2). (It could also be called
the “mod 2 difference” because a + b is the same as a − b, mod 2.)
4. The mod 2 sum occurs in many homes where two switches p, q control the same light. The
truth value of p ⊻ q tells whether the light is on or not, and the light can be switched to the
opposite state by switching the value of either p or q.
Questions
4.1 Which of the following are propositions?
1 + 1 = 3, 1 + 1, 3 divides 7, 3÷7
4.2 Let f be the proposition “foo” and let b be the proposition “bar”. Write the following
propositions in symbols, using f, b, → and ¬.
• if foo, then bar.
• bar if foo.
• bar only if foo.
• foo implies not bar.
• foo is sufficient for bar.
• foo is necessary for bar.
4.3 In the following examples, is the “or” intended to be inclusive or exclusive?
• Would you like coffee or tea?
• Oranges or lemons are a good source of vitamin C.
• He will arrive in a minute or two.
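The connectives of this lecture can be modelled as functions on Python's booleans, with True playing the role of T and False the role of F. The sketch below is illustrative (the function names are chosen here); it tabulates a two-variable truth function in the row order TT, TF, FT, FF used in the tables above, and checks the “mod 2 sum” description of exclusive or.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """p → q: false only when p is true and q is false."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """p ↔ q: true exactly when the truth values agree."""
    return p == q


def xor(p: bool, q: bool) -> bool:
    """Exclusive or: true exactly when the truth values disagree."""
    return p != q


def truth_table(f):
    """Rows (p, q, f(p, q)) in the order TT, TF, FT, FF."""
    return [(p, q, f(p, q)) for p, q in product([True, False], repeat=2)]


def xor_as_mod2_sum(p: bool, q: bool) -> bool:
    """Exclusive or computed as the sum of 0/1 values reduced mod 2."""
    return (int(p) + int(q)) % 2 == 1
```

Calling truth_table(implies) reproduces the last column T, F, T, T of the table for →.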
Lecture 5: Tautologies and logical equivalence
A major problem in logic is to recognise statements that are “always true” or “always
false”.
5.1 Tautologies and contradictions
A sentence φ in propositional logic is a formula with variables p, q, r, . . . which can take
the values T and F. The possible interpretations of φ are all possible assignments of values
to its variables.
A sentence in propositional logic is
• a tautology if it has value T under all interpretations;
• a contradiction if it has value F under all interpretations.
We can check whether φ is a tautology, a contradiction, or neither by computing its value
for all possible values of its variables.
Example. (¬p) ∨ p is a tautology.
The truth table for (¬p) ∨ p is
p ¬p (¬p) ∨ p
T F T
F T T
So (¬p) ∨ p has value T under all interpretations, and thus is a tautology. (It is sometimes
known as the law of the excluded middle.)
We can similarly compute the values of any truth function φ, so this is an algorithm for
recognising tautologies. However, if φ has n variables, they have 2^n sets of values, so the
amount of computation grows rapidly with n. One of the biggest unsolved problems of logic
and computer science is to find an efficient algorithm for recognising tautologies.
5.2 Logical equivalence
Sentences φ and ψ are logically equivalent if they are the same truth function, which also
means φ ↔ ψ is a tautology. This relation between sentences is written φ ⇔ ψ or φ ≡ ψ.
Example. p → q ≡ (¬p) ∨ q
We know p → q has the truth table
p q p→q
T T T
T F F
F T T
F F T
Now (¬p) ∨ q has the truth table
p q ¬p (¬p) ∨ q
T T F T
T F F F
F T T T
F F T T
So p → q and (¬p) ∨ q have the same truth table (looking just at their columns). It follows
from this that p → q can always be rewritten as (¬p) ∨ q. In fact, all truth functions can be
expressed in terms of ∧, ∨, and ¬.
Finding logical equivalences is like finding identities in algebra: one uses known
equivalences to rearrange, expand and simplify.
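The truth-table method of Section 5.1 can be sketched in Python as follows. The name classify is illustrative, and a sentence is represented here (as an assumption of this sketch) by a Python function of its variables. The loop visits all 2^n interpretations, which is exactly why the method becomes slow as n grows.

```python
from itertools import product


def classify(formula, num_vars: int) -> str:
    """Classify a sentence by evaluating it under all 2**num_vars interpretations."""
    values = [formula(*row) for row in product([True, False], repeat=num_vars)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "neither"
```

For example, classify(lambda p: (not p) or p, 1) examines the two rows of the table for (¬p) ∨ p and reports a tautology.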
5.3 Useful equivalences
The following equivalences are the most frequently used in this “algebra of logic”.
Equivalence law
p ↔ q ≡ (p → q) ∧ (q → p)
Implication law
p → q ≡ (¬p) ∨ q
Double Negation law
¬¬p ≡ p
Idempotent laws
p ∧ p ≡ p
p ∨ p ≡ p
Commutative laws
p ∧ q ≡ q ∧ p
p ∨ q ≡ q ∨ p
Associative laws
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
Distributive laws
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
De Morgan’s laws
¬(p ∧ q) ≡ (¬p) ∨ (¬q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q)
Identity laws
p ∧ T ≡ p
p ∨ F ≡ p
Annihilation laws
p ∧ F ≡ F
p ∨ T ≡ T
Inverse laws
p ∧ (¬p) ≡ F
p ∨ (¬p) ≡ T
Absorption laws
p ∧ (p ∨ q) ≡ p
p ∨ (p ∧ q) ≡ p
Remarks
1. The commutative laws are used to rearrange terms, as in ordinary algebra. The law
p ∨ q ≡ q ∨ p is like p + q = q + p in ordinary algebra, and p ∧ q ≡ q ∧ p is like pq = qp.
2. The associative laws are used to remove brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we
can write either side as p ∨ q ∨ r. This is like p + (q + r) = (p + q) + r = p + q + r in
ordinary algebra.
3. The distributive laws are used to “expand” combinations of ∧ and ∨.
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
is like
p(q + r) = pq + pr.
The other distributive law is not like anything in ordinary algebra.
4. Some of these laws are redundant, in the sense that other laws imply them. For example,
the absorption law
p ∧ (p ∨ q) ≡ p
follows from the distributive, idempotent, identity and annihilation laws:
p ∧ (p ∨ q) ≡ (p ∧ p) ∨ (p ∧ q)    by distributive law
≡ p ∨ (p ∧ q)    by idempotent law
≡ (p ∧ T) ∨ (p ∧ q)    by identity law
≡ p ∧ (T ∨ q)    by distributive law
≡ p ∧ T    by annihilation law
≡ p    by identity law
Questions
5.1 Explain why there are 8 ways to assign truth values to variables p, q, r; 16 ways
to assign truth values to variables p, q, r, s; and in general 2^n ways to assign truth
values to n variables.
5.2 Use truth tables to verify the de Morgan’s laws and absorption laws.
5.3 If p ⊻ q is the “exclusive or” discussed last lecture, see whether it satisfies the
distributive laws
p ∧ (q ⊻ r) ≡ (p ∧ q) ⊻ (p ∧ r)
p ⊻ (q ∧ r) ≡ (p ⊻ q) ∧ (p ⊻ r)
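Any law in the list of Section 5.3 can be checked mechanically by comparing the two sides under every interpretation, which is what logical equivalence means. The sketch below is an illustration with names chosen here; each side of a law is written as a Python function of its variables.

```python
from itertools import product


def equivalent(lhs, rhs, num_vars: int) -> bool:
    """True when lhs and rhs take the same value under every interpretation."""
    return all(
        lhs(*row) == rhs(*row)
        for row in product([True, False], repeat=num_vars)
    )


# Both distributive laws from the list above, checked over all 8 interpretations.
distributive_and_over_or = equivalent(
    lambda p, q, r: p and (q or r),
    lambda p, q, r: (p and q) or (p and r),
    3,
)
distributive_or_over_and = equivalent(
    lambda p, q, r: p or (q and r),
    lambda p, q, r: (p or q) and (p or r),
    3,
)
```

The same function can be pointed at any other pair of sentences, including ones that turn out not to be equivalent, in which case it returns False.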

Lecture 6: Rules of inference
Last time we saw how to recognise tautologies and logically equivalent sentences by
computing their truth tables. Another way is to infer new sentences from old by rules of
inference.
6.1 Replacement
Any sentence may be replaced by a logically equivalent sentence. Any series of such
replacements therefore leads to a sentence equivalent to the one we started with.
Using replacement is like the usual method of proving identities in algebra: make a series
of replacements until the left hand side is found equal to the right hand side.
Example. Prove that x → y ≡ (¬y) → (¬x).
x → y ≡ (¬x) ∨ y    by implication law
≡ y ∨ (¬x)    by commutative law
≡ (¬¬y) ∨ (¬x)    by law of double negation
≡ (¬y) → (¬x)    by implication law
6.2 Contrapositives
x → y ≡ (¬y) → (¬x)
(¬y) → (¬x) is the contrapositive of x → y.
Example. The contrapositive of
MCG flooded → cricket is off
is
Cricket is on → MCG not flooded.
An implication and its contrapositive are equivalent: they mean the same thing!
Example. The contrapositive of
“If it’s a bird then it has feathers.”
is
“If it doesn’t have feathers, then it’s not a bird.”
The contrapositive has the same meaning as the original statement.
Example. On the other hand, the negation of the statement
“If it’s a bird then it has feathers.”
is
“It’s a bird and it doesn’t have feathers.”
This is (roughly speaking) the opposite of the original statement. Note that the negation of
an “implies” statement is an “and” statement, not another “implies” statement.
6.3 Using logic laws
Example. Prove that p → (q → p) is a tautology.
p → (q → p)
≡ (¬p) ∨ (q → p)    by implication law
≡ (¬p) ∨ ((¬q) ∨ p)    by implication law
≡ (¬p) ∨ (p ∨ (¬q))    by commutative law
≡ ((¬p) ∨ p) ∨ (¬q)    by associative law
≡ (p ∨ (¬p)) ∨ (¬q)    by commutative law
≡ T ∨ (¬q)    by inverse law
≡ T    by annihilation law
Example. Prove that ((p → q) ∧ p) → q is a tautology.
((p → q) ∧ p) → q
≡ ¬((p → q) ∧ p) ∨ q    by implication law
≡ (¬(p → q) ∨ (¬p)) ∨ q    by de Morgan’s law
≡ ¬(p → q) ∨ ((¬p) ∨ q)    by associative law
≡ ¬(p → q) ∨ (p → q)    by implication law
≡ (p → q) ∨ ¬(p → q)    by commutative law
≡ T    by inverse law
This tautology says that “if p implies q and p is true then q is true”.
6.4 Logical consequence
A sentence ψ is a logical consequence of a sentence φ, if ψ = T whenever φ = T. We write
this as φ ⇒ ψ.
It is the same to say that φ → ψ is a tautology, but φ ⇒ ψ makes it clearer that we are
discussing a relation between the sentences φ and ψ.
Any sentence ψ logically equivalent to φ is a logical consequence of φ, but not all
consequences of φ are equivalent to it.
Example. p ∧ q ⇒ p
p is a logical consequence of p ∧ q, because p = T whenever p ∧ q = T. However, we can have
p ∧ q = F when p = T (namely, when q = F). Hence p ∧ q and p are not equivalent.
This example shows that ⇒ is not symmetric:
(p ∧ q) ⇒ p but p ⇏ (p ∧ q)
This is where ⇒ differs from ≡, because if φ ≡ ψ then ψ ≡ φ.
In fact, we build the relation ≡ from ⇒ the same way ↔ is built from →:
φ ≡ ψ means (φ ⇒ ψ) and (ψ ⇒ φ).
Questions
6.1 The slogan “no pain, no gain” stands for an implication. What is it?
6.2 What is the contrapositive of “no pain, no gain”?
6.3 Write the following sentences as implications, and then write their contrapositives.
• You can’t make an omelette without breaking eggs.
• If n is even, so is n².
• Haste makes waste.
6.4 Show that p → (q → (r → p)) is a tautology using the laws of logic.
6.5 Find a tautology with n variables which is p → (q → p) for n = 2 and
p → (q → (r → p)) for n = 3.
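Logical consequence, as defined in Section 6.4, can also be checked by brute force: ψ must be T under every interpretation that makes φ T. The sketch below is illustrative, with sentences again represented as Python functions (an assumption of the sketch) and names chosen here.

```python
from itertools import product


def is_consequence(phi, psi, num_vars: int) -> bool:
    """True when psi is T under every interpretation that makes phi T (phi ⇒ psi)."""
    return all(
        psi(*row)
        for row in product([True, False], repeat=num_vars)
        if phi(*row)
    )


def logically_equivalent(phi, psi, num_vars: int) -> bool:
    """phi ≡ psi, built from ⇒ in both directions as in Section 6.4."""
    return is_consequence(phi, psi, num_vars) and is_consequence(psi, phi, num_vars)
```

With φ = p ∧ q and ψ = p, is_consequence holds in one direction only, which illustrates that ⇒ is not symmetric.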
Lecture 7: Predicates and quantifiers
We get a more expressive language than propositional logic by admitting predicates like
P(n), Q(x, y), R(a, b, c)
These stand for properties or relations such as
P(n) : n is prime
Q(x, y) : x ≤ y
R(a, b, c) : a + b = c.
Those with one variable, such as “n is prime,” are usually called properties, while those with
two or more variables, such as “x ≤ y,” are usually called relations.
7.1 Predicates
A predicate such as “n is prime” is not a proposition because it is neither true nor false.
Rather, it is a function P(n) of n with the Boolean values T (true) or F (false). In this case,
P(n) is a function of natural numbers defined by
P(n) = T if n is prime, F otherwise.
Similarly, the “x ≤ y” predicate is a function of pairs of real numbers, defined by
Q(x, y) = T if x ≤ y, F otherwise.
Since most of mathematics involves properties and relations such as these, only a language
with predicates is adequate for mathematics (and computer science).
7.2 Building sentences from predicates
One way to create a sentence from a predicate is to replace its variables by constants. For
example, when P(n) is the predicate “n is prime,” P(3) is the sentence “3 is prime.”
Another way is to use quantifiers:
• ∀ (meaning “for all”) and
• ∃ (meaning “there exists” or “there is”).
Example. ∃nP(n) is the (true) sentence
there exists an n such that n is prime.
∀nP(n) is the (false) sentence
for all n, n is prime.
Note that when ∃n is read “there exists an n” we also add a “such that.”
7.3 Quantifiers and connectives
We can also combine quantifiers with connectives from propositional logic.
Example. Let Sq(n) be the predicate “n is a square,” and let Pos(n) be the predicate “n is
positive” as above. Then we can symbolise the following sentences:
There is a positive square:
∃n(Pos(n) ∧ Sq(n)).
There is a positive integer which is not a square:
∃n(Pos(n) ∧ ¬Sq(n))
All squares are positive:
∀n(Sq(n) → Pos(n))
Notice that the “All. . . are” combination in English actually involves an implication. This
is needed because we are making a claim only about squares and the implication serves to
“narrow down” the set we are examining.
7.4 Alternating quantifiers
Combinations of quantifiers like ∀x∃y . . . , “for all x there is a y . . .” are common in
mathematics, and can be confusing. It helps to have some examples in mind to recall the
difference between ∀x∃y . . . and ∃y∀x . . .
The relation x < y is convenient to illustrate such combinations; we write x < y as the
predicate L(x, y).
Then
∀x∃yL(x, y)
is the (true) sentence
for all x there is a y such that x < y,
which says that there is no greatest number.
But with the opposite combination of quantifiers,
∃y∀xL(x, y)
is the (false) sentence
there is a y such that for all x, x < y,
which says there is a number greater than all numbers.
Even though these statements are usually written without brackets they are effectively
bracketed “from the centre”. So ∀x∃yL(x, y) means ∀x(∃yL(x, y)) and ∃y∀xL(x, y) means
∃y(∀xL(x, y)).
7.5 An example from Abraham Lincoln
You can fool all of the people some of the time
and
you can fool some of the people all of the time
but
you can’t fool all of the people all of the time.
Let F(p, t) be the predicate:
person p can be fooled at time t.
Then
∀p∃tF(p, t) says
you can fool all of the people some of the time,
∃p∀tF(p, t) says
you can fool some of the people all of the time,
¬∀p∀tF(p, t) says
you can’t fool all of the people all of the time.
Hence Lincoln’s sentence in symbols is:
∀p∃tF(p, t) ∧ ∃p∀tF(p, t) ∧ ¬∀p∀tF(p, t)
Remark. Another way to say “you can’t fool all of the people all of the time” is
∃p∃t¬F(p, t).
Questions
7.1 Write “roses are red” in the language of predicate logic, using
rose(x) for “x is a rose”
red(x) for “x is red.”
7.2 If P(n) stands for “n is prime” and E(n) stands for “n is even,” what does
P(n) ∧ (¬E(n)) say about n?
7.3 Using the predicates
pol(x) for “x is a politician”
liar(x) for “x is a liar”
write the following sentences in symbols:
• all politicians are liars
• some politicians are liars
• no politicians are liars
• some politicians are not liars.
Are any of these sentences logically equivalent?
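When the variables range over a finite set, ∀ and ∃ correspond to Python's built-in all and any, and nesting them mirrors the “from the centre” bracketing of Section 7.4. The sketch below evaluates the Lincoln sentences of Section 7.5 on a small made-up interpretation: the people, the times and the set fooled are hypothetical data invented purely for illustration.

```python
# Hypothetical interpretation: (p, t) is in `fooled` when person p can be fooled at time t.
people = ["ann", "bob"]
times = [0, 1, 2]
fooled = {("ann", 0), ("ann", 1), ("ann", 2), ("bob", 1)}


def F(p, t) -> bool:
    """The predicate F(p, t) under the hypothetical interpretation above."""
    return (p, t) in fooled


# ∀p∃tF(p, t): you can fool all of the people some of the time.
all_some = all(any(F(p, t) for t in times) for p in people)

# ∃p∀tF(p, t): you can fool some of the people all of the time.
some_all = any(all(F(p, t) for t in times) for p in people)

# ¬∀p∀tF(p, t): you can't fool all of the people all of the time.
not_all_all = not all(all(F(p, t) for t in times) for p in people)

# ∃p∃t¬F(p, t): the alternative form given in the Remark.
some_some_not = any(any(not F(p, t) for t in times) for p in people)

lincoln = all_some and some_all and not_all_all
```

In this interpretation the last two forms agree, as the Remark says they must. A finite example cannot illustrate the “no greatest number” sentence of Section 7.4, since every finite set of numbers has a greatest element.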
Lecture 8: Predicate logic
8.1 Valid sentences
The language of predicate logic is based on predicate symbols, variables, constants, brackets, ∀, ∃ and connectives. The examples from last lecture illustrate how these ingredients are used to form sentences.

A sentence in predicate logic is valid if it has value T under all interpretations.

This is similar to the definition of a tautology in propositional logic. But now "all interpretations" means all interpretations of the predicate symbols, which is more complicated. The interpretation of a symbol P(n), say, must include both the range of the variable n, as well as saying those n for which P(n) is true.

8.2 Interpretations
For example, one interpretation of P(n) is "n is positive," where n ranges over the real numbers. Under this interpretation, ∀nP(n) is false.
A different interpretation of P(n) is "n is positive," where n ranges over the numbers > 2. Under this interpretation, ∀nP(n) is true.
Unlike in propositional logic, there are infinitely many different interpretations of each formula. Thus there is no truth table method for predicate logic. We cannot decide whether a formula is valid by testing all interpretations.

8.3 Recognising valid sentences
Nevertheless, in some cases, we can see that a sentence is true for all interpretations.
Example. ∀x∀yP(x, y) → ∀y∀xP(x, y) is true for all properties P, and hence is valid.
Likewise, we can sometimes see that a sentence is not valid by finding an interpretation which makes it false.
Example. The sentence
    ∀x∃yQ(x, y) → ∃x∀yQ(x, y)
is false if we interpret Q(x, y) as x ≤ y on the real numbers. With this interpretation
    ∀x∃yQ(x, y) is true
(for any number there is a larger number), but
    ∃x∀yQ(x, y) is false
(there is no number ≤ all numbers). Hence the implication is false.

8.4 Consequence and equivalence
As in propositional logic, a sentence ψ is a logical consequence of a sentence φ if any interpretation which makes φ true makes ψ true. Again we write φ ⇒ ψ if ψ is a consequence of φ, and this is the same as saying φ → ψ is valid.
Example. Any interpretation which makes ∀nP(n) true makes ∃nP(n) true, and this is why ∀xP(x) → ∃xP(x) is valid.
Similarly, sentences ψ and φ are equivalent, written ψ ≡ φ, if each is a consequence of the other. Some sentences are equivalent for "propositional logic reasons."
Example. We have
    ∀x(P(x) ∧ Q(x)) ≡ ∀x(Q(x) ∧ P(x))
simply because
    P(x) ∧ Q(x) ≡ Q(x) ∧ P(x)
for any x.
However there are also equivalences that genuinely involve quantifiers.
8.5 Useful equivalences
Two important equivalences involving quantifiers are
    ¬∀xP(x) ≡ ∃x¬P(x)
    ¬∃xP(x) ≡ ∀x¬P(x)
These make sense intuitively. For example, ¬∀xP(x) means P(x) is not true for all x, hence there is an x for which P(x) is false, that is, ∃x¬P(x).
They can also be viewed as "infinite de Morgan's laws." If x ranges over {1, 2, 3, . . .} for example, then
    ∀xP(x) ≡ P(1) ∧ P(2) ∧ P(3) ∧ · · ·
and
    ∃xP(x) ≡ P(1) ∨ P(2) ∨ P(3) ∨ · · ·
Hence
    ¬∀xP(x) ≡ ¬(P(1) ∧ P(2) ∧ P(3) ∧ · · · )
            ≡ (¬P(1)) ∨ (¬P(2)) ∨ (¬P(3)) ∨ · · ·    by de Morgan's law
            ≡ ∃x¬P(x).
And similarly
    ¬∃xP(x) ≡ ¬(P(1) ∨ P(2) ∨ P(3) ∨ · · · )
            ≡ (¬P(1)) ∧ (¬P(2)) ∧ (¬P(3)) ∧ · · ·    by de Morgan's law
            ≡ ∀x¬P(x).

8.6 Simplification
The infinite de Morgan's laws allow a certain simplification of predicate formulas by "pushing ¬ inside quantifiers."
Example.
    ¬∀x∃yQ(x, y) ≡ ∃x¬∃yQ(x, y)
                 ≡ ∃x∀y¬Q(x, y).
It is in fact possible to transform any quantified statement in predicate logic to an equivalent with all quantifiers at the front.

8.7* Completeness and undecidability
In 1930, Gödel proved that there is a complete set of rules of inference for predicate logic. This means, in particular, that there is an algorithm to list the valid sentences.
However, in 1936, Church and Turing proved that there is no algorithm to list the sentences that are not valid. This means, in particular, that predicate logic is undecidable: there is no algorithm which, for any sentence φ, decides whether φ is valid or not.
This negative result is due to the power of predicate logic: it can express all mathematical or computational problems, and it is known that some of these problems cannot be solved by algorithm.

Questions
8.1 Give interpretations which make the following sentences false.
    ∃nP(n) → ∀nP(n)
    ∀x∀y (R(x, y) → R(y, x))
    ∀m∃nS(m, n)
8.2 Give interpretations which show that the sentences
    ∃x (P(x) ∧ L(x))
and
    ∃x (P(x) ∧ ¬L(x))
are not equivalent.
8.3 Is ∃y∀xR(x, y) a logical consequence of ∀x∃yR(x, y)? If so, explain why. If not, give an interpretation which makes ∀x∃yR(x, y) true and ∃y∀xR(x, y) false.
8.4 Is ∀x∃yR(x, y) a logical consequence of ∃y∀xR(x, y)? If so, explain why. If not, give an interpretation which makes ∃y∀xR(x, y) true and ∀x∃yR(x, y) false.
8.5 Explain why ¬∀p∀tF(p, t) ≡ ∃p∃t¬F(p, t).
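As a computational aside to Sections 8.5 and 8.6: on a finite domain the quantifier equivalences reduce to ordinary de Morgan's laws, so they can be checked exhaustively. The sketch below does this for every predicate on an assumed three-element domain {1, 2, 3}; the function names are hypothetical. This is a sanity check on one small domain, not a proof of the equivalences.

```python
from itertools import product

DOMAIN = [1, 2, 3]  # assumed small domain, chosen for illustration

def unary_predicates(domain):
    """Yield every predicate P on the domain as a dict x -> truth value."""
    for bits in product([False, True], repeat=len(domain)):
        yield dict(zip(domain, bits))

def binary_predicates(domain):
    """Yield every predicate Q on pairs as a dict (x, y) -> truth value."""
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        yield dict(zip(pairs, bits))

def check_negated_forall(domain):
    """¬∀xP(x) has the same value as ∃x¬P(x) for every P."""
    return all(
        (not all(P[x] for x in domain)) == any(not P[x] for x in domain)
        for P in unary_predicates(domain)
    )

def check_negated_exists(domain):
    """¬∃xP(x) has the same value as ∀x¬P(x) for every P."""
    return all(
        (not any(P[x] for x in domain)) == all(not P[x] for x in domain)
        for P in unary_predicates(domain)
    )

def check_pushed_negation(domain):
    """¬∀x∃yQ(x, y) has the same value as ∃x∀y¬Q(x, y) for every Q."""
    return all(
        (not all(any(Q[x, y] for y in domain) for x in domain))
        == any(all(not Q[x, y] for y in domain) for x in domain)
        for Q in binary_predicates(domain)
    )

print(check_negated_forall(DOMAIN), check_negated_exists(DOMAIN),
      check_pushed_negation(DOMAIN))
```

With three elements there are 8 unary predicates and 512 binary ones, so the exhaustive check is immediate.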
Lecture 9: Mathematical induction
Since the natural numbers 0, 1, 2, 3, . . . are generated by a process which begins with 0 and repeatedly adds 1, we have the following.

Property P is true for all natural numbers if
1. P(0) is true.
2. P(k) ⇒ P(k + 1) for all k ∈ N.

This is called the principle of mathematical induction.
It is used in a style of proof called proof by induction, which consists of two steps.
Base step: Proof that the required property P is true for 0.
Induction step: Proof that if P(k) is true then P(k + 1) is true, for each k ∈ N.

9.1 Examples
Example 1. Prove that 3 divides n^3 + 2n for all n ∈ N.
Let P(n) be "3 divides n^3 + 2n".
Base step. 3 divides 0^3 + 2 × 0 = 0, so P(0) is true.
Induction step. We want to prove
    3 divides k^3 + 2k
    ⇒ 3 divides (k + 1)^3 + 2(k + 1).
Well,
    (k + 1)^3 + 2(k + 1)
    = k^3 + 3k^2 + 3k + 1 + 2k + 2
    = k^3 + 2k + 3k^2 + 3k + 3
    = k^3 + 2k + 3(k^2 + k + 1).
Therefore,
    3 divides k^3 + 2k
    ⇒ 3 divides k^3 + 2k + 3(k^2 + k + 1)
    ⇒ 3 divides (k + 1)^3 + 2(k + 1)
as required. This completes the induction step, and hence completes the proof.

Example 2. Prove there are 2^n n-bit binary strings.
Let P(n) be "there are 2^n n-bit binary strings".
Base step. There is 2^0 = 1 0-bit binary string (the empty string) so P(0) is true.
Induction step. We want to prove that
    there are 2^k k-bit binary strings
    ⇒ there are 2^(k+1) (k + 1)-bit binary strings
Well, a (k + 1)-bit binary string is either W0 or W1, where W is any k-bit binary string. Thus if there are 2^k k-bit binary strings W, there are 2 × 2^k = 2^(k+1) (k + 1)-bit binary strings.
This completes the induction step, and hence completes the proof.

9.2 Starting the base step higher
It is not always appropriate to start the induction at 0. Some properties are true only from a certain positive integer upwards, in which case the induction starts at that integer.
Example 3. Prove n! > 2^n for all integers n ≥ 4.
Let P(n) be "n! > 2^n".
Base step. 4! = 4 × 3 × 2 = 24 > 16 = 2^4, so P(4) is true.
Induction step. We want to prove k! > 2^k ⇒ (k + 1)! > 2^(k+1) for all integers k ≥ 4.
Now, for k ≥ 4, if k! > 2^k,
    (k + 1)! = (k + 1) × k! > (k + 1) × 2^k > 2 × 2^k = 2^(k+1).
(The first > holds because we are assuming k! > 2^k and the second holds because k ≥ 4.)
Thus k! > 2^k ⇒ (k + 1)! > 2^(k+1), as required to complete the induction.
So n! > 2^n for all n ≥ 4.
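Before proving a statement like those in Examples 1 to 3, it can be reassuring to test it on a range of values. The sketch below does that; the ranges are arbitrary choices and the function name binary_strings is hypothetical. It also builds the n-bit strings exactly as the induction step of Example 2 describes (W0 and W1 for each shorter string W). Passing these checks is evidence only; the induction proofs are what cover all n.

```python
from math import factorial

def binary_strings(n):
    """Build all n-bit strings as in Example 2: W0 and W1 for each (n-1)-bit W."""
    if n == 0:
        return [""]                      # base step: only the empty string
    shorter = binary_strings(n - 1)      # all (n-1)-bit strings W
    return [w + "0" for w in shorter] + [w + "1" for w in shorter]

# Example 1: 3 divides n^3 + 2n (checked for n = 0, ..., 199).
divisible_by_three = all((n**3 + 2 * n) % 3 == 0 for n in range(200))

# Example 2: there are 2^n distinct n-bit strings (checked for n = 0, ..., 10).
counts_match = all(len(set(binary_strings(n))) == 2**n for n in range(11))

# Example 3: n! > 2^n from n = 4 upwards (checked for n = 4, ..., 59).
factorial_beats_power = all(factorial(n) > 2**n for n in range(4, 60))

# The same inequality fails for every natural number below 4,
# which is why the base step starts at 4.
fails_below_four = [n for n in range(4) if not factorial(n) > 2**n]

print(divisible_by_three, counts_match, factorial_beats_power, fails_below_four)
```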
Example 4. Prove any integer value n ≥ 8 (in cents) is obtainable with 3c and 5c stamps.
Let P(n) be "n cents is obtainable with 3c and 5c stamps".
Base step. 8c can be obtained by a 3c plus a 5c stamp. So P(8) is true.
Induction step. We have to show that if k cents is obtainable, so is (k + 1) cents, when k ≥ 8.
Case 1. The k cents is obtained using a 5c stamp (among others). Replace the 5c stamp by two 3c stamps, thus obtaining k + 1 cents.
Case 2. If the k cents is obtained using only 3c stamps, there are at least three of them (since k ≥ 8). In this case, replace three 3c stamps by two 5c stamps, again obtaining k + 1 cents.
Thus in either case, when k ≥ 8, P(k) ⇒ P(k + 1). This completes the induction proof that n cents are obtainable from 3c and 5c stamps, for all integers n ≥ 8.

9.3 Sums of series
Induction is often used to prove that sum formulas are correct.
Example 5. Prove 1 + 2 + 3 + · · · + n = n(n + 1)/2 for all integers n ≥ 1.
Let P(n) be "1 + 2 + 3 + · · · + n = n(n + 1)/2".
Base step. When n = 1, the left hand side is 1, and the right hand side is 1(1 + 1)/2 = 2/2 = 1, so P(1) is true.
Induction step. We have to prove that
    1 + 2 + · · · + k = k(k + 1)/2
    ⇒ 1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2.
Now, if P(k) is true,
    1 + 2 + · · · + k + (k + 1)
    = (1 + 2 + · · · + k) + (k + 1)
    = k(k + 1)/2 + (k + 1)    using P(k)
    = (k + 1)(k/2 + 1)
    = (k + 1)(k + 2)/2
as required.
This completes the induction proof.

Remark. Another proof is to write down
    1 + 2 + 3 + · · · + (n − 1) + n
    n + (n − 1) + · · · + 3 + 2 + 1
and observe that each of the n columns sums to n + 1. Thus the sum of twice the series is n(n + 1), and hence the sum of the series itself is n(n + 1)/2. One could argue that this proof uses induction stealthily, to prove that the sum of each column is the same.

Questions
In most induction problems set for students we skip the experimental part, which is finding what to prove. Before trying to prove that 3 divides n^3 + 2n, for example, someone needs to guess that it is true, perhaps by trying n = 1, 2, 3, 4.
9.1 In this question, try to guess what ? stands for, by trying a few values of n.
• ? divides n^2 + n
• The sum of the first n odd numbers is ?
• 1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1)) = 1 − ?
9.2 If you correctly guessed the sum
    1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1)),
you might wonder why it is so simple. Here is a clue:
    1/(1×2) = 1/1 − 1/2.
What is 1/(2×3)? 1/(3×4)?
How does this lead to a simple formula for
    1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1))?
OK, if we can guess formulas correctly, why bother proving them by induction? The reason is that a statement which fits many values of n can still be wrong.
9.3 Show that n^2 + n + 41 is a prime number for n = 1, 2, 3, 4 (and go further, if you like). Do you think n^2 + n + 41 is prime for all natural numbers n?
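The induction step of Example 4 (the stamp problem in Section 9.2) is constructive: it says how to turn a way of making k cents into a way of making k + 1 cents. Read as an algorithm, that gives the following sketch. The function name stamps and the choice to return a pair (number of 3c stamps, number of 5c stamps) are assumptions made here for illustration.

```python
def stamps(n):
    """Return (threes, fives) with 3*threes + 5*fives == n, for an integer n >= 8."""
    if n < 8:
        raise ValueError("the induction argument only covers n >= 8")
    threes, fives = 1, 1          # base step: 8 = 3 + 5
    for _ in range(8, n):         # each pass turns k cents into k + 1 cents
        if fives >= 1:
            # Case 1: replace one 5c stamp by two 3c stamps (+1 cent).
            fives -= 1
            threes += 2
        else:
            # Case 2: only 3c stamps, so at least three of them;
            # replace three 3c stamps by two 5c stamps (+1 cent).
            threes -= 3
            fives += 2
    return threes, fives

for n in (8, 9, 10, 11, 23):
    print(n, stamps(n))
```

The loop runs n − 8 times, once per application of the induction step, so this is not an efficient method; it is meant only to show that the proof contains a procedure.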
Lecture 10: Induction and well-ordering
In the previous lecture we were able to prove a property P holds for 0, 1, 2, . . . as follows:
Base step. Prove P(0).
Induction step. Prove P(k) ⇒ P(k + 1) for each natural number k.
This is sufficient to prove that P(n) holds for all natural numbers n, but it may be difficult to prove that P(k + 1) follows from P(k). It may in fact be easier to prove the induction step
    P(0) ∧ P(1) ∧ · · · ∧ P(k) ⇒ P(k + 1).
That is, it may help to assume P holds for all numbers before k + 1. Induction with this style of induction step is sometimes called the strong form of mathematical induction.

10.1 Examples of strong induction
Example 1. Prove that every integer ≥ 2 is a product of primes. (Just a prime by itself counts as a "product".)
Let P(n) be "n is a product of primes".
Base step. 2 is a prime, hence a product of (one) prime. So P(2) is true.
Induction step. Suppose 2, 3, . . . , k are products of primes. We wish to prove that k + 1 is a product of primes.
This is certainly true if k + 1 is a prime. If not
    k + 1 = i × j,
for some natural numbers i and j less than k + 1. But then i and j are products of primes by our assumption, hence so is i × j = k + 1.
This completes the induction proof.

Example 2. Prove that every positive integer is a sum of distinct powers of 2. (Just a power of two by itself counts as a "sum".)
The idea behind this proof is to repeatedly subtract the largest possible power of 2. We illustrate with the number 27.
    27 − largest power of 2 less than 27 = 27 − 16 = 11
    11 − largest power of 2 less than 11 = 11 − 8 = 3
    3 − largest power of 2 less than 3 = 3 − 2 = 1
Hence 27 = 16 + 8 + 2 + 1 = 2^4 + 2^3 + 2^1 + 2^0.
(It is only interesting to find distinct powers of 2, because of course each integer ≥ 1 is a sum of 1s, and 1 = 2^0.)
The strong induction proof goes as follows.
Let P(n) be "n is a sum of distinct powers of 2".
Base step. 1 = 2^0, so 1 is a sum of (one) power of 2. Thus P(1) is true.
Induction step. Suppose each of the numbers 1, 2, 3, . . . , k is a sum of distinct powers of 2. We wish to prove that k + 1 is a sum of distinct powers of 2.
This is certainly true if k + 1 is a power of 2. If not, let 2^j be the greatest power of 2 less than k + 1. Then
    i = k + 1 − 2^j
is one of the numbers 1, 2, 3, . . . , k, and hence it is a sum of distinct powers of 2.
Also, the powers of 2 that sum to i are all less than 2^j, otherwise 2^j is less than half k + 1, contrary to the choice of 2^j as the largest power of 2 less than k + 1.
Hence k + 1 = 2^j + powers of 2 that sum to i is a sum of distinct powers of 2.
This completes the induction proof.
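Both strong induction arguments in Section 10.1 describe procedures that can be run. The sketch below follows them closely; the function names are hypothetical. prime_factors splits n as i × j and recurses on both factors, as in Example 1, and distinct_powers_of_two repeatedly subtracts the largest power of 2 that fits, as in Example 2 (when the remaining number is itself a power of 2, that power is taken, matching the "certainly true" case of the proof).

```python
from math import isqrt

def prime_factors(n):
    """Write n >= 2 as a list of primes whose product is n (Example 1)."""
    if n < 2:
        raise ValueError("only integers >= 2 are products of primes")
    for i in range(2, isqrt(n) + 1):
        if n % i == 0:
            # n = i * j with both factors smaller than n: recurse on each.
            return prime_factors(i) + prime_factors(n // i)
    return [n]  # no divisor strictly between 1 and n: n is prime

def distinct_powers_of_two(n):
    """Write n >= 1 as a list of distinct powers of 2 summing to n (Example 2)."""
    if n < 1:
        raise ValueError("only positive integers are covered")
    powers = []
    while n > 0:
        p = 1
        while 2 * p <= n:
            p *= 2            # p ends as the largest power of 2 not exceeding n
        powers.append(p)
        n -= p                # the remainder is smaller than p
    return powers

print(prime_factors(60))            # [2, 2, 3, 5]
print(distinct_powers_of_two(27))   # [16, 8, 2, 1]
```

The second function reproduces the worked example 27 = 16 + 8 + 2 + 1 from the text.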
10.2 Well-ordering and descent
Induction expresses the fact that each natural number n can be reached by starting at 0 and going upwards (e.g. adding 1) a finite number of times.
Equivalent facts are that it is only a finite number of steps downwards from any natural number to 0, that any descending sequence of natural numbers is finite, and that any nonempty set of natural numbers has a least element.
This property is called well-ordering of the natural numbers. It is often convenient to arrange a proof to "work downwards" and appeal to well-ordering by saying that the process of working downwards must eventually stop.
Such proofs are equivalent to induction, though they are sometimes called "infinite descent" or similar.

10.3 Proofs by descent
Example 1. Prove that any integer ≥ 2 has a prime divisor.
If n is prime, then it is a prime divisor of itself. If not, let n_1 < n be a divisor of n (here and below, a divisor greater than 1).
If n_1 is prime, it is a prime divisor of n. If not, let n_2 < n_1 be a divisor of n_1 (and hence of n).
If n_2 is prime, it is a prime divisor of n. If not, let n_3 < n_2 be a divisor of n_2, etc.
The sequence n > n_1 > n_2 > n_3 > · · · must eventually terminate, and this means we find a prime divisor of n.

Example 2. Prove √2 is irrational.
Suppose that √2 = m/n for natural numbers m and n. We will show this is impossible. Since the square of an odd number is odd, we can argue as follows
    √2 = m/n
    ⇒ 2 = m^2/n^2    squaring both sides
    ⇒ m^2 = 2n^2
    ⇒ m^2 is even
    ⇒ m is even    since the square of an odd number is odd
    ⇒ m = 2m_1 say
    ⇒ 2n^2 = m^2 = 4(m_1)^2
    ⇒ n^2 = 2(m_1)^2
    ⇒ n is even, = 2n_1 say
But then √2 = m_1/n_1, and we can repeat the argument to show that m_1 and n_1 are both even, so m_1 = 2m_2 and n_1 = 2n_2, and so on.
Since the argument can be repeated indefinitely, we get an infinite descending sequence of natural numbers
    m > m_1 > m_2 > · · ·
which is impossible.
Hence there are no natural numbers m and n with √2 = m/n.

Questions
10.1 For each of the following statements, say which is likely to require strong induction for its proof.
• 1 + a + a^2 + · · · + a^n = (a^(n+1) − 1)/(a − 1)
• ¬(p_1 ∨ p_2 ∨ p_3 ∨ · · · ∨ p_n) ≡ (¬p_1) ∧ (¬p_2) ∧ (¬p_3) ∧ · · · ∧ (¬p_n)
• Each fraction n/m < 1 is a sum of distinct fractions with numerator 1, for example, 11/12 = 1/2 + 1/3 + 1/12.
10.2 There is something else which tells you every integer ≥ 1 is a sum of distinct powers of 2. What is it?
10.3 Is every integer ≥ 1 a sum of distinct powers of 3?
Lecture 11: Sets
Sets are vital in expressing mathematics formally and are also very important data structures in computer science.
A set is basically just an unordered collection of distinct objects, which we call its elements or members. Note that there is no notion of order for a set, even though we often write down its elements in some order for convenience. Also, there is no notion of multiplicity: an object is either in a set or not – it cannot be in the set multiple times.

Sets A and B are equal when every element of A is an element of B and vice-versa.

11.1 Set notation
• x ∈ S means x is an element of set S.
• {x_1, x_2, x_3, . . .} is the set with elements x_1, x_2, x_3, . . . .
• {x : P(x)} is the set of all x with property P.
Example.
    17 ∈ {x : x is prime} = {2, 3, 5, 7, 11, 13, . . .}
    {1, 2, 3} = {3, 1, 2}
    {1, 1, 1} = {1}
For a finite set S, we write |S| for the number of elements of S.

11.2 Universal set
The idea of a "set of all sets" leads to logical difficulties. Difficulties are avoided by always working within a local "universal set" which includes only those objects under consideration.
For example, when discussing arithmetic it might be sufficient to work just with the numbers 0, 1, 2, 3, . . .. Our universal set could then be taken as
    N = {0, 1, 2, 3, . . .},
and other sets of interest, e.g. {x : x is prime}, are parts of N.

11.3 Subsets
We say that A is a subset of B and write A ⊆ B when each element of A is an element of B.
Example. The set of primes forms a subset of N, that is {x : x is prime} ⊆ N.

11.4 Characteristic functions
A subset A of B can be specified by its characteristic function χ_A, which tells which elements of B are in A and which are not.
    χ_A(x) = 1 if x ∈ A,
             0 if x ∉ A.
Example. The subset A = {a, c} of B = {a, b, c} has the characteristic function χ_A with
    χ_A(a) = 1, χ_A(b) = 0, χ_A(c) = 1.
We also write this function more simply as
    a b c
    1 0 1
In fact we can list all characteristic functions on {a, b, c}, and hence all subsets of {a, b, c}, by listing all sequences of three binary digits:
    characteristic function    subset
    a b c
    0 0 0                      {}
    0 0 1                      {c}
    0 1 0                      {b}
    0 1 1                      {b, c}
    1 0 0                      {a}
    1 0 1                      {a, c}
    1 1 0                      {a, b}
    1 1 1                      {a, b, c}

We could similarly list all the subsets of a four-element set, and there would be 2^4 = 16 of them, corresponding to the 2^4 sequences of 0s and 1s.
In the same way, we find that an n-element set has 2^n subsets, because there are 2^n binary sequences of length n. (Each of the n places in the sequence can be filled in two ways.)

11.5 Power set
The set of all subsets of a set U is called the power set P(U) of U.
Example. We see from the previous table that P({a, b, c}) is the set
    { {}, {c}, {b}, {b, c}, {a}, {a, c}, {a, b}, {a, b, c} }.
If U has n elements, then P(U) has 2^n elements.
(The reason P(U) is called the "power" set is probably that the number of its elements is this power of 2. In fact, the power set of U is sometimes written 2^U.)

11.6 Sets and properties
We mentioned at the beginning that {x : P(x)} stands for the set of objects x with property P. Thus sets correspond to properties.
Properties of the natural numbers 0, 1, 2, 3, . . . , for example, correspond to subsets of the set N = {0, 1, 2, 3, . . .}. Thus the subset
    {0, 2, 4, 6, . . .} = {n ∈ N : n is even}
corresponds to the property of being even. Similarly, the set
    {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .}
corresponds to the property of being prime. The power set P(N) corresponds to all possible properties of natural numbers.

11.7* What are numbers?
The most common approach to building mathematics up from logical foundations considers all mathematical objects to be fundamentally made of sets. One simple way to define numbers using sets (due to von Neumann) is the following.
    0 = {}
    1 = {0}
    2 = {0, 1}
    ...
    n + 1 = {0, 1, 2, . . . , n}
We are not going to use this definition in this course. Still, it is interesting that numbers can be defined in such a simple way.

Questions
11.1 Suppose E(x) stands for "x is even" and F(x) stands for "5 divides x."
• What is the set {x : E(x) ∧ F(x)}?
• Write a formula using E(x) and F(x) which describes the set {5, 15, 25, 35, . . .}.
11.2 How many subsets does the set {2, 5, 10, 20} have?
11.3 Consider the infinitely many sets
    {x : 0 < x < 1}
    {x : 0 < x < 1/2}
    {x : 0 < x < 1/3}
    {x : 0 < x < 1/4}
    ...
Do they have any element in common?
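As a computational aside to Sections 11.4 and 11.5: the correspondence between characteristic functions and subsets can be turned into a short enumeration. The sketch below lists every 0/1 sequence of the right length and reads off the subset it describes; the function names are hypothetical, and Python's built-in set type stands in for mathematical sets.

```python
from itertools import product

def characteristic_function(subset):
    """Return the characteristic function of the subset: 1 on members, 0 elsewhere."""
    return lambda x: 1 if x in subset else 0

def subsets_via_characteristic_functions(universe):
    """List (bits, subset) pairs, one for each 0/1 sequence over the universe."""
    universe = list(universe)
    table = []
    for bits in product([0, 1], repeat=len(universe)):
        subset = {x for x, bit in zip(universe, bits) if bit == 1}
        table.append((bits, subset))
    return table

table = subsets_via_characteristic_functions(["a", "b", "c"])
for bits, subset in table:
    print(bits, subset)

chi_A = characteristic_function({"a", "c"})
print([chi_A(x) for x in ["a", "b", "c"]])   # [1, 0, 1]
```

Because itertools.product counts through the sequences in the order 000, 001, 010, and so on, the rows come out in the same order as the table for {a, b, c} in Section 11.4, and there are 2^3 = 8 of them.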
Lecture 12: Operations on sets
There is an "arithmetic" of sets similar to ordinary arithmetic. There are operations similar to addition, subtraction and multiplication.

12.1 Venn diagrams
The simple operations on sets can be visualised with the help of Venn diagrams, which show sets A, B, C, . . . as disks within a rectangle representing the universal set U.
[Figure: Venn diagram of overlapping disks A and B inside the rectangle U]

12.2 Union A ∪ B
The union A ∪ B of sets A and B consists of the elements in A or B, and is indicated by the shaded region in the following Venn diagram.
[Figure: Venn diagram with A ∪ B shaded]

12.3 Intersection A ∩ B
The intersection A ∩ B of sets A and B consists of the elements in A and B, indicated by the shaded region in the following Venn diagram.
[Figure: Venn diagram with A ∩ B shaded]

12.4 Difference A − B
The difference A − B of sets A and B consists of the elements in A and not in B, indicated by the shaded region in the following Venn diagram.
[Figure: Venn diagram with A − B shaded]
The difference U − B relative to the universal set U is called the complement B̄ of B. Here is the Venn diagram of B̄.
[Figure: Venn diagram with the complement of B shaded]

12.5 Symmetric difference A△B
The union of A − B and B − A is called the symmetric difference A△B of A and B.
[Figure: Venn diagram with A△B shaded]
A△B consists of the elements of one of A, B but not the other.
It is clear from the diagram that we have not only
    A△B = (A − B) ∪ (B − A),
but also
    A△B = (A ∪ B) − (A ∩ B).
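The operations of this lecture map directly onto Python's built-in set type, which makes it easy to try the identities on small examples. In the sketch below the sets U, A and B are arbitrary choices made for illustration; the operators |, &, - and ^ are Python's standard set operators for union, intersection, difference and symmetric difference.

```python
# Arbitrary small example sets (assumptions chosen for illustration).
U = set(range(10))          # universal set {0, 1, ..., 9}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B                       # A ∪ B
intersection = A & B                # A ∩ B
difference = A - B                  # A − B
complement_B = U - B                # complement of B relative to U
symmetric_difference = A ^ B        # A△B

# The two descriptions of the symmetric difference given in Section 12.5.
via_differences = (A - B) | (B - A)
via_union_minus_intersection = (A | B) - (A & B)

print(union, intersection, difference, complement_B, symmetric_difference)
print(symmetric_difference == via_differences == via_union_minus_intersection)
```

Agreement on one example is of course not a proof; the Venn diagram argument in the text is what shows the identity holds for all sets.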
12.6 Ordered Pairs
Sometimes we do want order to be important. In computer science arrays are ubiquitous examples of ordered data structures. In maths, ordered pairs are frequently used. An ordered pair (a, b) consists simply of a first object a and a second object b. The objects a and b are sometimes called the entries or coordinates of the ordered pair.

Two ordered pairs (a, b) and (c, d) are equal if and only if a = c and b = d.

Example. {0, 1} = {1, 0} but (0, 1) ≠ (1, 0).
There's no reason we need to stop with pairs. We can similarly define ordered triples, quadruples, and so on. When there are k coordinates, we call the object an ordered k-tuple. Two ordered k-tuples are equal if and only if their ith coordinates are equal for i = 1, 2, . . . , k.

12.7 Cartesian product A × B
The set of ordered pairs
    A × B = {(a, b) : a ∈ A and b ∈ B}
is the cartesian product of sets A and B.
The commonest example is where A = B = R (the set of real numbers, or the number line). Then the pairs (a, b) are points in the plane, so R × R is the plane.
[Figure: the point (a, b) in the plane, with coordinates a and b measured from the origin O]
Because Descartes used this idea in geometry, the cartesian product is named after him.

12.8 A × B and multiplication
If A has |A| elements and B has |B| elements, then A × B has |A| × |B| elements.
Similarly, if L is a line of length l, and W is a line of length w, then L × W is a rectangle of area l × w. In fact, we call it an "l × w rectangle." This is probably the reason for using the × sign, and for calling A × B a "product."

Questions
12.1 Draw a Venn diagram for A ∩ B̄. What is another name for this set?
12.2 Check the de Morgan laws by drawing Venn diagrams for the complement of A ∪ B, Ā ∩ B̄, the complement of A ∩ B, and Ā ∪ B̄.
12.3 Find which of the following is true by drawing suitable Venn diagrams.
    A ∩ (B△C) = (A ∩ B)△(A ∩ C)?
    A△(B ∩ C) = (A△B) ∩ (A△C)?
12.4 If plane = line × line, what do you think line × circle is? What about circle × circle?
Lecture 13: Functions
A function can be thought of as a "black box" which accepts inputs and, for each input, produces a single output.

13.1 Defining functions via sets
Formally we represent a function f as a set X of possible inputs, a set Y so that every output of f is guaranteed to be in Y, and a set of (input, output) pairs from X × Y. The vital property of a function is that each input gives exactly one output.

A function f consists of a domain X, a codomain Y, and a set of ordered pairs from X × Y which has exactly one ordered pair (x, y) for each x ∈ X.

When (a, b) is in this set we write f(a) = b. The set of y values occurring in these pairs is the image of f.
Note that the image of a function is always a subset of its codomain but they may or may not be equal.
If the image of a function is equal to its codomain, we say the function is onto.

Examples.
1. The squaring function square(x) = x^2 with domain R, codomain R, and pairs
    {(x, x^2) : x ∈ R},
which form what we usually call the plot of the squaring function.
[Figure: plot of the squaring function]
The image of this function (the set of y values) is the set R≥0 of real numbers ≥ 0.

2. The square root function sqrt(x) = √x with domain R≥0, codomain R, and pairs
    {(x, √x) : x ∈ R and x ≥ 0}.
[Figure: plot of the square root function]
The image of this function (the set of y values) is the set R≥0.

3. The cubing function cube(x) = x^3 with domain R, codomain R, and pairs
    {(x, x^3) : x ∈ R}.
[Figure: plot of the cubing function]
The image of this function is the whole of the codomain R, so it is onto.
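For finite sets, the definition in Section 13.1 can be checked directly: a function is just a domain, a codomain and a set of pairs with exactly one pair for each element of the domain. The sketch below spells this out. The function names and the small sets X and Y are assumptions made for illustration (the real-number examples above cannot be enumerated this way).

```python
def is_function(domain, codomain, pairs):
    """True if pairs lie in domain × codomain with exactly one pair (x, y) per x."""
    if any(x not in domain or y not in codomain for x, y in pairs):
        return False
    return all(sum(1 for a, _ in pairs if a == x) == 1 for x in domain)

def image(pairs):
    """The set of y values occurring in the pairs."""
    return {y for _, y in pairs}

def is_onto(codomain, pairs):
    """A function is onto when its image equals its codomain."""
    return image(pairs) == set(codomain)

# A finite stand-in for the squaring function (assumed example).
X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}
square_pairs = {(x, x * x) for x in X}

print(is_function(X, Y, square_pairs))        # True
print(image(square_pairs))                    # the image is {0, 1, 4}
print(is_onto(Y, square_pairs))               # False: 2 and 3 are never outputs
print(is_onto({0, 1, 4}, square_pairs))       # True with the smaller codomain

# Two outputs for the same input: not a function.
print(is_function({0}, {0, 1}, {(0, 0), (0, 1)}))   # False
```

Shrinking the codomain from Y to {0, 1, 4} makes the same set of pairs onto, which illustrates that the codomain is part of the description of a function.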
13.2 Arrow notation
If f is a function with domain A and codomain B we write
    f : A → B,
and we say that f is from A to B.
For example, we could define
    square : R → R.
We could also define
    square : R → R≥0.
Likewise, we could define
    cube : R → R.
However we could not define
    cube : R → R≥0,
because for some x ∈ R, cube(x) is negative. For example, cube(−1) = −1.

13.3 One-to-one functions
A function f : X → Y is one-to-one if for each y in the image of f there is only one x ∈ X such that f(x) = y.

For example, the function cube(x) is one-to-one because each real number y is the cube of exactly one real number x.
The function square : R → R is not one-to-one because the real number 1 is the square of two different real numbers, 1 and −1. (In fact each real y > 0 is the square of two different real numbers, √y and −√y.)
On the other hand, square : R≥0 → R is one-to-one because each real number y in R≥0 is the square of only one real number in R≥0, namely √y.
The last example shows that the domain of a function is an important part of its description, because changing the domain can change the properties of the function.

13.4 Proving a function is one-to-one
There is an equivalent way of phrasing the definition of one-to-one: a function f : X → Y is one-to-one when, for all x_1, x_2 ∈ X,
    f(x_1) = f(x_2) ⇒ x_1 = x_2.
This can be useful for proving that some functions are or are not one-to-one.
Example. The function f : R → R given by f(x) = 6x + 2 is one-to-one because
    f(x_1) = f(x_2)
    ⇒ 6x_1 + 2 = 6x_2 + 2
    ⇒ 6x_1 = 6x_2
    ⇒ x_1 = x_2.
Example. The function f : R → R given by f(x) = x^2 + 1 is not one-to-one because f(−1) = 2 and f(1) = 2 and so f(−1) = f(1).

Questions
13.1 Some of the following "rules" do not define genuine functions. Which are they?
• For each set S of natural numbers, let f(S) be the least element of S.
• For each set X of real numbers between 0 and 1, let g(X) be the least element of X.
• For each circle C in the (x, y) plane, let h(C) be the minimum distance from C to the x-axis.
• For each pair A, B of sets of real numbers, let s(A, B) be the smallest set containing both A and B.
• For each pair A, B of sets of real numbers, let t(A, B) be the largest set contained in both A and B.
13.2 For each of the following, say which can be defined with domain R and codomain R.
    x^2, 1/x, log x, √x, ∛x
Lecture 14: Examples of functions
The functions discussed in the last lecture were familiar functions of real numbers. Many other examples occur elsewhere, however.

14.1 Functions of several variables
We might define a function
    sum : R × R → R by sum(x, y) = x + y.
Because the domain of this function is R × R, the inputs to this function are ordered pairs (x, y) of real numbers. Because its codomain is R, we are guaranteed that each output will be a real number. This function can be thought of as a function of two variables x and y.
Similarly we might define a function
    binomial : R × R × N → R
by
    binomial(a, b, n) = (a + b)^n.
Here the inputs are ordered triples (a, b, n) such that a and b are real numbers and n is a natural number. We can think of this as a function of three variables.

14.2 Sequences
An infinite sequence of numbers, such as
    1, 1/2, 1/4, 1/8, 1/16, . . . ,
can be viewed as the function f : N → R defined by f(n) = 2^(−n). In this case, the inputs to f are natural numbers, and its outputs are real numbers.
Any infinite sequence a_0, a_1, a_2, a_3, . . . can be viewed as a function g(n) = a_n from N to some set containing the values a_n.

14.3 Characteristic functions
A subset of N = {0, 1, 2, 3, . . .} can be represented by its characteristic function. For example, the set of squares is represented by the function χ : N → {0, 1} defined by
    χ(n) = 1 if n is a square,
           0 if n is not a square,
which has the following sequence of values
    110010000100000010000000010000000000100 . . .
(with 1s at the positions of the squares 0, 1, 4, 9, 16, 25, 36, . . .).
Any property of natural numbers can likewise be represented by a characteristic function. For example, the function χ above represents the property of being a square.
Thus any set or property of natural numbers is represented by a function
    χ : N → {0, 1}.
Characteristic functions of two or more variables represent relations between two or more objects. For example, the relation x ≤ y between real numbers x and y has the characteristic function χ : R × R → {0, 1} defined by
    χ(x, y) = 1 if x ≤ y,
              0 otherwise.

14.4 Boolean functions
The connectives ∧, ∨ and ¬ are functions of variables whose values come from the set B = {T, F} of Boolean values (named after George Boole).
¬ is a function of one variable, so
    ¬ : B → B
and it is completely defined by giving its values on T and F, namely
    ¬T = F and ¬F = T.
This is what we previously did by giving the truth table of ¬.
∧ and ∨ are functions of two variables, so
    ∧ : B × B → B
and
    ∨ : B × B → B
They are completely defined by giving their values on the pairs (T, T), (T, F), (F, T), (F, F) in B × B, which is what their truth tables do.

14.5* Characteristic functions and subsets of N
Mathematicians say that two (possibly infinite) sets A and B have the same cardinality (size) if there is a one-to-one and onto function from A to B. This function associates each element of A with a unique element of B and vice-versa. With this definition, it is not too hard to show that, for example, N and Z have the same cardinality (they are both "countably infinite").
It turns out, though, that P(N) has a strictly greater cardinality than N. We can prove this by showing: no sequence f_0, f_1, f_2, f_3, . . . includes all characteristic functions for subsets of N. (This shows that there are more characteristic functions than natural numbers.)
In fact, for any infinite list f_0, f_1, f_2, f_3, . . . of characteristic functions, we can define a characteristic function f which is not on the list. Imagine each function given as the infinite sequence of its values, so the list might look like this:
    f_0 values 0101010101 . . .
    f_1 values 0000011101 . . .
    f_2 values 1111111111 . . .
    f_3 values 0000000000 . . .
    f_4 values 1001001001 . . .
    ...
Now if we switch each of the diagonal values f_n(n) (underlined in the original table) to its opposite, we get a characteristic function
    f(n) = 1 if f_n(n) = 0,
           0 if f_n(n) = 1,
which is different from each function on the list. In fact, it has a different value from f_n on the number n.
For the given example, f has values
    11011 . . .
The construction of f is sometimes called a "diagonalisation argument", because we get its values by switching values along the diagonal in the table of values of f_0, f_1, f_2, f_3, . . ..

Questions
14.1 Suggest domains and codomains for the following functions, writing the domain as a cartesian product where applicable.
    gcd, reciprocal, remainder, ∩, ∪
14.2 If A and B are subsets of N with characteristic functions χ_A and χ_B respectively, what set does the function χ_A(n)χ_B(n) represent?
14.3 How many Boolean functions of n variables are there?
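The diagonal construction in Section 14.5 can be carried out on any finite portion of the table. The sketch below uses the five rows shown in the text, each truncated to its first ten values, and computes the first five values of f by flipping the diagonal entries; the function name diagonal_flip is hypothetical. A finite table cannot capture the whole argument, which needs infinitely many rows, but it shows the mechanism.

```python
# The first ten values of f_0, ..., f_4 as given in the table in the text.
rows = [
    "0101010101",
    "0000011101",
    "1111111111",
    "0000000000",
    "1001001001",
]

def diagonal_flip(rows):
    """Return the values f(0), f(1), ..., where f(n) is the opposite of f_n(n)."""
    return "".join("1" if row[n] == "0" else "0" for n, row in enumerate(rows))

f_values = diagonal_flip(rows)
print(f_values)   # 11011

# f differs from the n-th listed function at position n, so it equals none of them.
differs_from_every_row = all(f_values[n] != rows[n][n] for n in range(len(rows)))
print(differs_from_every_row)
```

The output 11011 agrees with the values of f stated in the text.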
Lecture 15: Composition and inversion
Complicated functions are often built from simple parts. For example, the function f : R → R defined by f(x) = (x^2 + 1)^3 is computed by doing the following steps in succession:
• square,
• add 1,
• cube.
We say that f(x) = (x^2 + 1)^3 is the composite of the functions (from R to R)
• square(x) = x^2,
• successor(x) = x + 1,
• cube(x) = x^3.

15.1 Notation for composite functions
In the present example we write
    f(x) = cube(successor(square(x))),
or
    f = cube ◦ successor ◦ square.

Let h : X → Y and g : Y → Z be functions. The function g ◦ h : X → Z is defined by
    g ◦ h(x) = g(h(x))
and is called the composite of g and h.

Warning: Remember that g ◦ h means "do h first, then g." g ◦ h is usually different from h ◦ g.
Example.
    square(successor(x)) = (x + 1)^2 = x^2 + 2x + 1
    successor(square(x)) = x^2 + 1

15.2 Conditions for composition
Composite functions do not always exist.
Example. If reciprocal : R − {0} → R is defined by reciprocal(x) = 1/x and predecessor : R → R is defined by predecessor(x) = x − 1, then reciprocal ◦ predecessor does not exist, because predecessor(1) = 0 is not a legal input for reciprocal.
To avoid this problem, we demand that the codomain of h be equal to the domain of g for g ◦ h to exist. This ensures that each output of h will be a legal input for g.

Let h : A → B and g : C → D be functions. Then g ◦ h : A → D exists if and only if B = C.

15.3 The identity function
On each set A the function i_A : A → A defined by
    i_A(x) = x,
is called the identity function (on A).

15.4 Inverse functions
Functions f : A → A and g : A → A are said to be inverses (of each other) if
    f ◦ g = g ◦ f = i_A.
Example. square and sqrt are inverses of each other on the set R≥0 of reals ≥ 0.
    sqrt(square(x)) = x and square(sqrt(x)) = x.
In fact, this is exactly what sqrt is supposed to do – reverse the process of squaring. However, this works only if we restrict the domain to R≥0. On R we do not have sqrt(square(x)) = x because, for example,
    sqrt(square(−1)) = sqrt(1) = 1.
This problem arises whenever we seek an inverse for a function which is not one-to-one. The squaring function on R sends both 1 and −1 to 1, but we want a single value 1 for sqrt(1). Thus we have to restrict the squaring function to $R_{\geq 0}$.

15.5 Conditions for inversion

A function f can have an inverse without its domain and codomain being equal.

The inverse of a function f : A → B is a function $f^{-1}$ : B → A such that
$f^{-1} ◦ f = i_A$ and $f ◦ f^{-1} = i_B$.

Note that $f^{-1} ◦ f$ and $f ◦ f^{-1}$ are both identity functions but they have different domains.

Not every function has an inverse, but we can neatly classify the ones that do.

Let f : A → B be a function. Then $f^{-1}$ : B → A exists if and only if f is one-to-one and onto.

Example: $e^x$ and log

Consider f : R → $R_{\geq 0}$ − {0} defined by $f(x) = e^x$. We know that $e^x$ is one-to-one (e.g. because it is strictly increasing), and onto. So it has an inverse $f^{-1}$ on $R_{\geq 0}$ − {0}.

[Figure: plot of $y = e^x$.]

In fact, $f^{-1}(y) = \log(y)$ where
log : $R_{\geq 0}$ − {0} → R.

Now
$e^{\log x} = x$ and $\log(e^x) = x$,
so $e^{\log x}$ and $\log(e^x)$ are both identity functions, but they have different domains.

The domain of $e^{\log x}$ is $R_{\geq 0}$ − {0} (note log is defined only for reals > 0). The domain of $\log(e^x)$ is R.

15.6 Operations

An operation is a particular type of function, with domain A × A × A × . . . × A and codomain A, for some set A.

For example, the addition function f(a, b) = a + b is called an operation on R, because f : R × R → R. (That is, addition is a function of two real variables, which takes real values.)

An operation with one variable is called unary, an operation with two variables is called binary, an operation with three variables is called ternary, and so on.

Examples
1. Addition is a binary operation on R.
2. Successor is a unary operation on N.
3. Intersection is a binary operation on P(A) for any set A.
4. Complementation is a unary operation on P(A) for any set A.

Questions

15.1 Suppose f, m and s are the following functions on the set of people.
m(x) = mother of x
f(x) = father of x
s(x) = spouse of x
What are the English terms for the following composite functions?
m ◦ s, f ◦ s, m ◦ m, f ◦ m, s ◦ s

15.2 Write the following functions as composites of square(x), sqrt(x), successor(x) and cube(x).
$\sqrt{1 + x^3}$, $x^{3/2}$, $(1 + x)^3$, $(1 + x^3)^2$

15.3 What interesting feature do the following functions have in common? (Hint: consider their inverses.)
• ¬ on B
• The reciprocal, $f(x) = \frac{1}{x}$, on R − {0}
• The function $g(x) = \frac{x}{x-1}$, on R − {1}.
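
The ideas of Lecture 15 (composition, order of composition, and inverses that only work on a restricted domain) can be sketched in a few lines of Python. The helper `compose` is an assumption introduced here for illustration, and floating-point numbers stand in for real numbers, so equalities involving exp and log are checked only approximately.

```python
import math


def compose(g, h):
    """Return g ◦ h, the function x -> g(h(x)): do h first, then g."""
    return lambda x: g(h(x))


def square(x):
    return x * x


def successor(x):
    return x + 1


def cube(x):
    return x ** 3


# f = cube ◦ successor ◦ square, i.e. f(x) = (x^2 + 1)^3.
f = compose(cube, compose(successor, square))

# g ◦ h is usually different from h ◦ g.
square_after_successor = compose(square, successor)   # (x + 1)^2
successor_after_square = compose(successor, square)   # x^2 + 1

# sqrt ◦ square returns x for x >= 0, but not for negative x.
sqrt_of_square = compose(math.sqrt, square)

# log ◦ exp behaves as the identity on R (up to rounding error).
log_of_exp = compose(math.log, math.exp)
```

For instance, `square_after_successor(3)` is 16 while `successor_after_square(3)` is 10, and `sqrt_of_square(-1.0)` is 1.0 rather than −1.0, which is why the squaring function must be restricted before it can be inverted.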
Lecture 16: Relations
Mathematical objects can be related in various ways, and any particular way of relating objects is called a relation on the set of objects in question.

(This also applies to relations in the everyday sense. For example, “parent of” is a relation on the set of people.)

A binary relation R on a set A consists of A and a set of ordered pairs from A × A. When (a, b) is in this set we write aRb.

Similarly, a ternary relation on A would be defined by a set of ordered triples from A × A × A, and so on. (A unary relation on A is just a subset of A.)

16.1 Relations and functions

Any function f : X → Y can be viewed as a relation R on X ∪ Y. The relation is defined by xRy if and only if y = f(x).

However, not every relation is a function. Remember that a function must have exactly one output y for each input x in its domain. In a relation, on the other hand, an element x may be related to many elements y, or to none at all.

16.2 Examples

1. Equality on R.
This is the relation consisting of the pairs (x, x) for x ∈ R. Thus it is the following subset of the plane.

[Figure: plot of the pairs (x, x), with both axes running from −1 to 1.]

This relation is also a function (the identity function on R), since there is exactly one pair for each x ∈ R.

2. The < relation on R.
This relation consists of all the pairs (x, y) with x < y. It is the following shaded subset of the plane.

[Figure: shaded region of pairs (x, y) with x < y, with both axes running from −1 to 1.]

(The dashed line indicates that the points where x = y are omitted.)

3. Algebraic curves.
An algebraic curve consists of the points (x, y) satisfying an equation p(x, y) = 0, where p is a polynomial.
E.g. unit circle $x^2 + y^2 - 1 = 0$.

[Figure: the unit circle, with both axes running from −1 to 1.]

Notice that this relation is not a function, because there are two pairs with the same x, e.g. (0, 1) and (0, −1).
Likewise, the curve $y^2 = x^2(x + 1)$ is not a function.

[Figure: plot of the curve $y^2 = x^2(x + 1)$.]

4. The subset relation ⊆.
This consists of the ordered pairs of sets (A, B) such that A ⊆ B. A and B must both be subsets of some universal set U.

5. Congruence modulo n.
For a fixed n, congruence modulo n is a binary relation. It consists of all the ordered pairs of integers (a, b) such that n divides a − b.

16.3 Properties of congruence

As the symbol ≡ suggests, congruence mod n is a lot like equality. Numbers a and b which are congruent mod n are not necessarily equal, but they are “equal up to multiples of n,” because they have equal remainders when divided by n.

Because congruence is like equality, congruences a ≡ b (mod n) behave a lot like equations. In particular, they have the following three properties.

1. Reflexive property.
a ≡ a (mod n)
for any number a.

2. Symmetric property.
a ≡ b (mod n) ⇒ b ≡ a (mod n)
for any numbers a and b.

3. Transitive property.
a ≡ b (mod n) and b ≡ c (mod n) ⇒ a ≡ c (mod n)
for any numbers a, b and c.

These properties are clear if one remembers that a ≡ b (mod n) means a and b have the same remainder on division by n.

Questions

16.1 Which of the following relations R(x, y) satisfy ∀x∃yR(x, y)?
• x ∧ y = T (for propositions x, y)
• x ⊆ y (for sets x, y of natural numbers)
• x > y (for real numbers x, y)
• x divides y (for natural numbers x, y)

16.2 Use logic symbols and the ≤ relation to write a relation between real numbers x, y which says that the point (x, y) lies in the square with corners (0,0), (1,0), (0,1) and (1,1).
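
The three properties of congruence from Section 16.3 can be checked by brute force on a finite range of integers. This is a sketch under assumptions: the function names are made up for illustration, and passing a finite check is evidence rather than a proof.

```python
from itertools import product


def congruent(a, b, n):
    """a ≡ b (mod n), i.e. n divides a - b."""
    return (a - b) % n == 0


def check_congruence_properties(n, values):
    """Test reflexivity, symmetry and transitivity of congruence mod n
    on the given finite collection of integers."""
    values = list(values)
    reflexive = all(congruent(a, a, n) for a in values)
    symmetric = all(
        congruent(b, a, n)
        for a, b in product(values, repeat=2)
        if congruent(a, b, n)
    )
    transitive = all(
        congruent(a, c, n)
        for a, b, c in product(values, repeat=3)
        if congruent(a, b, n) and congruent(b, c, n)
    )
    return reflexive, symmetric, transitive


result = check_congruence_properties(5, range(-10, 11))
```

With n = 5 and the integers from −10 to 10, all three checks succeed, so `result` is `(True, True, True)`.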
Lecture 17: Equivalence relations
An equivalence relation R on a set A is a binary relation with the following three properties.

1. Reflexivity.
aRa
for all a ∈ A.

2. Symmetry.
aRb ⇒ bRa
for all a, b ∈ A.

3. Transitivity.
aRb and bRc ⇒ aRc
for all a, b, c ∈ A.

Equality and congruence mod n (for fixed n) are examples of equivalence relations.

17.1 Other equivalence relations

1. Equivalence of fractions.
Two fractions are equivalent if they reduce to the same fraction when the numerator and denominator of each are divided by their gcd. E.g. $\frac{2}{4}$ and $\frac{3}{6}$ are equivalent because both reduce to $\frac{1}{2}$.

2. Congruence of triangles.
Triangles ABC and A′B′C′ are congruent if AB = A′B′, BC = B′C′ and CA = C′A′. E.g. the following triangles are congruent.

[Figure: two congruent triangles ABC and A′B′C′.]

3. Similarity of triangles.
Triangles ABC and A′B′C′ are similar if
$\frac{AB}{A'B'} = \frac{BC}{B'C'} = \frac{CA}{C'A'}$.
E.g. the following triangles are similar.

[Figure: two similar triangles ABC and A′B′C′.]

4. Parallelism of lines.
The relation L ∥ M (L is parallel to M) is an equivalence relation.

Remark

In all these cases the relation is an equivalence because it says that objects are the same in some respect.

1. Equivalent fractions have the same reduced form.
2. Congruent triangles have the same side lengths.
3. Similar triangles have the same shape.
4. Parallel lines have the same direction.

Sameness is always reflexive (a is the same as a), symmetric (if a is the same as b, then b is the same as a) and transitive (if a is the same as b and b is the same as c, then a is the same as c).
17.2 Equivalence classes

Conversely, we can show that if R is a reflexive, symmetric and transitive relation then aRb says that a and b are the same in some respect: they have the same R-equivalence class.

If R is an equivalence relation we define the R-equivalence class of a to be
[a] = {s : sRa}.

Thus [a] consists of all the elements related to a. It can also be defined as {s : aRs}, because sRa if and only if aRs, by symmetry of R.

Examples
• The parallel equivalence class of a line L consists of all lines parallel to L.
• The equivalence class of 1 for congruence mod 2 is the set of all odd numbers.

17.3 Equivalence class properties

Claim. If two elements are related by an equivalence relation R on a set A, their equivalence classes are equal.

Proof. Suppose a, b ∈ A and aRb. Now
s ∈ [a] ⇒ sRa (by definition of [a])
⇒ sRb (by transitivity of R, since sRa and aRb)
⇒ s ∈ [b] (by definition of [b]).
Thus all elements of [a] belong to [b]. Similarly, all elements of [b] belong to [a], hence [a] = [b].

Claim. If R is an equivalence relation on a set A, each element of A belongs to exactly one equivalence class.

Proof. Each element a ∈ A belongs to at least one equivalence class, namely [a], because aRa by reflexivity. To see that no element belongs to two different classes, suppose a, b, c ∈ A, and c ∈ [a] ∩ [b]. Then
c ∈ [a] and c ∈ [b]
⇒ cRa and cRb (by definition of [a] and [b])
⇒ aRc and cRb (by symmetry)
⇒ aRb (by transitivity)
⇒ [a] = [b] (by the previous claim).

17.4 Partitions and equivalence classes

A partition of a set S is a set of subsets of S such that each element of S is in exactly one of the subsets.

Using what we showed in the last section, we have the following.

If R is an equivalence relation on a set A, then the equivalence classes of R form a partition of A. Two elements of A are related if and only if they are in the same equivalence class.

Example. Let R be the relation on Z defined by aRb if and only if a ≡ b (mod 3). The three equivalence classes of R are
{x : x ≡ 0 (mod 3)} = {3k : k ∈ Z}
{x : x ≡ 1 (mod 3)} = {3k + 1 : k ∈ Z}
{x : x ≡ 2 (mod 3)} = {3k + 2 : k ∈ Z}.
These partition the set Z.

Questions

17.1 Which of the following relations between integers x and y are equivalence relations?
• |x| = |y|
• $x^3 - y^3 = 1$
• x divides y
• 5 divides x − y

17.2 For those relations in Question 17.1 that are not equivalence relations, say which properties of equivalence they fail to satisfy.

17.3 For those that are equivalence relations, say what is the “same” about the related objects.

17.4 Also, for those relations that are equivalence relations, describe their equivalence classes.
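
The link between equivalence relations and partitions in Lecture 17 can be illustrated by grouping a finite set into classes. The sketch below is only an illustration: the function name `equivalence_classes` is an assumption, and the code relies on the caller supplying a genuine equivalence relation, which is what justifies comparing each element with a single representative of each class.

```python
def equivalence_classes(elements, related):
    """Group elements into classes [a] = {s : s R a}.

    Assumes `related` is an equivalence relation, so comparing with one
    representative per class is enough (see the claims in Section 17.3).
    """
    classes = []
    for a in elements:
        for cls in classes:
            if related(a, cls[0]):
                cls.append(a)
                break
        else:
            # a is related to no existing representative: start a new class.
            classes.append([a])
    return classes


# Congruence mod 3 on the integers from -6 to 6.
classes = equivalence_classes(range(-6, 7), lambda a, b: (a - b) % 3 == 0)

# Every element lies in exactly one class, so the classes form a partition.
all_members = sorted(x for cls in classes for x in cls)
```

On this range the three classes are the multiples of 3, the numbers of the form 3k + 1 and the numbers of the form 3k + 2, mirroring the example in Section 17.4.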
Lecture 18: Order relations
18.1 Partial order relations

A partial order relation R on a set A is a binary relation with the following three properties.

1. Reflexivity.
aRa
for all a ∈ A.

2. Antisymmetry.
aRb and bRa ⇒ a = b
for all a, b ∈ A.

3. Transitivity.
aRb and bRc ⇒ aRc
for all a, b, c ∈ A.

Examples.

1. ≤ on R.
Reflexive: a ≤ a for all a ∈ R.
Antisymmetric: a ≤ b and b ≤ a ⇒ a = b for all a, b ∈ R.
Transitive: a ≤ b and b ≤ c ⇒ a ≤ c for all a, b, c ∈ R.

2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N).
Antisymmetric: A ⊆ B and B ⊆ A ⇒ A = B for all A, B ∈ P(N).
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for all A, B, C ∈ P(N).

3. Divisibility on N.
The relation “a divides b” on natural numbers is reflexive, antisymmetric and transitive. We leave checking this as an exercise.

4. Alphabetical order of words.
Words on the English alphabet are alphabetically ordered by comparing the leftmost letter at which they differ. We leave checking that this relation is reflexive, antisymmetric and transitive as an exercise.

18.2 Total order relations

A total order relation is a special kind of partial order relation that “puts everything in order”.

A total order relation R on a set A is a partial order relation that also has the property aRb or bRa for all a, b ∈ A.

Examples.

1. ≤ on R.
This is a total order relation because for all real numbers a and b we have a ≤ b or b ≤ a.

2. ⊆ on P(N).
This is not a total order because, for example, {1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.

3. Divisibility on N.
This is not a total order because, for example, 2 does not divide 3 and 3 does not divide 2.

4. Alphabetical order of words.
This is a total order because given any two different words, one will appear before the other in alphabetical order.

18.3 Hasse diagrams

A partial order relation R on a finite set A can be represented as a Hasse diagram. The elements of A are written on the page and connected by lines so that, for any a, b ∈ A, aRb exactly when b can be reached from a by travelling upward along the lines.

Example. A Hasse diagram for the relation ⊆ on the set P({1, 2}) can be drawn as follows.

      {1, 2}
      /    \
   {1}      {2}
      \    /
        ∅
Example. A Hasse diagram for the relation “divides” on the set {1, 2, 3, 5, 6, 10, 15, 30} can be drawn as follows.

[Hasse diagram: 1 at the bottom; 2, 3 and 5 above it; 6, 10 and 15 above those; 30 at the top.]

Example. A Hasse diagram for the relation ≤ on the set {1, 2, 3, 4, 5} can be drawn as follows.

[Hasse diagram: a vertical chain with 5 at the top, then 4, 3, 2, and 1 at the bottom.]

Notice how this last Hasse diagram can be simply drawn as a vertical chain, whereas the previous two are “wider” and more complicated. This corresponds to the fact that the last example was of a total order relation but the previous two were not of total order relations.

18.4 Well-ordering

A well-order relation on a set is a total order relation that also has the property that each nonempty set of its elements contains a least element.

A well-order relation R on a set A is a total order relation such that, for all nonempty S ⊆ A, there exists an ℓ ∈ S such that ℓRa for all a ∈ S.

Example. The relation ≤ on N is a well-order relation because every nonempty subset of N has a least element.

The well-ordering of N is the basis of proofs by induction.

Example. The relation ≤ on Z is not a well-order relation. For example, Z itself has no least element.

Example. The relation ≤ on {x : x ∈ R, x ≥ 0} is not a well-order relation. For example, the subset {x : x ∈ R, x > 3} has no least element.

Questions

18.1 Explain why “antisymmetric” does not mean “not symmetric”. Give an example of a relation which is neither symmetric nor antisymmetric.

18.2 Draw a diagram of the positive divisors of 42 under the relation “divides.” Why does it resemble the diagram for the positive divisors of 30?

18.3 Invent a partial order relation on N × N. Is your ordering a total ordering? Is your ordering a well-ordering?
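
For a finite set, the defining properties of partial and total orders from Lecture 18 can be tested directly, and the lines of a Hasse diagram can be computed. The sketch below makes some assumptions: the function names are invented for illustration, and `hasse_edges` takes the lines of the diagram to be the pairs (a, b) with aRb, a ≠ b, and no third element strictly between them, which is the usual convention but is not spelled out in the text.

```python
from itertools import product


def is_partial_order(elements, rel):
    """Check reflexivity, antisymmetry and transitivity on a finite set."""
    elements = list(elements)
    reflexive = all(rel(a, a) for a in elements)
    antisymmetric = all(
        a == b
        for a, b in product(elements, repeat=2)
        if rel(a, b) and rel(b, a)
    )
    transitive = all(
        rel(a, c)
        for a, b, c in product(elements, repeat=3)
        if rel(a, b) and rel(b, c)
    )
    return reflexive and antisymmetric and transitive


def is_total_order(elements, rel):
    """A partial order in which every two elements are comparable."""
    elements = list(elements)
    comparable = all(rel(a, b) or rel(b, a) for a, b in product(elements, repeat=2))
    return is_partial_order(elements, rel) and comparable


def hasse_edges(elements, rel):
    """Pairs (a, b) joined by a line: a R b, a != b, nothing strictly between."""
    elements = list(elements)
    return [
        (a, b)
        for a, b in product(elements, repeat=2)
        if a != b
        and rel(a, b)
        and not any(c not in (a, b) and rel(a, c) and rel(c, b) for c in elements)
    ]


def divides(a, b):
    return b % a == 0


divisors_of_30 = [1, 2, 3, 5, 6, 10, 15, 30]
edges = sorted(hasse_edges(divisors_of_30, divides))
```

For the divisors of 30 this gives twelve lines, such as (1, 2), (2, 6) and (6, 30), while the relation ≤ on {1, 2, 3, 4, 5} gives the four lines of a vertical chain.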
Lecture 19: Selections and Arrangements
19.1 Ordered selections without repetition

A reviewer is going to compare ten phones and list, in order, a top three. In how many ways can she do this? More generally, how many ways are there to arrange r objects chosen from a set of n objects?

In our example, the reviewer has 10 options for her favourite, but then only 9 for her second-favourite, and 8 for third-favourite. So there are 10 × 9 × 8 ways she could make her list.

For an ordered selection without repetition of r elements from a set of n elements there are

n options for the 1st element
n − 1 options for the 2nd element
n − 2 options for the 3rd element
  ⋮
n − r + 1 options for the rth element.

So we have the following formula.

The number of ordered selections without repetition of r elements from a set of n elements (0 ≤ r ≤ n) is
$n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$.

When r = n and all the elements of a set S are ordered, we just say that this is a permutation of S. Our formula tells us there are n! such permutations. For example, there are 3! = 6 permutations of the set {a, b, c}:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).

19.2 Unordered selections without repetition

What if our reviewer instead chose an unordered top three? In how many ways could she do that? More generally, how many ways are there to choose (without order) r objects from a set of n objects?

A combination of r elements from a set S is a subset of S with r elements.

For every unordered list our reviewer could make there are 3! = 6 corresponding possible ordered lists. And we’ve seen that she could make 10 × 9 × 8 ordered lists. So the number of unordered lists she could make is $\frac{10 \times 9 \times 8}{6} = 120$.

For every combination of r elements from a set of n elements there are r! corresponding permutations. So, using our formula for the number of permutations we have the following.

The number of combinations of r elements from a set of n elements (0 ≤ r ≤ n) is
$\frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{r!(n-r)!} = \binom{n}{r}$.

Notice that the notation $\binom{n}{r}$ is used for $\frac{n!}{r!(n-r)!}$. Expressions like this are called binomial coefficients. We’ll see why they are called this in the next lecture.

19.3 Ordered selections with repetition

An ordered selection of r elements from a set X is really just a sequence of length r with each term in X. If X has n elements, then there are n possibilities for each term and so:

The number of sequences of r terms, each from some set of n elements, is
$\underbrace{n \times n \times \cdots \times n}_{r} = n^r$.

19.4 Unordered selections with repetition

A shop has a special deal on any four cans of soft drink. Cola, lemonade and sarsaparilla flavours are available. In how many ways can you select four cans?

We can write a selection in a table, for example,

  C | L  | S             C | L | S
  • | •• | •     and       | • | •••

We can change a table like this into a string of zeroes and ones, by moving from left to right
reading a “•” as a 0 and a column separator as a 1. The tables above would be converted into

0 1 0 0 1 0 and 1 0 1 0 0 0

Notice that each string has four zeroes (one for each can selected) and two ones (one fewer than the number of flavours). We can choose a string like this by beginning with a string of six ones and then choosing four ones to change to zeroes. There are $\binom{6}{4} = 15$ ways to do this and so there are $\binom{6}{4} = 15$ possible can selections.

An unordered selection of r elements, with repetition allowed, from a set X of n elements can be thought of as a multiset with r elements, each in X. As in the example, we can represent each such multiset with a string of r zeroes and n − 1 ones. We can choose a string like this by beginning with a string of n + r − 1 ones and then choosing r ones to change to zeroes.

The number of multisets of r elements, each from a set of n elements, is
$\binom{n+r-1}{r} = \frac{(n+r-1)!}{r!(n-1)!}$.

19.5 The pigeonhole principle

The pigeonhole principle is a reasonably obvious statement, but can still be very useful.

If n items are placed in m containers with n > m, then at least one container has at least two items.

Example. If a drawer contains only blue, black and white socks and you take out four socks without looking at them, then you are guaranteed to have two of the same colour.

We can generalise the pigeonhole principle as follows.

If n items are placed in m containers, then at least one container has at least $\lceil \frac{n}{m} \rceil$ items.

In the above $\lceil \frac{n}{m} \rceil$ means the smallest integer greater than or equal to $\frac{n}{m}$ (or $\frac{n}{m}$ “rounded up”).

Example. If 21 tasks have been distributed between four processor cores, the busiest core must have been assigned at least 6 tasks.

Questions

19.1 A bank requires a PIN that is a string of four decimal digits. How many such PINs are there? How many are made of four different digits?

19.2 How many binary strings of length 5 are there? How many of these contain exactly two 1s?

19.3 In a game, each of ten players holds red, blue and green marbles, and places one marble in a bag. How many possibilities are there for the colours of marbles in the bag? If each player chooses their colour at random are all of these possibilities equally likely?
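
The four counting formulas of Lecture 19 can be cross-checked by listing the selections explicitly with Python's standard `itertools` module and comparing the counts with `math.perm` and `math.comb`. This is a sketch for small cases only; the variable names are assumptions made for illustration, and the flavour names simply label the three options from the soft drink example.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)
from math import ceil, comb, perm

phones = range(10)
flavours = ["cola", "lemonade", "sarsaparilla"]

# 19.1 Ordered, without repetition: ordered top-three lists of ten phones.
ordered_top3 = sum(1 for _ in permutations(phones, 3))

# 19.2 Unordered, without repetition: unordered top threes.
unordered_top3 = sum(1 for _ in combinations(phones, 3))

# 19.3 Ordered, with repetition: sequences of four flavours.
flavour_sequences = sum(1 for _ in product(flavours, repeat=4))

# 19.4 Unordered, with repetition: selections of four cans.
can_selections = sum(1 for _ in combinations_with_replacement(flavours, 4))

# 19.5 Generalised pigeonhole: 21 tasks on 4 cores.
busiest_core_at_least = ceil(21 / 4)
```

The counts agree with the formulas: 10 × 9 × 8 = 720 ordered lists, 120 unordered lists, 3^4 = 81 sequences and 15 can selections, and the rounded-up quotient for 21 tasks on 4 cores is 6.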
Lecture 20: Pascal’s triangle
20.1 Pascal’s triangle

We can write the binomial coefficients in an (infinite) triangular array as follows:

$\binom{0}{0}$
$\binom{1}{0}$ $\binom{1}{1}$
$\binom{2}{0}$ $\binom{2}{1}$ $\binom{2}{2}$
$\binom{3}{0}$ $\binom{3}{1}$ $\binom{3}{2}$ $\binom{3}{3}$
$\binom{4}{0}$ $\binom{4}{1}$ $\binom{4}{2}$ $\binom{4}{3}$ $\binom{4}{4}$
$\binom{5}{0}$ $\binom{5}{1}$ $\binom{5}{2}$ $\binom{5}{3}$ $\binom{5}{4}$ $\binom{5}{5}$
$\binom{6}{0}$ $\binom{6}{1}$ $\binom{6}{2}$ $\binom{6}{3}$ $\binom{6}{4}$ $\binom{6}{5}$ $\binom{6}{6}$
  ⋮

Here are the first eleven rows (n = 0 to n = 10) with the entries as integers:

                         1
                       1   1
                     1   2   1
                   1   3   3   1
                 1   4   6   4   1
               1   5  10  10   5   1
             1   6  15  20  15   6   1
           1   7  21  35  35  21   7   1
         1   8  28  56  70  56  28   8   1
       1   9  36  84 126 126  84  36   9   1
     1  10  45 120 210 252 210 120  45  10   1

This triangular array is often called Pascal’s triangle (although Pascal was nowhere near the first to discover it).

20.2 Patterns

Writing the binomial coefficients this way reveals a lot of different patterns in them. Perhaps the most obvious is that every row reads the same left-to-right and right-to-left. Choosing r elements from a set of n elements to be in a combination is equivalent to choosing n − r elements from the same set to not be in the combination. So:

$\binom{n}{r} = \binom{n}{n-r}$ for 0 ≤ r ≤ n.

This shows that every row reads the same left-to-right and right-to-left.

Another pattern is that every “internal” entry in the triangle is the sum of the two entries above it. To see why this is, we’ll begin with an example.

Example. Why is $\binom{6}{2} = \binom{5}{2} + \binom{5}{1}$?

There are $\binom{6}{2}$ combinations of 2 elements of {1, 2, 3, 4, 5, 6}. Every such combination either

• does not contain a 6, in which case it is one of the $\binom{5}{2}$ combinations of 2 elements of {1, 2, 3, 4, 5}; or
• does contain a 6, in which case the rest of the combination is one of the $\binom{5}{1}$ combinations of 1 element from {1, 2, 3, 4, 5}.

So $\binom{6}{2} = \binom{5}{2} + \binom{5}{1}$.

We can make a similar argument in general. Let X be a set of n elements and let x be a fixed element of X. For any r ∈ {1, . . . , n}, there are $\binom{n-1}{r}$ combinations of r elements of X that do not contain x and there are $\binom{n-1}{r-1}$ combinations of r elements of X that do contain x. So:

$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1}$ for 1 ≤ r ≤ n.

This shows that every internal entry in Pascal’s triangle is the sum of the two above it.

20.3 The binomial theorem

$(x + y)^0 = 1$
$(x + y)^1 = x + y$
$(x + y)^2 = x^2 + 2xy + y^2$
$(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3$
$(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$
$(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5$

Notice that the coefficients on the right are exactly the same as the entries in Pascal’s triangle. Why does this happen? Think about expanding $(x + y)^3$ and finding the coefficient of $xy^2$, for example.

$(x + y)(x + y)(x + y) = xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy = x^3 + 3x^2y + 3xy^2 + y^3$

The coefficient of $xy^2$ is 3 because we have three terms in the sum above that contain two y’s (namely xyy, yxy and yyx). This is because there are $\binom{3}{2}$ ways to choose two of the three factors in a term to be y’s.
The same logic holds in general. The coefficient of $x^{n-r}y^r$ in $(x + y)^n$ will be $\binom{n}{r}$ because there will be $\binom{n}{r}$ ways to choose r of the n factors in a term to be y’s. This fact is called the binomial theorem.

Binomial theorem. For any n ∈ N,

$(x + y)^n = \binom{n}{0}x^n y^0 + \binom{n}{1}x^{n-1}y^1 + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n-1}x^1 y^{n-1} + \binom{n}{n}x^0 y^n$.

20.4 Inclusion-exclusion

A school gives out prizes to its best ten students in music and its best eight students in art. If three students receive prizes in both, how many students get a prize? If we try to calculate this as 10 + 8 then we have counted the three overachievers twice. To compensate we need to subtract three and calculate 10 + 8 − 3 = 15.

In general, if A and B are finite sets then we have

|A ∪ B| = |A| + |B| − |A ∩ B|.

With a bit more care we can see that if A, B and C are sets then we have

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.

This is part of a more general law called the inclusion-exclusion principle.

Let $X_1, X_2, \ldots, X_t$ be finite sets. To calculate $|X_1 ∪ X_2 ∪ \cdots ∪ X_t|$:
• add the sizes of the sets;
• subtract the sizes of the 2-way intersections;
• add the sizes of the 3-way intersections;
• subtract the sizes of the 4-way intersections;
  ⋮
• add/subtract the size of the t-way intersection.

To see why this works, think of an element x that is in n of the sets $X_1, X_2, \ldots, X_t$. It is counted

$\binom{n}{1} - \binom{n}{2} + \binom{n}{3} - \binom{n}{4} + \cdots \pm \binom{n}{n}$

times. By the Binomial theorem with x = 1 and y = −1 (see Question 20.1), this is equal to 1. So each element is counted exactly once.

Questions

20.1 Substitute x = 1 and y = −1 into the statement of the binomial theorem. What does this tell you about the rows of Pascal’s triangle?

20.2 Find a pattern in the sums of the rows in Pascal’s triangle. Prove your pattern holds using the binomial theorem. Also prove it holds by considering the powerset of a set.

20.3 Use inclusion-exclusion to work out how many numbers in the set {1, . . . , 100} are divisible by 2 or 3 or 5.
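
The inclusion-exclusion recipe from Section 20.4 translates directly into a loop over k-way intersections. The following is a sketch: the function name and the student ID numbers are assumptions chosen to reproduce the prize example, and the result is compared against the size of the union computed directly.

```python
from itertools import combinations


def union_size_by_inclusion_exclusion(sets):
    """Compute |X1 ∪ ... ∪ Xt| by adding the sizes of the k-way
    intersections for odd k and subtracting them for even k."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total


# Hypothetical student IDs: ten music prizes, eight art prizes,
# with students 8, 9 and 10 receiving both.
music = set(range(1, 11))
art = set(range(8, 16))
prize_winners = union_size_by_inclusion_exclusion([music, art])

# A three-set example, checked against the union computed directly.
three_sets = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}]
three_way = union_size_by_inclusion_exclusion(three_sets)
direct = len(set().union(*three_sets))
```

For the prize example this gives 10 + 8 − 3 = 15, and for the three small sets both methods give 7.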
Lecture 21: Probability
Probability gives us a way to model random processes mathematically. These processes could be anything from the rolling of dice, to radioactive decay of atoms, to the performance of a stock market index. The mathematical environment we work in when dealing with probabilities is called a probability space.

21.1 Probability spaces

We’ll start with a formal definition and then look at some examples of how the definition is used.

A probability space consists of
• a set S called a sample space which contains all the possible outcomes of the random process; and
• a probability function Pr : S → [0, 1] such that the sum of the probabilities of the outcomes in S is 1.

Each time the process occurs it should produce exactly one outcome (never zero or more than one). The probability of an outcome is a measure of the likeliness that it will occur. It is given as a real number between 0 and 1 inclusive, where 0 indicates that the outcome cannot occur and 1 indicates that the outcome must occur.

Example.

[Figure: a spinner with regions labelled 1, 2, 3 and 4.]

The spinner above might be modeled by a probability space with sample space S = {1, 2, 3, 4} and probability function given as follows.

$\Pr(s) = \begin{cases} \frac{1}{2} & \text{for } s = 1 \\ \frac{1}{4} & \text{for } s = 2 \\ \frac{1}{8} & \text{for } s = 3 \\ \frac{1}{8} & \text{for } s = 4. \end{cases}$

It can be convenient to give this as a table:

s     | 1   | 2   | 3   | 4
Pr(s) | 1/2 | 1/4 | 1/8 | 1/8

Example. Rolling a fair six-sided die could be modeled by a probability space with sample space S = {1, 2, 3, 4, 5, 6} and probability function Pr given as follows.

s     | 1   | 2   | 3   | 4   | 5   | 6
Pr(s) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

A sample space like this one where every outcome has an equal probability is sometimes called a uniform sample space. Outcomes from a uniform sample space are said to have been taken uniformly at random.

21.2 Events

An event is a subset of the sample space.

An event is just a collection of outcomes we are interested in for some reason.

Example. In the die rolling example with S = {1, 2, 3, 4, 5, 6}, we could define the event of rolling at least a 3. Formally, this would be the set {3, 4, 5, 6}. We could also define the event of rolling an odd number as the set {1, 3, 5}.

The probability of an event A is the sum of the probabilities of the outcomes in A.

Example. In the spinner example, for the event A = {1, 2, 4}, we have

$\Pr(A) = \Pr(1) + \Pr(2) + \Pr(4) = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = \frac{7}{8}$.

In a uniform sample space (where all outcomes are equally likely) the probability of an event A can be calculated as:

$\Pr(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes}} = \frac{|A|}{|S|}$.

21.3 Operations on events

Because events are defined as sets we can perform set operations on them. If A and B are
events for a sample space S, then

• A ∪ B is the event “A or B,”
• A ∩ B is the event “A and B,”
• $\overline{A}$ is the event “not A.”

We always take the sample space as our universal set, so $\overline{A}$ means S − A.

21.4 Probabilities of unions

We saw in the section on the inclusion-exclusion principle that |A ∪ B| = |A| + |B| − |A ∩ B| for finite sets A and B. We have a similar law in probability.

For any two events A and B,
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

Example. In our die rolling example, let A = {1, 2} and B = {2, 3, 4} be events. Then

$\Pr(A ∪ B) = \frac{|A|}{|S|} + \frac{|B|}{|S|} - \frac{|A ∩ B|}{|S|} = \frac{2}{6} + \frac{3}{6} - \frac{1}{6} = \frac{2}{3}$.

Two events A and B are mutually exclusive if Pr(A ∩ B) = 0, that is, if A and B cannot occur together. For mutually exclusive events, we have
Pr(A ∪ B) = Pr(A) + Pr(B).

21.5 Independent events

We say that two events are independent when the occurrence or non-occurrence of one event does not affect the likelihood of the other occurring.

Two events A and B are independent if
Pr(A ∩ B) = Pr(A)Pr(B).

Example. A binary string of length 3 is generated uniformly at random. The event A that the first bit is a 1 is independent of the event B that the second bit is a 1. But A is not independent of the event C that the string contains exactly two 1s.

Formally, the sample space is S = {000, 001, 010, 011, 100, 101, 110, 111} and $\Pr(s) = \frac{1}{8}$ for any s ∈ S. So,

A = {100, 101, 110, 111}      Pr(A) = 1/2
B = {010, 011, 110, 111}      Pr(B) = 1/2
C = {011, 101, 110}           Pr(C) = 3/8
A ∩ B = {110, 111}            Pr(A ∩ B) = 1/4
A ∩ C = {101, 110}            Pr(A ∩ C) = 1/4

So Pr(A ∩ B) = Pr(A)Pr(B) but Pr(A ∩ C) ≠ Pr(A)Pr(C).

21.6 Warning

Random processes can occur in both discrete and continuous settings, and probability theory can be applied in either setting. In this lecture, and in the next four lectures, we are discussing only the discrete case. Many of the definitions and results we state apply only in this case. Our definition of a probability space, for example, is actually the definition of a discrete probability space, and so on.

The discrete setting provides a good environment to learn most of the vital concepts and intuitions of probability theory. What you learn here is very useful in itself, and will act as a good base if you go on to study continuous probability.

Questions

An integer is chosen uniformly at random from the set {1, 2, . . . , 30}. Let A be the event that the integer is at most 20. Let B be the event that the integer is divisible by 6. Let C be the event that the integer’s last digit is a 5.

21.1 Write A, B and C as sets, and find their probabilities.

21.2 Find the probabilities of A ∪ B, A ∪ C and B ∪ C. Which pairs of A, B, C are mutually exclusive?

21.3 Find the probabilities of A ∩ B, A ∩ C and B ∩ C. Which pairs of A, B, C are independent?
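
The binary string example from Section 21.5 can be reproduced by enumerating the uniform sample space and treating events as Python sets. This is a sketch: the helper names `pr` and `independent` are assumptions for illustration, and `fractions.Fraction` is used so that the probabilities are exact rather than floating-point approximations.

```python
from fractions import Fraction
from itertools import product

# Uniform sample space of all binary strings of length 3.
S = ["".join(bits) for bits in product("01", repeat=3)]


def pr(event):
    """Probability of an event in the uniform sample space S: |A| / |S|."""
    return Fraction(len(event), len(S))


def independent(E, F):
    """Events E and F are independent if Pr(E ∩ F) = Pr(E) Pr(F)."""
    return pr(E & F) == pr(E) * pr(F)


A = {s for s in S if s[0] == "1"}          # first bit is a 1
B = {s for s in S if s[1] == "1"}          # second bit is a 1
C = {s for s in S if s.count("1") == 2}    # exactly two 1s

# The union law Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B) from Section 21.4.
union_law_holds = pr(A | B) == pr(A) + pr(B) - pr(A & B)
```

As in the text, `independent(A, B)` is true, while `independent(A, C)` is false because Pr(A ∩ C) = 1/4 but Pr(A)Pr(C) = 3/16.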
Lecture 22: Conditional probability and Bayes’ theorem
Your friend believes that Python coding has become more popular than AFL in Melbourne. She bets you $10 that the next person to pass you on the street will be a Python programmer. You feel confident about this bet. However, when you see a man in a “Hello, world!” t-shirt approaching, you don’t feel so confident any more. Why is this?

We can think about this with a diagram. The rectangle represents the set of people in Melbourne, the circle P is the set of Python coders, and the circle T is the set of “Hello, world!” t-shirt owners.

[Diagram: a rectangle containing the overlapping circles T and P.]

Initially, you feel confident because the circle P takes up a small proportion of the rectangle. But when you learn that your randomly selected person is in the circle T, you feel bad because the circle P covers almost all of T. In mathematical language, the probability that a random Melbournian is a Python coder is low, but the probability that a random Melbournian is a Python coder given that they own a “Hello, world!” t-shirt is high.

22.1 Conditional probability

Conditional probabilities measure the likelihood of an event, given that some other event occurs.

For events A and B, the conditional probability of A given B is
$\Pr(A|B) = \frac{\Pr(A ∩ B)}{\Pr(B)}$.

This definition also implies that
Pr(A ∩ B) = Pr(A|B)Pr(B).

Example. The spinner from the last lecture is spun. Let A be the event that the result was at least 3 and B be the event that the result was even. What is Pr(A|B)?

$\Pr(A ∩ B) = \Pr(4) = \frac{1}{8}$
$\Pr(B) = \Pr(2) + \Pr(4) = \frac{1}{4} + \frac{1}{8} = \frac{3}{8}$

Thus,
$\Pr(A|B) = \frac{\Pr(A ∩ B)}{\Pr(B)} = (\frac{1}{8})/(\frac{3}{8}) = \frac{1}{3}$.

Example. A binary string of length 6 is generated uniformly at random. Let A be the event that the first bit is a 1 and B be the event that the string contains exactly two 1s. What is Pr(A|B)?

There are $2^6$ strings in our sample space. Now A ∩ B occurs when the first bit is 1 and the rest of the string contains exactly one 1. There are $\binom{5}{1}$ such strings and so $\Pr(A ∩ B) = \binom{5}{1}/2^6$. Also, there are $\binom{6}{2}$ strings containing exactly two 1s and so $\Pr(B) = \binom{6}{2}/2^6$. Thus,

$\Pr(A|B) = \frac{\Pr(A ∩ B)}{\Pr(B)} = \binom{5}{1} / \binom{6}{2} = \frac{1}{3}$.

22.2 Independence again

Our definition of conditional probability gives us another way of defining independence. We can say that events A and B are independent if
Pr(A) = Pr(A|B).

This makes sense intuitively: it is a formal way of saying that the likelihood of A does not depend on whether or not B occurs.

22.3 Independent repeated trials

Generally if we perform exactly the same action multiple times, the results for each trial will be independent of the others. For example, if we roll a die twice, then the result of the first roll will be independent of the result of the second.

For two independent repeated trials, each from a sample space S, our overall sample space is S × S and our probability function will be given by $\Pr((s_1, s_2)) = \Pr(s_1)\Pr(s_2)$. For three independent repeated trials the sample space is S × S × S and the probability function $\Pr((s_1, s_2, s_3)) = \Pr(s_1)\Pr(s_2)\Pr(s_3)$, and so on.

Example. The spinner from the previous example is spun twice. What is the probability that the results add to 5?

A total of 5 can be obtained as (1,4), (4,1), (2,3)
or (3,2). Because the spins are independent:
\[ \Pr((1,4)) = \Pr((4,1)) = \tfrac{1}{2} \times \tfrac{1}{8} = \tfrac{1}{16} \]
\[ \Pr((2,3)) = \Pr((3,2)) = \tfrac{1}{4} \times \tfrac{1}{8} = \tfrac{1}{32} \]
So, because (1,4), (4,1), (2,3) and (3,2) are mutually exclusive, the probability of the total being 5 is 1/16 + 1/16 + 1/32 + 1/32 = 3/16.

22.4 Bayes’ theorem
Bayes’ theorem gives a way of calculating the conditional probability of an event A given an event B when we already know the probabilities of A, of B given A, and of B given \(\overline{A}\) (the complement of A).

Bayes’ theorem. For the events A and B,
\[ \Pr(A \mid B) = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B \mid A)\Pr(A) + \Pr(B \mid \overline{A})\Pr(\overline{A})}. \]

Note that the denominator above is simply an expression for Pr(B). The fact that
\[ \Pr(B) = \Pr(B \mid A)\Pr(A) + \Pr(B \mid \overline{A})\Pr(\overline{A}) \]
is due to the law of total probability.

22.5 Bayes’ theorem examples
Example. Luke Skywalker discovers that some porgs have an extremely rare genetic mutation that makes them powerful force users. He develops a test for this mutation that is right 99% of the time and decides to test all the porgs on Ahch-To. Suppose there are 100 mutant porgs in the population of 24 million. We would guess that the test would come up positive for 99 of the 100 mutants, but also for 239 999 non-mutants.
We are assuming that the conditional probability of a porg testing positive given it’s a mutant is 0.99. But what is the conditional probability of it being a mutant given that it tested positive? From our guesses, we would expect this to be 99/(99 + 239999) ≈ 0.0004. Bayes’ theorem gives us a way to formalise this. Writing M for the event that a porg is a mutant and P for the event that it tests positive,
\[ \Pr(M \mid P) = \frac{\Pr(P \mid M)\Pr(M)}{\Pr(P \mid M)\Pr(M) + \Pr(P \mid \overline{M})\Pr(\overline{M})} \]
\[ = \frac{\frac{100}{24000000} \times 0.99}{\frac{100}{24000000} \times 0.99 + \left(1 - \frac{100}{24000000}\right) \times 0.01} \]
\[ = \frac{99}{99 + 239999} \approx 0.0004. \]

Example. A binary string is created so that the first bit is a 0 with probability 1/3 and then each subsequent bit is the same as the preceding one with probability 3/4. What is the probability that the first bit is 0, given that the second bit is 0?
Let F be the event that the first bit is 0 and let S be the event that the second bit is 0. So Pr(F) = 1/3. If F occurs then the second bit will be 0 with probability 3/4 and so Pr(S|F) = 3/4. If F does not occur then the second bit will be 0 with probability 1/4 and so \(\Pr(S \mid \overline{F}) = 1/4\). So, by Bayes’ theorem,
\[ \Pr(F \mid S) = \frac{\Pr(F)\Pr(S \mid F)}{\Pr(F)\Pr(S \mid F) + \Pr(\overline{F})\Pr(S \mid \overline{F})} = \frac{\frac{1}{3} \times \frac{3}{4}}{\frac{1}{3} \times \frac{3}{4} + \frac{2}{3} \times \frac{1}{4}} = \frac{1/4}{5/12} = \frac{3}{5}. \]

Questions
22.1 An integer is selected uniformly at random from the set {1, 2, . . . , 15}. What is the probability that it is divisible by 5, given that it is odd?
22.2 A standard die is rolled twice. What is the probability that the first roll is a 1, given that the sum of the rolls is 6?
22.3 A bag contains three black marbles and two white marbles and they are randomly selected and removed, one at a time until the bag is empty. Use Bayes’ theorem to calculate the probability that the first marble selected is black, given that the second marble selected is black.
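The two Bayes’ theorem calculations in this lecture (the porg test and the binary string) can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name `bayes` and its parameter names are illustrative choices, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def bayes(prior, likelihood, likelihood_given_not):
    """Return Pr(A|B) from Pr(A), Pr(B|A) and Pr(B|not A)."""
    # Law of total probability gives the denominator Pr(B).
    pr_b = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / pr_b


# Binary string example: Pr(F) = 1/3, Pr(S|F) = 3/4, Pr(S|not F) = 1/4.
first_bit_zero = bayes(Fraction(1, 3), Fraction(3, 4), Fraction(1, 4))

# Porg example: 100 mutants among 24 million, test right 99% of the time.
mutant = bayes(Fraction(100, 24_000_000), Fraction(99, 100), Fraction(1, 100))

print(first_bit_zero, float(mutant))
```

The same function applies to any two-way split of the sample space, since only Pr(A), Pr(B|A) and Pr(B|not A) are needed.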
Lecture 23: Random variables
In a game, three standard dice will be rolled and the number of sixes will be recorded. We could let X stand for the number of sixes rolled. Then X is a special kind of variable whose value is based on a random process. These are called random variables.
Because the value of X is random, it doesn’t make sense to ask whether X = 0, for example. But we can ask what the probability is that X = 0 or that X > 2. This is because “X = 0” and “X > 2” correspond to events from our sample space.

23.1 Formal definition
Formally, a random variable is defined as a function from the sample space to R. In the example above, X is a function from the process’s sample space that maps every outcome to the number of sixes in that outcome.

Example. Let X be the number of 1s in a binary string of length 2 chosen uniformly at random. Formally, X is a function from {00, 01, 10, 11} to {0, 1, 2} such that
X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2.
For most purposes, however, we can think of X as simply a special kind of variable.

23.2 Probability distribution
We can describe the behaviour of a random variable X by listing, for each value x that X can take, the probability that X = x. This gives the probability distribution of the random variable. Again, formally this listing is a function from the values of X to their probabilities.

Example. Continuing with the last example, the probability distribution of X is given by
\[ \Pr(X = x) = \begin{cases} \frac{1}{4} & \text{if } x = 0 \\ \frac{1}{2} & \text{if } x = 1 \\ \frac{1}{4} & \text{if } x = 2. \end{cases} \]
It can be convenient to give this as a table:

x            0     1     2
Pr(X = x)    1/4   1/2   1/4

Example. A standard die is rolled three times. Let X be the number of sixes rolled. What is the probability distribution of X? Obviously X can only take values in {0, 1, 2, 3}. Each roll there is a six with probability 1/6 and not a six with probability 5/6. The rolls are independent.
\[ \Pr(X = 0) = \tfrac{5}{6} \times \tfrac{5}{6} \times \tfrac{5}{6} \]
\[ \Pr(X = 1) = (\tfrac{1}{6})(\tfrac{5}{6})(\tfrac{5}{6}) + (\tfrac{5}{6})(\tfrac{1}{6})(\tfrac{5}{6}) + (\tfrac{5}{6})(\tfrac{5}{6})(\tfrac{1}{6}) \]
\[ \Pr(X = 2) = (\tfrac{1}{6})(\tfrac{1}{6})(\tfrac{5}{6}) + (\tfrac{1}{6})(\tfrac{5}{6})(\tfrac{1}{6}) + (\tfrac{5}{6})(\tfrac{1}{6})(\tfrac{1}{6}) \]
\[ \Pr(X = 3) = \tfrac{1}{6} \times \tfrac{1}{6} \times \tfrac{1}{6} \]
So the probability distribution of X is

x            0         1        2        3
Pr(X = x)    125/216   75/216   15/216   1/216

23.3 Independence
We have seen that two events are independent when the occurrence or non-occurrence of one event does not affect the likelihood of the other occurring. Similarly two random variables are independent if the value of one does not affect the likelihood that the other will take a certain value.

Random variables X and Y are independent if, for all x and y,
Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).

Example. An integer is generated uniformly at random from the set {10, 11, . . . , 29}. Let X and Y be its first and second (decimal) digit. Then X and Y are independent random variables because, for x ∈ {1, 2} and y ∈ {0, 1, . . . , 9},
\[ \Pr(X = x \wedge Y = y) = \tfrac{1}{20} = \tfrac{1}{2} \times \tfrac{1}{10} = \Pr(X = x)\Pr(Y = y). \]

23.4 Operations
From a random variable X, we can create new random variables such as X + 1, 2X and X^2. These variables work as you would expect them to.
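The distributions in this lecture for the number of 1s in a 2-bit string and the number of sixes in three die rolls can both be recovered by listing the sample space and counting. This is a minimal sketch under the assumption that all outcomes are equally likely; the helper name `distribution` is hypothetical and not from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes, variable):
    """Distribution of `variable` when all `outcomes` are equally likely."""
    outcomes = list(outcomes)
    # A random variable is a function on the sample space: apply it to each outcome.
    counts = Counter(variable(outcome) for outcome in outcomes)
    return {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}


# Number of 1s in a uniformly random binary string of length 2.
ones = distribution(product("01", repeat=2), lambda bits: bits.count("1"))

# Number of sixes when a standard die is rolled three times.
sixes = distribution(product(range(1, 7), repeat=3), lambda roll: roll.count(6))

print(ones)
print(sixes)
```

Note that `Fraction` prints values in lowest terms, so 75/216 appears as 25/72.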
Example. If X is the random variable with distribution

x            −1    0     1
Pr(X = x)    1/6   1/3   1/2

then the distributions of X + 1, 2X and X^2 are

y                0     1     2
Pr(X + 1 = y)    1/6   1/3   1/2

y             −2    0     2
Pr(2X = y)    1/6   1/3   1/2

y              0     1
Pr(X^2 = y)    1/3   2/3

23.5 Sums and products
From random variables X and Y we can define a new random variable Z = X + Y. Working out the distribution of Z can be complicated, however. We give an example below of doing this when X and Y are independent.

Example. Let X and Y be independent random variables with

x            0     1     2
Pr(X = x)    1/4   1/2   1/4

y            0     1     2     3
Pr(Y = y)    1/6   1/3   1/3   1/6

Let Z = X + Y. To find Pr(Z = z) for some value of z, we must consider all the ways that X + Y could equal z. For example, X + Y = 3 could occur as (X, Y) = (0, 3), (X, Y) = (1, 2) or (X, Y) = (2, 1). Because X and Y are independent, we can find the probability that each of these occur:
\[ \Pr(X = 0 \wedge Y = 3) = \tfrac{1}{4} \times \tfrac{1}{6} = \tfrac{1}{24}, \]
\[ \Pr(X = 1 \wedge Y = 2) = \tfrac{1}{2} \times \tfrac{1}{3} = \tfrac{1}{6}, \]
\[ \Pr(X = 2 \wedge Y = 1) = \tfrac{1}{4} \times \tfrac{1}{3} = \tfrac{1}{12}. \]
So, because the three are mutually exclusive,
\[ \Pr(Z = 3) = \tfrac{1}{24} + \tfrac{1}{6} + \tfrac{1}{12} = \tfrac{7}{24}. \]
Doing similar calculations for each possible value, we see that the distribution of Z is

z            0      1     2      3      4     5
Pr(Z = z)    1/24   1/6   7/24   7/24   1/6   1/24

The distribution of a product of two independent random variables can be found in a similar way.
Finding the distribution of sums or products of dependent random variables is even more complicated. In general, this requires knowing the probability of each possible combination of values the variables can take.

Questions
23.1 An elevator is malfunctioning. Every minute it is equally likely to ascend one floor, descend one floor, or stay where it is. When it begins malfunctioning it is on level 5. Let X be the level it is on three minutes later. Find the probability distribution for X.
23.2 An integer is generated uniformly at random from the set {11, 12, . . . , 30}. Let X and Y be its first and second (decimal) digit. Are the random variables X and Y independent?
23.3 Let X and Y be independent random variables with distributions

x            0     2
Pr(X = x)    1/4   3/4

y            0     1     2     3
Pr(Y = y)    1/4   1/4   1/4   1/4

Find the probability distribution of Z = X + Y.
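The procedure used in Section 23.5 for the sum of two independent random variables (consider every pair of values, multiply the probabilities, and add up the pairs with the same total) can be sketched in code. This is an illustration only; the function name `sum_distribution` and the dictionary representation of a distribution are assumptions, not notation from the notes.

```python
from collections import defaultdict
from fractions import Fraction


def sum_distribution(dist_x, dist_y):
    """Distribution of X + Y for independent X and Y given as {value: probability}."""
    dist_z = defaultdict(Fraction)  # missing totals start at probability 0
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            # Independence: Pr(X = x and Y = y) = Pr(X = x) Pr(Y = y).
            dist_z[x + y] += px * py
    return dict(sorted(dist_z.items()))


X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {0: Fraction(1, 6), 1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 6)}
Z = sum_distribution(X, Y)
print(Z)
```

Replacing `x + y` by `x * y` gives the distribution of the product in the same way. The sketch relies on independence, so it does not apply to dependent variables.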
Lecture 24: Expectation and variance
A standard die is rolled some number of times and the average of the rolls is calculated. If the die is rolled only once this average is just the value rolled and is equally likely to be 1, 2, 3, 4, 5 or 6. If the die is rolled ten times, then the average might be between 1 and 2 but this is pretty unlikely – it’s much more likely to be between 3 and 4. If the die is rolled ten thousand times, then we can be almost certain that the average will be very close to 3.5. We will see that 3.5 is the expected value of a random variable representing the die roll.

24.1 Expected value
When we said “average” above, we really meant “mean”. Remember that the mean of a collection of numbers is the sum of the numbers divided by how many of them there are. So the mean of x1, . . . , xt is (x1 + · · · + xt)/t. The mean of 2, 2, 3 and 11 is (2 + 2 + 3 + 11)/4 = 4.5, for example.
The expected value of a random variable is calculated as a weighted average of its possible values.

If X is a random variable with distribution

x            x1   x2   · · ·   xt
Pr(X = x)    p1   p2   · · ·   pt

then the expected value of X is
E[X] = p1 x1 + p2 x2 + · · · + pt xt.

Example. If X is a random variable representing a die roll, then
\[ E[X] = \tfrac{1}{6} \times 1 + \tfrac{1}{6} \times 2 + \cdots + \tfrac{1}{6} \times 6 = 3.5. \]

Example. Someone estimates that each year the share price of Acme Corporation has a 10% chance of increasing by $10, a 50% chance of increasing by $4, and a 40% chance of falling by $10. Assuming that this estimate is good, are Acme shares likely to increase in value over the long term?
We can represent the change in the Acme share price by a random variable X with distribution

x            −10   4     10
Pr(X = x)    2/5   1/2   1/10

Then
\[ E[X] = \tfrac{2}{5} \times (-10) + \tfrac{1}{2} \times 4 + \tfrac{1}{10} \times 10 = -1. \]
Because this value is negative, Acme shares will almost certainly decrease in value over the long term.
Notice that it was important that we weighted our average using the probabilities here. If we had just taken the average of −10, 4 and 10 we would have gotten the wrong answer by ignoring the fact that some values were more likely than others.

24.2 Law of large numbers
Our initial die-rolling example hinted that the average of a large number of independent trials will get very close to the expected value. This is mathematically guaranteed by a famous theorem called the law of large numbers.

Let X1, X2, . . . be independent random variables, all with the same distribution and expected value µ. Then
\[ \lim_{n \to \infty} \frac{1}{n}(X_1 + \cdots + X_n) = \mu. \]

24.3 Linearity of expectation
We saw in the last lecture that adding random variables can be difficult. Finding the expected value of a sum of random variables is easy if we know the expected values of the variables.

If X and Y are random variables, then
E[X + Y] = E[X] + E[Y].

This works even if X and Y are not independent.
Similarly, finding the expected value of a scalar multiple of a random variable is easy if we know the expected value of the variable.

If X is a random variable and s ∈ R, then
E[sX] = sE[X].

Example. Two standard dice are rolled. What is the expected total?
Let X1 and X2 be random variables representing the first and second die rolls. From the
earlier example E[X1] = E[X2] = 3.5 and so
E[X1 + X2] = E[X1] + E[X2] = 3.5 + 3.5 = 7.

Example. What is the expected number of ‘11’ substrings in a binary string of length 5 chosen uniformly at random?
For i = 1, . . . , 4, let Xi be a random variable that is equal to 1 if the ith and (i + 1)th bits of the string are both 1 and is equal to 0 otherwise. Then X1 + · · · + X4 is the number of ‘11’ substrings in the string. Because the bits are independent, Pr(Xi = 1) = 1/2 × 1/2 = 1/4 and E[Xi] = 1/4 for i = 1, . . . , 4. So,
\[ E[X_1 + \cdots + X_4] = E[X_1] + \cdots + E[X_4] = \tfrac{4}{4} = 1. \]
Note that the variables X1, . . . , X4 in the above example were not independent, but we were still allowed to use linearity of expectation.

24.4 Variance
Think of the random variables X, Y and Z whose distributions are given below.

x            −1       99
Pr(X = x)    99/100   1/100

y            −1    1
Pr(Y = y)    1/2   1/2

z            −50   50
Pr(Z = z)    1/2   1/2

These variables are very different. Perhaps X corresponds to buying a raffle ticket, Y to making a small bet on a coin flip, and Z to making a large bet on a coin flip. However, if you only consider expected value, all of these variables look the same – they each have expected value 0.
To give a bit more information about a random variable we can define its variance, which measures how “spread out” its distribution is.

If X is a random variable with E[X] = µ, then
\[ \mathrm{Var}[X] = E[(X - \mu)^2]. \]

So the variance is a measure of how much we expect the variable to differ from its expected value.

Example. The variable X above will be 1 smaller than its expected value with probability 99/100 and will be 99 larger than its expected value with probability 1/100. So
\[ \mathrm{Var}[X] = \tfrac{99}{100} \times (-1)^2 + \tfrac{1}{100} \times 99^2 = 99. \]
Similarly,
\[ \mathrm{Var}[Y] = \tfrac{1}{2} \times (-1)^2 + \tfrac{1}{2} \times 1^2 = 1 \]
\[ \mathrm{Var}[Z] = \tfrac{1}{2} \times (-50)^2 + \tfrac{1}{2} \times 50^2 = 2500. \]
Notice that the variance of X is much smaller than the variance of Z because X is very likely to be close to its expected value whereas Z will certainly be far from its expected value.

Example. Let X be a random variable with distribution given by

x            0     2     6
Pr(X = x)    1/6   1/2   1/3

Then the expected value of X is
\[ E[X] = \tfrac{1}{6} \times 0 + \tfrac{1}{2} \times 2 + \tfrac{1}{3} \times 6 = 3. \]
So, the variance of X is
\[ \mathrm{Var}[X] = \tfrac{1}{6} \times (0 - 3)^2 + \tfrac{1}{2} \times (2 - 3)^2 + \tfrac{1}{3} \times (6 - 3)^2 = 5. \]

Questions
24.1 Do you agree or disagree with the following statement? “The expected value of a random variable is the value it is most likely to take.”
24.2 Let X be the sum of 1000 spins of the spinner from Lecture 21, and let Y be 1000 times the result of a single spin. Find E[X] and E[Y]. Which of X and Y do you think would have greater variance?
24.3 Let X be the number of heads occurring when three fair coins are flipped. Find E[X] and Var[X].
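The definitions of expected value and variance translate directly into code. The sketch below is illustrative only: it assumes a distribution is stored as a dictionary from values to probabilities, and the names `expected_value`, `variance`, `die`, `raffle` and `example` are hypothetical labels for the die roll, the raffle-ticket variable X and the final worked example of this lecture.

```python
from fractions import Fraction


def expected_value(dist):
    """E[X] = p1*x1 + ... + pt*xt for a distribution {value: probability}."""
    return sum(p * x for x, p in dist.items())


def variance(dist):
    """Var[X] = E[(X - mu)^2], where mu = E[X]."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


die = {k: Fraction(1, 6) for k in range(1, 7)}
raffle = {-1: Fraction(99, 100), 99: Fraction(1, 100)}
example = {0: Fraction(1, 6), 2: Fraction(1, 2), 6: Fraction(1, 3)}

print(expected_value(die))                         # expected value of a die roll
print(expected_value(raffle), variance(raffle))    # the raffle-ticket variable
print(expected_value(example), variance(example))  # the last worked example
```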
Lecture 25: Discrete distributions
In this lecture we’ll introduce some of the most common and useful (discrete) probability distributions. These arise in various different real-world situations.

25.1 Discrete uniform distribution
This type of distribution arises when we choose one of a set of consecutive integers so that all choices are equally likely.

The discrete uniform distribution with parameters a, b ∈ Z (a ≤ b) is given by
\[ \Pr(X = k) = \frac{1}{b - a + 1} \quad \text{for } k \in \{a, a + 1, \ldots, b\}. \]
We have
\[ E[X] = \frac{a + b}{2} \quad \text{and} \quad \mathrm{Var}[X] = \frac{(b - a + 1)^2 - 1}{12}. \]

[Figure: Uniform distribution with a = 3, b = 8 (plot of Pr(X = k) against k)]

25.2 Bernoulli distribution
This type of distribution arises when we have a single process that succeeds with probability p and fails otherwise. Such a process is called a Bernoulli trial.

The Bernoulli distribution with parameter p ∈ [0, 1] is given by
\[ \Pr(X = k) = \begin{cases} p & \text{for } k = 1 \\ 1 - p & \text{for } k = 0. \end{cases} \]
We have E[X] = p and Var[X] = p(1 − p).

25.3 Geometric distribution
This distribution gives the probability that, in a sequence of independent Bernoulli trials, we see exactly k failures before the first success.

The geometric distribution with parameter p ∈ [0, 1] is given by
\[ \Pr(X = k) = p(1 - p)^k \quad \text{for } k \in \mathbb{N}. \]
We have
\[ E[X] = \frac{1 - p}{p} \quad \text{and} \quad \mathrm{Var}[X] = \frac{1 - p}{p^2}. \]

[Figure: Geometric distribution with p = 0.5 (plot of Pr(X = k) against k)]

Example. If every minute there is a 1% chance that your internet connection fails then the probability of staying online for exactly x consecutive minutes is approximated by a geometric distribution with p = 0.01. It follows that the expected value is (1 − 0.01)/0.01 = 99 minutes and the variance is (1 − 0.01)/(0.01)^2 = 9900.

25.4 Binomial distribution
This distribution gives the probability that, in a sequence of n independent Bernoulli trials, we see exactly k successes.

The binomial distribution with parameters n ∈ Z+ and p ∈ [0, 1] is given by
\[ \Pr(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \quad \text{for } k \in \{0, \ldots, n\}. \]
We have E[X] = np and Var[X] = np(1 − p).
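The stated formulas for the mean and variance of the binomial and geometric distributions can be checked numerically from the probability formulas themselves. This is a sketch under stated assumptions: the function names are hypothetical, the binomial check is exact, and the geometric distribution (which has infinitely many values) is truncated at 200 terms, so that check is only approximate.

```python
from fractions import Fraction
from math import comb, isclose


def binomial_pmf(n, p):
    """Pr(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def geometric_pmf(p, terms=200):
    """Pr(X = k) = p (1 - p)^k, truncated to the first `terms` values of k."""
    return {k: p * (1 - p) ** k for k in range(terms)}


def mean_and_variance(dist):
    mean = sum(k * pr for k, pr in dist.items())
    var = sum(pr * (k - mean) ** 2 for k, pr in dist.items())
    return mean, var


# Binomial with n = 20, p = 1/2: expect mean np and variance np(1 - p).
bin_mean, bin_var = mean_and_variance(binomial_pmf(20, Fraction(1, 2)))

# Geometric with p = 0.5: expect mean (1 - p)/p and variance (1 - p)/p^2.
geo_mean, geo_var = mean_and_variance(geometric_pmf(0.5))

print(bin_mean, bin_var, geo_mean, geo_var)
```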
[Figure: Binomial distribution with n = 20, p = 0.5 (plot of Pr(X = k) against k)]

Example. If 1000 people search a term on a certain day and each of them has a 10% chance of clicking a sponsored link, then the number of clicks on that link is approximated by a binomial distribution with n = 1000 and p = 0.1. It follows that the expected value is 1000 × 0.1 = 100 clicks and the variance is 1000 × 0.1 × 0.9 = 90.

25.5 Poisson distribution
In many situations where we know that an average of λ events occur per time period, this distribution gives a good model of the probability that k events occur in a time period.

The Poisson distribution with parameter λ ∈ R (where λ > 0) is given by
\[ \Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \in \mathbb{N}. \]
We have E[X] = λ and Var[X] = λ.

[Figure: Poisson distribution with λ = 4 (plot of Pr(X = k) against k)]

Example. If a call centre usually receives 6 calls per minute, then a Poisson distribution with λ = 6 approximates the probability it receives k calls in a certain minute. It follows that the expected value is 6 calls and the variance is 6.

Questions
25.1 There is a 95% chance of a packet being received after being sent down a noisy line, and the packet is resent until it is received. What is the probability that the packet is received within the first three attempts?
25.2 A factory aims to have at most 2% of the components it makes be faulty. What is the probability of a quality control test of 20 random components finding that 2 or more are faulty, if the factory is exactly meeting its 2% target?
25.3 The number of times a machine needs adjusting during a day approximates a Poisson distribution, and on average the machine needs to be adjusted three times per day. What is the probability it does not need adjusting on a particular day?
Lecture 26: Recursion
Just as the structure of the natural numbers supports induction as a method of proof, it supports induction as a method of definition or of computation.
When used in this way, induction is usually called recursion, and one speaks of a recursive definition or a recursive algorithm.

26.1 Recursive Definitions
Many well known functions f(n) are most easily defined in the “base step, induction step” format, because f(n + 1) depends in some simple way on f(n).
The induction step in the definition is more commonly called the recurrence relation for f, and the base step the initial value.

Example. The factorial f(n) = n!
Initial value. 0! = 1.
Recurrence relation. (k + 1)! = (k + 1) × k!

Many programming languages allow this style of definition, and the value of the function is then computed by a descent to the initial value.
For example, to compute 4!, the machine successively computes
4! = 4 × 3!
   = 4 × (3 × 2!)
   = 4 × (3 × (2 × 1!))
   = 4 × (3 × (2 × (1 × 0!)))
which can finally be evaluated since 0! = 1.

Remark. The numbers 4, 3, 2, 1 have to be stored on a “stack” before the program reaches the initial value 0! = 1 which finally enables it to evaluate 4!.
Thus a recursive program, though short, may run slowly and even cause “stack overflow.”

Example. The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, . . .
The nth number F(n) in this sequence is defined by
Initial values. F(0) = 0, F(1) = 1.
Recurrence relation. F(k + 1) = F(k) + F(k − 1).

Remark. Using a recursive program to compute Fibonacci numbers can easily lead to stack overflow, because each value depends on two previous values (each of which depends on another two, and so on).
A more efficient way to use the recursive definition is to use three variables to store F(k + 1), F(k) and F(k − 1). The new values of these variables, as k increases by 1, depend only on the three stored values, not on all the previous values.

26.2 Properties of recursively defined functions
These are naturally proved by induction, using a base step and induction step which parallel those in the definition of the function.

Example. For n ≥ 5, 10 divides n!
Proof
Base step. 5! = 5 × 4 × 3 × 2 × 1 = 10 × 4 × 3, hence 10 divides 5!.
Induction step. We have to show
10 divides k! ⟹ 10 divides (k + 1)!
Since (k + 1)! = (k + 1) × k! by the recurrence relation for factorial, the induction step is clear, and hence the induction is complete.

Example. F(0) + F(1) + · · · + F(n) = F(n + 2) − 1.
Proof
Base step. F(0) = 0 = F(2) − 1, because F(2) = 1.
Induction step. We have to show
F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1.
Well,
F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 2) + F(k + 1) − 1,
   by adding F(k + 1) to both sides
⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1
   since F(k + 2) + F(k + 1) = F(k + 3) by the Fibonacci recurrence relation.
This completes the induction.

26.3 Problems with recursive solutions
Sometimes a problem about n reduces to a problem about n − 1 (or smaller numbers), in which case the solution may be a known recursively defined function.

Example. Find how many n-bit binary strings contain no two consecutive 0s.
We can divide this problem into two cases.
1. Strings which end in 1, e.g. 1101101.
   In this case, the string before the 1 (110110 here) can be any (n − 1) bit string with no consecutive 0s.
2. Strings which end in 0, e.g. 1011010.
   In this case, the string must actually end in 10, to avoid consecutive 0s, but the string before the 10 (10110 here) can be any (n − 2) bit string with no consecutive 0s.
Thus among strings with no consecutive 0s we find
1. Those with n bits ending in 1 correspond to those with (n − 1) bits.
2. Those with n bits ending in 0 correspond to those with (n − 2) bits.
Hence if we let f(n) be the number of such strings with n bits we have
f(n) = f(n − 1) + f(n − 2).
This is the Fibonacci recurrence relation.
It can also be checked that
f(1) = 2 = F(3) and f(2) = 3 = F(4),
hence it follows (by induction) that
f(n) = number of n bit strings with no consecutive 0s = F(n + 2).

Questions
26.1 A function s(n) is defined recursively by
Initial value: s(0) = 0
Recurrence relation: s(n + 1) = s(n) + 2n + 1
Write down the first few values of s(n), and guess what function s is.
26.2 Check that the function s you guessed in Question 26.1 satisfies
s(0) = 0 and s(n + 1) = s(n) + 2n + 1
(This proves by induction that your guess is correct.)
26.3 If a sequence satisfies the Fibonacci recurrence relation,
f(n) = f(n − 1) + f(n − 2),
must it agree with the Fibonacci sequence from some point onward?
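The claim that the number of n-bit strings with no two consecutive 0s equals F(n + 2) can be checked by brute force for small n. The sketch below is illustrative and not from the notes: `fib` computes F(n) iteratively in the spirit of the remark in Section 26.1, although it keeps only two stored values rather than three, and `count_no_consecutive_zeros` is a hypothetical helper that simply examines all 2^n strings.

```python
from itertools import product


def fib(n):
    """F(n) computed iteratively, keeping only the two most recent values."""
    a, b = 0, 1  # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b  # shift the stored pair one step along the sequence
    return a


def count_no_consecutive_zeros(n):
    """Count n-bit strings with no two consecutive 0s by checking all 2^n strings."""
    return sum("00" not in "".join(bits) for bits in product("01", repeat=n))


for n in range(1, 11):
    print(n, count_no_consecutive_zeros(n), fib(n + 2))
```

The brute-force count takes time proportional to 2^n, so it is only a check for small n; the recurrence is what makes larger values practical.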
Lecture 27: Recursive Algorithms
Recursion may be used to define functions whose definition normally involves “· · ·”, to give algorithms for computing these functions, and to prove some of their properties.

27.1 Sums
Example. 1 + 2 + 3 + · · · + n
This is the function f(n) defined by
Initial value. f(1) = 1
Recurrence relation. f(k + 1) = f(k) + (k + 1)

Example. 1 + a + a^2 + · · · + a^n
This is the function g(n) defined by
Initial value. g(0) = 1
Recurrence relation. g(k + 1) = g(k) + a^{k+1}

We can use this relation to prove by induction that
\[ g(n) = \frac{a^{n+1} - 1}{a - 1} \]
(a formula for the sum of a geometric series), provided a ≠ 1.

Proof
Base step. For n = 0, \( 1 = g(0) = \frac{a^{0+1} - 1}{a - 1} \), as required.
Induction step. We want to prove
\[ g(k) = \frac{a^{k+1} - 1}{a - 1} \;\Rightarrow\; g(k + 1) = \frac{a^{k+2} - 1}{a - 1}. \]
Well,
\[ g(k) = \frac{a^{k+1} - 1}{a - 1} \]
\[ \Rightarrow\; g(k + 1) = \frac{a^{k+1} - 1}{a - 1} + a^{k+1} \]
\[ \Rightarrow\; g(k + 1) = \frac{a^{k+1} - 1 + (a - 1)a^{k+1}}{a - 1} = \frac{a^{k+2} + a^{k+1} - a^{k+1} - 1}{a - 1} = \frac{a^{k+2} - 1}{a - 1} \]
as required.
This completes the induction.

27.2 Products
Example. 1 × 2 × 3 × · · · × n
This is the function n! defined recursively by
Initial value. 0! = 1
Recurrence relation. (k + 1)! = (k + 1) × k!

27.3 Sum and product Notation
1 + 2 + 3 + · · · + n is written \( \sum_{k=1}^{n} k \),
1 + a + a^2 + · · · + a^n is written \( \sum_{k=0}^{n} a^k \).
Σ is capital sigma, standing for “sum.”
1 × 2 × 3 × · · · × n is written \( \prod_{k=1}^{n} k \).
Π is capital pi, standing for “product.”

27.4 Binary search algorithm
Given a list of n numbers in order
x1 < x2 < · · · < xn,
we can find whether a given number a is in the list by repeatedly “halving” the list.
The algorithm binary search is specified recursively by a base step and a recursive step.
Base step. If the list is empty, report ‘a is not in the list.’
Recursive step. If the list is not empty, see whether its middle element is a. If so, report ‘a found.’
Otherwise, if the middle element m > a, binary search the list of elements < m.
And if the middle element m < a, binary search the list of elements > m.
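The recursive specification of binary search in Section 27.4 (a base step for the empty list and a recursive step that compares with the middle element) can be written almost word for word in code. This is a minimal sketch, assuming the list is sorted in strictly increasing order as in the notes; the function name and the choice to return True or False instead of printing a report are illustrative.

```python
def binary_search(items, a):
    """Return True if a is in the sorted list items, False otherwise."""
    # Base step: an empty list cannot contain a.
    if not items:
        return False
    # Recursive step: compare a with the middle element m.
    mid = len(items) // 2
    m = items[mid]
    if m == a:
        return True
    if m > a:
        # Search the elements smaller than m.
        return binary_search(items[:mid], a)
    # Otherwise m < a: search the elements larger than m.
    return binary_search(items[mid + 1:], a)


evens = list(range(0, 2000, 2))
print(binary_search(evens, 998), binary_search(evens, 999))
```

Slicing copies part of the list at every step, which keeps the code close to the specification but is not how an efficient implementation would be written; passing index bounds instead would avoid the copies.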
27.5 Correctness
We prove that the algorithm works on a list of n items by strong induction on n.
Base step. The algorithm works correctly on a list of 0 numbers, by reporting that a is not in the list.
Induction step. Assuming the algorithm works correctly on any list of < k + 1 numbers, suppose we have a list of k + 1 numbers.
The recursive step either finds a as the middle number in the list, or else produces a list of < k + 1 numbers to search, which by assumption it will do correctly.
This completes the induction.

Remark. This example shows how easy it is to prove correctness of recursive algorithms, which may be why they are popular despite the practical difficulties in implementing them.

27.6 Running time
log2 n is the number x such that
n = 2^x.
For example, 1024 = 2^10, and therefore
log2 1024 = 10.
Similarly log2 512 = 9, and hence log2 1000 is between 9 and 10.
Repeatedly dividing 1000 by 2 (and discarding remainders of 1) runs for 9 steps:
500, 250, 125, 62, 31, 15, 7, 3, 1
The 10 halving steps for 1024 are
512, 256, 128, 64, 32, 16, 8, 4, 2, 1
This means that the binary search algorithm would do at most 9 “halvings” in searching a list of 1000 numbers and at most 10 “halvings” for 1024 numbers.
More generally, binary search needs at most ⌊log2 n⌋ “halvings” to search a list of n numbers, where ⌊log2 n⌋ is the floor of log2 n, the greatest integer ≤ log2 n.

Remark. In a telephone book with 1,000,000 names, which is just under 2^20, it takes at most 20 halvings (using alphabetical order) to find whether a given name is present.

27.7 20 questions
A mathematically ideal way to play 20 questions would be to divide the number of possibilities in half with each question.
E.g. if the answer is an integer, do binary search on the list of possible answers. If the answer is a word, do binary search on the list of possible answers (ordered alphabetically). If this is done, then 20 questions suffice to find the correct answer out of 2^20 = 1,048,576 possibilities.

Questions
27.1 Rewrite the following sums using Σ notation.
• 1 + 4 + 9 + 16 + · · · + n^2
• 1 − 2 + 3 − 4 + · · · − 2n
27.2 Which of the proofs in this lecture uses strong induction?
27.3 Imagine a game where the object is to identify a natural number between 1 and 2^20 using 20 questions with YES-NO answers. The lecture explains why 20 questions are sufficient to identify any such number.
Explain why less than 20 YES-NO questions are not always sufficient.
Lecture 28: Recursion, lists and sequences
A list or sequence of objects from a set X is a function f from {1, 2, . . . , n} to X, or (if infinite) from {1, 2, 3, . . .} to X.
We usually write f(k) as x_k and the list as ⟨x_1, x_2, . . . , x_n⟩, or ⟨x_1, x_2, x_3, . . .⟩. Thus
f(1) = x_1 = first item on list
f(2) = x_2 = second item on list
  ⋮
The empty list is written ⟨⟩.

Example. ⟨m, a, t, h, s⟩ is a function f from {1, 2, 3, 4, 5} into the English alphabet, with f(1) = m, f(2) = a, etc.

28.1 Sequences
A sequence is also a list, but when we use the term sequence we are usually interested in the rule by which the successive terms t_1, t_2, . . . are defined.
Often, the rule is a recurrence relation.

Example. Arithmetic sequence
a, a + d, a + 2d, a + 3d, . . .
This is defined by
Initial value. t_1 = a
Recurrence relation. t_{k+1} = t_k + d
“Unfolding” this recurrence relation from t_n back to t_1, we see that d gets added n − 1 times, hence
t_n = a + (n − 1)d.

Example. Geometric sequence
a, ar, ar^2, ar^3, . . .
Initial value. t_1 = a
Recurrence relation. t_{k+1} = r t_k
“Unfolding” t_n, we see that multiplication by r is done n − 1 times, hence
t_n = a r^{n−1}.

The above recurrence relations are called first order because t_{k+1} depends on only the previous value, t_k. (Or, because the values of all terms follow from one initial value.)
A second order recurrence relation requires two initial values, and is usually harder to unfold.

Example. A simple sequence in disguise
Initial values. t_0 = 1, t_1 = 2
Recurrence relation. t_{k+1} = 2t_k − t_{k−1}
Calculating the first values, we find
t_2 = 2t_1 − t_0 = 2 × 2 − 1 = 3,
t_3 = 2t_2 − t_1 = 2 × 3 − 2 = 4,
t_4 = 2t_3 − t_2 = 2 × 4 − 3 = 5.
It looks like t_n = n + 1, and indeed we can prove this by induction. For the base step we have the initial values t_0 = 1 = 0 + 1 and t_1 = 2 = 1 + 1. We do the induction step by strong induction: assuming t_n = n + 1 for all n ≤ k, we deduce that t_{k+1} = k + 2.
In fact we have
t_{k+1} = 2t_k − t_{k−1}   by the recurrence relation
        = 2(k + 1) − k     by our assumption
        = 2k + 2 − k = k + 2
as required.
This completes the induction.

Example. Fibonacci sequence
Initial values. t_0 = 0, t_1 = 1
Recurrence relation. t_{k+1} = t_k + t_{k−1}
It is possible to write t_n directly as a function of n. The function is not at all obvious, because it involves √5:
\[ t_n = \frac{1}{\sqrt{5}} \left( \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right). \]
We do not go into how to find such a formula in this unit. However, if someone gave you such a formula you could prove it is correct by induction.

28.2 Relations – homogeneous and inhomogeneous
Recurrence relations such as
t_{k+1} = 2t_k
and
t_{k+1} = t_k + t_{k−1}
in which each term is a multiple of some t_j, are called homogeneous.
The characteristic property of any homogeneous equation is that if t_n = f(n) is a solution, then so is t_n = c f(n), for any constant c.
E.g. t_n = 2^n is a solution of t_{k+1} = 2t_k, and so is t_n = c 2^n, for any constant c.
Relations like t_{k+1} = t_k + 3, in which there is a term other than the t_j terms, are called inhomogeneous.
Homogeneous recurrence relations are usually easier to solve, and in fact there is a general method for solving them (which we will not cover in this unit).
There is no general method for solving inhomogeneous recurrence relations, though they can often be solved if the term other than the t_j terms is simple.

Questions
28.1 Find the next four values of each of the following recurrence relations. What order is each recurrence relation? Which are homogeneous and which are inhomogeneous?
(a) r_{k+1} = r_k + k^2, r_0 = 0.
(b) s_{k+1} = 3s_k − 2s_{k−1}, s_0 = 1, s_1 = 2.
(c) t_{k+1} = t_k + t_{k−2} + 1, t_0 = 1, t_1 = 1, t_2 = 1.
28.2 Let T_n be the number of ways of tiling a 2 × n strip with 2 × 1 tiles (which may be rotated so they are 1 × 2). Find T_n for n = 1, 2, 3, 4. Find a recurrence relation for T_n.
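The closed forms quoted in this lecture for second order recurrences (t_n = n + 1 for the “simple sequence in disguise”, and the √5 formula for the Fibonacci sequence) can be compared against terms computed directly from the recurrence relations. This is an illustrative sketch; the helper names `unfold` and `binet` are hypothetical, and floating point arithmetic means the √5 formula is only checked after rounding, for small n.

```python
from math import sqrt


def unfold(initial, step, n):
    """Return [t_0, ..., t_n] for a recurrence t_{k+1} = step(t_k, t_{k-1})."""
    terms = list(initial)  # the two initial values t_0 and t_1
    while len(terms) <= n:
        terms.append(step(terms[-1], terms[-2]))
    return terms[: n + 1]


def binet(n):
    """The closed form for the nth Fibonacci number, evaluated in floating point."""
    return (((1 + sqrt(5)) / 2) ** n - ((1 - sqrt(5)) / 2) ** n) / sqrt(5)


# t_0 = 1, t_1 = 2, t_{k+1} = 2 t_k - t_{k-1}
disguised = unfold((1, 2), lambda tk, tk_prev: 2 * tk - tk_prev, 10)

# t_0 = 0, t_1 = 1, t_{k+1} = t_k + t_{k-1}
fibonacci = unfold((0, 1), lambda tk, tk_prev: tk + tk_prev, 10)

print(disguised)
print(fibonacci, [round(binet(n)) for n in range(11)])
```

Agreement on finitely many terms is evidence rather than proof; the induction arguments described in the lecture are what establish the formulas for all n.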

Lecture 29: Graphs
A graph consists of a set of objects called vertices and a list of pairs of vertices, called edges.
Graphs are normally represented by pictures, with vertex A represented by a dot labelled A and each edge AB represented by a curve joining A and B.
Such pictures are helpful for displaying data or relationships, and they make it easy to recognise properties which might otherwise not be noticed.
The description by lists of vertices and edges is useful when graphs have to be manipulated by computer. It is also a useful starting point for precise definitions of graph concepts.

29.1 Examples of graphs

Description
Vertices: A, B, C
Edges: AB, BC, CA
[Picture: the graph drawn with vertices A, B and C]

Such a graph, with at most one edge between each pair of vertices, and no vertex joined to itself, is called a simple graph.

Description
Vertices: A, B, C, D
Edges: AB, AB, BC, BC, AD, BD, CD
[Picture: the graph drawn with vertices A, B, C and D]

The two edges AB which join the same pair of vertices are called parallel edges.

Description
Vertices: A, B
Edges: AA, AB, AB
[Picture: the graph drawn with vertices A and B]

The edge joining A to A is called a loop.
The name multigraph is used when loops and/or parallel edges are allowed.

Warning: A graph can be represented by pictures that look very different. This last example could be redrawn as:
[Picture: another drawing of the same graph with vertices A and B]

29.2 Problems given by graphs
Many problems require vertices to be connected by a “path” of successive edges. We shall define paths (and related concepts) next lecture, but the following examples illustrate the idea and show how often it comes up.
They also show how helpful it is to have graph pictures when searching for paths.

1. Gray Codes
The binary strings of length n are taken as the vertices, with an edge joining any two vertices that differ in only one digit. This graph is called the n-cube.
E.g. the 2-digit binary strings form a square (a “2-cube”).
[Picture: the 2-cube, with vertices 00, 01, 11, 10]
and the 3-digit binary strings form an ordinary cube (a “3-cube”).
[Picture: the 3-cube, with vertices 000, 001, 010, 011, 100, 101, 110, 111]

A Gray code of length n is a path which includes each vertex of the n-cube exactly once. E.g. here is a path in the 3-cube which gives the Gray code
000, 001, 011, 010, 110, 111, 101, 100
[Picture: the 3-cube with this path marked]
Remark. The n-cube has been popular as a
computer architecture in recent years. Proces-
sors are placed at the vertices of an n-cube (for
(0, 0, 6) (0, 3, 3)
n = 15 or so) and connected along its edges. (3, 0, 3)
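The definition of a Gray code in item 1 can be checked mechanically. The sketch below builds a candidate list with the bit formula `i XOR (i >> 1)`, a standard construction (the "reflected binary" code) that is not described in these notes and is used here as an assumption, and then verifies directly that the list is a path through every vertex of the n-cube. For n = 3 it reproduces the list given above.

```python
def gray_code(n):
    """Candidate Gray code of length n >= 1 via the reflected-binary formula."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


def differ_in_one_digit(u, v):
    """True if the equal-length strings u and v are joined by an edge of the n-cube."""
    return sum(a != b for a, b in zip(u, v)) == 1


def is_gray_code(strings, n):
    """True if `strings` visits each vertex of the n-cube exactly once along edges."""
    visits_each_once = (
        len(strings) == 2 ** n
        and len(set(strings)) == 2 ** n
        and all(len(s) == n and set(s) <= {"0", "1"} for s in strings)
    )
    moves_along_edges = all(
        differ_in_one_digit(u, v) for u, v in zip(strings, strings[1:])
    )
    return visits_each_once and moves_along_edges


print(gray_code(3))
print(is_gray_code(gray_code(3), 3))
```

The checker `is_gray_code` depends only on the definition in the notes, so it can be used to test any hand-made candidate, not just the output of `gray_code`.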
2. Travelling salesman problem

Vertices are towns. Two towns are joined by an edge labelled ℓ if there is a road of length ℓ between them. E.g.
[Picture: graph on the towns A, B, C, D with edges labelled 3, 4, 5, 6 and 6]

The problem is to find a path of minimal length which includes all towns, in this case BADC.

3. Jug problem

Suppose we have three jugs, which hold exactly 3, 4 and 6 litres respectively. We fill the 6-litre jug, and then pour from one jug to another, always stopping when the jug being poured to becomes full or when the jug being poured from becomes empty.

Is it possible to reach a state where one jug contains 1 litre and another contains 5 litres?

We represent each possible state by a triple (a, b, c), where
a = number of litres in the 3-litre jug
b = number of litres in the 4-litre jug
c = number of litres in the 6-litre jug

Each state is a vertex, and if state (a′, b′, c′) can be reached from (a, b, c) by pouring as described above, we put a directed edge in the graph:
(a, b, c) → (a′, b′, c′)

If (a, b, c) can also be reached from (a′, b′, c′), we join them by an ordinary edge.

Then, listing the states that can be reached from (0, 0, 6), we find the following graph.
[Picture: graph of reachable states, with vertices including (0, 0, 6), (0, 3, 3), (3, 0, 3), (0, 1, 5), (3, 1, 2), (0, 4, 2), (3, 3, 0) and (2, 4, 0)]

A graph like this, with some directed edges, is called a directed graph or digraph.

This particular digraph shows that
(0, 0, 6) → (0, 4, 2) → (3, 1, 2) → (0, 1, 5)
hence we can start with a full 6-litre jug and in three pourings get 1 litre in the 4-litre jug and 5 litres in the 6-litre jug.

Questions

29.1 Write down descriptions of the following graphs.
[Pictures: three graphs, on the vertices A, B, C, D]

29.2 Draw pictures of graphs with the following descriptions.
(a) Vertices: A, B, C. Edges: AA, BB, CC.
(b) Vertices: A, B, C, D. Edges: AB, AC, AD.

29.3 Use the following picture of the 4-cube to find a Gray code of length 4.
[Picture: the 4-cube]
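The jug problem of Section 29.2 can also be explored by computer, using the state triples (a, b, c) and the pouring rule exactly as stated there. The sketch below searches the states level by level from (0, 0, 6) and returns a shortest sequence of pourings; this search strategy and all names in the code are choices made for this illustration, not part of the notes.

```python
from collections import deque

CAPACITY = (3, 4, 6)  # litres held by the three jugs, in the order (a, b, c)


def pourings(state):
    """Yield every state reachable from `state` by one pouring."""
    for i in range(3):          # jug poured from
        for j in range(3):      # jug poured to
            if i == j:
                continue
            # Stop when the source is empty or the target is full.
            amount = min(state[i], CAPACITY[j] - state[j])
            if amount > 0:
                new = list(state)
                new[i] -= amount
                new[j] += amount
                yield tuple(new)


def shortest_pouring_sequence(start, is_goal):
    """Search level by level; return a shortest list of states, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if is_goal(state):
            path = []
            while state is not None:   # walk back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in pourings(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def one_and_five(state):
    """One jug holds 1 litre and another holds 5 litres."""
    return 1 in state and 5 in state


print(shortest_pouring_sequence((0, 0, 6), one_and_five))
```

The dictionary `parent` doubles as the record of states already seen, so each state is examined once.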
Lecture 30: Walks, paths and trails
There are several ways of “travelling” around the edges of a graph.

A walk is a sequence
$$V_1, e_1, V_2, e_2, V_3, e_3, \ldots, e_{n-1}, V_n,$$
where each $e_i$ is an edge joining vertex $V_i$ to vertex $V_{i+1}$. (In a simple graph, where at most one edge joins $V_i$ and $V_{i+1}$, it is sufficient to list the vertices alone.)

If $V_n = V_1$ the walk is said to be closed.
A path is a walk with no repeated vertices.
A trail is a walk with no repeated edges.

30.1 Examples

In these pictures, a walk is indicated by a directed curve running alongside the actual edges in the walk.

[Picture] A walk which is not a trail or a path. (Repeated edge, repeated vertex.)

[Picture] A trail which is not a path. (Repeated vertex.)

[Picture] A nonclosed walk and a closed walk.

30.2 Adjacency matrix

If two vertices are joined by an edge we say that they are adjacent.

A simple graph G with vertices $V_1, V_2, \ldots, V_n$ is described by an adjacency matrix which has (i, j) entry (ith row and jth column) $a_{ij}$ given by
$$a_{ij} = \begin{cases} 1 & \text{if } V_i \text{ is adjacent to } V_j \text{ in } G, \\ 0 & \text{otherwise.} \end{cases}$$

For example, the graph
[Picture: vertices $V_1$, $V_2$, $V_3$, with $V_3$ joined to $V_1$ and to $V_2$]
has adjacency matrix
$$\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

30.3 Adjacency matrix powers

The product of matrices
$$\begin{pmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} \times \begin{pmatrix} b_{11} & b_{12} & \cdots \\ b_{21} & b_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix}$$
is the matrix whose (i, j) entry is
$$a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots,$$
that is, the “dot product” of the ith row
$$\begin{pmatrix} a_{i1} & a_{i2} & a_{i3} & \cdots \end{pmatrix}$$
of the matrix on the left with the jth column
$$\begin{pmatrix} b_{1j} \\ b_{2j} \\ b_{3j} \\ \vdots \end{pmatrix}$$
of the matrix on the right.

The (i, j) entry in the kth power of the adjacency matrix gives the number of walks of length k between $V_i$ and $V_j$. The length of a walk is the number of “steps” (edges) in it.

For example, suppose we want the number of walks of length 2 from $V_3$ to $V_3$ in the graph
[Picture: vertices $V_1$, $V_2$, $V_3$, with $V_3$ joined to $V_1$ and to $V_2$]
The adjacency matrix M tells us that the following edges exist.
$$M = \begin{pmatrix} \cdots & \cdots & 1 \\ \cdots & \cdots & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
The 1 in row 1 is the edge $V_1$ to $V_3$, and the 1 in row 2 is the edge $V_2$ to $V_3$. The two 1s in row 3 are the edges $V_3$ to $V_1$ and $V_3$ to $V_2$.

So when we square this matrix, the (3, 3) entry in $M^2$
$$\begin{pmatrix} 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 1 \times 1 + 1 \times 1 = 2$$
counts the walks from $V_3$ to $V_3$, namely
$V_3 \to V_1 \to V_3$ and $V_3 \to V_2 \to V_3$.

Similarly, the (i, j) entry in $M^2$ is the number of walks of length 2 from $V_i$ to $V_j$. The (i, j) entry in $M^3$ is the number of walks of length 3 from $V_i$ to $V_j$, and so on.

In fact,
$$M^2 \times M = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \times \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$
has (3, 2) entry
$$\begin{pmatrix} 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 2.$$
Hence the number of walks of length 3 from $V_3$ to $V_2$ is 2.

30.4 General adjacency matrix

The adjacency matrix can be generalised to multigraphs by making the (i, j) entry the number of edges from $V_i$ to $V_j$ (special case: count each loop twice).

For example, the graph
[Picture: vertices $V_1$ and $V_2$, with a loop at $V_1$ and two edges from $V_1$ to $V_2$]
has adjacency matrix
$$N = \begin{pmatrix} 2 & 2 \\ 2 & 0 \end{pmatrix}$$
and so
$$N^2 = \begin{pmatrix} 8 & 4 \\ 4 & 4 \end{pmatrix}.$$

The (1, 1) entry 2 × 2 + 2 × 2 in $N^2$, for example, indicates that there are 8 walks of length 2 from $V_1$ to $V_1$: 4 walks twice around the loop, and 4 walks from $V_1$ to $V_2$ (2 ways) then $V_2$ to $V_1$ (2 ways).

This count distinguishes between different directions around the loop. It may help to regard the loop as a pair of opposite directed loops.

We can generalise the adjacency matrix M to directed graphs by letting the (i, j) entry be the number of directed edges (which include ordinary edges) from $V_i$ to $V_j$.

With this definition, the (i, j) entry of $M^k$ gives the number of directed walks of length k from $V_i$ to $V_j$ (i.e. walks that obey the directions of edges).

Questions

30.1 Draw the graph/digraph/multigraph with adjacency matrix
$$M = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix},$$
using $V_1$, $V_2$ and $V_3$ as names for the vertices corresponding to columns 1, 2 and 3 respectively.

30.2 Calculate $M^2$, and use it to find the number of walks of length 2 from $V_1$ to $V_3$. Does this calculation give the number you would expect from the graph?

30.3 Without any calculation, show that the middle row of any power $M^k$ is (0 1 0).
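The walk counts of Sections 30.3 and 30.4 can be reproduced with a few lines of code. The sketch below multiplies matrices stored as lists of rows, and also counts walks in the simple graph by direct recursion so that the two answers can be compared. Indices in the code start at 0, so vertex $V_3$ is index 2; the function names are illustrative only.

```python
def mat_mul(a, b):
    """Matrix product: entry (i, j) is the dot product of row i of a with column j of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(m, k):
    """k-th power of a square matrix, by repeated multiplication (k >= 0)."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def count_walks(m, i, j, k):
    """Number of walks of length k from vertex i to vertex j, counted step by step."""
    if k == 0:
        return 1 if i == j else 0
    # First step goes to some vertex v (m[i][v] ways), then k - 1 more steps.
    return sum(m[i][v] * count_walks(m, v, j, k - 1) for v in range(len(m)))


M = [[0, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]   # the simple graph of Section 30.2
N = [[2, 2],
     [2, 0]]      # the multigraph of Section 30.4 (loop counted twice)

print(mat_pow(M, 2)[2][2], count_walks(M, 2, 2, 2))  # walks of length 2, V3 to V3
print(mat_pow(M, 3)[2][1], count_walks(M, 2, 1, 3))  # walks of length 3, V3 to V2
print(mat_pow(N, 2))
```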
Lecture 31: Degree
The degree of a vertex A in a graph G is the number of times A occurs in the list of edges of G.

For example, if G is
[Picture: vertices A and B, with a loop at A and two edges from A to B]
then the list of edges is AA, AB, AB, and hence degree(A) = 4.

Intuitively speaking, degree(A) is the number of “ends” of edges occurring at A. In particular, a loop at A contributes 2 to the degree of A.

31.1 The handshaking lemma

In any graph,

sum of degrees = 2 × number of edges.

The reason for the name is that if each edge is viewed as a handshake,
[Picture: an edge drawn as two hands shaking]
then at each vertex V

degree(V) = number of hands.

Hence

sum of degrees = total number of hands
               = 2 × number of handshakes.

An important consequence

The handshaking lemma implies that in any graph the sum of degrees is even (being 2 × something). Thus it is impossible, e.g., for a graph to have degrees 1, 2, 3, 4, 5.

31.2 The seven bridges of Königsberg

In 18th century Königsberg there were seven bridges connecting islands in the river to the banks as follows.
[Picture: map of the river, islands and seven bridges]

The question came up: is it possible for a walk to cross all seven bridges without crossing the same bridge twice?

An equivalent question is whether there is a trail which includes all edges in the following graph.
[Picture: graph with A at the top, B and D in the middle, C at the bottom]

31.3 Euler’s solution

Euler (1737) observed that the answer is no, because

1. Each time a walk enters and leaves a vertex it “uses up” 2 from the degree.
2. Hence if all edges are used by the walk, all vertices except the first and last must have even degree.
3. The seven bridges graph in fact has four vertices of odd degree.

31.4 Euler’s theorem

The same argument shows in general that

A graph with > 2 odd degree vertices has no trail using all its edges.

And a similar argument shows

A graph with odd degree vertices has no closed trail using all its edges.

(Because in this case the first and last vertex are the same, and its degree is “used up” by a closed trail as follows: 1 at the start, 2 each time through, 1 at the end.)

31.5 The converse theorem

If, conversely, we have a graph G whose vertices all have even degree, is there a closed trail using all the edges of G?

Not necessarily. For example, G might be disconnected:
[Picture: a disconnected graph]

We say a graph is connected if any two of its vertices are connected by a walk (or equivalently, by a trail or a path). We call a trail using all edges of a graph an Euler trail. Then we have

A connected graph with no odd degree vertices has a closed Euler trail.

Such a closed trail can be constructed as follows:

1. Starting at any vertex $V_1$, follow a trail $t_1$ as long as possible.
2. The trail $t_1$ eventually returns to $V_1$, because it can leave any other vertex it enters. (Immediately after the start, $V_1$ has one “used” edge, and hence an odd number of “unused” edges. Any other vertex has an even number of “unused” edges.)
3. If $t_1$ does not use all edges, retrace it to the first vertex $V_2$ where $t_1$ meets an edge not in $t_1$.
4. At $V_2$ add a “detour” to $t_1$ by following a trail out of $V_2$ as long as possible, not using edges in $t_1$. As before, this trail eventually returns to its starting point $V_2$, where we resume the trail $t_1$. Let $t_2$ be the trail $t_1$ plus the detour from $V_2$.
5. If $t_2$ does not use all the edges, retrace $t_2$ to the first vertex $V_3$ where $t_2$ meets an edge not in $t_2$. Add a detour at $V_3$, and so on.
6. Since a graph has only a finite number of edges, this process eventually halts. The result will be a closed trail which uses all the edges (this requires the graph to be connected, since any unused edge would be connected to used ones, and thus would have eventually been used).

31.6 Bridges

A bridge in a connected graph G is an edge whose removal disconnects G. E.g. the edge B is a bridge in the following graph.
[Picture: a connected graph with one edge labelled B]

The construction of an Euler trail is improved by doing the following (Fleury’s algorithm).

• Erase each edge as soon as it is used.
• Use a bridge in the remaining graph only if there is no alternative.

It turns out, when this algorithm is used, that it is not necessary to make any detours. The improvement, however, comes at the cost of needing an algorithm to recognise bridges.

Questions

31.1 For each of the following sequences, construct a graph whose vertices have those degrees, or explain why no such graph exists.
• 1, 2, 3, 4
• 1, 2, 1, 2, 1
• 1, 2, 2, 2, 1

31.2 A graph G has adjacency matrix
$$\begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}$$
Decide, without drawing the graph, whether G is connected or not.

31.3 A graph H has adjacency matrix
$$\begin{pmatrix} 2 & 0 & 1 & 0 & 1 \\ 0 & 2 & 0 & 1 & 0 \\ 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 & 2 \end{pmatrix}$$
What are the degrees of its vertices? Does H have a closed Euler trail?
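The definition of degree and the closed-trail construction of Section 31.5 can be tried out on graphs given as edge lists. The sketch below is a compact stack-based variant of the detour idea (commonly attributed to Hierholzer), not the exact step order of the notes; it returns `None` when some vertex has odd degree or when the edges are not all reachable from the start. The seven bridges edge list is taken from the second example of Section 29.1, whose picture has the same layout; treating it as the same graph is an assumption, as is the small "bowtie" example.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of each vertex: the number of times it occurs in the edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # a loop (u == v) therefore contributes 2
    return deg


def closed_euler_trail(edges, start):
    """Return a closed trail using every edge, as a list of vertices, or None."""
    if any(d % 2 for d in degrees(edges).values()):
        return None  # an odd degree vertex rules out a closed Euler trail
    incident = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        incident[u].append((idx, v))
        if u != v:
            incident[v].append((idx, u))
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        v = stack[-1]
        while incident[v] and used[incident[v][-1][0]]:
            incident[v].pop()  # discard edges already used from the other end
        if incident[v]:
            idx, w = incident[v].pop()
            used[idx] = True  # "erase" the edge as soon as it is used
            stack.append(w)
        else:
            trail.append(stack.pop())  # no unused edge here: back up
    trail.reverse()
    # If some edge was never reached, the graph was not connected.
    return trail if len(trail) == len(edges) + 1 else None


loop_graph = [("A", "A"), ("A", "B"), ("A", "B")]
seven_bridges = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"),
                 ("A", "D"), ("B", "D"), ("C", "D")]
bowtie = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "C")]

print(degrees(loop_graph)["A"], sum(degrees(loop_graph).values()))
print(closed_euler_trail(seven_bridges, "A"))
print(closed_euler_trail(bowtie, "A"))
```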
Lecture 32: Trees
A cycle is a closed trail (with at least one edge) that doesn’t repeat any vertex except that it ends where it started. A tree is a connected graph with no cycles.

For example,
[Picture]
is a tree.

32.1 The number of edges in a tree

A tree with n vertices has n − 1 edges.

The proof is by strong induction on n.

Base step. A tree with 1 vertex has 0 edges (since any loop would itself be a cycle).

Induction step. Supposing any tree with j ≤ k vertices has j − 1 edges, we have to deduce that a tree with k + 1 vertices has k edges.

Well, given a tree $T_{k+1}$ with k + 1 vertices, we consider any edge e in $T_{k+1}$, e.g.
[Picture: a tree containing an edge e with ends A and B]

Removing e disconnects the ends A and B of e. (If they were still connected, by some path p, then p and e together would form a cycle in $T_{k+1}$, contrary to its being a tree.)

Thus $T_{k+1} - \{e\}$ consists of two trees, say $T_i$ and $T_j$ with i and j vertices respectively. We have i + j = k + 1 but both i, j ≤ k, so our induction assumption gives

$T_i$ has i − 1 edges, $T_j$ has j − 1 edges.

But then $T_{k+1} = T_i \cup T_j \cup \{e\}$ has (i − 1) + (j − 1) + 1 = (i + j) − 1 = k edges, as required.

Remarks

1. This proof also shows that any edge in a tree is a bridge.
2. Since a tree has one more vertex than edge, it follows that m trees have m more vertices than edges.
3. The theorem also shows that adding any edge to a tree (without adding a vertex) creates a cycle. (Since the graph remains connected, but has too many edges to be a tree.)

These remarks can be used to come up with several equivalent definitions of tree.

Next we see how any connected graph can be related to trees.

32.2 Spanning trees

A spanning tree of a graph G is a tree contained in G which includes all vertices of G.

For example,
[Picture]
is a spanning tree of
[Picture]

Any connected graph G contains a spanning tree.

This is proved by induction on the number of edges.

Base step. If G has no edge but is connected then it consists of a single vertex. Hence G itself is a spanning tree of G.

Induction step. Suppose any connected graph with ≤ k edges has a spanning tree. We have to find a spanning tree of a connected graph $G_{k+1}$ with k + 1 edges.

If $G_{k+1}$ has no cycle then $G_{k+1}$ is itself a tree, hence a spanning tree of itself.

If $G_{k+1}$ has a cycle p we can remove any edge e from p and $G_{k+1} - \{e\}$ is connected (because vertices previously connected via e are still connected via the rest of p). Since $G_{k+1} - \{e\}$ has one edge less, it contains a spanning tree T by induction, and T is also a spanning tree of $G_{k+1}$.

Remark. It follows from these two theorems that a graph G with n vertices and n − 2 edges (or less) is not connected.

If it were, G would have a spanning tree T, with the same n vertices. But then T would have n − 1 edges, which is impossible, since it is more than the number of edges of G.

32.3 The greedy algorithm

Given a connected graph G with weighted edges, a minimal weight spanning tree T of G may be constructed as follows.

1. Start with T empty.
2. While T is not a spanning tree for G, add to T an edge $e_{k+1}$ of minimal weight among those which do not close a cycle in T, together with the vertices of $e_{k+1}$.

This is also known as Kruskal’s algorithm.

Remarks

1. T is not necessarily a tree at all steps of the algorithm, but it is at the end.
2. For a graph with n vertices, the algorithm runs for n − 1 steps, because this is the number of edges in a tree with n vertices.
3. The algorithm is called “greedy” because it always takes the cheapest step available, without considering how this affects future steps. For example, an edge of weight 4 may be chosen even though this prevents an edge of weight 5 being chosen at the next step.
4. The algorithm always works, though this is not obvious, and the proof is not required for this course. (You can find it, e.g., in Chartrand’s Introductory Graph Theory.)
5. Another problem which can be solved by a “greedy” algorithm is splitting a natural number n into powers of 2. Begin by subtracting the largest such power $2^m \le n$ from n, then repeat the process with $n - 2^m$, etc.

Questions

32.1 Which of the following graphs are trees? In each case we insist that m ≠ n.
• vertices 1, 2, 3, 5, 7; an edge between m and n if m divides n or n divides m
• vertices 1, 2, 3, 4, 5; an edge between m and n if m divides n or n divides m
• vertices 2, 3, 4, 5, 6; an edge between m and n if m divides n or n divides m

32.2 Find spanning trees of the following graphs (cube and dodecahedron).
[Pictures: cube and dodecahedron]

32.3 Also find spanning trees of the cube and dodecahedron which are paths.
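The greedy algorithm of Section 32.3 can be sketched in code as follows. To decide whether an edge would close a cycle in T, the sketch keeps track of which vertices are already connected to each other using a small "union-find" table; that bookkeeping device is a common implementation choice and is not part of the notes. The four-vertex weighted graph is a made-up example.

```python
def kruskal(vertices, weighted_edges):
    """Greedy minimal weight spanning tree; edges are (weight, u, v) triples."""
    parent = {v: v for v in vertices}  # each vertex starts in its own piece of T

    def find(v):
        """Representative of the piece of T that currently contains v."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the chain as we go
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(weighted_edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # u and v not yet connected: no cycle closed
            parent[root_u] = root_v   # merge the two pieces
            tree.append((weight, u, v))
    return tree


# Hypothetical example graph: a triangle A, B, C with an extra vertex D.
vertices = ["A", "B", "C", "D"]
weighted_edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"),
                  (4, "C", "D"), (5, "B", "D")]

tree = kruskal(vertices, weighted_edges)
print(tree, sum(weight for weight, _, _ in tree))
```

In agreement with Remark 2, the result for a connected graph with n vertices has n − 1 edges; here the edge of weight 3 is skipped because it would close the cycle ABC.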
Lecture 33: Trees, queues and stacks
To search a graph G systematically, it helps to have a spanning tree T, together with an ordering of the vertices of T.

33.1 Breadth first ordering

The easiest ordering to understand is called breadth first, because it orders vertices “across” the tree in “levels.”

Level 0 is a given “root” vertex.
Level 1 is the vertices one edge away from the root.
Level 2 is the vertices two edges away from the root,
. . . and so on.

Example.
A, B, C, D, E, F, G is a breadth first ordering of
[Picture: tree with root A; B and C on level 1; D, E, F, G on level 2]

33.2 Queues

Breadth first ordering amounts to putting vertices in a queue - a list processed on a “first come, first served” or “first in, first out” basis.

• The root vertex is first in the queue (hence first out).
• Vertices adjacent to the head vertex v in the queue go to the tail of the queue (hence they come out after v), if they are not already in it.
• The head vertex v does not come out of the queue until all vertices adjacent to v have gone in.

33.3 Breadth first algorithm

For any connected graph G, this algorithm not only orders the vertices of G in a queue Q, it also builds a spanning tree T of G by attaching each vertex v to a “predecessor” among the adjacent vertices of v already in T. An arbitrary vertex is chosen as the root $V_0$ of T.

1. Initially, T = tree with just one vertex $V_0$, Q = the queue containing only $V_0$.
2. While Q is nonempty
   2.1. Let V be the vertex at the head of Q
   2.2. If there is an edge e = VW in G where W is not in T
        2.2.1. Add e and W to T
        2.2.2. Insert W in Q (at the tail).
   2.3. Else remove V from Q.

Remarks

1. If the graph G is not connected, the algorithm gives a spanning tree of the connected component containing the root vertex A, the part of G containing all vertices connected to A.
2. Thus we can recognise whether G is connected by seeing whether all its vertices are included when the algorithm terminates.
3. Being able to recognise connectedness enables us, e.g., to recognise bridges.
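The breadth first algorithm can be followed step by step in code. The sketch below uses the five-vertex graph from the worked example in this lecture; its picture did not survive, so the edge list AB, AC, BC, BD, BE, CE, DE is inferred from the queue and stack states given there and should be read as an assumption. Among several eligible edges VW the code picks W in alphabetical order, another assumption, and vertex names are single letters so that the queue can be printed as a string.

```python
from collections import deque

# Assumed adjacency lists for the example graph G (neighbours in alphabetical order).
G = {
    "A": ["B", "C"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["B", "C", "D"],
}


def breadth_first_tree(adj, root):
    """Return (tree edges, successive nonempty states of the queue Q)."""
    in_tree = {root}
    tree_edges = []
    queue = deque([root])
    states = ["".join(queue)]
    while queue:
        v = queue[0]  # step 2.1: the vertex at the head of Q
        w = next((x for x in adj[v] if x not in in_tree), None)
        if w is not None:  # step 2.2: an edge VW with W not in T
            in_tree.add(w)
            tree_edges.append((v, w))
            queue.append(w)  # insert W at the tail
        else:
            queue.popleft()  # step 2.3: remove V from Q
        if queue:
            states.append("".join(queue))
    return tree_edges, states


edges, states = breadth_first_tree(G, "A")
print(states)
print(edges)
```

By Remark 2, comparing the number of vertices reached (one more than `len(edges)`) with the number of vertices of G gives a test for connectedness.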
Example.
If G =
[Picture: graph with A at the top, B and C in the middle, D and E at the bottom]
with root vertex A, then Q and T grow as follows. (The T column, drawn as pictures in the original, is listed here by its edges.)

Step   Q      T
1      A      vertex A only
2      AB     AB
3      ABC    AB, AC
4      BC     AB, AC
5      BCD    AB, AC, BD
6      BCDE   AB, AC, BD, BE
7      CDE    AB, AC, BD, BE
8      DE     AB, AC, BD, BE
9      E      AB, AC, BD, BE

33.4 Depth first algorithm

This is the same except it has a stack S instead of a queue Q. S is “last in, first out,” so we insert and remove vertices from the same end of S (called the top of the stack).

1. Initially, T = tree with just one vertex $V_0$, S = the stack containing only $V_0$.
2. While S is nonempty
   2.1. Let V be the vertex at the top of S
   2.2. If there is an edge e = VW in G where W is not in T
        2.2.1. Add e and W to T
        2.2.2. Insert W in S (at the top).
   2.3. Else remove V from S.

Remark. The breadth first and depth first algorithms give two ways to construct a spanning tree of a connected graph.

Example.
We use the same G, and take the top of S to be its right hand end.

Step   S       T
1      A       vertex A only
2      AB      AB
3      ABC     AB, BC
4      ABCE    AB, BC, CE
5      ABCED   AB, BC, CE, ED
6      ABCE    AB, BC, CE, ED
7      ABC     AB, BC, CE, ED
8      AB      AB, BC, CE, ED
9      A       AB, BC, CE, ED

Questions

33.1 The following list gives the state, at successive stages, of either a queue or a stack.
A
AB
ABC
BC
BCD
CD
D
Which is it: a queue or a stack?

33.2 Construct a breadth first spanning tree for the graph
[Picture: graph with A at the top, B and D in the middle, C and E at the bottom]

33.3 Construct a depth first spanning tree for the graph in Question 33.2.
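The depth first algorithm of Section 33.4 differs from the breadth first one only in which end of the list is inspected and removed. The sketch below runs it on the five-vertex example graph of this lecture, using the edge list AB, AC, BC, BD, BE, CE, DE inferred from the states in the worked examples (an assumption, since the picture did not survive) and choosing W alphabetically among eligible neighbours (also an assumption).

```python
# Assumed adjacency lists for the example graph G (neighbours in alphabetical order).
G = {
    "A": ["B", "C"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["B", "C", "D"],
}


def depth_first_tree(adj, root):
    """Return (tree edges, successive nonempty states of the stack S)."""
    in_tree = {root}
    tree_edges = []
    stack = [root]  # the top of the stack is the right hand end of the list
    states = ["".join(stack)]
    while stack:
        v = stack[-1]  # step 2.1: the vertex at the top of S
        w = next((x for x in adj[v] if x not in in_tree), None)
        if w is not None:  # step 2.2: an edge VW with W not in T
            in_tree.add(w)
            tree_edges.append((v, w))
            stack.append(w)  # insert W at the top
        else:
            stack.pop()  # step 2.3: remove V from S
        if stack:
            states.append("".join(stack))
    return tree_edges, states


edges, states = depth_first_tree(G, "A")
print(states)
print(edges)
```

Both algorithms add exactly one edge per new vertex, so either spanning tree of this graph has four edges; only the shape of the tree differs.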
Useful notation
Sets of numbers
N the set of natural numbers {0, 1, 2, 3, . . .}
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q the set of rational numbers {a/b : a, b ∈ Z, b ≠ 0}
R the set of real numbers
Number Theory
a|b a divides b b = qa for some q ∈ Z
gcd(a, b) greatest common divisor of a and b
a ≡ b (mod n) a and b are congruent modulo n n | (a − b)
Logic
¬p not p
p∧q p and q
p∨q p or q
p→q p implies q
∀x for all x
∃x there exists x
Sets
x∈A x is an element of A
{x : P (x)} the set of x such that P (x)
|A| the number of elements in A
A⊆B A is a subset of B
A∩B A intersect B {x : x ∈ A ∧ x ∈ B}
A∪B A union B {x : x ∈ A ∨ x ∈ B}
A−B set difference A minus B {x : x ∈ A ∧ x ∉ B}
Functions
f :A→B f is a function from A to B
Probability
Pr(E) probability of E
Pr(A|B) conditional probability of A given B
E[X] expected value of X
Var[X] variance of X
Sums and products
$\sum_{i=a}^{b} f(i)$ sum of f(i) from i = a to i = b f(a) + f(a + 1) + · · · + f(b)
$\prod_{i=a}^{b} f(i)$ product of f(i) from i = a to i = b f(a) × f(a + 1) × · · · × f(b)
Useful formulas
Logic Laws

p ↔ q ≡ (p → q) ∧ (q → p)
p → q ≡ (¬p) ∨ q
¬¬p ≡ p
p ∧ p ≡ p
p ∨ p ≡ p
p ∧ q ≡ q ∧ p
p ∨ q ≡ q ∨ p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q)
p ∧ T ≡ p
p ∨ F ≡ p
p ∧ F ≡ F
p ∨ T ≡ T
p ∧ (¬p) ≡ F
p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p
p ∨ (p ∧ q) ≡ p
¬∀xP(x) ≡ ∃x¬P(x)
¬∃xP(x) ≡ ∀x¬P(x)

Ordered selections without repetition
$$n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

Unordered selections without repetition
$$\frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{r!(n-r)!} = \binom{n}{r}$$

Ordered selections with repetition
$$n^r$$

Unordered selections with repetition
$$\binom{n+r-1}{r} = \frac{(n+r-1)!}{r!(n-1)!}$$

Binomial theorem
$$(x+y)^n = \binom{n}{0}x^n y^0 + \binom{n}{1}x^{n-1}y^1 + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n-1}x^1 y^{n-1} + \binom{n}{n}x^0 y^n$$

Conditional probability
$$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$

Bayes’ theorem
$$\Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|\overline{A})\Pr(\overline{A})}$$

Discrete uniform distribution
$$\Pr(X = k) = \frac{1}{b-a+1} \quad \text{for } k \in \{a, a+1, \ldots, b\}$$
$$E[X] = \frac{a+b}{2}, \qquad \mathrm{Var}[X] = \frac{(b-a+1)^2 - 1}{12}$$

Bernoulli distribution
$$\Pr(X = k) = \begin{cases} p & \text{for } k = 1 \\ 1-p & \text{for } k = 0 \end{cases}$$
$$E[X] = p, \qquad \mathrm{Var}[X] = p(1-p)$$

Geometric distribution
$$\Pr(X = k) = p(1-p)^k \quad \text{for } k \in \mathbb{N}$$
$$E[X] = \frac{1-p}{p}, \qquad \mathrm{Var}[X] = \frac{1-p}{p^2}$$

Binomial distribution
$$\Pr(X = k) = \binom{n}{k}p^k(1-p)^{n-k} \quad \text{for } k \in \{0, \ldots, n\}$$
$$E[X] = np, \qquad \mathrm{Var}[X] = np(1-p)$$

Poisson distribution
$$\Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \in \mathbb{N}$$
$$E[X] = \lambda, \qquad \mathrm{Var}[X] = \lambda$$