Moodle Page
The course’s Moodle page can be accessed through the my.monash portal at
https://my.monash.edu.au/
Unit Guide
The full unit guide can be found on the Moodle page. It contains a topic outline and information on
assessment, special consideration, plagiarism etc.
Course Information
• Lectures: Three hours per week (lectures run in weeks 2–12).
• Applied classes: One per week as allocated (applied classes run in weeks 2–12).
• Required materials: Course notes booklet – available for less than $10 from the Clayton campus
bookshop or available as a pdf from the course Moodle page. Note that there is no required
textbook for the course.
Recordings of the lectures will be available through the Moodle page.
Assessment
• Applied class participation worth 5%
• Ten assignments worth 3.5% each (one due each week from week 3 to week 12).
• Final examination worth 60% (held in the examination period).
In order to pass the course you must receive at least 50% overall AND at least 45% of the exam marks
AND at least 45% of the other marks.
You receive the 5% participation marks if you participate in at least 8 of your 11 applied classes,
otherwise you receive 0%. The assignments will be issued in lectures and will be available from the
course Moodle page. Assignments are to be submitted to your tutor during your applied class. They
will then be marked and returned to you in your next applied class. No calculators or other materials
will be allowed in the final exam.
Discrete mathematics studies objects which have distinct separated values (e.g. integers), as
opposed to objects which vary smoothly (e.g. real numbers). You can think of it as being
“digital” mathematics rather than “analogue” mathematics.
Discrete mathematics is particularly important in computer science and the two fields are
very closely linked.
This course covers a wide range of topics in discrete mathematics including the following:
• Numbers
• Logic
we’ll see, even if you are pretty sure that something is true, it can be really useful to have a
proof of it, for a number of reasons.
1.3 Maths in computer science
As we mentioned above, maths and computer science are very closely related. The topics in
this course all have many applications to computer science. For example:
• Number theory is used in cryptography to enable secure communication, identity
verification, online banking and shopping etc.
Let n ≥ 2 be an integer. We say integers a and b are congruent modulo n and write
a ≡ b (mod n)
when n divides a − b.
In some situations we can also “divide through” a congruence by an integer.
If a ≡ b (mod n) and d divides a, b and n, then
a/d ≡ b/d (mod n/d).
• x − y ≡ 1 (mod 8)
• xy ≡ 6 (mod 8).
We can also deduce that x + 4 ≡ 7 (mod 8), that 4x ≡ 12 (mod 8) and so on, because obviously
4 ≡ 4 (mod 8). Note as well that 4x ≡ 12 (mod 8) can be simplified to 4x ≡ 4 (mod 8).
3.4 Modular inverses
A modular multiplicative inverse of an integer a modulo n is an integer x such that
ax ≡ 1 (mod n).
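As a rough illustration of the definitions above, the following Python sketch checks the “divide through” rule on one example and looks for a modular inverse by trying every candidate. The names congruent and mod_inverse are made up for this sketch, and the brute-force search is used only because it mirrors the definition; it is not meant as an efficient method.

```python
def congruent(a, b, n):
    """Return True when a ≡ b (mod n), i.e. when n divides a - b."""
    return (a - b) % n == 0


def mod_inverse(a, n):
    """Search 0..n-1 for x with a*x ≡ 1 (mod n); return None if there is none."""
    for x in range(n):
        if congruent(a * x, 1, n):
            return x
    return None


# "Dividing through": 12 ≡ 4 (mod 8) and d = 4 divides 12, 4 and 8,
# so 12/4 ≡ 4/4 (mod 8/4), that is, 3 ≡ 1 (mod 2).
assert congruent(12, 4, 8)
assert congruent(12 // 4, 4 // 4, 8 // 4)

print(mod_inverse(3, 8))  # 3, because 3 * 3 = 9 ≡ 1 (mod 8)
print(mod_inverse(4, 8))  # None: 4x - 1 is odd, so 8 never divides it
```

The second call shows that an inverse need not exist for every a and n.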
The simplest and most commonly used part of logic is the logic of “and”, “or” and “not”,
which is known as propositional logic.
A proposition is any sentence which has a definite truth value (true = T or false = F), such
as
1 + 1 = 2, or
10 is a prime number.
but not
What is your name? or
This sentence is false.
Propositions are denoted by letters such as p, q, r, . . . , and they are combined into
compound propositions by connectives such as ∧ (and), ∨ (or) and ¬ (not).
4.1 Connectives ∧, ∨ and ¬
∧, ∨ and ¬ are called “connectives” because they can be used to connect two sentences p and q
into one. These particular connectives are defined so that they agree with the most common
interpretations of the words “and”, “or” and “not”.
To define p ∧ q, for example, we only have to say that p ∧ q is true only when p is true and q
is true.
We define ∧ by the following truth table:
p q p∧q
T T T
T F F
F T F
F F F
Similarly, p ∨ q is true when p is true or q is true, but now we have to be more precise,
because “or” has at least two meanings in ordinary speech.
We define ∨ by the truth table
p q p∨q
T T T
T F T
F T T
F F F
This is the inclusive sense of “p or q” (often written “p and/or q” and meaning at least one of p,
q is true).
Finally, “not” ¬ (also called negation) is defined as follows.
We define ¬ by the truth table
p ¬p
T F
F T
The connectives ∧, ∨ and ¬ are functions of the propositional variables p and q, which can take
the two values T and F. For this reason, ∧, ∨ and ¬ are also called truth functions.
4.2 Implication
Another important truth function is p → q, which corresponds to “if p then q” or “p implies
q” in ordinary speech.
In ordinary speech the value of p → q depends only on what happens when p is true. For
example to decide whether
MCG flooded → the cricket is off
it is enough to see what happens when the MCG is flooded. Thus we agree that p → q is true
when p is false.
We define → by the truth table
p q p→q
T T T
T F F
F T T
F F T
4.3 Other connectives
Two other important connectives are ↔ (“if and only if”) and ⊻ (“exclusive or”).
The sentence p ↔ q is true exactly when the truth values of p and q agree.
We define ↔ by the truth table
p q p↔q
T T T
T F F
F T F
F F T
We could also write p ↔ q as (p → q) ∧ (q → p). We’ll see how to prove this in the next lecture.
The sentence p ⊻ q is true exactly when the truth values of p and q disagree.
3. If we write 0 for F and 1 for T then ⊻ becomes the function
p q p⊻q
1 1 0
1 0 1
0 1 1
0 0 0
This is also known as the “mod 2 sum”, because 1 + 1 = 2 ≡ 0 (mod 2). (It could also be called
the “mod 2 difference” because a + b is the same as a − b, mod 2.)
4. The mod 2 sum occurs in many homes where two switches p, q control the same light. The
truth value of p ⊻ q tells whether the light is on or not, and the light can be switched to the
opposite state by switching the value of either p or q.
Questions
4.1 Which of the following are propositions?
1 + 1 = 3, 1 + 1, 3 divides 7, 3÷7
4.2 Let f be the proposition “foo” and let b be the proposition “bar”. Write the following
propositions in symbols, using f, b, → and ¬.
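The truth tables in this lecture can be reproduced mechanically. The Python sketch below is only an illustration: it represents T and F by True and False, writes each connective as a small function (the dictionary CONNECTIVES and the helper names are invented here), and also checks the remark that exclusive or is the “mod 2 sum”.

```python
from itertools import product

# Each connective as a function of Python booleans (True = T, False = F).
CONNECTIVES = {
    "p ∧ q": lambda p, q: p and q,
    "p ∨ q": lambda p, q: p or q,
    "p → q": lambda p, q: not (p and not q),  # false only when p is T and q is F
    "p ↔ q": lambda p, q: p == q,             # true when the truth values agree
    "p ⊻ q": lambda p, q: p != q,             # true when the truth values disagree
}


def truth_table(f):
    """List the value of f for the rows TT, TF, FT, FF, in that order."""
    return [f(p, q) for p, q in product([True, False], repeat=2)]


def show(value):
    return "T" if value else "F"


for name, f in CONNECTIVES.items():
    print(name, " ".join(show(v) for v in truth_table(f)))

# Exclusive or agrees with the "mod 2 sum" when F is written 0 and T is written 1.
for p, q in product([0, 1], repeat=2):
    assert (bool(p) != bool(q)) == bool((p + q) % 2)
```

Each printed line is the last column of the corresponding truth table, read from the row T T down to the row F F.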
Example. (¬p) ∨ p is a tautology.
The truth table for (¬p) ∨ p is
p ¬p (¬p) ∨ p
T F T
F T T
So (¬p) ∨ p has value T under all interpretations, and thus is a tautology. (It is sometimes known
as the law of the excluded middle.)
We can similarly compute the values of any truth function φ, so this is an algorithm for
recognising tautologies. However, if φ has n variables, they have 2^n sets of values, so the
amount of computation grows rapidly with n.
One of the biggest unsolved problems of logic and computer science is to find an efficient
algorithm for recognising tautologies.
p q ¬p (¬p) ∨ q
T T F T
T F F F
F T T T
F F T T
So p → q and (¬p) ∨ q have the same truth table (looking just at their columns). It follows
from this that p → q can always be rewritten as (¬p) ∨ q. In fact, all truth functions can be
expressed in terms of ∧, ∨, and ¬.
This is like finding identities in algebra – one uses known equivalences to rearrange, expand
and simplify.
5.3 Useful equivalences
The following equivalences are the most frequently used in this “algebra of logic”.
brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we can write either side as p ∨ q ∨ r. This is like
p + (q + r) = (p + q) + r = p + q + r in ordinary algebra.
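The truth-table method for recognising tautologies and equivalences can be sketched in a few lines of Python. This is an illustration under simple assumptions: a truth function is modelled as a Python function of n Boolean arguments, and the names is_tautology, equivalent and implies are chosen for this sketch. The loop visits all 2^n rows, which is exactly why the method becomes slow as n grows.

```python
from itertools import product


def is_tautology(phi, n):
    """Check a truth function phi of n variables on all 2**n rows of its truth table."""
    return all(phi(*row) for row in product([True, False], repeat=n))


def equivalent(phi, psi, n):
    """Two truth functions are equivalent when their truth tables agree on every row."""
    return all(phi(*row) == psi(*row) for row in product([True, False], repeat=n))


def implies(p, q):
    """Truth table of p → q: false only when p is true and q is false."""
    return not (p and not q)


print(is_tautology(lambda p: (not p) or p, 1))              # True
print(equivalent(implies, lambda p, q: (not p) or q, 2))    # True
print(is_tautology(implies, 2))                             # False
print(equivalent(lambda p, q, r: p or (q or r),
                 lambda p, q, r: (p or q) or r, 3))         # True
```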
Example. p ∧ q ⇒ p
p is a logical consequence of p ∧ q, because p = T
whenever p ∧ q = T. However, we can have
p ∧ q = F when p = T (namely, when q = F).
Hence p ∧ q and p are not equivalent.
This example shows that ⇒ is not symmet-
ric:
(p ∧ q) ⇒ p but p ⇏ (p ∧ q)
This is where ⇒ differs from ≡, because if φ ≡ ψ
then ψ ≡ φ.
In fact, we build the relation ≡ from ⇒ the
same way ↔ is built from →:
φ ≡ ψ means (φ ⇒ ψ) and (ψ ⇒ φ).
Lecture 7: Predicates and quantifiers
7.1 Predicates
A predicate such as “n is prime” is not a proposition because it is neither true nor false. Rather,
it is a function P(n) of n with the Boolean values T (true) or F (false). In this case, P(n) is a
function of natural numbers defined by
P(n) = T if n is prime,
       F otherwise.
Similarly, the “x ≤ y” predicate is a function of pairs of real numbers, defined by
R(x, y) = T if x ≤ y,
          F otherwise.
Since most of mathematics involves properties and relations such as these, only a language with
predicates is adequate for mathematics (and computer science).
7.2 Building sentences from predicates
One way to create a sentence from a predicate is to replace its variables by constants. For
example, when P(n) is the predicate “n is prime,” P(3) is the sentence “3 is prime.”
7.3 Quantifiers and connectives
We can also combine quantifiers with connectives from propositional logic.
Example. Let Sq(n) be the predicate “n is a square,” and let Pos(n) be the predicate “n is
positive” as above. Then we can symbolise the following sentences:
There is a positive square:
∃n(Pos(n) ∧ Sq(n)).
There is a positive integer which is not a square:
∃n(Pos(n) ∧ ¬Sq(n))
All squares are positive:
∀n(Sq(n) → Pos(n))
Notice that the “All. . . are” combination in English actually involves an implication. This
is needed because we are making a claim only about squares and the implication serves to
“narrow down” the set we are examining.
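Predicates behave like Boolean-valued functions, so the symbolised sentences above can be evaluated by a program once the quantifiers range over a finite set. The Python sketch below assumes a small sample domain of integers from −10 to 10 (an assumption of this sketch, not part of the notes), so its answers describe that sample only.

```python
from math import isqrt

# Assumption for this sketch: quantifiers range over the finite domain below,
# not over all integers, so the results only reflect this sample.
DOMAIN = range(-10, 11)


def Sq(n):
    """'n is a square' (of an integer)."""
    return n >= 0 and isqrt(n) ** 2 == n


def Pos(n):
    """'n is positive'."""
    return n > 0


# ∃n (Pos(n) ∧ Sq(n)): there is a positive square.
print(any(Pos(n) and Sq(n) for n in DOMAIN))       # True (for example n = 1)

# ∃n (Pos(n) ∧ ¬Sq(n)): there is a positive integer which is not a square.
print(any(Pos(n) and not Sq(n) for n in DOMAIN))   # True (for example n = 2)

# ∀n (Sq(n) → Pos(n)), with the implication written as (not Sq(n)) or Pos(n).
print(all((not Sq(n)) or Pos(n) for n in DOMAIN))  # False here: 0 is a square but not positive
```

Python's any and all play the roles of ∃ and ∀ over the finite domain.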
7.4 Alternating quantifiers
Combinations of quantifiers like ∀x∃y . . . , “for all x there is a y . . .” are common in
mathematics, and can be confusing. It helps to have some examples in mind to recall the difference
between ∀x∃y . . . and ∃y∀x . . .
The relation x < y is convenient to illustrate such combinations; we write x < y as the
predicate L(x, y).
Then
∀x∃yL(x, y)
is the (true) sentence
for all x there is a y such that x < y,
which says that there is no greatest number.
But with the opposite combination of quantifiers we have
∃y∀xL(x, y)
is the false sentence
there is a y such that for all x, x < y,
which says there is a number greater than all numbers.
Even though these statements are usually written without brackets they are effectively
bracketed “from the centre”. So ∀x∃yL(x, y) means ∀x(∃yL(x, y)) and ∃y∀xL(x, y) means
∃y(∀xL(x, y)).
Remark. Another way to say “you can’t fool all of the people all of the time” is
∃p∃t¬F(p, t).
Questions
7.1 Write “roses are red” in the language of predicate logic, using
rose(x) for “x is a rose”
red(x) for “x is red.”
7.2 If P(n) stands for “n is prime” and E(n) stands for “n is even,” what does P(n) ∧
(¬E(n)) say about n?
7.3 Using the predicates
pol(x) for “x is a politician”
liar(x) for “x is a liar”
• all politicians are liars
• some politicians are liars
• no politicians are liars
• some politicians are not liars.
Are any of these sentences logically equivalent?
Nevertheless, in some cases, we can see that a sentence is true for all interpretations.
Example. ∀x∀yP(x, y) → ∀y∀xP(x, y) is true for all properties P, and hence is valid.
Likewise, we can sometimes see that a sentence is not valid by finding an interpretation
which makes it false.
∀x(P(x) ∧ Q(x)) ≡ ∀x(Q(x) ∧ P(x))
simply because
P(x) ∧ Q(x) ≡ Q(x) ∧ P(x)
for any x.
However there are also equivalences that genuinely involve quantifiers.
8.5 Useful equivalences
Two important equivalences involving quantifiers are
¬∀xP(x) ≡ ∃x¬P(x)
¬∃xP(x) ≡ ∀x¬P(x)
These make sense intuitively. For example, ¬∀xP(x) means P(x) is not true for all x, hence
there is an x for which P(x) is false, that is, ∃x¬P(x).
They can also be viewed as “infinite De Morgan’s laws.” If x ranges over {1, 2, 3, . . .} for
example, then
∀xP(x) ≡ P(1) ∧ P(2) ∧ P(3) ∧ · · ·
and
∃xP(x) ≡ P(1) ∨ P(2) ∨ P(3) ∨ · · ·
Hence
¬∀xP(x) ≡ ¬(P(1) ∧ P(2) ∧ P(3) ∧ · · · )
≡ (¬P(1)) ∨ (¬P(2)) ∨ (¬P(3)) ∨ · · ·   by De Morgan’s law
≡ ∃x¬P(x).
And similarly
¬∃xP(x) ≡ ¬(P(1) ∨ P(2) ∨ P(3) ∨ · · · )
≡ (¬P(1)) ∧ (¬P(2)) ∧ (¬P(3)) ∧ · · ·   by De Morgan’s law
≡ ∀x¬P(x).
8.7* Completeness and undecidability
In 1930, Gödel proved that there is a complete set of rules of inference for predicate logic. This
means, in particular, that there is an algorithm to list the valid sentences.
However, in 1936, Church and Turing proved that there is no algorithm to list the sentences
that are not valid. This means, in particular, that predicate logic is undecidable: there is no
algorithm which, for any sentence φ, decides whether φ is valid or not.
This negative result is due to the power of predicate logic: it can express all mathematical
or computational problems, and it is known that some of these problems cannot be solved by
algorithm.
Questions
8.1 Give interpretations which make the following sentences false.
∃nP(n) → ∀nP(n)
∀x∀y (R(x, y) → R(y, x))
∀m∃nS(m, n)
8.2 Give interpretations which show that the sentences
∃x (P(x) ∧ L(x))
and
∃x (P(x) ∧ ¬L(x))
are not equivalent.
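On a finite domain the two quantifier equivalences of Section 8.5 reduce to the ordinary De Morgan laws, so they can be checked exhaustively. The Python sketch below does this for every predicate on a small domain; the function name check_quantifier_negation and the four-element domain are choices made for this illustration, and a finite check like this is of course not a proof for infinite domains.

```python
from itertools import product


def check_quantifier_negation(domain):
    """Check ¬∀xP(x) ≡ ∃x¬P(x) and ¬∃xP(x) ≡ ∀x¬P(x) for every predicate P on domain."""
    domain = list(domain)
    for values in product([True, False], repeat=len(domain)):
        P = dict(zip(domain, values))  # one predicate: a truth value for each element
        if (not all(P[x] for x in domain)) != any(not P[x] for x in domain):
            return False
        if (not any(P[x] for x in domain)) != all(not P[x] for x in domain):
            return False
    return True


print(check_quantifier_negation([1, 2, 3, 4]))  # True: holds for all 16 predicates
```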
Since the natural numbers 0, 1, 2, 3, . . . are generated by a process which begins with 0 and
repeatedly adds 1, we have the following.
Property P is true for all natural numbers if
1. P(0) is true.
2. P(k) ⇒ P(k + 1) for all k ∈ N.
This is called the principle of mathematical induction.
It is used in a style of proof called proof by induction, which consists of two steps.
Base step: Proof that the required property P is true for 0.
Induction step: Proof that if P(k) is true then P(k + 1) is true, for each k ∈ N.
9.1 Examples
Example 1. Prove that 3 divides n^3 + 2n for all n ∈ N.
Let P(n) be “3 divides n^3 + 2n”.
Base step. 3 divides 0^3 + 2 × 0 = 0, so P(0) is true.
Induction step. We want to prove
Example 2. Prove there are 2^n n-bit binary strings.
Let P(n) be “there are 2^n n-bit binary strings”.
Base step. There is 2^0 = 1 0-bit binary string (the empty string) so P(0) is true.
Induction step. We want to prove that
there are 2^k k-bit binary strings
⇒ there are 2^(k+1) (k + 1)-bit binary strings.
Well, a (k + 1)-bit binary string is either W0 or W1, where W is any k-bit binary string. Thus
if there are 2^k k-bit binary strings W, there are 2 × 2^k = 2^(k+1) (k + 1)-bit binary strings.
This completes the induction step, and hence completes the proof.
9.2 Starting the base step higher
It is not always appropriate to start the induction at 0. Some properties are true only from a
certain positive integer upwards, in which case the induction starts at that integer.
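A program cannot replace an induction proof, but it can test the two statements from Examples 1 and 2 for small values, which is a useful sanity check before proving them. In the Python sketch below (function names are illustrative), bit_strings builds the (k + 1)-bit strings from the k-bit strings W as W0 and W1, the same construction the induction step of Example 2 uses.

```python
def divisible_by_3(n):
    """P(n) from Example 1: 3 divides n^3 + 2n."""
    return (n ** 3 + 2 * n) % 3 == 0


def bit_strings(n):
    """All n-bit binary strings, built as in the induction step of Example 2:
    every (k + 1)-bit string is W0 or W1 for some k-bit string W."""
    strings = [""]  # the single 0-bit string (the empty string)
    for _ in range(n):
        strings = [w + bit for w in strings for bit in "01"]
    return strings


assert all(divisible_by_3(n) for n in range(1000))
assert all(len(bit_strings(n)) == 2 ** n for n in range(12))
print(bit_strings(2))  # ['00', '01', '10', '11']
```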
9.3 Sums of series
Induction is often used to prove that sum formulas are correct.
Example 5. Prove 1 + 2 + 3 + · · · + n = n(n + 1)/2 for all integers n ≥ 1.
Let P(n) be “1 + 2 + 3 + · · · + n = n(n + 1)/2”.
Base step. When n = 1, the left hand side is 1, and the right hand side is 1(1 + 1)/2 = 2/2 = 1,
so P(1) is true.
Induction step. We have to prove that
1 + 2 + · · · + k = k(k + 1)/2
⇒ 1 + 2 + · · · + k + (k + 1) = (k + 1)(k + 2)/2.
Now, if P(k) is true,
1 + 2 + · · · + k + (k + 1)
= (1 + 2 + · · · + k) + (k + 1)
= k(k + 1)/2 + (k + 1)   using P(k)
= (k + 1)(k/2 + 1)
= (k + 1)(k + 2)/2
as required.
This completes the induction proof.
• ? divides n^2 + n
• The sum of the first n odd numbers is ?
• 1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1)) = 1 − ?
9.2 If you correctly guessed the sum
1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1)),
you might wonder why it is so simple. Here is a clue:
1/(1×2) = 1/1 − 1/2.
What is 1/(2×3)? 1/(3×4)?
How does this lead to a simple formula for
1/(1×2) + 1/(2×3) + 1/(3×4) + · · · + 1/(n(n+1))?
OK, if we can guess formulas correctly, why bother proving them by induction?
The reason is that a statement which fits many values of n can still be wrong.
9.3 Show that n^2 + n + 41 is a prime number for n = 1, 2, 3, 4 (and go further, if you
like). Do you think n^2 + n + 41 is prime for all natural numbers n?
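The sum formula of Example 5 and the algebra in its induction step can both be spot-checked numerically. The Python sketch below is only a check for small values (the function names are invented for the illustration); the induction proof is what covers every n.

```python
def triangular_sum(n):
    """Left-hand side of P(n): 1 + 2 + ... + n, added up term by term."""
    return sum(range(1, n + 1))


def formula(n):
    """Right-hand side of P(n): n(n + 1)/2 (always an integer, so // is exact)."""
    return n * (n + 1) // 2


# Base step, n = 1.
assert triangular_sum(1) == formula(1) == 1

# The algebra of the induction step: k(k+1)/2 + (k+1) = (k+1)(k+2)/2.
for k in range(1, 500):
    assert formula(k) + (k + 1) == (k + 1) * (k + 2) // 2

# Spot check of P(n) itself for small n.
assert all(triangular_sum(n) == formula(n) for n in range(1, 500))
print(formula(100))  # 5050
```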
Lecture 10: Induction and well-ordering
In the previous lecture we were able to prove a property P holds for 0, 1, 2, . . . as follows:
Base step. Prove P(0)
Induction step. Prove P(k) ⇒ P(k + 1) for each natural number k.
This is sufficient to prove that P(n) holds for all natural numbers n, but it may be difficult to
prove that P(k + 1) follows from P(k). It may in fact be easier to prove the induction step
P(0) ∧ P(1) ∧ · · · ∧ P(k) ⇒ P(k + 1).
That is, it may help to assume P holds for all numbers before k + 1. Induction with this
style of induction step is sometimes called the strong form of mathematical induction.
10.1 Examples of strong induction
Example 1. Prove that every integer ≥ 2 is a product of primes. (Just a prime by itself counts
as a “product”.)
Let P(n) be “n is a product of primes”.
Base step. 2 is a prime, hence a product of (one) prime. So P(2) is true.
Induction step. Suppose 2, 3, . . . , k are products of primes. We wish to prove that k + 1 is a
product of primes.
This is certainly true if k + 1 is a prime. If not,
k + 1 = i × j,
for some natural numbers i and j greater than 1 and less than k + 1.
But then i and j are products of primes by our assumption, hence so is i × j = k + 1.
This completes the induction proof.
Example 2. Prove that every positive integer is a sum of distinct powers of 2. (Just a power
of two by itself counts as a “sum”.)
The idea behind this proof is to repeatedly subtract the largest possible power of 2. We
illustrate with the number 27.
27 − largest power of 2 less than 27
= 27 − 16 = 11
11 − largest power of 2 less than 11
= 11 − 8 = 3
3 − largest power of 2 less than 3
= 3 − 2 = 1
Hence 27 = 16 + 8 + 2 + 1 = 2^4 + 2^3 + 2^1 + 2^0.
(It is only interesting to find distinct powers of 2, because of course each integer ≥ 1 is a sum
of 1s, and 1 = 2^0.)
The strong induction proof goes as follows.
Let P(n) be “n is a sum of distinct powers of 2”.
Base step. 1 = 2^0, so 1 is a sum of (one) power of 2. Thus P(1) is true.
Induction step. Suppose each of the numbers 1, 2, 3, . . . , k is a sum of distinct powers of 2. We
wish to prove that k + 1 is a sum of distinct powers of 2.
This is certainly true if k + 1 is a power of 2. If not, let 2^j be the greatest power of 2 less than
k + 1. Then
i = k + 1 − 2^j
is one of the numbers 1, 2, 3, ..., k, and hence it is a sum of distinct powers of 2.
Also, the powers of 2 that sum to i are all less than 2^j, otherwise 2^j is less than half k + 1,
contrary to the choice of 2^j as the largest power of 2 less than k + 1.
Hence k + 1 = 2^j + powers of 2 that sum to i is a sum of distinct powers of 2.
This completes the induction proof.
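The idea behind Example 2, repeatedly subtracting the largest possible power of 2, translates directly into a short loop. The Python sketch below is an illustration with a made-up function name; it subtracts the largest power of 2 that does not exceed what is left, which also covers the case where the remaining number is itself a power of 2.

```python
def distinct_powers_of_two(n):
    """Write a positive integer n as a sum of distinct powers of 2 by repeatedly
    subtracting the largest power of 2 that does not exceed what is left."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    powers = []
    while n > 0:
        power = 1
        while power * 2 <= n:
            power *= 2
        powers.append(power)
        n -= power
    return powers


print(distinct_powers_of_two(27))  # [16, 8, 2, 1]
```

The result for 27 matches the worked illustration 27 = 16 + 8 + 2 + 1.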
10.2 Well-ordering and descent
Induction expresses the fact that each natural number n can be reached by starting at 0 and
going upwards (e.g. adding 1) a finite number of times.
Equivalent facts are that it is only a finite number of steps downwards from any natural
number to 0, that any descending sequence of natural numbers is finite, and that any nonempty
set of natural numbers has a least element.
This property is called well-ordering of the natural numbers. It is often convenient to
arrange a proof to “work downwards” and appeal to well-ordering by saying that the process of
working downwards must eventually stop.
Such proofs are equivalent to induction, though they are sometimes called “infinite
descent” or similar.
10.3 Proofs by descent
Example 1. Prove that any integer ≥ 2 has a prime divisor.
If n is prime, then it is a prime divisor of itself.
If not, let n_1 be a divisor of n with 1 < n_1 < n.
If n_1 is prime, it is a prime divisor of n. If not, let n_2 be a divisor of n_1 (and hence of
n) with 1 < n_2 < n_1.
If n_2 is prime, it is a prime divisor of n. If not, let n_3 be a divisor of n_2 with
1 < n_3 < n_2, etc.
The sequence n > n_1 > n_2 > n_3 > · · · must eventually terminate, and this means we find a
prime divisor of n.
Example 2. Prove √2 is irrational.
Suppose that √2 = m/n for natural numbers m and n. We will show this is impossible. Since the
square of an odd number is odd, we can argue as follows
√2 = m/n
⇒ 2 = m^2/n^2   (squaring both sides)
⇒ m^2 = 2n^2
⇒ m^2 is even
⇒ m is even   (since the square of an odd number is odd)
⇒ m = 2m_1 say
⇒ 2n^2 = m^2 = 4m_1^2
⇒ n^2 = 2m_1^2
⇒ n is even, = 2n_1 say
But then √2 = m_1/n_1, and we can repeat the argument to show that m_1 and n_1 are both
even, so m_1 = 2m_2 and n_1 = 2n_2, and so on.
Since the argument can be repeated indefinitely, we get an infinite descending sequence of
natural numbers
m > m_1 > m_2 > · · ·
which is impossible.
Hence there are no natural numbers m and n with √2 = m/n.
Questions
10.1 For each of the following statements, say which is likely to require strong induction
for its proof.
• 1 + a + a^2 + · · · + a^n = (a^(n+1) − 1)/(a − 1)
• ¬(p_1 ∨ p_2 ∨ p_3 ∨ · · · ∨ p_n) ≡ (¬p_1) ∧ (¬p_2) ∧ (¬p_3) ∧ · · · ∧ (¬p_n)
• Each fraction n/m < 1 is a sum of distinct fractions with numerator 1,
for example, 11/12 = 1/2 + 1/3 + 1/12.
10.2 There is something else which tells you every integer ≥ 1 is a sum of distinct powers
of 2. What is it?
10.3 Is every integer ≥ 1 a sum of distinct powers of 3?
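The descent argument for “any integer ≥ 2 has a prime divisor” can be traced by a program that records the chain n > n_1 > n_2 > · · · until it reaches a prime. In the Python sketch below the helper names are invented, and picking the largest proper divisor at each step is an arbitrary choice made so that the chain is visible; the argument works for any divisor strictly between 1 and the current number.

```python
def largest_proper_divisor(n):
    """Return the largest divisor d of n with 1 < d < n, or None if n is prime."""
    for d in range(n - 1, 1, -1):
        if n % d == 0:
            return d
    return None


def prime_divisor_by_descent(n):
    """Follow the descent n > n_1 > n_2 > ... until a prime is reached.

    Returns the prime found together with the whole chain."""
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    chain = [n]
    while True:
        d = largest_proper_divisor(chain[-1])
        if d is None:  # chain[-1] is prime, and it divides n
            return chain[-1], chain
        chain.append(d)


print(prime_divisor_by_descent(60))  # (5, [60, 30, 15, 5])
```

The loop must stop because the chain is a strictly decreasing sequence of integers greater than 1.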
Lecture 11: Sets
Sets are vital in expressing mathematics formally and are also very important data
structures in computer science.
A set is basically just an unordered collection of distinct objects, which we call its elements or
members. Note that there is no notion of order for a set, even though we often write down its
elements in some order for convenience. Also, there is no notion of multiplicity: an object is
either in a set or not – it cannot be in the set multiple times.
Sets A and B are equal when every element of A is an element of B and vice-versa.
• {x : P(x)} is the set of all x with property P.
Example.
17 ∈ {x : x is prime} = {2, 3, 5, 7, 11, 13, . . .}
{1, 2, 3} = {3, 1, 2}
{1, 1, 1} = {1}
For a finite set S, we write |S| for the number of elements of S.
For example, when discussing arithmetic it might be sufficient to work just with the
numbers 0, 1, 2, 3, . . .. Our universal set could then be taken as
N = {0, 1, 2, 3, . . .},
and other sets of interest, e.g. {x : x is prime}, are parts of N.
11.3 Subsets
We say that A is a subset of B and write A ⊆ B when each element of A is an element
of B.
A subset A of B can be specified by its characteristic function χ_A, which tells which elements
of B are in A and which are not.
χ_A(x) = 1 if x ∈ A,
         0 if x ∉ A.
Example. The subset A = {a, c} of B = {a, b, c} has the characteristic function χ_A with
χ_A(a) = 1, χ_A(b) = 0, χ_A(c) = 1.
We also write this function more simply as
[Venn diagram: sets A and B inside the universal set U]
The difference U − B relative to the universal set U is called the complement B̄ of B. Here
is the Venn diagram of B̄.
[Venn diagram: sets A and B inside the universal set U]
12.2 Union A ∪ B
The union A ∪ B of sets A and B consists of the elements in A or B, and is indicated by the
shaded region in the following Venn diagram.
[Venn diagram: sets A and B inside the universal set U]
12.3 Intersection A ∩ B
The intersection A ∩ B of sets A and B consists of the elements in A and B, indicated by the
shaded region in the following Venn diagram.
[Venn diagram: sets A and B inside the universal set U]
A△B consists of the elements of one of A, B but not the other.
It is clear from the diagram that we have not only
A△B = (A − B) ∪ (B − A),
but also
A△B = (A ∪ B) − (A ∩ B).
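Python's built-in set type supports these operations directly, which gives a quick way to experiment with them. In the sketch below the sets U, A and B are arbitrary small examples chosen for illustration; the operators |, &, - and ^ are Python's union, intersection, difference and symmetric difference.

```python
U = set(range(10))  # a small universal set, chosen only for illustration
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(sorted(A | B))  # union: [1, 2, 3, 4, 5, 6]
print(sorted(A & B))  # intersection: [3, 4]
print(sorted(A - B))  # difference: [1, 2]
print(sorted(U - B))  # complement of B relative to U: [0, 1, 2, 7, 8, 9]
print(sorted(A ^ B))  # symmetric difference: [1, 2, 5, 6]

# The two descriptions of the symmetric difference agree.
assert A ^ B == (A - B) | (B - A) == (A | B) - (A & B)
```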
12.6 Ordered Pairs
Sometimes we do want order to be important. In computer science arrays are ubiquitous
examples of ordered data structures. In maths, ordered pairs are frequently used. An ordered
pair (a, b) consists simply of a first object a and a second object b. The objects a and b are
sometimes called the entries or coordinates of the ordered pair.
Two ordered pairs (a, b) and (c, d) are equal if and only if a = c and b = d.
Example. {0, 1} = {1, 0} but (0, 1) ≠ (1, 0).
There’s no reason we need to stop with pairs. We can similarly define ordered triples,
quadruples, and so on. When there are k coordinates, we call the object an ordered k-tuple. Two
ordered k-tuples are equal if and only if their ith coordinates are equal for i = 1, 2, . . . , k.
[Figure: the point (a, b) in the plane, with a and b marked on the axes through the origin O]
area l × w. In fact, we call it an “l × w rectangle.” This is probably the reason for using the
× sign, and for calling A × B a “product.”
Questions
12.1 Draw a Venn diagram for A ∩ B. What is another name for this set?
12.2 Check the de Morgan laws by drawing Venn diagrams for the complement of A ∪ B,
Ā ∩ B̄, the complement of A ∩ B, and Ā ∪ B̄.
12.3 Find which of the following is true by drawing suitable Venn diagrams.
A ∩ (B△C) = (A ∩ B)△(A ∩ C)?
A△(B ∩ C) = (A△B) ∩ (A△C)?
12.4 If plane = line × line, what do you think line × circle is? What about circle × circle?
A function can be thought of as a “black box” which accepts inputs and, for each input,
produces a single output.
Examples.
2. The square root function sqrt(x) = √x with domain R_{≥0}, codomain R, and pairs
{(x, √x) : x ∈ R and x ≥ 0}.
Complicated functions are often built from simple parts. For example, the function f : R →
R defined by f(x) = (x^2 + 1)^3 is computed by doing the following steps in succession:
• square,
• add 1,
• cube.
We say that f(x) = (x^2 + 1)^3 is the composite of the functions (from R to R)
• square(x) = x^2,
• successor(x) = x + 1,
• cube(x) = x^3.
15.2 Conditions for composition
Composite functions do not always exist.
Example. If reciprocal : R − {0} → R is defined by reciprocal(x) = 1/x and predecessor :
R → R is defined by predecessor(x) = x − 1, then reciprocal ◦ predecessor does not exist,
because predecessor(1) = 0 is not a legal input for reciprocal.
To avoid this problem, we demand that the codomain of h be equal to the domain of g for
g ◦ h to exist. This ensures that each output of h will be a legal input for g.
Let h : A → B and g : C → D be functions. Then g ◦ h : A → D exists if and only
if B = C.
g ◦ h(x) = g(h(x))
and is called the composite of g and h.
f ◦ g = g ◦ f = i_A.
Example. square and sqrt are inverses of each other on the set R_{≥0} of reals ≥ 0.
Mathematical objects can be related in various ways, and any particular way of relating
objects is called a relation on the set of objects in question.
(This also applies to relations in the everyday sense. For example, “parent of” is a relation
on the set of people.)
16.1 Relations and functions
Any function f : X → Y can be viewed as a relation R on X ∪ Y. The relation is defined by
xRy if and only if y = f(x).
However, not every relation is a function. Remember that a function must have exactly
one output y for each input x in its domain. In a relation, on the other hand, an element x may
be related to many elements y, or to none at all.
16.2 Examples
1. Equality on R.
This is the relation consisting of the pairs (x, x) for x ∈ R. Thus it is the following
subset of the plane.
[Figure: the pairs (x, x) plotted in the plane]
This relation is also a function (the identity function on R), since there is exactly one pair
for each x ∈ R.
[Figure: plot with axes marked −1, 0, 1]
(The dashed line indicates that the points where x = y are omitted.)
3. Algebraic curves.
An algebraic curve consists of the points (x, y) satisfying an equation p(x, y) = 0,
where p is a polynomial.
E.g. unit circle x^2 + y^2 − 1 = 0.
[Figure: the unit circle, axes marked −1, 0, 1]
Questions
16.1 Which of the following relations R(x, y) satisfy ∀x∃yR(x, y)?
5. Congruence modulo n.
For a fixed n, congruence modulo n is a bi-
nary relation. It consists of all the ordered
pairs of integers (a, b) such that n divides
a − b.
1. Reflexive property.
a ≡ a (mod n)
for any number a.
2. Symmetric property.
a ≡ b (mod n) ⇒ b ≡ a (mod n)
for any numbers a and b.
3. Transitive property.
a ≡ b (mod n) and b ≡ c (mod n) ⇒
a ≡ c (mod n)
for any numbers a, b and c.
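The three properties listed above can be tested on a finite sample of integers. The Python sketch below is an illustration with invented names; it checks congruence modulo 5 on the integers from −12 to 12 and, for contrast, the relation ≤, which fails the symmetric property. A finite test like this supports the general statements but does not prove them.

```python
from itertools import product


def congruent(a, b, n):
    """a ≡ b (mod n) means that n divides a - b."""
    return (a - b) % n == 0


def has_three_properties(sample, related):
    """Test the reflexive, symmetric and transitive properties on a finite sample."""
    reflexive = all(related(a, a) for a in sample)
    symmetric = all(related(b, a)
                    for a, b in product(sample, repeat=2) if related(a, b))
    transitive = all(related(a, c)
                     for a, b, c in product(sample, repeat=3)
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive


sample = range(-12, 13)
print(has_three_properties(sample, lambda a, b: congruent(a, b, 5)))  # True
print(has_three_properties(sample, lambda a, b: a <= b))              # False: ≤ is not symmetric
```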
Lecture 17: Equivalence relations
An equivalence relation R on a set A is a binary relation with the following three properties.
1. Reflexivity.
aRa
for all a ∈ A.
2. Symmetry.
aRb ⇒ bRa
for all a, b ∈ A.
3. Transitivity.
aRb and bRc ⇒ aRc
for all a, b, c ∈ A.
Equality and congruence mod n (for fixed n) are examples of equivalence relations.
17.1 Other equivalence relations
2. Congruence of triangles.
Triangles ABC and A′B′C′ are congruent if AB = A′B′, BC = B′C′ and CA = C′A′. E.g.
the following triangles are congruent.
[Figure: two congruent triangles ABC and A′B′C′]
3. Similarity of triangles.
Triangles ABC and A′B′C′ are similar if
AB/A′B′ = BC/B′C′ = CA/C′A′.
E.g. the following triangles are similar.
[Figure: two similar triangles ABC and A′B′C′]
4. Parallelism of lines.
The relation L ∥ M (L is parallel to M) is an equivalence relation.
Remark
In all these cases the relation is an equivalence because it says that objects are the same in some
respect.
3. Similar triangles have the same shape.
4. Parallel lines have the same direction.
If R is an equivalence relation we define the R-equivalence class of a to be
Using what we showed in the last section, we have the following.
3. Transitivity.
aRb and bRc ⇒ aRc
for all a, b, c ∈ A.
Examples.
1. ≤ on R.
Reflexive: a ≤ a for all a ∈ R.
Antisymmetric: a ≤ b and b ≤ a ⇒ a = b for all a, b ∈ R.
Transitive: a ≤ b and b ≤ c ⇒ a ≤ c for all a, b, c ∈ R.
2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N).
Antisymmetric: A ⊆ B and B ⊆ A ⇒ A = B for all A, B ∈ P(N).
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for all A, B, C ∈ P(N).
3. Divisibility on N.
The relation “a divides b” on natural numbers is reflexive, antisymmetric and transitive.
We leave checking this as an exercise.
4. Alphabetical order of words.
Words on the English alphabet are alphabetically ordered by comparing the leftmost letter
at which they differ. We leave checking that this relation is reflexive, antisymmetric
and transitive as an exercise.
2. ⊆ on P(N).
This is not a total order because, for example, {1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.
3. Divisibility on N.
This is not a total order because, for example, 2 does not divide 3 and 3 does not divide 2.
4. Alphabetical order of words.
This is a total order because given any two different words, one will appear before the
other in alphabetical order.
18.3 Hasse diagrams
A partial order relation R on a finite set A can be represented as a Hasse diagram. The elements
of A are written on the page and connected by lines so that, for any a, b ∈ A, aRb exactly when
b can be reached from a by travelling upward along the lines.
Example. A Hasse diagram for the relation ⊆ on the set P({1, 2}) can be drawn as follows.
[Hasse diagram, rows from top to bottom: {1, 2}; {1}, {2}; ∅]
Example. A Hasse diagram for the relation “divides” on the set {1, 2, 3, 5, 6, 10, 15, 30} can be
drawn as follows.
[Hasse diagram, rows from top to bottom: 30; 6, 10, 15; 2, 3, 5; 1]
Example. A Hasse diagram for the relation ≤ on the set {1, 2, 3, 4, 5} can be drawn as follows.
[Hasse diagram: a single vertical chain with 5 at the top, then 4, 3, 2, and 1 at the bottom]
18.4 Well-ordering
A well-order relation on a set is a total order relation that also has the property that each
nonempty set of its elements contains a least element.
A well-order relation R on a set A is a total order relation such that, for all nonempty
S ⊆ A, there exists an ℓ ∈ S such that ℓRa for all a ∈ S.
Example. The relation ≤ on {x : x ∈ R, x > 0} is not a well-order relation. For example, the
subset {x : x ∈ R, x > 3} has no least element.
Questions
18.1 Explain why “antisymmetric” does not mean “not symmetric”. Give an example
of a relation which is neither symmetric nor antisymmetric.
18.2 Draw a diagram of the positive divisors of 42 under the relation “divides.” Why does
it resemble the diagram for the positive divisors of 30?
18.3 Invent a partial order relation on N × N. Is your ordering a total ordering? Is your
ordering a well-ordering?
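For a finite set, the defining properties of a partial order and the lines of a Hasse diagram can be computed directly. The Python sketch below uses the “divides” relation on {1, 2, 3, 5, 6, 10, 15, 30}; the function names are invented for the illustration, and the totality test assumes the usual meaning of a total order (any two elements are comparable), which is not restated in this section.

```python
from itertools import product

ELEMENTS = [1, 2, 3, 5, 6, 10, 15, 30]


def divides(a, b):
    return b % a == 0


def is_partial_order(elements, R):
    """Check that R is reflexive, antisymmetric and transitive on elements."""
    reflexive = all(R(a, a) for a in elements)
    antisymmetric = all(a == b for a, b in product(elements, repeat=2)
                        if R(a, b) and R(b, a))
    transitive = all(R(a, c) for a, b, c in product(elements, repeat=3)
                     if R(a, b) and R(b, c))
    return reflexive and antisymmetric and transitive


def is_total_order(elements, R):
    """Assumed definition: a partial order in which any two elements are comparable."""
    return is_partial_order(elements, R) and all(
        R(a, b) or R(b, a) for a, b in product(elements, repeat=2))


def hasse_edges(elements, R):
    """Pairs (a, b) with aRb, a != b, and no third element strictly between them."""
    return [(a, b) for a, b in product(elements, repeat=2)
            if a != b and R(a, b)
            and not any(c not in (a, b) and R(a, c) and R(c, b) for c in elements)]


print(is_partial_order(ELEMENTS, divides))  # True
print(is_total_order(ELEMENTS, divides))    # False: 2 and 3 are not comparable
print(hasse_edges(ELEMENTS, divides))       # the lines of the Hasse diagram
```

Each pair (a, b) returned by hasse_edges corresponds to one line drawn upward from a to b.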
19.1 Ordered selections without repetition
A reviewer is going to compare ten phones and list, in order, a top three. In how many ways can she do this? More generally, how many ways are there to arrange r objects chosen from a set of n objects?
In our example, the reviewer has 10 options for her favourite, but then only 9 for her second-favourite, and 8 for third-favourite. So there are 10 × 9 × 8 ways she could make her list.
For an ordered selection without repetition of r elements from a set of n elements there are
  n options for the 1st element
  n − 1 options for the 2nd element
  n − 2 options for the 3rd element
  ...
  n − r + 1 options for the rth element.
So we have the following formula.

  The number of ordered selections without repetition of r elements from a set of n elements ($0 \le r \le n$) is
  $$n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$

When r = n and all the elements of a set S are ordered, we just say that this is a permutation of S. Our formula tells us there are n! such permutations. For example, there are 3! = 6 permutations of the set {a, b, c}:
  (a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).

19.2 Unordered selections without repetition
What if our reviewer instead chose an unordered top three? In how many ways could she do that? More generally, how many ways are there to choose (without order) r objects from a set of n objects?

  A combination of r elements from a set S is a subset of S with r elements.

For every unordered list our reviewer could make there are 3! = 6 corresponding possible ordered lists. And we’ve seen that she could make 10 × 9 × 8 ordered lists. So the number of unordered lists she could make is $\frac{10 \times 9 \times 8}{6}$.
For every combination of r elements from a set of n elements there are r! corresponding permutations. So, using our formula for the number of permutations we have the following.

  The number of combinations of r elements from a set of n elements ($0 \le r \le n$) is
  $$\frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{r!(n-r)!} = \binom{n}{r}.$$

Notice that the notation $\binom{n}{r}$ is used for $\frac{n!}{r!(n-r)!}$. Expressions like this are called binomial coefficients. We’ll see why they are called this in the next lecture.

19.3 Ordered selections with repetition
An ordered selection of r elements from a set X is really just a sequence of length r with each term in X. If X has n elements, then there are n possibilities for each term and so:

  The number of sequences of r terms, each from some set of n elements, is
  $$\underbrace{n \times n \times \cdots \times n}_{r} = n^r.$$

19.4 Unordered selections with repetition
A shop has a special deal on any four cans of soft drink. Cola, lemonade and sarsaparilla flavours are available. In how many ways can you select four cans?
We can write a selection in a table, for example,

  C | L  | S            C | L | S
  • | •• | •     and      | • | •••

We can change a table like this into a string of zeroes and ones, by moving from left to right reading a “•” as a 0 and a column separator as a 1. The tables above would be converted into
  0 1 0 0 1 0   and   1 0 1 0 0 0.
Notice that each string has four zeroes (one for each can selected) and two ones (one fewer than the number of flavours). We can choose a string like this by beginning with a string of six ones and then choosing four ones to change to zeroes. There are $\binom{6}{4}$ ways to do this and so there are $\binom{6}{4}$ possible can selections.
An unordered selection of r elements, with repetition allowed, from a set X of n elements can be thought of as a multiset with r elements, each in X. As in the example, we can represent each such multiset with a string of r zeroes and n − 1 ones. We can choose a string like this by beginning with a string of n + r − 1 ones and then choosing r ones to change to zeroes. There are $\binom{n+r-1}{r}$ ways to do this.

Questions
19.1 A bank requires a PIN that is a string of four decimal digits. How many such PINs are there? How many are made of four different digits?
19.2 How many binary strings of length 5 are there? How many of these contain exactly two 1s?
19.3 In a game, each of ten players holds red, blue and green marbles, and places one marble in a bag. How many possibilities are there for the colours of marbles in the bag? If each player chooses their colour at random are all of these possibilities equally likely?

In the above $\lceil \frac{n}{m} \rceil$ means the smallest integer greater than or equal to $\frac{n}{m}$ (or $\frac{n}{m}$ “rounded up”).
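The four kinds of selection in Lecture 19 can be checked by brute force. The sketch below uses Python's standard `itertools` module to list every selection for the phone and soft drink examples and compares the counts with the formulas; the variable names are chosen here for illustration.

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, factorial

n, r = 10, 3
phones = range(n)

# Ordered, without repetition: the reviewer's ordered top three.
ordered_no_rep = sum(1 for _ in permutations(phones, r))
# Unordered, without repetition: an unordered top three.
unordered_no_rep = sum(1 for _ in combinations(phones, r))

flavours = "CLS"  # cola, lemonade, sarsaparilla
# Ordered, with repetition: sequences of four flavours.
ordered_rep = sum(1 for _ in product(flavours, repeat=4))
# Unordered, with repetition: multisets of four cans.
unordered_rep = sum(1 for _ in combinations_with_replacement(flavours, 4))

print(ordered_no_rep, unordered_no_rep, ordered_rep, unordered_rep)
```

Listing selections only works for small cases, which is exactly why the closed formulas are useful.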
20.1 Pascal’s triangle
We can write the binomial coefficients in an (infinite) triangular array as follows:

$$\begin{array}{c}
\binom{0}{0} \\
\binom{1}{0}\ \binom{1}{1} \\
\binom{2}{0}\ \binom{2}{1}\ \binom{2}{2} \\
\binom{3}{0}\ \binom{3}{1}\ \binom{3}{2}\ \binom{3}{3} \\
\binom{4}{0}\ \binom{4}{1}\ \binom{4}{2}\ \binom{4}{3}\ \binom{4}{4} \\
\binom{5}{0}\ \binom{5}{1}\ \binom{5}{2}\ \binom{5}{3}\ \binom{5}{4}\ \binom{5}{5} \\
\binom{6}{0}\ \binom{6}{1}\ \binom{6}{2}\ \binom{6}{3}\ \binom{6}{4}\ \binom{6}{5}\ \binom{6}{6} \\
\vdots
\end{array}$$

Here are the first eleven rows (n = 0 to n = 10) with the entries as integers:

                      1
                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1
        1   7  21  35  35  21   7   1
      1   8  28  56  70  56  28   8   1
    1   9  36  84 126 126  84  36   9   1
  1  10  45 120 210 252 210 120  45  10   1

This triangular array is often called Pascal’s triangle (although Pascal was nowhere near the first to discover it).

20.2 Patterns
Writing the binomial coefficients this way reveals a lot of different patterns in them. Perhaps the most obvious is that every row reads the same left-to-right and right-to-left. Choosing r elements from a set of n elements to be in a combination is equivalent to choosing n − r elements from the same set to not be in the combination. So:
$$\binom{n}{r} = \binom{n}{n-r} \quad \text{for } 0 \le r \le n.$$
This shows that every row reads the same left-to-right and right-to-left.
Another pattern is that every “internal” entry in the triangle is the sum of the two entries above it. To see why this is, we’ll begin with an example.

Example. Why is $\binom{6}{2} = \binom{5}{2} + \binom{5}{1}$?
There are $\binom{6}{2}$ combinations of 2 elements of {1, 2, 3, 4, 5, 6}. Every such combination either
  • does not contain a 6, in which case it is one of the $\binom{5}{2}$ combinations of 2 elements of {1, 2, 3, 4, 5}; or
  • does contain a 6, in which case the rest of the combination is one of the $\binom{5}{1}$ combinations of 1 element from {1, 2, 3, 4, 5}.
So $\binom{6}{2} = \binom{5}{2} + \binom{5}{1}$.

We can make a similar argument in general. Let X be a set of n elements and x be a fixed element of X. For any r ∈ {1, . . . , n}, there are $\binom{n-1}{r}$ combinations of r elements of X that do not contain x and there are $\binom{n-1}{r-1}$ combinations of r elements of X that do contain x. So:
$$\binom{n}{r} = \binom{n-1}{r} + \binom{n-1}{r-1} \quad \text{for } 1 \le r \le n.$$
This shows that every internal entry in Pascal’s triangle is the sum of the two above it.

20.3 The binomial theorem
$$\begin{aligned}
(x+y)^0 &= 1 \\
(x+y)^1 &= x + y \\
(x+y)^2 &= x^2 + 2xy + y^2 \\
(x+y)^3 &= x^3 + 3x^2y + 3xy^2 + y^3 \\
(x+y)^4 &= x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4 \\
(x+y)^5 &= x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5
\end{aligned}$$
Notice that the coefficients on the right are exactly the same as the entries in Pascal’s triangle. Why does this happen? Think about expanding $(x+y)^3$ and finding the coefficient of $xy^2$, for example.
$$\begin{aligned}
(x+y)(x+y)(x+y) &= xxx + xxy + xyx + \underline{xyy} + yxx + \underline{yxy} + \underline{yyx} + yyy \\
&= x^3 + 3x^2y + 3xy^2 + y^3
\end{aligned}$$
The coefficient of $xy^2$ is 3 because we have three terms in the sum above that contain two y’s (those underlined). This is because there are $\binom{3}{2}$ ways to choose two of the three factors in a term to be y’s.
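The counting argument for the coefficient of $xy^2$ can be replayed mechanically. The sketch below lists all $2^n$ products obtained by picking x or y from each factor of $(x+y)^n$, and counts how many contain exactly r y's. The function name `expansion_coefficients` is made up for this illustration.

```python
from collections import Counter
from itertools import product
from math import comb


def expansion_coefficients(n):
    """Coefficient of x^(n-r) y^r in (x + y)^n for r = 0..n, by listing all 2^n terms."""
    # Each term is a tuple such as ('x', 'y', 'y'): one choice per factor.
    counts = Counter(term.count("y") for term in product("xy", repeat=n))
    return [counts[r] for r in range(n + 1)]


print(expansion_coefficients(3))
print(expansion_coefficients(5))
```

For small n the output can be compared directly with the rows of Pascal's triangle, or with `math.comb(n, r)`.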
The same logic holds in general. The coefficient of $x^{n-r}y^r$ in $(x+y)^n$ will be $\binom{n}{r}$ because there will be $\binom{n}{r}$ ways to choose r of the n factors in a term to be y’s. This fact is called the binomial theorem.

  Binomial theorem. For any n ∈ N,
  $$(x+y)^n = \binom{n}{0}x^n y^0 + \binom{n}{1}x^{n-1}y^1 + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n-1}x^1 y^{n-1} + \binom{n}{n}x^0 y^n.$$

20.4 Inclusion-exclusion
A school gives out prizes to its best ten students in music and its best eight students in art. If three students receive prizes in both, how many students get a prize? If we try to calculate this as 10 + 8 then we have counted the three overachievers twice. To compensate we need to subtract three and calculate 10 + 8 − 3 = 15.
In general, if A and B are finite sets then we have
  |A ∪ B| = |A| + |B| − |A ∩ B|.
With a bit more care we can see that if A, B and C are finite sets then we have
  |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.
This is part of a more general law called the inclusion-exclusion principle.

Questions
20.1 Substitute x = 1 and y = −1 into the statement of the binomial theorem. What does this tell you about the rows of Pascal’s triangle?
20.2 Find a pattern in the sums of the rows in Pascal’s triangle. Prove your pattern holds using the binomial theorem. Also prove it holds by considering the powerset of a set.
20.3 Use inclusion-exclusion to work out how many numbers in the set {1, . . . , 100} are divisible by 2 or 3 or 5.
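The inclusion-exclusion formulas from Section 20.4 are easy to test on concrete sets. In the sketch below the prize winners are given made-up student numbers (1 to 10 for music, 8 to 15 for art, so that three students appear in both), and the three-set formula is tried on multiples of 4, 6 and 9, an arbitrary choice for illustration.

```python
def union_size(a, b, c=frozenset()):
    """Size of a ∪ b ∪ c computed only from sizes of the sets and their intersections."""
    return (len(a) + len(b) + len(c)
            - len(a & b) - len(a & c) - len(b & c)
            + len(a & b & c))


music = set(range(1, 11))  # hypothetical student numbers 1..10
art = set(range(8, 16))    # hypothetical student numbers 8..15
print(union_size(music, art))

fours = set(range(4, 73, 4))
sixes = set(range(6, 73, 6))
nines = set(range(9, 73, 9))
print(union_size(fours, sixes, nines), len(fours | sixes | nines))
```

With the third set left empty, the function reduces to the two-set formula.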
Lecture 21: Probability
Probability gives us a way to model random processes mathematically. These processes could be anything from the rolling of dice, to radioactive decay of atoms, to the performance of a stock market index. The mathematical environment we work in when dealing with probabilities is called a probability space.

21.1 Probability spaces
We’ll start with a formal definition and then look at some examples of how the definition is used.

  A probability space consists of

It can be convenient to give this as a table:

  s       1     2     3     4
  Pr(s)   1/2   1/4   1/8   1/8

Example. Rolling a fair six-sided die could be modeled by a probability space with sample space S = {1, 2, 3, 4, 5, 6} and probability function Pr given as follows.

  s       1     2     3     4     5     6
  Pr(s)   1/6   1/6   1/6   1/6   1/6   1/6

A sample space like this one where every outcome has an equal probability is sometimes called a uniform sample space. Outcomes from a uniform sample space are said to have been taken uniformly at random.
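One way to experiment with small probability spaces like these is to store the probability function as a dictionary. The sketch below is only an illustration: the names `spinner`, `die` and `pr` are invented here, and it assumes that an event is a set of outcomes whose probability is the sum of the probabilities of its outcomes.

```python
from fractions import Fraction

# Probability functions from the two tables, as outcome -> probability.
spinner = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
die = {s: Fraction(1, 6) for s in range(1, 7)}


def pr(space, event):
    """Probability of an event (assumed to be a set of outcomes) in a finite space."""
    return sum(space[s] for s in event if s in space)


print(sum(spinner.values()), sum(die.values()))  # total probability of each space
print(pr(die, {2, 4, 6}))                        # the die shows an even number
print(pr(spinner, {3, 4}))                       # the spinner shows at least 3
```

Using `Fraction` keeps every probability exact, so the results can be compared with hand calculations.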
Your friend believes that Python coding has become more popular than AFL in Melbourne. She bets you $10 that the next person to pass you on the street will be a Python programmer. You feel confident about this bet. However, when you see a man in a “Hello, world!” t-shirt approaching, you don’t feel so confident any more. Why is this?
We can think about this with a diagram. The rectangle represents the set of people in Melbourne, the circle P is the set of Python coders, and the circle T is the set of “Hello, world!” t-shirt owners.

[Diagram: a rectangle containing overlapping circles P and T.]

Initially, you feel confident because the circle P takes up a small proportion of the rectangle. But when you learn that your randomly selected person is in the circle T, you feel bad because the circle P covers almost all of T. In mathematical language, the probability that a random Melbournian is a Python coder is low, but the probability that a random Melbournian is a Python coder given that they own a “Hello, world!” t-shirt is high.

22.1 Conditional probability
Conditional probabilities measure the likelihood of an event, given that some other event occurs.

  For events A and B, the conditional probability of A given B is
  $$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$

This definition also implies that
  Pr(A ∩ B) = Pr(A|B)Pr(B).

Example. The spinner from the last lecture is spun. Let A be the event that the result was at least 3 and B be the event that the result was even. What is Pr(A|B)?
  $\Pr(A \cap B) = \Pr(4) = \frac{1}{8}$
  $\Pr(B) = \Pr(2) + \Pr(4) = \frac{1}{4} + \frac{1}{8} = \frac{3}{8}$
Thus,
  $\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = (\frac{1}{8})/(\frac{3}{8}) = \frac{1}{3}$.

Example. A binary string of length 6 is generated uniformly at random. Let A be the event that the first bit is a 1 and B be the event that the string contains two 1s. What is Pr(A|B)?
There are $2^6$ strings in our sample space. Now A ∩ B occurs when the first bit is 1 and the rest of the string contains 1 one. There are $\binom{5}{1}$ such strings and so $\Pr(A \cap B) = \binom{5}{1}/2^6$. Also,

Our definition of conditional probability gives us another way of defining independence. We can say that events A and B are independent if
  Pr(A) = Pr(A|B).
This makes sense intuitively: it is a formal way of saying that the likelihood of A does not depend on whether or not B occurs.

22.3 Independent repeated trials
Generally if we perform exactly the same action multiple times, the results for each trial will be independent of the others. For example, if we roll a die twice, then the result of the first roll will be independent of the result of the second.
For two independent repeated trials, each from a sample space S, our overall sample space is S × S and our probability function will be given by $\Pr((s_1, s_2)) = \Pr(s_1)\Pr(s_2)$. For three independent repeated trials the sample space is S × S × S and the probability function $\Pr((s_1, s_2, s_3)) = \Pr(s_1)\Pr(s_2)\Pr(s_3)$, and so on.

Example. The spinner from the previous example is spun twice. What is the probability that the results add to 5?
A total of 5 can be obtained as (1,4), (4,1), (2,3)
or (3,2). Because the spins are independent:
  $\Pr((1,4)) = \Pr((4,1)) = \frac{1}{2} \times \frac{1}{8} = \frac{1}{16}$
  $\Pr((2,3)) = \Pr((3,2)) = \frac{1}{4} \times \frac{1}{8} = \frac{1}{32}$
So, because (1,4), (4,1), (2,3) and (3,2) are mutually exclusive, the probability of the total being 5 is $\frac{1}{16} + \frac{1}{16} + \frac{1}{32} + \frac{1}{32} = \frac{3}{16}$.

22.4 Bayes’ theorem
Bayes’ theorem gives a way of calculating the conditional probability of an event A given an event B when we already know the probabilities of A, of B given A, and of B given $\bar{A}$ (the event that A does not occur).

  Bayes’ theorem. For the events A and B,
  $$\Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|\bar{A})\Pr(\bar{A})}.$$

Example. A binary string is created so that the first bit is a 0 with probability $\frac{1}{3}$ and then each subsequent bit is the same as the preceding one with probability $\frac{3}{4}$. What is the probability that the first bit is 0, given that the second bit is 0?
Let F be the event that the first bit is 0 and let S be the event that the second bit is 0. So $\Pr(F) = \frac{1}{3}$. If F occurs then the second bit will be 0 with probability $\frac{3}{4}$ and so $\Pr(S|F) = \frac{3}{4}$. If F does not occur then the second bit will be 0 with probability $\frac{1}{4}$ and so $\Pr(S|\bar{F}) = \frac{1}{4}$. So, by Bayes’ theorem,
$$\begin{aligned}
\Pr(F|S) &= \frac{\Pr(F)\Pr(S|F)}{\Pr(F)\Pr(S|F) + \Pr(\bar{F})\Pr(S|\bar{F})} \\
&= \frac{\frac{1}{3} \times \frac{3}{4}}{\frac{1}{3} \times \frac{3}{4} + \frac{2}{3} \times \frac{1}{4}} \\
&= (\tfrac{1}{4})/(\tfrac{5}{12}) \\
&= \tfrac{3}{5}.
\end{aligned}$$
22.5 Bayes’ theorem examples
Example. Luke Skywalker discovers that some porgs have an extremely rare genetic mutation that makes them powerful force users. He develops a test for this mutation that is right 99% of the time and decides to test all the porgs on Ahch-To. Suppose there are 100 mutant porgs in the population of 24 million. We would guess that the test would come up positive for 99 of the 100 mutants, but also for 239 999 non-mutants.
We are assuming that the conditional probability of a porg testing positive given it’s a mutant is 0.99. But what is the conditional probability of it being a mutant given that it tested positive? From our guesses, we would expect this to be $\frac{99}{99 + 239999} \approx 0.0004$. Bayes’ theorem gives us a way to formalise this. Writing M for the event that a porg is a mutant and P for the event that it tests positive:
$$\begin{aligned}
\Pr(M|P) &= \frac{\Pr(P|M)\Pr(M)}{\Pr(P|M)\Pr(M) + \Pr(P|\bar{M})\Pr(\bar{M})} \\
&= \frac{\frac{100}{24000000} \times 0.99}{\frac{100}{24000000} \times 0.99 + \left(1 - \frac{100}{24000000}\right) \times 0.01} \\
&= \frac{99}{99 + 239999} \\
&\approx 0.0004.
\end{aligned}$$

Questions
22.2 A standard die is rolled twice. What is the probability that the first roll is a 1, given that the sum of the rolls is 6?
22.3 A bag contains three black marbles and two white marbles and they are randomly selected and removed, one at a time until the bag is empty. Use Bayes’ theorem to calculate the probability that the first marble selected is black, given that the second marble selected is black.
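Both Bayes' theorem examples in this lecture use the same three inputs: Pr(A), Pr(B|A) and the probability of B given that A does not occur. A minimal sketch of the formula as a function follows; the name `bayes` and its parameter names are chosen for this illustration.

```python
from fractions import Fraction


def bayes(pr_a, pr_b_given_a, pr_b_given_not_a):
    """Pr(A|B) from Pr(A), Pr(B|A) and Pr(B|not A), using Bayes' theorem."""
    numerator = pr_b_given_a * pr_a
    return numerator / (numerator + pr_b_given_not_a * (1 - pr_a))


# First bit is 0, given that the second bit is 0.
first_bit = bayes(Fraction(1, 3), Fraction(3, 4), Fraction(1, 4))

# Porg is a mutant, given that it tested positive.
porg = bayes(Fraction(100, 24_000_000), Fraction(99, 100), Fraction(1, 100))

print(first_bit, porg, float(porg))
```

Exact fractions make it easy to see that the porg answer is the same ratio, 99 to 99 + 239 999, that the informal counting argument gave.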
Lecture 23: Random variables
In a game, three standard dice will be rolled and the number of sixes will be recorded. We could let X stand for the number of sixes rolled. Then X is a special kind of variable whose value is based on a random process. These are called random variables.
Because the value of X is random, it doesn’t make sense to ask whether X = 0, for example. But we can ask what the probability is that X = 0 or that X > 2. This is because “X = 0” and “X > 2” correspond to events from our sample space.

23.1 Formal definition
Formally, a random variable is defined as a function from the sample space to R. In the example above, X is a function from the process’s sample space that maps every outcome to the number of sixes in that outcome.

Example. Let X be the number of 1s in a binary string of length 2 chosen uniformly at random. Formally, X is a function from {00, 01, 10, 11} to {0, 1, 2} such that
  X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2.
For most purposes, however, we can think of X as simply a special kind of variable.

23.2 Probability distribution
We can describe the behaviour of a random variable X by listing, for each value x that X can take, the probability that X = x. This gives the probability distribution of the random variable. Again, formally this listing is a function from the values of X to their probabilities.

Example. Continuing with the last example, the probability distribution of X is given by
$$\Pr(X = x) = \begin{cases} \frac{1}{4} & \text{if } x = 0 \\ \frac{1}{2} & \text{if } x = 1 \\ \frac{1}{4} & \text{if } x = 2. \end{cases}$$
It can be convenient to give this as a table:

  x           0     1     2
  Pr(X = x)   1/4   1/2   1/4

Example. A standard die is rolled three times. Let X be the number of sixes rolled. What is the probability distribution of X? Obviously X can only take values in {0, 1, 2, 3}. Each roll there is a six with probability $\frac{1}{6}$ and not a six with probability $\frac{5}{6}$. The rolls are independent.
  $\Pr(X = 0) = \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6}$
  $\Pr(X = 1) = (\frac{1}{6})(\frac{5}{6})(\frac{5}{6}) + (\frac{5}{6})(\frac{1}{6})(\frac{5}{6}) + (\frac{5}{6})(\frac{5}{6})(\frac{1}{6})$
  $\Pr(X = 2) = (\frac{1}{6})(\frac{1}{6})(\frac{5}{6}) + (\frac{1}{6})(\frac{5}{6})(\frac{1}{6}) + (\frac{5}{6})(\frac{1}{6})(\frac{1}{6})$
  $\Pr(X = 3) = \frac{1}{6} \times \frac{1}{6} \times \frac{1}{6}$
So the probability distribution of X is

  x           0         1        2        3
  Pr(X = x)   125/216   75/216   15/216   1/216

23.3 Independence
We have seen that two events are independent when the occurrence or non-occurrence of one event does not affect the likelihood of the other occurring. Similarly two random variables are independent if the value of one does not affect the likelihood that the other will take a certain value.

  Random variables X and Y are independent if, for all x and y,
  Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).

Example. An integer is generated uniformly at random from the set {10, 11, . . . , 29}. Let X and Y be its first and second (decimal) digit. Then X and Y are independent random variables because, for x ∈ {1, 2} and y ∈ {0, 1, . . . , 9},
  $\Pr(X = x \wedge Y = y) = \frac{1}{20} = \frac{1}{2} \times \frac{1}{10} = \Pr(X = x)\Pr(Y = y)$.

23.4 Operations
From a random variable X, we can create new random variables such as X + 1, 2X and $X^2$. These variables work as you would expect them to.

Example. If X is the random variable with distribution

  x           −1    0     1
  Pr(X = x)   1/6   1/3   1/2

then the distributions of X + 1, 2X and $X^2$ are

  y               0     1     2
  Pr(X + 1 = y)   1/6   1/3   1/2

  y            −2    0     2
  Pr(2X = y)   1/6   1/3   1/2

  y              0     1
  Pr(X² = y)     1/3   2/3

Questions
23.1 An elevator is malfunctioning. Every minute it is equally likely to ascend one floor, descend one floor, or stay where it is. When it begins malfunctioning it is on level 5. Let X be the level it is on three minutes later. Find the probability distribution for X.
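The distribution tables in this lecture can be reproduced by brute force. The sketch below treats a random variable as a function on a uniform sample space, as in Section 23.1, and tabulates its distribution; a second helper applies an operation such as squaring. The names `distribution` and `transform` are invented for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes, variable):
    """Distribution of `variable` when every outcome is equally likely."""
    outcomes = list(outcomes)
    counts = Counter(variable(o) for o in outcomes)
    return {value: Fraction(c, len(outcomes)) for value, c in sorted(counts.items())}


def transform(dist, f):
    """Distribution of f(X), merging values of X that f sends to the same number."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out


# Number of sixes in three rolls of a standard die.
sixes = distribution(product(range(1, 7), repeat=3), lambda rolls: rolls.count(6))
print(sixes)

# The random variable X from Section 23.4 and its square.
X = {-1: Fraction(1, 6), 0: Fraction(1, 3), 1: Fraction(1, 2)}
print(transform(X, lambda x: x * x))
```

Squaring sends both −1 and 1 to 1, which is why the distribution of the square has only two values.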
When we said “average” above, we really meant “mean”. Remember that the mean of a collection of numbers is the sum of the numbers divided by how many of them there are. So the mean of $x_1, \ldots, x_t$ is $\frac{x_1 + \cdots + x_t}{t}$. The mean of 2, 2, 3 and 11 is $\frac{2+2+3+11}{4} = 4.5$, for example.
The expected value of a random variable is calculated as a weighted average of its possible values.

  If X is a random variable with distribution

    x           x1   x2   · · ·   xt
    Pr(X = x)   p1   p2   · · ·   pt

  then the expected value of X is
  $$E[X] = p_1 x_1 + p_2 x_2 + \cdots + p_t x_t.$$

Example. If X is a random variable representing a die roll, then
$$E[X] = \frac{1}{6} \times 1 + \frac{1}{6} \times 2 + \cdots + \frac{1}{6} \times 6 = 3.5.$$

Example. Someone estimates that each year the share price of Acme Corporation has a 10% chance of increasing by $10, a 50% chance of increasing by $4, and a 40% chance of falling by $10. Assuming that this estimate is good, are Acme shares likely to increase in value over the long term?
We can represent the change in the Acme share price by a random variable X with distribution

  x           −10   4     10
  Pr(X = x)   2/5   1/2   1/10

Our initial die-rolling example hinted that the average of a large number of independent trials will get very close to the expected value. This is mathematically guaranteed by a famous theorem called the law of large numbers.

  Let $X_1, X_2, \ldots$ be independent random variables, all with the same distribution and expected value µ. Then
  $$\lim_{n \to \infty} \frac{1}{n}(X_1 + \cdots + X_n) = \mu.$$

24.3 Linearity of expectation
We saw in the last lecture that adding random variables can be difficult. Finding the expected value of a sum of random variables is easy if we know the expected values of the variables.

  If X and Y are random variables, then
  E[X + Y] = E[X] + E[Y].

This works even if X and Y are not independent.
Similarly, finding the expected value of a scalar multiple of a random variable is easy if we know the expected value of the variable.

  If X is a random variable and s ∈ R, then
  E[sX] = sE[X].

Example. Two standard dice are rolled. What is the expected total?
Let $X_1$ and $X_2$ be random variables representing the first and second die rolls. From the earlier example $E[X_1] = E[X_2] = 3.5$ and so
  $E[X_1 + X_2] = E[X_1] + E[X_2] = 3.5 + 3.5 = 7$.

Example. What is the expected number of ‘11’ substrings in a binary string of length 5 chosen uniformly at random?
For i = 1, . . . , 4, let $X_i$ be a random variable that is equal to 1 if the ith and (i + 1)th bits of the string are both 1 and is equal to 0 otherwise. Then $X_1 + \cdots + X_4$ is the number of ‘11’ substrings in the string. Because the bits are independent, $\Pr(X_i = 1) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$ and $E[X_i] = \frac{1}{4}$ for i = 1, . . . , 4. So,
$$E[X_1 + \cdots + X_4] = E[X_1] + \cdots + E[X_4] = \frac{4}{4} = 1.$$
Note that the variables $X_1, \ldots, X_4$ in the above example were not independent, but we were still allowed to use linearity of expectation.

24.4 Variance
Think of the random variables X, Y and Z whose distributions are given below.

  x           −1       99
  Pr(X = x)   99/100   1/100

  y           −1    1
  Pr(Y = y)   1/2   1/2

  z           −50   50
  Pr(Z = z)   1/2   1/2

These variables are very different. Perhaps X corresponds to buying a raffle ticket, Y to making a small bet on a coin flip, and Z to making a large bet on a coin flip. However, if you only consider expected value, all of these variables look the same: they each have expected value 0.
To give a bit more information about a random variable we can define its variance, which measures how “spread out” its distribution is.

Similarly,
  $\mathrm{Var}[Y] = \frac{1}{2} \times (-1)^2 + \frac{1}{2} \times 1^2 = 1$
  $\mathrm{Var}[Z] = \frac{1}{2} \times (-50)^2 + \frac{1}{2} \times 50^2 = 2500$.
Notice that the variance of X is much smaller than the variance of Z because X is very likely to be close to its expected value whereas Z will certainly be far from its expected value.

Example. Let X be a random variable with distribution given by

  x           0     2     6
  Pr(X = x)   1/6   1/2   1/3

Then the expected value of X is
  $E[X] = \frac{1}{6} \times 0 + \frac{1}{2} \times 2 + \frac{1}{3} \times 6 = 3$.
So, the variance of X is
  $\mathrm{Var}[X] = \frac{1}{6} \times (0-3)^2 + \frac{1}{2} \times (2-3)^2 + \frac{1}{3} \times (6-3)^2 = 5$.

Questions
24.1 Do you agree or disagree with the following statement? “The expected value of a random variable is the value it is most likely to take.”
24.2 Let X be the sum of 1000 spins of the spinner from Lecture 21, and let Y be 1000 times the result of a single spin. Find E[X] and E[Y]. Which of X and Y do you think would have greater variance?
24.3 Let X be the number of heads occurring when three fair coins are flipped. Find E[X] and Var[X].
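The expected value and variance calculations in this lecture follow one pattern, so they are easy to script. The sketch below stores a distribution as a dictionary from values to probabilities and assumes the variance is the weighted average of squared distances from the expected value, which is how the worked example with values 0, 2 and 6 computes it. The function names are chosen here for illustration.

```python
from fractions import Fraction


def expected_value(dist):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(p * x for x, p in dist.items())


def variance(dist):
    """Weighted average of (x - E[X])^2 (assumed definition, matching the worked example)."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}
example = {0: Fraction(1, 6), 2: Fraction(1, 2), 6: Fraction(1, 3)}
coin_small = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
coin_large = {-50: Fraction(1, 2), 50: Fraction(1, 2)}

print(expected_value(die))                          # the die-roll example
print(expected_value(example), variance(example))   # the worked variance example
print(variance(coin_small), variance(coin_large))   # Var[Y] and Var[Z]
```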
We have $E[X] = \frac{a+b}{2}$ and $\mathrm{Var}[X] = \frac{(b-a+1)^2 - 1}{12}$.

[Figure: Uniform distribution with a = 3, b = 8; horizontal axis k, vertical axis Pr(X = k).]

The Bernoulli distribution with parameter p ∈ [0, 1] is given by
$$\Pr(X = k) = \begin{cases} p & \text{for } k = 1 \\ 1 - p & \text{for } k = 0. \end{cases}$$
We have E[X] = p and Var[X] = p(1 − p).

The binomial distribution with parameters $n \in \mathbb{Z}^+$ and p ∈ [0, 1] is given by
$$\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, \ldots, n\}.$$
We have E[X] = np and Var[X] = np(1 − p).
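The binomial formulas above can be sanity-checked numerically. The sketch below builds the distribution for n = 20 and p = 1/2 with exact fractions and recomputes the mean and variance from the probabilities; `binomial_pmf` is a name made up for this example.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """List of Pr(X = k) for k = 0..n under the binomial distribution."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 20, Fraction(1, 2)
pmf = binomial_pmf(n, p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(sum(pmf), mean, var)  # compare with 1, n*p and n*p*(1-p)
```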
[Figure: Binomial distribution with n = 20, p = 0.5; horizontal axis k, vertical axis Pr(X = k).]

λ = 6 approximates the probability it receives k calls in a certain minute. It follows that the expected value is 6 calls and the variance is 6.

Questions
Just as the structure of the natural numbers supports induction as a method of proof, it supports induction as a method of definition or of computation.
When used in this way, induction is usually called recursion, and one speaks of a recursive definition or a recursive algorithm.

26.1 Recursive Definitions
Many well known functions f(n) are most easily defined in the “base step, induction step” format, because f(n + 1) depends in some simple way on f(n).
The induction step in the definition is more commonly called the recurrence relation for f, and the base step the initial value.

Example. The factorial f(n) = n!
  Initial value. 0! = 1.
  Recurrence relation. (k + 1)! = (k + 1) × k!
Many programming languages allow this style of definition, and the value of the function is then computed by a descent to the initial value.
For example, to compute 4!, the machine successively computes
  4! = 4 × 3!
     = 4 × (3 × 2!)
     = 4 × (3 × (2 × (1!)))
     = 4 × (3 × (2 × (1 × 0!)))
which can finally be evaluated since 0! = 1.
Remark. The numbers 4, 3, 2, 1 have to be stored on a “stack” before the program reaches the initial value 0! = 1 which finally enables it to evaluate 4!.
Thus a recursive program, though short, may run slowly and even cause “stack overflow.”

Example. The Fibonacci sequence
  0, 1, 1, 2, 3, 5, 8, . . .
The nth number F(n) in this sequence is defined by
  Initial values. F(0) = 0, F(1) = 1.
  Recurrence relation. F(k + 1) = F(k) + F(k − 1).
Remark. Using a recursive program to compute Fibonacci numbers can easily lead to stack overflow, because each value depends on two previous values (each of which depends on another two, and so on).
A more efficient way to use the recursive definition is to use three variables to store F(k + 1), F(k) and F(k − 1). The new values of these variables, as k increases by 1, depend only on the three stored values, not on all the previous values.

26.2 Properties of recursively defined functions
These are naturally proved by induction, using a base step and induction step which parallel those in the definition of the function.

Example. For $n \ge 5$, 10 divides n!
Proof. Base step.
  5! = 5 × 4 × 3 × 2 × 1 = 10 × 4 × 3,
hence 10 divides 5!.
Induction step. We have to show
  10 divides k! ⇒ 10 divides (k + 1)!
Since (k + 1)! = (k + 1) × k! by the recurrence relation for factorial, the induction step is clear, and hence the induction is complete.

Example. F(0) + F(1) + · · · + F(n) = F(n + 2) − 1.
Proof. Base step. F(0) = 0 = F(2) − 1, because F(2) = 1.
Induction step. We have to show
  F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
  ⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1.
Well,
  F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
  ⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 2) + F(k + 1) − 1,
    by adding F(k + 1) to both sides
  ⇒ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1
    since F(k + 2) + F(k + 1) = F(k + 3) by the Fibonacci recurrence relation.
This completes the induction.

hence it follows (by induction) that
  f(n) = number of n bit strings with no consecutive 0s = F(n + 2).

Questions
26.1 A function s(n) is defined recursively by
  Initial value: s(0) = 0
  Recurrence relation: s(n + 1) = s(n) + 2n + 1
Write down the first few values of s(n), and guess what function s is.
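The remark about storing only the most recent Fibonacci values can be turned into a short loop. The sketch below is one possible version, not the notes' own program: it keeps two variables rather than the three mentioned above, since F(k + 1) can be formed on the fly, and it then spot-checks the sum identity from Section 26.2.

```python
def fib(n):
    """F(n), computed iteratively from the recurrence F(k+1) = F(k) + F(k-1)."""
    prev, curr = 0, 1  # F(0) and F(1)
    for _ in range(n):
        # Slide the window one step: (F(k), F(k+1)) becomes (F(k+1), F(k+2)).
        prev, curr = curr, prev + curr
    return prev


print([fib(n) for n in range(7)])

# Spot-check F(0) + F(1) + ... + F(n) = F(n+2) - 1 for small n.
for n in range(20):
    assert sum(fib(i) for i in range(n + 1)) == fib(n + 2) - 1
```

Because there are no nested calls, nothing accumulates on the stack, so even `fib(1000)` is computed without trouble.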
Example. $1 + a + a^2 + \cdots + a^n$
This is the function g(n) defined by
  Initial value. g(0) = 1
  Recurrence relation. $g(k+1) = g(k) + a^{k+1}$
We can use this relation to prove by induction that $g(n) = \frac{a^{n+1} - 1}{a - 1}$ (a formula for the sum of a geometric series), provided $a \ne 1$.

Proof
Base step. For n = 0, $1 = g(0) = \frac{a^{0+1} - 1}{a - 1}$, as required.
Induction step. We want to prove
$$g(k) = \frac{a^{k+1} - 1}{a - 1} \Rightarrow g(k+1) = \frac{a^{k+2} - 1}{a - 1}.$$
Well,
$$\begin{aligned}
g(k) = \frac{a^{k+1} - 1}{a - 1}
\Rightarrow g(k+1) &= \frac{a^{k+1} - 1}{a - 1} + a^{k+1} \\
\Rightarrow g(k+1) &= \frac{a^{k+1} - 1 + (a - 1)a^{k+1}}{a - 1} \\
&= \frac{a^{k+2} + a^{k+1} - a^{k+1} - 1}{a - 1} \\
&= \frac{a^{k+2} - 1}{a - 1} \quad \text{as required.}
\end{aligned}$$
This completes the induction.

$1 + a + a^2 + \cdots + a^n$ is written $\sum_{k=0}^{n} a^k$.
Σ is capital sigma, standing for “sum.”
$1 \times 2 \times 3 \times \cdots \times n$ is written $\prod_{k=1}^{n} k$.
Π is capital pi, standing for “product.”

27.4 Binary search algorithm
Given a list of n numbers in order
  $x_1 < x_2 < \cdots < x_n$,
we can find whether a given number a is in the list by repeatedly “halving” the list.
The algorithm binary search is specified recursively by a base step and a recursive step.

Base step. If the list is empty, report ‘a is not in the list.’

Recursive step. If the list is not empty, see whether its middle element is a. If so, report ‘a found.’
Otherwise, if the middle element m > a, binary search the list of elements < m.
And if the middle element m < a, binary search the list of elements > m.
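The recursive specification of binary search translates almost line for line into code. The sketch below is one such translation: it returns True or False instead of printing a report, and it takes the element at index `len(xs) // 2` as the “middle” element, both of which are choices made here rather than details fixed by the notes.

```python
def binary_search(xs, a):
    """Return True if a occurs in the sorted list xs (all entries distinct)."""
    if not xs:
        # Base step: an empty list cannot contain a.
        return False
    mid = len(xs) // 2
    m = xs[mid]
    if m == a:
        return True
    if m > a:
        # Search the elements smaller than m.
        return binary_search(xs[:mid], a)
    # Otherwise m < a: search the elements larger than m.
    return binary_search(xs[mid + 1:], a)


numbers = [1, 3, 4, 7, 9, 11]
print(binary_search(numbers, 7), binary_search(numbers, 8))
```

Slicing copies part of the list on every call, which keeps the code close to the specification. A version that passes index bounds instead would avoid the copying.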
27.5 Correctness
We prove that the algorithm works on a list of n items by strong induction on n.
Base step. The algorithm works correctly on a list of 0 numbers, by reporting that a is not in the list.
Induction step. Assuming the algorithm works correctly on any list of < k + 1 numbers, suppose we have a list of k + 1 numbers.
The recursive step either finds a as the middle number in the list, or else produces a list of < k + 1 numbers to search, which by assumption it will do correctly.
This completes the induction.
Remark. This example shows how easy it is to prove correctness of recursive algorithms, which may be why they are popular despite the practical difficulties in implementing them.

27.6 Running time
$\log_2 n$ is the number x such that
  $n = 2^x$.
For example, $1024 = 2^{10}$, and therefore
  $\log_2 1024 = 10$.
Similarly $\log_2 512 = 9$, and hence $\log_2 1000$ is between 9 and 10.
Repeatedly dividing 1000 by 2 (and discarding remainders of 1) runs for 9 steps:
  500, 250, 125, 62, 31, 15, 7, 3, 1
The 10 halving steps for 1024 are
  512, 256, 128, 64, 32, 16, 8, 4, 2, 1
This means that the binary search algorithm would do at most 9 “halvings” in searching a list of 1000 numbers and at most 10 “halvings” for 1024 numbers.
More generally, binary search needs at most $\lfloor \log_2 n \rfloor$ “halvings” to search a list of n numbers, where $\lfloor \log_2 n \rfloor$ is the floor of $\log_2 n$, the greatest integer $\le \log_2 n$.
Remark. In a telephone book with 1,000,000 names, which is just under $2^{20}$, it takes at most 20 halvings (using alphabetical order) to find whether a given name is present.

27.7 20 questions
A mathematically ideal way to play 20 questions would be to divide the number of possibilities in half with each question.
E.g. if the answer is an integer, do binary search on the list of possible answers. If the answer is a word, do binary search on the list of possible answers (ordered alphabetically). If this is done, then 20 questions suffice to find the correct answer out of $2^{20}$ = 1,048,576 possibilities.

Questions
27.1 Rewrite the following sums using Σ notation.
  • $1 + 4 + 9 + 16 + \cdots + n^2$
  • $1 - 2 + 3 - 4 + \cdots - 2n$
27.2 Which of the proofs in this lecture uses strong induction?
27.3 Imagine a game where the object is to identify a natural number between 1 and $2^{20}$ using 20 questions with YES-NO answers. The lecture explains why 20 questions are sufficient to identify any such number.
Explain why fewer than 20 YES-NO questions are not always sufficient.
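The halving counts in Section 27.6 can be reproduced with a short loop. The sketch below counts how many times n can be halved (discarding remainders) before reaching 1; the function name `halvings` is invented here, and the check against `n.bit_length() - 1`, which equals the floor of log2 n for positive integers, avoids floating-point logarithms.

```python
def halvings(n):
    """Number of times n can be halved, discarding remainders, before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


print(halvings(1000), halvings(1024), halvings(1_000_000))

# The count agrees with the floor of log2(n) for every n tried here.
assert all(halvings(n) == n.bit_length() - 1 for n in range(1, 5000))
```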
Lecture 28: Recursion, lists and sequences
Questions
28.1 Find the next four values of each of the fol-
lowing recurrence relations. What order is
each recurrence relation? Which are ho-
mogeneous and which are inhomogeneous?
(a) $r_{k+1} = r_k + k^2$, $r_0 = 0$.
A graph consists of a set of objects called vertices and a list of pairs of vertices, called edges.
Graphs are normally represented by pictures, with vertex A represented by a dot labelled A and each edge AB represented by a curve joining A and B.
Such pictures are helpful for displaying data or relationships, and they make it easy to recognise properties which might otherwise not be noticed.
The description by lists of vertices and edges is useful when graphs have to be manipulated by computer. It is also a useful starting point for precise definitions of graph concepts.

29.1 Examples of graphs

  Description:  Vertices: A, B, C
                Edges: AB, BC, CA
  Picture:      [a triangle on the vertices A, B, C]

Such a graph, with at most one edge between each pair of vertices, and no vertex joined to itself, is called a simple graph.

  Description:  Vertices: A, B, C, D
                Edges: AB, AB, BC, BC, AD, BD, CD
  Picture:      [the vertices A, B, C, D with the edges listed]

The two edges AB which join the same pair of vertices are called parallel edges.

  Description:  Vertices: A, B
                Edges: AA, AB, AB
  Picture:      [the vertices A, B with the edges listed]

The edge joining A to A is called a loop.
The name multigraph is used when loops and/or parallel edges are allowed.

29.2 Problems given by graphs
Many problems require vertices to be connected by a “path” of successive edges. We shall define paths (and related concepts) next lecture, but the following examples illustrate the idea and show how often it comes up.
They also show how helpful it is to have graph pictures when searching for paths.

1. Gray Codes
The binary strings of length n are taken as the vertices, with an edge joining any two vertices that differ in only one digit. This graph is called the n-cube.
E.g. the 2-digit binary strings form a square (a “2-cube”)

  01   11
  00   10

and the 3-digit binary strings form an ordinary cube (a “3-cube”).

[Picture: a cube with vertices 000, 001, 010, 011, 100, 101, 110, 111.]

A Gray code of length n is a path which includes each vertex of the n-cube exactly once. E.g. here is a path in the 3-cube which gives the Gray code
  000, 001, 011, 010, 110, 111, 101, 100

[Picture: the same cube with this path marked.]
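The Gray code listed above happens to be the standard “reflected” Gray code, which can be generated with a well-known bit trick: the i-th string is i XOR (i shifted right by one). The sketch below generates it and checks that consecutive strings are joined by an edge of the n-cube. The function names are chosen for this illustration, and this construction is only one of many Gray codes.

```python
def gray_code(n):
    """Reflected binary Gray code on n >= 1 bits, as a list of bit strings."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


def differ_in_one_digit(u, v):
    """True if u and v are joined by an edge of the n-cube."""
    return sum(a != b for a, b in zip(u, v)) == 1


code = gray_code(3)
print(code)
print(all(differ_in_one_digit(u, v) for u, v in zip(code, code[1:])))
```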
There are several ways of “travelling” around the edges of a graph.
A walk is a sequence
\[ V_1, e_1, V_2, e_2, V_3, e_3, \ldots, e_{n-1}, V_n, \]
where each ei is an edge joining vertex Vi to vertex Vi+1. (In a simple graph, where at most one edge joins Vi and Vi+1, it is sufficient to list the vertices alone.)
If Vn = V1 the walk is said to be closed.
A path is a walk with no repeated vertices.
A trail is a walk with no repeated edges.
30.1 Examples
In these pictures, a walk is indicated by a directed curve running alongside the actual edges in the walk.
[Picture] A walk which is not a trail or a path. (Repeated edge, repeated vertex.)
[Picture] A trail which is not a path. (Repeated vertex.)
\[ a_{ij} = \begin{cases} 1 & \text{if } V_i \text{ is adjacent to } V_j \text{ in } G, \\ 0 & \text{otherwise.} \end{cases} \]
For example, the graph
[Picture: graph on the vertices V1, V2, V3]
has adjacency matrix
\[ \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \]
30.3 Adjacency matrix powers
The product of matrices
\[ \begin{pmatrix} a_{11} & a_{12} & \cdots \\ a_{21} & a_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} \times \begin{pmatrix} b_{11} & b_{12} & \cdots \\ b_{21} & b_{22} & \cdots \\ \cdots & \cdots & \cdots \end{pmatrix} \]
is the matrix whose (i, j) entry is
\[ a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j} + \cdots, \]
the “dot product” of the ith row
\[ \begin{pmatrix} a_{i1} & a_{i2} & a_{i3} & \cdots \end{pmatrix} \]
of the matrix on the left with the jth column
\[ \begin{pmatrix} b_{1j} \\ b_{2j} \\ b_{3j} \\ \vdots \end{pmatrix} \]
of the matrix on the right.
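The row-by-column rule can be tried on the example matrix with a small Python sketch. The helper names `adjacency_matrix` and `mat_mul` are illustrative, and the edge list (V1 adjacent to V3, V2 adjacent to V3) is read off the example matrix rather than the picture. It is a standard fact, not stated in the lines above, that the (i, j) entry of the k-th power counts the walks of length k from Vi to Vj.

```python
def adjacency_matrix(num_vertices, edges):
    """Build the 0/1 adjacency matrix of a simple graph.
    Vertices are numbered 1..num_vertices; edges is a list of pairs (i, j)."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        matrix[i - 1][j - 1] = 1
        matrix[j - 1][i - 1] = 1  # adjacency is symmetric
    return matrix


def mat_mul(a, b):
    """Entry (i, j) is the dot product of row i of a with column j of b."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Example graph: V1 and V2 are each adjacent to V3 (read off the matrix in the text).
M = adjacency_matrix(3, [(1, 3), (2, 3)])
M2 = mat_mul(M, M)  # the square of the adjacency matrix
```

Here `M2[2][2]` is 2, matching the two walks of length 2 from V3 back to itself (through V1 and through V2).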
The reason for the name is that if each edge is viewed as a handshake, then at each vertex V
degree(V) = number of hands.
Hence
sum of degrees = total number of hands
= 2 × number of handshakes.
An important consequence
The handshaking lemma implies that in any graph the sum of degrees is even (being 2×something). Thus it is impossible, e.g. for a graph to have degrees 1,2,3,4,5.
31.2 The seven bridges of Königsberg
In 18th century Königsberg there were seven bridges connecting islands in the river to the banks as follows.
[Picture: the seven bridges of Königsberg]
31.3 Euler’s solution
Euler (1737) observed that the answer is no, because
1. Each time a walk enters and leaves a vertex it “uses up” 2 from the degree.
2. Hence if all edges are used by the walk, all vertices except the first and last must have even degree.
3. The seven bridges graph in fact has four vertices of odd degree.
31.4 Euler’s theorem
The same argument shows in general that
A graph with > 2 odd degree vertices has no trail using all its edges.
And a similar argument shows
A graph with odd degree vertices has no closed trail using all its edges.
(Because in this case the first and last vertex are the same, and its degree is “used up” by a closed trail as follows: 1 at the start, 2 each time through, 1 at the end.)
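The degree arguments above can be checked numerically with a short Python sketch. The edge list for the seven bridges multigraph is the usual textbook one, and the vertex labels (N and S for the banks, I and E for the two pieces of land in the river) are assumptions, since the picture is not reproduced here. The function names are illustrative.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex of a multigraph given as a list of edges (u, v).
    A loop (u, u) contributes 2 to the degree of u."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def odd_degree_vertices(edges):
    """Sorted list of the vertices whose degree is odd."""
    return sorted(v for v, d in degrees(edges).items() if d % 2 == 1)


def degree_sum_is_even(degree_list):
    """Necessary condition from the handshaking lemma."""
    return sum(degree_list) % 2 == 0


# Assumed labelling: N, S are the river banks, I and E the land in the river.
seven_bridges = [
    ("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
    ("N", "E"), ("S", "E"), ("I", "E"),
]

deg = degrees(seven_bridges)
# Handshaking lemma: the sum of degrees is twice the number of edges.
assert sum(deg.values()) == 2 * len(seven_bridges)
```

With this edge list `odd_degree_vertices(seven_bridges)` returns all four vertices, which is more than two, so by the argument above no trail uses every bridge. Likewise `degree_sum_is_even([1, 2, 3, 4, 5])` is false, so no graph has those degrees.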
31.5 The converse theorem
If, conversely, we have a graph G whose vertices all have even degree, is there a closed trail using all the edges of G?
Not necessarily. For example, G might be disconnected:
[Picture: a disconnected graph]
We say a graph is connected if any two of its vertices are connected by a walk (or equivalently, by a trail or a path). We call a trail using all edges of a graph an Euler trail. Then we have
[...]
connected, since any unused edge would be connected to used ones, and thus would have eventually been used).
31.6 Bridges
A bridge in a connected graph G is an edge whose removal disconnects G. E.g. the edge B is a bridge in the following graph.
[Picture: a connected graph with a bridge labelled B]
The construction of an Euler trail is improved by doing the following (Fleury’s algorithm).
• Erase each edge as soon as it is used.
• Use a bridge in the remaining graph only if there is no alternative.
It turns out, when this algorithm is used, that it is not necessary to make any detours. The improvement, however, comes at the cost of needing an algorithm to recognise bridges.
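The two rules of Fleury's algorithm can be sketched in Python as follows. This is a minimal illustration, not an efficient implementation. It assumes the graph is connected and that either every degree is even, or the start vertex is one of exactly two vertices of odd degree. Recognising a bridge by a reachability search is an assumption made here; the notes only say that some bridge-recognition algorithm is needed. All function names are illustrative.

```python
def reachable(start, edges):
    """Set of vertices reachable from start using the given list of edges."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen


def is_bridge(edge, edges):
    """True if removing one copy of edge separates its two endpoints."""
    u, v = edge
    rest = list(edges)
    rest.remove(edge)
    return v not in reachable(u, rest)


def fleury(edges, start):
    """Return an Euler trail as a list of vertices (see assumptions above)."""
    remaining = list(edges)
    trail = [start]
    current = start
    while remaining:
        incident = [e for e in remaining if current in e]
        # Use a bridge in the remaining graph only if there is no alternative.
        choice = next(
            (e for e in incident if not is_bridge(e, remaining)), incident[0]
        )
        remaining.remove(choice)  # erase each edge as soon as it is used
        u, v = choice
        current = v if u == current else u
        trail.append(current)
    return trail


# Two triangles sharing the vertex C; every vertex has even degree.
bowtie = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "C")]
closed_trail = fleury(bowtie, "A")
```

Each step repeats the bridge test from scratch, which keeps the sketch short but makes it slow on large graphs.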
A e B
is a spanning tree of
Remarks
1. T is not necessarily a tree at all steps of the algorithm, but it is at the end.
32.3 Also find spanning trees of the cube and dodecahedron which are paths.
To search a graph G systematically, it helps to have a spanning tree T, together with an ordering of the vertices of T.
[...] each vertex v to a “predecessor” among the adjacent vertices of v already in T. An arbitrary vertex is chosen as the root V0 of T.
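The idea of giving each vertex a predecessor among its neighbours already in T can be sketched in Python. The sketch adds vertices in breadth-first order. That order is an assumption, since the surviving text does not say which order is intended, and the function name `spanning_tree` and the dictionary representation are illustrative.

```python
from collections import deque


def spanning_tree(adjacency, root):
    """Breadth-first spanning tree of the connected component containing root.

    adjacency maps each vertex to a list of its neighbours.
    Returns (order, predecessor): the order in which vertices join T, and a
    dict sending each non-root vertex to its predecessor in T."""
    order = [root]
    predecessor = {}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v != root and v not in predecessor:
                predecessor[v] = u  # u is adjacent to v and already in T
                order.append(v)
                queue.append(v)
    return order, predecessor
```

The edges of T are the pairs (v, predecessor[v]), so a connected graph with n vertices yields n − 1 tree edges.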
Sets of numbers
N the set of natural numbers {0, 1, 2, 3, . . .}
Z the set of integers {. . . , −2, −1, 0, 1, 2, . . .}
Q the set of rational numbers {a/b : a, b ∈ Z, b ≠ 0}
R the set of real numbers
Number Theory
a|b a divides b b = qa for some q ∈ Z
gcd(a, b) greatest common divisor of a and b
a ≡ b (mod n) a and b are congruent modulo n n | (a − b)
Logic
¬p not p
p∧q p and q
p∨q p or q
p→q p implies q
∀x for all x
∃x there exists x
Sets
x∈A x is an element of A
{x : P (x)} the set of x such that P (x)
|A| the number of elements in A
A⊆B A is a subset of B
A∩B A intersect B {x : x ∈ A ∧ x ∈ B}
A∪B A union B {x : x ∈ A ∨ x ∈ B}
A−B set difference A minus B {x : x ∈ A ∧ x ∉ B}
Functions
f :A→B f is a function from A to B
Probability
Pr(E) probability of E
Pr(A|B) conditional probability of A given B
E[X] expected value of X
Var[X] variance of X
¬¬p ≡ p
p ∧ p ≡ p
p ∨ p ≡ p
p ∧ q ≡ q ∧ p
p ∨ q ≡ q ∨ p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q)
p ∧ T ≡ p
p ∨ F ≡ p
p ∧ F ≡ F
p ∨ T ≡ T
p ∧ (¬p) ≡ F
p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p
p ∨ (p ∧ q) ≡ p
¬∀xP(x) ≡ ∃x¬P(x)
¬∃xP(x) ≡ ∀x¬P(x)
Ordered selections without repetition
\[ n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!} \]
Unordered selections without repetition
\[ \frac{n(n-1)\cdots(n-r+1)}{r!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r} \]
Conditional probability
\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]
Bayes’ theorem
\[ \Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A) + \Pr(B|\overline{A})\Pr(\overline{A})} \]
Discrete uniform distribution
\[ \Pr(X = k) = \frac{1}{b-a+1} \quad \text{for } k \in \{a, a+1, \ldots, b\} \]
\[ E[X] = \frac{a+b}{2}, \qquad \mathrm{Var}[X] = \frac{(b-a+1)^2 - 1}{12} \]
Bernoulli distribution
\[ \Pr(X = k) = \begin{cases} p & \text{for } k = 1 \\ 1-p & \text{for } k = 0 \end{cases} \]
\[ E[X] = p, \qquad \mathrm{Var}[X] = p(1-p) \]
Geometric distribution
\[ \Pr(X = k) = p(1-p)^k \quad \text{for } k \in \mathbb{N} \]
\[ E[X] = \frac{1-p}{p}, \qquad \mathrm{Var}[X] = \frac{1-p}{p^2} \]
Binomial distribution
\[ \Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, \ldots, n\} \]
\[ E[X] = np, \qquad \mathrm{Var}[X] = np(1-p) \]
Poisson distribution
\[ \Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \in \mathbb{N} \]
\[ E[X] = \lambda, \qquad \mathrm{Var}[X] = \lambda \]
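The expected value and variance formulas for the finite distributions can be spot-checked with exact arithmetic. The following Python sketch computes E[X] and Var[X] directly from a probability table and compares them with the closed forms for the discrete uniform and binomial distributions. The function names and the sample parameters are illustrative.

```python
from fractions import Fraction
from math import comb


def mean_and_variance(pmf):
    """Exact mean and variance of a finite distribution {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    variance = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, variance


def uniform_pmf(a, b):
    """Discrete uniform distribution on {a, a+1, ..., b}."""
    return {k: Fraction(1, b - a + 1) for k in range(a, b + 1)}


def binomial_pmf(n, p):
    """Binomial distribution with n trials and success probability p."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Sample parameters chosen only for illustration.
a, b = 3, 8
n, p = 5, Fraction(1, 3)

uniform_mean, uniform_var = mean_and_variance(uniform_pmf(a, b))
binomial_mean, binomial_var = mean_and_variance(binomial_pmf(n, p))

assert uniform_mean == Fraction(a + b, 2)
assert uniform_var == Fraction((b - a + 1) ** 2 - 1, 12)
assert binomial_mean == n * p
assert binomial_var == n * p * (1 - p)
```

Using `Fraction` avoids rounding error, so the comparisons are exact. Taking n = 1 in `binomial_pmf` gives the Bernoulli case.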