
The Second-Hottest Logic Book on Earth

Ryan Christensen
& K Codell Carter
The Second-Hottest Logic Book on Earth
Winter 2018 Edition
Copyright 2018 Ryan Christensen

Contents

0 Truth-Functional Logic
0.1 Symbolizing
0.2 Scope and statement forms

1 First-Order Logic
1.1 Translating categorical statements
1.2 Relations and multiply general statements
1.3 Properties of Relations
1.4 Identity

2 First-Order Proofs
2.1 The first three rules
2.2 Existential instantiation
2.3 Quantifier Negation
2.4 Logical Truths
2.5 Strategies and Tactics

3 Axiom Systems
3.1 Axiom Systems
3.2 Axiom Systems
3.3 An axiom system for TF
3.4 An axiom system for FOL
3.5 Identity

4 Modal Logic
4.1 What is modal logic?
4.2 Models
4.3 Quantified modal logic
4.4 Models of Quantified Modal Logic

5 Arithmetic
5.1 Robinson Arithmetic (Q)
5.2 Peano Arithmetic (P)

6 Set Theory
6.1 Naive Set Theory
6.2 The Logicist Project
6.3 Zermelo-Fraenkel Set Theory
6.4 Cantor’s Theory of Transfinite Numbers
6.5 Peano’s Axioms

7 Gödel’s Proofs
7.1 The basic idea
7.2 The details
Chapter 0

Truth-Functional Logic

0.1 Symbolizing
Consider this dead-simple argument:

Dogs are mammals and cats are mammals.

Therefore, dogs are mammals.

This argument works by exploiting the fact that the premise is a combination of two simple state-
ments: ‘A and B ’. ‘And’ is a logically significant word here; the rest of that sentence could change
without changing the logical form of the sentence. A simple statement is a statement without logically
significant words, and a complex statement is one that uses logically significant words to combine one
or more simple statements. To symbolize a statement is to replace all the simple statements with
capital letters and the logically significant words with special symbols. The first symbol we’ll use is
‘&’, to symbolize statements like the premise in the above argument. (Sometimes ‘·’ or ‘∧’ are used
instead). Thus, if A=‘dogs are mammals’ and B=‘cats are mammals’, the above argument can be
symbolized like this:

A & B .˙. A

Statements of the form ‘A & B ’ are called conjunctions, and the statements on either side of the ‘&’
are called conjuncts.
Sometimes the order of the statement must be rearranged a little. Consider a statement like
‘Jezebel is happy and hungry’. Because the logical symbol ‘&’ can only combine statements, we
can’t take A to be ‘Jezebel is happy’ and B to be ‘hungry’ because ‘hungry’ is not a statement.
We have to rewrite this statement so the ‘and’ combines two whole statements: ‘Jezebel is happy
and Jezebel is hungry’. Then we have two simple statements that we combine to make a logically
complex conjunction. This is one reason we have a special word ‘conjunction’, instead of just saying
‘and’: in English, ‘and’ can combine adjectives (as in ‘happy and hungry’) or statements. The logical
conjunction can combine only statements.
Negations usually also take some rewriting. The statement ‘Jezebel is not happy’ is the negation
of the statement ‘Jezebel is happy’. If that statement is symbolized A, we can’t put the symbol for
‘not’ in between two statement symbols, since there’s only one. Instead we put it in front of the letter,
like this: ‘∼A’. (Sometimes ‘¬’ is used instead.) When reading a sentence, logicians will often say
‘it is not the case that Jezebel is happy’, to get the negation in the right place.
There are two other basic symbols of logic. The first is disjunction, symbolized ‘v’, which translates ‘or’.
Thus ‘Jezebel is happy or hungry’ could be symbolized ‘A v B ’. (Each side of a disjunction is called
a disjunct.)
The last basic symbol is the conditional, symbolized →, which translates ‘if ... then ...’. So ‘if Jezebel
is hungry, then Brünnhilde is happy’ could be symbolized ‘A → B ’. (The left side of a conditional
is called the antecedent and the right side is called the consequent.) Conditional statements are different
from conjunctions and disjunctions in that they’re asymmetrical. ‘Jezebel is happy and Brünnhilde
is happy’ means the same thing as ‘Brünnhilde is happy and Jezebel is happy’. There might be
some reason you’d say one rather than the other, but either one works as a premise to conclude
‘Brünnhilde is happy’ (or ‘Jezebel is happy’). But with conditional statements, the antecedent and
the consequent are not interchangeable. We’ve seen before how swapping the antecedent and the
consequent makes an argument valid or invalid. Because of this, it is crucial to get the antecedent
and consequent right. The difficulty is made worse by the fact that we have several different ways
of saying conditionals, and depending on how we say it, the antecedent might come before or after
the consequent in the English sentence. For example, if I say ‘The light works only if you jiggle
the switch (and sometimes not even then)’, am I saying ‘If you jiggle the switch, the light works’?
No—that’s the point of that parenthetical remark. Jiggling the switch doesn’t guarantee that the
light works. Instead, I’m saying something like ‘If the light works, you must have jiggled the switch’.
In other words, while ‘A if B ’ is symbolized ‘B → A’, ‘A only if B ’ is symbolized ‘A → B ’. This
point is worth emphasizing because it is one of the most common sources of mistakes:

p→q If p then q ; If p, q ; p only if q

q→p p if q ; only if p, q

The rule is this: the words ‘only if ’ come before the consequent; the word ‘if ’ comes before the antecedent.
There is one more common symbol, called the ‘biconditional’. It’s shorthand for a conjunction
of conditionals. So if I want to say ‘if A then B , and if B then A’, I could write

(A → B) & (B → A)

or I could use the biconditional operator, usually symbolized ↔:

A ↔ B

Sometimes we say that the biconditional symbolizes the English expression ‘if and only if ’, because
another way to say ‘if A then B , and if B then A’ is ‘A if and only if B ’.
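If you know a little programming, claims like this can be checked mechanically. Here is a short Python sketch (an aside of mine, not part of the logical notation) that runs through all four combinations of truth values and confirms that ‘A ↔ B’ never disagrees with ‘(A → B) & (B → A)’:

```python
from itertools import product

def implies(p, q):
    # the material conditional: false only when p is true and q is false
    return (not p) or q

for A, B in product([True, False], repeat=2):
    biconditional = (A == B)                       # A <-> B
    conjunction = implies(A, B) and implies(B, A)  # (A -> B) & (B -> A)
    assert biconditional == conjunction

print("A <-> B agrees with (A -> B) & (B -> A) on all four rows")
```

The loop is exhaustive: with only two statement letters there are exactly four rows to check, so passing the assertion amounts to a full truth table.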
We can combine these basic symbols, often called connectives, to symbolize statements that are
even more logically complex. We use parentheses, just as in math, to indicate which statements go
together.
Example: If Jezebel comes or Brünnhilde comes, I am leaving.
Let J =‘Jezebel comes’, B =‘Brünnhilde comes’, I =‘I am leaving’. Then the main connective in
this sentence is the conditional. The consequent of this conditional is simple, but the antecedent of
this conditional is complex. The sentence is symbolized like this:
(J v B) → I
Example: I don’t enjoy movies if the theater is loud and I don’t eat popcorn.
Let M =‘I enjoy movies’, L=‘the theater is loud’, P =‘I eat popcorn’. Again the main connective
is the conditional, but notice that the consequent comes first. The single word ‘if ’ comes before
the antecedent, so ‘the theater is loud and I don’t eat popcorn’ is the antecedent. Also, given our
symbols, two of the statements are negated.
(L & ∼P ) → ∼M
There are many English expressions that can be translated using these symbols. Here are some
of the most common:
Neither and not both
Think about the difference between these two sentences:

Neither Siegfried nor Brünnhilde can go to the bank.

Siegfried and Brünnhilde can’t both go to the bank.

The first says that Siegfried can’t go to the bank, and Brünnhilde can’t go to the bank. If either
of them can go to the bank, the sentence is false. The second says that they can’t both go. If either
of them goes alone, the sentence is true, but if they go together, it’s false. So the sentences mean
different things: they have different truth conditions.
The first can be symbolized like this:
∼S & ∼B
or it could be symbolized like this:
∼(S v B)
We can prove that these two statements are logically equivalent, and we will do that later. But it also
makes sense if you think about it. The first way of symbolizing it says that Siegfried can’t do it, and
Brünnhilde can’t do it. The second says that it’s false that either of them can do it. These statements
mean the same.
The second can be symbolized like this:

∼(S & B)

or like this:

∼S v ∼B

Again, these statements are logically equivalent. The first says that it’s not the case that Siegfried
does it and Brünnhilde does it, and the second says that either Siegfried doesn’t or Brünnhilde
doesn’t (or neither does it). Again, these mean the same.
The moral to take from this is that parentheses are important, and can change meaning.
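We will prove these equivalences later (they are known as De Morgan’s laws); in the meantime, here is a brute-force Python check, again just an aside, that tries every combination of truth values for S and B:

```python
from itertools import product

for S, B in product([True, False], repeat=2):
    # 'Neither S nor B': ~S & ~B versus ~(S v B)
    assert ((not S) and (not B)) == (not (S or B))
    # 'Not both S and B': ~(S & B) versus ~S v ~B
    assert (not (S and B)) == ((not S) or (not B))

print("both pairs agree on every row")
```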
How would you translate a sentence like ‘Jezebel is tall, but Brünnhilde is strong’? The English
sentence has some logical information: Jezebel is tall and Brünnhilde is strong. It also has some
psychological information, expressing something about the contrast between the statements or the
surprisingness of one of them. This extra psychological information is important, perhaps, but not
logically important. So this sentence could be translated into logic in just the same way as ‘Jezebel
is tall and Brünnhilde is strong’—T & S . Lots of English words are used to assert that two different
statements are true, with different amounts of coloring about how the author wants you to consider
the statements: ‘whereas’, ‘on the other hand’, ‘although’, ‘in addition’, ‘also’. These can all be
translated by the conjunction.
‘Unless’ is a right old mess. If I say ‘Siegfried will go to the bank unless Brünnhilde does’, I
probably mean ‘If Brünnhilde goes to the bank, then Siegfried won’t go, but if Brünnhilde doesn’t
go to the bank, then Siegfried will go’. This can be symbolized by the biconditional:

S ↔ ∼B

This is an exclusive disjunction. But most logic books and standardized tests think of this sentence
another way. Suppose Brünnhilde does go to the bank, but Siegfried doesn’t know, so he goes, too.
Is the sentence false in this situation? If the sentence is true in the case where both Siegfried and
Brünnhilde go to the bank, the sentence expresses an inclusive disjunction:

S v B

Bowing to tradition, we will translate ‘unless’ this way as well. But be aware of the ambiguity.
If one side of the English sentence is negated, the corresponding disjunct is negated as well. So
‘Siegfried will not go to the store unless Brünnhilde does’ can be translated

∼S v B

It could also be translated by the logically equivalent statement:

S → B

Here is a more complicated example:

The world will explode unless you enter this code exactly and press that button.

Let W =‘the world will explode’, C =‘you enter this code exactly’, and B =‘you press that button’.
We could rewrite the sentence using these abbreviations:

W unless C and B .

The main connective here is ‘unless’—you must do the two things, or else the world will explode.
So ‘C and B ’ should be put in parentheses. ‘Unless’ is a disjunction, so the whole statement is this:

W v (C & B)

Here’s a table summarizing some common English expressions and their logical translation:

not p ∼p
it is not the case that p ∼p
it is false that p ∼p
p and q p&q
p but q p&q
neither p nor q ∼p & ∼q or ∼(p v q)
not both p and q ∼(p & q) or ∼p v ∼q
p or q pvq
p unless q pvq
not p unless q ∼p v q or p → q
if p, q p→q
p if q q→p
p only if q p→q
only if p, q q→p
p if and only if q p↔q

Let the letters on the left symbolize the statements on the right:
I I eat ice cream
C you eat cake
P we eat pie
B we eat brownies
Translate each of the following sentences into logical notation:

1 If I eat ice cream, you eat cake.

2 We eat pie, but I eat ice cream.

3 If we eat pie, you don’t eat cake.

4 Either we eat pie, or if I don’t eat ice cream, you don’t eat cake.

5 I eat ice cream only if you eat cake.

6 Either I eat ice cream, or, if I don’t eat ice cream and you don’t eat cake, then we eat brownies.

7 I don’t eat ice cream, nor do you eat cake.

8 We eat pie only if either you eat cake or I eat ice cream.

9 If I eat ice cream, then if you eat cake, we eat both pie and brownies.

10 If either you eat cake or we eat pie, then if I eat ice cream, we don’t eat brownies.

11 I eat ice cream only if we eat brownies.

12 I don’t eat ice cream unless you eat cake.

13 If we eat brownies, I don’t eat ice cream.

14 You eat cake only if we eat both pie and brownies.

15 If I eat ice cream and we eat pie, either you eat cake or we eat brownies.

16 We eat both pie and brownies, but if I eat ice cream, you eat cake.

17 You don’t eat cake if either I eat ice cream or we don’t eat brownies.

18 We neither eat brownies nor pie if I don’t eat ice cream.

19 If we neither eat pie nor brownies, then if I eat ice cream, you don’t eat cake.

20 We don’t eat pie, or if we do eat pie, we also eat brownies.

Choose your own letters for the simple statements, and symbolize each of the following sen-
tences. Be sure to write down the meaning of the simple statements.

21 If Nebuchadnezzar is a basketball player, then Nebuchadnezzar is tall.

22 If all basketball players are tall, then Nebuchadnezzar is tall.


23 This book is too long, but I am enjoying it anyway.

24 The government will default on its debt unless the federal bank cuts interest rates and the
treasury prints more money.

25 The federal bank will cut interest rates only if either inflation is too high or unemployment is
too low.

26 Jezebel is fast if Jezebel is a track star.

27 Jezebel is fast only if Jezebel is a track star.

28 The company has massive layoffs, but it will collapse only if the CEO steps down.

29 Unless the CEO steps down, the company will either collapse or have massive layoffs.

30 If the CEO steps down and the company has no viable replacement, it will collapse unless the
board acts quickly.

Symbolize the following arguments:

31 If everything is merely contingent, at one time nothing existed. If this were true, even now
nothing exists, which is absurd. So not everything is merely contingent.

32 Jezebel qualified for the finals only if every runner faster than Jezebel also qualified. Every
runner faster than Jezebel qualified, so Jezebel must have qualified, too.

33 Neither Dr. Black nor Professor Plum committed the murder. If Nurse White committed the
murder, so did Professor Plum. Either Dr. Black or Reverend Green committed the murder.
If Dr. Black committed the murder, so did Miss Scarlett. Therefore, if Reverend Green
committed the murder, so did Miss Scarlett.

34 Either Reverend Green or Dr. Black killed him. If Colonel Yellow killed him, Dr. Black
didn’t. Neither Nurse White nor Miss Scarlett killed him. Therefore, if Colonel Yellow killed
him, Reverend Green didn’t.

35 The universe is orderly and apparently designed, like a mechanical watch. If the universe is
orderly and apparently designed, like a mechanical watch, it must have been created by an
intelligent designer. So the universe must have been created by an intelligent designer.

36 There are only three possibilities: either your sister is mad, or she is telling lies, or she is telling
the truth. You know she does not tell lies, and she is obviously not mad, so for the time being,
unless other evidence turns up, we must assume she is telling the truth. (C.S. Lewis, The Lion,
the Witch, and the Wardrobe)

0.2 Scope and statement forms

Parentheses are very important in symbolizing complex statements. Look at this example:

The police will arrest you only if they see you and you are doing something illegal.

Given some obvious symbols, this should be symbolized like this:

P → (S & I)

What would happen if we moved the parentheses, like this:

(P → S) & I

This statement could be translated back into English as

You are doing something illegal, and if the police arrest you, they see you.

The second, unlike the first, asserts that you are doing something illegal. In the second statement, the
conjunction has the wider scope, so both conjuncts are being asserted, but in the first, the conditional
has the wider scope, so the conjuncts are asserted only conditionally.
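You can see concretely that moving the parentheses changes meaning by comparing the two symbolizations row by row. The short Python sketch below is an aside of mine, not part of the logical notation; it finds the truth-value assignments on which the two symbolizations come apart:

```python
from itertools import product

implies = lambda p, q: (not p) or q   # the material conditional

disagreements = []
for P, S, I in product([True, False], repeat=3):
    first = implies(P, S and I)    # P -> (S & I)
    second = implies(P, S) and I   # (P -> S) & I
    if first != second:
        disagreements.append((P, S, I))

print(len(disagreements), "of 8 rows disagree")  # 2 of 8 rows disagree
```

The two rows that disagree are the ones where ‘I’ is false and ‘P’ is also false: the first symbolization is then vacuously true, while the second is false because it flatly asserts ‘I’.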
The scope of a logical operator is the portion of the statement that is governed by that operator:
P → (S & I)

The scope of the ‘→’ is the whole statement; the scope of the ‘&’ is just ‘(S & I)’.

The main operator in a statement is the operator with the widest scope. Normally this will be
outside of all parentheses. If it is a negation, it will be the leftmost symbol. Here are some examples,
with the main operator indicated:

∼(A & B) (main operator: ∼)
(A & B) → (C v D) (main operator: →)
∼(A & ∼B) & ∼C (main operator: the & outside the parentheses)
A → (B → (C → D)) (main operator: the first →)
∼A v (B & C) (main operator: v)
∼ ∼(A ↔ B) (main operator: the leftmost ∼)
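The rule just stated, outside all parentheses, leftmost if a negation, can be made mechanical. The Python sketch below scans a formula while tracking parenthesis depth and reports the first binary connective it finds at depth zero, falling back to a leading negation. The ASCII spellings ~, &, v, -> and <-> are my own stand-ins for the printed symbols:

```python
def main_operator(formula):
    """Return the main operator of a well-formed formula,
    or None for a simple statement letter."""
    s = formula.replace(" ", "")
    depth = 0
    i = 0
    while i < len(s):
        if s[i] == "(":
            depth += 1
        elif s[i] == ")":
            depth -= 1
        elif depth == 0:
            # at depth zero, the first binary connective is the main operator
            if s[i:i+3] == "<->":
                return "<->"
            if s[i:i+2] == "->":
                return "->"
            if s[i] in "&v":
                return s[i]
        i += 1
    # no binary connective outside parentheses: a leading ~ is the main operator
    return "~" if s.startswith("~") else None

print(main_operator("~(A & B)"))           # ~
print(main_operator("(A & B) -> (C v D)")) # ->
print(main_operator("~(A & ~B) & ~C"))     # &
print(main_operator("~A v (B & C)"))       # v
```

The scan works because a binary connective inside parentheses always sits at depth one or deeper, so the first one seen at depth zero must be the operator with the widest scope.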
It is often crucial to decide whether two statements have the same form. If we are compar-
ing a statement with only one operator with a statement that has more than one operator, we are
concerned only with the main operator in the more complex statement. For example, these two
statements have the same form:

A&B (A v B) & (C → D)

The statement on the left is a conjunction, and so is the statement on the right, because its main
operator is a conjunction.
If both statements are more complex, they must have the same operators with the same scope
to have the same form. For example, these two statements have the same form:

∼A → B ∼(A v B) → (C & D)

But these two statements do not:

∼A → B (∼A v B) → (C & D)

These last two statements don’t have the same form because the negation on the left has as its scope
the whole antecedent, but the negation on the right has as its scope only part of the antecedent.
Often we will use lower-case letters p and q (and possibly others) to identify statement forms. So
if we say ‘a statement of the form p & q ’, we mean any conjunction, no matter how complex.

Determine whether the following statements have the form indicated.

1 ∼p → q A → ∼B

2 ∼p → q ∼∼A → ∼B

3 (p → q) & ∼p (∼A → ∼B) & A

4 (p → q) & ∼p (∼A → ∼B) & ∼∼A

5 p&q (A & B) & (C & D)

6 p v ∼q A v ∼(B & (C → D))

Chapter 1

First-Order Logic

1.1 Translating categorical statements

First-order logic (FOL) was developed in the nineteenth and twentieth centuries by several
mathematicians and philosophers to correct some deficiencies they saw in traditional logic, which was
based on the logic of Aristotle.
We begin with atomic sentences. An atomic sentence is a simple sentence, with a subject and a
predicate, like this:

Jezebel is tall.

This sentence has a subject (‘Jezebel’) and a predicate (‘is tall’). In FOL, predicates are symbol-
ized by capital letters and subjects by lower-case letters. The subject comes after the predicate. So
this sentence is symbolized

Tj

There were some simple valid arguments that could not be assimilated into traditional logic.
For example, look at this argument:

Jezebel ate the chocolate cake, so someone ate the chocolate cake.

This is clearly a valid argument. But how would you translate it into logic? There is no ‘and’ or
‘or’ or ‘not’—this argument has none of the logical connectives we’ve discussed. It has two simple
statements:

J: Jezebel ate the chocolate cake.

S: Someone ate the chocolate cake.

So the argument would be symbolized like this:


J .˙.S

That’s not a valid argument. So we need more.

The first change we’ll make is to distinguish the subject of a sentence from the predicate. We’ll
now use capital letters to stand for predicates and lower-case letters to stand for subjects:

j: Jezebel Cx: x ate the chocolate cake.

Now we can symbolize the premise of the above argument like this:

Cj

Now the simple statements are more complex than they were before. Before, we just had single
letters; now we have symbols that reveal the inner structure of these statements. Once we have these
statements, we can then combine them just as we’ve been doing. For example, to symbolize ‘Jezebel
ate the chocolate cake and Siegfried ate the blueberry pie’, we add to our key:

s: Siegfried Bx: x ate the blueberry pie.

Now we can combine them together:

Cj &Bs.

Another example:

Either Jezebel or Siegfried ate the chocolate cake.

You might be tempted to put the disjunction between the lower-case letters, but that would be
wrong. The logical connectives we have can go only between sentences. So we need to paraphrase
the sentence so that every simple sentence has just one subject and one predicate, like this:

Either Jezebel ate the chocolate cake, or Siegfried ate the chocolate cake.

Now we can translate it easily:

Cj vCs.

Use the following key:
a: apple pie; b: blueberry cheesecake; c: carrot cake
Fx: x is fried; Gx: x is good; Hx: x is healthy.

1 Apple pie is good.

2 If blueberry cheesecake is fried, it is healthy.


3 Carrot cake is healthy and good.

4 If apple pie is healthy, carrot cake is healthy, too.

5 If apple pie is fried, it is neither healthy nor good.

That’s the first step: We split apart the simple statements to reveal the subject-predicate structure
within simple sentences. We can now symbolize the premise of the above argument. But what about
the conclusion? We have to take another step. Sometimes we want to talk about what is true of
everything or something—that is, we want to symbolize sentences that have no proper names, as in
‘Someone ate the chocolate cake’. To do this requires some new symbols, which we call ‘quantifiers’:

∀x: for all x ∃x: for some x

The first, ‘∀’ is called the universal quantifier, and is sometimes said ‘for every’ or ‘everything’. (In
older works, the universal quantifier is written with parentheses around the variable; so instead of
‘ ∀x’, they have ‘(x)’. It makes it easier to type, I suppose.) The second is called the ‘existential
quantifier’, and is also said ‘there exists something such that’ or ‘there is’.

Jezebel ate the chocolate cake, so someone ate the chocolate cake.

We already have the premise of this argument:

Cj

Now, to translate the conclusion, we use the quantifiers:

∃xCx

So the whole argument can be symbolized

Cj .˙.∃xCx.

Notice, just as before, we need to paraphrase a little before we can translate. The symbolized
sentence literally says ‘For some x, x ate the chocolate cake’. The ‘x’ here is called a variable; we
normally pick letters toward the end of the alphabet—‘x’, ‘y’, ‘z’, and then ‘w’, ‘v’, and so on. The
variable functions like a pronoun, so another paraphrase would be ‘there exists something such that
it ate the chocolate cake’.
The variable that follows a quantifier is said to be bound by that quantifier. So, for example in
the sentence ‘ ∃xCx’, the x is bound by the existential quantifier. A quantifier’s scope extends to the
next connective, or, if there are parentheses, to the right parenthesis. So in the statement

∀xPx &Qx

The ‘x’ in ‘Px’ is bound—within the scope of the universal quantifier—and the ‘x’ in ‘Qx’ is
unbound or free. A bound variable is like a pronoun, but an unbound variable is like a name. So
the above sentence says ‘Everything is P, and x is Q’. If we put parentheses, like this:

∀x(Px &Qx)

it says, ‘Everything is both P and Q’.

Now, if ‘ ∀xPx’ says ‘Everything is purple’, how do we say ‘Not everything is purple’? By putting
the negation in front of the quantifier, like this:

∼∀xPx

And how do we say ‘Everything is not purple’? Like this:

∀x ∼Px.

So the order of the symbols matters. The first says ‘It is not the case that everything is such
that it is purple’; the second says ‘Everything is such that it is not purple’. This second is probably
more naturally said ‘Nothing is purple’, which can be paraphrased ‘It is not the case that there is
something that is purple’, or

∼∃xPx

So here we have two different, equivalent ways of saying the same thing: ‘Everything is not
purple’ can be translated ‘ ∀x ∼Px’ or ‘ ∼∃xPx’. In fact, there are other equivalences:

∀xRx ≡ ∼∃x ∼Rx

∀x ∼Rx ≡ ∼∃xRx
∼∀xRx ≡ ∃x ∼Rx
∼∀x ∼Rx ≡ ∃xRx

If we call a negation before the quantifier an outer negation, and a negation after the quantifier
an inner negation, we can say that an outer negation of one quantifier is the same as the inner
negation of the other. Or, you can think of it like this: when you push a negation through the
quantifier, the quantifier flips.
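Over a finite universe of discourse, ‘∀’ behaves like a long conjunction and ‘∃’ like a long disjunction, so these four equivalences can be spot-checked with Python’s all and any. (This is an illustration over one small universe and one predicate of my choosing, not a proof for arbitrary domains.)

```python
# a small universe of discourse and an arbitrarily chosen predicate R
universe = range(10)
R = lambda x: x % 3 == 0

# ∀xRx ≡ ∼∃x∼Rx
assert all(R(x) for x in universe) == (not any(not R(x) for x in universe))
# ∀x∼Rx ≡ ∼∃xRx
assert all(not R(x) for x in universe) == (not any(R(x) for x in universe))
# ∼∀xRx ≡ ∃x∼Rx
assert (not all(R(x) for x in universe)) == any(not R(x) for x in universe)
# ∼∀x∼Rx ≡ ∃xRx
assert (not all(not R(x) for x in universe)) == any(R(x) for x in universe)

print("all four equivalences hold on this universe")
```

Swapping in any other universe or predicate leaves the assertions true, which is just what ‘pushing the negation through flips the quantifier’ predicts.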
There are four common statement types that use a single quantifier, which since the middle ages
have been named after vowels:
A Every P is Q ∀x(Px →Qx)
E No P is Q ∀x(Px →∼Qx)
I Some P is Q ∃x(Px &Qx)
O Not every P is Q ∃x(Px &∼Qx)

These four types of statement, along with the unquantified statements of the forms Pa and ∼Pa,
are called categorical statements.
Notice a few things about these categorical statements. A and E sentences begin with the
universal quantifier and have → as the main connective; I and O sentences begin with the
existential quantifier and have & as the main connective. This is always the way it works with
categorical statements, and almost always the way it works even with more complicated statements.
This is important
enough to put into its own box.

Whenever you’re symbolizing a categorical statement, the main connective after ∀ is

→; the main connective after ∃ is &.

Why is that? Let’s look at some concrete examples. The next few examples will use the following
key:

Px: x is a pie
Dx: x is delicious.

Now look at an A statement, like ‘Every pie is delicious’. This says to look at everything (in the
universe, or in our universe of discourse). If it’s not a pie, we ignore it. If it is a pie, then it must also
be delicious, or what we’ve said is false. That’s what

∀x(Px →Dx)

says. If we had translated instead ‘∀x (Px &Dx)’, this would have said that everything is both a pie
and delicious. Everything is a delicious pie—that’s far stronger than we wanted to say. Any non-pie
or any non-delicious thing would be a counterexample. As it is, only something that is both a pie
and is not delicious would prove the sentence false.
Now think about an I statement, like ‘Some pies are delicious’. You might think we’d want to
symbolize with a conditional, as before. But recall that ‘Px →Dx’ is equivalent to ‘∼Px vDx’, so
‘∃x(Px →Dx)’ is equivalent to ‘∃x(∼Px vDx)’. This says ‘there is something that is either not a pie or
is delicious’. This is far weaker than we wanted to say. It could be made true if there is anything
that’s not a pie, or anything that’s delicious. It’s far too easy to be made true. What we want is for
the sentence to say ‘there are some things that are pies and are delicious’, and that’s what

∃x(Px &Dx)

says. (Note that this says that there is at least one delicious pie, whereas the English has a plural. In
logic we lose that distinction. An existential quantifier is true even if there’s only one, and is still true
if there are billions.)
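The same finite-universe picture shows why the connectives pair up with the quantifiers this way. In the Python sketch below (the three-object universe is hypothetical, invented for illustration), every pie is delicious but not everything is a pie, so ‘∀x(Px → Dx)’ comes out true while the over-strong ‘∀x(Px & Dx)’ comes out false; and the too-weak ‘∃x(Px → Dx)’ comes out true even in a universe with no pies at all:

```python
universe = ["apple pie", "pecan pie", "brick"]
is_pie = lambda x: x.endswith("pie")
is_delicious = lambda x: x != "brick"
implies = lambda p, q: (not p) or q   # the material conditional

# A: 'Every pie is delicious' -- true here
assert all(implies(is_pie(x), is_delicious(x)) for x in universe)
# too strong: 'Everything is a delicious pie' -- false (the brick)
assert not all(is_pie(x) and is_delicious(x) for x in universe)
# I: 'Some pie is delicious' -- true here
assert any(is_pie(x) and is_delicious(x) for x in universe)
# too weak: true in a pie-free universe, merely because a non-pie exists
assert any(implies(is_pie(x), is_delicious(x)) for x in ["brick"])

print("the conditional goes with the universal; the conjunction with the existential")
```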
Many categorical sentences are fairly straightforward to translate. A few words have a trick to
them.

A sentence like

Only pies are delicious

says there’s nothing delicious—not cakes, not cookies—except pies. It says ‘Everything that is deli-
cious is a pie’:

∀x(Dx →Px)

That is, ‘only’ swaps the antecedent and consequent, just as it does in truth-functional logic. Just
as ‘only if ’ is the converse of ‘if ’, ‘only’ is the converse of ‘all’.

Sometimes in English, ‘any’ means ‘all’ and sometimes it means ‘some’. Look at these sentences:
Not any pies are delicious = No pies are delicious (E) ∀x(Px → ∼Dx)
If anyone is hungry, he will eat = All hungry people will eat (A) ∀x(Hx → Ex)
If there are any pies, Jezebel is hungry ∃xPx → Hj
The last sentence is not a categorical statement, and looks ahead to what we’ll be doing in later
chapters. Whether ‘any’ should be translated using an existential or a universal depends on the
context.
If the English doesn’t have ‘all’ or ‘some’, as in ‘Pie is delicious’, sometimes it means ‘all’ and
sometimes it means ‘some’. How can you tell? Only by looking at the sentence as a whole and
figuring out what it means and how best to translate that into the language of logic. That’s the same
method you use when translating into Spanish, or Swahili. There are a few tricks, but translation is
an art.


Translate the following sentences, using the symbols provided:

6 Insects are dangerous. (Ix: x is an insect; Dx: x is dangerous)

7 Insects are not all dangerous. (Ix: x is an insect; Dx: x is dangerous)

8 No insects are dangerous. (Ix: x is an insect; Dx: x is dangerous)

9 Wizards all work magic. (Wx: x is a wizard; Mx: x works magic)

10 Only wizards work magic. (Wx: x is a wizard; Mx: x works magic)

11 All who work magic are wizards. (Wx: x is a wizard; Mx: x works magic)

12 Not all who work magic are wizards. (Wx: x is a wizard; Mx: x works magic)

13 No one is happy. (Px: x is a person; Hx: x is happy)


14 Not everyone is happy. (Px: x is a person; Hx: x is happy)

15 Someone is not happy. (Px: x is a person; Hx: x is happy)

16 Some books are both long and interesting. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

17 No long book is interesting. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

18 No book is interesting unless it is not long. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

19 All long books are interesting. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

20 Only fools and horses work. (Fx: x is a fool; Hx: x is a horse; Wx: x works)

21 Fools and children speak the truth. (Fx: x is a fool; Cx: x is a child; Tx: x speaks the truth)

22 Some insects are dangerous only if bothered. (Ix: x is an insect; Dx: x is dangerous; Bx: x is
bothered)
23 All insects are dangerous if bothered. (Ix: x is an insect; Dx: x is dangerous; Bx: x is bothered)

24 All politicians and outlaws are liars and scoundrels. (Px: x is a politician; Ox: x is an outlaw;
Lx: x is a liar; Sx: x is a scoundrel)

Use the following key:
Cx: x is a cake.
Gx: x is good.
Bx: x has been baked properly.

25 Some cakes are good and have been baked properly.

26 Some cakes are good only if they have been baked properly.

27 Some cakes are good if they have been baked properly.

28 Any cake is good that has been baked properly.

29 Any cake that is good has been baked properly.

30 No cake is good unless it has been baked properly.

31 Any cake is good if it has been baked properly.

32 Any cake has been baked properly if it is good.

33 A cake is good if and only if it has been baked properly.

34 Good cakes have all been baked properly.


35 Only properly baked cakes are good.

36 Only good cakes have been baked properly.

37 Some cakes are good even though they have not been baked properly.

38 If something is a properly baked cake, then it must be good.

39 Some cakes that are baked properly are not good.

40 Some cakes are neither good nor baked properly.

41 No cake that is baked properly fails to be good.

42 A cake is good only if it has been baked properly.

43 If anything is a good cake, then it has been baked properly.

44 If any cake is baked properly, then it is good.

Devise your own symbols, and translate the following

45 All beginnings are hard.

46 None but the brave deserve the fair.

47 All good things must come to an end.

48 There’s no bad publicity.

49 No man can serve two masters.

50 No man is an island.

51 Only the good die young.

52 Nothing is secret which shall not be made manifest.

Universe of discourse
Sometimes, instead of quantifying over everything, we’ll explicitly say the variables range over some
smaller class. This is called restricting the universe of discourse. There are two cases when this is
very common: (1) We restrict the universe of discourse to persons, so that ‘ ∃xCx’ says ‘Someone
ate the chocolate cake’ instead of ‘Something ate the chocolate cake’. (2) When we’re doing math,
we restrict the universe of discourse to numbers.

1.2 Relations and multiply general statements

Multiply-general propositions
Syllogisms and categorical statements are only one small part of modern logic. The real power
comes from multiply general statements, statements that have more than one quantifier. Here is an
example of an argument that looks like a syllogism—or rather, an enthymeme—but can’t be shown
valid by any of the many methods devised for syllogisms:

All donuts are delicious, so anyone who eats a donut eats something delicious.

Here the premise is a simple categorical statement; it might be symbolized ‘ ∀x(Nx →Dx)’. The
conclusion could also be translated as a simple categorical statement, taking ‘Ex’ to be ‘x eats a
donut’ and ‘Sx’ to be ‘x eats something delicious’. But then the conclusion would be ‘ ∀x(Ex →Sx)’,
which does not follow from the premise.
The first step to symbolizing this argument is to extend our notation so that a quantifier can apply
to only part of a line, and a sentence can have more than one quantifier. With that change, we can
have sentences that are truth-functional combinations of categorical statements. For these sentences,
let Px=‘x is a professor’, Sx=‘x is a student’, Hx=‘x is happy’, Cx=‘x goes to class’, b=‘Brünnhilde’,
and restrict the universe of discourse to persons.

If all students are happy, Brünnhilde is happy

∀x(Sx →Hx) →Hb

Some professors are happy only if all students are unhappy

∃x(Px &Hx) →∀x(Sx →∼Hx)

If no one goes to class, everyone is happy

∼∃xCx →∀xHx

Everyone except students is happy unless some professors don’t go to class

∀x(∼Sx →Hx) v∃x(Px &∼Cx)

If everyone who goes to class is happy, and not all students are happy, then not all students go to class
(∀x (Cx →Hx) &∃x (Sx &∼Hx)) →∃x (Sx &∼Cx)

Brünnhilde goes to class only if she is a student, not a professor, and all students go to class
Cb →((Sb &∼Pb) &∀x(Sx →Cx))

In many of these examples, we used the variable x for multiple quantifiers. Because each quan-
tifier has its own scope, marked by the parentheses, there is no ambiguity.
Sometimes, however, it is necessary for one quantifier to fall in the scope of another. One way
for this to happen is with what I call a “yellow banana” sentence, because of this example (let Yx=‘x
is yellow’, Bx=‘x is a banana’, Rx=‘x is ripe’):

If any bananas are yellow, then if all yellow bananas are ripe, they are ripe
∀x((Bx &Yx) →(∀y((By &Yy) →Ry) →Rx))

This sentence is difficult. The antecedent seems to say ‘there are yellow bananas’, which would
be translated ‘∃x(Bx &Yx)’. But the ‘they’ in the consequent needs to be bound by the same
quantifier. This requires making the quantifier apply to the whole statement, not merely the
antecedent, and that in turn requires it to become a universal.

Translate these sentences using the following symbols:
Dx=x is a dog
Cx=x is a cat
Wx=x is well-trained
Px=x is a perfect pet
Fx=x is friendly

1 Kinkie is a cat and not a dog.

2 Kinkie is not friendly if Kinkie is a cat.

3 No dog is a perfect pet.

4 Some friendly dog is not a perfect pet.

5 Either no dog is a perfect pet or some dog is friendly.

6 Any dog that is a perfect pet is well-trained.

7 No dog is a perfect pet unless it is well-trained.

8 Every dog is a perfect pet, but, among cats, only those that are well-trained are perfect pets.

9 No friendly, well-trained cat is not a perfect pet.

10 A dog is a perfect pet if and only if it is both friendly and well-trained.

11 If all cats are friendly, Kinkie is not well-trained if he is not a perfect pet.

12 Any cat that is either friendly or well-trained is a perfect pet.

13 If any cat is well-trained, it is a perfect pet.

14 If any cat is well-trained, then if all cats are friendly, it is a perfect pet.

15 No dog is a cat.

16 If Kinkie is a cat, Kinkie is well-trained.

17 Not every cat is a perfect pet, but Kinkie is.

18 Some well-trained cats are not friendly.

19 Among cats, only those that are friendly and well-trained are perfect pets.

20 If any dog is unfriendly, it is not a perfect pet.

21 Nothing is a friendly cat without being a perfect pet.

22 If all cats are unfriendly, no cats are well-trained.

23 If any dog is friendly, then if all dogs are well-trained, it is a perfect pet.

24 No unfriendly cats are perfect pets but some friendly ones are.

25 No cat is a perfect pet only if neither all cats are friendly nor some cats are not well-trained.

Having more than one quantifier allows us to symbolize relations. For example, we can take
‘Dxy’ to be ‘x is more delicious than y’. Then, if we name a particular cookie ‘Nebuchadnezzar’
(symbolized ‘n’), and another cookie ‘Ahasuerus’ (symbolized ‘a’), then we can symbolize

Nebuchadnezzar is more delicious than Ahasuerus

Dna

If we wanted to say that Ahasuerus is more delicious than Nebuchadnezzar, we would switch the
order of the names, like this:

Dan
We can add quantifiers, so we can symbolize, for example,

Something is more delicious than Nebuchadnezzar

∃xDxn

and we can symbolize


Everything is more delicious than Nebuchadnezzar

∀xDxn

As we’ve seen, the order of the names or variables after the relation symbol matters. Likewise,
the order of the quantifiers matters. We choose something for the leftmost quantifier first, and then
work our way in. For example, compare these two sentences:

∃x∀yDxy (Something is more delicious than everything)

∀x∃yDxy (Everything is more delicious than something)

Note that these are not the same. The first says that there’s one thing, a blueberry cheesecake
most likely, that is more delicious than everything. The second says that there’s no end to
deliciousness: take anything you like, no matter how delicious it is, there’s something even more
delicious.

∀x∃yDxy: Everything is more delicious than something; i.e., there is no least delicious thing.

∀x∃yDyx: Everything has something more delicious than it; i.e., there is no most delicious thing.

∃x∀yDxy: There is something more delicious than everything; i.e., there is a most delicious thing.

∃x∀yDyx: There is something than which everything is more delicious; i.e., there is a least delicious thing.

(You may have noticed that the bottom two sentences require something to be more delicious than
itself. Thus the English paraphrase after ‘i.e.’ isn’t exactly right. To symbolize those, we’ll need
identity, which we’ll learn later.)
The order of the quantifiers matters to the interpretation of the sentence, and so does the order
of the variables in a relation. ‘Dxy’ means that x is more delicious than y; ‘Dyx’ means that y is
more delicious than x.
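Although the text does not use programming, the order of quantifiers can be checked mechanically over a finite universe. In the Python sketch below, `all` plays the role of ∀ and `any` the role of ∃. The relation is the cyclic “beats” relation of rock-paper-scissors (an invented stand-in for Dxy), chosen because it makes ∀x∃yRxy true while ∃x∀yRxy is false:

```python
# Quantifiers over a finite universe: 'any' plays ∃, 'all' plays ∀.
universe = ["rock", "paper", "scissors"]

# The cyclic "beats" relation, as a set of ordered pairs.
beats = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}

def R(x, y):
    return (x, y) in beats

# ∀x∃yRxy: everything beats something -- true, since the relation cycles.
forall_exists = all(any(R(x, y) for y in universe) for x in universe)

# ∃x∀yRxy: something beats everything -- false, so the two orders differ.
exists_forall = any(all(R(x, y) for y in universe) for x in universe)

print(forall_exists)  # True
print(exists_forall)  # False
```

Swapping the two loops is exactly swapping the quantifiers, which is why the results differ.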
Now we can symbolize the argument mentioned before:

All donuts are delicious, so anyone who eats a donut eats something delicious.

We let ‘Nx’ stand for ‘x is a donut’ and ‘Dx’ stand for ‘x is delicious’, as before. We need to add ‘Px’
for ‘x is a person’ and ‘Exy’ for ‘x eats y’. Now we can symbolize the argument like this:

∀x(Nx →Dx) .˙.∀y[(Py &∃x(Nx &Eyx)) →∃x(Dx &Eyx)]



Every pie is more delicious than any cookie

Paraphrase: ‘Take any pie you like and any cookie you like: the pie will be more delicious than
the cookie’. That is, the ‘any’ here is a universal quantifier. It doesn’t mean that the pie is more
delicious than at least one cookie; it means it’s more delicious than them all.

∀x(Px →∀y(Cy →Dxy))

Another way of saying this, one closer to the English paraphrase, is this:

∀x ∀y((Px &Cy) →Dxy)

This is equivalent to the first, but the scope of all quantifiers is the whole sentence.
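The claimed equivalence can be spot-checked by brute force: over a small domain, enumerate every possible interpretation of P, C, and D and confirm the two formulas always agree. This Python sketch (not part of the text; the two-element domain is an arbitrary choice) uses `itertools.product` to generate all interpretations:

```python
from itertools import product

domain = [0, 1]
pairs = [(x, y) for x in domain for y in domain]

# Enumerate every interpretation: P and C as subsets of the domain,
# D as a set of ordered pairs.
for P_bits in product([False, True], repeat=len(domain)):
    for C_bits in product([False, True], repeat=len(domain)):
        for D_bits in product([False, True], repeat=len(pairs)):
            P = {x for x, b in zip(domain, P_bits) if b}
            C = {x for x, b in zip(domain, C_bits) if b}
            D = {p for p, b in zip(pairs, D_bits) if b}

            # ∀x(Px → ∀y(Cy → Dxy))
            f1 = all((x not in P) or all((y not in C) or ((x, y) in D)
                                         for y in domain)
                     for x in domain)
            # ∀x∀y((Px & Cy) → Dxy)
            f2 = all((not (x in P and y in C)) or ((x, y) in D)
                     for x in domain for y in domain)
            assert f1 == f2

print("the two symbolizations agree on every interpretation")
```

A check on one small domain is not a proof of equivalence in general, but it is a quick way to catch a wrong symbolization.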

If any book is damaged, any student who checked it out will be fined for it.

It is often helpful to symbolize in stages, sometimes starting with related, but easier propositions.

If any book is damaged, any student will be fined.

Look first at the antecedent. It doesn’t mean ‘if every book is damaged’, but rather ‘if there is a
damaged book’. So we symbolize it like this, with ‘Bx’ symbolizing ‘x is a book’ and ‘Dx’ symbolizing
‘x is damaged’:

∃x(Bx &Dx) →…

Now, the consequent. Here again, it doesn’t mean that all students will be fined, only some.
The ‘any’ indicates that there are no restrictions; no student is exempt from whatever process
the library uses to decide who pays the fine. So, taking ‘Sx’ to symbolize ‘x is a student’ and ‘Fx’ to
symbolize ‘x will be fined’, the whole proposition is symbolized like this:

∃x(Bx &Dx) →∃x(Sx &Fx).

Now, the original statement had a qualification on ‘any student’: ‘any student who checked it
out’. We’ll need a new relation statement, ‘Cxy’ to symbolize ‘x checked y out’. It is a single object
that is a damaged book and is checked out by the student, so the quantifier will need to stretch across
both the antecedent and the consequent. Here the consequent is fairly easy:

… →∃x(Sx &Cxy &Fx),

but we need to bind the y. We might be tempted to try this:


∃y((By &Dy) →∃x(Sx &Cxy &Fx))

that is, we simply extended the scope of the first quantifier to range over the whole statement. But
this is wrong. Recall the rule of thumb about the main connective for an existential quantifier, and
think about what this says: there is something that is either not a damaged book or was checked out
by some student who was fined. It won’t help to change the connective to a conjunction, like this:

∃y((By &Dy) &∃x(Sx &Cxy &Fx)).

This asserts that there is a damaged book, but the original statement only claimed that if there were,
some student would be charged. The solution is to change the first quantifier to a universal:

∀y((By &Dy) →∃x(Sx &Cxy &Fx))

This gets it right. It says that all damaged books (if there are any) are such that someone who checked
them out will be fined. There was one more complication in the original statement, but not one that
should puzzle us now. It said that the student ‘will be fined for it’. So instead of having a monadic
predicate ‘x is fined’ we need a relation ‘Fxy’, ‘x is fined for y’:

∀y((By &Dy) →∃x(Sx &Cxy &Fxy))

Another rule of thumb: An existential quantifier on the antecedent of a conditional becomes a
universal quantifier on the whole conditional.
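This rule of thumb can itself be verified over a finite domain: whenever x does not occur in Q, ‘∃xPx → Q’ and ‘∀x(Px → Q)’ have the same truth value under every interpretation. A Python sketch (invented for illustration, not from the text):

```python
from itertools import product

domain = [0, 1, 2]

# Check: '∃xPx → Q' is equivalent to '∀x(Px → Q)' whenever x does not
# occur free in Q. Enumerate every extension of P and both values of Q.
for P_bits in product([False, True], repeat=len(domain)):
    P = {x for x, b in zip(domain, P_bits) if b}
    for Q in (False, True):
        lhs = (not any(x in P for x in domain)) or Q   # ∃xPx → Q
        rhs = all((x not in P) or Q for x in domain)   # ∀x(Px → Q)
        assert lhs == rhs

print("the two forms agree on every interpretation")
```

Intuitively: if anything at all satisfies P, both sides demand Q; if nothing does, both sides are vacuously true.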

Symbolize the following English sentences using the following symbols:
Px: x is a pie
Cx: x is a cake
Dxy: x is more delicious than y

26 Everything is more delicious than everything.

27 Something is such that everything is more delicious than it.

28 There’s nothing more delicious than everything.

29 Any pie is more delicious than any cake.

30 There’s a cake more delicious than any pie.

31 There’s a cake and a pie that are equally delicious.

32 If there’s a pie more delicious than any cake, then that pie will be more delicious than every
cake.
Symbolize the following English sentences using the following symbols:
Lxy: x loves y
Restrict the universe of discourse to persons

33 Everyone loves someone.

34 Everyone is loved by someone.

35 Someone loves everyone.

36 Someone is loved by everyone.

37 If someone loves everyone, then someone loves himself. (or herself)

38 If someone loves everyone, then that person loves himself. (or herself)

39 If anyone loves anyone, then everyone loves someone.

40 Whoever loves anyone also loves himself.

41 Someone who loves everyone also loves himself.

Translate the following from logical notation into English. All are common proverbs, or are
based on the scripture cited. Use the following symbols:
Bx: x is broken
Cxy: x comes to y
Fxy: x falls on y
Gx: x is made of glass
Gxy: x gathers y
Hx: x is home
Lx: x is a place
Lxy: x lives in y
Kxy: x is like y
Mx: x is moss
Px: x is a person
Pxy: x grinds y to powder.
Rx: x is rolling
Sx: x is a stone
Txy: x should throw y
Wx: x waits

42 ∃x(Sx &∀y((Py &Fyx) →By)) (see Matt 21:44/Luke 20:18)


43 ∃x(Sx &∀y((Py &Fxy) →Pxy)) (see Matt 21:44/Luke 20:18)

44 ∀x(Px →∀y(Ry →Lxy)) (loosely based on Isaiah 17:13)

45 ∀x((Rx &Sx) →∼∃y(My &Gxy))

46 ∀x(Px →∀yCyx)

47 ∼∃x(Lx &∀y(Hy →Kxy))

48 ∀x(Px →∃y(Hy &Cxy))

49 ∀x((Px &∃y(Hy &Gy)) →∀z(Sz →∼Txz))

For the following exercises, use these symbols:
Lx: x is a lion
Zx: x is a zebra
Axy: x attacks y
Sxy: x sees y

50 Every lion attacks every zebra it sees.

51 Some lions attack every zebra that sees it.

52 No lion attacks every zebra that sees itself.

53 No lion attacks any zebra that sees it.

54 Every lion that sees a zebra is attacked by it.

55 Some lions attack only zebras they see.

56 If every lion attacks only zebras it sees, but no lion sees every zebra, then not every zebra is
attacked.
57 Some lions attack only zebras, but no lion attacks every zebra.

58 Every zebra is attacked by some lion, but no lion attacks every zebra.

For the following exercises, use these symbols:
Bx: x is a boy
Gx: x is a girl
Kxy: x kisses y

Sxy: x sees y
Also, restrict the universe of discourse to persons. That means that the variables refer only to
persons.
59 Every boy kisses some girl (or other).

60 Some girl is kissed by every boy.

61 Some boys kiss only girls.

62 Some girls kiss not only boys.

63 Every girl kisses some boy she sees.

64 Every girl kisses some boy who sees her.

65 Every girl who sees a boy kisses him.

66 Some girl kisses only boys she sees.

67 No one kisses every boy.

68 No girl kisses every boy.

69 No boy kisses every girl he sees.

70 If a boy kisses a girl, she kisses him (back).

71 Some girls see no one but kiss everyone.

72 Everyone kisses someone, but no one kisses everyone.

73 Everyone kisses someone, but no one is kissed by everyone.

74 If someone is kissed by everyone, everyone kisses someone.

For the following exercises, use these symbols:
Ex: x is even
Ox: x is odd
x<y: x is less than y (notice that we use “infix” notation here)
Sxy: x is the successor of y
Restrict the universe of discourse to natural numbers (0, 1, 2, ...). That means that the variables
refer only to numbers.

75 Every number is less than some number (or other).


76 Some even number is less than every odd number.

77 No number is less than every number.

78 Every even number has an odd successor.

79 Every odd number is the successor of some even number.

80 Every number is less than its successor.

81 No number is its own successor.

82 No number is less than itself.

83 If every number is less than its successor, then there is some number that is not the successor
of anything.

84 If a number is less than another, the second is not less than the first.

85 If one number is less than another, and that number is less than a third, the first number is
less than the third.

Translate the following using the symbols provided.

86 There is a treasure in each egg, and an egg in every hiding spot. (Tx: x is a treasure, Ex: x is
an egg, Ixy: x is in y, Hx: x is a hiding spot)

87 If Brünnhilde is faster than everyone, she is faster than herself. (universe=persons, b: Brünnhilde,
Fxy: x is faster than y)

88 Any dog that chases itself will hurt something. (Dx: x is a dog, Cxy: x chases y, Hxy: x hurts y)

89 Every man who looks at himself sees something he doesn’t like. (Mx: x is a man, Lxy: x looks
at y, Sxy: x sees y, Kxy: x likes y)

90 Every farmer who has a donkey beats it. (Fx: x is a farmer, Dx: x is a donkey, Bxy: x beats y.
This sentence is tricky, and it’s somewhat controversial what the right symbolism is.)

91 Harry shaves everyone who doesn’t shave himself. (universe=persons, h: Harry, Sxy: x shaves y)

92 Harry shaves only those who don’t shave themselves. (universe=persons, h: Harry, Sxy: x
shaves y)

93 Harry shaves all and only those who don’t shave themselves. (universe=persons, h: Harry,
Sxy: x shaves y)

1.3 Properties of Relations

If you know that A is next to B in a straight line, then you know that B is next to A. This is because
the relation next to has a special property, called symmetry. If R is a symmetric relation, whenever x
stands in relation R to y , y stands in the same relation to x.

In symbols:

∀xy(Rxy → Ryx)

Some relations are symmetric. Others are asymmetric: if x stands in R to y , y doesn’t stand in R
to x. Taller than is a good example: If A is taller than B, you know that B is not taller than A. Other
relations are non-symmetric, which means that there are no guarantees either way.

symmetric (the same height as; next to): ∀xy(Rxy → Ryx)

asymmetric (taller than; in front of): ∀xy(Rxy → ∼Ryx)

non-symmetric (loves): ∼∀xy(Rxy → Ryx) & ∼∀xy(Rxy → ∼Ryx)

Some relations are reflexive, which means everything stands in that relation to itself. Is the same
height as is a good example of this: everything is the same height as itself. Others are irreflexive:
nothing stands in them to itself. Still others are non-reflexive.
(Actually, reflexivity usually refers to a slightly weaker property: if x bears relation R to anything,
or anything bears R to it, then it bears R to itself. The simpler property is called total reflexivity.)

reflexive (the same height as; in the same place as): ∀x(∃y(Rxy v Ryx) → Rxx); total reflexivity:
∀xRxx

irreflexive (taller than; in front of): ∀x∼Rxx

non-reflexive (loves): ∼∀x(∃y(Rxy v Ryx) → Rxx)

The final important property of relations is transitivity.


transitive (taller than; in front of): ∀xyz((Rxy & Ryz) → Rxz)

intransitive (next to; immediately in front of): ∀xyz((Rxy & Ryz) → ∼Rxz)

non-transitive (loves): ∼∀xyz((Rxy & Ryz) → Rxz) & ∼∀xyz((Rxy & Ryz) → ∼Rxz)
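On a finite domain, each of these properties can be tested directly by representing a relation as a set of ordered pairs. A Python sketch (the sample relations are invented toy models of taller than and the same height as):

```python
# A relation over a finite domain, represented as a set of ordered pairs.

def symmetric(R):
    # ∀xy(Rxy → Ryx): every pair's reverse is also in the relation.
    return all((y, x) in R for (x, y) in R)

def reflexive(R):
    # The weaker textbook sense: anything related to something,
    # in either direction, is related to itself.
    touched = {t for pair in R for t in pair}
    return all((x, x) in R for x in touched)

def transitive(R):
    # ∀xyz((Rxy & Ryz) → Rxz)
    return all((x, z) in R
               for (x, y) in R for (w, z) in R if y == w)

taller_than = {(3, 2), (3, 1), (2, 1)}        # heights 3 > 2 > 1
same_height = {(x, x) for x in {1, 2, 3}}     # everyone a different height

print(symmetric(taller_than))   # False
print(transitive(taller_than))  # True
print(reflexive(taller_than))   # False
print(symmetric(same_height))   # True
```

Note that these checks only establish the property for the particular finite relation tested, not for the relation in general.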

1.4 Identity
One special relation is identity, the relation that everything bears to itself and to nothing else. Partly
because this relation is so special, but also partly because it’s so familiar, we symbolize this relation
differently: instead of prefixing a predicate letter to the two terms, we write the identity sign between
them, like this:

x=y

We also symbolize non-identity in a special way. Instead of writing

∼x=y

we write

x ≠ y
We will consider four forms of statement that can be symbolized using identity.

There are at least two As ∃x ∃y (Ax &Ay &x ̸= y)

There are at least three As ∃x ∃y ∃z (Ax &Ay &Az &x ≠ y &x ≠ z &y ≠ z)

There is at most one A ∀x ∀y ((Ax &Ay) →x=y)

There are at most two As ∀x ∀y ∀z((Ax &Ay &Az) →(x=y v x=z v y=z))

There is exactly one A ∃x (Ax &∀y (Ay →x=y))

There are exactly two As ∃x ∃y(Ax &Ay &x ≠ y &∀z(Az →(x=z v y=z)))

The A is B ∃x (Ax &∀y (Ay →x=y) &Bx)

The C-est A is B ∃x (Ax &∀y ((Ay &x ≠ y) →Cxy) &Bx)

There are at least n As

There is at least one pie on the shelf.

(For this and the following examples, we’ll use these symbols: Px=‘x is a pie’ and Sx=‘x is on the
shelf ’)
The existential quantifier already says ‘at least one’, so ‘there is at least one pie on the shelf ’ is

∃x (Px &Sx)

With larger n, we want to guarantee that the items picked out are distinct. To say ‘There are
at least two pies on the shelf ’, we need to say ‘there is a pie on the shelf, and there is another pie on
the shelf ’. To get the idea of ‘another’, we use the negated identity, to mean something that is not the
thing we already picked out:

∃x ∃y (Px &Sx &Py &Sy &x ̸= y)

For larger numbers (at least three, at least four, ...), we need more quantifiers. To say ‘at least n’,
we need n quantifiers. Then we need to say that none of these is the same as any of the others, that
there are n distinct things. To do this, we need to add one non-identity conjunct for each pair of
variables, that is, n(n − 1)/2 conjuncts.
So to say ‘There are at least three pies on the shelf ’, we say

∃x ∃y ∃z (Px &Sx &Py &Sy &Pz &Sz &x ̸= y &x ̸= z &y ̸= z)

You may have noticed that there are some parentheses missing. When symbolizing with identity,
&can multiply quickly. In this section, we will allow strings of conjunctions to have parentheses only
surrounding the whole, instead of around each pair.
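The n(n − 1)/2 count is just the number of pairs of variables, which can be seen by generating the non-identity conjuncts mechanically. A Python sketch (the string output is only an illustration of the bookkeeping, not official notation):

```python
from itertools import combinations

# 'At least n' needs n quantified variables plus one non-identity
# conjunct for each unordered pair of variables: n(n-1)/2 in all.
def nonidentity_conjuncts(variables):
    return [f"{a} != {b}" for a, b in combinations(variables, 2)]

conjuncts = nonidentity_conjuncts(["x", "y", "z"])
print(conjuncts)
# ['x != y', 'x != z', 'y != z'] -- three conjuncts, matching 3*2/2,
# exactly the conjuncts in the 'at least three pies' example above.

n = 3
assert len(conjuncts) == n * (n - 1) // 2
```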

There are at most n As

There is at most one pie on the shelf

(For this and the following examples, we’ll use these symbols: Px=‘x is a pie’ and Sx=‘x is on the
shelf ’)
To say ‘at most one’, we’ll say something like ‘if you try to take two things, you’ll really take the
same thing twice’. That is:

∀x ∀y ((Px &Sx &Py &Sy) →x=y)

We might paraphrase this sentence like this


Take anything you like, and take anything you like: if they’re both pies on the shelf, you
took the same thing twice.

This is a little awkward to say in English. We want to say ‘take anything you like, and take anything
else ...’. But that ‘else’ isn’t built into the quantifier, and this lets us say things that are awkward to
say in English. When we do want the ‘else’, we use identity.
To add more things (at most two, at most three, ...), we add more quantifiers. To say ‘there are at
most n things’, we need n + 1 quantifiers. In the consequent, we disjoin an identity statement for
every pair of variables (so, with n + 1 variables, there will be one disjunct per pair). The sentences get
unwieldy pretty quickly, but the idea is this: if you try to pick out n + 1 things, you’ve picked out the
same thing at least once.

There are exactly n As

There is exactly one pie on the shelf

‘Exactly one’ means ‘at least one and at most one’. So we can symbolize this simply by conjoining
‘at least’ with ‘at most’:

∃x (Px &Sx) &∀x ∀y ((Px &Sx &Py &Sy) →x=y)

There is a simpler way to do this, however. First I’ll paraphrase, then symbolize:

There is a pie on the shelf, and everything that is a pie on the shelf is that first pie.

∃x (Px &Sx &∀y ((Py &Sy) →x=y))

For larger numbers (exactly two, exactly three, ...), we could likewise simply conjoin the sentences
for ‘at least’ and ‘at most’. But we could also combine them, with n existential quantifiers and one
additional universal. We need just as many non-identity conjuncts as we do to say ‘at least’, but in
the consequent of the universal conjunct, we need only n identity disjuncts.

There are exactly two pies on the shelf

∃x ∃y(Px &Sx &Py &Sy &x ̸= y &∀z((Pz &Sz) →(x=z vy=z)))
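One way to convince yourself that this formula really means ‘exactly two’ is to evaluate it over every possible extension of the predicate on a small finite domain and compare it against simple counting. A Python sketch (the domain elements are invented):

```python
from itertools import combinations

domain = ["p1", "p2", "p3", "c1"]

# 'There are exactly two As':
# ∃x∃y(Ax & Ay & x≠y & ∀z(Az → (x=z v y=z)))
def exactly_two(extension):
    return any(
        x in extension and y in extension and x != y
        and all((z not in extension) or z == x or z == y for z in domain)
        for x in domain for y in domain)

# The formula agrees with simply counting the extension, for every
# possible extension of A over the domain.
for r in range(len(domain) + 1):
    for ext in combinations(domain, r):
        assert exactly_two(set(ext)) == (len(ext) == 2)

print("the formula holds exactly when the extension has two members")
```

The `any`/`all` nesting mirrors the ∃∃∀ quantifier prefix, and the non-identity and identity clauses appear as `x != y` and `z == x or z == y`.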

The A is B
The pie is on the shelf

This statement asserts that there is at least one pie and that there is at most one pie, and that
this pie is on the shelf. So we can symbolize it as if it said

There is exactly one pie, and it is on the shelf

in symbols

∃x (Px &∀y (Py →x=y) &Sx)

It may seem odd that ‘the pie is on the shelf ’ and ‘Peter is on the shelf ’, which have similar
grammatical structure in English, should translate into logic so differently. ‘Peter is on the shelf ’
is a simple subject-predicate statement: ‘Sp’. But ‘the pie is on the shelf ’, also a simple subject-
predicate in English, turns into logic as a monster: ‘∃x (Px &∀y (Py →x=y) &Sx)’. The philosopher
who proposed this translation, Bertrand Russell, recognized how very odd this seems. He took the
moral to be that we cannot simply look to the structure of a sentence in a natural language like
English to find out what its logical and philosophical implications are.
Not every instance of ‘the’ should be translated this way. Sometimes ‘the’ means ‘all’, as in ‘The
good die young’ (paraphrase: ‘Everything is such that if it is good, it dies young’). But if ‘the’ means
to pick out a single thing, it should be translated according to Russell’s theory.

The C-est A is B
One special case of ‘the’ sentences is the sentence with a superlative. We might try to symbolize
The tastiest pie is on the shelf

(using an additional symbol: Txy=x is tastier than y) as ‘∃x (Px &∀y (Py →Txy) &Sx)’. This is close:
it says that there is some pie tastier than all pies, and it is on the shelf. But this implies that this pie
is tastier than itself. We want to say that this pie is tastier than all other pies, and for that we need

∃x (Px &∀y ((Py &x ̸= y ) →Txy) &Sx)

This also allows us to correct the translations of ‘there is a most delicious thing’ and ‘there is a
least delicious thing’ from a few sections back:

There is a most delicious thing

∃x ∀y (x ̸= y →Dxy)

There is a least delicious thing

∃x ∀y (x ̸= y →Dyx)
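These identity-guarded formulas behave correctly on finite models, which can be checked directly. In the Python sketch below (an invented model: deliciousness as the ordering on three numbers), the guarded forms come out true while the unguarded ∃x∀yDxy, which demands something more delicious than itself, fails:

```python
# Model 'x is more delicious than y' as > on a small numeric universe.
domain = [1, 2, 3]

def D(x, y):
    return x > y

# 'There is a most delicious thing': ∃x∀y(x≠y → Dxy),
# encoding 'x≠y → Dxy' as (x == y or D(x, y)).
most = any(all(x == y or D(x, y) for y in domain) for x in domain)

# 'There is a least delicious thing': ∃x∀y(x≠y → Dyx)
least = any(all(x == y or D(y, x) for y in domain) for x in domain)

# Without the x≠y guard, ∃x∀yDxy demands D(x, x) and fails:
strict = any(all(D(x, y) for y in domain) for x in domain)

print(most)    # True: 3 beats everything else
print(least)   # True: everything else beats 1
print(strict)  # False: nothing beats itself
```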

Symbolize the following sentences, using this key:
Lxy: x is larger than y
Exy: x eats y
Sx: x is a snickerdoodle
Cx: x is a cookie
j: Jezebel

1 Jezebel eats at least one cookie.

2 Jezebel eats at least two cookies.

3 Jezebel eats at least three cookies.

4 Jezebel eats at most one cookie.

5 Jezebel eats at most two cookies.

6 Jezebel eats at most three cookies.

7 Jezebel eats exactly one snickerdoodle.

8 Jezebel eats exactly two snickerdoodles.

9 The cookie is a snickerdoodle.

10 The largest cookie is a snickerdoodle.

11 The largest cookie that Jezebel eats is a snickerdoodle.

Symbolize the following sentences, using this key:
Ax: x is an apple pie
Bxy: x is better than y
Cx: x is a cookie
Sx: x is on the shelf
Sxy: x is smaller than y

12 There’s an apple pie on the shelf. (That is, there’s exactly one.)

13 The best apple pie is on the shelf.

14 The apple pie is better than anything.

15 The apple pie is better than any cookie.


16 The smallest thing on the shelf is a cookie.

17 The best thing on the shelf is an apple pie.

18 The biggest cookie is smaller than the smallest apple pie.

19 The smallest apple pie is better than the biggest cookie.

20 There are two cookies, and they are the same size.

21 The apple pie is the same size as the cookie.

For these exercises, restrict the universe of discourse to the “natural” numbers 0, 1, 2, 3, ....
Often, when we are working with numbers, we also use ‘<’ and ‘>’ as relations between terms. So to
say ‘x is greater than y’, we say ‘x>y’. We also let the numerals stand as names for the numbers, so
we can say ‘5>4’.
Use the following symbols:
Px: x is prime
Ex: x is even
x>y: x is greater than y
Dxy: x is a divisor of y

22 There is exactly one even prime.

23 The even prime is less than 5.

24 There is no greatest prime.

25 No even number greater than 2 is prime.

26 Every number has at least one divisor.

27 The prime number greater than 5 and less than 10 is not even.

28 There is a smallest prime.

29 2 is the smallest prime.

30 Every prime number has exactly one divisor that is not 1.

31 Any two distinct prime numbers share at most one divisor.


Translate the following into English, using this key.
Bxy: x is better than y
Fx: x is free
Gxy: x is longer than y
Lx: x is a laugh
Lxy: x is later than y
Jx: x is a journey
Sx: x is a step
Sxy: x starts with y

32 ∃x ∀y(Bxy &Fx)

33 ∃x[Sx &∀y[(Sy &x ≠ y) →Lyx] &Fx]

34 ∃x[Lx &∀y[(Ly &x ≠ y) →Lxy] &∀z[(Lz &x ≠ z) →Gxz]]

35 ∀x[Jx →∃z[Sz &Sxz &∀w[(Sw &Sxw) →z=w]]]

36 ∃x[Jx &∀y[(Jy &x ≠ y) →Lxy] &∃z[Sz &Sxz &∀w[(Sw &Sxw) →z=w]]]

Symbolizing with functions

The concept of a function is particularly important in logic and mathematics. A function can be
thought of as a rule enabling one to go from one or more members of one set (called the domain)
to a unique member of a second set (called the range). For example, one function would be a
rule (perhaps in the form of a chart or an equation) that, given a package of hamburger of a certain
weight, enables one to determine its cost. The domain of this function is weights of different packages
and the range is costs. Notice that a function may assign the same value to more than one member
of the domain (packages with different weights can have the same cost—perhaps costs go up by half
pound increments). However, it cannot assign different values to any single member of the domain,
that is, packages of the same weight cannot be assigned different costs. If this were to happen, the
cost would not be a function just of the weight (although, it may be a function of weight and of other
variables such as fat content).
For many functions, the domain and the range are the same one set (e.g. numbers). Thus, many
mathematical functions are rules enabling one to go from one or more numbers to another number.
For example, the so-called successor function is this rule: given any number, write its successor.
Values in a domain are called arguments. Of course, this use of ‘argument’ is not related to its
typical use in logic (just like the bark of a dog has nothing to do with the bark that grows on trees).
We have encountered functions throughout our study of logic. Here are two examples: (1) ‘not’,
‘or’, ‘and’, ‘if . . ., [then] . . .’, and so forth are called truth-functional connectors because the truth
values of the compound statements in which they are used are functions of the truth values of the
simple statements of which those compound statements are composed. The domain and the range

of these functions are the same, namely, truth and falsity. ‘Not’ is a one-place function, the other
connectors are two-place functions. (2) Expressions like ‘Px’ are called statement functions. Such
expressions are functions whose domain is a set of individuals (whose names we insert in place of ‘x’)
and whose range is truth and falsity. For each named individual, if that individual has property P,
the value of ‘Px’ is true, otherwise it is false. So Px is a rule for going from individuals (the domain)
to truth and falsity (the range).
As we have noted, for many mathematical functions the domain and the range are both the
set of natural numbers. Some functions defined in the domain of natural numbers require two (or
more) arguments to assign a value in the range. For example, given any two numbers (which may
or may not be the same), an addition table enables one to find their sum. Of course, different pairs
of numbers can have the same sum (e.g. 1 and 5 or 2 and 4), but, as in the case of any other function
(for example, the package-weight-to-cost function), if a given pair were assigned more than one value
in the range, addition would not be a function (at least not a function simply of the numbers being
added). Thus, addition and multiplication are two-place functions. In mathematics, functions are
usually represented by such letters as ‘f ’ or ‘g’. If ‘x’ and ‘y’ represent values in a domain, expressions
like ‘f(x)’ and ‘g(x,y)’ represent the corresponding values in the range. Thus, we could represent
the sum of x and y by ‘s(x,y)’ and their product by ‘p(x,y)’; however, these functions are usually
represented by ‘x + y’ and ‘x × y’ respectively, and these are the symbols we will use.
When a function symbol is attached to the names of appropriate objects, the result can be
thought of as a complex name for the corresponding value in the range. Names and appropri-
ately filled function symbols are called terms. Thus, ‘5’ and ‘2 + 3’ are terms both of which happen
to pick out the same object—the number five.
Function symbols resemble predicate symbols—both can have one, two, three, or more blanks.
But do not confuse the two. Functions are rules that enable one to go from one or more members
of a domain to a unique value in a range. Terms, whether names or appropriately filled function
symbols, are neither true nor false. For example, the term ‘2 + 3’, in which the function symbol
‘+’ is attached to two appropriate arguments, is neither true nor false (just as ‘5’ is neither true nor
false). This is true even of truth functions. For example, like ‘2 + 3’, the expression ‘∼A’, considered
as a function application, simply picks out, in the range, the opposite truth value to whatever value
is assigned to ‘A’.
By contrast, predicates assert that a given object has a particular property or that two or more
objects stand in a certain relation. Thus, when a predicate is appropriately attached to one or more
terms, the result is a sentence that is either true or false. For example, if ‘x>y’ represents the relation
is greater than (and using ‘=’ as we have already done), ‘3 > 2’, ‘2 + 3 = 5’, and ‘2 + 3 > 2’ are true,
while ‘2 > 3’, ‘2 + 3 = 3 + 3’, and ‘4 + 3 > 3 + 4’ are false. Here ‘=’ and ‘>’ are two-place predicates
whereas ‘+’ is a two-place function.
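The function/predicate contrast can also be put in programming terms: a function returns an object in the range, while a predicate returns a truth value. A small Python sketch of the distinction (the names 'plus' and 'greater' are ours):

```python
# Functions return objects (here, numbers) -- terms are never true or false.
def plus(x, y):
    return x + y          # '2 + 3' names the number five

# Predicates return truth values -- attaching one to terms yields a statement.
def greater(x, y):
    return x > y          # '3 > 2' is true; '2 > 3' is false

assert plus(2, 3) == 5            # a term: it picks out an object
assert greater(3, 2) is True      # a statement: it is true
assert greater(plus(2, 3), 2)     # predicates can take complex terms
```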
In the language of logic, functions match up with constants and (first-order) variables; the general
term for these three things is ‘term’. Terms pick out objects, whereas predicates (and second-order
variables) pick out properties of objects. A two-place relation, for example, must have two terms,
but these could be constants, variables, or function expressions. Constants and variables are each only one
character long, but function expressions are longer. Because of this, when we symbolize using functions, we

sometimes put parentheses around the terms of the predicate or relation, and separate the terms
with commas. So, if we let 's(x)' symbolize 'the student of x', 'Txy' stand for 'x teaches y', and 'c'
stand for 'Dr. Christensen', then

Tcs(c)

will symbolize 'Dr. Christensen teaches his student', and (restricting the universe of discourse to persons)

∀xTcs(x)

symbolizes 'Dr. Christensen teaches everyone's student'. This last example is a statement that
includes a variable, a constant, and a function.
Notice that the function ‘s(x)’ symbolizes ‘the student of x’—a noun phrase—not something like
'x is a student'. All terms, including function expressions, stand for noun phrases, and as such cannot be either
true or false. Here’s a way to think of the three kinds of terms: constants stand for names, variables
stand for pronouns, and functions stand for “definite descriptions,” roughly, noun phrases starting
with ‘the’.
Functions can be chained. So to symbolize 'The student of Dr. Christensen teaches the student
of the student of Dr. Christensen', we say

Ts(c)s(s(c))

Translate the following sentences using the key provided:
t(x) the teacher of x
s Socrates
p Plato
Gx x is Greek
Lxy x learns from y
(restrict the universe of discourse to persons)
37 Every Greek learns from the teacher of Plato.

38 Every Greek learns from Plato and his teacher.

39 Plato learns from his teacher.

40 No one learns from the teacher of Plato.

41 Socrates learns from the teacher of Plato.

Functions are not strictly necessary. Every statement that can be symbolized using functions can
be symbolized with identity, but without functions. But the opposite doesn’t hold. A sentence like
'Socrates is the teacher of Plato' can be symbolized like this:

s = t(p)

thus using both identity and functions.

Translate the following sentences using the key provided:
s(x) the successor of x (that is, the natural number that follows x)
Px x is prime
Ex x is even
Also, use the numerals to name numbers, use >, <, and =, and restrict the universe of discourse
to (natural) numbers.

42 5 is the successor of 4.

43 The successor of 2 is prime.

44 2 is the only number that is both even and prime.

45 Every prime number greater than 2 is odd.

46 Every number is less than its successor.

47 No prime number other than 2 has a prime successor.

48 There is some prime number such that the successor of its successor is prime.

Translate the following sentences using the key provided:
f(x) the father of x
g George
Px x is a pioneer
Axy x is an ancestor of y
Oxy x is older than y

49 George’s great-grandfather is a pioneer.

50 George’s oldest pioneer ancestor is his great-grandfather.

51 George is younger than his father.

52 George is the oldest pioneer.

53 George is the youngest pioneer.

Chapter 2

First-Order proofs

2.1 The first three rules

Because all the truth-functional connectives are used in statements of FOL, all the rules of TF work
here, too. For example, the following is a legitimate proof in FOL:
1 ∀xAx →∃xBx
2 ∀xAx .˙.∃xBx
3 ∃xBx 1,2 MP
But we have new symbols: the quantifiers, and we need new rules to help us deal with them. We
will add four new rules, two for each quantifier. Two of the rules are straightforward, and we start
with them.

universal instantiation
If I know that every human being is mortal, then I know that any particular human being (say,
Socrates) is mortal. So I should be able to go from a sentence like

∀x(Hx → Mx)

to a sentence like

Hs → Ms
The rule of inference that permits us to do this is called universal instantiation (UI) because we take
an instance of a universally quantified statement. In symbols, the rule is stated like this:

∀xPx ⊢ Pa

In this rule, 'P' stands for any sentence, 'x' for any variable in that sentence, and 'a' for any
constant. There are a few restrictions on our use of this rule:


1. The universal quantifer must apply to the whole line. If any other symbol precedes the quan-
tifier, or if there are other symbols after the scope of the quantifier, we cannot apply the rule.
(The whole-line restriction.)

2. We must replace every occurrence of the variable with the name. (The general convention.)

The first restriction prohibits the following inference:

1 ∀xAx → ∀xBx
2 Aa → ∀xBx
Because the ∀ is not the main logical symbol, because it does not apply to the whole line, universal
instantiation cannot be applied here.
The second restriction prohibits the following inference:

1 ∀xAxx
2 Aax
Here we have instantiated only one instance of ‘x’ with the name ‘a’ and left the other alone.
Now, for a legitimate use of the rule:

1 ∀x(Hx → Mx)
2 Hs .˙.Ms
3 Hs → Ms 1 UI
4 Ms 2,3 MP
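The idea behind UI can be checked semantically on a finite domain: if a predicate holds of everything, it holds of whatever individual we care to name. A minimal Python sketch (the toy domain and predicate are invented for illustration):

```python
domain = ["socrates", "plato", "aristotle"]

def mortal(x):
    return True   # in this toy model, everything in the domain is mortal

# The universal claim: for all x, mortal(x).
assert all(mortal(x) for x in domain)

# UI: the claim then holds of any particular member, e.g. 'socrates'.
assert mortal("socrates")
```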

existential generalization
If I know that Felix is in the room, I know that something is in the room. So I should be able to go
from a sentence like

Rf

to a sentence like

∃xRx
The rule that permits this inference is called existential generalization (EG). In symbols, the rule is

Pa ⊢ ∃xPx

This means that I can take any statement with some name and replace one or more instances
of that name with a variable, prefixing the statement with an existential quantifier. As before, there
are two restrictions:

1. The existential quantifier must apply to the whole line. We cannot put the existential quantifier
anywhere but at the front of the statement, with its scope the entire statement. (The whole-line
restriction.)

2. The variable that we choose cannot occur anywhere else in that statement. (The general convention.)

The first restriction is violated in this example:

1 ∀x(Lax → Lxa)
2 ∀x(∃yLyx → Lxa)
This goes from the claim that everyone Artlinde loves loves her to the claim that anyone who is
loved by anyone loves Artlinde. That’s clearly not a legitimate inference, and the rule blocks it.
The second restriction is violated in this example:

1 Lax
2 ∃xLxx
Here the x becomes bound when the quantifier is added. This deduction goes from Artlinde
loves Xenophon to someone loves himself. This is clearly illegitimate, and the second restriction
blocks it.
Now, an example of these rules:

1 ∀x(Ax → Bx)
2 Aa .˙.∃xBx
3 Aa → Ba 1 UI
4 Ba 2,3 MP
5 ∃xBx 4 EG
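EG has the same finite-domain reading: one true instance is enough to make the existential claim true. Another small Python sketch (again with an invented toy domain):

```python
domain = ["felix", "rex", "whiskers"]
in_room = {"felix"}            # Felix is in the room

def R(x):
    return x in in_room

# From the true instance R('felix') ...
assert R("felix")
# ... EG licenses the existential claim: something is in the room.
assert any(R(x) for x in domain)
```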

universal generalization
The next rule is a little trickier. We need a rule that allows us to introduce the universal quantifier.
Say we had a universe of only three objects—the books on this table, say. Call them ‘a’, ‘b’, and
'c'. If we know that a was written by Aristotle, and b was written by Aristotle, and c was written by
Aristotle, we could conclude that everything (in this universe) was written by Aristotle. That is, we
could go from Aa, Ab, and Ac to ∀xAx.
In general, though, this won’t work. We don’t always know ahead of time how many things are
in the universe of discourse, and there may be infinitely many things. So we need a different rule.
In mathematical proofs, we might draw a triangle and label it ABC, say. We prove things about this
arbitrary triangle with an arbitrary name, and we can conclude that what we’ve proved holds for all
similar triangles. But we are not allowed to make any special use of the triangle we’ve drawn. We
can’t measure the angles, and conclude that all triangles have angles of just this size, for instance.

We might say it this way. If I can prove something about a single individual, but I know that the
proof would have worked no matter which individual I chose, I don’t have to prove it about all of
them. This one individual stood in for them all. So I need some restrictions to guarantee that the
object I have chosen is really arbitrary, that what I prove about it I could have proved about anything.
In symbols, the rule is this:

Pa ⊢ ∀xPx
And the restrictions are these:

1. The universal quantifer must apply to the whole line. If any other symbol precedes the quan-
tifier, or if there are other symbols after the scope of the quantifier, we cannot apply the rule.
(The whole-line restriction.)

2. The variable that we choose cannot occur anywhere else in that statement. (The general convention.)

3. We must replace every occurrence of the name with the variable. (The general convention.)

4. The name that we quantify from cannot appear in any premises or assumptions that are still
in force. (The arbitrariness restriction.)

The first two restrictions are the same as before. The third is similar, and the fourth is new. Let’s
look at some examples of violations.

1 ∀x(Ax → Bx) .˙.Aa →∀xBx

2 Aa (cp)
3 Aa → Ba 1 UI
4 Ba 2,3 MP
5 ∀xBx illegitimate use of UG
6 Aa → ∀xBx 2-5 CP

This proof goes from the premise that all alligators are brown to the conclusion that if Artlinde
is an alligator, everything is brown. This is clearly invalid, and the fourth restriction prohibits it.
Notice that the restriction is concerned only with assumptions that are still in force. The follow-
ing proof (of a Barbara syllogism) is fine:

1 ∀x(Ax → Bx)
2 ∀x(Bx → Cx) .˙.∀x(Ax →Cx)
3 Aa (cp)
4 Aa → Ba 1 UI
5 Ba → Ca 2 UI
6 Ba 3,4 MP
7 Ca 5,6 MP
8 Aa → Ca 3-7 CP
9 ∀x(Ax → Cx) 8 UG
This proof is fine, even though the letter a occurs in the assumption on line 3, because this
assumption is discharged when CP is applied on line 8. The restriction prohibits us from applying UG within a subproof
on any letters that occur in the assumption.
Barbara can be proved without CP, like this:

1 ∀x(Ax → Bx)
2 ∀x(Bx → Cx) .˙.∀x(Ax →Cx)
3 Aa → Ba 1 UI
4 Ba → Ca 2 UI
5 Aa → Ca 3,4 HS
6 ∀x(Ax → Cx) 5 UG
This is by far the most common way UG is used. We first use UI, then, after doing some TF rules,
we use UG to put the quantifier back on. As long as the letter we choose isn’t used in the premises
or assumptions, nothing prevents us from having chosen any other letter.
The third restriction prohibits an inference like this:

1 Laa
2 ∀x(Lxa)
This inference goes from the claim that Artlinde loves herself to the claim that everyone loves
Artlinde. This is clearly flawed, and whether line 1 is a premise or an assumption, the third restriction
prohibits it.

1 ∀x (Ax →Bx)
∀x (Bx →Cx)
∀xAx .˙.∃x (Ax &Cx)
2 Aa &Ba
∃xBx →∀x(Ax →Cx) .˙.∃xCx
3 ∀x((Ax &Bx) →Cx)
∀x(Ax &Bx) .˙.∀xCx

4 Aa
∀x(Ax →Bx)
∀x(Bx →Cx) .˙.∃xAx
5 ∃xCx →∀x∼Bx
∀x(Ax →Bx)
Ca .˙.∃x ∼Ax

6 ∃xAx →∀x ∀y Bxy

∀xAx .˙.∃x ∀y Bxy

2.2 existential instantiation

Existential instantiation (EI) allows us to go from the claim that something is a certain way to the claim
that some specific thing is that way. It allows us to go from the claim that some cookies are delicious
(∃x(Cx &Dx)) to the claim that Nebuchadnezzar is a delicious cookie (Cn &Dn). Clearly, this must
be subject to careful restrictions, just as UG is.
In fact, EI will be treated a little differently from the other quantifier rules. It will require an
assumption, like CP and IP. Suppose we know that all cookies are delicious, and that there are
cookies. We could conclude that there are delicious things:

∀x(Cx → Dx)
∃xCx .˙.∃xDx
Suppose we reason as follows:

Let’s call one of the cookies ‘Nebuchadnezzar’. By the first premise, if Nebuchadnezzar
is a cookie, then it’s delicious. And we assumed it’s a cookie, so it must be delicious. So
there is something that is delicious. There was nothing special about the name, so we can
conclude that something is delicious.

That reasoning can be formalized like this:

1 ∀x(Cx → Dx)
2 ∃xCx .˙.∃xDx
3 Cn (ei, n)
4 Cn → Dn 1 UI
5 Dn 3,4 MP
6 ∃xDx 5 EG
7 ∃xDx 2, 3-6 EI
In symbols, the rule is this:

∃xPx, Pa ... p ⊢ p

That is, if there is an existentially quantified line and we assume an instance of that line and
conclude some statement p, we can conclude p outside the subproof.
The restrictions are these:

1. The existential quantifer must apply to the whole line. If any other symbol precedes the
quantifier, or if there are other symbols after the scope of the quantifier, we cannot apply the
rule. (The whole-line restriction.)

2. We must replace every occurrence of the variable with the name. (The general convention.)

3. The name that we quantify from cannot appear in any previous line of the proof (excluding
closed subproofs). (The arbitrariness restriction.)

4. The name cannot appear in the line that closes out the assumption.

These restrictions are similar to the restrictions on UG, but are even stricter. For UG, we couldn’t
generalize on a letter that occurs in any premises or undischarged assumptions. For EI, we cannot
instantiate using a letter that occurs on any previous line. So even if the letter was introduced by an
application of UI, it cannot be used in the EI assumption.
Here’s an example that violates that restriction:
∀x∃yTxy .˙.∃xTxx
The argument goes from the claim that everything is taller than something to the claim that some-
thing is taller than itself. This is clearly invalid. Suppose we tried to prove it like this:

1 ∀x∃yTxy .˙.∃xTxx
2 ∃yTay 1 UI
3 Taa (ei, a)
4 ∃xTxx 3 EG
5 ∃xTxx 2, 3-4 EI (incorrect, illegitimate, wrong)
The assumption on line 3 violates the restriction, because the letter a occurs on line 2. We would
have had to choose a different letter, and the proof would not have worked. And we couldn’t do the
EI assumption before we did UI, because then we would have violated the whole-line restriction.
Here’s a proof that violates the fourth restriction. It should be obvious that it’s invalid.

1 ∃x(Ax & Bx) .˙.∀xAx

2 Aa & Ba (ei, a)
3 Aa 2 Simp
4 Aa 1, 2-3 EI (violates restriction 4)
5 ∀xAx 4 UG
Here the UG line is fine, since the assumption where a was introduced has been closed out.
But restriction 4 requires that the letter introduced no longer appears on the line that closes out the

assumption. This restriction, together with the UG restriction against generalizing on letters that
occur in assumptions, prevents this kind of logical hooliganism.
To help us remember which letter was introduced, to the right of an EI assumption we put the
letter in parentheses, along with the letters ‘ei’. On any line in the subproof that does not have that
letter, we may end the subproof, duplicating the line in the main proof.
Often we need to instantiate the same letter for a universal and an existential quantifier. If we
do UI first, we cannot use the same letter for EI. So, whenever possible, do EI first. Here’s a proof of a
Disamis syllogism, to illustrate a legitimate proof using EI:

1 ∃x(Ax &Bx)
2 ∀x(Ax →Cx) .˙.∃x(Bx &Cx)
3 Aa &Ba (ei, a)
4 Aa →Ca 2 UI
5 Aa 3 Simp
6 Ba 3 Simp
7 Ca 4,5 MP
8 Ba &Ca 6,7 Conj
9 ∃x(Bx &Cx) 8 EG
10 ∃x(Bx &Cx) 1, 3-9 EI

This proof is a good example of a typical use of EI. As in this proof, typically an EI assump-
tion will end with an EG. The citation to the right of the EI line includes the line number of the
existentially generalized statement and all the lines in the subproof.
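One way to picture EI computationally: over a finite domain, a true existential claim guarantees a witness, and the fresh EI name is like a variable bound to that witness and used nowhere else in the proof. A hedged Python analogy (the toy predicates are ours):

```python
domain = range(10)
def C(x): return x in (3, 7)    # 'x is a cookie' in this toy model
def D(x): return x % 2 == 1     # 'x is delicious'

# Premise: some cookie is delicious.
assert any(C(x) and D(x) for x in domain)

# EI: give one such thing a fresh name.
witness = next(x for x in domain if C(x) and D(x))

# Whatever we derive about the witness without relying on which object
# it is can be exported -- e.g. 'something is delicious':
assert D(witness)
assert any(D(x) for x in domain)
```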

1 ∀x(Ax → Bx) .˙.∃x∼Bx →∃x∼Ax

2 ∃x∼Bx (cp)
3 ∼Ba (ei, a)
4 Aa → Ba 1 UI
5 ∼Aa 3,4 MT
6 ∃x∼Ax 5 EG
7 ∃x∼Ax 2,3-6 EI
8 ∃x∼Bx → ∃x∼Ax 2-7 CP


1 ∃x(Ax &∀yBxy)
∀x ∀y(Bxy →Byx) .˙.∀x ∃yBxy

2 ∀x(Bx →Cx)
∃x(Dx &∼Cx) .˙.∃x (Dx &∼Bx)

3 ∃x (Ax &∀y Bxy) .˙.∃x ∃y Bxy


4 ∀x (Ax →∃y(By &Cxy))

∃xBx .˙.∃x ∃y ((Bx &Cy) &Dxy)

2.3 Quantifier Negation

As noted above, the quantifiers can be defined in terms of each other:

∀xRx ≡ ∼∃x ∼Rx

∀x ∼Rx ≡ ∼∃xRx
∼∀xRx ≡ ∃x ∼Rx
∼∀x ∼Rx ≡ ∃xRx

We will adopt these equivalences as a new rule of proof, quantifier negation (QN). This rule doesn’t
allow us to prove any new arguments. All the quantifier equivalences can be proved using our regular
quantifier rules, so QN is just a shortcut. But it does come in handy when we have a quantifier with
a negation in front of it. The negation prohibits us from applying any of the quantifier rules. But
we can “drive in” the quantifier using QN, as in this proof of a version of Celarent:
1 ∼∃x(Ax &Bx)
2 ∀x(Cx →Ax) .˙.∼∃x(Cx &Bx)
3 ∀x ∼(Ax &Bx) 1 QN
4 ∼(Aa &Ba) 3 UI
5 ∼Aa v∼Ba 4 DeM
6 Aa →∼Ba 5 CE
7 Ca →Aa 2 UI
8 Ca →∼Ba 6,7 HS
9 ∼Ca v∼Ba 8 CE
10 ∼(Ca &Ba) 9 DeM
11 ∀x∼(Cx &Bx) 10 UG
12 ∼∃x(Cx &Bx) 11 QN
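The QN equivalences mirror a familiar fact about `all` and `any` over a finite domain. A Python sketch verifying all four equivalences for every one-place predicate on a three-element domain (a predicate is modeled as a subset of the domain):

```python
from itertools import product

domain = [0, 1, 2]
qn_holds = True
# Try all 8 possible extensions of a one-place predicate R.
for extension in product([False, True], repeat=len(domain)):
    R = dict(zip(domain, extension)).__getitem__
    qn_holds &= all(R(x) for x in domain) == (not any(not R(x) for x in domain))
    qn_holds &= all(not R(x) for x in domain) == (not any(R(x) for x in domain))
    qn_holds &= (not all(R(x) for x in domain)) == any(not R(x) for x in domain)
    qn_holds &= (not all(not R(x) for x in domain)) == any(R(x) for x in domain)

assert qn_holds   # all four equivalences hold in every interpretation
```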

1 ∃x (Ax &Bx)
∼∃x (Bx &Cx) .˙.∼∀x (Ax →Cx)

2 ∼∀x(Bx →Cx)
∼∃x(Bx &Dx)
∀x(Ax →(Cx vDx)) .˙.∼∀xAx

3 ∼∃x(Ax &∼Bx)
∼∃x(Bx &(∼Cx v∼Dx)) .˙.∼∃x(Ax &(∼Cx &∼Dx))

4 ∀x(Bx →∼∀yAxy)
∼∀x(∼Bx &∀y(Cx →Axy)) .˙.∼∀xCx

5 ∀x ∀y Axy
∼∃x ∃y Bxy .˙.∃x ∼∃y (Axy →Byx)

2.4 Logical Truths

TF statements that are always true are called tautologies. We could use that word also to apply to
FOL statements that are always true, but normally the word is restricted to TF use. I will call them
by the generic term logical truths.
Just as when we prove that a statement is a tautology, we prove that a statement is a logical truth
by constructing an argument with no premises. We can always begin by assuming the negation of the
statement, and looking for a contradiction. If the main connective in the statement is a conditional,
we could assume the antecedent and conclude with the consequent. If the main connective of the
sentence is a quantifier, sometimes we can assume part of the statement. Here’s an example of that
last strategy:
Example: ∀x(Ax →Ax)
1 Aa (cp)
2 Aa →Aa 1 CP
3 ∀x(Ax →Ax) 2 UG
We have seen the trick on line 1 before. Notice also that line 3 violates no restrictions, as the
subproof where a is introduced has been closed out.
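Before attempting a proof, it can be reassuring to test a candidate over small interpretations: a failure yields a countermodel, while success on all of them is encouraging (though never a proof). A Python sketch checking that '(∀xFx & ∀xGx) ↔ ∀x(Fx & Gx)' holds in every interpretation over a three-element domain:

```python
from itertools import product

domain = [0, 1, 2]
holds = True
# Try every pair of one-place predicates F, G over the domain (64 cases).
for f_ext, g_ext in product(product([False, True], repeat=3), repeat=2):
    F = dict(zip(domain, f_ext)).__getitem__
    G = dict(zip(domain, g_ext)).__getitem__
    lhs = all(F(x) for x in domain) and all(G(x) for x in domain)
    rhs = all(F(x) and G(x) for x in domain)
    holds = holds and (lhs == rhs)

assert holds   # the biconditional is true in all 64 interpretations
```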


1 (∀xFx &∀xGx) ↔∀x(Fx &Gx)

2 (∀xFx v∀xGx) →∀x(Fx vGx)

3 (∃xFx v∃xGx) ↔∃x(Fx vGx)

4 ∃x(Fx &Gx) →(∃xFx &∃xGx)

5 ∀x(Fx &p) ↔(∀xFx &p)

6 ∃x(Fx &p) ↔(∃xFx &p)

7 ∃x(Fx vp) ↔(∃xFx vp)

8 ∀x(Fx vp) ↔(∀xFx vp)

9 ∀x(p →Fx) ↔(p →∀xFx)

10 ∃x(p →Fx) ↔(p →∃xFx)


11 ∀x(Fx →p) ↔(∃xFx →p)

12 ∃x(Fx →p) ↔(∀xFx →p)

13 ∀y(∀xFx →Fy)

14 ∀y(Fy →∃xFx)

15 ∃y(Fy →∀xFx)

16 ∃y(∃xFx →Fy)

17 ∀x∃y(Fx →Gy) →∃x(Fx →∃yGy)

18 (∃xFx →∃xGx) →∃x(Fx →Gx)

19 ∀x∃y(Fx &Gy) ↔∃y∀x(Fx &Gy)

2.5 Strategies and Tactics

The highest-level strategies remain the same for any arguments: analyze the argument forward,
backward, and globally, and when all else fails try random walk or indirect proof. There are a few
more specific strategies, and a few tactics.
Strategy 1: Reduce to a truth-functional proof.
Many arguments in quantificational logic, including all syllogisms, can be proved by following
these steps:

1. Remove any quantifiers in the premises;

2. Apply truth-functional rules;

3. If needed, reattach the quantifiers.

When applying rule 1, there are a few things you should watch out for:
a. Sometimes some of the premises will be unquantified. Be careful that you pick the right
constants when removing the other quantifiers, so that the argument will work, but not violate the
restrictions on EI.
b. Speaking of EI, remember as a rule of thumb to remove existential quantifiers before universal quantifiers.
Strategy 2: Mix truth-functional and quantifier steps.
Sometimes the above strategy fails. This may be because the premises or conclusion have quan-
tifiers, but they don’t apply to the whole line, so cannot be removed with our rules. Here we need
to be a little more flexible, but in general, here are the rules to follow:

1. Remove any quantifiers that apply to whole lines. Again be careful in what order you remove
them. (If there’s a mix of existential and universal, you may want to hold off on removing the
universal quantifiers until it’s necessary.)

2. Do any truth-functional steps you can.

3. Reattach any quantifiers necessary to do more truth-functional steps.

4. Do those truth-functional steps.

5. Repeat as needed.

It might be good to see an example:

1 ∃xAx→∃xBx
2 ∀x(Bx→Cx) .˙.∃xAx→∃xCx
3 ∃xAx (cp)
4 ∃xBx 1,3 MP
5 Ba (ei, a)
6 Ba→Ca 2 UI
7 Ca 5,6 MP
8 ∃xCx 7 EG
9 ∃xCx 4,5-8 EI
10 ∃xAx→∃xCx 3-9 CP
Since the conclusion is a conditional, we started by assuming its antecedent. Line 1 has no
quantifiers applying to the whole line. Line 2 does, but it’s a universal quantifier and we have several
existential quantifiers, so we might want to hold off. We notice that the line we just assumed is the
same as the antecedent of line 1, so we do modus ponens. Now, looking at lines 2 and 4 we see a ‘B’
in common, but to allow them to interact we need to remove the quantifiers (E before U). Then we
do a few truth-functional steps until we get 'Ca' on line 7. That's an instance of the consequent of
the conclusion. So we generalize, and clean up.
Tactics & Tricks

1. Remove existential quantifiers before universal quantifiers.

2. If there’s a negation outside the quantifier, the quantifier cannot be removed. Use QN to
move the negation in, then remove the quantifier.

3. If the conclusion is a conditional, assume the antecedent, even if the antecedent is quantified.
(If the conclusion can be turned into a conditional via TF rules, ditto.)

4. If the conclusion is a quantified conditional, assume an instance of the antecedent.

Let's look at the difference between (3) and (4). If the conclusion is

∃xAx →∃xBx

you should assume '∃xAx' and try to get '∃xBx', and the conclusion will follow by conditional proof.
But, if the conclusion is

∀x(Fx →Gx)
you may want to assume ‘Fa’, get ‘Ga’, then after the conditional proof, universally generalize to
get the conclusion. This will usually not violate the restrictions on UG, since the assumption that
introduced ‘a’ has been closed out, and so the constant is not in any assumptions still in force.

5. Sometimes the quantifiers don’t need to be removed before you apply TF rules. If the TF
rules can be applied within a line, they can be applied to a line that has a quantifier. For
instance, you can go from ‘∀x(Fx→Gx)’ to ‘∀x(∼Gx→∼Fx)’ without removing the quantifiers
and reattaching them.
Chapter 3

Axiom Systems

3.1 Axiom Systems

The Philosophers’ Dream
In his famous allegory of the cave, Plato imagines that in our current state, our minds are full of
vague and unsystematic thoughts, as if we were imprisoned in a dark cave. To gain true knowledge,
true understanding, we would need to exit this cave and gaze at the sun. The sun, representing
ultimate reality, could give us understanding of everything and put all our knowledge into its proper
place. The imaginations of Plato and other ancient philosophers were sparked by geometric proofs,
which gave a great deal of knowledge from only a few basic assumptions, or axioms. Plato imagined
that we could carry this method all the way back to the beginning, that with just a single basic piece
of knowledge, everything else would follow. The Greek word axioma means ‘worthy thing’—the
axioms are the things most worthy of knowledge, since knowing them allows us to know everything.
Other philosophers have had similar dreams. They have hoped that all knowledge could be
ordered systematically, that knowledge of a few basic facts would allow us to know everything.
It hasn’t turned out that way. Not only has no one been able to find anything like Plato’s ultimate
reality that, once understood, allows us to know everything, but it turns out that in a very real sense
truth is always unsystematic. But even if we can’t have an axiom system of absolutely everything, the
axiom system is a model of rigor, and it will be useful to see just how far we can push the philosophers'
dream.
Historical Background
But we’re getting ahead of ourselves. Beginning perhaps with Thales, the Greeks demonstrated the
connection between different geometrical facts. The project culminated in Euclid’s Elements, a work
that begins with a few axioms and definitions, and proceeds to prove various theorems, such as that
the interior angles of a triangle are equal to two right angles. It really is an impressive achievement,
one of the pinnacles of human accomplishment.


It is, however, riddled with flaws. In doing his proofs, Euclid repeatedly assumes things he hasn’t
stated. One famous example: the very first proof assumes that whenever lines intersect, there is a
point at which they intersect. This may be obvious, but we can, without contradiction, make all of
Euclid’s axioms true and this false, which means it’s not something he is entitled to assume. In the
axiomatic method, we can’t assume anything unless we say so.
By the end of the 19th century, several mathematicians wanted to bring mathematics back to
its promise of a sure foundation. Frege was among these, but much more famous in his lifetime was
David Hilbert. Hilbert gave a new axiomatization of geometry that was intended to do away with
the flaws of Euclid’s.
The secret is the sharp contrast between syntax and semantics. When we set up an axiom
system, we begin with a few undefined terms, and then specify rules for combining them to form
sentences. We begin with a few axioms, and we specify rules for generating theorems. But, crucially,
these terms and axioms are just marks on a page. When we follow the rules, it’s as if we were playing
chess. We are never allowed to declare something on the grounds that it’s obvious because of the
subject matter. Of course, in general we’ll be interested in an axiom system because of an intended
interpretation—because it tells us about planes and solids, or about arithmetic, or about astronomy.
But the interpretations come only after we've spelled everything out in a precise formal or symbolic
language.
Symbolic Languages
The concept of a symbolic language is explained as follows. One first distinguishes logical and non-
logical symbols. Logical symbols are (i) variables for quantifiers (e.g. x, y, z, xo, x1, . . .), (ii) the
following five symbols: '&', '∼', ')', '(', and '=', and (iii) all other symbols such as 'v', '→', and '↔'
that can be defined from these. Non-logical symbols are terms (including numerals and individual
constants like ‘h’ or ‘t’ that represent names), statement letters, function symbols, and all predicates
other than ‘=’. A symbolic language includes the logical symbols (which, by this definition, are
part of every symbolic language) together with a specified set of non-logical symbols. We treat the
relational predicate ‘=’ as a logical symbol to insure that it is included in every symbolic language—it
is the only predicate that receives this special treatment.
Here are three examples: (1) In addition to the logical symbols, the symbolic language of truth-
functional logic (called L) consists of a collection of statement letters: ‘A’, ‘B’, ‘C’ . . . In this language,
there are no terms, function symbols, or predicates. (2) In addition to the logical symbols, the
symbolic language of arithmetic (called A) consists of the name '0', the function symbols '′' (read:
the successor of), '+' and '×', and the predicate 'N' (read: is a number). Other symbols can be
introduced into A by suitable definitions. For example, '1' can be defined as '0′', '2' as '1′' (i.e. as
'0′′'), etc. Using these symbols we can write sentences like '2 + 3 = 5' or '∀x(x + 1 = x′)'. (3) In
addition to the logical symbols, the symbolic language of set theory (called S) consists of the single
two-place predicate '∈' (read: is an element of). Other symbols are introduced into S by suitable
definitions.
3.2 Axiom Systems

Elements of Axiom Systems
It is often possible to systematize a set of statements so that some or all of them can be derived from
a few members of the set. The members from which statements are derived are called axioms. The
best known axiom system is Euclid’s axiomatization of geometry, but other branches of mathematics,
logic, other sciences, and even the statements of ethics or political theory can be axiomatized with
more or less success. There are usually alternative ways of axiomatizing a given set of statements,
that is, given a set of statements, it may be possible to select different groups of axioms from which
the statements can be derived. Thus, the particular axioms one chooses may be arbitrary or dictated
by convenience (rather than by any assumptions about which statements are most basic, intuitively
obvious, or essential).
The most apparent elements in an axiom system are the axioms and the theorems derived from
them. For example, in Euclidian geometry one axiom is the famous parallels postulate which states
that given a line and a point not on the line, one and only one line can be drawn through the
point and parallel to the line. From this and other axioms one can derive various theorems such
as the equally famous Pythagorean theorem that expresses a relation between the sides and the
hypotenuse of right triangles.
In the late nineteenth century, an Italian mathematician named Giuseppe Peano axiomatized
arithmetic. Here is a set of axioms equivalent to the one that Peano used:

3.2.A Zero is a number.

3.2.B Every number has a number as a successor.
3.2.C No number has zero as a successor.
3.2.D Given any two numbers, if the successors of those numbers are equal, the num-
bers are equal.
3.2.E If zero has some property and if, supposing that any number whatsoever has that
property, the successor of that number must also have the property, then every number
has the property.

From these axioms, one can derive various theorems. For example, from the first two it follows
that zero has a successor which is a number.
Reflection reveals that axiom systems include more than just axioms and theorems. Any axiom
system must be stated in some language. In that language, some terms will be defined and others
will be undefined. In 3.2.A through 3.2.E, ‘zero’, ‘successor’ and ‘number’ are undefined. Other
terms can be defined from the undefined terms; for example ‘one’ can be defined as the successor
of zero. However, axioms can be stated not only in natural languages like English or Greek but also
in symbolic languages. For example, 3.2.A through 3.2.E can be stated in the symbolic language A
(as defined above):

3.2.F N0
3.2.G ∀x(Nx → Nx’)
3.2.H ∼∃x(x’ = 0)
3.2.I ∀x∀y(x’ = y’ → x = y)
3.2.J ∀X((X0 &∀x(Xx → Xx’)) → ∀xXx)

As we know and as is here illustrated, a symbolic language like A includes the logical symbols
and certain non-logical symbols. In 3.2.F through 3.2.J, the undefined non-logical symbols are ‘N’,
'0', and '′', and other symbols can be defined from them. For example, '1' can be defined as '0′'.
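The successor notation can be modeled concretely. A sketch of our own (not part of the formal system A) in which numerals are iterated successors and addition is characterized by the usual recursion:

```python
ZERO = 0

def succ(n):
    return n + 1            # the successor function, written as a stroke in A

# Numerals as iterated successors: 1 = 0', 2 = 0'', ...
one = succ(ZERO)
two = succ(one)
three = succ(two)
five = succ(succ(three))

def add(m, n):
    # Addition characterized by recursion on the second argument:
    #   m + 0 = m        m + n' = (m + n)'
    return m if n == ZERO else succ(add(m, n - 1))

assert add(two, three) == five    # '2 + 3 = 5'
```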
For most purposes, the only elements of axiom systems that are explicitly identified are axioms
and theorems and undefined and defined terms. However, in axiomatizing branches of logic, two
other elements must be taken into account. Ordinarily, given a set of statements that one wishes
to axiomatize, one uses whatever is “logical” to advance from axioms to theorems and whatever
is “grammatical” counts as a statement. However, in axiomatizing logic itself, one cannot simply
allow whatever inferences seem logical, and, working in an artificial symbolic language such as L
or A, one must specify the grammar of the language to make clear which combinations of symbols
are acceptable. So in using a symbolic language to axiomatize logic, one must identify specific rules
of inference (e.g. modus ponens) that are allowed, and one must state explicit rules of syntax that
determine whether a string of symbols is acceptable. Thus, in principle, any axiomatization will
involve six elements (which may or may not be made explicit):

Undefined Terms Axioms

Definitions Theorems
Rules of Syntax Rules of Inference

Properties of Axiom Systems

Several properties of axiom systems have been identified and studied; we will give particular at-
tention to three: independence, consistency, and completeness. The axioms of a system are inde-
pendent if no one of them can be derived from the others. Of the three properties in question,
independence is the least important—its value is mostly aesthetic. An axiom system is consistent if
no theorem is a self-contradiction. Since a self-contradiction truth-functionally implies every state-
ment, an equivalent definition of consistency is that there is some grammatical statement in the
language of the system that is not a theorem. Consistency is absolutely essential in every axiom
system because in an inconsistent system one can prove everything. Such a system is worthless.
Completeness is more difficult to define than independence or consistency, and there are differ-
ent concepts of completeness depending in part on the subject matter to be axiomatized. We will be
interested in two such concepts. The first is this: an axiom system is complete if for every grammat-
ical statement in the language of that system, either that statement or its denial is a theorem. This
concept of completeness works for arithmetic. For example, consider these two pairs of statements:
‘2 + 2 = 5’, ‘2 + 2 ≠ 5’ and ‘2 + 2 = 4’, ‘2 + 2 ≠ 4’. Since, for every given statement in arithmetic, either
that statement or its denial is true, if an axiom system is complete for arithmetic, it must be possible
to prove one or the other of each such pair.
The preceding concept of completeness works for arithmetic but not for logic, and the reason
is simple. Think about an axiom system intended to prove tautologies. We do not want this system
to be such that, for every grammatical statement in the language of truth-functional logic, either
that statement or its denial is a theorem. This is because, in truth-functional logic, some statements
are contingencies. For example, neither ‘A’ nor ‘∼A’ is a tautology, and if our axiom system enabled
us to prove either of these, it would also enable us to prove the other—that is, it would yield a self-
contradiction and so be inconsistent. So this concept of completeness, while suitable for arithmetic,
is not useful in logic.
Logic requires a different concept of completeness. We can approach this concept by thinking
again of truth-functional logic. Remember that a tautology is a compound statement true for every
interpretation of its simple statement letters. An axiomatization for truth-functional logic is com-
plete for tautologicality if, within that axiomatization, one can prove every tautology. We can use
this approach to define completeness for the logical truths of quantificational logic. In addition to
statement letters, quantificational statements can include predicates, function symbols, and terms; in
quantificational logic, variables must also be defined over some universe of discourse. So, in quan-
tificational logic, we can define logical truth as a statement true under every interpretation of its
statement letters, predicates, function symbols, and terms, and within every universe of discourse.
In Section 2.2 we encountered numerous logical truths from quantificational logic; for example,
exercises 2.2.44 through 2.2.62 are logical truths. According to this definition, all tautologies are
logical truths but, of course, not all logical truths are tautologies. Now, given this generalized def-
inition of logical truth, we can define completeness as follows: to say that an axiomatization for
quantificational logic is complete means that it is possible, within that system, to prove every logical
truth of quantificational logic.

3.3 An axiom system for TF

A set S is semantically axiomatizable iff there is a set A ⊆ S such that for every s ∈ S , A |= s. Every
set is semantically axiomatizable, if only by the set itself. We say the set is finitely (or recursively, etc.)
axiomatizable if some such A is finite (or recursive, etc.).
We’re interested here in a related notion: syntactic axiomatization. This requires, in addition
to the set of axioms, rules of inference. A derivation is a finite sequence of sentences such that every
sentence in the sequence is an axiom or follows from earlier sentences in the sequence via the rule.
Here we will take {¬, →} to be our logical constants.
The rule of inference will be

MP p, p → q ⊢ q

The lower-case letters serve as meta-variables ranging over all sentences. So all of these are valid
instances of the rule:

A, A → B ⊢ B
A → B, (A → B) → ∼C ⊢ ∼C
∼(C & B), ∼(C & B) → (A v ∼A) ⊢ A v ∼A

(The last instance uses some defined symbols.) This rule is called ‘MP’ (for modus ponens).
The axioms are these:
TF1 p → (q → p)
TF2 [p → (q → r)] → [(p → q) → (p → r)]
TF3 (¬p → ¬q) → (q → p)
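As a quick semantic check, outside the axiom system itself, each of TF1 through TF3 is a tautology: true under every assignment of truth values to its variables. Here is a Python sketch (the helper names are my own, not the book's notation):

```python
# Semantic sanity check: each axiom schema TF1-TF3 is a tautology,
# i.e., true under every assignment to p, q, r.
from itertools import product

def implies(p, q):
    return (not p) or q

tf1 = lambda p, q, r: implies(p, implies(q, p))
tf2 = lambda p, q, r: implies(implies(p, implies(q, r)),
                              implies(implies(p, q), implies(p, r)))
tf3 = lambda p, q, r: implies(implies(not p, not q), implies(q, p))

for axiom in (tf1, tf2, tf3):
    assert all(axiom(p, q, r) for p, q, r in product([True, False], repeat=3))
```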

Every sentence in a derivation will be called a theorem, and we write ⊢ σ to say that σ is a
theorem. We could extend the notion of a derivation to include derivations from assumptions. We
say ∆ ⊢ σ if there is a sequence of sentences, every sentence being an axiom, in ∆, or following
from earlier sentences via the rule, and σ is in that sequence. Thus ⊢ σ iff ∅ ⊢ σ .
Being stated in terms of variables rather than in terms of statement letters, each axiom (and the rule
MP) covers all the infinitely many statements in L that are instances of the appropriate forms. For
example, here are three instances of TF1:

A → (B → A)
(A & B) → (B → (A & B))
D → ((A → (B → A)) → D)

One trick in doing axiomatic proofs is finding an appropriate substitution instance.
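Finding a substitution instance can be made mechanical. Here is a small Python sketch (the nested-tuple encoding of formulas and the helper name are my own, not the book's notation): a uniform substitution replaces every occurrence of a variable by the same formula.

```python
# Uniform substitution: replace each statement variable in a schema,
# encoded as nested tuples ('->', antecedent, consequent), by a formula.
def substitute(schema, sub):
    if isinstance(schema, str):          # a variable or statement letter
        return sub.get(schema, schema)
    return tuple(substitute(part, sub) for part in schema)

TF1 = ('->', 'p', ('->', 'q', 'p'))
# The second instance listed above: (A & B) -> (B -> (A & B))
inst = substitute(TF1, {'p': ('&', 'A', 'B'), 'q': 'B'})
assert inst == ('->', ('&', 'A', 'B'), ('->', 'B', ('&', 'A', 'B')))
```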

The axiom system we’re using here is due to Łukasiewicz. The first two axioms are the same as
Frege’s first two; the third axiom takes the place of three of Frege’s original axioms. There are many
other systems; some take conjunction or disjunction rather than conditional to be the undefined
term; some have more axioms and some fewer. Some systems have only one axiom. In setting up
an axiom system for truth-functional logic, it’s most important to find a set of axioms that’s complete
and consistent; among those that are complete and consistent, we choose one that is elegant and
easy to work with.
Let’s start finding some theorems. And let’s begin with a simple statement.
TF4 A → A.
Here’s the proof:

1 (A → ((A → A) → A)) → ((A → (A → A)) → (A → A)) TF2

2 A → ((A → A) → A) TF1
3 (A → (A → A)) → (A → A) 1,2 MP
4 A → (A → A) TF1
5 A→A 3,4 MP
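A derivation like this can be checked mechanically. The following Python sketch (with my own tuple encoding of formulas, not part of the system) replays the two modus ponens steps of the proof above:

```python
# Formulas are nested tuples ('->', antecedent, consequent); atoms are strings.
IMP = '->'

def mp(major, minor):
    """Modus ponens: from p -> q and p, return q."""
    op, p, q = major
    assert op == IMP and p == minor, "not a legal MP step"
    return q

A = 'A'
AA = (IMP, A, A)                                   # the goal: A -> A
line1 = (IMP, (IMP, A, (IMP, AA, A)),
              (IMP, (IMP, A, AA), AA))             # instance of TF2
line2 = (IMP, A, (IMP, AA, A))                     # instance of TF1
line3 = mp(line1, line2)                           # (A -> (A -> A)) -> (A -> A)
line4 = (IMP, A, AA)                               # instance of TF1
line5 = mp(line3, line4)
assert line5 == AA
```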

Proofs in axiom systems must be approached differently from the proofs that we’ve done before.
Every line in the proof is an instance of one of the axioms, or follows from two earlier lines in the
proof by an application of modus ponens. The proofs are generally short and simple to follow. They
are often not so simple to find. The trick is to find an instance of the axioms that will give us what we
want. Here, for example, the first line is an instance of TF2, with ‘A’ substituted for ‘p’ and ‘r’, and
with ‘A → A’ substituted for ‘q ’. How can we tell what substitution instances to use? A good rule
of thumb is to find a consequent of some axiom or previous theorem that looks like you’re trying
to prove. (By ‘looks like’, I mean ‘is a substitution instance of ’.) Here I noticed that the theorem,
‘A → A’, is a substitution instance of the consequent of TF2, ‘p → r’. So I wrote down that instance
of TF2:

(A → (− → A)) → ((A → −) → (A → A))

I’m left with two blank spaces. The substitution instances of ‘p’ and ‘r’ are forced, but ‘q ’ is left
open. Now I need to figure out how to get rid of the rest of that line. In other words, I need to find
a way to fill in the blanks so that ‘A → (− → A)’ and ‘A → −’ are instances of axioms. Then I
can do modus ponens twice and I’ll be left with ‘A → A’. The first one, ‘A → (− → A)’, is easy.
No matter what I put in for the blank, it will be an instance of TF1. But because I need to put the
same thing in for both blanks (since they’re both the same variable), I’ll look at the second formula,
‘A → −’. To make this one an instance of TF1, I need to substitute a conditional for the blank, and
in particular a conditional whose consequent is ‘A’. Any such conditional would work: ‘A → A’,
‘B → A’, ‘(A → (B → C)) → A’. I picked a simple one.
Now clearly we could continue to prove variations on TF4. We could prove ‘B → B ’, ‘C → C ’,
and so on. But each of the proofs would be identical, except for variations in substitution instance.
Instead of that, we will treat this proof as a proof schema, one that proves each particular instance. We will
state the theorem in terms of statement variables:
TF4 p→p
From now on, all theorems can be thought of as schemata. If we prove, say, ∼∼A → A, we act
as if we had proved every instance of the same form.
We can also prove derived rules. These proofs go slightly differently from proofs of theorems,
since we start with assumptions.
TF5 (HS) A → B, B → C ⊢ A → C

1 A→B
2 B→C
3 (B → C) → (A → (B → C)) TF1
4 A → (B → C) 2,3 MP
5 (A → (B → C)) → ((A → B) → (A → C)) TF2
6 (A → B) → (A → C) 4,5 MP
7 A→C 1,6 MP

(‘HS’ stands for ‘hypothetical syllogism’.) Here the method of proof is similar to TF4, but we
began with two lines without justification. This is similar to proving from premises. But once again,
the trick is to find an instance of the axioms or previous theorems that will give us what we want.
Here, the conclusion looks a lot like the last consequent of TF2. Then we work backward. What
would we need to have to get ‘A →C ’ by modus ponens from TF2? Well, we would need ‘A →(B
→C )’ and ‘A →B ’. The second we already have as an assumption. The first is a conditional
statement with the other assumption as the consequent. So now we need to figure out how to add
an antecedent to something we already have. That is what TF1 does. So if we put ‘B → C’ for ‘p’
and ‘A’ for ‘q’, we have it.
Study this proof; it provides a good model for many of the axiomatic proofs. Many proofs,
however, will be easier, and proofs generally get easier as we go along. A big part of the reason for
this is that once we prove a theorem or a derived rule, we can cite it in further proofs.
TF6 (MT*) ∼A → ∼B, B ⊢ A
(This is called ‘MT*’ since it’s related to modus tollens.)
TF7 ∼A → ∼B, ∼B → ∼C ⊢ C → A
TF8 (EFQ) A, ∼A ⊢ B
(‘EFQ’ stands for ‘ex falso quodlibet’, the medieval name of this principle.)

The Deduction Theorem

Axiomatic proofs are a little tricky. For all these conditional statements, it would be nice if we could
use something like conditional proof. It turns out we can. For axiom systems, this new rule is called
the “deduction theorem,” and we could state it like this:

CP If p, . . . ⊢ q , then . . . ⊢ p → q

The dots indicate that there may be other assumptions present. The rule says that if I assume p
and conclude q (possibly with other assumptions), then (with those assumptions) I can prove p → q .
We’ll call it ‘CP’ (for conditional proof ) because that’s a more familiar name.
We could prove the deduction theorem, but we won’t. It takes TF1, TF2, and TF4 to prove it.
The deduction theorem also allows one more adjustment: For every theorem of the form ‘A
→B ’, we’ll assume we have the associated rule ‘A ⊢B ’. We can do this because we could always
assume A, then do modus ponens to get B . In fact, many of the proofs we’ll do will be expressed as
rules. Given the deduction theorem and modus ponens, ‘p → q ’ and ‘p ⊢ q ’ are equivalent.
For example, there are four different ways to express EFQ:

A ⊢ ∼A → B
∼A ⊢ A → B
A → (∼A → B)
∼A → (A → B)

The first follows from EFQ as stated by one application of modus ponens. If we assume ‘A’ and
EFQ , we can conclude ‘∼A → B ’. Likewise EFQ as stated follows from this by one application of
the deduction theorem. The particular instance of the deduction theorem is ‘If A, ∼A ⊢ B , then
A ⊢ ∼A → B ’. All the others are likewise equivalent to EFQ by modus ponens and the deduction
theorem. Make sure you understand how.
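As a semantic sanity check, separate from their axiomatic proofs, the two theorem forms of EFQ are tautologies. A quick Python truth table (helper name mine):

```python
# Both conditional forms of EFQ are true under every assignment.
from itertools import product

def imp(p, q):
    return (not p) or q

for a, b in product([True, False], repeat=2):
    assert imp(a, imp(not a, b))        # A -> (~A -> B)
    assert imp(not a, imp(a, b))        # ~A -> (A -> B)
```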
Notice what happens to the axioms when we express them as rules. Here is one way to state TF2 and TF3:

TF2 A → (B → C), A → B ⊢ A → C
TF3 ∼B → ∼A ⊢ A → B

There are other ways to write them. You may want to list them all.
Some proofs. Let’s start by proving something we’ve already proved—HS—to compare the
proof with the deduction theorem against the proof without it.
TF5 (HS) A → B, B → C ⊢ A → C

1 A→B
2 B→C
3 A
4 B 1,3 MP
5 C 2,4 MP
6 A→C 3–5 CP

TF9 (CM*) ∼A → A ⊢ A

1 ∼A → A
2 ∼A
3 A 1,2 MP
4 ∼(∼A → A) 2,3 EFQ
5 ∼A → ∼(∼A → A) 2–4 CP
6 (∼A → A) → A 5 TF3
7 A 1,6 MP
(CM* is related to consequentia mirabilis, which will be proved in TF15. It is also, by the definition
of v, equivalent to A v A ⊢ A, which we’ve called the rule of tautology.)
TF10 (DNE) ∼∼A ⊢ A

1 ∼∼A
2 ∼A
3 A 1,2 EFQ
4 ∼A → A 2–3 CP
5 A 4 CM*

TF11 (DNI) A ⊢ ∼∼A

1 A
2 ∼∼∼A → ∼A DNE
3 A → ∼∼A 2 TF3
4 ∼∼A 1,3 MP
‘DNE’ stands for ‘double negation elimination’ and ‘DNI’ for ‘double negation introduction’.
These two together make the rule of double negation (DN). From now on we can add or remove
pairs of negations to any whole line, citing DN.
From here on the proofs will be left as exercises.
TF12 (MT) A → B, ∼B ⊢ ∼A
TF13 A → ∼B, B ⊢ ∼A
TF14 ∼A → B, ∼B ⊢ A
Notice that, by the deduction theorem, MT is equivalent to A → B ⊢ ∼B → ∼A. This, with
TF3, gives us transposition. TF13 and TF14 tell us that A → ∼B ⊢ B → ∼A and ∼A → B ⊢ ∼B → A.
These are obviously related laws, so we will allow any of them to be cited as Trans.
TF15* (CM) A → ∼A ⊢ ∼A
TF16 (RAA) A → B, A → ∼B ⊢ ∼A
‘RAA’ stands for reductio ad absurdum. This is related to, but not identical with, the rule of
indirect proof (IP). That rule must be stated like the rule of conditional proof:
Metatheorem 2 (IP) If Γ, A ⊢ B and Γ, A ⊢ ∼B , then Γ ⊢ ∼A.
Given the deduction theorem and RAA, this is easy to prove, and the other part of indirect
proof, If Γ, ∼A ⊢B and Γ, ∼A ⊢∼B , then Γ ⊢A, follows easily too, given double negation.
So far all the theorems and rules have involved only our undefined terms. To prove theorems
involving the other truth functors, we need to use the definitions. (The name of the next theorem
stands for ‘Law of Excluded Middle’.)
TF17 (LEM) A v ∼A

1 ∼A → ∼A TF4
2 A v ∼A 1 Def v
Perhaps the best way to approach proofs involving other truth functors is to begin at the end.
We begin by translating the thing we’re trying to prove into the basic symbols, and then treating
the proof as a proof using only those basic symbols. For example, to do TF17 we first translated ‘A
v∼A’ into ‘∼A →∼A’, and then proved that.
TF18 (l-Add) A⊢BvA
TF19 (r-Add) A⊢AvB
TF20* A v B, ∼A ⊢ B
TF21 A v B, ∼B ⊢ A
TF18 and TF19 together give us the rule of addition (Add). (TF18 is addition to the left and
TF19 is addition to the right.) TF20 and TF21 together give us the rule of disjunctive syllogism

(DS). These are the basic rules for dealing with disjunction; they allow us to do proofs involving
disjunction without translating back into the basic symbols.
TF22 A&B ⊢A
TF23 A&B ⊢B
TF24 (Conj) A, B ⊢ A & B
TF25* A↔B⊢A→B
TF26 A↔B⊢B→A
TF27 A → B, B → A ⊢ A ↔ B
At this point, we’ve proved all the “basic rules” from truth-functional logic, and a few of the
“shortcut rules.” That means that any argument we could prove with those rules, we can prove with
this new system. We could look at that in two different ways. One way to look at that is that from
here on out, any proof in this new axiom system is really just a proof in the system you learned in
your first-year logic course. We may write it a little differently than you did there, but it’s really the
same thing. The other way to look at it is that we’ve given axiomatic justification for the logic you
learned in your first-year course. When we turn from proving arguments in the system to proving
arguments about the system—such as, for example, that it is sound and complete—any results we
can prove about the axiom system will also hold for the other system. Similarly, any proof that you
did in your first-year course, and any proof that we do here, can be done citing only the axioms and
modus ponens. It might be instructive to try it. Another way to generate theorems is to prove meta-
theorems, like the deduction theorem. This allows us to show that whole classes of statements are
theorems, without proving each one individually. There are two useful meta-theorems that follow
easily from what we’ve done.
Metatheorem 3 If Γ, A ⊢B and Γ, B ⊢A, then Γ ⊢A ↔B .
Proof: Suppose Γ, A ⊢ B and Γ, B ⊢ A. Then, by the deduction theorem, Γ ⊢ A → B and Γ ⊢ B → A.
From these two it follows by BE that Γ ⊢ A ↔ B .
This meta-theorem allows us to generate theorems like ‘A ↔∼∼A’ (by DNI and DNE). It
also allows us to approach any biconditional theorem as if it were two separate derived rules. So
whenever we have a theorem A ↔B , we can prove it in two parts: A ⊢B and B ⊢A. Metatheorem
3 tells us that given both of these, the biconditional is a theorem.
Metatheorem 4 If Γ ⊢A ↔B and Γ ⊢A, then Γ ⊢B .
Proof: Suppose Γ ⊢ A ↔ B and Γ ⊢ A. Thus we begin a proof with ‘A ↔ B ’ and ‘A’ as the first
two lines. We can then attach ‘A → B ’ by BE, and then ‘B ’ by MP. Because there is a proof of ‘B ’,
Γ ⊢ B . These two meta-theorems can be used with the following powerful meta-theorem, the theorem
of replacement, to allow us to substitute logically equivalent statements within a line. For example,
if we have ‘∼∼A →B ’ as a line in a proof, we can write ‘A →B ’ as the next line.
Metatheorem 5 (Replacement) If A ↔ B and A occurs in C , then C ↔ D, where D differs from C
only in that zero or more occurrences of A in C have been replaced by B .
The proof of this theorem is by induction.
To illustrate these three meta-theorems in action, consider this proof of one half of De Morgan’s law:
TF29 ∼(A v B) ⊢ ∼A & ∼B

1 ∼(A v B)
2 ∼(∼A → B) 1 Def v
3 ∼(∼A → ∼∼B) 2 DN
4 ∼A & ∼B 3 Def &
This proof is largely just replacement of definitional equivalents. On line 3, we cited ‘DN’ to
justify adding two negation symbols in front of the consequent of a conditional within a negation.
But the rule of DN as proved in TF10 and TF11, allows only adding or removing two negations in
front of the whole line. But by Metatheorem 3, A ↔∼∼A, and so by the Theorem of Replacement
we can substitute ‘∼∼A’ for ‘A’ whenever it occurs in a line. When we do that, we could cite
Replacement, but we could also cite the theorem that demonstrates that A ↔B , which in this case
is DN. Notice that the Theorem of Replacement allows multiple substitutions in a single line. We
could add double negations to two parts of a line in a single step.
TF29 is an example of a class of theorems that are largely definitional substitutions. Because the
proofs often involve double negation or transposition, they are much easier—almost trivial—once
we have the Theorem of Replacement.
TF30* ∼(A v B) ↔ ∼A & ∼B
TF31 ∼A v ∼B ↔ ∼(A & B)
(These two make up De Morgan’s law (DeM).)
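Both halves of De Morgan’s law can also be confirmed semantically with a four-row truth table. A Python check of my own, independent of the axiomatic proofs:

```python
# Truth-table check of TF30 and TF31.
from itertools import product

for a, b in product([True, False], repeat=2):
    assert (not (a or b)) == ((not a) and (not b))   # TF30
    assert ((not a) or (not b)) == (not (a and b))   # TF31
```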
TF32 (CN) ∼(A → B) ↔ (A & ∼B)
TF33 (A → B) ↔ (∼A v B)
TF34 (A → B) ↔ (B v ∼A)
(TF33 and TF34 together make conditional exchange (CE).)
TF35* (A v B) ↔ ∼(∼A & ∼B)
TF36 (A & B) ↔ ∼(∼A v ∼B)
Then follow several more rules or interesting tautologies:
TF37 (Exp) [(A & B) → C] ↔ [A → (B → C)]
TF38 (v-Comm) (A v B) ↔ (B v A)
TF39 (&-Comm) (A & B) ↔ (B & A)
TF40 (v-Assoc) (A v (B v C)) ↔ ((A v B) v C)
TF41* (&-Assoc) (A & (B & C)) ↔ ((A & B) & C)
TF42 (CD) A v B, A → C, B → C ⊢ C v C
TF43 (Dil) A v B, A → C, B → C ⊢ C
TF44 (v-Dist) (A v (B & C)) ↔ (A v B) & (A v C)
TF45 (&-Dist) (A & (B v C)) ↔ (A & B) v (A & C)
TF46 (&-Dist) ((A v B) & (C v D)) ↔ ((A & C) v (A & D) v (B & C) v (B & D))
TF47* A&B ⊢A↔B
TF48 ∼A & ∼B ⊢ A ↔ B
TF49 (Peirce’s Law) ((A → B) → A) → A
TF50 (A → B) v (B → A)
TF51 A v (A → B)

TF52 A → (B → C), D → B ⊢ A → (D → C)
TF53 A & B, A → C ⊢ C
TF54 A & B, B → C ⊢ C
TF55 (HS2) A & B, A → C ⊢ B & C
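Several of these, notably Peirce’s Law (TF49) and the two disjunctions TF50 and TF51, are striking tautologies. A quick truth-table check in Python (helper name mine) confirms them semantically:

```python
from itertools import product

def imp(p, q):
    return (not p) or q

for a, b in product([True, False], repeat=2):
    assert imp(imp(imp(a, b), a), a)    # TF49, Peirce's Law
    assert imp(a, b) or imp(b, a)       # TF50
    assert a or imp(a, b)               # TF51
```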

3.4 An Axiom system for FOL

We keep the axioms as before. But we somehow need to extend the system to cover the new symbols
and new kinds of sentences.
UI ∀xp ⊢ p(a/x)
UG p ⊢ ∀xp(x/a), if a is not in any assumption
EG p ⊢ ∃xp(x/a)
EI If p, . . . ⊢ q , then ∃xp(x/a), . . . ⊢ q , if a is not in q or any assumption
The lower-case letters p and q , as before, stand for any sentence. The x stands for any bound
variable. The notation ‘p(a/x)’ means that we take the sentence p and replace every x with some
letter a. For example, these are all instances of UI:

∀x(Ax → Bx) ⊢ (Aa → Ba)

∀x∀y(Ax & By) ⊢ ∀y(Aa & By)
∀y∃xRxy ⊢ ∃xRxb

In the other three rules, ‘(x/a)’ means that every instance of some letter a is replaced by the
variable x. The rules UG and EI have extra restrictions.
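The instantiation notation can be illustrated with a small Python sketch. This is a deliberately simplified string substitution of my own; it assumes the instantiated variable is not reused by an inner quantifier:

```python
import re

def inst(p, x, a):
    """p(a/x): drop the leading quantifier on x and replace x by a."""
    body = re.sub(r'^[∀∃]' + x, '', p)
    return body.replace(x, a)

# The first and third UI instances listed above:
assert inst('∀x(Ax → Bx)', 'x', 'a') == '(Aa → Ba)'
assert inst('∀y∃xRxy', 'y', 'b') == '∃xRxb'
```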
We can prove some theorems and derived rules of FOL. The first is a good example of UI and UG:
FOL1 (Barbara) ∀x(Ax → Bx), ∀x(Bx → Cx) ⊢ ∀x(Ax → Cx)

1 ∀x(Ax → Bx)
2 ∀x(Bx → Cx)
3 Aa → Ba 1 UI
4 Ba → Ca 2 UI
5 Aa → Ca 3,4 TF (HS)
6 ∀x(Ax → Cx) 5 UG

Line 6 uses the rule UG. The restriction on UG does not allow the letter being generalized
on (here a) to be free in any assumption (here lines 1 and 2). It isn’t, so the restriction
is met.
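Barbara can also be checked semantically: over a finite universe, every interpretation of A, B, and C that makes both premises true makes the conclusion true. A brute-force Python check of my own over a three-element domain:

```python
from itertools import product

U = range(3)
for bits in product([False, True], repeat=3 * len(U)):
    A, B, C = bits[0:3], bits[3:6], bits[6:9]
    premise1 = all((not A[x]) or B[x] for x in U)   # ∀x(Ax → Bx)
    premise2 = all((not B[x]) or C[x] for x in U)   # ∀x(Bx → Cx)
    if premise1 and premise2:
        assert all((not A[x]) or C[x] for x in U)   # ∀x(Ax → Cx)
```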
The next adds EI and EG:
FOL2 (Darii) ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)

1 ∃x(Ax & Bx)

2 ∀x(Bx → Cx)
3 Aa & Ba (1 ei a)
4 Ba → Ca 2 UI
5 Aa & Ca 3,4 TF (HS2 + Comm)
6 ∃x(Ax & Cx) 5 EG / 3 EI

The restriction on EI requires that the letter (here a) not be free in any assumptions (lines 1 and
2) or in the result (q in the rule description, here line 6). We cite the rule in parentheses on the line of
the assumption, with the existentially quantified line and the letter introduced. Then on the result
line (the q line), we cite, after whatever other rules allowed us to get that line, the assumption line.
Every syllogism can be proved easily using Barbara and Darii along with truth-functional
equivalences. Here’s a list of the valid syllogisms, with traditional names:
Figure 1
Barbara ∀x(Ax → Bx), ∀x(Bx → Cx) ⊢ ∀x(Ax → Cx)
Celarent ∀x(Ax → Bx), ∀x(Bx → ∼Cx) ⊢ ∀x(Ax → ∼Cx)
Darii ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)
Ferio ∃x(Ax & Bx), ∀x(Bx → ∼Cx) ⊢ ∃x(Ax & ∼Cx)
Figure 2
Cesare ∀x(Ax → Bx), ∀x(Cx → ∼Bx) ⊢ ∀x(Ax → ∼Cx)
Camestres ∀x(Ax → ∼Bx), ∀x(Cx → Bx) ⊢ ∀x(Ax → ∼Cx)
Festino ∃x(Ax & Bx), ∀x(Cx → ∼Bx) ⊢ ∃x(Ax & ∼Cx)
Baroco ∃x(Ax & ∼Bx), ∀x(Cx → Bx) ⊢ ∃x(Ax & ∼Cx)
Figure 3
Datisi ∃x(Bx & Ax), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)
Disamis ∀x(Bx → Ax), ∃x(Bx & Cx) ⊢ ∃x(Ax & Cx)
Ferison ∃x(Bx & Ax), ∀x(Bx → ∼Cx) ⊢ ∃x(Ax & ∼Cx)
Bocardo ∀x(Bx → Ax), ∃x(Bx & ∼Cx), ⊢ ∃x(Ax & ∼Cx)
Figure 4
Celantes ∀x(Ax → Bx), ∀x(Bx → ∼Cx) ⊢ ∀x(Cx → ∼Ax)
Dabitis ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Cx & Ax)
Friseson ∀x(Ax → ∼Bx), ∃x(Bx & Cx) ⊢ ∃x(Cx & ∼Ax)
For example, here is a proof of Cesare:
1 ∀x(Ax → Bx)
2 ∀x(Cx → ∼Bx)
3 ∀x(Bx → ∼Cx) 2 TF (Trans)
4 ∀x(Ax → ∼Cx) 1,3 Barbara
FOL3 ∀x(P → F x) ⊢ P → ∀xF x
Here, as elsewhere in this section, P stands for any TF sentence, or in general any sentence that
does not contain the quantified variable.
FOL4 ∀x(F x → P ) → (∀xF x → P )

1 ∀x(F x → P ) (cp)
2 F a → P 1 UI
3 ∀xF x (cp)
4 F a 3 UI
5 P 2,4 MP
6 ∀xF x → P 3-5 CP
7 ∀x(F x → P ) → (∀xF x → P ) 1-6 CP
FOL5 (Distribution) ∀x(F x → Gx) ⊢ ∀xF x → ∀xGx
1 ∀x(F x → Gx)
2 ∀xF x (cp)
3 F a → Ga 1 UI
4 Fa 2 UI
5 Ga 3,4 MP
6 ∀xGx 5 UG
7 ∀xF x → ∀xGx 2-6 CP
FOL6 ∀xP ↔ P
FOL7 ∃xP ↔ P
In these two theorems, the quantifiers are vacuous: they don’t bind any variables.
FOL8 ∀xF x ⊢ ∃xF x
FOL9 ∀x(F x → ∃yF y)
FOL10 ∃x∀yRxy → ∀y∃xRxy
FOL11 ∀x(F x → P ) ⊢ ∃xF x → P
FOL12 ∀x∼F x ⊢ ∼∃xF x
FOL13 ∼∀xF x ⊢ ∃x∼F x
FOL14 ∼∀x∼F x ↔ ∃xF x
FOL15 ∀x∼F x ↔ ∼∃xF x
FOL16 ∼∀xF x ↔ ∃x∼F x
FOL17 ∀xF x ↔ ∼∃x∼F x
These last four logical equivalences are often useful; we will refer to them collectively as the rule
of quantifier negation (QN).
Prenex Normal Form
Every sentence of FOL is logically equivalent to a sentence in which all the quantifiers are at the
left, followed by a quantifier-free formula. For example, the following two sentences are equivalent:

∃xAx → ∃yBy
∀x∃y(Ax → By)

The first sentence has the quantifiers applied to the shortest segment of the sentence necessary; the
second sentence has the quantifiers applied to the whole sentence. Another way to say this is that, in
the first sentence, the quantifiers lie within the scope of a truth-functional connective (the conditional
has the broadest scope); the second sentence has no quantifier within the scope of a truth-functional

connective. This will serve as the definition of prenex normal form (PNF): A sentence in prenex
normal form has no quantifiers falling within the scope of a truth-functional connective. There are
two major steps in converting a sentence into PNF. They correspond to the two undefined truth-
functional connectives: ∼ and →. The first involves moving the negations to fall within the scope
of the quantifiers. We do this by applying the rule of quantifier negation. For example, if we have
the following as part of a sentence

...∼∃x(P x → ...

we need to move the existential quantifier to have broader scope than the negation. By QN, we obtain

...∀x∼(P x → ...

The next step in converting a sentence to PNF consists in moving the conditionals to fall within
the scope of the quantifiers. This is done by applying the next four equivalences:
FOL18 ∀x(P → F x) ↔ (P → ∀xF x)
FOL19 ∃x(P → F x) ↔ (P → ∃xF x)
FOL20 ∀x(F x → P ) ↔ (∃xF x → P )
FOL21 ∃x(F x → P ) ↔ (∀xF x → P )
By repeatedly using the right-to-left directions of these biconditionals and the suitable instances
of QN, we can change every sentence into a sentence in PNF. (Of course, we could also go the
other direction, driving the quantifiers in as far as they will go.) If the sentence has defined truth-
functional connectives, it can be converted to PNF by first replacing the defined connectives by their
definitions, then proceeding as before.
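These four equivalences can also be checked semantically over a finite domain, modeling P (a sentence not containing x) as a fixed truth value. A Python spot-check of FOL18 and FOL20 (my own setup):

```python
from itertools import product

U = range(3)
for P in (False, True):
    for bits in product([False, True], repeat=len(U)):
        F = dict(zip(U, bits))
        # FOL18: ∀x(P → Fx) ↔ (P → ∀xFx)
        assert all((not P) or F[x] for x in U) == ((not P) or all(F[x] for x in U))
        # FOL20: ∀x(Fx → P) ↔ (∃xFx → P)
        assert all((not F[x]) or P for x in U) == ((not any(F[x] for x in U)) or P)
```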
Of course, we can also prove analogous rules for the defined operators directly.
FOL22 ∀xF x ⊢ ∀x(F x v Gx)
FOL23 ∃xF x ⊢ ∃x(F x v Gx)
FOL24 ∀x(F x & Gx) ⊢ ∀xF x
FOL25 ∃x(F x & Gx) ⊢ ∃xF x
FOL26 ∀x(F x & Gx) ↔ (∀xF x & ∀xGx)
FOL27 ∃x(F x v Gx) ↔ (∃xF x v ∃xGx)
FOL28 ∃x(F x & Gx) ⊢ ∃xF x & ∃xGx
FOL29 ∀xF x v ∀xGx ⊢ ∀x(F x v Gx)
(Notice with some of these the equivalence goes in only one direction.)
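For FOL29, for instance, the converse direction genuinely fails, and a two-element model shows why (a counterexample of my own construction): let F hold of just 0 and G of just 1.

```python
# U = {0, 1}, F = {0}, G = {1}: ∀x(Fx v Gx) holds, but ∀xFx v ∀xGx fails.
U = [0, 1]
F = {0: True, 1: False}
G = {0: False, 1: True}
assert all(F[x] or G[x] for x in U)
assert not (all(F[x] for x in U) or all(G[x] for x in U))
```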
With a complex sentence that has many quantifiers, there’s no rule about which quantifier to
bring out first. And it may be that you get a different sentence if you bring them out in one order
rather than another. But we can prove that any way of bringing them out is equivalent to any other.
For example, the following three sentences are logically equivalent:
a: ∃xFx→∃yGy b: ∀x∃y(Fx→Gy) c: ∃y∀x(Fx→Gy)
Prove their equivalence. That is, prove

FOL30 (∃xF x → ∃yGy) ↔ ∀x∃y(F x → Gy) ↔ ∃y∀x(F x → Gy)

Other logical truths
FOL31 ∀y(F y → ∃xF x)
FOL32 ∃y(F y → ∀xF x)
FOL33 ∃y(∃xF x → F y)
FOL34 ∀x∃y(F x → Gy) → ∃x(F x → ∃yGy)
FOL35 (∃xF x → ∃xGx) → ∃x(F x → Gx)
FOL36 ∀x∃y(F x & Gy) ↔ ∃y∀x(F x & Gy)
FOL37 ∀x∃y(F x v Gy) → (∀xF x v ∃yGy)

3.5 Identity
Identity is, on the one hand, just another two-place predicate. But, on the other hand, it is certainly
a logical relation, so it will take special consideration. We could symbolize it ‘Ixy’, but we will stick
with the more familiar x = y , to mark it as a special logical relation.
The basic principle of identity was stated by Bishop Butler: “Every thing is what it is, and not
another thing.” Thus it’s never really correct to talk about two things being identical; everything is
identical only to itself. When we say something like ‘x = y’,
we are saying that the variables x and y pick out the same object. So our first axiom of identity we’ll
call ‘Butler’s Law’:

BL ∀x(x = x)

The other axiom is usually named after Leibniz, and says this:

LL ∀x∀y(x = y → (P x → P y))

This, of course, is an axiom schema, and holds for any P . This law is also sometimes known as The
Principle of Indiscernibility of Identicals. It says that if x = y, then whatever is true of x is true of y.
The trick to doing proofs with identity is finding the right substitution instance for P. Sometimes
it’s straightforward, but sometimes the substitution instance is fairly complex. The following are all
allowable as instances of LL:
∀x∀y(x = y → (P x → P y)) (P _ : P _)
∀x∀y(x = y → (Rxx → Rxy)) (P _ : Rx_)
∀x∀y[x = y → [(P x & Rxy) → (P x & Ryy)]] (P _ : P x & R_y)

FOL38 ∀x∀y((P x & x = y) → P y)

FOL39 ((P x & ∼P y) → x ̸= y)
FOL40 (x = y → y = x)
FOL41 ((x = y & z = y) → x = z)

FOL42 ((x = y & y = z) → x = z)

FOL43 ((y = x & z = y) → x = z)
FOL44 ((x = y & z = y) → z = x)
FOL45 ((z = x & y = z) → x = y)
Axiom BL tells us that identity is totally reflexive. FOL40 tells us it’s symmetric, and FOL42
tells us it’s transitive. Thus identity is an equivalence relation. This allows a helpful shortcut rule:

Ident Given any sequence of appropriately linked identities, we can take the extremes as identical.

If we have a chain of identities (e.g., a = b & b = c & c = d & d = e & . . . & m = n), we can
take the first and last and set them equal (a = n).
Chapter 4

Modal Logic

4.1 What is modal logic?

What are modals?
By now you are an expert at dealing with truth-functional logic. Truth-functional logic takes certain
symbols as constant, symbols like ‘v’ and ‘→’. These constants work like functions from truth-
values to truth-values. In other words, ‘A v B’ is true whenever either ‘A’ or ‘B’ is true. The only
thing truth-functional logic is concerned with is the truth-value of ‘A’ and ‘B’. That’s what allows us
to construct truth tables. Truth-functional logic is a powerful model for some of our language, but
not all of our language is truth-functional. Consider the sentences ‘Superman can fly’ and ‘Clark
Kent can fly’. Let’s assume that both sentences are true. Now consider the sentences ‘Lois knows
that Superman can fly’ and ‘Lois knows that Clark Kent can fly’. It seems plausible to say that the
former sentence is true and the latter sentence is false. The same phrase, ‘Lois knows that’, can
attach to two different sentences, both of them true, and the two resulting sentences have different
truth-values. That means that ‘Lois knows that’ is not truth-functional.
Another example. Consider the sentences ‘Stephen Douglas was the sixteenth President of the
United States’ and ‘9 is a prime number’. These are both false. Now consider what happens when
we prefix the words ‘had things gone differently, it might have been the case that’: ‘Had things
gone differently, it might have been the case that Stephen Douglas was the sixteenth President of
the United States’ and ‘Had things gone differently, it might have been the case that 9 is a prime
number’. The former is probably true, but the latter is certainly false.
‘Lois knows that’ and ‘had things gone differently, it might have been the case that’ are examples
of modals. Modals have that name because they don’t merely reflect truth, but modes of truth—
whether something has to be true, or used to be true, or is known to be true. They capture what
modern grammars call ‘adverbials’.
One important class of modals has to do with knowledge and belief; these are sometimes called
epistemic or doxastic modals (from the Greek words for ‘knowledge’ and ‘belief’, respectively). These
are phrases like ‘it is known that’, ‘it is believed that’, ‘Lois knows that’, ‘Lois believes that’. Another


important class of modals has to do with obligation and permission; these are sometimes called
deontic modals: ‘it is obligatory to’ (or ‘must’), ‘it is permissible to’ (or ‘may’). Still another important class
has to do with time. With truth-functional logic we have ignored time completely. The statements
of logic are officially tenseless. It’s easy to see that verb tenses are not truth-functional. ‘Superman
can fly’ is true now, but not before he retreated to the Fortress of Solitude to learn how; ‘9 is a com-
posite number’ is true now, and there never has been a time when it wasn’t true. So when we add
an operator like ‘ten years ago’ to a sentence, the resulting sentence is only sometimes true. These
temporal modals can be expressed by changing the tense of the verb, and also by adding prefixes
like ‘it is always the case that’, ‘it will be the case that’, ‘yesterday it was the case that’, and so on.
All of these modals are philosophically important. Epistemic modals are important in episte-
mology, the study of knowledge. Deontic modals are important in ethics. Temporal modals are
important in the branch of metaphysics that studies time. (It should be called ‘chronology’, but that
word’s already taken.) But the most important modals in philosophy are sometimes called ‘alethic’
(from the Greek word for ‘truth’) or ‘metaphysical’ or ‘counterfactual’. These modals are usually
expressed ‘necessarily’ and ‘possibly’ (or ‘it is necessary that’ and ‘it is possible that’), and these words
are given special meanings. The meaning of the second is roughly ‘had things gone differently, it
might have been the case that’ and the meaning of the first is ‘even had things gone differently, it
would still have been the case that’, or ‘it has to be that’. Another modal that is sometimes thrown
in is ‘contingent’, which means ‘true and not necessary’.
It may be helpful to think of the difference as what God could have done when he made the
world. God could have made grass blue, so it’s possible that grass is blue, but God couldn’t have
made 9 prime, so it’s not possible that 9 is prime. It’s important here not to get confused. Of course,
‘9’ might have referred to the number 7, so the sentence ‘9 is prime’ might have meant something
different—something true—but that’s irrelevant to the truth of the sentence ‘it is possible that 9 is
prime’. In this way the modal operators are the same as the truth-functional operators: attaching
an operator to a sentence does not give you the right to change the meaning of the words in the
sentence.
How do modals make a logic?

If modals are not truth functional, how can they make a logic? You are already familiar with a logic
that’s not truth functional. Modern predicate logic is not truth functional—you can’t make truth
tables for arguments in predicate logic. But predicate logic fails to be truth functional for a different
reason than modal logic. In predicate logic, the quantifiers attach to open formulas, which have no
truth value. But modal operators do attach to statements.
The way the modals work is similar to the way Aristotelian logic works. On Aristotelian logic,
quantifiers are attached to a sentence like ‘tigers are tame’ to make ‘all tigers are tame’, ‘no tigers
are tame’, ‘some tigers are tame’, and ‘not all tigers are tame’. The two quantifiers, ‘all’ and ‘some’,
attach to a statement to make a new statement. These quantifiers have certain relations to each
other, by virtue of which they are duals of each other. The most fundamental of these relations is
that the quantifiers are interdefined. To use modern symbols:

∀xPx ↔ ∼∃x∼Px    ∃xPx ↔ ∼∀x∼Px
The modal operators work just like this. There are duals, usually symbolized ‘□’ and ‘⋄’, that
are interdefined:

□p ↔ ∼⋄∼p    ⋄p ↔ ∼□∼p
In deontic logic, ‘□’ is interpreted ‘it is obligatory that’ and ‘⋄’ is interpreted ‘it is permissible
that’. It should be easy to see that, under this interpretation, the operators are duals: if it is obligatory
for me to brush my teeth, it is not permissible for me not to brush my teeth. In epistemic logic, ‘□’
is interpreted ‘it is known that’ and ‘⋄’ is interpreted ‘it is believed that’. Sometimes they will have a
subscript: ‘□Lois’ means ‘Lois knows that’. (Also, sometimes epistemologists use ‘K’ instead of ‘□’.)
Again, it should be easy to see that the operators are duals: if Lois knows something to be true, she
does not believe it to be untrue (in some sense of ‘believe’). In temporal logic, ‘□’ is interpreted ‘it
is always the case that’, and ‘⋄’ is interpreted ‘it is sometimes the case that’. (There are also other
temporal logics, which define the operators in such ways as ‘it will always be the case that’ and ‘it will
sometimes be the case that’; or ‘it is and always will be the case that’ and ‘it is or will sometimes be
the case that’.) In alethic modal logic, ‘□’ is interpreted ‘necessarily’, and ‘⋄’ is interpreted ‘possibly’.

Symbolizing modal logic

Propositional modal logic, which is just propositional logic with modals attached, is much easier to
deal with than first-order logic. We see this first with translating sentences from English into modal
logic.
Example 4.1.A
Translate into modal logic the following sentence: “It is not possible for John to go to the store.”
Take ‘J’ to be ‘John goes to the store’. Then it could be symbolized either of the following ways:

∼⋄J    □∼J
Example 4.1.B
Translate the following sentence: “Anna does sometimes counsel take—and sometimes tea.”
(This sentence is adapted from a line in Pope’s Rape of the Lock.) Here the modal is temporal: it
expresses what happens not now and not always, but sometimes. Take ‘C’ to be ‘Anna takes counsel’
and ‘T’ to be ‘Anna takes tea’. Then the sentence is symbolized like this:
⋄C &⋄T
Why isn’t it ‘⋄(C &T)’? That would say that there are times she takes both counsel and tea, i.e.,
that she takes them both at the same time. But the joke in the original line is that Anna sometimes
would rather take tea than listen to good advice, that she doesn’t do them at the same time. Even if
she does, that’s not what the original sentence said.
As we have seen, there are several words in English that express modes that are translated by ‘□’:
must, necessarily, always, has to. There are several words that are translated by ‘⋄’: may, possibly,
sometimes, can. These different concepts work very differently—something may be possible without
being permissible, or necessary without being known. Because of that, modal logic is not really
one logic, but many. We’ll get to that soon; for now, let’s ignore those differences and practice
translating.


1 If you do your homework, you may play frisbee. (H: You do your homework. F: You play
frisbee.)
2 Maybe I’ll eat that last cookie, but then again, maybe I won’t. (C: I will eat that last cookie.)

3 You always beat me at basketball, but I sometimes beat you at chess. (B: You beat me at
basketball. C: I beat you at chess.)

4 It’s possible, pig—I might be bluffing. (Princess Bride) (B: I am bluffing.)

5 It ain’t necessarily so. (S: It is so.)

6 It’s possible that, if this is milk, then it’s necessarily milk, but it’s not possible that, if it’s not
milk, then it’s necessarily not milk. (M: This is milk.)

7 If it is so, it may be, and it if must be so, it is. (S: It is so.)

8 If I must eat the cookie, then I may eat the cookie, but if I mayn’t eat the cookie, then I don’t
eat the cookie. (C: I eat the cookie.)

9 If grass is always green, then grass is green sometime or other. (G: Grass is green.)

10 Necessarily, if God possibly exists, then God necessarily exists. (G: God exists.)

11 It is possible that there is a green-eyed monster and not a red-eyed monster, but it is not possible
that, if there is possibly a red-eyed monster then there is possibly a green-eyed monster. (G:
There is a green-eyed monster. R: There is a red-eyed monster.)

4.2 Models

We’re going to talk about the semantics of modal logic before we talk about the syntax. That’s the
usual approach now, because it’s more intuitive, but it’s historically backwards. Modern modal logic
was invented in 1913 by C.I. Lewis, but it wasn’t until the 60s that a teenager named Saul Kripke
developed the semantics. This semantics depends on models.
For our purposes, a model needs three things. First, it needs a set of points. These can model
states in a game, or times, or alternate scenarios. We call these ‘possible worlds’, or just ‘worlds’.
Second, it needs a relation that specifies which of these worlds is accessible from which. And third,

it needs a valuation, which says which simple propositions are true at which world. (The first two
things together make a frame; when a frame has a valuation, it becomes a model.)
A truth table is a kind of model for propositional logic. The lines on the truth table are the
possible worlds, each with its own valuation. The big difference between truth tables and models
for modal logic is that truth tables don’t have the second feature, the accessibility relation. That was
the feature that Kripke added, and that makes it possible to model modal logic. (Say this three times
fast: “A modal model models modal logic. A modal model models modal logic. A modal model
models modal logic.” Now, don’t get the words confused.)
Example 4.2.A.
Look at this model.
1 → 2:P
1 → 3:Q


The numbers are the worlds, and the arrows are the accessibility relations. The ‘P’ and ‘Q’ at
worlds 2 and 3 indicate that P is true on 2 and Q is true on 3. (By convention we mark only the
propositions that are true at a world, so we can assume P is false on 1 and 3, and so on.) This model
gives us the information we need to determine the truth not only of the simple propositions, like
P and Q , but also the modal propositions, like ⋄P and □Q. ‘□p’ means ‘on every world accessible
from the given world, p is true’. ‘⋄p’ means ‘on at least one world accessible from the given world, p
is true’. What formulas are true at 1? Well, since P is true at 2 and 2 is accessible from 1, ⋄P is true
at 1. Similarly, ⋄Q is true. Also, since PvQ is true at 2 (since it is always true if P is true), ⋄(PvQ) is
also true at 1. But then, PvQ is true at 3 also, so □(PvQ) is true at 1. On the other hand, □P is false,
and □Q is false, so (□Pv□Q) is false at 1.
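If you like programming, you can check this sort of reasoning mechanically. Here is a sketch in Python of the model from this example; the tuple encoding of formulas and the names `access`, `val`, and `holds` are my own choices, not official notation:

```python
# The model of Example 4.2.A: world 1 sees worlds 2 and 3;
# P is true at 2 and Q is true at 3.
access = {1: {2, 3}, 2: set(), 3: set()}   # accessibility relation
val = {1: set(), 2: {"P"}, 3: {"Q"}}       # atoms true at each world

def holds(formula, w):
    """Evaluate a formula, written as nested tuples, at world w."""
    if isinstance(formula, str):            # atomic proposition
        return formula in val[w]
    op, *args = formula
    if op == "not":
        return not holds(args[0], w)
    if op == "or":
        return holds(args[0], w) or holds(args[1], w)
    if op == "box":                         # true at every accessible world
        return all(holds(args[0], v) for v in access[w])
    if op == "dia":                         # true at some accessible world
        return any(holds(args[0], v) for v in access[w])
    raise ValueError(op)

print(holds(("dia", "P"), 1))                        # ⋄P at 1: True
print(holds(("box", ("or", "P", "Q")), 1))           # □(P∨Q) at 1: True
print(holds(("or", ("box", "P"), ("box", "Q")), 1))  # □P∨□Q at 1: False
```

The two quantifier-like clauses in `holds` are just the definitions of ‘□’ and ‘⋄’ given above: `all` over the accessible worlds for the box, `any` for the diamond.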
Example 4.2.B

1 → 2:P → 3:P,Q (with an arrow from 1 to 3 as well)

What’s true on world 1? Well, from world 1 we can get to worlds 2 and 3; both worlds are
accessible from world 1. And, P is true on both 2 and 3, so □P is true. But, since Q is false on world
2, □Q is not true on world 1. Because Q is true on 3, and 3 is accessible from both worlds 1 and 2,
⋄Q is true on worlds 1 and 2.
You can see that ‘⋄’ is a little like ‘∃’: just as ‘∃xPx’ means ‘for some x, Px’, ‘⋄p’ means ‘on some
world, p’. Similarly, ‘□’ is a little like ‘∀’: just as ‘∀xPx’ means ‘for all x, Px’, ‘□p’ means ‘on every
world, p’. You may remember that ‘∃x(x is a unicorn and has one horn)’ is true only if there is at
least one unicorn, but ‘∀x(if x is a unicorn then x has one horn)’ may be true even if there are no

unicorns. It’s the same with the modals. ‘⋄p’ means that p is true on at least one accessible world, so
it cannot be true if there are no accessible worlds. ‘□p’ means that p is true on all accessible worlds.
If there are no accessible worlds, then p is sure true on all the accessible worlds there are! In that
case, we say that ‘□p’ is true “vacuously”. So, on world 3, ‘⋄P’ and ‘⋄Q’ are both false, since there
are no accessible worlds where ‘P’ or ‘Q’ are true, but on world 3 ‘□P’ and ‘□Q’ are true vacuously.
Example 4.2.C
Is □P true on world 1? Yes, since every world accessible from world 1—in this case, there’s only
one such world: world 2—is a world where P is true. Is P&⋄P true on world 2? Again, yes, since P
is true on world 2 and ⋄P is true on world 2.

1:P ⇄ 2:P


For exercises 1–10, use the following model:

1:P 2:P,Q


State whether the given proposition is true on the given world.

1 1: P

2 1: ⋄P

3 1: □P

4 1: Pv□P

5 2: P

6 2: □P

7 2: Q→□P

8 3: □P

9 3: ⋄P

10 3: Q→□P

For exercises 11–22, use the following model:



11 1: P

12 1: ⋄P

13 1: □P

14 1: P&□P

15 1: □(PvQ)

16 2: ⋄P

17 2: □P

18 2: Q→□P

19 3: □Pv⋄P

20 3: Q→□P

21 4: PvQ

22 4: Q→□P

Example 4.2.D Treasure Island

This is an island. The “worlds” on this island are discrete parts of the island. At 3, 5, and 6
are pirates. At 9 there is a treasure. (That is, ‘P’ here means ‘There be pirates here’, and ‘T’ means
‘There be treasure here’.)

1 2 3:P

4 5:P 6:P

7 8 9:T

Notice that at world 2, □P is true; I can’t move anywhere without running into pirates. □P is
true also on world 3, and, vacuously, 9. At which worlds is ⋄P true? 2, 3, 4, and 5, because each of
these has some move that leads to pirates. At which worlds is ⋄□P true? Recall that this means that
there is some move that leads me to a place where □P is true. Because we already figured out that
□P is true on worlds 2, 3, and 9, ⋄□P will be true on every world from which those three worlds are
accessible: 1, 2, 6, and 8. It’s not true on 9 because there are no legal moves from 9.
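Treasure Island can be checked by machine, too. The sketch below assumes that a legal move is one step right or one step down on the 3×3 grid, which reproduces the accessibility facts worked out above; the function names are mine:

```python
# Worlds are numbered 1..9 in a 3x3 grid. Pirates at 3, 5, 6; treasure at 9.
pirates = {3, 5, 6}

def moves(w):
    """Worlds reachable in one move: one step right, or one step down."""
    out = set()
    if w % 3 != 0:        # not in the rightmost column (3, 6, 9)
        out.add(w + 1)
    if w <= 6:            # not in the bottom row (7, 8, 9)
        out.add(w + 3)
    return out

def box_P(w):             # □P: pirates at every world reachable in one move
    return all(v in pirates for v in moves(w))

def dia_box_P(w):         # ⋄□P: some move leads to a world where □P holds
    return any(box_P(v) for v in moves(w))

print(sorted(w for w in range(1, 10) if box_P(w)))      # [2, 3, 9]
print(sorted(w for w in range(1, 10) if dia_box_P(w)))  # [1, 2, 6, 8]
```

Changing `moves` changes what is necessary and possible everywhere on the island, which is exactly the job the accessibility relation does.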

At which worlds of Treasure Island are the following propositions true? (There may be more
than one.)

23 ⋄T

24 ⋄□T

25 ⋄P

26 □⋄P

27 P&⋄T

28 ∼P&□T

29 P&□T

30 ⋄□□T

For the following exercises, state whether the following propositions are true at world 1 of this
model:

1 → 2 → 3:P
1 → 4 → 5:Q

31 ⋄P

32 ⋄⋄P

33 ⋄□P

34 □⋄P

35 ⋄(□Pv□Q)

36 □(⋄Pv⋄Q)

37 (⋄□P&⋄□Q)→⋄□(P&Q)

38 (⋄□P&⋄□Q)→⋄□(PvQ)

Just as with other levels of logic, there are modal statements that are logical truths. A modal logical
truth will be true at every world of every model. We sometimes say that a statement is valid on a
given model if it is true on every world of that model; a modal logical truth is valid on every model.
Most statements, of course, are not valid. To show that a statement is not valid, we provide a
counterexample: a model and a world on that model where the statement is not true. By convention,
we usually take world 1 to be this world. Sometimes it’s easy to find a counterexample for a given
statement, but sometimes it takes some thought and some trial and error.
Example 4.2.E
Find a counterexample to □P→P.
Because this is a conditional, it will be false if the antecedent is true and the consequent is false.
Here is such a model:

1 → 2:P

The antecedent is true at world 1, since at every world accessible from world 1 (that is, only
world 2), P is true. But the consequent is false at world 1, so the conditional is false.
Example 4.2.F
Find a counterexample to P→⋄P.

Again, we need to make the antecedent true and the consequent false. To make the antecedent
true, we need P to be true at world 1. To make the consequent false, we need some world accessible
from world 1 where P is not true. Here is such a model:

1:P → 2

Example 4.2.G
Find a counterexample to □⋄P→⋄P.
Here to make the consequent false we simply need some world accessible from world 1 where P
is not true. To make the antecedent true we need to make every world accessible from world 1 such
that there’s some accessible world where P is true. Thus we need world 2, a world accessible from
world 1, on which P is false. Then we need a world accessible from world 2 on which P is true. One
way to do it is this:

1 → 2 → 3:P

But that isn’t the only way to do it. We can get by with just two worlds:

1:P ⇄ 2

As you can see, there is never just one correct counterexample.
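We can verify countermodels mechanically as well. This Python sketch, with my own tuple encoding of formulas and my own model layout, checks both countermodels to □⋄P→⋄P:

```python
def holds(f, w, access, val):
    """Evaluate formula f (a nested tuple) at world w of a given model."""
    if isinstance(f, str):                    # atomic proposition
        return f in val[w]
    op, *args = f
    if op == "imp":                           # material conditional
        return (not holds(args[0], w, access, val)) or holds(args[1], w, access, val)
    if op == "box":
        return all(holds(args[0], v, access, val) for v in access[w])
    if op == "dia":
        return any(holds(args[0], v, access, val) for v in access[w])
    raise ValueError(op)

claim = ("imp", ("box", ("dia", "P")), ("dia", "P"))

# Three-world chain 1 -> 2 -> 3, with P true only at 3.
m1 = ({1: {2}, 2: {3}, 3: set()}, {1: set(), 2: set(), 3: {"P"}})
# Two worlds with arrows both ways, P true only at 1.
m2 = ({1: {2}, 2: {1}}, {1: {"P"}, 2: set()})

for access, val in (m1, m2):
    print(holds(claim, 1, access, val))       # False both times
```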

Example 4.2.H
Find a counterexample to □P&P.
Because this is a conjunction, to falsify this statement we need to make at least one conjunct
false. Here we can do this with just one world: a world on which P is false:

1
Example 4.2.I
Find a counterexample to (⋄P&⋄Q)→⋄(P&Q).
Here we need two worlds accessible from world 1, one to make each of the conjuncts of the
antecedent true. If we made both conjuncts true with the same world, that world would also make
the consequent true.

1 → 2:P
1 → 3:Q

Find counterexamples to the following statements:

1 ⋄P

2 P→□P

3 ⋄□P→□P

4 ⋄⋄P→⋄P

5 □P→⋄P

6 (⋄Pv⋄Q)→⋄(P&Q)

7 □(PvQ)→(□Pv□Q)

8 (P→□Q)→□Q

9 (⋄P&□□P)→□(P→⋄P)

Properties of Relations
Every model, you recall, has an accessibility relation. Sometimes these relations can have interesting
properties. For example, a model’s relation is reflexive if every world is accessible from itself. This
model has a reflexive relation.

The relation is transitive if you can skip worlds; that is, if 3 is accessible from 2, and 2 is accessible
from 1, the relation is transitive if 3 is accessible from 1. Another way to say this is that every world
that is eventually accessible—accessible after some number of steps—is (immediately) accessible.

The relation is symmetric if every world is accessible from all worlds that are accessible from it.
Another way of saying this is that you can always get back to where you started. You can always get
home: whenever you can get from one world to another in some number of steps, you can also get
back in the same number of steps.

We can either say that the model’s relation is symmetric (or transitive or reflexive), or we could
just say that the model is symmetric (etc.). To say that a model is symmetric doesn’t mean that the
picture is symmetric; it means that the relation is symmetric.
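These properties are easy to state as code. Here is a sketch in Python, with a relation written as a dictionary from each world to the set of worlds it accesses (the layout and function names are mine):

```python
def reflexive(access):
    """Every world accesses itself."""
    return all(w in access[w] for w in access)

def symmetric(access):
    """Every arrow can be traversed backwards."""
    return all(w in access[v] for w in access for v in access[w])

def transitive(access):
    """Any two-step path is also a one-step path."""
    return all(u in access[w] for w in access for v in access[w] for u in access[v])

# Two worlds, each with a loop, connected by a two-way arrow:
# reflexive, symmetric, and transitive all at once (an S5 frame).
R = {1: {1, 2}, 2: {1, 2}}
print(reflexive(R), symmetric(R), transitive(R))   # True True True

# A single one-way arrow with no loops: not reflexive, not symmetric,
# but (vacuously) transitive, since there are no two-step paths.
S = {1: {2}, 2: set()}
print(reflexive(S), symmetric(S), transitive(S))   # False False True
```

Notice that the one-way arrow comes out transitive vacuously, much like the vacuous truth of ‘□p’ on a dead-end world.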
We are interested in models that have these various relations because they allow us to talk
about modal logics of different strengths. We’ve seen, for example, that ‘P→⋄P’ and ‘□P→P’ and
‘□P→⋄P’ are not valid. But for certain interpretations of the box and diamond, we do want them to
be valid. For example, given the deontic reading (‘□’ means ‘obligatory’ and ‘⋄’ means
‘permissible’), we want the third statement to be true, but not the other two—it’s true that if something
is obligatory, it must be permissible, but it’s not true that if something is done, it is permissible. On
the temporal reading (‘□’ means ‘always true’ and ‘⋄’ means ‘sometimes true’), we want all three to
be true. And it turns out that some of these statements are valid on some kinds of models and not
on others.
There are names for the various frames based on the properties of the accessibility relation. (A
frame, recall, is a model without a valuation):
K no conditions
M reflexive
B reflexive, symmetric
K4 transitive
S4 reflexive, transitive
S5 reflexive, symmetric, transitive
(Note: ‘M’ is now usually called ‘T’.) Because there are formulas that are true in some frames
and not others, it is possible to make different logics for each of the different frames. If a formula is
true on every world in every B model (for instance), we will say that the formula is true or valid in B.
So far we’ve provided counterexamples in K, since to be valid in K means to be valid in every
model. But we can do the same with the other models.
Show that ⋄⋄P→⋄P is not valid in B.
As we construct our counterexample, we need to make sure the accessibility relation is both
reflexive and symmetric. Thus, every world in the model must be accessible to itself (reflexivity),
and every world has to be accessible from every world it can access (symmetry). Another way to
say this: there must be circular accessibility arrows on every world, and every arrow connecting two
worlds must go both ways.

1 ⇄ 2 ⇄ 3:P

(Each world also has an arrow to itself, to make the relation reflexive.) This model is both reflexive
and symmetric, and hence a B model. The antecedent is true on world 1, since world 3 (the only
world on which P is true) is accessible in two moves from world 1. It is not, however, accessible in
one move, and so the consequent is not true, and hence the formula is not true.

For the following exercises, show that the given formula is not valid in the given frame by pro-
viding a counterexample.

1 M: P→□P

2 M: ⋄□P→□P

3 M: ⋄⋄P→⋄P

4 M: □(PvQ)→(□Pv□Q)

5 M: (P→□Q)→□Q

6 B: P→□P

7 B: P→⋄□P

8 B: ⋄⋄P→⋄P

9 B: ⋄⋄⋄P→P

10 B: □(PvQ)→(□Pv□Q)

11 B: (P→□Q)→□Q

12 K4: P→□P

13 K4: □(PvQ)→(□Pv□Q)

14 K4: (P→□Q)→□Q

15 K4: (⋄⋄P&□Q)→□(P&Q)

16 S4: P→□P

17 S4: ⋄□P→□P

18 S4: ⋄P→□⋄P

19 S4: □(PvQ)→(□Pv□Q)

20 S4: (P→□Q)→□Q

21 S5: P→□P

22 S5: □(PvQ)→(□Pv□Q)

23 S5: (P→□Q)→□Q

24 S5: (□□(P→Q)&⋄∼P)→∼Q

Some of these logics are stronger than others. One logic is stronger than another if there are state-
ments that are valid in it but not valid in the other. But sometimes two logics are incommensurable,
meaning that each has statements valid in it but not the other.
K is the weakest logic: it has the fewest valid statements, so it’s the easiest to find counterexamples
in. S5 is the strongest logic. In fact, S5 has an interesting property: On S5, you can collapse every
string of modal operators into the last one. For example, on S5, □□⋄□⋄□□□⋄□⋄p↔⋄p. This is
because S5’s accessibility relation is an equivalence relation. The modal statements of every world
are just the same as those of every other world.
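Here is a sketch of the collapse on one small S5 model, where every world sees every world. Checking a single model illustrates the point but of course doesn’t prove validity; the tuple encoding is mine:

```python
# An S5 frame: the accessibility relation is universal over two worlds,
# so it is reflexive, symmetric, and transitive (an equivalence relation).
worlds = {1, 2}
access = {1: {1, 2}, 2: {1, 2}}
val = {1: {"P"}, 2: set()}       # P true only at world 1

def holds(f, w):
    if isinstance(f, str):
        return f in val[w]
    op, arg = f
    if op == "box":
        return all(holds(arg, v) for v in access[w])
    if op == "dia":
        return any(holds(arg, v) for v in access[w])
    raise ValueError(op)

long_prefix = ("box", ("box", ("dia", ("box", ("dia", "P")))))  # □□⋄□⋄P
short = ("dia", "P")                                            # ⋄P

# On this model, the long prefix agrees with its last operator everywhere.
print(all(holds(long_prefix, w) == holds(short, w) for w in worlds))  # True
```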

4.3 Quantified modal logic

Symbolizing modal logic: de re and de dicto
So far we’ve dealt only with propositional modal logic. From here on we’ll extend this to quantified
modal logic, which mixes quantifiers with modal operators. This sounds like a simple step, but it
brings up several philosophical and technical questions that need to be answered.
The move is similar to the move from propositional logic to quantified logic. When we made
that move, we went from statements like ‘A→B’ to statements like ‘∀x(Ax→Bx)’. We changed the
simple sentences to open sentences (by adding the variable x), and we prefixed the whole with a
quantifier. We’ll do the same thing here. We’ll change simple modal sentences like ‘⋄A→□A’ to
quantified modal statements like ‘∀x(⋄Ax→□Ax)’.
Quantified modal logic is a really powerful language, so powerful that it has led philosophers to
believe that it can help answer really vexing philosophical questions, at least by clarifying what’s at
stake.
Example 1: free will. Some philosophers say that I acted freely on some occasion if and only if I
could have done something other than what I did. If I freely helped the lady across the street, it was
possible for me to have spit in her eye instead. Thus, if we take ‘x’ to quantify over my actions and
‘Ax’ to mean ‘I perform x’, the following sentence expresses what it means to say that I’m sometimes
free: ‘∃x(Ax &⋄∼Ax)’—there’s some action such that I performed it and it’s possible that I didn’t
perform it.
Example 2: Why is there something rather than nothing? Why does anything exist at all? This
is a question that many philosophers have taken to be very important, and many of those who have
taken it to be important have found in its answer some knowledge about the universe and about
God. In particular, there is an argument that, if you accept the premises, proves that there is a
necessary being that explains the existence of everything else. It is crucial that this is a necessary
being; that is, the conclusion of the argument is that something necessarily exists.
This conclusion introduces the first technical subtlety. What does it mean to say “Something
necessarily exists”? Does it mean “It is necessarily true that something exists”—that is, it is impossi-
ble for nothing to exist? Or does it mean “There is something that exists necessarily”—that is, that

something has the property of necessary existence? It’s something like the ambiguity in ‘She has a
ring on every finger’. Does this mean she has five rings, or one huge ring? This is called (you prob-
ably remember) a “scope ambiguity”: what is the scope of the quantifier? Is it ‘∃x∀yOxy’—‘there
is some ring x such that for all fingers y, x is on y’? Or it is ‘∀y∃xOxy’—‘for every finger y there
is some ring x such that x is on y’? (Less silly examples: ‘Everyone loves someone’ and Aristotle’s
‘Every action aims at some end’.)
Here, too, the ambiguity is one of scope. If the quantifier has the wider scope—‘∃x□Ex’—it
means that we choose the thing first. There is something (perhaps God) that necessarily exists. If
the modal operator has the wider scope—‘□∃xEx’—it means that we choose the thing that exists
on a world only after we consider that world. On this world it may be the sun, on that world it may
be this slice of blueberry cheesecake. It doesn’t matter what we choose: the statement is true if there
is something, anything, on every world, even if that thing isn’t on any other world.
This distinction is called the de re/de dicto distinction, from Latin phrases meaning ‘of the
thing’ and ‘of the statement’. The distinction turns on whether the operator applies only to the
predicate (the thing) or to the whole statement. It’s sometimes easiest to see the distinction if we
look at it using a temporal modal operator. Consider this sentence: ‘The U.S. President will always
be a Democrat’. Right now, the President, Barack Obama, is a Democrat. The sentence might be
saying that this particular entity, Obama himself, will always be a Democrat. Or the sentence might
be saying that the sentence ‘the U.S. President is a Democrat’ will always be true (i.e., by having a
succession of Democratic presidents). The former reading of the sentence is the de re reading; the
latter is the de dicto reading.
In a de dicto reading, the modal operator always has wider scope. That’s just what it means
to say that the modality is “of the statement” rather than “of the thing.” So, taking the box to
mean ‘always’ (and simplifying a little by saying ‘all’ rather than ‘the’), the de dicto reading of ‘The
U.S. President will always be a Democrat’ is ‘□∀x(Px→Dx)’. Here, the statement ‘∀x(Px→Dx)’ has
necessary modality. The de re reading is ‘∀x□(Px→Dx)’. Here it’s only the concrete individual
already picked out that has the property necessarily.
With the original example, ‘something necessarily exists’, the de dicto reading (of course) is the
one that has the modal operator first: ‘□∃xEx’. The de re reading is the one that has the quantifier
first: ‘∃x□Ex’.


1 Explain, in words, the difference between □∀xPx and ∀x□Px.

2 Explain, in words, the difference between ⋄∃xPx and ∃x⋄Px.

3 What are the de re and de dicto readings of ‘Everything can happen’?

4 What are the de re and de dicto readings of ‘Something has to be there’?


Just as with propositional modal logic, various modals in English are translated with the box and
diamond. The trickiest part of translating into the modal logic is the difference between de re and
de dicto.
Example 4.3.A
Translate “Someone’s gotta talk to him.”
What does this mean? Does it mean that there’s someone waiting out in the lobby who needs
to talk to him, perhaps to ask for clemency for her son? Or does it mean that he must be talked to
by someone or other? If it means the former, it is translated ‘∃x□Txh’ (with ‘Txy’ meaning ‘x talks
to y’ and ‘h’ meaning ‘him’). If it means the latter, it is translated ‘□∃xTxh’.
Example 4.3.B
Translate “Everybody needs somebody sometime.” Take ‘Nxy’ to be ‘x needs y’, and take the
modal to be temporal, and restrict the universe of discourse to persons.
The intent of this sentence, I think, is quite clear. It is not saying that there are moments at
which everybody needs somebody, but that everyone is such that he or she needs somebody or other
at some time or other. Thus, it is symbolized ‘∀x⋄∃yNxy’. The ‘everybody’ is outside the modal, but
the ‘somebody’ is inside. If the existential quantifier were also outside the modal operator, it would
say that everyone has some specific person that the first person needs every once in a while.
So, just as with quantificational logic, one must take care to get the scope just right.


5 What goes up must come down. (Ux: x goes up. Dx: x comes down)

6 If anything can go wrong, it will. (Wx: x goes wrong.)

7 All things must pass. (Px: x passes.)

8 A cat may look at a king. (Heywood, Proverbs and Epigrams) (Cx: x is a cat. Kx: x is a king.
Lxy: x looks at y.)

9 All good things must come to an end. (Gx: x is good. Ex: x ends.)

10 What can’t be cured must be endured. (Cx: x is cured. Ex: x is endured.)

11 What is permitted to all is required of some. (Dxy: x does y)

12 What is forbidden to some is forbidden to all, and what is permitted to some is permitted to
all. (Adapted from the Babylonian Talmud.) (Dxy: x does y.)

13 Someone has to slay the dragon. (i.e., the dragon must be slain by someone or other.) (Px: x
is a person. Dx: x is a dragon. Sxy: x slays y.)

14 There’s someone who must slay the dragon. (i.e., only that person can slay the dragon.) (Px:
x is a person. Dx: x is a dragon. Sxy: x slays y.)

15 Caesar’s wife must be above suspicion. (c: Caesar. Mxy: x is married to y. Ax: x is above
suspicion.)
16 It’s not possible to go faster than the speed of light. (Fxy: x goes faster than y. c: the speed of
light.)
Let’s add identity. Identity brings up some interesting philosophical issues, but the translation—
besides rampant de re/de dicto confusion—is straightforward.

Exercises: For the following exercises, choose your own symbols.


17 Everything is necessarily self-identical.

18 If Jane is possibly a philosophy student, and Jane is the murderer, then the murderer is possibly
a philosophy student.

19 If the murderer must be a sociology student, and Jane is not a sociology student, then Jane
cannot be the murderer.

20 If it is possible that Bob is president of the club, then it is possible that Bob is Jane.

21 If everything is necessarily self-identical, then Xanthippe is necessarily self-identical.

4.4 Models of Quantified Modal Logic

Models of quantified modal logic are just like models of propositional modal logic, except that the
worlds have predicate statements instead of primitive propositions. Recall that models in proposi-
tional modal logic had a set of worlds, an accessibility relation, and a valuation. Models in quantified
modal logic replace this simple valuation with two things: for each world, we have a domain of that
world (the objects that exist on that world) and an interpretation (an assignment of each object to
certain predicates). More informally, models look like this:

1{a,b} → 2{a}Pa
1{a,b} → 3{a,b}Pa,Pb



Every world is labeled with a number, a list of the objects that exist on that world, and a list of what
statements are true on that world. What statements are true on the various worlds of this model?
At world 1, □Pa is true. At world 3, ∀xPx is true (since all the things at world 3, a and b, are P at
world 3), so ⋄∀xPx is true at world 1. ∀xPx is also true at world 2 (since all the things at world 2, just
a, are P at world 2), and hence □∀xPx is true at world 1. There are, of course, other propositions
that are true as well.
Let’s look at ⋄∀xPx and ∀x⋄Px. The first is true at some world w if at some world accessible
from w, ∀xPx is true. The second is true at a world w if, for every object x on w, there is some
world accessible from w on which x is P. We can illustrate the difference between these two with the
following models:

Left:  1{a,b} → 2{a,b}Pa, 1{a,b} → 3{a,b}Pa,Pb

Right: 1{a,b} → 2{a,b}Pa, 1{a,b} → 3{a,b}Pb

Both of these models have the same domain. They vary only in the properties the objects in the
domain have at various worlds. On the left model, ⋄∀xPx is true at 1 because there is some world
accessible to it, namely 3, on which ∀xPx is true. ∀x⋄Px is also true, since for every object, there is
some world accessible to 1 on which that object is P. On the right model, ∀x⋄Px is also true. But
⋄∀xPx is not true. Even though Pa is true at 2 and Pb is true at 3, there is no single world on which
all the objects are P, so there’s no single world on which ∀xPx is true.
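The contrast between ⋄∀xPx and ∀x⋄Px can also be checked mechanically. This sketch assumes two constant-domain models in which world 1 sees worlds 2 and 3: on the left model P holds of both a and b at world 3, while on the right model a is P only at 2 and b is P only at 3. The dictionary layout and function names are mine:

```python
access = {1: {2, 3}, 2: set(), 3: set()}
domain = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "b"}}   # constant domain

left_P = {1: set(), 2: {"a"}, 3: {"a", "b"}}    # objects that are P, per world
right_P = {1: set(), 2: {"a"}, 3: {"b"}}

def dia_all_P(w, P):
    """⋄∀xPx at w: some accessible world where every object there is P."""
    return any(domain[v] <= P[v] for v in access[w])

def all_dia_P(w, P):
    """∀x⋄Px at w: every object at w is P at some accessible world."""
    return all(any(x in P[v] for v in access[w]) for x in domain[w])

print(dia_all_P(1, left_P), all_dia_P(1, left_P))    # True True
print(dia_all_P(1, right_P), all_dia_P(1, right_P))  # False True
```

The right model is the interesting one: each object is P somewhere, but there is no one world that makes them all P together.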
Our definitions allow objects to have properties on worlds where they don’t exist. For example,
there’s nothing wrong, according to the definition, with this model:

1{a,b} → 2{b}Pa
1{a,b} → 3{a}Pb




On this model, ∀x⋄Px is true at world 1, since everything on world 1 has property P on some
accessible world. Yet on worlds 2 and 3, ∃xPx is false, since nothing that exists on world 2 or 3 has
property P on that world. This may seem a little odd, but there are two reasons for making this
move. The first is that there are some statements, such as Pav∼Pa, that we want to be true even on
worlds where a doesn’t exist. The second is that under some interpretations of the modal operators,

it is plausible that objects can have properties where they don’t exist. For example, on the temporal
reading, it’s plausible that George Washington doesn’t exist at all moments (e.g., he doesn’t exist
now), and yet he is famous at some moments at which he doesn’t exist (e.g., he is famous now).

State which statements are true on the given world.
Use the following model:

1{a}Pb 2{a,b}Pa,Pb,Qb


1 2:Pa

2 2:∀xPx

3 2:∀x(PxvQx)

4 3:Pa

5 3:∀xPx

6 4:Pa

7 4:∀x(PxvQx)

8 1:□

9 1:□Pa

10 1:□Qb

11 1:□∃xQx

12 1:∀xPx

13 1:∃xPx

14 1:∀x□Px

15 1:□∀xPx

16 1:∀x□(PxvQx)

We find counterexamples in just the same way we did before. That is, we construct a model on which
the statement is not true at world 1. If the statement is conditional, we make the antecedent true
and the consequent false. The counterexamples are a little trickier here, since we need to keep track
of the worlds, the objects on the worlds, and the interpretation of the worlds. To falsify a statement,
we may need to add a new world, or add a new object to a world, or change the interpretation of a
world.
Show that ⋄∀xPx→∀xPx is not valid in M.
Because M models are reflexive, we need to add a self-accessibility arrow for every world we
include. We can start with a model with two worlds and see if that’s enough. To make the antecedent
true, we need to make some world accessible from world one on which everything is P; to make the
consequent false we need to make something on world 1 not P:

1{a} 2{a}Pa

We check to make sure this meets the requirements: it’s an M model; the antecedent is true; the
consequent is false. We have our counterexample.
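This check can be carried out mechanically. Here is a minimal sketch in Python; the dictionary encoding of domains, extensions, and accessibility is our own convention, not notation from the text:

```python
# The M counterexample to ⋄∀xPx → ∀xPx:
# world 1 has domain {a} with nothing P; world 2 has domain {a} with Pa.
# Accessibility is reflexive, plus an arrow from 1 to 2.
domain = {1: {'a'}, 2: {'a'}}
P = {1: set(), 2: {'a'}}
access = {1: {1, 2}, 2: {2}}

def all_P(w):
    # ∀xPx at w: everything in w's domain has P there
    return all(x in P[w] for x in domain[w])

def poss_all_P(w):
    # ⋄∀xPx at w: some world accessible from w makes ∀xPx true
    return any(all_P(v) for v in access[w])

# Antecedent true, consequent false at world 1: the conditional fails.
print(poss_all_P(1), all_P(1))
```

Running it prints True False: the antecedent holds at world 1 while the consequent fails, just as the hand-built counterexample requires.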

Find counterexamples to the following statements in the frame indicated.

1 K:∀x⋄Px→∀xPx

2 K:⋄∀x□Px→∀x□Px

3 K:∀x□(PxvQx)→∀x(□Pxv□Qx)

4 K:∀x□(Px&Qx)→(□∀xPx&□∀xQx)

5 M:⋄□∀xPx→∀xPx

6 M:□□∃xPx→∃x□Px

7 M:⋄□∀x(Px&Qx)→∃x□(Px&Qx)

8 M:□∀x(∀yPy→□Px)

9 K4:□□∃xPx→∃x□Px

10 K4:⋄∃xPx→∃x⋄⋄Px

11 K4:∀x□Px→□∀xPx

12 K4:∀x(□Px→Px)

13 S4:⋄□∀xPx→∀x□Px

14 S4:∀x⋄Px→⋄∀xPx

15 S4:□∃xPx→∃x□Px

16 S4:∀x□Px→□⋄∃xPx

17 S5:∀x□Px→□∀xPx

18 S5:□∀xPx→∀x□Px

19 S5:⋄∃xPx→∃x⋄Px

20 S5:∃x⋄Px→⋄∃xPx

Constant and Varying Domains

To produce many of the counterexamples in the last set, we relied on there being different things in
different worlds. Sometimes, for various reasons, we might want to eliminate that possibility. That
is, we might want the things that exist on one world to exist on all worlds. Such a model is called a
constant domain model (in contrast with a varying domain model).
There are two basic statements that are true in a constant domain model but not in a
varying domain model. The first statement, usually called the Barcan Formula (after Ruth Barcan
Marcus), is this:

∀x□Px → □∀xPx

The Barcan formula rules out expanding domains: models on which what exists grows from world to
world, models on which there are things on accessible worlds that are not on the actual world.
The second statement is the converse of the first, and is usually called the Converse Barcan
Formula:

□∀xPx → ∀x□Px

This rules out shrinking domains, on which there are things in the actual world that are not in the
accessible worlds. These two principles together imply a constant domain.
This distinction has interesting philosophical consequences. In the last section I said that it was
plausible that George Washington no longer exists. This is one view of time, on which only the
present exists—that is, dinosaurs don’t exist because they are in the past, and moon colonies don’t
exist because they are in the future. On this view of time, a temporal reading of the quantifiers
would want the domain to vary. On another view of time, the past and the present both exist, but
the future is still open, and hence its objects don’t exist. On this view of time, an expanding domain

model would be more appropriate. On an alethic reading of the quantifiers, we probably want a
varying domain model, since it seems that there might have been things other than there are, and
there might have been fewer things than there are. The universe might have been smaller or larger
than it is.
For more information, see the following books:
G.E. Hughes and M.J. Cresswell, A New Introduction to Modal Logic. Routledge, 1996.
Ted Sider, Logic for Philosophy. Oxford, 2010. Chapters 6, 7, and 9.
M. Fitting and R.L. Mendelsohn, First-Order Modal Logic. Kluwer, 1998.
Chapter 5

Arithmetic

As we saw in chapter 3, one major motivation behind axiomatic theories is to systematize and pro-
vide a foundation for mathematics. In the late nineteenth and early twentieth centuries, several
people were involved in laying axiomatic foundations for arithmetic, until the project achieved the
rigor and sophistication it needed.
The language of arithmetic requires, in addition to the logical symbols, four undefined non-
logical symbols:

0    ′    +    ×

The first, 0, is a constant; the other three are functions. The last two functions are functions of
two places and should be familiar to you. Normally functions are written with the function name
preceding the terms, as in ‘f(x,y)’; in this notation these functions should be written ‘+(x,y)’ and
‘×(x,y)’. We will instead use the standard infix notation, so instead of writing ‘+(2,3)=5’ (or worse,
‘=+((2,3),5)’, as we should if both the relation ‘=’ and the function ‘+’ were prefix), we’ll simply write
‘2+3=5’. The second function is a function of one place, pronounced ‘successor’, and instead of
being written before the term it modifies, it is written after. It is intended to mean the next natural
number in the sequence. Given that, we can introduce other defined constants:

Def 1 1 :: 0′
Def 2 2 :: 1′ :: 0′′
Def 3 3 :: 2′ :: 1′′ :: 0′′′
Def 4 4 :: 3′ :: 2′′ :: 1′′′ :: 0′′′′

Some typical sentences in the language of arithmetic are:


2+2=4 v 2+2=5

There are, of course, infinitely many such sentences. An axiomatic theory of arithmetic has some
small set of true sentences in the language of arithmetic from which all the other true sentences of
arithmetic follow.
Before we find the axioms, it is worth emphasizing again what it means to say that this is a
formal language. We set out the undefined terms, and then indicate the “intended interpretation,”
where ‘0’ means the number 0, ‘+’ means the addition function, and so on. But we must resist
tacitly assuming something we already know about arithmetic in our proofs. Part of what it means
to say that these terms are undefined is that they could be given other interpretations. Consider, for
example, this “non-standard” interpretation of the symbols of arithmetic:

0 means 0
′ means the predecessor function
+ means the addition function
× means the negation of the multiplication function.

Here the sequence 0, 1, 2, 3 ... will mean the sequence 0, -1, -2, -3, ..., and every sentence of
arithmetic will still be true. It will simply mean something different than you might expect. For
example, the sentence


will mean that -2+-3=-5. Consider an even weirder example:

0 means 1
′ means the divide-by-2 function
+ means the multiplication function
× means the function x^(−log₂ y)

Here the sequence 0, 1, 2, 3 ... will mean the sequence 1, 1/2, 1/4, 1/8, .... Given these
interpretations, it’s easy to see that any sentence will still be true. For example, the sentence

2×2=4

will mean, on this interpretation, that 1/4 × 1/4 = 1/16. It’s kind of fun to play around with
these non-standard interpretations.
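We can check such an interpretation mechanically. A sketch in Python, mapping each numeral n to 2⁻ⁿ, with ‘+’ read as multiplication and ‘×’ read as the function x^(−log₂ y):

```python
import math

def num(n):
    # on this interpretation, the numeral n denotes 2**-n: 0 -> 1, 1 -> 1/2, 2 -> 1/4, ...
    return 2.0 ** -n

def add(a, b):
    # '+' means multiplication
    return a * b

def mul(a, b):
    # '×' means x raised to the power -log2(y)
    return a ** -math.log2(b)

# the arithmetic truths '2+3=5' and '2×2=4' still come out true
assert math.isclose(add(num(2), num(3)), num(5))
assert math.isclose(mul(num(2), num(2)), num(4))
```

Both assertions pass: the sample sentences remain true under the reinterpretation, just as the text claims.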
Both of the examples given here work because we have a sequence of objects with a beginning but
no end. And any sequence that has that form can make the sentences of arithmetic true. Imagine an
archangel on an infinite seashore with a string of seashells infinite in one direction. Here the symbol
0 can refer to the first seashell, the ′ function can be the function from one seashell to the next, and
the + and × functions are functions to take you from a pair of seashells to another, depending on
how far each seashell is from the first. You could easily set it up so that “arithmetic” was about this
archangelic game and not about numbers at all.
What does this mean for doing proofs? One thing it means is that, for example, unless we assume
or prove that + is commutative, we can’t know that 2+3=3+2. We can’t assume that + and × work
the way they are “supposed” to work. All we know about them is what the axioms say, and then
what we can prove based on those axioms.

5.1 Robinson Arithmetic (Q)

The first axiom system we’re going to consider is named after Raphael Robinson. It has seven
axioms:
Q1 x′ ̸= 0
Q2 x′ = y′ → x = y
Q3 x = 0 v ∃y(x = y′)
Q4 x + 0 = x
Q5 x + y′ = (x + y)′
Q6 x × 0 = 0
Q7 x × y′ = (x × y) + x

(By convention we drop universal quantifiers that apply to the whole line. Thus Q1, for example,
could be written ‘∀x(x′ ̸= 0)’.) On the intended interpretation, these axioms mean:

Q1 0 is not the successor of any number

Q2 If the successors of x and y are equal, x and y are equal.
Q3 Every number other than 0 is the successor of some number.
Q4 Any number plus 0 equals that number.
Q5 The sum of any number and the successor of any number equals the successor
of their sum.
Q6 Any number times 0 equals 0.

Q7 The product of any number and the successor of any number equals the sum
of the first number and the product of the two numbers.
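On the intended interpretation, Q4–Q7 can be read as recursion equations that compute sum and product. A quick sketch in Python (an informal picture of the intended model, not part of the formal system):

```python
def succ(x):
    # the successor function, on the intended interpretation
    return x + 1

def plus(x, y):
    # Q4: x + 0 = x;  Q5: x + y' = (x + y)'
    return x if y == 0 else succ(plus(x, y - 1))

def times(x, y):
    # Q6: x × 0 = 0;  Q7: x × y' = (x × y) + x
    return 0 if y == 0 else plus(times(x, y - 1), x)

assert plus(0, 1) == 1    # AT4
assert times(2, 2) == 4   # AT21
```

This is only the intended model; in the formal system itself, nothing about + and × may be assumed beyond what the axioms say.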

Because Q is an extension of the axiom system for logic, you are welcome to use any of the
theorems or rules from that system (including, for example, conditional proof/deduction theorem).
Because all of the following theorems are statements of identity, the identity axioms, theorems, and
rules will be useful in every proof.
Example: AT4 0+1=1

1 0+0=0 Q4
2 (0+0)′ = 0′ 1 AT1
3 0+0′ = (0+0)′ Q5
4 0+0′ = 0′ 2,3 Ident
5 0+1=1 4 Def 1
It may be helpful to go through this proof backwards. The last line is a definitional substitution.
This will often be the case when you’re proving results about specific numbers. If the proof is about
1, you should think that you need to prove something about 0′ . Line 4 is an identity chain. Notice
that the left side of line 2 and the right side of line 3 are the same thing. By our shortcut identity
theorem, we are entitled to take these extremes as identical. In this proof the chain is short—only
two lines. But sometimes the chains can get quite long (a=b, b=c, d=c, d=e, ...). Once you have
a chain that connects the left side of what you’re trying to prove with the right side of what you’re
trying to prove, you are ready for the identity chain. The step before that, then, is to create such
a chain. Different styles of theorem will require different strategies for doing this, but notice the
strategy we used here: we started (line 1) with a statement we had before (in this case an axiom) that
had as one side the predecessor of what we were trying to prove. Then we applied theorem AT1 to
take the successor of both sides. With that move, we had one side down. Then we needed to get
a line that had the other side we were trying to prove as identical with the term the other side was
identical with. Q5 came in handy here.
Prove the following theorems:
AT1 x = y →x′ = y′
AT2 x = y →x + z = y + z
AT3 x = y →z + x = z + y
AT4 0+1=1
AT5 0+2=2
AT6 1+0=0+1
AT7 1+1=2
AT8 2+0=2
AT9 2+1=3
AT10 1+2=2+1
AT11 2+2=4
AT12 x′ = x + 1

AT13 x′′ = x + 2
AT14 x = y →x × z = y × z
AT15 x = y →z × x = z × y
AT16 0×1=0
AT17 0×2=0
AT18 1×1=1
AT19 1×2=2
AT20 2×1=2
AT21 2×2=4

5.2 Peano Arithmetic (P)

Robinson arithmetic can prove infinitely many particular sentences of arithmetic. It can prove
‘1+2=2+1’, ‘5+7=7+5’, and so on, for any pair of numbers. But it cannot prove ‘x+y=y+x’. That
is, most general sentences of arithmetic cannot be proven without something called “mathematical
induction.”
Mathematical induction is perhaps poorly named. It is not induction in the sense of a non-deductive
leap from premises to conclusion. It is like induction in that it moves from particular premises to a
general conclusion, but it is a deductive rule.
What is called Peano arithmetic is named after Giuseppe Peano. It is like Q except that the
third axiom is replaced with the axiom of mathematical induction:

P1 x′ ̸= 0
P2 x′ = y′ → x = y
P3 [X0 & ∀x(Xx → Xx′)] → ∀xXx
P4 x + 0 = x
P5 x + y′ = (x + y)′
P6 x × 0 = 0
P7 x × y′ = (x × y) + x

(Historically, Peano arithmetic came first. Robinson and his colleagues were interested in seeing
how weak an arithmetic they could develop.) P3 claims that if 0 has a certain property, and if the
successor of every number that has the property also has the property, then every number has the
property.
Inductive proofs have two parts. First we prove that 0 has the given property; then we assume
that some arbitrary number k has the property and show that this entails that k′ also has it.
Example: 0 + x = x
Notice that this is different from the axiom P4.
We first prove the 0 case. Here it is an instance of P4, substituting 0 for x:

1 0+0=0 P4
Then we prove the inductive case:

1 0+k = k
2 (0+k)′ = k′ 1 AT1
3 0+k′ = (0+k)′ P5
4 0+k′ = k′ 2,3 Ident
That completes the proof. This proof is an abbreviation of this longer proof:

1 0+0=0 P4
2 0+k = k (cp)
3 (0+k)′ = k′ 2 AT1
4 0+k′ = (0+k)′ P5
5 0+k′ = k′ 3,4 Ident
6 0+k=k → 0+k′=k′ 2–5 CP
7 ∀x(0+x=x → 0+x′=x′) 6 UG
8 0+0=0 & ∀x(0+x=x → 0+x′=x′) 1,7 Conj
9 ∀x(0+x=x) 8, P3
Line 1 of this proof is the proof of the 0 case, and lines 2–5 are the proof of the induction case.
Line 6 closes out the proof of the induction case; line 7 generalizes this; line 8 conjoins the conclusion
of the 0 case and the induction case; and line 9 applies axiom P3. These last four lines will be similar
in every proof by induction, so to save space and tedium we’ll adopt the shortcut of proving
only the 0 case and the induction case, as we did above. But be sure you understand why this shortcut
works.
In P, one can prove all the familiar results of arithmetic, but without P3 most general results
cannot be proven. We must now become familiar with this powerful axiom. P3 assures us that
if 0 has a certain property, and if the successor of every number that has the property also has the
property, then every number has the property.
AT22 0+x=x
Notice that AT22 is different from P4.
Prove the following theorems:
AT23 x×1=x
AT24 x×2=x+x
AT25 x′ + y = (x + y)′
AT26 x+y=y+x
AT27 (x + y) + z = x + (y + z)
AT28 0×x=0
AT29 x′ × y = (x × y) + y
AT30 x×y=y×x
AT31 (u = v&w = x) →u + w = v + x

AT32 x × (y + z) = (x × y) + (x × z)
AT33 (y + z) × x = (y × x) + (z × x)
AT34 (x × y) × z = x × (y × z)
AT35 x + y = x + v →y = v
Q3 x = 0 v ∃y(x = y′)
Chapter 6

Set Theory

6.1 Naive Set Theory

What are sets?
A set is a collection of objects. It is itself an abstract object that gathers the various objects into a
unity. Take a dozen eggs, for example. Given any particular twelve eggs, there is a set of just those twelve
eggs. The set is not the carton but an abstract unity of the twelve individuals.
Sets have members or elements. Socrates is a member of the set of humans; red is a member of the
set of colors; Europe is a member of the set of continents on Earth.

Class algebra
One way to think of categorical statements, like ‘All acrobats are bohemians’, is that the terms of
the statement denote sets. This statement asserts that the entire set of acrobats is contained in the
set of bohemians. This relationship is known as subset: the set of acrobats is a subset of the set of
bohemians. In symbols, we say
A ⊆ B.
We may also want to specify a set more precisely. For example, given the set of acrobats and the
set of bohemians, we may be interested in the set of those who are both acrobats and bohemians,
those who are acrobats but not bohemians, or those who are neither acrobats nor bohemians.
It is important to distinguish between a subset of a set and an element of a set. The set of Cirque
du Soleil performers is (let’s say) a subset of the set of acrobats. But Robin is an element of the set
of acrobats. The symbol for this relation is ‘∈’ (which looks like an ‘e’ for ‘element’). The subset of
a set must always be a set, but an element of a set need not be. We will use capital letters to indicate
sets, and lower-case letters to indicate elements:
a ∈ A.
A very common notation for expressing the content of a set is to enclose a list of the elements
within curly brackets, like this:


{a, b, c}
Here a bare letter is an element, while letters surrounded by brackets form a set. Thus, a ∈ {a, b, c}
and {a} ⊆ {a, b, c}.
The basic assumption of set theory is that sets are extensional, which means that two sets are
identical if they have the same members. A little more precisely, it tells us that the criterion for
identity of sets is identity of membership. It doesn’t matter what property we specified to gather
together just these things; the only thing that matters is the elements of the set. To use a famous
example, consider the set of animals that have hearts, and the set of animals that have kidneys. Now
clearly having a heart is a different thing from having a kidney, so we have picked out different
properties. (‘Having a heart’ and ‘having a kidney’ have different intensions.) But it may happen that
whatever animals have hearts also have kidneys and vice versa. If this is the case both properties
have picked out the same set. (‘Having a heart’ and ‘having a kidney’ have the same extension.) There
is just one set that could equally well be specified by listing all the members, or by the property of
having a heart, or by the property of having a kidney:
{Kermit the frog, Shasta the liger, Beyonce the human, ...}
{x: x has a heart}
{x: x has a kidney}
The first way of specifying a set, by listing all its members (separated by commas, surrounded
by braces) is fine if there are only a few elements of a set. But for a large set, we can’t list every item,
so we have to resort to ellipses. And ellipses are unhelpful unless we also have a rule to inform us
how to fill them out. The second way of spelling out a set—with a variable, then a colon, then a
property, surrounded by braces—tells us what property to use in filling out the set. But the axiom
of extension tells us that a difference in property doesn’t automatically make a difference in sets.
Similarly, the order of the elements in a set doesn’t matter.
{a, b, c} = {b, a, c}.
Sometimes we do care about the order of the elements in a set. An ordered set-like group of two
elements is called an ordered pair, and is notated with angled brackets:
⟨a, b⟩.
An ordered group of three elements is called a triple; of four elements, a quadruple; of five elements,
a quintuple; of n elements, an n-tuple.
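The same contrast shows up in a language like Python, where sets ignore order and repetition while tuples do not; a small illustration (the element names are arbitrary):

```python
# sets: order and repetition don't matter
assert {'a', 'b', 'c'} == {'b', 'a', 'c'}

# ordered pairs: order matters
assert ('a', 'b') != ('b', 'a')

# an n-tuple has a fixed length and a position for each element
triple = ('a', 'b', 'c')
assert triple[0] == 'a' and len(triple) == 3
```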
The set of everything that is both A and B is called the intersection of sets A and B. The set of
acrobats who are bohemians is the intersection of the set of acrobats and the set of bohemians. In
symbols, we write

A ∩ B,

and in a Venn diagram, with the shaded portion indicating the set we’re interested in,


It may be helpful to think of the symbol ∩ as a cup that clamps down on just the portion of the sets
that we’re interested in.
The set of everything that is in either A or B, the two sets put together, is called the union of
the two sets. The set of anyone who is an acrobat or a bohemian is the union of these two sets. In
symbols, we write

A ∪ B,

and in a Venn diagram,


It may be helpful here to think of the ∪ symbol as a cup upright, open for every- thing in both sets.
The set of everything in one set but not in another is called the difference between the sets. The
set of acrobats who are not bohemians is the difference between the two sets. The symbol is

A – B,

and the diagram is


The set of everything not in a set is called the complement of the set. The complement of the
set of acrobats is everything not an acrobat. (We may sometimes have a universe of discourse; if
the universe of discourse is people, the complement of the set of acrobats is all people who are not
acrobats.) In symbols, we write

Ā,

and in a Venn diagram, we draw


We will later develop a subtle and powerful version of set theory. But for now, we will leave it at
the intuitive level.

Which of the following sentences are true?

1 A⊆A∩B

2 A⊆A∪B

3 A∩B⊆A

4 A∪B⊆A

5 A⊆A

6 A⊆A–B

7 A–B⊆A

8 A∩B=A∩B

9 A∪B=A∪B

10 A–B=B–A

11 A∩B=Ā∪B

12 A∪B=Ā∩B

We can calculate these operations on sets. For example,

{a, b, c} ∪ {c, d, e} = {a, b, c, d, e}
{a, b, c} ∩ {c, d, e} = {c}
{a, b, c} − {c, d, e} = {a, b}
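These calculations can be reproduced with Python’s built-in set type, whose operators |, & and - correspond to union, intersection, and difference:

```python
a = {'a', 'b', 'c'}
b = {'c', 'd', 'e'}

assert a | b == {'a', 'b', 'c', 'd', 'e'}   # union
assert a & b == {'c'}                       # intersection
assert a - b == {'a', 'b'}                  # difference
```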

Calculate the following.

13 {a, b, c, d} ∪ {a, c, g}

14 {a, c, g} ∩ {a, b, c, d}

15 {a, b, d, e} – {a, b, c}

16 ({a, b, c} ∩ {c, d, e}) ∪ {d, e, f}

Sets as elements
Sometimes the members of a set will themselves be sets. If a club is a set of the members of that
club, the set of clubs will have sets as members. Sometimes that won’t make any difference, but it
does allow for some additional operations.
For example, let’s take this to be an illustration of a set of sets:

Each circle represents one of the sets that is a member of the bigger set. We can define the
intersection of a set of sets as the set of elements that are in every set. For example, the intersection
of the set of clubs is the set of people who are members of every club. We can diagram it like this:

The symbol for the intersection of a set A is ∩A. (Again, the cup clamps down on just the part
that overlaps.) Likewise, we can take the union of a set of sets. It is the set of all things that are
elements of any of the sets. For example, the union of the set of clubs is the set of people who belong
to any club. We diagram it like this:

The symbol for the union of a set A is ∪A. (This cup is upright and open for everything.) Notice
that ∪A is not always the same thing as A, even though we’ve shaded everything in every element

of A. A is a set of sets; ∪A is a set of elements of those sets. In the club example, A is a set of clubs,
∪A is a set of people who belong to clubs. Similarly for ∩A. ∩ and ∪ skip a level: if A is a set of sets
of individuals, ∩A and ∪A are sets of individuals. They skip over that “sets of ” in the middle.
To indicate sets of sets using bracket notation, we nest bracketed lists. For example,
{{a, b, c}, {d, e, f }, {e, f, g}} is a set containing three elements, each of those elements being
sets containing three elements. The union or intersection of a set of sets will be a set of elements.
∪{{a, b, c}, {d, e, f }, {e, f, g}} = {a, b, c, d, e, f, g}.
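The level-skipping behavior of ∪ and ∩ on a set of sets can be checked directly; a sketch in Python, representing the set of sets as a list of sets:

```python
from functools import reduce

family = [{'a', 'b', 'c'}, {'d', 'e', 'f'}, {'e', 'f', 'g'}]

union_of_family = set().union(*family)             # ∪A: elements of any member
inter_of_family = reduce(set.intersection, family)  # ∩A: elements of every member

assert union_of_family == {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
assert inter_of_family == set()   # no element belongs to all three sets
```

Note that both results are sets of letters, not sets of sets: ∪ and ∩ skip a level, just as the text says.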

Which of the following sentences are true?

1 A ∈ ∪A

2 A ∈ ∩A

3 A ⊆ ∪A

4 A ⊆ ∩A

Calculate the following.

5 ∪{{a, c, e}, {d, e, f}, {e, f, g}}

6 ∩{{a, c, e}, {d, e, f}, {e, f, g}}

7 ∪{{a, e}, {a, d, f, g}, {a, e, g}}

8 ∩{{a, e}, {a, d, f, g}, {a, e, g}}

Empty Set and Power Set

Once we are comfortable with the difference between a thing and a set containing just that thing—
e.g., between the thing a and the set {a}—we might ask whether it’s possible to have a set containing
nothing at all. The answer is yes: the set {}, which is a set with no elements, is not the same thing
as nothing. This set is important enough that we have a special symbol for it: ∅. Because the sets
{a, b, c}, {d, e, f}, and {g, h, i} have no elements in common, ∩{{a, b, c}, {d, e, f}, {g, h, i}} = ∅.
It might be helpful to think of a set as a box that has various things in it. The empty set, then,
is a box with nothing in it. But notice that ∅ ̸= {∅}—the left side is an empty box, and the right
side is a box containing an empty box. Thus the left side is empty, but the right side has something
in it, namely, an empty box.
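The box analogy can be made concrete in Python, using frozenset so that one set can be an element of another:

```python
empty = frozenset()           # ∅: a box with nothing in it
box = frozenset({empty})      # {∅}: a box containing an empty box

assert empty != box           # ∅ ≠ {∅}
assert len(empty) == 0        # the left side is empty
assert len(box) == 1          # the right side has one element:
assert empty in box           # namely, the empty box itself
```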

Which of the following are true?

1 ∅⊆A

2 A∩∅=∅

3 A∩∅=A

4 A∪∅=A

5 A∪∅=∅

6 ∪∅ = ∅

7 A–∅=A

8 A–A=∅

9 ∅–A=A

10 ∅ – A = ∅

Just as union and intersection move us from a set of sets to a set of individuals, the power set
function moves from a set of individuals to a set of sets. The power set of a set is a set of all subsets
of that set. The symbol for the power set of A is ℘A.
℘{a, b, c} = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
It is always the case that A ⊆ A and ∅ ⊆ A, so the empty set and the set itself are always
elements of the power set. The name “power set” comes from the fact that if A has n elements, ℘A
has 2ⁿ elements.
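The power set can be computed mechanically; a short sketch in Python built on itertools.combinations (the function name power_set is our own):

```python
from itertools import combinations

def power_set(s):
    # every subset of s, from ∅ up to s itself
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]

subsets = power_set({'a', 'b', 'c'})
assert len(subsets) == 2 ** 3        # if A has n elements, ℘A has 2^n
assert set() in subsets              # ∅ is always a subset
assert {'a', 'b', 'c'} in subsets    # and so is A itself
```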

Calculate the power set of the following.

11 {a, b}

12 {∅, a, b}

13 {a, b, c, d}

6.2 The Logicist Project

Frege and the beginnings of Logicism
Simple mathematical statements (e.g., 2+2=4) may be epistemically basic. That is, we could prove
them from axioms, but those axioms are no more obvious than the statement we’re trying to prove
from them. If our only interest was being certain that the statements are true, we would have no
need of an axiom system.

Not all mathematical statements are that obvious, however. Some statements have taken great
effort to prove. And some statements have taken great effort, but their proofs are either impossible
or have escaped us so far. Consider any complex mathematical statement. How do we know it’s
true? Well, there’s a proof, but that proof only shows that the statement follows from our axioms or
assumptions. How do we know that the proof really guarantees the results? How do we know that
our assumptions are true, and that the mathematical reasoning we use preserves truth?
One idea is that mathematics is simply logic. That is, all the axioms of mathematics are just
theorems of FOL, or possibly FOL with an additional axiom or two. This idea is called logicism,
and the attempt to prove that mathematics is logic is a fascinating chapter in the history of human
thought.
We begin the chapter with Gottlob Frege, a German mathematician and philosopher who was
dissatisfied with the theories of his contemporaries about logic and the foundations of mathematics.
The logic of the nineteenth century involved very intricate elaborations of syllogism, but was unable
to explain much of mathematical reasoning. Frege began a new foundation.
We have already considered logical axioms that are similar to those Frege used. Frege also used
an axiom of set theory:

∃X∀x(x ∈ X ↔φx)

In words, any property determines a set. Or, less concisely, given any property—the property
of being an acrobat, or the property of being a bohemian acrobat, or the property of being a milk-
drinking Olympic curler—there is a set that has as its elements all and only the things with that
property.
With this axiom, we can define all the symbols of set theory, and hence the entire system of
class algebra developed by other nineteenth-century logicians. With that, Frege was able to define
the basic notions of arithmetic. Frege’s definitions have never been surpassed in their philosophical
insight and ingenuity. We won’t go through his definitions here, because Frege’s system has a serious
problem: it is inconsistent.

Russell’s Paradox
None of Frege’s brilliant work received much attention in Germany, and it might have passed into
oblivion had it not been for Bertrand Russell, an English logician and philosopher, who discovered
Frege’s writings around the turn of the century. Russell had been working on the same problems
and had discovered most of these definitions independently. He had discovered a difficulty that was
giving him trouble. When he read Frege seriously, he recognized Frege’s profundity and originality
and agreed with Frege’s views on the relation between mathematics and logic. He found, however,
that Frege had not noticed the difficulty. The difficulty was this: Frege’s axiomatic system was
inconsistent.
Recall Frege’s axiom of set theory:

∃X∀x(x ∈ X ↔φx)

In Frege’s system, this was known as Rule V and it appeared near the beginning of the first
volume of the Grundgesetze. Rule V seemed so obvious that, before the twentieth century, no one
questioned it.
This axiom would solve the problem we confronted at the end of the previous section—for any
property, this axiom would assure us of the existence of the set of things with that property. It would
then remain to show only that the set is unique (which generally follows quite easily from SA1), and
we could confidently employ the braces notation introduced in the final paragraphs of Section 6.1.
However, Bertrand Russell identified a predicate that could not determine a set, thereby exposing a
devastating flaw in Frege’s system.
In symbols, Russell’s paradox can be put succinctly.
1 ∃X∀x(x ∈ X ↔φx)
2 ∃X∀x(x ∈ X ↔x ̸∈ x) 1 UI (instantiating ξ ̸∈ ξ for φ)
3 ∀x(x ∈ n ↔x ̸∈ x) 2 (ei)
4 n ∈ n ↔n ̸∈ n 3 UI
5 p &∼p 4 TF
6 p &∼p 2,3-5 EI
Russell wrote to Frege to inform him of the contradiction. Frege responded like this:

Your discovery of the contradiction caused me the greatest surprise and, I would almost
say, consternation, since it has shaken the basis on which I intended to build arithmetic.
… It is all the more serious since, with the loss of my Rule V, not only the foundations
of my arithmetic, but also the sole possible foundations of arithmetic, seem to vanish.

Later, Russell commented on Frege’s response:

As I think about acts of integrity and grace, I realise there is nothing in my knowl-
edge to compare with Frege’s dedication to truth. His entire life’s work was on the
verge of completion, much of his work had been ignored to the benefit of men infinitely
less capable, his second volume was about to be published, and upon finding that his
fundamental assumption was in error, he responded with intellectual pleasure clearly
submerging any feelings of personal disappointment. It was almost superhuman and a
telling indication of that of which men are capable if their dedication is to creative work
and knowledge instead of cruder efforts to dominate and be known.

Various logicians and mathematicians, beginning with Russell himself, sought to circumvent
this paradox by rejecting one or another step in the proof. Russell’s solution is called the Theory of
Types. According to this theory, there are various levels or types of entity: basic entities, sets of basic
entities, sets of sets of basic entities, and so on. The membership relation ∈ holds only between an
entity at one level and one at the next level up. Thus no sets can be members of themselves, and the
sentence ‘x ̸∈ x’ in line 2 is rejected as ungrammatical. Other theories, like the theory of Zermelo
and Fraenkel, which is treated in this book, revise Frege’s Rule V with a series of axioms.

6.3 Zermelo-Fraenkel Set Theory

The ZF Axioms
For most mathematicians, “set theory” means the axiomatic set theory developed by Zermelo and
Fraenkel and others. The basic idea of this set theory is that it is not a general theory of any sets
whatever. It is a theory of a specific hierarchy of sets. The elements of any set are themselves sets.
The axioms are listed below. They will be explained one at a time in the sections below.
Extension A = B ↔∀x(x ∈ A ↔x ∈ B)
Separation ∃x∀y(y ∈ x ↔(y ∈ A &φy))
Pairing ∀xy∃z(x∈z &y∈z)
Union ∀z∃x∀y(y ∈ x ↔∃w(y ∈ w &w ∈ z))
Infinity ∃x(∅ ∈ x &∀y(y ∈ x →y′ ∈ x))
Power Set ∀x∃y∀z(z ∈ y ↔z ⊆ x)

Extension and Separation

The axiom of extension tells us that two sets are identical if they have the same members. A little
more precisely, it tells us that the criterion for identity of sets is identity of membership. It doesn’t
matter what property we specified to gather together just these things; the only thing that matters
is the elements of the set. To use a famous example, consider the set of animals that have hearts,
and the set of animals that have kidneys. Now clearly having a heart is a different thing from having
a kidney, so we have picked out different properties. (‘Having a heart’ and ‘having a kidney’ have
different intensions.) But it may happen that whatever animals have hearts also have kidneys and
vice versa. If this is the case both properties have picked out the same set. (‘Having a heart’ and
‘having a kidney’ have the same extension.) There is just one set that could equally well be specified
by listing all the members, or by the property of having a heart, or by the property of having a
kidney:

{Kermit the frog, Shasta the liger, Beyonce the human, ...}
{x: x has a heart}
{x: x has a kidney}

The first way of specifying a set, by listing all its members (separated by commas, surrounded
by braces) is fine if there are only a few elements of a set. But for a large set, we can’t list every item,
so we have to resort to ellipses. And ellipses are unhelpful unless we also have a rule to inform us
how to fill them out. The second way of spelling out a set—with a variable, then a colon, then a
property, surrounded by braces—tells us what property to use in filling out the set. But the axiom
of extension tells us that a difference in property doesn’t automatically make a difference in sets.
We can define these notations formally:
Def { } w ∈ {x} ↔ w = x
Def , w ∈ {x, y} ↔ w = x v w = y
Def : y ∈ {x : Xx} ↔ Xy
As a reminder, the universe of discourse of ZF set theory is sets. That means that all quantifiers
range only over sets, that all properties are properties of sets, that all terms refer to sets. It means
that ZF set theory doesn’t really allow talk of sets of creatures with kidneys, since creatures aren’t
sets. So even though the axiom listed above distinguished between the sets (with capital letters) and
the elements (with lower-case letters), officially it’s sets all the way down.
So, to use an example with numbers (which we take to be sets): consider the set of all even
primes, and consider the set of all square roots of 4. Here are three ways of specifying this set:
{2}
{x : x is even & x is prime}
{x : x = √4}
These properties used to specify this set may be different in intension, but they all pick out the
same set, the set containing only the number 2.
Aside: Strictly speaking, we need only the right-to-left part of the biconditional. The left-to-right
half follows from the axioms of identity. If x and y are identical, the axioms tell us, they have every
property in common, including their membership. It’s the right-to-left half that tells us something
more restrictive about sets.
The Axiom of Separation (also called Comprehension, Abstraction, or its German name Aus-
sonderung) is the revision of Frege’s Rule V. As we have seen, in contrast to Frege’s Rule V, this
axiom forces us to draw the elements for each new set from members of earlier sets. If we already
know that the set {John, Mary, Susan} exists, this allows us to draw from that set to form a new set.
We call it ‘Separation’ because it allows us to separate out a new set from an old one.
Def ⊆ A ⊆ B :: ∀x(x ∈ A →x ∈ B)
Def ⊂ A ⊂ B :: A ⊆ B &A ̸= B
ZF1 (A ⊆ B &B ⊆ C) →A ⊆ C
ZF2 (A ⊆ B &B ⊆ A) →A = B
ZF3 A ⊆ A
ZF4 A ̸⊂ A
*ZF5 A ⊂ B →B ̸⊂ A
ZF6 x ̸∈ A →(x ∈ A ↔x ̸= x)
ZF1 tells us that the subset relation is transitive; ZF3 tells us that it is reflexive. (Is it symmetric?)
ZF4 tells us that the proper-subset relation is irreflexive, and ZF5 tells us it is asymmetric. (What
about transitive?) ZF6 should be obvious, if a trifle odd. It claims that if x is not a member of a given
set, then it is a member of that set only if a particular contradiction is true. It will be handy in the
next section.
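Python's `set` type happens to implement ⊆ as `<=` and ⊂ as `<`, so ZF1 through ZF5 can be spot-checked on small examples (instances, of course, not proofs):

```python
A, B, C = {1}, {1, 2}, {1, 2, 3}

assert A <= B and B <= C and A <= C      # ZF1: subset is transitive
assert {1, 2} <= {2, 1} and {2, 1} <= {1, 2} and {1, 2} == {2, 1}  # ZF2
assert A <= A                            # ZF3: subset is reflexive
assert not A < A                         # ZF4: proper subset is irreflexive
assert not (A < B and B < A)             # ZF5: proper subset is asymmetric
print("ZF1 through ZF5 hold on these instances")
```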

The Empty Set

So far we are able to extract sets from larger sets. But how do we get these larger sets in the first
place? And, relatedly, how do we know that there are any sets?

We can’t take this for granted. Because we’re developing set theory axiomatically, we can’t
simply say that it’s obvious that there are sets, that everyone knows there are sets. If it’s going to play
a role in the proofs, we need to assume the existence of sets.
ZF has one existential axiom: the Axiom of Infinity. This assumes the existence of an infinite set,
which, by Separation, we can break off into smaller sets. It is an interesting and powerful axiom,
and we’ll discuss it at greater length in just a moment. For now, all we need from the Axiom of
Infinity is that there is at least one set.
With that assumption we can prove the existence of another useful and interesting set: the empty
set. Let A name the set that the Axiom of Infinity guarantees to exist. Then we can prove the
existence of a set that has no members:
1. ∃x∀y(y ∈ x ↔ (y ∈ A & y ≠ y))  Separation (φy: y ≠ y)
2. ∀y(y ∈ x ↔ (y ∈ A & y ≠ y))  1 ei (x)
3. y ∈ x ↔ (y ∈ A & y ≠ y)  2 UI
4. y ∈ A → (y ∈ x ↔ y ≠ y)  3 TF
5. y ∈ A → ∃x∀y(y ∈ x ↔ y ≠ y)  4 FOL
6. y ∉ A → (y ∈ A ↔ y ≠ y)  ZF6
7. y ∉ A → ∃x∀y(y ∈ x ↔ y ≠ y)  6 FOL
8. ∃x∀y(y ∈ x ↔ y ≠ y)  5,7 TF
9. ∃x∀y(y ∈ x ↔ y ≠ y)  8 EI
This proof tells us to separate off from the Infinity set a set according to the following rule: pick
only those members that are not self-identical. Because, of course, there are no such things, we are
guaranteed to have a set that is empty. This empty set (sometimes called the null set) has a special
symbol: ∅.
This may seem strange. Maybe the strangeness can be expressed like this: If a set is a collection
of things, then if there are no things to collect, there’s no set! It may help to visualize a set as an
imaginary box containing various things (e.g., the current members of the U.S. Senate, the solar
planets, the numbers). If the box happens to have nothing in it, the box doesn’t thereby disappear.
The empty set is just such an empty box.
Because sets are determined by their members, there is only one empty set. The set of all
dragons, the set of all married bachelors, the set of all honest knaves—each of these descriptions
picks out just this one set with nothing in it.
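The uniqueness claim can be pictured with set comprehensions: separating from different sets by different impossible properties always yields the same result. The particular starting sets here are arbitrary stand-ins:

```python
numbers = {0, 1, 2, 3}            # stand-ins for sets we already have
knights = {"Lancelot", "Galahad"}

dragons = {x for x in numbers if x != x}           # nothing is non-self-identical
married_bachelors = {x for x in knights if x != x}

print(dragons == married_bachelors == set())  # True: just one empty set
```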
ZF7 ∀xyz[((x ∈ z ↔x ̸= x) &(y ∈ z ↔y ̸= y)) →x = y]
ZF8 ∀x(x ∈ ∅ ↔x ̸= x) ↔∀x(x ̸∈ ∅)
ZF7 tells us that there is at most one empty set. Together with the proof above, we know that
there is exactly one empty set, that the empty set is unique. ZF8 tells us that the way we defined
the empty set above is equivalent to a simpler way. We will adopt this simpler way as a theorem.
Practically, this theorem will be more useful in our proofs than the earlier definition. We will cite
this theorem as ’Empty Set’.
ZF9 Empty Set x ̸∈ ∅
*ZF10 ∅ ⊆ x

Pairing, Union, Intersection

We have an axiom (Separation) that allows us to make sets out of bigger sets, and we have the empty
set. It would be nice to be able to have an axiom that allows us to put sets together to make bigger
sets. And that’s what the Pairing Axiom allows us to do.
The Pairing Axiom tells us that given any two sets, there’s a set they both belong to. (More
precisely, given any sets A and B, which may be the same set, there’s a set C such that A∈C and
B∈C.) In symbols:
Pairing ∀xy∃z(x∈z &y∈z)
From this axiom we can prove that, given any two sets, there is a set just they belong to—given,
for example, A and B, there is a set {A,B}:
ZF11 ∀x∀y∃z∀w(w ∈ z ↔ (w = x ∨ w = y))
In fact, this theorem is equivalent to the Pairing Axiom:
ZF12 ∀xy∃z(x ∈ z & y ∈ z) ↔ ∀x∀y∃z∀w(w ∈ z ↔ (w = x ∨ w = y))
This axiom allows us to introduce the brace notation used above. So we have the following:
Def {,} w ∈ {x,y} :: w = x ∨ w = y
If x = y, we write {x} in place of {x,x}. Thus,
Def {} w ∈ {x} :: w = x
Prove the following theorems:
ZF13 {x,y} = {y,x}
ZF14 ∀z∀w∃x∀y(y ∈ x ↔ (y ∈ z & y ∈ w))
ZF14 says that, given any two sets, there is a set consisting only of the elements common to both
sets. That is, given any two sets, their intersection is also a set. We can define intersection formally:
Def ∩ x ∈ A ∩ B :: x ∈ A & x ∈ B
In words, x is an element of the intersection of A and B if and only if x is an element of A and
an element of B. (You may want to review the intuitive presentation of intersection above to make
sure this definition makes sense to you.)
ZF15 A ∩ ∅ = ∅
*ZF16 A ∩ B = B ∩ A
ZF17 (A ∩ B) ∩ C = A ∩ (B ∩ C)
ZF18 A ∩ A = A
ZF19 (A ⊆ B &A ⊆ C) →A ⊆ (B ∩ C)
We can generalize the concept of intersection so that it applies not just to two sets, y and z, but
to any number of sets, that is, to any set of sets. Suppose A is a set of sets; then ‘x is an element of
the intersection set of A’ means that x is an element of every set that is an element of A. In symbols:
Def ∩ x ∈ ∩A :: ∀z(z ∈ A → x ∈ z)
Of course we can use this definition only after proving the existence and uniqueness of the
intersection set of a given set. The next theorem guarantees this.

ZF20 ∀A∃B∀x(x ∈ B ↔ ∀z(z ∈ A → x ∈ z))

ZF21 ∩{A,B} = A ∩ B
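A rough executable counterpart of Def ∩ for the generalized intersection, using `functools.reduce` over a Python set of frozensets; ZF21 then checks out on an instance. A sketch only:

```python
from functools import reduce

def big_intersection(A):
    """The intersection set of A: everything belonging to every element of A.
    Defined only for nonempty A (with A empty the condition would be
    vacuously true of everything, and there is no universal set)."""
    return reduce(lambda x, y: x & y, A)

A = frozenset({1, 2, 3})
B = frozenset({2, 3, 4})
print(big_intersection({A, B}) == A & B)  # True, as ZF21 says
```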
Next we want to introduce the concept of the union of two or more sets. Some set z is in the
union of x and y if z is in x or z is in y. This concept is parallel to the concept of intersection just
discussed, but it needs a new axiom. The axiom of Separation allows us to make smaller sets out
of sets we already have (licensing intersections) but does not allow us to make bigger sets. We need
a new axiom that allows us to do this. We’ll pick a general axiom that allows us to make arbitrary
unions, and then define pairwise unions as a special case:
Union ∀z∃x∀y(y ∈ x ↔ ∃w(y ∈ w & w ∈ z))
As always, uniqueness follows easily from SA1, and we introduce a new symbol ‘∪’, read union.
Here is the definition:

Def ∪ x ∈ ∪A :: ∃z(x ∈ z & z ∈ A)
Prove the following theorems:
*ZF22 ∪{A} = A
ZF23 ∪∅ = ∅
ZF24 x ∈ ∪{y,w} ↔ (x ∈ y ∨ x ∈ w)
ZF24 makes possible the following definition:
Def ∪ x ∈ y ∪ w :: x ∈ y ∨ x ∈ w
This is a special case of union—the union of exactly two sets. It corresponds to the special case
of intersection ∩ above.
Prove the following theorems:
ZF25 ∪{A,B} = A ∪ B
ZF26 A ∪ ∅ = A
ZF27 A ∪ B = B ∪ A
ZF28 (A ⊆ B ∨ A ⊆ C) → A ⊆ (B ∪ C)
ZF29 A ∪ A = A
ZF30 A = B → ∪A = ∪B
ZF31 ∪A ∪ ∪B = ∪∪{A,B}
*ZF32 ∪(A ∪ B) = ∪A ∪ ∪B
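The generalized union likewise has a direct finite analogue. Here ZF22, ZF23, and ZF25 are spot-checked, again using frozensets so that sets can contain sets:

```python
from functools import reduce

def big_union(A):
    """The union set of A: everything that is an element of some element of A."""
    return reduce(lambda x, y: x | y, A, frozenset())

A = frozenset({1, 2})
B = frozenset({2, 3})
print(big_union({A}) == A)               # *ZF22: the union of {A} is A
print(big_union(set()) == frozenset())   # ZF23: the union of the empty set is empty
print(big_union({A, B}) == A | B)        # ZF25: the union of {A,B} is A union B
```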
Notice that in Zermelo-Fraenkel set theory, absolute complements cannot be defined since the
absolute complement of the null set would be the universal set which, as we have seen, cannot exist.
However, in this system, there is what is called a relative complement. This consists of the set of all
those entities that are in one set but not in a second set. This can be thought of as the complement
of the second set within the first set. Given sets x and y, the existence and uniqueness of the relative
complement of y within x are ensured by SA2 and SA1 respectively. The symbol that is used for a
relative complement is ‘–’ and here is its definition:
Def – z ∈ x – y :: z ∈ x &z ̸∈ y
Prove the following theorems:

ZF33 A – ∅ = A
ZF34 A – A = ∅
ZF35 ∅ – A = ∅
ZF36 A ∩ B = ∅ →A – B = A
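Python's set difference is exactly the relative complement, so ZF33 through ZF36 can also be sampled on instances:

```python
A, B = {1, 2, 3}, {3, 4}

print(A - set() == A)          # ZF33: A – ∅ = A
print(A - A == set())          # ZF34: A – A = ∅
print(set() - A == set())      # ZF35: ∅ – A = ∅
print(A & {4, 5} == set() and A - {4, 5} == A)  # an instance of ZF36
```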
We have seen in ZF13 that {x,y} = {y,x}. So when picking out a set by listing the elements inside
braces, the order of the listed items is not relevant. But since not all relations are symmetric, we
want to distinguish between Mxy and Myx—one may be true while the other is false (x may be the
mother of y, but if so y is not the mother of x). As in this case, it often happens that the order of
terms matters. How can the concept of order be captured in set theory? The usual approach is by
introduction of what is called an ordered pair. Here is the definition:
Def <,> <x,y> :: {{x}, {x,y}}
Prove the following theorem:
ZF37 <x,y> = <u,w> ↔(x = u &y = w)
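The Kuratowski definition can be coded with nested frozensets, and the point of ZF37, that the order of the components is recoverable, shows up immediately. A sketch:

```python
def pair(x, y):
    """<x,y> :: {{x}, {x,y}} (the Kuratowski ordered pair)."""
    return frozenset({frozenset({x}), frozenset({x, y})})

print(pair(1, 2) == pair(1, 2))   # True: componentwise-equal pairs are equal
print(pair(1, 2) == pair(2, 1))   # False: unlike {1,2} = {2,1}, order matters
print(pair(1, 1) == frozenset({frozenset({1})}))  # True: <x,x> collapses to {{x}}
```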
Suppose D is some set and R is an equivalence relation defined on D. We know that R partitions
D into equivalence classes such that each member of any one of these classes stands in the relation R
to any other member of that class. We use square braces to identify the equivalence class determined
by x, which is any one particular element of D. Thus, within the universe of discourse D:
Def [ ]R [x]R :: {y : Rxy}
ZF38 x ∈ [x]R
ZF39 (y ∈ [x]R &w ∈ [x]R )→Rwy
ZF40 (y ∈ [x]R &y ∈ [w]R ) →[x]R = [w]R
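Here is the bracket notation run on a small universe, taking ‘same remainder mod 3’ as the equivalence relation R (a hypothetical example, not from the text):

```python
def eq_class(x, domain, R):
    """[x]_R :: the set of y in the universe `domain` such that Rxy."""
    return frozenset(y for y in domain if R(x, y))

D = range(9)
R = lambda x, y: x % 3 == y % 3

print(eq_class(4, D, R) == frozenset({1, 4, 7}))   # True
print(4 in eq_class(4, D, R))                      # ZF38: x is in [x]_R
print(eq_class(1, D, R) == eq_class(7, D, R))      # ZF40: overlapping classes coincide
```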
We require one final set-theoretic function. Here is the axiom on which this function rests:
Power Set ∀x∃y∀z(z ∈ y ↔z ⊆ x)
Once again, it is possible to prove that the set y is unique. The symbol that we now introduce is
‘℘’ which is read “the power set of.” Here is the definition:
Def ℘ x ∈ ℘A :: x ⊆ A
Given some set, the power set of that set is the set comprising all the subsets of the given set. As
we will see, this is a rich concept with many consequences. As usual, we will prove a few theorems
to become familiar with this concept.
Prove the following theorems:
ZF41 A ∈ ℘A
ZF42 ∅ ∈ ℘A

*ZF43 ∩{x : x ∈ ℘A} = ∅
ZF44 A ⊆ B ↔℘A ⊆ ℘B
ZF45 ℘A ∪ ℘B ⊆ ℘(A ∪ B)
ZF46 ℘(A ∩ B) = ℘A ∩ ℘B

ZF47 ∪℘A = A

ZF48 A ⊆ ℘∪A
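A power-set function over frozensets lets us sample ZF41, ZF42, and the familiar 2^n count of subsets. A sketch using `itertools.combinations`:

```python
from itertools import combinations

def powerset(A):
    """The power set of A: the set of all subsets of A."""
    A = frozenset(A)
    return frozenset(frozenset(c) for r in range(len(A) + 1)
                     for c in combinations(A, r))

A = frozenset({1, 2})
P = powerset(A)
print(len(P) == 2 ** len(A))   # True: a set of n elements has 2**n subsets
print(A in P)                  # ZF41: A is an element of its own power set
print(frozenset() in P)        # ZF42: the empty set is an element of every power set
```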

6.4 Cantor’s Theory of Transfinite Numbers

We will now digress to consider some ramifications of the power set axiom. Set theory was developed
by Georg Cantor in his attempt to understand infinity, and some of the most interesting results of
set theory come from infinite sets.
We begin by defining ‘equinumerous’. Roughly, two sets are equinumerous if they have the same
number of elements. For example, the sets {a,b,c} and {d,e,f} are equinumerous. This rough definition
uses the concept of number, and sometimes (like when we talk about infinity) the concept of number
is the very thing we’re trying to understand. So instead, we’ll say that sets A and B are equinumerous
if there’s a one-to-one correspondence between the two sets. We can understand that as a special
kind of function, or just a pair of lists with arrows going between them, like this:
a   b   c
↕   ↕   ↕
d   e   f
There is a one-to-one correspondence if there is an arrow from every member on one list to
exactly one member on the other list. Said another way, sets are equinumerous if, for each member
of either, there is a unique member of the other. (Another way to say that two sets are equinumerous
is to say that they have the same cardinality.) To show that two sets are equinumerous, we put the sets
into a one-to-one correspondence. The question before us is this: are all infinite sets equinumerous?
Obviously, no finite set is equinumerous with any of its proper subsets. For example, {a, b, c}
is not equinumerous with any of its proper subsets: {a, b}, {a, c}, {b, c}, {a}, {b}, {c}, ∅. This is
because {a, b, c} has “more elements” than any of its proper subsets. However, infinite sets can
be equinumerous with their proper subsets. Indeed, this fact can be used to define ‘infinite set’:
an infinite set is one that is equinumerous with some of its own proper subsets. For example, the
set of natural numbers (N) is equinumerous with many (indeed infinitely many) of its own proper
subsets. It is easy to see that N is equinumerous with the odd numbers: we just show a one-to-one
correspondence between the natural numbers and the odd numbers. The most obvious way to do
this is like this:
0 1 2 3 4 5 6 7 8 9 10 …
1 3 5 7 9 11 13 15 17 19 21 …
Here we assign each odd number x to the natural number (x – 1)/2. So there are exactly as many
odd numbers as there are natural numbers. Strange!
This strangeness is just the strangeness of infinity you’re already familiar with. This strangeness
is obvious at Hilbert’s Hotel. Hilbert’s Hotel has infinitely many rooms, and they’re all filled. A

weary traveler comes to the office and asks for a room. The clerk at the desk tells him to wait a
moment, and he’ll make one available. He has each lodger move down one room, making the first
room now vacant. Next, a bus pulls up with infinitely many passengers. The clerk tells them to wait
a moment, and has the lodger in room 1 move over a room (leaving room 1 vacant), the lodger in
room 2 move over two rooms into room 4 (leaving room 3 vacant), the lodger in room 3 move over
three rooms, and so on. Now every other room is vacant, and all the travelers on the bus have a
room. There’s always room at Hilbert’s Hotel.
Hilbert’s Hotel is, of course, just a vivid way of imagining putting the numbers in a one-to-one
correspondence. A bus with infinitely many people filling in only the odd-numbered rooms is the
same thing as putting the odd numbers into a one-to-one correspondence with the natural numbers.
This can be done with any infinite subset of the natural numbers: the multiples of ten (or of any other
number), the set of perfect squares, the members of ℕ greater than 1,000, and so forth. Each of
these is a proper subset of ℕ, and yet is equinumerous with ℕ. Moreover, ℕ itself is equinumerous
with other sets that include ℕ as a proper subset. For example, ℕ is equinumerous with the integers
(ℤ) even though ℤ includes all the positive and negative integers. Here we arrange each negative
number after its corresponding positive number, and then match them up with the natural numbers:
0 1 –1 2 –2 3 –3 4 –4 5 –5 …
ℕ is also equinumerous with the positive improper fractions. It takes a little more ingenuity to
match up these numbers with the natural numbers, but it can be done. We first put all the rational
numbers on a grid, and then determine an orderly path through the grid so that every number gets
counted exactly once. One way to do it is like this:
1/1 2/1 3/1 4/1 5/1
1/2 2/2 3/2 4/2 5/2
1/3 2/3 3/3 4/3 5/3
1/4 2/4 3/4 4/4 5/4
1/5 2/5 3/5 4/5 5/5
1/6 2/6 3/6 4/6 5/6
With a similar approach, we can show that ℕ is equinumerous with the proper fractions, and
with the rationals (ℚ), which include all fractions positive and negative. Indeed, ℕ is equinumerous
with what appears to be an even larger set: the set of all numbers that can be represented as roots of
polynomial equations (𝔸). This set goes beyond ℚ by including irrational numbers (e.g. the square
root of two). 𝔸 has a proper subset equinumerous with ℕ (and therefore with ℤ and ℚ), but
even 𝔸 is equinumerous with ℕ. Strange!
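The grid walk can be written as a generator: run along the diagonals where p + q = n, skipping fractions already seen (2/2 repeats 1/1), so each positive rational is counted exactly once. A sketch of one such path:

```python
from fractions import Fraction
from itertools import islice

def positive_rationals():
    """Enumerate the positive rationals by diagonals of the p/q grid."""
    seen = set()
    n = 2
    while True:
        for p in range(1, n):       # the diagonal where p + q = n
            f = Fraction(p, n - p)
            if f not in seen:       # skip duplicates like 2/2, 2/4, ...
                seen.add(f)
                yield f
        n += 1

print(list(islice(positive_rationals(), 6)))
# [Fraction(1, 1), Fraction(1, 2), Fraction(2, 1), Fraction(1, 3), Fraction(3, 1), Fraction(1, 4)]
```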
Or maybe not so strange. All of this is just saying ∞ + 1 = ∞, ∞ + ∞ = ∞, and so on,
and that is old hat. What is truly strange is that some infinite numbers are larger than others.
There are infinite sets that are not equinumerous with ℕ, ℤ, ℚ, or 𝔸. The real numbers (ℝ)
include subsets equinumerous with all the sets of numbers mentioned in the previous paragraph
but they also include what are called transcendental numbers. These numbers are not elements
of 𝔸—they cannot be represented as roots of polynomial equations. Most of us are familiar with
only one such number: π (pi), the ratio of the circumference to the diameter of a circle. However,
while one seldom encounters transcendental numbers, they are vastly more numerous than any of

the numbers mentioned in the preceding paragraph. Once the transcendental numbers are added
to 𝔸 one obtains ℝ, a set not equinumerous with ℕ, ℤ, ℚ, or 𝔸—in this sense ℝ is a larger
infinite set. But how can we prove that ℝ is not equinumerous with ℕ?
Cantor’s diagonal proof establishes this result. In discussing the proof, rather than talking about
all real numbers, we focus only on those between 0 and 1 (this set turns out to be equinumerous with
the set of all real numbers). We will write these numbers as decimals. For example, we can write
1/2, 1/3, 2/3, 1/4, 3/4 etc. as .5, .333. . ., .666. . ., .25, .75, etc. Of course π will not be included
since it is greater than 1, but the decimal part of it will be: .14159265 . . . We write terminating
decimals (for example, .5 in contrast to .333. . .) with 0’s in all the decimal places following their
termination (for example, instead of .5 we write .500 . . .). Now suppose the real numbers between
0 and 1 are equinumerous with ℕ. This hypothesis leads to a contradiction and so must be false.
The natural numbers can be written in a column:
1
2
3
⋮
If, by hypothesis, ℝ is equinumerous with ℕ, there must be a way of arranging the real numbers
in a column matching the column of natural numbers. Suppose we try to construct such an array
comprising all the real numbers between 0 and 1. It may begin like this:

1 .500000000 . . .
2 .798622222 . . .
3 .141592653 . . .
4 .250000000 . . .
5 .333333333 . . .
6 .999999999 . . .
7 .183183183 . . .
8 .718281828 . . .
.. ..
. .

If these sets are equinumerous, every real number between 0 and 1 must be somewhere in the
list at the right. But we can construct real numbers between 0 and 1 that are nowhere in the list
(regardless of how the list is generated or how long it may be). This proves that the hypothesis is false.
Here is one way of constructing such a number: Proceed down the diagonal of the digits in
our supposedly exhaustive list. In each case, find the nth digit of the nth number (first digit of the

first number, second digit of the second number, third digit of the third number, etc). If that digit
is 0 through 8, the nth digit in the number we are constructing will be one greater; if that digit is
9, the nth digit in the number we are constructing will be 0. What does this mean in terms of our
supposedly exhaustive list? Start with the first number in the list. Its first digit is 5 so the first digit of
the number we are constructing will be 6. The second digit of the second number is 9 so the second
digit of the number we are constructing will be 0. The third digit of the new number will be 2, and
so on. Proceeding in this way, we construct this number:
.60214023 . . .
But this number must differ from every number in the (supposedly exhaustive) list. In general it
will differ from the nth real at least in its nth digit. This means that our enumeration is not complete
and so the hypothesis that led to the enumeration is false. That is, ℝ is not equinumerous with ℕ.
ℝ has a property ℕ doesn’t have: density. Density means that between any two numbers
there is another number: ∀xy(x>y→∃z(x>z&z>y)). But ℚ is also dense, and it is equinumerous
with ℕ, so density is not the property that gives ℝ greater cardinality than ℕ. That property
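The diagonal construction is mechanical enough to code. Given any finite prefix of a supposed enumeration (here as digit strings, dropping the leading point), we bump each diagonal digit by one, rolling 9 over to 0; the result disagrees with every listed number:

```python
def diagonal(rows):
    """Return a decimal that differs from rows[n] in its (n+1)-th digit."""
    return "." + "".join("0" if r[n] == "9" else str(int(r[n]) + 1)
                         for n, r in enumerate(rows))

listed = ["500000000", "798622222", "141592653", "250000000",
          "333333333", "999999999", "183183183", "718281828"]
d = diagonal(listed)
print(d)  # .60214023
# The constructed number disagrees with every row at the diagonal digit:
print(all(d[1 + n] != r[n] for n, r in enumerate(listed)))  # True
```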
is continuity. Continuity was defined by Dedekind, and his definition is given below (*); for now it
will be enough to understand it intuitively. Density allows infinitesimal gaps, but continuity doesn’t.
Here’s a way to picture it: Construct, in your imagination, a dense line segment. Begin by placing
two points, then a point in between, then a point in between each of those, and so on infinitely. At
two different stages, the line will look like these:
Now imagine you have a subtle sword, a blade of infinitesimal breadth, and you slice through
the line without touching any of the points on the line. Such a feat is obviously possible on the two
non-dense line segments above—the sword need not even be all that subtle. As the number of points
increases to infinity, the distance between them shrinks, but there is always a gap large enough for
a sword of infinitesimal breadth. If the line were continuous, it would present a solid surface with
no gaps for the subtle sword, and any slice would pass through a point rather than between them.
(The sword is Dedekind’s; we’ll see it again later, as this is precisely the way to define real numbers.)
The line of real numbers is not just dense, it is continuous, and so the cardinality of ℝ is called the
cardinality of the continuum.
This proves that there is more than one infinite number. Hence it turns out that ‘infinite’ is a
misleading term to use. ‘Infinite’ is a negative term, it negates finitude, so some have thought infinity
to be a negative property, like the property of being not-green. (And there is no such thing as infinity,
any more than there is such a thing as a not-elephant; there is only the negative property.) But since
there is more than one infinity, infinity is more than a simple negation of finitude. Cantor introduced
the word ‘transfinite’: a transfinite number is a real number, as real as 17 or 254, but it transcends all
finite numbers. He also introduced names for these numbers: ℵ0 (‘aleph-null’ or ‘aleph-naught’; ℵ
is the first letter of the Hebrew alphabet) is the number that is the cardinality of ℕ, and they go on
from there: ℵ1, ℵ2, ℵ3, ….
Cardinals and ordinals. (This paragraph is another digression. While this distinction is impor-
tant for a lot of philosophy and mathematics, it won’t show up again in our story, so you can skip
this paragraph without loss.) We defined the numbers in terms of their order: 0 is the number that
doesn’t follow anything, 5 is the number that follows 4, and so on. We determined their order by

their cardinality, and this by the relation of equinumerosity so there would be no circularity. So or-
der and cardinality are properties that coincide for all numbers. Well, all finite numbers. It turns out
that for transfinite numbers, order and cardinality are different things. So in fact there are two sets of
transfinite numbers: the transfinite cardinals and the transfinite ordinals. The transfinite cardinals
we’ve already seen. The transfinite ordinals work differently. The first transfinite ordinal, which is
the ordered set of all natural numbers in their standard order, is ω (‘omega’, the last letter of the
Greek alphabet). But these numbers could have been ordered differently: how about <0,1,2,3,…,
c>? Here we have the set ω as usual, but then another number, c, greater than all of them. This is
the number ω+1, and ω+1̸=ω. In fact, ω+1̸=1+ω, since putting the new number at the beginning
of the sequence is different from putting it at the end. Thus, transfinite ordinal arithmetic is different
from transfinite cardinal arithmetic. We’re going to be interested only in transfinite cardinals.
Cantor’s Diagonal Proof can be generalized. The Generalized Diagonal Proof establishes that
the power set of any set always has greater cardinality than the given set. This generalizes the
argument we have just discussed since ℝ, the set of real numbers, can be regarded as the power set
of ℕ.
Start with any set, say A. We are mainly interested in infinite sets, but the proof works for finite
sets as well. Form the power set of A, ℘A. Suppose ℘A and A are equinumerous, that is, suppose
each element of ℘A corresponds to an element of A (as we will now see, this supposition leads to a
contradiction and so is false). Form a new set, A′ . The elements of A′ are taken from A (so A′ is a
subset of A and thus an element of ℘A); in particular we form A′ as follows: for any element of A,
if it is not in the element of ℘A that it corresponds to, let it be in A′ and otherwise not. Since A′ is a
subset of A and thus an element of ℘A, by the supposition that we want to prove false, some element
of A must correspond to it. Call this element b. Here is a diagram of what we have so far:
Now ask: Is b is an element of A′ ? If it is, by definition, it can’t be (since A′ includes only
elements of A not in the element of ℘A to which they correspond); but if it isn’t, by definition, it
must be. Thus, b must both be and not be an element of A′ —a contradiction. But what led to this
contradiction? It was our supposition that A and ℘A are equinumerous. So this supposition is false.
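For finite sets the generalized diagonal argument can be checked by brute force: every candidate correspondence f from A into its power set misses the set A′ = {x ∈ A : x ∉ f(x)}. A small sketch:

```python
from itertools import product

def powerset(A):
    subsets = [frozenset()]
    for x in A:
        subsets += [s | {x} for s in subsets]
    return subsets

A = [0, 1]
P = powerset(A)

# Every function f from A into the power set of A misses the diagonal set A'.
for images in product(P, repeat=len(A)):
    f = dict(zip(A, images))
    A_prime = frozenset(x for x in A if x not in f[x])
    assert A_prime not in f.values()   # b would have to be in A' iff not in A'
print("checked all", len(P) ** len(A), "candidate correspondences")
```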
This proof establishes that the power set of any set, finite or infinite, has greater cardinality
(more elements) than the set itself. In particular, ℘ℕ (which is ℝ) must have greater cardinality
than ℕ, ℘ℝ must have greater cardinality than ℝ, and so on. Cantor thought that ℵ1 was the
cardinality of ℝ, but he couldn’t prove it. This hypothesis is known as the Continuum Hypothesis,
and the general version, that ℵ2 is the cardinality of ℘ℝ, and so on, is known as the Generalized
Continuum Hypothesis. It turns out that it is independent of the standard axioms of set theory. That
means ZFC, the set theory we have been using, is not strong enough to prove either the Generalized
Continuum Hypothesis or its negation, so either it or its negation (or some broader axiom that
suffices to prove it or its negation) must be added. Which one? It’s not easy to tell. (Sometimes a
different sequence of letters is explicitly defined as the cardinality of the successive power sets, so
ℶ1 is the cardinality of ℘ℕ, ℶ2 the cardinality of ℘℘ℕ, and so on. The Generalized Continuum
Hypothesis is then the claim that ℶx = ℵx for all x>0.)
Here is a final result of Cantor’s work. Suppose there were a universal set. Then, according
to the generalized diagonal proof, the power set of the universal set would have greater cardinality

than the universal set—that is, it would have greater cardinality than the set that already has ev-
erything in it. But that is a contradiction. This paradox, called Cantor’s paradox, is resolved by
rejecting the idea of a universal set. As we have seen in the process of avoiding Russell’s paradox,
Zermelo-Fraenkel set theory also abandons the universal set. In Paul Halmos’s hyperbolic words:
“We have proved, in other words, that nothing contains everything, or, more spectacularly, there is
no universe.” Halmos goes on to apologize for the hyperbole, but his words are quite literally true.
Of course, if we understand ‘universe’ to mean ‘cosmos’ or ‘all the stars and galaxies and everything
else physics studies’, there may well be a universe. That’s a question for the physicists: is there, in
addition to the galaxies and magnetic fields and whatnot, an object that contains them all? But Can-
tor proved something else. If we understand ‘universe’ to mean ‘the object that contains absolutely
everything’—here not restricting the definition to physical objects, but including also numbers and
other mathematical objects, and possibly (who knows?) much more—we’ve just proved that there is
no such thing.

6.5 Peano’s Axioms

Now we return to Frege’s project. Our present goal is to prove Peano’s axioms and thereby follow,
at least in spirit, the central steps in Frege’s attempt to reduce arithmetic to logic.
As we saw in Section 3.2, one version of Peano’s axioms can be symbolized as follows:
6.5.A N0
6.5.B Nx →Nx′
6.5.C x′ ̸= 0
6.5.D x′ = y′ →x = y
6.5.E (X0 &∀x(Xx →Xx′ )) →∀xXx
As a first step we must define, in strictly logical terms, the undefined terms in these axioms:
‘0’, ‘′ ’, and ‘N’. As we have seen, Frege accepted too liberal a concept of sets in formulating his
definitions and so, in spite of their clarity and inherent plausibility, Frege’s definitions led directly to
Russell’s paradox. Zermelo-Fraenkel set theory adopts different definitions.
Here is one way of defining ‘zero’ and ‘successor’ compatibly with Zermelo-Fraenkel set theory:
Def 0 0 :: ∅
Def ′ x′ :: x ∪ {x}
Thus we define the other numbers like this:
1 :: ∅ ∪ {∅} :: {∅} :: {0}
2 :: 1 ∪ {1} :: {∅} ∪ {{∅}} :: {∅, {∅}} :: {0,1}
3 :: 2 ∪ {2} :: {∅, {∅}, {∅,{∅}}} :: {0,1,2}
We define zero as the null set and the successor of any number as the union of that number with
the set whose only member is that number. These definitions lack the intuitive plausibility of Frege’s
definitions, but they avoid loose talk about sets of all sets and they seem to avoid the paradoxes.
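These von Neumann-style definitions can be mimicked directly with frozensets, which Python requires in order to nest sets inside sets. A sketch of the first few numerals:

```python
def successor(x):
    """x' :: x union {x}"""
    return x | frozenset({x})

zero = frozenset()          # 0 :: the empty set
one = successor(zero)       # {0}
two = successor(one)        # {0, 1}
three = successor(two)      # {0, 1, 2}

# Each number is the set of all smaller numbers...
print(three == frozenset({zero, one, two}))  # True
# ...so the numeral even has the right size.
print(len(three))  # 3
```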
The Zermelo-Fraenkel definition of number is similar to Frege’s—it also defines the natural numbers
as the intersection set of all sets that include 0 and are closed with respect to the relation ‘is a

successor of ’, but it avoids the paradox by drawing these sets only from other sets. However, this
approach requires a preliminary definition and an axiom.
Def Z-inductive x is Z-inductive :: 0 ∈ x &∀y(y ∈ x →y′ ∈ x)
This means that a set is Z-inductive if it includes 0 and is closed with respect to the relation ‘is
a successor of.’
Infinity ∃x(0 ∈ x &∀y(y ∈ x →y′ ∈ x))
This axiom guarantees that there is at least one Z-inductive set. It is sometimes called the axiom of infinity
because it entails the existence of an infinite set. You can probably see that the set of natural numbers
is Z-inductive, but many other sets are also. To take the same fanciful example we encountered
earlier, the set that includes the natural numbers plus the moon is Z-inductive (the moon doesn’t
have a numerical successor). Once again, we must formulate a definition of the natural numbers
that will eliminate such fanciful additions to the set we are interested in. Following Frege, we do this
by taking the intersection of all Z-inductive sets. It is customary to call this set ω (the last letter in
the Greek alphabet).

Def ω ω :: ∩{x : x is Z-inductive}
As you can probably see, ω is the common core of all Z-inductive sets; we can define the phrase
‘x is a number’ by stipulating that x is any element of this core:
Def N Nx :: x ∈ ω
From here on we will simplify things by further restricting the universe of discourse to sets that
are elements of ω, that is, to the natural numbers.
It now remains only to prove Peano’s axioms. Four of the proofs, including the three in the
following exercise set, are quite easy:
Prove the following theorems
ZF49 N0
ZF50 Nx →Nx′
ZF51 0 ̸= x′
Next we will prove 6.5.E. In the notation of set theory it is written like this:
ZF52 (0 ∈ y &∀x(x ∈ y →x′ ∈ y)) →∀x x ∈ y
This axiom of arithmetic (which becomes a theorem in set theory) is called, somewhat mis-
leadingly, the principle of mathematical induction. It is a powerful axiom/theorem that we will use
extensively in proving theorems in arithmetic. However, it follows easily from our definition of number.
The antecedent in ZF52 tells us that y is Z-inductive. Since each number is in the intersection
of all Z-inductive sets, each number is an element of every Z-inductive set and so of y.
The remaining axiom, 6.5.D, is somewhat more difficult to prove. We start with a definition:
Def T Tz :: ∀x∀y((x ∈ y & y ∈ z) → x ∈ z)
‘Tz’ is read “z is transitive.” This name is appropriate because in such sets, the relation of
membership is transitive. Given this definition, one can prove two lemmas (ZF56 and ZF57) from
which the final Peano axiom (ZF58) follows quite easily.

Prove the following theorems. Note: ZF53 through ZF55 are straightforward and are mainly
intended to provide experience working with the definition of ‘T’. ZF56 is more difficult (but
manageable). ZF57 requires the use of mathematical induction (i.e. of ZF52). ZF58, which is the
final Peano axiom, follows from ZF56 and ZF57.
ZF53 Tx ↔ ∪x ⊆ x
ZF54 Tx ↔ ∀y(y ∈ x → y ⊆ x)
ZF55 Tx ↔ x ⊆ ℘x
ZF56 Tx → ∪x′ = x
ZF57 Tx
ZF58 x′ = y′ →x = y
Chapter 7

Gödel’s Proofs

7.1 The basic idea

We must be clear at the outset exactly what Kurt Gödel proved. Here are Gödel’s two main results:
(1) no consistent axiom system for arithmetic can be complete, and (2) no axiom system for arith-
metic can be proven consistent by any argument that can be expressed [the usual technical term
is represented] in that system. Remember what it means for an axiom system for arithmetic to be
complete: it means that within that system, every true arithmetic statement can be proven. Thus,
Gödel’s first result means that, given any consistent axiom system for arithmetic, there will always
be true arithmetic statements that cannot be proven within that system. This holds not only for
Peano’s axioms but for any consistent axiom system no matter how many axioms it may include
(even an infinite number). In the simplest terms, Gödel did this by constructing a statement, in the
language of arithmetic, that says of itself that it cannot be proven. Such a statement must be true
because if it were false, it would be both false and provable (an impossible combination if the axioms
are consistent). Thus, the statement must be true and so unprovable.
Here are a few claims Gödel did not prove (be sure you see the difference): (1) He did
not prove, merely, that no consistent axiom system for arithmetic could be proven complete (and
thereby leave open the possibility that some such system really may be complete but we just can’t
prove it)—rather he proved that there is no possibility of an axiom system for arithmetic that is both
consistent and complete. (2) He did not prove that no axiom systems are complete. Inconsistent
axiom systems (even for arithmetic) are always complete. Moreover, there are complete and consis-
tent axiom systems for truth-functional and quantificational logic, for identity, and for lots of other
subjects—but not for every subject and, in particular, not for arithmetic. (3) He did not prove that
arithmetic itself is not complete. Only axiom systems can be complete or incomplete or consistent
or inconsistent—it makes no sense to say that arithmetic itself (the collection of all arithmetic truths)
is or is not complete.
Until the early 1930s, when Gödel’s results appeared, everyone assumed that Peano’s system was
complete or, at worst, that it could be made complete by adding a few axioms. Gödel’s results came


as a great blow to preconceptions–what, after all, can it mean for an arithmetic statement to be
true except that it is provable? Yet Gödel’s first result is that, within any consistent axiom system for
arithmetic, there will always be true but unprovable arithmetic statements. Moreover, the situation
cannot be repaired: adding an unprovable truth as a new axiom simply produces a system with new
unprovable truths, so no consistent axiom system ever captures all of arithmetic. Amazing!
The idea behind Gödel’s proof is something like the liar paradox. Look at this sentence:

This sentence is not true.

You can easily see that the sentence must be both true and not true, a contradiction. Gödel’s
proof relies on a similar sentence, but with provability replacing truth:

This sentence is not provable in PA.

Is this sentence provable in PA? Well, if it is, PA can prove a false sentence and so is inconsistent.
If it’s not, the sentence is true but not provable in PA, so PA is incomplete. Hence PA cannot be
both complete and consistent.
If that seems a little too fast, you’re right. Why should we expect that sentence to be provable in
PA? PA is about arithmetic; the sentences it can prove are sentences in arithmetic; it is incomplete
only if there is some sentence of arithmetic that it can’t prove. That sentence is not a sentence of
arithmetic, so that sentence doesn’t count against the completeness of PA. It would, however, count
against the completeness of PA if there were a sentence of arithmetic that said of itself that it wasn’t
provable. The basic idea behind Gödel’s proof is showing that there is such a sentence.
One key to being able to state both the liar sentence and Gödel’s sentence is self-reference. Each
of these sentences refers to itself (using the words ‘this sentence’). There are other ways to achieve
self-reference. For example, say we had a list of sentences. Name the list L. If the nth sentence on
the list were

n Sentence n on list L is not true.

then n would be a liar sentence, having achieved self-reference by referring to its own name.
There are other ways of doing this in English, since English has a huge variety of ways to refer to
arbitrary sentences. Arithmetic doesn’t seem to have that. One of the many extremely clever bits of
Gödel’s proof is that he showed how sentences of arithmetic can refer to themselves.

7.2 The details

Step 1: The Gödel numbers
The first thing we do is assign a number to each of the undefined symbols of the language. Here is
one way to do it:

line separator 00
( 11
) 12
, 13
∀ 21
∼ 22
& 23
= 24
x 31
P 32
f 33
. 34
0 41
′ 42
+ 43
× 44
This way of matching up symbols to numbers is not the only way. It’s not the way Gödel did it,
but it is considerably simpler. These numbers are called the Gödel numbers.
We now can give a number to each sentence of PA. We do this by concatenation. So, for example,
the open sentence ‘x = x’
will get the number 312431, which is the number for ‘x’ next to the number for ‘=’ next to the
number for ‘x’. The defined terms can be given Gödel numbers via their definitions. For example,
1 is defined as 0′, so the Gödel number of 1 is 4142. We can also give Gödel numbers to proofs, which
are just sequences of sentences. The Gödel numbers of the lines in the proof will be separated by
double zeroes. (Even though I will write ‘.˙.’ to mark the conclusion, we don’t have a Gödel number
for it, since we don’t need one. The last line is the conclusion.) So, for example, the proof
∀x(x = x)
.˙.0 = 0
will get the number 2131113124311200412441. (As you can see, the Gödel numbers are normally
quite large. The Gödel number of this very simple proof is more than 2×10²¹; for longer proofs
the numbers can get extremely large.)
Not every number is a Gödel number, but given any number it’s easy to check whether it’s a
Gödel number, and what symbol or sentence or proof it’s a Gödel number of.
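To make the procedure concrete, here is a sketch in Python (my illustration, not part of the book's system); the two-digit codes are copied from the table above:

```python
# Sketch of this chapter's Gödel numbering. A formula's number is the
# two-digit codes of its symbols, concatenated.
CODES = {'(': '11', ')': '12', ',': '13', '∀': '21', '∼': '22',
         '&': '23', '=': '24', 'x': '31', 'P': '32', 'f': '33',
         '.': '34', '0': '41', '′': '42', '+': '43', '×': '44'}
SYMBOLS = {v: k for k, v in CODES.items()}

def encode(formula):
    """Gödel number of a formula (defined terms like '1' must first be
    rewritten in basic symbols, e.g. '1' as '0′')."""
    return int(''.join(CODES[ch] for ch in formula))

def decode(n):
    """Recover the expression from a Gödel number, reading '00' as the
    line separator; return None if n is not a Gödel number. (No code
    contains a 0, so a '00' pair is unambiguous.)"""
    digits = str(n)
    if len(digits) % 2 != 0:
        return None
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if any(p != '00' and p not in SYMBOLS for p in pairs):
        return None
    return ''.join(' .˙. ' if p == '00' else SYMBOLS[p] for p in pairs)

print(encode('x=x'))     # 312431
print(decode(41244142))  # 0=0′ (that is, '0 = 1' in basic symbols)
```

Running `decode` on an arbitrary number either recovers a string of symbols or returns `None`, which is exactly the "easy to check" claim above.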
Give the Gödel number of these statements/proofs.
1 ∀xPx
2 1=1
3 ∼(0=1)
4 ∀xPx

Give the statement numbered by these Gödel numbers.

5 41434142244142
6 213122114124314212
7 2131113143412431120041424341244142
From now on we’ll follow this convention: We’ll put upper corner quotation marks around a
formula to indicate the Gödel number of that formula, and lower corner quotes around a number
to mean the formula that that is a number of. Thus:
⌜x = x⌝ :: 312431
We’ll call a Gödel number written out ‘fully expanded’ and a Gödel number written with corner
quotes ‘partially expanded’. So ⌜x = x⌝ is partially expanded, and ‘312431’ is fully expanded.

Step 2: The unprovable relation

Partly because Gödel’s own system for numbering was more complex, this section of his proof was
vastly more complicated. The next step is to show that there is some relation between the Gödel
number of a proof and the Gödel number of the conclusion. Given our system of Gödel numbering,
the relation is straightforward: the Gödel number of the conclusion of the proof is the sequence of
numbers after the last pair of zeros. In the proof above that 0 = 0, the Gödel number of the proof is
2131113124311200412441, and the Gödel number of the conclusion is 412441. Another part of the
reason this section of the proof was more complicated for Gödel is that it is necessary to show that
this numerical relation can be defined within PA. The concatenation function is easily seen to be
definable in PA, since it’s simply addition and multiplication: to concatenate the number 21 to the
right of the number 31, we multiply 31 by 100 and add 21. It’s slightly more complicated to find
the last pair of zeroes and then subtract the numbers after them, but it can be done.
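Outside PA the whole relation is easy to compute; here is a sketch in Python (an illustration only; the point of the paragraph above is that the same relation can also be defined inside PA from addition and multiplication):

```python
def conclusion_number(proof_number):
    """Gödel number of a proof's conclusion: everything after the last
    '00' separator. Scanning two digits at a time is safe because no
    symbol's code contains a 0, so '00' can only be a line separator."""
    digits = str(proof_number)
    last_sep = max(i for i in range(0, len(digits) - 1, 2)
                   if digits[i:i + 2] == '00')
    return int(digits[last_sep + 2:])

# The proof of 0 = 0 from ∀x(x = x):
print(conclusion_number(2131113124311200412441))  # 412441
```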
We’ll call this relation between the proof and the conclusion Bxy: x is the Gödel number of
a proof of y. (For clarity, I will sometimes write the relation with parentheses and a comma, like
this: B(21311124311200412441,412441).) It will often be clearer to use the corner-quote notation:
B(⌜∀x(x = x)⌝, ⌜0 = 0⌝). (‘B’ stands for the German word ‘Beweis’, which means ‘proof ’.)
Given this relation, we say that a certain formula has no proof. For example, to say that there’s
no proof of ‘0 = 1’, we say ‘∼∃xB(x, ⌜0 = 1⌝)’, or equivalently, ‘∀x∼B(x, ⌜0 = 1⌝)’. Let’s define
a new one-place predicate, Ux, that says that a certain sentence has no proof, i.e., is unprovable.
So ‘U ⌜0 = 1⌝’ (i.e., ‘U41244142’) says that the statement ‘0 = 1’ is unprovable. (‘U’ stands for
the German ‘unbeweisbar’, ‘unprovable’. Gödel himself used ‘Bew’ as the name of the predicate.)
Given the predicates as defined above, state that there is no proof of the following sentences, in
both the fully expanded and partially expanded notations. (In some cases the sentences need to be
rewritten in terms of the basic symbols.)
8 ∀xPx
9 1=2
*10 x ≠ x
11 ∃x(x = 0 & x ≠ 0)

Step 3: Self-reference
Let’s take a step back and see where we are. First Gödel showed that there is a way to encode every
symbol, every sentence, and every proof in a formal language into (a subset of) the natural numbers.
(This, incidentally, was a key insight that was important in the development of computers.) Then he
showed that there is a specific mathematical relation between the Gödel number of a proof and the
Gödel number of the conclusion of that proof, a relation no less mathematical than ‘<’. This allows
us to talk about mathematics within mathematics. Now, if Gödel had done only these two things,
he might well have been the greatest logician and mathematician of his generation. Applying these
results to show that no axiomatic system for arithmetic can be complete—well, that’s amazing.
Back to it. We will now define a function g of three variables. We represent the variables by
numbered blanks: _1 , _2 , and _3 , and we represent the function of these variables by ‘g(_1 , _2 , _3 )’.
This function goes FROM the numbers we write in the blanks TO a particular Gödel number.
In particular, the value of the function is the Gödel number of the expression one gets if one be-
gins with the expression with the Gödel number that appears in the first blank (_1 ) and replaces,
in that expression, the symbol whose Gödel number appears in the second blank (_2 ) by whatever
number appears in the third blank (_3 ). Thus, g is a three-place function FROM the Gödel num-
ber of a particular expression, the Gödel number of a particular symbol, and an arbitrary number
TO a particular Gödel number—namely, the Gödel number of the result of inserting the number
for the symbol in the expression. For example, the Gödel number of the statement function ‘x =
x’ is 312431, so g(312431,31,2) is 41424224414242, the Gödel number of 2 = 2. In particular,
g(_1 , 31, _1 ) is the Gödel number of the expression one gets if, beginning with the expression with
Gödel number _1 , one replaces all the x’s in that expression with that very Gödel number. For exam-
ple, g(⌜x = x⌝, 31, ⌜x = x⌝)—in expanded form g(312431,31,312431)—is ⌜312431 = 312431⌝.
(I hope you’ll forgive me if I don’t write it all out.)
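Given our numbering, g is easy to compute outside PA. Here is a Python sketch (my illustration); note that the substituted number goes in as a PA numeral, 0 followed by n primes, which is why nobody wants to write the result of g(312431, 31, 312431) out in full:

```python
def g(e, s, n):
    """Gödel number of the expression obtained from the expression with
    Gödel number e by replacing each occurrence of the symbol whose code
    is s with the numeral for n ('41' for 0, followed by n copies of
    '42' for the primes)."""
    numeral = '41' + '42' * n
    digits, code = str(e), str(s)
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return int(''.join(numeral if p == code else p for p in pairs))

print(g(312431, 31, 2))  # 41424224414242, the number of '2 = 2'
```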
Now consider this open sentence: Ug(x,31,x). This says that the sentence whose Gödel number
is g(x,31,x) is unprovable. This is an open sentence because it has an unbound variable x; we can
replace that x with a particular number. If we replace it, as we did above, with ⌜x = x⌝, then the
sentence says (falsely) that ⌜312431 = 312431⌝ is not provable. But if we replace it with the number
⌜g(x, 31, x)⌝—that is, the Gödel number of the open sentence itself—things get interesting. We’ll
call this sentence G.
G U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝).
And that’s it. That’s the sentence that says of itself that it’s not provable. How does it do that?
Well, it says that the sentence with the Gödel number g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝) is not
provable (that, recall, is what the predicate ‘U’ means). But what sentence has that Gödel number?
To figure that out, we replace all the x’s in the function g’s first slot with the number in the third
slot. Why don’t you figure it out. I’ll wait here.
We have terms for fully and partially expanded Gödel numbers; let’s add one: call a Gödel
number written as the function g ‘condensed’. So ‘g(312431,31,2)’ is condensed (as is ‘g(⌜x =
x⌝, ⌜x⌝, 2)’), ⌜2 = 2⌝ is partially expanded, and ‘41424224414242’ is fully expanded. Partially

expand the following condensed Gödel numbers.

12 g(4144412441, 41, 1)
*13 g(⌜x + x = x⌝, ⌜x⌝, 0)
14 g(⌜x + x = 824882⌝, 31, ⌜0 = 0⌝)
15 g(⌜g(x, 31, x) = 0⌝, 31, ⌜g(x, 31, x)⌝)
16 g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)
If you didn’t make a mistake, you got
⌜U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)⌝
which is just the Gödel number of G itself. So G says that there’s a sentence unprovable in PA,
and that sentence is the one whose Gödel number is g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)—that
is, whose Gödel number is ⌜U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)⌝—that is, G itself.

Step 4: Incompleteness
So we have a dilemma. If G is true, there is a true statement of arithmetic that is unprovable, so PA
is not complete. If G is not true, there is a false statement of arithmetic that is provable, so PA is not
consistent. It seems reasonable to take the first horn of this dilemma—it seems fairly obvious that
PA is consistent, so it must be incomplete. This, of course, doesn’t mean that there’s no way at all to
prove G (we just proved it, for example, from the premise that PA is consistent); it means that there’s
no way to prove it within PA—in other words, it’s not a theorem of PA.
Thus, PA is incomplete. Normally that means that more axioms are needed. We could, for
example, add G itself as a new axiom, thus guaranteeing that G is provable. But then Gödel’s
argument could be restated in terms of this strengthened system and, indeed, no matter how many
times we augment PA, even through the addition of infinitely many axioms, we can follow the same
steps to generate new versions of G that are true but unprovable in the augmented system (this
aspect of Gödel’s proof resembles Cantor’s diagonal proof). Thus, no consistent extension of PA can
be complete for arithmetic. This is sometimes expressed by saying that axiom systems for arithmetic
are essentially incomplete.
Gödel showed that his results held for the system of Principia Mathematica and for ZF, and claimed they held for “related systems.”
Several logicians after Gödel tried to figure out what exactly were the minimum requirements of an
axiom system that make it essentially incomplete. It turns out that any axiom system strong enough
to give rules for addition and multiplication will do. (That is because we needed addition and
multiplication to generate the function in step 2 that we used in step 3 to simulate self-reference.)
For example, Q (Robinson arithmetic)—that is, PA without the induction axioms—is sufficient.

Step 5: Gödel’s second theorem

We now come to the second of Gödel’s results, namely that no axiom system for arithmetic can be
proven consistent by any argument expressible in that system. This follows quite easily from the first result.

Suppose ∼G, that is, suppose ∼∀x∼B(x, g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)). By QN, we have
∃xB(x, g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)). This asserts that some number stands in the relation
B to g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝) and, on the meta-level, this means that G, the statement
with Gödel number g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝), is provable in PA. But if PA is
consistent and G is provable, G must be true. Thus, if PA is consistent and we assume ∼G (that is,
that G is provable), G must be true. In short, if PA is consistent, ∼G yields G. But then we would
have this proof:
1. PA is consistent → (∼G → G)
2. PA is consistent → (∼∼G v G) CE
3. PA is consistent → (G v G) DN
4. PA is consistent → G Taut
Thus, if one could prove that PA is consistent (by any argument in PA), modus ponens would
yield G. But, from Gödel’s first result, we know G cannot be proven in PA. Thus, in PA we can never
prove that PA is consistent.