
Ryan Christensen

& K Codell Carter

The Second-Hottest Logic Book on Earth

Winter 2018 Edition

Copyright 2018 Ryan Christensen


Contents

Contents 4

0 Truth-Functional Logic 6

0.1 Symbolizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

0.2 Scope and statement forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1 First-Order Logic 15

1.1 Translating categorical statements. . . . . . . . . . . . . . . . . . . . . . . . . . . 15

1.2 Relations and multiply general statements . . . . . . . . . . . . . . . . . . . . . . 23

1.3 Properties of Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

1.4 Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2 First-Order proofs 44

2.1 The ﬁrst three rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

2.2 existential instantiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.3 Quantiﬁer Negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

2.4 Logical Truths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.5 Strategies and Tactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3 Axiom Systems 57

3.1 Axiom Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

3.2 Axiom Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.3 An axiom system for TF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.4 An Axiom system for FOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.5 Identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4 Modal Logic 75

4.1 What is modal logic? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.3 Quantiﬁed modal logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.4 Models of Quantiﬁed Modal Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 91


5 Arithmetic 97

5.1 Robinson Arithmetic (Q) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.2 Peano Arithmetic (P) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.1 Naive Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

6.2 The Logicist Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6.3 Zermelo-Fraenkel Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

6.4 Cantor’s Theory of Transﬁnite Numbers . . . . . . . . . . . . . . . . . . . . . . . 119

6.5 Peano’s Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.1 The basic idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7.2 The details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

Chapter 0

Truth-Functional Logic

0.1 Symbolizing

Consider this dead-simple argument:

Dogs are mammals and cats are mammals. Therefore, dogs are mammals.

This argument works by exploiting the fact that the premise is a combination of two simple state-

ments: ‘A and B ’. ‘And’ is a logically signiﬁcant word here; the rest of that sentence could change

without changing the logical form of the sentence. A simple statement is a statement without logically

signiﬁcant words, and a complex statement is one that uses logically signiﬁcant words to combine one

or more simple statements. To symbolize a statement is to replace all the simple statements with

capital letters and the logically signiﬁcant words with special symbols. The ﬁrst symbol we’ll use is

‘&’, to symbolize statements like the premise in the above argument. (Sometimes ‘·’ or ‘∧’ are used

instead). Thus, if A=‘dogs are mammals’ and B=‘cats are mammals’, the above argument can be

symbolized

A&B

A

Statements of the form ‘A & B ’ are called conjunctions, and the statements on either side of the ‘&’

are called conjuncts.
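For readers who like to tinker: validity here means that no assignment of truth values makes the premise true while the conclusion is false, and with only two letters that can be checked by brute force. A quick sketch in Python (my own illustration, not part of the text):

```python
from itertools import product

# The argument 'A & B, therefore A' is valid just in case no row of the
# truth table makes the premise true while the conclusion is false.
counterexamples = [(A, B)
                   for A, B in product([True, False], repeat=2)
                   if (A and B) and not A]
print(counterexamples == [])   # True: the argument is valid
```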

Sometimes the order of the statement must be rearranged a little. Consider a statement like

‘Jezebel is happy and hungry’. Because the logical symbol ‘&’ can only combine statements, we

can’t take A to be ‘Jezebel is happy’ and B to be ‘hungry’ because ‘hungry’ is not a statement.

We have to rewrite this statement so the ‘and’ combines two whole statements: ‘Jezebel is happy

and Jezebel is hungry’. Then we have two simple statements that we combine to make a logically

complex conjunction. This is one reason we have a special word ‘conjunction’, instead of just saying


‘and’: in English, ‘and’ can combine adjectives (as in ‘happy and hungry’) or statements. The logical

conjunction can combine only statements.

Negations usually also take some rewriting. The statement ‘Jezebel is not happy’ is the negation

of the statement ‘Jezebel is happy’. If that statement is symbolized A, we can’t put the symbol for

‘not’ in between two statement symbols, since there’s only one. Instead we put it in front of the letter,

like this: ‘∼A’. (Sometimes ‘¬’ is used instead.) When reading a sentence, logicians will often say

‘it is not the case that Jezebel is happy’, to get the negation in the right place.

There are two other basic symbols of logic. The ﬁrst is disjunction, symbolized ‘v’, which translates ‘or’.

Thus ‘Jezebel is happy or hungry’ could be symbolized ‘A v B ’. (Each side of a disjunction is called

a disjunct.)

The last basic symbol is the conditional, symbolized →, which translates ‘if ... then ...’. So ‘if Jezebel

is hungry, then Brünnhilde is happy’ could be symbolized ‘A → B ’. (The left side of a conditional

is called the antecedent and the right side is called the consequent.) Conditional statements are different

from conjunctions and disjunctions in that they’re asymmetrical. ‘Jezebel is happy and Brünnhilde

is happy’ means the same thing as ‘Brünnhilde is happy and Jezebel is happy’. There might be

some reason you’d say one rather than the other, but either one works as a premise to conclude

‘Brünnhilde is happy’ (or ‘Jezebel is happy’). But with conditional statements, the antecedent and

the consequent are not interchangeable. We’ve seen before how swapping the antecedent and the

consequent makes an argument valid or invalid. Because of this, it is crucial to get the antecedent

and consequent right. The difﬁculty is made worse by the fact that we have several different ways

of saying conditionals, and depending on how we say it, the antecedent might come before or after

the consequent in the English sentence. For example, if I say ‘The light works only if you jiggle

the switch (and sometimes not even then)’, am I saying ‘If you jiggle the switch, the light works’?

No; that’s the point of that parenthetical remark. Jiggling the switch doesn’t guarantee that the

light works. Instead, I’m saying something like ‘If the light works, you must have jiggled the switch’.

In other words, while ‘A if B ’ is symbolized ‘B → A’, ‘A only if B ’ is symbolized ‘A → B ’. This

point is worth emphasizing because it is one of the most common sources of mistakes:

p → q      q if p; p only if q; only if q, p

The rule is this: the words ‘only if ’ come before the consequent; the word ‘if ’ comes before the

antecedent.
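Since this is such a common source of mistakes, it is worth checking that ‘B → A’ (‘A if B ’) and ‘A → B ’ (‘A only if B ’) really are different statements. A small sketch (Python, my own illustration, not part of the text):

```python
from itertools import product

def implies(p, q):
    # material conditional: false only when p is true and q is false
    return (not p) or q

# 'A if B' is B -> A; 'A only if B' is A -> B. Find where they come apart.
differing = [(A, B) for A, B in product([True, False], repeat=2)
             if implies(B, A) != implies(A, B)]
print(differing)   # [(True, False), (False, True)]
```

They disagree exactly on the rows where A and B have different truth values, which is why swapping antecedent and consequent can turn a valid argument into an invalid one.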

There is one more common symbol, called the ‘biconditional’. It’s shorthand for a conjunction

of conditionals. So if I want to say ‘if A then B , and if B then A’, I could write

(A → B) & (B → A)

Using the biconditional, this is abbreviated:

A ↔ B


Sometimes we say that the biconditional symbolizes the English expression ‘if and only if ’, because

another way to say ‘if A then B , and if B then A’ is ‘A if and only if B ’.
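On the truth-table picture, the conjunction of the two conditionals comes out true exactly when A and B have the same truth value. A quick check (Python, my own sketch, not part of the text):

```python
from itertools import product

def implies(p, q):
    # material conditional: false only when p is true and q is false
    return (not p) or q

# (A -> B) & (B -> A) is true exactly when A and B match in truth value.
biconditional_ok = all(
    (implies(A, B) and implies(B, A)) == (A == B)
    for A, B in product([True, False], repeat=2))
print(biconditional_ok)   # True
```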

We can combine these basic symbols, often called connectives, to symbolize statements that are

even more logically complex. We use parentheses, just as in math, to indicate which statements go

together.

Example: If Jezebel comes or Brünnhilde comes, I am leaving.

Let J =‘Jezebel comes’, B =‘Brünnhilde comes’, I =‘I am leaving’. Then the main connective in

this sentence is the conditional. The consequent of this conditional is simple, but the antecedent of

this conditional is complex. The sentence is symbolized like this:

(J v B) → I

Example: I don’t enjoy movies if the theater is loud and I don’t eat popcorn.

Let M =‘I enjoy movies’, L=‘the theater is loud’, P =‘I eat popcorn’. Again the main connective

is the conditional, but notice that the consequent comes ﬁrst. The single word ‘if ’ comes before

the antecedent, so ‘the theater is loud and I don’t eat popcorn’ is the antecedent. Also, given our

symbols, two of the statements are negated.

(L & ∼P ) → ∼M

There are many English expressions that can be translated using these symbols. Here are some

of the most common:

neither and not both.

Think about the difference between these two sentences:

Neither Siegfried nor Brünnhilde can go to the bank.

Siegfried and Brünnhilde can’t both go to the bank.

The ﬁrst says that Siegfried can’t go to the bank, and Brünnhilde can’t go to the bank. If either

of them can go to the bank, the sentence is false. The second says that they can’t both go. If either

of them goes alone, the sentence is true, but if they go together, it’s false. So the sentences mean

different things and have different logical content.

The ﬁrst can be symbolized like this:

∼S & ∼B

or it could be symbolized like this:

∼(S v B)

We can prove that these two statements are logically equivalent, and we will do that later. But it also

makes sense if you think about it. The ﬁrst way of symbolizing it says that Siegfried can’t do it, and

Brünnhilde can’t do it. The second says that it’s false that either of them can do it. These statements

mean the same.

The second can be symbolized like this:


∼(S & B)

or like this:

∼S v ∼B

Again, these statements are logically equivalent. The ﬁrst says that it’s not the case that Siegfried

does it and Brünnhilde does it, and the second says that either Siegfried doesn’t or Brünnhilde

doesn’t (or neither does it). Again, these mean the same.

The moral to take from this is that parentheses are important, and can change meaning.
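Both equivalences can be spot-checked by brute force over all four truth-value assignments. A quick sketch in Python (my own illustration, not part of the text):

```python
from itertools import product

# 'Neither S nor B' two ways, and 'not both S and B' two ways.
neither_ok = all(((not S) and (not B)) == (not (S or B))
                 for S, B in product([True, False], repeat=2))
not_both_ok = all((not (S and B)) == ((not S) or (not B))
                  for S, B in product([True, False], repeat=2))
print(neither_ok, not_both_ok)   # True True
```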

but

How would you translate a sentence like ‘Jezebel is tall, but Brünnhilde is strong’? The English

sentence has some logical information: Jezebel is tall and Brünnhilde is strong. It also has some

psychological information, expressing something about the contrast between the statements or the

surprisingness of one of them. This extra psychological information is important, perhaps, but not

logically important. So this sentence could be translated into logic in just the same way as ‘Jezebel

is tall and Brünnhilde is strong’—T & S . Lots of English words are used to assert that two different

statements are true, with different amounts of coloring about how the author wants you to consider

the statements: ‘whereas’, ‘on the other hand’, ‘although’, ‘in addition’, ‘also’. These can all be

translated by the conjunction.

unless

‘Unless’ is a right old mess. If I say ‘Siegfried will go to the bank unless Brünnhilde does’, I

probably mean ‘If Brünnhilde goes to the bank, then Siegfried won’t go, but if Brünnhilde doesn’t

go to the bank, then Siegfried will go’. This can be symbolized by the biconditional:

S ↔ ∼B

This is an exclusive disjunction. But most logic books and standardized tests think of this sentence

another way. Suppose Brünnhilde does go to the bank, but Siegfried doesn’t know, so he goes, too.

Is the sentence false in this situation? If the sentence is true in the case where both Siegfried and

Brünnhilde go to the bank, the sentence expresses an inclusive disjunction:

SvB

Bowing to tradition, we will translate ‘unless’ this way as well. But be aware of the ambiguity.

If one side is negated, that disjunct will also be negated. So ‘Siegfried will not go to the store

unless Brünnhilde does’ can be translated

∼S v B

or, equivalently, S → B.
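The two readings of ‘unless’ can be compared directly by checking every truth-value assignment. In this sketch (Python, my own illustration, not part of the text), the readings disagree only in the case described above, where Siegfried and Brünnhilde both go:

```python
from itertools import product

def exclusive(S, B):   # S <-> ~B: exactly one of them goes to the bank
    return S == (not B)

def inclusive(S, B):   # S v B: at least one of them goes to the bank
    return S or B

# Find every assignment on which the two readings come apart.
disagreements = [(S, B) for S, B in product([True, False], repeat=2)
                 if exclusive(S, B) != inclusive(S, B)]
print(disagreements)   # [(True, True)]
```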


Example.

The world will explode unless you enter this code exactly and press that button.

Let W =‘the world will explode’, C =‘you enter this code exactly’, and B =‘you press that button’.

We could rewrite the sentence using these abbreviations:

W unless C and B .

The main connective here is ‘unless’—you must do the two things, or else the world will explode.

So ‘C and B ’ should be put in parentheses. ‘Unless’ is a disjunction, so the whole statement is this:

W v (C & B)
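The parentheses here do real work: grouping the other way, as ‘(W v C) & B ’, changes the truth conditions. A quick brute-force comparison (Python, my own sketch, not part of the text):

```python
from itertools import product

# Compare 'W v (C & B)' with the other grouping '(W v C) & B'.
differing = [(W, C, B)
             for W, C, B in product([True, False], repeat=3)
             if (W or (C and B)) != ((W or C) and B)]
print(len(differing))   # 2: they come apart when W is true and B is false
```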

Here’s a table summarizing some common English expressions and their logical translation:

not p ∼p

it is not the case that p ∼p

it is false that p ∼p

p and q p&q

p but q p&q

neither p nor q ∼p & ∼q or ∼(p v q)

not both p and q ∼(p & q) or ∼p v ∼q

p or q pvq

p unless q pvq

not p unless q ∼p v q or p → q

if p, q p→q

p if q q→p

p only if q p→q

only if p, q q→p

p if and only if q p↔q

Exercises

Let the letters on the left symbolize the statements on the right:

I I eat ice cream

C you eat cake

P we eat pie

B we eat brownies

Translate each of the following sentences into logical notation:


4 Either we eat pie, or if I don’t eat ice cream, you don’t eat cake.

6 Either I eat ice cream, or, if I don’t eat ice cream and you don’t eat cake, then we eat brownies.

8 We eat pie only if either you eat cake or I eat ice cream.

9 If I eat ice cream, then if you eat cake, we eat both pie and brownies.

10 If either you eat cake or we eat pie, then if I eat ice cream, we don’t eat brownies.

15 If I eat ice cream and we eat pie, either you eat cake or we eat brownies.

16 We eat both pie and brownies, but if I eat ice cream, you eat cake.

17 You don’t eat cake if either I eat ice cream or we don’t eat brownies.

19 If we neither eat pie nor brownies, then if I eat ice cream, you don’t eat cake.

Exercises

Choose your own letters for the simple statements, and symbolize each of the following sen-

tences. Be sure to write down the meaning of the simple statements.


24 The government will default on its debt unless the federal bank cuts interest rates and the

treasury prints more money.

25 The federal bank will cut interest rates only if either inﬂation is too high or unemployment is

too low.

28 The company has massive layoffs, but it will collapse only if the CEO steps down.

29 Unless the CEO steps down, the company will either collapse or have massive layoffs.

30 If the CEO steps down and the company has no viable replacement, it will collapse unless the

board acts quickly.

Exercises

Symbolize the following arguments:

31 If everything is merely contingent, at one time nothing existed. If this were true, even now

nothing exists, which is absurd. So not everything is merely contingent.

32 Jezebel qualiﬁed for the ﬁnals only if every runner faster than Jezebel also qualiﬁed. Every

runner faster than Jezebel qualiﬁed, so Jezebel must have qualiﬁed, too.

33 Neither Dr. Black nor Professor Plum committed the murder. If Nurse White committed the

murder, so did Professor Plum. Either Dr. Black or Reverend Green committed the murder.

If Dr. Black committed the murder, so did Miss Scarlett. Therefore, if Reverend Green

committed the murder, so did Miss Scarlett.

34 Either Reverend Green or Dr. Black killed him. If Colonel Yellow killed him, Dr. Black

didn’t. Neither Nurse White nor Miss Scarlett killed him. Therefore, if Colonel Yellow killed

him, Reverend Green didn’t.

35 The universe is orderly and apparently designed, like a mechanical watch. If the universe is

orderly and apparently designed, like a mechanical watch, it must have been created by an

intelligent designer. So the universe must have been created by an intelligent designer.

36 There are only three possibilities: either your sister is mad, or she is telling lies, or she is telling

the truth. You know she does not tell lies, and she is obviously not mad, so for the time being,

unless other evidence turns up, we must assume she is telling the truth. (C.S. Lewis, The Lion,

the Witch, and the Wardrobe)

0.2 Scope and statement forms

Parentheses are very important in symbolizing complex statements. Look at this example:

The police will arrest you only if they see you and you are doing something illegal.

P → (S & I)

(P → S) & I

You are doing something illegal, and if the police arrest you, then they see you.

The second, unlike the ﬁrst, asserts that you are doing something illegal. In the second statement, the

conjunction has the wider scope, so both conjuncts are being asserted, but in the ﬁrst, the conditional

has the wider scope, so the conjuncts are asserted only conditionally.

The scope of a logical operator is the portion of the statement that is governed by that operator:

For example, in ‘P → (S & I)’, the scope of the ‘→’ is the entire statement, while the scope of the ‘&’ is only ‘(S & I)’.

The main operator in a statement is the operator with the widest scope. Normally this will be

outside of all parentheses. If it is a negation, it will be the leftmost symbol. Here are some examples,

with the main operator indicated:

∼(A & B)   (main operator: the negation)

(A & B) → (C v D)   (main operator: the conditional)

∼(A & ∼B) & ∼C   (main operator: the ‘&’ outside the parentheses)

A → (B → (C → D))   (main operator: the ﬁrst ‘→’)

∼A v (B & C)   (main operator: the ‘v’)

∼∼(A ↔ B)   (main operator: the leftmost ‘∼’)
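To make the idea concrete, here is a small sketch (Python, my own illustration, not from the book) that finds the main operator by scanning at parenthesis depth zero. It assumes the book's conventions, with ‘→’ and ‘↔’ typed in ASCII as `->` and `<->` and every binary connective properly parenthesized:

```python
def main_operator(statement):
    """Return the main operator of a symbolized statement written with
    ~, &, v, -> and <-> (ASCII), capital statement letters, and parentheses.
    Assumes the statement is well formed in the book's notation."""
    s = statement.replace(' ', '')
    depth = 0
    i = 0
    while i < len(s):
        if s[i] == '(':
            depth += 1
        elif s[i] == ')':
            depth -= 1
        elif depth == 0:
            # The first binary connective outside all parentheses is main.
            if s.startswith('<->', i):
                return '<->'
            if s.startswith('->', i):
                return '->'
            if s[i] in '&v':
                return s[i]
        i += 1
    # No binary connective outside parentheses: a leading negation is main.
    return '~' if s.startswith('~') else None

print(main_operator('~(A & B)'))            # ~
print(main_operator('(A & B) -> (C v D)'))  # ->
print(main_operator('~(A & ~B) & ~C'))      # &
print(main_operator('~A v (B & C)'))        # v
print(main_operator('~~(A <-> B)'))         # ~
```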

It is often crucial to decide whether two statements have the same form. If we are compar-

ing a statement with only one operator with a statement that has more than one operator, we are

concerned only with the main operator in the more complex statement. For example, these two

statements have the same form:

A&B (A v B) & (C → D)


The statement on the left is a conjunction, and so is the statement on the right, because its main

operator is a conjunction.

If both statements are more complex, they must have the same operators with the same scope

to have the same form. For example, these two statements have the same form:

∼A → B    ∼(A v B) → (C & D)

But these two statements do not have the same form:

∼A → B    (∼A v B) → (C & D)

These last two statements don’t have the same form because the negation on the left has as its scope

the whole antecedent, but the negation on the right has as its scope only part of the antecedent.

Often we will use lower-case letters p and q (and possibly others) to identify statement forms. So

if we say ‘a statement of the form p & q ’, we mean any conjunction, no matter how complex.

Exercises

Determine whether the following statements have the form indicated.

1 ∼p → q A → ∼B

2 ∼p → q ∼∼A → ∼B

Chapter 1

First-Order Logic

First-order logic (FOL) was developed in the nineteenth and twentieth centuries by several mathematicians and philosophers to correct some deﬁciencies they saw in traditional logic, which was based on the logic of Aristotle.

We begin with atomic sentences. An atomic sentence is a simple sentence, with a subject and a

predicate, like this

Jezebel is tall.

This sentence has a subject (‘Jezebel’) and a predicate (‘is tall’). In FOL, predicates are symbol-

ized by capital letters and subjects by lower-case letters. The subject comes after the predicate. So

this sentence is symbolized

Tj

There were some simple valid arguments that could not be assimilated into traditional logic.

For example, look at this argument:

Jezebel ate the chocolate cake, so someone ate the chocolate cake.

This is clearly a valid argument. But how would you translate it into logic? There is no ‘and’ or

‘or’ or ‘not’—this argument has none of the logical connectives we’ve discussed. It has two simple

sentences:

J: Jezebel ate the chocolate cake.

S: Someone ate the chocolate cake.


J .˙. S

But this form is not valid: as far as truth-functional logic can see, ‘J ’ and ‘S ’ are just two unrelated statements.

The ﬁrst change we’ll make is to distinguish the subject of a sentence from the predicate. We’ll

now use capital letters to stand for predicates and lower-case letters to stand for subjects. For example, let ‘Cx’ be ‘x ate the chocolate cake’ and ‘j’ be ‘Jezebel’.

Now we can symbolize the premise of the above argument like this:

Cj.

Now the simple statements are more complex than they were before. Before, we just had single

letters; now we have statements that reveal the inner logic of these statements. Once we have these

statements, we can then combine them just as we’ve been doing. For example, taking ‘Bx’ to be ‘x ate the brownies’ and ‘s’ to be ‘Siegfried’, the sentence ‘Jezebel ate the chocolate cake and Siegfried ate the brownies’ can be symbolized

Cj & Bs.

Another example:

Jezebel or Siegfried ate the chocolate cake.

You might be tempted to put the disjunction between the lower-case letters, but that would be

wrong. The logical connectives we have can go only between sentences. So we need to paraphrase

the sentence so that every simple sentence has just one subject and one predicate, like this:

Either Jezebel ate the chocolate cake, or Siegfried ate the chocolate cake.

Cj vCs.

Exercises

Use the following key:

a: apple pie; b: blueberry cheesecake; c: carrot cake

Fx: x is fried; Gx: x is good; Hx: x is healthy.


That’s the ﬁrst step: We split apart the simple statements to reveal the subject-predicate structure

within simple sentences. We can now symbolize the premise of the above argument. But what about

the conclusion? We have to take another step. Sometimes we want to talk about what is true of

everything or something—that is, we want to symbolize sentences that have no proper names, as in

‘Someone ate the chocolate cake’. To do this requires some new symbols, which we call ‘quantiﬁers’:

The ﬁrst, ‘∀’, is called the universal quantiﬁer, and is sometimes read ‘for every’ or ‘everything’. (In

older works, the universal quantiﬁer is written with parentheses around the variable; so instead of

‘∀x’, they have ‘(x)’. It makes it easier to type, I suppose.) The second, ‘∃’, is called the existential quantiﬁer, and is read ‘there exists something such that’ or ‘there is’.

Jezebel ate the chocolate cake, so someone ate the chocolate cake.

Cj.

∃xCx.

Cj .˙.∃xCx.

Notice, just as before, we need to paraphrase a little before we can translate. The symbolized

sentence literally says ‘For some x, x ate the chocolate cake’. The ‘x’ here is called a variable; we

normally pick letters toward the end of the alphabet—‘x’, ‘y’, ‘z’, and then ‘w’, ‘v’, and so on. The

variable functions like a pronoun, so another paraphrase would be ‘there exists something such that

it ate the chocolate cake’.

The variable that follows a quantiﬁer is said to be bound by that quantiﬁer. So, for example in

the sentence ‘ ∃xCx’, the x is bound by the existential quantiﬁer. A quantiﬁer’s scope extends to the

next connective, or, if there are parentheses, to the right parenthesis. So in the statement

∀xPx &Qx


The ‘x’ in ‘Px’ is bound—within the scope of the universal quantiﬁer—and the ‘x’ in ‘Qx’ is

unbound or free. A bound variable is like a pronoun, but an unbound variable is like a name. So

the above sentence says ‘Everything is P, and x is Q’. If we put parentheses, like this:

∀x(Px &Qx)

then both occurrences of ‘x’ are bound, and the sentence says ‘Everything is both P and Q’.

Now, if ‘ ∀xPx’ says ‘Everything is purple’, how do we say ‘Not everything is purple’? By putting

the negation in front of the quantiﬁer, like this:

∼∀xPx.

If instead we put the negation after the quantiﬁer, we get a different statement:

∀x ∼Px.

So the order of the symbols matters. The ﬁrst says ‘It is not the case that everything is such

that it is purple’; the second says ‘Everything is such that it is not purple’. This second is probably

more naturally said ‘Nothing is purple’, which can be paraphrased ‘It is not the case that there is

something that is purple’, or

∼∃xPx.

So here we have two different, equivalent ways of saying the same thing: ‘Everything is not

purple’ can be translated ‘ ∀x ∼Px’ or ‘ ∼∃xPx’. In fact, there are other equivalences:

∀x ∼Rx ≡ ∼∃xRx

∼∀xRx ≡ ∃x ∼Rx

∼∀x ∼Rx ≡ ∃xRx

If we call a negation before the quantiﬁer an outer negation, and a negation after the quantiﬁer

an inner negation, we can say that an outer negation of one quantiﬁer is the same as the inner

negation of the other. Or, you can think of it like this: when you push a negation through the

quantiﬁer, the quantiﬁer ﬂips.
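On any finite universe of discourse, the universal quantifier behaves like Python's `all` and the existential like `any`, so the flip rule can be spot-checked directly (my own sketch, not from the text; the domain and predicate are arbitrary):

```python
domain = [0, 1, 2, 3]            # a small stand-in universe of discourse
R = lambda x: x % 2 == 0         # an arbitrary one-place predicate

# The inner negation of one quantifier matches the outer negation of the other.
flip_1 = all(not R(x) for x in domain) == (not any(R(x) for x in domain))
flip_2 = (not all(R(x) for x in domain)) == any(not R(x) for x in domain)
flip_3 = (not all(not R(x) for x in domain)) == any(R(x) for x in domain)
print(flip_1, flip_2, flip_3)    # True True True
```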

There are four common statement types that use a single quantiﬁer, which since the middle ages

have been named after vowels:

A Every P is Q ∀x(Px →Qx)

E No P is Q ∀x(Px →∼Qx)

I Some P is Q ∃x(Px &Qx)

O Not every P is Q ∃x(Px &∼Qx)


These four types of statement, along with the unquantiﬁed statements of the forms Pa and ∼Pa,

are called categorical statements.

Notice a few things about these categorical statements. A and E sentences begin with the universal quantiﬁer and have → as the main connective; I and O sentences begin with the existential quantiﬁer and have & as the main connective. This is always the way it works with categorical statements, and almost always the way it works even with more complicated statements. This is important

enough to put into its own box.

The main connective after ∀ is →; the main connective after ∃ is &.

Why is that? Let’s look at some concrete examples. The next few examples will use the following

symbols:

Px: x is a pie

Dx: x is delicious.

Now look at an A statement, like ‘Every pie is delicious’. This says to look at everything (in the

universe, or in our universe of discourse). If it’s not a pie, we ignore it. If it is a pie, then it must also

be delicious, or what we’ve said is false. That’s what

∀x(Px →Dx)

says. If we had translated instead ‘∀x (Px &Dx)’, this would have said that everything is both a pie

and delicious. Everything is a delicious pie—that’s far stronger than we wanted to say. Any non-pie

or any non-delicious thing would be a counterexample. As it is, only something that is both a pie

and is not delicious would prove the sentence false.

Now think about an I statement, like ‘Some pies are delicious’. You might think we’d want to

symbolize with a conditional, as before. But recall that ‘Px →Qx’ is equivalent to ‘∼Px vQx’, so

‘∃x(Px →Qx)’ is equivalent to ‘∃x(∼Px vQx)’. This says ‘there is something that is either not a pie or

is delicious’. This is far weaker than we wanted to say. It could be made true if there is anything

that’s not a pie, or anything that’s delicious. It’s far too easy to be made true. What we want is for

the sentence to say ‘there are some things that are pies and are delicious’, and that’s what

∃x(Px &Dx)

says. (Note that this says that there is at least one delicious pie, whereas the English has a plural. In

logic we lose that distinction. An existential quantiﬁer is true even if there’s only one, and is still true

if there are billions.)
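To see the difference concretely, we can evaluate both candidate translations over small invented domains (a Python sketch; the objects and their properties are my own illustration, not the book's):

```python
# Each object records whether it is a pie and whether it is delicious.
world = [{'pie': True,  'delicious': True},    # one delicious pie
         {'pie': False, 'delicious': False}]   # one non-pie, not delicious

# A: 'Every pie is delicious'. The conditional reading is correct; the
# & reading claims everything is a delicious pie, which the non-pie refutes.
a_right = all((not x['pie']) or x['delicious'] for x in world)   # True
a_wrong = all(x['pie'] and x['delicious'] for x in world)        # False

# I: 'Some pie is delicious'. In a world with no pies at all, the
# conjunction reading is rightly false, but the -> reading is satisfied
# by the mere existence of a non-pie.
pieless = [{'pie': False, 'delicious': False}]
i_right = any(x['pie'] and x['delicious'] for x in pieless)       # False
i_wrong = any((not x['pie']) or x['delicious'] for x in pieless)  # True

print(a_right, a_wrong, i_right, i_wrong)
```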

Many categorical sentences are fairly straightforward to translate. A few words have a trick to

them.


only

A sentence like

Only pies are delicious

says there’s nothing delicious—not cakes, not cookies—except pies. It says ‘Everything that is delicious is a pie’:

∀x(Dx →Px)

That is, ‘only’ swaps the antecedent and consequent, just as it does in truth-functional logic. Just

as ‘only if ’ is the converse of ‘if ’, ‘only’ is the converse of ‘all’.

any

Sometimes in English, ‘any’ means ‘all’ and sometimes it means ‘some’. Look at these sentences:

Not any pies are delicious = No pies are delicious (E)    ∀x(Px → ∼Dx)

If anyone is hungry, he will eat = All hungry people will eat (A)    ∀x(Hx → Ex)

If there are any pies, Jezebel is hungry    ∃xPx → Hj

The last sentence is not a categorical statement, and looks ahead to what we’ll be doing in later

chapters. Whether ‘any’ should be translated using an existential or a universal depends on the

context.

If the English doesn’t have ‘all’ or ‘some’, as in ‘Pie is delicious’, sometimes it means ‘all’ and

sometimes it means ‘some’. How can you tell? Only by looking at the sentence as a whole and

ﬁguring out what it means and how best to translate that into the language of logic. That’s the same

method you use when translating into Spanish, or Swahili. There are a few tricks, but translation is

an art.

Exercises

11 All who work magic are wizards. (Wx: x is a wizard; Mx: x works magic)

12 Not all who work magic are wizards. (Wx: x is a wizard; Mx: x works magic)


16 Some books are both long and interesting. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

18 No book is interesting unless it is not long. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

19 All long books are interesting. (Bx: x is a book; Lx: x is long; Ix: x is interesting)

20 Only fools and horses work. (Fx: x is a fool; Hx: x is a horse; Wx: x works)

21 Fools and children speak the truth. (Fx: x is a fool; Cx: x is a child; Tx: x speaks the truth)

22 Some insects are dangerous only if bothered. (Ix: x is an insect; Dx: x is dangerous; Bx: x is

bothered)

23 All insects are dangerous if bothered. (Ix: x is an insect; Dx: x is dangerous; Bx: x is bothered)

24 All politicians and outlaws are liars and scoundrels. (Px: x is a politician; Ox: x is an outlaw;

Lx: x is a liar; Sx: x is a scoundrel)

Exercises

Cx = x is a cake.

Gx = x is good.

Bx = x has been baked properly.

26 Some cakes are good only if they have been baked properly.


37 Some cakes are good even though they have not been baked properly.

Exercises

Devise your own symbols, and translate the following

50 No man is an island.

Universe of discourse

Sometimes, instead of quantifying over everything, we’ll explicitly say the variables range over some

smaller class. This is called restricting the universe of discourse. There are two cases when this is

very common: (1) We restrict the universe of discourse to persons, so that ‘ ∃xCx’ says ‘Someone

ate the chocolate cake’ instead of ‘Something ate the chocolate cake’. (2) When we’re doing math,

we restrict the universe of discourse to numbers.

1.2 Relations and multiply general statements

Multiply-general propositions

Syllogisms and categorical statements are only one small part of modern logic. The real power

comes from multiply general statements, statements that have more than one quantiﬁer. Here is an

example of an argument that looks like a syllogism—or rather, an enthymeme—but can’t be shown

valid by any of the many methods devised for syllogisms:

All donuts are delicious, so anyone who eats a donut eats something delicious.

Here the premise is a simple categorical statement; it might be symbolized ‘ ∀x(Nx →Dx)’. The

conclusion could also be translated as a simply categorical statement; taking ‘Ex’ to be ‘x eats a

donut’ and ‘Sx’ to be ‘x eats something delicious’. But then the conclusion would be ‘ ∀x(Ex →Sx)’,

which does not follow from the premise.

The ﬁrst step to symbolizing this argument is to extend our notation so that a quantiﬁer can apply

to only part of a line, and a sentence can have more than one quantiﬁer. With that change, we can

have sentences that are truth-functional combinations of categorical statements. For these sentences,

let Px=‘x is a professor’, Sx=‘x is a student’, Hx=‘x is happy’, Cx=‘x goes to class’, b=‘Brünnhilde’,

and restrict the universe of discourse to persons.

If every student is happy, Brünnhilde is happy

∀x(Sx →Hx) →Hb

If some professor is happy, no student is happy

∃x(Px &Hx) →∀x(Sx →∼Hx)

If no one goes to class, everyone is happy

∼∃xCx →∀xHx

Either everyone who is not a student is happy, or some professor does not go to class

∀x(∼Sx →Hx) v∃x(Px &∼Cx)

If everyone who goes to class is happy, and not all students are happy, then not all students go to class

(∀x(Cx →Hx) &∃x(Sx &∼Hx)) →∃x(Sx &∼Cx)

Brünnhilde goes to class only if she is a student, not a professor, and all students go to class

Cb →((Sb &∼Pb) &∀x(Sx →Cx))

In many of these examples, we used the variable x for multiple quantiﬁers. Because each quan-

tiﬁer has its own scope, marked by the parentheses, there is no ambiguity.

Sometimes, however, it is necessary for one quantiﬁer to fall within the scope of another. One way

for this to happen is with what I call a “yellow banana” sentence, because of this example (let Yx=‘x

is yellow’, Bx=‘x is a banana’, Rx=‘x is ripe’):


If any bananas are yellow, then if all yellow bananas are ripe, they are ripe

∀x((Bx &Yx) →(∀y((By &Yy) →Ry) →Rx))

This sentence is difﬁcult. The antecedent seems to say ‘there are yellow bananas’, which would be translated ‘∃x(Bx &Yx)’. But the ‘they’ in the ﬁnal consequent needs to be bound by the same quantiﬁer. This requires making the quantiﬁer apply to the whole statement, not merely the antecedent, which requires it to become a universal.

Exercises

Translate these sentences using the following symbols:

Dx=x is a dog

Cx=x is a cat

Wx=x is well-trained

Px=x is a perfect pet

Fx=x is friendly

k=Kinkie

8 Every dog is a perfect pet, but, among cats, only those that are well-trained are perfect pets.

11 If all cats are friendly, Kinkie is not well-trained if he is not a perfect pet.

14 If any cat is well-trained, then if all cats are friendly, it is a perfect pet.

15 No dog is a cat.


19 Among cats, only those that are friendly and well-trained are perfect pets.

23 If any dog is friendly, then if all dogs are well-trained, it is a perfect pet.

24 No unfriendly cats are perfect pets but some friendly ones are.

25 No cat is a perfect pet only if neither all cats are friendly nor some cats are not well-trained.

Relations

Having more than one quantiﬁer allows for us to symbolize relations. For example, we can take

‘Dxy’ to be ‘x is more delicious than y’. Then, if we name a particular cookie ‘Nebuchadnezzar’

(symbolized ‘n’), and another cookie ‘Ahasuerus’ (symbolized ‘a’), then we can symbolize

as

Dna.

If we wanted to say that Ahasuerus is more delicious than Nebuchadnezzar, we would switch the

order of the names, like this:

Dan.

We can also quantify into either place. We symbolize ‘Something is more delicious than Nebuchadnezzar’ as

∃xDxn

and ‘Everything is more delicious than Nebuchadnezzar’ as

∀xDxn.

As we’ve seen, the order of the names or variables after the relation symbol matters. Likewise,

the order of the quantiﬁers matters. We choose something for the leftmost quantiﬁer ﬁrst, and then

work our way in. For example, compare these two sentences:

∃x∀yDxy
∀x∃yDyx

Note that these are not the same. The ﬁrst says that there’s one thing, a blueberry cheesecake most likely, that is more delicious than everything. The second says that there’s no end to deliciousness—take anything you like, no matter how delicious it is, there’s something even more delicious. Here are all four combinations:

∀x∃yDxy: Everything is more delicious than something; i.e., there is no least delicious thing.

∀x∃yDyx: Everything has something more delicious than it; i.e., there is no most delicious thing.

∃x∀yDxy: There is something more delicious than everything; i.e., there is a most delicious thing.

∃x∀yDyx: There is something than which everything is more delicious; i.e., there is a least delicious thing.

(You may have noticed that the bottom two sentences require something to be more delicious than

itself. Thus the English paraphrase after ‘i.e.’ isn’t exactly right. To symbolize those, we’ll need

identity, which we’ll learn later.)

The order of the quantiﬁers matters to the interpretation of the sentence, and so does the order

of the variables in a relation. ‘Dxy’ means that x is more delicious than y; ‘Dyx’ means that y is

more delicious than x.
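The difference quantiﬁer order makes can be checked by brute force over a ﬁnite universe. Here is a small Python sketch (the dessert names and the cyclic relation are invented for illustration, not taken from the text): with each dessert beaten by some other dessert, ∀x∃yDyx comes out true while ∃x∀yDxy comes out false.

```python
# A toy three-dessert universe with an invented, cyclic "more delicious
# than" relation, used to brute-force two quantifier orderings.
domain = ["cake", "donut", "flan"]
D = {("cake", "donut"), ("donut", "flan"), ("flan", "cake")}

# ∀x∃yDyx: everything has something more delicious than it.
forall_exists = all(any((y, x) in D for y in domain) for x in domain)
# ∃x∀yDxy: something is more delicious than everything (itself included).
exists_forall = any(all((x, y) in D for y in domain) for x in domain)

print(forall_exists, exists_forall)  # True False
```

Swapping which quantiﬁer is evaluated outermost is exactly the swap of `all(any(...))` for `any(all(...))`.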

Now we can symbolize the argument mentioned before:

All donuts are delicious, so anyone who eats a donut eats something delicious.

We let ‘Nx’ stand for ‘x is a donut’ and ‘Dx’ stand for ‘x is delicious’, as before. We need to add ‘Px’

for ‘x is a person’ and ‘Exy’ for ‘x eats y’. Now we can symbolize the argument like this:

1.2. RELATIONS AND MULTIPLY GENERAL STATEMENTS 27

Example

Any pie is more delicious than any cookie.

Paraphrase: ‘Take any pie you like and any cookie you like: the pie will be more delicious than the cookie’. That is, the ‘any’ here is a universal quantiﬁer. It doesn’t mean that the pie is more delicious than at least one cookie; it means it’s more delicious than them all. Letting ‘Px’ be ‘x is a pie’ and ‘Cx’ be ‘x is a cookie’, we can symbolize it like this:

∀x(Px →∀y(Cy →Dxy))

Another way of saying this, one closer to the English paraphrase, is this:

∀x∀y((Px &Cy) →Dxy)

This is equivalent to the ﬁrst, but the scope of all quantiﬁers is the whole sentence.

Example

If any book is damaged, any student who checked it out will be ﬁned for it.

It is often helpful to symbolize in stages, sometimes starting with related, but easier propositions.

Look ﬁrst at the antecedent. It doesn’t mean ‘if every book is damaged’, but rather ‘if there is a

damaged book’. So we symbolize it like this, with ‘Bx’ symbolizing ‘x is a book’ and ‘Dx’ symbolizing

‘x is damaged’:

∃x(Bx &Dx) →…

Now, the consequent. Here again, it doesn’t mean that all students will be ﬁned, only some.

The ‘any’ indicates that there are are no restrictions; no student is exempt from whatever process

the library uses to decide who pays the ﬁne. So, taking ‘Sx’ to symbolize ‘x is a student’ and ‘Fx’ to

symbolize ‘x will be ﬁned’, the whole proposition is symbolized like this:

Now, the original statement had a qualiﬁcation on ‘any student’: ‘any student who checked it

out’. We’ll need a new relation statement, ‘Cxy’ to symbolize ‘x checked y out’. It is a single object

that is a damaged book and is checked out by the student, so the quantiﬁer will need to stretch across

both the antecedent and the consequent. Here the consequent is fairly easy: ‘∃y((Sy &Cyx) &Fy)’. Putting the pieces together, we might try

∃x((Bx &Dx) →∃y((Sy &Cyx) &Fy))

that is, we simply extended the scope of the ﬁrst quantiﬁer to range over the whole statement. But

this is wrong. Recall the rule of thumb about the main connective for an existential quantiﬁer, and

think about what this says: There is something that is either not a damaged book or was checked out by some student who was ﬁned. It won’t help to change the connective to a conjunction, like this:

∃x((Bx &Dx) &∃y((Sy &Cyx) &Fy))

This asserts that there is a damaged book, but the original statement only claimed that if there were,

some student would be charged. The solution is to change the ﬁrst quantiﬁer to a universal:

∀x((Bx &Dx) →∃y((Sy &Cyx) &Fy))

This gets it right. It says that all damaged books (if there are any) are such that someone who checked

them out will be ﬁned. There was one more complication in the original statement, but not one that

should puzzle us now. It said that the student ‘will be ﬁned for it’. So instead of having a monadic

predicate ‘x is ﬁned’ we need a relation ‘Fxy’, ‘x is ﬁned for y’:

universal quantiﬁer on the whole conditional.
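The truth conditions of the ﬁnal reading, all damaged books are such that some student who checked them out is ﬁned for them, can be tested against a small model. The model below (the book and student names are invented for illustration) satisﬁes the sentence exactly when the borrower of the damaged book is ﬁned for it.

```python
# A tiny invented model for "all damaged books are such that some student
# who checked them out will be fined for them".
things = ["b1", "b2", "alice", "bob"]
book, damaged, student = {"b1", "b2"}, {"b1"}, {"alice", "bob"}
checked_out = {("alice", "b1")}   # Cxy: x checked y out

def sentence(fines):
    # ∀x((Bx & Dx) → ∃y((Sy & Cyx) & Fyx)), with Fyx read off `fines`
    return all(
        not (x in book and x in damaged)
        or any(y in student and (y, x) in checked_out and (y, x) in fines
               for y in things)
        for x in things)

print(sentence({("alice", "b1")}))  # True: the borrower is fined
print(sentence(set()))              # False: no one is fined
```

The undamaged book ‘b2’ makes its conditional true vacuously, matching the ‘(if there are any)’ reading in the text.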

Exercises

Symbolize the following English sentences using the following symbols:

Px: x is a pie

Cx: x is a cake

Dxy: x is more delicious than y

32 If there’s a pie more delicious than any cake, then that pie will be more delicious than every

cake.


Exercises

Symbolize the following English sentences using the following symbols:

Lxy: x loves y

Restrict the universe of discourse to persons

38 If someone loves everyone, then that person loves himself. (or herself)

Exercises

Translate the following from logical notation into English. All are common proverbs, or are

based on the scripture cited. Use the following symbols:

Bx: x is broken

Cxy: x comes to y

Fxy: x falls on y

Gx: x is made of glass

Gxy: x gathers y

Hx: x is home

Lx: x is a place

Lxy: x lives in y

Kxy: x is like y

Mx: x is moss

Px: x is a person

Pxy: x grinds y to powder.

Rx: x is rolling

Sx: x is a stone

Txy: x should throw y

Wx: x waits


46 ∀x(Px →∀yCyx)

Exercises

For the following exercises, use these symbols:

Lx: x is a lion

Zx: x is a zebra

Axy: x attacks y

Sxy: x sees y

56 If every lion attacks only zebras it sees, but no lion sees every zebra, then not every zebra is

attacked.

57 Some lions attack only zebras, but no lion attacks every zebra.

58 Every zebra is attacked by some lion, but no lion attacks every zebra.

Exercises

For the following exercises, use these symbols:

Bx: x is a boy

Gx: x is a girl

Kxy: x kisses y


Sxy: x sees y

Also, restrict the universe of discourse to persons. That means that the variables refer only to

persons.

Exercises

For the following exercises, use these symbols:

Ex: x is even

Ox: x is odd

x<y: x is less than y (notice that we use “inﬁx” notation here)

Sxy: x is the successor of y

Restrict the universe of discourse to natural numbers (0, 1, 2, ...). That means that the variables

refer only to numbers.


77 No number is less than every number.

78 Every even number has an odd successor.

79 Every odd number is the successor of some even number.

80 Every number is less than its successor.

81 No number is its own successor.

82 No number is less than itself.

83 If every number is less than its successor, then there is some number that is not the successor

of anything.

84 If a number is less than another, the second is not less than the ﬁrst.

85 If one number is less than another, and that number is less than a third, the ﬁrst number is

less than the third.

Exercises

Translate the following using the symbols provided.

86 There is a treasure in each egg, and an egg in every hiding spot. (Tx: x is a treasure, Ex: x is

an egg, Ixy: x is in y, Hx: x is a hiding spot)

87 If Brünnhilde is faster than everyone, she is faster than herself. (universe=persons, b: Brünnhilde,

Fxy: x is faster than y)

88 Any dog that chases itself will hurt something. (Dx: x is a dog, Cxy: x chases y, Hxy: x hurts

y)

89 Every man who looks at himself sees something he doesn’t like. (Mx: x is a man, Lxy: x looks

at y, Sxy: x sees y, Kxy: x likes y)

90 Every farmer who has a donkey beats it. (Fx: x is a farmer, Dx: x is a donkey, Bxy: x beats y.

This sentence is tricky, and it’s somewhat controversial what the right symbolism is.)

91 Harry shaves everyone who doesn’t shave himself. (universe=persons, h: Harry, Sxy: x shaves

y)

92 Harry shaves only those who don’t shave themselves. (universe=persons, h: Harry, Sxy: x

shaves y)

93 Harry shaves all and only those who don’t shave themselves. (universe=persons, h: Harry,

Sxy: x shaves y)

1.3 Properties of Relations

If you know that A is next to B in a straight line, then you know that B is next to A. This is because

the relation next to has a special property, called symmetry. If R is a symmetric relation, whenever x

stands in relation R to y , y stands in the same relation to x.

In symbols:

∀xy(Rxy → Ryx)

Some relations are symmetric. Others are asymmetric: if x stands in R to y , y doesn’t stand in R

to x. Taller than is a good example: If A is taller than B, you know that B is not taller than A. Other

relations are non-symmetric, which means that there are no guarantees either way.

Symmetric (e.g. the same height as, next to): ∀xy(Rxy → Ryx)

Asymmetric (e.g. taller than, in front of): ∀xy(Rxy → ∼Ryx)

Non-symmetric (e.g. loves): ∼∀xy(Rxy → Ryx) & ∼∀xy(Rxy → ∼Ryx)

Some relations are reﬂexive, which means everything stands in that relation to itself. Is the same

height as is a good example of this: everything is the same height as itself. Others are irreﬂexive, so

don’t stand in the relation to themselves, and others are non-reﬂexive.

(Actually, reﬂexivity usually refers to a slightly weaker property: if x bears relation R to anything,

or anything bears R to it, then it bears R to itself. The simpler property is called total reﬂexivity).

Reﬂexive (e.g. the same height as, in the same place as): ∀xRxx; for the weaker property, ∀x(∃y(Rxy v Ryx) → Rxx)

Irreﬂexive (e.g. taller than, in front of): ∀x∼Rxx

Non-reﬂexive (e.g. loves): ∼∀x(∃y(Rxy v Ryx) → Rxx) & ∼∀x∼Rxx


Finally, some relations are transitive: whenever x stands in the relation to y, and y stands in it to z, x stands in it to z. Taller than is a good example: if A is taller than B, and B is taller than C, then A is taller than C. Intransitive relations guarantee the opposite, and non-transitive relations give no guarantee either way.

Transitive (e.g. taller than, in front of): ∀xyz((Rxy & Ryz) → Rxz)

Intransitive (e.g. next to, immediately in front of): ∀xyz((Rxy & Ryz) → ∼Rxz)

Non-transitive (e.g. loves): ∼∀xyz((Rxy & Ryz) → Rxz) & ∼∀xyz((Rxy & Ryz) → ∼Rxz)
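On a ﬁnite domain, each of these properties can be checked mechanically by translating the quantiﬁed deﬁnitions into loops. The sketch below does this in Python; the example relations over the domain {1, 2, 3} are invented for illustration.

```python
# Checks for properties of relations over a finite domain, written
# directly from the quantified definitions in the text.
def symmetric(R):         # ∀xy(Rxy → Ryx)
    return all((y, x) in R for (x, y) in R)

def asymmetric(R):        # ∀xy(Rxy → ∼Ryx)
    return all((y, x) not in R for (x, y) in R)

def reflexive(R, dom):    # ∀xRxx (total reflexivity)
    return all((x, x) in R for x in dom)

def irreflexive(R, dom):  # ∀x∼Rxx
    return all((x, x) not in R for x in dom)

def transitive(R):        # ∀xyz((Rxy & Ryz) → Rxz)
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

dom = {1, 2, 3}
taller = {(1, 2), (1, 3), (2, 3)}    # a strict ranking: 1 over 2 over 3
same_height = {(x, x) for x in dom}  # everything only its own height

print(asymmetric(taller), irreflexive(taller, dom), transitive(taller))
print(symmetric(same_height), reflexive(same_height, dom))
```

A strict ranking like taller comes out asymmetric, irreﬂexive, and transitive, matching the table above.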

1.4 Identity

One special relation is identity, the relation that everything bears to itself and to nothing else. Partly

because this relation is so special, but also partly because it’s so familiar, we symbolize this relation

differently. Instead of writing

Ixy

we write

x=y.

Similarly, instead of writing

∼x=y

we write

x̸=y.

We will consider four forms of statement that can be symbolized using identity.

There are at least two As: ∃x ∃y (Ax &Ay &x ̸= y)

There are at least three As: ∃x ∃y ∃z (Ax &Ay &Az &x ̸= y &x ̸= z &y ̸= z)

There is at most one A: ∀x ∀y ((Ax &Ay) →x=y)

There are at most two As: ∀x ∀y ∀z ((Ax &Ay &Az) →(x=y v x=z v y=z))

There is exactly one A: ∃x (Ax &∀y (Ay →x=y))

There are exactly two As: ∃x ∃y (Ax &Ay &x ̸= y &∀z (Az →(x=z v y=z)))

The A is B: ∃x (Ax &∀y (Ay →x=y) &Bx)

The C-est A is B: ∃x (Ax &∀y ((Ay &x ̸= y) →Cxy) &Bx)


There is at least one pie on the shelf.

(For this and the following examples, we’ll use these symbols: Px=‘x is a pie’ and Sx=‘x is on the

shelf ’)

The existential quantiﬁer already says ‘at least one’, so ‘there is at least one pie on the shelf ’ is

symbolized

∃x (Px &Sx)

With larger n, we want to guarantee that the items picked out are distinct. To say ‘There are

at least two pies on the shelf ’, we need to say ‘there is a pie on the shelf, and there is another pie on

the shelf ’. To get the idea of ‘another’, we use the negated identity, to mean something that is not the

thing we already picked out:

For larger numbers (at least three, at least four, ...), we need more quantiﬁers. To say ‘at least n’,

we need n quantiﬁers. Then we need to say that none(of these ) is the same as any of the others, that

n

there are n distinct things. To do this, we need to add n−1 or n(n − 1)/2 non-identity conjuncts.

So to say ‘There are at least three pies on the shelf ’, we say

You may have noticed that there are some parentheses missing. When symbolizing with identity,

‘&’s can multiply quickly. In this section, we will allow strings of conjunctions to have parentheses only

surrounding the whole, instead of around each pair.
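One way to convince yourself that the identity-based symbolization of ‘there are at least two pies on the shelf ’ really does mean ‘at least two’ is to compare it with plain counting on every interpretation over a small domain. The Python sketch below does exactly that; the ﬁve-object domain is arbitrary.

```python
from itertools import product

# Brute-force check that ∃x∃y(Px & Sx & Py & Sy & x ≠ y) agrees with
# counting, on every assignment of P and S over a five-object domain.
domain = range(5)

def at_least_two(P, S):
    return any(P(x) and S(x) and P(y) and S(y) and x != y
               for x in domain for y in domain)

for bits in product([False, True], repeat=10):
    P = lambda x, b=bits: b[x]        # which objects are pies
    S = lambda x, b=bits: b[5 + x]    # which objects are on the shelf
    count = sum(1 for x in domain if P(x) and S(x))
    assert at_least_two(P, S) == (count >= 2)
print("the symbolization matches counting on all 1024 interpretations")
```

The ‘x ̸= y’ conjunct is doing the work: without it, a single pie on the shelf would satisfy the formula by being picked out twice.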

There is at most one pie on the shelf


To say ‘at most one’, we’ll say something like ‘if you try to take two things, you’ll really take the

same thing twice’. That is:

36 CHAPTER 1. FIRST-ORDER LOGIC

Take anything you like, and take anything you like: if they’re both pies on the shelf, you

took the same thing twice.

This is a little awkward to say in English. We want to say ‘take anything you like, and take anything else ...’. But that ‘else’ isn’t built into the quantiﬁer: the two quantiﬁers are allowed to pick out the same thing, which is what makes the English paraphrase awkward. If we want the ‘else’, we use identity.

To handle larger numbers (at most two, at most three, ...), we add more quantiﬁers. To say ‘there are at most n things’, we need n + 1 quantiﬁers. In the consequent, we disjoin an identity statement for every combination of variables (so there will be (n + 1 choose 2), or n(n + 1)/2, disjuncts). The sentences get unwieldy pretty quickly, but the idea is this: if you try to pick out n + 1 things, you’ve picked out the same thing at least once.

There is exactly one pie on the shelf

‘Exactly one’ means ‘at least one and at most one’. So we can symbolize this simply by conjoining

‘at least’ with ‘at most’:

There is a simpler way to do this, however. First I’ll paraphrase, then symbolize:

There is a pie on the shelf, and everything that is a pie on the shelf is that ﬁrst pie.

For larger numbers (exactly two, exactly three, ...), we could likewise simply conjoin the sentences for ‘at least’ and ‘at most’. But we could also combine them, with n existential quantiﬁers and one additional universal. We need just as many non-identity conjuncts as we do to say ‘at least’, but in the consequent of the universal conjunct, we need only n additional identity disjuncts.

The A is B

The pie is on the shelf

This statement asserts that there is at least one pie and that there is at most one pie, and that

this pie is on the shelf. So we can symbolize it as if it said

There is exactly one pie, and it is on the shelf

in symbols

∃x (Px &∀y (Py →x=y) &Sx)

It may seem odd that ‘the pie is on the shelf ’ and ‘Peter is on the shelf ’, which have similar

grammatical structure in English, should translate into logic so differently. ‘Peter is on the shelf ’

is a simple subject-predicate statement: ‘Sp’. But ‘the pie is on the shelf ’, also a simple subject-

predicate in English, turns into logic as a monster: ‘∃x (Px &∀y (Py →x=y) &Sx)’. The philosopher

who proposed this translation, Bertrand Russell, recognized how very odd this seems. He took the

moral to be that we cannot simply look to the structure of a sentence in a natural language like

English to ﬁnd out what its logical and philosophical implications are.

Not every instance of ‘the’ should be translated this way. Sometimes ‘the’ means ‘all’, as in ‘The

good die young’ (paraphrase: ‘Everything is such that if it is good, it dies young’). But if ‘the’ means

to pick out a single thing, it should be translated according to Russell’s theory.
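Russell’s analysis, ‘∃x (Px &∀y (Py →x=y) &Sx)’, can be evaluated directly over small models. The sketch below (the three-object domain is invented for illustration) shows that the sentence fails both when there is no pie and when there is more than one, exactly as the uniqueness clause demands.

```python
# Evaluating Russell's analysis of "the pie is on the shelf",
# ∃x(Px & ∀y(Py → x = y) & Sx), over small invented models.
def the_P_is_S(domain, P, S):
    return any(P(x) and all(not P(y) or y == x for y in domain) and S(x)
               for x in domain)

domain = ["a", "b", "c"]
# Exactly one pie, and it is on the shelf: true.
print(the_P_is_S(domain, lambda x: x == "a", lambda x: x == "a"))
# Two pies: false, even though both are on the shelf (uniqueness fails).
print(the_P_is_S(domain, lambda x: x in "ab", lambda x: x in "ab"))
# No pies: false (existence fails).
print(the_P_is_S(domain, lambda x: False, lambda x: True))
```

Compare this with the simple subject-predicate ‘Sp’, which is true or false depending only on the one named object.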

The C-est A is B

One special case of ‘the’ sentences are sentences with a superlative. We might try to symbolize

(using an additional symbol: Txy=x is tastier than y) as ‘∃x (Px &∀y (Py →Txy) &Sx)’. This is close:

it says that there is some pie tastier than all pies, and it is on the shelf. But this implies that this pie

is tastier than itself. We want to say that this pie is tastier than all other pies, and for that we need

identity:

This also allows us to correct the translations of ‘there is a most delicious thing’ and ‘there is a

least delicious thing’ from a few sections back:

∃x ∀y (x ̸= y →Dxy)

∃x ∀y (x ̸= y →Dyx)
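The corrected translations can also be checked by brute force. In the Python sketch below (dessert names and the strict ranking are invented for illustration), the identity-guarded ‘∃x∀y(x ̸= y →Dxy)’ correctly ﬁnds a most delicious thing, while the unguarded ‘∃x∀yDxy’ fails because nothing is more delicious than itself.

```python
# Checking the identity-corrected "there is a most delicious thing",
# ∃x∀y(x ≠ y → Dxy), over an invented ranking cake > donut > flan.
domain = ["cake", "donut", "flan"]
D = {("cake", "donut"), ("cake", "flan"), ("donut", "flan")}

most = any(all(x == y or (x, y) in D for y in domain) for x in domain)
least = any(all(x == y or (y, x) in D for y in domain) for x in domain)
# Without the x ≠ y guard, ∃x∀yDxy demands Dxx and comes out false.
unguarded = any(all((x, y) in D for y in domain) for x in domain)

print(most, least, unguarded)  # True True False
```
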


Exercises

Symbolize the following sentences, using this key:

Lxy: x is larger than y

Exy: x eats y

Sx: x is a snickerdoodle

Cx: x is a cookie

j: Jezebel

Exercises

Symbolize the following sentences, using this key:

Ax: x is an apple pie

Bxy: x is better than y

Cx: x is a cookie

Sx: x is on the shelf

Sxy: x is smaller than y

12 There’s an apple pie on the shelf. (That is, there’s exactly one.)


20 There are two cookies, and they are the same size.

Exercises

For these exercises, restrict the universe of discourse to the “natural” numbers 0, 1, 2, 3, ....

Often, when we are working with numbers, we also use ‘<’ and ‘>’ as relations between terms. So to

say ‘x is greater than y’, we say ‘x>y’. We also let the numerals stand as names for the numbers, so

we can say ‘5>4’.

Use the following symbols:

Px: x is prime

Ex: x is even

x>y: x is greater than y

Dxy: x is a divisor of y

27 The prime number greater than 5 and less than 10 is not even.


Exercises

Translate the following into English, using this key.

Bxy: x is better than y

Fx: x is free

Gxy: x is longer than y

Lx: x is a laugh

Lxy: x is later than y

Jx: x is a journey

Sx: x is a step

Sxy: x starts with y

32 ∃x ∀y(Bxy &Fx)


The concept of a function is particularly important in logic and mathematics. A function can be

thought of as a rule enabling one to go from one or more members of one set (called the domain)

to a unique member of a second set (called the range). For example, one function would be a

rule (perhaps in the form of a chart or an equation) that, given a package of hamburger of a certain

weight, enables one to determine its cost. The domain of this function is weights of different packages

and the range is costs. Notice that a function may assign the same value to more than one member

of the domain (packages with different weights can have the same cost—perhaps costs go up by half-pound increments). However, it cannot assign different values to any single member of the domain,

that is, packages of the same weight cannot be assigned different costs. If this were to happen, the

cost would not be a function just of the weight (although, it may be a function of weight and of other

variables such as fat content).

For many functions, the domain and the range are the same one set (e.g. numbers). Thus, many

mathematical functions are rules enabling one to go from one or more numbers to another number.

For example, the so-called successor function is this rule: given any number, write its successor.

Values in a domain are called arguments. Of course, this use of ‘argument’ is not related to its

typical use in logic (just like the bark of a dog has nothing to do with the bark that grows on trees).

We have encountered functions throughout our study of logic. Here are two examples: (1) ‘not’,

‘or’, ‘and’, ‘if . . ., [then] . . .’, and so forth are called truth-functional connectors because the truth

values of the compound statements in which they are used are functions of the truth values of the

simple statements of which those compound statements are composed. The domain and the range


of these functions are the same, namely, truth and falsity. ‘Not’ is a one-place function, the other

connectors are two-place functions. (2) Expressions like ‘Px’ are called statement functions. Such

expressions are functions whose domain is a set of individuals (whose names we insert in place of ‘x’)

and whose range is truth and falsity. For each named individual, if that individual has property P,

the value of ‘Px’ is true, otherwise it is false. So Px is a rule for going from individuals (the domain)

to truth and falsity (the range).
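Both kinds of function just described can be written out explicitly. In this Python sketch, ‘not’ and ‘and’ are literal lookup tables over the truth values, and a statement function ‘Px’ maps individuals to truth values; the domain of named individuals is invented for illustration.

```python
# 'Not' as a one-place truth function and 'and' as a two-place one: the
# domain and the range of both are the set of truth values.
NOT = {True: False, False: True}
AND = {(p, q): p and q for p in (True, False) for q in (True, False)}

# A statement function like 'Px' maps individuals to truth values.
mortals = {"Socrates", "Plato"}   # invented example domain
def P(x):                         # Px: 'x is mortal'
    return x in mortals

print(NOT[True], AND[(True, True)], P("Socrates"), P("Zeus"))
# False True True False
```
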

As we have noted, for many mathematical functions the domain and the range are both the

set of natural numbers. Some functions deﬁned in the domain of natural numbers require two (or

more) arguments to assign a value in the range. For example, given any two numbers (which may

or may not be the same), an addition table enables one to ﬁnd their sum. Of course, different pairs

of numbers can have the same sum (e.g. 1 and 5 or 2 and 4), but, as in the case of any other function

(for example, the package-weight X cost function), if a given pair were assigned more than one value

in the range, addition would not be a function (at least not a function simply of the numbers being

added). Thus, addition and multiplication are two-place functions. In mathematics, functions are

usually represented by such letters as ‘f ’ or ‘g’. If ‘x’ and ‘y’ represent values in a domain, expressions

like ‘f(x)’ and ‘g(x,y)’ represent the corresponding values in the range. Thus, we could represent

the sum of x and y by ‘s(x,y)’ and their product by ‘p(x,y)’; however, these functions are usually

represented by ‘x + y’ and ‘x × y’ respectively, and these are the symbols we will use.

When a function symbol is attached to the names of appropriate objects, the result can be

thought of as a complex name for the corresponding value in the range. Names and appropri-

ately ﬁlled function symbols are called terms. Thus, ‘5’ and ‘2 + 3’ are terms both of which happen

to pick out the same object—the number ﬁve.

Function symbols resemble predicate symbols—both can have one, two, three, or more blanks.

But do not confuse the two. Functions are rules that enable one to go from one or more members

of a domain to a unique value in a range. Terms, whether names or appropriately ﬁlled function

symbols, are neither true nor false. For example, the term ‘2 + 3’, in which the function symbol

‘+’ is attached to two appropriate arguments, is neither true nor false (just as ‘5’ is neither true nor

false). This is true even of truth functions. For example, like ‘2 + 3’, the compound ‘∼A’ is neither true nor false—it simply picks out, in the range, the opposite truth value to whatever value is assigned to ‘A’.

By contrast, predicates assert that a given object has a particular property or that two or more

objects stand in a certain relation. Thus, when a predicate is appropriately attached to one or more

terms, the result is a sentence that is either true or false. For example, if ‘x>y’ represents the relation

is greater than (and using ‘=’ as we have already done), ‘3 > 2’, ‘2 + 3 = 5’, and ‘2 + 3 > 2’ are true,

while ‘2 > 3’, ‘2 + 3 = 3 + 3’, and ‘4 + 3 > 3 + 4’ are false. Here ‘=’ and ‘>’ are two-place predicates

whereas ‘+’ is a two-place function.

In the language of logic, functions match up with constants and (ﬁrst-order) variables; the general

term for these three things is ‘term’. Terms pick out objects, whereas predicates (and second-order

variables) pick out properties of objects. A two-place relation, for example, must have two terms,

but these could be constants, variables, or functions. Constants and variables are each only one

character long, but functions will be more. Because of this, when we symbolize using functions, we


sometimes put parentheses around the terms of the predicate or relation, and separate the terms

with commas. So, if we let ‘s(x)’ symbolize ‘the student of x’, ‘Txy’ stand for ‘x teaches y’, and ‘c’

stand for ‘Dr. Christensen’, then

T(c,s(c))

will symbolize ‘Dr. Christensen teaches his student’, and (restricting the universe of discourse to

persons),

∀xT(c,s(x))

symbolizes ‘Dr. Christensen teaches everyone’s student’. This last example has a statement that

includes a variable, a constant, and a function.

Notice that the function ‘s(x)’ symbolizes ‘the student of x’—a noun phrase—not something like

‘x is a student’. All terms, including function, stand for noun phrases, and as such cannot be either

true or false. Here’s a way to think of the three kinds of terms: constants stand for names, variables

stand for pronouns, and functions stand for “deﬁnite descriptions,” roughly, noun phrases starting

with ‘the’.

Functions can be chained. So to symbolize ‘The student of Dr. Christensen teaches the student

of the student of Dr. Christensen’, we say

T(s(c),(s(s(c))).

Exercises

Translate the following sentences using the key provided:

t(x) the teacher of x

s Socrates

p Plato

Gx x is Greek

Lxy x learns from y

(restrict the universe of discourse to persons)

37 Every Greek learns from the teacher of Plato.

Functions are not strictly necessary. Every statement that can be symbolized using functions can

be symbolized with identity, but without functions. But the opposite doesn’t hold. A sentence like

‘Socrates is the teacher of Plato’ can be symbolized like this:

s=t(p),

or, using a two-place predicate ‘Txy’ (‘x is a teacher of y’) in place of the function symbol, like this:

∃x ((Txp &∀y (Typ →y=x)) &x=s).

Exercises

Translate the following sentences using the key provided:

s(x) the successor of x (that is, the natural number that follows x)

Px x is prime

Ex x is even

Also, use the numerals to name numbers, use >, <, and =, and restrict the universe of discourse

to (natural) numbers.

42 5 is the successor of 4.

48 There is some prime number such that the successor of its successor is prime.

Exercises

Translate the following sentences using the key provided:

f(x) the father of x

g George

Px x is a pioneer

Axy x is an ancestor of y

Oxy x is older than y

Chapter 2

First-Order proofs

Because all the truth-functional connectives are used in statements of FOL, all the rules of TF work

here, too. For example, the following is a legitimate proof in FOL:

1 ∀xAx →∃xBx

2 ∀xAx .˙.∃xBx

3 ∃xBx 1,2 MP

But we have new symbols: the quantiﬁers, and we need new rules to help us deal with them. We

will add four new rules, two for each quantiﬁer. Two of the rules are straightforward, and we start

with them.

universal instantiation

If I know that every human being is mortal, then I know that any particular human being (say,

Socrates) is mortal. So I should be able to go from a sentence like

∀xMx

to a sentence like

Ms

The rule of inference that permits us to do this is called universal instantiation (UI) because we take

an instance of a universally quantiﬁed statement. In symbols, the rule is stated like this:

∀xPx ⊢ Pa

In this rule, ‘P’ stands for any sentence, ‘x’ for any variable in that sentence, and ‘a’ for any

name.

There are a few restrictions on our use of this rule:



1. The universal quantifer must apply to the whole line. If any other symbol precedes the quan-

tiﬁer, or if there are other symbols after the scope of the quantiﬁer, we cannot apply the rule.

(The whole-line restriction.)

2. We must replace every occurrence of the variable with the name. (The general convention.)

The ﬁrst restriction prohibits this inference:

1 ∀xAx → ∀xBx
2 Aa → ∀xBx

Because the ﬁrst ∀ is not the main logical symbol (it does not apply to the whole line), universal instantiation cannot be applied here.

The second restriction prohibits the following inference:

1 ∀xAxx

2 Aax

Here we have replaced only one occurrence of ‘x’ with the name ‘a’ and left the other alone.

Now, for a legitimate use of the rule:

1 ∀x(Hx → Mx)
2 Hs .˙.Ms
3 Hs → Ms 1 UI

4 Ms 2,3 MP

existential generalization

If I know that Felix is in the room, I know that something is in the room. So I should be able to go

from

Rf

to

∃xRx.

The rule that permits this inference is called existential generalization (EG). In symbols, the rule is

this:

Pa ⊢ ∃xPx

This means that I can take any statement with some name and replace one or more instances

of that name with a variable, preﬁxing the statement with an existential quantiﬁer. As before, there

are two restrictions:


1. The existential quantifer must apply to the whole line. We cannot put the existential quantiﬁer

anywhere but at the front of the statement, with its scope the entire statement. (The whole-line

restriction.)

2. The variable that we choose cannot occur anywhere else in that statement. (The general con-

vention).

The ﬁrst restriction is violated in this example:

1 ∀x(Lax → Lxa)
2 ∀x(∃yLyx → Lxa)

This goes from the claim that everyone Artlinde loves loves her to the claim that anyone who is

loved by anyone loves Artlinde. That’s clearly not a legitimate inference, and the rule blocks it.

The second restriction is violated in this example:

1 Lax

2 ∃xLxx

Here the x becomes bound when the quantiﬁer is added. This deduction goes from Artlinde

loves Xenophon to someone loves himself. This is clearly illegitimate, and the second restriction

blocks it.

Now, an example of these rules:

1 ∀x(Ax → Bx)

2 Aa .˙.∃xBx
3 Aa → Ba 1 UI

4 Ba 2,3 MP

5 ∃xBx 4 EG

universal generalization

The next rule is a little trickier. We need a rule that allows us to introduce the universal quantiﬁer.

Say we had a universe of only three objects—the books on this table, say. Call them ‘a’, ‘b’, and

‘c’. If we know that a was written by Aristotle, and b was written by Aristotle, and c was written by

Aristotle, we could conclude that everything (in this universe) was written by Aristotle. That is, we

could go from Aa, Ab, and Ac to ∀xAx.

In general, though, this won’t work. We don’t always know ahead of time how many things are

in the universe of discourse, and there may be inﬁnitely many things. So we need a different rule.

In mathematical proofs, we might draw a triangle and label it ABC, say. We prove things about this

arbitrary triangle with an arbitrary name, and we can conclude that what we’ve proved holds for all

similar triangles. But we are not allowed to make any special use of the triangle we’ve drawn. We

can’t measure the angles, and conclude that all triangles have angles of just this size, for instance.


We might say it this way. If I can prove something about a single individual, but I know that the

proof would have worked no matter which individual I chose, I don’t have to prove it about all of

them. This one individual stood in for them all. So I need some restrictions to guarantee that the

object I have chosen is really arbitrary, that what I prove about it I could have proved about anything.

In symbols, the rule is this:

Pa ⊢ ∀xPx

As before, there are restrictions on our use of this rule:

1. The universal quantifer must apply to the whole line. If any other symbol precedes the quan-

tiﬁer, or if there are other symbols after the scope of the quantiﬁer, we cannot apply the rule.

(The whole-line restriction.)

2. The variable that we choose cannot occur anywhere else in that statement. (The general con-

vention).

3. We must replace every occurrence of the name with the variable. (The general convention.)

4. The name that we quantify from cannot appear in any premises or assumptions that are still

in force. (The arbitrariness restriction.)

The ﬁrst two restrictions are the same as before. The third is similar, and the fourth is new. Let’s

look at some examples of violations.

1 ∀x(Ax → Bx)

2 Aa (cp)

3 Aa → Ba 1 UI

4 Ba 2,3 MP

5 ∀xBx illegitimate use of UG

6 Aa → ∀xBx 2-5 CP

This proof goes from the premise that all alligators are brown to the conclusion that if Artlinde

is an alligator, everything is brown. This is clearly invalid, and the fourth restriction prohibits this.

Notice that the restriction is concerned only with assumptions that are still in force. The follow-

ing proof (of a Barbara syllogism) is ﬁne:


1 ∀x(Ax → Bx)

2 ∀x(Bx → Cx) .˙.∀x(Ax →Cx)

3 Aa (cp)

4 Aa → Ba 1 UI

5 Ba → Ca 2 UI

6 Ba 3,4 MP

7 Ca 5,6 MP

8 Aa → Ca 3-7 CP

9 ∀x(Ax → Cx) 8 UG

This proof is ﬁne, even though the letter a occurs in the assumption on line 3, because this

assumption is discharged when CP is applied on line 8. The restriction prohibits us from applying UG within a subproof

on any letters that occur in the assumption.

Barbara can be proved without CP, like this:

1 ∀x(Ax → Bx)

2 ∀x(Bx → Cx) .˙.∀x(Ax →Cx)

3 Aa → Ba 1 UI

4 Ba → Ca 2 UI

5 Aa → Ca 3,4 HS

6 ∀x(Ax → Cx) 5 UG

This is by far the most common way UG is used. We ﬁrst use UI, then, after doing some TF rules,

we use UG to put the quantiﬁer back on. As long as the letter we choose isn’t used in the premises

or assumptions, nothing prevents us from having chosen any other letter.

The third restriction prohibits us from an inference like this:

1 Laa

2 ∀xLxa illegitimate use of UG

This inference goes from the claim that Artlinde loves herself to the claim that everyone loves

Artlinde. This is clearly flawed, and whether line 1 is a premise or an assumption, the third restriction

prohibits it.
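A quick countermodel makes the flaw vivid (my own illustration; the two-object domain and the relation are assumed for the sake of the example):

```python
# Artlinde ('a') loves herself, but a second object 'b' does not love her,
# so the premise Laa is true while the conclusion ∀x Lxa is false.
domain = ["a", "b"]
L = {("a", "a")}      # the loves-relation: only the pair (a, a)

premise = ("a", "a") in L                          # Laa
conclusion = all((x, "a") in L for x in domain)    # ∀x Lxa

print(premise, conclusion)   # prints: True False
```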

Exercises

1 ∀x (Ax →Bx)

∀x (Bx →Cx)

∀xAx .˙.∃x (Ax &Cx)

2 Aa &Ba

∃xBx →∀x(Ax →Cx) .˙.∃xCx

3 ∀x((Ax &Bx) →Cx)

∀x(Ax &Bx) .˙.∀xCx


4 Aa

∀x(Ax →Bx)

∀x(Bx →Cx) .˙.∃xAx

5 ∃xCx →∀x∼Bx

∀x(Ax →Bx)

Ca .˙.∃x ∼Ax

∀xAx .˙.∃x ∀y Bxy

existential instantiation

Existential instantiation (EI) allows us to go from the claim that something is a certain way to the claim

that some speciﬁc thing is that way. It allows us to go from the claim that some cookies are delicious

(∃x(Cx &Dx)) to the claim that Nebuchadnezzar is a delicious cookie (Cn &Dn). Clearly, this must

be subject to careful restrictions, just as UG is.

In fact, EI will be treated a little differently from the other quantiﬁer rules. It will require an

assumption, like CP and IP. Suppose we know that all cookies are delicious, and that there are

cookies. We could conclude that there are delicious things:

∀x(Cx → Dx)

∃xCx .˙. ∃xDx

Suppose we reason as follows:

Let’s call one of the cookies ‘Nebuchadnezzar’. By the ﬁrst premise, if Nebuchadnezzar

is a cookie, then it’s delicious. And we assumed it’s a cookie, so it must be delicious. So

there is something that is delicious. There was nothing special about the name, so we can

conclude that something is delicious.

1 ∀x(Cx → Dx)

2 ∃xCx .˙.∃xDx

3 Cn (ei, n)

4 Cn → Dn 1 UI

5 Dn 3,4 MP

6 ∃xDx 5 EG

7 ∃xDx 2, 3-6 EI

In symbols, the rule is this:

∃xPx, Pa ... p ⊢ p


That is, if there is an existentially quantiﬁed line and we assume an instance of that line and

conclude some statement p, we can conclude p outside the subproof.

The restrictions are these:

1. The existential quantiﬁer must apply to the whole line. If any other symbol precedes the

quantiﬁer, or if there are other symbols after the scope of the quantiﬁer, we cannot apply the

rule. (The whole-line restriction.)

2. We must replace every occurrence of the variable with the name. (The general convention.)

3. The name that we instantiate to cannot appear in any previous line of the proof (excluding

closed subproofs). (The arbitrariness restriction.)

4. The name cannot appear in the line that closes out the assumption.

These restrictions are similar to the restrictions on UG, but are even stricter. For UG, we couldn’t

generalize on a letter that occurs in any premises or undischarged assumptions. For EI, we cannot

instantiate using a letter that occurs on any previous line. So even if the letter was introduced by an

application of UI, it cannot be used in the EI assumption.

Here’s an example that violates that restriction:

∀x∃y Txy .˙. ∃xTxx

The argument goes from the claim that everything is taller than something to the claim that some-

thing is taller than itself. This is clearly invalid. Suppose we tried to prove it like this:

1 ∀x∃y Txy .˙. ∃xTxx

2 ∃y Tay 1 UI

3 Taa (ei, a)

4 ∃xTxx 3 EG

5 ∃xTxx 2, 3-4 EI (incorrect, illegitimate, wrong)

The assumption on line 3 violates the restriction, because the letter a occurs on line 2. We would

have had to choose a different letter, and the proof would not have worked. And we couldn’t do the

EI assumption before we did UI, because then we would have violated the whole-line restriction.
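The invalidity can also be shown semantically with a tiny countermodel (my own illustration; the two-object domain and the abstract relation are assumed):

```python
# A two-object countermodel: the premise ∀x∃y Txy is true but the
# conclusion ∃x Txx is false, so no correct proof could exist.
domain = [0, 1]
T = {(0, 1), (1, 0)}     # an abstract two-place relation, read "Txy"

premise = all(any((x, y) in T for y in domain) for x in domain)  # ∀x∃y Txy
conclusion = any((x, x) in T for x in domain)                    # ∃x Txx

print(premise, conclusion)   # prints: True False
```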

Here’s a proof that violates the fourth restriction. It should be obvious that it’s invalid.

1 ∃x(Ax &Bx) .˙. ∀xAx

2 Aa & Ba (ei, a)

3 Aa 2 Simp

4 Aa 1, 2-3 EI (violates restriction 4)

5 ∀xAx 4 UG

Here the UG line is ﬁne, since the assumption where a was introduced has been closed out.

But restriction 4 requires that the letter introduced no longer appear on the line that closes out the assumption. This restriction, together with the UG restriction against generalizing on letters that occur in assumptions, prevents this kind of logical hooliganism.

To help us remember which letter was introduced, to the right of an EI assumption we put the

letter in parentheses, along with the letters ‘ei’. On any line in the subproof that does not have that

letter, we may end the subproof, duplicating the line in the main proof.

Often we need to instantiate the same letter for a universal and an existential quantiﬁer. If we

do UI ﬁrst, we cannot use the same letter for EI. So, whenever possible, do EI ﬁrst. Here’s a proof of a

Disamis syllogism, to illustrate a legitimate proof using EI:

1 ∃x(Ax &Bx)

2 ∀x(Ax →Cx) .˙.∃x(Bx &Cx)

3 Aa &Ba (ei, a)

4 Aa →Ca 2 UI

5 Aa 3 Simp

6 Ba 3 Simp

7 Ca 4,5 MP

8 Ba &Ca 6,7 Conj

9 ∃x(Bx &Cx) 8 EG

10 ∃x(Bx &Cx) 1, 3-9 EI

This proof is a good example of a typical use of EI. As in this proof, typically an EI assump-

tion will end with an EG. The citation to the right of the EI line includes the line number of the

existentially generalized statement and all the lines in the subproof.

Here is one more example, this time using EI inside a conditional proof:

1 ∀x(Ax → Bx) .˙. ∃x∼Bx → ∃x∼Ax

2 ∃x∼Bx (cp)

3 ∼Ba (ei, a)

4 Aa → Ba 1 UI

5 ∼Aa 3,4 MT

6 ∃x∼Ax 5 EG

7 ∃x∼Ax 2,3-6 EI

8 ∃x∼Bx → ∃x∼Ax 2-7 CP

Exercises

1 ∃x(Ax &∀yBxy)

∀x ∀y(Bxy →Byx) .˙.∀x ∃yBxy

2 ∀x(Bx →Cx)

∃x(Dx &∼Cx) .˙.∃x (Dx &∼Bx)


∃xBx .˙.∃x ∃y ((Bx &Cy) &Dxy)

quantiﬁer negation

As noted above, the quantiﬁers can be deﬁned in terms of each other:

∀x ∼Rx ≡ ∼∃xRx

∼∀xRx ≡ ∃x ∼Rx

∼∀x ∼Rx ≡ ∃xRx
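These equivalences can be spot-checked semantically. Here is a brute-force check (my own illustration; the small domain is assumed, and a finite check is evidence rather than a proof, though the equivalences do hold in every domain):

```python
from itertools import product

# Verify the three QN equivalences under EVERY interpretation of a
# one-place predicate R on a small finite domain.
def qn_equivalences_hold(domain, R):
    e1 = all(not R[x] for x in domain) == (not any(R[x] for x in domain))
    e2 = (not all(R[x] for x in domain)) == any(not R[x] for x in domain)
    e3 = (not all(not R[x] for x in domain)) == any(R[x] for x in domain)
    return e1 and e2 and e3

domain = [0, 1, 2]
for values in product([False, True], repeat=len(domain)):
    assert qn_equivalences_hold(domain, dict(zip(domain, values)))
print("QN equivalences hold in every interpretation")
```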

We will adopt these equivalences as a new rule of proof, quantiﬁer negation (QN). This rule doesn’t

allow us to prove any new arguments. All the quantiﬁer equivalences can be proved using our regular

quantiﬁer rules, so QN is just a shortcut. But it does come in handy when we have a quantiﬁer with

a negation in front of it. The negation prohibits us from applying any of the quantiﬁer rules. But

we can “drive in” the negation using QN, as in this proof of a version of Celarent:

1 ∼∃x(Ax &Bx)

2 ∀x(Cx →Ax) .˙.∼∃x(Cx &Bx)

3 ∀x ∼(Ax &Bx) 1 QN

4 ∼(Aa &Ba) 3 UI

5 ∼Aa v∼Ba 4 DeM

6 Aa →∼Ba 5 CE

7 Ca →Aa 2 UI

8 Ca →∼Ba 6,7 HS

9 ∼Ca v∼Ba 8 CE

10 ∼(Ca &Ba) 9 DeM

11 ∀x∼(Cx &Bx) 10 UG

12 ∼∃x(Cx &Bx) 11 QN

Exercises

1 ∃x (Ax &Bx)

∼∃x (Bx &Cx) .˙.∼∀x (Ax →Cx)

2 ∼∀x(Bx →Cx)

∀x∼(Bx &Dx)

∀x(Ax →(Cx vDx)) .˙.∼∀xAx

3 ∼∃x(Ax &∼Bx)

∼∃x(Bx &(∼Cx v∼Dx)) .˙.∼∃x(Ax &(∼Cx &∼Dx))


4 ∀x(Bx →∼∀yAxy)

∼∀x (∼Bx &∀y(Cx →Axy)) .˙.∼∀x Cx

5 ∀x ∀y Axy

∼∃x ∃y Bxy .˙.∃x ∼∃y (Axy →Byx)

logical truths

TF statements that are always true are called tautologies. We could use that word also to apply to

FOL statements that are always true, but normally the word is restricted to TF use. I will call them

by the generic term logical truths.

Just as when we prove that a statement is a tautology, we prove that a statement is a logical truth

by constructing an argument with no premises. We can always begin by assuming the negation of the

statement, and looking for a contradiction. If the main connective in the statement is a conditional,

we could assume the antecedent and conclude with the consequent. If the main connective of the

sentence is a quantiﬁer, sometimes we can assume part of the statement. Here’s an example of that

last strategy:

Example: ∀x(Ax →Ax)

1 Aa (cp)

2 Aa →Aa 1 CP

3 ∀x(Ax →Ax) 2 UG

We have seen the trick on line 1 before. Notice also that line 3 violates no restrictions, as the

subproof where a is introduced has been closed out.

Exercises


13 ∃y(∀xFx ↔Fy)

14 ∀y(Fy →∃xFx)

15 ∃y(Fy →∀xFx)

16 ∃y(∃xFx ↔Fy)

Strategies

The highest-level strategies remain the same for any arguments: analyze the argument forward,

backward, and globally, and when all else fails try random walk or indirect proof. There are a few

more speciﬁc strategies, and a few tactics.

Strategy 1: Reduce to a truth-functional proof.

Many arguments in quantiﬁcational logic, including all syllogisms, can be proved by following

these steps: (1) remove the quantiﬁers, using UI and EI; (2) complete a truth-functional proof; (3) restore the quantiﬁers, using UG and EG.

When applying step 1, there are a few things you should watch out for:

a. Sometimes some of the premises will be unquantiﬁed. Be careful that you pick the right

constants when removing the other quantiﬁers, so that the argument will work, but not violate the

restrictions on EI.

b. Speaking of EI, remember as a rule of thumb to remove existential quantiﬁers before universal

quantiﬁers.

Strategy 2: Mix truth-functional and quantiﬁer steps.

Sometimes the above strategy fails. This may be because the premises or conclusion have quan-

tiﬁers, but they don’t apply to the whole line, so cannot be removed with our rules. Here we need

to be a little more ﬂexible, but in general, here are the rules to follow:


1. Remove any quantiﬁers that apply to whole lines. Again be careful in what order you remove

them. (If there’s a mix of existential and universal, you may want to hold off on removing the

universal quantiﬁers until it’s necessary.)

5. Repeat as needed.

1 ∃xAx→∃xBx

2 ∀x(Bx→Cx) .˙.∃xAx→∃xCx

3 ∃xAx (cp)

4 ∃xBx 1,3 MP

5 Ba (ei, a)

6 Ba→Ca 2 UI

7 Ca 5,6 MP

8 ∃xCx 7 EG

9 ∃xCx 4,5-8 EI

10 ∃xAx→∃xCx 3-9 CP

Since the conclusion is a conditional, we started by assuming its antecedent. Line 1 has no

quantiﬁers applying to the whole line. Line 2 does, but it’s a universal quantiﬁer and we have several

existential quantiﬁers, so we might want to hold off. We notice that the line we just assumed is the

same as the antecedent of line 1, so we do modus ponens. Now, looking at lines 2 and 4 we see a ‘B’

in common, but to allow them to interact we need to remove the quantiﬁers (E before U). Then we

do a few truth-functional steps until we get ‘Ca’ on line 7. That’s an instance of the consequent of

the conclusion. So we generalize, and clean up.

Tactics & Tricks

2. If there’s a negation outside the quantiﬁer, the quantiﬁer cannot be removed. Use QN to

move the negation in, then remove the quantiﬁer.

3. If the conclusion is a conditional, assume the antecedent, even if the antecedent is quantiﬁed.

(If the conclusion can be turned into a conditional via TF rules, ditto.)

4. If the conclusion is a universally quantiﬁed conditional, assume an instance of its antecedent instead.

Let’s look at the difference between (3) and (4). If the conclusion is


∃xAx→∃xBx,

you should assume ‘∃xAx’ and try to get ‘∃xBx’, and the conclusion will follow by conditional proof.

But, if the conclusion is

∀x(Fx→Gx),

you may want to assume ‘Fa’, get ‘Ga’, then after the conditional proof, universally generalize to

get the conclusion. This will usually not violate the restrictions on UG, since the assumption that

introduced ‘a’ has been closed out, and so the constant is not in any assumptions still in force.

5. Sometimes the quantiﬁers don’t need to be removed before you apply TF rules. If the TF

rules can be applied within a line, they can be applied to a line that has a quantiﬁer. For

instance, you can go from ‘∀x(Fx→Gx)’ to ‘∀x(∼Gx→∼Fx)’ without removing the quantiﬁers

and reattaching them.

Chapter 3

Axiom Systems

The Philosophers’ Dream

In his famous allegory of the cave, Plato imagines that in our current state, our minds are full of

vague and unsystematic thoughts, as if we were imprisoned in a dark cave. To gain true knowledge,

true understanding, we would need to exit this cave and gaze at the sun. The sun, representing

ultimate reality, could give us understanding of everything and put all our knowledge into its proper

place. The imaginations of Plato and other ancient philosophers were sparked by geometric proofs,

which gave a great deal of knowledge from only a few basic assumptions, or axioms. Plato imagined

that we could carry this method all the way back to the beginning, that with just a single basic piece

of knowledge, everything else would follow. The Greek word axioma means ‘worthy thing’—the

axioms are the things most worthy of knowledge, since knowing them allows us to know everything.

Other philosophers have had similar dreams. They have hoped that all knowledge could be

ordered systematically, that knowledge of a few basic facts would allow us to know everything.

It hasn’t turned out that way. Not only has no one been able to ﬁnd anything like Plato’s ultimate

reality that, once understood, allows us to know everything, but it turns out that in a very real sense

truth is always unsystematic. But even if we can’t have an axiom system of absolutely everything, the

axiom system is a model of rigor, and it will be useful to see just how far we can push the philosophers’

dream.

Historical Background

But we’re getting ahead of ourselves. Beginning perhaps with Thales, the Greeks demonstrated the

connection between different geometrical facts. The project culminated in Euclid’s Elements, a work

that begins with a few axioms and deﬁnitions, and proceeds to prove various theorems, such as that

the interior angles of a triangle are equal to two right angles. It really is an impressive achievement,

one of the pinnacles of human accomplishment.



It is, however, riddled with ﬂaws. In doing his proofs, Euclid repeatedly assumes things he hasn’t

stated. One famous example: the very ﬁrst proof assumes that whenever lines intersect, there is a

point at which they intersect. This may be obvious, but we can, without contradiction, make all of

Euclid’s axioms true and this false, which means it’s not something he is entitled to assume. In the

axiomatic method, we can’t assume anything unless we say so.

By the end of the 19th century, several mathematicians wanted to bring mathematics back to

its promise of a sure foundation. Frege was among these, but much more famous in his lifetime was

David Hilbert. Hilbert gave a new axiomatization of geometry that was intended to do away with

the ﬂaws of Euclid’s.

The secret is the sharp contrast between syntax and semantics. When we set up an axiom

system, we begin with a few undeﬁned terms, and then specify rules for combining them to form

sentences. We begin with a few axioms, and we specify rules for generating theorems. But, crucially,

these terms and axioms are just marks on a page. When we follow the rules, it’s as if we were playing

chess. We are never allowed to declare something on the grounds that it’s obvious because of the

subject matter. Of course, in general we’ll be interested in an axiom system because of an intended

interpretation—because it tells us about planes and solids, or about arithmetic, or about astronomy.

But the interpretations come only after we’ve spelled everything out in a precise formal or symbolic

language.

Symbolic Languages

The concept of a symbolic language is explained as follows. One ﬁrst distinguishes logical and non-

logical symbols. Logical symbols are (i) variables for quantiﬁers (e.g. x, y, z, x0, x1, . . .), (ii) the

following ﬁve symbols: ‘&’, ‘∼’, ‘)’, ‘(’, and ‘=’, and (iii) all other symbols such as ‘v’, ‘→’, and ‘↔’

that can be deﬁned from these. Non-logical symbols are terms (including numerals and individual

constants like ‘h’ or ‘t’ that represent names), statement letters, function symbols, and all predicates

other than ‘=’. A symbolic language includes the logical symbols (which, by this deﬁnition, are

part of every symbolic language) together with a speciﬁed set of non-logical symbols. We treat the

relational predicate ‘=’ as a logical symbol to ensure that it is included in every symbolic language—it

is the only predicate that receives this special treatment.

Here are three examples: (1) In addition to the logical symbols, the symbolic language of truth-

functional logic (called L) consists of a collection of statement letters: ‘A’, ‘B’, ‘C’ . . . In this language,

there are no terms, function symbols, or predicates. (2) In addition to the logical symbols, the

symbolic language of arithmetic (called A) consists of the name ‘0’, the function symbols ‘′’ (read:

the successor of), ‘+’ and ‘×’, and the predicate ‘N’ (read: is a number). Other symbols can be

introduced into A by suitable deﬁnitions. For example, ‘1’ can be deﬁned as ‘0′’, ‘2’ as ‘1′’ (i.e. as

‘0′′’), etc. Using these symbols we can write sentences like ‘2 + 3 = 5’ or ‘∀x(x + 1 = x′)’. (3) In

addition to the logical symbols, the symbolic language of set theory (called S) consists of the single

two-place predicate ‘∈’ (read: is an element of). Other symbols are introduced into S by suitable

deﬁnitions.
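A toy model can make the numerals of the language A concrete. The sketch below is my own illustration, not the book's: a number is represented as zero or the successor of a number, and addition obeys the usual recursion x + 0 = x and x + y′ = (x + y)′.

```python
# Numbers as iterated successors of zero.
ZERO = ()

def succ(n):
    """The successor n' of n: one extra layer of nesting."""
    return (n,)

def add(x, y):
    """Addition by recursion: x + 0 = x, and x + y' = (x + y)'."""
    return x if y == ZERO else succ(add(x, y[0]))

one = succ(ZERO)
two = succ(one)
three = succ(two)
five = succ(succ(three))

print(add(two, three) == five)   # the sentence '2 + 3 = 5' — prints True
```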


Elements of Axiom Systems

It is often possible to systematize a set of statements so that some or all of them can be derived from

a few members of the set. The members from which statements are derived are called axioms. The

best known axiom system is Euclid’s axiomatization of geometry, but other branches of mathematics,

logic, other sciences, and even the statements of ethics or political theory can be axiomatized with

more or less success. There are usually alternative ways of axiomatizing a given set of statements,

that is, given a set of statements, it may be possible to select different groups of axioms from which

the statements can be derived. Thus, the particular axioms one chooses may be arbitrary or dictated

by convenience (rather than by any assumptions about which statements are most basic, intuitively

obvious, or essential).

The most apparent elements in an axiom system are the axioms and the theorems derived from

them. For example, in Euclidian geometry one axiom is the famous parallels postulate which states

that given a line and a point not on the line, one and only one line can be drawn through the

point and parallel to the line. From this and other axioms one can derive various theorems such

as the equally famous Pythagorean theorem that expresses a relation between the sides and the

hypotenuse of right triangles.

In the late nineteenth century, an Italian mathematician named Giuseppe Peano axiomatized

arithmetic. Here is a set of axioms equivalent to the one that Peano used:

3.2.A Zero is a number.

3.2.B Every number has a number as a successor.

3.2.C No number has zero as a successor.

3.2.D Given any two numbers, if the successors of those numbers are equal, the num-

bers are equal.

3.2.E If zero has some property and if, supposing that any number whatsoever has that

property, the successor of that number must also have the property, then every number

has the property.

From these axioms, one can derive various theorems. For example, from the ﬁrst two it follows

that zero has a successor which is a number.

Reﬂection reveals that axiom systems include more than just axioms and theorems. Any axiom

system must be stated in some language. In that language, some terms will be deﬁned and others

will be undeﬁned. In 3.2.A through 3.2.E, ‘zero’, ‘successor’ and ‘number’ are undeﬁned. Other

terms can be deﬁned from the undeﬁned terms; for example ‘one’ can be deﬁned as the successor

of zero. However, axioms can be stated not only in natural languages like English or Greek but also

in symbolic languages. For example, 3.2.A through 3.2.E can be stated in the symbolic language A

(as deﬁned above):


3.2.F N0

3.2.G ∀x(Nx → Nx’)

3.2.H ∼∃x(x’ = 0)

3.2.I ∀x∀y(x’ = y’ → x = y)

3.2.J ∀X((X0 &∀x(Xx → Xx’)) → ∀xXx)

As we know and as is here illustrated, a symbolic language like A includes the logical symbols

and certain non-logical symbols. In 3.2.F through 3.2.J, the undeﬁned non-logical symbols are ‘N’,

‘0’, and ‘′’, and other symbols can be deﬁned from them. For example, ‘1’ can be deﬁned as ‘0′’.

For most purposes, the only elements of axiom systems that are explicitly identiﬁed are axioms

and theorems and undeﬁned and deﬁned terms. However, in axiomatizing branches of logic, two

other elements must be taken into account. Ordinarily, given a set of statements that one wishes

to axiomatize, one uses whatever is “logical” to advance from axioms to theorems and whatever

is “grammatical” counts as a statement. However, in axiomatizing logic itself, one cannot simply

allow whatever inferences seem logical, and, working in an artiﬁcial symbolic language such as L

or A, one must specify the grammar of the language to make clear which combinations of symbols

are acceptable. So in using a symbolic language to axiomatize logic, one must identify speciﬁc rules

of inference (e.g. modus ponens) that are allowed, and one must state explicit rules of syntax that

determine whether a string of symbols is acceptable. Thus, in principle, any axiomatization will

involve six elements (which may or may not be made explicit):

Undeﬁned Terms Axioms

Deﬁnitions Theorems

Rules of Syntax Rules of Inference

Several properties of axiom systems have been identiﬁed and studied; we will give particular at-

tention to three: independence, consistency, and completeness. The axioms of a system are inde-

pendent if no one of them can be derived from the others. Of the three properties in question,

independence is the least important—its value is mostly aesthetic. An axiom system is consistent if

no theorem is a self-contradiction. Since a self-contradiction truth-functionally implies every state-

ment, an equivalent deﬁnition of consistency is that there is some grammatical statement in the

language of the system that is not a theorem. Consistency is absolutely essential in every axiom

system because in an inconsistent system one can prove everything. Such a system is worthless.

Completeness is more difﬁcult to deﬁne than independence or consistency, and there are differ-

ent concepts of completeness depending in part on the subject matter to be axiomatized. We will be

interested in two such concepts. The ﬁrst is this: an axiom system is complete if for every grammat-

ical statement in the language of that system, either that statement or its denial is a theorem. This

concept of completeness works for arithmetic. For example, consider these two pairs of statements:


that statement or its denial is true, if an axiom system is complete for arithmetic, it must be possible

to prove one or the other of each such pair.

The preceding concept of completeness works for arithmetic but not for logic, and the reason

is simple. Think about an axiom system intended to prove tautologies. We do not want this system

to be such that, for every grammatical statement in the language of truth-functional logic, either

that statement or its denial is a theorem. This is because, in truth-functional logic, some statements

are contingencies. For example, neither ‘A’ nor ‘∼A’ is a tautology, and if our axiom system enabled

us to prove either of these, it would also enable us to prove the other—that is, it would yield a self-

contradiction and so be inconsistent. So this concept of completeness, while suitable for arithmetic,

is not useful in logic.

Logic requires a different concept of completeness. We can approach this concept by thinking

again of truth-functional logic. Remember that a tautology is a compound statement true for every

interpretation of its simple statement letters. An axiomatization for truth-functional logic is com-

plete for tautologicality if, within that axiomatization, one can prove every tautology. We can use

this approach to deﬁne completeness for the logical truths of quantiﬁcational logic. In addition to

statement letters, quantiﬁcational statements can include predicates, function symbols, and terms; in

quantiﬁcational logic, variables must also be deﬁned over some universe of discourse. So, in quan-

tiﬁcational logic, we can deﬁne logical truth as a statement true under every interpretation of its

statement letters, predicates, function symbols, and terms, and within every universe of discourse.

In Section 2.4 we encountered numerous logical truths from quantiﬁcational logic; for example,

exercises 2.4.44 through 2.4.62 are logical truths. According to this deﬁnition, all tautologies are

logical truths but, of course, not all logical truths are tautologies. Now, given this generalized def-

inition of logical truth, we can deﬁne completeness as follows: to say that an axiomatization for

quantiﬁcational logic is complete means that it is possible, within that system, to prove every logical

truth of quantiﬁcational logic.

An Axiom System for TF

A set S is semantically axiomatizable iff there is A ⊆ S such that for every s ∈ S, A |= s. Every set is semantically axiomatizable, if only by itself. We say the set is ﬁnitely, or recursively, etc. axiomatizable if the axiom set meets those conditions.

We’re interested here in a related notion: syntactic axiomatization. This requires, in addition

to the set of axioms, rules of inference. A derivation is a ﬁnite sequence of sentences such that every

sentence in the sequence is an axiom or follows from earlier sentences in the sequence via the rule.

Here we will take {¬, →} to be our logical constants.

The rule of inference will be

MP p, p → q ⊢ q


The lower-case letters serve as a kind of meta-variable over all sentences. So all of these are valid

instances of the rule:

A, A → B ⊢ B

A → B, (A → B) → ∼C ⊢ ∼C

∼(C & B), ∼(C & B) → (A v ∼A) ⊢ A v ∼A

(The last instance uses some deﬁned symbols.) This rule is called ‘MP’ (for modus ponens).

The axioms are these:

TF1 p → (q → p)

TF2 [p → (q → r)] → [(p → q) → (p → r)]

TF3 (¬p → ¬q) → (q → p)

Every sentence in a derivation will be called a theorem, and we write ⊢ σ to say that σ is a

theorem. We could extend the notion of a derivation to include derivations from assumptions. We

say ∆ ⊢ σ if there is a sequence of sentences, every sentence being an axiom, in ∆, or following

from earlier sentences via the rule, and σ is in that sequence. Thus ⊢ σ iff ∅ ⊢ σ .

Being stated in terms of variables rather than in terms of statement letters, each axiom, and the rule MP,

includes all the inﬁnitely many statements in L that are instances of the appropriate forms. For

example, here are three instances of TF1:

A → (B → A)

(A & B) → (B → (A & B))

D → ((A → (B → A)) → D)

The axiom system we’re using here is due to Łukasiewicz. The ﬁrst two axioms are the same as

Frege’s ﬁrst two; the third axiom takes the place of three of Frege’s original axioms. There are many

other systems; some take conjunction or disjunction rather than conditional to be the undeﬁned

term; some have more axioms and some fewer. Some systems have only one axiom. In setting up

an axiom system for truth-functional logic, it’s most important to ﬁnd a set of axioms that’s complete

and consistent; among those that are complete and consistent, we choose one that is elegant and

easy to work with.
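Since the axiom schemas are built from ‘→’ and ‘∼’ alone, it is easy to confirm by truth tables that every instance of TF1 through TF3 is a tautology. Here is a brute-force check (my own sketch, using Python booleans):

```python
from itertools import product

def imp(p, q):
    """The material conditional."""
    return (not p) or q

# Quantifying over all truth values for p, q, r settles every instance.
for p, q, r in product([False, True], repeat=3):
    assert imp(p, imp(q, p))                                    # TF1
    assert imp(imp(p, imp(q, r)), imp(imp(p, q), imp(p, r)))    # TF2
    assert imp(imp(not p, not q), imp(q, p))                    # TF3
print("TF1, TF2, and TF3 are tautologies")
```

Tautologousness of the axioms, together with the fact that modus ponens preserves truth, is what guarantees the consistency of the system.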

Let’s start ﬁnding some theorems. And let’s begin with a simple statement.

TF4 A → A.

Here’s the proof:

1 [A → ((A → A) → A)] → [(A → (A → A)) → (A → A)] TF2

2 A → ((A → A) → A) TF1

3 (A → (A → A)) → (A → A) 1,2 MP

4 A → (A → A) TF1

5 A→A 3,4 MP


Proofs in axiom systems must be approached differently from the proofs that we’ve done before.

Every line in the proof is an instance of one of the axioms, or follows from two earlier lines in the

proof by an application of modus ponens. The proofs are generally short and simple to follow. They

are often not so simple to ﬁnd. The trick is to ﬁnd an instance of the axioms that will give us what we

want. Here, for example, the ﬁrst line is an instance of TF2, with ‘A’ substituted for ‘p’ and ‘r’, and

with ‘A → A’ substituted for ‘q ’. How can we tell what substitution instances to use? A good rule

of thumb is to ﬁnd a consequent of some axiom or previous theorem that looks like you’re trying

to prove. (By ‘looks like’, I mean ‘is a substitution instance of ’.) Here I noticed that the theorem,

‘A → A’, is a substitution instance of the consequent of TF2, ‘p → r’. So I wrote down that instance

of TF2:

[A → (− → A)] → [(A → −) → (A → A)]

I’m left with two blank spaces. The substitution instances of ‘p’ and ‘r’ are forced, but ‘q ’ is left

open. Now I need to ﬁgure out how to get rid of the rest of that line. In other words, I need to ﬁnd

a way to ﬁll in the blanks so that ‘A → (− → A)’ and ‘A → −’ are instances of axioms. Then I

can do modus ponens twice and I’ll be left with ‘A → A’. The ﬁrst one, ‘A → (− → A)’, is easy.

No matter what I put in for the blank, it will be an instance of TF1. But because I need to put the

same thing in for both blanks (since they’re both the same variable), I’ll look at the second formula,

‘A → −’. To make this one an instance of TF1, I need to substitute a conditional for the blank, and

in particular a conditional whose consequent is ‘A’. Any such conditional would work: ‘A → A’,

‘B → A’, ‘(A → (B → C)) → A’. I picked a simple one.
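The requirement that every line be an axiom instance or follow by modus ponens can itself be mechanized. Below is a minimal derivation checker (my own sketch; the tuple encoding of formulas is assumed). It takes lines flagged ‘axiom’ on trust and verifies only the MP steps, citing the minor premise first:

```python
# Formulas are nested tuples ('->', p, q); atoms are strings.
def check(proof):
    """Verify every 'MP' line against the two lines it cites."""
    for formula, just, *cited in proof:
        if just == "MP":
            minor, major = (proof[i][0] for i in cited)
            if major != ("->", minor, formula):   # major must be minor -> formula
                return False
        elif just != "axiom":
            return False
    return True

A = "A"
AA = ("->", A, A)    # 'A -> A'
# the proof of TF4, with lines indexed 0-4 here
tf4 = [
    (("->", ("->", A, ("->", AA, A)), ("->", ("->", A, AA), AA)), "axiom"),  # TF2
    (("->", A, ("->", AA, A)), "axiom"),                                     # TF1
    (("->", ("->", A, AA), AA), "MP", 1, 0),
    (("->", A, AA), "axiom"),                                                # TF1
    (AA, "MP", 3, 2),
]
print(check(tf4))   # prints True
```

A full checker would also verify that the ‘axiom’ lines really are instances of TF1-TF3, which takes pattern matching against the schemas; the MP check above is the easy half.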

Now clearly we could continue to prove variations on TF4. We could prove ‘B → B ’, ‘C → C ’,

and so on. But each of the proofs would be identical, except for variations in substitution instance.

Instead of that, we will treat this proof as a proof schema, proving each particular instance. We will

state the theorem in terms of statement variables:

TF4 p→p

From now on, all theorems can be thought of as schemata. If we prove, say, ∼∼A → A, we act

as if we had proved every instance of the same form.

We can also prove derived rules. These proofs go slightly differently from proofs of theorems,

since we start with assumptions.

TF5 (HS) A → B, B → C ⊢ A → C

1 A→B

2 B→C

3 (B → C) → (A → (B → C)) TF1

4 A → (B → C) 2,3 MP

5 (A → (B → C)) → ((A → B) → (A → C)) TF2

6 (A → B) → (A → C) 4,5 MP

7 A→C 1,6 MP


(‘HS’ stands for ‘hypothetical syllogism’.) Here the method of proof is similar to TF4, but we

began with two lines without justiﬁcation. This is similar to proving from premises. But once again,

the trick is to ﬁnd an instance of the axioms or previous theorems that will give us what we want.

Here, the conclusion looks a lot like the last consequent of TF2. Then we work backward. What

would we need to have to get ‘A →C ’ by modus ponens from TF2? Well, we would need ‘A →(B

→C )’ and ‘A →B ’. The second we already have as an assumption. The ﬁrst is a conditional

statement with the other assumption as the consequent. So now we need to ﬁgure out how to add

an antecedent to something we already have. That is what TF1 does. So if we put ‘B →C ’ for ‘p’

and ‘A’ for ‘q ’, we have it.

Study this proof; it provides a good model for many of the axiomatic proofs. Many proofs,

however, will be easier, and proofs generally get easier as we go along. A big part of the reason for

this is that once we prove a theorem or a derived rule, we can cite it in further proofs.

TF6 (MT*) ∼A → ∼B, B ⊢ A

(This is called ‘MT*’ since it’s related to modus tollens.)

TF7 ∼A → ∼B, ∼B → ∼C ⊢ C → A

TF8 (EFQ) A, ∼A ⊢ B

(‘EFQ’ stands for ‘ex falso quodlibet’, the medieval name of this principle.)

Axiomatic proofs are a little tricky. For all these conditional statements, it would be nice if we could

use something like conditional proof. It turns out we can. For axiom systems, this new rule is called

the “deduction theorem,” and we could state it like this:

CP If p, . . . ⊢ q , then . . . ⊢ p → q

The dots indicate that there may be other assumptions present. The rule says that if I assume p

and conclude q (possibly with other assumptions), then (with those assumptions) I can prove p → q .

We’ll call it ‘CP’ (for conditional proof ) because that’s a more familiar name.

We could prove the deduction theorem, but we won’t. It takes TF1, TF2, and TF4 to prove it.

The deduction theorem also allows one more adjustment: For every theorem of the form ‘A

→B ’, we’ll assume we have the associated rule ‘A ⊢B ’. We can do this because we could always

assume A, then do modus ponens to get B . In fact, many of the proofs we’ll do will be expressed as

rules. Given the deduction theorem and modus ponens, ‘p → q ’ and ‘p ⊢ q ’ are equivalent.

For example, there are four different ways to express EFQ:

A ⊢ ∼A → B

∼A ⊢ A → B

A → (∼A → B)

∼A → (A → B)

3.3. AN AXIOM SYSTEM FOR TF 65

The first follows from EFQ as stated by one application of the deduction theorem; the particular
instance is ‘If A, ∼A ⊢ B, then A ⊢ ∼A → B’. Likewise, EFQ as stated follows from the first by one
application of modus ponens: assuming ‘A’ and ‘∼A’, the first gives ‘∼A → B’, and modus ponens
then yields ‘B’. All the others are likewise equivalent to EFQ by modus ponens and the deduction
theorem. Make sure you understand how.
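If you want to convince yourself mechanically, a quick truth-table computation will do it. Here is a rough Python sketch (my own check, not part of the system) confirming that the two conditional forms of EFQ are tautologies:

```python
from itertools import product

def imp(p, q):
    # Material conditional: p -> q is false only when p is true and q is false.
    return (not p) or q

def efq_cond_1(a, b):
    # A -> (~A -> B)
    return imp(a, imp(not a, b))

def efq_cond_2(a, b):
    # ~A -> (A -> B)
    return imp(not a, imp(a, b))

bools = (True, False)
assert all(efq_cond_1(a, b) for a, b in product(bools, repeat=2))
assert all(efq_cond_2(a, b) for a, b in product(bools, repeat=2))
```

Since the two sequent forms are equivalent to these by the deduction theorem, the same check covers all four.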

Notice what happens to the axioms when we express them as rules. Here is one way to state

them:

TF1 A⊢B→A

TF2 A → (B → C), A → B ⊢ A → C

TF3 ∼B → ∼A ⊢ A → B

There are other ways to write them. You may want to list them all.
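Since the system is meant to be sound, each axiom should come out true on every line of its truth table. A rough Python sketch (mine, not part of the text) verifying that TF1–TF3 are tautologies:

```python
from itertools import product

def imp(p, q):
    # Material conditional
    return (not p) or q

def tf1(a, b):
    # A -> (B -> A)
    return imp(a, imp(b, a))

def tf2(a, b, c):
    # (A -> (B -> C)) -> ((A -> B) -> (A -> C))
    return imp(imp(a, imp(b, c)), imp(imp(a, b), imp(a, c)))

def tf3(a, b):
    # (~B -> ~A) -> (A -> B)
    return imp(imp(not b, not a), imp(a, b))

bools = (True, False)
assert all(tf1(a, b) for a, b in product(bools, repeat=2))
assert all(tf2(a, b, c) for a, b, c in product(bools, repeat=3))
assert all(tf3(a, b) for a, b in product(bools, repeat=2))
```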

Some proofs. Let’s start by proving something we’ve already proved—HS—to compare the

proof with the deduction theorem and the proof without it.

TF5 (HS) A → B, B → C ⊢ A → C

1 A→B

2 B→C

3 A

4 B 1,3 MP

5 C 2,4 MP

6 A→C 3–5 CP

TF9 (CM*) ∼A → A ⊢ A

1 ∼A → A

2 ∼A

3 A 1,2 MP

4 ∼(∼A → A) 2,3 EFQ

5 ∼A → ∼(∼A → A) 2–4 CP

6 (∼A → A) → A 5 TF3

7 A 1,6 MP

(CM* is related to consequentia mirabilis, which will be proved in TF15. It also, by deﬁnition
of v, is equivalent to A v A ⊢ A, which we’ve called the rule of tautology.)

TF10 (DNE) ∼∼A ⊢ A

1 ∼∼A

2 ∼A

3 A 1,2 EFQ

4 ∼A → A 2–3 CP

5 A 4 CM*

TF11 (DNI) A ⊢ ∼∼A

1 A

2 ∼∼∼A → ∼A DNE

3 A → ∼∼A 2 TF3

4 ∼∼A 1,3 MP

‘DNE’ stands for ‘double negation elimination’ and ‘DNI’ for ‘double negation introduction’.

These two together make the rule of double negation (DN). From now on we can add or remove

pairs of negations to any whole line, citing DN.

From here on the proofs will be left as exercises.

TF12 (MT) A → B, ∼B ⊢ ∼A

TF13 A → ∼B, B ⊢ ∼A

TF14 ∼A → B, ∼B ⊢ A

Notice that, by the deduction theorem, MT is equivalent to A → B ⊢ ∼B → ∼A. This, with
TF3, gives us transposition. TF13 and TF14 tell us that A → ∼B ⊢ B → ∼A and ∼A → B ⊢
∼B → A. These are obviously related laws, so we will allow any of them to be cited as Trans.

TF15* (CM) A → ∼A ⊢ ∼A

TF16 (RAA) A → B, A → ∼B ⊢ ∼A

‘RAA’ stands for reductio ad absurdum. This is related to, but not identical with, the rule of
indirect proof (IP). That rule must be stated like the rule of conditional proof:

Metatheorem 2 (IP) If Γ, A ⊢ B and Γ, A ⊢ ∼B, then Γ ⊢ ∼A.

Given the deduction theorem and RAA, this is easy to prove, and the other part of indirect

proof, If Γ, ∼A ⊢B and Γ, ∼A ⊢∼B , then Γ ⊢A, follows easily too, given double negation.

So far all the theorems and rules have involved only our undeﬁned terms. To prove theorems

involving the other truth functors, we need to use the deﬁnitions. (The name of the next theorem

stands for ‘Law of Excluded Middle’.)

TF17 (LEM) A v ∼A

1 ∼A → ∼A TF4

2 A v ∼A 1 Def v

Perhaps the best way to approach proofs involving other truth functors is to begin at the end.

We begin by translating the thing we’re trying to prove into the basic symbols, and then treating

the proof as a proof using only those basic symbols. For example, to do TF17 we ﬁrst translated ‘A

v∼A’ into ‘∼A →∼A’, and then proved that.

TF18 (l-Add) A⊢BvA

TF19 (r-Add) A⊢AvB

TF20* A v B, ∼A ⊢ B

TF21 A v B, ∼B ⊢ A

TF18 and TF19 together give us the rule of addition (Add). (TF18 is addition to the left and

TF19 is addition to the right.) TF20 and TF21 together give us the rule of disjunctive syllogism


(DS). These are the basic rules for dealing with disjunction; they allow us to do proofs involving

disjunction without translating back into the basic symbols.

TF22 A&B ⊢A

TF23 A&B ⊢B

TF24 (Conj) A, B ⊢ A & B

TF25* A↔B⊢A→B

TF26 A↔B⊢B→A

TF27 A → B, B → A ⊢ A ↔ B

At this point, we’ve proved all the “basic rules” from truth-functional logic, and a few of the

“shortcut rules.” That means that any argument we could prove with those rules, we can prove with

this new system. We could look at that in two different ways. One way to look at that is that from

here on out, any proof in this new axiom system is really just a proof in the system you learned in

your ﬁrst-year logic course. We may write it a little differently than you did there, but it’s really the
same thing. The other way to look at it is that we’ve given axiomatic justiﬁcation for the logic you
learned in your ﬁrst-year course. When we turn from proving arguments in the system to proving
claims about the system—such as, for example, that it is complete and sound—any results we

can prove about the axiom system will also hold for the other system. Similarly, any proof that you

did in your ﬁrst-year course, and any proof that we do here, can be done citing only the axioms and

modus ponens. It might be instructive to try it. Another way to generate theorems is to prove meta-

theorems, like the deduction theorem. This allows us to show that whole classes of statements are

theorems, without proving each one individually. There are two useful meta-theorems that follow

easily from what we’ve done.

Metatheorem 3 If Γ, A ⊢ B and Γ, B ⊢ A, then Γ ⊢ A ↔ B.

Proof: Suppose Γ, A ⊢ B and Γ, B ⊢ A. Then, by the deduction theorem, Γ ⊢ A → B and
Γ ⊢ B → A. From these two it follows by BE that Γ ⊢ A ↔ B.

This meta-theorem allows us to generate theorems like ‘A ↔∼∼A’ (by DNI and DNE). It

also allows us to approach any biconditional theorem as if it were two separate derived rules. So

whenever we have a theorem A ↔B , we can prove it in two parts: A ⊢B and B ⊢A. Metatheorem

3 tells us that given both of these, the biconditional is a theorem.

Metatheorem 4 If Γ ⊢ A ↔ B and Γ ⊢ A, then Γ ⊢ B.

Proof: Suppose Γ ⊢ A ↔ B and Γ ⊢ A. Thus we begin a proof with ‘A ↔ B’ and ‘A’ as the ﬁrst
two lines. We can then attach ‘A → B’ by BE, and then ‘B’ by MP. Because there is a proof of ‘B’,
Γ ⊢ B.

These two meta-theorems can be used with the following powerful meta-theorem, the theorem
of replacement, to allow us to substitute logically equivalent statements within a line. For example,
if we have ‘∼∼A → B’ as a line in a proof, we can write ‘A → B’ as the next line.

Metatheorem 5 (Replacement) If A ↔ B and A occurs in C, then C ↔ D, where D differs from
C only in that zero or more occurrences of A have been replaced by B.

The proof of this theorem is by induction.

To illustrate these three meta-theorems in action, consider this proof of one part of De Morgan’s

law:

TF29 ∼(A v B) ⊢ ∼A & ∼B


1 ∼(A v B)

2 ∼(∼A → B) 1 Def v

3 ∼(∼A → ∼∼B) 2 DN

4 ∼A & ∼B 3 Def &

This proof is largely just replacement of deﬁnitional equivalents. On line 3, we cited ‘DN’ to

justify adding two negation symbols in front of the consequent of a conditional within a negation.

But the rule of DN, as proved in TF10 and TF11, allows only adding or removing two negations in

front of the whole line. But by Metatheorem 3, A ↔∼∼A, and so by the Theorem of Replacement

we can substitute ‘∼∼A’ for ‘A’ whenever it occurs in a line. When we do that, we could cite

Replacement, but we could also cite the theorem that demonstrates that A ↔B , which in this case

is DN. Notice that the Theorem of Replacement allows multiple substitutions in a single line. We

could add double negations to two parts of a line in a single step.

TF29 is an example of a class of theorems that are largely deﬁnitional substitutions. Because the

proofs often involve double negation or transposition, they are much easier—almost trivial—once

we have the Theorem of Replacement.

TF30* ∼(A v B) ↔ ∼A & ∼B

TF31 ∼A v ∼B ↔ ∼(A & B)

(These two make up De Morgan’s law (DeM).)
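Both directions of each biconditional can be checked mechanically. A rough Python sketch (mine) of the truth-table check for De Morgan’s law:

```python
from itertools import product

def iff(p, q):
    # Biconditional: true exactly when both sides agree.
    return p == q

bools = (True, False)
# TF30: ~(A v B) <-> (~A & ~B)
assert all(iff(not (a or b), (not a) and (not b))
           for a, b in product(bools, repeat=2))
# TF31: (~A v ~B) <-> ~(A & B)
assert all(iff((not a) or (not b), not (a and b))
           for a, b in product(bools, repeat=2))
```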

TF32 (CN) ∼(A → B) ↔ (A & ∼B)

TF33 (A → B) ↔ (∼A v B)

TF34 (A → B) ↔ (B v ∼A)

(TF33 and TF34 together make conditional exchange (CE).)

TF35* (A v B) ↔ ∼(∼A & ∼B)

TF36 (A & B) ↔ ∼(∼A v ∼B)

Then follow several more rules or interesting tautologies:

TF37 (Exp) [(A & B) → C] ↔ [A → (B → C)]

TF38 (v-Comm) (A v B) ↔ (B v A)

TF39 (&-Comm) (A & B) ↔ (B & A)

TF40 (v-Assoc) (A v (B v C)) ↔ ((A v B) v C)

TF41* (&-Assoc) (A & (B & C)) ↔ ((A & B) & C)

TF42 (CD) A v B, A → C, B → D ⊢ C v D

TF43 (Dil) A v B, A → C, B → C ⊢ C

TF44 (v-Dist) (A v (B & C)) ↔ (A v B) & (A v C)

TF45 (&-Dist) (A & (B v C)) ↔ (A & B) v (A & C)

TF46 (&-Dist) ((A v B) & (C v D)) ↔ ((A & C) v (A & D) v (B & C) v (B & D))

TF47* A&B ⊢A↔B

TF48 ∼A & ∼B ⊢ A ↔ B

TF49 (Peirce’s Law) ((A → B) → A) → A

TF50 (A → B) v (B → A)

TF51 A v (A → B)


TF52 A → (B → C), D → B ⊢ A → (D → C)

TF53 A & B, A → C ⊢ C

TF54 A & B, B → C ⊢ C

TF55 (HS2) A & B, A → C ⊢ B & C

3.4 An Axiom System for FOL

We keep the axioms as before. But we somehow need to extend the system to cover the new symbols

and new kinds of sentences.

UI ∀xp ⊢ p(a/x)

UG p ⊢ ∀xp(x/a), if a is not in any assumption

EG p ⊢ ∃xp(x/a)

EI If p, . . . ⊢ q , then ∃xp(x/a), . . . ⊢ q , if a is not in q or any assumption

The lower-case letters p and q , as before, stand for any sentence. The x stands for any bound

variable. The notation ‘p(a/x)’ means that we take the sentence p and replace every x with some

letter a. For example, these are all instances of UI:

∀x∀y(Ax & By) ⊢ ∀y(Aa & By)

∀y∃xRxy ⊢ ∃xRxb

In the other three rules, (x/a) means that every instance of some letter a is replaced by some

variable x. The rules UG and EI have extra restrictions.
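The substitution notation can be pictured as plain textual replacement. A rough Python illustration of mine (ignoring the subtleties about which occurrences get replaced):

```python
# p(a/x): take the sentence p and replace every x with the letter a.
p = "Ax & Bx"
instance = p.replace("x", "a")
assert instance == "Aa & Ba"

# (x/a): the reverse direction, replacing every a with the variable x.
q = "Raa"
generalized = q.replace("a", "x")
assert generalized == "Rxx"
```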

We can prove some theorems and derived rules of FOL. The ﬁrst is a good example of UI and

UG:

FOL1 (Barbara) ∀x(Ax → Bx), ∀x(Bx → Cx) ⊢ ∀x(Ax → Cx)

1 ∀x(Ax → Bx)

2 ∀x(Bx → Cx)

3 Aa → Ba 1 UI

4 Ba → Ca 2 UI

5 Aa → Ca 3,4 TF (HS)

6 ∀x(Ax → Cx) 5 UG

Line 6 cites the rule UG. The restriction on UG does not allow the rule to be applied if the letter
being generalized on (here a) occurs in any assumption (here lines 1 and 2). It does not, so the
restriction is met.

The next adds EI and EG:

FOL2 (Darii) ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)

1 ∃x(Ax & Bx)

2 ∀x(Bx → Cx)

3 Aa & Ba (1 ei a)

4 Ba → Ca 2 UI

5 Aa & Ca 3,4 TF (HS2 + Comm)

6 ∃x(Ax & Cx) 5 EG / 3 EI

The restriction on EI requires that the letter (here a) not be free in any suppositions (lines 1 and

2) or the result (q in the rule description, here line 6). We cite the rule in parentheses on the line of

the assumption, with the existentially quantiﬁed line and the letter introduced. Then on the result

line (the q line), we cite, after whatever other rules allowed us to get that line, the assumption line.

Every syllogism can be proved easily using Barbara and Darii along with truth-functional

equivalences. Here’s a list of the valid syllogisms, with traditional names:

Figure 1

Barbara ∀x(Ax → Bx), ∀x(Bx → Cx) ⊢ ∀x(Ax → Cx)

Celarent ∀x(Ax → Bx), ∀x(Bx → ∼Cx) ⊢ ∀x(Ax → ∼Cx)

Darii ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)

Ferio ∃x(Ax & Bx), ∀x(Bx → ∼Cx) ⊢ ∃x(Ax & ∼Cx)

Figure 2

Cesare ∀x(Ax → Bx), ∀x(Cx → ∼Bx) ⊢ ∀x(Ax → ∼Cx)

Camestres ∀x(Ax → ∼Bx), ∀x(Cx → Bx) ⊢ ∀x(Ax → ∼Cx)

Festino ∃x(Ax & Bx), ∀x(Cx → ∼Bx) ⊢ ∃x(Ax & ∼Cx)

Baroco ∃x(Ax & ∼Bx), ∀x(Cx → Bx) ⊢ ∃x(Ax & ∼Cx)

Figure 3

Datisi ∃x(Bx & Ax), ∀x(Bx → Cx) ⊢ ∃x(Ax & Cx)

Disamis ∀x(Bx → Ax), ∃x(Bx & Cx) ⊢ ∃x(Ax & Cx)

Ferison ∃x(Bx & Ax), ∀x(Bx → ∼Cx) ⊢ ∃x(Ax & ∼Cx)

Bocardo ∀x(Bx → Ax), ∃x(Bx & ∼Cx), ⊢ ∃x(Ax & ∼Cx)

Figure 4

Celantes ∀x(Ax → Bx), ∀x(Bx → ∼Cx) ⊢ ∀x(Cx → ∼Ax)

Dabitis ∃x(Ax & Bx), ∀x(Bx → Cx) ⊢ ∃x(Cx & Ax)

Fresison ∀x(Ax → ∼Bx), ∃x(Bx & Cx) ⊢ ∃x(Cx & ∼Ax)

For example, here is a proof of Cesare:

1 ∀x(Ax → Bx)

2 ∀x(Cx → ∼Bx)

3 ∀x(Bx → ∼Cx) 2 TF (Trans)

4 ∀x(Ax → ∼Cx) 1,3 Barbara
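Proofs aside, you can also spot-check a syllogism semantically: enumerate every interpretation of the predicate letters on a small domain and look for a countermodel. A rough Python sketch of mine for Barbara:

```python
from itertools import product

domain = range(3)        # a three-element domain
bools = (True, False)

def holds_all(P, Q):
    # Ax(Px -> Qx) on the finite domain; P and Q are tuples of truth values.
    return all((not P[x]) or Q[x] for x in domain)

def barbara_has_no_countermodel():
    # Try every extension of A, B, C as subsets of the domain.
    for A in product(bools, repeat=3):
        for B in product(bools, repeat=3):
            for C in product(bools, repeat=3):
                if holds_all(A, B) and holds_all(B, C) and not holds_all(A, C):
                    return False
    return True

assert barbara_has_no_countermodel()
```

Passing on one small domain is of course evidence, not a proof; the proof above is what establishes validity.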

FOL3 ∀x(P → F x) ⊢ P → ∀xF x

Here, as elsewhere in this section, P stands for any TF sentence, or in general any sentence that
does not contain the quantiﬁed variable.

FOL4 ∀x(F x → P ) → (∀xF x → P )


1 ∀x(F x → P ) (cp)

2 F a → P 1 UI

3 ∀xF x (cp)

4 F a 3 UI

5 P 2,4 MP

6 ∀xF x → P 3-5 CP

7 ∀x(F x → P ) → (∀xF x → P ) 1-6 CP

FOL5 (Distribution) ∀x(F x → Gx) ⊢ ∀xF x → ∀xGx

1 ∀x(F x → Gx)

2 ∀xF x (cp)

3 F a → Ga 1 UI

4 Fa 2 UI

5 Ga 3,4 MP

6 ∀xGx 5 UG

7 ∀xF x → ∀xGx 2-6 CP

FOL6 ∀xP ↔ P

FOL7 ∃xP ↔ P

In these two theorems, the quantiﬁers are vacuous: they don’t bind any variables.

FOL8 ∀xF x ⊢ ∃xF x

FOL9 ∀x(F x → ∃yF y)

FOL10 ∃x∀yRxy → ∀y∃xRxy

FOL11 ∀x(F x → P ) ⊢ ∃xF x → P

FOL12 ∀x∼F x ⊢ ∼∃xF x

FOL13 ∼∀xF x ⊢ ∃x∼F x

FOL14 ∼∀x∼F x ↔ ∃xF x

FOL15 ∀x∼F x ↔ ∼∃xF x

FOL16 ∼∀xF x ↔ ∃x∼F x

FOL17 ∀xF x ↔ ∼∃x∼F x

These last four logical equivalences are often useful; we will refer to them collectively as the rule

of quantiﬁer negation (QN).
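As a sanity check on QN, here is a rough Python sketch of mine confirming all four equivalences for every extension of F on a three-element domain:

```python
from itertools import product

domain = range(3)

def qn_holds(F):
    # F is a tuple of truth values: F[x] says whether Fx is true.
    ex_f = any(F[x] for x in domain)            # ExFx
    all_f = all(F[x] for x in domain)           # AxFx
    ex_not_f = any(not F[x] for x in domain)    # Ex~Fx
    all_not_f = all(not F[x] for x in domain)   # Ax~Fx
    return ((not all_not_f) == ex_f and         # FOL14
            all_not_f == (not ex_f) and         # FOL15
            (not all_f) == ex_not_f and         # FOL16
            all_f == (not ex_not_f))            # FOL17

assert all(qn_holds(F) for F in product((True, False), repeat=3))
```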

Prenex Normal Form

Every sentence of FOL is logically equivalent to a sentence in which all the quantiﬁers are at the

left, followed by a quantiﬁer-free formula. For example, the following two sentences are equivalent:

∃xAx → ∃yBy

∀x∃y(Ax → By)

The ﬁrst sentence has the quantiﬁers applied to the shortest segment of the sentence necessary; the

second sentence has the quantiﬁers applied to the whole sentence. Another way to say this is that, in

the ﬁrst sentence, the quantiﬁers lie within the scope of a truth-functional connective (the conditional

has the broadest scope); the second sentence has no quantiﬁer within the scope of a truth-functional


connective. This will serve as the deﬁnition of prenex normal form (PNF): A sentence in prenex

normal form has no quantiﬁers falling within the scope of a truth-functional connective. There are

two major steps in converting a sentence into PNF. They correspond to the two undeﬁned truth-

functional connectives: ∼ and →. The ﬁrst involves moving the negations to fall within the scope

of the quantiﬁers. We do this by applying the rule of quantiﬁer negation. For example, if we have

the following as part of a sentence

...∼∃x(P x → ...

we need to move the existential quantiﬁer to have broader scope than the negation. By QN, we

have

...∀x∼(P x → ...

The next step in converting a sentence to PNF consists in moving the conditionals to fall within

the scope of the quantiﬁers. This is done by applying the next four equivalences:

FOL18 ∀x(P → F x) ↔ (P → ∀xF x)

FOL19 ∃x(P → F x) ↔ (P → ∃xF x)

FOL20 ∀x(F x → P ) ↔ (∃xF x → P )

FOL21 ∃x(F x → P ) ↔ (∀xF x → P )

By repeatedly using the right-to-left directions of these biconditionals and the suitable instances

of QN, we can change every sentence into a sentence in PNF. (Of course, we could also go the

other direction, driving the quantiﬁers in as far as they will go.) If the sentence has deﬁned truth-

functional connectives, it can be converted to PNF by ﬁrst replacing the deﬁned connectives by their

deﬁnitions, then proceeding as before.
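These four equivalences can likewise be checked on a small domain. A rough Python sketch of mine, with P a fixed truth value and F an arbitrary one-place predicate:

```python
from itertools import product

domain = range(3)

def imp(p, q):
    # Material conditional
    return (not p) or q

def pnf_rules_hold(P, F):
    # FOL18: Ax(P -> Fx) <-> (P -> AxFx)
    fol18 = all(imp(P, F[x]) for x in domain) == imp(P, all(F[x] for x in domain))
    # FOL19: Ex(P -> Fx) <-> (P -> ExFx)
    fol19 = any(imp(P, F[x]) for x in domain) == imp(P, any(F[x] for x in domain))
    # FOL20: Ax(Fx -> P) <-> (ExFx -> P)
    fol20 = all(imp(F[x], P) for x in domain) == imp(any(F[x] for x in domain), P)
    # FOL21: Ex(Fx -> P) <-> (AxFx -> P)
    fol21 = any(imp(F[x], P) for x in domain) == imp(all(F[x] for x in domain), P)
    return fol18 and fol19 and fol20 and fol21

assert all(pnf_rules_hold(P, F)
           for P in (True, False)
           for F in product((True, False), repeat=3))
```

Note that the check depends on the domain being nonempty, just as the equivalences themselves do.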

Of course, we can also prove rules for the other operators directly.

FOL22 ∀xF x ⊢ ∀x(F x v Gx)

FOL23 ∃xF x ⊢ ∃x(F x v Gx)

FOL24 ∀x(F x & Gx) ⊢ ∀xF x

FOL25 ∃x(F x & Gx) ⊢ ∃xF x

FOL26 ∀x(F x & Gx) ↔ (∀xF x & ∀xGx)

FOL27 ∃x(F x v Gx) ↔ (∃xF x v ∃xGx)

FOL28 ∃x(F x & Gx) ⊢ ∃xF x & ∃xGx

FOL29 ∀xF x v ∀xGx ⊢ ∀x(F x v Gx)

(Notice that with some of these the implication goes in only one direction.)

With a complex sentence that has many quantiﬁers, there’s no rule about which quantiﬁer to

bring out ﬁrst. And it may be that you get a different sentence if you bring them out in one order

rather than another. But we can prove that any way of bringing them out is equivalent to any other.

For example, the following three sentences are logically equivalent:

a: ∃xFx→∃yGy b: ∀x∃y(Fx→Gy) c: ∃y∀x(Fx→Gy)

Prove their equivalence.


Other logical truths

FOL31 ∀y(F y → ∃xF x)

FOL32 ∃y(F y → ∀xF x)

FOL33 ∃y(∃xF x → F y)

FOL34 ∀x∃y(F x → Gy) → ∃x(F x → ∃yGy)

FOL35 (∃xF x → ∃xGx) → ∃x(F x → Gx)

FOL36 ∀x∃y(F x & Gy) ↔ ∃y∀x(F x & Gy)

FOL37 ∀x∃y(F x v Gy) → (∀xF x v ∃yGy)

3.5 Identity

Identity is, on the one hand, just another two-place predicate. But, on the other hand, it is certainly

a logical relation, so it will take special consideration. We could symbolize it ‘Ixy’, but we will stick

with the more familiar x = y , to mark it as a special logical relation.

The basic principle of identity was stated by Bishop Butler: “Every thing is what it is, and not

another thing.” Thus it’s never really correct to talk about two things being identical; everything is

identical only to itself. When we say something like

x=y

we are saying that the variables x and y pick out the same object. So our ﬁrst axiom of identity we’ll

call ‘Butler’s Law’:

BL ∀x(x = x)

The other axiom is usually named after Leibniz, and says this:

LL ∀x∀y(x = y → (P x → P y))

This, of course, is an axiom schema, and holds for any P . This law is also sometimes known as The

Principle of Indiscernibility of Identicals. It says that everything has whatever properties it has.

The trick to doing proofs with identity is ﬁnding the right substitution instance for P. Sometimes

it’s straightforward, but sometimes the substitution instance is fairly complex. The following are all

allowable as instances of LL:

∀x∀y(x = y → (P x → P y)) (P _ : P _)

∀x∀y(x = y → (Rxx → Rxy)) (P _ : Rx_)

∀x∀y[x = y → [(P x & Rxy) → (P x & Ryy)]] (P _ : P x & R_y)

FOL39 ((P x & ∼P y) → x ≠ y)

FOL40 (x = y → y = x)

FOL41 ((x = y & z = y) → x = z)


FOL43 ((y = x & z = y) → x = z)

FOL44 ((x = y & z = y) → z = x)

FOL45 ((z = x & y = z) → x = y)

Axiom BL tells us that identity is totally reﬂexive. FOL40 tells us it’s symmetric, and FOL41
tells us it’s transitive. Thus identity is an equivalence relation. This allows a helpful shortcut rule:

Ident Given any sequence of appropriately linked identities, we can take the extremes as identical.

If we have a chain of identities (e.g., a = b & b = c & c = d & d = e & . . . & m = n), we can

take the ﬁrst and last and set them equal (a = n).
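The Ident shortcut is just a chain-following procedure. A small Python illustration (my own, purely mechanical):

```python
# A chain of identities a=b, b=c, c=d, represented as linked pairs.
links = [("a", "b"), ("b", "c"), ("c", "d")]

current = links[0][0]
for left, right in links:
    # "Appropriately linked": each identity must pick up where the last ended.
    assert left == current
    current = right

extremes = (links[0][0], current)
assert extremes == ("a", "d")   # the extremes may be set equal
```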

Chapter 4

Modal Logic

What are modals?

By now you are an expert at dealing with truth-functional logic. Truth-functional logic takes certain

symbols as constant, symbols like ‘v’ and ‘→’. These constants work like functions from
truth-values to truth-values. In other words, ‘AvB’ is true whenever either ‘A’ or ‘B’ is true. The only

thing truth-functional logic is concerned with is the truth-value of ‘A’ and ‘B’. That’s what allows us

to construct truth tables. Truth-functional logic is a powerful model for some of our language, but

not all of our language is truth-functional. Consider the sentences ‘Superman can ﬂy’ and ‘Clark

Kent can ﬂy’. Let’s assume that both sentences are true. Now consider the sentences ‘Lois knows

that Superman can ﬂy’ and ‘Lois knows that Clark Kent can ﬂy’. It seems plausible to say that the

former sentence is true and the latter sentence is false. The same phrase, ‘Lois knows that’, can

attach to two different sentences, both of them true, and the two resulting sentences have different

truth-values. That means that ‘Lois knows that’ is not truth-functional.

Another example. Consider the sentences ‘Stephen Douglas was the sixteenth President of the

United States’ and ‘9 is a prime number’. These are both false. Now consider what happens when

we preﬁx the words ‘had things gone differently, it might have been the case that’: ‘Had things

gone differently, it might have been the case that Stephen Douglas was the sixteenth President of

the United States’ and ‘Had things gone differently, it might have been the case that 9 is a prime

number’. The former is probably true, but the latter is certainly false.

‘Lois knows that’ and ‘had things gone differently, it might have been the case that’ are examples

of modals. Modals have that name because they don’t merely reﬂect truth, but modes of truth—

whether something has to be true, or used to be true, or is known to be true. They capture what

modern grammars call ‘adverbials’.

One important class of modals has to do with knowledge and belief; these are sometimes called
epistemic or doxastic modals (from the Greek words for ‘knowledge’ and ‘belief ’, respectively). These

are phrases like ‘it is known that’, ‘it is believed that’, ‘Lois knows that’, ‘Lois believes that’. Another


important class of modals has to do with obligation and permission; these are sometimes called
deontic modals: ‘it is obligatory to’ (or ‘must’), ‘it is permissible to’ (or ‘may’). Still another important class

has to do with time. With truth-functional logic we have ignored time completely. The statements

of logic are ofﬁcially tenseless. It’s easy to see that verb tenses are not truth-functional. ‘Superman

can ﬂy’ is true now, but not before he retreated to the Fortress of Solitude to learn how; ‘9 is a
composite number’ is true now, and there never has been a time when it wasn’t true. So when we add

an operator like ‘ten years ago’ to a sentence, the resulting sentence is only sometimes true. These

temporal modals can be expressed by changing the tense of the verb, and also by adding preﬁxes

like ‘it is always the case that’, ‘it will be the case that’, ‘yesterday it was the case that’, and so on.

All of these modals are philosophically important. Epistemic modals are important in
epistemology, the study of knowledge. Deontic modals are important in ethics. Temporal modals are

important in the branch of metaphysics that studies time. (It should be called ‘chronology’, but that

word’s already taken.) But the most important modals in philosophy are sometimes called ‘alethic’

(from the Greek word for ‘truth’) or ‘metaphysical’ or ‘counterfactual’. These modals are usually

expressed ‘necessarily’ and ‘possibly’ (or ‘it is necessary that’ and ‘it is possible that’), and these words

are given special meanings. The meaning of the second is roughly ‘had things gone differently, it

might have been the case that’ and the meaning of the ﬁrst is ‘even had things gone differently, it

would still have been the case that’, or ‘it has to be that’. Another modal that is sometimes thrown

in is ‘contingent’, which means ‘true and not necessary’.

It may be helpful to think of the difference as what God could have done when he made the

world. God could have made grass blue, so it’s possible that grass is blue, but God couldn’t have

made 9 prime, so it’s not possible that 9 is prime. It’s important here not to get confused. Of course,

‘9’ might have referred to the number 7, so the sentence ‘9 is prime’ might have meant something

different—something true—but that’s irrelevant to the truth of the sentence ‘it is possible that 9 is

prime’. In this way the modal operators are the same as the truth-functional operators: attaching

an operator to a sentence does not give you the right to change the meaning of the words in the

sentence.

If modals are not truth functional, how can they make a logic? You are already familiar with a logic

that’s not truth functional. Modern predicate logic is not truth functional—you can’t make truth

tables for arguments in predicate logic. But predicate logic fails to be truth functional for a different

reason than modal logic. In predicate logic, the quantiﬁers attach to open formulas, which have no

truth value. But modal operators do attach to statements.

The way the modals work is similar to the way Aristotelian logic works. On Aristotelian logic,

quantiﬁers are attached to a sentence like ‘tigers are tame’ to make ‘all tigers are tame’, ‘no tigers

are tame’, ‘some tigers are tame’, and ‘not all tigers are tame’. The two quantiﬁers, ‘all’ and ‘some’,

attach to a statement to make a new statement. These quantiﬁers have certain relations to each

other, by virtue of which they are duals of each other. The most fundamental of these relations is

that the quantiﬁers are interdeﬁned. To use modern symbols:


∀xφx↔∼∃x∼φx.

The modal operators work just like this. There are duals, usually symbolized ‘□’ and ‘⋄’, that

are interdeﬁned:

□φ↔∼⋄∼φ.

In deontic logic, ‘□’ is interpreted ‘it is obligatory that’ and ‘⋄’ is interpreted ‘it is permissible

that’. It should be easy to see that, under this interpretation, the operators are duals: if it is obligatory

for me to brush my teeth, it is not permissible for me not to brush my teeth. In epistemic logic, ‘□’

is interpreted ‘it is known that’ and ‘⋄’ is interpreted ‘it is believed that’. Sometimes they will have a

subscript: ‘□Lois’ means ‘Lois knows that’. (Also, sometimes epistemologists use ‘K’ instead of ‘□’.)

Again, it should be easy to see that the operators are duals: if Lois knows something to be true, she

does not believe it to be untrue (in some sense of ‘believe’). In temporal logic, ‘□’ is interpreted ‘it

is always the case that’, and ‘⋄’ is interpreted ‘it is sometimes the case that’. (There are also other

temporal logics, which deﬁne the operators in such ways as ‘it will always be the case that’ and ‘it will

sometimes be the case that’; or ‘it is and always will be the case that’ and ‘it is or will sometimes be

the case that’.) In alethic modal logic, ‘□’ is interpreted ‘necessarily’, and ‘⋄’ is interpreted ‘possibly’.

Propositional modal logic, which is just propositional logic with modals attached, is much easier to

deal with than ﬁrst-order logic. We see this ﬁrst with translating sentences from English into modal

logic.

Example 4.1.A

Translate into modal logic the following sentence: “It is not possible for John to go to the store.”

Take ‘J’ to be ‘John goes to the store’. Then it could be symbolized either of the following ways:

∼⋄J

□∼J

Example 4.1.B

Translate the following sentence: “Anna does sometimes counsel take—and sometimes tea.”

(This sentence is adapted from a line in Pope’s Rape of the Lock.) Here the modal is temporal: it

expresses what happens not now and not always, but sometimes. Take ‘C’ to be ‘Anna takes counsel’

and ‘T’ to be ‘Anna takes tea’. Then the sentence is symbolized like this:

⋄C &⋄T

Why isn’t it ‘⋄(C &T)’? That would say that there are times she takes both counsel and tea, i.e.,

that she takes them both at the same time. But the joke in the original line is that Anna sometimes

would rather take tea than listen to good advice, that she doesn’t do them at the same time. Even if

she does, that’s not what the original sentence said.
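The difference between the two symbolizations can be made vivid with a toy temporal model, times as a list of (counsel, tea) pairs. A rough Python sketch of mine:

```python
# At time 0 Anna takes counsel but not tea; at time 1, tea but not counsel.
times = [(True, False), (False, True)]

sometimes_counsel = any(c for c, t in times)       # <>C
sometimes_tea = any(t for c, t in times)           # <>T
sometimes_both = any(c and t for c, t in times)    # <>(C & T)

assert sometimes_counsel and sometimes_tea   # '<>C & <>T' is true here
assert not sometimes_both                    # but '<>(C & T)' is false
```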

As we have seen, there are several words in English that express modes that are translated by ‘□’:

must, necessarily, always, has to. There are several words that are translated by ‘⋄’: may, possibly,

sometimes, can. These different concepts work very differently—something may be possible without

being permissible, or necessary without being known. Because of that, modal logic is not really


one logic, but many. We’ll get to that soon; for now, let’s ignore those differences and practice

symbolizing.

Exercises

1 If you do your homework, you may play frisbee. (H: You do your homework. F: You play

frisbee.)

2 Maybe I’ll eat that last cookie, but then again, maybe I won’t. (C: I will eat that last cookie.)

3 You always beat me at basketball, but I sometimes beat you at chess. (B: You beat me at

basketball. C: I beat you at chess.)

6 It’s possible that, if this is milk, then it’s necessarily milk, but it’s not possible that, if it’s not

milk, then it’s necessarily not milk. (M: This is milk.)

8 If I must eat the cookie, then I may eat the cookie, but if I mayn’t eat the cookie, then I don’t

eat the cookie. (C: I eat the cookie.)

9 If grass is always green, then grass is green sometime or other. (G: Grass is green.)

10 Necessarily, if God possibly exists, then God necessarily exists. (G: God exists.)

11 It is possible that there is a green-eyed monster and not a red-eyed monster, but it is not possible

that, if there is possibly a red-eyed monster then there is possibly a green-eyed monster. (G:

There is a green-eyed monster. R: There is a red-eyed monster.)

4.2 Models

We’re going to talk about the semantics of modal logic before we talk about the syntax. That’s the
usual approach now, because it’s more intuitive, but it’s historically backwards. Modern modal logic
was invented in 1913 by C.I. Lewis, but it wasn’t until the late 1950s that a teenager named Saul Kripke
developed a semantics. This semantics depends on models.

For our purposes, a model needs three things. First, it needs a set of points. These can model

states in a game, or times, or alternate scenarios. We call these ‘possible worlds’, or just ‘worlds’.

Second, it needs a relation that speciﬁes which of these worlds is accessible from which. And third,


it needs a valuation, which says which simple propositions are true at which world. (The ﬁrst two

things together make a frame; when a frame has a valuation, it becomes a model.)

A truth table is a kind of model for propositional logic. The lines on the truth table are the

possible worlds, each with its own valuation. The big difference between truth tables and models

for modal logic is that truth tables don’t have the second feature, the accessibility relation. That was

the feature that Kripke added, and that makes it possible to model modal logic. (Say this three times

fast: “A modal model models modal logic. A modal model models modal logic. A modal model

models modal logic.” Now, don’t get the words confused.)

Example 4.2.A.

Look at this model.

[Model diagram: world 1, with arrows to world 2 (where P is true) and world 3 (where Q is true).]

The numbers are the worlds, and the arrows are the accessibility relations. The ‘P’ and ‘Q’ at

worlds 2 and 3 indicate that P is true on 2 and Q is true on 3. (By convention we mark only the

propositions that are true at a world, so we can assume P is false on 1 and 3, and so on.) This model

gives us the information we need to determine the truth not only of the simple propositions, like

P and Q , but also the modal propositions, like ⋄P and □Q. ‘□p’ means ‘on every world accessible

from the given world, p is true’. ‘⋄p’ means ‘on at least one world accessible from the given world, p

is true’. What formulas are true at 1? Well, since P is true at 2 and 2 is accessible from 1, ⋄P is true

at 1. Similarly, ⋄Q is true. Also, since PvQ is true at 2 (since it is always true if P is true), ⋄(PvQ) is

also true at 1. But then, PvQ is true at 3 also, so □(PvQ) is true at 1. On the other hand, □P is false,

and □Q is false, so (□Pv□Q) is false at 1.
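All of these evaluations can be automated. Here is a rough Python sketch of mine for the model in Example 4.2.A; where the figure is ambiguous, I am assuming worlds 2 and 3 see nothing:

```python
# Accessibility relation and valuation read off the figure (an assumption).
R = {1: {2, 3}, 2: set(), 3: set()}
V = {1: set(), 2: {"P"}, 3: {"Q"}}

def true_at(atom, w):
    return atom in V[w]

def box(pred, w):
    # []p: p true at every world accessible from w
    return all(pred(v) for v in R[w])

def dia(pred, w):
    # <>p: p true at some world accessible from w
    return any(pred(v) for v in R[w])

assert dia(lambda w: true_at("P", w), 1)                      # <>P at 1
assert dia(lambda w: true_at("Q", w), 1)                      # <>Q at 1
assert box(lambda w: true_at("P", w) or true_at("Q", w), 1)   # [](PvQ) at 1
assert not (box(lambda w: true_at("P", w), 1)
            or box(lambda w: true_at("Q", w), 1))             # ([]Pv[]Q) fails at 1
```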

Example 4.2.B

[Model diagram: world 1, with arrows to world 2 (where P is true) and world 3 (where P and Q are true), and an arrow from world 2 to world 3.]

What’s true on world 1? Well, from world 1 we can get to worlds 2 and 3; both worlds are

accessible to world 1. And, P is true on both 2 and 3, so □P is true. But, since Q is false on world

2, □Q is not true on world 1. Because Q is true on 3, and 3 is accessible from both worlds 1 and 2,

⋄Q is true on worlds 1 and 2.

You can see that ‘⋄’ is a little like ‘∃’: just as ‘∃xPx’ means ‘for some x, Px’, ‘⋄p’ means ‘on some

world, p’. Similarly, ‘□’ is a little like ‘∀’: just as ‘∀xPx’ means ‘for all x, Px’, ‘□p’ means ‘on every

world, p’. You may remember that ‘∃x(x is a unicorn and has one horn)’ is true only if there is at

least one unicorn, but ‘∀x(if x is a unicorn then x has one horn)’ may be true even if there are no

80 CHAPTER 4. MODAL LOGIC

unicorns. It’s the same with the modals. ‘⋄p’ means that p is true on at least one accessible world, so

it cannot be true if there are no accessible worlds. ‘□p’ means that p is true on all accessible worlds.

If there are no accessible worlds, then p is surely true on all the accessible worlds there are! In that

case, we say that ‘□p’ is true “vacuously”. So, on world 3, ‘⋄P’ and ‘⋄Q’ are both false, since there

are no accessible worlds where ‘P’ or ‘Q’ are true, but on world 3 ‘□P’ and ‘□Q’ are true vacuously.

Example 4.2.C

Is □P true on world 1? Yes, since every world accessible to world 1—in this case, there’s only

one such world: world 3—is a world where p is true. Is p&⋄P on world 2? Again, yes, since p is true

on world 2 and ⋄P is true on world 2.

1:P 2:P

3:P

Exercises

For exercises 1–10, use the following model:

1:P 2:P,Q

3:P

1 1: P

2 1: ⋄P

3 1: □P

4 1: Pv□P

5 2: P

6 2: □P

7 2: Q→□P

8 3: □P

9 3: ⋄P


10 3: Q→□P

Exercises

For exercises 11–22, use the following model:

1:Q

2:P,Q

4:Q

3:P

11 1: P

12 1: ⋄P

13 1: □P

14 1: P&□P

15 1: □(PvQ)

16 2: ⋄P

17 2: □P

18 2: Q→□P

19 3: □Pv⋄P

20 3: Q→□P

21 4: PvQ

22 4: Q→□P

This is an island. The “worlds” on this island are discrete parts of the island. At 3, 5, and 6

are pirates. At 9 there is a treasure. (That is, ‘P’ here means ‘There be pirates here’, and ‘T’ means

‘There be treasure here’.)


1 2 3:P

4 5:P 6:P

7 8 9:T

Notice that at world 2, □P is true; I can’t move anywhere without running into pirates. □P is

true also on world 3, and, vacuously, 9. At which worlds is ⋄P true? 2, 3, 4, and 5, because each of

these has some move that leads to pirates. At which worlds is ⋄□P true? Recall that this means that

there is some move that leads me to a place where □P is true. Because we already ﬁgured out that

□P is true on worlds 2, 3, and 9, ⋄□P will be true on every world from which those three worlds are

accessible: 1, 2, 6, and 8. It’s not true on 9 because there are no legal moves from 9.

Exercises

At which worlds of Treasure Island are the following propositions true? (There may be more

than one.)

23 ⋄T

24 ⋄□T

25 ⋄P

26 □⋄P

27 P&⋄T

28 ∼P&□T

29 P&□T

30 ⋄□□T

Exercises

For the following exercises, state whether the following propositions are true at world 1 of this

model:


1 2 3:P

4 5:Q

31 ⋄P

32 ⋄⋄P

33 ⋄□P

34 □⋄P

35 ⋄(□Pv□Q)

36 □(⋄Pv⋄Q)

37 (⋄□P&⋄□Q)→⋄□(P&Q)

38 (⋄□P&⋄□Q)→⋄□(PvQ)

Counterexamples

Just as with other levels of logic, there are modal statements that are logical truths. A modal logical

truth will be true at every world of every model. We sometimes say that a statement is valid on a

given model if it is true on every world of that model; a modal logical truth is valid on every model.

Most statements, of course, are not valid. To show that a statement is not valid, we provide a

counterexample: a model and a world on that model where the statement is not true. By convention,

we usually take world 1 to be this world. Sometimes it’s easy to ﬁnd a counterexample for a given

statement, but sometimes it takes some thought and some trial and error.

Example 4.2.E

Find a counterexample to □P→P.

Because this is a conditional, it will be false if the antecedent is true and the consequent is false.

Here is such a model:

1 2:P

The antecedent is true at world 1, since at every world accessible from world 1 (that is, only

world 2), P is true. But the consequent is false at world 1, so the conditional is false.
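A proposed counterexample can be checked mechanically with the same kind of encoding. This is a sketch (our own representation) verifying the model above against □P→P:

```python
# Sketch: verifying the counterexample to □P→P on the model 1 -> 2, P at 2.
access = {1: {2}, 2: set()}
val = {1: set(), 2: {"P"}}

def box(p, w):
    # 'necessarily p' at w: p holds at every world accessible from w
    return all(p in val[v] for v in access[w])

antecedent = box("P", 1)      # True: P holds at every world seen from 1
consequent = "P" in val[1]    # False: P fails at world 1 itself
print(antecedent and not consequent)  # True, so □P→P fails at world 1
```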

Example 4.2.F

Find a counterexample to P→⋄P.


Again, we need to make the antecedent true and the consequent false. To make the antecedent

true, we need P to be true at world 1. To make the consequent false, we need some world accessible

from world 1 where P is not true. Here is such a model:

1:P 2

Example 4.2.G

Find a counterexample to □⋄P→⋄P.

Here to make the consequent false we need there to be no world accessible from world 1 where P

is true. To make the antecedent true we need to make every world accessible from world 1 such

that there’s some accessible world where P is true. Thus we need world 2, a world accessible from

world 1, on which P is false. Then we need a world accessible from world 2 on which P is true. One

way to do it is this:

1 2 3:P

But that isn’t the only way to do it. We can get by with just two worlds:

1:P 2

Example 4.2.H

Find a counterexample to □P&P.

Because this is a conjunction, to falsify this statement we need to make at least one conjunct

false. Here we can do this with just one world: a world on which P is false:

1

Example 4.2.I

Find a counterexample to (⋄P&⋄Q)→⋄(P&Q).

Here we need two worlds accessible from world 1, one to make each of the conjuncts of the

antecedent true. If we made both conjuncts true with the same world, that world would also make

the consequent true.

1 2:P

3:Q

Exercises

Find counterexamples to the following statements:


1 ⋄P

2 P→□P

3 ⋄□P→□P

4 ⋄⋄P→⋄P

5 □P→⋄P

6 (⋄Pv⋄Q)→⋄(P&Q)

7 □(PvQ)→(□Pv□Q)

8 (P→□Q)→□Q

9 (⋄P&□□P)→□(P→⋄P)

Properties of Relations

Every model, you recall, has an accessibility relation. Sometimes these relations can have interesting

properties. For example, a model's relation is reflexive if every world is accessible from itself (in the diagrams, a reflexive model has an arrow looping from each world back to itself).

The relation is transitive if you can skip worlds; that is, if 3 is accessible from 2, and 2 is accessible

from 1, the relation is transitive if 3 is accessible from 1. Another way to say this is that every world

that is eventually accessible—accessible after some number of steps—is (immediately) accessible.

The relation is symmetric if every world is accessible from all worlds that are accessible from it.

Another way of saying this is that you can always get back to where you started, in one step. You can

always get home. Yet another way of saying this is that every accessibility arrow between two worlds

goes in both directions.
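Whether a relation is reflexive, symmetric, or transitive can be checked mechanically. Here is a sketch (our own encoding: a relation as a set of (from, to) pairs over a set of worlds):

```python
# Sketch: testing properties of an accessibility relation given as a set
# of (from, to) pairs over a set of worlds.
def is_reflexive(worlds, R):
    return all((w, w) in R for w in worlds)

def is_symmetric(worlds, R):
    return all((v, u) in R for (u, v) in R)

def is_transitive(worlds, R):
    return all((u, x) in R for (u, v) in R for (w, x) in R if v == w)

worlds = {1, 2, 3}
R = {(1, 2), (2, 3)}                         # 1 sees 2, and 2 sees 3
print(is_transitive(worlds, R))              # False: (1, 3) is missing
print(is_transitive(worlds, R | {(1, 3)}))   # True once we add the skip
```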


We can either say that the model’s relation is symmetric (or transitive or reﬂexive), or we could

just say that the model is symmetric (etc.). To say that a model is symmetric doesn’t mean that the

picture is symmetric; it means that the relation is symmetric.

We are interested in models that have these various relations because they allow us to talk

about modal logics of different strengths. We’ve seen, for example, that ‘P→⋄P’ and ‘□P→P’ and

‘□P→⋄P’ are not valid. But for certain interpretations of the box and diamond, we do want them to

be valid. For example, given the deontic reading ('□' means 'obligatory' and '⋄' means 'permissible'), we want the third statement to be true, but not the other two—it's true that whatever is obligatory must be permissible, but not that whatever is done is permissible, nor that whatever is obligatory is in fact done. On

the temporal reading (‘□’ means ‘always true’ and ‘⋄’ means ‘sometimes true’), we want all three to

be true. And it turns out that some of these statements are valid on some kinds of models and not

on others.

There are names for the various frames based on the properties of the accessibility relation. (A

frame, recall, is a model without a valuation):

K no conditions

M reﬂexive

B reﬂexive, symmetric

K4 transitive

S4 reﬂexive, transitive

S5 reﬂexive, symmetric, transitive

(Note: ‘M’ is now usually called ‘T’.) Because there are formulas that are true in some frames

and not others, it is possible to make different logics for each of the different frames. If a formula is

true on every world in every B model (for instance), we will say that the formula is true or valid in B.

So far we’ve provided counterexamples in K, since to be valid in K means to be valid in every

model. But we can do the same with the other models.

Example 4.2.J

Show that ⋄⋄P→⋄P is not valid in B.

As we construct our counterexample, we need to make sure the accessibility relation is both

reﬂexive and symmetric. Thus, every world in the model must be accessible to itself (reﬂexivity),

and every world has to be accessible from every world it can access (symmetry). Another way to

say this: there must be circular accessibility arrows on every world, and every arrow connecting two

worlds must go both ways.

1 2:P

This model is both reﬂexive and symmetric, and hence a B model. The antecedent is true on

world 1, since world 2 (the only world on which P is true) is accessible in two moves from world 1.

It is not, however, accessible in one move, and so the consequent is not true, and hence the formula

is not true.


Exercises

For the following exercises, show that the given formula is not valid in the given frame by pro-

viding a counterexample.

1 M: P→□P

2 M: ⋄□P→□P

3 M: ⋄⋄P→⋄P

4 M: □(PvQ)→(□Pv□Q)

5 M: (P→□Q)→□Q

6 B: P→□P

7 B: P→⋄□P

8 B: ⋄⋄P→⋄P

9 B: ⋄ ⋄ ⋄P→P

10 B: □(PvQ)→(□Pv□Q)

11 B: (P→□Q)→□Q

12 K4: P→□P

13 K4: □(PvQ)→(□Pv□Q)

14 K4: (P→□Q)→□Q

15 K4: (⋄⋄P&□Q)→□(P&Q)

16 S4: P→□P

17 S4: ⋄□P→□P

18 S4: ⋄P→□⋄P

19 S4: □(PvQ)→(□Pv□Q)

20 S4: (P→□Q)→□Q

21 S5: P→□P

22 S5: □(PvQ)→(□Pv□Q)

23 S5: (P→□Q)→□Q


24 S5: (□□(P→Q)&⋄∼P)→∼Q

Some of these logics are stronger than others. One logic is stronger than another if there are state-

ments that are valid in it but not valid in the other. But sometimes two logics are incommensurable,

meaning that each has statements valid in it but not the other.

K is the weakest logic: it has the fewest valid statements, so it’s the easiest to ﬁnd counterexamples

in. S5 is the strongest logic. In fact, S5 has an interesting property: On S5, you can collapse every

string of modal operators into the last one. For example, on S5, □□⋄□⋄□□□⋄□⋄p ↔ ⋄p. This is

because S5’s accessibility relation is an equivalence relation. The modal statements of every world

are just the same as those of every other world.

Symbolizing modal logic: de re and de dicto

So far we’ve dealt only with propositional modal logic. From here on we’ll extend this to quantiﬁed

modal logic, which mixes quantiﬁers with modal operators. This sounds like a simple step, but it

brings up several philosophical and technical questions that need to be answered.

The move is similar to the move from propositional logic to quantiﬁed logic. When we made

that move, we went from statements like ‘A→B’ to statements like ‘∀x(Ax→Bx)’. We changed the

simple sentences to open sentences (by adding the variable x), and we preﬁxed the whole with a

quantifier. We'll do the same thing here. We'll change simple modal sentences like '⋄A→□A' to

quantified modal statements like '∀x(⋄Ax→□Ax)'.

Quantiﬁed modal logic is a really powerful language, so powerful that it has led philosophers to

believe that it can help answer really vexing philosophical questions, at least by clarifying what’s at

issue.

Example 1: free will. Some philosophers say that I acted freely on some occasion if and only if I

could have done something other than what I did. If I freely helped the lady across the street, it was

possible for me to have spit in her eye instead. Thus, if we take ‘x’ to quantify over my actions and

'Ax' to mean 'I perform x', the following sentence expresses what it means to say that I'm sometimes

free: ‘∃x(Ax &⋄∼Ax)’—there’s some action such that I performed it and it’s possible that I didn’t

perform it.

Example 2: Why is there something rather than nothing? Why does anything exist at all? This

is a question that many philosophers have taken to be very important, and many of those who have

taken it to be important have found in its answer some knowledge about the universe and about

God. In particular, there is an argument that, if you accept the premises, proves that there is a

necessary being that explains the existence of everything else. It is crucial that this is a necessary

being; that is, the conclusion of the argument is that something necessarily exists.

This conclusion introduces the ﬁrst technical subtlety. What does it mean to say “Something

necessarily exists”? Does it mean “It is necessarily true that something exists”—that is, it is impossi-

ble for nothing to exist? Or does it mean “There is something that exists necessarily”—that is, that

4.3. QUANTIFIED MODAL LOGIC 89

something has the property of necessary existence? It’s something like the ambiguity in ‘She has a

ring on every ﬁnger’. Does this mean she has ﬁve rings, or one huge ring? This is called (you prob-

ably remember) a “scope ambiguity”: what is the scope of the quantiﬁer? Is it ‘∃x∀yOxy’—‘there

is some ring x such that for all fingers y, x is on y'? Or is it '∀y∃xOxy'—'for every finger y there

is some ring x such that x is on y’? (Less silly examples: ‘Everyone loves someone’ and Aristotle’s

‘Every action aims at some end’.)

Here, too, the ambiguity is one of scope. If the quantiﬁer has the wider scope—‘∃x□Ex’—it

means that we choose the thing ﬁrst. There is something (perhaps God) that necessarily exists. If

the modal operator has the wider scope—‘□∃xEx’—it means that we choose the thing that exists

on a world only after we consider that world. On this world it may be the sun, on that world it may

be this slice of blueberry cheesecake. It doesn’t matter what we choose: the statement is true if there

is something, anything, on every world, even if that thing isn't on any other world.

This distinction is called the de re/de dicto distinction, from Latin phrases meaning ‘of the

thing’ and ‘of the statement’. The distinction turns on whether the operator applies only to the

predicate (the thing) or to the whole statement. It’s sometimes easiest to see the distinction if we

look at it using a temporal modal operator. Consider this sentence: 'The U.S. President will always

be a Democrat'. Right now, the President, Barack Obama, is a Democrat. The sentence might be saying

that this particular entity, Obama himself, will always be a Democrat. Or the sentence might be saying

that the sentence 'the U.S. President is a Democrat' will always be true (i.e., by having a succession of

Democratic presidents). The former reading of the sentence is the de re reading; the latter is the de

dicto reading.

In a de dicto reading, the modal operator always has wider scope. That's just what it means

to say that the modality is "of the statement" rather than "of the thing." So, taking the box to

mean 'always' (and simplifying a little by saying 'all' rather than 'the'), the de dicto reading of 'The

U.S. President will always be a Democrat' is '□∀x(Px→Dx)'. Here, the statement '∀x(Px→Dx)' has

necessary modality. The de re reading is ‘∀x□(Px→Dx)’. Here it’s only the concrete individual

already picked out that has the property necessarily.

With the original example, ‘something necessarily exists’, the de dicto reading (of course) is the

one that has the modal operator ﬁrst: ‘□∃xEx’. The de re reading is the one that has the quantiﬁer

ﬁrst: ‘∃x□Ex’.

Exercises


Just as with propositional modal logic, various modals in English are translated with the box and

diamond. The trickiest part of translating into the modal logic is the difference between de re and

de dicto.

Example 4.4.A

Translate “Someone’s gotta talk to him.”

What does this mean? Does it mean that there’s someone waiting out in the lobby who needs

to talk to him, perhaps to ask for clemency for her son? Or does it mean that he must be talked to

by someone or other? If it means the former, it is translated ‘∃x□Txh’ (with ‘Txy’ meaning ‘x talks

to y' and 'h' meaning 'him'). If it means the latter, it is translated '□∃xTxh'.

Example 4.4.B

Translate “Everybody needs somebody sometime.” Take ‘Nxy’ to be ‘x needs y’, and take the

modal to be temporal, and restrict the universe of discourse to persons.

The intent of this sentence, I think, is quite clear. It is not saying that there are moments at

which everybody needs somebody, but that everyone is such that he or she needs somebody or other

at some time or other. Thus, it is symbolized ‘∀x⋄∃yNxy’. The ‘everybody’ is outside the modal, but

the ‘somebody’ is inside. If the existential quantiﬁer were also outside the modal operator, it would

say that everyone has some specific person whom he or she needs every once in a while.

So, just as with quantiﬁcational logic, one must take care to get the scope just right.

Exercises

5 What goes up must come down. (Ux: x goes up. Dx: x comes down)

8 A cat may look at a king. (Heywood, Proverbs and Epigrams) (Cx: x is a cat. Kx: x is a king.

Lxy: x looks at y.)

9 All good things must come to an end. (Gx: x is good. Ex: x ends.)

12 What is forbidden to some is forbidden to all, and what is permitted to some is permitted to

all. (Adapted from the Babylonian Talmud.) (Dxy: x does y.)

13 Someone has to slay the dragon. (i.e., the dragon must be slain by someone or other.) (Px: x

is a person. Dx: x is a dragon. Sxy: x slays y.)

14 There’s someone who must slay the dragon. (i.e., only that person can slay the dragon.) (Px:

x is a person. Dx: x is a dragon. Sxy: x slays y.)

4.4. MODELS OF QUANTIFIED MODAL LOGIC 91

15 Caesar's wife must be above suspicion. (c: Caesar. Mxy: x is married to y. Ax: x is above

suspicion.)

16 It’s not possible to go faster than the speed of light. (Fxy: x goes faster than y. c: the speed of

light.)

Let’s add identity. Identity brings up some interesting philosophical issues, but the translation—

besides rampant de re/de dicto confusion—is straightforward.

Translate the following sentences into symbols.

18 If Jane is possibly a philosophy student, and Jane is the murderer, then the murderer is possibly

a philosophy student.

19 If the murderer must be a sociology student, and Jane is not a sociology student, then Jane

cannot be the murderer.

20 If it is possible that Bob is president of the club, then it is possible that Bob is Jane.

Models

Models of quantiﬁed modal logic are just like models of propositional modal logic, except that the

worlds have predicate statements instead of primitive propositions. Recall that models in proposi-

tional modal logic had a set of worlds, an accessibility relation, and a valuation. Models in quantiﬁed

modal logic replace this simple valuation with two things: for each world, we have a domain of that

world (the objects that exist on that world) and an interpretation (an assignment of each object to

certain predicates). More informally, models look like this:

2{a}Pa

1{a,b}

3{a,b}Pa,Pb


Every world is labeled with a number, a list of the objects that exist on that world, and a list of what

statements are true on that world. What statements are true on the various worlds of this model?

At world 1, □Pa is true. At world 3, ∀xPx is true (since all the things at world 3, a and b, are P at

world 3), so ⋄∀xPx is true at world 1. ∀xPx is also true at world 2 (since all the things at world 2, just

a, are P at world 2), and hence □∀xPx is true at world 1. There are, of course, other propositions

that are true as well.

Let’s look at ⋄∀xPx and ∀x⋄Px. The ﬁrst is true at some world w if at some world accessible

from w, ∀xPx is true. The second is true at a world w if, for every object x on w, there is some

world accessible from w on which x is P. We can illustrate the difference between these two with the

following models:

2{a,b}Pa 2{a,b}Pa

1{a,b} 1{a,b}

3{a,b}Pa,Pb 3{a,b}Pb

Both of these models have the same domain. They vary only in the properties the objects in the

domain have at various worlds. On the left model, ⋄∀xPx is true at 1 because there is some world

accessible to it, namely 3, on which ∀xPx is true. ∀x⋄Px is also true, since for every object, there is

some world accessible to 1 on which that object is P. On the right model, ∀x⋄Px is also true. But

⋄∀xPx is not true. Even though Pa is true at 2 and Pb is true at 3, there is no single world on which

all the objects are P, so there’s no single world on which ∀xPx is true.
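The difference between the two scopes can be made concrete in code. This is a sketch under an assumed encoding (worlds with domains and an extension for P), reading the right-hand model with the constant domain {a, b} at every world, so that ∀x⋄Px comes out true at world 1 while ⋄∀xPx comes out false:

```python
# Sketch: de dicto ⋄∀xPx vs. de re ∀x⋄Px at world 1 (assumed encoding).
access = {1: {2, 3}, 2: set(), 3: set()}
domain = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "b"}}
P = {1: set(), 2: {"a"}, 3: {"b"}}     # extension of P at each world

def poss_all_P(w):
    # ⋄∀xPx: some accessible world where everything in its domain is P
    return any(domain[v] <= P[v] for v in access[w])

def all_poss_P(w):
    # ∀x⋄Px: each thing at w is P at some world accessible from w
    return all(any(x in P[v] for v in access[w]) for x in domain[w])

print(all_poss_P(1))   # True: a is P at 2, b is P at 3
print(poss_all_P(1))   # False: no single world makes everything P
```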

Our deﬁnitions allow objects to have properties on worlds where they don’t exist. For example,

there’s nothing wrong, according to the deﬁnition, with this model:

2{a}Pb

1{a,b}

3{b}Pa

On this model, ∀x⋄Px is true at world 1, since everything on world 1 has property P on some

accessible world. Yet on worlds 2 and 3, ∃xPx is false, since nothing that exists on world 2 or 3 has

property P on that world. This may seem a little odd, but there are two reasons for making this

move. The ﬁrst is that there are some statements, such as Pav∼Pa, that we want to be true even on

worlds where a doesn’t exist. The second is that under some interpretations of the modal operators,


it is plausible that objects can have properties where they don’t exist. For example, on the temporal

reading, it’s plausible that George Washington doesn’t exist at all moments (e.g., he doesn’t exist

now), and yet he is famous at some moments at which he doesn’t exist (e.g., he is famous now).

Exercises

State which statements are true on the given world.

Use the following model:

1{a}Pb 2{a,b}Pa,Pb,Qb

3{a,b}Pa,Pb,Qa

4{a,b}Pa,Qa,Qb

1 2:Pa

2 2:∀xPx

3 2:∀x(PxvQx)

4 3:Pa

5 3:∀xPx

6 4:Pa

7 4:∀x(PxvQx)

8 1:□

9 1:□Pa

10 1:□Qb

11 1:□∃xQx

12 1:∀xPx

13 1:∃xPx

14 1:∀x□Px

15 1:□∀xPx

16 1:∀x□(PxvQx)


Counterexamples

We ﬁnd counterexamples in just the same way we did before. That is, we construct a model on which

the statement is not true at world 1. If the statement is a conditional, we make the antecedent true

and the consequent false. The counterexamples are a little trickier here, since we need to keep track

of the worlds, the objects on the worlds, and the interpretation of the worlds. To falsify a statement,

we may need to add a new world, or add a new object to a world, or change the interpretation of a

world.

Example

Show that ⋄∀xPx→∀xPx is not valid in M.

Because M models are reﬂexive, we need to add a self-accessibility arrow for every world we

include. We can start with a model with two worlds and see if that’s enough. To make the antecedent

true, we need to make some world accessible from world 1 on which everything is P; to make the

consequent false we need to make something on world 1 not P:

1{a} 2{a}Pa

We check to make sure this meets the requirements: it’s an M model; the antecedent is true; the

consequent is false. We have our counterexample.

Exercises

Find counterexamples to the following statements in the frame indicated.

1 K:∀x⋄Px→∀xPx

2 K:⋄∀x□Px→∀x□Px

3 K:∀x□(PxvQx)→∀x(□Pxv□Qx)

4 K:∀x□(Px&Qx)→(□∀xPx&□∀xQx)

5 M:⋄□∀xPx→∀xPx

6 M:□□∃xPx→∃x□Px

7 M:⋄□∀x(Px&Qx)→∃x□(Px&Qx)

8 M:□∀x(∀yPy→□Px)

9 K4:□□∃xPx→∃x□Px

10 K4:⋄∃xPx→∃x⋄⋄Px

11 K4:∀x□Px→□∀xPx


12 K4:∀x(□Px→Px)

13 S4:⋄□∀xPx→∀x□Px

14 S4:∀x⋄Px→⋄∀xPx

15 S4:□∃xPx→∃x□Px

16 S4:∀x□Px→□⋄∃xPx

17 S5:∀x□Px→□∀xPx

18 S5:□∀xPx→∀x□Px

19 S5:⋄∃xPx→∃x⋄Px

20 S5:∃x⋄Px→⋄∃xPx

To produce many of the counterexamples in the last set, we relied on there being different things in

different worlds. Sometimes, for various reasons, we might want to eliminate that possibility. That

is, we might want the things that exist on one world to exist on all worlds. Such a model is called a

constant domain model (in contrast with a varying domain model).

There are two basic statements that are true in a constant domain model that are not true in a

varying domain model. The ﬁrst statement, usually called the Barcan Formula (after Ruth Barcan

Marcus) is this:

∀x□Px→□∀xPx.

The Barcan Formula rules out expanding domains: models on which what exists grows from world to

world, models on which there are things on accessible worlds that are not on the actual world.

The second statement is the converse of the ﬁrst, and is usually called the Converse Barcan

Formula:

□∀xPx→∀x□Px.

This rules out shrinking domains, on which there are things in the actual world not in the ac-

cessible worlds. These two principles together imply a constant domain.
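As a sketch (our own encoding, on a hypothetical two-world model), an expanding-domain model falsifying the Barcan Formula can be checked like this:

```python
# Sketch: the Barcan Formula ∀x□Px→□∀xPx fails on an expanding-domain
# model: world 1 has only a, world 2 adds b, and only a is P at 2.
access = {1: {2}, 2: set()}
domain = {1: {"a"}, 2: {"a", "b"}}
P = {1: set(), 2: {"a"}}

# ∀x□Px at 1: everything at world 1 is P at every accessible world
antecedent = all(all(x in P[v] for v in access[1]) for x in domain[1])
# □∀xPx at 1: at every accessible world, everything there is P
consequent = all(domain[v] <= P[v] for v in access[1])
print(antecedent, consequent)  # True False: the Barcan Formula fails at 1
```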

This distinction has interesting philosophical consequences. In the last section I said that it was

plausible that George Washington no longer exists. This is one view of time, on which only the

present exists—that is, dinosaurs don’t exist because they are in the past, and moon colonies don’t

exist because they are in the future. On this view of time, a temporal reading of the quantiﬁers

would want the domain to vary. On another view of time, the past and the present both exist, but

the future is still open, and hence its objects don’t exist. On this view of time, an expanding domain


model would be more appropriate. On an alethic reading of the modal operators, we probably want a

varying domain model, since it seems that there might have been things other than there are, and

there might have been fewer things than there are. The universe might have been smaller or larger

than it is.

Notes

For more information, see the following books:

G.E. Hughes and M.J. Cresswell, A New Introduction to Modal Logic. Routledge, 1996.

Theodore Sider, Logic for Philosophy. Oxford, 2010. Chapters 6, 7, and 9.

M. Fitting and R.L. Mendelsohn, First-Order Modal Logic. Kluwer, 1998.

Chapter 5

Arithmetic

Background

As we saw in chapter 3, one major motivation behind axiomatic theories is to systematize and pro-

vide a foundation for mathematics. In the late nineteenth and early twentieth centuries, several

people were involved in laying axiomatic foundations for arithmetic, until the project achieved the

rigor and sophistication it needed.

The language of arithmetic requires, in addition to the logical symbols, four undeﬁned non-

logical symbols:

0    ′    +    ×

The ﬁrst, 0, is a constant; the other three are functions. The last two functions are functions of

two places and should be familiar to you. Normally functions are written with the function name

preceding the terms, as in ‘f(x,y)’; in this notation these functions should be written ‘+(x,y)’ and

‘×(x,y)’. We will instead use the standard inﬁx notation, so instead of writing ‘+(2,3)=5’ (or worse,

‘=+((2,3),5)’, as we should if both the relation ‘=’ and the function ‘+’ were preﬁx), we’ll simply write

'2+3=5'. The remaining function, ′, is a function of one place, pronounced 'successor'; instead of

being written before the term it modifies, it is written after. It is intended to mean the next natural

number in the sequence. Given that, we can introduce other deﬁned constants:

Def 1 1 := 0′

Def 2 2 := 1′ = 0′′

Def 3 3 := 2′ = 1′′ = 0′′′

Def 4 4 := 3′ = 2′′ = 1′′′ = 0′′′′

etc.



In this language we can write sentences of arithmetic, such as:

2+3=5

∃x(x+5=7)

2+2=4 v 2+2=5

There are, of course, inﬁnitely many such sentences. An axiomatic theory of arithmetic has some

small set of true sentences in the language of arithmetic from which all the other true sentences of

arithmetic follow.

Before we ﬁnd the axioms, it is worth emphasizing again what it means to say that this is a

formal language. We set out the undeﬁned terms, and then indicate the “intended interpretation,”

where ‘0’ means the number 0, ‘+’ means the addition function, and so on. But we must resist

tacitly assuming something we already know about arithmetic in our proofs. Part of what it means

to say that these terms are undeﬁned is that they could be given other interpretations. Consider, for

example, this “non-standard” interpretation of the symbols of arithmetic:

0 means 0

′ means the predecessor function

+ means the addition function

× means the negation of the multiplication function.

Here the sequence 0, 1, 2, 3 ... will mean the sequence 0, -1, -2, -3, ..., and every sentence of

arithmetic will still be true. It will simply mean something different than you might expect. For

example, the sentence

2+3=5

will mean that (−2)+(−3)=(−5), which is true. Here is a second non-standard interpretation:

0 means 1

′ means the divide-by-2 function

+ means the multiplication function

× means the function x^(−log₂ y)

Here the sequence 0, 1, 2, 3 ... will mean the sequence 1, 1/2, 1/4, 1/8, .... Given these

interpretations, it's easy to check that every true sentence of arithmetic will still be true. For example, the sentence

2+3=5

5.1. ROBINSON ARITHMETIC (Q) 99

will mean, on this interpretation, that 1/4 · 1/8 = 1/32 (since '+' now means multiplication). It's kind of fun to play around with

these non-standard interpretations.
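Whether a non-standard interpretation really preserves truth can be spot-checked by computation. Here is a sketch of the halving interpretation (our own encoding, with phi mapping each numeral to its object, and '×' read as x raised to the power −log₂(y)):

```python
import math

# Sketch: the halving interpretation. phi sends the numeral n to 2**(-n),
# '+' is interpreted as multiplication, and '×' as x**(-log2(y)).
def phi(n):
    return 2.0 ** (-n)            # 0 -> 1, 1 -> 1/2, 2 -> 1/4, ...

def plus(x, y):
    return x * y                  # interpretation of '+'

def times(x, y):
    return x ** (-math.log2(y))   # interpretation of '×'

print(plus(phi(2), phi(3)) == phi(5))    # '2+3=5' still comes out true
print(times(phi(2), phi(3)) == phi(6))   # '2×3=6' still comes out true
```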

Both of the examples given here work because we have a sequence of objects with a beginning but

no end. And any sequence that has that form can make the sentences of arithmetic true. Imagine an

archangel on an inﬁnite seashore with a string of seashells inﬁnite in one direction. Here the symbol

0 can refer to the ﬁrst seashell, the ′ function can be the function from one seashell to the next, and

the + and × functions are functions to take you from a pair of seashells to another, depending on

how far each seashell is from the ﬁrst. You could easily set it up so that “arithmetic” was about this

archangelic game and not about numbers at all.

What does this mean for doing proofs? One thing it means is that, for example, unless we assume

or prove that + is commutative, we can’t know that 2+3=3+2. We can’t assume that + and × work

the way they are “supposed” to work. All we know about them is what the axioms say, and then

what we can prove based on those axioms.

The ﬁrst axiom system we’re going to consider is named after Raphael Robinson. It has seven

axioms:

Q1 x′ ≠ 0

Q2 x′ = y′ → x = y

Q3 x = 0 v ∃y(x = y′)

Q4 x + 0 = x

Q5 x + y′ = (x + y)′

Q6 x × 0 = 0

Q7 x × y′ = (x × y) + x

(By convention we drop universal quantifiers that apply to the whole line. Thus Q1, for example,

could be written '∀x(x′ ≠ 0)'.) On the intended interpretation, these axioms mean:

Q1 0 is not the successor of any number.

Q2 If the successors of x and y are equal, x and y are equal.

Q3 Every number other than 0 is the successor of some number.

Q4 Any number plus 0 equals that number.

Q5 The sum of any number and the successor of any number equals the successor

of their sum.

Q6 Any number times 0 equals 0.


Q7 The product of any number and the successor of any number equals the sum

of the ﬁrst number and the product of the two numbers.
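Read computationally, Q4/Q5 and Q6/Q7 are recursive definitions of + and ×. A sketch (our own function names, using built-in integers to play the role of numerals):

```python
# Sketch: + and × computed purely by the recursions of Q4-Q7,
# with y - 1 standing in for 'the number y is the successor of'.
def succ(n):
    return n + 1                  # the ′ function on numerals

def add(x, y):
    # Q4: x + 0 = x;   Q5: x + y′ = (x + y)′
    return x if y == 0 else succ(add(x, y - 1))

def mul(x, y):
    # Q6: x × 0 = 0;   Q7: x × y′ = (x × y) + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

print(add(2, 3), mul(2, 2))  # 5 4  (cf. theorems such as 2×2=4)
```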

Because Q is an extension of the axiom system for logic, you are welcome to use any of the

theorems or rules from that system (including, for example, conditional proof/deduction theorem).

Because all of the following theorems are statements of identity, the identity axioms, theorems, and

rules will be useful in every proof.

Example: AT4 0+1=1

1 0+0=0 Q4

2 (0+0)′ = 0′ 1 AT1

3 0+0′ = (0+0)′ Q5

4 0+0′ = 0′ 2,3 Ident

5 0+1=1 4 Def 1

It may be helpful to go through this proof backwards. The last line is a deﬁnitional substitution.

This will often be the case when you’re proving results about speciﬁc numbers. If the proof is about

1, you should think that you need to prove something about 0′ . Line 4 is an identity chain. Notice

that the left side of line 2 and the right side of line 3 are the same thing. By our shortcut identity

theorem, we are entitled to take these extremes as identical. In this proof the chain is short—only

two lines. But sometimes the chains can get quite long (a=b, b=c, d=c, d=e, ...). Once you have

a chain that connects the left side of what you’re trying to prove with the right side of what you’re

trying to prove, you are ready for the identity chain. The step before that, then, is to create such

a chain. Different styles of theorem will require different strategies for doing this, but notice the

strategy we used here: we started (line 1) with a statement we had before (in this case an axiom) that

had as one side the predecessor of what we were trying to prove. Then we applied theorem AT1 to

take the successor of both sides. With that move, we had one side down. Then we needed to get

a line that had the other side we were trying to prove as identical with the term the other side was

identical with. Q5 came in handy here.

Prove the following theorems:

AT1 x = y → x′ = y′

AT2 x = y → x + z = y + z

AT3 x = y → z + x = z + y

AT4 0 + 1 = 1

AT5 0 + 2 = 2

AT6 1 + 0 = 0 + 1

AT7 1 + 1 = 2

AT8 2 + 0 = 2

AT9 2 + 1 = 3

AT10 1 + 2 = 2 + 1

AT11 2 + 2 = 4

AT12 x′ = x + 1

5.2. PEANO ARITHMETIC (P) 101

AT13 x″ = x + 2

AT14 x = y → x × z = y × z

AT15 x = y → z × x = z × y

AT16 0 × 1 = 0

AT17 0 × 2 = 0

AT18 1 × 1 = 1

AT19 1 × 2 = 2

AT20 2 × 1 = 2

AT21 2 × 2 = 4

Robinson arithmetic can prove inﬁnitely many particular sentences of arithmetic. It can prove

‘1+2=2+1’, ‘5+7=7+5’, and so on, for any pair of numbers. But it cannot prove ‘x+y=y+x’. That

is, most general sentences of arithmetic cannot be proven without something called “mathematical

induction.”

Mathematical induction is perhaps poorly named. It is not induction in the usual sense of a non-deductive leap from premises to conclusion. It resembles induction in that it moves from particular premises to a general conclusion, but it is a deductive law.

What is called Peano arithmetic is named after Giuseppe Peano. It is like Q except with the third axiom replaced by the axiom of mathematical induction:

P1 x′ ̸= 0

P2 x′ = y′ → x = y

P3 [X0 & ∀x(Xx → Xx′)] → ∀xXx

P4 x + 0 = x

P5 x + y′ = (x + y)′

P6 x × 0 = 0

P7 x × y′ = (x × y) + x

(Historically, Peano arithmetic came ﬁrst. Robinson and his colleagues were interested in seeing

how weak an arithmetic they could develop.) P3 claims that if 0 has a certain property, and if the

successor of every number that has the property also has the property, then every number has the

property.

Inductive proofs have two parts. First we prove that 0 has the given property; then we assume that some arbitrary number k has the property and show that this entails that k′ has the property.

Example: 0 + x = x

Notice that this is different from the axiom P4.

We ﬁrst prove the 0 case. Here it is an instance of P4, substituting 0 for x:


1 0+0=0 P4

Then we prove the inductive case:

1 0 + k = k

2 (0 + k)′ = k′ 1 AT1

3 0 + k′ = (0 + k)′ P5

4 0 + k′ = k′ 2,3 Ident

That completes the proof. This proof is an abbreviation of this longer proof:

1 0 + 0 = 0 P4

2 0 + k = k (cp)

3 (0 + k)′ = k′ 2 AT1

4 0 + k′ = (0 + k)′ P5

5 0 + k′ = k′ 3,4 Ident

6 0 + k = k → 0 + k′ = k′ 2–5 CP

7 ∀x(0 + x = x → 0 + x′ = x′ ) 6 UG

8 0 + 0 = 0 & ∀x(0 + x = x → 0 + x′ = x′ ) 1,7 Conj

9 ∀x(0 + x = x) 8, P3

Line 1 of this proof is the proof of the 0 case, and lines 2–5 are the proof of the induction case.

Line 6 closes out the proof of the induction case; line 7 generalizes this; line 8 conjoins the conclusion

of the 0 case and the induction case; and line 9 applies axiom P3. These last four lines will be similar

in every proof by induction, thus to save the space and tedium we’ll adopt the shortcut of proving

only the 0 case and induction case, as we did above. But be sure you understand why this shortcut

works.

In P, one can prove all the familiar results of arithmetic, but without P3 most general results cannot be proven. We must now become familiar with this powerful axiom. P3 assures us that if 0 has a certain property, and if the successor of every number that has the property also has the property, then every number has the property.

AT22 0 + x = x

Notice that AT22 is different from the axiom P4.

Prove the following theorems:

AT23 x × 1 = x

AT24 x × 2 = x + x

AT25 x′ + y = (x + y)′

AT26 x + y = y + x

AT27 (x + y) + z = x + (y + z)

AT28 0 × x = 0

AT29 x′ × y = (x × y) + y

AT30 x × y = y × x

AT31 (u = v & w = x) → u + w = v + x


AT32 x × (y + z) = (x × y) + (x × z)

AT33 (y + z) × x = (y × x) + (z × x)

AT34 (x × y) × z = x × (y × z)

AT35 x + y = x + v →y = v


Chapter 6

Set Theory

What are sets?

A set is a collection of objects. It is itself an abstract object that gathers the various objects into a unity. Take a dozen eggs, for example. Given any particular twelve eggs, there is a set of just those twelve

eggs. The set is not the carton but some abstract unity of the twelve individuals.

Sets have members or elements. Socrates is a member of the set of humans; red is a member of the

set of colors; Europe is a member of the set of continents on Earth.

Class algebra

One way to think of categorical statements, like ‘All acrobats are bohemians’, is that the terms of

the statement denote sets. This statement asserts that the entire set of acrobats is contained in the

set of bohemians. This relationship is known as subset: the set of acrobats is a subset of the set of

bohemians. In symbols, we say

A ⊆ B.

We may also want to specify a set more precisely. For example, given the set of acrobats and the

set of bohemians, we may be interested in the set of those who are both acrobats and bohemians,

those who are acrobats but not bohemians, or those who are neither acrobats nor bohemians.

It is important to distinguish between a subset of a set and an element of a set. The set of Cirque

du Soleil performers is (let’s say) a subset of the set of acrobats. But Robin is an element of the set

of acrobats. The symbol for this relation is ‘∈’ (which looks like an ‘e’ for ‘element’). The subset of

a set must always be a set, but an element of a set need not be. We will use capital letters to indicate

sets, and lower-case letters to indicate elements:

a ∈ A.
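The element/subset distinction can be mirrored in any language with set types. A quick Python illustration (the performers named here are hypothetical, chosen only for the example):

```python
acrobats = {"Robin", "Sam", "Lee"}       # a set of performers
cirque = frozenset({"Robin", "Sam"})     # the Cirque du Soleil performers

print(cirque <= acrobats)   # True — subset: every Cirque performer is an acrobat
print("Robin" in acrobats)  # True — element: Robin is a member
print(cirque in acrobats)   # False — a subset is not thereby an element
```

The last line is the whole point: the set cirque is contained in acrobats without being one of its members.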

A very common notation for expressing the content of a set is to enclose a list of the elements

within curly brackets, like this:


6.1. NAIVE SET THEORY 105

{a, b, c}

Here a bare letter is an element, while a letter surrounded by brackets is a set. Thus, a ∈ {a, b, c} and {a} ⊆ {a, b, c}.

The basic assumption of set theory is that sets are extensional, which means that two sets are

identical if they have the same members. A little more precisely, it tells us that the criterion for

identity of sets is identity of membership. It doesn’t matter what property we speciﬁed to gather

together just these things; the only thing that matters is the elements of the set. To use a famous

example, consider the set of animals that have hearts, and the set of animals that have kidneys. Now

clearly having a heart is a different thing from having a kidney, so we have picked out different

properties. (‘Having a heart’ and ‘having a kidney’ have different intensions.) But it may happen that

whatever animals have hearts also have kidneys and vice versa. If this is the case both properties

have picked out the same set. (‘Having a heart’ and ‘having a kidney’ have the same extension.) There

is just one set that could equally well be speciﬁed by listing all the members, or by the property of

having a heart, or by the property of having a kidney:

{Kermit the frog, Shasta the liger, Beyonce the human, ...}

{x: x has a heart}

{x: x has a kidney}

The ﬁrst way of specifying a set, by listing all its members (separated by commas, surrounded

by braces) is ﬁne if there are only a few elements of a set. But for a large set, we can’t list every item,

so we have to resort to ellipses. And ellipses are unhelpful unless we also have a rule to inform us

how to ﬁll them out. The second way of spelling out a set—with a variable, then a colon, then a

property, surrounded by braces—tells us what property to use in ﬁlling out the set. But the axiom

of extension tells us that a difference in property doesn’t automatically make a difference in sets.

Similarly, the order of the elements in a set doesn’t matter.

{a, b, c} = {b, a, c}.

Sometimes we do care about the order of the elements in a set. An ordered set-like group of two

elements is called an ordered pair, and is notated with angled brackets:

⟨a, b⟩.

An ordered group of three elements is called a triple; of four elements, a quadruple; of five elements, a quintuple; of n elements, an n-tuple.

The set of everything that is both A and B is called the intersection of sets A and B. The set of

acrobats who are bohemians is the intersection of the set of acrobats and the set of bohemians. In

symbols, we write

A∩B

and in a Venn diagram, with the shaded portion indicating the set we’re interested in,

[Venn diagram: two overlapping circles A and B, with the overlap shaded]


It may be helpful to think of the symbol ∩ as a cup that clamps down on just the portion of the sets

that we’re interested in.

The set of everything that is in either A or B, the two sets put together, is called the union of

the two sets. The set of anyone who is an acrobat or a bohemian is the union of these two sets. In

symbols, we write

A∪B

and in a Venn diagram,

[Venn diagram: two overlapping circles A and B, with both circles shaded]

It may be helpful here to think of the ∪ symbol as a cup upright, open for everything in both sets.

The set of everything in one set but not in another is called the difference between the sets. The set of acrobats who are not bohemians is the difference between the two sets. The symbol is

A – B

and the diagram is

[Venn diagram: two overlapping circles A and B, with the part of A outside B shaded]

The set of everything not in a set is called the complement of the set. The complement of the

set of acrobats is everything not an acrobat. (We may sometimes have a universe of discourse; if

the universe of discourse is people, the complement of the set of acrobats is all people who are not

acrobats.) In symbols, we write

Ā


We will later develop a subtle and powerful version of set theory. But for now, we will leave it at

the intuitive level.

Exercises

Which of the following sentences are true?

1 A ⊆ A ∩ B

2 A ⊆ A ∪ B

3 A ∩ B ⊆ A

4 A ∪ B ⊆ A

5 A ⊆ A

6 A ⊆ A – B

7 A – B ⊆ A

8 (A ∩ B)̄ = Ā ∩ B̄

9 (A ∪ B)̄ = Ā ∪ B̄

10 A – B = B – A

11 (A ∩ B)̄ = Ā ∪ B̄

12 (A ∪ B)̄ = Ā ∩ B̄

{a, b, c} ∪ {c, d, e} = {a, b, c, d, e}

{a, b, c} ∩ {c, d, e} = {c}

{a, b, c} − {c, d, e} = {a, b}
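Python's built-in set type implements all three operations, which makes worked examples like these easy to check (a sketch, using | for ∪, & for ∩, and - for difference):

```python
A = {"a", "b", "c"}
B = {"c", "d", "e"}

print(A | B)   # union: the set {'a', 'b', 'c', 'd', 'e'}
print(A & B)   # intersection: the set {'c'}
print(A - B)   # difference: elements of A not in B, i.e. {'a', 'b'}
```

Note that the difference A - B keeps what is in A, not what is in B.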

Exercises

Calculate the following.

13 {a, b, c, d} ∪ {a, c, g}

14 {a, c, g} ∩ {a, b, c, d}

15 {a, b, d, e} – {a, b, c}

16 ({a, b, c} ∩ {c, d, e}) ∪ {d, e, f}


Sets as elements

Sometimes the members of a set will themselves be sets. If a club is a set of the members of that

club, the set of clubs will have sets as members. Sometimes that won’t make any difference, but it

does allow for some additional operations.

For example, let’s take this to be an illustration of a set of sets:

Each circle represents one of the sets that is a member of the bigger set. We can deﬁne the

intersection of a set of sets as the set of elements that are in every set. For example, the intersection

of the set of clubs is the set of people who are members of every club. We can diagram it like this:

The symbol for the intersection of a set A is ∩A. (Again, the cup clamps down on just the part

that overlaps.) Likewise, we can take the union of a set of sets. It is the set of all things that are

elements of any of the sets. For example, the union of the set of clubs is the set of people who belong

to any club. We diagram it like this:

The symbol for the union of a set A is ∪A. (This cup is upright and open for everything.) Notice

that ∪A is not always the same thing as A, even though we’ve shaded everything in every element


of A. A is a set of sets; ∪A is a set of elements of those sets. In the club example, A is a set of clubs,

∪A is a set of people who belong to clubs. Similarly for ∩A. ∩ and ∪ skip a level: if A is a set of sets

of individuals, ∩A and ∪A are sets of individuals. They skip over that “sets of ” in the middle.

To indicate sets of sets using bracket notation, we nest bracketed lists. For example,

{{a, b, c}, {d, e, f }, {e, f, g}} is a set containing three elements, each of those elements being

sets containing three elements. The union or intersection of a set of sets will be a set of elements.

So,

∪{{a, b, c}, {d, e, f }, {e, f, g}} = {a, b, c, d, e, f, g}.
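The level-skipping behavior of ∪ and ∩ over a set of sets can be computed directly. A sketch in Python, with the club example as a list of member-sets (the club memberships are hypothetical):

```python
from functools import reduce

# A "set of sets": three clubs and their members.
clubs = [{"a", "b", "c"}, {"d", "e", "f"}, {"e", "f", "g"}]

big_union = set().union(*clubs)                 # ∪clubs: belongs to at least one club
big_inter = reduce(lambda s, t: s & t, clubs)   # ∩clubs: belongs to every club

print(sorted(big_union))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(big_inter)          # set(): no one belongs to all three clubs
```

Notice that clubs is a collection of sets, while big_union and big_inter are sets of individuals — the "set of" in the middle has been skipped, just as in the text.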

Exercises

Which of the following sentences are true?

1 A ∈ ∩A

2 A ∈ ∪A

3 A ⊆ ∩A

4 A ⊆ ∪A

Exercises

Calculate the following.

5 ∪{{a, c, e}, {d, e, f}, {e, f, g}}

6 ∩{{a, c, e}, {d, e, f}, {e, f, g}}

7 ∪{{a, e}, {a, d, f, g}, {a, e, g}}

8 ∩{{a, e}, {a, d, f, g}, {a, e, g}}

Once we are comfortable with the difference between a thing and a set containing just that thing—

e.g., between the thing a and the set {a}—we might ask whether it’s possible to have a set containing

nothing at all. The answer is yes: the set {}, which is a set with no elements, is not the same thing

as nothing. This set is important enough that we have a special symbol for it: ∅. Because the sets

{a, b, c}, {d, e, f}, {g, h, i} have no elements in common, ∩{{a, b, c}, {d, e, f}, {g, h, i}} = ∅.

It might be helpful to think of a set as a box that has various things in it. The empty set, then,

is a box with nothing in it. But notice that ∅ ̸= {∅}—the left side is an empty box, and the right

side is a box containing an empty box. Thus the left side is empty, but the right side has something

in it, namely, an empty box.
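The box picture is easy to make concrete. In Python, frozenset() can play the role of ∅ (a sketch of the distinction, not a claim about the formal system):

```python
empty = frozenset()                  # ∅: a box with nothing in it
box_of_empty = frozenset({empty})    # {∅}: a box containing an empty box

print(len(empty), len(box_of_empty))  # 0 1 — one is empty, the other is not
print(empty == box_of_empty)          # False: ∅ ≠ {∅}
print(empty in box_of_empty)          # True: ∅ ∈ {∅}
```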

Exercises

Which of the following are true?


1 ∅ ⊆ A

2 A ∩ ∅ = ∅

3 A ∩ ∅ = A

4 A ∪ ∅ = A

5 A ∪ ∅ = ∅

6 ∪∅ = ∅

7 A – ∅ = A

8 A – A = ∅

9 ∅ – A = A

10 ∅ – A = ∅

Just as union and intersection move us from a set of sets to a set of individuals, the power set

function moves from a set of individuals to a set of sets. The power set of a set is a set of all subsets

of that set. The symbol for the power set of A is ℘A.

℘{a, b, c} = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.

It is always the case that A ⊆ A and ∅ ⊆ A, so the empty set and the set itself are always

elements of the power set. The name “power set” comes from the fact that if A has n elements, ℘A

has 2ⁿ elements.
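The 2ⁿ count is easy to check by generating all subsets. A sketch using a standard itertools recipe (the helper name power_set is ours):

```python
from itertools import chain, combinations

def power_set(s):
    """Return the power set of s: the set of all its subsets, as frozensets."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))}

ps = power_set({"a", "b", "c"})
print(len(ps))            # 8 = 2**3
print(frozenset() in ps)  # True: ∅ is always a subset
```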

Exercises

Calculate the power set of the following.

11 {a, b}

12 {∅, a, b}

13 {a, b, c, d}

Frege and the beginnings of Logicism

Simple mathematical statements (e.g., 2+2=4) may be epistemically basic. That is, we could prove

them from axioms, but those axioms are no more obvious than the statement we’re trying to prove

from them. If our only interest was being certain that the statements are true, we would have no

need of an axiom system.

6.2. THE LOGICIST PROJECT 111

Not all mathematical statements are that obvious, however. Some statements have taken great

effort to prove. And some statements have taken great effort, but their proofs are either impossible

or have escaped us so far. Consider any complex mathematical statement. How do we know it’s

true? Well, there’s a proof, but that proof only shows that the statement follows from our axioms or

assumptions. How do we know that the proof really guarantees the results? How do we know that

our assumptions are true, and that the mathematical reasoning we use preserves truth?

One idea is that mathematics is simply logic. That is, all the axioms of mathematics are just

theorems of FOL, or possibly FOL with an additional axiom or two. This idea is called logicism,

and the attempt to prove that mathematics is logic is a fascinating chapter in the history of human

thought.

We begin the chapter with Gottlob Frege, a German mathematician and philosopher who was

dissatisﬁed with the theories of his contemporaries about logic and the foundations of mathematics.

The logic of the nineteenth century involved very intricate elaborations of syllogism, but was unable

to explain much of mathematical reasoning. Frege began a new foundation.

We have already considered logical axioms that are similar to those Frege used. Frege also used

an axiom of set theory:

∃X∀x(x ∈ X ↔φx)

In words, any property determines a set. Or, less concisely, given any property—the property

of being an acrobat, or the property of being a bohemian acrobat, or the property of being a milk-

drinking Olympic curler—there is a set that has as its elements all and only the things with that

property.

With this axiom, we can deﬁne all the symbols of set theory, and hence the entire system of

class algebra developed by other nineteenth-century logicians. With that, Frege was able to deﬁne

the basic notions of arithmetic. Frege’s deﬁnitions have never been surpassed in their philosophical

insight and ingenuity. We won’t go through his deﬁnitions here, because Frege’s system has a serious

problem: it is inconsistent.

Russell’s Paradox

None of Frege’s brilliant work received much attention in Germany, and it might have passed into

oblivion had it not been for Bertrand Russell, an English logician and philosopher, who discovered

Frege’s writings around the turn of the century. Russell had been working on the same problems

and had discovered most of these deﬁnitions independently. He had discovered a difﬁculty that was

giving him trouble. When he read Frege seriously, he recognized Frege’s profundity and originality

and agreed with Frege’s views on the relation between mathematics and logic. He found, however,

that Frege had not noticed the difﬁculty. The difﬁculty was this: Frege’s axiomatic system was

inconsistent.

Recall Frege’s axiom of set theory:

∃X∀x(x ∈ X ↔φx)


In Frege’s system, this was known as Rule V and it appeared near the beginning of the ﬁrst

volume of the Grundgesetze. Rule V seemed so obvious that, before the twentieth century, no one

questioned it.

This axiom would solve the problem we confronted at the end of the previous section—for any

property, this axiom would assure us of the existence of the set of things with that property. It would

then remain to show only that the set is unique (which generally follows quite easily from the axiom of extension), and we could confidently employ the braces notation introduced in the final paragraphs of the naive set theory discussion above.

However, Bertrand Russell identiﬁed a predicate that could not determine a set, thereby exposing a

devastating ﬂaw in Frege’s system.

In symbols, Russell’s paradox can be put succinctly.

1 ∃X∀x(x ∈ X ↔φx)

2 ∃X∀x(x ∈ X ↔ x ̸∈ x) 1 UI (instantiating ξ̸∈ξ for φ)

3 ∀x(x ∈ n ↔x ̸∈ x) 2 (ei)

4 n ∈ n ↔n ̸∈ n 3 UI

5 p &∼p 4 TF

6 p &∼p 2,3-5 EI
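The contradiction can also be seen by brute force: over any collection of sets you care to write down, no member X satisfies ∀x(x ∈ X ↔ x ̸∈ x). A small Python search over a toy universe (entirely our own illustration):

```python
# A toy universe of sets. In Python, a frozenset can never be a member of itself,
# so 'x not in x' holds for every x here — which is exactly what makes the
# Russell condition unsatisfiable within the universe.
a = frozenset()
b = frozenset({a})
universe = [a, b, frozenset({a, b})]

def is_russell_set(X):
    """Does X satisfy: for every x in the universe, x ∈ X iff x ∉ x?"""
    return all((x in X) == (x not in x) for x in universe)

print(any(is_russell_set(X) for X in universe))  # False — no such set exists
```

Any candidate X would have to contain every member of the universe, itself included, and that is exactly what no set in the universe can do.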

Russell wrote to Frege to inform him of the contradiction. Frege responded like this:

Your discovery of the contradiction caused me the greatest surprise and, I would almost

say, consternation, since it has shaken the basis on which I intended to build arithmetic.

… It is all the more serious since, with the loss of my Rule V, not only the foundations

of my arithmetic, but also the sole possible foundations of arithmetic, seem to vanish.

Russell later wrote of this response:

As I think about acts of integrity and grace, I realise there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life’s work was on the

verge of completion, much of his work had been ignored to the beneﬁt of men inﬁnitely

less capable, his second volume was about to be published, and upon ﬁnding that his

fundamental assumption was in error, he responded with intellectual pleasure clearly

submerging any feelings of personal disappointment. It was almost superhuman and a

telling indication of that of which men are capable if their dedication is to creative work

and knowledge instead of cruder efforts to dominate and be known.

Various logicians and mathematicians, beginning with Russell himself, sought to circumvent

this paradox by rejecting one or another step in the proof. Russell’s solution is called the Theory of

Types. According to this theory, there are various levels or types of entity: basic entities, sets of basic

entities, sets of sets of basic entities, and so on. The membership relation ∈ holds only between an

entity at one level and one at the next level up. Thus no sets can be members of themselves, and the

sentence ‘x ̸∈ x’ in line 2 is rejected as ungrammatical. Other theories, like the theory of Zermelo

and Fraenkel, which is treated in this book, revise Frege’s Rule V with a series of axioms.

6.3. ZERMELO-FRAENKEL SET THEORY

The ZF Axioms

For most mathematicians, “set theory” means the axiomatic set theory developed by Zermelo and

Fraenkel and others. The basic idea of this set theory is that it is not a general theory of any sets

whatever. It is a theory of a speciﬁc hierarchy of sets. The elements of any set are themselves sets.

The axioms are listed below. They will be explained one at a time in the sections below.

Extension A = B ↔∀x(x ∈ A ↔x ∈ B)

Separation ∃x∀y(y ∈ x ↔(y ∈ A &φy))

Pairing ∀x∀y∃z(x ∈ z & y ∈ z)

Union ∀z∃x∀y(y ∈ x ↔∃w(y ∈ w &w ∈ z))

Inﬁnity ∃x(∅ ∈ x &∀y(y ∈ x →y′ ∈ x))

Power Set ∀x∃y∀z(z ∈ y ↔z ⊆ x)

The axiom of extension tells us that two sets are identical if they have the same members. A little more precisely, it tells us that the criterion for identity of sets is identity of membership: it doesn’t matter what property we specified to gather together just these things; the only thing that matters is the elements of the set. (Recall the heart-and-kidney example from the naive discussion above: two properties with different intensions can have the same extension, and so pick out the same set, whether that set is specified by listing its members in braces or by the brace-and-colon notation.)

We can deﬁne these notations formally:

Def {} w ∈ {x} ↔ w = x

Def {,} w ∈ {x, y} ↔ w = x ∨ w = y

Def {:} y ∈ {x : Xx} ↔ Xy

As a reminder, the universe of discourse of ZF set theory is sets. That means that all quantiﬁers

range only over sets, that all properties are properties of sets, that all terms refer to sets. It means

that ZF set theory doesn’t really allow talk of sets of creatures with kidneys, since creatures aren’t

sets. So even though the axiom listed above distinguished between the sets (with capital letters) and

the elements (with lower-case letters), ofﬁcially it’s sets all the way down.

So, to use an example with numbers (which we take to be sets): consider the set of all even

primes, and consider the set of all square roots of 4. Here are three ways of specifying this set:

{2}

{x : x is even & x is prime}

{x : x = √4}

These properties used to specify this set may be different in intension, but they all pick out the

same set, the set containing only the number 2.
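The same point can be checked mechanically: different defining conditions, one set. A sketch in Python (is_prime is our own helper, and we cut the search off at 50 for the example):

```python
def is_prime(n):
    """Trial-division primality test, adequate for this small range."""
    return n > 1 and all(n % d != 0 for d in range(2, n))

evens_that_are_prime = {x for x in range(50) if x % 2 == 0 and is_prime(x)}
square_roots_of_4 = {x for x in range(50) if x * x == 4}

print(evens_that_are_prime == square_roots_of_4 == {2})  # True: one set, two intensions
```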

Aside: Strictly speaking, we need only the right-to-left part of the biconditional. The left-to-right

half follows from the axioms of identity. If x and y are identical, the axioms tell us, they have every

property in common, including their membership. It’s the right-to-left half that tells us something

more restrictive about sets.

The Axiom of Separation (also called Comprehension, Abstraction, or its German name Aus-

sonderung) is the revision of Frege’s Rule V. As we have seen, in contrast to Frege’s Rule V, this

axiom forces us to draw the elements for each new set from members of earlier sets. If we already

know that the set {John, Mary, Susan} exists, this allows us to draw from that set to form a new set.

We call it ’Separation’ because it allows us to separate out a new set from an old.

Def ⊆ A ⊆ B :: ∀x(x ∈ A →x ∈ B)

Def ⊂ A ⊂ B :: A ⊆ B &A ̸= B

ZF1 (A ⊆ B &B ⊆ C) →A ⊆ C

ZF2 (A ⊆ B &B ⊆ A) →A = B

ZF3 A ⊆ A

ZF4 A ̸⊂ A

*ZF5 A ⊂ B →B ̸⊂ A

ZF6 x ̸∈ A →(x ∈ A ↔x ̸= x)

ZF1 tells us that the subset relation is transitive; ZF3 tells us that it is reﬂexive. (Is it symmetric?)

ZF4 tells us that the proper-subset relation is irreﬂexive, and ZF5 tells us it is asymmetric. (What

about transitive?) ZF6 should be obvious, if a triﬂe odd. It claims that if x is not a member of a given

set, then it is a member of that set only if a particular contradiction is true. It will be handy in the

next section.

So far we are able to extract sets from larger sets. But how do we get these larger sets in the ﬁrst

place? And, relatedly, how do we know that there are any sets?


We can’t take this for granted. Because we’re developing set theory axiomatically, we can’t

simply say that it’s obvious that there are sets, that everyone knows there are sets. If it’s going to play

a role in the proofs, we need to assume the existence of sets.

ZF has one existential axiom: the Axiom of Inﬁnity. This assumes the existence of an inﬁnite set,

which, by Separation, we can break off into smaller sets. It is an interesting and powerful axiom,

and we’ll discuss it at greater length in just a moment. For now, all we need from the Axiom of

Inﬁnity is that there is at least one set.

With that assumption we can prove the existence of another useful and interesting set: the empty

set. Let A name the set that the Axiom of Inﬁnity guarantees to exist. Then we can prove the

existence of a set that has no members:

1 ∃x∀y(y ∈ x ↔ (y ∈ A & y ̸= y)) Separation (φy: y ̸= y)

2 ∀y(y ∈ x ↔ (y ∈ A & y ̸= y)) 1 (ei x)

3 y ∈ x ↔ (y ∈ A & y ̸= y) 2 UI

4 y ∈ A → (y ∈ x ↔ y ̸= y) 3 TF

5 y ∈ A → ∃x∀y(y ∈ x ↔ y ̸= y) 4 FOL

6 y ̸∈ A → (y ∈ A ↔ y ̸= y) ZF6

7 y ̸∈ A → ∃x∀y(y ∈ x ↔ y ̸= y) 6 FOL

8 ∃x∀y(y ∈ x ↔ y ̸= y) 5,7 TF

9 ∃x∀y(y ∈ x ↔ y ̸= y) 1, 2–8 EI

This proof tells us to separate off from the Inﬁnity set a set according to the following rule: pick

only those members that are not self-identical. Because, of course, there are no such things, we are

guaranteed to have a set that is empty. This empty set (sometimes called the null set) has a special

symbol: ∅.

This may seem strange. Maybe the strangeness can be expressed like this: If a set is a collection

of things, then if there are no things to collect, there’s no set! It may help to visualize a set as an

imaginary box containing various things (e.g., the current members of the U.S. Senate, the solar

planets, the numbers). If the box happens to have nothing in it, the box doesn’t thereby disappear.

The empty set is just such an empty box.

Because sets are determined by their members, there is only one empty set. The set of all

dragons, the set of all married bachelors, the set of all honest knaves—each of these descriptions

picks out just this one set with nothing in it.

ZF7 ∀xyz[((x ∈ z ↔x ̸= x) &(y ∈ z ↔y ̸= y)) →x = y]

ZF8 ∀x(x ∈ ∅ ↔x ̸= x) ↔∀x(x ̸∈ ∅)

ZF7 tells us that there is at most one empty set. Together with the proof above, we know that

there is exactly one empty set, that the empty set is unique. ZF8 tells us that the way we deﬁned

the empty set above is equivalent to a simpler way. We will adopt this simpler way as a theorem.

Practically, this theorem will be more useful in our proofs than the earlier deﬁnition. We will cite

this theorem as ’Empty Set’.

ZF9 Empty Set x ̸∈ ∅

*ZF10 ∅ ⊆ x


We have an axiom (Separation) that allows us to make sets out of bigger sets, and we have the empty

set. It would be nice to be able to have an axiom that allows us to put sets together to make bigger

sets. And that’s what the Pairing Axiom allows us to do.

The Pairing Axiom tells us that given any two sets, there’s a set they both belong to. (More

precisely, given any sets A and B, which may be the same set, there’s a set C such that A∈C and

B∈C.) In symbols:

Pairing ∀x∀y∃z(x ∈ z & y ∈ z)

From this axiom we can prove that, given any two sets, there is a set that just they belong to—given, for example, A and B, there is a set {A, B}:

ZF11 ∀x∀y∃z∀w(w ∈ z ↔ (w = x ∨ w = y))

In fact, this theorem is equivalent to the Pairing Axiom:

ZF12 ∀x∀y∃z(x ∈ z & y ∈ z) ↔ ∀x∀y∃z∀w(w ∈ z ↔ (w = x ∨ w = y))

This axiom allows us to introduce the brace notation used above. So we have the following

deﬁnition:

Def {,} w ∈ {x, y} :: w = x ∨ w = y

If x = y, we write {x} in place of {x, x}. Thus,

Def {} w ∈ {x} :: w = x

EXERCISES

Prove the following theorems:

ZF13 {x, y} = {y, x}

ZF14 ∃x∀y(y ∈ x ↔ (y ∈ z & y ∈ w))

ZF14 says that, given any two sets, there is a set consisting only of the elements common to both

sets. That is, given any two sets, their intersection is also a set. We can deﬁne intersection formally

then:

Def ∩ x ∈ A ∩ B :: x ∈ A &x ∈ B

In words, x is an element of the intersection of A and B if and only if x is an element of A and
an element of B. (You may want to review the intuitive presentation of intersection above to make

sure this deﬁnition makes sense to you.)

ZF15 A ∩ ∅ = ∅

*ZF16 A ∩ B = B ∩ A

ZF17 (A ∩ B) ∩ C = A ∩ (B ∩ C)

ZF18 A ∩ A = A

ZF19 (A ⊆ B &A ⊆ C) →A ⊆ (B ∩ C)

We can generalize the concept of intersection so that it applies not just to two sets, y and z, but

to any number of sets, that is, to any set of sets. Suppose A is a set of sets; then ’x is an element of

the intersection set of A’ means that x is an element of every set that is an element of A. In symbols

Def ∩ x ∈ ∩A :: ∀z(z ∈ A → x ∈ z)

Of course we can use this deﬁnition only after proving the existence and uniqueness of the

intersection set of a given set. The next theorem guarantees this.


ZF21 ∩{A, B} = A ∩ B

Next we want to introduce the concept of the union of two or more sets. Some set z is in the

union of x and y if z is in x or z is in y. This concept is parallel to the concept of intersection just

discussed, but it needs a new axiom. The axiom of Separation allows us to make smaller sets out

of sets we already have (licensing intersections) but does not allow us to make bigger sets. We need

a new axiom that allows us to do this. We’ll pick a general axiom that allows us to make arbitrary

unions, and then deﬁne pairwise unions as a special case:

Union ∀z∃x∀y(y ∈ x ↔ ∃w(y ∈ w & w ∈ z))

As always, uniqueness follows easily from the axiom of extension, and we introduce a new symbol ‘∪’, read union. Here is the definition:

Def ∪ x ∈ ∪A :: ∃z(x ∈ z & z ∈ A)

EXERCISES

Prove the following theorems:

*ZF22 ∪{A} = A

ZF23 ∪∅ = ∅

ZF24 x ∈ ∪{y, w} ↔ (x ∈ y ∨ x ∈ w)

ZF24 makes possible the following deﬁnition:

Def ∪ x ∈ y ∪ w :: x ∈ y ∨ x ∈ w

This is a special case of union—the union of exactly two sets. It corresponds to the special case

of intersection ∩ above.

EXERCISES

Prove the following theorems:

ZF25 ∪{A, B} = A ∪ B

ZF26 A ∪ ∅ = A

ZF27 A ∪ B = B ∪ A

ZF28 (A ⊆ B ∨ A ⊆ C) → A ⊆ (B ∪ C)

ZF29 A ∪ A = A

ZF30 A = B → ∪A = ∪B

ZF31 ∪A ∪ ∪B = ∪∪{A, B}

*ZF32 ∪(A ∪ B) = ∪A ∪ ∪B

Notice that in Zermelo-Fraenkel set theory, absolute complements cannot be defined, since the absolute complement of the null set would be the universal set, which, as we have seen, cannot exist.

However, in this system, there is what is called a relative complement. This consists of the set of all

those entities that are in one set but not in a second set. This can be thought of as the complement

of the second set within the ﬁrst set. Given sets x and y, the existence and uniqueness of the relative

complement of y within x are ensured by Separation and Extension respectively. The symbol that is used for a

relative complement is ‘–’ and here is its deﬁnition:

Def – z ∈ x – y :: z ∈ x &z ̸∈ y

EXERCISES

Prove the following theorems:

118 CHAPTER 6. SET THEORY

ZF33 A – ∅ = A

ZF34 A – A = ∅

ZF35 ∅ – A = ∅

ZF36 A ∩ B = ∅ →A – B = A

We have seen in ZF13 that {x,y} = {y,x}. So when picking out a set by listing the elements inside

braces, the order of the listed items is not relevant. But since not all relations are symmetric, we

want to distinguish between Mxy and Myx—one may be true while the other is false (x may be the

mother of y, but if so y is not the mother of x). As in this case, it often happens that the order of

terms matters. How can the concept of order be captured in set theory? The usual approach is by

introduction of what is called an ordered pair. Here is the deﬁnition:

Def <,> <x,y> :: {{x},{x,y}}

EXERCISE

Prove the following theorem:

ZF37 <x,y> = <u,w> ↔(x = u &y = w)
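The Kuratowski definition can be modeled directly with frozensets, and ZF37 (the characteristic property of ordered pairs) can be verified exhaustively on a small domain. A sketch, again an illustration rather than a proof:

```python
# Kuratowski ordered pairs: <x,y> is defined as {{x},{x,y}}.
def pair(x, y):
    return frozenset({frozenset({x}), frozenset({x, y})})

assert pair(1, 2) != pair(2, 1)                    # order now matters
assert pair(3, 3) == frozenset({frozenset({3})})   # <x,x> collapses to {{x}}

# ZF37 checked exhaustively on a small domain:
# <x,y> = <u,w> exactly when x = u and y = w.
assert all((pair(x, y) == pair(u, w)) == (x == u and y == w)
           for x in range(3) for y in range(3)
           for u in range(3) for w in range(3))
```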

Suppose D is some set and R is an equivalence relation deﬁned on D. We know that R partitions

D into equivalence classes such that each member of any one of these classes stands in the relation R

to any other member of that class. We use square braces to identify the equivalence class determined

by x, which is any one particular element of D. Thus, within the universe of discourse D:

Def [ ]R [x]R :: {y : Rxy}

ZF38 x ∈ [x]R

ZF39 (y ∈ [x]R &w ∈ [x]R )→Rwy

ZF40 (y ∈ [x]R &y ∈ [w]R ) →[x]R = [w]R

We require one ﬁnal set-theoretic function. Here is the axiom on which this function rests:

Power Set ∀x∃y∀z(z ∈ y ↔z ⊆ x)

Once again, it is possible to prove that the set y is unique. The symbol that we now introduce is

‘℘’ which is read “the power set of.” Here is the deﬁnition:

Def ℘ x ∈ ℘A :: x ⊆ A

Given some set, the power set of that set is the set comprising all the subsets of the given set. As

we will see, this is a rich concept with many consequences. As usual, we will prove a few theorems

to become familiar with this concept.

EXERCISES

Prove the following theorems:

ZF41 A ∈ ℘A

ZF42 ∅ ∈ ℘A

*ZF43 ∩{x : x ∈ ℘A} = ∅

ZF44 A ⊆ B ↔℘A ⊆ ℘B

ZF45 ℘A ∪ ℘B ⊆ ℘(A ∪ B)

ZF46 ℘(A ∩ B) = ℘A ∩ ℘B

ZF47 ∪℘A = A

ZF48 A ⊆ ℘∪A
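The power set is also easy to model for finite sets; this Python sketch generates ℘A with itertools and checks a few of the theorems above (an illustration with finite frozensets, not a proof):

```python
from itertools import chain, combinations

# ℘A: the set of all subsets of A, built from the combinations
# of every size from 0 to |A|.
def powerset(A):
    A = frozenset(A)
    return frozenset(frozenset(c)
                     for c in chain.from_iterable(
                         combinations(A, r) for r in range(len(A) + 1)))

A, B = frozenset({1, 2}), frozenset({2, 3})
assert A in powerset(A)                               # ZF41: A ∈ ℘A
assert frozenset() in powerset(A)                     # ZF42: ∅ ∈ ℘A
assert powerset(A & B) == powerset(A) & powerset(B)   # ZF46
assert len(powerset(A)) == 2 ** len(A)                # |℘A| = 2^|A|
```

The last line anticipates the cardinality results of the next section: the power set is always strictly larger than the set itself.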

6.4. CANTOR’S THEORY OF TRANSFINITE NUMBERS 119

We will now digress to consider some ramiﬁcations of the power set axiom. Set theory was developed

by Georg Cantor in his attempt to understand inﬁnity, and some of the most interesting results of

set theory come from inﬁnite sets.

We begin by deﬁning ‘equinumerous’. Roughly, two sets are equinumerous if they have the same

number of elements. For example, the sets {a,b,c} and {d,e,f} are equinumerous. This rough deﬁnition

uses the concept of number, and sometimes (like when we talk about inﬁnity) the concept of number

is the very thing we’re trying to understand. So instead, we’ll say that sets A and B are equinumerous

if there’s a one-to-one correspondence between the two sets. We can understand that as a special

kind of function, or just a pair of lists with arrows going between them, like this:

A B

There is a one-to-one correspondence if there is an arrow from every member on one list to

exactly one member on the other list. Said another way, sets are equinumerous if, for each member

of either, there is a unique member of the other. (Another way to say that two sets are equinumerous

is to say that they have the same cardinality.) To show that two sets are equinumerous, we put the sets

into a one-to-one correspondence. The question before us is this: are all inﬁnite sets equinumerous?

Obviously, no ﬁnite set is equinumerous with any of its proper subsets. For example, {a, b, c}

is not equinumerous with any of its proper subsets: {a, b}, {a, c}, {b, c}, {a}, {b}, {c}, ∅. This is

because {a, b, c} has “more elements” than any of its proper subsets. However, inﬁnite sets can

be equinumerous with their proper subsets. Indeed, this fact can be used to deﬁne ‘inﬁnite set’:

an inﬁnite set is one that is equinumerous with some of its own proper subsets. For example, the

set of natural numbers (ℕ) is equinumerous with many (indeed inﬁnitely many) of its own proper subsets. It is easy to see that ℕ is equinumerous with the odd numbers: we just show a one-to-one

correspondence between the natural numbers and the odd numbers. The most obvious way to do

this is like this:

0 1 2 3 4 5 6 7 8 9 10 …

1 3 5 7 9 11 13 15 17 19 21 …

Here we assign each odd number x to the natural number (x–1)/2. So there are exactly as many odd numbers as there are natural numbers. Strange!
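The correspondence in the table is the function n ↦ 2n + 1, with inverse x ↦ (x − 1)/2. A small Python sketch makes the one-to-one character concrete (checking finitely many cases, of course, not all of ℕ):

```python
# The one-to-one correspondence between the natural numbers and the
# odd numbers: n ↦ 2n + 1, with inverse x ↦ (x − 1) / 2.
f = lambda n: 2 * n + 1
g = lambda x: (x - 1) // 2

assert [f(n) for n in range(6)] == [1, 3, 5, 7, 9, 11]
assert all(g(f(n)) == n for n in range(1000))         # f is one-to-one
assert all(f(g(x)) == x for x in range(1, 1000, 2))   # every odd number is hit
```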

This strangeness is just the strangeness of inﬁnity you’re already familiar with. This strangeness

is obvious at Hilbert’s Hotel. Hilbert’s Hotel has inﬁnitely many rooms, and they’re all ﬁlled. A


weary traveler comes to the ofﬁce and asks for a room. The clerk at the desk tells him to wait a

moment, and he’ll make one available. He has each lodger move down one room, making the ﬁrst

room now vacant. Next, a bus pulls up with inﬁnitely many passengers. The clerk tells them to wait

a moment, and has the lodger in room 1 move over a room (leaving room 1 vacant), the lodger in

room 2 move over two rooms into room 4 (leaving room 3 vacant), the lodger in room 3 move over

three rooms, and so on. Now every other room is vacant, and all the travelers on the bus have a

room. There’s always room at Hilbert’s Hotel.
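Both of the clerk's tricks are just functions on room numbers: "move down one" is n ↦ n + 1, and "make room for a busful" is n ↦ 2n. A sketch, checked on a finite glimpse of the hotel:

```python
# Hilbert's Hotel: each of the clerk's tricks maps occupied rooms
# one-to-one into the rooms, leaving vacancies for the newcomers.
shift = lambda room: room + 1    # one new guest: everyone moves down one room
spread = lambda room: 2 * room   # a busful: the lodger in room n moves to room 2n

rooms = range(1, 21)             # a finite glimpse of the infinite hotel
assert 1 not in {shift(r) for r in rooms}            # room 1 is now vacant
assert all(m % 2 == 0 for m in map(spread, rooms))   # every odd room is now vacant
```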

Hilbert’s Hotel is, of course, just a vivid way of imagining putting the numbers in a one-to-one

correspondence. A bus with inﬁnitely many people ﬁlling in only the odd-numbered rooms is the

same thing as putting the odd numbers into a one-to-one correspondence with the natural numbers.

This can be done with any inﬁnite subset of the natural numbers: the multiples of ten (or of any other number), the set of perfect squares, the members of ℕ greater than 1,000, and so forth. Each of these is a proper subset of ℕ, and yet is equinumerous with ℕ. Moreover, ℕ itself is equinumerous with other sets that include ℕ as a proper subset. For example, ℕ is equinumerous with the integers (ℤ) even though ℤ includes all the positive and negative integers. Here we arrange each negative number after its corresponding positive number, and then match them up with the natural numbers:

0 1 –1 2 –2 3 –3 4 –4 5 –5 …

ℕ is also equinumerous with the positive improper fractions. It takes a little more ingenuity to

match up these numbers with the natural numbers, but it can be done. We ﬁrst put all the rational

numbers on a grid, and then determine an orderly path through the grid so that every number gets

counted exactly once. One way to do it is like this:

1/1 2/1 3/1 4/1 5/1

1/2 2/2 3/2 4/2 5/2

1/3 2/3 3/3 4/3 5/3

1/4 2/4 3/4 4/4 5/4

1/5 2/5 3/5 4/5 5/5

1/6 2/6 3/6 4/6 5/6
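One orderly path through this grid walks the diagonals on which numerator + denominator is constant. The sketch below takes that route, and (one way of ensuring that every value gets counted exactly once, my choice for illustration) skips fractions that are not in lowest terms:

```python
from math import gcd

# Enumerate the positive fractions p/q by walking the diagonals of the
# grid (p + q = 2, then 3, then 4, ...), skipping fractions not in
# lowest terms so each rational value is counted exactly once.
def rationals(count):
    out, s = [], 2                      # s = numerator + denominator
    while len(out) < count:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:
                out.append((p, q))
                if len(out) == count:
                    break
        s += 1
    return out

assert rationals(6) == [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4)]
```

Since this process assigns each fraction to a natural number (its position in the list), it exhibits the one-to-one correspondence the text describes.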

With a similar approach, we can show that ℕ is equinumerous with the proper fractions, and with the rationals (ℚ), which include all fractions positive and negative. Indeed, ℕ is equinumerous with what appears to be an even larger set: the set of all numbers that can be represented as roots of polynomial equations (𝔸). This set goes beyond ℚ by including irrational numbers (e.g. the square root of two). 𝔸 has a proper subset equinumerous with ℕ (and therefore with ℤ and ℚ), but even 𝔸 is equinumerous with ℕ. Strange!

Or maybe not so strange. All of this is just saying ∞ + 1 = ∞, ∞ + ∞ = ∞, and so on,

and that is old hat. What is truly strange is that some inﬁnite numbers are larger than others.

There are inﬁnite sets that are not equinumerous with ℕ, ℤ, ℚ, or 𝔸. The real numbers (ℝ) include subsets equinumerous with all the sets of numbers mentioned in the previous paragraph, but they also include what are called transcendental numbers. These numbers are not elements of 𝔸—they cannot be represented as roots of polynomial equations. Most of us are familiar with

of —they cannot be represented as roots of polynomial equations. Most of us are familiar with

only one such number: π (pi), the ratio of the circumference to the diameter of a circle. However,

while one seldom encounters transcendental numbers, they are vastly more numerous than any of


the numbers mentioned in the preceding paragraph. Once the transcendental numbers are added to 𝔸 one obtains ℝ, a set not equinumerous with ℕ, ℤ, ℚ, or 𝔸—in this sense ℝ is a larger inﬁnite set. But how can we prove that ℝ is not equinumerous with ℕ?

Cantor’s diagonal proof establishes this result. In discussing the proof, rather than talking about

all real numbers, we focus only on those between 0 and 1 (this set turns out to be equinumerous with

the set of all real numbers). We will write these numbers as decimals. For example, we can write

1/2, 1/3, 2/3, 1/4, 3/4 etc. as .5, .333. . ., .666. . ., .25, .75, etc. Of course π will not be included

since it is greater than 1, but the decimal part of it will be: .14159265 . . . We write terminating

decimals (for example, .5 in contrast to .333. . .) with 0’s in all the decimal places following their

termination (for example, instead of .5 we write .500 . . .). Now suppose the real numbers between

0 and 1 are equinumerous with ℕ. This hypothesis leads to a contradiction and so must be false.

The natural numbers can be written in a column:

1

2

3

..

.

If, by hypothesis, ℝ is equinumerous with ℕ, there must be a way of arranging the real numbers

in a column matching the column of natural numbers. Suppose we try to construct such an array

comprising all the real numbers between 0 and 1. It may begin like this:

1 .500000000 . . .

2 .798622222 . . .

3 .141592653 . . .

4 .250000000 . . .

5 .333333333 . . .

6 .999999999 . . .

7 .183183183 . . .

8 .718281828 . . .

.. ..

. .

If these sets are equinumerous, every real number between 0 and 1 must be somewhere in the

list at the right. But we can construct real numbers between 0 and 1 that are nowhere in the list

(regardless of how the list is generated or how long it may be). This proves that the hypothesis is

false.

Here is one way of constructing such a number: Proceed down the diagonal of the digits in

our supposedly exhaustive list. In each case, ﬁnd the nth digit of the nth number (ﬁrst digit of the


ﬁrst number, second digit of the second number, third digit of the third number, etc). If that digit

is 0 through 8, the nth digit in the number we are constructing will be one greater; if that digit is

9, the nth digit in the number we are constructing will be 0. What does this mean in terms of our

supposedly exhaustive list? Start with the ﬁrst number in the list. Its ﬁrst digit is 5 so the ﬁrst digit of

the number we are constructing will be 6. The second digit of the second number is 9 so the second

digit of the number we are constructing will be 0. The third digit of the new number will be 2, and

so on. Proceeding in this way, we construct this number:

.60214023 . . .

But this number must differ from every number in the (supposedly exhaustive) list. In general it

will differ from the nth real at least in its nth digit. This means that our enumeration is not complete

and so the hypothesis that led to the enumeration is false. That is, ℝ is not equinumerous with ℕ.
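The diagonal construction is mechanical, so it can be carried out in a few lines of Python. The sketch below applies the rule from the text (a digit 0 through 8 becomes one greater; a 9 becomes 0) to the sample list above:

```python
# Cantor's diagonal construction: given a list of decimal expansions,
# build a number that differs from the nth entry in its nth digit.
def diagonal(digit_rows):
    return "." + "".join("0" if row[n] == "9" else str(int(row[n]) + 1)
                         for n, row in enumerate(digit_rows))

rows = ["500000000", "798622222", "141592653", "250000000",
        "333333333", "999999999", "183183183", "718281828"]
d = diagonal(rows)

assert d == ".60214023"
# The constructed number differs from the nth row in its nth digit:
assert all(d[1 + n] != row[n] for n, row in enumerate(rows))
```

However the list is extended, the same construction produces a real number missing from it, which is exactly the contradiction the proof needs.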

ℝ has a property ℕ doesn’t have: density. Density means that between any two numbers there is another number: ∀xy(x>y→∃z(x>z&z>y)). But ℚ is also dense, and it is equinumerous with ℕ, so this is not the property that causes ℝ to have greater cardinality than ℚ. That property

is continuity. Continuity was deﬁned by Dedekind, and his deﬁnition is given below (*); for now it

will be enough to understand it intuitively. Density allows inﬁnitesimal gaps, but continuity doesn’t.

Here’s a way to picture it: Construct, in your imagination, a dense line segment. Begin by placing

two points, then a point in between, then a point in between each of those, and so on inﬁnitely. At

two different stages, the line will look like these:

Now imagine you have a subtle sword, a blade of inﬁnitesimal breadth, and you slice through

the line without touching any of the points on the line. Such a feat is obviously possible on the two

non-dense line segments above—the sword need not even be all that subtle. As the number of points

increases to inﬁnity, the distance between them shrinks, but there is always a gap large enough for

a sword of inﬁnitesimal breadth. If the line were continuous, it would present a solid surface with no gaps for the subtle sword, and any slice would pass through a point rather than between them.

(The sword is Dedekind’s; we’ll see it again later, as this is precisely the way to deﬁne real numbers.)

The line of real numbers is not just dense, it is continuous, and so the cardinality of ℝ is called the

cardinality of the continuum.

This proves that there is more than one inﬁnite number. Hence it turns out that ‘inﬁnite’ is a

misleading term to use. ‘Inﬁnite’ is a negative term, it negates ﬁnitude, so some have thought inﬁnity

to be a negative property, like the property of being not-green. (And there is no such thing as inﬁnity,

any more than there is such a thing as a not-elephant; there is only the negative property.) But since

there is more than one inﬁnity, inﬁnity is more than a simple negation of ﬁnitude. Cantor introduced

the word ‘transﬁnite’: a transﬁnite number is a real number, as real as 17 or 254, but one that transcends all ﬁnite numbers. He also introduced names for these numbers: ℵ0 (‘aleph-null’ or ‘aleph-naught’; ℵ is the ﬁrst letter of the Hebrew alphabet) is the number that is the cardinality of ℕ, and they go on from there: ℵ1 , ℵ2 , ℵ3 .

Cardinals and ordinals. (This paragraph is another digression. While this distinction is impor-

tant for a lot of philosophy and mathematics, it won’t show up again in our story, so you can skip

this paragraph without loss.) We deﬁned the numbers in terms of their order: 0 is the number that

doesn’t follow anything, 5 is the number that follows 4, and so on. We determined their order by


their cardinality, and this by the relation of equinumerosity so there would be no circularity. So or-

der and cardinality are properties that coincide for all numbers. Well, all ﬁnite numbers. It turns out

that for transﬁnite numbers, order and cardinality are different things. So in fact there are two sets of

transﬁnite numbers: the transﬁnite cardinals and the transﬁnite ordinals. The transﬁnite cardinals

we’ve already seen. The transﬁnite ordinals work differently. The ﬁrst transﬁnite ordinal, which is

the ordered set of all natural numbers in their standard order, is ω (‘omega’, the last letter of the

Greek alphabet). But these numbers could have been ordered differently: how about <0,1,2,3,…,

c>? Here we have the set ω as usual, but then another number, c, greater than all of them. This is

the number ω+1, and ω+1̸=ω. In fact, ω+1̸=1+ω, since putting the new number at the beginning

of the sequence is different from putting it at the end. Thus, transﬁnite ordinal arithmetic is different

from transﬁnite cardinal arithmetic. We’re going to be interested only in transﬁnite cardinals.

Cantor’s Diagonal Proof can be generalized. The Generalized Diagonal Proof establishes that

the power set of any set always has greater cardinality than the given set. This generalizes the

argument we have just discussed since ℝ, the set of real numbers, can be regarded as the power set of ℕ.

Start with any set, say A. We are mainly interested in inﬁnite sets, but the proof works for ﬁnite

sets as well. Form the power set of A, ℘A. Suppose ℘A and A are equinumerous, that is, suppose

each element of ℘A corresponds to an element of A (as we will now see, this supposition leads to a

contradiction and so is false). Form a new set, A′ . The elements of A′ are taken from A (so A′ is a

subset of A and thus an element of ℘A); in particular we form A′ as follows: for any element of A,

if it is not in the element of ℘A that it corresponds to, let it be in A′ and otherwise not. Since A′ is a

subset of A and thus an element of ℘A, by the supposition that we want to prove false, some element

of A must correspond to it. Call this element b. Here is a diagram of what we have so far:

Now ask: Is b is an element of A′ ? If it is, by deﬁnition, it can’t be (since A′ includes only

elements of A not in the element of ℘A to which they correspond); but if it isn’t, by deﬁnition, it

must be. Thus, b must both be and not be an element of A′ —a contradiction. But what led to this

contradiction? It was our supposition that A and ℘A are equinumerous. So this supposition is false.
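The construction of A′ can be watched in miniature. The sketch below takes a small set A and an arbitrary sample map f from A into ℘A (the particular map is my choice for illustration), builds the diagonal set, and confirms that f never reaches it:

```python
# The diagonal set A' for a map f from A into ℘A: an element of A goes
# into A' exactly when it is NOT in the subset it is mapped to.
def diagonal_set(A, f):
    return frozenset(a for a in A if a not in f(a))

A = frozenset({1, 2, 3})
f = {1: frozenset(), 2: frozenset({1, 2}), 3: frozenset({2, 3})}.get

A_prime = diagonal_set(A, f)
assert A_prime == frozenset({1})         # 1 ∉ f(1), but 2 ∈ f(2) and 3 ∈ f(3)
assert all(f(a) != A_prime for a in A)   # f misses A', so f is not onto ℘A
```

For a finite A the conclusion is unsurprising (℘A simply has more elements); the force of the argument is that the same construction works for infinite sets, where counting is no help.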

This proof establishes that the power set of any set, ﬁnite or inﬁnite, has greater cardinality

(more elements) than the set itself. In particular, ℘ℕ (which is ℝ) must have greater cardinality than ℕ, but ℘ℝ must have greater cardinality than ℝ, and so on. Cantor thought that ℵ1 was the cardinality of ℝ, but he couldn’t prove it. This hypothesis is known as the Continuum Hypothesis, and the general version, that ℵ2 is the cardinality of ℘ℝ, and so on, is known as the Generalized

Continuum Hypothesis. It turns out that it is independent of the standard axioms of set theory. That means ZFC, the set theory we have been using, is not strong enough to prove either the Generalized Continuum Hypothesis or its negation, so either it or its negation (or some broader axiom that

Continuum Hypothesis or its negation, so either it or its negation (or some broader axiom that

sufﬁces to prove it or its negation) must be added. Which one? It’s not easy to tell. (Sometimes a

different sequence of letters is explicitly deﬁned by the cardinalities of the successive power sets, so ℶ0 is the cardinality of ℕ, ℶ1 the cardinality of ℘ℕ, and so on. The Generalized Continuum Hypothesis is then the claim that ℶx = ℵx for all x>0.)

Here is a ﬁnal result of Cantor’s work. Suppose there were a universal set. Then, according

to the generalized diagonal proof, the power set of the universal set would have greater cardinality


than the universal set—that is, it would have greater cardinality than the set that already has ev-

erything in it. But that is a contradiction. This paradox, called Cantor’s paradox, is resolved by

rejecting the idea of a universal set. As we have seen in the process of avoiding Russell’s paradox,

Zermelo-Fraenkel set theory also abandons the universal set. In Paul Halmos’s hyperbolic words:

“We have proved, in other words, that nothing contains everything, or, more spectacularly, there is

no universe.” Halmos goes on to apologize for the hyperbole, but his words are quite literally true.

Of course, if we understand ‘universe’ to mean ‘cosmos’ or ‘all the stars and galaxies and everything

else physics studies’, there may well be a universe. That’s a question for the physicists: is there, in

addition to the galaxies and magnetic ﬁelds and whatnot, an object that contains them all? But Can-

tor proved something else. If we understand ‘universe’ to mean ‘the object that contains absolutely

everything’—here not restricting the deﬁnition to physical objects, but including also numbers and

other mathematical objects, and possibly (who knows?) much more—we’ve just proved that there is

no such thing.

Now we return to Frege’s project. Our present goal is to prove Peano’s axioms and thereby follow,

at least in spirit, the central steps in Frege’s attempt to reduce arithmetic to logic.

As we saw in Section 3.2, one version of Peano’s axioms can be symbolized as follows:

6.5.A N0

6.5.B Nx →Nx′

6.5.C x′ ̸= 0

6.5.D x′ = y′ →x = y

6.5.E (X0 &∀x(Xx →Xx′ )) →∀xXx

As a ﬁrst step we must deﬁne, in strictly logical terms, the undeﬁned terms in these axioms:

‘0’, ‘′ ’, and ‘N’. As we have seen, Frege accepted too liberal a concept of sets in formulating his

deﬁnitions and so, in spite of their clarity and inherent plausibility, Frege’s deﬁnitions led directly to

Russell’s paradox. Zermelo-Fraenkel set theory adopts different deﬁnitions.

Here is one way of deﬁning ‘zero’ and ‘successor’ compatibly with Zermelo-Fraenkel set theory:

Def 0 0 :: ∅

Def ′ x′ :: x ∪ {x}

Thus we deﬁne the other numbers like this:

1 :: ∅ ∪ {∅} :: {∅} :: {0}

2 :: 1 ∪ {1} :: {∅} ∪ {{∅}} :: {∅, {∅}} :: {0,1}

3 :: 2 ∪ {2} :: {∅, {∅}} ∪ {{∅, {∅}}} :: {∅, {∅}, {∅, {∅}}} :: {0,1,2}
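These definitions can be modeled directly with frozensets, which makes the pattern easy to see (an illustration, not a derivation from the axioms):

```python
# The set-theoretic numerals: 0 is the null set and x' is x ∪ {x}.
zero = frozenset()

def succ(x):
    return x | frozenset({x})

one, two, three = succ(zero), succ(one := succ(zero)), succ(succ(succ(zero)))
one = succ(zero)
two = succ(one)
three = succ(two)

assert one == frozenset({zero})              # 1 = {0}
assert two == frozenset({zero, one})         # 2 = {0, 1}
assert three == frozenset({zero, one, two})  # 3 = {0, 1, 2}
assert len(three) == 3                       # the numeral n has exactly n elements
```

The last assertion shows why these definitions are so convenient: each number is a set with exactly that many elements.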

We deﬁne zero as the null set and the successor of any number as the union of that number with

the set whose only member is that number. These deﬁnitions lack the intuitive plausibility of Frege’s

deﬁnitions, but they avoid loose talk about sets of all sets and they seem to avoid the paradoxes.

The Zermelo-Fraenkel deﬁnition of number is similar to Frege’s—it also deﬁnes the natural numbers as the intersection set of all sets that include 0 and are closed with respect to the relation ‘is a

6.5. PEANO’S AXIOMS 125

successor of ’, but it avoids the paradox by drawing these sets only from other sets. However, this

approach requires a preliminary deﬁnition and an axiom.

Def Z-inductive x is Z-inductive :: 0 ∈ x &∀y(y ∈ x →y′ ∈ x)

This means that a set is Z-inductive if it includes 0 and is closed with respect to the relation ‘is

a successor of.’

Inﬁnity ∃x(0 ∈ x &∀y(y ∈ x →y′ ∈ x))

By SA7, there is at least one z-inductive set. This axiom is sometimes called the axiom of inﬁnity

because it entails the existence of an inﬁnite set. You can probably see that the set of natural numbers

is Z-inductive, but many other sets are also. To take the same fanciful example we encountered

earlier, the set that includes the natural numbers plus the moon is z-inductive (the moon doesn’t

have a numerical successor). Once again, we must formulate a deﬁnition of the natural numbers

that will eliminate such fanciful additions to the set we are interested in. Following Frege, we do this

by taking the intersection of all Z-inductive sets. It is customary to call this set ω (the last letter in

the Greek alphabet).

Def ω ω :: ∩{x : x is Z-inductive}

As you can probably see ω is the common core of all z-inductive sets; we can deﬁne the phrase

‘x is a number’ by stipulating that x is any element of this core:

Def N Nx :: x ∈ ω

From here on we will simplify things by further restricting the universe of discourse to sets that

are elements of ω, that is, to the natural numbers.

It now remains only to prove Peano’s axioms. Four of the proofs, including the three in the

following exercise set, are quite easy:

EXERCISES

Prove the following theorems

ZF49 N0

ZF50 Nx →Nx′

ZF51 0 ̸= x′

Next we will prove 6.5.E. In the notation of set theory it is written like this:

ZF52 (0 ∈ y &∀x(x ∈ y →x′ ∈ y)) →∀x x ∈ y

This axiom of arithmetic (which becomes a theorem in set theory) is called, somewhat mis-

leadingly, the principle of mathematical induction. It is a powerful axiom/theorem that we will use

extensively in proving theorems in arithmetic. However, it follows easily from our deﬁnition of number. The antecedent in ZF52 tells us that y is Z-inductive. Since each number is in the intersection

of all z-inductive sets, each number is an element of every z-inductive set and so of y.

The remaining axiom, 6.5.D, is somewhat more difﬁcult to prove. We start with a deﬁnition:

Def Tz Tz :: (x ∈ y &y ∈ z) →x ∈ z

‘Tz’ is read “z is transitive.” This name is appropriate because in such sets, the relation of

membership is transitive. Given this deﬁnition, one can prove two lemmas (ZF56 and ZF57) from which the ﬁnal Peano axiom (ZF58) follows quite easily.

EXERCISES


Prove the following theorems. Note: ZF53 through ZF55 are straightforward and are mainly intended to provide experience working with the deﬁnition of ‘T’. ZF56 is more difﬁcult (but manageable). ZF57 requires the use of mathematical induction (i.e. of ZF52). ZF58, which is the ﬁnal Peano axiom, follows from ZF56 and ZF57.

ZF53 Tx ↔ ∪x ⊆ x

ZF54 Tx ↔∀y(y ∈ x →y ⊆ x)

ZF55 Tx ↔ x ⊆ ℘x

ZF56 Tx → ∪x′ = x

ZF57 Tx

ZF58 x′ = y′ →x = y
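The ZF54 characterization of transitivity (every element of a transitive set is also a subset of it) is easy to check concretely on the frozenset numerals. A sketch:

```python
# Transitive sets, via the ZF54 characterization:
# Tz holds when every element of z is a subset of z.
zero = frozenset()
succ = lambda x: x | frozenset({x})

def is_transitive(z):
    return all(y <= z for y in z)   # y ⊆ z for every y ∈ z

n = zero
for _ in range(6):                  # ZF57 observed on the first few numerals
    assert is_transitive(n)
    n = succ(n)

# A non-example: {{∅}} is not transitive, since {∅} ∈ {{∅}} but {∅} ⊄ {{∅}}.
assert not is_transitive(frozenset({frozenset({zero})}))
```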

Chapter 7

Gödel’s Proofs

We must be clear at the outset exactly what Kurt Gödel proved. Here are Gödel’s two main results:

(1) no consistent axiom system for arithmetic can be complete, and (2) no axiom system for arith-

metic can be proven consistent by any argument that can be expressed [the usual technical term

is represented] in that system. Remember what it means for an axiom system for arithmetic to be

complete: it means that within that system, every true arithmetic statement can be proven. Thus,

Gödel’s ﬁrst result means that, given any consistent axiom system for arithmetic, there will always

be true arithmetic statements that cannot be proven within that system. This holds not only for

Peano’s axioms but for any consistent axiom system no matter how many axioms it may include

(even an inﬁnite number). In the simplest terms, Gödel did this by constructing a statement, in the

language of arithmetic, that says of itself that it cannot be proven. Such a statement must be true

because if it were false, it would be both false and provable (an impossible combination if the axioms

are consistent). Thus, the statement must be true and so unprovable.

Here are a couple of claims Gödel did not prove (be sure you see the difference): (1) He did

not prove, merely, that no consistent axiom system for arithmetic could be proven complete (and

thereby leave open the possibility that some such system really may be complete but we just can’t

prove it)—rather he proved that there is no possibility of an axiom system for arithmetic that is both

consistent and complete. (2) He did not prove that no axiom systems are complete. Inconsistent

axiom systems (even for arithmetic) are always complete. Moreover, there are complete and consis-

tent axiom systems for truth-functional and quantiﬁcational logic, for identity, and for lots of other

subjects—but not for every subject and, in particular, not for arithmetic. (3) He did not prove that

arithmetic itself is not complete. Only axiom systems can be complete or incomplete or consistent

or inconsistent—it makes no sense to say that arithmetic itself (the collection of all arithmetic truths)

is or is not complete.

Until 1931, when Gödel’s results appeared, everyone assumed that Peano’s system was

complete or, at worst, that it could be made complete by adding a few axioms. Gödel’s results came


128 CHAPTER 7. GÖDEL’S PROOFS

as a great blow to preconceptions—what, after all, can it mean for an arithmetic statement to be

true except that it is provable? Yet Gödel’s ﬁrst result is that, within any consistent axiom system for

arithmetic, there will always be true but unprovable arithmetic statements. Moreover, the gap cannot be closed: add an unprovable truth as a new axiom, and the enlarged system, if it is still consistent, will have unprovable truths of its own, and so on without end. Amazing!

The idea behind Gödel’s proof is something like the liar paradox. Look at this sentence:

This sentence is not true.

You can easily see that the sentence must be both true and not true, a contradiction. Gödel’s

proof relies on a similar sentence, but with provability replacing truth:

This sentence is not provable in PA.

Is this sentence provable in PA? Well, if it is, PA can prove a false sentence and so is inconsistent.

If it’s not, the sentence is true but not provable in PA, so PA is incomplete. Hence PA cannot be

both complete and consistent.

If that seems a little too fast, you’re right. Why should we expect that sentence to be provable in

PA? PA is about arithmetic; the sentences it can prove are sentences in arithmetic; it is incomplete

only if there is some sentence of arithmetic that it can’t prove. That sentence is not a sentence of

arithmetic, so that sentence doesn’t count against the completeness of PA. It would, however, count

against the completeness of PA if there were a sentence of arithmetic that said of itself that it wasn’t

provable. The basic idea behind Gödel’s proof is showing that there is such a sentence.

One key to being able to state both the liar sentence and Gödel’s sentence is self-reference. Each

of these sentences referred to itself (using the words ‘this sentence’). There are other ways to achieve

self-reference. For example, say we had a list of sentences. Name the list L. If the nth sentence on

the list were

The nth sentence on list L is not true.

then sentence n would be a liar sentence, having achieved self-reference by referring to its own name.

There are other ways of doing this in English, since English has a huge variety of ways to refer to

arbitrary sentences. Arithmetic doesn’t seem to have that. One of the many extremely clever bits of

Gödel’s proof is that he showed how sentences of arithmetic can refer to themselves.

Step 1: The Gödel numbers

The ﬁrst thing we do is assign a number to each of the undeﬁned symbols of the language. Here is

one way to do it:

7.2. THE DETAILS 129

line separator 00

( 11

) 12

, 13

∀ 21

∼ 22

& 23

= 24

x 31

P 32

f 33

. 34

0 41

′ 42

+ 43

× 44

This way of matching up symbols to numbers is not the only way. It’s not the way Gödel did it,

but it is considerably simpler. These numbers are called the Gödel numbers.

We now can give a number to each sentence of PA. We do this by concatenation. So, for example,

the open sentence

x=x

will get the number 312431, which is the number for ‘x’ next to the number for ‘=’ next to the

number for ‘x’. The deﬁned terms can be given Gödel numbers via their deﬁnitions. For example,

1 is deﬁned as 0′ , so the Gödel number of 1 is 4142. We can also give Gödel numbers to proofs, which are just sequences of sentences. The Gödel numbers of the lines in the proof will be separated by

double zeroes. (Even though I will write ‘.˙.’ to mark the conclusion, we don’t have a Gödel number

for it, since we don’t need one. The last line is the conclusion.) So, for example, the proof

∀x(x = x)

.˙.0 = 0

will get the number 2131113124311200412441. (As you can see, the Gödel numbers are normally quite large. The Gödel number of this very simple proof is more than 2×10²¹; for longer proofs

the numbers can get extremely large.)

Not every number is a Gödel number, but given any number it’s easy to check whether it’s a

Gödel number, and what symbol or sentence or proof it’s a Gödel number of.
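The numbering scheme is mechanical enough to implement directly. Here is a Python sketch of the encoding, using the symbol table above (an ASCII apostrophe stands in for the prime symbol ′):

```python
# The symbol table above, as a Python dict; encoding a formula is just
# concatenating two-digit codes.
CODES = {'(': '11', ')': '12', ',': '13', '∀': '21', '∼': '22',
         '&': '23', '=': '24', 'x': '31', 'P': '32', 'f': '33',
         '.': '34', '0': '41', "'": '42', '+': '43', '×': '44'}

def godel(formula):
    return int("".join(CODES[c] for c in formula))

assert godel("x=x") == 312431
assert godel("∀x(x=x)") == 21311131243112

# A proof: premise and conclusion joined by the '00' line separator.
proof = int(str(godel("∀x(x=x)")) + "00" + str(godel("0=0")))
assert proof == 2131113124311200412441
```

Decoding runs the same table in reverse: split the digit string into two-digit chunks and look each chunk up.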

Exercises

Give the Gödel number of statements/proofs 1–4, and the statement or proof encoded by each of the numbers in 5–7.

1 ∀xPx

2 1=1

3 ∼(0=1)

4 ∀xPx

.˙.P0


5 41434142244142

6 213122114124314212

7 2131113143412431120041424341244142

From now on we’ll follow this convention: We’ll put upper corner quotation marks around a

formula to indicate the Gödel number of that formula, and lower corner quotes around a number

to mean the formula of which it is the number. Thus:

⌜x = x⌝ :: 312431

We’ll call a Gödel number written out ‘fully expanded’ and a Gödel number written with corner

quotes ‘partially expanded’. So ⌜x = x⌝ is partially expanded, and ‘312431’ is fully expanded.

Partly because Gödel’s own system for numbering was more complex, this section of his proof was vastly more complicated.

Step 2: The proof relation

The next step is to show that there is some relation between the Gödel

number of a proof and the Gödel number of the conclusion. Given our system of Gödel numbering,

the relation is straightforward: the Gödel number of the conclusion of the proof is the sequence of numbers after the last pair of zeros. In the proof above that 0 = 0, the Gödel number of the proof is 2131113124311200412441, and the Gödel number of the conclusion is 412441.

reason this section of the proof was more complicated for Gödel is that it is necessary to show that

this numerical relation can be deﬁned within PA. The concatenation function is easily seen to be

deﬁnable in PA, since it’s simply addition and multiplication: to concatenate the number 21 to the

right of the number 31, we multiply 31 by 100 and add 21. It’s slightly more complicated to ﬁnd

the last pair of zeroes and then subtract the numbers after them, but it can be done.
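The arithmetic is as simple as the text says; a Python sketch of the concatenation function:

```python
# Concatenation as arithmetic: to attach code b (with d digits) to the
# right of code a, compute a * 10^d + b.
def concat(a, b):
    return a * 10 ** len(str(b)) + b

assert concat(31, 21) == 3121              # the example from the text
assert concat(21311131243112, 412441) == 21311131243112412441
```

In PA itself the same function is expressed with the successor, addition, and multiplication symbols, which is what makes the proof relation definable inside the system.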

We’ll call this relation between the proof and the conclusion Bxy: x is the Gödel number of

a proof of y. (For clarity, I will sometimes write the relation with parentheses and a comma, like

this: B(2131113124311200412441, 412441).) It will often be clearer to use the corner-quote notation:

B(⌜∀x(x = x)⌝, ⌜0 = 0⌝). (‘B’ stands for the German word ‘Beweis’, which means ‘proof ’.)
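Here is a minimal Python sketch of this arithmetic, working on the numbers as strings of digit pairs for readability. The helper names concat and conclusion are ours; B is the relation just defined. The example proof number is built by construction: ⌜∀x(x = x)⌝, then the separator 00, then ⌜0 = 0⌝.

```python
def concat(left, right):
    # 'multiply by 100 and add', generalized: shift left past right's digit pairs
    return left * 100 ** (len(str(right)) // 2) + right

def conclusion(proof):
    # the conclusion is whatever follows the last '00' separator pair
    # (assumes an even number of digits and at least one separator)
    digits = str(proof)
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    last = max(i for i, p in enumerate(pairs) if p == "00")
    return int("".join(pairs[last + 1:]))

def B(x, y):
    # x is the Gödel number of a proof whose conclusion is y
    return conclusion(x) == y

concat(31, 21)                      # 3121
B(2131113124311200412441, 412441)   # True
```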

Given this relation, we can say that a certain formula has no proof. For example, to say that there's

no proof of ‘0 = 1’, we say ‘∼∃xB(x, ⌜0 = 1⌝)’, or equivalently, ‘∀x∼B(x, ⌜0 = 1⌝)’. Let’s deﬁne

a new one-place predicate, Ux, that says that a certain sentence has no proof, i.e., is unprovable.

So ‘U ⌜0 = 1⌝’ (i.e., ‘U41244142’) says that the statement ‘0 = 1’ is unprovable. (‘U’ stands for ‘unbeweisbar’, German for ‘unprovable’. Gödel himself used ‘Bew’ as the name of the predicate.)

Exercises

Given the predicates as deﬁned above, state that there is no proof of the following sentences, in

both the fully expanded and partially expanded notations. (In some cases the sentences need to be

rewritten in terms of the basic symbols.)

8 ∀xPx

9 1=2

*10 x ≠ x

11 ∃x(x = 0 & x ≠ 0)

7.2. THE DETAILS 131

Step 3: Self-reference

Let’s take a step back and see where we are. First Gödel showed that there is a way to encode every

symbol, every sentence, and every proof in a formal language into (a subset of) the natural numbers.

(This, incidentally, was a key insight in the development of computers.) Then he

showed that there is a speciﬁc mathematical relation between the Gödel number of a proof and the

Gödel number of the conclusion of that proof, a relation no less mathematical than ‘<’. This allows

us to talk about mathematics within mathematics. Now, if Gödel had done only these two things,

he might well have been the greatest logician and mathematician of his generation. Applying these

results to show that no axiomatic system for arithmetic can be complete—well, that’s amazing.

Back to it. We will now deﬁne a function g of three variables. We represent the variables by

numbered blanks: _1 , _2 , and _3 , and we represent the function of these variables by ‘g(_1 , _2 , _3 )’.

This function goes FROM the numbers we write in the blanks TO a particular Gödel number.

In particular, the value of the function is the Gödel number of the expression one gets if one be-

gins with the expression with the Gödel number that appears in the ﬁrst blank (_1 ) and replaces,

in that expression, the symbol whose Gödel number appears in the second blank (_2 ) by whatever

number appears in the third blank (_3 ). Thus, g is a three-place function FROM the Gödel num-

ber of a particular expression, the Gödel number of a particular symbol, and an arbitrary number

TO a particular Gödel number—namely, the Gödel number of the result of inserting the number

for the symbol in the expression. For example, the Gödel number of the statement function ‘x =

x’ is 312431, so g(312431,31,2) is 41424224414242, the Gödel number of 2 = 2. In particular,

g(_1 , 31, _1 ) is the Gödel number of the expression one gets if, beginning with the expression with

Gödel number _1 , one replaces all the x’s in that expression with that very Gödel number. For exam-

ple, g(⌜x = x⌝, 31, ⌜x = x⌝)—in expanded form g(312431,31,312431)—is ⌜312431 = 312431⌝.

(I hope you’ll forgive me if I don’t write it all out.)
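The function g can be sketched at the level of digit strings, assuming (as in this chapter's numbering) that the numeral for n is '0' followed by n primes, i.e., 41 followed by n copies of 42:

```python
def numeral(n):
    # Gödel number (as a digit string) of the numeral for n: 0 followed by n primes
    return "41" + "42" * n

def g(expr, sym, n):
    # expr and sym are Gödel numbers written as strings of digit pairs;
    # replace every occurrence of the symbol sym in expr by the numeral for n
    pairs = [expr[i:i + 2] for i in range(0, len(expr), 2)]
    return "".join(numeral(n) if p == sym else p for p in pairs)

g("312431", "31", 2)   # '41424224414242', i.e. ⌜2 = 2⌝
```

Running g("312431", "31", 312431) likewise produces the (very long) fully expanded form of g(⌜x = x⌝, 31, ⌜x = x⌝).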

Now consider this open sentence: Ug(x,31,x). This says that the sentence whose Gödel number

is g(x,31,x) is unprovable. This is an open sentence because it has an unbound variable x; we can

replace that x with a particular number. If we replace it, as we did above, with ⌜x = x⌝, then the

sentence says (falsely) that ‘312431 = 312431’ is not provable. But if we replace it with the number

⌜g(x, 31, x)⌝—that is, the Gödel number of the open sentence itself—things get interesting. We’ll

call this sentence G.

G: U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)

And that’s it. That’s the sentence that says of itself that it’s not provable. How does it do that?

Well, it says that the sentence with the Gödel number g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝) is not

provable (that, recall, is what the predicate ‘U’ means). But what sentence has that Gödel number?

To ﬁgure that out, we replace all the x’s in the function g’s ﬁrst slot with the number in the third

slot. Why don’t you figure it out? I’ll wait here.

Exercises

We have terms for fully and partially expanded Gödel numbers; let’s add one: call a Gödel number written as the function g ‘condensed’. So ‘g(312431,31,2)’ is condensed (as is ‘g(⌜x = x⌝, ⌜x⌝, 2)’), ⌜2 = 2⌝ is partially expanded, and ‘41424224414242’ is fully expanded. Partially expand the following condensed Gödel numbers.


12 g(4144412441, 41, 1)

*13 g(⌜x + x = x⌝, ⌜x⌝, 0)

14 g(⌜x + x = 824882⌝, 31, ⌜0 = 0⌝)

15 g(⌜g(x, 31, x) = 0⌝, 31, ⌜g(x, 31, x)⌝)

16 g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)

If you didn’t make a mistake, you got

⌜U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)⌝

which is just the Gödel number of G itself. So G says that there’s a sentence unprovable in PA,

and that sentence is the one whose Gödel number is g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)—that

is, whose Gödel number is ⌜U g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)⌝—that is, G itself.
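The trick G exploits can be imitated at the level of strings. The function diag below is a string analogue of g(_1, 31, _1): it replaces 'x' in an expression with a quotation of that very expression. (The predicate name 'U diag' here is just an illustration, not part of PA.)

```python
def diag(e):
    # substitute a quotation of e for the variable 'x' in e itself
    return e.replace("x", repr(e))

s = "U diag(x)"
G = diag(s)
print(G)   # U diag('U diag(x)')
```

G here says that the result of diagonalizing 'U diag(x)' is unprovable, and that result is G itself.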

Step 4: Incompleteness

So we have a dilemma. If G is true, there is a true statement of arithmetic that is unprovable, so PA

is not complete. If G is not true, there is a false statement of arithmetic that is provable, so PA is not

consistent. It seems reasonable to take the ﬁrst horn of this dilemma—it seems fairly obvious that

PA is consistent, so it must be incomplete. This, of course, doesn’t mean that there’s no way at all to

prove G (we just proved it, for example, from the premise that PA is consistent); it means that there’s

no way to prove it within PA—in other words, it’s not a theorem of PA.

Thus, PA is incomplete. Normally that means that more axioms are needed. We could, for

example, add G itself as a new axiom, thus guaranteeing that G is provable. But then Gödel’s

argument could be restated in terms of this strengthened system and, indeed, no matter how many

times we augment PA, even by adding inﬁnitely many axioms (so long as we can still effectively tell what counts as an axiom), we can follow the same steps to generate new versions of G that are true but unprovable in the augmented system (this aspect of Gödel’s proof resembles Cantor’s diagonal proof). Thus, no consistent extension of PA can

be complete for arithmetic. This is sometimes expressed by saying that axiom systems for arithmetic

are essentially incomplete.

Gödel showed that his results held for PA and for ZF, and claimed they held for “related systems.”

Several logicians after Gödel tried to ﬁgure out exactly what the minimum requirements are for an axiom system to be essentially incomplete. It turns out that any axiom system that gives rules for addition and multiplication is enough. (That is because we needed addition and

multiplication to generate the function in step 2 that we used in step 3 to simulate self-reference.)

For example, Q—that is, PA without induction, merely ﬁnite arithmetic—is sufﬁcient.

We now come to the second of Gödel’s results, namely that no axiom system for arithmetic can be

proven consistent by any argument expressible in that system. This follows quite easily from the ﬁrst

result.


Suppose ∼G, that is, suppose ∼∀x∼B(x, g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)). By QN, we have ∃xB(x, g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝)). This asserts that some number stands in the relation B to g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝) and, on the meta-level, this means that G, the statement with Gödel number g(⌜U g(x, 31, x)⌝, 31, ⌜U g(x, 31, x)⌝), is provable in PA. But if PA is consistent, then if G is provable, G must be true. Thus, if PA is consistent, then if ∼G (that is, if G is provable), we can prove G. But then we would have this proof:

1. PA is consistent → (∼G → G)

2. PA is consistent → (∼∼G v G)   CE

3. PA is consistent → (G v G)   DN

4. PA is consistent → G   Taut

Thus, if one could prove that PA is consistent (by any argument in PA), modus ponens would

yield G. But, from Gödel’s ﬁrst result, we know G cannot be proven in PA. Thus, in PA we can never

prove that PA is consistent.
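The four-line derivation above is truth-functionally valid, as a brute-force truth-table check confirms. This Python sketch (with C standing for 'PA is consistent') verifies that every assignment making line 1 true also makes line 4 true:

```python
from itertools import product

def implies(p, q):
    # material conditional on truth values
    return (not p) or q

# every assignment making line 1 true, C -> (~G -> G),
# also makes line 4 true, C -> G
valid = all(
    implies(implies(c, implies(not g, g)), implies(c, g))
    for c, g in product([True, False], repeat=2)
)
print(valid)   # True
```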