
© Copyright for the material in this text resides with Peter Stacey, Kevin Bicknell, John Banks and the Department of Mathematics and Statistics, La Trobe University. As such, reproduction of this material may only be undertaken with their express permission.

Machines and Languages
Subject Notes Part A for MAT2ALC
Algebra, Linear Codes and Automata
This text was developed by Peter Stacey and subsequently revised by Kevin
Bicknell and John Banks. The 2012 full edition was typeset by John Banks.
Contents

Part 1. Relations, Languages and Computation

Chapter 1. Relations and Functions
1.1. Cartesian products
1.2. Relations
1.3. Functions
1.4. An alternative view of Cartesian products
1.5. Combining relations and functions

Chapter 2. Properties of Binary Relations
2.1. Directed graphs of relations
2.2. Properties of binary relations
2.3. Closures of binary relations

Chapter 3. Finite State Machines
3.1. Deterministic finite state machines
3.2. Finite state machines without output
3.3. Recognition machines
3.4. Notations for words and languages
3.5. Extended transition function and suffix sets

Chapter 4. Regular Languages and Recognition Machines
4.1. Grammars and regular languages
4.2. Recognition machines for regular grammars
4.3. Nondeterministic machines
4.4. Regular Expressions

Chapter 5. Deterministic Machines
5.1. Equivalent deterministic and nondeterministic machines
5.2. Simplifying deterministic machines
5.3. An algorithm for finding suffix equivalence classes
5.4. Designing machines from language descriptions

Chapter 6. Machines with Memory
6.1. Non-Regular Languages
6.2. Stacks
6.3. Push down automata
6.4. Nondeterministic push down automata

Chapter 7. Context Free Languages
7.1. Context free grammars
7.2. Greibach normal form
7.3. PDA and context free grammars
7.4. Deterministic context free languages

Chapter 8. Countability and uncountability
8.1. How big is a set?
8.2. Countable sets
8.3. How big is a language?
8.4. Uncountable sets

Chapter 9. More Powerful Machines
9.1. Not all languages are context free
9.2. Turing machines
9.3. The power of Turing machines
Part 1

Relations, Languages and Computation


Chapter One

Relations and Functions


Relations play an important role in computer science, for example in conceptual models for databases. A mathematical description of relations can be based on the notion of Cartesian products of sets. Functions and binary operations can be defined as special sorts of relations.

Reference. Relations and functions are discussed in Chapters 6 and 7 of Discrete Structures for Computer Science by Alan Doerr and Kenneth Levasseur (SRA, 1985).

1.1. Cartesian products

If D1, D2, . . . , Dn are sets then the Cartesian product D1 × D2 × ⋯ × Dn is usually¹ defined to be the set of all ordered n-tuples (d1, d2, . . . , dn) where d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn. An ordered n-tuple is just a list of n objects in a particular order so that, for example, the 3-tuple (1, 2, 3) is different from the 3-tuple (3, 2, 1). Although the word "product" and the multiplication sign are used, Cartesian products have nothing to do with ordinary multiplication. It is conventional to use round brackets to enclose the elements of an n-tuple, in which the order matters, and curly brackets to enclose the elements of a set, in which the order doesn't matter.

Example 1.1.1. In applications to databases D1, D2, . . . , Dn will consist of objects which can be linked by various relationships. For example D1 could consist of names, such as D1 = {Andrew, Michelle, Tracey} and D2 could consist of suburbs, for example, D2 = {Bundoora, Greensborough, Heidelberg}. Then D1 × D2 consists of all pairings of names with suburbs (with names listed first). So

D1 × D2 = {(Andrew, Bundoora), (Andrew, Greensborough), (Andrew, Heidelberg),
(Michelle, Bundoora), (Michelle, Greensborough), (Michelle, Heidelberg),
(Tracey, Bundoora), (Tracey, Greensborough), (Tracey, Heidelberg)}. □

¹ In Section 1.4 we will consider a slightly different approach.



Example 1.1.2. When D1, D2 are sets of numbers we can draw a picture of their Cartesian product. We represent the ordered pair (d1, d2) by the point in the plane at a horizontal distance d1 and a vertical distance d2 from a chosen origin. For example, if D1 = {1, 2} and D2 = {2, 3, 4} then

D1 × D2 = {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)}

which is sketched below.

[Figure: the six points of D1 × D2 plotted in the x-y plane, at x = 1, 2 and y = 2, 3, 4.] □

Notice that if D1 has m1 elements and D2 has m2 elements then D1 × D2 has m1 × m2 elements. This is one reason why the Cartesian product is called a product. More generally, if D1 has m1 elements, D2 has m2 elements and so on, then D1 × D2 × ⋯ × Dn has m1 × m2 × ⋯ × mn elements. To see why this is true, notice that for each of the m1 choices for the first coordinate there are m2 choices for the second coordinate, giving a total of m1 × m2 choices for the first two coordinates. For each of these there are m3 choices for the third coordinate, giving a total of m1 × m2 × m3 choices for the first three coordinates, and so on.
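
The product and the counting rule are easy to experiment with in code. The following Python sketch (our own illustration, reusing the sets of Example 1.1.1) builds D1 × D2 with itertools.product and checks the m1 × m2 count.

    from itertools import product

    D1 = {"Andrew", "Michelle", "Tracey"}
    D2 = {"Bundoora", "Greensborough", "Heidelberg"}

    # The Cartesian product is the set of all ordered pairs (d1, d2).
    D1xD2 = set(product(D1, D2))

    print(len(D1xD2) == len(D1) * len(D2))   # True: m1 x m2 elements

    # The counting rule extends to any number of factors.
    D3 = {0, 1}
    assert len(set(product(D1, D2, D3))) == len(D1) * len(D2) * len(D3)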

1.2. Relations

If D1, . . . , Dn are any sets, then a relation between elements of these sets is a subset of the Cartesian product D1 × D2 × ⋯ × Dn.

Example 1.2.1. As in Example 1.1.1 let D1 = {Andrew, Michelle, Tracey} and D2 = {Bundoora, Greensborough, Heidelberg}. If Andrew and Tracey live in Greensborough and Michelle lives in Bundoora, then the "lives in" relation between D1 and D2 is given by the set L = {(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)}. If Andrew and Michelle work in Heidelberg and Tracey works in Bundoora, the "works in" relation between D1 and D2 is W = {(Andrew, Heidelberg), (Michelle, Heidelberg), (Tracey, Bundoora)}. □

Example 1.2.2. Let D1 = D2 = D3 = ℕ, the set of natural numbers. The addition relation A is given by all the 3-tuples (d1, d2, d3) for which d3 = d1 + d2, so

A = {(1, 1, 2), (1, 2, 3), (2, 1, 3), (1, 3, 4), (2, 2, 4), (3, 1, 4), . . . }. □

Example 1.2.3. Let D1 = D2 = ℕ. The "less than or equal to" relation B can be described by the set of all ordered pairs (d1, d2) for which d1 ≤ d2, so

B = {(1, 1), (1, 2), (1, 3), (2, 2), (1, 4), (2, 3), . . . }. □

1.2.1. Binary and ternary relations. We have defined a relation R to be a subset of a Cartesian product D1 × D2 × ⋯ × Dn and hence we use (d1, . . . , dn) ∈ R to specify that d1, . . . , dn are related. The simplest non-trivial case of relations, when the Cartesian product involves just two sets, is particularly important. Such relations are called binary relations. In the special case of binary relations it is common to write d1 R d2 instead of (d1, d2) ∈ R when d1 is related to d2. Sometimes we create some sort of special symbol ∼ associated with the relation R and write d1 ∼ d2 instead of (d1, d2) ∈ R. Thus, for example, we always write

d1 = d2 instead of (d1, d2) ∈ R = {(d1, d2) : d1 is equal to d2}

and

d1 < d2 instead of (d1, d2) ∈ R = {(d1, d2) : d1 is less than d2}.

The inverse of a binary relation R ⊆ D1 × D2 is defined to be the set

R⁻¹ = {(y, x) : (x, y) ∈ R} ⊆ D2 × D1.

Example 1.2.4. The inverse of the "lives in" relation

{(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)}

is the "has a resident" relation

{(Greensborough, Andrew), (Bundoora, Michelle), (Greensborough, Tracey)}. □

If we have a relation R on D1 × D2 and a relation S on D2 × D3 then we can form a relation S ∘ R on D1 × D3, known as the composite of R and S, by²

S ∘ R = {(d1, d3) : there exists d2 ∈ D2 with (d1, d2) ∈ R and (d2, d3) ∈ S}.

Example 1.2.5. The composite of the relations L = {(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)} and T = {(Greensborough, train), (Bundoora, tram), (Heidelberg, train)} is T ∘ L = {(Andrew, train), (Michelle, tram), (Tracey, train)}. Notice that Heidelberg does not appear in L, but that this does not matter. In fact a composite of non-empty relations could easily turn out to be empty. □
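
For finite relations the definition of the composite transcribes directly into a set comprehension. A minimal Python sketch (our illustration, reusing L and T from Example 1.2.5):

    def compose(S, R):
        """Composite S o R of binary relations given as sets of pairs."""
        return {(d1, d3) for (d1, d2) in R for (e2, d3) in S if d2 == e2}

    L = {("Andrew", "Greensborough"), ("Michelle", "Bundoora"),
         ("Tracey", "Greensborough")}
    T = {("Greensborough", "train"), ("Bundoora", "tram"),
         ("Heidelberg", "train")}

    print(compose(T, L))
    # {('Andrew', 'train'), ('Michelle', 'tram'), ('Tracey', 'train')} (in some order)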

² Fenced off sections of the text can be omitted by students focussing on the basics.

A three place relation R ⊆ D1 × D2 × D3 (like the one in Example 1.2.2) is called a ternary relation. In general the number n of places in a relation R ⊆ D1 × D2 × ⋯ × Dn is called the arity of R.

1.3. Functions

A function is a binary relation R ⊆ D1 × D2 with the special property that, for each d1 ∈ D1 there is exactly one d2 ∈ D2 with (d1, d2) ∈ R. The set D1 is usually called the domain of the function and D2 is called its codomain and we say that R is a function from D1 to D2.

Example 1.3.1. Let D1 = D2 = ℕ and let R = {(d1, d2) : d1 > d2}. Then R is not a function because for the choice d1 = 10, for instance, there are lots of different elements d2 ∈ ℕ with (10, d2) ∈ R. For example (10, 1) ∈ R and (10, 2) ∈ R. Another reason R fails to be a function is that for d1 = 1 there is no d2 such that (d1, d2) ∈ R, because 1 is less than or equal to every natural number. □

Example 1.3.2. Let D1 = ℕ\{1} and D2 = ℕ and let F = {(d1, d2) : d1 = d2 + 1}. Then F is a function from D1 to D2. Note however, that if we let D1 = ℕ then F would fail to be a function because there would be no d2 ∈ D2 for which (1, d2) ∈ F. □

Example 1.3.3. Let D1 = D2 = ℕ and let G = {(d1, d2) : d2 = d1²}. Then G is a function: whichever d1 we pick in ℕ there is exactly one choice of d2 (namely d2 = d1 × d1) for which (d1, d2) ∈ G. □

In Example 1.3.3 there was a rule associated with elements of G, namely "square the first element to get the second one". Similarly, all functions can be thought of as rules. Given an element d1 in the domain, the rule produces the unique element d2 in the codomain for which (d1, d2) ∈ R. If we call the function f, then we use the notation f(d1) = d2 to describe the fact that d2 depends on d1. When discussing functions we will frequently use the traditional notation

f : D1 → D2

which is read as "the function f maps the domain D1 to the codomain D2" or more briefly "f mapping D1 to D2". In this notation, the subset of D1 × D2 which we have used to define the function is called the graph of the function. As shown in calculus or precalculus courses, a function specified as a rule determines a unique graph. As outlined above, the graph also determines the rule. Hence either the rule or the subset of D1 × D2 characterises the function.

Since a function f is just a binary relation, it always has an inverse f⁻¹ as defined in 1.2.1, but there is no guarantee that the inverse will be a function.

Example 1.3.4. The squaring function f : ℝ → ℝ defined by f(x) = x² may be written in ordered pair notation as f = {(x, x²) : x ∈ ℝ}. Its inverse,

f⁻¹ = {(x², x) : x ∈ ℝ},

the square root relation, is not a function because, for example, it contains the pairs (4, 2) and (4, −2). □

When the domain of a function f is finite (as is often the case in computer science applications) we can represent it using a table by simply tabulating all of the x and f(x) values.

Example 1.3.5. The function f : {0, 1, 2, 3} → {0, 1, 2, 3} such that f(x) = 3 − x may be defined by the following table.

x       0  1  2  3
f(x)    3  2  1  0

We could even list all possible functions mapping {0, 1, 2, 3} to {0, 1, 2, 3} in a single table as follows.

x         0  1  2  3
f1(x)     0  0  0  0
f2(x)     0  0  0  1
f3(x)     0  0  0  2
. . .
f256(x)   3  3  3  3

Since there are 4⁴ = 256 such functions, we have omitted most of them! □

Given a function f : D1 → D2 and a subset S ⊆ D1, the restriction of f to S is the function with domain S and codomain D2 which takes exactly the same values as f, but only for elements of S. The restriction of f to S is sometimes written as f|S, so f|S(x) = f(x) for all x ∈ S. In the ordered pair notation for a function,

f|S = {(x, y) ∈ f : x ∈ S}.
1.3.1. Partial functions. A partial function is a relation R ⊆ D1 × D2 with the property that, for each d1 ∈ D1 there is at most one d2 ∈ D2 with (d1, d2) ∈ R. Unlike a function, we don't insist that a partial function be defined at every d1 ∈ D1. Every function f : D1 → D2 is a partial function satisfying the additional condition that for every d1 ∈ D1 there is at least one d2 ∈ D2 such that (d1, d2) ∈ f, so functions are a special case of partial functions.
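
Both definitions can be checked mechanically for finite relations by counting how many second coordinates each domain element receives. A small Python sketch (the function names are ours, for illustration):

    def is_partial_function(R, D1):
        """At most one pair (d1, d2) in R for each d1 in D1."""
        return all(sum(1 for (x, y) in R if x == d1) <= 1 for d1 in D1)

    def is_function(R, D1):
        """Exactly one pair (d1, d2) in R for each d1 in D1."""
        return all(sum(1 for (x, y) in R if x == d1) == 1 for d1 in D1)

    D = set(range(1, 11))
    R = {(a, b) for a in D for b in D if a > b}       # Examples 1.3.1 and 1.3.6
    F = {(a, b) for a in D for b in D if a == b + 1}  # Example 1.3.7

    print(is_partial_function(R, D), is_function(R, D))  # False False
    print(is_partial_function(F, D), is_function(F, D))  # True False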

Example 1.3.6. Let D1 = D2 = ℕ and let R = {(d1, d2) : d1 > d2}. Then R is not a partial function because when d1 = 10, for instance, there are lots of different elements d2 ∈ ℕ with (10, d2) ∈ R. For example (10, 1) ∈ R and (10, 2) ∈ R. □

Example 1.3.7. Let D1 = D2 = ℕ and let F = {(d1, d2) : d1 = d2 + 1}. Then F is a partial function from D1 to D2. It doesn't matter that there is no d2 ∈ D2 for which (1, d2) ∈ F. In Example 1.3.2 we observed that F is not a function. □

Partial functions arise in many settings in computer science. We will see many examples of them in our study of automata. As for functions, we can specify a partial function on a finite set using a table. The only difference is that we use the symbol – to indicate that the partial function is not defined for certain input values.

Example 1.3.8. The following table gives the values of the square root relation

S = {(x, y) ∈ D × D : x = y²}

on the set D = {0, 1, 2, 3, 4, 5} and illustrates the fact that S is indeed a partial function.

x       0  1  2  3  4  5
f(x)    0  1  –  –  2  –
□


1.4. An alternative view of Cartesian products

In accordance with usual practice in elementary mathematics, we introduced the Cartesian product D1 × D2 × ⋯ × Dn as the set of all ordered n-tuples (d1, d2, . . . , dn) where d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn. Another equivalent way of viewing this product is as the set of all functions f with domain³

Nn = {1, 2, . . . , n}

that have the property that f(i) ∈ Di for each i ∈ Nn. Although the equivalence of this approach may not seem obvious at first sight, notice that:

(a) For each function f having the above property, it is true by definition that the n-tuple (f(1), f(2), . . . , f(n)) is an element of D1 × D2 × ⋯ × Dn as traditionally defined.
(b) For each (d1, d2, . . . , dn) ∈ D1 × D2 × ⋯ × Dn the function

f : Nn → D1 ∪ D2 ∪ ⋯ ∪ Dn : i ↦ di

has the above property and (f(1), f(2), . . . , f(n)) = (d1, d2, . . . , dn).

Together (a) and (b) show that there is a one to one correspondence

f ↔ (f(1), f(2), . . . , f(n))

between the n-tuples of our original definition of D1 × D2 × ⋯ × Dn and the set of functions defined above. In the context of this representation of Cartesian products, the set Nn which is the domain of all of the functions in the product is called the index set for the product. One of the advantages of this representation is that we can actually use any finite set in place of the standard index set Nn. This allows meaningful names to be used for the coordinates or "places" in the product. In the theory of relational databases, this corresponds to the "named perspective" on the concept of Cartesian product and it is the perspective we will adopt when discussing the theory of relational databases in the next few chapters.

³ In case you are worried about it, D1 ∪ D2 ∪ ⋯ ∪ Dn provides a suitable codomain for these functions.

Example 1.4.1. We usually represent the plane ℝ² = {(x, y) : x, y ∈ ℝ} in x-y coordinates where the horizontal or x coordinate comes first and the vertical or y coordinate comes second. Upon reflection, it should be clear that the order of these coordinates is not important. What really matters is that we don't mix up the horizontal and vertical coordinates. Listing them in a particular order is merely one way of doing that. We could just as easily represent the plane as the set of all functions p : {h, v} → ℝ. In this representation, our index set is {h, v}, the horizontal coordinate of the point represented by p is p(h) and p(v) is the vertical coordinate. For example, the point we normally write as (1, 2) would be represented by the function p : {h, v} → ℝ where p(h) = 1 and p(v) = 2. □
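
In a programming language the function representation is exactly a dictionary (map) whose keys form the index set. A brief Python sketch of Example 1.4.1 (the variable names are ours):

    # The point (1, 2) in tuple representation ...
    point_tuple = (1, 2)

    # ... and in function representation: a map from the index set {h, v} to R.
    point_func = {"h": 1, "v": 2}

    # The order in which the keys are written carries no meaning;
    # only the names matter.
    same_point = {"v": 2, "h": 1}
    print(point_func == same_point)   # True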

We will refer to this alternative description as the function representation of Cartesian products, as opposed to the n-tuple representation of Cartesian products. This representation of Cartesian products turns out to be particularly useful in the mathematical description of relational databases because we often want to associate meaningful names or labels with the "places" in a Cartesian product or a relation. It also avoids the need to order the places, a significant theoretical advantage from the relational database point of view.

Example 1.4.2. We may represent the Cartesian product of Example 1.1.1 as the set of all functions f with domain {name, residence} such that

f(name) ∈ D1 = {Andrew, Michelle, Tracey}
f(residence) ∈ D2 = {Bundoora, Greensborough, Heidelberg}.

The "lives in" relation

L = {(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)}

of Example 1.2.1 is now represented as the set of functions {f1, f2, f3} shown in the following table.

x        name      residence
f1(x)    Andrew    Greensborough
f2(x)    Michelle  Bundoora
f3(x)    Tracey    Greensborough

The fact that the function representation of the Cartesian product allows us to describe this relation using a table will be particularly relevant for the description of relations in relational databases, as discussed in the next chapter. In this context, we typically omit the names of the functions (unless we need them for some reason), so the table becomes a bit simpler:

name      residence
Andrew    Greensborough
Michelle  Bundoora
Tracey    Greensborough
□


The function representation of Cartesian products makes it easy to define the product of infinitely many sets. For example, the set of all infinite sequences of natural numbers is an infinite product of copies of ℕ. This means the set of all possible functions S : ℕ → ℕ. The index set is now ℕ. Similarly, the set of all infinite sequences of real numbers is an infinite product of copies of ℝ, which just means the set of all functions S : ℕ → ℝ. Such products are of vital importance in many branches of mathematics.

1.5. Combining relations and functions

Recall that, given two sets R and S, the union of R and S, denoted by R ∪ S, is the set of all elements which are either in R or S (possibly both) and the intersection of R and S, denoted by R ∩ S, is the set of all elements which are both in R and S. Thus

R ∪ S = {x : x ∈ R or x ∈ S}

and

R ∩ S = {x : x ∈ R and x ∈ S}.

The set difference (often denoted by R − S but here by R\S to avoid any confusion with subtraction) consists of the elements of R which are not in S, i.e.

R\S = {x ∈ R : x ∉ S}.
Venn diagrams for these three operations are given in Figure 1.

[Figure 1. Union, intersection and difference of two sets: Venn diagrams with R ∪ S, R ∩ S and R \ S shaded.]

It is fairly easy

to see that if R and S are both subsets of the same Cartesian product

D1 × D2 × ⋯ × Dn

then so are R ∪ S, R ∩ S and R \ S. In other words, if R and S are both relations between elements of D1, D2, . . . , Dn then so are R ∪ S, R ∩ S and R \ S.

Example 1.5.1.
(a) If R is the "less than" relation {(m, n) : m, n ∈ ℕ, m < n} on the natural numbers ℕ and S is the "equals" relation {(m, m) : m ∈ ℕ} on ℕ, then R ∪ S is the "less than or equal to" relation on ℕ.
(b) If R is the "less than or equal to" relation {(m, n) : m, n ∈ ℕ, m ≤ n} on ℕ and S is the "greater than or equal to" relation {(m, n) : m, n ∈ ℕ, m ≥ n} on ℕ, then R ∩ S is the "equals" relation on ℕ.
(c) If R is the "less than or equal to" relation {(m, n) : m, n ∈ ℕ, m ≤ n} on ℕ and S is the "equals" relation {(m, m) : m ∈ ℕ} on the natural numbers, then R \ S is the "less than" relation on ℕ. □
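
Since finite relations are just sets of tuples, these combinations are one-liners in any language with a set type. A Python check of Example 1.5.1 on an initial segment of ℕ (the bound 10 is ours, to keep the sets finite):

    N = range(1, 11)

    lt = {(m, n) for m in N for n in N if m < n}
    le = {(m, n) for m in N for n in N if m <= n}
    ge = {(m, n) for m in N for n in N if m >= n}
    eq = {(m, m) for m in N}

    print(lt | eq == le)   # (a) union:        True
    print(le & ge == eq)   # (b) intersection: True
    print(le - eq == lt)   # (c) difference:   True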


We can also take unions, intersections and differences of pairs of relations that are subsets of different Cartesian products, but this only yields a relation in cases where the arities (the numbers of sets in the two products) are the same. If R ⊆ D1 × D2 × ⋯ × Dn and S ⊆ E1 × E2 × ⋯ × En it turns out that

R ∪ S ⊆ (D1 ∪ E1) × (D2 ∪ E2) × ⋯ × (Dn ∪ En),
R ∩ S ⊆ (D1 ∩ E1) × (D2 ∩ E2) × ⋯ × (Dn ∩ En),
R \ S ⊆ D1 × D2 × ⋯ × Dn.

Example 1.5.2.
(a) Let F be the set of female mathematics students and M be the set of male mathematics students at La Trobe University. The union of the relations S = {(x, y) : x, y ∈ F, x is a sister of y} and B = {(x, y) : x, y ∈ M, x is a brother of y} is the "same sex sibling" relation on the set F ∪ M of all mathematics students. Note that S ∪ B is not the same as the "sibling" relation {(x, y) : x, y ∈ F ∪ M, x is a sibling of y} because it does not contain any (brother, sister) or (sister, brother) pairs.
(b) It makes no sense to take the union of the (binary) "less than" relation {(a, b) : a < b} ⊆ ℕ × ℕ with the (ternary) addition relation {(a, b, c) : c = a + b} ⊆ ℕ × ℕ × ℕ because the union of these two sets would be a mixture of 3-tuples and 2-tuples and therefore not a relation. The intersection or difference of these relations would not be a relation either. □


Since a function is by definition a subset of a Catesian product of two sets, the do-
main and codomain, we can always take the union of two functions. The resulting
union may or may not turn out to be a function.

Example 1.5.3.
(a) The union of the functions f = {(x, √x) : x ∈ ℝ, x ≥ 0} and g = {(x, −√x) : x ∈ ℝ, x ≥ 0} is the square root relation pictured in Figure 2. Although f ∪ g is the inverse of the function {(x, x²) : x ∈ ℝ}, it is not a function itself.
(b) The union of f = {(x, x) : x ∈ ℝ, x ≥ 0} and g = {(x, −x) : x ∈ ℝ, x ≤ 0} is a function, known as the absolute value function, usually written |x|.

[Figure 2. The square root relation and the absolute value function.]




Lemma 1.1. The union of two functions f : D1 → D2 and g : D3 → D4 is a function precisely when f and g agree on D1 ∩ D3, i.e., when f(x) = g(x) for all x-values in D1 ∩ D3.

There is a subtle point to note about Lemma 1.1. If D1 ∩ D3 is empty then f ∪ g will indeed be a function. The Lemma still holds in these cases because, from a logician's point of view, it is vacuously true that f(x) = g(x) for all x-values in D1 ∩ D3.
Chapter Two

Properties of Binary Relations


Binary relations on finite sets can be represented using directed
graphs. These can be used to understand important special
properties like reflexivity, symmetry and transitivity possessed
by many binary relations. When a relation does not have one of
these properties, we can often find a larger relation which does.
The smallest such relation is the closure of the original relation
with respect to the given property.

Reference. Chapter 6 of Doerr and Levasseur.

2.1. Directed graphs of relations

There are two quite different ways of picturing a binary relation R ⊆ D × D for some set D. The first of these was introduced in Example 1.1.2 and only makes sense if D ⊆ ℝ. The second, using directed graphs, can be applied to any (finite) set D and is particularly helpful when we are trying to understand various special properties of binary relations. The idea is very simple.

• For each element d ∈ D we draw a vertex (a dot) and label it d.
• For each (d, e) ∈ R we draw a directed edge (an arrow) from d to e (so we must draw a loop in the case where d = e).

Example 2.1.1. Let D = {1, 2, 3} and let R be the "less than or equal to" relation R = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}. The directed graph is

[diagram: vertices 1, 2, 3 with a loop at each vertex and edges 1 → 2, 1 → 3 and 2 → 3]

For the same set, the directed graph representing the "strictly less than" relation R = {(1, 2), (1, 3), (2, 3)} is

[diagram: vertices 1, 2, 3 with edges 1 → 2, 1 → 3 and 2 → 3]

and the equality relation R = {(1, 1), (2, 2), (3, 3)} has directed graph

[diagram: vertices 1, 2, 3 with a loop at each vertex and no other edges]

We have just seen how easy it is to draw the directed graph of a relation written as a finite set of ordered pairs. In the opposite direction, it is just as easy to write down an ordered pair description of the relation represented by a given directed graph. In fact, from a mathematical point of view a binary relation on a set S is essentially the same thing as a directed graph with vertex set S.

2.2. Properties of binary relations

We return for now to some ideas about binary relations that are not particularly relevant to the theory of relational databases. It will be convenient to use the ordered pair representation for these relations. A relation R ⊆ X × X is said to be symmetric if whenever x is related to y then y is related to x (i.e. if whenever (x, y) ∈ R then (y, x) ∈ R). It is called antisymmetric if whenever (x, y) ∈ R and x ≠ y then (y, x) ∉ R (alternatively, whenever both (x, y) ∈ R and (y, x) ∈ R then x = y). It is said to be reflexive if every element is related to itself (i.e. if (x, x) ∈ R for each x ∈ X) and is said to be transitive if whenever (x, y) ∈ R and (y, z) ∈ R then (x, z) ∈ R.

It can be helpful to picture these properties using a notation x ∼ y for binary relations instead of (x, y) ∈ R. Then we have

(a) Symmetry: x ∼ y implies y ∼ x
(b) Antisymmetry: x ∼ y and y ∼ x implies x = y
(c) Reflexivity: x ∼ x for each x
(d) Irreflexivity: x ∼ x is false for each x
(e) Transitivity: x ∼ y and y ∼ z implies x ∼ z.

For finite relations each of these properties can also be checked mechanically, as the sketch below shows.
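
A minimal Python sketch of these definitions for a finite relation R on a set X (the function names are ours):

    def is_reflexive(R, X):
        return all((x, x) in R for x in X)

    def is_symmetric(R):
        return all((y, x) in R for (x, y) in R)

    def is_antisymmetric(R):
        return all(x == y for (x, y) in R if (y, x) in R)

    def is_transitive(R):
        return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

    X = {1, 2, 3}
    R = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}   # "<=" on X
    print(is_reflexive(R, X), is_symmetric(R),
          is_antisymmetric(R), is_transitive(R))            # True False True True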

2.2.1. Graphical detection of properties. The abstract nature of these definitions can make them difficult to understand. This is where directed graphs come to the rescue. Each of these properties translates directly into a property of the directed graph representing the binary relation. In all cases (with the possible exception of transitivity) it is very easy to decide whether a graph has the required property.

(a) Symmetry: If there is a directed edge from a to b there must also be one from b to a. In pictures, for any vertices a and b you can have edges in both directions between a and b, or no edge at all, but not an edge in one direction only.

(b) Antisymmetry: There are no pairs of distinct vertices a and b with an edge from a to b and an edge from b to a.

(c) Reflexivity: There is a loop at every vertex.

(d) Irreflexivity: There are no loops at all.

(e) Transitivity: Wherever you find an edge from a to b and an edge from b to c, there must also be an edge from a to c.

There is a special case where a = c. In that case, there is an edge from a to b and an edge from b to a, so there must be a loop at a and therefore also a loop at b. So, wherever you see an edge from a to b and an edge from b to a, you must have loops at both a and b.

In addition to the above properties, note that concepts like "being a partial function" and "being a function" are also properties of binary relations. These, too, admit simple interpretations in terms of directed graphs.

(a) Partial Function: There is at most one edge coming out of every vertex. You can never have a vertex a with two outgoing edges, to b and to c.

(b) Function: There is exactly one edge coming out of every vertex. You can never have a vertex with two outgoing edges, or a vertex with no outgoing edge.

Of course, this approach to detecting properties of relations only applies where the set D on which the relation is defined is finite. Nonetheless, the above graphical properties can assist us to understand the meaning of properties of binary relations.

2.2.2. Partial orders and equivalences. Two types of binary relations are particularly important in computer science:

• A relation which is reflexive, transitive and antisymmetric is known as a partial order.
• A relation which is symmetric, transitive and reflexive is called an equivalence relation. For an equivalence relation R it is customary to write x ∼R y (or just x ∼ y when the relation is clear) rather than (x, y) ∈ R.

Example 2.2.1.
(a) Let R = {(n, n) : n ∈ ℤ} (the equality relation on ℤ). Then R is symmetric (because x = y implies y = x), reflexive and transitive, so is an equivalence relation. It is also antisymmetric in a subtle way (because we never have (x, y) ∈ R and x ≠ y), so is also a partial order.
(b) Let R = {(x, y) ∈ ℤ × ℤ : x ≤ y} (the "less than or equal to" relation on ℤ). Then R is reflexive, transitive and antisymmetric (because when x ≤ y and x ≠ y then y ≰ x). It is not symmetric (because, for example, (1, 2) ∈ R but (2, 1) ∉ R). Hence it is a partial order (as we would hope) but not an equivalence relation.
(c) Let R = {(x, y) ∈ ℤ × ℤ : x < y} (the "strictly less than" relation on ℤ). Then R is transitive and antisymmetric but neither reflexive nor symmetric. Since it fails to be reflexive, it is not a partial order.
(d) Let R = {(x, y) ∈ ℤ × ℤ : x − y is a multiple of 2}. Then R is symmetric (because if x − y = 2m then y − x = 2(−m)), reflexive (because x − x = 2 × 0) and transitive (because if x − y = 2m and y − z = 2n then x − z = 2(m + n)). It is therefore an equivalence relation. It is not antisymmetric because (1, 3) ∈ R with 1 ≠ 3 and (3, 1) ∈ R.
(e) Let R = {(x, y) ∈ ℤ × ℤ : x is a factor of y}. Then R is reflexive (because x = 1 × x for each x), transitive (because if y = mx and z = ny then z = (mn)x), not symmetric (because 2 is a factor of 4 but 4 is not a factor of 2) and not antisymmetric because 2 is a factor of −2 and −2 is a factor of 2, so both (2, −2) ∈ R and (−2, 2) ∈ R although 2 ≠ −2. □


If S is any set, then a finite partition of S is a way of writing S as a union of a finite number S1, S2, . . . , Sn of disjoint sets.

[Figure 1. A finite partition of a set S into pieces S1, S2, . . . , Sn.]

Define R on S × S by R = {(x, y) : x, y ∈ Si for some i}. In other words x is related to y if x and y belong to the same set of the partition. Then R is reflexive, symmetric and transitive, so is an equivalence relation. The same argument applies when S is a union of an infinite number of disjoint sets. Hence every partition gives rise to an equivalence relation.

It is also true that every equivalence relation gives a partition, so that partitions and equivalence relations are just two different names for essentially the same thing. Given x ∈ S let [x] be the set of all elements related to x, i.e.

[x] = {y ∈ S : (x, y) ∈ R} = {y ∈ S : x ∼ y}

where R is the equivalence relation. [x] is called the equivalence class containing x ∈ S. Note that if x ∈ S, then x ∈ [x]. It turns out that the set of all equivalence classes forms a partition of S, as we will now see.

A proof that equivalence classes form a partition

Since R is reflexive, it is clear that every element x ∈ S belongs to [x], so S = ⋃ₓ [x]. It is not quite so clear that if [x] ≠ [y] then [x] and [y] are disjoint. Suppose that there exists z ∈ [x] ∩ [y]. Then (x, z) ∈ R and (y, z) ∈ R, i.e. x ∼ z and y ∼ z. From symmetry (z, y) ∈ R, i.e. z ∼ y, and then by transitivity, (x, y) ∈ R, i.e. x ∼ y. Hence y ∈ [x]. Then, if s ∈ [y], y ∼ s so, by transitivity and x ∼ y, x ∼ s. Hence s ∈ [x]. It follows that [y] ⊆ [x]. Similarly [x] ⊆ [y] and so [x] = [y]. Thus if [x] ≠ [y] then [x] and [y] are disjoint. □

Example 2.2.2. (a) For the example R = {(n, n) : n ∈ ℤ}, each element is related only to itself, i.e. [n] = {n} for each n. Hence each equivalence class contains exactly one number. There are infinitely many equivalence classes, giving a partition of ℤ into infinitely many sets.
(b) For the example R = {(x, y) ∈ ℤ × ℤ : x − y is a multiple of 2}, the equivalence class containing 0 consists of the even numbers and the equivalence class containing 1 consists of the odd numbers, i.e.

[0] = {y : y − 0 is a multiple of 2} = {. . . , −2, 0, 2, 4, . . .}

and

[1] = {y : y − 1 is a multiple of 2} = {. . . , −3, −1, 1, 3, . . . }.

In this example the associated partition contains just two sets, each with infinitely many elements. □
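
Computing the partition induced by an equivalence relation is straightforward for finite sets. A Python sketch (our own illustration) that groups the elements of S into classes:

    def equivalence_classes(S, related):
        """Partition S under the equivalence relation given by related(x, y)."""
        classes = []
        for x in S:
            for cls in classes:
                if related(x, next(iter(cls))):   # compare with a representative
                    cls.add(x)
                    break
            else:
                classes.append({x})               # x starts a new class
        return classes

    S = range(-4, 5)
    parts = equivalence_classes(S, lambda x, y: (x - y) % 2 == 0)
    print([sorted(c) for c in parts])
    # [[-4, -2, 0, 2, 4], [-3, -1, 1, 3]]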


2.3. Closures of binary relations

When a binary relation R ⊆ D × D lacks a certain desirable property we can often extend it to one that has the property by adding some new ordered pairs.

• When R is not reflexive, we can make it reflexive by adding all pairs (x, x), i.e. by replacing R by ρ(R) = R ∪ {(x, x) : x ∈ D}. Not only is ρ(R) reflexive, but it is also the smallest reflexive set containing R (since a reflexive set containing R is forced to contain both R and each (x, x) where x ∈ D). We call ρ(R) the reflexive closure of R. We obtain the directed graph representation of ρ(R) by starting with the directed graph representation of R and adding a loop at any vertex where none exists.

• In a similar way we can define the symmetric closure σ(R) of R by σ(R) = R ∪ {(y, x) : (x, y) ∈ R}. To obtain the directed graph representation of σ(R) from that of R we add an edge from b to a wherever we see one from a to b but none from b to a.

• It is less obvious how to construct the transitive closure of R. Certainly, it must contain all pairs (x, z) where (x, y) ∈ R and (y, z) ∈ R. However it must also contain all pairs (x, z) for which there exist y1, y2 with (x, y1) ∈ R, (y1, y2) ∈ R and (y2, z) ∈ R, i.e. for which there is a three step process x → y1 → y2 → z. In a similar way the transitive closure of R must contain each pair (x, z) obtained in n steps x → y1 → y2 → ⋯ → yn−1 → z for any n. In fact this is all we need, and the resulting relation is the smallest transitive relation containing R, called the transitive closure of R and denoted τ(R). The next example illustrates its construction using directed graph representations.

Example 2.3.1. The relation R = {(a, b) : b = a + 1} on the set D = {1, 2, 3, 4, 5} is represented by the following directed graph.

[diagram: vertices 1, 2, 3, 4, 5 with edges 1 → 2, 2 → 3, 3 → 4 and 4 → 5]

Adding edges from a to c wherever we see edges from a to b and from b to c yields the following graph.

[diagram: as above, plus the two-step edges 1 → 3, 2 → 4 and 3 → 5]

But we are not yet done. The relation represented by this graph is still not transitive because, for example, there is an edge from 1 to 3 and an edge from 3 to 4, but no edge from 1 to 4. So, repeating the procedure used above gives the following graph, which does in fact represent a transitive relation (in fact it represents the relation < on D). Here we repeated the procedure twice to obtain the graph of τ(R). In general, we may need to repeat this procedure several times to obtain the transitive closure.

[diagram: vertices 1, 2, 3, 4, 5 with an edge a → b whenever a < b]
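
The "repeat until nothing changes" idea translates directly into code. A Python sketch of the transitive closure (our illustration, not part of the notes):

    def transitive_closure(R):
        """Smallest transitive relation containing R (a set of pairs)."""
        closure = set(R)
        while True:
            new_pairs = {(x, z)
                         for (x, y) in closure
                         for (w, z) in closure if y == w}
            if new_pairs <= closure:      # nothing new: we are done
                return closure
            closure |= new_pairs

    R = {(a, a + 1) for a in range(1, 5)}      # Example 2.3.1
    print(sorted(transitive_closure(R)))
    # [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), ...] -- i.e. "<" on D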

Combining these ideas we find that τ(σ(ρ(R))) is a transitive, symmetric, reflexive set (this is not quite obvious) and is the smallest such relation containing R. It is therefore the equivalence relation generated by R.

Example 2.3.2. (a) Let R = {(x, y) ∈ ℤ × ℤ : x ≤ y}, as considered in Example 2.2.1(b). Recall that R is reflexive and transitive, but not symmetric. Its symmetric closure σ(R) is R ∪ {(x, y) : (y, x) ∈ R} = R ∪ {(x, y) : y ≤ x} = {(x, y) : x ≤ y or y ≤ x} = ℤ × ℤ. Thus ℤ × ℤ is the smallest equivalence relation containing R.
If x ∈ ℤ then the equivalence class [x] containing x is {y ∈ ℤ : (x, y) ∈ ℤ × ℤ} = ℤ. Hence the partition corresponding to σ(R) = ℤ × ℤ contains only one equivalence class (the whole of ℤ).
(b) Let R = {(x, y) ∈ ℤ × ℤ : x < y}, as considered in Example 2.2.1(c). R is transitive but neither reflexive nor symmetric. Its reflexive closure ρ(R) is R ∪ {(x, x) : x ∈ ℤ} = {(x, y) ∈ ℤ × ℤ : x < y} ∪ {(x, y) ∈ ℤ × ℤ : x = y} = {(x, y) ∈ ℤ × ℤ : x ≤ y}. From (a) it follows that the smallest equivalence relation containing R must also be ℤ × ℤ.

(c) Let R = {(x, y) ∈ ℤ × ℤ : x is a factor of y}, considered in Example 2.2.1(e), which is reflexive and transitive but not symmetric. Its symmetric closure is σ(R) = R ∪ {(x, y) : (y, x) ∈ R} = R ∪ {(x, y) : x is a multiple of y} = {(x, y) : x is a factor of y or x is a multiple of y}. This is now no longer transitive since, for example, (8, 24) ∈ σ(R) and (24, 6) ∈ σ(R) (because 8 is a factor of 24 and 24 is a multiple of 6) but (8, 6) ∉ σ(R). It can be shown that the equivalence relation generated by R is ℤ × ℤ. (Notice that [1] = {y : 1 is a factor of y} = ℤ.) □


It is sometimes necessary to start with a set P0 of subsets of S and combine them to produce a partition P such that every element of P0 is contained in some element of P. This is really just the idea of finding the equivalence relation generated by some relation, but it is a little easier because we don't have to worry about symmetry.

Algorithm 2.1. Let P0 be a finite set of subsets of some set S. Construct a sequence of sets P1, P2, . . . as follows:

• For each set A in Pi let A′ be the union of A and all the sets in Pi that intersect A, and define Pi+1 = {A′ : A ∈ Pi}.
• It can be shown that Pn = Pn+1 for some n.
• Let P = Pn ∪ W where W = {{x} : x ∈ S \ ⋃A∈P0 A} is the set of all {x} such that x ∉ A for any A ∈ Pn.

P is then a partition and each element of P0 is contained in some element of P.

Example 2.3.3. For N10 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and

P0 = {{1, 2}, {3, 2}, {3, 5}, {4, 7}, {7, 9}}

applying Algorithm 2.1 gives

P1 = {{1, 2, 3}, {3, 2, 5}, {4, 7, 9}}
P2 = {{1, 2, 3, 5}, {4, 7, 9}}
P3 = {{1, 2, 3, 5}, {4, 7, 9}}

so we stop calculating the Pi's and let P = {{1, 2, 3, 5}, {4, 7, 9}, {6}, {8}, {10}}. □

As you can see from the example, Algorithm 2.1 could easily be applied for small
P0 by simple inspection.
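
For larger inputs Algorithm 2.1 is easy to mechanise. The following Python sketch (our illustration) repeatedly merges overlapping blocks until nothing changes, then adds singletons for the leftover elements:

    def coarsest_partition(S, P0):
        """Merge the blocks of P0 into a partition of S (Algorithm 2.1)."""
        blocks = [set(A) for A in P0]
        changed = True
        while changed:
            changed = False
            merged = []
            for A in blocks:
                for B in merged:
                    if A & B:              # overlapping blocks are merged
                        B |= A
                        changed = True
                        break
                else:
                    merged.append(set(A))
            blocks = merged
        covered = set().union(*blocks) if blocks else set()
        return blocks + [{x} for x in S if x not in covered]

    S = range(1, 11)
    P0 = [{1, 2}, {3, 2}, {3, 5}, {4, 7}, {7, 9}]
    print(coarsest_partition(S, P0))
    # [{1, 2, 3, 5}, {4, 7, 9}, {6}, {8}, {10}]
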
Chapter Three

Finite State Machines


A finite state machine is an abstract way of modelling a range
of mechanical or electronic devices such as vending machines,
simple flip-flops or entire computers. It can also be used to
model certain computer programs and non-hardware systems,
such as network protocols.

References. The classic textbook on finite state machines is Introduction to Automata Theory, Languages and Computation by John Hopcroft and Jeffrey Ullman (Addison-Wesley, 1979), but this book is quite technical and difficult as a first account. More readable extended accounts are given in An Introduction to Formal Languages and Automata by Peter Linz (D. C. Heath & Co., 1990) and Automata and Formal Languages: An Introduction by Dean Kelley (Prentice-Hall, 1995). Shorter more elementary treatments appear in chapters of Discrete Mathematics by Richard Johnsonbaugh (Macmillan, 3rd edition, 1993), Discrete Mathematical Structures by B. Kolman, R. Busby and S. Ross (Prentice Hall, 1996) and in Doerr and Levasseur.

3.1. Deterministic finite state machines

There are several, slightly different, ways of describing a finite state machine or finite state automaton. We will consider a machine which can be in any of finitely many internal states {q1, . . . , qr}, can process a finite set {a1, . . . , am} of allowable inputs and can deliver a finite set {z1, . . . , zn} of defined outputs. When the system processes an input it may change states, so there is a transition function δ which describes exactly how it does that. This function maps a pair (x, s) representing the current input and the current state to the next state δ(x, s). There may also be an output function f which maps (x, s) to an output f(x, s). In summary, our definition of a (deterministic) finite state machine consists of 5 parts:

• Q = {q1, . . . , qr} is the set of states
• Σ = {a1, . . . , am} is the input alphabet
• Z = {z1, . . . , zn} is the output alphabet
• δ : Σ × Q → Q is the transition function
• f : Σ × Q → Z is the output function.

Many mechanical devices, such as vending machines and electrical circuits, can be modeled as finite state machines.

Example 3.1.1. For simplicity, consider a machine which sells two items A and B, both costing $2. The set of inputs is {select A, select B, deposit $2}, the set of outputs is {release A, release B, release nothing}, and the set Q of states is {permit release, forbid release}. The transition function has, for example,

δ(deposit $2, forbid release) = permit release
δ(select A, forbid release) = forbid release
δ(select A, permit release) = forbid release

and the output function f has, for example,

f(deposit $2, forbid release) = release nothing
f(deposit $2, permit release) = release nothing
f(select A, permit release) = release A
f(select A, forbid release) = release nothing.

Notice that when A is selected and the machine permits release then A is released and the state is moved to forbid release, so a further $2 is needed before more items can be released. The alert student might spot that this machine is biased in favour of the owner: it allows the purchaser to pay more than $2 for an item! □
There are two convenient ways to describe a finite state machine. The first is to use a pair of function tables to describe the transition and output functions. For instance, in Example 3.1.1 with the obvious notation¹, the transition function is given by

x           sA    sB    d$2
δ(x, per)   for   for   per
δ(x, for)   for   for   per

and the output function is given by

x           sA    sB    d$2
f(x, per)   rA    rB    rn
f(x, for)   rn    rn    rn

The other way to describe the machine is by a directed graph with the vertices labelled by the states and the edges, representing possible transitions between states, labelled by the corresponding inputs and outputs. The graph for Example 3.1.1 is shown in Figure 1.

¹ Since δ and f are functions of two variables, the function tables are slightly more complicated than the ones we have seen so far.

[Figure 1. Finite state diagram describing a vending machine: the edge d$2/rn takes for to per, edges sA/rA and sB/rB take per to for, while sA/rn and sB/rn loop at for and d$2/rn loops at per.]

Example 3.1.2. A finite state machine with the states carry (c) and don't carry (dc) can be used to add a pair of binary numbers which are input as a sequence of pairs of binary digits. For example, to add 1101 and 11 then (starting from the right) the pairs (1,1), (0,1), (1,0), (1,0), (0,0) are entered in turn. (The final (0,0) allows for carry overs.) The transition function is

x          (0,0)  (1,0)  (0,1)  (1,1)
δ(x, c)    dc     c      c      c
δ(x, dc)   dc     dc     dc     c

To see this, note that we must carry if the total of the two inputs and any currently carried digit is at least 2. The output function would be

x          (0,0)  (1,0)  (0,1)  (1,1)
f(x, c)    1      0      0      1
f(x, dc)   0      1      1      0

It is left as an exercise to draw the corresponding directed graph. □

The output from a given string of inputs can be calculated, once we know the initial state, the transition function and the output function. For example, to add the numbers 1101 and 11 using the machine of Example 3.1.2 we can record the output as follows.

Inputs    (1,1)  (0,1)  (1,0)  (1,0)  (0,0)
States    dc     c      c      c      c
Outputs   0      0      0      0      1

Thus, recalling that we started on the right,

1101 + 11 = 10000.

Notice that we started in the state don't carry. This is known as the initial state (or starting state).
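
The tabular description translates directly into code: represent δ and f as dictionaries keyed by (input, state) pairs and fold over the input sequence. A Python sketch of the binary adder of Example 3.1.2 (our illustration only):

    # Transition and output tables for the serial binary adder.
    delta = {((0, 0), "c"): "dc",  ((1, 0), "c"): "c",
             ((0, 1), "c"): "c",   ((1, 1), "c"): "c",
             ((0, 0), "dc"): "dc", ((1, 0), "dc"): "dc",
             ((0, 1), "dc"): "dc", ((1, 1), "dc"): "c"}
    f = {((0, 0), "c"): 1,  ((1, 0), "c"): 0,  ((0, 1), "c"): 0,  ((1, 1), "c"): 1,
         ((0, 0), "dc"): 0, ((1, 0), "dc"): 1, ((0, 1), "dc"): 1, ((1, 1), "dc"): 0}

    def run(inputs, state="dc"):
        outputs = []
        for x in inputs:
            outputs.append(f[(x, state)])
            state = delta[(x, state)]
        return outputs

    # 1101 + 11, fed in right to left with a final (0,0) for the carry.
    print(run([(1, 1), (0, 1), (1, 0), (1, 0), (0, 0)]))   # [0, 0, 0, 0, 1]
    # Read right to left: 10000.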

3.2. Finite state machines without output

In some cases the states of a machine themselves govern the output, so that the machine is determined by just the inputs, the states and the transition function².

² Machines with outputs are sometimes called transducers to distinguish them from those without outputs. In fact the machines we described in Section 3.1 are a particular type of transducer called a Mealy machine.

Example 3.2.1. A flip-flop can be thought of as a machine with two states s1, s2: in state one a positive voltage difference is maintained across a pair of terminals and in state two the voltage difference is negative.

[Figure 2. The two possible states s1 and s2 of a flip-flop: in s1 the first terminal is at high voltage and the second at low voltage; in s2 the voltages are reversed. An input signal switches between the two states.]

The input signal is either 0 (no impulse) or 1 (an impulse). The machine is then completely specified by the transition function in the following table. Notice that the input 0 leaves the system unchanged, whereas the input 1 causes a change of state.

x          0    1
δ(x, s1)   s1   s2
δ(x, s2)   s2   s1


When we draw the directed graph of a finite state machine without output each edge has just one label, giving the corresponding input. For example, the flip-flop of Example 3.2.1 has the following diagram.

[Figure 3. Finite state machine describing a flip-flop: a loop labelled 0 at each of s1 and s2, and edges labelled 1 in both directions between s1 and s2.]

3.3. Recognition machines

A particularly useful class of finite state machines are designed to accept or reject a given string of inputs. These recognition machines or finite state automata have two additional features: an initial state q0 and a set F of accepting states (also known as final states). The machine accepts a given input string (or word) w if, starting in the initial state, it finishes in an accepting state.

In the directed graph of a recognition machine, an accepting state q is marked by a pair of concentric circles and the initial state q0 is marked by an incoming arrow. A state may be both an initial state and an accepting state.

Example 3.3.1. Consider the recognition machine in Figure 4.

[Figure 4. Finite state machine of Example 3.3.1: states q0 (initial) and q1 (accepting), with edges labelled a in both directions between q0 and q1 and a loop labelled b at each state.]

The input word aabbab produces the following effect

inputs:       a    a    b    b    a    b
states:  q0   q1   q0   q0   q0   q1   q1

which may be written more briefly as

q0 -a-> q1 -a-> q0 -b-> q0 -b-> q0 -a-> q1 -b-> q1.

Since the final state q1 is accepting, the word is accepted. On the other hand if the input had been aabb (with the final ab deleted) then the final state would have been q0 and the input word would have been rejected. □

The set of input words accepted by a recognition machine is called the language of the machine. For example, the language of the machine in Example 3.3.1 consists of all words with an odd number of a's. To see this, note that the input b never changes the state but the input a always does. Hence to move from q0 to q1 requires an odd number of inputs a.

Remark 3.3.1. Although we have defined a recognition machine without using outputs, some authors prefer to say that they have outputs 0 and 1, with output 0 when the current state is not accepting and output 1 when the state is accepting. The input word is accepted if the final output is 1. □

The purpose of a recognition machine is to accept or reject the finite words from a
given alphabet. Two machines which accept precisely the same words are therefore
effectively the same and we say that they are equivalent. This gives an equivalence
relation on the class of all recognition machines.

[Figure 5. Recognition machine equivalent to that of Example 3.3.1: three states q0 (initial), q1 (accepting) and q2, with a loop labelled b at each state and edges labelled a between adjacent states in both directions.]

Example 3.3.2. It is easy to check that the recognition machine in Figure 5 accepts the same language as that of Example 3.3.1. □


3.4. Notations for words and languages

From here on, we will be studying a lot of recognition machines, as well as other types of automata that either accept or reject a given input word or string w. It will be useful to have some notations for describing the languages of such machines. Where repeated symbols occur in a word, as in the above example, a power notation is often used. Using this notation

0⁴ = 0000,  1²0³ = 11000,  abⁿ = abb. . .b (with n copies of b)

and so on. This notation extends easily to cases where more than one symbol is repeated. The language

{0101. . .01 (n repeats of 01) : n ≥ 0}

consisting of an arbitrary number of repeats of the word 01, for example, can be written more briefly as

{(01)ⁿ : n ≥ 0}.

Note that in this case the parentheses around the 01 are not part of the language itself. They are used as a notation that shows the scope of the power. Without them the expression

{01ⁿ : n ≥ 0}

would be taken to mean

{011. . .1 (with n copies of 1) : n ≥ 0}.

Another convenient notation is na(w), which counts the number of occurrences of a particular alphabet symbol a in a given word w. Using this notation we can write the language of the recognition machine of Examples 3.3.1 and 3.3.2 as

{w : na(w) is odd}.

The only problem with this notation is that it doesn't tell us what letters are allowed to be in w apart from a. The star notation allows us to rectify this. For

a given alphabet Σ the set of all possible words or input strings made from the symbols in Σ is denoted by Σ*. With this notation we can describe the language of Examples 3.3.1 and 3.3.2 unambiguously as

{w ∈ {a, b}* : na(w) is odd}.

We will discuss the star notation further in the next chapter. Finally it is convenient to have a notation for the null or empty word, the word with no symbols at all. We write this as λ.

3.5. Extended transition function and suffix sets

Recall that the transition function δ of a finite state machine acts on a pair (x, s) of an input x and a state s to produce a new state δ(x, s). If we have a finite word x1 . . . xn of inputs and a state S then we can produce a state by repeatedly applying the transition function as follows.

δ*(x1, S) = δ(x1, S) is the state to which the state S moves after the sequence of inputs x1.
δ*(x1x2, S) = δ(x2, δ(x1, S)) is the state to which the state S moves after the sequence of inputs x1x2.
. . .
δ*(x1x2 . . . xn, S) is the state to which the state S moves after the sequence of inputs x1x2 . . . xn.

More formally, δ* is defined inductively by

δ*(x1, S) = δ(x1, S)
δ*(x1 . . . xn+1, S) = δ(xn+1, δ*(x1 . . . xn, S))

(Inductive definition is studied in MAT1DM and MAT1CLA.) For machines with output, a similar function F can be defined by

F(x1, S) = f(x1, S)
F(x1 . . . xn+1, S) = f(xn+1, δ*(x1 . . . xn, S))

where f(x, s) is the output when the machine is in state s and receives input x.

Example 3.5.1. For the recognition machine of Example 3.3.1:

δ*(aaa, q0) = q1 (which shows aaa is accepted).
δ*(aaba, q0) = q1 (which shows aaba is accepted).
δ*(aa, q0) = q0 (which shows aa is not accepted).
δ*(aa, q1) = q1.
δ*(aaba, q1) = q0.

In general the word w ∈ {a, b}* is accepted provided δ*(w, q0) = q1. □
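
The inductive definition of δ* is just a fold over the word, applying δ one symbol at a time. A Python sketch for the machine of Example 3.3.1 (our encoding of its transition table):

    delta = {("a", "q0"): "q1", ("b", "q0"): "q0",
             ("a", "q1"): "q0", ("b", "q1"): "q1"}

    def delta_star(word, state):
        """Extended transition function: apply delta symbol by symbol."""
        for x in word:
            state = delta[(x, state)]
        return state

    def accepts(word, initial="q0", accepting={"q1"}):
        return delta_star(word, initial) in accepting

    print(delta_star("aaba", "q0"), accepts("aaba"))   # q1 True
    print(delta_star("aa", "q0"), accepts("aa"))       # q0 False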

This example illustrates how δ* can be used to write down a definition of the language of a recognition machine:

L = {w : δ*(w, q0) ∈ F}

where q0 is the initial state and F is the set of accepting states. In fact, a similar idea applies to any state qi. We define the set of words that take us to an accepting state if we start processing at qi. This set

S(qi) = {w : δ*(w, qi) ∈ F}

is called the suffix set of qi. The concept of a suffix set can make the task of calculating the language of a machine easier by breaking up the calculation into smaller steps.

Example 3.5.2. We calculate some suffix sets for the recognition machine shown.

[diagram: states q0 (initial), q1, q2 (accepting), q3, q4 (accepting) and q5; edges q0 -0-> q1, q1 -0-> q2, q0 -1-> q3, q3 -1-> q3, q3 -0-> q4, q4 -1-> q4, with all remaining transitions leading to q5, which loops on 0 and 1.]

S(q5) = ∅.
S(q2) = {λ}.
S(q1) = {0}.
S(q4) = {1ⁿ : n ≥ 0}.
S(q3) = {1ᵐ01ⁿ : m, n ≥ 0}.

Using the above calculations of S(q1) and S(q3), the language of this machine is

S(q0) = {00} ∪ {1ᵐ01ⁿ : m ≥ 1, n ≥ 0}. □

Some Observations on Example 3.5.2

• Note that λ ∈ S(q2) and λ ∈ S(q4). This always happens for accepting states.
• Once we reach q5, we are stuck there. A non-accepting state with this property is called a sink or black hole.
• The suffix set of a sink is always ∅.
Chapter Four

Regular Languages and Recognition Machines

Formal language theory underlies many aspects of computer science including the specification of programming languages. Languages may be specified by their grammars and, if these grammars are of a particularly simple form, the languages produced are each associated with a recognition machine.

References. The material in this chapter is covered in much greater detail in Chapters 2 and 3 of Linz and in Chapter 2 of Kelley. Shorter treatments are given in Chapter 10 of Johnsonbaugh, Chapter 10 of Kolman, Busby and Ross and Sections 14.2 and 14.3 of Doerr and Levasseur.

4.1. Grammars and regular languages

A formal language L over an alphabet Σ is just a set of finite strings (or words) of elements of Σ, possibly including the empty string λ. Another way of saying this is that L is any subset of Σ*. Although any set of strings is by definition a language, for particular applications we usually want the construction of valid strings to be governed by rules (as in ordinary language). To specify these rules, called production rules, we use two disjoint sets, a set N of non-terminal symbols and a set T of terminal symbols. The terminal symbols are really just the symbols in our alphabet, while the non-terminal symbols never appear in any string in our language. They are used along the way as we gradually build up valid strings using the production rules. The non-terminal symbols include a special one, known as the starting symbol σ. We will obtain the words in the language by starting from σ and repeatedly applying the production rules. The set of production rules used to specify a language in this way is called a grammar for that language. A regular grammar is one where all of the production rules take one of the following two very simple forms:

(RG1) Replace a symbol A ∈ N by symbols tB where t ∈ T and B ∈ N. The notation for a production rule of this type is A → tB. (It may help to read this type of rule as "replace A with tB" or "A goes to tB".)
(RG2) Replace a symbol A ∈ N by λ. In other words drop A out of the expression. The notation for a production rule of this type is A → λ. (It may help to read this type of rule as "delete A".)

A regular language is one that is generated by a regular grammar. This means that every string or word in the language can be built up by the following algorithm.

Algorithm 4.1.
(a) Start with the string σ (and observe that there is precisely one non-terminal symbol to begin with).
(b) While a non-terminal symbol remains in the string, either:
(i) use an appropriate production rule of type (RG1) to replace the non-terminal symbol with a terminal symbol followed by a non-terminal one (and observe that each time we do this there will be precisely one non-terminal symbol present in the string); or
(ii) use a production rule of type (RG2) to delete the non-terminal symbol from the string.
Once we perform step (ii), there will be no non-terminal symbols left, so we are forced to stop.

There is a subtle point about the definition of a regular language that is easily missed. It is sometimes possible to generate a regular language using a grammar that is not regular. Example 4.1.3 illustrates this. The definition of a regular language requires only that there exists at least one regular grammar for the language, not that all grammars for that language are regular.

Example 4.1.1. Let T = {0, 1} and N = {σ, A, B} and consider the grammar consisting of the following production rules.

σ → 0A (rule of type (RG1))
A → 0A (rule of type (RG1))
A → 1B (rule of type (RG1))
A → λ (rule of type (RG2))
B → λ (rule of type (RG2))

Starting with the string σ we have only one choice since there is only one production rule with σ on the left hand side (LHS). We must therefore replace σ by 0A. We may now replace the A by 0A as many times as we like (including none), thus building up a sequence of 0s:

0A, 00A, 000A, 0000A, . . .

As soon as we choose to use the rule A → 1B we get the string 0ⁿ1B for some n ≥ 1 and at the next move we are forced to use the rule B → λ, so we end up with 0ⁿ1. Alternatively, we may choose to use the rule A → λ and hence end up with 0ⁿ. The language generated by this grammar is therefore the set of all strings of binary digits beginning with a positive number of 0s which may or may not be followed by a single 1. We can write this language as a set

L = {0ⁿ1 : n ≥ 1} ∪ {0ⁿ : n ≥ 1}. □

The step-by-step process of using Algorithm 4.1 to obtain a word in the language generated by a regular grammar is called derivation. We represent the steps in a specific derivation using the symbol ⇒. For instance, the derivation of the word 00001 from the grammar in Example 4.1.1 is

σ ⇒ 0A ⇒ 00A ⇒ 000A ⇒ 0000A ⇒ 00001B ⇒ 00001.
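
Derivations in a regular grammar are easy to animate in code. The following Python sketch (our illustration; we write S for the start symbol σ) enumerates all words of length at most max_len generated by the grammar of Example 4.1.1:

    # Rules of Example 4.1.1: nonterminal -> list of (terminal, nonterminal)
    # pairs for (RG1) rules; None marks an (RG2) rule A -> lambda.
    rules = {"S": [("0", "A")],
             "A": [("0", "A"), ("1", "B"), None],
             "B": [None]}

    def words(max_len, prefix="", nt="S"):
        """All words of length <= max_len derivable from prefix + nt."""
        found = set()
        for rule in rules[nt]:
            if rule is None:               # (RG2): delete the nonterminal
                found.add(prefix)
            elif len(prefix) < max_len:    # (RG1): emit t, continue from B
                t, B = rule
                found |= words(max_len, prefix + t, B)
        return found

    print(sorted(words(4)))
    # ['0', '00', '000', '0000', '0001', '001', '01']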

There is a correspondence between regular languages and recognition machines. We start out by considering the easy direction of this correspondence.

Algorithm 4.2. Starting with a recognition machine we can construct a regular grammar with the same language as follows.
(a) The non-terminal symbols are the labels of the states of the machine.
(b) The starting symbol is the initial state of the machine.
(c) The terminal symbols are the input alphabet of the machine.
(d) Construct a set of production rules by:
(i) adding an (RG1) rule A → tB for each arrow from A to B labelled t in the graph;
(ii) adding an (RG2) rule A → λ for each accepting state A.

The best way to understand the construction is to look at a particular example.

Example 4.1.2. The following recognition machine is equivalent to that of Example 3.3.1. It has input alphabet {a, b}, state set {A, B}, initial state A and accepting state set F = {B}.

[diagram: states A (initial) and B (accepting), with edges labelled a in both directions between A and B and a loop labelled b at each state]

The construction gives N = {A, B}, the starting state is A and the production rules are

A → aB, A → bA, B → aA, B → bB, B → λ.

Repeated application of these rules starting with A allows the production of cer-
tain strings, but does not allow the production of others. For example

A = aB = a = a

produces the word a and

A = bA = bbA = bbaB = bbabB = bbab = bbab

produces the word bbab. On the other hand , you might like to convince yourself
that there is no way to produce the word aba using the above grammar.

The language generated by this grammar should be precisely the language of the given finite state machine. We now illustrate why this is so by considering some examples. If we start with an input string, such as bbab, which is clearly accepted by the machine then (because there is only one accepting state B) the production rules corresponding to the given inputs b, b, a, b lead from A to bbabB. The production rule B → λ then enables bbab to be obtained in the regular language generated by the rules.
On the other hand, if bbab is obtained by a sequence of production rules then, working backwards, it must have come in turn from bbabB, bbaB, bbA, bA, A via inputs bbab. Thus the input bbab leads from state A to B, which is accepting, so bbab is accepted by the recognition machine.
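Because Algorithm 4.2 is completely mechanical, it is easy to automate. Here is a minimal Python sketch; the dictionary encoding of the machine is our own convention, with λ again written as the empty string.

def machine_to_grammar(delta, accepting):
    """Algorithm 4.2: read the (RG1)/(RG2) rules off a recognition machine."""
    rules = [(p, t + q) for (t, p), q in delta.items()]   # (RG1): A -> tB
    rules += [(a, "") for a in accepting]                 # (RG2): A -> lambda
    return rules

# The machine of Example 4.1.2: b-loops at A and B, a-arrows between them.
delta = {("a", "A"): "B", ("b", "A"): "A", ("a", "B"): "A", ("b", "B"): "B"}
for lhs, rhs in machine_to_grammar(delta, {"B"}):
    print(lhs, "->", rhs or "lambda")   # A -> aB, A -> bA, B -> aA, ...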

The next example illustrates how a grammar can be used to specify the syntax of
part of a programming language.

Example 4.1.3. Let the set of terminal symbols be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, the non-terminal symbols be {⟨digit⟩, ⟨integer⟩, ⟨signed integer⟩, ⟨unsigned integer⟩}, let the starting symbol be ⟨integer⟩ and let the production rules be
⟨integer⟩ → ⟨signed integer⟩, ⟨integer⟩ → ⟨unsigned integer⟩
⟨signed integer⟩ → +⟨unsigned integer⟩
⟨signed integer⟩ → −⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩
⟨digit⟩ → 0, ⟨digit⟩ → 1, . . . , ⟨digit⟩ → 9.
One example of a valid derivation is:
⟨integer⟩ ⇒ ⟨unsigned integer⟩ ⇒ ⟨digit⟩⟨unsigned integer⟩ ⇒ ⟨digit⟩⟨digit⟩ ⇒ ⟨digit⟩2 ⇒ 12
Another is:
⟨integer⟩ ⇒ ⟨signed integer⟩ ⇒ −⟨unsigned integer⟩ ⇒ −⟨digit⟩ ⇒ −7

In this way any integer can be obtained. This is not a regular grammar because, for example, the production rule ⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩ is not of the form (RG1) or (RG2). The language generated by this grammar is regular, however, because there is a regular grammar that generates the same language (Exercise: Try to write one).

4.2. Recognition machines for regular grammars


Given a regular grammar, we can reverse the process used in the last section
and produce a recognition machine which will accept precisely the words in the
language. Based on our experience of going from a recognition machine to a
regular grammar, we expect that the recognition machine based on the language
should have the following features.

Algorithm 4.3. Starting with a regular grammar, we construct a recognition machine with the same language as follows.
(a) The states are the non-terminal symbols
(b) The initial state is the starting symbol
(c) The input alphabet is the set of terminal symbols
(d) The accepting states are the non-terminal symbols A for which there is an (RG2) rule A → λ.
(e) Each (RG1) rule A → tB gives an arrow from A to B labelled t.
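Algorithm 4.3 can be automated just as easily. In the sketch below (our own encoding again) the transition function is returned as a dictionary of sets, because, as we are about to see, the resulting machine need not be deterministic.

def grammar_to_machine(rules, start):
    """Algorithm 4.3: build a recognition machine from a regular grammar.

    rules is a list of (A, rhs) pairs: ("A", "tB") for an (RG1) rule and
    ("A", "") for an (RG2) rule A -> lambda.
    """
    delta, accepting = {}, set()
    for lhs, rhs in rules:
        if rhs == "":
            accepting.add(lhs)                # (RG2): lhs is an accepting state
        else:
            delta.setdefault((rhs[0], lhs), set()).add(rhs[1:])   # (RG1) arrow
    return delta, start, accepting

# Round trip: the grammar of Example 4.1.2 gives back its machine.
rules = [("A", "aB"), ("A", "bA"), ("B", "aA"), ("B", "bB"), ("B", "")]
print(grammar_to_machine(rules, "A"))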

As an example, you should check that applying this construction to the grammar of Example 4.1.2 takes you back to the graph from which the grammar was generated. All goes smoothly because the grammar was derived from a recognition
machine. If we start with an arbitrary regular grammar, however, problems can
arise.

Example 4.2.1. Suppose T = {a, b}, N = {σ, ρ, τ} and the production rules are

σ → bσ, σ → aρ, ρ → bρ, ρ → bτ, τ → aρ, τ → λ.

Our construction gives the recognition machine shown in Figure 1.

[Diagram: states σ, ρ and τ with τ accepting; a b-labelled loop at σ, an a-labelled arrow from σ to ρ, a b-labelled loop at ρ, a b-labelled arrow from ρ to τ and an a-labelled arrow from τ back to ρ.]

Figure 1. Proposed FSM for the grammar of Example 4.2.1.

However, this diagram does not correspond to a (deterministic) recognition machine because:
(a) the state δ(b, ρ) is not uniquely defined
(b) the states δ(a, ρ) and δ(b, τ) are not defined at all.


The latter problem with the machine of Example 4.2.1 is easily fixed. We simply
add a new non-accepting state to which we move from states where there is no
definition of how to handle a particular input. The transition function is then
completed by taking all inputs at the new state to the new state. In Section 3.5,
we described such a state as a sink or sometimes a black hole. Adding a sink
to the machine of Example 4.2.1 gives the complete machine (i.e. one where all
transitions are defined) of Figure 2.

[Diagram: the machine of Figure 1 with a sink state added; an a-labelled arrow from ρ to the sink, a b-labelled arrow from τ to the sink and an a, b-labelled loop at the sink.]

Figure 2. Proposed FSM of Figure 1 with sink added.

Although we have fixed one of the problems mentioned above, the problem that δ(b, ρ) is not uniquely defined remains. We will see how to fix this later. A
commonly used convention, which we will sometimes employ, is to leave out the
sink. Under this convention, when we reach a state for which the next transition
is not defined, the word we are processing is rejected. This is usually done to keep
the diagram for the machine simpler. It can be made formal by:
(a) allowing the transition map to be a partial function (as defined in Section
1.3) rather than a function and
(b) adopting the above convention of rejecting any word that requires the use
of a transition that is not defined.
When we discuss push down automata in Chapter 6, we will nearly always use a
partial function to define transitions and employ the latter convention.

An Equivalent Definition
The more common way of defining regular grammars allows production rules of the form

A → t (RG3)

(where t is a terminal symbol) in addition to (RG1) and (RG2) rules. We have
avoided this because in our construction of a recognition machine from a regular
grammar it is not immediately obvious how to handle such rules. It turns out,
however, that from the point of view of the language generated by a grammar
(which is really all that matters) these two definitions are equivalent. A gram-
mar consisting of rules of the forms (RG1), (RG2) and (RG3) can always be

transformed to an equivalent grammar (one that generates the same language)


containing only (RG1) and (RG2) rules by:
Adding one new non-terminal symbol E
Adding the (RG2) rule E → λ.
Replacing each (RG3) rule A → t by the (RG1) rule A → tE.
You should convince yourself that this new grammar generates exactly the same
language as the original did.

4.3. Nondeterministic machines


The remaining problem with the machine in Figure 2, the fact that one of the transitions is not uniquely defined, is more fundamental. In fact, the diagram in Figure 2 describes a more general type of machine called a nondeterministic recognition machine or nondeterministic finite state automaton. The difference
between a nondeterministic recognition machine and a deterministic one is that
in the former the transition function maps a pair of an input and a state to a set
of possible states to which we are allowed to move rather than a single state to
which we must move. For the machine of Figure 1 we have the following (partial) transition function, given by the table

x         a      b
δ(x, σ)   {ρ}    {σ}
δ(x, ρ)   ∅      {ρ, τ}
δ(x, τ)   {ρ}    ∅

Recall that ∅ is used to denote the undefined entries in the table of a partial function. Here, we can either think of ∅ in the same way or in its usual interpretation as the empty set. For a nondeterministic recognition machine, an input string does not lead to a unique finishing state, so we have to decide what it actually means for the machine to accept an input. This is the key idea in defining nondeterministic machines:

When is a word accepted by a non-deterministic machine?


We say that the input is accepted if there is at least one possible choice of
transitions corresponding to the input string which ends in an accepting
state (even if there are other choices that do not end in an accepting state).

For instance, in Example 4.2.1 the input string ab could either correspond to the state sequence σ, ρ, ρ or to σ, ρ, τ. Even though the first of these does not end in an accepting state, the input string is accepted by the machine.
To summarise, for each (deterministic) recognition machine there is a regular
language consisting precisely of the words accepted by the machine and for each
regular language there is a nondeterministic recognition machine which accepts
precisely the words in the language. In fact, as we will see in the next chapter,
every nondeterministic machine has an equivalent deterministic one, so regular
languages correspond exactly to recognition machines.

4.4. Regular Expressions


We now generalise the star notation introduced in Section 3.4. For any set B of words, the notation B* denotes the set of all words (of any finite length including zero) that can be made from the words in B by simply placing words in B after one another in any order we please. The technical name for the operation of making a word w by putting together two words u and v is concatenation and we simply write w = uv. For example if u = 011 and v = 1001 then

uv = 0111001 ∈ {u, v}*, vu = 1001011 ∈ {u, v}*, uu = 011011 ∈ {u, v}*

This example illustrates that in general concatenation is not commutative: uv is not the same word as vu. By definition B* always includes the empty word λ. If we want to exclude the empty word we use the notation B+. Here are some examples using * and +:

{1}* = {1^n : n ≥ 0} = {λ, 1, 11, 111, . . . },
{11}* = {1^(2n) : n ≥ 0} = {λ, 11, 1111, 111111, . . . },
{1}+ = {1^n : n ≥ 1} = {1, 11, 111, . . . },
{1, 00}* = {λ, 1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . }
{1, 00}+ = {1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . }

Taken together with set union (which we used earlier in this section) and a rather
obvious notation for concatenation, the notation gives a remarkably powerful way
of describing languages. In fact, once they have been properly defined, expressions
using these few operations can be used to define any regular language. They are
called regular expressions and provide a third standard way to describe regular
languages (in addition to recognition machines and regular grammars).

Example 4.4.1.
(a) The language L = {0^n 1 : n ≥ 1} ∪ {0^n : n ≥ 1} of Example 4.1.1 is described by the regular expression 0{0}*1 ∪ 0{0}* or {0}+1 ∪ {0}+.
(b) L = {(01)^n : n ≥ 0} is described by the regular expression {01}*.
(c) The language of all words on the alphabet {a, b, c} containing precisely two a's and commencing with b is described by the regular expression b{b, c}*a{b, c}*a{b, c}*.


Regular expressions are used for many purposes in computer science. Although we have used set theoretic notations here to emphasise the fact that a regular expression really denotes a set of words, the versions of regular expressions used in computer science have been adapted to need only standard computer keyboard characters so, for example:
+ is used instead of ∪
Parentheses ( and ) are used in place of { and } for indicating the scope of *s and +s.

Example 4.4.2. We rewrite the regular expressions of Example 4.4.1 in computer science style notation (which may be more familiar to you).
(a) The regular expression 0{0}*1 ∪ 0{0}* would be written as 0+1 + 0+.
(b) The regular expression {01}* would be written as (01)*.
(c) The regular expression b{b, c}*a{b, c}*a{b, c}* would be written as b(b + c)*a(b + c)*a(b + c)*.


The notations may also have various abbreviations added for frequently needed items like digits, white space and alphabetic characters. One well known notation for regular expressions is the advanced text searching system known as GREP (Global Regular Expression Print) searching. This allows for much more sophisticated search and replace patterns than simple text strings or text strings with wildcards. In principle, one can use GREP patterns to search for any set of text strings that constitutes a regular language. GREP-based searching is available in many text editors and command line applications.
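Python's re module uses exactly this keyboard style of regular expression, except that alternation is written | rather than + and the character class [bc] abbreviates (b + c). As a sketch, the expressions of Example 4.4.2 can be tested with re.fullmatch, which checks whether a whole word matches a pattern:

import re

patterns = {
    "0+1|0+":             ["0", "0001", "01", "10"],    # Example 4.4.2(a)
    "(01)*":              ["", "0101", "010"],          # Example 4.4.2(b)
    "b[bc]*a[bc]*a[bc]*": ["baa", "bcacab", "bca"],     # Example 4.4.2(c)
}
for pattern, words in patterns.items():
    for w in words:
        # fullmatch returns None when the whole word does not match
        print(pattern, repr(w), bool(re.fullmatch(pattern, w)))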
Chapter Five

Deterministic Machines
Although non-deterministic recognition machines appear to be
more general, we will see that every such machine accepts the
same inputs as some deterministic machine. This machine might
be quite complicated, but there is a method of simplifying a
machine while still keeping the same accepted inputs.

References. The material in this chapter is covered in much greater detail in


Chapters 2 and 3 of Linz and in Chapter 2 of Kelley. The construction of a deterministic recognition machine from a non-deterministic one is also discussed in Section 10.5 of Johnsonbaugh. Simplification of machines using suffix sets is covered in Section 3.4 of Hopcroft and Ullman.

5.1. Equivalent deterministic and nondeterministic machines


A nondeterministic machine appears to be a more general concept than a deter-
ministic one. Nevertheless, for every nondeterministic recognition machine there
is a deterministic one which accepts precisely the same input strings. To see this
we focus on an example.

Example 5.1.1. Consider the nondeterministic machine in Figure 1, which has state set {σ, ρ, τ}, input alphabet {a, b}, initial state σ and accepting states σ and ρ. Its transitions are δ(a, σ) = {ρ}, δ(b, σ) = {σ}, δ(b, ρ) = {ρ, τ} and δ(b, τ) = {ρ}, with δ(a, ρ) = δ(a, τ) = ∅.

[Diagram: a b-labelled loop at σ, an a-labelled arrow from σ to ρ, b-labelled arrows from ρ to both ρ and τ, and a b-labelled arrow from τ back to ρ.]

Figure 1. A nondeterministic machine.

Since the output of the transition function of a nondeterministic machine is a set of states, we let the states of a new machine be all the subsets of the state set of the given one, i.e. ∅, {σ}, {ρ}, {τ}, {σ, ρ}, {σ, τ}, {ρ, τ}, {σ, ρ, τ}. The accepting states are all those subsets containing an accepting state of the original machine, the initial state is {σ} and the input alphabet is the same as originally. For any subset S of original states and any input x, the new transition function δ′ takes a non-empty set S to the set consisting of all the original states obtained from

elements of S under the input x. More formally, δ′(x, S) is the union of the sets δ(x, s) taken over all s ∈ S. Also δ′ maps the empty set to itself after any input. Hence the transition table is

x          ∅   {σ}   {ρ}      {τ}   {σ, ρ}       {σ, τ}    {ρ, τ}    {σ, ρ, τ}
δ′(x, a)   ∅   {ρ}   ∅        ∅     {ρ}          {ρ}       ∅         {ρ}
δ′(x, b)   ∅   {σ}   {ρ, τ}   {ρ}   {σ, ρ, τ}    {σ, ρ}    {ρ, τ}    {σ, ρ, τ}

and the diagram is given in Figure 2.

[Diagram: the eight subset states with the transitions from the table above; the accepting states are the subsets containing σ or ρ.]

Figure 2. Deterministic FSM equivalent to the machine of Figure 1.

The inputs accepted by the original nondeterministic machine are all accepted by the new machine. For example, bbabb is accepted by the original machine through the path

b   b   a   b   b
σ   σ   σ   ρ   ρ   ρ

This gives rise to the path

b     b     a     b        b
{σ}   {σ}   {σ}   {ρ}   {ρ, τ}   {ρ, τ}

in the new machine, so that bbabb is also accepted by the new machine. (Note that in each element of the path, the state set in the new machine contains the original state, so that the final state will be accepting.)
It is also true that any input accepted by the new machine will also be accepted by the old one. For example, the new machine accepts abbb through the path

a     b        b        b
{σ}   {ρ}   {ρ, τ}   {ρ, τ}   {ρ, τ}

and in the old machine, abbb is accepted by the path

a   b   b   b
σ   ρ   ρ   ρ   ρ


The arguments used in this example can be applied generally to show that every nondeterministic recognition machine has an associated deterministic one which accepts precisely the same inputs. For later reference we can now summarise the results of the previous chapter and this section in the form of a theorem:
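The general construction is short enough to write out in full. Here is a minimal Python sketch of the subset construction, with the states σ, ρ and τ of Example 5.1.1 written as the ASCII letters s, r and t:

from itertools import chain, combinations

def subset_construction(states, alphabet, ndelta, start, accepting):
    """Build the deterministic machine whose states are sets of old states."""
    subsets = [frozenset(c) for c in chain.from_iterable(
        combinations(states, n) for n in range(len(states) + 1))]
    delta = {(a, S): frozenset(q for s in S for q in ndelta.get((a, s), ()))
             for a in alphabet for S in subsets}
    accepting2 = {S for S in subsets if S & accepting}
    return delta, frozenset({start}), accepting2

# The nondeterministic machine of Example 5.1.1 (sigma, rho, tau as s, r, t).
ndelta = {("a", "s"): {"r"}, ("b", "s"): {"s"},
          ("b", "r"): {"r", "t"}, ("b", "t"): {"r"}}
delta, start, acc = subset_construction("srt", "ab", ndelta, "s", {"s", "r"})
print(sorted(delta[("b", frozenset("r"))]))   # ['r', 't']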

Theorem 5.1. A language is regular precisely if it is the language of some deter-


ministic recognition machine.

5.2. Simplifying deterministic machines


If we are given a regular language, specified by production rules, then we can
use the theory of Section 4.2 to construct a (possibly) nondeterministic machine
accepting the given language and can then (if necessary) use the theory of Section
5.1 to construct a deterministic machine doing the same job. However the answer
might be very complicated, with many states. We need techniques for making a
simpler machine accepting the same inputs. The techniques we discuss in the rest
of this chapter are, of course, useful in many other situations as well.
The first step is to remove any inaccessible states from the machine (and any transitions to or from them). These are states that can never be reached from the initial state. For the machine illustrated in Figure 2 for example, the states {τ}, {σ, ρ}, {σ, τ} and {σ, ρ, τ} are clearly inaccessible. Removing them gives the machine shown in Figure 3, which clearly has the same language as the original machine.

[Diagram: the four accessible states {σ}, {ρ}, {ρ, τ} and ∅; a b-labelled loop at {σ}, an a-labelled arrow from {σ} to {ρ}, a b-labelled arrow from {ρ} to {ρ, τ}, a b-labelled loop at {ρ, τ}, a-labelled arrows from {ρ} and {ρ, τ} to ∅ and an a, b-labelled loop at ∅.]

Figure 3. Machine of Figure 2 with inaccessible states removed.

So we now suppose M is a deterministic machine with no inaccessible states and input alphabet Σ. Calculating the suffix sets of all of the states of M gives us a way to decide whether there is a smaller machine equivalent to M. If there are fewer suffix sets than states we can construct a smaller machine M′ that is equivalent to M.

Example 5.2.1. Although our example from Figure 3 has four vertices, you can check that it only has three distinct suffix sets:

S(∅) = ∅
S({ρ}) = S({ρ, τ}) = {b^n : n ≥ 0} (accepting)
S({σ}) = {b^m ab^n : m, n ≥ 0} ∪ {b^n : n ≥ 0} (initial & accepting)

Observe that the suffix set S({ρ}) of the accepting state {ρ} contains the empty string λ. A moment's thought should convince you that λ ∈ S(q) precisely in the case where q is accepting.

The construction goes as follows:
(a) The states of M′ are the suffix sets themselves.
(b) The initial state of M′ is S(q0) where q0 is the initial state of the original machine.
(c) The accepting states of M′ are the states S(q) where q is an accepting state in the original machine.
(d) The transition function δ′ of M′ is defined for states q and inputs a by

δ′(a, S(q)) = S(δ(a, q)).

Although this construction may seem a little abstract, it is really no worse than
the construction of the previous section. There we used subsets of the set of states
of the original machine to construct our new machine. Here we use the suffix sets
of the states of the original machine.

Theorem 5.2. The suffix set machine M′ constructed above is deterministic, equivalent to M and has the least possible number of states for such a machine (we say M′ is minimal).

Applying this construction to our example from Figure 3, we see that the states are the three distinct suffix sets S(∅), S({ρ}) = S({ρ, τ}) and S({σ}) calculated in Example 5.2.1, the initial state is S({σ}), the accepting states are S({σ}) and S({ρ}) and, according to (d) above, the new transition function δ′ may be calculated using the transition function δ for the machine of Figure 3:

δ′(a, S({σ})) = S(δ(a, {σ})) = S({ρ})
δ′(b, S({σ})) = S(δ(b, {σ})) = S({σ})
δ′(a, S({ρ})) = S(δ(a, {ρ})) = S(∅)
δ′(b, S({ρ})) = S(δ(b, {ρ})) = S({ρ, τ}) = S({ρ})
δ′(a, S({ρ, τ})) = S(δ(a, {ρ, τ})) = S(∅)
δ′(b, S({ρ, τ})) = S(δ(b, {ρ, τ})) = S({ρ, τ}) = S({ρ})

giving the minimal deterministic machine shown in Figure 4.

[Diagram: S({σ}) (initial and accepting) has a b-labelled loop and an a-labelled arrow to S({ρ}) (accepting), which has a b-labelled loop and an a-labelled arrow to the sink S(∅), which has an a, b-labelled loop.]

Figure 4. Minimal deterministic FSM equivalent to FSM of Figure 2.

We may also construct our new machine M′ by partitioning the states of the machine M (without inaccessible states) into equivalence classes. We regard states as equivalent if they have the same suffix set and say that they are suffix equivalent. The equivalence class of the initial state is the initial state of M′, the accepting states of M′ are the equivalence classes of the accepting states of M and the transition function δ′ is given by the rule

δ′(a, [q]) = [δ(a, q)]
for states q and inputs a. This is obviously just another way of carrying out the
above construction, but it emphasizes the key problem that needs to be solved:
Which vertices are suffix equivalent?

5.3. An algorithm for finding suffix equivalence classes


Although the method of the previous section is theoretically valid, it is not always
practical. For a large complicated machine it is not as easy to calculate the
suffix sets as our example might suggest. Fortunately, there are algorithms that
calculate the suffix equivalence classes without having to calculate the suffix sets
explicitly. We now present an algorithm for this purpose which actually works by
determining which pairs of states are not suffix equivalent.

To keep track of this information we use a table with a cell for each pair {p, q} of distinct states. An example is shown below for a machine with states A, B, C, D, E and F. The dashes indicate cells we don't need (because we only need one cell for each pair of distinct vertices). Our algorithm will place an X in the cell for every pair of distinct states that are not suffix equivalent.

   A  B  C  D  E
B     -  -  -  -
C        -  -  -
D           -  -
E              -
F

Algorithm 5.1.
Initialization: We observed in Example 5.2.1 that λ ∈ S(q) precisely in the case where q is accepting. This shows an accepting state never has the same suffix set as a non-accepting one. Therefore:

Initialize the table by placing an X in the cell for every pair {p, q} where
q is accepting and p is not.

Loop stage: Suppose at some stage we have states p and q and an input a for which δ(a, p) and δ(a, q) are distinct and the cell for the pair {δ(a, p), δ(a, q)} already has an X. This means δ(a, p) and δ(a, q) are not suffix equivalent, so there must be either a word w ∈ S(δ(a, p)) that is not in S(δ(a, q)) or vice-versa. If such a w exists then aw ∈ S(p), but aw can't be in S(q) because this would mean w ∈ S(δ(a, q)). Thus S(p) ≠ S(q). In the vice-versa case, we again have S(p) ≠ S(q) for a similar reason. Therefore:

Make repeated passes through all table cells that do not yet contain an X, placing an X in the cell for {p, q} if there is an input a such that the cell for the pair {δ(a, p), δ(a, q)} already has an X.

Stopping criterion: We may need to go through the table many times because
a pass that adds at least one X may be setting up the scene for adding more Xs
on the next pass. Therefore:

Continue until a pass is carried out that adds no new Xs.

Calculating the equivalence relation: If the cell for {p, q} does not have an X after the loop has finished, p and q must be suffix equivalent. Since suffix equivalence is an equivalence relation, we can easily find the equivalence classes using Algorithm 2.1. Therefore:

Let P0 be the set of pairs {p, q} for which the cell does not contain an
X. Apply Algorithm 2.1 to obtain a partition P . This is the set of suffix
equivalence classes.

Once the equivalence classes are calculated the simplified machine is defined in
the manner discussed at the end of the previous section.
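Algorithm 5.1 is also straightforward to program. The Python sketch below marks inequivalent pairs exactly as described; rather than reproduce the machine of the next example, it is run on a small hypothetical three-state machine of our own in which S0 and S2 turn out to be suffix equivalent.

from itertools import combinations

def inequivalent_pairs(states, alphabet, delta, accepting):
    """Algorithm 5.1: return the pairs of states that are NOT suffix equivalent."""
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepting) != (p[1] in accepting)}   # initialization
    changed = True
    while changed:                    # keep passing until no new X is added
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in alphabet:
                image = frozenset((delta[(a, p)], delta[(a, q)]))
                if len(image) == 2 and image in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return marked

# A hypothetical machine accepting words ending in 1; S2 duplicates S0.
delta = {("0", "S0"): "S2", ("1", "S0"): "S1", ("0", "S1"): "S0",
         ("1", "S1"): "S1", ("0", "S2"): "S0", ("1", "S2"): "S1"}
print(inequivalent_pairs(["S0", "S1", "S2"], "01", delta, {"S1"}))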

Example 5.3.1. The initialization stage and loop passes for the deterministic recognition machine in Figure 5 are shown in the following tables.

[Diagram: a six-state machine over the input alphabet {0, 1} with states A, B, C, D, E and F, initial state F and accepting states A and D.]

Figure 5. An unnecessarily complicated recognition machine.

Initialization:

   A  B  C  D  E
B  X  -  -  -  -
C  X     -  -  -
D     X  X  -  -
E  X        X  -
F  X        X

First Loop Pass:

   A  B  C  D  E
B  X  -  -  -  -
C  X  X  -  -  -
D     X  X  -  -
E  X     X  X  -
F  X  X     X  X

The second loop pass adds no new Xs, so its table is identical to that of the first pass.
Initialization is straightforward. The first loop pass adds 4 new Xs. For example, an X is added for the pair {B, F} since δ(0, B) = A, δ(0, F) = E and the cell for {A, E} already has an X. Since the second loop pass adds no new Xs, no further passes are made.
The final table shows that the pairs {A, D}, {B, E} and {C, F} are suffix equivalent. This is already a partition (although it might not be for some machines). The equivalence classes are therefore [A] = {A, D}, [B] = {B, E} and [C] = {C, F}. These are the states of our simplified machine. Since F ∈ [C] the initial state is [C]. Since [A] contains all accepting states from the original machine it is the only accepting state in the new machine. Using the formula

δ′(a, [q]) = [δ(a, q)]

from the previous section to calculate the transition function δ′, we obtain the new recognition machine shown in Figure 6.

[Diagram: the simplified three-state machine with states [A], [B] and [C], initial state [C] and accepting state [A].]

Figure 6. Simplified deterministic machine equivalent to the machine of Figure 5

5.4. Designing machines from language descriptions


The ideas of Section 5.2 can also be used to design minimal machines that accept a language described using the notations introduced in Section 3.4. For any language L with alphabet Σ, we can define the suffix set of an element w of Σ* (recall that this means the set of all words made from Σ) by

SL(w) = {z ∈ Σ* : wz ∈ L}

so that SL(w) is the set of all possible words made from Σ that can be concatenated with w to give a word in L. In many cases SL(w) will be empty. In many cases, SL(w) will be infinite. For any language L we have SL(λ) = L (can you see why?).

Example 5.4.1. For the language

L = {1, 00}* = {λ, 1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . }

of Example 4.4.1, blocks of 0s must be of even length. Hence no word in L can begin with 01 and therefore SL(01) = ∅. On the other hand, one may check that SL(1) = L, which is of course infinite.

Even though the individual suffix sets for a language are often infinite, it could still be the case that there are only finitely many of them (i.e. only finitely many distinct suffix sets), because there may be many, many different words in Σ* that have the same suffix set.

Example 5.4.2. The language L = 0{11}* = {01^(2n) : n ≥ 0} has four distinct suffix sets SL(1) = ∅, SL(λ) = L, SL(0) and SL(01) since:
SL(w) = SL(1) = ∅ ⟺ w starts with 1 or has a 0 anywhere other than in first place.
SL(w) = L ⟺ w = λ.
SL(w) = SL(0) = {1^(2n) : n ≥ 0} ⟺ w = 01^n for some even n ≥ 0.
SL(w) = SL(01) = {1^n : n is odd} ⟺ w = 01^n for some odd n ≥ 1.
Since any word in {0, 1}* that does not start with 1 or have a 0 anywhere other than in first place must be of the form 01^k for some k ≥ 0, these cases cover all possibilities.

It is no coincidence that the language L of the previous example only has finitely
many distinct suffix sets. In fact it is a consequence of the following theorem
which gives yet another characterization of regular languages.

Theorem 5.3. A language is regular precisely if it has finitely many suffix sets.

Part of the proof of this theorem involves showing how to construct a recognition machine for a language L with finitely many suffix sets. The construction is simple and is very similar to the simplification construction discussed in Section 5.2:
The states of the machine are the suffix sets of L.
The initial state is SL(λ) = L.
The accepting states are the states of the form SL(w) where w ∈ L.
The transition function is defined for w ∈ Σ* and a ∈ Σ by
δ(a, SL(w)) = SL(wa).
According to a theorem similar to Theorem 5.2, the machine constructed in this way is guaranteed to be minimal and deterministic.

Example 5.4.3. Applying this construction to the language L = {01^(2n) : n ≥ 0} of Example 5.4.2 gives:
Our calculations from Example 5.4.2 show that the states of the machine are SL(1) = ∅, SL(λ) = L, SL(0) and SL(01).
The initial state is SL(λ) = L as always.
The only accepting state is SL(0).
We calculate the values for the transition function using the formula:
δ(0, SL(λ)) = SL(0)
δ(1, SL(λ)) = SL(1)
δ(0, SL(1)) = SL(10) = SL(1) since 10 starts with 1.
δ(1, SL(1)) = SL(11) = SL(1) since 11 starts with 1.
δ(0, SL(0)) = SL(00) = SL(1) since 00 has a 0 in second place.
δ(1, SL(0)) = SL(01).
δ(0, SL(01)) = SL(010) = SL(1) since 010 has a 0 in third place.
δ(1, SL(01)) = SL(011) = SL(0).
The diagram for this machine is shown in Figure 7. You should check that its language is in fact L. We could simplify the machine by omitting the sink SL(1).

[Diagram: a 0-labelled arrow from SL(λ) to SL(0); 1-labelled arrows from SL(0) to SL(01) and from SL(01) back to SL(0); arrows into the sink SL(1), labelled 1 from SL(λ) and 0 from SL(0) and SL(01); a 0, 1-labelled loop at SL(1).]

Figure 7. Suffix set machine accepting L = {01^(2n) : n ≥ 0}.
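The machine of Figure 7 can be checked mechanically. A small Python sketch (the state names are our own abbreviations) runs it on a few words:

# The suffix set machine of Example 5.4.3; states are named after their
# suffix sets, with S_lam standing for SL(lambda) and S_1 for the sink SL(1).
DELTA = {("0", "S_lam"): "S_0", ("1", "S_lam"): "S_1",
         ("0", "S_0"): "S_1",   ("1", "S_0"): "S_01",
         ("0", "S_01"): "S_1",  ("1", "S_01"): "S_0",
         ("0", "S_1"): "S_1",   ("1", "S_1"): "S_1"}

def accepts(word):
    """Run the machine; the only accepting state is S_0."""
    state = "S_lam"
    for a in word:
        state = DELTA[(a, state)]
    return state == "S_0"

for w in ["0", "011", "0111", "01111", ""]:
    print(repr(w), accepts(w))   # True, True, False, True, False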


Chapter Six

Machines with Memory


Although regular languages can be powerful and useful, not all
languages are regular. In fact, most modern programming lan-
guages are not regular. First we will consider why this is so and
see how to obtain a more powerful class of machines capable of
recognising a larger class of languages by adding a limited type
of memory in the form of a stack.

References. The material in this chapter is covered in much greater detail in


Chapter 7 of Linz, Chapter 3 of Kelley and Chapter 5 of Hopcroft and Ullman.

6.1. Non-Regular Languages


We have seen that regular languages are quite useful and powerful. For example,
in Section 3.4 we saw that they may be used for advanced text searching. Given
the ease with which they may be described, it would be a wonderful world indeed
if all useful languages turned out to be regular. Unfortunately this is not the case.
For reasons we will discuss shortly, most modern programming languages are not
regular. Let's first consider why even the very simple language

L = {0^n 1^n : n ≥ 0}

fails to be regular. Many text books use a pumping lemma to prove this, but Theorem 5.3 allows us to take an easier approach. We simply show that L has infinitely many distinct suffix sets (as defined in Section 5.4), so L cannot be regular by Theorem 5.3. For each n ≥ 1 it is easy to see that the only word in {0, 1}* that can follow 0^n 1 is 1^(n−1), so SL(0^n 1) = {1^(n−1)}. Since these sets are different for each n ≥ 1, we have found an infinite number of distinct suffix sets, proving L is not regular. Notice that we didn't need to find all of the suffix sets (there are others) to do this. It was enough to find an infinite set of distinct ones.

Example 6.1.1. (a) To show L = {0^m 1^n : m > n ≥ 0} is not regular, observe that SL(0^m 1) = {λ} ∪ {1^n : 1 ≤ n ≤ m − 2} for each m ≥ 2, so the sets
SL(001) = {λ}, SL(0001) = {λ, 1}, SL(00001) = {λ, 1, 11}, . . .
are all distinct. The fact that these suffix sets are not disjoint does not matter. We only need to know that they are distinct.
(b) Recall from Section 3.4 that we denote the number of 0s in a word w in {0, 1}* by n0(w) and the number of 1s by n1(w). To show that
L = {w ∈ {0, 1}* : n0(w) = n1(w)}
is not regular, observe that
SL(1^m) = {w ∈ {0, 1}* : n0(w) = n1(w) + m}
for each m ≥ 0, so the sets SL(1), SL(11), SL(111), . . . are all disjoint.
(c) Showing the language M = {(^m )^m : m ≥ 0} of matched parentheses is not regular is much the same as for the language L = {0^n 1^n : n ≥ 0} discussed above. Here the sets SM((^n)) = {)^(n−1)} are disjoint for each n ≥ 1.


Example 6.1.1(c) suggests one reason why most modern programming languages are not regular. They generally allow arithmetic expressions like
a + b, (a + b) × 4 and ((a − b) + (b/(a × a)))
where the parentheses must match. To illustrate this, the next example will give a grammar for a limited language of arithmetic expressions of this type and show that it is not regular. We first introduce the alternation convention for writing the production rules in a grammar. This convention is designed to reduce the amount of writing and allows us to combine production rules like
A → tB
A → rS
A → λ
with the same left hand side into a single expression as

A → tB | rS | λ.

This is just a more compact way of writing several similar rules at once and means exactly the same thing. It may be read as replace A with either tB or rS or λ.

Example 6.1.2. To keep our language L of simple arithmetic expressions simple, we limit ourselves to just four single letter variables, two operations and allow no numerical constants. The grammar for L is defined as follows:
Terminal symbols T = {a, b, c, d, +, ×, (, )}
Non-terminal symbols N = {σ, E}
Starting symbol σ
Production rules

σ → E + E | E × E

E → (E + E) | (E × E) | a | b | c | d

It is easy to see that this grammar produces parenthesized arithmetic expressions involving the variables a, b, c and d like
a + c, (a + b) × c, d + (a × a)
and so on. Derivations for these expressions illustrate this:
σ ⇒ E + E ⇒ a + E ⇒ a + c
σ ⇒ E × E ⇒ (E + E) × E ⇒ (a + E) × E ⇒ (a + b) × E ⇒ (a + b) × c
σ ⇒ E + E ⇒ E + (E × E) ⇒ d + (E × E) ⇒ d + (a × E) ⇒ d + (a × a)

The form of the productions E → (E + E) | (E × E) guarantees that the parentheses will be balanced. To show that this language is not regular, observe first that for each m ≥ 1 the suffix set S((^m) is not empty since, for example, it contains the word

a +a) +a) · · · +a)

consisting of an a followed by m copies of +a). Second, observe that for any w ∈ S((^m), the fact that left and right parentheses must match means that

n)(w) = n((w) + m.

But this means the sets S((^m) are disjoint for different m, giving infinitely many distinct suffix sets, so L cannot be regular.


We used the requirement of matching parentheses to show that the language L of Example 6.1.2 cannot be regular. Another fundamental feature of L that cannot be realised in a regular grammar is the arbitrary nesting of expressions. We can expand any E in a derivation to obtain any expression in L and thus nest expressions to arbitrary depth. Regular languages do not possess this kind of recursive structure. This is a further reason why most modern programming languages are not regular. They almost always allow arbitrary nesting of control constructs like if-then-else statements and loops inside each other.

6.2. Stacks
The reason recognition machines are unable to recognise languages like
L = {0^n 1^n : n ≥ 0}
is that they only have a very limited form of memory. To process elements of L,
we would need some way of keeping track of how many 0s we have encountered
so that we can make sure the number of 1s is the same. In general, there is no
way of doing this with a finite state recognition machine. Since the language L we have discussed in this section is not regular, it cannot be generated by any regular grammar. It is very easy, however, to give a non-regular grammar that generates it. The terminal symbols are of course 0 and 1 and we need only one non-terminal symbol σ (which therefore must be the starting symbol). The production rules are

σ → 0σ1 | λ.

It is easy to check that this simple little grammar generates L.

The solution is to add a rudimentary form of memory to our machine in the form of a stack. A stack is a simple data structure that permits only a simple pair of operations: adding elements at one end and removing them at the same end. Like a machine, the stack has an alphabet of allowed symbols. A stack containing the elements a1, a2, . . . , an can be pictured as a pile of symbols:

an        (top of stack)
an−1
. . .
a2
a1        (bottom of stack)

The key feature of this kind of memory is that we only have access to one element of the stack at any point in time: the element most recently added, called the top of the stack. Removing this element, an action called popping, makes it available for use in computation and removes it from the stack; the element below it then becomes the top of the stack and hence becomes available to be popped. The reverse action, putting items on the stack, is called pushing. The item at the other end of the stack is called the bottom.
Stacks are used for a huge range of purposes in computer science. We may think of a stack as the most rudimentary form of memory or storage available in a computation. In the next section, we will use them to define a more powerful type of machine capable of recognising all of the examples of non-regular languages from the previous section. Let's see how we can use a stack to solve the problem of deciding whether an input word is in one of these languages.

Example 6.2.1. We use a stack to decide whether an input word w ∈ {0, 1}* is in the language

L = {w ∈ {0, 1}* : n0(w) = n1(w)}

of Example 6.1.1(b). Our stack alphabet in this case is {0, 1, z} and we start with a stack containing a single z. The purpose of the initial stack symbol z is to alert us if we reach the bottom of the stack. The strategy is to add 0s and 1s to the stack in such a way that they cancel each other out, so that if n0(w) = n1(w) we should be left with just z on the stack after the input word is processed.

[Diagram: eleven snapshots of the stack, one for each prefix of the input word 0110000111.]

Figure 1. Using a stack to decide whether n0(w) = n1(w).

More precisely, we process each input symbol a in turn, comparing it to the element s popped from the stack and:
If a ≠ s and s ≠ z we do nothing since the 0 and the 1 cancel out (remember that popping has already removed s from the stack).
If s = z we push z and then a onto the stack, because there is nothing to cancel and we must replace the popped element z and then add a to the stack for future cancellation.
If a = s (and therefore s ≠ z) we push two copies of a onto the stack, because we must replace the popped element and add a new one to account for the input symbol.
If at the end of this processing the top element of the stack is z, the word is
accepted. If not the word is rejected.
To see how this works, consider the processing of the word w = 0110000111.
Figure 1 shows how the stack evolves as each input symbol is processed. This
word is accepted since the top element of the stack is z after the last input symbol
has been processed. 
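The whole procedure takes only a few lines of Python. A sketch, representing the stack as a list whose last element is the top:

def counts_balanced(word):
    """The stack algorithm of Example 6.2.1: accept iff n0(w) = n1(w)."""
    stack = ["z"]                 # start with the initial stack symbol
    for a in word:
        s = stack.pop()           # pop the top of the stack
        if s == "z":
            stack += ["z", a]     # nothing to cancel: restore z, push a
        elif s == a:
            stack += [s, a]       # same symbol: restore it and add a
        # otherwise a and s cancel: s stays popped and a is discarded
    return stack[-1] == "z"

print(counts_balanced("0110000111"))   # True  (the word of Figure 1)
print(counts_balanced("0100"))         # False (three 0s, one 1)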

Although the vertical representation of a stack shown in the diagrams is a helpful


way to visualize how a stack works, we will find a horizontal representation much
more convenient and compact. In fact, we simply write the contents of the stack
as a string of symbols. We adopt the convention of writing the top element of the stack at the left. Using this convention we can write down the evolution of the stack shown in Figure 1 as:

z → 0z → z → 1z → z → 0z → 00z → 000z → 00z → 0z → z
and if we want to emphasize how the input symbols drive the evolution of the
stack:
0 1 1 0 0 0 0 1 1 1
z 0z z 1z z 0z 00z 000z 00z 0z z.

6.3. Push down automata


We obtain a new and more powerful class of machines by adding a stack to a finite
state machine in such a way that as each input symbol is processed, the top of
the stack is popped and may be used as well as the input symbol and the current
state to decide:
(a) Which state to move to next.
(b) What to push onto the stack.
Machines of this kind are called push down automata (PDA), so named because
stacks are often referred to as push-down data structures. Formally speaking a
(deterministic) push down automaton is specified as follows:
Q = {q1, . . . , qr} is the set of states.
Σ = {a1, . . . , am} is the input alphabet.
Γ = {z, b1, . . . , bn} is the stack alphabet.
δ : Σ × Q × Γ → Q × Γ* is the partial transition function.
F ⊆ Q is the set of accepting states.
q0 ∈ Q is the initial state.
z ∈ Γ is the initial stack symbol.
Most items in this description are familiar from our study of finite state recognition
machines and we already met the stack alphabet and the initial stack symbol in the
previous section. The difference that requires explanation is the more complicated form of the partial transition function δ. We will explain shortly why it is always a partial function. For a PDA, it needs to take three arguments:
The current input symbol (as always),
The current state (as always),
The symbol popped from the stack (which may now influence the result).
Instead of a single state, the partial transition function for a PDA returns an
ordered pair consisting of:
the state to move to next,
the word to be pushed onto the stack.
For reasons already apparent in Example 6.2.1, we typically need the ability to
push more than one symbol onto the stack at each move. There we often needed
to replace the element that had just been popped (remember that the action of
popping removed it from the stack) and then add a new one. Sometimes we may
even need to push more than two elements. It is also convenient in some cases to
start with more than just the initial stack symbol on the stack.

Example 6.3.1. Suppose we want our PDA to move from state p to state q when the input is 0 and the stack top is 1. If we wish to replace the popped 1 and then add the input symbol 0 to the top of the stack, the transition formula would be δ(0, p, 1) = (q, 01) (since the top of the stack is written on the left, we push 01, not 10). In the notation introduced in the previous section, we can write this stack transition as:

1bn · · · b1z -0-> 01bn · · · b1z

where bn · · · b1z is the prior contents of the stack at this point. We extend this notation to show state and stack transitions using ordered pair notation:

(p, 1bn · · · b1z) -0-> (q, 01bn · · · b1z).

To erase the top of the stack, the transition formula would be δ(0, p, 1) = (q, λ) and the effect of the transition would be written as

(p, 1bn · · · b1z) -0-> (q, bn · · · b1z).

This notation shows how the internal configuration (the state and stack contents) of the PDA changes during a computation. We consequently call this configuration notation.

We define the transitions for a PDA using a partial function. This is because
most PDAs we construct would otherwise need a sink to cope with all of the un-
wanted combinations of input symbol, state and stack top, making their descrip-
tion unnecessarily complicated. With so many such combinations to consider, the
description of a full transition function is tedious and typically contains lots of
rows where all entries go to a sink. For these reasons, the description of PDAs
using either a directed graph or transition function table is greatly simplified by
omitting sinks and using a partial function.
The fact that the transition function now takes three arguments instead of two is inconvenient when writing down a transition table. We are forced to combine two of the arguments on one axis. We adopt the convention of writing the input symbols along the top of the table and all possible combinations of the state and stack symbol down the side. Entries in the transition table are ordered pairs of the form (state, word). We use the symbol ∅ to indicate, where necessary, places where the transition is undefined. The reasons for this notation will become clearer in Section 6.4.

Example 6.3.2. The following PDA accepts L = {0^n 1^n : n ≥ 0}, our first example of a non-regular language.
Q = {q0, q1, q2, q3}.
Σ = {0, 1}.
Γ = {z, 0}.
F = {q0, q3}.
Initial state q0.
Initial stack symbol z.

x             0           1
δ(x, q0, z)   (q1, z)     ∅
δ(x, q1, z)   (q1, 0z)    (q3, z)
δ(x, q1, 0)   (q1, 00)    (q2, λ)
δ(x, q2, z)   ∅           (q3, z)
δ(x, q2, 0)   ∅           (q2, λ)

Since L contains the empty word, q0 is an accepting state. If the first input is 0 we move to q1 and leave the stack unchanged. While the input consists of consecutive 0s, we stay at q1, adding a 0 to the stack for each input 0. Since we didn't push the first 0, the stack always contains one less 0 than the number of 0s we have processed.
When an input 1 occurs, we move to q2, erasing the 0 on the top of the stack, and remain there while the input word contains consecutive 1s, erasing a 0 for each input 1. If we are at q1 or q2 with stack top z and the next input is 1, the input processed so far is of the form 0^n 1^(n−1), because we have erased precisely n − 1 of the 0s; this input 1 completes the word 0^n 1^n ∈ L, so we move to the accepting state q3. Any further input gives a word that is not in L, so there are no transitions defined from q3, which means that any such word is rejected.
We illustrate the operation of this PDA using the configuration notation introduced in Example 6.3.1 for various words (as in the transition table, ∅ indicates an undefined transition):

w = 01 :   (q0, z) -0-> (q1, z) -1-> (q3, z) (accept)
w = 001 :  (q0, z) -0-> (q1, z) -0-> (q1, 0z) -1-> (q2, z) (reject)
w = 0011 : (q0, z) -0-> (q1, z) -0-> (q1, 0z) -1-> (q2, z) -1-> (q3, z) (accept)
w = 0010 : (q0, z) -0-> (q1, z) -0-> (q1, 0z) -1-> (q2, z) -0-> ∅ (reject)
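Configuration runs like these can be generated automatically. Below is a sketch of a simulator for deterministic PDAs in Python, with the stack written as a string whose first symbol is the top (as in our horizontal notation) and the table of Example 6.3.2 supplied as data:

def run_pda(word, delta, start, accepting):
    """Trace a deterministic PDA; a missing table entry rejects the word."""
    state, stack = start, "z"
    for a in word:
        key = (a, state, stack[0])    # (input, state, popped stack top)
        if key not in delta:
            return False              # undefined transition
        state, push = delta[key]
        stack = push + stack[1:]      # replace the popped top by the push word
    return state in accepting

# delta(x, q, s) = (state, word), with lambda written as the empty string.
delta = {("0", "q0", "z"): ("q1", "z"),  ("0", "q1", "z"): ("q1", "0z"),
         ("1", "q1", "z"): ("q3", "z"),  ("0", "q1", "0"): ("q1", "00"),
         ("1", "q1", "0"): ("q2", ""),   ("1", "q2", "z"): ("q3", "z"),
         ("1", "q2", "0"): ("q2", "")}
for w in ["", "01", "001", "0011", "0010"]:
    print(repr(w), run_pda(w, delta, "q0", {"q0", "q3"}))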

We can also draw directed graph representations of push down automata in a
similar way to those for finite state automata. The difference is that when we
label the edges representing the transitions, we also need to show
(a) how the popped element determines the transition,
(b) what gets pushed at the transition.
We can take care of (a) by drawing an edge coming out of every state for every possible ordered pair of the form (input symbol, stack symbol), i.e., for every (a, b) ∈ Σ × Γ. As usual, we don't draw multiple edges between the same two states, instead labelling one edge with the information for all of the transitions between the states. This makes the diagram simpler.

[Diagram: an edge from p to q labelled (a, b) ↦ u, (c, d) ↦ v and an edge from r to s labelled (e, f) ↦ w.]

Figure 2. Labelling the edges in the directed graph for a PDA.

We can take care of (b) above using a similar notation to that for stack evolution introduced at the end of Section 6.2. For example, edge labellings for the transitions δ(a, p, b) = (q, u), δ(c, p, d) = (q, v) and δ(e, r, f) = (s, w) are shown in Figure 2. The directed graph for the PDA of Example 6.3.2 is shown in Figure 3.

[Diagram: initial state q0 with an edge to q1 labelled (0, z) ↦ z; loops at q1 labelled (0, z) ↦ 0z and (0, 0) ↦ 00; an edge from q1 to q2 labelled (1, 0) ↦ λ; a loop at q2 labelled (1, 0) ↦ λ; edges from q1 and q2 to q3 labelled (1, z) ↦ z.]

Figure 3. Directed graph for the PDA of Example 6.3.2.

So far in our discussion of PDAs we have applied the same criterion for acceptance
of words as we used for finite state automata, namely, that processing the word
takes us to an accepting state. Another criterion often used to determine whether
a PDA accepts an input word is that the stack should be empty after the word
is processed. By saying that the stack is empty, we really mean that the stack
contains only the initial stack symbol z (or more accurately, since we are only
allowed to look at the top of the stack, z is on the top of the stack). In fact, we
used this criterion in Example 6.2.1. We call this criterion acceptance on empty
stack and when defining a machine we will adopt the convention that this criterion
is used whenever the set F of accepting states for a PDA is empty. Acceptance
on empty stack frequently leads to simpler machines.

Example 6.3.3. Using the acceptance on empty stack criterion, we construct a PDA that accepts the same language as that of Example 6.3.2, but has only two states:
Q = {q0, q1}.
Σ = {0, 1}.
Γ = {z, 0}.
F = ∅.
Initial state q0.
Initial stack symbol z.

x             0           1
δ(x, q0, z)   (q0, 0z)    ∅
δ(x, q0, 0)   (q0, 00)    (q1, λ)
δ(x, q1, 0)   ∅           (q1, λ)
The directed graph for this PDA is shown in Figure 4. Not only does it have fewer states, but it also avoids the annoying requirement that the number of 0s placed on the stack be one less than the number processed.


[Diagram: loops at q0 labelled (0, z) ↦ 0z and (0, 0) ↦ 00; an edge from q0 to q1 labelled (1, 0) ↦ λ; a loop at q1 labelled (1, 0) ↦ λ.]

Figure 4. Directed graph for the PDA of Example 6.3.3.

Example 6.3.4. The PDA described below implements the stack algorithm of Example 6.2.1. The language is
L = {w ∈ {0, 1}* : n0(w) = n1(w)}.
Q = {q0}.
Σ = {0, 1}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

x             0           1
δ(x, q0, z)   (q0, 0z)    (q0, 1z)
δ(x, q0, 0)   (q0, 00)    (q0, λ)
δ(x, q0, 1)   (q0, λ)     (q0, 11)
Using acceptance on empty stack has enabled us to construct a PDA with only one state! Although we could draw a directed graph for this PDA, a moment's thought should convince you that directed graphs for single state machines are not very informative. We can express the processing of the word w = 0110000111 of Section 6.2 in configuration notation:

(q0, z) -0-> (q0, 0z) -1-> (q0, z) -1-> (q0, 1z) -0-> (q0, z) -0-> (q0, 0z) -0-> (q0, 00z) -0-> (q0, 000z) -1-> (q0, 00z) -1-> (q0, 0z) -1-> (q0, z).


Although the fact that the PDA of Example 6.3.4 has only a single state may seem rather strange at first sight, it can be shown that for any PDA there is an equivalent one using acceptance on empty stack with only one state. In our final example of this section, we use the notation w^R to denote the reverse of a word w. This is just the word w written in reverse order. For example:
001^R = 100, 0101^R = 1010, 1001^R = 1001.
Words like 1001 that satisfy the condition w = w^R are known as palindromes. Palindromes of odd length are always of the form ucu^R for some word u and symbol c. Palindromes of even length are of the form vv^R for some word v.

Example 6.3.5. The language

L = {w2w^R : w ∈ {0, 1}*}

consists of palindromes in {0, 1, 2}* of a special form. They have odd length and the middle symbol 2 does not appear anywhere else, so we can easily detect when we have reached the middle of the word. The following PDA accepts this language:
Q = {q0, q1}.
Σ = {0, 1, 2}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

x             0           1           2
δ(x, q0, z)   (q0, 0z)    (q0, 1z)    (q1, z)
δ(x, q0, 0)   (q0, 00)    (q0, 10)    (q1, 0)
δ(x, q0, 1)   (q0, 01)    (q0, 11)    (q1, 1)
δ(x, q1, 0)   (q1, λ)     ∅           ∅
δ(x, q1, 1)   ∅           (q1, λ)     ∅

Before we encounter an input 2, we stay at q0 adding the input symbols to the stack. When a 2 occurs we move to q1. For an input word w2w^R ∈ L, the stack should now contain the w part. We then compare each new input symbol to the stack top. For an input word in L, they should be the same because they are coming off the stack in reverse order. If they are not the same, the transition is undefined, so the word is rejected. If we get to an empty stack stage, we have processed a word in L. If there are any further input symbols, the word is rejected because no transitions are defined for stack top z at q1. For similar reasons any word containing a second 2 is rejected. The graph of this PDA is shown in Figure 5.

[Diagram: loops at q0 labelled (0, z) ↦ 0z, (1, z) ↦ 1z, (0, 0) ↦ 00, (1, 0) ↦ 10, (0, 1) ↦ 01, (1, 1) ↦ 11; an edge from q0 to q1 labelled (2, z) ↦ z, (2, 0) ↦ 0, (2, 1) ↦ 1; loops at q1 labelled (0, 0) ↦ λ and (1, 1) ↦ λ.]

Figure 5. Directed graph for the PDA of Example 6.3.5.

We illustrate the operation of this PDA for various words:

w = 2 :      (q0, z) -2-> (q1, z) (accept)
w = 0121 :   (q0, z) -0-> (q0, 0z) -1-> (q0, 10z) -2-> (q1, 10z) -1-> (q1, 0z) (reject)
w = 01210 :  (q0, z) -0-> (q0, 0z) -1-> (q0, 10z) -2-> (q1, 10z) -1-> (q1, 0z) -0-> (q1, z) (accept)
w = 01201 :  (q0, z) -0-> (q0, 0z) -1-> (q0, 10z) -2-> (q1, 10z) -0-> ∅ (reject)


6.4. Nondeterministic push down automata


Just as finite state machines come in deterministic and nondeterministic varieties,
so do PDAs. As was the case for finite state machines, the difference is in the

(partial) transition function. Instead of returning a single ordered pair of the form (state, word), the partial transition function returns a set of such ordered pairs. This is the set of all possible moves that can be made for a given input symbol, state and stack top.
As was the case for finite state machines, we can define nondeterministic PDAs using transition tables or directed graphs. A nondeterministic PDA is said to accept a word w if there is some possible evolution of the machine starting from the initial state that processes the word and ends at an accepting state or with an empty stack (depending on the acceptance criterion).
There is a very important difference between nondeterministic PDAs and nonde-
terministic finite state machines. We saw in Chapters 4 and 5 that deterministic
and non-deterministic finite state machines are capable of recognizing exactly the
same languages, namely, the regular ones. For PDAs this is not true. Nondeter-
ministic PDAs are capable of recognizing a strictly larger class of languages than
deterministic PDAs.

Example 6.4.1. Despite its similarity to the language of Example 6.3.5, it can be shown that no deterministic PDA accepts the language

L = {ww^R : w ∈ {0, 1}*}

of even length palindromes in {0, 1}*. We can, however, describe a nondeterministic PDA that accepts this language:
Q = {q0, q1}.
Σ = {0, 1}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

x             0                        1
δ(x, q0, z)   {(q0, 0z), (q1, 0z)}     {(q0, 1z), (q1, 1z)}
δ(x, q0, 0)   {(q0, 00), (q1, 00)}     {(q0, 10), (q1, 10)}
δ(x, q0, 1)   {(q0, 01), (q1, 01)}     {(q0, 11), (q1, 11)}
δ(x, q1, 0)   {(q1, λ)}                ∅
δ(x, q1, 1)   ∅                        {(q1, λ)}

The operation of this PDA is similar to that of Example 6.3.5, except that while it is at q0 pushing input symbols onto the stack, it has the ability to jump, for no particular reason, to q1 and start removing matching items from the stack. If it happens to do this at the right point for a word ww^R in L (just as it pushes the last symbol of w) it will remove all of the symbols of w from the stack in reverse order and empty the stack. This means that there is a possible evolution of the machine starting from the initial state that processes the word and ends with an empty stack, which is precisely how we define which words are accepted by a nondeterministic PDA. We illustrate the operation of this PDA for various words:

For w = 0 there are two possible configuration evolutions

(q0, z) -0-> (q0, 0z) and (q0, z) -0-> (q1, 0z)

neither of which ends with an empty stack, so w is rejected. This is correct, since w is not an even length palindrome.
For w = 00 there are three possible evolutions

(q0, z) -0-> (q0, 0z) -0-> (q0, 00z),
(q0, z) -0-> (q0, 0z) -0-> (q1, 00z),
(q0, z) -0-> (q1, 0z) -0-> (q1, z)

precisely one of which ends with an empty stack, so w is accepted.
For w = 1001 there are five possible evolutions

(q0, z) -1-> (q0, 1z) -0-> (q0, 01z) -0-> (q0, 001z) -1-> (q0, 1001z),
(q0, z) -1-> (q0, 1z) -0-> (q0, 01z) -0-> (q0, 001z) -1-> (q1, 1001z),
(q0, z) -1-> (q0, 1z) -0-> (q0, 01z) -0-> (q1, 001z), which halts because δ(1, q1, 0) = ∅,
(q0, z) -1-> (q0, 1z) -0-> (q1, 01z) -0-> (q1, 1z) -1-> (q1, z),
(q0, z) -1-> (q1, 1z), which halts because δ(0, q1, 1) = ∅,

precisely one of which ends with an empty stack, so w is accepted.
Notice that once this machine reaches q1 , its operation becomes deterministic,
ensuring that it exactly matches the part of the word pushed to the stack while
it was at q0 .

[Diagram: loops at q0 labelled (0, z) ↦ 0z, (1, z) ↦ 1z, (0, 0) ↦ 00, (1, 0) ↦ 10, (0, 1) ↦ 01, (1, 1) ↦ 11; an edge from q0 to q1 carrying the same labels (the guess that the middle has been reached); loops at q1 labelled (0, 0) ↦ λ and (1, 1) ↦ λ.]

Figure 6. Directed graph for the PDA of Example 6.4.1.

The fact that the machine in Example 6.4.1 can decide somewhat arbitrarily to move to an alternative state may seem strange in the study of computation, but it is precisely this feature that gives the machine its power to detect the middle of an even length palindrome. The machine's ability to guess where the middle is comes from the way we define which words are accepted by a nondeterministic PDA. You might say that it only guesses correctly in the sense that at least one correct guess is possible.

Nondeterminism is obviously an undesirable feature in many applications. If we use a PDA to specify the desired operation of a piece of hardware or software, for example, we typically require determinism. The task of simulating a deterministic PDA in hardware or software is typically much easier than for a non-deterministic one.
On the other hand, nondeterministic PDAs have turned out to be useful models for analysing the behaviour of backtracking algorithms that explore trees of possibilities in search of a solution to a problem. We may think of the machine in Example 6.4.1 in this way. In a certain sense it tries switching from pushing symbols mode to matching symbols mode at every possible point in the input word, looking for a switch point that gives a match. Conversely, the task of simulating a nondeterministic PDA in hardware or software typically involves some sort of backtracking algorithm.
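In fact, simulating a nondeterministic PDA by keeping track of every possible configuration at once takes only a few lines of Python. The sketch below does this for the machine of Example 6.4.1, accepting on empty stack:

def npda_accepts(word, delta, start="q0"):
    """Follow all evolutions at once; accept if any ends with just z left."""
    configs = {(start, "z")}               # (state, stack), top at the left
    for a in word:
        configs = {(q, push + stack[1:])
                   for (state, stack) in configs
                   for (q, push) in delta.get((a, state, stack[0]), ())}
    return any(stack == "z" for _, stack in configs)

# Example 6.4.1: at q0 we may keep pushing or guess the middle and jump to q1.
delta = {}
for a in "01":
    for s in "z01":
        delta[(a, "q0", s)] = {("q0", a + s), ("q1", a + s)}
    delta[(a, "q1", a)] = {("q1", "")}

for w in ["00", "1001", "0", "01"]:
    print(repr(w), npda_accepts(w, delta))   # True, True, False, False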

In the case of nondeterministic PDAs, using the empty set symbol ∅ to mark undefined entries in the transition table is completely consistent with its usual usage in set theory. It simply indicates that the set of possible transitions for the table entry is empty. The transition table in Example 6.4.1 illustrates this.
Chapter Seven

Context Free Languages


We now show how to write grammars for the languages we stud-
ied in Chapter 6. Just as the languages generated by regular
grammars are those accepted by finite state recognition ma-
chines, there is a class of grammars which generate the languages
accepted by PDA.

References. The material in this chapter is covered in much greater detail in


Chapters 5, 6 and 7 of Linz, Chapter 3 of Kelley and Chapter 4 of Hopcroft
and Ullman. An algorithm for transforming context free grammars to Greibach
Normal Form is given in Greibach Normal Form Transformation Revisited by
Norbert Blum and Robert Koch (Information and Computation, 150 (1999), 112
118).

7.1. Context free grammars


In Chapter 6 we saw some reasons why most programming languages are not reg-
ular. Although we can give grammars for them, we cannot give regular grammars.
Equivalently, it is impossible to design a finite state machine that recognises them.
In this chapter we consider a larger class of grammars called context free gram-
mars. These grammars are powerful enough to describe most1 of the syntax of
modern programming languages.
Just like a regular grammar, a context free grammar has a set N of non-terminal symbols, a set T of terminal symbols and a special starting symbol σ ∈ N. The difference is in the production rules. Recall that all production rules in a regular grammar must be of type (RG1) or (RG2). In a context free grammar, the only restriction on production rules is that the part of the rule to the left of the arrow (→) must be a single non-terminal symbol. Any finite string of terminal or non-terminal symbols can appear to the right of the arrow. In other words the production rules must all be of the form

A → α1 α2 . . . αn
1It is known that certain features of programming languages, particularly the requirement
that variables be declared before use, cannot be described using context free grammars. Nonethe-
less, the standard method of describing programming languages, BNF, is context free (see
page 65). Problems like pre-declaration of variables are dealt with separately.

where A ∈ N and each xi ∈ N ∪ T. Just as we call a language regular if it is
generated by some regular grammar, we say L is a context free language if there
exists some context free grammar that generates L. As was the case for regular
languages, there may be other non-context free grammars that generate L, but
provided there is at least one context free one that also generates L, we call L
context free.
Recall that the production rules of a regular grammar always have just a single
non-terminal to the left of the arrow. Since this is the only restriction imposed
on a context free grammar, regular grammars are always context free and hence
regular languages are always context free. This idea is illustrated in the Venn
diagram in Figure 1.

Figure 1. Regular and Context Free Languages: a Venn diagram showing the
regular languages as a subset of the context free languages.

Because the production rules of a context free grammar are much less restricted
than those in a regular grammar, they allow us to describe more complicated
languages. For instance, the grammar for simple arithmetic expressions presented
in Example 6.1.2 is context free and therefore the language consisting of all such
expressions is context free. All of the non-regular languages for which we con-
structed PDAs in Chapter 6 are context free. In most cases, it is easy to give
context free grammars that generate them.

Example 7.1.1. (a) The language L = {0^n 1^n : n ≥ 0} is generated by the
grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ1 | λ.

As an illustration, the derivation of 000111 is given by

σ ⇒ 0σ1 ⇒ 00σ11 ⇒ 000σ111 ⇒ 000111.

(b) The language L = {w2w^R : w ∈ {0, 1}*} of Example 6.3.5 is generated by
the grammar with T = {0, 1, 2}, N = {σ} and rules

σ → 0σ0 | 1σ1 | 2.

As an illustration, the derivation of 1012101 is given by

σ ⇒ 1σ1 ⇒ 10σ01 ⇒ 101σ101 ⇒ 1012101.



(c) The even palindrome language L = {ww^R : w ∈ {0, 1}*} of Example 6.4.1
is generated by the grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ0 | 1σ1 | λ.

As an illustration, the derivation of the palindrome 101101 is given by

σ ⇒ 1σ1 ⇒ 10σ01 ⇒ 101σ101 ⇒ 101101.
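
Derivations like these are mechanical enough to replay in a few lines of code.
The following sketch is ours, not part of the notes: we write the start symbol σ
as S and the empty word λ as the empty string, and replay the derivation of
0^n 1^n in the grammar of part (a).

    # Grammar of Example 7.1.1(a): S -> 0S1 | lambda, with lambda written "".
    def derive(n):
        steps, current = ["S"], "S"
        for _ in range(n):                         # apply S -> 0S1 n times
            current = current.replace("S", "0S1", 1)
            steps.append(current)
        steps.append(current.replace("S", "", 1))  # finish with S -> lambda
        return steps

    print(" => ".join(derive(3)))
    # S => 0S1 => 00S11 => 000S111 => 000111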

Students familiar with the Backus Naur Form (BNF) notation for specifying
the syntax of programming languages may have already noticed the similarity
between BNF and the way we write context free grammars. In fact, BNF is just
an alternative way of writing context free grammars that has some short cuts
useful for describing program syntax. In BNF:
The arrow → is written as ::=.
Non-terminal symbols are written as names enclosed by < and >, for
example <expression>.
Terminals are just written as themselves.
The grammar of Example 6.1.2 could be written in BNF as:

<expr> ::= <subexp> + <subexp> | <subexp> ∗ <subexp>
<subexp> ::= (<subexp> + <subexp>) | (<subexp> ∗ <subexp>)
<subexp> ::= a | b | c | d

BNF also has shortcuts for optional elements and for repetitions of elements.
These can be expressed in our notation for context free grammars, although it
is necessary to use more production rules to do so. The notation of Example
4.1.3 was inspired by BNF. Indeed, changing each → to ::= converts this
notation to valid BNF.

Context Sensitive Grammars and the Chomsky Hierarchy.


In order to understand the reason context free grammars are so called, it is in-
structive to consider an even more powerful class of grammars called the context
sensitive grammars. In these grammars the production rules are of the form:

αAβ → αγβ (∗)

where A is a non-terminal (so A ∈ N), and α, β and γ denote strings (with α and β
possibly empty) of non-terminals and terminals (so α, β ∈ (N ∪ T)* and
γ ∈ (N ∪ T)+). For example, we might have a production rule like

tAu → taaBu

The idea expressed by such a rule is that the non-terminal A can be replaced by
the string aaB in a derivation, but only if it has a t to its left and a u to its right.
In other words, it can only be replaced if it appears in the context tAu. In this
sense, the production rules are sensitive to the context in which items appear.
Observe also that since α and β in (∗) may be empty, every context free production
rule is of the form required for a context sensitive grammar. Thus every context
free grammar is also context sensitive and hence every context free language is
also context sensitive. Not all grammars and languages are context sensitive.
Even this powerful class of grammars has its limitations because it does not allow
certain types of production rules. For example, none of the rules

AB → CD or t → tAB or AtB → AB

where A, B, C, D ∈ N and t ∈ T are valid in a context sensitive grammar. The
broadest possible class of grammars is the class of unrestricted grammars. For
these grammars, almost any kind of production rules are allowed, the only re-
striction being that the left hand side of a rule cannot be λ. The four classes of
grammars and languages we have discussed (regular, context free, context sen-
sitive and unrestricted) make up the Chomsky hierarchy2, the most fundamental
classification in formal language theory.

Figure 2. A Venn diagram of the Chomsky hierarchy: the regular languages sit
inside the context free languages, which sit inside the context sensitive languages,
which in turn sit inside the unrestricted languages.

2Named after Noam Chomsky, a pioneer of the study of formal languages and their appli-
cation to both computer science and linguistics.

7.2. Greibach normal form


In Chapter 4 we saw that the regular languages are precisely those accepted by
finite state recognition machines. It turns out that there is a similar correspon-
dence between context free languages and (not necessarily deterministic) PDA.
In the next section, we show how to construct a PDA that accepts the language
generated by a context free grammar. Before we can do this, however, we must
first transform the grammar into an equivalent one that is suitable for this pur-
pose. By an equivalent grammar, we mean one that generates exactly the same
language as the one we started with. We need to convert our grammar into an
equivalent one in which every production rule takes one of the following forms3.

(GNF1) A tB1 B2 . . . Bn
(GNF2) A t
(GNF3)

where t T, A N and B1 , B2 , . . . , Bn N \{} (so cannot appear on the right


hand side of any production rule). In other words (apart from the special rule
), the right hand side of every production rule must consist of a terminal
symbol followed by a (possibly empty) string of non-terminal symbols. Context
free grammars where every production rule takes one of these forms are said to
be in Greibach normal form 4.
Although it is far from obvious, every context free grammar is equivalent to a
grammar in Greibach normal form. There are various algorithms that take a
general context free grammar and produce an equivalent one in Greibach normal
form. They are all rather technical and complicated and we will not investigate
them in detail. Instead we will give an idea of how they work by considering how
to find equivalent gammars in Greibach normal form in the case of some very
simple context free grammars.
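
Whether a grammar is in Greibach normal form is a purely mechanical test of
the three rule shapes above. The following checker is a minimal sketch of ours
(not from the notes): rules are stored as (left, right) pairs, the start symbol σ is
written as the upper case letter S, non-terminals are upper case letters, and λ is
the empty string.

    def is_gnf(rules, start="S"):
        for left, right in rules:
            if left == start and right == "":      # the (GNF3) rule S -> lambda
                continue
            if right == "" or right[0].isupper():  # must begin with a terminal
                return False
            for c in right[1:]:                    # rest: non-terminals, no S
                if not c.isupper() or c == start:
                    return False
        return True

    # The grammar found at the end of Example 7.2.1:
    rules = [("S", "0VB"), ("S", "0B"), ("V", "0VB"), ("V", "0B"),
             ("B", "1"), ("S", "")]
    print(is_gnf(rules))  # True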

Example 7.2.1. It is easy to check that the language L = {0^n 1^n : n ≥ 0} is
generated by the grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ1 | 01 | λ

A very simple trick enables us to get rid of the troublesome terminal 1 on the
right hand side of each of the rules σ → 0σ1 and σ → 01. We just replace it with
a completely new non-terminal symbol B. This works provided we add a new
(GNF2) production rule B → 1, which ensures that B is eventually replaced

3Definitions of Greibach normal form given in various texts vary quite a bit. We adopt the
definition used by Blum and Koch in the paper listed at the start of this chapter.
4Named after its inventor, Sheila Greibach, now Professor of Computer Science at UCLA.

by 1 in any derivation. We must also be sure to add no further rules with B on
the left hand side. This gives the grammar

σ → 0σB | 0B | λ
B → 1

which is not quite in Greibach normal form yet, because there is a σ on the right
hand side of σ → 0σB. We can replace this σ with another entirely new non-
terminal symbol V, but we must then duplicate every (GNF1) or (GNF2) rule
that has σ on the left hand side by a rule in which every σ is replaced by V. This
ensures that all derivations involving σ will still be possible in our new grammar.
We thus obtain two new rules from σ → 0σB and σ → 0B by replacing σ's by
V's, giving the grammar

σ → 0VB | 0B | λ
V → 0VB | 0B
B → 1

which is in Greibach normal form. You should convince yourself that this really
is equivalent to the grammar we started with. In other words, you should check
that both grammars generate L. 

The methods used in Example 7.2.1 to replace troublesome terminals and λ's on
the right hand side of a context free production rule always work. In the worst
case we are forced to add one new non-terminal and one new rule for each terminal,
plus one new non-terminal replacing σ and as many new rules as there are rules
with σ on the left hand side.
Thus for any context free grammar, we can always find an equivalent one where
terminals only appear as the first symbol on the right hand side (if they appear at
all) and there are no λ's on the right hand sides of the production rules. The
remaining problem, where there are production rules with only non-terminals on
the right hand side, is somewhat trickier.

Example 7.2.2. For the grammar with T = {0, 1}, N = {σ, U, V} and rules

σ → UV | VU
U → 0V | 0
V → 1U | 1

the rules for U and V are already consistent with Greibach normal form. By
making the two substitutions allowed by the rules for U into the troublesome
rule σ → UV, we obtain two new (GNF1) rules σ → 0VV and σ → 0V. These
new rules replace the existing rule σ → UV. Using a similar approach, we can
replace the rule σ → VU with two new (GNF1) rules, giving a grammar

σ → 0VV | 0V | 1UU | 1U
U → 0V | 0
V → 1U | 1

in Greibach normal form. Again, you should convince yourself this is equivalent
to the original grammar. 

The method used in Example 7.2.2 doesn't always work. This is the reason a
more sophisticated algorithm is sometimes needed to find an equivalent grammar
in Greibach normal form.

Example 7.2.3. Consider the grammar with T = {0, 1}, N = {σ, U} and rules

σ → σU | 1U
U → 0σ | 0

Both production rules for U begin with a terminal, as Greibach normal form
requires, as does σ → 1U. To apply the method of Example 7.2.2 to replacing the
troublesome rule σ → σU, we would need to make both of the substitutions
allowed by the rules for σ. This would give the rules

σ → σUU and σ → 1UU.

Although σ → 1UU is in Greibach normal form, the rule σ → σUU is not and
it is easy to see that further applications of the method of Example 7.2.2 to this
rule will not improve matters. They will just give rules of the form

σ → σUUU, σ → 1UUU, σ → σUUUU, σ → 1UUUU

and so on. 

Example 7.2.3 illustrates just one of the problems that can be encountered when
attempting to apply a naive approach to transforming a context free grammar
into Greibach normal form. In view of such examples, it is not obvious that there
is always a way to carry out this transformation. The paper by Blum and Koch
appearing in the references at the start of this chapter contains a discussion and
proof of the following theorem.

Theorem 7.1. Every context free grammar is equivalent to a grammar in Greibach


normal form.

7.3. PDA and context free grammars


A context free grammar in Greibach normal form looks a bit like a regular gram-
mar in the sense that the right hand side of every production rule except σ → λ
(if it is present) begins with a terminal which may or may not be followed by
non-terminals. The biggest difference is that in the regular case there can be
at most one non-terminal. This lets us use a modified version of the technique
of Section 4.2 to construct a (typically non-deterministic) PDA recognising the
language of the grammar. As in Section 4.2, the input alphabet consists of the
terminal symbols (so Σ = T). Instead of the non-terminals labeling the states,
they (together with the initial stack symbol z) constitute the stack alphabet Γ.
We need only one state q (which must therefore be the initial state) and our PDA
accepts on empty stack. The transition function δ is built up using the following
algorithm (remember that we are constructing a non-deterministic PDA, so the
transition function δ maps (t, q, A) to a set).

Algorithm 7.1.

(a) For each (GNF1) production rule A → tB1B2 . . . Bn add an element
(q, B1B2 . . . Bn) to the set δ(t, q, A).
(b) For each (GNF2) rule A → t add an element (q, λ) to the set δ(t, q, A).
(c) If there is a (GNF3) rule σ → λ, find all other rules with σ on the left
hand side and:
For each (GNF1) rule σ → tB1B2 . . . Bn add an element (q, B1B2 . . . Bnz)
to δ(t, q, z).
For each such (GNF2) rule σ → t add an element (q, z) to δ(t, q, z).

Note that if there is no (GNF3) rule, we need not bother adding any rules for the
initial stack symbol z. We will come back to the reasons for this shortly. The
construction is best understood by means of an example.
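
Algorithm 7.1 is itself mechanical enough to express in code. The sketch below
is ours, not part of the notes, and uses the same conventions as before (S for σ,
the empty string for λ, z for the initial stack symbol); since there is only one
state q, a transition set is represented simply as a set of push words.

    def build_pda(rules, start="S"):
        delta = {}                                # (t, stack top) -> push words
        def add(t, top, push):
            delta.setdefault((t, top), set()).add(push)
        for left, right in rules:
            if right != "":                       # cases (a) and (b)
                add(right[0], left, right[1:])
        if (start, "") in rules:                  # case (c): a (GNF3) rule exists
            for left, right in rules:
                if left == start and right != "":
                    add(right[0], "z", right[1:] + "z")
        return delta

    rules = [("S", "0VB"), ("S", "0B"), ("V", "0VB"), ("V", "0B"),
             ("B", "1"), ("S", "")]
    print(build_pda(rules)[("0", "z")])  # {'VBz', 'Bz'}, as in Example 7.3.1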

Example 7.3.1. Let L = {0^n 1^n : n ≥ 0}. In Example 7.2.1 we observed that the
Greibach normal form grammar with T = {0, 1}, N = {σ, V, B} and rules

σ → 0VB | 0B | λ
V → 0VB | 0B
B → 1

generates L. Algorithm 7.1 yields the following PDA.



Q = {q}, Σ = {0, 1}, Γ = {z, σ, V, B}, F = ∅, initial state q, initial stack symbol z.

x             0                      1
δ(x, q, z)    {(q, Bz), (q, VBz)}
δ(x, q, σ)    {(q, B), (q, VB)}
δ(x, q, V)    {(q, B), (q, VB)}
δ(x, q, B)                           {(q, λ)}

The production rules σ → 0VB and σ → 0B add (q, VB) and (q, B) respectively
to the set δ(0, q, σ). Similarly, the production rules V → 0VB and V → 0B add
(q, VB) and (q, B) respectively to the set δ(0, q, V). The production rule B → 1
adds (q, λ) to the set δ(1, q, B). Finally, since the (GNF3) rule σ → λ is present,
the production rules σ → 0VB and σ → 0B add (q, VBz) and (q, Bz) respectively
to the set δ(0, q, z).
In order to accept a word w = 0^n 1^n ∈ L, this non-deterministic PDA must guess
when it reaches the last 0 in w and switch from pushing VB to pushing B only.
After that it has no choice but to pop a B for each 1 in the input string. We
illustrate this with configuration evolutions for some elements of L.

w = λ:      (q, z)
w = 01:     (q, z) -0-> (q, Bz) -1-> (q, z)
w = 0011:   (q, z) -0-> (q, VBz) -0-> (q, BBz) -1-> (q, Bz) -1-> (q, z)
w = 000111: (q, z) -0-> (q, VBz) -0-> (q, VBBz) -0-> (q, BBBz) -1-> (q, BBz)
            -1-> (q, Bz) -1-> (q, z)

(Each arrow is labelled with the input symbol processed at that step.) In all
cases, the evolution ends with the configuration (q, z) so each word is accepted.
You should convince yourself that L really is the language accepted by this PDA
by considering what happens when a word not in L is processed.
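
One way to convince yourself is to simulate the PDA. The sketch below is ours,
not part of the notes; it tries every available transition at each step, mirroring
the fact that a nondeterministic PDA accepts a word precisely if at least one
sequence of guesses works. Following the simplified notation introduced in the
next paragraph, the single state q is left out and a configuration is just the stack
contents.

    def accepts(delta, word, stack="z"):
        if word == "":
            return stack == "z"          # the accepting configuration (q, z)
        if not stack:
            return False
        moves = delta.get((word[0], stack[0]), set())
        return any(accepts(delta, word[1:], push + stack[1:]) for push in moves)

    # delta for Example 7.3.1, with S standing for sigma:
    delta = {("0", "z"): {"Bz", "VBz"}, ("0", "S"): {"B", "VB"},
             ("0", "V"): {"B", "VB"}, ("1", "B"): {""}}
    print([w for w in ["", "01", "0011", "10", "001"] if accepts(delta, w)])
    # ['', '01', '0011']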

This construction always gives a PDA with one state q. To simplify notation in
practice classes and assignments, we will leave q out of the transition tables and
configuration notation, giving only the push word and the stack contents.
Example 7.3.1 illustrates an important fact about a single state PDA that accepts
on empty stack. Namely, if we start with just a z on the stack, the empty word
λ will always be accepted. This is not good news if we have a grammar whose
language does not contain λ. By definition of Greibach normal form, rules of the
form A → λ where A ≠ σ are not allowed. In fact, the only way a Greibach
normal form grammar can generate λ is if it contains the (GNF3) production
σ → λ, because applying rules of type (GNF1) and (GNF2) always adds one
terminal symbol at each step in a derivation.

This is convenient because it means we can easily tell whether or not the language
L generated by the grammar contains λ. When λ ∉ L, there is a simple trick we
can use to prevent the PDA from accepting λ. Instead of starting with just the
initial stack symbol z on the stack, we start with σz on the stack. This guarantees
that the initial configuration (q, σz) is not accepting, so λ is not accepted. As a
bonus, we no longer need to define any transitions for the case where z is on the
top of the stack, simplifying the transition table a bit.

Example 7.3.2. Let L = {0^n 1^n : n ≥ 1}. From Example 7.3.1 and the preceding
paragraph, it should be clear that the Greibach normal form grammar with T =
{0, 1}, N = {σ, V, B} and rules

σ → 0VB | 0B
V → 0VB | 0B
B → 1

generates L. Algorithm 7.1 therefore yields the following PDA.

Q = {q}, Σ = {0, 1}, Γ = {z, σ, V, B}, F = ∅, initial state q, initial stack symbol z.

x             0                      1
δ(x, q, z)
δ(x, q, σ)    {(q, B), (q, VB)}
δ(x, q, V)    {(q, B), (q, VB)}
δ(x, q, B)                           {(q, λ)}
The transitions for σ, V and B are the same as in Example 7.3.1 and since there
is no (GNF3) rule, we don't need to define any transitions for z. As in Example
7.3.1, the PDA must guess when the last 0 is reached. We illustrate this by
giving configuration evolutions for some elements of L.

w = 01:     (q, σz) -0-> (q, Bz) -1-> (q, z)
w = 0011:   (q, σz) -0-> (q, VBz) -0-> (q, BBz) -1-> (q, Bz) -1-> (q, z)
w = 000111: (q, σz) -0-> (q, VBz) -0-> (q, VBBz) -0-> (q, BBBz) -1-> (q, BBz)
            -1-> (q, Bz) -1-> (q, z)

Since the evolutions all end with (q, z), the words are accepted. Even though the
initial symbol z wasn't marking the stack top when we started, we still need it to
detect the fact that the stack was empty after processing the input.

It is also possible to show that for any PDA with language L there is a context
free grammar that generates L (and hence L is a context free language). As this
is rather complicated, we will not consider the details, but putting this result

together with the PDA construction of Algorithm 7.1 and Theorem 7.1 yields the
following theorem.

Theorem 7.2. A language is context free precisely if it is the language accepted


by some (possibly non-deterministic) PDA.

7.4. Deterministic context free languages


Recall from Section 6.4 that, in contrast to the situation with finite state ma-
chines, non-deterministic PDA are genuinely more powerful than deterministic
ones. They are capable of recognising a larger class of languages. This gives rise to
a very important distinction in the theory of context free languages. A determin-
istic context free language is one that is recognised by some deterministic PDA.
Example 6.4.1 gave a non-deterministic PDA recognising the even palindrome
language

L = {ww^R : w ∈ {0, 1}*}.

It was noted there that there is no deterministic PDA recognising this language,
illustrating the fact that not all context free languages are deterministic.
From the point of view of compiler or interpreter design, deterministic context free
languages are highly desirable. A compiler or interpreter for a non-deterministic
language would typically need to use some kind of backtracking algorithm.
Such algorithms are slow because the number of steps needed to process an input
typically grows exponentially as a function of the length of the input.
The problem of determining whether or not the language generated by a given
context free grammar is deterministic is a large and difficult topic. We will not
discuss it in detail. However, there is one particular class of context free languages
that is worth discussing because they can be very neatly shown to be deterministic
using ideas we have already developed.
A context free grammar is called an s-grammar or simple grammar if it is in
Greibach normal form and has the additional property that
for any non-terminal symbol A and any terminal symbol t there is at most
one production rule with A on the left and a right hand side beginning with t.
Of course, a rule with A on the left and a right hand side beginning with t is
either (GNF1) or (GNF2).

Example 7.4.1. The Greibach normal form grammar of Example 7.3.2 for L =
{0^n 1^n : n ≥ 1} has T = {0, 1}, N = {σ, V, B} and rules

σ → 0VB | 0B
V → 0VB | 0B
B → 1

and is not an s-grammar because, for example, there are two production rules
V → 0VB and V → 0B having the same non-terminal V on the left and beginning
with the same terminal 0 on the right. There is, however, an equivalent s-grammar.
One may check that the grammar with T = {0, 1}, N = {σ, V, B} and rules

σ → 0V
V → 0VB | 1
B → 1

also generates L (you should convince yourself of this). The two rules V → 0VB
and V → 1 for the non-terminal V are acceptable in an s-grammar, because the
terminal symbols on the right are different. This is therefore an s-grammar.
When we carry out the construction of Algorithm 7.1 for an s-grammar, we obtain
a deterministic PDA. This happens because the defining property of an s-grammar
guarantees we only ever add at most one element to δ(t, q, A) for each t ∈ T
and A ∈ N.
Example 7.4.2. Applying the construction of Section 7.3 to the s-grammar we
found in Example 7.4.1 for the language L = {0^n 1^n : n ≥ 1} yields the following
PDA.

Q = {q}, Σ = {0, 1}, Γ = {z, σ, V, B}, F = ∅, initial state q, initial stack symbol z.

x             0              1
δ(x, q, σ)    {(q, V)}
δ(x, q, V)    {(q, VB)}      {(q, λ)}
δ(x, q, B)                   {(q, λ)}

Since λ ∉ L we start with σz on the stack and need not define transitions for z.
Since all defined entries in the table are one element sets, the PDA is really
deterministic, so we could rewrite its transition table in the deterministic form
shown below.

x             0              1
δ(x, q, σ)    (q, V)
δ(x, q, V)    (q, VB)        (q, λ)
δ(x, q, B)                   (q, λ)
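
In the deterministic table, each (input symbol, stack top) pair determines at most
one move, so an input word can be processed in a single left-to-right pass with no
backtracking at all. Here is a minimal sketch of ours (not from the notes) of this
deterministic processing.

    # Deterministic table of Example 7.4.2, with S standing for sigma:
    delta = {("0", "S"): "V", ("0", "V"): "VB", ("1", "V"): "", ("1", "B"): ""}

    def accepts(word):
        stack = "Sz"                       # lambda not in L: start with sigma z
        for t in word:
            if not stack or (t, stack[0]) not in delta:
                return False               # no move defined: reject
            stack = delta[(t, stack[0])] + stack[1:]
        return stack == "z"                # accept on empty stack

    print([w for w in ["01", "0011", "000111", "0101", ""] if accepts(w)])
    # ['01', '0011', '000111']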
S-grammars are very easy to work with. It is easy to write fast and efficient
programs that can recognise the language generated by an s-grammar. Although
they are not sufficiently powerful to describe modern programming languages, they
can be used for some tasks and they can be generalised to more powerful types of
grammars appropriate to the task of compiling or interpreting real programming
languages. In the theory of compilers and programming language design, the most
widely studied generalisations of s-grammars are the LL grammars and the LR
grammars.
Chapter Eight

Countability and uncountability


Resolution of many issues in theoretical computer science requires
a consideration of the relative sizes of sets. For finite sets
A and B, questions like "Is A larger than B?" or "Do A and B
have the same number of elements?" are simple to interpret and
usually fairly easy to answer. For infinite sets, the meaning of
such questions is not immediately obvious. In this chapter, we
consider how to compare the sizes of infinite sets.

References. Chapter 7 of Doerr and Levasseur and Chapter 0 of Kelley discuss


cardinality and countability.

8.1. How big is a set?


8.1.1. When sets are the same size. Before attempting to analyse the
situation for infinite sets, we first consider what it means for one finite set to be
the same size as another. Provided we resolve these questions in the right way,
our definition extends to the infinite case in a straightforward way.
It does not take mathematical genius to calculate that the set A = {a, b, c, d} has
4 elements. One simply counts the elements: a is the 1st element, b is the 2nd, c is
the 3rd and d is the 4th. Another way of viewing this counting process is that we
are matching up the elements of A with the numbers 1, 2, 3 and 4 in such a way
that each element of A corresponds to precisely one of these numbers. In other
words, we are showing there is a one-to-one correspondence

a  b  c  d
↕  ↕  ↕  ↕
1  2  3  4

between A and N4 = {1, 2, 3, 4}. This correspondence is only possible because the
sets A and N4 are the same size. A similar correspondence between

F = {Hawthorn, Essendon, Footscray, Geelong, Carlton}

and N5 = {1, 2, 3, 4, 5} shows that F has five elements or, equivalently, that F
and N5 have the same size. This line of thinking leads to the definition that
A has n elements precisely if there is a one-to-one correspondence between A and
Nn = {1, 2, . . . , n} and, more generally, that two finite sets are the same size if
there is a one-to-one correspondence between them. We may think of the sets Nn

as reference sets for the sizes of finite sets. It should be obvious that every finite
set is in one-to-one correspondence with precisely one of them.
If we wish to use one-to-one correspondences to frame a definition of sets having
the same number of elements, we need to state in a precise mathematical way
what is meant by a one-to-one correspondence. Changing the correspondence
diagram slightly

x      1  2  3  4
f(x)   a  b  c  d

makes it clear that a one-to-one correspondence between sets is in fact a function
between the sets; not just any function, but one that matches up the elements
of its domain and codomain with each other in a one-to-one fashion. A function
requirements:
It must be a one-to-one function. This means it must send different ele-
ments of its domain to different elements in the codomain. In quantifiers,
this may be expressed as

(x, y A) x 6= y = f (x) 6= f (y).

It must be an onto function. This means that every element of the


codomain has some element of the domain mapping to it. In quantifiers,
this may be expressed as

(y B)(x A) f (x) = y.

The quantified definition of a one-to-one function given above captures the moti-
vation of the definition nicely, namely, that distinct points in the first set cannot
correspond to the same point in the other. When checking whether a particular
function is one-to-one, however, it is often easier to use the equivalent contrapos-
itive form

(∀x, y ∈ A) f(x) = f(y) ⟹ x = y

of the definition. This is particularly so in cases where f is defined using a formula
rather than a table.

Example 8.1.1. Let A = {−1, 0, 1, 2} and B = {−2, 0, 2, 4}. To check that the
function f : A → B defined by f(x) = 2x is a one-to-one correspondence, we
check that the contrapositive form of the definition of one-to-one holds

f(x) = f(y) ⟹ 2x = 2y ⟹ x = y

and that f is onto. Since B is finite, we can check this by checking exhaustively
that every element of B has something that maps to it:

−2 = 2 · (−1) = f(−1),  0 = 2 · 0 = f(0),  2 = 2 · 1 = f(1),  4 = 2 · 2 = f(2).



In the infinite case, we won't be able to do this kind of exhaustive checking, so
we will have to work smarter.
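
For small finite sets the exhaustive check can be automated. The following is a
minimal sketch of ours (not from the notes) of the check performed in Example
8.1.1.

    def is_one_to_one_and_onto(f, A, B):
        image = [f(x) for x in A]
        one_to_one = len(set(image)) == len(A)  # no value is hit twice
        onto = set(image) == set(B)             # every element of B is hit
        return one_to_one and onto

    A, B = {-1, 0, 1, 2}, {-2, 0, 2, 4}
    print(is_one_to_one_and_onto(lambda x: 2 * x, A, B))  # True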

So far we have only considered finite sets A and B, but this idea of saying A and
B have the same number of elements (or the same size) precisely if there is a one-
to-one and onto function f : A → B still makes perfectly good sense in the case
where A and B are infinite. In fact, this is the universally accepted mathematical
definition of the idea that two sets are the same size.

Definition 8.1.1. We say sets A and B have the same size or cardinality if there
exists a one-to-one and onto function f : A → B. We express this by writing
|A| = |B|.

Example 8.1.2. For the language L = {0^n 1^n : n ≥ 1}, it is easy to see that
the function f : N → L defined by f(n) = 0^n 1^n is a one-to-one correspondence.
Hence |N| = |L|.

All of this may seem like child's play in the case of finite sets, but in the infinite
case there are a few surprises in store.

Example 8.1.3. Let A = N and B = {2n : n ∈ N}, so B is the set of even
natural numbers. We now check that the function f : A → B defined by f(x) = 2x
is a one-to-one correspondence. As in Example 8.1.1, the contrapositive form of
the definition of one-to-one holds since

f(x) = f(y) ⟹ 2x = 2y ⟹ x = y.

To check that f is onto, we observe that every element y ∈ B takes the form
y = 2n for some n ∈ N, so y = f(n).

Example 8.1.3 may seem a bit strange at first sight. Despite the fact that B is a
proper subset of A, it is the same size as A according to our definition. This kind
of thing seems strange because we are so used to thinking about the size of finite
sets, for which such behaviour is impossible1. There are stranger things to come.
There is another point about Definition 8.1.1 that may be bothering some read-
ers. It seems to be asymmetrical. Surely |A| = |B| should imply |B| = |A| and
yet the definition is expressed using a one-to-one and onto function f : A → B.
Recall from Example 1.3.4 that although a function f always has an inverse rela-
tion f⁻¹, there is no guarantee in general that f⁻¹ is a function. The following
theorem, however, shows that if f : A → B is one-to-one and onto, then so is
f⁻¹ : B → A and hence |B| = |A|. This shows that the asymmetry in Definition
1In fact, there is an alternative definition of a finite set which says that a set is finite precisely
if it does not have the same cardinality as any of its proper subsets. A set having this property
is called Dedekind finite.

8.1.1 is apparent rather than real. In other words, when proving set cardinalities
are equal, it doesn't matter which direction the function goes.

Theorem 8.1. The inverse f⁻¹ of a function f : A → B is a function precisely
if f is one-to-one and onto. Moreover, if f is one-to-one and onto, so is f⁻¹.

Theorem 8.1 is also useful for checking that a function is one-to-one and onto,
because it says we can do so by finding a formula for the inverse. And in order
to establish that a function g is the inverse of a function f , all we need to do is
check that the following two conditions hold.
(INV1) g(f (x)) = x for every x in the domain of f .
(INV2) f (g(x)) = x for every x in the domain of g.

Example 8.1.4. The set Z of integers consists of all whole numbers: positive,
negative and zero. The one-to-one correspondence suggested by

1  2  3   4  5   6  7   8  9   ...
↕  ↕  ↕   ↕  ↕   ↕  ↕   ↕  ↕
0  1  −1  2  −2  3  −3  4  −4  ...

can be turned into a formula for a function f : N → Z defined by

f(n) = (1 − n)/2   if n is odd
f(n) = n/2         if n is even.

In view of Theorem 8.1, we can show that f is one-to-one and onto by showing it
has an inverse. The function g : Z → N defined by

g(n) = 1 − 2n   if n ≤ 0
g(n) = 2n       if n > 0

is the inverse of f (you should check this, by checking conditions (INV1) and
(INV2)) so f is one-to-one and onto and hence |Z| = |N|.
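
Conditions (INV1) and (INV2) can at least be spot-checked by machine on a
finite range of values; a check of this kind is of course not a proof. The sketch is
ours, not from the notes.

    def f(n):
        return (1 - n) // 2 if n % 2 == 1 else n // 2

    def g(n):
        return 1 - 2 * n if n <= 0 else 2 * n

    print(all(g(f(n)) == n for n in range(1, 1000)))    # (INV1): True
    print(all(f(g(n)) == n for n in range(-500, 500)))  # (INV2): True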

8.1.2. Large and small. When trying to decide when one set is larger than
another, it is again sensible to start with finite examples. Although there is no
one-to-one correspondence between A = {a, b, c, d} and B = {a, c, e}, we can get
a one-to-one correspondence between B and a subset of A.

a  c  e
↕  ↕  ↕
a  b  c  d

Such a correspondence gives a one-to-one function from B to A that is not onto.
This gives a way of expressing the fact that A is larger than B, but we have to
be careful. We could attempt to define |B| < |A| to mean that there is a one-to-
one function from B to A that is not onto. This would work in the finite case, but it
would give a questionable definition in the case where the sets are infinite. The
function f(x) = 2x of Example 8.1.3 illustrates this, because it is a one-to-one
but not onto function from N to N, showing that |N| < |N| according to
this tentative definition!
To avoid these difficulties, we instead define |B| ≤ |A| to mean that there is a
one-to-one function from B to A. This function could be onto, in which case the
sets would have equal cardinality, as suggested by the notation ≤. The statement
|B| < |A| is then defined to mean that |B| ≤ |A| is true, but |B| = |A| is false.
In other words, there is a one-to-one function from B to A, but there is no
one-to-one and onto function. This definition lets us prove the following (rather
obvious, but very useful) theorem.

Theorem 8.2. If B ⊆ A then |B| ≤ |A|.

Proof. Since B ⊆ A, a valid function definition for f : B → A is given by
f(x) = x. This function is one-to-one since f(x) = f(y) ⟹ x = y.

Example 8.1.5. (a) Since Z is contained in the set Q of rational numbers
(the numbers we can write as fractions m/n where m and n are integers and
n ≠ 0), we have |Z| ≤ |Q|.
(b) Since Q is contained in the set R of real numbers, we have |Q| ≤ |R|.
(c) If F is a finite set then for some m ∈ N there is a one-to-one and onto
function f : F → Nm and Nm ⊆ N. Hence |F| = |Nm| ≤ |N| and hence
|F| ≤ |N|. Since it is obvious that there can be no one-to-one and onto
function g : F → N, we have |F| < |N|.


We saw that |Z| = |N| in Example 8.1.4 and we now know that |Z| ≤ |Q| and
|Q| ≤ |R|. There is no obvious way of deciding whether |Z| < |Q| or |Q| < |R| at
this stage. In fact, some real surprises are in store here.

8.2. Countable sets


It may be shown that the smallest possible infinite sets are those having the same
cardinality as N. A precise meaning for this slightly mysterious statement is given
in the following theorem.

Theorem 8.3. If a set A is infinite and |A| ≤ |N| then |A| = |N|.

Stated in a slightly different form, Theorem 8.3 says:

If |A| < |N| then A is finite. (∗)
Sets of the same cardinality as N are extremely important in mathematics and
theoretical computer science because they have the very useful property that their
elements can be written down as an infinite list

x1, x2, x3, x4, . . .

in which no element is ever repeated. This follows directly from the definition of
cardinality, for if |A| = |N| there is a one-to-one and onto function f : N → A,
and putting

x1 = f(1), x2 = f(2), x3 = f(3), x4 = f(4), . . .

gives the desired listing of the elements of A. The fact that f is one-to-one
guarantees no element is ever repeated. Moreover, this argument can easily be
reversed to show that any set A whose elements can be written as an infinite list

x1, x2, x3, x4, . . .

with no element ever repeated has the same cardinality as N. All we need to
do is define f : A → N by f(xi) = i. Sets of this cardinality are so important,
they have a special name. Just as the sets Nn of Section 8.1 are used to represent
the sizes of finite sets, the set N is used as the standard representative of the size
of these sets. This is the idea behind the following definition.

Definition 8.2.1. A set A is called countably infinite if |A| = |N| and A is called
countable if |A| ≤ |N|.

Example 8.1.5(c) shows immediately that every finite set is countable. In fact, by
the version of Theorem 8.3 given above in (∗), the countable sets that are not
countably infinite are precisely the finite sets. Examples 8.1.3 and 8.1.4 show,
somewhat surprisingly, that the set of even natural numbers and the set Z of
integers are both countably infinite. Much more surprising is the fact that the
Cartesian product N × N, pictured in Figure 1, is countable.

Figure 1. Some points in the Cartesian product N × N: the pairs (m, n) with
1 ≤ m ≤ 4 and 1 ≤ n ≤ 4, arranged in a grid with m along the horizontal axis
and n along the vertical axis.



Even by itself, the bottom row is a copy of N, so N × N appears to have far more
elements than N. However we can use the idea of writing the elements of N × N
as an infinite list to demonstrate that N × N is countably infinite. The trick is to
count following a diagonal pattern, as follows:

(1, 1),   (1, 2), (2, 1),   (1, 3), (2, 2), (3, 1),   . . .

We first count the ordered pairs with sum 2, then those with sum 3 (in increasing
order of the first coordinate), then those with sum 4, and so on. Pairs with the
same sum lie on a diagonal. This process is illustrated in Figure 2.

Figure 2. A counting path in N × N: a zig-zag path through the grid of Figure 1
that visits each diagonal (1, 1); (1, 2), (2, 1); (1, 3), (2, 2), (3, 1); . . . in turn.

This enables the pairs to be matched to the natural numbers as follows

1       2       3       4       5       6
↕       ↕       ↕       ↕       ↕       ↕       . . .
(1, 1)  (1, 2)  (2, 1)  (1, 3)  (2, 2)  (3, 1)

It should be clear that this process gives a one-to-one correspondence, but to
convince a skeptic we need to produce a formula for a one-to-one and onto function
f : N × N → N. It turns out that

f(m, n) = (1/2)(m + n − 2)(m + n − 1) + m

does the trick. It is left as a challenge to the interested student to show that
f really is one-to-one and onto. Notice, however, that it works for the first six
ordered pairs listed before:

f(1, 1) = (1/2 · 0 · 1) + 1 = 1,        f(1, 2) = (1/2 · 1 · 2) + 1 = 1 + 1 = 2,
f(2, 1) = (1/2 · 1 · 2) + 2 = 1 + 2 = 3,  f(1, 3) = (1/2 · 2 · 3) + 1 = 3 + 1 = 4,
f(2, 2) = (1/2 · 2 · 3) + 2 = 3 + 2 = 5,  f(3, 1) = (1/2 · 2 · 3) + 3 = 3 + 3 = 6.
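
The same six values can be recovered by machine, regenerating the diagonal
counting order at the same time. This is a minimal sketch of ours, not part of
the notes.

    def f(m, n):
        return (m + n - 2) * (m + n - 1) // 2 + m

    # Pairs listed diagonal by diagonal (coordinate sum 2, 3, 4, ...),
    # in increasing order of the first coordinate within each diagonal:
    pairs = [(m, s - m) for s in range(2, 5) for m in range(1, s)]
    print([(p, f(*p)) for p in pairs])
    # [((1, 1), 1), ((1, 2), 2), ((2, 1), 3), ((1, 3), 4), ((2, 2), 5), ((3, 1), 6)]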

The fact that N × N is countably infinite has other surprising consequences. Let
Q⁺ denote the set of positive rational numbers. Recall that we can write any
positive rational number q ∈ Q⁺ in lowest positive terms. This means q can be
written in the form q = mq/nq where mq and nq have no common factors (because
we have cancelled as far as possible) and mq and nq are also both positive (there
is no point writing q = m/n where m and n are both negative). The fact that we
have written q in its lowest terms means mq and nq are unique, so the function
f : Q⁺ → N × N defined by f(q) = (mq, nq) is one-to-one and hence

|Q⁺| ≤ |N × N| = |N|

which shows that |Q⁺| ≤ |N|. Since Q⁺ is infinite, Theorem 8.3 shows that
|Q⁺| = |N|. A construction similar to the one used in Example 8.1.4 can be used
to show that |Q⁺| = |Q|, giving the following theorem.

Theorem 8.4. The set Q of rational numbers is countably infinite.

In view of the very generous way in which the rational numbers are scattered
among the real numbers (between any pair of distinct real numbers, there is
always a rational number) this seems very surprising. It looks as though there
should be a lot more rational numbers than there are natural numbers!

A Proof of Theorem 8.3

If A is infinite and |A| ≤ |N|, there is a one-to-one function f : A → N. Since f
is one-to-one, the set

B = {f(x) : x ∈ A}

is infinite and since B ⊆ N, we can list its elements in increasing order

k1 < k2 < k3 < . . .

For each n ∈ N the fact that kn ∈ B means there is an x ∈ A such that f(x) = kn.
Since f is one-to-one, there is only one such element, so we can call it xn and define
g : N → A by g(n) = xn. It may be shown that g is one-to-one and onto, which
shows |A| = |N|. It is left as a challenge to the interested student to show g is
one-to-one and onto.

8.3. How big is a language?


Recall that the union A1 ∪ A2 ∪ · · · ∪ An of a finite sequence A1, A2, . . . , An of
sets is the set obtained by throwing in all of the elements of all of the Ai. Using
quantifiers, this may be expressed as

A1 ∪ A2 ∪ · · · ∪ An = {x : (∃i ∈ Nn) x ∈ Ai}.

This approach makes perfectly good sense for an infinite sequence

A1, A2, A3, A4, . . .

of sets. The aim is the same. We seek the set obtained by throwing in all of the
elements of all of the Ai. The union of an infinite sequence of sets A1, A2, A3, . . .
is defined using quantifiers by

A1 ∪ A2 ∪ A3 ∪ · · · = {x : (∃i ∈ N) x ∈ Ai}.

Example 8.3.1.
(a) If Ai = {i} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = N.
(b) If Ai = {i, −i, 0} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = Z.
(c) If Ai = {0^i 1^i} for each i ∈ N, where 0^i 1^i denotes a word in {0, 1}*, it is
easy to see that A1 ∪ A2 ∪ A3 ∪ · · · is the language L = {0^n 1^n : n ∈ N}.

Now, suppose we have an infinite sequence

A1, A2, A3, A4, . . .

of disjoint, finite, non-empty sets, say |Ai| = ki for each i ∈ N. To label the
elements of these sets carefully, we need two subscripts (one to tell which set Ai
the element is in and one to tell precisely which element it is) giving a labeling

A1 = {a11, a12, a13, . . . , a1k1}
A2 = {a21, a22, a23, . . . , a2k2}
A3 = {a31, a32, a33, . . . , a3k3}
. . .

for all of the elements of all of the Ai. This labeling allows us to write the elements
of A1 ∪ A2 ∪ A3 ∪ · · · as a list

a11, a12, . . . , a1k1, a21, . . . , a2k2, a31, a32, . . . , a3k3, a41, . . .

Since the Ai are disjoint, each element of A1 ∪ A2 ∪ A3 ∪ · · · appears exactly once
in this list. This listing shows us that |A1 ∪ A2 ∪ A3 ∪ · · · | = |N|.

For the skeptics we note that this list order is given by the function

f : A1 ∪ A2 ∪ A3 ∪ · · · → N

defined by the formula f(aij) = j in the case where i = 1 and

f(aij) = j + (k1 + k2 + · · · + k(i−1))

in the case where i > 1. It is left as a challenge to the interested student to show
f is one-to-one and onto.

In fact, it is possible to drop the requirement that the Ai be disjoint. This requires
a modification to the above list in the case where the Ai are not disjoint: as we
construct the list, whenever we encounter an aij we have seen before, we just leave
it out. This yields the following theorem.

Theorem 8.5. The union of any sequence of finite sets is countable.

In Example 8.1.2, we saw that the language L = {0^n 1^n : n ≥ 1} is countable. In
fact, we can use Theorem 8.5 to see why every language must be countable. It is
implicit in our definition of a language L that the alphabet Σ of L is finite, say

|Σ| = m.

Let Li be the set of words in L of length i, so Li = {w ∈ L : |w| = i}. We can
use a counting argument of the type you met in MAT1DM to show that each Li
is finite. For any word

w = w1w2 . . . wi ∈ Li

where i > 0, we have w1 ∈ Σ so there are at most m possible choices for the
symbol w1. Similarly, there are at most m possible choices for w2, w3, . . . , wi−1
and wi. These choices may or may not be independent, depending on L, but in
the case where they are independent, we obtain the total number of words in Li
by multiplying, so

|Li| = m · m · · · m = m^i.

If the choices are not independent, the number of possibilities will actually be less
and we will have |Li| < m^i. In either case Li is finite. Finally, there are only two
possibilities for L0. If L contains λ then L0 = {λ} and if not, L0 = ∅.

Example 8.3.2. (a) If L = {0, 1}*, then |Li| = 2^i for each i ≥ 0 because
there are no restrictions on the choice of 0s and 1s. In other words, the
choices are independent of one another.
(b) If L = {0^n 1^n : n ≥ 0}, then Li = {0^(i/2) 1^(i/2)} for each even i ≥ 0 and hence
|Li| = 1. For odd i ≥ 0, we have Li = ∅ so |Li| = 0.
(c) If L = {w ∈ {0, 1}* : |w| is even}, it is obvious that |Li| = 0 for each
odd i ≥ 0. For even i ≥ 0, the choices of 0s and 1s are independent so
|Li| = 2^i.


Since every word in L must be in Li for some i ≥ 0, we have2

L = L0 ∪ L1 ∪ L2 ∪ L3 ∪ · · · (∗)

and Theorem 8.5 shows L is countable. This gives the following theorem.

Theorem 8.6. Every language is countable.

We can obtain an even better result than Theorem 8.5 using the fact that N × N
is countably infinite. If A1, A2, A3, . . . is an infinite sequence of disjoint countable
sets, then for each i ∈ N there is a one-to-one function gi : Ai → N (not necessarily
onto, because some of the Ai might be finite). Because the Ai are disjoint, we can
define

f : A1 ∪ A2 ∪ A3 ∪ · · · → N × N

by f(x) = (i, gi(x)) for each x ∈ Ai. (Can you see why we require that the Ai be
disjoint here?) It can be shown that f is one-to-one and hence

|A1 ∪ A2 ∪ A3 ∪ · · · | ≤ |N × N| = |N|

which shows that |A1 ∪ A2 ∪ A3 ∪ · · · | ≤ |N|. If A1 ∪ A2 ∪ A3 ∪ · · · is infinite,
Theorem 8.3 then shows that |A1 ∪ A2 ∪ A3 ∪ · · · | = |N|, so it is countably
infinite. Here again it is possible to drop the requirement that the Ai be disjoint,
using the same approach as for Theorem 8.5. This gives a stronger theorem:

Theorem 8.7. The union of any sequence of countable sets is countable.

It is left as a challenge to the interested student to show f is one-to-one.

2Notice that a language need not be infinite. It may be the case that only finitely many of
the Li are non-empty, in which case the union in (∗) would be finite.

8.4. Uncountable sets


A set that is not countable is (not surprisingly) called uncountable. In view of the
theorems of the previous section, some readers may be wondering by now whether
all sets are countable. Of course, they are not. Otherwise, we wouldn't waste your
time by defining uncountable sets! A key idea we will need in order to demonstrate
the existence of uncountable sets is that of the set of all subsets of a given set.

Definition 8.4.1. Let A be a set. The power set of A is the set of all subsets of
A (never forget that this always includes ∅ and A itself) and is denoted P(A).

Example 8.4.1.
(a) P(∅) = {∅}. Notice that even though ∅ is empty, P(∅) is not, since it
contains ∅ itself.
(b) P(N1) = P({1}) = {∅, {1}} so |P({1})| = 2.
(c) P(N2) = {∅, {1}, {2}, {1, 2}} so |P(N2)| = 4.
(d) P(N3) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} so |P(N3)| = 8.
(e) P(N) contains ∅, N, all of the finite subsets of N, for example,

{1}, {1, 2}, {4, 7}, {100, 1000}, {10, 11, . . . , 20}

and so on, as well as all of the infinite subsets like the even numbers, the
odd numbers, the perfect squares {1, 4, 9, 16, . . . }, and countless others
(in fact, uncountably many, as we shall soon see).
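
For small finite sets, the power set can be generated exhaustively, which makes
the pattern behind parts (a) to (d) easy to spot. A minimal sketch of ours, not
from the notes:

    from itertools import combinations

    def power_set(A):
        items = list(A)
        return [set(c) for r in range(len(items) + 1)
                       for c in combinations(items, r)]

    for n in range(4):
        print(n, len(power_set(range(1, n + 1))))  # the set Nn = {1, ..., n}
    # prints: 0 1, then 1 2, then 2 4, then 3 8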


You may have already spotted a pattern in Example 8.4.1. It turns out that

|P(Nn)| = |P({1, 2, . . . , n})| = 2^n

and from this it may be shown3 that if |A| = n then |P(A)| = 2^n. It follows that
|A| < |P(A)| for any finite set A. Remarkably, this is true even for infinite sets.

Theorem 8.8 (Cantor's Theorem). |A| < |P(A)| for any set A.

Thus |N| < |P(N)|, so P(N) is our first example of an uncountable set. By
applying Cantor's Theorem repeatedly, we can construct a sequence of infinite
sets of strictly increasing cardinality

|N| < |P(N)| < |P(P(N))| < |P(P(P(N)))| < . . .

and by definition the sets

P(N), P(P(N)), P(P(P(N))), . . .


3Because of this, many books use the notation 2^A instead of P(A) for the power set. In
fact, this is the reason for the name power set.

are all uncountable, so we have not just shown how to construct an uncountable
set: we have shown how to construct uncountable sets as big as we like. These sets
may seem a little unfamiliar, but we can use the fact that P(N) is uncountable
to show that more familiar sets are also uncountable.

Example 8.4.2.
(a) Let B denote the set of all infinite binary sequences4, so B is the set of
all sequences of the form

x1, x2, x3, . . .

where each xi is either 0 or 1. We can define a function f : B → P(N) in
a very natural way by letting f(x1, x2, x3, . . . ) = {i ∈ N : xi = 1}.
For example

f(0, 0, 0, 0, . . . ) = ∅
f(1, 0, 0, 0, . . . ) = {1}
f(0, 1, 0, 1, 0, 1, . . . ) = {n ∈ N : n is even}
f(1, 1, 1, 1, . . . ) = N.

It is easy to check that f is one-to-one and onto, so |B| = |P(N)| and
hence B is uncountable.
(b) Define a function f : B → R (where B is defined as in (a) above) by
sending x1, x2, x3, . . . ∈ B to the number with decimal representation
0.x1x2x3 . . . , so every f(x1, x2, x3, . . . ) = 0.x1x2x3 . . . is contained in the
interval [0, 1). Thus for example

f(0, 0, 0, 0, . . . ) = 0
f(1, 0, 0, 0, . . . ) = 0.1 = 1/10
f(1, 1, 1, 1, . . . ) = 0.1111 · · · = 1/9.

It is well known that any two distinct decimal representations give distinct
real numbers, except in the important case where one representation ends
in an infinite string of 9s. For example,

0.99999 · · · = 1,   0.099999 · · · = 0.1,   0.499999 · · · = 0.5.

Since B contains no sequences ending in an infinite string of 9s, distinct
elements of B are always sent to different numbers by f. This means f is
one-to-one and hence |B| ≤ |R|. It follows that R is uncountable by (a).

4The usual mathematical notation for B is {0, 1}^N.

Example 8.4.2(b) seems very surprising indeed! We already observed that Q is
countable, but that between any pair of distinct real numbers there is always a
rational number. Nonetheless, there are a lot more real numbers than there are
rational numbers.
The most important result from our point of view concerns the set of all languages
made from a (non-empty) alphabet Σ. Since Σ contains some symbol a, Σ*
contains the word a^n for each n ∈ N, so Σ* is infinite. Since Σ* is a language,
Theorem 8.6 tells us that |Σ*| ≤ |N| and hence |Σ*| = |N| by Theorem 8.3. By
our definition of a language with alphabet Σ, every subset of Σ* is a language.
The set of all possible languages made from Σ is therefore P(Σ*). But now
Cantor's Theorem gives

|P(Σ*)| > |Σ*| = |N|

which proves the following theorem.

Theorem 8.9. The set of all possible languages made from any non-empty alpha-
bet Σ is uncountable.

A proof of Cantor's Theorem.

The fact that |P(A)| ≥ |A| follows from Theorem 8.2, since the subset

{{x} : x ∈ A}

of P(A) clearly has the same cardinality as A (you should prove this).
To show |P(A)| > |A| we need to establish that |P(A)| = |A| is false. Here we
need to use proof by contradiction. This means we assume the negation of what
we really want to prove and show that this leads to a contradiction (a statement
that is clearly absurd or false).
In this case, we assume that |A| = |P(A)|. This means there is a one-to-one and
onto function f : A → P(A). Now here comes the fiendishly clever trick! Define

E = {x ∈ A : x ∉ f(x)}.

Although this definition looks a bit strange, it makes sense because the codomain
of f consists of subsets of A, so f(x) is a set for each x ∈ A. This means we can
always ask whether or not x ∈ f(x) and define E to be the set of x values for
which this is false. It may be that E is empty, but this doesn't matter.
Since E ⊆ A it is an element of the codomain P(A) of f. Since f is onto, there
must be a z ∈ A such that f(z) = E. We get a contradiction from the fact that z
must either be an element of E or not:

If z ∈ E then z ∉ f(z) by definition of E, but f(z) = E so this means
z ∉ E. But this is a contradiction because we have now shown that
z ∈ E ⟹ z ∉ E.
If z ∉ E then z ∈ f(z) by definition of E, but f(z) = E so this means
z ∈ E. This is also a contradiction because we have shown that z ∉ E ⟹ z ∈ E.
Since both cases lead to a contradiction, we conclude that there can be no one-
to-one and onto map A → P(A), so |P(A)| = |A| must be false, as we wanted to
prove. Thus |A| < |P(A)|.
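
The diagonal trick can be watched in action on a small finite set. The sketch
below is ours, not part of the notes: it runs through every function f : A → P(A)
for A = {1, 2, 3} and confirms that the set E is always missing from the image
of f, so no such f is onto.

    from itertools import combinations, product

    A = [1, 2, 3]
    subsets = [frozenset(c) for r in range(len(A) + 1)
                            for c in combinations(A, r)]

    always_missed = True
    for values in product(subsets, repeat=len(A)):   # every f : A -> P(A)
        f = dict(zip(A, values))
        E = frozenset(x for x in A if x not in f[x]) # the diagonal set E
        if E in f.values():
            always_missed = False
    print(always_missed)  # True: E is never in the image, so no f is onto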
Chapter Nine

More Powerful Machines


In Chapters 3 to 5 we considered finite state machines and their
correspondence with regular languages. In Chapters 6 and 7 we
added more power to finite state machines to obtain PDA and
considered their correspondence with context free languages. In
this chapter, we extend these ideas to define an even more pow-
erful class of automata and consider the languages they are ca-
pable of recognising. In fact these automata, known as Turing
machines, are so powerful that they have been universally re-
garded for over 60 years as the "gold standard" of what can
theoretically be calculated by an algorithm.
References. The material in this chapter is covered in much greater detail in
Chapters 9 to 12 of Linz, Chapters 4, 5 and 6 of Kelley and Chapters 7 and 8 of
Hopcroft and Ullman.

9.1. Not all languages are context free


Earlier we saw that not all languages are regular. We now observe that even the
larger class of context free languages does not contain everything. In fact, some
rather innocent looking languages like

L1 = {0^n 1^n 2^n : n ≥ 1}

and

L2 = {ww : w ∈ {0, 1}⁺}

are not context free. In view of the construction of Section 7.3, this means there
is no PDA that recognizes them. This seems plausible for both of the above
languages. For L1 it seems likely that the only way a PDA can test whether the
number of 0s is the same as the number of 1s is to use a method similar to
that of Examples 6.3.2 and 6.3.3: adding a symbol to the stack as each 0 is read
and then deleting a symbol as each 1 is read. The problem is that this method
destroys our count of 0s and we have no way to check that the number of 2s
is correct. For L2 we would need a PDA similar to the one of Example 6.4.1,
which remembers the first half of a palindrome by writing it to the stack and
then compares. For L2 a problem arises with this method because items retrieved
from a stack are in reverse order. There is no way to correct this. We can give

a rigorous demonstration of the fact that these languages are not context free
using the following theorem.

Theorem 9.1 (Pumping Lemma for Context Free Languages). For any context
free language L, there is a special number p ∈ N (called a pumping length) that
has the following property:
Any word w ∈ L such that |w| ≥ p can be written as w = uvxyz where
(a) |vxy| ≤ p,
(b) v ≠ λ or y ≠ λ (i.e. at least one of v and y is non-empty),
(c) the word u v^i x y^i z is in L for every i ≥ 0.

The name of Theorem 9.1 comes from (c). It tells us that the words

uvvxyyz, uvvvxyyyz, . . . , u v^i x y^i z, . . .

and so on are all in L. Provided w ∈ L is long enough to begin with, we can
"pump it up" in the way described by (c) to get longer and longer words that are
also in L. We will not attempt to prove this result in MAT2MFC. A proof would
be at least a week's work in itself. We will simply demonstrate how it can be
used to show that languages are not context free. To do this we need to use the
technique of proof by contradiction. This means we assume the negation of what
we really want to prove and show that this leads to a contradiction (a statement
that is clearly false). We begin by assuming the language is context free and show
(using the pumping lemma) that this leads to a false conclusion.

Example 9.1.1. Suppose L1 = {0^n 1^n 2^n : n ≥ 1} is a context free language and
let n be any integer greater than p, where p is the pumping length for L1. The
word w = 0^n 1^n 2^n is in L1 and |w| = 3n > p so the pumping lemma lets us write

w = 0^n 1^n 2^n = uvxyz

where |vxy| ≤ p < n and at least one of the words v and y is non-empty. The fact
that |vxy| < n means that vxy isn't long enough to contain all three symbols 0, 1
and 2, so it must be a sub-word of one of the five words

0^n, 0^n 1^n, 1^n, 1^n 2^n and 2^n.

Let's consider these possibilities:
(i) If vxy were a sub-word of 0^n we would have vxy = 0^j for some j ≤ n,
u would be a sub-word of 0^n and 1^n 2^n would be a sub-word of z. We
would also have v = 0^k and y = 0^m for some k, m ≥ 0. Since at least
one of v and y is non-empty, this would give vvxyy = 0^(q+j) for some
q > 0 (i.e. vvxyy would have q more 0s than vxy). This would mean
s = uvvxyyz = 0^(q+n) 1^n 2^n so s would have q more 0s than w = uvxyz but
the same number of 1s and 2s. This would show s ∉ L1. But Theorem
the same number of 1s and 2s. This would show s / L1 . But Theorem

9.1 tells us that s ∈ L1. This shows vxy can't have been a sub-word of
0^n after all.
(ii) By similar reasoning vxy can't be a sub-word of 1^n or 2^n either.
The only possibilities left are that vxy is a sub-word of 0^n 1^n or 1^n 2^n.
(iii) If vxy were a sub-word of 0^n 1^n, a similar (although slightly more compli-
cated) argument to the one in (i) would show that vvxyy would have to
contain either more 0s than vxy or more 1s than vxy (or possibly both).
But this in turn would mean that s = uvvxyyz would have to contain
either more 0s than 2s or more 1s than 2s. In either case, we have
s ∉ L1, but as before, Theorem 9.1 tells us that s ∈ L1. As before, vxy
can't have been a sub-word of 0^n 1^n after all.
(iv) By similar reasoning vxy can't be a sub-word of 1^n 2^n either. There are
no possibilities left.
All possibilities ended in tears! We must conclude that L1 is not context free. 
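
For a small value of n, the case analysis of Example 9.1.1 can be checked
exhaustively by machine: every decomposition w = uvxyz satisfying conditions
(a) and (b) fails to pump. The sketch below is ours, not from the notes, and
uses i = 2 in condition (c).

    def in_L1(s):
        n = len(s) // 3
        return n >= 1 and s == "0" * n + "1" * n + "2" * n

    n = 4
    w = "0" * n + "1" * n + "2" * n
    pumpable = False
    for a in range(len(w) + 1):                          # u = w[:a]
        for b in range(a, min(a + n, len(w)) + 1):       # v = w[a:b]
            for c in range(b, min(a + n, len(w)) + 1):   # x = w[b:c]
                for d in range(c, min(a + n, len(w)) + 1):  # y = w[c:d], z = w[d:]
                    v, y = w[a:b], w[c:d]
                    if v + y == "":
                        continue                         # condition (b) fails
                    if in_L1(w[:a] + v * 2 + w[b:c] + y * 2 + w[d:]):
                        pumpable = True
    print(pumpable)  # False: no decomposition with |vxy| <= n survives pumping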

A similar proof can be given for L2, starting with the word w = 0^n 1^n 0^n 1^n ∈
L2 where n is any number greater than the pumping length. Believe it or not,
Example 9.1.1 is a relatively easy application of the pumping lemma! Even for the
relatively simple languages L1 and L2 in the introduction to this section, applying
the pumping lemma can be quite hard. There is a much easier way to show
languages on a one-letter alphabet are not context free.

Theorem 9.2. Let a be any symbol. A language L ⊆ {a}* is context free precisely
if it is regular.

We already know that languages like {0^n 1^n : n ≥ 0} are context free but not
regular, so even for a two letter alphabet, this theorem no longer holds. For
L ⊆ {a}*, however, if a suffix set argument shows L is not regular, then L cannot
be context free either.

Example 9.1.2. We show that the language

L = {0^(2^n) : n ≥ 0} = {w ∈ {0}* : (∃m ≥ 0) |w| = 2^m}

is not context free. We will show that the suffix sets S(0^(2^n)) are distinct for all
n ≥ 0. First note that 0^(2^n) 0^(2^n) = 0^(2·2^n) = 0^(2^(n+1)) ∈ L, so
0^(2^n) ∈ S(0^(2^n)) for all n ≥ 0. But 0^(2^n) ∉ S(0^(2^m)) for any m > n.
This is because

0^(2^m) 0^(2^n) = 0^(2^m + 2^n) = 0^(2^n (2^(m−n) + 1))

cannot be in L, since 2^n (2^(m−n) + 1) has an odd factor 2^(m−n) + 1 greater
than 1 and therefore cannot be a power of 2. (You should convince yourself that
powers of 2 never have odd factors other than 1.) This shows that the suffix sets
S(0^(2^n)) are all distinct, so L is not regular by Theorem 5.3 and therefore not
context free by Theorem 9.2.

9.2. Turing machines


We have seen that there are correspondences between regular languages and finite
state recognition machines and between context free languages and push down
automata. We now know that there are languages, some quite simple to describe,
that are not context free. It is natural, therefore, to seek a more powerful class of
machines capable of recognizing an even larger class of languages. As illustrated
at the start of Section 9.1, the power of a PDA seems to be limited by the way
in which it accesses the stack. We can define a more powerful type of machine
called a Turing machine (abbreviated as TM) by giving more flexible access to
memory. Instead of only ever reading from and writing to the stack top, we allow a
kind of sequential access memory that allows us to move back and forth between
locations, reading and writing as we go. We picture this kind of memory as a
tape extending infinitely in both directions that can store a symbol at each
location or cell.
... a b a 1 a b 0 ...
At any particular stage in processing, most of the tape is blank. In fact, there
are only ever finitely many cells that are not blank. We represent this situation
using a special blank symbol ␣.
... ␣ a b a 1 a b 0 ␣ ...
Since we can now read from and write to any cell, we need to know which cell
is currently the active one, usually called the read/write cell (or sometimes the
read/write head). After each step in processing, a TM moves the read/write cell
either one cell to the left or one cell to the right. We represent the position of the
read/write cell by underlining the symbol in that cell.
... ␣ a b a 1̲ a b 0 ␣ ...
Just like finite state machines and PDA, Turing machines have a set Q of states, an
initial state q0 and a set F of accepting states. A major difference from finite state
machines and PDA, however, is that there is no separate input string. Instead,
the input is placed directly on the tape initially. As you will see, our new class
of machines can do more than just pass through the input symbols one at a time.
They can pass back and forth along the input word (or parts of it) as many times
as necessary. This means we need the input right there on the tape where we can
process it.
This mode of operation also means that all Turing machines have output, because
we can regard the contents of the tape after processing has ceased as output. To
make all of this work, there must be an alphabet Γ of symbols (including the
blank symbol ␣) that can appear on the tape. It makes things easier to also have
an input alphabet Σ ⊆ Γ \ {␣}. This is not strictly necessary, but it gives a

way of restricting the allowed input symbols. Symbols (other than ␣) in Γ \ Σ
are typically used as temporary markers during computations. This idea will
become clear in the examples. To avoid confusion, we never use ␣ as an input
symbol or as a marker.
Finally, just like finite state machines and PDA, Turing machines have a partial
transition function δ which specifies how they operate. Given the current state
and the current read/write symbol, δ must tell us:
• which state to move to (as it would in a finite state machine or PDA).
• what to replace the symbol in the read/write cell with (we can replace it
with the same symbol if we like).
• whether to move left (represented by L) or right (represented by R) after
processing the symbol in the read/write cell.
So δ has domain Γ × Q and codomain Γ × Q × {L, R}. Putting all of this together
gives a (standard) Turing machine. In summary, a standard Turing machine¹
consists of the following.
• a set Q of states.
• a set F of accepting states.
• an initial state q0.
• a tape alphabet Γ containing the blank symbol ␣.
• a partial transition function δ : Γ × Q → Γ × Q × {L, R}.
Because TM do not simply process their input string one symbol at a time, it is
not immediately obvious when processing ceases. There is no sensible criterion
analogous to the empty stack criterion for PDA (especially if we want to regard
the contents of the tape after processing as output). The convention we adopt
is that processing stops whenever no further transitions are defined (remember
that δ is a partial function). This makes it easy to use a TM as a recognition
machine. We say that an input word is accepted precisely if processing stops in an
accepting state. If it stops in a non-accepting state (because the next transition is
undefined), the word is rejected. To keep things simple, we adopt the convention
in MAT2ALC that transitions will never be defined for any accepting state.
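It can help to see this definition in program form. The following Python sketch is our own illustration only (the names run_tm, BLANK and max_steps are inventions of the sketch, not standard notation); we will reuse it when discussing the examples below.

    # A minimal sketch of a standard Turing machine simulator.  The partial
    # transition function is a dictionary mapping (symbol, state) to
    # (new_symbol, new_state, direction); missing keys model undefined entries.

    BLANK = " "    # plays the role of the blank symbol

    def run_tm(delta, accepting, word, start="q0", max_steps=10_000):
        tape = dict(enumerate(word))          # only non-blank cells are stored
        head, state = 0, start
        for _ in range(max_steps):
            symbol = tape.get(head, BLANK)
            if (symbol, state) not in delta:  # processing stops here
                output = "".join(tape[i] for i in sorted(tape)).strip()
                return state in accepting, output
            new_symbol, state, direction = delta[(symbol, state)]
            tape[head] = new_symbol
            head += 1 if direction == "R" else -1
        raise RuntimeError("no halt within the step limit")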
The only way to really understand TM is to study some examples. To do so, we
need informative ways to represent TM and their operations. Transition tables
can certainly be given, but as for FSM, directed graphs are usually easier to use.
Transition tables are different from those for PDA because the domain of δ is slightly
simpler while the codomain is slightly more complicated. In a directed graph, edge
labellings look a little different. For example, edge labellings for the transitions

¹Named after the mathematical logician and cryptanalyst Alan Turing, who proposed what
we now call Turing machines in the 1930s. Many consider him the founder of computer science.

δ(a, p) = (u, q, R), δ(c, p) = (v, q, L) and δ(␣, r) = (w, s, L) are shown in Figure 1.
We will see why transitions like δ(␣, r) = (w, s, L) are necessary in the examples.

p --[ a ↦ (u, R), c ↦ (v, L) ]--> q        r --[ ␣ ↦ (w, L) ]--> s

Figure 1. Labelling edges in the directed graph of a Turing machine.

Example 9.2.1. The TM described below recognizes L = {0^n 1^n : n ∈ N} of
Example 6.3.2.

• Q = {q0, q1, q2, q3, q4}.
• initial state q0.
• tape alphabet Γ = {0, 1, a, ␣}.
• input alphabet Σ = {0, 1}.
• accepting states F = {q4}.

        x     |     0      |     1      |     a      |     ␣
    δ(x, q0)  | (a, q1, R) |            | (1, q4, L) |
    δ(x, q1)  | (0, q1, R) | (1, q1, R) | (1, q2, L) | (␣, q2, L)
    δ(x, q2)  |            | (a, q3, L) |            |
    δ(x, q3)  | (0, q3, L) | (1, q3, L) | (0, q0, R) |

q0 --[ 0 ↦ (a, R) ]--> q1
q1 --[ 0 ↦ (0, R), 1 ↦ (1, R) ]--> q1
q1 --[ a ↦ (1, L), ␣ ↦ (␣, L) ]--> q2
q2 --[ 1 ↦ (a, L) ]--> q3
q3 --[ 0 ↦ (0, L), 1 ↦ (1, L) ]--> q3
q3 --[ a ↦ (0, R) ]--> q0
q0 --[ a ↦ (1, L) ]--> q4

Figure 2. Turing machine accepting L = {0n 1n : n N}.



Like most TM this one has a strategy. It starts by putting an a in place of
the first 0 and the last 1; it then repeatedly moves the left a to the next 0 on
its right and the right a to the next 1 on its left. For a word w = 0^n 1^n ∈ L this
strategy should give a word of the form 0^{n−1} a a 1^{n−1} after n − 1 repeats and leave
the machine in state q0. It then moves to accepting state q4. In brief, the strategy
is to progressively transform the input word as follows

0^n 1^n → a 0^{n−1} 1^{n−1} a → 0 a 0^{n−2} 1^{n−2} a 1 → 0 0 a 0^{n−3} 1^{n−3} a 1 1 → · · · → 0^{n−1} a a 1^{n−1}

and accept it provided the a's meet up in the middle. Of course, we already
know how to build a PDA that recognizes L, but the contrast of methods will
be illuminating. We will also use this machine to build other, more powerful
machines. The details of the operation of this (or any) TM are best understood by
examining how some example inputs are processed using a configuration notation
for Turing machines which we now describe. At each step in processing we need
to know the state and the contents of the tape, which we write as an ordered pair.
We definitely do not want to write the contents of the tape in the cumbersome
way we did on page 94. Instead, we give the word consisting of all of the non-
blank symbols, including a blank at one end of this word when necessary
because the read/write cell contains a blank. Here is the configuration notation
representing the processing of w = 0011 ∈ L.

(q0, 0̲011) → (q1, a0̲11) → (q1, a01̲1) → (q1, a011̲) → (q1, a011␣̲)
→ (q2, a011̲) → (q3, a01̲a) → (q3, a0̲1a) → (q3, a̲01a) → (q0, 00̲1a)
→ (q1, 0a1̲a) → (q1, 0a1a̲) → (q2, 0a1̲1) → (q3, 0a̲a1) → (q0, 00a̲1)
→ (q4, 00̲11) (accept)

We can also use configuration notation to illustrate why the machine rejects words
that are not in L.

w = 001: (q0, 0̲01) → (q1, a0̲1) → (q1, a01̲) → (q1, a01␣̲) → (q2, a01̲) → (q3, a0̲a)
→ (q3, a̲0a) → (q0, 00̲a) → (q1, 0aa̲) → (q2, 0a̲1) (reject)
w = 011: (q0, 0̲11) → (q1, a1̲1) → (q1, a11̲) → (q1, a11␣̲) → (q2, a11̲) → (q3, a1̲a)
→ (q3, a̲1a) → (q0, 01̲a) (reject)
w = 010: (q0, 0̲10) → (q1, a1̲0) → (q1, a10̲) → (q1, a10␣̲) → (q2, a10̲) (reject)
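Assuming the run_tm sketch given after the definition (our illustration), the transition table of this example can be checked mechanically:

    # The transition table of Example 9.2.1 in dictionary form.
    delta = {
        ("0", "q0"): ("a", "q1", "R"), ("a", "q0"): ("1", "q4", "L"),
        ("0", "q1"): ("0", "q1", "R"), ("1", "q1"): ("1", "q1", "R"),
        ("a", "q1"): ("1", "q2", "L"), (" ", "q1"): (" ", "q2", "L"),
        ("1", "q2"): ("a", "q3", "L"),
        ("0", "q3"): ("0", "q3", "L"), ("1", "q3"): ("1", "q3", "L"),
        ("a", "q3"): ("0", "q0", "R"),
    }
    for w in ["0011", "001", "011", "010"]:
        print(w, run_tm(delta, {"q4"}, w)[0])   # True False False False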

We have already observed that TM always have output because the contents of the
tape at the end of processing may be viewed as output. We now consider examples
that make deliberate use of this idea to carry out some string calculations.

Example 9.2.2. If the TM in Figure 3 is started with input 0^n on the tape, it
halts in an accepting state with 0^{2n} on the tape. It starts by replacing the first
symbol in 0^n with the marker a. It then repeatedly shifts the a one cell to the right
and each time it does so writes an extra 0 at the left hand end of the tape
contents (the transition δ(␣, q1) = (0, q2, R)). It stops after the a reaches the right
end of the tape contents. This strategy has the effect of doubling 0^n. When the
tape contents after processing are regarded as output, a TM may be thought of as
calculating a string function that converts one string into another. This machine
computes the string function f : 0∗ → 0∗ defined by f(0^n) = 0^{2n}. It doubles the
string.

q0 --[ 0 ↦ (a, L) ]--> q1
q1 --[ 0 ↦ (0, L) ]--> q1
q1 --[ ␣ ↦ (0, R) ]--> q2
q2 --[ 0 ↦ (0, R) ]--> q2
q2 --[ a ↦ (0, R) ]--> q0
q0 --[ ␣ ↦ (␣, L) ]--> q3

Figure 3. Turing machine that computes f(0^n) = 0^{2n}.
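In the same dictionary form (again our own illustration), the machine of Figure 3 can be run with the run_tm sketch to confirm that it doubles its input:

    delta = {
        ("0", "q0"): ("a", "q1", "L"),   # mark the current 0 with a
        ("0", "q1"): ("0", "q1", "L"),   # scan left over the 0s
        (" ", "q1"): ("0", "q2", "R"),   # write an extra 0 at the left end
        ("0", "q2"): ("0", "q2", "R"),   # scan right back to the marker
        ("a", "q2"): ("0", "q0", "R"),   # shift the marker one cell right
        (" ", "q0"): (" ", "q3", "L"),   # marker passed the right end: halt
    }
    print(run_tm(delta, {"q3"}, "000"))  # (True, '000000')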

The string function of the machine of Example 9.2.2 can be computed much more
easily and efficiently by the FSM (with output) shown in Figure 4. Indeed, this
is a trivial task for an FSM. This illustrates just how much harder TM are to
program. However the machine of Example 9.2.2 can be expanded to perform a
task impossible for an FSM or PDA: computing the string function f : 0∗ → 0∗
defined by f(0^n) = 0^{2^n}, as we shall see in Example 9.2.5.

q0 --[ 0/00 ]--> q0

Figure 4. FSM that computes f(0^n) = 0^{2n}.

Example 9.2.3. A modification of the strategy used to construct the machine
of Example 9.2.1 gives us a machine with input language 0+ = {0^n : n ≥ 1}
that halves strings of even length and rejects strings of odd length. The strategy
is again to place the marker a at each end of the input word and move the markers
toward the middle, this time erasing the right hand end of the word as the right
hand marker moves toward the centre. Similar to Example 9.2.1, provided the
a's meet up in the middle, the word must be of even length. Notice that this
strategy could also be used to design a machine that checks whether a given
input word is of even length without erasing it. The machine implements the
string function
f(0^{2n}) = 0^n
with domain {0^{2n} : n ≥ 1}. It is shown in Figure 5.

q0 --[ 0 ↦ (a, R) ]--> q1
q1 --[ 0 ↦ (0, R) ]--> q1
q1 --[ a ↦ (␣, L), ␣ ↦ (␣, L) ]--> q2
q2 --[ 0 ↦ (a, L) ]--> q3
q3 --[ 0 ↦ (0, L) ]--> q3
q3 --[ a ↦ (0, R) ]--> q0
q0 --[ a ↦ (␣, L) ]--> q4

Figure 5. TM implementing f(0^{2n}) = 0^n on domain {0^{2n} : n ≥ 1}.

Comparing the processing of w = 0011 ∈ L in Example 9.2.1 with the PDA
processing of the same word on page 56 illustrates the fact that TM are typically
less efficient than PDA. They also require more care and subtlety in design and
are harder to program: it is harder to devise the strategy and work out the
transitions. So we would like to know that this is all worthwhile. So far we haven't
shown that TM have any more power than PDA. We rectify this by exhibiting a
machine that accepts the language
L = {0^{2^n} : n ≥ 0} = {w ∈ {0}∗ : (∃ m ≥ 0) |w| = 2^m}
which is not context free, as shown in Example 9.1.2. The strategy is to add some
further states to the machine of Example 9.2.3 that allow it to repeatedly halve
the length of the input string. If the length of the original input string was a
power of two, this should eventually yield a string of length one as follows:
0^{2^n} ↦ 0^{2^{n−1}} ↦ . . . ↦ 0^{2^1} ↦ 0^{2^0} = 0.
If not, a string of odd length greater than one will eventually be obtained. The
trick is to accept the string 0 if it appears as the tape contents at any stage of the
computation, but to reject any string 0^n where n > 1 and n is odd.
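At the level of an ordinary program the whole strategy is just a loop; the following Python sketch (ours, not a TM) may help fix the idea before reading the machine itself:

    def in_L(word: str) -> bool:
        # word is assumed to be 0^n for some n >= 1
        n = len(word)
        while n > 1:
            if n % 2 == 1:   # odd length greater than one: reject
                return False
            n //= 2          # one halving pass of the machine
        return True          # the string has been reduced to the single word 0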

Example 9.2.4. A modification of the machine of Example 9.2.3 gives us a
machine with input language 0+ = {0^n : n ≥ 1} that repeatedly halves the input
string. If an attempt to halve the string currently on the tape reveals that the
string is of odd length, the original string is rejected. Before attempting to halve
the string, however, the machine must first check that it is not just 0. If this
condition is detected (at state q1), the original string is accepted. The state q5
is used to move the head back to the start of the string after each halving. The
machine is shown in Figure 6.

q0 --[ 0 ↦ (a, R) ]--> q1
q1 --[ ␣ ↦ (␣, L) ]--> q6
q1 --[ 0 ↦ (0, R) ]--> q2
q2 --[ 0 ↦ (0, R) ]--> q2
q2 --[ a ↦ (␣, L), ␣ ↦ (␣, L) ]--> q3
q3 --[ 0 ↦ (a, L) ]--> q4
q4 --[ 0 ↦ (0, L) ]--> q4
q4 --[ a ↦ (0, R) ]--> q0
q0 --[ a ↦ (␣, L) ]--> q5
q5 --[ 0 ↦ (0, L) ]--> q5
q5 --[ ␣ ↦ (␣, R) ]--> q0

Figure 6. Turing machine that recognizes L = {0^{2^n} : n ≥ 0}.
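Taking the edge list above (our reconstruction of the figure) and the earlier run_tm sketch, the machine can be checked on a few inputs:

    delta = {
        ("0", "q0"): ("a", "q1", "R"),   # mark the left end
        (" ", "q1"): (" ", "q6", "L"),   # the string was just 0: accept
        ("0", "q1"): ("0", "q2", "R"),   # longer string: keep scanning right
        ("0", "q2"): ("0", "q2", "R"),
        ("a", "q2"): (" ", "q3", "L"),   # erase the old right marker
        (" ", "q2"): (" ", "q3", "L"),
        ("0", "q3"): ("a", "q4", "L"),   # place the new right marker
        ("0", "q4"): ("0", "q4", "L"),
        ("a", "q4"): ("0", "q0", "R"),   # markers still apart: next pass
        ("a", "q0"): (" ", "q5", "L"),   # markers met: halving succeeded
        ("0", "q5"): ("0", "q5", "L"),   # return to the start of the string
        (" ", "q5"): (" ", "q0", "R"),   # and halve again
    }
    for w in ["0", "00", "000", "00000000"]:
        print(w, run_tm(delta, {"q6"}, w)[0])   # True True False True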

If we think of TM as programs, the following example illustrates the idea of a
subprogram. Recall that the machine of Example 9.2.2 implements the string
function f(0^n) = 0^{2n}. We can use this function repeatedly to calculate the string
function f(0^n) = 0^{2^n}. We can design a TM that does this by starting with the
machine M of Example 9.2.2 and building a supervisor machine that causes M to
run n times. This is done in Example 9.2.5. It illustrates the modularity of Turing
machines. We frequently build up machines that perform more complicated tasks
by joining simpler machines together or plugging one into another in various ways.
Joining together two slightly modified versions of the machine of Example 9.2.1
gives a machine that accepts the non-context free language L1 = {0^n 1^n 2^n : n ≥ 1}
of Example 9.1.1.

Example 9.2.5. The TM of Figure 7 wraps the machine of Example 9.2.2
(states q0, q1, q2, q3) in a supervisor machine (states p0, p1, p2, p3, p4) which causes
it to run once for each 0 in the input word. The strategy is to first convert the
input string 0^n to 1^n, so we can use the 1s to count how many times we have
doubled and not mix up the counters with the output. Then each 1 (there
should be n of them to begin with) is deleted and the number of 0s is doubled
using a copy of the machine of Example 9.2.2. When all 1s have been deleted,
the machine halts.

p0 --[ 0 ↦ (1, R) ]--> p0
p0 --[ ␣ ↦ (␣, L) ]--> p4
p4 --[ 1 ↦ (␣, L) ]--> p1
p1 --[ 1 ↦ (1, L) ]--> p1
p1 --[ 0 ↦ (0, L), ␣ ↦ (0, L) ]--> p2
p2 --[ 0 ↦ (0, L) ]--> p2
p2 --[ ␣ ↦ (␣, R) ]--> q0
q0 --[ 0 ↦ (a, L) ]--> q1
q1 --[ 0 ↦ (0, L) ]--> q1
q1 --[ ␣ ↦ (0, R) ]--> q2
q2 --[ 0 ↦ (0, R) ]--> q2
q2 --[ a ↦ (0, R) ]--> q0
q0 --[ 1 ↦ (1, R) ]--> p3
p3 --[ 1 ↦ (1, R) ]--> p3
p3 --[ ␣ ↦ (␣, L) ]--> p4
q0 --[ ␣ ↦ (␣, L) ]--> q3

Figure 7. Turing machine that computes f(0^n) = 0^{2^n}.
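Again assuming the edge list above (our reconstruction of the figure) and the earlier run_tm sketch, the machine can be checked to compute f(0^n) = 0^{2^n} for small n:

    delta = {
        ("0", "p0"): ("1", "p0", "R"),   # convert the input 0s to 1s
        (" ", "p0"): (" ", "p4", "L"),   # end of input: go delete the first 1
        ("1", "p4"): (" ", "p1", "L"),   # delete the rightmost 1
        ("1", "p1"): ("1", "p1", "L"),   # walk left over the remaining 1s
        (" ", "p1"): ("0", "p2", "L"),   # no 0s yet: write the seed 0
        ("0", "p1"): ("0", "p2", "L"),   # otherwise step into the 0 block
        ("0", "p2"): ("0", "p2", "L"),   # walk left to the start of the 0s
        (" ", "p2"): (" ", "q0", "R"),   # hand over to the doubling machine
        ("0", "q0"): ("a", "q1", "L"),   # -- doubler of Example 9.2.2 --
        ("0", "q1"): ("0", "q1", "L"),
        (" ", "q1"): ("0", "q2", "R"),
        ("0", "q2"): ("0", "q2", "R"),
        ("a", "q2"): ("0", "q0", "R"),   # ------------------------------
        ("1", "q0"): ("1", "p3", "R"),   # doubling done and 1s remain
        ("1", "p3"): ("1", "p3", "R"),   # walk right to the end of the 1s
        (" ", "p3"): (" ", "p4", "L"),   # turn around, delete the next 1
        (" ", "q0"): (" ", "q3", "L"),   # no 1s left: halt in q3
    }
    for w in ["0", "00", "000"]:
        print(run_tm(delta, {"q3"}, w, start="p0", max_steps=100_000)[1])
    # prints 00, 0000, 00000000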

In view of the result of Example 9.1.2, the machine constructed in Example 9.2.4
proves conclusively that TM are strictly more powerful than PDA. In fact they
are very powerful indeed, even if somewhat clumsy and inefficient. We will come
back to the issue of just how powerful in Section 9.3. This increase in power
is not surprising given the substantially greater flexibility of memory access we
grant to TM. However, this greatly expanded power has its downside. We have
already observed that TM are typically less efficient than PDA and they tend to be
harder to program, but there is a far more serious problem. TM sometimes fail
to stop processing. Given certain input words, they may just continue processing
indefinitely, neither reaching an accepting state nor halting in a non-accepting
state due to an undefined transition. This is analogous to a program entering an
infinite loop.

Example 9.2.6. The very simple TM in Figure 8 was designed to accept the
regular language L = {(01)^n 0 : n ≥ 0} = {01}∗0 by simulating a simple
finite state recognition machine. Suppose we mistakenly (a typical programming

q0 --[ 0 ↦ (0, R) ]--> q1
q1 --[ 1 ↦ (1, R) ]--> q0
q1 --[ ␣ ↦ (␣, L) ]--> q2

Figure 8. Turing machine accepting L = {01}∗0.

error) include the transition 1 ↦ (1, L) in place of 1 ↦ (1, R), giving the almost
identical machine of Figure 9. This new machine still correctly accepts the word
0 ∈ L and it still correctly rejects, for example, words beginning with 1 or 00 (you
should check these claims). However, using configuration notation to analyse

q0 --[ 0 ↦ (0, R) ]--> q1
q1 --[ 1 ↦ (1, L) ]--> q0
q1 --[ ␣ ↦ (␣, L) ]--> q2

Figure 9. Incorrect version of the machine of Figure 8.

the processing of the word 010 ∈ L reveals a serious problem.


(q0, 0̲10) → (q1, 01̲0) → (q0, 0̲10) → (q1, 01̲0) → (q0, 0̲10) → (q1, 01̲0) → . . .
The machine cycles between the two configurations (q0, 0̲10) and (q1, 01̲0) forever!
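With the run_tm sketch from earlier (our illustration), the loop shows up as the step limit being exceeded:

    bad_delta = {
        ("0", "q0"): ("0", "q1", "R"),
        ("1", "q1"): ("1", "q0", "L"),   # the mistaken L in place of R
        (" ", "q1"): (" ", "q2", "L"),
    }
    print(run_tm(bad_delta, {"q2"}, "0")[0])   # True: this word still works
    try:
        run_tm(bad_delta, {"q2"}, "010")       # never halts
    except RuntimeError as e:
        print(e)                               # no halt within the step limit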

In the next section we will see that TM provide the most widely accepted model
of what it means to be computable by an algorithm. Given that they enjoy such
power, it is not surprising that they run the risk, well known to every programmer,
of going into an infinite loop.

9.3. The power of Turing machines


9.3.1. The gold standard. Standard TM of the type presented in Section
9.2 have underpinned the accepted model of what can and cannot be computed
by an algorithm for about sixty years. Many variations on the standard TM have
been proposed:
(a) Non-deterministic Turing machines allow for a non-deterministic transi-
tion function in much the same way as for finite state machines or PDA.
(b) Multiple tape Turing machines have more than one tape and can read
from and write to all of them at each move.
(c) Turing machines with a stay put option have a third option S in addition
to L and R which means "don't move the read/write cell".

(d) Turing machines with a one sided tape have a tape with a fixed starting
point that only extends infinitely in one direction.
None of these have yielded anything different. Just as non-deterministic finite
state recognition machines turn out to be equivalent in power to deterministic
ones, so non-deterministic TM are equivalent in power to deterministic ones.
Similarly, the other variations have all turned out to yield classes of machines
equivalent in power to standard TM. This is not to say that these variations are
of no interest. Some of them allow for easier coding of algorithms, easier proofs
and so on. For example, multiple tapes allow for many words to be processed si-
multaneously and compared with one another, or for distinct tapes to be used for
input and output. Various other alternative models of computation have also been pro-
posed, but they have all turned out to be equivalent to (or in some cases weaker
than) standard TM.
This history has led to a widespread acceptance of the idea that TM provide
a definition of what can be computed by an algorithm. A function is said
to be computable by an algorithm precisely if there is a TM that computes it and
halts on all possible inputs. We want our machine to halt on all possible inputs
because we feel that something worthy of the name "algorithm" should not go
into an infinite loop and should eventually give an output in all cases. Roughly
speaking, this idea of equating algorithmic computability with computability by
a TM (or with some equivalent system) is known as the Church-Turing thesis².
As a definition of computability it can neither be proved nor disproved, but the
fact that it has stood the test of time gives us confidence that it is a sensible
definition. Not only are halting TM the gold standard of what can and cannot
be computed by an algorithm, they also underpin the theory of computational
complexity, which analyses how efficiently (both in terms of time and memory)
computations can be done.
9.3.2. The limits of computation. Notwithstanding the above comments,
TM are not omnipotent. There are some things they can't do. One consequence
of the following theorem is that there are even languages they cannot recognize.

Theorem 9.3. There are countably many Turing recognition machines with a
given input alphabet Σ = {x1, x2, . . . , xm}.

Proof of Theorem 9.3.
To keep things simple, we assume that the set of states of an n state TM is
Qn = {q0, q1, . . . , q_{n−1}}, the initial state is q0 (it should be clear that it doesn't
²The mathematical logician Alonzo Church was Alan Turing's PhD supervisor.

really matter how we label the states) and that the set F of accepting states is
not empty (since the language of a recognition machine with no accepting states
is empty). We also assume that the set of marker elements of the tape alphabet,
that is, the set Γ \ (Σ ∪ {␣}), is of the form {a1, . . . , ak} for some k ≥ 0 (because
it doesn't really matter what the markers are called, as long as we have enough
of them). The transition table of a machine with n states and k markers has
the following structure.

           x      |  x1  . . .  xm  |  a1  . . .  ak  |  ␣
    δ(x, q0)      |  ∗
    δ(x, q1)      |
       . . .      |
    δ(x, q_{n−1}) |

The table entry for (x1, q0) marked by ∗ is either undefined or of the form (b, q, D) for
some q ∈ Qn, b ∈ {x1, . . . , xm, a1, . . . , ak, ␣} and D ∈ {L, R}. There are
m + k + 1 possibilities for b, multiplied by n possibilities for q, multiplied by 2
possibilities for D; adding 1 for the possibility that the entry is undefined gives
2n(m + k + 1) + 1 possible ways of completing the entry
for (x1, q0). But this argument works exactly the same for each of the n(m + k + 1)
entries in the table. By the type of counting argument familiar from MAT1DM,
this means that the total number of ways to fill in the table entries is
(2n(m + k + 1) + 1)^{n(m+k+1)}. (∗)
Since we have agreed that q0 is the initial state, the only remaining issue is which
states are accepting. Now F ⊆ Q, so the number of possible ways of choosing F is
|P(Q)| = 2^n and, since we usually want our machine to have at least one accepting
state, we can rule out F = ∅, giving 2^n − 1 choices for F. Putting this together
with (∗), there are
(2^n − 1)(2n(m + k + 1) + 1)^{n(m+k+1)} (∗∗)
elements in the set M(n,k) of machines with n states and k markers.
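The count (∗∗) is easy to evaluate directly; the following Python sketch (ours, for exploring how fast it grows) simply transcribes the formula:

    def count_machines(n: int, m: int, k: int) -> int:
        # (**): n-state recognition machines, m input symbols, k markers
        entries = n * (m + k + 1)             # cells in the transition table
        per_entry = 2 * n * (m + k + 1) + 1   # (b, q, D) choices plus undefined
        return (2**n - 1) * per_entry**entries

    print(count_machines(1, 1, 0))   # 25: one state, one input symbol
    print(count_machines(2, 1, 0))   # 3 * 9**4 = 19683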
In particular M(n,k) is finite. We saw in Section 8.2 that the set N × N is countable.
It is easy to extend this argument to show that (N \ {0}) × N is countable (you should
check this), so there are countably many distinct sets M(n,k), which we may list as
M1, M2, M3, . . . Since each of these sets is finite, Theorem 8.5 shows that their
union
M = M1 ∪ M2 ∪ M3 ∪ · · ·
is countable. But M is clearly the set of all Turing recognition machines with
input alphabet Σ. ✷

In view of Theorems 8.9 and 9.3, there must be languages with alphabet
Σ = {0, 1} that are not recognized by any TM. Remarkably, there must even be
languages with one letter input alphabet Σ = {0} that are not recognized by any
TM. Notice that the problem here is not that we lack the cunning to construct
recognition machines for some languages. It is that there aren't enough recognition
machines to recognize all of the possible languages with input alphabet Σ. Given
that some languages are not recognized by TM, there is a name for those that
are. They are called recursively enumerable. It even turns out that there are
some languages that can be recognised by a TM but not by any TM that halts
on all possible inputs. Because of this situation, there is a name for the class of
languages recognised by some TM that is guaranteed to halt on all inputs. Such
languages are called recursive. In view of the above discussion, we may think of
the recursive languages as those that can be recognized by some kind of algorithm.
This result is just the tip of the iceberg. There are many, many things that simply
cannot be done by TM. Perhaps the most famous is the halting problem for TM
themselves. This is the problem of deciding whether a given Turing machine M
with input alphabet Σ fails to halt when it attempts to process a given word w.
Before we can even think about asking a TM to solve this problem, we need to
be able to represent our machine M in some form that can be used as input to
another TM. This always turns out to be possible because of the finite nature of
a TM:

• The input and tape alphabets are finite.
• The transition table has finitely many entries.
• There are finitely many states, one of which is initial, and finitely many
of which are accepting.

We can represent this information using a finite string. The main difficulty is that
if we want to feed this string to a TM, it must be based on a finite alphabet. This
means we can't have infinitely many symbols q0, q1, q2, . . . for our states. We can
avoid this problem by using the words q, qq, qqq, . . . to represent the states. This
means we only need one symbol to represent all of the states. A similar trick can
be used for marker symbols, which we represent as a, aa, aaa, . . . and so on.

Example 9.3.1. The machine of Figure 9 has the transition table shown and:

• states Q = {q0, q1, q2}.
• initial state q0.
• accepting states F = {q2}.
• tape alphabet Γ = {0, 1, ␣}.
• input alphabet Σ = {0, 1}.

        x     |     0      |     1      |     ␣
    δ(x, q0)  | (0, q1, R) |            |
    δ(x, q1)  |            | (1, q0, L) | (␣, q2, L)
    δ(x, q2)  |            |            |

As in the discussion following Theorem 9.3, we simplify things by assuming the
first named state is the initial state, and we don't really need to mention the
symbol ␣ because every machine has ␣ in its tape alphabet. With these conventions,
we could encode this TM in a single string S

S = q/qq/qqq ∗ qqq ∗ 0/1 ∗ 0/1 ∗ (0/qq/R)/✷/✷ | ✷/(1/q/L)/(␣/qqq/L) | ✷/✷/✷

using ∗ as a delimiter separating the various parts of the description and | as a
delimiter separating the rows in the transition table, which we just write down
in order. Many other encodings are possible, some no doubt more efficient and
easier to process. The point is that our string is made from a fixed finite alphabet

Σ0 = {q, ✷, |, (, ), ∗, ␣, /}

and completely defines the operation of the machine, because from the string
S you could easily write down the transition table and hence draw the graph.
Moreover, Σ0 could be used to describe any machine with input alphabet Σ. (We
used a forward slash (/) in place of a comma (,) here to avoid some very confusing
set notation.) It is now a simple matter to add a word w ∈ {0, 1}∗ to this encoding
by simply adding ∗w to the end of S. We are now ready to feed this string S ∗ w
to our very clever TM, in the hope that it can decide whether this machine would
halt if given the input w. ✷
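One way to mechanize such an encoding in Python is sketched below. The delimiter choices and the use of # for an undefined entry are our own assumptions, following the spirit of the example rather than its exact string:

    def encode(states, accepting, tape_alphabet, table, word):
        # Unary state names q, qq, qqq, ... over the fixed finite alphabet.
        name = {s: "q" * (i + 1) for i, s in enumerate(states)}
        def entry(e):
            # '#' stands in for an undefined table entry
            return "#" if e is None else "(%s/%s/%s)" % (e[0], name[e[1]], e[2])
        rows = "|".join("/".join(entry(table.get((x, s))) for x in tape_alphabet)
                        for s in states)
        return "*".join(["/".join(name[s] for s in states),
                         "/".join(name[s] for s in accepting),
                         "/".join(tape_alphabet),
                         rows,
                         word])

    s = encode(["q0", "q1", "q2"], ["q2"], ["0", "1", " "],
               {("0", "q0"): ("0", "q1", "R"),
                ("1", "q1"): ("1", "q0", "L"),
                (" ", "q1"): (" ", "q2", "L")}, "010")
    print(s)   # q/qq/qqq*qqq*0/1/ *(0/qq/R)/#/#|#/(1/q/L)/( /qqq/L)|#/#/#*010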

This is an example of a decision problem. We seek a TM that can decide whether
the machine halts, given any Turing machine M with input alphabet Σ and any
input word w ∈ Σ∗. But it can't be done! A rigorous mathematical proof can
be given to show that no such machine exists (no matter how cleverly we encode the
machines). This doesn't mean that we can never decide whether a particular
machine halts for a particular input. It means there is no TM that can decide this
question for all possible TM with input alphabet Σ and words in Σ∗. This has
immensely important consequences in computer science. It means, for example,
that we cannot write a program that can check whether a program given to it as
input will halt on all possible inputs. A similar proof shows there is no general
algorithmic way of checking whether a given program will go into
an infinite loop on some input.
There are many other famous decision problems for which there is no TM that
works in all cases. Such problems are called undecidable. Many of them are
difficult even to describe, let alone attempt to solve. On the other hand, some of
the undecidable problems concerning context free grammars are easy to describe.
Here it is very easy to see how to encode the objects we wish to study. A set
of production rules like S → 00S11 for a grammar is, after all, just a
string of symbols, and the terminal and non-terminal symbols can be coded as in

Example 9.3.1. Among the many decision problems known to be undecidable are
the following surprisingly simple ones.
• Is the language of a given context free grammar G a regular language?
• For a given context free grammar G with set Σ of terminal symbols, is
the language of G the whole of Σ∗?
• Do a pair of context free grammars G1 and G2 give the same language?
Here again, the claim is not that we can never decide whether a particular context
free grammar actually generates a regular language. The claim is that there is no
TM that can decide this question for all possible context free grammars.
