This material is copyright Kevin Bicknell, John Banks and the Department of Mathematics and Statistics, La Trobe
University. As such, reproduction of this material may only be undertaken with
their express permission.
Machines and Languages
Subject Notes Part A for MAT2ALC
Algebra, Linear Codes and Automata
This text was developed by Peter Stacey and subsequently revised by Kevin
Bicknell and John Banks. The 2012 full edition was typeset by John Banks.
Example 1.1.2. When D1 , D2 are sets of numbers we can draw a picture of their
Cartesian product. We represent the ordered pair (d1 , d2 ) by the point in the
plane at a horizontal distance d1 and a vertical distance d2 from a chosen origin.
For example, if D1 = {1, 2} and D2 = {2, 3, 4} then
D1 × D2 = {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)}
[figure: the six points of D1 × D2 plotted against x and y axes]
1.2. Relations
If D1 , . . . , Dn are any sets, then a relation between elements of these sets is a
subset of the Cartesian product D1 × D2 × · · · × Dn .
Fenced off sections of the text can be omitted by students focussing on the basics.
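The definition of a relation as a subset of a Cartesian product is easy to make concrete in code. The Python sketch below is our own illustration (the relation chosen, "first coordinate less than second", is not from the notes):

```python
from itertools import product

D1 = {1, 2}
D2 = {2, 3, 4}

# The full Cartesian product D1 x D2 as a set of ordered pairs.
cartesian = set(product(D1, D2))

# A relation between D1 and D2 is any subset of the product,
# e.g. "the first coordinate is less than the second".
R = {(a, b) for (a, b) in cartesian if a < b}

assert R <= cartesian          # every relation is a subset of the product
assert (1, 3) in R and (2, 2) not in R
```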
6 1. RELATIONS AND FUNCTIONS
1.3. Functions
A function is a binary relation R ⊆ D1 × D2 with the special property that, for
each d1 ∈ D1 there is exactly one d2 ∈ D2 with (d1 , d2 ) ∈ R. The set D1 is usually
called the domain of the function and D2 is called its codomain and we say that
R is a function from D1 to D2 .
In Example 1.3.3 there was a rule associated with elements of R, namely square
the first element to get the second one. Similarly, all functions can be thought
of as rules. Given an element d1 in the domain, the rule produces the unique
element d2 in the codomain for which (d1 , d2 ) ∈ R. If we call the function f , then
we use the notation f (d1 ) = d2 to describe the fact that d2 depends on d1 . When
discussing functions we will frequently use the traditional notation
f : D1 → D2
When the domain of a function f is finite (as is often the case in computer science
applications) we can represent it using a table by simply tabulating all of the x
and f (x) values.
Partial functions arise in many settings in computer science. We will see many
examples of them in our study of automata. As for functions, we can specify a
partial function on a finite set using a table. The only difference is that we use
the symbol ∅ to indicate that the partial function is not defined for certain input
values.
Example 1.3.8. The following table gives the values of the square root
relation
S = {(x, y) ∈ D × D : x = y²}
on the set D = {0, 1, 2, 3, 4, 5} and illustrates the fact that S is indeed a partial
function.
x    | 0 | 1 | 2 | 3 | 4 | 5
f(x) | 0 | 1 | ∅ | ∅ | 2 | ∅
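A table like this translates directly into code. The dictionary encoding below is our own choice, not from the notes; missing keys play the role of the ∅ entries:

```python
D = {0, 1, 2, 3, 4, 5}

# The square root partial function S = {(x, y) in D x D : x = y^2},
# tabulated with absent keys marking the undefined inputs.
sqrt_table = {0: 0, 1: 1, 4: 2}

def S(x):
    """Return the square root of x within D, or None where S is undefined."""
    return sqrt_table.get(x)  # .get returns None for missing keys

assert S(4) == 2
assert S(3) is None  # S is undefined at 3, so S is only a partial function
```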
Nn = {1, 2, . . . , n}
that have the property that f (i) ∈ Di for each i ∈ Nn . Although the equivalence
of this approach may not seem obvious at first sight, notice that:
(a) For each function f having the above property, it is true by definition
that the n-tuple (f (1), f (2), . . . , f (n)) is an element of D1 × D2 × · · · × Dn
as traditionally defined.
(b) For each (d1 , d2 , . . . , dn ) ∈ D1 × D2 × · · · × Dn the function
f : Nn → D1 ∪ D2 ∪ · · · ∪ Dn : i ↦ di
x     | name     | residence
f1(x) | Andrew   | Greensborough
f2(x) | Michelle | Bundoora
f3(x) | Tracey   | Greensborough
The fact that the function representation of the Cartesian product allows us to
describe this relation using a table will be particularly relevant for description of
relations in relational databases, as discussed in the next chapter. In this context,
we typically omit the names of the functions (unless we need them for some
reason), so the table becomes a bit simpler:
name     | residence
Andrew   | Greensborough
Michelle | Bundoora
Tracey   | Greensborough
The function representation of Cartesian products makes it easy to define the product
of infinitely many sets. For example, the set of all infinite sequences of natural
numbers is an infinite product of copies of N. This means the set of all possible
functions S : N → N. The index set is now N. Similarly, the set of all infinite
sequences of real numbers is an infinite product of copies of R, which just means
the set of all functions S : N → R. Such products are of vital importance in many
branches of mathematics.
R ∪ S    R ∩ S    R \ S
It is easy to see that if R and S are both subsets of the same Cartesian product
D1 × D2 × · · · × Dn
then each of these is again a relation between elements of the same sets.
Example 1.5.1.
(a) If R is the less than relation {(m, n) : m, n ∈ N, m < n} on the natural
numbers N and S is the equals relation {(m, m) : m ∈ N} on N, then
R ∪ S is the less than or equal to relation on N.
(b) If R is the less than or equal to relation {(m, n) : m, n ∈ N, m ≤ n} on N
and S is the greater than or equal to relation {(m, n) : m, n ∈ N, m ≥ n}
on N, then R ∩ S is the equals relation on N.
(c) If R is the less than or equal to relation {(m, n) : m, n ∈ N, m ≤ n}
on N and S is the equals relation {(m, m) : m ∈ N} on the natural
numbers, then R \ S is the less than relation on N.
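Parts (a) to (c) can be checked directly with Python's set operations, since relations of the same arity are just sets of tuples. The sketch below is our own and works over a finite slice of N, since the whole of N cannot be enumerated:

```python
N = range(10)  # a finite slice of the natural numbers, for illustration only

less_than = {(m, n) for m in N for n in N if m < n}
equals = {(m, m) for m in N}
less_equal = {(m, n) for m in N for n in N if m <= n}
greater_equal = {(m, n) for m in N for n in N if m >= n}

assert less_than | equals == less_equal          # (a) union
assert less_equal & greater_equal == equals      # (b) intersection
assert less_equal - equals == less_than          # (c) difference
```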
We can also take unions, intersections and differences of pairs of relations that are
subsets of different Cartesian products, but this only yields a relation in cases
where the arities (the numbers of sets in the two products) are the same. If
R ⊆ D1 × D2 × · · · × Dn and S ⊆ E1 × E2 × · · · × En it turns out that
Example 1.5.2.
(a) Let F be the set of female mathematics students and M be the set of
male mathematics students at La Trobe University. The union of the
relations S = {(x, y) : x, y ∈ F, x is a sister of y} and B = {(x, y) :
x, y ∈ M, x is a brother of y} is the same sex sibling relation on the set
F ∪ M of all mathematics students. Note that S ∪ B is not the same as
Since a function is by definition a subset of a Cartesian product of two sets, the
domain and codomain, we can always take the union of two functions. The resulting
union may or may not turn out to be a function.
Example 1.5.3.
(a) The union of the functions f = {(x, √x) : x ∈ R, x ≥ 0} and
g = {(x, −√x) : x ∈ R, x ≥ 0} is the square root relation pictured in
Figure 2. Although f ∪ g is the inverse of the function {(x, x²) : x ∈ R},
it is not a function itself.
(b) The union of f = {(x, x) : x ∈ R, x ≥ 0} and g = {(x, −x) : x ∈ R, x ≤ 0}
is a function, known as the absolute value function, usually written |x|.
[Figure 2: the square root relation, together with the graph of the absolute value function |x|]
Example 2.1.1. Let D = {1, 2, 3} and let R be the less than or equal to
relation R = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}. The directed graph is
[directed graph of ≤ on the vertices 1, 2, 3]
For the same set, the directed graph representing the strictly less than relation
R = {(1, 2), (1, 3), (2, 3)} is
[directed graph of < on the vertices 1, 2, 3]
14 2. PROPERTIES OF BINARY RELATIONS
and the equality relation R = {(1, 1), (2, 2), (3, 3)} has directed graph
[directed graph of = on the vertices 1, 2, 3: a loop at each vertex]
We have just seen how easy it is to draw the directed graph of a relation written
as a finite set of ordered pairs. In the opposite direction, it is just as easy to write
down an ordered pair description of the relation represented by a given directed
graph. In fact, from a mathematical point of view a binary relation on a set S is
essentially the same thing as a directed graph with vertex set S.
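The correspondence between a binary relation on a set S and a directed graph with vertex set S is easy to express in code. The sketch below is our own illustration, converting between ordered pairs and an adjacency mapping:

```python
S = {1, 2, 3}
R = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}  # "less than or equal to"

# From relation to directed graph: an edge a -> b for each pair (a, b) in R.
adjacency = {v: {b for (a, b) in R if a == v} for v in S}

# From directed graph back to relation: read off the edges.
R_back = {(a, b) for a in adjacency for b in adjacency[a]}

assert adjacency[1] == {1, 2, 3}
assert R_back == R  # the two descriptions carry exactly the same information
```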
(a) Symmetry: Whenever there is an edge from a to b there is also an edge
from b to a. Between two vertices you may see edges in both directions, or
no edges at all, but never an edge in one direction only.
(b) Antisymmetry: There are no pairs of distinct vertices a and b with an edge from
a to b and an edge from b to a. You can never have edges in both directions
between two distinct vertices.
(c) Transitivity: Wherever you see an edge from a to b and an edge from b to c,
you must also see an edge from a to c. Note that this applies even when the
vertices are not all distinct: if you see an edge from a to b and an edge from
b back to a, then transitivity forces loops at both a and b.
In addition to the above properties, note that concepts like being a partial func-
tion and being a function are also properties of binary relations. These, too,
admit simple interpretations in terms of directed graphs.
(a) Partial Function: There is at most one edge coming out of every vertex.
You can never have:
[diagram: a vertex a with edges to two different vertices b and c]
(b) Function: There is exactly one edge coming out of every vertex.
You can never have a vertex with two edges coming out of it, or a vertex with
no out edge at all.
2.2.2. Partial orders and equivalences. Two types of binary relations are
particularly important in computer science:
A relation which is reflexive, transitive and antisymmetric is known as a
partial order.
A relation which is symmetric, transitive and reflexive is called an equiv-
alence relation. For an equivalence relation R it is customary to write
x R y (or just x ∼ y when the relation is clear) rather than (x, y) ∈ R.
Example 2.2.1.
(a) Let R = {(n, n) : n ∈ Z} (the equality relation on Z). Then R is sym-
metric (because x = y implies y = x), reflexive and transitive, so is an
equivalence relation. It is also antisymmetric in a subtle way (because we
never have (x, y) ∈ R and x ≠ y), so is also a partial order.
(b) Let R = {(x, y) ∈ Z × Z : x ≤ y} (the less than or equal to relation
on Z). Then R is reflexive, transitive and antisymmetric (because when
x ≤ y and x ≠ y then y ≰ x.) It is not symmetric (because, for example,
(1, 2) ∈ R but (2, 1) ∉ R.) Hence it is a partial order (as we would hope)
but not an equivalence relation.
(c) Let R = {(x, y) ∈ Z × Z : x < y} (the strictly less than relation on
Z). Then R is transitive and antisymmetric but neither reflexive nor
symmetric. It is therefore not a partial order.
(d) Let R = {(x, y) ∈ Z × Z : x − y is a multiple of 2}. Then R is symmetric
(because if x − y = 2m then y − x = 2(−m)), reflexive (because x − x =
2 · 0) and transitive (because if x − y = 2m and y − z = 2n then
x − z = 2(m + n)). It is therefore an equivalence relation. It is not
antisymmetric because (1, 3) ∈ R with 1 ≠ 3 and (3, 1) ∈ R.
(e) Let R = {(x, y) ∈ Z × Z : x is a factor of y}. Then R is reflexive (because
x = 1 · x for each x), transitive (because if y = mx and z = ny then
z = (mn)x), not symmetric (because 2 is a factor of 4 but 4 is not a factor
of 2)
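For finite relations, checks like those in this example can be automated. The Python sketch below is our own; the helper names and the finite slice of Z are our choices, not the notes':

```python
def is_reflexive(R, S):
    """S is the underlying set; only reflexivity needs it."""
    return all((x, x) in R for x in S)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

Z8 = range(-4, 4)  # a finite slice of Z for illustration
same_parity = {(x, y) for x in Z8 for y in Z8 if (x - y) % 2 == 0}

# As in part (d): an equivalence relation that is not antisymmetric.
assert is_reflexive(same_parity, Z8)
assert is_symmetric(same_parity)
assert is_transitive(same_parity)
assert not is_antisymmetric(same_parity)
```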
[figure: a set partitioned into subsets S1, S2, . . . , Sn]
Example 2.2.2. (a) For the example R = {(n, n) : n Z}, each element is
related only to itself, i.e. [n] = {n} for each n. Hence each equivalence
class contains exactly one number. There are infinitely many equivalence
classes, giving a partition of Z into infinitely many sets.
(b) For the example R = {(x, y) ∈ Z × Z : x − y is a multiple of 2}, the equiv-
alence class containing 0 consists of the even numbers and the equivalence
class containing 1 consists of the odd numbers, i.e.
[0] = {y : y − 0 is a multiple of 2} = {. . . , −2, 0, 2, 4, . . .}
and
[1] = {y : y − 1 is a multiple of 2} = {. . . , −3, −1, 1, 3, . . .}.
In this example the associated partition contains just two sets, each with
infinitely many elements.
[directed graph of a relation R on the vertices 1, 2, 3, 4, 5]
Adding edges from a to c wherever we see edges from a to b and from b to c yields
the following graph.
[the graph above with the new edges added]
But we are not yet done. The relation represented by this graph is still not
transitive because, for example, there is an edge from 1 to 3 and an edge from
3 to 4, but no edge from 1 to 4. So, repeating the procedure used above gives
the following graph, which does in fact represent a transitive relation (in fact it
represents the relation < on D). Here we repeated the procedure twice to obtain
the graph of the transitive closure of R. In general, we may need to repeat this procedure several times
to obtain the transitive closure.
[the final graph: the transitive closure, representing < on D]
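The edge-adding procedure just described is easy to implement. The Python sketch below is our own; it repeats the step until no new edges appear, which is exactly the "repeat several times" behaviour noted above:

```python
def transitive_closure(R):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(R)
    while True:
        new_edges = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_edges <= closure:   # no new edges: the relation is now transitive
            return closure
        closure |= new_edges

# The successor relation on {1, ..., 5}; its closure is "strictly less than".
R = {(i, i + 1) for i in range(1, 5)}
lt = {(i, j) for i in range(1, 6) for j in range(1, 6) if i < j}
assert transitive_closure(R) == lt
```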
so we stop calculating the Pi and let P = {{1, 2, 3, 5}, {4, 7, 9}, {6}, {8}, {10}}.
As you can see from the example, Algorithm 2.1 could easily be applied to small
P0 by simple inspection.
Chapter Three
Finite State Machines
Many mechanical devices, such as vending machines and electrical circuits, can
be modeled as finite state machines.
Example 3.1.1. For simplicity, consider a machine which sells two items A and
B, both costing $2. The set of inputs is {select A, select B, deposit $2}, the set
of outputs is {release A, release B, release nothing }, and the set Q of states is
{permit release, forbid release}. The transition function δ has, for example
δ(deposit $2, forbid release) = permit release
δ(select A, forbid release) = forbid release
δ(select A, permit release) = forbid release
x         | sA  | sB  | d$2
δ(x, per) | for | for | per
δ(x, for) | for | for | per

x         | sA  | sB  | d$2
f(x, per) | rA  | rB  | rn
f(x, for) | rn  | rn  | rn
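Tables like these translate directly into dictionaries, so the machine can be simulated. The sketch below is our own illustration, assuming the abbreviations used in the tables (sA/sB = select A/B, d$2 = deposit $2, per/for = permit/forbid release, rA/rB/rn = release A/B/nothing):

```python
# Transition function delta and output function f, keyed by (input, state).
delta = {
    ("sA", "per"): "for", ("sB", "per"): "for", ("d$2", "per"): "per",
    ("sA", "for"): "for", ("sB", "for"): "for", ("d$2", "for"): "per",
}
f = {
    ("sA", "per"): "rA", ("sB", "per"): "rB", ("d$2", "per"): "rn",
    ("sA", "for"): "rn", ("sB", "for"): "rn", ("d$2", "for"): "rn",
}

def run(inputs, state="for"):
    """Feed a sequence of inputs to the machine, collecting the outputs."""
    outputs = []
    for x in inputs:
        outputs.append(f[(x, state)])
        state = delta[(x, state)]
    return outputs

# Depositing $2 and then selecting A releases item A; selecting again does not.
assert run(["d$2", "sA", "sA"]) == ["rn", "rA", "rn"]
```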
The other way to describe the machine is by a directed graph with the vertices
labelled by the states and the edges, representing possible transitions between
states, labelled by the corresponding inputs and outputs. The graph for Example
3.1.1 is shown in Figure 1.
Since δ and f are functions of two variables, the function tables are slightly more compli-
cated than the ones we have seen so far.
3.1. DETERMINISTIC FINITE STATE MACHINES 23
[Figure 1: the directed graph of the vending machine, with states per and for and edges labelled input/output, for example sB/rB and sB/rn]
Example 3.1.2. A finite state machine with the states carry and don't carry can
be used to add a pair of binary numbers which are input as a sequence of pairs of
binary digits. For example, to add 1101 and 11, the pairs (starting from the right)
(1,1), (0,1), (1,0), (1,0), (0,0) are entered in turn. (The final (0,0) allows for
carry overs.) The transition function is
To see this, note that we must carry if the total of the two inputs and any currently
carried digit is at least 2. The output function would be
The output from a given string of inputs can be calculated once we know the
initial state, the transition function and the output function. For example, to add
the numbers 1101 and 11 using the machine of Example 3.1.2 we can record the
output as follows.
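The tables themselves are not reproduced here, but the rule quoted above (carry exactly when the two input digits plus any carried digit total at least 2, the output being that total modulo 2) is enough to sketch the adder. The encoding below is our own illustration, not a table from the notes:

```python
def serial_add(pairs):
    """Add two binary numbers fed in as (digit, digit) pairs, least
    significant first, using the carry / don't-carry two-state machine."""
    state = "dont carry"
    out_digits = []
    for a, b in pairs:
        total = a + b + (1 if state == "carry" else 0)
        out_digits.append(total % 2)                     # output function
        state = "carry" if total >= 2 else "dont carry"  # transition function
    return out_digits

# Adding 1101 (thirteen) and 0011 (three): pairs are entered right to left.
pairs = [(1, 1), (0, 1), (1, 0), (1, 0), (0, 0)]
digits = serial_add(pairs)          # least significant digit first
assert digits == [0, 0, 0, 0, 1]    # i.e. 10000 in binary, which is sixteen
```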
When we draw the directed graph of a finite state machine without output each
edge has just one label, giving the corresponding input. For example, the flip-flop
of Example 3.2.1 has the following diagram.
[diagram: states s1 and s2; input 1 moves between s1 and s2 in either direction, while input 0 leaves each state unchanged]
[diagram: states q0 (initial) and q1 (accepting); input a moves between q0 and q1, input b leaves each state unchanged]
inputs:      a   a   b   b   a   b
state:   q0  q1  q0  q0  q0  q1  q1
Since the final state q1 is accepting, the word is accepted. On the other hand if
the input had been aabb (with the final ab deleted) then the final state would have
been q0 and the input word would have been rejected.
The set of input words accepted by a recognition machine is called the language of
the machine. For example, the language of the machine in Example 3.3.1 consists
of all words with an odd number of as. To see this, note that the input b never
changes the state but the input a always does. Hence to move from q0 to q1
requires an odd number of inputs a.
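The claim about the language can be checked by simulating the machine. Below is our own sketch of the two-state recogniser, with b-loops and a-edges as described:

```python
delta = {
    ("a", "q0"): "q1", ("b", "q0"): "q0",  # a toggles the state, b keeps it
    ("a", "q1"): "q0", ("b", "q1"): "q1",
}

def accepts(word, start="q0", accepting={"q1"}):
    state = start
    for symbol in word:
        state = delta[(symbol, state)]
    return state in accepting

assert accepts("aabbab")            # odd number of as: accepted
assert not accepts("aabb")          # even number of as: rejected
assert accepts("b" * 5 + "a")       # the bs never matter
```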
The purpose of a recognition machine is to accept or reject the finite words from a
given alphabet. Two machines which accept precisely the same words are therefore
effectively the same and we say that they are equivalent. This gives an equivalence
relation on the class of all recognition machines.
[diagram: states q0, q1, q2 with a loop labelled b at each state and edges labelled a joining q0 to q1 and q1 to q2 in both directions]
and so on. This notation extends easily to cases where more than one symbol is
repeated. The language
{0101 . . . 01 : n ≥ 0}  (the word 01 repeated n times)
consisting of an arbitrary number of repeats of the word 01, for example, can be
written more briefly as
{(01)^n : n ≥ 0}.
Note that in this case the parentheses around the 01 are not part of the language
itself. They are used as a notation that shows the scope of the power. Without
them the expression
{01^n : n ≥ 0}
would be taken to mean
{011 . . . 1 : n ≥ 0}  (the symbol 1 repeated n times).
Another convenient notation is n_a(w), which counts the number of occurrences
of a particular alphabet symbol a in a given word w. Using this notation we can
write the language of the recognition machine of Examples 3.3.1 and 3.3.2 as
{w : n_a(w) is odd}.
The only problem with this notation is that it doesn't tell us what letters are
allowed to be in w apart from a. The star notation allows us to rectify this. For
a given alphabet Σ, the set of all possible words or input strings made from the
symbols in Σ is denoted by Σ*. With this notation we can describe the language
of Examples 3.3.1 and 3.3.2 unambiguously as
{w ∈ {a, b}* : n_a(w) is odd}.
We will discuss the star notation further in the next chapter. Finally it is conve-
nient to have a notation for the null or empty word, the word with no symbols
at all. We write this as λ.
δ*(x1, S) = δ(x1, S)
δ*(x1 . . . xn+1, S) = δ(xn+1, δ*(x1 . . . xn, S))
F(x, S) = f(x, S)
F(x1 . . . xn+1, S) = f(xn+1, δ*(x1 . . . xn, S))
where f(x, s) is the output when the machine is in state s and receives input x.
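The recursive definition of the extended transition function translates directly into code. The sketch below is our own, peeling off the last input symbol exactly as in the recursion, and reusing the odd-number-of-as machine for illustration:

```python
def delta_star(delta, word, state):
    """Extended transition function: the state reached after a whole word."""
    if len(word) <= 1:
        return delta[(word, state)] if word else state
    # delta*(x1...xn+1, S) = delta(xn+1, delta*(x1...xn, S))
    return delta[(word[-1], delta_star(delta, word[:-1], state))]

# The odd-number-of-as machine: a toggles the state, b keeps it.
delta = {("a", "q0"): "q1", ("b", "q0"): "q0",
         ("a", "q1"): "q0", ("b", "q1"): "q1"}

assert delta_star(delta, "aab", "q0") == "q0"
assert delta_star(delta, "aabbab", "q0") == "q1"
```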
This example illustrates how δ* can be used to write down a definition of the
language of a recognition machine:
L = {w : δ*(w, q0) ∈ F}
where q0 is the initial state and F is the set of accepting states. In fact, a similar
idea applies to any state qi . We define the set of words that take us to an
accepting state if we start processing at qi . This set
S(qi) = {w : δ*(w, qi) ∈ F}
is called the suffix set of qi . The concept of a suffix set can make the task of
calculating the language of a machine easier by breaking up the calculation into
smaller steps.
Example 3.5.2. We calculate some suffix sets for the recognition machine shown.
[diagram: a recognition machine with states q0 (initial), q1, q2, q3, q4, q5 and inputs 0 and 1]
S(q5) = ∅.
S(q2) = {λ}.
S(q1) = {0}.
S(q4) = {1^n : n ≥ 0}.
S(q3) = {1^m 01^n : m, n ≥ 0}.
Using the above calculations of S(q1) and S(q3), the language of this machine is
S(q0) = {00} ∪ {1^m 01^n : m ≥ 1, n ≥ 0}.
Algorithm 4.1.
(a) Start with the string consisting of the starting symbol alone (and observe
that there is precisely one non-terminal symbol to begin with).
(b) While a non-terminal symbol remains in the string, either:
(i) use an appropriate production rule of type (RG1) to replace the non-
terminal symbol with a terminal symbol followed by a non-terminal
one (and observe that each time we do this there will be precisely
one non-terminal symbol present in the string).
(ii) Use a production rule of type (RG2) to replace the non-terminal
symbol in the string.
Once we perform step (ii), there will be no non-terminal symbols left, so
we are forced to stop.
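Algorithm 4.1 can be sketched as a word generator. The Python sketch below is our own; the particular grammar at the bottom is a hypothetical example in the spirit of Example 4.1.1, with "" standing for the empty word λ:

```python
import random

def derive(start, rules, rng):
    """Algorithm 4.1: keep rewriting the single non-terminal until an
    (RG2) rule A -> lambda removes it.  rules maps each non-terminal to
    a list of right-hand sides; "" plays the role of lambda."""
    string, nonterminal = "", start
    while nonterminal is not None:
        rhs = rng.choice(rules[nonterminal])
        if rhs == "":                          # (RG2): A -> lambda, stop
            nonterminal = None
        else:                                  # (RG1): A -> tB
            terminal, nonterminal = rhs[0], rhs[1]
            string += terminal
    return string

# A hypothetical regular grammar generating {0^n : n >= 1} u {0^n 1 : n >= 1}.
rules = {"S": ["0A"], "A": ["0A", "1B", ""], "B": [""]}
rng = random.Random(0)
words = {derive("S", rules, rng) for _ in range(200)}
assert all(w and w[0] == "0" and w.count("1") <= 1 for w in words)
assert all(set(w.rstrip("1")) == {"0"} for w in words)
```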
There is a subtle point about the definition of a regular language that is easily
missed. It is sometimes possible to generate a regular language using a grammar
that is not regular. Example 4.1.3 illustrates this. The definition of a regular
language requires only that there exists at least one regular grammar for the
language, not that all grammars for that language are regular.
As soon as we choose to use the rule A → 1B we get the string 0^n 1B for some
n ≥ 0 and at the next move we are forced to use the rule B → λ, so we end up
with 0^n 1. Alternatively, we may choose to use the rule A → λ and hence end
up with 0^n. The language generated by this grammar is therefore the set of all
strings of binary digits beginning with a positive number of 0s which may or may
not be followed by a single 1. We can write this language as a set
The step-by-step process of using Algorithm 4.1 to obtain a word in the language
generated by a regular grammar is called derivation. We represent the steps in
a specific derivation using the symbol ⇒. For instance, the derivation of the
word 00001 from the grammar in Example 4.1.1 is
The construction gives N = {A, B}, the starting state is A and the production
rules are
A → aB,  A → bA,  B → aA,  B → bB,  B → λ.
32 4. REGULAR LANGUAGES AND RECOGNITION MACHINES
Repeated application of these rules starting with A allows the production of cer-
tain strings, but does not allow the production of others. For example
A ⇒ bA ⇒ bbA ⇒ bbaB ⇒ bbabB ⇒ bbab
produces the word bbab. On the other hand, you might like to convince yourself
that there is no way to produce the word aba using the above grammar.
The next example illustrates how a grammar can be used to specify the syntax of
part of a programming language.
Example 4.1.3. Let the set of terminal symbols be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −}, the
non-terminal symbols be {⟨digit⟩, ⟨integer⟩, ⟨signed integer⟩, ⟨unsigned integer⟩},
let the starting symbol be ⟨integer⟩ and let the production rules be
⟨integer⟩ → ⟨signed integer⟩,  ⟨integer⟩ → ⟨unsigned integer⟩
⟨signed integer⟩ → +⟨unsigned integer⟩
⟨signed integer⟩ → −⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩
⟨digit⟩ → 0,  ⟨digit⟩ → 1,  . . . ,  ⟨digit⟩ → 9.
One example of a valid derivation is:
⟨integer⟩ ⇒ ⟨unsigned integer⟩ ⇒ ⟨digit⟩⟨unsigned integer⟩ ⇒
⟨digit⟩⟨digit⟩ ⇒ ⟨digit⟩2 ⇒ 12
Another is:
⟨integer⟩ ⇒ ⟨signed integer⟩ ⇒ −⟨unsigned integer⟩
⇒ −⟨digit⟩ ⇒ −7
In this way any integer can be obtained. This is not a regular grammar because,
for example, the production rule ⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩
is not of the form (RG1) or (RG2). The language generated by this grammar
is regular, however, because there is a regular grammar that generates the same
language (Exercise: Try to write one).
As an example, you should check that applying this construction to the grammar
of Example 4.1.2, takes you back to the graph from which the grammar was
generated. All goes smoothly because the grammar was derived from a recognition
machine. If we start with an arbitrary regular grammar, however, problems can
arise.
Example 4.2.1. Suppose T = {a, b}, N = {, , } and the production rules are
b, a, b, b,
Our construction gives the recognition machine shown in Figure 1.
[Figure 1: the recognition machine constructed from the grammar]
(b) the states (a, ) and (b, ) are not defined at all.
The latter problem with the machine of Example 4.2.1 is easily fixed. We simply
add a new non-accepting state to which we move from states where there is no
definition of how to handle a particular input. The transition function is then
completed by taking all inputs at the new state to the new state. In Section 3.5,
we described such a state as a sink or sometimes a black hole. Adding a sink
to the machine of Example 4.2.1 gives the complete machine (i.e. one where all
transitions are defined) of Figure 2.
[Figure 2: the machine of Figure 1 with a sink state added; every input at the sink leads back to the sink]
Although we have fixed one of the problems mentioned above, the problem that
(b, ) is not uniquely defined remains. We will see how to fix this later. A
commonly used convention, which we will sometimes employ, is to leave out the
sink. Under this convention, when we reach a state for which the next transition
is not defined, the word we are processing is rejected. This is usually done to keep
the diagram for the machine simpler. It can be made formal by:
(a) allowing the transition map to be a partial function (as defined in Section
1.3) rather than a function and
(b) adopting the above convention of rejecting any word that requires the use
of a transition that is not defined.
When we discuss push down automata in Chapter 6, we will nearly always use a
partial function to define transitions and employ the latter convention.
An Equivalent Definition
The more common way of defining regular grammars allows production rules
of the form
A → t        (RG3)
(where t is a terminal symbol) in addition to (RG1) and (RG2) rules. We have
avoided this because in our construction of a recognition machine from a regular
grammar it is not immediately obvious how to handle such rules. It turns out,
however, that from the point of view of the language generated by a grammar
(which is really all that matters) these two definitions are equivalent. A gram-
mar consisting of rules of the forms (RG1), (RG2) and (RG3) can always be
[table: the transition function of the nondeterministic machine; each entry is a set of states, with ∅ for inputs that lead nowhere]
Recall that ∅ is used to denote the undefined entries in the table of a partial
function. Here, we can either think of ∅ in the same way or in its usual inter-
pretation as the empty set. For a nondeterministic recognition machine, an input
string does not lead to a unique finishing state, so we have to decide what it ac-
tually means for the machine to accept an input. This is the key idea in defining
nondeterministic machines:
For instance, in Example 4.2.1 the input string ab could correspond to either of
two different state sequences. Even though the first of these does not end in an
accepting state, the input string is accepted by the machine.
To summarise, for each (deterministic) recognition machine there is a regular
language consisting precisely of the words accepted by the machine and for each
regular language there is a nondeterministic recognition machine which accepts
precisely the words in the language. In fact, as we will see in the next chapter,
every nondeterministic machine has an equivalent deterministic one, so regular
languages correspond exactly to recognition machines.
Taken together with set union (which we used earlier in this section) and a rather
obvious notation for concatenation, the notation gives a remarkably powerful way
of describing languages. In fact, once they have been properly defined, expressions
using these few operations can be used to define any regular language. They are
called regular expressions and provide a third standard way to describe regular
languages (in addition to recognition machines and regular grammars).
Example 4.4.1.
(a) The language L = {0^n 1 : n ≥ 1} ∪ {0^n : n ≥ 1} of Example 4.1.1 is
described by the regular expression 0{0}*1 ∪ 0{0}* or {0}+1 ∪ {0}+.
(b) L = {(01)^n : n ≥ 0} is described by the regular expression {01}*.
(c) The language of all words on the alphabet {a, b, c} containing precisely
two as and commencing with b is described by the regular expression
b{b, c}*a{b, c}*a{b, c}*.
Regular expressions are used for many purposes in computer science. Although
we have used set theoretic notations here to emphasise the fact that a regular
expression really denotes a set of words, the versions of regular expressions used
in computer science have been adapted to need only standard computer keyboard
characters so for example:
+ is used instead of ∪
Parentheses ( and ) are used in place of { and } for indicating the scope
of *s and +s.
The notations may also have various abbreviations added for frequently needed
items like digits, white space and alphabetic characters. One well known notation
for regular expressions is the advanced text searching system known as GREP
(Global Regular Expression Print) searching. This allows for much more sophis-
ticated search and replace patterns than simple text strings or text strings with
wildcards. In principle, one can use GREP patterns to search for any set of text
strings that constitutes a regular language. GREP based searching is available in
many text editors and command line applications.
Chapter Five
Deterministic Machines
Although non-deterministic recognition machines appear to be
more general, we will see that every such machine accepts the
same inputs as some deterministic machine. This machine might
be quite complicated, but there is a method of simplifying a
machine while still keeping the same accepted inputs.
[diagram: the nondeterministic machine to be made deterministic]
elements of S under the input x. More formally,
δ′(x, S) = ⋃_{s ∈ S} δ(x, s).
Also δ′ maps the empty set to itself after any input. Hence the transition table is
[table: the new transition function δ′(x, S) for each input x ∈ {a, b} and each subset S of the original set of states]
[diagram: the deterministic machine whose states are these subsets]
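The subset idea generalises to any nondeterministic machine. The sketch below is our own illustration in Python; the example NFA at the bottom (which accepts words over {a, b} ending in ab) is hypothetical, not the Greek-lettered machine of the text:

```python
from itertools import chain

def determinise(alphabet, delta, start, accepting):
    """Subset construction: delta maps (symbol, state) to a SET of states.
    The new machine's states are subsets of the old state set, and a subset
    is accepting when it contains an accepting state of the old machine."""
    def step(subset, x):
        return frozenset(chain.from_iterable(delta.get((x, s), ()) for s in subset))
    new_start = frozenset({start})
    new_delta, todo, seen = {}, [new_start], {new_start}
    while todo:                      # explore the reachable subset states
        S = todo.pop()
        for x in alphabet:
            T = step(S, x)
            new_delta[(x, S)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    new_accepting = {S for S in seen if S & set(accepting)}
    return new_delta, new_start, new_accepting

# A small NFA accepting words over {a, b} that end in ab.
delta = {("a", "p"): {"p", "q"}, ("b", "p"): {"p"}, ("b", "q"): {"r"}}
new_delta, start, acc = determinise("ab", delta, "p", {"r"})

def accepts(word):
    S = start
    for x in word:
        S = new_delta[(x, S)]
    return S in acc

assert accepts("ab") and accepts("bbab")
assert not accepts("ba") and not accepts("abb")
```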
The inputs accepted by the original nondeterministic machine are all accepted by
the new machine. For example, bbabb is accepted by the original machine through
the path
[state sequence of the original machine under the inputs b, b, a, b, b]
This gives rise to the path
[subset-state sequence of the new machine under the inputs b, b, a, b, b]
in the new machine, so that bbabb is also accepted by the new machine. (Note
that in each element of the path, the state set in the new machine contains the
original state, so that the final state will be accepting.)
It is also true that any input accepted by the new machine will also be accepted
by the old one. For example, the new machine accepts abbb through the path
[subset-state sequence of the new machine under the inputs a, b, b, b]
and in the old machine, abbb is accepted by the path
[state sequence of the original machine under the inputs a, b, b, b]
5.2. SIMPLIFYING DETERMINISTIC MACHINES 41
The arguments used in this example can be applied generally to show that every
nondeterministic recognition machine has an associated deterministic one which
accepts precisely the same inputs. For later reference we can now summarise the
results of the previous chapter and this section in the form of a theorem:
[Figure 3: a five-state deterministic machine produced by the subset construction]
If a machine M has fewer distinct suffix sets than states, we can construct a smaller machine M′ that is equivalent
to M.
Example 5.2.1. Although our Example from Figure 3 has five vertices, you can
check that it only has three distinct suffix sets: the empty set ∅, the set
{b^n : n ≥ 0} (shared by two accepting states), and the set
{b^m ab^n : m, n ≥ 0} ∪ {b^n : n ≥ 0} (shared by the initial state and another accepting state).
Observe that the suffix set of each accepting state contains the empty
string λ. A moment's thought should convince you that λ ∈ S(q) precisely in the
case where q is accepting.
Although this construction may seem a little abstract, it is really no worse than
the construction of the previous section. There we used subsets of the set of states
of the original machine to construct our new machine. Here we use the suffix sets
of the states of the original machine.
Applying this construction to our Example from Figure 3, we see that the states
are the three distinct suffix sets calculated in Example 5.2.1, the initial state is
the suffix set of the original initial state, the accepting states are the two suffix
sets containing λ and, according to (d) above, the new transition function δ′
may be calculated using the transition function δ for the machine of Figure 3:
[diagram: the simplified machine whose three states are the distinct suffix sets]
We may also construct our new machine M′ by partitioning the states of the machine M (with-
out inaccessible states) into equivalence classes. We regard states as equivalent if
they have the same suffix set and say that they are suffix equivalent. The equiv-
alence class of the initial state is the initial state of M′, the accepting states of
M′ are the equivalence classes of the accepting states of M and the transition
function δ′ is given by the rule
δ′(a, [q]) = [δ(a, q)]
for states q and inputs a. This is obviously just another way of carrying out the
above construction, but it emphasizes the key problem that needs to be solved:
Which vertices are suffix equivalent?
Algorithm 5.1.
Initialization: We observed in Example 5.2.1 that λ ∈ S(q) precisely in the case
where q is accepting. This shows an accepting state never has the same suffix set
as a non-accepting one. Therefore:
Initialize the table by placing an X in the cell for every pair {p, q} where
q is accepting and p is not.
Loop stage: Suppose at some stage we have states p and q and an input a for
which δ(a, p) and δ(a, q) are distinct and the cell for the pair {δ(a, p), δ(a, q)}
already has an X. This means δ(a, p) and δ(a, q) are not suffix equivalent, so
there must be either w ∈ S(δ(a, p)) that is not in S(δ(a, q)) or vice-versa. If such
a w exists then aw ∈ S(p), but aw can't be in S(q) because this would mean
w ∈ S(δ(a, q)). Thus S(p) ≠ S(q). In the vice-versa case, we again
have S(p) ≠ S(q) for a similar reason. Therefore:
Make repeated passes through all table cells that do not yet contain an
X, placing an X in the cell for {p, q} if there is an input a such that the
cell for the pair {δ(a, p), δ(a, q)} already has an X.
Stopping criterion: We may need to go through the table many times because
a pass that adds at least one X may be setting up the scene for adding more Xs
on the next pass. Therefore:
Stop making passes through the table as soon as a pass adds no new Xs.
Calculating the equivalence relation: If the cell for {p, q} does not have an
X after the loop has finished, p and q must be suffix equivalent. Since this is an
equivalence relation, we can easily find the equivalence classes using Algorithm
2.1. Therefore:
Let P0 be the set of pairs {p, q} for which the cell does not contain an
X. Apply Algorithm 2.1 to obtain a partition P . This is the set of suffix
equivalence classes.
Once the equivalence classes are calculated the simplified machine is defined in
the manner discussed at the end of the previous section.
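Algorithm 5.1 translates naturally into code. The Python sketch below is our own; the four-state machine it is tested on is a hypothetical example, not the machine of Figure 5:

```python
def suffix_equivalence_classes(states, alphabet, delta, accepting):
    """Mark pairs {p, q} with an X when p and q cannot be suffix equivalent,
    making repeated passes until a pass adds no new X."""
    # Initialization: accepting vs non-accepting pairs get an X.
    marked = {frozenset((p, q)) for p in states for q in states
              if p != q and (p in accepting) != (q in accepting)}
    changed = True
    while changed:                       # stopping criterion: no new Xs
        changed = False
        for p in states:
            for q in states:
                pair = frozenset((p, q))
                if p == q or pair in marked:
                    continue
                for a in alphabet:
                    succ = frozenset((delta[(a, p)], delta[(a, q)]))
                    if len(succ) == 2 and succ in marked:
                        marked.add(pair)
                        changed = True
                        break
    # Unmarked pairs are suffix equivalent; collect the classes.
    classes = []
    for s in states:
        for c in classes:
            if frozenset((s, next(iter(c)))) not in marked:
                c.add(s)
                break
        else:
            classes.append({s})
    return classes

# A hypothetical DFA over {0, 1} in which states A and D behave identically.
delta = {("0", "A"): "B", ("1", "A"): "C",
         ("0", "B"): "B", ("1", "B"): "B",
         ("0", "C"): "C", ("1", "C"): "C",
         ("0", "D"): "B", ("1", "D"): "C"}
classes = suffix_equivalence_classes("ABCD", "01", delta, {"C"})
assert {"A", "D"} in classes
```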
Example 5.3.1. The initialization stage and loop passes for the deterministic
recognition machine in Figure 5 are shown in the following tables.
[Figure 5: a six-state deterministic machine with states A, B, C, D, E, F and inputs 0 and 1]
[tables: the cells marked with an X at initialization, after the first loop pass and after the second loop pass; rows B to F, columns A to E]
Initialization is straightforward. The first loop pass adds 4 new Xs. For example,
an X is added for the pair {B, F} since δ(0, B) = A, δ(0, F) = E and the cell for
{A, E} already has an X. Since the second loop pass adds no new Xs, no further
passes are made.
The final table shows that the states in each of the pairs {A, D}, {B, E} and
{C, F} are suffix equivalent. This already forms a partition (although it might
not for some machines). The equivalence classes are therefore [A] = {A, D},
[B] = {B, E} and [C] = {C, F}. These are the states of our simplified machine.
Since F ∈ [C], the initial state is [C]. Since [A] contains all accepting states
from the original machine, it is the only accepting state in the new machine.
Using the formula δ′(a, [p]) = [δ(a, p)] from the previous section to calculate the
transition function δ′, we obtain the new recognition machine shown in Figure 6.
[Figure 6: the simplified machine with states [A], [B], [C], initial state [C] and accepting state [A]]
Even though the individual suffix sets for a language are often infinite, it could
still be the case that there are only finitely many of them (i.e. only finitely many
distinct suffix sets), because there may be many different words in Σ* that
have the same suffix set.
It is no coincidence that the language L of the previous example only has finitely
many distinct suffix sets. In fact it is a consequence of the following theorem
which gives yet another characterization of regular languages.
Theorem 5.3. A language is regular precisely if it has finitely many suffix sets.
Part of the proof of this theorem involves showing how to construct a recognition
machine for a language L with finitely many suffix sets. The construction is simple
and is very similar to the simplification construction discussed in section 5.2:
The states of the machine are the suffix sets of L.
The initial state is SL(λ) = L.
The accepting states are the states of the form SL(w) where w ∈ L.
The transition function is defined for w ∈ Σ* and a ∈ Σ by
δ(a, SL(w)) = SL(wa).
According to a theorem similar to Theorem 5.2, the machine constructed in this
way is guaranteed to be minimal and deterministic.
[Figure: directed graph of the machine with states SL(λ), SL(0), SL(1) and SL(01)]
(b) Recall from Section 3.4 that we denote the number of 0s in a word w in
{0, 1}* by n0(w) and the number of 1s by n1(w). To show that
L = {w ∈ {0, 1}* : n0(w) = n1(w)}
is not regular, observe that
SL(1^m) = {w ∈ {0, 1}* : n0(w) = n1(w) + m}
for each m ≥ 0, so the sets SL(1), SL(11), SL(111), . . . are all disjoint.
(c) Showing the language M = {(^m )^m : m ≥ 0} of matched parentheses is not
regular is much the same as for the language L = {0^n 1^n : n ≥ 0} discussed
above. Here the sets SM((^n )) = { )^(n−1) } are disjoint for each n ≥ 1.
Example 6.1.1(c) suggests one reason why most modern programming languages
are not regular. They generally allow arithmetic expressions like
a + b,   (a + b) ∗ 4   and   ((a ∗ b) + (b/(a ∗ a)))
where the parentheses must match. To illustrate this, the next example will give
a grammar for a limited language of arithmetic expressions of this type and show
that it is not regular. We first introduce the alternation convention for writing
the production rules in a grammar. This convention is designed to reduce the
amount of writing and allows us to combine production rules like
A → tB
A → rS
A → λ
with the same left hand side into a single expression as
A → tB | rS | λ.
This is just a more compact way of writing several similar rules at once and means
exactly the same thing. It may be read as "replace A with either tB or rS or λ".
6.2. Stacks
The reason recognition machines are unable to recognise languages like
L = {0^n 1^n : n ≥ 0}
is that they only have a very limited form of memory. To process elements of L,
we would need some way of keeping track of how many 0s we have encountered
so that we can make sure the number of 1s is the same. In general, there is no
way of doing this with a finite state recognition machine. Since the language L we
have discussed in this section is not regular, it cannot be generated by any regular
grammar. It is very easy, however, to give a non-regular grammar that generates
it. The terminal symbols are of course 0 and 1 and we need only one non-terminal
symbol (which therefore must be the starting symbol σ). The production rules
are σ → 0σ1 | λ. It is easy to check that this simple little grammar generates L.
The action of popping makes the top of the stack available for use in computation
and removes it from the stack. The reverse action, putting items on the stack,
is called pushing. The item at the other end of the stack is called the bottom.
Stacks are used for a huge range of purposes in computer science. We may think
of a stack as the most rudimentary form of memory or storage available in a
computation. In the next section, we will use them to define a more powerful
type of machine capable of recognising all of the examples of non-regular languages
from the previous section. Let's see how we can use a stack to solve the problem
of deciding whether an input word is in one of these languages.
[Figure: evolution of the stack, with bottom symbol z, as the input word 0110000111 is processed]
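For instance, a stack suffices to decide membership of L = {0^n 1^n : n ≥ 0}. The following Python sketch (the function name and encoding are mine, not the text's) pushes a 0 for each 0 read and pops one for each 1, accepting when the stack ends up empty:

```python
def in_L(word):
    """Decide membership of L = {0^n 1^n : n >= 0} using only a stack."""
    stack = []
    seen_one = False
    for symbol in word:
        if symbol == "0":
            if seen_one:            # a 0 after a 1 can never be matched
                return False
            stack.append("0")       # push a 0 for each 0 read
        elif symbol == "1":
            seen_one = True
            if not stack:           # more 1s than 0s
                return False
            stack.pop()             # pop a stacked 0 for each 1 read
        else:
            return False
    return not stack                # accept when the stack is empty
```

The same pattern, with small variations, handles the other non-regular examples of the previous section.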
Example 6.3.1. Suppose we want our PDA to move from state p to state q when
the input is 0 and the stack top is 1. If we wish to replace the popped 1 and then
add the input symbol 0 to the top of the stack, the transition formula would be
δ(0, p, 1) = (q, 01) (since the top of the stack is written on the left, we push 01,
not 10). In the notation introduced in the previous section, with the input symbol
written on the arrow, we can write this stack transition as
1bn . . . b1z --0--> 01bn . . . b1z
where bn . . . b1z is the prior contents of the stack at this point. We extend this
notation to show state and stack transitions using ordered pair notation:
(p, 1bn . . . b1z) --0--> (q, 01bn . . . b1z).
To erase the top of the stack, the transition formula would be δ(0, p, 1) = (q, λ)
and the effect of the transition would be written as
(p, 1bn . . . b1z) --0--> (q, bn . . . b1z).
This notation shows how the internal configuration (the state and stack contents)
of the PDA is changing during a computation. We consequently call this
configuration notation.
We define the transitions for a PDA using a partial function. This is because
most PDAs we construct would otherwise need a sink to cope with all of the un-
wanted combinations of input symbol, state and stack top, making their descrip-
tion unnecessarily complicated. With so many such combinations to consider, the
description of a full transition function is tedious and typically contains lots of
rows where all entries go to a sink. For these reasons, the description of PDAs
using either a directed graph or transition function table is greatly simplified by
omitting sinks and using a partial function.
The fact that the transition function now takes three arguments instead of two
is inconvenient when writing down a transition table. We are forced to combine
two of the arguments on one axis. We adopt the convention of writing
the input symbols along the top of the table and all possible combinations of the
state and stack symbol down the side. Entries in the transition table are ordered
pairs of the form (state, word). We use the symbol ∅ to indicate, where necessary,
places where the transition is undefined. The reasons for this notation will become
clearer in Section 6.4.
Example 6.3.2. The following PDA accepts L = {0^n 1^n : n ≥ 0}, our first
example of a non-regular language.

Q = {q0, q1, q2, q3}.
Σ = {0, 1}.
Γ = {z, 0}.
F = {q0, q3}.
Initial state q0.
Initial stack symbol z.

          x |    0          1
 (x, q0, z) | (q1, z)       ∅
 (x, q1, z) | (q1, 0z)   (q3, z)
 (x, q1, 0) | (q1, 00)   (q2, λ)
 (x, q2, z) |    ∅       (q3, z)
 (x, q2, 0) |    ∅       (q2, λ)
Since L contains the empty word, q0 is an accepting state. If the first input is 0 we
move to q1 and leave the stack unchanged. While the input consists of consecutive
0s, we stay at q1, adding a 0 to the stack for each input 0. Since we didn't push
the first 0, the stack always contains one less 0 than the number we have processed.
When an input 1 occurs, we move to q2, erasing the 0 on the top of the stack,
and remain there while the input word contains consecutive 1s, erasing a 0 for
each input 1. If we are at q1 or q2 with the stack top z and the next input is 1,
the input processed so far is of the form 0^n 1^(n−1), because we have
erased precisely n − 1 zeros from the stack, so this input 1 completes the word
0^n 1^n ∈ L and we move to the accepting state q3. Any further input gives
a word that is not in L, so there are no transitions defined from q3, which means
that such a word is rejected.
We illustrate the operation of this PDA using the configuration notation intro-
duced in Example 6.3.1 for various words (as in the transition table, ∅ indicates
an undefined transition):

w = 01 :   (q0, z) --0--> (q1, z) --1--> (q3, z)   (accept)
w = 001 :  (q0, z) --0--> (q1, z) --0--> (q1, 0z) --1--> (q2, z)   (reject)
w = 0011 : (q0, z) --0--> (q1, z) --0--> (q1, 0z) --1--> (q2, z) --1--> (q3, z)   (accept)
w = 0010 : (q0, z) --0--> (q1, z) --0--> (q1, 0z) --1--> (q2, z) --0--> ∅   (reject)
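These evolutions can be reproduced mechanically. The sketch below encodes the transition table of Example 6.3.2 as a Python dictionary (the encoding is mine; λ is represented by the empty string and the stack by a string with its top on the left, as in the text). Because the transition function is partial, a missing dictionary entry plays the role of ∅:

```python
# Transition table of Example 6.3.2 as a partial function:
# (input, state, stack top) -> (state, word to push).
delta = {
    ("0", "q0", "z"): ("q1", "z"),
    ("0", "q1", "z"): ("q1", "0z"), ("1", "q1", "z"): ("q3", "z"),
    ("0", "q1", "0"): ("q1", "00"), ("1", "q1", "0"): ("q2", ""),
    ("1", "q2", "z"): ("q3", "z"),
    ("1", "q2", "0"): ("q2", ""),
}
accepting = {"q0", "q3"}

def accepts(word):
    """Run the PDA; accept if processing ends in an accepting state."""
    state, stack = "q0", "z"        # stack written top-first
    for symbol in word:
        move = delta.get((symbol, state, stack[0]))
        if move is None:            # undefined transition: reject
            return False
        state, push = move
        stack = push + stack[1:]    # pop the top, push the new word
    return state in accepting
```

Running it on the four words above reproduces the accept/reject results shown.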
We can also draw directed graph representations of push down automata in a
similar way to those for finite state automata. The difference is that when we
label the edges representing the transitions, we also need to show
(a) how the popped element determines the transition,
(b) what gets pushed at the transition.
We can take care of (a) by drawing an edge coming out of every state for every
possible ordered pair of the form (input symbol, stack symbol), i.e., for every
(a, b) ∈ Σ × Γ. As usual, we don't draw multiple edges between the same two
states, instead labelling one edge with the information for all of the transitions
between the states. This makes the diagram simpler.
[Figure 2: edges p → q labelled (a, b) ↦ u and (c, d) ↦ v, and r → s labelled (e, f) ↦ w]
We can take care of (b) above using a similar notation to that for stack evolution
introduced at the end of Section 6.2. For example, edge labellings for the
transitions δ(a, p, b) = (q, u), δ(c, p, d) = (q, v) and δ(e, r, f) = (s, w) are shown in
Figure 2. The directed graph for the PDA of Example 6.3.2 is shown in Figure 3.
[Figure 3: directed graph of the PDA of Example 6.3.2]
So far in our discussion of PDAs we have applied the same criterion for acceptance
of words as we used for finite state automata, namely, that processing the word
takes us to an accepting state. Another criterion often used to determine whether
a PDA accepts an input word is that the stack should be empty after the word
is processed. By saying that the stack is empty, we really mean that the stack
contains only the initial stack symbol z (or more accurately, since we are only
allowed to look at the top of the stack, z is on the top of the stack). In fact, we
used this criterion in Example 6.2.1. We call this criterion acceptance on empty
stack and when defining a machine we will adopt the convention that this criterion
is used whenever the set F of accepting states for a PDA is empty. Acceptance
on empty stack frequently leads to simpler machines.
Example 6.3.3. The following PDA accepts the same language L = {0^n 1^n : n ≥ 0}
using acceptance on empty stack.

Q = {q0, q1}.
Σ = {0, 1}.
Γ = {z, 0}.
F = ∅.
Initial state q0.
Initial stack symbol z.

          x |    0          1
 (x, q0, z) | (q0, 0z)      ∅
 (x, q0, 0) | (q0, 00)   (q1, λ)
 (x, q1, 0) |    ∅       (q1, λ)
The directed graph for this PDA is shown in Figure 4. Not only does it have fewer
states, but it also avoids the annoying requirement that the number of 0s to be
placed on the stack be one less than the number processed.
[Figure 4: directed graph of the PDA, with self-loops (0, z) ↦ 0z and (0, 0) ↦ 00 at q0, an edge q0 → q1 labelled (1, 0) ↦ λ, and a self-loop (1, 0) ↦ λ at q1]
Example 6.3.4. The PDA described below implements the stack algorithm used
in Example 6.2.1 as a PDA. The language is
L = {w ∈ {0, 1}* : n0(w) = n1(w)}.

Q = {q0}.
Σ = {0, 1}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

          x |    0          1
 (x, q0, z) | (q0, 0z)   (q0, 1z)
 (x, q0, 0) | (q0, 00)   (q0, λ)
 (x, q0, 1) | (q0, λ)    (q0, 11)
Using acceptance on empty stack has enabled us to construct a PDA with only
one state! Although we could draw a directed graph for this PDA, a moment's
thought should convince you that directed graphs for single state machines are
not very informative. We can express the processing of the word w = 0110000111
of Section 6.2 in configuration notation:

(q0, z) --0--> (q0, 0z) --1--> (q0, z) --1--> (q0, 1z) --0--> (q0, z) --0--> (q0, 0z)
        --0--> (q0, 00z) --0--> (q0, 000z) --1--> (q0, 00z) --1--> (q0, 0z) --1--> (q0, z).
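This evolution, too, is easy to reproduce in code. The sketch below (my own encoding, with λ as the empty string) implements the single-state machine of Example 6.3.4 with acceptance on empty stack, where "empty" means only the initial symbol z remains:

```python
# Example 6.3.4's single-state PDA; the state q0 is left implicit.
# (input symbol, stack top) -> word to push (top written on the left).
delta = {
    ("0", "z"): "0z", ("1", "z"): "1z",   # push onto an "empty" stack
    ("0", "0"): "00", ("1", "0"): "",     # a 1 cancels a stacked 0
    ("0", "1"): "",   ("1", "1"): "11",   # a 0 cancels a stacked 1
}

def accepts(word):
    """Accept words with equally many 0s and 1s, on empty stack."""
    stack = "z"
    for symbol in word:
        push = delta.get((symbol, stack[0]))
        if push is None:
            return False
        stack = push + stack[1:]          # pop the top, push the new word
    return stack == "z"                   # empty stack: only z left
```

Feeding it the word 0110000111 traces out exactly the configurations shown above.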
Although the fact that the PDA of Example 6.3.4 has only a single state may
seem rather strange at first sight, it can be shown that for any PDA there is an
equivalent one using acceptance on empty stack with only one state. In our final
example of this section, we use the notation w^R to denote the reverse of a word
w. This is just the word w written in reverse order. For example:
001^R = 100,   0101^R = 1010,   1001^R = 1001.
Words like 1001 that satisfy the condition w = w^R are known as palindromes.
Palindromes of odd length are always of the form ucu^R for some word u and
symbol c. Palindromes of even length are of the form vv^R for some word v.
Example 6.3.5. The language L = {u2u^R : u ∈ {0, 1}*} consists of palindromes
in {0, 1, 2}* of a special form. They have odd length and the middle symbol 2
does not appear anywhere else, so we can easily detect when we have reached
the middle of the word. The following PDA accepts this language:

Q = {q0, q1}.
Σ = {0, 1, 2}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

          x |    0          1          2
 (x, q0, z) | (q0, 0z)   (q0, 1z)   (q1, z)
 (x, q0, 0) | (q0, 00)   (q0, 10)   (q1, 0)
 (x, q0, 1) | (q0, 01)   (q0, 11)   (q1, 1)
 (x, q1, 0) | (q1, λ)       ∅          ∅
 (x, q1, 1) |    ∅       (q1, λ)       ∅
Example 6.4.1. Despite its similarity to the language of Example 6.3.5, it can
be shown that there is no deterministic PDA that accepts the language
L = {ww^R : w ∈ {0, 1}*}.
The following nondeterministic PDA, however, does accept L.

Q = {q0, q1}.
Σ = {0, 1}.
Γ = {z, 0, 1}.
F = ∅.
Initial state q0.
Initial stack symbol z.

          x |          0                      1
 (x, q0, z) | {(q0, 0z), (q1, 0z)}   {(q0, 1z), (q1, 1z)}
 (x, q0, 0) | {(q0, 00), (q1, 00)}   {(q0, 10), (q1, 10)}
 (x, q0, 1) | {(q0, 01), (q1, 01)}   {(q0, 11), (q1, 11)}
 (x, q1, 0) | {(q1, λ)}              ∅
 (x, q1, 1) | ∅                      {(q1, λ)}

The operation of this PDA is similar to that of Example 6.3.5, except that while
it is at q0 pushing input symbols onto the stack, it has the ability to jump, for
no particular reason, to q1 and start removing matching items from the stack. If
it happens to do this at the right point for a word ww^R in L (just as it pushes
the last symbol of w) it will remove all of the symbols in w from the stack in
reverse order and empty the stack. This means that there is a possible evolution
of the machine starting from the initial state that processes the word and ends
with an empty stack, which is precisely how we define which words are accepted
by a nondeterministic PDA. For the word 0110 ∈ L, for example, one possible
evolution is
(q0, z) --0--> (q0, 0z) --1--> (q1, 10z) --1--> (q1, 0z) --0--> (q1, z).
The fact that the machine in Example 6.4.1 can decide somewhat arbitrarily to
move to an alternative state may seem strange in the study of computation, but
it is precisely this feature that gives the machine its power to detect the middle
of an even-length palindrome. The machine's ability to "guess" where the middle
is comes from the way we define which words are accepted by a nondeterministic
PDA. You might say that it only guesses correctly, in the sense that a word is
accepted as long as at least one correct guess is possible.
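One way to make "at least one correct guess is possible" concrete is to track every reachable configuration at once. The Python sketch below (my own encoding of the Example 6.4.1 table, with λ as the empty string) does exactly that, and accepts when some configuration ends with an empty stack:

```python
# Nondeterministic acceptance for L = {w w^R : w in {0,1}*}:
# (input, state, stack top) -> set of (state, word to push).
delta = {
    ("0", "q0", "z"): {("q0", "0z"), ("q1", "0z")},
    ("1", "q0", "z"): {("q0", "1z"), ("q1", "1z")},
    ("0", "q0", "0"): {("q0", "00"), ("q1", "00")},
    ("1", "q0", "0"): {("q0", "10"), ("q1", "10")},
    ("0", "q0", "1"): {("q0", "01"), ("q1", "01")},
    ("1", "q0", "1"): {("q0", "11"), ("q1", "11")},
    ("0", "q1", "0"): {("q1", "")},
    ("1", "q1", "1"): {("q1", "")},
}

def accepts(word):
    """Explore every evolution; accept if one ends with an empty stack."""
    configs = {("q0", "z")}              # all configurations reachable so far
    for symbol in word:
        configs = {(state2, push + stack[1:])
                   for state, stack in configs
                   for state2, push in delta.get((symbol, state, stack[0]), set())}
    return any(stack == "z" for _, stack in configs)
```

The set comprehension plays the role of the machine's guessing: every choice is followed in parallel, so acceptance holds precisely when some sequence of guesses succeeds.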
In the case of non-deterministic PDAs, using the empty set symbol ∅ to mark
undefined entries in the transition table is completely consistent with its usual
usage in set theory. It simply indicates that the set of possible transitions for
the table entry is empty. The transition table in Example 6.4.1 illustrates this.
Chapter Seven
Context Free Languages
Because the production rules of a context free grammar are much less restricted
than those in a regular grammar, they allow us to describe more complicated
languages. For instance, the grammar for simple arithmetic expressions presented
in Example 6.1.2 is context free and therefore the language consisting of all such
expressions is context free. All of the non-regular languages for which we con-
structed PDAs in Chapter 6 are context free. In most cases, it is easy to give
context free grammars that generate them.
For example, the grammar σ → 0σ1 | λ for L = {0^n 1^n : n ≥ 0} gives derivations such as
σ ⇒ 0σ1 ⇒ 00σ11 ⇒ 000σ111 ⇒ 000111.
Students familiar with the Backus Naur Form (BNF ) notation for specifying
the syntax of programming languages may have already noticed the similarity
between BNF and the way we write context free grammars. In fact, BNF is just
an alternative way of writing context free grammars that has some short cuts
useful for describing program syntax. In BNF:
The arrow → is written as ::=.
Non-terminal symbols are written as names enclosed by < and >, for
example <expression>.
Terminals are just written as themselves.
The grammar of Example 6.1.2 could be written in BNF as:
<expr>   ::= <subexp> + <subexp> | <subexp> ∗ <subexp>
<subexp> ::= (<subexp> + <subexp>) | (<subexp> ∗ <subexp>)
<subexp> ::= a | b | c | d
BNF also has shortcuts for optional elements and for repetitions of elements.
These can be expressed in our notation for context free grammars, although it
is necessary to use more production rules to do so. The notation of Example
4.1.3 was inspired by BNF. Indeed, changing each → to ::= converts this
notation to valid BNF.
αAβ → αγβ    (∗)
Consider, for example, the rule
tAu → taaBu
The idea expressed by such a rule is that the non-terminal A can be replaced by
the string aaB in a derivation, but only if it has a t to its left and a u to its right.
In other words, it can only be replaced if it appears in the context tAu. In this
sense, the production rules are sensitive to the context in which items appear.
Observe also that since α and β in (∗) may be empty, every context free production
rule is of the form required for a context sensitive grammar. Thus every context
free grammar is also context sensitive and hence every context free language is
also context sensitive. Not all grammars and languages are context sensitive.
Even this powerful class of grammars has its limitations because it does not allow
certain types of production rules. For example, none of the rules
AB → CD   or   t → tAB   or   AtB → AB
where A, B, C, D ∈ N and t ∈ T are valid in a context sensitive grammar. The
broadest possible class of grammars is the class of unrestricted grammars. For
these grammars, almost any kind of production rule is allowed, the only re-
striction being that the left hand side of a rule cannot be λ. The four classes of
grammars and languages we have discussed (regular, context free, context sen-
sitive and unrestricted) make up the Chomsky hierarchy², the most fundamental
classification in formal language theory.
[Figure: the Chomsky hierarchy drawn as nested classes, with regular languages innermost, then context free, context sensitive and unrestricted languages outermost]
²Named after Noam Chomsky, a pioneer of the study of formal languages and their application to both computer science and linguistics.
(GNF1) A → tB1B2 . . . Bn
(GNF2) A → t
(GNF3) σ → λ
A very simple trick enables us to get rid of the troublesome terminal 1 on the
right hand side of each of the rules σ → 0σ1 and σ → 01. We just replace it with
a completely new non-terminal symbol B. This works provided we add another
(GNF2) rule B → 1, which ensures that B is eventually replaced by a 1.
³Definitions of Greibach normal form given in various texts vary quite a bit. We adopt the definition used by Blum and Koch in the paper listed at the start of this chapter.
⁴Named after its inventor, Sheila Greibach, now Professor of Computer Science at UCLA.
This gives the grammar σ → 0σB | 0B | λ together with B → 1, which is not
quite in Greibach normal form yet, because there is a σ on the right hand side
of σ → 0σB. We can replace this σ with another entirely new non-terminal
symbol V, but we must then duplicate every (GNF1) or (GNF2) rule that has
σ on the left hand side by one that has every σ replaced by V. This ensures
that all derivations involving σ will still be possible in our new grammar. We
thus obtain the two new rules V → 0VB and V → 0B from σ → 0σB and
σ → 0B by replacing σ's by V's, giving the grammar
σ → 0VB | 0B | λ
V → 0VB | 0B
B → 1
which is in Greibach normal form. You should convince yourself that this really
is equivalent to the grammar we started with. In other words, you should check
that both grammars generate L.
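One way to convince yourself is to enumerate derivations. The helper below is entirely my own (it writes the starting symbol σ as S, since it treats upper-case letters as non-terminals); it collects every terminal word of bounded length derivable from a grammar:

```python
from collections import deque

def generates(rules, start, max_len):
    """All terminal words of length <= max_len derivable from start.

    rules maps each non-terminal (an upper-case letter) to a list of
    right hand sides; "" stands for the empty word lambda.
    """
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # find the leftmost non-terminal in the sentential form
        i = next((i for i, s in enumerate(form) if s.isupper()), None)
        if i is None:                  # no non-terminals: a terminal word
            words.add(form)
            continue
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # discard forms whose terminals already exceed the bound
            if sum(not s.isupper() for s in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words
```

Applying it to the Greibach normal form grammar above (S → 0VB | 0B | λ, V → 0VB | 0B, B → 1) yields exactly the words 0^n 1^n of length at most the bound, as expected.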
Example 7.2.2. For the grammar with T = {0, 1}, N = {σ, U, V} and rules
σ → UV | VU
U → 0V | 0
V → 1U | 1
the rules for U and V are already consistent with Greibach normal form. By
making the two substitutions allowed by the rules for U into the troublesome
rule σ → UV, we obtain two new (GNF1) rules σ → 0VV and σ → 0V. These
new rules replace the existing rule σ → UV. Using a similar approach, we can
replace the rule σ → VU with two new (GNF1) rules, giving a grammar
σ → 0VV | 0V | 1UU | 1U
U → 0V | 0
V → 1U | 1
in Greibach normal form. Again, you should convince yourself this is equivalent
to the original grammar.
The method used in Example 7.2.2 doesn't always work. This is the reason a
more sophisticated algorithm is sometimes needed to find an equivalent grammar
in Greibach normal form.
Example 7.2.3. Consider the grammar with T = {0, 1}, N = {σ, U} and rules
σ → U | 1U
U → σ0 | 0
Substituting the rules for U into the troublesome rule σ → U produces the rules
σ → σ0 and σ → 0, and substituting the rules for σ into the new troublesome
rule σ → σ0 merely reintroduces U on the right hand side,
and so on.
Example 7.2.3 illustrates just one of the problems that can be encountered when
attempting to apply a naive approach to transforming a context free grammar
into Greibach normal form. In view of such examples, it is not obvious that there
is always a way to carry out this transformation. The paper by Blum and Koch
appearing in the references at the start of this chapter contains a discussion and
proof of the following theorem.
Algorithm 7.1. Given a grammar in Greibach normal form with terminals T,
non-terminals N and starting symbol σ, construct a PDA with a single state q,
input alphabet Σ = T, stack alphabet Γ = {z} ∪ N, F = ∅ (so that acceptance
is on empty stack) and initial stack symbol z as follows.
Each (GNF1) rule A → tB1B2 . . . Bn adds (q, B1B2 . . . Bn) to the set δ(t, q, A).
Each (GNF2) rule A → t adds (q, λ) to the set δ(t, q, A).
If the (GNF3) rule σ → λ is present, then each rule σ → tw (where w is a
word of non-terminals) also adds (q, wz) to the set δ(t, q, z).
Note that if there is no (GNF3) rule, we need not bother adding any rules for the
initial stack symbol z. We will come back to the reasons for this shortly. The
construction is best understood by means of an example.
Example 7.3.1. Let L = {0^n 1^n : n ≥ 0}. In Example 7.2.1 we observed that the
Greibach normal form grammar with T = {0, 1}, N = {σ, V, B} and rules
σ → 0VB | 0B | λ
V → 0VB | 0B
B → 1
generates L. Applying Algorithm 7.1 gives the following PDA.
Q = {q}.
Σ = {0, 1}.
Γ = {z, σ, V, B}.
F = ∅.
Initial state q.
Initial stack symbol z.

         x |          0             1
 (x, q, z) | {(q, Bz), (q, VBz)}    ∅
 (x, q, σ) | {(q, B), (q, VB)}      ∅
 (x, q, V) | {(q, B), (q, VB)}      ∅
 (x, q, B) | ∅                   {(q, λ)}
The production rules σ → 0VB and σ → 0B add (q, VB) and (q, B) respectively
to the set δ(0, q, σ). Similarly, the production rules V → 0VB and V → 0B add
(q, VB) and (q, B) respectively to the set δ(0, q, V). The production rule B → 1
adds (q, λ) to the set δ(1, q, B). Finally, since the (GNF3) rule σ → λ is present,
the production rules σ → 0VB and σ → 0B add (q, VBz) and (q, Bz) respectively
to the set δ(0, q, z).
In order to accept a word w = 0^n 1^n ∈ L, this non-deterministic PDA must guess
when it reaches the last 0 in w and switch from pushing VB to pushing B only.
After that it has no choice but to pop a B for each 1 in the input string. We
illustrate this with configuration evolutions for some elements of L.

w = λ :      (q, z)
w = 01 :     (q, z) --0--> (q, Bz) --1--> (q, z)
w = 0011 :   (q, z) --0--> (q, VBz) --0--> (q, BBz) --1--> (q, Bz) --1--> (q, z)
w = 000111 : (q, z) --0--> (q, VBz) --0--> (q, VBBz) --0--> (q, BBBz) --1--> (q, BBz)
                    --1--> (q, Bz) --1--> (q, z)
In all cases, the evolution ends with the configuration (q, z), so each word is
accepted. You should convince yourself that L really is the language accepted by
this PDA by considering what happens when a word not in L is processed.
This construction always gives a PDA with one state q. To simplify notation in
practice classes and assignments, we will leave q out of the transition tables and
configuration notation, giving only the push word and the stack contents.
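The bookkeeping in Algorithm 7.1 follows a simple pattern, which can be sketched as code. The helper below is my own (σ is written as S, and the state q is left implicit, as suggested above); it builds the transition sets directly from the production rules:

```python
def gnf_to_pda(rules, start="S"):
    """Build the transition sets delta[(input, stack top)] for the
    single-state PDA of Algorithm 7.1.

    rules maps each non-terminal to a list of right hand sides, with ""
    standing for the (GNF3) rule start -> lambda.
    """
    delta = {}
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if rhs == "":               # the (GNF3) rule adds nothing itself
                continue
            t, tail = rhs[0], rhs[1:]   # leading terminal, trailing non-terminals
            delta.setdefault((t, lhs), set()).add(tail)
            if lhs == start and "" in rules.get(start, []):
                # (GNF3) present: mirror the start rules for stack symbol z
                delta.setdefault((t, "z"), set()).add(tail + "z")
    return delta
```

For the grammar of Example 7.3.1 this reproduces the table above: for instance the input-0 row for z comes out as {"VBz", "Bz"}.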
Example 7.3.1 illustrates an important fact about a single state PDA that accepts
on empty stack. Namely, if we start with just a z on the stack, the empty word
will always be accepted. This is not good news if we have a grammar whose
language does not contain λ. By definition of Greibach normal form, rules of the
form A → λ where A ≠ σ are not allowed. In fact, the only way a Greibach
normal form grammar can generate λ is if it contains the (GNF3) production
σ → λ, because applying rules of type (GNF1) and (GNF2) always adds one
terminal symbol at each step in a derivation.
This is convenient because it means we can easily tell whether or not the language
L generated by the grammar contains λ. When λ ∉ L, there is a simple trick we
can use to prevent the PDA from accepting λ. Instead of starting with just the
initial stack symbol z on the stack, we start with σz on the stack. This guarantees
that the initial configuration (q, σz) is not accepting, so λ is not accepted. As a
bonus, we no longer need to define any transitions for the case where z is on the
top of the stack, simplifying the transition table a bit.
Example 7.3.2. Let L = {0^n 1^n : n ≥ 1}. From Example 7.3.1 and the preceding
paragraph, it should be clear that the Greibach normal form grammar with T =
{0, 1}, N = {σ, V, B} and rules
σ → 0VB | 0B
V → 0VB | 0B
B → 1
generates L. Applying Algorithm 7.1, starting with σz on the stack, gives the
following PDA.

Q = {q}.
Σ = {0, 1}.
Γ = {z, σ, V, B}.
F = ∅.
Initial state q.
Initial stack symbol z.

         x |        0             1
 (x, q, z) |        ∅             ∅
 (x, q, σ) | {(q, B), (q, VB)}    ∅
 (x, q, V) | {(q, B), (q, VB)}    ∅
 (x, q, B) |        ∅          {(q, λ)}
The transitions for σ, V and B are the same as in Example 7.3.1 and, since there
is no (GNF3) rule, we don't need to define any transitions for z. As in Example
7.3.1, the PDA must "guess" when the last 0 is reached. We illustrate this by
giving configuration evolutions for some elements of L.

w = 01 :     (q, σz) --0--> (q, Bz) --1--> (q, z)
w = 0011 :   (q, σz) --0--> (q, VBz) --0--> (q, BBz) --1--> (q, Bz) --1--> (q, z)
w = 000111 : (q, σz) --0--> (q, VBz) --0--> (q, VBBz) --0--> (q, BBBz) --1--> (q, BBz)
                     --1--> (q, Bz) --1--> (q, z)
Since the evolutions all end with (q, z), the words are accepted. Even though the
initial symbol z wasn't marking the stack top when we started, we still need it to
detect the fact that the stack is empty after processing the input.
It is also possible to show that for any PDA with language L there is a context
free grammar that generates L (and hence L is a context free language). As this
is rather complicated, we will not consider the details, but putting this result
together with the PDA construction of Algorithm 7.1 and Theorem 7.1 yields the
following theorem.
Example 7.4.1. The Greibach normal form grammar of Example 7.3.2 for L =
{0^n 1^n : n ≥ 1} has T = {0, 1}, N = {σ, V, B} and rules
σ → 0VB | 0B
V → 0VB | 0B
B → 1
and is not an s-grammar because, for example, there are two production rules
V → 0VB and V → 0B having the same non-terminal V on the left and the same
terminal 0 beginning the right hand side. There is, however, an equivalent
s-grammar. One may check that the grammar with T = {0, 1}, N = {σ, V, B}
and rules
σ → 0V
V → 0VB | 1
B → 1
also generates L (you should convince yourself of this). The two rules V → 0VB
and V → 1 for the non-terminal V are acceptable in an s-grammar, because the
terminal symbols on the right are different. This is therefore an s-grammar.
When we carry out the construction of Algorithm 7.1 for an s-grammar, we obtain
a deterministic PDA. This happens because the defining property of an s-grammar
guarantees that we only ever add at most one element to δ(t, q, A) for each t ∈ T
and A ∈ N.
Example 7.4.2. Applying the construction of Section 7.3 to the s-grammar we
found in Example 7.4.1 for the language L = {0^n 1^n : n ≥ 1} yields the following
PDA.

Q = {q}.
Σ = {0, 1}.
Γ = {z, σ, V, B}.
F = ∅.
Initial state q.
Initial stack symbol z.

         x |     0          1
 (x, q, σ) | {(q, V)}       ∅
 (x, q, V) | {(q, VB)}   {(q, λ)}
 (x, q, B) |     ∅       {(q, λ)}
The sets {1}, {1, 2}, {1, 2, 3}, . . . can serve as reference sets for the sizes of finite
sets. It should be obvious that every finite set is in one-to-one correspondence
with precisely one of them.
If we wish to use one-to-one correspondences to frame a definition of sets having
the same number of elements, we need to state in a precise mathematical way
what is meant by a one-to-one correspondence. Changing the correspondence
diagram slightly
 x    | 1  2  3  4
 f(x) | a  b  c  d
makes it clear that a one-to-one correspondence between sets is in fact a function
between the sets, not just any function, but one that matches up the elements
of its domain and codomain with each other in a one-to-one fashion. A function
between the sets A and B has this matching property precisely if it satisfies two
requirements:
It must be a one-to-one function. This means it must send different ele-
ments of its domain to different elements in the codomain. In quantifiers,
this may be expressed as
(∀x, y ∈ A) x ≠ y ⇒ f(x) ≠ f(y).
It must be an onto function. This means that every element of the codomain
must have some element of the domain mapped onto it. In quantifiers,
this may be expressed as
(∀y ∈ B)(∃x ∈ A) f(x) = y.
The quantified definition of a one-to-one function given above captures the moti-
vation of the definition nicely, namely, that distinct points in the first set cannot
correspond to the same point in the other. When checking whether a particular
function is one-to-one, however, it is often easier to use the equivalent contrapos-
itive form
(∀x, y ∈ A) f(x) = f(y) ⇒ x = y
of the definition. This is particularly so in cases where f is defined using a formula
rather than a table.
Example 8.1.1. Let A = {−1, 0, 1, 2} and B = {−2, 0, 2, 4}. To check that the
function f : A → B defined by f(x) = 2x is a one-to-one correspondence, we
check that the contrapositive form of the definition of one-to-one holds
f(x) = f(y) ⇒ 2x = 2y ⇒ x = y
and that f is onto. Since B is finite, we can check this by checking exhaustively
that every element of B has something that maps to it:
f(−1) = −2,   f(0) = 0,   f(1) = 2,   f(2) = 4.
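Being finite, this check is easy to automate. The snippet below (the variable names are mine) performs both checks exhaustively:

```python
# Exhaustive check that f(x) = 2x is a one-to-one correspondence
# between A = {-1, 0, 1, 2} and B = {-2, 0, 2, 4}.
A = {-1, 0, 1, 2}
B = {-2, 0, 2, 4}
f = lambda x: 2 * x

images = {f(x) for x in A}
one_to_one = len(images) == len(A)   # distinct elements have distinct images
onto = images == B                   # every element of B is an image
assert one_to_one and onto
```

The same two-line test works for any function between finite sets given as a table or formula.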
So far we have only considered finite sets A and B, but this idea of saying A and
B have the same number of elements (or the same size) precisely if there is a one-
to-one and onto function f : A B still makes perfectly good sense in the case
where A and B are infinite. In fact, this is the universally accepted mathematical
definition of the idea that two sets are the same size.
Definition 8.1.1. We say sets A and B have the same size or cardinality if there
exists a one-to-one and onto function f : A B. We express this by writing
|A| = |B|.
Example 8.1.2. For the language L = {0^n 1^n : n ≥ 1}, it is easy to see that
the function f : N → L defined by f(n) = 0^n 1^n is a one-to-one correspondence.
Hence |N| = |L|.
All of this may seem like child's play in the case of finite sets, but in the infinite
case there are a few surprises in store.
Example 8.1.3 may seem a bit strange at first sight. Despite the fact that B is a
proper subset of A, it is the same size as A according to our definition. This kind
of thing seems strange because we are so used to thinking about the size of finite
sets, for which such behaviour is impossible¹. There are stranger things to come.
There is another point about Definition 8.1.1 that may be bothering some read-
ers. It seems to be asymmetrical. Surely |A| = |B| should imply |B| = |A|, and
yet the definition is expressed using a one-to-one and onto function f : A → B.
Recall from Example 1.3.4 that although a function f always has an inverse rela-
tion f⁻¹, there is no guarantee in general that f⁻¹ is a function. The following
theorem, however, shows that if f : A → B is one-to-one and onto, then so is
f⁻¹ : B → A and hence |B| = |A|. This shows that the asymmetry in Definition
¹In fact, there is an alternative definition of a finite set which says that a set is finite precisely
if it does not have the same cardinality as any of its proper subsets. A set having this property
is called Dedekind finite.
8.1.1 is apparent rather than real. In other words, when proving set cardinalities
are equal, it doesn't matter which direction the function goes.
Theorem 8.1 is also useful for checking that a function is one-to-one and onto,
because it says we can do so by finding a formula for the inverse. And in order
to establish that a function g is the inverse of a function f , all we need to do is
check that the following two conditions hold.
(INV1) g(f (x)) = x for every x in the domain of f .
(INV2) f (g(x)) = x for every x in the domain of g.
Example 8.1.4. The set Z of integers consists of all whole numbers: positive,
negative and zero. The one-to-one correspondence suggested by

 1  2   3  4   5  6   7  8   9  ...
 ↕  ↕   ↕  ↕   ↕  ↕   ↕  ↕   ↕
 0  1  −1  2  −2  3  −3  4  −4  ...

can be turned into a formula for a function f : N → Z defined by

 f(n) = (1 − n)/2   if n is odd,
        n/2         if n is even.

In view of Theorem 8.1, we can show that f is one-to-one and onto by showing it
has an inverse. The function g : Z → N defined by

 g(n) = 1 − 2n   if n ≤ 0,
        2n       if n > 0

is the inverse of f (you should check this, by checking conditions (INV1) and
(INV2)), so f is one-to-one and onto and hence |Z| = |N|.
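Conditions (INV1) and (INV2) are also easy to spot-check by computing both formulas over a range of values (a check over finitely many values is, of course, evidence rather than a proof):

```python
def f(n):
    """f : N -> Z from Example 8.1.4."""
    return (1 - n) // 2 if n % 2 == 1 else n // 2

def g(n):
    """g : Z -> N, the claimed inverse of f."""
    return 1 - 2 * n if n <= 0 else 2 * n

# (INV1): g(f(n)) = n for every n in the domain of f (checked for n <= 99)
assert all(g(f(n)) == n for n in range(1, 100))
# (INV2): f(g(n)) = n for every n in the domain of g (checked for |n| <= 50)
assert all(f(g(n)) == n for n in range(-50, 51))
```

Note that integer division `//` is exact here, because 1 − n is even whenever n is odd.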
8.1.2. Large and small. When trying to decide when one set is larger than
another, it is again sensible to start with finite examples. Although there is no
one-to-one correspondence between A = {a, b, c, d} and B = {a, c, e}, we can get
a one-to-one correspondence between B and a subset of A.
a c e
l l l
a b c d
Such a correspondence gives a one-to-one function from B to A that is not onto. This gives a way of expressing the fact that A is larger than B, but we have to be careful. We could attempt to define |B| < |A| to mean that there is a one-to-one function from B to A that is not onto. This would work in the finite case, but it
would give a questionable definition in the case where the sets are infinite. The function f(x) = 2x of Example 8.1.3(a) illustrates this, because it is a one-to-one but not onto function from N to N, showing that |N| < |N| according to this tentative definition!
To avoid these difficulties, we instead define |B| ≤ |A| to mean that there is a one-to-one function from B to A. This function could be onto, in which case the sets would be equal, as suggested by the notation. The statement |B| < |A| is then defined to mean that |B| ≤ |A| is true, but |B| = |A| is false. In other words, there is a one-to-one function from B to A, but there is no one-to-one and onto function. This definition lets us prove the following (rather obvious, but very useful) theorem.
We saw that |Z| = |N| in Example 8.1.3(c) and we now know that |Z| ≤ |Q| and |Q| ≤ |R|. There is no obvious way of deciding whether |Z| < |Q| or |Q| < |R| at
this stage. In fact, some real surprises are in store here.
Theorem 8.3. If a set A is infinite and |A| ≤ |N| then |A| = |N|.
In other words, a set A satisfies |A| = |N| precisely if its elements can be written as an infinite list

x1, x2, x3, x4, . . .

in which no element is ever repeated. This follows directly from the definition of cardinality, for if |A| = |N| there is a one-to-one and onto function f : N → A, and putting

x1 = f(1), x2 = f(2), x3 = f(3), x4 = f(4), . . .

gives the desired listing of the elements of A. The fact that f is one-to-one guarantees no element is ever repeated. Moreover, this argument can easily be
reversed to show that any set A whose elements can be written as an infinite list

x1, x2, x3, x4, . . .

with no element ever repeated has the same cardinality as N. All we need to do is define f : A → N by f(xi) = i. Sets of this cardinality are so important, they have a special name. Just as the sets Nn of Section 8.1 are used to represent the sizes of finite sets, the set N is used as the standard representative of the size of these sets. This is the idea behind the following definition.
Definition 8.2.1. A set A is called countably infinite if |A| = |N| and A is called
countable if |A| |N|.
Example 8.1.5(c) shows immediately that every finite set is countable. In fact, by the version of Theorem 8.3 given above in (), the countable sets that are not countably infinite are precisely the finite sets. Example 8.1.3 shows, somewhat surprisingly, that the set of even natural numbers and the set Z of integers are both countably infinite. Much more surprising is the fact that the Cartesian product N × N, pictured in Figure 1, is countable.
Even by itself, the bottom row is a copy of N, so N × N appears to have far more elements than N. However we can use the idea of writing the elements of N × N as an infinite list to demonstrate that N × N is countably infinite. The trick is to count following a diagonal pattern, as follows:
(1, 1),   (1, 2), (2, 1),   (1, 3), (2, 2), (3, 1),   . . .
 sum 2         sum 3                 sum 4
We first count the ordered pairs with sum 2, then those with sum 3 (in increasing
order of the first coordinate), then those with sum 4, and so on. Pairs with the
same sum lie on a diagonal. This process is illustrated in Figure 2.
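The diagonal counting pattern is easy to mechanise. A sketch in Python (not from the notes): pairs are generated in increasing order of coordinate sum, and within each diagonal in increasing order of first coordinate.

```python
def diagonal_pairs(count):
    """First `count` elements of N x N, listed in diagonal order."""
    pairs = []
    s = 2  # current coordinate sum
    while len(pairs) < count:
        for i in range(1, s):        # pairs (i, s - i) with sum s
            pairs.append((i, s - i))
            if len(pairs) == count:
                break
        s += 1
    return pairs

print(diagonal_pairs(6))
# -> [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Every pair (i, j) appears at some finite position (on the diagonal with sum i + j), which is exactly what the one-to-one correspondence with N requires.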
The fact that N × N is countably infinite has other surprising consequences. Let Q+ denote the set of positive rational numbers. Recall that we can write any
positive rational number q ∈ Q+ in lowest positive terms. This means q can be written in the form q = mq/nq where mq and nq have no common factors (because we have cancelled as far as possible) and mq and nq are also both positive (there is no point writing q = m/n where m and n are both negative). The fact that we have written q in its lowest terms means mq and nq are unique, so the function f : Q+ → N × N defined by f(q) = (mq, nq) is one-to-one and hence

|Q+| ≤ |N × N| = |N|
which shows that |Q+| ≤ |N|. Since Q+ is infinite, Theorem 8.3 shows that |Q+| = |N|. A construction similar to the one used in Example 8.1.4 can be used to show that |Q+| = |Q|, giving the following theorem.
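The one-to-one function f(q) = (mq, nq) can be illustrated with Python's Fraction type, which automatically reduces a rational to lowest terms (a sketch, not part of the notes):

```python
from fractions import Fraction

def f(q):
    """Map a positive rational q to the pair (m_q, n_q) in lowest terms."""
    return (q.numerator, q.denominator)

# A sample of distinct positive rationals...
sample = {Fraction(a, b) for a in range(1, 30) for b in range(1, 30)}

# ...maps to the same number of distinct pairs in N x N,
# so f is one-to-one on the sample.
assert len({f(q) for q in sample}) == len(sample)

print(f(Fraction(6, 4)))  # 6/4 reduces to 3/2 -> (3, 2)
```

The uniqueness of the lowest-terms representation is doing all the work here: it is what makes f well defined and one-to-one.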
In view of the very generous way in which the rational numbers are scattered among the real numbers (between any pair of distinct real numbers there is always a rational number) this seems very surprising. It looks as though there should be a lot more rational numbers than there are natural numbers!
For each n ∈ N, the fact that kn ∈ B means there is an x ∈ A such that f(x) = kn. Since f is one-to-one, there is only one such element, so we can call it xn and define g : N → A by g(n) = xn. It may be shown that g is one-to-one and onto, which shows |A| = |N|; checking that g is one-to-one and onto is left as a challenge to the interested student.
Example 8.3.1.
(a) If Ai = {i} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = N.
(b) If Ai = {i, −i, 0} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = Z.
(c) If Ai = {0^i 1^i} for each i ∈ N, where 0^i 1^i denotes a word in {0, 1}*, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · is the language L = {0^n 1^n : n ∈ N}.
For the skeptics we note that this list order is given by a one-to-one and onto function

f : A1 ∪ A2 ∪ A3 ∪ · · · → N

and it is left as a challenge to the interested student to write down a formula for f and show it is one-to-one and onto.
In fact, it is possible to drop the requirement that the Ai be disjoint. This requires a modification to the above list in the case where the Ai are not disjoint: as we construct the list, whenever we encounter an aij we have seen before, we just leave it out. This yields the following theorem.
|Σ| = m.
where i > 0, we have w1 ∈ Σ, so there are at most m possible choices for the symbol w1. Similarly, there are at most m possible choices for w2, w3, . . . , wi−1 and wi. These choices may or may not be independent, depending on L, but in the case where they are independent, we obtain the total number of words in Li by multiplying, so

|Li| = m · m · · · m = m^i.

If the choices are not independent, the number of possibilities will actually be less and we will have |Li| < m^i. In either case Li is finite. Finally, there are only two possibilities for L0. If L contains ε then L0 = {ε} and if not, L0 = ∅.
and Theorem 8.5 shows L is countable. This gives the following theorem.
We can obtain an even better result than Theorem 8.5 using the fact that N × N is countably infinite. If A1, A2, A3, . . . is an infinite sequence of disjoint countable sets, then for each i ∈ N there is a one-to-one function gi : Ai → N (not necessarily onto, because some of the Ai might be finite). Because the Ai are disjoint, we can define

f : A1 ∪ A2 ∪ A3 ∪ · · · → N × N

by f(aij) = (i, gi(j)). (Can you see why we require that the Ai be disjoint here?) It can be shown that f is one-to-one and hence

|A1 ∪ A2 ∪ A3 ∪ · · · | ≤ |N × N| = |N|
²Notice that a language need not be infinite. It may be the case that only finitely many of the Li are non-empty, in which case the union in () would be finite.
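The map f(aij) = (i, gi(j)) combined with the diagonal order on N × N gives a concrete listing of the union. A sketch in Python (not from the notes); it also implements the repeat-dropping modification for non-disjoint Ai:

```python
from itertools import islice

def dovetail(make_seq, n):
    """First n elements of the union of A1, A2, A3, ..., where
    make_seq(i) returns a fresh iterator over Ai.  Element j of Ai
    corresponds to the pair (i, j); pairs are visited in diagonal
    order, and previously seen elements are dropped."""
    seen, out = set(), []
    s = 2
    while len(out) < n:
        for i in range(1, s):
            j = s - i
            # the j-th element of Ai, if Ai has that many elements
            elem = next(islice(make_seq(i), j - 1, j), None)
            if elem is not None and elem not in seen:
                seen.add(elem)
                out.append(elem)
                if len(out) == n:
                    break
        s += 1
    return out

# Example 8.3.1(b): Ai = {i, -i, 0}, so the union is Z
print(dovetail(lambda i: iter([i, -i, 0]), 9))
# -> [1, -1, 2, 0, -2, 3, -3, 4, -4]
```

Because every pair (i, j) is reached after finitely many steps, every element of every Ai eventually appears in the list, which is the content of the theorem.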
Definition 8.4.1. Let A be a set. The power set of A is the set of all subsets of A (never forget that this always includes ∅ and A itself) and is denoted P(A).
Example 8.4.1.
(a) P(∅) = {∅}. Notice that even though ∅ is empty, P(∅) is not, since it contains ∅ itself.
(b) P(N1) = P({1}) = {∅, {1}} so |P({1})| = 2.
(c) P(N2) = {∅, {1}, {2}, {1, 2}} so |P(N2)| = 4.
(d) P(N3) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} so |P(N3)| = 8.
(e) P(N) contains ∅, N, all of the finite subsets of N, for example,
{1}, {1, 2}, {4, 7}, {100, 1000}, {10, 11, . . . , 20}
and so on, as well as all of the infinite subsets like the even numbers, the odd numbers, the perfect squares {1, 4, 9, 16, . . . }, and countless others (in fact, uncountably many, as we shall soon see).
You may have already spotted a pattern in Example 8.4.1. It turns out that |P(Nn)| = 2^n, and from this it may be shown³ that if |A| = n then |P(A)| = 2^n. It follows that |A| < |P(A)| for any finite set A. Remarkably, this is true even for infinite sets.
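The pattern |P(Nn)| = 2^n is easy to check by brute force for small n. A sketch in Python (not part of the notes), using frozensets so that subsets can themselves be collected into a set:

```python
from itertools import combinations

def power_set(a):
    """All subsets of the set a, including the empty set and a itself."""
    elems = list(a)
    return {frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

# |P(N_n)| = 2^n for n = 0, 1, 2, 3, 4
for n in range(5):
    N_n = set(range(1, n + 1))
    assert len(power_set(N_n)) == 2 ** n

print(sorted(len(s) for s in power_set({1, 2, 3})))
# -> [0, 1, 1, 1, 2, 2, 2, 3]  (sizes of the 8 subsets of N3)
```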
Theorem 8.8 (Cantor's Theorem). |A| < |P(A)| for any set A.
Thus |N| < |P(N)|, so P(N) is our first example of an uncountable set. By applying Cantor's Theorem repeatedly, we can construct a sequence of infinite sets of strictly increasing cardinality

N, P(N), P(P(N)), P(P(P(N))), . . .

All of these sets except N are uncountable, so we have not just shown how to construct an uncountable set, we have shown how to construct uncountable sets as big as we like. These sets may seem a little unfamiliar, but we can use the fact that P(N) is uncountable to show that more familiar sets are also uncountable.
Example 8.4.2.
(a) Let B denote the set of all infinite binary sequences⁴, so B is the set of all sequences of the form

x1, x2, x3, . . .

in which each xi is either 0 or 1. The function f : B → P(N) defined by f(x1, x2, x3, . . . ) = {n ∈ N : xn = 1} is one-to-one and onto. For example,

f(0, 0, 0, 0, . . . ) = ∅
f(1, 0, 0, 0, . . . ) = {1}
f(0, 1, 0, 1, 0, 1, . . . ) = {n ∈ N : n is even}
f(1, 1, 1, 1, . . . ) = N.
f(0, 0, 0, 0, . . . ) = 0
f(1, 0, 0, 0, . . . ) = 0.1000 · · · = 1/10
f(1, 1, 1, 1, . . . ) = 0.1111 · · · = 1/9.
It is well known that any two distinct decimal representations give distinct real numbers, except in the important case where one representation ends in an infinite string of 9s. For example, 0.0999 · · · = 0.1000 · · · .
Theorem 8.9. The set of all possible languages made from any non-empty alpha-
bet is uncountable.
The fact that |P(A)| ≥ |A| follows from Theorem 8.2, since the subset

{{x} : x ∈ A}

of P(A) clearly has the same cardinality as A (you should prove this).
To show |P(A)| > |A| we need to establish that |P(A)| = |A| is false. Here we need to use proof by contradiction. This means we assume the negation of what we really want to prove and show that this leads to a contradiction: a statement that is clearly absurd or false.
In this case, we assume that |A| = |P(A)|. This means there is a one-to-one and onto function f : A → P(A). Now here comes the fiendishly clever trick! Define

E = {x ∈ A : x ∉ f(x)}.

Although this definition looks a bit strange, it makes sense because the codomain of f consists of subsets of A, so f(x) is a set for each x ∈ A. This means we can always ask whether or not x ∈ f(x), and define E to be the set of x values for which this is false. It may be that E is empty, but this doesn't matter.
Since E ⊆ A it is an element of the codomain P(A) of f. Since f is onto, there must be a z ∈ A such that f(z) = E. We get a contradiction from the fact that z must either be an element of E or not:

If z ∈ E then z ∉ f(z) by definition of E, but f(z) = E so this means z ∉ E. But this is a contradiction, because we have now shown that z ∈ E ⟹ z ∉ E.

If z ∉ E then z ∈ f(z) by definition of E, but f(z) = E so this means z ∈ E. This is also a contradiction, because we have shown that z ∉ E ⟹ z ∈ E.
Since both cases lead to a contradiction, we conclude that there can be no one-to-one and onto map A → P(A), so |P(A)| = |A| must be false, as we wanted to prove. Thus |A| < |P(A)|.
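The diagonal set E can also be exhibited computationally: for a small finite set A we can enumerate every function f : A → P(A) and check that E = {x : x ∉ f(x)} is never in the image of f. A sketch (not part of the notes):

```python
from itertools import combinations, product

A = [1, 2, 3]
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(A, r)]

# Every function f : A -> P(A) is a choice of one subset per element.
for images in product(subsets, repeat=len(A)):
    f = dict(zip(A, images))
    E = frozenset(x for x in A if x not in f[x])
    # Cantor's diagonal set E is never a value of f, so f is never onto.
    assert E not in images

print("checked", len(subsets) ** len(A), "functions; none are onto")
```

Here the check is exhaustive because A is finite (8³ = 512 functions); for infinite A the same construction of E drives the proof by contradiction above.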
Chapter Nine
a rigorous demonstration of the fact that these languages are not context free
using the following theorem.
Theorem 9.1 (Pumping Lemma for Context Free Languages). For any context free language L, there is a special number p ∈ N (called a pumping length) that has the following property:
Any word w ∈ L such that |w| ≥ p can be written as w = uvxyz where
(a) |vxy| ≤ p,
(b) v ≠ ε or y ≠ ε (i.e. at least one of v and y is non-empty),
(c) the word u v^i x y^i z is in L for every i ≥ 0.
The name of Theorem 9.1 comes from (c). It tells us that the words
uvvxyyz, uvvvxyyyz, . . . , u v^i x y^i z, . . .
and so on are all in L. Provided w L is long enough to begin with, we can
pump it up in the way described by (c) to get longer and longer words that are
also in L. We will not attempt to prove this result in MAT2MFC. A proof would
be at least a weeks work in itself. We will simply demonstrate how it can be
used to show that languages are not context free. To do this we need to use the
technique of proof by contradiction. This means we assume the negation of what
we really want to prove and show that this leads to a contradiction: a statement that is clearly false. We begin by assuming the language is context free and show
(using the pumping lemma) that this leads to a false conclusion.
Theorem 9.1 tells us that s ∈ L1. This shows vxy can't have been a sub-word of 0^n after all.
(ii) By similar reasoning, vxy can't be a sub-word of 1^n or of 2^n either. The only possibilities left are that vxy is a sub-word of 0^n 1^n or of 1^n 2^n.
(iii) If vxy were a sub-word of 0^n 1^n, a similar (although slightly more complicated) argument to the one in (i) would show that vvxyy would have to contain either more 0s than vxy or more 1s than vxy (or possibly both). But this in turn would mean that s = uvvxyyz would have to contain either more 0s than 2s or more 1s than 2s. In either case, we have s ∉ L1, but as before, Theorem 9.1 tells us that s ∈ L1. As before, vxy can't have been a sub-word of 0^n 1^n after all.
(iv) By similar reasoning, vxy can't be a sub-word of 1^n 2^n either. There are no possibilities left.
All possibilities ended in tears! We must conclude that L1 is not context free.
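The case analysis can be confirmed by brute force for a specific pumping length. A sketch in Python (not from the notes): for a hypothetical p = 4 we take s = 0^p 1^p 2^p and check that every decomposition s = uvxyz permitted by Theorem 9.1 pumps out of the language already at i = 2.

```python
def in_L1(s):
    """Membership in L1 = { 0^n 1^n 2^n : n >= 0 }."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == '0' * n + '1' * n + '2' * n

p = 4
s = '0' * p + '1' * p + '2' * p
found_pumpable = False

# All decompositions s = uvxyz with |vxy| <= p and v, y not both empty.
for a in range(len(s) + 1):
    for b in range(a, min(a + p, len(s)) + 1):
        for c in range(b, min(a + p, len(s)) + 1):
            for d in range(c, min(a + p, len(s)) + 1):
                u, v, x, y, z = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                if v == '' and y == '':
                    continue
                if in_L1(u + v * 2 + x + y * 2 + z):
                    found_pumpable = True

# No decomposition survives pumping with i = 2, so p = 4 is not a
# pumping length for L1; the argument above shows no p can be.
assert not found_pumpable
print("no permitted decomposition of", s, "pumps within L1")
```

Of course this only rules out p = 4; the proof in the text is what rules out every p at once.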
Theorem 9.2. Let a be any symbol. A language L ⊆ {a}* is context free precisely if it is regular.
We already know that languages like {0^n 1^n : n ≥ 0} are context free but not regular, so even for a two letter alphabet, this theorem no longer holds. For L ⊆ {a}* however, if a suffix set argument shows L is not regular, then L cannot be context free either.
cannot be in L since 2^n(2^(m−n) + 1) has an odd factor 2^(m−n) + 1 and therefore cannot be a power of 2. (You should convince yourself that powers of 2 never have odd factors.) This shows that the suffix sets S(0^(2^n)) are all distinct, so L is not regular by Theorem 5.3 and therefore not context free by Theorem 9.2.
1Named after the mathematical logician and cryptanalyst Alan Turing, who proposed what
we now call Turing machines in the 1930s. Many consider him the founder of computer science.
[Figure: fragment of a Turing machine transition diagram with states p, q, r, s and edge labels of the form a ↦ (u, R), c ↦ (v, L) and ␣ ↦ (␣, L).]
states Q = {q0, q1, q2, q3, q4}.
initial state q0.
tape alphabet Γ = {0, 1, a, ␣}.
input alphabet Σ = {0, 1}.
accepting states F = {q4}.

x         0           1           a           ␣
(x, q0)   (a, q1, R)              (1, q4, L)
(x, q1)   (0, q1, R)  (1, q1, R)  (1, q2, L)  (␣, q2, L)
(x, q2)               (a, q3, L)
(x, q3)   (0, q3, L)  (1, q3, L)  (0, q0, R)
[Figure: transition diagram of this machine, with edges such as 0 ↦ (a, R) from q0 to q1, loops 0 ↦ (0, R) and 1 ↦ (1, R) on q1, and a ↦ (1, L) from q0 to q4.]
(q0, 0011) ⊢ (q1, a011) ⊢ (q1, a011) ⊢ (q1, a011) ⊢ (q1, a011␣) ⊢
(q2, a011) ⊢ (q3, a01a) ⊢ (q3, a01a) ⊢ (q3, a01a) ⊢ (q0, 001a) ⊢
(q1, 0a1a) ⊢ (q1, 0a1a) ⊢ (q2, 0a11) ⊢ (q3, 0aa1) ⊢ (q0, 00a1) ⊢
(q4, 0011) (accept)
We can also use configuration notation to illustrate why the machine rejects words
that are not in L.
w = 001 : (q0, 001) ⊢ (q1, a01) ⊢ (q1, a01) ⊢ (q1, a01␣) ⊢ (q2, a01) ⊢ (q3, a0a)
⊢ (q3, a0a) ⊢ (q0, 00a) ⊢ (q1, 0aa) ⊢ (q2, 0a1) (reject)
w = 011 : (q0, 011) ⊢ (q1, a11) ⊢ (q1, a11) ⊢ (q1, a11␣) ⊢ (q2, a11) ⊢ (q3, a1a)
⊢ (q3, a1a) ⊢ (q0, 01a) (reject)
w = 010 : (q0, 010) ⊢ (q1, a10) ⊢ (q1, a10) ⊢ (q1, a10␣) ⊢ (q2, a10) (reject)
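These configuration traces can be reproduced mechanically with a small simulator. A sketch in Python (the transition table is the one read off from the example above; the simulator itself and its names are our own, not the notes'):

```python
BLANK = ' '
# (state, symbol) -> (write, new state, move); undefined entries omitted
DELTA = {
    ('q0', '0'): ('a', 'q1', 'R'), ('q0', 'a'): ('1', 'q4', 'L'),
    ('q1', '0'): ('0', 'q1', 'R'), ('q1', '1'): ('1', 'q1', 'R'),
    ('q1', 'a'): ('1', 'q2', 'L'), ('q1', BLANK): (BLANK, 'q2', 'L'),
    ('q2', '1'): ('a', 'q3', 'L'),
    ('q3', '0'): ('0', 'q3', 'L'), ('q3', '1'): ('1', 'q3', 'L'),
    ('q3', 'a'): ('0', 'q0', 'R'),
}

def accepts(word, max_steps=10_000):
    tape, head, state = list(word) or [BLANK], 0, 'q0'
    for _ in range(max_steps):
        if state == 'q4':
            return True               # reached the accepting state
        if head == len(tape):
            tape.append(BLANK)        # extend tape with blanks on the right
        move = DELTA.get((state, tape[head]))
        if move is None:
            return False              # undefined transition: reject
        tape[head], state = move[0], move[1]
        head += 1 if move[2] == 'R' else -1
        if head < 0:
            return False              # fell off the left end: reject
    return False

for w in ['0011', '01', '001', '011', '010']:
    print(w, '->', 'accept' if accepts(w) else 'reject')
```

Running it reproduces the verdicts above: 0011 and 01 are accepted, while 001, 011 and 010 are rejected.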
We have already observed that TM always have output because the contents of the
tape at the end of processing may be viewed as output. We now consider examples
that make deliberate use of this idea to carry out some string calculations.
[Figure: Turing machine transition diagram with states q0, q1, q2, q3.]
The string function of the machine of Example 9.2.2 can be computed much more easily and efficiently by the FSM (with output) shown in Figure 4. Indeed, this is a trivial task for an FSM. This illustrates just how much harder TM are to program. However, the machine of Example 9.2.2 can be expanded to perform a task impossible for an FSM or PDA: computing the string function f : 0* → 0* defined by f(0^n) = 0^(2^n), as we shall see in Example 9.2.4.
[Figure 4: FSM with output; a single state q0 with the transition 0/00.]
toward the middle, this time erasing the right hand end of the word as the right hand marker moves toward the centre. Similar to Example 9.2.1, provided the a's meet up in the middle, the word must be of even length. Notice that this strategy could also be used to design a machine that checks whether a given input word is of even length without erasing it. The machine implements the string function

f(0^2n) = 0^n

with domain {0^2n : n ≥ 1}. It is shown in Figure 5.
[Figure 5: transition diagram of this machine.]
[Figure 6: Turing machine transition diagram.]
doubled and not mix up the counters with the output. Then each 1 (there
should be n of them to begin with) is deleted but the number of 0s is doubled
using a copy of the machine of Example 9.2.2. When all 1s have been deleted,
the machine halts.
[Figure 7. Turing machine that computes f(0^n) = 0^(2^n).]
In view of the result of Example 9.1.2, the machine constructed in Example 9.2.4
proves conclusively that TM are strictly more powerful than PDA. In fact they
are very powerful indeed, even if somewhat clumsy and inefficient. We will come
back to the issue of just how powerful in Section 9.3. This increase in power
is not surprising given the substantially greater flexibility of memory access we
grant to TM. However, this greatly expanded power has its down side. We have
already observed that TM are typically less efficient than PDA and they tend to be
harder to program, but there is a far more serious problem. TM sometimes fail
to stop processing. Given certain input words, they may just continue processing
indefinitely, neither reaching an accepting state nor halting in a non-accepting
state due to an undefined transition. This is analogous to a program entering an
infinite loop.
Example 9.2.6. The very simple TM in Figure 8 was designed to accept the regular language L = {(01)^n 0 : n ≥ 0} = {01}*0 by simulating a simple finite state recognition machine. Suppose we mistakenly (a typical programming
[Figure 8: states q0, q1, q2 with q2 accepting; transitions 0 ↦ (0, R) from q0 to q1, 1 ↦ (1, R) from q1 to q0, and ␣ ↦ (␣, L) from q1 to q2.]
error) include the transition 1 ↦ (1, L) in place of 1 ↦ (1, R), giving the almost identical machine of Figure 9. This new machine still correctly accepts the word 0 ∈ L and it still correctly rejects, for example, words beginning with 1 or 00 (you should check these claims). However, using configuration notation to analyse the
[Figure 9: the same machine as Figure 8, but with the faulty transition 1 ↦ (1, L) from q1 to q0.]
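A step-limited simulation makes the failure visible. A sketch in Python (the transitions are read from Figure 9; the helper names are our own): the run either accepts, rejects on an undefined transition, or gives up after a step budget, which is our evidence of an infinite loop.

```python
BLANK = ' '
# Figure 9: the faulty machine, with 1 -> (1, L) instead of 1 -> (1, R)
DELTA = {
    ('q0', '0'): ('0', 'q1', 'R'),
    ('q1', '1'): ('1', 'q0', 'L'),      # the programming error
    ('q1', BLANK): (BLANK, 'q2', 'L'),
}

def run(word, max_steps=1000):
    tape, head, state = list(word) or [BLANK], 0, 'q0'
    for _ in range(max_steps):
        if state == 'q2':
            return 'accept'
        if head == len(tape):
            tape.append(BLANK)
        move = DELTA.get((state, tape[head]))
        if move is None:
            return 'reject'
        tape[head], state = move[0], move[1]
        head += 1 if move[2] == 'R' else -1
        if head < 0:
            return 'reject'
    return 'no verdict (step budget exhausted)'

for w in ['0', '1', '010']:
    print(repr(w), '->', run(w))
```

On input 010 the machine bounces between the first two cells forever (q0 moves right, q1 moves left), so the budget runs out. No budget can prove non-halting in general, which foreshadows the halting problem discussed in Section 9.3.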
(d) Turing machines with a one sided tape have a tape with a fixed starting
point that only extends infinitely in one direction.
None of these have yielded anything different. Just as non-deterministic finite
state recognition machines turn out to be equivalent in power to deterministic
ones, so non-deterministic TM are equivalent in power to deterministic ones.
Similarly, the other variations have all turned out to yield classes of machines
equivalent in power to standard TM. This is not to say that these variations are
of no interest. Some of them allow for easier coding of algorithms, easier proofs
and so on. For example, multiple tapes allow for many words to be processed simultaneously and compared with one another, or for distinct tapes to be used for input and output. Various other alternative models of computation have also been proposed, but they have all turned out to be equivalent to (or in some cases weaker than) standard TM.
This history has led to a widespread acceptance of the idea that TM provide a definition of what can be computed by an algorithm. A function is said to be computable by an algorithm precisely if there is a TM that computes it and halts on all possible inputs. We want our machine to halt on all possible inputs because we feel that something worthy of the name algorithm should not go into an infinite loop and should eventually give an output in all cases. Roughly speaking, this idea of equating algorithmic computability with computability by a TM (or with some equivalent system) is known as the Church–Turing thesis².
As a definition of computability it can neither be proved nor disproved, but the
fact that it has stood the test of time gives us confidence that it is a sensible
definition. Not only are halting TM the gold standard of what can and cannot
be computed by an algorithm, they also underpin the theory of computational
complexity, which analyses how efficiently (both in terms of time and memory)
computations can be done.
9.3.2. The limits of computation. Notwithstanding the above comments, TM are not omnipotent. There are some things they can't do. One consequence of the following theorem is that there are even languages they cannot recognize.
Theorem 9.3. There are countably many Turing recognition machines with a given input alphabet Σ = {x1, x2, . . . , xm}.
really matter how we label the states) and that the set F of accepting states is not empty (since the language of a recognition machine with no accepting states is empty). We also assume that the set of marker elements of the tape alphabet (the set Γ \ (Σ ∪ {␣})) is of the form {a1, . . . , ak} for some k ≥ 0 (because it doesn't really matter what the markers are called, as long as we have enough of them). The transition table of a machine with n states and k markers has the following structure.
x           x1  . . .  xm   a1  . . .  ak   ␣
(x, q0)     ∗
(x, q1)
  ⋮
(x, qn−1)
The table entry for (x1, q0) marked by ∗ is either blank or of the form (b, q, D) for some q ∈ Q, b ∈ {x1, . . . , xm, a1, . . . , ak, ␣} and D ∈ {L, R}. There are m + k + 1 possibilities for b, multiplied by n possibilities for q, multiplied by 2 possibilities for D, giving 2n(m + k + 1) + 1 possible ways of completing the entry for (x1, q0). But this argument works exactly the same for each of the n(m + k + 1) entries in the table. By the type of counting argument familiar from MAT1DM, this means that the total number of ways to fill in the table entries is

(2n(m + k + 1) + 1)^(n(m+k+1)).    (∗)
Since we have agreed that q0 is the initial state, the only remaining issue is which states are accepting. Now F ⊆ Q, so the number of possible ways of choosing F is |P(Q)| = 2^n, and since we usually want our machine to have at least one accepting state we can rule out F = ∅, giving 2^n − 1 choices for F. Putting this together with (∗), there are

(2^n − 1)(2n(m + k + 1) + 1)^(n(m+k+1))    (∗∗)

elements in the set M(n,k) of machines with n states and k markers.
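The counting formula (∗∗) is concrete enough to evaluate. A sketch in Python (not from the notes):

```python
def machine_count(n, m, k):
    """Number of Turing recognition machines with n states,
    m input symbols and k markers, as counted in (**)."""
    entries = n * (m + k + 1)            # cells in the transition table
    per_entry = 2 * n * (m + k + 1) + 1  # ways to fill one cell
    return (2 ** n - 1) * per_entry ** entries

# Even tiny machines are plentiful:
print(machine_count(1, 1, 0))   # 1 state, 1 input symbol, no markers -> 25
print(machine_count(2, 2, 1))   # already more than twenty billion
```

The point of the proof, of course, is not the exact number but that each M(n,k) is finite.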
In particular, M(n,k) is finite. We saw in Section 8.2 that the set N × N is countable. It is easy to extend this argument to show (N ∪ {0}) × N is countable (you should check this), so there are countably many distinct sets M(n,k), which we may list as M1, M2, M3, . . . Since each of these sets is finite, Theorem 8.5 shows that their union

M = M1 ∪ M2 ∪ M3 ∪ · · ·

is countable. But M is clearly the set of all Turing recognition machines with input alphabet Σ.
In view of Theorems 8.9 and 9.3, there must be languages with alphabet Σ = {0, 1} that are not recognized by any TM. Remarkably, there must even be languages with one letter input alphabet Σ = {0} that are not recognized by any TM. Notice that the problem here is not that we lack the cunning to construct recognition machines for some languages. It is that there aren't enough recognition machines to recognize all of the possible languages with input alphabet Σ. Given that some languages are not recognized by TM, there is a name for those that are. They are called recursively enumerable. It even turns out that there are some languages that can be recognised by a TM but not by any TM that halts on all possible inputs. Because of this situation, there is a name for the class of languages recognised by some TM that is guaranteed to halt on all inputs. Such languages are called recursive. In view of the above discussion, we may think of the recursive languages as those that can be recognized by some kind of algorithm.
This result is just the tip of the iceberg. There are many, many things that simply
cannot be done by TM. Perhaps the most famous is the halting problem for TM
themselves. This is the problem of deciding whether a given Turing machine M with input alphabet Σ fails to halt when it attempts to process a given word w.
Before we can even think about asking a TM to solve this problem, we need to
be able to represent our machine M in some form that can be used as input to
another TM. This always turns out to be possible because of the finite nature of
a TM:
We can represent this information using a finite string. The main difficulty is that
if we want to feed this string to a TM, it must be based on a finite alphabet. This
means we can't have infinitely many symbols q0, q1, q2, . . . for our states. We can avoid this problem by using the words q, qq, qqq, . . . to represent the states. This means we only need one symbol to represent all of the states. A similar trick can be used for marker symbols, which we represent as a, aa, aaa, . . . and so on.
Example 9.3.1. The machine of Figure 9 has the transition table shown below:

x         0           1           ␣
(x, q0)   (0, q1, R)
(x, q1)               (1, q0, L)  (␣, q2, L)
(x, q2)

states Q = {q0, q1, q2}.
initial state q0.
accepting states F = {q2}.
tape alphabet Γ = {0, 1, ␣}.
input alphabet Σ = {0, 1}.
Σ0 = {q, , |, (, ), , , /}
and completely defines the operation of the machine, because from the string S you could easily write down the transition table and hence draw the graph. Moreover, Σ0 could be used to describe any machine with input alphabet Σ. (We used a forward slash (/) in place of a comma (,) here to avoid some very confusing set notation.) It is now a simple matter to add a word w ∈ {0, 1}* to this encoding by simply adding w to the end of S. We are now ready to feed this string Sw to our very clever TM, in the hope that it can decide whether this machine would halt if given the input w.
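The q, qq, qqq trick is easy to implement. A sketch in Python; the exact format below (parentheses, '|' separators, '/' in place of commas, and '_' for the blank symbol) is our own choice in the spirit of the alphabet Σ0 described above, not the notes' encoding.

```python
def enc_state(i):
    """State qi is encoded as i+1 copies of the letter q."""
    return 'q' * (i + 1)

def encode(delta):
    """Encode a transition table {(state, read): (write, state, move)}
    as a single string over a finite alphabet."""
    parts = []
    for (state, read), (write, new_state, move) in sorted(delta.items()):
        parts.append('(%s/%s/%s/%s/%s)' %
                     (enc_state(state), read, write, enc_state(new_state), move))
    return '|'.join(parts)

# The machine of Figure 9 (blank written as '_')
delta = {
    (0, '0'): ('0', 1, 'R'),
    (1, '1'): ('1', 0, 'L'),
    (1, '_'): ('_', 2, 'L'),
}
print(encode(delta))
# -> (q/0/0/qq/R)|(qq/1/1/q/L)|(qq/_/_/qqq/L)
```

The key point is that however many states a machine has, its encoding uses the same small, fixed alphabet, so every machine with input alphabet Σ becomes a finite word that another TM can read.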
Example 9.3.2. Among the many decision problems known to be undecidable are the following surprisingly simple ones.
Is the language of a given context free grammar G a regular language?
For a given context free grammar G with set Σ of terminal symbols, is the language of G the whole of Σ*?
Do a pair of context free grammars G1 and G2 give the same language?
Here again, the claim is not that we can never decide whether a particular context
free grammar actually generates a regular language. The claim is that there is no
TM that can decide this question for all possible context free grammars.