In this first section we will look at alphabets, strings and languages. In this
setting, an alphabet is any finite set, whose elements we think of as symbols; a
string or word is any finite sequence of elements of the alphabet, and a
language is a collection of words.
1.1 Alphabets
We will start with some basic definitions:
Definition 1. By an alphabet we will mean a finite set. A member of an
alphabet is known as a symbol or a letter.
Example 1. In genetics, the information in DNA is encoded in a sequence of
nucleotides. Each of these nucleotides carries one of the bases adenine (A),
cytosine (C), guanine (G) and thymine (T). This means that we can view a gene
as a sequence of the letters A, C, G and T, so the relevant alphabet here is
{A, C, G, T}.
Example 2. For writing integers in binary form, one needs just the digits 0
and 1, so we get the alphabet {0, 1}. If we want to write real numbers in binary
form we also need the binary point, and we get the alphabet {0, 1, .}.
1.2 Strings
Definition 2. A string over the alphabet Σ is a finite sequence of symbols in
Σ.
A sequence can be empty, and when it is, we have the empty string, which we
write ε. The length of a string is simply the number of (not necessarily distinct)
symbols in the string. The length of a string u is denoted by |u|.
Example 3. Some strings over the alphabet {0, 1} are 000, 0101, 0 and ε. Their
lengths are 3, 4, 1 and 0 respectively.
If u and v are two strings over some alphabet Σ, then we can form the
concatenation uv of the two strings, which is obtained by writing all the letters
in u followed by the letters of v. Sometimes we write u · v to emphasize the
operation of concatenation.
Example 4. Let u = 0101 and v = 101, then uv = 0101101 and vu = 1010101.
Since the empty string does not contain any letters, we get the identity
ε · u = u · ε = u.
The above identity together with the fact that concatenation is associative,
that is, for all strings u, v and w
(uv)w = u(vw)
makes the set of strings over an alphabet Σ into a monoid.
Just as powers of numbers are just repeated multiplication, we form powers
of strings as well. For a string u and a non-negative integer n, we define
u^n to be u · u · · · u, where u is repeated n times; in particular u^0 = ε.
Just as for numbers, we get the identity
u^n · u^m = u^(n+m),
directly from the definition of powers of strings.
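
The identities above can be checked on concrete strings. The following is a minimal sketch in Python, assuming that Python strings stand in for strings over an alphabet and that "" plays the role of ε; the helper name power is an illustrative assumption and is not part of the notes.

```python
def power(u: str, n: int) -> str:
    """Return u^n, that is, u concatenated with itself n times (u^0 is empty)."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = ""              # the empty string plays the role of ε
    for _ in range(n):
        result = result + u  # one more copy of u
    return result


u, v, w = "0101", "101", "11"

# Concatenation as in Example 4.
assert u + v == "0101101"
assert v + u == "1010101"

# ε is a two-sided identity, and concatenation is associative.
assert "" + u == u + "" == u
assert (u + v) + w == u + (v + w)

# u^n · u^m = u^(n+m), here with n = 2 and m = 3.
assert power(u, 2) + power(u, 3) == power(u, 5)
```

Note that uv and vu differ in this example, so the monoid of strings is in general not commutative.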
1.3 Languages
Definition 3. A language over Σ is a subset of Σ∗ , the set of all strings over
Σ.
Let L and M be languages over an alphabet Σ, from these we can form new
languages in a variety of ways.
We can take the set of all strings that are members of at least one of the
languages L and M, and we get their union
L ∪ M = {w | w ∈ L or w ∈ M}.
We can take the set of all strings that are members of both the languages L
and M, and we get their intersection
L ∩ M = {w | w ∈ L and w ∈ M }.
We could also take the set of all strings that are not members of L, and we
get the complement of L
L̄ = {w | w ∉ L}.
If we remove all strings from the language L that also are elements of M,
we get the set difference of L and M
L \ M = {w | w ∈ L and w ∉ M}.
The above operations are not special to languages; we can form unions,
intersections and complements of arbitrary sets. Something that is special
to strings is that we can take the product (the concatenation) of two strings.
If we consider the collection of all the strings we can get by taking one string
from L and one string from M and concatenating them, we get the product of L
and M
L · M = {uv | u ∈ L, v ∈ M }.
Another operation we can perform on a language is to consider the set of all
strings we can get by taking n strings u1, u2, ..., un from L, for any n ≥ 0,
and forming their product u1 u2 · · · un; if n = 0 we take the empty string as
the product. The resulting language is known as the Kleene closure of L
L∗ = {u1 · · · un | n ≥ 0, ui ∈ L}.
Writing L^n for the set of products of exactly n strings from L, so that
L^0 = {ε}, this can also be expressed as
L∗ = L^0 ∪ L^1 ∪ L^2 ∪ · · · .
Example 6. Let L = {aa, ab} and M = {aa, bb, ba} be languages over Σ =
{a, b}. Then L ∪ M = {aa, ab, ba, bb}, L ∩ M = {aa}, L \ M = {ab}, LM =
{aaaa, aabb, aaba, abaa, abbb, abba}, and ML = {aaaa, aaab, bbaa, bbab, baaa, baab}.
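
The language operations can be tried out on the languages of Example 6. The following is a minimal sketch, assuming finite languages are represented as Python sets of strings; the function names concat, lang_power and kleene_up_to are illustrative assumptions. Since L∗ is infinite whenever L contains a non-empty string, the sketch only computes the finite part L^0 ∪ L^1 ∪ ... ∪ L^k.

```python
def concat(L, M):
    """Product of two languages: every string of L followed by every string of M."""
    return {u + v for u in L for v in M}


def lang_power(L, n):
    """L^n: products of exactly n strings from L, with L^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def kleene_up_to(L, k):
    """Finite approximation of the Kleene closure: L^0 ∪ L^1 ∪ ... ∪ L^k."""
    result = set()
    for n in range(k + 1):
        result |= lang_power(L, n)
    return result


L = {"aa", "ab"}
M = {"aa", "bb", "ba"}

assert L | M == {"aa", "ab", "ba", "bb"}          # union
assert L & M == {"aa"}                            # intersection
assert L - M == {"ab"}                            # set difference
assert concat(L, M) == {"aaaa", "aabb", "aaba", "abaa", "abbb", "abba"}
assert concat(M, L) == {"aaaa", "aaab", "bbaa", "bbab", "baaa", "baab"}
assert kleene_up_to(L, 2) == {"", "aa", "ab", "aaaa", "aaab", "abaa", "abab"}
```

The complement is left out because it is taken relative to Σ∗ and is therefore infinite; membership in it can still be tested with `w not in L`.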
1.4 Exercises
1. Let u = ab and v = ba. What are u^2 v^2 and (uv)^2?
2. Let Σ = {a, b}. The languages L1 , L2 and L3 over Σ are given by L1 =
{aab, bba, aa}, L2 = {bb, bbb} and L3 = {ba, ab}.
(b) Which (if any) of the following strings are in the language L1∗: aab, bbaa,
bbaaa, bbaaab, aaaaaaa?
2. Σ is a finite alphabet.
2.1 The transition diagram
We can describe a DFA in various forms. One way is to use the transition
diagram. This is a diagram that is constructed in the following way:
1. One circle is drawn for each state, and the label of the state is written
inside the circle.
2. For each state s ∈ S and each letter x ∈ Σ, a directed edge (an arrow) is
drawn from s to δ(s, x).
3. An arrow that does not start at a state, but points to the initial state i is
drawn.
4. All circles for the accepting states t ∈ T are doubled.
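
The four steps above can be sketched as a small program that writes out a description of the diagram. The following is a minimal sketch, assuming the DFA is given as Python sets and a dictionary delta mapping (state, letter) pairs to states, and assuming Graphviz DOT text as the output format; the function name to_dot and the two-state example machine are hypothetical and are not taken from the notes.

```python
def to_dot(states, alphabet, initial, delta, accepting):
    """Return Graphviz DOT text for the transition diagram of a DFA."""
    lines = ["digraph dfa {", "  rankdir=LR;"]
    # Steps 1 and 4: one circle per state, doubled for accepting states.
    for s in sorted(states):
        shape = "doublecircle" if s in accepting else "circle"
        lines.append(f'  "{s}" [shape={shape}];')
    # Step 3: an arrow that starts at no state and points to the initial state.
    lines.append('  "__start" [shape=point];')
    lines.append(f'  "__start" -> "{initial}";')
    # Step 2: for each state s and letter x, an arrow from s to delta(s, x).
    for s in sorted(states):
        for x in sorted(alphabet):
            lines.append(f'  "{s}" -> "{delta[(s, x)]}" [label="{x}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: two states over the alphabet {a}.
example = to_dot(
    states={"s0", "s1"},
    alphabet={"a"},
    initial="s0",
    delta={("s0", "a"): "s1", ("s1", "a"): "s0"},
    accepting={"s1"},
)
print(example)
```

The printed text can be rendered with any Graphviz-compatible tool; the invisible point node "__start" is only a device for drawing the arrow of step 3.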
If this procedure is applied to the machine A, we get the following diagram:
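[Figure: transition diagram with the initial state s0 and the accepting state
s1; an arrow labelled a leads from s0 to s1 and an arrow labelled a leads from
s1 back to s0.]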
δ∗(s, ε) = s                      (1)
δ∗(s, wx) = δ(δ∗(s, w), x)        (2)
where s ∈ S, w ∈ Σ∗ and x ∈ Σ.
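
Equations (1) and (2) translate directly into a recursive function. The following is a minimal sketch, assuming the transition function δ is stored as a Python dictionary from (state, letter) pairs to states; the name delta_star and the two-state example are hypothetical.

```python
def delta_star(delta, s, w):
    """Extended transition function, following equations (1) and (2)."""
    if w == "":
        return s                                  # (1): reading ε changes nothing
    prefix, x = w[:-1], w[-1]                     # write w as (prefix)x
    return delta[(delta_star(delta, s, prefix), x)]   # (2)


# Hypothetical example: the letter a toggles between s0 and s1.
delta = {("s0", "a"): "s1", ("s1", "a"): "s0"}

assert delta_star(delta, "s0", "") == "s0"
assert delta_star(delta, "s0", "a") == "s1"
assert delta_star(delta, "s0", "aaa") == "s1"
assert delta_star(delta, "s1", "aa") == "s1"
```

The recursion mirrors the definition one letter at a time, so for very long strings a simple loop over the letters of w is the more practical choice.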
Definition 6. Let A = (S, Σ, i, δ, T) be a DFA. The language recognised by A,
written L(A), is the set of all strings w in Σ∗ such that δ∗(i, w) ∈ T.
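
Membership in L(A) can be tested by running the machine on w and checking where it ends. The following is a minimal sketch, assuming a DFA is stored as a Python tuple in the order (S, Σ, i, δ, T) with δ a dictionary; the function name accepts and the example machine EVEN_B are hypothetical.

```python
def accepts(dfa, w):
    """Return True when w is in the language recognised by the DFA."""
    states, alphabet, initial, delta, accepting = dfa
    state = initial
    for x in w:                      # computes δ*(i, w) one letter at a time
        state = delta[(state, x)]
    return state in accepting        # w is in L(A) when δ*(i, w) is in T


# Hypothetical example over {a, b}: s0 is reached after an even number of b's.
EVEN_B = (
    {"s0", "s1"},
    {"a", "b"},
    "s0",
    {("s0", "a"): "s0", ("s0", "b"): "s1",
     ("s1", "a"): "s1", ("s1", "b"): "s0"},
    {"s0"},
)

assert accepts(EVEN_B, "")
assert accepts(EVEN_B, "abba")
assert not accepts(EVEN_B, "ab")
```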
Definition 7. A language L over the alphabet Σ is said to be regular if there
exists a DFA A such that L = L(A).
Let the initial state in A be sε and let the accepting states be all the states
swi, that is, the states corresponding to the strings wi of L.
We can now claim that L(A) = L. We prove this by proving that
δ∗(sε, v) = sv   if v ∈ PL,
δ∗(sε, v) = s⊥   otherwise.        (5)
We proceed by induction on the length of v. For the base of the induction, it is
clear that δ∗(sε, ε) = sε. Now assume that the equality (5) holds for the string
u, and let x ∈ Σ. If ux ∈ PL, then u ∈ PL, so δ∗(sε, u) = su and hence
δ∗(sε, ux) = δ(su, x) by the definition of the extended transition function; by
the construction of δ, this state is sux. If ux is not in PL, it is clear that
δ∗(sε, ux) = s⊥. Thus we have proved the equality (5), from which we can deduce
that w is in L(A) precisely when w ∈ L.
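
The construction used in this proof can be sketched in code and checked exhaustively on short strings. The following is a minimal sketch, assuming that a state sv is represented by the prefix v itself, that the failure state s⊥ is represented by None, and that δ sends sv on x to svx when vx is a prefix and to s⊥ otherwise; the names prefix_dfa, accepts and FAIL are illustrative assumptions.

```python
from itertools import product

FAIL = None  # stands for the failure state s⊥


def prefix_dfa(language, alphabet):
    """Build a DFA (S, Σ, i, δ, T) for a finite language from its prefixes."""
    prefixes = {""} | {w[:k] for w in language for k in range(len(w) + 1)}
    states = prefixes | {FAIL}
    delta = {}
    for s in states:
        for x in alphabet:
            target = s + x if s is not FAIL else FAIL
            # Stay among the prefixes when possible, otherwise fail for good.
            delta[(s, x)] = target if target in prefixes else FAIL
    return states, set(alphabet), "", delta, set(language)


def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    states, alphabet, initial, delta, accepting = dfa
    state = initial
    for x in w:
        state = delta[(state, x)]
    return state in accepting


L = {"00", "001", "10"}
dfa = prefix_dfa(L, "01")
assert len(dfa[0]) == 7          # six prefixes and the failure state

# Check L(A) = L on every string of length at most 4.
for n in range(5):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(dfa, w) == (w in L)
```

The exhaustive check is only evidence for short strings; the induction above is what establishes the equality for all strings.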
Example 8. Let L = {00, 001, 10}. The set of all prefixes of the strings in L is
then
{ε, 0, 1, 00, 10, 001}
From each of the prefixes we thus get a state, and we also add an extra failure
state, so the full set of states is
{sε, s0, s1, s00, s10, s001, s⊥}.
The initial state will be sε, and the accepting states will be the states
corresponding to the strings of L: {s00, s001, s10}. We can now construct the
arrows of the transition diagram as follows:
1. From the state sε we draw an arrow labelled by 0 to s0 , since ε · 0 = 0
which is a prefix in P , similarly we get an arrow labelled by
1
[Figure: transition diagram for Example 8 on the states sε, s0, s1, s00, s10,
s001 and s⊥.]
For the state sw we set
[Figure: transition diagram on the states sε, s0, s01, s011 and s⊥, with arrows
labelled 0, 1 and 0,1.]
Example 10. Let L be the language consisting of all strings that have the
string 011 as a suffix, i.e. all strings ending in 011; we will then construct a
DFA recognising L.
1. For each prefix of 011 we get a state, so the set of states is S = {sε , s0 , s01 , s011 }.
2. As before we go through the states one by one to construct the arrows in
the transition diagram.
(a) From sε: for the arrow labelled by 0, the longest suffix of
ε · 0 = 0 which is a prefix of 011 is 0, so the target of the 0 arrow
is s0. For the arrow labelled by 1, the longest suffix of
ε · 1 = 1 that is a prefix of 011 is ε (because the only other suffix of
1, namely 1 itself, is not a prefix of 011), so the target of the arrow
labelled by 1 is sε.
(b) From s0: the longest suffix of 0 · 0 = 00 that is a prefix of 011 is 0,
so we get a loop labelled by 0. The longest suffix of 0 · 1 = 01 that is
a prefix of 011 is 01, so we get an arrow labelled by 1 to s01.
(c) From s01: the longest suffix of 01 · 0 = 010 that is a prefix of 011 is 0,
so we get a 0-labelled arrow to s0. The longest suffix of 01 · 1 = 011
that is a prefix of 011 is 011, so we get a 1-labelled arrow to s011.
(d) Lastly, from the state s011: the longest suffix of 011 · 0 = 0110 that
is a prefix of 011 is 0, and the longest suffix of 011 · 1 = 0111 that
is a prefix of 011 is ε. This means that we get a 0-labelled arrow to
s0 and a 1-labelled arrow to sε.
3. The initial state i is sε .
4. The only accepting state is s011 .
Again we summarise this in the transition diagram.
[Figure: transition diagram of the DFA constructed above, with the initial
state sε, the states s0 and s01, the accepting state s011, and the arrows
listed in step 2.]
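
The rule used in Example 10, namely that from the state sp the arrow labelled x leads to the state of the longest suffix of px that is a prefix of 011, can be sketched as a short program. The following is a minimal sketch, assuming each state sp is represented by the string p itself; the names suffix_dfa, longest_suffix_prefix and accepts are illustrative assumptions.

```python
from itertools import product


def suffix_dfa(pattern, alphabet):
    """Build a DFA (S, Σ, i, δ, T) for the strings that end in `pattern`."""
    prefixes = [pattern[:k] for k in range(len(pattern) + 1)]

    def longest_suffix_prefix(w):
        # Try the prefixes of the pattern from longest to shortest; the empty
        # prefix always matches, so a value is always returned.
        for k in range(min(len(w), len(pattern)), -1, -1):
            if w.endswith(pattern[:k]):
                return pattern[:k]

    delta = {(p, x): longest_suffix_prefix(p + x)
             for p in prefixes for x in alphabet}
    return set(prefixes), set(alphabet), "", delta, {pattern}


def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    states, alphabet, initial, delta, accepting = dfa
    state = initial
    for x in w:
        state = delta[(state, x)]
    return state in accepting


dfa = suffix_dfa("011", "01")

# The arrows agree with steps (a) to (d) of Example 10.
assert dfa[3] == {
    ("", "0"): "0",    ("", "1"): "",
    ("0", "0"): "0",   ("0", "1"): "01",
    ("01", "0"): "0",  ("01", "1"): "011",
    ("011", "0"): "0", ("011", "1"): "",
}

# The machine accepts exactly the strings ending in 011 (checked up to length 6).
for n in range(7):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(dfa, w) == w.endswith("011")
```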
1. The set of states is unchanged: S = {sε , s0 , s01 , s011 }.
2. The transition function is changed by making s011 into a sink state.
3. The initial state i is still sε .
4. The only accepting state is s011 .
The only change we have made is thus to make s011 into a sink state; the diagram
is:
[Figure: the transition diagram of Example 10, with the two arrows leaving s011
replaced by a loop labelled 0,1 at s011.]
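
The effect of turning s011 into a sink state can be explored with a small experiment. The following is a minimal sketch, assuming the transition table of Example 10 with each state sp represented by the string p, and assuming that the modified machine is meant to recognise the strings containing 011 somewhere (the text that states this is not available here); the names delta_suffix, delta_sink and accepts are hypothetical.

```python
from itertools import product

# Transition table of Example 10 (strings ending in 011).
delta_suffix = {
    ("", "0"): "0",    ("", "1"): "",
    ("0", "0"): "0",   ("0", "1"): "01",
    ("01", "0"): "0",  ("01", "1"): "011",
    ("011", "0"): "0", ("011", "1"): "",
}

# Same table, except that s011 becomes a sink: both arrows loop back to it.
delta_sink = dict(delta_suffix)
delta_sink[("011", "0")] = "011"
delta_sink[("011", "1")] = "011"


def accepts(delta, w, initial="", accepting=("011",)):
    """Run the machine given by `delta` on w, starting in `initial`."""
    state = initial
    for x in w:
        state = delta[(state, x)]
    return state in accepting


# Once 011 has been read the sink machine can never leave s011, so on all
# strings up to length 7 it accepts exactly those that contain 011.
for n in range(8):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(delta_sink, w) == ("011" in w)
        assert accepts(delta_suffix, w) == w.endswith("011")
```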
2.5 Exercises
3. For each of the following transition tables, construct the corresponding
transition diagram:
(a)
           a    b
   → s0    s1   s0
     s1    s2   s1
   ← s2    s0   s2
(b)
           a    b
   ↔ s0    s1   s1
     s1    s0   s2
   ← s2    s0   s1
(c)
           a    b    c
   ↔ s0    s1   s0   s2
     s1    s0   s3   s0
   ← s2    s3   s2   s0
   ← s3    s1   s0   s1
4. Let A be the DFA with the following transition table:
5. For each of the automata below, describe its recognised language:
(a) [Figure: transition diagram with six states labelled 1 to 6; every arrow is
labelled a.]
(b) [Figure: transition diagram with the states s0, s1 and s2; the arrows are
labelled a, b and a,b.]
(c) [Figure: transition diagram with the states s0, s1 and s2; the arrows are
labelled a, b and a,b.]
(d) [Figure: transition diagram with the states s0, s1 and s2; the arrows are
labelled a, b and a,b.]
6. Let Σ = {a, b}. Construct finite automata that recognise the following
languages:
(a) All strings x over Σ such that |x| ≡ 0 mod 3.
(b) All strings x over Σ such that |x| ≡ 2 mod 3.
(c) All strings x over Σ such that |x| ≡ 0 or 2 mod 3.
7. Let Σ = {0, 1}. Construct finite automata that recognise the following
languages:
(a) All strings x over Σ such that |x| ≥ 3.
(b) All strings x over Σ such that |x| ≠ 3.
(c) All strings x over Σ such that |x| ≤ 3.
(d) All strings x over Σ such that |x| = 3.
8. Let Σ = {a, b}. Construct DFAs that recognise the following languages:
(a) all strings in which the first letter is the same as the last letter.
(b) all strings of length at least 4 in which the second letter is equal to the next
to last letter.
(c) all strings containing ab but not aba.
9. (a) The DFA
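[Figure: transition diagram with the states s0 (initial and accepting) and s1;
each state has a loop labelled a, an arrow labelled b leads from s0 to s1, and
an arrow labelled b leads from s1 back to s0.]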
accepts the language L, where L consists of all strings with an even number
of bs. Construct a DFA M such that L(M ) = L \ {ε}.
(b) Show that if L is a regular language, then so is L \ {ε}.