
1 Languages

In this first section we will look at alphabets, strings and languages. In this
setting, an alphabet will be any finite set, whose elements we will think of as symbols; a
string or a word will be any finite sequence of elements of the alphabet, and a
language will be a collection of words.

1.1 Alphabets
We will start with some basic definitions:
Definition 1. By an alphabet we will mean a finite set. A member of an
alphabet is known as a symbol or a letter.
Example 1. In genetics, the information in the DNA is encoded in a sequence
of nucleotide bases. Each of these bases is one of adenine (A), cytosine (C),
guanine (G) and thymine (T). This means that we can view a gene as a sequence
of the letters A, C, G and T, so the relevant alphabet here is {A, C, G, T}.
Example 2. For writing integers in binary form, one needs just the digits 0
and 1, so we get the alphabet {0, 1}. If we want to write real numbers in binary
form we also need the binary point, and we get the alphabet {0, 1, .}.

1.2 Strings
Definition 2. A string over the alphabet Σ is a finite sequence of symbols in
Σ.
A sequence can be empty, and when it is, we have the empty string, which we
write ε. The length of a string is simply the number of (not necessarily distinct)
symbols in the string. The length of a string u is denoted by |u|.
Example 3. Some strings over the alphabet {0, 1} are 000, 0101, 0 and ε. Their
lengths are 3, 4, 1 and 0 respectively.
If u and v are two strings over some alphabet Σ, then we can form the
concatenation uv of the two strings, which is obtained by writing all the letters
in u followed by the letters of v. Sometimes we write u · v to emphasize the
operation of concatenation.
Example 4. Let u = 0101 and v = 101, then uv = 0101101 and vu = 1010101.
Since the empty string does not contain any letters, we get the identity

ε·u=u·ε=u

The above identity together with the fact that concatenation is associative,
that is, for all strings u, v and w

(uv)w = u(vw)

makes the set of strings over an alphabet Σ into a monoid.
Just as powers of numbers are repeated multiplication, we can form powers
of strings as well. For a string u and a non-negative integer n, we define
u^n to be u · u · ... · u, where u is repeated n times; in particular, u^0 = ε. Just as for numbers,
we get the identity
u^n · u^m = u^(n+m),
directly from the definition of powers of strings.

Example 5. Let u = aba, then u^3 = aba · aba · aba = abaabaaba. Also,

b(an)^2 a = b · an · an · a = banana.
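
The string operations of this subsection map directly onto Python's built-in str type. The following is only an illustrative sketch and not part of the original notes; the names EPSILON and power are chosen here for the example.

```python
EPSILON = ""  # the empty string ε


def power(u: str, n: int) -> str:
    """Return u^n: the string u concatenated with itself n times."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = EPSILON
    for _ in range(n):
        result = result + u  # append one more factor of u
    return result


u, v = "0101", "101"
print(u + v)                        # the concatenation uv of Example 4
print(v + u)                        # vu, which differs from uv here
print(len(u))                       # the length |u|
print(power("aba", 3))              # u^3 from Example 5
print("b" + power("an", 2) + "a")   # b(an)^2 a from Example 5
```

Since concatenation is associative and ε is its identity, the loop in power could equally well prepend u instead of appending it.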

1.3 Languages
Definition 3. A language over Σ is a subset of Σ∗ , the set of all strings over
Σ.

Let L and M be languages over an alphabet Σ. From these we can form new
languages in a variety of ways.
We can take the set of all strings that are members of at least one of the
languages L and M, and we get their union

L ∪ M = {w | w ∈ L or w ∈ M }.

We can take the set of all strings that are members of both the languages L
and M, and we get their intersection

L ∩ M = {w | w ∈ L and w ∈ M }.

We could also take the set of all strings over Σ that are not members of L, and we
get the complement of L
L̄ = {w ∈ Σ∗ | w ∉ L}.
If we remove from the language L all strings that are also elements of M,
we get the set difference of L and M

L \ M = {w | w ∈ L and w ∉ M}.

The above operations are not special to languages; we can form unions,
intersections and complements of arbitrary sets. Something which is special
for strings¹ is that we can take the product (the concatenation) of two strings. If we consider the
collection of all the strings we can get by taking one string u from L and one
string v from M and concatenating them, we get the product of L and M

L · M = {uv | u ∈ L, v ∈ M }.

Note that in general we have L · M ≠ M · L.


1 Actually of monoids in general.

Another operation we can perform on a language is to consider the set of all
strings we can get by taking n strings u_1, u_2, ..., u_n from L and forming their
product u_1 u_2 ··· u_n; if n = 0 we take the empty string as the product. The
resulting language is known as the Kleene closure of L

L∗ = {u_1 ··· u_n | n ≥ 0, u_i ∈ L}.

Another way of writing the Kleene closure is the following

L∗ = L^0 + L^1 + L^2 + ···,

where L^0 = {ε}, L^(n+1) = L^n · L, and + denotes the union of languages.
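
The Kleene closure is infinite as soon as L contains a non-empty string, so a program can only list it up to some length bound. The sketch below is an illustration added here, not part of the original notes, and the function names concat and kleene_up_to are assumptions. It builds the closure power by power, following the description of L∗ as the union of all powers of L.

```python
def concat(L, M):
    """The product L · M = {uv | u ∈ L, v ∈ M}."""
    return {u + v for u in L for v in M}


def kleene_up_to(L, max_len):
    """All strings of the Kleene closure of L that have length at most max_len."""
    result = {""}      # L^0 = {ε}
    frontier = {""}    # strings found in the previous round
    while frontier:
        # Extend each newly found string by one more factor from L,
        # keeping only strings that are short enough and not seen before.
        frontier = {w for w in concat(frontier, L)
                    if len(w) <= max_len and w not in result}
        result |= frontier
    return result


print(sorted(kleene_up_to({"ab", "b"}, 3), key=lambda w: (len(w), w)))
```

The loop terminates because there are only finitely many strings of length at most max_len, and each round must contribute at least one new one to continue.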

Example 6. Let L = {aa, ab} and M = {aa, bb, ba} be languages over Σ =
{a, b}. Then L ∪ M = {aa, ab, ba, bb}, L ∩ M = {aa}, L \ M = {ab}, LM =
{aaaa, aabb, aaba, abaa, abbb, abba}, and M L = {aaaa, aaab, bbaa, bbab, baaa, baab}.
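
For finite languages, the operations above can be checked mechanically with Python sets. This is a minimal sketch added for illustration; the helper names concat, strings_up_to and complement_up_to are assumptions, and since the complement of a finite language is infinite it is only listed up to a length bound.

```python
from itertools import product as tuples


def concat(L, M):
    """The product L · M = {uv | u ∈ L, v ∈ M}."""
    return {u + v for u in L for v in M}


def strings_up_to(sigma, max_len):
    """All strings over the alphabet sigma of length at most max_len."""
    return {"".join(t) for n in range(max_len + 1)
            for t in tuples(sorted(sigma), repeat=n)}


def complement_up_to(L, sigma, max_len):
    """The strings of length at most max_len that are not in L."""
    return strings_up_to(sigma, max_len) - L


L = {"aa", "ab"}
M = {"aa", "bb", "ba"}

print(sorted(L | M))          # union
print(sorted(L & M))          # intersection
print(sorted(L - M))          # set difference L \ M
print(sorted(concat(L, M)))   # product LM
print(sorted(concat(M, L)))   # product ML
print(sorted(complement_up_to(L, {"a", "b"}, 2)))
```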

1.4 Exercises
1. Let u = ab and v = ba. What are u^2 v^2 and (uv)^2?
2. Let Σ = {a, b}. The languages L1, L2 and L3 over Σ are given by L1 =
{aab, bba, aa}, L2 = {bb, bbb} and L3 = {ba, ab}.

(a) Write down all strings in the following languages: L1 L2, L2 L1, L1 + L2,
L1 ∩ L2.

(b) Which (if any) of the following strings are in the language L1∗: aab, bbaa,
bbaaa, bbaaab, aaaaaaa?

(c) Show that L2∗ = {b^n | n ≥ 2} + {ε}.

2 Deterministic finite automata


Next, we will look at the following problem: given a language L, how can we
determine if a given string is in L or not?
Our approach to this problem will be to construct (abstract) machines for
deciding language membership. These machines . . .

Definition 4. A deterministic finite automaton (or DFA) A over the alphabet
Σ is given by a 5-tuple A = (S, Σ, i, δ, T), where

1. S is a finite set, the set of states.

2. Σ is a finite alphabet.

3. i ∈ S is the initial state.

4. δ : S × Σ → S is the transition function.

5. T ⊆ S is the set of accepting states.
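
One possible way to hold such a 5-tuple in a program is sketched below. This is an illustration added here rather than anything prescribed by the notes: the class name DFA, its field names, and the choice to store δ as a dictionary keyed by (state, letter) pairs are all assumptions. The checks in __post_init__ mirror the conditions of the definition, in particular that δ is a total function on S × Σ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    states: frozenset      # S
    alphabet: frozenset    # Σ
    initial: str           # i
    delta: dict            # δ, stored as {(state, letter): state}
    accepting: frozenset   # T

    def __post_init__(self):
        if self.initial not in self.states:
            raise ValueError("the initial state must belong to S")
        if not self.accepting <= self.states:
            raise ValueError("T must be a subset of S")
        domain = {(s, x) for s in self.states for x in self.alphabet}
        if set(self.delta) != domain:
            raise ValueError("δ must be defined exactly on S × Σ")
        if not set(self.delta.values()) <= self.states:
            raise ValueError("δ must take its values in S")


# A two-state example over the one-letter alphabet {a}.
A = DFA(
    states=frozenset({"s0", "s1"}),
    alphabet=frozenset({"a"}),
    initial="s0",
    delta={("s0", "a"): "s1", ("s1", "a"): "s0"},
    accepting=frozenset({"s1"}),
)
print(A.delta[("s0", "a")])
```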

2.1 The transition diagram
We can describe a DFA in various forms. One way is to draw the transition diagram.
This is a diagram that is constructed in the following way:
1. One circle is drawn for each state, and the label of the state is written
inside the circle.
2. For each state s ∈ S and each letter x ∈ Σ, a directed edge (an arrow)
labelled by x is drawn from s to δ(s, x).
3. An arrow that does not start at a state, but points to the initial state i is
drawn.
4. All circles for the accepting states t ∈ T are doubled.
If this procedure is applied to the machine A, we get the following diagram:

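[Transition diagram: state s0 (initial, marked by an incoming arrow) and state s1 (accepting, drawn as a double circle), with an arrow labelled a from s0 to s1 and an arrow labelled a from s1 to s0.]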
2.2 The transition table


Another way of encoding the same information is in the form of a table, the
transition table which is constructed as follows:
1. The table has one row for each state, labelled by that state.
2. The table has one column for each letter in the alphabet, labelled by that
letter.
3. For a state s ∈ S and letter x ∈ Σ, in the position of row s, and column
x, the state δ(s, x) is written.
4. The row label of the initial state is prefixed with →.
5. The row labels of all the accepting states are prefixed with ←.
The transition table of the machine A is thus:
        a
→ s0    s1
← s1    s0

Definition 5. Given a DFA A over the alphabet Σ with transition function δ,
we define the extended transition function δ∗ : S × Σ∗ → S by

δ∗(s, ε) = s (1)
δ∗(s, wx) = δ(δ∗(s, w), x) (2)

where s ∈ S, w ∈ Σ∗ and x ∈ Σ.
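
Equations (1) and (2) translate almost literally into a recursive function. The sketch below is an added illustration, assuming that δ is stored as a Python dictionary keyed by (state, letter) pairs and that letters are single characters; the function names are chosen here. The second function computes the same value with a loop, which is how one would usually run an automaton in practice.

```python
def delta_star(delta, s, w):
    """Extended transition function, computed by the recursion (1)-(2)."""
    if w == "":
        return s                                      # equation (1)
    prefix, x = w[:-1], w[-1]                         # write w as (prefix)x
    return delta[(delta_star(delta, s, prefix), x)]   # equation (2)


def delta_star_iterative(delta, s, w):
    """The same function, unrolled into a loop over the letters of w."""
    for x in w:
        s = delta[(s, x)]
    return s


# A two-state machine over {a}: each a moves between s0 and s1.
delta = {("s0", "a"): "s1", ("s1", "a"): "s0"}
print(delta_star(delta, "s0", "aaa"))
print(delta_star_iterative(delta, "s0", "aaa"))
```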

Definition 6. Let A = (S, Σ, i, δ, T) be a DFA. The language recognised by A,
written L(A), is defined as the set of all strings w in Σ∗ such that δ∗(i, w) ∈ T.
Definition 7. A language L over the alphabet Σ is said to be regular if there
exists a DFA A such that L = L(A).
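
Membership in L(A) can be tested by running the automaton on the input and checking whether the final state is accepting. The sketch below is illustrative only: it assumes a DFA is given as a plain Python tuple (S, Σ, i, δ, T) with δ a dictionary and single-character letters, and the names accepts and language_up_to are chosen here.

```python
from itertools import product


def accepts(dfa, w):
    """True when δ*(i, w) ∈ T, that is, when w ∈ L(A)."""
    states, alphabet, initial, delta, accepting = dfa
    state = initial
    for x in w:
        state = delta[(state, x)]
    return state in accepting


def language_up_to(dfa, max_len):
    """The strings of L(A) of length at most max_len, shortest first."""
    alphabet = sorted(dfa[1])
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return [w for w in words if accepts(dfa, w)]


# Two states over {a}; s0 is initial and s1 is accepting.
A = ({"s0", "s1"}, {"a"}, "s0",
     {("s0", "a"): "s1", ("s1", "a"): "s0"}, {"s1"})
print(language_up_to(A, 5))
```

For this machine the listed strings are exactly the strings of odd length, which suggests (but of course does not prove) what L(A) is.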

2.3 Finite languages


Our main result in this subsection is that every finite language,
that is, every language that contains only finitely many strings, is regular. Before
stating and proving this, we need to define the concepts of prefixes and suffixes
of a string.
Definition 8. Let w be a string; if w = uv for some strings u and v, we say
that u is a prefix of w, and that v is a suffix of w.
Example 7. Let w = 001101. The set of prefixes of w is

{ε, 0, 00, 001, 0011, 00110, 001101},

and the set of suffixes of w is

{ε, 1, 01, 101, 1101, 01101, 001101}.
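
With Python slicing, the prefixes and suffixes of a string can be listed directly from Definition 8: w = uv with u = w[:k] and v = w[k:] for some k between 0 and |w|. The function names below are chosen for this illustration.

```python
def prefixes(w):
    """All prefixes of w, including ε and w itself."""
    return {w[:k] for k in range(len(w) + 1)}


def suffixes(w):
    """All suffixes of w, including ε and w itself."""
    return {w[k:] for k in range(len(w) + 1)}


w = "001101"
print(sorted(prefixes(w), key=len))
print(sorted(suffixes(w), key=len))
```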

Theorem 1. Any finite language is regular.


Proof. Let L be a finite language. We will prove that L is regular by explicitly
constructing an automaton A that recognises it.
Suppose L = {w_1, w_2, ..., w_n}. We may assume that L is non-empty, since the
empty language is recognised by any DFA without accepting states. Let P_L be the
set of all prefixes of the strings in L. Let the set S of states of A consist of one
state s_u for each u ∈ P_L, together with an extra state s_⊥. Now define the
transition function δ as follows:

δ(s_u, x) = s_{ux} if ux ∈ P_L, and δ(s_u, x) = s_⊥ otherwise, (3)
δ(s_⊥, x) = s_⊥. (4)

Let the initial state of A be s_ε and let the accepting states be s_{w_1}, ..., s_{w_n}.
We claim that L(A) = L. We prove this by proving that, for every string v over Σ,

δ∗(s_ε, v) = s_v if v ∈ P_L, and δ∗(s_ε, v) = s_⊥ otherwise. (5)

We proceed by induction on the length of v. For the base of the induction, it is
clear that δ∗(s_ε, ε) = s_ε. Now assume that the equality (5) holds for the string u,
and let x ∈ Σ. If ux ∈ P_L, then u ∈ P_L, so δ∗(s_ε, u) = s_u and hence
δ∗(s_ε, ux) = δ(s_u, x) = s_{ux} by the definition of the extended transition
function and (3). If ux is not in P_L, then either u ∈ P_L, in which case
δ∗(s_ε, ux) = δ(s_u, x) = s_⊥ by (3), or u ∉ P_L, in which case
δ∗(s_ε, ux) = δ(s_⊥, x) = s_⊥ by (4). Thus we have proved the equality in (5).
Since s_⊥ is not accepting and s_w is accepting precisely when w ∈ L, we can
deduce that w is in L(A) precisely when w ∈ L.

Example 8. Let L = {00, 001, 10}. The set of all prefixes of the strings in L is
then
{ε, 0, 1, 00, 10, 001}.
From each of the prefixes we get a state, and we also add an extra failure state, so
the full set of states is
{sε, s0, s1, s00, s10, s001, s⊥}.
The initial state will be sε, and the accepting states will be the states
corresponding to the strings of L: {s00, s001, s10}. We can now construct the arrows
of the transition diagram as follows:
1. From the state sε we draw an arrow labelled by 0 to s0, since ε · 0 = 0
is one of the prefixes; similarly we get an arrow labelled by 1 to s1.

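[Transition diagram of the automaton. Initial state: sε. Accepting states (double circles): s00, s001 and s10. Arrows, as determined by (3) and (4):
sε --0--> s0, sε --1--> s1;
s0 --0--> s00, s0 --1--> s⊥;
s1 --0--> s10, s1 --1--> s⊥;
s00 --0--> s⊥, s00 --1--> s001;
s10 --0,1--> s⊥; s001 --0,1--> s⊥; s⊥ --0,1--> s⊥.]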
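
The construction in the proof of Theorem 1 can be carried out mechanically. The following is a sketch under stated assumptions, not the notes' own code: each state is named by its prefix (a Python string), the failure state is represented by None, letters are single characters, and ε is always added to the set of prefixes so that an initial state exists even for the empty language. The function names are chosen here.

```python
from itertools import product

BOTTOM = None  # stands for the failure state; every other state is named by its prefix


def finite_language_dfa(L, sigma):
    """Build (S, Σ, i, δ, T) for a finite language L, following (3) and (4)."""
    # All prefixes of strings in L, plus ε (an assumption made here so that
    # the initial state exists even when L is empty).
    P = {""} | {w[:k] for w in L for k in range(len(w) + 1)}
    delta = {}
    for u in P:
        for x in sigma:
            delta[(u, x)] = u + x if u + x in P else BOTTOM   # equation (3)
    for x in sigma:
        delta[(BOTTOM, x)] = BOTTOM                           # equation (4)
    return P | {BOTTOM}, set(sigma), "", delta, set(L)


def accepts(dfa, w):
    """Run the automaton on w and report whether it ends in an accepting state."""
    _, _, state, delta, accepting = dfa
    for x in w:
        state = delta[(state, x)]
    return state in accepting


L = {"00", "001", "10"}
A = finite_language_dfa(L, {"0", "1"})

# Compare L(A) with L on every string over {0, 1} of length at most 5.
for n in range(6):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert accepts(A, w) == (w in L)
print(len(A[0]), "states")
```

The exhaustive comparison is only a sanity check on short strings; the induction in the proof is what establishes L(A) = L in general.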

2.4 Pattern recognising automata


Let u be a string over Σ. We will now see how to construct automata that
recognise the following languages:
1. All strings that have u as a prefix.
2. All strings that contain u as a substring.
3. All strings that have u as a suffix.

2.4.1 Prefix patterns


Theorem 2. Let w be a string over Σ. Then the language of all strings that have w
as a prefix is regular.
Just as in the case of the proof of the fact that every finite language is regular,
we will prove this by an actual construction of an automaton that recognises
this language.
Proof. Let P_w be the set of all prefixes of the word w, and let the set of states
S consist of all s_u where u ∈ P_w, together with an extra state s_⊥. The
transition function is defined similarly to before. For a prefix u of w with u ≠ w, we set

δ(s_u, x) = s_{ux} if ux ∈ P_w, and δ(s_u, x) = s_⊥ otherwise.

For the state s_w we set

δ(s_w, x) = s_w for all x ∈ Σ

and finally for the state s_⊥ we set

δ(s_⊥, x) = s_⊥ for all x ∈ Σ.

The initial state is s_ε and the only accepting state is s_w.

We can now prove the theorem similarly to before . . . [TODO].


Example 9. Let L be the language consisting of all strings over Σ = {0, 1}
that have the string 011 as a prefix. We can then construct a DFA recognising
L as follows.
1. The prefixes of w = 011 are {ε, 0, 01, 011}, and for each of the prefixes we
get a state: sε, s0, s01, s011. Together with the extra “failure state”, s⊥,
they give the set of states.
2. For each state we can now draw the transition arrows.
(a) From sε we draw an arrow labelled by 0 to the state s0 since 0 is a
prefix of 011, and an arrow labelled by 1 to s⊥, since 1 is not a prefix
of 011.
(b) Since 01 is a prefix of 011 and 00 is not, we get an arrow labelled by
1 from s0 to s01, and an arrow labelled by 0 from s0 to s⊥.
(c) From s01 we draw an arrow to s011 labelled by 1, and an arrow to s⊥
labelled by 0, since 011 is a prefix of 011 and 010 is not.
(d) The state s011 is a sink state: both arrows from it lead back to s011.
(e) The state s⊥ is also a sink state.
3. The initial state is sε.
4. The accepting state is s011.
We can give all of this information in the following diagram:
[Transition diagram. Initial state: sε. Accepting state (double circle): s011. Arrows:
sε --0--> s0, sε --1--> s⊥;
s0 --1--> s01, s0 --0--> s⊥;
s01 --1--> s011, s01 --0--> s⊥;
s011 --0,1--> s011; s⊥ --0,1--> s⊥.]
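
The prefix construction can be sketched in the same style as for finite languages. As before, this is an added illustration under assumptions: states are named by prefixes of w, the failure state is represented by None, letters are single characters, and the names prefix_dfa and accepts are chosen here. The final loop compares the automaton for the pattern 011 against Python's str.startswith on all short strings.

```python
from itertools import product

BOTTOM = None  # stands for the failure state


def prefix_dfa(w, sigma):
    """DFA for all strings over sigma that have w as a prefix."""
    P = {w[:k] for k in range(len(w) + 1)}   # the prefixes of w
    delta = {}
    for u in P:
        for x in sigma:
            if u == w:
                delta[(u, x)] = w            # the state for w is a sink
            elif u + x in P:
                delta[(u, x)] = u + x        # still following the pattern
            else:
                delta[(u, x)] = BOTTOM       # mismatch: go to the failure state
    for x in sigma:
        delta[(BOTTOM, x)] = BOTTOM          # the failure state is a sink
    return P | {BOTTOM}, set(sigma), "", delta, {w}


def accepts(dfa, v):
    """Run the automaton on v and report whether it ends in an accepting state."""
    _, _, state, delta, accepting = dfa
    for x in v:
        state = delta[(state, x)]
    return state in accepting


A = prefix_dfa("011", {"0", "1"})
for n in range(7):
    for t in product("01", repeat=n):
        v = "".join(t)
        assert accepts(A, v) == v.startswith("011")
print(len(A[0]), "states")
```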

2.4.2 Suffix patterns


Theorem 3. Let w be a string over Σ. Then the language of all strings that have w
as a suffix is regular.
TODO.

Example 10. Let L be the language consisting of all strings that have the
string 011 as a suffix, i.e. all strings ending in 011; we will then construct a
DFA recognising L.
1. For each prefix of 011 we get a state, so the set of states is S = {sε , s0 , s01 , s011 }.
2. As before we go through the states one by one to construct the arrows in
the transition diagram.
(a) From sε: for the arrow with label 0, the longest suffix
of ε · 0 = 0 which is a prefix of 011 is 0, so the target of the 0 arrow
is s0. For the arrow labelled by 1, the longest suffix of
ε · 1 = 1 that is a prefix of 011 is ε (because the only other suffix of
1, namely 1 itself, is not a prefix of 011), so the target of the arrow
labelled by 1 is sε.
(b) From s0 : the longest suffix of 0 · 0 = 00 that is a prefix of 011 is 0,
so we get a loop labelled by 0, the longest suffix of 0 · 1 = 01 that is
a prefix of 011 is 01, so we get an arrow labelled by 1 to s01 .
(c) From s01 : the longest suffix of 01 · 0 = 010 that is a prefix of 011 is 0
so we get a 0-labelled arrow to s0 , and the longest suffix of 01·1 = 011
that is a prefix of 011 is 011, and we get a 1-labelled arrow to s011 .
(d) Lastly from the state s011 , we get that 011 · 0 = 0110 has 0 as its
longest suffix which is a prefix of 011, and that 011 · 1 = 0111 has ε
as its longest suffix which is a prefix of 011. This means that we get
a 0-labelled arrow to s0 and a 1-labelled arrow to sε .
3. The initial state i is sε .
4. The only accepting state is s011 .
Again we summarise this in the transition diagram.
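[Transition diagram. Initial state: sε. Accepting state (double circle): s011. Arrows:
sε --0--> s0, sε --1--> sε;
s0 --0--> s0, s0 --1--> s01;
s01 --0--> s0, s01 --1--> s011;
s011 --0--> s0, s011 --1--> sε.]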
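
The rule used in this example, namely that from the state for u the arrow labelled x leads to the state for the longest suffix of ux that is a prefix of the pattern, can be sketched as follows. This is an added illustration; states are named by prefixes of the pattern, letters are assumed to be single characters, and the function names are chosen here. The final loop compares the automaton for 011 against Python's str.endswith on all short strings.

```python
from itertools import product


def longest_suffix_prefix(v, w):
    """The longest suffix of v that is also a prefix of w."""
    for k in range(min(len(v), len(w)), -1, -1):
        if v.endswith(w[:k]):   # k = 0 always succeeds, giving ε
            return w[:k]


def suffix_dfa(w, sigma):
    """DFA for all strings over sigma that have w as a suffix."""
    states = {w[:k] for k in range(len(w) + 1)}   # one state per prefix of w
    delta = {(u, x): longest_suffix_prefix(u + x, w)
             for u in states for x in sigma}
    return states, set(sigma), "", delta, {w}


def accepts(dfa, v):
    """Run the automaton on v and report whether it ends in an accepting state."""
    _, _, state, delta, accepting = dfa
    for x in v:
        state = delta[(state, x)]
    return state in accepting


A = suffix_dfa("011", {"0", "1"})
for n in range(8):
    for t in product("01", repeat=n):
        v = "".join(t)
        assert accepts(A, v) == v.endswith("011")
print(sorted(A[3].items()))
```

No failure state is needed here, because every string has ε as a suffix, so there is always a state to fall back to.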

2.4.3 Substring patterns


Theorem 4. Let w be a string over Σ. Then the language of all strings that contain w
as a substring is regular.
TODO.
Example 11. Let L be the language consisting of all strings that contain 011
as a substring. A DFA recognising this language can easily be constructed by
slightly modifying the automaton recognising the language of all strings with
011 as a suffix as constructed above.

1. The set of states is unchanged: S = {sε , s0 , s01 , s011 }.
2. The transition function is changed by making s011 into a sink state.
3. The initial state i is still sε .
4. The only accepting state is s011 .
The only change we have made is thus to make s011 into a sink state; the diagram
is:

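[Transition diagram. Initial state: sε. Accepting state (double circle): s011. Arrows:
sε --0--> s0, sε --1--> sε;
s0 --0--> s0, s0 --1--> s01;
s01 --0--> s0, s01 --1--> s011;
s011 --0,1--> s011.]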
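
The modification described in this example can be sketched in code: build the transitions as for the suffix automaton, except that the state for the full pattern loops to itself on every letter. As with the other sketches, this is an added illustration with assumed names (substring_dfa, accepts), states named by prefixes of the pattern, and single-character letters. The final loop compares the automaton for 011 against Python's in operator on all short strings.

```python
from itertools import product


def longest_suffix_prefix(v, w):
    """The longest suffix of v that is also a prefix of w."""
    for k in range(min(len(v), len(w)), -1, -1):
        if v.endswith(w[:k]):   # k = 0 always succeeds, giving ε
            return w[:k]


def substring_dfa(w, sigma):
    """DFA for all strings over sigma that contain w as a substring."""
    states = {w[:k] for k in range(len(w) + 1)}
    delta = {}
    for u in states:
        for x in sigma:
            if u == w:
                delta[(u, x)] = w   # once w has been seen, stay accepting
            else:
                delta[(u, x)] = longest_suffix_prefix(u + x, w)
    return states, set(sigma), "", delta, {w}


def accepts(dfa, v):
    """Run the automaton on v and report whether it ends in an accepting state."""
    _, _, state, delta, accepting = dfa
    for x in v:
        state = delta[(state, x)]
    return state in accepting


A = substring_dfa("011", {"0", "1"})
for n in range(8):
    for t in product("01", repeat=n):
        v = "".join(t)
        assert accepts(A, v) == ("011" in v)
print(sorted(A[3].items()))
```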

2.5 Exercises
3. For each of the following transition tables, construct the corresponding
transition diagram (a row label prefixed with ↔ marks a state that is both
initial and accepting):

(a)
        a    b
→ s0    s1   s0
  s1    s2   s1
← s2    s0   s2

(b)
        a    b
↔ s0    s1   s1
  s1    s0   s2
← s2    s0   s1

(c)
        a    b    c
↔ s0    s1   s0   s2
  s1    s0   s3   s0
← s2    s3   s2   s0
← s3    s1   s0   s1
4. Let A be the DFA with the following transition table:
5. For each of the automata below, describe its recognised language:

(a) [Transition diagram not recoverable from the source. Visible residue: six states labelled 1 to 6, with arrows labelled a.]

(b) [Transition diagram not recoverable from the source. Visible residue: states s0, s1 and s2, with arrows labelled a, b and a,b.]

(c) [Transition diagram not recoverable from the source. Visible residue: states s0, s1 and s2, with arrows labelled a, b and a,b.]

(d) [Transition diagram not recoverable from the source. Visible residue: states s0, s1 and s2, with arrows labelled a, b and a,b.]

6. Let Σ = {a, b}. Construct finite automata that recognise the following
languages:
(a) All strings x over Σ such that |x| ≡ 0 mod 3.
(b) All strings x over Σ such that |x| ≡ 2 mod 3.
(c) All strings x over Σ such that |x| ≡ 0 or 2 mod 3.
7. Let Σ = {0, 1}. Construct finite automata that recognise the following
languages:
(a) All strings x over Σ such that |x| ≥ 3.
(b) All strings x over Σ such that |x| ≠ 3.
(c) All strings x over Σ such that |x| ≤ 3.
(d) All strings x over Σ such that |x| = 3.
8. Let Σ = {a, b}. Construct DFAs that recognise the following languages over Σ:
(a) all strings in which the first letter is the same as the last letter.
(b) all strings of length at least 4 in which the second letter is equal to the next
to last letter.
(c) all strings containing ab but not aba.
9. (a) The DFA

[Transition diagram. States s0 (initial and accepting) and s1. Arrows: s0 --a--> s0, s1 --a--> s1, s0 --b--> s1, s1 --b--> s0.]

accepts the language L, where L consists of all strings with an even number
of bs. Construct a DFA M such that L(M) = L \ {ε}.
(b) Show that if L is a regular language, then so is L \ {ε}.
