
CS 203: Introduction to Formal Languages and Automata


Chapter 4
Properties of Regular Languages
These class notes are based on material from our textbook, An
Introduction to Formal Languages and Automata, by Peter
Linz, published by Jones and Bartlett Publishers, Inc., Sudbury,
MA.

Properties of regular languages


What happens when we perform operations on
regular languages?
E.g., if we concatenate two regular languages, is the
resulting language also regular?

Can we decide whether a given language has a
certain property or not?
E.g., can we tell if a certain language is finite or not?

Can we tell whether a given language is regular or
not?

Closure properties of regular languages


Definition: A regular language is any language that
is accepted by a finite automaton.
Theorem 4.1: The class of regular languages is
closed under the following operations (that is,
performing these operations on regular languages
creates other regular languages):

Union
Concatenation
Kleene star
Complement
Intersection

Closure for union, concatenation, and Kleene star
If L1 and L2 are regular languages, then there
exist regular expressions r1 and r2 such that L1
= L(r1) and L2 = L(r2). By definition 3.1.2 in
our text, r1 + r2, r1r2, and r1* are regular
expressions, and:
L1 ∪ L2 = L(r1 + r2)
L1L2 = L(r1r2)
L1* = L(r1*)
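To make the three constructions concrete, here is a small sketch using Python's `re` module. Two caveats: Python writes union as `r1|r2` rather than Linz's `r1 + r2`, and `re` is a backtracking matcher rather than a finite automaton, so this only illustrates which strings each combined expression describes. The sample expressions `r1`, `r2` and the helper `in_language` are hypothetical.

```python
import re

# Hypothetical sample expressions over {a, b}, in Python regex syntax.
# Linz writes union as r1 + r2; Python's re module writes it as r1|r2.
r1 = "ab*"
r2 = "ba"

union = f"(?:{r1}|{r2})"       # describes L(r1 + r2) = L1 ∪ L2
concat = f"(?:{r1})(?:{r2})"   # describes L(r1 r2) = L1 L2
star = f"(?:{r1})*"            # describes L(r1*) = L1*

def in_language(regex: str, w: str) -> bool:
    """True if the entire string w (not just a prefix) matches regex."""
    return re.fullmatch(regex, w) is not None
```

For example, "abba" is in L1L2 (split as "ab" followed by "ba"), and "aabab" is in L1* (split as "a", "ab", "ab").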

Closure for union, concatenation, and Kleene star
Since languages represented by regular
expressions are by definition regular,
performing the operations of union,
concatenation, and star-closure on regular
languages produces regular languages.
We say that the class of regular languages is
closed under union, concatenation, and
Kleene star (star-closure).

So:
The empty language ∅ is regular
The language consisting of the empty string,
{λ}, is regular
For each a in Σ, {a} is regular
If L1 and L2 are regular:
L1 ∪ L2 is regular
L1L2 is regular
L1* is regular

Unions, Intersections, and Complements: Theorem 4.1, p. 100
Suppose that the DFAs
M1 = (Q1, Σ, δ1, q1, F1) accepts language L1, and
M2 = (Q2, Σ, δ2, q2, F2) accepts language L2
Let M be an FA defined by M = (Q, Σ, δ, q0, F) where
Q = Q1 × Q2
q0 = (q1, q2)
and the transition function is defined by:
δ((p, q), a) = (δ1(p, a), δ2(q, a)),
for any p ∈ Q1, q ∈ Q2, and a ∈ Σ

Unions, Intersections, and Difference: Theorem 4.1, p. 100
Then:
1. If F = {(p, q) | p ∈ F1 or q ∈ F2}, M
accepts the language L1 ∪ L2
2. If F = {(p, q) | p ∈ F1 and q ∈ F2},
M accepts the language L1 ∩ L2
3. If F = {(p, q) | p ∈ F1 and q ∉ F2},
M accepts the language L1 − L2
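The product construction can be sketched in Python as below, assuming both machines are complete DFAs over the same alphabet, each represented as a tuple `(states, alphabet, delta, start, finals)` with `delta` a dict from `(state, symbol)` to a state. This representation, the function names, and the two sample DFAs are assumptions made for illustration, not the textbook's notation.

```python
from itertools import product

# Assumed representation: a complete DFA is (states, alphabet, delta, start, finals),
# where delta maps every (state, symbol) pair to a state.

def product_dfa(m1, m2, mode):
    """Build M = (Q1 x Q2, Σ, δ, (q1, q2), F), choosing F according to mode."""
    q1s, sigma, d1, s1, f1 = m1
    q2s, _, d2, s2, f2 = m2
    states = set(product(q1s, q2s))
    # δ((p, q), a) = (δ1(p, a), δ2(q, a))
    delta = {((p, q), a): (d1[p, a], d2[q, a]) for (p, q) in states for a in sigma}
    if mode == "union":
        finals = {(p, q) for (p, q) in states if p in f1 or q in f2}
    elif mode == "intersection":
        finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    elif mode == "difference":
        finals = {(p, q) for (p, q) in states if p in f1 and q not in f2}
    else:
        raise ValueError(f"unknown mode: {mode}")
    return states, sigma, delta, (s1, s2), finals

def accepts(m, w):
    """Run the DFA m on w and report whether it halts in a final state."""
    _, _, delta, state, finals = m
    for a in w:
        state = delta[state, a]
    return state in finals

# Hypothetical examples over {a, b}:
# M1 accepts strings with an even number of a's; M2 accepts strings ending in b.
SIGMA = {"a", "b"}
M1 = ({0, 1}, SIGMA, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
M2 = ({0, 1}, SIGMA, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
```

For instance, `product_dfa(M1, M2, "intersection")` accepts "aab" (two a's, ends in b) but rejects "ab".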

Theorem 4.1, p. 100


Proof:
For any x ∈ Σ* and any (p, q) ∈ Q:
δ*((p, q), x) = (δ1*(p, x), δ2*(q, x))
(by induction on the length of x).
A string x is accepted by M iff
δ*((q1, q2), x) ∈ F
By this formula, that is true if and only if
(δ1*(q1, x), δ2*(q2, x)) ∈ F

Theorem 4.1, p. 100


Proof (continued):
For Case 1, this is equivalent to saying that:
δ1*(q1, x) ∈ F1 or δ2*(q2, x) ∈ F2
which is equivalent to
x ∈ L1 ∪ L2
Cases 2 and 3 are similar.

Complement
Consider the special case in which L1 is all of
Σ*. Here, L1 − L2 is exactly ~L2 (the
complement of L2), so the regular languages are also closed under complement.
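When L1 = Σ*, its DFA can be taken to have a single state that is always final, so the difference rule F = {(p, q) | p ∈ F1 and q ∉ F2} reduces to "q ∉ F2". In other words, complementing a complete DFA just swaps its final and nonfinal states. A minimal sketch, assuming the tuple representation `(states, alphabet, delta, start, finals)`; the names and the sample DFA are hypothetical, and the DFA must be complete for this to be correct.

```python
def complement(m):
    """Complement of a complete DFA: same machine, final states swapped."""
    states, sigma, delta, start, finals = m
    return states, sigma, delta, start, set(states) - set(finals)

def accepts(m, w):
    """Run the DFA m on w and report whether it halts in a final state."""
    _, _, delta, state, finals = m
    for a in w:
        state = delta[state, a]
    return state in finals

# Hypothetical complete DFA over {a, b} accepting strings that end in b.
M2 = ({0, 1}, {"a", "b"}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
NOT_M2 = complement(M2)  # accepts λ and every string ending in a
```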

Reversal
Theorem 4.2: The family of regular languages is
closed under reversal.
Proof: If L is a regular language, construct an NFA
with a single final state that accepts it. Now
change the initial vertex into a final vertex, the
final vertex into the initial vertex, and reverse
the direction of all the edges. For every string
w accepted by the original NFA, the modified
version of the NFA accepts w^R, and it accepts no
other strings, so it accepts L^R.
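A sketch of the reversal construction, assuming an NFA is stored as a set of edges `(source, symbol, target)`, with `""` standing for a λ-edge, plus one initial and one final state. The representation, the helper names, and the sample NFA for a b* c are hypothetical.

```python
def lambda_closure(states, edges):
    """All states reachable from `states` using only λ-edges (symbol "")."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in edges:
            if src == p and a == "" and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def nfa_accepts(nfa, w):
    """Simulate the NFA on w by tracking the set of possible current states."""
    edges, start, final = nfa
    current = lambda_closure({start}, edges)
    for c in w:
        step = {dst for (src, a, dst) in edges if src in current and a == c}
        current = lambda_closure(step, edges)
    return final in current

def reverse_nfa(nfa):
    """Reverse every edge and swap the initial and final states."""
    edges, start, final = nfa
    return {(dst, a, src) for (src, a, dst) in edges}, final, start

# Hypothetical NFA with a single final state, accepting a b* c.
NFA = ({(0, "a", 1), (1, "b", 1), (1, "c", 2)}, 0, 2)
REV = reverse_nfa(NFA)  # accepts c b* a
```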

Homomorphism
Definition 4.1: A homomorphism is a
substitution in which each single letter is
replaced with a string. Formally, if Σ and Γ
are alphabets, then a function
h : Σ → Γ*
is a homomorphism. It extends to strings letter by letter:
h(a1a2...an) = h(a1)h(a2)...h(an).
If L is a language on Σ, then its homomorphic
image is:
h(L) = {h(w) : w ∈ L}
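Applying a homomorphism is a letter-by-letter substitution. The sketch below assumes h is given as a Python dict from letters of Σ to strings over Γ and that the language is a finite set of strings; the sample h (h(a) = ab, h(b) = bbc) and the function names are chosen for illustration.

```python
def apply_h(h, w):
    """h(a1 a2 ... an) = h(a1) h(a2) ... h(an); h(λ) = λ."""
    return "".join(h[c] for c in w)

def image(h, language):
    """Homomorphic image h(L) = {h(w) : w ∈ L} of a finite language L."""
    return {apply_h(h, w) for w in language}

# Hypothetical homomorphism from {a, b} to {a, b, c}.
H = {"a": "ab", "b": "bbc"}
```

With this h, h(aba) = ab · bbc · ab = abbbcab.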

Homomorphism
Theorem 4.3: If L is a regular language,
then its homomorphic image h(L) is also
regular.
Thus the family of regular languages is
closed under homomorphism.
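One common way to prove Theorem 4.3 is to take a regular expression r for L and substitute h(a) for every occurrence of each symbol a; the result is a regular expression for h(L). The sketch below does this textually in Python regex syntax, assuming every alphabet symbol is a single character that is not a regex metacharacter. `h_regex` and the sample expression are hypothetical.

```python
import re

def h_regex(h, r):
    """Replace each alphabet symbol a in regex r by a group matching h(a)."""
    parts = []
    for c in r:
        if c in h:
            parts.append("(?:" + re.escape(h[c]) + ")")
        else:
            parts.append(c)  # an operator or parenthesis: keep it unchanged
    return "".join(parts)

H = {"a": "ab", "b": "bbc"}  # hypothetical homomorphism
R = "(ab)*|b"                # hypothetical regular expression over {a, b}
HR = h_regex(H, R)           # a regular expression for h(L(R))
```

Here `HR` is `((?:ab)(?:bbc))*|(?:bbc)`, which describes (abbbc)* ∪ {bbc}, exactly the image of L(R) = (ab)* ∪ {b}.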

Right quotient
To form the right quotient of L1 with L2,
L1/L2, take all strings in L1 that have a
suffix belonging to L2 and remove the
suffix. Formally,
L1/L2 = {x : xy ∈ L1 for some y ∈ L2}
Example:
L1 = {ab, aab, aaab, aaaab}
L2 = {b}
L1/L2 = {a, aa, aaa, aaaa}
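For finite languages the definition can be checked directly. This hypothetical helper strips every suffix y ∈ L2 from every x ∈ L1 that ends in y; it reproduces the example above.

```python
def right_quotient(l1, l2):
    """L1/L2 = {x : xy ∈ L1 for some y ∈ L2}, for finite sets of strings."""
    return {w[: len(w) - len(y)] for w in l1 for y in l2 if w.endswith(y)}

L1 = {"ab", "aab", "aaab", "aaaab"}
L2 = {"b"}
```

Note that if λ ∈ L2, every string of L1 is also in L1/L2, since removing the empty suffix leaves it unchanged.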

Right quotient
Theorem 4.4: If L1 and L2 are regular
languages, then L1/L2 is also regular.
Thus the family of regular languages is
closed under right quotient with another
regular language.
Proof: By construction; see textbook, pp.
106-107.
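The textbook's construction is not reproduced here. The sketch below follows the usual idea, assuming M1 and M2 are complete DFAs over the same alphabet: keep M1's states and transitions, and make a state q final exactly when some y ∈ L2 leads from q to a final state of M1. That test is a reachability search in the product of M1 (started at q) and M2. The tuple representation `(states, alphabet, delta, start, finals)` and all names are assumptions.

```python
from collections import deque

def reaches_final_via(m1, q, m2):
    """Is there y in L(m2) with δ1*(q, y) in F1? Breadth-first search on the product."""
    _, sigma, d1, _, f1 = m1
    _, _, d2, s2, f2 = m2
    start = (q, s2)
    seen, queue = {start}, deque([start])
    while queue:
        p, s = queue.popleft()
        if p in f1 and s in f2:  # the same y ends in F1 from q and is accepted by M2
            return True
        for a in sigma:
            nxt = (d1[p, a], d2[s, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def quotient_dfa(m1, m2):
    """DFA for L1/L2: M1 with final states {q : some y ∈ L2 takes q into F1}."""
    states, sigma, d1, s1, _ = m1
    finals = {q for q in states if reaches_final_via(m1, q, m2)}
    return states, sigma, d1, s1, finals

def accepts(m, w):
    _, _, delta, state, finals = m
    for a in w:
        state = delta[state, a]
    return state in finals

# Hypothetical DFAs over {a, b}; state 2 is a dead state in both.
SIGMA = {"a", "b"}
M1 = ({0, 1, 2}, SIGMA, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2,
                         (2, "a"): 2, (2, "b"): 2}, 0, {1})          # a*b
M2 = ({0, 1, 2}, SIGMA, {(0, "a"): 2, (0, "b"): 1, (1, "a"): 2, (1, "b"): 2,
                         (2, "a"): 2, (2, "b"): 2}, 0, {1})          # {b}
Q = quotient_dfa(M1, M2)  # should accept (a*b)/{b} = a*
```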

The membership question
Given a language L and a string w, is w ∈ L?
A method for answering the membership
question is called a membership algorithm.
Is there a membership algorithm for regular
languages?

The membership question
Theorem 4.5: Given a standard representation
(i.e., a finite automaton, a regular expression,
or a regular grammar) of any regular language
L on Σ and a string w ∈ Σ*, there exists an algorithm
for determining whether w is in L.
Proof: Here is the algorithm:
1. If the standard representation of L is in the
form of a regular expression, or a regular
grammar, construct an equivalent FA.
2. Test w to see if it is accepted by the FA.
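Step 2 amounts to running a DFA on w, one transition per symbol, so it always halts after |w| steps. A minimal sketch, assuming step 1 has already produced a complete DFA in the tuple form `(states, alphabet, delta, start, finals)`; the function name and the sample DFA (binary numerals whose value is divisible by 3) are hypothetical.

```python
def member(dfa, w):
    """Simulate the DFA on w, one transition per symbol; w ∈ L iff it ends in F."""
    _, _, delta, state, finals = dfa
    for c in w:
        state = delta[state, c]
    return state in finals

# Hypothetical DFA over {0, 1}: state r means "the prefix read so far has value r mod 3".
# Reading bit c turns value v into 2v + c, hence the transition rule below.
DIV3 = ({0, 1, 2}, {"0", "1"},
        {(r, c): (2 * r + int(c)) % 3 for r in range(3) for c in "01"},
        0, {0})
```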

The finiteness question
Theorem 4.6: Given a standard representation
(i.e., a finite automaton, a regular expression,
or a regular grammar) of any regular language
L on Σ, there exists an algorithm for
determining whether L is empty, finite, or
infinite.

The finiteness question
Proof: Here is the algorithm:
1. If the standard representation of L is in the
form of a regular expression or a regular
grammar, construct an equivalent FA.
2. If there is a simple path from the initial vertex
to any final vertex, then the language is not
empty; if there is no such path, it is empty.
3. Find all the vertices that are the base of a
cycle. If any of these vertices is on a path
from the initial to a final vertex, the language
is infinite; otherwise, it is finite.
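Steps 2 and 3 can be sketched as graph searches on a DFA's transition graph: find the states reachable from the initial state, find the states that can reach a final state, and look for a cycle among the states that are both (these are exactly the states on some path from the initial to a final vertex). The tuple representation, the function name, and the three sample machines are assumptions for illustration.

```python
def classify(dfa):
    """Return "empty", "finite", or "infinite" for the language of a complete DFA."""
    states, sigma, delta, start, finals = dfa
    succ = {q: {delta[q, a] for a in sigma} for q in states}
    pred = {q: set() for q in states}
    for q in states:
        for r in succ[q]:
            pred[r].add(q)

    def search(sources, edges):
        """All vertices reachable from `sources` by following `edges`."""
        seen, stack = set(sources), list(sources)
        while stack:
            q = stack.pop()
            for r in edges[q]:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    reach = search({start}, succ)           # reachable from the initial state
    if not reach & set(finals):
        return "empty"                      # step 2: no path to a final state
    useful = reach & search(finals, pred)   # also able to reach a final state

    # Step 3: depth-first search for a cycle that stays inside `useful`.
    color = {q: "white" for q in useful}

    def finds_cycle(q):
        color[q] = "gray"                   # q is on the current DFS path
        for r in succ[q] & useful:
            if color[r] == "gray" or (color[r] == "white" and finds_cycle(r)):
                return True
        color[q] = "black"                  # fully explored, no cycle through here
        return False

    if any(color[q] == "white" and finds_cycle(q) for q in useful):
        return "infinite"
    return "finite"

# Hypothetical DFAs over {a}.
EMPTY = ({0, 1}, {"a"}, {(0, "a"): 0, (1, "a"): 1}, 0, {1})                   # ∅
FINITE = ({0, 1, 2}, {"a"}, {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}, 0, {1})  # {a}
INFINITE = ({0}, {"a"}, {(0, "a"): 0}, 0, {0})                                # a*
```

In `FINITE`, the dead state 2 lies on a cycle, but it cannot reach a final state, so it does not make the language infinite.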

The "does L1 = L2?" question
Theorem 4.7: Given standard representations of
two regular languages L1 and L2, there exists
an algorithm for determining whether or not
L1 = L2.

The "does L1 = L2?" question
Proof: Here is the algorithm:
1. Define a new language
L3 = (L1 ∩ ~L2) ∪ (~L1 ∩ L2)
2. L3 is regular (see previous closure proofs).
3. Therefore, we can find a DFA that accepts L3.
4. Use Theorem 4.6 to decide if L3 is empty.
5. L3 = ∅ iff L1 = L2 (exercise 8 in section 1.1 in
the Linz textbook).
6. So L1 = L2 if L3 = ∅; otherwise, L1 ≠ L2.
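The algorithm can be condensed into a single search: a DFA for L3 is the product of the DFAs for L1 and L2 whose final states are the pairs (p, q) where exactly one of p ∈ F1 and q ∈ F2 holds, and L3 is empty exactly when no such pair is reachable from the initial pair. This sketch assumes complete DFAs over the same alphabet in the tuple form `(states, alphabet, delta, start, finals)`; the names and sample machines are hypothetical.

```python
from collections import deque

def equivalent(m1, m2):
    """Decide L(m1) == L(m2) by checking that the product DFA for L3 accepts nothing."""
    _, sigma, d1, s1, f1 = m1
    _, _, d2, s2, f2 = m2
    start = (s1, s2)
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        # (p, q) is final in the DFA for L3 = (L1 ∩ ~L2) ∪ (~L1 ∩ L2)
        if (p in f1) != (q in f2):
            return False  # a reachable final state: L3 ≠ ∅, so L1 ≠ L2
        for a in sigma:
            nxt = (d1[p, a], d2[q, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True  # no final state of the L3 machine is reachable: L3 = ∅

# Hypothetical DFAs over {a}: two machines for "even number of a's", one for "odd".
EVEN_2 = ({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})
EVEN_4 = ({0, 1, 2, 3}, {"a"}, {(q, "a"): (q + 1) % 4 for q in range(4)}, 0, {0, 2})
ODD_2 = ({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {1})
```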

The pigeonhole principle
The pigeonhole principle states that if n + 1
items are placed into n pigeonholes, then at
least 1 pigeonhole must end up with more than
1 item in it.
In set notation:
if f : A → B,
|A| = n + 1, and
|B| = n,
then f cannot be one-to-one.

Not all formal languages are regular
An automaton that accepts the language L = {a^k b^k | k ≥ 0}
must count the number of a's in each string to make sure
there is an identical number of b's. There is no limit on how
high the automaton might need to count to accept a string
in this language. But an automaton with finite memory can
only count as high as the size of its memory.
This is an intuitive argument why this language is not
regular. It is not a proof, however. To prove that a language
is not regular, we use a mathematical result called the
pumping lemma for regular languages.

4.3: The Pumping Lemma


The Pumping Lemma is used to prove that a
language is not regular
How do we prove that a language L is regular?
Write a regular expression for it
Draw a Finite Automaton for it
Construct a regular grammar for it

Pumping Lemma
Theorem 4.8: Let L be a regular language. There exists a
positive integer m such that for any string w ∈ L with |w|
≥ m, w may be written as w = xyz, for some x, y, and z
satisfying the following:
|xy| ≤ m,
|y| ≥ 1,
and x y^i z ∈ L for every i ≥ 0

Pumping Lemma
In other words, every sufficiently long string in L can be
broken down into three parts in such a way that an
arbitrary number of repetitions of the middle part yields
another string in L. We say that the middle string is
pumped, hence the term pumping lemma.

Based on the idea of loops
Given:
M = (Q, Σ, δ, q0, A), where |Q| = n, and
any string x where |x| ≥ n, reading x takes M through a
sequence of at least n + 1 states.
Suppose x = a1 a2 a3 ... an y. Then the sequence of n + 1 states
(here qi names the i-th state visited; these need not be distinct)
q0 = δ*(q0, λ)
q1 = δ*(q0, a1)
q2 = δ*(q0, a1 a2)
...
qn = δ*(q0, a1 ... an)
must contain some state at least twice, by the pigeonhole
principle, since M has only n distinct states.
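The pigeonhole argument can be checked mechanically: record the run of a DFA on x and report the first position where a state reappears. The sample DFA below is hypothetical (b loops on q0, a leads to q1, and q1 loops on both letters), as are the function names and the tuple representation `(states, alphabet, delta, start, finals)`.

```python
from itertools import product

def run(dfa, x):
    """The states δ*(q0, λ), δ*(q0, a1), δ*(q0, a1 a2), ... visited while reading x."""
    _, _, delta, start, _ = dfa
    seq = [start]
    for c in x:
        seq.append(delta[seq[-1], c])
    return seq

def first_repeat(seq):
    """Positions (i, j), i < j, where a state first reappears, or None if none does."""
    first_seen = {}
    for j, q in enumerate(seq):
        if q in first_seen:
            return first_seen[q], j
        first_seen[q] = j
    return None

def check_pigeonhole(dfa, max_len):
    """For every x with n <= |x| <= max_len, confirm a repeat among the first n + 1 states."""
    states, sigma = dfa[0], sorted(dfa[1])
    n = len(states)
    for k in range(n, max_len + 1):
        for letters in product(sigma, repeat=k):
            _, j = first_repeat(run(dfa, "".join(letters)))
            if j > n:  # a first repeat after position n would contradict the pigeonhole principle
                return False
    return True

# Hypothetical two-state DFA: b loops on q0, a leads to q1, q1 loops on a and b.
M = ({"q0", "q1"}, {"a", "b"},
     {("q0", "b"): "q0", ("q0", "a"): "q1", ("q1", "a"): "q1", ("q1", "b"): "q1"},
     "q0", {"q1"})
```

For x = bba the run is q0 q0 q0 q1, and the first repeat is at positions 0 and 1.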

Example
[Figure: two-state FA; q0 has a loop labeled b, and an a-transition leads from q0 to q1]
x=a
|x| = 1
Sequence of states = q0 q1
n = Number of different states passed through = 2

Example
[Figure: the same two-state FA, with a loop labeled b on q0 and an a-transition from q0 to q1]
x = bba, so |x| = 3
Sequence of states = q0 q0 q0 q1
n = 2
Any string where |x| ≥ n must have repeated a state

Pumping
If a state is repeated one or more times, it
means that there must be a loop in the
transition diagram.
If there is a loop, then it can be pumped to
produce additional strings that belong to the
language

Example
If ba is in the language, and there are only 2
states in the automaton, then a, bba, bbba,
bbbba, etc. are also in the language.
[Figure: two-state FA with a loop labeled b on q0 and an a-transition from q0 to q1]

Example of a nonregular language
L = {0^i 1^i | i ≥ 0}
Is this regular?
No.
Why not?
Intuitively: We can't build a finite automaton
to recognize it.
Why not?
Why not?

Example of a nonregular language
L = {0^i 1^i | i ≥ 0}
Because the FA has no memory for past events
except its states. Each state can tell you how
you got to that state from the immediately
previous state (i.e., the last character you
processed), but, if there is a loop, it can't
remember the number of characters you
processed up to that point.

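Limits of a FA
Being in state q1 and having just read a 1
doesn't tell you anything about how many 1s
have already been processed. The FA simply
doesn't have the memory needed to retain
this information.
[Figure: FA with states q0 and q1; q0 has a loop labeled 0, a 1 leads from q0 to q1, and q1 has a loop labeled 1]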

Limits of a FA
Moreover, if you have a loop like this in an FA,
the FA must accept any number of 1s in the
loop. There is no way to require exactly as
many 1s as 0s: this FA accepts 000111,
but it must also accept 0111, 00001, etc.
[Figure: FA with states q0 and q1; q0 has a loop labeled 0, a 1 leads from q0 to q1, and q1 has a loop labeled 1]

Limits of an FA
Consequently, we can't build an FA that can tell
whether the number of 0s that it saw at the
beginning of the string exactly matches the
number of 1s at the end of the string.
But this is not a formal proof.

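Proof idea
If a DFA has n states, then any path of length n must
visit n + 1 states (counting repeats), and so it contains a cycle. (This is an
application of the pigeonhole principle.)
[Figure: a walk through the DFA, with parts labeled x and y, that contains a cycle]
The part of the string that goes around the cycle can be pumped
to produce other strings in the language.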

Proof idea again
If an infinite language is regular, it is accepted by
a DFA.
The DFA has some finite number of states, say,
m.
Because the language is infinite, some strings
must have length > m.
For a string of length > m accepted by the DFA, a
walk through the DFA must contain a cycle.
Repeating the cycle an arbitrary number of times
must yield another string accepted by the DFA.

Proof
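Suppose that qi = qi+p, where
0 ≤ i < i + p ≤ n
x = uvw
u = a1 a2 ... ai
v = ai+1 ai+2 ... ai+p
w = ai+p+1 ai+p+2 ... an y
y = the rest of x after its first n symbols
Remember that qi = qi+p, so δ*(qi, v) = qi: reading v returns
to the same state, and v can be pumped.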

Proof
Assume a DFA with states labeled q0, q1, ..., qn.
Now take a string w ∈ L with |w| ≥ m = n + 1.
To process w the machine goes through a sequence of states, say,
q0, qi, qj, ..., qf.
Since this sequence has exactly |w| + 1 entries, at least one state
must be repeated, and this repetition starts no later than the
nth move.
So the sequence of states must look like
q0, qi, qj, ..., qr, ..., qr, ..., qf
indicating there must be substrings x, y, z of w, with w = xyz, such that
δ*(q0, x) = qr
δ*(qr, y) = qr
δ*(qr, z) = qf
with |xy| ≤ n + 1 = m and |y| ≥ 1

Proof (cont.)
From this it immediately follows that
δ*(q0, xz) = qf
as well as
δ*(q0, x y^2 z) = qf,
δ*(q0, x y^3 z) = qf,
and so on, completing the proof of the
theorem.
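The proof is constructive, so the split can be computed: run the DFA on w, stop at the first repeated state qr, and read off x (up to the first visit to qr), y (the loop back to qr), and z (the rest). A sketch assuming a complete DFA in the tuple form `(states, alphabet, delta, start, finals)`; the names and the sample DFA (strings with an even number of a's) are hypothetical. Because the sample w is in L, every x y^i z should be accepted as well.

```python
def pump_split(dfa, w):
    """Split w = xyz at the first repeated state, so |xy| <= number of states and |y| >= 1."""
    states, _, delta, start, _ = dfa
    n = len(states)
    if len(w) < n:
        raise ValueError("the argument needs |w| >= number of states")
    seen = {start: 0}  # state -> position in the run where it first appeared
    state = start
    for j, c in enumerate(w, start=1):
        state = delta[state, c]
        if state in seen:  # q_i = q_j: the symbols w[i:j] form a loop back to q_i
            i = seen[state]
            return w[:i], w[i:j], w[j:]
        seen[state] = j
    raise AssertionError("unreachable when |w| >= n, by the pigeonhole principle")

def accepts(dfa, w):
    _, _, delta, state, finals = dfa
    for c in w:
        state = delta[state, c]
    return state in finals

# Hypothetical DFA over {a, b}: accepts strings with an even number of a's.
EVEN_A = ({0, 1}, {"a", "b"}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
```

For w = abab the run is 0, 1, 1, ..., so x = a, y = b, z = ab, and a b^i ab has two a's for every i.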

How to use the pumping lemma
The Pumping Lemma describes a property that is
possessed by every regular language. If we show
that a language does not possess this property,
we know that it is not regular.
The strategy is proof by contradiction. We
assume the language is regular, so it has the property described by
the pumping lemma, and then we show that this
leads to a contradiction. It follows that the
language is not regular.

Example
Example 4.7: The language L = {a^n b^n | n ≥ 0} is not
regular.
The proof is by contradiction:
If L is regular, it must be accepted by some DFA.
Let m be the number of states of the DFA and consider
some w ∈ L such that |w| ≥ m.
By the pumping lemma, we can split w into three pieces,
w = xyz, with |xy| ≤ m and |y| ≥ 1, such that for any i ≥ 0,
the string x y^i z is in L.
So let w = a^m b^m.
Because |xy| ≤ m, y must consist only of a's, however w is split.
But then x y^2 z will contain more a's than b's, so it is not in L.
This is a contradiction.
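A brute-force check cannot replace the proof (it tries only finitely many values of m and only i ≤ 3), but it makes the case analysis concrete: it enumerates every split of a^m b^m that the lemma allows (|xy| ≤ m, |y| ≥ 1) and confirms that each one leaves L for some i. The helper names are hypothetical.

```python
def in_L(s):
    """Membership in L = {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def legal_splits(w, m):
    """Every w = xyz with |xy| <= m and |y| >= 1."""
    for j in range(1, min(m, len(w)) + 1):  # j = |xy|
        for i in range(j):                  # i = |x|, so |y| = j - i >= 1
            yield w[:i], w[i:j], w[j:]

def pumpable(w, m):
    """True if some legal split keeps x y^i z in L for i = 0, 1, 2, 3."""
    return any(all(in_L(x + y * i + z) for i in range(4))
               for x, y, z in legal_splits(w, m))
```

For each m tried, `pumpable("a" * m + "b" * m, m)` is False: every legal y is a block of a's, so already i = 0 unbalances the string.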

Homework
Use the pumping lemma to show that the language of
palindromes L = {w | w = w^R, w ∈ {a,b}*} is not regular.

Homework
Use the pumping lemma (plus some closure
properties of regular languages) to show that the language
L = {w ∈ {a,b}* | w contains an equal number of a's and b's}
is not regular.

Homework
Use the pumping lemma to show that the language
L = {ww | w ∈ {a,b}*} is not regular.
