Regular Expressions and Context-Free Grammars: Regular expression formalism - equivalence with finite
automata - regular sets and closure properties - pumping lemma for regular languages - decision algorithms for
regular sets - applications. Context-Free Grammars - derivation trees, ambiguous and unambiguous grammars -
equivalence of regular grammars and finite automata - Chomsky Normal Form and Greibach Normal Form -
pumping lemma for context-free languages - applications.
REGULAR SETS
Regular languages can also be defined, starting from the empty set and a finite number of singleton sets, by the
operations of union, concatenation (composition), and Kleene closure. Specifically, consider any alphabet Σ. Then
a regular set over Σ is defined in the following way.
1. The empty set Ø, the set {ε} containing only the empty string, and the set {a} for each symbol a in Σ, are
regular sets.
2. If L1 and L2 are regular sets, then so are the union L1 ∪ L2, the concatenation L1L2, and the Kleene closure
L1*.
3. No other set is regular.
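The three operations in this definition can be tried out on small finite languages. The following is only a sketch: the function names are illustrative, and because L* is infinite in general, the closure is cut off at a chosen maximum length.

```python
def union(l1, l2):
    """Union of two languages given as Python sets of strings."""
    return l1 | l2

def concat(l1, l2):
    """Concatenation L1L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}

def kleene_upto(lang, max_len):
    """Strings of L* whose length is at most max_len (a finite cut of L*)."""
    result = {""}            # the empty string is always in L*
    frontier = {""}
    while frontier:
        # extend every newly found string by one more word of L
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

A, B = {"a"}, {"b"}
ab = concat(A, B)                 # the regular set {ab}
a_or_b = union(A, B)              # the regular set {a, b}
star = kleene_upto(a_or_b, 2)     # part of (a + b)*
print(sorted(star, key=lambda s: (len(s), s)))
```

The cut-off parameter max_len is an assumption of the sketch, not part of the definition.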
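The Pumping Lemma for Regular Sets
Applications:
o The pumping lemma is a powerful tool for proving that certain languages are not regular.
o It is also useful in the development of algorithms to answer certain questions concerning finite
automata, such as whether the language accepted by a given FA is finite or infinite.
Statement:
Let L be a regular set. Then there is a constant n such that if z is any word in L with |z| ≥ n, we may
write z = uvw in such a way that |uv| ≤ n, |v| ≥ 1, and for all i ≥ 0, uv^i w is in L.
Proof:
If a language L is regular, it is accepted by a DFA M = (Q, Σ, δ, q0, F) with some particular number of
states, say n. Consider an input of n or more symbols a1a2...am, m ≥ n, and for i = 1, 2, ..., m
let δ(q0, a1a2...ai) = qi.
It is not possible for each of the n+1 states q0, q1, ..., qn to be distinct, since there are only n different states. Thus
there are two integers j and k, 0 ≤ j < k ≤ n, such that qj = qk. The path labeled a1a2...am in the transition diagram of
M therefore contains a loop from qj back to qk = qj. Since j < k, the string aj+1...ak is of length at least 1, and since
k ≤ n, its length is no more than n.
[Figure: path in the transition diagram of M. q0 leads to qj = qk on a1...aj, a loop at qj = qk is labeled
aj+1...ak, and qj = qk leads to qm on ak+1...am.]
If qm is in F, that is, a1a2...am is in L(M), then a1...aj ak+1...am is also in L(M), since this path goes from q0 to
qm without going around the loop. Similarly, the loop may be traversed any number of times, so
a1...aj (aj+1...ak)^i ak+1...am is in L(M) for every i ≥ 0. Taking u = a1...aj, v = aj+1...ak and w = ak+1...am gives
the decomposition required by the lemma.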
J. Veerendeswari/IT/RGCET Page 1
Hence proved.
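The argument of the proof can be followed on a concrete automaton. The sketch below is illustrative only: the two-state DFA (even number of a's) and the function names are assumptions, not part of the notes. It finds the first repeated state on the path of z and splits z = uvw at that loop.

```python
def pump_split(delta, start, z):
    """Split z into (u, v, w) where v labels the first loop on the path of z."""
    seen = {start: 0}            # state -> number of symbols read when first reached
    state = start
    for pos, symbol in enumerate(z, start=1):
        state = delta[(state, symbol)]
        if state in seen:        # q_j == q_k with j < k
            j, k = seen[state], pos
            return z[:j], z[j:k], z[k:]
        seen[state] = pos
    return None                  # no state repeats: z is shorter than the state count

def accepts(delta, start, finals, word):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical DFA over {a, b} accepting words with an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}

u, v, w = pump_split(delta, "even", "abba")
print(u, v, w)
# every pumped word u v^i w stays in the language
print(all(accepts(delta, "even", {"even"}, u + v * i + w) for i in range(5)))
```

Because the first repetition is used, the loop is found within the first n symbols, which is what gives |uv| ≤ n in the statement.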
Property 1:
The regular sets are closed under union, concatenation, and Kleene closure.
Proof:
Closure under these three operations is immediate from the definition of regular sets; the operations
themselves are defined as follows. Let Σ be a finite set of symbols and let L, L1 and L2 be sets of strings from Σ*.
The concatenation of L1 and L2, denoted L1L2, is the set {xy | x is in L1 and y is in L2}. That is, the strings in L1L2
are formed by choosing a string in L1 and following it by a string in L2, in all possible combinations. Define
L^0 = {ε} and L^i = L L^(i-1) for i ≥ 1. The Kleene closure of L, denoted L*, is the set
L* = ∪_{i=0}^{∞} L^i
and the positive closure of L, denoted L+, is the set
L+ = ∪_{i=1}^{∞} L^i
That is, L* denotes words constructed by concatenating any number of words from L. L+ is the same, but the case
of zero words, whose concatenation is defined to be ε, is excluded. Note that L+ contains ε if and only if L does.
Property 2:
The class of regular sets is closed under complementation. That is, if L is a regular set and L ⊆ Σ*, then
Σ* - L is a regular set.
Proof:
Let L be L(M) for DFA M = (Q, Σ1, δ, q0, F) and let L ⊆ Σ*. First, we may assume Σ1 = Σ, for if there are
symbols in Σ1 not in Σ, we may delete all transitions of M on symbols not in Σ. The fact that L ⊆ Σ* assures us that
we shall not thereby change the language of M. If there are symbols in Σ not in Σ1, then none of these symbols
appear in words of L. We may therefore introduce a "dead state" d into M with δ(d, a) = d for all a in Σ and
δ(q, a) = d for all q in Q and a in Σ - Σ1.
Now, to accept Σ* - L, complement the final states of M. That is, let M' = (Q, Σ, δ, q0, Q - F). Then M'
accepts a word w if and only if δ(q0, w) is in Q - F, that is, w is in Σ* - L. Note that it is essential to the proof that M
is deterministic and without ε moves.
Property 3:
The regular sets are closed under intersection.
Proof:
L1 ∩ L2 = (L1' ∪ L2')', where the prime denotes complementation with respect to an alphabet including the
alphabets of L1 and L2. Closure under intersection then follows from closure under union and complementation.
It is worth noting that a direct construction of a DFA for the intersection of two regular sets exists. The
construction involves taking the Cartesian product of states, and we sketch the construction as follows:
Let M1 = (Q1, Σ, δ1, q1, F1) and M2 = (Q2, Σ, δ2, q2, F2) be two deterministic finite automata. Let
M = (Q1 × Q2, Σ, δ, [q1, q2], F1 × F2)
where for all p1 in Q1, p2 in Q2, and a in Σ,
δ([p1, p2], a) = [δ1(p1, a), δ2(p2, a)]
M reaches a final state exactly when both M1 and M2 do, so L(M) = L(M1) ∩ L(M2).
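The Cartesian product construction can be written down almost literally. This is a minimal sketch under the assumption that both DFAs are total (every state has a transition on every symbol) and are given as dictionaries keyed by (state, symbol); the two example automata are hypothetical.

```python
from itertools import product

def product_dfa(delta1, start1, finals1, delta2, start2, finals2, alphabet):
    """Build M with states Q1 x Q2 and finals F1 x F2."""
    states1 = {q for q, _ in delta1}
    states2 = {q for q, _ in delta2}
    # delta([p1, p2], a) = [delta1(p1, a), delta2(p2, a)]
    delta = {((p1, p2), a): (delta1[(p1, a)], delta2[(p2, a)])
             for p1, p2 in product(states1, states2)
             for a in alphabet}
    return delta, (start1, start2), set(product(finals1, finals2))

def accepts(delta, start, finals, word):
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

# M1: even number of a's.  M2: word ends in b.
even_a = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
ends_b = {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"}

delta, start, finals = product_dfa(even_a, "e", {"e"}, ends_b, "n", {"y"}, "ab")
print(accepts(delta, start, finals, "aab"))   # even a's and ends in b
print(accepts(delta, start, finals, "ab"))    # odd number of a's
```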
Property 4:
The class of regular sets is closed under substitution.
Theory Of Computation UNIT-II
Proof:
Let R ⊆ Σ* be a regular set and for each a in Σ let Ra ⊆ ∆* be a regular set. Let f be the substitution
defined by f(a) = Ra. Select regular expressions denoting R and each Ra. Replace each occurrence of the symbol a
in the regular expression for R by the regular expression for Ra. To prove that the resulting regular expression
denotes f(R), observe that the substitution of a union, product or closure is the union, product or closure of the
substitution. [Thus, for example, f(L1 ∪ L2) = f(L1) ∪ f(L2).] A simple induction on the number of operators in the
regular expression completes the proof.
A type of substitution that is of special interest is the homomorphism. A homomorphism h is a
substitution such that h(a) contains a single string for each a. We generally take h(a) to be the string itself, rather
than the set containing that string. It is useful to define the inverse homomorphic image of a language L to be
h^-1(L) = {x | h(x) is in L}
We also use, for a string w,
h^-1(w) = {x | h(x) = w}
Property 5:
The class of regular sets is closed under homomorphism and inverse homomorphism.
Proof:
Closure under homomorphism follows immediately from closure under substitution, since every
homomorphism is a substitution, in which h(a) has one member.
To show closure under inverse homomorphism, let M = (Q, Σ, δ, q0, F) be a DFA accepting L, and let h be
a homomorphism from ∆ to Σ*. We construct a DFA M' that accepts
h^-1(L) by reading symbol a in ∆ and simulating M on h(a). Formally, let M' = (Q, ∆, δ', q0, F) and define δ'(q, a)
for q in Q and a in ∆ to be δ(q, h(a)). Note that h(a) may be a long string, or ε, but δ is defined on all strings by
extension. It is easy to show by induction on |x| that δ'(q0, x) = δ(q0, h(x)). Therefore M' accepts x if and only if M
accepts h(x). That is,
L(M') = h^-1(L(M)).
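The construction of M' amounts to precomputing, for each state q and each symbol a of ∆, the state M reaches from q on the string h(a). A small sketch follows; the DFA (even number of a's), the homomorphism h and the function names are assumptions chosen for illustration.

```python
def run(delta, state, word):
    """Extended transition function: the state reached from `state` on `word`."""
    for symbol in word:
        state = delta[(state, symbol)]
    return state

def inverse_hom_dfa(delta, states, h):
    """delta'(q, a) = delta(q, h(a)) for every state q and every symbol a of the new alphabet."""
    return {(q, a): run(delta, q, h[a]) for q in states for a in h}

# Hypothetical M over {a, b}: accepts words with an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
# Hypothetical homomorphism from {0, 1, 2} to {a, b}*; note that h(2) is the empty string.
h = {"0": "aa", "1": "a", "2": ""}

delta_prime = inverse_hom_dfa(delta, {"even", "odd"}, h)

x = "0121"
image = "".join(h[c] for c in x)               # h(x)
print(run(delta_prime, "even", x), run(delta, "even", image))
```

Both runs end in the same state, which is the equality δ'(q0, x) = δ(q0, h(x)) used in the proof.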
Property 6:
The class of regular sets is closed under quotient with arbitrary sets.
Proof:
Let M = (Q, Σ, δ, q0, F) be a finite automaton accepting some regular set R, and let L be an arbitrary
language. The quotient R/L = {x | xy is in R for some y in L} is accepted by a finite automaton
M' = (Q, Σ, δ, q0, F'), which behaves like M except that the final states of M' are all states q of M such that
there exists y in L for which δ(q, y) is in F. Then δ(q0, x) is in F' if and only if there exists y in L such that
δ(q0, xy) is in F. Thus M' accepts R/L.
Rules to be followed while writing a CFG:
1. A single non terminal should be at the LHS.
2. A production should always be of the form LHS -> RHS, where the RHS may be any combination
of non terminal and terminal symbols.
3. The null derivation can be specified as NT -> ε.
4. One of the non terminals should be the start symbol.
Any language that can be generated by some context-free grammar is called a context-free language (CFL).
Derivation
Derivation from S means generation of a string ω from S.
Types of derivation
1. Left most derivation
The derivation in which the left most non terminal is always replaced at each step (choose the
leftmost non terminal in a sentential form).
Example : Given the grammar ( set of productions)
E -> E + E
E -> E * E
E -> id
Obtain the left most derivation for the string id*id
E => E * E => id * E => id * id
2. Right most derivation
The derivation in which the right most non terminal is always replaced at each step (choose the
rightmost non terminal in a sentential form).
Example : Given the grammar ( set of productions)
E -> E + E
E -> E * E
E -> id
Obtain the right most derivation for the string id*id
E => E * E => E * id => id * id
Example : Given the grammar ( set of productions)
E -> E + E
E -> E * E
E -> ( E )
E -> id
Obtain the left most derivation for the string id*id+id
Left Most Derivation Tree
->aaabbabbS
->aaabbabbbA
->aaabbabbba
Right Most Derivation:
S->aB
->aaBB
->aaBbS
->aaBbbA
->aaBbba
->aaaBBbba
->aaaBbbba
->aaabSbbba
->aaabbAbbba
->aaabbabbba
SIMPLIFICATION OF CFG
Grammars are not always optimized. That is, a grammar may contain some extra symbols (non
terminals). Having extra symbols unnecessarily increases the length of the grammar. Simplification of a grammar
means reduction of the grammar by removing useless symbols.
The properties of the reduced grammar:
1. Each variable (i.e. non terminal) and each terminal of G appears in the derivation of some word in L.
2. There should not be any production of the form X -> Y where X and Y are non terminals.
3. If ε is not in the language L then there need not be any production X -> ε.
Reduced grammar:
Removal of useless symbols, elimination of ε productions, removal of unit productions.
Example:1
T -> 00 - rule 2
Now, in the above CFG the non terminals are S, T, X. Starting from S,
S => 0T (using S -> 0T)
=> 000 (using T -> 00)
Thus we can reach a certain string by following these rules.
But if we use S -> X, there is no further rule defining X. That means there is no point in the rule
S -> X. Hence we can declare that X is a useless symbol and remove it. After removal of this
useless symbol the CFG becomes
G = (V, T, P, S) where V = {S, T},
T = {0, 1} and P = {S -> 0T | 1T | 0 | 1,
T -> 00}
S is the start symbol.
Example:2
Consider the CFG G = (V, T, P, S) where V = {S, A, B}, T = {0, 1},
P = {S -> A11B | 11A
S -> B | 11
A -> 0
B -> BB}
Solution
In the given CFG, if we try to derive any string, A gives the terminal symbol 0, but B does not
give any terminal string. By following the rules with B we simply get some number of B's and no
terminal string.
Hence we can declare B a useless symbol and remove the rules associated with it. After
removal of the useless symbol we get
S -> 11A | 11
A -> 0
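The reasoning of the two examples (drop symbols that never yield a terminal string, then drop symbols that cannot be reached from S) can be sketched as a small routine. The representation is an assumption of this sketch: non terminals are single uppercase letters, every other character is a terminal, and a grammar is a dictionary from non terminal to a list of right-hand-side strings.

```python
def remove_useless(prods, start):
    """Return the productions that only use generating and reachable symbols."""
    def usable(body, good):
        # every non terminal in the body must already be known to be generating
        return all(not s.isupper() or s in good for s in body)

    # Step 1: non terminals that derive some string of terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in prods.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    kept = {h: [b for b in bodies if usable(b, generating)]
            for h, bodies in prods.items() if h in generating}

    # Step 2: non terminals reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for body in kept.get(stack.pop(), []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {h: b for h, b in kept.items() if h in reachable}

example2 = {"S": ["A11B", "11A", "B", "11"], "A": ["0"], "B": ["BB"]}
reduced = remove_useless(example2, "S")
print(reduced)
```

For Example 2 the routine drops B (it never produces a terminal string) and every rule mentioning it, leaving S -> 11A | 11 and A -> 0.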
Elimination of ε production
If there is an ε production, we can remove it without changing the meaning of the grammar, except that the
empty string itself can no longer be generated. Thus ε productions are not necessary in a grammar whose language
does not contain ε.
Example:1
S -> 0S | 1S | ε
We remove the ε production, but we have to take care that the meaning of the CFG does not change. If we
apply S -> ε inside the other rules, we get S -> 0 from S -> 0S and S -> ε,
as well as S -> 1 from S -> 1S and S -> ε.
Hence we can rewrite the rules as
S -> 0S | 1S | 0 | 1
Thus the ε production is removed.
Example:2
S -> XYX
X -> 0X | ε
Y -> 1Y | ε
Now, while removing the ε productions, we delete the rules X -> ε and Y -> ε. To preserve the meaning of the
CFG, we place ε at the right hand side wherever X and Y have appeared.
Let us take
S -> XYX
If the first X at the right hand side is ε
Then S -> YX
Similarly if the last X in the RHS = ε
Then S -> XY
If Y = ε then
S -> XX
If Y and one X are ε then
S -> X
S -> Y when both X are replaced by ε
Together with the original rule this gives
S -> XYX | XY | YX | XX | X | Y
Now let us consider
X -> 0X
If we place ε at the right hand side for X then
X -> 0
X -> 0X | 0
Similarly,
Y -> 1Y | 1
We can rewrite the CFG with the ε productions removed as
S -> XYX | XY | YX | XX | X | Y
X -> 0X | 0
Y -> 1Y | 1
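The substitution of ε "wherever X and Y have appeared" can be mechanised: first find the nullable non terminals, then for each right-hand side emit every variant obtained by keeping or dropping each nullable symbol. This is a sketch under assumed conventions (single-character symbols, the empty string '' standing for ε); the function name is illustrative.

```python
from itertools import product

def eliminate_epsilon(prods):
    """prods: non terminal -> list of right-hand-side strings; '' stands for epsilon."""
    # Non terminals that can derive the empty string.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in prods.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in prods.items():
        new_bodies = set()
        for body in bodies:
            # each nullable symbol may be kept or replaced by the empty string
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:                  # never re-introduce an epsilon production
                    new_bodies.add(candidate)
        result[head] = sorted(new_bodies)
    return result

grammar = {"S": ["XYX"], "X": ["0X", ""], "Y": ["1Y", ""]}
print(eliminate_epsilon(grammar))
```

For Example 2 this yields S -> X | XX | XY | XYX | Y | YX, X -> 0 | 0X and Y -> 1 | 1Y; the original body XYX is retained alongside its shortened forms.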
Eg X -> Y, Y -> Z
Then X -> Y and Y -> Z are unit productions. To optimize the grammar we need to remove the unit productions.
If A -> B is a unit production and B -> x1x2x3x4...xn, we should add a rule A -> x1x2x3x4...xn.
Example 1:
S -> 0A | 1B | C
A -> 0S | 00
B -> 1 | A
C -> 01
SOLUTION:
Clearly S -> C is a unit production. But while removing S -> C we have to consider what C gives. So, we
can add a rule to S.
S -> 0A | 1B | 01
Similarly B -> A is also a unit production, so we can modify it as
B -> 1 | 0S | 00
Thus finally we can write the CFG without unit productions as
S -> 0A | 1B | 01
A -> 0S | 00
B -> 1 | 0S | 00
C -> 01
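The replacement of unit productions can be sketched as follows: for each non terminal, collect every non terminal reachable through unit productions alone, and copy over their non-unit right-hand sides. As before, the encoding (single uppercase letters for non terminals, strings for right-hand sides) and the function name are assumptions of the sketch.

```python
def eliminate_unit(prods):
    """prods: non terminal -> list of right-hand-side strings."""
    def is_unit(body):
        return len(body) == 1 and body.isupper()

    result = {}
    for head in prods:
        # non terminals reachable from head using unit productions only
        reach = [head]
        for var in reach:                      # the list grows while it is scanned
            for body in prods.get(var, []):
                if is_unit(body) and body not in reach:
                    reach.append(body)
        # head inherits every non-unit body of those non terminals
        bodies = []
        for var in reach:
            for body in prods.get(var, []):
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result

example1 = {"S": ["0A", "1B", "C"], "A": ["0S", "00"], "B": ["1", "A"], "C": ["01"]}
print(eliminate_unit(example1))
```

On Example 1 this gives S -> 0A | 1B | 01, A -> 0S | 00, B -> 1 | 0S | 00 and C -> 01.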
Example: 2
S -> A | 0C1
A -> B | 01 | 10
C -> ε | CD
SOLUTION:
S -> A and A -> B are unit productions.
C -> ε is a null production.
In C -> CD and A -> B, B and D are useless symbols.
Example 1:
Find the grammar in CNF equivalent to S -> aAbB, A -> aA | a, B -> bB | b
Step 1: As there are no unit productions or null productions, we need not carry out step 1. We proceed to
step 2.
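Step 2: Let G1 = (VN', {a, b}, P1, S) where P1 and VN' are constructed as follows.
Add productions A->a, B->b to P1.
Replace a by Ca and b by Cb in the remaining productions, adding Ca->a and Cb->b, and split
S->CaACbB into productions with two non terminals on the right:
S->CaD1
D1->AD2
D2->CbB
A->CaA
B->CbB
A->a
B->b
Ca->a
Cb->b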
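The two mechanical steps used here (introduce a variable Ca for each terminal a inside a long right-hand side, then split right-hand sides longer than two with new variables D1, D2, ...) can be sketched as below. The encoding is an assumption: right-hand sides are tuples of symbol names, the input has no ε or unit productions, and the names Ca, Cb, D1, D2 are assumed not to clash with existing variables.

```python
def to_cnf(prods, terminals):
    """prods: head -> list of bodies, each body a tuple of symbol names."""
    result = {}
    term_var = {}                 # terminal -> variable standing for it
    counter = 0

    def add(head, body):
        result.setdefault(head, []).append(body)

    for head, bodies in prods.items():
        for body in bodies:
            if len(body) == 1:
                add(head, body)   # A -> a is already in CNF
                continue
            # replace terminals by their variables (a becomes Ca)
            symbols = [term_var.setdefault(s, "C" + s) if s in terminals else s
                       for s in body]
            # split bodies with more than two variables
            current = head
            while len(symbols) > 2:
                counter += 1
                new_var = "D" + str(counter)
                add(current, (symbols[0], new_var))
                current, symbols = new_var, symbols[1:]
            add(current, tuple(symbols))
    for terminal, var in term_var.items():
        add(var, (terminal,))
    return result

grammar = {"S": [("a", "A", "b", "B")],
           "A": [("a", "A"), ("a",)],
           "B": [("b", "B"), ("b",)]}
for head, bodies in to_cnf(grammar, {"a", "b"}).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```

For S -> aAbB, A -> aA | a, B -> bB | b this produces S -> CaD1, D1 -> AD2, D2 -> CbB, A -> CaA | a, B -> CbB | b, Ca -> a, Cb -> b.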
Example 2:
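Find the CNF equivalent to the grammar S -> ~S | [S∩S] | p | q, where ~, [, ∩, ], p, q are terminals.
Step 2: Consider the production (1),
S->~S
Add a new production C1->~
The production S->~S becomes S->C1S
Similarly, for S->[S∩S] add C2->[, C3->∩ and C4->], so that it becomes S->C2SC3SC4, which is then split
into productions with two non terminals on the right. The resulting productions are
S->C1S
C1->~
C2->[
C3->∩
C4->]
S->C2D1
D1->SD2
D2->C3D3
D3->SC4
S->p
S->q
Hence derived.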
Solution:
Consider S -> aAD
The production is of the form NT -> T NT NT
Replace a by Ca, where Ca -> a. We get S -> CaAD
Let D1 -> AD. Then the production becomes S -> CaD1, which is in CNF.
Now consider A -> aB
This is of the form NT -> T NT
Since Ca -> a, A -> aB becomes A -> CaB, which is in CNF.
Let us now consider A -> bAB.
This production is of the form NT -> T NT NT
Replace b by Cb, where Cb -> b, and AB by D2. We get A -> CbD2.
The other two productions B -> b and D -> d are already in CNF.
Hence the productions in CNF are
S->CaD1
D1->AD
A->CaB
A->CbD2
D2->AB
B->b
D->d
Ca->a
Cb->b
Let us now consider the production A -> bS. Replacing b by Cb, the production is converted into CNF:
A -> CbS.
The last production to be converted to CNF is A -> aAAb.
Replacing a by Ca and b by Cb we get A -> CaAACb. Let D3 -> AD2 where
D2 -> ACb. So the production becomes A -> CaD3, which is in CNF.
D->SCb
D1->CbD
S->CaD1
D2->ACb
S->CaD2
A->CaD3
D3->AD2
D2->ACb
S->a
B->b
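D->d
Solution:
Consider S -> ASA.
The above production is of the form NT -> NT NT NT
Let D1 -> SA. Replacing SA, the production becomes S -> AD1, which is in CNF.
Now consider S -> bA. Replace b by Cb in order to convert the production to CNF.
S->bA becomes S -> CbA.
Consider the productions A -> B, A -> S and B -> C. They can also be written as A -> εB, A -> εS, B -> εC. If we
assume Ca -> ε then the productions become A -> CaB, A -> CaS, B -> CaC, which have the CNF shape. Note,
however, that Ca -> ε is itself not a CNF production; strictly, unit productions should be eliminated before the
conversion.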
S -> AD1
D1 -> SA
A -> CaB
S -> CbA
A -> CaS
B -> CaC
Solution:
Consider S -> 1A. This production is of the form NT -> T NT. In order to convert it to CNF, we
replace 1 by Ca, where Ca -> 1. Upon replacing we get S -> CaA, which is in CNF.
Now consider S -> 0B. This production is of the form NT -> T NT. In order to convert it to CNF, we
replace 0 by Cb, where Cb -> 0. Upon replacing we get S -> CbB, which is in CNF.
The next production to be considered is A -> 1AA, which is of the form NT -> T NT NT. Let us assume
that D1 -> AA. Also we know Ca -> 1. The production becomes A -> CaD1, which is in CNF.
The next production A -> 0S is of the form NT -> T NT. We know Cb -> 0. Upon replacing we get A -> CbS,
which is in CNF. The other production A -> 0 is already in CNF.
Now take into account the production B -> 0BB, which is of the form NT -> T NT NT. Let us assume that
D2 -> BB. Also we know Cb -> 0. The production becomes B -> CbD2, which is in CNF.
The next production B -> 1S is of the form NT -> T NT. We know Ca -> 1. Upon replacing we get B -> CaS,
which is in CNF. The other production B -> 1 is already in CNF.
Hence the resultant CNF productions are
S->CaA
S->CbB
A->CaD1
A->CbS
A->0
B->CbD2
B->CaS
B->1
D1->AA
D2->BB
Ca->1
Cb->0
A Context-Free Grammar G is in Greibach Normal Form if every production is of the form
A -> aα where α ∈ VN* and a ∈ Σ
(Non Terminal -> Terminal followed by any number of Non Terminals)
A -> a (Non Terminal -> Terminal)
For Example:
S -> aAB
A -> bC
B -> b
C -> c
are in GNF.
Greibach Normal Form
Algorithm:
begin
  for k := 1 to m do
  begin
    for j := 1 to k-1 do
      for each production of the form Ak->Ajα do
      begin
        for all productions Aj->β do
          add production Ak->βα;
        remove production Ak->Ajα
      end;
    for each production of the form Ak->Akα do
    begin
      add productions Bk->α and Bk->αBk;
      remove production Ak->Akα
    end;
    for each production Ak->β, where β does
    not begin with Ak do
      add production Ak->βBk
  end
end
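Reduction to Greibach Normal Form
Lemma (1):
Let G = (V, Σ, P, S) be a CFG. Let A->Bγ be an A-production in P. Let the B-productions be
B->β1|β2|...|βs
Define P1 = (P - {A->Bγ}) ∪ {A->βiγ | 1 ≤ i ≤ s}. Then G1 = (V, Σ, P1, S) is a CFG equivalent to G.
Lemma (2):
Let G = (V, Σ, P, S) be a CFG. Let the set of A-productions be
A->Aα1|...|Aαr|β1|...|βs
where no βi begins with A. Let z be a new variable and let G1 = (V ∪ {z}, Σ, P1, S), where P1 is defined as:
(i) The set of A-productions in P1 are
A->β1|β2|...|βs
A->β1z|β2z|...|βsz
(ii) The set of z-productions in P1 are
z->α1|α2|...|αr
z->α1z|α2z|...|αrz
(iii) The productions for the other variables are as in P.
Then G1 is a CFG equivalent to G.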
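Lemma 2 is a purely syntactic rewriting of the A-productions, so it can be illustrated directly. The following is a sketch with assumed conventions: right-hand sides are tuples of symbol names, and the name of the new variable is supplied by the caller.

```python
def remove_left_recursion(var, bodies, new_var):
    """Apply Lemma 2 to the productions of `var`.

    bodies: list of tuples (right-hand sides of var).
    Returns (new bodies of var, bodies of new_var).
    """
    # A -> A alpha_i : keep only the alpha_i part
    alphas = [b[1:] for b in bodies if b and b[0] == var]
    # A -> beta_i, where beta_i does not begin with A
    betas = [b for b in bodies if not b or b[0] != var]
    if not alphas:
        return list(bodies), []          # nothing to do
    var_bodies = betas + [b + (new_var,) for b in betas]
    new_bodies = alphas + [a + (new_var,) for a in alphas]
    return var_bodies, new_bodies

# A2 -> A2 A2 A1 | a A1 | b, with z2 as the new variable
a2 = [("A2", "A2", "A1"), ("a", "A1"), ("b",)]
a2_new, z2 = remove_left_recursion("A2", a2, "z2")
print(a2_new)
print(z2)
```

For A2 -> A2A2A1 | aA1 | b this gives A2 -> aA1 | b | aA1z2 | bz2 and z2 -> A2A1 | A2A1z2.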
Example:
Construct an equivalent GNF for the CFG S -> AA | a, A -> SS | b
Step 1: The given grammar is in CNF, so there is no need for null production or unit production
elimination. S and A are renamed as A1 and A2.
Hence the productions become A1 -> A2A2 | a, A2 -> A1A1 | b
Step 2: The A1 productions are in the required form. They are A1 -> A2A2 | a.
The production A2 -> b is also in the required form, whereas we have to apply lemma 1 to convert
the production A2 -> A1A1 to the required form. Applying lemma 1 we get,
A2 -> A2A2A1, A2 -> aA1
Step 5: The z2 productions to be modified are z2 -> A2A1, z2 -> A2A1z2.
Applying lemma 1 we get z2 -> aA1A1 | bA1 | aA1z2A1 | bz2A1
z2 -> aA1A1z2 | bA1z2 | aA1z2A1z2 | bz2A1z2
Hence the equivalent grammar is
A1 -> a | aA1A2 | bA2 | aA1z2A2 | bz2A2
A2 -> aA1 | b | aA1z2 | bz2
z2 -> aA1A1 | bA1 | aA1z2A1 | bz2A1
z2 -> aA1A1z2 | bA1z2 | aA1z2A1z2 | bz2A1z2
A3->az2                  A3->a
A2->bA3A2z2A1            A2->bA3A2A1
A2->az2A1                A2->aA1
A2->b
A1->bA3A2z2A1A3          A1->bA3A2A1A3
A1->az2A1A3              A1->aA1A3
A1->bA3
z2->A1A3A2               z2->A1A3A2z2
Step 3: The two z2 productions are converted to proper form, resulting in 10 more productions. That is, the
productions
z2->A1A3A2               z2->A1A3A2z2
are altered by substituting the right side of each of the five productions with A1 on the left for the first
occurrence of A1. Thus z2->A1A3A2 becomes
z2->bA3A2z2A1A3A3A2      z2->az2A1A3A3A2
z2->bA3A3A2              z2->bA3A2A1A3A3A2
z2->aA1A3A3A2
The other production for z2 is replaced similarly. The final set of productions is
A3->bA3A2z2              A3->bA3A2
A3->az2                  A3->a
A2->bA3A2z2A1            A2->bA3A2A1
A2->az2A1                A2->aA1
A2->b
A1->bA3A2z2A1A3          A1->bA3A2A1A3
A1->az2A1A3              A1->aA1A3
A1->bA3
Unambiguous Grammar: If every string generated by the grammar has exactly one leftmost derivation, or
equivalently exactly one derivation tree, then it is termed an unambiguous grammar.
Consider an example:
S->a/abSb/aAb
A->bS/aAAb, the word to be derived is W=abab
Derivation:
S->abSb
->abab
Hence derived.
Now consider
S->aAb
->abSb
->abab
which derives the same word using S->aAb and A->bS instead of S->abSb. Some further derivations in this
grammar:
S->aAb
->abSb
->ababSbb
->abababb
S->abSb
->ababSbb
->abababb
A->aAAb
->abSbSb
->ababab
A->bS
->ba
From the above derivations, the string abab has two different derivations with different derivation trees, one
beginning with S->abSb and one beginning with S->aAb. Hence this grammar is ambiguous; it cannot be termed
an unambiguous grammar.
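Whether a particular word has one or several derivation trees can be checked by brute force. The sketch below counts derivation trees for the grammar S -> a | abSb | aAb, A -> bS | aAAb; it assumes single-character symbols and a grammar without ε productions and without unit productions (otherwise the recursion need not terminate). The function name is illustrative.

```python
from functools import lru_cache

def count_trees(prods, start, w):
    """Number of derivation trees of the word w from the symbol start."""
    @lru_cache(maxsize=None)
    def sym(s, i, j):
        # number of ways the single symbol s derives w[i:j]
        if s not in prods:                       # s is a terminal
            return 1 if j == i + 1 and w[i] == s else 0
        return sum(seq(body, i, j) for body in prods[s])

    @lru_cache(maxsize=None)
    def seq(body, i, j):
        # number of ways the symbol string body derives w[i:j]
        if len(body) == 1:
            return sym(body[0], i, j)
        # the first symbol covers w[i:k]; each remaining symbol needs at least one character
        return sum(sym(body[0], i, k) * seq(body[1:], k, j)
                   for k in range(i + 1, j - len(body) + 2))

    return sym(start, 0, len(w))

grammar = {"S": ["a", "abSb", "aAb"], "A": ["bS", "aAAb"]}
print(count_trees(grammar, "S", "abab"))
```

For w = abab the count is 2: one tree starts with S -> abSb and the other with S -> aAb followed by A -> bS. A count above 1 for any word shows that the grammar is ambiguous.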
Begin
Class B = contains the states other than the final state, B = {a, b, c, e, f, g, h}
For class A
A    d
0    A
1    B
For class B
B    a  b  c  e  f  g  h
0    B  B  A  A  B  B  B
1    B  B  B  B  B  B  A
Class C = {c, e}
Class D = {h}
B    a  b  f  g
0    B  B  C  B
1    B  C  C  B
For Class B
B    a  g
0    E  E
1    B  B
Class E = {b, f}
Class B = {a, g}
For Class C
C    c  e
0    A  A
1    E  E
For Class D
D    h
0    B
1    A
For Class E
E    b  f
0    B  B
1    C  C
[Figure: transition diagram of the minimized DFA over the classes A, B, C, D and E, with edges labeled 0 and 1.]