© Lecture Note for 300121 Formal Languages and Automata, UWS (Yan Zhang)

Lecture Five: Context-Free Languages

1. Context-free grammars
2. Parsing and ambiguity
3. Context-free grammars and programming languages
4. Tutorial questions

Context-free grammars
Recall the definition of a regular grammar:
A grammar $G = (V, T, S, P)$ is said to be right-linear if all its
productions are of the form
$A \to xB$, or
$A \to x$,
where $A, B \in V$ and $x \in T^*$.
For example, the grammar $G$ with start symbol $A$ and productions
$A \to aB$, $B \to abA \mid b$,
is a right-linear regular grammar, and $L(G) = \{(aab)^n ab : n \ge 0\}$.
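
The claim about $L(G)$ can be checked on small cases. The following is a minimal sketch, assuming single-character symbols with upper-case variables; the function name `enumerate_right_linear` is illustrative and not part of the lecture. It enumerates the terminal strings derivable in a bounded number of steps and compares them with the pattern $(aab)^n ab$.

```python
import re

# Productions of the example grammar: A -> aB, B -> abA | b (start symbol A).
PRODUCTIONS = {"A": ["aB"], "B": ["abA", "b"]}

def enumerate_right_linear(max_steps):
    """Return the terminal strings derivable from A in at most max_steps steps."""
    results = []
    frontier = ["A"]  # right-linear: each sentential form ends in its only variable
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            prefix, variable = form[:-1], form[-1]
            for rhs in PRODUCTIONS[variable]:
                new_form = prefix + rhs
                if new_form[-1].isupper():
                    next_frontier.append(new_form)  # still contains a variable
                else:
                    results.append(new_form)        # terminal string
        frontier = next_frontier
    return results

strings = enumerate_right_linear(6)
# Every generated string should have the shape (aab)^n ab.
assert all(re.fullmatch(r"(aab)*ab", s) for s in strings)
```

This only tests finitely many strings, so it supports the stated form of $L(G)$ without proving it.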

Observation: Regular languages cannot represent some important
structures in programming languages.
For instance, $L = \{a^n b^n : n \ge 0\}$ is not a regular language, but it
represents nested structures that are essential in programming
languages.
Replacing $a$ with ( and $b$ with ), the strings (), (()), ((())), ... are in $L$.
We need a more powerful class of languages that can capture most
features of programming languages.

Definition. A grammar $G = (V, T, S, P)$ is called context-free if all
productions in $P$ have the form $A \to x$, where $A \in V$ and
$x \in (V \cup T)^*$.
A language $L$ is called context-free iff there is a context-free grammar
$G$ such that $L = L(G)$.
Consider the difference between a regular grammar and a
context-free grammar.

Example. $G = (\{S\}, \{a, b\}, S, P)$, where $P$ is:
$S \to aSa$,
$S \to bSb$,
$S \to \lambda$,
is a context-free grammar, and $L(G) = \{ww^R : w \in \{a, b\}^*\}$.
$w^R$ denotes the reverse of the string $w$.
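
The three productions translate directly into a recursive membership test. This is a small sketch, not part of the lecture; the function name `in_wwr` is made up for illustration. Each recursive call undoes one application of $S \to aSa$ or $S \to bSb$, and the empty string corresponds to $S \to \lambda$.

```python
def in_wwr(s):
    """Return True iff s has the form w + reverse(w) with w over {a, b}."""
    if s == "":
        return True                      # S -> lambda
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return in_wwr(s[1:-1])           # S -> aSa or S -> bSb
    return False

print(in_wwr("abba"), in_wwr("aba"))
```

Odd-length strings such as `aba` are rejected, because every production adds an even number of terminals.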

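Consider $G$: $S \to aSb \mid SS \mid \lambda$.
$L(G) = \{\lambda, ab, abab, aabbaabb, \ldots\}$
$= \{w \in \{a, b\}^* : n_a(w) = n_b(w) \text{ and } n_a(v) \ge n_b(v) \text{ for every prefix } v \text{ of } w\}$,
where $n_a(w)$ is the number of occurrences of $a$ in $w$.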
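
The set description can be tested with a single left-to-right scan. The sketch below (the name `balanced` is an assumption for illustration) keeps a running value of $n_a(v) - n_b(v)$ over the prefixes $v$ of $w$.

```python
def balanced(w):
    """Check n_a(w) == n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    balance = 0                 # n_a(v) - n_b(v) for the prefix read so far
    for symbol in w:
        if symbol == "a":
            balance += 1
        elif symbol == "b":
            balance -= 1
        else:
            return False        # not a string over {a, b}
        if balance < 0:
            return False        # some prefix has more b's than a's
    return balance == 0

print([w for w in ["", "ab", "abab", "aabbaabb", "ba", "aab"] if balanced(w)])
```

Reading $a$ as ( and $b$ as ), this is the usual check for properly nested parentheses.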

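Leftmost and rightmost derivations
Consider $G = (\{A, B, S\}, \{a, b\}, S, P)$, where $P$ contains:
$S \to AB$
$A \to aaA \mid \lambda$
$B \to Bb \mid \lambda$
Question: is $aab$ in $L(G)$?
Leftmost derivation:
$S \Rightarrow AB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aaBb \Rightarrow aab$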
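Rightmost derivation (the rightmost variable is replaced at each step):
$S \Rightarrow AB \Rightarrow ABb \Rightarrow Ab \Rightarrow aaAb \Rightarrow aab$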
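
The difference between the two derivation orders can be made mechanical. The following sketch, with the illustrative helper name `apply_steps`, applies a list of productions to the leftmost or the rightmost variable of the current sentential form and raises an error if a step names a different variable. Variables are assumed to be upper-case letters and $\lambda$ is written as the empty string.

```python
def apply_steps(start, steps, leftmost=True):
    """Apply (variable, rhs) steps to the leftmost or rightmost variable."""
    form = start
    history = [form]
    for variable, rhs in steps:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != variable:
            raise ValueError(f"{variable} is not at position {i} of {form}")
        form = form[:i] + rhs + form[i + 1:]
        history.append(form)
    return history

left = apply_steps(
    "S", [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")])
right = apply_steps(
    "S", [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")],
    leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs use the same five productions and end in `aab`; only the order of application differs.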

Derivation tree
Consider $G$:
$S \to aAB$
$A \to bBb$
$B \to A \mid \lambda$

[Figure 1: A derivation tree for $abbbb$.]
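
A derivation tree can be written as nested data and its yield read off the leaves from left to right. The sketch below encodes one possible derivation tree for $abbbb$ in this grammar; the particular tree is an assumption reconstructed from the productions, not a copy of the figure. Leaves labelled $\lambda$ are written as empty strings, and the names `tree_yield` and `is_valid` are illustrative.

```python
PRODUCTIONS = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}

# A node is (variable, children); a leaf is a terminal or "" for lambda.
tree = ("S", ["a",
              ("A", ["b",
                     ("B", [("A", ["b", ("B", [""]), "b"])]),
                     "b"]),
              ("B", [""])])

def tree_yield(node):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(tree_yield(child) for child in children)

def is_valid(node):
    """Check that every interior node applies a production of the grammar."""
    if isinstance(node, str):
        return True
    label, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    return rhs in PRODUCTIONS[label] and all(is_valid(c) for c in children)

print(tree_yield(tree), is_valid(tree))
```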

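The equivalence between sentential forms and derivation trees
Theorem. Let $G$ be a context-free grammar. A string $w \in L(G)$ iff
there exists a derivation tree that yields $w$.
Example.
$S \to aaB$
$A \to bBb \mid \lambda$
$B \to Aa$
Show that the string $aabbabba$ is not in $L(G)$.
$S \Rightarrow aaB \Rightarrow aaAa \Rightarrow aabBba \Rightarrow aabAaba \Rightarrow aabbBbaba \Rightarrow aabbAababa$
Every step is forced unless $A \to \lambda$ is applied, which ends the
derivation with $aaa$, $aababa$ or $aabbababa$; none of these is $aabbabba$.
Note that the suffix of $aabbAababa$ is $ababa$, which does not match the
suffix $babba$ of the given string $aabbabba$, so continuing the derivation
cannot produce it either. So the given string is not in $L(G)$.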
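
Because every sentential form of this grammar contains at most one variable, the argument can be replayed by a short loop. This is a sketch specific to this grammar (the function name `terminal_strings` is an assumption): it follows the forced steps and records the string obtained whenever $A \to \lambda$ is chosen, stopping once the forms contain more terminals than the bound.

```python
def terminal_strings(max_len):
    """Strings of length <= max_len generated by S -> aaB, B -> Aa, A -> bBb | lambda."""
    found = []
    form = "aaB"                                   # after the only first step S => aaB
    while sum(ch.islower() for ch in form) <= max_len:
        form = form.replace("B", "Aa")             # forced step: B -> Aa
        word = form.replace("A", "")               # choice 1: A -> lambda ends the derivation
        if len(word) <= max_len:
            found.append(word)
        form = form.replace("A", "bBb")            # choice 2: A -> bBb continues
    return found

target = "aabbabba"
candidates = terminal_strings(len(target))
print(candidates, target in candidates)
```

Terminals are never removed by a production, so strings longer than the target need not be considered.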

2 Parsing and ambiguity
Parsing and membership
Parsing is a procedure that finds a sequence of productions that
generates a given string $w \in L(G)$.
Exhaustive search parsing method: construct all possible derivations
from $S$ using the productions in $P$, and check whether one of them
generates $w$.
Example. $S \to SS \mid aSb \mid bSa \mid \lambda$. Let $w = aabb$.
Step 1. Look at all four possible one-step derivations:
(1) $S \Rightarrow SS$,
(2) $S \Rightarrow aSb$,
(3) $S \Rightarrow bSa$,
(4) $S \Rightarrow \lambda$.
(3) and (4) cannot lead to $aabb$, so we discard them.

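Step 2. From (1) above, we have
(1.1) $S \Rightarrow SS \Rightarrow SSS$,
(1.2) $S \Rightarrow SS \Rightarrow aSbS$,
(1.3) $S \Rightarrow SS \Rightarrow bSaS$,
(1.4) $S \Rightarrow SS \Rightarrow S$.
Step 3. From (2), we have
(2.1) $S \Rightarrow aSb \Rightarrow aSSb$,
(2.2) $S \Rightarrow aSb \Rightarrow aaSbb$,
(2.3) $S \Rightarrow aSb \Rightarrow abSab$,
(2.4) $S \Rightarrow aSb \Rightarrow ab$.
Step 4. We finally find (ignoring the other options):
(2.2.4) $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$.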
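
The steps above can be sketched as a breadth-first search over leftmost derivations. This is a minimal illustration, not the lecture's own code: the name `exhaustive_parse` and the `max_rounds` cut-off are assumptions, and the cut-off is needed because the search is not guaranteed to stop on its own for this grammar. Forms whose terminal prefix disagrees with $w$ are discarded, as in Step 1.

```python
from collections import deque

def exhaustive_parse(w, productions, start="S", max_rounds=6):
    """Search leftmost derivations of at most max_rounds steps for w."""
    queue = deque([[start]])
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == w:
            return derivation
        if len(derivation) > max_rounds:
            continue                      # step budget used up for this branch
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None or not w.startswith(form[:i]):
            continue                      # wrong terminal string or wrong prefix
        for rhs in productions[form[i]]:
            queue.append(derivation + [form[:i] + rhs + form[i + 1:]])
    return None

productions = {"S": ["SS", "aSb", "bSa", ""]}   # "" stands for lambda
result = exhaustive_parse("aabb", productions)
print(" => ".join(result))
```

A return value of `None` only means that no derivation was found within `max_rounds` steps; without further restrictions on the grammar it does not by itself show that the string is outside $L(G)$.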

Observation:
(1) Exhaustive search parsing is tedious and inefficient;
(2) For some $w \notin L(G)$, the parsing procedure may never
stop.
Example.
$S \to aaB$
$A \to bBb \mid \lambda$
$B \to Aa$
Show that the string $aabbabba$ is not in $L(G)$.
In a context-free grammar $G$, removing productions of the forms
$A \to \lambda$ and
$A \to B$
from $G$ makes parsing much easier.

Theorem. Suppose $G = (V, T, S, P)$ is a context-free grammar which
does not contain productions of the form $A \to \lambda$ or $A \to B$, where $A, B \in V$.
Then the exhaustive search parsing method can be made into an
algorithm which, for any $w \in \Sigma^+$ (with $\Sigma = T$, the terminal alphabet), either produces a parsing of $w$ or
tells us that no parsing is possible.
Theorem. For every context-free grammar there exists an algorithm
that parses any $w \in L(G)$ in a number of steps proportional to $|w|^3$,
i.e. $O(|w|^3)$.
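
The theorem does not name an algorithm. One well-known method with a cubic bound is the CYK algorithm, which assumes the grammar is in Chomsky normal form (every production is $A \to BC$ or $A \to a$). The sketch below is an illustration under that assumption; the example grammar for $\{a^n b^n : n \ge 1\}$ and the names `cyk`, `UNIT` and `BINARY` are hypothetical and not taken from the lecture.

```python
# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
#   S -> AB | AX,  X -> SB,  A -> a,  B -> b
UNIT = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

def cyk(w, unit=UNIT, binary=BINARY, start="S"):
    """Return True iff the CNF grammar derives the non-empty string w."""
    n = len(w)
    if n == 0:
        return False
    # table[i][j] holds the variables that derive the substring w[i..j].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(w):
        table[i][i] = {head for head, terminal in unit if terminal == symbol}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # start position
            j = i + length - 1
            for k in range(i, j):             # split into w[i..k] and w[k+1..j]
                for head, (left, right) in binary:
                    if left in table[i][k] and right in table[k + 1][j]:
                        table[i][j].add(head)
    return start in table[0][n - 1]

print(cyk("aabb"), cyk("aab"))
```

The three nested loops over `length`, `i` and `k` are where the $|w|^3$ factor comes from; the number of productions contributes a constant factor for a fixed grammar.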

Ambiguity in grammars and languages
Definition. A context-free grammar $G$ is ambiguous if for some
$w \in L(G)$ there are at least two distinct derivation trees.
Example. Consider $S \to aSb \mid SS \mid \lambda$. Is this grammar ambiguous?
Class discussion.

Example. A grammar for simple arithmetic expressions:
$G = (V, T, E, P)$, where $V = \{E, I\}$, $T = \{a, b, c, +, *, (, )\}$, and $P$:
$E \to I \mid E + E \mid E * E \mid (E)$,
$I \to a \mid b \mid c$.
Consider the different derivation trees for $a + b * c$. Class discussion.
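
As a starting point for the discussion, the number of derivation trees of an expression in this grammar can be counted by trying every operator as the root. This is a sketch written for this particular grammar (the name `count_trees` is an assumption); it treats each character as one token.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees of E =>* tokens for E -> I | E+E | E*E | (E)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        span = tokens[i:j]
        total = 0
        if len(span) == 1 and span[0] in "abc":
            total += 1                               # E -> I -> a | b | c
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)             # E -> (E)
        for k in range(i + 1, j - 1):
            if tokens[k] in "+*":                    # E -> E + E or E -> E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_trees("a+b*c"))
```

For `a+b*c` the two trees correspond to taking `+` or `*` as the operator at the root, that is, to the groupings a + (b * c) and (a + b) * c.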

Most of the time, we want to have an unambiguous grammar.
Sometimes, we can transform an ambiguous grammar into an
equivalent unambiguous grammar.
Consider:
$E \to T$,
$T \to F$,
$F \to I$,
$E \to E + T$,
$T \to T * F$,
$F \to (E)$,
$I \to a \mid b \mid c$.
Derivation tree for $a + b * c$.
Class discussion: the difference between this one and the previous
one.
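
One way to see the effect of the layered grammar is a small parser with one function per variable $E$, $T$ and $F$. The sketch below is an assumption-laden illustration: it replaces the left-recursive productions $E \to E + T$ and $T \to T * F$ by loops (a standard technique for hand-written parsers), and returns a nested tuple showing the grouping rather than the full derivation tree. The name `parse` is illustrative.

```python
def parse(tokens):
    """Parse a string over {a, b, c, +, *, (, )} and return its grouping."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def expr():                       # E -> T | E + T, written as a loop
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    def term():                       # T -> F | T * F, written as a loop
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():                     # F -> I | (E), with I -> a | b | c
        nonlocal pos
        token = peek()
        if token == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        if token is not None and token in "abc":
            pos += 1
            return token
        raise SyntaxError(f"unexpected token at position {pos}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse("a+b*c"))
```

Only one grouping is produced for `a+b*c`, with `*` below `+`, because a `*` can only be introduced at the $T$ level.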

Note that in general it is a hard question whether a grammar is
ambiguous or whether two grammars are equivalent (both problems are
undecidable for context-free grammars).
Definition. If $L$ is a context-free language for which there exists an
unambiguous grammar, then $L$ is called unambiguous. If every
grammar that generates $L$ is ambiguous, then $L$ is called inherently
ambiguous.

Example. $L = \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$, where $n, m \ge 0$.
We show that $L$ is an inherently ambiguous context-free language.
(1) $L$ is context-free. Why?
$L = L_1 \cup L_2$:
$L_1 = \{a^n b^n c^m\}$ and $L_2 = \{a^n b^m c^m\}$.
$L_1$ is generated by:
$S_1 \to S_1 c \mid A$,
$A \to aAb \mid \lambda$.
$L_2$ is generated by:
$S_2 \to aS_2 \mid B$,
$B \to bBc \mid \lambda$.
$L$ is generated by adding $S \to S_1 \mid S_2$.

(2) The grammar with $S \to S_1 \mid S_2$ is ambiguous.
Consider the string $a^n b^n c^n$: it has two different derivation trees from $S$, one starting with
$S \Rightarrow S_1$, and one starting with
$S \Rightarrow S_2$.
Consider the case $n = 2$. Class discussion.
(3) $L$ is inherently ambiguous because there is no way to reconcile
the conflicting requirements on strings between $L_1$ and $L_2$, i.e.
$a^n b^n c^m$ and $a^n b^m c^m$.
Class discussion.
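
For the case $n = 2$, the two derivations of $aabbcc$ can be written out in full using the productions given for $L_1$ and $L_2$:

```latex
\begin{align*}
S &\Rightarrow S_1 \Rightarrow S_1 c \Rightarrow S_1 cc \Rightarrow Acc
   \Rightarrow aAbcc \Rightarrow aaAbbcc \Rightarrow aabbcc, \\
S &\Rightarrow S_2 \Rightarrow aS_2 \Rightarrow aaS_2 \Rightarrow aaB
   \Rightarrow aabBc \Rightarrow aabbBcc \Rightarrow aabbcc.
\end{align*}
```

The first derivation pairs each $a$ with a $b$, the second pairs each $b$ with a $c$, so the two derivation trees are distinct.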

3 Context-free grammars and programming languages


Most important features in programming languages can be defined by
context-free grammars, e.g. arithmetic expressions, if-then
statement, while statement, etc.
Backus-Naur Form (BNF)
<expression>::=<term>|<expression>+<term>
<term>::=<factor>|<term>*<factor>
<if_statement>::=if<expression><then_clause><else_clause>

4 Tutorial questions
1. Exercises 3, 7(c), 8(h), and 19 on pages 133-134.
2. Exercises 1, 13 and 15 on pages 144-145.
Questions in the 3rd edition textbook
1. Exercises 3, 7(c), 8(h), and 18 on pages 133-135.
2. Exercises 1, 13 and 15 on page 145.
