
Practical Application:

Parsing

13. Parsing
Parsing is one of the major functions of the compiler of a programming language. Given a source code w, the parser examines w
to see whether it can be derived by the grammar of the programming language, and, if it can be, the parser constructs a parse tree
yielding w. Based on this parse tree, the compiler generates an object code. So, the parser acts as a membership test algorithm
designed for a given grammar G that, given a string w, tells us whether w is in L(G) or not, and, if it is, outputs a parse tree.
Notice that the parser tests the membership based on the given grammar. Recall that when we practiced constructing a PDA for a
given language, say {a^i b^i | i > 0}, we used the structural information of the language, such as that the a’s come first, then the b’s, and that the numbers
of a’s and b’s are the same. Consider the two CFG’s G1 and G2 shown below in figure (a), which generate the same language {a^i b^i | i >
0}. Figure (b) shows a PDA that recognizes this language. For an input string w, this PDA does not give any information about the
grammar or how the string w is derived. Hence, we need a different approach to construct a parser based on the grammar, not the
language.
There are several algorithms available for parsing that, given an arbitrary CFG G and a string x, tell whether x ∈ L(G) or not, and
if it is, output how x is derived. (The CYK algorithm is a typical example, which is shown in Appendix F.) However, these algorithms
are too slow to be practical. (For example, the CYK algorithm takes O(n^3) time for an input string of length n.) Thus, we restrict CFG’s
to a subclass for which we can build a fast practical parser. This chapter presents two parsing strategies applicable to such restricted
grammars together with several design examples. Finally, the chapter briefly introduces Lex (the lexical analyzer generator) and
YACC (the parser generator).
(a) G1: S → aSb | ab
    G2: S → aA    A → Sb | b
(b) [Figure: state diagram of a PDA recognizing {a^i b^i | i > 0}, with transition labels (a, Z0/aZ0), (a, a/aa), (b, a/ε), (b, a/ε), and (ε, Z0/Z0).]

13.1 Derivation
     Leftmost derivation, rightmost derivation
     Derivations and parse trees
13.2 LL(k) parsing strategy
13.3 Designing an LL(k) parser
     Examples
     Definition of LL(k) grammars
13.4 LR(k) parsing strategy
13.5 Designing LR(k) parsers
     Examples
     Definition of LR(k) grammars
13.6 Lex and YACC
Rumination
Exercises

Memories Break Time

Two very elderly ladies were enjoying the sunshine on a park bench in Miami. They had been meeting at the park
every sunny day, for over 12 years, chatting, and enjoying each other’s friendship. One day, the younger of the two
ladies turns to the other and says, “Please don't be angry with me dear, but I am embarrassed, after all these years...
What is your name ? I am trying to remember, but I just can't.”
The older friend stares at her, looking very distressed, says nothing for 2 full minutes, and finally with tearful
eyes, says, “How soon do you have to know ?”
- overheard by Rubin -

13.1 Derivation
The parser of a grammar generates a parse tree for a given input string. For
convenience, the tree is commonly presented in a sequence of rules applied in
one of the following two ways to derive the input string starting with S.

• Leftmost derivation: A rule is applied to the leftmost nonterminal symbol
  in the current sentential form.
• Rightmost derivation: A rule is applied to the rightmost nonterminal
  symbol in the current sentential form.

Example: G: S → ABC A → aa B → a C → cC | c
Leftmost derivation: S ⇒ ABC ⇒ aaBC ⇒ aaaC ⇒ aaacC ⇒ aaacc
Rightmost derivation: S ⇒ ABC ⇒ ABcC ⇒ ABcc ⇒ Aacc ⇒ aaacc
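
The two derivations above can be reproduced mechanically. The following is a minimal sketch, assuming every nonterminal is a single uppercase letter; the names RULES and derive are illustrative and do not come from the text.

```python
# Grammar of the example: S -> ABC, A -> aa, B -> a, C -> cC | c.
RULES = {
    "S": ["ABC"],
    "A": ["aa"],
    "B": ["a"],
    "C": ["cC", "c"],
}

def derive(choices, leftmost=True):
    """Apply rules, given as (nonterminal, alternative index) pairs, always to
    the leftmost (or rightmost) nonterminal of the current sentential form."""
    form = "S"
    steps = [form]
    for nonterminal, alt in choices:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != nonterminal:
            raise ValueError(f"expected a rule for {form[pos]}, got {nonterminal}")
        form = form[:pos] + RULES[nonterminal][alt] + form[pos + 1:]
        steps.append(form)
    return steps

left = derive([("S", 0), ("A", 0), ("B", 0), ("C", 0), ("C", 1)], leftmost=True)
right = derive([("S", 0), ("C", 0), ("C", 1), ("B", 0), ("A", 0)], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both calls end in the same string aaacc; only the order in which the nonterminals are rewritten differs.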


The derivation sequences, either leftmost or rightmost, are more
convenient to deal with than the tree data structure. However, to generate
object code there must be a simple way to translate the derivation
sequence into its unique parse tree. The following two observations show
how it can be done.
Observation 1: The sequence of rules applied according to the leftmost
derivation corresponds to the order of the nodes visited when you traverse
the parse tree top-down (i.e., depth first, in preorder), left-to-right. (See
the following example.)
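G: S → ABC   A → aa   B → bD   C → cC | c   D → bd

Leftmost derivation:
S ⇒(1) ABC ⇒(2) aaBC ⇒(3) aabDC ⇒(4) aabbdC ⇒(5) aabbdcC ⇒(6) aabbdcc

Parse tree (the number in front of a nonterminal node is the derivation step
that expands it; the nodes are visited in the order 1 2 3 4 5 6):

            1 S
      /      |       \
   2 A      3 B      5 C
   / \      /  \     /  \
  a   a    b   4 D  c   6 C
               / \       |
              b   d      c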
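
Observation 1 can be checked with a short sketch. It assumes the parse tree is encoded as nested (symbol, children) tuples with terminal leaves as plain strings; this encoding and the function names are illustrative, not part of the text.

```python
# Parse tree of aabbdcc for G: S -> ABC, A -> aa, B -> bD, C -> cC | c, D -> bd.
TREE = ("S", [
    ("A", ["a", "a"]),
    ("B", ["b", ("D", ["b", "d"])]),
    ("C", ["c", ("C", ["c"])]),
])

def rule_of(node):
    """Return the rule used at a nonterminal node, e.g. 'B -> bD'."""
    symbol, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    return f"{symbol} -> {rhs}"

def preorder_rules(node):
    """Visit nonterminal nodes parent first, then children left to right."""
    rules = [rule_of(node)]
    for child in node[1]:
        if not isinstance(child, str):
            rules.extend(preorder_rules(child))
    return rules

print(preorder_rules(TREE))
```

The printed list is the rule sequence of the leftmost derivation, steps 1 through 6.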


Observation 2: The reverse order of the rules applied according to the
rightmost derivation corresponds to the nodes visited when you
traverse the parse tree bottom-up, left-to-right. (See the following
example.)

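G: S → ABC   A → aa   B → bD   C → cC | c   D → bd

Rightmost derivation:
S ⇒(1) ABC ⇒(2) ABcC ⇒(3) ABcc ⇒(4) AbDcc ⇒(5) Abbdcc ⇒(6) aabbdcc

Parse tree (the number in front of a nonterminal node is the derivation step
that expands it; bottom-up, left-to-right, the nodes are visited in the
order 6 5 4 3 2 1):

            1 S
      /      |       \
   6 A      4 B      2 C
   / \      /  \     /  \
  a   a    b   5 D  c   3 C
               / \       |
              b   d      c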
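
Observation 2 admits the same kind of check. This is a minimal sketch under the same assumed tree encoding (nested (symbol, children) tuples, terminals as strings); the names are illustrative.

```python
# Parse tree of aabbdcc for G: S -> ABC, A -> aa, B -> bD, C -> cC | c, D -> bd.
TREE = ("S", [
    ("A", ["a", "a"]),
    ("B", ["b", ("D", ["b", "d"])]),
    ("C", ["c", ("C", ["c"])]),
])

def rule_of(node):
    """Return the rule used at a nonterminal node, e.g. 'B -> bD'."""
    symbol, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    return f"{symbol} -> {rhs}"

def postorder_rules(node):
    """Visit nonterminal nodes bottom-up: children left to right, then parent."""
    rules = []
    for child in node[1]:
        if not isinstance(child, str):
            rules.extend(postorder_rules(child))
    rules.append(rule_of(node))
    return rules

bottom_up = postorder_rules(TREE)
rightmost = list(reversed(bottom_up))
print(bottom_up)
print(rightmost)
```

Reversing the bottom-up visiting order gives the rule sequence of the rightmost derivation, steps 1 through 6.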

13.2 LL(k) parsing strategy
We know that parsers are different from PDA’s, because their membership test
should be based on the given CFG. Let’s try to build a conventional DPDA which,
with the grammar G stored in the finite control, tests whether the input string x is in
L(G), and, if it is, outputs a sequence of rules applied to derive x. We equip the
finite control with an output port for the output (see figure (b) below).
Our first strategy is to derive the same input string x in the stack. Because any
string must be derived starting with the start symbol, we let the machine push S into
the stack and enter state q1 for the next move. For convenience, we assign a rule
number to each rule as shown in figure (a).

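(a) G: (1) S → AB   (2) S → AC   (3) A → aaaaaaaaaa
       (4) B → bB   (5) B → b    (6) C → cC   (7) C → c
    L(G) = {a^10 x | x = b^i or x = c^i, i ≥ 1}
(b) [Figure: a machine with grammar G in its finite control and an output port, in state q1, with aaaaaaaaaabbb on the input tape and SZ0 on the stack; the rule to apply to S is still undecided (S → ?).]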


Now, we ask which rule, either (1) or (2), the machine should apply to S to
eventually derive the string on the input tape. If the input string is derived using rule
(1) (respectively, rule (2)) first, then there should be the symbol b (respectively, symbol c) after
the 10-th a. Unfortunately, our conventional DPDA model cannot look ahead on the
input before reading it. Recall that conventional DPDA’s decide whether they will
read the input or not depending on the stack top symbol. Only after reading the
input does the machine know what it is. Thus, without reading up to the 11-th input
symbol, there is no way for the machine in the figure to identify the symbol at that
position.


To overcome this problem, we equip the finite state control with a “telescope”
with which the machine can look some finite k cells ahead on the input tape. For
the grammar G, it is enough to have a telescope with the range of 11 cells.
(Notice that for the range to look ahead, we also include the cell under the head.)
With this new capability, the machine scans the input string ahead in the range,
and, based on what it sees ahead, it takes the next move. While looking ahead,
the input head does not move.
[Figure: (a) grammar G with rules (1) to (7); (b) the machine in state q1 with SZ0 on the stack looks ahead 11 cells on the input aaaaaaaaaabbb and selects S → AB.]
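
The "telescope" can be pictured as a function that returns the next k cells without moving the head. The sketch below is illustrative only: the names look_ahead and choose_rule_for_S, and the use of "_" for a blank cell past the end of the input, are assumptions and not part of the text.

```python
def look_ahead(tape, head, k, blank="_"):
    """Return the k cells starting at the head (the cell under the head is
    included), padded with blanks past the end of the input.
    The head position itself is not changed."""
    window = tape[head:head + k]
    return window + blank * (k - len(window))

def choose_rule_for_S(tape, head):
    """Pick S -> AB or S -> AC from an 11-cell look-ahead, or None to reject."""
    window = look_ahead(tape, head, 11)
    if window == "a" * 10 + "b":
        return "S -> AB"
    if window == "a" * 10 + "c":
        return "S -> AC"
    return None

print(choose_rule_for_S("aaaaaaaaaabbb", 0))
print(choose_rule_for_S("aaaaaaaaaaccc", 0))
```

With a range shorter than 11 the two windows above would be identical, which is why this grammar needs the full 11 cells to choose a rule for S.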


Now, the parser, looking ahead 11 cells, sees aaaaaaaaaab. Since there is b at
the end, the machine chooses rule (1) (i.e., S → AB), rewrites the stack top S with
AB and outputs rule number (1) as shown in figure (a).
Let q, α , and β be, respectively, the current state, the remaining input portion
to read, and the current stack contents. From now on, for convenience we shall use
the triple (q, α , β ), called the configuration, instead of drawing the cumbersome
diagram to show the parser.

[Figure: (a) applying rule (1) S → AB: in state q1, with input aaaaaaaaaabbb, the stack top S is replaced by AB and rule number (1) is output; (b) a configuration (q, α, β), where q is the state, α the remaining input, and β the stack contents.]


(1) (2) (3) (4) (5) (6) (7)

G: S → AB | AC A → aaaaaaaaaa B → bB | b C → cC | c

Looking ahead 11 cells in the current configuration (q1, aaaaaaaaaabbb, SZ0), the
parser applies rule (1) by rewriting the stack top S with the rule’s right side AB.
Consequently, the configuration changes as follows (rule (1) is chosen by looking
ahead 11 cells).

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0) ⇒(1) (q1, aaaaaaaaaabbb, ABZ0)

Now, with nonterminal symbol A at the stack top, the parser must find a rule to
apply. Since A has only one rule, i.e., rule (3), there is no choice. So, the parser
applies rule (3), consequently changing the configuration as follows.
(q1, aaaaaaaaaabbb, ABZ0) ⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0)


Notice that the terminal symbol appearing at the stack top after applying rule
(3) corresponds to the leftmost terminal symbol appearing in the leftmost
derivation. Thus, the terminal symbol appearing at the stack top must match the
next input symbol, if the input string is generated by the grammar.
So, the parser, seeing a terminal symbol at the stack top, reads the input and, if
they match, pops the stack top. The following sequence of configurations shows
how the parser successfully pops all the terminal symbols pushed on the stack
top by applying rule (3).
(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0) ⇒(1) (q1, aaaaaaaaaabbb, ABZ0)
⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0) ⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒ (q1, bbb, BZ0)

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0) ⇒(1) (q1, aaaaaaaaaabbb, ABZ0) ⇒(3)
(q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0) ⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒ (q1, bbb, BZ0) ⇒ ?

Now, the parser must choose one of B’s rules, either (4) or (5). If there remains
only one b on the input tape, rule (5) is the choice. Otherwise (i.e., if there is
more than one b), rule (4) must be applied. It follows that the parser needs to look
two cells ahead and proceeds as follows.

(q1, bbb, BZ0) ⇒(4) (q1, bbb, bBZ0) ⇒ (q1, bb, BZ0) ⇒(4) (q1, bb, bBZ0) ⇒
(q1, b, BZ0) ⇒(5) (q1, b, bZ0) ⇒ (q1, ε, Z0)


In summary, our parser works as follows, where the number on an arrow is the
rule applied at that step. The look-ahead contents are the first 11 cells of the
remaining input when S is at the stack top and the first 2 cells when B is at
the stack top.

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0) ⇒(1) (q1, aaaaaaaaaabbb, ABZ0) ⇒(3)
(q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0) ⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒
(q1, bbb, BZ0) ⇒(4) (q1, bbb, bBZ0) ⇒ (q1, bb, BZ0) ⇒(4) (q1, bb, bBZ0) ⇒
(q1, b, BZ0) ⇒(5) (q1, b, bZ0) ⇒ (q1, ε, Z0)

Notice that the last configuration above implies a successful parsing. It shows that
the sequence of rules applied on the stack generates exactly the same string as the one
originally written on the input tape. If the parser fails to reach the accepting
configuration, we say the input is rejected. In the above example, the sequence of rules
applied to the nonterminal symbols appearing at the stack top matches the sequence of
rules applied for the leftmost derivation of the input string shown below.
S ⇒(1) AB ⇒(3) aaaaaaaaaaB ⇒(4) aaaaaaaaaabB ⇒(4) aaaaaaaaaabbB ⇒(5) aaaaaaaaaabbb

For the other input strings ending with c’s, the parser can apply the same
strategy and successfully parse them by looking ahead at most 11 cells (see below).
This parser is called an LL(11) parser, named after the following properties of the
parser: the input is read Left-to-right, the order of rules applied matches the order
of the Leftmost derivation, and the longest look-ahead range is 11 cells. For a
grammar G, if we can build an LL(k) parser, for some constant k, we call G an
LL(k) grammar.

(q0, aaaaaaaaaaccc, Z0) ⇒ (q1, aaaaaaaaaaccc, SZ0) ⇒(2) (q1, aaaaaaaaaaccc, ACZ0) ⇒(3)
(q1, aaaaaaaaaaccc, aaaaaaaaaaCZ0) ⇒ . . . ⇒ (q1, accc, aCZ0) ⇒
(q1, ccc, CZ0) ⇒(6) (q1, ccc, cCZ0) ⇒ (q1, cc, CZ0) ⇒(6) (q1, cc, cCZ0) ⇒
(q1, c, CZ0) ⇒(7) (q1, c, cZ0) ⇒ (q1, ε, Z0)


Formally, an LL(k) parser is defined by a parse table with the nonterminal
symbols on the rows and the look-ahead contents on the columns. The table entries
are the right sides of the rules applied. Blank entries are for the rejecting cases.
The parse table below is constructed based on our observations, while analyzing
how the parser should work for the given input string. In the look-ahead contents, X
is a don’t-care symbol, B is a blank cell, and ε means no look-ahead is needed.

G: (1) S → AB   (2) S → AC   (3) A → aaaaaaaaaa
   (4) B → bB   (5) B → b    (6) C → cC   (7) C → c

Parse Table (rows: stack top; columns: contents of 11 look-ahead)

Stack top   a^10 b   a^10 c   bbX^9   bB^10   ccX^9   cB^10   ε
S           AB       AC
A                                                             a^10
B                             bB      b
C                                             cC      c
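
A table-driven version of this LL(11) parser can be sketched as follows. The encoding is an assumption made for illustration: "_" stands in for a blank cell in the look-ahead, "X" is the don't-care symbol, an empty pattern means no look-ahead is needed, and the names TABLE, matches, and parse are hypothetical.

```python
K = 11
BLANK = "_"  # stands in for a blank cell of the look-ahead

# (stack top, look-ahead pattern, rule number, right side)
TABLE = [
    ("S", "a" * 10 + "b", 1, "AB"),
    ("S", "a" * 10 + "c", 2, "AC"),
    ("A", "", 3, "a" * 10),
    ("B", "bb" + "X" * 9, 4, "bB"),
    ("B", "b" + BLANK * 10, 5, "b"),
    ("C", "cc" + "X" * 9, 6, "cC"),
    ("C", "c" + BLANK * 10, 7, "c"),
]

def matches(pattern, window):
    """A pattern cell matches if it is the don't-care X or equals the input cell."""
    return all(p == "X" or p == w for p, w in zip(pattern, window))

def parse(text):
    """Return the rule numbers in the order applied, or None if rejected."""
    stack = ["S"]            # the top of the stack is the end of the list
    pos, applied = 0, []
    while stack:
        top = stack.pop()
        if top.isupper():    # nonterminal: consult the table
            window = text[pos:pos + K].ljust(K, BLANK)
            for nonterminal, pattern, number, rhs in TABLE:
                if nonterminal == top and matches(pattern, window):
                    applied.append(number)
                    stack.extend(reversed(rhs))
                    break
            else:
                return None  # blank table entry: reject
        elif pos < len(text) and text[pos] == top:
            pos += 1         # terminal: read the input, match, and pop
        else:
            return None
    return applied if pos == len(text) else None

print(parse("aaaaaaaaaabbb"))
print(parse("aaaaaaaaaaccc"))
```

The returned rule numbers are the output of the parser, i.e., the rules of the leftmost derivation in the order applied.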

13.3 Designing an LL(k) Parser


Example 1. Design an LL(k) parser with minimum k for the following CFG.

(1) S → aSb   (2) S → aabbb

The language of this grammar is {a^i aabbb b^i | i ≥ 0}. Every string generated by
this grammar has aabbb at the center. As we did in the preceding section, let’s
examine how an LL(k) parser will parse the input aaaaabbbbbb with the shortest
possible look-ahead range of k.
To parse the input string successfully, the machine should apply the rules in the
order of (1), (1), (1), (2), which is the same order applied for the following
leftmost derivation.

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aaaSbbb ⇒(2) aaaaabbbbbb

Pushing the start symbol S into the stack in the initial configuration, the parser
gets ready to parse the string as shown below. With S in the stack top, it must
apply one of S’s two rules. To choose one of them, the parser needs to look ahead
for supporting information. What could be the shortest range to look ahead?

(1) S → aSb   (2) S → aabbb

(q0, aaaaabbbbbb, Z0) ⇒ (q1, aaaaabbbbbb, SZ0) ⇒ ?

If there is aabbb, rule (2) must be applied. So it appears k = 5. But the parser does
not have to see the whole string. If there is aaa ahead, the leftmost symbol a must
have been generated by rule (1). Otherwise, if there is aab ahead, the leftmost a
must have been generated by rule (2). It is enough to look ahead 3 cells (i.e., k = 3).

Thus, in the current configuration, since the contents of 3 look-ahead is aaa, the
parser applies rule (1), then reads the input to match and pop the terminal symbol a
from the stack top as follows.
(q1, aaaaabbbbbb, SZ0) ⇒(1) (q1, aaaaabbbbbb, aSbZ0) ⇒ (q1, aaaabbbbbb, SbZ0)
(rule (1) is chosen by looking ahead 3 cells)

Again, with S on the stack top, the parser looks ahead 3 cells, and seeing aaa,
applies rule (1), and repeats the same procedure until it looks ahead aab as follows.

(1) S → aSb   (2) S → aabbb

(q1, aaaaabbbbbb, SZ0) ⇒(1) (q1, aaaaabbbbbb, aSbZ0) ⇒
(q1, aaaabbbbbb, SbZ0) ⇒(1) (q1, aaaabbbbbb, aSbbZ0) ⇒
(q1, aaabbbbbb, SbbZ0) ⇒(1) (q1, aaabbbbbb, aSbbbZ0) ⇒
(q1, aabbbbbb, SbbbZ0) ⇒ ?

Now, the parser finally applies rule (2), and keeps reading and match-and-
popping until it enters the accepting configuration as follows.
(q1, aabbbbbb, SbbbZ0) ⇒(2) (q1, aabbbbbb, aabbbbbbZ0) ⇒ … ⇒ (q1, ε, Z0)


The parser applied the rules in the order, (1), (1), (1), (2), which is the same
order applied for the leftmost derivation of the input string aaaaabbbbbb.

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aaaSbbb ⇒(2) aaaaabbbbbb

Given an arbitrary input string, the parser, applying the same procedure, will end
up in the final accepting configuration if and only if the input belongs to the
language of the grammar. The parser needs to look ahead at least 3 cells. Hence, the
grammar is LL(3). The parse table is shown below.

(1) S → aSb   (2) S → aabbb

Parse Table (rows: stack top; columns: contents of 3 look-ahead)

Stack top   aaa    aab
S           aSb    aabbb
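
The LL(3) parser of Example 1 is small enough to write out directly. This is a minimal sketch, assuming the parse table is stored as a dictionary keyed by (stack top, 3-cell look-ahead); the names TABLE and parse are illustrative.

```python
# Parse table of Example 1: (stack top, look-ahead) -> (rule number, right side)
TABLE = {
    ("S", "aaa"): (1, "aSb"),
    ("S", "aab"): (2, "aabbb"),
}

def parse(text, k=3):
    """Return the rule numbers in the order applied, or None if rejected."""
    stack, pos, applied = ["S"], 0, []   # the top of the stack is the list end
    while stack:
        top = stack.pop()
        if top == "S":                   # the only nonterminal of the grammar
            entry = TABLE.get((top, text[pos:pos + k]))
            if entry is None:
                return None              # blank table entry: reject
            number, rhs = entry
            applied.append(number)
            stack.extend(reversed(rhs))
        elif pos < len(text) and text[pos] == top:
            pos += 1                     # terminal: read, match, and pop
        else:
            return None
    return applied if pos == len(text) else None

print(parse("aaaaabbbbbb"))
print(parse("aabbb"))
```

For the input aaaaabbbbbb the sketch applies the rules in the order (1), (1), (1), (2), the same order as the leftmost derivation.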


Example 2. Construct an LL(k) parser with minimum k for the following CFG.

(1) S → abA   (2) S → ε   (3) A → Saa   (4) A → b

As we did for Example 1, we pick a typical string, ababaaaa, derivable by the
grammar, and examine how it can be parsed according to the LL(k) parsing
strategy with minimum k. Then, based on the analysis, we will construct a parse
table. The order of the rules applied by the parser should be the same as the one
applied in the following leftmost derivation.

S ⇒(1) abA ⇒(3) abSaa ⇒(1) ababAaa ⇒(3) ababSaaaa ⇒(2) ababaaaa
Pushing the start symbol S on the top of the stack, the parser must choose
either rule (1) or (2) that will lead to finally deriving the input string. For the
choice, is there any useful information ahead on the input tape?

(q0, ababaaaa, Z0) ⇒ (q1, ababaaaa, SZ0) ⇒ ?


If the input is not empty, the parser, with S at the stack top, should choose rule (1) to
apply. Then, as shown below, for each terminal symbol appearing at the stack top, the
parser reads the next input symbol, and if they match, pops the stack top until A
appears. If the input tape were empty, the parser would simply pop S (i.e., rewrite S
with ε) and enter the accepting configuration. Now, with A at the stack top, the
parser should choose a rule between (3) and (4).

(q1, ababaaaa, SZ0) ⇒(1) (q1, ababaaaa, abAZ0) ⇒ . . ⇒ (q1, abaaaa, AZ0) ⇒ ?

If rule (4) was used to derive the input, the next input symbol ahead should be b, not
a. Seeing the symbol a ahead, the parser applies rule (3), and consequently, having S on
the stack top as before, it needs to look ahead to choose the next rule. Up to this point,
it appears that 1 look-ahead is an appropriate range.

(q1, abaaaa, AZ0) ⇒(3) (q1, abaaaa, SaaZ0) ⇒ ?


But this time, with S at the stack top it is uncertain which rule to apply. Looking
a ahead, the parser can apply either rule (1) or rule (2), because in either case, the
parser will successfully match the stack top a with the next input symbol a (see
below). To resolve this uncertainty, the parser needs one more cell to look ahead.
To solve this problem we could have the parser look down the stack. But we have
chosen to extend the range of look-ahead, a straightforward solution. Later in this
chapter, we will discuss parsers which are allowed to look down the stack some
finite depth.
(1) S → abA   (2) S → ε   (3) A → Saa   (4) A → b

(q1, abaaaa, SaaZ0) ⇒(1) (q1, abaaaa, abAaaZ0)
(q1, abaaaa, SaaZ0) ⇒(2) (q1, abaaaa, aaZ0)

Now, looking ab ahead in the extended range, which must be generated by rule
(1), the parser applies the rule and repeats the previous procedure as follows till S
appears at the stack top again.

(q1, abaaaa, SaaZ0) ⇒(1) (q1, abaaaa, abAaaZ0) ⇒ . . ⇒ (q1, aaaa, AaaZ0) ⇒(3)
(q1, aaaa, SaaaaZ0) ⇒ ?

Looking aa ahead with S on the stack top, the parser applies rule (2). Then, for
each a appearing at the stack top, it keeps reading the next input symbol, matching
them and popping the stack top, eventually entering the accepting configuration.
(q1, aaaa, SaaaaZ0) ⇒(2) (q1, aaaa, aaaaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)

In summary, the parser parses the input string ababaaaa as follows.

(q1, ababaaaa, SZ0) ⇒(1) (q1, ababaaaa, abAZ0) ⇒ . . ⇒ (q1, abaaaa, AZ0) ⇒(3)
(q1, abaaaa, SaaZ0) ⇒(1) (q1, abaaaa, abAaaZ0) ⇒ . . ⇒ (q1, aaaa, AaaZ0) ⇒(3)
(q1, aaaa, SaaaaZ0) ⇒(2) (q1, aaaa, aaaaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)


The input string that we have just examined is the one derived by applying rule
(2) last. For the other typical string ababbaa that can be derived by applying rule (4)
last, the LL(2) parser will parse it as follows.
(q1, ababbaa, SZ0) ⇒(1) (q1, ababbaa, abAZ0) ⇒ . . ⇒ (q1, abbaa, AZ0) ⇒(3)
(q1, abbaa, SaaZ0) ⇒(1) (q1, abbaa, abAaaZ0) ⇒ . . ⇒ (q1, baa, AaaZ0) ⇒(4)
(q1, baa, baaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)

From the analysis with the two parsing examples, we construct the following parse
table. (Notice that with A at the stack top, though 1 look-ahead is enough, the
entries are under the column of 2 look-ahead.)

(1) S → abA   (2) S → ε   (3) A → Saa   (4) A → b

Parse Table (rows: stack top; columns: contents of 2 look-ahead; B: blank, X: don’t care)

Stack top   ab     aa     bX     BB
S           abA    ε             ε
A           Saa    Saa    b
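
The LL(2) parse table of Example 2 can be exercised with a short sketch. The encoding is an assumption made for illustration: "_" stands in for the blank written B in the table, the don't-care column bX is tried when the exact look-ahead has no entry, and the names TABLE, lookup, and parse are hypothetical.

```python
BLANK = "_"  # stands in for the blank written B in the parse table

# (stack top, 2-cell look-ahead) -> (rule number, right side)
TABLE = {
    ("S", "ab"): (1, "abA"),
    ("S", "aa"): (2, ""),
    ("S", BLANK * 2): (2, ""),
    ("A", "ab"): (3, "Saa"),
    ("A", "aa"): (3, "Saa"),
    ("A", "bX"): (4, "b"),
}

def lookup(top, window):
    """Try the exact look-ahead first, then the don't-care column bX."""
    entry = TABLE.get((top, window))
    if entry is None and window[0] == "b":
        entry = TABLE.get((top, "bX"))
    return entry

def parse(text, k=2):
    """Return the rule numbers in the order applied, or None if rejected."""
    stack, pos, applied = ["S"], 0, []   # the top of the stack is the list end
    while stack:
        top = stack.pop()
        if top.isupper():                # nonterminal: consult the table
            entry = lookup(top, text[pos:pos + k].ljust(k, BLANK))
            if entry is None:
                return None              # blank table entry: reject
            number, rhs = entry
            applied.append(number)
            stack.extend(reversed(rhs))
        elif pos < len(text) and text[pos] == top:
            pos += 1                     # terminal: read, match, and pop
        else:
            return None
    return applied if pos == len(text) else None

print(parse("ababaaaa"))
print(parse("ababbaa"))
```

The two calls reproduce the rule orders of the two parsing examples: (1), (3), (1), (3), (2) for ababaaaa and (1), (3), (1), (4) for ababbaa.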

For a given input string, the basic strategy of LL(k) parsing is to generate the
same string on the top of the stack by rewriting every nonterminal symbol appearing
at the stack top with the right side of that nonterminal’s rule. If the nonterminal
symbol has more than one rule, the parser picks the right one based on the prefix of
the input string appearing on k cells looked ahead.
Whenever a terminal symbol appears on the stack top, the machine reads the next
input symbol and pops the stack top, if they match. The sequence of rules applied for
a successful parsing according to this strategy is the same as the one applied for the
leftmost derivation of the input string.
The class of CFG’s that can be parsed by LL(k) parsing strategy is limited. The
CFG G1 below is an example for which no LL(k) parser exists. However, G2, which
generates the same language, is an LL(k) grammar. We will shortly explain why.

G1: S → A | B   A → aA | 0   B → aB | 1
G2: S → aS | D   D → 0 | 1
L(G1) = L(G2) = {a^i t | i ≥ 0, t ∈ {0, 1}}


Consider the first working configuration illustrated below (with the start symbol S
on top of the stack). The parser should choose one of S’s two rules, S → A and S → B.
But it is impossible to choose a correct rule, because the right end symbol 0 (or
1), which is essential for the correct choice, can be located arbitrarily far to the right.
It is impossible for any LL(k) parser to identify it ahead with its “telescope” of a
finite range k.
But for the grammar G2, we can easily design an LL(1) parser.

[Figure: a parser for G1 in state q1 with SZ0 on the stack and input aaaa.....aa0; the rule to apply to S cannot be decided (S → ?).]

Definition of LL(k) grammars
We saw just now two CFG’s that generate the same language, but for the one,
no LL(k) parser exists, and for the other, we can design an LL(k) parser. So, we
may ask the following: What is the property of LL(k) grammars?
For a string x, let (k)x denote the prefix of length k of string x. If |x| < k, then
(k)x = x. For example, (2)ababaa = ab and (3)ab = ab.
Definition (LL(k) grammar). Let G = (VT, VN, P, S) be a CFG. Grammar G is an
LL(k) grammar if it satisfies the following condition. Consider two arbitrary
leftmost derivations of the following forms,
S ⇒* ωAα ⇒ ωβα ⇒* ωy
S ⇒* ωAα ⇒ ωγα ⇒* ωx
where α, β, γ ∈ (VT ∪ VN)*, ω, x, y ∈ VT*, and A ∈ VN.
If (k)x = (k)y, then it must be that β = γ. That is, in the above two derivations,
the same rule of A should have been applied if (k)x = (k)y.
The above condition implies that with a nonterminal symbol A on the stack
top, the parser can identify A’s rule to apply by looking ahead k cells. If G has
such property, we can build an LL(k) parser.
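
The condition can be illustrated on the grammar G1: S → A | B, A → aA | 0, B → aB | 1 from the preceding section. The sketch below, whose function names are illustrative, builds for any k two terminal strings that agree on their k-prefix but require different rules for S (here ω = ε and α = ε), so the condition fails for every k.

```python
def prefix(k, x):
    """(k)x: the prefix of length k of x, or x itself if |x| < k."""
    return x[:k]

def g1_counterexample(k):
    """Return (y, x) with (k)x = (k)y although the derivations
    S => A =>* y and S => B =>* x start with different rules of S."""
    y = "a" * k + "0"   # derived via beta = A
    x = "a" * k + "1"   # derived via gamma = B
    return y, x

for k in (1, 2, 5):
    y, x = g1_counterexample(k)
    print(k, y, x, prefix(k, x) == prefix(k, y))
```

Each printed line ends with True: the k cells of look-ahead are all a's in both cases, so they cannot tell S → A from S → B.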