S → cAd
A → ab | a
Another example
  S → AB
  A → aA | ε
  B → bB | b
• How to parse aabb?
• Top-down approach

  Expansion   Remaining input   Action
  S           aabb              Try S → AB
  AB          aabb              Try A → aA
  aAB         aabb              match a
  AB          abb               A → aA
  aAB         abb               match a
  AB          bb                A → ε
  B           bb                B → bB
  bB          bb                match b
  B           b                 B → b
  b           b                 match b

  Note that it is a leftmost derivation:
  S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaεB ⇒ aabB ⇒ aabb

Top down vs. bottom up parsing
• Bottom up approach
• Given the rules S → AB, A → aA | ε, B → bB | b

  Stack   Remaining input
          aabb
  a       abb
  aa      bb
  aaε     bb
  aaA     bb
  aA      bb
  A       bb
  Ab      b
  Abb
  AbB
  AB
  S

• If read backwards, the derivation is rightmost:
  S ⇒ AB ⇒ AbB ⇒ Abb ⇒ aAbb ⇒ aaAbb ⇒ aabb
• In both top-down and bottom-up approaches, the input is scanned from left to right.
Recursive descent parsing
• Each method corresponds to a non-terminal

// Assumed: pointer indexes the next input token; nextToken() returns it and advances pointer

// S → AB
static boolean checkS() {
    int savedPointer = pointer;
    if (checkA() && checkB())
        return true;
    pointer = savedPointer;
    return false;
}

// A → aA | ε
static boolean checkA() {
    int savedPointer = pointer;
    if (nextToken().equals('a') && checkA())
        return true;
    pointer = savedPointer;
    return true;    // ε alternative: succeed without consuming input
}

// B → bB | b
static boolean checkB() {
    int savedPointer = pointer;
    if (nextToken().equals('b') && checkB())
        return true;
    pointer = savedPointer;
    if (nextToken().equals('b'))
        return true;
    pointer = savedPointer;
    return false;
}
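To try the three methods above end to end, here is a minimal self-contained sketch. It rests on a few assumptions that are illustrative and not from the slides:

- The input is a String of one-character tokens.
- nextToken() returns '$' without advancing once the input is exhausted.
- A char comparison with == stands in for the slides' equals(...).
- The hypothetical wrapper parse() also requires the whole input to be consumed, which checkS() alone does not check.

```java
// Backtracking recursive descent recognizer for
//   S -> A B,   A -> a A | eps,   B -> b B | b
// Assumption: tokens are single characters, and nextToken() returns '$'
// (without advancing) once the input is exhausted.
class RecursiveDescent {
    static String input;
    static int pointer;

    static char nextToken() {
        if (pointer >= input.length()) return '$';
        return input.charAt(pointer++);
    }

    // S -> A B
    static boolean checkS() {
        int savedPointer = pointer;
        if (checkA() && checkB()) return true;
        pointer = savedPointer;
        return false;
    }

    // A -> a A | eps : the eps alternative succeeds without consuming input
    static boolean checkA() {
        int savedPointer = pointer;
        if (nextToken() == 'a' && checkA()) return true;
        pointer = savedPointer;
        return true;
    }

    // B -> b B | b : try b B first, then backtrack and try b alone
    static boolean checkB() {
        int savedPointer = pointer;
        if (nextToken() == 'b' && checkB()) return true;
        pointer = savedPointer;
        if (nextToken() == 'b') return true;
        pointer = savedPointer;
        return false;
    }

    // Hypothetical entry point: S must match and the whole input must be consumed.
    static boolean parse(String s) {
        input = s;
        pointer = 0;
        return checkS() && nextToken() == '$';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aabb", "b", "aab", "aa", "aaba"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

With this sketch, parse("aabb"), parse("b") and parse("aab") return true. parse("aa") returns false because there is no b, and parse("aaba") returns false because of the trailing a. Because checkB() tries bB before b, it first consumes the whole run of b's and only backtracks at its end.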
Recursive Descent parsing
• Recursive descent parsing is an easy, natural way to code top-down parsers.
  – All non-terminals become procedure calls that return true or false;
  – all terminals become matches against the input stream.
• Example:

/** assignment → ID = exp **/
static boolean assignment() throws Exception {
    int savePointer = pointer;
    if (nextToken().sym == Calc2Symbol.ID
            && nextToken().sym == Calc2Symbol.EQUAL
            && expr())
        return true;
    pointer = savePointer;
    return false;
}

Summary of recursive descent parser
• Simple enough that it can easily be constructed by hand;
• Not efficient (backtracking can re-read the same input many times);
• Limitations:

/** E → E + T | T **/
static boolean expr() throws Exception {
    int savePointer = pointer;
    if (expr()
            && nextToken().sym == Calc2Symbol.PLUS
            && term())
        return true;
    pointer = savePointer;
    if (term()) return true;
    pointer = savePointer;
    return false;
}

• A recursive descent parser can enter into an infinite loop. With a left-recursive rule such as E → E + T, expr() calls itself before consuming any input, so the recursion never terminates.
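One standard remedy is to remove the left recursion. E → E + T | T generates the same strings as E → T { + T }, that is, a T followed by zero or more "+ T", and a recursive descent parser can code that as a loop.

The sketch below makes two assumptions purely for illustration: a term is a single digit, and '$' marks the end of input. ExprParser and its methods are hypothetical names.

```java
// Recognizer for E -> T { + T }, the left-recursion-free form of E -> E + T | T.
// Assumption (illustration only): a term T is a single digit, and '$' marks end of input.
class ExprParser {
    static String input;
    static int pointer;

    // Look at the next character without consuming it.
    static char peek() {
        return pointer < input.length() ? input.charAt(pointer) : '$';
    }

    // T -> digit (hypothetical stand-in for the real term rule)
    static boolean term() {
        if (Character.isDigit(peek())) {
            pointer++;
            return true;
        }
        return false;
    }

    // E -> T { + T }: every loop iteration consumes a '+', so parsing always makes progress
    static boolean expr() {
        if (!term()) return false;
        while (peek() == '+') {
            pointer++;                      // consume '+'
            if (!term()) return false;
        }
        return true;
    }

    static boolean parse(String s) {
        input = s;
        pointer = 0;
        return expr() && peek() == '$';
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2+3"));  // true
        System.out.println(parse("1+"));     // false
    }
}
```

The loop form also makes it easy to keep + left-associative when a parser builds a tree, since each new term can be combined with the result accumulated so far.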
Predictive parsing
• A predictive parser is a special case of top-down parsing in which no backtracking is required;
• At each non-terminal node, the action to undertake is unambiguous (below, each alternative starts with a different keyword);
  STAT → if ...
       | while ...
       | for ...
• Not general enough to handle real programming languages;
• Grammar must be left factored;
  IFSTAT → if EXPR then STAT
         | if EXPR then STAT else STAT
  – A predictive parser must choose the correct version of the IFSTAT before seeing the entire input.
  – The solution is to factor out common terms:
    IFSTAT → if EXPR then STAT IFREST
    IFREST → else STAT | ε
• Consider another familiar example:
  E → T + E | T

Left factoring
• General method
  For a production A → αβ1 | αβ2 | ... | αβn | γ,
  where γ represents all alternatives that do not begin with α,
  it can be replaced by
    A → αB | γ
    B → β1 | β2 | ... | βn
• Example
  E → T + E | T
  can be transformed into:
  E → T E'
  E' → + E | ε
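After left factoring, a single token of lookahead is enough to pick a production. The sketch below parses the factored IFSTAT rules without backtracking.

It assumes, for illustration only, filler rules STAT → IFSTAT | other and EXPR → e, single-word tokens, and a hypothetical class name IfParser.

Because else can also follow IFREST in a nested if, the grammar is not strictly LL(1). The sketch resolves this conflict the usual way: it takes else whenever else is the next token, which binds it to the nearest if.

```java
import java.util.Arrays;
import java.util.List;

// Predictive (non-backtracking) parser for the left-factored rules
//   IFSTAT -> if EXPR then STAT IFREST
//   IFREST -> else STAT | eps
// Assumed filler rules for illustration: STAT -> IFSTAT | other, EXPR -> e
class IfParser {
    private final List<String> tokens;
    private int pos = 0;

    IfParser(List<String> tokens) { this.tokens = tokens; }

    // One token of lookahead; "$" marks end of input.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void expect(String t) {
        if (!peek().equals(t))
            throw new IllegalStateException("expected " + t + " but saw " + peek());
        pos++;
    }

    // STAT -> IFSTAT | other : the lookahead alone selects the alternative
    void stat() {
        if (peek().equals("if")) ifstat();
        else expect("other");
    }

    // IFSTAT -> if EXPR then STAT IFREST
    void ifstat() {
        expect("if");
        expect("e");        // EXPR -> e
        expect("then");
        stat();
        ifrest();
    }

    // IFREST -> else STAT | eps : take else when it is next, otherwise derive eps
    void ifrest() {
        if (peek().equals("else")) {
            expect("else");
            stat();
        }
    }

    static boolean accepts(String... toks) {
        IfParser p = new IfParser(Arrays.asList(toks));
        try {
            p.stat();
            return p.peek().equals("$");
        } catch (IllegalStateException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("if", "e", "then", "other"));
        System.out.println(accepts("if", "e", "then", "other", "else", "other"));
        System.out.println(accepts("if", "e", "then", "else"));
    }
}
```

The parser reads if e then other only once and then decides on IFREST by peeking at one token, instead of trying one IFSTAT alternative and backtracking into the other.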
• The recursive descent parser is not efficient because of the backtracking and recursive calls.
• A predictive parser does not require backtracking.
  – It is able to choose the production to apply solely on the basis of the next input symbol and the current non-terminal being processed.
• To enable this, the grammar must be LL(1).
  – The first "L" means we scan the input from left to right;
  – the second "L" means we create a leftmost derivation;
  – the 1 means one input symbol of lookahead.

• An LL(1) grammar has no left-recursive productions and has been left factored.
  – However, a left-factored grammar with no left recursion may still not be LL(1).
• There are grammars that cannot be modified to become LL(1).
• In such cases, another parsing technique must be employed, or special rules must be embedded into the predictive parser.
First(): Definition
• The First set of a sequence of symbols α, written as First(α), is the set of terminals which start the sequences of symbols derivable from α.
  – If α =>* aβ, then a is in First(α).
  – If α =>* ε, then ε is in First(α).

First() algorithm
• First(): compute the set of terminals that can begin a rule
  1. If a is a terminal, then First(a) is {a}.
  2. If A is a non-terminal and A → aα is a production, then add a to First(A).
     If A → ε is a production, add ε to First(A).
  3. If A → α1 α2 ... αm is a production, add First(α1) - ε to First(A).
     If α1 can derive ε, add First(α2) - ε to First(A).
     If both α1 and α2 derive ε, add First(α3) - ε to First(A), and so on.
     If α1 α2 ... αm =>* ε, add ε to First(A).
• Apply the rules repeatedly until no First set changes.
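The rules above depend on each other, because First(A) may need the First set of another non-terminal. First sets are therefore usually computed by applying rules 1 to 3 repeatedly until nothing changes.

A minimal Java sketch of that fixed-point loop follows. It uses the grammar S → Aa | b, A → b d Z | eZ, Z → cZ | a d Z | ε from the parse table example. The map-based grammar encoding, the "eps" spelling of ε, and the names FirstSets and first are assumptions.

```java
import java.util.*;

// Fixed-point computation of First(A) for every non-terminal A (rules 1-3).
// Assumed encoding: a grammar maps each non-terminal to its right-hand sides;
// any symbol that is not a key is a terminal, and an empty right-hand side is epsilon.
class FirstSets {
    static final String EPS = "eps";   // stands for the empty string epsilon

    static Map<String, Set<String>> first(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : g.keySet()) first.put(nt, new HashSet<>());
        boolean changed = true;
        while (changed) {                            // repeat until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                Set<String> target = first.get(e.getKey());
                for (List<String> rhs : e.getValue()) {
                    boolean allNullable = true;      // alpha1 ... alphai all derive eps so far
                    for (String sym : rhs) {
                        if (!g.containsKey(sym)) {   // terminal: First(a) = {a}
                            changed |= target.add(sym);
                            allNullable = false;
                            break;
                        }
                        Set<String> f = new HashSet<>(first.get(sym));
                        boolean nullable = f.remove(EPS);
                        changed |= target.addAll(f); // add First(alphai) - eps
                        if (!nullable) {
                            allNullable = false;
                            break;
                        }
                    }
                    // A -> eps, or alpha1 ... alpham =>* eps
                    if (allNullable) changed |= target.add(EPS);
                }
            }
        }
        return first;
    }

    // S -> A a | b,  A -> b d Z | e Z,  Z -> c Z | a d Z | eps
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("S", List.of(List.of("A", "a"), List.of("b")));
        g.put("A", List.of(List.of("b", "d", "Z"), List.of("e", "Z")));
        g.put("Z", List.of(List.of("c", "Z"), List.of("a", "d", "Z"), List.<String>of()));
        return g;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> f = first(exampleGrammar());
        for (String nt : List.of("S", "A", "Z"))
            System.out.println("First(" + nt + ") = " + f.get(nt));
    }
}
```

For this grammar the loop should produce First(S) = {b, e}, First(A) = {b, e} and First(Z) = {a, c, ε}, which matches the First column of the example table. First(S) only gains e on the second pass, after First(A) has been filled in, which is why the loop is needed.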
The use of First() and Follow()
• If we want to expand S in this grammar:
  S → A ... | B ...
  A → a ...
  B → b ... | a ...
• If the next input character is b, should we rewrite S with A ... or B ...?
  – Since First(B) = {a, b} and First(A) = {a}, we know to rewrite S with B;
  – First and Follow give us information about the next characters expected in the grammar.
• If the next input character is a, how do we rewrite S?
  – a is in both First(A) and First(B);
  – The grammar is not suitable for predictive parsing.

LL(1) parse table construction
• Construct a parse table (PT) with one axis the set of terminals, and the other the set of non-terminals.
• For all productions of the form A → α:
  – Add A → α to entry PT[A,b] for each token b in First(α);
  – add A → α to entry PT[A,b] for each token b in Follow(A) if First(α) contains ε;
  – add A → α to entry PT[A,$] if First(α) contains ε and Follow(A) contains $.
• Example:
  S → Aa | b
  A → b d Z | eZ
  Z → cZ | a d Z | ε

  Production   First(α)   Follow(A)
  S → Aa       b, e       $
  S → b        b
  A → bdZ      b          a
  A → eZ       e
  Z → cZ       c          a
  Z → adZ      a
  Z → ε        ε

        a                b                c        d   e        $
  S                      S → Aa, S → b                 S → Aa
  A                      A → bdZ                       A → eZ
  Z     Z → ε, Z → adZ                    Z → cZ
Construct the parsing table
• If A → α, in which column do we place A → α in row A?
  – In the column of t, if t can start a string derived from α, i.e., t in First(α).
  – What if α is empty? Put A → α in the column of t if t can follow an A, i.e., t in Follow(A).
• For the example grammar S → Aa | b, A → b d Z | eZ, Z → cZ | a d Z | ε:
  – Note that it is not LL(1), because more than one rule can be selected in entries PT[S,b] and PT[Z,a].
  – The corresponding (leftmost) derivation of the input bda is S ⇒ Aa ⇒ bdZa ⇒ bdεa = bda.
  – Note when the Z → ε rule is used: Z is followed by a, and a is in Follow(Z).
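Once First and Follow are known, the placement rules can be applied mechanically. The sketch below:

- takes the First(α) and Follow(A) values from the example table as given (computing Follow is not shown);
- fills PT;
- lists every entry that receives more than one production.

The class and method names, and the "eps" spelling of ε, are hypothetical.

```java
import java.util.*;

// Builds an LL(1) parse table from given First(alpha) and Follow(A) sets and
// reports multiply-defined entries. "eps" stands for the empty string.
class TableBuilder {
    static final class Production {
        final String lhs, rhs;
        final Set<String> firstOfRhs;   // First(alpha) for A -> alpha
        Production(String lhs, String rhs, Set<String> firstOfRhs) {
            this.lhs = lhs; this.rhs = rhs; this.firstOfRhs = firstOfRhs;
        }
    }

    // pt.get(A).get(t) = productions placed in entry PT[A,t]
    static Map<String, Map<String, List<String>>> build(List<Production> prods,
                                                       Map<String, Set<String>> follow) {
        Map<String, Map<String, List<String>>> pt = new TreeMap<>();
        for (Production p : prods) {
            Set<String> cols = new TreeSet<>(p.firstOfRhs);    // t in First(alpha)
            if (cols.remove("eps"))                              // First(alpha) contains eps:
                cols.addAll(follow.get(p.lhs));                  // t in Follow(A), including $
            for (String t : cols)
                pt.computeIfAbsent(p.lhs, k -> new TreeMap<>())
                  .computeIfAbsent(t, k -> new ArrayList<>())
                  .add(p.lhs + " -> " + p.rhs);
        }
        return pt;
    }

    // An entry holding more than one production means the grammar is not LL(1).
    static List<String> conflicts(Map<String, Map<String, List<String>>> pt) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> row : pt.entrySet())
            for (Map.Entry<String, List<String>> cell : row.getValue().entrySet())
                if (cell.getValue().size() > 1)
                    out.add("PT[" + row.getKey() + "," + cell.getKey() + "]");
        return out;
    }

    // First/Follow values copied from the example: S -> Aa | b, A -> bdZ | eZ, Z -> cZ | adZ | eps
    static List<Production> exampleProductions() {
        return List.of(
            new Production("S", "Aa", Set.of("b", "e")),
            new Production("S", "b", Set.of("b")),
            new Production("A", "bdZ", Set.of("b")),
            new Production("A", "eZ", Set.of("e")),
            new Production("Z", "cZ", Set.of("c")),
            new Production("Z", "adZ", Set.of("a")),
            new Production("Z", "eps", Set.of("eps")));
    }

    static Map<String, Set<String>> exampleFollow() {
        return Map.of("S", Set.of("$"), "A", Set.of("a"), "Z", Set.of("a"));
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> pt = build(exampleProductions(), exampleFollow());
        System.out.println(pt);
        System.out.println("conflicts: " + conflicts(pt));
    }
}
```

For this grammar the sketch reports PT[S,b] and PT[Z,a], the two entries that make the grammar not LL(1). Z → ε lands in column a only because a is in Follow(Z).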
Construct LL(1) parse table
  S → P$
  P → { D ; C }
  D → d D2
  D2 → , D | ε
  C → c C2
  C2 → , C | ε

  Non-terminal   First   Follow
  S              {       $
  P              {       $
  D              d       ;
  D2             , ε     ;
  C              c       }
  C2             , ε     }

          {          }          ;          ,          c          d          $
  S     S → P$
  P     P → {D;C}
  D                                                       D → dD2
  D2                      D2 → ε     D2 → ,D
  C                                            C → cC2
  C2               C2 → ε                D2 → ,C

LL(1) parse program
[Figure: LL(1) parse program, showing the input buffer (input$), the stack, the parsing program, and the parse table]
• Stack: contains the current rewrite of the start symbol.
• Input: left to right scan of input.
• Parse table: contains the LL(k) parse table.
LL(1) parsing algorithm
• Use the stack, input, and parse table with the following rules:
  – Accept: if the symbol on the top of the stack is $ and the input symbol is $, the parse is successful;
  – Match: if the symbol on the top of the stack is the same as the next input token, pop the stack and advance the input;
  – Predict: if the top of the stack is a non-terminal M and the next input token is a, remove M from the stack and push entry PT[M,a] onto the stack in reverse order;
  – Error: anything else is a syntax error.

Running LL(1) parser
  S → P$
  P → { D ; C }
  D → d D2
  D2 → , D | ε
  C → c C2
  C2 → , C | ε

  Stack           Remaining input   Action
  S               {d,d;c}$          predict S → P$
  P$              {d,d;c}$          predict P → {D;C}
  {D;C}$          {d,d;c}$          match {
  D;C}$           d,d;c}$           predict D → d D2
  d D2 ; C } $    d,d;c}$           match d
  D2 ; C } $      ,d;c}$            predict D2 → , D
  ,D;C}$          ,d;c}$            match ,
  D;C}$           d;c}$             predict D → d D2
  d D2 ; C } $    d;c}$             match d
  D2 ; C } $      ;c}$              predict D2 → ε
  ;C}$            ;c}$              match ;
  C}$             c}$               predict C → c C2
  c C2 } $        c}$               match c
  C2 } $          }$                predict C2 → ε
  }$              }$                match }
  $               $                 accept

Derivation (note that it is a leftmost derivation):
  S ⇒ P$ ⇒ {D;C}$ ⇒ {d D2;C}$ ⇒ {d,D;C}$ ⇒ {d,d D2;C}$ ⇒ {d,d;C}$ ⇒ {d,d;c C2}$ ⇒ {d,d;c}$
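The four rules translate almost directly into a driver loop. Here is a minimal sketch for the grammar S → P$, P → { D ; C }, D → d D2, D2 → , D | ε, C → c C2, C2 → , C | ε.

It assumes single-character tokens and an input string that ends in $. The PT entries are copied from the parse table for this grammar. The class name LL1Driver and the string-keyed table encoding are assumptions.

```java
import java.util.*;

// Table-driven LL(1) parser using the accept / match / predict / error rules.
// Grammar: S -> P $, P -> { D ; C }, D -> d D2, D2 -> , D | eps, C -> c C2, C2 -> , C | eps
// Assumption: each input character is one token and the input ends with '$'.
class LL1Driver {
    static final Set<String> NONTERMINALS = Set.of("S", "P", "D", "D2", "C", "C2");
    static final Map<String, List<String>> PT = new HashMap<>();   // key "M,a" -> PT[M,a]
    static {
        PT.put("S,{", List.of("P", "$"));
        PT.put("P,{", List.of("{", "D", ";", "C", "}"));
        PT.put("D,d", List.of("d", "D2"));
        PT.put("D2,,", List.of(",", "D"));
        PT.put("D2,;", List.of());                 // D2 -> eps
        PT.put("C,c", List.of("c", "C2"));
        PT.put("C2,,", List.of(",", "C"));
        PT.put("C2,}", List.of());                 // C2 -> eps
    }

    static boolean parse(String input) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty() && pos < input.length()) {
            String top = stack.pop();
            String a = String.valueOf(input.charAt(pos));
            if (top.equals("$") && a.equals("$") && pos == input.length() - 1)
                return true;                                   // accept
            if (!NONTERMINALS.contains(top)) {
                if (!top.equals(a)) return false;              // error: terminal mismatch
                pos++;                                         // match: pop and advance
                continue;
            }
            List<String> rhs = PT.get(top + "," + a);
            if (rhs == null) return false;                     // error: empty entry
            for (int i = rhs.size() - 1; i >= 0; i--)          // predict: push in reverse
                stack.push(rhs.get(i));
        }
        return false;                                          // ran out of input or stack
    }

    public static void main(String[] args) {
        System.out.println(parse("{d,d;c}$"));
        System.out.println(parse("{d,;c}$"));
    }
}
```

parse("{d,d;c}$") goes through the same predict and match steps as the trace above and returns true. parse("{d,;c}$") fails at PT[D,;], which is empty.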
Parsing the wrong expression int*]+int

  Stack        Remaining input   Action
  E$           int*]+int$        predict E → TE'
  TE'$         int*]+int$        predict T → FT'
  FT'E'$       int*]+int$        predict F → int
  int T'E'$    int*]+int$        match int
  T'E'$        *]+int$           predict T' → *FT'
  *FT'E'$      *]+int$           match *
  FT'E'$       ]+int$            error, skip ]
  FT'E'$       +int$             PT[F,+] is sync, pop F
  T'E'$        +int$             predict T' → ε
  E'$          +int$             predict E' → +TE'
  ...          ...               ...

• It is easy for LL(1) parsers to skip errors.

       +         *         (         )         int       ]         $
  E                        E → TE'             E → TE'   error

Error handling
• There are three types of error processing:
  – report, recovery, repair
• General principles
  – Try to determine that an error has occurred as soon as possible. Waiting too long before declaring an error can cause the parser to lose the actual location of the error.
  – Error report: A suitable and comprehensive message should be reported. "Missing semicolon on line 36" is helpful, "unable to shift in state 425" is not.
  – Error recovery: After an error has occurred, the parser must pick a likely place to resume the parse. Rather than giving up at the first problem, a parser should always try to parse as much of the code as possible in order to find as many real errors as possible during a single run.
  – A parser should avoid cascading errors, which is when one error generates a lengthy sequence of spurious error messages.
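The trace uses two recovery steps: skip an input token when the table entry is blank, and pop the non-terminal when the entry is sync. These can be sketched as a panic-mode LL(1) driver.

The slides do not list the grammar, so this sketch assumes the usual expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | int. It also assumes two recovery details:

- sync goes in PT[A,t] for each t in Follow(A) that has no production;
- a terminal on the stack that does not match the input is popped.

All names, and the "eps" spelling of ε in comments, are hypothetical.

```java
import java.util.*;

// Panic-mode error recovery in a table-driven LL(1) parser.
// Assumed grammar: E -> T E', E' -> + T E' | eps, T -> F T', T' -> * F T' | eps, F -> ( E ) | int
// Recovery: blank entry -> skip the input token; sync entry -> pop the non-terminal;
// mismatched terminal on the stack -> pop it.
class PanicModeLL1 {
    static final Set<String> NT = Set.of("E", "E'", "T", "T'", "F");
    static final Map<String, List<String>> PT = new HashMap<>();   // key "A t"
    static final Set<String> SYNC = new HashSet<>();

    static void put(String nt, String tok, String... rhs) {
        PT.put(nt + " " + tok, Arrays.asList(rhs));
    }

    static {
        put("E", "(", "T", "E'");      put("E", "int", "T", "E'");
        put("E'", "+", "+", "T", "E'");
        put("E'", ")");                put("E'", "$");                   // E' -> eps
        put("T", "(", "F", "T'");      put("T", "int", "F", "T'");
        put("T'", "*", "*", "F", "T'");
        put("T'", "+");  put("T'", ")");  put("T'", "$");                // T' -> eps
        put("F", "(", "(", "E", ")");  put("F", "int", "int");
        // sync entries: tokens in Follow(A) with no production for A
        for (String t : List.of(")", "$")) SYNC.add("E " + t);
        for (String t : List.of("+", ")", "$")) SYNC.add("T " + t);
        for (String t : List.of("+", "*", ")", "$")) SYNC.add("F " + t);
    }

    // Returns the recovery actions taken; an empty list means the input was error-free.
    static List<String> parse(List<String> input) {
        List<String> tokens = new ArrayList<>(input);
        tokens.add("$");
        Deque<String> stack = new ArrayDeque<>(List.of("E", "$"));     // E on top of $
        List<String> errors = new ArrayList<>();
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.peek();
            String a = tokens.get(pos);
            if (top.equals("$") && a.equals("$")) break;               // accept
            if (!NT.contains(top)) {                                   // terminal on top
                if (top.equals(a)) pos++;                              // match
                else errors.add("expected " + top + ", popped it");
                stack.pop();
                continue;
            }
            String key = top + " " + a;
            if (PT.containsKey(key)) {                                 // predict
                stack.pop();
                List<String> rhs = PT.get(key);
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i));
            } else if (SYNC.contains(key)) {                           // sync: pop A
                errors.add("PT[" + top + "," + a + "] is sync, popped " + top);
                stack.pop();
            } else if (!a.equals("$")) {                               // blank: skip token
                errors.add("error, skipped " + a);
                pos++;
            } else {                                                   // blank at end of input
                errors.add("unexpected end of input, popped " + top);
                stack.pop();
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("int", "*", "]", "+", "int")));
    }
}
```

On int * ] + int the sketch records the same two recovery actions as the trace: it skips ], then pops F at +. It still reaches $ on both stack and input, so parsing continues after the error instead of stopping at it.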