
Syntax Analysis

CS2210
Lecture 4

CS2210 Compiler Design 2004/05

Parser

[Figure: source → lexical analyzer → token → parser → parse tree → rest of frontend → IR; the parser asks the lexical analyzer to "get next token", and both consult the symbol table]

Parsing = determining whether a string of tokens can be generated by a grammar

Grammars
■ Precise, easy-to-understand description of syntax
■ Context-free grammars -> efficient parsers (automatically!)
■ Help in translation and error detection
■ E.g., attribute grammars
■ Easier language evolution
■ Can add new constructs systematically

Syntax Errors
■ Many errors are syntactic or exposed by parsing
■ E.g., unbalanced ()
■ Error handling goals:
■ Report errors quickly & accurately
■ Recover quickly (continue parsing after an error)
■ Little overhead on parse time


Error Recovery
■ Panic mode
■ Discard tokens until a synchronization token is found (often ‘;’)
■ Phrase level
■ Local correction: replace a token by another and continue
■ Error productions
■ Encode commonly expected errors in the grammar
■ Global correction
■ Find the closest input string that is in L(G)
■ Too costly in practice
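As a rough illustration of panic mode, the sketch below parses a hypothetical statement form `id = id ;` (not from the slides) over a list of string tokens. On a mismatch it records the error position, discards tokens up to the synchronizing ‘;’, and resumes with the next statement.

```python
def parse_statements(tokens):
    """Parse a sequence of 'id = id ;' statements (a toy grammar assumed for
    illustration), recovering from syntax errors in panic mode."""
    expected = ["id", "=", "id", ";"]
    parsed, errors = 0, []
    i = 0
    while i < len(tokens):
        ok = True
        for want in expected:
            if i < len(tokens) and tokens[i] == want:
                i += 1
            else:
                ok = False
                break
        if ok:
            parsed += 1
            continue
        errors.append(f"syntax error at token {i}")
        # Panic mode: discard tokens until the synchronizing token ';' ...
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        # ... then skip the ';' itself and resume with the next statement.
        i += 1
    return parsed, errors
```

For `"id = id ; id id ; id = id ;".split()` it returns `(2, ["syntax error at token 5"])`: the malformed middle statement is skipped and parsing continues with the third one.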


Context-free Grammars
■ Precise and easy way to specify the syntactic structure of a programming language
■ Efficient recognition methods exist
■ Natural specification of many “recursive” constructs:
■ expr -> expr + expr | term

Context-free Grammar Definition
■ Terminals T
■ Symbols from which the strings of L(G) are formed, G a CFG (= tokens from the scanner), e.g. if, else, id
■ Nonterminals N
■ Syntactic variables denoting sets of strings
■ Impose a hierarchical structure (e.g., precedence rules)
■ Start symbol S (∈ N)
■ The set of strings it denotes is L(G)
■ Productions P
■ Rules that determine how strings are formed
■ N -> (N ∪ T)*: a nonterminal on the left, a string of grammar symbols on the right


Example: Expression Grammar
expr -> expr op expr
expr -> (expr)
expr -> - expr
expr -> id
op -> +
op -> -
op -> *
op -> /
op -> ^
■ Terminals: {id, +, -, *, /, ^, (, )}
■ Nonterminals: {expr, op}
■ Start symbol: expr


Notational Conventions
■ Terminals: a, b, c, ..; operators +, -, ..; punctuation ‘,’ ‘.’ ‘;’ etc.; digits 0..9
■ Nonterminals: A, B, C, ..; S is the start symbol (if present), otherwise the first nonterminal in the production list; names such as expr or <expr>
■ Terminal strings: u, v, ..
■ Grammar symbol strings: α, β
■ Productions: A -> α

Shorthands & Derivations
E -> E + E | E * E | (E) | - E | <id>
■ E => - E: “E derives -E”
■ => derives in 1 step
■ =>* derives in n (n ≥ 0) steps
■ =>+ derives in n (n ≥ 1) steps


More Definitions
■ L(G), the language generated by G = set of terminal strings derived from S
■ S =>+ w: w is a sentence of G (w a string of terminals)
■ S =>* α: α is a sentential form of G
■ (the string may contain nonterminals)
■ G and G’ are equivalent :⇔ L(G) = L(G’)
■ A language generated by a grammar (of the form shown) is called a context-free language


Example
G = ({+, -, *, (, ), <id>}, {E}, E, {E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})
Sentence: -(<id> + <id>)
Derivation: E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
• Leftmost derivation, i.e., always replace the leftmost nonterminal (the derivation above is leftmost)
• Rightmost derivation analogously
• Left/right sentential form: a sentential form occurring in a leftmost/rightmost derivation
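A leftmost derivation can be replayed mechanically: at each step, find the leftmost nonterminal and replace it with the body of the chosen production. The sketch below assumes sentential forms are Python lists of symbols; the function name `leftmost_derivation` is ours.

```python
def leftmost_derivation(start, steps, nonterminals):
    """Replay a leftmost derivation; returns every sentential form."""
    form = [start]
    forms = [form]
    for head, body in steps:
        # Locate the leftmost nonterminal of the current sentential form.
        idx = next((k for k, s in enumerate(form) if s in nonterminals), None)
        if idx is None or form[idx] != head:
            raise ValueError(f"cannot apply {head} -> {' '.join(body)} to {form}")
        form = form[:idx] + list(body) + form[idx + 1:]
        forms.append(form)
    return forms

# The productions used in the derivation of -(<id> + <id>), in order.
steps = [("E", ["-", "E"]), ("E", ["(", "E", ")"]), ("E", ["E", "+", "E"]),
         ("E", ["<id>"]), ("E", ["<id>"])]
for f in leftmost_derivation("E", steps, {"E"}):
    print("".join(f))
```

This prints the six sentential forms of the derivation above, from `E` to `-(<id>+<id>)`.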

Parse Trees
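■ Parse tree = graphical representation of a derivation, ignoring the replacement order
Derivation: E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
[Figure: parse tree for -(<id> + <id>): root E has children - and E; that E has children (, E, ); the inner E has children E, +, E; each of these two E nodes has the single child <id>]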


Ambiguous Grammars
■ A grammar is ambiguous if some sentence has >= 2 different parse trees
⇔ >= 2 leftmost (or >= 2 rightmost) derivations
■ Usually we want unambiguous grammars
■ E.g., we want just one evaluation order: <id> + <id> * <id> should be parsed as <id> + (<id> * <id>), not (<id> + <id>) * <id>
■ To keep grammars simple, one may instead accept ambiguity and resolve it separately (outside of the grammar)
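To see the ambiguity concretely, one can count parse trees. This sketch uses the sub-grammar E -> E + E | E * E | <id> (parentheses and unary minus left out for brevity) and counts, by dynamic programming over token spans, in how many ways E derives the input: every binary operator inside a span can be the root of that span's tree.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | <id>."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Ways E can derive toks[i:j]; spans passed here are never empty.
        if j - i == 1:
            return 1 if toks[i] == "<id>" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):      # toks[k] is the root operator
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks)) if toks else 0
```

`count_parse_trees("<id> + <id> * <id>".split())` gives 2, one tree per grouping; for an unambiguous grammar the count would never exceed 1.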


Expressive Power
■ CFGs are more powerful than REs
■ Can express matching () with CFGs
■ Can express most properties desired for
programming languages
■ CFGs cannot express:
■ Identifiers declared before they are used: $L = \{wcw \mid w \in (a|b)^*\}$
■ Parameter checking (#formals = #actuals): $L = \{a^n b^m c^n d^m \mid n \ge 1, m \ge 1\}$
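As a small illustration of the ‘matching ()’ point, the grammar S -> ( S ) S | ε (our choice, not from the slide) generates exactly the balanced parenthesis strings, which no regular expression can describe. A recursive recognizer for it chooses the first production when the lookahead is ‘(’ and S -> ε otherwise:

```python
def balanced(text):
    """Recognize L(S) for S -> ( S ) S | ε."""
    pos = 0

    def S():
        nonlocal pos
        # S -> ( S ) S, chosen when the lookahead is '('
        if pos < len(text) and text[pos] == "(":
            pos += 1
            S()
            if pos < len(text) and text[pos] == ")":
                pos += 1
            else:
                raise SyntaxError("missing ')'")
            S()
        # otherwise S -> ε: consume nothing

    try:
        S()
    except SyntaxError:
        return False
    return pos == len(text)      # all input must be consumed
```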

Eliminating Ambiguity (1)
Grammar
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | other
is ambiguous:
Sentence: if E1 then if E2 then S1 else S2
Derivation 1 (else belongs to the inner if):
stmt => if expr then stmt
     => if E1 then stmt
     => if E1 then if expr then stmt else stmt
     => if E1 then if E2 then stmt else stmt
     => if E1 then if E2 then S1 else stmt
     => if E1 then if E2 then S1 else S2
Derivation 2 (else belongs to the outer if):
stmt => if expr then stmt else stmt
     => if E1 then stmt else stmt
     => if E1 then if expr then stmt else stmt
     => if E1 then if E2 then stmt else stmt
     => if E1 then if E2 then S1 else stmt
     => if E1 then if E2 then S1 else S2
Which one do we prefer?


Eliminating Ambiguity (2)


Grammar
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | other
is ambiguous:
Sentence: if E1 then if E2 then S1 else S2
Unambiguous grammar for the same language (each else matches the closest unmatched then):
stmt -> matched_stmt
      | unmatched_stmt
matched_stmt -> if expr then matched_stmt else matched_stmt
              | other
unmatched_stmt -> if expr then stmt
                | if expr then matched_stmt else unmatched_stmt


Left Recursion
■ If for grammar G there is a derivation A =>+ Aα for some nonterminal A and some string α, then G is left recursive
Example:
S -> Aa | b
A -> Ac | Sd | ε
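Left recursion can be detected mechanically. The sketch below (all names hypothetical) builds a graph with an edge A -> B whenever some production A -> γBδ has a prefix γ that can derive ε, then reports every nonterminal that can reach itself, which corresponds to A =>+ Aα. Bodies are lists of symbols, and [] stands for ε.

```python
def nullable_nonterminals(grammar):
    """Nonterminals that can derive ε (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def left_recursive(grammar):
    """Nonterminals A with A =>+ A α."""
    nullable = nullable_nonterminals(grammar)
    # Edge A -> B if some production A -> γ B δ has γ =>* ε.
    edges = {a: set() for a in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            for sym in body:
                if sym not in grammar:
                    break              # a terminal ends the nullable prefix
                edges[head].add(sym)
                if sym not in nullable:
                    break
    result = set()
    for start in grammar:
        # Depth-first search: can start reach itself again?
        stack, seen = list(edges[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                result.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return result

GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
```

On the example both S and A are reported: A -> Ac is immediately left recursive, while S => Aa => Sda is left recursive only through A.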

Parsing
■ = determining whether a string of
tokens can be generated by a grammar
■ Two classes based on order in which
parse tree is constructed:
■ Top-down parsing
■ Start construction at root of parse tree
■ Bottom-up parsing
■ Start at leaves and proceed to root


Recursive Descent Parsing


■ A top-down method based on recursive
procedures (one for each nonterminal
typically)
■ May have to backtrack when the wrong production was picked
■ Predictive parsing = a recursive descent parsing approach that avoids backtracking
■ More efficient
■ Uses (limited) lookahead to decide which production to use
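To make the backtracking concrete, here is a sketch on the small grammar S -> c A d, A -> a b | a (an assumption, not from the slides). Each procedure is a Python generator that yields every input position where its nonterminal can end, so when A's first alternative leads to a dead end, the caller simply asks A for its next alternative.

```python
def parse_S(tokens, pos):
    """S -> c A d. Yields every position where an S starting at pos can end."""
    if pos < len(tokens) and tokens[pos] == "c":
        for after_a in parse_A(tokens, pos + 1):    # try each way A can match
            if after_a < len(tokens) and tokens[after_a] == "d":
                yield after_a + 1

def parse_A(tokens, pos):
    """A -> a b | a, tried in that order; the second alternative is only
    reached when the caller asks for another way to match A."""
    if tokens[pos:pos + 2] == ["a", "b"]:
        yield pos + 2
    if tokens[pos:pos + 1] == ["a"]:
        yield pos + 1

def accepts(tokens):
    tokens = list(tokens)
    return any(end == len(tokens) for end in parse_S(tokens, 0))
```

For `c a d` the first alternative of A fails, the parser backs up and the second one succeeds; a predictive parser avoids exactly this retrying.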


Predictive Parser
■ Program with a (parsing) procedure for
each nonterminal which
■ Decides what production to use (based on
lookahead in the input)
■ Uses a production by mimicking the right
side

Predictive Parser Example
type -> simple | ^id | array [simple] of type
simple -> integer | char | num dotdot num

procedure match(t: token);
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end;

procedure type;
begin
  if lookahead is in {integer, char, num} then
    simple
  else if lookahead = ‘^’ then begin
    match(‘^’); match(id)
  end
  else if lookahead = array then begin
    match(array); match(‘[’); simple; match(‘]’); match(of); type
  end
  else error
end
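A runnable Python rendering of this parser might look as follows. The token spellings (`"dotdot"`, `"^"`, ...) and the end marker `"$"` are assumptions, and `simple`, which the slide does not spell out, is written in the same style with one branch per production.

```python
class ParseError(Exception):
    pass

class TypeParser:
    """Predictive parser for
         type   -> simple | ^ id | array [ simple ] of type
         simple -> integer | char | num dotdot num"""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # '$' marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1                    # lookahead := nexttoken
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")

def accepts(tokens):
    parser = TypeParser(tokens)
    try:
        parser.type_()
        parser.match("$")                    # the whole input must be used
        return True
    except ParseError:
        return False
```

Each decision looks only at the single lookahead token, so no production is ever tried and then undone.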

Predictive Parsing Obstacles


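■ expr -> expr + term
■ Procedure expr would begin: expr; match(‘+’); term;
■ Infinite recursion (left recursion): expr calls itself without consuming any input
■ stmt -> if expr then stmt else stmt | if expr then stmt
■ Common prefix
■ Can’t predict which production to use from the lookahead
■ Solutions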
■ Eliminate left recursion
■ Left factoring


Eliminating Left Recursion (1)


■ Simple case: immediate left recursion:
Replace A -> A α | β with
A -> β A’
A’ -> αA’ | ε
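The rule above can be sketched as a function on one nonterminal's alternatives. Bodies are lists of symbols, [] stands for ε, and the new nonterminal is named by appending a prime (assumed not to clash with an existing name).

```python
def eliminate_immediate(head, bodies):
    """Replace A -> Aα1 | ... | Aαm | β1 | ... | βn by
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    Assumes no alternative is exactly [A] (a cycle A -> A)."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}          # no immediate left recursion
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }
```

For E -> E + T | T this yields E -> T E’ and E’ -> + T E’ | ε.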

Eliminating Left Recursion (2)
Order the nonterminals A1 .. An
for i := 1 to n do begin
  for j := 1 to i-1 do begin
    replace each production of the form Ai -> Ajγ by the productions
      Ai -> δ1γ | δ2γ | … | δkγ
    where Aj -> δ1 | δ2 | … | δk are all current Aj productions
  end;
  eliminate immediate left recursion among the Ai productions
end
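A sketch of the whole algorithm in Python, reusing the immediate-elimination rule from the previous slide. The grammar is a dict from nonterminal to a list of bodies, and the ordering A1 .. An is passed explicitly. Note that the textbook version of this algorithm is only guaranteed to work on grammars without cycles or ε-productions.

```python
def eliminate_immediate(head, bodies):
    # Immediate-left-recursion rule from the previous slide; [] is ε.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}
    new = head + "'"
    return {head: [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [[]]}

def eliminate_left_recursion(grammar, order):
    """grammar maps nonterminals to lists of bodies; order is A1 .. An."""
    g = {a: [list(b) for b in grammar[a]] for a in order}
    primes = {}                        # the new A' nonterminals
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new_bodies = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj γ becomes Ai -> δ γ for every current Aj -> δ
                    new_bodies.extend(delta + body[1:] for delta in g[aj])
                else:
                    new_bodies.append(body)
            g[ai] = new_bodies
        split = eliminate_immediate(ai, g[ai])
        g[ai] = split.pop(ai)
        primes.update(split)
    return {**g, **primes}

GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
```

Run on S -> Aa | b, A -> Ac | Sd | ε with order S, A, it returns S -> Aa | b, A -> bdA’ | A’, A’ -> cA’ | adA’ | ε; the ε-production happens to be harmless in this case.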


Example: Eliminating Left Recursion
S -> Aa | b
A -> Ac | Sd | ε
Order: S, A
■ i = 1: S has no immediate left recursion, so its productions stay unchanged
■ i = 2, j = 1: eliminate A -> Sγ
■ Replace A -> Sd with A -> Aad | bd, giving A -> Ac | Aad | bd | ε
■ Eliminate immediate left recursion:
S -> Aa | b
A -> bdA’ | A’
A’ -> cA’ | adA’ | ε


Left Factoring
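■ Find the longest prefix α common to two or more alternatives and move the differing suffixes into a new nonterminal: A -> αβ1 | αβ2 becomes A -> αA’, A’ -> β1 | β2
■ E.g., stmt -> if expr then stmt else stmt | if expr then stmt becomes
■ stmt -> if expr then stmt stmt’
■ stmt’ -> else stmt | ε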
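One left-factoring step can be sketched as follows: find the longest prefix shared by at least two alternatives and split it off into a primed nonterminal (the naming is an assumption). Applying the step repeatedly until no two alternatives share a prefix gives the left-factored grammar.

```python
def common_prefix(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor_once(head, bodies):
    """One left-factoring step for head; bodies are symbol lists, [] is ε."""
    best = []
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            p = common_prefix(bodies[i], bodies[j])
            if len(p) > len(best):
                best = p                 # longest prefix shared by >= 2 bodies
    if not best:
        return {head: bodies}            # nothing to factor
    n, new = len(best), head + "'"
    suffixes = [b[n:] for b in bodies if b[:n] == best]
    others = [b for b in bodies if b[:n] != best]
    return {head: others + [best + [new]], new: suffixes}
```

On the two if-then-else alternatives it produces stmt -> if expr then stmt stmt’ and stmt’ -> else stmt | ε.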

Transition Diagrams
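■ For each nonterminal A, create an initial and a final state
■ For each production A -> X1X2…Xn, create a path from the initial to the final state, with edges labeled X1, X2, … Xn
[Figure: transition diagram for E with states 0, 3 and 6 and edges labeled T, + and ε]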
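The construction can be sketched as a function that turns one nonterminal's productions into an edge list. The state numbering is our own (0 initial, 1 final, fresh numbers for intermediate states), so it need not match the state numbers on the slide.

```python
def build_diagram(bodies):
    """Transition diagram for one nonterminal. Each production
    A -> X1 X2 ... Xn becomes a path of edges labeled X1, ..., Xn from the
    initial state 0 to the final state 1; an ε-production ([]) becomes a
    single ε edge. Returns a list of (source, label, target) edges."""
    initial, final = 0, 1
    next_state = 2
    edges = []
    for body in bodies:
        labels = body or ["ε"]
        src = initial
        for k, sym in enumerate(labels):
            if k == len(labels) - 1:
                dst = final              # the last edge ends in the final state
            else:
                dst = next_state         # fresh intermediate state
                next_state += 1
            edges.append((src, sym, dst))
            src = dst
    return edges
```

For E’ -> +TE’ | ε it returns the path 0 -+-> 2 -T-> 3 -E’-> 1 plus the edge 0 -ε-> 1.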


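Non-recursive Predictive Parsers
■ Avoid recursion for efficiency reasons
■ Typically built automatically by tools
[Figure: the predictive parsing program reads the input buffer (a + b $), maintains a stack (X Y Z $, X on top) and consults the parsing table M to produce its output]
■ M[A, a] gives the production to use, where A is the symbol on the stack and a the current input symbol (or $)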

Parsing Algorithm
■ X is the symbol on top of the stack, a the current input symbol
■ Stack contents and remaining input are called the parser configuration (initially $S on the stack and the complete input string)
1. If X = a = $, halt and announce success
2. If X = a ≠ $, pop X off the stack and advance the input to the next symbol
3. If X is a nonterminal, use M[X, a], which contains either a production X -> rhs or error: replace X on the stack with rhs or call the error routine, respectively. E.g., for X -> UVW replace X with WVU (U on top) and output the production (or augment the parse tree)
4. Otherwise (X is a terminal ≠ a), call the error routine
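A sketch of this driver loop in Python, for the expression grammar of the example at the end of the lecture. The table entries were filled in by hand with the construction algorithm; `[]` stands for an ε right side, and the stack is a Python list whose last element is the top.

```python
# Parsing table M for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε,
# F -> (E) | id. Missing keys are error entries.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, table=TABLE, start="E", nonterminals=NONTERMINALS):
    """Non-recursive predictive parser; returns the productions used."""
    buf = list(tokens) + ["$"]
    stack = ["$", start]
    output, i = [], 0
    while True:
        x, a = stack[-1], buf[i]
        if x == a == "$":
            return output                  # 1. success
        if x not in nonterminals:
            if x != a:                     # 4. terminal does not match input
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()                    # 2. X = a != $: pop and advance
            i += 1
            continue
        body = table.get((x, a))           # 3. consult M[X, a]
        if body is None:
            raise SyntaxError(f"no entry M[{x}, {a}]")
        stack.pop()
        stack.extend(reversed(body))       # first symbol of rhs ends on top
        output.append((x, body))
```

For `id + id * id` the parser outputs 11 productions, which form the leftmost derivation of the input; for `id +` it raises `SyntaxError` because M[T, $] is undefined.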

Construction of Parsing Table: Helpers (1)
■ First(α) := set of terminals that begin strings derived from α (plus ε if α =>* ε)
■ First(X) = {X} for terminal X
■ If X -> ε is a production, add ε to First(X)
■ For X -> Y1…Yk, place a in First(X) if a is in First(Yi) and ε ∈ First(Yj) for j = 1…i-1; if ε ∈ First(Yj) for all j = 1…k, add ε to First(X)
■ Repeat until no First set changes
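These rules amount to a fixed-point iteration. A sketch in Python, assuming a grammar given as a dict from nonterminal to a list of bodies (lists of symbols, [] for ε) and the string "ε" as the ε marker; any symbol that is not a key of the grammar is treated as a terminal.

```python
EPS = "ε"

def first_of_string(symbols, grammar, first):
    """FIRST of a string Y1...Yk, given current FIRST sets of nonterminals."""
    result = set()
    for sym in symbols:
        f = first[sym] if sym in grammar else {sym}   # FIRST(a) = {a}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)          # every Yj derives ε (also covers X -> ε)
    return result

def first_sets(grammar):
    """Apply the rules until no FIRST set changes."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, grammar, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
```

On the expression grammar this reproduces FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E’) = {+, ε} and FIRST(T’) = {*, ε}.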


Construction of Parsing Table: Helpers (2)
■ Follow(A) := set of terminals a that can appear immediately to the right of A in some sentential form, i.e., S =>* αAaβ for some α, β ($ may be in Follow(A))
■ Place $ in Follow(S), S the start symbol, $ the right end marker
■ If there is a production A -> αBβ, put everything in First(β) except ε in Follow(B)
■ If there is a production A -> αB, or a production A -> αBβ where ε is in First(β), then everything in Follow(A) is in Follow(B)
■ Repeat until no Follow set changes
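The Follow rules can be iterated in the same way. To keep the sketch short, it takes the FIRST sets of the nonterminals as input (here copied from the example at the end of the lecture) and treats any symbol that is not a key of the grammar as a terminal.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})        # a terminal a has FIRST(a) = {a}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def follow_sets(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add("$")               # $ in Follow(S)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue         # only nonterminals get Follow sets
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}           # First(β) minus ε
                    if EPS in rest:              # β =>* ε, or β is empty
                        new |= follow[head]      # Follow(A) goes into Follow(B)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

The result agrees with the example: FOLLOW(E) = FOLLOW(E’) = {), $}, FOLLOW(T) = FOLLOW(T’) = {+, ), $} and FOLLOW(F) = {+, *, ), $}.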


Construction Algorithm
Input: Grammar G
Output: Parsing table M

For each production A -> α do
  For each terminal a in FIRST(α), add A -> α to M[A, a]
  If ε is in FIRST(α), add A -> α to M[A, b] for each terminal b in FOLLOW(A)
  ($ counts as a terminal in this step)
Make each undefined entry in M an error
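The construction algorithm in Python, again with the FIRST and FOLLOW sets of the expression-grammar example supplied by hand. Each table cell holds a list of bodies, so a cell with more than one entry is a conflict; absent cells play the role of error entries.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})          # a terminal a has FIRST(a) = {a}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def build_ll1_table(grammar, first, follow):
    """M as a dict (A, a) -> list of bodies; absent keys mean error."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body, first)
            targets = f - {EPS}                # terminals a in FIRST(α)
            if EPS in f:
                targets |= follow[head]        # terminals b (and $) in FOLLOW(A)
            for a in targets:
                table.setdefault((head, a), []).append(body)
    return table
```

For this grammar each of the 13 defined cells holds exactly one production.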

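Example
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {), $}
FOLLOW(T) = FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Parsing table M:
   | id       | +          | *          | (        | )       | $
E  | E -> TE’ |            |            | E -> TE’ |         |
E’ |          | E’ -> +TE’ |            |          | E’ -> ε | E’ -> ε
T  | T -> FT’ |            |            | T -> FT’ |         |
T’ |          | T’ -> ε    | T’ -> *FT’ |          | T’ -> ε | T’ -> ε
F  | F -> id  |            |            | F -> (E) |         |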


LL(1)
■ A grammar whose parsing table has no
multiply defined entries is said to be LL(1)
■ First L = left to right input scanning
■ Second L = leftmost derivation
■ (1) = 1 token lookahead
■ Not all grammars can be brought into LL(1) form; in fact, some context-free languages have no LL(1) grammar at all
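Checking the LL(1) property is then a scan for multiply defined cells. As an illustration (our choice of example), the left-factored if-then-else grammar is still ambiguous, so it cannot be LL(1). With `expr` treated as a single terminal (an assumption to keep the example small), its FIRST and FOLLOW sets, worked out by hand below, put both stmt’ -> else stmt and stmt’ -> ε into M[stmt’, else].

```python
EPS = "ε"

# stmt -> if expr then stmt stmt' | other ;  stmt' -> else stmt | ε
GRAMMAR = {
    "stmt": [["if", "expr", "then", "stmt", "stmt'"], ["other"]],
    "stmt'": [["else", "stmt"], []],
}
FIRST = {"stmt": {"if", "other"}, "stmt'": {"else", EPS}}
# Follow(stmt) gets $ (start symbol) and First(stmt') - ε = {else};
# Follow(stmt') gets Follow(stmt) because stmt' ends a stmt production.
FOLLOW = {"stmt": {"else", "$"}, "stmt'": {"else", "$"}}

def first_of_string(symbols, first):
    result = set()
    for sym in symbols:
        f = first.get(sym, {sym})
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result

def build_ll1_table(grammar, first, follow):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            f = first_of_string(body, first)
            targets = (f - {EPS}) | (follow[head] if EPS in f else set())
            for a in targets:
                table.setdefault((head, a), []).append(body)
    return table

def conflicts(table):
    """Cells holding more than one production (multiply defined entries)."""
    return sorted(cell for cell, bodies in table.items() if len(bodies) > 1)

def is_ll1(table):
    return not conflicts(table)
```

Here `conflicts` reports the single cell `("stmt'", "else")`. Practical LL(1) parsers typically resolve this particular conflict by always choosing stmt’ -> else stmt, which attaches each else to the closest then.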
