CS2210
Lecture 4
Parser
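[Figure: position of the parser in the compiler frontend. The lexical analyzer reads the source and returns a token for each "get next token" request from the parser; the parser builds a parse tree for the rest of the frontend, which produces IR. Lexical analyzer and parser both use the symbol table.]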
Grammars
■ Precise, easy-to-understand description of
syntax
■ Context-free grammars -> efficient parsers
(automatically!)
■ Help in translation and error detection
■ E.g., attribute grammars
■ Easier language evolution
■ Can add new constructs systematically
Syntax Errors
■ Many errors are syntactic or exposed by
parsing
■ E.g., unbalanced ()
■ Error handling goals:
■ Report errors quickly & accurately
■ Recover quickly (continue parsing after
error)
■ Little overhead on parse time
Error Recovery
■ Panic mode
■ Discard tokens until synchronization token found (often ‘;’)
■ Phrase level
■ Local correction: replace a token by another and continue
■ Error productions
■ Encode commonly expected errors in grammar
■ Global correction
■ Find closest input string that is in L(G)
■ Too costly in practice
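Panic mode can be sketched in a few lines. The toy statement form `id = num ;`, the (kind, text) token pairs, and the function name `parse_statements` are assumptions made for illustration only; they are not part of the lecture.

```python
def parse_statements(tokens):
    """Parse a sequence of `id = num ;` statements with panic-mode recovery.

    tokens is a list of (kind, text) pairs (a hypothetical token format).
    Returns (parsed statements, error messages).
    """
    pos, parsed, errors = 0, [], []
    expected = ["id", "=", "num", ";"]
    while pos < len(tokens):
        start = pos
        for kind in expected:
            if pos < len(tokens) and tokens[pos][0] == kind:
                pos += 1
            else:
                found = tokens[pos][0] if pos < len(tokens) else "end of input"
                errors.append(f"token {pos}: expected {kind}, found {found}")
                # Panic mode: discard tokens until the synchronization token ';'.
                while pos < len(tokens) and tokens[pos][0] != ";":
                    pos += 1
                pos += 1  # skip the ';' itself and resume with the next statement
                break
        else:
            # All four expected tokens matched: record the statement.
            parsed.append([text for _, text in tokens[start:pos]])
    return parsed, errors


tokens = [("id", "x"), ("=", "="), ("num", "1"), (";", ";"),
          ("id", "y"), ("=", "="), ("=", "="), ("num", "2"), (";", ";"),
          ("id", "z"), ("=", "="), ("num", "3"), (";", ";")]
parsed, errors = parse_statements(tokens)
```

In this sketch the faulty second statement is reported once and skipped, and parsing continues with the third statement, which is the "recover quickly" goal in miniature.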
Context-free Grammars
■ Precise and easy way to specify the
syntactical structure of a programming
language
■ Efficient recognition methods exist
■ Natural specification of many “recursive”
constructs:
■ expr -> expr + expr | term
Context-free Grammar
Definition
■ Terminals T
■ Symbols which form strings of L(G), G a CFG (= tokens in
the scanner), e.g. if, else, id
■ Nonterminals N
■ Syntactic variables denoting sets of strings of L(G)
■ Impose hierarchical structure (e.g., precedence rules)
■ Start symbol S (∈ N)
■ Denotes the set of strings of L(G)
■ Productions P
■ Rules that determine how strings are formed
■ N -> (N|T)*
Notational Conventions
■ Terminals
  ■ a, b, c, ..
  ■ +, -, ..
  ■ ‘,’ ‘.’ ‘;’ etc.
  ■ 0..9
  ■ expr or <expr>
■ Nonterminals
  ■ A, B, C, ..
  ■ S start symbol (if present), or first nonterminal in production list
■ Terminal strings
  ■ u, v, ..
■ Grammar symbol strings
  ■ α, β
■ Productions
  ■ A -> α
Shorthands & Derivations
E -> E + E | E * E | (E) | - E | <id>
■ E => - E “E derives -E”
■ => derives in 1 step
■ =>* derives in n steps (n ≥ 0)
More Definitions
■ L(G) language generated by G = set of
strings derived from S
■ S =>+ w : w is a sentence of G (w is a string of
terminals; =>+ means derives in one or more steps)
■ S =>* α : α is a sentential form of G
■ (the string can contain nonterminals)
■ G and G’ are equivalent :⇔ L(G) = L(G’)
■ A language generated by a grammar (of the
form shown) is called a context-free language
Example
G = ({+, -, *, (, ), <id>}, {E}, E,
{E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})
Sentence: -(<id> + <id>)
Derivation:
E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
• Leftmost derivation, i.e., always replace the leftmost nonterminal
• Rightmost derivation analogously
• Left/right sentential form
Parse Trees
Parse tree = graphical representation of a derivation, ignoring replacement order
E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id> + <id>)
Parse tree for -(<id> + <id>):
E
├── -
└── E
    ├── (
    ├── E
    │   ├── E
    │   │   └── <id>
    │   ├── +
    │   └── E
    │       └── <id>
    └── )
Ambiguous Grammars
■ >=2 different parse trees for some sentence
⇔ >= 2 leftmost/rightmost derivations
■ Usually want to have unambiguous grammars
■ E.g., we want just one evaluation order:
<id> + <id> * <id> should be parsed as <id> +
(<id> * <id>), not (<id> + <id>) * <id>
■ To keep grammars simple accept ambiguity and
resolve separately (outside of grammar)
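The ambiguity of the expression grammar can be made concrete by enumerating parse trees. The sketch below covers only the productions E -> E + E | E * E | <id>; the nested-tuple tree encoding and the function name `parse_trees` are assumptions chosen for illustration.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Enumerate all parse trees of `tokens` for E -> E + E | E * E | <id>.

    A tree is either the string "<id>" or a tuple (left, operator, right).
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All trees that derive tokens[lo:hi] from E.
        result = []
        if hi - lo == 1 and tokens[lo] == "<id>":
            result.append("<id>")
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                for left in trees(lo, k):
                    for right in trees(k + 1, hi):
                        result.append((left, tokens[k], right))
        return tuple(result)

    return list(trees(0, len(tokens)))


found = parse_trees(["<id>", "+", "<id>", "*", "<id>"])
for tree in found:
    print(tree)
```

For <id> + <id> * <id> the enumeration yields two trees, one grouping the multiplication first and one grouping the addition first, which is exactly the situation that precedence rules outside the grammar are meant to resolve.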
Expressive Power
■ CFGs are more powerful than REs
■ Can express matching () with CFGs
■ Can express most properties desired for
programming languages
■ CFGs cannot express:
■ Identifiers declared before used: L = {wcw | w is in (a|b)*}
■ Parameter checking (#formals = #actuals):
L = {a^n b^m c^n d^m | n ≥ 1, m ≥ 1}
Eliminating Ambiguity (1)
Grammar
stmt -> if expr then stmt
| if expr then stmt else stmt
| other
is ambiguous:
Sentence: if E1 then if E2 then S1 else S2
Derivation 1:
stmt => if expr then stmt =>
if E1 then stmt =>
if E1 then if expr then stmt else stmt =>
if E1 then if E2 then stmt else stmt =>
if E1 then if E2 then S1 else stmt =>
if E1 then if E2 then S1 else S2
Derivation 2:
stmt => if expr then stmt else stmt =>
if E1 then stmt else stmt =>
if E1 then if expr then stmt else stmt =>
if E1 then if E2 then stmt else stmt =>
if E1 then if E2 then S1 else stmt =>
if E1 then if E2 then S1 else S2
Which one do we prefer?
Left Recursion
■ If for grammar G there is a derivation
A =>+ Aα, for some string α then G is
left recursive
Example:
S -> Aa | b
A -> Ac | Sd | ε
Parsing
■ = determining whether a string of
tokens can be generated by a grammar
■ Two classes based on order in which
parse tree is constructed:
■ Top-down parsing
■ Start construction at root of parse tree
■ Bottom-up parsing
■ Start at leaves and proceed to root
Predictive Parser
■ Program with a (parsing) procedure for
each nonterminal which
■ Decides what production to use (based on
lookahead in the input)
■ Uses a production by mimicking the right
side
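A minimal sketch of such a parser, assuming the expression grammar E -> TE’, E’ -> +TE’ | ε, T -> FT’, T’ -> *FT’ | ε, F -> (E) | id and a token stream given as a list of strings. The class and method names are hypothetical; each method corresponds to one nonterminal and mimics the right side of the production it selects from the lookahead.

```python
class PredictiveParser:
    """Recursive-descent recognizer with one token of lookahead."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # '$' marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")
        return True

    def E(self):                      # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):                # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise use the ε-production: consume nothing

    def T(self):                      # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):                # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                      # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.lookahead == "id":
            self.match("id")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")


ok = PredictiveParser(["id", "+", "id", "*", "id"]).parse()
```

The sketch only recognizes input; building a parse tree would mean having each procedure return a node instead of nothing.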
Predictive Parser Example
Grammar (fragment; the rest of the slide is not recoverable):
type -> simple | ^id | …
… char | …
procedure match(t:token);
begin
…
end;
procedure type;
…
Eliminating Left Recursion (2)
Order the nonterminals A1 .. An
for i := 1 to n do begin
for j := 1 to i-1 do begin
replace each production of the form
Ai -> Ajγ by the productions
Ai -> δ1γ | δ2γ | … | δkγ
where Aj -> δ1 | δ2 | … | δk are all
current Aj productions
end
eliminate immediate left recursion among the Ai
productions
end
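The algorithm above can be sketched as follows. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with the empty tuple standing for ε), the function names, and the choice of A' as the name of the new nonterminal are assumptions for illustration. The standard formulation assumes a grammar without cycles or ε-productions; the lecture's example grammar S -> Aa | b, A -> Ac | Sd | ε contains an ε-production, but it happens to be harmless in this case.

```python
def eliminate_immediate(nt, prods):
    """Remove immediate left recursion from the productions of `nt`."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]   # A -> A α
    others = [p for p in prods if not (p and p[0] == nt)]    # A -> β
    if not recursive:
        return {nt: prods}
    new = nt + "'"  # assumed not to clash with an existing nonterminal
    return {
        nt: [beta + (new,) for beta in others],               # A  -> β A'
        new: [alpha + (new,) for alpha in recursive] + [()],  # A' -> α A' | ε
    }


def eliminate_left_recursion(grammar, order):
    """Apply the ordering-based algorithm to `grammar` for nonterminals in `order`."""
    g = {nt: list(prods) for nt, prods in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for p in g[ai]:
                if p and p[0] == aj:
                    # Replace Ai -> Aj γ by Ai -> δ γ for every Aj -> δ.
                    expanded.extend(delta + p[1:] for delta in g[aj])
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
```

For the example this yields S -> Aa | b, A -> bdA' | A', and A' -> cA' | adA' | ε.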
Left Factoring
■ Find longest common prefix and turn
into new nonterminal
■ stmt -> if expr then stmt stmt’
■ stmt’ -> else stmt | ε
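One round of left factoring for a single nonterminal might look like the sketch below. The encoding (right-hand sides as tuples, the empty tuple for ε), the helper names, and the primed name for the new nonterminal are assumptions for illustration; the alternative `other` is taken from the if-then-else grammar used earlier in the lecture.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of tuples."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(nt, prods):
    """Factor out the longest common prefix of alternatives that start alike."""
    groups = {}
    for p in prods:
        groups.setdefault(p[:1], []).append(p)  # group by first symbol
    result = {nt: []}
    count = 0
    for first, group in groups.items():
        if len(group) < 2 or not first:
            result[nt].extend(group)            # nothing to factor here
            continue
        prefix = common_prefix(group)
        count += 1
        new = nt + "'" * count                  # hypothetical fresh name
        result[nt].append(prefix + (new,))                 # A  -> prefix A'
        result[new] = [p[len(prefix):] for p in group]     # A' -> remainders
    return result


stmt = [
    ("if", "expr", "then", "stmt"),
    ("if", "expr", "then", "stmt", "else", "stmt"),
    ("other",),
]
factored = left_factor("stmt", stmt)
```

A full left-factoring pass would repeat this step on the new nonterminals until no two alternatives share a prefix; the sketch performs a single round only.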
Transition Diagrams
■ Create initial and final state
■ For each production A -> X1X2…Xn
create a path from the initial to the final
state, with edges labeled X1, X2, … Xn
[Figure: transition diagram for E, showing states 0, 3 and 6 and edges labeled T, + and ε]
Non-recursive Predictive
Parsers
■ Avoid recursion for efficiency reasons
■ Typically built automatically by tools
[Figure: input buffer containing a + b $]
Parsing Algorithm
■ X symbol on top of stack, a current input
symbol
■ Stack contents and remaining input called parser
configuration (initially $S on stack and complete
input string)
1. If X = a = $, halt and announce success
2. If X = a ≠ $, pop X off the stack and advance the input to the next symbol
3. If X is a nonterminal, consult M[X,a], which contains a production
X -> rhs or error:
replace X on the stack with rhs or call the error routine, respectively.
E.g., for X -> UVW replace X with WVU (U on top) and output the
production (or augment the parse tree)
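The three steps can be sketched as a short driver loop. The table below is written out by hand for the expression grammar E -> TE’, E’ -> +TE’ | ε, T -> FT’, T’ -> *FT’ | ε, F -> (E) | id; the dictionary encoding, with an empty list for an ε right-hand side and a missing key for an error entry, and the function name `ll1_parse` are assumptions for illustration.

```python
# Parsing table M: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    stack = ["$", start]               # the top of the stack is the list's end
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == a == "$":
            return output              # step 1: success
        if x == a:
            stack.pop()                # step 2: match a terminal
            pos += 1
        elif x in NONTERMINALS:
            rhs = TABLE.get((x, a))    # step 3: consult M[X, a]
            if rhs is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push rhs so its first symbol is on top
            output.append((x, rhs))
        else:
            raise SyntaxError(f"expected {x!r}, found {a!r}")


productions = ll1_parse(["id", "+", "id"])
```

The final `else` branch handles a terminal on the stack that does not match the input, a case the three steps leave implicit.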
Construction of Parsing Table
Helpers (1)
■ First(α) := set of terminals that begin
strings derived from α
■ First(X) = {X} for terminal X
■ If X -> ε is a production, add ε to First(X)
■ For X -> Y1…Yk, place a in First(X) if a is in
First(Yi) and ε ∈ First(Yj) for j = 1…i-1; if ε
∈ First(Yj) for all j = 1…k, add ε to First(X)
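These rules can be applied repeatedly until no First set changes. The sketch below does so for the expression grammar E -> TE’, E’ -> +TE’ | ε, T -> FT’, T’ -> *FT’ | ε, F -> (E) | id; the dict encoding, with the empty tuple standing for ε, and the function name `first_sets` are assumptions for illustration.

```python
EPS = "ε"


def first_sets(grammar):
    """Compute First(X) for every nonterminal by iterating to a fixed point.

    grammar maps each nonterminal to a list of right-hand sides (tuples);
    any symbol that is not a key of grammar is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                for sym in rhs:
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break          # Yi cannot vanish, so stop here
                else:
                    # Every Yi can derive ε (or the right-hand side is empty).
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first


GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = first_sets(GRAMMAR)
```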
Construction Algorithm
Input: Grammar G
Output: Parsing table M
Example
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {), $}
FOLLOW(T) = FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
[Table: parsing table M with rows E, E’, T, T’, F and columns id, +, *, (, ), $; entries not shown]
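The FOLLOW sets in this example can be checked mechanically. The sketch below uses the textbook rules for FOLLOW, which are not spelled out on these slides: $ is in FOLLOW of the start symbol; for a production A -> αBβ, everything in FIRST(β) except ε is in FOLLOW(B); and if β can derive ε (or is empty), FOLLOW(A) is included in FOLLOW(B). The FIRST sets are copied from the example, and the encoding and function names are assumptions for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST sets as listed in the example.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols (ε if every symbol can vanish)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(start="E"):
    """Compute FOLLOW for every nonterminal by iterating to a fixed point."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, prods in GRAMMAR.items():
            for rhs in prods:
                for i, b in enumerate(rhs):
                    if b not in GRAMMAR:
                        continue                  # only nonterminals have FOLLOW
                    rest = first_of_string(rhs[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]          # β can vanish: add FOLLOW(A)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = follow_sets()
```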
LL(1)
■ A grammar whose parsing table has no
multiply defined entries is said to be LL(1)
■ First L = left to right input scanning
■ Second L = leftmost derivation
■ (1) = 1 token lookahead
■ Not all grammars can be brought to LL(1)
form, i.e., there are languages that do not fall
into the LL(1) class