Lesson 10

CPSC 388 Compiler Design
and Construction
Implementing a Parser
LL(1) and LALR Grammars
Announcements
FBI recruiter Vicki Anderson: noon, Dining Hall
PROG 3 out, due Oct 9th
Get started NOW!
HW due Friday
HW6 posted, due next Friday
Parsing using CFGs
Algorithms can parse any CFG in O(n^3) time (n is the number of tokens in the input stream): TOO SLOW
Subclasses of grammars can be parsed in O(n) time
LL(1)
1 token of look-ahead
Do a leftmost derivation
Scan input from left to right
LALR(1)
One token of look-ahead
Do a rightmost derivation in reverse
Scan the input left to right
LA means "look-ahead"
(nothing to do with the number of tokens)
LALR(1)
More general than LL(1) grammars
(Many LALR(1) grammars, e.g. left-recursive ones, are not LL(1). Every LL(1) grammar is LR(1), although a few contrived LL(1) grammars are not LALR(1).)
Class of grammars used by java_cup, Bison, YACC
Parsed bottom up
(start with the terminals at the leaves and build the tree up to the root)
Covered in text sections 4.6-4.7
For this class you need to understand the details of LL(1) grammars only
LL(1) Grammars: Predictive Parsers
Build the parse tree top-down
(actually discover the tree top-down; don't actually build it)
Keep track of work to be done using a stack
Scanned tokens along with the stack contents correspond to the leaves of the incomplete tree
Use a parse table to decide how to parse the input
Rows are non-terminals
Columns are tokens (plus the EOF token)
Cells are the bodies of production rules
Predictive Parser Algorithm
s.push(EOF)                 // special EOF terminal
s.push(start)               // start is the start non-terminal
x = s.peek()
t = scanner.next_token()
while x != EOF:
    if x == t:              // terminal on top of stack matches current token
        s.pop()
        t = scanner.next_token()
    else if x is a terminal: error
    else if table[x][t] is empty: error
    else:
        let body = table[x][t]    // body of production
        output x → body
        s.pop()
        s.push(body)              // push symbols of body from right to left
    x = s.peek()
if t != EOF: error          // input left over after the stack emptied
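A minimal Python sketch of the algorithm above, assuming the parse table is a dict keyed by (non-terminal, token), that tokens are plain strings, and that ε is written as an empty tuple. The names predictive_parse and PAREN_TABLE are hypothetical; the table encodes the balanced-parentheses grammar S → ε | ( S ) | [ S ] used in the example that follows.

```python
from typing import Dict, List, Tuple

EOF = "EOF"
Body = Tuple[str, ...]                   # () is the epsilon body
Table = Dict[Tuple[str, str], Body]      # (non-terminal, token) -> body

def predictive_parse(tokens: List[str], table: Table, start: str) -> List[Tuple[str, Body]]:
    """Table-driven LL(1) parse; returns the productions applied, in leftmost-derivation order."""
    nonterminals = {nt for nt, _ in table}   # assumes every non-terminal has a table entry
    tokens = tokens + [EOF]
    pos = 0
    stack = [EOF, start]                     # top of the stack is the end of the list
    output: List[Tuple[str, Body]] = []
    while stack[-1] != EOF:
        x, t = stack[-1], tokens[pos]
        if x == t:                           # terminal on top matches the current token
            stack.pop()
            pos += 1                         # consume the token
        elif x not in nonterminals:
            raise SyntaxError(f"expected {x!r}, saw {t!r}")
        elif (x, t) not in table:
            raise SyntaxError(f"no rule for {x} on {t!r}")
        else:
            body = table[(x, t)]
            output.append((x, body))         # "output x -> body"
            stack.pop()
            stack.extend(reversed(body))     # push the body from right to left
    if tokens[pos] != EOF:
        raise SyntaxError(f"extra input starting at {tokens[pos]!r}")
    return output

# Parse table for S -> ε | ( S ) | [ S ]
PAREN_TABLE: Table = {
    ("S", "("): ("(", "S", ")"),
    ("S", "["): ("[", "S", "]"),
    ("S", ")"): (),
    ("S", "]"): (),
    ("S", EOF): (),
}
```

For example, predictive_parse(list("([])"), PAREN_TABLE, "S") applies S → ( S ), then S → [ S ], then S → ε. Pushing the reversed body onto a Python list keeps the leftmost symbol of the body on top of the stack.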
Example Parse using the Algorithm
Consider the language of balanced parentheses and brackets, e.g. ([])
Input string is ([])EOF
Grammar:
S → ε | ( S ) | [ S ]
Parse Table:
       (         )     [         ]     EOF
S      ( S )     ε     [ S ]     ε     ε
Not All Grammars Are LL(1)
Not all grammars are LL(1):
S → ( S ) | [ S ] | ( ) | [ ]
If the input is ( we don't know which rule to use!
Try input [[]] with the LL(1) grammar using the predictive parser
Draw:
Input seen so far
Stack
Action taken
Is a Grammar LL(1)?
Given a grammar, how do you tell if it is LL(1)?
How do you build the parse table?
If the parse table can be built with at most one entry per cell, then the grammar is LL(1)
Non-LL(1) Grammars
A grammar is not LL(1) if it is left-recursive
A grammar is not LL(1) if it is not left-factored
It is sometimes possible to change a grammar to remove left recursion and to make it left-factored
Left-Recursion
A grammar is recursive if there exists a non-terminal x such that (⇒+ means "derives in one or more steps"):
x ⇒+ α x β     Recursive
x ⇒+ x β       Left recursive
x ⇒+ α x       Right recursive
Removing Immediate Left-Recursion
Consider the grammar
A → A α | β
A is a non-terminal
α is a sequence of terminals and/or non-terminals
β is a sequence of terminals and/or non-terminals not starting with A
Replace the productions with
A → β A'
A' → α A' | ε
The two grammars are equivalent (they recognize the same set of input strings)
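The rewrite above can be sketched as a small grammar transformation. This assumes a production body is a tuple of symbol names, ε is the empty tuple, and the new non-terminal is named by appending a prime; the function name remove_immediate_left_recursion is hypothetical. It generalizes the slide to several α and β alternatives.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]          # a production body; () stands for ε

def remove_immediate_left_recursion(head: str, bodies: List[Body]) -> Dict[str, List[Body]]:
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn as
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε"""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # bodies of the form A α
    betas = [b for b in bodies if not b or b[0] != head]     # bodies not starting with A
    if not alphas:
        return {head: list(bodies)}                          # no immediate left recursion
    new_head = head + "'"                                    # assumes A' is not already used
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }
```

For instance, A → A x | y becomes A → y A' and A' → x A' | ε. Only immediate left recursion is handled; indirect recursion (A ⇒+ A α through other non-terminals) needs the more general algorithm in the text.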
You Try It
Remove left recursion from the grammar:
exp → exp - factor | factor
factor → INTLITERAL | ( exp )
Construct the parse tree using the original grammar and the new grammar for input 53-2
In general it is more difficult than this to remove left recursion; see text section 4.3.3
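Why left recursion matters for a top-down parser: with exp → exp - factor, a procedure for exp would call itself before consuming any input and recurse forever. The sketch below is a hand-written recursive-descent parser (a swapped-in technique; the slides use a table-driven parser) for the left-recursion-free grammar exp → factor exp', exp' → - factor exp' | ε. It assumes single-character operators, skips whitespace, and evaluates as it parses; ExpParser and evaluate are hypothetical names.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Split into INTLITERAL, '-', '(' and ')' tokens; any other character is skipped."""
    return re.findall(r"\d+|[-()]", text) + ["EOF"]

class ExpParser:
    """Recursive-descent parser for
         exp    -> factor exp'
         exp'   -> - factor exp' | ε
         factor -> INTLITERAL | ( exp )
    Each method returns the value of what it parsed."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def match(self, expected: str) -> None:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, saw {self.peek()!r}")
        self.pos += 1

    def exp(self) -> int:
        return self.exp_prime(self.factor())

    def exp_prime(self, left: int) -> int:
        # 'left' is the value of everything parsed so far, so 5-3-2 means (5-3)-2
        if self.peek() == "-":
            self.match("-")
            return self.exp_prime(left - self.factor())
        return left                          # exp' -> ε

    def factor(self) -> int:
        tok = self.peek()
        if tok == "(":
            self.match("(")
            value = self.exp()
            self.match(")")
            return value
        if tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

def evaluate(text: str) -> int:
    parser = ExpParser(text)
    value = parser.exp()
    parser.match("EOF")                      # all input must be consumed
    return value
```

Passing the value parsed so far into exp_prime keeps subtraction left-associative, as the original left-recursive grammar intends, even though the rewritten grammar's parse tree leans to the right: evaluate("5-3-2") gives 0, not 4.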
Left Factored
A grammar is NOT left-factored if a non-terminal has two productions whose bodies have a common prefix, e.g.
exp → ( exp ) | ( )
A top-down predictive parser would not know which production rule to use when it sees the input token (
Left Factoring
Given a pair of productions:
A → α β1 | α β2
α is a sequence of terminals and non-terminals
β1 and β2 are sequences of terminals and non-terminals that don't have a common prefix (either may be ε)
Change to:
A → α A'
A' → β1 | β2
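Left factoring can be sketched the same way, assuming bodies are tuples of symbols, ε is the empty tuple, and the new non-terminal gets a prime. The helpers common_prefix and left_factor are hypothetical names; left_factor factors the longest prefix shared by all the bodies it is given.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]          # a production body; () stands for ε

def common_prefix(bodies: List[Body]) -> Body:
    """Longest sequence of symbols that every body starts with."""
    prefix: List[str] = []
    for column in zip(*bodies):                  # i-th symbol of every body
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)

def left_factor(head: str, bodies: List[Body]) -> Dict[str, List[Body]]:
    """Rewrite A -> α β1 | α β2 | ... as A -> α A' and A' -> β1 | β2 | ..."""
    if len(bodies) < 2:
        return {head: list(bodies)}              # nothing to factor
    alpha = common_prefix(bodies)
    if not alpha:
        return {head: list(bodies)}              # no common prefix
    new_head = head + "'"                        # assumes A' is not already used
    return {
        head: [alpha + (new_head,)],
        new_head: [body[len(alpha):] for body in bodies],   # a remainder may be () = ε
    }
```

On exp → ( exp ) | ( ) this yields exp → ( exp' and exp' → exp ) | ), and when one body is exactly the prefix its remainder is ε.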
Left Factoring Example
So the grammar
exp → ( exp ) | ( )
becomes
exp → ( exp'
exp' → exp ) | )
You Try It
Remove left recursion and do left factoring for the grammar
exp → ( exp ) | exp exp | ( )
Building Parse Tables
Recall a parse table:
Every row is a non-terminal
Every column is an input token (plus EOF)
Every cell contains a production body
If any cell contains more than one production body, then the grammar is not LL(1)
To build the parse table we need the FIRST sets and the FOLLOW sets
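FIRST set
FIRST(α)
α is some sequence of terminals and non-terminals
FIRST(α) is the set of terminals that begin the strings derivable from α
If α can derive ε, then ε is in FIRST(α)
$\mathrm{FIRST}(\alpha) = \{\, t \mid t \text{ is a terminal and } \alpha \Rightarrow^{*} t\beta \,\} \cup \{\, \varepsilon \mid \alpha \Rightarrow^{*} \varepsilon \,\}$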
FIRST(X)
X is a single terminal, non-terminal, or ε
FIRST(X) = {X}     // X is a terminal
FIRST(X) = {ε}     // X is ε
FIRST(X) =         // X is a non-terminal:
Look at all production rules with X as head
For each production rule X → Y1 Y2 … Yn:
Put FIRST(Y1) − {ε} into FIRST(X).
If ε is in FIRST(Y1), then put FIRST(Y2) − {ε} into FIRST(X).
If ε is in both FIRST(Y1) and FIRST(Y2), then put FIRST(Y3) − {ε} into FIRST(X).
etc...
If ε is in FIRST(Yi) for all 1 <= i <= n (every symbol of the production's right-hand side can derive ε), then put ε into FIRST(X).
Repeat until no FIRST set changes.
Example FIRST Sets
Compute FIRST sets for each non-terminal:
exp    → term exp'              FIRST(exp)    = { INTLITERAL, ( }
exp'   → - term exp' | ε        FIRST(exp')   = { -, ε }
term   → factor term'           FIRST(term)   = { INTLITERAL, ( }
term'  → / factor term' | ε     FIRST(term')  = { /, ε }
factor → INTLITERAL | ( exp )   FIRST(factor) = { INTLITERAL, ( }
FIRST(α) for any α
α is of the form X1 X2 … Xn, where each X is a terminal, non-terminal, or ε
1. Put FIRST(X1) − {ε} into FIRST(α).
2. If ε is in FIRST(X1), put FIRST(X2) − {ε} into FIRST(α).
3. etc...
4. If ε is in the FIRST set of every Xi, put ε into FIRST(α).
Example FIRST sets for rules
FIRST( term exp' )        = { INTLITERAL, ( }
FIRST( - term exp' )      = { - }
FIRST( ε )                = { ε }
FIRST( factor term' )     = { INTLITERAL, ( }
FIRST( / factor term' )   = { / }
FIRST( ε )                = { ε }
FIRST( INTLITERAL )       = { INTLITERAL }
FIRST( ( exp ) )          = { ( }
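The FIRST rules above can be run as a fixed-point iteration. This sketch assumes a grammar is a dict from head to a list of body tuples, ε bodies are empty tuples, and the string "ε" marks epsilon inside a FIRST set; GRAMMAR, first_of_seq and first_sets are hypothetical names. GRAMMAR is the exp/term/factor grammar from the examples.

```python
from typing import Dict, List, Set, Tuple

EPS = "ε"                                  # epsilon marker inside FIRST sets
Body = Tuple[str, ...]                      # () is the epsilon body
Grammar = Dict[str, List[Body]]             # head -> list of bodies

GRAMMAR: Grammar = {
    "exp":    [("term", "exp'")],
    "exp'":   [("-", "term", "exp'"), ()],
    "term":   [("factor", "term'")],
    "term'":  [("/", "factor", "term'"), ()],
    "factor": [("INTLITERAL",), ("(", "exp", ")")],
}

def first_of_seq(seq: Body, first: Dict[str, Set[str]], grammar: Grammar) -> Set[str]:
    """FIRST(X1 X2 ... Xn) from the current FIRST sets of the non-terminals."""
    result: Set[str] = set()
    for sym in seq:
        f = first[sym] if sym in grammar else {sym}   # a terminal's FIRST is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                                    # every Xi derives ε (or seq is empty)
    return result

def first_sets(grammar: Grammar) -> Dict[str, Set[str]]:
    """Apply the FIRST(X) rules repeatedly until no set changes."""
    first: Dict[str, Set[str]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_seq(body, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first
```

The loop is needed because a FIRST set can depend on sets that are still incomplete (here FIRST(exp) waits for FIRST(term), which waits for FIRST(factor)); sets only grow, so the iteration terminates. first_of_seq also gives the FIRST sets of rule bodies, e.g. FIRST( - term exp' ) = { - }.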
Why Do We Care about FIRST(α)?
During parsing, suppose the top-of-stack symbol is non-terminal A, that there are two productions:
A → α
A → β
and that the current token is x
If x is in FIRST(α), then use the first production
If x is in FIRST(β), then use the second production
(If x is in both, the grammar is not LL(1).)
FOLLOW(A) sets
Only defined for single non-terminals, A
The set of terminals that can appear immediately to the right of A in some string derived from the start symbol (may include EOF but never ε)
Calculating FOLLOW(A)
If A is the start non-terminal, put EOF in FOLLOW(A)
Find productions with A in the body:
For each production X → α A β:
    put FIRST(β) − {ε} in FOLLOW(A)
    if ε is in FIRST(β), put FOLLOW(X) into FOLLOW(A)
For each production X → α A:
    put FOLLOW(X) into FOLLOW(A)
Repeat until no FOLLOW set changes
FIRST and FOLLOW sets
To compute FIRST(A) you must look for A

on a production's left-hand side.
To compute FOLLOW(A) you must look for
A on a production's right-hand side.
FIRST and FOLLOW sets are always sets of terminals (plus, perhaps, ε for FIRST sets, and EOF for FOLLOW sets).
Non-terminals are never in a FIRST or a FOLLOW set.
Example FOLLOW sets
CAPS are non-terminals and lower-case are terminals
S → Bc | DB
B → ab | cS
D → d | ε
X     FIRST(X)       FOLLOW(X)
------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }
Note: FOLLOW of the start symbol S always includes EOF
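The FOLLOW rules can be checked against this table with a short fixed-point sketch. It assumes the same representation as before (bodies are tuples, ε is the empty tuple, "ε" marks epsilon in a set) and copies the FIRST sets of the non-terminals from the table; GRAMMAR, FIRST, first_of_seq and follow_sets are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

EPS, EOF = "ε", "EOF"
Body = Tuple[str, ...]                      # () is the epsilon body
Grammar = Dict[str, List[Body]]             # head -> list of bodies

# S -> Bc | DB ; B -> ab | cS ; D -> d | ε   (lower-case symbols are terminals)
GRAMMAR: Grammar = {
    "S": [("B", "c"), ("D", "B")],
    "B": [("a", "b"), ("c", "S")],
    "D": [("d",), ()],
}
# FIRST sets of the non-terminals, copied from the table above
FIRST: Dict[str, Set[str]] = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}

def first_of_seq(seq: Body) -> Set[str]:
    result: Set[str] = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})           # a terminal's FIRST set is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    return result | {EPS}                   # seq is empty or every symbol derives ε

def follow_sets(grammar: Grammar, start: str) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {a: set() for a in grammar}
    follow[start].add(EOF)                  # rule for the start non-terminal
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue            # only non-terminals have FOLLOW sets
                    rest = first_of_seq(body[i + 1:])   # FIRST(β) for X -> α A β
                    new = rest - {EPS}
                    if EPS in rest:         # β is empty or can derive ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

FOLLOW(S) and FOLLOW(B) feed each other here (B → cS and S → DB), which is why the computation repeats until nothing changes.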
You Try It
Compute FIRST and FOLLOW sets for:
methodHeader      → VOID ID LPAREN paramList RPAREN
paramList         → ε
paramList         → nonEmptyParamList
nonEmptyParamList → ID ID
nonEmptyParamList → ID ID COMMA nonEmptyParamList
Remember you need FIRST and FOLLOW sets for all non-terminals and FIRST sets for all bodies of rules
Parse Table
(Diagram: rows are labeled by non-terminals, columns by the current token, and each cell holds a rule body.)
Parse Table Construction Algorithm
for each production X → α:
    for each terminal t in FIRST(α):
        put α in Table[X, t]
    if ε is in FIRST(α) then:
        for each terminal t in FOLLOW(X):    // FOLLOW(X) may include EOF
            put α in Table[X, t]
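A sketch of this construction that also reports LL(1) conflicts (cells holding more than one body). It assumes the FIRST sets of the bodies and the FOLLOW sets are already computed and passed in as dicts, with ε bodies as empty tuples; build_table and conflicts are hypothetical names. The demonstration uses the balanced-parentheses grammar, whose sets work out by hand to FIRST(( S )) = {(}, FIRST([ S ]) = {[}, FIRST(ε) = {ε}, FOLLOW(S) = { ), ], EOF }.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

EPS, EOF = "ε", "EOF"
Body = Tuple[str, ...]                      # () is the epsilon body
Cell = Tuple[str, str]                      # (non-terminal, token)

def build_table(productions: List[Tuple[str, Body]],
                first_of_body: Dict[Body, Set[str]],
                follow: Dict[str, Set[str]]) -> Dict[Cell, List[Body]]:
    """Fill Table[X, t]; a cell may collect several bodies if the grammar is not LL(1)."""
    table: DefaultDict[Cell, List[Body]] = defaultdict(list)
    for head, body in productions:
        first = first_of_body[body]
        for t in first - {EPS}:
            table[(head, t)].append(body)
        if EPS in first:
            for t in follow[head]:          # includes EOF when it is in FOLLOW
                table[(head, t)].append(body)
    return dict(table)

def conflicts(table: Dict[Cell, List[Body]]) -> Dict[Cell, List[Body]]:
    """Cells with more than one body; empty means the grammar is LL(1)."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}

# S -> ( S ) | [ S ] | ε
PRODS = [("S", ("(", "S", ")")), ("S", ("[", "S", "]")), ("S", ())]
FIRST_OF_BODY = {("(", "S", ")"): {"("}, ("[", "S", "]"): {"["}, (): {EPS}}
FOLLOW = {"S": {")", "]", EOF}}
PAREN_TABLE = build_table(PRODS, FIRST_OF_BODY, FOLLOW)
```

PAREN_TABLE reproduces the balanced-parentheses parse table, with no conflicts. Running the same function on S → ( S ) | [ S ] | ( ) | [ ] puts two bodies in the ( and [ cells, confirming that grammar is not LL(1).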
Example Parse Table Construction
S → Bc | DB
B → ab | cS
D → d | ε
For this grammar:
Construct the FIRST and FOLLOW sets
Apply the algorithm to calculate the parse table
Example Parse Table Construction
X     FIRST(X)       FOLLOW(X)
--------------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }

Body    FIRST(body)
Bc      { a, c }
DB      { d, a, c }
ab      { a }
cS      { c }
d       { d }
ε       { ε }
Parse Table
       a          b     c          d     EOF
S      Bc, DB           Bc, DB     DB
B
D
Finish Filling In Table
