
CPSC 388 Compiler Design and Construction
Implementing a Parser
LL(1) and LALR Grammars
FBI Noon Dining Hall Vicki Anderson Recruiter

Announcements

PROG 3 out, due Oct 9th

Get started NOW!

HW due Friday
HW6 posted, due next Friday

Parsing using CFGs

Algorithms can parse using CFGs in O(n^3) time (n is the number of characters in the input stream): TOO SLOW
Subclasses of grammars can be parsed in O(n) time

LL(1)
1 token of look-ahead
Do a leftmost derivation
Scan input from left to right

LALR(1)
one token of look-ahead
do a rightmost derivation in reverse
scan the input left-to-right
LA means "look-ahead" (nothing to do with the number of tokens)

LALR(1)

More general than LL(1) grammars in practice
(every LL(1) grammar is an LR(1) grammar, and almost every LL(1) grammar met in practice is also LALR(1), but not vice versa)

Class of grammars used by java_cup, Bison, YACC
Parsed bottom-up
(start with terminals and build the tree from the leaves up to the root)

Covered in text sections 4.6-4.7
For this class you only need to understand the details of LL(1) grammars

LL(1) Grammars Predictive Parsers

Build the parse tree top-down
(actually discover the tree top-down; don't actually build it)
Keep track of work to be done using a stack
Scanned tokens along with the stack correspond to the leaves of the incomplete tree
Use a parse table to decide how to parse the input

Rows are non-terminals


Columns are tokens (plus EOF token)
Cells are the bodies of production rules

Predictive Parser Algorithm


s.push(EOF)     // special EOF terminal
s.push(start)   // start is the start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else if x is terminal: error
    else if table[x][t] == empty: error
    else:
        let body = table[x][t]   // body of production
        output x → body
        s.pop()
        s.push(body)             // push body from right to left
    x = s.peek()

Example Parse using algorithm

Consider the language of balanced parentheses and brackets, e.g. ([])
Input string is ([])EOF
Grammar:
S → ε | ( S ) | [ S ]

Parse Table:
      (        )     [        ]     EOF
S     ( S )    ε     [ S ]    ε     ε
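The predictive parser algorithm can be sketched in Python for this example. This is a minimal sketch, not a reference implementation: the names `parse`, `TABLE`, `EOF` and `EPS` are illustrative, tokens are taken to be single characters, and the final check for leftover input is an addition that does not appear in the slide pseudocode.

```python
EOF = "EOF"   # end-of-input marker, as on the slides
EPS = ()      # the empty body (epsilon) is represented by an empty tuple

# Parse table for S -> epsilon | ( S ) | [ S ]; a missing cell means "error".
TABLE = {
    "S": {
        "(": ("(", "S", ")"),
        "[": ("[", "S", "]"),
        ")": EPS,
        "]": EPS,
        EOF: EPS,
    }
}


def parse(tokens, table=TABLE, start="S"):
    """Return the list of productions used, in order, or raise SyntaxError."""
    tokens = list(tokens) + [EOF]
    pos = 0
    stack = [EOF, start]
    output = []
    while stack[-1] != EOF:
        x, t = stack[-1], tokens[pos]
        if x == t:                        # terminal on top matches the input
            stack.pop()
            pos += 1
        elif x not in table:              # x is a terminal that does not match
            raise SyntaxError(f"expected {x!r}, got {t!r}")
        elif t not in table[x]:           # empty table cell
            raise SyntaxError(f"unexpected {t!r} while expanding {x}")
        else:
            body = table[x][t]
            output.append((x, body))      # "output x -> body"
            stack.pop()
            stack.extend(reversed(body))  # push body from right to left
    if tokens[pos] != EOF:                # extra check, not in the slide pseudocode
        raise SyntaxError(f"trailing input at {tokens[pos]!r}")
    return output


for head, body in parse("([])"):
    print(head, "->", " ".join(body) or "epsilon")
```

Using a plain list as the stack keeps the top of the stack at the end of the list, so pushing the reversed body leaves the leftmost symbol of the body on top.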

Not All Grammars LL(1)

Not all grammars are LL(1):
S → ( S ) | [ S ] | ( ) | [ ]
If the input is ( we don't know which rule to use!
Try input [[]] on the LL(1) grammar from the previous example, using the predictive parser
Draw a trace with columns: input seen so far, stack, action taken

Is a Grammar LL(1)?

Given a grammar, how do you tell if it is LL(1)?
How do you build the parse table?
If the parse table is built and there is at most one entry per cell, then the grammar is LL(1)

Non-LL(1) Grammars

A grammar is not LL(1) if it is left-recursive
A grammar is not LL(1) if it is not left-factored
It is sometimes possible to change a grammar to remove left-recursion and to make it left-factored

Left-Recursion

Grammar g is recursive if there exists a non-terminal x such that:

x ⇒* α x β     Recursive
x ⇒* x β       Left recursive
x ⇒* α x       Right recursive

Removing Immediate Left-Recursion

Consider the grammar
A → A α | β
A is a nonterminal
α is a sequence of terminals and/or nonterminals
β is a sequence of terminals and/or nonterminals not starting with A

Replace the production with
A → β A'
A' → α A' | ε

The two grammars are equivalent (they recognize the same set of input strings)
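The replacement rule can be written as a small Python function. This is a sketch under stated assumptions: a body is a tuple of symbol names, the empty tuple stands for ε, the new non-terminal is named by appending `'` to the head (assumed not to clash with an existing name), and the `list` grammar in the usage line is a made-up example, not one from the course.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    bodies is a list of tuples of symbols; () stands for epsilon.
    Returns a dict mapping each head to its list of bodies.
    """
    new_head = head + "'"
    # Bodies of the form A alpha: keep only alpha.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # Bodies that do not start with A are the betas.
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:                      # nothing to do
        return {head: list(bodies)}
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


# Hypothetical grammar: list -> list , ID | ID
print(remove_immediate_left_recursion("list", [("list", ",", "ID"), ("ID",)]))
```

The function only handles immediate left recursion within one non-terminal, which is the case described on this slide.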

You Try it

Remove left recursion from the grammar:
exp → exp - factor | factor
factor → INTLITERAL | ( exp )

Construct the parse tree using the original grammar and the new grammar using input 53-2
In general, removing left recursion is more difficult than this; see text section 4.3.3

Left Factored

A grammar is NOT left-factored if a non-terminal has two productions whose bodies have a common prefix
exp → ( exp ) | ( )

A top-down predictive parser would not know which production rule to use when seeing the input character (

Left Factoring

Given a pair of productions:
A → α β1 | α β2
α is a sequence of terminals and non-terminals
β1 and β2 are sequences of terminals and non-terminals that don't have a common prefix (either may be epsilon)

Change to:
A → α A'
A' → β1 | β2

Left Factoring Example

So for the grammar
exp → ( exp ) | ( )

it becomes
exp → ( exp'
exp' → exp ) | )
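Left factoring of one non-terminal can be sketched in Python as follows. The function name `left_factor` and the tuple representation of bodies are assumptions for illustration, and the sketch only factors a prefix shared by all bodies of the non-terminal, which covers the pair-of-productions case on the slides.

```python
def left_factor(head, bodies):
    """Factor out the longest prefix shared by ALL bodies of one non-terminal.

    bodies is a list of tuples of symbols; () stands for epsilon.
    Returns a dict mapping each head to its list of bodies.
    """
    prefix = []
    for symbols in zip(*bodies):        # walk the bodies position by position
        if len(set(symbols)) != 1:      # the bodies disagree here: prefix ends
            break
        prefix.append(symbols[0])
    if not prefix or len(bodies) < 2:   # nothing to factor
        return {head: list(bodies)}
    k = len(prefix)
    new_head = head + "'"               # assumed not to clash with another name
    return {
        head: [tuple(prefix) + (new_head,)],
        new_head: [body[k:] for body in bodies],
    }


# exp -> ( exp ) | ( )
print(left_factor("exp", [("(", "exp", ")"), ("(", ")")]))
```

For the slide's grammar this yields `exp → ( exp'` and `exp' → exp ) | )`.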

You Try It

Remove left recursion and do left factoring for the grammar
exp → ( exp ) | exp exp | ( )

Building Parse Tables

Recall a parse table
Every row is a non-terminal
Every column is an input token
Every cell contains a production body

If any cell contains more than one production body then the grammar is not LL(1)
To build the parse table you need the FIRST sets and FOLLOW sets

FIRST set

FIRST(α)
α is some sequence of terminals and nonterminals
FIRST(α) is the set of terminals that begin the strings derivable from α
If α can derive ε, then ε is in FIRST(α)

FIRST(α) = { t | (t is a terminal and α ⇒* t β) or (t = ε and α ⇒* ε) }

FIRST(X)

X is a single terminal, non-terminal or ε

FIRST(X) = {X}   // X is a terminal
FIRST(X) = {ε}   // X is ε
FIRST(X) = ...   // X is a non-terminal: computed as follows

Look at all production rules with X as head
For each production rule X → Y1 Y2 … Yn
Put FIRST(Y1) - {ε} into FIRST(X).
If ε is in FIRST(Y1), then put FIRST(Y2) - {ε} into FIRST(X).
If ε is also in FIRST(Y2), then put FIRST(Y3) - {ε} into FIRST(X).
etc...
If ε is in FIRST(Yi) for every i from 1 to n (all symbols of the production's right-hand side), then put ε into FIRST(X).

Example FIRST Sets

Compute FIRST sets for each nonterminal:

Production                           FIRST of the head
exp    → term exp'                   { INTLITERAL, ( }
exp'   → - term exp' | ε             { -, ε }
term   → factor term'                { INTLITERAL, ( }
term'  → / factor term' | ε          { /, ε }
factor → INTLITERAL | ( exp )        { INTLITERAL, ( }
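The FIRST rules for non-terminals depend on each other, so one way to apply them is to repeat them until no set changes. The Python sketch below does this for the exp/term/factor grammar; the dictionary layout, the name `first_sets`, and the use of an empty list for an ε body are assumptions made for the illustration.

```python
EPS = "ε"

# Each non-terminal maps to its list of bodies; [] is the epsilon body.
GRAMMAR = {
    "exp":    [["term", "exp'"]],
    "exp'":   [["-", "term", "exp'"], []],
    "term":   [["factor", "term'"]],
    "term'":  [["/", "factor", "term'"], []],
    "factor": [["INTLITERAL"], ["(", "exp", ")"]],
}


def first_sets(grammar):
    """Compute FIRST(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_nullable = True
                for sym in body:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break           # later symbols cannot contribute
                if all_nullable:        # every Yi can derive epsilon
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for nt, symbols in first_sets(GRAMMAR).items():
    print(nt, sorted(symbols))
```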

FIRST(α) for any α

α is of the form X1 X2 … Xn, where each X is a terminal, non-terminal or ε

1. Put FIRST(X1) - {ε} into FIRST(α)
2. If ε is in FIRST(X1), put FIRST(X2) - {ε} into FIRST(α).
3. etc...
4. If ε is in the FIRST set for every Xi, put ε into FIRST(α).

Example FIRST sets for rules


FIRST( term exp' )         = { INTLITERAL, ( }
FIRST( - term exp' )       = { - }
FIRST( ε )                 = { ε }
FIRST( factor term' )      = { INTLITERAL, ( }
FIRST( / factor term' )    = { / }
FIRST( ε )                 = { ε }
FIRST( INTLITERAL )        = { INTLITERAL }
FIRST( ( exp ) )           = { ( }
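FIRST of a whole production body can be computed from the FIRST sets of the individual symbols. The following Python sketch assumes the non-terminal FIRST sets of the exp/term/factor grammar are already known and stored in a dictionary named `FIRST`; that name and `first_of_sequence` are illustrative choices.

```python
EPS = "ε"

# FIRST sets of the non-terminals of the exp/term/factor grammar.
FIRST = {
    "exp":    {"INTLITERAL", "("},
    "exp'":   {"-", EPS},
    "term":   {"INTLITERAL", "("},
    "term'":  {"/", EPS},
    "factor": {"INTLITERAL", "("},
}


def first_of_sequence(symbols, first=FIRST):
    """FIRST(X1 X2 ... Xn); an empty sequence stands for epsilon."""
    result = set()
    for sym in symbols:
        # A symbol without an entry is a terminal: its FIRST set is itself.
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result               # Xi cannot vanish, so stop here
    result.add(EPS)                     # every Xi can derive epsilon
    return result


print(sorted(first_of_sequence(["-", "term", "exp'"])))
print(sorted(first_of_sequence(["term", "exp'"])))
```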

Why Do We Care about FIRST()?

During parsing, suppose the top-of-stack symbol is nonterminal A, that there are two productions:
A → α
A → β
and that the current token is x

If x is in FIRST(α) then use the first production
If x is in FIRST(β) then use the second production

FOLLOW(A) sets

Only defined for single non-terminals, A
The set of terminals that can appear immediately to the right of A (may include EOF but never ε)

Calculating FOLLOW(A)

If A is the start non-terminal, put EOF in FOLLOW(A)
Find productions with A in the body:

For each production X → α A β
put FIRST(β) - {ε} in FOLLOW(A)
If ε is in FIRST(β), put FOLLOW(X) into FOLLOW(A)

For each production X → α A
put FOLLOW(X) into FOLLOW(A)

FIRST and FOLLOW sets

To compute FIRST(A) you must look for A on a production's left-hand side.
To compute FOLLOW(A) you must look for A on a production's right-hand side.
FIRST and FOLLOW sets are always sets of terminals (plus, perhaps, ε for FIRST sets, and EOF for FOLLOW sets).
Nonterminals are never in a FIRST or a FOLLOW set.

Example FOLLOW sets


CAPS are non-terminals and lower-case are terminals
S → B c | D B
B → a b | c S
D → d | ε

X     FIRST(X)        FOLLOW(X)
---------------------------------
D     { d, ε }        { a, c }
B     { a, c }        { c, EOF }
S     { a, c, d }     { EOF, c }
Note: FOLLOW of S always includes EOF
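The FOLLOW rules can also be iterated until nothing changes. The Python sketch below does so for the S/B/D grammar, taking the FIRST sets of the non-terminals as given; the names `follow_sets`, `first_of_sequence`, `GRAMMAR` and `FIRST` are assumptions made for this illustration.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [["B", "c"], ["D", "B"]],
    "B": [["a", "b"], ["c", "S"]],
    "D": [["d"], []],                   # [] is the epsilon body
}
# FIRST sets of the non-terminals, taken as already computed.
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}


def first_of_sequence(symbols):
    """FIRST of a sequence of symbols; the empty sequence yields {epsilon}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, start):
    """Compute FOLLOW(A) for every non-terminal A by iterating to a fixed point."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:          # only non-terminals have FOLLOW
                        continue
                    rest = first_of_sequence(body[i + 1:])   # FIRST(beta)
                    new = rest - {EPS}
                    if EPS in rest:                 # beta can vanish (or is empty)
                        new |= follow[head]
                    if not new.issubset(follow[sym]):
                        follow[sym] |= new
                        changed = True
    return follow


for nt, symbols in follow_sets(GRAMMAR, "S").items():
    print(nt, sorted(symbols))
```

Treating an empty β as a sequence whose FIRST set is {ε} lets one loop cover both the X → α A β and the X → α A cases.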

You Try It

Compute FIRST and FOLLOW sets for:
methodHeader → VOID ID LPAREN paramList RPAREN
paramList → epsilon
paramList → nonEmptyParamList
nonEmptyParamList → ID ID
nonEmptyParamList → ID ID COMMA nonEmptyParamList

Remember you need FIRST and FOLLOW sets for all non-terminals and FIRST sets for all bodies of rules

Parse Table
[Figure: parse table layout. Columns are labelled by the current token, rows by non-terminals, and each cell holds a rule body.]

Parse Table Construction Algorithm


for each production X → α:
    for each terminal t in First(α):
        put α in Table[X,t]
    if ε is in First(α) then:
        for each terminal t in Follow(X):
            put α in Table[X,t]

Example Parse Table Construction


S → B c | D B
B → a b | c S
D → d | ε
For this grammar:
Construct FIRST and FOLLOW sets
Apply the algorithm to calculate the parse table

Example Parse Table Construction


X       FIRST(X)        FOLLOW(X)
-----------------------------------
D       { d, ε }        { a, c }
B       { a, c }        { c, EOF }
S       { a, c, d }     { EOF, c }
B c     { a, c }
D B     { d, a, c }
a b     { a }
c S     { c }
d       { d }
ε       { ε }

Parse Table
      a           b     c           d      EOF
S     B c, D B          B c, D B    D B
B
D

Finish filling in the table
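One way to check a hand-filled table is to run the construction algorithm mechanically. The Python sketch below applies it to the S/B/D grammar with the FIRST and FOLLOW sets of the non-terminals taken as given; `build_table`, `conflicts` and the dictionary names are illustrative assumptions, and a cell is stored as a list so that multiple entries stay visible.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [["B", "c"], ["D", "B"]],
    "B": [["a", "b"], ["c", "S"]],
    "D": [["d"], []],                   # [] is the epsilon body
}
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}
FOLLOW = {"S": {EOF, "c"}, "B": {"c", EOF}, "D": {"a", "c"}}


def first_of_sequence(symbols):
    """FIRST of a production body; the empty body yields {epsilon}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    """Map (non-terminal, token) to the list of bodies placed in that cell."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            first = first_of_sequence(body)
            targets = first - {EPS}
            if EPS in first:                # body can vanish: use FOLLOW(head)
                targets |= FOLLOW[head]
            for t in targets:
                table.setdefault((head, t), []).append(tuple(body))
    return table


def conflicts(table):
    """Cells with more than one body; any such cell means not LL(1)."""
    return {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}


table = build_table(GRAMMAR)
print(sorted(conflicts(table)))   # cells holding more than one body
```

Because the cells for (S, a) and (S, c) each receive two bodies, this grammar is not LL(1) under the one-entry-per-cell test.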

Predictive Parser Algorithm


s.push(EOF)     // special EOF terminal
s.push(start)   // start is the start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else if x is terminal: error
    else if table[x][t] == empty: error
    else:
        let body = table[x][t]   // body of production
        output x → body
        s.pop()
        s.push(body)             // push body from right to left
    x = s.peek()
