
Syntax Analysis (Parsing)

Phases of a Compiler

Source Code → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Object Code

[Figure: the phases above in sequence, with the Symbol Table Manager and the Error Handler shown alongside them]

Parsing Overview

What is syntax?
The way in which words are put together to form phrases, clauses, or sentences.

The function of a parser:
    Input: sequence of tokens from the lexical analyzer
    Output: parse tree of the program

The parser checks the stream of words (tokens) and their parts of speech for grammatical correctness.
It determines whether the input is syntactically well formed.
It guides context-sensitive (semantic) analysis, such as type checking.
Finally, it builds an intermediate representation (IR) of the source program.
The parser ensures that the sentences of a programming language that make up a program abide by the syntax of the language.
If there are errors, the parser detects and reports them.

Consider the following code segment, which contains a number of syntax errors:

    int* foo(int i, int j))
    {
        for(k=0; i j; )
            if( i > j )
                return j;
    }

Every individual token in this code is valid, so a scanner based on regular expressions cannot detect these syntax errors.

Errors in the previous code:

Line 1 has an extra closing parenthesis at the end.
The boolean expression in the for loop in line 3 is incorrect: i j has no operator between its operands.
All such errors arise because the function does not abide by the grammar of the C++ language.
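
The claim that a regular-expression scanner lets these errors through can be checked with a small sketch. The token patterns below are illustrative assumptions, not the full C++ lexical grammar, and `scan` is a hypothetical helper name.

```python
import re

# Illustrative token patterns (an assumption, not the real C++ lexical rules)
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/<>=!]=?|[(){};,]")

def scan(text):
    """Split text into tokens; fail only on characters no pattern matches."""
    leftover = TOKEN_RE.sub("", text)
    if leftover.strip():
        raise ValueError(f"illegal characters: {leftover.strip()!r}")
    return TOKEN_RE.findall(text)

# Both erroneous lines scan without complaint:
print(scan("int* foo(int i, int j))"))   # the extra ')' is still a legal token
print(scan("for(k=0; i j; )"))           # 'i j' is just two identifiers in a row
```

The scanner only asks whether each piece of text is a legal token. Whether the tokens appear in a legal order is a question for the parser.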

Example

Consider a program statement:

    if x == y
        z = 1
    else
        z = 2

Parser input:

    IF ID == ID
    ID = INT
    ELSE
    ID = INT
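
As a rough sketch of where the parser input comes from, the snippet below maps the statement to the token kinds listed above. The regular expression and the `tokenize` helper are assumptions made for illustration; a real lexical analyzer would cover far more token types.

```python
import re

# Keyword table and token kinds follow the slide (IF, ELSE, ID, INT)
KEYWORDS = {"if": "IF", "else": "ELSE"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|=))")

def tokenize(text):
    """Return the list of token kinds for the given source text."""
    text = text.rstrip()
    kinds, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, op = m.groups()
        if number is not None:
            kinds.append("INT")
        elif word is not None:
            kinds.append(KEYWORDS.get(word, "ID"))  # keyword or identifier
        else:
            kinds.append(op)                        # operators stand for themselves
        pos = m.end()
    return kinds

source = "if x == y\n z =1\nelse\n z=2"
print(" ".join(tokenize(source)))
# IF ID == ID ID = INT ELSE ID = INT
```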

Parser output:

    IF-THEN-ELSE
        ==
            ID
            ID
        =
            ID
            INT
        =
            ID
            INT

Example

Java expression:

    x == y ? 1 : 2

Parser input:

    ID == ID ? INT : INT

Parser output:

    ?:
        ==
            ID
            ID
        INT
        INT

Comparison with Lexical Analysis

Phase               Input                     Output
Lexical Analyzer    Sequence of characters    Sequence of tokens
Parser              Sequence of tokens        Parse tree


Scanners
    Task: recognize language tokens
    Implementation: DFA, with transitions based on the next character
Parsers
    Task: recognize language syntax (organization of tokens)
    Implementation: top-down parsing or bottom-up parsing


Role of the Parser

Not all sequences of tokens are programs.
The parser must distinguish between valid and invalid sequences of tokens.

We need:
    A language for describing valid sequences of tokens
    A method for distinguishing valid from invalid sequences of tokens
    An acceptor mechanism that determines whether the input token stream satisfies the syntax of the programming language


Quiz

Write a C++ program that takes a string and displays the type of each token, e.g., identifier, keyword, special character, or digit.


Context-Free Grammars (CFG)

The syntax of most programming languages is specified using context-free grammars (CFGs).
Context-free syntax is specified with a four-tuple grammar G = (S, N, T, P), where
    S is the start symbol (a non-terminal)
    N is a set of non-terminal symbols, which are replaced during a derivation until only terminals remain
    T is a set of terminal symbols (words), which cannot be replaced
    P is a set of productions, or rewrite rules
Parsing is the process of discovering a derivation for some sentence of a language. The mathematical model of syntax is represented by a grammar G.
The language generated by the grammar is denoted L(G).

For example, a context-free grammar for arithmetic expressions is

    1. goal → expr
    2. expr → expr op term | term
    3. term → number | id
    4. op → + | -

For this CFG,
    S = goal
    T = { number, id, +, - }
    N = { goal, expr, term, op }
    P = { 1, 2, 3, 4 }, i.e., all four rules above.
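
One possible way to hold such a grammar in a program is sketched below. The dictionary layout is an assumption chosen for illustration. The second alternative of rule 4 is taken to be the minus sign, as the sentence x + 2 - y derived from this grammar suggests. The operator symbols are counted as terminals because no production rewrites them.

```python
# Productions P, keyed by left-hand side; each alternative is a list of symbols
P = {
    "goal": [["expr"]],
    "expr": [["expr", "op", "term"], ["term"]],
    "term": [["number"], ["id"]],
    "op":   [["+"], ["-"]],   # '-' is an assumption, see the note above
}
S = "goal"      # start symbol
N = set(P)      # non-terminals: every symbol that has a production

# Terminals: symbols that occur on a right-hand side but are never rewritten
T = {sym for alternatives in P.values() for rhs in alternatives for sym in rhs} - N

print(sorted(N))   # ['expr', 'goal', 'op', 'term']
print(sorted(T))   # ['+', '-', 'id', 'number']
```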


Example: Given the above CFG, we can derive the sentence x + 2 - y by repeated substitution.

Production               Result
                         goal
goal → expr              expr
expr → expr op term      expr op term
term → id                expr op y
op → -                   expr - y
expr → expr op term      expr op term - y
term → number            expr op 2 - y
op → +                   expr + 2 - y
expr → term              term + 2 - y
term → id                x + 2 - y


Example: Given S → aS | bS | a | b, derive abbab.

    S ⇒ aS
      ⇒ abS
      ⇒ abbS
      ⇒ abbaS
      ⇒ abbab

Example:
    S → aA | bB
    A → aS | a
    B → bS | b
Derive bbaaaa. [for practice]


Key Idea

1. Begin with a string consisting of the start symbol S.
2. Replace any non-terminal X in the string by the right-hand side of some production X → Y1 … Yn.
3. Repeat step (2) until there are only terminals in the string.
4. The successive strings created in this way are called sentential forms.
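
The steps above can be sketched as a small search that looks for a derivation automatically. This is only an illustration for the grammar S → aS | bS | a | b. The function name `find_derivation`, the breadth-first strategy, and the pruning rule are assumptions, not a general parsing algorithm.

```python
from collections import deque

# Grammar S -> aS | bS | a | b (upper-case letters are non-terminals)
PRODUCTIONS = {"S": ["aS", "bS", "a", "b"]}

def first_nonterminal(form):
    """Index of the left-most non-terminal, or len(form) if there is none."""
    return next((i for i, c in enumerate(form) if c in PRODUCTIONS), len(form))

def find_derivation(target, start="S"):
    """Breadth-first search over sentential forms; returns the forms or None."""
    queue = deque([[start]])                      # step 1: begin with S
    while queue:
        forms = queue.popleft()
        current = forms[-1]
        if current == target:
            return forms                          # step 4: the sentential forms
        idx = first_nonterminal(current)
        if idx == len(current):
            continue                              # only terminals, wrong string
        for rhs in PRODUCTIONS[current[idx]]:     # step 2: replace a non-terminal
            new = current[:idx] + rhs + current[idx + 1:]
            prefix = new[:first_nonterminal(new)]
            # Prune forms that are too long or whose terminal prefix mismatches
            if len(new) <= len(target) and target.startswith(prefix):
                queue.append(forms + [new])       # step 3: repeat
    return None

print(" => ".join(find_derivation("abbab")))
# S => aS => abS => abbS => abbaS => abbab
```

The pruning is safe here only because no production in this grammar shrinks the string; a grammar with ε-productions would need a different bound.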


What is meant by context-free?

A rule that is free of context.
In context-free rules, a non-terminal appears by itself to the left of the arrow:
    A → α
The rule A → α says that A may be replaced by α anywhere, regardless of where A occurs.
On the other hand, we could define a context as a pair of strings β, γ, such that a rule would apply only if β occurs before and γ occurs after A. We would write this as
    βAγ → βαγ
Such a rule is called a context-sensitive grammar rule.


Types of derivations:

Left-most derivation: replace the left-most non-terminal at each step.
Right-most derivation: replace the right-most non-terminal at each step.

Example: Consider E → E + E | E * E | ( E ) | id
Derive the string id * id + id.

Left-most derivation      Right-most derivation
E                         E
⇒ E + E                   ⇒ E + E
⇒ E * E + E               ⇒ E + id
⇒ id * E + E              ⇒ E * E + id
⇒ id * id + E             ⇒ E * id + id
⇒ id * id + id            ⇒ id * id + id
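
The two derivation orders can be replayed mechanically. In this sketch the caller supplies the sequence of right-hand sides to apply, and the hypothetical `derive` helper only decides which occurrence of E gets replaced at each step.

```python
NONTERMINALS = {"E"}

def derive(rule_sequence, leftmost=True, start="E"):
    """Apply each right-hand side to the left-most (or right-most) non-terminal."""
    form = [start]
    forms = [" ".join(form)]
    for rhs in rule_sequence:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = rhs.split()          # substitute the chosen occurrence
        forms.append(" ".join(form))
    return forms

left = derive(["E + E", "E * E", "id", "id", "id"], leftmost=True)
right = derive(["E + E", "id", "E * E", "id", "id"], leftmost=False)

print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in id * id + id; only the order of the intermediate sentential forms differs.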

Parse Tree/Syntax Tree:

A derivation can be represented in a tree-like fashion called a parse tree.
It represents the syntactic structure of a string according to some formal grammar.
It is made up of nodes and branches.
The start symbol is the root, and the derived symbols are its descendant nodes.
The interior nodes contain the non-terminals used during the derivation.
The leaf nodes are the terminals.
Note that the right-most and left-most derivations in the example above have the same parse tree; they differ only in the order in which branches are added.


Derivation: Learn by Example

Example: Given a CFG E → E + E | E * E | ( E ) | id, derive the string id * id + id.

Left-most derivation:

    E ⇒ E + E
      ⇒ E * E + E
      ⇒ id * E + E
      ⇒ id * id + E
      ⇒ id * id + id

Parse tree:

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  id
    |     |
    id    id


Right-most derivation:

    E ⇒ E + E
      ⇒ E + id
      ⇒ E * E + id
      ⇒ E * id + id
      ⇒ id * id + id

Parse tree (identical to the one for the left-most derivation):

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  id
    |     |
    id    id



Abstract Syntax Tree

The parse tree contains a lot of unneeded information. Compilers often use an abstract syntax tree (AST) instead.
An AST is much more concise; it summarizes grammatical structure without the details of the derivation. ASTs are one kind of intermediate representation (IR).
For example, here are the parse tree constructed for id * id + id and the corresponding AST.

Parse Tree:

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  id
    |     |
    id    id

Abstract Syntax Tree:

        +
       / \
      *   id
     / \
    id  id


CFG Ambiguity

A grammar is ambiguous if it generates more than one parse tree for the same string.
Equivalently, there is more than one right-most or left-most derivation for some string.
Ambiguity is bad:
    It leaves the meaning of some programs ill-defined.
Ambiguity is common in programming languages:
    Arithmetic expressions
    IF-THEN-ELSE


Consider E → E + E | E * E | ( E ) | int
We can generate the string int * int + int with two different parse trees.

Tree 1 (+ at the root):

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  int
    |     |
    int   int

Tree 2 (* at the root):

        E
      / | \
     E  *  E
     |   / | \
    int E  +  E
        |     |
       int   int
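
The number of parse trees can also be counted by brute force. The sketch below tries every operator as the top-level split point for the grammar E → E + E | E * E | ( E ) | int. The function name `count_parse_trees` and the memoized recursion are illustrative choices, not part of the original material.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | ( E ) | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                      # ways to derive tokens[i:j] from E
        if j - i <= 0:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "int" else 0
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)  # E -> ( E )
        for k in range(i + 1, j - 1):     # tokens[k] as the top-level operator
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("int * int + int".split()))      # 2
print(count_parse_trees("( int * int ) + int".split()))  # 1
```

A count above 1 for any string is enough to show that the grammar is ambiguous; adding parentheses to the input removes the choice.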


Examples of non-ambiguous CFG:
Consider a CFG for the language PALINDROME:

    S → aSa | bSb | a | b | ε

A palindrome is a word that reads the same from left to right and from right to left, e.g., abba, babab, abbabba.
Try the above CFG and derive parse trees.
Can more than one tree be generated for a single word?
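
One way to explore the question is to count parse trees directly from the productions S → aSa | bSb | a | b | ε. The `count_trees` helper below is a hypothetical sketch that adds up the contribution of every production that could start the derivation of a word.

```python
def count_trees(word):
    """Count parse trees of `word` under S -> aSa | bSb | a | b | epsilon."""
    total = 0
    if word == "":
        total += 1                          # S -> epsilon
    if word in ("a", "b"):
        total += 1                          # S -> a  or  S -> b
    if len(word) >= 2 and word[0] == word[-1] and word[0] in "ab":
        total += count_trees(word[1:-1])    # S -> aSa  or  S -> bSb
    return total

for w in ["abba", "babab", "abbabba", "ab"]:
    print(w, count_trees(w))
```

For any given word at most one production can apply at each step, so the count never exceeds 1. It is 0 for words that are not palindromes, such as ab.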


Ambiguity: The Dangling Else

Consider the grammar

    S → if E then S | if E then S else S | OTHER

This grammar is also ambiguous. How?

The statement

    if E1 then if E2 then S1 else S2

has two different parse trees.

First form (the else belongs to the outer if):

    if E1 then
        if E2 then S1
    else S2

    if
        E1
        if
            E2
            S1
        S2

Second form (the else belongs to the inner if):

    if E1 then
        if E2 then S1
        else S2

    if
        E1
        if
            E2
            S1
            S2

Typically we want the second form because ELSE matches the closest previously unmatched THEN.