
Chapter 3

Describing Syntax
and Semantics

ISBN 0-321-33025-0
Chapter 3 Topics

• 3.1 Introduction
• 3.2 The General Problem of Describing
Syntax
• 3.3 Formal Methods of Describing Syntax
• 3.4 Attribute Grammars
• 3.5 Describing the Meanings of Programs:
Dynamic Semantics

Copyright © 2006 Addison-Wesley. All rights reserved.


3.1 Introduction
• Syntax and semantics provide a language’s
definition
• Syntax: the form or structure of the
expressions, statements, and program
units
• Semantics: the meaning of the expressions,
statements, and program units
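• E.g., consider the loop
    while (x > 20) {
        sum = sum + x;
        x = x + 1;
    }
– Syntax: the keyword while, a parenthesized Boolean expression, and a body (here a block in braces)
– Semantics: as long as x > 20 is true, the body is executed; when it becomes false, control passes to the statement after the loop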



3.2 The General Problem of Describing
Syntax: Terminology

• A language is a set of sentences
• A sentence (or statement) is a string of characters over some alphabet
• A token is a category of lexemes (e.g., identifier)
• A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin)



3.2 The General Problem of Describing
Syntax: Terminology

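• E.g., in the program fragment
    {
      int index, count;
      index = 2 * count + 17;
    }
– index = 2 * count + 17; is a statement of the language
– index, =, 2, *, count, +, 17, and ; are its lexemes
– index and count are lexemes of the token identifier; 2 and 17 are lexemes of the token int literal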
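To make the lexeme/token distinction concrete, here is a minimal Python sketch of a scanner for the statement index = 2 * count + 17;. The token names (int_literal, equal_sign, mult_op, plus_op, semicolon) and the regular expressions that define them are assumptions for illustration; keywords such as int are not treated specially.

```python
import re

# Hypothetical token names; each regular expression matches the lexemes of one token.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # white space separates lexemes but is not one
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (lexeme, token) pairs, scanning left to right."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<6} {token}")
```

Each lexeme is printed next to the token (category) it belongs to; both index and count fall into the same identifier token.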



Formal Definition of Languages

• Recognizers
– A recognition device reads input strings of the
language and decides whether the input strings
belong to the language
– Example: syntax analysis part of a compiler
– Detailed discussion in Chapter 4
• Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular
sentence is correct by comparing it to the
structure of the generator
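The two devices can be contrasted on a deliberately tiny, hypothetical language whose sentences are assignments such as A = B + C over the variables A, B and C. This sketch is only an illustration under that assumption: generate enumerates every sentence of the language, while recognize just answers yes or no for one given string.

```python
import itertools
import re

VARS = ["A", "B", "C"]  # the toy language's only variables (assumption)


def generate():
    """Generator: yield every sentence of the toy language."""
    for target in VARS:
        for left, right in itertools.product(VARS, repeat=2):
            yield f"{target} = {left} + {right}"
            yield f"{target} = {left} - {right}"
        for var in VARS:
            yield f"{target} = {var}"


# Recognizer: accept exactly the strings the generator can produce.
SENTENCE = re.compile(r"[ABC] = [ABC]( [+-] [ABC])?")


def recognize(text):
    """Return True if text belongs to the toy language."""
    return SENTENCE.fullmatch(text) is not None


print(recognize("A = B + C"))  # True
print(recognize("A = B * C"))  # False: * is not part of the language
```

Every string the generator yields is accepted by the recognizer, which is the sense in which both devices define the same language.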



3.3 Formal Methods of Describing
Syntax
• 3.3.1 Backus-Naur Form and Context-
Free Grammars (BNF form)
– Most widely known method for describing
programming language syntax
• 3.3.2 Extended BNF
– Improves readability and writability of BNF



BNF and Context-Free Grammars

• Context-Free Grammars
– Developed by Noam Chomsky in the mid-1950s
– Natural language linguist
– Described four classes of grammars that define
four classes of languages
– Two classes (context-free and regular) turned
out to be useful for describing the syntax of
programming languages



Backus-Naur Form (BNF)
• Backus-Naur Form (1959)
– Invented by John Backus to describe Algol 58,
later modified by Peter Naur
– BNF is equivalent to context-free grammars
– BNF is a metalanguage used to describe
another language
– In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)
• E.g., the rule <assign> -> <var> = <expression> describes assignments such as
a = 2*b;
BNF Fundamentals

• Non-terminals: BNF abstractions


• Terminals: lexemes and tokens
• Grammar: a collection of rules
– Examples of BNF rules:
<ident_list> → identifier | identifier, <ident_list>
<if_stmt> → if <logic_expr> then <stmt>



BNF Rules

• A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols
• A grammar is a finite nonempty set of rules
• An abstraction (or nonterminal symbol) can have more than one RHS
<stmt> → <single_stmt>
| begin <stmt_list> end
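One way to picture "a grammar is a finite set of rules" is to store the rules as data. The sketch below assumes a Python dictionary that maps each LHS to its list of alternative RHSs, and reuses the two example rules from BNF Fundamentals; the helper names are hypothetical.

```python
def is_nonterminal(symbol):
    # Assumption: nonterminals are written in angle brackets, as on these slides.
    return symbol.startswith("<") and symbol.endswith(">")


# A grammar: each LHS nonterminal maps to its list of RHS alternatives.
GRAMMAR = {
    "<ident_list>": [["identifier"], ["identifier", ",", "<ident_list>"]],
    "<if_stmt>": [["if", "<logic_expr>", "then", "<stmt>"]],
}


def symbols(grammar):
    """All symbols that appear on some RHS."""
    return {sym for alts in grammar.values() for rhs in alts for sym in rhs}


def terminals(grammar):
    return {sym for sym in symbols(grammar) if not is_nonterminal(sym)}


def undefined_nonterminals(grammar):
    """Nonterminals used on some RHS that have no rule of their own."""
    return {sym for sym in symbols(grammar) if is_nonterminal(sym)} - set(grammar)


def to_bnf(grammar):
    """Render each rule as LHS → RHS1 | RHS2 | ..."""
    return "\n".join(
        f"{lhs} → " + " | ".join(" ".join(rhs) for rhs in alts)
        for lhs, alts in grammar.items()
    )


print(to_bnf(GRAMMAR))
print(sorted(terminals(GRAMMAR)))
print(sorted(undefined_nonterminals(GRAMMAR)))
```

Because <logic_expr> and <stmt> have no rules here, undefined_nonterminals reports them: these two rules are examples, not a complete grammar.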
Specific Rule for Describing Lists

• Syntactic lists are described using recursion
<ident_list> → ident
| ident, <ident_list>
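A recursive rule translates directly into a recursive function. This sketch assumes the input has already been split into tokens and that an ident is any name matching [A-Za-z_]\w*; each branch checks one RHS of <ident_list>.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")  # assumption: which lexemes count as an ident


def is_ident(token):
    return IDENT.fullmatch(token) is not None


def is_ident_list(tokens):
    """Recognize <ident_list> → ident | ident, <ident_list> over a list of tokens."""
    if len(tokens) == 1:
        return is_ident(tokens[0])              # first RHS: ident
    return (len(tokens) >= 3                     # second RHS: ident , <ident_list>
            and is_ident(tokens[0])
            and tokens[1] == ","
            and is_ident_list(tokens[2:]))      # recursion handles the rest of the list


print(is_ident_list(["a", ",", "b", ",", "c"]))  # True
print(is_ident_list(["a", ","]))                 # False
```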



Grammars and Derivations

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)



An Example Grammar And Derivation

Grammar:
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt>; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>

Derivation of
begin A = B + C; B = C end

<program> => begin <stmt_list> end
=> begin <stmt>; <stmt_list> end
=> begin <var> = <expression>; <stmt_list> end
=> begin A = <expression>; <stmt_list> end
=> begin A = <var> + <var>; <stmt_list> end
=> begin A = B + <var>; <stmt_list> end
=> begin A = B + C; <stmt_list> end
=> begin A = B + C; <stmt> end
=> begin A = B + C; <var> = <expression> end
=> begin A = B + C; B = <expression> end
=> begin A = B + C; B = <var> end
=> begin A = B + C; B = C end
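A derivation like this one can be replayed mechanically. In this sketch the grammar is stored as a Python dictionary, and CHOICES (a hand-picked, hypothetical list) gives the index of the RHS alternative applied to the leftmost nonterminal at each step. It produces a leftmost derivation of begin A = B + C; B = C end, printing every symbol separated by a single space.

```python
# The example grammar: each nonterminal maps to its list of RHS alternatives.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def leftmost_derivation(start, choices):
    """Return the sentential forms obtained by expanding the leftmost
    nonterminal with alternative choices[i] at step i."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(" ".join(form))
    return forms


# Alternative index used at each step (0 = first RHS of the rule being applied).
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
forms = leftmost_derivation("<program>", CHOICES)
print("\n=> ".join(forms))
```

The last form contains only terminals, so it is a sentence; every earlier line is a sentential form.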
Derivation

• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the leftmost
nonterminal in each sentential form is the one that
is expanded
• A derivation can be rightmost or neither leftmost
nor rightmost
• Derivation order has no effect on the language
generated by a grammar



Another Example

Grammar:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| (<expr>)
| <id>

Derivation of “A = B * (A + C)”:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * (<expr>)
=> A = B * (<id> + <expr>)
=> A = B * (A + <expr>)
=> A = B * (A + <id>)
=> A = B * (A + C)
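Derivation order changes the intermediate sentential forms but not the sentence that is reached. The sketch below assumes the grammar of this example is encoded as a dictionary and that the alternative indices are chosen by hand; it replays the leftmost derivation of A = B * (A + C) and a rightmost derivation of the same sentence.

```python
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [["<id>", "+", "<expr>"], ["<id>", "*", "<expr>"],
               ["(", "<expr>", ")"], ["<id>"]],
}


def derive(start, choices, leftmost=True):
    """Replay a derivation: at each step expand the leftmost (or rightmost)
    nonterminal using the alternative index given in choices."""
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(" ".join(form))
    return forms


left = derive("<assign>", [0, 0, 1, 1, 2, 0, 0, 3, 2], leftmost=True)
right = derive("<assign>", [0, 1, 2, 0, 3, 2, 0, 1, 0], leftmost=False)
print("\n=> ".join(left))
print()
print("\n=> ".join(right))
```

Both lists have ten sentential forms and end in the same sentence; only the order in which nonterminals are replaced differs.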



Parse Tree
• A hierarchical representation of a derivation
<assign>
├── <id>
│   └── A
├── =
└── <expr>
    ├── <id>
    │   └── B
    ├── *
    └── <expr>
        ├── (
        ├── <expr>
        │   ├── <id>
        │   │   └── A
        │   ├── +
        │   └── <expr>
        │       └── <id>
        │           └── C
        └── )
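A parse tree can also be built directly from the tokens rather than from a written-out derivation. The following is a minimal recursive-descent sketch for the grammar <assign> → <id> = <expr>, <id> → A | B | C, <expr> → <id> + <expr> | <id> * <expr> | (<expr>) | <id>; representing a node as a tuple (label, child, ...) is an assumption of this illustration.

```python
def parse_assign(tokens):
    """Recursive-descent parser returning the parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1
        return symbol

    def parse_id():
        # <id> → A | B | C
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected A, B or C at position {pos}")
        return ("<id>", expect(peek()))

    def parse_expr():
        if peek() == "(":                                  # ( <expr> )
            return ("<expr>", expect("("), parse_expr(), expect(")"))
        left = parse_id()
        if peek() in ("+", "*"):                           # <id> + <expr> | <id> * <expr>
            return ("<expr>", left, expect(peek()), parse_expr())
        return ("<expr>", left)                            # <id>

    tree = ("<assign>", parse_id(), expect("="), parse_expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


def leaves(tree):
    """The leaves of the tree read left to right (the derived sentence)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


def show(tree, depth=0):
    """Print one node per line, children indented under their parent."""
    print("    " * depth + (tree if isinstance(tree, str) else tree[0]))
    if not isinstance(tree, str):
        for child in tree[1:]:
            show(child, depth + 1)


tokens = "A = B * ( A + C )".split()
tree = parse_assign(tokens)
show(tree)
print(leaves(tree) == tokens)  # True
```

Reading the leaves from left to right gives back the input sentence. One token of lookahead suffices here: an <expr> starting with ( must be the parenthesized alternative, and otherwise the token after <id> decides among the other three.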
Ambiguity in Grammars

• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees



An Ambiguous Expression Grammar
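<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <expr> + <expr>
| <expr> * <expr>
| (<expr>)
| <id>

Two distinct parse trees for A = B + C * A:

a.
<assign>
├── <id>
│   └── A
├── =
└── <expr>
    ├── <expr>
    │   └── <id>
    │       └── B
    ├── +
    └── <expr>
        ├── <expr>
        │   └── <id>
        │       └── C
        ├── *
        └── <expr>
            └── <id>
                └── A

b.
<assign>
├── <id>
│   └── A
├── =
└── <expr>
    ├── <expr>
    │   ├── <expr>
    │   │   └── <id>
    │   │       └── B
    │   ├── +
    │   └── <expr>
    │       └── <id>
    │           └── C
    ├── *
    └── <expr>
        └── <id>
            └── A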
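One way to confirm the ambiguity is to count parse trees. This brute-force sketch counts, for a token list, every way the four <expr> alternatives of the ambiguous grammar can derive it; it takes exponential time and is meant only for short inputs such as the right-hand side of A = B + C * A.

```python
IDS = {"A", "B", "C"}


def count_trees(tokens):
    """Count the distinct parse trees that derive tokens from <expr> in
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    count = 0
    if len(tokens) == 1 and tokens[0] in IDS:                     # <expr> -> <id>
        count += 1
    if len(tokens) >= 2 and tokens[0] == "(" and tokens[-1] == ")":
        count += count_trees(tokens[1:-1])                        # <expr> -> ( <expr> )
    for k, tok in enumerate(tokens):
        if tok in ("+", "*"):                                     # <expr> -> <expr> op <expr>
            count += count_trees(tokens[:k]) * count_trees(tokens[k + 1:])
    return count


# The <assign> and <id> part of A = B + C * A has only one possible tree,
# so the count for the right-hand side is the count for the whole sentence.
print(count_trees("B + C * A".split()))  # 2
```

Explicit parentheses remove the choice: count_trees("( B + C ) * A".split()) is 1 even in this grammar.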
Operator Precedence

• The fact that an operator in an arithmetic expression is generated lower in the parse tree (and therefore must be evaluated first) can be used to indicate that it has precedence over an operator produced higher up in the tree
• In the two parse trees of A = B + C * A from the ambiguous grammar:
– Tree a: A = B + (C * A), so * has precedence over +
– Tree b: A = (B + C) * A, so + has precedence over *
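The practical consequence is that the two trees compute different values. The sketch below keeps only the operators and operands of each tree (a simplification of the full parse trees) and assigns hypothetical values A = 4, B = 2, C = 3.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a tree that is either a variable name or a tuple (left, op, right).
    Both subtrees are evaluated before the operator above them is applied."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


env = {"A": 4, "B": 2, "C": 3}          # hypothetical values for the variables
tree_a = ("B", "+", ("C", "*", "A"))     # tree a: * is lower, so it is applied first
tree_b = (("B", "+", "C"), "*", "A")     # tree b: + is lower, so it is applied first
print(evaluate(tree_a, env))  # 14
print(evaluate(tree_b, env))  # 20
```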



An Unambiguous Expression Grammar

<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <expr> + <term>
| <term>
<term> -> <term> * <factor>
| <factor>
<factor> -> (<expr>)
| <id>

Parse tree for A = B + C * A:
<assign>
├── <id>
│   └── A
├── =
└── <expr>
    ├── <expr>
    │   └── <term>
    │       └── <factor>
    │           └── <id>
    │               └── B
    ├── +
    └── <term>
        ├── <term>
        │   └── <factor>
        │       └── <id>
        │           └── C
        ├── *
        └── <factor>
            └── <id>
                └── A
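A brute-force count of parse trees shows what the extra nonterminals buy. In this sketch each function counts the ways its nonterminal can derive a given token list, following the rules of the unambiguous grammar one alternative at a time; it takes exponential time and is only meant for short inputs.

```python
IDS = {"A", "B", "C"}


def count_expr(tokens):
    # <expr> -> <expr> + <term> | <term>
    count = count_term(tokens)
    for k, tok in enumerate(tokens):
        if tok == "+":
            count += count_expr(tokens[:k]) * count_term(tokens[k + 1:])
    return count


def count_term(tokens):
    # <term> -> <term> * <factor> | <factor>
    count = count_factor(tokens)
    for k, tok in enumerate(tokens):
        if tok == "*":
            count += count_term(tokens[:k]) * count_factor(tokens[k + 1:])
    return count


def count_factor(tokens):
    # <factor> -> ( <expr> ) | <id>
    if len(tokens) == 1 and tokens[0] in IDS:
        return 1
    if len(tokens) >= 2 and tokens[0] == "(" and tokens[-1] == ")":
        return count_expr(tokens[1:-1])
    return 0


print(count_expr("B + C * A".split()))  # 1
```

A + B + C also yields exactly one tree, the one grouping as (A + B) + C, because the rule for <expr> is left recursive.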



Leftmost and rightmost derivations
leftmost:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A

rightmost:
<assign> => <id> = <expr>
=> <id> = <expr> + <term>
=> <id> = <expr> + <term> * <factor>
=> <id> = <expr> + <term> * <id>
=> <id> = <expr> + <term> * A
=> <id> = <expr> + <factor> * A
=> <id> = <expr> + <id> * A
=> <id> = <expr> + C * A
=> <id> = <term> + C * A
=> <id> = <factor> + C * A
=> <id> = <id> + C * A
=> <id> = B + C * A
=> A = B + C * A
Every derivation with an unambiguous grammar has a unique parse
tree, although that tree can be represented by different derivations.
3.3.2 Extended BNF

• Optional parts are placed in brackets [ ]
<selection> -> if (<expression>) <statement> [else <statement>]
• Repetitions (0 or more) are placed inside braces { }
<ident_list> -> <identifier> {, <identifier>}
• When a single element must be chosen from a group, the options are placed in parentheses and separated by the OR operator, |
<term> -> <term> (* | / | %) <factor>
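Two of these extensions correspond closely to regular-expression operators: braces mean zero or more repetitions (like *), and a parenthesized choice means exactly one of the listed options. This sketch assumes that an <identifier> is any name matching [A-Za-z_]\w* and that white space may surround the commas.

```python
import re

IDENT = r"[A-Za-z_]\w*"  # assumption: the lexemes of <identifier>

# <ident_list> -> <identifier> {, <identifier>}
# The braces { } (zero or more repetitions) become the regex operator *.
IDENT_LIST = re.compile(rf"{IDENT}(?:\s*,\s*{IDENT})*")

# (* | / | %): exactly one element of the group is chosen.
MUL_OP = re.compile(r"[*/%]")

# An optional part [ ] would likewise become the regex operator ?.

print(bool(IDENT_LIST.fullmatch("a, b, c")))  # True
print(bool(IDENT_LIST.fullmatch("a,")))       # False
```

This shortcut only works for rules without nesting; a rule such as <expr>, which can contain itself inside parentheses, still needs recursion.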



BNF and EBNF

• BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
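The EBNF form maps naturally onto a recursive-descent parser, because each pair of braces becomes a while loop. The sketch below evaluates integer expressions; the rule for <factor> (an integer or a parenthesized <expr>) is an assumption, since the slide does not define it.

```python
import re


def evaluate(text):
    """Evaluate text with a recursive-descent parser written from the EBNF rules
        <expr>   -> <term> {(+ | -) <term>}
        <term>   -> <factor> {(* | /) <factor>}
        <factor> -> integer | ( <expr> )     (assumed; not defined on the slide)
    """
    # Simplification: characters other than digits, operators and parentheses are skipped.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        value = term()
        while peek() in ("+", "-"):   # the braces {(+ | -) <term>} become a loop
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):   # the braces {(* | /) <factor>} become a loop
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(evaluate("2 + 3 * 4"))  # 14
print(evaluate("8 - 3 - 2"))  # 3
```

Because each loop folds operators into value from left to right, 8 - 3 - 2 is evaluated as (8 - 3) - 2, the same left associativity the left-recursive BNF rules express.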



EBNF variations

• In place of the arrow, a colon is used and the RHS is placed on the next line
• Instead of a vertical bar to separate alternative RHSs, they are simply placed on separate lines
• In place of square brackets to indicate something being optional, the subscript opt is used. E.g.
– ConstructorDeclarator ->
SimpleName(FormalParameterList_opt)
• Rather than using the | symbol in a parenthesized
list of elements to indicate a choice, the words
“one of” are used. E.g.
– AssignmentOperator -> one of
= *= /= %= += -= <<= >>= &= |=
