
Programming Languages and Compiler Design

Programming Language Semantics


Compiler Design Techniques

Yliès Falcone, Frédéric Prost

Master of Science in Informatics at Grenoble (MoSIG)


Grenoble Universités
(Université Joseph Fourier, Grenoble INP)

Academic Year 2011 - 2012


Some practical information
Lecture sessions:
I Yliès Falcone
I Frédéric Prost
Exercise sessions (TD):
I Yliès Falcone
Emails: FirstName.LastName@imag.fr (with no accents)
Web pages:
I http://www.ylies.fr
I http://membres-lig.imag.fr/prost/
I A common Web page with all resources is
forthcoming
Phone numbers:
I YF: +33 4 76 82 72 14
I FP: +33 4 76 63 56 71
Office location: LIG, Buildings C (YF) and D (FP). Those buildings are
in front of or next to the UFR.
Meetings are possible (by appointment)
1/36
Global Course objectives

I Programming languages, and their description:


I syntax
I semantics
I General compiler architecture
I Some more detailed compiler techniques

Basic objective
How to translate a program written in a programming language into a
program executable by a machine

2/36
References

A. Aho, R. Sethi and J. Ullman.
Compilers: Principles, Techniques and Tools.
InterEditions, 1989.

H. R. Nielson and F. Nielson.
Semantics with Applications: An Appetizer.
Springer, March 2007. ISBN 978-1-84628-691-9.

W. Waite and G. Goos.
Compiler Construction.
Springer Verlag, 1984.

R. Wilhelm and D. Maurer.
Compilers: Theory, Construction, Generation.
Masson, 1994.

These lectures are based on a course built by Jean-Claude Fernandez,


Yassine Lakhnech, and Laurent Mounier

3/36
Outline for today

Compilers: some light reminders


Overview of a compiler architecture

Compiling, semantics: two connected themes

Some Maths Reminders


Compilers: what you surely already know. . .

A compiler is a language processor: it transforms a program


I from a language we can understand: the programming language
I to a language the machine can understand: the target language

Source Program

Compiler

Target Code

4/36
Compilers: what you surely already know. . .
Source Program

Pre−processor

Modified Source Program

Compiler

Program in target assembler

Assembler

Relocatable Machine code

Linker-Loader (with Libraries)

Target Code
5/36
Compilers: what do we expect ?

source pgm S (language Ls) → Compiler → target pgm T (language Lt)

Expected Properties?
I correctness:
execution of T should preserve the semantics of S
I efficiency:
T should be optimized w.r.t. some execution resources (time,
memory, energy, etc.)
I “user-friendliness”: errors in S should be accurately reported
I completeness: any correct Ls-program should be accepted

6/36
Many programming language paradigms . . .

Imperative languages
FORTRAN, Algol-xx, Pascal, C, Ada, Java, etc
control structure, (explicit) memory assignment,
expressions, types, . . .
Functional languages
ML, CAML, LISP, Scheme, etc
term reduction, function evaluation, recursion, . . .
Object-oriented languages
Java, Ada, Eiffel, ...
objects, classes, types, inheritance, polymorphism, . . .
Logical languages
Prolog
resolution, unification, predicate calculus, . . .
etc.

7/36
. . . and many architectures to target!

I Complex instruction set computer (CISC)


I Reduced instruction set computer (RISC)
I VLIW, multi-processor architectures
I dedicated processors (DSP, . . . )
I embedded systems (mobile phones, . . . ).
I etc.

8/36
We will mainly focus on:

Imperative languages
I data structures
I basic types (integers, characters, pointers, etc)
I user-defined types (enumeration, unions, arrays, . . . )
I control structures
I assignments
I iterations, conditionals, sequence
I nested blocks, sub-programs
“Standard” general-purpose machine architecture: (e.g. ARM, iX86)
I heap, stack and registers
I arithmetic and logical binary operations
I conditional branches

9/36
Describing a programming language P
Lexicon L: the words of P
→ a regular language over P’s alphabet
Syntax S: the sentences of P
→ a context-free language over L
Static semantics (e.g., typing): the “meaningful” sentences of P
→ a subset of S, defined by inference rules or attribute
grammars
Dynamic semantics: the meaning of P programs

Meaning?
But how to define the meaning of a program?
→ The semantics of programs

Semantics?
I Several notions/visions of semantics
→ transition relation, predicate transformers, partial functions
I Depends on “what we want to do/know about programs”
10/36
Compiler architecture: the logical steps
source pgm lexical analysis
tokens
syntactic analysis
AST + symbol table
semantic analysis
AST + symbol table
optimisation
intermediate code
code generation target pgm
In practice:
I steps are grouped into passes

I single-pass vs. multi-pass compilers


11/36
Running example: an assignment
Consider the assignment position = initial + speed ∗ 60
Example (Processing position = initial + speed ∗ 60)
position = initial+speed*60

COMPILER
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Intermediate Code Generation
Optimisation
Code Generation

12/36
Lexical analysis by a scanner
Input: sequence of characters
Output: sequence of lexical unit classes

1. compute the longest prefix belonging to a given lexical class


,→ the lexemes of the program
2. insert a reference into the symbol table for each identifier
3. return to the syntactic analyzer:
I the lexical class (token): constants, identifiers, keywords, operators,
separators, . . .
I the element associated to this class: the lexeme
4. skip comments
5. special token: error

Based on formal tools: regular languages


I (deterministic) finite automata
I regular expressions
Example of scanner: LeX (scanner generator)
13/36
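The longest-match scanning loop described above can be sketched with Python's `re` module. This is a toy sketch, not the course's Lex specification: the token classes and the symbol-table encoding are simplifications (e.g. the constant is returned as ("number", 60) rather than the slide's < 60 >).

```python
import re

# Hypothetical token classes, ordered so the master pattern tries
# numbers before identifiers.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    symtab, tokens = {}, []
    for m in MASTER.finditer(source):      # greedy patterns: longest lexeme
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":                 # whitespace (and comments) dropped
            continue
        if kind == "id":                   # identifiers get a symbol-table entry
            ref = symtab.setdefault(lexeme, len(symtab) + 1)
            tokens.append(("id", ref))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:                              # operators stand for themselves
            tokens.append((lexeme, None))
    return tokens, symtab

tokens, symtab = scan("position = initial + speed * 60")
# tokens: ("id",1), ("=",None), ("id",2), ("+",None), ("id",3), ("*",None), ("number",60)
# symtab: {"position": 1, "initial": 2, "speed": 3}
```

The symbol table here stores only the entry index; a real scanner would attach name, type, and allocation attributes as described on the symbol-table slide.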
Lexical Analysis on the running example

Example (position = initial + speed ∗ 60)


lexeme      token                Symbol Table
position    < id, 1 >            1 position ...
=           <=>                  2 initial ...
initial     < id, 2 >            3 speed ...
+           < + >
speed       < id, 3 >
∗           < ∗ >
60          < 60 >

Some remarks:
I id is an abstract symbol meaning identifier
I In < id, i >, i is a reference to the entry in the symbol table
(The entry in the symbol table associated to an identifier contains
information on the identifier such as name and type)
I normally < 60 > is represented < number , 4 >

14/36
Running example: an assignment

Example (Lexical analysis of


position = initial + speed ∗ 60)
position = initial+speed*60

Symbol Table
1 position ...
Lexical Analysis
2 initial ...
3 speed ...

< id , 1 >, <=>, < id , 2 >, < + >, < id , 3 >, < ∗ >, < 60 >

15/36
About the symbol table

Some features:
I Data structure containing an entry for each identifier (variable
name,. . . )
I Rapid Read/Write accesses

Store the various attributes of identifiers


I allocated memory

I type
I scope (locations of the program where the variable can be used)
I for procedure names: number and types of the parameters
I ...

16/36
Syntactic Analysis by a parser

Input: sequence of tokens


Output: abstract syntax tree (AST) + (completed) symbol table

1. syntactic analysis of the input sequence


2. AST construction (from a derivation tree)
,→ depicts the grammatical structure of the program
I node: an operation
I children: arguments of operations
3. complete the symbol table

Based on formal tools: context-free languages


I push-down automata
I context-free grammars (CFG)

Example of parser: Yacc (parser generator)

17/36
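The AST construction of this step can be sketched by recursive descent, for a hypothetical grammar just large enough for the running example (assign → id "=" expr; expr → term { "+" term }; term → factor { "*" factor }; factor → id | number). Tuples encode AST nodes; a real parser would be generated from the grammar, e.g. by Yacc.

```python
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)
    def next_tok():
        nonlocal pos
        t = tokens[pos]; pos += 1
        return t
    def factor():                        # id or number leaf
        kind, val = next_tok()
        return (kind, val)
    def term():                          # ∗ binds tighter than +
        node = factor()
        while peek()[0] == "*":
            next_tok()
            node = ("*", node, factor())
        return node
    def expr():
        node = term()
        while peek()[0] == "+":
            next_tok()
            node = ("+", node, term())
        return node
    target = next_tok()                  # the id on the left of "="
    next_tok()                           # consume "="
    return ("=", target, expr())

tokens = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("number", 60)]
ast = parse(tokens)
# ("=", ("id",1), ("+", ("id",2), ("*", ("id",3), ("number",60))))
```

The nesting of the resulting tuple matches the tree on the next slide: the multiplication is computed first, in agreement with arithmetical conventions.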
Syntactic Analysis of the running example
Example (Syntactic analysis of
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >)
          =
        /   \
< id, 1 >     +
            /   \
    < id, 2 >     ∗
                /   \
        < id, 3 >     60

I ∗ has < id, 3 > as a left-child and 60 as a right child


I The AST indicates the order to compute the assignment
(compatible with arithmetical conventions)

The next steps (of analysis and generation) will use the syntactic
structure of the tree
18/36
Running example: an assignment

[Recap diagram: position = initial+speed*60
→ (Lexical Analysis) →
< id , 1 >, <=>, < id , 2 >, < + >, < id , 3 >, < ∗ >, < 60 >
→ (Syntactic Analysis) → the AST of the previous slide,
with the symbol table (1 position, 2 initial, 3 speed) completed.]

19/36
Semantic analysis

Input: abstract syntax tree (AST) + symbol table


Output: enriched AST, or an error w.r.t. the static semantics

1. name identification:
→ bind use-def occurrences
2. type verification and/or type inference
→ type system
(e.g., ∗ uses integers, indexes of arrays are integers,. . . )
3. languages may allow type coercion
⇒ traversals and modifications of the AST

Based on the language semantics

20/36
Semantic Analysis of the running example

Example (from the previous AST)


          =
        /   \
< id, 1 >     +
            /   \
    < id, 2 >     ∗
                /   \
        < id, 3 >   inttofloat
                        |
                        60

speed (< id, 3 >) is declared as a float


I Type inference: position (< id, 1 >) is a float
I Type coercion:
I 60 denotes an integer
→ the integer 60 is converted to a float

21/36
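The coercion insertion can be sketched as a traversal of a tuple-encoded AST. The identifier types and the `SYMTAB_TYPES` table below are assumptions for illustration; the only coercion handled is int → float.

```python
# Hypothetical symbol-table types: position, initial, speed are floats.
SYMTAB_TYPES = {1: "float", 2: "float", 3: "float"}

def typecheck(node):
    """Return (typed_node, type), inserting inttofloat where needed."""
    kind = node[0]
    if kind == "id":
        return node, SYMTAB_TYPES[node[1]]
    if kind == "number":
        return node, "int"
    if kind in ("+", "*"):
        left, lt = typecheck(node[1])
        right, rt = typecheck(node[2])
        if lt == "float" and rt == "int":      # coerce the integer operand
            right, rt = ("inttofloat", right), "float"
        if rt == "float" and lt == "int":
            left, lt = ("inttofloat", left), "float"
        return (kind, left, right), lt
    if kind == "=":
        rhs, t = typecheck(node[2])
        return (kind, node[1], rhs), t

ast = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
typed, _ = typecheck(ast)
# the inttofloat node now sits above 60, as on the slide
```

A real semantic analyzer would also bind use/def occurrences and report type errors rather than assume every identifier is declared.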
Running example: an assignment

[Recap diagram: position = initial+speed*60
→ (Lexical Analysis) → token sequence
→ (Syntactic Analysis) → AST
→ (Semantic Analysis) → AST with the inttofloat coercion inserted
above 60.]

22/36
Intermediate Code generation

Input: AST
Output: intermediate code(, machine code)

I based on a systematic translation function f s.t.

Semsource (P) = Semtarget (f (P))

I in practice: several intermediate code levels


(to ease the optimization steps)

Three concerns for the IR:


I easy to produce
I easy to analyze
I easy to translate to the target machine

Based on the semantics of the source and target languages

23/36
Intermediate Code generation for the running example

Example

From the AST:

          =
        /   \
< id, 1 >     +
            /   \
    < id, 2 >     ∗
                /   \
        < id, 3 >   inttofloat
                        |
                        60

to the three-address code:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Some remarks:
I every statement has at most one operator on its right-hand side
I the order of operations is the one described by the AST
I the compiler may create temporary names to hold the value produced
by each operation: t1, t2, t3
I some statements have fewer than 3 operands

24/36
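A postorder walk that emits one three-address statement per operator reproduces the slide's code exactly. The tuple-encoded AST is an assumption carried over from the earlier sketches; temporaries come from a simple counter.

```python
def gen_code(ast):
    code, temp = [], 0
    def new_temp():
        nonlocal temp
        temp += 1
        return f"t{temp}"
    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "inttofloat":                  # unary conversion
            arg = walk(node[1])
            t = new_temp()
            code.append(f"{t} = inttofloat({arg})")
            return t
        # binary operator: children first, in the order given by the AST
        left, right = walk(node[1]), walk(node[2])
        t = new_temp()
        code.append(f"{t} = {left} {kind} {right}")
        return t
    rhs = walk(ast[2])                            # code for the expression
    code.append(f"{walk(ast[1])} = {rhs}")        # then the assignment
    return code

typed = ("=", ("id", 1),
         ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("number", 60)))))
three_addr = gen_code(typed)
# ["t1 = inttofloat(60)", "t2 = id3 * t1", "t3 = id2 + t2", "id1 = t3"]
```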
Intermediate Code Optimization

Input/Output: Intermediate code

I several criteria: execution time, size of the code, energy


I several optimization levels (source level vs machine level)
I several techniques:
I data-flow analysis
I abstract interpretation
I typing systems
I etc.

25/36
Intermediate Code Optimization for the running example

Example
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Some remarks:
I the conversion of 60 to a float can be done once and for all by replacing
the inttofloat operation with the number 60.0
I t3 is only used to transmit the value to id1
→ The code can be “shortened”:
t1 = id3 * 60.0
id1 = id2 + t1

26/36
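The two remarks above can be sketched as peephole passes over the three-address code: fold inttofloat of a constant, then eliminate the final copy. String matching on statements is a simplification of real IR manipulation, and the surviving temporary ends up named t2 here rather than the slide's t1.

```python
import re

def optimize(code):
    out, subst = [], {}
    # pass 1: fold inttofloat(constant) into a float literal
    for line in code:
        dest, rhs = line.split(" = ", 1)
        m = re.fullmatch(r"inttofloat\((\d+)\)", rhs)
        if m:
            subst[dest] = m.group(1) + ".0"    # remember t1 -> 60.0
            continue                           # and drop the statement
        for var, val in subst.items():
            rhs = re.sub(rf"\b{re.escape(var)}\b", val, rhs)
        out.append(f"{dest} = {rhs}")
    # pass 2: remove a final copy "x = t" by renaming t's definition
    if len(out) >= 2:
        dest, rhs = out[-1].split(" = ", 1)
        prev_dest, prev_rhs = out[-2].split(" = ", 1)
        if rhs == prev_dest:
            out[-2:] = [f"{dest} = {prev_rhs}"]
    return out

optimized = optimize(["t1 = inttofloat(60)",
                      "t2 = id3 * t1",
                      "t3 = id2 + t2",
                      "id1 = t3"])
# ["t2 = id3 * 60.0", "id1 = id2 + t2"]
```

Production optimizers work on a structured IR with data-flow information rather than on strings, but the transformations are the same in spirit.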
Optimization for the running example

[Recap diagram: position = initial+speed*60
→ (Lexical Analysis) → token sequence
→ (Syntactic Analysis) → AST
→ (Semantic Analysis) → AST with inttofloat
→ (Intermediate Code Generation) →
t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3
→ (Code Optimisation) →
t1 = id3 * 60.0; id1 = id2 + t1]

27/36
(Final) Code Generation
Input: Intermediate code
Output: Machine code

Principles:
I Each intermediate statement is translated into a sequence of
machine statements that “does the same job”
I Each variable corresponds to a register or a memory address

Challenge: use the registers wisely


We will study 3-address code (Assembly code)
OPER Ri, Rj, Rk or OPER Ri, @

I at most 3 operands per statement


I 1 operand ≈ 1 register
I first operand is the destination

28/36
Final Code Generation for the running example
Example
Input:
t1 = id3 * 60.0
id1 = id2 + t1
Output:

LD R2, id3 (loads the content at memory @ id3 into R2)


MUL R2, R2, #60.0 (multiplies 60.0 by the content of R2)
LD R1, id2
ADD R1, R1, R2
ST id1, R1

I # in #60.0 indicates that it is an immediate constant


I We will refine this in the Exercise session and later in the course
I We do not talk about memory allocation, which is also a very
important and large subject (we will come back to it later)
29/36
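A naive instruction-selection sketch produces the LD/MUL/ADD/ST sequence above. The `select` helper and its conventions are assumptions for illustration: register allocation is reduced to a counter with ad-hoc operand reuse, where a real compiler applies a proper allocation algorithm.

```python
def select(code):
    asm, reg_of, regs = [], {}, 0
    def operand(x):
        nonlocal regs
        if x in reg_of:                        # value already in a register
            return reg_of[x]
        if x.replace(".", "", 1).isdigit():    # numeric literal -> immediate
            return f"#{x}"
        regs += 1
        r = f"R{regs}"
        asm.append(f"LD {r}, {x}")             # load variable from memory
        reg_of[x] = r
        return r
    OPS = {"*": "MUL", "+": "ADD"}
    for line in code:
        dest, rhs = line.split(" = ", 1)
        left, op, right = rhs.split()
        a, b = operand(left), operand(right)
        r = a if a.startswith("R") else b      # reuse an operand register
        asm.append(f"{OPS[op]} {r}, {a}, {b}")
        reg_of[dest] = r
        if not dest.startswith("t"):           # program variable: store back
            asm.append(f"ST {dest}, {r}")
    return asm

asm = select(["t1 = id3 * 60.0", "id1 = id2 + t1"])
# ["LD R1, id3", "MUL R1, R1, #60.0", "LD R2, id2", "ADD R2, R2, R1", "ST id1, R2"]
```

The output matches the slide up to register numbering (R1/R2 swapped), which is exactly the kind of choice a register allocator makes.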
Final Code Generation for the running example

[Recap diagram: position = initial+speed*60
→ (Lexical Analysis) → token sequence
→ (Syntactic Analysis) → AST
→ (Semantic Analysis) → AST with inttofloat
→ (Intermediate Code Generation) →
t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3
→ (Code Optimisation) → t1 = id3 * 60.0; id1 = id2 + t1
→ (Code Generation) →
LD R2, id3; MUL R2, R2, #60.0; LD R1, id2; ADD R1, R1, R2; ST id1, R1]

30/36
Outline

Compilers: some light reminders

Compiling, semantics: two connected themes

Some Maths Reminders


Motivation

Why do we need to study programming language semantics ?

Semantics is paramount to:


I write compilers (and program transformers)
I understand programming languages
I classify programming languages
I validate programs
I write program specifications

Why do we need to formalize this semantics?

31/36
Example: static vs. dynamic binding

Program Static Dynamic


begin var x := 0;
proc p is x := x ∗ 2;
proc q is call p;
begin
var x := 5;
proc p is x := x + 1;
call q; y := x;
end;
end
What is the final value of y ?
I dynamic scope for variables and procedures: y = 6
I dynamic scope for variables and static scope for procedures: y = 10
I static scope for variables and procedures: y = 5
32/36
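The three outcomes can be checked by modeling the environments explicitly. This is a sketch, not an interpreter: the procedure bodies of the example program are hard-coded as Python functions, and two flags select where variables and procedure names are resolved (at declaration site for static scope, at call site for dynamic scope).

```python
def run(static_vars, static_procs):
    outer = {"x": 0}                 # var x := 0 in the outer block
    inner = {"x": 5}                 # var x := 5 in the inner block
    chain_outer = [outer]            # environments visible in the outer block
    chain_inner = [inner, outer]     # environments visible in the inner block

    def assign_x(chain, f):
        for env in chain:            # update the nearest x in the chain
            if "x" in env:
                env["x"] = f(env["x"])
                return

    def p_outer(call_chain):         # proc p is x := x * 2   (declared outer)
        assign_x(chain_outer if static_vars else call_chain, lambda v: v * 2)

    def p_inner(call_chain):         # proc p is x := x + 1   (declared inner)
        assign_x(chain_inner if static_vars else call_chain, lambda v: v + 1)

    def q(call_chain):               # proc q is call p       (declared outer)
        # static procs: the p visible where q is declared (outer p);
        # dynamic procs: the p visible at the call site (inner p)
        p = p_outer if static_procs else p_inner
        p(call_chain)

    q(chain_inner)                   # call q from the inner block
    return inner["x"]                # y := x reads the inner x

# dynamic scope for variables and procedures:        y = 6
# dynamic variables, static procedures:              y = 10
# static scope for variables and procedures:         y = 5
```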
Example: parameters

Program value reference


var a;
proc p(x);
begin
x := x + 1; write(a); write(x)
end;
begin
a := 2; p(a); write(a)
end;

What values are printed ?


                    write(a) (in p)   write(x) (in p)   write(a) (after p(a))
call-by-value              2                 3                    2
call-by-reference          3                 3                    3

33/36
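Python itself passes object references, so the two columns can be sketched by using an immutable value for call-by-value and a one-element list as a mutable cell for call-by-reference (the cell is an illustration device, not a language feature).

```python
# call-by-value: p receives a copy of a's value
a = 2
log = []
def p_value(x):
    x = x + 1
    log.append(a)      # write(a) inside p: the caller's a, still 2
    log.append(x)      # write(x): the local copy, now 3
p_value(a)
log.append(a)          # write(a) after the call: still 2
# log == [2, 3, 2]

# call-by-reference: a and x name the same cell
cell = [2]
ref_log = []
def p_ref(xcell):
    xcell[0] = xcell[0] + 1
    ref_log.append(cell[0])    # write(a): 3, same cell as x
    ref_log.append(xcell[0])   # write(x): 3
p_ref(cell)
ref_log.append(cell[0])        # write(a) after the call: 3
# ref_log == [3, 3, 3]
```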
Overview of the semantics part of the course
Various semantic styles:
Operational semantics: “How is the computation performed?” - meaning
in terms of the “computation it induces”
I Natural (big-step): “from a bird’s-eye view”
I Structural (small-step): “step by step”

Axiomatic semantics (Hoare logic): “What are the properties of the


computation?”
I Specific properties using assertions,
pre/post-conditions
I Some aspects of the computation are ignored

Denotational semantics: “What is performed by the computation?”


I Meaning in terms of mathematical objects
I Only the effect

Different styles/techniques for different purposes: they are not rivals!


Language families:
I imperative
I functional
34/36
Outline

Compilers: some light reminders

Compiling, semantics: two connected themes

Some Maths Reminders


Inductive/Compositional definitions

Let us consider:
I E a set
I f : E × E × . . . × E → E a partial function
I A ⊆ E a subset of E

Definition (closure)
A is closed under f iff f (A × . . . × A) ⊆ A

Definition (Inductive definition)


An inductive definition on E is a family of rules (partial functions)
defining the smallest subset of E that is closed by these rules

35/36
Inductive definitions: examples

Example (Natural numbers)


How can we define them?
I basis element 0
I 1 rule: x 7→ succ(x)
2 is the natural number defined as succ(succ(0))

Example (Even numbers)


I basis element 0
I 1 rule x 7→ x + 2

Example (Palindromes on {a, b})


I basis elements ε, a, b
I 2 rules: w 7→ a · w · a, w 7→ b · w · b

36/36
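The palindrome example can be computed as a least set closed under the rules, iterating from the basis until nothing new appears. The length bound is an assumption added so the loop terminates: the real inductive set is infinite.

```python
def inductive_closure(basis, rules, max_len):
    # smallest subset containing the basis and closed under the rules,
    # truncated to words of length <= max_len
    s = set(basis)
    changed = True
    while changed:
        changed = False
        for w in list(s):
            for rule in rules:
                v = rule(w)
                if len(v) <= max_len and v not in s:
                    s.add(v)
                    changed = True
    return s

palindromes = inductive_closure(
    {"", "a", "b"},                                  # basis elements ε, a, b
    [lambda w: "a" + w + "a", lambda w: "b" + w + "b"],  # the 2 rules
    max_len=5)
# "aba" and "ababa" are generated; "ab" never is
```

The same fixpoint iteration works for the even numbers (basis 0, rule x ↦ x + 2) with a numeric bound instead of a length bound.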
A notation: derivation tree

t = f (x1 , . . . , xn ): “t is built/obtained from x1 , . . . , xn ”


will be noted
x1 . . . xn
t

Example (Derivation trees)


I 2 = succ(succ(0)) is a natural number

0
1
2

I aba is a palindrome:        I ababa is a palindrome:

       a                              a
      ───                            ───
      aba                            bab
                                    ─────
                                    ababa

37/36
(Some simple) Proofs techniques
Proof by contradiction, reductio ad absurdum, contraposition, . . .

Structural Induction
I Proof for the basic elements (atoms) of the set.
I Proof for composite elements (created by applying rules):
I assume the property holds for the immediate components (induction hypothesis)
I prove the property holds for the composite element

Induction on the shape of a derivation tree


I Proof for ‘one-rule’ derivation trees, i.e., axioms.
I Proof for composite trees: for each rule R, consider a composite tree
where R is the last rule applied
I assume the property holds for the subtrees, i.e., the premises of the
rule (induction hypothesis)
I prove the property holds for the composite tree

38/36
