Sie sind auf Seite 1von 40

1

Introduction to Compiler
Design
B.Sc.(CS)-4th Year, First
Semester
(Session-2015-16)
Dr. Deepak K.
Sinha, JIT, JU

Syllabus
Prerequisites: Formal Language and
Automata Theory
Textbook: Compilers: Principles, Techniques,
and Tools by Aho, Sethi, and Ullman, 2nd
edition
Other material: class handouts
Grade breakdown:
Exams (two midterm, one final) (30%+50%)
Lab/Project assignments (15%)
Assignment (5%)

Dr. Deepak K.
Sinha, JIT, JU

Objectives
Be able to build a compiler for a (simplified)
(programming) language
Know how to use compiler construction tools,
such as generators of scanners and parsers
Be familiar with assembly code and virtual
machines, such as the JVM, and bytecode
Be familiar with compiler analysis and
optimization techniques

Dr. Deepak K.
Sinha, JIT, JU

Introductions.
Programming languages are notations for
describing computations to people and to
machine.
The world as we know it depends on
programming language, because all the S/W
running on all the computers was written in
some programming language.
But, before a program can be run, it first
must be translated into a form in which it can
be execute by computer.
Dr. Deepak K.
Sinha, JIT, JU

The S/W system that do this translation are


called Compilers.
The study of Compilers Writing touches
upon:
Programming Languages
Machine Architecture
Language Theory
Algorithms
S/W Engineering

Dr. Deepak K.
Sinha, JIT, JU

Language Processors
Simply stated, a compiler is a program that
can read a program into one language-(the
source language) and translate it into an
equivalent program into another language(the target language)
Input
Source
Program

Dr. Deepak K.
Sinha, JIT, JU

Compiler

Error messages

Target
Program
Output

An Important role of the compiler is to report


any errors in the source program that it
detects during the translation process.
If the target program is an executable
machine language program, it can then be
called by the user to process inputs and
produce outputs.

Dr. Deepak K.
Sinha, JIT, JU

Compilers and Interpreters


(contd)
An Interpreter is another kind of language
processor. Instead of producing a target
program as a translation, an interpreter
appears to directly execute the operation
specified in the source program on inputs
supplied by the user.
Source
Program
Interpreter
Input
Dr. Deepak K.
Sinha, JIT, JU

Error messages

Output

The machine-language target program


produced by complier is usually much faster
than an interpreter at mapping inputs to
outputs.
An Interpreter however, can usually give
better error diagnostics than a compiler,
because it executes the source program
statement by statement.

Dr. Deepak K.
Sinha, JIT, JU

A Hybrid Compilers

10

Source Program

Translato
r

Intermediate
Program
Inputs

Dr. Deepak K.
Sinha, JIT, JU

VM

Output

In addition to the compiler, several other


program may be required to create an
executable target program:

11

A source program may be divided into modules stored


in separate files.
The task of collecting the source program is sometimes
entrusted to a separate program, called a preprocessor.
The preprocessor may also expand shorthand, called
macros, into source language statements.
The modified source program is then fed to a compiler.
The compiler may produce an assembly language
program as its output.
The assembly language is then processed by a
program called assembler that produces relocatable
Dr.
Deepak K.
machine
code as its output.
Sinha, JIT, JU

Contd..

12

Large programs are often complied in pieces,


so the relocatable machines code may have
to be linked together with other relocatable
object files and library files into the code that
actually runs on the machine(linker)
The loader then puts together all the
executable objects files into memory for
execution.

Dr. Deepak K.
Sinha, JIT, JU

Preprocessors, Compilers,
Assemblers, and Linkers

13

Source Code
Preprocessor
Modified Source Code
Compiler
Target Assembly Program
Assembler
Relocatable Object Code
Linker/Loader
Dr. Deepak K.
Sinha, JIT, JU

Libraries and
Relocatable Object Files

Target Machine Code

The structure of a compiler


Up to this point, we have treated a compiler
as a single box that maps a source program
into semantically equivalent target program.
If we open up this box a little, we see that
there are two parts of this mapping
Analysis, and
Systhesis

Dr. Deepak K.
Sinha, JIT, JU

14

15

The Structure of a Compiler (1)


Any compiler must perform two major tasks
Compiler

Analysis

Synthesis

Analysis of the source program


Dr. Deepak K.
Synthesis
of a machine-language program
Sinha,
JIT, JU

The Analysis-Synthesis Model of


Compilation
Analysis part breaks up the source program into
constituent piece and imposes a grammatical
structure on them
If the analysis part detects that the source
program is either syntactically ill formed or
syntactically unsound, then it must provide
information message, so the user can take
correct action.
The analysis part also collects information about
the source program and stores it in a data
structure called a symbol table, which is passed
along with the intermediate representation to the
Dr. Deepak K.
synthesis
part.
Sinha,
JIT, JU

16

17

Contd
The synthesis part construct the desired
target program from the intermediate
representation and information in the symbol
table
Synthesis takes the tree structure and
translates the operations therein into the
target program
The analysis part is often called the front end
and the synthesis part is the back end of the
compiler.
Dr. Deepak K.
Sinha, JIT, JU

The Structure of Compiler (Phases of Compiler)

Character
stream

Lexical
Analyzer

Syntax
Analyzer
Token
Syntax
Stream
Tree

Symbol
Table
MachineCode
Dependent
Generator
Code Optimizer
IR
Target
machine
Target Machine
code
Deepak K.
CodeDr.
Sinha, JIT, JU

18

Semantic
Analyzer
Syntax
tree
Intermediat
e code
Generator
Intermediate
representation
MachineIndependent
Code
Optimizer

19

Code Generator
[Intermediate Code Generator]

Scanner
[Lexical Analyzer]

Non-optimized Intermediate Code

Tokens
Code Optimizer
Parser
[Syntax Analyzer]

Optimized Intermediate Code

Parse tree

Code Optimizer
Semantic Process
[Semantic analyzer]
Abstract Syntax Tree w/ Attributes

Dr. Deepak K.
Sinha, JIT, JU

Target machine code

Lexical Analysis

20

The first phase of a compiler is called Lexical


Analysis or Scanning.
The Lexical Analyzer reads the stream of
characters making up the source program
and groups the characters into meaningful
sequences called Lexemes.
For each lexemes, the lexical analyzer
produces an output a token of the form
<token-name, attribute-value>

The same token is being passed to the


subsequent phase, syntax alalysis
Dr. Deepak K.
Sinha, JIT, JU

In the token, the first component token-name


is an abstract symbol that is used during
syntax analysis, And
The second component attribute-value points
to an entry in the symbol table for this token.
(Information from the symbol table entry is
needed for semantic analysis and code
generation)
Example:
p=i+r*60
Dr. Deepak K.
Sinha, JIT, JU

21

p=i+r*60
p is a lexeme that would be mapped into a
token <id, 1>, where id is an abstract symbol
standing for identifier and 1 points to the
symbol-table entry for p.
The symbol-table entry for an identifier holds
information about the identifier, such as its
name and type 1 p
-------2 i

3 r

symbol table
Dr. Deepak K.
Sinha, JIT, JU

22

23

The assignment symbol = is a lexeme that


is mapped into the token <=>
i.<id, 2>
+.<+>
r.........................<id, 3>
*<*>
60<60>

Dr. Deepak K.
Sinha, JIT, JU

Syntax Analysis

24

The second phase of the compiler is syntax


analysis or parsing
The parser uses the first component of the
tokens produced by the lexical analyzer to
create a tree-like intermediate representation
that depicts the grammatical structure of the
token stream
A typical representation of a syntax tree in
which each interior node represents an
operation and the children of the node
represent the arguments of the operation.
Dr. Deepak K.
Sinha, JIT, JU

25

Dr. Deepak K.
Sinha, JIT, JU

Semantic Analysis
The semantic analyzer uses the syntax tree
and the information in the symbol table to
check the source program for semantic
consistency with the language definition.
It also gathers type information and saves
it in either the syntax tree or the symbol
table, for subsequent use during
intermediate code generation.

Dr. Deepak K.
Sinha, JIT, JU

26

An important part of semantic analysis is


type checking, where the compiler checks
that each operator has matching operands.
For Example, many programming language
definition require an array index to be an
integer; the compiler must report an error if
a floating-point number is used to index an
array.
The language specification may present
some type conversion called coercions.
(For example p,i,r have been declared to
be floating point numbers, and that the
Dr. Deepak K.
lexeme
<60> by itself forms an integer. )
Sinha, JIT, JU

27

28

The type checker in the semantic analyzer


discovers that the operator * is applied to a
floating-point number (r) and an integer 60,
in this case the integer may be converted
into a floating-point number.

Dr. Deepak K.
Sinha, JIT, JU

Intermediate Code Generator


In the process of translating a source
program into target code, a compiler may
construct one or more intermediate
representation.
After syntax and semantic analysis of the
source program, many compilers generate
an explicit low-level or machine like
intermediate representation.
The intermediate representation should
have two important properties:
It should be easy to produce, and it should be

Dr. Deepak K.
Sinha, JIT, JU

29

The output of the intermediate code


generation consists of three-address code
sequence, as below:

Why and how three-address???


Each three-address assignment instruction has
at most one operator on the right side, thus the
precedence of the operation can be done.
Compiler must generate a temporary name to
hold the value computed by a three-address
method.
First and last in the sequence have fewer then
Dr. Deepak K.
Sinha,
JIT, JU operands.
these

30

Code Optimization
The code optimization attempts to improve
the intermediate code so that better target
code will result.
Usually better means faster, but some times
shorter code or target code that consumes
less resources

Dr. Deepak K.
Sinha, JIT, JU

31

Code Generation
The code generation takes as input an
intermediate representation of the source
program and maps it into the target
language.
If the target language is machine code,
registers or memory location are selected
for each variables used by the program

Dr. Deepak K.
Sinha, JIT, JU

32

Compiler-Construction Tools
Software development tools are available to
implement one or more compiler phases
Parser generators:
That automatically produces syntax analyzer
from a grammatical description of a
programming language.
Scanner generators:
That produce lexical analyzers from a regularexpression description of the token of a
language.
Syntax-directed translation engines:
That produce collection of routines for walking a
Dr. Deepak K.
parse
Sinha,
JIT, JU tree and generating intermediate code.

33

34
Automatic code generators:
that produces a code generator from a
collection of rule for translating each operation
of the intermediate language into the machine
language for a target machine.
Data-flow engines:
That facilitate the gathering of information about
how values are transmitted from one part of a
program to each other part. This is key part of
code optimization.
-Compiler Construction toolkit:
That provides an integral set of the routine for
contracting various phases of a compiler

Dr. Deepak K.
Sinha, JIT, JU

Programming Language
Basics

Static/Dynamic Distinction:

Policy that allows a decision to be made when


we execute the program is said to be a dynamic
policy or to require a decision at run time or
otherwise it is static.
The scope of declaration:
A language uses static scope or lexical scope if it is
possible to determine the scope of a declaration by
looking only at the program, or otherwise, the
language uses dynamic scope.
(C/C++/JAVA uses static scope)
Dr. Deepak K.
Sinha, JIT, JU

35

Environment and Scopes


The association of name with locations in the memory and
then with values can be described by two mapping that
changes as the program runs
environment
state
names
location
value
(variable)
Environment changes according to the scope rule of a
language

Dr. Deepak K.
Sinha, JIT, JU

int i;
void f()
{
int i;
i=3;
}
x=x+i;

36

The Environment and the state mapping

37

Static/dynamic binding of names to locate:


Most binding of names to location is dynamic.

Static/dynamic binding of location to values:


The binding of locations to value is generally
dynamic, since we can not tell the value in
location until we run the program.

Dr. Deepak K.
Sinha, JIT, JU

Static scope and Block Structure


C and its family uses static scope (implicit)
C++/Java and C# explicit control over scope
through the use of keyword public, private and
protected.
main() {

int a=1, b=1;


{
int b=2;
{
int a=3;
cout<<a<<b;
}
{
int b=4;
cout<<a<<b;
}
cout<<a<<b;
}
cout<<a<<b;
Dr. Deepak K.
Sinha, JIT, JU

B3
B2
B4

B1

38

39

Declaration

Scope

int a=1

B1-B3

int b=1

B1-B2

int b=2

B2-B4

int a=3

B3

int b=4

B4

Dr. Deepak K.
Sinha, JIT, JU

Passing Parameter
Mechanism
Call by Value
Call by Reference

Dr. Deepak K.
Sinha, JIT, JU

40

Das könnte Ihnen auch gefallen