
Compiler Design

Satya Ranjan Dash

S.R. Dash, KIIT University
Wednesday, December 08, 2021
Welcome!
A compiler is a program that takes a program written in a source language
and translates it into an equivalent program in a target language.

source program -> COMPILER -> target program
                     |
                     v
               error messages

(The source program is normally written in a high-level programming language.
The target program is normally the equivalent program in machine code, as a
relocatable object file.)

• What is an interpreter?
– A program that reads an executable program and
produces the results of executing that program.
• Target machine: the machine on which the compiled
program is to be run.
• Cross-compiler: a compiler that runs on a different
type of machine than its target.
• Compiler-compiler: a tool that simplifies the construction
of compilers (e.g. YACC/LEX).
Cross Compiler
• A cross compiler generates target code for a machine
different from the one on which the compiler itself runs.
• The host language is the language in which the
compiler is written.
– T-diagram: source language S on the left arm, target language T
on the right arm, host language H at the base.

S --> T
   H

• Cross compilers are used very often in practice.

High-level View of a Compiler

Source code -> Compiler -> Machine code
                  |
                  v
                errors

Implications
Must recognize legal (and illegal) programs
Must generate correct code
Must manage storage of all variables (and code)
Must agree with OS & linker on format for object code

Other Applications
• Beyond building compilers, the techniques used in compiler design
apply to many other problems in computer science.
– Techniques used in a lexical analyzer can be used in text editors,
information retrieval systems, and pattern recognition programs.
– Techniques used in a parser can be used in a query processing system
such as SQL.
– Much software with a complex front end needs techniques used in
compiler design.
  • Example: a symbolic equation solver that takes an equation as input
  must parse that equation.
– Most of the techniques used in compiler design can be used in Natural
Language Processing (NLP) systems.

Applications of Compiler Technology
• Implementation of high-level programming languages
• Optimizations for computer architectures
– Parallelism
– Memory hierarchy
• Design of new computer architectures
– RISC
– Specialized architectures
• Program translation
– Binary translation
– Hardware synthesis
– Database query interpreters
– Compiled simulation
• Software productivity tools
– Type checking
– Bounds checking
– Memory-management tools
Major Parts of Compilers
There are two major parts of a compiler: Analysis and Synthesis.
• Analysis determines the operations implied by the
source program and records them in a tree structure.
• Synthesis takes the tree structure and translates the
operations in it into the target program.

Major Parts of Compilers
• The analysis phase breaks the program into pieces and
creates an intermediate representation of the source program.
– The Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of
this phase.
• The synthesis phase creates the equivalent target program from this
intermediate representation.
– The Intermediate Code Generator, Code Optimizer and Code Generator are the
parts of this phase.

The analysis part is sometimes called the FRONT END and the synthesis part
the BACK END of the compiler. The two can be written independently.

Analysis of the Source Program
Analysis consists of three parts:
• Linear analysis, in which the stream of characters making up
the source program is read from left to right and grouped into
tokens, which are sequences of characters having a collective
meaning.
• Hierarchical analysis, in which characters or tokens are
grouped hierarchically into nested collections with collective
meaning.
• Semantic analysis, in which certain checks are performed to
ensure that the components of a program fit together meaningfully.
E.g. variables should be declared before they are used.
Lexical Analysis
The linear analysis stage is called LEXICAL ANALYSIS
or SCANNING.
Example:
position = initial + rate * 60
gets translated as:
1. The IDENTIFIER “position”
2. The ASSIGNMENT SYMBOL “=”
3. The IDENTIFIER “initial”
4. The PLUS OPERATOR “+”
5. The IDENTIFIER “rate”
6. The MULTIPLICATION OPERATOR “*”
7. The NUMERIC LITERAL 60
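
The grouping above can be sketched with regular expressions. This is a minimal illustration, not the scanner of any particular compiler: the token names and patterns in `TOKEN_SPEC` are assumptions chosen to match the seven tokens listed on this slide.

```python
import re

# Hypothetical token classes; the patterns cover only this example.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the character stream into (kind, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":          # white space separates tokens only
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("position = initial + rate * 60")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

The scanner reads strictly left to right and never looks at the structure of the statement; deciding that `rate * 60` binds tighter than `+` is left to the parser.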

Syntax Analysis
Hierarchical analysis is called parsing or syntax analysis. It involves
grouping the tokens of the source program into grammatical phrases
that are used by the compiler to synthesize output.

assignment stmt
  identifier: pos
  =
  expr
    expr
      identifier: init
    +
    expr
      expr
        identifier: rate
      *
      expr
        number: 60

Semantic Analysis

The semantic analysis phase checks the source
program for semantic errors and gathers type
information for the subsequent code generation
phase.
An important component of semantic analysis is type
checking: the compiler checks that each operator has
operands that are permitted by the source language
specification.
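
A minimal sketch of type checking on an expression tree follows. It assumes a language with only `int` and `real`, in which an `int` operand is converted when it meets a `real` one; the nested-tuple tree encoding, the `types` table and the function name `check` are illustrative assumptions, not part of the slides.

```python
def check(node, types):
    """Return (typed_node, type), inserting inttoreal where int meets real."""
    kind = node[0]
    if kind == "num":
        return node, ("int" if isinstance(node[1], int) else "real")
    if kind == "id":
        if node[1] not in types:
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return node, types[node[1]]
    # Binary operator: check both operands, then reconcile their types.
    left, left_type = check(node[1], types)
    right, right_type = check(node[2], types)
    if left_type == right_type:
        return (kind, left, right), left_type
    if left_type == "int":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (kind, left, right), "real"


# Assumed declarations: all three identifiers are reals.
types = {"position": "real", "initial": "real", "rate": "real"}
expr = ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60)))
typed_expr, expr_type = check(expr, types)
print(typed_expr, expr_type)
```

With these assumed declarations the integer literal 60 is wrapped in a conversion node, which is the `inttoreal` step that appears in later slides.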

Phases of A Compiler

source program
-> lexical analyzer
-> syntax analyzer
-> semantic analyzer
-> intermediate code generator
-> code optimizer
-> code generator
-> target program

The symbol-table manager and the error handler are connected to every phase.
Phases of A Compiler

Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer
-> Intermediate Code Generator -> Code Generator -> Target Program

(Error Handler above the phases, Symbol Table Manager below them)

• Each phase transforms the source program from one representation
into another representation.
• The phases communicate with the error handler.
• The phases communicate with the symbol table.

Process of Compiling
Stream of characters
-> scanner -> stream of tokens
-> parser -> parse/syntax tree
-> semantic analyzer -> annotated tree
-> intermediate code generator -> intermediate code
-> code optimization -> intermediate code
-> code generator -> target code
-> code optimization (on the target code)

Symbol Table Management

A symbol table is a data structure containing a
record for each identifier, with fields for the
attributes of the identifier.
The data structure allows us to find the record
for each identifier quickly and to store or
retrieve data from that record quickly.
e.g. int pos, init, rate;
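
As a rough sketch, such a table can be a hash map from identifier names to records. The `Record` fields, the 4-byte size for `int` and the running `offset` are assumptions made for illustration; a real compiler chooses its own attributes and layout.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    type: str
    size: int    # bytes of storage (assumed to be 4 for int)
    offset: int  # assumed position of the variable in the data area


class SymbolTable:
    def __init__(self):
        self._records = {}     # Python dict: a hash table keyed by name
        self._next_offset = 0

    def declare(self, name, type_, size=4):
        if name in self._records:
            raise KeyError(f"{name!r} is already declared")
        record = Record(name, type_, size, self._next_offset)
        self._records[name] = record
        self._next_offset += size
        return record

    def lookup(self, name):
        """Return the record for name, or None if it was never declared."""
        return self._records.get(name)


# Corresponds to the declaration: int pos, init, rate;
table = SymbolTable()
for identifier in ("pos", "init", "rate"):
    table.declare(identifier, "int")
print(table.lookup("rate"))
```

Both `declare` and `lookup` take expected constant time, which is the "find the record quickly" property the slide asks for.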

Symbol-table management
• During analysis, we record the identifiers used in the
program.
• The symbol table stores each identifier with its
ATTRIBUTES.
• Example attributes:
– How much STORAGE is allocated for the id
– The id’s TYPE
– The id’s SCOPE
– For functions, the PARAMETER PROTOCOL
• Some attributes can be determined immediately; some
are delayed.
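
The attributes above, including scope and delayed attributes, can be sketched with one table per scope, each linked to its enclosing scope. The class name `Scope`, the attribute keys and the sample identifiers are hypothetical.

```python
class Scope:
    """Symbol table for one scope; parent links to the enclosing scope."""

    def __init__(self, parent=None):
        self.records = {}
        self.parent = parent

    def declare(self, name, **attributes):
        self.records[name] = dict(attributes)
        return self.records[name]

    def lookup(self, name):
        """Search this scope first, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.records:
                return scope.records[name]
            scope = scope.parent
        return None


global_scope = Scope()

# Entered as soon as the name is seen; its attributes are not yet known.
area = global_scope.declare("area")
# Delayed attributes, filled in once the declaration has been analyzed.
area.update(kind="function", type="float", params=["float", "float"])

global_scope.declare("rate", kind="variable", type="float", size=4)
body = Scope(parent=global_scope)
body.declare("rate", kind="variable", type="int", size=4)  # shadows outer rate

print(body.lookup("rate"), body.lookup("area"))
```

A lookup from the inner scope finds the inner `rate` first, while `area` is resolved by walking outward to the global scope.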
Symbol Table
• Identifiers are names of variables, constants,
functions, data types, etc.
• The table stores information associated with identifiers.
– The information stored can differ between kinds of identifiers.
  • For variables: name, type, address, size (for arrays), etc.
  • For functions: name, type of return value, parameters, address, etc.

Symbol Table (cont’d)
• Accessed in every phase of a compiler
– The scanner, parser, and semantic analyzer put
names of identifiers in the symbol table.
– The semantic analyzer stores more information
(e.g. data types) in the table.
– The intermediate code generator, code optimizer
and code generator use information in the symbol
table to generate appropriate code.
• Usually implemented as a hash table for efficiency.
Error Detection and Reporting
• Each compilation phase can encounter errors.
• Normally, we want to keep processing after an error,
in order to find more errors.
• Each stage has its own characteristic errors, e.g.
– Lexical analysis: a string of characters that does not
form a legal token
– Syntax analysis: unmatched { } or a missing ;
– Semantic analysis: trying to add a float and a pointer

Most errors are usually handled in the syntax and
semantic analysis phases.
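
The idea of continuing after an error can be sketched for the lexical phase. In this illustration the scanner records each illegal character and keeps going, so one run reports every lexical error; the token classes and the message format are assumptions.

```python
import re

# BAD is a catch-all that matches any single character no other rule accepts.
PATTERN = re.compile(
    r"(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[=+*;{}])"
    r"|(?P<SKIP>\s+)|(?P<BAD>.)"
)


def scan(text):
    """Return (tokens, errors); scanning continues past illegal characters."""
    tokens, errors = [], []
    for m in PATTERN.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "BAD":
            errors.append(f"offset {m.start()}: illegal character {m.group()!r}")
            continue  # recover by dropping the character and scanning on
        tokens.append((kind, m.group()))
    return tokens, errors


tokens, errors = scan("rate = 6 @ 0 $ x;")
for message in errors:
    print(message)
```

Dropping the offending character is the simplest recovery strategy; parsers usually need something more elaborate, such as skipping ahead to the next statement delimiter.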
Error Handling
• Errors can be found in every phase of compilation.
– Errors found during compilation are called static (or
compile-time) errors.
– Errors found during execution are called dynamic (or
run-time) errors.
• Compilers need to detect, report, and recover from
errors found in source programs.
• Error handlers differ from one phase of the compiler
to another.
Translation Process
Source Code
-> Scanner -> Tokens
-> Parser -> Syntax Tree
-> Semantic Analyzer -> Annotated Tree
-> Source Code Optimizer -> Intermediate Code
-> Code Generator -> Target Code
-> Target Code Optimizer -> Target Code

The Symbol Table and the Error Handler sit alongside these phases.
Scanner
The scanner performs lexical analysis: it collects sequences of
characters into meaningful units called tokens.

a[index] = 4 + 2

a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment
4        number
+        plus sign
2        number

Parser
The parser receives the source code in the form of tokens from
the scanner and performs syntax analysis, which determines
the structure of the program. The result of syntax analysis is usually
represented as a parse tree or a syntax tree.

Internal Representations
Each stage of processing transforms a representation of the source code
program into a new representation.

position = initial + rate * 60

-> lexical analyzer

id1 = id2 + id3 * 60

symbol table:
1  position  …
2  initial   …
3  rate      …
4

-> syntax analyzer

=
  id1
  +
    id2
    *
      id3
      60

-> semantic analyzer

=
  id1
  +
    id2
    *
      id3
      inttoreal
        60
The Structure of a Compiler

Source Program
-> Scanner -> Tokens
-> Parser -> Syntactic Structure
-> Semantic Routines -> Intermediate Representation
-> Optimizer
-> Code Generator -> Target machine code

Symbol and Attribute Tables (used by all phases of the compiler)


The Structure of a Compiler
Scanner
• The scanner begins the analysis of the source program by
reading the input, character by character, and grouping
characters into individual words and symbols (tokens).
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automaton)
• DFA (Deterministic Finite Automaton)
• LEX
The Structure of a Compiler
Parser
• Given a formal syntax specification (typically a context-free
grammar [CFG]), the parser reads tokens and groups them into
units as specified by the productions of the CFG being used.
• As syntactic structure is recognized, the parser either calls
corresponding semantic routines directly or builds a syntax tree.
• CFG (Context-Free Grammar)
• LL, LR, SLR, LALR parsers
• YACC
The Structure of a Compiler
Semantic Routines
• Perform two functions:
– Check the static semantics of each construct
– Do the actual translation
• The heart of a compiler
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)


The Structure of a Compiler
Optimizer
• The IR code generated by the semantic routines is analyzed and
transformed into functionally equivalent but improved IR code.
• This phase can be very complex and slow.
• Loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
The Structure of a Compiler
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/DAG
• Grammar-Based Code Generator
The Structure of a Compiler

Scanner [Lexical Analyzer]
-> Tokens
-> Parser [Syntax Analyzer]
-> Parse tree
-> Semantic Process [Semantic Analyzer]
-> Abstract Syntax Tree w/ Attributes
-> Code Generator [Intermediate Code Generator]
-> Non-optimized Intermediate Code
-> Code Optimizer
-> Optimized Intermediate Code
-> Code Generator
-> Target machine code

Cousins of the compiler
• PREPROCESSORS take raw source code and
produce the input actually read by the compiler.
– MACRO PROCESSING: macro calls need to be
replaced by the correct text.
  • Macros can be used to define a constant used in many places,
  e.g. #define BUFSIZE 100 in C.
  • They are also useful as shorthand for often-repeated expressions:
  #define DEG_TO_RADIANS(x) ((x)/180.0*M_PI)
  #define ARRAY(a,i,j,ncols) ((a)[(i)*(ncols)+(j)])
– FILE INCLUSION: included files (e.g. using #include in
C) need to be expanded.
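
Macro processing for constants such as BUFSIZE can be sketched as plain text substitution. This is a deliberately small illustration and not how the C preprocessor is implemented: the function `expand` is hypothetical, it handles only object-like macros, leaves function-like macros untouched, and does not protect string literals or comments.

```python
import re

# Matches "#define NAME replacement"; a "(" right after NAME does not match.
DEFINE = re.compile(r"\s*#define\s+(\w+)\s+(.+)")


def expand(source):
    """Replace each use of an object-like macro with its replacement text."""
    macros = {}
    output = []
    for line in source.splitlines():
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue  # the directive itself is not passed to the compiler
        for name, body in macros.items():
            # A function is used as the replacement so that backslashes in
            # the body are copied literally.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, body=body: body, line)
        output.append(line)
    return "\n".join(output)


expanded = expand("#define BUFSIZE 100\nchar buf[BUFSIZE];\nint n = BUFSIZE * 2;")
print(expanded)
```

The compiler proper then sees only the expanded text and never learns that `BUFSIZE` existed.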

Cousins of the compiler
• ASSEMBLERS take assembly code and
convert it to machine code.
• Some compilers go directly to machine code;
others produce assembly code and then call a
separate assembler.
• Either way, the output machine code is
usually RELOCATABLE, with memory
addresses starting at location 0.

Cousins of the compiler

• LOADERS take relocatable machine code
and alter the addresses, putting the
instructions and data in a particular location
in memory.

• The LINK EDITOR (part of the loader) pieces
together a complete program from several
independently compiled parts.
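
Address alteration by a loader can be sketched as follows. The object-module layout is invented for illustration: a list of words assembled as if the program started at address 0, plus a list of the positions that hold addresses. Real object-file formats carry much richer relocation records.

```python
def relocate(code, relocation_entries, load_address):
    """Return a copy of code with each listed word shifted by load_address."""
    loaded = list(code)
    for index in relocation_entries:
        loaded[index] += load_address
    return loaded


# Hypothetical module: words 1 and 3 hold addresses relative to location 0;
# the other words are opcodes or constants and must not change.
module = [10, 4, 20, 5, 0, 7]
address_words = [1, 3]

image = relocate(module, address_words, load_address=1000)
print(image)
```

Only the words named in the relocation list change, which is why the assembler has to record where the addresses are.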

Compiler writing tools
• We’ve come a long way since the 1950s.
• SCANNER GENERATORS produce lexical analyzers
automatically.
– Input: a specification of the tokens of a language (usually written
as regular expressions)
– Output: C code to break the source language into tokens.
• PARSER GENERATORS produce syntactic analyzers
automatically.
– Input: a specification of the language syntax (usually written
as a context-free grammar)
– Output: C code to build the syntax tree from the token sequence.
• There are also automated systems for code synthesis.
Compiler-Construction Tools
Some commonly used compiler-construction tools include:
1. Parser generators
• Automatically produce syntax analyzers from a grammatical description of a
programming language.
2. Scanner generators
• Produce lexical analyzers from a regular-expression description of the tokens of a
language.
3. Syntax-directed translation engines
• Produce a collection of routines for walking a parse tree and generating intermediate code.
4. Code-generator generators
• Produce a code generator from a collection of rules for translating each operation of the
intermediate language into the machine language of the target machine.
5. Data-flow analysis engines
• Facilitate the gathering of information about how values are transmitted from one part of a
program to each other part. This is a key part of code optimization.
6. Compiler-construction toolkits
• Provide an integrated set of routines for constructing various phases of a compiler.
The Grouping of Phases
• Compiler front and back ends:
– Front end: analysis (machine independent)
– Back end: synthesis (machine dependent)
• Compiler passes:
– A collection of phases is done only once (single
pass) or multiple times (multi pass)
  • Single pass: usually requires everything to be defined
  before being used in the source program
  • Multi pass: the compiler may have to keep the entire program
  representation in memory

One-Pass Compiler
• A one-pass compiler is a compiler that
interleaves semantic analysis and code
generation with parsing.
• If intermediate code generation is interleaved
with parsing, a syntax tree need not be built.
• In that case, it is also possible to write the
intermediate code to an output file on the fly.
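
Interleaving code generation with parsing can be sketched with a small recursive-descent translator that emits code the moment a construct is recognized and never builds a tree. The stack-machine instructions (`LOAD`, `PUSH`, `ADD`, `MUL`) and the expression grammar are assumptions for illustration; the list `code` stands in for the output file.

```python
import re


def compile_expr(text):
    """Translate an infix expression to stack-machine code in a single pass."""
    # Note: findall silently skips characters that match no pattern.
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", text)
    pos = 0
    code = []  # stands in for the output file

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = advance()
        if tok == "(":
            expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            code.append(f"PUSH {tok}")
        elif tok[0].isalpha() or tok[0] == "_":
            code.append(f"LOAD {tok}")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append("MUL")  # emitted as soon as both operands are parsed

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append("ADD")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


stack_code = compile_expr("initial + rate * 60")
print("\n".join(stack_code))
```

Each instruction is appended while the parser is still inside the corresponding grammar rule, so memory use does not grow with the size of a syntax tree.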

Multi-Pass Compiler
• Semantic analysis is easier to perform during
a separate traversal of a syntax tree.
• This is because a syntax tree reflects the
program’s semantic structure better than the
parse tree (especially with top-down
parsers).
• In this case, attribute grammars can be used
to build a syntax tree, not to enforce
semantic rules.
Lexical Analyzer
• The lexical analyzer reads the source program character by
character and returns the tokens of the source program.
• A token describes a pattern of characters that have the same
meaning in the source program (such as identifiers,
operators, keywords, numbers, delimiters and so on).
Ex: newval := oldval + 12 => tokens:
newval   identifier
:=       assignment operator
oldval   identifier
+        add operator
12       a number

• It puts information about identifiers into the symbol table.
• Regular expressions are used to describe tokens (lexical
constructs).
• A (deterministic) finite state automaton can be used in
the implementation of a lexical analyzer.
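
A table-driven DFA for two token classes might look like the sketch below. The state names, the character classes and the function `classify` are illustrative assumptions; generated scanners use much larger tables built automatically from regular expressions.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table: state -> {character class -> next state}.
# A missing entry means the automaton rejects.
TRANSITIONS = {
    "start": {"letter": "in_id", "digit": "in_num"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
    "in_num": {"digit": "in_num"},
}
ACCEPTING = {"in_id": "identifier", "in_num": "number"}


def classify(lexeme):
    """Run the DFA over lexeme; return its token kind, or None if rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS[state].get(char_class(ch))
        if state is None:
            return None
    return ACCEPTING.get(state)


for word in ("oldval", "12", "12ab"):
    print(word, classify(word))
```

The automaton is deterministic because each state has at most one successor per character class, so classification takes one table lookup per character.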
Syntax Analyzer (Parser)
• A syntax analyzer creates the syntactic structure (generally a parse tree)
of the given program.
• A syntax analyzer is also called a parser.
• A parse tree describes a syntactic structure.

assgstmt
  identifier: newval
  :=
  expression
    expression
      identifier: oldval
    +
    expression
      number: 12

• In a parse tree, all terminals are at leaves.
• All inner nodes are non-terminals in a context free grammar.
Syntax Analyzer (CFG)
• The syntax of a language is specified by a context free grammar
(CFG).
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the
rules implied by a CFG.
– If it does, the syntax analyzer creates a parse tree for the given program.

• Ex:
assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression

Syntax Analyzer versus Lexical Analyzer
Which constructs of a program should be
recognized by the lexical analyzer, and which
ones by the syntax analyzer?
– Both do similar things, but the lexical analyzer deals with simple
non-recursive constructs of the language.
– The syntax analyzer deals with recursive constructs of the language.
– The lexical analyzer simplifies the job of the syntax analyzer.
– The lexical analyzer recognizes the smallest meaningful units (tokens) in a
source program.
– The syntax analyzer works on those tokens to recognize meaningful
structures in the programming language.
Parsing Techniques
• Depending on how the parse tree is created, there are different
parsing techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing
– Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root and proceeds towards the leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves and proceeds towards the root.
– Efficient bottom-up parsers are normally created with the help of software tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing: simple, restrictive, easy to implement
– LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR)
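
A hand-written top-down (recursive predictive) parser can be sketched as below. It assumes the expression grammar has been rewritten without left recursion as expr -> term { + term }, term -> factor { * factor }, which is an assumption of this sketch and not a grammar given in the slides. The token kinds and the nested-tuple tree are also illustrative, and the result is a compact syntax tree, not a full parse tree.

```python
def parse_assignment(tokens):
    """Predictive parse of: IDENTIFIER ASSIGN expr. Returns a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = tokens[pos][1] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found!r}")
        pos += 1
        return tokens[pos - 1][1]

    def factor():
        # One token of lookahead decides which alternative to take.
        if peek() == "NUMBER":
            return ("number", int(expect("NUMBER")))
        return ("identifier", expect("IDENTIFIER"))

    def term():
        node = factor()
        while peek() == "TIMES":
            expect("TIMES")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, term())
        return node

    target = expect("IDENTIFIER")
    expect("ASSIGN")
    tree = ("=", ("identifier", target), expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree


token_list = [
    ("IDENTIFIER", "position"), ("ASSIGN", "="), ("IDENTIFIER", "initial"),
    ("PLUS", "+"), ("IDENTIFIER", "rate"), ("TIMES", "*"), ("NUMBER", "60"),
]
syntax_tree = parse_assignment(token_list)
print(syntax_tree)
```

Each nonterminal becomes one function, and the call structure starts at the root of the tree and works down to the leaves, which is what makes the method top-down.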
Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors
and collects type information for code generation.
• Type checking is an important part of the semantic analyzer.
• Normally, semantic information cannot be represented by the
context-free languages used in syntax analyzers.
• The context-free grammars used in syntax analysis are therefore
extended with attributes (semantic rules):
– the result is a syntax-directed translation,
– attribute grammars.
• Ex:
newval := oldval + 12

• The type of the identifier newval must match the type of the expression (oldval + 12).

Intermediate Code Generation
• A compiler may produce explicit intermediate code representing the
source program.
• Intermediate code is generally machine (architecture)
independent, but its level is close to the level of
machine code.
• Ex:
pos := init + rate * 60

id1 := id2 + id3 * 60

temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
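
One way to produce three-address code of this shape is a post-order walk of the annotated tree, creating a new temporary for every operator. The nested-tuple tree encoding and the function name `gen_tac` are assumptions of this sketch.

```python
def gen_tac(tree):
    """Return three-address code for an assignment tree of nested tuples."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])           # leaves need no code
        if kind == "inttoreal":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttoreal({operand})")
            return temp
        left = walk(node[1])              # operands first (post-order)
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target[1]} = {result}")
    return code


tree = ("=", ("id", "id1"),
        ("+", ("id", "id2"),
         ("*", ("id", "id3"), ("inttoreal", ("num", 60)))))
tac = gen_tac(tree)
print("\n".join(tac))
```

Each instruction has at most one operator on its right-hand side, which is what makes the code "three-address".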

Code Optimizer (for Intermediate Code Generator)

• The code optimizer optimizes the code
produced by the intermediate code generator
in terms of time and space.

• Ex:
temp1 = id3 * 60.0
id1 = id2 + temp1
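
The two improvements in this example can be sketched as simple rewrites over a list of instructions: the conversion of the constant 60 is done at compile time, and the final copy is merged into the instruction that computes its source. The tuple format `(dest, op, arg1, arg2)` and the function `optimize` are assumptions of this sketch, and the surviving temporary keeps its original name `temp2` where the slide renumbers it to `temp1`.

```python
def optimize(code):
    """Fold inttoreal of integer literals and remove a trailing copy."""
    constants = {}  # temporary name -> literal it is known to hold
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)  # substitute known constants into operands
        b = constants.get(b, b)
        if op == "inttoreal" and a.isdigit():
            constants[dest] = str(float(int(a)))  # convert at compile time
            continue
        if op == "copy" and out and out[-1][0] == a:
            # The previous instruction computed the value being copied.
            # Assumes that temporary is not used again afterwards.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


unoptimized = [
    ("temp1", "inttoreal", "60", None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
optimized = optimize(unoptimized)
for instruction in optimized:
    print(instruction)
```

Four instructions become two, and the run-time conversion disappears, which saves both time and space.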

Code Generator
• Produces the target language for a specific architecture.
• The target program is normally a relocatable object file
containing the machine code.
Ex:
(Assume an architecture in which at least one operand of each instruction is
a machine register.)

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
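
A very naive generator for code of this form is sketched below. It assumes the two-operand convention used in the example (source first, destination register second), hands out a fresh register for every instruction and never reuses one, so its register numbers differ from those on the slide. The instruction tuples and the function `generate` are illustrative.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}


def generate(code):
    """Translate (dest, op, arg1, arg2) tuples into two-operand assembly."""
    asm = []
    location = {}  # temporary name -> register that holds its value
    next_reg = 1

    def operand(name):
        if name in location:
            return location[name]   # value already lives in a register
        if name[0].isdigit():
            return f"#{name}"       # immediate constant
        return name                 # memory operand

    for dest, op, a, b in code:
        reg = f"R{next_reg}"        # naive: fresh register, never reused
        next_reg += 1
        asm.append(f"MOVF {operand(a)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(b)}, {reg}")
        if dest.startswith("temp"):
            location[dest] = reg    # keep temporaries in registers
        else:
            asm.append(f"MOVF {reg}, {dest}")
    return asm


assembly = generate([
    ("temp2", "*", "id3", "60.0"),
    ("id1", "+", "id2", "temp2"),
])
print("\n".join(assembly))
```

A production code generator would also decide which values to keep in the limited set of registers and when to spill them to memory.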
