
COMPILER CONSTRUCTION

A compiler translates code written in one language into another language without changing the meaning of the program. It is also expected that a compiler should make the target code efficient and optimized in terms of time and space.
Compiler design principles provide an in-depth view of the translation and optimization process. Compiler design covers the basic translation mechanism and error detection and recovery. It includes lexical, syntax, and semantic analysis as the front end, and code generation and optimization as the back end.

Why Learn Compiler Design?


Computers are a balanced mix of software and hardware. Hardware is just a piece of machinery whose functions are controlled by compatible software. Hardware understands instructions in the form of electronic charge, which is the counterpart of binary language in software programming. Binary language has only two symbols, 0 and 1. To instruct the hardware, code must be written in binary format, which is simply a series of 1s and 0s. It would be a difficult and cumbersome task for programmers to write such code by hand, which is why we have compilers.

Language Processing System


We have learnt that any computer system is made of hardware and software. The hardware understands a language which humans cannot. So we write programs in a high-level language, which is easier for us to understand and remember. These programs are then fed into a series of tools and OS components to get the desired code that can be used by the machine. This is known as the Language Processing System.
The high-level language is converted into binary language in various phases. A compiler is a
program that converts high-level language to assembly language. Similarly, an assembler is
a program that converts the assembly language to machine-level language.
Let us first understand how a program written in C is executed on a host machine.
• The user writes a program in C (a high-level language).
• The C compiler compiles the program and translates it into an assembly program (a low-level language).
• An assembler then translates the assembly program into machine code (an object file).
• A linker links all parts of the program together for execution (executable machine code).
• A loader loads the executable into memory, and then the program is executed.
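As a concrete sketch, these steps correspond roughly to the following commands in a typical GCC toolchain on a Unix-like system (the file names here are illustrative):

gcc -E hello.c -o hello.i      # preprocess: expand #include directives and macros
gcc -S hello.i -o hello.s      # compile: translate C to assembly
as hello.s -o hello.o          # assemble: produce an object file
gcc hello.o -o hello           # link: produce an executable
./hello                        # the OS loader places the executable in memory and runs it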
Before diving straight into the concepts of compilers, we should understand a few other tools
that work closely with compilers.

Preprocessor

A preprocessor, generally considered a part of the compiler, is a tool that produces input for the compiler. It deals with macro processing, augmentation, file inclusion, language extensions, etc.
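For instance, a C preprocessor handles directives such as the following before the compiler proper sees the code (the names used here are only illustrative):

#include <stdio.h>              /* file inclusion */
#define PI 3.14159              /* simple macro definition */
#define SQUARE(x) ((x) * (x))   /* macro with arguments (macro processing) */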

Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language. The difference lies in the way they read the source code. A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code and translates the whole program; it may involve many passes. In contrast, an interpreter reads a statement from the input, converts it to intermediate code, executes it, then takes the next statement in sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler reads the whole program even if it encounters several errors.

Assembler
An assembler translates assembly language programs into machine code. The output of an assembler is called an object file, which contains a combination of machine instructions as well as the data required to place these instructions in memory.

Linker
A linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been compiled by separate assemblers. The major task of a linker is to search for and locate referenced modules/routines in a program and to determine the memory locations where this code will be loaded, so that the program instructions have absolute references.

Loader
A loader is a part of the operating system and is responsible for loading executable files into memory and executing them. It calculates the size of a program (instructions and data) and creates memory space for it. It initializes various registers to initiate execution.

Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform
(B) is called a cross-compiler.

Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the
source code of another programming language is called a source-to-source compiler.

Compiler Design - Architecture

A compiler can broadly be divided into two phases based on the way it compiles.

Analysis Phase
Known as the front end of the compiler, the analysis phase reads the source program, divides it into core parts and then checks for lexical, grammar and syntax errors. The analysis phase generates an intermediate representation of the source program and a symbol table, which are fed to the synthesis phase as input.

Synthesis Phase
Known as the back end of the compiler, the synthesis phase generates the target program with the help of the intermediate code representation and the symbol table.
A compiler can have many phases and passes.
• Pass: A pass refers to a traversal of the compiler through the entire program.
• Phase: A phase of a compiler is a distinguishable stage, which takes input from the previous stage, processes it, and yields output that can be used as input for the next stage. A pass can have more than one phase.
Compiler Design - Phases of Compiler
The compilation process is a sequence of various phases. Each phase takes input from its previous stage, has its own representation of the source program, and feeds its output to the next phase of the compiler. Let us understand the phases of a compiler.

Phase 1: Lexical Analysis

Lexical analysis is the first phase, in which the compiler scans the source code. The source is read from left to right, character by character, and the characters are grouped into tokens.

Here, the character stream from the source program is grouped into meaningful sequences by identifying the tokens. The lexical analyzer enters the corresponding tokens into the symbol table and passes the tokens to the next phase.

The primary functions of this phase are:

• Identify the lexical units in the source code
• Classify lexical units into classes such as constants and reserved words, and enter them in different tables; ignore comments in the source program
• Identify tokens that are not a part of the language

Example:

x = y + 10

Tokens

x       identifier
=       assignment operator
y       identifier
+       addition operator
10      number

The lexical analyzer represents these lexemes in the form of tokens as:

<token-name, attribute-value>
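To make this concrete, here is a minimal sketch in C of how a lexical analyzer might group the characters of x = y + 10 into such tokens; the token names and the scanning logic are simplifying assumptions, not a production scanner:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Very small lexer sketch: classifies each lexeme in a fixed input string. */
int main(void) {
    const char *src = "x = y + 10";
    size_t i = 0, n = strlen(src);
    while (i < n) {
        if (isspace((unsigned char)src[i])) { i++; continue; }
        if (isalpha((unsigned char)src[i])) {            /* identifier */
            size_t start = i;
            while (i < n && isalnum((unsigned char)src[i])) i++;
            printf("<id, %.*s>\n", (int)(i - start), src + start);
        } else if (isdigit((unsigned char)src[i])) {     /* number */
            size_t start = i;
            while (i < n && isdigit((unsigned char)src[i])) i++;
            printf("<number, %.*s>\n", (int)(i - start), src + start);
        } else {                                         /* single-character operator */
            printf("<op, %c>\n", src[i]);
            i++;
        }
    }
    return 0;
}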

Phase 2: Syntax Analysis (or Parsing)

Syntax analysis is all about discovering structure in code. It determines whether or not a text follows the expected format. The main aim of this phase is to make sure that the source code written by the programmer is syntactically correct.

Syntax analysis is based on the grammar rules of the specific programming language; it constructs a parse tree with the help of the tokens. It also determines the structure of the source program according to the grammar, or syntax, of the language.

Here is a list of tasks performed in this phase:

• Obtain tokens from the lexical analyzer
• Check whether the expression is syntactically correct
• Report all syntax errors
• Construct a hierarchical structure known as a parse tree

Example

Any identifier/number is an expression.

If x is an identifier and y+10 is an expression, then x = y+10 is a statement.

Consider the parse tree for the expression (a+b)*c:
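A textual sketch of that parse tree, with the operators as interior nodes and the operands as leaves, looks like this:

        *
       / \
      +   c
     / \
    a   b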


In a parse tree:

• Interior node: a record with an operator field and two fields for its children
• Leaf: a record with two or more fields; one for the token and the others for information about the token
• Ensures that the components of the program fit together meaningfully
• Gathers type information and checks for type compatibility
• Checks that operands are permitted by the source language
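As an illustration of how a parser discovers this structure, the sketch below shows a tiny recursive-descent recognizer in C for expressions such as (a+b)*c; the grammar, the function names and the simple return convention are assumptions made only for this example:

#include <ctype.h>
#include <stdio.h>

/* Grammar assumed for the sketch:
   expr   -> term   { '+' term }
   term   -> factor { '*' factor }
   factor -> identifier | '(' expr ')'                               */

static const char *p;          /* current position in the input string */

static int expr(void);         /* forward declaration */

static int factor(void) {
    if (isalpha((unsigned char)*p)) { p++; return 1; }   /* identifier */
    if (*p == '(') {                                     /* parenthesized expression */
        p++;
        if (!expr() || *p != ')') return 0;
        p++;
        return 1;
    }
    return 0;                                            /* syntax error */
}

static int term(void) {
    if (!factor()) return 0;
    while (*p == '*') { p++; if (!factor()) return 0; }
    return 1;
}

static int expr(void) {
    if (!term()) return 0;
    while (*p == '+') { p++; if (!term()) return 0; }
    return 1;
}

int main(void) {
    p = "(a+b)*c";
    puts(expr() && *p == '\0' ? "syntactically correct" : "syntax error");
    return 0;
}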

Phase 3: Semantic Analysis

Semantic analysis checks the semantic consistency of the code (whether the parse tree constructed follows the rules of the language). It uses the syntax tree of the previous phase along with the symbol table to verify that the given source code is semantically consistent. It also checks whether the code conveys an appropriate meaning.

The semantic analyzer will check for type mismatches, incompatible operands, functions called with improper arguments, undeclared variables, etc.

Functions of the semantic analysis phase are:

• Store the type information gathered in the symbol table or the syntax tree
• Perform type checking
• Report a semantic error when there is a type mismatch for which no type conversion rule satisfies the desired operation
• Collect type information and check for type compatibility
• Check whether the operands are permitted by the source language

Example

float x = 20.2;

float y = x*30;

In the above code, the semantic analyzer will typecast the integer 30 to the float 30.0 before the multiplication.
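A small sketch in C of the kind of check and implicit conversion the semantic analyzer applies to this example; the type enumeration and helper function are assumptions made purely for illustration:

#include <stdio.h>

typedef enum { TYPE_INT, TYPE_FLOAT } Type;

/* Result type of a binary arithmetic operation: if the operand types
   differ, insert an implicit int-to-float conversion (as for x * 30). */
static Type check_binary(Type left, Type right) {
    if (left == right) return left;
    printf("inserting int_to_float conversion\n");
    return TYPE_FLOAT;
}

int main(void) {
    Type x = TYPE_FLOAT;            /* float x = 20.2;        */
    Type literal_30 = TYPE_INT;     /* the integer literal 30 */
    Type result = check_binary(x, literal_30);
    printf("type of x * 30: %s\n", result == TYPE_FLOAT ? "float" : "int");
    return 0;
}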

Phase 4: Intermediate Code Generation

Once the semantic analysis phase is over, the compiler generates intermediate code for the target machine. It represents a program for some abstract machine.

Intermediate code lies between the high-level language and the machine-level language. This intermediate code needs to be generated in such a manner that it is easy to translate into the target machine code.

Functions of intermediate code generation:

• It should be generated from the semantic representation of the source program
• It holds the values computed during the process of translation
• It makes it easy to translate the intermediate code into the target language
• It maintains the operator precedence ordering of the source language
• It holds the correct number of operands for each instruction
Example

total = count + rate * 5

Intermediate code, using the three-address code method, is:

t1 := int_to_float(5)

t2 := rate * t1

t3 := count + t2

total := t3
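Internally, each such three-address instruction is commonly stored as a quadruple (operator, two operands, result). A minimal C sketch of such a record, with field names and sizes chosen only for illustration, could be:

/* One three-address instruction, e.g.  t2 := rate * t1  */
struct Quad {
    char op[16];       /* operator, e.g. "*", "+", "int_to_float"  */
    char arg1[16];     /* first operand,  e.g. "rate"              */
    char arg2[16];     /* second operand, e.g. "t1" (may be empty) */
    char result[16];   /* result name,    e.g. "t2"                */
};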

Phase 5: Code Optimization

The next phase performs code optimization on the intermediate code. Optimization can be thought of as something that removes unnecessary lines of code and arranges the sequence of statements in order to speed up program execution without wasting resources (CPU, memory).

The primary functions of this phase are:

• It establishes a trade-off between execution speed and compilation speed
• It improves the running time of the target program
• It generates streamlined code that is still in the intermediate representation
• It removes unreachable code and gets rid of unused variables
• It moves statements whose values do not change inside a loop out of the loop
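For instance, the intermediate code generated earlier for total = count + rate * 5 could be streamlined by folding the constant conversion int_to_float(5) into 5.0 and eliminating the copy through t3 (one illustrative optimization among several possible ones):

t1 := rate * 5.0

total := count + t1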

Phase 6: Code Generation

In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code would.
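Continuing the running example, the optimized intermediate code above might be mapped to the instructions of a hypothetical register machine roughly as follows; the mnemonics and register names are illustrative assumptions, not the output of any particular compiler:

LDF  R2, rate        ; load rate into register R2
MULF R2, R2, #5.0    ; R2 := rate * 5.0
LDF  R1, count       ; load count into register R1
ADDF R1, R1, R2      ; R1 := count + rate * 5.0
STF  total, R1       ; store the result into total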

Symbol Table
It is a data structure maintained throughout all phases of a compiler. All identifiers' names, along with their types, are stored here. The symbol table makes it easier for the compiler to quickly search for an identifier record and retrieve it. The symbol table is also used for scope management.
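A very small C sketch of such a data structure, where the field names and the fixed-size array are simplifying assumptions, could look like:

/* One symbol-table entry: an identifier's name, type and scope level. */
struct Symbol {
    char name[32];      /* identifier name, e.g. "total" */
    char type[16];      /* type name,       e.g. "float" */
    int  scope_level;   /* 0 = global, 1 = first nested scope, ... */
};

struct SymbolTable {
    struct Symbol entries[256];   /* fixed capacity, enough for the sketch */
    int count;                    /* number of entries currently in use    */
};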

Grouping of Phases - Compiler Design


The phases of a compiler can be grouped as:

Front end
The front end of a compiler consists of the following phases:
• Lexical analysis.
• Syntax analysis.
• Semantic analysis.
• Intermediate code generation.

Back end
The back end of a compiler contains:
• Code optimization.
• Code generation.

Front End
• The front end comprises the phases that depend on the input (source language) and are independent of the target machine (target language).
• It includes lexical and syntactic analysis, symbol table management, semantic analysis and the generation of intermediate code.
• Code optimization can also be done by the front end.
• It also includes error handling for the phases concerned.

Back End
• The back end comprises those phases of the compiler that depend on the target machine and are independent of the source language.
• This includes code optimization and code generation.
• In addition, it also encompasses error handling and symbol table management operations.
Passes
• The phases of a compiler can be implemented in a single pass, whose primary actions are reading the input file and writing the output file.
• Several phases of the compiler are grouped into one pass in such a way that the operations of each phase are performed during that pass.
• For example, lexical analysis, syntax analysis, semantic analysis and intermediate code generation might be grouped into one pass. If so, the token stream produced by lexical analysis may be translated directly into intermediate code.

Reducing the Number of Passes


• Minimizing the number of passes improves time efficiency, as reading from and writing to intermediate files can be reduced.
• When grouping phases into one pass, the entire program has to be kept in memory to ensure proper information flow to each phase, because one phase may need information in a different order than it is produced by the previous phase.
• The source program or target program differs from its internal representation, so the memory needed for the internal form may be larger than that of the input and output.
