Recap
• What is a translator?
• What is a compiler?
• Is a compiler the only translator available?
• What is a phase?
• A phase is a logically cohesive
operation that takes as input one
representation of the source program and
produces as output another
representation.
Source Program
→ Lexical Analysis (SCANNER)
→ Syntax Analysis (PARSER)
→ Semantic Analysis
→ Code Optimization
→ Code Generation
→ Target Program
Lexical Analyzer (SCANNER)
• ROLE: separates the characters of the
source program into groups that logically
belong together.
• TOKENS: These groups are called
tokens.
• Tokens are the basic units of syntax.
double f = sqrt(-1);
T_DOUBLE (“double”)
T_IDENT (“f”)
T_OP (“=”)
T_IDENT (“sqrt”)
T_LPAREN (“(”)
T_OP (“-”)
T_INTCONSTANT (“1”)
T_RPAREN (“)”)
T_SEP (“;”)
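A scanner that produces exactly the token stream above can be sketched with Python's `re` module. This is a minimal illustration, not a production scanner: the token class names are taken from the example, while the `//` comment syntax and the particular regular expressions are assumptions. White space and comments are matched but discarded, as the next bullet describes.

```python
import re

# Token classes, tried in order at each position. Python's regex alternation
# takes the first alternative that matches (not the longest), so the keyword
# "double" is listed before the general identifier rule.
# The comment syntax (//) is an assumption for illustration.
TOKEN_SPEC = [
    ("WHITESPACE",    r"[ \t\n]+"),
    ("COMMENT",       r"//[^\n]*"),
    ("T_DOUBLE",      r"double\b"),
    ("T_IDENT",       r"[A-Za-z_][A-Za-z0-9_]*"),
    ("T_INTCONSTANT", r"[0-9]+"),
    ("T_OP",          r"[=+\-*/]"),
    ("T_LPAREN",      r"\("),
    ("T_RPAREN",      r"\)"),
    ("T_SEP",         r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
DISCARDED = {"WHITESPACE", "COMMENT"}


def scan(source):
    """Split source into (token_class, lexeme) pairs, dropping white space and comments."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup not in DISCARDED:
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in scan("double f = sqrt(-1);"):
    print(token)
```

Listing the token classes in priority order is what lets a single combined pattern distinguish the keyword `double` from an identifier such as `doubled`.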
• Eliminates white space (blanks, tabs,
newlines) and comments.
• Key issue is speed.
• A scanner must recognize the various parts (tokens) of
the language’s syntax. The tools used are:
1. REGULAR EXPRESSIONS
2. AUTOMATA
– DETERMINISTIC FINITE AUTOMATA (DFA)
– NON-DETERMINISTIC FINITE AUTOMATA (NDFA)
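To make the link between regular expressions and automata concrete, here is a small table-driven DFA for two token classes, identifiers `letter (letter | digit)*` and integers `digit digit*`. The state names and the choice to count `_` as a letter are assumptions for this sketch, and it checks one lexeme that has already been isolated rather than scanning a whole input.

```python
import string

# Transition table of a small DFA. Missing entries mean "no transition".
TRANSITIONS = {
    ("START", "letter"): "IN_IDENT",
    ("START", "digit"): "IN_INT",
    ("IN_IDENT", "letter"): "IN_IDENT",
    ("IN_IDENT", "digit"): "IN_IDENT",
    ("IN_INT", "digit"): "IN_INT",
}
# Accepting states and the token class each one reports.
ACCEPTING = {"IN_IDENT": "T_IDENT", "IN_INT": "T_INTCONSTANT"}


def char_class(ch):
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"


def run_dfa(lexeme):
    """Return the token class if the DFA accepts the whole lexeme, else None."""
    state = "START"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:          # no transition defined: reject
            return None
    return ACCEPTING.get(state)   # None if we stopped in a non-accepting state


print(run_dfa("rate"), run_dfa("60"), run_dfa("6a"))  # T_IDENT T_INTCONSTANT None
```

Each step costs one table lookup per character, which is why DFAs suit the "key issue is speed" requirement on scanners.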
SYNTAX ANALYSIS (PARSER)
• ROLE: The syntax analyzer groups tokens
together into syntactic structures.
• Example: three tokens representing A+B
might be grouped into syntactic structure
called an expression.
Parse tree (fragment) for sqrt(-1):
Expression
  FuncCall
    UnaryExpression
      T_OP (“-”)
      Expression
        T_INTCONSTANT (“1”)
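How a parser groups tokens into structures like the tree above can be sketched as a recursive-descent parser. The grammar is hypothetical and only large enough for `sqrt(-1)` and `A+B`, and the tree is built from nested tuples whose node names loosely follow the tree above. Its input is a list of `(token_class, lexeme)` pairs such as a scanner would produce.

```python
# Hypothetical grammar:
#   expression := unary ("+" unary)*
#   unary      := "-" unary | primary
#   primary    := T_INTCONSTANT | T_IDENT | T_IDENT "(" expression ")"

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)                 # end of input

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token}, got {self.peek()}")
        self.pos += 1

    def expression(self):
        node = self.unary()
        while self.peek() == ("T_OP", "+"):  # left-associative "+"
            self.pos += 1
            node = ("BinaryExpression", "+", node, self.unary())
        return node

    def unary(self):
        if self.peek() == ("T_OP", "-"):
            self.pos += 1
            return ("UnaryExpression", "-", self.unary())
        return self.primary()

    def primary(self):
        kind, lexeme = self.peek()
        if kind == "T_INTCONSTANT":
            self.pos += 1
            return ("T_INTCONSTANT", lexeme)
        if kind == "T_IDENT":
            self.pos += 1
            if self.peek() == ("T_LPAREN", "("):
                self.pos += 1
                argument = self.expression()
                self.expect(("T_RPAREN", ")"))
                return ("FuncCall", lexeme, argument)
            return ("T_IDENT", lexeme)
        raise SyntaxError(f"unexpected token {(kind, lexeme)}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expression()
    if parser.pos != len(tokens):
        raise SyntaxError(f"unexpected trailing tokens {tokens[parser.pos:]}")
    return tree


print(parse([("T_IDENT", "sqrt"), ("T_LPAREN", "("), ("T_OP", "-"),
             ("T_INTCONSTANT", "1"), ("T_RPAREN", ")")]))
```

Each grammar rule becomes one method, so the call stack during parsing mirrors the shape of the resulting tree.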
OVERVIEW Contd..
SEMANTIC ANALYSIS
• Checks the source program for semantic errors and gathers type
information for the subsequent code-generation phase.
• The compiler checks that each operator has operands that are permitted
by the source language specification.
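The type check that turns `60` into `inttoreal(60)` in the example that follows can be sketched as a recursive walk over the syntax tree. The tuple encoding of the tree and the symbol-table contents (`id1`, `id2`, `id3` declared real) are assumptions made for this illustration, and only the two types integer and real are handled.

```python
# Hypothetical symbol-table contents; the literal 60 is an integer.
# Tree nodes: ("id", name), ("num", value), (op, left, right) for op in "+", "*", ":="
SYMBOL_TYPES = {"id1": "real", "id2": "real", "id3": "real"}


def check(node):
    """Return (annotated_node, type), inserting inttoreal where an
    integer operand meets a real one (only integer and real exist here)."""
    kind = node[0]
    if kind == "num":
        return node, "integer"
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind not in (":=", "+", "*"):
        raise ValueError(f"unknown node kind {kind!r}")
    left, left_type = check(node[1])
    right, right_type = check(node[2])
    if left_type == right_type:
        return (kind, left, right), left_type
    if kind == ":=":
        if left_type == "real" and right_type == "integer":
            return (kind, left, ("inttoreal", right)), "real"
        raise TypeError(f"cannot assign {right_type} to {left_type}")
    # Mixed integer/real arithmetic: convert the integer operand.
    if left_type == "integer":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (kind, left, right), "real"


tree = (":=", ("id", "id1"), ("+", ("id", "id2"), ("*", ("id", "id3"), ("num", 60))))
annotated, _ = check(tree)
print(annotated)
```

The walk is bottom-up: a node's type is known only after both operand types are, which is when the coercion can be inserted.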
Lexical Analysis (SCANNER)
  id1 := id2 + id3 * 60
Syntax Analysis (PARSER)
  :=
    id1
    +
      id2
      *
        id3
        60
Semantic Analysis
  :=
    id1
    +
      id2
      *
        id3
        inttoreal
          60
Intermediate Code Generation
  temp1 := inttoreal(60)
  temp2 := id3 * temp1
  temp3 := id2 + temp2
  id1 := temp3
• Multiplication precedes addition: id3 * temp1 is computed
before id2 + temp2.
• The compiler must generate a temporary name to hold
the value computed by each instruction.
• Some three-address instructions have fewer
than three operands (the first and last instructions above).
– INTERMEDIATE CODE GENERATION IS
DONE BY SYNTAX-DIRECTED
TRANSLATION, A PROCESS IN WHICH
ACTIONS OF SYNTAX ANALYSIS PHASE
GUIDE THE TRANSLATION.
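The three-address code above can be generated by a post-order walk of the annotated tree, emitting one instruction per interior node and creating a fresh temporary for each. This is a minimal sketch of the idea behind syntax-directed translation; the tuple encoding of the tree and the `tempN` naming scheme are assumptions chosen to match the example.

```python
# Annotated tree for id1 := id2 + id3 * inttoreal(60) (hypothetical tuple encoding).
tree = (":=", ("id", "id1"),
        ("+", ("id", "id2"),
              ("*", ("id", "id3"), ("inttoreal", ("num", 60)))))


def generate_three_address_code(tree):
    code = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        """Emit code for node (children first) and return the name holding its value."""
        kind = node[0]
        if kind == "id":
            return node[1]
        if kind == "num":
            return str(node[1])
        if kind == "inttoreal":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        if kind in ("+", "*"):
            left = walk(node[1])
            right = walk(node[2])
            temp = new_temp()
            code.append(f"{temp} := {left} {kind} {right}")
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    if tree[0] != ":=":
        raise ValueError("expected an assignment at the root")
    value = walk(tree[2])
    code.append(f"{tree[1][1]} := {value}")   # a copy: only two operands
    return code


for line in generate_three_address_code(tree):
    print(line)
```

Because children are translated before their parent, the multiplication is emitted before the addition, reproducing the precedence noted above.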
Code Optimization
• Is an optional phase.
• The code optimizer analyzes and changes the intermediate
code so that the transformed code is better in some sense.
• The GOAL of this phase is to
– reduce running time, space, or both.
• The term optimization is a complete misnomer, since
there is no algorithmic way of producing a target
language program that is best possible under any
reasonable definition of “best”.
• A good optimizing compiler can improve the target
program by perhaps a factor of two in overall speed, in
comparison with a compiler that generates code without
using specialized techniques.
Example of optimization:
• Local Optimization:
– Example: eliminating a jump over a jump in the
intermediate code.
if A>B goto L2
goto L3
L2:
– This sequence could be replaced by the
single statement
if A≤B goto L3
Sequence 1
• Compare A and B to set the condition codes.
• Jump to L2 if the code for > is set and
• Jump to L3
Sequence 2
• Compare A and B to set the condition codes and
• Jump to L3 if the code for < or = is set.
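The jump-over-jump rewrite can be sketched as a peephole pass that slides a three-instruction window over the code. The instruction encoding is hypothetical, `<=` stands for ≤, and the pass assumes ordinary comparisons for which negating the relational operator is exact. The label `L2:` is kept because other jumps may still target it.

```python
# Hypothetical instruction encoding:
#   ("if", a, relop, b, label)   conditional jump
#   ("goto", label)              unconditional jump
#   ("label", name)              label definition
NEGATE = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "=": "<>", "<>": "="}


def remove_jumps_over_jumps(code):
    """Rewrite  if a op b goto L1; goto L2; L1:  as  if a (not op) b goto L2; L1:"""
    result = []
    i = 0
    while i < len(code):
        window = code[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "if"
                and window[1][0] == "goto"
                and window[2] == ("label", window[0][4])):
            _, a, relop, b, _ = window[0]
            result.append(("if", a, NEGATE[relop], b, window[1][1]))
            result.append(window[2])   # keep L1: other jumps may still target it
            i += 3
        else:
            result.append(code[i])
            i += 1
    return result


code = [("if", "A", ">", "B", "L2"), ("goto", "L3"), ("label", "L2")]
print(remove_jumps_over_jumps(code))
```

The rewritten form needs one comparison and one conditional jump, which is exactly the saving of Sequence 2 over Sequence 1.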
Begin
  for I:=1 to 10 do
    for J:=1 to 10 do
      A[I,J]:=0;
  for I:=1 to 10 do
    A[I,I]:=1
End
Optimized Version
Begin
  for I:=1 to 10 do
  begin
    for J:=1 to 10 do
      A[I,J]:=0;
    A[I,I]:=1
  end
End
• The two loops over I are merged, so the loop over I runs once instead of twice.
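Merging the two loops over I (usually called loop fusion) is safe here because setting A[I,I] in row I never touches a row that a later iteration will zero. The equivalence can be checked directly with a small Python transcription of both versions; the 11×11 list with an unused row and column 0 is just a device to keep the 1..10 index bounds of the Pascal-style code.

```python
N = 10


def new_matrix():
    # Row and column 0 are unused so indices run 1..N as in the Pascal-style code.
    return [[None] * (N + 1) for _ in range(N + 1)]


def original():
    a = new_matrix()
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            a[i][j] = 0
    for i in range(1, N + 1):      # second loop over I
        a[i][i] = 1
    return a


def fused():
    a = new_matrix()
    for i in range(1, N + 1):      # single loop over I
        for j in range(1, N + 1):
            a[i][j] = 0
        a[i][i] = 1                # row i is finished before row i+1 starts
    return a


print(original() == fused())       # True
```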
• Other Optimizations
– Common subexpression elimination
– Redundant computation elimination
– Move computation to less frequently executed
places (i.e., out of loops).
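Common subexpression elimination within a single basic block can be sketched by remembering which variable currently holds each computed expression. The quadruple encoding `(dest, op, arg1, arg2)` and the `"copy"` operator used for replaced instructions are assumptions for this sketch. The important detail is invalidation: once a variable is reassigned, any remembered expression that reads it or is held in it is no longer available.

```python
# Hypothetical quadruple encoding: (dest, op, arg1, arg2).
# A replaced instruction becomes (dest, "copy", source, None).

def eliminate_common_subexpressions(block):
    """Local CSE over one basic block (straight-line code, no jumps)."""
    available = {}   # (op, arg1, arg2) -> variable currently holding that value
    result = []
    for dest, op, arg1, arg2 in block:
        key = (op, arg1, arg2)
        if key in available:
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, arg1, arg2))
        # Assigning dest invalidates expressions that read dest or live in dest.
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        # The value is now in dest, unless dest was one of its own operands.
        if dest not in (arg1, arg2) and key not in available:
            available[key] = dest
    return result


block = [("t1", "*", "b", "c"),
         ("t2", "+", "a", "t1"),
         ("t3", "*", "b", "c"),    # same expression as t1
         ("t4", "+", "t3", "d")]
for quad in eliminate_common_subexpressions(block):
    print(quad)
```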
Code Generation
• Converts the intermediate code into a sequence of machine
instructions.
• A simple code generator might map the statement A:=B+C into the
machine code sequence
LOAD B
ADD C
STORE A
• However, such straightforward macro-like expansion of intermediate
code into machine code usually produces a target program that
contains many redundant loads and stores and that uses the
resources of the target machine inefficiently.
• To avoid this, the code generator keeps track of the run-time contents of
registers.
• Knowing what quantities reside in registers, the code
generator can generate loads and stores only when necessary.
• It attempts to use registers as efficiently as possible.
• Register allocation is difficult to do
optimally, but some heuristic approaches
exist and give reasonably good results.
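The difference between macro expansion and register tracking can be sketched for a hypothetical single-accumulator machine with `LOAD`, `ADD`, `MUL` and `STORE` (`MUL` is an assumption; the example above shows only `LOAD`, `ADD` and `STORE`). The tracking version remembers which variable's value is in the accumulator and omits the `LOAD` when that value is already there.

```python
OPCODES = {"+": "ADD", "*": "MUL"}


def generate_naive(quads):
    """Macro-expand each quad dest := arg1 op arg2 into LOAD / op / STORE."""
    code = []
    for dest, op, arg1, arg2 in quads:
        code += [f"LOAD {arg1}", f"{OPCODES[op]} {arg2}", f"STORE {dest}"]
    return code


def generate_tracking(quads):
    """Same translation, but skip the LOAD when the accumulator already
    holds the needed value."""
    code = []
    in_acc = None     # variable whose current value is in the accumulator
    for dest, op, arg1, arg2 in quads:
        if in_acc != arg1 and in_acc == arg2:
            arg1, arg2 = arg2, arg1      # valid because + and * are commutative
        if in_acc != arg1:
            code.append(f"LOAD {arg1}")
        code.append(f"{OPCODES[op]} {arg2}")
        code.append(f"STORE {dest}")
        in_acc = dest                    # after STORE, accumulator and dest agree
    return code


quads = [("A", "+", "B", "C"), ("D", "*", "A", "E")]
print(generate_naive(quads))
print(generate_tracking(quads))
```

For these two statements the naive version emits a `LOAD A` immediately after `STORE A`; the tracking version drops it. Removing redundant stores as well would also need liveness information, which this sketch does not model.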
SUMMARY
Instruction Selection
– Produce compact, fast code.
– Use available addressing modes.
– A pattern-matching problem:
• Ad hoc techniques
• Tree pattern-matching algorithms
• String pattern-matching algorithms
• Dynamic programming techniques
Register Allocation
– Limited resources.
– Loads and Stores should be minimized.
– Keep track of the run-time contents of registers.
– Optimal allocation is difficult.
• NP-complete for 1 or k registers.
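Because optimal allocation is hard, compilers use heuristics. One common family, named here as a swapped-in illustration rather than the method of this course, treats allocation as graph coloring: variables that are live at the same time interfere and must get different registers. The greedy sketch below assumes the interference graph is already known and simply spills any variable for which no register is free.

```python
def allocate_registers(interference, k):
    """Greedy coloring with k registers numbered 0..k-1.
    interference[v] is the set of variables that cannot share a register with v."""
    assignment = {}   # variable -> register number
    spilled = []      # variables left in memory
    # Handle the most constrained (highest-degree) variables first.
    for var in sorted(interference, key=lambda v: len(interference[v]), reverse=True):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled


interference = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(allocate_registers(interference, 2))
print(allocate_registers(interference, 3))
```

With two registers the triangle `a`, `b`, `c` cannot be colored, so one variable is spilled; the greedy order gives a usable answer quickly but carries no optimality guarantee.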
In next class
• Continue with the overview.
• Passes of compilers.
• Bootstrapping
• Cross Compilation
Overview Contd…
Symbol Table Management
• An essential function of a compiler is to record the
identifiers used in the source program and
collect information about the various attributes of
each identifier.
• The attributes may provide information about the
– Storage allocated for an identifier.
– Its type
– Its scope
– For procedures: the names and types of the arguments,
the method of passing each argument, and the type returned.
• Symbol Table is a data structure containing a
record for each identifier, with fields for the
attributes of the identifier.
• This data structure allows us to find the record
for each identifier quickly and to store and
retrieve data from the record quickly.
• When an identifier in the source program is
detected by the lexical analyzer, the identifier is
entered into the symbol table.
• However, the attributes of an identifier
cannot normally be determined during
lexical analysis.
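• E.g., in the Pascal declaration
var position, initial, rate: real;
the type real has not yet been seen when the lexical
analyzer encounters position, initial and rate.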
• The remaining phases enter information
about identifiers into the symbol table and
then use the information in various ways.
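A symbol table that supports this two-step filling can be sketched with a Python dict, which is a hash table and gives fast average-case insertion and lookup. The class and method names are hypothetical. `insert` models the scanner entering a bare name, and `declare` models a later phase adding the type; it also reports an identifier that is declared again with a contradictory type, one of the errors listed below.

```python
class SymbolTable:
    """One record (a dict of attributes) per identifier, kept in a hash table."""

    def __init__(self):
        self._records = {}

    def insert(self, name):
        """Called when the scanner meets an identifier; attributes come later."""
        return self._records.setdefault(name, {})

    def lookup(self, name):
        return self._records.get(name)

    def declare(self, name, type_name):
        """Called by a later phase once the declaration's type is known."""
        record = self._records.setdefault(name, {})
        if record.get("type", type_name) != type_name:
            raise ValueError(f"{name} multiply declared with contradictory types")
        record["type"] = type_name


table = SymbolTable()
names = ["position", "initial", "rate"]
for name in names:            # lexical analysis: only the names are known
    table.insert(name)
print(table.lookup("rate"))   # {}
for name in names:            # declaration processing: type real is now known
    table.declare(name, "real")
print(table.lookup("rate"))   # {'type': 'real'}
```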
Error Detection and Handling
• Each phase can encounter errors.
• For example:
– The lexical analyzer may be unable to proceed because the next token
in the source program is misspelled.
– The syntax analyzer may be unable to infer a structure for its input
because a syntactic error such as missing parenthesis has occurred.
– The intermediate code generator may detect an operator whose
operands have incompatible types.
– The code optimizer, during control flow analysis, may detect that
certain statements can never be reached.
– The code generator may find a compiler-created constant that is too
large to fit in a word of the target machine.
– While entering information into symbol table, the book-keeping routine
may discover an identifier that has been multiply declared with
contradictory attributes.
• Whenever a phase of the compiler discovers an
error, it must report the error to an error handler,
which issues an appropriate diagnostic message.
• Once an error has been noted, the compiler must
modify the input to the phase detecting the error,
so that the latter can continue processing its
input, looking for subsequent errors.
• A compiler that stops when it finds the first error
is not as helpful as it could be.
• Good error handling is difficult because certain
errors can mask subsequent errors.
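Reporting to an error handler and then modifying the input so scanning can continue might look like the following sketch. The token set is hypothetical, and the recovery action chosen here, deleting the illegal character, is only one simple possibility. Because the scanner does not stop at the first error, both bad characters in the example are reported in a single run.

```python
import re

# Hypothetical token set for the sketch; any other character is illegal.
TOKEN = re.compile(r"\s+|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[-+*/=();]")


class ErrorHandler:
    """Collects diagnostics from any phase instead of stopping at the first."""

    def __init__(self):
        self.messages = []

    def report(self, phase, position, message):
        self.messages.append(f"{phase} error at position {position}: {message}")


def scan_with_recovery(source, errors):
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            errors.report("lexical", pos, f"illegal character {source[pos]!r}")
            pos += 1          # recovery: delete the character and keep scanning
            continue
        if not match.group().isspace():
            tokens.append(match.group())
        pos = match.end()
    return tokens


errors = ErrorHandler()
print(scan_with_recovery("x = y $ 3 # z;", errors))
print(errors.messages)
```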
Phases Grouped into Passes
• In an implementation of a compiler, portions of
one or more phases are combined into a module
called a pass.
• A pass reads the source program or the output
of the previous pass, makes the transformations
specified by its phases, and writes output into an
intermediate file, which may then be read by a
subsequent pass.
• If several phases are grouped together into one
pass, then the operation of the phases may be
interleaved, with control alternating among several
phases.
• The number of passes and the grouping of
phases into passes depend on:
– The particular language and machine.
– The structure of the language.
• Certain languages require at least two passes to
generate code easily. For example, ALGOL allows the
declaration of a name to occur after uses of that
name. Code for expressions containing that name
cannot be generated conveniently until the
declaration has been seen.
– Environment.
– The environment in which the compiler must operate
can also affect the number of passes.
– A multi-pass compiler can be made to use less space
than a single-pass compiler, since the space occupied
by the compiler program for one pass can be reused
by the following pass.
– A multi-pass compiler is, of course, slower than a
single-pass compiler because each pass reads and writes
an intermediate file.
– Thus a compiler running on a computer with a small
memory would normally use several passes, while on
a computer with a large memory, a compiler with fewer
passes would be possible.
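The ALGOL case of a declaration appearing after a use shows why a second pass helps. In this sketch the program encoding is hypothetical, "generating code" is reduced to producing a string per use, and the passes communicate through an in-memory dict rather than an intermediate file. The single-pass version fails on the forward reference; the two-pass version first collects all declarations and then generates code.

```python
# Hypothetical program encoding: ("declare", name, type) or ("use", name).
program = [("use", "x"), ("declare", "x", "real")]   # use precedes declaration


def one_pass(program):
    declared = {}
    code = []
    for stmt in program:
        if stmt[0] == "declare":
            declared[stmt[1]] = stmt[2]
        elif stmt[1] in declared:
            code.append(f"use {stmt[1]} as {declared[stmt[1]]}")
        else:
            raise NameError(f"type of {stmt[1]} unknown: declaration not yet seen")
    return code


def two_pass(program):
    # Pass 1: record every declaration in the whole program.
    declared = {stmt[1]: stmt[2] for stmt in program if stmt[0] == "declare"}
    # Pass 2: generate code; all declarations are known by now.
    return [f"use {stmt[1]} as {declared[stmt[1]]}" for stmt in program if stmt[0] == "use"]


print(two_pass(program))   # ['use x as real']
```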
Bootstrapping
• A compiler is characterized by three languages:
– Its source language
– Its object language
– The language in which it is written.
• All three languages may be different.
• A compiler may run on one machine and
produce code for another machine. Such a
compiler is called a cross-compiler.
• Sometimes we hear of a compiler being
implemented in its own language.
• How was the first compiler compiled?
• Suppose we have a new language L, which
we want to make available on several
machines, say A and B.
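Since a compiler is characterized by three languages, bootstrapping steps can be modeled as operations on triples. This sketch is a hypothetical model, not necessarily how these notes continue: "A" and "B" stand for the machine languages of machines A and B, and it assumes an L compiler already running on A. Compiling a compiler written in L with that existing compiler changes only its implementation language. The first step yields a cross-compiler, and the second yields a native compiler for B.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str   # language it translates from
    target: str   # language it produces
    impl: str     # language it is written in ("A" = machine code of machine A)


def compile_compiler(program, translator):
    """Run `translator` on the text of compiler `program`. The result translates
    the same languages but is now written in translator.target."""
    if translator.source != program.impl:
        raise ValueError(f"{translator} cannot read a compiler written in {program.impl}")
    return Compiler(program.source, program.target, translator.target)


def is_cross_compiler(compiler):
    """Assumes impl is a machine language: it runs on one machine, emits code for another."""
    return compiler.impl != compiler.target


l_to_b_in_l = Compiler("L", "B", "L")    # compiler for L, targeting B, written in L
l_to_a_on_a = Compiler("L", "A", "A")    # assumed existing L compiler running on A
cross = compile_compiler(l_to_b_in_l, l_to_a_on_a)
print(cross)                             # Compiler(source='L', target='B', impl='A')
native = compile_compiler(l_to_b_in_l, cross)
print(native)                            # Compiler(source='L', target='B', impl='B')
```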