
Compiler Design

JEFFREY D. ULLMAN, RAVI SETHI


OVERVIEW OF LANGUAGE PROCESSING SYSTEM

Compiler: A compiler is a translator program that reads a program written in a high-level
language (HLL), the source program, and translates it into an equivalent program in a
machine-level language (MLL), the target program. An important part of a compiler's job is
reporting errors to the programmer.
Executing a program written in a high-level programming language basically involves two steps.
The source program must first be compiled, that is, translated into an object program.
The resulting object program is then loaded into memory for execution.
Preprocessor: A preprocessor produces input for a compiler.
Preprocessors may perform the following functions.
1. Macro processing: A preprocessor may allow a user to
define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into
the program text.
3. Rational preprocessors: These preprocessors augment older
languages with more modern flow-of-control and data-structuring
facilities.
4. Language extensions: These preprocessors attempt to add
capabilities to the language by what amounts to built-in
macros.
Translator: A translator is a program that takes as input a program written in one language and produces
as output a program in another language. Besides program translation, the translator performs another
very important role: error detection. Any violation of the HLL specification is detected and
reported to the programmer.

The important roles of a translator are:

1. Translating the HLL program input into an equivalent ML program.
2. Providing diagnostic messages wherever the programmer violates the specification of the
high-level language.

Types of translators:
Interpreter
Compiler
Preprocessor
Interpreter vs Compiler

Compiler | Interpreter
Scans the entire high-level program at once. | Translates one statement at a time.
For an error-free program, it produces an object program in machine language, which then has to be executed separately. | For an error-free program, it executes the program and continues until the last statement.
The translation process carried out by a compiler is termed compilation. | The process is termed interpretation.
Program execution is not carried out by the compiler. | Program execution is carried out by the interpreter.
It processes the program statements in their physical input sequence. | It processes statements according to the logical flow of control through the program.
It processes each program statement exactly once. | It might process some statements repeatedly and might ignore others.
Phases of a compiler
The Evolution of Programming Languages

• Programming languages can be classified by generation.
• Classification based on function:
a. Imperative languages
b. Declarative languages
• Von Neumann languages.
• Object-oriented languages (OOL) and scripting languages
Programming language classification by generation:

First generation: Machine languages.

Second generation: Assembly languages.

Third generation: High-level languages (C, C++, C#, Java, etc.).

Fourth generation: Languages designed for specific applications, such as NOMAD for report
generation, ABAP (SAP) for ERP, SQL for database queries, and PostScript for
text formatting.

Fifth generation: Logic- and constraint-based languages such as Prolog and OPS5.
Programming language classification
Based on function:
Imperative languages: C, C++, C#, Java. These are built on the notion of program state and
statements that change the state.

Declarative languages: Functional languages such as ML (Meta Language) and Haskell, and
constraint logic languages such as Prolog.

Based on architecture:
Von Neumann languages: Languages whose computational model is based on the
von Neumann computer architecture.

Object-oriented programming languages: Simula 67, Smalltalk, C++, Java, C#, Ruby.

Scripting languages: Interpreted languages with high-level operators designed for
“gluing together” computations, such as Awk, JavaScript, Perl, PHP, Python, Ruby and Tcl.
The Science of building a compiler

1. Modeling in compiler design and implementation
2. The science of code optimization
Applications of compiler technology
1. Implementation of high-level programming languages
2. Optimizations for computer architectures
a. Parallelism
b. Memory hierarchies
Lexical Analysis
• Lexical analyzer: reads input characters and produces a
sequence of tokens as output (nexttoken()).
– Trying to understand each element in a program.
– Token: a group of characters having a collective meaning.
const pi = 3.14159;

Token 1: (const, -)
Token 2: (identifier, ‘pi’)
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
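The token sequence above can be reproduced with a very small hand-written scanner. The following is a minimal sketch, not a complete lexer: the names `Token`, `TokenName` and `next_token` are assumptions made for illustration, only the five token kinds of this example are handled, and the number rule is deliberately loose.

```c
#include <ctype.h>
#include <string.h>

#define LEXEME_MAX 31

/* Hypothetical token names covering only the example statement. */
typedef enum { T_CONST, T_ID, T_ASSIGN, T_REAL, T_SEMI, T_EOF, T_ERROR } TokenName;

typedef struct {
    TokenName name;
    char lexeme[LEXEME_MAX + 1];  /* attribute; empty when the name says it all */
} Token;

/* Scan one token starting at *p and advance *p past it. */
Token next_token(const char **p) {
    Token t = { T_EOF, "" };
    const char *s = *p;
    int n = 0;

    while (isspace((unsigned char)*s)) s++;          /* skip whitespace */
    if (*s == '\0') {
        /* end of input: t stays T_EOF */
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while ((isalnum((unsigned char)*s) || *s == '_') && n < LEXEME_MAX)
            t.lexeme[n++] = *s++;
        t.lexeme[n] = '\0';
        if (strcmp(t.lexeme, "const") == 0) {        /* keyword: no attribute */
            t.name = T_CONST;
            t.lexeme[0] = '\0';
        } else {
            t.name = T_ID;
        }
    } else if (isdigit((unsigned char)*s)) {
        /* simplification: any run of digits and dots counts as a real number */
        while ((isdigit((unsigned char)*s) || *s == '.') && n < LEXEME_MAX)
            t.lexeme[n++] = *s++;
        t.lexeme[n] = '\0';
        t.name = T_REAL;
    } else {
        t.name = (*s == '=') ? T_ASSIGN : (*s == ';') ? T_SEMI : T_ERROR;
        s++;
    }
    *p = s;
    return t;
}
```

Calling `next_token` repeatedly on `const pi = 3.14159;` yields the five tokens listed above, followed by `T_EOF`.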
Outline
• Role of lexical analyzer
• Specification of tokens
• Recognition of tokens
• Lexical analyzer generator
• Finite automata
• Design of lexical analyzer generator
The role of lexical analyzer

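[Figure: the lexical analyzer reads the source program and returns a token to the parser each time the parser calls getNextToken; the parser's output goes on to semantic analysis; both the lexical analyzer and the parser use the symbol table.]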
Why separate lexical analysis and parsing
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional
attribute value
• A pattern is a description of the form that the
lexemes of a token may take
• A lexeme is a sequence of characters in the
source program that matches the pattern for a
token
Example

Token        Informal description                      Sample lexemes
if           Characters i, f                           if
else         Characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           Letter followed by letters and digits     pi, score, D2
number       Any numeric constant                      3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s          "core dumped"

printf("total = %d\n", score);

In this statement, printf and score are lexemes matching the pattern for token id, and "total = %d\n" is a lexeme matching literal.


Attributes for tokens
• E = M * C ** 2
– <id, pointer to symbol table entry for E>
– <assign-op>
– <id, pointer to symbol table entry for M>
– <mult-op>
– <id, pointer to symbol table entry for C>
– <exp-op>
– <number, integer value 2>
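One way to represent such token/attribute pairs in C is a struct holding the token name plus a union for the attribute. The sketch below is an assumption-laden illustration: `SymEntry`, `install_id`, `make_id` and `make_number` are hypothetical names, and the symbol table is a fixed-size array with linear search, chosen only to keep the example short.

```c
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME 32

/* Hypothetical symbol table entry; a real one would carry type, scope, etc. */
typedef struct { char name[MAX_NAME]; } SymEntry;

static SymEntry symtab[MAX_SYMBOLS];
static int symcount = 0;

/* Return the entry for name, adding it if absent.
   Returns NULL if the table is full or the name is too long. */
SymEntry *install_id(const char *name) {
    for (int i = 0; i < symcount; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    if (symcount == MAX_SYMBOLS || strlen(name) >= MAX_NAME)
        return NULL;
    strcpy(symtab[symcount].name, name);
    return &symtab[symcount++];
}

typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenName;

typedef struct {
    TokenName name;
    union {
        SymEntry *entry;  /* attribute of ID: pointer to symbol table entry */
        int value;        /* attribute of NUMBER: integer value */
    } attr;               /* unused for the operator tokens */
} Token;

Token make_id(const char *lexeme) {
    Token t;
    t.name = ID;
    t.attr.entry = install_id(lexeme);
    return t;
}

Token make_number(int value) {
    Token t;
    t.name = NUMBER;
    t.attr.value = value;
    return t;
}
```

With these helpers, `make_id("E")` called twice returns tokens that point to the same symbol table entry, which is what lets later phases treat both occurrences as one variable.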
Lexical errors
• Some errors are beyond the power of the lexical
analyzer to recognize, because fi is a valid identifier:
– fi (a == f(x)) …
• However, it may be able to recognize errors
like:
– d = 2r
• Such errors are recognized when no token
pattern matches a character sequence
Error recovery
• Panic mode: successive characters are ignored
until we reach a well-formed token
• Delete one character from the remaining
input
• Insert a missing character into the remaining
input
• Replace a character by another character
• Transpose two adjacent characters
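Panic mode, the first strategy in the list above, can be sketched in a few lines. This is only an illustration under stated assumptions: `can_start_token` is a hypothetical predicate describing which characters may begin a token in some small language, and `panic_skip` is an invented name.

```c
#include <ctype.h>
#include <string.h>

/* Assumption: tokens start with a letter, digit, underscore, or one of a
   few operator and punctuation characters. */
static int can_start_token(char c) {
    return isalpha((unsigned char)c) || isdigit((unsigned char)c) || c == '_' ||
           (c != '\0' && strchr("=<>!+-*/;()", c) != NULL);
}

/* Panic-mode recovery: discard characters until one that can start a token.
   Returns a pointer to the remaining input and stores the number of
   characters discarded in *skipped. */
const char *panic_skip(const char *s, int *skipped) {
    *skipped = 0;
    while (*s != '\0' && !can_start_token(*s)) {
        s++;
        (*skipped)++;
    }
    return s;
}
```

For the input `@#$x = 1`, the function discards three characters and returns a pointer to `x = 1`, where normal scanning can resume.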
Input buffering
• Sometimes the lexical analyzer needs to look
ahead some symbols to decide which token
to return
– In C, we need to look at the character after -, = or < to
decide what token to return
• We need a two-buffer scheme to
handle large lookaheads safely
E = M * C * * 2 eof
Sentinels

E = M eof * C * * 2 eof eof


switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters
}
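The pseudocode above can be turned into a small runnable sketch. Several things here are assumptions for illustration: the input comes from a string instead of a file, the NUL character stands in for eof (so the input must not contain NUL), the buffer size is tiny, and the names `Scanner`, `scanner_init` and `next_char` are invented. The lexemeBegin pointer is omitted to keep the focus on the sentinel test.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4        /* tiny on purpose, so reloads happen often */
#define SENTINEL '\0'     /* stands in for eof */

/* Two buffers of BUF_SIZE characters, each followed by one sentinel slot. */
typedef struct {
    char buf[2 * (BUF_SIZE + 1)];
    char *forward;        /* next character to read */
    const char *src;      /* unread input; stands in for the source file */
} Scanner;

/* Fill one half (0 or 1) with up to BUF_SIZE characters plus a sentinel. */
static void reload(Scanner *s, int half) {
    char *base = s->buf + half * (BUF_SIZE + 1);
    size_t n = strlen(s->src);
    if (n > BUF_SIZE) n = BUF_SIZE;
    memcpy(base, s->src, n);
    base[n] = SENTINEL;   /* last slot if the half is full, earlier at end of input */
    s->src += n;
}

void scanner_init(Scanner *s, const char *input) {
    s->src = input;
    reload(s, 0);
    s->forward = s->buf;
}

/* Return the next input character, or -1 at end of input. */
int next_char(Scanner *s) {
    for (;;) {
        char c = *s->forward++;
        if (c != SENTINEL)
            return (unsigned char)c;          /* common case: one test only */
        if (s->forward == s->buf + BUF_SIZE + 1) {
            reload(s, 1);                     /* end of first buffer; forward is
                                                 already at the second buffer */
        } else if (s->forward == s->buf + 2 * (BUF_SIZE + 1)) {
            reload(s, 0);                     /* end of second buffer */
            s->forward = s->buf;
        } else {
            s->forward--;                     /* sentinel inside a buffer: */
            return -1;                        /* end of input */
        }
    }
}
```

The point of the sentinel is visible in `next_char`: for ordinary characters only one comparison is made, and the buffer-boundary checks run only when a sentinel is actually read.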
Specification of tokens
• In the theory of compilation, regular expressions
are used to formalize the specification of
tokens
• Regular expressions are a means of specifying
regular languages
• Example:
– letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying
the form of strings
Regular expressions
• ε is a regular expression, L(ε) = {ε}
• If a is a symbol in Σ, then a is a regular expression,
L(a) = {a}
• (r) | (s) is a regular expression denoting the
language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the
language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn

• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]

• Example:
– letter_ -> [A-Za-z_]
– digit -> [0-9]
– id -> letter_ (letter_ | digit)*
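The regular definition for id translates almost directly into code. The following is a minimal sketch, assuming ASCII input; the function names `is_letter_`, `is_digit` and `is_identifier` are chosen here to mirror the definitions and are not part of any library.

```c
/* letter_ -> [A-Za-z_] (assumes an ASCII-compatible character set) */
static int is_letter_(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* digit -> [0-9] */
static int is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* id -> letter_ (letter_ | digit)*
   Returns 1 if the whole string s matches the pattern, 0 otherwise. */
int is_identifier(const char *s) {
    if (!is_letter_(*s))
        return 0;                       /* must start with letter_ */
    for (s++; *s != '\0'; s++)
        if (!is_letter_(*s) && !is_digit(*s))
            return 0;                   /* the rest: letter_ or digit */
    return 1;
}
```

Under this definition `pi`, `score` and `D2` are identifiers, while `2r` and the empty string are not.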
Recognition of tokens
• Starting point is the language grammar to
understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt

expr -> term relop term
| term
term -> id
| number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
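As an illustration of how such a pattern is recognized, the sketch below matches the longest prefix of a string that fits the number pattern (digits, an optional fraction, an optional exponent introduced by an uppercase E). The names `count_digits` and `match_number` are hypothetical, and returning a length is just one possible interface.

```c
#include <ctype.h>
#include <stddef.h>

/* digits -> digit+ ; returns how many leading digits s has (possibly 0). */
static size_t count_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* number -> digits (. digits)? (E [+-]? digits)?
   Returns the length of the longest prefix of s matching the pattern,
   or 0 if s does not start with a number. */
size_t match_number(const char *s) {
    size_t n = count_digits(s);
    size_t k;

    if (n == 0)
        return 0;
    /* optional fraction: only taken if at least one digit follows the dot */
    if (s[n] == '.' && (k = count_digits(s + n + 1)) > 0)
        n += 1 + k;
    /* optional exponent: only taken if digits follow E and the optional sign */
    if (s[n] == 'E') {
        size_t m = n + 1;
        if (s[m] == '+' || s[m] == '-')
            m++;
        if ((k = count_digits(s + m)) > 0)
            n = m + k;
    }
    return n;
}
```

Each optional part is accepted only when it is complete, so for `12.` the match is `12` and the dot is left for the next token.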
Transition diagrams
• Transition diagram for relop
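A transition diagram can be coded as a loop around a switch on the current state. The sketch below does this for the pattern relop -> < | > | <= | >= | = | <>; the state numbers, the `Relop` enum and the function name `get_relop` are assumptions for illustration and need not match the numbering in the diagram.

```c
/* Attribute values for the RELOP token; NONE means no relop was found. */
typedef enum { NONE, LT, LE, EQ, NE, GT, GE } Relop;

/* Try to recognize a relational operator at *p.
   On success, advance *p past the lexeme and return its attribute.
   On failure, leave *p unchanged and return NONE. */
Relop get_relop(const char **p) {
    const char *s = *p;
    int state = 0;

    for (;;) {
        char c = *s;
        switch (state) {
        case 0:                       /* start state */
            if (c == '<')      { state = 1; s++; }
            else if (c == '>') { state = 6; s++; }
            else if (c == '=') { *p = s + 1; return EQ; }
            else return NONE;
            break;
        case 1:                       /* seen '<' */
            if (c == '=') { *p = s + 1; return LE; }
            if (c == '>') { *p = s + 1; return NE; }
            *p = s;                   /* retract: c is not part of the lexeme */
            return LT;
        case 6:                       /* seen '>' */
            if (c == '=') { *p = s + 1; return GE; }
            *p = s;                   /* retract */
            return GT;
        }
    }
}
```

States 1 and 6 show why lookahead is needed: after `<` or `>` the scanner must read one more character, and if that character does not extend the operator it is left in the input for the next token.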
Transition diagrams (cont.)
• Transition diagram for reserved words and
identifiers
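Since reserved words such as if, then and else also match the identifier pattern, one common approach is to scan the lexeme as an identifier first and then look it up in a table of reserved words. The sketch below assumes this approach; the table layout and the name `get_token` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { IF, THEN, ELSE, ID } TokenName;

/* Reserved words and the token each one stands for. */
static const struct {
    const char *lexeme;
    TokenName name;
} reserved[] = {
    { "if",   IF   },
    { "then", THEN },
    { "else", ELSE },
};

/* Given a lexeme that already matched the identifier pattern, return the
   reserved-word token if it is one, and ID otherwise. */
TokenName get_token(const char *lexeme) {
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(reserved[i].lexeme, lexeme) == 0)
            return reserved[i].name;
    return ID;
}
```

Because the lookup compares whole lexemes, a name like `iffy` is classified as `ID` even though it begins with a reserved word.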
Lexical Analyzer Generator - Lex

[Figure: creating a lexical analyzer with Lex]
Lex source program lex.l -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Structure of Lex programs

declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions
Example
%{
/* definitions of manifest constants
LT, LE, EQ, NE, GT, GE,
IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%
{ws} {/* no action and no return */}
if {return(IF);}
then {return(THEN);}
else {return(ELSE);}
{id} {yylval = (int) installID(); return(ID); }
{number} {yylval = (int) installNum(); return(NUMBER);}
%%

int installID() {/* function to install the lexeme, whose first character is
pointed to by yytext, and whose length is yyleng, into the symbol
table and return a pointer thereto */
}

int installNum() { /* similar to installID, but puts numerical
constants into a separate table */
}

Finite Automata
• Regular expressions = specification
• Finite automata = implementation

• A finite automaton consists of
– An input alphabet Σ
– A set of states S
– A start state n
– A set of accepting states F ⊆ S
– A set of transitions state --input--> state
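The five components listed above map naturally onto a table-driven implementation. The sketch below is one possible encoding of a deterministic finite automaton for identifiers (a letter or underscore followed by letters, underscores and digits). The state numbering, the explicit dead state, the grouping of characters into three input classes, and the names `char_class` and `dfa_accepts` are all assumptions made for illustration.

```c
/* Input alphabet, grouped into three classes of characters. */
enum { LETTER, DIGIT, OTHER, NUM_CLASSES };

static int char_class(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return LETTER;
    if (c >= '0' && c <= '9')
        return DIGIT;
    return OTHER;
}

/* Set of states: 0 = start, 1 = inside an identifier, 2 = dead state. */
enum { NUM_STATES = 3, START_STATE = 0 };

/* Transitions: delta[state][input class] = next state. */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /*            LETTER DIGIT OTHER */
    /* state 0 */ { 1,     2,    2 },
    /* state 1 */ { 1,     1,    2 },
    /* state 2 */ { 2,     2,    2 },
};

/* Accepting states: only state 1. */
static const int accepting[NUM_STATES] = { 0, 1, 0 };

/* Run the automaton over s; return 1 if it ends in an accepting state. */
int dfa_accepts(const char *s) {
    int state = START_STATE;
    for (; *s != '\0'; s++)
        state = delta[state][char_class(*s)];
    return accepting[state];
}
```

Changing the language recognized only requires changing the tables, not the loop in `dfa_accepts`, which is why generated lexical analyzers are typically table-driven.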