Laboratory Manual
FOREWORD
As you may be aware, MGM has already been awarded ISO 9000 certification, and it is our endeavour to technically equip our students by taking advantage of the procedural aspects of ISO 9000 certification.
Faculty members are also advised that covering these aspects at the initial stage itself will greatly relieve them in future, as much of the load will be taken care of by the enthusiastic energies of the students once they are conceptually clear.
Dr. S.D.Deshmukh
Principal
This manual is intended for the final year students of the IT & CSE branches in the subject of PCD. It contains practical/lab sessions related to PCD, covering various aspects of the subject to enhance understanding, strengthen knowledge of a procedural programming language, and further develop skills in software development using a procedural language.
This course will also be helpful for students in understanding the design of a compiler. We have made efforts to cover various aspects of the subject; these labs encompass the regular material as well as some advanced experiments useful in real-life applications. The programming aspects are complete in themselves, to make the concepts meaningful, elaborative, and understandable, and to aid conceptual visualization.
Students are advised to thoroughly go through this manual, rather than only the topics mentioned in the syllabus, as practical aspects are the key to understanding and conceptual visualization of the theoretical aspects covered in the books.
Prof. D.S.Deshpande
HOD, CSE
Ms. R.D.Narwade
Lecturer, CSE Dept.
1. Submission of whatever lab work has been completed should be done during the next lab session, with immediate arrangement of printouts related to submission on the day of the practical assignments.
2. Students should be taught to take printouts under the observation of the lab teacher.
3. Promptness of submission should be encouraged by way of marking and evaluation patterns that will benefit the sincere students.
Rules
All students are bound to adhere to the following rules. Failure to comply with any rule will be penalized accordingly.
1. Students must be in the lab before the lab activities start. No late coming is tolerated without prior consent from the respective lecturer. Failure to do so may eliminate your mark for that particular lab session.
2. During a lab session, any form of portable data storage and retrieval device is prohibited. If found, we reserve the right to confiscate the item and void your mark for that particular lab session. Collection of the confiscated item(s) requires approval from the Deputy Dean of Academic Affairs.
3. Duplicated lab assignments: both the source and the duplicate will be considered void.
4. Submission procedure:
a) Create a folder in the D:\ drive of your workstation. Name the folder with your ID number and your name. Example: 04xxxxxx Rahul Joshi
b) Save all your answers and source codes inside the folder. Name the files according to the question, for example: question1.cpp/question1.txt.
c) Report your completed lab assignment to the instructor/demonstrator for inspection and assessment.
5. The lab module is designed as a guideline, not a comprehensive set of notes and exercises. Read your theory notes and books pertaining to the topics to be covered.
IBM-compatible 486 system, a hard drive, min 8 MB memory, Windows XP.
S/w: Turbo C++
Linux Operating System:
Tool: FLEX
SUBJECT INDEX
[Table: the lab index is only partially legible in the source. Recoverable entries: Study of YACC (weeks 10-11); Implementation of code optimization for common sub-expression elimination and loop-invariant code movement (weeks 13-14); Conduction of Viva-Voce Examinations.]
Appendix - A
Introduction
Introduction to the compiler
What is a Compiler
A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
[Figure: Source Program → Compiler → Target Program, with Error Messages reported to the user]
Phases of Compiler
[Figure: Source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generation → Code Optimizer → Code Generator → Target program; the Symbol Table Manager and the Error Handler interact with all phases]
Lexical analysis
In a compiler, lexical analysis is also called linear analysis or scanning. In lexical analysis the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.
Syntax analysis
It is also called hierarchical analysis or parsing. It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. Usually, a parse tree represents the grammatical phrases of the source program.
Semantic Analysis
The semantic analysis phase checks the source program for semantic errors and gathers type
information for the subsequent code generation phase. It uses the hierarchical structure
determined by the syntax-analysis phase to identify the operators and operands of expressions
and statements.
An important component of semantic analysis is type checking. Here the compiler checks that
each operator has operands that are permitted by the source language specification.
Symbol table management
A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table. The remaining phases enter information about identifiers into the symbol table.
Error detection
Each phase can encounter errors. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors where the token stream violates the structure rules of the language are determined by the syntax analysis phase.
Figure 1.1 Interaction of Lexical Analyzer with Parser
[Figure: the parser requests tokens from the lexical analyzer via get_next_token; both consult the symbol table and report to the error handler]
Since the lexical analyzer is the part of the compiler that reads the source text, it may
also perform certain secondary tasks at the user interface. One such task is stripping out from
the source program comments and white spaces in the form of blank, tab, and new line
characters. Another is correlating error messages from the compiler with the source program.
Sometimes lexical analyzers are divided into a cascade of two phases, the first called scanning and the second lexical analysis. The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more complex operations.
Algorithm:
1. Declare an array of characters as a buffer to store the tokens, i.e., lexbuffer.
2. Get a token from the user and put it into a character variable, say c.
3. If c is a blank, do nothing.
4. If c is a newline character, line = line + 1.
5. If c is a digit, set token_val, the value assigned for a digit, and return NUMBER.
6. If c is a proper token, then assign the token value.
7. Print the complete table with:
a. Token entered by the user
Output:
Enter the Statement
if(a==1) then b++;

Token    Code    Value
if       1
(        5       1
a        8       Pointer to Symbol table
==       6       1
1        9       Pointer to literal table
)        5       2
then     2
b        8       Pointer to literal table
++       6       2
;        7       1
5. Compare the string entered by the user with the strings stored in the character array.
Output:
Enter the string : return
It is KEYWORD
Enter the string : hello
It is not KEYWORD
Output:
Input a string : 24
It is a CONSTANT
Input a string : a34
It is NOT a CONSTANT
The transition table below describes the automaton over inputs {0, 1}:

         0           1
s0   |   s0       |  s1
s1   |   {s1, s2} |  s1
s2   |   s2       |  —

[Figure: the corresponding transition diagram over states s0, s1, s2]

This figure (which corresponds to the transition table above) is a non-deterministic finite automaton, NFA. The big circles represent states and the double circles represent accepting or final states. The state with an unlabeled arrow coming from its left is the starting state.
Constructing NFA from Regular Expression
REGULAR EXPRESSIONS
A regular expression is built up out of simpler regular expressions using a set of defining rules. Each regular expression r denotes a language L(r). The defining rules specify how L(r) is formed by combining in various ways the languages denoted by the subexpressions of r.
The rules that define the regular expressions over an alphabet Σ are:
1. ε is a regular expression that denotes {ε}, that is, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}, the set containing the string a.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
a) (r)|(s) is a regular expression denoting L(r) ∪ L(s).
b) (r)(s) is a regular expression denoting the concatenation L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
The following algorithm simulates a DFA, where s0 is the start state, move is the transition function, and F is the set of final states:

s := s0;
c := nextchar;
while c ≠ eof do
    s := move(s, c);
    c := nextchar;
end
if s is in F then
    return YES
else return NO;
Finite Automata Parsing
Accept the input string iff there exists at least one path from the initial state (s0) to a final state (sf) that spells the input string. If no path exists, reject the string.
Example: Use the input string x = 11000.
[Figure: the string 11000 is traced symbol by symbol through the states s0, s1, s2 of the automaton until the input is consumed]
Example
Consider the following NFA over inputs {0, 1}:

         0           1
q0   |  {q0, q3} |  {q0, q1}
q1   |  —        |  {q2}
q2   |  {q2}     |  {q2}
q3   |  {q4}     |  —
q4   |  {q4}     |  {q4}
We show next the proliferation of states of the NFA under the input string 01001.
[Figure: the set of active NFA states after each symbol of 01001, starting at q0, passing through combinations of q0, q1, and q3, and ending in the accepting state q4]
Example
X = ε-closure({0}) = {0, 1, 2, 4, 7}
Y = ε-closure({2}) = {2}
Z = ε-closure({6}) = {6, 1, 2, 4, 7} = {1, 2, 4, 6, 7}
T = ε-closure({1, 2, 5}) = ε-closure({1}) ∪ ε-closure({2}) ∪ ε-closure({5})
  = {1, 2, 5, 4, 6, 7} = {1, 2, 4, 5, 6, 7}
[Figure: the NFA with ε-transitions over states 0-7, with edges labeled a and b]
Consider an NFA N and a DFA D accepting the same regular language (equivalent to Algorithm 2).
1) Initially the list of states in D is empty. Create the starting state as ε-closure(s0), named after the initial state {s0} in N. That is, the new start state is state(1) = ε-closure(s0).
2) While (there is an uncompleted row in the transition table for D) do:
a) Let x = {s1, s2, ..., sk} be the state for this row.
b) For each input symbol a do:
i) Find ε-closure(N({s1}, a) ∪ N({s2}, a) ∪ ... ∪ N({sk}, a)) = some set we'll call T.
ii) Create the D-state y = {T}, corresponding to T.
iii) If y is not already in the list of D-states, add it to the list. (This results in a new row in the table.)
iv) Add the rule D(x, a) = y to the list of transition rules for D.
3) Identify the accepting states in D. (They must include the accepting states of N.) That is, make state(j) a final state iff at least one state in state(j) is a final state in the NFA.
Yet another algorithm for the construction of a DFA equivalent to an NFA, where DFAedge(s, c) = ε-closure(move(s, c)):
state[0] = { };
state[1] = ε-closure(s0);
p = 1;
j = 0;
while (j <= p) {
    foreach c in Σ do {
        e = DFAedge(state[j], c);
        if (e == state[i] for some i <= p) then
            trans[j][c] = i;
        else {
            p = p + 1;
            state[p] = e;
            trans[j][c] = p;
        }
    }
    j = j + 1;
}
There are in total 2^3 = 8 states for the corresponding DFA. This is the power set of the set S = {0, 1, 2}, of size 2^|S|. That is, the 8 states of the DFA are:
2^S = {∅, {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}}
and graphically, the transition diagram is:
[Figure: transition diagram over the eight subset-states, with edges labeled 0 and 1]
Trivially, the state {2} and ∅, plus the states {0, 2} and {0, 1} that have no input edges, can be eliminated. After the 1st elimination cycle is complete, {0, 1, 2} has no input to it and can be eliminated. Only A = {0}, B = {1} and C = {1, 2} remain.
Example
The NFA below represents the regular expression letter (letter | digit)*. Find its corresponding DFA. (The theoretical number of states in this case is 2^10 = 1,024.)
[Figure: NFA with ε-transitions over states 1-10, with edges labeled letter and digit]
Solution:
A = ε-closure({1}) = {1}
move(A, letter) = {2} (function move defined in Alg. 2)
move(A, digit) = ∅
B = ε-closure({2}) = {2, 3, 4, 5, 7, 10}
move(B, letter) = {6}
move(B, digit) = {8}
C = ε-closure({6}) = {6, 9, 10, 4, 5, 7} = {4, 5, 6, 7, 9, 10}
move(C, letter) = {6}
move(C, digit) = {8}
D = ε-closure({8}) = {8, 9, 10, 4, 5, 7} = {4, 5, 7, 8, 9, 10}
move(D, letter) = {6}
move(D, digit) = {8}
State A is the start state of the DFA, and all states that include state 10 of the NFA are accepting states of the DFA. The transition diagram of the DFA is given below.
[Figure: DFA with start state A; letter edges A→B, B→C, C→C, D→C; digit edges B→D, C→D, D→D]
MINIMIZING THE NUMBER OF STATES OF A DFA
This is an important theoretical result: every regular set is recognized by a minimum-state DFA.
Algorithm 4
Input: A DFA M with set of states S, start state s0, and a set of accepting states F.
Output: A DFA M' accepting the same language as M and having a minimum number of states.
Method:
1. Construct an initial partition P of the set of states with two groups: the accepting states F and the non-accepting states S − F.
2. Apply the procedure given below to P and construct a new partition Pnew.
for each group G of P do
    partition G into subgroups such that two states s and t of G are in the same subgroup if and only if, for all input symbols a, states s and t have transitions on a to states in the same group of P;
    /* at worst, a state will be in a subgroup by itself */
    replace G in Pnew by the set of all subgroups formed
end
3. If Pnew = P, let Pfinal = P and continue to step 4. Otherwise, repeat step 2 with P = Pnew.
4. Choose one state in each group of the partition Pfinal as the representative for that group. The representatives will be the states of the reduced DFA M'. Let s be a representative state, and suppose on input a there is a transition of M from s to t. Let r be the representative of t's group (r may be t). Then M' has a transition from s to r on a. Let the start state of M' be the representative of the group containing the start state s0 of M, and let the accepting states of M' be the representatives that are in F. Note that each group of Pfinal either consists only of states in F or has no states in F.
5. If M' has a dead state, that is, a state d that is not accepting and that has transitions to itself on all input symbols, then remove d from M'. Also, remove any states not reachable from the start state. Any transitions to d from other states become undefined.
Example
Consider the DFA given by the following transition table and transition diagram, with E as the accepting state.

       |  a  |  b
  A    |  B  |  C
  B    |  B  |  D
  C    |  B  |  C
  D    |  B  |  E
  E    |  B  |  C

[Figure: transition diagram of the DFA]

The initial partition has two groups: the non-accepting states [ABCD] and the accepting state [E].

        |  a  |  b
[ABCD]  |  B  | [CDE]
  E     |  B  |  C

We must separate D from the subgroup [ABCD], since D is going to E under b. We now build the following table.

        |  a  |  b
[ABC]   |  B  | [CD]
  D     |  B  |  E
  E     |  B  |  C

Now we separate B, which is going to D under b. We can build the following table.

        |  a  |  b
[AC]    |  B  |  C
  B     |  B  |  D
  D     |  B  |  E
  E     |  B  |  C

[Figure: transition diagram of the minimized DFA with states [AC], B, D, E]
[Figure: an NFA with ε-transitions over states 0-10 and edges labeled a and b]
[Figure: lex compiles a specification into lex.yy.c, whose compiled output provides yylex; yacc compiles a source specification into y.tab.c and y.tab.h, which provide yyparse; both are compiled together with cc]

Regular Expressions
In the following we denote by c = character, x, y = regular expressions, m, n = integers, i = identifier.

.         matches any single character except newline
*         matches 0 or more instances of the preceding regular expression
+         matches 1 or more instances of the preceding regular expression
?         matches 0 or 1 instance of the preceding regular expression
[ ]       matches any one character enclosed in the brackets
( )       groups a series of regular expressions
x|y       matches x or y
{i}       the definition of i
x/y       matches x only if it is followed by y
x{m,n}    matches m to n occurrences of x
x$        matches x at the end of a line
"s"       matches exactly what is in the quotes (except for "\" and the following character)
Algorithm:
1. Open a file in a text editor.
2. Enter keywords, rules for identifiers and constants, operators and relational operators, in the following format:
a) %{
Definitions of constants / header files
%}
b) Regular Expressions
%%
Translation rules
%%
c) Auxiliary procedures (main( ) function)
3. Save the file with the .l extension, e.g., Mylex.l
4. Call the lex tool on the terminal, e.g., [root@localhost]# lex Mylex.l. The lex tool will convert the .l file into C language code, i.e., lex.yy.c
5. Compile the file lex.yy.c, e.g., gcc lex.yy.c. After compiling the file lex.yy.c, this will create the output file a.out
6. Run the file a.out, e.g., ./a.out
7. Give input on the terminal to the a.out file; upon processing, the output will be displayed.
Example: Program for counting number of vowels and consonant
%{
#include <stdio.h>
#include <ctype.h>
int vowels = 0;
int consonants = 0;
%}
%%
[aeiouAEIOU]    vowels++;
[a-zA-Z]        consonants++;
[\n]            ;
.               ;
%%
int main()
{
printf ("This Lex program counts the number of vowels and ");
printf ("consonants in given text.");
printf ("\nEnter the text and terminate it with CTRL-d.\n");
yylex();
printf ("Vowels = %d, consonants = %d.\n", vowels, consonants);
return 0;
}
Output:
#lex alphalex.l
#gcc lex.yy.c
#./a.out
This Lex program counts the number of vowels and consonants in given text.
Enter the text and terminate it with CTRL-d.
Iceream
Vowels = 4, consonants = 3.
Output for the lexical analyzer:
123ba
NUMBER
WORD
Practical 4. Parsers
Aim: Program to implement predictive parsers
Theory:
BASIC PARSING TECHNIQUES:
ADVANTAGES OFFERED BY GRAMMARS TO BOTH LANGUAGES AND DESIGNERS
1. A grammar gives a precise, yet easy to understand, syntactic specification of a programming language.
2. From certain classes of grammars, we can automatically construct an efficient parser that determines whether a source program is syntactically well formed.
3. A properly designed grammar imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors. Tools are available for converting grammar-based descriptions of translations into working programs.
4. Languages evolve over a period of time, acquiring new constructs and performing additional tasks.
THE ROLE OF THE PARSER
[Figure: the lexical analyzer reads the source program and supplies tokens to the parser on request (get_next_token); the parser produces a parse tree passed to the rest of the front end for intermediate representation; both consult the symbol table]
There are three general types of parsers for grammars. Universal methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar, but these two methods are too inefficient to use in production compilers.
The most efficient top-down and bottom-up methods work only on subclasses of grammars, but several of these subclasses, such as the LL and LR grammars, are expressive enough to describe most syntactic constructs in programming languages.
TOP DOWN PARSING: The parse tree is created top to bottom.
Top-down parser
Recursive-Descent Parsing
o Backtracking is needed (If a choice of a production rule does not work, we
backtrack to try other alternatives.)
o It is a general parsing technique, but not widely used.
o Not efficient
S → aBc
B → bc | b
input: abc
Predictive Parsing
o no backtracking
o efficient
o Needs a special form of grammars (LL (1) grammars).
o Recursive Predictive Parsing is a special form of Recursive Descent parsing
without backtracking.
o Non-Recursive (Table Driven) Predictive Parser is also known as LL (1) parser.
To use predictive parsing, the grammar must have its left recursion eliminated and must be left-factored. When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.
A → α1 | ... | αn    (the choice is made on the current token)
Example
stmt → if ...... |
       while ...... |
       begin ...... |
       for .....
When we are trying to expand the non-terminal stmt, if the current token is if, we have to choose the first production rule. When we are trying to expand the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token. We eliminate the left recursion in the grammar, and left-factor it. But it may still not be suitable for predictive parsing (not an LL(1) grammar).
Each non-terminal corresponds to a procedure.
Ex: A → aBb (this is the only production rule for A)
proc A {
- match the current token with a, and move to the next token;
- call B;
- match the current token with b, and move to the next token;
}
A → aBb | bAB
proc A {
case of the current token {
a: - match the current token with a, and move to the next token;
- call B;
- match the current token with b, and move to the next token;
b: - match the current token with b, and move to the next token;
- call A;
- call B;
}
}
When to apply ε-productions:
A → aA | bB | ε
If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in sentential forms).
Example
A → aBe | cBd | C
B → bB | ε
C → f
proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with e, and move to the next token;
    c: - match the current token with c, and move to the next token;
       - call B;
       - match the current token with d, and move to the next token;
    f: - call C
       (f is in the first set of C)
  }
}
proc C { match the current token with f, and move to the next token; }
proc B {
  case of the current token {
    b: - match the current token with b, and move to the next token;
       - call B;
    e, d: do nothing
       (e and d are in the follow set of B)
  }
}
Implementation of a Table-Driven Predictive Parser
A table-driven parser can be implemented using an input buffer, a stack, and a parsing table. The input buffer is used to hold the string to be parsed. The string is followed by a "$" symbol that is used as a right-end marker to indicate the end of the input string. The stack is used to hold the sequence of grammar symbols. A "$" indicates the bottom of the stack. Initially, the stack has the start symbol of the grammar above the $. The parsing table is a two-dimensional array TABLE[A, a], where A is a nonterminal and a is a terminal or the $ symbol. The parser is controlled by a program that behaves as follows:
1. The program considers X, the symbol on the top of the stack, and the next input symbol a.
2. If X = a = $, then the parser announces the successful completion of the parsing and halts.
3. If X = a ≠ $, then the parser pops X off the stack and advances the input pointer to the next input symbol.
4. If X is a nonterminal, then the program consults the parsing table entry TABLE[X, a]. If TABLE[X, a] = {X → UVW}, then the parser replaces X on the top of the stack by UVW in such a manner that U comes on top. If TABLE[X, a] = error, then the parser calls the error-recovery routine.
For example, consider the following grammar:
S → aABb
A → c | ε
B → d | ε
FIRST(S) = FIRST(aABb) = { a }
FIRST(A) = FIRST(c) ∪ FIRST(ε) = { c, ε }
FIRST(B) = FIRST(d) ∪ FIRST(ε) = { d, ε }
Since the right-end marker $ is used to mark the bottom of the stack, $ will initially be immediately below S (the start symbol) on the stack; hence, $ will be in FOLLOW(S).
Consider an input string acdb. The various steps in the parsing of this string, in terms of the contents of the stack and unspent input, are shown in Table 3.
Table 3: Production Selections for Parsing Derivations for the String acdb
Stack Contents   Unspent Input   Moves
$S               acdb$           Derivation using S → aABb
$bBAa            acdb$           Popping a off the stack and advancing one position in the input
$bBA             cdb$            Derivation using A → c
$bBc             cdb$            Popping c off the stack and advancing one position in the input
$bB              db$             Derivation using B → d
$bd              db$             Popping d off the stack and advancing one position in the input
$b               b$              Popping b off the stack and advancing one position in the input
$                $               Accept
Similarly, for the input string ab, the various steps in the parsing of the string, in terms of the contents of the stack and unspent input, are shown in Table 4.
Table 4: Production Selections for Parsing Derivations for the String ab
Stack Contents   Unspent Input   Moves
$S               ab$             Derivation using S → aABb
$bBAa            ab$             Popping a off the stack and advancing one position in the input
$bBA             b$              Derivation using A → ε
$bB              b$              Derivation using B → ε
$b               b$              Popping b off the stack and advancing one position in the input
$                $               Accept
For a string adb, the various steps in the parsing of the string, in terms of the contents of the stack and unspent input, are shown in Table 5.
Table 5: Production Selections for Parsing Derivations for the String adb
Stack Contents   Unspent Input   Moves
$S               adb$            Derivation using S → aABb
$bBAa            adb$            Popping a off the stack and advancing one position in the input
$bBA             db$             Derivation using A → ε
$bB              db$             Derivation using B → d
$bd              db$             Popping d off the stack and advancing one position in the input
$b               b$              Popping b off the stack and advancing one position in the input
$                $               Accept
EXAMPLE
Test whether the grammar is LL(1) or not, and construct a predictive parsing table for it.
Since the grammar contains a pair of productions S → AaAb | BbBa, for the grammar to be LL(1), it is required that:
FIRST(AaAb) ∩ FIRST(BbBa) = ∅
%{
Definition section (token declarations)
%}
%%
Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence". For example:
expr:
    INTEGER          { $$ = $1; }
  | expr '+' expr    { $$ = $1 + $3; }
  ;
%%
Algorithm:
1. Open a file in a text editor.
2. Specify grammar rules and associated actions in the following format:
a. %{
Statements (include statements, optional)
%}
b. Lexical tokens, grammar, precedence and associated information.
%%
Grammar rules and actions
%%
c. Auxiliary procedures (main( ) function)
3. Save the file with the .y extension, e.g., parser.y.
4. Call the yacc tool on the terminal, e.g., [root@localhost]# yacc -d parser.y. The yacc tool will convert the .y file into C code files, y.tab.c and y.tab.h.
5. Compile y.tab.c together with the scanner output, e.g., cc lex.yy.c y.tab.c -ll. This will create the output file a.out.
6. Run the file a.out, e.g., ./a.out.
7. Give input on the terminal to the a.out file; upon processing, the output will be displayed.
<parser.l>
%{
#include<stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+  { yylval.dval = atof(yytext);
          return DIGIT;
        }
\n|.    { return yytext[0]; }
%%
<parser.y>
%{
/*This YACC specification file generates the LALR parser for the program considered in
experiment 4.*/
#include<stdio.h>
%}
%union
{
double dval;
}
%token <dval> DIGIT
%type <dval> expr
%type <dval> term
%type <dval> factor
%%
line: expr '\n'
{
printf("%g\n",$1);
}
;
expr: expr '+' term    { $$ = $1 + $3; }
    | term
    ;
term: term '*' factor  { $$ = $1 * $3; }
    | factor
    ;
factor: '(' expr ')'   { $$ = $2; }
      | DIGIT
      ;
%%
int main()
{
    yyparse();
    return 0;
}
int yyerror(char *s)
{
    printf("%s", s);
    return 0;
}
Output:
#lex parser.l
#yacc -d parser.y
#cc lex.yy.c y.tab.c -ll -lm
#./a.out
2+3
5
Generating an intermediate representation makes retargeting the compiler possible and allows some optimizations to be carried out that would otherwise not be possible. The following are commonly used intermediate representations:
1. Postfix notation
2. Syntax tree
3. Three-address code
Postfix Notation
In postfix notation, the operator follows the operands. For example, for the expression (a - b) * (c + d) + (a - b), the postfix representation is:
ab- cd+ * ab- +
Syntax Tree
The syntax tree is nothing more than a condensed form of the parse tree. The operator and keyword nodes of the parse tree (Figure 1) are moved to their parents, and a chain of single productions is replaced by a single link (Figure ).
40
Sometimes a statement might contain less than three references; but it is still called a threeaddress statement. The following are the three-address statements used to represent various
programming language constructs:
41
Infix Expression :
Any expression in the standard form like "2*3-4/5" is an Infix(Inorder) expression.
Postfix Expression :
The Postfix(Postorder) form of the above expression is "23*45/-".
Infix to Postfix Conversion:
In normal algebra we use the infix notation like a+b*c. The corresponding postfix notation is abc*+. The algorithm for the conversion is as follows:
1. Scan the Infix string from left to right, and initialize an empty stack.
2. If the scanned character is an operand, add it to the Postfix string.
3. If the scanned character is an operator and the stack is empty, push the character onto the stack. If the stack is not empty, compare the scanned character with the top of the stack (topStack): if topStack has higher precedence over the scanned character, pop the stack; else push the scanned character onto the stack. Repeat this step as long as the stack is not empty and topStack has precedence over the character.
4. Repeat steps 2-3 till all the characters are scanned.
5. After all characters are scanned, we have to add any characters that the stack may have to the Postfix string: if the stack is not empty, add topStack to the Postfix string and pop the stack. Repeat this step as long as the stack is not empty.
6. Return the Postfix string.
Let us see how the above algorithm will be implemented using an example.
Infix String: a+b*c-d
Initially the Stack is empty and our Postfix string has no characters. Now, the first character scanned is 'a'. 'a' is added to the Postfix string. The next character scanned is '+'. It being an operator, it is pushed to the stack.
Postfix String: a        Stack: +
Next character scanned is 'b', which will be placed in the Postfix string. Next character is '*', which is an operator. Now, the top element of the stack is '+', which has lower precedence than '*', so '*' will be pushed to the stack.
Postfix String: ab       Stack: + *
The next character is 'c', which is placed in the Postfix string. Next character scanned is '-'. The topmost character in the stack is '*', which has a higher precedence than '-'. Thus '*' will be popped out from the stack and added to the Postfix string. Even now the stack is not empty. Now the topmost element of the stack is '+', which has equal priority to '-'. So pop the '+' from the stack and add it to the Postfix string. The '-' will be pushed to the stack.
Postfix String: abc*+    Stack: -
Next character is 'd', which is added to the Postfix string. Now all characters have been scanned, so we must pop the remaining elements from the stack and add them to the Postfix string. At this stage we have only a '-' in the stack. It is popped out and added to the Postfix string. So, after all characters are scanned, this is how the stack and Postfix string will be:
Postfix String: abc*+d-  Stack: (empty)
End result: a+b*c-d → abc*+d-
Algorithm:
1. Take a stack OPSTK and initialize it to be empty.
2. Read the entire string in infix form, e.g., A+B*C.
3. Read the string character by character into the variable symbol.
i) If symbol is an operand, add it to the postfix string.
ii) If stack OPSTK is not empty and the precedence of the top-of-stack symbol is greater than that of the recently read symbol, then pop OPSTK:
topsymbol = pop(OPSTK)
Add this popped topsymbol to the postfix string.
iii) Repeat step ii as long as the stack is not empty and the precedence of the top-of-stack symbol is greater than that of the recently read symbol.
iv) Push symbol onto OPSTK.
4. Output any remaining operators: pop OPSTK till it is empty and add each popped symbol to the postfix string.
Output:
-----------------------------------------Enter the Infix Notation : (A+B)*C
Postfix Notation is: AB+C*
Theory:
To create an efficient target-language program, a programmer needs more than an optimizing compiler. In this section, we review the options available to a programmer and a compiler for creating efficient target programs. We mention the code-improving transformations that a programmer and a compiler writer can be expected to use to improve the performance of a program.
Criteria for Code-Improving Transformations
The best program transformations are those that yield the most benefit for the least effort. The
transformations provided by an optimizing compiler should have several properties.
First, a transformation must preserve the meaning of programs. That is, an optimization
must not change the output produced by a program for a given input, or cause an error, such as
division by zero, that was not present in the original version of the source program.
Second, a transformation must, on average, speed up programs by a measurable amount.
Third, a transformation must be worth the effort. It does not make sense for a compiler writer to expend the intellectual effort to implement a code-improving transformation that brings no measurable benefit.
Getting Better Performance
Dramatic improvements in the running time of a program, such as cutting the running time from a few hours to a few seconds, are usually obtained by improving the program at all levels.
[Figure: source code → front end → intermediate code → code generator → target code; the user can profile the program, change the algorithm, and transform loops; the compiler can improve loops, procedure calls, and address calculations, and can do peephole transformations on instructions]
do i = i + 1; while ( a[i] < v );
do j = j - 1; while ( a[j] > v );
if ( i >= j ) break;
x = a[i]; a[i] = a[j]; a[j] = x;
}
x = a[i]; a[i] = a[n]; a[n] = x;
/* fragment ends here */
quicksort ( m, j ); quicksort ( i + 1, n );
}
Fig. 2 C code for quicksort
An Organization for an Optimizing Compiler
Advantages of figure 3:
1. The operations needed to implement high-level constructs are made explicit in the
intermediate code, so it is possible to optimize them. For example, the address
calculations for a [i] are explicit in figure 4 so the recomputation of expressions like 4*i
can be eliminated as discussed in the next section.
2. The intermediate code can be relatively independent of the target machine, so the optimizer does not have to change much if the code generator is replaced by one for a different machine. The intermediate code in figure 4 assumes that each element of the array a takes four bytes. Some intermediate codes, e.g., P-code for Pascal, leave it to the code generator to fill in the size of a machine word. We could have done the same in our intermediate code if we replaced 4 by a symbolic constant.
[Figure 3: Organization of the code optimizer. The front end passes intermediate code to the
code optimizer, which performs control-flow analysis, data-flow analysis, and transformations,
and then hands the result to the code generator.]
(16) t7 := 4*i
(17) t8 := 4*j
(18) t9 := a[t8]
(19) a[t7] := t9
(20) t10 := 4*j
(21) a[t10] := x
(22) goto (5)
(23) t11 := 4*i
(24) x := a[t11]
(25) t12 := 4*i
(26) t13 := 4*n
(27) t14 := a[t13]
(28) a[t12] := t14
(29) t15 := 4*n
(30) a[t15] := x
f = f*i;
return(f);
}
The three-address-code representation for the program fragment above is:
1. f = 1
2. i = 2
3. if i > x goto(8)
4. f = f * i
5. t1 = i + 1
6. i = t1
7. goto(3)
8. goto calling program
Therefore, the basic blocks into which the above code can be partitioned are as follows, and
the program flow graph is shown in Figure 1.
Block B1: statements 1 and 2
Block B2: statement 3
Block B3: statements 4, 5, 6, and 7
Block B4: statement 8
Figure 2: The flow graph. Back edges are identified by computing the dominators.
Dominator (dom) relationships have the following properties:
1. They are reflexive; that is, every node dominates itself.
2. They are transitive; that is, if a dom b and b dom c, then a dom c.
5 Reducible Flow Graphs
Several code-optimization transformations are easy to perform on reducible flow graphs. A flow
graph G is reducible if and only if we can partition the edges into two disjointed groups, forward
edges and back edges, with the following two properties:
1. The forward edges form an acyclic graph in which every node can be reached from the
initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.
For example, consider the flow graph shown in Figure 3. This flow graph has no back edges,
because no edge's head dominates that edge's tail. Hence, it could be reducible only if the
entire graph were acyclic. But that is not the case. Therefore, it is not a reducible
flow graph.
To find the natural loop of a back edge n → d, we start with node n and add all the
predecessors of node n to the loop. Then we add the predecessors of the nodes that were just
added to the loop; and we continue this process until we reach node d. These nodes plus node d
constitute the set of all those nodes that can reach node n without going through node d. This is
the natural loop of the edge n → d. Therefore, the algorithm for detecting the natural loop of a
back edge is:
Input: back edge n → d.
Output: set loop, which is a set of nodes forming the natural
loop of the back edge n → d.
main()
{
loop = { d } / * Initialize by adding node d to the set loop*/
insert(n); /* call a procedure insert with the node n */
}
procedure insert(m)
{
if m is not in the loop then
{
loop = loop ∪ { m }
for every predecessor p of m do
insert(p);
}
}
For example, in the flow graph shown in Figure 1, the back edge is B3 → B2, and its natural
loop comprises the blocks B2 and B3.
After the natural loops of the back edges are identified, the next task is to identify the loop
invariant computations. A three-address statement x = y op z in a basic block B that is part of
the loop is a loop invariant statement if all possible definitions of y and z that reach this
statement lie outside the loop, or if y and z are constants, because then the calculation
y op z will be the same each time the statement is encountered in the loop. Hence, to decide
whether the statement x = y op z is loop invariant or not, we must compute the ud chaining
information. The ud chaining information is computed by doing a global data flow analysis of
the flow graph. All of the definitions that are capable of reaching a point immediately before
the start of a basic block B are computed, and we call the set of all such definitions IN(B).
The set of all the definitions capable of reaching a point immediately after the last
statement of block B will be called OUT(B). To compute IN(B) and OUT(B) for every
block B, we also need GEN(B) and KILL(B), which are defined as:
GEN(B): The set of definitions generated within block B that reach the end of block B,
that is, definitions in B whose variables are not redefined later within B.
KILL(B): The set of all the definitions outside block B that define the same variables as
are defined in block B.
Consider the flow graph in Figure 4. The GEN and KILL sets for the basic blocks are as shown
in Table 1.
Table 1: GEN and KILL sets for Figure 4 Flow Graph
Block   GEN       KILL
B1      {1,2}     {6,10,11}
B2      {3,4}     {5,8}
B3      {5}       {4,8}
B4      {6,7}     {2,9,11}
B5      {8,9}     {4,5,7}
B6      {10,11}   {1,2,6}
These sets are related by the data-flow equations:
IN(B) = the union of OUT(P) over all predecessors P of B
OUT(B) = GEN(B) ∪ (IN(B) − KILL(B))
The next step, therefore, is to solve these equations. If there are n nodes, there will be 2n
equations in 2n unknowns. The solution to these equations is not generally unique. This is
because we may have a situation like that shown in Figure 5, where a block B is a predecessor of
itself.
The equations are solved iteratively. Initialize OUT(B) = GEN(B) and IN(B) = { } for every
block B, and set flag = true. Then repeat the following step as long as flag becomes true:
{
flag = false
for each block B do
{
INnew(B) = { }
for each predecessor P of B
INnew(B) = INnew(B) ∪ OUT(P)
if INnew(B) ≠ IN(B) then
{
flag = true
IN(B) = INnew(B)
OUT(B) = GEN(B) ∪ (IN(B) − KILL(B))
}
}
}
Table 2: Initial IN and OUT sets (OUT(B) initialized to GEN(B))

Block   IN      OUT
B1      { }     {1,2}
B2      { }     {3,4}
B3      { }     {5}
B4      { }     {6,7}
B5      { }     {8,9}
B6      { }     {10,11}
Table 3: IN and OUT after the first iteration

Block   IN           OUT
B1      { }          {1,2}
B2      {1,2,6,7}    {1,2,3,4,6,7}
B3      {3,4,8,9}    {3,5,9}
B4      {3,4,5}      {3,4,5,6,7}
B5      {5}          {8,9}
B6      {6,7}        {7,10,11}
Table 4: IN and OUT after the second iteration

Block   IN                   OUT
B1      { }                  {1,2}
B2      {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
B3      {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
B4      {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
B5      {3,5,9}              {3,8,9}
B6      {3,4,5,6,7}          {3,4,5,7,10,11}
Table 5: IN and OUT after the third iteration

Block   IN                   OUT
B1      { }                  {1,2}
B2      {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
B3      {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
B4      {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
B5      {1,2,3,5,6,7,9}      {1,2,3,6,8,9}
B6      {1,3,4,5,6,7}        {3,4,5,7,10,11}
Table 6: IN and OUT after the fourth iteration (no further change, so the solution has converged)

Block   IN                   OUT
B1      { }                  {1,2}
B2      {1,2,3,4,5,6,7}      {1,2,3,4,6,7}
B3      {1,2,3,4,6,7,8,9}    {1,2,3,5,6,7,9}
B4      {1,2,3,4,5,6,7,9}    {1,3,4,5,6,7}
B5      {1,2,3,5,6,7,9}      {1,2,3,6,8,9}
B6      {1,3,4,5,6,7}        {3,4,5,7,10,11}
The next step is to compute the ud chains from the reaching definitions information, as follows.
If the use of A in block B is preceded by its definition, then the ud chain of A contains only the
last definition prior to this use of A. If the use of A in block B is not preceded by any definition
of A, then the ud chain for this use consists of all definitions of A in IN(B).
For example, in the flow graph for which IN and OUT were computed in Tables 2 through 6, the
use of a in definition 4 of block B2 is preceded by definition 3, which is a definition of a. Hence,
the ud chain for this use of a contains only definition 3. But the use of b in B2 is not preceded by
any definition of b in B2. Therefore, the ud chain for this use of b will be {1}, because this is
the only definition of b in IN(B2).
The ud chain information is used to identify the loop invariant computations. The next step is
to perform the code motion, which moves a loop invariant statement to a newly created node,
called the "preheader," whose only successor is the header of the loop. All the predecessors of
the header that lie outside the loop become predecessors of the preheader.
But sometimes the movement of a loop invariant statement to the preheader is not possible
because such a move would alter the semantics of the program. For example, if a loop invariant
statement exists in a basic block that is not a dominator of all the exits of the loop (where an exit
of the loop is a node whose successor is outside the loop), then moving the loop invariant
statement to the preheader may change the semantics of the program. Therefore, before moving
a loop invariant statement to the preheader, we must check whether the code motion is legal.
Consider the flow graph shown in Figure 6.
In the flow graph shown in Figure 6, x = 2 is loop invariant. But since it occurs in B3, which
is not a dominator of the exit of the loop, moving it to the preheader, as shown in Figure 7,
means a value of two will always get assigned to y in B5; whereas in the original program, y in
B5 may get the value one as well as two.
Figure 7: After moving x = 2 to the preheader. Moving the loop invariant statement changes the
semantics of the program.
In the flow graph shown in Figure 7, if x is not used outside the loop, then the statement x = 2
can be moved to the preheader. Therefore, for a code motion to be legal, the following
conditions must be met (assuming the moved statement itself cannot cause an error):
1. The block in which a loop invariant statement occurs should be a dominator of all exits
of the loop, or the name assigned to the block should not be used outside the loop.
2. We cannot move a loop invariant statement that assigns to A into the preheader if there
is another statement in the loop that assigns to A. For example, consider the flow graph
shown in Figure 8.
Figure 8: Moving a value to the preheader changes the original meaning of the program.
Even though x is not used outside the loop, the statement x = 2 in block B2 cannot be moved
to the preheader, because the use of x in B4 is also reached by the definition x = 1 in B1.
Therefore, if we move x = 2 to the preheader, the value that will get assigned to a in B4 will
always be two, which is not the case in the original program.
Conclusion: Thus we have studied code optimization for common subexpression elimination
and loop invariant code motion.