Jawaharlal Nehru Engineering College

Laboratory Manual
On

Principles of compiler design

For

Final Year Students CSE


Dept: Computer Science & Engineering (NBA Accredited)

Author JNEC, Aurangabad

FOREWORD

It is my great pleasure to present this laboratory manual for Final year


engineering students for the subject of Principles of compiler design keeping in
view the vast coverage required for designing a compiler.

As students, many of you may have questions in your mind regarding the
subject, and this manual attempts to answer exactly those questions.

As you may be aware, MGM has already been awarded ISO 9000 certification,
and it is our endeavour to technically equip our students by taking
advantage of the procedural aspects of ISO 9000 certification.

Faculty members are also advised that covering these aspects at the initial
stage itself will greatly relieve them in future, as much of the load will
be taken care of by the enthusiastic energies of the students once they are
conceptually clear.

Dr. S.D.Deshmukh
Principal

LABORATORY MANUAL CONTENTS

This manual is intended for the Final year students of the IT & CSE branches
in the subject of PCD. It contains practical/lab sessions related to PCD,
covering various aspects of the subject to enhance understanding, to
strengthen knowledge of a procedural programming language, and to further
develop your skills in software development using a procedural language.
This course will also help students understand the design of a compiler. We
have made an effort to cover various aspects of the subject: these labs
encompass the regular material as well as some advanced experiments useful
in real-life applications. The programming aspects are complete in
themselves, so that the concepts are meaningful, elaborated, understandable,
and easy to visualize.
Students are advised to thoroughly go through this manual rather than only
topics mentioned in the syllabus, as practical aspects are the key to
understanding and conceptual visualization of theoretical aspects covered in
the books.

Good Luck for your Enjoyable Laboratory Sessions

Prof. D.S.Deshpande
HOD, CSE

Ms. R.D.Narwade
Lecturer, CSE Dept.

General Guidelines and Rules


Guidelines
1. Students must read the lab module and make necessary preparation for each session PRIOR
TO coming to the lab.
2. Each lab session will last for approximately TWO hours, which is divided into the following
time slots:
Item: Demonstration (40 minutes)
The instructor will give a brief demonstration during the allocated time,
and students will be given some hands-on exercises. Students are therefore
expected to fully utilize this time to consult the instructor/tutors
regarding the subject matter.

Item: Lab assignment (80 minutes)
Students must:
o Answer all the given questions
o Report to the lab instructor and demonstrator to submit the
  answers before the lab ends
No take-home assignment will be allowed.
Solutions will ONLY be given after the module is covered by all
groups.

DOs and DON'Ts in the Laboratory:


1. Make entry in the Log Book as soon as you enter the Laboratory.
2. All the students should sit according to their roll numbers starting from their left
to right.
3. All the students are supposed to enter the terminal number in the log book.
4. Do not change the terminal on which you are working.
5. All the students are expected to get at least the algorithm of the program/concept
to be implemented.
6. Strictly observe the instructions given by the teacher/Lab Instructor.

Instructions for Laboratory Teachers:


1. Submission of whatever lab work has been completed should be done during
the next lab session, with immediate arrangements for the printouts related
to submission on the day of the practical assignments.
2. Students should be taught to take printouts under the observation of the
lab teacher.
3. The promptness of submission should be encouraged by way of marking and
evaluation patterns that will benefit the sincere students.

Rules
All students are bound to adhere to the following rules. Failure to comply
with any rule will be penalized accordingly.
1. Students must be in the lab before the lab activities start. Late coming
is not tolerated without prior consent from the respective lecturer. Failure
to do so may eliminate your mark for that particular lab session.
2. During the lab session, any form of portable data storage and retrieval
device is prohibited. If one is found, we reserve the right to confiscate
the item and void your mark for that particular lab session. Collection of
the confiscated item(s) requires approval from the Deputy Dean of Academic
Affairs.
3. Duplicated lab assignment: the source and duplicate will be considered void.
4. Submission procedure:
a) Create a folder in the D: \ drive of your workstation. Name the folder with your ID
number and your name. Example: 04xxxxxx Rahul Joshi
b) Save all your answers and source codes inside the folder. Name the files according to the
question, example: question1.cpp/question1.txt.
Report your completed lab assignment to the instructor/demonstrator for inspection and
assessment

Special Instruction Before Each Lab Session


1. Create 2 folders in the D: \ drive:
a) 1 folder for lab exercise and hand-on experience:
Name the folder with your ID number and your name with (demo). Example:

04xxxxxx Rahul Joshi (demo)


b) 1 folder for submission (refer to Section B)
2. This lab is aimed to apply the theories that you have learnt in Principles of compiler design,
not intended to teach you everything about C. Hence, you need to do your own homework.

3. Lab module is designed as a guideline, not a comprehensive set of notes and exercises. Read
your theory notes and books pertaining to the topics to be covered.

How to get lab manual


1. You can obtain the lab manual from the selected lab representatives
2. Please get the materials ready before coming to the lab.

H/w, S/w Requirement:

H/w: IBM-compatible 486 system, a hard drive, min. 8 MB memory, Windows XP
S/w: Turbo C++
     Linux operating system
     Tool: FLEX

SUBJECT INDEX

Lab No.  Index                                              Week involved
1        Program to generate Lexical Tokens                 1-2
2        Program to convert NFA to DFA                      5-6
3        Study of LEX tool                                  7-8
4        Design of a Predictive Parser                      9-10
5        Study of YACC                                      10-11
6        Program to convert Infix Expression to Postfix
7        Implementation of code optimization for common     13-14
         sub-expression elimination, loop invariant code
         movement

C. Conduction of Viva-Voce Examinations

Appendix - A

Introduction
Introduction to the compiler
What is a Compiler
A compiler is a program that reads a program written in one language, the
source language, and translates it into an equivalent program in another
language, the target language. As an important part of this translation
process, the compiler reports to its user the presence of errors in the
source program.

Source Program --> [ Compiler ] --> Target Program
                        |
                        v
                  Error Messages

The Analysis-Synthesis Model of Compilation


There are two parts to compilation: Analysis and Synthesis. The analysis part breaks up the
source program into constituent pieces and creates an intermediate representation of source
program. The synthesis part constructs the desired target program from the intermediate
representation. Of the two parts, synthesis requires the most specialized
techniques.
The Phases Of A Compiler
A compiler operates in six phases, each of which transforms the source
program from one representation to another. The first three phases form the
bulk of the analysis portion of a compiler. Two other activities, symbol
table management and error handling, interact with all six phases of the
compiler. These six phases are lexical analysis, syntax analysis, semantic
analysis, intermediate code generation, code optimization and code
generation.

Phases of a Compiler

source program
   |
Lexical Analyzer
   |
Syntax Analyzer
   |
Semantic Analyzer
   |
Intermediate Code Generator
   |
Code Optimizer
   |
Code Generator
   |
target program

(The symbol-table manager and the error handler interact with all six
phases.)
Lexical analysis

In a compiler, lexical analysis is also called linear analysis or scanning. In lexical analysis the
stream of characters making up the source program is read from left to right and grouped into
tokens that are sequences of characters having a collective meaning.

Syntax analysis
It is also called hierarchical analysis or parsing. It involves grouping the
tokens of the source program into grammatical phrases that are used by the
compiler to synthesize output. Usually, a parse tree represents the
grammatical phrases of the source program.
Semantic Analysis
The semantic analysis phase checks the source program for semantic errors and gathers type
information for the subsequent code generation phase. It uses the hierarchical structure
determined by the syntax-analysis phase to identify the operators and operands of expressions
and statements.
An important component of semantic analysis is type checking. Here the compiler checks that
each operator has operands that are permitted by the source language specification.
Symbol table management
Symbol table is a data structure containing the record of each identifier, with fields for the
attributes of the identifier. The data structure allows us to find the record for each identifier
quickly and store or retrieve data from that record quickly. When the lexical analyzer detects an
identifier in the source program, the identifier is entered into symbol table. The remaining
phases enter information about identifiers in to the symbol table.
Error detection
Each phase can encounter errors. The syntax and semantic analysis phases usually handle a large
fraction of the errors detectable by compiler. The lexical phase can detect errors where the
characters remaining in the input do not form any token of language. Errors where the token
stream violates the structure rules of the language are determined by the syntax analysis phase.


Intermediate code generation


After syntax and semantic analysis, some compilers generate an explicit intermediate
representation of the source program. This intermediate representation should have two
important properties: it should be easy to produce and easy to translate into target program.
During semantic analysis the compiler tries to detect constructs that have the right syntactic
structure but no meaning to the operation involved
Code optimization
The code optimization phase attempts to improve the intermediate code so
that faster-running machine code will result. There are simple optimizations
that significantly improve the running time of the target program without
slowing down compilation too much.
Code generation
The final phase of compilation is the generation of target code, consisting
normally of relocatable machine code or assembly code.


Practical 1. Lexical Analysis


Aim: Program to generate lexical tokens.
Theory:
THE ROLE OF LEXICAL ANALYZER
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
Upon receiving a get next token command from the parser, the lexical analyzer reads input
characters until it can identify the next token.
[Figure 1.1 Interaction of the Lexical Analyzer with the Parser: the parser
issues get_next_token requests to the lexical analyzer, which reads the
input and returns the next token; both components interact with the symbol
table and the error handler.]

Since the lexical analyzer is the part of the compiler that reads the source text, it may
also perform certain secondary tasks at the user interface. One such task is stripping out from
the source program comments and white spaces in the form of blank, tab, and new line
characters. Another is correlating error messages from the compiler with the source program.
Sometimes lexical analyzers are divided into a cascade of two phases, the
first called scanning and the second lexical analysis. The scanner is
responsible for doing simple tasks, while the lexical analyzer proper does
the more complex operations.


Algorithm:
1. Declare an array of characters as a buffer to store the tokens, that is,
   lexbuf.
2. Get a token from the user and put it into a character variable, say c.
3. If c is a blank, do nothing.
4. If c is a newline character, line = line + 1.
5. If c is a digit, set token_val, the value assigned for a digit, and
   return NUMBER.
6. If c is a proper token, assign the token value.
7. Print the complete table with:
   a. Token entered by the user
   b. Associated token value.
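
As an illustration, the following is a minimal C sketch of this scanning
loop. It is our own sketch, not code from the manual; the token code
NUMBER = 9 follows the token table below, and all other names are
assumptions.

#include <stdio.h>
#include <ctype.h>

#define NUMBER 9   /* code used for numbers in the token table below */

int line = 1;      /* current line number */
int token_val;     /* value attribute of the last token */

/* Minimal scanning loop following the algorithm above: skip blanks,
   count newlines, collect digits into a NUMBER token. */
int get_token(void) {
    int c;
    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\t') {
            ;                        /* blank: do nothing */
        } else if (c == '\n') {
            line = line + 1;         /* newline: bump the line counter */
        } else if (isdigit(c)) {
            token_val = 0;
            while (isdigit(c)) {     /* accumulate the whole number */
                token_val = token_val * 10 + (c - '0');
                c = getchar();
            }
            ungetc(c, stdin);        /* last char read was not a digit */
            return NUMBER;
        } else {
            return c;                /* any other character is its own token */
        }
    }
    return EOF;
}

int main(void) {
    int t;
    while ((t = get_token()) != EOF)
        printf("token %d value %d line %d\n", t, token_val, line);
    return 0;
}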

Output:
Enter the Statement
if(a==1) then b++;

Token     Code     Value
-------------------------------------------------
if        1
(         5        1
a         8        Pointer to Symbol table
==        6        1
1         9        Pointer to Literal table
)         5        2
then      2
b         8        Pointer to Symbol table
++        6        2
;         7        1

Algorithm to check whether a string is a KEYWORD or not

1. Start.
2. Declare a character array storing the keywords,
   s[6][10] = {"if", "else", "for", "int", "goto", "return"}, and another
   character array st[] to store the string to be compared; initialize the
   integer variables i, flag = 0, and m.
3. Input the string to be compared into st[].
4. Repeat steps 5 to 6 till the counter i becomes equal to the number of
   keywords stored in the array.
5. Compare the string entered by the user with the strings in the array
   using m = strcmp(st, s[i]); strcmp() returns 0 when the strings are
   equal, so set flag = 1 in that case.
6. i = i + 1.
7. If flag equals one, then it is a keyword.
8. Else it is not a keyword.
9. Stop.

Output:
Enter the string : return
It is KEYWORD
Enter the string : hello
It is not KEYWORD
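
A minimal C realization of this algorithm might look as follows (a sketch;
array sizes and prompts are our own choices):

#include <stdio.h>
#include <string.h>

int main(void) {
    char *s[6] = { "if", "else", "for", "int", "goto", "return" };
    char st[20];
    int i, flag = 0;

    printf("Enter the string : ");
    scanf("%19s", st);
    for (i = 0; i < 6; i++)
        if (strcmp(st, s[i]) == 0) {   /* strcmp returns 0 on a match */
            flag = 1;
            break;
        }
    if (flag == 1)
        printf("It is KEYWORD\n");
    else
        printf("It is not KEYWORD\n");
    return 0;
}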

Algorithm to find whether the string is CONSTANT or not


1. Start.
2. Declare the character array str[] and initialize the integer variables
   len and a = 0.
3. Input the string from the user.
4. Find the length of the string.
5. Repeat steps 6 to 7 till a < len.
6. If str[a] is a numeric character, then a++.
7. Else it is not a constant; break from the loop and go to step 9.
8. If a == len, then print that the entered string is a constant.
9. Else it is not a constant.
10. Stop.

Output:
Input a string : 24
It is a CONSTANT
Input a string : a34
It is NOT a CONSTANT
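
Again, a short C sketch of the same check (details assumed as before):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char str[20];
    int a = 0, len;

    printf("Input a string : ");
    scanf("%19s", str);
    len = strlen(str);
    while (a < len && isdigit((unsigned char)str[a]))
        a++;                       /* advance over numeric characters */
    if (a == len)
        printf("It is a CONSTANT\n");
    else
        printf("It is NOT a CONSTANT\n");
    return 0;
}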

Conclusion: Thus we have studied Lexical Analyzer


Practical 2. NFA TO DFA


Aim: Program to convert NFA to DFA.
Theory:
FINITE AUTOMATA: Machine Model that Recognizes Regular Languages
A finite automaton (FA) is a machine M defined by the 5-tuple
M = (Σ, S, s0, δ, F), where the alphabet is Σ = {0, 1}; the set of states is
S = {s0, s1, s2}; the starting state is s0; the set of final states is
F = {s2}; and the transitions δ are defined by the table below.

            0             1
  s0        {s0}          {s1}
  s1        {s1, s2}      {s1}
  s2        --            --

M can also be represented by the transition graph:

  [Transition graph: s0 loops on 0; s0 -> s1 on 1; s1 loops on 1 and 0;
   s1 -> s2 on 0; s2 is drawn with a double circle.]

This figure (which corresponds to the transition table above) is a
non-deterministic finite automaton, NFA. The big circles represent states
and the double circles represent accepting or final states. The state with
an unlabeled arrow coming from its left is the starting state.
Constructing an NFA from a Regular Expression
REGULAR EXPRESSIONS
A regular expression is built up out of simpler regular expressions using a
set of defining rules. Each regular expression r denotes a language L(r).
The defining rules specify how L(r) is formed by combining, in various ways,
the languages denoted by the subexpressions of r.
The rules that define regular expressions over an alphabet Σ are:

1. ε is a regular expression that denotes {ε}, that is, the set containing
   the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a},
   the set containing the string a.
3. Suppose r and s are regular expressions denoting the languages L(r) and
   L(s). Then:
   a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
   b) (r)(s) is a regular expression denoting L(r)L(s)
   c) (r)* is a regular expression denoting (L(r))*
   d) (r) is a regular expression denoting L(r)

A language denoted by a regular expression is said to be a regular set.

NFA vs. DFA

Deterministic Finite Automaton (DFA)
For every state there is exactly one outgoing edge per alphabet symbol (the
transition function IS a function): for each symbol in the alphabet there is
a corresponding transition, and there is only one.
Non-Deterministic Finite Automaton (NFA)
At least one of the states has more than one outgoing edge for the same
alphabet symbol (the transition function IS NOT a function; it is a
relation). There may also be ε-transitions (transitions that occur without
the presence of an input symbol from the alphabet).
ALGORITHM TO SIMULATE A DFA
Algorithm 1
Input: A string x ended by an EOF character and a DFA defined as a 5-tuple with s0 as its
initial state and F as its set of accepting states.
Output: A YES if the DFA accepts x or NO otherwise.
Method:
Apply the pseudo code algorithm below to the input string x. The function move(x, c) gives
the state to which there is a transition from state s on input character c. The function nextchar
returns the next character of the input string x.
s = s0;
c = nextchar;
while c != EOF
    s = move(s, c);
    c = nextchar;
end
if s is in F then
    return YES
else
    return NO;
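
A runnable C version of Algorithm 1 follows. The particular DFA encoded in
the move table is our own small example (it accepts binary strings ending
in 00), not the machine from the text:

#include <stdio.h>

int main(void) {
    /* move[s][c]: transition function of an assumed example DFA */
    int move[3][2] = { {1, 0},     /* s0: '0'->s1, '1'->s0 */
                       {2, 0},     /* s1: '0'->s2, '1'->s0 */
                       {2, 0} };   /* s2: '0'->s2, '1'->s0 */
    int final[3] = { 0, 0, 1 };    /* F = {s2} */
    char x[100];
    int s = 0, i;                  /* s = s0 */

    printf("Enter a binary string: ");
    scanf("%99s", x);
    for (i = 0; x[i] != '\0'; i++)
        s = move[s][x[i] - '0'];   /* c = nextchar; s = move(s, c) */
    printf("%s\n", final[s] ? "YES" : "NO");
    return 0;
}
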
Finite Automata Parsing
Accept the input string iff there exists at least one path from the initial
state (s0) to a final state (sf) that spells the input string. If no path
exists, reject the string.
Example: use the input string x = 11000. One path that spells the string is

  s0 -1-> s1 -1-> s1 -0-> s1 -0-> s1 -0-> s2

and it ends in the final state s2, so the string is accepted.

Example

Consider the NFA N = ({q0, ..., q4}, {0, 1}, δ, q0, F) with the transition
table:

            0              1
  q0        {q0, q3}       {q0, q1}
  q1        --             {q2}
  q2        {q2}           {q2}
  q3        {q4}           --
  q4        {q4}           {q4}

We show next the proliferation of states of the NFA under the input string
01001:

  {q0} -0-> {q0, q3} -1-> {q0, q1} -0-> {q0, q3} -0-> {q0, q3, q4}
       -1-> {q0, q1, q4}

DEFINITIONS RELATED TO THE e-closure ALGORITHM


Definition
Given a state s of some NFA N, the e-closure(s) is the set of states in N
reachable from s on e-transitions only.
Definition
Given a set of states T of some NFA N, the e-closure(T) is the set of states
in N reachable from some state s in T on e-transitions alone.

Example
X = e-closure({0}) = {0, 1, 2, 4, 7}
Y = e-closure({2}) = {2}
Z = e-closure({6}) = {6, 1, 2, 4, 7} = {1, 2, 4, 6, 7}
T = e-clos.({1, 2, 5}) = e-clos.({1}) U e-clos.({2}) U e-clos.({5})
= {1, 2, 5, 4, 6, 7} = {1, 2, 4, 5, 6, 7}
[Figure: a Thompson-construction NFA with states 0-10; e-transitions connect
states 0, 1, 2, 4, 6 and 7, with an a-edge and a b-edge inside the cycle, as
assumed by the closure computations above.]

THE e-closure ALGORITHM


The computation of e-closure(T) is a typical process of searching a graph
for the nodes reachable from a given set T of nodes by following the
e-labeled edges of the NFA. The (pseudo-code) algorithm below, used to
compute e-closure(T), uses a stack to hold the states whose edges have not
yet been checked for e-labeled transitions.
Algorithm (e-closure)
push all states in T onto stack;
initialize e-closure(T) to T;
while stack is not empty
pop t, the top element, off stack;
for each state u with an edge from t to u labeled e
if u is not in the e-closure(T)
add u to e-closure(T);
push u onto stack
end
end
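
The following C sketch implements this stack-based algorithm. The ε-edge
matrix in main() encodes the Thompson-style NFA assumed in the example
above (the encoding is ours; the manual gives only the figure):

#include <stdio.h>

#define NSTATES 11   /* states 0..10 of the example NFA */

/* eps[u][v] = 1 if there is an e-labeled edge u -> v (assumed encoding).
   closure[] is filled in as a membership bitmap over the states. */
void e_closure(int eps[NSTATES][NSTATES], int T[], int nT, int closure[]) {
    int stack[NSTATES], top = 0, s, t, u;
    for (s = 0; s < NSTATES; s++) closure[s] = 0;
    for (s = 0; s < nT; s++) {          /* push all states in T onto stack; */
        closure[T[s]] = 1;              /* initialize e-closure(T) to T     */
        stack[top++] = T[s];
    }
    while (top > 0) {
        t = stack[--top];               /* pop t, the top element, off stack */
        for (u = 0; u < NSTATES; u++)
            if (eps[t][u] && !closure[u]) {
                closure[u] = 1;         /* add u to e-closure(T) */
                stack[top++] = u;       /* push u onto stack */
            }
    }
}

int main(void) {
    int eps[NSTATES][NSTATES] = {0};
    int T[1] = {0}, closure[NSTATES], s;
    /* epsilon edges of the assumed Thompson NFA for the example */
    eps[0][1] = eps[0][7] = 1;
    eps[1][2] = eps[1][4] = 1;
    eps[3][6] = eps[5][6] = 1;
    eps[6][1] = eps[6][7] = 1;
    e_closure(eps, T, 1, closure);
    printf("e-closure({0}) = { ");
    for (s = 0; s < NSTATES; s++) if (closure[s]) printf("%d ", s);
    printf("}\n");                      /* prints { 0 1 2 4 7 } */
    return 0;
}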

ALGORITHM TO BUILD THE DFA FROM AN NFA


Potentially, the total number of states of the DFA is the size of the power
set of the set of states in the NFA (2^N).
Input: An NFA N with starting state s0, accepting some regular language.
Output: A DFA D accepting the same language as N.
Method:
The algorithm constructs a transition table DTran for D. We call Dstates the set of states in D.
We will use the definitions given for the e-closure algorithm plus:
move(T, a) = the set of states to which there is a transition on input
symbol a from some NFA state s in T.

Algorithm 2 (subset construction).



initially, e-closure(s0) is the only state in Dstates and it is unmarked;
while there is an unmarked state T in Dstates
    mark T;
    for each input symbol a in Σ
        U = e-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        DTran(T, a) = U
    end
end
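
As a concrete illustration, here is a small C sketch of the subset
construction applied to the three-state NFA of this section. State sets are
packed into bit masks (bit i = NFA state si); since this NFA has no
ε-transitions, e-closure(T) = T. The encoding is our own assumption:

#include <stdio.h>

/* NFA transitions from the table earlier in this section:
   s0-0->{s0}, s0-1->{s1}, s1-0->{s1,s2}, s1-1->{s1}, s2 has none. */
unsigned nfa_move[3][2] = {
    { 1u << 0, 1u << 1 },
    { (1u << 1) | (1u << 2), 1u << 1 },
    { 0, 0 }
};

unsigned move(unsigned T, int c) {     /* U = move(T, c) over the set T */
    unsigned U = 0;
    for (int s = 0; s < 3; s++)
        if (T & (1u << s)) U |= nfa_move[s][c];
    return U;
}

int main(void) {
    unsigned dstates[8];
    int n = 0, marked = 0;
    dstates[n++] = 1u << 0;            /* Dstates = { e-closure(s0) } */
    while (marked < n) {               /* while an unmarked state T exists */
        unsigned T = dstates[marked++];
        for (int c = 0; c <= 1; c++) {
            unsigned U = move(T, c);
            int i, found = 0;
            for (i = 0; i < n; i++)
                if (dstates[i] == U) { found = 1; break; }
            if (!found) dstates[i] = U, n++;   /* add U as unmarked */
            printf("DTran(%#x, %d) = %#x\n", T, c, U);
        }
    }
    return 0;                          /* finds {s0}, {s1}, {s1,s2} */
}
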
Algorithm for Subset Construction.
(taken from Parsons textbook)

Consider an NFA N and a DFA D accepting the same regular language. [Equivalent to
Algorithm 2].
1) Initially the list of states in D is empty. Create the starting state as e-closure(s0), named after
initial state {s0} in N. That is, new start state: state(1) = e-closure (s0).
2) While (there is an uncompleted row in the transition table for D) do:
a) Let x = {s1, s2, ..., sk} be the state for this row.
b) For each input symbol a do:
   i) Find e-closure(move({s1, s2, ..., sk}, a)) = N({s1}, a) U N({s2}, a)
      U ... U N({sk}, a) = some set we'll call T.
ii) Create the D-state y = {T}, corresponding to T.
iii) If y is not already in the list of D-states, add it to the list. (This results in a new row in
the table)
iv) Add the rule D(x , a) = y to the list of transition rules for D.
3) Identify the accepting states in D. (They must include the accepting states in N). That is,
make state(j) final state iff at least one state in state(j) is final state in the NFA.
Yet another algorithm for the construction of a DFA equivalent to an NFA.
state[0] = { };
state[1] = e-closure(s0);
p = 1;
j = 0;
while (j <= p) {
    foreach c in Σ do {
        e = DFAedge(state[j], c);
        if (e == state[i] for some i <= p) {
            trans[j, c] = i;
        }
        else {
            p = p + 1;
            state[p] = e;
            trans[j, c] = p;
        }
    }
    j = j + 1;
}
Example
Find all possible theoretical states for the DFA corresponding to the NFA
given earlier (s0 loops on 0; s0 -> s1 on 1; s1 loops on 1 and 0; s1 -> s2
on 0).

There are in total 2^3 = 8 states for the corresponding DFA. This is the
power set of the set S = {0, 1, 2}, represented by 2^|S|. That is, the 8
states of the DFA are:

  2^|S| = { ∅, {0}, {1}, {2}, {0,1}, {0,2}, {1,2}, {0,1,2} }

[Transition diagram of the eight subset-states omitted.]

Trivially, the state {2} and ∅, plus the states {0,2} and {0,1} that have no
input edges, can be eliminated. After the first elimination cycle is
complete, {0,1,2} has no input to it and can be eliminated. Only A = {0},
B = {1} and C = {1,2} remain.
Example
The NFA below represents the regular expression letter (letter | digit)*.
Find its corresponding DFA. (The theoretical number of states in this case
is 2^10 = 1,024.)

[Thompson-construction NFA with states 1-10 omitted: 1 -letter-> 2, then an
e-NFA loop through states 3-9 that reads letter or digit, ending in the
accepting state 10.]

Solution:
A = e-closure({1}) = {1}
move(A, letter) = {2}      (function move defined in Alg. 2)
move(A, digit) = ∅
B = e-closure({2}) = {2, 3, 4, 5, 7, 10}
move(B, letter) = {6}
move(B, digit) = {8}
C = e-closure({6}) = {6, 9, 10, 4, 5, 7} = {4, 5, 6, 7, 9, 10}
move(C, letter) = {6}
move(C, digit) = {8}
D = e-closure({8}) = {8, 9, 10, 4, 5, 7} = {4, 5, 7, 8, 9, 10}
move(D, letter) = {6}
move(D, digit) = {8}
State A is the start state of the DFA, and all states that include state 10
of the NFA are accepting states of the DFA. The transition diagram of the
DFA is given below.

  start -> A;  A -letter-> B
  B -letter-> C;  B -digit-> D
  C -letter-> C;  C -digit-> D
  D -letter-> C;  D -digit-> D

MINIMIZING THE NUMBER OF STATES OF A DFA
This is an important theoretical result: every regular set is recognized by
a minimum-state DFA.


Algorithm 4
Input: A DFA M with set of states S, start state s0 and a set of accepting
states F.
Output: A DFA M' accepting the same language as M and having a minimum
number of states.
Method:
1. Construct an initial partition P of the set of states with two groups:
   the accepting states F and the non-accepting states S - F.
2. Apply the procedure given below to P and construct a new partition Pnew.
   for each group G of P
       partition G into subgroups such that two states s and t of G are in
       the same subgroup if and only if for all input symbols a, states s
       and t have transitions on a to states in the same group of P;
       /* at worst, a state will be in a subgroup by itself */
       replace G in Pnew by the set of all subgroups formed
   end
3. If Pnew = P, let Pfinal = P and continue to step 4. Otherwise, repeat
   step 2 with P = Pnew.
4. Choose one state in each group of the partition Pfinal as the
   representative for that group. The representatives will be the states of
   the reduced DFA M'. Let s be a representative state, and suppose on input
   a there is a transition of M from s to t. Let r be the representative of
   t's group (r may be t). Then M' has a transition from s to r on a. Let
   the start state of M' be the representative of the group containing the
   start state s0 of M, and let the accepting states of M' be the
   representatives that are in F. Note that each group of Pfinal either
   consists only of states in F or has no states in F.
5. If M' has a dead state, that is, a state d that is not accepting and that
   has transitions to itself on all input symbols, then remove d from M'.
   Also, remove any states not reachable from the start state. Any
   transitions to d from other states become undefined.
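
A compact C sketch of this partition refinement, run on the five-state DFA
of the example that follows (the transition table is taken from that
example; the refinement-by-signature encoding is our own):

#include <stdio.h>
#include <string.h>

int main(void) {
    /* rows A..E, columns a,b; E (state 4) is the accepting state */
    int trans[5][2] = { {1,2}, {1,3}, {1,2}, {1,4}, {1,2} };
    int group[5], ngroups = 2, changed = 1;
    for (int s = 0; s < 5; s++) group[s] = (s == 4); /* P = {ABCD} {E} */
    while (changed) {
        changed = 0;
        int newg[5], seen[25][3], nseen = 0;
        for (int s = 0; s < 5; s++) {
            /* a state's signature: its group and its successors' groups */
            int sig[3] = { group[s], group[trans[s][0]], group[trans[s][1]] };
            int g;
            for (g = 0; g < nseen; g++)
                if (!memcmp(seen[g], sig, sizeof sig)) break;
            if (g == nseen) memcpy(seen[nseen++], sig, sizeof sig);
            newg[s] = g;
        }
        if (nseen != ngroups) changed = 1;  /* a group was split */
        ngroups = nseen;
        memcpy(group, newg, sizeof group);
    }
    for (int s = 0; s < 5; s++)
        printf("%c -> group %d\n", 'A' + s, group[s]);
    return 0;   /* A and C end up in one group; B, D, E stand alone */
}
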
Example
Consider the DFA given by the following transition table (the corresponding
transition diagram is omitted; E is the only accepting state).

          a     b
  A   |   B     C
  B   |   B     D
  C   |   B     C
  D   |   B     E
  E   |   B     C
Since E is the only accepting state, we have the initial partition:

  [ABCD]  [E]

Under a, [ABCD] goes to B, which is in [ABCD]; but under b, [ABCD] goes to
{C, D, E}. We must separate D from the subgroup [ABCD], since D goes to E
under b. We now build the following table.

          a     b
  [ABC] | B     C or D
  D     | B     E
  E     | B     C

Within [ABC], B goes to D under b while A and C go to C, so we now separate
B and build the following table.

          a     b
  [AC]  | B     C
  B     | B     D
  D     | B     E
  E     | B     C

The final table and transition diagram are given below.

          a     b
  [AC]  | B     [AC]
  B     | B     D
  D     | B     E
  E     | B     [AC]

Thus the minimal DFA has only four states versus five states in the initial
DFA.

[Transition diagrams of the original and the minimized DFA omitted.]
Conclusion: Thus we have studied the conversion of an NFA to a DFA and the
minimization of a DFA.

Practical 3. LEX tool


Aim: Study of the LEX tool
Theory:
Lex is officially known as a "Lexical Analyzer". Its main job is to break up
an input stream into meaningful units, or tokens. For example, consider
breaking a text file up into individual words.
More pragmatically, Lex is a tool for automatically generating a lexer (also
known as a scanner) starting from a lex specification (a *.l file).
The skeleton of a lex specification file is given in Figure 3. The rules
section is composed of tuples of the form <pattern, action>. As can be seen
from the following examples, patterns are specified by regular expressions.


Example
<pattern> <action to take when matched> [A-Za-z]+ printf("this is a word") ;
<pattern> <action to take when matched> [0-9]+ printf("this is a number") ;
[Figure: the lex build flow. Example.l --lex--> lex.yy.c --cc--> compiled
output providing yylex (and, for yacc, Example.y --yacc--> y.tab.c and
y.tab.h, compiled together with the lexer).]

Regular Expressions
In the following we denote by c = character, x, y = regular expressions,
m, n = integers, i = identifier.

  .        matches any single character except newline
  *        matches 0 or more instances of the preceding regular expression
  +        matches 1 or more instances of the preceding regular expression
  ?        matches 0 or 1 of the preceding regular expression
  []       defines a character class
  ()       groups the enclosed regular expression into a new regular
           expression
  "..."    matches everything within the " " literally
  x|y      x or y
  {i}      definition of i
  x/y      x, only if followed by y (y not removed from input)
  x{m,n}   m to n occurrences of x
  ^x       x, but only at beginning of line
  x$       x, but only at end of line
  "s"      exactly what is in the quotes (except for "\" and following
           character)


Algorithm:
1. Open a file in a text editor.
2. Enter keywords, rules for identifiers and constants, operators and
   relational operators, in the following format:
   a) %{
      Definitions of constants / header files
      %}
   b) Regular expressions
      %%
      Translation rules
      %%
   c) Auxiliary procedures (main() function)
3. Save the file with a .l extension, e.g., Mylex.l.
4. Call the lex tool on the terminal, e.g., [root@localhost]# lex Mylex.l.
   The lex tool converts the .l file into a C code file, i.e., lex.yy.c.
5. Compile the file lex.yy.c, e.g., gcc lex.yy.c. Compiling lex.yy.c creates
   the output file a.out.
6. Run the file a.out, e.g., ./a.out.
7. Give input on the terminal to a.out; upon processing, the output will be
   displayed.
Example: Program for counting number of vowels and consonant
%{
#include <stdio.h>
#include <ctype.h>
int vowels = 0;
int consonants = 0;
%}
%%
[aeiouAEIOU]    vowels++;
[a-zA-Z]        consonants++;
[\n]            ;
.               ;
%%

int main()
{
printf ("This Lex program counts the number of vowels and ");
printf ("consonants in given text.");
printf ("\nEnter the text and terminate it with CTRL-d.\n");
yylex();
printf ("Vowels = %d, consonants = %d.\n", vowels, consonants);
return 0;
}

Output:
#lex alphalex.l
#gcc lex.yy.c
#./a.out
This Lex program counts the number of vowels and consonants in given text.
Enter the text and terminate it with CTRL-d.
Iceream
Vowels =4, consonants =3.

Output:
For lexical analyzer

[root@localhost]# lex Mylex.l


[root@localhost]# gcc lex.yy.c
[root@localhost]# ./a.out
123
NUMBER
bar123
WORD


123ba
NUMBER
WORD

Conclusion: Thus we have studied Lex tool

Practical 4. Parsers
Aim: Program to implement predictive parsers
Theory:
BASIC PARSING TECHNIQUES:
ADVANTAGES OFFERED BY GRAMMAR TO BOTH LANGUAGES AND DESIGNERS
1. A grammar gives a precise, yet easy to understand, syntactic specification of a programming
language.
2. From certain classes of grammars, it can automatically construct an efficient parser that
determines if a source program is syntactically well formed.
3. A properly designed grammar imparts a structure to programming language that is useful for
the translation of source programs into correct object code and for the detection of errors.
Tools are available for converting grammar-based descriptions of translations into working
programs.
4. Languages evolve a period of time, acquiring new constructs and performing additional
tasks.
THE ROLE OF THE PARSER
[Figure: the position of the parser in the compiler model. The lexical
analyzer reads the source program and supplies tokens to the parser on
demand (get_next_token); the parser builds a parse tree and hands it to the
rest of the front end, which produces the intermediate representation; both
phases consult the symbol table.]

THE POSITION OF THE PARSER IN THE COMPILER MODEL


There are three general types of parsers for grammars. Universal methods
such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse
any grammar. These two methods are, however, too inefficient to use in
production compilers.
The most efficient top-down and bottom-up methods work only on subclasses of
grammars, but several of these subclasses, such as the LL and LR grammars,
are expressive enough to describe most syntactic constructs in programming
languages.
TOP-DOWN PARSING: The parse tree is created top to bottom.
Top-down parser
Recursive-Descent Parsing
o Backtracking is needed (if a choice of a production rule does not work, we
  backtrack to try other alternatives).
o It is a general parsing technique, but not widely used.
o Not efficient.
  S → aBc
  B → bc | b          input: abc
Predictive Parsing
o No backtracking.
o Efficient.
o Needs a special form of grammar (LL(1) grammars).
o Recursive predictive parsing is a special form of recursive-descent
  parsing without backtracking.
o The non-recursive (table-driven) predictive parser is also known as an
  LL(1) parser.

A grammar  --(eliminate left recursion, left factoring)-->  a grammar
suitable for predictive parsing

When rewriting a non-terminal in a derivation step, a predictive parser can
uniquely choose a production rule by just looking at the current symbol in
the input string.

  A → α1 | ... | αn        input: ... a .......
                                       (current token)
Example

  stmt → if ......   |
         while ...... |
         begin ...... |
         for .....

When we are trying to expand the non-terminal stmt, if the current token is
"if" we have to choose the first production rule: we can uniquely choose the
production rule by just looking at the current token. We eliminate the left
recursion in the grammar, and left factor it. But it may not be suitable for
predictive parsing (not an LL(1) grammar).
Each non-terminal corresponds to a procedure.
Ex: A → aBb  (this is the only production rule for A)
proc A {
  - match the current token with a, and move to the next token;
  - call B;
  - match the current token with b, and move to the next token;
}
A → aBb | bAB
proc A {
case of the current token {
a: - match the current token with a, and move to the next token;
- call B;
- match the current token with b, and move to the next token;
b: - match the current token with b, and move to the next token;
- call A;
- call B;
}
}
When to apply ε-productions:
  A → aA | bB | ε
If all other productions fail, we should apply an ε-production. For example,
if the current token is not a or b, we may apply the ε-production.
Most correct choice: we should apply an ε-production for a non-terminal A
when the current token is in the follow set of A (the set of terminals that
can follow A in sentential forms).

Example and Algorithm: Recursive Predictive Parsing

  A → aBe | cBd | C
  B → bB | ε
  C → f

proc A {
    case of the current token {
        a: - match the current token with a, and move to the next token;
           - call B;
           - match the current token with e, and move to the next token;
        c: - match the current token with c, and move to the next token;
           - call B;
           - match the current token with d, and move to the next token;
        f: - call C          (f is the first set of C)
    }
}
proc C { match the current token with f, and move to the next token; }
proc B {
    case of the current token {
        b:    - match the current token with b, and move to the next token;
              - call B
        e, d: do nothing     (e and d are in the follow set of B)
    }
}
Implementation of a Table-Driven Predictive Parser
A table-driven parser can be implemented using an input buffer, a stack, and
a parsing table. The input buffer holds the string to be parsed. The string
is followed by a "$" symbol, used as a right-end marker to indicate the end
of the input string. The stack holds the sequence of grammar symbols, with
"$" indicating the bottom of the stack. Initially, the stack has the start
symbol of the grammar above the $. The parsing table is a two-dimensional
array TABLE[A, a], where A is a nonterminal and a is a terminal or the $
symbol. The parser is controlled by a program that behaves as follows:
1. The program considers X, the symbol on top of the stack, and the next
   input symbol a.
2. If X = a = $, the parser announces the successful completion of the
   parsing and halts.
3. If X = a ≠ $, the parser pops X off the stack and advances the input
   pointer to the next input symbol.
4. If X is a nonterminal, the program consults the parsing table entry
   TABLE[X, a]. If TABLE[X, a] = X → UVW, the parser replaces X on top of
   the stack by UVW in such a manner that U comes on top. If TABLE[X, a] =
   error, the parser calls the error-recovery routine.
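
To make the driver concrete, here is a small C sketch for the example
grammar that follows (S → aABb, A → c | ε, B → d | ε). The table entries
mirror the worked Tables 3-5 below; the encoding (strings for right-hand
sides, "" for an ε-production, NULL for error) is our own assumption:

#include <stdio.h>
#include <string.h>

const char *syms = "abcd$";             /* terminal columns of the table */
const char *table[3][5] = {
    /*             a      b     c     d     $  */
    /* S */ { "aABb", NULL, NULL, NULL, NULL },
    /* A */ {   NULL,   "",  "c", NULL, NULL },  /* "" stands for A -> e */
    /* B */ {   NULL,   "", NULL,  "d", NULL },
};

int main(void) {
    char stack[64] = "$S";              /* $ bottom marker, start symbol on top */
    int top = 1;
    char input[64];
    scanf("%60s", input);
    strcat(input, "$");                 /* append the right-end marker */
    char *ip = input;
    while (1) {
        char X = stack[top];            /* step 1: top of stack and lookahead */
        if (X == '$' && *ip == '$') { printf("accepted\n"); return 0; }  /* step 2 */
        if (X == *ip) { top--; ip++; continue; }                         /* step 3 */
        const char *p = strchr(syms, *ip);
        int row = (X == 'S') ? 0 : (X == 'A') ? 1 : (X == 'B') ? 2 : -1;
        const char *rhs = (row >= 0 && p) ? table[row][p - syms] : NULL;
        if (rhs == NULL) { printf("error\n"); return 1; }                /* step 4 */
        top--;                                   /* pop X and push the RHS */
        for (int i = (int)strlen(rhs) - 1; i >= 0; i--)
            stack[++top] = rhs[i];               /* reversed, so U lands on top */
    }
}

Given acdb this sketch prints "accepted"; given adb it reports an error, in
line with Tables 3 and 5 below.
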
For example, consider the following grammar:

  S → aABb
  A → c | ε
  B → d | ε

FIRST(S) = FIRST(aABb) = { a }
FIRST(A) = FIRST(c) ∪ FIRST(ε) = { c, ε }
FIRST(B) = FIRST(d) ∪ FIRST(ε) = { d, ε }
Since the right-end marker $ is used to mark the bottom of the stack, $ will
initially be immediately below S (the start symbol) on the stack; hence, $
will be in FOLLOW(S). Using S → aABb, the FOLLOW sets of A and B are
obtained in the same way.

Therefore, the parsing table is as shown in Table 2.


Table 2: Production Selections for Parsing Derivations

  S → aABb        A → c        B → d

Consider an input string acdb. The various steps in the parsing of this string, in terms of the
contents of the stack and unspent input are shown in Table 3.


Table 3: Steps Involved in Parsing the String acdb

Stack Contents   Unspent Input   Moves
$S               acdb$           Derivation using S → aABb
$bBAa            acdb$           Popping a off the stack and advancing one
                                 position in the input
$bBA             cdb$            Derivation using A → c
$bBc             cdb$            Popping c off the stack and advancing one
                                 position in the input
$bB              db$             Derivation using B → d
$bd              db$             Popping d off the stack and advancing one
                                 position in the input
$b               b$              Popping b off the stack and advancing one
                                 position in the input
$                $               Announce successful completion of the
                                 parsing

Similarly, for the input string ab, the various steps in the parsing of the string, in terms of the
contents of the stack and unspent input, are shown in Table 4.

Table 4: Production Selections for String ab Parsing Derivations

Stack Contents   Unspent Input   Moves
$S               ab$             Derivation using S → aABb
$bBAa            ab$             Popping a off the stack and advancing one
                                 position in the input
$bBA             b$              Derivation using A → ε
$bB              b$              Derivation using B → ε
$b               b$              Popping b off the stack and advancing one
                                 position in the input
$                $               Announce successful completion of the
                                 parsing

For a string adb, the various steps in the parsing of the string, in terms of the contents of the
stack and unspent input are shown in Table 5.

Table 5: Production Selections for Parsing Derivations for the String adb

Stack Contents   Unspent Input   Moves
$S               adb$            Derivation using S → aABb
$bBAa            adb$            Popping a off the stack and advancing one
                                 position in the input
$bBA             db$             Calling an error-handling routine

EXAMPLE
Test whether the grammar S → AaAb | BbBa, A → ε, B → ε is LL(1) or not, and
construct a predictive parsing table for it.

Since the grammar contains the pair of productions S → AaAb | BbBa, for the
grammar to be LL(1) it is required that:

  FIRST(AaAb) ∩ FIRST(BbBa) = ∅

which holds, since FIRST(AaAb) = { a } and FIRST(BbBa) = { b }. Hence, the
grammar is LL(1).
To construct a parsing table, the FIRST and FOLLOW sets are computed, as
shown below:
1. Using S → AaAb, we get: ...
2. Using S → BbBa, we get: ...

Table 6: Production Selections for Example 2 Parsing Derivations

  S → AaAb        S → BbBa
Conclusion: Thus we have studied Parser.


Practical 5. YACC tool


Aim: Study of YACC tool
Theory:
YACC stands for Yet Another Compiler Compiler. Its GNU version is called Bison. YACC
translates any grammar of a language into a parser for that language. Grammars for YACC are
described using a variant of Backus Naur Form (BNF). A BNF grammar can be used to express
Context-free languages. By convention, a YACC file has the suffix .y.
Structure of a yacc file
A yacc file looks much like a lex file:
...definitions...
%%
...rules...
%%
...code...
Definitions All code between %{ and %} is copied to the beginning of the resulting C file.
Rules A number of combinations of pattern and action: if the action is more than a single
command it needs to be in braces.
Code This can be very elaborate, but the main ingredient is the call to
yyparse, the parser. If the code segment is left out, a default main
is used which only calls yyparse.
YACC file structure (the right-hand annotations describe each part):

%{
#include <stdio.h>
int yylex(void);                /* part to be embedded into the *.c file */
void yyerror(char *);
%}
%token INTEGER                  /* definition section: token declarations */
%%
program:                        /* production rules section: define how to */
        program expr '\n'       /* "understand" the input language, and    */
            { printf("%d\n", $2); }   /* what actions to take for each     */
        |                             /* "sentence"                        */
        ;
expr:
        INTEGER                 { $$ = $1; }
        | expr '+' expr         { $$ = $1 + $3; }
        | expr '-' expr         { $$ = $1 - $3; }
        ;
%%
void yyerror(char *s) {         /* C auxiliary subroutines: any user code, */
    ...                         /* e.g., a main function that calls the    */
}                               /* parser function yyparse()               */

Algorithm:
1. Open a file in a text editor.
2. Specify grammar rules and associated actions in the following format:
   a. %{
      Statements (include statements, optional)
      %}
   b. Lexical tokens, grammar, precedence and associated information.
      %%
      Grammar, rules and actions
      %%
   c. Auxiliary procedures (main() function)
3. Save the file with a .y extension, e.g., parser.y, together with a
   companion lex specification, e.g., parser.l.
4. Call the yacc tool on the terminal, e.g., [root@localhost]# yacc -d
   parser.y, and the lex tool, e.g., [root@localhost]# lex parser.l. These
   generate the C files y.tab.c (with y.tab.h) and lex.yy.c.
5. Compile the generated files, e.g., cc lex.yy.c y.tab.c; this creates the
   output file a.out.
6. Run the file a.out, e.g., ./a.out.
7. Give input on the terminal to a.out; upon processing, the output will be
   displayed.

Design of an LALR Bottom-up Parser

<parser.l>
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[0-9]+   { yylval.dval = atof(yytext);
           return DIGIT;
         }
\n|.     return yytext[0];
%%

<parser.y>
%{
/* This YACC specification file generates the LALR parser for the program
   considered in experiment 4. */
#include <stdio.h>
%}
%union
{
    double dval;
}
%token <dval> DIGIT
%type <dval> expr
%type <dval> term
%type <dval> factor
%%
line   : expr '\n'         { printf("%g\n", $1); }
       ;
expr   : expr '+' term     { $$ = $1 + $3; }
       | term
       ;
term   : term '*' factor   { $$ = $1 * $3; }
       | factor
       ;
factor : '(' expr ')'      { $$ = $2; }
       | DIGIT
       ;
%%
int main()
{
    yyparse();
}
yyerror(char *s)
{
    printf("%s", s);
}

Output:
#lex parser.l
#yacc -d parser.y
#cc lex.yy.c y.tab.c -ll -lm
#./a.out
2+3
5.0000

Practical 6. Code Generation


Aim: Program to convert Infix expression to Postfix
Theory:
While translating a source program into a functionally equivalent object code representation, a
parser may first generate an intermediate representation. This makes retargeting of the code

possible and allows some optimizations to be carried out that would otherwise not be possible.
The following are commonly used intermediate representations:
1. Postfix notation
2. Syntax tree
3. Three-address code
Postfix Notation
In postfix notation, the operator follows the operands. For example, for the
expression (a - b) * (c + d) + (a - b), the postfix representation is:

  ab- cd+ * ab- +
Syntax Tree
The syntax tree is nothing more than a condensed form of the parse tree. The
operator and keyword nodes of the parse tree (Figure 1) are moved to their
parent, and a chain of single productions is replaced by a single link
(Figure 2).

Figure 1: Parse tree for the string id+id*id.

Figure 2: Syntax tree for id+id*id.


Three-Address Code
Three-address code is a sequence of statements of the form x = y op z. Since
a statement involves no more than three references, it is called a
"three-address statement," and a sequence of such statements is referred to
as three-address code. For example, the three-address code for the
expression a + b * c + d is:

  t1 = b * c
  t2 = a + t1
  t3 = t2 + d

Sometimes a statement might contain fewer than three references, but it is
still called a three-address statement. The following are the three-address
statements used to represent various programming language constructs:

Used for representing arithmetic expressions:

Used for representing Boolean expressions:

Used for representing array references and dereferencing operations:


Used for representing a procedure call:

Infix Expression :
Any expression in the standard form like "2*3-4/5" is an Infix(Inorder) expression.
Postfix Expression :
The Postfix(Postorder) form of the above expression is "23*45/-".
Infix to Postfix Conversion :
In normal algebra we use the infix notation like a+b*c. The corresponding postfix notation is
abc*+. The algorithm for the conversion is as follows :

Scan the infix string from left to right.

Initialise an empty stack.

If the scanned character is an operand, add it to the postfix string. If the
scanned character is an operator and the stack is empty, push the character
to the stack.
If the scanned character is an operator and the stack is not empty, compare
the precedence of the character with the element on top of the stack
(topStack). If topStack has higher precedence than the scanned character,
pop the stack; else push the scanned character to the stack. Repeat this
step as long as the stack is not empty and topStack has precedence over the
character.
Repeat this step till all the characters are scanned.

After all characters are scanned, add any characters that the stack may
still hold to the postfix string: if the stack is not empty, add topStack to
the postfix string and pop the stack. Repeat this step as long as the stack
is not empty.
Return the postfix string.

Example:

Let us see how the above algorithm is implemented, using an example.
Infix string: a+b*c-d
Initially the stack is empty and our postfix string has no characters. The
first character scanned is 'a', which is added to the postfix string. The
next character scanned is '+'; being an operator, it is pushed onto the
stack.

  Postfix String: a        Stack: +

The next character scanned is 'b', which is placed in the postfix string.
The next character is '*', an operator. The top element of the stack is '+',
which has lower precedence than '*', so '*' is pushed onto the stack.

  Postfix String: ab       Stack: + *

The next character is 'c', which is placed in the postfix string. The next
character scanned is '-'. The topmost character in the stack is '*', which
has higher precedence than '-', so '*' is popped from the stack and added to
the postfix string. Even now the stack is not empty; the topmost element of
the stack is '+', which has equal priority to '-', so '+' is popped from the
stack and added to the postfix string. Then '-' is pushed onto the stack.

  Postfix String: abc*+    Stack: -

The next character is 'd', which is added to the postfix string. Now all
characters have been scanned, so we must pop the remaining elements from the
stack and add them to the postfix string. At this stage only '-' is in the
stack; it is popped out and added to the postfix string. After all
characters are scanned, this is how the stack and postfix string look:

  Postfix String: abc*+d-  Stack: (empty)
End result :

Infix String : a+b*c-d


Postfix String : abc*+d-

Algorithm:
1. Take a stack OPSTK and initialize it to be empty.
2. Read the entire string in infix form, e.g., A+B*C.
3. Read the string character by character into a variable symbol.
   i)   If symbol is an operand, add it to the postfix string.
   ii)  If stack OPSTK is not empty and the precedence of the top-of-stack
        symbol is greater than that of the recently read character symbol,
        then pop OPSTK:
          topsymbol = pop(OPSTK)
        and add this popped topsymbol to the postfix string.
   iii) Repeat step ii till the stack is empty or the precedence of the
        top-of-stack symbol is no longer greater than that of the recently
        read character symbol.
   iv)  Push symbol onto OPSTK.
4. Output any remaining operators: pop OPSTK till it is empty and add each
   popped symbol to the postfix string.
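
A self-contained C version of this conversion follows. It is a sketch, and
is extended slightly (as an assumption) to handle parentheses so that the
sample run below works:

#include <stdio.h>
#include <ctype.h>

int prec(char op) {                 /* precedence of an operator */
    switch (op) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;   /* '(' stays until ')' arrives */
    }
}

int main(void) {
    char opstk[100], infix[100], postfix[100];
    int top = -1, j = 0;

    printf("Enter the Infix Notation : ");
    scanf("%99s", infix);
    for (int i = 0; infix[i] != '\0'; i++) {
        char c = infix[i];
        if (isalnum((unsigned char)c)) {
            postfix[j++] = c;                    /* operands go straight out */
        } else if (c == '(') {
            opstk[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && opstk[top] != '(')
                postfix[j++] = opstk[top--];     /* flush until '(' */
            top--;                               /* discard the '(' */
        } else {                                 /* an operator */
            while (top >= 0 && prec(opstk[top]) >= prec(c))
                postfix[j++] = opstk[top--];     /* pop higher/equal prec */
            opstk[++top] = c;
        }
    }
    while (top >= 0)
        postfix[j++] = opstk[top--];             /* empty the remaining stack */
    postfix[j] = '\0';
    printf("Postfix Notation is: %s\n", postfix);
    return 0;
}
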
Output:
------------------------------------------
Enter the Infix Notation : (A+B)*C
Postfix Notation is: AB+C*


Practical 7. Code optimization


Aim: Implementation of code optimization for common sub-expression elimination, loop
invariant code movement

Theory:
To create an efficient target language program, a programmer needs more than
an optimizing compiler. In this section, we review the options available to
a programmer and a compiler for creating efficient target programs, and we
mention code-improving transformations that a programmer and a compiler
writer can be expected to use to improve the performance of a program.
Criteria for Code-Improving Transformations
The best program transformations are those that yield the most benefit for the least effort. The
transformations provided by an optimizing compiler should have several properties.
First, a transformation must preserve the meaning of programs. That is, an optimization
must not change the output produced by a program for a given input, or cause an error, such as
division by zero, that was not present in the original version of the source program.
Second, a transformation must, on the average, speed up programs by a measurable amount.
Third, a transformation must be worth the effort. It does not make sense for
a compiler writer to expend the intellectual effort to implement a
code-improving transformation if the effort is not repaid when the target
programs are executed.
Getting Better Performance
Dramatic improvements in the running time of a program, such as cutting the
running time from a few hours to a few seconds, are usually obtained by
improving the program at all levels.
source code --> front end --> intermediate code --> code generator --> target code

[Fig. 1: Places for improvements by the user and the compiler. The user can
profile the program, change the algorithm, and transform loops. On the
intermediate code, the compiler can improve loops, procedure calls, and
address calculations. In the code generator, the compiler can use registers,
select instructions, and do peephole transformations.]


Algorithmic transformations occasionally produce spectacular improvements in
running time. For example, Bentley [1982] relates that the running time of a
program for sorting N elements dropped from 2.02N^2 microseconds to
12N log2 N microseconds when a carefully coded insertion sort was replaced
by quicksort. For N = 100 the replacement speeds up the program by a factor
of 2.5. For N = 100,000 the improvement is far more dramatic: the
replacement speeds up the program by a factor of more than a thousand.
Unfortunately, no compiler can find the best algorithm for a given program.
In this section, and the next, a sorting program called quicksort will be
used to illustrate the effect of various code-improving transformations. The
C program below is derived from Sedgewick [1978], where hand-optimization of
such a program is discussed.
void quicksort(m, n)
int m, n;
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m - 1; j = n; v = a[n];
    while (1) {
        do i = i + 1; while (a[i] < v);
        do j = j - 1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;   /* swap a[i] and a[j] */
    }
    x = a[i]; a[i] = a[n]; a[n] = x;       /* final swap with the pivot */
    /* fragment ends here */
    quicksort(m, j); quicksort(i + 1, n);
}

Fig. 2: C code for quicksort
An Organization for an Optimizing Compiler
Advantages of figure 3:
1. The operations needed to implement high-level constructs are made explicit in the
intermediate code, so it is possible to optimize them. For example, the address
calculations for a [i] are explicit in figure 4 so the recomputation of expressions like 4*i
can be eliminated as discussed in the next section.
2. The intermediate code can be relatively independent of the target machine, so the
optimizer does not have to change much if one for a different machine replaces the code
generator. The intermediate code in figure 4 assumes that each element of
the array a takes four bytes. Some intermediate codes, e.g., P-code for
Pascal, leave it to the code generator to fill in the size of a machine
word. We could have done the same in our intermediate code if we replaced 4
by a symbolic constant.


front end --> code optimizer --> code generator

[Fig. 3: Organization of the code optimizer: control-flow analysis, then
data-flow analysis, then transformations.]


(1)  i := m - 1              (16) t7 := 4 * i
(2)  j := n                  (17) t8 := 4 * j
(3)  t1 := 4 * n             (18) t9 := a[t8]
(4)  v := a[t1]              (19) a[t7] := t9
(5)  i := i + 1              (20) t10 := 4 * j
(6)  t2 := 4 * i             (21) a[t10] := x
(7)  t3 := a[t2]             (22) goto (5)
(8)  if t3 < v goto (5)      (23) t11 := 4 * i
(9)  j := j - 1              (24) x := a[t11]
(10) t4 := 4 * j             (25) t12 := 4 * i
(11) t5 := a[t4]             (26) t13 := 4 * n
(12) if t5 > v goto (9)      (27) t14 := a[t13]
(13) if i >= j goto (23)     (28) a[t12] := t14
(14) t6 := 4 * i             (29) t15 := 4 * n
(15) x := a[t6]              (30) a[t15] := x

Fig. 4: Three-address code for the fragment in Fig. 2

Loop optimization is the most valuable machine-independent optimization because a program's


inner loops are good candidates for improvement. The important loop optimizations are
elimination of loop invariant computations and elimination of induction variables. A loop
invariant computation is one that computes the same value every time a loop is executed.
Therefore, moving such a computation outside the loop leads to a reduction in the execution
time. Induction variables are those variables used in a loop; their values are in lock-step, and
hence, it may be possible to eliminate all except one.


1 Eliminating Loop Invariant Computations


To eliminate loop invariant computations, we first identify the invariant
computations and then move them outside the loop if the move does not lead
to a change in the program's meaning.
Identification of loop invariant computation requires the detection of loops in the program.
Whether a loop exists in the program or not depends on the program's control flow, therefore,
requiring a control flow analysis. For loop detection, a graphical representation, called a
"program flow graph," shows how the control is flowing in the program and how the control is
being used. To obtain such a graph, we must partition the intermediate code into basic blocks.
This requires identifying leader statements, which are defined as follows:
1. The first statement is a leader statement.
2. The target of a conditional or unconditional goto is a leader.
3. A statement that immediately follows a conditional goto is a leader.
A basic block is a sequence of three-address statements that can be entered only at the
beginning, and control ends after the execution of the last statement, without a halt or any
possibility of branching, except at the end.
2 Algorithm to Partition Three-Address Code into Basic Blocks
To partition three-address code into basic blocks, we must identify the leader statements in the
three-address code and then include all the statements, starting from a leader, and up to, but not
including, the next leader. The basic blocks into which the three-address code is partitioned
constitute the nodes or vertices of the program flow graph. The edges in the flow graph are
decided as follows. If B1 and B2 are the two blocks, then add an edge from B1 to B2 in the
program flow graph, if the block B2 follows B1 in an execution sequence. The block B2 follows
B1 in an execution sequence if and only if:
1. The first statement of block B2 immediately follows the last statement of block B1 in the
three-address code, and the last statement of block B1 is not an unconditional goto
statement.
2. The last statement of block B1 is either a conditional or unconditional goto statement,
and the first statement of block B2 is the target of the last statement of block B1.
For example, consider the following program fragment:

Fact(x)
{
    int f = 1, i;
    for (i = 2; i <= x; i++)
        f = f * i;
    return (f);
}
The three-address-code representation for the program fragment above is:

1. f = 1
2. i = 2
3. if i > x goto (8)
4. f = f * i
5. t1 = i + 1
6. i = t1
7. goto (3)
8. goto calling program

The leader statements are:

Statement number 1, because it is the first statement.

Statement number 3, because it is the target of a goto (statement 7).

Statement number 4, because it immediately follows a conditional goto
statement.

Statement number 8, because it is the target of a conditional goto statement
(statement 3).

Therefore, the basic blocks into which the above code can be partitioned are
as follows, and the program flow graph is shown in Figure 1.

Block B1: statements 1-2
Block B2: statement 3
Block B3: statements 4-7
Block B4: statement 8

Figure 1: Program flow graph.


3 Loop Detection
A loop is a cycle in the flow graph that satisfies two properties:
1. It should have a single entry node or header, so that it will be possible to move all of the
loop invariant computations to a unique place, called a "preheader," which is a
block/node placed outside the loop, just in front of the header.
2. It should be strongly connected; that is, it should be possible to go from any node of the
loop to any other node while staying within the loop. This is required until at least some
of the loops get executed repeatedly.
If the flow graph contains one or more back edges, then one or more
loops/cycles exist in the program. Therefore, we must identify any back
edges in the flow graph.
4 Identification of the Back Edges
To identify the back edges in the flow graph, we compute the dominators of every node of the
program flow graph. A node a is a dominator of node b if all the paths starting at the initial node
of the graph that reach to node b go through a. For example, consider the flow graph in Figure 2.
In this flow graph, the dominator of node 3 is only node 1, because all the paths reaching up to
node 3 from node 1 do not go through node 2.


Figure 2: The flow graph back edges are identified by computing the dominators.
Dominator (dom) relationships have the following properties:
1. They are reflexive; that is, every node dominates itself.
2. They are transitive; that is, if a dom b and b dom c, this implies a dom c.
5 Reducible Flow Graphs
Several code-optimization transformations are easy to perform on reducible flow graphs. A flow
graph G is reducible if and only if we can partition the edges into two disjointed groups, forward
edges and back edges, with the following two properties:
1. The forward edges form an acyclic graph in which every node can be
   reached from the initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.
For example, consider the flow graph shown in Figure 3. This flow graph has no back edges,
because no edge's head dominates the tail of that edge. Hence, it could have been a reducible
graph if the entire graph had been acyclic. But that is not the case. Therefore, it is not a reducible
flow graph.

Figure 3: A flow graph with no back edges.


After identifying the back edges, if any, the natural loop of every back
edge must be identified. The natural loop of a back edge a → b is the set of
all those nodes that can reach a without going through b, including node b
itself. Therefore, to find the natural loop of the back edge n → d, we start
with node n and add all the predecessors of node n to the loop. Then we add
the predecessors of the nodes that were just added to the loop; and we
continue this process until we reach node d. These nodes plus node d
constitute the set of all those nodes that can reach node n without going
through node d. This is the natural loop of the edge n → d. Therefore, the
algorithm for detecting the natural loop of a back edge is:
Input : back edge n d.
Output: set loop, which is a set of nodes forming the natural
loop of the back edge n d.
main()
{
    loop = { d };    /* initialize by adding node d to the set loop */
    insert(n);       /* call procedure insert with the node n */
}

procedure insert(m)
{
    if m is not in loop then
    {
        loop = loop ∪ { m };
        for every predecessor p of m do
            insert(p);
    }
}
For example, in the flow graph shown in Figure 1, the only back edge is B3 → B2, and its natural
loop comprises the blocks B2 and B3.
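The recursive insert() procedure translates almost line for line into Python. The sketch below uses an explicit worklist instead of recursion; the predecessor map encodes the block edges derived above for Figure 1 (B1 → B2; B2 → B3 and B4; B3 → B2).

    def natural_loop(preds, n, d):
        """Natural loop of the back edge n -> d, given a predecessor map."""
        loop = {d}                          # the header d is always in the loop
        worklist = [n]                      # worklist version of insert(n)
        while worklist:
            m = worklist.pop()
            if m not in loop:
                loop.add(m)                 # loop = loop U {m}
                worklist.extend(preds[m])   # visit every predecessor of m
        return loop

    preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"], "B4": ["B2"]}
    print(natural_loop(preds, "B3", "B2"))  # {'B2', 'B3'}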
After the natural loops of the back edges are identified, the next task is to identify the loop
invariant computations. A three-address statement x = y op z that occurs in a basic block B
(a part of the loop) is a loop invariant statement if all possible definitions of y and z that reach
this statement are outside the loop, or if y and z are constants, because then the computation
y op z will be the same each time the statement is encountered in the loop. Hence, to decide
whether the statement x = y op z is loop invariant or not, we must compute the ud-chaining
information. The ud-chaining information is computed by doing a global data flow analysis of
the flow graph. For every block B, the set of all definitions capable of reaching a point
immediately before the start of the block is called IN(B), and the set of all definitions capable
of reaching a point immediately after the last statement of the block is called OUT(B). We
compute IN(B) and OUT(B) for every block B from GEN(B) and KILL(B), which are defined as:


GEN(B): The set of all the definitions generated in block B.

KILL(B): The set of all the definitions outside block B that define the same variables as
are defined in block B.

Consider the flow graph in Figure 4. The GEN and KILL sets for its basic blocks are shown
in Table 1.
Table 1: GEN and KILL Sets for the Figure 4 Flow Graph

Block    GEN        KILL
B1       {1,2}      {6,10,11}
B2       {3,4}      {5,8}
B3       {5}        {4,8}
B4       {6,7}      {2,9,11}
B5       {8,9}      {4,5,7}
B6       {10,11}    {1,2,6}

Figure 4: Flow graph with GEN and KILL block sets.


IN(B) and OUT(B) are defined by the following set of equations, which are called "data flow
equations":

IN(B)  = ∪ OUT(P), where the union is taken over every predecessor P of B
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)

The next step, therefore, is to solve these equations. If there are n nodes, there will be 2n
equations in 2n unknowns. The solution to these equations is not generally unique. This is
because we may have a situation like that shown in Figure 5, where a block B is a predecessor of
itself.

Figure 5: Nonunique solution to a data flow equation, where B is a predecessor of itself.


If there is a solution to the data flow equations for block B, and if the solution is IN(B) = IN0 and
OUT(B) = OUT0, then IN0 ∪ {d} and OUT0 ∪ {d}, where d is any definition not in IN0, OUT0,
or KILL(B), also satisfy the equations. If we take OUT0 ∪ {d} as the value of OUT(B), then,
since B is one of its own predecessors, d gets added to IN(B) according to IN(B) = ∪ OUT(P);
and because d is not in KILL(B), we get IN(B) = IN0 ∪ {d}. According to
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B), OUT(B) = OUT0 ∪ {d} is then also satisfied. Therefore, IN0,
OUT0 is one solution, whereas IN0 ∪ {d}, OUT0 ∪ {d} is another solution to the
equations: there is no unique solution. What we are interested in is finding the smallest solution, that is, the
smallest IN(B) and OUT(B) for every block B, consisting of the values that are in all solutions.
For example, since IN0 is contained in IN0 ∪ {d}, and OUT0 is contained in OUT0 ∪ {d}, IN0, OUT0 is the smaller
solution. And this is what we want, because the smallest IN(B) turns out to be the set of all
definitions reaching the point just before the beginning of B. The algorithm for computing the
smallest IN(B) and OUT(B) is as follows:
for each block B do
{
    IN(B) = ∅;
    OUT(B) = GEN(B);
}
flag = true;
while (flag) do
{
    flag = false;
    for each block B do
    {
        INnew(B) = ∅;
        for each predecessor P of B do
            INnew(B) = INnew(B) ∪ OUT(P);
        if INnew(B) ≠ IN(B) then
        {
            flag = true;
            IN(B) = INnew(B);
            OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B);
        }
    }
}
Initially, we take IN(B) for every block to be an empty set and OUT(B) to be GEN(B), and we
compute INnew(B). If it is different from IN(B), we compute a new OUT(B) and go on to the
next iteration. This continues until, for every B, IN(B) is the same in the previous and current
iterations.
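The following Python sketch runs this algorithm on the GEN and KILL sets of Table 1. The predecessor map is not given explicitly in the text; it is inferred here from the way the IN values in Tables 2 through 6 combine the OUT sets, so treat it as an assumption about the Figure 4 flow graph. Note also that this loop uses each new OUT value as soon as it is computed, so intermediate iterations may differ from the tables, although the final values agree.

    def reaching_definitions(blocks, preds, gen, kill):
        IN = {b: set() for b in blocks}           # IN(B) = empty set
        OUT = {b: set(gen[b]) for b in blocks}    # OUT(B) = GEN(B)
        flag = True
        while flag:
            flag = False
            for b in blocks:
                in_new = set()
                for p in preds[b]:
                    in_new |= OUT[p]              # INnew(B) = union of OUT(P)
                if in_new != IN[b]:
                    flag = True
                    IN[b] = in_new
                    OUT[b] = (IN[b] - kill[b]) | gen[b]
        return IN, OUT

    blocks = ["B1", "B2", "B3", "B4", "B5", "B6"]
    preds = {"B1": [], "B2": ["B1", "B4"], "B3": ["B2", "B5"],
             "B4": ["B2", "B3"], "B5": ["B3"], "B6": ["B4"]}
    gen  = {"B1": {1, 2}, "B2": {3, 4}, "B3": {5},
            "B4": {6, 7}, "B5": {8, 9}, "B6": {10, 11}}
    kill = {"B1": {6, 10, 11}, "B2": {5, 8}, "B3": {4, 8},
            "B4": {2, 9, 11}, "B5": {4, 5, 7}, "B6": {1, 2, 6}}

    IN, OUT = reaching_definitions(blocks, preds, gen, kill)
    print(IN["B2"])    # {1, 2, 3, 4, 5, 6, 7}, the converged value in Table 6
    print(OUT["B6"])   # {3, 4, 5, 7, 10, 11}, the converged value in Table 6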
For example, for the Figure 4 flow graph, whose GEN and KILL sets are given in Table 1, the IN
and OUT values of the blocks are computed using the above algorithm, as shown in Tables 2 through 6.
Table 2: Initial IN and OUT Values (IN(B) = ∅, OUT(B) = GEN(B))

Block    IN    OUT
B1       ∅     {1,2}
B2       ∅     {3,4}
B3       ∅     {5}
B4       ∅     {6,7}
B5       ∅     {8,9}
B6       ∅     {10,11}

Table 3: First Iteration for the IN and OUT Values

Block    IN            OUT
B1       ∅             {1,2}
B2       {1,2,6,7}     {1,2,3,4,6,7}
B3       {3,4,8,9}     {3,5,9}
B4       {3,4,5}       {3,4,5,6,7}
B5       {5}           {8,9}
B6       {6,7}         {7,10,11}


Table 4: Second Iteration for the IN and OUT Values

Block    IN                  OUT
B1       ∅                   {1,2}
B2       {1,2,3,4,5,6,7}     {1,2,3,4,6,7}
B3       {1,2,3,4,6,7,8,9}   {1,2,3,5,6,7,9}
B4       {1,2,3,4,5,6,7,9}   {1,3,4,5,6,7}
B5       {3,5,9}             {3,8,9}
B6       {3,4,5,6,7}         {3,4,5,7,10,11}

Table 5: Third Iteration for the IN and OUT Values

Block    IN                  OUT
B1       ∅                   {1,2}
B2       {1,2,3,4,5,6,7}     {1,2,3,4,6,7}
B3       {1,2,3,4,6,7,8,9}   {1,2,3,5,6,7,9}
B4       {1,2,3,4,5,6,7,9}   {1,3,4,5,6,7}
B5       {1,2,3,5,6,7,9}     {1,2,3,6,8,9}
B6       {1,3,4,5,6,7}       {3,4,5,7,10,11}

Table 6: Fourth Iteration for the IN and OUT Values

Block    IN                  OUT
B1       ∅                   {1,2}
B2       {1,2,3,4,5,6,7}     {1,2,3,4,6,7}
B3       {1,2,3,4,6,7,8,9}   {1,2,3,5,6,7,9}
B4       {1,2,3,4,5,6,7,9}   {1,3,4,5,6,7}
B5       {1,2,3,5,6,7,9}     {1,2,3,6,8,9}
B6       {1,3,4,5,6,7}       {3,4,5,7,10,11}

Since the fourth iteration reproduces the values of the third, the IN and OUT sets have converged.
The next step is to compute the ud-chains from this reaching-definitions information, as follows.
If a use of A in block B is preceded by a definition of A within the block, then the ud-chain of
that use contains only the last such definition prior to the use. If the use of A in block B is not
preceded by any definition of A within the block, then the ud-chain for this use consists of all the
definitions of A in IN(B).
For example, in the flow graph for which IN and OUT were computed in Tables 2 through 6, the use of
a in definition 4 of block B2 is preceded by definition 3, which is a definition of a. Hence, the
ud-chain for this use of a contains only definition 3. But the use of b in B2 is not preceded by
any definition of b in B2. Therefore, the ud-chain for this use of b is {1}, because that is
the only definition of b in IN(B2).
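A small Python sketch of this rule follows. The def_of map (which definition number defines which variable) is hypothetical except for the two facts used in the example above: definition 1 defines b and definition 3 defines a.

    def ud_chain(use_var, local_defs_before_use, IN_B, def_of):
        """local_defs_before_use: definition numbers in this block that
        precede the use, in order; def_of: def number -> variable name."""
        # If the use is preceded in its own block by a definition of the
        # variable, the chain is just the last such definition.
        local = [d for d in local_defs_before_use if def_of.get(d) == use_var]
        if local:
            return {local[-1]}
        # Otherwise the chain is every definition of the variable in IN(B).
        return {d for d in IN_B if def_of.get(d) == use_var}

    def_of = {1: "b", 3: "a"}      # only the mappings the example needs
    IN_B2 = {1, 2, 3, 4, 5, 6, 7}  # converged IN(B2) from Table 6
    print(ud_chain("a", [3], IN_B2, def_of))   # {3}
    print(ud_chain("b", [], IN_B2, def_of))    # {1}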
The ud-chaining information is used to identify the loop invariant computations. The next step is
to perform the code motion, which moves a loop invariant statement to a newly created node
called the "preheader," whose only successor is the header of the loop. All of the predecessors of
the header that lie outside the loop become predecessors of the preheader.

But sometimes the movement of a loop invariant statement to the preheader is not possible,
because such a move would alter the semantics of the program. For example, if a loop invariant
statement exists in a basic block that is not a dominator of all the exits of the loop (where an exit
of the loop is a node that has a successor outside the loop), then moving the loop invariant
statement to the preheader may change the semantics of the program. Therefore, before moving
a loop invariant statement to the preheader, we must check whether the code motion is legal.
Consider the flow graph shown in Figure 6.

Figure 6: A flow graph containing a loop invariant statement.

In the flow graph shown in Figure 6, x = 2 is loop invariant. But it occurs in B3, which is not a
dominator of the exit of the loop; so if we move it to the preheader, as shown in Figure 7, a
value of two will always get assigned to y in B5, whereas in the original program y in B5 may
get the value one as well as two.

Figure 7: Moving the loop invariant statement x = 2 to the preheader changes the semantics of the program.
In the flow graph shown in Figure 7, if x is not used outside the loop, then the statement x = 2
can safely be moved to the preheader. Therefore, for a code motion to be legal, the following
conditions must be met:


1. The block in which the loop invariant statement occurs should be a dominator of all exits
of the loop, or the name assigned by the statement should not be used outside the loop.
2. We cannot move a loop invariant statement that assigns to A into the preheader if there is
another statement in the loop that also assigns to A. For example, consider the flow graph
shown in Figure 8.

Figure 8: Moving the statement to the preheader changes the meaning of the program.


Even though the statement x = 3 in B2 satisfies condition (1), moving it to the preheader
will change the meaning of the program: if x = 3 is moved to the preheader, then
the value assigned to y in B5 will be two if the execution path is B1 → B2 → B3 →
B4 → B2 → B4 → B5, whereas for the same execution path the original program assigns
three to y in B5.
3. The move is illegal if A is used in the loop and that use is reached by any definition of A other
than the statement to be moved. For example, consider the flow graph shown in Figure 9.

Figure 9: Moving a value to the preheader changes the original meaning of the program.
Even though x is not used outside the loop, the statement x = 2 in block B2 cannot be moved
to the preheader, because the use of x in B4 is also reached by the definition x = 1 in B1.
If we moved x = 2 to the preheader, the value assigned to a in B4 would
always be two, which is not the case in the original program.
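The three conditions can be bundled into a single check. The sketch below is illustrative only: all of its inputs (dominator sets, the loop's definitions, the ud-chains of the uses) are assumed to come from the analyses described earlier, and the call reproduces the Figure 9 situation with made-up definition names d1 and d2.

    def can_move(s, var, block, dom, loop_exits, loop_defs,
                 used_outside, chains_of_uses):
        """s: definition number of the invariant statement; var: the name it
        assigns; block: the block containing it; dom: node -> set of its
        dominators; loop_defs: var -> definition numbers inside the loop;
        chains_of_uses: ud-chain of every use of var inside the loop."""
        # Condition 1: block dominates all loop exits, unless var is dead
        # outside the loop.
        if not all(block in dom[e] for e in loop_exits) and var in used_outside:
            return False
        # Condition 2: s must be the only assignment to var in the loop.
        if loop_defs[var] != {s}:
            return False
        # Condition 3: every use of var in the loop is reached only by s.
        if any(chain != {s} for chain in chains_of_uses):
            return False
        return True

    # Figure 9: the use of x in B4 is reached by d1 (x = 1 in B1) as well
    # as d2 (x = 2 in B2), so condition 3 rejects the move.
    print(can_move("d2", "x", "B2",
                   dom={"B4": {"B2", "B4"}}, loop_exits={"B4"},
                   loop_defs={"x": {"d2"}}, used_outside=set(),
                   chains_of_uses=[{"d1", "d2"}]))   # False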


Conclusion: Thus, we have studied code optimization, covering common sub-expression
elimination and loop invariant code motion.


Appendix B: ASCII Character Set


The American Standard Code for Information Interchange (ASCII) is the character set that was
most widely used in the early development of computers; it was proposed by ANSI in 1963. It uses
a 7-bit word to represent a character, so that at most 2^7 = 128 characters can be represented. An
ASCII code is the numerical representation of a character such as 'x' or '$', or of an action
commonly required in simple data interchange.
Later on, ASCII was extended to cater to the increasing demand to represent additional
characters on the computer. This led to the Extended ASCII character set, which consists
of 256 characters. Below is the original ASCII character set, consisting of 128 characters.


Evaluation and marking system:


Basic honesty in the evaluation and marking system is absolutely essential, and the impartiality
of the evaluator is required for the examination system to earn the confidence of the students.
It is a wrong approach to reward students with undeservedly easy marking in order to gain cheap
popularity among them. It is a primary responsibility of the teacher to ensure that the students
who genuinely put in hard work, with the right kind of intelligence, are correctly rewarded.
The marking pattern should be justifiable to the students without any ambiguity, and the teacher
should see that students are not faced with unjust circumstances.
The assessment is done according to the directives of the Principal/Vice-Principal/Dean
Academics.
