BACHELOR OF TECHNOLOGY
by
Haynes M G Y2033
2005, Monsoon Semester
National Institute of Technology, Calicut
Department of Computer Engineering
Haynes M G Y2033
Job Abraham Y2029
1 Introduction
1.1 Problem Specification
1.2 Literature Survey
1.3 Motivation
2 The SUIF compiler system and intermediate format
2.1 SUIF kernel
2.2 SUIF toolkit
2.3 SUIF intermediate format
2.3.1 File Level
2.3.2 Procedure Level
2.3.3 Instruction Level
2.3.4 Symbolic Information
2.3.5 Other Data
3 The compiler pass
3.1 Working of the pass
4 The run time monitor
4.1 RecObj
4.2 RecAccess
4.3 RecLink
4.4 Analyzer
5 Conclusion
1 Introduction
1.1 Problem Specification
• Understanding the SUIF format and the working of the compiler system.
• Implementing a pass to check for data accesses at run time and analyzing those accesses
using a run time monitor.
1.2 Literature Survey
• An Overview of the SUIF System - This document gives a complete overview of the SUIF
compiler system, its architecture, the design of the SUIF base kernel, and the compiler
toolkit available with it.
• The SUIF Library - This document is a reference manual for the SUIF library. It gives a
complete reference to the SUIF intermediate format and the routines available to
manipulate its data structures.
• The SUIF Cookbook - This is a guide which contains a series of examples describing
various passes. It introduces writing new passes and explains how to compile and run the
newly created passes.
• The SUIF source code - The actual source code was also referred to extensively to make
ourselves clear about SUIF internals.
• Compilers: Principles, Techniques, and Tools - Aho, Sethi & Ullman [1]
2 The SUIF compiler system and intermediate format
The compiler is structured as a small kernel plus a toolkit consisting of various compilation
analyses and optimizations built using the kernel. The kernel defines the intermediate
representation and the interface between passes of the compiler. This interface is always the
same so that the passes in the toolkit can easily be enhanced, replaced, or rearranged. All
program information necessary to implement scalar and parallel compiler optimizations is
easily available from the SUIF kernel.
2.3 SUIF intermediate format
The intermediate program representation is a hierarchy of data structures defined in an
object-oriented class library. This intermediate representation retains almost all the
high-level information from the source code. Accessing and manipulating the data structures
is generally straightforward due to the modular design of the kernel.
2.3.1 File Level
At the root of the hierarchy for a SUIF program is a "file set" containing a list of the files
being compiled. Each entry within the file set is a "file set entry" that contains the input
and output streams for a particular file.
The file level of the SUIF hierarchy also contains the global symbol tables. The file set
contains the global symbol table that is shared across all of the files. This shared symbol
table is the key to supporting interprocedural analysis. References to a global symbol or type
from different files can point to the same entry in the shared global symbol table, making it
easy to determine that they refer to the same entity. Each file set entry also contains its own
symbol table for things declared privately within that file.
Lower levels of the SUIF hierarchy can be reached through the global symbol tables. Besides
the types and variables, the global symbol tables contain symbols for the procedures. The
procedure bodies can be accessed through these procedure symbols. If the body of a procedure
is contained in one of the input files, the corresponding procedure symbol automatically
records a pointer to the input file and provides a method to read the body into memory. The
procedure symbol also has other methods to write the body to an output file and to flush the
body from memory. Many SUIF programs need to process all the procedures. This can be done by
searching through the global symbol tables for the procedure symbols. However, since this is
such a common task, the file set entries include procedure iterators to step through all the
procedures.
The "file set" is very helpful when we are working with interprocedural passes. Without it,
we would have to combine all the code into one big file.
2.3.2 Procedure Level
Procedure bodies are represented using a language-independent form of abstract syntax trees
(ASTs). In the first stages of a compilation, the high-level structure of the program is
retained in this language-independent form. This format is called high-SUIF. It is well suited
for passes that require the high-level structure of the code, like dependence analysis and
loop transformation.
Later in the compilation process, the ASTs are reduced to sequential lists of instructions.
This form, called low-SUIF, works well for some scalar optimizations and for code generation.
Both formats are represented using the same tree data structures. They differ only in the
amount of information present.
• Instruction nodes: They form the leaf nodes of the AST. Each of these nodes contains a
single instruction or expression tree. In low-SUIF code, a procedure body is reduced to a
list of instruction nodes containing individual instructions. This form resembles the
quadruple representation used by traditional scalar optimizers. Methods are provided to
attach/detach an instruction, apply a function over all instructions in an expression tree,
and so on.
• Block nodes: Block nodes represent nested scopes. A block node contains a symbol table
and a list of the AST nodes within the block. The scope of the symbols and types defined
in the symbol table is restricted to the AST nodes within the block. They cannot be
referenced from outside the block.
• If nodes: Conditional structures may be represented by if nodes. An "if" node has three
parts, each of which is a list of AST nodes. The header list contains code to evaluate
the condition and either branch to the else list or fall through to the then list. Because
the header can contain control flow, it is easy to implement short-circuit evaluation of
conditional expressions.
• Loop nodes: SUIF has two different kinds of loops: "loop" nodes and "for" nodes. A loop
node represents a "do-while" loop. It contains two tree node lists: the body and the test.
The body list comes first and holds the loop body. The test list contains code to evaluate
the "while" expression and conditionally branch back to the beginning of the body. A for
node, in addition to the loop body, specifies the index variable and the range of values
for the index. The lower bound, upper bound, and step operands are expressions that are
evaluated once at the beginning of the loop. The for node also has an optional landing_pad
part which is used to execute loop-invariant code.
2.3.3 Instruction Level
Each instruction node in an abstract syntax tree holds a SUIF instruction. Most SUIF
instructions perform simple operations; the opcodes resemble those for a typical RISC
processor. However, more complex instructions are used in places where it is important to
retain high-level information.
SUIF supports both expression trees (high-SUIF) and flat lists (low-SUIF) of instructions. In
an expression tree, the instructions for an expression are all grouped together. This works
well for high-level passes. Because expression trees do not totally order the evaluation of
the instructions, they do not work so well for back-end optimization and scheduling passes.
Thus SUIF also provides the flat list representation where each instruction node contains a
single instruction.
Most SUIF instructions use a "quadruple" format with a destination operand and two source
operands; however, some instructions require more specialized formats. For example, ldc
(load constant) instructions have an immediate value field in place of the source operands.
The cal (call) instruction implements a machine-independent procedure call with a list of
parameters. This hides the details of various linkage conventions.
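The quadruple idea can be illustrated with a small sketch. This is not SUIF's instruction class: the struct layout, the opcode names OP_LDC and OP_ADD, and the register-array model are all assumptions made for illustration, loosely mirroring ldc and a two-source arithmetic instruction.

```c
/* Hypothetical quadruple: one destination, up to two sources, or an immediate. */
enum opcode { OP_LDC, OP_ADD };

struct quad {
    enum opcode op;
    int dst;    /* index of the destination register */
    int src1;   /* first source register (unused by OP_LDC) */
    int src2;   /* second source register (unused by OP_LDC) */
    int imm;    /* immediate value (used only by OP_LDC) */
};

/* Execute a flat list of quadruples in order, as a low-SUIF style list would be. */
void run_quads(const struct quad *code, int n, int *regs)
{
    for (int i = 0; i < n; i++) {
        const struct quad *q = &code[i];
        switch (q->op) {
        case OP_LDC:
            regs[q->dst] = q->imm;                     /* load constant */
            break;
        case OP_ADD:
            regs[q->dst] = regs[q->src1] + regs[q->src2];
            break;
        }
    }
}
```

With register 0 standing for a and register 1 for b, the statements a=3; b=a+1; become three quadruples: two constant loads and one add.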
2.3.4 Symbolic Information
SUIF includes detailed symbolic information. Symbols and types are defined in nested scopes
corresponding to the block structure of the program. A symbol table is attached to each
element of the main SUIF hierarchy that defines a new scope. Symbols record information about
variables, labels, and procedures. The SUIF type system is similar to C but also has some
support for FORTRAN and other languages.
The symbol tables are defined in a tree structure that forms a hierarchy parallel to the main
SUIF hierarchy. Each table records a pointer to its parent and keeps a list of its children.
The global symbol table at the root is attached to the file set and is shared across all the
files. Its children are the file symbol tables attached to the file set entries. The procedure
symbol tables for the AST procedure nodes are in the next level down, followed by the block
symbol tables for block nodes within the ASTs. The block symbol tables may be nested to any
level.
Each symbol table contains a list of symbols that are defined within the corresponding
scope. There are three different kinds of symbols: variables, labels, and procedures.
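As a rough illustration of how nested scopes resolve a name, the sketch below walks from a scope up through its parents. The types symtab and symbol and the function lookup are hypothetical simplifications, not the SUIF classes; only the parent pointer and the per-table symbol list come from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for SUIF symbol tables (names are assumptions). */
enum sym_kind { SYM_VAR, SYM_LABEL, SYM_PROC };

struct symbol {
    const char *name;
    enum sym_kind kind;
    struct symbol *next;      /* next symbol defined in the same table */
};

struct symtab {
    struct symtab *parent;    /* enclosing scope, NULL for the global table */
    struct symbol *symbols;   /* symbols defined in this scope */
};

/* Search the given scope first, then each enclosing scope up to the root. */
struct symbol *lookup(struct symtab *scope, const char *name)
{
    for (; scope != NULL; scope = scope->parent)
        for (struct symbol *s = scope->symbols; s != NULL; s = s->next)
            if (strcmp(s->name, name) == 0)
                return s;
    return NULL;
}
```

A lookup started in a procedure's table finds both local variables and global procedures, while a lookup started at the global table never sees the locals.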
A variable symbol contains a pointer to the type for the variable. The type determines
the amount of storage used to hold the variable as well as the interpretation of its contents.
Some additional flags are used for variable symbols, e.g. flags to identify formal parameters
and variables that have their address taken, and flags to identify variables that represent
machine registers.
Label symbols can only be declared within procedures. The position of a label in the code
is marked with a special instruction.
Procedure symbols can only be declared in the global and file scopes. A procedure symbol
contains a pointer to the AST for the body of the procedure if it exists. It also provides
methods to read the body from an input file, write it to an output file, and flush it from
memory. The procedure symbol also has a pointer to the type for the procedure.
The SUIF type system can represent most, if not all, high-level types for C programs and
for many other languages. The types are implemented with various kinds of type nodes. Each
type node contains an operator that specifies the kind of node. Some of the type operators
define base types that stand alone, while other operators refer to other type nodes. For
example, a type node with the TYPE_INT operator defines a new integer type. A node with
the TYPE_PTR operator can then refer to the integer type node to create a type for pointers
to integers.
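The relation between a base type node and a node that refers to it might be sketched as follows. Only the operator names TYPE_INT and TYPE_PTR come from the text; the struct type_node layout, its size_bits field, and the helper pointer_depth are assumptions made for illustration.

```c
#include <stddef.h>

/* Only two of the type operators are modelled here. */
enum type_op { TYPE_INT, TYPE_PTR };

struct type_node {
    enum type_op op;          /* the operator that specifies the kind of node */
    int size_bits;            /* storage size (assumed field) */
    struct type_node *ref;    /* referenced type for TYPE_PTR, NULL for base types */
};

/* Count how many pointer levels wrap the underlying base type. */
int pointer_depth(const struct type_node *t)
{
    int depth = 0;
    while (t != NULL && t->op == TYPE_PTR) {
        depth++;
        t = t->ref;
    }
    return depth;
}
```

An integer node has depth 0, a TYPE_PTR node referring to it has depth 1, and a pointer to that pointer has depth 2.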
2.3.5 Other Data
SUIF is designed to be extended with new kinds of analyses and optimizations. These future
extensions will generally require that additional information be attached to SUIF objects and
propagated between passes. SUIF provides "annotations" which allow user-defined data
structures to be attached to most SUIF objects. This is the primary mechanism for making SUIF
easily extensible.
New annotations can be declared by any program and used to record whatever information
is needed within that program. They can also be written to the SUIF output files so that other
programs can use them. An annotation manager records the annotation names and the format
of the data associated with each kind of annotation.
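The idea of attaching named, user-defined data to an object can be sketched as a linked list of name/data pairs. The names annote, suif_obj, attach_annote and peek_annote below are hypothetical and do not reproduce the SUIF annotation interface or its annotation manager.

```c
#include <stddef.h>
#include <string.h>

/* A named piece of user data (hypothetical layout). */
struct annote {
    const char *name;
    void *data;
    struct annote *next;
};

/* Any object that can carry annotations. */
struct suif_obj {
    struct annote *annotes;
};

/* Attach an annotation to the front of the object's list. */
void attach_annote(struct suif_obj *obj, struct annote *an)
{
    an->next = obj->annotes;
    obj->annotes = an;
}

/* Return the data of the first annotation with the given name, or NULL. */
void *peek_annote(const struct suif_obj *obj, const char *name)
{
    for (const struct annote *an = obj->annotes; an != NULL; an = an->next)
        if (strcmp(an->name, name) == 0)
            return an->data;
    return NULL;
}
```

A pass could, for instance, attach an access counter under a name of its choosing and a later pass could retrieve it by that name.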
3 The compiler pass
The pass inserts function calls at appropriate places. By appropriate places we mean every
data access. This does not include accesses to literal values like numbers. For example, for a
program whose main() declares two integers a and b, assigns a=3, and then computes b=a+1,
applying the pass produces:
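int main()
{
    int a,b;
    RecObj('a',&a);      /* register variable a with the monitor */
    RecObj('b',&b);      /* register variable b with the monitor */
    a=3;
    RecAccess(&a);       /* write to a */
    b=a+1;
    RecAccess(&b);       /* write to b */
    RecAccess(&a);       /* read of a */
    return 0;
}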
RecObj() is used to notify the monitor about the different variables defined in a procedure.
The arguments for this function are the name of the variable and the address of the variable.
RecAccess() is used to record each data access. The argument for this function is the address
of the variable accessed.
3.1 Working of the pass
We start by iterating through each file in the source program. For each file, we iterate
through each procedure in the file.
The procedure body can be accessed as a list of type tree_node_list.
By iterating through this list, we access the individual tree nodes in the procedure body.
The individual nodes in the procedure body can be one of the five types described earlier:
instruction (tree_instr), block, if, loop, or for nodes.
We can directly access the source operand(s) and the destination operand for nodes of type
tree_instr only. For the other nodes, we have to traverse recursively into their nested lists
until we reach a tree_instr node.
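The recursive descent described above can be sketched with a simplified node type. This is not the SUIF tree_node class: struct node, its single child list (a real if node has three lists, a loop node two), and the function walk are assumptions made to keep the example short.

```c
#include <stddef.h>

enum node_kind { NODE_INSTR, NODE_BLOCK, NODE_IF, NODE_LOOP, NODE_FOR };

struct node {
    enum node_kind kind;
    struct node *child;   /* first node of the nested list, NULL for NODE_INSTR */
    struct node *next;    /* next node in the enclosing list */
};

/* Walk a node list, descend into structured nodes, and call visit (if given)
 * on every instruction node. Returns the number of instruction nodes reached. */
int walk(struct node *list, void (*visit)(struct node *))
{
    int count = 0;
    for (struct node *n = list; n != NULL; n = n->next) {
        if (n->kind == NODE_INSTR) {
            if (visit != NULL)
                visit(n);          /* operands are accessible only here */
            count++;
        } else {
            count += walk(n->child, visit);
        }
    }
    return count;
}
```

In the pass, the visit callback would be the place where the operands of the instruction are examined and the monitoring calls are created.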
Now that we have access to the individual operands we can insert code into the
tree_node_list. We make use of the SUIF library itself for this purpose. We make objects of
type tree_node and insert them into the tree_node_list.
The steps involved in making a tree_node for a function call are as follows:
1. We iterate through each source operand of the instruction. If an operand is itself an
instruction, we recursively examine that instruction for its operands. If the operand is an
immediate value, we do nothing. If the operand is a variable, we look it up in the procedure
symbol table and extract the name of the variable. We will be using this name to identify
each operand.
2. For each source operand which is a variable, we have to insert a function call (RecAccess)
to record its access. This is done by creating an object of type in_cal. in_cal is a class
defined in the SUIF library to represent function calls. in_cal objects have as attributes
the address of the called procedure, the number of arguments, and each argument. We have to
set each of these attributes correctly.
3. The above two steps are repeated for the destination operand also. We make an in_cal
object for the destination operand.
4. Once we have created an in_cal object, we have to make a tree node for this object and
insert the node into the tree_node_list. We make use of the insert_before/insert_after
functions defined in the tree_node_list class for this purpose.
E.g.:
/* recaccess: an existing call to RecAccess, used here as a template */
in_cal *f = (in_cal *)recaccess;
operand addr = f->addr_op().clone();   /* copy the address of the called procedure */
in_cal *new_f = new in_cal();
new_f->set_addr_op(addr);
int args = f->num_args();
new_f->set_num_args(args);
/* code to create each operand */
......
......
......
new_f->set_argument(0,p);
new_f->set_argument(1,q);
new_f->set_argument(2,r);
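Steps 1 and 2 above, finding which variables an instruction reads, can be sketched independently of SUIF. The types operand and instr and the function collect_reads below are hypothetical stand-ins, not the SUIF classes of the same names; the sketch only shows the three-way case split on operand kind.

```c
#include <stddef.h>

enum operand_kind { OPND_IMMED, OPND_VAR, OPND_INSTR };

struct instr;

struct operand {
    enum operand_kind kind;
    const char *var_name;     /* valid for OPND_VAR */
    struct instr *sub;        /* valid for OPND_INSTR (nested expression) */
};

struct instr {
    struct operand dst;
    struct operand src[2];
    int num_srcs;
};

/* Append the names of all variables read by in (recursively) to out.
 * Returns the new number of names stored; at most max names are kept. */
int collect_reads(const struct instr *in, const char **out, int count, int max)
{
    for (int i = 0; i < in->num_srcs; i++) {
        const struct operand *op = &in->src[i];
        if (op->kind == OPND_VAR) {
            if (count < max)
                out[count++] = op->var_name;   /* needs a RecAccess call */
        } else if (op->kind == OPND_INSTR) {
            count = collect_reads(op->sub, out, count, max);
        }
        /* OPND_IMMED: a literal value, nothing to record */
    }
    return count;
}
```

For b=a+1 this yields only a, since 1 is an immediate; the destination b is handled separately, as in step 3.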
We also have to insert function calls for each variable defined in the procedure. The list of
variables is available from the symbol table of the procedure.
The steps involved are as follows:
1. Iterate through each symbol in the symbol table and extract the name of each symbol.
2. If the symbol is a label, do nothing. If it is a variable symbol, create an in_cal object
(RecObj) with arguments properly set.
3. Make a tree_node for this object and insert the node into the tree_node_list at the
correct position.
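E.g.:
/* recobj: an existing call to RecObj, used here as a template */
in_cal *f = (in_cal *)recobj;
operand addr = f->addr_op().clone();   /* copy the address of the called procedure */
in_cal *new_f = new in_cal();
new_f->set_addr_op(addr);
int args = f->num_args();
new_f->set_num_args(args);
/* code to create each operand */
......
......
......
new_f->set_argument(0,src1);           /* name of the variable */
new_f->set_argument(1,src2);           /* address of the variable */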
After we have inserted function calls at the required places, we have to write the changes
back to an output file. This output file is in SUIF format itself.
The output file is converted to normal C code using the s2c utility that comes along with
the SUIF 1.x compiler system.
Once we have the normal C code we can compile it using any standard C compiler like gcc and
do an analysis of the code.
4 The run time monitor
A transformed program invokes the run time monitor through a high level interface consisting
mainly of three types of function calls: RecObj, RecAccess, and RecLink.
In addition to the above three functions we have written a function Analyzer() to display the
number of accesses for each variable and the percentage of accesses for each data item.
4.1 RecObj
For each data unit in a RecObj call, the monitor creates a record which contains its memory
address. For fast retrieval of shadow data, we store them in a hash table indexed by the
starting address of data units. We used linear hashing with chaining.
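A minimal sketch of a chained hash table keyed by starting address is given below. It is not the project's monitor: the bucket count, the hash function, the struct shadow layout and the helper find_shadow are assumptions, the table has a fixed size rather than growing as linear hashing would, and the RecObj signature is inferred from the transformed program shown earlier.

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64   /* fixed bucket count (assumption) */

/* Shadow record for one data unit. */
struct shadow {
    char name;
    void *addr;               /* starting address of the data unit */
    unsigned long accesses;
    struct shadow *next;      /* chain of records hashing to the same bucket */
};

static struct shadow *buckets[NBUCKETS];

static unsigned bucket_of(const void *addr)
{
    /* drop the low bits, which are usually zero for aligned data */
    return (unsigned)(((uintptr_t)addr >> 2) % NBUCKETS);
}

/* Return the shadow record registered for addr, or NULL. */
struct shadow *find_shadow(const void *addr)
{
    struct shadow *s = buckets[bucket_of(addr)];
    while (s != NULL && s->addr != addr)
        s = s->next;
    return s;
}

/* Register a data unit; repeated registration of the same address is ignored. */
void RecObj(char name, void *addr)
{
    if (find_shadow(addr) != NULL)
        return;
    struct shadow *s = malloc(sizeof *s);
    if (s == NULL)
        return;
    s->name = name;
    s->addr = addr;
    s->accesses = 0;
    unsigned b = bucket_of(addr);
    s->next = buckets[b];     /* push onto the front of the chain */
    buckets[b] = s;
}
```

Chaining keeps insertion constant-time and tolerates collisions between addresses that fall into the same bucket.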
4.2 RecAccess
Access recording in RecAccess involves a hash-table search to find the shadow data and record
the access. The first parameter of RecAccess is used in the hash-table search. It is either
the starting address of a data unit or an internal address. The hash entry is initialized by
RecObj in the first case and by RecLink in the second case. Note that RecLink happens before
a program takes an internal address from a data unit.
4.3 RecLink
At RecLink, the monitor inserts the extracted address into the hash table and links it to the
shadow record of its data unit. In the worst case, a program stores the address of every data
element, and the hash table has one entry for each data element. However, our experience
shows that a program usually takes at most a constant number of internal addresses from any
data unit.
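How a starting address and an internal address can share one shadow record might be sketched as follows. The two-argument RecLink signature, the struct record and struct entry layouts, and the helper access_count are assumptions for illustration; the source does not give them.

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64

struct record {               /* shadow data for one data unit */
    char name;
    unsigned long accesses;
};

struct entry {                /* hash entry: address -> shadow record */
    const void *addr;
    struct record *rec;
    struct entry *next;
};

static struct entry *table[NBUCKETS];

static struct entry *find(const void *addr)
{
    struct entry *e = table[((uintptr_t)addr >> 2) % NBUCKETS];
    while (e != NULL && e->addr != addr)
        e = e->next;
    return e;
}

static void add_entry(const void *addr, struct record *rec)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return;
    size_t b = ((uintptr_t)addr >> 2) % NBUCKETS;
    e->addr = addr;
    e->rec = rec;
    e->next = table[b];
    table[b] = e;
}

/* Create the shadow record and the entry for the starting address. */
void RecObj(char name, void *start)
{
    struct record *r = malloc(sizeof *r);
    if (r == NULL)
        return;
    r->name = name;
    r->accesses = 0;
    add_entry(start, r);
}

/* Link an internal address to the record of its data unit (assumed signature). */
void RecLink(void *start, void *internal)
{
    struct entry *unit = find(start);
    if (unit != NULL && find(internal) == NULL)
        add_entry(internal, unit->rec);
}

/* Record one access through either kind of address; unknown addresses are ignored. */
void RecAccess(void *addr)
{
    struct entry *e = find(addr);
    if (e != NULL)
        e->rec->accesses++;
}

unsigned long access_count(void *addr)
{
    struct entry *e = find(addr);
    return e != NULL ? e->rec->accesses : 0;
}
```

Because both entries point at the same record, an access through an element address of an array and an access through the array's start are counted together.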
4.4 Analyzer
A call to this function is inserted at the end of the main() function. This function prints
out the number of accesses and the percentage of accesses for each data item.
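The percentage computation can be sketched as below. In the project Analyzer() is called without arguments and reads the monitor's own tables; passing an array of struct var_stat here, and the helper access_share, are assumptions made so that the example is self-contained.

```c
#include <stdio.h>

struct var_stat {
    char name;
    unsigned long accesses;
};

/* Share of all recorded accesses that belong to entry i, in percent.
 * Returns 0 when nothing has been recorded. */
double access_share(const struct var_stat *stats, int n, int i)
{
    unsigned long total = 0;
    for (int k = 0; k < n; k++)
        total += stats[k].accesses;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)stats[i].accesses / (double)total;
}

/* Print the access count and percentage for every data item. */
void Analyzer(const struct var_stat *stats, int n)
{
    for (int i = 0; i < n; i++)
        printf("%c: %lu accesses (%.1f%%)\n",
               stats[i].name, stats[i].accesses, access_share(stats, n, i));
}
```

For the transformed program shown in Section 3, a is accessed twice and b once, so a accounts for two thirds of the accesses and b for one third.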
5 Conclusion
Appropriate function calls were inserted into the input program by the new pass, which
records the data accesses using the run time monitor. This project serves as a basis for
further application-specific work. For example, the recording of data accesses can be used
for applications like data offloading. A further improvement possible in this project would
be selective monitoring, where we further optimize where to insert function calls. In our
base scheme, we are monitoring every data access. However, if a variable is accessed
frequently in a short code sequence, we can record the first access and omit the rest.
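One possible form of selective monitoring is sketched below: within a straight-line code sequence, only the first access to each variable keeps its monitoring call. This is an illustration of the idea, not something implemented in the project; the function select_first_accesses and its array-based interface are assumptions.

```c
#include <string.h>

/* accessed[i] names the variable touched by the i-th access in a short
 * straight-line sequence. On return, keep[i] is 1 if a monitoring call
 * should be inserted for that access and 0 if it can be omitted.
 * Returns the number of calls kept. */
int select_first_accesses(const char **accessed, int n, int *keep)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        keep[i] = 1;
        for (int j = 0; j < i; j++) {
            if (strcmp(accessed[i], accessed[j]) == 0) {
                keep[i] = 0;      /* already recorded earlier in the sequence */
                break;
            }
        }
        kept += keep[i];
    }
    return kept;
}
```

For the access sequence a, b, a, a, b this keeps two calls instead of five, at the cost of no longer counting every individual access.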
The current base scheme implements monitoring of data access for primitive data types like
int, char, float and for arrays. The implementation can be easily extended to monitor data
access in the case of pointers and structures.
References
[1] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and
Tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
[2] Chen Ding, Yutao Zhong, Compiler Directed Monitoring of Program Data Access, ACM SIGPLAN
Notices, Proceedings of the 2002 Workshop on Memory System Performance (MSP '02), Volume 38,
Issue 2 supplement, pages 1-12, June 2002
[3℄ The SUIF 1.x Compiler System - http://www.suif.stanford.edu