
Optimization in a COMPILER

“An optimizing compiler tries to generate code which a good assembly language
programmer, intimately knowledgeable about the CPU being used, would have written.”
Then why are we interested in optimization?
The point is that we are interested in optimized code which is also portable: a good
assembly language programmer would have to rewrite similar code for each different CPU.
Two goals:
• Time Reduction – reduce the time for execution of the compiled program
• Space reduction – reduce the space occupied by the running program.
Two levels:
• Machine Independent optimization (MI):
e.g. (taking an example from FORTRAN):
      DO 11 I = 1, 1000
      C = 0.0
11    A = D(I)*5

In this program segment, the statement C = 0.0 does something which is not related to
the loop; it need be done only once, outside the loop, reducing the execution time.
Thus:
      C = 0.0
      DO 11 I = 1, 1000
11    A = D(I)*5

This operation of adjusting the code for reducing the execution time is independent of
the type of CPU used; it is valid for any type of CPU.
• Machine Dependent optimization (MD):
For example, store the temporary results which need to be saved while evaluating
an expression in the CPU (or FPU) registers and not in the main memory.
We call these two levels because they generally occur at two different levels of translation
activity.
Three Categories:
There are three categories of optimization which are generally used:
• Parse optimization – i.e., optimization actions which are implemented within the
parser
• Linear (Peephole) optimization – those optimization actions which are done at the
level of the intermediate code
• Structural optimization – deals with the total program or a substantial part of it.
A particular method of optimization may lead to reduction in execution time (T) or
the required memory space (S) or both (S&T). In the following description of the methods,
appropriate code (MD, MI) and (S, T or S&T) are written to indicate the overall properties.
1. Parser Optimization : (MI, MD) (T and possibly S)
Generate good code as far as possible:
• Reduce the number of “goto”s; (MI, S&T)
• Use in-line code for short functions (e.g. string copy) rather than a function call;
(MI, MD, T)
• Use shift-left to multiply by 2 or power of 2; (MI, MD, T) etc.
Generally limited to optimization that can be done based on a single production rule in
the grammar.
2. Linear or Peephole optimization :
This kind of optimization is done by one or more additional passes over the entire
output of the Parser, in a linear fashion. Such optimization is called “peephole”
optimization because it inspects and manipulates a small portion of the output of the
Parser at a time.
• Relatively small and fast
• Requires only a small amount of additional memory
• Looks for certain patterns and replaces them with simpler ones
• Works best with 3-tuple and 4-tuple as IL
• A number of individual things that can be done

We shall now discuss these methods in some detail:


a) Strength Reduction : replace one operation with another requiring less time on
any machine (MI) (possible T–S trade-off : T ↓ S ↑);
e.g. replace multiplication by a constant by a series of adds and shifts;

Operation                  Replace by                 [opt.]
x * 4                      x << 2                     [T&S]
x / 8                      x >> 3                     [T&S]
x % 8                      x & 7                      [T&S]
multiply by a small int.   adds, or shifts and adds   [T ↓ S ↑]

For example, the tuple

      *    a    11

may be replaced by the sequence (where (n) denotes the result of tuple n):

1     =    T1   a
2     <<   T1   3        (8a)
3     <<   T1   1        (2a)
4     +    T1   (2)      (a + 8a = 9a)
5     +    (4)  (3)      (9a + 2a = 11a)
b) Constant Folding :
The compiler itself does as much of the calculation as possible. If any expression
or part of it contains only constants, then the compiler can easily calculate it at
compile time; even the Parser can do it. For example,
the expression a + 3 * 7 is replaced by a + 21,
as the parser itself can do the calculation 3*7 = 21. This is called Constant Folding.

1     *    3    7
2     +    a    (1)

Both operands of tuple 1 are constants, so fold them.
If the expression were a + 1 + 2, the parser would not be able to detect it for
constant folding:

1     +    a    1
2     +    (1)  2

Neither tuple is an operation on two constants. But a Peephole optimizer, discussed
later on, will detect it as two consecutive operations with constants and may change
the matrix to

1     +    a    3

Multiply or divide by 1 and add/subtract of 0 can be completely eliminated. [MI, S&T]
Similarly, shifts << 0 or >> 0 can also be eliminated.
c) Constant Propagation :
More general than the above method. In typical code, many variables retain
a constant value over a large portion of the code. The compiler can take note when
a constant is assigned to a variable and use the constant instead of the variable;
operations with constants generally take less time. For example,

=   y   5                              =   y   5
.....              is replaced by      .....
.....                                  .....
+   x   y                              +   x   5

i.e., as long as the compiler can use the constant value assigned to a variable, it
will go on using the constant. [MI, T]
d) Dead variable and Dead code :
Dead variables: if a variable is assigned a value, but is then not used on any RHS,
in any function call, etc., until a certain later statement, then between these
statements it is a dead variable.
For example:
x = .........

....... x does not appear on RHS.


....... x is dead. If there are any assignments to x here they may be
eliminated.
x = .........
y = x + ...... x is live again.

Dead Code: is code that cannot be reached under any circumstances or
does not do anything useful. For example, for

      if(0) do_something;  <next>

the matrix is:

1     -    0    0
2     BMZ  (1)  n+1
3
.....                     “do_something” entries
n

Since tuple (1) is always 0, the branch at tuple 2 is always taken, and all the
entries in the matrix labeled “do_something” can be eliminated.
It should be noted that a combination of constant propagation and dead
– variable/code removal may take place. For example,
f1(){
int i,j,a[10];
..... Note that i is not dead here.
i=3;
++i;
.....
j = a[i];
.....
}
But if constant propagation is applied first:
f1(){
int i,j,a[10];
.....
i = 3;
++i;        ← now i is dead, as its value can be replaced by 4.
.....
j = a[4];   ← we may remove i.
and remove the dead code:
f1(){
int i,j,a[10];
.....
j = a[4];
.....
}
One important point to remember about dead variable/code removal is that
if a variable represents some hardware action, it may be modified due to an ISR,
etc. In fact, ANSI C recognizes this possibility and provides the keyword volatile
to declare those variables whose values may not remain constant over any
predictable portion of the code. For example,
volatile char *port;
No assumptions are made about such a variable by the optimizer in this case.
3. Structural Optimization : [ usually MI, usually S&T ]
In this type of optimization, the overall structure of the code is analyzed. It uses Parse
Tree or Parse Matrix.
For the best optimization, the full Parse Tree of the program is required.
The tree can be, of course, in some Intermediate Language (IL).
The optimizer may also recreate the Parse Tree out of this IL output.
Methods available are :
a) Elimination of common sub-expressions : [MI, S&T]
If the code contains an expression which is used repeatedly, it is calculated
only once and each subsequent use is replaced by a temporary quantity. For example,
for
a*b+a*b
the raw generated code is:

      LD   a
      MUL  b
      ST   T1
      LD   a
      MUL  b
      ADD  T1

the optimized code is:

      LD   a
      MUL  b
      ST   T1
      ADD  T1

Note that here the sub-expression occurred within the same expression, but a good
structural optimizer is able to detect and eliminate sub-expressions used across
statements. For example,

c = a * b;                 this will be optimized to:   c = a * b;
.....                                                   .....
.....                                                   .....
d = a * b + a * b;                                      d = c << 1;

assuming that a, b and c do not change in between.
b) Register Allocation : [MD, S&T]
• Put temporaries which are used most frequently in registers;
• Save registers on a subroutine CALL and then restore them on return;
• “life-time” analysis is required;
c) Loop unwinding : [MI,T]
Replace a loop by its equivalent linear code. For example,
for (i=0; i<5; i++) a[i] = i;
can be replaced by :
a[0] = 0;
a[1] = 1; etc.
Here the compiler can precalculate the addresses of a[0], a[1], etc. and avoid
generation of array element address calculation code. The loop setting-up and
loop-control code is, of course, removed.
d) Replace index with pointers : [MD, S&T]
Here we replace code segments like:
a[i] by *(a + i)
b[i][j] by *(b + jmax*i + j)
which is much faster, because the compiler can take advantage of constant
propagation, folding, etc. If the indexing code is left as it is, a run-time subroutine
will be used to generate the array element address, which can be very time
consuming.
On machines where indexing by more than one index register is
provided, this method does not give much advantage.
e) Move loop-invariant code out of the loop : [MI, T]
For example,

for (i = 0; i < 100; ++i)
    array[i] = num/denom;

Here the RHS of the assignment within the loop is invariant with respect to the
loop, so it can be moved out of the loop. Thus,

t1 = num/denom;
for (i = 0; i < 100; ++i)
    array[i] = t1;

It should be noted that one must move all the relevant code (see Holub p. 676).

GNU C compiler:
The GNU C and C++ compilers available with the LINUX operating system have very
good facilities for controlling the type and level of optimization done by the compiler.
Out of the several levels of optimization, levels O1, O2 and “unwinding loops”
seem interesting. In order to study the optimization, you should compile test C programs
with the appropriate optimization switch and obtain the assembly language output by use
of the –S switch. Thus,
cc –S test1.c
The compiler will produce a file named test1.s containing the assembly language version of
your program. You should try out the various possible optimizations.

Some examples of optimization achieved by GNU C compiler:


1. The original C program was :
main() {
    int i, j, k;
    k = 5;
    i = (k + 5) * (k + 5);
    j = (k + 5);
    printf("%d %d %d", i, j, k);
}
The AL file generated when compiled without any optimization was :
{ command used was cc –S test1.c }
.file "test1.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "%d %d %d"
.text
.align 16
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $12, %esp
movl $5, -12(%ebp)
movl -12(%ebp), %eax
addl $5, %eax
movl -12(%ebp), %edx
addl $5, %edx
movl %eax, %ecx
imull %edx, %ecx
movl %ecx, -4(%ebp)
movl -12(%ebp), %ecx
addl $5, %ecx
movl %ecx, -8(%ebp)
movl -12(%ebp), %eax
pushl %eax
movl -8(%ebp), %eax
pushl %eax
movl -4(%ebp), %eax
pushl %eax
pushl $.LC0
call printf
addl $16, %esp
.L1:
movl %ebp, %esp
popl %ebp
ret
.Lfe1:
.size main, .Lfe1 - main
.ident "GCC: (GNU) 2.7.0"
When compiled with level O1 optimization { cc –O1 –S test1.c }
.file "test1.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "%d %d %d"
.text
.align 16
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
pushl $5
pushl $10
pushl $100
pushl $.LC0
call printf
movl %ebp, %esp
popl %ebp
ret
.Lfe1:
.size main, .Lfe1 - main
.ident "GCC: (GNU) 2.7.0"
When compiled with level O2 optimization { cc –O2 –S test1.c }
.file "test1.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "%d %d %d"
.text
.align 16
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
pushl $5
pushl $10
pushl $100
pushl $.LC0
call printf
movl %ebp, %esp
popl %ebp
ret
.Lfe1:
.size main, .Lfe1 - main
.ident "GCC: (GNU) 2.7.0"
2. The second example C program was:
main() {
int i, j, k;
for(i=0; i<4; i++){
j = 9;
k = i+4;
}
}
Without any optimization:
.file "test2.c"
.version "01.01"
gcc2_compiled.:
.text
.align 16
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $12, %esp
nop
movl $0, -4(%ebp)
.L2:
cmpl $3, -4(%ebp)
jle .L5
jmp .L3
.align 16
.L5:
movl $9, -8(%ebp)
movl -4(%ebp), %eax
addl $4, %eax
movl %eax, -12(%ebp)
.L4:
incl -4(%ebp)
jmp .L2
.align 16
.L3:
.L1:
movl %ebp, %esp
popl %ebp
ret
.Lfe1:
.size main, .Lfe1 - main
.ident "GCC: (GNU) 2.7.0"
