
1 The Compilation Process

• The compilation process combines both translation and optimisation of
high-level language code.
2 Machine Independent Compilation

• Simplifying arithmetic expressions is one example of a
machine-independent optimization. Not all compilers do such optimizations,
and compilers can vary widely regarding which combinations of
machine-independent optimizations they do perform.
• Consider the C statement x[i] = c*x[i];
• A simple code generator would generate the address for x[i] twice, once
for each appearance in the statement. The later optimization phases can
recognize this as an example of a common subexpression that need not be
duplicated. While in this simple case it would be possible to create a code
generator that never generated the redundant expression, taking into
account every such optimization at code generation time is very
difficult. We get better code and more reliable compilers by generating
simple code first and then optimizing it.
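The duplicated address computation can be made visible in C itself. The sketch below is only an illustration: the function names and the pointer p are assumptions introduced here, and a real compiler performs this rewrite on its intermediate code rather than on the source.

```c
#include <stddef.h>

/* Simple form: the address of x[i] is formed once for the load
   and again for the store. */
void scale_naive(int *x, size_t i, int c)
{
    *(x + i) = c * *(x + i);
}

/* After common subexpression elimination: the address is computed
   once and reused. The pointer p is introduced for illustration. */
void scale_cse(int *x, size_t i, int c)
{
    int *p = x + i;
    *p = c * *p;
}
```

Both functions leave the array in the same state; only the number of address computations differs.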
3 Statement Translation

• Compiling an arithmetic expression
• The arithmetic expression a*b + 5*(c - d) is written in terms of program
variables.
• In some machines we may be able to perform memory-to-memory
arithmetic directly on the locations corresponding to those variables.
However, in many machines, such as the ARM, we must first load the
variables into registers. This requires choosing which registers receive not
only the named variables but also intermediate results such as (c - d). The
code for the expression can be built by walking the data flow graph.
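One way to picture the walk over the data flow graph is to write the expression one operation at a time. In this sketch the temporaries t1 to t4 are hypothetical stand-ins for registers, and the order of the statements is one possible walk from the leaves to the root, not necessarily the order a particular compiler would choose.

```c
/* Evaluate a*b + 5*(c - d) one operation at a time.
   t1..t4 are hypothetical temporaries standing in for registers. */
int eval_expr(int a, int b, int c, int d)
{
    int t1 = a * b;     /* named variables a and b feed the multiply */
    int t2 = c - d;     /* intermediate result (c - d) */
    int t3 = 5 * t2;    /* constant times the intermediate result */
    int t4 = t1 + t3;   /* root of the graph: the final sum */
    return t4;
}
```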
4 Statement Translation
5 Code for the statement translation example
6 Procedures

• The ARM Procedure Call Standard (APCS) is a good illustration of a typical
procedure linkage mechanism. Although the stack frames are in main
memory, understanding how registers are used is key to understanding the
mechanism, as explained below.
 ■ r0-r3 are used to pass parameters into the procedure. r0 is also used to
hold the return value. If more than four parameters are required, the
additional parameters are put on the stack frame.
 ■ r4-r7 hold register variables.
 ■ r11 is the frame pointer and r13 is the stack pointer.
 ■ r10 holds the limiting address on stack size, which is used to check for
stack overflows. Other registers have additional uses in the protocol.
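A five-parameter C function shows where the four-register limit matters. The function name sum5 is hypothetical, and the comments only restate the register roles listed above; the code a given compiler emits depends on its version and options.

```c
/* Under the register usage described above, a, b, c and d would be
   passed in r0-r3, e would be passed on the stack frame, and the
   result would be returned in r0. */
int sum5(int a, int b, int c, int d, int e)
{
    return a + b + c + d + e;
}
```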
7 Data Structures

• A 2D array is stored in contiguous memory. Row-major order is used to
map the two indexes to a memory location.
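The row-major mapping can be written as a short calculation. This is a minimal sketch: the dimensions ROWS and COLS and both function names are assumptions chosen for illustration.

```c
#include <stddef.h>

#define ROWS 3   /* hypothetical dimensions for illustration */
#define COLS 4

/* Offset, in elements, of element (i, j) from the start of the array:
   skip i complete rows, then move j elements into the row. */
size_t row_major_offset(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Read element (i, j) of a ROWS x COLS array kept in flat storage. */
int get_element(const int *base, size_t i, size_t j)
{
    return base[row_major_offset(i, j)];
}
```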
8 Program Optimisation

• If we want to write programs in a high-level language, then we need to
understand how to optimize them without rewriting them in assembly
language.
• This first requires creating the proper source code that causes the compiler
to do what we want.
• Hopefully, the compiler can optimize our program by recognizing features
of the code and taking the proper action.
9 Expression Simplification

Consider the expression
a*b + a*c
We can use the distributive law to rewrite the expression as
a*(b + c)
Since the new expression has only two operations rather than three for the
original form, it is almost certainly cheaper, because it is both faster and
smaller.
We can also use the laws of arithmetic to further simplify expressions on
constants. Consider the following C statement:
for (i = 0; i < 8 + 1; i++)
We can simplify 8 + 1 to 9 at compile time—there is no need to perform that
arithmetic while the program is executing.
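Both simplifications can be checked with a few lines of C. The function names are hypothetical, and the equality of the two expression forms is assumed for integer values that do not overflow.

```c
/* Original form: two multiplies and one add. */
int expr_original(int a, int b, int c)
{
    return a * b + a * c;
}

/* After applying the distributive law: one add and one multiply. */
int expr_simplified(int a, int b, int c)
{
    return a * (b + c);
}

/* Count the iterations of the loop from the text. The bound 8 + 1 is a
   constant expression, so it can be evaluated at compile time. */
int count_iterations(void)
{
    int i, n = 0;
    for (i = 0; i < 8 + 1; i++)
        n++;
    return n;
}
```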
10 Dead Code Elimination

• Code that will never be executed can be safely removed from the
program. The general problem of identifying code that will never be
executed is difficult, but there are some important special cases where it
can be done.
• Programmers will intentionally introduce dead code in certain situations.
Consider this C code fragment:
#define DEBUG 0
...
if (DEBUG)
    print_debug_stuff();
• In the above case, the print_debug_stuff() function is never executed, but
the code allows the programmer to override the preprocessor variable
definition (perhaps with a compile-time flag) to enable the debugging
code.
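A fuller version of the fragment might look like the sketch below. The #ifndef guard, the debug_calls counter, and the compute function are assumptions added for illustration; the guard is one common way to let a compile-time definition such as -DDEBUG=1 replace the default.

```c
#include <stdio.h>

/* Default to 0 unless DEBUG was already defined on the command line. */
#ifndef DEBUG
#define DEBUG 0
#endif

static int debug_calls = 0;   /* counts how often the debug code ran */

static void print_debug_stuff(void)
{
    debug_calls++;
    puts("debug output");
}

/* With DEBUG equal to 0 the call below can never execute, so a
   compiler is free to remove it as dead code. */
int compute(int x)
{
    if (DEBUG)
        print_debug_stuff();
    return 2 * x;
}
```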
11 Procedure Inlining

• An inlined procedure does not have a separate procedure body and
procedure linkage; rather, the body of the procedure is substituted in place
for the procedure call.
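The effect of inlining can be shown by doing the substitution by hand. The function names here are hypothetical, and whether a compiler actually inlines a given call depends on the compiler and its settings.

```c
/* A small procedure that is a candidate for inlining. */
static int square(int x)
{
    return x * x;
}

/* Version that calls the procedure: each call needs procedure linkage. */
int sum_of_squares_call(int a, int b)
{
    return square(a) + square(b);
}

/* Version after inlining by hand: the body of square() is substituted
   at each call site, so no call is made. */
int sum_of_squares_inlined(int a, int b)
{
    return a * a + b * b;
}
```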
12 Loop Transformation

• Loops are important program structures. Although they are compactly
described in the source code, they often use a large fraction of the
computation time. Many techniques have been designed to optimize
loops.
• Loop unrolling – replicates the loop body to reduce loop overhead and
expose parallelism.
• Loop fusion – combines two or more loops into a single loop.
• Loop distribution – decomposes a single loop into multiple loops.
• Loop tiling – breaks up a loop into a set of nested loops. Helps control
the behaviour of the cache.
• Array padding – adds dummy data elements to an array. Helps control
cache behaviour.
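Two of the transformations listed above, unrolling and fusion, can be sketched by hand in C. The array length N, the unroll factor of 4, and all function names are assumptions for illustration; the fused version also assumes that the arrays x, y and b do not overlap.

```c
#define N 8   /* hypothetical array length, a multiple of 4 */

/* Original loop. */
void scale(int *a, const int *b, int c)
{
    int i;
    for (i = 0; i < N; i++)
        a[i] = b[i] * c;
}

/* Unrolled by 4: one loop test per four elements. */
void scale_unrolled(int *a, const int *b, int c)
{
    int i;
    for (i = 0; i < N; i += 4) {
        a[i]     = b[i]     * c;
        a[i + 1] = b[i + 1] * c;
        a[i + 2] = b[i + 2] * c;
        a[i + 3] = b[i + 3] * c;
    }
}

/* Two separate loops over the same index range. */
void two_loops(int *x, int *y, const int *b)
{
    int i;
    for (i = 0; i < N; i++)
        x[i] = b[i] + 1;
    for (i = 0; i < N; i++)
        y[i] = b[i] * 2;
}

/* The same work after loop fusion: one pass over the index range. */
void fused(int *x, int *y, const int *b)
{
    int i;
    for (i = 0; i < N; i++) {
        x[i] = b[i] + 1;
        y[i] = b[i] * 2;
    }
}
```

Loop distribution is the reverse of the last step: starting from fused and arriving at two_loops.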
13 Loop Tiling
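As a sketch of tiling, the matrix transpose below is written first as a plain doubly nested loop and then as a set of nested loops over small blocks. The matrix size DIM, the tile size TILE, and the function names are assumptions; a suitable tile size depends on the cache of the target machine.

```c
#define DIM 8    /* hypothetical matrix size */
#define TILE 4   /* hypothetical tile size; DIM is a multiple of TILE */

/* Untiled transpose: walks whole rows of src and whole columns of dst. */
void transpose(int dst[DIM][DIM], int src[DIM][DIM])
{
    int i, j;
    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++)
            dst[j][i] = src[i][j];
}

/* Tiled transpose: the two outer loops step over TILE x TILE blocks,
   the two inner loops work inside one block. */
void transpose_tiled(int dst[DIM][DIM], int src[DIM][DIM])
{
    int ii, jj, i, j;
    for (ii = 0; ii < DIM; ii += TILE)
        for (jj = 0; jj < DIM; jj += TILE)
            for (i = ii; i < ii + TILE; i++)
                for (j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}
```

Both versions produce the same result; the tiled one touches a smaller region of memory at a time.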
14 Register Allocation

• Given a block of code, we want to choose assignments of variables (both
declared and temporary) to registers to minimize the total number of
required registers.
• Consider the following C code:
w = a + b; /* statement 1 */
x = c + w; /* statement 2 */
y = c + d; /* statement 3 */

• A naive register allocation assigns a separate register to each of the
seven variables (a, b, c, d, w, x, y). This is not optimised.


15 Register Allocation – Lifetime graph

• The x axis is the instruction count.
• The y axis lists the variables.
• The dashes represent the lifetime of a variable.
• The maximum number of variables active at any statement is the number
of registers required.
• Hence the optimised answer is 3 registers.
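The register count can be reproduced with a short calculation over variable lifetimes. The intervals in the example table are an assumption: each variable is taken to be active from the first statement that uses or defines it to the last statement that uses it in the three-statement code. The struct and function names are hypothetical.

```c
/* Lifetime of one variable: first and last statement in which it is active. */
struct lifetime {
    char name;
    int first;
    int last;
};

/* Maximum number of variables active in any one statement 1..n_stmts. */
int max_live(const struct lifetime *v, int n_vars, int n_stmts)
{
    int s, k, best = 0;
    for (s = 1; s <= n_stmts; s++) {
        int live = 0;
        for (k = 0; k < n_vars; k++)
            if (v[k].first <= s && s <= v[k].last)
                live++;
        if (live > best)
            best = live;
    }
    return best;
}

/* Lifetimes assumed for the three-statement example. */
static const struct lifetime example[] = {
    {'a', 1, 1}, {'b', 1, 1}, {'c', 2, 3}, {'d', 3, 3},
    {'w', 1, 2}, {'x', 2, 2}, {'y', 3, 3}
};
```

Calling max_live(example, 7, 3) on this table returns 3, which agrees with the answer read off the graph.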
16 Register Allocation
17 Scheduling and Instruction Selection

• In order to improve register allocation, we can sometimes change the
order in which operations are executed.
• This reordering can be done by tracking resource utilisation and
maintaining a reservation table. This process is known as scheduling.
• Software pipelining is a technique for reordering instructions across several
loop iterations to reduce pipeline bubbles.
18 Scheduling and Instruction Selection

• Selecting the instructions to use to implement each operation is not trivial.
There may be several different instructions that can be used to accomplish
the same goal, but they may have different execution times.
• One useful technique for generating code is template matching.
• Dynamic programming can be used to efficiently find the lowest-cost
covering of trees, and heuristics can extend the technique to DAGs.
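As a small illustration of several operations reaching the same goal, the sketch below doubles an unsigned value in three ways. The function names are hypothetical, and no execution times are implied; which form is cheapest depends on the target machine.

```c
/* Three ways to double an unsigned value. Each could map to a
   different machine instruction: multiply, add, or shift. */
unsigned double_mul(unsigned x)
{
    return x * 2u;
}

unsigned double_add(unsigned x)
{
    return x + x;
}

unsigned double_shift(unsigned x)
{
    return x << 1;
}
```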
19 Scheduling and Instruction Selection
20 Interpreters and Just In Time Compilers

• Sometimes we require on-the-fly translation of code. Interpreters and
just-in-time (JIT) compilers are used in such cases.
• Forth, a language used in embedded systems, is an example of an
interpreted language.
• Similarly, just-in-time compilers are widely used for Java.
• Drawback – both add overhead in time and memory to execution.

Interpreter                       Just-in-time compiler
Does not produce explicit code    Produces explicit code
One statement at a time           A section of the program is translated
Consumes less memory              Consumes relatively more memory, but
                                  less than a normal compiler
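The interpreter column can be illustrated with a very small interpreter that executes a postfix program one token at a time and never produces machine code. This is a sketch under stated assumptions: the function name, the stack depth of 32, and the three supported operators are chosen for illustration, and the language is a toy in postfix style, not Forth itself.

```c
#include <ctype.h>
#include <stdlib.h>

/* Interpret a space-separated postfix program such as "2 3 + 4 *".
   Returns 0 on success and stores the final value in *result;
   returns -1 on a malformed program. */
int interpret(const char *src, long *result)
{
    long stack[32];
    int depth = 0;

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;                            /* skip separators */
        } else if (isdigit((unsigned char)*src)) {
            char *end;
            if (depth == 32)
                return -1;                    /* stack full */
            stack[depth++] = strtol(src, &end, 10);
            src = end;                        /* continue after the number */
        } else {
            long lhs, rhs;
            if (depth < 2)
                return -1;                    /* operator needs two operands */
            rhs = stack[--depth];
            lhs = stack[--depth];
            switch (*src++) {
            case '+': stack[depth++] = lhs + rhs; break;
            case '-': stack[depth++] = lhs - rhs; break;
            case '*': stack[depth++] = lhs * rhs; break;
            default:  return -1;              /* unknown operator */
            }
        }
    }
    if (depth != 1)
        return -1;                            /* exactly one value must remain */
    *result = stack[0];
    return 0;
}
```

Each token is decoded every time it is executed, which is where the time overhead of interpretation comes from.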
