
3/11/2015

Program design and analysis

Program-level performance analysis.
Optimizing for:
    Execution time.
    Energy/power.
    Program size.
Program validation and testing.

Program-level performance analysis

Need to understand performance in detail:
    Real-time behavior, not just typical.
    On complex platforms.
Program performance is not the same as CPU performance:
    Pipeline, cache are windows into program.
    We must analyze the entire program.

Complexities of program performance

Varies with input data:
    Different-length paths.
Cache effects.
Instruction-level performance variations:
    Pipeline interlocks.
    Fetch times.

How to measure program performance

Simulate execution of the CPU.
    Makes CPU state visible.
Measure on real CPU using timer.
    Requires modifying the program to control the timer.
Measure on real CPU using logic analyzer.
    Requires events visible on the pins.
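
Timer-based measurement can be sketched roughly as follows. This is only an illustration: the standard C clock() call stands in for a hardware timer (an embedded target would more likely read a timer register), and the names work and measure_ticks are hypothetical.

```c
#include <time.h>

/* Hypothetical workload: a small dot product standing in for the code under test. */
static int work(const int *c, const int *x, int n)
{
    int f = 0;
    for (int i = 0; i < n; i++)
        f += c[i] * x[i];
    return f;
}

/* Read the timer before and after the call. The program has to be
   modified to do this, which is the cost noted on the slide. */
static clock_t measure_ticks(const int *c, const int *x, int n, int *result)
{
    clock_t start = clock();   /* assumption: clock() stands in for the timer */
    *result = work(c, x, n);
    clock_t stop = clock();
    return stop - start;
}
```

A single reading covers one input and one path only; worst-case and best-case figures need inputs chosen to drive those paths.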


Program performance metrics

Average-case execution time.
    Typically used in application programming.
Worst-case execution time.
    A component in deadline satisfaction.
Best-case execution time.
    Task-level interactions can cause best-case program behavior to result in worst-case system behavior.

Elements of program performance

Basic program execution time formula:
    execution time = program path + instruction timing
Solving these problems independently helps simplify analysis.
Easier to separate on simpler CPUs.
Accurate performance analysis requires:
    Assembly/binary code.
    Execution platform.

Data-dependent paths in an if statement

    if (a || b) {          /* T1 */
        if ( c )           /* T2 */
            x = r*s+t;     /* A1 */
        else y=r+s;        /* A2 */
        z = r+s+u;         /* A3 */
    }
    else {
        if ( c )           /* T3 */
            y = r-t;       /* A4 */
    }

    a b c   path
    0 0 0   T1=F, T3=F: no assignments
    0 0 1   T1=F, T3=T: A4
    0 1 0   T1=T, T2=F: A2, A3
    0 1 1   T1=T, T2=T: A1, A3
    1 0 0   T1=T, T2=F: A2, A3
    1 0 1   T1=T, T2=T: A1, A3
    1 1 0   T1=T, T2=F: A2, A3
    1 1 1   T1=T, T2=T: A1, A3

Paths in a loop

    for (i=0, f=0; i<N; i++)
        f = f + c[i] * x[i];

[Flowchart: i=0; f=0, then test i<N. On Y, execute f = f + c[i] * x[i] and i=i+1, then repeat the test. On N, exit the loop.]
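
The path table for the if statement can be checked mechanically. The sketch below is an illustration, not part of the original slides: it replaces each assignment A1..A4 with a flag so the path taken for a given (a, b, c) can be observed. The names path_taken and count_distinct_paths are hypothetical.

```c
/* Bit flags naming the assignments A1..A4 from the if statement. */
enum { A1 = 1, A2 = 2, A3 = 4, A4 = 8 };

/* Return the set of assignments executed for one input combination. */
static unsigned path_taken(int a, int b, int c)
{
    unsigned taken = 0;
    if (a || b) {            /* T1 */
        if (c)               /* T2 */
            taken |= A1;
        else
            taken |= A2;
        taken |= A3;
    } else {
        if (c)               /* T3 */
            taken |= A4;
    }
    return taken;
}

/* Count how many distinct paths the eight input combinations exercise. */
static int count_distinct_paths(void)
{
    int seen[16] = {0};
    int distinct = 0;
    for (int v = 0; v < 8; v++) {
        unsigned p = path_taken((v >> 2) & 1, (v >> 1) & 1, v & 1);
        if (!seen[p]) {
            seen[p] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

The eight input combinations collapse onto four distinct paths, so execution time analysis only has to consider those four.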


Instruction Timing

Not all instructions take the same amount of time.
    Multi-cycle instructions.
    Fetches.
Execution times of instructions are not independent.
    Pipeline interlocks.
    Cache effects.
Execution times may vary with operand value.
    Floating-point operations.
    Some multi-cycle integer operations.

Measurement-driven Performance Analysis

Not so easy as it sounds:
    Must actually have access to the CPU.
    Must know data inputs that give worst/best case performance.
    Must make state visible.
Still an important method for performance analysis.

Trace-driven Measurement

Trace-driven:
    Instrument the program.
    Save information about the path.
Requires modifying the program.
Trace files are large.
Widely used for cache analysis.

Physical Measurement

In-circuit emulator allows tracing.
    Affects execution timing.
Logic analyzer can measure behavior at pins.
    Address bus can be analyzed to look for events.
    Code can be modified to make events visible.
Particularly important for real-world input streams.


Performance Optimization Motivation

Embedded systems must often meet deadlines.
    Faster may not be fast enough.
Need to be able to analyze execution time.
    Worst-case, not typical.
Need techniques for reliably improving execution time.

Programs and Performance Analysis

Best results come from analyzing optimized instructions, not high-level language code:
    Non-obvious translations of HLL statements into instructions;
    Code may move;
    Cache effects are hard to predict.

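Loop Optimizations

Loops are good targets for optimization.
Basic loop optimizations:
    Code motion;
    Induction-variable elimination;
    Strength reduction (x*2 -> x<<1).

Code Motion

    for (i=0; i<N*M; i++)
        z[i] = a[i] + b[i];

[Flowcharts: before code motion, the loop sets i=0 and tests i<N*M on every iteration. After code motion, i=0; X = N*M is computed once before the loop and the test becomes i<X. In both, the Y branch executes z[i] = a[i] + b[i]; i = i+1; and the N branch exits.]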
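
The code motion on the slide can be written out in C roughly as below. This is a sketch with hypothetical function names; an optimizing compiler may well perform the same transformation on its own.

```c
/* Before: the loop-invariant product n*m appears in the loop test. */
static void add_arrays_naive(int *z, const int *a, const int *b, int n, int m)
{
    for (int i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the product is computed once, outside the loop. */
static void add_arrays_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    int x = n * m;
    for (int i = 0; i < x; i++)
        z[i] = a[i] + b[i];
}
```

Both versions compute the same result; only the number of multiplications performed by the loop test differs.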


Induction Variable Elimination

Induction variable: loop index.
Consider loop:
    for (i=0; i<N; i++)
        for (j=0; j<M; j++)
            z[i][j] = b[i][j];
Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end of loop body.

Cache Analysis

Loop nest: set of loops, one inside other.
Perfect loop nest: no conditionals in nest.
Because loops use large quantities of array data, cache conflicts are common.
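
Induction variable elimination for the copy loop might look like the sketch below. It assumes both arrays are stored row-major in flat buffers of n*m elements; the function names and the variable zbinduct are hypothetical.

```c
/* Before: the flattened index i*m + j is recomputed for every element. */
static void copy_indexed(int *z, const int *b, int n, int m)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            z[i * m + j] = b[i * m + j];
}

/* After: one induction variable is shared by both arrays and
   incremented at the end of the loop body. */
static void copy_shared_index(int *z, const int *b, int n, int m)
{
    int zbinduct = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++) {
            z[zbinduct] = b[zbinduct];
            zbinduct++;
        }
}
```

The second version replaces two multiplications per element with a single increment.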

Array Conflicts in Cache

[Figure: main memory holds a[0,0] at address 1024 and b[0,0] at address 4099; both addresses are shown mapping into the cache.]

Array conflicts, contd.

Array elements conflict because they are in the same line, even if not mapped to same location.
Solutions:
    move one array;
    pad array.
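
The conflict between addresses 1024 and 4099 can be reproduced with a little arithmetic. The cache geometry below (direct-mapped, 256 lines of 4 bytes, byte addresses) is an assumption chosen for illustration and is not stated on the slides; the function names are hypothetical.

```c
/* Assumed cache geometry, for illustration only. */
enum { LINE_BYTES = 4, NUM_LINES = 256 };

/* Index of the cache line an address maps to. */
static unsigned cache_line(unsigned addr)
{
    return (addr / LINE_BYTES) % NUM_LINES;
}

/* Position of the address within its line. */
static unsigned line_offset(unsigned addr)
{
    return addr % LINE_BYTES;
}

/* Two addresses conflict when they map to the same line
   but come from different memory blocks. */
static int conflicts(unsigned a, unsigned b)
{
    return cache_line(a) == cache_line(b) && a / LINE_BYTES != b / LINE_BYTES;
}
```

Under these assumptions 1024 and 4099 both map to line 0 at different offsets (0 and 3), so they evict each other. Padding the second array by one line moves it to address 4103, which maps to line 1 and removes the conflict.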


Performance Optimization Hints

Use registers efficiently.
Use page mode memory accesses.
Analyze cache behavior:
    Instruction conflicts can be handled by rewriting code, rescheduling;
    Conflicting scalar data can easily be moved;
    Conflicting array data can be moved, padded.

Energy/power Optimization

Energy: ability to do work.
    Most important in battery-powered systems.
Power: energy per unit time.
    Important even in wall-plug systems: power becomes heat.

Measuring Energy Consumption

Execute a small loop, measure current:
    while (TRUE)
        a();

[Figure: current I measured while the loop runs.]

Sources of Energy Consumption

Relative energy per operation (Catthoor et al):
    Memory transfer: 33
    External I/O: 10
    SRAM write: 9
    SRAM read: 4.4
    Multiply: 3.6
    Add: 1
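
The relative costs in the table can be turned into a rough first-order estimate by weighting operation counts. The sketch below is only an illustration: the op_counts structure and the function name are hypothetical, and the result is in the table's relative units (Add = 1), not joules.

```c
/* Operation counts for some piece of code (hypothetical structure). */
typedef struct {
    unsigned long memory_transfers;
    unsigned long external_io;
    unsigned long sram_writes;
    unsigned long sram_reads;
    unsigned long multiplies;
    unsigned long adds;
} op_counts;

/* Weighted sum using the relative energy per operation from the table. */
static double relative_energy(const op_counts *n)
{
    return 33.0 * n->memory_transfers
         + 10.0 * n->external_io
         +  9.0 * n->sram_writes
         +  4.4 * n->sram_reads
         +  3.6 * n->multiplies
         +  1.0 * n->adds;
}
```

For example, one multiply-accumulate step that reads two operands from SRAM costs about 2*4.4 + 3.6 + 1 = 13.4 units, most of it in the memory reads.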


Cache Behavior is Important

Energy consumption has a sweet spot as cache size changes:
    Cache too small: program thrashes, burning energy on external memory accesses;
    Cache too large: cache itself burns too much power.

Cache Sweet Spot

[Figure from [Li98], 1998 IEEE.]

Optimizing for Energy

First-order optimization:
    High performance = low energy.
Not many instructions trade speed for energy.

Optimizing for Energy, contd.

Use registers efficiently.
Identify and eliminate cache conflicts.
Moderate loop unrolling eliminates some loop overhead instructions.
Eliminate pipeline stalls.
Inlining procedures may help: reduces linkage, but may increase cache thrashing.
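
Moderate loop unrolling can be sketched as below. The unroll factor of 4 and the function names are assumptions for illustration; the right factor depends on the target's code size and cache behavior.

```c
/* Rolled loop: one loop test and one increment per element. */
static int dot_rolled(const int *c, const int *x, int n)
{
    int f = 0;
    for (int i = 0; i < n; i++)
        f += c[i] * x[i];
    return f;
}

/* Unrolled by 4: one loop test and one increment per four elements,
   followed by a cleanup loop for the remaining 0 to 3 elements. */
static int dot_unrolled4(const int *c, const int *x, int n)
{
    int f = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        f += c[i]     * x[i];
        f += c[i + 1] * x[i + 1];
        f += c[i + 2] * x[i + 2];
        f += c[i + 3] * x[i + 3];
    }
    for (; i < n; i++)
        f += c[i] * x[i];
    return f;
}
```

The unrolled body is larger, which is why the slide says "moderate": past some point the extra code starts to cost more in instruction cache misses than it saves in loop overhead.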


Efficient Loops

General rules:
    Don't use function calls.
    Keep loop body small to enable local repeat (only forward branches).
    Use unsigned integer for loop counter.
    Use <= to test loop counter.
    Make use of compiler: global optimization, software pipelining.

Single-instruction Repeat Loop Example

    STM #4000h,AR2      ; load pointer to source
    STM #100h,AR3       ; load pointer to destination
    RPT #(1024-1)
    MVDD *AR2+,*AR3+    ; move

Optimizing for Program Size

Goal:
    Reduce hardware cost of memory;
    Reduce power consumption of memory units.
Two opportunities:
    Data;
    Instructions.

Data Size Minimization

Reuse constants, variables, data buffers in different parts of code.
    Requires careful verification of correctness.
Generate data using instructions.


Reducing Code Size

Avoid function inlining.
Choose CPU with compact instructions.
Use specialized instructions where possible.

Program Validation and Testing

But does it work?
Concentrate here on functional verification.
Major testing strategies:
    Black box doesn't look at the source code.
    Clear box (white box) does look at the source code.

Clear-box Testing

Examine the source code to determine whether it works:
    Can you actually exercise a path?
    Do you get the value you expect along a path?
Testing procedure:
    Controllability: provide program with inputs.
    Execute.
    Observability: examine outputs.

Controlling and Observing Programs

    firout = 0.0;
    for (j=curr, k=0; j<N; j++, k++)
        firout += buff[j] * c[k];
    for (j=0; j<curr; j++, k++)
        firout += buff[j] * c[k];
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;

Controllability:
    Must fill circular buffer with desired N values.
    Other code governs how we access the buffer.
Observability:
    Want to examine firout before limit testing.
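
One way to get both controllability and observability for the FIR fragment is to wrap it in a function that takes the buffer as an argument and reports the value before limiting. The sketch below assumes that approach; the function name and the pre_limit parameter are hypothetical test hooks, not part of the original code.

```c
/* Circular-buffer FIR filter with output limiting.
   buff, c, n and curr are passed in so a test can fill the buffer
   (controllability); pre_limit, if non-NULL, receives firout before
   the limit tests (observability). */
static double fir_limited(const double *buff, const double *c,
                          int n, int curr, double *pre_limit)
{
    double firout = 0.0;
    int j, k;

    for (j = curr, k = 0; j < n; j++, k++)
        firout += buff[j] * c[k];
    for (j = 0; j < curr; j++, k++)     /* k carries over from the first loop */
        firout += buff[j] * c[k];

    if (pre_limit)
        *pre_limit = firout;

    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

With buff = {50, 60, 70}, all coefficients 1.0 and curr = 1, the pre-limit value is 180 while the returned value is clamped to 100, so a test can tell a wrong sum from a correct sum that was limited.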


Execution Paths and Testing

Paths are important in functional testing as well as performance analysis.
In general, an exponential number of paths through the program.
    Show that some paths dominate others.
    Heuristically limit paths.

Choosing the Paths to Test

Possible criteria:
    Execute every statement at least once.
    Execute every branch direction at least once.
Equivalent for structured programs.
Not true for gotos.

[Figure: control flow graph with one part marked "not covered".]

Basis Paths

Approximate CDFG with undirected graph.
Undirected graphs have basis paths:
    All paths are linear combinations of basis paths.

Cyclomatic Complexity

Cyclomatic complexity is a bound on the size of basis sets:
    e = # edges
    n = # nodes
    p = number of graph components
    M = e - n + 2p.
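
The formula is easy to apply by hand or in code. The sketch below is a direct transcription with a hypothetical function name.

```c
/* Cyclomatic complexity M = e - n + 2p for a control flow graph with
   e edges, n nodes and p connected components. */
static int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}
```

For a single if-then-else, the graph has four nodes (test, then, else, join) and four edges, so M = 4 - 4 + 2 = 2, matching the two paths through the statement.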


Branch Testing

Heuristic for testing branches.
    Exercise true and false branches of conditional.
    Exercise every simple condition at least once.

Branch Testing Example

Correct:
    if (a || (b >= c)) { printf("OK\n"); }
Incorrect:
    if (a && (b >= c)) { printf("OK\n"); }
Test:
    a = F
    (b >= c) = T
Example:
    Correct: [0 || (3 >= 2)] = T
    Incorrect: [0 && (3 >= 2)] = F
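
The branch testing example can be checked by evaluating both conditions on the same test vector. The sketch below uses hypothetical function names and treats a test as exposing the bug when the two versions disagree.

```c
#include <stdbool.h>

/* The intended condition. */
static bool correct_cond(bool a, int b, int c)
{
    return a || (b >= c);
}

/* The buggy condition: && written in place of ||. */
static bool incorrect_cond(bool a, int b, int c)
{
    return a && (b >= c);
}

/* A test vector exposes the bug only if the two conditions differ on it. */
static bool test_exposes_bug(bool a, int b, int c)
{
    return correct_cond(a, b, c) != incorrect_cond(a, b, c);
}
```

The vector a = false, b = 3, c = 2 exposes the bug. A vector with a = true and b >= c does not, because both conditions evaluate to true, which is why each simple condition has to be exercised separately.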

Another Branch Testing Example

Correct:
    if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }
Incorrect:
    if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); }
Incorrect code changes pointer.
Assignment returns new LHS in C.
Test that catches error:
    (x != good_pointer) && (x->field1 = 3)

Domain Testing

Heuristic test for linear inequalities.
Test on each side + boundary of inequality.


Def-use Pairs

Variable def-use:
    Def when value is assigned (defined).
    Use when used on right-hand side.
Exercise each def-use pair.
    Requires testing correct path.

Loop Testing

Loops need specialized tests to be tested efficiently.
Heuristic testing strategy:
    Skip loop entirely.
    One loop iteration.
    Two loop iterations.
    # iterations much below max.
    n-1, n, n+1 iterations where n is max.

Black-box Testing

Complements clear-box testing.
    May require a large number of tests.
Tests software in different ways.

Black-box Test Vectors

Random tests.
    May weight distribution based on software specification.
Regression tests.
    Tests of previous versions, bugs, etc.
    May be clear-box tests of previous versions.


How much testing is enough?

Exhaustive testing is impractical.
One important measure of test quality: bugs escaping into field.
Good organizations can test software to give very low field bug report rates.
Error injection measures test quality:
    Add known bugs.
    Run your tests.
    Determine % injected bugs that are caught.
