
Tutorial on Timing Analysis and Optimization

Is your program always fast enough?

Dr. Christian Ferdinand, AbsInt Angewandte Informatik GmbH
Dr. Kai Richter, Symtavision GmbH

AbsInt Angewandte Informatik GmbH


Provides advanced development tools for embedded systems, and tools for validation, verification, and certification of safety-critical software
Founded in February 1998 by six researchers of Saarland University, Germany
Privately held by the founders

[Figure: staff growth graph, 1998 to 2008, vertical axis from 0 to 40]

Key Products

Hard Real-Time Systems


Controllers in planes, cars, and plants are expected to finish their tasks within reliable time bounds.
Schedulability analysis must be performed.
Hence, it is essential that an upper bound on the execution times of all tasks is known.
This bound is commonly called the Worst-Case Execution Time (WCET).

The Timing Problem

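[Figure: distribution of execution times. Horizontal axis: execution time; vertical axis: probability. Marked on the horizontal axis: best-case execution time, the range observed by execution time measurement (unsafe), exact worst-case execution time, safe worst-case execution time estimate.]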

The Ever-Growing Gap


x = a + b;

LOAD r2, _a
LOAD r1, _b
ADD  r3,r2,r1

Execution time (clock cycles):

68K (1990): best case 20, worst case 20
MPC 5xx (2000), execution time depending on flash memory: 0 wait cycles 4; 1 wait cycle 8; external (6,1,1,1,..) 30
PPC 755 (2001): best case 4, worst case 320

(Concrete) Instruction Execution


mul
Fetch: I-Cache miss?
Issue: Unit occupied?
Execute: Multicycle?
Retire: Pending instructions?

[Figure: example cycle counts per stage, depending on the answers to these questions]

Murphy's Law in Timing Analysis


A naive but safe guarantee accepts Murphy's Law: any accident that may happen will happen.
Consequence: hardware overkill is necessary to guarantee timeliness.
Example: an EADS study measured the performance of a PPC 603e with all the caches switched off.
This corresponds to the assumption that all memory accesses miss the cache.
Result: a slowdown by a factor of 30.

Fighting Murphy's Law


Static program analysis allows the derivation of invariants about all execution states at a program point.
Safety properties are derived from these invariants: certain timing accidents will never happen.
Example: at program point p, instruction fetch will never cause a cache miss.
The more accidents are excluded, the lower the upper bound.
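
The effect of excluding timing accidents can be illustrated with a small calculation. The sketch below is not aiT's algorithm: the latencies (1 cycle for a hit, 30 cycles for a miss) and the classification of the fetches are hypothetical values chosen only for illustration.

```python
HIT_CYCLES = 1    # assumed latency of a cache hit (hypothetical)
MISS_CYCLES = 30  # assumed latency of a cache miss (hypothetical)

def upper_bound(classifications):
    """Sum per-access bounds; only accesses proven 'always_hit' get the hit latency."""
    return sum(HIT_CYCLES if c == "always_hit" else MISS_CYCLES
               for c in classifications)

# Ten instruction fetches. Murphy's Law: nothing is known, so every fetch may miss.
naive = upper_bound(["unknown"] * 10)

# Suppose an analysis proves that eight of the ten fetches can never miss.
refined = upper_bound(["always_hit"] * 8 + ["unknown"] * 2)

print(naive, refined)
```

Both values are safe upper bounds under the stated assumptions; the second is lower only because more accidents were excluded.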


aiT WCET Analyzer


The solution to the timing problem
Global program analysis:
abstract interpretation for cache, pipeline, and value analysis
integer linear programming for path analysis

Everything combined in a single intuitive GUI


Structure of the aiT WCET Analyzer

Example: Direct Mapped I-Cache

CPU
Program Counter: 1028, 1032
Instruction: mul, ble 1024

I-Cache
1024: add
1028: mul
1032: ble 1024

Main memory
1024: add
1028: mul
1032: ble 1024
...

Cache Hit: ~ 1 Cycle
Cache Miss: ~ +1 to +100 Cycles
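
The behavior of a direct-mapped instruction cache on the small loop at addresses 1024, 1028, and 1032 can be sketched as follows. The cache geometry (4 lines of 4 bytes each) is an assumption made for this illustration; the slide does not state it.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_size=4):
    """Return 'hit' or 'miss' for each access to a direct-mapped cache."""
    lines = [None] * num_lines          # tag currently stored in each cache line
    outcome = []
    for addr in addresses:
        block = addr // line_size       # memory block containing the address
        index = block % num_lines       # the only line this block may occupy
        tag = block // num_lines        # identifies which block is in the line
        if lines[index] == tag:
            outcome.append("hit")
        else:
            outcome.append("miss")
            lines[index] = tag          # replace whatever was stored there
    return outcome

loop_body = [1024, 1028, 1032]          # add, mul, ble 1024
trace = simulate_direct_mapped(loop_body * 2)
print(trace)
```

With this geometry the first loop iteration misses three times and the second iteration hits three times. Two addresses that map to the same line, such as 1024 and 1040 here, would evict each other on every access.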


Cache Analysis
Example: Fully Associative Cache (2 Elements)

Must analysis: for each program point and calling context, find out which blocks are definitely in the cache
May analysis: for each program point and calling context, find out which blocks may be in the cache
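
A must analysis for a fully associative LRU cache with 2 elements can be sketched as below. This is a minimal illustration in the style of abstract-interpretation-based cache analysis, not necessarily how aiT implements it. The abstract state maps each block to an upper bound on its LRU age; the block names a, b, c are hypothetical.

```python
def must_update(state, block, assoc=2):
    """Abstract LRU update: `state` maps each block to an upper bound on its age."""
    old_age = state.get(block, assoc)   # blocks not in the state count as oldest
    new_state = {}
    for other, age in state.items():
        if other == block:
            continue
        if age < old_age:
            age += 1                    # blocks younger than the accessed one age by one
        if age < assoc:
            new_state[other] = age      # blocks reaching age `assoc` may have been evicted
    new_state[block] = 0                # the accessed block is now the youngest
    return new_state

def must_join(s1, s2):
    """Control-flow merge: keep blocks guaranteed on both paths, with the larger age."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

# Two branches that both end with an access to block b.
then_path = must_update(must_update({}, "a"), "b")
else_path = must_update(must_update({}, "c"), "b")
merged = must_join(then_path, else_path)
print(then_path, else_path, merged)
```

After the merge only b is guaranteed to be cached, so a following access to b can be classified as "always hit", while nothing can be guaranteed about a or c.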

Set Associative Cache


CPU
Address: address prefix | set number | byte in line

Each set holds lines 1 to A; each line stores: address prefix (tag), replacement information (Rep), data block
Set: fully associative subcache of A elements with LRU, FIFO, or random replacement strategy

Compare address prefix
If not equal, fetch block from main memory

Byte select & align

Data Out
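
The split of an address into address prefix, set number, and byte in line can be sketched as follows. The line size of 32 bytes and the 128 sets are hypothetical parameters, not values from the slides.

```python
def split_address(addr, line_size=32, num_sets=128):
    """Split an address into (address prefix, set number, byte in line)."""
    byte_in_line = addr % line_size
    set_number = (addr // line_size) % num_sets
    prefix = addr // (line_size * num_sets)
    return prefix, set_number, byte_in_line

print(split_address(0x12345))
print(split_address(0x12345 + 32 * 128))
```

The second address differs from the first by exactly `line_size * num_sets` bytes, so it falls into the same set with a different prefix; the two blocks compete for the A lines of that set.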


Pipelines
Inst 1   Fetch   Decode  Execute  Write back
Inst 2           Fetch   Decode   Execute     Write back
Inst 3                   Fetch    Decode      Execute     Write back
Inst 4                            Fetch       Decode      Execute     Write back

Ideal case: 1 instruction per cycle


Pipeline Analysis
Goal: calculate all possible pipeline states at a program point
Method: perform a cycle-wise evolution of the pipeline, determining all possible successor pipeline states
Implementation: from a formal model of the pipeline, its stages, and the communication between them
Generation: from a PAG specification
Result: WCET for basic blocks


Pipeline Model
[Figures: MPC555 block diagram; RCPU block diagram; aiT's internal pipeline model; aiT visualization]


Visualization of Pipeline Analysis Results


Path Analysis: Example


if a then
  b
elseif c then
  d
else
  e
endif
f

Execution times of the blocks: a = 4t, b = 10t, c = 3t, d = 2t, e = 6t, f = 5t

(simplified constraints)

max: 4 x_a + 10 x_b + 3 x_c + 2 x_d + 6 x_e + 5 x_f
where
x_a = x_b + x_c
x_c = x_d + x_e
x_f = x_b + x_d + x_e
x_a = 1

Value of objective function: 19

x_a  x_b  x_c  x_d  x_e  x_f
1    1    0    0    0    1
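
The path analysis example is small enough to check by enumeration. The sketch below replaces the integer linear programming solver with a brute-force search over all 0/1 execution counts, which only works for tiny examples like this one; block times and constraints are those of the example.

```python
from itertools import product

# Execution time of each block in units of t.
TIMES = {"a": 4, "b": 10, "c": 3, "d": 2, "e": 6, "f": 5}

def feasible(x):
    """Structural flow constraints of the if/elseif/else example."""
    return (x["a"] == 1
            and x["a"] == x["b"] + x["c"]
            and x["c"] == x["d"] + x["e"]
            and x["f"] == x["b"] + x["d"] + x["e"])

def worst_case_path():
    """Maximize the weighted sum of execution counts over all feasible assignments."""
    best = None
    for values in product((0, 1), repeat=len(TIMES)):
        x = dict(zip(TIMES, values))
        if not feasible(x):
            continue
        cost = sum(TIMES[k] * x[k] for k in TIMES)
        if best is None or cost > best[0]:
            best = (cost, x)
    return best

bound, counts = worst_case_path()
print(bound, counts)
```

The search confirms the objective value 19 for the path a, b, f; the other two feasible paths cost 18 (a, c, e, f) and 14 (a, c, d, f).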


A Hybrid Approach:
Combining block measurements with static analysis

Measurements of execution times of blocks (emulator, logic analyzer, Nexus, ETM, ...)

Avoids the high costs of micro-architecture modeling.
Requires measuring all local worst-case behaviors.
Regrettably, this is nearly impossible, so the approach is generally not safe!
Nevertheless, it can be quite useful for optimizations by hand.


Some Architectural Features that make Measurement-Based WCET Analysis a Challenge


Fine-grain timing measurement is not always possible:
Instrumentation changes timing behavior
Debug interfaces are rarely available in real embedded applications

The empty cache is not necessarily the worst-case cache
Domino effects


Domino Effect
Timing anomaly
The execution time increase is not bounded by hardware-determined constants.
Certain instruction sequences, e.g. in loop bodies, can trigger this effect and increase latencies in further iterations.


Pseudo-LRU Replacement (e.g., PPC G3)


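Each setting of B[0..2] points to a specific line:

[Figure: binary decision tree. Bit B0 at the top with branches labeled 1 and 0, bits B1 and B2 on the second level, cache lines L0, L1, L2, L3 as leaves.]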


4-way PLRU Domino Effect


Sequence: c, d, f, c, d, h

Each column shows the cache lines and the three PLRU bits after the access named in the header ("." = invalid line).

Empty cache
      start  c  d  f  c  d  h  c  d  f  c  d  h
L0:     .    c  c  c  c  c  c  c  c  c  c  c  c
L1:     .    .  d  d  d  d  d  d  d  d  d  d  d
L2:     .    .  .  f  f  f  f  f  f  f  f  f  f
L3:     .    .  .  .  .  .  h  h  h  h  h  h  h
bits:   0    1  0  0  1  0  0  1  0  0  1  0  0
        0    1  1  0  1  1  0  1  1  0  1  1  0
        0    0  0  1  1  1  0  0  0  1  1  1  0

Non-empty cache
      start  c  d  f  c  d  h  c  d  f  c  d  h
L0:     f    c  c  c  c  c  c  c  c  c  c  c  c
L1:     e    e  e  f  f  f  h  h  h  f  f  f  h
L2:     a    a  d  d  d  d  d  d  d  d  d  d  d
L3:     b    b  b  b  b  b  b  b  b  b  b  b  b
bits:   0    1  1  0  1  1  0  1  1  0  1  1  0
        0    1  0  1  1  0  1  1  0  1  1  0  1
        0    0  1  1  1  1  1  1  1  1  1  1  1

This sequence is then repeated ad infinitum.
Empty cache: only cache hits.
Non-empty cache: two misses each time.
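
The domino effect can be reproduced with a small simulation of one 4-way set with tree-based pseudo-LRU. This is a sketch under assumptions: the bit layout (one root bit, one bit per pair of lines, bits pointing away from the most recently used line) and the rule that invalid lines are filled first are choices made to be consistent with the trace, and the class and function names are hypothetical. The initial contents f, e, a, b and the all-zero initial bits of the non-empty cache are taken from the trace.

```python
class PlruSet:
    """One 4-way cache set with tree-based pseudo-LRU replacement."""

    def __init__(self, lines=(None, None, None, None), bits=(0, 0, 0)):
        self.lines = list(lines)   # contents of L0..L3, None = invalid line
        self.bits = list(bits)     # [root, pair L0/L1, pair L2/L3] (assumed layout)

    def _victim(self):
        if None in self.lines:                 # fill invalid lines first
            return self.lines.index(None)
        if self.bits[0] == 0:                  # root points to the pair L0/L1
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3   # root points to the pair L2/L3

    def _touch(self, way):
        # Set the bits on the path so that they point away from the used line.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - way      # way 0 -> 1, way 1 -> 0
        else:
            self.bits[0] = 0
            self.bits[2] = 3 - way      # way 2 -> 1, way 3 -> 0

    def access(self, block):
        """Access `block`; return True on a hit, False on a miss."""
        hit = block in self.lines
        way = self.lines.index(block) if hit else self._victim()
        self.lines[way] = block
        self._touch(way)
        return hit

def misses_per_iteration(cache, sequence, iterations):
    """Count the misses of each repetition of `sequence`."""
    return [sum(not cache.access(b) for b in sequence) for _ in range(iterations)]

SEQUENCE = ["c", "d", "f", "c", "d", "h"]
empty = misses_per_iteration(PlruSet(), SEQUENCE, 4)
non_empty = misses_per_iteration(PlruSet(lines=("f", "e", "a", "b")), SEQUENCE, 4)
print(empty, non_empty)
```

After the warm-up iteration the empty cache produces no further misses, while the non-empty cache keeps missing twice per iteration because f and h evict each other in line L1 and block b is never replaced. The difference therefore grows with the number of iterations and is not bounded by a constant.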


aiT WCET Analysis Input/Output


Application Code
void Task (void)
{
  variable++;
  function();
  next++;
  if (next)
    do_this();
  terminate();
}

Specifications (*.ais)
clock 10200 kHz ;
loop "_codebook" + 1 loop exactly 16 end ;
recursion "_fac" max 6;
SNIPPET "printf" IS NOT ANALYZED AND TAKES MAX 333 CYCLES;
flow "U_MOD" + 0xAC bytes / "U_MOD" + 0xC4 bytes is max 4;
area from 0x20 to 0x497 is read-only;

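[Figure: tool flow. The application code is translated by compiler and linker into the executable (*.elf / *.out). aiT takes the executable, the entry point, and the specifications (*.ais) as input and produces the worst-case execution time, visualization, and documentation.]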


Hardware-Settings

Hardware settings have to be specified in aiT according to the target processor configuration in the start-up code.


Challenge: Reconstruction of CFG


Indirect Jumps
Case/switch statements as compiled by the C compiler are recognized automatically.
For hand-written assembly code, annotations might be necessary:

INSTRUCTION ProgramPoint BRANCHES TO Target1, ..., Targetn


Indirect Calls
Can often be recognized automatically if a static array of function pointers is used.
For other cases:

INSTRUCTION ProgramPoint CALLS Target1, ..., Targetn


Loops
aiT includes a loop bound analysis based on interval analysis and pattern matching that is able to recognize the iteration count of many simple FOR loops automatically.
Other loops need to be annotated.
Example: loop "_prime" + 1 loop end max 10;


Source Level Annotations


bool divides (uint n, uint m)
{
  /* ai: SNIPPET HERE NOT ANALYZED, TAKES MAX 173 CYCLES; */
  return (m % n == 0);
}

bool prime (uint n)
{
  uint i;
  if (even (n))
    /* ai: SNIPPET HERE INFEASIBLE; */
    return (n == 2);
  for (i = 3; i * i <= n; i += 2)
  {
    /* ai: LOOP HERE MAX 20; */
    if (divides (i, n))
      return 0;
  }
  return (n > 1);
}
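
A loop annotation is only as good as the assumption behind it. The sketch below replays the loop header `for (i = 3; i * i <= n; i += 2)` to find the range of n for which a bound of 20 iterations holds. The slides do not state the value range of n, so the result is a derived observation rather than a fact about the original application; the helper names are hypothetical.

```python
def loop_iterations(n):
    """Worst-case number of body executions of `for (i = 3; i * i <= n; i += 2)`."""
    count = 0
    i = 3
    while i * i <= n:
        count += 1      # worst case: the body never returns early
        i += 2
    return count

def largest_n_within(bound):
    """Largest n for which the loop body runs at most `bound` times."""
    n = 0
    while loop_iterations(n + 1) <= bound:
        n += 1
    return n

print(loop_iterations(1848), loop_iterations(1849), largest_n_within(20))
```

Under these assumptions the annotation `LOOP HERE MAX 20` is valid as long as n stays below 43 * 43 = 1849; for larger n the body can run a 21st time.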


aiT: Timing Details


Recent Advances
[Figure: cache-miss penalties and WCET overestimation]

Source: studies by Lim et al. (1995), Thesing et al. (2002), and Souyris et al. (2005)


Master's Thesis of Daniel Sehlberg


Mälardalen University, Sweden, ASTEC-Project, August 2005

Real-time tasks under Rubus OS on C16x taken from Volvo CE application


WCET Challenge 2006


Organized by the University of Mälardalen
http://www.idt.mdh.se/personal/jgn/challenge/
Aim: compare different approaches in analyzing the Worst-Case Execution Time
Excerpts from the final report:
"aiT is able to handle every kind of benchmark and every test program that was tested in the Challenge. aiT is able to support WCET analysis even for complex processors. aiT demonstrates its leading position through all its features [...]"

Full report: http://dc.informatik.uni-essen.de/Tan/all/


SCADE / aiT automated Flow


Analysis Reports
Customizable HTML reports
Global and detailed reports
Diff feature


Integration with ETAS/ASCET


aiT/StackAnalyzer is started from the ASCET main menu.
ASCET generates the annotation files, and the analyses are performed in the background.


Practical Experiments, Execution Time

Engine throttle control module specified in ASCET, Tasking compiler v7.5, STM ST10F269 microcontroller board.
Run-times were extracted from bus traces (ISYSTEMS ILA 128 logic analyzer).
The worst-case path information provided by aiT was used to manually construct a corresponding input.


Practical Experiments: Stack Usage

ST10/C16x uses two stacks. Most generated functions neither use local variables nor call subroutines, i.e. the stack usage is zero.


Integration with Scheduling Analysis


System level: SymTA/S
System model (tasks, activation, scheduling)
Scheduling analysis (WCRT), system stack analysis

Code level: aiT/StackAnalyzer
WCET/stack analysis (single task)

Exchange between the two levels:
WCET/stack request with additional info (SymTA/S to aiT/StackAnalyzer)
WCET/stack response (aiT/StackAnalyzer to SymTA/S)
Refinement
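
How per-task WCET bounds feed into a system-level worst-case response time (WCRT) can be illustrated with the classical response-time iteration for fixed-priority preemptive scheduling. This is a textbook technique used here as a stand-in; it is not claimed to be the analysis performed by SymTA/S, and the task set is hypothetical.

```python
from math import ceil

def response_time(task_index, tasks):
    """Worst-case response time of tasks[task_index] under fixed priorities.

    `tasks` is a list of (wcet, period) pairs sorted from highest to lowest
    priority, with deadline equal to period. Returns None if the response
    time exceeds the period.
    """
    wcet, period = tasks[task_index]
    higher = tasks[:task_index]
    r = wcet
    while True:
        # Preemption by every higher-priority activation within the window r.
        interference = sum(ceil(r / t) * c for c, t in higher)
        r_next = wcet + interference
        if r_next > period:
            return None
        if r_next == r:
            return r
        r = r_next

TASKS = [(1, 4), (2, 6), (3, 12)]   # hypothetical (WCET, period) values
wcrt = [response_time(i, TASKS) for i in range(len(TASKS))]
print(wcrt)
```

The result depends directly on the WCET values: raising the WCET of the lowest-priority task from 3 to 6 makes it miss its deadline in this example, which is why safe and tight code-level bounds matter at the system level.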


Future Work
Extraction of timing (pipeline) models from HW description (VHDL)
Use of source-level program analyses
Tighter integration with measurement-based approaches
Early-phase worst-case execution time estimation


aiT WCET Analyzer Advantages


Inspect the worst-case timing behavior of (critical parts of) your code
Tight WCET bounds reflect the actual worst-case performance of your system
Determined automatically
Valid for all inputs and all execution scenarios
No modification of your code or tool chain required


aiT Visualization Features


Precise insight into the program and processor behavior
Valuable feedback in optimizing your program


Conclusion
aiT enables development of complex hard real-time systems on state-of-the-art hardware
Increases safety
Saves development time and costs
Usability proven in industrial practice

Contact

Coffee break. We start again at 11h.

Visit us!

Hall 10, booth 403
