
Measurement tools and techniques

Fundamental strategies
Interval timers
Program profiling
Tracing
Indirect measurement

Copyright 2004 David J. Lilja

Events

Most measurement tools are based on events

An event is some predefined change to system state
  Memory reference
  Disk access
  Change in a register's state
  Network message
  Processor interrupt

Definition depends on the metric being measured


Event Classification

Count metrics

The number of times event X occurs
  Number of cache misses
  Number of I/O operations

Event Classification

Secondary-event metrics

Record a value when triggered by some event
  Record block size for each I/O operation
  Count number of operations
  Find average I/O transfer size
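The average I/O transfer size described above can be sketched in Python. This is only an illustration of a secondary-event metric: the class name `TransferSizeMetric`, its methods, and the block sizes are hypothetical and do not come from the slides.

```python
class TransferSizeMetric:
    """Secondary-event metric: record a value each time an I/O event fires."""

    def __init__(self):
        self.operations = 0
        self.total_bytes = 0

    def on_io(self, block_size):
        # Triggered by the primary event (an I/O operation).
        self.operations += 1
        self.total_bytes += block_size

    def average_transfer_size(self):
        if self.operations == 0:
            return 0.0
        return self.total_bytes / self.operations


metric = TransferSizeMetric()
for block_size in (512, 4096, 1024):  # made-up block sizes in bytes
    metric.on_io(block_size)

print(metric.operations, metric.average_transfer_size())
```

The count alone would be a count metric; keeping the running total as well is what turns it into a secondary-event metric.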

Event Classification

Profiles

Characterization of overall behavior
  Aggregate/big picture view of an application program
  Time spent in each function

Event-Driven Strategies

Record necessary information only when the selected event occurs
Modify the system to record the event
  E.g. a simple counter in the page-fault routine
Dump data when the program terminates
  May need intermediate dumps also
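The page-fault counter mentioned above might look like the following Python sketch. It is not the instrumentation of any real operating system: `EventCounter`, `page_fault_handler`, and the sample addresses are hypothetical names introduced here for illustration.

```python
from collections import Counter


class EventCounter:
    """Event-driven measurement: work is done only when an event fires."""

    def __init__(self):
        self.counts = Counter()

    def record(self, event_name):
        # One increment per event, so the cost scales with event frequency.
        self.counts[event_name] += 1

    def dump(self):
        # Called at program termination, or earlier for intermediate dumps.
        return dict(self.counts)


counter = EventCounter()


def page_fault_handler(address):
    # Hypothetical stand-in for a page-fault routine; the only added
    # instrumentation is the counter increment.
    counter.record("page_fault")


for faulting_address in (0x1000, 0x2000, 0x1000):
    page_fault_handler(faulting_address)

print(counter.dump())
```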

Event-Driven Strategies

System overhead
  Incurred only when the event of interest actually occurs
  Infrequent events cause little perturbation
  Frequent events cause high perturbation
Perturbation changes the system being measured
  No longer typical behavior?

Event-Driven Strategies

Inter-event time is unpredictable
  Depends on when events actually occur
  Makes it hard to estimate perturbation
  How long to measure?
Event-driven measurement tools
  Good for low-frequency events

Event-Driven Strategies

[Figure: timeline of eight events; each event adds 1 to the counter.]

Counts 8 events exactly

Tracing

Similar to event-driven, but records additional system state
  "Event has occurred" count
  Additional information to uniquely identify the event
  E.g. addresses that cause page faults
Overhead
  Additional memory or disk storage
  Time to save state
Relatively large system perturbation

Tracing

[Figure: the same eight events; each adds 1 to the counter and also records an address.]

Counts 8 events plus extra data

Sampling

Record necessary state at fixed time intervals
Overhead
  Independent of specific event frequency
  Depends on sampling frequency
Misses some events
Produces statistical summary
  May miss infrequent events
  Each replication will produce different results

Sampling

[Figure: five samples taken at fixed intervals; three of them coincide with an event and add 1 to the counter.]

Counts 3 events out of 5 samples
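A small simulation can make the sampling behavior concrete. The sketch below assumes events are represented as half-open time intervals and that a sample "sees" an event only if one is in progress at the sample instant; the function name and the event list are hypothetical.

```python
def sample_busy_state(busy_intervals, period, duration):
    """Return (hits, samples) from sampling at t = 0, period, 2*period, ..."""
    hits = 0
    samples = 0
    for t in range(0, duration, period):
        samples += 1
        # A sample observes an event only if one is in progress at time t.
        if any(start <= t < end for start, end in busy_intervals):
            hits += 1
    return hits, samples


# Four events occur, but the one in [7, 8) falls between sample points.
events = [(2, 4), (6, 7), (7, 8), (11, 14)]
hits, samples = sample_busy_state(events, period=3, duration=15)
print(hits, samples)
```

The cost here depends only on `duration / period`, not on how many events occur, which is the property that distinguishes sampling from event counting.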

Comparisons
              Event count    Tracing          Sampling
Resolution    Exact count    Detailed info    Statistical summary
Overhead      Low            High             Constant
Perturbation  ~ #events      High             Fixed

Comparison

Event counting
  Best for low-frequency events
  Required if exact counts are needed
Sampling
  Best for high-frequency events
  If statistical summary is adequate
Tracing
  When additional detail is required

Indirect Measurements

Used when the desired metric is not directly accessible
Measure one thing directly
  Derive or deduce the desired metric
Highly dependent on creativity of the performance analyst

Interval Timers

Fundamental tool of performance measurement
Measure execution time of any portion of a program
Provide time basis for sampling

Interval Timers

Actually count clock pulses between two events

[Figure: a clock with period Tc drives a counter; the counter is read as x1 at Event 1 and as x2 at Event 2.]

$T_e = (x_2 - x_1)\,T_c$

Using an Interval Timer

Within an application program

    start_count = read_timer();
    /* portion of program to be measured */
    stop_count = read_timer();
    elapsed_time = (stop_count - start_count) * clock_period;
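The start/stop pattern above can be tried directly in Python. The sketch assumes the standard-library `time.perf_counter_ns()` as the timer; the wrapper name `time_call` is hypothetical, and the actual resolution of the counter depends on the platform.

```python
import time


def time_call(func, *args):
    """Return (result, elapsed_seconds) for one call of func(*args)."""
    start_count = time.perf_counter_ns()   # read timer before the region
    result = func(*args)                   # portion of program to be measured
    stop_count = time.perf_counter_ns()    # read timer after the region
    clock_period = 1e-9                    # perf_counter_ns counts nanoseconds
    elapsed_time = (stop_count - start_count) * clock_period
    return result, elapsed_time


result, elapsed = time_call(sum, range(100_000))
print(result, elapsed)
```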

Hardware Timer
[Figure: a clock with period Tc drives an n-bit counter whose value is read through a CPU input port.]

$T_e = (x_2 - x_1)\,T_c$

Software Timer
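[Figure: a clock with period Tc feeds a prescaler (divide-by-n) whose output drives the CPU interrupt input; each interrupt increments a software counter.]

$T_e = (x_2 - x_1)\,T_c$, where $T_c$ here is the period of the prescaled interrupt, since the software counter advances once per interrupt.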


Quantization Error

Timer resolution causes quantization error
  $n T_c < T_e < (n+1) T_c$
  The measured $T_e$ is a whole number of clock ticks, so it can be off by up to one tick
Repeated measurements
  Completely unpredictable rounding
Want $T_c$ to be as small as possible

Timer Rollover

n-bit counter
  count = $[0, 2^n - 1]$
Rollover = transition from $2^n - 1$ to 0
If rollover occurs between start/stop events
  Then count = $(x_2 - x_1) < 0$
Check for count < 0
  Measure again, or
  Add $2^n$ to count
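The "add 2^n to count" correction can be sketched as below. The function name `elapsed_ticks` is hypothetical, and the correction is only valid under the assumption that the counter rolled over at most once between the two reads.

```python
def elapsed_ticks(x1, x2, n_bits):
    """Ticks between two reads of an n-bit counter, allowing one rollover."""
    count = x2 - x1
    if count < 0:
        # The counter wrapped from 2**n - 1 to 0 between the two reads.
        count += 1 << n_bits
    return count


print(elapsed_ticks(10, 25, 16))      # no rollover
print(elapsed_ticks(65530, 4, 16))    # one rollover
```

If the measured interval can exceed the rollover period, this check cannot detect the extra wraps, so a wider counter or a coarser resolution is needed.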


Timer Rollover
Time until rollover, by counter width n

Resolution (Tc)   n = 16     n = 32     n = 64
10 ns             655 us     43 s       58.5 centuries
1 us              65.5 ms    1.2 h      5,850 centuries
100 us            6.55 s     5 days     585,000 centuries
1 ms              1.1 min    50 days    5,850,000 centuries
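The rollover times follow from multiplying the number of counter states by the tick period. A short Python sketch, with a hypothetical helper name, reproduces the arithmetic for the widths and resolutions listed:

```python
def time_to_rollover(n_bits, resolution_seconds):
    """Seconds until an n-bit counter with the given tick period wraps."""
    return (2 ** n_bits) * resolution_seconds


for resolution in (10e-9, 1e-6, 100e-6, 1e-3):
    row = [time_to_rollover(n, resolution) for n in (16, 32, 64)]
    print(resolution, row)
```

Dividing a result by 3600 gives hours, and by roughly 3.16e9 gives centuries.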

Timer Overhead
    start_count = read_timer();
    /* portion of program to be measured */
    stop_count = read_timer();
    elapsed_time = (stop_count - start_count) * clock_period;

To access the timer
  Min of 1 memory read
  Min of 1 memory write
  Subroutine call
Once at start, again at stop

Timer Overhead

Timeline of one measurement
  Initiate read_timer()
    T1
  Current time actually read
    T2
  Event being measured begins
    T3
  Event ends; initiate read_timer()
    T4
  Current time actually read

Timer Overhead

T1 = time to read counter value
T2 = time to store counter value
T3 = time of the event we are measuring
T4 = time to read counter value

$T_4 = T_1$

Timer Overhead

$T_e$ = event time = $T_3$
But actually measured
  $T_m = T_2 + T_3 + T_4$
So
  $T_e = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$
Timer overhead = $T_{ovhd} = T_1 + T_2$
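One common way to approximate the timer overhead is to read the timer twice in a row with nothing in between. The sketch below does this with `time.perf_counter_ns()`; taking the minimum over many trials is an assumption made here to filter out scheduling noise, and both function names are hypothetical.

```python
import time


def estimate_timer_overhead_ns(trials=1000):
    """Smallest gap seen between two back-to-back timer reads, in ns."""
    best = None
    for _ in range(trials):
        first = time.perf_counter_ns()
        second = time.perf_counter_ns()
        gap = second - first
        if best is None or gap < best:
            best = gap
    return best


def corrected_event_time(measured_ns, overhead_ns):
    # Te = Tm - Tovhd
    return measured_ns - overhead_ns


overhead_ns = estimate_timer_overhead_ns()
print(overhead_ns, corrected_event_time(1500, 200))
```

On a coarse clock the estimate may come out as 0, which says more about the timer resolution than about the true overhead.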

Timer Overhead

If $T_e \gg T_{ovhd}$
  Ignore the timer overhead
If $T_e \approx T_{ovhd}$
  Measurements will be highly suspect
  Potentially large variations in $T_{ovhd}$
Good rule of thumb
  $T_e$ should be 100-1000x greater than $T_{ovhd}$

Approximate Measures of Short Intervals

How to measure an event that is shorter than the resolution of the clock?
Cannot directly measure events with $T_e < T_c$
Overhead makes it hard to measure even when $T_e > n T_c$, where n is a small integer

Approximate Measures of Short Intervals


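[Figure: an event of duration Te, shorter than the clock period Tc, shown against the clock ticks.]

Case 1: a clock tick falls within the event, Count+1
Case 2: no clock tick falls within the event, Count+0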

Approximate Measures of Short Intervals

Bernoulli experiment
  Outcome = +1 with probability p
  Outcome = +0 with probability (1-p)
  Equivalent to flipping a biased coin
Repeat n times
  Approximates a binomial distribution
  Only approximate since each measurement cannot be guaranteed to be independent
  Usually close enough in practice

Approximate Measures of Short Intervals

m = number of times Case 1 occurs (Count+1)
n = total number of measurements
Average duration is the fraction m/n of the clock period

$T_e = \frac{m}{n}\,T_c$

Use confidence interval for proportions

Example

Clock resolution = 10 us
n = 8764 measurements
m = 467 clock ticks counted
  Case 1: 467
  Case 2: 8297
95% confidence interval for the event duration?

Example
$(c_1, c_2) = \frac{467}{8764} \mp 1.96 \sqrt{\frac{\frac{467}{8764}\left(1 - \frac{467}{8764}\right)}{8764}} = (0.0486, 0.0580)$

Scale by clock period = 10 us
95% chance that the measured event duration lies in (0.49, 0.58) us
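The interval in this example can be checked numerically. The sketch below uses the normal-approximation confidence interval for a proportion, with z = 1.96 for 95% confidence; the function name `proportion_interval` is a hypothetical helper, not something defined in the slides.

```python
import math


def proportion_interval(m, n, z):
    """Normal-approximation confidence interval for the proportion m/n."""
    p = m / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


clock_period_us = 10
c1, c2 = proportion_interval(m=467, n=8764, z=1.96)   # z for 95% confidence
print(round(c1, 4), round(c2, 4))
print(round(c1 * clock_period_us, 2), round(c2 * clock_period_us, 2))
```

Multiplying the proportion bounds by the clock period converts them into bounds on the event duration.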

Profiling

Overall view of a program's execution-time behavior
Fraction of total time spent in specific states
  Fraction of time in each subroutine
  Fraction of time in OS kernel
  Fraction of time doing I/O
Find bottlenecks, code hot-spots
  Optimize those sections first

Statistical Sampling

Select a random subset of a population
Gather information on only this subset
Extrapolate this information to overall population
Results are a statistical summary with corresponding error probabilities

PC Sampling

[Figure: program timeline interrupted at fixed intervals; each interrupt adds 1 to a counter.]

Periodically interrupt program at fixed intervals
Record appropriate state information in interrupt service routine
Post-process to obtain overall profile

PC Sampling

At each interrupt
  Examine PC on return address stack
  Use address map to translate this PC to subroutine i
  Increment array element H[i]

Example address map
  0-1298: Subr 1
  1299-3455: Subr 2
  3456-5567: Subr 3
  5568-9943: Subr 4

PC: 4582
Histogram counters: H[3] = H[3] + 1
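The address-map lookup and histogram update can be sketched as follows. The address ranges are the ones in the example; the binary search via the standard `bisect` module and the names `subroutine_for` and `on_interrupt` are choices made for this illustration, and the interrupt itself is simulated by a plain function call.

```python
import bisect

# Address map from the example: (first address, last address, subroutine).
ADDRESS_MAP = [(0, 1298, 1), (1299, 3455, 2), (3456, 5567, 3), (5568, 9943, 4)]
START_ADDRESSES = [first for first, _, _ in ADDRESS_MAP]

histogram = {subroutine: 0 for _, _, subroutine in ADDRESS_MAP}


def subroutine_for(pc):
    """Translate a sampled PC to its subroutine number, or None if unmapped."""
    index = bisect.bisect_right(START_ADDRESSES, pc) - 1
    if index < 0:
        return None
    first, last, subroutine = ADDRESS_MAP[index]
    return subroutine if pc <= last else None


def on_interrupt(pc):
    # Work done in the (simulated) interrupt service routine.
    subroutine = subroutine_for(pc)
    if subroutine is not None:
        histogram[subroutine] += 1


on_interrupt(4582)
print(histogram)
```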

PC Sampling
[Figure: histogram of the sample counts in H[1] through H[12]; vertical axis from 0 to 140.]

PC Sampling

n total interrupts
Post-processing step
  H[i]/n = fraction of time executing in subroutine i
  H[i] * (interrupt period) = (H[i]/n) * (total run time) = time in subroutine i

PC Sampling

This is a statistical process
  Different counts each time the experiment is performed
Infer behavior of entire program from small sample
Apply confidence intervals to quantify precision of results

Example

40 us interrupt
36,128 interrupts in subroutine A
Program runs for 10 seconds
Time in this subroutine?
  90% confidence interval

m = 36,128
n = 10 sec / 40 us = 250,000
p = m/n = 0.144512

Example
$(c_1, c_2) = 0.144512 \mp 1.645 \sqrt{\frac{0.144512\,(0.855488)}{250000}} = (0.143, 0.146)$

90% chance that the program spent 14.3-14.6% of its time in subroutine A
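The numbers in this example can be reproduced with a few lines of Python. The sketch assumes the same normal-approximation interval for a proportion, with z = 1.645 for 90% confidence; all variable names are hypothetical. Evaluated this way, the bounds come out at roughly 0.1434 and 0.1457.

```python
import math

interrupt_period_us = 40
run_time_us = 10_000_000            # 10 seconds
m = 36_128                          # samples that landed in subroutine A
n = run_time_us // interrupt_period_us

p = m / n
half_width = 1.645 * math.sqrt(p * (1 - p) / n)   # z for 90% confidence
c1, c2 = p - half_width, p + half_width

# Each sample stands for one interrupt period of execution time.
time_in_a_seconds = m * interrupt_period_us / 1e6
print(n, round(p, 6), round(c1, 4), round(c2, 4), time_in_a_seconds)
```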

Example

10 ms interrupt
12 interrupts in subroutine A
8 seconds total execution time
Time in this subroutine?
  99% confidence interval

n = 8 sec / 10 ms = 800 samples
p = m/n = 12/800 = 0.015

Example
$(c_1, c_2) = 0.015 \mp 2.576 \sqrt{\frac{0.015\,(1 - 0.015)}{800}} = (0.0039, 0.0261)$

99% chance that the program spent 31-210 ms in subroutine A
A pretty wide range! But only <3% of total execution time
Start optimizing somewhere else first

Reducing the Interval Size


Use a lower confidence level
Obtain more samples
  Run program longer
    May not be possible
  Increase sample rate
    May be fixed by system
    Will increase overhead and perturbation
  Run multiple times and add samples from each run

PC Sampling

[Figure: program timeline with periodic interrupts, each adding 1 to a counter.]

Interrupts must occur asynchronously w.r.t. any program events
  Samples must be independent of each other
  Else over/under-sample events synchronous with interrupt
Periodic versus random sampling

Basic Block Counting

Basic block
  Sequence of instructions with no branches into or out of the block
  When first instruction is executed, guaranteed that all instructions in block will be executed
  Single entry, single exit

Basic Block Counting

Generate a program profile by inserting additional instructions in each block
  Increment a unique counter each time a block is entered
Produces a histogram of program execution
Can post-process to find instruction execution frequencies
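Basic block counting can be imitated at source level in Python. In the sketch below the block labels B1 to B4, the loop itself, and the per-block instruction counts in `BLOCK_SIZES` are all hypothetical; a real tool would insert the counters into the compiled code rather than by hand.

```python
from collections import Counter

block_counts = Counter()

# Hypothetical static sizes (instructions per block), for post-processing only.
BLOCK_SIZES = {"B1": 2, "B2": 3, "B3": 3, "B4": 2}


def instrumented_loop(iterations):
    a = b = 0
    for i in range(iterations):
        block_counts["B1"] += 1          # block B1: evaluate the condition
        if i > 5:
            block_counts["B2"] += 1      # block B2: then-branch
            a = a + i
        else:
            block_counts["B3"] += 1      # block B3: else-branch
            b = b + i
        block_counts["B4"] += 1          # block B4: code after the branch
    return a, b


instrumented_loop(10)
# Post-process: executions of a block times its size gives instruction counts.
instructions_executed = {name: block_counts[name] * BLOCK_SIZES[name]
                         for name in BLOCK_SIZES}
print(dict(block_counts), instructions_executed)
```

Unlike PC sampling, repeating this run gives exactly the same counts every time, at the price of executing the extra increments on every block entry.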

Comparison
               PC sampling                  Basic block counting
Output         Statistical estimate         Exact count
Overhead       Interrupt service routine    Extra instructions per block
Perturbation   Randomly distributed         High
Repeatability  Within statistical variance  Perfect

Event Tracing

Profile shows overall frequency-of-execution behavior
  Ignores time-ordering of events
Program trace
  Dynamic list of events generated by program
  Events = anything you want to instrument
    Sequence of memory addresses
    I/O blocks accessed
Typically used to drive a simulator

Trace Generation
[Figure: the application program is modified to generate a trace. The trace is either compressed, stored, and later uncompressed for the trace consumer, or passed directly to the trace consumer (online trace consumption).]

Trace Generation

Source-code modification
  Allows precise control of what events are traced and what data is recorded
  Typically a manual process

[Figure: tool chain from source code through compiler and object code to the processor, which produces the trace.]

Trace Generation

Software exceptions
  HW forces an exception before each instruction
  Exception routine decodes instruction
    Store instr type, PC, operand addresses, etc.
  Trace bit in many processors
  Tremendous slowdown

[Figure: tool chain from source code through compiler and object code to the processor, which produces the trace.]

Trace Generation

Emulation
  Make a system appear to be something else
  Modify emulator to generate trace
  E.g. Java Virtual Machine

[Figure: tool chain from source code through compiler and object code to the processor, which produces the trace.]

Trace Generation

Microcode modification
  Modify instruction execution directly
  Allows tracing of all instructions
    Including operating system
  Depends on access to lower levels of the processor
  E.g. Transmeta Crusoe processor

[Figure: tool chain from source code through compiler and object code to the processor, which produces the trace.]


Trace Generation

Compiler modification
  Insert trace code directly in object file
  Requires access to the compiler itself
  Or write a post-compilation binary editor/rewrite tool

[Figure: tool chain from source code through compiler and object code to the processor, which produces the trace.]

Trace Data

Tracing generates a tremendous volume of data
  Trace 100,000,000 instrs/sec
  16 bits of data per event
  190 Mbytes of data per second
  11 Gbytes per minute
Huge perturbations
  Due to tracing code
  Time to store trace data
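The data-volume figures follow from simple arithmetic, sketched below. The variable names are hypothetical, and the sketch assumes binary units (1 Mbyte = 2^20 bytes, 1 Gbyte = 2^30 bytes), which is the reading that matches the quoted 190 Mbytes per second.

```python
events_per_second = 100_000_000
bytes_per_event = 16 // 8                      # 16 bits per event

bytes_per_second = events_per_second * bytes_per_event
mbytes_per_second = bytes_per_second / 2**20   # taking 1 Mbyte = 2**20 bytes
gbytes_per_minute = bytes_per_second * 60 / 2**30

print(mbytes_per_second, gbytes_per_minute)
```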

Trace Data Compression


[Figure: the application program is modified to generate a trace, which is compressed, stored, and later uncompressed for the trace consumer.]

Standard compression algorithms as trace is written to disk
Uncompress when reading
Typical reduction
  20-70%
Tradeoff is compress/uncompress time

Online Trace Consumption


[Figure: the application program is modified to generate a trace, which is passed directly to the trace consumer.]

Use trace data as it is generated
  Never stored on disk
Multitasking may lead to non-deterministic behavior
  Repeatability issue
Before-and-after comparison tests
  Difference due to change in system or change in trace?
  Becomes statistical comparison with n runs

Abstract Execution

Use higher-level information to intelligently compress trace info
Two-step process
  Compiler-style analysis to find critical subset of trace
    Store only control flow information sufficient to reconstruct trace later
  Produce trace-regeneration code for subsequent use of trace

Abstract Execution
    1. if (i > 5)
    2.    then a = a + i;
    3.    else b = b + i;
    4. i = i + 1;

[Figure: control flow graph in which block 1 branches to block 2 or block 3, and both continue to block 4.]

Trace will be either
  1-2-4
  1-3-4
Store only 2 or 3
Combine with compiler-generated control flow graph to regenerate trace
Slowdown = 2-10x
Compress = 10-100x
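The idea of storing only the branch outcome and regenerating the rest can be sketched in Python. Both function names are hypothetical, and the control flow graph is hard-coded here as "1, then 2 or 3, then 4" rather than produced by a compiler.

```python
def record_decisions(i_values):
    """Run the fragment and keep only which branch block (2 or 3) executed."""
    decisions = []
    a = b = 0
    for i in i_values:
        if i > 5:            # block 1
            a = a + i        # block 2
            decisions.append(2)
        else:
            b = b + i        # block 3
            decisions.append(3)
        i = i + 1            # block 4
    return decisions


def regenerate_trace(decisions):
    """Rebuild the full block trace from the stored decisions and the CFG."""
    trace = []
    for branch_block in decisions:
        trace.extend([1, branch_block, 4])   # CFG: 1 -> (2 or 3) -> 4
    return trace


stored = record_decisions([3, 7])
print(stored, regenerate_trace(stored))
```

Only one entry per execution of the fragment is stored, while the regenerated trace has three, which is where the compression comes from.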

Trace Sampling

Save only subsequences of overall trace
Drive simulator with samples
Results should be statistically similar to driving with complete trace
One sample = k consecutive events
Sampling interval = P (period)

[Figure: trace timeline with a sample of k consecutive events taken once every period P.]
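Trace sampling with k consecutive events per period P can be sketched in one function. The name `sample_trace` is hypothetical, and the sketch assumes the period is counted in events and that each sample starts at the beginning of its period.

```python
def sample_trace(trace, k, period):
    """Keep the first k events of every block of `period` consecutive events."""
    if not 0 < k <= period:
        raise ValueError("need 0 < k <= period")
    return [event for index, event in enumerate(trace) if index % period < k]


full_trace = list(range(10))          # stand-in for a real event trace
print(sample_trace(full_trace, k=2, period=5))
```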

SimPoint

Find representative program samples
  Match basic block execution frequencies
  Clustering tool to automate process
Perform detailed timing simulation on only these samples
Fast-forward (functional simulation) between samples
[Sherwood et al., ASPLOS, 2002]

SimPoint

Weight each sample's result by execution frequency to produce the overall result
Relatively small number (10s) of SimPoints produced 3% error in IPC on SPEC

SMARTS

Uses systematic sampling
  Fixed sample interval
Apply statistical sampling techniques to determine j, k, P

[Figure: timeline alternating functional simulation and detailed simulation, with segments labeled j and k repeating every period P.]

Indirect Ad Hoc Techniques

Sometimes the desired metric cannot be measured directly
Use your creativity to measure one thing and then derive/infer the desired value

Example: System Load

What is system load?
  Number of jobs in run queue?
  Number of jobs actively time-sharing?
  Fraction of time processor is not in idle loop?
  Others?
How to measure it?
  Modify OS
  PC sampling
  Indirect?

Example

[Figure: over a fixed time T, the monitor process runs alone, then alongside App 1, then alongside App 1 and App 2; its counter reaches n, n/2, and n/3 respectively.]

Let system run for fixed time T
Note value of the monitor's counter
Compare value of loaded system monitor counter to unloaded system count value
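The comparison of loaded and unloaded counts can be turned into a load estimate, sketched below. The function name and the count of 900,000 are hypothetical, and the estimate assumes that all runnable jobs, including the monitor, receive equal shares of the processor.

```python
def estimate_competing_jobs(unloaded_count, loaded_count):
    """Infer how many other jobs shared the CPU with the monitor."""
    if loaded_count <= 0:
        raise ValueError("loaded_count must be positive")
    sharing_processes = unloaded_count / loaded_count   # n / (n/3) = 3
    return sharing_processes - 1                        # exclude the monitor


n = 900_000   # hypothetical monitor count on an otherwise idle system
print(estimate_competing_jobs(n, n // 2), estimate_competing_jobs(n, n // 3))
```

If the scheduler favors some jobs over others, the ratio no longer maps cleanly to a job count, so the result is best read as an equivalent load.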

Perturbation

To obtain more information (higher resolution)
  Use more instrumentation points
More instrumentation points
  Greater perturbation

Perturbation

Computer performance measurement uncertainty principle
  Accuracy is inversely proportional to resolution.

[Figure: accuracy (vertical axis) falls from high to low as resolution (horizontal axis) increases.]

Perturbation

Superposition does not work here
  Non-linear
  Non-additive
Double instrumentation does not mean double impact on performance
  Some instrumentation cancels out
  Some multiplies impact
No way to predict!

Instrumentation Code

Changes memory access patterns
  Affects memory banking optimizations
  More frequent cache flushes and replacements
  But may reduce set associativity conflicts
Generates additional load/store instructions
Generates more I/O operations
Will increase overall execution time
  More time-sharing context switches
  Alters virtual memory paging behavior

Important Points

Event types
  Simple counts of primary event
  Secondary events triggered by some primary event
  Overall profiles

Important Points

Measurement strategies
  Event-driven
  Tracing
  Sampling
  Indirect approaches

Important Points

Interval timers
  Stopwatch functionality
  Rollover problem
  Overhead
  Quantization errors
Statistical measures of short intervals

Important Points

Profiling
  PC sampling
    Statistical view
  Basic block counting
    Exact behavior
    High overhead and perturbation

Important Points

Trace generation
  Source code modification
  Force exceptions
  Emulation
  Microcode modification
  Compiler modification
  Object code editor
Online trace consumption
Trace sampling

Important Points

Indirect measurements when all else fails
  System load example
Perturbations
  Nobody likes them
  Have to learn to live with them
