Computer Performance
Lecture 03 (22 Feb 2016)
Five-Stage Pipeline
Parallelism (1)
Instruction-level parallelism
Pipelining (see also the previous slides)
Pipelining allows a trade-off between latency (how long it takes to
execute an instruction) and processor bandwidth (how many
instructions/sec the CPU can complete)
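The trade-off above can be sketched numerically. This is a minimal illustration, assuming an idealized pipeline with equal-length stages, one instruction issued per cycle, and no stalls; the function names and the example numbers are hypothetical and do not come from the slides. Under these assumptions, an n-stage pipeline with a cycle time of T ns has a latency of n × T ns and a peak bandwidth of 1000/T MIPS.

```python
def pipeline_latency_ns(stages: int, cycle_ns: float) -> float:
    """Time for one instruction to pass through every stage (idealized)."""
    return stages * cycle_ns


def pipeline_bandwidth_mips(cycle_ns: float) -> float:
    """Peak completion rate in millions of instructions per second,
    assuming one instruction completes every clock cycle."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    # Hypothetical five-stage pipeline with a 2 ns clock cycle.
    print(pipeline_latency_ns(5, 2.0))   # 10.0 (ns)
    print(pipeline_bandwidth_mips(2.0))  # 500.0 (MIPS)
```

Splitting the same work into more, shorter stages shortens the cycle time and so raises bandwidth, while the latency of each individual instruction does not improve.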
Superscalar architectures
A dual pipeline, or a single pipeline with multiple functional units
The two instructions must neither conflict over resource usage
(e.g., registers) nor depend on the result of the other; this is either
guaranteed by the compiler or detected and eliminated during
execution by extra hardware
Most of the functional units in stage 4 take appreciably longer
than one clock cycle to execute, certainly the ones that access
memory or do floating-point arithmetic
Arsitektur & Organisasi Komputer
Superscalar Architectures
Parallelism (2)
Processor-level parallelism
Multicore processors
Fabricating multiple processing units on a single chip, e.g., dual-core,
quad-core, hex-core, etc.
SIMD Processor
Processing steps per cycle:
The scheduler selects two threads
to execute on the processor
The next instruction from each
thread then executes on up to 16
SIMD cores
Parallelism (3)
Processor-level parallelism (contd)
Multiprocessors (SMP / symmetric multiprocessing)
Computer systems that contain many processors, each possibly
containing multiple cores
Used for either executing a number of different application tasks
concurrently or executing subtasks of a single large task in parallel
All processors usually have access to all of the memory (shared-memory multiprocessor)
Multiprocessor
A single-bus multiprocessor
Tianhe-2
World's Fastest Computer
(as of November 2015)
source: http://www.top500.org
Specifications listed (values not preserved in the extracted text): processors, total cores, memory, interconnect, Linpack performance, power, OS, MPI
Parallelism (4)
Performance Assessment
Performance is a key parameter in evaluating a
computer system, along with cost, size, security,
reliability, and power consumption
Raw speed is far less important than how a processor
performs when executing a given application
Some measures of computer performance:
Clock speed
Instruction execution rate
Benchmarks
Amdahl's Law
Little's Law
Clock Speed
Clock speed or clock rate is measured in cycles/second or
Hertz (Hz)
Clock signals are typically generated by a quartz crystal, which
generates a constant signal wave while power is applied; the
wave is in turn converted into a digital voltage pulse stream
Since the execution of an instruction involves a number of
steps (such as fetching the instruction from memory,
decoding the instruction, loading and storing data, and
performing arithmetic and logical operations), most instructions
require multiple clock cycles to complete
A straight comparison of clock speeds on different processors
does not tell the whole story about performance (e.g., when
pipelining is used)
$$CPI = \frac{\sum_{i=1}^{n} (CPI_i \times I_i)}{I_c}$$
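As a rough sketch of how the average CPI is computed from an instruction mix: the code below weights the cycles per instruction of each instruction type by how many instructions of that type were executed, then divides by the total instruction count I_c. The instruction mix, the 400 MHz clock rate, and the function names are hypothetical values chosen for illustration. The MIPS calculation assumes the standard relation MIPS rate = f / (CPI × 10^6), which is not spelled out in the text above.

```python
def average_cpi(mix):
    """Average cycles per instruction.

    mix: list of (cpi_i, count_i) pairs, one per instruction type, where
    cpi_i is the cycles needed by that type and count_i is how many
    instructions of that type were executed.
    """
    total_instructions = sum(count for _, count in mix)  # I_c
    total_cycles = sum(cpi * count for cpi, count in mix)
    return total_cycles / total_instructions


def mips_rate(clock_hz, cpi):
    """Millions of instructions per second (assumed relation, see lead-in)."""
    return clock_hz / (cpi * 1e6)


if __name__ == "__main__":
    # Hypothetical mix: (cycles per instruction, instructions executed)
    mix = [(1, 600_000), (2, 300_000), (4, 100_000)]
    cpi = average_cpi(mix)
    print(cpi)                    # 1.6
    print(mips_rate(400e6, cpi))  # MIPS at an assumed 400 MHz clock
```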
$$\text{MIPS rate} = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6}$$
Benchmarks (1)
MIPS and MFLOPS are often inadequate for evaluating a
processor's performance (e.g., CISC and RISC machines may
have different MIPS rates although both take about the same
amount of time)
In the early 1990s, performance measurement shifted to
using a set of benchmark programs
Desirable characteristics of a benchmark program:
Written in a high-level language, making it portable across machines
Representative of a particular kind of programming style such as
systems programming, numerical programming, or commercial
programming
Can be measured easily
Has wide distribution
Benchmarks (2)
SPEC (Standard Performance Evaluation Corporation)
benchmarks: defined and maintained by an industry
consortium (e.g., SPEC CPU2006, SPECjvm98, SPECjbb2000,
SPECweb99, SPECmail2001)
Averaging results: run a number of different benchmark
programs on each machine and then average the results
Simple arithmetic mean:
$$R_A = \frac{1}{m} \sum_{i=1}^{m} R_i$$
Harmonic mean:
$$R_H = \frac{m}{\sum_{i=1}^{m} \frac{1}{R_i}}$$
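The two averages can be compared with a short sketch. The rates below are made-up values for illustration, and the function names are not from the slides. Here each R_i stands for a rate measured on benchmark i (for example, instructions per second).

```python
def arithmetic_mean(rates):
    """R_A: sum of the rates divided by the number of benchmarks m."""
    return sum(rates) / len(rates)


def harmonic_mean(rates):
    """R_H: m divided by the sum of the reciprocals of the rates."""
    return len(rates) / sum(1.0 / r for r in rates)


if __name__ == "__main__":
    rates = [2.0, 4.0, 4.0]        # hypothetical rates from three benchmarks
    print(arithmetic_mean(rates))  # about 3.33
    print(harmonic_mean(rates))    # 3.0
```

For positive rates the harmonic mean never exceeds the arithmetic mean, so it gives more weight to the slowest benchmark.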
Benchmarks (3)
SPEC benchmarks report a speed metric and a rate metric
The speed metric measures the ability of a machine to complete a single task
Results are reported as the ratio of the reference run time to the run time
of the system under test
The overall performance measure for the system under test is calculated
by averaging the ratio values using a geometric mean
$$r_i = \frac{Tref_i}{Tsut_i}, \qquad r_G = \left( \prod_{i=1}^{n} r_i \right)^{1/n}$$
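The speed-metric calculation can be sketched as follows. The run times are invented for illustration and are not SPEC results, and the function names are hypothetical. Each ratio divides the reference run time by the run time of the system under test, and the overall figure is the geometric mean of those ratios.

```python
import math


def speed_ratios(ref_times, sut_times):
    """r_i = Tref_i / Tsut_i for each benchmark i."""
    return [ref / sut for ref, sut in zip(ref_times, sut_times)]


def geometric_mean(ratios):
    """n-th root of the product of n ratios.

    Computed via the mean of the logarithms, which avoids overflow
    when many ratios are multiplied together.
    """
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    ref_times = [100.0, 400.0]  # hypothetical reference run times (s)
    sut_times = [50.0, 100.0]   # hypothetical system-under-test run times (s)
    ratios = speed_ratios(ref_times, sut_times)  # [2.0, 4.0]
    print(geometric_mean(ratios))                # sqrt(8), about 2.83
```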
$$\text{Speedup} = \frac{T(1-f) + Tf}{T(1-f) + \frac{Tf}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$$
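A minimal sketch of Amdahl's Law in the form Speedup = 1 / ((1 − f) + f/N), assuming f is the fraction of execution time that can be parallelized and N is the number of processors. The function name and the example values are illustrative only.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Speedup from running the parallelizable fraction f on n processors.

    The remaining fraction (1 - f) is assumed to stay strictly serial.
    """
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    f = 0.4  # hypothetical: 40% of the run time can be parallelized
    for n in (1, 2, 4, 1_000_000):
        print(n, amdahl_speedup(f, n))
    # As n grows, the speedup approaches 1 / (1 - f), not infinity.
```

The serial fraction bounds the achievable speedup: with f = 0.4, no number of processors can push the speedup past 1/0.6.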
Little's Law
Based on queuing theory, Little's Law can be applied to any
system that is statistically in steady state and in which there is
no leakage
General setup:
Suppose there is a steady state system where items arrive at an
average rate of λ items per unit time
The items stay in the system an average of W units of time
There is an average of L units in the system at any one time
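The quantities in this setup are tied together by the standard statement of Little's Law, L = λW. The relation itself does not appear in the text above, so the sketch below takes it as an assumption; the function name and the numbers are hypothetical.

```python
def items_in_system(arrival_rate: float, avg_time_in_system: float) -> float:
    """L = lambda * W: average number of items in a steady-state system."""
    return arrival_rate * avg_time_in_system


if __name__ == "__main__":
    # Hypothetical server: 40 requests arrive per second on average,
    # and each request spends 0.25 seconds in the system on average.
    print(items_in_system(40.0, 0.25))  # 10.0 requests in the system
```

Given any two of L, λ, and W, the third follows from the same relation, for example W = L / λ.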
Solution:
$$\text{Speedup} = \frac{1}{(1 - 0.4) + \frac{0.4}{K}} = \frac{1}{0.6 + \frac{0.4}{K}}$$