Computer Networks
(ECE 6620)
Types of Workloads
(Chapter 4)
Overview
Terminology
Test Workloads for Computer Systems
Addition Instruction
Instruction Mixes
Kernels
Synthetic Programs
Application Benchmarks: Sieve, Ackermann's Function, Debit-Credit, SPEC
Workload Selection
Terminology
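Test workload: any workload used in performance studies; it can be real or synthetic
Real workload: the workload observed on a system being used for normal operations
Synthetic workload: has characteristics similar to those of the real workload and can be applied repeatedly in a controlled manner
Synthetic workloads contain no sensitive data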
Test Workloads for Computer Systems
Addition Instruction
Instruction Mixes
Kernels
Synthetic Programs
Application Benchmarks
Addition Instruction
Processors were the most expensive and most used components of the system
Addition was the most frequent instruction
Thus, as a first approximation, the computer with the faster addition instruction was considered to be the better performer
The addition instruction was the sole workload used, and the addition time was the sole performance metric
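To make the idea of "addition time as the sole metric" concrete, the following is a minimal sketch that times a single integer addition with Python's standard `timeit` module. The function name `mean_addition_time_ns` and the operand values are illustrative assumptions, and the figure it reports includes interpreter overhead, so it is not a measurement of the hardware add instruction.

```python
import timeit


def mean_addition_time_ns(repetitions: int = 1_000_000) -> float:
    """Return the mean wall-clock time of one integer addition, in nanoseconds.

    Hypothetical helper for illustration only: the result includes Python
    interpreter overhead, not just the processor's add instruction.
    """
    # timeit executes the statement `repetitions` times and returns the
    # total elapsed time in seconds.
    total_seconds = timeit.timeit(
        "a + b", setup="a = 12345; b = 67890", number=repetitions
    )
    return total_seconds / repetitions * 1e9


if __name__ == "__main__":
    print(f"mean addition time: {mean_addition_time_ns():.1f} ns")
```

Comparing two machines with this single number is exactly the first approximation described above: it says nothing about other instructions or other system components.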
Instruction Mixes
Disadvantages:
Complex classes of instructions not reflected in the mixes.
Instruction time varies with:
  Addressing modes
  Cache hit rates
  Pipeline efficiency
  Interference from other devices during processor-memory access cycles
  Parameter values:
    Frequency of zeros as a parameter
    The distribution of zero digits in a multiplier
    Average number of positions of pre-shift in floating-point add
    Number of times a conditional branch is taken
2010 Raj Jain www.rajjain.com
Performance Metrics:
An instruction mix characterizes only the processor speed. This may or may not affect the total system performance when the system consists of many other components
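An instruction mix reduces a processor to one number: the frequency-weighted average instruction time, from which a rate such as MIPS can be derived. The sketch below shows that arithmetic. The instruction classes, frequencies, and times in `HYPOTHETICAL_MIX` are invented for illustration and do not come from any published mix.

```python
def average_instruction_time(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted mean instruction time for a mix of {class: (frequency, time_us)}."""
    total_frequency = sum(freq for freq, _ in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    # Each class contributes its execution time weighted by how often it occurs.
    return sum(freq * time_us for freq, time_us in mix.values())


def mips(average_time_us: float) -> float:
    """Millions of instructions per second for a mean time given in microseconds."""
    return 1.0 / average_time_us


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL_MIX = {
    "load_store": (0.40, 0.50),
    "fixed_add": (0.30, 0.25),
    "branch": (0.20, 0.50),
    "float_multiply": (0.10, 2.00),
}

if __name__ == "__main__":
    avg = average_instruction_time(HYPOTHETICAL_MIX)
    print(f"average instruction time: {avg:.3f} us, rate: {mips(avg):.2f} MIPS")
```

Note that each class is given one fixed time here. That is precisely the simplification the listed disadvantages point at: in practice the time depends on addressing modes, cache hit rates, pipeline efficiency, and parameter values.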
Kernels
Most of the initial kernels did not make use of the input/output (I/O) devices and concentrated solely on the processor performance. This class of kernels could be called processing kernels
Synthetic Programs
Advantage:
Disadvantages:
Too small
Do not make representative memory or disk references
Mechanisms for page faults and disk cache may not be adequately exercised
CPU-I/O overlap may not be representative
Loops may create synchronizations, resulting in better or worse performance
Synthetic workload generation program
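A synthetic program is essentially an exerciser loop whose CPU and I/O demands are set by control parameters. The following is a minimal sketch of that idea, not a reproduction of the program in the book. The function name `synthetic_workload` and its parameters (`cpu_iterations`, `io_requests`, `record_bytes`) are assumptions made for illustration.

```python
import tempfile
import time


def synthetic_workload(cpu_iterations: int, io_requests: int,
                       record_bytes: int = 512) -> dict:
    """Run a parameterized CPU + file I/O loop and report what was done."""
    checksum = 0
    bytes_written = 0
    record = b"x" * record_bytes
    start = time.perf_counter()
    with tempfile.TemporaryFile() as scratch:
        for _ in range(io_requests):
            # CPU phase: a fixed amount of arithmetic between I/O requests.
            for i in range(cpu_iterations):
                checksum = (checksum + i * i) % 1_000_003
            # I/O phase: write one record and flush Python's buffer.
            scratch.write(record)
            scratch.flush()
            bytes_written += record_bytes
        # Read everything back so the read path is exercised as well.
        scratch.seek(0)
        bytes_read = len(scratch.read())
    elapsed = time.perf_counter() - start
    return {
        "io_requests": io_requests,
        "bytes_written": bytes_written,
        "bytes_read": bytes_read,
        "checksum": checksum,
        "elapsed_seconds": elapsed,  # built-in measurement of the run
    }


if __name__ == "__main__":
    print(synthetic_workload(cpu_iterations=10_000, io_requests=50))
```

A loop this small illustrates the listed disadvantages as well: the file will typically stay in the operating system's cache, so disk and paging mechanisms are barely exercised.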
Application Benchmarks
Benchmarking
Sieve
The sieve kernel has been used to compare microprocessors,
personal computers, and high-level languages
Based on Eratosthenes' sieve algorithm: find all prime numbers
below a given number n.
Algorithm:
Write down all integers from 1 to n
Strike out all multiples of k, for k = 2, 3, ..., √n.
Example:
Write down all numbers from 1 to 20. Mark all as prime:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Remove all multiples of 2 from the list of primes:
1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19
Sieve (Cont)
The next integer in the sequence is 3. Remove all multiples of 3:
1, 2, 3, 5, 7, 11, 13, 17, 19
The next integer in the sequence is 5. Since 5 > √20, stop.
Pascal Program to Implement the Sieve Kernel:
See Program listing Figure 4.2 in the book
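For readers without the book at hand, the algorithm above can be sketched in Python as follows. This is not the Pascal listing from the book. It makes two common choices that are assumptions here: striking starts at k², since smaller multiples of k were already struck by smaller factors, and values of k that have already been struck are skipped. It also excludes 1, which is not a prime by convention.

```python
import math


def sieve(n: int) -> list[int]:
    """Return all primes up to and including n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not primes
    # Only k up to floor(sqrt(n)) needs to be considered.
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            # Strike out multiples of k, starting at k*k.
            for multiple in range(k * k, n + 1, k):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(sieve(20))
```

Used as a benchmark, the kernel would be run repeatedly for a fixed n and the elapsed time recorded.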
Ackermann's Function
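Ackermann's function is a doubly recursive function, which makes it a natural probe of how efficiently a language handles procedure calls. The slide gives only the title, so the sketch below assumes the standard two-argument definition. The check against 2^(n+3) − 3 for m = 3 is a known closed form for that definition.

```python
import sys

# The recursion is deep even for small arguments, so raise the limit a little.
sys.setrecursionlimit(10_000)


def ackermann(m: int, n: int) -> int:
    """Standard two-argument Ackermann function (assumed definition)."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


if __name__ == "__main__":
    for n in range(1, 5):
        value = ackermann(3, n)
        # For m = 3 the result can be verified against 2**(n + 3) - 3.
        print(n, value, value == 2 ** (n + 3) - 3)
```

Because the result is known in advance, a benchmark run can verify correctness while timing the calls.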
Other Benchmarks
Whetstone
U.S. Steel
LINPACK
Dhrystone
Doduc
TOP
Lawrence Livermore Loops
Digital Review Labs
Abingdon Cross Image-Processing Benchmark
Debit-Credit Benchmark
A de facto standard for transaction processing systems.
First recorded in Anonymous et al. (1985).
In 1973, a retail bank wanted to put its 1000 branches, 10,000 tellers, and 10,000,000 accounts online with a peak load of 100 Transactions Per Second (TPS).
Each TPS requires 10 branches, 100 tellers, and 100,000 accounts.
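The scaling rule above is simple arithmetic, sketched below. The class name `DebitCreditScale` and the function `scale_for_tps` are hypothetical names introduced for illustration. Only the per-TPS ratios (10 branches, 100 tellers, 100,000 accounts) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebitCreditScale:
    branches: int
    tellers: int
    accounts: int


def scale_for_tps(target_tps: int) -> DebitCreditScale:
    """Database size required to claim a given throughput in TPS."""
    if target_tps <= 0:
        raise ValueError("target_tps must be positive")
    # Per-TPS ratios as stated in the text.
    return DebitCreditScale(
        branches=10 * target_tps,
        tellers=100 * target_tps,
        accounts=100_000 * target_tps,
    )


if __name__ == "__main__":
    # The original bank: 100 TPS gives 1000 branches, 10,000 tellers,
    # and 10,000,000 accounts.
    print(scale_for_tps(100))
```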
Debit-Credit (Cont)
Response time
Measured as the time interval between the arrival of the last bit from the communications line and the sending of the first bit to the communications line
Cost
Systems Performance Evaluation Cooperative (SPEC): Nonprofit corporation formed by leading computer vendors to develop a standardized set of benchmarks.
Release 1.0 consists of the following 10 benchmarks: GCC, Espresso, Spice 2g6, Doduc, NASA7, LI, Eqntott, Matrix300, Fpppp, Tomcatv
These primarily stress the CPU, the Floating Point Unit (FPU), and to some extent the memory subsystem, so they are meant for comparing CPU speeds.
Benchmarks to compare I/O and other subsystems may be included in future releases.
6. LI:
Measures the elapsed time for the LISP interpreter to solve the popular 9-queens problem
7. Eqntott:
Translates a logical representation of a Boolean equation to a truth table
8. Matrix300:
Performs various matrix operations using several LINPACK routines on matrices of size 300 × 300
The code uses double-precision floating-point arithmetic and is highly vectorizable
9. Fpppp:
A quantum chemistry benchmark that computes two-electron integral derivatives using double-precision floating-point FORTRAN. It is difficult to vectorize.
10. Tomcatv:
A vectorized mesh generation program using double-precision floating-point FORTRAN
Since it is highly vectorizable, substantial speedups have been observed on several shared-memory multiprocessor systems
SPEC (Cont)