Simulation, Modeling and Analysis of

Computer Networks
(ECE 6620)

Types of Workloads
(Chapter#4)

Art of Computer Systems Performance Analysis


By R. Jain

Dr. M. Hasan Islam

2010 Raj Jain www.rajjain.com


Overview

Terminology
Test Workloads for Computer Systems
Addition Instruction
Instruction Mixes
Kernels
Synthetic Programs
Application Benchmarks: Sieve, Ackermann's Function, Debit-Credit, SPEC


Workload Selection

Computer system performance measurements involve monitoring the system while it is being subjected to a particular workload.
To perform meaningful measurements, the workload should be carefully selected.
To achieve that goal, the performance analyst needs to answer the following questions before performing measurements:
1. What are the different types of workloads?
2. Which workloads are commonly used by other analysts?
3. How are the appropriate workload types selected?
4. How is the measured workload data summarized?
5. How is the system performance monitored?
6. How can the desired workload be placed on the system in a controlled manner?
7. How are the results of the evaluation presented?

Terminology

Test workload

Any workload used in performance studies

Test workload can be real or synthetic

Real workload

Observed on a system being used for normal operations

Cannot be repeated, generally not suitable for use as a test workload

Synthetic workload:

Similar to real workload

Can be applied repeatedly in a controlled manner

No large real-world data files

No sensitive data

Easily modified without affecting operation

Easily ported to different systems due to its small size

May have built-in measurement capabilities



Test Workloads for Computer Systems


1. Addition Instruction
2. Instruction Mixes
3. Kernels
4. Synthetic Programs
5. Application Benchmarks


Addition Instruction
In the early days, the processor was the most expensive and most heavily used component of the system, and addition was the most frequent instruction.
Thus, as a first approximation, the computer with the faster addition instruction was considered to be the better performer.
The addition instruction was the sole workload used, and the addition time was the sole performance metric.


Instruction Mixes

An instruction mix is a specification of various instructions coupled with their usage frequencies.
Gibson mix: developed by Jack C. Gibson in 1959 for IBM 704 systems.


Instruction Mixes (Cont)

Disadvantages:
Complex classes of instructions are not reflected in the mixes.
Instruction time varies with:
Addressing modes
Cache hit rates
Pipeline efficiency
Interference from other devices during processor-memory access cycles
Parameter values, for example:
Frequency of zeros as a parameter
Distribution of zero digits in a multiplier
Average number of positions of pre-shift in floating-point add
Number of times a conditional branch is taken

Instruction Mixes (Cont)

Performance Metrics:

MIPS = Millions of Instructions Per Second

MFLOPS = Millions of Floating Point Operations Per Second

It must be pointed out that instruction mixes measure only the speed of the processor.
This may or may not affect the total system performance when the system consists of many other components.
System performance is limited by the performance of the bottleneck component; unless the processor is the bottleneck (that is, the workload is mostly compute bound), the MIPS rate of the processor does not reflect the system performance.
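As an illustration of how a mix yields a MIPS figure, the following sketch computes a weighted-average instruction time. The instruction classes, frequencies, and per-instruction times are made-up illustrative values, not the actual Gibson mix.

```python
# Weighted-average instruction time from a hypothetical instruction mix.
# Frequencies and times below are illustrative only, not the Gibson mix.
mix = {
    # instruction class: (usage frequency, time in microseconds)
    "load/store": (0.31, 0.4),
    "fixed add/sub": (0.26, 0.3),
    "branch": (0.17, 0.2),
    "float add/sub": (0.07, 0.9),
    "float multiply": (0.04, 1.6),
    "other": (0.15, 0.5),
}

# Average time per instruction = sum of frequency * time over all classes
avg_time_us = sum(f * t for f, t in mix.values())
mips = 1.0 / avg_time_us  # 1 / (microseconds per instruction) = MIPS
print(round(avg_time_us, 3), round(mips, 2))
```

Note that the frequencies must sum to 1; a real mix would be derived from measured instruction traces of the target workload.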

Kernels

The introduction of pipelining, instruction caching, and various address translation mechanisms made computer instruction times highly variable; an individual instruction could no longer be considered in isolation.
Instead, it became more appropriate to consider a set of instructions that constitutes a higher-level function, a service provided by the processor.
Such a function is called a kernel (a most frequently used function or algorithm).
Most of the initial kernels did not make use of input/output (I/O) devices and concentrated solely on processor performance; this class could be called processing kernels.
Commonly used kernels: Sieve, Puzzle, Tree Searching, Ackermann's Function, Matrix Inversion, and Sorting.
Disadvantages: kernels do not make use of I/O devices or OS services, and thus kernel performance does not reflect total system performance.


Synthetic Programs

The need to measure I/O performance led analysts to develop simple exerciser loops that make a specified number of service calls or I/O requests.
Such loops allow the analyst to compute the average CPU time and elapsed time for each service call.
Exerciser loops are also used to measure operating system services such as process creation, forking, and memory allocation.
To remain portable across operating systems, such exercisers are usually written in high-level languages such as FORTRAN or Pascal.
The first exerciser loop was by Buchholz (1969), who called it a synthetic program.
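A minimal Python sketch of such an exerciser loop (the originals were written in FORTRAN or Pascal; the two service calls exercised here are arbitrary examples, not ones from the text):

```python
import os
import time

def exercise(call, n_calls=10_000):
    """Invoke a service call n_calls times and return the average
    elapsed time per call, in microseconds."""
    start = time.perf_counter()
    for _ in range(n_calls):
        call()
    elapsed = time.perf_counter() - start
    return elapsed / n_calls * 1e6

# Exercise two cheap OS services; the absolute numbers are machine
# dependent, so no expected output is shown.
print(f"getpid: {exercise(os.getpid):8.3f} us/call")
print(f"stat:   {exercise(lambda: os.stat('.')):8.3f} us/call")
```

A real exerciser would also record CPU time separately from elapsed time, as the slide describes.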

Synthetic Programs

Advantage:

Quickly developed and given to different vendors


No real data files
Easily modified and ported to different systems
Have built-in measurement capabilities
Measurement process is automated
Repeated easily on successive versions of the operating systems

Disadvantages:

Too small
Do not make representative memory or disk references
Mechanisms for page faults and disk cache may not be adequately
exercised
CPU-I/O overlap may not be representative
Loops may create synchronizations, which may result in better or worse performance than a real workload

Synthetic Workload Generation Program

[Figure: listing of a synthetic workload generation program; not reproduced in this text version]

Application Benchmarks

If the computer systems to be compared are to be used for a particular application (for example, banking or airline reservations), a representative subset of functions for that application may be used.
Such benchmarks are generally described in terms of the functions to be performed and make use of almost all resources in the system, including processors, I/O devices, networks, and databases.
Benchmarking
The process of performance comparison of two or more systems by measurements.
The workloads used in the measurements are called benchmarks.
Some authors: benchmark = set of programs taken from real workloads.
Popular Benchmarks
Sieve, Ackermann's Function, Whetstone, LINPACK, Dhrystone, Lawrence Livermore Loops, Debit-Credit Benchmark, SPEC Benchmark Suite

Sieve
The sieve kernel has been used to compare microprocessors,
personal computers, and high-level languages
Based on Eratosthenes' sieve algorithm: find all prime numbers below a given number n.
Algorithm:
Write down all integers from 1 to n.
Strike out all multiples of k, for k = 2, 3, ..., sqrt(n).
Example:
Write down all numbers from 1 to 20. Mark all as potential primes:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Strike out all multiples of 2; the numbers remaining are:
1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19


Sieve (Cont)
The next remaining integer in the sequence is 3. Strike out all multiples of 3; the numbers remaining are:
1, 2, 3, 5, 7, 11, 13, 17, 19
The next remaining integer is 5, and 5 > sqrt(20), so stop: all remaining numbers are prime.
Pascal Program to Implement the Sieve Kernel:
See Program listing Figure 4.2 in the book
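The book's Figure 4.2 gives a Pascal listing; an equivalent Python sketch follows (it treats 1 as non-prime, per the usual definition):

```python
def sieve(n):
    """Return all prime numbers up to and including n using the
    Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    k = 2
    while k * k <= n:                       # only need k up to sqrt(n)
        if is_prime[k]:
            for multiple in range(k * k, n + 1, k):
                is_prime[multiple] = False  # strike out multiples of k
        k += 1
    return [i for i, p in enumerate(is_prime) if p]

print(sieve(20))  # → [2, 3, 5, 7, 11, 13, 17, 19]
```

As a benchmark kernel, the loop would be repeated many times over a fixed n and the total elapsed time measured.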


Ackermann's Function

Used to assess the efficiency of the procedure-calling mechanism.
The function has two parameters and is defined recursively.
Ackermann(3, n) is evaluated for values of n from one to six.
Metrics:
Average execution time per call,
Number of instructions executed per call, and
Stack space per call.
Verification: Ackermann(3, n) = 2^(n+3) - 3
Number of recursive calls in evaluating Ackermann(3, n): (512 * 4^(n-1) - 15 * 2^(n+3) + 9n + 37) / 3
This expression is used to compute the average execution time per call.
Depth of the procedure calls = 2^(n+3) - 4, so the stack space required doubles when n is increased by 1.
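A direct recursive sketch in Python, including a check of the verification formula Ackermann(3, n) = 2^(n+3) - 3:

```python
import sys

def ackermann(m, n):
    """Classic two-parameter Ackermann function, defined recursively."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# The call depth for Ackermann(3, n) is 2^(n+3) - 4, so raise the
# recursion limit before evaluating larger n.
sys.setrecursionlimit(100_000)

# Verify Ackermann(3, n) = 2^(n+3) - 3 for n = 1..6
for n in range(1, 7):
    assert ackermann(3, n) == 2 ** (n + 3) - 3
print([ackermann(3, n) for n in range(1, 4)])  # → [13, 29, 61]
```

Timing this loop and dividing by the recursive-call count from the formula above yields the average execution time per call.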

Other Benchmarks
Whetstone
U.S. Steel
LINPACK
Dhrystone
Doduc
TOP
Lawrence Livermore Loops
Digital Review Labs
Abingdon Cross Image-Processing Benchmark


Debit-Credit Benchmark
A de facto standard for transaction processing systems.
First recorded in Anon et al. (1985).
In 1973, a retail bank wanted to put its 1,000 branches, 10,000 tellers, and 10,000,000 accounts online with a peak load of 100 transactions per second (TPS).
Each TPS of capacity thus requires 10 branches, 100 tellers, and 100,000 accounts.
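The scaling rule can be expressed directly (a trivial sketch, reproducing the slide's numbers):

```python
def debit_credit_config(tps):
    """Size of the bank database required for a given peak TPS rating,
    using the benchmark's scaling rule: each TPS requires 10 branches,
    100 tellers, and 100,000 accounts."""
    return {
        "branches": 10 * tps,
        "tellers": 100 * tps,
        "accounts": 100_000 * tps,
    }

print(debit_credit_config(100))
# → {'branches': 1000, 'tellers': 10000, 'accounts': 10000000}
```

This scaling rule keeps the database size proportional to the claimed throughput, preventing a small, fully cached database from inflating the TPS figure.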



Debit-Credit Benchmark (Continued)

Metric: price/performance ratio.
Performance
Throughput in terms of TPS such that 95% of all transactions provide one second or less response time.
Response time is measured as the time interval between the arrival of the last bit from the communications line and the sending of the first bit to the communications line.
Cost
Total expenses for a five-year period on purchase, installation, and maintenance of the hardware and software in the machine room.
Cost does not include expenditures for terminals, communications, application development, or operations.

Debit-Credit Transaction Pseudo-Code

[Pseudo-code listing not reproduced in this text version]

Pseudo-code Definition of Debit-Credit

Four record types: account, teller, branch, and history.
Fifteen percent of the transactions require remote access.
The Transaction Processing Performance Council (TPC) was formed in August 1988.
TPC Benchmark™ A is a variant of the debit-credit benchmark.
Metric: TPS such that 90% of all transactions provide two seconds or less response time.
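Using the four record types above, the shape of the transaction can be sketched as follows. This is an illustrative sketch, not the book's pseudo-code: plain dicts stand in for the database, and commit/abort logic and remote access are omitted.

```python
# Illustrative debit-credit transaction over the four record types
# (account, teller, branch, history). A real benchmark implementation
# would use transactional storage with commit/abort semantics.
accounts = {1: {"balance": 500}}
tellers = {1: {"balance": 0}}
branches = {1: {"balance": 0}}
history = []

def debit_credit(account_id, teller_id, branch_id, delta):
    """Apply one transaction: update the account, teller, and branch
    balances by delta and append a history record."""
    accounts[account_id]["balance"] += delta
    tellers[teller_id]["balance"] += delta
    branches[branch_id]["balance"] += delta
    history.append((account_id, teller_id, branch_id, delta))
    return accounts[account_id]["balance"]

print(debit_credit(1, 1, 1, +100))  # → 600
```

Each transaction touches all four record types, which is what forces the benchmark to exercise the I/O and database subsystems rather than the CPU alone.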


SPEC Benchmark Suite

Systems Performance Evaluation Cooperative (SPEC): a nonprofit corporation formed by leading computer vendors to develop a standardized set of benchmarks.
Release 1.0 consists of 10 benchmarks: GCC, Espresso, Spice 2g6, Doduc, NASA7, LI, Eqntott, Matrix300, Fpppp, and Tomcatv.
These primarily stress the CPU, the Floating-Point Unit (FPU), and to some extent the memory subsystem, and are used to compare CPU speeds.
Benchmarks to compare I/O and other subsystems may be included in future releases.


SPEC Benchmark Suite


1. GCC
The time for the GNU C Compiler to convert 19 preprocessed source files into assembly language
output is measured
This benchmark is representative of a software engineering environment and measures the
compiling efficiency of a system
2. Espresso
An Electronic Design Automation (EDA) tool that performs heuristic boolean function
minimization for Programmable Logic Arrays (PLAs)
The elapsed time to run a set of seven input models is measured.
3. Spice 2g6
Spice, another representative of the EDA environment, is a widely used analog circuit simulation
tool
The time to simulate a bipolar circuit is measured.
4. Doduc
This is a synthetic benchmark that performs a Monte Carlo simulation of certain aspects of a
nuclear reactor. Because of its iterative structure and abundance of short branches and compact
loops, it tests the cache memory effectiveness.
5. NASA7
This is a collection of seven floating-point intensive kernels performing matrix operations on
double-precision data.


SPEC Benchmark Suite

6. LI
The elapsed time to solve the popular 9-queens problem with a LISP interpreter is measured.
7. Eqntott
Translates a logical representation of a Boolean equation to a truth table.
8. Matrix300
Performs various matrix operations using several LINPACK routines on matrices of size 300 × 300.
The code uses double-precision floating-point arithmetic and is highly vectorizable.
9. Fpppp
This is a quantum chemistry benchmark that performs two electron integral
derivatives using double-precision floating-point FORTRAN. It is difficult to
vectorize.
10. Tomcatv
A vectorized mesh generation program using double-precision floating-point
FORTRAN
Since it is highly vectorizable, substantial speedups have been observed on
several shared-memory multiprocessor systems

SPEC (Cont)

The elapsed time to run two copies of a benchmark on each of the N processors of a system (a total of 2N copies) is measured and compared with the time to run two copies of the benchmark on a reference system (a VAX-11/780 for Release 1.0).
For each benchmark, the ratio of the time on the reference system to the time on the system under test is reported as the SPECthruput, using the notation #CPU@Ratio. For example, a system with three CPUs taking 1/15 as long as the reference system on the GCC benchmark has a SPECthruput of 3@15.
This is a measure of the per-processor throughput relative to the reference system.

SPEC (Cont)

The aggregate throughput for all processors of a multiprocessor system can be obtained by multiplying the ratio by the number of processors. For example, the aggregate throughput for the above system is 3 × 15 = 45.
The geometric mean of the SPECthruputs for the 10 benchmarks indicates the overall performance for the suite and is called the SPECmark.
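The two computations can be sketched as follows. The per-benchmark ratios used here are made-up illustrative values, not measured SPEC results:

```python
import math

def specthruput_aggregate(n_cpus, ratio):
    """Aggregate multiprocessor throughput: ratio times number of CPUs."""
    return n_cpus * ratio

def specmark(ratios):
    """Geometric mean of the per-benchmark SPECthruput ratios."""
    return math.prod(ratios) ** (1 / len(ratios))

# The 3@15 example from the text: 3 CPUs, ratio 15
print(specthruput_aggregate(3, 15))  # → 45

# Hypothetical ratios for the 10 benchmarks (illustrative values only)
ratios = [15, 12, 9, 11, 14, 10, 8, 13, 9, 12]
print(round(specmark(ratios), 2))
```

The geometric mean is used rather than the arithmetic mean so that the overall figure is a mean of speed ratios, which keeps the ranking of systems independent of the choice of reference machine.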

