CHAPTER 1
EXERCISE 1.5
Consider two different implementations, P1 and P2, of the same
instruction set. There are five classes of instructions (A, B, C, D,
and E) in the instruction set. The clock rate and CPI of each class
are given below.
Solution:
a. P1: 2 × 10^9 inst/sec, P2: 2 × 10^9 inst/sec
b. P1: 2 × 10^9 inst/sec, P2: 3 × 10^9 inst/sec
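These rates follow from peak rate = clock rate / smallest CPI, since the fastest possible instruction sequence uses only the cheapest class. A minimal sketch in Python, with the clock rates and CPIs copied from the table for this exercise; the names MACHINES and peak_rate are illustrative and not part of the original solution:

```python
# Clock rate (Hz) and CPI for classes A-E, copied from the exercise table.
MACHINES = {
    ("a", "P1"): (2.0e9, [1, 2, 3, 4, 3]),
    ("a", "P2"): (4.0e9, [2, 2, 2, 4, 4]),
    ("b", "P1"): (2.0e9, [1, 1, 2, 3, 2]),
    ("b", "P2"): (3.0e9, [1, 2, 3, 4, 3]),
}


def peak_rate(clock_hz, cpis):
    """Peak instructions per second: every instruction comes from the cheapest class."""
    return clock_hz / min(cpis)


for (case, name), (clock_hz, cpis) in MACHINES.items():
    print(f"{case}. {name}: {peak_rate(clock_hz, cpis):.1e} inst/sec")
```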
a. T(P1)/T(P2) = [(1×2 + 2 + 3 + 4 + 3) × 4] / [(2×2 + 2 + 2 + 4 + 4) × 2]
               = (14 × 2)/16 = 7/4
   P2 is 1.75 times faster than P1
b. T(P2)/T(P1) = 4.67/5
   P2 is 1.07 times faster than P1

          Clock Rate  CPI      CPI      CPI      CPI      CPI
                      Class A  Class B  Class C  Class D  Class E
a   P1    2.0 GHz     1        2        3        4        3
    P2    4.0 GHz     2        2        2        4        4
b   P1    2.0 GHz     1        1        2        3        2
    P2    3.0 GHz     1        2        3        4        3
1.5.3 If the number of instructions executed in a certain program is
divided equally among the classes of instructions except for class E,
which occurs twice as often as each of the others, which computer is
faster? How much faster is it?
Solution:
a. T(P2)/T(P1) = 4.5/8
   P2 is 1.78 times faster than P1
b. T(P2)/T(P1) = 5.33/5.5
   P2 is 1.03 times faster than P1
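The ratios in this solution come from weighting each class CPI by how often the class occurs and dividing the resulting cycle count by the clock rate. A minimal sketch, assuming the clock rates and CPIs from the exercise table; relative_time and the weight lists are illustrative names, and the times are in arbitrary units because only their ratio matters:

```python
# Clock rate (GHz) and CPI for classes A-E, copied from the exercise table.
MACHINES = {
    "a": {"P1": (2.0, [1, 2, 3, 4, 3]), "P2": (4.0, [2, 2, 2, 4, 4])},
    "b": {"P1": (2.0, [1, 1, 2, 3, 2]), "P2": (3.0, [1, 2, 3, 4, 3])},
}


def relative_time(clock_ghz, cpis, weights):
    """Weighted cycle count divided by clock rate (arbitrary time units)."""
    cycles = sum(w * cpi for w, cpi in zip(weights, cpis))
    return cycles / clock_ghz


CLASS_E_DOUBLED = [1, 1, 1, 1, 2]  # class E occurs twice as often as the others

for case, machines in MACHINES.items():
    t1 = relative_time(*machines["P1"], CLASS_E_DOUBLED)
    t2 = relative_time(*machines["P2"], CLASS_E_DOUBLED)
    print(f"{case}. T(P1) = {t1:.2f}, T(P2) = {t2:.2f}, T(P1)/T(P2) = {t1 / t2:.2f}")
```

Changing the weights to [2, 1, 1, 1, 1] gives the mix in which class A is doubled, with ratios of 7/4 and about 1.07.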
The table below shows the instruction-type breakdown for different
programs. Using this data, you will be exploring the performance
trade-offs for different changes made to a MIPS processor.
                No. Instructions
                Compute  Load  Store  Branch  Total
a   program1    600      600   200    50      1450
b   program2    900      500   100    200     1700
1.5.4 Assuming that compute (ALU) instructions take 1 cycle, load and
store instructions take 10 cycles, and branches take 3 cycles, find the
execution time on a 3 GHz MIPS processor.
Solution:
a. (600 + 600×10 + 200×10 + 50×3)/(3 × 10^9) = 2.92 μs
b. 2.50 μs
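The same calculation can be checked for both programs: execution time is the total cycle count divided by the clock rate. A minimal sketch using the instruction counts from the table; the dictionary keys and the function name exec_time_us are illustrative:

```python
# Instruction counts from the table: compute, load, store, branch.
PROGRAMS = {
    "a": {"compute": 600, "load": 600, "store": 200, "branch": 50},
    "b": {"compute": 900, "load": 500, "store": 100, "branch": 200},
}
CLOCK_HZ = 3e9  # 3 GHz processor


def exec_time_us(counts, cycles_per_instr):
    """Execution time in microseconds for the given per-type cycle costs."""
    cycles = sum(counts[kind] * cycles_per_instr[kind] for kind in counts)
    return cycles / CLOCK_HZ * 1e6


SLOW_MEMORY = {"compute": 1, "load": 10, "store": 10, "branch": 3}
for label, counts in PROGRAMS.items():
    print(f"{label}. {exec_time_us(counts, SLOW_MEMORY):.2f} us")
```

Passing {"compute": 1, "load": 2, "store": 2, "branch": 3} instead gives the times for the 2-cycle load/store case in 1.5.5.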
1.5.5 Assuming that compute instructions take 1 cycle, load and store
instructions take 2 cycles, and branches take 3 cycles, find the
execution time on a 3 GHz MIPS processor.
Solution:
a. 0.78 μs
b. 0.90 μs
1.5.6 Assuming that compute instructions take 1 cycle, load and store
instructions take 2 cycles, and branches take 3 cycles, what is the
speedup if the number of compute instructions can be reduced by
one-half?
Solution:
a. Speedup = 2350 cycles / 2050 cycles ≈ 1.15
b. Speedup = 2700 cycles / 2250 cycles = 1.20
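Speedup here is the execution time before the change divided by the time after it; because the clock rate is unchanged, this reduces to a ratio of cycle counts. A sketch under the cycle costs stated in the question, with illustrative function names:

```python
# Instruction counts from the table and the cycle costs given in the question.
PROGRAMS = {
    "a": {"compute": 600, "load": 600, "store": 200, "branch": 50},
    "b": {"compute": 900, "load": 500, "store": 100, "branch": 200},
}
CYCLES = {"compute": 1, "load": 2, "store": 2, "branch": 3}


def total_cycles(counts):
    return sum(counts[kind] * CYCLES[kind] for kind in counts)


def speedup_with_half_compute(counts):
    """Old cycle count over new cycle count when compute instructions are halved."""
    reduced = dict(counts, compute=counts["compute"] / 2)
    return total_cycles(counts) / total_cycles(reduced)


for label, counts in PROGRAMS.items():
    print(f"{label}. speedup = {speedup_with_half_compute(counts):.2f}")
```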
EXERCISE 1.6
Compilers can have a profound impact on the performance of an
application on a given processor. This problem will explore the impact
compilers have on execution time.
     Compiler A                         Compiler B
     No. Instructions  Execution Time   No. Instructions  Execution Time
a    1.00E+09          1.8 s            1.20E+09          1.8 s
b    1.00E+09          1.1 s            1.20E+09          1.5 s
1.6.1 For the same program, two different compilers are used. The table
above shows the execution time of the two different compiled programs.
Find the average CPI for each program given that the processor has a
clock cycle time of 1 ns.
Solution:
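The average CPI follows from CPI = execution time / (instruction count × clock cycle time). A minimal sketch using the instruction counts and execution times from the compiler table; the names RUNS and average_cpi are illustrative:

```python
CYCLE_TIME_S = 1e-9  # 1 ns clock cycle, as stated in the question

# (instruction count, execution time in seconds) per compiler, from the table.
RUNS = {
    "a": {"Compiler A": (1.00e9, 1.8), "Compiler B": (1.20e9, 1.8)},
    "b": {"Compiler A": (1.00e9, 1.1), "Compiler B": (1.20e9, 1.5)},
}


def average_cpi(instructions, exec_time_s):
    """Cycles per instruction: total cycles divided by instruction count."""
    return exec_time_s / (instructions * CYCLE_TIME_S)


for case, runs in RUNS.items():
    for compiler, (count, seconds) in runs.items():
        print(f"{case}. {compiler}: CPI = {average_cpi(count, seconds):.2f}")
```

For the table values this gives CPIs of 1.8 and 1.5 in case a, and 1.1 and 1.25 in case b.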
1.6.4 Assume that peak performance is defined as the fastest rate that
a computer can execute any instruction sequence. What are the peak
performances of P1 and P2 expressed in instructions per second?
Solution:
1.6.5 If the number of instructions executed in a certain program is
divided equally among the five classes of instructions except for
class A, which occurs twice as often as each of the others, how much
faster is P2 than P1?
Solution:
a. T1/T2 = 1.9
b. T1/T2 = 1.5
          CPI      CPI      CPI      CPI      CPI
          Class A  Class B  Class C  Class D  Class E
a   P1    1        2        3        4        5
    P2    3        3        3        5        5
b   P1    1        2        3        4        5
    P2    2        2        2        2        6
1.6.6 At what frequency does P1 have the same performance as P2 for the
instruction mix given in 1.6.5?
Solution:
a. 4.37 GHz
b. 6 GHz
EXERCISE 1.14
Section 1.8 cites as a pitfall the utilization of a subset of the
performance equation as a performance metric. To illustrate this,
consider the following data for the execution of a program in
different processors.
     Processor  Clock Rate  CPI    No. Instr.
a    P1         4 GHz       0.9    5.00E+06
     P2         3 GHz       0.75   1.00E+06
b    P1         3 GHz       1.1    3.00E+06
     P2         2.5 GHz            0.50E+06
1.14.1 One usual fallacy is to consider the computer with the largest
clock rate as having the largest performance. Check if this is true
for P1 and P2.
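One way to check the claim is to compute CPU time = No. Instr. × CPI / clock rate for each processor and compare it with the ordering by clock rate. A minimal sketch for case a only, since the CPI of P2 in case b does not appear in the table as given here; the names CASE_A and cpu_time_s are illustrative:

```python
# (clock rate in Hz, CPI, instruction count) for case a of the table.
CASE_A = {
    "P1": (4e9, 0.9, 5.00e6),
    "P2": (3e9, 0.75, 1.00e6),
}


def cpu_time_s(clock_hz, cpi, instructions):
    """CPU time in seconds from the full performance equation."""
    return instructions * cpi / clock_hz


times = {name: cpu_time_s(*row) for name, row in CASE_A.items()}
highest_clock = max(CASE_A, key=lambda name: CASE_A[name][0])
fastest = min(times, key=times.get)

for name, seconds in times.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
print(f"highest clock rate: {highest_clock}, shortest CPU time: {fastest}")
```

For case a, P1 has the higher clock rate but P2 finishes first (0.25 ms versus 1.125 ms), so clock rate alone does not determine performance.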
Solution:
a. Tfp = 70 × 0.8 = 56 s.
Tnew= 56 + 85 + 55 + 40 = 236 s.
Reduction: 5.6%
b. Tfp = 40 × 0.8 = 32 s.
Tnew= 32 + 90 + 60 + 20 = 202 s.
Reduction: 3.8%
     FP Instr.  INT Instr.  L/S Instr.  Branch Instr.  Total Time
a    70 s       85 s        55 s        40 s           250 s
b    40 s       90 s        60 s        20 s           210 s
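The Tnew and reduction figures in the solution above can be reproduced from this table by scaling the FP time by 0.8 and summing the four components again. A minimal sketch; the dictionary keys and the function name total_after_scaling are illustrative:

```python
# Time in seconds spent in each instruction type, copied from the table.
TIMES = {
    "a": {"fp": 70, "int": 85, "ls": 55, "branch": 40},
    "b": {"fp": 40, "int": 90, "ls": 60, "branch": 20},
}


def total_after_scaling(times, kind, factor):
    """Total time after multiplying one component by the given factor."""
    scaled = dict(times)
    scaled[kind] = times[kind] * factor
    return sum(scaled.values())


for case, times in TIMES.items():
    old = sum(times.values())
    new = total_after_scaling(times, "fp", 0.8)
    print(f"{case}. Tnew = {new:.0f} s, reduction = {(old - new) / old:.1%}")
```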
1.15.2 How much is the time for INT operations reduced if the total
time is reduced by 20%?
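A sketch of one way to work this out, assuming that only the INT time changes while the FP, L/S, and branch times stay as in the table: the new INT time is the target total minus the unchanged components. The function name int_reduction is illustrative:

```python
# Time in seconds spent in each instruction type, copied from the table.
TIMES = {
    "a": {"fp": 70, "int": 85, "ls": 55, "branch": 40},
    "b": {"fp": 40, "int": 90, "ls": 60, "branch": 20},
}


def int_reduction(times, total_cut):
    """Fraction by which INT time must shrink so the total shrinks by total_cut.

    Assumes every other component keeps its original time.
    """
    target_total = sum(times.values()) * (1 - total_cut)
    others = sum(t for kind, t in times.items() if kind != "int")
    new_int = target_total - others
    return (times["int"] - new_int) / times["int"]


for case, times in TIMES.items():
    print(f"{case}. INT time reduced by {int_reduction(times, 0.20):.1%}")
```

Under this assumption, INT time drops from 85 s to 35 s in case a (about 58.8%) and from 90 s to 48 s in case b (about 46.7%).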
Solution:
a. 2 processors:
   CPIimproved fp = (2,048 – 3,816)/280 < 0 ==> not possible
b. 16 processors:
   CPIimproved fp = (256 – 462)/50 < 0 ==> not possible
1.15.5 How much must we improve the CPI of L/S instructions if we want
the program to run two times faster?
Solution:
a. 2 processors:
   CPIimproved l/s = (2,048 – 1,536)/640 = 0.8
b. 16 processors:
   CPIimproved l/s = (256 – 198)/80 = 0.725
1.15.6 How much is the execution time of the program improved if the
CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and
Branch is reduced by 30%?
Solution:
clock cycles = CPIfp × No. FP instr.
             + CPIint × No. INT instr.
             + CPIl/s × No. L/S instr.
             + CPIbranch × No. branch instr.
Tcpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6;
CPIl/s = 0.7 × 4 = 2.8; CPIbranch = 0.7 × 2 = 1.4
2 processors:  Tcpu (before improv.) = 2.048 s;
               Tcpu (after improv.) = 1.370 s
16 processors: Tcpu (before improv.) = 0.256 s;
               Tcpu (after improv.) = 0.171 s