
CSE 3666 Introduction to Computer Architecture

Mid-term Exam No. 1



October 4, 2012 (75 minutes, in-class)


Your Name: _______________________ Total Points: _________
Last 5 of PeopleSoft ID: _____________

There are 4 groups of questions, each worth 25 points, totaling 100 points.
Show all your work alongside your answers for full or maximum partial credit.
Note: The Exam questions have been proofread thoroughly and should be self-
explanatory. Therefore unless absolutely necessary, please refrain from asking
questions during the Exam. This helps maintain a quiet environment for everyone
during the limited time available for the Exam.

Topics covered in this Mid-term Exam No. 1:
A. Computer architecture fundamentals
B. MIPS assembly language concepts
C. Performance and power evaluations
D. MIPS instruction code implementation

Q1. These questions relate to general concepts in computer architecture. Answer and justify
your answers concisely and sufficiently.

Q1a. (10 points) Amdahl's Law assumes that there is a fraction of the running time, e.g. o, that
can be improved by a factor p, to produce an improvement in overall performance. Let T_old be
the old running time without improvement, and T_new be the running time after improvement.
Express T_new in terms of T_old, o and p. Then, write down T_old / T_new in terms of o and p.

T_new = (o * T_old) / p + (1 - o) * T_old, in units of time
T_old / T_new = 1 / (o/p + (1 - o)), unitless
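
The two expressions above can be checked numerically. The following is a minimal sketch in Python; the function names and the example values (o = 0.5, p = 2, T_old = 100) are illustrative assumptions, not part of the exam.

```python
def new_time(t_old, o, p):
    """Running time after a fraction o of t_old is improved by a factor p."""
    return (o * t_old) / p + (1 - o) * t_old


def speedup(o, p):
    """Overall speedup T_old / T_new; T_old cancels out, so it is unitless."""
    return 1 / (o / p + (1 - o))


# Hypothetical example: half of the running time is made twice as fast.
t_old = 100.0
t_new = new_time(t_old, 0.5, 2)
print(t_new)            # 75.0
print(t_old / t_new)    # matches speedup(0.5, 2) up to rounding
print(speedup(0.5, 2))
```

Dividing T_old by the computed T_new gives the same value as the closed-form speedup, which is the consistency check the question asks for.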

Q1b. (5 points) Based on your results from Q1a, comment on what happens if you could come
up with a factor p that is infinitely large. In that case, how does o affect overall performance?

T_old / T_new is theoretically limited by 1/(1 - o) as p -> infinity.
This implies that, even if p is very large, the speedup will be at most 2 if o is
not larger than 50%. This can be seen by plotting 1/(1 - o) vs. o for o in [0, 1).
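
As a rough illustration of this bound, the sketch below (Python, with illustrative function names and sample values of p chosen here, not taken from the exam) shows the speedup for o = 0.5 approaching, but never reaching, 1/(1 - o) = 2.

```python
def speedup(o, p):
    """Amdahl speedup for an improvable fraction o and improvement factor p."""
    return 1 / (o / p + (1 - o))


def speedup_limit(o):
    """Upper bound on the speedup as p grows without bound (requires o < 1)."""
    return 1 / (1 - o)


# Hypothetical sweep over increasingly large improvement factors.
for p in (2, 10, 100, 10**6):
    print(p, speedup(0.5, p))

print("limit:", speedup_limit(0.5))  # limit: 2.0
```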

Q1c. (3 points) What is the clock rate (in Hz) of a CPU with a 200 picosecond cycle time?

Clock rate = 1 / Cycle Time = 1 / (200 x 10^-12 s) = 1 / (0.2 x 10^-9 s) = 5 x 10^9 Hz = 5 GHz.
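
The same conversion can be written as a one-line helper. This is a small Python sketch; the helper name and the choice to pass the cycle time in picoseconds are assumptions made here for convenience.

```python
PS_PER_SECOND = 10**12  # picoseconds in one second


def clock_rate_hz(cycle_time_ps):
    """Clock rate in Hz for a cycle time given in picoseconds."""
    return PS_PER_SECOND / cycle_time_ps


print(clock_rate_hz(200) / 1e9, "GHz")  # 5.0 GHz
```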

Q1d. (7 points) Given that performance is measured by CPU time, explain the roles of IC
(instruction count), CPI (cycles per instruction), and CCT (clock cycle time). State at least 2
challenges hardware designers face when trying to improve performance.

CPU time = IC * CPI * CCT. CCT is usually a function of the hardware. IC and CPI, on the
other hand, are usually affected by software such as the compiler, the algorithm, and the programming
language in use. All three are affected by the ISA, and thus indirectly by the hardware.

An increase in clock rate, i.e. a smaller CCT, often drives the CPI up and increases power
consumption (trade-off 1).

Reducing the total number of clock cycles, i.e. IC * CPI, often means we need a lower clock
rate as well (trade-off 2).

Q2. These questions relate to MIPS assembly language concepts and MIPS instructions.
Q2a. (7 points) Write the following sequence of code in MIPS assembly language:

x =x +y +z - q;

Assume that x, y, z, and q are stored in registers $s1, $s2, $s3 and $s4 respectively.

add $t0, $s1, $s2    # $t0 = x + y
add $t1, $t0, $s3    # $t1 = x + y + z
sub $s1, $t1, $s4    # x = x + y + z - q
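
To sanity-check a three-instruction sequence like this one, it can be modelled in a few lines of Python. This is only a sketch: the helper names are invented here, and the model wraps results to 32 bits instead of raising the overflow exception that the real MIPS add and sub instructions can raise.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def add(a, b):
    """Model of a 32-bit add (assumption: wraps instead of trapping)."""
    return (a + b) & MASK32


def sub(a, b):
    """Model of a 32-bit subtract (assumption: wraps instead of trapping)."""
    return (a - b) & MASK32


def compute_x(x, y, z, q):
    """Evaluate x + y + z - q with two temporaries, one operation per step."""
    t0 = add(x, y)       # first temporary: x + y
    t1 = add(t0, z)      # second temporary: x + y + z
    return sub(t1, q)    # final result written back to x


print(compute_x(10, 20, 30, 5))  # 55
```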

Q2b. (5 points) In MIPS, how do we flip the bits, i.e. perform a 1's complement, on a 32-bit
value stored in a register, using one instruction? Write down that instruction with required
register operands, and justify your answer.

Since (a NOR b) is equivalent to NOT (a OR b), and (a OR 0) = a, NOR-ing a register with $zero
yields NOT a. We can therefore flip the bits in $t1 and store the result of
the flip in $t0 with this MIPS instruction: nor $t0, $t1, $zero
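
The identity behind this answer can be checked with a short Python sketch. The function name and sample value are illustrative assumptions; the mask is needed only because Python integers are not limited to 32 bits.

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so mask to 32 bits


def nor32(a, b):
    """Bitwise NOR of two 32-bit values: NOT (a OR b)."""
    return ~(a | b) & MASK32


value = 0x0000FFFF          # hypothetical register contents
flipped = nor32(value, 0)   # NOR with zero flips every bit
print(hex(flipped))         # 0xffff0000
```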

Q2c. (5 points) Given this MIPS instruction beq $s0, $s1, L1, show what should be done if
L1 is located at an address too large to be accommodated by the 16-bit branch address field.

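bne $s0, $s1, L2    # inverted condition: skip the jump when $s0 != $s1
j L1                # $s0 == $s1: j has a 26-bit target field, so it can reach L1
L2:                 # execution continues here when the branch to L1 is not taken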

Q2d. (8 points) Bitwise operations such as AND, OR, and SLL instructions are faster because
these are directly supported by hardware, as compared to * (multiplication) and % (modulus)
which are typically emulated by subroutines implemented with other MIPS instructions. Explain
why the following equivalencies hold (&, << and | represent AND, SLL and OR, respectively):
1. Offset = L_Addr & 0x03ff IS EQUIVALENT TO Offset = L_Addr % 1024
2. Addr = (Frame << 10) | Offset IS EQUIVALENT TO Addr = Frame * 1024 + Offset
where L_Addr, Addr, and Frame represent 16-bit-wide registers, and Offset is 10-bit wide.

L_Addr & 0x03FF will give you the 10 least significant bits of L_Addr, since 0x03FF is equal to
0000 0011 1111 1111 (with 10 bits of value 1). Because 1024 = 2^10, these 10 bits are exactly the
remainder of dividing by 1024, so the result is the same as L_Addr % 1024.
Frame << 10 will multiply Frame by 2^10 = 1024. The | Offset bitwise OR will fill in the 10 least
significant bits of Addr (all with value 0 after the Frame << 10 operation) with the
bits from Offset. Since no bit positions overlap, the OR produces no carries, so this is the same
as adding Offset to Frame * 1024.
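
Both equivalences can be confirmed by brute force. The Python sketch below uses invented function names and assumes that Frame holds at most 6 significant bits, so that Frame << 10 still fits in a 16-bit register.

```python
def check_offset_equivalence():
    """Check L_Addr & 0x03FF == L_Addr % 1024 for every 16-bit L_Addr."""
    return all((l_addr & 0x03FF) == (l_addr % 1024)
               for l_addr in range(1 << 16))


def check_addr_equivalence(frame_bits=6):
    """Check (Frame << 10) | Offset == Frame * 1024 + Offset.

    Assumption: Frame uses at most frame_bits bits, so the shifted value
    does not overflow 16 bits; Offset covers all 10-bit values.
    """
    return all(((frame << 10) | offset) == (frame * 1024 + offset)
               for frame in range(1 << frame_bits)
               for offset in range(1 << 10))


print(check_offset_equivalence(), check_addr_equivalence())  # True True
```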
Q3. The following questions pertain to two CPU designs, X and Y, both based on CMOS IC
technology and sharing the same instruction set architecture. X has a clock cycle time of 250 ps
and a measured CPI of 1.0 for a Java program. Y, on the other hand, has a clock cycle time of
500 ps and a measured CPI of 1.5 for the same Java program. Furthermore, under normal
conditions, Y operates at 15% less voltage and 20% less capacitive load compared to X.
Q3a. (10 points) Which design, X or Y, is faster, i.e. gives better performance over the other
one, and by how much, using CPU time as the measure?

Let I be the Instruction Count (X and Y share the same ISA), then:
CPU Time X = I * CPI_X * Cycle Time X = I * 1.0 * 250 ps = I * 250 ps
CPU Time Y = I * CPI_Y * Cycle Time Y = I * 1.5 * 500 ps = I * 750 ps
X is faster than Y by a factor of (I * 750) / (I * 250) = 3
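
The comparison can be reproduced with a small Python sketch. The function name is an assumption, and the instruction count of one million is an arbitrary illustrative value, since it cancels in the ratio.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds: IC * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


ic = 1_000_000  # hypothetical instruction count; it cancels in the ratio
time_x = cpu_time_ps(ic, 1.0, 250)
time_y = cpu_time_ps(ic, 1.5, 500)
print(time_y / time_x)  # 3.0
```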

Q3b. (10 points) Which design, X or Y, consumes less dynamic power over the other one, and
by how much?

Dynamic Power = C V^2 f, where C and V below denote the capacitive load and voltage of X
Power for X = C V^2 (1 / 250 ps) = 4 x 10^9 C V^2
Power for Y = (0.80 C) (0.85 V)^2 (1 / 500 ps) = 1.156 x 10^9 C V^2

Y consumes less power than X, by about 2.844 x 10^9 C V^2

In relative terms, Power Y / Power X = 1.156 / 4 = 0.289, i.e. Y uses about 29% (roughly 30%) of the power of X.
Alternatively, with f the clock frequency of X, Power for Y / Power for X = (0.80 C) (0.85 V)^2 (0.5 f) / (C V^2 f) = 0.289
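
The ratio can also be computed directly by normalising X to C = V = f = 1. This is a Python sketch using the same C V^2 f model as the solution; the function and variable names are illustrative assumptions.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power model used in this solution: C * V^2 * f."""
    return capacitive_load * voltage ** 2 * frequency


# Normalise design X to 1.0 for each quantity; Y is expressed relative to X.
power_x = dynamic_power(1.0, 1.0, 1.0)
power_y = dynamic_power(0.80, 0.85, 0.5)  # 20% less C, 15% less V, half the clock rate
ratio = power_y / power_x
print(round(ratio, 3))  # 0.289
```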

Q3c. (5 points) If you are a mobile device design engineer, which architecture, X or Y, will you
choose for your device? What if you are a server designer?
(You will receive full or maximum partial credit for any answer that makes sense. Note that Y
uses about 30% of the power of X, but X is 3 times faster than Y.)

Q4. Suppose that a new MIPS instruction, called bcp, was designed to copy a block of words
from one address to another. Assume that this instruction requires that the starting address of the
source block be in register $t1 and that the destination address be in $t2. The instruction also
requires that the number of words to copy be in $t3 (which is >0). Furthermore, assume that the
values of these registers as well as register $t4 can be destroyed in executing this instruction (so
that the registers can be used as temporaries to execute the instruction). You will most likely
need the MIPS load, store, add, and branch instructions for Q4a.

Q4a. (15 points) Use MIPS assembly code to implement block copy operation WITHOUT bcp,
i.e. emulate bcp using other existing MIPS instructions.
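loop:
lw $t4, 0($t1)         # load the next word from the source block
sw $t4, 0($t2)         # store it into the destination block
addi $t1, $t1, 4       # advance the source pointer by one word
addi $t2, $t2, 4       # advance the destination pointer by one word
addi $t3, $t3, -1      # one fewer word to copy (core MIPS has no subi; use addi with -1)
bne $t3, $zero, loop   # repeat until the word count reaches zero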
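
The behaviour of this loop can be traced with a small Python model. This is a sketch only: memory is represented as a dictionary from byte address to word value, and the function name and example addresses are assumptions made for illustration.

```python
def block_copy(memory, src, dst, count):
    """Model of the copy loop: memory maps byte addresses to word values.

    Like the assembly loop, the body runs before the test, so count must be > 0.
    """
    t1, t2, t3 = src, dst, count
    while True:
        t4 = memory[t1]   # lw: load a word from the source
        memory[t2] = t4   # sw: store it to the destination
        t1 += 4           # next source word
        t2 += 4           # next destination word
        t3 -= 1           # one fewer word to copy
        if t3 == 0:       # bne falls through when the count reaches zero
            break
    return memory


# Hypothetical memory image: three words starting at address 0x100.
mem = {0x100: 11, 0x104: 22, 0x108: 33}
block_copy(mem, 0x100, 0x200, 3)
print([mem[a] for a in (0x200, 0x204, 0x208)])  # [11, 22, 33]
```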

Q4b. (10 points) Use MIPS assembly code to implement block copy WITH this bcp instruction.
You can assume the same registers to be used as in Q4a and that bcp uses no register operands.
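la $t1, src      # source address (assuming src is a label)
la $t2, dst      # destination address (assuming dst is a label)
li $t3, count    # number of words to copy (a constant > 0)
bcp              # copy count words from src to dst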
