
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

I SEMESTER 2011-2012
CS G562 ADVANCED ARCHITECTURE & PERFORMANCE EVALUATION
CS C524 ADVANCED COMPUTER ARCHITECTURE
COMPREHENSIVE EXAM (PART A)(CLOSE BOOK)
MAX TIME: 120 Min.        DATE: 07/12/11        MM: 55/2

ID NO: NAME:

Q1. List four factors which affect the amount of exploitable ILP. [1.5]

Q2. List three major differences between the Power4 and Power5 processors. [1.5]

Q3. What is the main difference between Superscalar and VLIW architectures? [1]

Q4. What are the advantages and disadvantages of fine-grained multithreading? [1]

Q5. In SMT, is it possible to control the performance of an individual thread? Explain your answer. [1.5]

Q6. How are structural hazards avoided in the speculative Tomasulo algorithm? [1]

Q7. What are the three main problems associated with VLIW implementation? [1.5]

Q8. Find all data dependencies in the following instruction sequence. For every data dependency, identify its type and the instructions involved (for example, RAW: 2,3). [2]
1: lw $1,40($2)
2: add $2,$3,$3
3: add $1,$1,$2
4: sw $1,20($2)
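
For illustration only, a small C-level sketch of the three dependence types and the "type: i,j" notation, using a hypothetical sequence of four numbered statements (the variables are made up and are not related to the code above):

/* Four numbered statements standing in for instructions; the variables
 * play the role of registers. Exactly three dependences exist here.   */
int dependence_example(int b, int c, int e, int f, int g, int h, int i)
{
    int a, d;
    a = b + c;    /* 1: writes a, reads b and c                 */
    d = a + e;    /* 2: reads a written by 1   -> RAW: 1,2      */
    b = f + g;    /* 3: writes b read by 1     -> WAR: 1,3      */
    d = h + i;    /* 4: writes d written by 2  -> WAW: 2,4      */
    return a + b + d;
}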

Q9. Assume that you have a typical 5-stage pipelined processor that uses forwarding and stalls to resolve data hazards, resolves branches in the decode stage, and has one branch delay slot. After you apply pipeline scheduling (rearranging instructions) to the following code sequence, what is the minimum number of cycles needed to execute it? Show the rearranged sequence. [4]

Number of cycles __________________________

ORIGINAL SEQUENCE REARRANGED SEQUENCE


L1: lw R5, 0(R1)
lw R6, 0(R5)
add R7, R7, R6
addi R1, R1, #-4
bne R1, R2, L1
nop

Q10. Consider the pipelined datapath shown in the figure below. [10]


(a) Describe the difficulty in executing the following instructions in this datapath. How many stall
cycles must be inserted to correctly execute this code?
add $a3, $a1, $a2
sub $a4, $a5, $a6
beq $a3, $a4, Label

(b) Show how the number of stalls could be minimized with forwarding in the figure below. You may add new components to the datapath and strike off existing components if required. Show the location of the forwarding unit and all of its input and output signals. Give the governing equations for the forwarding unit. Assume that reading from the register file and from memory takes the same amount of time, that the register file and the ALU have the same latency, and that muxes have negligible delay. You only need to consider dependencies between branches and R-type instructions (as shown in part (a)); you do not have to worry about other types of instructions.

EQUATIONS:
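
Since the datapath figure is not reproduced here, the following is only a sketch of the classic forwarding conditions for a five-stage MIPS pipeline in which the forwarding unit drives the ALU input muxes; the pipeline-register field names (ex_mem_rd and so on) follow the usual textbook convention and are assumptions, not a transcription of the figure. If the figure resolves branches earlier (e.g., in ID), the same comparisons apply but the forwarding paths feed the branch comparator instead.

/* Sketch of the select logic for ALU operand A (operand B is symmetric).
 * Return value: 0 = use the register-file value read in ID,
 *               2 = forward from the EX/MEM pipeline register,
 *               1 = forward from the MEM/WB pipeline register.          */
struct pipe_state {
    int ex_mem_regwrite, ex_mem_rd;   /* instruction currently in MEM     */
    int mem_wb_regwrite, mem_wb_rd;   /* instruction currently in WB      */
    int id_ex_rs;                     /* first source of the instr in EX  */
};

int forward_a(const struct pipe_state *p)
{
    /* EX hazard: the most recent producer (now in MEM) wins */
    if (p->ex_mem_regwrite && p->ex_mem_rd != 0 &&
        p->ex_mem_rd == p->id_ex_rs)
        return 2;

    /* MEM hazard: an older producer (now in WB), taken only if no EX hazard fired */
    if (p->mem_wb_regwrite && p->mem_wb_rd != 0 &&
        p->mem_wb_rd == p->id_ex_rs)
        return 1;

    return 0;   /* no forwarding needed */
}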

Q11. (a) The figure below shows the pipelined datapath with forwarding, but no hazard detection unit.
Consider the following code fragment. Draw the pipeline diagram for this code showing the stalls
required for correct execution. [10]
lw $t0, 0($a0)
sw $t0, 0($t0)

(b) It is possible to modify the datapath so that the code in part (a) executes with the minimum number of stalls. Explain how this could be done, and show your changes on the diagram. Also write the equations governing the generation of any new control signals you use.
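
As background on what a hazard detection unit would check (the figure itself is not reproduced here), a minimal sketch of the classic load-use stall condition from the standard five-stage MIPS pipeline; the field names follow the usual textbook convention and are assumptions about the figure. When the condition holds, the PC and the IF/ID register are frozen and a bubble is inserted into ID/EX.

/* Load-use hazard check: stall when the instruction in EX is a load whose
 * destination register is a source of the instruction being decoded.     */
struct hazard_inputs {
    int id_ex_memread;   /* instruction in EX is a load            */
    int id_ex_rt;        /* destination register of that load      */
    int if_id_rs;        /* source registers of the instruction    */
    int if_id_rt;        /*   currently in ID                      */
};

int must_stall(const struct hazard_inputs *h)
{
    return h->id_ex_memread &&
           (h->id_ex_rt == h->if_id_rs || h->id_ex_rt == h->if_id_rt);
}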

Q12. Use register renaming to rewrite the following code segment to eliminate WAW and WAR
dependencies. Assume only two more registers R4 and R5 are available. [4]
lw R2, 0(R1)
add R3, R2, R2
sw R3, 0(R1)
lw R2, 4(R1) 
add R3, R2, R2
sw R3, 4(R1)
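
For orientation only (this is not the requested rewrite), a C-level picture of what renaming does: t and s below stand for architectural registers that get reused, and t2 and s2 are the extra names; renaming removes only the WAW/WAR name dependences, while the true RAW dependences remain.

/* Before renaming: t and s are reused, so the second group of statements
 * carries WAW/WAR name dependences on the first group.                  */
void before(const int *a, int *b)
{
    int t, s;
    t = a[0]; s = t + t; b[0] = s;
    t = a[1]; s = t + t; b[1] = s;   /* rewrites t and s */
}

/* After renaming: fresh names remove the WAW/WAR dependences, so the two
 * groups are independent and can be overlapped by the hardware.          */
void after(const int *a, int *b)
{
    int t, s, t2, s2;
    t  = a[0]; s  = t  + t;  b[0] = s;
    t2 = a[1]; s2 = t2 + t2; b[1] = s2;
}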

Q13. What are the main components of a tournament branch predictor? Explain briefly. [3]
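
A compressed sketch of the usual structure, for reference: a per-branch (local) predictor, a global-history predictor, and a selector that learns, per branch, which of the two to trust. The table sizes, the use of 2-bit saturating counters throughout, and indexing the selector by PC are all assumptions of this sketch.

#include <stdint.h>

#define ENTRIES 1024

static uint8_t  local_ctr[ENTRIES];   /* indexed by branch PC              */
static uint8_t  global_ctr[ENTRIES];  /* indexed by global history         */
static uint8_t  chooser[ENTRIES];     /* >= 2 means "trust the global one" */
static uint32_t ghr;                  /* global history register           */

static void bump(uint8_t *c, int up)
{
    if (up) { if (*c < 3) (*c)++; }   /* saturating increment */
    else    { if (*c > 0) (*c)--; }   /* saturating decrement */
}

int predict(uint32_t pc)
{
    int p_local  = local_ctr[pc % ENTRIES]   >= 2;
    int p_global = global_ctr[ghr % ENTRIES] >= 2;
    return (chooser[pc % ENTRIES] >= 2) ? p_global : p_local;
}

void update(uint32_t pc, int taken)
{
    int p_local  = local_ctr[pc % ENTRIES]   >= 2;
    int p_global = global_ctr[ghr % ENTRIES] >= 2;

    /* train the selector only when the two components disagreed */
    if (p_local != p_global)
        bump(&chooser[pc % ENTRIES], p_global == taken);

    bump(&local_ctr[pc % ENTRIES], taken);
    bump(&global_ctr[ghr % ENTRIES], taken);
    ghr = (ghr << 1) | (taken & 1);   /* shift the outcome into the history */
}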

Q14. The outcome sequence of one branch instruction was (T, T, T, NT, NT). What is the accuracy of each of the following branch predictors on this branch? [5]
(a) (1,1) correlating predictor with all entries initialized to Taken

(b) Always-not-taken predictor

(c) Two-bit predictor, assuming that the predictor starts in the strongly not-taken ("deep") state

(d) Two-bit predictor when the branch executes this pattern repeatedly forever (i.e., the steady-state accuracy)
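
As a cross-check on the style of reasoning needed for parts (c) and (d), a small simulation sketch of a single 2-bit saturating counter run once over the given outcome pattern; the state encoding (0-1 predict not taken, 2-3 predict taken) and the starting state are spelled out in the comments and match the usual textbook convention.

#include <stdio.h>

/* One 2-bit saturating counter: states 0 and 1 predict not taken,
 * states 2 and 3 predict taken. Start in state 0 (strongly / "deep"
 * not taken), as in part (c); looping the pattern many times instead
 * would give the steady-state behaviour asked for in part (d).       */
int main(void)
{
    int outcome[5] = {1, 1, 1, 0, 0};   /* T, T, T, NT, NT */
    int state = 0, correct = 0;

    for (int i = 0; i < 5; i++) {
        int predict_taken = (state >= 2);
        if (predict_taken == outcome[i])
            correct++;
        /* saturating update toward the actual outcome */
        if (outcome[i]) { if (state < 3) state++; }
        else            { if (state > 0) state--; }
    }
    printf("correct predictions: %d out of 5\n", correct);
    return 0;
}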

Q15. Briefly explain the technique of write merging in a cache write buffer. [2]
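
A minimal sketch of the idea, with made-up entry sizes: an incoming store first looks for an existing write-buffer entry covering the same aligned block and merges its bytes into it, so several narrow writes to one block occupy a single entry (and a single memory transaction) instead of several.

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES    8    /* width of one buffer entry (assumed) */
#define BUFFER_ENTRIES 4

struct wb_entry {
    bool     valid;
    uint64_t block;                 /* aligned block number     */
    uint8_t  data[BLOCK_BYTES];
    uint8_t  byte_valid;            /* bitmask of bytes written */
};

static struct wb_entry write_buffer[BUFFER_ENTRIES];

/* Buffer one stored byte; return false if the CPU must stall (buffer full). */
bool buffer_write(uint64_t addr, uint8_t byte)
{
    uint64_t block  = addr / BLOCK_BYTES;
    unsigned offset = addr % BLOCK_BYTES;

    for (int i = 0; i < BUFFER_ENTRIES; i++)        /* merge path */
        if (write_buffer[i].valid && write_buffer[i].block == block) {
            write_buffer[i].data[offset]  = byte;
            write_buffer[i].byte_valid   |= 1u << offset;
            return true;
        }

    for (int i = 0; i < BUFFER_ENTRIES; i++)        /* allocate a new entry */
        if (!write_buffer[i].valid) {
            write_buffer[i].valid        = true;
            write_buffer[i].block        = block;
            write_buffer[i].data[offset] = byte;
            write_buffer[i].byte_valid   = 1u << offset;
            return true;
        }

    return false;   /* buffer full: no entry to merge into or allocate */
}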

Q16. Consider a complete memory hierarchy with a virtually accessed (virtually indexed), physically tagged L1 cache and a paged memory system. The processor has 256 Mbytes of physical memory and a 4 Gbyte virtual address space with 4 Kbyte pages. The L1 cache is 4-way set associative with 64-byte lines. The L2 cache is a 2 Mbyte, 8-way set associative cache with 256-byte lines. Address translation is supported by an 8-entry TLB.
(a) Assuming the L1 cache has the maximum size for which it can still be indexed using only page-offset (untranslated) bits, show the breakdown of the fields of the virtual address, the physical address, the physical address as interpreted by the L1 cache, and the physical address as interpreted by the L2 cache. Clearly mark the number of bits in each address and the number of bits in each field.
(b) Neatly draw the complete address translation diagram starting from the virtual address, showing the TLB and the L1 and L2 caches. [6]
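
A sketch of the arithmetic behind part (a), under the reading of "maximum size" given above (the L1 index plus line offset must fit within the 12-bit page offset so the cache can be indexed before translation completes):

\[
\begin{aligned}
\text{Physical address} &= \log_2(256\,\mathrm{MB}) = 28\ \text{bits},\qquad
\text{Virtual address} = \log_2(4\,\mathrm{GB}) = 32\ \text{bits}\\
\text{Page offset} &= \log_2(4\,\mathrm{KB}) = 12\ \text{bits}
 \;\Rightarrow\; \text{VPN} = 20\ \text{bits},\ \text{PPN} = 16\ \text{bits}\\
\text{L1 line offset} &= \log_2 64 = 6\ \text{bits},\qquad
 \text{L1 index} \le 12 - 6 = 6\ \text{bits} \Rightarrow 64\ \text{sets}\\
\text{L1 size} &= 64 \times 4 \times 64\,\mathrm{B} = 16\,\mathrm{KB},\qquad
 \text{L1 tag} = 28 - 12 = 16\ \text{bits}\\
\text{L2 line offset} &= \log_2 256 = 8\ \text{bits},\qquad
 \text{L2 index} = \log_2\!\bigl(2\,\mathrm{MB}/(256\,\mathrm{B}\times 8)\bigr) = 10\ \text{bits}\\
\text{L2 tag} &= 28 - 8 - 10 = 10\ \text{bits}
\end{aligned}
\]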
