
CS151B/EE116C Homework 6

Session: Winter 2015


Total Points: 60

Instructor: Prof. Lei HE


Due Date: 23rd Feb (before class starts)

1. Cache Configuration [5 points]


Consider a 48 KB cache with 16-byte blocks. For each question below, describe
how it can be done or explain why it cannot. Remember that 1K = 2^10.
a) Can you make it fully associative?
b) How about a set-associative cache?
c) And a direct-mapped cache?
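
As a starting point for the questions above, the block count follows directly from the capacity and the block size. The sketch below is only an illustration: the function names are hypothetical and not part of the course material. It computes the number of blocks and checks whether a given count is a power of two, which conventional bit-field indexing assumes.

```python
def num_blocks(cache_bytes: int, block_bytes: int) -> int:
    """Number of blocks in a cache of the given capacity."""
    if cache_bytes % block_bytes != 0:
        raise ValueError("capacity must be a multiple of the block size")
    return cache_bytes // block_bytes


def is_power_of_two(n: int) -> bool:
    """True when n values can be selected by a plain bit field of the address."""
    return n > 0 and (n & (n - 1)) == 0


def num_sets(cache_bytes: int, block_bytes: int, ways: int) -> int:
    """Number of sets for a given associativity (ways = blocks per set)."""
    blocks = num_blocks(cache_bytes, block_bytes)
    if blocks % ways != 0:
        raise ValueError("ways must divide the number of blocks")
    return blocks // ways


# 48 KB capacity with 16-byte blocks, using 1K = 2^10 as stated above.
blocks = num_blocks(48 * 2**10, 16)
print(blocks, is_power_of_two(blocks))
```

Calling num_sets with different values of ways and passing the result to is_power_of_two is one way to explore which organizations index cleanly.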

2. Locality [5 points]
a) Describe the general characteristics of a program that would exhibit very
little temporal and spatial locality with regard to instruction fetches. Provide
an example program (pseudocode is fine).
b) Describe the general characteristics of a program that would exhibit very
high amounts of temporal locality but very little spatial locality with regard to
instruction fetches. Provide an example program (pseudocode is fine).
c) Describe the general characteristics of a program that would exhibit very
little temporal locality but very high amounts of spatial locality with regard to
instruction fetches. Provide an example program (pseudocode is fine).

3. Here is a series of address references given as word addresses: 2, 3, 11, 16, 21, 13, 64,
48, 19, 11, 3, 22, 4, 27, 6, and 11. Using this series of references, show the hits and misses
and the final cache contents for a direct-mapped cache with four-word blocks and a
total size of 16 words.
[10 points]
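
The bookkeeping for a direct-mapped cache can be sketched in a few lines. The function below is a minimal illustration, not a reference solution: its name and parameters are hypothetical, and the trace in the usage example is made up rather than taken from the problem. It assumes the block address is the word address divided by the block size, and the line index is the block address modulo the number of lines.

```python
def simulate_direct_mapped(word_addrs, words_per_block=4, total_words=16):
    """Return (outcomes, lines) for a direct-mapped cache.

    outcomes is a list of (address, "hit" | "miss") pairs; lines holds the
    block address stored in each cache line (None if the line is empty).
    """
    num_lines = total_words // words_per_block
    lines = [None] * num_lines
    outcomes = []
    for addr in word_addrs:
        block = addr // words_per_block   # which memory block holds this word
        index = block % num_lines         # which cache line that block maps to
        if lines[index] == block:
            outcomes.append((addr, "hit"))
        else:
            outcomes.append((addr, "miss"))
            lines[index] = block          # replace whatever was there
    return outcomes, lines


# Made-up trace for illustration only.
outcomes, lines = simulate_direct_mapped([0, 1, 4, 16, 0])
for addr, result in outcomes:
    print(addr, result)
print(lines)
```

In the made-up trace, addresses 0 and 16 map to the same line, so the second reference to 0 misses even though it was loaded earlier.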
4. Assume we have the following code:

        lw  $1, 40($6)
        beq $2, $0, Label
        sw  $6, 50($2)
Label:  add $2, $3, $4
        sw  $3, 50($4)

Assume $2 = $0.

(a) For this problem, assume that all branches are perfectly predicted, eliminating
all control hazards, and that no delay slots are used. If we change load/store
instructions to use a register without an offset as the address, these
instructions no longer need to use the ALU. As a result, MEM and EX can be
overlapped and the pipeline has only 4 stages. Change the code to
accommodate this changed ISA. Assuming this change does not affect clock
cycle time, what speedup is achieved in this instruction sequence?
[10 points]
(b) Assuming stall on branch and no delay slots, what speedup is achieved on this
code if branch outcomes are determined in the ID stage, relative to the
execution where branch outcomes are determined in the EX stage?
[10 points]
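
Both parts compare cycle counts of the same code under two pipeline organizations. A minimal sketch of that arithmetic follows, assuming an ideal pipeline in which the first instruction takes one cycle per stage and each later instruction completes one cycle after the previous one, plus any stall cycles. The function names and the numbers in the usage example are hypothetical and are not the values for this problem.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles to run a sequence on an ideal pipeline, plus stall cycles."""
    return (num_stages - 1) + num_instructions + stall_cycles


def speedup(old_cycles, new_cycles):
    """Speedup of the new execution relative to the old one (same clock)."""
    return old_cycles / new_cycles


# Hypothetical numbers: 10 instructions on a 5-stage pipeline,
# with 2 stall cycles before a change and 1 stall cycle after it.
before = pipeline_cycles(10, 5, stall_cycles=2)
after = pipeline_cycles(10, 5, stall_cycles=1)
print(before, after, speedup(before, after))
```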
5. Consider three processors with different cache configurations:
Cache 1: Direct-mapped with one-word blocks
Cache 2: Direct-mapped with four-word blocks
Cache 3: Two-way set associative with four-word blocks
The following miss rate measurements have been made:
Cache 1: Instruction miss rate is 4%; data miss rate is 6%
Cache 2: Instruction miss rate is 2%; data miss rate is 4%
Cache 3: Instruction miss rate is 2%; data miss rate is 3%
For these processors, one-half of the instructions contain a data reference. Assume
that the cache miss penalty is 6 + Block size in words. The CPI for this workload
was measured on a processor with cache 1 and was found to be 2.0. Determine
which processor spends the most cycles on cache misses. [20 points]
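
One way to organize the comparison is to compute the miss cycles per instruction for each cache. The sketch below is an illustration under stated assumptions, not a reference solution: it assumes one instruction fetch per instruction, 0.5 data references per instruction, and a miss penalty of 6 plus the block size in words, as given above. The function and variable names are hypothetical.

```python
def miss_cycles_per_instruction(i_miss_rate, d_miss_rate, block_words,
                                data_refs_per_instr=0.5, base_penalty=6):
    """Average stall cycles per instruction caused by cache misses."""
    penalty = base_penalty + block_words
    instruction_part = i_miss_rate * penalty
    data_part = data_refs_per_instr * d_miss_rate * penalty
    return instruction_part + data_part


# (instruction miss rate, data miss rate, block size in words)
caches = {
    "Cache 1": (0.04, 0.06, 1),
    "Cache 2": (0.02, 0.04, 4),
    "Cache 3": (0.02, 0.03, 4),
}
for name, (i_rate, d_rate, words) in caches.items():
    print(name, miss_cycles_per_instruction(i_rate, d_rate, words))
```

This gives stall cycles per instruction only. Whether the intended comparison also involves the measured CPI of 2.0 is left to the reader's interpretation of the problem.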
