In this question, consider the following series of address references (given as word addresses):
4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21
For each of the following cache organizations, show the content of the cache after each memory
reference and indicate whether the reference is a hit or a miss. Use [tag, M(address), ...] to
describe the content of each entry. For example [4,M(46)] indicates that the entry contains tag=4
and the data from memory location 46. Similarly, [4,M(46),M(47)] indicates that the entry
contains a block of two words from memory locations 46 and 47. As discussed in class, avoid
drawing the cache after each reference by drawing only one cache and indicating that an entry E1
is replaced by E2 by crossing out E1 and writing E2 next to it. Assume Least Recently Used
replacement and assume that the cache is initially empty (invalid entries).
(a) a direct mapped cache with 16 one-word blocks.

An address's tag is floor(A/16); the index is A % 16.

Index   Content of cache (ordered from oldest to most recent)
0       [1,M(16)]
1       [0,M(1)] · [1,M(17)] · [0,M(1)]
2
3
4       [0,M(4)] · [1,M(20)] · [0,M(4)] · [1,M(20)]
5       [0,M(5)] · [1,M(21)] · [0,M(5)] · [1,M(21)]
6
7
8       [0,M(8)]
9
10
11
12      [2,M(44)]
13      [2,M(45)]
14
15

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss),
4(miss), 20(miss), 21(miss), 1(miss), 20(hit), 5(miss), 21(miss)
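
The hit/miss sequence for part (a) can be checked mechanically. The following is a minimal Python sketch, assuming the tag and index formulas given above; the names `REFS` and `simulate_direct_mapped` are illustrative and not part of the original solution.

```python
# Word-address reference stream from the question.
REFS = [4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21]


def simulate_direct_mapped(refs, num_blocks=16):
    """Return a list of (address, 'hit' | 'miss') for one-word blocks."""
    cache = {}  # index -> tag currently stored; a missing key is an invalid entry
    outcomes = []
    for addr in refs:
        # tag = floor(A / 16), index = A % 16 when num_blocks is 16
        tag, index = divmod(addr, num_blocks)
        if cache.get(index) == tag:
            outcomes.append((addr, "hit"))
        else:
            cache[index] = tag  # a direct mapped cache has only one candidate to replace
            outcomes.append((addr, "miss"))
    return outcomes


outcomes = simulate_direct_mapped(REFS)
print(", ".join(f"{addr}({result})" for addr, result in outcomes))
```

Because each address maps to exactly one entry, no LRU bookkeeping is needed here; a reference hits only if the most recent address with the same index also had the same tag.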

(b) a direct mapped cache with two-word blocks and total size of 16 words.

For this part, cache blocks are twice as large, so index 0 would hold data held by
indices 0 and 1 in part a. An address's index is floor(A/2) % 8. The tags are
unchanged from a.
Index   Content of cache (ordered from oldest to most recent)
0       [0,M(0),M(1)] · [1,M(16),M(17)] · [0,M(0),M(1)]
1
2       [0,M(4),M(5)] · [1,M(20),M(21)] · [0,M(4),M(5)] · [1,M(20),M(21)] · [0,M(4),M(5)] · [1,M(20),M(21)]
3
4       [0,M(8),M(9)]
5
6       [2,M(44),M(45)]
7

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit),
20(miss), 21(hit), 1(miss), 20(hit), 5(miss), 21(miss)
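
To see where a given address lands in the part (b) cache, the address decomposition can be written out directly. This is a small sketch under the formulas stated above (index = floor(A/2) % 8, tag = floor(A/16)); the helper name `locate` and the constants are assumptions made for illustration.

```python
BLOCK_WORDS = 2  # words per block in part (b)
NUM_BLOCKS = 8   # 16 words total / 2 words per block


def locate(addr):
    """Return (index, entry_string) for a word address in the part (b) cache."""
    block_number = addr // BLOCK_WORDS
    index = block_number % NUM_BLOCKS
    tag = block_number // NUM_BLOCKS  # equivalent to addr // 16
    base = block_number * BLOCK_WORDS  # first word address held in the block
    words = ",".join(f"M({base + i})" for i in range(BLOCK_WORDS))
    return index, f"[{tag},{words}]"


for addr in (4, 20, 17, 44):
    print(addr, locate(addr))
```

Addresses 4 and 20 both map to index 2 with different tags, which is why that entry is replaced so often in the table.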

(c) a 2-way associative cache with one-word blocks and total size of 16 words

Two-way set associativity means the set at index 0 holds blocks that
would have mapped to indices 0 and 8 in part a. The index is A % 8 now, and the tag
is floor(A/8). A set holding addresses 6 and 14, where 6 is the most recently
used, looks like [0,M(6)|1,M(14)].

Index   Content of first way                 Content of second way
0       [1,M(8)]                             [2,M(16)]
1       [0,M(1)]                             [2,M(17)]
2
3
4       [0,M(4)] · [5,M(44)] · [2,M(20)]     [2,M(20)] · [0,M(4)]
5       [0,M(5)] · [2,M(21)]                 [5,M(45)] · [0,M(5)]
6
7

4(miss), 1(miss), 20(miss), 5(miss), 8(miss), 17(miss), 16(miss), 44(miss), 45(miss), 4(miss),
20(miss), 21(miss), 1(hit), 20(hit), 5(miss), 21(hit)

(d) a 2-way associative cache with two-word blocks and total size of 16 words

With both 2-way set associativity and 2-word blocks, there are only 4 sets or valid
indices. An address's index is floor(A/2) % 4; its tag is floor(A/8).

Index   Content of first way                 Content of second way
0       [0,M(0),M(1)] · [2,M(16),M(17)]      [1,M(8),M(9)] · [0,M(0),M(1)]
1
2       [0,M(4),M(5)]                        [2,M(20),M(21)] · [5,M(44),M(45)] · [2,M(20),M(21)]
3

4(miss), 1(miss), 20(miss), 5(hit), 8(miss), 17(miss), 16(hit), 44(miss), 45(hit), 4(hit), 20(miss),
21(hit), 1(miss), 20(hit), 5(hit), 21(hit)
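
The set associative cases (c) and (d) can be reproduced with a small LRU simulator. The sketch below is one possible way to model it, not the method used in class: each set is kept as a list of tags ordered from least to most recently used. The function name `simulate` and its parameters are hypothetical.

```python
REFS = [4, 1, 20, 5, 8, 17, 16, 44, 45, 4, 20, 21, 1, 20, 5, 21]


def simulate(refs, total_words=16, block_words=1, ways=1):
    """Simulate an LRU set associative cache; return 'hit'/'miss' per reference."""
    num_sets = total_words // (block_words * ways)
    # One list of tags per set, ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    results = []
    for addr in refs:
        block = addr // block_words
        index, tag = block % num_sets, block // num_sets
        lru_order = sets[index]
        if tag in lru_order:
            lru_order.remove(tag)  # re-appended below as most recently used
            results.append("hit")
        else:
            if len(lru_order) == ways:
                lru_order.pop(0)  # evict the least recently used block
            results.append("miss")
        lru_order.append(tag)
    return results


part_c = simulate(REFS, block_words=1, ways=2)
part_d = simulate(REFS, block_words=2, ways=2)
for name, res in (("c", part_c), ("d", part_d)):
    print(name, ", ".join(f"{a}({r})" for a, r in zip(REFS, res)))
```

With `ways=1` the same function degenerates to a direct mapped cache, so it can also be used to cross-check parts (a) and (b).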


   

Compute the total number of bits required to implement each of the caches in question 1.
Assume that each memory word is 32 bits long and that the entire memory contains 512KB (128
Kilo words). That is, the address of a word is 17 bits long. Note that the number of bits needed to
implement the cache represents the total amount of memory needed for storing all of the data,
tags and valid bits.

To hold the data for all parts, we need 16 32-bit words, which is 512 bits. The difference for each
cache organization is in the tag bits, valid bits, and bits to store LRU information.
- Index bits do not need to be stored; their count is log2(number of sets), which for a direct
  mapped cache equals log2(number of cache lines).
- Tag bits are the remaining address bits needed to uniquely identify the block:
  (17 - index bits - offset bits) for each cache entry.
- We need one valid bit for each cache entry.
- For the set associative caches, we need log2(associativity) LRU bits for each block.
  In this case, 1 bit per entry.

a)
- Tag bits: (17-4) * 16 = 208
- Valid bits: 16
- Total: 512 + 208 + 16 = 736 bits
b)
- Tag bits: (17-3-1) * 8 = 104
- Valid bits: 8
- Total: 512 + 104 + 8 = 624 bits
c)
- Tag bits: (17-3) * 16 = 224
- Valid bits: 16
- LRU bits: 16
- Total: 512 + 224 + 16 + 16 = 768 bits
d)
- Tag bits: (17-2-1) * 8 = 112
- Valid bits: 8
- LRU bits: 8
- Total: 512 + 112 + 8 + 8 = 640 bits

Note that you are not expected to take into account the LRU overhead for this question. In this
case, the answers for (c) and (d) change to 752 and 632, respectively.
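
The bit counts above follow one formula, which can be sketched in Python as a check. The function `cache_bits` and its parameters are illustrative names; it assumes a 17-bit word address, 32-bit words, a 16-word cache, and (as in the solution) log2(associativity) LRU bits per entry.

```python
ADDRESS_BITS = 17  # word address width given in the question
WORD_BITS = 32
TOTAL_WORDS = 16


def log2_exact(n):
    """log2 for a power of two, using integer arithmetic only."""
    return n.bit_length() - 1


def cache_bits(block_words, ways, count_lru=True):
    """Total storage bits: data + per-entry tag, valid and (optionally) LRU bits."""
    entries = TOTAL_WORDS // block_words
    num_sets = entries // ways
    index_bits = log2_exact(num_sets)      # not stored, only used to size the tag
    offset_bits = log2_exact(block_words)  # selects the word within a block
    tag_bits = ADDRESS_BITS - index_bits - offset_bits
    lru_bits = log2_exact(ways) if count_lru else 0
    per_entry = tag_bits + 1 + lru_bits    # +1 for the valid bit
    return TOTAL_WORDS * WORD_BITS + entries * per_entry


for name, (block_words, ways) in {"a": (1, 1), "b": (2, 1), "c": (1, 2), "d": (2, 2)}.items():
    print(name, cache_bits(block_words, ways), cache_bits(block_words, ways, count_lru=False))
```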


   
Consider a program in which 40% of the instructions are memory load or store instructions, and
assume that the CPI for this machine is 1 when the data and instructions are always found in the
cache.

1. Assume that the cache miss penalty is 50 cycles. What is the effective CPI if the
instruction cache miss rate is 4% and the data cache miss rate is 6%?
2. Assume that an L2 cache is added to the system, and that the hit time for the L2 cache is
10 cycles and its miss penalty is 40 cycles. What would be the effective CPI if 60% of the
references to the L2 cache (the misses from L1) are L2 hits?

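1. When we miss in either cache, we need to stall 50 cycles. Every instruction is fetched, but
only 40% of instructions access data, so the data miss rate is weighted by 0.4.

CPI = CPI(base) + CPI(inst) + CPI(data)
    = 1 + 0.04*50 + 0.06*0.4*50
    = 4.2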
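
As a quick numeric check of part 1, the CPI formula can be written as a short function. This is a minimal sketch; `effective_cpi` and its parameter names are assumptions chosen for readability, not notation from the course.

```python
def effective_cpi(base_cpi, inst_miss_rate, data_miss_rate, mem_fraction, miss_penalty):
    """Base CPI plus stall cycles per instruction from instruction and data misses."""
    inst_stalls = inst_miss_rate * miss_penalty                 # every instruction is fetched
    data_stalls = mem_fraction * data_miss_rate * miss_penalty  # only loads/stores access data
    return base_cpi + inst_stalls + data_stalls


cpi = effective_cpi(base_cpi=1.0, inst_miss_rate=0.04, data_miss_rate=0.06,
                    mem_fraction=0.40, miss_penalty=50)
print(round(cpi, 3))
```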

2. Here we first get the average miss time (T) for an L1 cache (in part 1 it was 50 cycles).
Every L1 miss pays the L2 hit time, and the 40% of L2 references that miss also pay the L2
miss penalty:

T = T(L2 hit) + (L2 miss rate) * T(L2 miss)
  = 10 + (1-0.6)*40
  = 26

Now we repeat the steps in part 1, but we use 26 instead of 50 for the expected miss
latency:

CPI = CPI(base) + CPI(inst) + CPI(data)
    = 1 + 0.04*26 + 0.06*0.4*26
    = 2.664, or about 2.7
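
The two-level calculation can be sketched the same way: first the average L1 miss penalty with an L2 cache, then the CPI. The helper name `l1_miss_penalty` is hypothetical.

```python
def l1_miss_penalty(l2_hit_time, l2_hit_rate, l2_miss_penalty):
    """Average cycles to service an L1 miss when an L2 cache is present."""
    return l2_hit_time + (1 - l2_hit_rate) * l2_miss_penalty


penalty = l1_miss_penalty(l2_hit_time=10, l2_hit_rate=0.6, l2_miss_penalty=40)

# Same CPI formula as in part 1, with the new average miss penalty.
cpi_with_l2 = 1 + 0.04 * penalty + 0.40 * 0.06 * penalty
print(round(penalty, 3), round(cpi_with_l2, 3))
```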


   

Do problems 5.6.4, 5.6.5 and 5.6.6 from the textbook.

5.6.4) What's the optimal block size for a miss latency of 20 * B cycles?

Just minimize the average latency for a memory access (miss rate * miss penalty)

a)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.08 * 20 * 8              12.8
16               0.03 * 20 * 16             9.6    (optimal)
32               0.018 * 20 * 32            11.52
64               0.015 * 20 * 64            19.2
128              0.02 * 20 * 128            51.2

b)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.04 * 20 * 8              6.4    (optimal)
16               0.04 * 20 * 16             12.8
32               0.03 * 20 * 32             19.2
64               0.015 * 20 * 64            19.2
128              0.02 * 20 * 128            51.2

5.6.5) What's the optimal block size for a miss latency of 24+B cycles?

a)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.08 * (24+8)              2.56
16               0.03 * (24+16)             1.2
32               0.018 * (24+32)            1.008  (optimal)
64               0.015 * (24+64)            1.32
128              0.02 * (24+128)            3.04

b)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.04 * (24+8)              1.28   (optimal)
16               0.04 * (24+16)             1.6
32               0.03 * (24+32)             1.68
64               0.015 * (24+64)            1.32
128              0.02 * (24+128)            3.04

5.6.6) For constant miss latency, what's the optimal block size?

a)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.08 * C                   0.08C
16               0.03 * C                   0.03C
32               0.018 * C                  0.018C
64               0.015 * C                  0.015C (optimal)
128              0.02 * C                   0.02C

b)

Block size (B)   Miss rate * miss latency   Result (cycles)
8                0.04 * C                   0.04C
16               0.04 * C                   0.04C
32               0.03 * C                   0.03C
64               0.015 * C                  0.015C (optimal)
128              0.02 * C                   0.02C
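
All three problems apply the same rule: pick the block size that minimizes miss rate * miss latency. The following Python sketch runs that rule over the miss rates from the tables above; the dictionary layout and the name `best_block_size` are assumptions, and the constant latency in 5.6.6 is set to 1 because any positive constant gives the same ranking.

```python
# Miss rates per block size, taken from the tables above.
MISS_RATES = {
    "a": {8: 0.08, 16: 0.03, 32: 0.018, 64: 0.015, 128: 0.02},
    "b": {8: 0.04, 16: 0.04, 32: 0.03, 64: 0.015, 128: 0.02},
}

# Miss latency in cycles as a function of block size B.
PENALTY_MODELS = {
    "5.6.4": lambda b: 20 * b,
    "5.6.5": lambda b: 24 + b,
    "5.6.6": lambda b: 1,  # constant latency C; the value does not affect the argmin
}


def best_block_size(miss_rates, penalty):
    """Return the block size with the lowest miss rate * miss latency."""
    return min(miss_rates, key=lambda b: miss_rates[b] * penalty(b))


for problem, penalty in PENALTY_MODELS.items():
    for part, rates in MISS_RATES.items():
        print(problem, part, best_block_size(rates, penalty))
```

When latency grows linearly with block size, the small blocks win despite their higher miss rates; when latency is constant, the block size with the lowest miss rate is always best.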