
CPS 104 Computer Organization and Programming, Lecture 26: Cache Memory

March 22, 2004 Gershon Kedem http://kedem.duke.edu/cps104/Lectures


Admin.

Homework-6 is posted.
u The due date was extended to March 29. No further extensions!
u The second part of this assignment will be posted Monday, March 22.
u This assignment is harder than it looks.
u These two assignments carry a larger weight than the other homework assignments.
u Please start ASAP!!
Homework-7 will be posted today.


Review: Direct Mapped Cache

For a cache of 2^M bytes with a block size of 2^L bytes:
u There are 2^(M-L) cache blocks
u The lowest L bits of the address are the Block-Offset bits
u The next (M - L) bits are the Cache-Index
u The last (32 - M) bits are the Tag bits

Data address layout: [ Tag: 32-M bits | Cache Index: M-L bits | Block Offset: L bits ]
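As a concrete illustration of this address breakdown, here is a minimal C sketch (not from the original slides; the values of M and L are assumptions chosen to match the 1-KB, 32-byte-block example on the following slides):

#include <stdint.h>
#include <stdio.h>

/* Decompose a 32-bit address for a direct-mapped cache of 2^M bytes
 * with 2^L-byte blocks. M = 10 and L = 5 are assumed example values. */
#define M 10   /* cache size = 2^10 = 1 KB */
#define L 5    /* block size = 2^5  = 32 B */

int main(void) {
    uint32_t addr   = 0x00014028u;                          /* arbitrary example */
    uint32_t offset = addr & ((1u << L) - 1);               /* lowest L bits     */
    uint32_t index  = (addr >> L) & ((1u << (M - L)) - 1);  /* next M - L bits   */
    uint32_t tag    = addr >> M;                            /* upper 32 - M bits */
    printf("tag = 0x%x, index = 0x%x, offset = 0x%x\n", tag, index, offset);
    return 0;
}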

Example: 1-KB Cache with 32B blocks:
u Cache Index = (<Address> mod 1024) / 32
u Block-Offset = <Address> mod 32
u Tag = <Address> / 1024

Address layout: [ Tag: 22 bits | Cache Index: 5 bits | Block Offset: 5 bits ]

(Diagram: a direct-mapped cache with 32 cache blocks; each entry holds a valid bit, a 22-bit cache tag, and a 32-byte data block, Byte 31 ... Byte 0.)

Note: 1K = 2^10 = 1024 and 2^5 = 32.

Example: 1KB Direct Mapped Cache with 32B Blocks

For a 1024-byte (2^10) cache with 32-byte blocks:
u The uppermost 22 = (32 - 10) address bits are the Cache Tag
u The lowest 5 address bits are the Byte Select (Block Size = 2^5)
u The next 5 address bits (bit 5 - bit 9) are the Cache Index

(Diagram: an example address with Cache Tag = 0x50, Cache Index = 0x01, Byte Select = 0x00. The Valid Bit and Cache Tag are stored as part of the cache state, alongside each entry's 32-byte data block; the 32 entries together cover Byte 0 ... Byte 1023.)

Example: 1K Direct Mapped Cache


(Diagram: an access with Cache Tag = 0x0002fe, Cache Index = 0x00, Byte Select = 0x00. Entry 0 has Valid Bit = 0, so the tag comparison cannot succeed: Cache Miss.)

Example: 1K Direct Mapped Cache


(Diagram: the same access after the miss is serviced. Entry 0 now has Valid Bit = 1, Cache Tag = 0x0002fe, and holds the new block of data brought in from memory.)

Example: 1K Direct Mapped Cache


(Diagram: an access with Cache Tag = 0x000050, Cache Index = 0x01, Byte Select = 0x08. Entry 1 is valid and its stored tag 0x000050 matches: Cache Hit; the Byte Select picks the requested byte out of the block.)

Example: 1K Direct Mapped Cache


(Diagram: an access with Cache Tag = 0x002450, Cache Index = 0x02, Byte Select = 0x04. Entry 2 is valid but its stored tag is 0x004440, which does not match: Cache Miss.)

Example: 1K Direct Mapped Cache


(Diagram: after the miss is serviced, entry 2 is overwritten: its Cache Tag becomes 0x002450 and a new block of data replaces the old contents.)
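The walkthrough on the last five slides can be reproduced with a tiny software model. The following is a hedged sketch (not the lecture's code): a 32-entry direct-mapped cache that tracks only valid bits and tags; the pre-filled entries and the concrete addresses are assumptions chosen to mirror the slides.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 32                 /* 1-KB cache with 32-byte blocks */

typedef struct { bool valid; uint32_t tag; } Line;

static Line cache[NBLOCKS];

/* Returns true on a hit; on a miss, fills the entry with the new tag. */
static bool cache_access(uint32_t addr) {
    uint32_t index = (addr >> 5) & (NBLOCKS - 1);   /* bits 5..9   */
    uint32_t tag   = addr >> 10;                    /* bits 10..31 */
    if (cache[index].valid && cache[index].tag == tag)
        return true;
    cache[index].valid = true;                      /* fill on miss */
    cache[index].tag   = tag;
    return false;
}

int main(void) {
    /* Pre-fill entries 1 and 2 to mirror the starting state in the slides. */
    cache[1] = (Line){ true, 0x000050 };
    cache[2] = (Line){ true, 0x004440 };

    uint32_t addrs[] = { 0x000BF800u,   /* tag 0x0002fe, index 0x00 -> miss */
                         0x00014028u,   /* tag 0x000050, index 0x01 -> hit  */
                         0x00914044u }; /* tag 0x002450, index 0x02 -> miss */
    for (int i = 0; i < 3; i++)
        printf("0x%08x -> %s\n", addrs[i], cache_access(addrs[i]) ? "hit" : "miss");
    return 0;
}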

Block Size Tradeoff

In general, a larger block size takes advantage of spatial locality, BUT:
u A larger block size means a larger miss penalty: it takes a longer time to fill up the block.
u Too few cache blocks: if the block size is too big relative to the cache size, the miss rate will go up (fewer blocks compromises temporal locality).
In general, Average Access Time:
u Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate

(Charts: as block size grows, Miss Rate first falls by exploiting spatial locality, then rises as fewer blocks compromise temporal locality; Miss Penalty grows with block size; Average Access Time reaches a minimum and then increases because of the increased miss penalty and miss rate.)
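A quick worked example of the average-access-time formula above; the hit time, miss rate, and miss penalty are assumed numbers for illustration only:

#include <stdio.h>

int main(void) {
    /* Assumed example parameters: 1-cycle hit, 5% miss rate, 40-cycle penalty. */
    double hit_time = 1.0, miss_rate = 0.05, miss_penalty = 40.0;
    double avg = hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
    printf("Average access time = %.2f cycles\n", avg);   /* 0.95 + 2.00 = 2.95 */
    return 0;
}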

A N-way Set Associative Cache

N-way set associative: N entries for each Cache Index
u N direct mapped caches operating in parallel
Example: a two-way set associative cache
u The Cache Index selects a set from the cache
u The two tags in the set are compared in parallel
u Data is selected based on the tag comparison result (see the sketch below)

(Diagram: the Cache Index selects one set; the address tag (Adr Tag) is compared against the stored Cache Tag of each way's Cache Block 0, the two compare results are ORed to produce Hit, and a mux driven by the select signals SEL0/SEL1 picks the matching way's Cache Block as the output.)
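A hedged C sketch of the two-way lookup just described (not from the slides; the set count assumes a 1-KB cache with 32-byte blocks): the index selects a set, both tags are compared, and the matching way supplies the block.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define NSETS 16                 /* assumed: 1 KB cache, 2 ways, 32-B blocks */

typedef struct { bool valid; uint32_t tag; uint8_t data[32]; } Way;
typedef struct { Way way[2]; } Set;

static Set cache[NSETS];

/* Returns a pointer to the matching 32-byte block, or NULL on a miss. */
static uint8_t *lookup(uint32_t addr) {
    uint32_t index = (addr >> 5) % NSETS;   /* cache index selects the set        */
    uint32_t tag   = addr / (32u * NSETS);  /* remaining upper bits are the tag   */
    for (int w = 0; w < 2; w++) {           /* the two tags compared "in parallel" */
        Way *way = &cache[index].way[w];
        if (way->valid && way->tag == tag)
            return way->data;               /* hit: this way supplies the block   */
    }
    return NULL;                            /* miss */
}

int main(void) {
    cache[3].way[1] = (Way){ .valid = true, .tag = 7 };  /* plant one block */
    uint32_t addr = 7u * (32u * NSETS) + 3u * 32u;       /* tag 7, index 3  */
    printf("%s\n", lookup(addr) ? "hit" : "miss");       /* prints "hit"    */
    return 0;
}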

Advantages of Set associative cache

u Higher hit rate for the same cache size.
u Fewer conflict misses.
u Can have a larger cache but keep the index small (the same size as the virtual page index).

Disadvantage of Set Associative Cache

N-way Set Associative Cache versus Direct Mapped Cache:
u N comparators vs. 1
u Extra MUX delay for the data
u Data comes AFTER the Hit/Miss decision and set selection
In a direct mapped cache, the Cache Block is available BEFORE Hit/Miss:
u Possible to assume a hit and continue; recover later if it was a miss.

(Diagram: the same two-way set associative organization as before; the data mux cannot select a way until both tag comparisons have completed.)

Another Extreme Example: Fully Associative Cache

Fully Associative Cache -- push the set associative idea to its limit!
u Forget about the Cache Index
u Compare the Cache Tags of all cache entries in parallel
u Example: with 32B blocks, we need N 27-bit comparators
By definition: Conflict Miss = 0 for a fully associative cache.

(Diagram: the address splits into a 27-bit Cache Tag and a 5-bit Byte Select, e.g. 0x01; every entry's valid bit and Cache Tag are compared against the address tag simultaneously.)

Sources of Cache Misses

Compulsory (cold start or process migration, first reference): the first access to a block
u Cold fact of life: not a whole lot you can do about it
Conflict (collision): multiple memory locations mapped to the same cache location
u Solution 1: increase cache size
u Solution 2: increase associativity
Capacity: the cache cannot contain all blocks accessed by the program
u Solution: increase cache size
Invalidation: another process (e.g., I/O) updates memory


Sources of Cache Misses

                    Direct Mapped    N-way Set Associative    Fully Associative
Cache Size          Big              Medium                   Small
Compulsory Miss     Same             Same                     Same
Conflict Miss       High             Medium                   Zero
Capacity Miss       Low(er)          Medium                   High
Invalidation Miss   Same             Same                     Same

Note: If you are going to run billions of instructions, Compulsory Misses are insignificant.


The Need to Make a Decision!

Direct Mapped Cache:
u Each memory location can only be mapped to 1 cache location
u No need to make any decision :-)
u The current item replaces the previous item in that cache location
N-way Set Associative Cache:
u Each memory location has a choice of N cache locations
Fully Associative Cache:
u Each memory location can be placed in ANY cache location
Cache miss in an N-way Set Associative or Fully Associative Cache:
u Bring in the new block from memory
u Throw out a cache block to make room for the new block
u We need to make a decision on which block to throw out!


Cache Block Replacement Policy


Random Replacement:
u Hardware randomly selects a cache block out of the set and replaces it.
Least Recently Used (LRU):
u Hardware keeps track of the access history.
u Replace the entry that has not been used for the longest time.
u For a two-way set associative cache, one bit per set is enough for LRU replacement.
Example of a simple pseudo-LRU implementation (sketched in code below):
u Assume 64 fully associative entries.
u A hardware replacement pointer points to one cache entry (Entry 0 .. Entry 63).
u Whenever an access is made to the entry the pointer points to: move the pointer to the next entry. Otherwise: do not move the pointer.

(Diagram: the replacement pointer sweeps through Entry 0, Entry 1, ..., Entry 63.)
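A minimal C sketch of the pointer-based pseudo-LRU scheme described above (the 64-entry count is from the slide; the code structure is an assumption, not the lecture's design):

#include <stdint.h>

#define NENTRIES 64

static uint32_t tags[NENTRIES];
static int replacement_ptr = 0;        /* points at the next candidate victim */

/* On an access to entry i: if i is the entry the pointer points to,
 * move the pointer to the next entry; otherwise do not move it. */
void touch(int i) {
    if (i == replacement_ptr)
        replacement_ptr = (replacement_ptr + 1) % NENTRIES;
}

/* On a miss: replace the entry under the pointer. Advancing the pointer here
 * models the access to the newly filled entry. */
int replace(uint32_t new_tag) {
    int victim = replacement_ptr;
    tags[victim] = new_tag;
    replacement_ptr = (replacement_ptr + 1) % NENTRIES;
    return victim;
}

int main(void) {
    replace(0x50);   /* fills entry 0; the pointer moves on to entry 1 */
    touch(1);        /* entry 1 is under the pointer: pointer moves on */
    touch(5);        /* entry 5 is not under the pointer: no movement  */
    return 0;
}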

Cache Write Policy: Write Through versus Write Back

Cache reads are much easier to handle than cache writes:
u An instruction cache is much easier to design than a data cache
Cache writes:
u How do we keep the data in the cache and memory consistent?
Two options (decision time again :-)
u Write Back: write to cache only. Write the cache block to memory when that cache block is being replaced on a cache miss. Needs a dirty bit for each cache block; greatly reduces the memory bandwidth requirement; control can be complex.
u Write Through: write to cache and memory at the same time. What!!! How can this be? Isn't memory too slow for this? (See the write buffer on the next slide.)

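A hedged C sketch contrasting the two policies on a store that hits in the cache; write_to_memory is an assumed stand-in for the memory interface, not a real API:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     dirty;        /* needed only by the write-back policy */
    uint32_t tag;
    uint8_t  data[32];
} CacheBlock;

/* Assumed stub standing in for a DRAM write. */
static void write_to_memory(uint32_t addr, uint8_t byte) { (void)addr; (void)byte; }

/* Write Through: update the cache block and memory at the same time. */
void store_write_through(CacheBlock *b, uint32_t addr, uint8_t byte) {
    b->data[addr & 31] = byte;
    write_to_memory(addr, byte);   /* memory stays consistent on every store */
}

/* Write Back: update only the cache and mark the block dirty; the block is
 * written to memory later, when it is replaced on a cache miss. */
void store_write_back(CacheBlock *b, uint32_t addr, uint8_t byte) {
    b->data[addr & 31] = byte;
    b->dirty = true;
}

int main(void) {
    CacheBlock b = { .valid = true };
    store_write_through(&b, 0x100, 0xAB);
    store_write_back(&b, 0x104, 0xCD);    /* b.dirty is now true */
    return 0;
}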

Write Buffer for Write Through


(Diagram: Processor -> Cache, and in parallel Processor -> Write Buffer -> DRAM.)

A Write Buffer is needed between the Cache and Memory:
u Processor: writes data into the cache and the write buffer
u Memory controller: writes the contents of the buffer to memory
The write buffer is just a FIFO:
u Typical number of entries: 4
u Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
Memory system designer's nightmare:
u Store frequency (w.r.t. time) > 1 / DRAM write cycle
u Write buffer saturation
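A small C sketch of the FIFO write buffer idea; the 4-entry depth comes from the slide, while the interface (wb_push / wb_drain_one) is an illustrative assumption:

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4

typedef struct { uint32_t addr; uint32_t data; } WBEntry;

static WBEntry buf[WB_ENTRIES];
static int head = 0, tail = 0, count = 0;

/* Processor side: returns false (stall) if the buffer is saturated. */
bool wb_push(uint32_t addr, uint32_t data) {
    if (count == WB_ENTRIES) return false;      /* write buffer saturation */
    buf[tail] = (WBEntry){ addr, data };
    tail = (tail + 1) % WB_ENTRIES;
    count++;
    return true;
}

/* Memory-controller side: drains one entry per DRAM write cycle. */
bool wb_drain_one(WBEntry *out) {
    if (count == 0) return false;
    *out = buf[head];
    head = (head + 1) % WB_ENTRIES;
    count--;
    return true;
}

int main(void) {
    for (uint32_t i = 0; i < 5; i++)            /* the 5th push saturates the buffer */
        if (!wb_push(0x1000 + 4 * i, i))
            break;                              /* processor would stall here */
    WBEntry e;
    while (wb_drain_one(&e)) { /* memory controller empties the FIFO */ }
    return 0;
}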

Write Buffer Saturation


(Diagram: Processor -> Cache and Write Buffer -> DRAM, as on the previous slide.)

Store frequency (w.r.t. time) -> 1 / DRAM write cycle
u If this condition exists for a long period of time (the CPU cycle time is too quick and/or there are too many store instructions in a row), the store buffer will overflow no matter how big you make it, because the CPU cycle time << DRAM write cycle time.
Solutions for write buffer saturation:
u Use a write-back cache
u Install a second-level (L2) cache
u Store compression

(Diagram: with the L2 solution, the write buffer drains into the L2 Cache, which sits between the first-level Cache and DRAM.)

Write Allocate versus Not Allocate

Assume a 16-bit write to memory location 0x0 causes a miss.
u Do we read in the block?
u Yes: Write Allocate
u No: Write Not Allocate

(Diagram: the 1-KB direct-mapped cache again, with example address fields Cache Tag = 0x00, Cache Index = 0x00, Byte Select = 0x00; a dirty bit is now stored next to the valid bit and cache tag of each entry.)

Four Questions for Memory Hierarchy Designers
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)


What is a Sub-block?

Sub-block:
u A unit within a block that has its own valid bit
u One cache tag is shared between all sub-blocks in a block
u Example: 1 KB direct mapped cache, 32-B block, 8-B sub-block; each cache entry will have 32/8 = 4 valid bits
u Write miss: only the bytes in that sub-block are brought in, reducing the cache fill bandwidth (miss penalty). (See the sketch below.)

(Diagram: each entry holds a cache tag, one valid bit per sub-block (SB0-SB3), and four 8-byte sub-blocks covering bytes B0-B31.)
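A hedged C sketch of the sub-block bookkeeping described above; the sizes match the slide's example (32-B block, 8-B sub-blocks), while the struct and helper are illustrative assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BYTES    32
#define SUBBLOCK_BYTES 8
#define NUM_SUBBLOCKS  (BLOCK_BYTES / SUBBLOCK_BYTES)   /* = 4 valid bits */

typedef struct {
    uint32_t tag;                        /* one tag shared by all sub-blocks   */
    bool     sub_valid[NUM_SUBBLOCKS];   /* one valid bit per 8-byte sub-block */
    uint8_t  data[BLOCK_BYTES];
} SubBlockedEntry;

/* A byte hits only if the tag matches AND its own sub-block is valid. */
bool byte_hit(const SubBlockedEntry *e, uint32_t tag, uint32_t offset) {
    return e->tag == tag && e->sub_valid[offset / SUBBLOCK_BYTES];
}

int main(void) {
    SubBlockedEntry e = { .tag = 0x50, .sub_valid = { true, false, false, false } };
    printf("%d %d\n", byte_hit(&e, 0x50, 3), byte_hit(&e, 0x50, 20));  /* 1 0 */
    return 0;
}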
