Caches
Lecture 20: 1
Announcements
• HW7 will be posted tonight
Course Content
• Binary numbers and logic gates
• Boolean algebra and combinational logic
• Sequential logic and state machines
• Binary arithmetic
• Memories
• Instruction set architecture
• Processor organization
• Caches and virtual memory
• Input/output
• Advanced topics
Review: Pipelined Microprocessor
[Figure: pipelined microprocessor datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
Example: Data Hazards with Forwarding
• Assume HW forwarding and NO delay slot for load
We Need Fast and Large Memory
[Figure: the IF, ID, EX, MEM, and WB pipeline stages and main memory (DRAM)]
Cache
• Small SRAM memory that permits rapid access to a
subset of instructions or data
– If the data is in the cache (cache hit), we retrieve it without
slowing down the pipeline
– If the data is not in the cache (cache miss), we retrieve it from the
main memory (penalty incurred in accessing DRAM)
Memory Access with Cache
• Average memory access time with cache:
Hit time + Miss rate * Miss penalty
• Example
– Main memory access time = 50ns
– Cache hit time = 2ns
– Miss rate = 10%
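The example can be worked through numerically. The sketch below assumes the miss penalty equals the 50 ns main memory access time, which the slide does not state explicitly; the function and parameter names are illustrative.

```python
def average_memory_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Assumption: the miss penalty is the 50 ns main memory access time.
amat = average_memory_access_time(hit_time_ns=2, miss_rate=0.10, miss_penalty_ns=50)
print(f"AMAT = {amat:.1f} ns")  # AMAT = 7.0 ns
```

Under that assumption the average access takes 7 ns, compared with 50 ns if every access went to main memory.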
Why Caches Work: Principle of Locality
• Temporal locality
– If memory location X is accessed, then it is likely to
be accessed again in the near future
• Caches exploit temporal locality by keeping a referenced
instruction or data in the cache
• Spatial locality
– If memory location X is accessed, then locations near
X are likely to be accessed in the near future
• Caches exploit spatial locality by bringing in a block of
instructions or data into the cache on a miss
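Both kinds of locality can be illustrated with a small counting sketch. It assumes an idealized cache that never evicts a block (an assumption made only to isolate the effect of locality), and the address patterns are hypothetical.

```python
def count_misses(addresses, block_size):
    """Count misses in an idealized cache that never evicts (assumption)."""
    cached_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr // block_size  # block that contains this byte address
        if block not in cached_blocks:
            misses += 1
            cached_blocks.add(block)
    return misses


# Temporal locality: the same 4-byte word is read 10 times.
temporal = [0x40] * 10
# Spatial locality: 16 consecutive 4-byte words are read once each.
spatial = [4 * i for i in range(16)]

print(count_misses(temporal, block_size=4))   # 1  (first access misses, 9 hits follow)
print(count_misses(spatial, block_size=4))    # 16 (one word per block, every access misses)
print(count_misses(spatial, block_size=16))   # 4  (each miss brings in 4 neighboring words)
```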
Some Important Terms
• Cache is partitioned into blocks
– Each cache block (or cache line) typically contains
multiple bytes of data
– A whole block is read or written during data transfer
between cache and main memory
Direct Mapped Cache Concepts
• A given memory block is mapped to one and
only one cache block
Direct Mapped (DM) Cache Concepts
[Figure: mapping of memory block addresses (e.g., 00001, 11001, 11101) to cache blocks]
Address Translation for DM Cache
• DM cache parameters
– Size of each cache block is 2^b bytes
• “cache block” and “cache line” are synonymous
– Number of blocks is 2^i
– Total cache size is 2^b × 2^i = 2^(b+i) bytes
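With these parameters, the low b bits of an address select the byte within a block, the next i bits select the cache block, and the remaining high bits form the tag. A minimal sketch of that split follows; the function name and the sample values b = 2, i = 2 are illustrative choices, not taken from the slide.

```python
def split_address(addr, b, i):
    """Split addr into (tag, index, offset) for 2**b-byte blocks and 2**i blocks."""
    offset = addr & ((1 << b) - 1)         # low b bits: byte within the block
    index = (addr >> b) & ((1 << i) - 1)   # next i bits: which cache block
    tag = addr >> (b + i)                  # remaining high bits
    return tag, index, offset


# Hypothetical parameters: 4-byte blocks (b = 2), 4 blocks (i = 2), 6-bit address.
tag, index, offset = split_address(0b011100, b=2, i=2)
print(f"tag={tag:02b} index={index:02b} offset={offset:02b}")
# tag=01 index=11 offset=00
```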
DM Cache Organization
32-bit memory address
DM cache parameters
• 2 byte offset bits
• 10 index bits
• 20 tag bits
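The capacity implied by these parameters can be checked with a little arithmetic. The sketch below assumes each cache entry stores only the data block, the tag, and one valid bit; the sample address 0x12345678 is hypothetical.

```python
ADDRESS_BITS = 32
OFFSET_BITS = 2
INDEX_BITS = 10
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 20

num_blocks = 2 ** INDEX_BITS             # 1024 blocks
block_bytes = 2 ** OFFSET_BITS           # 4 bytes per block
data_bytes = num_blocks * block_bytes    # 4096 bytes (4 KiB) of data

# Assumption: an entry holds data + tag + 1 valid bit and nothing else.
bits_per_entry = 8 * block_bytes + TAG_BITS + 1   # 53 bits
total_bits = num_blocks * bits_per_entry          # 54272 bits

# Field extraction for a hypothetical address.
addr = 0x12345678
offset = addr & (block_bytes - 1)
index = (addr >> OFFSET_BITS) & (num_blocks - 1)
tag = addr >> (OFFSET_BITS + INDEX_BITS)
print(data_bytes, total_bits, hex(tag), index, offset)
```

The tag and valid bits add storage on top of the 4 KiB of data, which is why the total bit count exceeds 8 × 4096.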
Reading DM Cache
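• Use the index bits to retrieve the tag, data, and valid bit
• Compare the tag from the address with the retrieved tag
• If the entry is valid and the tags match (hit)
– Select the desired data using the byte offset
• Otherwise (miss)
– Bring the memory block into the cache (also set valid)
– Store the tag from the address with the block
– Select the desired data using the byte offset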
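The read procedure can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of the hardware: the class name `DirectMappedCache`, the method `read_byte`, and the use of a Python `bytes` object as main memory are all illustrative.

```python
class DirectMappedCache:
    """Toy model of a direct-mapped cache in front of a bytes-like memory."""

    def __init__(self, memory, offset_bits, index_bits):
        self.memory = memory
        self.offset_bits = offset_bits
        self.index_bits = index_bits
        num_blocks = 1 << index_bits
        self.valid = [False] * num_blocks
        self.tags = [0] * num_blocks
        self.data = [b""] * num_blocks

    def read_byte(self, addr):
        """Return (byte value, hit flag) for a read of byte address addr."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)

        # Hit only if the indexed entry is valid and its tag matches.
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Miss: fetch the whole block, store the tag, set the valid bit.
            block_size = 1 << self.offset_bits
            base = addr - offset
            self.data[index] = bytes(self.memory[base:base + block_size])
            self.tags[index] = tag
            self.valid[index] = True
        # Select the desired byte using the byte offset.
        return self.data[index][offset], hit


memory = bytes(range(64))  # hypothetical 64-byte memory where M[a] == a
cache = DirectMappedCache(memory, offset_bits=2, index_bits=2)
print(cache.read_byte(5))   # (5, False): first touch of the block is a miss
print(cache.read_byte(6))   # (6, True): same block, so a hit
print(cache.read_byte(21))  # (21, False): same index, different tag
```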
Writing DM Cache
• Use the index bits to retrieve the tag and valid bit
Direct Mapped Cache Example
• Size of each block is 4 bytes
• Cache holds 4 blocks
• Memory holds 16 blocks
• Memory address has 6 bits
• Address fields (6 bits): 2 tag bits | 2 index bits | 2 byte offset bits
• Each cache entry holds a valid bit (V), a tag, and data
• Cache indices: 00, 01, 10, 11
Direct Mapped Cache Example
Access sequence (6-bit byte addresses):
  1. R1 <= M[000000]
  2. R2 <= M[000100]
  3. R3 <= M[010000]
  4. R2 <= M[011100]
  5. R1 <= M[000000]
  6. R1 <= M[000100]

Memory contents (memory block address in binary: data in decimal):
  0000: 100   0001: 110   0010: 120   0011: 130
  0100: 140   0101: 150   0110: 160   0111: 170
  1000: 180   1001: 190   1010: 200   1011: 210
  1100: 220   1101: 230   1110: 240   1111: 250

Initially every cache entry is invalid (V = 0).

Access 1: R1 <= M[000000] is a miss
  Index  V  Tag  Data
  00     1  00   100
  01     0
  10     0
  11     0
Registers: R1 = 100
Direct Mapped Cache Example
Access 2: R2 <= M[000100] is a miss
  Index  V  Tag  Data
  00     1  00   100
  01     1  00   110
  10     0
  11     0
Registers: R1 = 100, R2 = 110
Direct Mapped Cache Example
Access 3: R3 <= M[010000] is a miss (replaces the block with tag 00 at index 00)
  Index  V  Tag  Data
  00     1  01   140
  01     1  00   110
  10     0
  11     0
Registers: R1 = 100, R2 = 110, R3 = 140
Direct Mapped Cache Example
Access 4: R2 <= M[011100] is a miss
  Index  V  Tag  Data
  00     1  01   140
  01     1  00   110
  10     0
  11     1  01   170
Registers: R1 = 100, R2 = 170, R3 = 140
Direct Mapped Cache Example
Access 5: R1 <= M[000000] is a miss (replaces the block with tag 01 at index 00)
  Index  V  Tag  Data
  00     1  00   100
  01     1  00   110
  10     0
  11     1  01   170
Registers: R1 = 100, R2 = 170, R3 = 140
Direct Mapped Cache Example
Access 6: R1 <= M[000100] is a hit (cache contents unchanged)
  Index  V  Tag  Data
  00     1  00   100
  01     1  00   110
  10     0
  11     1  01   170
Registers: R1 = 110, R2 = 170, R3 = 140
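The hit/miss outcome of this access sequence can be reproduced with a short simulation. This is a sketch that tracks only tags (no data), with the hypothetical helper name `simulate`; an entry of `None` stands in for V = 0.

```python
def simulate(trace, block_bytes, num_blocks):
    """Return 'hit' or 'miss' for each byte address on a direct-mapped cache."""
    tags = [None] * num_blocks  # None marks an invalid (V = 0) entry
    results = []
    for addr in trace:
        block_addr = addr // block_bytes   # drop the byte offset
        index = block_addr % num_blocks    # low bits of the block address
        tag = block_addr // num_blocks     # remaining high bits
        if tags[index] == tag:
            results.append("hit")
        else:
            tags[index] = tag              # fetch the block, replacing any old one
            results.append("miss")
    return results


trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(simulate(trace, block_bytes=4, num_blocks=4))
# ['miss', 'miss', 'miss', 'miss', 'miss', 'hit']
```

The fifth access misses even though M[000000] was loaded earlier, because M[010000] maps to the same index and evicted it in between.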
Doubling the Block Size
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address has 6 bits
• Address fields (6 bits): 2 tag bits | 1 index bit | 3 byte offset bits
• Each cache entry holds a valid bit (V), a tag, and data
• Cache indices: 0, 1
Doubling the Block Size
Access sequence (6-bit byte addresses):
  1. R1 <= M[000000]
  2. R2 <= M[000100]
  3. R3 <= M[010000]
  4. R2 <= M[011100]
  5. R1 <= M[000000]
  6. R1 <= M[000100]

Memory contents (memory block address in binary: the block's two data values in decimal, lower address first):
  000: 100, 110   001: 120, 130   010: 140, 150   011: 160, 170
  100: 180, 190   101: 200, 210   110: 220, 230   111: 240, 250

Initially both cache entries are invalid (V = 0).

Access 1: R1 <= M[000000] is a miss
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  00   110  100
  1      0
Registers: R1 = 100
Doubling the Block Size
Access 2: R2 <= M[000100] is a hit (the block was brought in by access 1)
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  00   110  100
  1      0
Registers: R1 = 100, R2 = 110
Doubling the Block Size
Access 3: R3 <= M[010000] is a miss (replaces the block with tag 00 at index 0)
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  01   150  140
  1      0
Registers: R1 = 100, R2 = 110, R3 = 140
Doubling the Block Size
Access 4: R2 <= M[011100] is a miss
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  01   150  140
  1      1  01   170  160
Registers: R1 = 100, R2 = 170, R3 = 140
Doubling the Block Size
Access 5: R1 <= M[000000] is a miss (replaces the block with tag 01 at index 0)
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  00   110  100
  1      1  01   170  160
Registers: R1 = 100, R2 = 170, R3 = 140
Doubling the Block Size
Access 6: R1 <= M[000100] is a hit (cache contents unchanged)
  Index  V  Tag  Data (offset 100, offset 000)
  0      1  00   110  100
  1      1  01   170  160
Registers: R1 = 110, R2 = 170, R3 = 140
Block Size Considerations
• Larger blocks may reduce miss rate due to
spatial locality
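As a rough check of this claim on the six-access sequence from the examples, the sketch below counts misses for two direct-mapped caches of the same 16-byte capacity. The helper name `count_misses` is hypothetical, and a single short trace is only an illustration, not evidence about miss rates in general.

```python
def count_misses(trace, block_bytes, num_blocks):
    """Count misses for byte addresses in trace on a direct-mapped cache."""
    tags = [None] * num_blocks  # None marks an invalid entry
    misses = 0
    for addr in trace:
        block_addr = addr // block_bytes
        index = block_addr % num_blocks
        tag = block_addr // num_blocks
        if tags[index] != tag:
            tags[index] = tag   # fetch the block on a miss
            misses += 1
    return misses


trace = [0b000000, 0b000100, 0b010000, 0b011100, 0b000000, 0b000100]
print(count_misses(trace, block_bytes=4, num_blocks=4))  # 5 misses out of 6
print(count_misses(trace, block_bytes=8, num_blocks=2))  # 4 misses out of 6
```

With 8-byte blocks, the read of M[000100] hits because the miss on M[000000] already brought in the neighboring data.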
Next Time
More Caches