Lecture22 PDF

ECE 2300
Digital Logic & Computer Organization

Spring 2018
More Caches
Measuring Performance
Lecture 22: 1
Announcements
• HW7 due tomorrow 11:59pm
• Prelab 5(c) due Saturday 3pm
• Lab 6 (last one) released
• HW8 (last one) to be released tonight
Lecture 22: 2
Another LRU Replacement Example
• 2-way set associative; (*) = LRU block (1 bit in this case)
• Every block address below maps to cache index 0, so both entries shown are the two ways of set 0; set 1 stays empty

Block    Cache  Hit/   Set 0 contents after access
address  index  miss   Way 0        Way 1
0        0      miss   Mem[0]
4        0      miss   Mem[0] (*)   Mem[4]
2        0      miss   Mem[2]       Mem[4] (*)
6        0      miss   Mem[2] (*)   Mem[6]
8        0      miss   Mem[8]       Mem[6] (*)
0        0      miss   Mem[8] (*)   Mem[0]
4        0      miss   Mem[4]       Mem[0] (*)
2        0      miss   Mem[4] (*)   Mem[2]
6        0      miss   Mem[6]       Mem[2] (*)
8        0      miss   Mem[6] (*)   Mem[8]
2        0      miss   Mem[2]       Mem[8] (*)
6        0      miss   Mem[2] (*)   Mem[6]
2        0      hit    Mem[2]       Mem[6] (*)
0        0      miss   Mem[2] (*)   Mem[0]
Color code: Cold miss Conflict miss Capacity miss
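The access sequence in the table can be replayed with a short Python sketch. It assumes a cache with 2 sets and index = block address mod 2 (the slide only shows that all of these addresses map to index 0); the function name simulate_lru is hypothetical.

```python
from collections import OrderedDict

def simulate_lru(block_addresses, num_sets=2, ways=2):
    """Replay block accesses on a set-associative LRU cache.

    Returns a list of (block address, set index, 'hit' or 'miss').
    num_sets=2 is an assumption, not something the slide states.
    """
    # One OrderedDict per set; the first key is the least recently used block.
    sets = [OrderedDict() for _ in range(num_sets)]
    trace = []
    for addr in block_addresses:
        index = addr % num_sets
        blocks = sets[index]
        if addr in blocks:
            blocks.move_to_end(addr)        # now the most recently used
            outcome = "hit"
        else:
            if len(blocks) == ways:
                blocks.popitem(last=False)  # evict the LRU block
            blocks[addr] = True
            outcome = "miss"
        trace.append((addr, index, outcome))
    return trace

accesses = [0, 4, 2, 6, 8, 0, 4, 2, 6, 8, 2, 6, 2, 0]
trace = simulate_lru(accesses)
for addr, index, outcome in trace:
    print(f"block {addr}: index {index}, {outcome}")
```

Because five distinct blocks compete for the two ways of one set, the only hit is the second-to-last access (block 2), as in the table.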

Lecture 22: 3
What About Writes?
• Where do we put the result of a store?
• Cache hit (block is in cache)
– Write new data value to the cache
– Also write to memory (write through)
– Don’t write to memory (write back)
• Requires an additional dirty bit for each cache block
• Writes back to memory when a dirty cache block is evicted
• Cache miss (block is not in cache)
– Allocate the line, i.e. bring it into the cache (write allocate)
– Write to memory without allocation (no write allocate, or write around)
Lecture 22: 4
Write Through Example
• Assume write allocate
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address (6 bits): 2 tag bits | 1 index bit | 3 byte offset bits
• Each of the 2 cache lines (index 0 and 1) holds: V (valid bit), tag, data
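A minimal Python sketch of how a 6-bit address splits into these fields. The helper name split_address is hypothetical; the four addresses are the ones stored to in this example.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 2, 1, 3

def split_address(addr):
    """Split a 6-bit byte address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

for addr in (0b000000, 0b000100, 0b010000, 0b011100):
    tag, index, offset = split_address(addr)
    print(f"{addr:06b}: tag={tag:02b} index={index:b} offset={offset:03b}")
```

The first three addresses share index 0, so they compete for the same cache line; only the last one uses index 1.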
Lecture 22: 5
Write Through
• Registers: R0 = 333, R1 = 444, R2 = 555, R3 = 666
• Initial memory (block number: data at byte offsets 000 and 100)
000: 100, 110    001: 120, 130    010: 140, 150    011: 160, 170
100: 180, 190    101: 200, 210    110: 220, 230    111: 240, 250
• Both cache lines start invalid (V = 0)
• Cache data below is listed as (offset 000, offset 100)

Store              Hit/miss  Cache line after store           Memory after store
M[000000] <= R0    miss      line 0: V=1 tag=00 (333, 110)    block 000: 333, 110
M[000100] <= R1    hit       line 0: V=1 tag=00 (333, 444)    block 000: 333, 444
M[010000] <= R2    miss      line 0: V=1 tag=01 (555, 150)    block 010: 555, 150
M[011100] <= R3    miss      line 1: V=1 tag=01 (160, 666)    block 011: 160, 666

• On each miss the block is first brought into the cache (write allocate); the store then updates both the cache and memory
Lecture 22: 16
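The write-through example can be replayed with a minimal Python sketch. It assumes each 8-byte block is modelled as two 4-byte words, matching the two memory rows per block on the slides; the function and variable names are hypothetical.

```python
NUM_LINES = 2  # direct-mapped cache with 2 lines

# Memory: 8 blocks of 2 words, initial values 100, 110, ..., 250 as on the slides.
memory = [[100 + 20 * b, 110 + 20 * b] for b in range(8)]
# Each cache line holds a valid bit, a tag, and the block data.
cache = [{"valid": False, "tag": None, "data": None} for _ in range(NUM_LINES)]

def store_write_through(addr, value):
    """Store one word using write through + write allocate."""
    word = (addr >> 2) & 1    # which 4-byte word inside the block
    index = (addr >> 3) & 1   # 1 index bit
    tag = addr >> 4           # 2 tag bits
    block = addr >> 3         # block number = tag bits followed by index bit
    line = cache[index]
    hit = line["valid"] and line["tag"] == tag
    if not hit:
        # Write allocate: bring the whole block into the cache first.
        line.update(valid=True, tag=tag, data=list(memory[block]))
    line["data"][word] = value    # update the cache
    memory[block][word] = value   # write through: memory is updated too
    return "hit" if hit else "miss"

stores = [(0b000000, 333), (0b000100, 444), (0b010000, 555), (0b011100, 666)]
outcomes = [store_write_through(addr, value) for addr, value in stores]
print(outcomes)
print(memory[0], memory[2], memory[3])
```

After the four stores, memory already holds every new value, which is the defining property of write through.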
Write Back Example
• Assume write allocate
• Size of each block is 8 bytes
• Cache holds 2 blocks
• Memory holds 8 blocks
• Memory address (6 bits): 2 tag bits | 1 index bit | 3 byte offset bits
• Each of the 2 cache lines (index 0 and 1) holds: V (valid bit), D (dirty bit), tag, data
Lecture 22: 17
Write Back
• Registers: R0 = 333, R1 = 444, R2 = 555, R3 = 666
• Initial memory (block number: data at byte offsets 000 and 100)
000: 100, 110    001: 120, 130    010: 140, 150    011: 160, 170
100: 180, 190    101: 200, 210    110: 220, 230    111: 240, 250
• Both cache lines start invalid (V = 0) and clean (D = 0)
• Cache data below is listed as (offset 000, offset 100)

Store              Hit/miss  Cache line after store               Memory change
M[000000] <= R0    miss      line 0: V=1 D=1 tag=00 (333, 110)    none
M[000100] <= R1    hit       line 0: V=1 D=1 tag=00 (333, 444)    none
M[010000] <= R2    miss      line 0: V=1 D=1 tag=01 (555, 150)    block 000 <= 333, 444
M[011100] <= R3    miss      line 1: V=1 D=1 tag=01 (160, 666)    none

• The third store evicts the dirty block with tag 00 from line 0, so that block is written back to memory before block 010 is brought in (D = 0 on arrival, D = 1 after the store)
• After the four stores, memory blocks 010 and 011 still hold their old values (140, 150 and 160, 170); the new values 555 and 666 exist only in the dirty cache lines
Lecture 22: 29
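The write-back example can be replayed the same way. This is a sketch under the same modelling assumption (two 4-byte words per 8-byte block); the names store_write_back and writebacks are hypothetical.

```python
NUM_LINES = 2  # direct-mapped cache with 2 lines

# Memory: 8 blocks of 2 words, initial values 100, 110, ..., 250 as on the slides.
memory = [[100 + 20 * b, 110 + 20 * b] for b in range(8)]
cache = [{"valid": False, "dirty": False, "tag": None, "data": None}
         for _ in range(NUM_LINES)]
writebacks = []  # block numbers written back to memory, in order

def store_write_back(addr, value):
    """Store one word using write back + write allocate."""
    word = (addr >> 2) & 1    # which 4-byte word inside the block
    index = (addr >> 3) & 1   # 1 index bit
    tag = addr >> 4           # 2 tag bits
    line = cache[index]
    hit = line["valid"] and line["tag"] == tag
    if not hit:
        if line["valid"] and line["dirty"]:
            # Evicting a dirty block: write it back to memory first.
            victim = (line["tag"] << 1) | index
            memory[victim] = list(line["data"])
            writebacks.append(victim)
        block = (tag << 1) | index
        line.update(valid=True, dirty=False, tag=tag, data=list(memory[block]))
    line["data"][word] = value  # update only the cache
    line["dirty"] = True        # memory is now stale for this block
    return "hit" if hit else "miss"

stores = [(0b000000, 333), (0b000100, 444), (0b010000, 555), (0b011100, 666)]
outcomes = [store_write_back(addr, value) for addr, value in stores]
print(outcomes, writebacks)
print(memory[0], memory[2], memory[3])
```

Only one block reaches memory during the four stores, compared with four word writes under write through; the cost is the dirty bit and the stale memory contents for blocks 010 and 011.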
Cache Hierarchy
• Time to get a block from memory is so long that
performance suffers even with a low miss rate
• Example: 3% miss rate, 100 cycles to main memory
– 0.03 × 100 = 3 extra cycles on average to access instructions or data
• Solution: Add another level of cache
Lecture 22: 30
Pipeline with a Cache Hierarchy
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with an L1 instruction cache (KB) and an L1 data cache (KB), backed by an L2 cache (MB) and main memory (GB)]
Lecture 22: 31
Cache Hierarchy
• Level 1 (L1) instruction and data caches
– Small, but very fast
• Level 2 (L2) cache handles L1 misses
– Larger and slower than L1, but much faster than main memory
– L1 data are also present in L2
• Main memory handles L2 cache misses
• Example: assume 1 cycle to access L1 (3% miss rate), 10 cycles to L2, 10% L2 miss rate, 100 cycles to main memory
– How many cycles on average for instruction/data access?
– 1 + 0.03 × (10 + 0.1 × 100) = 1.6 cycles
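The average access time can be checked with a small Python sketch that applies the same formula level by level. The function name average_access_cycles is hypothetical; the numbers are the ones from the two cache hierarchy examples.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty):
    """Average cycles per access = hit time + miss rate x miss penalty."""
    return hit_cycles + miss_rate * miss_penalty

# Without L2: every L1 miss (3%) goes to main memory (100 cycles).
extra_without_l2 = 0.03 * 100

# With L2: the L1 miss penalty is itself an average over L2 hits and misses.
l2_average = average_access_cycles(10, 0.10, 100)
with_l2 = average_access_cycles(1, 0.03, l2_average)

print(f"extra cycles without L2: {extra_without_l2:.2f}")
print(f"average cycles with L2:  {with_l2:.2f}")
```

Nesting the calls mirrors the hierarchy: the penalty seen by L1 is the average access time of the level below it.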
Lecture 22: 32
How Do We Measure Performance?
• Execution time: The time between the start and
completion of a program (or task)
• Throughput: Total amount of work done in a given time
• Improving performance means
– Reducing execution time, or
– Increasing throughput
Lecture 22: 33
CPU Execution Time
• Amount of time the CPU takes to run a program
• Derivation
CPU execution time = I × CPI × CT
– I: number of instructions in the program
– CPI: average number of cycles per instruction
– CT: clock cycle time (1/frequency)
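A minimal Python sketch of the execution time product. The instruction count, CPI values, and clock frequency below are made-up illustration values, not figures from the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU execution time = I x CPI x CT, with CT = 1 / frequency."""
    cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * cycle_time

# Hypothetical program: 1 billion instructions on a 2 GHz clock.
baseline = cpu_time_seconds(1_000_000_000, 1.5, 2_000_000_000)
lower_cpi = cpu_time_seconds(1_000_000_000, 1.2, 2_000_000_000)

print(f"CPI 1.5: {baseline:.3f} s")
print(f"CPI 1.2: {lower_cpi:.3f} s")
```

With I and CT held fixed, execution time scales linearly with CPI, which is why reducing stall cycles shortens execution time.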
Lecture 22: 34
Instruction Count (I)
• Total number of instructions in the given
program
• Factors
– Instruction set
– Mix of instructions chosen by the compiler
Lecture 22: 35
Cycle Time (CT)
• Clock period (1/frequency)
• Factors
– Instruction set
– Structure of the processor and memory hierarchy
Lecture 22: 36
Cycles Per Instruction (CPI)
• Average number of cycles required to execute
each instruction
• Factors
– Instruction set
– Mix of instructions chosen by the compiler
– Ordering of the instructions by the compiler
– Structure of the processor and memory hierarchy
Lecture 22: 37
Processor Organization
Impact on CPI (Example 1)
               CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
ADD  R1,R2,R3  IM   Reg  ALU  DM   Reg
OR   R4,R1,R3       IM   Reg  ALU  DM   Reg
SUB  R5,R2,R1            IM   Reg  ALU  DM   Reg
AND  R6,R1,R2                 IM   Reg  ALU  DM   Reg
ADDI R7,R7,3                       IM   Reg  ALU  DM   Reg
• With forwarding: Reduced stall cycles
– Lower CPI, potentially reduced execution time
Lecture 22: 38
Processor Organization
Impact on CPI (Example 2)
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with instruction RAM, register file, ALU, and data RAM, plus a control unit (CU), an equality comparison (=?), and a sign bit signal]
• Only one delay slot needed with branch resolved in ID
– Lower CPI
Lecture 22: 39
Compiler Impact on CPI (Example 3)
                  CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
BEQ  R2,R3,X      IM   Reg  ALU  DM   Reg
ADDI R7,R7,3           IM   Reg  ALU  DM   Reg
OR   R4,R1,R3               IM   Reg  ALU  DM   Reg
SUB  R5,R2,R1                    IM   Reg  ALU  DM   Reg
X: AND R6,R1,R2                       IM   Reg  ALU  DM   Reg
• Filling the branch delay slot with a useful instruction: ADDI R7,R7,3 takes the place of the NOP after the BEQ
Lecture 22: 40
A Rough Breakdown of CPI
• CPIbase is the base CPI in an ideal scenario where
instruction fetches and data memory accesses incur no
extra delay
• CPImemhier is the (additional) CPI spent accessing the memory hierarchy when a miss occurs in the caches
• CPItotal is the overall CPI
– CPItotal = CPIbase + CPImemhier
Lecture 22: 41
Impact of L1 Caches
• With L1 caches
– L1 instruction cache miss rate = 2%
– L1 data cache miss rate = 4%
– Miss penalty = 100 cycles (access main memory)
– 20% of all instructions are loads, 10% are stores
• CPImemhier = 0.02 × 100 + 0.3 × 0.04 × 100 = 3.2
Lecture 22: 42
Impact of L1+L2 Caches
• With L1 and L2 caches
– L1 instruction cache miss rate = 2%
– L1 data cache miss rate = 4%
– L2 access time = 15 cycles
– L2 miss rate = 25%
– L2 miss penalty = 100 cycles (access main memory)
– 20% of all instructions are loads, 10% are stores
• CPImemhier = 0.02 × (15 + 0.25 × 100) + 0.30 × 0.04 × (15 + 0.25 × 100) = 1.28
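Both CPImemhier results (3.2 with L1 only, 1.28 with L1 and L2) can be reproduced with a short Python sketch. The function name cpi_memhier and its parameter names are hypothetical; the rates and latencies are the ones listed on the two slides.

```python
def cpi_memhier(instr_miss_rate, data_miss_rate, mem_instr_fraction, l1_miss_penalty):
    """Extra cycles per instruction caused by L1 cache misses.

    Every instruction is fetched once; only loads and stores
    (mem_instr_fraction of all instructions) access the data cache.
    """
    instr_part = instr_miss_rate * l1_miss_penalty
    data_part = mem_instr_fraction * data_miss_rate * l1_miss_penalty
    return instr_part + data_part

mem_fraction = 0.20 + 0.10  # 20% loads + 10% stores

# L1 only: each L1 miss costs a 100-cycle main memory access.
l1_only = cpi_memhier(0.02, 0.04, mem_fraction, 100)

# L1 + L2: each L1 miss costs 15 cycles in L2, plus 100 more on 25% of them.
penalty_with_l2 = 15 + 0.25 * 100
l1_and_l2 = cpi_memhier(0.02, 0.04, mem_fraction, penalty_with_l2)

print(f"L1 only: {l1_only:.2f}")
print(f"L1 + L2: {l1_and_l2:.2f}")
```

Adding L2 changes only the L1 miss penalty (from 100 cycles to an average of 40), which is why the two formulas share the same structure.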
Lecture 22: 43
Before Next Class
• H&H 8.4
Next Time
Virtual Memory
Lecture 22: 44
