CPS104 Lec28.1
GK Spring 2004
Admin.
Homework-6: Is posted.
- Due date was extended to March 29.
- No further extension!
- The second part of this assignment will be posted Monday, March 22.
- This assignment is harder than it looks.
- These two assignments have larger weight than other homework assignments.
- Please start ASAP!!
Homework-7: Is posted.
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)
Separate Inst & Data Caches
- Harvard Architecture
Can access both at same time
Combined L2
- L2 >> L1
[Figure: memory hierarchy diagram; labels: Combined L2 Cache, DRAM]
Cache Performance
CPU time = (CPU_execution_clock_cycles + Memory_stall_clock_cycles) x clock_cycle_time
Memory_stall_clock_cycles = Memory_accesses x Miss_rate x Miss_penalty

Example
- Assume every instruction takes 1 cycle
- Miss penalty = 20 cycles
- Miss rate = 10%
- 1000 total instructions, 300 memory accesses
- Memory stall cycles? CPU clocks?
Cache Performance
Memory stall cycles = 300 * 0.10 * 20 = 600
CPU_clocks = 1000 + 600 = 1600
60% slower because of cache misses!
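The two formulas can be written as small C functions so the example is easy to rerun with other numbers. This is only a sketch: the function names are made up for illustration, and the miss rate is assumed to be given as a fraction between 0 and 1.

```c
#include <assert.h>

/* Memory_stall_clock_cycles = Memory_accesses x Miss_rate x Miss_penalty.
   miss_rate is a fraction (0.10 means 10%). */
double memory_stall_cycles(double memory_accesses, double miss_rate,
                           double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return memory_accesses * miss_rate * miss_penalty;
}

/* CPU time = (execution cycles + memory stall cycles) x clock cycle time. */
double cpu_time(double execution_cycles, double stall_cycles,
                double clock_cycle_time)
{
    return (execution_cycles + stall_cycles) * clock_cycle_time;
}
```

With the slide's numbers, memory_stall_cycles(300, 0.10, 20) gives about 600 stall cycles, and cpu_time(1000, 600, 1.0) gives about 1600 clocks when the cycle time is taken as 1.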
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
Reducing Misses
Classifying Misses: 3 Cs
- Compulsory: The first access to a block is not in the cache, so the block must be brought into the cache. These are also called cold start misses or first reference misses. (Misses in Infinite Cache)
- Capacity: If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in Size X Cache)
- Conflict: If the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. These are also called collision misses or interference misses. (Misses in N-way Associative, Size X Cache)
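The three parenthetical definitions suggest a way to count each kind of miss for a trace of block addresses: run the trace through an infinite cache, a fully associative cache of size X, and the real cache. The sketch below is one possible way to do that, not a method from the lecture. It assumes LRU replacement for the fully associative cache and a direct-mapped cache as the real cache, and the names classify_misses, MAX_BLOCK_ADDR and MAX_CACHE_BLOCKS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCK_ADDR   1024  /* assumption: block addresses are below this */
#define MAX_CACHE_BLOCKS 64    /* assumption: cache holds at most this many blocks */

struct miss_counts {
    long compulsory;  /* misses in an infinite cache */
    long capacity;    /* extra misses in a fully associative cache of size X */
    long conflict;    /* extra misses in a direct-mapped cache of size X */
};

struct miss_counts classify_misses(const unsigned *trace, size_t n,
                                   size_t cache_blocks)
{
    unsigned char seen[MAX_BLOCK_ADDR];     /* infinite cache: block seen before? */
    unsigned fa_tag[MAX_CACHE_BLOCKS];      /* fully associative cache contents */
    size_t fa_last_use[MAX_CACHE_BLOCKS];   /* 0 = empty way, else time of last use */
    unsigned dm_tag[MAX_CACHE_BLOCKS];      /* direct-mapped cache contents */
    unsigned char dm_valid[MAX_CACHE_BLOCKS];
    long first_refs = 0, fa_misses = 0, dm_misses = 0;
    struct miss_counts out;

    assert(cache_blocks > 0 && cache_blocks <= MAX_CACHE_BLOCKS);
    memset(seen, 0, sizeof seen);
    memset(fa_last_use, 0, sizeof fa_last_use);
    memset(dm_valid, 0, sizeof dm_valid);

    for (size_t t = 0; t < n; t++) {
        unsigned b = trace[t];
        assert(b < MAX_BLOCK_ADDR);

        /* Infinite cache: only the first reference to a block misses. */
        if (!seen[b]) { seen[b] = 1; first_refs++; }

        /* Fully associative LRU cache: search every way, remember the
           least recently used one (empty ways have last use 0). */
        size_t slot = cache_blocks, lru = 0;
        for (size_t w = 0; w < cache_blocks; w++) {
            if (fa_last_use[w] != 0 && fa_tag[w] == b) { slot = w; break; }
            if (fa_last_use[w] < fa_last_use[lru]) lru = w;
        }
        if (slot == cache_blocks) { fa_misses++; slot = lru; fa_tag[slot] = b; }
        fa_last_use[slot] = t + 1;

        /* Direct-mapped cache: each block has exactly one possible place. */
        size_t idx = b % cache_blocks;
        if (!dm_valid[idx] || dm_tag[idx] != b) {
            dm_misses++; dm_valid[idx] = 1; dm_tag[idx] = b;
        }
    }

    out.compulsory = first_refs;
    out.capacity = fa_misses - first_refs;
    out.conflict = dm_misses - fa_misses;
    return out;
}
```

For the trace 0, 4, 0, 4 on a 4-block cache, both blocks fit, but they map to the same direct-mapped slot, so the sketch reports 2 compulsory, 0 capacity and 2 conflict misses. One caveat of counting by subtraction: on some traces LRU does worse than direct mapping, and the conflict count then comes out negative.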
Cache Performance
Your program and caches
- Can you affect performance?
- Think about 3Cs
Instructions
- Reorder procedures in memory so as to reduce misses
- Profiling to look at conflicts
- McFarling [1989] reduced cache misses by 75% on an 8KB direct-mapped cache with 4-byte blocks
Data
/* After */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    a[i][j] = 1/b[i][j] * c[i][j];
    d[i][j] = a[i][j] + c[i][j];
  }

Before fusion: 2 misses per access to a & c. After fusion: one miss per access.
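The "before" version of this loop is not on the slide. The sketch below assumes it was two separate loop nests, one computing a and one computing d, and puts it next to the fused form so the two can be compared. The function names, the fixed N of 64 and the use of double are illustration choices only.

```c
#include <assert.h>

#define N 64

/* Assumed "before" form: two passes, so a and c are walked twice and may
   have left the cache by the time the second nest needs them. */
void unfused(double a[N][N], double b[N][N], double c[N][N], double d[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            assert(b[i][j] != 0.0);           /* avoid division by zero */
            a[i][j] = 1 / b[i][j] * c[i][j];
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            d[i][j] = a[i][j] + c[i][j];
}

/* Fused form from the slide: a[i][j] and c[i][j] are reused right away,
   while they are still in the cache. */
void fused(double a[N][N], double b[N][N], double c[N][N], double d[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            assert(b[i][j] != 0.0);           /* avoid division by zero */
            a[i][j] = 1 / b[i][j] * c[i][j];
            d[i][j] = a[i][j] + c[i][j];
        }
}
```

Both functions do the same arithmetic in the same order for each element, so they produce identical a and d. Only the memory access pattern changes.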
Blocking Example
/* Before */
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  }
Two Inner Loops:
- Read all NxN elements of z[]
- Read N elements of 1 row of y[] repeatedly
- Write N elements of 1 row of x[]
Capacity Misses a function of N & Cache Size:
- If the cache can hold all 3 NxN matrices => no capacity misses; otherwise ...
Idea: compute on BxB submatrix that fits
Blocking Example
/* After */
for (jj = 0; jj < N; jj = jj+B)
  for (kk = 0; kk < N; kk = kk+B)
    for (i = 0; i < N; i = i+1)
      /* upper bounds are exclusive, so each block spans B columns */
      for (j = jj; j < min(jj+B,N); j = j+1) {
        r = 0;
        for (k = kk; k < min(kk+B,N); k = k+1)
          r = r + y[i][k]*z[k][j];
        x[i][j] = x[i][j] + r;   /* x must start out as all zeros */
      }
Capacity Misses from 2N^3 + N^2 to 2N^3/B + N^2
B called Blocking Factor
Conflict Misses Too?
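A quick way to convince yourself that blocking only reorders the work is to run both versions and compare the results. The sketch below is one possible self-contained version. The names matmul_naive, matmul_blocked and min_int, the fixed N of 50, and passing the blocking factor as a parameter bf are all illustration choices, not part of the lecture.

```c
#include <assert.h>
#include <string.h>

#define N 50

static int min_int(int a, int b) { return a < b ? a : b; }

/* Unblocked version: x = y * z, one full dot product per element. */
void matmul_naive(double x[N][N], double y[N][N], double z[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double r = 0;
            for (int k = 0; k < N; k++)
                r = r + y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version: works on bf x bf pieces of z so they stay in the cache.
   x collects partial sums, so it is cleared first. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N], int bf)
{
    assert(bf > 0);
    memset(x, 0, sizeof(double) * N * N);
    for (int jj = 0; jj < N; jj += bf)
        for (int kk = 0; kk < N; kk += bf)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + bf, N); j++) {
                    double r = 0;
                    for (int k = kk; k < min_int(kk + bf, N); k++)
                        r = r + y[i][k] * z[k][j];
                    x[i][j] = x[i][j] + r;
                }
}
```

N is deliberately not a multiple of a typical blocking factor such as 8, so the min() bounds are exercised on the last block. With floating-point inputs the two versions can differ in the last bits because the additions happen in a different order; with small integer values they match exactly.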
[Figure: curves for Direct Mapped Cache and Fully Associative Cache; vertical axis from 0 to 0.1, horizontal axis Blocking Factor from 0 to 150]
Conflict misses in caches that are not fully associative vs. blocking size
- Lam et al. [1991]: a blocking factor of 24 had a fifth the misses of 48, even though both fit in the cache
loop interchange
loop fusion
blocking
Summary
Cost Effective Memory Hierarchy
Split Instruction and Data Cache
4 Questions
CPU cycles/time, Memory Stall Cycles
Your programs and cache performance
Next: Virtual Memory
Virtual Memory
Different programs have different memory requirements.
- How to manage program placement?
Different machines have different amounts of memory.
- How to run the same program on many different machines?
At any given time each machine runs a different set of programs.
- How to fit the program mix into memory? Reclaiming unused memory? Moving code around?
The amount of memory consumed by each program is dynamic (changes over time).
- How to effect changes in memory location: add or subtract space?
Program bugs can cause a program to generate reads and writes outside the program address space.
- How to protect one program from another?
Virtual Memory
Provides illusion of very large memory
- Sum of the memory of many jobs greater than physical memory
- Address space of each job larger than physical memory
Allows available (fast and expensive) physical memory to be well utilized.
Simplifies memory management: code and data movement, protection, ... (main reason today)
Exploits memory hierarchy to keep average access time low.
Involves at least two storage levels: main and secondary
Virtual Address -- address used by the programmer
Virtual Address Space -- collection of such addresses
Memory Address -- address in physical memory, also known as physical address or real address
What are the possible addresses generated by the program?
How big is our DRAM?
Is there more than one program running?
If so, how do we allocate memory to each?
[Figure: program address space; labels: Data, Text (example instruction: add r,s1,s2), Reserved]
Divide memory (virtual and physical) into fixed size blocks (Pages, Frames).
- Make page size a power of 2: (page size = 2^k)
- All pages in the virtual address space are contiguous.
- Pages can be mapped into physical Frames in any order.
- Some of the pages are in main memory (DRAM), some of the pages are on secondary memory (disk).
All programs are written using Virtual Memory Address Space.
The hardware does on-the-fly translation between virtual and physical address spaces.
- Use a Page Table to translate between Virtual and Physical addresses
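One reason a power-of-2 page size is convenient: splitting an address into a page number and an offset within the page needs only a shift and a mask, no division. The sketch below shows this for 32-bit addresses. The names vaddr_parts and split_address are hypothetical, and k is the exponent in page size = 2^k.

```c
#include <assert.h>
#include <stdint.h>

struct vaddr_parts {
    uint32_t page_number;  /* which page the address falls in */
    uint32_t offset;       /* byte position inside that page */
};

/* Split a 32-bit address for a page size of 2^k bytes. */
struct vaddr_parts split_address(uint32_t addr, unsigned k)
{
    struct vaddr_parts parts;
    uint32_t page_size;

    assert(k > 0 && k < 32);
    page_size = (uint32_t)1 << k;
    parts.page_number = addr >> k;           /* same as addr / page_size */
    parts.offset = addr & (page_size - 1);   /* same as addr % page_size */
    return parts;
}
```

For example, with k = 12 (4096-byte pages), address 0x12345678 splits into page number 0x12345 and offset 0x678.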
Paging Organization
Virtual and physical address space partitioned into blocks of equal size: pages and frames.
[Figure: pages and frames; labels: mem, disk]
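[Figure: address translation. Virtual Address: Virtual Page Number (upper bits, starting at bit 31) and Page offset (bits 11 to 0). Page Table: maps the Virtual Page Number. Physical Address: ends in the same Page offset (bits 11 to 0). Also labeled: Disk.]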
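The translation in the figure can be sketched in software, although in a real machine the hardware does this on the fly. The sketch assumes 32-bit addresses with a 12-bit page offset, as in the figure. Everything else is made up for illustration: the page table entry layout (a valid flag plus a frame number), the flat array used as the page table, and the name translate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET_BITS 12
#define PAGE_OFFSET_MASK ((1u << PAGE_OFFSET_BITS) - 1)

/* Hypothetical page table entry: is the page in memory, and in which frame? */
struct pte {
    int valid;       /* 1 = page is in main memory, 0 = page is on disk */
    uint32_t frame;  /* physical frame number, meaningful only if valid */
};

/* Translate a virtual address. Returns 1 and stores the physical address
   in *paddr if the page is in memory; returns 0 (a page fault) otherwise. */
int translate(const struct pte *table, size_t n_pages,
              uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_OFFSET_BITS;     /* virtual page number */
    uint32_t offset = vaddr & PAGE_OFFSET_MASK;   /* copied unchanged */

    assert(table != NULL && paddr != NULL);
    if (vpn >= n_pages || !table[vpn].valid)
        return 0;
    *paddr = (table[vpn].frame << PAGE_OFFSET_BITS) | offset;
    return 1;
}
```

If virtual page 2 is mapped to frame 5, virtual address 0x2ABC translates to physical address 0x5ABC: the page number is replaced and the offset 0xABC is kept. An access to a page whose entry is not valid returns 0, which is where the operating system would bring the page in from disk.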