
CPS 104 Computer Organization and Programming
Lecture 28: Cache Memory, Virtual Memory

March 26, 2004 Gershon Kedem http://kedem.duke.edu/cps104/Lectures

CPS104 Lec28.1

GK Spring 2004

Admin.

Homework-6: Is posted.
- Due date was extended to March 29.
- No further extension!
- The second part of this assignment will be posted Monday March 22.
- This assignment is harder than it looks.
- These two assignments have larger weight than other homework assignments.
- Please start ASAP!!
Homework-7: Is posted.


Review: Four Questions for Memory Hierarchy Designers

Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)


Separate Instruction and Data Caches


[Figure: Data Path connected to an Inst Cache and a Data Cache, both backed by a Combined L2 Cache, which is backed by DRAM]

Separate Inst & Data Caches
- Harvard Architecture
Can access both at same time
Combined L2
- L2 >> L1


Cache Performance
CPU time = (CPU_execution_clock_cycles + Memory_stall_clock_cycles) x clock_cycle_time
Memory_stall_clock_cycles = Memory_accesses x Miss_rate x Miss_penalty

Example
- Assume every instruction takes 1 cycle
- Miss penalty = 20 cycles
- Miss rate = 10%
- 1000 total instructions, 300 memory accesses
- Memory stall cycles? CPU clocks?


Cache Performance

Memory stall cycles = 300 * 0.10 * 20 = 600
CPU_clocks = 1000 + 600 = 1600
60% slower because of cache misses!
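
The arithmetic above can be written as two small C functions. This is only a sketch of the two formulas from the previous slide; the function names are made up for illustration and are not part of the course material.

```c
/* Memory stall cycles = memory accesses x miss rate x miss penalty.
   The miss rate is a fraction (0.10 for 10%). */
double memory_stall_cycles(double mem_accesses, double miss_rate,
                           double miss_penalty)
{
    return mem_accesses * miss_rate * miss_penalty;
}

/* Total CPU clocks = execution cycles + memory stall cycles. */
double cpu_clock_cycles(double exec_cycles, double stall_cycles)
{
    return exec_cycles + stall_cycles;
}
```

With the example numbers, memory_stall_cycles(300, 0.10, 20) gives the 600 stall cycles, and cpu_clock_cycles(1000, 600) gives the 1600 total clocks.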


Improving Cache Performance

1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.


Reducing Misses

Classifying Misses: 3 Cs
- Compulsory: The first access to a block is not in the cache, so the block must be brought into the cache. These are also called cold start misses or first reference misses. (Misses in Infinite Cache)
- Capacity: If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in Size X Cache)
- Conflict: If the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. These are also called collision misses or interference misses. (Misses in N-way Associative, Size X Cache)
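
To make the conflict case concrete, here is a minimal sketch of a miss counter for a tiny direct-mapped cache and for a fully associative cache of the same size. Everything in it is an assumption chosen for illustration: the cache geometry (4 blocks of 16 bytes), the LRU replacement in the fully associative version, the function names, and the sample trace.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS  4u    /* hypothetical cache: 4 blocks ...  */
#define BLOCK_BYTES 16u   /* ... of 16 bytes each              */

/* Sample trace: byte addresses 0 and 64 both map to index 0. */
const uint32_t ping_pong_trace[6] = {0, 64, 0, 64, 0, 64};

/* Count misses in a direct-mapped cache for a trace of byte addresses. */
unsigned direct_mapped_misses(const uint32_t *trace, size_t n)
{
    uint32_t tags[NUM_BLOCKS] = {0};
    int valid[NUM_BLOCKS] = {0};
    unsigned misses = 0;
    size_t t;

    for (t = 0; t < n; t++) {
        uint32_t block = trace[t] / BLOCK_BYTES;
        uint32_t index = block % NUM_BLOCKS;   /* the only slot allowed */
        uint32_t tag = block / NUM_BLOCKS;
        if (!valid[index] || tags[index] != tag) {
            misses++;                          /* fill or replace the slot */
            valid[index] = 1;
            tags[index] = tag;
        }
    }
    return misses;
}

/* Count misses in a fully associative cache of the same size (LRU). */
unsigned fully_assoc_misses(const uint32_t *trace, size_t n)
{
    uint32_t blocks[NUM_BLOCKS] = {0};
    size_t last_use[NUM_BLOCKS] = {0};
    size_t used = 0, t, s;
    unsigned misses = 0;

    for (t = 0; t < n; t++) {
        uint32_t block = trace[t] / BLOCK_BYTES;
        size_t slot = used;                    /* "used" means not found */
        for (s = 0; s < used; s++)
            if (blocks[s] == block) { slot = s; break; }
        if (slot == used) {                    /* miss */
            misses++;
            if (used < NUM_BLOCKS) {
                slot = used++;                 /* take a free slot */
            } else {
                slot = 0;                      /* evict least recently used */
                for (s = 1; s < NUM_BLOCKS; s++)
                    if (last_use[s] < last_use[slot]) slot = s;
            }
            blocks[slot] = block;
        }
        last_use[slot] = t;
    }
    return misses;
}
```

On the sample trace the direct-mapped version misses on all 6 accesses, while the fully associative version misses only on the first 2, which are compulsory. The 4 extra misses are conflict misses.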

Cache Performance

Your program and caches
Can you affect performance?
Think about 3Cs


Reducing Misses by Compiler Optimizations

Instructions
- Reorder procedures in memory so as to reduce misses
- Profiling to look at conflicts
- McFarling [1989] reduced cache misses by 75% on 8KB direct mapped cache with 4 byte blocks
Data
- Merging Arrays: improve spatial locality by single array of compound elements vs. 2 arrays
- Loop Interchange: change nesting of loops to access data in order stored in memory
- Loop Fusion: combine 2 independent loops that have same looping and some variables overlap
- Blocking: improve temporal locality by accessing blocks of data repeatedly vs. going down whole columns or rows


Merging Arrays Example


/* Before */
int val[SIZE];
int key[SIZE];

/* After */
struct merge {
    int val;
    int key;
};
struct merge merged_array[SIZE];

Reducing conflicts between val & key
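
The effect on memory layout can be checked with a few lines of C. This is a sketch under stated assumptions: SIZE is set to 1024 only for illustration, the helper names are hypothetical, and the "split" distance assumes the two separate arrays happen to be placed back to back, which the compiler and linker do not guarantee.

```c
#include <stddef.h>

#define SIZE 1024   /* hypothetical length; the slide leaves SIZE undefined */

struct merge {
    int val;
    int key;
};

struct merge merged_array[SIZE];

/* Bytes between val and key of the same element in the merged layout. */
size_t merged_val_key_distance(void)
{
    return offsetof(struct merge, key) - offsetof(struct merge, val);
}

/* Bytes between val[i] and key[i] for two separate int arrays,
   assuming (for illustration only) that key[] directly follows val[]. */
size_t split_val_key_distance(void)
{
    return SIZE * sizeof(int);
}
```

In the merged layout val and key of one element are sizeof(int) bytes apart, so they normally share a cache block. In the split layout they are SIZE * sizeof(int) bytes apart and can map to the same cache set.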


Loop Interchange Example


/* Before */
for (k = 0; k < 100; k = k+1)
    for (j = 0; j < 100; j = j+1)
        for (i = 0; i < 5000; i = i+1)
            x[i][j] = 2 * x[i][j];

/* After */
for (k = 0; k < 100; k = k+1)
    for (i = 0; i < 5000; i = i+1)
        for (j = 0; j < 100; j = j+1)
            x[i][j] = 2 * x[i][j];

Sequential accesses instead of striding through memory every 100 words
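
The "every 100 words" figure follows from C's row-major array layout. The sketch below computes the word offset of x[i][j] and the stride of the innermost loop in both versions. It assumes x is declared as a 5000 x 100 array, as the loop bounds suggest; the function names are made up for illustration.

```c
#include <stddef.h>

#define ROWS 5000   /* range of i in the example */
#define COLS 100    /* range of j in the example */

/* Word offset of x[i][j] from x[0][0] in row-major order. */
size_t word_offset(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Before: the innermost loop steps i, so each access jumps a whole row. */
size_t inner_stride_before(void)
{
    return word_offset(1, 0) - word_offset(0, 0);
}

/* After: the innermost loop steps j, so accesses are adjacent words. */
size_t inner_stride_after(void)
{
    return word_offset(0, 1) - word_offset(0, 0);
}
```

inner_stride_before() is 100 words and inner_stride_after() is 1 word, which is why the interchanged loop uses every word of a cache block before moving on.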


Loop Fusion Example


/* Before */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        d[i][j] = a[i][j] + c[i][j];

/* After */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1) {
        a[i][j] = 1/b[i][j] * c[i][j];
        d[i][j] = a[i][j] + c[i][j];
    }

2 misses per access to a & c vs. one miss per access
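
Fusion is only valid if it does not change the result. The following sketch wraps both versions in functions and compares their output on sample data. The element type (double), the size N = 64, the input values, and all function names are assumptions made for this illustration; the slide does not specify them.

```c
#define N 64   /* hypothetical matrix size; the slide leaves N unspecified */

/* Before: two loop nests, so a and c are traversed twice. */
void compute_unfused(double a[N][N], double b[N][N],
                     double c[N][N], double d[N][N])
{
    int i, j;
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1)
            a[i][j] = 1/b[i][j] * c[i][j];
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1)
            d[i][j] = a[i][j] + c[i][j];
}

/* After: one loop nest; a[i][j] and c[i][j] are reused while still cached. */
void compute_fused(double a[N][N], double b[N][N],
                   double c[N][N], double d[N][N])
{
    int i, j;
    for (i = 0; i < N; i = i+1)
        for (j = 0; j < N; j = j+1) {
            a[i][j] = 1/b[i][j] * c[i][j];
            d[i][j] = a[i][j] + c[i][j];
        }
}

/* Returns 1 if both versions produce the same a and d on sample inputs. */
int fusion_results_match(void)
{
    static double b[N][N], c[N][N];
    static double a1[N][N], d1[N][N], a2[N][N], d2[N][N];
    int i, j;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            b[i][j] = (double)(1 << ((i + j) % 4));  /* 1, 2, 4 or 8 */
            c[i][j] = (double)(i - j);
        }
    compute_unfused(a1, b, c, d1);
    compute_fused(a2, b, c, d2);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (a1[i][j] != a2[i][j] || d1[i][j] != d2[i][j])
                return 0;
    return 1;
}
```

The sample values of b are powers of two so that every intermediate result is exact and the comparison does not depend on floating-point rounding.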


Blocking Example
/* Before */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1) {
        r = 0;
        for (k = 0; k < N; k = k+1)
            r = r + y[i][k]*z[k][j];
        x[i][j] = r;
    }

Two Inner Loops:
- Read all NxN elements of z[]
- Read N elements of 1 row of y[] repeatedly
- Write N elements of 1 row of x[]
Capacity Misses a function of N & Cache Size:
- If the cache can hold all 3 NxN matrices => no capacity misses; otherwise ...
Idea: compute on BxB submatrix that fits


Blocking Example
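/* After (x[][] must start out as all zeros) */
for (jj = 0; jj < N; jj = jj+B)
    for (kk = 0; kk < N; kk = kk+B)
        for (i = 0; i < N; i = i+1)
            for (j = jj; j < min(jj+B,N); j = j+1) {
                r = 0;
                for (k = kk; k < min(kk+B,N); k = k+1)
                    r = r + y[i][k]*z[k][j];
                x[i][j] = x[i][j] + r;
            }

Note: with a strict < comparison the upper bound must be min(jj+B,N), not min(jj+B-1,N); the latter skips the last column of every block.

Capacity misses drop from 2N^3 + N^2 to 2N^3/B + N^2
B is called the Blocking Factor
Conflict Misses Too?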
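
A runnable sketch of the blocked multiply, checked against the straightforward version, is given below. The sizes (N = 50, B = 8), the double element type, the test data, and the function names are all assumptions for illustration. The sketch uses min(jj+B, N) as the exclusive upper bound so that every element of each block is visited, and it zeroes x before accumulating.

```c
#define N 50   /* hypothetical matrix size, deliberately not a multiple of B */
#define B 8    /* hypothetical blocking factor */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Straightforward triple loop. */
void matmul_naive(double x[N][N], double y[N][N], double z[N][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            double r = 0.0;
            for (k = 0; k < N; k++)
                r += y[i][k] * z[k][j];
            x[i][j] = r;
        }
}

/* Blocked version: works on B x B pieces of z at a time. */
void matmul_blocked(double x[N][N], double y[N][N], double z[N][N])
{
    int i, j, k, jj, kk;

    for (i = 0; i < N; i++)           /* x accumulates partial sums, */
        for (j = 0; j < N; j++)       /* so it must start at zero    */
            x[i][j] = 0.0;

    for (jj = 0; jj < N; jj += B)
        for (kk = 0; kk < N; kk += B)
            for (i = 0; i < N; i++)
                for (j = jj; j < min_int(jj + B, N); j++) {
                    double r = 0.0;
                    for (k = kk; k < min_int(kk + B, N); k++)
                        r += y[i][k] * z[k][j];
                    x[i][j] += r;
                }
}

/* Returns 1 if the blocked result equals the naive result on sample data. */
int blocked_matches_naive(void)
{
    static double y[N][N], z[N][N], x1[N][N], x2[N][N];
    int i, j;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            y[i][j] = (double)((i + 2 * j) % 7);   /* small integers, so  */
            z[i][j] = (double)((3 * i + j) % 5);   /* all sums are exact  */
        }
    matmul_naive(x1, y, z);
    matmul_blocked(x2, y, z);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (x1[i][j] != x2[i][j])
                return 0;
    return 1;
}
```

Choosing N so that B does not divide it exercises the min() bounds on the last, partial block.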


Reducing Conflict Misses by Blocking


[Figure: plot against Blocking Factor (0 to 150), vertical axis from 0 to 0.15, with one curve for a Direct Mapped Cache and one for a Fully Associative Cache]

Conflict misses in caches that are not fully associative vs. blocking size
- Lam et al [1991]: a blocking factor of 24 had a fifth the misses vs. 48, despite both fitting in cache

Summary of Compiler Optimizations to Reduce Cache Misses


[Figure: bar chart of Performance Improvement (axis from 1 to 3) for vpenta (nasa7), gmty (nasa7), tomcatv, btrix (nasa7), mxm (nasa7), spice, cholesky (nasa7), and compress; bars for merged arrays, loop interchange, loop fusion, and blocking]

Summary

- Cost Effective Memory Hierarchy
- Split Instruction and Data Cache
- 4 Questions
- CPU cycles/time, Memory Stall Cycles
- Your programs and cache performance

Next: Virtual Memory


Virtual Memory


Memory System Management Problems

Different programs have different memory requirements.
- How to manage program placement?
Different machines have different amounts of memory.
- How to run the same program on many different machines?
At any given time each machine runs a different set of programs.
- How to fit the program mix into memory? Reclaiming unused memory? Moving code around?
The amount of memory consumed by each program is dynamic (changes over time).
- How to effect changes in memory location: add or subtract space?
Program bugs can cause a program to generate reads and writes outside the program address space.
- How to protect one program from another?


Virtual Memory
- Provides illusion of very large memory
  - Sum of the memory of many jobs greater than physical memory
  - Address space of each job larger than physical memory
- Allows available (fast and expensive) physical memory to be well utilized.
- Simplifies memory management: code and data movement, protection, ... (main reason today)
- Exploits memory hierarchy to keep average access time low.
- Involves at least two storage levels: main and secondary

Virtual Address -- address used by the programmer
Virtual Address Space -- collection of such addresses
Memory Address -- address in physical memory, also known as physical address or real address

Review: A Simple Program's Memory Layout

[Figure: memory layout with top address 2^n - 1; regions from top to bottom: Stack (e.g. 5), Data (e.g. 3), Text (e.g. add r,s1,s2), Reserved]

What are the possible addresses generated by the program?
How big is our DRAM?
Is there more than one program running? If so, how do we allocate memory to each?


Paged Virtual Memory: Main Idea

Divide memory (virtual and physical) into fixed size blocks (Pages, Frames).
- Pages in Virtual space.
- Frames in Physical space.

Make page size a power of 2: (page size = 2^k)
All pages in the virtual address space are contiguous.
Pages can be mapped into physical Frames in any order.
Some of the pages are in main memory (DRAM), some of the pages are on secondary memory (disk).
All programs are written using Virtual Memory Address Space.
The hardware does on-the-fly translation between virtual and physical address spaces.
Use a Page Table to translate between Virtual and Physical addresses.


Basic Issues in Virtual Memory System Design


- Size of the information blocks (pages) that are transferred from secondary to main storage (M)
- When a block of information is brought into M and M is full, some region of M must be released to make room for the new block --> replacement policy
- Which region of M is to hold the new block --> placement policy
- A missing item is fetched from secondary memory only on the occurrence of a fault --> demand load policy

[Figure: storage hierarchy reg, cache, mem, disk; pages on disk are brought into frames in mem]

Paging Organization: virtual and physical address space partitioned into blocks of equal size (pages and frames)

Virtual and Physical Memories


[Figure: Virtual Memory (Page-0, Page-1, Page-2, Page-3, ..., Page N-2, Page N-1, Page N), Physical Memory (Frame-0 through Frame-5), and Disk (holding Page-2)]



Virtual to Physical Address translation


Page size: 4K

Virtual Address:  bits 31..12 = Virtual Page Number, bits 11..0 = Page offset
                  (the Virtual Page Number indexes the Page Table)
Physical Address: bits 29..12 = Physical Frame Number, bits 11..0 = Page offset
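
The translation in the diagram can be sketched in C as follows. Only the 4K page size and the bit positions come from the slide. The four-entry page table, its contents, and the function names are hypothetical, and a real page table entry would also carry valid and protection bits that this sketch leaves out.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u                     /* 4K pages: 2^12 bytes */
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define OFFSET_MASK (PAGE_SIZE - 1u)

/* Hypothetical page table: index = virtual page number,
   value = physical frame number. */
#define NUM_PAGES 4u
static const uint32_t page_table[NUM_PAGES] = {5, 2, 7, 0};

/* Virtual page number: bits 31..12 of the virtual address. */
uint32_t vpn_of(uint32_t va)
{
    return va >> PAGE_SHIFT;
}

/* Page offset: bits 11..0, copied unchanged into the physical address. */
uint32_t offset_of(uint32_t va)
{
    return va & OFFSET_MASK;
}

/* Translate va into *pa. Returns 0 on success, or -1 if the page is not
   in this toy table (a real system would raise a page fault instead). */
int translate(uint32_t va, uint32_t *pa)
{
    uint32_t vpn = vpn_of(va);
    if (vpn >= NUM_PAGES)
        return -1;
    *pa = (page_table[vpn] << PAGE_SHIFT) | offset_of(va);
    return 0;
}
```

For example, virtual address 0x00001ABC has virtual page number 1 and offset 0xABC. The table maps page 1 to frame 2, so the physical address is 0x00002ABC.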