Memory hierarchy
A structure that uses multiple levels of memories; as the distance from the processor
increases, the size of the memories and the access time both increase.
Average memory access time (AMAT) = Hit time + Miss rate x Miss penalty.
To improve performance, reduce AMAT: reduce the miss rate, the miss penalty, or the hit time.
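For example, with an assumed hit time of 1 cycle, a miss rate of 5%, and a miss penalty of 20 cycles (illustrative values, not taken from these notes), AMAT = 1 + 0.05 x 20 = 2 cycles; halving the miss rate to 2.5% would bring it down to 1.5 cycles.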
Cache Parameters
Cache Addresses
Logical
Physical
Cache Size
Counts only the data bytes (not the tags or status bits)
Mapping Function
Direct
Associative
Set associative
Replacement Algorithm
Least recently used (LRU)
First in first out (FIFO)
Least frequently used (LFU)
Random
Write Policy
Write through
Write back
Write-allocate
Write no-allocate
Line (Block) Size
Bytes sharing a tag
Number of Caches
Single or two level
Unified or split
Elements of Cache Design
Direct-mapped cache
-- each memory location is mapped to exactly one location in the cache
Example: direct-mapped cache with two sets and a four-word block size; address 0x8000009C
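A minimal sketch of how that address is decomposed under the geometry above (four-word, i.e. 16-byte, blocks give a 4-bit byte offset; two sets give a 1-bit index); the constants and program are illustrative, not from the notes:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parameters matching the example above:
   4-word (16-byte) blocks -> 4 offset bits, 2 sets -> 1 index bit. */
#define OFFSET_BITS 4
#define INDEX_BITS  1

int main(void) {
    uint32_t addr   = 0x8000009C;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                  /* byte within block */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* which set */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);                /* identifies the block */
    printf("tag=0x%X index=%u offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    /* Prints: tag=0x4000004 index=1 offset=12 -> the address maps to set 1. */
    return 0;
}
```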
4-way set-associative cache. Cache size: 1K blocks = 256 sets x 4 blocks/set; with 1 word per block this is a 4 KB cache (4 x 256 blocks).
The Output enable signals of the cache RAMs can be used to select the entry in the set
that drives the output.
The output enable signal, driven by the comparators, selects the required data.
This organization eliminates the need for the multiplexor.
Advantages of Set associative caches
Higher Hit rate for the same cache size - reduced Conflict Misses.
Disadvantages of Set Associative Caches
N-way Set Associative Cache versus Direct Mapped Cache:
N comparators vs. 1
Extra MUX delay for the data
Data comes AFTER Hit/Miss decision and set selection
In a direct mapped cache, Cache Block is available BEFORE Hit/Miss
Possible to assume a hit and continue. Recover later if miss.
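A small C sketch of the N-way lookup just compared, assuming a hypothetical geometry (256 sets, 4 ways, one word per block) and made-up names; it only illustrates the per-way tag comparison, not a complete cache:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS    256
#define NUM_WAYS    4
#define OFFSET_BITS 2            /* 4-byte (1-word) blocks */
#define INDEX_BITS  8            /* 256 sets */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;               /* one word per block in this sketch */
} Line;

static Line cache[NUM_SETS][NUM_WAYS];

/* Returns true on a hit and fills *word. In hardware the NUM_WAYS tag
   comparisons run in parallel (N comparators vs. 1 for direct-mapped),
   and the data is driven out only after the matching way is known. */
bool lookup(uint32_t addr, uint32_t *word) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int way = 0; way < NUM_WAYS; way++) {
        Line *l = &cache[index][way];
        if (l->valid && l->tag == tag) {   /* one comparator per way */
            *word = l->data;
            return true;
        }
    }
    return false;                          /* miss: pick a victim and refill */
}
```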
Fully Associative Cache
Practical pseudo-LRU -- use a binary tree; each node records which half is older/newer.
Update the nodes on each reference.
Follow the "older" pointers to find the LRU victim.
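A small sketch of this scheme for a single 4-way set, under the convention that each bit points toward the older half; the type and function names are hypothetical:

```c
#include <stdint.h>

/* Tree pseudo-LRU for one 4-way set: 3 bits arranged as a binary tree.
   b[0] is the root (0 = victim is among ways 0-1, 1 = among ways 2-3),
   b[1] chooses between ways 0/1, b[2] between ways 2/3. */
typedef struct { uint8_t b[3]; } Plru4;

/* On a reference to 'way', point every node on its path away from it,
   so the bits lead toward the (pseudo) least recently used way. */
void plru_touch(Plru4 *p, int way) {
    p->b[0] = (way < 2) ? 1 : 0;                 /* root: the other half is older */
    if (way < 2) p->b[1] = (way == 0) ? 1 : 0;   /* within the left pair  */
    else         p->b[2] = (way == 2) ? 1 : 0;   /* within the right pair */
}

/* Follow the "older" pointers from the root to pick a victim. */
int plru_victim(const Plru4 *p) {
    if (p->b[0] == 0) return (p->b[1] == 0) ? 0 : 1;
    else              return (p->b[2] == 0) ? 2 : 3;
}
```

Three bits per set approximate LRU far more cheaply than maintaining a full recency ordering of the four ways.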
LRU for a fully associative cache.
The cache mechanism maintains a separate list of indexes to all the lines in the cache.
When a line is referenced, it moves to the front of the list.
For replacement, the line at the back of the list is used.
First-in-first-out (FIFO) -- Replace that block in the set that has been in the cache longest.
FIFO is easily implemented as a round-robin or circular buffer technique.
Least Frequently Used (LFU) -- Counter per block, incremented on reference
Evictions choose lowest count
Random -- victim blocks are selected at random.
Simulation studies show that random replacement gives only slightly inferior performance to
algorithms based on usage.
Handling Writes
Write Policy Choices
Write through -- write to both the cache and the lower-level memory.
Slow -- always requires a memory write; higher traffic.
Performance is improved with a write buffer: data is held in the buffer while waiting to be written
to memory (by the MMU), so the processor can continue execution until the write buffer is full.
+ Read misses cannot result in writes; + data coherency
Write back -- write only to the cache.
The modified cache block is written to main memory only when it is replaced.
More efficient than write-through, but more complex to implement.
A dirty bit per block, indicating a modified value, can further reduce the traffic.
+ Less memory bandwidth
Write miss: should a block be allocated on the miss?
Write allocate -- the block is allocated on a write miss, followed by the write-hit actions.
Write no-allocate -- the cache is not modified on a write miss; the write goes directly to the lower-level memory.
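A minimal sketch of how these choices combine, pairing write-back with write-allocate and write-through with write-no-allocate; the Line type, the single-line interface, and the small memory[] array standing in for the lower level are all assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64

typedef struct {
    bool     valid, dirty;        /* dirty: block modified since it was fetched */
    uint32_t tag;                 /* for simplicity, holds the full block address here */
    uint8_t  data[BLOCK_BYTES];
} Line;

/* Stand-in for the lower-level memory; a real simulator would model it properly. */
static uint8_t memory[1 << 20];

static void mem_write_block(uint32_t block_addr, const uint8_t *src) {
    memcpy(&memory[block_addr], src, BLOCK_BYTES);
}
static void mem_read_block(uint32_t block_addr, uint8_t *dst) {
    memcpy(dst, &memory[block_addr], BLOCK_BYTES);
}

/* One write to a single (hypothetical) cache line; 'hit' says whether the tag matched. */
void handle_write(Line *line, bool hit, uint32_t addr, uint8_t byte,
                  bool write_back_policy) {
    uint32_t block_addr = addr & ~(uint32_t)(BLOCK_BYTES - 1);
    uint32_t offset     = addr &  (BLOCK_BYTES - 1);

    if (!hit) {
        if (!write_back_policy) {                     /* write-no-allocate */
            memory[addr] = byte;                      /* write around the cache */
            return;
        }
        /* write-allocate: flush the old block if dirty, then refill the line */
        if (line->valid && line->dirty)
            mem_write_block(line->tag, line->data);
        mem_read_block(block_addr, line->data);
        line->valid = true;
        line->dirty = false;
        line->tag   = block_addr;
        /* ...then fall through to the write-hit actions */
    }

    line->data[offset] = byte;                        /* update the cached copy */
    if (write_back_policy)
        line->dirty = true;                           /* memory updated only on eviction */
    else
        memory[addr] = byte;                          /* write-through: update memory too */
}
```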
Organization
Variations on a set-associative organization
The advantage of increasing the degree of associativity -- usually decreases the miss rate
Potential disadvantages of associativity -- increased cost and slower access time.
Addressing
Tag / Index / Block offset
Q3: Which block should be replaced on a miss? (Block replacement)
Random, LRU
Q4: What happens on a write? (Write strategy)
Write Back or Write Through, Write Buffer
Write-back advantages:
Individual words can be written by the processor at the rate that the cache, rather than the
memory, can accept them.
Multiple writes within a block require only one write to the lower level in the hierarchy.
When blocks are written back, the system can make effective use of a high bandwidth
transfer, since the entire block is written.
Write-through advantages:
Misses are simpler and cheaper because they never require a block to be written back to the
lower level.
Write-through is easier to implement than write-back, although to be practical, a write-through cache will still need to use a write buffer.
Cache performance
Causes for Cache Misses
Average memory access time = Hit time + Miss rate x Miss penalty
3 Cs model -- cache misses: compulsory, capacity, and conflict misses.
compulsory miss -- also called cold-start miss.
A cache miss caused by the first access to a block that has never been in the cache.
capacity miss -- the cache cannot contain all the blocks needed to satisfy the requests
conflict miss -- also called collision miss.
Occurs in a set-associative or direct-mapped cache when multiple blocks compete for
the same set (such misses are eliminated in a fully associative cache of the same size)
4th C: Coherence - Misses caused by cache coherence (Multiprocessors)
Miss rate vs. block size
Increasing the block size usually decreases the miss rate, but it reduces the number of blocks
that can be held in the cache, increasing competition for those blocks.
The cost of a miss also increases -- a larger miss penalty.
The miss rate actually goes up if the block size is too large relative to the cache size.
Virtual memory
Pipeline and VM
Page table -- indexed with the virtual page number (VPN) to obtain the physical page number (PPN); example sizes: 4 GB virtual address space, 1 GB physical memory.
Page table register -- pointer to the starting address of the page table.
With a 32-bit virtual address and 4 KB pages, the number of entries in the page table is 2^20, or about 1 million entries.
Valid bit -- if it is off, the page is not present in main memory.
A single page table holds, for each virtual page, either the physical page number or the disk address.
Page table entry -- PPN or disk address.
VPN -- index into the page table.
If the valid bit is on, the page table supplies the PPN (the starting address of the page in memory).
If the valid bit is off, the page currently resides only on disk, at a specified disk address.
The table of physical page addresses and disk page addresses is logically one table, but it may
be stored in two separate data structures.
Multi-level page tables -- used to avoid a large page table size. Each program has its own page table.
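A sketch of a two-level walk consistent with the 32-bit, 4 KB-page numbers above (a 10 + 10 + 12-bit split of the virtual address); the structures, field names, and translate() function are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;       /* off: page (or second-level table) not in memory */
    uint32_t ppn;         /* physical page number when valid */
    uint32_t disk_addr;   /* where the page lives on disk when not valid */
} Pte;

typedef struct { Pte  entries[1024]; } L2Table;
typedef struct { bool valid; L2Table *table; } L1Entry;

/* Returns true and fills *paddr on success; false means a page fault
   (the OS would then bring the page in from disk_addr). */
bool translate(L1Entry *l1 /* page table register points here */,
               uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn1   = (vaddr >> 22) & 0x3FF;  /* index into the first level  */
    uint32_t vpn2   = (vaddr >> 12) & 0x3FF;  /* index into the second level */
    uint32_t offset =  vaddr        & 0xFFF;  /* byte within the 4 KB page   */

    if (!l1[vpn1].valid) return false;        /* second-level table absent */
    Pte *pte = &l1[vpn1].table->entries[vpn2];
    if (!pte->valid) return false;            /* page resides only on disk */

    *paddr = (pte->ppn << 12) | offset;       /* PPN supplies the frame address */
    return true;
}
```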
Physically addressed cache -- longer hit time (address translation must finish before the cache access).
Virtually addressed cache -- aliasing problem.
Virtually indexed, physically tagged cache -- the cache index must fit in the page offset (cache index + displacement = page size).
virtually indexed and virtually tagged cache
the address translation hardware (TLB) is unused during the normal cache access
This takes the TLB out of the critical path, reducing cache latency.
Aliasing -- one object, two names: two virtual addresses for the same physical page.
A word on such a page may be cached in two different locations, each corresponding to
different virtual addresses.
This ambiguity would allow one program to write the data without the other program being
aware that the data had changed.
Virtually addressed caches need design limitations to ensure that aliases do not occur.
Virtually indexed but physically tagged caches -- the page offset is used as the cache index.
The TLB and the (direct-mapped) cache are accessed in parallel; k = Index + Dsp = page size.
The last k bits of the virtual and physical addresses are the same, so both yield the same index.
This ties the cache size and associativity to the page size (cache size / associativity <= page size).
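A minimal check of that constraint, with hypothetical parameters (a 16 KB 4-way cache and 4 KB pages):

```c
#include <stdbool.h>
#include <stdio.h>

/* Checks the virtually-indexed, physically-tagged constraint: the index and
   block-offset bits (k = Index + Dsp) must lie within the page offset, i.e.
   cache_size / associativity <= page_size. Parameters are illustrative. */
bool vipt_ok(unsigned cache_size, unsigned associativity, unsigned page_size) {
    return cache_size / associativity <= page_size;
}

int main(void) {
    /* e.g. a 16 KB, 4-way cache with 4 KB pages: 16K/4 = 4K <= 4K -> OK,
       so the cache can be indexed in parallel with the TLB lookup. */
    printf("%s\n", vipt_ok(16 * 1024, 4, 4 * 1024) ? "OK" : "too big");
    return 0;
}
```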
Implementing Protection with Virtual Memory
Write access bit -- included in both the TLB and the page table.
The OS implements the protection; the CPU provides user and supervisor (kernel) modes,
with special instructions that are only available in supervisor mode.
A mechanism for changing from user mode to supervisor mode and vice versa.
system call exception (type) -- special instruction (syscall in MIPS) -- transfers control to a
dedicated location in supervisor code space.
o PC from the point of the system call is saved in the exception PC (EPC), and the
processor is placed in supervisor mode.
Return to user mode from the exception -- return from exception (ERET) instruction
resets to user mode and jumps to the address in EPC.
Process protection
Each process has its own virtual address space.
Page tables -- in the protected address space of the OS.
When processes want to share information in a limited way, the OS must assist them.
Access right bits for a page must be included in both the page table and the TLB
Protection with Virtual Memory