6.0 Cache Memory

07/05/2012
EN3542 Digital System Design
Cache Memory
Ajith Pasqual pasqual@ent.mrt.ac.lk
Dept. of Electronic & Telecommunication Engineering B.Sc. Engineering Semester 5 Module

Characteristics
Capacity: expressed in terms of bytes or words. Typical word lengths are 8, 16 and 32 bits.
Word: the natural unit of organization of memory. The size of the word is typically equal to the number of bits used to represent a number.
Addressable Units: word or byte.
Unit of Transfer: number of bits read out of or written into memory at a time.
Method of Accessing:
Sequential Access: access is made in a specific linear sequence.
Direct Access: individual blocks or records have a unique address based on physical location. Access is accomplished by direct access followed by sequential access.
Random Access: each addressable location in memory has a unique, physically wired-in addressing mechanism.
Associative: a random access type of memory that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address.
Memory Hierarchy
Key characteristics of memory: cost, capacity and access time.
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time

Performance parameters:
Access time: time taken to perform a read/write operation.
Memory cycle time: access time plus any additional time required before a second access can commence (applicable to RAM).
Transfer rate: rate at which data can be transferred into or out of a memory unit.

Going down the memory hierarchy:
Decreasing cost per bit
Increasing capacity
Increasing access time
Decreasing frequency of access of the memory by the processor
Performance of a simple two-level memory

T1: access time to level 1
T2: access time to level 2
H (hit ratio): fraction of all memory accesses that are found in the faster memory
Cache Memory
- Intended to give memory speed approaching that of the fastest memories available, and at the same time provide a large memory size at the price of less expensive types of semiconductor memories.

Cache & Main Memory
Main memory consists of up to $2^n$ addressable words, with each word having a unique n-bit address.
For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each ($M = 2^n / K$ blocks).
Cache consists of C lines of K words each ($C \ll M$).
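As a quick check of the block count, here is a minimal Python sketch. The function name `num_blocks` and the sample values (24 address bits, 4 words per block) are assumptions chosen for illustration only.

```python
def num_blocks(n_address_bits, words_per_block):
    """Return M = 2^n / K, the number of fixed-length blocks in main memory."""
    total_words = 2 ** n_address_bits
    if total_words % words_per_block != 0:
        raise ValueError("block size K must divide the memory size 2^n")
    return total_words // words_per_block


# Assumed sample values: n = 24 address bits, K = 4 words per block.
M = num_blocks(24, 4)
print(M)  # 4194304 blocks, i.e. 4M
```

A cache with C lines holds only C of these M blocks at any time, which is why each line needs a tag.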
At any time, some subset of blocks resides in lines in the cache.

Each line includes a tag (portion of memory address) that identifies which block is currently stored
Elements of Cache Design

Cache Size
Mapping Function: Direct, Associative, Set Associative
Replacement Algorithm: Least Recently Used, First In First Out, Least Frequently Used, Random
Write Policy: Write Through, Write Back, Write Once
Lines of Cache
Number of Caches: Single or two level, Unified or Split

Cache Size
The cache should be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall access time is close to that of the cache alone.
Large caches tend to be slightly slower than smaller ones (due to the large number of gates involved in addressing the cache).
Cache size is also limited by the available chip and board area.

Mapping
Since cache is much smaller than main memory, an algorithm is needed for mapping main memory blocks into cache lines. Further, a means is needed for determining which memory block currently occupies a cache line. The choice of mapping function determines the way the cache is organized.

Consider the following data:
Cache can hold 64 kB.
Data is transferred between cache and main memory in blocks of 4 bytes each (i.e. the cache is organized as 16K lines of 4 bytes).
Main memory consists of 16 MB, with each byte directly addressable (thus, for mapping purposes, we can consider main memory to consist of 4M blocks of 4 bytes each).

Direct Mapping
Each block of main memory maps to only one cache line, i.e. if a block is in the cache, it must be in one specific place.
The address is in two parts:
The least significant w bits identify a unique word.
The most significant s bits specify one memory block.
The MSBs are split into a cache line field of r bits and a tag of s-r bits (most significant).
Direct Mapping Address Structure

Field:  Tag (s-r)   Line or Slot (r)   Word (w)
Bits:   8           14                 2

24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier: 8-bit tag (= 22 - 14) and 14-bit slot or line
No two blocks that map to the same line have the same tag field.
Check the contents of the cache by finding the line and checking the tag.

Direct Mapping Cache Line Table

Cache line    Main memory blocks held
0             0, m, 2m, 3m, ..., $2^s - m$
1             1, m+1, 2m+1, ..., $2^s - m + 1$
...
m-1           m-1, 2m-1, 3m-1, ..., $2^s - 1$
Direct Mapping: maps each block of main memory into only one possible cache line.
$i = j \bmod m$
where i is the cache line number, j is the main memory block number, and m is the number of lines in the cache.

Mapping
Address Format: a direct-mapped cache treats a main memory address as 3 distinct fields:
Tag identifier
Line number identifier
Word identifier (offset)
The word identifier specifies the word (or addressable unit) in a cache line that is to be read.
The line identifier specifies the physical line in the cache that will hold the referenced address.
The tag is stored in the cache along with the data words of the line.
For every memory reference that the CPU makes, the specific line that would hold the reference (if it has already been copied into the cache) is determined.
The tag held in that line is checked to see if the correct block is in the cache.
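The lookup described above can be sketched in Python for a 24-bit address with an 8-bit tag, a 14-bit line field and a 2-bit word field. The function names, the list-of-tags cache model and the sample address 0x16339C are assumptions made for illustration; real hardware performs the split by wiring, not arithmetic.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # 24-bit address in total


def split_direct(address):
    """Split a 24-bit address into its (tag, line, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


def is_hit(cache_tags, address):
    """A reference hits if the tag stored in its line matches the address tag."""
    tag, line, _ = split_direct(address)
    return cache_tags[line] == tag


# Hypothetical cache model: one stored tag per line, None meaning "empty".
cache_tags = [None] * (1 << LINE_BITS)

addr = 0x16339C                      # assumed sample address
tag, line, word = split_direct(addr)
print(hex(tag), hex(line), word)     # 0x16 0xce7 0

print(is_hit(cache_tags, addr))      # False: the line is still empty
cache_tags[line] = tag               # load the block into its only possible line
print(is_hit(cache_tags, addr))      # True
```

Only one tag comparison is needed per reference, which is what makes direct mapping cheap to build.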
Mapping functions
Direct Mapping
Advantages: simple and inexpensive to implement.
Disadvantages: there is a fixed cache location for any given block. Thus, if a program happens to reference words repeatedly from two different blocks that map into the same cache line, the blocks will be continually swapped in the cache and the hit ratio will be low.
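The swapping problem can be seen in a small simulation. This is a sketch under assumed conditions: the cache has 16K lines, references are given as block numbers, and the two reference patterns are invented for illustration.

```python
def direct_mapped_hit_ratio(block_refs, num_lines):
    """Simulate a direct-mapped cache over a list of block numbers."""
    lines = [None] * num_lines        # block currently held by each line
    hits = 0
    for block in block_refs:
        line = block % num_lines      # the single line this block may occupy
        if lines[line] == block:
            hits += 1
        else:
            lines[line] = block       # miss: replace whatever was there
    return hits / len(block_refs)


NUM_LINES = 16 * 1024

# Blocks 0 and NUM_LINES both map to line 0, so they keep evicting each other.
conflicting = [0, NUM_LINES] * 50
# Blocks 0 and 1 map to different lines and can coexist.
friendly = [0, 1] * 50

print(direct_mapped_hit_ratio(conflicting, NUM_LINES))  # 0.0
print(direct_mapped_hit_ratio(friendly, NUM_LINES))     # 0.98
```

Both patterns touch only two blocks, yet the conflicting one never hits, even though the rest of the cache is empty.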
Direct Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
Number of lines in cache = $m = 2^r$
Size of tag = (s - r) bits
Associative Mapping: this overcomes the disadvantages of direct mapping by permitting each main memory block to be loaded into any line of the cache.
The cache control logic interprets a memory address simply as a tag and a word field.
The tag field uniquely identifies a block of main memory.
Associative Mapping
A main memory block can load into any line of the cache
Memory address is interpreted as tag and word
Tag uniquely identifies a block of memory
Every line's tag is examined for a match
Cache searching gets expensive
Associative Mapping Address Structure

Field:  Tag   Word
Bits:   22    2

A 22-bit tag is stored with each 32-bit block of data.
Compare the tag field with each tag entry in the cache to check for a hit.
The least significant 2 bits of the address identify which byte is required from the 32-bit data block.
e.g.
Address   Tag      Data       Cache line
FFFFFC    FFFFFC   24682468   3FFF
Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
Number of lines in cache = undetermined
Size of tag = s bits
Associative Mapping
Disadvantages: complex circuitry is required to examine the tags of all cache lines in parallel.
Set Associative Mapping

Cache is divided into a number of sets
Each set contains a number of lines
A given block maps to any line in a given set, e.g. block B can be in any line of set i
Mapping functions
Set Associative Mapping: a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
The cache is divided into v sets, each of which consists of k lines.
$m = v \times k$
$i = j \bmod v$
where i is the cache set number, j is the main memory block number, and m is the number of lines in the cache.
This is referred to as k-way set associative mapping.
e.g. with 2 lines per set (2-way set associative mapping), a given block can be in one of 2 lines in only one set.
Set Associative Mapping Example

13-bit set number
The set number is the block number in main memory modulo $2^{13}$
Addresses 000000, 008000, 010000, 018000, ... map to the same set
Set Associative Mapping Address Structure
Field:  Tag   Set   Word
Bits:   9     13    2

Use the set field to determine which cache set to look in.
Compare the tag field to see if we have a hit.
e.g.
Address     Tag   Data       Set number
1FF 7FFC    1FF   12345678   1FFF
001 7FFC    001   11223344   1FFF
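The two example addresses can be checked with a short Python sketch of the 9/13/2 split. It assumes that each example address is written as the 9-bit tag followed by the remaining 15 bits (set and word); the helper names are hypothetical.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # 24-bit address in total


def split_set_assoc(address):
    """Split a 24-bit address into its (tag, set, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


def from_tag_and_rest(tag, rest):
    """Rebuild a 24-bit address from a 9-bit tag and the low 15 bits.

    Assumption: this is how the example writes its addresses.
    """
    return (tag << (SET_BITS + WORD_BITS)) | rest


a1 = from_tag_and_rest(0x1FF, 0x7FFC)
a2 = from_tag_and_rest(0x001, 0x7FFC)

for addr in (a1, a2):
    tag, set_no, word = split_set_assoc(addr)
    print(f"{addr:06X}: tag {tag:03X}, set {set_no:04X}, word {word}")
# FFFFFC: tag 1FF, set 1FFF, word 0
# 00FFFC: tag 001, set 1FFF, word 0
```

Both blocks land in set 1FFF and are told apart only by their tags, so a 2-way cache can hold them at the same time.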
Set Associative Mapping Summary

Address length = (s + w) bits
Number of addressable units = $2^{s+w}$ words or bytes
Block size = line size = $2^w$ words or bytes
Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
Number of lines in set = k
Number of sets = $v = 2^d$
Number of lines in cache = $kv = k \times 2^d$
Size of tag = (s - d) bits
Example (direct mapping):
Memory size of 1 MB (20 address bits), addressable to the individual byte
Cache size of 1K lines, each holding 8 bytes
Word id = 3 bits
Line id = 10 bits
Tag id = 7 bits
Where in the cache is the byte at main memory location $ABCDE stored?
$ABCDE = 1010101 1110011011 110
Cache line $39B, word offset $6, tag $55
Associative Mapping
Let a block be stored in any cache line that is not in use
Overcomes direct mapping's main weakness
Must examine each line in the cache to find the right memory block
  Examine the line tag id for each line
  Slow process for large caches!
Line numbers (ids) have no meaning in the cache
Parse the main memory address into 2 fields (tag and word offset) rather than 3 as in direct mapping
Implement the cache in 2 parts:
  The lines themselves in SRAM
  The tag storage in associative memory
Perform an associative search over all tags to find the desired line (if it's in the cache)

Example (associative mapping):
Word id = 3 bits
Tag id = 17 bits
Where in the cache is the byte at main memory location $ABCDE stored?
$ABCDE = 10101011110011011 110
Cache line unknown, word offset $6, tag $1579B
Mapping
Set Associative Mapping:
Assume the 1024 lines are 4-way set associative
1024/4 = 256 sets
Word id = 3 bits
Set id = 8 bits
Tag id = 9 bits
Where in the cache is the byte at main memory location $ABCDE stored?
$ABCDE = 101010111 10011011 110
Cache set $9B, word offset $6, tag $157
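The three ways of parsing address $ABCDE (20 address bits, 8-byte lines, 1K lines) can be reproduced with one generic field splitter. This is a minimal Python sketch; the helper name `split_fields` is hypothetical.

```python
def split_fields(address, widths):
    """Split an address into bit fields; widths are listed most significant first."""
    fields = []
    for width in reversed(widths):
        fields.append(address & ((1 << width) - 1))  # take the lowest field
        address >>= width                            # then shift it away
    return tuple(reversed(fields))


ADDR = 0xABCDE

direct = split_fields(ADDR, (7, 10, 3))   # tag, line, word
assoc = split_fields(ADDR, (17, 3))       # tag, word
set4 = split_fields(ADDR, (9, 8, 3))      # tag, set, word (4-way, 256 sets)

print([hex(f) for f in direct])  # ['0x55', '0x39b', '0x6']
print([hex(f) for f in assoc])   # ['0x1579b', '0x6']
print([hex(f) for f in set4])    # ['0x157', '0x9b', '0x6']
```

The word offset is the same in all three schemes; only the split of the remaining 17 bits between tag and line or set changes.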
Line Replacement Algorithms

When an associative cache or a set associative cache set is full, which line should be replaced by the new line that is to be read from memory?
Not a problem for direct mapping, since each block has a predetermined line it must use.
Least recently used
First in first out
Least frequently used
Random
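Two of these policies, least recently used and first in first out, can be compared with a short simulation of a single fully associative set. This is a sketch only: the function name, the reference string and the use of an `OrderedDict` to track ordering are assumptions for illustration, not how hardware implements replacement.

```python
from collections import OrderedDict


def count_hits(block_refs, num_lines, policy="LRU"):
    """Count hits in one fully associative set under LRU or FIFO replacement."""
    if policy not in ("LRU", "FIFO"):
        raise ValueError("policy must be 'LRU' or 'FIFO'")
    lines = OrderedDict()              # oldest entry first
    hits = 0
    for block in block_refs:
        if block in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(block)   # a hit refreshes recency under LRU only
        else:
            if len(lines) == num_lines:
                lines.popitem(last=False)  # evict the entry at the front
            lines[block] = True
    return hits


refs = [1, 2, 3, 1, 4, 1, 2]           # assumed reference string
print(count_hits(refs, 3, "LRU"))      # 2
print(count_hits(refs, 3, "FIFO"))     # 1
```

With this reference string LRU keeps block 1 because it was just used, while FIFO evicts it because it was loaded first.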
Write Policy
Must not overwrite a cache block unless main memory is up to date
Multiple CPUs may have individual caches
I/O may address main memory directly
When a line is to be replaced, the original copy of the line in main memory must be updated if any addressable unit in the line has been changed

Write through
Any time a word in the cache is changed, it is also changed in main memory
Both copies always agree
Generates lots of memory writes to main memory

Write back
During a write, only change the contents of the cache
Update main memory only when the cache line is to be replaced
Causes cache coherency problems: different values for the contents of an address are in the cache and in main memory
Complex circuitry is needed to avoid this problem
I/O must access main memory through the cache
N.B. 15% of memory references are writes
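The difference in memory traffic between the two policies can be illustrated by counting main memory writes for a single cache line. This Python sketch assumes a write-allocate cache (a write miss first loads the block) and an invented sequence of operations; the function name is hypothetical.

```python
def memory_writes(ops, policy):
    """Count writes to main memory for one cache line.

    ops is a sequence of ("read" | "write", block_number) pairs.
    Assumption: write-allocate, so a write miss loads the block first.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    cached_block = None
    dirty = False                      # used by write-back only
    writes = 0
    for op, block in ops:
        if block != cached_block:
            # Line replacement: write-back flushes a modified line first.
            if policy == "write-back" and dirty:
                writes += 1
            cached_block, dirty = block, False
        if op == "write":
            if policy == "write-through":
                writes += 1            # every write also goes to main memory
            else:
                dirty = True           # main memory is updated later
    return writes


# Ten writes to block 0, then a read of block 1 that replaces the line.
ops = [("write", 0)] * 10 + [("read", 1)]
print(memory_writes(ops, "write-through"))  # 10
print(memory_writes(ops, "write-back"))     # 1
```

Between the first write and the replacement, the write-back cache holds a value that main memory does not, which is the coherency problem noted above.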
Number of Lines/Blocks in Cache

Block / line sizes
How much data should be transferred from main memory to the cache in a single memory reference?
There is a complex relationship between block size and hit ratio, as well as the operation of the system bus itself.
As block size increases:
Locality of reference predicts that the additional information transferred will likely be used, which increases the hit ratio (good)
The number of blocks in the cache goes down, limiting the total number of blocks in the cache (bad)
As the block size gets big, the probability of referencing all the data in it goes down, so the hit ratio goes down (bad)
A size of 4-8 addressable units seems about right for current systems.
No. of Caches
Number of caches: single vs. 2-level
Modern CPU chips have on-board cache (L1)
80486: 8 KB
Pentium: 16 KB
PowerPC: up to 64 KB
L1 provides the best performance gains
Secondary, off-chip cache (L2) provides higher speed access to main memory
L2 is generally 512 KB or less; more than this is not cost-effective
Types of Cache
Unified vs. split cache
Unified cache stores data and instructions in 1 cache
Only 1 cache to design and operate
Cache is flexible and can balance the allocation of space to instructions or data to best fit the execution of the program (higher hit ratio)
Split cache uses 2 caches: 1 for instructions and 1 for data
Must build and manage 2 caches
Static allocation of cache sizes
Can outperform a unified cache in systems that support parallel execution and pipelining (reduces cache contention)

MESI Cache Coherence Protocol

MESI protocol provides cache coherency in both the Pentium and the PowerPC
Stands for Modified, Exclusive, Shared, Invalid
Implemented with an additional 2-bit field for each cache line
Becomes interesting in the interactions of the L1 and the L2 caches: each tracks the local MESI status as a line moves from main memory to L2 and then to L1
PowerPC adds an additional state, A, for allocated
A line is marked as A while its data is being swapped out
Operation of 2 level Memory

Recall the goal of the memory system:
Provide an average access time to all memory locations that is approximately the same as that of the fastest memory component
Provide a system memory with an average cost approximately equal to the cost/bit of the cheapest memory component
Simplistic approach:
$T_s = H_1 T_1 + H_2 (T_1 + T_2 + T_{b21})$
$H_2 = 1 - H_1$
$H_1$ is the hit ratio of level 1
$T_1$, $T_2$ are the access times to levels 1 and 2
$T_{b21}$ is the block transfer time from level 2 to level 1
Can be generalized to 3 or more levels
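The two-level formula is easy to evaluate numerically. In this Python sketch the function name and the sample timings (hit ratio 0.95, access times of 1 and 10 time units, block transfer time of 5 units) are assumed purely for illustration.

```python
def avg_access_time(h1, t1, t2, tb21):
    """Average access time Ts = H1*T1 + H2*(T1 + T2 + Tb21), with H2 = 1 - H1."""
    if not 0.0 <= h1 <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    h2 = 1.0 - h1
    return h1 * t1 + h2 * (t1 + t2 + tb21)


# Assumed sample values, in arbitrary time units.
ts = avg_access_time(h1=0.95, t1=1, t2=10, tb21=5)
print(round(ts, 2))  # 1.75
```

With a 95% hit ratio the average stays close to the level 1 access time, even though a miss costs 16 units in this example.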
Pentium 4 Cache
80386: no on-chip cache
80486: 8 KB, using 16-byte lines and four-way set associative organization
Pentium (all versions): two on-chip L1 caches (data and instructions)
Pentium 4 L1 caches: 8 KB, 64-byte lines, four-way set associative
Pentium 4 L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, 8-way set associative
