Characteristics
EN3542 Digital System Design
Cache Memory
Ajith Pasqual pasqual@ent.mrt.ac.lk
Dept. of Electronic & Telecommunication Engineering B.Sc. Engineering Semester 5 Module
Capacity: expressed in terms of bytes or words. Typical word lengths are 8, 16, and 32 bits.
Word: the natural unit of organization of memory. The size of the word is typically equal to the number of bits used to represent a number.
Addressable units: word or byte.
Unit of transfer: the number of bits read out of or written into memory at a time.
Method of Accessing:
Sequential Access: access is made in a specific linear sequence.
Direct Access: individual blocks or records have a unique address based on physical location. Access is accomplished by direct access to a general vicinity followed by sequential searching.
Characteristics ..
Random Access: each addressable location in memory has a unique, physically wired-in addressing mechanism.
Associative: a random-access type of memory that enables one to compare desired bit locations within a word for a specified match, and to do this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address.
Memory Hierarchy
Key characteristics of memory: cost, capacity, and access time.
Faster access time: greater cost per bit.
Greater capacity: smaller cost per bit.
Greater capacity: slower access time.
Performance parameters:
Access time: time taken to perform a read/write operation.
Memory cycle time: access time plus any additional time required before a second access can commence (applicable to RAM).
Transfer rate: rate at which data can be transferred into or out of a memory unit.
Cache Memory
- intended to give memory speed approaching that of the fastest memories available, while at the same time providing a large memory size at the price of less expensive types of semiconductor memory.
Cache & Main Memory: Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, main memory consists of a number of fixed-length blocks of K words each (M = 2^n / K blocks). The cache consists of C lines of K words each (C << M).
07/05/2012
Cache Memory
Mapping functions: Direct, Associative, Set Associative.
Replacement algorithms: Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), Random.
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Main Memory consists of 16MB with each byte directly addressable. (Thus for mapping purposes we can consider main memory to consist of 4M blocks of 4 bytes each)
The address is in two parts: the least significant w bits identify a unique word, and the most significant s bits specify one memory block. The MSBs are further split into a cache line field of r bits and a tag of s − r bits (most significant).
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 − 14)
14-bit slot or line
Cache line 0 holds memory blocks 0, m, 2m, ...; line 1 holds blocks 1, m+1, 2m+1, ...; line m−1 holds blocks m−1, 2m−1, 3m−1, ..., 2^s − 1.
No two blocks that map to the same line have the same tag field. The cache contents are checked by finding the line and comparing the tag.
Direct Mapping: maps each block of main memory into only one possible cache line: i = j modulo m, where i = cache line number, j = main memory block number, and m = number of lines in the cache.
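The modulo rule above can be sketched in a few lines of Python (the cache size m = 4 here is an illustrative value, not from the slides):

```python
def direct_map_line(j, m):
    """Cache line i that main memory block j maps to: i = j modulo m."""
    return j % m

# With m = 4 lines, blocks 0, 4, 8, ... all compete for line 0,
# blocks 1, 5, 9, ... for line 1, and so on.
lines = [direct_map_line(j, 4) for j in (0, 1, 4, 5, 8)]
print(lines)  # [0, 1, 0, 1, 0]
```

Note how blocks 0, 4, and 8 all land on line 0: this is the source of the swapping problem discussed below.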
Address Format:
Mapping
A direct-mapping cache treats a main memory address as 3 distinct fields:
Tag identifier
Line number identifier
Word identifier (offset)
The word identifier specifies the word (or addressable unit) in a cache line that is to be read. The line identifier specifies the physical line in the cache that will hold the referenced address. The tag is stored in the cache along with the data words of the line. For every memory reference the CPU makes, the specific line that would hold the reference (if it has already been copied into the cache) is determined, and the tag held in that line is checked to see whether the correct block is in the cache.
Mapping functions
Direct Mapping
Advantages: simple and inexpensive to implement.
Disadvantages: there is a fixed cache location for any given block. Thus, if a program happens to reference words repeatedly from two different blocks that map into the same cache line, the blocks will be continually swapped in the cache and the hit ratio will be low.
Associative Mapping: overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. The cache control logic interprets a memory address simply as a tag and a word field; the tag field uniquely identifies a block of main memory.
Associative Mapping
A main memory block can load into any line of the cache. The memory address is interpreted as a tag and a word field, where the tag uniquely identifies a block of memory. Every line's tag must be examined for a match, so cache searching gets expensive.
Word: 2 bits
A 22-bit tag is stored with each 32-bit block of data. The tag field of the address is compared with the tag entry of each cache line to check for a hit. The least significant 2 bits of the address identify which byte is required from the 32-bit data block.
e.g. Address FFFFFC: Tag 3FFFFF, Data 24682468, Cache line 3FFF (any free line may be used).
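For this 24-bit example, the associative tag is simply the address with the 2 word-offset bits dropped; a minimal sketch (function names are illustrative):

```python
def assoc_tag(addr):
    """22-bit associative tag of a 24-bit byte address with 4-byte blocks."""
    return addr >> 2            # drop the 2 word-offset bits

def word_offset(addr):
    """Which byte within the 4-byte block (low 2 bits)."""
    return addr & 0b11

print(hex(assoc_tag(0xFFFFFC)), word_offset(0xFFFFFC))  # 0x3fffff 0
```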
Associative Mapping
Disadvantages: complex circuitry is required to examine the tags of all cache lines in parallel.
Mapping functions
Set Associative Mapping: a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. The cache is divided into v sets, each of which consists of k lines: m = v × k, and i = j modulo v, where i = cache set number, j = main memory block number, and m = number of lines in the cache. This is referred to as k-way set-associative mapping.
Tag: 9 bits, Set: 13 bits, Word: 2 bits
Use the set field to determine the cache set to look in, then compare the tag field to see if we have a hit.
e.g.
Address 1FF 7FFC: Tag 1FF, Data 12345678, Set 1FFF
Address 001 7FFC: Tag 001, Data 11223344, Set 1FFF
Mapping ..
Direct mapping example: memory size of 1 MB (20 address bits), addressable to the individual byte; cache size of 1K lines, each holding 8 bytes.
Word id = 3 bits, Line id = 10 bits, Tag id = 7 bits.
Where is the byte stored at main memory location $ABCDE?
$ABCDE = 1010101 1110011011 110: cache line $39B, word offset $6, tag $55.
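The worked example can be checked with a short bit-slicing sketch in Python, using the field widths given above:

```python
def split_direct(addr, word_bits=3, line_bits=10):
    """Split a 20-bit byte address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << word_bits) - 1)            # low 3 bits
    line = (addr >> word_bits) & ((1 << line_bits) - 1)  # next 10 bits
    tag = addr >> (word_bits + line_bits)           # remaining 7 bits
    return tag, line, word

tag, line, word = split_direct(0xABCDE)
print(hex(tag), hex(line), hex(word))  # 0x55 0x39b 0x6
```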
Mapping ..
Associative mapping: let a block be stored in any cache line that is not in use. This overcomes direct mapping's main weakness, but each line in the cache must be examined to find the right memory block, checking the line's tag id for each line: a slow process for large caches! Line numbers (ids) have no meaning in the cache, so the main memory address is parsed into 2 fields (tag and word offset) rather than 3 as in direct mapping. The cache is implemented in 2 parts: the lines themselves in SRAM, and the tag storage in associative memory. An associative search is performed over all tags to find the desired line (if it's in the cache). Word id = 3 bits, Tag id = 17 bits.
Mapping
Associative Mapping:
Where is the byte stored at main memory location $ABCDE?
$ABCDE = 10101011110011011 110: cache line unknown, word offset $6, tag $1579B
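The same address under associative mapping splits into only two fields, which a two-line sketch confirms:

```python
def split_assoc(addr, word_bits=3):
    """Associative mapping: the address is only a (tag, word-offset) pair."""
    return addr >> word_bits, addr & ((1 << word_bits) - 1)

tag, word = split_assoc(0xABCDE)
print(hex(tag), hex(word))  # 0x1579b 0x6
```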
Mapping
Set Associative Mapping: assume the 1024 lines are 4-way set associative, giving 1024/4 = 256 sets. Word id = 3 bits, Set id = 8 bits, Tag id = 9 bits. Where is the byte stored at main memory location $ABCDE? $ABCDE = 101010111 10011011 110: cache set $9B, word offset $6, tag $157.
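The set-associative split follows the same pattern, with the set field in place of the line field:

```python
def split_set_assoc(addr, word_bits=3, set_bits=8):
    """4-way set-associative split of a 20-bit address into (tag, set, word)."""
    word = addr & ((1 << word_bits) - 1)                   # low 3 bits
    set_no = (addr >> word_bits) & ((1 << set_bits) - 1)   # next 8 bits
    tag = addr >> (word_bits + set_bits)                   # remaining 9 bits
    return tag, set_no, word

tag, set_no, word = split_set_assoc(0xABCDE)
print(hex(tag), hex(set_no), hex(word))  # 0x157 0x9b 0x6
```

Note how the tag grows from 7 bits (direct) to 9 bits (4-way set associative) to 17 bits (fully associative) as the placement freedom increases.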
Write Policy
Must not overwrite a cache block unless main memory is up to date: multiple CPUs may have individual caches, and I/O may address main memory directly. When a line is to be replaced, the original copy of the line in main memory must be updated if any addressable unit in the line has been changed.
Write through: anytime a word in the cache is changed, it is also changed in main memory, so both copies always agree. This generates lots of memory writes to main memory.
Write Policy
Write back
During a write, only the contents of the cache are changed; main memory is updated only when the cache line is replaced. This causes cache coherency problems (different values for the contents of an address in the cache and in main memory), and complex circuitry is required to avoid them; I/O must access main memory through the cache. N.B. about 15% of memory references are writes.
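The difference between the two policies can be illustrated with a toy single-line cache and a dirty bit (all names here are illustrative, not from the slides, and eviction of a conflicting line is ignored for brevity):

```python
memory = {}   # stand-in for main memory: block tag -> data
cache = {"tag": None, "data": None, "dirty": False}   # one cache line

def write_through(tag, value):
    # Write through: update the cache AND main memory on every write.
    cache.update(tag=tag, data=value, dirty=False)
    memory[tag] = value

def write_back(tag, value):
    # Write back: update only the cache and mark the line dirty.
    cache.update(tag=tag, data=value, dirty=True)

def evict():
    # With write back, main memory is updated only on replacement.
    if cache["dirty"]:
        memory[cache["tag"]] = cache["data"]
        cache["dirty"] = False

write_back(0x55, 1234)
print(memory)   # {} -- main memory is stale until the line is evicted
evict()
print(memory)   # {85: 1234}
```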
No. of Caches
Number of caches: single vs. 2-level. Modern CPU chips have on-board cache (L1): 80486, 8 KB; Pentium, 16 KB; PowerPC, up to 64 KB. L1 provides the best performance gains. A secondary, off-chip cache (L2) provides higher-speed access to main memory; L2 is generally 512 KB or less, since more than this is not cost-effective.
Types of Cache
Unified vs. split cache:
A unified cache stores data and instructions in 1 cache, so there is only 1 cache to design and operate. The cache is flexible and can balance the allocation of space between instructions and data to best fit the execution of the program, giving a higher hit ratio.
- A split cache uses 2 caches: 1 for instructions and 1 for data. 2 caches must be built and managed.
Pentium 4 Cache
80386: no on-chip cache.
80486: 8 KB, using 16-byte lines and a four-way set-associative organization.
Pentium (all versions): two on-chip L1 caches, one for data and one for instructions.
Pentium 4 L1 caches: 8 KB, 64-byte lines, four-way set associative.
L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, 8-way set associative.