07/05/2012

EN3542 Digital System Design
Cache Memory
Ajith Pasqual pasqual@ent.mrt.ac.lk
Dept. of Electronic & Telecommunication Engineering
B.Sc. Engineering Semester 5 Module

Characteristics
- Capacity: expressed in terms of bytes or words. Typical word lengths are 8, 16 and 32 bits.
- Word: the natural unit of organization of memory. The size of the word is typically equal to the number of bits used to represent a number.
- Addressable Units: word or byte.
- Unit of Transfer: number of bits read out of or written into memory at a time.

Method of Accessing:
- Sequential Access: access is made in a specific linear sequence.
- Direct Access: individual blocks or records have a unique address based on physical location. Access is accomplished by direct access to reach the general vicinity, followed by a sequential search.

Characteristics ..
- Random Access: each addressable location in memory has a unique, physically wired-in addressing mechanism.
- Associative: a random access type of memory that compares desired bit locations within a word against a specified match, and does this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address.

Memory Hierarchy
Key characteristics of memory: cost, capacity and access time
- Faster access time, greater cost per bit
- Greater capacity, smaller cost per bit
- Greater capacity, slower access time

Performance parameters:
- Access time: time taken to perform a read/write operation.
- Memory cycle time: access time plus any additional time required before a second access can commence (applicable to RAM).
- Transfer rate: rate at which data can be transferred into or out of a memory unit.

Going down the memory hierarchy:
- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access of the memory by the processor

Performance of a simple two-level memory
- T1: access time to level 1
- T2: access time to level 2
- H (hit ratio): fraction of all memory accesses that are found in the faster memory

Cache Memory
- Intended to give memory speed approaching that of the fastest memories available, while providing a large memory size at the price of less expensive types of semiconductor memory.

Cache & Main Memory
- Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
- For mapping purposes, main memory is considered to consist of a number of fixed-length blocks of K words each (M = 2^n / K blocks).
- Cache consists of C lines of K words each (C << M).

Cache Memory

At any time, some subset of blocks resides in lines in the cache.


Each line includes a tag (portion of memory address) that identifies which block is currently stored

Elements of Cache Design
- Cache Size
- Mapping Function
  - Direct
  - Associative
  - Set Associative
- Replacement Algorithm
  - Least Recently Used
  - First In First Out
  - Least Frequently Used
  - Random
- Write Policy
  - Write Through
  - Write Back
  - Write Once
- Lines of Cache
- Number of Caches
  - Single or two level
  - Unified or Split

Elements of Cache Design
- Cache Size: small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall access time is close to that of the cache alone.
- Large caches tend to be slightly slower than smaller ones (due to the large number of gates involved in addressing the cache).
- Cache size is also limited by the available chip and board area.


Mapping
Since cache is much smaller than main memory, an algorithm is needed for mapping main memory blocks into cache lines. Further, a means is needed for determining which memory block currently occupies a cache line. The choice of mapping function determines the way the cache is organized.

Elements of Cache Design
Consider the following data:
- Cache can hold 64 kB.
- Data is transferred between cache and main memory in blocks of 4 bytes each (i.e. the cache is organized as 16K lines of 4 bytes).
- Main memory consists of 16 MB, with each byte directly addressable. (Thus for mapping purposes we can consider main memory to consist of 4M blocks of 4 bytes each.)

Direct Mapping
- Each block of main memory maps to only one cache line
  - i.e. if a block is in cache, it must be in one specific place
- Address is in two parts
  - Least significant w bits identify a unique word
  - Most significant s bits specify one memory block
- The s most significant bits are split into a cache line field of r bits and a tag of s-r bits (most significant)

Direct Mapping Cache Line Table

Cache line    Main memory blocks held
0             0, m, 2m, 3m, ..., 2^s - m
1             1, m+1, 2m+1, ..., 2^s - m + 1
...
m-1           m-1, 2m-1, 3m-1, ..., 2^s - 1

Direct Mapping Address Structure

Tag (s-r)    Line or Slot (r)    Word (w)
8 bits       14 bits             2 bits

- 24 bit address
- 2 bit word identifier (4 byte block)
- 22 bit block identifier
  - 8 bit tag (= 22 - 14)
  - 14 bit slot or line
- No two blocks that map to the same line have the same tag field
- Check contents of cache by finding the line and checking the tag
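
The 24 bit address structure above (8 bit tag, 14 bit line, 2 bit word) can be sketched with shifts and masks. This is a minimal illustration only; the function name `split_direct` and the sample address are assumptions chosen here, not part of the slides.

```python
# Field widths taken from the slide: 24-bit address = 8-bit tag + 14-bit line + 2-bit word.
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_direct(addr):
    """Return (tag, line, word) for a 24-bit byte address (hypothetical helper)."""
    word = addr & ((1 << WORD_BITS) - 1)                 # lowest 2 bits
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # next 14 bits
    tag = addr >> (WORD_BITS + LINE_BITS)                # remaining 8 bits
    return tag, line, word

# Sample address chosen for illustration only.
tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

Because the tag is simply the bits left over after the line and word fields, two blocks that land in the same line necessarily differ in their tags.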

Direct Mapping: maps each block of main memory into only one possible cache line.
i = j modulo m
where i = cache line number, j = main memory block number, m = number of lines in the cache.

Mapping
Address format: a direct mapping cache treats a main memory address as 3 distinct fields: tag identifier, line number identifier, and word identifier (offset).
- Word identifier specifies the specific word (or addressable unit) in a cache line that is to be read
- Line identifier specifies the physical line in cache that will hold the referenced address
- The tag is stored in the cache along with the data words of the line
- For every memory reference that the CPU makes, the specific line that would hold the reference (if it has already been copied into the cache) is determined
- The tag held in that line is checked to see if the correct block is in the cache
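
The lookup just described (find the line, then compare the stored tag) might be sketched as follows. This is a simplified model under stated assumptions: the class name `DirectMappedCache` is hypothetical, only tags are stored (no data words), and the sizes follow the 16K lines of 4 bytes used in these slides.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (illustrative sketch)."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines   # None marks a line that holds no block yet

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = addr // self.block_size  # main memory block number j
        line = block % self.num_lines    # i = j modulo m
        tag = block // self.num_lines    # remaining high-order bits
        if self.tags[line] == tag:
            return True
        self.tags[line] = tag            # replace whatever was in the line
        return False

cache = DirectMappedCache(num_lines=16 * 1024, block_size=4)
print(cache.access(0x000010))  # first reference to the block
print(cache.access(0x000012))  # same block, different byte
print(cache.access(0x010010))  # same line, different tag
```

Only one tag comparison is needed per reference, which is why direct mapping is cheap to build.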

Mapping functions
Direct Mapping
- Advantages: simple and inexpensive to implement.
- Disadvantages: fixed cache location for any given block. Thus, if a program happens to reference words repeatedly from two different blocks that map into the same cache line, the blocks will be continually swapped in the cache (hit ratio becomes low).
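
To illustrate the disadvantage, here is a small sketch that measures the hit ratio of a direct-mapped cache for two reference patterns. The function name `hit_ratio` and both address sequences are assumptions made up for this illustration; the cache geometry (16K lines of 4 bytes) is the one used in these slides.

```python
def hit_ratio(addresses, num_lines=16 * 1024, block_size=4):
    """Fraction of references that hit in a toy direct-mapped cache."""
    tags = {}   # line number -> tag of the block currently held
    hits = 0
    for addr in addresses:
        block = addr // block_size
        line, tag = block % num_lines, block // num_lines
        if tags.get(line) == tag:
            hits += 1
        else:
            tags[line] = tag   # miss: the new block evicts the old one
    return hits / len(addresses)

# Two blocks exactly 64 kB apart share line 0, so they keep evicting each other.
conflicting = [0x000000, 0x010000] * 50
# Two neighbouring blocks use lines 0 and 1 and can coexist.
friendly = [0x000000, 0x000004] * 50

print("conflicting pattern:", hit_ratio(conflicting))
print("friendly pattern:   ", hit_ratio(friendly))
```

The conflicting pattern never hits even though the cache is almost entirely empty, which is the continual swapping the slide refers to.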

Direct Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = m = 2^r
- Size of tag = (s - r) bits
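
As a quick check of these formulas, the sketch below evaluates them for the running example (s = 22, w = 2, r = 14). The function name and dictionary keys are hypothetical and exist only for this illustration.

```python
def direct_mapping_summary(s, w, r):
    """Evaluate the direct mapping summary formulas for given field widths."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "memory_blocks": 2 ** s,
        "cache_lines": 2 ** r,
        "tag_bits": s - r,
    }

# Running example: 16 MB byte-addressable memory, 4-byte blocks, 16K cache lines.
for name, value in direct_mapping_summary(s=22, w=2, r=14).items():
    print(f"{name:18} {value}")
```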

Associative Mapping: This overcomes the disadvantages of direct mapping by permitting each main memory block to be loaded into any line of the cache. The cache control logic interprets a memory address simply as a tag and word field. Tag field uniquely identifies a block of main memory

Associative Mapping
- A main memory block can load into any line of cache
- Memory address is interpreted as tag and word
- Tag uniquely identifies block of memory
- Every line's tag is examined for a match
- Cache searching gets expensive

Associative Mapping Address Structure

Tag        Word
22 bits    2 bits

- 22 bit tag stored with each 32 bit block of data
- Compare tag field with tag entry in cache to check for hit
- Least significant 2 bits of address identify which byte is required from the 32 bit data block
- e.g.

Address    Tag       Data        Cache line
FFFFFC     FFFFFC    24682468    3FFF

Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = undetermined (not fixed by the address format)
- Size of tag = s bits

Associative Mapping
Disadvantages: complex circuitry is required to examine the tags of all cache lines in parallel.

Set Associative Mapping
- Cache is divided into a number of sets
- Each set contains a number of lines
- A given block maps to any line in a given set
  - e.g. Block B can be in any line of set i
- e.g. 2 lines per set
  - 2-way associative mapping
  - A given block can be in one of 2 lines in only one set

Mapping functions
Set Associative Mapping: a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
- Cache is divided into v sets, each of which consists of k lines.
- m = v x k
- i = j modulo v
- where i = cache set number, j = main memory block number, m = number of lines in the cache
- This is referred to as k-way set associative mapping.
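
The rule i = j modulo v can be sketched as below. The function name `candidate_lines` is hypothetical, and numbering the lines of set i as i*k ... i*k + k - 1 is an assumption made for illustration; real hardware need not lay lines out this way.

```python
def candidate_lines(block, num_lines, k):
    """Return (set number, list of lines) where a block may be placed.

    Assumes the m = v * k lines are numbered so that set i owns
    lines i*k .. i*k + k - 1 (an illustrative layout).
    """
    v = num_lines // k                 # number of sets
    set_no = block % v                 # i = j modulo v
    first = set_no * k
    return set_no, list(range(first, first + k))

# An 8-line cache organised three ways, all asked about main memory block 9.
print(candidate_lines(9, num_lines=8, k=1))   # k = 1 behaves like direct mapping
print(candidate_lines(9, num_lines=8, k=2))   # 2-way set associative
print(candidate_lines(9, num_lines=8, k=8))   # k = m behaves like fully associative
```

With k = 1 each block has exactly one possible line, and with k = m every line is a candidate, which is why set associative mapping is described as a compromise between the two.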

Set Associative Mapping Example
- 13 bit set number
- Block number in main memory is taken modulo 2^13
- Addresses 000000, 008000, 010000, 018000 ... map to the same set (they are 2^15 bytes apart: 13 set bits plus 2 word bits)

Set Associative Mapping Address Structure

Tag       Set        Word
9 bits    13 bits    2 bits

- Use set field to determine cache set to look in
- Compare tag field to see if we have a hit
- e.g.

Address     Tag    Data        Set number
1FF 7FFC    1FF    12345678    1FFF
001 7FFC    001    11223344    1FFF
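
The two example addresses can be checked with a short sketch of the 9/13/2 split. The helper name `split_set_associative` is an assumption; the addresses are rebuilt from the slide's "tag + low 15 bits" notation.

```python
# Field widths from the slide: 9-bit tag, 13-bit set, 2-bit word (24-bit address).
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2

def split_set_associative(addr):
    """Return (tag, set number, word) for a 24-bit byte address."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

# The slide writes each address as a 9-bit tag followed by the low 15 bits.
for tag_part, low_part in [(0x1FF, 0x7FFC), (0x001, 0x7FFC)]:
    addr = (tag_part << 15) | low_part
    tag, set_no, word = split_set_associative(addr)
    print(f"{addr:06X}: tag={tag:03X} set={set_no:04X} word={word}")
```

Both addresses select the same set and differ only in the tag, so in a 2-way set they can be resident at the same time.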

Set Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in set = k
- Number of sets = v = 2^d
- Number of lines in cache = kv = k x 2^d
- Size of tag = (s - d) bits
Mapping ..
Example:
- Memory size of 1 MB (20 address bits), addressable to the individual byte
- Cache size of 1K lines, each holding 8 bytes
  - Word id = 3 bits
  - Line id = 10 bits
  - Tag id = 7 bits
- Where is the byte at main memory location $ABCDE stored?
  - $ABCDE = 1010101 1110011011 110
  - Cache line $39B, word offset $6, tag $55

Mapping ..
Associative mapping
- Let a block be stored in any cache line that is not in use
  - Overcomes direct mapping's main weakness
- Must examine each line in the cache to find the right memory block
  - Examine the line tag id for each line
  - Slow process for large caches!
- Line numbers (ids) have no meaning in the cache
  - Parse the main memory address into 2 fields (tag and word offset) rather than 3 as in direct mapping
- Implement cache in 2 parts
  - The lines themselves in SRAM
  - The tag storage in associative memory
- Perform an associative search over all tags to find the desired line (if it's in the cache)
- For the 1 MB example: word id = 3 bits, tag id = 17 bits

Mapping
Associative Mapping:
- Where is the byte at main memory location $ABCDE stored?
  - $ABCDE = 10101011110011011 110
  - Cache line unknown, word offset $6, tag $1579B

Mapping
Set Associative Mapping:
- Assume the 1024 lines are 4-way set associative
  - 1024/4 = 256 sets
  - Word id = 3 bits
  - Set id = 8 bits
  - Tag id = 9 bits
- Where is the byte at main memory location $ABCDE stored?
  - $ABCDE = 101010111 10011011 110
  - Cache set $9B, word offset $6, tag $157
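
The three $ABCDE examples differ only in how the same 20 bits are cut into fields. A generic splitter makes that explicit; `split_address` is a hypothetical helper written for this illustration, with field widths listed from most to least significant.

```python
def split_address(addr, widths):
    """Split addr into fields; widths lists field sizes from MSB to LSB."""
    fields = []
    for width in reversed(widths):             # peel fields off the low end
        fields.append(addr & ((1 << width) - 1))
        addr >>= width
    return tuple(reversed(fields))

ADDR = 0xABCDE  # 20-bit byte address used in the slides' example

tag, line, word = split_address(ADDR, (7, 10, 3))    # direct mapping
print(f"direct:          tag=${tag:X} line=${line:X} word=${word:X}")

tag, word = split_address(ADDR, (17, 3))             # associative
print(f"associative:     tag=${tag:X} word=${word:X}")

tag, set_id, word = split_address(ADDR, (9, 8, 3))   # 4-way set associative
print(f"set associative: tag=${tag:X} set=${set_id:X} word=${word:X}")
```

The word offset is the same in all three cases; only the boundary between the tag and the line or set field moves.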

Line Replacement Algorithms
- When an associative cache or a set associative cache set is full, which line should be replaced by the new line that is to be read from memory?
  - Not a problem for direct mapping, since each block has a predetermined line it must use
- Least recently used
- First in first out
- Least frequently used
- Random
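
As an illustration of how two of these policies differ, the sketch below counts hits for a small fully associative cache under least recently used and first in first out replacement. The function name `count_hits` and the reference string are made up for this example; real caches implement these policies in hardware, often only approximately.

```python
from collections import OrderedDict

def count_hits(block_refs, num_lines, policy):
    """Count hits in a toy fully associative cache; policy is 'lru' or 'fifo'."""
    lines = OrderedDict()   # resident blocks, next victim first
    hits = 0
    for block in block_refs:
        if block in lines:
            hits += 1
            if policy == "lru":
                lines.move_to_end(block)   # a hit refreshes recency under LRU only
        else:
            if len(lines) == num_lines:
                lines.popitem(last=False)  # evict the block at the front
            lines[block] = True
    return hits

refs = [1, 2, 3, 1, 4, 1, 2]   # illustrative block reference string
print("LRU hits: ", count_hits(refs, num_lines=3, policy="lru"))
print("FIFO hits:", count_hits(refs, num_lines=3, policy="fifo"))
```

The only difference between the two policies here is whether a hit moves the block to the back of the eviction order.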

Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Multiple CPUs may have individual caches
- I/O may address main memory directly
- When a line is to be replaced, must update the original copy of the line in main memory if any addressable unit in the line has been changed
- Write through
  - Any time a word in cache is changed, it is also changed in main memory
  - Both copies always agree
  - Generates lots of writes to main memory

Write Policy
- Write back
  - During a write, only change the contents of the cache
  - Update main memory only when the cache line is to be replaced
  - Causes cache coherency problems: different values for the contents of an address are in the cache and the main memory
  - Complex circuitry to avoid this problem
  - I/O must access main memory through cache
- N.B. 15% of memory references are writes
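
The difference in memory traffic between the two policies can be sketched with a toy direct-mapped cache that keeps a dirty flag per line. Everything here is an assumption for illustration: the function name `memory_writes`, the policy strings, the write-allocate behaviour on a write miss, and the fact that dirty lines still resident at the end are not flushed.

```python
def memory_writes(ops, policy, num_lines=4):
    """Count writes to main memory for a list of ('r' | 'w', block) operations.

    policy is 'write-through' or 'write-back'. Assumes write-allocate and
    does not flush lines that are still dirty when the run ends.
    """
    lines = {}    # line index -> (resident block, dirty flag)
    writes = 0
    for op, block in ops:
        idx = block % num_lines
        resident = lines.get(idx)
        if resident is None or resident[0] != block:
            # Miss: a dirty victim must be copied back before it is replaced.
            if resident is not None and resident[1]:
                writes += 1
            resident = (block, False)
        if op == "w":
            if policy == "write-through":
                writes += 1                 # every write also goes to memory
            else:
                resident = (block, True)    # write back: just mark the line dirty
        lines[idx] = resident
    return writes

# Ten writes to block 0, then a read of block 4, which maps to the same line.
ops = [("w", 0)] * 10 + [("r", 4)]
print("write-through:", memory_writes(ops, "write-through"))
print("write-back:   ", memory_writes(ops, "write-back"))
```

Write back collapses repeated writes to one line into a single memory update at replacement time, at the price of main memory being stale in between.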

Number of Lines/Blocks in Cache
- Block / line sizes
  - How much data should be transferred from main memory to the cache in a single memory reference
  - Complex relationship between block size and hit ratio, as well as the operation of the system bus itself
- As block size increases,
  - Locality of reference predicts that the additional information transferred will likely be used, and thus increases the hit ratio (good)
  - Number of blocks in cache goes down, limiting the total number of blocks in the cache (bad)
  - As the block size gets big, the probability of referencing all the data in it goes down (hit ratio goes down) (bad)
- Size of 4-8 addressable units seems about right for current systems

No. of Caches
- Number of caches: single vs. 2-level
- Modern CPU chips have on-board cache (L1)
  - 80486: 8 KB
  - Pentium: 16 KB
  - PowerPC: up to 64 KB
- L1 provides best performance gains
- Secondary, off-chip cache (L2) provides higher speed access to main memory
  - L2 is generally 512 KB or less; more than this is not cost-effective

Types of Cache
- Unified vs. split cache
- Unified cache stores data and instructions in 1 cache
  - Only 1 cache to design and operate
  - Cache is flexible and can balance allocation of space to instructions or data to best fit the execution of the program (higher hit ratio)
- Split cache uses 2 caches: 1 for instructions and 1 for data
  - Must build and manage 2 caches
  - Static allocation of cache sizes
  - Can outperform unified cache in systems that support parallel execution and pipelining (reduces cache contention)

MESI Cache Coherence Protocol
- MESI protocol provides cache coherency in both the Pentium and the PowerPC
- Stands for Modified, Exclusive, Shared, Invalid
- Implemented with an additional 2-bit field for each cache line
- Becomes interesting in the interactions of the L1 and the L2 caches: each tracks the local MESI status as a line moves from main memory to L2 and then to L1
- PowerPC adds an additional state A for allocated
  - A line is marked as A while its data is being swapped out

Operation of 2 level Memory
- Recall the goal of the memory system:
  - Provide an average access time to all memory locations that is approximately the same as that of the fastest memory component
  - Provide a system memory with an average cost approximately equal to the cost/bit of the cheapest memory component
- Simplistic approach:
  Ts = H1 x T1 + H2 x (T1 + T2 + Tb21)
  H2 = 1 - H1
  - H1 is the hit ratio for level 1
  - T1, T2 are the access times to levels 1 and 2
  - Tb21 is the block transfer time from level 2 to level 1
- Can be generalized to 3 or more levels
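
The simplistic formula Ts = H1 x T1 + H2 x (T1 + T2 + Tb21) is easy to evaluate directly. In the sketch below the function name and all timing values are illustrative assumptions, not measurements from any real system.

```python
def average_access_time(h1, t1, t2, tb21):
    """Ts = H1*T1 + H2*(T1 + T2 + Tb21), with H2 = 1 - H1."""
    h2 = 1.0 - h1
    return h1 * t1 + h2 * (t1 + t2 + tb21)

# Hypothetical timings in nanoseconds: T1 = 10, T2 = 100, Tb21 = 40.
for h1 in (0.50, 0.90, 0.95, 0.99):
    ts = average_access_time(h1, t1=10, t2=100, tb21=40)
    print(f"H1 = {h1:.2f}  ->  Ts = {ts:.1f} ns")
```

As H1 approaches 1 the average approaches T1, which is the stated goal of the hierarchy; a modest drop in hit ratio pulls the average sharply toward the slower level.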

Pentium 4 Cache
- 80386: no on-chip cache
- 80486: 8 KB, using 16 byte lines and four-way set associative organization
- Pentium (all versions): two on-chip L1 caches
  - Data & instructions
- Pentium 4
  - L1 caches: 8 KB, 64 byte lines, four-way set associative
  - L2 cache: feeds both L1 caches, 256 KB, 128 byte lines, 8-way set associative
