
Chapter IX

Memory Organization
CS 147

Presented by:
Duong Pham
Introduction
• In Chapter IV we looked at two simple computers, each consisting of a
CPU, an I/O subsystem, and a memory subsystem.
• The memory of these computers was built using only ROM and
RAM.
• This memory subsystem is fine for computers that perform a
specific task:
– examples: controlling a microwave oven
– controlling a dishwasher, etc.
• However, complex computers cannot run on a memory
subsystem consisting only of such physical memory, because it
would be relatively slow and somewhat limited.
Overview
• Hierarchy of Memory System
• Cache Memory
– Associative Memory
– Cache Memory with Associative Mapping
– Cache Memory with Direct Mapping
– Cache Memory with Set-Associative Mapping
– Replacing Data in the Cache
– Writing Data to the Cache
– Cache Performance
Hierarchy of Memory System
• A computer system is not constructed using a single type of memory.
• In fact, several types of memory are used.
– For example: Level 1 cache (L1 cache)
– Level 2 cache (L2 cache)
– Physical Memory
– Virtual Memory
• The best-known element of the memory subsystem is the physical memory,
which is constructed using DRAM chips.
• There is also a cache controller which copies data from the physical memory to cache
memory before or when the CPU needs it.
• In general, the closer a component is to the processor, the faster it is and the more
expensive it is.
• Therefore, memory systems tend to increase in size as they move away from the CPU.
• This is the hierarchy of the memory system:
[Figure: CPU with L1 cache → L2 cache → Physical memory → Virtual memory storage]


Cache Memory
• In general, the goal of cache memory is to minimize the
processor’s memory access time at a reasonable cost.
• The main design goal of cache memory is to move instructions
and data into the cache before the microprocessor tries to access
them.
• If this goal is achieved, system
performance improves greatly.
• Some computers use separate caches for instructions and data;
this is the principle behind the Harvard architecture.
• Instead of having separate caches for instructions and data, a computer may
have one unified cache for both.
Associative Memory
• Cache memory can be constructed using either SRAM or
associative memory (content addressable memory).
• Unlike other RAM, associative memory is accessed
differently.
• To access data, associative memory searches all of its
locations in parallel and marks the locations that match the
specified data input.
• The matching data are then read out sequentially.
Associative Memory cont.
• To illustrate this, consider a simple associative memory
consisting of eight words, each with 16 bits.
• Note that each word has one additional bit labeled v.
• This is called the valid bit.
• If a 1 is shown, it indicates that
the word contains valid data.
• A 0 shows that the data is not valid.
[Figure: simple associative memory with a data register, a mask register, Read and Write signals, eight 16-bit memory words each with a valid bit v, a match register, and an output register]
Associative Memory cont.
• Example:
– To access data in the associative memory that has 1010 as its four high-order bits:
– The CPU would load the value 1111 0000 0000 0000 into the mask register.
– Each bit that is to be checked, regardless of its value, is set to 1; all other bits
are set to 0.
– The CPU also loads the value 1010 xxxx xxxx xxxx into the data register.
– The four leading bits are to be matched and the rest can be anything.
– A location matches if, for every bit position that has a 1 in the mask register, its bit
equals the corresponding bit of the data register, and its valid bit is 1. The match
register bit for that location is then set to 1; otherwise it is set to 0.
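The matching rule above can be sketched in software as follows. This is a minimal Python model, not hardware: real associative memory compares every word in parallel, whereas this loop checks the words one at a time. The function name `associative_match` and the four-word memory contents are hypothetical, chosen only for illustration.

```python
def associative_match(words, valid, data_reg, mask_reg):
    """Return the match register as a list with one bit per word.

    A word matches when every bit selected by mask_reg equals the
    corresponding bit of data_reg and the word's valid bit is 1.
    """
    return [
        1 if v == 1 and (w & mask_reg) == (data_reg & mask_reg) else 0
        for w, v in zip(words, valid)
    ]

# Hypothetical contents of a 4-word, 16-bit associative memory.
words = [0b1010_1101_0000_0111, 0b1010_0000_0000_0000,
         0b0011_0000_1010_0000, 0b1111_1111_0100_1001]
valid = [1, 0, 1, 1]

mask = 0b1111_0000_0000_0000  # check only the four high-order bits
data = 0b1010_0000_0000_0000  # 1010 xxxx xxxx xxxx; the x bits are ignored
print(associative_match(words, valid, data, mask))  # [1, 0, 0, 0]
```

The second word has the right high-order bits but does not match because its valid bit is 0.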
Associative Memory cont.
• Writing data to associative memory is straightforward.
• The CPU supplies data to the data register and asserts the
write signal.
• The associative memory checks for a location whose valid
bit is zero.
• If it finds one, it will store that information into that
location.
• If it finds none, it must clear out a location before it can
store the data.
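A minimal sketch of this write procedure under the same software model. The function `associative_write` and its `evict_index` parameter are hypothetical; `evict_index` stands in for whatever replacement policy the cache actually uses when no location is free.

```python
def associative_write(words, valid, value, evict_index=0):
    """Store value in the first location whose valid bit is 0.

    If every location is valid, clear out location evict_index first
    (a placeholder for a real replacement policy). Returns the index used.
    """
    for i, v in enumerate(valid):
        if v == 0:
            words[i] = value
            valid[i] = 1
            return i
    # No free location: clear one out, then store the data there.
    valid[evict_index] = 0
    words[evict_index] = value
    valid[evict_index] = 1
    return evict_index

words = [0x0F0F, 0x0000, 0x88D3, 0x0000]
valid = [1, 0, 1, 0]
print(associative_write(words, valid, 0xA000))  # 1
print(associative_write(words, valid, 0xB000))  # 3
print(associative_write(words, valid, 0xC000))  # 0 (memory was full)
```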
Cache Memory with Associative Mapping
• Associative memory can be used to construct a cache
with associative mapping, or an associative cache.
• The figure shown at right is an associative cache for a
64K × 8 memory system (16-bit addresses, 8-bit data).
• The associative cache is built from associative memory that is
24 bits wide.
• The first 16 bits are the memory address.
• The last 8 bits are the data that is stored at that address in physical
memory.
• To look up an address, the mask register is loaded with
1111 1111 1111 1111 0000 0000, so only the 16 address bits are compared.
• Otherwise, it works just like the associative memory described earlier.
[Figure: associative cache: address X (16 bits) and data (8 bits) in the data register; mask register; 24-bit memory with valid bits; match register; output register with address (16) and data (8) fields]
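A sketch of an associative cache lookup, assuming each 24-bit word is modeled as a Python integer with the 16-bit address in the high bits and the 8-bit data in the low bits, as in the figure. The names `make_entry`, `cache_lookup`, and `ADDR_MASK` are hypothetical; the mask value is the one shown in the mask register.

```python
ADDR_MASK = 0xFFFF00  # 1111 1111 1111 1111 0000 0000: compare address bits only

def make_entry(address, data):
    """Pack a 16-bit address and an 8-bit data value into one 24-bit word."""
    return (address << 8) | data

def cache_lookup(entries, valid, address):
    """Return the cached data for a 16-bit address, or None on a miss."""
    key = address << 8  # address in the high 16 bits; data bits are "don't care"
    for word, v in zip(entries, valid):
        if v and (word & ADDR_MASK) == (key & ADDR_MASK):
            return word & 0xFF  # the low 8 bits hold the data
    return None

entries = [make_entry(0x03FF, 0xAA), make_entry(0x1234, 0x5C)]
valid = [1, 1]
print(hex(cache_lookup(entries, valid, 0x03FF)))  # 0xaa
print(cache_lookup(entries, valid, 0x0400))       # None (miss)
```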
Cache Memory with Direct Mapping
• Since associative memory is much
more expensive than SRAM, a cache
mapping scheme that uses standard
SRAM can be much larger than an
associative cache and still cost less.
• This is called direct mapping.
• To illustrate this, we consider a 1K cache
for the Relatively Simple (R.S.) CPU as
shown on the right.
• Since the cache is 1K, the 10 low-order
address bits (the index, A[9…0]) select one specific
location in the cache.
• As in the associative cache, each location contains a
valid bit to denote whether or not the
location has valid data.
• In addition, a tag field contains the
high-order bits of the original address
that were not part of the index.
Therefore, the six high-order bits (A[15…10]) are
stored in the tag field.
• Last, the cached data value is stored in the data field.
[Figure: 1K direct-mapped cache for the R.S. CPU: the address from the R.S. CPU supplies a 6-bit tag (A[15…10]) and a 10-bit index (A[9…0]); each location has tag, data, and valid fields, e.g. tag 000000, data 10101010, valid 1; output register]
Cache Memory with Direct Mapping cont.
• For example, consider location 0000 0011 1111 1111 of
physical memory, which contains data 1010 1010.
• This data can be stored in only one location in the cache: the
location that has the same 10 low-order address bits as the
original address, 11 1111 1111.
• However, any address of the form xxxx xx11 1111 1111 would
map to this same cache location.
• This is the purpose of the tag field.
• In the figure, the tag value for this location is 00 0000.
• This means that the data stored at location 11 1111 1111 is
actually the data from physical memory location 0000 0011
1111 1111, which is 1010 1010.
• Also, the figure shows a 1 in the valid field; if
the bit were 0, none of this would be considered, because the data
in that location would not be valid.
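The address split and tag check for this 1K direct-mapped cache can be sketched as follows. The function names and the list-of-tuples cache model are hypothetical; only the example location from the text is filled in.

```python
INDEX_BITS = 10  # 1K cache: address bits A[9...0] form the index

def split_direct(address):
    """Split a 16-bit R.S. CPU address into (tag, index)."""
    index = address & ((1 << INDEX_BITS) - 1)  # 10 low-order bits
    tag = address >> INDEX_BITS                # 6 high-order bits
    return tag, index

def lookup(cache, address):
    """Return the cached byte, or None if the location is invalid or the tag differs."""
    tag, index = split_direct(address)
    valid, stored_tag, data = cache[index]
    return data if valid and stored_tag == tag else None

# 1K locations of (valid, tag, data); only the example location is filled in.
cache = [(0, 0, 0)] * (1 << INDEX_BITS)
cache[0b11_1111_1111] = (1, 0b000000, 0b1010_1010)

print(split_direct(0b0000_0011_1111_1111))  # (0, 1023)
print(lookup(cache, 0b0000_0011_1111_1111))  # 170, i.e. 1010 1010
print(lookup(cache, 0b1010_1111_1111_1111))  # None: same index, tag 101011
```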
Cache Memory with Direct Mapping cont.
• Although direct-mapped cache is much less expensive than the associative
cache, it is also much less flexible.
• In associative cache any word of physical memory can occupy any word of
cache.
• However, in direct-mapped cache, each word of physical memory can be
mapped to only one specific location.
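• This is a problem for certain programs, for example a loop that alternately
accesses two addresses with the same 10 low-order bits: each access overwrites
the other's data in the cache, so every access misses.
• A good compiler can allocate the code so that this does not happen.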
• However, it does illustrate a problem that can occur due to inflexibility of
direct mapping.
• Set-associative mapping seeks to alleviate this problem while taking advantage
of the strengths of the direct mapping method.
• This brings us to the next topic.
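To make the inflexibility concrete, here is a small, hypothetical simulation comparing a 1K direct-mapped cache with a 1K 2-way set-associative cache on a loop that alternates between two addresses sharing the same 10 low-order bits. It assumes LRU replacement within each set and models only tags, not data; `simulate` and the trace are illustrative.

```python
def simulate(trace, num_sets, ways):
    """Count hits for a set-associative cache with LRU replacement.

    ways=1 gives a direct-mapped cache. Each set is a list of tags kept
    in recency order (most recently used last). num_sets is a power of
    two, so address % num_sets is just the low-order index bits.
    """
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for address in trace:
        index, tag = address % num_sets, address // num_sets
        s = sets[index]
        if tag in s:
            hits += 1
            s.remove(tag)      # re-appended below as most recently used
        elif len(s) == ways:
            s.pop(0)           # evict the least recently used tag
        s.append(tag)
    return hits

# Hypothetical loop touching two addresses with the same 10 low-order bits.
trace = [0x03FF, 0x43FF] * 100
print(simulate(trace, num_sets=1024, ways=1))  # 0: each access evicts the other
print(simulate(trace, num_sets=512, ways=2))   # 198: both fit in one set
```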
Cache Memory with Set-Associative Mapping

• Set-associative cache makes use of relatively low-cost
SRAM while trying to alleviate the problem of overwriting
data inherent to direct mapping.
• It is organized just like a direct-mapped cache, except
each location in the cache can contain more than one data value.
• A cache in which each location can contain n bytes or words of
data is called an n-way set-associative cache.
Set-associative mapping cont.
• Let us consider the 1K, 2-way set-associative cache for the R.S. CPU.
• Each location contains two groups of fields, one for each way of the cache.
• The tag field is the same as in the direct-mapped cache, except it is 1 bit longer.
• Since the cache holds 1K data entries, and each location holds 2 data values, there are 512
locations total.
• The 9 low-order address bits select the cache location and the remaining 7 bits specify the tag value.
• As before, the data field contains the data from the physical memory location.
• The count/valid field serves 2 purposes:
– (1) One bit of this field is a valid bit, just as in the other cache mapping schemes.
– (2) The count value is used to keep track of when the data was accessed.
• This information determines which piece of data will be replaced when a new value is
loaded into the cache.
[Figure: Two-way set-associative cache for the R.S. CPU. The address from the R.S. CPU supplies a 7-bit tag (A[15…9]) and a 9-bit index (A[8…0]); each location holds two ways, each with count/valid, tag, and data fields.]
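A sketch of how the 1K 2-way cache might split an address and check both ways of the selected location. It simplifies the count/valid field to a valid bit only; the names `split_2way` and `lookup_2way` and the sample contents are hypothetical.

```python
INDEX_BITS = 9  # 512 locations: address bits A[8...0] form the index

def split_2way(address):
    """Split a 16-bit address into (tag, index) for the 1K 2-way cache."""
    return address >> INDEX_BITS, address & ((1 << INDEX_BITS) - 1)

def lookup_2way(cache, address):
    """Return data if either way of the selected location holds the address."""
    tag, index = split_2way(address)
    for valid, stored_tag, data in cache[index]:  # check both ways
        if valid and stored_tag == tag:
            return data
    return None

# Each location holds two ways of (valid, tag, data); all start invalid.
cache = [[(0, 0, 0), (0, 0, 0)] for _ in range(1 << INDEX_BITS)]
# Two addresses that would collide in the direct-mapped cache share this set.
cache[0b1_1111_1111] = [(1, 0b0000001, 0xAA), (1, 0b0100001, 0x55)]
print(lookup_2way(cache, 0x03FF), lookup_2way(cache, 0x43FF))  # 170 85
```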


Replacing Data in the Cache
• As you know, when a computer is powered up, it performs several
functions necessary to ensure its proper operation.
• Among those tasks, it must initialize its cache.
• Therefore, it sets all valid bits to 0, much like asserting a register’s
clear input.
• When the computer begins to execute a program, it fetches
instructions and data from memory and loads them into the cache.
• This works well while the cache is empty or sparsely populated.
• Eventually, however, the computer will need to move data into cache locations
that are already occupied.
• The problem is then to decide which data to move out of the cache
and how to preserve that data in physical memory.
• Direct mapping offers the easiest solution to this problem: each address maps to
exactly one cache location, so there is no choice of which data to replace.
Replacing Data in the Cache cont.
• Since associative cache allows any location in physical memory to be
mapped to any location in cache, it does not have to move data out of
cache and back into physical memory unless every location already
holds valid data.
• There are a number of replacement methods that can be used to do this.
• Here are a few of the more popular ones:
– FIFO (First In First Out)
– LRU (Least Recently Used)
– Random
Replacing Data in the Cache cont.

• FIFO (First In First Out):


– This replacement process fills the associative memory from its
top location to its bottom location.
– When it copies data to its last location, the cache is full.
– It then goes back to the top location, replacing its data with the
next value to be stored.
– This algorithm always replaces the data that was loaded into
the cache first among all the data in the cache at that time.
– This method requires nothing other than a register to hold a
pointer to the next location to be replaced.
– Its performance is generally good.
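A minimal sketch of FIFO replacement for a small fully associative cache. The class name and the `next_slot` attribute are hypothetical; `next_slot` plays the role of the pointer register mentioned above. Only addresses are tracked, not data.

```python
class FifoAssociativeCache:
    """Fully associative cache with FIFO replacement.

    The only replacement state is next_slot, the pointer that names
    the location to overwrite next.
    """

    def __init__(self, size):
        self.tags = [None] * size  # None marks an invalid location
        self.next_slot = 0

    def access(self, address):
        """Return True on a hit; on a miss, load address at next_slot."""
        if address in self.tags:
            return True
        self.tags[self.next_slot] = address
        self.next_slot = (self.next_slot + 1) % len(self.tags)  # wrap to top
        return False

cache = FifoAssociativeCache(size=3)
results = [cache.access(a) for a in [1, 2, 3, 1, 4, 1]]
print(results)     # [False, False, False, True, False, False]
print(cache.tags)  # [4, 1, 3]
```

In this run, address 1 is evicted on the access to 4 even though it was just used, because it was the first value loaded.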
Replacing Data in the Cache cont.
• LRU (Least Recently Used):
– The LRU method keeps track of the relative order in which each
location is accessed and replaces the least recently used value
with the new data.
– This requires a counter for each location in the cache, and it is generally
not used with associative caches.
– However, it is used frequently with set-associative cache memory.
• Random:
– The name says it all.
– The random method selects a location at random to use for the new data.
– In spite of the lack of logic in its selection of a location, this
replacement method produces good performance, close to that of
the FIFO method.
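A sketch of counter-based LRU for one set of a set-associative cache. The aging scheme used here (0 means most recently used, and the way with the largest count is replaced) is one common way to maintain the counts, assumed for illustration rather than taken from the text; `LruSet` is a hypothetical name.

```python
class LruSet:
    """One set of an n-way set-associative cache with counter-based LRU.

    Each way keeps a count: 0 means most recently used, larger means
    older. On a replacement, the way with the largest count is evicted.
    """

    def __init__(self, ways):
        self.tags = [None] * ways        # None marks an invalid way
        self.counts = list(range(ways))  # distinct ages 0..ways-1

    def _touch(self, way):
        """Make `way` most recently used, aging the ways that were newer."""
        old = self.counts[way]
        for i in range(len(self.counts)):
            if self.counts[i] < old:
                self.counts[i] += 1
        self.counts[way] = 0

    def access(self, tag):
        """Return True on a hit; on a miss, replace the LRU way."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        victim = self.counts.index(max(self.counts))
        self.tags[victim] = tag
        self._touch(victim)
        return False

s = LruSet(ways=2)
results = [s.access(t) for t in ["A", "B", "A", "C", "A"]]
print(results)  # [False, False, True, False, True]: C evicts B, not A
```

With FIFO, the access to C would instead have evicted A, since A was loaded first.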
Writing Data to the Cache
• Two methods are used for writing data to the cache: write-through and
write-back.
• Write-through:
– In write-through, every time a value is written from the CPU into a location
in the cache, it is also written into the corresponding location in physical
memory.
– This guarantees that physical memory always contains the correct value,
but it requires additional time for the writes to physical memory.

• Write-back:
– In write-back, the value written to the cache is not always written to
physical memory.
– The value is written to physical memory only once, when the data is
removed from the cache.
– This saves the time write-through caches spend copying their data to physical
memory, but it also introduces a time frame during which physical memory
holds stale (invalid) data.
Writing Data to the Cache cont.

• Example:
– Consider a simple program loop:
– for I = 1 to 1000 do
    x = x + I;
– During the loop, the CPU would write a value to x 1000
times.
– If we use the write-back method, this loop would write the result to
physical memory only once, instead of 1000 times as with the
write-through method.
– Therefore, write-back offers a significant time savings.
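The write counts in this example can be checked with a small model. It assumes x stays in the cache for the entire loop, and that the write-back cache uses a dirty flag (not mentioned in the slides, but the usual way a write-back cache remembers that its copy differs from physical memory) to write x back once on removal. `count_memory_writes` is a hypothetical name.

```python
def count_memory_writes(n_iterations, policy):
    """Count physical-memory writes for the loop: for I = 1 to n, x = x + I.

    Assumes x stays in the cache for the whole loop (a write hit on every
    iteration) and that a write-back cache writes x to memory once when
    it is finally removed from the cache.
    """
    memory_writes = 0
    dirty = False
    x = 0
    for i in range(1, n_iterations + 1):
        x = x + i                  # the CPU writes x into the cache
        if policy == "write-through":
            memory_writes += 1     # every cache write also goes to memory
        else:                      # "write-back"
            dirty = True           # only mark the cached copy as modified
    if policy == "write-back" and dirty:
        memory_writes += 1         # x is written back once, on removal
    return x, memory_writes

print(count_memory_writes(1000, "write-through"))  # (500500, 1000)
print(count_memory_writes(1000, "write-back"))     # (500500, 1)
```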
Writing Data to the Cache cont.
• However, performance is not the only consideration.
• Sometimes keeping the data in physical memory current takes precedence.
• Another situation that must be addressed is how to write data to locations
not currently loaded into the cache.
• This is called a write-miss.
• One possibility is to load the location into cache and then write the new
value to cache using either write-back or write-through method.
• This is called the write-allocate policy.
• The alternative is the write-no-allocate policy.
• This policy updates the value in physical memory without loading it into
the cache.
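A sketch contrasting the two write-miss policies. It assumes a write-through cache for simplicity and models the cache and physical memory as Python dictionaries; the function `write` and its `allocate` flag are hypothetical.

```python
def write(cache, memory, address, value, allocate):
    """Handle a CPU write using write-through plus an allocation policy.

    `cache` is a dict {address: value} standing in for the cache contents.
    On a write miss, write-allocate first loads the location into the
    cache; write-no-allocate updates physical memory only.
    """
    if address not in cache and allocate:
        cache[address] = memory[address]  # load the location on a miss
    if address in cache:
        cache[address] = value            # update the cached copy
    memory[address] = value               # write-through to physical memory

memory = {0x0100: 7}
c1, c2 = {}, {}
m1, m2 = dict(memory), dict(memory)
write(c1, m1, 0x0100, 9, allocate=True)
write(c2, m2, 0x0100, 9, allocate=False)
print(c1, c2, m1, m2)  # {256: 9} {} {256: 9} {256: 9}
```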
Cache Performance
• The primary reason for including cache memory in a computer is to
improve system performance by reducing the time needed to access
memory.
• The two primary components of cache performance are cache hits and
cache misses.
• Cache hits:
– Every time the CPU accesses memory, it checks the cache.
– If the requested data is in the cache, the CPU accesses the data in
the cache rather than in physical memory.
• Cache misses:
– If the requested data is not in the cache, the CPU accesses the data
from main memory (and usually writes the data into the cache as
well.)
Cache Performance cont.
• Hit ratio is the fraction (or percentage) of memory
accesses that are served from the cache,
rather than from physical memory.
• The higher the hit ratio, the more times
the CPU accesses the relatively fast
cache memory and the better the system
performance.
• The average memory access time (Tm) is
the weighted average of the cache access
time, Tc, and the access time for
physical memory, Tp.
• The weighting factor is the hit ratio h.
• Therefore, Tm can be expressed as:
– Tm = h Tc + (1 - h) Tp
* Table of hit ratios and average memory access times
(Tc = 10 ns and Tp = 60 ns, as the h = 1.0 and h = 0 rows show):

h      Tm
0.0    60 ns
0.1    55 ns
0.2    50 ns
0.3    45 ns
0.4    40 ns
0.5    35 ns
0.6    30 ns
0.7    25 ns
0.8    20 ns
0.9    15 ns
1.0    10 ns
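The table can be regenerated from the formula. The values Tc = 10 ns and Tp = 60 ns are inferred from the h = 1.0 and h = 0 rows; `average_access_time` is a hypothetical helper name.

```python
def average_access_time(h, tc, tp):
    """Tm = h*Tc + (1 - h)*Tp, the hit-ratio-weighted average access time."""
    return h * tc + (1 - h) * tp

TC, TP = 10, 60  # ns, read off the h = 1.0 and h = 0 rows of the table

for tenths in range(11):
    h = tenths / 10
    print(f"{h:.1f}    {average_access_time(h, TC, TP):.0f} ns")
```

Each 0.1 increase in h saves 0.1 × (Tp - Tc) = 5 ns, which is why the table drops in even steps.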
Cache Performance cont.
• The rest of section 9.2 (pages 393-395) shows examples of
cache activity using each of the methods discussed so far.
• It uses the average memory access time (Tm) equation to compute
the hit ratio and average memory access time (Tm) for each
method.
• To see how those examples were worked out and how the results
were obtained, see the pages mentioned above.
• This concludes my presentation.

• Thank you.
Any questions?