Memory Hierarchy
Main Memory
Associative Memory
Cache Memory
Virtual Memory
Memory Management Hardware
Memory
Ideally,
1. Fast
2. Large
3. Inexpensive
Is it possible to meet all three requirements simultaneously?
Introduction
Even a sophisticated processor may perform well below an ordinary processor
unless it is supported by a memory system of matching performance.
Memory Hierarchy
The purpose of a memory hierarchy is to obtain the highest possible
access speed while minimizing the total cost of the memory system.
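[Figure: memory hierarchy. Auxiliary memory (magnetic tapes and magnetic disks) connects through an I/O processor to main memory; the CPU reaches main memory through cache memory.]
[Figure: hierarchy levels: Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape. Size increases toward magnetic tape; speed and cost increase toward the registers.]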
[Figure: connection between the processor and memory. A k-bit address bus selects up to 2^k addressable locations; an n-bit data bus carries words of length n bits; control lines carry R/W, MFC, etc.]
[Figure: memory hierarchy levels: Registers, Cache L1 (SRAM), Cache L2, Main memory (ADRAM), Secondary storage memory. Speed and cost per bit increase toward the registers.]
Random Access Memory (RAM): any location can be accessed for a read or write operation
in a fixed amount of time.
Types of RAM:
1. Static memory / SRAM: Capable of retaining its state as long as power is
applied; volatile in nature. [High cost and speed]
2. Asynchronous DRAM: Dynamic RAMs are less expensive, but they do not
retain their state indefinitely. Widely used in computers.
3. Synchronous DRAM: Operation is directly synchronized with a clock
signal.
Performance parameters: bandwidth and latency.
Bandwidth: Number of bytes transferred in one unit of time.
Latency: Amount of time it takes to transfer a word of data to or from
memory.
Read Only Memory / ROM: any location can be accessed, for read operations only, in a
fixed amount of time. It retains its state without power, i.e., it is non-volatile.
Programmable ROM (PROM): Allows data to be loaded by the user.
Erasable PROM (EPROM): Stored data can be erased [by UV light] so that new data can be loaded.
Electrically erasable PROM (EEPROM): Erased electrically, using different voltages.
Memory is built from semiconductor integrated circuits to increase performance.
To reduce the memory cycle time, use cache memory: a small SRAM physically very
close to the processor, which relies on locality of reference.
Virtual memory is used to increase the apparent size of the physical memory.
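[Figure: organization of bit cells in a memory chip. Address lines A0 to A3 feed an address decoder that drives word lines W0 to W15; each column of memory cells (FF) connects to a Sense/Write circuit; data input/output lines b7 ... b1, b0; control inputs R/W and CS.]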
Each row of cells holds one memory word, and all cells in a row are connected to a
common word line, which is driven by the on-chip address decoder.
The cells in each column are connected to a Sense/Write circuit by two bit lines.
The Sense/Write circuits are connected to the data I/O lines of the chip.
READ operation: the Sense/Write circuits sense (read) the information stored in the
cells selected by a word line and transmit it to the output data lines.
WRITE operation: the Sense/Write circuits receive input information and store it in
the cells of the selected word.
If a memory chip consists of 16 memory words of 8 bits each, it is referred
to as a 16 x 8 organization (128 bits in total).
The data I/O of each Sense/Write circuit is connected to a single
bidirectional data line that can be connected to the data bus of a computer.
There are two control lines: Read/Write (R/W), which specifies the required operation, and Chip Select
(CS), which selects a given chip in a multichip memory.
The chip stores 128 bits and needs 14 external connections for address, data, and control
lines (4 address + 8 data + 2 control).
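The connection count quoted above can be checked with a short calculation. This is a minimal sketch: the function name `external_connections` is hypothetical, and it counts only address, data, and the two control lines (R/W and CS), leaving out power and ground.

```python
def external_connections(words: int, bits_per_word: int, control_lines: int = 2) -> int:
    """Count address + data + control connections for a words x bits_per_word chip."""
    # Number of address lines needed to select one of `words` locations.
    address_lines = (words - 1).bit_length()
    return address_lines + bits_per_word + control_lines

# 16 x 8 chip: 4 address + 8 data + 2 control (R/W, CS)
print(external_connections(16, 8))  # 14
```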
[Figure: 32-to-1 output multiplexer and input demultiplexer.]
Static Memories
Memories built from circuits capable of retaining their state as long as power is applied are called static
RAM (SRAM); they are volatile.
Two inverters are cross-connected to form a latch.
The latch is connected to two bit lines by transistors T1 and T2.
Transistors T1 and T2 act as switches that can be opened and closed under control
of the word line.
When the word line is at ground level, the transistors are turned off and the latch retains its state
(for example, the cell is in state 1 when X = 1 and Y = 0).
[Figure: static RAM cell. A latch is connected through transistors T1 and T2 to the two bit lines; the word line controls both transistors.]
Asynchronous DRAM
SRAMs are fast but costly, because each cell requires several
transistors.
Less expensive memories can be built from simpler cells, but such cells cannot retain their state indefinitely;
these are called dynamic RAMs (DRAMs).
Data is stored in a DRAM cell in the form of charge on a capacitor, but only for
a period of tens of milliseconds.
An Example of DRAM
A DRAM cell consists of a capacitor, C, and a transistor, T.
To store information in the cell, transistor T is turned on and the appropriate voltage is applied to the bit line.
After the transistor is turned off, the capacitor begins to discharge.
So a Read operation must be completed before the capacitor voltage drops below some threshold value, which is detected by a sense amplifier connected to the bit line.
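Timing of the memory unit is controlled by a specialized unit which generates RAS and CAS.
[Figure: internal organization of a dynamic memory chip. Multiplexed address inputs feed a row address latch (strobed by RAS) and a column address latch (strobed by CAS); the row decoder selects a row of the 4096 x (512 x 8) cell array; Sense/Write circuits and the column decoder connect to data lines D7 to D0; control inputs CS and R/W.]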
2M x 8 memory chip.
Cells are organized as a 4K x 4K array.
The 4096 cells in each row are divided into 512 groups of 8, so 512 bytes of data can be stored in each
row.
A 12-bit address selects a row (512 x 8 = 4096 = 2^12), and a 9-bit address (512 = 2^9) specifies a group of 8 bits in the
selected row.
The row address is latched under control of RAS (row address strobe) and the column address under control of CAS (column address strobe); together they select the byte
to read or write.
Once a row is selected, different column addresses can be applied to select and place different bytes on the
data lines.
This allows a block of data to be transferred at a much faster rate than random accesses.
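The row/column split described above can be sketched in a few lines. This is only an illustration: the function name `split_dram_address` is hypothetical, and placing the row in the high-order 12 bits and the column in the low-order 9 bits is an assumption about the layout.

```python
ROW_BITS = 12      # 2**12 = 4096 rows
COLUMN_BITS = 9    # 2**9 = 512 groups of 8 bits per row

def split_dram_address(address: int) -> tuple[int, int]:
    """Split a 21-bit byte address into (row, column) for a 2M x 8 chip."""
    if not 0 <= address < 2 ** (ROW_BITS + COLUMN_BITS):
        raise ValueError("address must fit in 21 bits")
    row = address >> COLUMN_BITS                    # assumed high-order bits, latched with RAS
    column = address & ((1 << COLUMN_BITS) - 1)     # assumed low-order bits, latched with CAS
    return row, column

print(split_dram_address(0x1FFFFF))  # (4095, 511)
```

Consecutive addresses that share the same row differ only in the column part, which is why a block within one row can be transferred without re-selecting the row.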
Synchronous DRAM
[Figure: synchronous DRAM. Blocks: refresh counter; row address latch and row decoder; column address counter and column decoder; cell array; Read/Write circuits and latches; mode register and timing control (inputs Clock, RAS, CAS, R/W, CS); data input register and data output register. External lines: row/column address and data.]
Operation is directly synchronized with the processor clock signal.
The outputs of the sense circuits are connected to a latch.
During a Read operation, the contents of the cells in a row are loaded into the latches.
During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches.
Data held in the latches that correspond to the selected columns are transferred to the output.
For a burst mode of operation, successive columns are selected using the column address counter and the clock.
The CAS signal need not be generated externally. New data is placed on the data lines at the rising edge of the clock.
Double-Data-Rate SDRAM
In addition to faster circuits, new organizational and operational features
make it possible to achieve high data rates during block transfers.
The key idea is to take advantage of the fact that a large number of bits are
accessed at the same time inside the chip when a row address is applied.
Various techniques are used to transfer these bits quickly to the pins of
the chip.
To make the best use of the available clock speed, data are transferred
externally on both the rising and falling edges of the clock. For this reason,
memories that use this technique are called double-data-rate SDRAMs
(DDR SDRAMs).
Several versions of DDR chips have been developed. The earliest version
is known as DDR. Later versions, called DDR2, DDR3, and DDR4, have
enhanced capabilities.
Static memories
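[Figure: memory module built from 512K x 8 static memory chips. Of the 21-bit address, 19 bits form the internal chip address; A19 and A20 drive a 2-bit decoder that generates the chip-select signals. Each column of chips connects to one byte of the data bus: D31-24, D23-16, D15-8, D7-0. Each 512K x 8 memory chip has a 19-bit address input, an 8-bit data input/output, and a chip-select input.]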
Memory Controller
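[Figure: memory controller placed between the processor and the memory. Processor to controller: Address, R/W, Request, Clock. Controller to memory: row/column address, RAS, CAS, R/W, CS, Clock. Data passes between the processor and the memory.]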
Read-Only Memory:
Data are written into a ROM when it is manufactured.
Flash memory:
Uses an approach similar to EEPROM.
The contents of a single cell can be read, but writing is done for an entire block of cells.
Offers higher capacity and a low storage cost per bit.
The power consumption of flash memory is very low, making it attractive for use in
equipment that is battery-driven.
Associative Memory
Reduces search time considerably.
Data is accessed by its content rather than by an address, so it is also called Content Addressable Memory (CAM).
Hardware organization:
It contains a memory array and logic for m words with n bits per word.
The argument register (A) and key register (K) each have n bits.
The match register (M) has m bits, one for each word in memory.
Each word in memory is compared in parallel with the content of the argument register; the key register acts as a mask that selects which bits of the argument take part in the comparison.
If a word matches the unmasked bits of the argument register, its corresponding bit in the match register is set, and the search for that data word is over.
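The masked comparison can be illustrated with a small software model. This is a sketch only: the name `cam_search` and the 9-bit example values are assumptions, and the loop visits words one at a time, whereas the hardware compares all words in parallel.

```python
def cam_search(words: list[int], argument: int, key: int) -> list[int]:
    """Return the match register: one bit per word, set to 1 where the word
    equals the argument in every bit position selected by the key."""
    return [1 if (word ^ argument) & key == 0 else 0 for word in words]

argument = 0b101111100
key      = 0b111000000          # compare only the three leftmost bits
words    = [0b100111100, 0b101000001]
print(cam_search(words, argument, key))  # [0, 1]
```

The second word matches because its three leftmost bits equal those of the argument, even though the remaining bits differ.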
Cache Memory
Locality of Reference
References to memory in any given time interval tend to be confined to
a few localized areas.
Such an area contains a set of information, and its membership changes gradually
as time goes by.
Temporal locality: a recently accessed instruction or data item is likely to be accessed again soon (for example, instructions in a loop).
Spatial locality: instructions with addresses close to a recently executed instruction are likely to be executed soon.
If a word is accessed, adjacent (nearby) words are likely to be accessed soon (e.g.,
related data items such as arrays are usually stored together, and instructions are executed
sequentially).
A cache is a fast, small-capacity memory that should hold the information that
is most likely to be accessed.
[Figure: the cache memory sits between the processor (CPU) and the main memory.]
When the processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.
Subsequent references to the data in this block of words are found in the cache.
At any given time, only some blocks in the main memory are held in the cache.
Which blocks in the main memory are in the cache is determined by a mapping
function.
When the cache is full, and a block of words needs to be transferred from the
main memory, some block of words in the cache must be replaced. This is
determined by a replacement algorithm.
Cache Hit
Existence of a cache is transparent to the processor. The
processor issues Read and Write requests in the same manner.
If the data is in the cache it is called a Read or Write hit.
Read hit:
The data is obtained from the cache.
Write hit:
The cache holds a replica of the contents of the main memory, so there are two options.
The contents of the cache and the main memory may be updated
simultaneously. This is the write-through protocol.
Alternatively, update only the contents of the cache and mark the block as updated by
setting a bit known as the dirty bit or modified bit. The
contents of the main memory are updated when this block is
replaced. This is the write-back or copy-back protocol.
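Effective access time:
Te = h*Tc + (1 - h)*(Tc + Tm)
where h is the hit ratio, Tc is the cache access time, and Tm is the main memory access time.
Example:
Tc = 0.4 µs, Tm = 1.2 µs, h = 85%
Te = 0.85*0.4 + (1 - 0.85)*1.6 = 0.58 µs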
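The worked example can be reproduced with a short function. This is a minimal sketch; the function and parameter names are hypothetical, and the result is in whatever time unit the inputs use.

```python
def effective_access_time(hit_ratio: float, t_cache: float, t_main: float) -> float:
    """Te = h*Tc + (1 - h)*(Tc + Tm): a miss pays for the cache access plus main memory."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)

te = effective_access_time(0.85, 0.4, 1.2)
print(round(te, 2))  # 0.58
```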
Cache Miss
If the data is not present in the cache, then a Read miss or Write miss
occurs.
Read miss:
Block of words containing this requested word is transferred from the
memory.
After the block is transferred, the desired word is forwarded to the
processor.
The desired word may also be forwarded to the processor as soon as it
is transferred without waiting for the entire block to be transferred. This
is called load-through or early-restart.
Write-miss:
If the write-through protocol is used, the contents of the main memory
are updated directly.
If write-back protocol is used, the block containing the addressed word
is first brought into the cache. The desired word is overwritten with new
information.
Each cache block has a valid bit: if the block contains valid data, the bit is set to 1; otherwise it is 0.
Valid bits are set to 0, when the power is just turned on.
When a block is loaded into the cache for the first time, the valid bit is set to 1.
Data transfers between main memory and disk occur directly bypassing the cache.
When the data on a disk changes, the main memory block is also updated.
However, if the data is also resident in the cache, then the valid bit is set to 0.
What happens if the data in the disk and main memory changes and the write-back
protocol is being used?
In this case, the data in the cache may also have changed and is indicated by the
dirty bit.
The copies of the data in the cache, and the main memory are different. This is
called the cache coherence problem.
One option is to force a write-back before the main memory is updated from the
disk.
Direct mapping
[Figure: direct-mapped cache. The cache holds blocks 0 to 127, each with a tag; main memory holds blocks 0 to 4095. Main memory blocks 0, 128, 256, ... share cache block 0, blocks 1, 129, 257, ... share cache block 1, and so on. Main memory address fields: Tag, Block, Word.]
Each main memory block can be loaded into only one location in the cache.
Operation
1. As execution proceeds, the 7-bit cache block field of each address
generated by the processor points to a particular block location in the cache.
2. The high-order 5 bits of the address are compared with the tag bits
associated with that cache location.
3. If they match, then the desired word is in that block of the cache.
4. If there is no match, then the block containing the required word must first
be read from the main memory and loaded into the cache.
5. The direct-mapping technique is easy to implement, but it is not very
flexible.
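The lookup steps above can be sketched as follows. The 5-bit tag and 7-bit block fields come from the text; the 4-bit word field (giving a 16-bit address) and the names `split_direct_mapped` and `is_hit` are assumptions made for illustration.

```python
TAG_BITS, BLOCK_BITS, WORD_BITS = 5, 7, 4   # assumed 16-bit main memory address

def split_direct_mapped(address: int) -> tuple[int, int, int]:
    """Return the (tag, cache block, word) fields of an address."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

def is_hit(cache_tags: list, address: int) -> bool:
    """Compare the address tag with the tag stored at the selected cache block."""
    tag, block, _ = split_direct_mapped(address)
    return cache_tags[block] == tag

cache_tags = [None] * (1 << BLOCK_BITS)     # None marks an empty cache block
address = 129 * 16 + 3                      # word 3 of main memory block 129
tag, block, word = split_direct_mapped(address)
print(tag, block, word)                     # 1 1 3
print(is_hit(cache_tags, address))          # False: block must be loaded first
cache_tags[block] = tag                     # load the block and record its tag
print(is_hit(cache_tags, address))          # True
```

Main memory blocks 1 and 129 both land in cache block 1 and differ only in their tags, which is the inflexibility noted in step 5.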
Associative mapping
[Figure: associative-mapped cache. Any main memory block (0 to 4095) can be placed in any of the cache blocks 0 to 127, each with a tag. Main memory address fields: Tag (12 bits), Word (4 bits).]
Set-associative mapping
Blocks of the cache are grouped into sets.
[Figure: set-associative-mapped cache. Cache blocks 0 to 127, each with a tag, are grouped into sets; main memory holds blocks 0 to 4095, each of which maps to one set. Main memory address fields: Tag, Set, Word (4 bits).]
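A set-associative lookup can be sketched in the same style. The field widths here are assumptions: with 128 cache blocks and two blocks per set there would be 64 sets, giving a 6-bit set field and, for a 16-bit address with a 4-bit word field, a 6-bit tag. The function names are hypothetical.

```python
WORD_BITS = 4
SET_BITS = 6     # assumption: 128 cache blocks / 2 blocks per set = 64 sets

def split_set_associative(address: int) -> tuple[int, int, int]:
    """Return the (tag, set index, word) fields of an address."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word

def lookup(sets: list[list[int]], address: int) -> bool:
    """Hit if the tag is held by any block of the selected set."""
    tag, set_index, _ = split_set_associative(address)
    return tag in sets[set_index]

sets = [[] for _ in range(1 << SET_BITS)]   # each entry lists the tags resident in that set
sets[0] = [0, 1]                            # main memory blocks 0 and 64 both resident in set 0
print(lookup(sets, 0 * 16), lookup(sets, 64 * 16))  # True True
print(lookup(sets, 128 * 16))                        # False: tag 2 is not in set 0
```

Unlike direct mapping, two blocks that map to the same set can be resident at the same time.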
Performance Considerations
A key design objective of a computer system is to
achieve the best possible performance at the lowest
possible cost.
Price/performance ratio is a common measure of success.
Memory Interleaving
Divides the memory system into a number of memory
modules.
Each module has its own address buffer register (ABR)
and data buffer register (DBR).
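[Figure: two ways of dividing the main memory (MM) address among modules, each module having its own ABR and DBR. (a) The high-order k bits select the module and the low-order m bits give the address in the module (modules 0 ... i ... n-1). (b) The low-order k bits select the module and the high-order m bits give the address in the module (modules 0 ... i ... 2^k - 1).]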
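The two address layouts labelled in the figure (module bits first, or address-in-module bits first) can be compared with a small sketch. The function names and the sizes k = 2 and m = 4 are hypothetical values chosen for illustration.

```python
def consecutive_in_module(address: int, k: int, m: int) -> tuple[int, int]:
    """High-order k bits pick the module; low-order m bits address within it."""
    return address >> m, address & ((1 << m) - 1)

def interleaved(address: int, k: int, m: int) -> tuple[int, int]:
    """Low-order k bits pick the module; high-order m bits address within it."""
    return address & ((1 << k) - 1), address >> k

k, m = 2, 4   # hypothetical: 4 modules of 16 words each
for address in range(4):
    # Each pair is (module, address in module).
    print(address, consecutive_in_module(address, k, m), interleaved(address, k, m))
```

With the first layout, consecutive addresses stay in one module; with the second, they fall in successive modules, so several modules can be kept busy during a block transfer.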
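Average access time with two levels of cache:
T_ave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M
where h1 and h2 are the hit rates in the L1 and L2 caches, C1 and C2 are the times to access information in the L1 and L2 caches, and M is the time to access information in main memory.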
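The two-level formula above can be evaluated directly. This is a minimal sketch; the function name and the sample values (hit rates 0.9 and 0.8, access times 1, 10, and 100 time units) are hypothetical and chosen only to show the arithmetic.

```python
def average_access_time(h1: float, c1: float, h2: float, c2: float, m: float) -> float:
    """T_ave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m

# Hypothetical values: 0.9*1 + 0.1*0.8*10 + 0.1*0.2*100 = 0.9 + 0.8 + 2.0
print(round(average_access_time(0.9, 1, 0.8, 10, 100), 2))  # 3.7
```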
VIRTUAL MEMORY
Mapping: virtual addresses in the address space are translated to physical addresses in the memory space.
Address Mapping
Memory mapping table for virtual address -> physical address
[Figure: the virtual address register addresses the memory mapping table; the memory table buffer register supplies the physical address to the main memory address register; main memory is accessed through the main memory buffer register.]
ADDRESS MAPPING
Address Space and Memory Space are each divided into fixed size group
of words called blocks or pages
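Address space: N = 8K = 2^13 words, divided into Page 0 to Page 7 (groups of 1K words).
Memory space: M = 4K = 2^12 words, divided into Block 0 to Block 3.
The virtual address consists of a page number followed by a line number (0101010011 in the example).

Memory page table:
Page    Block    Presence bit
000              0
001     11       1
010     00       1
011              0
100              0
101     01       1
110     10       1
111              0

[Figure: the page number selects an entry in the page table. If the presence bit is 1, the block number read from the table (01 in the example) is placed, together with the line number, in the main memory address register (01 0101010011), and the word is read from main memory (Block 0 to Block 3) into the MBR.]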
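The table lookup can be sketched in software. The page-to-block entries below are the ones with presence bit 1 in the example; the names `translate` and `PageFault` are hypothetical, and raising an exception is only a stand-in for the trap to the operating system.

```python
LINE_BITS = 10   # 1K words per page, so 10 bits of line number

# page number -> block number, for the pages whose presence bit is 1
PAGE_TABLE = {0b001: 0b11, 0b010: 0b00, 0b101: 0b01, 0b110: 0b10}

class PageFault(Exception):
    """Raised when the referenced page is not in main memory."""

def translate(virtual_address: int) -> int:
    """Map a 13-bit virtual address to a 12-bit main memory address."""
    page = virtual_address >> LINE_BITS
    line = virtual_address & ((1 << LINE_BITS) - 1)
    if page not in PAGE_TABLE:                    # presence bit is 0
        raise PageFault(f"page {page:03b} is not in main memory")
    return (PAGE_TABLE[page] << LINE_BITS) | line

physical = translate(0b101_0101010011)            # page 101, line 0101010011
print(f"{physical:012b}")                         # 010101010011
```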
PAGE FAULT
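1. Trap to the OS
a. Wait in a queue for this device until serviced
[Figure: steps in handling a page fault. A reference (LOAD M) to a page that is not in memory causes a trap to the OS; the missing page is brought into a free frame in main memory; the page table is reset; the instruction is restarted.]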
PAGE REPLACEMENT
Decision on which page to displace to make room for an incoming page
when no free frame is available
Modified page fault service routine
1. Find the location of the desired page on the backing store
2. Find a free frame
- If there is a free frame, use it
- Otherwise, use a page-replacement algorithm to select a victim frame
- Write the victim page to the backing store
3. Read the desired page into the (newly) free frame
4. Restart the user process
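[Figure: page replacement between physical memory and the backing store. The page table holds a frame number and a valid/invalid bit for each entry. Steps: 1. swap out the victim page; 2. change its page table entry to invalid; 3. swap the desired page in; 4. reset the page table for the new page.]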
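[Figure: page-replacement example with hits marked; 15 page faults.]
[Figure: the same reference string run twice: 9 page faults in the first run, but 10 page faults with 4 frames. Unexpectedly, the number of page faults increases.]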
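The fault counts above (9, then 10 with 4 frames) are consistent with first-in, first-out (FIFO) replacement on a widely used textbook reference string. Both the choice of FIFO and the reference string below are assumptions, since neither survives in the text; the function name is hypothetical.

```python
from collections import deque

def fifo_page_faults(reference_string: list[int], frame_count: int) -> int:
    """Count page faults when the page loaded earliest is always the victim."""
    frames = deque()              # oldest resident page is at the left
    faults = 0
    for page in reference_string:
        if page in frames:
            continue              # hit: FIFO order does not change
        faults += 1
        if len(frames) == frame_count:
            frames.popleft()      # evict the page that was loaded first
        frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # assumed reference string
print(fifo_page_faults(refs, 3))  # 9
print(fifo_page_faults(refs, 4))  # 10
```

Adding a frame raises the fault count from 9 to 10 for this string, which is the unexpected behaviour known as Belady's anomaly.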
Optimal Algorithm
To avoid Belady's anomaly, use the optimal page
replacement algorithm.
Replace the page that will not be used for the longest period of time.
This guarantees the lowest possible page fault rate for a fixed
number of frames.
Example:
[Figure: optimal replacement example with hits marked; 9 page faults.]
[Figure: optimal replacement example; 6 page faults.]
[Figure: further replacement example with hits marked; 12 page faults.]
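The optimal policy described above can be sketched as a simulation that looks ahead in the reference string. The function name and both reference strings are assumptions; the strings are common textbook examples that happen to reproduce fault counts of 9 (3 frames) and 6 (4 frames).

```python
def optimal_page_faults(reference_string: list[int], frame_count: int) -> int:
    """Count page faults when the victim is the page whose next use is farthest away."""
    frames: list[int] = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue                          # hit
        faults += 1
        if len(frames) < frame_count:
            frames.append(page)               # a free frame is available
            continue
        future = reference_string[i + 1:]

        def next_use(p: int) -> float:
            # Distance to the next reference; pages never used again rank last.
            return future.index(p) if p in future else float("inf")

        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults

refs_a = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]  # assumed
refs_b = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]                          # assumed
print(optimal_page_faults(refs_a, 3))  # 9
print(optimal_page_faults(refs_b, 4))  # 6
```

Because it needs knowledge of future references, this policy serves as a yardstick for other algorithms rather than as something an operating system can implement directly.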
Counter implementation
Every page-table entry has a counter; every time the page is
referenced through this entry, the clock is copied into the
counter.
When a page needs to be replaced, the counters are examined and the page with the
oldest (smallest) value, i.e., the least recently used page, is replaced.
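The counter scheme can be sketched as follows. The function name and the reference string are assumptions (the string is a common textbook example), and the position in the reference string stands in for the clock.

```python
def lru_page_faults(reference_string: list[int], frame_count: int) -> int:
    """Count page faults using a per-page counter holding the time of last reference."""
    last_used: dict[int, int] = {}    # resident page -> clock value at last reference
    faults = 0
    for clock, page in enumerate(reference_string):
        if page not in last_used:
            faults += 1
            if len(last_used) == frame_count:
                # The victim is the page with the smallest (oldest) counter.
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = clock       # copy the clock into the page's counter
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]   # assumed
print(lru_page_faults(refs, 3))  # 12
```

Finding the victim requires a search over all counters, which is the main cost of this implementation.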