
The Memory System

Deepak John, Department of Computer Applications, SJCET-Pala
Basic Concepts
The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses = 2^16 = 64K memory locations
Most modern computers are byte addressable.

Figure 5.1. Connection of the memory to the processor.
(The processor's MAR drives a k-bit address bus and its MDR connects to an
n-bit data bus; control lines (R/W, MFC, etc.) link processor and memory.
The memory has up to 2^k addressable locations; word length = n bits.)


The Memory Hierarchy
Main Memory: memory unit that communicates directly with
the CPU (RAM)
Auxiliary Memory: devices that provide backup storage (disk
drives)
Cache Memory: special very-high-speed memory used to increase
processing speed (cache RAM)
(Figure: memory hierarchy. Auxiliary memory (magnetic tapes, magnetic disks)
connects through an I/O processor to main memory; the CPU accesses main
memory through cache memory.)
Basic Concepts
Memory access time: the average time taken to read a unit of
information
Memory cycle time: the average time that elapses between two successive
read operations
Semiconductor RAM Memories
Internal organization of memory chips

Each memory cell can hold one bit of information.


Memory cells are organized in the form of an array.
One row is one memory word.
All cells of a row are connected to a common line, known as the
“word line”.
Word line is connected to the address decoder.
Sense/write circuits are connected to the data input/output lines of
the memory chip.
Internal Organization of Memory Chips
Figure 5.2. Organization of bit cells in a memory chip.
(A 16 × 8 array of memory cells: address lines A0-A3 feed an address decoder
that drives word lines W0-W15; each column has a pair of bit lines b and b'
connected to a Sense/Write circuit; the data input/output lines are b7 ... b0;
R/W and CS are control inputs.)
A Memory Chip
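Figure 5.3. Organization of a 1K (1024) × 1 memory chip.
(The 10-bit address is split into a 5-bit row address and a 5-bit column
address. A 5-bit decoder selects one of the word lines W0-W31 of a 32 × 32
memory cell array; the Sense/Write circuitry connects the array to a 32-to-1
output multiplexer and input demultiplexer, controlled by R/W and CS, which
selects the single data input/output bit.)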
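The row/column split in Figure 5.3 can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the choice of the high-order 5 bits as the row address is an assumption, since the figure does not say which half of the address feeds the decoder.

```python
def split_1k_address(addr):
    """Split a 10-bit cell address into (row, column) for a 32 x 32 array."""
    if not 0 <= addr < 1024:
        raise ValueError("address must fit in 10 bits")
    row = addr >> 5         # assumed: high-order 5 bits select word line W0..W31
    column = addr & 0x1F    # assumed: low-order 5 bits drive the 32-to-1 multiplexer
    return row, column


if __name__ == "__main__":
    print(split_1k_address(0b1111100001))
```

With this split, the 1024 cells need only 32 word lines and a 32-way multiplexer instead of a single 1024-way decoder.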
SRAM Cell
Two transistor inverters are cross-connected to implement a basic flip-flop (latch).
The cell is connected to one word line and two bit lines (b and b') by transistors T1 and T2.
When the word line is at ground level, the transistors are turned off and the latch retains
its state.
Two states:
State 1: the logic value at point X is 1 and at point Y is 0
State 0: the logic value at point X is 0 and at point Y is 1
(Figure: a static RAM cell. T1 connects bit line b to point X and T2 connects
bit line b' to point Y; both transistors are controlled by the word line.)
Read operation
I. The word line is activated to close switches T1 and T2.
II. Sense/Write circuits at the bottom monitor the state of b and b'
and set the output accordingly.
Write operation
I. The state of the cell is set by placing the appropriate value on b and its complement on b'.
II. The word line is then activated.
Asynchronous DRAMs
Static RAMs (SRAMs):
Consist of circuits that are capable of retaining their state as long
as power is applied.
Volatile memories, because their contents are lost when power is
interrupted.
Access times of static RAMs are in the range of a few
nanoseconds.
The cost is usually high.
Dynamic RAMs (DRAMs):
Do not retain their state indefinitely.
Contents must be periodically refreshed.
Contents may be refreshed while accessing them for reading.
Asynchronous DRAMs
Each row can store 512 bytes.
12 bits to select a row, and 9 bits to select a group (byte) in a row:
a total of 21 bits.
•First apply the row address; the RAS signal latches the row
address. Then apply the column address; the CAS signal latches the
column address.
•Timing of the memory unit is controlled by a specialized unit
which generates RAS and CAS.
•This is asynchronous DRAM.
(Figure: 16-megabit DRAM chip. Address bits A20-9 / A8-0 share the same pins.
RAS loads the row address latch, whose row decoder selects one row of the
4096 × (512 × 8) cell array; CAS loads the column address latch, whose column
decoder selects one byte through the Sense/Write circuits onto data lines
D7-D0; CS and R/W are control inputs.)
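As a rough sketch of the multiplexed addressing described above, the following Python splits a 21-bit address into the row part (A20-9) and the column part (A8-0) and lists them in the order they would be applied. The function name and the tuple representation of the RAS/CAS phases are illustrative assumptions, not part of the slides.

```python
ROW_BITS = 12      # selects one of 4096 rows
COLUMN_BITS = 9    # selects one of 512 bytes in the row


def dram_address_phases(addr):
    """Return the two address phases for a 21-bit byte address."""
    if not 0 <= addr < 2 ** (ROW_BITS + COLUMN_BITS):
        raise ValueError("address must fit in 21 bits")
    row = addr >> COLUMN_BITS                  # A20-9, latched by RAS
    column = addr & ((1 << COLUMN_BITS) - 1)   # A8-0, latched by CAS
    return [("RAS", row), ("CAS", column)]


if __name__ == "__main__":
    for strobe, value in dram_address_phases(0x1FFFFF):
        print(strobe, value)
```

Because the two phases share the same pins, the chip needs only 12 address pins rather than 21.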


Synchronous DRAMs
The operations of SDRAM are controlled by a clock signal.
•Operation is directly synchronized with the processor clock signal.
•The outputs of the sense circuits are connected to a latch.
•During a Read operation, the contents of the cells in a row are
loaded onto the latches.
•During a refresh operation, the contents of the cells are refreshed
without changing the contents of the latches.
•Data held in the latches that correspond to the selected columns are
transferred to the output.
Figure 5.8. Synchronous DRAM.
(Blocks: refresh counter, row address latch, row decoder, cell array, column
address counter, column decoder, Read/Write circuits & latches, data input
register, data output register, and a mode register and timing control unit.
Inputs: Row/Column address, Clock, RAS, CAS, R/W, CS; Data is the
input/output.)
Synchronous DRAMs

Refresh circuits are included (rows are refreshed every 64 ms).


Clock frequency > 100 MHz
Intel PC100 and PC133

Latency and Bandwidth


• Memory latency – the amount of time it takes to transfer a word of
data to or from the memory.
• Memory bandwidth – the number of bits or bytes that can be
transferred in one second. It is used to measure how much time is
needed to transfer an entire block of data.
• Bandwidth is the product of the rate at which data are transferred
(and accessed) and the width of the data bus.
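The bandwidth product above can be worked through numerically. The sketch below is a simplification that ignores latency; the function names and the 64-bit bus width in the example are assumptions chosen only for illustration.

```python
def bandwidth_bytes_per_second(transfers_per_second, bus_width_bits):
    """Bandwidth = transfer rate x data bus width (converted to bytes)."""
    return transfers_per_second * bus_width_bits / 8


def block_transfer_time(block_bytes, bandwidth):
    """Time to move a whole block at the given bandwidth (latency ignored)."""
    return block_bytes / bandwidth


if __name__ == "__main__":
    # Assumed example: 100 million transfers per second on a 64-bit data bus.
    bw = bandwidth_bytes_per_second(100_000_000, 64)
    print(bw, block_transfer_time(64, bw))
```

A DDR device that transfers on both clock edges would double `transfers_per_second` for the same clock frequency.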
DDR-SDRAM
Double-Data-Rate SDRAM
Standard SDRAM performs all actions on the rising edge of the
clock signal.
DDR SDRAM accesses the cell array in the same way, but transfers
the data on both edges of the clock.
The cell array is organized in two banks. Each can be accessed
separately.
Static memories
•Implement a memory unit of 2M words of 32 bits each.
•Use 512K × 8 static memory chips.
•Each column consists of 4 chips.
•Each chip implements one byte position.
•A chip is selected by setting its chip select control line to 1.
•The selected chip places its data on the data output line; the outputs of
the other chips are in the high-impedance state.
•21 bits are needed to address a 32-bit word.
•The high-order 2 bits are decoded by the 2-bit decoder to activate one of
the four Chip Select signals, which selects a row of chips.
•The remaining 19 bits are used to access specific byte locations inside
the selected chips.
(Figure: Organization of a 2M × 32 memory module using 512K × 8 static chips.
Of the 21-bit address A0-A20, 19 bits form the internal chip address sent to
every chip and 2 bits feed the 2-bit decoder that drives the chip select
lines; the four chip columns supply data lines D31-24, D23-16, D15-8 and
D7-0. Each 512K × 8 memory chip has a 19-bit address input, an 8-bit data
input/output and a chip select input.)
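The address decoding for this module can be sketched as follows. The sketch assumes, following the figure's labels (a 19-bit internal chip address and a 2-bit decoder), that the two high-order address bits drive the decoder; the function name and the list representation of the four chip select lines are hypothetical.

```python
CHIP_ADDRESS_BITS = 19   # internal address of a 512K x 8 chip
DECODER_BITS = 2         # 2-bit decoder -> four chip select lines


def decode_module_address(word_addr):
    """Return (chip_select_lines, internal_chip_address) for a 21-bit word address."""
    if not 0 <= word_addr < 2 ** (CHIP_ADDRESS_BITS + DECODER_BITS):
        raise ValueError("address must fit in 21 bits")
    selected_row = word_addr >> CHIP_ADDRESS_BITS           # assumed: high-order 2 bits
    internal = word_addr & ((1 << CHIP_ADDRESS_BITS) - 1)   # sent to every chip
    chip_selects = [1 if row == selected_row else 0 for row in range(4)]
    return chip_selects, internal


if __name__ == "__main__":
    print(decode_module_address(0x1FFFFF))
```

Exactly one chip select line is 1 for any address, so only one row of four chips drives the 32 data lines while the others stay in the high-impedance state.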
Memory System Considerations
The choice of a RAM chip for a given application depends on
several factors:
Cost, speed, power, size…
SRAMs are faster and more expensive, and offer smaller capacity.
DRAMs are slower and cheaper, and offer larger capacity.
Dynamic memories
• Large dynamic memory systems can be implemented using DRAM
chips in a similar way to static memory systems.
• Placing large memory systems directly on the motherboard will
occupy a large amount of space.
Also, this arrangement is inflexible since the memory system
cannot be expanded easily.
• Packaging considerations have led to the development of larger
memory units known as SIMMs (Single In-line Memory Modules)
and DIMMs (Dual In-line Memory Modules).
• Memory modules are an assembly of memory chips on a small
board that plugs vertically into a single socket on the motherboard.
They occupy less space on the motherboard.
They allow easy expansion by replacement.
Memory Controller

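Figure 5.11. Use of a memory controller.
(The processor sends Address, R/W, Request and Clock signals to the memory
controller, which supplies the memory with the multiplexed Row/Column
address, RAS, CAS, R/W, CS and Clock; Data passes directly between processor
and memory.)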


Read-Only Memories
ROM
ROM is used for storing programs that are permanently resident in
the computer and for tables of constants that do not change in value
once the production of the computer is completed.
The ROM portion of main memory is needed for storing an initial
program called the bootstrap loader.
Figure 5.12. A ROM cell.
(Transistor T, controlled by the word line, links the bit line to ground
through point P: P is left unconnected to store a 1 and connected to store
a 0.)


Read-Only-Memory
ROM
PROM: programmable ROM
EPROM: erasable, reprogrammable ROM
EEPROM: can be programmed and erased electrically

Flash Memory
• Difference from EEPROM: the contents of a single cell can be read, but
writing is only possible for an entire block of cells
• Low power consumption
• Used in portable equipment
- Flash cards
- Flash drives
Cache Memories
Figure 5.14. Use of a cache memory.
(The cache sits between the processor and the main memory.)


• When the processor issues a Read request, a block of words is transferred
from the main memory to the cache, one word at a time. Subsequent
references to the data in this block of words are found in the cache.
• At any given time, only some blocks in the main memory are held
in the cache. Which blocks in the main memory are in the cache is
determined by a “mapping function”.
• When the cache is full and a block of words needs to be transferred
from the main memory, some block of words in the cache must be
replaced. Which block is replaced is determined by a “replacement algorithm”.
Cache memory
If the active portions of the program and data are placed in a fast, small
memory, the average memory access time can be reduced, thus reducing
the total execution time of the program.
Such a fast, small memory is referred to as cache memory.
The cache is the fastest component in the memory hierarchy and
approaches the speed of the CPU components.
When the CPU needs to access memory, the cache is examined first.

• Locality of reference
- temporal
- spatial
• Cache block – cache line
- A set of contiguous address locations of some size
Principle of Locality
Principle of locality (or locality of reference):
Program accesses a relatively small portion of the address space at
any instant of time.
Temporal locality and spatial locality.
Temporal locality (locality in time):
• Keep most recently accessed data items closer to the processor.
Spatial locality (locality in space):
• Move blocks consisting of contiguous words to ‘upper’ levels.

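Cache Hit: the data appears in some block in the upper level (the cache).
• Hit rate: the fraction of memory accesses found in the upper level.
Cache Miss: the data is not found in the upper level and needs to be
retrieved from a block in the lower level (main memory).
(Figure: the CPU accesses the cache; on a hit the data comes from the cache,
on a miss it comes from main memory.)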
The basic characteristic of cache memory is its fast access time.
Therefore, very little or no time must be wasted when searching for
words in the cache.
The transformation of data from main memory to cache memory is
referred to as a mapping process. There are three types of mapping:
Associative mapping
Direct mapping
Set-associative mapping

• In the general case, there are 2^k words in cache memory and 2^n
words in main memory.
• The n-bit memory address is divided into two fields: k bits for the
index and n-k bits for the tag field.
Direct mapping
•Block j of the main memory maps to block j modulo 128
of the cache: block 0 maps to 0, block 129 maps to 1.
•More than one memory block is mapped onto the
same position in the cache.
•May lead to contention for cache blocks even if the
cache is not full.
•Resolve the contention by allowing the new block to
replace the old block, leading to a trivial
replacement algorithm.
•Memory address is divided into three fields:
- Low-order 4 bits determine one of the 16
words in a block.
- When a new block is brought into the cache,
the next 7 bits determine which cache block this
new block is placed in.
- High-order 5 bits determine which of the
possible 32 blocks is currently present in the cache.
These are tag bits.
•Simple to implement but not very flexible.
(Figure: direct-mapped cache. Main memory holds blocks 0 to 4095; the cache
holds blocks 0 to 127, each with a tag. Main memory address fields: Tag
(5 bits), Block (7 bits), Word (4 bits).)
Example main memory address (Tag 5 bits, Block 7 bits, Word 4 bits):

11101,1111111,1100

Tag: 11101
Block: 1111111 = 127, i.e. cache block 127
Word: 1100 = 12, i.e. word 12 of cache block 127
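The field extraction in this example can be checked with a short Python sketch. The function name is hypothetical; the field widths (5, 7 and 4 bits) are those given on the slide.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks


def split_direct(addr):
    """Split a 16-bit main memory address into (tag, cache block, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word


if __name__ == "__main__":
    print(split_direct(0b1110111111111100))
```

Taking the 7-bit block field is the same as computing the main memory block number modulo 128, which is why main memory block 129 lands in cache block 1.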
Associative mapping
•A main memory block can be placed into any cache position.
•Memory address is divided into two fields:
- Low-order 4 bits identify the word within a block.
- High-order 12 tag bits identify which of the 4096 (= 2^12) main
memory blocks is resident in a given cache position.
•Flexible, and uses cache space efficiently.
•Replacement algorithms can be used to replace an existing block in the
cache when the cache is full.
(Figure: associative-mapped cache. Main memory holds blocks 0 to 4095; the
cache holds blocks 0 to 127, each with a tag. Main memory address fields:
Tag (12 bits), Word (4 bits).)
Example main memory address (Tag 12 bits, Word 4 bits):

111011111111,1100

Tag: 111011111111
Word: 1100 = 12, i.e. word 12 of the matching block in the cache
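A minimal sketch of an associative lookup, assuming the cache is modelled as a plain Python list of stored tags (a hypothetical representation; real hardware compares all tags in parallel rather than looping):

```python
WORD_BITS = 4   # 16 words per block


def split_associative(addr):
    """Split a 16-bit address into (12-bit tag, 4-bit word)."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def lookup(cache_tags, addr):
    """Return (cache position, word) on a hit, or None on a miss."""
    tag, word = split_associative(addr)
    for position, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:    # the block may sit in any cache position
            return position, word
    return None


if __name__ == "__main__":
    print(lookup([5, 0b111011111111, 7], 0b1110111111111100))
```

Since any position can hold any block, every stored tag has to be examined, which is what makes this scheme flexible but costlier than direct mapping.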
Set-Associative Mapping
•Blocks of the cache are grouped into sets.
•The mapping function allows a block of the main memory to reside in any
block of a specific set.
•Divide the cache into 64 sets, with two blocks per set.
•Memory address is divided into three fields:
- Low-order 4 bits (Word field) identify the word within a block.
- 6-bit Set field determines the set number.
- High-order 6 bits (Tag field) are compared to the tag fields of the two
blocks in the set.
•Number of blocks per set is a design parameter.
(Figure: set-associative cache with two blocks per set. Main memory holds
blocks 0 to 4095; the cache holds blocks 0 to 127, each with a tag, grouped
as Set 0 (blocks 0 and 1), Set 1 (blocks 2 and 3), ..., Set 63 (blocks 126
and 127). Main memory address fields: Tag (6 bits), Set (6 bits), Word
(4 bits).)
Example main memory address (Tag 6 bits, Set 6 bits, Word 4 bits):

111011,111111,1100

Tag: 111011
Set: 111111 = 63, i.e. set 63 of the cache
Word: 1100 = 12, i.e. word 12 of the matching block in set 63
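The same example can be sketched in Python. The function names and the list-of-lists model of the sets are illustrative assumptions; the field widths (6, 6 and 4 bits) come from the slide.

```python
WORD_BITS = 4   # 16 words per block
SET_BITS = 6    # 64 sets


def split_set_associative(addr):
    """Split a 16-bit address into (tag, set number, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_number = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


def is_hit(sets, addr):
    """sets[s] is the list of tags currently stored in set s."""
    tag, set_number, _ = split_set_associative(addr)
    return tag in sets[set_number]   # only the blocks of one set are compared


if __name__ == "__main__":
    print(split_set_associative(0b1110111111111100))
```

Only the two tags of the selected set are compared, a middle ground between the single comparison of direct mapping and the full search of associative mapping.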
Replacement Algorithms

Difficult to determine which blocks to kick out.
The cache controller tracks references to all blocks as computation
proceeds.
It increments or clears tracking counters when a hit or miss occurs.

• For associative and set-associative caches:
Which location should be emptied when the cache is full and a
miss occurs?
•First In First Out (FIFO)
•Least Recently Used (LRU)
Replacement Algorithms
FIFO with a four-block cache (each row is one cache position):

CPU reference   A     B     C     A     D     E     A     D     C     F
Result          Miss  Miss  Miss  Hit   Miss  Miss  Miss  Hit   Hit   Miss
Cache (FIFO)    A     A     A     A     A     E     E     E     E     E
                      B     B     B     B     B     A     A     A     A
                            C     C     C     C     C     C     C     F
                                        D     D     D     D     D     D
Replacement Algorithms
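LRU with a four-block cache (each column lists the cached blocks from most
to least recently used):

CPU reference   A     B     C     A     D     E     A     D     C     F
Result          Miss  Miss  Miss  Hit   Miss  Miss  Hit   Hit   Hit   Miss
Cache (LRU)     A     B     C     A     D     E     A     D     C     F
                      A     B     C     A     D     E     A     D     C
                            A     B     C     A     D     E     A     D
                                        B     C     C     C     E     A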
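The two traces above can be reproduced with a small simulation. This is a sketch under the assumption of a fully associative cache with four block positions; the function names are hypothetical, and `deque` and `OrderedDict` are used only as convenient bookkeeping structures.

```python
from collections import OrderedDict, deque


def simulate_fifo(references, capacity):
    """Return 'Hit'/'Miss' per reference; evict the block loaded earliest."""
    cache = deque()              # oldest block at the left
    results = []
    for block in references:
        if block in cache:
            results.append("Hit")        # a hit does not change FIFO order
        else:
            results.append("Miss")
            if len(cache) == capacity:
                cache.popleft()
            cache.append(block)
    return results


def simulate_lru(references, capacity):
    """Return 'Hit'/'Miss' per reference; evict the least recently used block."""
    cache = OrderedDict()        # least recently used block first
    results = []
    for block in references:
        if block in cache:
            results.append("Hit")
            cache.move_to_end(block)     # mark as most recently used
        else:
            results.append("Miss")
            if len(cache) == capacity:
                cache.popitem(last=False)
            cache[block] = True
    return results


if __name__ == "__main__":
    refs = list("ABCADEADCF")
    print(simulate_fifo(refs, 4))
    print(simulate_lru(refs, 4))
```

The two policies differ at the seventh reference: FIFO has already evicted A (the oldest block), whereas LRU kept it because it was used again at the fourth reference.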
Virtual Memories
• Main memory is smaller than the address space.
Example: a 32-bit address allows an address space of 4G bytes, but
main memory may only be a few hundred megabytes.
• Parts of a program not in main memory are stored on secondary
storage devices, such as disks.
• Techniques that automatically move program and data blocks into
the physical main memory when they are required for execution
are called virtual-memory techniques.
• The operating system moves programs and data automatically between
the physical main memory and secondary storage (virtual
memory).
Virtual memory organization
•Memory management unit (MMU) translates
virtual addresses into physical addresses.
•If the desired data or instructions are in the
main memory they are fetched as described
previously.
•If the desired data or instructions are not in
the main memory, they must be transferred
from secondary storage to the main memory.
•MMU causes the operating system to bring
the data from the secondary storage into the
main memory.
•Virtual addresses will be translated into
physical addresses.
•The virtual memory mechanism bridges the
size and speed gaps between the main
memory and secondary storage, similar to a
cache.
(Figure: virtual memory organization. The processor sends a virtual address
to the MMU, which sends a physical address to the cache and on to main
memory; data flows back along the same path. Pages move between main memory
and disk storage by DMA transfer.)
Address translation
• Assume that program and data are composed of fixed-length units
called pages. A page consists of a block of words that occupy
contiguous locations in the main memory.
• A page is the basic unit of information that is transferred between
secondary storage and main memory.
• The size of a page commonly ranges from 2K to 16K bytes.
• Each virtual or logical address generated by a processor is
interpreted as a virtual page number (high-order bits) plus an
offset (low-order bits) that specifies the location of a particular
byte within that page.
• Information about the main memory location of each page is kept in the
page table.
• An area of the main memory that can hold a page is called a page frame.
• The starting address of the page table is kept in a page table base register.
• The page table entry for a page includes:
- Address of the page frame where the page resides in the main memory.
- Some control bits.
• The virtual page number generated by the processor is added to the contents
of the page table base register.
• This provides the address of the corresponding entry in the page table.
• The contents of this location in the page table give the starting address
of the page if the page is currently in the main memory.
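The translation steps listed above can be sketched as follows. Everything specific here is an assumption for illustration: a 4K-byte page (within the 2K to 16K range mentioned), one memory location per page table entry, a Python dict standing in for main memory, and entries modelled as small dicts with hypothetical `valid` and `frame_start` fields.

```python
OFFSET_BITS = 12                 # assumed 4K-byte pages
PAGE_SIZE = 1 << OFFSET_BITS


def translate(virtual_addr, memory, ptbr):
    """Translate a virtual address using the page table located at ptbr."""
    vpn = virtual_addr >> OFFSET_BITS            # high-order bits
    offset = virtual_addr & (PAGE_SIZE - 1)      # low-order bits
    entry = memory[ptbr + vpn]                   # PTBR + virtual page number
    if not entry["valid"]:                       # control bit: page not in memory
        raise LookupError("page fault")
    return entry["frame_start"] + offset         # start of page frame + offset


if __name__ == "__main__":
    memory = {100 + 2: {"valid": True, "frame_start": 0x5000}}
    print(hex(translate(0x2ABC, memory, ptbr=100)))
```

The offset passes through unchanged; only the page number is replaced by the page frame address.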
(Figure: virtual-memory address translation. The virtual address from the
processor consists of a virtual page number and an offset. The virtual page
number is added to the page table address held in the page table base
register to locate an entry in the page table; each entry holds control bits
and the page frame in memory. The page frame combined with the offset forms
the physical address in main memory.)
• PTBR holds the address of the page table.
• Virtual address is interpreted as page number and offset.
• PTBR + virtual page number provide the entry of the page in the page table.
• Page table holds information about each page. This includes the
starting address of the page in the main memory.

• The page table entry for a page also includes some control bits
which describe the status of the page while it is in the main
memory.
• The page table information is used by the MMU for every
access, so ideally it would be held within the MMU. In practice
the MMU holds only a portion of it, consisting of the page table
entries that correspond to the most recently accessed pages.
• A small cache called the Translation Lookaside Buffer (TLB)
is included in the MMU for this purpose.
• The TLB holds page table entries of the most recently
accessed pages.
• In addition to the above, the TLB must hold the
virtual page number for each of these pages.
•High-order bits of the virtual address generated by the
processor select the virtual page.
•These bits are compared to the virtual page numbers in
the TLB.
•If there is a match, a hit occurs and the corresponding
address of the page frame is read.
•If there is no match, a miss occurs and the page table
within the main memory must be consulted.
(Figure: use of the TLB. The virtual page number of the virtual address from
the processor is compared (=?) with the virtual page numbers stored in the
TLB, whose entries hold virtual page number, control bits and page frame in
memory. On a hit, the page frame is combined with the offset to form the
physical address in main memory; otherwise a miss is signalled.)
• If a program generates an access to a page that is not in the main
memory, a page fault occurs.
• When the MMU detects a page fault, the following actions
occur:
- The MMU asks the operating system to intervene by raising an
exception.
- Processing of the active task which caused the page fault is
interrupted.
- Control is transferred to the operating system.
- The operating system copies the requested page from secondary
storage to the main memory.
- Once the page is copied, control is returned to the task which
was interrupted.
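The TLB lookup and page fault sequence can be put together in one sketch. This is a simplification under stated assumptions: 4K-byte pages, dicts for the TLB and the page table (mapping virtual page number to page frame number), an unbounded TLB, and a hypothetical `load_page` callback standing in for the operating system copying a page from secondary storage.

```python
OFFSET_BITS = 12   # assumed 4K-byte pages


class PageFault(Exception):
    """Raised by the MMU when the page is not in main memory."""


def mmu_translate(virtual_addr, tlb, page_table):
    vpn = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & ((1 << OFFSET_BITS) - 1)
    if vpn in tlb:                                   # TLB hit
        return (tlb[vpn] << OFFSET_BITS) | offset, "TLB hit"
    frame = page_table.get(vpn)                      # TLB miss: consult page table
    if frame is None:
        raise PageFault(vpn)
    tlb[vpn] = frame                                 # remember recent translation
    return (frame << OFFSET_BITS) | offset, "TLB miss"


def access(virtual_addr, tlb, page_table, load_page):
    """Retry the access after the (simulated) OS services a page fault."""
    try:
        return mmu_translate(virtual_addr, tlb, page_table)
    except PageFault as fault:
        vpn = fault.args[0]
        page_table[vpn] = load_page(vpn)             # OS brings the page in
        return mmu_translate(virtual_addr, tlb, page_table)


if __name__ == "__main__":
    tlb, page_table = {}, {1: 7}
    print(access(0x1004, tlb, page_table, lambda vpn: 9))
    print(access(0x1004, tlb, page_table, lambda vpn: 9))
```

A real TLB has a fixed number of entries and its own replacement policy; the dict here grows without bound purely to keep the sketch short.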
Performance considerations
• A key design objective of a computer system is to achieve the best
possible performance at the lowest possible cost.
- Price/performance ratio is a common measure of success.
• Performance of a processor depends on:
- How fast machine instructions can be brought into the processor
for execution.
- How fast the instructions can be executed.
Interleaving
• Divides the memory system into a number of memory modules.
Each module has its own address buffer register (ABR) and data
buffer register (DBR).
• Arranges addressing so that successive words in the address space
are placed in different modules.
• When requests for memory access involve consecutive addresses,
the access will be to different modules.
• Since parallel access to these modules is possible, the average
rate of fetching words from the main memory can be increased.
Methods of address layouts
(a) Consecutive words in a module
MM address: | Module (k bits) | Address in module (m bits) |
• Consecutive words are placed in a module.
• High-order k bits of a memory address determine the module.
• Low-order m bits of a memory address determine the word within a module.
• When a block of words is transferred from main memory to cache, only one
module is busy at a time.

(b) Consecutive words in consecutive modules
MM address: | Address in module (m bits) | Module (k bits) |
• Consecutive words (consecutive addresses) are located in consecutive
modules.
• While transferring a block of data, several memory modules can be kept
busy at the same time.

(Figure: in both layouts each module has its own ABR and DBR; the modules
are numbered 0 ... i ... n-1 in (a) and 0 ... i ... 2^k - 1 in (b).)
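The two address layouts can be compared with a short sketch. The function names are hypothetical, and k and m are the module-field and address-in-module widths from the figure.

```python
def high_order_layout(addr, k, m):
    """Layout (a): high-order k bits pick the module."""
    if not 0 <= addr < 1 << (k + m):
        raise ValueError("address out of range")
    return addr >> m, addr & ((1 << m) - 1)      # (module, address in module)


def low_order_layout(addr, k, m):
    """Layout (b): low-order k bits pick the module."""
    if not 0 <= addr < 1 << (k + m):
        raise ValueError("address out of range")
    return addr & ((1 << k) - 1), addr >> k      # (module, address in module)


if __name__ == "__main__":
    k, m = 2, 4   # assumed example: 4 modules of 16 words each
    print([high_order_layout(a, k, m)[0] for a in range(4)])
    print([low_order_layout(a, k, m)[0] for a in range(4)])
```

With layout (b) four consecutive addresses fall in four different modules, so a block transfer can keep all of them busy at once.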
Hit Rate and Miss Penalty
• The success rate in accessing information at various levels of the
memory hierarchy is expressed as the hit rate (or miss rate).
• A miss causes extra time needed to bring the desired information into
the cache.
• Hit rate can be improved by increasing block size, while considering
the cache size.
• t_ave = hC + (1-h)M
- t_ave: average access time experienced by the processor
- h: hit rate
- M: miss penalty, the time to access information in the main memory
- C: the time to access information in the cache
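As a worked instance of the average access time formula, the sketch below plugs in assumed values (a 95% hit rate, a 1-cycle cache access and a 17-cycle miss penalty); these numbers are illustrative only and do not come from the slides.

```python
def average_access_time(h, C, M):
    """Average access time: h*C + (1 - h)*M."""
    return h * C + (1 - h) * M


if __name__ == "__main__":
    # Assumed example values, in clock cycles.
    print(average_access_time(h=0.95, C=1, M=17))
```

With these numbers the result is about 1.8 cycles, so even a 5% miss rate nearly doubles the average access time relative to a pure cache hit.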
Caches on the Processor Chip
• On chip vs. off chip
• Two separate caches for instructions and data, or a single cache for
both
• The advantage of separate caches: parallelism and better
performance
• Level 1 and Level 2 caches are used in high-performance
processors
• L1 cache: faster and smaller. It can access more than one word
simultaneously and let the processor use them one at a time. It is on
the processor chip.
• L2 cache: slower and larger. May be implemented externally using
SRAM chips.
• Average access time: t_ave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M
where h1 and h2 are the hit rates in the L1 and L2 caches, C1 and C2 are
the times to access information in the L1 and L2 caches, and
M is the time to access information in main memory.
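The two-level formula can be sketched the same way. The parameter names follow the formula, with h1, C1 referring to the L1 cache and h2, C2 to the L2 cache; the example values are assumptions.

```python
def two_level_average(h1, h2, C1, C2, M):
    """t_ave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M


if __name__ == "__main__":
    # Assumed example values, in clock cycles.
    print(two_level_average(h1=0.95, h2=0.9, C1=1, C2=10, M=100))
```

The third term shows why L2 helps: main memory is reached only by the fraction of accesses that miss in both caches, (1-h1)(1-h2).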
Other Performance Enhancements
Write buffer
Write-through:
• Each write operation involves writing to the main memory.
• If the processor has to wait for the write operation to be complete, it
slows down the processor.
• The processor does not depend on the results of the write operation.
• A write buffer can be included for temporary storage of write
requests.
• The processor places each write request into the buffer and continues
execution.
Write-back:
• A block is written back to the main memory when it is replaced.
Prefetching

• New data are brought into the processor when they are first
needed.
• Processor has to wait before the data transfer is complete.
• Prefetch the data into the cache before they are actually needed,
or before a Read miss occurs.
• Prefetching can be accomplished through software by including a
special instruction in the machine language of the processor.
- Inclusion of prefetch instructions increases the length of the
programs.
• Prefetching can also be accomplished using hardware:
- Circuitry that attempts to discover patterns in memory
references and then prefetches according to this pattern.
Lockup-Free Cache
• Prefetching scheme does not work if it stops other accesses to the
cache until the prefetch is completed.
• A cache of this type is said to be “locked” while it services a
miss.
• Cache structure which supports multiple outstanding misses is
called a lockup free cache.
• Since only one miss can be serviced at a time, a lockup free
cache must include circuits that keep track of all the outstanding
misses.
• Special registers may hold the necessary information about these
misses.
Secondary Storage
Magnetic Hard Disks
Organization of Data on a Disk

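Figure 5.30. Organization of one surface of a disk.
(The surface is divided into concentric tracks, numbered from track 0 to
track n, and each track is divided into sectors, e.g. sector 0 of track 0,
sector 0 of track 1, sector 3 of track n.)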


Access Data on a Disk
• Sector header
• Following the data, there is an error-correction code (ECC).
• Formatting process
• Difference between inner tracks and outer tracks
• Access time = seek time + rotational delay (latency time)
• Data buffer/cache
Disk Controller

Figure 5.31. Disks connected to the system bus.
(The processor, the main memory and the disk controller are attached to the
system bus; the disk controller in turn connects to one or more disk
drives.)


Disk Controller
• Seek
• Read
• Write
• Error checking

RAID Disk Arrays


• Redundant Array of Inexpensive Disks
• Using multiple inexpensive disks makes very large storage cheaper and
can also improve the reliability of the overall system.
• RAID0 – data striping
• RAID1 – identical copies of data on two disks
• RAID2, 3, 4 – increased reliability
• RAID5 – parity-based error-recovery
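Parity-based recovery rests on the XOR operation: the parity block is the bytewise XOR of the data blocks, so any single lost block equals the XOR of the surviving blocks and the parity. The sketch below illustrates only this principle; the function names are hypothetical, blocks are assumed to be of equal length, and the way RAID5 distributes parity across the disks is not modelled.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = parity_block(data)
    # Pretend the disk holding data[1] failed.
    print(recover_block([data[0], data[2]], parity) == data[1])
```

Only one failed disk can be tolerated this way, since a second missing block leaves the XOR equation with two unknowns.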
Optical Disks
• CD-ROM
• CD-Recordable (CD-R)
• CD-ReWritable (CD-RW)
• DVD
• DVD-RAM
Figure 5.32. Optical disk.
((a) Cross-section: label, acrylic, aluminum and polycarbonate plastic
layers, with pits and lands. (b) Transition from pit to land: the detector
sees a reflection when the beam from the source falls entirely on a pit or
on a land, and no reflection at a transition between them. (c) Stored binary
pattern: 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0.)


Magnetic Tape Systems

Figure 5.33. Organization of data on magnetic tape.
(Data are recorded 7 or 9 bits wide across the tape. A file starts with a
file mark and consists of records separated by record gaps; files are
separated by file gaps.)