
SCS1003 – Computer Systems

Lecture 7 – Memory Components and


Organization

Lakshman Jayaratne
Learning Objectives

§ Types of primary memory


§ The memory hierarchy
§ Cache memory and mapping schemes
§ Virtual memory
Ø paging and segmentation
Ø address translation
Ø memory fragmentation

§ Measuring memory performance


Primary and Secondary Memory

§ Primary/Main memory
Ø Highest speed
Ø Most expensive, therefore smallest
Ø Typically solid state technology (transistors)

§ Secondary memory
Ø Lower speed
Ø Less expensive, therefore can be larger
Ø Typically magnetic media and electro-mechanical drive mechanisms
Types of Main Memory
§ ROM – Read Only Memory
Ø Non-volatile, read-only
Ø Variants of ROM
o PROM (Programmable Read Only Memory)
o EEPROM (Electrically Erasable PROM)

§ RAM – Random Access Memory


Ø Volatile, read-write
Ø Two types of RAM
o dynamic RAM (DRAM) – made of capacitors
o static RAM (SRAM) – made of flip flops, used in
caches
Memory Hierarchy

§ Faster memory is normally more expensive than slower memory
§ Memory hierarchy classifies memory
according to distance from the processor
Ø Small, fast storage elements are kept in the CPU
Ø Larger, slower main memory is accessed through the
data bus
Ø Larger, permanent storage (disk and tape drives) are
still further from the CPU
Memory Hierarchy

[Figure: memory hierarchy diagram – SRAM at the top: fast, 3-30 ns access time]
Accessing Data

§ CPU first sends a request to its nearest memory, usually cache
§ If the data is not in cache, then main memory is
queried
§ If the data is not in main memory, then the request
goes to disk
§ Once the data is located, then the data, and nearby
data are fetched into cache memory
Ø It is hoped that nearby data contains data or instructions
that will be referenced in the short term
Performance - Terminology

§ Hit – when data is found at a given memory level
§ Miss – when it is not found
§ Hit ratio – the percentage of time data is
found at a given memory level
§ Miss ratio – the percentage of time it is not
Ø MissRatio = 1 – HitRatio
Spatial and Temporal Locality

§ When designing memory, we exploit two statistical properties of executable code and data
Ø Spatial Locality – If we access a particular area in memory
(or disk), subsequent memory accesses are likely to be at
nearby memory locations
Ø Temporal Locality – If we access a particular area in
memory (or disk), we will most likely use it again a short
time later
§ Example: Executing a loop which writes to an array
Ø The CPU will need to fetch every instruction in the loop,
once per loop, and write into the array once per loop
⇒ The loop has temporal and spatial locality in its access behaviour
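
As a minimal Python sketch of that loop (array size illustrative):

data = [0] * 1024              # hypothetical array in consecutive memory
for i in range(len(data)):     # same loop instructions every iteration (temporal)
    data[i] = i * 2            # writes at adjacent locations (spatial)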
Typical Memory Access

§ Most frequently, to fetch instructions


§ Less frequently to read operands (data)
§ Least frequently to write operands (data)
Caches and Mapping Schemes

§ Caches are small memories which are situated in the CPU, and run at speeds similar to CPU registers
§ Instructions and data which are accessed
are stored in the cache
Ø subsequent accesses can find these instructions or
data in the cache
Cache Hierarchy

§ Because SRAMs used in caches increase in cost with increasing speed, large caches can become very expensive
§ A hierarchy of caches gives a good hit ratio at a reasonable cost
§ 2-Level cache hierarchy (most common)
Ø very small and very fast "primary" or "level 1 (L1)" cache
o 64 KB, 128 KB
Ø larger and slower "secondary" or "level 2 (L2)" cache
o 256 KB, 512 KB, 1 MB, 2 MB
Issues in Cache Design

§ How do we find the instruction or the data in the cache?
§ How do we make the cache “invisible” to the
programmer, or even the operating system?
§ How do we maximize performance?
§ How do we minimize the cost, and maximize
the speed of the cache?
§ How do we ensure consistency between the
cache and memory?
Cache Performance

§ Measured by Effective Access Time (EAT) and Hit Ratio (HR)
§ Hit Ratio: HR = Hits / (Hits + Misses)
§ Effective Access Time: EAT = HR × Access_C + (1 − HR) × Access_M
§ Example:
Ø Main memory read time 200ns
Ø Cache read time 10ns, HR = 99%
Ø EAT = 0.99 x 10ns + 0.01 x 200ns = 11.9 ns
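
A quick sketch of this computation (Python; the numbers are the example's):

def cache_eat(hr, access_cache_ns, access_mem_ns):
    # EAT = HR x Access_C + (1 - HR) x Access_M
    return hr * access_cache_ns + (1 - hr) * access_mem_ns

print(cache_eat(0.99, 10, 200))   # -> 11.9 (ns)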
Design Choices for Caches

§ Size
§ Write Policy (write through, write back)
§ Type (directly mapped, set associative)
Ø line replacement policy

§ Architecture (split, combined)


Cache Management Policy –
Consistency
§ How do we ensure the cache and memory
are consistent ?
Ø Employ a cache management policy, designing the
hardware and the operating system around this policy
Ø Two policies:
o Write through
o Write back
Write Through Policy

§ Whenever the CPU writes data, it is saved in the cache and also written through to memory
Ø Every write incurs a write to cache and memory
§ Advantage
Ø Simplicity

§ Disadvantage
Ø Inefficient in bus traffic
Write Back Policy

§ When the CPU writes to the cache, no update of memory is performed
§ Before replacing a cache line, it must be
"flushed" to memory
§ Advantage
Ø Efficient in bus traffic

§ Disadvantage
Ø Memory doesn’t always agree with the value in the
cache
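
A toy sketch of the bus-traffic difference between the two policies (Python; a hypothetical one-line cache, not a real design):

class OneLineCache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.line = None             # (address, value, dirty)
        self.mem_writes = 0          # writes that reach main memory

    def write(self, addr, value):
        if self.write_back:
            if self.line and self.line[0] != addr and self.line[2]:
                self.mem_writes += 1            # flush dirty line on replacement
            self.line = (addr, value, True)     # update the cache only
        else:
            self.line = (addr, value, False)
            self.mem_writes += 1                # write through to memory

wt, wb = OneLineCache(False), OneLineCache(True)
for _ in range(100):                 # 100 writes to the same location
    wt.write(0x40, 1)
    wb.write(0x40, 1)
print(wt.mem_writes, wb.mem_writes)  # -> 100 0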
Types of Caches

§ Directly mapped
§ Set associative
§ (Fully associative)

Directly Mapped Cache

§ Cached contents of memory locations are held in the Cache Data Memory
§ Each Line or Entry contains the content of one
memory location
§ Each Entry has a Valid Bit which is set to 1
when the Entry is filled
§ A Tag is associated with each Entry
Ø Tags are held in the Cache Tag Memory
§ A very fast comparator is used to compare the
Tag values with address bits
Directly Mapped Cache

[Figure: directly mapped cache – 2^C Cache Memory Lines (Entries); the Low Order Address Bits A0..A(C-1) of the Memory Address index the Cache Tag Memory and the Cache Data Memory; each entry holds a Valid Bit, a Tag (N-C bits) and a data word (e.g. 32 bits); a Comparator tests the High Order Address Bits AC..A(N-1) against the stored Tag to signal Hit or Miss]
Directly Mapped Cache Access
§ Use the Low Order Address Bits (LOAB) of the
memory address to index into the Cache Tag
Memory
§ Use the comparator to test if
[High Order Address Bits (HOAB) of the
memory address] ==
[value held in the Cache Tag Memory location]
Ø If [Valid Bit == 1 AND comparison is successful]
Then it is a Cache Hit
o retrieve the value in the Cache Data Memory

Ø Else it is a Cache Miss
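
A minimal Python sketch of this procedure (C and the memory arrays are illustrative):

C = 8                                   # 2^C cache lines
tag_mem  = [None]  * (2 ** C)           # Cache Tag Memory
data_mem = [None]  * (2 ** C)           # Cache Data Memory
valid    = [False] * (2 ** C)           # Valid Bits

def lookup(addr):
    index = addr & ((1 << C) - 1)       # LOAB A0..A(C-1) index the cache
    tag   = addr >> C                   # HOAB AC..A(N-1) form the tag
    if valid[index] and tag_mem[index] == tag:
        return ("HIT", data_mem[index])
    return ("MISS", None)

def fill(addr, value):                  # on a miss, cache whatever was fetched
    index = addr & ((1 << C) - 1)
    tag_mem[index], data_mem[index], valid[index] = addr >> C, value, True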


Directly Mapped Cache Access – Example

[Figure: lookup of memory address 0x10020004 (High Bits 0x10020, Low Bits 0x004) – the low bits index entry 0x004, whose Tag 0x10020 matches and whose Valid Bit is 1, so the Comparator signals a Hit and data 327 is returned; entry 0x000 holds (Tag 0x10020, Valid 1, data 25), and entry 0x008 holds (Tag 0x10020, Valid 0, data 360), which would be a Miss]
Directly Mapped Cache – Note (I)

§ Why is this cache model used?


Ø Spatial locality – if we use a particular location,
we will probably also use nearby locations
o These locations share common High Order
Address Bits (HOAB), but differ in their Low Order
Address Bits (LOAB)

⇒ At any given time, a directly mapped cache can hold the content of only one memory location with a particular set of Low Order Address Bits
Directly Mapped Cache – Note (II)

§ How does the cache become filled ?


Ø Every time there is a cache miss, whatever is fetched from
memory is also saved in the cache
§ How many consecutive instructions can we fit in
the cache, assuming one instruction per line?
Ø Equal to the number of cache lines
§ A cache line in data memory usually contains
several words (4 to 64 consecutive bytes)
Ø A word number is appended to the tag to find the correct
word in a line
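
A sketch of the resulting address split (Python; field widths are illustrative, with the word-select bits taken from the lowest address bits, as is usual):

WORD_BITS = 2                  # 4 words per line (illustrative)
LINE_BITS = 8                  # 256 cache lines (illustrative)

def split_address(addr):
    word = addr & ((1 << WORD_BITS) - 1)                # word within the line
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1) # cache line index
    tag  = addr >> (WORD_BITS + LINE_BITS)              # compared on lookup
    return tag, line, word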
Directly Mapped Cache – Pros and
Cons
§ Advantage
Ø Very simple design

§ Disadvantage
Ø If a program frequently accesses two locations
which have the same LOABs and different
HOABs, then for every access the cache will
overwrite the previous entry
Set Associative Cache

§ A cache design where two or more cache lines can have the same LOABs, but different HOABs
Ø Cache is divided into 2 or 4 or 8 smaller caches

§ Advantage
Ø Achieve good hit ratios despite being smaller in
size
2-Way Set Associative Cache

§ The simplest set associative cache


§ The total number of cache entries is split into
two halves
Ø Each half has its own comparator for testing HOABs
Ø Common choice – Split architecture (half cache for
instructions and half cache for data)
§ The size of the tag field must be increased by 1
bit
§ Achieves a higher hit ratio than a directly
mapped cache of the same size
2-Way Set Associative Cache

[Figure: 2-way set associative cache – the cache is split into two halves of 2^(C-1) Cache Memory Lines, each half with its own Cache Tag Memory (tags N-C+1 bits wide), Cache Data Memory and Comparator; the Low Order Address Bits A0..A(C-2) index both halves in parallel, and the High Order Address Bits A(C-1)..A(N-1) are compared against both tags to produce Hit or Miss]


N-Way Set Associative Cache

§ Set associativity can be increased to higher orders, but the hardware is more expensive
§ Two popular alternatives:
Ø 4-way and 8-way set associative caches
§ Best improvements in cache hit ratios are
obtained for set associativity 2 and 4
Replacement Policies
§ In set associative caches, we must decide
which of the N cache lines should be replaced
with a new entry
§ The replacement policy chosen depends upon
the locality that we are trying to optimize
Ø Usually we are interested in temporal locality
§ Common replacement policies
Ø Least Recently Used (LRU)
Ø First In First Out (FIFO)
Ø Random – Randomly choose 1 of N
Ø Least Frequently Used
Replacement Policies
§ Least Recently Used (LRU)
Ø Replaces the line that has been unused for the
longest time
Ø Disadvantage: Complexity – must maintain an
access history for each line (slows down the cache)
§ First In First Out (FIFO)
Ø Replaces the line that has been in the cache the
longest, regardless of when it was last used
§ Random
Ø Replaces a line at random
Ø Disadvantage – can replace a line that will be
needed often or soon
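
A compact LRU sketch for one set of an N-way cache (Python, using OrderedDict; names are illustrative):

from collections import OrderedDict

class LRUSet:
    def __init__(self, n_ways):
        self.n_ways = n_ways
        self.lines = OrderedDict()         # tag -> data, oldest first

    def access(self, tag, fetch):          # fetch: callable for the miss path
        if tag in self.lines:
            self.lines.move_to_end(tag)    # HIT: mark most recently used
            return self.lines[tag]
        if len(self.lines) == self.n_ways:
            self.lines.popitem(last=False) # MISS: evict least recently used
        self.lines[tag] = fetch()          # bring line in from the next level
        return self.lines[tag]

s = LRUSet(4)
s.access(0x10020, lambda: "line data")     # MISS the first time, then cached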
Design Choices for Caches: Summary
§ Size
Ø Bigger is usually better
§ Write Policy – Write Through vs Write Back
§ Type – 1 vs N-Way Set Associative
Ø N=2 and N=4 are usually better
Ø Replacement policy for set associative
§ Architecture – Split (instructions and data) vs
Combined
Ø Advantage of Split –
o can get instructions and data at once
o can have different types of caches
Ø Disadvantage of Split –
o size of data or instructions in the cache is application
dependent
Virtual Memory Vs Physical Memory
§ Physical Memory (PM) – the memory
hardware in the machine
Ø Physical Address (PA) – the binary address used to directly access Main Memory
§ Virtual Memory (VM) – the memory model
seen by the programmer, and the hardware
that implements it
Ø Virtual Address (VA) – the address seen by the CPU (and the programmer)
Virtual Memory Vs Cache

§ Cache memory enhances performance by providing faster memory access speed
§ Virtual memory enhances performance by providing greater memory capacity, without adding main memory
Cache, Memory and Virtual Memory

[Figure: CPU → Memory Management Unit (MMU) → memory path. If the virtual address is in the MMU (HIT), it is translated to a physical address and served by the cache / main memory (fast → slow, megabytes to gigabytes). If not (MISS), software fetches the data from the disk subsystem – I/O bus, I/O controllers, mass storage bus, peripheral disk addressed by a Disk Block Address (extremely slow, many gigabytes) – into memory and updates the MMU entries]
Virtual Memory Functions

§ Provide a large address space for the programmer
Ø hide the limitations of the machine's main memory
§ Provide a “caching mechanism”
Ø conceal the complexity and performance
limitations of the disk hardware
Virtual Memory Implementation

§ We need a scheme that allows us to translate virtual addresses into physical addresses
§ The device used to perform these translations is called the Memory Management Unit (MMU)
Ø The MMU is usually a part of the CPU
§ When the CPU attempts to access a location in memory,
the MMU
Ø Translates the VA into a PA, and checks whether the
location is already in main memory
Ø If the location is not in main memory, then it is
brought into memory
Design Choices for Virtual Memory

Design principle: Memory is divided into blocks


§ Type of block
Ø Fixed size – Paging
o What size?

Ø Variable size – Segmentation


§ Virtual to physical address mapping
Ø Calculating an address in physical memory
§ Page replacement policy
Ø What do we do if memory is full and we need to
bring a referenced location into memory?
Paging
§ Memory is divided into pages
Ø enables us to store shorter addresses
§ Some VM pages are in page frames in main memory
§ Pages are brought into main memory as needed
(paging)

[Figure: virtual pages mapped to physical page frames – Page 0 → Frame 2, Page 3 → Frame 0, Page 4 → Frame 1, Page 7 → Frame 3; Pages 1, 2, 5 and 6 are not in main memory]
Terminology

§ Page frames – the equal-size chunks / blocks into which main memory is divided
§ Pages – blocks into which virtual memory is divided, each equal in size to a page frame
§ Paging – the process of copying a virtual page from
disk to a page frame in main memory
§ Page fault – an event where a referenced address is
not in main memory, and paging must be performed
Ø by the "page fault handler" in the operating system
Virtual to Physical Address Mapping

Two main steps:


§ Find out whether a referenced address is in
main memory
§ Calculate its actual location in main memory

In practice, the execution of these two steps is interleaved
Finding out where a Referenced Address is

§ Through a Page Table, which stores the physical location of each Virtual Page (VP)

[Figure: a Page Table whose entries (here 2, 3, –, 0, 2) give the page frame holding each virtual page; a numeric entry marks a page in main memory, '–' marks a page not in main memory]
Calculating an Address in Physical
Memory
§ Dividing memory into blocks enables an address to
be divided into two parts
Ø High-order bits – contain the address of a page (or page frame)
Ø Low-order bits – contain the address of a word inside the page

[ High-order bits | Low-order bits ]

§ Every word in a page shares the same high-order address bits, and therefore the same VA → PA mapping
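
Concretely, with 4096-byte pages the 12 low-order bits select the byte and the remaining high-order bits select the page (Python sketch):

PAGE_BITS = 12                            # page size 2^12 = 4096 bytes

def split(va):
    page   = va >> PAGE_BITS              # high-order bits: virtual page number
    offset = va & ((1 << PAGE_BITS) - 1)  # low-order bits: byte within the page
    return page, offset

print(split(0x04ad396))                   # -> (1197, 918)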
Virtual to Physical Address Mapping

§ MMU is a device that maps virtual addresses (VAs) into physical addresses (PAs)
Ø using the Page Table
§ When the CPU attempts to access a location in memory,
the MMU
Ø Translates the VA into a PA, and checks whether the location
is already in main memory
Ø If the location is in main memory, we have a HIT
Ø Else, we have MISS, and a page fault occurs:
– the page containing this location is brought from
disk into memory, and the Page Table is updated
Virtual to Physical Address Mapping
[Figure: VA → PA translation – the CPU issues a Virtual Address; the HOAB (bits 12..31) form the Page Select field and the LOAB (bits 0..11) the Word Select field; the MMU, using its TLB and the Page Table, maps the HOAB to a page frame. On a HIT the resulting Physical Address is served by cache / main memory (fast / slow); on a MISS the page is fetched, via software, from the disk subsystem over the I/O bus and mass storage bus (extremely slow, many gigabytes)]
Virtual to Physical Address Mapping
[Figure: example translation – the MMU maps the virtual Page Select bits 10100000010010101101 (HOAB, bits 12..31) to the physical Page Select bits 00000100110010001010; the Word Select bits (LOAB, bits 0..11) pass through unchanged; main memory (megabytes - gigabytes) holds Page #0..#N, each 2^12 = 4096 bytes]
Virtual to Physical Address Mapping
[Figure: address fields – a virtual address has Q Page Select bits (Number of Pages = 2^Q) and P Word Select bits (Page Size = 2^P; here P = 12, so pages are 4096 bytes); the MMU maps the virtual Page Select field 10100000010010101101 to the physical Page Select field 00000100110010001010, leaving the Word Select field unchanged]
Translating a VA into a PA - Procedure

§ Calculate the Virtual Page Number from the HOAB
§ Look up the entry for this number in the Page Table → Page Frame
§ If we have a HIT, then calculate the physical
address of the Page Frame
§ Use the word address to get to the correct
location in the Page Frame
Translating a VA into a PA - Example
§ Suppose we have
• 8 MB of physical memory
• 256 MB of virtual memory
• page size is 4096 bytes

§ and we reference address 0x04ad396

HOAB = 0000 0100 1010 1101, LOAB = 0011 1001 0110
Translating a VA into a PA - Example

Ø HOAB = 0000 0100 1010 1101 is mapped to Virtual Page 1197
(2^10 + 2^7 + 2^5 + 2^3 + 2^2 + 2^0)
Ø Look up Page 1197 in the Page Table → Page Frame 2, Valid bit 1
Ø Page Frame 2 is mapped to the memory address 4096 x 2 = 8192
Ø LOAB = 0011 1001 0110 is used to get to byte 918 inside the Page Frame

Page Table (excerpt):
Page     Page Frame #   Valid bit
0        2004           1
1        1              0
2        1637           0
…        …              …
1197     2              1
…        …              …
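
The same procedure as a Python sketch, using the example's Page Table entries:

PAGE_SIZE = 4096
page_table = {0: (2004, 1), 1: (1, 0), 2: (1637, 0), 1197: (2, 1)}  # excerpt

def translate(va):
    vpn, offset = va >> 12, va & 0xFFF    # split HOAB / LOAB
    frame, valid_bit = page_table[vpn]
    if not valid_bit:
        raise RuntimeError("page fault: bring the page in from disk")
    return frame * PAGE_SIZE + offset

print(translate(0x04ad396))               # page 1197 -> frame 2: 8192 + 918 = 9110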
Page Replacement Policy

When all memory Page Frames are being used, we need to replace a Page Frame in memory
§ Choose which Page Frame to replace
Ø Least Recently Used (LRU)
Ø First-in-First-out (FIFO)
Ø Least Frequently Used
Ø Random

§ Write its contents to disk


§ Update the Page Table
Effective Access Time (I)
§ Effective access time takes all levels of memory
into consideration
Ø Virtual memory is a factor in the calculation
Ø We have to consider Page Table access time
§ Effective Access Time (EAT):
EAT = (1 − PFR) × Access_M + PFR × Access_HD
where PFR = Page Fault Rate
Effective Access Time – Example
§ Suppose a main memory access takes 200ns, and it takes 10ms to load a page from disk
Ø If the page fault rate is 1%
o EAT = 0.99(200ns + 200ns) + 0.01(10ms) = 100,396 ns
Ø If 100% of pages are found in main memory
o EAT = 1.00(200ns + 200ns) = 400ns

Accessing the Page Table costs an additional memory access, because the Page Table is in main memory
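
A sketch of the calculation, charging each resident access one extra memory reference for the Page Table (Python; numbers from the example):

def vm_eat(pfr, access_mem_ns, access_disk_ns):
    # Page Table access + data access on a hit; disk access on a page fault
    return (1 - pfr) * (access_mem_ns + access_mem_ns) + pfr * access_disk_ns

print(vm_eat(0.01, 200, 10_000_000))   # -> 100396.0 ns
print(vm_eat(0.00, 200, 10_000_000))   # -> 400.0 ns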
Effective Access Time (II)

§ Use a Translation Look-aside Buffer (TLB)
Ø Special Page Table in cache containing most
recent page lookup values
Ø Each entry contains a mapping from Virtual Page Number → Page Frame
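
A minimal sketch of a TLB consulted before the Page Table (Python; the dictionaries are illustrative):

page_table = {1197: 2}              # vpn -> page frame (illustrative)
tlb = {}                            # recently used translations

def translate_with_tlb(va):
    vpn, offset = va >> 12, va & 0xFFF
    if vpn not in tlb:              # TLB miss: do the slow Page Table lookup
        tlb[vpn] = page_table[vpn]  # and cache the translation
    return tlb[vpn] * 4096 + offset # a TLB hit avoids the extra memory access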
Advantages and Disadvantages
§ Advantages
Ø Can run programs whose virtual address space is
larger than physical memory
Ø Allows more programs to run at the same time
Ø Fixed frame and page sizes simplify allocation and placement by the operating system
§ Disadvantages
Ø Paging adds an extra memory reference when accessing data
o Partially alleviated via TLB
Ø Extra resource consumption: the memory overhead for storing Page Tables
Ø Special hardware and operating system support
Ø Internal Fragmentation
Internal Fragmentation
§ Wasted space when files do not end on a page
boundary
§ On average, half of the last page of each file is wasted in this way
§ Internal fragmentation reduces the efficiency of
memory usage on a VM system
§ Should we reduce the page size?
Ø NO – smaller pages increase the size of the Page
Table
o expensive technology
§ Should we share pages between programs ?
Segmentation and
Segmentation with Paging
Segmented Virtual Memory (I)

§ An alternative scheme to the paged VM system
§ Instead of using fixed size pages, the system uses variable size segments, each of which is identical in size to a program or data file
⇒ Memory is used in chunks of variable size, and Internal Fragmentation cannot arise
• Each segment consists of a linear sequence of addresses (from 0 to some maximum)
• A segmented address gives [ Segment # | byte # in segment ]
Segmented Virtual Memory (II)
[Figure: segmented VM system – a Segment Table with SIZE and ADDRESS columns maps variable-sized segments (e.g. Segment #1, #42, #7, #13) into main memory; segment size = a variable number of bytes]
Segment Table

§ Holds information about each segment


Ø memory location
Ø size
§ Memory addresses are calculated by
providing
Ø segment number
Ø offset within the segment
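
A sketch of segmented address translation with a size check (Python; table contents are illustrative):

segment_table = {7: (3000, 0x8000), 42: (1200, 0xA000)}  # seg# -> (SIZE, ADDRESS)

def seg_translate(segment, offset):
    size, base = segment_table[segment]
    if offset >= size:
        raise RuntimeError("offset outside segment")     # protection violation
    return base + offset

print(hex(seg_translate(7, 100)))                        # -> 0x8064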
Segment Fault

§ Instead of page faults, segmented systems have "segment faults" – when a required segment is not in main memory, and must be copied from disk into memory
Ø If the main memory is sparsely occupied, the
segment can be copied into unused memory
Ø If the main memory is mostly occupied, then
one or more segments must be copied back
to disk to make space for the new segment
External Fragmentation

§ When a segment fault arises, enough contiguous memory must be freed to fit the new segment in
Ø This requires more effort than a page fault
§ It is unlikely that the space freed by
removing one or more existing segments
will be exactly the right size to fit the new
segment
⇒ This usually results in External Fragmentation – gaps between segments
Paged versus Segmented VM

§ Presence in memory
Ø In a paged VM system, only portions of a
file need be in memory at any time, the rest
may be on disk
Ø In a segmented VM system, the whole file
must be in memory at any time
§ Each segment and page must always be
smaller than the physical memory
§ Segment Tables are usually smaller than
Page Tables
Hybrid VM Systems

§ Combine segmented and paged VM


§ Such hybrids use a segmentation
scheme, and each segment uses its own
page table
Ø A hybrid system has both internal and external fragmentation
Example VM Systems

§ Most modern machines use paged VM


§ Most machines designed to run Unix are
built with paged VM
Ø MIPS is paged
§ Burroughs B5000 has segmented
memory
Ø First commercial computer with virtual memory
§ The Intel i386/i486 and Pentium-I/II/III have segmentation and paging hardware
Design Choices for Virtual Memory –
Summary

1. Type of block
• Fixed size – Paging
o Large pages are better
o May cause Internal Fragmentation
• Variable size – Segmentation
o May cause External Fragmentation
• Hybrid
2. Virtual to physical address mapping
• Use Page/Segment Table
o Employ a TLB (Translation Look-aside Buffer)
Thank You

SCS1003 – Computer Systems
