Memory Management

Computer Architecture: Memory Management
Contents: Memory, Paging, Segmentation, Virtual Memory, Caches

Computer Architecture WS 06/07 Dr.-Ing. Stefan Freinatis
Memory
Core Memory
Period: 1950 ... 1975
Non-volatile
Matrix of magnetic cores
Storing a bit by changing the magnetic polarity of a core
Access time 3 µs ... 300 ns
Destructive read: after reading a core, its content is lost. A read cycle must be followed by a write cycle in order to restore the content.
Image source: http://www.psych.usyd.edu.au/pdp-11/core.html
Memory
Semiconductor Memory (1970 ...)
Dynamic memory (DRAM)
Storing a bit by charging a capacitor (sometimes just the self-capacitance of a transistor)
One transistor per bit
High density / capacity per area unit
Volatile
Destructive read
Self-discharging: periodic refresh needed
Image source: http://www.research.ibm.com/journal/rd/391/adler.html
Memory
Semiconductor Memory (1970 ...)
Static memory (SRAM)
Storing a bit in a flip-flop
Setting / resetting the flip-flop
6 transistors per bit
More chip area than with DRAM
Volatile
Non-destructive read
No self-discharge
Fast!
Image source: Wikipedia on SRAM (English)
Memory Hierarchy
Program(mer)s want unlimited amounts of fast memory. Economical solution: Memory hierarchy.
Memory hierarchy levels in typical desktop / server computers, figure from [HP06 p.288]
Main Memory
Central to the computer system
Large array of words / bytes
Holds many programs at a time, for multiprogramming / multitasking to be effective
Figure: Memory layout of a time-sharing system (working memory holding the operating system and programs 1, 2, ..., n)

Address Binding
Program = binary executable file; code and data are accessible via addresses.

Example source fragment: ... i = i + 1; check(i); ...
Addresses in the source code are symbolic, here: i (a variable) and check (a function). The compiler typically binds the symbolic addresses to relocatable addresses, such as "i is 14 bytes from the beginning of the module". The compiler may also be instructed to produce absolute addresses (non-relocatable code). The loader finally binds the relocatable addresses to absolute addresses, such as "i is at 74014" when loading the code into memory.
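A minimal sketch of the loader's binding step, assuming the compiler hands over a table of relocatable addresses (offsets from the module start). The table, the offset of check, and the load address 74000 are illustrative assumptions, chosen so that i lands at 74014 as in the text.

```python
# Hypothetical relocatable symbol table produced by the compiler:
# each symbol is given as an offset (in bytes) from the module start.
relocatable_symbols = {"i": 14, "check": 120}  # offset of check is made up


def bind_at_load_time(symbols, load_address):
    """Loader step: turn relocatable offsets into absolute addresses."""
    return {name: load_address + offset for name, offset in symbols.items()}


absolute = bind_at_load_time(relocatable_symbols, load_address=74000)
print(absolute["i"])  # 74014
```

The binding happens once, before execution starts; if the module had to move later, every absolute address would have to be recomputed.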
Address Binding Schemes

The binding of code and data to logical memory addresses can be done at three stages:
Compile time (Program creation)

The resulting code is absolute code. All addresses are absolute. The program must be loaded exactly to a particular logical address in memory.
Load time
The code must be relocatable, that is, all addresses are given as an offset from some starting address (relative addresses). The loader calculates and fills in the resulting absolute addresses at load time (before execution starts).
Execution time
The relocatable code is executed. Address translation from relative to absolute addresses takes place at execution time (for every single memory access). Special hardware needed (MMU).
Logical / Physical Addresses

Logical Address
The address generated by the CPU, also termed virtual address. All logical addresses form the logical (virtual) address space.
Physical Address
The address seen by the memory. All physical addresses form the physical address space. In compile-time and load-time address-binding schemes the logical and the physical addresses are the same. In execution-time address-binding the logical and physical addresses differ.
Memory Management Unit

Hardware device that maps logical addresses to physical addresses (MMU).
A program (a process) deals with logical addresses; it never sees the real physical addresses.
Figure from [Sil00 p.258]
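One of the simplest MMU designs uses a single relocation (base) register whose content is added to every logical address at execution time. The sketch below assumes this scheme; the class name and the base value 14000 are illustrative and not taken from the slides.

```python
class RelocationMMU:
    """Toy MMU: physical address = relocation register + logical address."""

    def __init__(self, relocation_register):
        self.relocation_register = relocation_register

    def translate(self, logical_address):
        # Done by hardware for every single memory access.
        return self.relocation_register + logical_address


mmu = RelocationMMU(relocation_register=14000)
print(mmu.translate(346))  # 14346
```

Moving the process to another place in physical memory only requires reloading the register; the program keeps using the same logical addresses.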
Protection
Protecting the kernel against user processes

No user process may read, modify or even destroy kernel data (or kernel code). Access to kernel data (system tables) only through system calls.
Protecting user processes from one another

No user process may read or modify other processes' data or code. Any data exchange between processes only via IPC.
MMU equipped with a limit register
The limit register is loaded with the highest allowed logical address. This is done by the dispatcher as part of the context switch.
Any address beyond the limit causes an error.
Assumption: contiguous physical memory per process.
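A sketch of the limit check, assuming (as stated above) that the limit register holds the highest allowed logical address, and modelling the hardware error as a Python exception. MemoryProtectionError and check_access are hypothetical names.

```python
class MemoryProtectionError(Exception):
    """Stands in for the hardware trap raised on a limit violation."""


def check_access(logical_address, limit_register):
    """Return the address if it is allowed; trap if it exceeds the limit."""
    if logical_address < 0 or logical_address > limit_register:
        raise MemoryProtectionError(
            f"address {logical_address} beyond limit {limit_register}"
        )
    return logical_address


# The dispatcher would reload limit_register on every context switch.
check_access(1023, limit_register=1023)  # allowed: equals the highest address
```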
Protection
Limit register for protecting process spaces against each other

Figure from [Sil00 p.266]
Memory Occupation
Obtaining better memory-space utilization
Initially, the entire program plus its data (variables) had to be in memory.
Dynamic Loading
Load what is needed when it is needed.
Overlays
Replace code by other code.
Dynamic Linking (Shared Libraries)

Use shared code rather than packing a private copy of everything into each program.
Swapping
Temporarily kick out a process from memory.
Dynamic Loading
Routines are kept on disk

Main program is loaded into memory.
Routine loaded when needed

Upon each call it is checked whether the routine is in memory. If not, the routine is loaded into memory.
Unused routines are never loaded

Although the total program size may be large, the portion that is actually executed can be much smaller.
No special OS support required

Dynamic loading is implemented by the user. System libraries (and corresponding system calls) may help the programmer.
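A sketch of the check-on-every-call idea, with the disk simulated by a dictionary of loader functions. DynamicLoader and the routine names are hypothetical; a real implementation would read the routine's code from a file instead.

```python
class DynamicLoader:
    """Loads routines on their first call; unused routines are never loaded."""

    def __init__(self, disk):
        self.disk = disk        # name -> function that "loads" the routine
        self.in_memory = {}     # routines currently loaded
        self.load_count = 0

    def call(self, name, *args):
        # Upon each call, check whether the routine is already in memory.
        if name not in self.in_memory:
            self.in_memory[name] = self.disk[name]()  # load it from "disk"
            self.load_count += 1
        return self.in_memory[name](*args)


disk = {
    "square": lambda: (lambda x: x * x),
    "error_report": lambda: (lambda msg: f"error: {msg}"),  # rarely used
}
loader = DynamicLoader(disk)
print(loader.call("square", 7))  # 49
print(loader.call("square", 8))  # 64, no second load
print(sorted(loader.in_memory))  # ['square']
```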
Overlays
Existing code is replaced by new code

Similar to dynamic loading, but instead of adding new routines to the memory, existing code is replaced by the loaded code.
No special OS support required

Overlay technique implemented by the user.
Example: consider a two-pass assembler with the following components:
  Pass 1            70 kB
  Pass 2            80 kB
  Symbol table      20 kB
  Common routines   30 kB
Loading everything at once would require 200 kB.
Pass 1 and pass 2 do not need to be in memory at the same time, so they can be organized as overlays.
Overlays
Pass 1, when finished, is overlaid by pass 2. An additional overlay driver is needed (10 kB), but the total memory requirement is now 140 kB (20 + 30 + 10 + 80 kB) instead of 200 kB.
Figure from [Sil00 p.262]
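The overlay arithmetic of the two-pass assembler example can be checked in a few lines; the only modelling assumption is that exactly one of the two passes is resident at any time.

```python
# Component sizes of the two-pass assembler example, in kB.
pass1, pass2, symbol_table, common_routines = 70, 80, 20, 30
overlay_driver = 10

# Without overlays, everything is resident at once.
static_total = pass1 + pass2 + symbol_table + common_routines

# With overlays, only the larger of the two passes is resident at any time.
overlay_total = symbol_table + common_routines + overlay_driver + max(pass1, pass2)

print(static_total, overlay_total)  # 200 140
```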
Dynamic Linking
Different processes use the same code
This is especially true for shared system libraries (e.g. reading from the keyboard, graphical output on screen, networking, printing, disk access).
Single copy of shared code in memory

Rather than linking the libraries statically to each program (which increases the size of each binary executable), the libraries (or individual routines) are linked dynamically during execution time. Each library only resides once in physical memory.
Stub
A piece of program code initially located at each library reference in the program. When first called, it loads the library (if not yet loaded) and replaces itself with the address of the library routine.
OS support required
since a user process cannot look beyond its own address space to find out whether (and where) the library code is located in physical memory (protection!).
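A sketch of the stub mechanism, modelling the program's library references as a table of callables. Each entry initially holds a stub; on its first call the stub "loads" the library (simulated by a dictionary that stands in for the OS-managed shared copy) and overwrites its own table entry with the real routine. All names are hypothetical; this illustrates the idea, not how a particular OS or dynamic linker implements it.

```python
loaded_libraries = {}  # stands in for the single shared copy in memory
LIBRARY_ON_DISK = {"libmath": {"double": lambda x: 2 * x}}  # hypothetical


def make_stub(table, name, library):
    def stub(*args):
        # Load the library only if it is not resident yet.
        if library not in loaded_libraries:
            loaded_libraries[library] = LIBRARY_ON_DISK[library]
        routine = loaded_libraries[library][name]
        table[name] = routine  # replace the stub with the real routine
        return routine(*args)
    return stub


references = {}
references["double"] = make_stub(references, "double", "libmath")
print(references["double"](21))  # 42, first call goes through the stub
print(references["double"](5))   # 10, now calls the routine directly
```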
Swapping
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
Backing store: a fast disk large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.
Roll out, roll in: swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so that a higher-priority process can be loaded and executed.
The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
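A rough model of swap time, assuming a fixed disk latency plus a transfer at a constant rate, so that the transfer part grows linearly with the swapped size. The 1 MB image, 5 MB/s transfer rate and 8 ms latency are assumed values for illustration only.

```python
def swap_time_ms(image_mb, transfer_rate_mb_per_s, latency_ms):
    """One-way swap time: latency plus transfer time (linear in size)."""
    transfer_ms = image_mb * 1000 / transfer_rate_mb_per_s
    return latency_ms + transfer_ms


one_way = swap_time_ms(1, 5, 8)  # 8 ms latency + 200 ms transfer
print(one_way)                   # 208.0
print(2 * one_way)               # 416.0, swap out plus swap in
```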

Swapping
Figure from [Sil00 p.263]
Figure: Process P1 is swapped out, and process P2 is swapped in.

Memory Allocation
Allocation of physical memory to a process
Contiguous
The physical memory space is contiguous (linear) for each process.
Fixed-sized partitions
Variable-sized partitions

Placement schemes: first fit, best fit, worst fit
Non-Contiguous
The physical memory space per process is fragmented (has holes).
Paging
Segmentation
Combination of paging and segmentation

Contiguous Memory Allocation

The physical memory allocated to a process is contiguous (no holes).
Fixed-sized partitions
Memory is divided into fixed-sized partitions. Originally used by IBM OS/360, no longer in use today.
Simple to implement
Degree of multiprogramming is bounded by the number of partitions
Internal fragmentation
Figure: Operating system plus fixed-sized partitions holding processes 1 ... 4 and one free partition
Contiguous Memory Allocation

The physical memory allocated to a process is contiguous (no holes).
Variable-sized partitions
Partitions are of variable size.
OS must keep a free list listing free memory (holes)
OS must provide a placement scheme
Degree of multiprogramming only limited by available memory
No (or very little) internal fragmentation
External fragmentation: the holes may be too small for a new process
Figure: Operating system plus processes 1 ... 4 in variable-sized partitions, with holes in between
Compaction
Reducing external fragmentation (for variable-sized partitions)
Copy operation is expensive
Figure: Memory before and after compaction. Afterwards, processes 1 ... 4 lie next to each other and the free memory forms one contiguous block.
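A minimal compaction sketch, assuming each allocation is described by a (name, start, size) tuple and that processes are moved toward low memory in address order, directly after the operating system. The addresses and sizes are made up.

```python
def compact(allocations, first_free_address):
    """Move all processes together; return the new layout and the hole start.

    allocations: list of (name, start, size) tuples.
    first_free_address: first address above the operating system.
    """
    new_layout = []
    next_address = first_free_address
    for name, _start, size in sorted(allocations, key=lambda a: a[1]):
        # In a real system this step copies `size` bytes: expensive!
        new_layout.append((name, next_address, size))
        next_address += size
    return new_layout, next_address


layout = [("process 1", 100, 50), ("process 2", 180, 40),
          ("process 3", 260, 30), ("process 4", 330, 60)]
compacted, hole_start = compact(layout, first_free_address=100)
print(compacted)
print(hole_start)  # 280: everything above is one free block
```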
Placement Schemes
Satisfying a request of size n from a list of free holes.
Common to all of the following schemes: find a large enough hole, allocate the portion needed, and return the remainder (leftover hole) to the free list.
First fit
Find the first hole that is large enough. Fastest method.
Best fit
Find the smallest hole that is large enough. The entire list must be searched (unless it is sorted by hole size). This strategy produces the smallest leftover hole.
Worst fit
Find the largest hole. The entire list must be searched (unless it is sorted by hole size). This strategy produces the largest leftover hole, which may be more useful than the smallest leftover hole from the best-fit approach.
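The three placement schemes can be sketched over a free list of (start, size) holes. This is an illustrative sketch, not an actual OS allocator: the free list is an unsorted Python list, and the allocated portion is assumed to be taken from the start of the chosen hole.

```python
def choose_hole(holes, n, scheme):
    """Return the index of the hole to use, or None if no hole fits."""
    candidates = [i for i, (_, size) in enumerate(holes) if size >= n]
    if not candidates:
        return None
    if scheme == "first":
        return candidates[0]                               # first large enough
    if scheme == "best":
        return min(candidates, key=lambda i: holes[i][1])  # smallest fitting
    if scheme == "worst":
        return max(candidates, key=lambda i: holes[i][1])  # largest hole
    raise ValueError(f"unknown scheme: {scheme}")


def allocate(holes, n, scheme):
    """Allocate n units; the leftover hole goes back to the free list."""
    i = choose_hole(holes, n, scheme)
    if i is None:
        return None
    start, size = holes.pop(i)
    if size > n:
        holes.insert(i, (start + n, size - n))  # leftover hole
    return start


for scheme in ("first", "best", "worst"):
    holes = [(0, 300), (500, 120), (800, 600)]
    print(scheme, allocate(holes, 100, scheme), holes)
```

With the same request of 100 units, first fit leaves a 200-unit hole, best fit a 20-unit hole and worst fit a 500-unit hole, mirroring the trade-off described above.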
First Fit
Example: process 4 needs to be placed in memory. The search starts at the bottom.
The first hole encountered is large enough. Process 4 is placed there, and the rest of the hole remains as the leftover hole.
Figure: Memory (operating system, processes 1 ... 3, holes) before and after placing process 4 with first fit.
Best Fit
We have to search all holes. The top hole fits best. This scheme creates the smallest leftover hole among the three schemes.
Figure: Memory before and after placing process 4 with best fit.
Worst Fit
We have to search all holes. The bottom hole is found to be the largest. This scheme creates the largest leftover hole among the three schemes.
Figure: Memory before and after placing process 4 with worst fit.
Paging
Physical address space of a process can be non-contiguous
Physical memory is divided into fixed-sized frames
Frame size is a power of 2, typically between 512 bytes and 8192 bytes
Logical memory is divided into pages
Page size is identical to frame size.
OS keeps track of all free frames (free-frame list)
Running a program of size n pages requires finding n free frames
Page table translates logical to physical addresses
Internal fragmentation, no external fragmentation
Address Translation
Address generated by the CPU is divided into:
Page number p: used as an index into a page table, which contains the base address f of the corresponding frame in physical memory.
Page offset d: the offset from the frame start; physical memory address = f + d.
Logical address layout: | page number p ($m-n$ bits) | page offset d ($n$ bits) |
The logical address is $m$ bits wide. Page size = frame size = $2^n$.

Paging
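Physical address = f + d, where
f = PageTable[p]
p = the $m-n$ most significant bits of the logical address
d = the $n$ least significant bits of the logical address
Figure: Paging hardware translating a logical address into a physical address (physical memory shown from low memory to high memory).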
Paging
Paging model: logical address space is contiguous, whereas the corresponding physical address space is not.
Paging
What is the physical address of k?
n = 2 (page size is 4 bytes), m = 4 (logical address space is 16 bytes)
k is located at logical address 10 (decimal) = 1010 (binary)
The upper $m-n = 2$ bits give p = 10 (binary) = 2; the lower $n = 2$ bits give d = 10 (binary) = 2.
PageTable (frame addresses): page 0 → 20, page 1 → 24, page 2 → 4, page 3 → 8
f = PageTable[2] = 4
Physical address = f + d = 4 + 2 = 6
Figure: logical memory, page table, and physical memory with frames 0 ... 7 (frame address = 4 × frame number). Figure from [Sil00 p.272]
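A sketch of the rule physical address = f + d, assuming the page table stores frame addresses (as in the example) and using bit operations to split the logical address. The values reproduce the k example.

```python
def translate(logical_address, page_table, n):
    """Split the logical address into p and d, then look up the frame."""
    p = logical_address >> n              # m - n most significant bits
    d = logical_address & ((1 << n) - 1)  # n least significant bits
    f = page_table[p]                     # frame address of page p
    return f + d


page_table = [20, 24, 4, 8]  # frame addresses from the example
print(translate(10, page_table, n=2))  # 6, the physical address of k
```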
Free-Frame List
The OS must maintain a table of free frames (free-frame list).
Example: a new process with four pages (page 0 ... page 3) is loaded.
Before allocation: free-frame list = 14, 13, 18, 20, 15
After allocation: free-frame list = 15
Page table of the new process: page 0 → frame 14, page 1 → frame 13, page 2 → frame 18, page 3 → frame 20
Figure: frames 13 ... 20 of physical memory before and after allocation.
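A sketch of the allocation step, assuming the free-frame list is a Python list consumed from the front and that the page table stores frame numbers (not frame addresses). load_process is a hypothetical name; the values reproduce the example.

```python
def load_process(num_pages, free_frames):
    """Take one free frame per page; return the new process's page table."""
    if num_pages > len(free_frames):
        raise MemoryError("not enough free frames")
    page_table = free_frames[:num_pages]  # page i -> frame page_table[i]
    del free_frames[:num_pages]           # those frames are no longer free
    return page_table


free_frame_list = [14, 13, 18, 20, 15]
new_table = load_process(4, free_frame_list)
print(new_table)        # [14, 13, 18, 20]
print(free_frame_list)  # [15]
```

The frames need not be adjacent: any n free frames will do, which is why paging avoids external fragmentation.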
Page-Table
Where to locate the page table?
Dedicated registers within CPU

Only suitable for small memory. Used e.g. in PDP-11 (8 page registers, each page 8 kB, 64 kB main memory total). Fast access (high speed registers).
Table in main memory

A dedicated CPU register, the page-table base register (PTBR), points to the table in memory (the table currently in use). With each context switch the PTBR is reloaded (then pointing to another page table in memory). The actual size of the page table is given by a second register, the page table length register (PTLR).
With the latter scheme, each data access needs two memory accesses: one for the page table entry and one for the memory location itself. Slowdown! Solution: a special hardware cache, the translation look-aside buffer (TLB).
Translation Look-Aside Buffer

A translation look-aside buffer (TLB) is a small fast-lookup associative memory.

Figure: TLB with (key | value) entries, key = page number, value = frame address or frame number.
The associative registers contain page/frame entries (key | value). When a page number is presented to the TLB, all keys are checked simultaneously. If the desired page number is not in the TLB, the corresponding entry must be fetched from the page table in memory.
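A sketch of a TLB as a small key/value store in front of the page table. A Python dictionary lookup stands in for the simultaneous comparison of all keys; the capacity and the least-recently-used replacement policy are assumptions, since the slides do not say how entries are replaced.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # full page table in "main memory"
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> frame number
        self.hits = self.misses = 0

    def lookup(self, page):
        if page in self.entries:              # hit: no page-table access
            self.hits += 1
            self.entries.move_to_end(page)
            return self.entries[page]
        self.misses += 1                      # miss: fetch entry from memory
        frame = self.page_table[page]
        self.entries[page] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return frame


tlb = TLB(page_table={0: 5, 1: 6, 2: 1, 3: 2}, capacity=2)
for page in (0, 1, 0, 2, 1):
    tlb.lookup(page)
print(tlb.hits, tlb.misses)  # 1 4
```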
Translation Look-Aside Buffer

Paging hardware with TLB. Figure from [Sil00 p.276]

Memory Access Time

Assume: memory access time = 100 ns, TLB access time = 20 ns.
Page number in TLB (hit): total access time = 20 ns + 100 ns = 120 ns
Page number not in TLB (miss): total access time = 20 ns + 100 ns + 100 ns = 220 ns
With 80% hit ratio: average access time $= 0.8 \times 120 + 0.2 \times 220 = 140$ ns
With 98% hit ratio: average access time $= 0.98 \times 120 + 0.02 \times 220 = 122$ ns
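The average access time is a weighted mean of the hit and miss times. A short sketch with the timing values assumed above (20 ns TLB, 100 ns memory, one page-table access on a miss):

```python
def effective_access_time(hit_ratio, tlb_ns=20, memory_ns=100):
    """Average access time with one page-table lookup on a TLB miss."""
    hit_time = tlb_ns + memory_ns               # 120 ns
    miss_time = tlb_ns + memory_ns + memory_ns  # 220 ns
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time


print(round(effective_access_time(0.80), 1))  # 140.0
print(round(effective_access_time(0.98), 1))  # 122.0
```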
Protection
With paging, the processes' memory spaces are automatically protected against each other, since each process is assigned its own set of frames. If a process tries to access a page that is not in its page table (or that is marked invalid, see Frame Attributes), the process is trapped by the OS.
Example with the page table 20, 24, 4, 8 (frame addresses of pages 0 ... 3, frames of 4 bytes): the valid physical addresses are 20 ... 23, 24 ... 27, 04 ... 07 and 08 ... 11.
Figure from [Sil00 p.272]
Frame Attributes
Each frame may be characterized by additional bits in the page table.
Valid / invalid
Whether the frame is currently allocated to the process
Read-Only
Frame is read-only
Execute-Only
Frame contains code
Shared
Frame is accessible to other processes as well.
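A sketch of how the valid and read-only bits could be checked on each access, assuming a page-table entry is modelled as a small record and a violation traps into the OS (a Python exception here). PageTableEntry, Trap and access are hypothetical names; the execute-only and shared bits are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    frame: int
    valid: bool = True
    read_only: bool = False


class Trap(Exception):
    """Stands in for the trap to the operating system."""


def access(page_table, page, write=False):
    """Return the frame for a legal access; trap otherwise."""
    if page < 0 or page >= len(page_table) or not page_table[page].valid:
        raise Trap(f"invalid page {page}")
    entry = page_table[page]
    if write and entry.read_only:
        raise Trap(f"write to read-only page {page}")
    return entry.frame


table = [PageTableEntry(5, read_only=True),  # e.g. a code page
         PageTableEntry(6),
         PageTableEntry(0, valid=False)]     # not allocated to the process
print(access(table, 1, write=True))          # 6
```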
Shared Pages
Implementation of shared memory through paging is rather easy.
A shared page is a page whose frame is allocated to other processes as well. Several processes share a page in that each of them maps the shared page to the same frame in physical memory. Shared code must be non-self-modifying code (reentrant code).
Example: Three processes are using an editor. The editor needs 3 pages for its code. Rather than loading the code three times into memory, the code is shared. It is loaded only once into memory, but is visible to each process as if it were its private code. The data (the text being edited), of course, is private to each process. Each process thus has its own data frame.
Shared Pages
Figure: three processes, each with pages 0 ... 3 (editor code in pages 0, 1, 2, private data in page 3). Pages 0, 1, 2 of each process are mapped to physical frames 3, 4, 6. Free memory is shown in gray, occupied memory in white. Figure from [Sil00 p.283]
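A sketch of the editor example as three page tables (lists mapping page number to frame number). The shared code frames 3, 4, 6 come from the figure; the private data frames 1, 7 and 2 are assumed values for illustration.

```python
EDITOR_CODE_FRAMES = [3, 4, 6]  # pages 0, 1, 2: shared, reentrant code

# Page 3 of each process holds its private data (frame numbers assumed).
page_tables = {
    "P1": EDITOR_CODE_FRAMES + [1],
    "P2": EDITOR_CODE_FRAMES + [7],
    "P3": EDITOR_CODE_FRAMES + [2],
}

frames_in_use = {f for table in page_tables.values() for f in table}
print(len(frames_in_use))  # 6 frames instead of 3 * 4 = 12
```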
Paging
Logical address space of modern CPUs: $2^{32}$ ... $2^{64}$
Assume: 32-bit CPU, frame size = 4 kB
$2^{32} / 2^{12} = 2^{20}$ page table entries (per process)
Each entry size = 20 bit + 20 bit = 40 bit = 5 byte
20 bit for the page number, 20 bit for the frame number (less than the 32 bits a full frame address would require).
Page table entry: | page number (20 bit) | frame number (20 bit) |
$2^{20} \times 5$ byte = 5 MB per page table!
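The page-table size can be checked in a few lines, using the 32-bit address and 4 kB frame assumptions above; 5 MB is meant in the binary sense ($5 \times 2^{20}$ bytes).

```python
logical_address_bits = 32
offset_bits = 12  # 4 kB frames
entries = 2 ** (logical_address_bits - offset_bits)  # 2**20 = 1,048,576

entry_bits = 20 + 20           # page number + frame number
entry_bytes = entry_bits // 8  # 5 bytes

table_bytes = entries * entry_bytes
print(table_bytes, table_bytes / 2**20)  # 5242880 5.0 (i.e. 5 MB)
```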
Two-Level Paging
Often, a process will not use all of its logical address space. Rather than allocating the page table contiguously in main memory (for the worst case), the page table is divided into small pieces and is paged itself.
Figure: the outer page table's output points to a frame containing page table entries (inner page table entries); the inner page table's output points to the final destination frame.
Two-Level Paging
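Logical address layout: | p1 (10 bit) | p2 (10 bit) | d (12 bit) |, where p1 and p2 form the page number and d is the page offset
Numbers are for the 32-bit, 4 kB frame example.
p1 indexes the outer page table (max $2^{10}$ entries); p2 indexes a page of the inner page table (each page has $2^{10}$ entries), whose entry points to the final destination frame in memory.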
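A sketch of the two-level walk for the 10/10/12 split, assuming both levels store frame numbers (physical address = frame × 4096 + d) and modelling missing inner tables as absent dictionary keys. The table contents are hypothetical.

```python
OFFSET_BITS, INNER_BITS = 12, 10


def split(logical_address):
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = logical_address & ((1 << OFFSET_BITS) - 1)
    p2 = (logical_address >> OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    p1 = logical_address >> (OFFSET_BITS + INNER_BITS)
    return p1, p2, d


def translate(logical_address, outer_table):
    p1, p2, d = split(logical_address)
    inner_table = outer_table[p1]       # first hop: outer page table
    frame = inner_table[p2]             # second hop: inner page table
    return (frame << OFFSET_BITS) + d   # final destination frame + offset


# Only one inner table exists; unused regions need no inner table at all.
outer_table = {1: {3: 42}}
address = (1 << 22) | (3 << 12) | 0x0AB
print(split(address))                        # (1, 3, 171)
print(hex(translate(address, outer_table)))  # 0x2a0ab
```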
Multi-Level Paging
Tree-Structure principle
Each outer page entry defines a root node of a tree.
Two / three / four level paging

SPARC (32 bit): three-level paging. Motorola 68030 (32 bit): four-level paging.
Better memory utilization

than using a contiguous (and possibly maximum-sized) page table.
Increase in access time
since several memory accesses (one per level) are needed until the final memory location is reached. Caching (TLB), however, helps a lot. Four-level paging with 98% hit rate: a TLB miss costs 20 ns + 4 × 100 ns + 100 ns = 520 ns, so the effective access time is $0.98 \times 120 + 0.02 \times 520 = 128$ ns.
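Generalizing the effective-access-time formula to k-level paging, assuming each level costs one memory access on a TLB miss (20 ns TLB, 100 ns memory, as above):

```python
def effective_access_time(hit_ratio, levels, tlb_ns=20, memory_ns=100):
    """TLB hit: TLB + data access. Miss: TLB + one access per level + data."""
    hit_time = tlb_ns + memory_ns
    miss_time = tlb_ns + levels * memory_ns + memory_ns
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time


print(round(effective_access_time(0.98, levels=4), 1))  # 128.0
print(round(effective_access_time(0.98, levels=1), 1))  # 122.0
```

Even with four levels, a high hit ratio keeps the slowdown compared with a single-level table small.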
