
SCSC 511 Operating Systems Caching and Demand Paging

Note: Some slides are adapted from the 2005 slides by Silberschatz, Galvin, and Gagne, and from Prof. John Kubiatowicz's lectures

Goals for Today Virtual Memory Demand Paging Page Replacement Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Caching Applied to Address Translation

[Diagram: the CPU issues a virtual address, which is looked up in the TLB. Cached? Yes: the cached physical address is used directly. No: the MMU translates via the page table. The data read or write then goes to physical memory untranslated.]

Question is one of page locality: does it exist?


Instruction accesses spend a lot of time on the same page (since accesses are sequential). Stack accesses have definite locality of reference. Data accesses have less page locality, but still some.

Can we have a TLB hierarchy?


Sure: multiple levels at different sizes/speeds

Virtual Memory & Demand Paging


Modern programs require a lot of physical memory
Memory per system growing faster than 25%-30%/year

But they don't use all their memory all of the time
90-10 rule: programs spend 90% of their time in 10% of their code. Wasteful to require all of a user's code to be in memory

Solution: use main memory as cache for disk


Caching
[Diagram: the memory hierarchy — processor (control + datapath), on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (tape)]

Virtual Memory
Virtual memory — separation of user logical memory from physical memory.
Only part of the program needs to be in memory for execution:
Some functions are almost never executed. Data structures in programs are often over-provisioned.

Logical address space can be much larger than physical address space. Allows address spaces to be shared by several processes. Allows for more efficient process creation, e.g. COW and vfork() (in a moment)

Virtual memory can be implemented via:


Demand paging Demand segmentation

Virtual Memory That is Larger Than Physical Memory

[Figure: a virtual-address space larger than physical memory — pages are mapped through a page table onto the smaller set of physical frames, with the remaining pages kept on disk]

Virtual Memory and Process Creation


Virtual memory provides benefits during process creation:
Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. If either process modifies a shared page, only then is the page copied.
COW allows more efficient process creation and is used in several OSes: WinXP, Linux, Solaris. The OS needs a free frame to copy a modified page into; in many OSes, free pages are allocated from a pool of zeroed-out pages to improve performance.
Linux and Solaris also provide vfork(). vfork() doesn't use COW; instead, the child process calls exec() immediately after vfork().

Chapter 9: Virtual Memory


Virtual Memory

Demand Paging
Page Replacement Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Demand Paging
Demand paging: pages are only loaded into memory when they are demanded during execution
Less I/O needed Less memory needed Higher degree of multiprogramming Faster response

Pager (lazy swapper) never swaps a page into memory unless that page will be needed. An extreme case: Pure demand paging starts a process with no pages in memory
Transfer of a Paged Memory to Contiguous Disk Space

Valid-Invalid Bit
The V/I bit is a hardware support (slightly different from the v/i bit in Ch. 8)

associated with each page table entry (PTE)


1: page is legal and in memory; 0: page is invalid, OR valid but on disk → page fault

[Figure: a page table with a frame # column and a valid-invalid bit column — resident pages have bit 1; pages that are invalid or on disk have bit 0]

Handling a Page Fault


On access, if the PTE's V bit is 0 → a page-fault trap. Check a table in the PCB to determine:
Invalid reference → abort the process
Valid page, but the page is on disk:
1. Find a free frame from the free-frame list, then schedule a disk I/O to read the desired page into that free frame
2. If there is no free frame, find a page in memory to swap out (page replacement algorithms)
3. Restart the instruction that was interrupted by the page fault
(Issues on restarting instructions: pp. 323 — not required)

Performance of Demand Paging


Page Fault Rate: 0 ≤ p ≤ 1.0
Effective Access Time (EAT):
EAT = (1 − p) × memory access time + p × page fault time

e.g. p = 0.1, memory access time = 200 ns, page fault time = 8 ms:
EAT = (1 − 0.1) × 200 + 0.1 × 8,000,000 ns. Why is a page fault so expensive?

The page fault rate significantly affects system performance → reduce the page fault rate p!
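The EAT arithmetic above can be checked numerically (a Python sketch, not part of the slides; the 200 ns access time and 8 ms fault-service time are the slide's example figures):

```python
# EAT = (1 - p) * memory_access + p * page_fault_time, all in nanoseconds
def eat_ns(p, mem_ns=200, fault_ns=8_000_000):
    return (1 - p) * mem_ns + p * fault_ns

print(eat_ns(0.1))        # ~800180 ns: 4000x slower than a plain 200 ns access
print(eat_ns(0.0000025))  # ~220 ns: the fault rate must be tiny to stay near 200 ns
```

The second call shows why reducing p matters so much: even a fault rate of 1 in 400,000 accesses costs about 10% extra on every memory access.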

Chapter 9: Virtual Memory


Virtual Memory Demand Paging

Page Replacement
Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Page Replacement
Basic page replacement algorithm
1. Find the location of the desired page on disk 2. Find a free frame:
 if there is a free frame, use it  if there is no free frame, use a page replacement algorithm to select a victim frame. Check whether the victim frame is modified (dirty); if it's dirty, write it back to disk

3. Load the demanded page from disk into the (newly) free frame. Update the page and frame tables 4. Restart the interrupted process
(page replacement diagram is next )

Page Replacement Diagram

Page Replacement Algorithms


The goal of a page replacement algorithm: the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults. E.g. if the address sequence is 1, 1, 1, 2, 3, 3, 4, 1, 2, 2, 2, 2, 5, 1, 1, 2, 2, 3, 4, 4, 4, 5,
what is the reference string? Do we need to know other conditions to compute the number of page faults?

Page Faults vs. the Number of Frames


Ans: we also need to know the number of frames available. There are two major problems in implementing demand paging:
Page replacement: how to select victim frame Frame allocation: how many frames to allocate to each process (discuss after finishing page replacement part)

Number of page faults vs. number of frames (general case)

FIFO Page Replacement Algorithm


Always replace the oldest page A FIFO queue Easy to implement, but the performance is not always good. e.g. 1 Reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 2, 1, 2, 0, 1, 7, 0, 1 (Pure demand paging) 3 pages in memory at a time per process e.g. 2 Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 3 pages in memory at a time per process vs. 4 pages in memory at a time per process

FIFO Illustrating Belady's Anomaly


Belady's Anomaly: for some page replacement algorithms, the page-fault rate may increase as the number of allocated frames increases
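The anomaly on e.g. 2's reference string can be reproduced with a short FIFO simulation (a Python sketch, not part of the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement: always evict the oldest page."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # no free frame: evict oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]     # reference string from e.g. 2
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults -- more frames, MORE faults
```

Going from 3 to 4 frames raises the fault count from 9 to 10: Belady's anomaly.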

Optimal (OPT) Page Replacement Algorithm OPT has the lowest page-fault rate: replace the page that will not be used for the longest period of time. 4-frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
[Figure: frame contents over time for the 4-frame OPT example]

How do you know which page will not be used for longest period of time?

A benchmark used for measuring how well other page replacement algorithms perform
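OPT can be simulated offline by scanning ahead in the reference string (a brute-force Python sketch, not part of the slides — the full future being known is exactly why OPT is only a benchmark):

```python
def opt_faults(refs, nframes):
    """Count faults under OPT: evict the page whose next use is farthest away."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                # position of p's next reference; infinity if never used again
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else float('inf')
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(refs, 4))   # 6 faults -- the minimum possible for 4 frames
```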

Least-Recently Used (LRU) Algorithm


Motivation: we cannot know the future — use the recent past as an approximation of the near future. The LRU algorithm is an approximation of the OPT algorithm: choose the page that has not been used for the longest period of time as the victim. Like OPT, LRU does NOT suffer from Belady's anomaly. E.g. reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
[Figure: frame contents over time for the 4-frame LRU example]

Two implementations of LRU Counter implementation Stack implementation

LRU Algorithm Implementations


Counter implementation: every page table entry has a counter; every time the page is referenced through this entry, copy the clock (time stamp) into the counter. When a page needs to be replaced, look at the counters to find the smallest time stamp, which determines the victim.
Stack implementation: keep a stack of page numbers in a doubly linked list. When a page is referenced (example on the next slide),
move it to the top — requires 6 pointers to be changed
but no search is needed for replacement.
Problem with both implementations: they need additional hardware support, and expensive housekeeping is required at each memory reference (interrupt-handling overhead). Any solutions?
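The stack implementation maps naturally onto an ordered dictionary (a Python sketch, not from the slides): move-to-end on every reference, evict from the front.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count faults under LRU; the OrderedDict plays the role of the stack."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)            # referenced: move to the top
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)      # evict least recently used
            frames[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 4))   # 8 faults: between OPT (6) and FIFO (10)
```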

An Example of Stack Implementation of LRU

LRU Approximation Algorithms


Many systems provide limited hardware support in the form of a reference bit
With each PTE associate a bit, initially 0. When the page is referenced, the bit is set to 1. Replace a page whose reference bit is 0 (if one exists). But we do not know the order of use — a rough approximation

Additional-Reference-Bit Algorithm
Keep an 8-bit byte (reference byte) for each page to record reference information for the last 8 time periods. At a regular interval (e.g. 100 ms), the OS shifts the reference byte right by 1, inserting the reference bit as the new high-order bit and discarding the low-order bit.
e.g. 1: if a page has not been referenced for a while → 00000000
e.g. 2: if a page is referenced in every interval → 11111111
e.g. 3: which page shall be replaced, 11000100 or 01110111?
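The shifting can be sketched in a few lines of Python (not from the slides; the 100 ms interval is the slide's example figure):

```python
def age(byte, ref_bit):
    """One timer interval: shift the 8-bit history right, insert the new
    reference bit as the high-order bit, discard the low-order bit."""
    return ((ref_bit << 7) | (byte >> 1)) & 0xFF

hot = cold = 0b00000000
for _ in range(8):
    hot = age(hot, 1)     # referenced in every interval
    cold = age(cold, 0)   # never referenced
print(format(hot, '08b'), format(cold, '08b'))    # 11111111 00000000

# Interpreted as unsigned integers, the page with the SMALLEST value is the
# least recently referenced, so 01110111 loses to 11000100 in e.g. 3:
print(min(0b11000100, 0b01110111) == 0b01110111)  # True
```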

Second-Chance (clock) Page-Replacement Algorithm Second chance: if the page to be replaced (in clock order) has reference bit = 1, then set the reference bit to 0 and leave the page in memory; consider replacing the next page (in clock order), subject to the same rules
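A minimal clock-algorithm sketch (Python, not from the slides; the reference string reuses the earlier FIFO example):

```python
def second_chance_faults(refs, nframes):
    """Clock algorithm: each page gets a second chance via its reference bit."""
    frames = []                                  # [page, ref_bit] in clock order
    hand, faults = 0, 0
    for page in refs:
        resident = next((f for f in frames if f[0] == page), None)
        if resident:
            resident[1] = 1                      # set reference bit on access
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append([page, 1])
            continue
        while frames[hand][1] == 1:              # bit set: give a second chance
            frames[hand][1] = 0
            hand = (hand + 1) % nframes
        frames[hand] = [page, 1]                 # bit clear: replace this victim
        hand = (hand + 1) % nframes
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(second_chance_faults(refs, 4))   # 10 faults
```

On this string every resident page keeps its bit set, so second chance degenerates to FIFO (also 10 faults with 4 frames) — the benefit shows up on workloads where some pages go cold.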

Counting Algorithms
Keep a counter of the number of references that have been made to each page LFU Algorithm: replaces page with smallest count MFU Algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used

Chapter 9: Virtual Memory


Virtual Memory Demand Paging Page Replacement

Allocation of Frames
Thrashing Demand Segmentation Operating System Examples

Allocation of Frames
How does the OS allocate the fixed amount of free memory (frames) among the various processes? Simple frame allocation algorithm: in a single-user system, the OS takes some frames; the rest of the frames are assigned to the user process. Some variations:
Demand paging of the OS's own buffer and table space. Always reserve a couple of free frames in the free-frame list — why?

Allocate at least a minimum number of frames for each process, defined by the computer architecture. E.g. on the IBM 370, the MVC instruction itself might span 2 pages, its "from" operand might span 2 pages, and its "to" operand might span 2 pages — hence a minimum of 6 frames is needed.
What if there are only 5 frames in total in this system?

Allocation of Frames
Another example: indirect addressing. With one-level indirect addressing, a load instruction on one page refers to an address on another page, which is an indirect reference to yet another page.
How many frames are needed? What about a multi-level indirect addressing architecture? Any solution?

Two major frame allocation algorithms:


fixed allocation priority allocation

Fixed Allocation Algorithm


Equal allocation
If there are 100 frames and 5 processes, give each process 20 frames.

Proportional allocation: allocate according to the size of the process
s_i = size of process p_i
S = Σ s_i
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m

Example: m = 64, s_1 = 10, s_2 = 127
a_1 = (10 / 137) × 64 ≈ 5
a_2 = (127 / 137) × 64 ≈ 59
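The proportional-allocation arithmetic, as a sketch (Python, not from the slides; rounding to the nearest frame is an assumption here — a real allocator must also ensure the rounded counts sum to m and respect each process's minimum):

```python
# a_i = (s_i / S) * m: each process gets frames in proportion to its size
def proportional_allocation(sizes, m):
    S = sum(sizes)
    return [round(s / S * m) for s in sizes]

# Slide example: m = 64 frames, processes of sizes 10 and 127
print(proportional_allocation([10, 127], 64))   # [5, 59]
```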

Priority Allocation
In both equal and proportional allocation algorithms:
The number of frames allocated also depends on the multiprogramming level: the more processes, the fewer frames each gets. No differentiation by process priority

We want to allocate more frames to high-priority processes to speed up their execution. Proportional + priority allocation scheme: use a proportional allocation scheme based on priorities rather than size

Global vs. Local Allocation


Frame allocation is closely related to page replacement: which page do we replace on a page fault — global or local? Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another (the number of frames allocated to a process may change). Local replacement: each process selects only from its own set of allocated frames (the number of frames allocated to a process does NOT change). Pros and cons: Global
Pro: a high-priority process may get more frames; flexible, with higher system throughput (commonly implemented). Con: non-deterministic

Local
pro: deterministic con: inflexible and may hinder system throughput
