
SCSC 511 Operating Systems Caching and Demand Paging

Note: Some slides are adapted from the 2005 slides by Silberschatz, Galvin, and Gagne, and from Prof. John Kubiatowicz's lectures

Goals for Today Virtual Memory Demand Paging Page Replacement Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Caching Applied to Address Translation

[Diagram: the CPU issues a virtual address, which is looked up in the TLB. Cached? Yes: the cached physical address is used directly. No: the MMU translates via the page table. The data read or write then goes to physical memory untranslated.]

Question is one of page locality: does it exist?


Instruction accesses spend a lot of time on the same page (since accesses are sequential). Stack accesses have definite locality of reference. Data accesses have less page locality, but still some.

Can we have a TLB hierarchy?


Sure: multiple levels at different sizes/speeds

Virtual Memory & Demand Paging


Modern programs require a lot of physical memory
Memory per system growing faster than 25%-30%/year

But they don't use all their memory all of the time
90-10 rule: programs spend 90% of their time in 10% of their code. Wasteful to require all of a user's code to be in memory

Solution: use main memory as cache for disk


Caching
[Diagram: the memory hierarchy — processor (control + datapath), on-chip cache, second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (tape)]

Virtual Memory
Virtual memory — separation of user logical memory from physical memory.
Only part of the program needs to be in memory for execution:
Some functions are almost never executed. Data structures in programs are often over-provisioned.

Logical address space can be much larger than physical address space. Allows address spaces to be shared by several processes. Allows for more efficient process creation, e.g. COW and vfork() (in a moment)

Virtual memory can be implemented via:


Demand paging Demand segmentation

Virtual Memory That is Larger Than Physical Memory

[Figure: a virtual-address space larger than physical memory — pages are mapped through a page table onto the smaller set of physical frames, with the remaining pages kept on disk]

Virtual Memory and Process Creation


Virtual memory provides benefits during process creation:
Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. If either process modifies a shared page, only then is the page copied.
COW allows more efficient process creation and is used in several OSes: WinXP, Linux, Solaris. The OS needs a free frame to copy a modified page into; in many OSes, free pages are allocated from a pool of zeroed-out pages to improve performance.
Linux and Solaris also provide vfork(). vfork() doesn't use COW; instead, the child process calls exec() immediately after vfork().

Chapter 9: Virtual Memory


Virtual Memory

Demand Paging
Page Replacement Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Demand Paging
Demand paging: pages are only loaded into memory when they are demanded during execution
Less I/O needed Less memory needed Higher degree of multiprogramming Faster response

Pager (lazy swapper) never swaps a page into memory unless that page will be needed. An extreme case: Pure demand paging starts a process with no pages in memory
Transfer of a Paged Memory to Contiguous Disk Space

Valid-Invalid Bit
The V/I bit is a hardware support (slightly different from the v/i bit in Ch. 8)

associated with each page table entry (PTE)


1: page is legal and in memory; 0: page is invalid, OR valid but on disk → page fault

[Figure: a page table with a frame # column and a valid-invalid bit column — resident pages have bit 1; pages that are invalid or on disk have bit 0]

Handling a Page Fault


On access, if the PTE's V bit is 0 → a page-fault trap. Check a table in the PCB to determine:
Invalid reference → abort the process
Valid page, but the page is on disk:
1. Find a free frame from the free-frame list, then schedule a disk I/O to read the desired page into that free frame
2. If there is no free frame, find a page in memory to swap out (page replacement algorithms)
3. Restart the instruction that was interrupted by the page fault
(Issues on restarting instructions: pp. 323 — not required)

Performance of Demand Paging


Page Fault Rate: 0 ≤ p ≤ 1.0
Effective Access Time (EAT):
EAT = (1 − p) × memory access time + p × page fault time

e.g. p = 0.1, memory access time = 200 ns, page fault time = 8 ms:
EAT = (1 − 0.1) × 200 + 0.1 × 8,000,000 ns. Why is a page fault so expensive?

The page fault rate significantly affects system performance → reduce the page fault rate p!
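The EAT arithmetic above can be checked numerically (a Python sketch, not part of the slides; the 200 ns access time and 8 ms fault-service time are the slide's example figures):

```python
# EAT = (1 - p) * memory_access + p * page_fault_time, all in nanoseconds
def eat_ns(p, mem_ns=200, fault_ns=8_000_000):
    return (1 - p) * mem_ns + p * fault_ns

print(eat_ns(0.1))        # ~800180 ns: 4000x slower than a plain 200 ns access
print(eat_ns(0.0000025))  # ~220 ns: the fault rate must be tiny to stay near 200 ns
```

The second call shows why reducing p matters so much: even a fault rate of 1 in 400,000 accesses costs about 10% extra on every memory access.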

Chapter 9: Virtual Memory


Virtual Memory Demand Paging

Page Replacement
Allocation of Frames Thrashing Demand Segmentation Operating System Examples

Page Replacement
Basic page replacement algorithm
1. Find the location of the desired page on disk 2. Find a free frame:
 if there is a free frame, use it  if there is no free frame, use a page replacement algorithm to select a victim frame. Check whether the victim frame is modified (dirty); if it's dirty, write it back to disk

3. Load the demanded page from disk into the (newly) free frame. Update the page and frame tables 4. Restart the interrupted process
(page replacement diagram is next )

Page Replacement Diagram

Page Replacement Algorithms


The goal of a page replacement algorithm: the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults. E.g. if the address sequence is 1, 1, 1, 2, 3, 3, 4, 1, 2, 2, 2, 2, 5, 1, 1, 2, 2, 3, 4, 4, 4, 5,
what is the reference string? Do we need to know other conditions to compute the number of page faults?

Page Faults vs. the Number of Frames


Ans: we also need to know the number of frames available. There are two major problems in implementing demand paging:
Page replacement: how to select victim frame Frame allocation: how many frames to allocate to each process (discuss after finishing page replacement part)

Number of page faults vs. number of frames (general case)

FIFO Page Replacement Algorithm


Always replace the oldest page A FIFO queue Easy to implement, but the performance is not always good. e.g. 1 Reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 2, 1, 2, 0, 1, 7, 0, 1 (Pure demand paging) 3 pages in memory at a time per process e.g. 2 Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 3 pages in memory at a time per process vs. 4 pages in memory at a time per process

FIFO Illustrating Belady's Anomaly


Belady's Anomaly: for some page replacement algorithms, the page-fault rate may increase as the number of allocated frames increases
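The anomaly on e.g. 2's reference string can be reproduced with a short FIFO simulation (a Python sketch, not part of the slides):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page faults under FIFO replacement: always evict the oldest page."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # no free frame: evict oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]     # reference string from e.g. 2
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults -- more frames, MORE faults
```

Going from 3 to 4 frames raises the fault count from 9 to 10: Belady's anomaly.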

Optimal (OPT) Page Replacement Algorithm OPT has the lowest page-fault rate: replace the page that will not be used for the longest period of time. 4-frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
[Figure: frame contents over time for the 4-frame OPT example]

How do you know which page will not be used for longest period of time?

A benchmark used for measuring how well other page replacement algorithms perform
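OPT can be simulated offline by scanning ahead in the reference string (a brute-force Python sketch, not part of the slides — the full future being known is exactly why OPT is only a benchmark):

```python
def opt_faults(refs, nframes):
    """Count faults under OPT: evict the page whose next use is farthest away."""
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                # position of p's next reference; infinity if never used again
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else float('inf')
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(refs, 4))   # 6 faults -- the minimum possible for 4 frames
```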

Least-Recently Used (LRU) Algorithm


Motivation: we cannot know the future — use the recent past as an approximation of the near future. The LRU algorithm is an approximation of the OPT algorithm: choose the page that has not been used for the longest period of time as the victim. Like OPT, LRU does NOT suffer from Belady's anomaly. E.g. reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
[Figure: frame contents over time for the 4-frame LRU example]

Two implementations of LRU Counter implementation Stack implementation

LRU Algorithm Implementations


Counter implementation: every page table entry has a counter; every time the page is referenced through this entry, copy the clock (time stamp) into the counter. When a page needs to be replaced, look at the counters to find the smallest time stamp, which determines the victim.
Stack implementation: keep a stack of page numbers in a doubly linked list. When a page is referenced (example on the next slide),
move it to the top — requires 6 pointers to be changed
but no search is needed for replacement.
Problem with both implementations: they need additional hardware support, and expensive housekeeping is required at each memory reference (interrupt-handling overhead). Any solutions?
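The stack implementation maps naturally onto an ordered dictionary (a Python sketch, not from the slides): move-to-end on every reference, evict from the front.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """Count faults under LRU; the OrderedDict plays the role of the stack."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)            # referenced: move to the top
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)      # evict least recently used
            frames[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 4))   # 8 faults: between OPT (6) and FIFO (10)
```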

An Example of Stack Implementation of LRU

LRU Approximation Algorithms


Many systems provide limited hardware support in the form of a reference bit
With each PTE associate a bit, initially 0. When the page is referenced, the bit is set to 1. Replace a page whose reference bit is 0 (if one exists). But we do not know the order of use — a rough approximation

Additional-Reference-Bit Algorithm
Keep an 8-bit byte (reference byte) for each page to record reference information for the last 8 time periods. At a regular interval (e.g. 100 ms), the OS shifts the reference byte right by 1, inserting the reference bit as the new high-order bit and discarding the low-order bit.
e.g. 1: if a page has not been referenced for a while → 00000000
e.g. 2: if a page is referenced in every interval → 11111111
e.g. 3: which page shall be replaced, 11000100 or 01110111?
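The shifting can be sketched in a few lines of Python (not from the slides; the 100 ms interval is the slide's example figure):

```python
def age(byte, ref_bit):
    """One timer interval: shift the 8-bit history right, insert the new
    reference bit as the high-order bit, discard the low-order bit."""
    return ((ref_bit << 7) | (byte >> 1)) & 0xFF

hot = cold = 0b00000000
for _ in range(8):
    hot = age(hot, 1)     # referenced in every interval
    cold = age(cold, 0)   # never referenced
print(format(hot, '08b'), format(cold, '08b'))    # 11111111 00000000

# Interpreted as unsigned integers, the page with the SMALLEST value is the
# least recently referenced, so 01110111 loses to 11000100 in e.g. 3:
print(min(0b11000100, 0b01110111) == 0b01110111)  # True
```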

Second-Chance (clock) Page-Replacement Algorithm Second chance: if the page to be replaced (in clock order) has reference bit = 1, then set the reference bit to 0 and leave the page in memory; consider replacing the next page (in clock order), subject to the same rules
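A minimal clock-algorithm sketch (Python, not from the slides; the reference string reuses the earlier FIFO example):

```python
def second_chance_faults(refs, nframes):
    """Clock algorithm: each page gets a second chance via its reference bit."""
    frames = []                                  # [page, ref_bit] in clock order
    hand, faults = 0, 0
    for page in refs:
        resident = next((f for f in frames if f[0] == page), None)
        if resident:
            resident[1] = 1                      # set reference bit on access
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append([page, 1])
            continue
        while frames[hand][1] == 1:              # bit set: give a second chance
            frames[hand][1] = 0
            hand = (hand + 1) % nframes
        frames[hand] = [page, 1]                 # bit clear: replace this victim
        hand = (hand + 1) % nframes
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(second_chance_faults(refs, 4))   # 10 faults
```

On this string every resident page keeps its bit set, so second chance degenerates to FIFO (also 10 faults with 4 frames) — the benefit shows up on workloads where some pages go cold.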

Counting Algorithms
Keep a counter of the number of references that have been made to each page LFU Algorithm: replaces page with smallest count MFU Algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used

Chapter 9: Virtual Memory


Virtual Memory Demand Paging Page Replacement

Allocation of Frames
Thrashing Demand Segmentation Operating System Examples

Allocation of Frames
How does the OS allocate the fixed amount of free memory (frames) among the various processes? Simple frame allocation algorithm: in a single-user system, the OS takes some frames; the rest of the frames are assigned to the user process. Some variations:
Demand paging of the OS's own buffer and table space. Always reserve a couple of free frames in the free-frame list — why?

Allocate at least a minimum number of frames for each process, defined by the computer architecture. E.g. on the IBM 370, the MVC instruction itself might span 2 pages, its "from" operand might span 2 pages, and its "to" operand might span 2 pages — hence a minimum of 6 frames is needed.
What if there are only 5 frames in total in this system?

Allocation of Frames
Another example: indirect addressing. With one-level indirect addressing, a load instruction on one page refers to an address on another page, which is an indirect reference to yet another page.
How many frames are needed? What about a multi-level indirect addressing architecture? Any solution?

Two major frame allocation algorithms:


fixed allocation priority allocation

Fixed Allocation Algorithm


Equal allocation
If there are 100 frames and 5 processes, give each process 20 frames.

Proportional allocation: allocate according to the size of the process
s_i = size of process p_i
S = Σ s_i
m = total number of frames
a_i = allocation for p_i = (s_i / S) × m

Example: m = 64, s_1 = 10, s_2 = 127
a_1 = (10 / 137) × 64 ≈ 5
a_2 = (127 / 137) × 64 ≈ 59
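The proportional-allocation arithmetic, as a sketch (Python, not from the slides; rounding to the nearest frame is an assumption here — a real allocator must also ensure the rounded counts sum to m and respect each process's minimum):

```python
# a_i = (s_i / S) * m: each process gets frames in proportion to its size
def proportional_allocation(sizes, m):
    S = sum(sizes)
    return [round(s / S * m) for s in sizes]

# Slide example: m = 64 frames, processes of sizes 10 and 127
print(proportional_allocation([10, 127], 64))   # [5, 59]
```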

Priority Allocation
In both equal and proportional allocation algorithms:
The number of frames allocated also depends on the multiprogramming level: the more processes, the fewer frames each gets. No differentiation by process priority

We want to allocate more frames to high-priority processes to speed up their execution. Proportional + priority allocation scheme: use a proportional allocation scheme based on priorities rather than size

Global vs. Local Allocation


Frame allocation is closely related to page replacement: which page do we replace on a page fault — global or local? Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another (the number of frames allocated to a process may change). Local replacement: each process selects only from its own set of allocated frames (the number of frames allocated to a process does NOT change). Pros and cons: Global
Pro: a high-priority process may get more frames; flexible, with higher system throughput (commonly implemented). Con: non-deterministic

Local
pro: deterministic con: inflexible and may hinder system throughput
