
MIT IAP Course Lecture #1: Virtualization 101

Carl Waldspurger (SB SM '89, PhD '95)


VMware R&D
January 16, 2007

Copyright 2007 VMware, Inc. All rights reserved.

What is Virtualization?

virtual (adj): existing in essence or effect, though not in actual fact


Virtual systems
Abstract physical components using logical objects
Dynamically bind logical objects to physical configurations

Examples
Network: Virtual LAN (VLAN), Virtual Private Network (VPN)
Storage: Storage Area Network (SAN), LUN
Computer: Virtual Machine (VM), simulator


Overview
Virtual Machines
Virtualization Approaches
Processor Virtualization
Additional Topics


Starting Point: A Physical Machine


Physical Hardware
Processors, memory, chipset, I/O bus and devices, etc.
Physical resources often underutilized

Software
Tightly coupled to hardware
Single active OS image
OS controls hardware


What is a Virtual Machine?


Hardware-Level Abstraction
Virtual hardware: processors, memory, chipset, I/O devices, etc.
Encapsulates all OS and application state

Virtualization Software
Extra level of indirection decouples hardware and OS
Multiplexes physical hardware across multiple guest VMs
Strong isolation between VMs
Manages physical resources, improves utilization


VM Isolation
Secure Multiplexing
Run multiple VMs on single physical host
Processor hardware isolates VMs, e.g. MMU

Strong Guarantees
Software bugs, crashes, viruses within one VM cannot affect other VMs

Performance Isolation
Partition system resources
Example: VMware controls for reservation, limit, shares
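
To make the three controls concrete, here is a minimal sketch of one way a resource could be partitioned under reservation, limit, and shares. It is an illustration, not VMware's actual algorithm: the function name `allocate`, the tuple layout, and the rule "reservations first, then split the remainder by shares, capped at each limit" are all assumptions made for this example.

```python
def allocate(capacity, vms):
    """Divide `capacity` among VMs.

    vms maps a VM name to (reservation, limit, shares).
    Assumed policy: every VM first receives its reservation; the
    remainder is split in proportion to shares, never exceeding a limit.
    """
    alloc = {name: float(res) for name, (res, _, _) in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")

    # VMs that can still accept more than their reservation.
    active = {n for n, (res, lim, sh) in vms.items() if lim > res and sh > 0}
    while remaining > 1e-9 and active:
        total_shares = sum(vms[n][2] for n in active)
        leftover = 0.0
        capped = set()
        for n in active:
            offer = remaining * vms[n][2] / total_shares
            room = vms[n][1] - alloc[n]
            if offer >= room:
                # VM hits its limit; return the excess to the pool.
                alloc[n] += room
                leftover += offer - room
                capped.add(n)
            else:
                alloc[n] += offer
        remaining = leftover
        active -= capped
    return alloc


# Hypothetical example: 1000 MHz shared by three VMs.
example = allocate(1000, {
    "A": (100, 1000, 2),   # twice the shares of B and C
    "B": (100, 1000, 1),
    "C": (100, 250, 1),    # capped by its limit
})
```

In this example C stops at its limit of 250 and the excess is redistributed to A and B in a 2:1 ratio.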


VM Encapsulation
Entire VM is a File
OS, applications, data
Memory and device state

Snapshots and Clones
Capture VM state on the fly and restore to point-in-time
Rapid system provisioning, backup, remote mirroring

Easy Content Distribution
Pre-configured apps, demos
Virtual appliances


VM Compatibility
Hardware-Independent
Physical hardware hidden by virtualization layer
Standard virtual hardware exposed to VM

Create Once, Run Anywhere
No configuration issues
Migrate VMs between hosts

Legacy VMs
Run ancient OS on new platform
E.g. DOS VM drives virtual IDE and vLance devices, mapped to modern SAN and GigE hardware


Common Virtualization Uses Today


Test and Development
Rapidly provision test and development servers; store libraries of pre-configured test machines

Server Consolidation and Containment
Eliminate server sprawl by deploying systems into virtual machines that can run safely and move transparently across shared hardware

Business Continuity
Reduce cost and complexity by encapsulating entire systems into single files that can be replicated and restored onto any target server

Enterprise Desktop
Secure unmanaged PCs without compromising end-user autonomy by layering a security policy in software around desktop virtual machines


Overview
Virtual Machines
Virtualization Approaches
  Virtual machine monitors (VMMs)
  Virtualization platform types
  Alternative system virtualizations
Processor Virtualization
Additional Topics


What is a Virtual Machine Monitor?

An Old Concept
Classic definition from Popek & Goldberg '74
IBM mainframes since the '60s

VMM Characteristics
Fidelity
Performance
Isolation / Safety


VMM Technology
So this is just like Java, right?
No, a Java VM is very different from the physical machine that runs it
A hardware-level VM reflects underlying processor architecture

Like a simulator or emulator that can run old Nintendo games?
No, they emulate the behavior of different hardware architectures
Simulators generally have very high overhead
A hardware-level VM utilizes the underlying physical processor directly


VMMs Past
An Old Idea
Hardware-level VMs since the '60s
IBM S/360, IBM VM/370 mainframe systems
Timeshare multiple single-user OS instances on expensive hardware

Classical VMM
Run VM directly on hardware
Trap and emulate model for privileged instructions
Vendors had vertical control over proprietary hardware, operating systems, VMM

[Figure: from IBM VM/370 product announcement, ca. 1972]


VMMs Present
Renewed Interest
Academic research since the '90s
VMs for commodity systems
Server consolidation

VMM for x86
Industry-standard hardware, from laptops to datacenter
Run unmodified commodity guest operating systems
Significant challenges, e.g. non-virtualizable instructions
Pioneered by VMware in '98

[Figure: VMware Fusion for Mac OS X running WinXP, 2006]


VMM Platform Types


Hosted Architecture
Install as application on existing x86 host OS, e.g. Windows, Linux, OS X
Small context-switching driver
Leverage host I/O stack and resource management
Examples: VMware Player/Workstation/Server, Microsoft Virtual PC/Server, Parallels Desktop

Bare-Metal Architecture
Hypervisor installs directly on hardware
Acknowledged as preferred architecture for high-end servers
Examples: VMware ESX Server, Xen, Microsoft Viridian (2008)


System Virtualization Alternatives


Virtual machines abstracted using a layer at different places

Language Level

OS Level

Hardware Level


System Virtualization Taxonomy


System Virtualization

Hardware Level
  Bare-Metal / Hypervisor: HP Integrity VM, IBM zSeries z/VM, VMware ESX Server, Xen
  Hosted: Microsoft Virtual Server, Microsoft Virtual PC, Parallels Desktop, VMware Player, VMware Workstation, VMware Server

Para-virtualization: Virtual Iron, VMware VMI, Xen

OS Level: FreeBSD Jail, HP Secure Resource Partitions, Sun Solaris Zones, SWsoft Virtuozzo, User-Mode Linux

High-Level Language: Java, Microsoft .NET / Mono, Smalltalk

Emulators: Bochs, Microsoft VPC for Mac, QEMU, Virtutech Simics


Overview
Virtual Machines
Virtualization Approaches
Processor Virtualization
  Classical techniques
  Software x86 VMM
  Hardware-assisted x86 VMM
  Para-virtualization
Additional Topics


Classical Instruction Virtualization


Trap and Emulate
Run guest operating system deprivileged
All privileged instructions trap into VMM
VMM emulates instructions against virtual state, e.g. disable virtual interrupts, not physical interrupts
Resume direct execution from next guest instruction

Implementation Technique
This is just one technique
Popek and Goldberg criteria permit others
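
The trap-and-emulate loop above can be sketched as a toy model. Everything here is hypothetical: the instruction names, the `Trap` exception standing in for a hardware trap, and the dictionary standing in for the virtual CPU state. The point is only the control flow: direct execution until a privileged instruction traps, emulation against virtual state, then resumption at the next instruction.

```python
PRIVILEGED = {"cli", "sti"}  # toy stand-ins for privileged instructions


class Trap(Exception):
    """Models the hardware trap raised by a deprivileged guest."""

    def __init__(self, pc, op):
        super().__init__(op)
        self.pc = pc
        self.op = op


def direct_execute(program, regs, pc):
    """Stand-in for running guest code directly on the CPU."""
    while pc < len(program):
        op, *args = program[pc]
        if op in PRIVILEGED:
            raise Trap(pc, op)          # control transfers to the VMM
        if op == "add":
            regs[args[0]] += args[1]
        pc += 1
    return pc


def run_vm(program):
    regs = {"eax": 0}
    vstate = {"IF": True}               # virtual interrupt flag, not the real one
    traps = 0
    pc = 0
    while pc < len(program):
        try:
            pc = direct_execute(program, regs, pc)
        except Trap as t:
            traps += 1
            vstate["IF"] = (t.op == "sti")   # emulate against virtual state
            pc = t.pc + 1                    # resume at next guest instruction
    return regs, vstate, traps


program = [("add", "eax", 1), ("cli",), ("add", "eax", 2),
           ("sti",), ("add", "eax", 3), ("cli",)]
regs, vstate, traps = run_vm(program)
```

Only the three privileged instructions enter the VMM; the `add` instructions never leave the direct-execution loop.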


Classical Memory Virtualization


Traditional VMM Approach

Extra Level of Indirection
Virtual → Physical: guest maps VPN to PPN using primary page tables
Physical → Machine: VMM maps PPN to MPN

Shadow Page Table
Composite of two mappings
For ordinary memory references, hardware maps VPN to MPN
Cached by physical TLB

[Diagram: VPN → (guest) → PPN → (VMM) → MPN; the shadow page table maps VPN directly to MPN and feeds the hardware TLB]
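
As a minimal sketch of the "composite of two mappings" idea, the two translations can be modeled as dictionaries and composed. The page numbers, the 4 KB page size, and the function names are assumptions for illustration; real page tables are multi-level hardware structures.

```python
PAGE_SHIFT = 12  # assuming 4 KB pages


def build_shadow(guest_pt, pmap):
    """Compose VPN->PPN (guest primary table) with PPN->MPN (VMM map)."""
    return {vpn: pmap[ppn] for vpn, ppn in guest_pt.items() if ppn in pmap}


def translate(shadow, vaddr):
    """Translate a virtual address to a machine address in one lookup."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (shadow[vpn] << PAGE_SHIFT) | offset


guest_pt = {0x10: 0x2, 0x11: 0x3}   # guest's view: VPN -> PPN
pmap = {0x2: 0x7A, 0x3: 0x15}       # VMM's view:   PPN -> MPN
shadow = build_shadow(guest_pt, pmap)
machine_addr = translate(shadow, 0x10ABC)
```

The hardware only ever consults the composed table, which is why ordinary memory references run without VMM involvement.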


Memory Traces
Shadow Page Table
Derived from primary page table in guest
VMM must keep primary and shadow coherent

Trace = Coherency Mechanism
Write-protect primary page table
Trap guest writes to primary
Update or invalidate corresponding shadow
Transparent to guest
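
A toy model of a trace might look like the following. The class and method names are hypothetical, and the choice to invalidate the shadow entry and refill it lazily on the next access is one of the two options the slide mentions ("update or invalidate"), picked here for brevity.

```python
class ShadowMMU:
    """Keeps a shadow table (VPN->MPN) coherent with the guest's primary table."""

    def __init__(self, pmap):
        self.pmap = dict(pmap)   # PPN -> MPN, owned by the VMM
        self.primary = {}        # VPN -> PPN, owned by the guest
        self.shadow = {}         # VPN -> MPN, derived
        self.trace_faults = 0

    def guest_write_pte(self, vpn, ppn):
        # The primary table is write-protected, so in a real system this
        # write would fault into the VMM; we model the handler directly.
        self.trace_faults += 1
        self.primary[vpn] = ppn        # perform the write on the guest's behalf
        self.shadow.pop(vpn, None)     # invalidate the stale shadow entry

    def translate(self, vpn):
        if vpn not in self.shadow:     # refill lazily from the two mappings
            self.shadow[vpn] = self.pmap[self.primary[vpn]]
        return self.shadow[vpn]


mmu = ShadowMMU({1: 100, 2: 200})
mmu.guest_write_pte(0x10, 1)
first = mmu.translate(0x10)     # derived from the primary entry
mmu.guest_write_pte(0x10, 2)    # guest remaps the page; trace fires
second = mmu.translate(0x10)    # shadow reflects the new mapping
```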


Classical VMM Performance


Native Speed Except for Traps
No overhead in direct execution
Overhead = trap frequency × average trap cost

Trap Sources
Most frequent: guest page table traces
Privileged instructions
Memory-mapped device traces
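
The overhead formula can be worked through with made-up numbers. The trap rate, trap cost, and clock speed below are purely hypothetical and are not measurements from any system.

```python
def trap_overhead_fraction(traps_per_second, cycles_per_trap, cpu_hz):
    """Fraction of CPU time spent handling traps.

    Overhead = trap frequency x average trap cost, expressed here
    relative to the cycles available per second.
    """
    return traps_per_second * cycles_per_trap / cpu_hz


# Hypothetical workload: 50,000 traps/s at 2,000 cycles each on a 2 GHz CPU.
overhead = trap_overhead_fraction(50_000, 2_000, 2_000_000_000)
print(f"{overhead:.1%} of CPU time spent in traps")
```

With these numbers, 10^8 of the 2 × 10^9 available cycles per second go to trap handling, i.e. 5%.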


x86 Virtualization Challenges


Not Classically Virtualizable
x86 ISA includes instructions that read or modify privileged state
But which don't trap in unprivileged mode

Example: POPF instruction
Pop top-of-stack into EFLAGS register
EFLAGS.IF bit privileged (interrupt enable flag)
POPF silently ignores attempts to alter EFLAGS.IF in unprivileged mode!
So no trap to return control to VMM

Deprivileging not possible with x86!
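
The POPF problem can be illustrated with a deliberately simplified model. It covers only the interrupt flag in protected mode (IF is bit 9 of EFLAGS, and it may be changed only when CPL ≤ IOPL); all other flag rules are ignored, and the function name and signature are assumptions for this sketch.

```python
IF_MASK = 1 << 9  # EFLAGS.IF, the interrupt enable flag


def popf(eflags, popped, cpl, iopl=0):
    """Return the new EFLAGS after POPF, modeling only the IF bit.

    When the code is not privileged enough (cpl > iopl), the popped IF
    value is silently discarded: no fault, no trap, nothing for a VMM
    to intercept.
    """
    if cpl <= iopl:
        return popped
    return (popped & ~IF_MASK) | (eflags & IF_MASK)


current = 0x202          # IF set
wanted = 0x002           # guest kernel tries to clear IF

at_ring0 = popf(current, wanted, cpl=0)   # privileged: IF cleared
at_ring3 = popf(current, wanted, cpl=3)   # deprivileged: IF unchanged
```

A deprivileged guest kernel therefore believes it has disabled interrupts, while the VMM never learns that it tried.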


How to Virtualize x86?


Interpretation
Problem: too inefficient
x86 decoding slow

Code Patching
Problem: not transparent
Guest can inspect its own code

Binary Translation (BT)
Approach pioneered by VMware
Run any unmodified x86 OS in VM

Extend x86 Architecture


Software VMM: Binary Translation

Direct execute unprivileged guest application code
Will run at full speed until it traps, we get an interrupt, etc.

Binary translate all guest kernel code, run it unprivileged
Since x86 has non-virtualizable instructions, proactively transfer control to the VMM (no need for traps)
Safe instructions are emitted without change
For unsafe instructions, emit a controlled emulation sequence
VMM translation cache for good performance

VMware Translator Properties


Binary: input is x86 hex, not source
Dynamic: interleave translation and execution
On Demand: translate only what is about to execute (lazy)
System Level: makes no assumptions about guest code
Subsetting: full x86 to safe subset
Adaptive: adjust translations based on guest behavior


BT Mechanics
Each Translator Invocation
Consume a basic block (BB)
Produce a compiled code fragment (CCF)

Store CCF in Translation Cache
Future reuse
Capture working set of guest kernel
Amortize translation costs
Not patching in place

[Diagram: Input: BB (55 ff 33 c7 03 ...) → translator → Output: CCF (55 ff 33 c7 03 ...)]
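
The BB-to-CCF flow with a translation cache can be sketched as follows. This is a toy: instructions are plain strings, the sets of "unsafe" and block-terminating opcodes are invented for the example, and the emitted names `vmm_emulate_*` and `exit_to_dispatcher` are hypothetical placeholders for the emulation sequences and control-flow handling a real translator would generate.

```python
UNSAFE = {"cli", "sti", "popf"}        # toy set of non-virtualizable ops
TERMINATORS = {"jmp", "call", "ret"}   # a basic block ends at control flow


def translate_bb(guest_code, pc):
    """Consume one basic block starting at pc; return (CCF, next pc)."""
    ccf = []
    while pc < len(guest_code):
        op = guest_code[pc]
        pc += 1
        if op in TERMINATORS:
            ccf.append("exit_to_dispatcher")   # control flow is rewritten
            break
        if op in UNSAFE:
            ccf.append("vmm_emulate_" + op)    # controlled emulation sequence
        else:
            ccf.append(op)                     # IDENT: emitted unchanged
    return ccf, pc


class TranslationCache:
    def __init__(self, guest_code):
        self.guest_code = guest_code
        self.cache = {}          # guest pc -> (CCF, next pc)
        self.translations = 0

    def lookup(self, pc):
        if pc not in self.cache:             # translate on demand
            self.cache[pc] = translate_bb(self.guest_code, pc)
            self.translations += 1
        return self.cache[pc]                # reuse amortizes the cost


guest = ["push", "mov", "cli", "mov", "call", "mov", "ret"]
tc = TranslationCache(guest)
ccf, next_pc = tc.lookup(0)
tc.lookup(0)   # second lookup hits the cache; no retranslation
```

Note that the guest code itself is never modified, in line with "not patching in place".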


Example: IDENT Translation

BB
80304a69  push %ebp
80403a6a  push (%ebx)
80403a6c  mov  (%ebx), ffffffff
80403a72  mov  %edx, %esp
80403a74  mov  %esp, 81c(%ebx)
80403a7a  push %edx
80403a7b  mov  %ebp, %eax
80403a7d  call 80460ba4

CCF
25555b0   push %ebp
25555b1   push (%ebx)
25555b3   mov  (%ebx), ffffffff
25555b9   mov  %edx, %esp
25555bb   mov  %esp, 81c(%ebx)
25555c1   push %edx
25555c2   mov  %ebp, %eax
25555c4   push 80403a82          (push return address)
25555c9   int  3a                (invoke translator on callee)
25555cb   data: 80460ba4


Adaptive BT
Translated Code Is Fast
Mostly IDENT translations
Runs at speed

Except Writes to Traced Memory
Page fault (shown as !*! in the diagram)
Decode and interpret instruction
Fire trace callbacks
Resume execution
Can take 1000s of cycles

[Diagram: translation cache with a !*! fault on a traced write, and an exit to invoke the translator]


Adaptive BT: Fast Trace Handling


Detect and Track Trace Faults

Splice in TRACE Translation
Execute memory access in software
Avoid page fault
No re-decoding
Faster resumption

Faster Traces
10x performance improvement
Adapts to runtime behavior

[Diagram: translation cache in which the faulting translation now JMPs to a spliced-in TRACE translation; exit to invoke the translator]
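
The adaptive step can be modeled per instruction site: count trace faults, and once a site has faulted often enough, switch it to a TRACE-style translation that handles the access in software. The threshold and the cycle costs below are invented for illustration; only the roughly 10x ratio between them echoes the slide.

```python
FAULT_COST = 2000   # hypothetical cycles for fault + decode + callbacks
TRACE_COST = 200    # hypothetical cycles for the spliced-in TRACE path
THRESHOLD = 3       # hypothetical number of faults before retranslating


class AdaptiveSite:
    """One guest instruction that writes to traced memory."""

    def __init__(self):
        self.faults = 0
        self.mode = "IDENT"

    def traced_write(self):
        """Execute the write once and return its cost in cycles."""
        if self.mode == "TRACE":
            return TRACE_COST            # no page fault, no re-decoding
        self.faults += 1                 # detect and track the trace fault
        if self.faults >= THRESHOLD:
            self.mode = "TRACE"          # splice in the TRACE translation
        return FAULT_COST


site = AdaptiveSite()
costs = [site.traced_write() for _ in range(5)]
```

The first three executions pay the full fault cost; later ones take the cheaper path.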


Software VMM Evaluation


Benefits
Adaptation
Fast traces
Fast I/O emulation
Flexibility

Costs
Running translator
Path lengthening
System call slowdown
Complexity


Hardware-Assisted VMM
Recent x86 Extension
1998-2005: Software-only VMMs using binary translation
2005: Intel and AMD start extending x86 to support virtualization

First-Generation Hardware
Enables classical trap-and-emulate VMMs
Intel VT, aka Vanderpool Technology
AMD SVM, aka Pacifica

Performance
VT/SVM help avoid BT, but not MMU ops (actually slower!)
Main problem is efficient virtualization of MMU and I/O, not executing the virtual instruction stream


VT/SVM Architecture
Diagram
Y-axis: old school x86 privilege (CPL 0 through CPL 3)
X-axis: virtualization privilege (host vs. guest)

Guest Mode
Runs unmodified OS
Sensitive operations exit (trap out) to host mode

VMCB
Virtual Machine Control Block
VMM-controlled, hardware-walked
Buffers simple exits

[Diagram: two columns of privilege levels CPL 3 to CPL 0, one for Host and one for Guest]


Hardware-Assisted VMM

[Diagram: the guest runs in guest mode (CPL 0-3) under hardware-assisted direct execution; a fault, trace, interrupt, I/O, etc. exits to the VMM in host mode (CPL 0-3), which then resumes the guest]


Hardware-Assisted VMM Evaluation


Benefits
Simplicity (no BT)
Fast system calls
No translator overheads

Costs
Exits: 1000s of cycles for traces and I/O
No adaptation or software flexibility
Stateless model

Future
Hardware support for fast MMU virtualization
Intel EPT, AMD NPT


What is Paravirtualization?
Full Virtualization
No modifications to guest OS
Excellent compatibility, good performance, but complex

Paravirtualization Exports Simpler Architecture
Term coined by Denali project in '01, popularized by Xen
Modify guest OS to be aware of virtualization layer
Remove non-virtualizable parts of architecture
Avoid rediscovery of knowledge in hypervisor
Excellent performance and simple, but poor compatibility

Ongoing Linux Standards Work
Paravirt Ops interface between guest and hypervisor
Small team from VMware, Xen, IBM LTC, etc.
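
The idea of an operations interface between guest and hypervisor can be sketched as a table of functions with interchangeable backends. This is an illustration of the concept only: the class names, the `shared` dictionary standing in for memory shared with the hypervisor, and the `pending` flag are all hypothetical and do not reproduce the Linux interface.

```python
class NativeOps:
    """Backend for bare hardware: would execute the real instructions."""

    def __init__(self, cpu):
        self.cpu = cpu

    def irq_disable(self):
        self.cpu["IF"] = False      # stands in for executing CLI

    def irq_enable(self):
        self.cpu["IF"] = True       # stands in for executing STI


class HypervisorOps:
    """Backend for a paravirtualized guest: cooperates with the hypervisor."""

    def __init__(self, shared):
        self.shared = shared        # hypothetical state shared with the hypervisor
        self.hypercalls = 0

    def irq_disable(self):
        self.shared["virtual_IF"] = False   # plain memory write, no trap

    def irq_enable(self):
        self.shared["virtual_IF"] = True
        if self.shared.get("pending"):
            self.hypercalls += 1            # stands in for a hypercall
            self.shared["pending"] = False


def critical_section(ops, work):
    """Guest kernel code written once against the ops interface."""
    ops.irq_disable()
    try:
        return work()
    finally:
        ops.irq_enable()


cpu = {"IF": True}
native_result = critical_section(NativeOps(cpu), lambda: 42)

shared = {"virtual_IF": True, "pending": True}
hv = HypervisorOps(shared)
seen_inside = critical_section(hv, lambda: shared["virtual_IF"])
```

The guest code in `critical_section` is the same in both cases; only the backend bound at startup differs.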


Paravirtualization: Conceptual Diagram

[Diagram: two stacks of Guest OS / Hypervisor / Hardware, one labeled Full Virtualization and one labeled Paravirtualization. Labels in the diagram: "System call interface", "Hypercalls (GOOD)", "NOT GOOD!"]

VMware Vision: Transparent Paravirtualization


Same OS binary

[Diagram: VMI Linux shown running natively, on Xen 3.0.x (Dom0 and DomU, alongside Xeno Linux), and on VMware ESX (alongside Windows and Solaris)]


Further Reading
VMware Publications
www.vmware.com/academic/resources.html
A Comparison of Software and Hardware Techniques for x86 Virtualization (ASPLOS '06)
Fast Transparent Migration for Virtual Machines (USENIX '05)
Memory Resource Management in VMware ESX Server (OSDI '02)
Virtualizing I/O Devices on VMware Workstation's Hosted VMM (USENIX '01)

Additional Academic Publications
Xen and the Art of Virtualization (SOSP '03)
Disco: Running Commodity Operating Systems on Scalable Multiprocessors (SOSP '97)
Many more


Additional Topics
I/O Virtualization
Memory Management


I/O Virtualization Stack


Guest Device Driver

Virtual Device
Model existing device, e.g. e1000
Model an idealized device, e.g. vmxnet

Virtualization Layer
Emulates the virtual device
Remaps guest and real I/O addresses
Multiplexes and drives physical device
Provides additional features, e.g. transparent NIC teaming

Real Device
Physical hardware, e.g. bcm5700
Likely to be different than virtual device

[Diagram: Guest OS with its device driver, above a virtualization layer containing device emulation, I/O stack, and device driver]

I/O Virtualization Implementations


Emulated I/O: Hosted or Split
Device emulation, I/O stack, and device driver in Host OS / Dom0 / Parent Domain
VMware Workstation, VMware Server, VMware ESX Server (for slow devices), Xen, Microsoft Viridian, Virtual Server

Emulated I/O: Hypervisor Direct
Device emulation, I/O stack, and device driver in the hypervisor
VMware ESX Server (storage and network)

Passthrough I/O
Guest OS device driver, with a device manager below
A Future Option
Many Challenges


Passthrough I/O Virtualization


High Performance
Guest drives device directly
Minimizes CPU utilization

Enabled by HW Assists
I/O-MMU for DMA isolation, e.g. Intel VT-d, AMD IOMMU
Partitionable I/O device, e.g. PCI-SIG IOV spec

Challenges
Hardware independence
Migration, suspend/resume
Memory overcommitment

[Diagram: three guest OSes, each with a device driver, above a virtualization layer containing the I/O MMU and device manager; the I/O device exposes one PF and three VFs]
PF = Physical Function, VF = Virtual Function


Additional Topics
I/O Virtualization
Memory Management


Memory Management
Desirable capabilities
Efficient memory overcommitment
Accurate resource controls
Exploit sharing opportunities

Challenges
Allocations should reflect both importance and working set
Best data to guide decisions known only to guest OS
Guest and meta-level policies may clash


VMware Memory Management


Reclamation mechanisms
Ballooning: guest driver allocates pinned PPNs, hypervisor deallocates backing MPNs
Swapping: hypervisor transparently pages out PPNs, paged in on demand
Page sharing: hypervisor identifies identical PPNs based on content, maps to same MPN copy-on-write

Allocation policies
Proportional sharing: revoke memory from VM with minimum shares-per-page ratio
Idle memory tax: charge VM more for idle pages than for active pages to prevent unproductive hoarding
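
One way to combine the two allocation policies is an adjusted shares-per-page ratio in which idle pages cost more than active ones. The sketch below assumes the form ρ = S / (P · (f + k · (1 − f))) with k = 1 / (1 − τ), where S is shares, P is allocated pages, f is the fraction of active pages, and τ is the tax rate; this form and the default τ = 0.75 are taken on trust from the ESX memory management paper listed under Further Reading and should be checked against it. The function names and sample numbers are hypothetical.

```python
def adjusted_ratio(shares, pages, active_fraction, tax_rate=0.75):
    """Shares-per-page ratio with idle pages charged at a higher rate.

    tax_rate must be below 1. With tax_rate = 0 this reduces to the
    plain shares-per-page ratio.
    """
    k = 1.0 / (1.0 - tax_rate)     # cost of an idle page relative to an active one
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(vms, tax_rate=0.75):
    """Return the VM to reclaim from: the one with the minimum ratio.

    vms maps a VM name to (shares, pages, active_fraction).
    """
    return min(vms, key=lambda name: adjusted_ratio(*vms[name], tax_rate=tax_rate))


vms = {
    "A": (100, 100, 1.0),   # every page active
    "B": (100, 100, 0.2),   # mostly idle
}
victim = pick_victim(vms)
```

Both VMs hold the same shares and pages, so without the tax they would tie; with it, the mostly idle VM is chosen.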


Ballooning
inflate balloon (+ pressure): guest OS may page out to virtual disk
deflate balloon (− pressure): guest OS may page in from virtual disk
guest OS manages memory
implicit cooperation

[Diagram: guest OS memory shown with the balloon at different sizes]
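
A toy accounting model shows how inflating the balloon pushes the guest to make its own paging decisions. The class, its fields, and the simple "page out exactly the shortfall" policy are assumptions for illustration; a real guest OS chooses victims with its own replacement policy.

```python
class Guest:
    """Tracks guest pages: in use, held by the balloon, or paged out."""

    def __init__(self, total, used):
        self.total = total        # PPNs the guest believes it has
        self.used = used          # pages holding guest data in memory
        self.balloon = 0          # pinned pages held by the balloon driver
        self.paged_out = 0        # guest data evicted to the virtual disk

    @property
    def free(self):
        return self.total - self.used - self.balloon

    def inflate(self, n):
        """Balloon driver allocates n pinned pages inside the guest."""
        shortfall = max(0, n - self.free)
        self.used -= shortfall            # guest pages out under pressure
        self.paged_out += shortfall
        self.balloon += n
        return n                          # MPNs the hypervisor can reclaim

    def deflate(self, n):
        """Balloon driver releases n pages back to the guest."""
        self.balloon -= n
        page_in = min(self.paged_out, self.free)
        self.used += page_in              # guest may page data back in
        self.paged_out -= page_in


g = Guest(total=100, used=70)
reclaimed = g.inflate(40)   # 30 free pages are not enough; 10 get paged out
g.deflate(40)               # pressure released; the 10 pages come back
```

The hypervisor never decides which guest pages are evicted; it only sets the pressure.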


Page Sharing
Motivation
Multiple VMs running same OS, apps
Collapse redundant copies of code, data, zeros

Transparent page sharing
Map multiple PPNs to single MPN copy-on-write
Pioneered by Disco [Bugnion 97], but required guest OS hooks

Content-based sharing
General-purpose, no guest OS changes
Background activity saves memory over time


Page Sharing: Scan Candidate PPN


[Diagram: the contents of a candidate page (011010 110101 010111 101100 ...) are hashed to ...2bd806af; VM 1, VM 2, and VM 3 map into machine memory; the hash table records a hint frame with Hash: ...06af, VM: 3, PPN: 43f8, MPN: 123b]


Page Sharing: Successful Match

[Diagram: VM 1, VM 2, and VM 3 map into machine memory; on a successful match the hash table entry becomes a shared frame with Hash: ...06af, Refs: 2, MPN: 123b]
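
The scan-and-match flow from the last two slides can be sketched as a small model. It is a simplification under stated assumptions: the `Host` class and its fields are hypothetical, SHA-1 is used only because it is in the standard library, and the table stores just the MPN of the first page seen with each hash (a hint) rather than the full hint-frame record shown in the diagram.

```python
import hashlib


class Host:
    def __init__(self):
        self.mem = {}        # MPN -> page contents (bytes)
        self.pmap = {}       # (vm, ppn) -> MPN
        self.refs = {}       # MPN -> reference count, for shared frames
        self.table = {}      # content hash -> MPN (hint or shared frame)
        self.next_mpn = 0

    def alloc(self, vm, ppn, data):
        mpn = self.next_mpn
        self.next_mpn += 1
        self.mem[mpn] = data
        self.pmap[(vm, ppn)] = mpn
        return mpn

    def scan(self, vm, ppn):
        """Scan one candidate PPN; return True if it was shared."""
        mpn = self.pmap[(vm, ppn)]
        if self.refs.get(mpn, 1) > 1:
            return False                      # already a shared frame
        h = hashlib.sha1(self.mem[mpn]).hexdigest()
        other = self.table.get(h)
        if other is None or other == mpn or other not in self.mem:
            self.table[h] = mpn               # record a hint for later scans
            return False
        if self.mem[other] != self.mem[mpn]:
            return False                      # stale hint or hash collision
        self.pmap[(vm, ppn)] = other          # map both PPNs to one MPN
        self.refs[other] = self.refs.get(other, 1) + 1
        del self.mem[mpn]                     # reclaim the redundant copy
        return True

    def write(self, vm, ppn, data):
        mpn = self.pmap[(vm, ppn)]
        if self.refs.get(mpn, 1) > 1:         # copy-on-write for shared frames
            self.refs[mpn] -= 1
            self.alloc(vm, ppn, data)
        else:
            self.mem[mpn] = data


host = Host()
zeros = bytes(4096)
host.alloc(1, 0x10, zeros)
host.alloc(3, 0x43F8, zeros)
host.scan(1, 0x10)                 # first sighting: hint only
shared = host.scan(3, 0x43F8)      # match: both PPNs now share one MPN
host.write(3, 0x43F8, b"x" * 4096) # copy-on-write breaks the sharing
```

The full byte comparison before sharing matters because a hint can go stale: the hinted page may have been written since it was hashed.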
