Session xVI05
Open Source Virtualization with KVM
for IBM System x
Tom.Schwaller@de.ibm.com - Linux Architect
Since 2006, his main focus has been on high performance and cloud computing,
virtualization, high-end x86 systems, iDataPlex & BladeCenter, and high-speed
InfiniBand/10GbE networking and storage (incl. GPFS). From 2007 to 2008,
he was Deep Computing Lead Architect in CEEMEA. He currently
works as Lead Architect for a major Linux desktop cloud project in
Germany.
© 2010 IBM Corporation
IBM Systems Technical University - Budapest, 3-6 May 2010 - xVI06 - Open Source Virtualization with KVM on IBM System x
Agenda
x86-Virtualization
KVM (Kernel-based Virtual Machine)
KVM Performance Tuning
– KSM (Kernel Samepage Merging)
– VirtFS
KVM I/O Architecture (Evolution)
– Virtio
– I/O Acceleration with vhost-net & SR-IOV
KVM Security
– SELinux & sVirt
QEMU
– Creating Disk Images & Manual Installation
x86-Virtualization
[Figure: x86 virtualization approaches (guest VMs on Xen with Domain 0; binary translation)]
An Alternative to Xen
In 2006-2007, kernel developers started talking
about an alternative to Xen that was more closely
aligned with Linux
– A few issues stood out:
• NUMA Support
• Control Tool Stack
A startup, Qumranet, wanted to build a VDI solution
using Open Virtualization
Qumranet never intended to be a hypervisor vendor
Leveraging Hardware
Design virtualization support around hardware virtualization
– Hardware virtualization support is pervasive
– Modern Intel VT-x/AMD-V outperforms paravirtualization (PV)
Tremendous simplification comes from not supporting older hardware
– Intrusive Linux patching is unnecessary
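Since KVM is built around hardware virtualization support, the first practical check on a host is whether the CPU advertises it. Linux reports Intel VT-x as the `vmx` flag and AMD-V as the `svm` flag in `/proc/cpuinfo`. Below is a minimal Python sketch of that check; the function name `hw_virt_support` is hypothetical. The flag reflects CPU capability. Depending on the kernel version, it may still be listed when firmware has the feature disabled, and in that case loading the KVM module fails.

```python
from __future__ import annotations

from pathlib import Path


def hw_virt_support(cpuinfo_text: str) -> str | None:
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V) or None from /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "flags":
            continue  # only the per-CPU 'flags' lines list feature bits
        flags = set(value.split())
        if "vmx" in flags:
            return "vmx"
        if "svm" in flags:
            return "svm"
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(hw_virt_support(cpuinfo.read_text()) or "no VT-x/AMD-V flag found")
```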
Leveraging Linux
Xen is virtualization added to an exokernel
Linux is a proof-by-example that monolithic kernels
are more scalable/secure/fast/stable than microkernels/exokernels
– Linux dominates the top 100 supercomputers
– Linux has a large share in the embedded space
– Rising desktop/server market share
If Linux can be used in naval destroyers for systems control,
why can't it be used to run a couple dozen Windows XP instances
running Office?
Type 1 Hypervisor
– Not “bare metal” in the classical sense; the hypervisor is integrated into the Linux kernel
Introduces a new instruction execution mode: Guest Mode
– Executes VMs closer to the kernel, avoiding the user-mode context switching of a
traditional, non-kernel-integrated Type 2 hypervisor
A slightly modified QEMU is used to construct the HVM guest and to handle I/O
– virtio combines paravirtual drivers in the guest kernel with user-mode backends in QEMU for performance
[Figure: KVM execution flow: userspace ioctl() → switch to Guest Mode → kernel exit handler → userspace exit handler]
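The execution flow on this slide (ioctl(), switch to Guest Mode, kernel exit handler, userspace exit handler) can be modelled as two nested loops. The Python sketch below is a simulation only. On a real host, userspace (QEMU) calls `ioctl(vcpu_fd, KVM_RUN, 0)` on a vCPU file descriptor and then reads the exit reason (such as `KVM_EXIT_IO` or `KVM_EXIT_MMIO`) from the shared `kvm_run` structure. The exit names, the choice of which exits the kernel absorbs, and the function names are all illustrative assumptions.

```python
from __future__ import annotations

from collections import deque

# Illustrative assumption: exits the KVM kernel module resolves on its own.
# Any other exit makes the (simulated) KVM_RUN ioctl return to userspace.
KERNEL_HANDLED = {"EXTERNAL_INTERRUPT", "CPUID", "EPT_VIOLATION"}


def simulated_kvm_run(pending_exits: deque) -> str:
    """Stand-in for ioctl(vcpu_fd, KVM_RUN): run the guest in Guest Mode,
    let the kernel exit handler absorb what it can, and return the first
    exit reason that needs the userspace exit handler."""
    while pending_exits:
        reason = pending_exits.popleft()  # guest ran until this VM exit
        if reason in KERNEL_HANDLED:
            continue  # handled in the kernel; re-enter Guest Mode directly
        return reason
    return "HLT"  # nothing left to simulate: treat as a halted guest


def vcpu_thread(exits: list[str]) -> list[str]:
    """Userspace vCPU loop (the QEMU side): call KVM_RUN, handle the exit,
    repeat until the guest halts. Returns the exits seen in userspace."""
    pending = deque(exits)
    seen_in_userspace = []
    while True:
        reason = simulated_kvm_run(pending)
        seen_in_userspace.append(reason)
        if reason == "HLT":
            return seen_in_userspace
        # A real userspace exit handler would emulate the port I/O or MMIO
        # access here (e.g. a virtual device register) before looping back.
```

For example, `vcpu_thread(["CPUID", "IO", "EXTERNAL_INTERRUPT", "MMIO", "HLT"])` returns `["IO", "MMIO", "HLT"]`. Only the I/O and MMIO exits cost a round trip to userspace, which is why it matters for performance to keep frequent exits in the kernel.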
KVM Features
Power Management
– C and P state support
– Advanced governors
– Suspend/resume
Memory Management
– NUMA support
• Policy control
• Memory migration
– Swapping
– Overcommit
– Memory deduplication (KSM)
Resource Control
– cgroups
– CFS tunables
Anything that Linux supports
All hardware that Linux supports is supported by KVM
– Compare this to ESX!
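KSM lets many guests that run the same OS share identical memory pages. Below is a toy Python model of the idea; the class and method names are hypothetical. Identical page contents are backed by one shared frame, and a write to a shared page breaks the sharing, as copy-on-write does. The model makes two simplifications. It merges immediately, whereas the kernel merges during the periodic `ksmd` scan of regions marked with `madvise(MADV_MERGEABLE)`. It also matches pages by full content in a dictionary, whereas the kernel walks stable and unstable red-black trees.

```python
from __future__ import annotations

PAGE_SIZE = 4096


class KsmModel:
    """Toy model of KSM-style merging of identical guest pages."""

    def __init__(self) -> None:
        # page content -> number of guest pages mapped to that shared frame
        self.frames: dict[bytes, int] = {}
        # (vm name, guest page number) -> content currently mapped there
        self.pages: dict[tuple[str, int], bytes] = {}

    def _unmap(self, key: tuple[str, int]) -> None:
        old = self.pages.pop(key, None)
        if old is None:
            return
        self.frames[old] -= 1
        if self.frames[old] == 0:
            del self.frames[old]  # last user gone: frame can be freed

    def write(self, vm: str, page_no: int, content: bytes) -> None:
        """Guest writes a whole page. Writing to a shared page first breaks
        the sharing (copy-on-write); identical content is then merged."""
        if len(content) != PAGE_SIZE:
            raise ValueError("content must be exactly one page")
        key = (vm, page_no)
        self._unmap(key)
        self.pages[key] = content
        self.frames[content] = self.frames.get(content, 0) + 1

    def frames_used(self) -> int:
        return len(self.frames)

    def pages_mapped(self) -> int:
        return len(self.pages)


if __name__ == "__main__":
    m = KsmModel()
    zero = bytes(PAGE_SIZE)
    for vm in ("vm1", "vm2", "vm3"):
        for page_no in range(4):
            m.write(vm, page_no, zero)
    print(m.pages_mapped(), m.frames_used())  # 12 guest pages, 1 frame
    m.write("vm2", 0, b"\x01" * PAGE_SIZE)
    print(m.pages_mapped(), m.frames_used())  # 12 guest pages, 2 frames
```

On a real host, the corresponding counters are `pages_shared` and `pages_sharing` under `/sys/kernel/mm/ksm/`.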
Released hypervisor stack
– Libvirt toolchain: 100 kloc, Daniel Berrange (Red Hat)
– QEMU device model: 120 kloc, Anthony Liguori (IBM)
– KVM modules: 15 kloc, Avi Kivity (Red Hat)
VirtFS
VirtFS - Overview
What is it?
– File system pass-through mechanism between the KVM host
and guest operating systems (para-virtualized file system)
– Uses the Plan 9 protocol (9P2000.L) between client and server
• Simple, efficient protocol, maintained by IBM
– The server runs on the host as part of QEMU, using the VirtIO transport
– The client is part of the guest kernel
What are the expectations?
– Provide secure and isolated file system exports
between the KVM host and guest
– Close-to-native file system (GPFS) performance
– Multi-tenancy
Who needs it?
– VSC (Virtual Storage Cloud)
• SONAS on top of the VirtFS client in a KVM guest
– SONAS
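Here is how the protocol looks on the wire. Every 9P message starts with a 4-byte little-endian size (which counts itself), a 1-byte type and a 2-byte tag. The first exchange on a new channel is Tversion/Rversion, in which client and server agree on the maximum message size (`msize`) and the protocol dialect. The Python sketch below encodes that first request. It assumes the standard 9P2000 framing, which 9P2000.L keeps. The helper names are hypothetical, and the VirtIO transport that would carry these bytes is left out.

```python
from __future__ import annotations

import struct

# 9P message framing: size[4] type[1] tag[2] body..., all little-endian.
TVERSION = 100   # Tversion message type
RVERSION = 101   # Rversion (server reply) message type
NOTAG = 0xFFFF   # Tversion is sent with the special NOTAG tag


def encode_string(s: str) -> bytes:
    """9P strings are a 2-byte length followed by UTF-8 bytes (no NUL)."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def encode_tversion(msize: int, version: str = "9P2000.L") -> bytes:
    """Build a Tversion request: size[4] Tversion tag[2] msize[4] version[s]."""
    body = struct.pack("<BHI", TVERSION, NOTAG, msize) + encode_string(version)
    return struct.pack("<I", 4 + len(body)) + body  # size counts itself


def decode_version(msg: bytes) -> tuple[int, int, int, str]:
    """Parse a Tversion/Rversion message into (type, tag, msize, version)."""
    size, mtype, tag, msize, slen = struct.unpack_from("<IBHIH", msg, 0)
    if size != len(msg):
        raise ValueError("size field does not match message length")
    version = msg[13:13 + slen].decode("utf-8")  # header is 4+1+2+4+2 = 13 bytes
    return mtype, tag, msize, version


if __name__ == "__main__":
    req = encode_tversion(8192)
    print(len(req), decode_version(req))
```

The server answers with an Rversion (type 101) that carries the `msize` and version string it accepts. The same decoder can parse that reply.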
[Figure: VirtFS architecture. Guest: applications → VFS interface → VirtFS (v9fs) client in the guest kernel → VirtIO ring. Host: VirtFS server (v9fs server in QEMU, user space) → GPFS API / GPFS client → VFS interface → host kernel → hardware]
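In practice, the two halves of the VirtFS architecture (the server in QEMU on the host, the v9fs client in the guest kernel) are wired together by a QEMU option on the host and a mount in the guest. The Python sketch below only builds the command lines; it runs nothing. `/srv/share`, `hostshare` and `/mnt/host` are hypothetical example values, and the exact option set varies between QEMU versions. `-virtfs` is QEMU's shorthand for an `-fsdev` backend plus a `virtio-9p-pci` device.

```python
from __future__ import annotations

# Security models documented for QEMU's "local" fsdev backend; newer QEMU
# releases may accept more values.
SECURITY_MODELS = {"passthrough", "mapped"}


def qemu_virtfs_args(host_path: str, mount_tag: str,
                     security_model: str = "mapped") -> list[str]:
    """Host side: QEMU options exporting host_path to the guest over virtio-9p."""
    if security_model not in SECURITY_MODELS:
        raise ValueError(f"unknown security_model: {security_model}")
    if "," in host_path or "," in mount_tag:
        raise ValueError("commas would need escaping in QEMU option strings")
    return ["-virtfs",
            f"local,path={host_path},mount_tag={mount_tag},"
            f"security_model={security_model}"]


def guest_mount_cmd(mount_tag: str, mountpoint: str) -> list[str]:
    """Guest side: mount the export with the in-kernel v9fs client."""
    return ["mount", "-t", "9p", "-o", "trans=virtio,version=9p2000.L",
            mount_tag, mountpoint]


if __name__ == "__main__":
    print(" ".join(qemu_virtfs_args("/srv/share", "hostshare")))
    print(" ".join(guest_mount_cmd("hostshare", "/mnt/host")))
```

The `mount_tag` is the only name the two sides share. It is what the guest passes to `mount` in place of a device path.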
Virtio
First proposed by Rusty Russell
– Based on our experiences with Xen frontend/backend architecture
Addressed a number of concerns:
– Clear separation between protocol and transport, so that multiple
hypervisors can use it
– Each component uses a well-defined interface and is replaceable
– Minimal driver implementation required
– Fits on top of existing hardware abstraction well (PCI)
Linux will support lguest, KVM, Xen, KVM-lite, PHYP, VMware,
Viridian, and possibly more
– If each has 4-5 PV drivers, that's 35 new drivers!
– All drivers would be doing the same thing
virtio is an abstraction of the common mechanism of VMMs
– A single driver could, with little modification, run on many different VMMs
Especially important for “small” drivers (entropy driver, CPU hotplug,
ballooning, etc.)
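An abstract interface illustrates the protocol/transport split. In the Python sketch below, a driver is written once against `VirtioTransport` and works unchanged on any transport that implements it. All class names and the feature bit are hypothetical. In Linux, this role is played in C mainly by `struct virtio_config_ops`, so the sketch only mirrors the idea.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class VirtioTransport(ABC):
    """The hypervisor-specific part: how features are read and how the
    device is notified. One implementation per transport (PCI, lguest, ...)."""

    @abstractmethod
    def device_features(self) -> int: ...

    @abstractmethod
    def set_driver_features(self, features: int) -> None: ...

    @abstractmethod
    def notify(self, queue_index: int) -> None: ...


class EntropyDriver:
    """A 'small' driver written once against VirtioTransport only."""

    WANTED_FEATURES = 0x1  # hypothetical feature bit for this sketch

    def __init__(self, transport: VirtioTransport) -> None:
        self.transport = transport
        # Feature negotiation: use only bits both sides understand.
        self.features = transport.device_features() & self.WANTED_FEATURES
        transport.set_driver_features(self.features)

    def request_entropy(self) -> None:
        # Real code would first post a buffer on the virtqueue.
        self.transport.notify(0)


class RecordingTransport(VirtioTransport):
    """Toy transport that records calls instead of touching hardware."""

    def __init__(self, name: str, features: int) -> None:
        self.name = name
        self._features = features
        self.driver_features: int | None = None
        self.notified: list[int] = []

    def device_features(self) -> int:
        return self._features

    def set_driver_features(self, features: int) -> None:
        self.driver_features = features

    def notify(self, queue_index: int) -> None:
        self.notified.append(queue_index)


if __name__ == "__main__":
    for transport in (RecordingTransport("pci", 0x3), RecordingTransport("lguest", 0x2)):
        driver = EntropyDriver(transport)
        driver.request_entropy()
        print(transport.name, driver.features, transport.notified)
```

Supporting another hypervisor then means adding one transport class. No new driver is needed.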
Virtio Architecture
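The core data structure behind this architecture is the virtqueue. The Python sketch below models, at toy scale, the original ring layout (called the "split" virtqueue in the virtio specification): a descriptor table, an available ring written by the driver, and a used ring written by the device. The class and method names are hypothetical. The sketch omits descriptor chaining, notification suppression, interrupts and the memory barriers that a real shared-memory implementation needs.

```python
from __future__ import annotations

from typing import Callable


class SplitVirtqueue:
    """Toy virtio split ring: descriptor table, available ring (driver to
    device) and used ring (device to driver) with free-running 16-bit indices.
    Descriptors hold bytes here instead of guest-physical addresses."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("split virtqueue size must be a power of two")
        self.size = size
        self.desc: list[bytes | None] = [None] * size   # descriptor table
        self.free = list(range(size))                    # unused descriptors
        self.avail_ring = [0] * size
        self.avail_idx = 0        # next avail slot, advanced by the driver
        self.used_ring: list[tuple[int, int]] = [(0, 0)] * size
        self.used_idx = 0         # next used slot, advanced by the device
        self.last_avail = 0       # device's private read position
        self.last_used = 0        # driver's private read position

    # ---- driver (guest) side ----
    def add_buf(self, data: bytes) -> int:
        if not self.free:
            raise RuntimeError("virtqueue full")
        head = self.free.pop()
        self.desc[head] = data
        self.avail_ring[self.avail_idx % self.size] = head
        self.avail_idx = (self.avail_idx + 1) & 0xFFFF  # then "kick" the device
        return head

    def get_used(self) -> list[tuple[bytes, int]]:
        completed = []
        while self.last_used != self.used_idx:
            head, length = self.used_ring[self.last_used % self.size]
            completed.append((self.desc[head], length))
            self.desc[head] = None
            self.free.append(head)  # descriptor can be reused
            self.last_used = (self.last_used + 1) & 0xFFFF
        return completed

    # ---- device (host backend, e.g. QEMU) side ----
    def device_process(self, handler: Callable[[bytes], int]) -> int:
        processed = 0
        while self.last_avail != self.avail_idx:
            head = self.avail_ring[self.last_avail % self.size]
            length = handler(self.desc[head])  # bytes "used" by the device
            self.used_ring[self.used_idx % self.size] = (head, length)
            self.used_idx = (self.used_idx + 1) & 0xFFFF
            self.last_avail = (self.last_avail + 1) & 0xFFFF
            processed += 1
        return processed
```

For example, start with a `SplitVirtqueue(4)`, add `b"ping"` and `b"hello"`, and call `device_process(len)`. `get_used()` then returns `[(b"ping", 4), (b"hello", 5)]`. The power-of-two size keeps `index % size` consistent when the 16-bit indices wrap around.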
Virtio-net
Where most of the work is happening these days
Current performance is, in most cases, better than Xen
netfront/netback
– Xen suffers from asymmetric RX/TX performance
– KVM maintains symmetry on both
The mainline code still reaches only roughly 50% of native performance
– Active work is underway to improve that further
Uses the tun/tap device
– Added GSO (Generic Segmentation Offload) support to improve performance
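Counting packets is the easiest way to see the gain from GSO. With GSO, the guest hands one large buffer through tun/tap, and segmentation happens later, in the host stack or in NIC hardware (TSO). Without it, every MSS-sized segment crosses the guest/host boundary separately. The Python sketch below rests on simple assumptions: a single TCP payload of at most 64 KB, and a 1448-byte MSS, which is typical for a 1500-byte MTU with TCP timestamps. The function names are hypothetical.

```python
from __future__ import annotations

import math


def gso_segment(payload: bytes, mss: int) -> list[bytes]:
    """Cut one large TCP payload into MSS-sized segments. Real GSO also
    copies and fixes up the TCP/IP headers (sequence numbers, lengths,
    checksums) for each segment; only the payload split is shown here."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[off:off + mss] for off in range(0, len(payload), mss)]


def boundary_crossings(payload_len: int, mss: int, gso: bool) -> int:
    """Packets crossing the guest/host (tun/tap) boundary for one payload,
    with GSO (one super-packet) or without (one packet per segment)."""
    return 1 if gso else math.ceil(payload_len / mss)


if __name__ == "__main__":
    data = bytes(65536)
    segments = gso_segment(data, 1448)
    print(len(segments), boundary_crossings(len(data), 1448, gso=True))
```

For a 64 KB payload, that is 46 separate packets without GSO versus one with it.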
SR-IOV Overview
SR-IOV: Single Root I/O Virtualization
PCI-SIG standard
Ability to drive a PCIe function from
multiple independent software entities
Each software entity believes it has
exclusive access
Provides high throughput, low CPU
utilization, high scalability
Requires platform support
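On Linux, the physical function (PF) driver creates the virtual functions (VFs). Each VF then appears as a separate PCI device that can be assigned to a guest. The Python sketch below uses the generic `sriov_totalvfs`/`sriov_numvfs` sysfs files. It assumes that interface is available: it arrived in Linux 3.8, after this presentation, and drivers of that era (such as `igb`) used a `max_vfs` module parameter instead. The function name is hypothetical, and running it against real sysfs requires root.

```python
from __future__ import annotations

from pathlib import Path


def enable_vfs(pf_dir: Path, num_vfs: int) -> int:
    """Set the number of SR-IOV virtual functions on a physical function.

    pf_dir is the PF's sysfs directory, e.g. /sys/bus/pci/devices/<PF address>.
    Returns the VF count reported back by sysfs."""
    total = int((pf_dir / "sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"this PF supports 0..{total} VFs, not {num_vfs}")
    numvfs_file = pf_dir / "sriov_numvfs"
    current = int(numvfs_file.read_text())
    if current == num_vfs:
        return current
    if current != 0 and num_vfs != 0:
        # The kernel refuses to change a non-zero VF count directly,
        # so disable all VFs first.
        numvfs_file.write_text("0")
    numvfs_file.write_text(str(num_vfs))
    return int(numvfs_file.read_text())
```

Because the function takes the PF directory as a parameter, it can be exercised against a scratch directory that contains the two files before it touches real hardware.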
[Chart: CPU utilization]
QEMU
QEMU is a community-driven project
– No company has sponsored major portions of its development
QEMU does something remarkable
– Can emulate 9 target architectures on 13 host architectures!
– Provides full system emulation supporting ~200 distinct devices
– Very sophisticated and complete command line interface (CLI)
– There are more than 90 options in the output of qemu-kvm --help
It is the basis of KVM, Xen HVM, and xVM VirtualBox
– Every Open Source virtualization project uses QEMU
Userspace device model for KVM
– Provides management interface
– Provides device emulation
– Provides paravirtual I/O backends
libvirt communicates directly with QEMU
libvirt-cim communicates with libvirt
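The QEMU command line is also the most direct way to create a disk image and install a guest by hand. The Python sketch below builds the two commands. `guest.qcow2`, `install.iso`, the sizes and the helper names are hypothetical. The binary is called `qemu-kvm` as on the slide; many distributions now ship `qemu-system-x86_64` with `-enable-kvm` instead. Using a virtio disk assumes an installer that includes virtio drivers, such as a recent Linux distribution.

```python
from __future__ import annotations

import subprocess


def qemu_img_create(image: str, size: str, fmt: str = "qcow2") -> list[str]:
    """Command line to create an empty disk image, e.g. size="10G"."""
    return ["qemu-img", "create", "-f", fmt, image, size]


def qemu_kvm_install(image: str, iso: str, mem_mb: int = 1024,
                     vcpus: int = 2) -> list[str]:
    """Command line to boot an installer ISO with a virtio disk and NIC."""
    if "," in image:
        raise ValueError("commas in file names need escaping in -drive")
    return [
        "qemu-kvm",
        "-m", str(mem_mb),                      # guest RAM in MiB
        "-smp", str(vcpus),                     # number of virtual CPUs
        "-drive", f"file={image},if=virtio",    # paravirtual (virtio) disk
        "-cdrom", iso,
        "-boot", "d",                           # boot from the CD-ROM first
        "-net", "nic,model=virtio",             # paravirtual NIC ...
        "-net", "user",                         # ... with user-mode networking
    ]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run(qemu_img_create("guest.qcow2", "10G"))
    run(qemu_kvm_install("guest.qcow2", "install.iso"))
```

Once the installation finishes, the same command without `-cdrom` and `-boot d` boots the installed guest from its virtio disk.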
KVM Security
Similar work:
– XACE (X Access Control Extension)
– XSM (port of Flask to Xen)
Xen vulnerability: http://www.hacker-soft.net/Soft/Soft_13289.htm
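sVirt builds on SELinux Multi-Category Security (MCS). libvirt starts each guest's QEMU process with a dynamically chosen, unique category pair and labels that guest's disk image with the same pair. A compromised QEMU process therefore cannot open another guest's image. The Python sketch below is a simplified model of that allocation and check. The function names are hypothetical, it assumes the common c0.c1023 category range, and it models only the MCS category comparison, not SELinux type enforcement.

```python
from __future__ import annotations

import random

MAX_CATEGORY = 1023  # assumes the common c0.c1023 MCS category range


def allocate_categories(in_use: set[frozenset[int]],
                        rng: random.Random) -> frozenset[int]:
    """Pick a random, not yet used pair of MCS categories for a new VM."""
    while True:  # 523,776 possible pairs, so collisions are rare
        pair = frozenset(rng.sample(range(MAX_CATEGORY + 1), 2))
        if pair not in in_use:
            in_use.add(pair)
            return pair


def svirt_labels(pair: frozenset[int]) -> tuple[str, str]:
    """SELinux contexts for the VM's QEMU process and its disk image."""
    low, high = sorted(pair)
    level = f"s0:c{low},c{high}"
    return (f"system_u:system_r:svirt_t:{level}",
            f"system_u:object_r:svirt_image_t:{level}")


def mcs_allows(process_cats: frozenset[int], file_cats: frozenset[int]) -> bool:
    """MCS rule (simplified): the process must hold every category of the file."""
    return file_cats <= process_cats


if __name__ == "__main__":
    used: set[frozenset[int]] = set()
    rng = random.Random()
    vm_a = allocate_categories(used, rng)
    vm_b = allocate_categories(used, rng)
    print(svirt_labels(vm_a)[0])
    print(mcs_allows(vm_a, vm_a), mcs_allows(vm_a, vm_b))  # True False
```

Static labeling, as in the virt-manager screenshot, replaces the random choice with a pair that the administrator fixes per guest.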
[Screenshot: virt-manager with static SELinux labeling]
Questions?
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see
www.ibm.com/legal/copytrade.shtml: AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/390, VM/ESA, VSE/ESA,
WebSphere, xSeries, z/OS, zSeries, z/VM
Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation
Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries
Linux is a registered trademark of Linus Torvalds
UNIX is a registered trademark of The Open Group in the United States and other countries.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
Intel is a registered trademark of Intel Corporation
* All other products may be trademarks or registered trademarks of their respective companies.
NOTES:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that
any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject
to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.
The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of
the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those
Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.