
IBM Systems Technical University - Budapest, 3-6 May 2010

Session xVI06
Open Source Virtualization with KVM
for IBM System x
Tom.Schwaller@de.ibm.com - Linux Architect

© 2010 IBM Corporation


IBM Systems Technical University - Budapest, 3-6 May 2010 - xVI06 - Open Source Virtualization with KVM on IBM System x

Tom Schwaller (tom.schwaller@de.ibm.com)


Linux IT Architect, IBM Germany
Tom Schwaller studied Mathematics and Theoretical Physics at the
Swiss Federal Institute of Technology in Zurich and afterwards worked
on several high performance computing research projects. From
1996-2001 he was editor-in-chief of the German Linux Magazine
and also cofounded the Linux New Media AG.

Since June 2001 he has worked as a Linux IT Architect at IBM Germany and,
as a member of the Linux Impact Team (until the end of 2005), helped many
IBM customers with their Linux migration. As IBM's Linux Evangelist he
also supported hundreds of Linux customer briefings, gave dozens of
radio, TV and press interviews, represented IBM as keynote speaker on
all major German Linux events (LinuxTag, LinuxWorldExpo, LinuxPark,
etc.) and was advisory board member / chairman of several conferences
(SambaXP, iX Eclipse Conference, First International Virtualization
Conference, etc.). As EMEA Linux Desktop Technical Leader he also
coauthored the very successful IBM Linux Client Migration Cookbook.

Since 2006 his main focus is on high performance and cloud computing,
virtualization, high end x86 systems, iDataPlex & BladeCenter and high
speed Infiniband/10GbE networking/storage (incl. GPFS). From 2007-2008
he was Deep Computing Lead Architect in CEEMEA. At the moment he
works as Lead Architect for a major Linux Desktop Cloud project in
Germany.

Agenda
 x86-Virtualization
 KVM (Kernel-based Virtual Machine)
 KVM Performance Tuning
– KSM (Kernel SamePage Merging)
– VirtFS
 KVM I/O-Architecture (Evolution)
– Virtio
– I/O Acceleration with vhost-net & SRIOV
 KVM Security
– SELinux & sVirt
 QEMU
– Creating Disk Images & manual Installation

Thanks to Anthony Liguori, Hollis Blanchard, Ram Pai


x86-Virtualization


Some Virtualization Use Cases


 Datacenter Consolidation
– Primary driver: increase utilization (primarily CPU, memory)
– Special case: security isolation (e.g. hosting providers)
 Hardware Abstraction
– Driver: new hardware with older guest OS
– Support via virtual device drivers, e.g. ATA disk
 Leverage New Technologies (FCoE, 10 GbE)
 Cloud Computing
– Resources on demand, pay per use
 Development and Testing
– Driver: multiple development and testing environments,
isolation from main workspace in host. Fault injection.
 Virtual Desktop
– Multiple OS
– Legacy applications (DOS, Windows)
 Thin Clients
– Provision, manage, high availability
– Accessible from everywhere


Evolution of x86 Hypervisors

[Diagram]
First Generation: software-based. VMs run on a binary-translation hypervisor; all virtualization logic is implemented in software.
Second Generation: para-virtualization. VMs with modified (PV) kernels and PV drivers run on a hypervisor with a privileged Domain 0 (the Xen model).
Third Generation: hardware-based / OS virtualization. Unmodified VMs run on Linux with KVM, relying on x86 VT hardware support.


KVM (Kernel-based Virtual Machine)


An Alternative to Xen
 In 2006-2007, kernel developers started talking
about an alternative to Xen that was more closely
aligned with Linux
– A few issues stood out:
• NUMA Support
• Control Tool Stack
 A startup, Qumranet, wanted to build a VDI solution
using Open Virtualization
 Qumranet never intended to be a hypervisor vendor


Leveraging Hardware
 Design virtualization support around hardware virtualization
– Hardware virtualization support is pervasive
– Modern Intel VT-x/AMD-V outperforms paravirtualization (PV)
 Tremendous simplification comes from not supporting older hardware
– Intrusive Linux patching is unnecessary

Leveraging Linux
 Xen is virtualization added to an exokernel
 Linux is a proof-by-example that monolithic kernels
are more scalable/secure/fast/stable than microkernels/exokernels
– Linux dominates the top 100
– Linux has a large share in the embedded space
– Rising desktop/server market shares
 If Linux can be used in Naval destroyers for systems control,
why can't it be used to run a couple dozen Windows XP instances
running Office?


KVM Design Philosophy


 Guest is a userspace process
– Not just from a management perspective
 From a performance perspective, switching to and from the
guest is roughly equivalent to switching between kernel
and userspace
 The problems that need to be solved to do something in
virtualization are roughly the same as to do them in userspace
– PCI passthrough == userspace PCI drivers
– Unsurprisingly, interrupt sharing is the major issue for both
 If we solve the problems for Linux userspace in general, we
solve the problems for virtualization


Kernel-based Virtual Machine (KVM) - Overview


 Converts Linux into a Type-1 Hypervisor
 Incorporated into the Linux kernel in 2006 http://www.linux-kvm.org
http://www.linux-kvm.com
 Runs Windows, Linux and other guests
 KVM architecture leverages the power of Linux
– Built on trusted, stable enterprise grade platform
– Scheduler, memory management, hardware support etc.
– Ease of management - same Linux paradigm
 Advanced features
– Inherit scalability, NUMA support, power management, hot-plug from Linux
• Red Hat Hypervisor (KVM) expected to support >96 cores / 1 TB RAM on host
and 16 vCPU / 64 GB RAM on guest!
– SELinux security, Real-Time scheduler, RAS support, OpenGL for guests
– Live Migration of virtual machines
– VM Storage access (iSCSI, AoE, FCoE, GNBD, cLVM,..) from Linux
 Hybrid-mode operation
– Run regular Linux applications side-by-side with Virtual Machines on
the same server - much higher degree of hardware efficiency

Linux KVM - Architecture

 Type 1 Hypervisor
– Not “bare metal” in a classical sense, but hypervisor is kernel-integrated
 Introduces new instruction execution mode – Guest Mode
– Executes VMs closer to the kernel, avoiding the user-mode context switching of a
traditional, non-kernel-integrated Type 2 hypervisor
 Slightly modified QEMU is used for HVM construct and I/O
– virtio utilizes user mode virtio drivers inherent in Kernel/QEMU for performance


KVM Execution Model


[Diagram] Columns: Userspace | Kernel | Guest

1. Userspace issues an ioctl() asking KVM to run the guest.
2. The kernel switches to Guest Mode; guest code executes natively.
3. On a lightweight exit, the kernel exit handler services the exit and re-enters the guest.
4. On a heavyweight exit, control returns all the way to the userspace exit handler (QEMU) before the guest is resumed.


KVM is a Virtualization Driver


 KVM is a small kernel driver that adds virtualization
support on multiple architectures
– AMD, Intel (included in 2.6.20)
• KVM-lite: PV Linux guest on non-VTx / non-SVM host
– IA64 (included in 2.6.26)
– S390 (included in 2.6.26)
– Embedded PowerPC (power.org, included in 2.6.26)
 About 30k LOC, compared to ~250k LOC for Xen
 Uses QEMU in userspace as a device model
 Safe to use by unprivileged userspace processes
 Can leverage almost all Linux features


KVM Features
 Power Management
– C and P state support
– Advanced governors
– Suspend/resume
 Memory Management
– NUMA support
• Policy control
• Memory migration
– Swapping
– Overcommit
– Page deduplication (KSM)
 Resource Control
– cgroups
– CFS tunables
 Anything that Linux supports
 All Hardware that Linux supports is supported in KVM
– Compare this to ESX!

KVM Upstream Maintainerships


[Diagram] Component, size, maintainer, affiliation:
– CIM Providers (libvirt-cim), ~25 kloc: Kaitlin Rupert, IBM
– Libvirt toolchain, ~100 kloc: Daniel Berrange, Red Hat
– QEMU device model, ~120 kloc: Anthony Liguori, IBM
– KVM kernel modules, ~15 kloc: Avi Kivity, Red Hat
Together these form the released hypervisor.


RHEL KVM Roadmap

RHEL 5.5 (03/2010) - Functions
 Nehalem-EX performance optimizations
 Device model improvements
 More flexible interrupt handling
 PCI pass-through
 Bug fixes (vis-a-vis 5.4)

RHEL 6.0 (2Q/2010) - Functions
 Improved I/O throughput
 SR-IOV support
 Virtual switch enhancements
 Improved RAS via Intel MCA
 Storage and network management APIs

RHEL 6.x (tbd) - Functions
 LPAR mode
 Aptus support
 HA and node failover
 Clustered file system
 UEFI guest BIOS
 Westmere performance optimizations


KSM (Kernel SamePage Merging)


KSM - Memory Page Sharing


 Implemented as loadable kernel module
– Kernel SamePage Merging (KSM) included in Linux kernel 2.6.32 (Izik Eidus)
– modprobe ksm
– cat /sys/kernel/mm/ksm/max_kernel_pages (-> 2000)
– cat /sys/kernel/mm/ksm/pages_sharing (>0)
 Kernel scans memory of virtual machines
– Looks for identical pages
– Merges identical pages
– Only stores one copy (read only) of shared memory
– If a guest changes the page it gets its own private copy
 qemu-kvm KSM-patch added to kvm
development tree after kvm-88 release
 Significant hardware savings
– Better consolidation ratio
– Allows more virtual machines to run per host
• Memory Overcommit (avoiding Linux Swapping)
• 600 VMs (web) on host with 48 cores and 256 GB RAM! (Red Hat claim)
http://www.linux-kvm.com/content/using-ksm-kernel-samepage-merging-kvm
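The sysfs counters above can be wrapped in a small monitoring helper. A minimal sketch (the KSM_DIR override and the helper names are ours, for illustration; the saving formula is the usual rough estimate of pages deduplicated times 4 KB):

```shell
#!/bin/sh
# Report KSM page-sharing statistics from sysfs.
# KSM_DIR defaults to the real location; override it e.g. for testing.
KSM_DIR="${KSM_DIR:-/sys/kernel/mm/ksm}"

ksm_stat() {
    # Print one KSM counter, e.g. pages_sharing, pages_shared, full_scans
    cat "$KSM_DIR/$1"
}

ksm_saved_kb() {
    # Rough saving estimate: (pages_sharing - pages_shared) pages of 4 KB each
    sharing=$(ksm_stat pages_sharing)
    shared=$(ksm_stat pages_shared)
    echo $(( (sharing - shared) * 4 ))
}
```

On a KSM-enabled host, `ksm_saved_kb` gives a quick view of how much guest memory the merge daemon is currently saving.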

IBM Linux Technology Center - KSM Recommendations


 Include the virtio balloon driver and auto-ballooning daemon in RHEL 5.x guests.
Tests show that ballooning is necessary for KVM over-commitment to be effective
for Linux guests (1.7x).
 Substitute a more recent 2.6.3x kernel for the default RHEL 5.x 2.6.18 kernel. These
later kernels are significantly better at handling low-memory situations than the
default kernel.
 Decrease the frequency or randomize the runtime of periodic daemons. For
example, the yum-updatesd daemon loads by default 1 hour after boot. When this
daemon loads concurrently on over-committed guests, all the guests fault in the
python runtime, causing a concurrent resource spike. Randomizing the start time
avoids this memory-usage spike.
 Setting the swappiness tunable in the KVM host to zero significantly improves
performance by causing the host to prefer evicting page cache pages before guest
pages.
 Turn zone reclaim off. Initial testing showed that memory pressure caused by
over-commitment can trigger some unexpected performance reductions when specific
NUMA memory zones become fully allocated.
 Turn NUMA off. Provisionally we see better results with NUMA turned off. We suspect
there are some latent bugs in the NUMA allocation that we should find and fix.
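The last three recommendations map to a few concrete host settings. A sketch of what that could look like (values as recommended above; validate them against your own workload before deploying):

```shell
# KVM host tuning per the recommendations above (run as root)
sysctl -w vm.swappiness=0          # prefer evicting page cache over guest pages
sysctl -w vm.zone_reclaim_mode=0   # turn zone reclaim off
# "NUMA off" for a guest: interleave its memory across all nodes, e.g.
#   numactl --interleave=all qemu-kvm -m 1024 guest.img
```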


VirtFS


VirtFS - Overview
 What is it?
– Filesystem pass-through mechanism between the KVM host
and guest operating systems (para-virtualized file system)
– Uses the Plan 9 protocol (9P2000.L) between client and server
• Simple, efficient protocol, maintained by IBM
– The server runs on the host and is part of QEMU, with a VirtIO transport
– The client is part of the guest kernel.
 What are the expectations?
– Provide secure and isolated Filesystem exports
between the KVM Host and Guest.
– Close to native Filesystem (GPFS) performance
– Multi-tenancy
 Who Needs it?
– VSC (Virtual Storage Cloud)
• SoNAS on top of VirtFS client in a KVM guest
– SoNAS

VirtFS - Block Diagram

[Block diagram] In the guest: apps sit on the VFS interface, served by the VirtFS (v9fs) client in the guest kernel. A VirtIO ring is the transport to the host. On the host: the VirtFS server (a v9fs server inside QEMU, in user space) uses the GPFS API / GPFS client over the host kernel's VFS interface, down to the hardware.


VirtFS and GPFS


 Why not use GPFS Directly on the Guest?
– GPFS limits the number of cluster cross mounts, which in turn
limits the number of virtual machines.
– GPFS in the guests would consume significant resources
(memory) due to its fixed i-node cache allocation.
– GPFS is much more restrictive (narrow) about supported
kernel versions and operating systems.
– Disk management becomes difficult in a dynamically
changing environment.


Virtio


Virtio

First proposed by Rusty Russell
– Based on our experiences with Xen frontend/backend architecture

Addressed a number of concerns:
– Clear separation between protocol and transport, so that multiple
hypervisors can utilize it
– Each component uses well defined interface and is replaceable
– Minimum driver implementation required
– Fits on top of existing hardware abstraction well (PCI)

Linux will support lguest, KVM, Xen, KVM-lite, PHYP, VMware,
Viridian, and possibly more
– If each has 4-5 PV drivers, that's 35 new drivers!
– All drivers would be doing the same thing

virtio is an abstraction of the common mechanism of VMMs
– A single driver could, with little modification, run on many different VMMs

Especially important for “small” drivers (entropy driver, CPU hotplug,
ballooning, etc.)


Virtio Architecture

[Diagram] Front-end drivers (virtio-net, virtio-blk, virtio-video, virtio-9p) sit on the common virtio layer; below it, hypervisor-specific backends (kvm virtio backend, lguest virtio backend, xen virtio backend) bind virtio to each VMM.


Virtio for KVM


 Paravirtualized Drivers for KVM/Linux
– virtio was chosen to be the main platform for IO virtualization in KVM
– The idea behind it is to have a common framework for hypervisors for IO
virtualization (same in XEN)
– network/block/balloon/PCI passthrough devices are supported for KVM
– The host implementation is in userspace (qemu), so no driver is needed in
the host (but this still has some performance issues)
 Hardware assisted Virtualization
– Support for advanced hardware features for both KVM and Xen
• VT-d for secure PCI Pass-thru on Intel platforms
• IOMMU for secure PCI Pass-thru on AMD platforms
• PCI Single-Root I/O Virtualization (SR-IOV)
– Delivers native I/O performance for network and block devices
 Support for Microsoft Windows Servers guests
– Paravirtualized drivers for network and disk (WHQL certified -> Enterprise Distros)
– Microsoft SVVP Certification (-> Enterprise Distros)


Current State of Virtio



Drivers:
− virtio-net
− virtio-blk
− virtio-console
− virtio-balloon
− virtio-random

Transports:
− virtio-pci
− virtio-s390
− virtio-lguest


Virtio-net


virtio-net

Where most of the work is happening these days

Current performance is, in most cases, better than Xen
netfront/netback
− Xen suffers from asymmetric RX/TX performance
− KVM maintains symmetry on both

The mainline bits are still only roughly 50% of native
− Active work underway to improve that further

Uses the tun/tap device
− Added GSO support to improve performance
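A typical invocation wiring a guest NIC to a host tap device with virtio might look as follows. This is a hedged sketch: guest.img, tap0, br0 and the MAC address are placeholders, and the bridge is assumed to already exist:

```shell
# Create a tap device, attach it to an existing bridge, start the guest
tunctl -t tap0                       # or: ip tuntap add dev tap0 mode tap
brctl addif br0 tap0
qemu-kvm -m 1024 -smp 2 \
  -drive file=guest.img,if=virtio \
  -net nic,model=virtio,macaddr=52:54:00:12:34:56 \
  -net tap,ifname=tap0,script=no
```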


KVM netperf with virtio-net

[Chart: netperf throughput results; © 2009 Chris Wright]


KVM netperf with Device Assignment

[Chart: netperf throughput results; © 2009 Chris Wright]


I/O Acceleration with


vhost-net & SRIOV

Thanks to Ram Pai (IBM)


Virtualization Phase I - Qemu Emulation

 Emulation entirely done in user space
 Qemu-emulated hardware
– E1000/RTL NIC
– IDE driver
 No hardware support, no host OS support
 Qemu/guest is just a user process for the host OS
 Performance: VERY SLOW
[Diagram: guest OS on QEMU-provided virtual disk and virtual NIC; QEMU runs on the host OS, which owns the real disk and NIC drivers]


Virtualization Phase II - KVM Acceleration

 Qemu emulates devices and runs in user mode
 Guest is still part of the QEMU process
 Guest image runs in Guest Mode, facilitated by KVM
 KVM exploits Intel VT / AMD-V CPU support
 Performance
– Guest CPU speed is near native
– I/O is slow: a Guest->Host->User mode context switch (and back) is paid for each I/O operation
[Diagram: QEMU hosts the guest OS with its virtual disk and virtual NIC; the host OS runs the KVM module plus the real disk and NIC drivers]


Virtualization Phase III - I/O Acceleration through Virtio


 Qemu-emulated virtio device
 Guest runs a paravirtual virtio driver
 I/O is buffered in circular send and receive queues
 Context switches from Guest->Host->User (and back) are reduced significantly
 Performance
– Guest CPU speed is near native
– Better I/O throughput and lower latency at lower CPU utilization
[Diagram: QEMU exposes the send/receive queues of the virtual NIC and virtual disk to the guest's virtio driver; the host OS runs KVM plus the real disk and NIC drivers]


Virtualization Phase IV - I/O Accel. through vhost-net


 Kernel-emulated virtio device through vhost-net
 Guest runs a paravirtual virtio driver
 I/O is buffered in circular send and receive queues
 The kernel-to-user context switch (and back) per vmexit is eliminated
 Performance
– Guest CPU speed is near native
– Further lowered latencies
[Diagram: the send/receive queues are serviced by vhost-net inside the host kernel next to KVM, so QEMU drops out of the network I/O path]
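On a vhost-net capable host this in-kernel backend is enabled per NIC. A hedged sketch using qemu-kvm's -netdev syntax (image and tap names are placeholders):

```shell
# Load the in-kernel backend, then request it for the guest's virtio NIC
modprobe vhost_net
qemu-kvm -m 1024 -drive file=guest.img,if=virtio \
  -netdev tap,id=net0,ifname=tap0,script=no,vhost=on \
  -device virtio-net-pci,netdev=net0
```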


SRIOV Overview
 SRIOV: Single Root I/O Virtualization
 PCI-SIG Standard
 Ability to drive a PCIe function from
multiple independent software entities
 Each software entity believes it has
exclusive access
 Provides high throughput, low CPU
utilization, high scalability
 Requires platform support

 Replicates each hardware physical function into multiple virtual functions
– PF -> Physical Function
– VF -> Virtual Function
 A Virtual Function is a replica of the Physical Function
 Each Physical Function can have up to 256 Virtual Functions
 Each Virtual Function can be driven by an independent software entity
 Virtual Functions are lightweight; they lack configuration resources
[Diagram: one PCIe port exposes physical functions PF0..PF2; each PF fans out into virtual functions VF 0..VF n; only the PFs carry configuration resources]

Virtualization Phase V - SRIOV Hardware Acceleration


 Host configures and enables the PF and all VFs
 Host and Qemu are bypassed in the I/O path
 Each guest controls one VF
 Side-band communication path between PF and VF
– For communicating device information
– For coordination between PF and VF on device reset
 Native CPU speed
 Promises native I/O performance at negligible CPU overhead
[Diagram: guest OS 1..n each run a VF driver against their own VF; the host OS runs the PF driver and KVM; the adapter exposes the PF plus VF1..VFn]


Issues with SRIOV

 All guest pages have to be pinned


– Cannot overcommit guest memory
 Requires PCI pass-thru platform support
 Guest migration across hosts is challenging


TCP Bandwidth Comparison


SRIOV provides higher bandwidth at lower CPU utilization.

[Chart setup: 3400M2 Nehalem, 16 CPUs, 4 GB memory; Intel 1G SRIOV adapter; rhel5u4 host running a rhel5u4 guest]


TCP Latency Comparison


SRIOV surprisingly has higher latency.

[Chart setup: 3400M2 Nehalem, 16 CPUs, 4 GB memory; Intel 1G SRIOV adapter; rhel5u4 host running a rhel5u4 guest]


How to use SRIOV


 Ensure SR-IOV is supported and enabled in the BIOS/UEFI
 If the kernel does not assign the VF resources, use the command-line workaround: pci=assign-busses
 Load the driver with VFs enabled: modprobe igb max_vfs=1
 Check for the existence of the VF
lspci
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
08:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
 Enable the VF for PCI pass-through ($bus holds the VF address, e.g. 08:10.0)
pciid=$(lspci -n | grep $bus | awk '{print $3}' | sed -e 's/:/ /')
echo -n $pciid > /sys/bus/pci/drivers/pci-stub/new_id
echo -n 0000:$bus > /sys/bus/pci/devices/0000:$bus/driver/unbind
echo -n 0000:$bus > /sys/bus/pci/drivers/pci-stub/bind
 Pass the VF to the guest
qemu-kvm -pcidevice host=$bus
 Verify that the device is grabbed by the corresponding driver in the guest
# ethtool -i eth0
driver: igbvf


QEMU


QEMU
 QEMU is a community-driven project
– No company has sponsored major portions of its development
 QEMU does a really amazing thing
– Can emulate 9 target architectures on 13 host architectures!
– Provides full system emulation supporting ~200 distinct devices
– Very sophisticated and complete command line interface (CLI)
– There are more than 90 options in the output of qemu-kvm --help
 Is the basis of KVM, Xen HVM, and xVM Virtual Box
– Every Open Source virtualization project uses QEMU

Userspace device model for KVM
− Provides management interface
− Provides device emulation
− Provides paravirtual IO backends

libvirt communicates directly with QEMU

libvirt-cim communicates with libvirt

Creating a virtual Disk with qemu-img


 # qemu-img create
– Options: format; filename; size; compression; encryption; base image
– Formats: qcow/qcow2, VMware, Virtual PC, Parallels, raw and 8 more
– A base image is an existing virtual disk to use as the initial state for a
copy on write snapshot
• No coordination so you should manually make it read only
– Example: # qemu-img create -f qcow2 /path/to/virtualdisk 6G
 # qemu-img info /path/to/virtualdisk
– Gives information about the virtual disk, e.g. size (on-disk and virtual), format,
snapshots
 # qemu-img convert -f <fmt> /path/to/virtualdisk \
-O <fmt> /path/to/converteddisk
– Convert the virtual disk format, e.g.:
• Virtual PC to qcow
• Add compression or encryption
 # qemu-img commit /path/to/virtualdisk
– Write a virtual disk's changes to its base image
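The base-image workflow above, sketched end to end (the file names are examples):

```shell
# 1. Create a base image and make it read-only (qemu-img does not enforce this)
qemu-img create -f qcow2 base.qcow2 6G
chmod a-w base.qcow2
# 2. Create a copy-on-write overlay backed by the base image
qemu-img create -f qcow2 -b base.qcow2 overlay.qcow2
# 3. Inspect size, format and backing file
qemu-img info overlay.qcow2
# 4. Later, fold the overlay's changes back into the base image
chmod u+w base.qcow2
qemu-img commit overlay.qcow2
```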

Windows 2003 Installation on Fedora 12


 Install KVM package
# yum install qemu kvm
 Create an image file for the virtual hard disk
# qemu-img create -f qcow2 win2003-1.img 4G
 Start the virtual machine
# qemu-kvm -no-acpi -m 384 -hda windows2003-1.img \
-cdrom w2k3.iso -boot d -smp 2
-m = memory (in MB)
-hda = first hard drive (many image file types supported)
-cdrom = ISO image or CD/DVD drive
-boot[a|c|d] = boot from Floppy (a), Hard disk (c) or CDROM (d)
-smp = number of CPUs
 Install Windows, download PV drivers to VM and restart with virtio network option
# qemu-kvm -no-acpi -m 384 -boot c -smp 2 windows2003-1.img \
-net nic,model=virtio
 Install paravirtualized (PV) network drivers and reboot with virtio network option
 For paravirtualized Windows block device (driver installation and usage) check e.g.
http://www.linux-kvm.com/content/redhat-54-windows-virtio-drivers-part-2-block-drivers

Result of Windows 2003 Installation


OpenSolaris 2008.11 on RHEL-5.3/5.4 (KVM)


KVM Example with more Networking Options


 # qemu-kvm -hda /path/to/virtualdisk \
-net nic,model=e1000,macaddr=ac:de:48:64:61:1c \
-net tap,script=no,ifname=kvmtap679a \
-m 512 -smp 2 -usb -localtime -name “MyVM”
 Networking Options
– User space stack [-net user,vlan=<n>]
• Port forwarding [-redir tcp|udp:host-port:guestIP:guest-port]
– Tap a host interface
[-net tap,vlan=<n>,ifname=<tapname>,script=no]
– Socket (private shared network with another host)
[-net socket,vlan=<n>,listen=:<port>]
[-net socket,vlan=<n>,connect=<host>:<port>]
– Multicast socket (shared network with multiple hosts)
[-net socket,vlan=<n>,mcast=<addr>:<port>]
– VM with no NICs [-net none]
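For example, the user-space stack plus port forwarding gives SSH access to a guest without any root networking setup on the host (image name and port numbers are arbitrary):

```shell
# Forward host port 2222 to the guest's SSH port 22 via the user-mode stack
qemu-kvm -m 512 -hda guest.img \
  -net nic,model=virtio -net user \
  -redir tcp:2222::22
# then, from the host: ssh -p 2222 user@localhost
```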


KVM Security


sVirt: Hardening Linux Virtualization with Mandatory Access Control


 sVirt: pluggable security design for libvirt
– supports MAC security schemes like SELinux, SMACK
 MAC policy enforced by host kernel
 Guests and resources uniquely labeled: svirt_t, virt_image_t, virt_content_t,…
 Coarse rules for all isolated guests applied to svirt_t
 For simple isolation: all accesses between different UUIDs are denied
 Current status http://selinuxproject.org/page/SVirt
– Low-level libvirt integration done
– Can launch labeled guests
– Basic label support in virsh
 Future enhancements
– Different types of isolated guests: svirt_web_t
– Virtual network security
– Controlled flow between guests
– Distributed guest security
 Related work
– Labeled NFS © 2009
– Labeled Networking James
Morris

– XACE
 Similar work XEN Vulnerability
http://www.hacker-soft.net/Soft/Soft_13289.htm
– XSM (port of Flask to Xen)

sVirt Dynamic Labeling


 Generates a random, unused MCS (Multiple Category Security) label.
 Labels the image file/device: svirt_image_t:MCS1
 Launches the image: svirt_t:MCS1
 Labels R/O Content: virt_content_t:s0
 Labels Shared R/W Content: svirt_t:s0
 Labels image on completion: virt_image_t:s0


sVirt Static Label (Multi-Level Security)


 Administrator must specify image label svirt_t:TopSecret
 Launches the image: svirt_t:TopSecret
 Libvirt will NOT label any content. Administrator responsible for labeling content.

virt-manager with
static SELinux labeling

Questions ?


Trademarks

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see
www.ibm.com/legal/copytrade.shtml: AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA,
Websphere, xSeries, z/OS, zSeries, z/VM

The following are trademarks or registered trademarks of other companies

Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation
Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries
LINUX is a registered trademark of Linus Torvalds
UNIX is a registered trademark of The Open Group in the United States and other countries.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
Intel is a registered trademark of Intel Corporation
* All other products may be trademarks or registered trademarks of their respective companies.

NOTES:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that
any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have
achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject
to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the
performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.

The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of
the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those
Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
