
VMware ESX Server 3.0: How VMware ESX Server virtualizes HP ProLiant servers

Executive summary ..............................................................................................................................................3


This white paper................................................................................................................................................3
Architecture ..........................................................................................................................................................4
Hardware performance differences ............................................................................................................5
Driver translation ..........................................................................................................................................5
World switching ............................................................................................................................................5
Accommodating increased utilization ....................................................................................................5
Service console.....................................................................................................................................................5
Boot process......................................................................................................................................................5
Service console overview...............................................................................................................................7
VMware Virtual SMP.............................................................................................................................................8
Resource virtualization.........................................................................................................................................9
CPU .....................................................................................................................................................................9
Reacting to an idle VM.............................................................................................................................11
Updating the VM clock.............................................................................................................................12
Default settings...........................................................................................................................................12
Impact of processor cache size..............................................................................................................12
Impact of cache on scheduling .............................................................................................................13
Intel Xeon processors.................................................................................................................................14
AMD Opteron processors .........................................................................................................................15
Using NUMA architecture .........................................................................................................................15
Disabling NUMA capability ......................................................................................................................17
Memory............................................................................................................................................................17
Other memory consumers........................................................................................................................17
Memory management .............................................................................................................................19
Using the balloon driver ............................................................................................................................20
Using a swap file.........................................................................................................................................21
Using background memory page sharing ............................................................................................21
More on memory overcommitting .........................................................................................................21
Recommendations for memory virtualization ......................................................................................22
Network............................................................................................................................................................22
How to dedicate a physical NIC to a VM .............................................................................................23
Configuring virtual switches .....................................................................................................................24
Load distribution.........................................................................................................................................25
Distributing outbound traffic ....................................................................................................................27
Distributing inbound traffic .......................................................................................................................27
Eliminating the switch as a single point of failure.................................................................................28
Improving network performance............................................................................................................28
How the network perceives VMs.............................................................................................................28
VLANs ...........................................................................................................................................................29
Considerations when configuring virtual switches...............................................................................29
Storage.............................................................................................................................................................30
Architecture ................................................................................................................................................31
VMFS.............................................................................................................................................................32
LUN performance considerations ...........................................................................................................33
Tuning VM storage .....................................................................................................................................34
Using raw device mapping......................................................................................................................34
Other design considerations....................................................................................................................35
Sizing VM disk files ......................................................................................................................................35
Presenting a raw LUN to a VM .................................................................................................................35
Raw device mapping ...............................................................................................................................36
Planning partitions .....................................................................................................................................39
Implementing boot-from-SAN .................................................................................................................39
Noting changes to the boot drive and device specification ...........................................................40
Taking care during the installation..........................................................................................................40
Defining the connection type .................................................................................................................40
Fibre Channel multipathing and failover...............................................................................................40
Fail-back ......................................................................................................................................................41
Resource Management....................................................................................................................................41
Clusters .............................................................................................................................................................41
VMware High Availability (HA) Clusters......................................................................................................42
VMware Distributed Resource Scheduling (DRS) Clusters ......................................................................42
Resource Pools................................................................................................................................................42
Resource Allocation ......................................................................................................................................42
Absolute allocation ...................................................................................................................................43
Share-based allocation ............................................................................................................................43
Differences between allocation methods ............................................................................................43
Warning on setting a guaranteed minimum ........................................................................................43
Allocating shares for other resources .....................................................................................................44
Best practices......................................................................................................................................................44
VMware VirtualCenter.......................................................................................................................................44
Architecture ....................................................................................................................................................45
Templates and clones ...................................................................................................................................45
Template .....................................................................................................................................................45
Cloning.........................................................................................................................................................46
Differences between templates and clones ........................................................................................46
Considerations and requirements for VirtualCenter server ...................................................................46
Compatibility ..............................................................................................................................................47
Virtual Infrastructure Client application requirements ........................................................................47
VMotion................................................................................................................................................................47
Architecture ....................................................................................................................................................47
Considerations and requirements...............................................................................................................48
For more information .........................................................................................................................................51
Executive summary
This document describes the functionality of VMware ESX Server, including the new features and
functionality introduced by VMware ESX Server 3.0. Specifically, it covers operational parameters,
component virtualization methodologies, general utilization, and best-practice methods for the integration
and operation of a virtual infrastructure.
This guide is intended for Project Managers and corporate decision makers involved in the initial phases of
enterprise virtualization. This document provides an overall understanding of how ESX works, and should
help the reader make informed decisions concerning the implementation of virtualization.
The reader should be familiar with industry terminology, and generally familiar with virtualized infrastructures.
For access to more in-depth information see the reference section of this guide.
This guide is the result of a joint effort by VMware and HP.

This white paper


• Architecture – Outlines the virtualized computing environment implemented by VMware ESX Server;
describes performance differentials
• Service console – Outlines the architecture and capabilities of the ESX Server service console; explains the
differences between Linux, the service console, and ESX Server
• VMware Virtual SMP – Outlines the use of Virtual SMP to give a VM access to four virtual processors;
explains processor fragmentation
• Resource virtualization – Outlines resource utilization issues in a virtualized computing environment
– CPU – Describes processor virtualization concepts such as virtual processors; provides an explanation
of single core, dual core and hyperthreaded processing resources; outlines the management of idle
VMs and virtual processor scheduling; describes the impact of cache size on performance; describes
the concept and impact of cache fragmentation; outlines the impact of Intel® Xeon™ and AMD
Opteron™ chipset technologies; describes the influence of Intel® and AMD virtualization
technologies; describes the use of Non-Uniform Memory Access (NUMA) architecture
– Memory – Describes the memory guarantees required for VMs; outlines the memory management
features of ESX Server; describes the use of the balloon driver, swap files, and background memory
map sharing to free up memory; discusses the implications of memory overcommitting; provides
recommendations for memory virtualization
– Network – Describes the concept of a virtual switch; outlines how to configure a virtual switch;
describes MAC- and IP-based methods for load distribution; describes how to eliminate the switch as
a single point of failure; outlines methods for improving network performance; discusses the uses of
VLANs; outlines NIC recommendations for the service console, VMotion and the VMs
– Storage – Provides an overview of virtual storage architecture; explains why VMFS-3 is used for
virtualized storage; outlines LUN performance considerations; provides methods for tuning VM storage;
describes iSCSI support; outlines support for NAS; describes the use of raw device mapping; outlines
the use of internal storage controllers; describes how to implement boot-from-SAN; discusses
multipathing and failover in a virtualized environment
• Resource management – Outlines the ability to divide and allocate the resources of a combined group
of ESX Server hosts
– Cluster – Describes the ability to group similar hosts in order to improve workload distribution and
failover capability
• VMware HA – Briefly describes the failover capability of a cluster configured for HA and provides a link
to the VMware HA whitepaper which includes best practices
• VMware DRS – Outlines the basic principles of DRS and provides a link to the VMware DRS whitepaper
and best practices
– Resource pools - Outlines the advantages of using resource pools; describes the process of establishing
resource pools and adding VMs to the newly created pools; outlines the process of modifying the
resource allocation allotted to a specific resource pool; explains potential issues that arise when
assigning resource reservations and workarounds for over allocation

Architecture
VMware ESX Server provides a virtualized computing environment that, unlike VMware Server or VMware
Workstation, does not rely on an underlying operating system to communicate with the server hardware –
instead, ESX Server is installed directly on the server hardware. Virtual Machines (VMs) are then installed and
managed on top of the ESX Server software layer.
Since virtualization components are not hosted within the confines of a host operating system, the ESX
Server architecture has been described as “unhosted,” “native,” or “hostless.” Hosted and unhosted
architectures are compared in Figure 1.

Figure 1: Comparing the native ESX Server architecture with a typical hosted architecture

The ESX Server architecture provides shorter, more efficient computational and I/O paths for VMs and their
applications, reducing virtualization overhead and improving application performance. The unhosted
architecture also enables ESX Server to provide more granular and enforceable policies for hardware
allocation and VM prioritization – an important differentiator for ESX Server over a hosted architecture. In a
hosted virtualization environment, the host OS governs the execution of VM threads and typically limits the
granularity of prioritization to categories such as “high,” “low,” or “normal.”
Furthermore, in the unhosted architecture of ESX Server, VM processes do not contend with the many and
various processes that consume the resources allocated to a host OS.
In short, the unhosted architecture of ESX Server provides a lightweight, single-purpose virtualization
environment that allows enforceable hardware allocation and prioritization policies. The single-purpose
micro-kernel, called VMkernel, translates into higher-performance and flexibility. Also, because VMkernel
uses only drivers ported and rigorously validated by both HP and VMware, the micro-kernel provides
exceptional stability.

Hardware performance differences
Performance and resource utilization for a particular operating system instance and application differ when
running in virtualized and unvirtualized environments, as discussed below.

Driver translation
While the quantification of performance differentials is very complex, it can be stated that, in general, CPU
and memory performance overheads in a VM tend to be lower than overheads for network or disk traffic.
This is because neither CPU nor memory needs the same amount of translation as required when data flows
between virtual and physical device drivers. With the introduction of ESX Server 3.0, many of the physical
device drivers have been incorporated into the kernel to further improve VM performance. In general, the
performance of a primarily CPU-intensive application in a VM is likely to be closer to its performance on a
physical server than that of an application that is more network- or disk-intensive.
However, translation between virtual and physical devices does consume some additional CPU resources
on top of those required for application and guest OS processing. This translation results in a higher
percentage of CPU utilization being needed for each request processed in a VM when compared to a
physical server running the same application.

World switching
The world switching process helps the sharing of physical system resources by preempting a currently-
running VM, capturing and saving the instantaneous execution state of that VM, and initiating CPU
execution for a second VM.
Although world switching allows VMs to share physical system resources, the process introduces an
additional amount of overhead associated with running VMs. Though this process adds a small amount of
overhead, the benefits of virtualization strongly outweigh the additional cost.
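A world switch can be pictured as saving one VM's execution state and restoring another's. The following Python sketch is a simplified, conceptual illustration only; the class and function names are invented for this example and do not reflect VMkernel internals.

```python
from dataclasses import dataclass, field

@dataclass
class VMWorld:
    """Illustrative container for the saved execution state of one VM."""
    name: str
    registers: dict = field(default_factory=dict)

def world_switch(current: VMWorld, nxt: VMWorld, live_registers: dict) -> dict:
    """Preempt 'current', save its instantaneous state, and resume 'nxt'.

    Returns the register image the physical CPU would now execute with.
    """
    current.registers = dict(live_registers)          # capture the running VM's state
    return dict(nxt.registers) or {"eip": "entry"}    # restore the next VM (fresh on first run)

# Usage: two VMs alternating on one core.
vm_a, vm_b = VMWorld("VM-A"), VMWorld("VM-B")
cpu_state = {"eip": "0x1000", "esp": "0x7ffe"}
cpu_state = world_switch(vm_a, vm_b, cpu_state)   # VM-B now runs
cpu_state = world_switch(vm_b, vm_a, cpu_state)   # back to VM-A, state restored
```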

Accommodating increased utilization


On average1, the CPU utilization of a Microsoft® Windows®-based x86 server is approximately 4%. Even with
virtualization overhead and driver translation, many systems and application environments have sufficient
CPU resources to accommodate a substantial increase in utilization. While this is not always the case, many
OS and application environments can be virtualized without sacrificing much performance.
This performance sacrifice has been further reduced by optimizations in ESX Server 3.0 that specifically
target OLTP, Citrix, Windows 2003 web server, and custom Linux workloads.

Service console
ESX Server is often thought of as Linux or Linux-based – a misconception that might stem from the service
console. To rebut this misconception, consider the following:
• The VMkernel, responsible for the creation and execution of virtual machines, is a single-purpose, micro-
kernel.
• The VMkernel cannot boot itself and has no user interface. It relies on a privileged, modified Red Hat
Enterprise Linux installation to provide ancillary services like a boot loader and user interface.
• It is important to understand that the VMkernel – not the Linux kernel – is the governing authority in an ESX
Server deployment. It is the VMkernel that creates, monitors, and defines the virtualization components; it
makes the only and final decision for execution allocation – even the Linux service console is subject to
the scheduling decisions of the VMkernel.

Boot process
The boot process helps explain the relationship between Linux, the service console, and ESX Server.

1 According to industry averages compiled by VMware Capacity Planner.

During boot, the bootloader (GRUB) loads a Linux kernel. Since certain PCI devices are masked by the
GRUB configuration, the Linux kernel only loads drivers for visible devices.
After most Linux services have been loaded, the Linux kernel loads the vmnixmod module, which loads the
VMkernel logger, which, in turn, loads the VMkernel itself. During its loading process, the VMkernel assumes
nearly all hardware interrupts and, effectively, takes over server hardware that was not allocated to the
Linux service console kernel. At this point, with the VMkernel owning most of the server hardware, it is free to
schedule VM execution and distribute physical resources between VMs.
The final component loaded is the VMkernel core dump process, which is designed to capture the state of
the VMkernel in the event of a kernel crash.

Service console overview
With the advent of ESX Server 3.0, the service console is now executed as a VM. Drivers for service console
devices, such as the NIC and storage, are loaded in the VMkernel, which allows the service console to access
the configured hardware through the kernel itself. Although the service console accesses most devices
through kernel modules, access to USB devices and the floppy is direct.
The VMware host agent which runs within the service console provides access for the Virtual Infrastructure
client. Additionally, a web access client is available and is powered by a Tomcat web service which runs
within the confines of the service console. The web access client allows users to perform many
management tasks using a web-based interface. Furthermore, the Secure Shell (SSH) service within the service console
provides secure access for the command-line management of the ESX Server. While these interfaces might
appear identical to any Linux installation, the service console includes packages that allow both the
command line and the web interfaces to pass commands and configuration data to the VMkernel.
Figure 2 shows how the service console integrates into the ESX Server architecture.

Figure 2: The relationship between ESX Server and the service console

The service console, which uses a uniprocessor 2.4.21 Linux kernel, is scheduled only on physical CPU0. By
default, the VMkernel reserves a minimum of 8% of CPU0 for the service console through the same
guarantee mechanism used for VMs; VMs are free to consume the remaining CPU0 resources. This CPU
allocation, in most cases, ensures that the service console remains responsive, even if other, busy VMs are
consuming all other available physical resources.
Although the service console is not responsible for scheduling VMs, there is a correlation between the
responsiveness of the service console and the responsiveness of VMs. This is due, in part, to the fact that ESX Server
transmits the keyboard, video, and mouse access of a VM to a VMware Remote Console session through
the service console network connection.

Because of this relationship, if the service console should become unresponsive or unable to perform the
supporting processes (such as updating /proc nodes or maintaining the VMkernel logger), the virtualized
environment may exhibit symptoms of this contention, ranging from slow remote console access to
VMkernel crashing. To combat this, consider increasing the memory allocation and/or minimum CPU
guarantee for the service console. Note, however, that this discussion addresses an extreme case; in most
cases, the default allocations should provide stable and responsive operation.
The service console also provides an execution environment for management and backup agents; the
loads generated by these additional processes further justify an increase in memory and CPU allocations
over the default values.

Note:
HP Systems Insight Manager (SIM) and other hardware monitoring agents run
in the service console, not in VMs.
ESX Server does not support the running of unqualified packages within the
service console environment.

Access to floppy drives, serial port devices, parallel port devices, and CD-ROM drives – even from
within a VM – is proxied through the service console. This delegation of slower-access devices allows the
VMkernel to focus on high-speed, low-latency devices like hard disks.

VMware Virtual SMP


With a valid Virtual SMP license, ESX Server can give a single VM simultaneous access to up to four
execution cores by exposing up to four virtual processors within the VM, allowing multithreaded applications
within the VM to process simultaneous instructions on distinct processor cores. Since ESX Server
abstracts – as opposed to emulating – processors, simultaneous execution requires that the corresponding
number of processor cores be allocated – simultaneously and exclusively – to a single VM. These cores may
reside within a single package (as with a dual-core processor or a processor with Hyper-Threading
Technology) or be spread across physical packages.

Note:
Virtual SMP is licensed separately; this license is required to expose more than
one virtual processor within a single VM. The license can be purchased
separately or as part of the Virtual Infrastructure Node bundle.

Although it may be tempting to use Virtual SMP by default when creating new VMs, it should be used
carefully – especially when running on systems which offer few execution cores, for instance a dual
processor system populated with single core processors. A VM is never allocated a portion of a core; during
its allocated unit of CPU time, a VM’s access is exclusive. As a result, when a VM using Virtual SMP is
deployed on a physical server with only two cores (as in a dual-processor, single-core server without Hyper-
Threading Technology), both cores are allocated to this VM during the scheduled period; no CPU resources
are available for other VMs or the service console. The converse also applies: when any other VM or service
console process is scheduled for execution on either one of the two execution cores, processes on the
Virtual SMP-enabled VM cannot execute. This phenomenon is known as processor fragmentation and is
shown in Figure 3.

Figure 3: Both physical processor cores have been allocated to a Virtual SMP-enabled VM, leaving no CPU resources
available for other VMs

Processor fragmentation is often the reason for poor performance on servers with only two execution cores.
With dual-processor VMs on servers with only two cores, there is nearly 100% contention for CPU resources
when the system is under load. Now, if the goal were to run only a single VM with Virtual SMP on a platform
with two execution cores, the performance impact of this contention would be less noticeable; however, it
is far more common to deploy multiple VMs on such a platform, making contention a significant issue.
When using Virtual SMP, it is recommended that the physical server provide more processor cores than are
allocated to any single VM.
New technologies, such as Hyper-Threading Technology and dual-core processors, change this behavior
slightly and are examined more closely in later sections of this white paper.
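The co-scheduling constraint behind processor fragmentation can be expressed in a few lines. The sketch below is purely illustrative (the model and names are assumptions, not the VMkernel scheduler): a Virtual SMP-enabled VM can be dispatched only when enough cores are free at the same instant, so on a two-core host it blocks, and is blocked by, every other world.

```python
def can_schedule(vm_vcpus: int, free_cores: int) -> bool:
    """A VM runs only if every one of its virtual processors gets a core at the same time."""
    return free_cores >= vm_vcpus

total_cores = 2        # dual-processor, single-core host
smp_vm_vcpus = 2       # Virtual SMP-enabled VM
uni_vm_vcpus = 1       # ordinary uniprocessor VM

# While the SMP VM is running, it owns both cores and nothing else can run...
print(can_schedule(uni_vm_vcpus, total_cores - smp_vm_vcpus))   # False

# ...and while any other VM or the service console holds one core,
# the SMP VM cannot be co-scheduled at all: processor fragmentation.
print(can_schedule(smp_vm_vcpus, total_cores - uni_vm_vcpus))   # False
```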

Resource virtualization
In discussing how ESX Server presents virtual abstractions of hardware and schedules the execution of VMs,
it is helpful to discuss the four primary resource groups (CPU, memory, network, and disk) independently.

CPU
The primary concepts of CPU virtualization are as follows:
• A physical processor package
• A virtual processor
• A logical processor capable of executing a thread

The physical processor is a familiar concept; it has a clock speed, a cache size, and a manufacturer; and
you can hold it in your hand.
Introduced with newer technologies such as Hyper-Threading Technology and dual-core processors, the
logical processor is slightly more abstract, and may best be explained by examples such as those shown in
Figure 4.

Figure 4: Representations of physical processors

• Single-core processor without Hyper-Threading Technology – one logical processor
• Single-core processor with Hyper-Threading Technology – two logical processors
• Dual-core processor – two logical processors

In the context of ESX Server, not all logical processors are equal. For example, a processor with Hyper-
Threading Technology includes two instruction pipelines; however, only one of these can access the
execution pipeline at any given moment. Contrast this with a dual-core processor where both instruction
pipelines have access to their own execution pipelines. As such, when discussing virtual processors, it might
be helpful to refer specifically to an execution core to avoid confusion with the non-executing second
instruction pipeline in a processor with Hyper-Threading Technology.
By far the most abstract of the concepts of CPU virtualization is the virtual processor, which is best defined
as a period of time allocated for exclusive execution on a processor core. When a VM is powered on and
its virtual processor is scheduled to execute (or multiple virtual processors if using Virtual SMP), a slice of time
on one logical execution core (or multiple logical execution cores if using Virtual SMP) within the physical
processors is assigned to the virtual processor(s) within the VM. Since this concept is purely abstract,
consider the following simplified examples:
• A physical processor with a single logical core and a single VM with a single virtual processor
Ignoring all virtualization overheads and execution cycles for system services, the virtual processor in this
scenario receives 100% of the execution time. If the logical core of the physical processor runs at 3.0 GHz,
the virtual processor receives all three billion clock cycles each second.
• A single processor with a single logical core and two VMs, each with one virtual processor
Ignoring all virtualization overheads and execution cycles for system services and assuming equal priority
for both VMs, each virtual processor receives, over some period of time, 50% of the execution time.
It is important to understand that when one VM is executing, that VM has exclusive access to the
allocated logical execution core. In other words, only one virtual processor can execute within a single
logical processor at any given moment. The x86 architecture does not allow two virtual processors to
have simultaneous access to a single execution pipeline. Thus, in this example, if one VM is executing,
the other is not. The non-executing VM is not idle, rather, it has been pre-empted.
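The 50/50 split in the second example is simply the equal-priority case of proportional sharing. As a rough illustration (an invented helper, not the VMkernel's actual scheduler), the cycles a virtual processor receives over an interval can be approximated by its share weight relative to the other runnable virtual processors on the same core:

```python
def expected_cycles(core_hz: int, shares: dict) -> dict:
    """Split one core's clock cycles among runnable vCPUs in proportion to their shares."""
    total = sum(shares.values())
    return {vm: core_hz * weight / total for vm, weight in shares.items()}

# One 3.0 GHz core and two equally weighted single-vCPU VMs: roughly 1.5 GHz worth each.
print(expected_cycles(3_000_000_000, {"VM-A": 1000, "VM-B": 1000}))
```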

Figure 5 shows a number of scenarios featuring a single virtual processor.

Figure 5: Showing scenarios with one virtual processor per physical processor core

As can be inferred from the above discussion, when the number of virtual processors increases, the period
of time for execution may become shorter. Similarly, as the ratio of virtual processors to logical execution
cores increases2, the contention for physical resources may increase. One possible result – depending on
the amount of idleness within the VMs – is that the number of computational cycles available to a VM may
be less.
The qualifications in the previous paragraph – “may become shorter,” “may increase,” and “may be less” –
are rooted in the manner in which ESX Server treats an “idle” VM. When an operating system is not
consuming resources for system-sustaining processes or in support of an application, it is idle; indeed, most
operating systems spend considerable amounts of time in this state. While idle, the operating system issues
instructions3 to the CPU indicating that no work is to be done.
ESX Server is capable of recognizing this idle loop – unique to each operating system – and automatically
gives priority to VMs using CPU cycles to perform non-idle operations, giving rise to the qualifications stated
above. For example, as the ratio of virtual processors to logical processors increases, the contention for
physical resources may increase unless there are idle VMs.
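A toy model of this behavior is sketched below (assumed names and numbers, not VMkernel code): virtual processors spinning in their guest's idle loop are deprioritized, so busy virtual processors receive most of the core while idle ones still get a trickle of cycles to keep their clocks advancing.

```python
def allocate_core_time(vms: dict, idle_trickle: float = 0.05) -> dict:
    """Split one core's time: idle VMs get a small trickle, busy VMs share the remainder."""
    idle = [name for name, busy in vms.items() if not busy]
    busy = [name for name, busy in vms.items() if busy]
    if not busy:                                   # everyone idle: share equally
        return {name: 1.0 / len(vms) for name in vms}
    reserved = idle_trickle * len(idle)            # keep idle guests' clocks moving
    share = (1.0 - reserved) / len(busy)
    return {**{name: idle_trickle for name in idle}, **{name: share for name in busy}}

# One busy VM and one idle VM on a single core: the busy VM gets ~95% of the cycles.
print(allocate_core_time({"busy-vm": True, "idle-vm": False}))
```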

Reacting to an idle VM
Consider the following to illustrate how idle VMs may affect the scheduling of virtual processors. In this
scenario, there is a single processor with a single logical core supporting two VMs, each with one virtual
processor. If one VM is performing CPU-intensive operations that entirely consume the cycles allotted to it
and the other VM is completely idle, ESX Server recognizes this disparity and effectively increases the
percentage of cycles allocated to the busy VM. Note that the busy VM never receives 100% of the CPU
cycles; some cycles are allocated to – and consumed by – the idle VM to advance its clock and
ensure that it has the opportunity to become busy.

2 This ratio is often called virtual machine density or consolidation ratio.
3 Generally referred to as an idle loop.
Updating the VM clock
As mentioned earlier, when a VM is not scheduled for execution within a processor core, time does not pass
in that VM. As a result, the measurement of time within that VM is incorrect – unless the VMware Tools
package is installed.
This package includes a component that, when enabled, updates the clock within the VM to ensure more
accurate timekeeping. This updating does not provide for a real-time measurement; however, the
accuracy of the updated time should be sufficient for most application purposes.

IMPORTANT:
VMware strongly recommends that the measurement of time should not be
used for the purposes of benchmarking.
Any application running in a VM that measures performance with respect to
time – for example, requests per second, transactions per second, or response
time – has temporal components that should be considered unreliable.

Default settings
To this point, the discussion has assumed that none of the CPU resource management features of ESX Server
have been changed from their default values, which allow a virtual processor to consume up to 100% of an
execution core or as little as 0%. The default configuration allows ESX Server to dynamically change which
logical processor clock cycles are used to fulfill the time allotted to a virtual processor; in other words, ESX
Server can move a virtual processor between logical cores or, even, physical processors in response to
shifting loads within the physical host.
While unique resource management settings can be configured for each VM, VMware recommends
leaving these values set to their default values and allowing the VMkernel to make these decisions with
maximum flexibility.

Impact of processor cache size


Generally speaking, VM performance is more sensitive to processor cache size than to the speed of the
processor. Cache size is important when multiple VMs are switching between execution states, reducing the
effective CPU cache hit ratio. In a non-virtualized server, this ratio may be as high as 90%, which is
sustainable in a single-operating-system environment; however, in a virtualized environment, many operating
systems utilize the same physical processor and core, making it difficult to sustain such a high ratio.
This reduction of the cache hit ratio is known as cache fragmentation or cache pollution.

To illustrate the performance impact of cache fragmentation, consider an ESX Server environment freshly
booted with no VMs powered on. When the first VM is powered on, the cache hit ratio for its processor is
initially zero but begins to increase; after some time, this VM, running alone on the processor, might achieve
a hit ratio that is high enough to improve performance. When a second VM is powered on and scheduled
to execute on the same processor, this VM begins to populate processor cache with its own data and
processes, replacing the cached data and processes from the first VM. When the first VM is next scheduled
for execution, the cache hit ratio will be lower than previously achieved.
While the hit ratio will improve over time, it is likely to be lower during initial execution cycles (as shown in
Figure 6), forcing VMs to execute from main memory. Because access to main memory is much slower than
filling the requests from processor cache, the VM will run slower until the hit ratio improves.

Figure 6: Simulated impact of cache fragmentation on the CPU cache hit ratio, showing the ratio dropping to zero
each time a world switch (indicated by a red line) occurs

Note:
Unlike processor registers, processor cache is not restored or saved when
switching between executions of virtual machines.

The impact of cache fragmentation is intensified in a higher-density deployment. With more VMs running
per processor, each VM runs for a shorter period of time, which may limit a VM’s ability to fully populate and
realize the benefits of processor cache. As density increases, the following conditions occur:
• There are more VMs to push data out of cache
• The length of time between executions for each VM increases

These conditions combine to reduce the amount of data that remains cached between executions.
In order to combat this performance degradation, a larger processor cache may provide sufficient storage
to maintain a significant amount of cached data between executions. This would improve the cache hit
ratios for all VMs running on the processor.
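The effect pictured in Figure 6 can be approximated with a trivial simulation. The model below is an assumption-laden sketch (the warm-up rate and eviction factor are invented numbers) intended only to show why packing more VMs onto a core leaves each VM with a colder cache at every world switch.

```python
def simulate_hit_ratio(vms_per_core: int, timeslices: int = 12,
                       warmup: float = 0.25, ceiling: float = 0.9) -> list:
    """Cache hit ratio seen by one VM: it warms the cache while running and loses
    cached data in proportion to how many other VMs run before its next turn."""
    ratio, history = 0.0, []
    for _ in range(timeslices):
        ratio = min(ceiling, ratio + warmup)                 # warm up during our timeslice
        history.append(round(ratio, 2))
        ratio *= max(0.0, 1 - 0.3 * (vms_per_core - 1))      # evictions by the other VMs
    return history

print(simulate_hit_ratio(vms_per_core=1))   # climbs to ~0.9 and stays there
print(simulate_hit_ratio(vms_per_core=4))   # knocked back down after every switch
```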

Impact of cache on scheduling


Cache also plays an important role in the scheduling decisions made by the VMkernel. The VMkernel
understands the relationship between cache and VM performance. Specifically, the VMkernel understands
that performance can be maximized by continuing to run a virtual processor on the same logical core
(which is likely to contain cache pages for the particular VM4). As a result, the VMkernel is prepared to
accept a temporary increase in contention within one logical core before migrating a virtual processor to a
different core.
The decision to migrate or leave a virtual processor in place is governed by the potential penalty imposed
by the migration. If the contention within a logical core causes a virtual processor to delay an execution
request by a period that exceeds the migration penalty, the VMkernel recognizes that cache relevance
has been outweighed by the contention and will migrate the virtual processor to a logical core able to
serve the request more quickly.
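That trade-off reduces to a single comparison. The helper below is a conceptual sketch with invented names and units, not the VMkernel's actual heuristic: keep a virtual processor on its warm core while the expected queuing delay is smaller than the cost of refilling cache on another core.

```python
def should_migrate(expected_delay_us: float, migration_penalty_us: float) -> bool:
    """Migrate a virtual processor only when waiting for its warm core would cost
    more than abandoning its cached state."""
    return expected_delay_us > migration_penalty_us

print(should_migrate(expected_delay_us=80, migration_penalty_us=200))    # False: stay put
print(should_migrate(expected_delay_us=500, migration_penalty_us=200))   # True: move
```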
To retain the ability to migrate virtual processors, VMware recommends that users allow ESX Server to
determine processor affinity to virtual machines. Many variables govern scheduling decisions made by the
VMkernel to guarantee the best possible performance from a particular physical configuration. Specifying
processor affinity reduces the flexibility available to the VMkernel to make optimizations. Thus, VMware
discourages forced association of a virtual machine to a specific processing core.

Intel Xeon processors


Xeon processors introduced Hyper-Threading Technology, which allows ESX Server to treat a single physical
processor package as two logical processors. By design, hyperthreaded processors include a second
instruction pipeline but still feature a single execution pipeline. The processor is solely responsible for
distributing execution cycles between the instruction pipelines.
From processor allocation and guarantee accounting perspectives, the VMkernel considers the two cores
(instruction pipelines) to be equivalent, even though only one is executing at any given moment. This, in
turn, means that two virtual processors scheduled to run in the two cores of a hyperthreaded processor are
considered to have equal access to the physical processor.
One VM can be running in the physical processor’s execution pipeline while instructions for a second VM
are staged in the secondary core; when the scheduled allocation for the first VM ends, the next VM is
already populated within the processor and ready to execute, improving the speed of the world switch.
However, the overall impact of hyperthreading on VM
performance depends on the nature of the application.

4 A concept known as cache relevance

On the other hand, if only one virtual processor were scheduled to run within the hyperthreaded physical
processor, the VMkernel would account for this exclusive access through its internal accounting
capabilities. In this case, the virtual processor would be charged more for its exclusive consumption of the
physical processor. The rationale behind this extra charge is that the single virtual processor consumes the
full physical package, whereas two virtual processors within the two logical cores of a hyperthreaded
physical processor are each utilizing half of the physical processor. Other than this, the concepts of resource
management in ESX Server apply to servers with hyperthreaded processors in exactly the same manner as
servers with single-core processors.
ESX Server, however, can also use this secondary core to address processor fragmentation by scheduling
the two virtual processors of a Virtual SMP-enabled VM to use both cores of a hyperthreaded processor.
However, since hyperthreading may cause contention between these two cores, overall performance
depends on the nature of the particular application. If the application uses only a single virtual processor,
leaving the second processor largely unused, hyperthreading gives ESX Server the flexibility to avoid the
effects of processor fragmentation without significantly impacting application performance. If, however,
simultaneous, parallel execution of the two virtual processor threads is required, poor application
performance is likely.
Like most other parameters governing scheduling decisions, it is possible to update the policy for scheduling
virtual processors in a hyperthreaded environment. For more information on these policies and how to
change them, type man hyperthreading at the service console command prompt.
Note that the default setting for hyperthreading scheduling policy for a virtual processor is any, which
places no restrictions on the allocation of cores between virtual processors. This setting allows VMkernel to
use both cores within each hyperthreaded physical processor for the scheduling and execution of any
virtual processor within the system.
Setting hyperthreading sharing policy to none causes the particular VM to effectively ignore the fact that
the physical processor is hyperthreaded and continue to consume physical packages as though they
contained only a single logical core. Since this policy is set per VM, no virtual processor associated with this
VM will share a physical package with any other; the other logical core within the package will remain
unused.
VMs with more than one virtual processor can also use the internal setting for hyperthreading sharing policy.
This allows the virtual processors of a single VM to share cores within a physical package; however, these
virtual processors will not share cores with virtual processors associated with any other VM.
As with any parameters that can alter scheduling decisions made by the VMkernel, VMware strongly
recommends accepting the default values for the hyperthreading sharing policy.
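The three sharing modes amount to a simple admission rule on each hyperthreaded package. The check below is a conceptual sketch only; the policy names mirror those described above, but the function and data model are invented for illustration.

```python
from typing import Optional

def may_share_package(policy: str, candidate_vm: str, occupant_vm: Optional[str]) -> bool:
    """May candidate_vm's vCPU run on the second logical core of a package whose
    first logical core is running occupant_vm (None means that core is idle)?"""
    if occupant_vm is None:
        return True                          # nothing to share with
    if policy == "any":
        return True                          # share with any VM's vCPU
    if policy == "internal":
        return candidate_vm == occupant_vm   # only with another vCPU of the same VM
    if policy == "none":
        return False                         # never share; the second core stays idle
    raise ValueError(f"unknown policy: {policy}")

print(may_share_package("internal", "VM-A", "VM-A"))   # True
print(may_share_package("internal", "VM-A", "VM-B"))   # False
print(may_share_package("none", "VM-A", "VM-B"))       # False
```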

AMD Opteron processors


AMD Opteron processors feature a unique memory architecture that integrates the memory controller into
the high-speed core of the processor. As a result, AMD Opteron processors can access memory with
extremely low latency, a capability that is particularly useful when cache fragmentation reduces the
processor cache hit ratio, making VMs operate from main memory. The high-speed memory bus delivered
by an Opteron processor can also translate into increased VM density on a particular processor by
supporting faster world switches5.
Using NUMA architecture
Non-Uniform Memory Access (NUMA) is a system architecture that groups memory and processor cores into
nodes consisting of some physical memory and some processor packages and cores. Processor cores and
memory within a single node are said to be within the same proximity domain. All memory is accessible to
all cores, regardless of node membership. However, for cores accessing memory deployed in the same
proximity domain, accesses are faster and encounter less contention than accesses to memory within a
different node, as shown in Figure 9.

5The process by which one VM is unscheduled and another scheduled to execute is known as a world switch. This process involves
capturing one VM’s processor registers and writing these registers to memory, and reading the registers for the other VM from main
memory and, finally, writing these registers to the processor.

Figure 9: Showing a NUMA implementation with two nodes

HP ProLiant servers with AMD Opteron processors are NUMA systems: that is, the system BIOS creates a
System Resource Allocation Table (SRAT) that presents the nodes and proximity domains to ESX Server. ESX
Server is NUMA-aware and uses the contents of the SRAT to make decisions on how to optimally schedule
VMs and allocate memory.
On NUMA systems, ESX Server attempts to schedule a VM thread to execute in core(s) that are in the same
proximity domain as the memory associated with that VM. ESX Server also attempts to maintain physical
memory for a particular VM within a single NUMA node. If a VM needs more memory than that available in
a NUMA node, ESX Server allocates additional memory from the nearest proximity domain.
The single-core Opteron processor creates a unique NUMA architecture where each NUMA node has only
one processor core. While this is perfectly valid within NUMA specifications, it is more typical to deploy
multiple cores in a single NUMA node. Within the ESX Server context, the only scenario that is affected by
the unique Opteron NUMA presentation involves Virtual SMP.
The Virtual SMP code of ESX Server has been written to take particular advantage of NUMA architecture.
When a VM is allocated two virtual processors, ESX Server attempts to schedule both threads to execute
within a single NUMA node; however, on single-core Opteron processors the SRAT dictates that each node
has only one core, making it impossible for ESX Server to execute both virtual processor threads within a
single proximity domain.
Dual-core Opteron processors implement an architecture with two processor cores per NUMA node, allowing
ESX Server to schedule dual-processor VMs within the confines of a single NUMA node. As long as the
number of virtual processors remains the same or lower than the number of execution cores in a proximity
domain, ESX Server NUMA optimizations should be in effect.
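The placement preference can be summarized as: choose a node with enough free cores for all of the VM's virtual processors and enough free memory, and fall back to borrowing memory from a neighboring node. The sketch below is an assumption-heavy illustration (invented data model and numbers), not the ESX Server algorithm.

```python
from typing import Optional

def pick_numa_node(nodes: list, vcpus: int, mem_mb: int) -> Optional[dict]:
    """Prefer a node that can hold both the vCPUs and the memory of the VM."""
    for node in nodes:
        if node["free_cores"] >= vcpus and node["free_mem_mb"] >= mem_mb:
            return node
    # Otherwise settle for enough cores; memory spills over to the nearest other node.
    for node in nodes:
        if node["free_cores"] >= vcpus:
            return node
    return None

nodes = [{"id": 0, "free_cores": 1, "free_mem_mb": 2048},
         {"id": 1, "free_cores": 2, "free_mem_mb": 1024}]
print(pick_numa_node(nodes, vcpus=2, mem_mb=512))   # node 1: both vCPUs and memory stay local
```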

Disabling NUMA capability


Most HP ProLiant servers with Opteron processors offer the capability to disable NUMA by enabling the node
interleaving option in the BIOS. With node interleaving enabled, the HP ProLiant BIOS does not construct or
present the NUMA SRAT, and memory appears to ESX Server as a flat, uniform memory architecture.
VMware and HP recommend using NUMA features with single-processor VMs. For VMs with multiple virtual
processors, testing is recommended to determine which setting delivers the best performance for your
application.

Memory

Note:
An outstanding resource for detailed information on memory virtualization is
available at the ESX Server command line. Issuing the command man mem
displays a comprehensive guide on ESX Server memory virtualization.

In a non-overcommitted situation, when a VM is powered on, the VMkernel attempts to allocate a region of
physical memory for the exclusive use of this VM. This memory space must be no larger than the maximum
memory size and no smaller than the minimum memory guarantee (assuming the VM is requesting at least
its minimum memory allocation). The total memory space may initially be comprised of both physical RAM
and VMkernel swap space. ESX Server performs this allocation and creates the address mappings that
allow virtual memory to be mapped to the physical memory. When the VM is powered off, its memory
allocation is returned to the pool of free, available physical memory.
Other memory consumers
Apart from VMs, there are two other consumers of memory within a physical host:
Service console
In the past, the service console required a specific amount of memory overhead for each VM running on a
host. The release of ESX Server 3.0 employs a new architecture whereby the service console is no longer
burdened with memory requirements per VM running on a host. Individual VM process threads are handled
directly by the VMkernel, thereby eliminating the need for additional service console memory per VM.
However, if you intend to run additional agents – for hardware monitoring and/or backup – in the service
console, it may be prudent to allocate more memory than the defaults allow.

Memory management
ESX Server provides advanced memory management features that help ensure the flexible and efficient
use of system memory resources. For example, ESX Server systems support VM memory allocations that are
greater than the amount of physical memory available – overcommitting – as well as background memory
page sharing and ballooning.
When attempting to power on a VM, an ESX Server host first verifies that there is enough free physical
memory to meet the guaranteed minimum needed to support this VM. Once this admission control check
has passed, the VMkernel creates and presents the virtual memory space.
While virtual memory space is created and completely addressed as the VM is powered on, physical
memory is not allocated entirely at this time; instead, the VMkernel allocates physical memory to the VM as
needed. In every case, VMs are granted uncontested allocations of physical memory up to their
guaranteed minimums; because of admission control, these allocations are known to be present and
available in the server.
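Admission control at power-on reduces to comparing unreserved physical memory against the new VM's guaranteed minimum. The following is only a conceptual model with invented bookkeeping, not the VMkernel's accounting.

```python
def can_power_on(host_mem_mb: int, reserved_minimums_mb: list, new_vm_min_mb: int) -> bool:
    """Admit a VM only if its guaranteed minimum can still be backed by physical memory."""
    return host_mem_mb - sum(reserved_minimums_mb) >= new_vm_min_mb

# A 1.5 GB host already backing minimums of 256 + 256 + 512 MB can still admit a 512 MB minimum.
print(can_power_on(1536, [256, 256, 512], 512))        # True
# A further VM with a 512 MB minimum would violate the existing guarantees.
print(can_power_on(1536, [256, 256, 512, 512], 512))   # False
```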
If the entire physical memory pool is already being actively used when a VM requests the memory due to it
according to its guaranteed minimum, the VMkernel makes physical memory available by decreasing the
physical memory allocation to another VM deployed on the same host. The VMkernel relies on its own swap
file to accommodate the increased physical memory demand.
Consider the following example where two VMs, VMA and VMB, are each guaranteed a minimum of 256
MB and a maximum of 512 MB of RAM, and two additional VMs, VMC and VMD are each guaranteed a
minimum of 512 MB and a maximum of 1024 MB of RAM. Ignoring all service console allocations and
virtualization overheads for a moment, assume that the server has 1.5 GB of RAM and that VMA and VMB
are each actively using 512 MB of physical memory while VMC and VMD are each actively using only 256
MB. Admission control allows all of these VMs to run since the 1.5 GB of physical memory can
accommodate the guaranteed minimums. While all physical memory is consumed in this example (as
shown in Figure 7), some machines are not actively using their guaranteed minimum or maximum
allocations.

Figure 7: In this example, VMC and VMD are using less than their guaranteed minimum memory allocations; all physical
memory is consumed

Now, to continue with this example, VMC and VMD each request an additional 256MB of memory, which is
guaranteed and must be granted. To accommodate these additional allocations, ESX Server reclaims
physical memory from VMA and VMB (as shown in Figure 8), both of which are operating above their
minimum guaranteed allocations. If VMA and VMB have equivalent memory shares, each should relinquish
roughly the same amount of memory.
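As a rough illustration of that split (invented names and arithmetic, not ESX Server's reclamation code), memory held above each donor VM's guaranteed minimum is surrendered in proportion to its share weight:

```python
def reclaim(deficit_mb: int, donors: dict) -> dict:
    """Take deficit_mb from VMs holding memory above their minimums, weighted by shares."""
    total_shares = sum(d["shares"] for d in donors.values())
    taken = {}
    for vm, d in donors.items():
        wanted = round(deficit_mb * d["shares"] / total_shares)
        taken[vm] = min(wanted, d["in_use_mb"] - d["min_mb"])   # never dip below the minimum
    return taken

# VMC and VMD need 256 MB each; VMA and VMB (equal shares) give up roughly 256 MB apiece.
donors = {"VMA": {"in_use_mb": 512, "min_mb": 256, "shares": 1000},
          "VMB": {"in_use_mb": 512, "min_mb": 256, "shares": 1000}}
print(reclaim(512, donors))   # {'VMA': 256, 'VMB': 256}
```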

Figure 8: In this continuing example, VMC and VMD are allocated memory that has been reclaimed from VMA and
VMB

However, the applications running within VMA and VMB are not aware of memory guarantees and will
continue to address what they perceive to be their full memory ranges; as a result, the VMkernel must
address the deficits.

Using the balloon driver


The VMkernel has a range of options for addressing these deficits; two options are active and one passive.
The preferred active approach is to employ the balloon driver, a virtual memory controller driver installed in
a VM with the VMware Tools package. The VMkernel can instruct the balloon driver to inflate (consume)
memory within the memory space of a VM, forcing the guest operating system to use its own algorithms to
swap its own memory contents.

Note:
In the case of a balloon driver-induced swap within a VM, memory is
swapped to the VM’s – rather than the VMkernel’s – swapfile.

Memory that is reclaimed by the balloon driver and then distributed to an
alternate VM is cleared prior to the re-distribution. Therefore, the re-allocated
memory contains no residual information from the virtual machine that
previously occupied that memory space. This process reinforces the isolation
and encapsulation properties that are inherent within VMware virtual
machines.

Since the guest operating system is able to make intelligent decisions about which pages are appropriate
to swap and which are not, ESX Server uses the balloon driver to force the guest operating system to apply
this intelligence to reduce the physical memory used by its processes. At the same time, ESX Server is able to
identify the memory pages consumed by the balloon driver. These consumed pages are useless to the VM
but, to ESX Server, they represent physical memory that is essentially free to commit to other VMs. The
balloon driver only inflates by an amount that is enough to reduce the VM’s physical memory utilization to
the appropriate guaranteed minimum memory allocation.
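Conceptually, the ballooning handshake has two halves, as in the simplified sketch below (invented functions and page numbering, not the actual VMware Tools balloon driver): the VMkernel hands the in-guest driver a target, the driver pins that many guest pages, and the pages it reports back become physical memory the VMkernel can grant to other VMs.

```python
def inflate_balloon(target_pages: int, guest_free_pages: list) -> list:
    """Guest-side step: pin target_pages pages; the guest OS swaps as needed to find them."""
    pinned = guest_free_pages[:target_pages]   # these pages are now useless to the guest
    return pinned                              # ...and are reported back to the hypervisor

def reclaim_from_guest(pinned_pages: list, host_free_pool: set) -> None:
    """Host-side step: the physical pages backing the pinned guest pages become reusable."""
    host_free_pool.update(pinned_pages)

host_pool = set()
reported = inflate_balloon(3, guest_free_pages=[10, 11, 12, 13, 14])
reclaim_from_guest(reported, host_pool)
print(sorted(host_pool))   # [10, 11, 12]: memory freed for allocation to other VMs
```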

Using a swap file


Beyond the balloon driver, ESX Server also supports a swap file for each VM. Once the balloon driver has
reduced every VM within a physical host to the guaranteed minimum memory allocation, additional
memory requested by VMs is granted through the VM swap file. ESX Server, which maintains a swap file for
each VM in the same location as the virtual machine configuration file (.vmx file), simulates the additional
memory by swapping memory contents to disk. The swap file is used when there simply is not enough
physical memory to accommodate requests beyond the guaranteed minimums.
Since the balloon driver is not an instantaneous solution, it may take a few minutes to fully inflate and free
hundreds of megabytes of physical memory. During this time, the VMkernel can use the VM swapfile to
provide the memory requested. After the ballooning operation is complete, the VMkernel may be able to
move all pages from the swapfile into physical memory.
In extreme circumstances when the VMkernel cannot provide timely memory allocations – either through
ballooning or the use of the swapfile – the VMkernel may temporarily pause a VM in an attempt to meet
memory allocation requests.

Using background memory page sharing


Both the VMkernel swapfile and the balloon driver can be considered active mechanisms to combat
memory overcommit. By contrast, background memory page sharing is a passive process that uses idle
cycles to identify redundant memory contents and consolidate them to reclaim physical memory. When
ESX Server detects an extended period of idleness in the system, the VMkernel will begin to compare
physical memory pages using a hashing algorithm. After encountering two memory pages that appear to
have the same contents, a binary compare is executed to confirm that the contents are identical. ESX Server then frees
up one of the memory pages by updating the memory mappings for both VMs to point to the same
physical memory address. In this way, physical memory can be freed up for additional VMs.
ESX Server performs a copy-on-write operation so that one VM can update a shared page without
affecting the original, shared data. Should a VM attempt to write to or modify a shared memory page, ESX
Server first copies the shared memory page, so that a distinct instance is created for each VM. The VM
requesting the write operation to the memory page is then able to modify its contents without affecting other VMs
sharing this same page.
Background memory page sharing should not have a noticeable impact on VM performance since, by
default, the memory scrubbing algorithms that detect redundant pages are only active during periods of
low activity.
The intended effect of background memory page sharing is to create free physical memory; how this free
memory is used will be dependent upon the specific virtualized environment. In many cases, the memory
simply remains free until VMs attempt to modify the shared pages; however, in some cases, the free
physical memory created by background memory page sharing is used to power on an additional VM.
Note that it is possible to free enough physical memory to allow more total memory to be allocated for VMs
than is available on the server.
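
The hash-then-compare flow described above can be pictured with the following simplified Python sketch (an approximation for illustration, not the VMkernel implementation; real page sharing operates on hardware pages and relies on copy-on-write, which this sketch omits):

# Illustrative sketch of background page sharing: hash pages to find
# candidates, confirm with a byte-for-byte compare, then map duplicates
# to a single physical copy.
from collections import defaultdict
import hashlib

def share_pages(pages):
    """pages: (vm, page_no) -> bytes. Returns duplicate -> canonical page key."""
    by_hash = defaultdict(list)
    for key, content in pages.items():
        by_hash[hashlib.sha1(content).hexdigest()].append(key)

    mapping = {}
    for candidates in by_hash.values():
        canonical = candidates[0]
        for other in candidates[1:]:
            # A hash match could be a collision, so confirm the contents.
            if pages[other] == pages[canonical]:
                mapping[other] = canonical   # both now reference one physical page
    return mapping

pages = {("VM1", 0): b"\x00" * 4096,
         ("VM2", 7): b"\x00" * 4096,
         ("VM3", 3): b"\xff" * 4096}
print(share_pages(pages))   # -> {('VM2', 7): ('VM1', 0)}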

More on memory overcommitting


Memory overcommit should be well understood before relying on this feature to deliver higher VM densities;
there is the potential for a significant performance impact in memory overcommit situations. This feature,
like many others, is best explained through an example. Begin with the following assumptions:
• The physical host is an HP ProLiant server with 2.0 GB of RAM (assume 2.0 GB = 2048 MB)
• There are three VMs; all are powered on
• Each VM has been assigned 576MB of RAM

3 (A) × 576 MB (B) = 1728 MB (C)

A – 3 virtual machines currently powered on
B – Total physical memory currently allocated per VM
C – Total physical memory required for all 3 VMs

Initially, this physical host would not have enough memory available to power-on a fourth VM. However,
through background memory page sharing, ESX Server may eventually find the necessary 256 MB of
redundant memory pages.
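
The 256 MB figure follows from simple arithmetic, sketched here with the numbers from the example (illustrative only):

# Illustrative arithmetic for the overcommit example above.
host_ram_mb = 2048          # 2.0 GB ProLiant host
vm_size_mb = 576            # memory assigned to each VM
powered_on = 3

allocated = powered_on * vm_size_mb       # 1728 MB currently allocated
free_mb = host_ram_mb - allocated         # 320 MB left on the host
shortfall = vm_size_mb - free_mb          # page sharing must reclaim this much
print(allocated, free_mb, shortfall)      # -> 1728 320 256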

Note:
The amount of redundant memory reclaimed on a system is highly dependent
on the nature of the specific environment. The opportunity to share memory
dramatically increases in an environment where VMs are executing the same
OS. Metrics appearing in this example are only used for illustrative purposes
and do not represent actual physical memory that may be reclaimed.

With this newly reclaimed memory, ESX Server has enough free RAM to power on a fourth VM. Unless the first
three VMs attempt to update their shared memory pages, this system will continue to function as expected
with no discernable manifestation of either the memory page sharing or memory overcommitment.
However, if activity in the three original VMs should increase to the extent that each VM needs its own
distinct page, ESX Server can accommodate this shortage of physical memory through the use of a
swapfile.
Based on the share-based memory allocation policy, ESX Server reclaims physical memory by moving the
memory contents of a VM to disk. Ordinarily, this would be a very risky operation since there is no reliable,
programmatic method for the VMkernel to identify VM pages that are optimal for swapping to disk and
pages that should never be swapped to disk (for example, it would be inappropriate to swap the guest
operating system's kernel pages). ESX Server solves this problem through the use of the balloon driver.
It is important to note, though, that this overcommit scenario with the use of the ESX Server swapfile can
have serious implications on the performance of the applications running in a VM. In environments where
performance is, in any way, a concern, avoid memory overcommitment. If possible, configure the physical
ESX Server platform with enough physical memory to accommodate all hosted VMs.

Recommendations for memory virtualization


From a performance perspective, the recommendations for memory virtualizations are few and
straightforward:
• When configuring servers to run ESX Server, try to install as much memory as possible on these systems, so
that more VMs can be run on them without memory overcommitment.
• To handle memory overcommit for virtual machines, you should install the VMware Tools package into
your VMs. This package includes a memory controller driver that allows ESX Server to gracefully reclaim
memory from individual VMs.
• When installing ESX Server, VMware recommends creating a VMkernel swapfile whose size is between
100% and 200% of the amount of physical RAM installed in a system. This large swapfile can
accommodate significant overcommitment and provide the VMkernel with maximum flexibility when
addressing memory allocation requests.

Network
Network virtualization in ESX Server is centered on the concept of a virtual switch, which is a software
representation of a 1016-port, full-duplex Ethernet switch. The virtual switch is the conduit between VM
network interfaces in two VMs or between a VM and the physical network. VM network interfaces connect
to virtual switches; virtual switches connect to both physical and virtual network interfaces, as shown in
Figure 10.

Figure 10: Showing a virtual switch providing connectivity between VMs and the network

When an application running in a VM attempts to send a packet, the request is handled by the operating
system and pushed to the network interface card device driver through the network stack. Inside the VM,
however, the network interface driver is actually the driver for the abstracted instance of the network
resource; the pathway through this virtual interface is not directly to the physical network interface but,
instead, passes the packet to the virtual switch components of the VMkernel. Once the packet has been
passed to the virtual switch, the VMkernel forwards the packet to the appropriate destination – either out of
the physical interface or to the virtual interface of another VM connected to the same virtual switch. Since
the virtual switch is implemented entirely in software, switch speed is a function of server processor power.
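
The forwarding decision itself can be pictured with a toy sketch (purely illustrative; the MAC addresses, port names, and table structure are assumptions, not ESX Server internals):

# Toy sketch of a virtual switch forwarding decision: deliver locally if the
# destination MAC belongs to a VM on this switch, otherwise use the uplink.
vswitch_table = {
    "00:50:56:aa:bb:01": "virtual port for VM-A",
    "00:50:56:aa:bb:02": "virtual port for VM-B",
}

def forward(dest_mac):
    return vswitch_table.get(dest_mac, "physical uplink (vmnic)")

print(forward("00:50:56:aa:bb:02"))   # stays inside the host, VM to VM
print(forward("00:11:22:33:44:55"))   # leaves the host via the physical NIC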
How to dedicate a physical NIC to a VM
A common question is, “How do I dedicate a physical NIC to a VM?”
By connecting a single virtual network adapter and a single physical network interface to a virtual switch, a
single VM obtains exclusive use of the physical interface. In this way, a physical NIC can be dedicated to a
VM. However, if a second VM is connected to the same virtual switch, both VMs will pass traffic through the
same physical interface.
A more complete response to the question of dedicating a physical NIC to a VM would be that a physical
network interface is not dedicated to a VM; instead, ESX Server is configured, through virtual switches, to
bridge only a single virtual network adapter through an individual physical interface.

Configuring virtual switches
Since it is possible to connect either more than one virtual adapter to a virtual switch or more than one
physical adapter to a single virtual switch (as shown in Figure 11), consider the multiplexing operation of a
virtual switch in each of these scenarios.

Figure 11: Showing connectivity options

First, consider a virtual switch connected to two virtual network adapters deployed in two different VMs. Just
as if this were a physical switch with two servers connected, these VMs can communicate with one another
via the virtual switch. This scenario can be scaled up to the limits of the virtual switch, a 1016-port device (32
ports by default), allowing up to 1016 VMs attached to the same virtual switch to communicate within a
physical host. In this environment, with no physical adapter connected to the virtual switch, network traffic is
utterly isolated from the physical network segment.
If this example is modified by connecting 1016 VMs and one physical adapter to the virtual switch, all 1016
VMs can communicate with the physical network via the single physical interface. Interestingly, in ESX
Server, physical adapters connected to virtual switches do not deduct from the number of ports available
on a virtual switch.
Note that, in the example, each VM has only a single network interface connected to the virtual switch;
there is no reason for a VM to have more than one virtual network interface connected to a single virtual
switch. In fact, ESX Server does not allow a VM to be configured with more than one virtual network adapter
connected to a single virtual switch.
Virtual network adapters, virtual switches, and the connections between these devices, are VMkernel
processes whose speeds are dictated by server CPU speed; as purely software processes within the
VMkernel, these devices are operational as long as the VMkernel is operational. Unlike physical network
components, virtual network devices cannot fail or reach the physical limitations of media throughput. As a
result, there is no need to use multiple virtual adapters to address fault tolerance or performance concerns
– not always the case for physical adapters.

When interfacing with the physical world, however, virtual switches can be connected to multiple physical
network adapters, as shown in Figure 12.

Figure 12: Connecting virtual switches to multiple physical network adapters

When multiple physical network interfaces are attached to a single virtual switch, ESX Server and the
VMkernel recognize this as an attempt to address the fault-tolerance and performance concerns of
physical networks and automatically create a bonded team of physical network interfaces. This bonded
team is able to send and receive higher rates of data and, should a link in the bonded team fail, the
remaining member(s) of the team continue to provide network access. In other words, the connection of
multiple physical adapters to the same virtual switch creates a fault-tolerant NIC team for all VMs
communicating through this virtual switch. There is no need for a driver or special configuration settings
within the VMs.
The fault-tolerance delivered by the NIC team is completely transparent to the guest operating system.
Indeed, even if the guest operating system does not support NIC teaming or fault-tolerant network
connections, the VMkernel and the virtual switch deliver this functionality through the abstracted network
service exposed to the VM.

Load distribution
ESX Server does not distribute the frames that make up a single TCP session across multiple links in a bond.
This means that a session with a single source and single destination never consumes more bandwidth than
is provided by a single network interface. However, with IP-based load balancing, multiple sessions to multiple
destinations can together consume more total bandwidth than any single physical link provides. The only scenario in which
the same frame is sent over more than one interface occurs when no network links within the bond can be
verified as functional.
The network load sharing capability of ESX Server can be configured to employ load sharing policies based
on either Layer 2 (based on the source MAC address) or Layer 3 (based on a combination of the source
and destination IP addresses). By default, ESX Server uses the Layer 2 policy, which does not require any
configuration on the external, physical switch.

With MAC-based teaming, because the only consideration when determining which link to use is the source MAC
address, the VM always transmits frames over the same physical NIC within a bond. However, the IP-based
load distribution algorithm typically results in a more evenly balanced utilization of all physical links in a
bond.
If ESX Server is configured to use the IP-based load-distribution algorithm, the external, physical switch must
be configured to communicate using the IEEE 802.3ad specification. Because the MAC address of the VM
will appear on each of the physical switch ports over which the bond transmits, this configuration is
likely to confuse the switch unless 802.3ad is enabled. The load-distribution algorithm also handles inbound
and outbound traffic differently.
Figure 13 compares MAC-based load balancing with IP-based load balancing.

Figure 13: Comparing MAC- and IP-based load distribution


Distributing outbound traffic
When the IP-address-based load-distribution algorithm is enabled and outbound network packets are sent
from a virtual machine, non-IP frames are distributed among the network interfaces within a single bond in
a round-robin fashion. IP-based traffic is, by default, distributed among the member interfaces of a bond
based on the destination IP address within the packet. This algorithm has the following strengths and
weaknesses:
• It prevents out-of-order TCP segments and provides, in most cases, reasonable distribution of the load.
• 802.3ad-capable network switches may be required as there have been reports that this algorithm
confuses non-802.3ad switches.
• It is not well-suited for environments where a VM communicates with a single host – in this environment, all
traffic is destined for the same IP address and would not be distributed.
• With this algorithm, all traffic between each pair of hosts traverses only one link per pair of hosts until a
failover occurs.
• The transmit NIC to be used by a VM for a network transaction is chosen based on whether the
combination of destination and source IP addresses is even or odd. Specifically, the IP-based algorithm
uses an exclusive-or (XOR) of the last bytes in both the source and destination IP addresses to determine
which physical link should be used for a source-destination pair. By considering both the source and
destination IP addresses when selecting the bond member, it is possible for two VMs within the same ESX
Server host, using the same network bond, to select different physical interfaces, even when
communicating with the same remote host (a simple sketch of this selection appears below).
It is important to note that the load distribution algorithm is not bound on a per-VM basis. In other words, the
path selection for load distribution supports different physical paths to the same destination on a
multi-homed virtual adapter.
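
The outbound path choice described above can be approximated in a few lines (a behavioral sketch, not VMkernel source; the IP addresses and NIC names are examples):

# Illustrative sketch: choose a bond member by XORing the last bytes of the
# source and destination IP addresses (approximation of the IP-based policy).
def pick_uplink(src_ip, dst_ip, uplinks):
    last_byte = lambda ip: int(ip.split(".")[-1])
    return uplinks[(last_byte(src_ip) ^ last_byte(dst_ip)) % len(uplinks)]

bond = ["vmnic0", "vmnic1"]
print(pick_uplink("10.0.0.21", "10.0.0.50", bond))   # -> vmnic1
print(pick_uplink("10.0.0.22", "10.0.0.50", bond))   # -> vmnic0 (same destination, different NIC)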

Distributing inbound traffic


For inbound traffic destined for a VM, there are two possible load distribution methods, depending on the
capabilities of the external, physical switch. For non-802.3ad switches, the return path for packets is
determined by the MAC address table built by the switch and by an understanding of which machines (virtual or
physical) are connected to which physical ports. Because performance with non-802.3ad switches may be
affected by this learning process, higher throughput is possible with 802.3ad-compatible switches when
using bonded NICs with ESX Server.
Having decided which load distribution mechanism is most appropriate for your environment, you must
configure your virtual switches appropriately. This advanced configuration is available through the Virtual
Infrastructure Client management interface under the “Configuration” tab for the appropriate ESX Server
host.
ESX Server supports several other virtual switch and load-distribution settings, which are documented in the
Virtual Infrastructure Server Configuration Guide.

Eliminating the switch as a single point of failure
ESX Server allows a single team of bonded NICs to be connected to multiple physical switches, eliminating
the switch as a single point of failure. This feature requires the beacon monitoring feature of both the
physical switch and ESX Server NIC team to be enabled.
Beacon monitoring allows ESX Server to test the links in a bond by sending a packet from one adapter to
the other adapters within a virtual switch across the physical links. For more information on the beacon
monitoring feature, see the ESX Server administration guide.

Improving network performance


If a VM is not delivering acceptable network performance, load distribution through NIC teaming and
bonding (as described above) can improve performance in certain situations.
Bonding physical Gigabit Ethernet ports is not likely to improve performance; in general, VMs cannot
saturate a single Gigabit Ethernet port because of the CPU overhead associated with extremely high network
throughput. However, bonding 100Mbps Ethernet ports can improve throughput if all the following
conditions are met:
• There are spare CPU cycles within the physical server to handle the additional processing required for
increased traffic
• The VM is communicating with more than one destination
• ESX Server is configured to perform Layer 3-based load distribution (using IP addresses)
• The performance limitation is the network interface

Another effective way to improve VM performance is by deploying the VMware vmxnet device driver. By
default, VMs are created with a highly-compatible virtual network adapter – the device reports itself as an
AMD PCNet PCI Ethernet adapter (vlance). This device is used as the default because of its near-universal
compatibility – there are DOS drivers for this adapter, as well as Linux, Netware, and all versions of Windows.
While this virtual adapter reports link speeds of 10Mbps with only a half-duplex interface, the actual
throughput can be much closer to the capabilities of the physical interface.
If the vlance adapter is not delivering acceptable throughput or if the physical host is suffering from
excessive CPU utilization, higher throughput may be possible by changing to the vmxnet adapter, which is a
highly-tuned virtual network adapter for VMs. The vmxnet driver is installed as a component of the VMware
Tools package, and must be supported by the operating system running in the virtual machine. For a list of
supported operating systems, please see the System Compatibility Guide for ESX Server 3.
Another key to maximizing the performance of physical network adapters is the manual configuration of
the speed and duplex settings of both the physical network adapters in an ESX Server and the physical
switches to which the ESX Server is connected. VMware Knowledge Base article #813 details the settings
and steps necessary to force the speed and duplex of most physical network adapters.
In most cases, ESX Server is configured to dedicate a physical network adapter to the service console for
management and administration. There are, however, scenarios where it may be necessary to have the
service console use a network adapter that is allocated to the ESX Server VMkernel. Such scenarios are
usually introduced by dense server blade configurations that have only two physical NICs and cannot
spare an entire physical interface for the service console. In this case, the service console can share the
same virtual networking resources (virtual switches and network adapters) as the VMkernel. This is achieved by correctly
configuring a single virtual switch to handle a combination of interfaces such as the service console,
VMotion and VM networks. Although consolidating the interfaces is not a recommended best practice, it is
possible.

How the network perceives VMs


It is critical to remember that all virtual network adapters have their own MAC addresses. Since TCP/IP is
governed by the operating system within a VM, each VM requires its own IP address for network
connectivity. To an external network, a VM looks exactly like a physical machine, with every packet having
a unique source MAC and IP address.

To handle multiple source MAC addresses, the physical network interface of the server is put into
promiscuous mode. This causes its physical MAC address to be masked; all packets transmitted on the
network segment are presented to the VMkernel virtual switch interface. Any packets destined for a VM are
forwarded to the virtual network adapter through the virtual switch interface. Packets not destined for a VM
are immediately discarded. Similarly, network nodes perceive packets from a VM to have been transmitted
by the VM; the role of the physical interface is undetectable – the physical network interface has become
similar to a port on a switch, an identity-less conduit.

VLANs
ESX Server and virtual switches also support IEEE 802.1q VLAN tagging. To increase isolation or improve the
security of network traffic, ESX Server allows VMs to fully leverage existing VLAN capabilities and even
extends this functionality by implementing VLAN tagging within virtual switches.
VLAN tagging allows traffic to be isolated within the confines of a switched network. Traditionally, VLAN
tagging is performed by a physical switch, based on the physical port on which a packet arrives at the
switch. In an environment with no virtualized server instances, this approach provides complete isolation
within broadcast domains. However, when virtualization is introduced, port-based tagging at the physical
switch does not provide VLAN isolation between VMs that share the same physical network connection.
To address the scenario where broadcast-domain isolation is required between two VMs sharing the same
physical network, virtual switches support the creation of port groups that can provide VLAN tagging
isolation between VMs within the confines of a virtual switch. Port groups aggregate multiple ports under a
common configuration and provide a stable anchor point for virtual machines connecting to labeled
networks. Each port group is identified by a network label, which is unique to the current host, and can
optionally have a VLAN tagging ID.
Considerations when configuring virtual switches
• When initially configuring your virtual switches on ESX Server, invest in creating a naming convention that
provides meaningful names for these switches beyond the context of a single server. For example,
VMotion requires that both the source and destination ESX Server have the same network names; for this
reason, virtual switch names like “Second Network” may not translate from server to server as easily as
more definitive designations like, “Production Network” or “Management Network.”
• ESX Server supports a maximum of 20 physical NICs, whether 100 Mbps or 1 Gbps

• A virtual switch provides up to 1016 ports for virtual network adapter connections; the default is 32.
However, physical connections do not consume ports on virtual switches. For example, if four physical
network cards are connected to a single virtual switch, that switch still has all 1016 ports available for VMs.
• When using VLAN tagging within a virtual switch, you should configure the VM’s network adapter to
connect to the name of the port group, rather than the name of the virtual switch. Note that the external,
physical switch port to which ESX Server connects should be set to VLAN trunking mode to allow the port
to receive packets bound for multiple broadcast domains.
• A virtual switch may connect to multiple virtual network adapters (multiple VMs), but a VM can have no
more than one connection to any virtual switch
• A physical adapter may not connect to more than one virtual switch, but a virtual switch may connect to
multiple physical network adapters. When multiple physical adapters are connected to the same virtual
switch, they are automatically teamed and bonded.
• If you are implementing VMotion within your ESX Server environment, reserve or assign a Gigabit NIC for
VMotion to ensure the quickest possible migration.

Note:
VMware only supports VMotion over Gigabit Ethernet; VMotion over a 10/100
Mbps network is not supported.

Storage
Storage virtualization is probably the most complex component within an ESX Server environment. Some of
this complexity can be attributed to the robust, feature-rich Storage Area Network (SAN) devices deployed
to provide storage, but much is due to the fact that SANs and servers are often managed independently,
sometimes by entirely different organizations. As a result, this white paper discusses storage virtualization
from the following two perspectives:
• How the SAN (iSCSI, fibre channel) sees ESX Server
• How ESX Server sees the SAN (iSCSI and fibre channel)

Presenting both perspectives should help both SAN and server administrators better communicate their
unique requirements in an ESX Server deployment.

Architecture
Figure 14 presents an overview of virtual storage.

Figure 14: A virtual storage solution with three VMs accessing a single VMFS volume

ESX Server storage virtualization allows VMs to access underlying physical storage as though it were JBOD
SCSI within the VM – regardless of the physical storage topology or protocol. In other words, a VM accesses
physical storage by issuing read and write commands to what appears to be a local SCSI controller with a
locally-attached SCSI drive. Either an LSILogic or BusLogic SCSI controller driver is loaded in the VM so that
the guest operating system can access storage exactly as if this were a physical environment.
When an application within the VM issues a file read or write request to the operating system, the operating
system performs a file-to-block conversion and passes the request to the driver. However, the driver in an ESX
Server environment does not talk directly to the hardware; instead, the driver passes the block read/write
request to the VMkernel, where the physical device driver resides; the read/write request is then
forwarded through the actual physical hardware device to the storage controller. In previous
versions of ESX, the physical device drivers were not loaded in the kernel, which created an extra leg in the
journey from the VM to the physical storage. The integration of the drivers into the kernel in ESX Server 3
thereby removes an extra translation layer and improves I/O performance.
The storage controller may be a locally-attached RAID controller or a remote, multi-pathed SAN device –
the physical storage infrastructure is completely hidden from the virtual machine. To the SAN, however, the
converse is true: VMs are completely hidden from the physical storage infrastructure. The storage controller
sees I/O requests that appear to originate from an ESX Server; all storage bus traffic from VMs on a
particular physical host appears to originate from a single source.

There are two ways to make blocks of storage accessible to a VM:
• Using an encapsulated, VMware File System (VMFS)-hosted VM disk file
• Using a raw LUN formatted with the operating system’s native file system

VMFS
The vast majority of (unclustered) VMs use encapsulated disk files stored on a VMFS volume.

Note:
VMFS is a high-performance file system that stores large, monolithic virtual disk
files and is tuned for this task alone.

To understand why VMFS is used requires an understanding of VM disk files. Perhaps the closest analogy to a
VM disk file is an .ISO image of a CD-ROM disk, which is a single, large file containing a file system with many
individual files. Through the virtualization layer, the storage blocks within this single, large file are presented
to the VM as a SCSI disk drive, made possible by the file and block translations described above. To the VM,
this file is a hard disk, with physical geometry, files, and a file system; to the storage controller, this is a range
of blocks.
A VM disk file is, for all intents and purposes, the hard drive of a VM. This file contains the operating system,
applications, data, and all the settings associated with a typical/conventional hard drive. If an
administrator were to delete a VM disk file, it would be analogous to throwing a physical hard drive in the
trash – the data, the operating system, the applications, the settings, and even blocks of storage would be
lost. By the same token, if an administrator were to copy a VM disk file, an exact duplicate of the VM’s hard
drive would be created for use as a backup or for cloning the particular configuration.
Unlike Windows and Linux operating systems, ESX Server does not lock a LUN when it is mounted – a simple
fact that is the source of both power and potential confusion in an ESX Server environment. When
configuring a switched SAN topology, it is critical to use zoning, selective storage presentation, or LUN
masking to limit the number of physical servers (non-ESX Server) that can see a particular LUN. Without
limiting which physical – Windows or Linux – servers can see a LUN, locking and LUN contention will quickly
cause data to become inaccessible or inconsistent between nodes.
VMFS is inherently a distributed file system, allowing more than one ESX Server to view the same LUN. Unlike
Windows/NTFS or Linux/ext3, ESX Server/VMFS supports simultaneous access by multiple hosts. This means
that while numerous ESX Server instances may view the contents of a VMFS LUN, only one ESX Server may
open a file at any given moment. To an ESX Server and VMFS, when a VM is powered on, the VM disk file is
locked.
While VMotion is described in detail later in the document, it might be helpful to explain now that, in a
VMotion operation, the VM disk file remains in place on the SAN, in the same LUN; file ownership is simply
transferred between ESX Server hosts that have access to the same LUN.
The distributed nature of VMFS means that, when configuring the SAN to which ESX Server is attached,
zoning should be configured to allow multiple ESX Servers to access the same LUN where the VMFS partition
resides. This may be out of the ordinary for the SAN administrator.

Figure 15 shows a typical SAN solution.

Figure 15: A virtual storage solution with six VMs accessing LUNs on a SAN array

There are many perspectives to a virtualized storage environment:


• To VMs, VMFS is completely hidden. A VM is not aware that the storage that it sees is encapsulated in a
larger file within a VMFS volume.
• To ESX Server, multiple LUNs and multiple VMFS partitions may be visible. ESX Server can run multiple VMs
from multiple SAN devices; that is, ESX Server does not require or prefer that all VMs are in the same VMFS
volumes.
• To the SAN, the controller should be configured to expose the LUN containing the VMFS volume to any
and all ESX Servers that might be involved in VMotion operations for a particular VM.

LUN performance considerations


When constructing a LUN for VMFS volumes, you should follow some basic storage rules that apply to VMFS
and a few that require further consideration.
As with any LUN, more spindles mean more concurrent I/O operations. When planning a storage
configuration that maximizes storage performance, you should deploy as many spindles as practical to
create your VMFS LUN.

Remember that the VMFS volume will host multiple VMs, which has two effects on LUN performance:
• Since a single VMFS volume may have multiple ESX Servers and each ESX Server may have multiple VMs
within the same partition, the I/O loads on a VMFS-formatted LUN can be significantly higher than the
loads on a single-host, single-operating system LUN.
• Since many VM disk files are likely to be stored within a single VMFS volume, the importance for fault
tolerance on this LUN is amplified. Always employ at least the level of fault tolerance used for physical
machines.
Fault tolerance becomes even more of a concern if a larger VMFS volume is created from multiple,
smaller VMFS extents within ESX Server. Should any one extent fail, all data within that extent would be
lost, whereas information on the remaining extents would remain available. Therefore, measures like RAID
technology and stand-by drives should be considered standard as part of any VMFS LUN.

From a pure performance perspective, tuning an array for a particular application may not be as effective
with VMs as with physical machines. Since VM storage is abstracted from the VM and, typically,
encapsulated in a virtual machine disk file within a VMFS volume, it is probable that the same parameters
that enhanced database performance in an NTFS partition will not deliver the same gains in a virtualized
environment. As a result, at this time there are no recommended application-specific tuning parameters for
a VMFS formatted LUN.

Tuning VM storage
While it may be possible to perform some storage performance tuning in an ESX Server environment, you
should consider some potential trade-offs.
Storage performance tuning generally involves a low-level understanding of how an application accesses
disks and how to configure placement, allocation units, and caches within an array to optimize the
performance of this application. What is not always considered is that enhancing the performance for one
application may, in practice, degrade the performance of many other applications.
Understanding this tuning trade-off is especially important in an ESX Server environment where dissimilar
applications may access the same groups of spindles. If an array hosting the virtual disks for several file and
print server VMs were tuned to optimize Microsoft SQL Server performance, for example, the performance
of the file and print servers would probably be degraded. It is also possible that, since the array is tuned for
SQL Server traffic – and is therefore less efficient when handling file and print traffic – SQL Server
performance could be degraded while the array struggles with the suboptimal file and print workload.
What may ultimately determine the degree to which storage is tuned for VM application performance is
the trade-off in flexibility. For the majority of deployments, the flexible, on-demand capability to create and
move VMs is one of the most powerful features of an ESX Server environment. To some extent, creating LUNs
that are tuned for specific applications restricts this flexibility.

Using raw device mapping


While the concept of raw device mapping is described in detail later in this white paper, it is relevant to
mention here that raw device mapping can allow a VM to access a LUN in much the same way as a non-
virtualized machine. In this scenario, where LUNs are created on a per-machine basis, the strategy of tuning
a LUN for the specific application within a VM may be more appropriate.
Since raw device mappings do not encapsulate the VM disk as a file within the VMFS file system, LUN
access more closely resembles the native application access for which the LUN is tuned.

Other design considerations
• When designing LUN schemes and storage layouts for a virtualized environment, you should consider the
requirements of VMotion, which needs all VM disk files (or the raw device mapping file) to be visible on
the SAN to both the source and destination servers.
• According to the “VirtualCenter Technical Best Practices” white paper, available at
http://www.vmware.com/pdf/vc_technical_best.pdf, there should be no more than 16 ESX Server hosts
connected to a single VMFS volume.
• In a larger deployment, it may not be practical to expose all VMs to all hosts; as a result, care should be
taken to ensure that VM disk files or disk mappings are accessible to the appropriate ESX Server hosts.

Sizing VM disk files


If a 72 GB hard drive is created for a virtual machine, ESX Server will create a sparse file within the specified
VMFS volume. As the space requirements for that VM increase, the VM disk file increases as well. While a 72
GB file is too large for most other file systems, VMFS can accommodate a 27 TB file, allowing the VM to
support disk files that meet the needs of almost all enterprise applications.
Presenting a raw LUN to a VM
Presenting the encapsulated VM disk file may not always be the optimal storage configuration. To a VM
running on an ESX Server, its VM disk file appears as a hard drive, with geometry, many files, and a file
system. However, to an external system not running within the ESX Server instance, the VM disk file appears
as a single, monolithic file.
If, for example, a VM were to be clustered with a physical machine, the data and quorum drives could not
be VM disk files since the physical cluster node would not be able to read a VMFS file system, let alone the
encapsulated virtual disk file.
To accommodate scenarios where external physical machines must share data (at a block level) with a
VM, ESX Server allows a raw LUN to be presented to the VM. The raw LUN is nothing more than a traditional
array of drives, as opposed to the encapsulated, monolithic virtual machine disk file. With a raw LUN, the
VM can be configured to use storage in nearly the same way that a physical device accesses storage
(except that the VM still accesses this raw LUN through a driver that presents the blocks as a locally
attached SCSI drive; the VMkernel still does the translation and encapsulation that results in the I/O
reaching the SAN storage controller).
This configuration is most commonly deployed when a VM is clustered with a physical server (as shown in
Figure 16). However, a VM cannot be clustered with a physical server running multipathing software such as
HP StorageWorks Secure Path; in this scenario, some custom multipathing commands are not supported in
ESX Server bus sharing.

Figure 16: A VM and physical machine clustered with a raw LUN

Using its capability to attach a raw device as a local storage device, a VM can hold or host data within
native operating system file systems, such as NTFS or ext3.
Raw device mapping
Prior to the release of ESX Server 2.5, the use of raw devices meant that many of the flexible aspects of
VMFS and VM disk files were not available. However, a feature called Raw Device Mapping (RDM)
addresses this shortcoming by allowing a VM to attach to a raw device as though it were a VMFS-hosted
file. With RDM, raw devices can deliver many of the same features previously reserved for VM disk files –
particularly VMotion and .redo logs.

Note:
.redo logs for VM disk files used in undoable disks and VM snapshots are
available only when raw device mapping is in virtual compatibility mode.

Raw device mapping relies on a VMFS-hosted pointer – or proxy – file to redirect requests from the VMFS file
system to the raw LUN.
For example, consider the following VMFS directory:
[root@System1 root]# ls -la /vmfs/demo/
total 25975808
drwxrwxrwt 1 root root 512 Aug 19 20:12 .
drwxrwxrwt 1 root root 512 Aug 22 19:10 ..
-rw------- 1 root root 4194304512 Aug 25 15:13 W2K-SQL.vmdk
-rw------- 1 root root 18210038272 Aug 19 20:12 W2K-SQLDATA.vmdk
-rw------- 1 root root 4194304512 Aug 25 15:13 WNT-BDC.vmdk

In the above example, the VMFS volume demo contains both VM disk files and VM raw device mappings.
W2K-SQLDATA.vmdk is the raw device mapping that points the physical host to the appropriate LUN.
Note that raw device mapping appears to be exactly like a VM disk file, even appearing to have a file size
that is equivalent to the LUN to which the mapping refers. Since the map file is accessible through VMFS, it
appears to all physical hosts that can see the VMFS volume. When a VM attempts to access its raw-device-
mapped storage, the VMkernel resolves the SAN target through the data stored in the mapping file, which
is able to do per-host resolution for the raw device proxied by the raw device mapping file.
Consider a second example with two physical hosts; on each host is one node of a two-node cluster. Each
node – NodeA and NodeB – references a shared data disk that is a raw device. The VM configuration file
for NodeA references the shared disk as /vmfs/demo/data_disk.vmdk; NodeB shares this apparently
identical reference to /vmfs/demo/data_disk.vmdk for the shared data drive. However, because of
physical and configuration differences between the two systems, the physical SAN paths to the VMFS
volume demo and the physical SAN paths to the LUN referenced by the mapping file data_disk.vmdk are
different.
For the server hosting NodeA, the physical SAN address for the demo LUN is vmhba1:0:1:2; for the server
hosting NodeB, the physical SAN address for the demo LUN is vmhba2:0:1:2. Similarly, the paths to the LUN
referred to by raw device mapping might be different. Without raw device mapping, only the physical,
static SAN path is used to access the raw LUN. Since the two physical hosts access the LUN over different
physical SAN paths, the VM configuration files would have to be updated to resolve the change in SAN
without a raw device map.
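
Conceptually, the mapping file lets each host resolve the same VMFS-level reference to its own physical path, as in this hypothetical sketch (the mapping-file path comes from the example above; the lookup table and the physical paths shown for the raw LUN are assumptions):

# Illustrative sketch: one raw device mapping file, two hosts, two different
# physical SAN paths to the same raw LUN (values are hypothetical).
rdm_paths = {
    "HostA": {"/vmfs/demo/data_disk.vmdk": "vmhba1:0:2:0"},
    "HostB": {"/vmfs/demo/data_disk.vmdk": "vmhba2:0:2:0"},
}

def resolve(host, mapping_file):
    return rdm_paths[host][mapping_file]

print(resolve("HostA", "/vmfs/demo/data_disk.vmdk"))   # -> vmhba1:0:2:0
print(resolve("HostB", "/vmfs/demo/data_disk.vmdk"))   # -> vmhba2:0:2:0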
By removing the limitations of static SAN path definitions, raw device maps enable VMotion operations with
VMs that use raw devices. Additional functionality is enabled by raw device mappings; now, all raw device
access for a mapped LUN is proxied through a VMFS volume. As a result, the raw device may have access
to many of the features of the VMFS file system, depending on the mapping mode used.

There are two raw device mapping modes – virtual compatibility mode and physical compatibility mode.
• Virtual compatibility mode allows a mapped raw device to inherit nearly all of the features of a VM disk
file – such as file locking, file permissions, and .redo logs.
• Physical compatibility mode allows nearly every SCSI command to be passed directly to the storage
controller. This means that SAN-based replication tools, such as HP StorageWorks Business Copy or
Continuous Access, should work within a VM that is presented storage through a raw device map in
physical compatibility mode. This mode should allow SAN management applications to communicate
directly with storage controllers for monitoring and configuration.
Check with the storage vendor to determine if the appropriate storage management software has been
tested and is supported for running in a VM.

Testing has shown no performance difference between VMs accessing storage as encapsulated disk files
and those accessing storage as raw volumes; however, from an administrative perspective, the use of raw
volumes requires more coordination between SAN and server administrators. VMFS does not require the
strict SAN zoning needed to support raw devices with non-distributed file systems.
From a functional perspective, with the introduction of RDM, many of the differences between VMFS and
raw devices have been resolved. As such, unless there is an application requirement or architectural
justification for using raw devices, the use of VM disk files in a VMFS volume is preferable due to their
flexibility and ease of management. For example, with raw storage devices an administrator must create a
LUN on the SAN whenever a new VM is to be created; on the other hand, when creating a VM using a VM
disk file within a VMFS volume, no SAN administration is required since the LUN already exists.

Planning partitions
Before installing ESX Server, VMware strongly recommends that you consider your partitioning needs.
Repartitioning an ESX Server requires some Linux expertise; it is easier to plan an appropriate installation
rather than having to repartition later. Table 2 shows the recommended partitioning for a typical scenario.
See the Installation and Upgrade Guide for more detailed information.

Table 2: Default storage configuration and partitioning for a VMFS volume on internal drives

Partition name    File system format    Size                 Description

/boot             ext3                  100 MB (fixed)       This partition contains the service console kernel, drivers, and the Linux boot loader (LILO), as well as the LILO configuration files.
/                 ext3                  2560 MB (fixed)      Called the “root” partition, this contains all user and system files used by the service console, including the ESX Server configuration files.
Swap              swap                  544 MB (fixed)       The swap partition is used by the Linux kernel of the service console.
/var/log          ext3                  2 GB                 The /var partition can provide log file storage outside of the service console root partition.
vmkcore           vmkcore               100 MB               This partition serves as a repository for the VMkernel core dump files in the event of a VMkernel core dump.
VMFS              VMFS3                 <remaining space>    The VMFS file system for the storage of virtual machine disk files.

Note:
If your ESX Server host has no network storage and only one local disk, you must
create two more required partitions on the local disk (for a total of five
required partitions):
• vmkcore – a vmkcore partition is required to store core dumps for
troubleshooting. VMware does not support ESX Server host configurations
without a vmkcore partition.
• vmfs3 – a vmfs3 partition is required to store your virtual machines.
These vmkcore and vmfs3 partitions are required on a local disk only if the ESX
Server host has no network storage.

The /var partition can be particularly important as a log file repository. By having the /var mount point
reference a partition that is separate from the root partition, the root partition is less likely to become full. If
the root partition on the service console becomes completely full, the system may become unstable.

Implementing boot-from-SAN
The distributed nature of the VMFS file system can only be leveraged in a shared storage environment;
currently, a SAN (iSCSI or fibre channel) or NAS is the only form of shared storage certified for use
with ESX Server. As a result, most ESX Server deployments are attached to a SAN.
ESX Server supports boot-from-SAN, wherein the boot partitions for the Linux-based service console are
placed on the SAN (iSCSI or fibre channel); NAS does not support boot-from-SAN. In this boot-from-SAN
environment, there is no need for local drives within the physical host.
Unlike the VMFS volumes used for storing VM disk files, the partitions for booting ESX Server, which use the
standard Linux ext3 file system, should not be zoned for access by more than one system. In other words, in
the zoning configuration within your SAN, VMFS volumes may be exposed to many hosts; however, boot
partitions – /boot, / (root), swap and any other service console partitions you may have created – should
only be exposed to a single host.

The configuration of ESX Server to boot from SAN should be performed at installation time. If you are
installing from the product CD, you must select either the bootfromsan or bootfromsan-text option.

Noting changes to the boot drive and device specification


When booting from a local controller, which uses the cciss.o driver, the drives and partitions are referenced
under /dev/cciss/cXdY (where X is the controller number and Y is the device number on the controller).
When booting from SAN, it is important to note the change to the boot drive and device specification. The
boot devices are presented as SCSI devices to the service console and therefore are referenced under
/dev/sda (or /dev/sdX, where X corresponds to the controller that provides access to the boot partitions).

Taking care during the installation


When installing ESX Server in a boot- from-SAN environment, exercise caution with the storage configuration
presented during the installation process. If you already have VMFS volumes on the SAN, ensure that the ESX
Server installer is not configured to create and format VMFS volumes; otherwise, the installer will format your
volumes, destroying VM disk files on the SAN.

Defining the connection type


In addition to the zoning configuration on the SAN, the connection type for each physical host must be
defined. This setting is different for each SAN, depending on model and manufacturer.
• For an HP StorageWorks Modular Smart Array (MSA), using either the Array Configuration Utility (ACU) or
serial cable command line, set the operating system type to Linux in the Selective Storage Presentation
options.
• For an HP StorageWorks EVAgl (active/passive) set the connection type to Custom and enter the
following string for the connection parameters: 000000002200282E. For firmware 4001 (active/active
firmware for “gl” series) and above, use type vmware.
• For an HP StorageWorks EVAxl (active/active) set the connection type to Custom and enter the following
string for the connection parameters: 000000002200283E. For firmware 5031 and above, use type
vmware.
• For an HP StorageWorks XP disk array, use host mode type 0C.

Fibre Channel multipathing and failover


SAN multipathing functionality is built into the VMkernel.

Note:
ESX Server multipathing even allows fault-tolerant SAN access to non-VMFS
volumes, if there are raw devices within the VMs.

ESX Server identifies storage entities through a hierarchical naming convention that references the following
elements: controller, target, LUN and partition. This convention provides unique references to VMFS volumes,
such as vmhba2:0:1:2, for example.
This example corresponds to the partition accessed through HBA vmhba2, target 0, LUN 1, and partition 2.
When ESX Server scans the SAN, each HBA reports all LUNs visible on the storage network; each LUN reports
an ID that uniquely identifies it to all nodes on the storage network. After detecting the same unique LUN ID
reported by the storage network, the VMkernel automatically enables multiple, redundant paths to this LUN,
known as multipathing.
ESX Server uses a single storage path for a particular LUN until the LUN becomes unavailable over this path.
After noting the path failure, ESX Server switches to an operational path.
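
The vmhba naming convention introduced above can be decomposed with a small helper (written here for illustration only; it is not an ESX Server utility):

# Illustrative helper: split an ESX Server storage path of the form
# vmhba<adapter>:<target>:<LUN>:<partition> into its components.
def parse_vmhba(path):
    adapter, target, lun, partition = path.split(":")
    return {"adapter": adapter, "target": int(target),
            "lun": int(lun), "partition": int(partition)}

print(parse_vmhba("vmhba2:0:1:2"))
# -> {'adapter': 'vmhba2', 'target': 0, 'lun': 1, 'partition': 2}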

Fail-back
For fail-back after all paths are restored, two policies are available to govern the ESX Server response: fixed
and Most-Recently Used (MRU). These policies can be configured through the Storage Management
Options in the web interface or from the command line.
• The fixed policy dictates that access to a particular LUN should always use the specified path, if available.
Should the specified, preferred path become unavailable, the VMkernel uses an alternate path to access
data and partitions on the LUN. ESX Server periodically attempts to initialize the failed SAN path; when the
preferred path is restored, the VMkernel reverts to this path for access to the LUN.
• The MRU policy does not place a preference on SAN paths; instead, the VMkernel accesses LUNs over
any available path. In the event of a failure, the VMkernel maintains LUN connectivity by switching to a
healthy SAN path. The LUN will continue to be accessed over this path, regardless of the state of the
previously-failed path; ESX Server does not attempt to initialize and restore any particular path.

Note:
The concept of a preferred path applies only when the failover policy is fixed;
with the MRU policy, the preferred path specification is ignored.
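
The behavioral difference between the fixed and MRU policies described above can be summarized in a short sketch (an approximation for illustration, not VMkernel code; the path names are examples):

# Illustrative sketch of fail-back behavior: "fixed" reverts to the preferred
# path once it is healthy again, while "MRU" stays on the path it last used.
def select_path(policy, preferred, current, healthy_paths):
    if policy == "fixed" and preferred in healthy_paths:
        return preferred
    if current in healthy_paths:
        return current
    return healthy_paths[0]          # fail over to any surviving path

paths = ["vmhba1:0:1", "vmhba2:0:1"]
# The preferred path has just recovered after a failure:
print(select_path("fixed", paths[0], current=paths[1], healthy_paths=paths))  # vmhba1:0:1
print(select_path("MRU", paths[0], current=paths[1], healthy_paths=paths))    # vmhba2:0:1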

Application of the path policy is dictated, to a large extent, by the particular storage array deployed.
• For the active-passive SAN controllers found in HP StorageWorks EVA3000, EVA5000 and MSA-series arrays,
avoid the fixed policy; only use MRU.
• For the newer HP StorageWorks EVA4000, EVA6000, and EVA8000 arrays and all members of the HP
StorageWorks XP disk array family, which are all true active-active storage controller platforms, either
policy – fixed or MRU – can be used.

Since the physical storage mechanism is masked by the VMkernel, VMs are unaware of the underlying
infrastructure hosting their data. As a result, multipathing, multipathing policy, and path failover are all
irrelevant within a VM.

Resource Management
ESX Server 3.0 provides the ability for organizations to pool computing resources and then logically and
dynamically allocate guaranteed resources as appropriate, whether that be to organizations, individuals or
job functions. For the following sections, it is helpful to consider resource providers and resource consumers.

Clusters
VirtualCenter allows users to create clusters, which can be viewed as logical containers within which
computing resources are grouped. Each cluster can be configured to support VMware DRS and
VMware HA, which are discussed later in this section. Clusters are consumers of host resources and are
providers to resource pools and VMs.

Figure 17: Host systems aggregated into a single resource pool

VMware High Availability (HA) Clusters


ESX Server 3.0 provides a method to help improve service levels and uptime while removing the complexity
and expense of alternative high availability solutions. In addition to the ease of configuration, HA operates
independently of hardware and OS. In the event of a host failure, HA allows VMs to automatically restart on
an appropriate host within the HA cluster. The alternate host is chosen based on several factors including
resource availability and current workload. For detailed information on VMware HA, please refer to
“Automating High Availability (HA) Services with VMware HA.”

VMware Distributed Resource Scheduling (DRS) Clusters


Clusters which have been enabled for DRS provide the ability for a global scheduler, managed by
VirtualCenter, to automatically distribute VMs across the cluster. DRS provides different levels of service
based on the configuration of the cluster. For instance, the cluster can be configured to automatically
place VMs within the DRS cluster when they are powered on. Additionally, DRS can be configured to
dynamically balance the workload, across physical hosts within a cluster, based on real time utilization of
the clusters’ resources. For additional details and best practices regarding DRS, please refer to “Resource
Management with VMware DRS.”

Resource Pools
Resource Pools are used to hierarchically divide CPU and memory resources within a designated cluster.
Each individual host and each DRS cluster has a root resource pool which aggregates the resources of that
individual host or cluster. Child resource pools can be created from the root resource pool. Each child
owns a portion of the parent resources and can, in turn, provide a hierarchy of child pools. Resource pools
can be made up of both child resource pools and virtual machines. Within each pool, users can specify
reservations, limits, and shares, which are then available to the child resource pools or VMs. For a detailed
discussion on the benefits, usage and resource pool best practices, please refer to the VMware “Resource
Management Guide.”

Resource Allocation
ESX Server provides powerful, flexible hardware allocation policies to enforce Quality of Service (QoS) or
performance requirements, allowing users to define limits and reservations for CPU and memory allocations
within each VM. These dynamic resource management policies make it possible to reserve CPU resources
for a particular VM – even while the VM is operational. For example, administrators could improve the
potential performance of one VM by specifying a reservation of 100% of CPU resources; at the same time,
other VMs in the same physical host could be constrained to a limit of 25%.

Allocations can be absolute or share-based. In addition, the allocations can be made to a resource pool
or to an individual VM.

Absolute allocation
It is possible to set a limit and reservation for each VM on a physical host. If, for example, a VM has been
allocated 25% of a CPU, VMkernel gives the VM at least 25% of CPU regardless of the demands of other
VMs, unless the VM with the reservation is idle6. Likewise, a resource pool can be guaranteed a reservation
of 25% of the CPU resources of a cluster. Therefore, each VM within that resource pool will further divide the
compute resources available to that pool. Regardless of reservations, idle VMs are preempted in favor of
VMs requesting resources.

Share-based allocation
In addition to the absolute allocation of resources to an individual, busy VM (with limit and reservation
guarantees), share-based allocation provides a mechanism for the relative distribution of server resources
among VMs. This concept applies to resource pools as well.
Each VM is assigned a certain number of per-resource, per-VM shares. For example, if two VMs have an
equal number of CPU shares, the VMkernel ensures that they receive an equal number of CPU cycles
(assuming that neither reservations nor limits are violated for either VM and that neither VM is idle). If one VM
has twice as many shares as another, the VM with the larger share count receives twice as many CPU cycles
(again, assuming that reservations and limits are not violated for either VM and that neither VM is idle).

Consider a cluster that contains two resource pools, each with an equivalent number of CPU shares. The
VMkernel guarantees that each pool within that cluster is provided an equal number of CPU cycles. If one
resource pool has twice as many shares as the other, the pool with the larger share count receives twice as
many of the cluster’s CPU cycles as the other pool.
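
The proportional behavior described above can be approximated in a few lines of Python. This is only a
sketch of the arithmetic, not the VMkernel scheduler: reservations and limits are reduced to a simple clamp,
idle VMs are simply skipped, and the VM names and values are hypothetical.

# Sketch of share-proportional CPU distribution with reservation/limit clamps.
# Not the actual VMkernel algorithm; VM names and values are hypothetical.

def distribute_cpu(vms, capacity_pct):
    active = [vm for vm in vms if not vm["idle"]]   # idle VMs are preempted
    total_shares = sum(vm["shares"] for vm in active)
    for vm in active:
        proportional = capacity_pct * vm["shares"] / total_shares
        # A reservation acts as a floor, a limit as a ceiling.
        vm["cpu_pct"] = max(vm["reservation"], min(vm["limit"], proportional))
    for vm in vms:
        if vm["idle"]:
            vm["cpu_pct"] = 0
    return vms

vms = [
    {"name": "vm-a", "shares": 2000, "reservation": 0, "limit": 100, "idle": False},
    {"name": "vm-b", "shares": 1000, "reservation": 0, "limit": 100, "idle": False},
]
distribute_cpu(vms, 100)   # vm-a receives about 67% and vm-b about 33% of one core

In this example, vm-a receives roughly twice the CPU time of vm-b because it holds twice as many shares.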

Differences between allocation methods


The differences between using shares and using reservation and limit guarantees to affect relative VM
performance are subtle but important. When using guaranteed allocations, ESX Server is more likely to
encounter admission control issues, because it enforces the policy that enough free physical resources must
be available to meet the guaranteed reservations of all virtual resources for all VMs. If ESX Server cannot
meet the guaranteed reservation for any resource, the VM requesting the allocation is not powered on; in
the case of a VMotion migration, the operation is denied.
In short, a physical host must have enough free resources to meet a VM’s guaranteed reservations in order
to power on the VM. There is a caveat to this rule when using VMware HA, which is covered later in this
document.
Adjusting relative allocations through resource shares does not result in a similar limitation, since share-based
allocation guarantees only the relative distribution of resources. In the hierarchy of enforcement, meeting
guaranteed limits and reservations takes precedence over maintaining the relative distribution defined
by share-based mechanisms.
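
A minimal sketch of the admission-control rule described above, assuming hypothetical host and VM values:
before a power-on or a VMotion migration is allowed, the target host must have enough unreserved
capacity to satisfy the VM’s reservations.

# Admission-control sketch: a VM may be powered on (or accepted by VMotion)
# only if the host's unreserved CPU and memory cover the VM's reservations.
# Hypothetical values; share-based settings never cause this check to fail.

def admit(host, vm):
    return (host["unreserved_cpu_mhz"] >= vm["cpu_reservation_mhz"]
            and host["unreserved_mem_mb"] >= vm["mem_reservation_mb"])

host = {"unreserved_cpu_mhz": 1500, "unreserved_mem_mb": 3072}
print(admit(host, {"cpu_reservation_mhz": 1000, "mem_reservation_mb": 2048}))  # True
print(admit(host, {"cpu_reservation_mhz": 2000, "mem_reservation_mb": 2048}))  # False: power-on refused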

Warning on setting a guaranteed minimum


Be aware that setting a guaranteed reservation might limit the maximum VM density of a physical host. For
example, if each VM were guaranteed a reservation of 25% of a processor core on a dual-processor server
(single-core processors, without Intel Hyper-Threading Technology), that server could power on only seven
VMs. The explanation for this limitation is as follows: the service console has a minimum allocation of 5% of a
processor core, leaving 195% of the server’s 200% total capacity; dividing 195% by 25% yields a limit of seven
VMs that can each meet the guaranteed minimum core allocation.
Note that the total capacity of a server system is the sum of the percentages delivered by each processor
core. For example, for an eight-processor server with Hyper-Threading Technology, the capacity is 8 x 100%
x 2 = 1600%.
By default, VMs are allocated a CPU reservation of 0%.
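
The arithmetic in this example can be reproduced in a few lines of Python; the 200% total capacity, the 5%
service console allocation, and the 25% per-VM reservation are taken directly from the example above.

# Worked example from the text: a dual-processor (single-core, no
# Hyper-Threading) host has 2 x 100% = 200% of CPU capacity.
total_capacity = 2 * 100            # percent
service_console = 5                 # minimum service console allocation, percent
per_vm_reservation = 25             # guaranteed CPU reservation per VM, percent

max_vms = (total_capacity - service_console) // per_vm_reservation
print(max_vms)                      # 7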

6 An idle VM is only attempting to execute instructions that constitute the idle loop process.

Allocating shares for other resources
Shares can be defined and allocated per resource (CPU, memory, or disk) for each VM. Note that the
relative share allocation policy applies slightly differently to resources other than CPU:
• Memory
The share allocation policy for memory defines the relative extent to which memory is reclaimed from a
VM if memory overcommitment occurs. A VM with a larger allocation of shares retains a proportionally
larger allocation of physical memory when the VMkernel needs to reclaim memory from VMs (a simple
sketch follows this list).
• Disk
For disk I/O, the proportional-share algorithm prioritizes each VM’s disk accesses relative to those of
other VMs.

There are no shares associated with network traffic; instead, network resources are constrained by traffic
shaping, which limits outbound bandwidth.
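
The sketch below illustrates the memory-shares idea from the list above: a memory deficit is reclaimed from
VMs in inverse proportion to their shares, so a VM with more shares keeps proportionally more physical
memory. This is not the actual ballooning or swapping mechanism ESX Server uses; the VM names and values
are hypothetical.

# Sketch only: reclaim a memory deficit from VMs in inverse proportion to
# their memory shares, so a VM with more shares keeps proportionally more
# physical memory. Names and values are hypothetical.

def reclaim(vms, deficit_mb):
    weights = {vm["name"]: 1.0 / vm["shares"] for vm in vms}
    total = sum(weights.values())
    for vm in vms:
        take = deficit_mb * weights[vm["name"]] / total
        vm["allocated_mb"] -= min(take, vm["allocated_mb"])
    return vms

vms = [{"name": "vm-a", "shares": 2000, "allocated_mb": 2048},
       {"name": "vm-b", "shares": 1000, "allocated_mb": 2048}]
reclaim(vms, 600)   # vm-a gives up about 200 MB, vm-b about 400 MB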

Best practices
VMware publishes best practices guides for many components of the virtualized environment. These
include the following:
VirtualCenter & Templates http://www.vmware.com/pdf/vc_2_templates_usage_best_practices_wp.pdf

Virtual SMP http://www.vmware.com/pdf/vsmp_best_practices.pdf


ESX Server http://www.vmware.com/pdf/esx3_best_practices.pdf
For additional best practices and technical documents, please refer to
http://www.vmware.com/vmtn/resources/cat/91,100.

VMware VirtualCenter
VirtualCenter is a centralized management application that supports the hierarchical and logical
organization and viewing of physical ESX Server resources and associated VMs.
VirtualCenter 2.0 allows users to view the following key items:
• All running VMs
• The current state and utilization of each VM
• All ESX Server physical hosts
• The current state and utilization of each ESX Server physical host
• Historical performance and utilization data for each VM
• Historical performance and utilization data for each ESX Server physical host
• VM configuration
• Cluster (DRS and HA) and resource pool configuration

VirtualCenter also offers remote console access to each VM.


However, VirtualCenter is more than just a management application; it is the cornerstone of VMware Virtual
Infrastructure. This centralized management service allows any application to authenticate and issue
commands to the VirtualCenter server through the VMware VirtualCenter SDK, providing a single point of
administration for users and applications.
The VirtualCenter user interface also provides access to higher-level functionality, such as VMotion, DRS,
and HA.

Architecture
VirtualCenter is based on a client-server-agent architecture, with each managed host requiring a
management agent license7.
When an ESX Server physical host connects to VirtualCenter, VirtualCenter automatically installs an agent,
which communicates status as well as command and control functions between the VirtualCenter server
and ESX Server.
VirtualCenter server8 is a Windows service that may run either on a physical server or inside a VM. Each
VirtualCenter server should be able to manage between 50 and 100 physical hosts and between 1000 and
2000 VMs, depending on the configuration of the server running the VirtualCenter server service.
The Virtual Infrastructure Client application acts as the user interface for VirtualCenter. It does not require a
license, and many clients may access the same VirtualCenter server simultaneously. The Virtual Infrastructure
Client also acts as the main interface to ESX Server 3; this is convenient for environments that have a small
number of ESX Server hosts, in which it is feasible to manage these hosts directly.

Note:
VirtualCenter requires an ODBC-compliant database for its datastore. This
database holds historical performance data and VM configuration
information.

Templates and clones


VirtualCenter builds on the portability and encapsulation of VM disk files by enabling two additional
features: templates and clones.

Template
A template is analogous to a “golden master” server image and represents a ready-to-provision server
installation that helps eliminate the redundant tasks associated with provisioning a new server. For instance,
a template can be built by creating a VM and installing an operating system, all required patches and
service packs, standard security and management applications, and any common configuration
parameters. The VM’s network identity is then reset using a tool such as Sysprep, and the VM is powered off.
VirtualCenter can then be used to create a template from this VM. New VMs can be deployed and
customized using a wizard-driven interface or an XML-formatted file containing the desired customizations.
Templates are not required to be stored within the VirtualCenter server filesystem; they can also be stored
on NAS shared storage or on a VMFS3 datastore.

7 The management agent is licensed based on the number of physical processors present in the platform to be managed.
8 This server is also a separately licensed component of the Virtual Infrastructure, though, unlike other separately licensed products, the
license for VirtualCenter Management Server is not included within the Virtual Infrastructure Node bundle.

Cloning
VirtualCenter can also clone a VM to achieve the rapid deployment and replication of a server
configuration.

Differences between templates and clones


The differences between templates and clones are subtle but important, as shown below.

• A template is static; once created, it never changes. A clone is dynamic.
• To update a template, you must first create a VM from the current template, install or apply the desired
updates, and then create a new template to replace the original. A clone can be changed directly.
• A template is not a VM and cannot be powered on. A clone is a VM.
• A template has a rigid definition, ensuring consistency in the deployment of VMs. A clone is easy to
update and patch.

Both of these deployment options support the thorough customization of a new VM before it is powered on.
Consider your particular environment before selecting the approach that best meets your needs.

Considerations and requirements for VirtualCenter server
• VirtualCenter server installs as an application and runs as a Windows service. As such, it requires Windows
2000 Server SP4; Windows Server 2003 Web, Standard, or Enterprise edition (excluding 64-bit versions); or
Windows XP at any SP level. The VirtualCenter installer requires Internet Explorer 5.5 or higher in order to run.
• VirtualCenter server can run either in a VM or on a physical server. In either case, the server instance
hosting the VirtualCenter server must have at least 2 GB of RAM, a 2 GHz processor, and at least one
network interface. This minimum configuration should support a total of 50 ESX Server hosts, 1000 VMs, and
20 simultaneous VirtualCenter client connections.
Scaling up the machine configuration to 3 GB of RAM, dual 2 GHz processors, and a Gigabit network
interface should provide support for a total of 100 ESX Server hosts, 2000 VMs, and 50 simultaneous
VirtualCenter client connections.
• VirtualCenter server also requires a database, which may run on the same server as the VirtualCenter
server service or on a remote system. Consider the following:
– VirtualCenter 2.0 supports Microsoft SQL Server 2000 and SQL Server 7, Oracle® 8i, 9i, and 10g, as well
as Microsoft MSDE (not recommended for production environments). Note that Microsoft Access is no
longer a supported database.

When configuring the database connection for VirtualCenter, configure the ODBC client to use a
System DSN with SQL Authentication.
– VirtualCenter 2.0 does not support Windows Authentication to the database servers.

Compatibility
Refer to Table 3 for compatibility between current and previous versions of VirtualCenter and ESX Server.

Table 3: Compatibility matrix showing capabilities of VC and ESX hosts


• Manage ESX Server 2 hosts and their VMs with VirtualCenter 2? Yes, but without DRS, HA, or other new
features
• Manage ESX Server 3 hosts with VirtualCenter 1? No
• VMotion from ESX Server 2 to ESX Server 3? No
• After upgrading a VM on ESX Server 3, boot it on ESX Server 2? No
• Store ESX Server 2 and ESX Server 3 VM files in the same VMFS? No

Virtual Infrastructure Client application requirements


• The Virtual Infrastructure Client application requires .NET framework 1.1 in order to operate. The
application is designed to operate on Windows XP Pro (at any SP level), Windows 2000 Pro SP4, Windows
2000 Server SP4 and all versions of Windows Server 2003 except 64-bit.
• The application must run on a 266MHz or higher Intel or AMD processor.
• The application requires a network interface and at least 256 MB of RAM (512 MB of RAM recommended).
• 150MB of free disk space is required for a basic installation: 55MB free on the destination drive for the
program itself, and 100MB free on the drive containing the %temp% directory.
• A Gigabit Ethernet port is recommended, although 10/100 is supported.

VMotion
With the release of VMotion, VMware introduced a unique, new technology that allows a VM to move
between physical platforms while the VM is running. VMotion can address a wide range of IT challenges –
from accommodating scheduled downtime to building an Adaptive Enterprise.

Architecture
VMotion relies on several of the underlying components of ESX Server virtualization, most notably the VMFS
file system.
As described earlier, VMFS is a distributed file system that locks VM disk files at the file level. This locking
mechanism allows multiple ESX Server instances to share a single VMFS volume while ensuring that only one
physical host at a time can open a given VM disk file and power on the associated VM.
To support the rapid movement of VMs between physical machines, it is imperative that the large amount
of data associated with each VM does not move – moving the many gigabytes of disk storage associated
with a typical VM would take a significant length of time. As a result, instead of moving the disk storage,
VMotion and VirtualCenter simply change the owner of the VM disk file, allowing the VM to migrate to a
different physical host (as shown in Figure 17).

Figure 17: Migrating a VM from one physical host to another without moving the VM disk file

The new physical host also requires access to the memory contents and CPU state information of the VM to
be migrated. However, unlike the disk-bound data, there is no shared medium for memory and CPU
resources; the CPU state must be migrated by copying the data over a network connection.
When initiating a migration, VMotion takes a snapshot of the source server’s memory and then sends a copy
of these memory pages, unencrypted, to the destination server. During this copying process, execution
continues on the source server; as a result, memory contents on this server are likely to change. ESX Server
tracks these changes and, when copying is complete, sends the destination server a map indicating which
memory pages have changed.
At this point, the CPU state is sent to the destination server, the file lock is changed, and the destination
server opens the VM file and resumes execution of the VM. Accesses to changed memory pages are served
from the source server until all memory changes have been transferred to the destination server by
background processes.
These network-intensive operations justify the deployment of a Gigabit network interface to minimize
latency between source and destination servers and maximize the rate at which memory pages can be
moved between these servers. Moreover, since these memory pages are not encrypted, security needs
may justify the deployment of a dedicated network interface.
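
The pre-copy approach described above can be illustrated with a short, purely conceptual simulation. This is
not VMware’s implementation; the page counts, data structures, and dirty-page tracking shown here are
invented for the example.

# Conceptual sketch of pre-copy memory migration: copy all pages while the
# VM keeps running, track pages dirtied during the copy, hand over a map of
# changed pages with the CPU state, then serve those pages from the source
# on demand. Not VMware's implementation; values are illustrative only.

import random

source_pages = {i: f"page-{i}" for i in range(1024)}   # source VM memory
dest_pages = {}

# 1. Initial bulk copy while the VM continues executing on the source.
for page_id, data in source_pages.items():
    dest_pages[page_id] = data

# 2. The running VM dirties some pages during the copy.
dirty = set(random.sample(range(1024), 32))
for page_id in dirty:
    source_pages[page_id] = f"page-{page_id}-v2"

# 3. CPU state is transferred, the disk-file lock changes owner, and the
#    destination resumes the VM with a map of still-dirty pages.
changed_map = set(dirty)

def read_page(page_id):
    """Destination-side read: pull a still-dirty page from the source."""
    if page_id in changed_map:
        dest_pages[page_id] = source_pages[page_id]   # fetched over the network
        changed_map.discard(page_id)
    return dest_pages[page_id]

print(read_page(next(iter(dirty))))   # served from the source the first time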

Considerations and requirements


To ensure stable and consistent execution after migrating a VM to a different physical host, VirtualCenter
thoroughly reviews the capabilities of both source and destination servers prior to initiating a migration.
These servers must comply with the following safeguards (a combined validation sketch follows this list):

• Both must have access to the VMFS SAN-based partition that holds either the VM disk file or the VM disk
raw device map file.
Since the VM disk file or raw device map file is not moved during the VMotion operation, both servers
must be able to access this partition. A VMotion operation cannot involve moving the disk file from one
LUN (either local or SAN-based) to another9.
Ensure that the SAN is configured to expose the LUN to the HBAs of both the source and destination
servers.
• Both must have identical virtual switches defined and available for all virtual network adapters within the
VM.
Assuming that the VM to be migrated is using a network interface to perform meaningful activity,
VirtualCenter must ensure that this connection is still available after the migration. Before initiating a
VMotion operation, VirtualCenter examines and compares virtual switch definitions and configurations on
both source and destination servers to ensure that they are identical.
Note that VirtualCenter does not attempt to validate the defined connectivity; it assumes that the IT
staff followed good practices during configuration.
For example, if the VM connects to a virtual switch named devnet on the source server, the destination
server must also have a virtual switch named devnet. If the appropriate virtual switch exists on the
destination server, VirtualCenter assumes that the networks are identical and provides the same
functional connectivity. As a result, if the virtual switch on the source server connects to a development
network and the virtual switch on the destination server connects to a production network, the VMotion
operation still continues; however, it is likely that the application within the migrated VM will not be able to
access the appropriate network resources.
To facilitate this process, you should take care to be consistent and use meaningful names when
configuring and creating virtual switches.
• Both must have compatible processors.
Since both the CPU state and the execution state are moved from one server to the other, it is critical that
both processors implement the same instruction sets in exactly the same manner. If not, unsupported or
altered instruction execution can have unknown and potentially catastrophic effects on the migrated VM.
While this safeguard seems straightforward, the enhancements regularly introduced by Intel and AMD
mean that compatibility is not always obvious, particularly when these enhancements occur within a single
processor family.
VMware Knowledge Base article #1377 provides an overview of the challenges faced when migrating a VM
between a physical host that supports the SSE3 instruction set and one that does not. For example, VMotion
reports an incompatibility between HP ProLiant BL20p G2 and G3 server blades.
If an incompatibility is reported, the Knowledge Base article above describes an unsupported method
for overriding this safeguard.
• The destination server must have enough free memory to support the minimum guarantee for the VM to
be moved.
In practice, this statement could apply to all server resources: if a physical resource allocation
guaranteed to the VM on the source machine cannot be met on the destination machine, the VMotion
operation fails. In this case, the VM continues to run on the source server.
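
Taken together, the safeguards in the list above amount to a pre-flight check. The Python sketch below
combines them (shared datastore access, matching virtual switch names, a common CPU feature set, and
sufficient free memory for the reservation) into one hypothetical validation routine; it is illustrative only and is
not how VirtualCenter implements these checks.

# Hypothetical pre-flight validation combining the VMotion safeguards listed
# above: shared access to the VM's VMFS volume, identically named virtual
# switches, a compatible CPU feature set, and enough free memory for the
# VM's reservation. Illustrative only, not VirtualCenter's implementation.

def can_vmotion(vm, source, destination):
    checks = {
        "shared datastore": vm["datastore"] in destination["datastores"],
        "matching vswitches": set(vm["vswitches"]) <= set(destination["vswitches"]),
        "cpu compatibility": source["cpu_features"] == destination["cpu_features"],
        "memory reservation": destination["free_mem_mb"] >= vm["mem_reservation_mb"],
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (len(failed) == 0, failed)

vm = {"datastore": "vmfs-prod-01", "vswitches": ["devnet"],
      "mem_reservation_mb": 1024}
source = {"cpu_features": {"sse2", "sse3"}}
destination = {"datastores": ["vmfs-prod-01"], "vswitches": ["devnet", "prodnet"],
               "cpu_features": {"sse2"}, "free_mem_mb": 4096}
ok, failed = can_vmotion(vm, source, destination)
print(ok, failed)   # False ['cpu compatibility'] (an SSE3 mismatch in this example)
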
Clustered VMs unsupported
Currently, clustered VMs are not supported for VMotion operations.
VMotion requires that VMs access VMFS volumes using the public bus access mode; however, because of
their shared storage requirements, ESX Server requires clustered VMs to use the shared mode. These two
access modes are incompatible.
In order to migrate a clustered VM node from one physical host to another, you must shut down one node
and perform a “cold migration.” After the migration is complete, bring the cluster node back up and rejoin
it to the cluster. Once the cluster is whole again, repeat the process with the other node, if desired.

9 To perform a migration that requires the disk file to be moved, the VM must be powered off (or suspended) and “cold migrated.”

For more information
For access to VMware product guides, see http://www.vmware.com/support/pubs
For detailed information on planning, deploying, or managing a virtual infrastructure on
ProLiant servers, see http://h71019.www7.hp.com/ActiveAnswers/cache/71086-0-0-0-121.html

Copyright © 2006 VMware, Inc. All rights reserved. Protected by one or more of U.S. Patent
Nos. 6,397,242, 6,496,847, 6,704,925, 6,711,672, 6,725,289, 6,735,601, 6,785,886, 6,789,156,
6,795,966, 6,880,022 6,961,941, 6,961,806 and 6,944,699; patents pending. VMware, the
VMware “boxes” logo and design, Virtual SMP and VMotion are registered trademarks or
trademarks of VMware, Inc. in the United States and/or other jurisdictions. Microsoft,
Windows and Windows NT are registered trademarks of Microsoft Corporation. Linux is a
registered trademark of Linus Torvalds. All other marks and names mentioned herein may
be trademarks of their respective companies.
