
Managing Capacity in VMware® Environments

Part II – Five Practices that will Optimize your VMware Capacity and Result in a Standing Ovation

May 2009

© 2009 Systar, Inc. 1


http://www.systar.com/solutions/virtualization_management
Executive Summary

Nearly every large enterprise IT shop has virtualized some portion of its infrastructure. With
hardware cost savings of nearly 30% annually, IT executives are applauding the initial success.
But can they do better?

While new virtualization platforms proved to be stable, and consolidation of physical resources
was achieved during the initial deployments, the reality is that VMware infrastructure remains
bloated. Systar has spoken with hundreds of IT executives, virtualization architects and capacity
managers at leading enterprises, all of whom admit to achieving post-virtualization capacity
utilization rates of only 10–20%, while their objective was to safely reach 50–60%. Reaching the
>50% objective cannot be accomplished with default VMware settings, and requires a new
understanding of virtualized capacity.

In this paper, Systar explores five practices for optimizing the utilization of virtualized capacity
without increasing risk to service quality. The practices are:

• Managing at the cluster level
• Sizing objects correctly
• Placing workloads carefully
• Optimizing DRS
• Minimizing impact of HA

As companies prepare to expand their virtualized environments, Systar sees an opportunity for
IT organizations to improve their understanding of virtualized capacity, reduce new hardware
spending significantly, and meet their utilization objectives safely. By applying the practices
discussed in this paper, Systar sees the initial round of applause transforming into a standing
ovation as virtualization expands across the enterprise while safely meeting the >50% objective.

Executive Summary
Improve Quality and Contain Costs
The Many Dimensions of VMware Capacity
Managing at the Cluster Level
Sizing Objects Correctly
Placing Workloads
Optimizing DRS
Minimizing Impact of High Availability
Summary
Glossary

Improve Quality and Contain Costs

The vast majority of today’s VMware infrastructure capacity is bloated. Where physical capacity
was over-provisioned on average by a factor of 10, consolidation to virtualized environments
has reduced the footprint of IT infrastructure, but not optimized computing capacity. On
average, virtualized capacity is over-provisioned by a factor of 4 including peak headroom –
representing millions of dollars in over-spending as corporate IT budgets continue to tighten.

A proven way to combat over-provisioning and reduce unnecessary virtualization expenditures
is through improved capacity management practices. According to a Forrester report titled The
Capacity Planning Software Market1, “As data centers struggle with server consolidation and
server virtualization, capacity planning becomes the key to maintaining or improving service
quality while containing costs”.

In theory, optimizing the utilization of IT infrastructure capacity via consolidation to virtualized
environments is easy, but in practice it can be difficult without the right information. In order to
maximize the utilization of virtualized infrastructure, the proper visibility, control, processes and
expertise are needed. Yet each path to optimizing virtual capacity is met with its own challenge.
For example, IT organizations must:

• Maximize the aggregate utilization of virtualized resources, but respect the
tolerance for risk of the organization
• Take advantage of the advanced capabilities of the virtualized infrastructure, but be
able to quantify its available effective capacity, workload patterns, migration
tendency and patterns, and HA capability
• Maintain the flexibility to respond to unexpected workload variances with virtual
machine migrations, but strive to minimize migrations through careful placement
• Take advantage of cost-effective failover capabilities, but minimize capacity
dedicated to support failovers
• Establish processes to automate the efforts mentioned above, ensuring that their burden
does not outweigh their benefit as the virtualized environment scales

The Many Dimensions of VMware Capacity

Virtualization changes many IT functions, but it changes capacity management more than most.
Where capacity in the physical world often focused on a single machine hosting a single
application, VMware clusters – made up of ESX Hosts and Virtual Machines – are the new
“computer” and capacity must be managed accordingly. VMware’s CTO, Stephen Herrod,
recently pointed to this notion when he commented that “virtualization is the mainframe for the
21st century”.2

1 The Capacity Planning Software Market: Sustaining Application Performance by Evelyn Hubbert and Jean-Pierre
Garbani with Thomas Mendel, Ph.D.
2 VMware’s vSphere Introduction, Conference Call (2009)

Capacity in VMware environments, starting at the cluster level, is a multidimensional concept.
Cluster capacity is first determined by the size and number of ESX Hosts belonging to the
cluster. Each ESX Host will support a number of VMs that have sizing properties like
Reservations and calculated Entitlements. Each Reservation should account for the typical
workload including some headroom for demand spikes. Next, we need to understand potential
resource contention on each ESX Host, whose resources are divided among all VMs and the
hypervisor, with some lost to whitespace3. Then, as contention for resources builds, load
balancing and policy enforcement performed by VMware’s Distributed Resource Scheduler (DRS)
can migrate VMs from one host to another within the cluster. Finally, and of equal importance,
cluster capacity needs to account for the headroom requirements of High Availability (if enabled).

Although VMware capacity is complex in nature, optimizing capacity utilization in these
environments does not have to be difficult. In fact, Systar has successfully guided many
organizations through optimizing their virtualized capacity. Based on many years of
experience in optimizing physical and virtualized infrastructure, Systar recommends that you
understand and adopt five proven practices:

• Managing at the cluster level
• Sizing objects correctly
• Placing workloads carefully
• Optimizing DRS
• Minimizing impact of HA

Let’s explore some of the multidimensional aspects of cluster capacity while discussing practices
that can be applied to improve their management.

Managing at the Cluster Level

During the initial rollout of VMware, more concern is typically placed on sizing VMs correctly and
placing workloads carefully, but the approach taken to achieve this is often rudimentary. We
have spoken with many organizations that ignore proper VM sizing and workload placement
methods in favor of placing common operating systems on the same machine and using a
core-to-VM ratio rule.

As deployments expand across the enterprise and DRS- and HA-enabled clusters enter into the
equation, managing capacity of the cluster becomes paramount. This sentiment is echoed in
Gartner, Inc.’s Data Center Conference Survey4, which states “In the long term, Gartner believes
that capacity-planning tools and processes will have to shift their orientation to focusing less on
a single VM or physical server to assist with the sizing of resource pools and clusters.”

3 Whitespace: capacity on a host that cannot be utilized due to alignment of resource requirements, or resources
that cannot be used since they are too small to support a whole VM.
4 Data Center Conference Survey: Addressing the Operational Challenges of Virtual Server Management, by Cameron
Haight (February 2008)

Balancing workload among hosts is an important aspect of managing cluster capacity. If the
load on an individual host is too high, a DRS-enabled cluster will try to rebalance that load.
Although this automated load balancing is available, it creates overhead in the environment.
Simply relying on DRS to balance all workloads, all of the time, will lead to unnecessary
migration overhead in the system and is not considered a best practice. By monitoring
migrations and the path each VM takes across hosts, you can begin to determine if excess
balancing is occurring or if VMs are wandering around like lost souls. With careful balancing of
loads upfront, you can preempt capacity loss as a result of excess migrations.
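
One way to spot wandering VMs is to count migrations per VM over an observation window. The sketch below assumes migration events have already been exported as (VM, source host, target host) tuples; the event format, the function names and the threshold are illustrative assumptions, not part of any VMware or Systar interface.

```python
from collections import Counter

def migration_counts(events):
    """Count migrations per VM from (vm_name, source_host, target_host) tuples."""
    return Counter(vm for vm, _src, _dst in events)

def wandering_vms(events, threshold):
    """Return the VMs that migrated more than `threshold` times in the window."""
    counts = migration_counts(events)
    return sorted(vm for vm, n in counts.items() if n > threshold)

# Hypothetical one-day migration log
events = [
    ("vm-a", "esx1", "esx2"),
    ("vm-a", "esx2", "esx3"),
    ("vm-a", "esx3", "esx1"),
    ("vm-b", "esx1", "esx2"),
]
print(wandering_vms(events, threshold=2))
```

In this made-up log only vm-a exceeds two migrations, which would make it a candidate for more careful manual placement.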

Next, we will look at Entitlements5. The general measure of the capacity and health of a cluster
is the ability of the cluster to deliver the entitled resources to all VMs. If VM Entitlements total
more than the total capacity of the cluster, the cluster is undersized or improperly balanced.
Within the cluster itself, a good measure of a host’s ability to provide expected capacity is the
measure of its total Entitlements vs. total capacity. Unlike Reservations6, the cluster and its
hosts will not identify a violation when Entitlements exceed capacity.

By understanding the sum of all VM Entitlements on a host and within a cluster, VMware
architects and administrators will have a clear picture of the resources being made available to
meet demands on their capacity.
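
Because neither the cluster nor its hosts flag Entitlements that exceed capacity, the comparison has to be made explicitly. The following is a minimal sketch of that check, assuming Entitlement and capacity figures are already available in a common unit such as MHz; the data layout and function name are hypothetical.

```python
def entitlement_ratios(entitlements_by_host, capacity_by_host):
    """Return ({host: total entitlement / capacity}, cluster-wide ratio)."""
    per_host = {
        host: sum(entitlements_by_host.get(host, [])) / capacity
        for host, capacity in capacity_by_host.items()
    }
    cluster = (
        sum(sum(values) for values in entitlements_by_host.values())
        / sum(capacity_by_host.values())
    )
    return per_host, cluster

# Hypothetical figures in MHz
capacity = {"esx1": 10000, "esx2": 10000}
entitlements = {"esx1": [4000, 4000, 3000], "esx2": [2000, 3000]}

per_host, cluster = entitlement_ratios(entitlements, capacity)
for host, ratio in per_host.items():
    status = "over capacity" if ratio > 1 else "ok"
    print(host, round(ratio, 2), status)
print("cluster", round(cluster, 2))
```

In this example the cluster as a whole is within capacity while esx1 is not, which points to an improperly balanced cluster rather than an undersized one.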

Another practice to consider is the calculation of target headroom7 within a cluster plus its high
availability (HA) failover capacity. The sum of these elements can be used to establish an
“effective capacity” of the cluster. The image below describes the effective capacity of the
cluster in terms of the percentage of CPU and memory available.

Figure 1. Systar’s OmniVision VMware Capacity Cluster Reports automatically
calculate effective capacity of clusters by percentage, normalized MIPS, and number
of CPUs. Daily, weekly and monthly report perspectives are available.
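
The effective capacity calculation can be sketched as simple arithmetic. The example below assumes every headroom component is expressed as a percentage of raw cluster capacity. The 5% whitespace and 15% HA figures come from the target headroom footnote; the 20% demand spike headroom and the 160,000 MHz cluster size are placeholders, and the function name is hypothetical.

```python
def effective_capacity(raw_capacity, spike_pct, whitespace_pct, ha_pct):
    """Capacity left after removing headroom, whitespace and HA space.

    All three reductions are percentages of raw capacity (an assumption).
    """
    reserved_pct = spike_pct + whitespace_pct + ha_pct
    if not 0 <= reserved_pct < 100:
        raise ValueError("combined headroom must be between 0 and 100 percent")
    return raw_capacity * (100 - reserved_pct) / 100

# Hypothetical 8-host cluster with 20,000 MHz per host
raw = 8 * 20000
print(effective_capacity(raw, spike_pct=20, whitespace_pct=5, ha_pct=15))
```

With these inputs, 40% of the raw capacity is set aside and the cluster would be managed to the remaining 60%.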

5 Entitlements are the computed result of configurations, reservations, limits, and shares used to establish the
resource allocation given to each VM for its operation. The Entitlement will always fall between the Reservation and
the Limit, based upon its Share.
6 A Reservation is the amount of vCPU and memory (in absolute units) that a VM is guaranteed should it need it.
7 Target headroom = demand spike headroom (which depends on workload profiles and risk tolerance) + whitespace
(5% for large hosts, 10% for smaller ones); then, add in HA space of 15% per host in an 8-host cluster.

Effective capacity is only one measure of the cluster that should be considered. Another
recommended element of this practice is to not only measure the available capacity of the
cluster in terms of resources like CPU and memory, but also in terms of VMs that can be added.
It is important to understand that raw computing resource sizes do not take into account
whitespace sizes (resources that cannot be used since they are too small to support a whole VM).

When calculating the number of VMs that can be added to a cluster, it is important to first
define differing VM template sizes (e.g., small, medium and large). The definition can start with
a simple average VM reservation for each resource per template size. For a more accurate
picture of where VMs should be placed within a cluster, you can expand the calculation to
consider maximum and minimum VM sizes and workload type such as sustained or peaky.

The diagram below provides a calculated assessment of how many medium-sized VMs can be
added to each cluster on the basis of CPU and memory requirements for that VM template.

Figure 2. Systar’s OmniVision VMware Capacity Cluster Reports have calculated
that on the basis of CPU requirements 23 additional medium-sized VMs can be
added to cluster “clsioub19”. The same cluster shows available memory for 53 new
VMs. When determining the number of new VMs that can be hosted, it is
recommended to use the lesser of the Effective CPU and memory metrics as a
guide.
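
The "lesser of CPU and memory" rule can be sketched as follows. The template size and the free capacity figures are hypothetical values chosen only so that the result matches the 23 and 53 VM counts quoted in the caption; they are not taken from the report itself.

```python
def vms_addable(free_cpu_mhz, free_mem_mb, template_cpu_mhz, template_mem_mb):
    """Return (fit by CPU, fit by memory, recommended count).

    Integer division discards leftover capacity that is too small to
    hold a whole VM of this template size.
    """
    by_cpu = free_cpu_mhz // template_cpu_mhz
    by_mem = free_mem_mb // template_mem_mb
    return by_cpu, by_mem, min(by_cpu, by_mem)

# Hypothetical medium template: 1000 MHz CPU and 2048 MB memory
by_cpu, by_mem, recommended = vms_addable(23500, 110000, 1000, 2048)
print(by_cpu, by_mem, recommended)
```

Running the same calculation per host and summing the results gives a more conservative figure, since whitespace occurs on each host rather than across the cluster as a whole.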

Sizing Objects Correctly

The next practice we will explore is the importance of sizing objects correctly. The key to
successful VMware capacity optimization is to set the Reservation property for each VM. The
Reservation determines the amount of resources that a VM can receive before it begins
competing with other VMs for the remaining shared resources. The Reservation also
determines the size of the VMKernel swap file for the VM’s memory and impacts HA and DRS
calculations.

Reservations are used by VMware Admission Control to prevent placing too many VMs on a host
based on resources. A VM can only be powered on if there are adequate unreserved resources
available on the host to satisfy that VM’s Reservation requirement. If all VMs have the default
Reservation setting of zero, then there is no effective Admission Control and VMs can be loaded
on hosts until overhead space finally runs out. This approach totally defeats Admission Control
and makes it very hard for both the VMKernel and DRS to manage resources optimally.
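
The Admission Control rule described above reduces to a comparison between a VM's Reservation and the host's unreserved capacity. This is a simplified sketch for a single resource; the function name and the figures are illustrative, and real Admission Control also accounts for details such as virtualization overhead that are not modeled here.

```python
def can_power_on(host_capacity, existing_reservations, new_reservation):
    """True if the host has enough unreserved capacity for the new VM."""
    unreserved = host_capacity - sum(existing_reservations)
    return new_reservation <= unreserved

# Hypothetical 16,000 MHz host
print(can_power_on(16000, [4000, 6000, 5000], 2000))  # only 1000 MHz unreserved
print(can_power_on(16000, [4000, 6000], 5000))        # 6000 MHz unreserved

# With the default Reservation of zero, the check never refuses a VM
print(can_power_on(16000, [0] * 200, 0))
```

The last call illustrates why leaving every Reservation at zero defeats Admission Control: the check passes no matter how many VMs are already on the host.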

When HA is enabled, Reservations are used to calculate the amount of space (slots) needed to
meet the Failover Level Policy in effect. HA will calculate the maximum of all Reservations and
then, based on the Failover Level Policy in effect, set aside space to provide the designated
number of VMs with sufficient capacity to operate if trouble strikes. Without Reservations in
place, HA must use an input parameter that defaults to a 256MHz/256MB slot size, which is not
optimal.
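
The effect of the maximum Reservation on slot counts can be illustrated with a short sketch. It is a deliberately simplified model of the slot calculation described above: the slot is the largest CPU and memory Reservation in the cluster, falling back to the 256MHz/256MB default mentioned in the text. The function names and host sizes are assumptions.

```python
DEFAULT_SLOT = (256, 256)  # (MHz, MB) default cited in the text

def slot_size(reservations):
    """Slot = largest CPU and memory Reservation, or the default if none is set."""
    max_cpu = max((cpu for cpu, _mem in reservations), default=0)
    max_mem = max((mem for _cpu, mem in reservations), default=0)
    return (max_cpu or DEFAULT_SLOT[0], max_mem or DEFAULT_SLOT[1])

def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """Whole slots that fit on a host, limited by the scarcer resource."""
    return min(host_cpu_mhz // slot[0], host_mem_mb // slot[1])

# Hypothetical Reservations as (MHz, MB); one VM is much larger than the rest
reservations = [(500, 1024), (2000, 4096), (0, 0)]
slot = slot_size(reservations)
print(slot, slots_per_host(16000, 32768, slot))

# No Reservations set at all: the default slot size applies
print(slots_per_host(16000, 32768, slot_size([])))
```

A single large Reservation sets the slot size for every VM, which is why one oversized VM can sharply reduce the number of slots a host appears to offer.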

Reservations are also used as one of the triggers for DRS VM migrations at Periodic Invocation
time (along with load balancing and other mandatory moves). If the sum of the Reservations on
a host exceeds its capacity then a VM is selected for migration to correct the situation.

Now that we have established the importance of Reservations when managing capacity, we will
provide some guidance on sizing them correctly. From our experience in assisting large
organizations with their VMware environments, a common practice is to set the Reservation in
the range of the 45th to 60th percentile of the VM’s historical resource utilization. Beyond this
setting, allow the VM to compete with other VMs based on Shares and Entitlements during
periods of greater load.
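
As an illustration of the percentile guideline, the sketch below derives a candidate Reservation range from historical utilization samples. It uses a nearest-rank percentile, which is one of several common definitions; the sample values and function names are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(samples)
    # Ceiling of pct * n / 100, computed with integer arithmetic
    rank = -((-pct * len(ordered)) // 100)
    return ordered[max(rank, 1) - 1]

def reservation_range(samples, low_pct=45, high_pct=60):
    """Candidate Reservation bounds at the 45th and 60th percentiles."""
    return percentile(samples, low_pct), percentile(samples, high_pct)

# Hypothetical CPU utilization samples in MHz for one VM
cpu_mhz = [200, 250, 300, 350, 400, 450, 500, 600, 800, 1500]
print(reservation_range(cpu_mhz))
```

For these samples the suggested range is 400 to 450 MHz, well below the 1500 MHz peak, which the VM would instead obtain by competing on Shares.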

[Figure: VM Workload Profiles. Panels: Reservation too low; Reservation too high; Reservation low side; Reservation high side.]

Figure 3. Systar’s OmniVision Workload Profile Reports display minimum (light blue), average
(blue) and peak (orange) resource usage hour-by-hour for CPU, I/O and Memory. The measures
are calculated from data collected every 15 seconds. Profiles are available in daily, multi-day,
weekly and monthly views. Average and maximum usage profiles can be used to accurately
assign VMware Reservations.

Once you have established the Reservations and Shares (Entitlements are established indirectly
based on total capacity, number of VMs, Limits and Shares) you need to track them against the
VM’s resource utilization over time. Usage should stay below the Reservation 50-60% of the
time and below the Entitlement 90% of the time or more. Where utilization is not matched well
to these settings, you can improve service delivery by providing more resources or reclaim
capacity by assigning fewer resources to the VM.
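
The tracking rule above can be sketched as a periodic check. The 50-60% and 90% thresholds come from the text; the mapping of each outcome to "undersized" or "oversized", and all names and sample values, are assumptions made for illustration.

```python
def fraction_at_or_below(samples, threshold):
    """Share of samples that do not exceed the threshold."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

def review_sizing(samples, reservation, entitlement):
    """Classify a VM's sizing against the guideline thresholds."""
    below_res = fraction_at_or_below(samples, reservation)
    below_ent = fraction_at_or_below(samples, entitlement)
    if below_ent < 0.90 or below_res < 0.50:
        return "undersized"   # consider providing more resources
    if below_res > 0.60:
        return "oversized"    # consider reclaiming capacity
    return "ok"

# Hypothetical CPU utilization samples in MHz
usage = [200, 250, 300, 350, 400, 450, 500, 600, 800, 1500]
print(review_sizing(usage, reservation=450, entitlement=1000))
print(review_sizing(usage, reservation=1000, entitlement=1500))
print(review_sizing(usage, reservation=250, entitlement=1000))
```

The three calls return "ok", "oversized" and "undersized" respectively for this sample data.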

Once the Reservations have been set, there is an opportunity to set Shares that represent the
VM’s business priority. Shares will determine the resource Entitlement, which the VMKernel will
try to provide when needed (e.g., handling peak workload periods).
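
To make the relationship between Reservations, Limits and Shares concrete, here is a simplified proportional-share model. It is not VMware's actual Entitlement algorithm, which also weighs factors such as current demand; it only shows how an allocation can land between the Reservation and the Limit in proportion to Shares. All names and figures are hypothetical.

```python
def compute_entitlements(capacity, vms):
    """Split capacity by Shares, keeping each VM within [Reservation, Limit].

    vms maps a VM name to (reservation, limit, shares); shares must be positive.
    """
    alloc = {name: res for name, (res, _lim, _sh) in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("Reservations exceed capacity")
    # VMs that can still receive more than they currently hold
    active = {name for name, (res, lim, _sh) in vms.items() if lim > res}
    while remaining > 1e-9 and active:
        total_shares = sum(vms[name][2] for name in active)
        handed_out = 0.0
        capped = set()
        for name in active:
            offer = remaining * vms[name][2] / total_shares
            room = vms[name][1] - alloc[name]
            grant = min(offer, room)
            alloc[name] += grant
            handed_out += grant
            if grant >= room:
                capped.add(name)  # reached its Limit
        remaining -= handed_out
        active -= capped
        if not capped:
            break  # everything left was distributed in this round
    return alloc

# Hypothetical host with 6000 MHz: name -> (Reservation, Limit, Shares)
vms = {"a": (1000, 4000, 2000), "b": (500, 1500, 1000), "c": (500, 4000, 1000)}
print(compute_entitlements(6000, vms))
```

In this example VM a, which holds twice the Shares of the others, ends with 3000 MHz while b and c receive 1500 MHz each; b stops exactly at its Limit.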

Placing Workloads

The third practice to optimize your VMware capacity is placing workloads effectively. Capacity is
determined to some extent by how well a workload behaves with other workloads in the shared
resource environment.

If a number of workloads peak at similar times within an ESX Host, the result may be degraded
service or restricted capacity for other workloads needing access to the remaining resources.
Multiple peaking workloads can be more troublesome if their behavior pattern is unpredictable,
making them more challenging to manage. Peaky workloads like user-generated transactions
must be studied more carefully to see which match up best for resource sharing. On the other
hand, sustained workloads such as batch can be safely stacked, resulting in high capacity
utilization, because their peak resource requirements are well known. It would make sense, for
example, to place transaction workloads that peak during working hours with batch processes
that run throughout the night to provide a balanced use of the resources available. To
accomplish this type of safe-stacking requires a keen understanding of workload behaviors,
including average and peak usage, over a period of time. Most experts would request
monitoring of the workload behaviors for a minimum of one month. Of course, placement
accuracy will increase by gathering additional workload data points over an extended period of
time.

A good rule is to produce a stacked chart of all workloads targeted for a resource (e.g., CPU,
memory) belonging to the ESX Host. Flatter curves or sets of bars over time indicate a better
workload fit. A highly variable curve or set of bars means that the peaks may be coinciding.
Coinciding peaks increase the risk of resource contention during the peak and of wasted
resources at other times, and signal a need to separate those workloads onto different hosts
or clusters.
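
The stacked chart test can also be expressed numerically. The sketch below sums per-period usage across workloads and uses the peak-to-average ratio of the combined profile as a flatness measure; that ratio is an illustrative choice rather than a metric named in this paper, and the four-period profiles are invented for brevity.

```python
def stacked_profile(profiles):
    """Sum usage across workloads for each time period."""
    return [sum(period) for period in zip(*profiles)]

def peak_to_average(profile):
    """1.0 means perfectly flat; higher values mean sharper combined peaks."""
    return max(profile) / (sum(profile) / len(profile))

# Hypothetical usage per period: night, morning, afternoon, evening
transactions = [1, 4, 4, 1]   # peaks during working hours
batch = [4, 1, 1, 4]          # runs overnight

complementary = stacked_profile([transactions, batch])
coinciding = stacked_profile([transactions, transactions])
print(complementary, peak_to_average(complementary))
print(coinciding, peak_to_average(coinciding))
```

Pairing the daytime workload with the overnight batch job gives a flat combined profile with a ratio of 1.0, whereas stacking two daytime workloads gives a ratio of 1.6.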

Figure 4. Systar’s OmniVision Workload Profile Reports utilize pie and stacked bar
charts to represent the combined behavior of multiple VMs running on a host. The
stacked bar chart in the top-right corner shows a period between 12pm and 2pm
where multiple workloads are peaking at the same time and contention for
resources may be affecting quality of service as response times slow. Additionally,
the chart shows that up to 6 CPUs are under-utilized 22 hours of the day.

Another benefit of careful workload placement is to minimize DRS migrations. Although DRS
can automatically place a new VM and load balance existing ones, it only considers overall
resource utilization of the hosts and not workload profiles. Therefore, if a VM is migrated to a
host where workloads peak shortly after the move, DRS will trigger another migration. For
example, the figure below shows a stacked bar chart of VM Memory Resource use on an ESX
host. Each color represents memory use of a different VM. As you can see, memory use
declines on the Host from 2am to 1pm and then peaks. DRS could easily migrate a memory
intensive VM to this Host at 11am, but would then have to migrate it once again at 2pm if
memory contention reaches an unacceptable level. When workloads like this are not placed
effectively, it can result in continually wandering VMs.

Figure 5. Systar’s OmniVision shows a stacked bar of VM memory usage
for an ESX Host over a one-day period.

When considering placement of critical applications, we suggest placing the workload manually
in order to ensure the best fit with minimal migrations. Although the dynamic nature of
VMware workload migrations theoretically provides greater availability for critical VMs, best
practices recommend minimal shifting of these workloads. You may even want to set the VM to
manual migration for very critical workloads.

DRS migrations via vMotion are very efficient but still require some overhead. DRS migrations
should not be approached haphazardly (e.g., stacking VMs at random and letting DRS determine
how to best balance the workloads). Peaky workloads that are not placed effectively may result
in excessive migrations, causing what has come to be known as “vMotion sickness”. In general,
setting anti-affinity rules within VMware is not recommended, but such rules can be very helpful
for workloads that are variable and may peak simultaneously.

Matching workload patterns is one of many considerations when stacking VMs on a host.
Affinity rules, geographical constraints, organizational alignment, compliance issues and other
factors will play into best practice guidelines for where workloads are permitted to be placed.

Optimizing DRS

As a reminder from Part I of this white paper series, the Distributed Resource Scheduler (DRS)
provides a watchful eye over VMs in clustered environments. With an intention to provide each
VM its required resources, DRS observes resource utilization on each host within a cluster.
When unsatisfactory conditions are observed within one host, DRS assesses other hosts within
the cluster where conditions may be more attractive. If DRS finds a suitable location, it then
facilitates a VM move known as a VM migration.

Our fourth practice points to the need to optimize DRS. Not all application environments need
or are suited to its workload balancing features. For some sets of applications (VMs) it makes
sense not to use DRS. For example, horizontally scaled applications like web servers are already
load balanced.

In other instances, the cluster may be hosting hundreds of tier 2, non-critical business
applications that roughly demonstrate the same resource consumption. This environment may
be best suited to setting DRS to “Auto” and the default level to “Aggressive” for all VMs. Auto
settings allow for the initial placement of a VM inside the cluster to be automated and the
automatic execution of migration recommendations. Aggressive migration thresholds will
trigger movements that promise even a slight improvement in the cluster’s load balance.

Where sets of workloads take on the opposite profile of the example above – becoming less
homogeneous and more critical in nature – you will want to consider less aggressive migration
thresholds and either partially-automated or manual placement and migration settings.

The vCenter screen shot below shows a DRS-enabled cluster with 4 ESX Hosts, of which only 2
are active. These systems are hosting 36 VMs and show 190 migrations. In this instance,
workloads are clearly not balanced properly and excessive migrations are occurring.

Figure 6. VMware’s vCenter shows 4 hosts, of which 2 are active. In this case,
190 Migrations were observed. (Source: VMware)

As we discussed above in the Placing Workloads section, accomplishing safe-stacking requires a
keen understanding of workload behaviors, including average and peak usage, over a period of
time. The more variable the workload, the harder it is to determine proper placement for new
VMs. If DRS migrations are occurring frequently, the workload profiles of new VMs will be
difficult to match to existing VMs on a host. Based on our extensive experience in profiling
workloads and recommending VM placement, the goal of the optimizing DRS practice is to
minimize migrations and use the capability as a last resort.

Minimizing Impact of High Availability

Our final practice is centered on high availability (HA) within DRS-enabled clusters. HA is a very
cost-effective capability built into the DRS-enabled cluster. However, there is a tradeoff and its
cost is twofold:

• HA’s strict Admission Control is very conservative and wastes a great deal of capacity
• It is difficult to understand whether an application can be restarted in a given cluster
state

The current method of calculating VMware’s HA failover capacity is complicated (and too
lengthy to share here). However, in short, HA’s strict Admission Control uses the maximum
Reservation size in the cluster as a slot size for all calculations. In fact, many users report seeing
a message that there are “Insufficient resources to satisfy configured failover level for HA”,
when attempting to configure their HA environment.

Many sites we talk to limit resource utilization far below what they might need to restart the
critical VMs on a host. And many of these same sites do not set their HA restart priority. We
recommend setting and optimizing the restart priority around two points: minimizing capacity
loss, and ensuring critical VMs restart immediately while low priority VMs restart when possible.

In order to minimize capacity loss from poor HA configurations, we recommend the following
approaches:

• Turn off strict HA Admission Control; VMware admits this is a very conservative
approach (i.e., it wastes a lot of capacity).
• Set the restart priority of all VMs very carefully, usually according to the policies defined
in your DR plan.
• Take the maximum, across all hosts, of the sum of the Reservations for the High Priority VMs
on each host. Subtract that sum, plus an additional 5% for overhead, from the aggregate
capacity. Then subtract your required headroom, which is based on the peakiness of the
workloads and your risk tolerance, from the remainder. Manage the cluster to the remaining
“effective capacity”.
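
The calculation in the last approach above can be sketched as follows. The text does not say what the 5% overhead is measured against, so this example assumes it is 5% of aggregate capacity, and it likewise treats required headroom as a percentage of aggregate capacity. The function name, host names and figures are hypothetical.

```python
def ha_effective_capacity(aggregate, high_priority_by_host,
                          headroom_pct, overhead_pct=5):
    """Effective capacity after failover space, overhead and headroom.

    high_priority_by_host maps a host to the Reservations of its High
    Priority VMs. Percentages apply to aggregate capacity (an assumption).
    """
    failover = max(sum(res) for res in high_priority_by_host.values())
    overhead = aggregate * overhead_pct / 100
    headroom = aggregate * headroom_pct / 100
    return aggregate - failover - overhead - headroom

# Hypothetical cluster with 80,000 MHz of aggregate CPU capacity
high_priority = {
    "esx1": [2000, 3000],
    "esx2": [4000, 4000],
    "esx3": [1000],
}
print(ha_effective_capacity(80000, high_priority, headroom_pct=20))
```

Here the worst-case host carries 8000 MHz of High Priority Reservations, so 8000 MHz of failover space, 4000 MHz of overhead and 16,000 MHz of headroom are deducted, leaving 52,000 MHz to manage against.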

VMware is planning to change its HA approach in ESX 4, and once we have sufficient experience
with that release, this section of the paper will be revised.

Summary

As your VMware environment continues to expand and the pressure to reduce costs continues
to increase, applying the five practices recommended in this paper will provide greater control
over new virtualization spending and improve the quality of services delivered. Systar is
confident that by following these practices, your organization will be able to safely maximize the
utilization of your VMware capacity above the 50% mark.

Applying virtualization-aware management solutions, processes, and best practices is key to
achieving results that deliver standing ovations. Optimizing capacity utilization is not only a
great practice, but can result in substantial savings. According to a recent IDC report8, “an
optimally managed or ‘advanced virtualization’ infrastructure (described as an infrastructure
that includes penetration of virtualized servers of more than 25%, storage virtualization, and the
use of systems management tools) can deliver a total [cost] reduction of up to 52% per user per
year”.

Your standing ovation awaits.

8 Business Value of Virtualization: Realizing the Benefit of Integrated Solutions, July 2008

Glossary

The concepts in this paper apply to most virtual environments – however, we use VMware VI 3.x to
illustrate our points.

• Admission Control – when enabled, prevents a VM from powering on if the host does not have enough
unreserved resources available for the VM’s specified Reservation.

• Workload – the basic unit of work, a Virtual Machine (VM).

• Capacity – the aggregate capacity of a host or cluster.

• Cluster – a group of hosts managed as an aggregate computing resource; in this paper, a VMware
DRS-enabled cluster.

• Configured size – the number of vCPUs and the number of MB of memory that represent the size of
the physical machine the VM is presented with.

• Entitlement – the computed result of configurations, reservations, limits, and shares used to establish
the resource allocation given to each VM for its operation. The Entitlement will always fall between
the Reservation and the Limit, based upon its Share.

• Effective capacity – the portion of aggregate capacity that can be used given workload mix, high
availability requirements and white space.

• Host – a physical server that runs ESX Server and supports multiple workloads.

• Limit – this property serves as a hard cap on resource allocation for a VM. If Limit is not specified,
then the configured size is the Limit.

• Reservation – the amount of CPU MHz and MB of memory (in absolute units) that a VM is guaranteed
should it need it.

• Shared resources – CPU and memory resources that are actively managed by the hypervisor.

• Shares – relative units that determine a VM’s priority among sibling VMs, used to determine resource
allocation under contention.

• White space – capacity on a host that cannot be utilized due to alignment of resource requirements,
or resources that cannot be used since they are too small to support a whole VM.

About Systar

Systar is a leading worldwide provider of performance management software. Systar’s OmniVision
product suite enables customers to achieve the optimal alignment between IT resources and business
requirements in both distributed and virtualized server environments. Systar’s proven capacity
management solutions deliver the full benefits of virtualization by enabling customers to gain visibility
into these complex environments, tune for optimal capacity, and move business-critical applications into
production with full confidence.

United States
8618 Westwood Center Dr.
Suite 240
Vienna, VA 22182
Tel. +1 703-556-8400
Fax +1 703-556-8430
info@systar.com

Germany
Mergenthallerallee 79-81
D-65760 Eschborn
Tel. +49 211 598 8520
info-de@systar.com

France
171 bureaux de la Colline
92213 Saint-Cloud Cedex
Tel. +33 (0) 1 49 11 45 00
Fax +33 (0) 1 49 11 45 45
info-fr@systar.com

Spain
Centro de Negocios Eisenhower
C/ Cañada Real de las Merinas, 17
Edificio 5 - 1º D
28042 Madrid
Tel. +34 91 747 88 64
Fax +34 91 747 54 35
info-es@systar.com

United Kingdom
Ground Floor Left
3 Dyer’s Buildings
London EC1N 2JT
Tel. +44 2072 692 799
Fax +44 2072 429 400
info-uk@systar.com

Systar, BusinessBridge, OmniVision, BusinessVision, ServiceVision, WideVision and Systar’s logo are registered
trademarks of Systar. All other brand names, product names and trademarks are the property of their respective
owners. Copyright 2009.
