
FOCUS: Virtualization

Beyond Server Consolidation


WERNER VOGELS, AMAZON.COM

Server consolidation helps companies improve resource utilization, but virtualization can help in other ways, too.

Virtualization technology was developed in the late 1960s to make more efficient use of hardware. Hardware was expensive, and there was not that much available. Processing was largely outsourced to the few places that did have computers. On a single IBM System/360, one could run in parallel several environments that maintained full isolation and gave each of its customers the illusion of owning the hardware.1 Virtualization was time sharing implemented at a coarse-grained level, and isolation was the key achievement of the technology. It also provided the ability to manage resources efficiently, as they would be assigned to virtual machines such that deadlines could be met and a certain quality of service could be achieved.

At first glance it appears that not much has changed. Today the main application of virtualization technology in the enterprise is to combat server sprawl through virtualization-based consolidation. Isolation, security, and efficiency remain the main benefits of using virtual machines in this context.

Even though this article is mainly about improving resource utilization, if we consider virtualization only as a tool for server consolidation, we are underestimating its true potential. Virtualization breaks the 1:1 relationship between applications and the operating system and between the operating system and the hardware. The removal of this constraint not only benefits us in creating N:1 relationships where we run multiple isolated applications on a single shared resource, but also enables 1:N relationships where applications can span multiple physical resources more easily by providing elasticity in their resource usage.

Consolidation
Classic consolidation is focused on multiplexing physical resources over a number of virtualized environments. The immediate benefits are obvious: reduce the amount of hardware, reduce the data-center footprint, and indirectly reduce power consumption. The latter is an increasingly important driver for consolidation since energy companies are starting to provide significant incentives for cutting back consumption.





Consolidation is in essence a cost-reduction activity; by significantly reducing the server footprint (by 30 to 50 percent or even more), the capital investment requirements are directly affected, which leads to reduced staffing needs and lower operational costs.

One of the main causes for server sprawl in the enterprise has been the requirement by vendors to run their applications in isolation. This requires the IT department to dedicate one or more servers to an application, even if the servers provide more resources than the application requires. Also at the infrastructure level we see that the modern enterprise has many dedicated servers: DNS, DHCP, SMTP, printing, Active Directory/LDAP, etc. Another driver of sprawl is operating system heterogeneity: a mail server that requires Windows Server, a database that is best run on Solaris, a network management package originally acquired for use with AIX, etc.

Add to this the effects of mergers and acquisitions and other integration projects and you will find that an enterprise with a large collection of servers, each dedicated to a single task, is a common pattern. Mergers and acquisitions in particular bring new applications or application versions, additional servers, and, often, new complex integration middleware. It is not uncommon that after a merger the number of servers to support the new infrastructure is larger than the combined server count of the separate companies. Given the complexity of these integration projects, the IT organization relies heavily on coarse-grained server-driven isolation to achieve the integration.

The large number of underutilized servers has become a major problem in many IT departments. Individual companies provide no official numbers about server utilization, but many of the large analyst firms estimate that resource utilization of 15 to 20 percent is common. From personal experience in talking to other CTOs and CIOs around the world, I believe that those numbers are on the high side and the true utilization is often in the 5 to 12 percent range. With more powerful servers entering the data center every day, the utilization number is decreasing rather than going up.

Single averages seldom tell the whole story, however. Utilization of servers is highly dependent on the type of workloads and is often subject to periodicity. If you inspect utilization over longer periods, you will find that it is more accurately represented by a range that differs depending on the application. In their article on energy-proportional computing, Luiz André Barroso and Urs Hölzle show that in a highly tuned environment such as Google's the utilization tends to fluctuate between 10 and 50 percent when inspected over longer timeframes.2 Figure 1 shows the average CPU utilization of more than 5,000 servers at Google during a six-month period. This data reflects our experiences at Amazon; some utilization is driven by customer behavior, but some is triggered by fulfillment process patterns or digital asset conversions.

[Figure 1. Average server-CPU utilization at Google: fraction of time (y-axis) spent at each CPU-utilization level from 0 to 1.0 (x-axis). Graph courtesy of Barroso & Hölzle.]

Cost reduction is an important goal in many IT departments, and server consolidation certainly tops the focus list. Virtualization has become the primary tool in driving server consolidation: 81 percent of CIOs were using virtualization technologies to drive consolidation, according to a recent survey by CIO Research.3 Even though the strategy appears mature, consolidation architects still face significant technical hurdles.



The first challenge in the consolidation process is how accurately to characterize an application's resource requirements. An engineer's educated guess would, in general, result in an incomplete view of the real constraints. Building a resource-usage profile of an application is essential, and such a profile not only identifies which resource will eventually bound the application, but also analyzes resource usage over time to determine periodicity and whether dependencies on other system or application components exist. The second part of the profile is how the application behaves when it runs out of capacity: how sensitive is the application to resource shortage, can it adapt, or does the environment need to maintain strict bounds?

A common step before this analysis is to break up applications that run in shared environments and put each one in an isolated environment, in the hope that it will become easier to predict the individual applications' resource usage in response to request patterns. They will then be separately managed according to their own resource profiles.
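To make the notion of a resource-usage profile concrete, here is a minimal sketch in Python. It assumes per-minute CPU-utilization samples are already being collected by some monitoring system (the sample data below is fabricated), and it reduces them to a percentile range plus a crude check for daily periodicity; a real profile would treat memory, I/O, and network the same way and would also record what happens at saturation.

# profile_sketch.py -- illustrative only; the data and thresholds are invented.
import math
import random
import statistics

def usage_profile(samples, period=1440):
    """Summarize per-minute utilization samples as a range plus periodicity."""
    srt = sorted(samples)
    pct = lambda p: srt[min(len(srt) - 1, int(p * len(srt)))]
    profile = {
        "p5": pct(0.05),       # near-idle floor
        "median": pct(0.50),
        "p95": pct(0.95),      # the level a consolidation plan must budget for
        "peak": srt[-1],
    }
    if len(samples) > 2 * period:
        # Correlation with the sample taken one day (1,440 minutes) earlier;
        # values near 1.0 suggest a strong daily cycle. Requires Python 3.10+.
        profile["daily_correlation"] = statistics.correlation(
            samples[period:], samples[:-period])
    return profile

if __name__ == "__main__":
    random.seed(1)
    # One fabricated week of samples with a mild daily cycle around 15 percent.
    week = [min(1.0, max(0.0, 0.15 + 0.10 * math.sin(2 * math.pi * t / 1440)
                + random.gauss(0, 0.03))) for t in range(7 * 1440)]
    print(usage_profile(week))

The p95 and peak values are what the target environment has to be able to absorb; the gap between them and the median is exactly the underutilization that consolidation tries to reclaim.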
The next challenge arrives once there is a clear picture of resource usage and scaling: how optimally to distribute the virtual machines hosting the applications over the physical resources. This is an area with many emerging tools to assist system architects in finding the right mix, but reports from the field indicate that this is still largely a process of trial and error before a reasonable balance is achieved. This is not a trivial task. As we can see from figure 1, resource usage may change significantly over time, which makes the relevant load testing very hard.
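Many of the emerging placement tools start from some variant of bin packing over these profiles. The sketch below is a first-fit-decreasing pass in Python; the host capacity, headroom factor, and VM profiles are invented for illustration, and real planners add further dimensions (I/O, network, licensing, affinity rules) and account for how the profiles shift over time.

# placement_sketch.py -- first-fit-decreasing placement of VM profiles onto hosts.
def place(vms, host_capacity, headroom=0.8):
    """vms: list of (name, cpu_p95, mem_gb); host_capacity: (cpus, mem_gb)."""
    cpu_cap = host_capacity[0] * headroom   # keep some slack for spikes
    mem_cap = host_capacity[1] * headroom
    hosts = []                              # each host: {"cpu", "mem", "vms"}
    # Placing the largest demands first makes first-fit behave much better.
    for name, cpu, mem in sorted(vms, key=lambda v: (v[1], v[2]), reverse=True):
        for h in hosts:
            if h["cpu"] + cpu <= cpu_cap and h["mem"] + mem <= mem_cap:
                h["cpu"] += cpu
                h["mem"] += mem
                h["vms"].append(name)
                break
        else:                               # nothing fits: provision another host
            hosts.append({"cpu": cpu, "mem": mem, "vms": [name]})
    return hosts

if __name__ == "__main__":
    demo_vms = [("mail", 1.2, 4), ("ldap", 0.3, 1), ("dns", 0.1, 0.5),
                ("erp-db", 3.5, 16), ("print", 0.2, 1), ("web", 1.8, 4)]
    for i, host in enumerate(place(demo_vms, host_capacity=(8, 32))):
        print("host", i, host["vms"])

Trial and error enters because the p95 figures themselves move; a packing that is balanced in one month can be oversubscribed the next.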
The biggest challenge of the whole consolidation process, however, is without a doubt the balancing of server workloads at runtime; 64 percent of the CIOs mention this as problematic in the CIO Research survey. Because of the reduced slack in the system, the applications are more exposed to resource shortages, especially in situations where workloads are highly dynamic.

A good example of a business with changing resource demands is Powerset. Building the initial indexes and updating those indexes over time have very different resource demands. Powerset has released a data-center resource analysis tool that helps predict which business-specific scenarios make sense for buying, leasing, or using virtualized resources. Given the changing resource demands, in most cases the virtualized servers are more cost effective.4

Is 100 percent utilization the goal?
There are many reasons why we will never see 100 percent utilization: workloads in the enterprise are heterogeneous, and demand may be uncertain and often arrives in spikes. As such, some CPU cycles or IOPS (I/O operations per second) will always be unused when you measure utilization at larger time scales. Even at the individual operating system level, however, we know that perfect utilization is not possible. For example, an operating system such as Linux may start to behave unpredictably under combined high CPU/IO loads. We joke that some of these operating systems exhibit an "Einstein Effect": at high utilization, space and time are no longer guaranteed to behave the same.

As a consequence, the measure of success of consolidation is set more realistically: for pure CPU-bound environments, 70 percent seems to be achievable for highly tuned applications; for environments with mixed workloads, 40 percent is a major success, and 50 percent has become the Holy Grail.

For applications and servers that do become overloaded, migration is potentially a solution. Transparent migration, however, is hard to achieve, and many legacy applications do not respond favorably to it. Two more coarse-grained techniques seem to be effective: application checkpoint and restart has been built into several applications as a disaster recovery tool and is used to move applications to different physical servers; and a number of applications can be run in clustered mode (e.g., MS Cluster Service enabled), where a second node can be brought up and, at the application level, state and work can be migrated from the first to the second node.

An extreme example of the use of virtual machine migration is application parking. In this case several applications that are hardly using any resources are each running in their own VMs but are sharing one physical server while they are in a rest state. As soon as an application starts using more resources, it is migrated to a server that has sufficient resources available to fit the application's profile.
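The parking policy itself can be stated in a few lines. In the sketch below (Python), get_utilization, find_target_host, and migrate are placeholders for whatever interfaces a particular virtualization manager exposes, and the threshold and polling interval are arbitrary illustrative values.

# parking_sketch.py -- threshold-triggered un-parking of mostly idle VMs.
# The helper functions stand in for a real virtualization manager's API.
import time

BUSY_THRESHOLD = 0.30    # CPU fraction above which a VM leaves the parking host
CHECK_INTERVAL = 60      # seconds between utilization checks

def get_utilization(vm):             # placeholder: ask the hypervisor or monitoring
    raise NotImplementedError

def find_target_host(profile):       # placeholder: host with headroom for the profile
    raise NotImplementedError

def migrate(vm, host):               # placeholder: live or checkpoint/restart move
    raise NotImplementedError

def watch_parking_host(parked_vms):
    while True:
        for vm in list(parked_vms):
            if get_utilization(vm) > BUSY_THRESHOLD:
                migrate(vm, find_target_host(vm.profile))
                parked_vms.remove(vm)       # no longer parked
        time.sleep(CHECK_INTERVAL)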





Beyond cost saving
Until now we have discussed traditional consolidation, as exercised by many IT departments, where the main focus is thoroughly analyzing enterprise-wide resource usage and using virtualization to multiplex those resources as efficiently as possible. Business priorities determine at any given time how efficiency is measured. In the classic environments we see a grow-and-shrink trend; an application is brought in on its own server or added as part of a merger integration. This is followed by a phase of resource usage and risk analysis, which determines where the applications can be collocated in a virtualized manner, after which the server pool shrinks again.

In all of this, however, virtualization is used as a traditional IT cost-saving tool. The real power of virtualization as a strategic enabling technology comes when you consider its role in application deployment and management. With the right virtualization management tools you can get to an environment in which you can significantly speed up the time to market of new applications and have them scale efficiently to customer demand.

A good example of this is the role of virtualization in Amazon's infrastructure. Amazon is the world's largest service-oriented software organization, where not only the technology is service oriented, but also people are organized in teams that mirror the software organization. This gives Amazon great agility in customer-focused business and technology development. In running close to 1,000 services, Amazon ended up with many engineers performing similar tasks, most of them related to resource management: managing application deployments, configuring servers, handling storage failures, configuring load balancers, etc. Conservative estimates indicated that engineers were spending up to 70 percent of their time on general tasks not directly related to the business functionality of their service.

We decided to bring these common activities into an infrastructure-services platform where they could be managed more effectively while maintaining Amazon's focus on reliability and performance. Storage, compute, and messaging were virtualized as infrastructure services. A number of these services have since been made available outside of Amazon: S3 (Amazon Simple Storage Service), EC2 (Elastic Compute Cloud), SQS (Simple Queue Service), and SimpleDB.5

Two key requirements in the design of these infrastructure services markedly changed the way resources are managed: the services are fully self-service, allowing engineers to start using them with minimal friction; and resources can be managed dynamically, giving engineers the power to acquire and release resources immediately.
Uniform Application Deployment
Amazon EC2, the service most similar to traditional virtualization, uses a model where engineers can programmatically start and stop instances that they have previously built.6 These instances are virtual machine images that are the output of the application build process, and they are stored in the Amazon S3 storage service. The EC2 management environment places the virtual machine on a physical server based on resource requirements. This provides engineers with the ability to grow and shrink the resources their services use based on customer demand and other scaling attributes.
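To illustrate what starting and stopping instances programmatically looks like, the following sketch uses boto, one of the open source Python libraries for the EC2 Web service. The AMI identifier, key pair name, and instance counts are placeholders, and the exact call signatures can differ between library versions.

# ec2_sketch.py -- growing and shrinking capacity through the EC2 API.
import boto

conn = boto.connect_ec2()    # reads AWS credentials from the environment

def grow(image_id, count):
    """Launch 'count' instances of a previously built machine image."""
    reservation = conn.run_instances(image_id,
                                     min_count=count, max_count=count,
                                     key_name="my-keypair",      # placeholder
                                     instance_type="m1.small")
    return [inst.id for inst in reservation.instances]

def shrink(instance_ids):
    """Release instances that are no longer needed."""
    conn.terminate_instances(instance_ids)

if __name__ == "__main__":
    ids = grow("ami-0a1b2c3d", count=2)   # placeholder AMI; scale up for a spike
    print("running:", ids)
    # ... serve the demand, watch the metrics ...
    shrink(ids)                           # and give the capacity back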



This brings us to one of the main strategic advantages of virtualization: it creates a uniform application deployment environment where engineers are shielded from the particulars of the underlying hardware. It is not uncommon to see a single virtual machine running on a physical server, where the goal is not to maximize efficient resource sharing, but to speed up deployment of applications and to scale up and down at a moment's notice.

Feedback from Amazon EC2 customers revealed that they were traditionally confronted with significant overhead in acquiring resources from their IT organizations. Server acquisition times often run into several months, and once a resource has been allocated to an application, teams are unwilling to release it given the long lead times in reacquiring the resource when it is needed again.

This conservative approach requires long resource planning cycles: teams need to predict their resource usage long ahead of deployment and execution, which triggers overscaling to deal with unexpected higher demands on the application. This model is a stumbling block for enterprises that want to react to demand faster and more efficiently. There is increasing uncertainty in many markets as product and service life cycles are compressed and increased competition makes the success of products more difficult to predict. To adapt to these new realities, enterprises need to shift to different models for their resource management, where acquiring and releasing resources based on demand is becoming an essential strategic tool. In this context the pay-as-you-go model of the Amazon infrastructure services is very attractive.

Having the virtual machine as the standardized unit of deployment is crucial in adapting to shifting resource demands, where it is important not only to acquire resources but also to release them when they are no longer needed. Many of Amazon's EC2 enterprise customers claim that their resource acquisition cycles have changed from months to minutes.

One area characterized by very long cycles in acquiring resources is IT in government. Funding and allocation decisions often require teams to purchase servers at the beginning of a project, many months before the software is completed and before a good usage pattern has been developed. This leads to ultra-conservative planning with low utilization of the ultimate configuration and results in significant barriers to prototyping and experimentation. One DoD IT architect reported that the department's software prototype normally would cost $30,000 in server resources, but by building it in virtual machines for Amazon EC2, in the end it consumed only $5 in resources.7

The new agility also brings other advantages of using virtualized infrastructures. While traditional consolidation based on virtualization only increases the density of resource usage, there still may be barriers to changing the mix of applications and services running at any given time. Incorporating virtual machines in the change management process and adding autonomic management features significantly improves the agility of the enterprise. Using economic models to automate resource allocation to optimize business value remains a Holy Grail.
Virtualization of the Data Center
Virtualization plays a crucial role in enabling the IT organization to grow beyond its data centers and exploit utility computing infrastructures. Utility computing is the packaging of resources such as computation and storage as metered services similar to public utilities (electricity, water, natural gas, and telephone networks). This has the advantage of low or no initial cost to acquire hardware; instead, computational resources are essentially rented.

In an organization where virtualization is already pervasive to support consolidation and/or application deployment scenarios, the tight dependency between application/operating system and the physical hardware has already been removed. Running the virtual machines on hardware that is not directly controlled by the organization is a logical next step.

Utility computing services are different from traditional application outsourcing, where the infrastructure owner runs the application on behalf of the client and has application-specific knowledge. These services are also different from grid environments, as they do not impose a particular programming model to be used for application development. Instead, a utility computing service allows its customers to launch virtual machine instances on their hardware in a manner similar to running these VMs in their private data centers. Amazon EC2 is one of the prominent services that offer access to compute resources in a utility style using virtual machines; EC2 customers can package virtual machines as they run them in their data centers to run in Amazon EC2 as well.

Using utility computing services benefits the cost-saving targets that often underlie consolidation efforts; capital expenditures are greatly reduced by going to a model where you pay for the resources only for the period of time that you actually use them. Frequently, enterprises start using these utility computing services to address their needs for overflow and peak capacity; this way they can deal with uncertainty in demand without big investments in hardware that will be idle most of the time. This on-demand acquiring and releasing of resources is addictive; once enterprises have become comfortable using a computing utility service for handling peaks, they quickly start using it for other tasks, especially those that do not require around-the-clock resource allocation, such as document indexing, daily price calculations, digital asset conversion, etc.

A good example of using utility computing for excess capacity tasks is the New York Times's project to convert 11 million historical articles from TIFF to PDF. Finding sufficient capacity on the corporate servers would have been difficult, given the deadlines for the project, and buying additional hardware for such a one-off task would not be very efficient. The Times created a virtual machine image containing a special conversion application, moved 4 TB of images into Amazon S3, and fired up 100 instances of the virtual machine in Amazon EC2. Within 24 hours all articles were converted into 1.5 TB of PDF at the cost of a fraction of a single server.8
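The Times has not published the code, but the shape of such a job is roughly the worker loop below (Python with boto). The bucket and queue names are placeholders and tiff_to_pdf stands in for the actual conversion tool; each of the 100 instances would run a loop like this against a shared queue of article identifiers.

# convert_worker.py -- sketch of the per-instance worker for a one-off batch job.
import boto

def tiff_to_pdf(tiff_path, pdf_path):    # placeholder for the real converter
    raise NotImplementedError

def work():
    s3 = boto.connect_s3()
    src = s3.get_bucket("articles-tiff")              # placeholder source bucket
    dst = s3.get_bucket("articles-pdf")               # placeholder output bucket
    queue = boto.connect_sqs().get_queue("articles-to-convert")

    while True:
        msg = queue.read()
        if msg is None:                               # queue drained: job finished
            break
        name = msg.get_body()
        src.get_key(name).get_contents_to_filename("/tmp/in.tif")
        tiff_to_pdf("/tmp/in.tif", "/tmp/out.pdf")
        dst.new_key(name + ".pdf").set_contents_from_filename("/tmp/out.pdf")
        queue.delete_message(msg)                     # only after the upload succeeds

if __name__ == "__main__":
    work()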
One of the benefits of this model is that measuring TCO (total cost of ownership) becomes easier; instead of amortizing the costs of server, network, power, and cooling over a number of applications running on a server, the absolute infrastructure cost is simply the metered utility cost.

Looking at the wide variety of companies that use virtualization to run their applications in Amazon EC2, one can see that utility computing has many applications beyond enterprise capacity management. Usage ranges from classical parallel computing by financial and pharmaceutical companies to startups running Web services, from large software companies using it for product and release testing to image rendering by movie studios. All this is enabled by virtual machine technology for packaging and instantiating the applications, managing security, and providing on-demand access to required resources.

Testing, testing, …
Software testing is another area that is always on the short end of receiving resources and has much to gain from virtualization. The demands of testing on the infrastructure change during the development cycle. Early in the cycle one may use a continuous integration technique with nightly rebuilds of the environments, changing to load and scale testing later in the cycle.





Test engineers often need to keep many different servers running, each with a different version of an operating system, for managing release testing.

Traditionally, QA departments manage their own resources, and in many cases they are highly constrained in the resources available to them. Even in this constrained environment, there would be periods during a day, week, or year where the hardware would go unused. Virtualization has changed the QA process dramatically: resources are acquired on demand when they are needed for particular tests and released when the tests are finished. This has tremendously improved the utilization the testers get out of their environments. No longer is there a need to have many different operating systems running or to have complex multiboot environments around; starting and stopping different operating system images becomes an on-demand activity. Going virtual has in many cases increased the number of resources available for QA at any given time, as the pool of physical resources can be shared with the production environment. This makes load testing at scale more realistic.
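In an EC2-style environment, keeping many different operating system images available reduces to a loop such as the sketch below (Python with boto). The image identifiers are hypothetical and run_suite stands in for however the test suite is actually driven; the important property is that the instances exist only for the duration of the run.

# release_test_sketch.py -- one instance per OS image, test, then tear down.
import boto

TEST_IMAGES = {              # hypothetical AMIs for each supported platform
    "rhel4":   "ami-1111aaaa",
    "rhel5":   "ami-2222bbbb",
    "sles10":  "ami-3333cccc",
    "win2003": "ami-4444dddd",
}

def run_suite(platform, instance):    # placeholder for the actual test driver
    raise NotImplementedError

def release_test():
    conn = boto.connect_ec2()
    instances = {}
    for platform, ami in TEST_IMAGES.items():
        reservation = conn.run_instances(ami, instance_type="m1.small")
        instances[platform] = reservation.instances[0]
    try:
        return {p: run_suite(p, inst) for p, inst in instances.items()}
    finally:
        # The test farm exists only while the suite runs.
        conn.terminate_instances([inst.id for inst in instances.values()])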
Scale, reliability, and security
While this article focuses on the role of virtualization in utilization management, there are other areas where virtual machines can play an important role. One of those is security, where many innovative uses are possible but where even the simplest brings many benefits. Moving an application from a shared environment into its own dedicated virtual machine allows for straightforward operator and user access control. It can reduce the number of open ports and as such the potential for exposure to vulnerabilities. Many IT groups use this technique to meet compliance requirements for applications that do not have adequate access control and auditing.

Similarly, the use of VMs for uniform application deployment can be the basis for disaster management. Often a simple checkpoint-restart facility is sufficient to do fast failover between machines. If applications are built for incremental scalability, the adaptive management facilities such as those in utility computing infrastructures will allow organizations to quickly grow and shrink capacity based on demand.
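What a simple checkpoint-restart facility amounts to can be sketched in a few lines of Python. The shared checkpoint path, the serialization format, and the health check that declares the primary dead are placeholders; a real facility also has to cope with partial writes and with work repeated after a restart.

# failover_sketch.py -- periodic checkpointing with restart on a standby machine.
import os
import pickle
import tempfile
import time

CHECKPOINT = "/shared/app.ckpt"    # assumed reachable from both machines

def checkpoint(state):
    """Write the checkpoint atomically so a crash never leaves a torn file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(CHECKPOINT))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)    # atomic rename on POSIX filesystems

def restore():
    """Run on the standby once the primary is declared dead."""
    with open(CHECKPOINT, "rb") as f:
        return pickle.load(f)

def primary_loop(step, state, interval=30):
    """step() is the application-specific unit of work between checkpoints."""
    while True:
        state = step(state)
        checkpoint(state)
        time.sleep(interval)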
Summary
Virtualization's main application in the enterprise is still server consolidation. As effective as that is, we are likely to see a very different picture a number of years from now, where virtualization will be the key enabling technology for a series of strategic changes in IT.

Adaptive resource management using utility computing will be essential to success in an economy with increasing uncertainty. Adapting quickly to new customer demands, new business relationships, and cancelled contracts will be a key business enabler in the modern enterprise, regardless of whether the enterprise executes a software-as-a-service strategy or uses the resources in a more traditional manner.

Virtualization will change the way we do testing, with QA departments getting access to a greater variety of resources than they ever had before, at a much lower cost to the business. Similarly, companies that were not proficient in handling reliability, fault tolerance, and business continuity will find in virtualization a new tool that will allow them to make significant progress toward these goals without rewriting all of their software.

References
1. Gum, P. H. 1983. System/370 extended architecture: Facilities for virtual machines. IBM Journal of Research and Development 27(6): 530-544.
2. Barroso, L. A., Hölzle, U. 2007. The case for energy-proportional computing. IEEE Computer 40(12).
3. CIO Research. 2008. Virtualization in the enterprise survey: Your virtualized state in 2008. CIO Magazine (January).
4. Powerset Datacenter Dashboard; http://www.powerset.com/flash/datacenter_model.
5. Amazon Web Services; http://aws.amazon.com.
6. Amazon Elastic Compute Cloud (EC2); http://aws.amazon.com/ec2.
7. Brewin, B. 2006. Amazon.mil? DISA is intrigued by Web services model for creating systems. Federal Computer Week (October 30).
8. Gottfrid, D. 2007. Self-service, prorated super computing fun! New York Times (November 1).

LOVE IT, HATE IT? LET US KNOW
feedback@acmqueue.com or www.acmqueue.com/forums

WERNER VOGELS is vice president and chief technology officer at Amazon.com, where he is responsible for driving the company's technology vision.

© 2008 ACM 1542-7730/08/0100 $5.00

