
ClientVisor: Leverage COTS OS Functionalities for Power Management in Virtualized Desktop Environment


Huacai Chen1,2, Hai Jin1, Zhiyuan Shao1, Ke Yu2, Kun Tian2
1 Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
hjin@mail.hust.edu.cn
2 Open Source Technology Center (OTC), Intel Corporation, Shanghai, 200241, China
ke.yu@intel.com

Abstract

As an emerging trend, virtualization is more and more widely used in today’s computing world. However, the introduction of virtual machines brings trouble for power management (PM for short), since the operating system cannot directly access and control the hardware as before. Solutions have been proposed to manage power in the server consolidation case. However, such solutions are VMM-centric: the VMM gathers the PM decisions of the guests as hints, and makes the final decision to manipulate the hardware. These solutions do not fit well with the virtualized desktop environment, which is highly interactive with the users.

In this paper, we propose a novel solution, called ClientVisor, to manage power in the virtualized desktop environment. The key idea of our scheme is to leverage the functionalities of the Commercial-Off-The-Shelf (COTS) operating system, which actually interacts with the user, to manage the power of the processor and the peripheral devices in all possible cases. The VMM coordinates the PM decisions of the guests only at the key points. Through prototype implementation and experiments, we find that our scheme results in 22% lower power consumption in the static power usage scenario, and about 8% lower in the dynamic scenario, than the corresponding cases of Xen. Moreover, the experimental data show that the deployment of our scheme does not deteriorate the user experience.

Categories and Subject Descriptors D.4.7 [Operating Systems]: Organization and Design; K.6.4 [Management of Computing and Information Systems]: System Management

General Terms Algorithms, Management, Performance, Design, Experimentation

Keywords Virtual Machine, Client Virtualization, Power Management

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
VEE’09 March 11–13, 2009, Washington, DC, USA.
Copyright © 2009 ACM 978-1-60558-375-4/09/03…$5.00.

1. Introduction

As an emerging trend, virtualization is more and more widely used in today’s computing world. Among the applications, using virtualization technology on desktops has become an important branch. The continuously growing computing power of desktops (e.g., multi-core platforms) will greatly foster such application in the future. A typical installation of a virtualized desktop environment consists of a Virtual Machine Monitor (VMM), which lies between the hardware and the operating systems, and three guest virtual machines (we call them domains in the rest of this paper): a control domain, which provides a set of management tools to control and monitor other domains; a background domain, which performs auxiliary functionalities, such as virus checking or network flow filtering; and a primary user domain, which actually interacts with the user. A COTS operating system of the user’s choice is always installed in the primary user domain.

Compared with the traditional computing model (OS on bare hardware), virtualized desktops readily provide exciting new features: manageability, by which the system greatly eases the installation of a new operating system and the configuration of applications; security isolation, by which the VMM can isolate the primary user domain from the dangerous world of viruses and attacks, and can even roll back the domain to a healthy state after a crash; and accessibility, by which the system can be controlled remotely. However, besides these advantages, virtualization brings trouble for power management (PM), since the operating system within the primary user domain cannot directly access and control the hardware as in the traditional computing model.

Solutions have been proposed to manage the power of consolidated servers. They either make PM decisions completely inside the VMM, or gather the PM decisions made by the guest domains as hints and exert the final control over the hardware based on these hints. Unfortunately, the experience learnt from PM solutions for consolidated servers cannot be applied to the virtualized desktop environment directly. In the server consolidation case, the guest domains are treated equally, and fairness is one of the important design objectives of the system. In the virtualized desktop case, by contrast, the roles of the guest domains are asymmetric: the primary user domain occupies most of the CPU cycles, so its internal power requirement approximates the global requirement.

Moreover, the PM decisions made by the primary user domain are very important, and the user’s experience may deteriorate if they are not followed. Besides, since the device types are limited in the server environment while a desktop has a variety of peripherals, the device power management methods used in the server consolidation case are hard to apply directly to desktop virtualization.

In this paper, we propose a novel solution, called ClientVisor, to solve the PM problem in the virtualized desktop environment. The key idea of our solution is to leverage the PM functionalities of the COTS operating system, which is hosted in the primary user domain, to directly manage the power of the processor and the devices, except the sensitive ones (e.g., the NIC in ClientVisor). The VMM will coordinate the PM decisions made by the guest domains only
at the critical points, e.g., when the primary user domain decides to put the CPU in a sleep state.

To simplify the discussion, we choose virus checking and network flow filtering as the background domains, and implement a prototype system based on the proposed solution. Through experiments, we find that our scheme results in 22% lower power consumption in the static power usage scenario, and about 8% lower in the dynamic scenario, than the corresponding cases of Xen. The experimental data also show that the system does not deteriorate the user experience: performance decreases by only about 2% to 3%.

The contributions of this paper are twofold. First, we introduce a new PM model for the virtualized desktop environment that differs from its counterparts applied in the consolidated server case. This model can minimize the side effects of the virtualization layer by considering the asymmetry of the guest domains’ roles, and it is applicable to a wide range of client-side computing devices. We believe this research will also inform related research on PM problems for other mobile devices (e.g., PDAs, handsets, and so on) when they are virtualized. Second, we study the PM problem for I/O devices, which is largely ignored in the consolidated server case.

The rest of this paper is organized as follows: Section 2 briefly surveys the related work. Section 3 discusses the key design issues of ClientVisor. Section 4 conducts experiments on the prototype implementation, analyzes the resulting data, and makes comparisons with other similar solutions. Section 5 concludes this paper and discusses future work.

2. Related Works

Low power requirements are imposed on almost every component in today’s computer architecture. Hardware-level research includes transistor sizing [5], transistor reordering [17], and logic gate restructuring [24]. Such studies try to reduce power consumption at the transistor level, transparently to software. In addition, software-controllable states are exposed at the OS level to adapt to various user power profiles, such as Dynamic Voltage and Frequency Scaling (DVFS) and clock gating. DVFS scales the voltage and frequency according to run-time activity such as CPU utilization [6][8], while clock gating is used to put the processor in a low power state when the system is actually idle.

The introduction of multi-core technology adds more complexity with regard to cross-core dependencies, such as a shared power plane, which gives rise to the concept of power-aware task scheduling that accounts for such dependencies [15][21][24][27]. These instruments are also used to tackle thermal emergencies, which is called dynamic thermal management (DTM) [2][16]. Memory and various devices are also power consumers, and have thus attracted research of their own. The concept of power-aware virtual memory is introduced in [10]: a spatial factor is added to the page allocator for a given task, which leaves room for more memory DIMMs to be put into a low power state. Research has also been done on how to aggregate disk accesses into bursts, which allows longer idle periods during which the disk is spun down [22]. Other papers propose moving the most recently or frequently accessed files to a subset of disks, which increases data access locality in RAID [3][23].

Similar research has also been carried out on other types of devices, such as network interfaces [14] and display devices [4]. Investigation has been extended to the application level as well. An MPI library extension for power saving is introduced in [18], as most MPI calls are I/O intensive and thus serve as good low-power hints for processor power decisions. The authors implement a transparent runtime system that can identify the regions of a program where MPI calls concentrate, and scale the P-states [13] at region boundaries. Compiler analysis with profiling [9] is another line of research at the application layer.

The above research greatly improves power efficiency by gluing various software policies to hardware power states. However, the introduction of virtualization immediately breaks those efforts, as a new virtualization layer is added between the original software policies and the low-level hardware. The latest Xen release (version 3.3.0) has a VMM-centric power management scheme in which the VMM manages the various power states directly in a heuristic manner. Some research has also been done in the server virtualization area, especially for large data center environments. VirtualPower Management (VPM) [19] exports virtualized soft power states into VMs, and then intercepts VM decisions as hints to either the real power policy or resource allocation mechanisms such as the scheduler. However, the exposed states differ from the low-level power characteristics in order to provide a uniform basis across heterogeneous hardware platforms, which may make existing power management decisions within a VM suboptimal, especially in a desktop environment where many platform-specific tricks exist.

3. Key Design Issues

In this section, we discuss the key design principles as well as the related techniques of ClientVisor. Section 3.1 gives an overview of the architecture. Section 3.2 discusses the technique employed to expose the power features of the platform components. Sections 3.3 and 3.4 explain the strategies employed to handle processor and device power management, respectively. Section 3.5 discusses the tuning and optimization techniques.

3.1 The Architecture Overview

ClientVisor implements its VMM based on Xen version 3.1.0, and considers three types of co-existing domains: one primary user domain, one or more background domains, and one control domain (i.e., Domain 0 in Xen). Figure 1 depicts the architecture of the system. To facilitate further discussion, we call the operating system installed in a background domain the Service OS (SOS for short), and the one installed in the primary user domain the Capability OS (COS for short).

[Figure 1. The architecture of ClientVisor. Components shown: COS (primary user domain) with OSPM, device drivers, and the frontend driver; SOS with the VA, a device driver, and the backend driver; Domain0 (control domain); the Xen VMM with its coordination logic; and the physical platform (devices and CPU). Labelled operations: A: Px operation; B: Cx operation; C: Dx operation; D: Cx operation after coordination; E: Dx operation after coordination.]

Within the architecture, the control domain is deployed for management purposes, which include but are not limited to creating and destroying the other domains. The SOS is employed to contain the virtual appliances (i.e., VA in Figure 1) that perform auxiliary functionalities, such as virus checking and network flow filtering. Besides CPU and memory, the SOS needs other peripheral devices to achieve the functionalities of the virtual appliances. In this paper, we call such devices sensitive devices. For example, for the network flow filtering application, which is chosen for the prototype implementation of ClientVisor, the NIC is the sole sensitive device. Moreover, as the virtual appliances may require different execution environments, ClientVisor is designed to harbor multiple background domains, i.e., multiple SOSes that contain applications for different purposes.

As in most cases the user may adopt a legacy COTS OS as the COS, the primary user domain is designed to be an HVM (Hardware-assisted Virtual Machine) guest, which exploits the virtualization extensions of the processor, e.g., VT-x [19] of Intel or SVM [1] of AMD processors.

To conduct power management, the COS needs to know the power states of the components in the platform, which include the Cx/Px capabilities of the CPU, descriptions of the PM registers, the power states of peripheral devices, and so forth. However, retrieving such information from within the HVM guest is very difficult: in the original Xen, an HVM guest manipulates devices through emulation, and the current emulation model (inherited from QEMU) provides only basic information about the devices, without the information needed for power management. Considering the variety of devices, it is also not feasible to describe the power features in detail by emulation. To solve this problem, ClientVisor exposes the power features of the host platform to the COS with necessary exceptions, which will be explained in detail in Section 3.2. The sensitive devices, e.g., the NIC in the network flow filtering SOS, are the exceptions to the exposure. For such devices, the split driver model is used instead, where the COS possesses the frontend driver and the SOS possesses the backend driver, as shown in Figure 1.

The PM operations initiated by the PM module of the COS (i.e., OSPM in Figure 1) can be classified into two categories: those manipulating processor power states (e.g., Px/Cx operations) and those manipulating device power states (i.e., Dx operations). ClientVisor traps these operations by configuring the VM control structure (VMCS for short), which resides in the Xen VMM and is used for HVM guest domains, according to the system requirements and strategies that will be discussed in Section 3.3 and Section 3.4.

3.2 Platform Power Feature Exposure

As discussed in Section 3.1, the power features of the components in the host platform need to be exposed to the primary user domain to facilitate the power management of the COS. This includes exposing the basic power features that are described by ACPI tables or retrieved by special instructions, and exposing device power feature data, which can be retrieved from the devices or by parsing the PCI bus/device hierarchy.

3.2.1 Exposing Basic Power Features

Most basic power feature data of the host platform are stored in the ACPI tables, which belong to the physical BIOS. These power features include the Cx/Px capabilities of the processor, descriptions of the PM registers, control methods for PM events, and so on. To learn these basic power features of the host platform, the COS should access the real ACPI tables rather than virtualized ones.

ClientVisor uses identity memory mapping to solve this problem. With this technique, a host physical address (HPA for short) region is directly mapped to the corresponding, identical region of the guest physical address (GPA for short) space. In this way, the COS can access the real ACPI tables by accessing the corresponding address regions in its own physical address space. As the ACPI tables are dispersed in the host physical memory, whose ranges can be retrieved from the E820 map of the physical BIOS, multiple regions need to be identically mapped, as illustrated in Figure 2.

[Figure 2. Identity mapping of BIOS and ACPI regions]

In the x86 architecture, the layout of host physical memory is stored in the E820 map [13]. During virtualization, the host E820 map is duplicated, and some entries marked as free memory in the map are changed to reserved, to protect the memory allocated for the VMM, domain 0, and the background domains from erroneous accesses by the COS. After that, the duplicated E820 map is exposed to the COS.

Some other basic power features of the host platform can be obtained by using special instructions, e.g., CPUID. The original Xen intercepts and emulates these instructions to hide platform-specific information from HVM guests. In ClientVisor these instructions are still emulated, but the emulation returns the actual information of the host platform to the COS so that the desired power features are exposed.

ClientVisor permits some of the PM operations of the COS, e.g., those on the PM registers, to affect the hardware directly according to predefined strategies. This requires exposure of the platform control interface, which includes machine specific registers (MSRs), I/O ports, system memory space, PCI configuration space, and so on. Accesses to MSRs and I/O ports are privileged operations, which are trapped into the VMM and emulated in Xen. In ClientVisor, with hardware virtualization extensions such as VT-x and SVM, these two types of accesses can be configured not to trap by programming the MSR bitmap and the I/O bitmap in the VMCS of the COS. In this way, the operations can change the state of the hardware directly. Accesses to system memory space can be passed through by identity memory mapping, as discussed before. By combining the techniques used for the MSR and I/O port control interfaces with that for system memory space, ClientVisor can also pass through accesses to PCI configuration space and other address spaces, since in the x86 architecture accesses to these spaces are composed of privileged operations and system memory accesses.

3.2.2 Exposing Device Power Features

Unlike those of the CPU, the power features of peripheral devices differ from one type to another. The power state varies with the situation of the device, e.g., a hard disk drive has different power states at different rotation speeds, and a SATA controller has different power states in different link states.

The power state data of some devices is stored in ACPI tables, while that of other devices can only be obtained through the device interfaces. From the operating system’s point of view, a device is an abstraction of such interfaces, which include I/O ports, MMIO addresses, PCI configuration space, etc. Therefore, the COS can access such devices directly by using the methods discussed in the previous section, i.e., VMCS configuration, identity memory mapping, or customized emulation.

3.3 Processor Power Management

Generally, processor power management includes working state management and idle state management.

3.3.1 Working State Power Management

The goal of working state power management is to reduce power consumption during the working state with little performance impact, or according to customization by the user. There are several instruments for working state power management, e.g., DVFS, which is used to select an appropriate Px state for the processor.
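As a rough illustration of how a DVFS policy might pick a Px state, the sketch below steps the P-state up or down when the measured load crosses one of two thresholds, which is the general shape of an on-demand policy. The threshold values, the number of P-states, and the name `next_pstate` are assumptions made for illustration; they are not taken from Xen or ClientVisor.

```python
def next_pstate(current: int, load: float, num_pstates: int = 4,
                lower: float = 0.30, upper: float = 0.80) -> int:
    """Return the P-state index to use next.

    Index 0 is the fastest state and num_pstates - 1 the slowest.
    Hypothetical policy: step one state slower when load is below
    `lower`, one state faster when it is above `upper`, else stay.
    """
    if not 0 <= current < num_pstates:
        raise ValueError("current P-state out of range")
    if load < lower:
        return min(current + 1, num_pstates - 1)
    if load > upper:
        return max(current - 1, 0)
    return current


# A short load trace drives the processor towards slower states and back.
state = 0
trace = []
for load in [0.10, 0.20, 0.50, 0.95, 0.90]:
    state = next_pstate(state, load)
    trace.append(state)
print(trace)  # [1, 2, 2, 1, 0]
```

A policy of this kind only sees the aggregate load, which is why it cannot express user preferences such as "always run at full speed" or "favor battery life".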

Most existing working state PM approaches in virtualized environments are VMM-centric. For example, Xen implements an on-demand policy inside the VMM that defines two threshold values to denote the upper and lower bounds of the load. Under this policy, if the current CPU load is below the lower bound, the VMM changes the Px state to the next lower one, and vice versa. Such schemes have some shortcomings. First, this simple policy cannot always satisfy the user’s requirements, and it cannot be customized by the user. In some scenarios, performance is so important that it is better to always run the processor at the highest speed, while in others a lower Px state may be needed to maximize battery lifetime. In contrast to Xen and other virtualization solutions, most COTS OSes used in the desktop environment have a rich set of policies and function interfaces, which makes it easy to find a better balance between performance and power consumption.

Second, the VMM-centric solutions cannot adjust the power consumption of some multi-core processors effectively, since in such systems the power state of each core cannot be scaled according to its own load. This can be addressed by other instruments provided by most COTS OSes, e.g., power-aware scheduling [27].

ClientVisor solves the processor power management problem during the working state by leveraging the existing functionalities of COTS OSes: when the COS decides to change the power state of the processor from Pi to Pj, the hypervisor performs the change on the processor. This requires that, during the working state, the COS knows the workload of the whole system. We analyze and explain this strategy of ClientVisor in the following.

In ClientVisor, the overall load of the system, denoted as $L_{total}$, can be computed by Equation 1.

$$L_{total} = L_{COS} + L_{VMM} + L_{domain0} + \sum_{i=1}^{n} L_{SOS_i} \qquad (1)$$

where $L_{COS}$ denotes the inner load of the COS on the processor, while $L_{VMM}$, $L_{domain0}$, and $L_{SOS_i}$ denote those of the VMM, domain 0, and the i-th of the n SOSes, respectively.

Within Equation 1, $L_{VMM}$ and $L_{domain0}$ actually make little contribution to $L_{total}$ except in some special cases, such as system bootstrapping. As we consider the normal working state, they can be omitted from the equation.

In the virtualized desktop environment, most of the user applications run inside the COS, but the processor overhead that they induce in the background domains is sometimes not negligible. In ClientVisor, by setting the parameters of the scheduler in the VMM (e.g., the weight and cap of the credit scheduler), the COS is guaranteed at least 95% of the CPU cycles. In this way, Equation 1 can be further simplified:

$$L_{total} \approx L_{COS} \qquad (2)$$

Based on these observations and analysis, ClientVisor passes all the PM decisions of the COS in the working state to the processor without coordination with other modules of the system, as depicted in Figure 1.

3.3.2 Idle State Power Management

For idle state PM, modern processors define multiple sleeping states (i.e., Cx power states), which are implemented by clock gating technology [24]. Once the processor is in one of the sleeping states, no instruction is executed until a breaking event occurs, e.g., an interrupt from a device. After the breaking event, the processor returns to the working state (i.e., a certain P-state of C0).

A deeper sleeping state means lower power consumption. However, unlike changes between Px power states, a Cx entry and exit usually incur high latencies, and the deeper a Cx state is, the longer the latency it may incur. Besides, deep Cx states have some side effects, e.g., the loss of cache data. Therefore, switching between deep Cx states frequently brings considerable overhead to the system. To find a good balance between performance and power, the system should predict, from statistical data, when the idle state breaking events will arrive, and choose a sleeping state that is as deep as possible. However, current VMM-centric PM solutions cannot predict the arrival time of idle breaking events well, while most COTS OSes installed in the primary user domain can make better predictions. More importantly, most of the peripheral devices, whose interrupts are the major source of idle state breaking events, are assigned to and managed by the COS. Considering these facts, ClientVisor is designed to base the PM decisions for the idle state on those of the COS.

In ClientVisor, Cx operations are triggered by the COS when its load is very low. Such PM decisions are reasonable for the whole system, since the overall load approximates that of the COS, as discussed in Section 3.3.1. However, coordination is needed, because Cx operations halt the processor, which is not always what the other domains, i.e., domain 0 and the SOSes, require.

When the Cx request from the COS is trapped, the VMM does not perform the operation immediately but blocks the requesting vcpu, which emulates a halt operation to the primary user domain. After that, the VMM checks the runqueues to find out whether there are runnable vcpus. If there are, a runnable vcpu is scheduled. Otherwise, an event check is performed to find out whether there are pending events (e.g., virtual interrupts) that need to be delivered. The event check is necessary, since an asynchronous event may bring a blocked vcpu back to the running state. Only when there are no runnable vcpus and no pending events does the VMM eventually put the physical processor into the requested Cx power state.

3.4 Device Power Management

Almost all PCI devices (including those of later compatible standards) define at least two power states, i.e., D0 and D3, where D0 means working and D3 means powered off. Devices may also define intermediate power states between these two to provide more choices.

In a virtualized environment, a virtualized device driver (e.g., the frontend driver of the COS in Figure 1) always exposes only the two basic power states of the device, i.e., D0 and D3, to the guest domain to maintain compatibility and generality. Therefore, the device can be better controlled by its real device driver, which knows the exact set of power states of the device. In ClientVisor, the device drivers are possessed by different guest domains: the drivers of sensitive devices are possessed by the SOSes, while the COS has the others. However, the PM operations on the devices cannot be conducted independently by these domains, since there are some special devices, i.e., bridges, on which the PM operations need to be coordinated.
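The need for coordination on bridges can be made concrete with a small sketch. It models a bridge and the devices attached to it, each owned by a domain, and lets a Dx request on the bridge take effect only once every attached device is at or below the requested state; otherwise the request is recorded as delayed and retried after later Dx operations. The class and function names and the integer encoding of D0 to D3 are assumptions made for illustration, not ClientVisor's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Device:
    name: str
    owner: str                       # owning domain, e.g. "COS" or "SOS"
    state: int = 0                   # 0 = D0 (working) ... 3 = D3 (off)
    parent: Optional["Device"] = None
    children: List["Device"] = field(default_factory=list)
    delayed: Optional[int] = None    # pending Dx request on a bridge, if any

    def attach(self, child: "Device") -> None:
        child.parent = self
        self.children.append(child)


def _apply(dev: Device, target: int) -> None:
    """Perform the Dx operation, then retry a delayed request on the parent."""
    dev.state = target
    dev.delayed = None
    bridge = dev.parent
    if bridge is not None and bridge.delayed is not None:
        _try_bridge(bridge, bridge.delayed)


def _try_bridge(bridge: Device, target: int) -> None:
    """Change the bridge state only if all attached devices are at or below target."""
    if all(child.state >= target for child in bridge.children):
        _apply(bridge, target)
    else:
        bridge.delayed = target      # some other domain's device is still above target


def request_dx(domain: str, dev: Device, target: int) -> None:
    """Handle a Dx request issued by `domain` for `dev`."""
    if not dev.children:             # endpoint device: carry out directly
        _apply(dev, target)
        return
    for child in dev.children:       # lower the requester's own devices first
        if child.owner == domain and child.state < target:
            request_dx(domain, child, target)
    _try_bridge(dev, target)


# A bridge shared by a COS-owned disk and an SOS-owned NIC.
bridge = Device("bridge", owner="COS")
disk = Device("disk", owner="COS")
nic = Device("nic", owner="SOS")
bridge.attach(disk)
bridge.attach(nic)

request_dx("COS", bridge, 3)          # NIC is still in D0: bridge request is delayed
print(bridge.state, bridge.delayed)   # 0 3
request_dx("SOS", nic, 3)             # NIC enters D3: delayed request now completes
print(bridge.state, bridge.delayed)   # 3 None
```

The sketch keeps a single delayed request per bridge for brevity; a real VMM would also have to handle competing requests from several domains.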

To facilitate further discussion, we call a bridge that has at- Global variables:
tached other devices as the super device (SupDev for short), and N: the number of Cx
OTable[N]: the latencies of each Cx of old way
call the devices that attach to it as subordinate devices (SubDev NTable[N]: the latencies of each Cx of new way
for short). For the device that has no subordinate devices, we call Parameter:
it the endpoint device. One type of dependency of device PM is ReqCx: the requested Cx (in old way)
that a PM operation on a SupDev will affect all its SubDevs. For Begin
the native case, OS will not power off a bridge only if all its if (NTable[N] <= OTable[ReqCx]) {
target = N; goto Do;
downstream devices have the same requirement, and no mistake }
will be made since the OS controls all the devices. However, in else {
for i = N to 1 {
ClientVisor, a sensitive device owned by SOS and a non-sensitive if exist i, satisfy NTable[i] == OTable[ReqCx]
device owned by COS may have the same SupDev. In such a target = i; goto Do;
else find i, NTable[i-
case, COS may power off the bridge, since all the devices it pos- 1]<OTable[ReqCx]<OTable[i]
sesses attached to that bridge have the requirements. But this may }
target = i or i-1, by turns; goto Do;
not be the result wanted by SOS. }
When a VMM receives a Dx request triggered by COS or Do:
Put processor into the target Cx (in new way)
SOS, it first checks whether it is an operation on an endpoint de- End
vice. If so, the actual Dx operation on the device is carried out.
Otherwise, an enumeration of all its SubDevs is conducted. For Figure 3. Algorithm of Cx mapping
each SubDev belonging to the domain that issues the Dx opera-
tion, the Dx operation is performed if its current state is above the This algorithm attempts to find a Cx state transition in the new
requested one (when we say Di is above Dj, that means Di is a method with close latency as the requested one in the old method.
power state with higher power consumption, e.g., D2 is above If the requested Cx operation by the old method has higher la-
D3). After that, VMM checks whether all SubDevs of the re- tency than the deepest one in the new method, the deepest Cx
quested one is in or below the Dx power state. If so, the requested state transition is conducted. If the latency of the requested Cx
Dx is eventually performed on the requested device. Otherwise, it transition is between two ones in the new method, the two Cx
means some of its SubDevs are possessed by other domains and transitions are used by turns. For example, if the latency of C2
above Dx currently, and such Dx operation on the requested de- transition in the old method is 10 microseconds, while those of C3
vice is delayed. and C4 by the new method are 9 and 12 microseconds respec-
To handle the delayed request, each time after a Dx operation tively, a sequence {C3, C3, C4} is employed and selected by
is performed on the requested device (no matter an endpoint de- turns to map the original C2 transition issued by COS. With Cx
vice or a bridge), VMM will check whether its SupDev has a mapping optimization, deeper Cx residencies are improved, which
delayed Dx request. If so, a checking on the SubDevs is started gains ClientVisor more power benefits.
again. Another optimization made in ClientVisor is to reduce the
number of interrupts, which is the major source of idle-breaking
events. By profiling and analyzing the original implementation, we
find that the hotspot timer handlers in ClientVisor are
vcpu_periodic_timer_fn() and csched_tick(), which account for 90% of
the idle-breaking events. Because modern operating systems use the
tickless technique, continuous periodic timer interrupts are no longer
needed to maintain the system time, so the first timer handler,
vcpu_periodic_timer_fn(), which provides virtual timer interrupts to
guests, can be disabled during Cx residencies. The second handler,
csched_tick(), which the VMM scheduler uses to account a vcpu, can
also be disabled during the idle state, since several rounds of
accounting can be compressed into one. This optimization, which
disables the two timer handlers at Cx entry and re-enables them at Cx
exit, reduces the number of timer interrupts by more than 70% and the
total number of interrupts by about 20%.

3.5 Optimizations
With the original implementation, we find that although the power
efficiency of ClientVisor in the working state is close to the native
case, it becomes unacceptable when the system is in the idle state
(sometimes with more than 40% additional power consumption). Several
optimizations are conducted to narrow this gap.
The goal of idle state PM optimization is to improve deep Cx
residencies so as to prolong battery lifetime. From the profiling data
obtained from the original implementation, we find that the processor
rarely resides in sleeping states deeper than C1, even when the
machine is idle. This is because the deep Cx power states have long
latencies, which prevents the COS from deciding to enter them. Most
legacy COTS OSes, e.g., Windows XP, conduct Cx transitions for the
processor by accessing I/O ports. However, this method always incurs
high latency. Modern processors provide a new method, the MWAIT
instruction, to reduce the overhead of Cx transitions [11]. To
facilitate discussion, we call the Cx transition by accessing I/O
ports the old method, and that by using the MWAIT instruction the new
method.
One of the optimizations made in ClientVisor is to replace the old
method for Cx transitions with the new method when the system finds
itself running on a new platform. With this optimization, when the COS
triggers a Cx state transition, the VMM does not use the requested Cx
directly; instead, it selects a deeper one according to the transition
latencies and enters it with the new method. We call this optimization
Cx mapping; its algorithm is illustrated in Figure 3.

4. Performance Evaluation
In this section, we compare the performance data obtained from
experiments on the prototype implementation of ClientVisor with that
of the VMM-centric PM scheme. The comparison is divided into three
parts: the static power consumption, which indicates the power
consumption rate when the platform is in the idle state; the dynamic
power consumption, which indicates the power consumption rate when the
system is in composite states of idle and working; and the performance
to power ratio, which reflects how much work a unit of power can do in
a fixed time period.
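The comparisons used in the evaluation reduce to simple arithmetic on measured averages. The following Python sketch is illustrative only and is not part of ClientVisor: the function names are our own, the wattages are the average values printed on the bars of Figure 4, and the ssj_ops and watt figures in the ratio example are hypothetical.

```python
def overhead_percent(system_watts, native_watts):
    """Additional average power relative to the native baseline, in percent."""
    return (system_watts / native_watts - 1.0) * 100.0


def perf_to_power(ssj_ops, watts):
    """Performance to power ratio in ssj_ops per watt."""
    return ssj_ops / watts


# Average static power in watts, as shown on the bars of Figure 4.
STATIC_AVG_W = {
    "Native": 10.91,
    "Xen": 16.71,
    "CV/Orig": 15.36,
    "CV/Cx_opt": 13.34,
    "CV/Cx_Timer_opt": 13.01,
}

native = STATIC_AVG_W["Native"]
static_overhead = {
    name: round(overhead_percent(watts, native), 2)
    for name, watts in STATIC_AVG_W.items()
    if name != "Native"
}

for name, pct in static_overhead.items():
    print(f"{name}: +{pct:.2f}% over native")

# Hypothetical numbers, only to show the unit of the third metric.
example_ratio = perf_to_power(30_000, 30.0)
print(f"example ratio: {example_ratio:.0f} ssj_ops/watt")
```

Expressing every system as an overhead against the same native baseline keeps the static and dynamic results directly comparable, which is how the percentages in Sections 4.2 and 4.3 are quoted.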

[Figure 4: bar chart of average power (W). Native 10.91, Xen 16.71,
CV/Orig 15.36, CV/Cx_opt 13.34, CV/Cx_Timer_opt 13.01]
Figure 4. The static average power consumption results

4.1 Experimental Setup
Our testbed is Intel's mobile platform Montevina. The processor is an
Intel Core2 Duo T9400 at 2.53GHz, with 32KB L1 I-Cache, 32KB L1
D-Cache and 6MB L2 Cache. The platform has 2GB RAM, a 160GB SATA hard
disk, and an Intel 82567LM Gigabit Ethernet controller.
Our power data is obtained by using an Extech 380803 power analyzer
[6]. Meanwhile, SPECpower_ssj 2008 [25] is selected to simulate the
daily workload to evaluate both performance and power.
Native Microsoft Windows Vista Ultimate is used to get the baseline
data, and a recent Xen unstable distribution (footnote 1) is used as
an implementation of the VMM-centric PM approach. ClientVisor includes
three revisions: the original implementation (i.e., CV/Orig), a
revision with Cx mapping optimization (i.e., CV/Cx_opt), and a
revision with both Cx mapping and timer optimization (i.e.,
CV/Cx_Timer_opt). In our experiments, all virtualized systems use
Microsoft Windows Vista Ultimate as the COS inside the primary user
domain.

4.2 Static Power Consumption
Static power consumption data is collected with the systems in the
idle state. The platform is powered on and does not run any user
programs. The hibernation feature may power down the whole machine and
break the experiment after a long idle period; therefore, it is
disabled in the experiments. The data are collected over a one-hour
period, with a sampling period of 2 seconds. The average power
consumption of each system over this period is depicted in Figure 4.
As discussed in Section 3.2.2, the VMM-centric approach usually
mispredicts idle-breaking events and chooses unsuitable Cx states to
enter, which results in 53.16% additional power consumption compared
with the native case, as shown in Figure 4. It can be observed that
the original ClientVisor consumes 40.79% more power than the native
case, and that this overhead is reduced to 19.25% by the
optimizations.
The results show that, of the optimization techniques, Cx mapping has
an evident effect, since it replaces most Cx transition operations
with deep Cx transitions using the new method, as shown in Table 1.
Timer optimization is applied together with Cx mapping, since it does
not replace Cx operations, and there is not much room to improve the
residency of the deepest Cx (i.e., C1, the deepest one among the COS's
PM decisions) with the original implementation. For the same reason,
its Cx residency improvement over the revision with Cx mapping is also
limited, although it can reduce the count of idle-breaking events.
Even so, timer optimization still decreases the power consumption by a
further 2.47% on top of Cx mapping.

Table 1. Cx residencies of ClientVisor
Cx       CV/Orig   CV/Cx_opt   CV/Cx_Timer_opt
C6        0.00%     88.33%      89.98%
C5~C3     0.00%      0.00%       0.00%
C2        0.00%      0.27%       0.22%
C1       90.93%      0.33%       0.08%
C0        9.07%     11.07%       9.72%

[Figure 5: (a) Overall: bar chart of average power (W). Native 23.50,
Xen 28.43, CV/Orig 26.85, CV/Cx_opt 26.27, CV/Cx_Timer_opt 26.21.
(b) Each Load Level: power (W) versus load level from 100% to 0% for
the same five systems]
Figure 5. The dynamic power consumption results

4.3 Dynamic Power Consumption
To validate the data obtained when measuring dynamic power
consumption, a fixed amount of work should be defined for the tests.
However, with the default configuration, SPECpower_ssj calculates a
maximum throughput to represent the 100% load and runs for a fixed
time period, which makes the total amount of work differ between
systems due to the virtualization overhead and the different PM
capabilities. To avoid this, we set a fixed value for the maximum
throughput with the parameter input.load_level.target_max_throughput
(footnote 2). From experiments we
find that the actual value of this parameter varies between 30,000 and
40,000 ssj_ops, so it is set to 30,000. SPECpower_ssj runs for 11
240-second intervals, each with a different load level (relative to
the maximum throughput), from peak (100%) to idle (0%) in decrements
of 10%. Figure 5(a) depicts the average dynamic power of these systems
over the whole experiment, and the power consumption at each load
level is depicted in Figure 5(b).
We can observe that the overall power consumption of Xen is 20.97%
larger than the native case, while those of the three revisions of
ClientVisor are 14.24%, 11.80%, and 11.53% larger, respectively. The
overall result reflects the combined effect of various PM methods,
while the power trend in Figure 5(b) gives more information. When the
load is low (i.e., 0%~30%), idle state PM is the dominant method and
the result is similar to the previous experiment. Working state PM
becomes dominant as the load increases, but Xen is still the worst
case, since its simple working state policies are not as effective as
those in the operating system, as discussed in Section 3.3.1. We can
also observe that Cx mapping and timer optimization have little effect
when the load is above 30%, because both target idle state PM and lose
their effect when the load is not low. We believe that the power
benefit of device PM is also revealed in Figure 5(b), since the power
consumption of the systems still differs when the load is very high
(above 90%), in which case both working state PM and idle state PM
have very limited effect.

Footnote 1: Xen-3.3-unstable with changeset 18048, July 14, 2008;
linux-2.6.18-xen (i.e., domain 0) with changeset 610, July 23, 2008.
Footnote 2: Setting a fixed value for the
"input.load_level.target_max_throughput" parameter of SPECpower_ssj is
acceptable for research and academic usage, but the results do not
comply with SPECpower_ssj's run rules. Therefore, the results of this
experiment should not be compared with other compliant results.

[Figure 6: columns show performance (ssj_ops, 0 to 40,000) and curves
show power (W, 10 to 40) at load levels from 100% to 0% for Native,
Xen, CV/Orig, CV/Cx_opt, and CV/Cx_Timer_opt]
Figure 6. Power and performance data

4.4 Balance of Power and Performance
Power management is a tradeoff between power and performance. In a
computer system, power is very important, but a good PM scheme should
not incur too much performance impact while reducing the power
consumption. The balance of performance and power is embodied by their
ratio, which can also be used as a metric of user experience.
To collect performance data, we again use SPECpower_ssj to generate
the workload. This time the default configuration is used, i.e.,
SPECpower_ssj calculates a maximum throughput during calibration, and
then runs for 11 240-second intervals with different load levels, as
in the previous experiment. Figure 6 depicts the power and performance
data of all compared systems.
The columns in this figure denote the performance of the systems at
different load levels (measured in ssj_ops), while the curves denote
the power consumption trend of each system (measured in watts). Due to
the overhead incurred by virtualization, both Xen and ClientVisor have
performance impacts: Xen brings an overall performance decrease of
about 12%, while ClientVisor only decreases the performance by 2%~3%.
For the power consumption, the differences become more obvious as the
load decreases, because of the large gaps among their static power
consumption. The overall power consumption of Xen is 9.97% greater
than the native OS, while those of the ClientVisor revisions are
10.89%, 8.00%, and 8.76% greater, respectively. The power consumption
curves interleave in this figure because Xen has done less work than
ClientVisor in this experiment and so does not always consume more
power.
The results for the performance to power ratio are shown in Table 2.
It can be observed that the ratio decreases as the load decreases.
This is because the performance of every system is proportional to the
load level, but its power consumption is not; there is still power
consumption (i.e., static power consumption) when the performance
(load) decreases to 0.
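The argument that the ratio falls with the load can be checked with a small model. The sketch below is a simplification we introduce for illustration, not the paper's measurement method: it assumes power grows linearly from a static floor to a peak value, and all numbers (maximum throughput, static and peak watts) are hypothetical.

```python
def perf_to_power_ratio(load, max_ops, static_w, peak_w):
    """ssj_ops per watt at a load level in [0, 1] under a linear power model."""
    ops = load * max_ops                            # performance is proportional to load
    watts = static_w + load * (peak_w - static_w)   # static floor plus load-dependent part
    return ops / watts


# Hypothetical platform parameters, chosen only to make the trend visible.
MAX_OPS = 30_000
STATIC_W = 10.0
PEAK_W = 30.0

# Load levels from 100% down to 0% in steps of 10%, as in the benchmark runs.
levels = [step / 10 for step in range(10, -1, -1)]
ratios = [perf_to_power_ratio(l, MAX_OPS, STATIC_W, PEAK_W) for l in levels]

for level, ratio in zip(levels, ratios):
    print(f"load {level:4.0%}: {ratio:7.1f} ssj_ops/watt")
```

Because the numerator shrinks to zero while the denominator never drops below the static power, the ratio decreases monotonically and reaches 0 at idle, which matches the shape of every column in Table 2.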
Table 2. Performance to power ratio (ssj_ops/watt)
Load Level   Native   Xen   CV/Orig   CV/Cx_opt   CV/Cx_Timer_opt
100%          933     811    886       886         878
90%           886     781    849       846         849
80%           842     715    771       783         777
70%           786     662    717       723         723
60%           735     584    653       678         667
50%           659     516    577       605         598
40%           592     432    510       524         531
30%           500     341    421       426         453
20%           381     249    296       326         316
10%           230     137    183       186         197
0%              0       0      0         0           0
Overall       681     519    598       614         614

For the ratio of performance to power, Xen brings a 21.15% overall
decrease, while the decreases of the ClientVisor revisions are around
10%. Even in the worst case (load level = 10%), ClientVisor is only
about 17% worse than the native OS, while Xen shows a decrease of
about 40%. From this experiment we can conclude that ClientVisor does
not deteriorate the user experience, in either performance or PM
capability.

5. Conclusion and Future Work
This paper has presented a novel power management scheme for the
desktop virtualization environment and implemented a prototype named
ClientVisor. ClientVisor exposes the platform power features to the
guest domains, which eliminates the side effects brought by the
virtualization layer. Experimental results show that this scheme can
leverage COTS OS functionalities for power management in a virtualized
desktop environment. It is better than the VMM-centric approach in the
recent Xen distribution, and very close to the native case.
Future work on ClientVisor includes supporting multiple COSes
co-existing on multi-core platforms, memory PM, and enabling our model
to support global suspending/resuming, i.e., the S3 (suspend to RAM)
and S4 (suspend to disk) states.

Acknowledgments
This work is supported by National 973 Basic Research Program of China
under grant No.2007CB310900, Hubei Fund under grant No.2007ABD009, the
Ministry of Education-Intel Information Technology special research
fund under grant No.MOE-INTEL-08-06, and the research fund supported
by HP Labs China.
We acknowledge the developers of ClientVisor for their hard work in
programming and tuning. We are also grateful to Eddie Dong, Susie Li,
and Bo Li for their review and feedback on improving this paper.
Finally, we would like to thank Intel Corporation for providing us
with devices, platforms, and evaluation tools to perform this work.

References
[1] Advanced Micro Devices, Inc. AMD64 Architecture Programmer's
Manual Vol.2: System Programming, 2007.
[2] Brooks, D. and Martonosi, M. Dynamic Thermal Management for
High-Performance Microprocessors. In Proceedings 7th International
Symposium on High Performance Computer Architecture (HPCA'01), 2001,
pp.171-182.
[3] Colarelli, D. and Grunwald, D. Massive Arrays of Idle Disks for
Storage Archives. In Proceedings of International Conference on High
Performance Networking and Computing, 2002, Baltimore, Maryland,
pp.1-11.
[4] Dalton, A. B. and Ellis, C. S. Sensing user intention and context
for energy management. In Proceedings of the 9th conference on Hot
Topics in Operating Systems (HotOS'03), 2003, Lihue, Hawaii, pp.23-25.
[5] Ebergen, J., Gainsley, J., and Cunningham, P. Transistor sizing:
How to control the speed and energy consumption of a circuit. In
Proceedings of the 10th International Symposium on Asynchronous
Circuits and Systems (ASYNC'04), 2004, pp.51-61.
[6] Extech Instrument Corporation. Software Manual for Models 380801
and 380803 Power Analyzers, 2006.
[7] Flautner, K., Reinhardt, S., and Mudge, T. Automatic
performance-setting for dynamic voltage scaling. In Proceedings of the
7th Conference on Mobile Computing and Networking (MobiCom'01), 2001,
pp.507-520.
[8] Govil, K., Chan, E., and Wasserman, H. Comparing Algorithms for
Dynamic Speed-Setting of a Low-Power CPU. In Proceedings of the 1st
Annual International Conference on Mobile Computing and Networking
(MobiCom'95), 1995, pp.13-25.
[9] Hsu, C. and Kremer, U. The Design, Implementation, and Evaluation
of a Compiler Algorithm for CPU Energy Reduction. In Proceedings of
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI'03), 2003, pp.38-48.
[10] Huang, H., Pillai, P., and Shin, K. G. Design and Implementation
of Power-Aware Virtual Memory. In Proceedings of USENIX Annual
Technical Conference (USENIX'03), 2003, pp.57-70.
[11] Intel Corporation. Intel 64 and IA32 Architectures Software
Developer's Manual Vol.2A: Instruction Set Reference, 2007.
[12] Intel Corporation. Intel Virtualization Technology for Directed
I/O Architecture Specification, 2007.
[13] Intel Corporation, HP Corporation, Microsoft Corporation, et al.
Advanced Configuration and Power Interface Specification, 2006.
[14] Kravets, R. and Krishnan, P. Power management techniques for
mobile communication. In Proceedings of the 4th annual ACM/IEEE
international conference on Mobile computing and networking
(MobiCom'98), 1998, Dallas, Texas, USA, pp.157-168.
[15] Krishna, C. M. and Lee, Y. Voltage-clock-scaling adaptive
scheduling techniques for low power hard real-time systems. In IEEE
Transactions on Computer, Vol.52, No.12, Dec. 2003, pp.1586-1593.
[16] Kumar, A., Shang, L., Peh, L., and Jha, N. K. System-level
dynamic thermal management for high performance micro-processors. In
IEEE Transactions on Computer-Aided Design, Vol.27, 2008, pp.96-108.
[17] Kursun, E., Ghiasi, S., and Sarrafzadeh, M. Transistor level
budgeting for power optimization. In Proceedings of the 5th
International Symposium on Quality Electronic Design (ISQED'04), 2004,
pp.116-121.
[18] Lim, M. Y., Freeh, V. W., and Lowenthal, D. K. Adaptive,
transparent frequency and voltage scaling of communication phases in
MPI programs. In Proceedings of the ACM/IEEE Supercomputing Conference
(SC'06), Tampa, Florida, USA, pp.14.
[19] Nathuji, R. and Schwan, K. Virtual Power: Coordinated Power
Management in Virtualized Enterprise Systems. In Proceedings of
International Symposium on Operating Systems Principles (SOSP'07),
2007, Stevenson, Washington, USA, pp.265-278.
[20] Neiger, G., Santoni, A., Leung, F., Rodgers, D., and Uhlig, R.
Intel virtualization technology: Hardware support for efficient
processor virtualization. In Intel Technology Journal, Vol.10, 2006,
pp.167-177.
[21] Okuma, T., Ishihara, T., and Yasuura, H. Real-time task
scheduling for a variable voltage processor. In Proceedings of the
International Symposium on System Synthesis (ISSS'99), 1999, pp.24-29.
[22] Papathanasiou, A. E. and Scott, M. L. Increasing disk burstiness
for energy efficiency. Department of Computer Science, University of
Rochester, TR-792, 2002.
[23] Pinheiro, E. and Bianchini, R. Energy Conservation Techniques for
Disk Array-Based Servers. In Proceedings of the 18th International
Conference on Supercomputing, 2004, Malo, France, pp.88-95.
[24] Shin, Y. and Choi, K. Power conscious fixed priority scheduling
for hard real-time systems. In Proceedings of the 36th Annual Design
Automation Conference (DAC'99), 1999, pp.134-139.
[25] Standard Performance Evaluation Corporation. User Guide of
SPECpower_ssj 2008, 2008.
[26] Venkatachalam, V. and Franz, M. Power Reduction Techniques For
Microprocessor Systems. In ACM Computing Surveys, Vol.37, 2005,
pp.195-237.
[27] Weissel, A. and Bellosa, F. Process cruise control: event-driven
clock scaling for dynamic power management. In Proceedings of the
International Conference on Compilers, Architecture, and Synthesis for
Embedded Systems (CASES'02), 2002, Grenoble, France.
