Coupled Placement in Modern Data Centers
Madhukar Korupolu
IBM Almaden Research Center
madhukar@us.ibm.com
Aameek Singh
IBM Almaden Research Center
aameek.singh@us.ibm.com
Bhuvan Bamba
Georgia Tech
bhuvan@cc.gatech.edu
Abstract
We introduce the coupled placement problem for modern
data centers: placing application computation and data together
among the available server and storage resources.
While the two have traditionally been addressed independently in data centers, two modern trends make it beneficial
to consider them together in a coupled manner: (a) the rise
of virtualization technologies, which enable applications
packaged as VMs to be run on any server in the data center
with spare compute resources, and (b) the rise of multi-purpose
hardware devices in the data center, which provide compute
resources of varying capabilities at different proximities from
the storage nodes.
We present a novel framework called CPA for addressing
such coupled placement of application data and computation
in modern data centers. Based on two well-studied problems,
Stable Marriage and Knapsack, the CPA framework
is simple, fast, and versatile, and it automatically enables high-throughput
applications to be placed on nearby server and
storage node pairs. While a theoretical proof of CPA's
worst-case approximation guarantee remains an open question, we use extensive experimental analysis to evaluate
CPA on large synthetic data centers, comparing it to Linear
Programming based methods and other traditional methods.
Experiments show that CPA is consistently, and surprisingly,
within 0 to 4% of the Linear Programming based optimal
values for various data center topologies and workload
patterns. At the same time, it is one to two orders of
magnitude faster than the LP based methods and scales
to much larger problem sizes.
The fast running time of CPA makes it highly suitable
for large data center environments where hundreds to thousands of server and storage nodes are common. LP based
approaches are prohibitively slow in such environments.
CPA is also suitable for fast interactive analysis during
consolidation of such environments from physical to virtual
resources.
1. Introduction
Managing enterprise data centers is a challenging and
expensive task [9]. Modern data centers often have hundreds
to thousands of servers and storage subsystems, connected
via a web of network switches. As their scale and complexity continue to increase, enterprises find themselves at a
unique tipping point. The recent success of virtualization
technologies is encouraging enterprises to transform their
data centers from old, rigid IT configurations to a more
agile virtualized infrastructure. Server
virtualization technologies like VMware and Xen enable
applications, which were hitherto running in their individual
silos of servers (often over-provisioned), to be
packaged as virtual machines (VMs) that can run on any
compatible physical machine in the data center. They also allow
multiple virtual machines to run on the same physical
machine without interference, and a VM to be moved from
one machine to another without application downtime [29],
[22]. Such agility, while improving utilization and reducing
costs, introduces new challenges.
Imagine an enterprise migrating its applications from a
physical IT infrastructure to a virtualized cloud within the
enterprise. One of the tasks that the administrators need
to plan intelligently during this consolidation is to decide
where to place application computation (VM) as well as
application data, among the available server and storage
resources in the virtualized data center. The traditional approach of making computation and data placement decisions
independently [34], [14], [16] yields suboptimal placements
due to the different proximities of server and storage nodes in
the data center. Oftentimes, data center administrators work
around this problem by over-provisioning resources, which
leads to under-utilization and wastes not only capital and hardware
resources but also space and energy. As a result,
the need for an intelligent placement planning mechanism
is significant.
Another data center reality complicates this process further. Data centers evolve over time, and not all nodes in a
data center are created equal. Figure 1 shows an example
modern data center with servers, storage and network nodes.
Different nodes could be purchased at different times and
thus have differing capabilities based on the technology of
the time. For example, in Figure 1, one switch can be more
recent than another, with lower latency and greater
I/O throughput capacity. This leads to differing affinities
among different server-storage node pairs. The heterogeneous
nature of data center environments requires handling
storage and computational resources in a coupled manner.
Figure 1. An example modern data center: hosts with one or multiple CPUs connect through edge and core switches to high-end storage systems and low-end disks; some server-storage pairs enjoy high-bandwidth connectivity while others require an extra hop.
2. Problem Formulation

The goal is to choose, for each application $A_i$, a compute node $Cnode(A_i)$ and a storage node $Snode(A_i)$ so as to minimize the total cost $\sum_i Cost(A_i, Cnode(A_i), Snode(A_i))$, subject to the capacity constraints

$\sum_{i:\,Snode(A_i)=S_k} A_i.Sreq \le cap(S_k)$ for every storage node $S_k$,

$\sum_{i:\,Cnode(A_i)=C_j} A_i.Creq \le cap(C_j)$ for every compute node $C_j$.
3. Algorithms
In this section, we begin by outlining two simpler algorithms: a greedy individual placement algorithm INDV-GR, which places the computation and storage of applications
independently in a natural greedy fashion, and a pairs
placement algorithm PAIR-GR, which greedily places each
application's computation and storage pair simultaneously.
The pseudocode for these is given as Algorithm-1 and
Algorithm-2.
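To make the pair-greedy idea concrete, the following is a minimal sketch in Python. It is our illustration, not the paper's exact Algorithm-2: the field names (app.creq, app.sreq, app.name, node.free, the nested dist lookup) and the priority order (descending I/O rate) are assumptions.

import itertools

def pair_gr(apps, cnodes, snodes, dist):
    """Greedy pairs placement sketch: for each application, in an
    assumed priority order, pick the feasible (compute, storage)
    node pair at the smallest distance and consume capacity on both."""
    placement = {}
    for app in sorted(apps, key=lambda a: a.iorate, reverse=True):
        feasible = [(c, s) for c, s in itertools.product(cnodes, snodes)
                    if c.free >= app.creq and s.free >= app.sreq]
        if not feasible:
            raise RuntimeError("no feasible node pair for " + app.name)
        c, s = min(feasible, key=lambda cs: dist[cs[0]][cs[1]])
        c.free -= app.creq
        s.free -= app.sreq
        placement[app] = (c, s)
    return placement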
The INDV-GR algorithm (Algorithm-1) first places application storage by sorting applications
by App.Iorate/App.Sreq and greedily assigning them to
storage nodes sorted by the BestDist metric. The node-level
selection subproblem used by CPA is the classical 0/1 knapsack:
maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \le W$ with $x_i \in \{0, 1\}$.
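The knapsack selection can be implemented with the textbook dynamic program. Below is a minimal sketch in Python, assuming integer weights and capacity (the function name and signature are ours):

def knapsack_01(values, weights, capacity):
    """Classical 0/1 knapsack DP: return indices of a subset
    maximizing total value with total weight <= capacity.
    Assumes integer weights/capacity; O(n * capacity) time and space."""
    n = len(values)
    best = [0.0] * (capacity + 1)          # best[w]: best value within weight w
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for w in range(capacity, weights[i] - 1, -1):  # descending: 0/1 semantics
            cand = best[w - weights[i]] + values[i]
            if cand > best[w]:
                best[w] = cand
                keep[i][w] = True
    chosen, w = [], capacity               # backtrack to recover the chosen set
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            chosen.append(i)
            w -= weights[i]
    return list(reversed(chosen))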
3: Propose to best storage node in the list
4: end for
5: while (All apps not placed) do
6:   MaxStgNode ← StgQ[0]
7:   for all Stg in StgQ do
8:     Compute profits for proposals received at Stg
9:     MaxStgNode ← argmax(MaxStgNode.profit, Stg.profit)
10:  end for
11:  Knapsack(MaxStgNode)
12:  for all Apps accepted by Knapsack(MaxStgNode) do
13:    Place App storage on MaxStgNode
14:  end for
15:  for all Rejected apps do
16:    Propose to next storage node in its preference list
17:  end for
18: end while
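As a rough Python rendering of the proposal loop above (again a sketch under our own naming: app.pref is the application's sorted preference list, node.free its remaining capacity, profit(app, node) the proposal's profit, and knapsack_01 the helper sketched earlier; handling of exhausted preference lists is omitted):

import math

def cpa_stg(apps, profit, knapsack_01):
    """One storage-placement phase: every unplaced app proposes to its
    current favorite node; the node whose proposals form the most
    profitable knapsack accepts that subset; rejected apps move on."""
    for app in apps:
        app.next = 0                         # position in app.pref
    pending = set(apps)
    while pending:
        proposals = {}
        for app in pending:
            proposals.setdefault(app.pref[app.next], []).append(app)
        best_node, best_idx, best_profit = None, [], -math.inf
        for node, props in proposals.items():
            idx = knapsack_01([profit(a, node) for a in props],
                              [a.sreq for a in props], node.free)
            p = sum(profit(props[i], node) for i in idx)
            if p > best_profit:
                best_node, best_idx, best_profit = node, idx, p
        accepted = {proposals[best_node][i] for i in best_idx}
        for app in accepted:                 # place winners, consume capacity
            app.snode = best_node
            best_node.free -= app.sreq
            pending.discard(app)
        for app in proposals[best_node]:     # losers try their next choice
            if app not in accepted:
                app.next += 1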
Figure 3. Swap exchanges storage-compute pairs between A4 and A5. Individual rounds cannot make this move: during STG placement, M and O are equally preferable for A4 as they have the same distance from A4's computation (X); similarly during compute placement. [The figure traces five example applications A1-A5 through the STG placement, computation placement and SWAP steps across compute nodes X, Y and storage nodes M, N, O.]
MinCost ← ∞, SolutionCfg ← ∅
loop
  CPA-Stg()
  CPA-Comp()
  CPA-Swap()
  Cost ← 0
  for all App in AppsQ do
    Cost ← Cost + Cost(App, Cnode(App), Snode(App))
  end for
  if (Cost < MinCost) then
    MinCost ← Cost
    SolutionCfg ← current placement solution
  else
    break // termination by local optimum
  end if
end loop
return SolutionCfg
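In Python, the outer refinement loop might look like the sketch below (ours; the cost function, the round callables, and the app fields cnode/snode are assumed names):

import math

def cpa(apps, cost, rounds):
    """Iterate the improvement rounds (e.g. [cpa_stg, cpa_comp, cpa_swap]),
    keep the cheapest placement seen, and stop at the first full
    iteration that fails to improve, i.e. a local optimum."""
    min_cost, solution = math.inf, None
    while True:
        for do_round in rounds:
            do_round()
        total = sum(cost(a, a.cnode, a.snode) for a in apps)
        if total < min_cost:
            min_cost = total
            solution = [(a, a.cnode, a.snode) for a in apps]  # snapshot
        else:
            break
    return min_cost, solution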
4. Evaluation
In this section, we evaluate the performance of the different algorithms through a series of simulation-based experiments. Section 4.1 describes the algorithms considered and
Section 4.2 describes the experimental setup, followed by results in
the subsequent sections. We summarize our findings in Section 4.7.
The computation and storage requirements for applications are generated using a normal distribution with configurable mean and standard deviation. The mean and standard
deviation are configured as an α or β fraction, respectively, of the total
capacity (compute or storage accordingly) in the system
per application. More concretely, for compute resources, the
mean (μ_c) equals α · N_c · cap(C) / N, where α is a fraction between
0 and 1, N_c is the number of compute nodes in the system,
cap(C) is the capacity of each single compute node, and
N is the total number of applications in the system. The
standard deviation (σ_c) equals β · N_c · cap(C) / N, where β is a
fraction between 0 and 1. Similarly for storage nodes.
Experiments with increasing α and β are presented in Sections 4.4
and 4.5. The varying-α experiments investigate how CPA
behaves as available resources are filled loosely or tightly
by the application workloads. The varying-β experiments
investigate how CPA behaves with an increasing level of
heterogeneity of the application workloads.
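A small sketch of this generator in Python (names are ours; we also clamp samples at zero, which the paper does not specify):

import random

def gen_requirements(num_apps, num_nodes, node_cap, alpha, beta):
    """Per-application requirements ~ Normal(mu, sigma) with
    mu = alpha * num_nodes * node_cap / num_apps and
    sigma = beta * num_nodes * node_cap / num_apps."""
    mu = alpha * num_nodes * node_cap / num_apps
    sigma = beta * num_nodes * node_cap / num_apps
    return [max(0.0, random.gauss(mu, sigma)) for _ in range(num_apps)]

The same function, with storage-side parameters, generates the storage requirements.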
Another important piece of the input is the application
compute-storage node distance matrix. For this, we use
an exponential function based on the physical inter-hop
distance; the closest compute-storage node pairs (both nodes
inside a high-end storage system) are set at distance 1, and
for every subsequent level the distance value is multiplied
by a distance-factor. A higher distance-factor implies a greater
relative distance between two consecutive levels of compute
nodes from the underlying storage nodes. Experiments with
increasing distance-factor are presented in Section 4.6.
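This mapping is easy to state in code; a minimal sketch (ours), taking a matrix of inter-hop counts with 0 for the closest pairs:

def distance_matrix(hops, distance_factor):
    """distance = distance_factor ** hops, so the closest pairs
    (0 extra hops) sit at distance 1 and each further level
    multiplies the distance by distance_factor."""
    return [[distance_factor ** h for h in row] for row in hops]

For example, distance_matrix([[0, 1], [2, 1]], 4) yields [[1, 4], [16, 4]].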
Figure 4. [Cost (x 1M) with increasing problem size; curves for INDV-GR, PAIR-GR, CPA-R1 and CPA.]
Figure 5. [Cost (x 1M) with increasing problem size, zoomed in on smaller sizes; curves for INDV-GR, PAIR-GR, CPA-R1, CPA and OPT-LP.]
OPT-LP was able to complete only for problem sizes up to 1.08M. The exact cost measures for these small
sizes are given in Table-1.
First, from Figure-4, note the better performance of CPA
and CPA-R1 compared to the two greedy algorithms. Of the
two greedy algorithms, PAIR-GR does better as it places
applications based on storage-computation pair distances,
whereas INDV-GR's independent decisions fail to capture
that. CPA-R1 performs better than both greedy algorithms.
This is because of its use of knapsacks (enabling it to
pick the right application combinations for each node) and
its placing of computation based on the corresponding storage
placement. Running CPA for further rounds iteratively improves
the cost to the best value shown.
Figure-5 shows the same experiment zoomed in on the
smaller problem sizes (up to 1.08M), where OPT-LP was
also able to complete. While the rest of the trends are similar,
most interesting is the closeness of the OPT-LP and
CPA curves, which overlap almost completely. Table-1 gives the
exact values achieved by OPT-LP, CPA and CPA-R1 for
these problem sizes. CPA is within 2% of OPT-LP in all but one case,
where it is within 4%. This is an encouraging
confirmation of CPA's optimization quality.
Table 1. Cost achieved by CPA-R1, CPA and OPT-LP for small problem sizes; Difference is CPA relative to OPT-LP (OPT-LP did not complete beyond 1.08 M).

Size      CPA-R1    CPA       OPT-LP    Difference
0.00 M       986       986       986        0%
0.01 M     10395      9079      8752      3.7%
0.14 M     27842     24474     24200      1.1%
0.34 M     37694     32648     32152      1.5%
0.62 M     52796     45316     45242      0.1%
1.08 M     67805     59141     58860      0.4%
2.51 M     91717     79933       -          -
4.84 M    114991    100256       -          -
8.27 M    136328    117118       -          -
Figure 7. [Cost (x 1M) and Time (sec) with increasing Alpha; curves for INDV-GR, PAIR-GR, CPA-R1, CPA and OPT-LP.]
Figure 10. [Cost (x 1M) and Time (sec) with increasing Beta; curves for INDV-GR, PAIR-GR, CPA-R1, CPA and OPT-LP.]
Figure 12. Time with increasing variance. CPA used 3, 10, 6, 8, 5, 7 rounds.
Figure 13. [Cost (x 1M) and Time (sec) with increasing Distance Factor; curves for INDV-GR, PAIR-GR, CPA-R1 and CPA.]
4.7. Discussion
References
[1] Minknap Code. http://www.diku.dk/pisinger/codes.html.
[2] CPLEX 8.0 student edition for AMPL. http://www.ampl.com/
DOWNLOADS/cplex80.html.
[3] Knapsack problem: Wikipedia. http://en.wikipedia.org/wiki/Knapsack_problem.
[4] NEOS server for optimization. http://www-neos.mcs.anl.gov.
[5] NEOS server: MINOS. http://neos.mcs.anl.gov/neos/solvers/
nco:MINOS/AMPL.html.
[6] Stable marriage problem: Wikipedia. http://en.wikipedia.org/wiki/Stable_marriage_problem.
[7] Tiburon project at IBM Almaden.
[8] Cisco MDS 9000 Family SANTap service - enabling intelligent fabric applications, 2005.
[9] Data Center of the Future, IDC Quarterly Study Number
06C4799, April 2006.
[10] VMware infrastructure 3 architecture, 2006.
[11] VMware Infrastructure: Resource management with VMware
DRS. VMware Whitepaper, 2006.
[12] B. Adra, A. Blank, M. Gieparda, J. Haust, O. Stadler, and
D. Szerdi. Advanced POWER virtualization on IBM eServer
p5 servers: Introduction and basic configuration. IBM Redbook, October 2004.
[13] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows:
Theory, Algorithms, and Applications. Prentice Hall, 1993.
[14] G.A. Alvarez, J. Wilkes, E. Borowsky, S. Go, T.H. Romer,
R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic,
and A. Veitch. Minerva: An automated resource provisioning
tool for large-scale storage systems. ACM Transactions on
Computer Systems, 19(4):483-518, November 2001.
[15] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal,
and A. Veitch. Hippodrome: Running circles around storage
administration. In Proceedings of USENIX Conference on
File and Storage Technologies, pages 175-188, 2002.
[16] E. Anderson, M. Kallahalla, S. Spence, R. Swaminathan, and
Q. Wang. Ergastulum: quickly finding near-optimal storage
system designs. HP Labs SSP HPL-SSP-2001-05, 2002.
[17] K. Appleby, S. Fakhouri, L. Fong, G. Goldszmidt, M. Kalantar, S. Krishnakumar, D. P. Pazel, J. Pershing, and B. Rochwerger. Oceano - SLA-based management of a computing
utility. In Proceedings of IFIP/IEEE Symposium on Integrated
Network Management, pages 855-868, 2001.