Power-Aware Speed Scaling in Processor Sharing Systems

Adam Wierman, Computer Science Department, California Institute of Technology
Lachlan L.H. Andrew, Computer Science Department, California Institute of Technology
Ao Tang, School of ECE, Cornell University
Abstract: Energy usage of computer communications systems has quickly become a vital design consideration. One effective method for reducing energy consumption is dynamic speed scaling, which adapts the processing speed to the current load. This paper studies how to optimally scale speed to balance mean response time and mean energy consumption under processor sharing scheduling. Both bounds and asymptotics for the optimal speed scaling scheme are provided. These results show that a simple scheme that halts when the system is idle and uses a static rate while the system is busy provides nearly the same performance as the optimal dynamic speed scaling. However, the results also highlight that dynamic speed scaling provides at least one key benefit: significantly improved robustness to bursty traffic and mis-estimation of workload parameters.

I. INTRODUCTION

Power management is increasingly important in computer communications systems. Not only is the energy consumption of the internet becoming a significant fraction of the energy consumption of developed countries [1], but cooling is also becoming a major concern. Consequently, there is an important tradeoff in modern system design between reducing energy usage and maintaining good performance.

There is an extensive literature on power management, reviewed in [2]–[4]. A common technique, which is the focus of the current paper, is dynamic speed scaling [5]–[8]. It dynamically reduces the processing speed at times of low workload, since processing more slowly uses less energy per operation. This is now common in many chip designs [9], [10]. In particular, speed scaling has been proposed for many network devices, such as switch fabrics [11], TCP offload engines [12], and OFDM modulation clocks [13].

This paper studies the efficacy of dynamic speed scaling analytically. The goal is twofold: (i) to elucidate the structure of the optimal speed scaling scheme (e.g., how should the speed depend on the current workload?), and (ii) to compare the performance of dynamic speed scaling designs with that of designs that use static processing speeds (e.g., how much improvement does dynamic speed scaling provide?).

There are many analytic studies of speed scaling designs. Beginning with Yao et al. [14], the focus has been on either (i) the goal of minimizing the total energy used in order to complete arriving jobs by their deadlines, e.g., [15], [16], or (ii) the goal of minimizing the average response time of jobs, i.e., the time between their arrival and their completion of service, given a set energy/heat budget, e.g., [17]–[19].

Web settings typically have neither job completion deadlines nor fixed energy budgets. Instead, the goal is to optimize a tradeoff between energy consumption and mean response time. This model is the focus of the current paper. In particular, the performance metric considered is $E[T] + E[E]/\beta'$, where T is the response time of a job, E is the energy expended on that job, and β′ controls the relative cost of delay.

This performance metric has attracted attention recently [16], [20], [21]. The related analytic work falls into two categories: worst-case analyses and stochastic analyses. The former provide specific, simple speed scalings guaranteed to be within a constant factor of the optimal performance regardless of the workload, e.g., [16], [20]. In contrast, stochastic results have focused on service rate control in the M/M/1 model under First Come First Served (FCFS) scheduling, which can be solved numerically using dynamic programming. One such approach [21] is reviewed in Section III-C. Unfortunately, the structural insight obtained from stochastic models has been limited.

Our work extends the stochastic analysis of dynamic speed scaling. We focus on the M/GI/1 queue under Processor Sharing (PS) scheduling, which serves all jobs currently in the system at equal rates. We focus on PS because it is a tractable model of current scheduling policies in CPUs, web servers, routers, etc. Based on the model (Section II) and the speed scaling we consider (Section III), our analysis makes three main contributions.
• We provide bounds on the performance of dynamic speed scaling (Section IV-A). Surprisingly, these bounds show that even an idealized version of dynamic speed scaling improves performance only marginally compared to a simple scheme where the server uses a static speed when busy and sleeps when idle (at most a factor of 2 for typical parameters, and often less; see Section V). Counterintuitively, these bounds also show that the power-optimized response time remains bounded as the load grows.
• We provide bounds and asymptotics for the speeds used by the optimal dynamic speed scaling scheme (Sections IV-B and IV-C). These results provide insight into how the speeds scale with the arriving load, the queue length, and the relative cost of energy. Further, they uncover a connection between the optimal stochastic policy and results from the worst-case community (Section IV).
• We illustrate through analytic results and numerical experiments that, though dynamic speed scaling provides limited performance gains, it dramatically improves robustness to mis-estimation of workload parameters and bursty traffic (Section VI).
II. MODEL AND NOTATION

In order to study the performance of dynamic speed scaling, we focus on a simple model: an M/GI/1 PS queue with controllable service rates that depend on the queue length. In this model, jobs arrive to the server as a Poisson process with rate λ, have intrinsic sizes with mean 1/µ, and depart at rate $s_n\mu$ when there are n jobs in the system. Under static schemes, the (constant) service rate is denoted by s. Define the "load" as ρ = λ/µ, and note that ρ is not the fraction of time the server is busy.

The performance metric we consider is $E[T] + E[E]/\beta'$, where T is the response time of a job and E is the energy expended on a job. It is often convenient to work with the expected cost per unit time, instead of per job, which by Little's law can be written as $z = E[N] + \lambda E[f(s)]/\beta'$, where N is the number of jobs in the system and f(s) determines the power used when running at speed s.

The remaining piece of the model is to define the form of f(s). Prior literature has typically assumed that f is convex, and often that f is a polynomial, specifically a cubic. This is because the dynamic power of CMOS is proportional to $V^2 f$, where V is the supply voltage and f is the clock frequency [4]. Operating at a higher frequency requires dynamic voltage scaling (DVS) to a higher voltage, nominally with V ∝ f, yielding a cubic relationship.

To validate the polynomial form of f, we consider data from real 90 nm chips in Figure 1. The voltage versus speed data comes from the Intel PXA [22], the Pentium M 770 processor [23], and the TCP offload engine studied in [12] (specifically the NBB trace at 75 °C in Fig. 8.4.5). Interestingly, the dynamic power usage of real chips is well modeled by a polynomial scaling of speed to power, but this polynomial is far from cubic. In fact, it is closer to quadratic, indicating that the voltage is scaled down less aggressively than linearly with speed. As a result, we will model the power used by running at speed s by

$$ \frac{\lambda f(s)}{\beta'} = \frac{s^\alpha}{\beta}, \tag{1} $$

where α > 1 and β takes the role of β′ but has dimension $(\text{time})^{-\alpha}$. The cost per unit time then becomes

$$ z = E[N] + \frac{E[s^\alpha]}{\beta}. \tag{2} $$

We will often focus on the case α = 2 to provide intuition. Clearly, this is an idealized model, since in reality only a few discrete speeds can be used.

Interestingly, we find that the impact of the workload parameters ρ, β, and α can often be captured by one simple dimensionless parameter, $\gamma = \rho/\beta^{1/\alpha}$. Thus, we will state our results in terms of γ to simplify their form. It will also often be convenient to use the natural dimensionless unit of speed $s/\beta^{1/\alpha}$.

Though we focus on dynamic power in this paper, it should be noted that leakage power is increasingly important. It represents 20-30% of the power usage of current and near-future chips [4]. However, analytic models for leakage are much less understood, so including leakage in our analysis is beyond the scope of this paper; we leave the question of including both leakage and dynamic power for future work.

[Figure 1: log(dyn power) versus log(freq) for the Intel PXA, the TCP offload processor ("TCP proc"), and the Pentium M, with fitted lines.]
Fig. 1. Dynamic power for an Intel PXA 270, a TCP offload engine, and a Pentium M 770. The slopes of the fitted lines are 1.11, 1.66, and 1.62, respectively.

III. POWER-AWARE SPEED SELECTION

When provisioning processing speed in a power-aware manner, there are three natural thresholds in the capability of the server.
(i) Static provisioning: The server uses a constant static speed, which is determined based on workload characteristics so as to balance energy usage and response time.
(ii) Static-with-sleep provisioning: The server sleeps by disabling its clock (setting s = 0) if no jobs are present; if jobs are present, it works at a constant rate chosen to balance energy usage and response time.
(iii) Dynamic speed scaling: The server adapts its speed to the current number of requests present in the system.

The goal of this paper is to understand how to choose optimal speeds in each of these scenarios and to contrast the relative merits of each scheme. Clearly, the expected cost is reduced each time the server is allowed to adjust its speed more dynamically. This must be traded against the costs of switching, such as a delay of up to tens of microseconds to change speeds [2]. The important question is "What is the magnitude of improvement at each level?" For our comparison, we will use idealized versions of each scheme. In particular, in each case we will assume that the server can be run at any desired speed in [0, ∞) and ignore switching costs. Thus, the dynamic speed scaling is a significant idealization of what is possible in practice. However, our results will suggest that it provides very little improvement over the static-with-sleep scheme.

In this section, we will derive expressions for the optimal speeds in cases (i) and (ii). For case (iii), we will describe a numerical approach for calculating the optimal speeds, which is due to George and Harrison [21]. Though this numerical approach is efficient, it provides little insight into the structure of the dynamic speeds or the overall performance. Providing such results will be the focus of Section IV.

A. The optimal static speed

The simplest system to manage power is one which selects an optimal speed and then always runs the processor at that speed. This case, which we call static-without-sleep, is the least power-aware scenario we consider, and will be used simply as a benchmark for comparison.
Even when the speed is static, the optimal design can be "power-aware", since the optimal speed can be chosen so that it trades off the cost of response time and energy appropriately. In particular, we can write the cost per unit time (2) as

$$ z = \frac{\rho}{s-\rho} + \frac{s^\alpha}{\beta}. $$

Then, differentiating and solving for the minimizer shows that the optimal s satisfies s > ρ and $s^{\alpha-1}(s-\rho)^2 = \beta\rho/\alpha$.

B. The optimal static speed for a sleep-enabled system

The next simplest system is one in which the processor is allowed two states: sleeping or processing. We model this situation with a server that runs at a constant rate except when there are no jobs in the system, at which point it can sleep, using zero dynamic power.

To determine the optimal static speed, we proceed as in the previous section. If the server can sleep when it is idle, the energy cost is only incurred during the fraction of time during which the server is busy, ρ/s. The cost per unit time (2) then becomes

$$ z = \frac{\rho}{s-\rho} + \rho\,\frac{s^{\alpha-1}}{\beta}. $$

The optimum occurs when s > ρ and

$$ 0 = \frac{dz}{ds} = -\frac{\rho}{(s-\rho)^2} + \rho\,\frac{(\alpha-1)s^{\alpha-2}}{\beta}, $$

which is solved when

$$ (\alpha-1)\,s^{\alpha-2}(s-\rho)^2 = \beta. \tag{3} $$

The optimal speed can be solved for explicitly for some α. For example, when α = 2, $s_{ss} = \rho + \sqrt{\beta}$. In general, define

$$ G(\gamma;\alpha) = \sigma \quad \text{s.t.} \quad \sigma > \gamma, \quad (\alpha-1)\,\sigma^\alpha\,(1-\gamma/\sigma)^2 = 1. \tag{4} $$

With this notation, the optimal static speed for a server which sleeps while idle is $s_{ss} = \beta^{1/\alpha} G(\gamma;\alpha)$. We call this policy the "static-with-sleep" policy, and denote the corresponding cost $z_{ss}$.

The following lemma serves to bound G. The proof is deferred to Appendix A.

Lemma 1. For α ≥ 2,

$$ \gamma + \sqrt{\frac{\gamma^{2-\alpha}}{\alpha-1}} \;\le\; G(\gamma;\alpha) \;\le\; (\alpha-1)^{-1/\alpha} + \gamma, \tag{5} $$

and the inequalities are reversed for α ≤ 2.

Note that the first inequality becomes tight for $\gamma^\alpha \gg 1$ and the second becomes tight for $\gamma^\alpha \ll 1$. Further, when α = 2 both become equalities, giving G(γ; 2) = γ + 1.

C. Optimal dynamic speed scaling

A popular alternative to static power management is to allow the speed to adjust dynamically to the number of requests in the system. The task of designing an optimal dynamic speed scaling scheme in our model can be viewed as a stochastic control problem.

We start the analysis by noting that we can simplify the problem dramatically with the following observation. An M/GI/1 PS system is well known to be insensitive to the job size distribution. This still holds when the service rate is queue-length dependent, since the policy still falls into the class of symmetric policies introduced by Kelly [24]. As a result, the mean response time and entire queue length distribution are affected by the service distribution only through its mean. Thus, we can consider an M/M/1 PS system. Further, the mean response time and entire queue length distribution are equivalent under all non-size-based service disciplines in the M/M/1 queue [24]. Thus, to determine the optimal dynamic speed scaling scheme for an M/GI/1 PS queue, we need only consider an M/M/1 FCFS queue.

The "service rate control" problem in the M/M/1 FCFS queue has been studied extensively [21], [25], [26]. In particular, George and Harrison [21] provide an elegant solution to the problem of selecting the state-dependent processing speeds to minimize a weighted sum of an arbitrary "holding" cost and a "processing speed" cost. Specifically, the optimal state-dependent processing speeds can be framed as the solution to a stochastic dynamic program, for which [21] provides an efficient numerical solution. In the remainder of this section, we provide an overview of this numerical approach. The core of this approach will form the basis of our derivation of bounds on the optimal speeds in Section IV.

We will describe the algorithm of [21] specialized to the case considered in this paper, where the holding cost in state n is simply n. Further, we will generalize the description to allow arbitrary arrival rates λ. The solution starts with an estimate z of the minimal cost per unit time, including both the occupancy cost and the energy cost. As in [21], [26], [27], the minimum cost of returning from state n to the empty system is given by the dynamic program

$$ v_n = \inf_{s\in A}\left\{ \frac{1}{\lambda+\mu s}\left(\frac{\lambda f(s)}{\beta'} + n - z\right) + \frac{\mu s}{\lambda+\mu s}\,v_{n-1} + \frac{\lambda}{\lambda+\mu s}\,v_{n+1} \right\}, $$

where A is the set of available speeds. We will usually assume $A = \mathbb{R}_+ \cup \{0\}$. With the substitution $u_n = \lambda(v_n - v_{n-1})$, this can be written as [21], [27]

$$ u_{n+1} = \sup_{s\in A}\left\{ z - n - \frac{\lambda f(s)}{\beta'} + \frac{s\,u_n}{\rho} \right\}. \tag{6} $$

Two additional functions are defined. First,

$$ \varphi(u) = \sup_{x\in A}\left\{ \frac{ux}{\rho} - \frac{\lambda f(x)}{\beta'} \right\}. \tag{7} $$
Second, the minimum value of x which achieves this supremum, normalized to be dimensionless, is

$$ \psi(u) = \beta^{-1/\alpha}\min\left\{ x : \frac{ux}{\rho} - \frac{\lambda f(x)}{\beta'} = \varphi(u) \right\}. \tag{8} $$

Note that under (1),

$$ \varphi(u) = (\alpha-1)\left(\frac{u}{\alpha\gamma}\right)^{\alpha/(\alpha-1)}, \qquad \psi(u) = \left(\frac{u}{\alpha\gamma}\right)^{1/(\alpha-1)}. $$

Given the estimate of z, the $u_n$ satisfy

$$ u_1 = z, \tag{9a} $$

$$ u_{n+1} = \varphi(u_n) - n + z. \tag{9b} $$

The optimal value of z can be found as the minimum value such that $(u_n)_{n=1}^{\infty}$ is an increasing sequence. This allows z to be found by an efficient binary search, after which $u_n$ can in principle be found recursively. The optimal speed in state n is then given by

$$ \frac{s_n^*}{\beta^{1/\alpha}} = \psi(u_n). \tag{10} $$

This highlights the fact that $\gamma = \rho/\beta^{1/\alpha}$ provides the appropriate scaling of the workload information, because the cost z, the normalized speed $s\beta^{-1/\alpha}$, and the variables $u_n$ depend on λ, µ, and β only through γ.

Note that this "forward" approach advocated in [21] is numerically unstable (Appendix B). We suggest that a more stable way to calculate $u_n$ is to start with a guess for large n and work backwards. Errors in the initial guess decay exponentially as n decreases, and are much smaller than the accumulated roundoff errors of the forward approach. This backward approach is made possible by the bounds we derive in Section IV.

IV. BOUNDS ON OPTIMAL DYNAMIC SPEED SCALING

In the prior section, we presented the optimal designs for the cases of static, static-with-sleep, and dynamic speed scaling. In the first two cases, the optimal speeds were presented more or less explicitly; in the third case, however, we presented only a recursive numerical algorithm for determining the optimal dynamic speed scaling. Even though this approach provides an efficient means to calculate $s_n^*$, it gives little insight into system design. In this section, we provide results exhibiting the structure of the optimal dynamic speeds and the performance they achieve.

The main results of this section are summarized in Table I. The bounds on z for arbitrary α are essentially tight (i.e., agree to leading order) in the limits of small or large γ. Due to the complicated form of the general results, we illustrate the bounds for the specific case of α = 2 to provide insight. In particular, it is easy to see the behavior of $s_n$ and z as a function of γ and n in the case of α = 2. This leads to interesting observations. For example, it illustrates a connection between the optimal stochastic policy and policies analyzed in the worst-case model. In particular, Bansal, Pruhs and Stein [20] showed that, when nothing is known about future arrivals, a policy that gives speeds of the form $s_n = (n/(\alpha-1))^{1/\alpha}$ is constant-competitive, i.e., in the worst case the total cost is within a constant factor of optimal. This matches the asymptotic behavior of the bounds for α = 2 for large n. This behavior can also be observed for general α (Lemma 7 and Theorem 4).

A. Bounds on cost

We start the analysis by providing bounds on z in this subsection, and then use the bounds on z to bound $s_n^*$ above and below (Sections IV-B and IV-C). Recall that $z_{ss}$ is the total cost under static-with-sleep.

Theorem 2.

$$ \max\left(\gamma^\alpha,\ \gamma\alpha(\alpha-1)^{(1/\alpha)-1}\right) \le z \le z_{ss} = \frac{\gamma}{G(\gamma;\alpha)-\gamma} + \gamma\,G(\gamma;\alpha)^{\alpha-1}. $$

Proof: The optimal cost z is bounded above by the cost of the static-with-sleep policy, which is simply

$$ z_{ss} = \frac{\gamma}{G(\gamma;\alpha)-\gamma} + \gamma\,G(\gamma;\alpha)^{\alpha-1}. \tag{15} $$

Two lower bounds can be obtained as follows. In order to maintain stability, the time-average speed must satisfy E[s] ≥ ρ. But $z > E[s^\alpha]/\beta \ge (E[s])^\alpha/\beta$ by Jensen's inequality and the convexity of $(\cdot)^\alpha$. Thus

$$ z > \frac{E[s^\alpha]}{\beta} \ge \frac{\rho^\alpha}{\beta} = \gamma^\alpha. \tag{16} $$

For small loads, this bound is quite loose. Another bound comes from considering the minimum cost of processing a single job of size X, with no waiting time or processor sharing. It is optimal to serve the job at a constant rate [14]. Thus

$$ \frac{z}{\lambda} \ge E_X\left[ \min_s \left( \frac{X}{s} + \frac{s^\alpha}{\beta}\,\frac{X}{s} \right) \right]. $$

The right-hand side is minimized for $s = (\beta/(\alpha-1))^{1/\alpha}$, independent of X, giving $z \ge \rho\beta^{-1/\alpha}\alpha(\alpha-1)^{(1/\alpha)-1}$. Thus

$$ z \ge \max\left(\gamma^\alpha,\ \gamma\alpha(\alpha-1)^{(1/\alpha)-1}\right). \tag{17} $$

The form of the bounds on z is complicated, so it is useful to look at the particular case of α = 2.

Corollary 3. For α = 2, static-with-sleep has cost within a factor of 2 of optimal. Specifically,

$$ \max(\gamma^2, 2\gamma) \le z \le z_{ss} = \gamma^2 + 2\gamma. \tag{18} $$

Proof: For α = 2, G(γ; 2) = γ + 1. Hence (15) gives

$$ z_{ss} = \frac{\gamma}{(\gamma+1)-\gamma} + \gamma(\gamma+1) = \gamma^2 + 2\gamma, \tag{19} $$

which establishes the upper bound. The lower bound follows from substituting α = 2 into (17):

$$ z \ge \max(\gamma^2, 2\gamma). \tag{20} $$

The ratio of $z_{ss}$ to the lower bound on z has a maximum value of 2 at γ = 2, and hence static-with-sleep is within a factor of 2 of the true optimal scheme.
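TABLE I
BOUNDS ON TOTAL COSTS AND SPEED AS A FUNCTION OF THE NUMBER n ≥ 1 OF JOBS IN THE SYSTEM.

For any α:

$$ \max\left(\gamma^\alpha,\ \gamma\alpha(\alpha-1)^{(1/\alpha)-1}\right) \le z \le \frac{\gamma}{G(\gamma;\alpha)-\gamma} + \gamma\,G(\gamma;\alpha)^{\alpha-1} \qquad \text{Theorem 2} \tag{11} $$

$$ \sigma_n \le \frac{s_n^*}{\beta^{1/\alpha}} \le \left( \frac{1}{\alpha} \min_{\sigma>\gamma} \left( \frac{n+\sigma^\alpha-\gamma^\alpha}{\sigma-\gamma} + \frac{\gamma}{(\sigma-\gamma)^2} \right) \right)^{1/(\alpha-1)} \qquad \text{Theorems 8 and 4} \tag{12} $$

where $\sigma_n$ satisfies $\sigma_n^{\alpha-1}\left((\alpha-1)\sigma_n - \alpha\gamma\right) \ge n - \left( \frac{\gamma}{G(\gamma;\alpha)-\gamma} + \gamma\,G(\gamma;\alpha)^{\alpha-1} \right)$.

For α = 2:

$$ \max(\gamma^2, 2\gamma) \le z \le \gamma^2 + 2\gamma \qquad \text{Corollary 3} \tag{13} $$

$$ \gamma + \sqrt{n-2\gamma} \le \frac{s_n^*}{\sqrt{\beta}} \le \gamma + \sqrt{n} + \min\left( \frac{\gamma}{2n},\ \frac{3}{2}\left(\frac{\gamma}{4}\right)^{1/3} \right) \qquad \text{Corollaries 9 and 5} \tag{14} $$

For α = 2 and n < 2γ, a lower bound on $s_n$ results from linear interpolation between max(γ/2, 1) at n = 1 and γ at n = 2γ.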
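The α = 2 rows of Table I can be checked roughly against speeds produced by the forward recursion (9a)-(9b) of Section III-C. The Python sketch below is an illustration under stated assumptions, not the implementation of [21]. Its choices are as follows:

- It specializes to α = 2, where φ(u) = u²/(4γ²) and ψ(u) = u/(2γ).
- It brackets z with Corollary 3.
- It treats a trial z as too small when the sequence stops increasing.
- It treats a trial z as large enough when the sequence passes an arbitrary cap (1e12).

Because this forward approach is numerically unstable, as noted in Section III-C, only the first few speeds are returned. The names `forward_u` and `optimal_dynamic` are hypothetical.

```python
import math


def forward_u(z, gamma, n_max=400, cap=1e12):
    """Forward recursion (9a)-(9b) for alpha = 2, where phi(u) = u^2 / (4 gamma^2).

    Returns (increasing, u) with u = [u_1, u_2, ...]. increasing is False as soon as
    the sequence fails to increase (z is below the optimum); otherwise it is True,
    either because the sequence passed the (arbitrary) cap or ran for n_max steps.
    """
    u = [z]  # (9a): u_1 = z
    for n in range(1, n_max):
        nxt = u[-1] ** 2 / (4.0 * gamma ** 2) - n + z  # (9b)
        if nxt <= u[-1]:
            return False, u
        u.append(nxt)
        if nxt > cap:
            break
    return True, u


def optimal_dynamic(gamma, n_show=10, iters=200):
    """Binary search for the smallest z giving an increasing (u_n), starting from
    the Corollary 3 bracket max(gamma^2, 2 gamma) <= z <= gamma^2 + 2 gamma."""
    lo, hi = max(gamma ** 2, 2.0 * gamma), gamma ** 2 + 2.0 * gamma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if forward_u(mid, gamma)[0]:
            hi = mid
        else:
            lo = mid
    _, u = forward_u(hi, gamma)
    # (10) with alpha = 2: s_n / sqrt(beta) = psi(u_n) = u_n / (2 gamma).
    # Only the first few terms are reliable, since the recursion amplifies roundoff.
    return hi, [un / (2.0 * gamma) for un in u[:n_show]]


z, speeds = optimal_dynamic(1.0)
print(round(z, 4), [round(s, 3) for s in speeds])
```

For γ = 1, two trial values bracket the answer:

- z = 2.85 already fails to increase by $u_6$.
- z = 2.9 diverges upward.

The search therefore settles between these two values, inside the Corollary 3 bracket [2, 3]. The returned speeds can then be compared with (14) and with (30).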
It is perhaps surprising that such an idealized version of dynamic speed scaling provides such a small improvement over a simplistic policy such as static-with-sleep. In fact, the bound of 2 is very loose when γ is large or small. Further, empirically, the maximum ratios for typical α are below 1.1 (see Figure 3). Thus there is little to be gained by dynamic scaling in terms of mean cost. However, Section VI shows that dynamic scaling dramatically improves robustness.

A second interesting observation about Corollary 3 is that the expected response time under these power-aware schemes remains bounded as the arrival rate λ grows. Specifically, by (16) and (18),

$$ E[T] = \frac{z}{\lambda} - \frac{E[s^2]}{\lambda\beta} \le \frac{2\gamma}{\lambda} = \frac{2}{\mu\sqrt{\beta}}. $$

This is in marked contrast to the standard M/GI/1 queue.

B. Upper bounds on the optimal dynamic speeds

We now move to providing upper bounds on the optimal dynamic speed scaling scheme.

Theorem 4. For all n and α,

$$ u_n \le \gamma\,\frac{n+\sigma^\alpha-\gamma^\alpha}{\sigma-\gamma} + \frac{\gamma^2}{(\sigma-\gamma)^2} \tag{21} $$

for all σ > γ, whence

$$ \frac{s_n^*}{\beta^{1/\alpha}} \le \left( \frac{1}{\alpha} \min_{\sigma>\gamma} \left( \frac{n+\sigma^\alpha-\gamma^\alpha}{\sigma-\gamma} + \frac{\gamma}{(\sigma-\gamma)^2} \right) \right)^{1/(\alpha-1)}. \tag{22} $$

In particular, for $\sigma = \gamma + n^{1/\alpha}$,

$$ u_n \le n^{(\alpha-1)/\alpha}\,\gamma\left(1 + (1+\gamma)^\alpha\right) + \gamma^2, \tag{23} $$

which is concave in n.

Proof: As explained in [27], (6) can be rewritten as

$$ u_n = \rho \min_{s_n} \left[ \frac{s_n^\alpha/\beta + n + u_{n+1} - z}{s_n} \right]. \tag{24} $$

Unrolling the dynamic program (24) gives a joint minimization over all $s_n$:

$$ \begin{aligned} u_n &= \rho \min_{s_n} \frac{1}{s_n}\left[ \frac{s_n^\alpha}{\beta} + n - z + \rho \min_{s_{n+1}} \frac{1}{s_{n+1}}\left[ \frac{s_{n+1}^\alpha}{\beta} + (n+1) - z + u_{n+2} \right] \right] \\ &= \min_{s_i,\ i\ge n}\ \sum_{i=n}^{\infty} \left( \prod_{j=n}^{i} \frac{\rho}{s_j} \right) \left( \frac{s_i^\alpha}{\beta} + i - z \right). \end{aligned} \tag{25} $$

An upper bound can be found by taking any (possibly suboptimal) choice of $s_{n+i}$ for i ≥ 0 and bounding the optimal z. Taking $s_i = \sigma\beta^{1/\alpha}$ with σ > γ for all i ≥ n gives

$$ \begin{aligned} u_n &\le \min_{\sigma>\gamma} \frac{\gamma}{\sigma} \sum_{j=0}^{\infty} \left(\frac{\gamma}{\sigma}\right)^j \left( \sigma^\alpha + (n+j) - z \right) \\ &= \gamma \min_{\sigma>\gamma} \left[ \frac{n+\sigma^\alpha-z}{\sigma-\gamma} + \frac{\gamma}{(\sigma-\gamma)^2} \right]. \end{aligned} $$

Since z ≥ γ^α from (17), equation (21) follows. With (10), this establishes (22).

For n = 0, (23) holds since $u_0 = 0$. Otherwise, it follows from the inequality $\sigma^\alpha = n(1+\gamma n^{-1/\alpha})^\alpha \le n(1+\gamma)^\alpha$ and the fact that $n^{-2/\alpha} \le 1$.

By specializing to the case α = 2, we can provide some intuition for the upper bound on the speeds.

Corollary 5. For α = 2,

$$ \frac{s_n^*}{\beta^{1/\alpha}} \le \sqrt{n} + \gamma + \min\left( \frac{\gamma}{2n},\ \frac{3}{2}\left(\frac{\gamma}{4}\right)^{1/3} \right). \tag{26} $$

Proof: Factoring the difference of squares in the first term of (21) and canceling with the denominator yields

$$ u_n \le \frac{\gamma n}{\sigma-\gamma} + \left[ 2\gamma^2 + \gamma(\sigma-\gamma) \right] + \frac{\gamma^2}{(\sigma-\gamma)^2}. \tag{27} $$

One term of (27) is increasing in σ, and two are decreasing. Minimizing pairs of these terms gives upper bounds on $u_n$. A first bound can be obtained by setting $\sigma - \gamma = \sqrt{n}$, which minimizes the sum of the first two terms, and gives

$$ u_n \le 2\gamma\sqrt{n} + 2\gamma^2 + \frac{\gamma^2}{n}. $$
By (10), this gives a bound on the optimal speeds of

$$ \frac{s_n^*}{\sqrt{\beta}} \le \sqrt{n} + \gamma + \frac{\gamma}{2n}. \tag{28} $$

A second bound comes from minimizing the sum of the second and third terms, which occurs when $\sigma - \gamma = (2\gamma)^{1/3}$. This gives

$$ u_n \le \frac{\gamma n}{(2\gamma)^{1/3}} + 2\gamma^2 + \gamma(2\gamma)^{1/3} + \frac{\gamma^2}{(2\gamma)^{2/3}}, $$

which, upon division by 2γ, gives

$$ \frac{s_n^*}{\sqrt{\beta}} \le \frac{n}{2}\left(\frac{1}{2\gamma}\right)^{1/3} + \gamma + \frac{3}{2}\left(\frac{\gamma}{4}\right)^{1/3}. \tag{29} $$

The minimum of the right-hand sides of (28) and (29) is a bound on $s_n$. The result then follows from the fact that

$$ \frac{3}{2}\left(\frac{\gamma}{4}\right)^{1/3} \le \frac{\gamma}{2n} \;\Rightarrow\; \frac{n}{2}\left(\frac{1}{2\gamma}\right)^{1/3} \le \sqrt{n}, $$

which follows from taking the square root of the first inequality and rearranging factors.

C. Lower bounds on the optimal dynamic speeds

Finally, we prove lower bounds on the dynamic speed scaling scheme. We begin by bounding the speed used when there is one job in the system. The following result is an immediate consequence of Corollary 3 and (9a).

Corollary 6. For α = 2,

$$ \max\left(\frac{\gamma}{2}, 1\right) \le \frac{s_1^*}{\sqrt{\beta}} \le \frac{\gamma}{2} + 1. \tag{30} $$

Observe that the bounds in (30), like those in Corollary 3, are essentially tight for both large and small γ, but loose for γ near 1, especially the lower bound.

Next, we prove a bound on $s_n^*$ for large n.

Lemma 7. For sufficiently large n,

$$ \frac{s_n^*}{\beta^{1/\alpha}} > \left(\frac{n}{\alpha-1}\right)^{1/\alpha}. \tag{31} $$

Proof: Rearrange (9b) as

$$ \frac{u_n}{\alpha\gamma} = \left(\frac{n - z + u_{n+1}}{\alpha-1}\right)^{(\alpha-1)/\alpha} \ge \left(\frac{n}{\alpha-1}\right)^{(\alpha-1)/\alpha}, $$

where the inequality uses the fact that $u_n$ is non-decreasing [21] and unbounded, whence $u_{n+1} - z > 0$ for large n. Applying $s_n^* = \beta^{1/\alpha}(u_n/(\alpha\gamma))^{1/(\alpha-1)}$ gives (31).

This result highlights the connection, mentioned at the beginning of this section, between the optimal stochastic policy and prior policies analyzed in the worst-case model. Specifically, combining (31) with (23) and (10) shows that speeds chosen to perform well in the worst case are asymptotically optimal (for large n) in the stochastic model. However, note that the probability of n being large is small.

Next, we derive a tighter, albeit implicit, bound on the optimal speeds.

Theorem 8. The scaled speed $\sigma_n = s_n^*/\beta^{1/\alpha}$ satisfies

$$ \sigma_n^{\alpha-1}\left((\alpha-1)\sigma_n - \alpha\gamma\right) \ge n - \frac{\gamma}{G(\gamma;\alpha)-\gamma} - \gamma\,G(\gamma;\alpha)^{\alpha-1}. $$

Proof: Note that $u_n \le u_{n+1}$ [21]. Thus, by (9b),

$$ u_n \le \frac{\alpha-1}{(\alpha\gamma)^{\alpha/(\alpha-1)}}\,u_n^{\alpha/(\alpha-1)} - n + z. \tag{32} $$

By (10), this can be expressed in terms of $s_n^*$ as

$$ \alpha\gamma\left(\frac{s_n^*}{\beta^{1/\alpha}}\right)^{\alpha-1} \le (\alpha-1)\frac{(s_n^*)^\alpha}{\beta} - n + z, $$

whence

$$ \left(\frac{s_n^*}{\beta^{1/\alpha}}\right)^{\alpha-1}\left((\alpha-1)\frac{s_n^*}{\beta^{1/\alpha}} - \alpha\gamma\right) \ge n - z, $$

and the result follows from (15), since $z \le z_{ss}$.

For α = 2, the above theorem can be expressed more explicitly as follows.

Corollary 9. For α = 2 and any n ≥ 2γ,

$$ \frac{s_n^*}{\beta^{1/\alpha}} \ge \gamma + \sqrt{n - 2\gamma}. \tag{33} $$

Proof: For α = 2, (32) can be solved explicitly, giving

$$ u_n \ge 2\gamma^2 + \sqrt{4\gamma^4 + 4\gamma^2(n - z)}. $$

By (10),

$$ \frac{s_n^*}{\beta^{1/\alpha}} \ge \gamma + \sqrt{(n - z) + \gamma^2}, \tag{34} $$

and substituting $z \le 2\gamma + \gamma^2$ from (18) gives the result.

There are two important observations about the above corollary. First, note that the corollary only applies when $s^* \ge \rho$, and hence after the mode of the distribution. However, it also proves that the mode occurs at n ≤ 2γ. Second, note that the corollary only applies when n ≥ 2γ. In this case, we can simplify the upper bound on $s_n$ in (28) and combine it with (33) to obtain

$$ \sqrt{n - 2\gamma} + \gamma \le \frac{s_n^*}{\sqrt{\beta}} \le \sqrt{n} + \gamma + \frac{1}{4}. \tag{35} $$

This form clearly highlights the tightness of the bounds for large n and/or large γ.

Finally, note that in the case n < 2γ, the only bounds we have on the optimal speeds are $s_n^* \ge s_1^* \ge \sqrt{\beta}\max(\gamma/2, 1)$, which follow from Corollary 6 and the fact that $s_n^*$ is increasing in n [21]. The following lemma proves that an improved lower bound can be attained by interpolating linearly between max(γ/2, 1) and γ.

Lemma 10. The sequence $u_n$ is strictly concave increasing.

Proof: Let P(n) be the proposition

$$ u_{n+1} - u_n \ge u_n - u_{n-1}. \tag{36} $$

Strict concavity of $(u_n)$ is equivalent to there being no n for which P(n) holds. Since $(u_n)$ is non-decreasing [21] and the upper bound (23) on $(u_n)$ has gradient tending to 0, it is sufficient to show that P(n) implies P(n+1). If so,
[Figure 2: normalized rate (rate/sqrt(β)) versus occupancy n for the optimal, static-sleep, and static schemes, with bounds; panels (a) γ = 1 and (b) γ = 10.]
Fig. 2. Rate vs n, for α = 2 and different energy-aware-load, γ.

[Figure 3: (a) absolute costs, α = 2: cost per job z/λ versus γ for the optimal, static-sleep, and static schemes, with bounds; (b) ratio of cost for static-with-sleep to optimal, z_ss/z, versus γ for α = 1.6, 2, and 3.]
Fig. 3. Cost z vs energy-aware-load γ.
then any local non-concavity would imply convexity from that point onwards, in which case its long-term gradient would be positive and bounded away from zero, and hence would violate the upper bound.

By (9b), $u_{n+1} - u_n = \varphi(u_n) - \varphi(u_{n-1}) - 1$. With this identity, P(n) is equivalent to

$$ \varphi(u_n) - \varphi(u_{n-1}) - (u_n - u_{n-1}) \ge 1. $$

This implies $u_{n-1} \ne u_n$ and

$$ \left( \frac{\varphi(u_n) - \varphi(u_{n-1})}{u_n - u_{n-1}} - 1 \right)(u_n - u_{n-1}) \ge 1. \tag{37} $$

Note that the first factor is positive, since the second factor is positive. Since φ is convex, there is a subgradient g defined at each point. This gives

$$ \frac{\varphi(u_n) - \varphi(u_{n-1})}{u_n - u_{n-1}} \le g(u_n) \le \frac{\varphi(u_{n+1}) - \varphi(u_n)}{u_{n+1} - u_n}. $$

This and (36) imply that both factors of (37) increase when going from P(n) to P(n+1), establishing P(n+1) and hence the strict concavity of $(u_n)$. Since the sequence is also non-decreasing [21], the result follows.

V. COMPARING STATIC AND DYNAMIC SCHEMES

To this point, we have provided only analytic results. We now use numerical experiments to contrast static and dynamic schemes. In addition, these experiments illustrate the tightness of the bounds proven in Section IV on the optimal dynamic speed scaling scheme.

We start by contrasting the optimal speeds under each of the schemes. Figure 2 compares the optimal dynamic speeds with the optimal static speeds. Note that the bounds on the dynamic speeds are quite tight, especially when the number of jobs in the system, n, is large. For reference, the modes of the occupancy distributions are about 1 and 5, close to the points at which the optimal speed matches the static speeds. Note also that the optimal rate grows only slowly for n much larger than the typical occupancy. This is important since the range over which DVS is possible is limited [4].

Although the speed of the optimal scheme differs significantly from that of static-with-sleep, the actual costs are very similar, as predicted by the remark after Corollary 3. This is shown in Figure 3. The bounds on the optimal cost are also very tight, both for large and small γ. Part (a) shows that the lower bound is loosest for intermediate γ, where the weights given to power and response time are comparable. Part (b) shows that static-with-sleep (i.e., the upper bound) has very nearly the optimal cost.

In addition to comparing the total cost of the schemes, it is important to contrast the mean response time and mean energy usage. Figure 4 shows the breakdown. A reference load of ρ = 3 with delay-aversion β = 1 and power scaling α = 2 was compared against changing ρ for fixed γ, changing β for fixed ρ, and changing α. Note that γ = 3 was chosen to maximize the ratio $z_{ss}/z$. The second scenario shows that when γ is held fixed, but the load ρ is reduced and the delay-aversion is reduced commensurately, the energy consumption becomes negligible.

VI. ROBUST POWER-AWARE DESIGN

We have seen both analytically and numerically that (idealized) dynamic speed scaling only marginally reduces the cost compared to the simple static-with-sleep scheme. This raises the question of why dynamic scaling is worth the complexity.

This section illustrates one reason: robustness. Specifically, dynamic schemes provide significantly better performance in the face of bursty traffic and mis-estimation of workload.

We focus on robustness with respect to the load, ρ. The optimal speeds are sensitive to ρ, but in reality this parameter must be estimated, and will be time-varying.

It is easy to see the problems that mis-estimation of ρ causes for static speed designs. If the load is not known, then the selected speed must be satisfactory for all possible anticipated loads. Consider the case that it is only known that $\rho \in [\underline{\rho}, \bar{\rho}]$. Let $z(\rho_1|\rho_2)$ denote the expected cost per unit time if the load is $\rho_1$ but the speed was optimized for $\rho_2$. Then the robust design problem is to select the design load ρ′ such that

$$ \min_{\rho'}\ \max_{\rho\in[\underline{\rho},\bar{\rho}]} z(\rho\,|\,\rho'). $$

The optimal design is to provision for the highest foreseen load, i.e., $\max_{\rho\in[\underline{\rho},\bar{\rho}]} z(\rho|\rho') = z(\bar{\rho}|\rho')$. However, this is wasteful in the typical case that the load is less than $\bar{\rho}$. The fragility of static speed designs is illustrated in Figure 5, which shows that when speed is underprovisioned the server is unstable, and when it is overprovisioned the design is wasteful.

Optimal dynamic scaling is not immune to mis-estimation of ρ, since $s_n^*$ is highly dependent on ρ. However, because the speed adapts to the queue length, dynamic scaling is more robust. Figure 5 shows this improvement.

However, though the optimal dynamic scheme is more robust than a static scheme, robustness can be improved further.
8
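As a rough illustration of the robust design problem above, the Python sketch below evaluates $z(\rho \mid \rho_0)$ for static-with-sleep when α = 2, using the closed forms that appear in the proof of Theorem 12: busy speed $\rho_0 + \sqrt{\beta}$, $E[N] = \rho/(\sqrt{\beta} + \rho_0 - \rho)$ and $E[s^2] = \rho(\rho_0 + \sqrt{\beta})$. The function names, the brute-force grid search, and the restriction of candidate design loads to the same grid as the actual loads are assumptions of this sketch, not part of the paper's method.

```python
import math

def z_static_sleep(rho, rho0, beta):
    """Cost per unit time E[N] + E[s^2]/beta (alpha = 2) of static-with-sleep whose
    busy speed s = rho0 + sqrt(beta) was chosen for design load rho0, run at load rho."""
    s = rho0 + math.sqrt(beta)
    if rho >= s:
        return math.inf                  # underprovisioned: the queue is unstable
    mean_jobs = rho / (s - rho)          # E[N] = rho / (sqrt(beta) + rho0 - rho)
    mean_sq_speed = rho * s              # E[s^2] = P(busy) * s^2 = (rho / s) * s^2
    return mean_jobs + mean_sq_speed / beta

def robust_design(rho_lo, rho_hi, beta, grid=200):
    """Approximate min over rho0 of max over rho in [rho_lo, rho_hi] of z(rho | rho0)
    by brute force; both rho and the candidate rho0 lie on one uniform grid."""
    loads = [rho_lo + (rho_hi - rho_lo) * i / grid for i in range(grid + 1)]
    best_rho0, best_worst = None, math.inf
    for rho0 in loads:
        worst = max(z_static_sleep(rho, rho0, beta) for rho in loads)
        if worst < best_worst:           # keep the design with the smallest worst case
            best_rho0, best_worst = rho0, worst
    return best_rho0, best_worst
```

With ρ ∈ [6, 14] and β = 1, `robust_design(6.0, 14.0, 1.0)` returns a design load of 14 with worst-case cost 224 (that is, γ² + 2γ at γ = 14), so it provisions for the highest foreseen load. At a typical load of 6, that design costs about 90.7 per unit time, versus 48 for a design tuned to load 6, which is the waste described above.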
[Figure 4: delay or energy (normalized units), split into response time and energy, for α = 2 and (γ, ρ, β) = (30, 3, 0.01), (3, 3, 1), and (3, 0.3, 0.01).]

Fig. 4. Breakdown of $E[T]$ and $E[s^\alpha]$, for several scenarios.

[Figure 5: cost, z, versus design ρ (0 to 40) for the optimal, static-with-sleep, and linear speed scalings.]

Fig. 5. Cost at load ρ = 10, when speeds are designed for “design ρ”, using β = 1, α = 2.

Specifically, consider the following speed scaling scheme, which we term “linear”. It scales the server speed in proportion to the queue length, i.e., $s_n/\beta^{1/\alpha} = n$. Note that under this scaling the server is equivalent to an M/GI/∞ queue with homogeneous servers. Figure 5 shows that the linear scaling provides significantly improved robustness when compared with the optimal dynamic scheme; indeed, the “optimal” scheme has lower cost than the linear scheme only for design loads ρ ∈ [7, 14]. Further, when the design ρ is in this region, the linear scaling has only slightly higher cost than the optimal scaling. The price that linear scaling pays is that it requires very high processing speed when the occupancy is high, which may not be supported by the hardware.

In addition to the numerical illustrations above, we can compare the robustness analytically in the case α = 2. First, we show that if ρ is known, the cost of the linear scheme is exactly the same as the cost of the static-with-sleep scheme; thus, the cost of the linear scheme is within a factor of 2 of optimal (Theorem 11). Then, we show that when the target load differs from the actual load, the linear scheme significantly reduces the cost (Theorem 12). Interestingly, Theorem 12 shows that the cost of the linear scaling scheme is independent of the difference between the design and actual ρ. In contrast, the cost of static-with-sleep grows (asymptotically) linearly in this difference. This is also illustrated by Figure 5.

Theorem 11. When α = 2, $z_{ss} = z_{lin}$. Thus, $z_{lin} \le 2z$.

Proof: If the speed in state n is kn, then the system is an M/GI/∞ queue whose occupancy is Poisson with mean ρ/k, so
$$E[N] = \frac{\rho}{k}, \qquad E[s^\alpha] = \sum_{n=0}^{\infty} (kn)^\alpha \frac{(\rho/k)^n}{n!} e^{-\rho/k}.$$
For α = 2, $E[s^2] = \rho k + \rho^2$, and so the total cost $\rho/k + (\rho k + \rho^2)/\beta$ is optimized for $k = \sqrt{\beta}$. In this case, with $\gamma = \rho/\sqrt{\beta}$,
$$z_{lin} = E[N] + \frac{E[s_n^\alpha]}{\beta} = \frac{\rho}{\sqrt{\beta}} + \frac{\rho}{\sqrt{\beta}} + \frac{\rho^2}{\beta} = \gamma^2 + 2\gamma,$$
which is identical to the cost for static-with-sleep. By Corollary 3, this is within a factor of 2 of z.

The next

Theorem 12. Consider a system designed for target load $\rho_0$ that is operating at load ρ, and let $\varepsilon = \rho_0 - \rho$. When α = 2,
$$z_{lin} = \frac{\rho^2}{\beta} + \frac{2\rho}{\sqrt{\beta}} \tag{38}$$
$$z_{ss} = z_{lin} + \frac{\rho}{\beta}\left(\frac{\varepsilon^2}{\sqrt{\beta} + \varepsilon}\right) \tag{39}$$

Proof: The optimal rates for the linear policy are $s_n = n\sqrt{\beta}$, independent of $\rho_0$. Thus its cost is always (38). The optimal speed for static-with-sleep in this case is $s_n = \rho_0 + \sqrt{\beta}$ for $n \neq 0$. When operated at actual load ρ, this gives
$$E[N] = \frac{\rho}{\sqrt{\beta} + \rho_0 - \rho}, \qquad \frac{E[s^2]}{\beta} = \frac{\rho\rho_0}{\beta} + \frac{\rho}{\sqrt{\beta}}$$
and
$$z_{ss} = \frac{E[s^2]}{\beta} + E[N] = \frac{\rho^2 + \varepsilon\rho}{\beta} + \frac{\rho}{\sqrt{\beta}} + \frac{\rho}{\sqrt{\beta} + \varepsilon},$$
where $\varepsilon = \rho_0 - \rho$. We can further relate $z_{ss}$ to $z_{lin}$ by
$$z_{ss} - z_{lin} = \frac{\varepsilon\rho}{\beta} + \frac{\rho}{\sqrt{\beta} + \varepsilon} - \frac{\rho}{\sqrt{\beta}} = \frac{\varepsilon\rho}{\beta} - \frac{\varepsilon\rho}{\sqrt{\beta}(\sqrt{\beta} + \varepsilon)} = \frac{\rho}{\beta} \cdot \frac{\varepsilon^2}{\sqrt{\beta} + \varepsilon},$$
from which (39) follows.

VII. CONCLUDING REMARKS

Speed scaling is an important method for reducing energy consumption in computer communication systems. Intrinsically, it trades off the mean response time and the mean energy consumption, and this paper provides insight into this tradeoff using a stochastic analysis.

Specifically, in the M/GI/1 PS model, both bounds and asymptotics for the optimal speed scaling scheme are provided. These bounds are tight for small and large γ and provide a number of insights, e.g., that the mean response time is bounded as the load grows under the optimal dynamic speed scaling and that the optimal dynamic speeds in the stochastic model match (for large n) dynamic speed scalings that have been shown to have good worst-case performance.

Surprisingly, the bounds also illustrate that a simple scheme that sleeps when the system is idle and uses a static rate while the system is busy provides performance within a factor of 2 of the optimal dynamic speed scaling. However, the value of dynamic speed scaling is also illustrated: dynamic speed scaling schemes provide significantly improved robustness to bursty traffic and mis-estimation of workload parameters. Interestingly, the dynamic scheme that optimizes the mean cost is no longer optimal when robustness is considered. In particular, a scheme that scales speeds linearly with n can provide significantly improved robustness while increasing cost only slightly.

There are a number of related directions in which to extend this work. For example, we have only considered dynamic
power consumption, which can be modeled as a polynomial of the speed. However, the contribution of leakage power is growing, and an important extension is to develop models of total power usage that can be used for analysis. Also, it will be very interesting to extend the analysis to scheduling policies beyond PS. For example, given that the speed can be reduced if there are fewer jobs in the system, it is natural to suggest scheduling according to Shortest Remaining Processing Time first (SRPT), which is known to minimize the number of jobs in the system [28].

VIII. ACKNOWLEDGEMENTS

A subset of this work will be presented at the Allerton 2008 workshop, Sept. 23–26, 2008.

REFERENCES

[1] J. Baliga, R. Ayre, W. Sorin, K. Hinton, and R. Tucker, “Energy consumption in access networks,” in IEEE Conf. Optical Fiber Communication (OFC), Feb. 2008, pp. 1–3.
[2] O. S. Unsal and I. Koren, “System-level power-aware design techniques in real-time systems,” Proc. IEEE, vol. 91, no. 7, pp. 1055–1069, 2003.
[3] S. Irani and K. R. Pruhs, “Algorithmic problems in power management,” SIGACT News, vol. 36, no. 2, pp. 63–76, 2005.
[4] S. Kaxiras and M. Martonosi, Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.
[5] N. Bansal, T. Kimbrel, and K. Pruhs, “Speed scaling to manage energy and temperature,” J. ACM, vol. 54, no. 1, pp. 1–39, Mar. 2007.
[6] Y. Zhu and F. Mueller, “Feedback EDF scheduling of real-time tasks exploiting dynamic voltage scaling,” Real Time Systems, vol. 31, pp. 33–63, Dec. 2005.
[7] L. Yuan and G. Qu, “Analysis of energy reduction on dynamic voltage scaling-enabled systems,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 12, pp. 1827–1837, Dec. 2005.
[8] S. Herbert and D. Marculescu, “Analysis of dynamic voltage/frequency scaling in chip-multiprocessors,” in Proc. ISLPED, 2007, p. 6.
[9] “Intel Xscale.” [Online]. Available: www.intel.com/design/intelxscale
[10] “IBM PowerPC.” [Online]. Available: http://www-03.ibm.com/technology/power/powerpc.html
[11] L. Mastroleon, D. O’Neill, B. Yolken, and N. Bambos, “Power aware management of packet switches,” in Proc. High-Perf. Interconn., 2007.
[12] S. Narendra et al., “Ultra-low voltage circuits and processor in 180 nm to 90 nm technologies with a swapped-body biasing technique,” in Proc. IEEE Int. Solid-State Circuits Conf., 2004, p. 8.4.
[13] R. Chandra, R. Mahajan, T. Moscibroda, R. Raghavendra, and P. Bahl, “A case for adapting channel width in wireless networks,” in Proc. ACM SIGCOMM, Seattle, WA, Aug. 2008.
[14] F. Yao, A. Demers, and S. Shenker, “A scheduling model for reduced CPU energy,” in Proc. IEEE Symp. Foundations of Computer Science (FOCS), 1995, pp. 374–382.
[15] K. Pruhs, P. Uthaisombut, and G. Woeginger, “Getting the best response for your erg,” in Scandinavian Worksh. Alg. Theory, 2004.
[16] S. Albers and H. Fujiwara, “Energy-efficient algorithms for flow time minimization,” in Lecture Notes in Computer Science (STACS), vol. 3884, 2006, pp. 621–633.
[17] K. Pruhs, R. van Stee, and P. Uthaisombut, “Speed scaling of tasks with precedence constraints,” in Proc. Workshop on Approximation and Online Algorithms, 2005.
[18] D. P. Bunde, “Power-aware scheduling for makespan and flow,” in Proc. ACM Symp. Parallel Alg. and Arch., 2006.
[19] S. Zhang and K. S. Chatha, “Approximation algorithm for the temperature-aware scheduling problem,” in Proc. IEEE Int. Conf. Comp. Aided Design, Nov. 2007, pp. 281–288.
[20] N. Bansal, K. Pruhs, and C. Stein, “Speed scaling for weighted flow times,” in Proc. ACM-SIAM SODA, 2007, pp. 805–813.
[21] J. M. George and J. M. Harrison, “Dynamic control of a queue with adjustable service rate,” Operations Research, vol. 49, no. 5, pp. 720–731, Sep. 2001.
[22] Intel Corp., “Intel PXA270 processor: Electrical, mechanical, and thermal specification,” 2005.
[23] M. Telgarsky, J. C. Hoe, and J. M. F. Moura, “SPIRAL: Joint runtime and energy optimization of linear transforms,” in Proc. ICASSP, 2006.
[24] F. P. Kelly, Reversibility and Stochastic Networks. Wiley, 1979.
[25] B. Ata and S. Shneorson, “Dynamic control of an M/M/1 service system with adjustable arrival and service rates,” Management Science, vol. 51, no. 11, pp. 1778–1791, Nov. 2006.
[26] D. Low, “Optimal pricing policies for an M/M/s queue,” Operations Research, vol. 22, pp. 545–561, 1974.
[27] J. Wijngaard and S. Stidham, Jr., “Forward recursion of Markov decision processes with skip-free-to-the-right transitions, part I: Theory and algorithm,” Mathematics of Operations Research, vol. 11, no. 2, pp. 295–308, May 1986.
[28] L. E. Schrage, “A proof of the optimality of the shortest remaining processing time discipline,” Oper. Res., vol. 16, pp. 678–690, 1968.

APPENDIX A
BOUNDS ON G(γ; α)

Proof of Lemma 1: Let $k_1$ satisfy
$$\sigma = G(\gamma; \alpha) = (\alpha - 1)^{-1/\alpha} + k_1\gamma. \tag{40}$$
Substituting the identity $(a + b)^\alpha = a^\alpha\left(1 + \frac{b}{(a + b) - b}\right)^\alpha$ and (40) into (4) gives
$$1 = (\alpha - 1)(\alpha - 1)^{-\alpha/\alpha}\left(1 + \frac{k_1\gamma}{\sigma - k_1\gamma}\right)^\alpha\left(1 - \frac{\gamma}{\sigma}\right)^2,$$
which, since $1 + k_1\gamma/(\sigma - k_1\gamma) = (1 - k_1\gamma/\sigma)^{-1}$, is solved by $(1 - k_1\gamma/\sigma)^{\alpha/2} = 1 - \gamma/\sigma$. Thus, by Bernoulli's inequality, for α ≥ 2
$$1 - \frac{\alpha k_1\gamma}{2\sigma} \le 1 - \frac{\gamma}{\sigma},$$
with the inequality reversed for α ≤ 2. For small γ, this inequality tends to equality. Hence $k_1 \ge 2/\alpha$ for α ≥ 2, and $k_1 \le 2/\alpha$ for α ≤ 2, and the second inequality in (5) is accurate to leading order in γ.

Similarly, substituting $G(\gamma; \alpha) = \gamma + k_2$ into (4) gives
$$1 = (\alpha - 1)(\gamma + k_2)^\alpha\left(1 - \frac{\gamma}{\gamma + k_2}\right)^2 = (\alpha - 1)(\gamma + k_2)^{\alpha - 2}k_2^2.$$
This is solved by
$$k_2 = \sqrt{\frac{\gamma^{2-\alpha}}{\alpha - 1}} - \varepsilon_2, \qquad \text{where } \varepsilon_2 = \sqrt{\frac{\gamma^{2-\alpha}}{\alpha - 1}} - \sqrt{\frac{(\gamma + k_2)^{2-\alpha}}{\alpha - 1}}.$$
For α ≥ 2, $0 \le \varepsilon_2 \to 0$ as $k_2/\gamma \to 0$, which shows that the first inequality of (5) is an upper bound. For α ≤ 2, $0 \ge \varepsilon_2 \to 0$ as $k_2/\gamma \to 0$, which shows that the first inequality of (5) is a lower bound. The requirement $k_2 \ll \gamma$ is then $\gamma \gg \sqrt{\gamma^{2-\alpha}/(\alpha - 1)}$, or equivalently $\gamma^\alpha \gg 1/(\alpha - 1)$.

APPENDIX B
NUMERICAL CONSIDERATIONS OF OPTIMAL SCALING

Let $\hat{u}_n$ and $\hat{z}$ be numerical estimates of $u_n$ and z, with errors $\Delta_n = \hat{u}_n - u_n$ and $\delta = \hat{z} - z$, and consider how errors propagate under (9b). If z is known exactly, then $\Delta_{n+1} = \phi(\hat{u}_n) - \phi(u_n)$, giving
$$|\Delta_{n+1}| > \phi'(\min(u_n, \hat{u}_n))\,|\Delta_n|$$
since φ is convex. If $\phi'(u) = \alpha\left(u/(\alpha\gamma)\right)^{1/(\alpha - 1)} > 1$, then the error grows exponentially if $\hat{u}_{n+1}$ is calculated from $\hat{u}_n$, but decreases exponentially if the calculation instead starts from a large n and works backwards using (24). Working backwards requires an initial condition to replace (9a). It is sufficient to choose an initial estimate such as (33).
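The two approximations in the proof of Lemma 1 (Appendix A) can be sanity-checked numerically. This Python sketch assumes that (4) has the form implied by the substitutions in that proof, $1 = (\alpha - 1)\sigma^\alpha(1 - \gamma/\sigma)^2$, equivalently $(\alpha - 1)\sigma^{\alpha - 2}(\sigma - \gamma)^2 = 1$ with σ > γ. Equation (4) itself is not reproduced in this section, so treat that form as an assumption. The sketch solves for σ = G(γ; α) by bisection and compares it with $(\alpha - 1)^{-1/\alpha} + 2\gamma/\alpha$ (from (40) with $k_1 = 2/\alpha$) and $\gamma + \sqrt{\gamma^{2-\alpha}/(\alpha - 1)}$ (from $k_2$ with $\varepsilon_2 = 0$). The function names are illustrative only.

```python
import math

def G(gamma, alpha, tol=1e-13):
    """Normalized optimal static speed sigma = G(gamma; alpha), computed as the root
    sigma > gamma of (alpha - 1) * sigma**(alpha - 2) * (sigma - gamma)**2 = 1.
    ASSUMPTION: this is the form of (4) implied by the substitutions in Appendix A."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    f = lambda s: (alpha - 1) * s ** (alpha - 2) * (s - gamma) ** 2 - 1.0
    lo, hi = gamma, gamma + 1.0
    while f(hi) < 0:                     # f increases for sigma > gamma, so widen until f(hi) >= 0
        hi = gamma + 2.0 * (hi - gamma)
    while hi - lo > tol * max(1.0, hi):  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def small_gamma_form(gamma, alpha):
    """(alpha - 1)**(-1/alpha) + (2/alpha) * gamma, i.e. (40) with k1 = 2/alpha."""
    return (alpha - 1) ** (-1.0 / alpha) + 2.0 * gamma / alpha

def large_gamma_form(gamma, alpha):
    """gamma + sqrt(gamma**(2 - alpha) / (alpha - 1)), i.e. gamma + k2 with eps2 = 0 (gamma > 0)."""
    return gamma + math.sqrt(gamma ** (2 - alpha) / (alpha - 1))
```

For α ≥ 2, the computed G should lie between the two forms, approaching the first as γ → 0 and the second as γ grows. For α = 2 both forms reduce to γ + 1, which is the exact root of the assumed equation.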
