
IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2013, p. 239

Power Optimization in Embedded Systems via Feedback Control of Resource Allocation


Martina Maggio, Henry Hoffmann, Marco D. Santambrogio, Anant Agarwal, and Alberto Leva

Abstract—Embedded systems often operate in conditions so variable that design can only be carried out for some worst-case scenario. This leads to over-provisioned resources and undue power consumption. Feedback control is an effective (and not yet fully explored) way to tailor resource usage online, thereby making the system behave and consume as if it were optimized for each specific utilization case. A control-theoretical methodology is here proposed to complement architecture design in a view to said tailoring. Experimental results show that a so addressed architecture meets performance requirements, while consuming less power than any fixed (i.e., uncontrolled) one capable of attaining the same goals. Also, the methodology naturally induces computationally lightweight control laws.

Index Terms—Computing systems, control systems, embedded systems, power consumption, self-adaptive systems.

I. INTRODUCTION

THE ideas of embedded and dedicated device are nowadays less coincident than just a few years ago. It could make sense to design a dedicated chip for an old-fashioned mobile phone, while this is apparently meaningless for a modern smartphone [1]. Modern devices are not designed for just one task. They host different applications, with various requirements, so that resource optimization is a key to meet requirements without sacrificing power. This scenario poses nontrivial design problems, tackled at different levels, from energy-efficient code generation [2], [3] to consumption estimation [4], battery optimization [5], and more. Ideally, a device should activate a different subset of its architectural capabilities for any different load configuration, depending on the executing applications, and also on the particular data processed [6]. This makes it impossible to design the architecture based on some general power consumption/performance trade-off: strictly, online architecture tailoring is needed, not only for any application, but for each run of it. This work proposes to do so by a novel use of feedback control, which makes an already designed architecture capable of adapting its active features continuously, minimizing power consumption while preserving a prescribed performance level irrespective of the operating conditions. Peculiar to the presented research is the deliberate quest for extremely simple control laws, which is particularly relevant when power constraints are strict. Notice that the proposed methodology complements classical design, and can be seamlessly applied to existing devices, e.g., by means of software instrumentation. To demonstrate its potential, the methodology is here applied step by step to a relevant case study, and experimental results are commented.

II. RELATED WORKS AND MOTIVATION

A vast literature exists on the use of control theory in computing systems [7], including embedded ones [8], [9]. Contributions span from reliability [10] to security [11], software testing [12], scheduling [13], thread-pool management [14], and more. The fil rouge of virtually all the documented research is that by closing feedback loops around a computing (embedded) system, desired properties, such as a prescribed throughput, can be enforced online. However, as recently noticed [15], control theory can do much more, because several computing system functionalities are feedback controllers in nature. If one thus accepts not only to close control loops around existing systems, but also to design parts of said systems as controllers, a strong and unitary framework is available to carry out and formally assess the design [16], limiting the role of heuristics in the final product. The importance of power optimization is also testified by numerous research works. For example, [17] and [18] explore how to execute tasks and turn off processors in homogeneous multiprocessor systems. Other relevant works concern multicore chips that manage resource allocation [19], take care of critical sections [20], and optimize for power [6]. This research too shows a fil rouge: an almost ubiquitous focus on offline design for optimization.

Also in this case, a more effective use for feedback control can be envisaged: by controlling resource allocation online, an (already designed) architecture can be tailored not only to a given application, but to each specific run of it. The work described here is therefore in some sense complementary to those quoted. To the best of the authors' knowledge, the literature results nearest to the presented research are the recent papers [21], [22]. The dynamic relocation of tasks proposed in [21] is however based on statically precomputed mappings from processes to computing elements, while here static data are just the starting point for setting up the online resource tailoring. Also, [22] relies on modifications of the Linux kernel, which allows a finer control granularity but limits the applicability to devices with an operating system, while here only user-space resource allocation is used, which can be done on any device, with or without supporting software.

Manuscript received September 14, 2011; revised November 15, 2011; accepted November 18, 2011. Manuscript received in final form November 19, 2011. Date of publication December 21, 2011; date of current version December 14, 2012. Recommended by Associate Editor Q. Wang. M. Maggio, M. D. Santambrogio, and A. Leva are with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan 20133, Italy (e-mail: maggio@elet.polimi.it). H. Hoffmann and A. Agarwal are with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCST.2011.2177499

1063-6536/$26.00 © 2011 IEEE



Fig. 1. Benefits and limits of the proposed methodology on top of existing design choices.

The proposed approach opens more than one interesting perspective also from an industrial standpoint. It allows designing more general-purpose devices, over-dimensioning the hardware and delegating its case-specific exploitation to an intermediate software level, with the certainty that only the necessary resources will be used. Correspondingly, already designed architectures can be extended to new applications without undue power losses.

III. PROPOSED METHODOLOGY

A. Overview

Suppose that a hardware architecture needs designing for a given purpose, and to minimize power consumption. Experiments are carried out to explore the spectrum of operating conditions. Based on their results, no globally optimal design is found; i.e., some optimal designs are found for some sets of operating conditions, but different from one another. Now, order those locally optimal designs by capability. This leads to ordering the locally optimal architectures by some index (e.g., the number of cores) for which a lower and an upper bound exist. Apparently, setting that index to its lower bound is suboptimal in performance, setting it to its upper bound is suboptimal in power consumption, and there is no way to select any intermediate value a priori. As anticipated, the methodology proposed here comes downstream of architecture design. Standard design procedures (see, e.g., [23], [24]) are used up to the determination of the upper bound, the so obtained architecture is realized, and then complemented with the proposed online resource tailoring. Fig. 1 illustrates in abstracto how the proposed methodology impacts the problem. Performance degradation monotonically decreases with hardware potential, and is zero from a certain capability on. Area and power cost, given the used technology, conversely increase with said potential. If a threshold is put on the area cost, a certain level of performance degradation is to be accepted irrespective of any online action. Also a power cost threshold reflects into performance degradation, but this can be acted on online. The proposed methodology thereby breaks, to the maximum possible extent (denoted in Fig. 1), the dependence of the power cost on the architecture potential.

B. Methodology Steps

A procedure to realize the proposed idea can be summarized as follows. It is assumed that the hardware is already designed, and capable of covering the heaviest utilization case.

1) Define measurements to assess desired behavior and resource use quality (e.g., throughput and consumed power), and actions on said measured quantities, like enabling or disabling cores, or scaling frequency; i.e., define sensors and actuators. Denote by p the number of sensors, each one providing a value in a set Si, i = 1, ..., p, and by q the number of actuators, each one applying a value in a set Aj, j = 1, ..., q. Notice that the Si and Aj can be intrinsically heterogeneous.

2a) Perform some conveniently designed (offline) tests to assess how a generic application can exploit the hardware capabilities in the absence of online tailoring. This provides an uncontrolled hardware profile (UHP), defined as follows. Denoting by a the vector of actuator values, by m the vector of the performance metrics, each belonging to a set Mi and being obtained from (but not necessarily coinciding with) the sensor outputs, and by mw(a) the value of m providing the worst performance for a given a, the UHP is a map

UHP : A1 × ... × Aq → M1 × ... × Mr,   a ↦ mw(a)

where a ∈ A1 × ... × Aq. Somehow contrary to intuition, the tests of this step are not strictly tied to any specific application; rather, they are designed on the sole basis of sensors, actuators, and performance indices. Based on experience, just some broad assumptions on the applications are generally enough to produce the UHP, and such assumptions are needed only when some values are not obtainable with objective measurements. The application reported later on, where the UHP is substantiated by (possible) Power-Optimal (PO) states, will clarify this.

2b) In parallel with step 2a), i.e., here too offline, devise a control structure (CS) capable of regulating the hardware so as to attain the desired level of performance. If the previous steps were carried out thoroughly, the required CS is very often surprisingly simple. This is a merit both in practice, as time and resources are always an issue in embedded systems, and methodologically, as simple structures allow for a systematic domain-specific design, generally safer and more effective than mere heuristics.

3) Based on the UHP, parametrize the CS.

Notice that with the proposed approach, should the hardware change, only step 2a) would need repeating. Also, step 3) naturally leads to producing dynamic models for the (now controlled) hardware, the controller, and the closed loop. This allows analyzing and possibly simulating the overall system, assessing its behavior in a general manner with respect to any subsequent use. In the next section, the proposed methodology is demonstrated by going through a significant case study.
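As a software-level illustration, the UHP can be stored as a lookup table from actuator vectors to worst-case performance values. The sketch below is ours, not an artifact of the original methodology, and all numbers are hypothetical:

```python
def build_uhp(measurements):
    """Build an Uncontrolled Hardware Profile: for each actuator
    vector (here a (cores, freq_mhz) tuple), keep the worst (lowest)
    performance value observed over all profiling runs."""
    uhp = {}
    for actuators, perf in measurements:
        if actuators not in uhp or perf < uhp[actuators]:
            uhp[actuators] = perf
    return uhp

# Hypothetical profiling data: repeated runs per configuration.
runs = [((4, 1596), 27.1), ((4, 1596), 25.4),
        ((8, 2394), 41.0), ((8, 2394), 43.2)]
uhp = build_uhp(runs)
```

The controller then queries this map at run time, so only step 2a) needs repeating when the hardware changes.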


IV. CASE STUDY

A video encoder processes frames from a raw video stream for efficient transmission. Each frame can be encoded producing information for all pixels (I frames) or by difference, either with respect to the previous one (P frames) or with both the previous and the subsequent one (B frames). Encoders thus encounter both strict power requirements (e.g., on handheld devices) and high workload variability (e.g., from quasi-static cases like a lecture to high-variation ones like sport events). In all possible cases, to limit the need for buffering, a constant encoding rate is desired. For example, the National Television System Committee (NTSC) recommends a standard rate of 30 frames per second. The chosen domain is therefore hard, as testified by previous works such as [25], whence the test relevance. As a side remark, if multiple applications are to be controlled, the methodology could be extended as in [26].

A. Step 1: Sensors and Actuators

A free software library (x264) for encoding video streams into the H.264/MPEG-4 AVC format is used here, released under the GNU GPL license and widely used. By analyzing the hardware, an upper bound to the amount of resources needed to meet the requirement for every kind of video was found. This upper bound architecture, used in the following tests, has eight cores with a clock speed of 2394 MHz. Intuitively, since the design was carried out to cover the worst case, the amount of available resources is disproportionate for a variety of real cases. To introduce some baseline for the following comparisons, a lower bound was also determined, i.e., an architecture where at least one of the videos was encoded meeting the performance requirements; this has three cores and the same maximum clock speed. To assess the system performance, three sensors are required. The first one measures the number of encoded frames per second. To obtain such data, the video encoder application was instrumented by means of the Application Heartbeat framework [27]. The encoder was modified to signal the termination of each frame encoding via a heartbeat, so that an external entity (in this case the controller) can retrieve that information and act accordingly. Also, the encoder was turned into a soft real-time application, by allowing it to drop frames if the measured frame rate is too low with respect to the target (in the following tests, below 25 frames per second). The second sensor provides the number of dropped frames, as output by the encoder. The third sensor measures the power consumed in the encoding, with a WattsUp device [28]. To intervene on the system, two actuators are used, namely the number of cores and their clock speed. Denote by C the set of possible numbers of cores currently allotted to the encoding process and by F the set of admissible frequencies of all the cores. With the hardware architecture used for the subsequent tests, C is the set of integer values from 1 to 8. This actuator is implemented through the taskset Unix command. Similarly, F is a set of seven available frequencies, up to the maximum of 2394 MHz. This actuator is implemented through the cpufrequtils Linux package.
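For concreteness, both actuators can be driven from user space with the taskset and cpufreq-set commands. The helpers below only compose the command lines; the calling pattern is our sketch (a real deployment would execute them, e.g., with subprocess.call):

```python
def core_affinity_cmd(pid, n_cores):
    """Command pinning process `pid` to the first n_cores CPUs:
    taskset -pc <cpu-list> <pid>."""
    cpu_list = "0-%d" % (n_cores - 1) if n_cores > 1 else "0"
    return ["taskset", "-pc", cpu_list, str(pid)]

def frequency_cmds(freq_khz, n_cpus):
    """Commands fixing the clock of every CPU to freq_khz:
    cpufreq-set -c <cpu> -f <freq>, from the cpufrequtils package."""
    return [["cpufreq-set", "-c", str(c), "-f", str(freq_khz)]
            for c in range(n_cpus)]
```

On the architecture of the tests, n_cpus would be 8 and freq_khz one of the seven admissible values.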

TABLE I SUMMARY OF POWER-OPTIMAL (X) AND NON-POWER-OPTIMAL (-) STATES

B. Step 2a: Data Collection

The architecture is stressed with the PARSEC benchmark suite [29]. Consumption is measured for all the 56 power states produced by the possible combinations of number of cores and frequency, and only some of the possible control actions are found to result in a PO state. In detail, to find the PO states, the following procedure is followed. A set of software applications presenting a wide variety of workloads is chosen, and instrumented within the Application Heartbeat framework. Recall that this is an offline phase, thus employing a very large set is not detrimental, and one could really think of trying any possible application that the device is reasonably expected to host. The selected applications are then executed, with different data, in each of the possible configurations of (fixed) number of cores and frequency, collecting power and performance measurements. Configurations are finally sorted based on their power consumption, and each of them is included in the PO set if, for at least one of the runs, any other configuration that consumes more power also produces a higher performance. In other words, PO states are those control input combinations for which, in all the runs of all the applications, allotting less resources diminishes performance. With the given hardware, only 25 of the 56 possible states are power-optimal. Surprisingly, none of the combinations of seven cores and any possible frequency is PO, while for example assigning eight cores is PO at any given frequency. Table I reports the power-optimal states found with the analysis. Anticipating what is explained in general later on, the control rationale of PO states is the following. Each PO state is provided with an estimated speedup value. The controller produces a speedup request, the nearest values to that request (one lower and one higher) are found in the UHP map, and a couple of PO states is thus selected.
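Read as a dominance test, the PO selection can be sketched as follows; the per-configuration (power, performance) pairs are hypothetical averages for illustration, not the measured data behind Table I:

```python
def power_optimal(configs):
    """configs: dict mapping a configuration, e.g. (cores, freq_mhz),
    to an averaged (power, performance) pair. A configuration is kept
    as power-optimal if no other configuration consumes less power
    while delivering at least the same performance."""
    po = set()
    for s, (p_s, f_s) in configs.items():
        dominated = any(p_o < p_s and f_o >= f_s
                        for o, (p_o, f_o) in configs.items() if o != s)
        if not dominated:
            po.add(s)
    return po

# Hypothetical averages; they mirror the observation that a 7-core
# state can be dominated by the 8-core one.
configs = {(3, 1596): (60.0, 24.0), (4, 1596): (70.0, 31.0),
           (7, 2394): (120.0, 38.0), (8, 2394): (115.0, 41.0)}
```

With these numbers, (7, 2394) is dominated by (8, 2394), which consumes less power and performs better, and is therefore excluded from the PO set.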
Those two states are applied, the first for a fraction of the subsequent sampling period and the second for the rest of the period, so that the time-weighted average of their two speedups coincides with the request. Thus, the feedback controller decides the amount of resources to allot, and the UHP map is used to do so by setting the system always to a PO state. This in some sense decouples power from performance, although relying on offline estimates: the controller operates as if it just had to allot a comprehensive resource affecting performance, and the offline profiling is used to select the most power-efficient input combination that has the desired effect. To estimate the speedup of a PO state, assumptions are of course needed. Here, applications are supposed to scale linearly as the frequency increases, and to obtain a speedup proportional to the number of cores devoted to executing them, up to the total number of available cores. This is an experimentally determined relationship, suitable for the presented case, but in general one can conduct tests on the specific software to be executed to provide application-dependent speedup values.

C. Step 2b: Control Design

The desired level of performance is measured by the frame encoding rate output, while the actuation mechanism is designed to select only control actions that result in PO states for any admissible speedup value. Control structuring thus simply means adding a feedback block acting on the speedup and having the frame rate as controlled variable, the set point being fixed at 30 frames per second. To complete step 3) it is necessary to choose a form for the control law, which can here be done by applying some very simple results of discrete-time control theory for linear systems. The controller will periodically calculate the speedup s to be applied (with core and frequency actuation). Denoting by k the time index for said periodic calculation, and assuming that the time span between two subsequent calculations is long enough for the applied speedup to exert all of its action on the plant, the relationship between the speedup and the frame encoding rate (fer) is expressed in the discrete time domain by

fer(k) = s(k-1) / w(k) + d(k)    (1)

where w(k) is the possibly time-varying application workload, i.e., the (nominal) amount of time consumed between two subsequent frame encodings, and d(k) an exogenous disturbance accounting for any non-nominal time behavior. Of course, w(k) is in general unknown, but the profiling phase can easily provide a reliable average and bounds for it. The decision was here taken to treat the workload not as a time-varying quantity but as an unknown parameter, denoted in the following by w. From (1) the transfer function from the speedup to fer is obviously

P(z) = FER(z) / S(z) = 1 / (w z)    (2)

S(z) being the transform of the speedup signal and FER(z) that of the frame encoding rate. To testify how simple control design can be with the proposed approach, which we recall to be a specific aim of this research, a simple choice is made, namely the deadbeat controller C(z) obtained by solving

C(z) P(z) / (1 + C(z) P(z)) = z^(-1) = FER(z) / FER°(z)    (3)

with respect to C(z), where FER°(z) is the transform of the desired frame rate. This yields

C(z) = w z / (z - 1)    (4)

In the time domain this corresponds to the control law

s(k) = s(k-1) + w (fer°(k) - fer(k))    (5)
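As a sketch (ours, with hypothetical parameter values), the deadbeat law (5) with saturation of the applicable speedup and standard anti-windup reads:

```python
class DeadbeatSpeedupController:
    """s(k) = s(k-1) + w*(set point - fer(k)), where w is the nominal
    workload estimate from the profiling phase. The stored state is
    the saturated speedup, which provides anti-windup for free."""

    def __init__(self, w_nominal, s_min, s_max, fer_setpoint=30.0):
        self.w = w_nominal
        self.s_min, self.s_max = s_min, s_max
        self.sp = fer_setpoint
        self.s_prev = s_min  # start from the least aggressive state

    def step(self, fer_measured):
        s = self.s_prev + self.w * (self.sp - fer_measured)
        s = min(max(s, self.s_min), self.s_max)  # saturate
        self.s_prev = s                          # anti-windup state
        return s
```

At the set point the speedup holds its value; below it, the speedup is raised in proportion to the error through the nominal workload.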

Of course a minimum applicable speedup and a maximum one do exist, and are determined in the profiling phase, but the resulting control saturations can be easily managed by standard anti-windup. Since w is just a nominal value for the workload, at least a minimal robustness analysis is in order. Trivial computations show that if the actual workload is expressed as wa = Δw w, thereby introducing the unknown quantity Δw as a multiplicative error, then the eigenvalue of the closed-loop system is 1 - 1/Δw. Requiring the magnitude of said eigenvalue to be less than unity, one finds that closed-loop stability is preserved for any Δw in the range (1/2, +∞); hence, if the workload is not excessively overestimated, such a simple control law can effectively regulate the system despite acceptable variations. Incidentally, this justifies, from a practical standpoint, the idea of not treating the workload as a time-varying quantity. Strictly speaking, the system is to be considered a linear discrete-time switching one with state-independent switching signal, and a deeper analysis should be done. However, for such systems, if the dynamic matrix eigenvalues lie in the unit circle for any value of the switching signal, there surely exists a finite dwell time ensuring stability [30]. Delving into estimates of that time would stray from the scope of this paper, and is thus omitted.

A final point needs addressing, since the speedup values obtained with the profiling phase are isolated points, while (5) produces an output covering the whole (continuous) range from the minimum to the maximum speedup. To cope with this, denoting by s(k) the speedup computed from (5), observe that it is by construction possible to find two vectors of actuator values ua and ub, with respective speedups sa and sb, such that

sa ≤ s(k) ≤ sb    (6)

Denoting by T the time span between two subsequent controller interventions, a time division output (TDO) actuation mechanism can be applied. This means computing the amounts of time ta and tb in which to apply respectively the control vectors ua and ub as

tb = T (s(k) - sa) / (sb - sa),   ta = T - tb    (7)

The TDO actuation mechanism allows translation of a set of discrete (speedup) control values into a continuous one, therefore obtaining the computed control signal. Notice that TDO is standard practice in many control domains, when quantized actuation comes into play.

D. Step 3: Parametrization

In this particular case the parametrization phase simply consists of plugging the numbers coming from the profiling phase into the control law (5) and into the actuation policy. This task can simply be realized by including a C header file; therefore the controller is modular, and there is no need to change its code if the architecture changes.
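The TDO computation of (7) is a simple interpolation in time; a minimal sketch (function and variable names are ours):

```python
def tdo_split(s_req, s_a, s_b, period):
    """Split the sampling period between two PO states with speedups
    s_a <= s_req <= s_b so that the time-weighted average speedup
    equals the requested one. Returns (t_a, t_b)."""
    if s_b == s_a:           # the request matches an available state
        return period, 0.0
    t_b = period * (s_req - s_a) / (s_b - s_a)
    return period - t_b, t_b
```

By construction, (t_a * s_a + t_b * s_b) / period equals the requested speedup, as (7) prescribes.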


Fig. 2. (left, upper plot) Frame encoding rate during part of a video encoding, with some xed architectures and the controlled one; (left, lower plot) control action in the controlled case.

V. EXPERIMENTAL RESULTS

First of all, the results of a single run are presented. The upper diagram of Fig. 2 shows the time behavior of the frame rate during part of a video encoding, with some fixed architectures and the controlled one. The lower diagram of Fig. 2 shows how the actuators affect the system in the controlled case. The video used for this test is old_town_cross_1080p.yuv¹ and, for a meaningful comparison, the fixed architectures were selected as those with capabilities similar to the most frequently selected values of the vector of actuators in the controlled case. Among the many tests presented later on, this one shows an average control quality. Notice that on average the power consumption of the controlled architecture is lower than that of all the fixed ones except the least capable. Numbers for the least capable architecture (1 core) are reported just for the purpose of normalization, since de facto it would never be able to attain the required goals, and would therefore never be used in practice. Also, the maximum power of the controlled solution is not the minimum one, thereby proving that in some parts of the video encoding more power than in other architectures is needed to attain the performance specifications, and that feedback takes care of the matter correctly. Coming to a more extensive evaluation campaign, the system was tested with 16 different videos. Each video was encoded 25 times, and the so obtained results were averaged for a statistically meaningful analysis. Results are shown in Fig. 3. For each video, the encoding procedure was conducted with one enabled core and the minimum frequency, to obtain the lowest power consumption, and then repeated with different fixed (uncontrolled) architectures, namely with a number of cores from 3 to 8 and the maximum frequency. The test was finally repeated activating the controller to adapt the number of cores online. The top row of Fig. 3 shows the average frame encoding rate (or heart rate, according to the used framework terminology), while the bottom row reports the power consumption, normalized to the minimum value above. Note that the controller is
¹All the videos used for the experiments are available at http://xiph.org.

almost invariably able to maintain the encoding rate, while consuming an amount of power comparable to or lower than that of the uncontrolled architecture best attaining that rate (which, it is worth stressing, is not the same architecture as the video varies). For example, for the video named old_town_cross_1080p.yuv, the encoding rate with three cores is less than 30 frames per second, while the one with four cores is above the set point value. The controlled solution provides a frame rate closer to the desired value, and the power it consumes is slightly less than that of the three-core solution. In Fig. 4, the drop rate is brought in, producing a 3-D chart where the axes report the (average) normalized power, the (average) encoding frame rate, and the (average) drop rate per video. Each architecture produces 16 points in the chart, each one being the average over 25 runs of one of the 16 videos. The black point finally identifies the optimum, corresponding to zero drop rate, 30 frames per second, and the lowest power consumption. As can be seen, a fixed low number of cores is better as for power, but fails at maintaining the frame and drop rates, while a fixed high number of cores drops no frames, but consumes more power and produces an excessive frame rate. On the other hand, the controlled architecture remains in the vicinity of the optimum point, and above all its results are significantly nearer to one another (i.e., more uniform over the various videos) than if a fixed number of cores is used. A further analysis can be conducted on the data to verify the approach validity, computing the distance between the points depicted in Fig. 4 and the optimum point depicted in black, for each of the 16 videos. Table II contains the results, each row referring to an architecture. The first two columns report the minimum and the maximum value of the mentioned distance, i.e., the values for the videos that are encoded on average in the best and the worst way by each architecture. This gives an idea of the performance range produced by each architecture. The third and fourth columns give the average distance value and its variance, to synthetically appreciate its distribution. Notice that the controlled solution produces in each case ranges comparable to those of the best uncontrolled one for that case,


Fig. 3. (top row) Average frame encoding (or heart) rate and (bottom row) average normalized power consumption of the x264 video encoder with 16 different input data over 25 different runs. The power consumption is normalized to its minimum value.

Fig. 4. Summary of the modified x264 video encoder results. Average frame rate, percentage of dropped frames (as a quality measure), and average power consumption for the 16 input data of Fig. 3. The controller is able to meet the application goals (30 frames per second) while minimizing the average power consumption. The black dot represents the optimum point (minimum power consumption, performance guarantees, and no frames dropped).


TABLE II SUMMARY OF DISTANCES BETWEEN THE VIDEO ENCODER POINTS AND THE OPTIMUM ONE AS IDENTIFIED IN FIG. 4

but with a significantly lower variance. To get a visual idea of that, see how the red dots in Fig. 4 are nearer to one another than those in other colors.

VI. CONCLUSION AND FUTURE WORK

A methodology was presented to introduce feedback control in the management of embedded system resources or, more precisely, to structure such problems so that said introduction relegates heuristics to a region as small as possible, and not to the design of the core control law. Devices endowed with the so obtained controllers are made capable of using only the necessary resources in any reasonably expectable environmental condition. Such devices can therefore be designed for the worst case, with the certainty that this will not lead to undue power consumption. Most important, thanks to the use of feedback control in the way just recalled, and somehow contrary to several proposals in the literature, any involved entity can be characterized and quantified, and both the results and their validity limits can be assessed formally. Also, the problem structuring induced by the proposed methodology often allows the core control law to be extremely simple, like the deadbeat one used here. Apparently, this is a merit wherever time and resource constraints are relevant. The potential of the proposed approach was demonstrated by its application to video encoding. The presented study shows how the controlled architecture not only uses just the necessary resources for each application run, but also provides more uniform performance than all the uncontrolled ones. In the opinion of the authors, this work is a step forward in the direction of fulfilling the need for self-adaptation that is emerging in the computing system community. Future research will further investigate the sensing and actuation issues here encountered, in a view to easing the adoption of the proposed methodology in a category of devices and applications as vast as possible. In addition, since even soft failures can be addressed within the proposed framework, techniques to deal with such problems will be developed. Finally, attention will always be paid to keeping the architecture design and the subsequent introduction of control clearly separated.

REFERENCES

REFERENCES

[1] T. Hubbard, R. Lencevicius, E. Metz, and G. Raghavan, "Performance validation on multicore mobile devices," in Verified Software: Theories, Tools, Experiments, B. Meyer and J. Woodcock, Eds. New York: Springer-Verlag, 2008, pp. 413–421.
[2] V. Tiwari, S. Malik, and A. Wolfe, "Power analysis of embedded software: A first step towards software power minimization," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 4, pp. 437–445, Dec. 1994.
[3] A. Muttreja, A. Raghunathan, S. Ravi, and N. K. Jha, "Automated energy/performance macromodeling of embedded software," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 3, pp. 542–552, Mar. 2007.
[4] Y. Fei, S. Ravi, A. Raghunathan, and N. K. Jha, "A hybrid energy-estimation technique for extensible processors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 5, pp. 652–664, May 2004.
[5] K. Lahiri, A. Raghunathan, and S. Dey, "Efficient power profiling for battery-driven embedded system design," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 23, no. 6, pp. 919–932, Jun. 2004.
[6] R. Kumar, K. Farkas, N. Jouppi, P. Ranganathan, and D. Tullsen, "Processor power reduction via single-ISA heterogeneous multi-core architectures," Comput. Arch. Lett., vol. 2, no. 1, pp. 5–8, 2003.
[7] J. Hellerstein, "Achieving service rate objectives with decay usage scheduling," IEEE Trans. Softw. Eng., vol. 19, no. 8, pp. 813–825, Aug. 1993.
[8] J. Hellerstein, Y. Diao, S. Parekh, and D. Tilbury, Feedback Control of Computing Systems. New York: Wiley, 2004.
[9] G. Buttazzo, "Research trends in real-time computing for embedded systems," ACM SIGBED Rev., vol. 3, no. 3, pp. 1–10, 2006.
[10] O. Kreidl and T. Frazier, "Feedback control applied to survivability: A host-based autonomic defense system," IEEE Trans. Reliab., vol. 53, no. 1, pp. 148–166, Mar. 2004.
[11] R. Dantu, J. Cangussu, and S. Patwardhan, "Fast worm containment using feedback control," IEEE Trans. Depend. Secure Comput., vol. 4, no. 2, pp. 119–136, Apr. 2007.
[12] M. Bayan and J. Cangussu, "Automatic feedback, control-based, stress and load testing," in Proc. ACM Symp. Appl. Comput., 2008, pp. 661–666.
[13] T. Cucinotta, F. Checconi, L. Abeni, and L. Palopoli, "Self-tuning schedulers for legacy real-time applications," in Proc. 5th Euro. Conf. Comput. Syst., 2010, pp. 55–68.
[14] J. Hellerstein, V. Morrison, and E. Eilebrecht, "Applying control theory in the real world: Experience with building a controller for the .NET thread pool," SIGMETRICS Perform. Eval. Rev., vol. 37, no. 3, pp. 38–42, 2009.
[15] A. Leva and M. Maggio, "Feedback process scheduling with simple discrete-time control structures," IET Control Theory Appl., vol. 4, no. 11, pp. 2331–2342, 2010.
[16] C. Karamanolis, M. Karlsson, and X. Zhu, "Designing controllable computer systems," in Proc. 10th USENIX Conf. Hot Topics Operat. Syst., 2005, pp. 9–15.
[17] R. Xu, D. Zhu, C. Rusu, R. Melhem, and D. Mossé, "Energy-efficient policies for embedded clusters," in Proc. LCTES, 2005, pp. 1–10.
[18] J. Chen, H. Hsu, and T. Kuo, "Leakage-aware energy-efficient scheduling of real-time tasks in multiprocessor systems," in Proc. RTAS, 2006, pp. 408–417.
[19] R. Bitirgen, E. Ipek, and J. Martinez, "Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach," in Proc. 41st Annu. IEEE/ACM Int. Symp. Microarch. (MICRO 41), 2008, pp. 318–329.
[20] M. Suleman, O. Mutlu, M. Qureshi, and Y. Patt, "Accelerating critical section execution with asymmetric multi-core architectures," in Proc. ASPLOS, 2009, pp. 253–264.
[21] A. Schranzhofer, J.-J. Chen, and L. Thiele, "Dynamic power-aware mapping of applications onto heterogeneous MPSoC platforms," IEEE Trans. Ind. Inform., vol. 6, no. 4, pp. 692–707, Nov. 2010.
[22] E. Bini, G. Buttazzo, J. Eker, S. Schorr, R. Guerra, G. Fohler, K.-E. Årzén, V. R. Segovia, and C. Scordino, "Resource management on multicore systems: The ACTORS approach," IEEE Micro, vol. 31, no. 3, pp. 72–81, May/Jun. 2011.
[23] T. Lv, J. Xu, W. Wolf, I. Ozer, J. Henkel, and S. Chakradhar, "A methodology for architectural design of multimedia multiprocessor SoCs," IEEE Design Test Comput., vol. 22, no. 1, pp. 18–26, Jan. 2005.


[24] H. Hsu, J. Chen, and T. Kuo, "Multiprocessor synthesis for periodic hard real-time tasks under a given energy constraint," in Proc. Conf. Design, Autom., Test Euro. (DATE), 2006, pp. 1061–1066.
[25] M. Shafique, L. Bauer, and J. Henkel, "enBudget: A run-time adaptive predictive energy-budgeting scheme for energy-aware motion estimation in H.264/MPEG-4 AVC video encoder," in Proc. Design, Autom. Test Euro. Conf. Exhib. (DATE), 2010, pp. 1725–1730.
[26] M. Maggio, H. Hoffmann, A. Agarwal, and A. Leva, "Control-theoretical CPU allocation: Design and implementation with feedback control," presented at the 6th Int. Workshop Feedback Control Implementation Design Comput. Syst. Netw., New York, 2011.
[27] H. Hoffmann, J. Eastep, M. Santambrogio, J. Miller, and A. Agarwal, "Application heartbeats: A generic interface for specifying program performance and goals in autonomous computing environments," in Proc. 7th ACM Int. Conf. Autonomic Comput. (ICAC), 2010, pp. 79–88.
[28] Watts up? Meters, Denver, CO, "Watts up? .net meter," 2010. [Online]. Available: http://www.wattsupmeters.com/
[29] C. Bienia, S. Kumar, J. Singh, and K. Li, "The PARSEC benchmark suite: Characterization and architectural implications," in Proc. 17th Int. Conf. Paral. Arch. Compilation Techniques, 2008, pp. 72–81.
[30] J. Geromel and P. Colaneri, "Stability and stabilization of discrete time switched systems," Int. J. Control, vol. 79, no. 7, pp. 719–728, 2006.
