
A SEMINAR REPORT ON

The Design Methodology and Practice of Low Power SoC

By Cherupalli Akshitha (10B81D5702) M.Tech (VLSI System Design) I-year I-Semester

Under the Supervision of Mr. B. Harish Asst. Professor

Department of Electronics and Communication Engineering CVR COLLEGE OF ENGINEERING (ACCREDITED BY NBA, AICTE & Affiliated to JNTUH) Vastunagar, Mangalpalli (V), Ibrahimpatan (M), R.R. District, PIN 501 510

ABSTRACT
With the evolution of the GSM mobile into a multimedia mobile terminal, optimizing power consumption becomes an extremely complex task. New operating modes such as MP3 playback or high resolution graphic games gain significant importance from a power consumption point of view. Submicron technologies, with their significantly increased leakage currents, pose another new challenge. New power concepts are required to achieve reasonable operating and standby times. Low power design methodology, power estimation and optimization have to be pursued at all stages of the design, down to gate level. They also have to be compatible with standard or custom software to minimize the impact on time-to-market for the custom product. This report discusses the low power techniques implemented at system/architecture level and register transfer level in a complex mixed-signal mobile baseband SoC, the low power design optimization flow, the power management techniques that serve mobile applications, and on-chip low power memory.

TABLE OF CONTENTS
CHAPTER
1. Introduction
2. Why low power
3. Where power is consumed
4. Basic architecture and power concept
5. Low power design flow
6. Power minimization techniques
7. On chip power management
8. Low power embedded memory strategy
9. Summary

LIST OF FIGURES
FIGURE
1. Dynamic power vs leakage power
2. CMOS power dissipation
3. Basic structure of SoC
4. Low power design flow
5. Potentials for power reduction
6. Diminishing returns through levels of abstraction
7. Example of multiple-domain structure
8. Clock phasing
9. Multiple voltage threshold leakage optimization
10. Clock gating implementation
11. Clock gate restructuring
12. Operand isolation
13. Gate-level optimization
14. Basic concept of clock generation system
15. On chip memory structure

1. INTRODUCTION
With deep submicron technology, designers can integrate increasingly complex functionality on a single chip and achieve much higher performance. However, the growing integration density poses many critical challenges, and power is one of the most important, especially for handheld and portable devices where battery life matters. Minimizing the power consumption of a chip has two aspects: minimizing dynamic power (switching power and short-circuit power) and minimizing leakage power. The latter becomes dominant in very deep submicron (VDSM) processes. As transistor feature size continues to shrink, current flows even in standby mode, which drains the battery and in turn affects performance, as shown in figure 1.

Figure 1: Dynamic power Vs Leakage power

2. WHY LOW POWER


Before going into the details of analyzing and reducing power consumption, we should first look at why it is so critical in today's designs.

The continuing trend in applications for ever increasing functionality, performance and integration within SoCs is leading to designs with power dissipations in the hundreds of Watts. This can be seen from the latest processor variants from Intel, with the Itanium2, for example, approaching 130 Watts. This class of device requires expensive packaging, heat sinks and a cooling environment.

This leads to a number of additional issues that need to be addressed to maintain the feasibility of future applications. The increased integration of mobile applications puts greater demands on the battery lifetime of the product than in previous generations. While advances in CMOS technology have doubled transistor density roughly every 18 months, an equivalent advance in battery technology takes more than five years.

High on-chip current decreases the lifetime and reliability of the product. With increasing frequencies, the average on-chip current required to charge (and discharge) the total load capacitance also increases, while the time during which the current surges becomes shorter, resulting in power fluctuations across the power distribution network of the device.

These dynamic voltage drops are a concern in creating delay uncertainty, leading to possible functional problems and eventually to a shortened product life through complete device failure. Finally, the issue of how to address the power dissipated from the device as heat becomes ever more expensive to handle as part of the overall system design.

3. WHERE POWER IS CONSUMED


Power dissipation within a device can be broken down into two basic types: dynamic power consumption, based on switching activity, and static power consumption, based on leakage. The dynamic power consumption can be broken down into the switching power due to the charging and discharging of capacitive loads driven by the circuit (including net capacitance and input loads), and the short-circuit power that occurs momentarily during switching when pairs of PMOS and NMOS transistors conduct simultaneously.

The leakage power can also be broken down into a number of key contributors. One is the current flowing through the reverse biased diode formed between the diffusion regions and the substrate (Idiode). Another is the subthreshold current flowing through transistors that are not conducting (Isubthreshold); in thin-oxide processes, tunneling through the gate oxide adds to the leakage as well. Note that the leakage of the device is dramatically affected by the operating temperature: as the chip heats up, the static power dissipation increases exponentially.

Figure 2: CMOS power dissipation

Leakage current within a 130nm process with a 0.7V threshold gives approximately 10-20 pA per transistor; reduce the threshold to 0.3V in the same process and the leakage current rockets to 10-20 nA per transistor, increasing exponentially in smaller geometries.

It can be seen, therefore, that leakage is affected by how close Vth is to Vdd, by transistor size and by temperature. The effects of varying and optimizing Vdd and Vth are discussed in depth in the papers by David J. Frank and Tadahiro Kuroda. The following equations define the power within the device:

$$P_{total} = P_{switch} + P_{short} + P_{leakage}$$
$$P_{switch} = A \cdot C \cdot V^{2} \cdot F$$
$$P_{short} = A \cdot \frac{B}{12} \cdot (V - 2V_{th})^{3} \cdot F \cdot T$$
$$P_{leakage} = (I_{diode} + I_{subthreshold}) \cdot V$$

where
A = switching activity
C = total load capacitance
V = supply voltage
F = target frequency
B = gain factor
T = rise/fall time (gate inputs)
Vth = threshold voltage
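As a rough illustration of how the terms above combine, the following Python sketch evaluates each equation. The function names are introduced here for illustration only, and clamping the short-circuit term to zero when V is at or below 2Vth is an assumption made so that the cubic term never goes negative.

```python
def switching_power(a, c, v, f):
    """P_switch = A * C * V^2 * F."""
    return a * c * v ** 2 * f


def short_circuit_power(a, b, v, vth, f, t):
    """P_short = A * (B/12) * (V - 2*Vth)^3 * F * T.

    Assumption: the term is clamped to zero when V <= 2*Vth, because
    PMOS and NMOS can then never conduct at the same time.
    """
    overdrive = max(v - 2.0 * vth, 0.0)
    return a * (b / 12.0) * overdrive ** 3 * f * t


def leakage_power(i_diode, i_subthreshold, v):
    """P_leakage = (I_diode + I_subthreshold) * V."""
    return (i_diode + i_subthreshold) * v


def total_power(a, c, v, f, b, vth, t, i_diode, i_subthreshold):
    """P_total = P_switch + P_short + P_leakage."""
    return (switching_power(a, c, v, f)
            + short_circuit_power(a, b, v, vth, f, t)
            + leakage_power(i_diode, i_subthreshold, v))
```

Because the switching term is quadratic in V, it is the first place to look when the supply voltage can be lowered; the leakage term, by contrast, is only linear in V but depends strongly on Vth and temperature through the currents.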

Note: power consumption is not constant, and peak power is an important concern for failures due to electromigration and voltage drops even if the average power consumption is low.

To meet the challenge of power optimization, many low power techniques have been developed at different design levels and are widely used to reduce dynamic and leakage power.

At the system level, the techniques used are power management, dynamic voltage scaling, data representation, bus encoding, instruction encoding and memory optimization.

At the algorithm level, large power savings can be obtained through the application of transformations. The strategy consists of modifying the computational structure of the algorithm while preserving its I/O behavior. The objective is to optimize the power dissipation of the final circuit while meeting the functional throughput of the system. There are two key approaches: enabling a reduction of the supply voltage through speed-up transformations (i.e., transformations commonly used for performance optimization), and minimizing the effective capacitance by using more generic transformations.

At the RTL level, the approaches used are glitch minimization, resource sharing optimization, pre-computation, operand isolation, clock gating and extraction of computational kernels.

At the gate level, the impact of signal probabilities should be taken into consideration, and both combinational and sequential circuits can be optimized for power. Technology decomposition, technology mapping, gate resizing and signal-to-pin assignment can be used.

Power optimization at the circuit level has two sides: technological solutions and electrical solutions. On the technological side, dynamic logic, pass gate logic and reduced-swing signaling are design choices that reduce power. On the electrical side, the trade-off between power consumption and speed has to be taken into account through accurate transistor sizing.

Similarly, power minimization needs to be targeted during circuit partitioning, floorplanning, placement and routing, and the realization of the clock distribution network.

However, these design techniques require a new design methodology to solve the issues that arise during the design process. We therefore develop a new low power design flow that can effectively monitor low power optimization at each design stage. In a real mobile-application SoC design, an on-chip power management module matched to the operation modes in use is implemented. New low power embedded memories are also used to meet the requirements of multimedia applications.

4. BASIC ARCHITECTURE AND POWER CONCEPT

This SoC is a GSM/EDGE single chip mixed signal baseband IC containing all analog and digital functionalities of a cellular phone, as shown in figure 3. Additionally, it provides multimedia extensions to enable today's and future feature phone applications. It is designed as a single chip solution, integrating the digital and mixed signal portions of the baseband in a 90nm, 1.35V technology to meet the ever increasing demands of the GSM cellular subscriber market for feature rich, high performance terminals at low cost.

Figure 3: Basic structure of SoC

In addition to the standby clock mode and the power off modes, the SoC allows software to configure various processing units so that they automatically adjust to draw the minimum power necessary for the application. The SoC provides three power modes: Run Mode, Idle Mode and Sleep Mode. In Run Mode the system is fully operational; all clocks and peripherals are enabled and are controlled by software. During Idle Mode the ARM is in a low power state (wait-for-interrupt mode). All internal ARM clocks are stopped until either an interrupt request or a debug request occurs. These events return the system to Run Mode. Idle Mode can be entered when the ARM has no active tasks to perform, while the peripherals remain powered and clocked. The difference between Sleep Mode and Idle Mode is that in Sleep Mode the clocks of the peripherals are stopped if they react to the sleep signal. In typical operation, Idle Mode or Sleep Mode is entered and exited frequently during the runtime of an application. For example, system software will typically cause the ARM to enter Idle Mode each time it has to wait for an interrupt before continuing its tasks. In Idle Mode and Sleep Mode, wake-up is triggered automatically by an interrupt request or a debug request.
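The mode behaviour described above can be summarized as a small state machine. The Python sketch below is a toy model only: the class and method names are hypothetical, and the restriction that low power modes are entered only from Run Mode is an assumption made for simplicity.

```python
from enum import Enum


class Mode(Enum):
    RUN = "run"
    IDLE = "idle"
    SLEEP = "sleep"


class PowerModeController:
    """Toy model that tracks the mode and which clocks are running."""

    def __init__(self):
        self.mode = Mode.RUN

    def enter_low_power(self, mode):
        # Assumption: software requests Idle or Sleep only from Run Mode.
        if self.mode is not Mode.RUN or mode is Mode.RUN:
            raise ValueError("low power modes are entered from Run Mode")
        self.mode = mode

    def wake(self, event):
        # An interrupt or debug request returns the system to Run Mode.
        if event not in ("interrupt", "debug"):
            raise ValueError("unknown wake-up event")
        self.mode = Mode.RUN

    def cpu_clock_running(self):
        # ARM clocks are stopped in both Idle Mode and Sleep Mode.
        return self.mode is Mode.RUN

    def peripheral_clock_running(self, reacts_to_sleep):
        # Only in Sleep Mode, and only for peripherals that react to the
        # sleep signal, is the peripheral clock stopped.
        return not (self.mode is Mode.SLEEP and reacts_to_sleep)
```

For example, after `enter_low_power(Mode.IDLE)` the CPU clock is reported as stopped while every peripheral clock keeps running, which matches the wait-for-interrupt behaviour in the text.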

Multi-Vt devices are used in this SoC. The core devices are regular-Vt and low-Vt devices. These devices are desirable for performance, but they have high leakage current. Low leakage devices are used for low power applications. The main feature of these devices is an increased gate oxide thickness that reduces gate leakage. The purpose of mixed-Vt synthesis is to implement the majority of the circuit with low leakage devices and to use only a few regular-Vt and low-Vt devices on the critical paths. Performance goals can therefore be met within an acceptable standby power budget for battery operation.

To reduce leakage current, power-down is highly desirable for all parts of the circuit that are not required in a specific mode of operation. This is above all important in mobile standby mode to achieve an acceptable standby time. Because of the high leakage currents of regular-Vt devices, the parts of the circuit that must use them for performance reasons are the primary focus for power-down. The DSP subsystem contains many modules requiring regular-Vt devices, including the DSP core and high performance peripherals. Since no DSP operation is required in the sleep phases during mobile standby, power-down supported by software/firmware is mandatory.

Power-down of the ARM poses serious problems for the software, since it has to save its state to memory and reconstruct it after sleep. One power-down concept for the ARM uses the cache as temporary storage for the register contents. In this case the energy saved by powering down the ARM has to be balanced against the leakage current of the cache in mobile standby and the energy required to save and reconstruct the register contents at the beginning and end of the sleep phase.

Dynamic power depends on the following parameters: net toggle activity, net capacitance, supply voltage Vdd, signal slope, and frequency. Each of these parameters should be tackled to reduce power. In each mode of different performance, i.e. frequency, select the lowest possible Vdd, since dynamic power depends quadratically on Vdd. The minimum voltage for each mode is determined by verifying all modes in static timing analysis and simulation. On-chip voltage regulators add more to the power dissipation than an off-chip supply, so reduce Vdd externally in coarse steps and use on-chip trimming for fine-tuning only. If the voltages of two components do not differ too much in all related modes and the modes match, combine their power supplies to form a single voltage domain. If power consumption is more important than silicon area, excess throughput can also be traded against power consumption by means of supply voltage reduction.

At the architecture level it is especially important to use silicon resources and energy as efficiently as possible. At this level it can be demonstrated that only pipelining results in significant improvements in efficiency. However, to determine the optimum degree of pipelining, the overhead of the pipeline latches/registers has to be considered carefully. Performance improvements can be gained by parallel processing, but they have to be traded against area and power. Parallel processing is exemplified by algorithms such as FIR filters and Viterbi decoders.
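The rule of selecting the lowest verified Vdd for each mode can be sketched as a simple lookup. In the Python sketch below the function name and the table are hypothetical; only the 1.35V and 1.05V supply levels appear in this report, and the frequency figures and the 1.20V point are placeholders.

```python
def lowest_vdd(required_mhz, characterized):
    """Return the lowest supply voltage whose verified maximum frequency
    meets the requirement, or None if no characterized point does.

    characterized: dict mapping Vdd (volts) to the maximum frequency (MHz)
    signed off by static timing analysis and simulation.
    """
    candidates = [v for v, fmax in characterized.items() if fmax >= required_mhz]
    return min(candidates) if candidates else None


# Hypothetical characterization table (placeholder numbers).
EXAMPLE_TABLE = {1.35: 300.0, 1.20: 200.0, 1.05: 100.0}
```

With this table, a mode that needs 90 MHz would run at 1.05V, while a mode that needs 150 MHz would need 1.20V. A requirement above every characterized point returns `None`, which signals that the mode cannot be supported without re-characterization.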


5. LOW POWER DESIGN FLOW


Figure 4 shows the basic low power design flow. A first power analysis can be done as soon as the RTL code and a testbench are available; at this stage a clock tree is built internally by the tool. To confirm the results, a second power analysis can be made after synthesis, but be careful with the clock tree results: no clock tree is built internally at this stage, so the clock tree estimates will be wrong. Finally, a reliable power analysis should be done after place and route. For dedicated power reduction use the following possibilities: analyze the RTL design for possible clock gating cell insertions; use automatic clock gating in the elaboration step during synthesis, where Power Compiler can insert clock gating cells for registers that are coded in the RTL source with enable signals; and run power optimization with Power Compiler using back-annotated simulation data to optimize the gate level netlist.

Figure 4: Low power design flow

It is mandatory that a functional testbench exists which describes the typical behavior of the design. The analysis is based on the resulting switching activities. To reduce the power of a circuit always keep in mind figure 5. It means that power consumption is mainly affected by the algorithms and the architecture of the design. If you have to make big improvements in your design because of the power consumption do not try to do it during or after synthesis but try to change your algorithms and/or architecture.

Figure 5: Potentials for power reductions

After the synthesis step the estimated power should meet the power constraints. Power optimization is only reasonable for incremental changes within small design blocks, because it is very memory intensive. To analyze the power of a chip it is not enough to use the functional testbench alone. You also have to check the worst case power consumption within a functional testbench (all power saving switches off), the scan pattern re-simulation, and the reset behavior. This must be done for reliability and to be sure that the chip is still working after test, because these steps have the highest power consumption.

6. POWER MINIMIZATION TECHNIQUES


In targeting an SoC architecture for low power application, we must first fully understand the requirements that will define the power budget. These may be derived from some form of standards-based requirements limiting current draw under certain conditions, or alternately to prolong the life of the battery in the case of a mobile application. The solution for the target applications will differ in how the device is controlled and architected.

Once the requirements are clearly defined, we can start to explore various architectures and determine potential trade-offs. By starting at the highest level of abstraction, where the potential for savings is greatest, and refining further through the levels of design abstraction, we can continually drive the power consumption down toward the target budget.

Figure 6 -- Diminishing returns through levels of abstraction


In finalizing the SoC architecture, a number of considerations and decisions will need to be made at various stages of the design abstraction to reach the optimal solution. These will include such requirements as system performance, processor and other IP selection, new modules to be designed, target technology, the number of power domains to be considered, target clock frequencies, clock distribution and structure, I/O requirements, memory requirements, analog features and voltage regulation. All of these are contributors to the power budget and therefore can be targeted for power minimization to achieve the low power goal.

In bringing all the pieces of the architecture together we need to next look at the global control and clock features that can be used to reduce the overall power of the system. A design is likely to have many modes of operation for various application demands, such as startup, active, standby, idle, and power down.

In some cases multiple levels of these modes will be used to achieve the best overall power management strategy. These modes tend to be generally controlled by a combination of software and hardware features, and need to be planned into the system development from a very early stage of the design process.

From the previously described equations, it can be seen that the best way to save as much power as possible is to scale the voltage to the optimal levels for the required performance. The impact of reducing voltage levels, however, is to increase the gate delay, and beyond a certain level that becomes unfeasible.

The ideal solution is to have varying modes of operation, with the target to power down as much of the design as possible for the given application, reducing both dynamic and leakage power. In standby mode, for example, the minimum amount of logic required should be maintained on a low voltage domain to bring the device out of this state on demand from some external event, then moving through the modes of operation to the required performance level.


While this solution provides the maximum saving, it also carries the largest overhead in terms of complexity. These range through the considerations for on or off chip switching regulation, power domain isolation, performance impact of delays associated with the switching and resumption of stable power, and potential loss of state for flip-flops and memory requiring save and restore routines, along with all the additional associated test and verification requirements. In developing this type of implementation, consideration needs to be given to all of the above items and the feasibility of the management of the periods of time where this can be realistically achieved.

Figure 7 -- Example of multiple-domain structure

Simpler implementations containing multiple domains, but no switching or scaling, will carry some of the associated benefits from the quadratic voltage effect. In these cases consideration needs to be given for the partitioning of the design into high performance/higher voltages and low performance/lower voltages.

The next level of consideration, after defining the voltage partitioning and scaling, should be the system level clock architecture and methods of controlling frequency and associated switching levels. While it doesn't address power consumption through leakage, this method goes a long way towards reducing the dynamic power consumption of the device. It is not uncommon for a design to have the clock distribution and clocked elements consume over 50% of the total power consumption of the device.

Note that the scaling of frequency may be directly proportional to any voltage scaling if implemented to meet the required system level performance. In a given idle or sleep mode, all the non-dependent modules can be gated off completely from the root of the tree, eliminating the switching in both the clock distribution and the logic within these parts of the design. The use of multiple clock domains, frequency scaling and frequency phasing to reduce peak power can all be managed from the central level of distribution.

Figure 8 -- Clock phasing
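A minimal sketch of why clock phasing lowers peak power is given below. It assumes, purely for illustration, that each clock domain draws the same short current pulse right after its own clock edge; the function name, the pulse shape and the time-step model are all hypothetical.

```python
def peak_current(phase_offsets, pulse, period):
    """Peak of the summed supply current over one clock period.

    phase_offsets: time step at which each domain's clock edge occurs.
    pulse: per-time-step current drawn by one domain after its edge.
    period: number of time steps in one clock period.
    """
    total = [0.0] * period
    for offset in phase_offsets:
        for i, amps in enumerate(pulse):
            # Wrap around so a pulse near the end spills into the next period.
            total[(offset + i) % period] += amps
    return max(total)
```

With four domains, a pulse of `[1.0, 0.5]` and a period of 8 steps, aligned edges (`[0, 0, 0, 0]`) stack all four pulses on top of each other, while staggered edges (`[0, 2, 4, 6]`) never overlap. The average current is the same in both cases; only the peak changes.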

The clock architecture is generally controlled through the software interface available via the processor. However, dynamic hardware controlled switching for on-demand activation can also be implemented -- for example, in the case of some decoder function that is required to support bursts of data traffic. These types of features reduce the total software sequence support and system latency.

In all the above aspects of implementation for the defined system clock architecture, detailed consideration is required to avoid all forms of clock glitching, the additional overheads associated with multiple domains in terms of functional test, skew control, design for test considerations and timing closure implications.


Once the design architecture has been captured, the RTL code can now be targeted towards a low power synthesis flow, automatically trading power alongside the generally accepted performance and area constraints. The main features targeted by the tools include multiple threshold leakage optimization, multiple supply voltage domains, local latch based clock gating, de-clone and re-clone restructuring, operand isolation, and gate level power optimization.

For multiple threshold leakage optimizations, generally up to three versions of the targeted library are used: Low Vth (fast, high leakage), Standard Vth, and High Vth (slower, low leakage). The tool will target to use as many of the high threshold cells as possible, while maintaining the timing constraints, only utilizing the low threshold cells for critical paths. Obviously, selecting and targeting the appropriate library and characterization for the application performance requirements are a key consideration that should be addressed early on in the design process.
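The idea of using high threshold cells wherever timing allows can be sketched with a greedy loop over a single timing path. This is a deliberately simplified illustration, not how a synthesis tool works internally: the delay and leakage figures are hypothetical, all cells on the path are assumed identical, and real tools optimize over many paths at once.

```python
# Hypothetical library figures per Vt flavor: (delay in ps, leakage in nW).
FLAVORS = {"high": (100, 1), "standard": (80, 10), "low": (65, 100)}
ORDER = ["high", "standard", "low"]  # slowest and least leaky first


def path_delay(cells):
    return sum(FLAVORS[c][0] for c in cells)


def path_leakage(cells):
    return sum(FLAVORS[c][1] for c in cells)


def assign_vt(num_cells, max_delay_ps):
    """Start with all high-Vt cells and speed up one cell at a time,
    always taking the step with the smallest leakage penalty, until
    the path meets its timing constraint."""
    cells = ["high"] * num_cells
    while path_delay(cells) > max_delay_ps:
        upgradable = [i for i, c in enumerate(cells) if c != "low"]
        if not upgradable:
            raise ValueError("timing cannot be met even with all low-Vt cells")
        # Cells still on a slower flavor are cheaper to upgrade.
        i = min(upgradable, key=lambda k: ORDER.index(cells[k]))
        cells[i] = ORDER[ORDER.index(cells[i]) + 1]
    return cells
```

For a five-cell path with a 460 ps constraint, the sketch swaps two cells to standard Vt and leaves three on high Vt, so the leakiest flavor is never needed.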

Figure 9 -- Multiple voltage threshold leakage optimization

To support multiple voltage domains, additional characterized libraries for the targeted voltages are required. These may also include multiple threshold variants within them. The savings in costs in terms of power will obviously relate to the quadratic voltage scaling effect. Along with a consistent and supporting tool flow, managing the domain partitioning requires careful design consideration in the early stages of development, and close integration between front end design and layout processing to support all of the above methodology.


If enabled, local latch based clock gating will generally insert library specific clock gating latches wherever possible before groups or banks of associated flops. The effect of this is to reduce unnecessary clock toggles to the associated flops.

The user can generally define the range of flops to be driven from a single clock gate to avoid any unnecessary imbalance in the clock distribution network. Each clock gating cell provides a functional and a test-activated enable for the clock path, with the optional addition of observability automatically generated if required to reach required target ATPG coverage.
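The behaviour of a latch based clock gate can be illustrated with a small half-cycle simulation. The sketch below is a behavioural model only, with a hypothetical function name; it assumes the common arrangement in which the enable is captured by a latch that is transparent while the clock is low and the latch output is ANDed with the clock.

```python
def gated_clock(clk, enable):
    """Return the gated clock, one value per half-cycle sample.

    clk, enable: equal-length sequences of 0/1 samples.
    The latch follows `enable` only while clk is low, so a change of
    `enable` during the high phase cannot cut a clock pulse short.
    """
    latched = False
    out = []
    for c, en in zip(clk, enable):
        if not c:
            latched = bool(en)      # latch is transparent in the low phase
        out.append(bool(c) and latched)
    return out
```

For `clk = [0, 1, 0, 1, 0, 1, 0, 1]` and `enable = [1, 1, 0, 0, 1, 0, 0, 0]`, only two of the four clock pulses reach the flops. The enable drops during the third high phase, but the pulse still completes because the latch holds its value, which is the glitch protection the latch is there to provide.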

Figure 10 -- Clock gating implementation

In relation to clock gating, an additional step can be added using physical data to restructure the clock gating, further reducing power and area. This is achieved from relative placement of the registers and gating cells, reducing fragmentation and replication. Where possible, the original logical partitioning of the clock gating cells to flops will be restructured to provide a more physical layout friendly structure.

Figure 11 -- Clock gate restructuring

The complete process has a number of steps. In the pre-layout design the local clock gating is de-cloned to a higher common level, reducing area and creating a cleaner starting point for clock tree synthesis (CTS). Then during the detailed placement/CTS phase, local clock gating cells can be re-cloned to provide the optimal required clock tree.

The operand isolation step automatically identifies and shuts down data path elements and hierarchical combinatorial modules with a common control signal. The tool only partially commits to the restructuring, to allow optimal timing and power tradeoffs.

Figure 12 -- Operand isolation
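The effect of operand isolation on switching activity can be sketched by counting input changes at a datapath block. The Python sketch below models a hold-based variant, in which the previous operand is kept whenever the result is unused; this choice, the function names and the initial held value of 0 are assumptions for illustration, and AND-based isolation that forces the inputs to zero is an equally common alternative.

```python
def input_toggles(values):
    """Number of cycle boundaries at which a datapath input changes."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


def isolate(operands, used):
    """Hold the previous operand whenever the result is not used, so the
    datapath block sees no new input activity in those cycles."""
    held = 0  # assumed reset value of the holding element
    out = []
    for value, is_used in zip(operands, used):
        if is_used:
            held = value
        out.append(held)
    return out
```

For the operand stream `[3, 7, 2, 9, 4, 4, 8]` with the result used only in cycles 0, 3 and 6, the unisolated input changes five times while the isolated input changes twice.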

Classical gate level optimization resizes cells, performs pin swapping, removes unnecessary buffering, merges gates, adds buffers to reduce slew and restructures logic to provide the best possible power optimization. However, the majority of these steps are also rehashed in the physical domain with real placement and wire length constraints.

Figure 13 -- Gate-level optimization

Comparative numbers between a base line flow and that of a low power synthesis flow employing the above techniques show that an embedded processor device in a 90 nm technology of approximately 650K gates can achieve savings of greater than 40% for both dynamic switching power and leakage power.

7. ON-CHIP POWER MANAGEMENT


Power management comprises many aspects. One part is the control of clocks and clock frequencies. Another part is the control of the supply voltage for the power domains.

Clock control in general is located in the clock generation module (CGM). Several clock control functions are local control functions residing in the blocks themselves. A special clock control function is the standby clock control, which permits use of a 32 kHz clock in mobile standby when very little activity is required in the system. The basic structure of the clock generation system is shown in figure 14. As can be seen, the SoC integrates a powerful clocking scheme that combines clock flexibility during normal operation with the ability to minimize power dissipation through special clock settings for standby and low power modes.

Figure 14: Basic concept of clock generation system

As shown in table 1, the core power supply domain has been split up into several sub-domains. Only one of these sub-domains, the standby supply sub-domain (VDD_STDBY), has to remain powered up during the inactive phases in mobile standby. All other core sub-domains should be switched off using the on-chip switches. For the VDD_CORE power domain, two different supply voltage modes corresponding to two different supply voltage levels can be used. At the nominal supply voltage of 1.35V the chip runs in Fast Mode and achieves full performance. To reduce power consumption the voltage can be lowered to 1.05V, and the chip then runs in Slow Mode. In this mode the maximum achievable clock frequencies are reduced roughly by a factor of 3, and the energy required for any operation is reduced by about 40%.
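The figure of about 40% is consistent with the quadratic dependence of switching energy on supply voltage. The short Python check below assumes that the energy per operation is dominated by the switching term and therefore scales with V squared; the function name is introduced here for illustration.

```python
def relative_energy_per_operation(v_low, v_high):
    """Switching energy per operation scales with V^2 (E ~ C * V^2)."""
    return (v_low / v_high) ** 2


# Slow Mode (1.05V) relative to Fast Mode (1.35V).
energy_ratio = relative_energy_per_operation(1.05, 1.35)
energy_saving = 1.0 - energy_ratio

# Power is energy per operation times operations per second, so with the
# clock reduced by roughly a factor of 3 the dynamic power falls further.
power_ratio = energy_ratio / 3.0
```

Here `energy_saving` comes out just under 0.40, in line with the figure quoted above, and `power_ratio` is about 0.2, i.e. Slow Mode draws roughly one fifth of the Fast Mode dynamic power under these assumptions.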

For the transition from fast to slow mode the software will first have to reduce the clock frequencies and then signal to the power management IC to reduce the supply voltage. For the transition from slow to fast the ARM controller software will first have to signal to the power management IC to increase the supply voltage. Only after the higher supply voltage level has been reached the clock frequencies may be increased. Since the transition time from slow mode to fast mode will be in the order of 100 microseconds usage of slow mode is only advisable if the system stays in this mode for at least 1 millisecond.
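The ordering constraint in these transitions can be captured in a few lines. In the Python sketch below, `set_clock` and `set_voltage` are hypothetical stand-ins for the clock generation module and the external power management IC; they only record the order of requests, and the 1 ms threshold is the rule of thumb stated above.

```python
class ModeSwitcher:
    """Records the order of clock and voltage requests."""

    def __init__(self):
        self.log = []

    def set_clock(self, level):
        self.log.append(("clock", level))

    def set_voltage(self, level):
        self.log.append(("voltage", level))

    def fast_to_slow(self):
        self.set_clock("slow")       # lower the frequency first
        self.set_voltage("1.05V")    # then let the supply drop

    def slow_to_fast(self):
        self.set_voltage("1.35V")    # raise the supply first
        # Real software would wait here until the voltage has settled.
        self.set_clock("fast")       # only then raise the frequency


def worth_entering_slow_mode(expected_stay_us, min_stay_us=1000):
    """Slow Mode pays off only if the system stays there for about 1 ms."""
    return expected_stay_us >= min_stay_us
```

The asymmetry matters because the logic must never be clocked faster than the present supply voltage allows: the frequency always goes down before the voltage does, and comes up only after the voltage has.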


8. LOW POWER EMBEDDED MEMORY STRATEGY


Because external memory accesses have a large influence on total system power, a great deal of power can be saved by using the on-chip memory resources efficiently. Since on-chip memory is limited, its usage is scheduled dynamically according to the operation mode. The levels of the memory hierarchy implemented on the baseband chip are differentiated by size and access time. The first level comprises caches and tightly-coupled memories (TCM); these provide the shortest access time (no memory wait states) but limited capacity. The second level is local memory (SRAM and ROM), which has a longer access time than the first level but is still faster than external memory and offers more capacity. The third level consists of off-chip memories, connected to the chip through the external memory controller unit.

Power optimization necessarily involves optimizing the memory system, because much power is dissipated in memory accesses. Using the first and second levels of the hierarchy not only provides better performance but also decreases power consumption. The optimization target is therefore to place frequently used code and data in the first and second levels of the memory system. To use the TCM and on-chip SRAM in an optimized way, data and code need to be relocated; this normally requires no change to the code itself, only a modification of the mapping file. In addition, code (or initialization data) has to be loaded into the right memory before execution. Changing the placement of data and code can lower the likelihood of contention, improve the cache hit rate, and thus decrease the number of accesses to higher level memory. More efficient use of cache and TCM can improve performance and reduce power greatly: it avoids the power consumed by internal and external bus operations, and the CPU returns to Idle Mode earlier.
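One simple way to decide what goes into TCM and on-chip SRAM is a greedy placement by access density. The Python sketch below is an illustration of the idea only, not the procedure used for this chip: the function name, the section tuples and the ranking by accesses per byte are assumptions, and in practice the result would be written into the linker mapping file.

```python
def place_sections(sections, tcm_bytes, sram_bytes):
    """Greedy placement: sections with the most accesses per byte go to
    the fastest memory that still has room.

    sections: list of (name, size_bytes, access_count), size_bytes > 0.
    Returns a dict mapping name to "TCM", "SRAM" or "EXTERNAL".
    """
    free = {"TCM": tcm_bytes, "SRAM": sram_bytes}
    placement = {}
    ranked = sorted(sections, key=lambda s: s[2] / s[1], reverse=True)
    for name, size, _accesses in ranked:
        for level in ("TCM", "SRAM"):
            if size <= free[level]:
                free[level] -= size
                placement[name] = level
                break
        else:
            # Nothing on chip has room, so the section stays off chip.
            placement[name] = "EXTERNAL"
    return placement
```

With a hypothetical 2 KB TCM and 4 KB SRAM, a small interrupt handler that is called very often lands in TCM, a larger codec kernel lands in SRAM, and rarely used user interface code stays in external memory.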

As described in section 4, idle parts of the chip can be powered off by on-chip switches. The same applies to memories: on-chip RAMs and ROMs can be powered off to reduce leakage current. If the data content needs to be retained, the RAMs can enter data retention mode to reduce current, or the data can be saved, before the RAMs are powered off, in shared RAMs that remain powered.

On-chip SRAMs and ROMs are implemented with low leakage (high-Vth) devices to achieve low leakage in active mode. Depending on their size, on-chip SRAMs and ROMs have their own power domains. The system power control module can drive the SLOWB and SLEEPB ports of a RAM to enter the corresponding low voltage (slow) mode or data retention (sleep) mode. In sleep mode the external power can be switched off to reduce leakage while the content is retained. In this mode the RAMs cannot be accessed until they return to normal mode.

Figure 15: On chip memory structure

As shown in figure 15, in power down mode the switch turns off the memory power and data cannot be retained. When powered up again, the memory must be reset. In data retention mode the sleep port is active and leakage is reduced by 70%. The memory content is retained, and reads and writes are forbidden. In slow mode the slow port is active and leakage is reduced by 45%. Data inside the memory is kept, and reads and writes are possible at a reduced frequency.
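The memory modes above can be summarized as a small table with a selection rule. In the Python sketch below the 45% and 70% figures are the ones quoted in the text; treating power down as a 100% leakage reduction is an idealization, and the mode names, field names and `choose_mode` function are hypothetical.

```python
MEMORY_MODES = {
    "normal": {"leakage_reduction": 0.00, "retains_data": True,
               "accessible": True, "full_speed": True},
    "slow": {"leakage_reduction": 0.45, "retains_data": True,
             "accessible": True, "full_speed": False},
    "retention": {"leakage_reduction": 0.70, "retains_data": True,
                  "accessible": False, "full_speed": False},
    # Idealized: the power switch itself is assumed to leak nothing.
    "power_down": {"leakage_reduction": 1.00, "retains_data": False,
                   "accessible": False, "full_speed": False},
}


def choose_mode(need_access, need_data, need_full_speed=False):
    """Pick the mode with the largest leakage reduction that still
    satisfies the access, retention and speed requirements."""
    ok = [name for name, m in MEMORY_MODES.items()
          if (m["accessible"] or not need_access)
          and (m["retains_data"] or not need_data)
          and (m["full_speed"] or not need_full_speed)]
    return max(ok, key=lambda name: MEMORY_MODES[name]["leakage_reduction"])
```

A RAM whose content must survive but that is not accessed during standby would be put in retention mode, while a RAM that is still read occasionally at low speed would be put in slow mode.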


9. SUMMARY
Submicron technologies used in SoCs, with their significantly increased leakage currents, pose new power challenges. New power concepts are required to achieve reasonable operating and standby times. A new low power design methodology, together with power estimation and optimization, is presented to ensure that the design converges on its power targets. On-chip power management is the key part of low power design: multiple power domains and clock domains meet the performance requirements of high, medium and low end operation modes and reduce power greatly. Dynamic voltage scaling and multi-Vth techniques are implemented to reduce power in mobile operation modes. Low power memory usage is also discussed in this report.
