Arithmetic and Logic Circuits Using Sub-Threshold Pass-Transistor Logic For Ultra-Low Energy Applications

University of Southampton
Faculty of Physics and Applied Sciences
School of Electronics and Computer Science
By
Choudhury Md Salim Ul Haque Salmee

21st September, 2012
A dissertation submitted in partial fulfillment of the degree of
MSc Microelectronics Systems Design

By examination and dissertation
Project Supervisor: Dr Tom J. Kazmierski
Second Examiner: Dr Koushik Maharatna
C.M.S Ul Haque Salmee MSc in Microelectronics Systems Design
September 2012
ABSTRACT
This dissertation summarises the research and design work carried out during an MSc project that aimed to develop practical arithmetic and logic circuits and integrate them into an Arithmetic Logic Unit (ALU) for energy-constrained applications. The method adopted for ultra-low power design was sub-threshold pass-transistor logic (PTL). The project started with a wide-ranging literature review, including research publications focused on the performance of various pass-transistor logic styles in terms of speed, power dissipation and area. The circuits of this project were developed in both CMOS and PTL styles in order to provide a power comparison between the two. Some of the PTL circuits designed in this project were modified in terms of transistor size and design style to ensure smooth, power-efficient operation in the sub-threshold region. Comprehensive simulations were carried out to characterise the circuits in terms of propagation delay and power consumption. Simulations were conducted for supply voltages below and around the threshold, with different ambient temperatures and fan-outs. The results show that implementing sub-threshold PTL circuits to develop a complex hierarchical structure such as an ALU is feasible. Furthermore, comparative analysis and assessment of the results suggest that, for sub-threshold design, PTL is more power efficient than its CMOS counterpart for large-scale circuits such as an ALU. Measurements of the 8-bit ALU structure show that under worst-case simulation conditions, such as a high sub-threshold supply and extreme temperature, the PTL version consumed 153.15 pW of dynamic power whereas the CMOS version consumed 314.21 pW, roughly twice as much. The maximum power consumption of such a design is restricted to a few hundred picowatts, which ensures an ultra-low power system.
However, the power efficiency of PTL is gained at the cost of circuit performance. Despite that, such a system is beneficial for numerous applications in which power is a scarce resource and performance is not the primary concern.
ACKNOWLEDGEMENTS
First of all, I would like to express my sincerest gratitude to my project supervisor, Dr Tom J Kazmierski, for allowing me to undertake this project, which relates to one of the leading research topics in digital design and to my professional interest, low power design. His astute supervision and guidance made this project possible. Furthermore, his knowledge of the topic and continuous support during the project encouraged me to conduct in-depth research. I am grateful to Dr Koushik Maharatna, the project second examiner, who pointed out aspects of the project that guided me to revise my work with more accuracy. I am also grateful to all the lecturers of the modules that I took during my MSc study. In particular, I would like to mention Dr Koushik Maharatna and Iain McNally, whose lectures and laboratory sessions were very helpful in conducting my research properly. I would also like to thank Mr Robert Rudolf, a full-time research graduate from the Electronics and Electrical group, for his assistance with Cadence simulation. I would like to acknowledge the library facilities provided by the University of Southampton. I am also thankful to the ECS School for providing computer access with state-of-the-art EDA tools and scientific publications.
LIST OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
1.1. Motivation
1.2. The Project
1.3. Results and Benefits
CHAPTER 2 BACKGROUND AND PREVIOUS WORK
2.1. Energy Constraint Applications
2.1.1. Micro-sensor Network and Nodes
2.1.2. Radio Frequency Identification
2.1.3. Low Power Digital Signal Processor and Microcontroller Unit
2.1.4. Energy Harvester
2.2. Sub-Threshold Operations of MOSFET and CMOS Logic Gates
2.2.1. Strong Inversion
2.2.2. Weak Inversion
2.2.3. Static CMOS Inverter in Sub-Threshold Operation
2.2.4. Application, Advantages and Demerits of Sub-Threshold Logic
2.3. Pass Transistor Logic (PTL)
2.3.1. Basic Operations Principle
2.3.2. Complementary Pass-Transistor Logic (CPL)
2.3.3. Dual Pass-Transistor Logic (DPL)
2.3.4. LEAP and Other PTL Styles
2.3.5. Merits and Demerits of PTL
2.4. Sub-Threshold Pass-Transistor Logic
2.5. Basic Circuits
2.5.1. PTL Logic Circuits
2.5.2. CMOS Logic Circuits
2.6. Arithmetic Logic Unit (ALU)
2.6.1. ALU Design
2.6.1.1. Tree Structure
2.6.1.2. Chain Structure
CHAPTER 3 BASIC CIRCUITS DESIGN AND CHARACTERISATION
3.1. Design
3.1.1. PTL Circuit
3.1.2. CMOS Circuits
3.2. Characterisation
3.2.1. Propagation Delay Measurement
3.2.2. Power Consumption Measurement
3.3. Presentation of Results
3.3.1. Propagation Delay
3.3.2. Power Consumption
3.4. Result Analysis
CHAPTER 4 ARITHMETIC LOGIC UNIT DESIGN, POWER MEASUREMENTS AND RESULTS ANALYSIS
4.1. ALU Design
4.1.1. 1-Bit PTL Design
4.1.2. 1-Bit CMOS Design
4.1.3. 8-Bit PTL Design
4.1.4. 8-Bit CMOS Design
4.2. Power Consumption Measurements and Results
4.2.1. Simulation Setup
4.2.2. Results
4.3. Result Analysis
CHAPTER 5 CONCLUSION AND FUTURE WORK
APPENDICES
Appendix A - Project Gantt Chart
Appendix B - Design Files
Appendix C - Detailed Simulation Data
REFERENCES
CHAPTER 1 INTRODUCTION
Power consumption is a major concern for integrated electronic circuits and devices. It influences the design and fabrication of such circuits and systems in two ways. Firstly, power dissipates in the form of heat, which affects the performance of a chip and requires special cooling and packaging, which is expensive. Secondly, the increasing number of mobile systems and energy-constrained applications, such as energy harvesters, micro-sensor nodes and self-powered radio frequency identification (RFID) tags, require low power consumption to maximise their battery life. Therefore, there has been on-going research at multiple levels of system design: behavioural, architecture, logic and technology.
1.1. Motivation
The previous projects [1] and [2] on sub-threshold pass-transistor logic provided a solid assessment, based on basic logic circuits and adder circuits, that sub-threshold PTL circuits are more energy efficient than their CMOS counterparts, with circuit propagation delay being traded off against low power consumption. This research project concentrates on developing more complex and practical arithmetic and logic circuits based on sub-threshold PTL, with a view to minimising the energy consumption of a digital circuit system (processor) for ultra-low energy applications.
For energy-constrained applications, standard practice is to use a conventional microcontroller. These microcontrollers offer extensive, multipurpose functionality and can operate at clock frequencies of tens to hundreds of megahertz. Along with multiple general-purpose input-output terminals, they also have very precise and high-speed ADCs. These features and their flexibility of use explain the popularity of such microcontrollers in a wide range of applications. The energy consumption of these microcontrollers is not a serious issue for typical household, industrial or automotive applications. However, for energy-constrained applications where power is a scarce resource, this power consumption is a significant factor.
The project investigates more energy-efficient, cohesive circuits for designing the building blocks of a conventional processor at the transistor level. There were three aspects of research. Firstly, the research focused on PTL circuits rather than CMOS logic circuits, since many publications [3], [4], [5], [6] and [7] concluded that PTL has lower leakage and requires fewer transistors than CMOS logic. The second aspect is the use of transistors in the sub-threshold region as a method for low power consumption [8], [9], [10] and [11]. A transistor operating in the sub-threshold region consumes a very small amount of energy, but at the cost of circuit performance in terms of speed [11]. However, for the aforementioned energy-constrained applications, performance is of minor importance and the primary concern is power consumption. Chapter 2 includes the details of the sub-threshold operation of CMOS logic and of prominent energy-constrained applications. Lastly, the study includes energy-efficient structural methods for complex circuits [29], [30] and [31].
The research is motivated by the previous project [2], which shows that PTL circuits are more energy efficient than CMOS logic and that PTL can operate at sub-threshold voltages. Moreover, other studies [4], [6] and [7] proposed that PTL can be operated at sub-threshold voltages. However, the project [2] validates sub-threshold PTL only for a limited number of basic logic circuits and relatively small hierarchical structures. The positive outcome of [2] could lead towards building larger circuit blocks and hierarchical structures and ultimately to the development of an ultra-low power digital system (processor). If successful, this can be advantageous for energy-constrained applications in two ways. Firstly, it will make the design simpler, with a smaller number of circuits and devices. Secondly, energy consumption will be more efficient, which can ensure ultra-low
power consumption of the system. To the knowledge of the author, apart from the previous projects [1] and [2], there have been very few studies and publications on sub-threshold PTL.
1.2. The Project
The whole design project was conducted with the Cadence AMS 0.35 µm process design kit (PDK). The MOSFET transistors used in this project were obtained from the PDK's built-in library, in which the transistors are fully characterised for all three regions of operation, including sub-threshold. The simulation results are therefore considered valid and accurate.
The project started with the development of a comprehensive collection of PTL and CMOS basic circuits for large-scale design structures. A total of 9 basic logic circuits were added to the existing set of PTL and CMOS circuits from [2]. Circuits were chosen and designed carefully in order to develop efficient, hierarchical structures for the 1-bit and 8-bit ALU. All the PTL circuits were thoroughly characterised in terms of propagation delay and power consumption for different fan-outs, ambient temperatures and sub-threshold supply voltages. The characterisation was carried out for all the PTL circuits but for only two CMOS circuits, because the project goal was to develop more advanced and larger PTL circuits and to avoid repeating the previous project's work on CMOS circuits.
Based on the basic circuits, 4 versions of a 1-bit PTL ALU with different styles and functionality were developed and characterised for power consumption. Development of the latest version was encouraged by the successful implementation of the earlier ones. The design of the 8-bit PTL ALU was based on the latest version of the 1-bit ALU, which is explained in chapter 4. The 8-bit ALU was designed in both PTL and CMOS logic, and the two structures were compared for power consumption at different temperatures and supply voltages. A total of 7 PTL hierarchical circuits and 4 CMOS hierarchical logic circuits were created during the ALU design process. An additional 53 PTL test circuits and 25 CMOS test circuits were designed for simulation purposes. Overall, more than 2500 simulations were executed for design and characterisation during the course of the project.
The dissertation describes all the research and project work carried out during the course of this project. The project started with a wide-ranging literature review and a study of the previous project [2], which are included in chapter 2. The literature review covers the sub-threshold operation of the MOSFET and the CMOS inverter, and the applications, benefits and disadvantages of sub-threshold operation. It also summarises the major contemporary research findings on sub-threshold design. The review continues with different PTL design styles and their advantages and disadvantages. A brief section in this chapter covers sub-threshold PTL operation. The chapter also contains a review of the basic PTL and CMOS circuits from [2]. Different design methods for the ALU were also part of the literature review.
Chapter 3 covers the design work on the extended set of PTL and CMOS basic circuits, with brief descriptions of functionality and features. All the PTL circuits and two CMOS circuits were characterised under different simulation conditions, comprising different supply voltages and temperatures for different fan-out circuits. The results of the characterisation, namely the propagation delay and power consumption (static and dynamic) of the PTL circuits, are presented with explanations.
The paper continues with the practical design work for the 1-bit and 8-bit ALU in chapter 4. It provides design details of the 1-bit ALU and a power comparison between its different versions, with an explanation of why the best version was selected for the 8-bit hierarchical ALU design. Along with the detailed design architecture, the chapter presents a power comparison of the 8-bit ALU in PTL and CMOS structures and concludes with an analysis of the results. The paper finishes with the project outcome and suggestions for prospective future work, which
is in chapter 5. A Gantt chart with detailed timing of project progress and development is included in appendix A. Appendix B lists all the design files and includes the Cadence design files. Appendix C provides detailed simulation data. Both appendix B and appendix C are available in the submitted zip file.
1.3. Results and Benefits
The results show the successful implementation of sub-threshold PTL in a complex, hierarchical design such as an ALU. As mentioned earlier, the previous projects [1] and [2] validated this method on basic logic circuits only, and no other research has provided a solid assessment of the practical feasibility of using PTL in sub-threshold. Moreover, the achievements of this project, along with [2], directly oppose the suggestion in [12] that sub-threshold PTL is unfeasible in principle.
The ALU developed in this project is one of the major building blocks of a processor. Developing a complete processor requires a great deal of further research and in-depth analysis, which was beyond the scope of this project given its specific goal and the time constraints of an MSc degree. The successful implementation of this method will be an essential development in terms of power consumption for ultra-low energy applications. The challenging part is the effective implementation of sub-threshold PTL for the other major building blocks in order to develop an ultra-low power digital system (processor), which demands a significant amount of research and design work.
CHAPTER 2 BACKGROUND AND PREVIOUS WORK
2.1. Energy Constraint Applications
The following section briefly describes prominent contemporary applications that can benefit from ultra-low power design.
2.1.1. Micro-sensor Network and Nodes
A micro-sensor node is a node in a micro-sensor network that is capable of sensing, computing and communicating. Typically, tens of thousands of spatially distributed micro-sensor nodes constitute a wireless micro-sensor network for sensing, processing and relaying data to the end user [11]. There is much on-going research on the practical implementation of such networks, and the main proposed applications are health monitoring, automotive sensing, and habitat and structural monitoring [11]. The performance requirements for these applications are very low; for example, in health monitoring the rate of change of the measured data is in the order of a few seconds to a minute [11]. The battery lifetime required for a micro-sensor network is very long, since it is not feasible to change the batteries of such nodes frequently. Therefore, the low performance and long battery life requirements make the micro-sensor network a perfect candidate for ultra-low energy technology.
2.1.2. Radio Frequency Identification
A radio frequency identification (RFID) system is used to track and identify an object by means of an RFID tag attached to the object [11]. RFID tags use radio frequency to communicate with the end user. These tags have been in use for many years, and their flexibility has led to many applications such as medical implants, tracking of automobiles, pharmaceutical goods, livestock and pets, smart credit cards and smart keys for automobiles. An RFID tag usually has an antenna and other communication circuits [11]. The functionality of an RFID tag requires only very simple logic processing [11]. An active RFID tag transmits signals to the reader using energy from a battery. Extra energy from the battery could allow extended processing. Moreover, a low power design means lower energy for communication, and hence the communication distance could be longer. On the other hand, a passive tag operates on, and is most often energised by, the electromagnetic signal it receives from the reader. As a result, a passive tag is smaller in size and does not depend on a battery. Minimising the digital processing power means that the tag requires less transmission power from the reader, which makes the communication distance longer.
2.1.3. Low Power Digital Signal Processor and Microcontroller Unit
Portable applications have successfully used the Texas Instruments (TI) C5xx family of digital signal processors and the TI MSP430 microcontroller unit for metering, measurement and instrumentation purposes [11]. Modern portable devices such as mobile phones and PDAs require a dynamic range of power consumption and performance. Such applications require a high-performance digital signal processor or microcontroller unit during active mode. In standby mode, they need only limited processing and low power consumption in order to extend battery life. In a variety of applications, however, devices must be optimised for power consumption in both active and standby modes.
2.1.4. Energy Harvester
Energy harvesting is a source of energy for small, autonomous wireless electronic devices such as those in wireless sensor networks [13]. In this process, energy from external sources such as thermal, solar, wind and kinetic energy is converted into electrical energy for circuits. A wide range of low power
applications can benefit from the energy harvesting process, provided that there is an abundant energy source and a sufficient amount of energy can be derived from the source for the required operations [13]. Figure 1 shows a block diagram of a typical self-powered wireless sensor node using a piezoelectric vibration energy harvester [13]. The system includes a microcontroller unit (MCU) with an integrated antenna for transmission and sensors for collecting information from the environment. The supply voltage required for the MCU is 3.3 V.
Figure 1: a) Block Diagram of a Self-Powered Smart Sensor Node with Energy Harvesting Method b) Different Node Voltages with Time (Adapted from [13] and reprinted (b) from [2])
The energy derived from the harvester is rectified and fed to a supercapacitor with a nominal capacitance of millifarads to tens of farads [13]. It takes hours to charge the capacitor to 1-1.2 V (figure 1b), which allows the voltage regulator to start. It takes more than 26 hours of energy harvesting to reach a fully functional energy level for the system. Moreover, the voltage regulator requires a cold-start circuit [13] for successful operation of the system. All these factors are disadvantageous, since the system consumes time and energy and also requires additional components, which implies higher cost. The energy harvesting process is therefore a prime candidate for ultra-low power design, which can ensure low power consumption with a relatively faster operation time and also make the design simpler and hence more cost effective.
2.2. Sub-Threshold Operations of MOSFET and CMOS Logic Gates
2.2.1. Strong Inversion
The normal operation of a MOSFET requires the gate voltage to be greater than the device threshold voltage [14]. This region of operation is referred to as strong inversion [14].
$V_{GS} > V_T$, strong inversion requirement (1)
There are two regions of operation in strong inversion: the triode region and the saturation region. The region of operation is determined by the bias voltages of the device. For an nMOS transistor, expressions (2) and (3) give the conditions for triode and saturation operation respectively.
$V_{DS} < V_{GS} - V_T$ (2)
$V_{DS} \geq V_{GS} - V_T$ (3)
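The conditions in expressions (1) to (3) can be sketched as a small classifier. This is only an illustration: the function name `nmos_region` and the bias points are hypothetical, and the threshold of 0.57 V is the approximate value quoted later in this chapter for the nMOS device.

```python
def nmos_region(v_gs: float, v_ds: float, v_t: float) -> str:
    """Classify the operating region of an ideal nMOS transistor.

    Follows expressions (1)-(3): strong inversion needs VGS > VT;
    within it, VDS < VGS - VT is triode, otherwise saturation.
    """
    if v_gs <= v_t:
        return "not in strong inversion"
    if v_ds < v_gs - v_t:
        return "triode"
    return "saturation"


V_T = 0.57  # approximate threshold voltage in volts (illustrative)

# Hypothetical bias points (VGS, VDS) in volts
for v_gs, v_ds in [(1.8, 0.3), (1.8, 1.5), (0.3, 0.3)]:
    print(f"VGS = {v_gs} V, VDS = {v_ds} V -> {nmos_region(v_gs, v_ds, V_T)}")
```

With VGS = 1.8 V the boundary between the two regions lies at VDS = VGS - VT = 1.23 V, so the first bias point falls in the triode region and the second in saturation.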
[Figure 2 plot: IDS versus VDS (0.3 V to 1.8 V), with the triode region and the saturation region marked]
Figure 2: Current-Voltage Characteristics of an Ideal NMOS Transistor [14]
In the triode (linear) region, the device behaves like a linear resistor whose value is controlled by VGS [14]. In saturation, the device current reaches a maximum value and the device is said to be pinched off [14].
2.2.2. Weak Inversion
A MOSFET is said to be in the cut-off region for gate voltages less than the device threshold voltage. In theory, there is no current flow. In practice, however, a weak inversion layer exists, which causes diffusion carriers to flow in the channel [11]. As a result, the device current IDS exhibits an exponential dependence on VGS [15]. This region of operation is called the sub-threshold regime.
$V_{GS} < V_T$, weak inversion requirement (4)
The sub-threshold current is mainly contributed by diffusion current [11]. Expression (5) represents the basic equation for sub-threshold current.
$I_{DS} = I_o \exp\left(\frac{V_{GS} - V_T}{n V_{th}}\right)\left(1 - \exp\left(-\frac{V_{DS}}{V_{th}}\right)\right)$ (5) [11]
$I_o = \mu_o C_{ox} \frac{W}{L} (n - 1) V_{th}^2$, drain current at $V_{GS} = V_T$ [11]
Expression (5) shows that the sub-threshold current depends strongly on the thermal voltage $V_{th} = kT/q$. It also depends exponentially on VGS. Expression (6) gives the sub-threshold slope factor n, which depends on the device capacitances, where $C_d$ is the depletion-layer capacitance and $C_{ox}$ is the gate-oxide capacitance.
$n = 1 + \frac{C_d}{C_{ox}}$ (6) [11]
The drain current IDS of an nMOS transistor operating at different gate voltages VGS below the threshold voltage (approximately 0.57 V) is shown in figure 3. It implies that an nMOS transistor can operate in the sub-threshold region [2].
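The exponential dependence described for expression (5) can be illustrated numerically. The sketch below assumes the standard weak-inversion form IDS = Io exp((VGS - VT)/(n Vth)) (1 - exp(-VDS/Vth)) with Vth = kT/q; the values of `I_O` and `N` are hypothetical and are not taken from the AMS 0.35 µm device models.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage Vth = kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(v_gs, v_ds, v_t, i_o, n, temp_k=300.0):
    """Weak-inversion drain current in the form assumed for expression (5)."""
    v_th = thermal_voltage(temp_k)
    return i_o * math.exp((v_gs - v_t) / (n * v_th)) * (1.0 - math.exp(-v_ds / v_th))


def swing_mv_per_decade(n, temp_k=300.0):
    """Change in VGS (in mV) that changes the current by a factor of ten."""
    return n * thermal_voltage(temp_k) * math.log(10.0) * 1e3


# VT is the approximate value quoted in the text; I_O and N are
# hypothetical values chosen only for illustration.
V_T, I_O, N = 0.57, 1e-7, 1.5

for v_gs in (0.2, 0.3, 0.4, 0.5):
    i_ds = subthreshold_current(v_gs, v_ds=0.5, v_t=V_T, i_o=I_O, n=N)
    print(f"VGS = {v_gs:.1f} V -> IDS = {i_ds:.3e} A")
print(f"swing = {swing_mv_per_decade(N):.1f} mV/decade")
```

Under these assumptions, every reduction of VGS by n Vth ln(10) lowers the current by one decade, which is the behaviour that makes sub-threshold operation attractive for low power and also sensitive to temperature.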
Figure 3: VGS versus IDS for an nMOS Transistor at VDS = 0.5 V in 0.35 µm AMS Technology (Adapted from [2])
2.2.3. Static CMOS Inverter in Sub-Threshold Operation
An inverter in sub-threshold mode requires the supply voltage VDD to be less than the threshold voltage VT. This ensures weak inversion operation for both the NMOS and PMOS transistors of the inverter while keeping the input logic 1 value below VT [9] and [11]. That ensures the successful implementation of the CMOS inverter in sub-threshold.
[Figure 4 schematic: PMOS and NMOS transistors of the inverter with input Vin, output Vout and supply VDD < VT]
Figure 4: CMOS Inverter in Sub-Threshold Operation [11]
Although the sub-threshold inverter implementation is feasible, many researchers have expressed concern about the delay of such logic gates [16], [17], [18] and [19]. The propagation delay of a symmetric inverter for VDD < VT is stated in expression (7), from which it can be seen that the delay depends strongly, and inversely, on VDD [11]. On the other hand, the dependence of the speed (tpd) of a normal inverter (8) on VDD is comparatively insignificant. Figure 5 shows the normalised speed of an inverter for different supply voltages. In the sub-threshold region, the speed decreases by a factor of 6 per 100 mV reduction in supply voltage [11].
[Expressions (7) and (8): propagation delay of the inverter for VDD < VT (7) and for normal operation (8) [11]]
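The scaling figure quoted from [11], a factor of 6 in speed per 100 mV, can be turned into a rough estimate of how quickly speed is lost as the supply is lowered. The sketch below assumes that this rate stays constant across the sub-threshold range (a purely exponential trend); the function names are hypothetical.

```python
import math


def relative_speed(supply_drop_mv: float, factor_per_100mv: float = 6.0) -> float:
    """Speed relative to a reference point after lowering VDD by supply_drop_mv.

    Assumes a constant slowdown factor for every 100 mV of supply reduction.
    """
    return factor_per_100mv ** (-supply_drop_mv / 100.0)


def mv_per_decade(factor_per_100mv: float = 6.0) -> float:
    """Supply reduction (in mV) that makes the circuit ten times slower."""
    return 100.0 / math.log10(factor_per_100mv)


for drop in (0, 50, 100, 200):
    print(f"VDD lowered by {drop:3d} mV -> relative speed {relative_speed(drop):.4f}")
print(f"about {mv_per_decade():.1f} mV of supply reduction per decade of speed")
```

Under this assumption, lowering the supply by 200 mV within the sub-threshold region makes the inverter 36 times slower, which shows why delay is the main cost of sub-threshold operation.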
Figure 5: Relative Normalized Speed versus Voltage of a CMOS Inverter [11]
The voltage transfer characteristic (VTC, shown in figure 6) of a static CMOS inverter is similar for normal and sub-threshold operation [11]. This is a key fact that makes the sub-threshold implementation of logic cells possible without any large-scale adjustment in design.
Figure 6: Voltage Transfer Characteristics of an AMS 0.35 µm CMOS Inverter for VDD = 1.8 V and 0.3 V [2]
2.2.4. Application, Advantages and Demerits of Sub-Threshold Logic
The most important feature of sub-threshold design is that it can offer minimal energy consumption in electronic circuits. Figure 3 shows that for a small drop in supply voltage, the current drawn falls by a decade [2]. However, such energy efficiency comes at the expense of performance, in the form of a large propagation delay in circuits. Figure 7 gives a rough idea of how speed is affected by low power operation. Conventionally, a design is optimised at the Minimum-Delay Operation Point (MDP). When the emphasis is on power consumption, it can only achieve the
Minimum Energy Point (MEP). In [8], Markovic states that for 10 times lower energy consumption, the propagation delay would increase by 1000 times.
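[Figure 7 plot: normalised energy versus normalised delay, marking the MDP (at Dmin) in the traditional operation region and the MEP (at Emin) in the ultralow-energy region, with suboptimal and infeasible regions and annotations of ~x1,000 and ~x10]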
Figure 7: Energy-Delay Trade-off for Minimum-Delay Point (MDP) and Minimum-Energy Point (MEP) [8]
The dependence of the threshold voltage on temperature and process is another major concern for sub-threshold design [18]. For even a small change in temperature, the exponentially dependent current (5) changes significantly. Sub-threshold design therefore has to accept restrictions on a primary design parameter such as speed.
On the other hand, sub-threshold design does not require an immense amount of design effort and is hence easier to implement. Calhoun and Wang showed in their research that, with a slight modification, a standard cell library in 0.18 µm technology can operate smoothly at sub-threshold voltages [9] and [11]. They analysed different process corners (TT, SF and FS) in order to discover the lowest working voltage for each. The results show that all the process corners can operate at sub-threshold voltages. However, certain cells in the FT process show unstable operation in sub-threshold. The authors conclude that this is because these cells are designed with a long series of logic gates and a large number of parallel transistors [8] and [25]. In [25], Calhoun and Wang suggested resizing the transistors of the unstable cells to achieve stable sub-threshold operation.
The positive outcome of the research in [9], [10] and [11] regarding stable operation of a standard CMOS library at sub-threshold voltages is very beneficial for the design process, since the modern digital design process depends on cell library synthesis and HDL entry. It could therefore be possible to design a VLSI integrated circuit with minor modifications using the standard design process.
In spite of all the concerns regarding speed, temperature and process dependency, a number of applications implement the sub-threshold technique, since it offers low power consumption and an easier design process.
As mentioned earlier in this chapter, portable applications such as mobile phones and PDAs require a dynamic range of power and processing performance. Ultra-Dynamic Voltage Scaling (UDVS) is used to ensure low power consumption in such devices and so extend battery life [25]. For performance-critical operations, it allows devices to run at a high voltage or a high frequency, while in sleep mode the devices run at sub-threshold voltage to minimize power consumption. Another major platform for exploring the sub-threshold technique is energy-constrained applications. These applications typically do not require high processing performance and strive for low power consumption. Section 2.1 of this chapter illustrates how these applications can benefit from low power consumption, which is the primary goal of this project.
2.3. Pass Transistor Logic (PTL)

In standard CMOS logic circuits all input signals are applied to the gates of both nMOS and pMOS transistors. In the static state, each of the complementary transistors is either cut off (high impedance) or conducting, depending on the state of the input signals. In pass transistor logic (PTL), however, input signals are also connected to the drain and source terminals of a transistor [20].

2.3.1. Basic Operations Principle

PTL is a popular alternative to conventional CMOS logic. It requires comparatively fewer transistors than CMOS and is easier to implement. Figure 8a shows an nMOS transistor used as a PTL AND gate. The source voltage of the transistor is VDD - VT [27] and [20]. In practice, the supply voltage is much bigger than the voltage drop caused by VT, and the output voltage is considered as logic 1. However, the arrangement of figure 8a is inadequate for the AND operation because the circuit goes to a high-impedance state when the gate is at logic 0. Therefore another nMOS is added to the design (figure 8b) [27] and [20]. The additional nMOS is essential for the static design since it ensures a low-impedance path to the supply rail (the input rail for PTL) under all circumstances [27].
[Figure 8 labels: a) single nMOS with gate B, VA = VDD, VB = VDD, VY = VDD - VT, Y = A.B; b) nMOS2 (gate B) passing A.B and nMOS1 (gate nB) passing B.nB, Y = A.B]
Figure 8 a) Pass Transistor Operation Using Single nMOS b) AND Operation Using Two nMOS
PTL makes the design much easier, with fewer transistors and a variety of logic operations. Compared to 6 transistors in the CMOS implementation, it uses only 2 transistors for the AND operation. Other logic operations are also achievable with an appropriate change of wiring. Expression (9) shows the logic function of a PTL AND gate, where the subscripts G and D refer to the gate and drain signals of the two pass transistors.

$$V_Y = V_{G1} \cdot V_{D1} + V_{G2} \cdot V_{D2} \qquad (9)$$
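The difference between the single-transistor arrangement and the two-transistor AND gate can be illustrated with a small logic-level sketch. It assumes ideal switches (the VT drop is ignored), represents a floating output as `None`, and takes the wiring from the labels of figure 8b (nMOS1 gated by nB passing B, nMOS2 gated by B passing A); the function names are hypothetical.

```python
from itertools import product

def pass_nmos(gate, drain):
    """Ideal nMOS pass transistor: passes `drain` when `gate` is 1, else floats (None)."""
    return drain if gate else None

def single_nmos_and(a, b):
    # Figure 8a style: one transistor, gate = B, drain = A.
    return pass_nmos(b, a)

def two_nmos_and(a, b):
    # Figure 8b style, following the form of expression (9): Y = G1.D1 + G2.D2,
    # with (G1, D1) = (nB, B) and (G2, D2) = (B, A). Wiring is an assumption.
    nb = 1 - b
    return (nb & b) | (b & a)

for a, b in product((0, 1), repeat=2):
    print(a, b, single_nmos_and(a, b), two_nmos_and(a, b))
```

For B = 0 the single transistor leaves the output undriven, whereas the two-transistor version always produces a defined logic value equal to A.B.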
A major concern for PTL design is the lower output voltage due to the VT drop, as mentioned earlier. The output of a PTL gate should not be connected to the gate input of another pass transistor because of the VT drop at its output [27] (figure 9a): the degraded output ultimately becomes insufficient to drive the next gate. When pass transistors are connected in series, the input signal is degraded by the VT drop throughout the chain (figure 9b). Therefore, very long chains are not possible.
[Figure 9 labels: a) signals VIN and VIN2 with threshold drops VT1 and VT2; b) chain of nMOS1, nMOS2 and nMOS3 with gates at VDD, the input VIN being degraded by VT1, VT2 and VT3 in turn]
Figure 9 a) Pass Transistor Output Driving Another Gate b) Degradation of Voltage in Pass-Transistor Chain [27]
However, this signal degradation can be recovered by using a level restorer buffer (figure 10). Conventionally, a CMOS inverter is used at the end of the chain to restore the signal to logic values 1 = VDD and 0 = 0V. This added inverter however leads to static dissipation.
[Figure 10 labels: nMOS2 passing A.B and nMOS1 passing B.nB, followed by an inverter connected to VDD with output Y = n(A.B)]
Figure 10 Level Restoration Using CMOS Inverter [27]

An important feature of PTL that needs to be addressed is that it uses complementary signals for its inputs. Accordingly, a number of design methods have been introduced, such as CPL, LEAP and Dual PTL.

2.3.2. Complementary Pass-Transistor Logic (CPL)

Complementary Pass-Transistor Logic (CPL) is based on true and complementary signals at both the input and the output. Its operation is based on the PTL AND gate discussed above (figure 8b). The logic is also known as differential pass transistor logic because of its complementary outputs. Figure 11 shows the AND/NAND and OR/NOR gates. They follow the same topology, with the input signal combinations defining the type of logic operation [20]. Furthermore, an XOR/XNOR gate can also be derived from the same topology.
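The idea that one topology yields different gates purely through its input wiring can be sketched at the logic level. The model below treats each output as two ideal pass transistors gated by B and nB; the specific wirings are the usual textbook CPL assignments and are an assumption here, not read off figure 11. The helper name `cpl_cell` is hypothetical.

```python
from itertools import product

def cpl_cell(pass_when_b, pass_when_nb):
    """Two nMOS pass transistors (gates B and nB) sharing one output node.

    `pass_when_b` / `pass_when_nb` name the signal wired to each transistor.
    """
    def gate(a, b):
        signals = {"A": a, "nA": 1 - a, "B": b, "nB": 1 - b}
        return signals[pass_when_b] if b else signals[pass_when_nb]
    return gate

# Assumed wiring for each function (same topology, different inputs).
AND = cpl_cell("A", "B")
NAND = cpl_cell("nA", "nB")
OR = cpl_cell("B", "A")
NOR = cpl_cell("nB", "nA")
XOR = cpl_cell("nA", "A")
XNOR = cpl_cell("A", "nA")

for a, b in product((0, 1), repeat=2):
    print(a, b, AND(a, b), NAND(a, b), OR(a, b), NOR(a, b), XOR(a, b), XNOR(a, b))
```

Each true/complement pair (for example AND and NAND) corresponds to the two differential outputs of one CPL gate.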
[Figure 11 labels: panels a) and b); inputs A, nA, B, nB; outputs Y, nY; supply VDD]
Figure 11 Pass Transistor Logic Circuits a) AND/NAND b) OR/NOR

The main feature of CPL is that it offers a simple Full-Adder implementation: the simple design of the XOR/XNOR gate makes a Full-Adder very easy to design. This Full-Adder is used in this project and is discussed in detail in section 2.5.1. The first major publication on a CPL implementation appeared in 1989 [7]. The researchers from Hitachi Research Laboratory proposed a 3.8 ns CPL multiplier (16x16) in 0.5 µm technology. It was reportedly the fastest multiplier at the time of publication. The research concluded that, owing to its low static power dissipation and smaller circuit capacitances, CPL is more efficient in terms of power consumption and speed. A comparison with transmission-gate (TG) logic in [21] shows a similar result for the speed efficiency of CPL. However, the study is based on 2-input
basic logic cells only.

2.3.3. Dual Pass-Transistor Logic (DPL)

Dual Pass-Transistor Logic (DPL) overcomes the threshold voltage drop that CPL suffers when passing logic 1. Unlike CPL, which uses a CMOS inverter to overcome the voltage drop, DPL uses pMOS transistors in parallel with the nMOS transistors. Figure 12 shows the DPL AND/NAND gate. In this approach, the pMOS transistor passes logic 1 without any threshold loss while logic 0 is passed by the nMOS transistor [20].
[Figure 12 labels: inputs A, nA, B, nB; outputs A.B and n(A.B)]
Figure 12 AND/NAND Logic Gate in DPL

Like CPL, DPL offers a very efficient Full-Adder design. Other logic gates such as OR/NOR and XOR/XNOR can also be designed effectively. Furthermore, the circuit capacitance in DPL is equally distributed across the outputs as well as the inputs [6] and [20]. The researchers in [6] successfully designed a 32-bit ALU in 0.25 µm technology and reported that it is 30% faster than the CMOS version. They also proposed a carry propagation circuit to resolve the signal propagation issue, which is a major concern for PTL design.

2.3.4. LEAP and Other PTL Styles

Lean Integration with Pass Transistor (LEAP) was introduced in 1996 in [3]. The researchers developed a small PTL-based cell library (7 cells) together with a synthesis tool called the cell inventor. The main objective of the research was to optimize area, speed and power in digital design. The outcome of [3] indicates that LEAP meets all of these objectives. Furthermore, LEAP was more cost effective than CMOS. Along with 4 different inverters used to meet the drive requirements, the cell library consists of 3 logic cells, Y1, Y2 and Y3 (figure 13). These 3 cells are capable of executing basic logic functions with different numbers of input signals as necessary. The Y3 cell is used in this project for the 4-input MUX, which is discussed further in chapter 3. A further study [22] on the LEAP cell library focused on the synthesis algorithm.
[Figure 13 labels: cells Y1, Y2 and Y3; signals A, B, C, nC]
Figure 13 Basic Cells for Logic Operation in LEAP [3]

Further research on PTL technology similarly emphasized the synthesis algorithms of basic
cells [5], [23], [24] and [25]. In these projects, a complete cell library was designed using MUX gates only. The MUX cells adopted the same circuit topology as the Y3 cell of LEAP technology (figure 13). All the MUX gates were associated with different drive inverters.

2.3.5. Merits and Demerits of PTL

As mentioned earlier, the key benefit of the PTL design style is that it requires fewer transistors than CMOS design [3], [6] and [7] and is hence easier to design. Furthermore, PTL is comparatively power efficient in terms of both static and dynamic consumption. Ideally, PTL designs have no direct path from the power rail to the ground rail, provided that no inverters are used. Therefore, no current is induced along such a path, which is the main contributor to static power dissipation. This leads to better speed of operation for PTL [3] and [7]. Furthermore, the lower number of transistors leads to reduced dynamic power dissipation. Expression (10) shows the equation for dynamic power dissipation [14]. PTL designs have fewer switching nodes and consequently lower node capacitance, which is why PTL has low dynamic power consumption. As the PTL devices do not define the drive of the gates, transistor sizes are kept to a minimum, which also leads to lower circuit capacitance and hence lower dynamic dissipation. Moreover, due to the reduced voltage swing, PTL requires low switching energy [27].

$$P_{dynamic} = \alpha C V_{DD}^{2} f \qquad (10)\ [14]$$

where $\alpha$ is the switching activity factor, $C$ is the switched capacitance, $V_{DD}$ is the supply voltage and $f$ is the switching frequency.
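To make the dependence on capacitance and supply voltage concrete, the sketch below evaluates the dynamic power at the four supply voltages used later in this project. It assumes that expression (10) takes the standard form P = alpha x C x VDD^2 x f; the activity factor, capacitance and frequency values are hypothetical and chosen only for illustration, not taken from the simulations.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic switching power in watts, assuming P = alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * freq

# Hypothetical operating point (illustration only).
ALPHA = 0.1        # switching activity factor
C_LOAD = 10e-15    # switched capacitance in farads
FREQ = 100e3       # switching frequency in hertz

for vdd in (0.3, 0.4, 0.5, 0.6):
    p = dynamic_power(ALPHA, C_LOAD, vdd, FREQ)
    print(f"VDD = {vdd:.1f} V -> {p * 1e12:.2f} pW")
```

Because the supply voltage enters quadratically, halving VDD from 0.6 V to 0.3 V reduces the dynamic power by a factor of four for the same capacitance and frequency, while a reduction in switched capacitance gives a proportional saving.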
However, PTL design styles require major modifications to the process technology, and hence the cost of fabrication increases, since most of the aforementioned studies use specific low-threshold-voltage MOS devices [21]. Zimmermann [26] identified that previous work on PTL focused on developing Full-Adders only, which are relatively easy to design in CPL or DPL compared with the least efficient CMOS approach. Furthermore, the design topology of PTL requires immense design effort, and the layout of such designs is complicated as well. In fact, the outcome of [26] is based on a variety of digital applications in CMOS, which does not entirely cancel out the merits of PTL design.

2.4. Sub-Threshold Pass-Transistor Logic

A number of studies have been conducted on sub-threshold voltage implementation and on pass-transistor logic separately, optimizing different parameters such as speed, power consumption and area. However, only a limited amount of research discusses combining the two techniques. Most of it concentrates on circuit performance in terms of speed for different design techniques. In [16], Moalemi and Afzali-Kusha examined the dependence of propagation delay on temperature for different sub-threshold PTL designs. Speed is a major concern for such sub-threshold designs. However, the result of [16] is not comprehensive since it investigated only XOR gates. Moreover, the research ignored the resistive component of the input capacitance for a series chain of pass-transistors and carried out the tests with ideal load capacitors only. Other studies focused on sub-threshold PTL from the perspective of reducing power consumption. In [19], the researchers analysed a Dynamic Threshold MOS (DTMOS) device. The gate terminal of such a device is shorted to the body (figure 14). This connection allows the threshold voltage to change depending on the gate voltage. In this method, however, the threshold voltage changes along with the supply voltage and hence this approach cannot be categorised as sub-threshold design. Furthermore, each DTMOS device requires its body to be isolated, which gives rise to design complexity.
Figure 14 DTnMOS and DTpMOS Circuit in DTMOS Mode

As mentioned earlier, many researchers have reported different types of PTL design to be more energy efficient than CMOS design. Moreover, sub-threshold implementation is capable of optimizing a design for minimal power consumption. The combination of these two techniques points to a substantially power efficient design at the cost of speed. Therefore, sub-threshold PTL design could be greatly beneficial for self-powered, energy-constrained applications where power is a scarce resource and performance is not the main concern.

2.5. Basic Circuits

The previous project [2] developed a hierarchical Accumulator-Adder and compared the power consumption of PTL and CMOS designs. To do so, it created a total of 6 PTL basic circuits and another 5 CMOS circuits. The following section describes the design details and features of each of the basic circuits from [2].

2.5.1. PTL Logic Circuits

AND/NAND, OR/NOR and XOR/XNOR

The design of these basic circuits is based on the CPL method discussed in section 2.3.2. All the circuits use the same circuit topology (figure 15); it is the input combinations which determine the function of each circuit. Because of the differential design, the circuits have complementary inputs and outputs. This eliminates the need for additional inverters, which are often required in static CMOS design. Moreover, the XOR and XNOR gates have only 4 transistors, which makes the design very simple compared to their CMOS counterparts. Each design has a level-restoring inverter for recovering the voltage level of logic 1 to VDD. The transistor sizes of the inverter are selected such that they provide balanced minimum delay while still providing sufficient drive [14]. The pass transistors are kept to minimum size since they do not define the drive of the gate. This also minimizes the circuit capacitance, which in turn reduces dynamic power consumption [14].
[Figure 15 labels: panels a) AND and NAND, b) OR and NOR, c) XOR and XNOR; inputs A, nA, B, nB; outputs Y, nY; each output inverter uses W=3.3u and W=0.85u devices with L=0.35u; each pass transistor is W=0.4u, L=0.35u]
Figure 15 PTL Basic Circuits a) AND, NAND b) OR and NOR and c) XOR and XNOR

D-Type Flip Flop

The design of the D-type flip flop in [2] is based on the version proposed by Hsiao in [4], with a slight modification to the circuit components. Figure 16 shows the flip flop designed in [2].
[Figure 16 labels: inputs D, Clock, nClock, nReset; output nQ; transistor widths W=0.4u, W=0.85u, W=1u and W=3.3u, all with L=0.35u]
Figure 16 Resettable D-Type Flip Flop Based on PTL

The designer in [2] modified the original design so that the flip flop could be used at sub-threshold voltage. The feedback pMOS transistor used in [4] to improve inverter performance and speed was removed, because the author of [2] claimed that this pMOS held the inverter in permanent pull-up mode in sub-threshold and hence the circuit was not operational. Moreover, the circuit in [4] has both pMOS and nMOS clock transistors. The author of [2] observed that the pMOS caused significant delay in the circuit, leading to incorrect, non-synchronous operation. Therefore, the pMOS clock transistor was replaced by an nMOS transistor, which enables the edge triggering of the flip-flop. Moreover, the author claimed that the whole project was inspired by the nMOS pass-transistor [2]. The transistor sizes in the flip flop are the same as in the other basic logic circuits.

2-Input Multiplexer

The 2-input multiplexer is a very simple circuit consisting of 2 nMOS transistors. This is the most commonly used multiplexer in PTL, especially in CPL and LEAP. The inputs of the circuit are controlled by the complementary control signals Load and nLoad. This multiplexer is used with the D-type flip flop to design a load register (figure 17). According to the author, no level-restoring inverter is used with the multiplexer because it is loaded only with the small capacitance of the D-type.
[Figure 17 labels: MUX2 with two pass transistors (W=0.4u, L=0.35u), data inputs D and Q, control signals Load and nLoad; DTYPE with inputs D, Clock, nClock, nReset and outputs Q, nQ]
Figure 17 PTL Load Register Using 2-Input MUX and D-Type Flip Flop

Load Register

As mentioned earlier, the register is designed by connecting the 2-input multiplexer to the D-type flip flop, as shown in figure 17. When the Load signal is enabled (logic 1), the register is updated with the value of the input signal D; otherwise it retains the value from the previous stage.

Full-Adder

The PTL Full-Adder is one of the major benefits of PTL based design because it is easy to design and functions effectively. Figure 18 shows a classic Full-Adder circuit based on PTL AND/NAND, OR/NOR and XOR/XNOR circuits. It appeared in a number of publications [26], [20] and [7] and was analysed successfully. Moreover, the publications also concluded that this PTL version is faster and more energy efficient than any CMOS version. With all input signals being differential, the Full-Adder can provide complementary output of sum signal S and
carry-out signal Cout.
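The logical behaviour of such a differential Full-Adder can be summarised in a short sketch. It models only the Boolean function built from XOR, AND and OR operations, with true and complementary outputs; it does not model the pass-transistor network or its sizing, and the function name is hypothetical.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (S, nS, Cout, nCout) for one-bit inputs A, B and Cin."""
    s = a ^ b ^ cin                       # sum from two XOR stages
    cout = (a & b) | (cin & (a ^ b))      # carry from AND/OR of partial terms
    return s, 1 - s, cout, 1 - cout

for a, b, cin in product((0, 1), repeat=3):
    s, ns, cout, ncout = full_adder(a, b, cin)
    # The two output bits must encode the arithmetic sum of the inputs.
    assert 2 * cout + s == a + b + cin
    print(a, b, cin, "->", s, ns, cout, ncout)
```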

[Figure 18 labels: inputs A, nA, B, nB, Cin, nCin; outputs including nS, Cout and nCout; pass transistors W=0.4u, L=0.35u; output inverters Wn=3.3u, Wp=1.85u]
Figure 18 PTL Based Full-Adder

2.5.2. CMOS Logic Circuits

AND Gate

Figure 19 shows the classic CMOS logic circuit for a 2-input AND gate. The AND gate includes a classic CMOS inverter. The transistor sizes of the inverter are kept the same as the ones used in the PTL design, which allows the CMOS structures to be compared with their PTL counterparts under realistic conditions. In fact, all the CMOS circuits except the D-type flip flop have the same nMOS and pMOS transistor sizes as the inverter. This is because a pMOS to nMOS ratio from 1.4 to 2 is proven to provide minimum delay and sufficient drive [14].
[Figure 19 labels: inputs A and B; transistor widths W=3.3u, W=1.85u and W=0.85u, all with L=0.35u]
Figure 19 Classic CMOS 2-Input Logic Circuit for AND Gate

D-Type Flip Flop

The CMOS version of the D-type flip flop is shown in figure 20. The circuit has a reset input signal (nReset) and is triggered at the rising edge of the clock, similarly to its PTL counterpart. The circuit consists of six NAND gates: three 2-input gates, two 3-input gates and one 4-input gate. The design of the flip-flop is an optimized style of a typical Master-Slave circuit [28]. Although the input signal nD is inverting, the output signal is differential. In the typical design approach, the pMOS transistor is bigger than the nMOS transistor. This sizing, however, was not operational in sub-threshold, since the circuit did not respond to the positive edge of the Clock signal, as reported in [2]. The researchers in [9] also reported a similar problem at sub-threshold voltage and suggested resizing the flip flop with nMOS transistors bigger than the pMOS ones. Therefore the transistors were resized as shown in figure 20 (Wp = 1.85 µm and Wn = 3.3 µm) and the flip flop was observed to be operational at the positive edge of the Clock signal [2].
[Figure 20 labels: inputs nD, Clock, nReset; outputs Q, nQ; six gates, each sized Wp=1.85u, Wn=3.3u]
Figure 20 CMOS D-type Flip Flop with Master-Slave Configuration [28]

2-Input Inverting Multiplexer

The two-input multiplexer circuit in the CMOS design is shown in figure 21a. The input signal nLoad and the output signal nD are inverting, which compensates for the inverting input of the D-type flip flop. This inverting output also removes the need for an additional inverter at the output where an inverted signal is required for circuit operation.
[Figure 21 labels: inputs D, Q, Load, nLoad; output nD; transistor widths W=3.3u, W=1.85u and W=0.85u, all with L=0.35u]
Figure 21 a) 2-Input CMOS Multiplexer Circuit with Inverting Output b) Circuit Symbol [28]
Load Register

The design of the CMOS load register is similar to the PTL version, with a slight modification. Figure 22 shows that the load register uses the inverting multiplexer in order to compensate for the inverting input of the modified D-type flip flop, as mentioned earlier. The operation of the register is otherwise similar, with the input signal D being stored at the positive edge of the Clock signal.
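The register behaviour described for both versions (load on the rising clock edge when Load is 1, hold otherwise) can be captured in a small behavioural model. This is a sketch at the logic level only; the class name is hypothetical, and treating nReset as an asynchronous, active-low reset is an assumption made for the example.

```python
class LoadRegister:
    """Behavioural model of a 1-bit load register (MUX2 + D-type flip flop)."""

    def __init__(self):
        self.q = 0
        self._prev_clock = 0

    def step(self, clock, d, load, n_reset=1):
        """Apply one set of input values and return (Q, nQ)."""
        if n_reset == 0:
            self.q = 0                        # reset modelled as asynchronous (assumption)
        elif clock == 1 and self._prev_clock == 0:
            # Rising edge: the MUX2 selects D when Load is 1, else the fed-back Q.
            self.q = d if load else self.q
        self._prev_clock = clock
        return self.q, 1 - self.q

reg = LoadRegister()
reg.step(0, 1, 1)           # clock low, nothing stored yet
print(reg.step(1, 1, 1))    # rising edge with Load = 1 stores D
reg.step(0, 0, 0)
print(reg.step(1, 0, 0))    # rising edge with Load = 0 keeps the old value
```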
[Figure 22 labels: MUX2 with inputs D and Load and output nD; DTYPE with inputs Clock and nReset and outputs Q, nQ]
Figure 22 CMOS Load Register

Full-Adder

The Full-Adder circuit shown in figure 23 is the classic CMOS design. Although it requires a total of 28 transistors, it is the most optimized version in terms of performance and the number of transistors required [14], [20] and [26]. Transistor size ratios are kept the same as in the basic logic circuits, namely 3.3 µm/0.35 µm for pMOS and 1.85 µm/0.35 µm for nMOS.
[Figure 23 labels: inputs A, B, Cin; output Cout; transistor widths W=3.3u, W=1.85u and W=0.85u, all with L=0.35u]
Figure 23 CMOS Full-Adder Circuit with Transistor Sizes [26]

2.6. Arithmetic Logic Unit (ALU)

The Arithmetic Logic Unit (ALU) is one of the fundamental building blocks of a typical microprocessor. The ALU performs both arithmetic and logic functions. Therefore, it consists of basic functional components such as an Adder and AND, OR and XOR gates. Each functional component offers one type of operation. For example, the adder in an ALU performs the add operation. However, a few specific operations require a combination of multiple units; subtraction, for example, requires both the XOR gates and the Adder.

2.6.1. ALU Design

The goal of this project is to develop an ultra-low power ALU. Therefore, the design of the ALU is influenced by low power implementation. There are many approaches to reducing the power consumption of an ALU or of digital circuits in general. At the low design level, transistor sizing is used to minimize circuit capacitance. Technology mapping is another process at the logic
gate level. Different algorithms have been developed for different ALU architectures targeting power reduction. At the system level and register transfer level (RTL), power gating and clock gating are two popular techniques. Among the other popular techniques, Dynamic Voltage Scaling (DVS) is widely used in portable devices, as discussed earlier in this chapter. Another possible approach is structural level customization. A number of customizations have been proposed and implemented for performance enhancement of digital designs. However, most of these projects, such as the University of Illinois Illiac 2 project, the IBM Stretch project and [29], emphasized performance. On the other hand, a few studies [30] and [31] have proposed structural level power minimization techniques. There are two basic methods for the structural design of an ALU: the chain method and the tree method. The following sections briefly describe the two techniques.

2.6.1.1. Tree Structure

In tree structures, functional components are connected in parallel to a multiplexer. Figure 24 shows an ALU with Adder, AND, OR and XOR gates connected in parallel through a 4-input multiplexer (MUX). The value of the MUX control signals determines which of the functional component results appears at the ALU output.
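A behavioural sketch of the tree arrangement is given below. It assumes an 8-bit data width, drops the adder carry-out for brevity, and assigns the functional units to MUX inputs D1 to D4 in the order ADDER, AND, OR, XOR; that select encoding and the function names are illustrative assumptions rather than the mapping used in the design.

```python
def adder(a, b, width=8):
    """Unsigned addition truncated to `width` bits (carry-out dropped for brevity)."""
    return (a + b) & ((1 << width) - 1)

def tree_alu(a, b, s1, s0):
    """Tree-structured ALU: every unit computes, one 4-input MUX selects."""
    # All four functional units evaluate in parallel on the same operands.
    results = (adder(a, b), a & b, a | b, a ^ b)   # D1..D4 (assumed order)
    # A single MUX4 level sits between any unit and the output.
    return results[(s1 << 1) | s0]

print(tree_alu(5, 3, 0, 0))   # ADDER
print(tree_alu(5, 3, 0, 1))   # AND
print(tree_alu(5, 3, 1, 0))   # OR
print(tree_alu(5, 3, 1, 1))   # XOR
```

Every result passes through exactly one multiplexer level, which is the source of the speed advantage of this structure.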
[Figure 24 labels: ADDER, AND, OR and XOR units with inputs A and B and outputs Y, feeding inputs D1 to D4 of a MUX4 controlled by S0 and S1, with output Q]
Figure 24 Tree Structure Design [30]

This structure requires more area. Furthermore, the routing of signals is complicated, which makes the layout difficult. However, circuit operation is faster.

2.6.1.2. Chain Structure

In the chain structure the larger multiplexer is replaced by a chain of smaller multiplexers, typically 2-input MUXs (figure 25). The first stage of the chain starts with two arbitrary functional components whose outputs are connected to the first MUX. The MUX output is then connected to one of the two inputs of the next-stage MUX, while the other input is occupied by the output of another functional component. Because of this concatenation, some of the component outputs have to travel a longer transmission path.
[Figure 25 labels: ADDER, AND, OR and XOR units with inputs A and B and outputs Y, chained through three MUX2 stages controlled by S0, S1 and S2, with output Q]
Figure 25 Chain Structure Design [30]

The chain structure requires a smaller design area. Moreover, chain structures offer a variety of ways to place components. This in turn can be used to reduce power by placing frequently used functional components closer to the output. However, circuit operation is slower than in the tree structure because of the chained multiplexers.
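The chain arrangement can be sketched in the same behavioural style. The placement of units along the chain (ADDER and AND at the first stage, then OR, then XOR nearest the output) follows the order of the labels in figure 25 but is otherwise an assumption, as are the select polarities and the 8-bit width.

```python
def mux2(d0, d1, sel):
    """Non-inverting 2-input multiplexer."""
    return d1 if sel else d0

def chain_alu(a, b, s0, s1, s2, width=8):
    """Chain-structured ALU built from three cascaded MUX2 stages."""
    mask = (1 << width) - 1
    stage1 = mux2((a + b) & mask, a & b, s0)   # ADDER or AND
    stage2 = mux2(stage1, a | b, s1)           # previous stage or OR
    return mux2(stage2, a ^ b, s2)             # previous stage or XOR

# Number of MUX2 stages each unit's result must traverse to reach the output.
MUX_STAGES = {"ADDER": 3, "AND": 3, "OR": 2, "XOR": 1}

print(chain_alu(5, 3, 0, 0, 0))   # ADDER result, through three stages
print(chain_alu(5, 3, 0, 0, 1))   # XOR result, through one stage
```

The `MUX_STAGES` table makes the trade-off explicit: a unit placed near the output switches fewer multiplexer nodes, which is why frequently used units are best placed last in the chain.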
CHAPTER 3 BASIC CIRCUITS DESIGN AND CHARACTERISATION

In order to achieve the project goal, it was essential to develop a sub-threshold ALU in both PTL and CMOS logic and to compare the two designs in terms of power consumption. However, the basic circuits available from the previous project [2] were inadequate for designing a large hierarchical circuit block such as an ALU. Therefore, an additional 8 basic CMOS logic circuits and 1 PTL circuit were designed. This chapter covers the design details, functionality and characterisation of the additional circuits.

3.1. Design

All the design work in this project was carried out in the Cadence AMS 0.35 µm process. This technology was chosen for two specific reasons. Firstly, the Spectre simulator included with this process provides very detailed and precise simulation of analogue circuits with a user-friendly interface. Most importantly, it can characterise the MOS devices from its own library for sub-threshold operation. Secondly, this technology is well known and has been widely used for years in custom processor design, while providing a cost-effective solution for such complex designs.

3.1.1. PTL Circuit

4-Input Multiplexor

A 4-input multiplexor is an essential part of digital circuit blocks. Figure 26 shows a PTL 4-input multiplexor (MUX4). The transistor sizes, also shown in figure 26, are kept the same as in the other PTL circuits. This circuit is adapted from the Y3 circuit of LEAP (Lean Integration with Pass-Transistor) technology [3], discussed previously in section 2.3.4. The Y3 circuit is a generic PTL logic circuit which can be used for multiple logic operations with different input signal combinations. The proper combination of complementary input control signals (Load1, nLoad1, Load2 and nLoad2) enables the circuit to operate as a MUX4 for the data input signals (D1, D2, D3 and D4). Since the output of the Y3 circuit is inverted, an additional inverter is added to the output to generate the non-inverted output signal. Moreover, analysis showed that, without the additional inverter, the output of the Y3 circuit is degraded at sub-threshold supply. The transistor sizes of the inverters are explained in the following section of this chapter.
[Figure 26 labels: data inputs D1, D2, D3, D4; control signals Load1, nLoad1, Load2, nLoad2; six pass transistors W=0.4u, L=0.35u; two inverters sized Wp=3.3u, Wn=1.85u]
Figure 26 A 4-Input Multiplexor with Transistor Sizes in LEAP Technology [3]

3.1.2. CMOS Circuits

Inverter

In the previous project, the inverter was used as an integrated part of logic circuits such as AND
and Full-Adder circuits. However, no separate circuit was designed and characterised for inverter operation. Moreover, inverters are used extensively in larger design blocks. Therefore, an inverter circuit was designed and characterised (figure 27). The transistor sizes of this inverter are the same as the ones used in the other basic circuits; as explained earlier, this set of transistor sizes provides balanced minimum delay and yet sufficient drive [14]. In addition, the project [2] used an inverter with the same transistor ratio in sub-threshold without any flaws being reported. Both the PTL and CMOS circuits use this same inverter.
[Figure 27 labels: transistor W=3.3u, L=0.35u connected to VDD; transistor W=1.85u, L=0.35u; output Y]
Figure 27 CMOS Inverter with Transistor Sizes

AND, OR and NOR

Figure 28 shows the classic 2-input CMOS designs for AND, OR and NOR gates with the transistor sizes used in this project and the previous one [2]. The AND gate is derived from the previously designed NAND gate [2] by adding an inverter to it. The transistor sizes in these circuits are kept the same as in the inverter used in the PTL circuits, so that the CMOS designs can be compared with their PTL counterparts.
[Figure 28 labels: inputs A and B; output Y; transistor widths W=3.3u and W=1.85u, all with L=0.35u]
Figure 28 Classic CMOS Logic Circuits with Transistor Sizes a) AND Gate b) OR Gate and c) NOR Gate

XOR and XNOR

The design of these two logic circuits is based on NAND gates, which is a classic method in CMOS. Figure 29 shows the symbol diagrams of the two logic circuits. As in the other CMOS circuits, the same transistor sizes are maintained.
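One well-known way of building XOR and XNOR from NAND gates alone is sketched below. It uses the textbook four-NAND XOR and adds a fifth NAND wired as an inverter for XNOR; whether the gates in figure 29 are connected in exactly this way is an assumption, since only the symbol diagram is given.

```python
from itertools import product

def nand(a, b):
    """2-input NAND gate."""
    return 1 - (a & b)

def xor_from_nands(a, b):
    """XOR built from four 2-input NAND gates."""
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))

def xnor_from_nands(a, b):
    """XNOR: the four-NAND XOR followed by a NAND wired as an inverter."""
    x = xor_from_nands(a, b)
    return nand(x, x)

for a, b in product((0, 1), repeat=2):
    print(a, b, xor_from_nands(a, b), xnor_from_nands(a, b))
```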
[Figure 29 labels: inputs A and B; output Q; every gate sized Wp=3.3u, Wn=1.85u]
Figure 29 Symbol Diagram of Classic CMOS Logic Circuits a) XOR Gate b) XNOR Gate
4-Input Multiplexor

The 4-input multiplexor is designed using three 2-input multiplexors cascaded as shown in figure 30. The 2-input MUXs are adopted from the project [2]. As mentioned earlier, the 2-input MUX has an inverting output. However, for the circuit connection shown in figure 30, the signal travels from input to output through 2 inverting MUXs and hence the signal obtained at the output is non-inverting. This in turn eliminates any need for an additional inverter to obtain a non-inverting output signal.
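The cancellation of the two inversions can be checked with a short logic-level sketch. The first-stage multiplexers are assumed to share select S0 and the second stage to use S1, following the labels in figure 30; the mapping of select values to data inputs and the function names are assumptions made for the example.

```python
from itertools import product

def inv_mux2(d0, d1, sel):
    """Inverting 2-input multiplexer: returns the complement of the selected input."""
    return 1 - (d1 if sel else d0)

def mux4(d1, d2, d3, d4, s1, s0):
    """4-input multiplexer built from three inverting MUX2 cells."""
    first = inv_mux2(d1, d2, s0)         # complement of D1 or D2
    second = inv_mux2(d3, d4, s0)        # complement of D3 or D4
    return inv_mux2(first, second, s1)   # second inversion restores polarity

for s1, s0 in product((0, 1), repeat=2):
    print(s1, s0, mux4(1, 0, 1, 0, s1, s0))
```

Every data path crosses exactly two inverting cells, so the output equals the selected input without any extra inverter.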
[Figure 30 labels: two first-stage MUX2 cells with inputs D1, D2 and D3, D4, both controlled by S0; a final MUX2 controlled by S1 with output Q; each MUX2 sized Wp=3.3u, Wn=1.85u]
Figure 30 A 4-Input CMOS Multiplexor Designed from 2-Input CMOS Multiplexors with Transistor Sizes

Tri-State Buffer

The circuit of a tri-state buffer with transistor sizes is shown in figure 31 [35]. The transistor sizes are kept the same as in the other CMOS logic circuits. When the enable signal EN is at logic 1, the output follows the input signal value (logic 0 or logic 1). When EN is set to logic 0, the output is in the high-impedance state.
[Figure 31 labels: input IN, enable EN, output OUT; transistor widths W=3.3u and W=1.85u with L=0.35u; one gate sized Wp=3.3u, Wn=1.85u]
Figure 31 Tri-State Buffer with Transistor Sizes [35]

3.2. Characterisation

The basic logic circuits designed in this project and the other basic circuits from the previous project [2] were characterized in terms of propagation delay and power consumption (static and dynamic) under different simulation conditions such as different ambient temperatures, supply
voltages and different fan-outs. However, the simulations were mainly carried out on the PTL basic circuits and only a few CMOS circuits were characterised. This is because the project was intended to design complex and practical circuits based on the PTL method only. Moreover, the CMOS basic circuits had already been characterised in the project [2], and this project was planned to avoid repeating that work and to progress further towards the ultimate goal of ultra-low power custom processor design. However, ALU modules (both 1-bit and 8-bit versions) were designed and simulated in PTL and CMOS versions to establish the difference in power consumption, which is discussed in chapter 4. All the supply voltages used for simulation were in the sub-threshold region except for 0.6V. Figure 32 shows the simulation tree for the characterisation of the PTL basic circuits under different simulation conditions.
[Figure 32 simulation tree. All simulations use supply voltages of 0.3V, 0.4V, 0.5V and 0.6V.
Delay test: -20 C, 27 C and 85 C, each at FO = 0, FO = 2 and FO = 4. PTL cells: AND/NAND, OR/NOR, XOR/XNOR, Load Register. CMOS cells: Inverter.
Power consumption test, static: -20 C, 27 C and 85 C, each at FO = 1 and FO = 4. Power consumption test, dynamic: 27 C and 85 C, each at FO = 1 and FO = 4. PTL cells: AND/NAND, OR/NOR, XOR/XNOR, MUX4, Full Adder, Load Register. CMOS cells: Inverter, Tri-State Buffer.]
Figure 32 Simulation Tree Diagram for Characterisation of Basic PTL Circuits (Adapted from [2])

3.2.1. Propagation Delay Measurement
Propagation delay is an important design characteristic of a logic circuit, and it must be available to the design engineer for design and validation purposes. In this project, propagation delay was measured for the PTL logic circuits AND/NAND, OR/NOR, XOR/XNOR and load register; the only CMOS logic circuit simulated was the inverter. These 5 circuits were used frequently in the design of other circuits. The propagation delay of a circuit is defined as the average (11) of the delays at the rising edge and the falling edge of the output signal [14], as illustrated in figure 33b. Figure 33a shows a generalised test circuit for delay measurement with different fan-outs. Measurement of the fan-out 0 circuit provides the parasitic delay of a logic circuit. For comprehensive characterisation, tests were carried
out for fan-out 2 and 4. Fan-out 4 circuits were specifically used because digital logic circuits show realistic characteristics when driving a minimum of 4 fan-out connections. The supply voltages used for the simulations were 0.3V, 0.4V, 0.5V and 0.6V. Except for 0.6V, all of these are sub-threshold voltages; supply voltage is the major simulation variable for this project. Another important variable is temperature, which influences the performance of the circuits. It has already been discussed in chapter 2 that sub-threshold circuits have a strong dependence on temperature [18]. All the test circuits were run at three different temperatures: -20 °C, 27 °C and 85 °C. This temperature range does not necessarily cover all operational temperatures, but it is wide enough to examine the temperature effect in sub-threshold operation. Figure 32 shows the simulation tree diagram used for the basic circuits.

$$t_{pd} = \frac{t_{pdr} + t_{pdf}}{2} \qquad (11)$$

Here $t_{pdr}$ and $t_{pdf}$ denote the delay at the rising edge and at the falling edge of the output signal respectively.
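The measurement defined by (11) can be sketched in a few lines. This is a minimal illustration assuming sampled input and output waveforms and linear interpolation between samples; the function names and the synthetic waveform are hypothetical and are not taken from the simulation setup used in the project.

```python
def crossing_times(times, values, level):
    """Return (time, direction) for every crossing of `level`, linearly interpolated."""
    crossings = []
    for i in range(1, len(times)):
        a, b = values[i - 1], values[i]
        if (a < level <= b) or (a > level >= b):
            frac = (level - a) / (b - a)
            t_cross = times[i - 1] + frac * (times[i] - times[i - 1])
            crossings.append((t_cross, "rise" if b > a else "fall"))
    return crossings


def propagation_delay(times, vin, vout, vdd):
    """Return (t_pdr, t_pdf, t_pd), all measured between 50% Vdd points.

    Each output crossing is paired with the latest input crossing that precedes it;
    only the first rising and first falling output edge are used.
    """
    half = 0.5 * vdd
    in_cross = crossing_times(times, vin, half)
    out_cross = crossing_times(times, vout, half)
    delays = {}
    for t_out, direction in out_cross:
        earlier = [t_in for t_in, _ in in_cross if t_in <= t_out]
        if earlier and direction not in delays:
            delays[direction] = t_out - earlier[-1]
    t_pdr, t_pdf = delays["rise"], delays["fall"]
    return t_pdr, t_pdf, 0.5 * (t_pdr + t_pdf)


if __name__ == "__main__":
    # Synthetic non-inverting cell, arbitrary time units, Vdd normalised to 1.
    t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    vin = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    vout = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
    print(propagation_delay(t, vin, vout, vdd=1.0))
```

For an inverting cell the same pairing applies, because each output edge is still matched to the input edge that caused it.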
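[Figure 33 schematic: a) a pulse stimulus drives Vin of the cell under test, whose output Vout is loaded with FO = 0, 2 and 4; b) Vin and Vout waveforms against time, with the delay at rise and the delay at fall measured between the 50% Vdd points.]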
Figure 33 a) Generalized Simulation Setup for Propagation Delay Measurement b) Definition of Propagation Delay (Adapted from [2])

3.2.2. Power Consumption Measurement
Power consumption measurement is the most important simulation procedure in this project, since the project objective is the design of an ultra-low power system. Static and dynamic consumption were measured for both PTL and CMOS logic circuits. As with the delay measurement, this procedure focused on PTL circuits such as AND/NAND, OR/NOR, XOR/XNOR, Full-Adder and Load Register. Only two CMOS logic circuits, the inverter and the tri-state buffer, were characterised. The simulations were carried out for different fan-outs and temperatures with different supply voltages, as shown in figure 32. A generalised test circuit is shown in figure 34. The circuit under test was powered by an external independent source, and the current drawn from this source was measured.

Static Power Consumption
A logic circuit is said to be in static mode when its input signal does not change state. The main sources of static power dissipation are gate leakage (tunnelling of electrons through the gate oxide), reverse-biased junction leakage (diode leakage between diffusion regions) and sub-threshold conduction (due to carrier diffusion for supply voltages smaller than the threshold voltage) [14]. Since the project is designed for a sub-threshold supply, static power dissipation is expected to be a significant source of power consumption. It becomes more prominent because the PTL circuits are slow and require a longer processing time. It has already been mentioned in chapter 2 that the level-restoring inverter used in PTL causes static power dissipation due to short-circuit conduction. Most of the PTL circuits in this project use this inverter, and therefore the short-circuit dissipation is
also a major source of static dissipation in this project. Static power dissipation is modelled with (12) for both PTL and CMOS circuits. Here, the measured current IDD is multiplied by the supply voltage VDD to derive the static power consumption. The current is measured directly from the simulation result. Figure 32 shows the different simulation conditions for a circuit under test.

$$P_{static} = I_{DD} \cdot V_{DD} \qquad (12)$$
Dynamic Power Consumption
Dynamic power dissipation is produced by the energy drawn from the supply to charge and discharge the output node capacitance of a logic circuit. Energy consumption therefore depends on the rate at which the output signals change state.

$$P_{dynamic} = V_{DD} \cdot \underbrace{\frac{1}{t_2 - t_1}\int_{t_1}^{t_2} i_{DD}(t)\,dt}_{\text{average current measurement}} \qquad (13)$$

The measurement of dynamic consumption can be modelled with expression (13). As with the static measurement, the average value of the current can be taken directly from the simulation result. The Spectre simulator provides the average current over a chosen interval of the simulation, which is then multiplied by the circuit Vdd to obtain the dynamic dissipation. The simulations for dynamic dissipation were carried out for two temperature values only (27 °C and 85 °C), as shown in figure 32.
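The calculation behind expressions (12) and (13) can be sketched as follows. This is an illustration only: it assumes the supply current is available as a sampled waveform and averages it with the trapezoidal rule, whereas the project took the average directly from the simulator. The function names and sample values are hypothetical.

```python
def average_current(times, currents):
    """Time-average of a sampled current waveform using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (currents[i] + currents[i - 1]) * dt
    return area / (times[-1] - times[0])


def power(vdd, i_avg):
    """Power drawn from the supply: supply voltage times average supply current."""
    return vdd * i_avg


if __name__ == "__main__":
    # Hypothetical current samples (arbitrary units) over one switching interval.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    i_dd = [0.0, 2.0, 2.0, 0.0, 0.0]
    print(power(0.5, average_current(t, i_dd)))
```

For the static case the input is held constant, so the averaged current reduces to the steady supply current and the same product gives the static power.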
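[Figure 34 schematic: a pulse stimulus drives Vin of the cell under test; the cell is powered from a separate test supply (Vdd Test) whose current is measured, and Vout is the cell output.]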
Figure 34 General form of Simulation Circuit used for Current Measurements to Extract Static and Dynamic Power Dissipation (Adapted from [2])

3.3. Presentation of Results
3.3.1. Propagation Delay
The results of the propagation delay measurement for the PTL logic circuits AND/NAND, OR/NOR, XOR/XNOR and Load Register and for the CMOS Inverter are presented in table 1. As shown in the simulation tree diagram of figure 32, the simulations were carried out for three different fan-out circuits (0, 2 and 4). Each fan-out circuit was tested at 3 different temperatures: -20 °C, 27 °C and 85 °C. The supply voltages used for the simulations were 0.3V, 0.4V, 0.5V and 0.6V. The propagation delay characteristics in table 1 can be summarised in three aspects. Firstly, the delay is strongly dependent on temperature. At the very low temperature of -20 °C, all the circuits have delays of thousands of microseconds. An increase in temperature improves the delay by 10 times to more than 1000 times. For example, for a Vdd of 0.5V and FO = 4, the AND/NAND gate has a delay of 542.91 µs at -20 °C,
whereas at 85 °C the delay falls to 2.53 µs. The second aspect is the influence of supply voltage. In the deep sub-threshold region all the logic circuits are very slow. As Vdd increases towards the threshold voltage, the speed of the circuits increases sharply. The third aspect is the fan-out of the circuits: for larger fan-out, the circuits tend to show a larger propagation delay.

Table 1 Propagation Delay of PTL and CMOS Logic Circuits Under Different Simulation Conditions
All delays are in µs.

| Circuit | VDD (V) | FO=0, -20C | FO=0, 27C | FO=0, 85C | FO=2, -20C | FO=2, 27C | FO=2, 85C | FO=4, -20C | FO=4, 27C | FO=4, 85C |
|---|---|---|---|---|---|---|---|---|---|---|
| PTL AND/NAND | 0.30 | 4869.80 | 364.85 | 11.18 | 9107.45 | 561.79 | 15.88 | 21630.50 | 836.45 | 29.28 |
| PTL AND/NAND | 0.40 | 1561.89 | 70.11 | 3.60 | 1682.90 | 98.71 | 4.38 | 3262.60 | 132.63 | 5.55 |
| PTL AND/NAND | 0.50 | 473.05 | 25.77 | 1.64 | 533.73 | 27.11 | 2.05 | 542.91 | 42.00 | 2.53 |
| PTL AND/NAND | 0.60 | 135.45 | 7.56 | 0.68 | 140.08 | 14.26 | 0.80 | 191.01 | 18.89 | 0.85 |
| PTL OR/NOR | 0.30 | 8148.50 | 370.91 | 11.18 | 9398.70 | 384.74 | 473.70 | 20477.45 | 581.33 | 906.97 |
| PTL OR/NOR | 0.40 | 1379.50 | 72.07 | 2.31 | 2271.20 | 93.27 | 4.51 | 2485.05 | 224.26 | 11.47 |
| PTL OR/NOR | 0.50 | 412.29 | 45.18 | 1.86 | 566.43 | 18.57 | 5.07 | 640.06 | 53.31 | 8.41 |
| PTL OR/NOR | 0.60 | 120.10 | 6.62 | 0.53 | 143.80 | 16.20 | 4.05 | 163.91 | 49.42 | 6.93 |
| PTL XOR/XNOR | 0.30 | 4091.50 | 223.40 | 113.20 | 8995.00 | 322.91 | 184.82 | 19310.00 | 898.58 | 723.08 |
| PTL XOR/XNOR | 0.40 | 1532.50 | 116.87 | 36.62 | 2856.50 | 153.50 | 148.32 | 3178.85 | 218.32 | 170.00 |
| PTL XOR/XNOR | 0.50 | 535.50 | 93.19 | 0.89 | 604.32 | 133.31 | 46.00 | 2629.06 | 184.26 | 109.71 |
| PTL XOR/XNOR | 0.60 | 127.60 | 6.47 | 0.60 | 218.83 | 61.21 | 2.60 | 2420.57 | 117.50 | 34.94 |
| PTL Load Register | 0.30 | 15521.85 | 712.36 | 14.31 | 17525.21 | 872.54 | 24.05 | 23814.12 | 1074.21 | 30.08 |
| PTL Load Register | 0.40 | 3805.21 | 209.23 | 10.54 | 4325.57 | 167.28 | 11.38 | 7808.96 | 424.53 | 14.27 |
| PTL Load Register | 0.50 | 789.87 | 28.21 | 1.92 | 1201.00 | 47.12 | 2.11 | 1402.84 | 87.06 | 2.81 |
| PTL Load Register | 0.60 | 247.36 | 5.42 | 0.74 | 265.37 | 7.06 | 0.83 | 207.32 | 1.08 | 0.98 |
| CMOS Inverter | 0.30 | 1719.50 | 105.54 | 2.65 | 1764.50 | 200.68 | 2.83 | 20263.22 | 301.16 | 3.03 |
| CMOS Inverter | 0.40 | 427.67 | 7.63 | 0.38 | 792.93 | 17.69 | 0.46 | 1155.87 | 23.87 | 0.83 |
| CMOS Inverter | 0.50 | 21.38 | 0.81 | 0.04 | 38.15 | 0.92 | 0.08 | 55.35 | 1.99 | 0.10 |
| CMOS Inverter | 0.60 | 0.95 | 0.04 | 0.01 | 1.82 | 0.10 | 0.02 | 2.57 | 0.23 | 0.03 |
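The temperature dependence noted above can be made concrete by dividing the -20 C delay by the 85 C delay for one cell. The sketch below does this for the PTL AND/NAND gate at FO = 4, using the values listed in Table 1; the variable and function names are illustrative only.

```python
# Delays for the PTL AND/NAND gate at FO = 4, copied from Table 1.
# Keys are Vdd in volts; values are (delay at -20 C, delay at 85 C) in the table's units.
AND_NAND_FO4 = {
    0.30: (21630.50, 29.28),
    0.40: (3262.60, 5.55),
    0.50: (542.91, 2.53),
    0.60: (191.01, 0.85),
}


def speedup(delay_cold, delay_hot):
    """Factor by which the delay shrinks between the cold and the hot corner."""
    return delay_cold / delay_hot


ratios = {vdd: speedup(cold, hot) for vdd, (cold, hot) in AND_NAND_FO4.items()}

if __name__ == "__main__":
    for vdd in sorted(ratios):
        print(f"Vdd = {vdd:.2f} V: delay at -20 C is {ratios[vdd]:.0f} times the delay at 85 C")
```

For this cell the ratio stays in the hundreds at every supply voltage, which sits inside the range of 10 to more than 1000 times quoted in the text.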
3.3.2. Power Consumption
Static Power Consumption
The static power consumption of the basic circuits is shown in figures 35 and 36. Figure 35 covers six PTL circuits: AND/NAND, OR/NOR, XOR/XNOR, MUX4, Full-Adder and Load Register. Figure 36 shows the static power consumption of two CMOS circuits, the Inverter and the Tri-State Buffer. The simulations were executed for fan-out 1 and 4 circuits under different temperatures and supply voltages, as shown in the simulation tree diagram of figure 32. The figures show that static power consumption depends on the supply voltage Vdd and on temperature, but is independent of the fan-out connection. For all of the logic circuits, static power is almost identical for fan-out 1 and 4 at a given Vdd and temperature. For example, the static power consumption of the AND/NAND gate at 85 °C and 0.6V is 4.19pW for fan-out 1 and 4.21pW for fan-out 4. Static consumption goes up with a rise in temperature and Vdd. At a high temperature such as 85 °C, all the circuits have very high power consumption. For larger circuits with a higher number of transistors, like the Load Register and the Full-Adder, the increase in static power consumption is significant. It can be summarised from figures 35 and 36 that temperature has a much stronger influence on static power consumption than the sub-threshold supply voltage.
[Figure 35 panels: a) AND/NAND, b) OR/NOR, c) XOR/XNOR, d) MUX4, e) Full-Adder, f) Load Register. Each panel is a bar chart of static power (pW) against Vdd (0.30 V, 0.40 V, 0.50 V and 0.60 V) with six series: -20 °C, 27 °C and 85 °C, each for FO=1 and FO=4.]
Figure 35 Static Power Consumption of Basic PTL Circuits
[Figure 36 panels: a) Inverter, b) Tri-State Buffer. Each panel is a bar chart of static power (pW) against Vdd (0.30 V, 0.40 V, 0.50 V and 0.60 V) for the temperatures and fan-outs of figure 32.]
Figure 36 Static Power Consumption of Basic CMOS Circuits

Dynamic Power Consumption
The results for the dynamic power consumption of the basic PTL circuits and CMOS circuits are shown in figures 37 and 38 respectively. The same circuits that were simulated for static consumption were simulated for dynamic consumption. However, the simulations were performed only at 27 °C and 85 °C, for fan-out 1 and 4 circuits under different supply voltages, as shown in figure 32. Unlike static consumption, dynamic consumption increases for larger fan-out circuits. Like static consumption, however, it increases with rising temperature and supply voltage. At a high temperature of 85 °C all the circuits consume a significant amount of power. The larger the circuit, the higher the power consumption (Full-Adder and Load Register).
[Figure 37 panels: a) AND/NAND, b) OR/NOR, c) XOR/XNOR, d) MUX4, e) Full-Adder, f) Load Register. Each panel is a bar chart of dynamic power (pW) against Vdd (0.30 V, 0.40 V, 0.50 V and 0.60 V) with four series: 27 °C and 85 °C, each for FO=1 and FO=4.]
Figure 37 Dynamic Power Consumption of Basic PTL Circuits

[Figure 38 panels: a) Inverter, b) Tri-State Buffer. Each panel is a bar chart of dynamic power (pW) against Vdd (0.30 V, 0.40 V, 0.50 V and 0.60 V).]
Figure 38 Dynamic Power Consumption of Basic CMOS Circuits

3.4. Result Analysis
The results presented in the previous section characterise the circuits in terms of propagation delay and both static and dynamic power consumption. When analysing these results, it is important to assess the data in the context of the applications being targeted. Moreover, the results were generated from simulations on a limited number of circuits. They can therefore only outline the trends of the expected behaviour under the given conditions.

The propagation delay results show that the sub-threshold PTL basic circuits have very long delays. The delay of a logic circuit is determined by the time required to charge and discharge the circuit capacitance. The combination of pass-transistor logic and the sub-threshold supply is the main reason for such delay. Typically, an nMOS pass-transistor presents a high impedance at its output when there is no gate voltage. Moreover, due to the body effect the nMOS transistor passes a degraded output voltage signal, which in turn increases the delay. As discussed in chapter 2, for a sub-threshold gate voltage the channel conductance of a MOS transistor is very low, which leads to a high device resistance, and the device operates as a voltage-controlled resistor. All these factors increase the resistance term of the time constant for charging and discharging the circuit capacitance, which in turn leads to a longer propagation delay.

The results show that the delay reduces with an increase in supply voltage. This is due to the reduction in device resistance explained above. The relation between device resistance and supply voltage can be conceptualised from (14), which approximates the resistance of a MOS transistor operating in the linear region for small VDS [14]. It shows that the resistance falls as the gate voltage, and therefore the supply voltage, rises.

$$R = \frac{1}{\beta} \cdot \frac{1}{(V_{GS} - V_T)} \qquad (14)\ [14]$$

Here $\beta$ is the gain factor of the transistor and $V_{GS}$ is the gate-source voltage.
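For readers who want the step behind (14), the following is a short sketch of how it follows from the standard long-channel drain current expression for the linear region. That expression is not reproduced in this section, so it is stated here as an assumption, and $C$ is used as a generic symbol for the capacitance being charged.

```latex
\begin{align*}
I_{DS} &= \beta \left( V_{GS} - V_T - \frac{V_{DS}}{2} \right) V_{DS}
        \;\approx\; \beta \,(V_{GS} - V_T)\, V_{DS}
        && \text{for } V_{DS} \ll V_{GS} - V_T \\
R &= \frac{V_{DS}}{I_{DS}} \;\approx\; \frac{1}{\beta \,(V_{GS} - V_T)} \\
t_{pd} &\propto R\,C
        && \text{so a larger } V_{GS} \text{ gives a smaller delay}
\end{align*}
```

The approximation holds only above threshold; in the sub-threshold region the current depends exponentially on the gate voltage, so (14) should be read as a qualitative guide to the trend rather than a quantitative model.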
The delay results for different fan-outs show that the delay increases with fan-out. This is because additional fan-out connections add capacitance to the output node of the circuit under test and hence increase the propagation delay.

The effect of temperature on propagation delay in sub-threshold PTL is one of the major findings of this project in the field of low power design. The results show that the delay of the circuits
decreases by as much as a thousand times as the temperature increases. This is due to the reverse temperature effect at low supply voltage [32], [33]. A rise in temperature lowers the threshold voltage VT, which increases the drain current. At the same time, a rise in temperature degrades the carrier mobility, which reduces the drain current. The drain current is therefore determined by whichever of these two competing effects dominates at the given supply voltage and temperature. Usually for VDS>>VT, the drain current is dominated by carrier mobility [32]: a rise in temperature decreases the carrier mobility [14], and hence the speed of the transistor is degraded. For lower supply voltages around the threshold, the VT effect becomes dominant and the drain current goes up with temperature, giving better speed performance. Speed performance therefore improves at higher temperature for sub-threshold supply voltages. This temperature effect has to be considered carefully during design, since the speed behaviour depends on the ambient temperature.

Static power consumption is a major source of power dissipation in this project, as shown by the results. The results suggest that the static power consumption is almost as large as the dynamic consumption. For example, the static power consumption of the PTL XOR/XNOR gate is 11.15pW, whereas its dynamic power consumption is 14.41pW. As mentioned earlier, apart from sub-threshold conduction, static power dissipation is dominated by the number of inverters used in both PTL and CMOS circuits. Each additional inverter adds to the leakage current of the whole circuit, which in turn increases the total static consumption. Therefore, PTL circuits like the Full-Adder and Load Register have very high static consumption, since they use 3 and 4 inverters respectively. The results show that static power increases with a rise in supply voltage and temperature.
When the supply voltage is 0.6V, all the circuits show a considerable increase in static consumption. This is because at a 0.6V supply the circuits are no longer in sub-threshold conduction and the circuit resistance reduces. With an increase in supply voltage, channel conduction increases and the circuits consume more static power. This is more obvious for bigger circuits like the PTL Full-Adder and Load Register. A rise in temperature also increases leakage, which adds to the static consumption. However, the results show that static power is not influenced by the circuit fan-out, since added circuit capacitance does not affect static consumption.

The dynamic power consumption increases with supply voltage, fan-out and temperature. Expression (10) asserts that dynamic consumption is proportional to the square of the supply voltage VDD. The results accordingly show that a slight increase in supply voltage increases the dynamic consumption considerably. For larger fan-out circuits, the output node capacitance increases, which causes more dynamic dissipation (10). As explained earlier, high temperature at low supply voltage improves the speed of the circuit, which means a higher frequency of operation. This rise in frequency also leads to more dynamic dissipation (10). It was also observed that circuits with a higher number of transistors, such as the Full-Adder and Load Register, consume more dynamic power because of their higher circuit capacitance. However, the PTL circuits have comparatively low dynamic dissipation, since PTL circuits usually require fewer transistors than their CMOS counterparts.

All the test results suggest that sub-threshold PTL circuits depend strongly on supply voltage and temperature in terms of speed and power consumption. This dependence has to be considered carefully when designing sub-threshold PTL circuits, and it makes the design of such circuits more challenging.
However, the simulations were carried out on a limited number of logic circuits, with a design approach chosen for the targeted energy-constrained applications. There are other approaches to energy-efficient design in both the PTL and CMOS methods. The following chapter discusses a more practical and complex design, an Arithmetic Logic Unit based on the simulated logic circuits, which provides a wider window for assessing this energy-efficient design approach.
CHAPTER 4 ARITHMETIC LOGIC UNIT DESIGN, POWER MEASUREMENTS AND RESULTS ANALYSIS
The main goal of this project was to develop an energy efficient 8-bit Arithmetic Logic Unit (ALU) using sub-threshold pass-transistor logic. This chapter describes the design of the 1-bit and 8-bit versions of this ALU. For the purpose of power consumption comparison, the designs were also carried out in parallel in CMOS style. The result section shows the power consumption of both the PTL and CMOS ALU designs, along with that of the different versions of the 1-bit ALU design. The results show that the PTL ALU can operate from a sub-threshold supply, and they confirm the energy efficiency of the PTL designs compared to their CMOS counterparts.

4.1. ALU Design
Traditionally, an ALU is a combinational block of basic logic circuits performing arithmetic and logic functions. The output signal of the ALU is then saved into an external register in the datapath of a typical processor. In this project, however, a load register is combined with the arithmetic and logic units in order to include the clock edge and reset operation in the ALU block. Since the targeted design method is PTL, the structure of the ALU core avoids chain structures, which would cause more signal degradation and delay. These effects would require more inverters in the design and lead to more power consumption. The ALU is therefore designed in a tree structure, trading more design area for less power consumption. There are 4 different versions of the 1-bit PTL ALU. The successful operation of the basic versions led to further enhancement of the operations in later versions. The 8-bit version is based on the latest version of the 1-bit ALU, which is further discussed in section 4.1.3.

4.1.1. 1-Bit PTL Design
ALU version 1
ALU version 1 is a very basic model for arithmetic and logic operation, as shown in figure 39. The complementary output signals nS of the Full-Adder and nQ of the MUX4 are not required for the ALU operation.
Therefore, they are left floating and tied to a no-connection element to suppress any simulation warning in Cadence. The ALU can perform a total of 2 arithmetic operations and 6 logic operations. Table 2 shows the truth table. Out of 9 control signals, 7 are used to select the different operations and the other two are used in the MUX4 to select the final output signal. The two arithmetic operations, addition (ADD) and two's-complement subtraction (SUB2C), are controlled by the value of the Cin signal regardless of the other control signal values. The XOR gate combined with the Full-Adder block serves as a subtraction circuit. When Cin is set to 1, the Full-Adder carries out the two's-complement subtraction; otherwise it performs addition. For a logic operation, the corresponding control signal is set to logic 1 and the control signal of the complementary operation has to be set to logic 0. The use of several control signals was the result of a cautious design approach in the initial stage. Based on the successful operation of this version, the designs for later versions were modified and enhanced further.
[Figure 39 block diagram. Blocks: XOR/XNOR (inputs B and Cin), Full-Adder (FA), AND/NAND, OR/NOR, XOR/XNOR, SMUX2, SMUX4 and load register (ldreg). Inputs: A, B, Cin, the control signals AND, NAND, OR, NOR, XOR, XNOR, Select_0 and Select_1, and Clock, Load and nReset. Outputs: Cout, nCout, Result and nResult.]
Figure 39 Block Diagram of PTL 1-Bit ALU Version 1

Table 2: Truth Table for PTL 1-Bit ALU Version 1

| Select_0 | Select_1 | Control Signal | Operation |
|---|---|---|---|
| 1 | 1 | Cin = 0 | ADD (Result = A + B) |
| 1 | 1 | Cin = 1 | SUB2C (Result = A + B' + 1) |
| 0 | 1 | AND = 1, NAND = 0 | AND (Result = A And B) |
| 0 | 1 | NAND = 1, AND = 0 | NAND (Result = Not (A And B)) |
| 1 | 0 | OR = 1, NOR = 0 | OR (Result = A Or B) |
| 1 | 0 | NOR = 1, OR = 0 | NOR (Result = Not (A Or B)) |
| 0 | 0 | XOR = 1, XNOR = 0 | XOR (Result = A Xor B) |
| 0 | 0 | XNOR = 1, XOR = 0 | XNOR (Result = Not (A Xor B)) |

ALU version 1.1
Version 1.1, shown in figure 40, is a modified form of ALU version 1. It has the same type and number of operations as the previous version. The only modification is the reduction in control signals. Compared to the 9 control signals of the last version, this version operates on only 3 control signals, as shown in table 3, without any additional circuits. This modification significantly simplifies the truth table and hence the operation of the whole ALU block. Moreover, a smaller number of pins is also convenient for chip design. Most importantly, the simulation results show that version 1.1 is more power
efficient compared to version 1.
[Figure 40 block diagram. Blocks: XOR/XNOR (inputs B and S0), Full-Adder (FA), AND/NAND, OR/NOR, XOR/XNOR, SMUX2, SMUX4 and load register (ldreg). Inputs: A, B, the select signals S0, S1 and S2, and Clock, Load and nReset. Outputs: Cout, nCout, Result and nResult.]
Figure 40 Block Diagram of PTL 1-Bit ALU Version 1.1

Table 3: Truth Table for PTL 1-Bit ALU version 1.1

| S2 | S1 | S0 | Operation |
|---|---|---|---|
| 0 | 0 | 0 | ADD (Result = A + B) |
| 0 | 0 | 1 | SUB2C (Result = A + B' + 1) |
| 0 | 1 | 0 | AND (Result = A And B) |
| 0 | 1 | 1 | NAND (Result = Not (A And B)) |
| 1 | 0 | 0 | OR (Result = A Or B) |
| 1 | 0 | 1 | NOR (Result = Not (A Or B)) |
| 1 | 1 | 0 | XOR (Result = A Xor B) |
| 1 | 1 | 1 | XNOR (Result = Not (A Xor B)) |

ALU Version 2
The simulation results of the earlier basic ALU versions confirm that sub-threshold PTL remains functional for such a large hierarchical block. However, these versions have only 8 operations, which might be inadequate for the targeted applications. The design therefore focused on increasing the number of functions of the ALU block. This led to the addition of 3 more basic circuits, which resulted in ALU version 2 with 16 operations. It has two different blocks: an Arithmetic Unit (AU), which performs 8 arithmetic operations, and a Logic Unit (LU), which performs 8 logical operations.

Arithmetic Unit (AU)
The arithmetic unit is based on the Full-Adder block. Full-Adder inputs can be manipulated to
perform a number of arithmetic operations. The following are examples of such operations, each given with the required adder inputs [34]:
Increment (S=A+1): A=A, B=0 and Cin=1
Decrement (S=A-1): A=A, B=1 and Cin=0
Transfer (S=A): A=A, B=0 and Cin=0, or A=A, B=1 and Cin=1
Addition (S=A+B): A=A, B=B and Cin=0
Add with Carry (S=A+B+1): A=A, B=B and Cin=1
Ones Complement Subtraction (S=A+B'): A=A, B=B' and Cin=0
Twos Complement Subtraction (S=A+B'+1): A=A, B=B' and Cin=1
The Full-Adder inputs B and Cin therefore have to take different states for different arithmetic operations. The Cin signal only needs to be at either logic 0 or logic 1. The B input of the adder, however, needs to take one of 4 different states (0, B, B' and 1). The truth table for the adder input B can therefore be abbreviated as shown in table 4 [34]. The Boolean function for the adder input B can be deduced from a K-map as (15) [34]. This explains the additional logic circuits used for signal B along with the selection signals S1 and S2 (figure 41). The Full-Adder input signal Cin is configured as the other selection signal S0. The truth table for the AU is shown in table 5.

Table 4: Truth Table for Signal B

| S2 | S1 | Adder input B |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | B |
| 1 | 0 | B' |
| 1 | 1 | 1 |

$$B_{adder} = S_2 \cdot B' + S_1 \cdot B \qquad (15)$$

Here $B_{adder}$ is the value applied to the B input of the Full-Adder and $B'$ is the complement of B.
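Expression (15) can be checked against table 4 by enumerating every combination of S2, S1 and B. The sketch below is only a behavioural check; the function and dictionary names are illustrative and do not correspond to any cell in the design.

```python
def adder_b_input(s2: int, s1: int, b: int) -> int:
    """Value applied to the Full-Adder B input: S2.B' + S1.B (expression (15))."""
    return (s2 & (1 - b)) | (s1 & b)


# Expected adder input for each (S2, S1) pair, following table 4.
TABLE_4 = {
    (0, 0): lambda b: 0,      # constant 0
    (0, 1): lambda b: b,      # B
    (1, 0): lambda b: 1 - b,  # B'
    (1, 1): lambda b: 1,      # constant 1
}

if __name__ == "__main__":
    for (s2, s1), expected in TABLE_4.items():
        for b in (0, 1):
            assert adder_b_input(s2, s1, b) == expected(b)
    print("expression (15) reproduces table 4")
```

The last row works because B' + B is always 1, so asserting both select signals forces the adder input high regardless of B.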
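[Figure 41 block diagram. Blocks: two AND/NAND gates (B with S1, and the complement of B with S2), an OR/NOR gate combining their outputs, and a Full-Adder (FA) with carry input S0. Inputs: A, B, S2, S1, S0 and their complements. Outputs: G, nG, Cout and nCout.]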
Figure 41 Block Diagram of Arithmetic Unit (AU) for ALU Version 2

Table 5: Truth Table for Arithmetic Unit (AU)

| S2 | S1 | S0 | Arithmetic Operation | Required Adder Input B | Cin |
|---|---|---|---|---|---|
| 0 | 0 | 0 | Transfer1 (A) | 0 | 0 |
| 0 | 0 | 1 | Increment (A+1) | 0 | 1 |
| 0 | 1 | 0 | ADD (A+B) | B | 0 |
| 0 | 1 | 1 | ADDC (A+B+1) | B | 1 |
| 1 | 0 | 0 | SUB1C (A+B') | B' | 0 |
| 1 | 0 | 1 | SUB2C (A+B'+1) | B' | 1 |
| 1 | 1 | 0 | Decrement (A-1) | 1 | 0 |
| 1 | 1 | 1 | Transfer2 (A) | 1 | 1 |
As mentioned earlier, the AU can perform 8 arithmetic operations. These arithmetic operations are widely used in generic processors, so developing a unit with multiple arithmetic operations was an important step in the design process.

Logic Unit (LU)
The LU of ALU version 2 is shown in figure 42. It can perform 8 logic operations. The only difference in logic functions from the earlier versions of the ALU is that it can also perform the logical NOT operation on input signal A (figure 42). Table 6 shows the truth table for the LU. Different combinations of the 3 select signals S2, S1 and S0 provide the different logic operations on input signals A and B.
[Figure 42 block diagram. Blocks: AND/NAND, OR/NOR and XOR/XNOR gates, SMUX2 cells selected by S2, and an SMUX4 with data inputs D1 to D4 selected by S0 and S1. Inputs: A, B, S2, S1, S0 and their complements. Outputs: Q and nQ.]
Figure 42 Block Diagram of Logic Unit (LU) for ALU Version 2

Table 6: Truth Table for Logic Unit

| S2 | S1 | S0 | Operation |
|---|---|---|---|
| 0 | 0 | 0 | AND |
| 0 | 0 | 1 | OR |
| 0 | 1 | 0 | XOR |
| 0 | 1 | 1 | NOT |
| 1 | 0 | 0 | NAND |
| 1 | 0 | 1 | NOR |
| 1 | 1 | 0 | XNOR |
| 1 | 1 | 1 | NOT |

Combination of the Two: Arithmetic Logic Unit
ALU version 2 is designed by combining the AU and the LU, as shown in figure 43. The ALU can therefore perform 8 arithmetic and 8 logic operations, as shown in table 7. The input signals A and B are connected through tri-state buffers, with the select signal S3 operating as the enable signal for the buffers. The buffers are used because both the AU and the LU are large combinational blocks that consume power whenever their inputs are driven. When S3 is set to logic 0, the input signals are fed to the AU of the ALU. When S3 is set to logic 1, the LU is fed with the input signals. The same select signal S3 also determines which type of operation appears at the output (table 7). A proper combination of the 4 select signals S3, S2, S1 and S0 provides the desired ALU operation at the output for input signals A and B (table 7).
[Figure 43 block diagram. Blocks: four tri-state buffers (trisbuf) on the A and B inputs with S3 as enable, the arithmetic unit (au_1bit), the logic unit (lu_1bit), an SMUX2 selected by S3 and a load register with Clock, Load and nReset. Inputs: A, B, S3, S2, S1 and S0. Outputs: G, nG, Cout and nCout.]
Figure 43 Block Diagram of Arithmetic Logic Unit Version 2

Table 7: Truth Table for Arithmetic Logic Unit Version 2

| S3 | S2 | S1 | S0 | Operation |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | Transfer1 |
| 0 | 0 | 0 | 1 | Increment |
| 0 | 0 | 1 | 0 | ADD |
| 0 | 0 | 1 | 1 | ADDC |
| 0 | 1 | 0 | 0 | SUB1C |
| 0 | 1 | 0 | 1 | SUB2C |
| 0 | 1 | 1 | 0 | Decrement |
| 0 | 1 | 1 | 1 | Transfer2 |
| 1 | 0 | 0 | 0 | AND |
| 1 | 0 | 0 | 1 | OR |
| 1 | 0 | 1 | 0 | XOR |
| 1 | 0 | 1 | 1 | NOT |
| 1 | 1 | 0 | 0 | NAND |
| 1 | 1 | 0 | 1 | NOR |
| 1 | 1 | 1 | 0 | XNOR |
| 1 | 1 | 1 | 1 | NOT |

ALU Version 2.1
The simulation results of ALU version 2 showed very high power consumption compared to version 1.1. It was therefore necessary to design another version with lower power consumption while keeping the total number of operations at 16. The final version 2.1 was designed by removing the buffers from version 2. Figure 44 shows the block diagram of ALU version 2.1.
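The 16-operation encoding of table 7, which version 2.1 retains, can be summarised with a behavioural sketch of one ALU bit. This is a functional model only: it ignores the load register and all electrical behaviour, the function names are illustrative, and returning a carry of 0 for logic operations is an assumption made for the sketch rather than a property of the circuit.

```python
def au_slice(a, b, s2, s1, cin):
    """Arithmetic unit bit: full adder whose B input follows expression (15)."""
    b_in = (s2 & (1 - b)) | (s1 & b)
    total = a + b_in + cin
    return total & 1, total >> 1  # (sum bit G, carry out)


def lu_slice(a, b, s2, s1, s0):
    """Logic unit bit following table 6."""
    select = (s1 << 1) | s0
    if select == 3:
        return 1 - a                 # NOT A for both values of S2
    y = (a & b, a | b, a ^ b)[select]
    return 1 - y if s2 else y        # S2 picks the complemented gate output


def alu_slice(a, b, s3, s2, s1, s0):
    """One ALU bit: S3 = 0 gives the arithmetic result, S3 = 1 the logic result."""
    if s3:
        return lu_slice(a, b, s2, s1, s0), 0  # carry forced to 0 (assumption)
    return au_slice(a, b, s2, s1, cin=s0)     # S0 acts as the carry input


if __name__ == "__main__":
    print(alu_slice(1, 1, 0, 0, 1, 0))  # ADD of 1 and 1
    print(alu_slice(1, 1, 1, 1, 0, 0))  # NAND of 1 and 1
```

In the arithmetic half, S0 doubles as the carry input, which is why each pair of adjacent rows in table 7 differs only by the added 1.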
Figure 44 Block Diagram of ALU Version 2.1

Version 2.1 has the same operational functionality as version 2 (table 7). Version 1.1 is the most power efficient design, but it can perform only 8 operations. With 16 operations, ALU version 2.1 consumes more power, but only by a small margin. Power consumption of the different ALU versions is discussed further in the results section of this chapter.

4.1.2. 1-Bit CMOS Design
The 1-bit CMOS ALU design is based on version 2.1 of its PTL counterpart (figure 44). Like PTL version 2.1, it has two functional blocks, the Arithmetic Unit (AU) and the Logic Unit (LU), and it can perform 16 operations as shown in table 7. However, the inverting 2-input multiplexor requires an additional inverter to obtain a non-inverting output. Where an inverting MUX2 is used at the Load Register input, the outputs were instead labelled inversely to avoid the use of an additional inverter (figure 44).
Figure 45 1-Bit CMOS Design a) Arithmetic Unit b) Logic Unit and c) 1-Bit ALU

4.1.3. 8-Bit PTL Design
The 8-bit PTL ALU design is shown in figure 46. It is based on ALU version 2.1. Eight 1-bit ALU units are joined with the appropriate interconnections to perform 8-bit operations. The select signals S3, S2 and S1 are connected to every 1-bit ALU unit. The signal S0, however, is connected to every LU in the design but only to the AU of the first 1-bit ALU unit. The S0 input port of each remaining AU is connected to the Cout signal of the previous ALU unit, as shown in figure 46. This connection ensures proper carry propagation throughout the design. Two consecutive inverters are used on the carry path in each ALU unit to overcome signal degradation along the chain.
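The carry chain described above, in which each stage takes its carry input from the Cout of the previous stage, can be illustrated with a generic ripple-carry model. This is a sketch under stated assumptions: it uses a textbook full adder rather than the project's `au_1bit` cell, it models only plain addition, and it ignores the buffering inverters. The names `full_adder` and `ripple_add` are hypothetical.

```python
# Ripple-carry sketch: eight 1-bit stages, each fed by the previous Cout.
# Assumption: a textbook full adder stands in for the project's AU cell.


def full_adder(a, b, cin):
    """1-bit full adder: return (sum bit, carry-out bit)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, carry_in=0, width=8):
    """Add two unsigned integers stage by stage, least significant bit first."""
    carry = carry_in
    result = 0
    for bit in range(width):
        s, carry = full_adder((a >> bit) & 1, (b >> bit) & 1, carry)
        result |= s << bit
    # The final carry is the Cout of the most significant stage.
    return result, carry


if __name__ == "__main__":
    print(ripple_add(200, 100))
```

Because every stage waits for the carry of the stage before it, the worst-case delay of such a chain grows with the word length, which is why signal restoration along the carry path matters in the 8-bit design.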
Figure 46 Block Diagram of 8-bit PTL ALU
4.1.4. 8-Bit CMOS Design
The CMOS version of the 8-bit ALU is shown in figure 47. It uses the same design architecture as the PTL counterpart.
Figure 47 Block Diagram of 8-bit CMOS ALU

4.2. Power Consumption Measurements and Results
4.2.1. Simulation Setup
The ALU designs were simulated for both static and dynamic power consumption. The simulation setup and process are similar to those used for the basic circuit power simulations (figure 34). The designs were simulated under different conditions: different ambient temperatures, supply voltages and fan-outs. Figure 48 shows the simulation tree for the ALUs under these conditions.
Figure 48 Simulation Tree Diagram for ALU Designs

4.2.2. Results
The simulations were aimed at finding the worst-case power consumption of each circuit. It was important to find the input combination for which a circuit consumes the most static power. There are 4 input combinations for single-bit 2-input circuits. Each combination was run for every operation of ALU version 1 at a 0.3 V supply and 27 °C (table 8). The results show that every operation consumed the most power for one of two input combinations: A=0, B=0 or A=0, B=1. With these two combinations of inputs, further simulations were carried out for supply voltages of 0.4 V, 0.5 V and 0.6 V. Table 9 shows the average static power consumption for each supply voltage, and it can be concluded from the results that the worst-case input combination is A=0, B=1. This input combination was used for the static power simulations of all the ALU designs. Static consumption results for every operation of the ALU at each supply voltage are available in appendix C of this report.

Table 8 ALU Version 1 Static Power Consumption (pW) for Vdd = 0.3 V and FO = 1 at 27 °C
Input        ADD   SUB2C  AND   NAND  OR    NOR   XOR   XNOR
A= 1, B= 1   3.22  3.13   3.35  3.16  3.36  3.07  3.26  3.16
A= 1, B= 0   3.44  3.15   3.48  3.36  3.47  3.25  3.18  3.28
A= 0, B= 0   3.48  3.65   3.51  3.70  3.51  3.86  3.67  3.71
A= 0, B= 1   4.07  3.63   4.11  3.99  4.09  3.73  3.65  3.75

Table 9 Average Static Power Consumption (pW) at 27 °C
Vdd (V)  A= 0, B= 0  A= 0, B= 1
0.30     3.60        4.06
0.40     6.42        7.22
0.50     10.64       11.51
0.60     16.45       17.39

ALU version 2.1 was also simulated for the different input combinations, and A=0, B=1 was again found to cause the worst static consumption. Detailed simulation results are included in appendix C.

ALU Version 1 versus Version 1.1
The first PTL ALU, version 1, was modified and developed into version 1.1. The average static consumption and the dynamic consumption of both versions are shown in figure 49. Simulations were carried out for different supply voltages and temperatures as shown in the simulation tree diagram of figure 48. Figure 49b, however, shows dynamic power consumption for FO = 4 circuits only. The results of the detailed static and dynamic simulations for multiple fan-outs are provided in appendix C. The static consumption of ALU version 1.1 is marginally better than that of version 1. At 27 °C and Vdd = 0.3 V, both versions had the same dynamic consumption. However, as the temperature and supply voltage increase, version 1 consumes more power. Figure 49 therefore shows that ALU version 1.1 is more power efficient than version 1 in terms of both static and dynamic consumption.
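The selection of the worst-case input combination from table 8 can be reproduced with a few lines of code. The sketch below is illustrative: the names `TABLE_8`, `worst_input_per_operation` and `mean_power_per_input` are hypothetical, the numbers are the table 8 values, and the row means are used only to rank the input combinations.

```python
# Worst-case input search over the Table 8 data (Vdd = 0.3 V, FO = 1, 27 C).
# Values are static power in pW, keyed by the input combination (A, B).

OPERATIONS = ["ADD", "SUB2C", "AND", "NAND", "OR", "NOR", "XOR", "XNOR"]

TABLE_8 = {
    (1, 1): [3.22, 3.13, 3.35, 3.16, 3.36, 3.07, 3.26, 3.16],
    (1, 0): [3.44, 3.15, 3.48, 3.36, 3.47, 3.25, 3.18, 3.28],
    (0, 0): [3.48, 3.65, 3.51, 3.70, 3.51, 3.86, 3.67, 3.71],
    (0, 1): [4.07, 3.63, 4.11, 3.99, 4.09, 3.73, 3.65, 3.75],
}


def worst_input_per_operation(table):
    """For each operation, return the (A, B) pair with the highest power."""
    worst = {}
    for column, op in enumerate(OPERATIONS):
        worst[op] = max(table, key=lambda inputs: table[inputs][column])
    return worst


def mean_power_per_input(table):
    """Average power over all operations for each (A, B) pair."""
    return {inputs: sum(values) / len(values) for inputs, values in table.items()}


if __name__ == "__main__":
    print(worst_input_per_operation(TABLE_8))
    means = mean_power_per_input(TABLE_8)
    print(max(means, key=means.get))
```

Run on the table 8 data, the per-operation maximum always falls on A=0, B=0 or A=0, B=1, and the highest row mean belongs to A=0, B=1.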
[Figure 49 charts: power (pW) against Vdd (V) from 0.30 V to 0.60 V for ALU_ver1 and ALU_ver1.1; panel a) at -20 °C, 27 °C and 85 °C, panel b) at 27 °C and 85 °C.]
Figure 49 Power Consumption of ALU Version 1 and Version 1.1 a) Static Consumption for FO = 1 b) Dynamic Consumption for FO = 4

ALU Version 2 versus Version 2.1
The dynamic power consumption of ALU version 2 and version 2.1 is shown in figure 50. Version 2 consumed considerably more dynamic power than version 2.1. Across the simulated temperatures and supply voltages, version 2 consumes 1.3 to 6.5 times more power than version 2.1. For Vdd = 0.6 V, where the transistors are no longer in sub-threshold, the difference is significant at both 27 °C and 85 °C (figure 50).
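The quoted range of 1.3 to 6.5 can be checked against the bar labels of figure 50a (27 °C, FO = 1). The sketch below is a hedged illustration: the assignment of each data label to a version and a supply voltage is an assumption inferred from the extracted figure, not a table given in the text, and the name `power_ratios` is hypothetical.

```python
# Ratio check for ALU version 2 against version 2.1 at 27 C and FO = 1.
# Assumption: the pairing of data labels with versions and supply voltages
# is inferred from the bar labels of figure 50a.

VDD = [0.30, 0.40, 0.50, 0.60]                 # supply voltage in V
VERSION_2 = [9.53, 22.84, 64.30, 245.81]       # dynamic power in pW
VERSION_2_1 = [7.23, 13.24, 22.13, 37.83]      # dynamic power in pW


def power_ratios(reference, candidate):
    """Element-wise ratio candidate / reference."""
    return [c / r for r, c in zip(reference, candidate)]


RATIOS = power_ratios(VERSION_2_1, VERSION_2)

if __name__ == "__main__":
    for vdd, ratio in zip(VDD, RATIOS):
        print(f"Vdd = {vdd:.2f} V: version 2 / version 2.1 = {ratio:.2f}")
```

Under this pairing the ratio grows with the supply voltage, with the smallest value at 0.3 V and the largest at 0.6 V.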
[Figure 50 charts: dynamic power (pW) against Vdd (V) from 0.30 V to 0.60 V for ALU version 2 and ALU_ver2.1; panel a) at 27 °C, panel b) at 85 °C.]
Figure 50 Dynamic Power Consumption of Version 2 and Version 2.1 a) At 27 °C for FO = 1 and b) At 85 °C for FO = 1

ALU Version 1.1 versus Version 2.1
Previous results showed that for 8 operations ALU version 1.1 is the most power efficient design, whereas for 16 operations version 2.1 shows better power efficiency. A comparison of both static (average) and dynamic power consumption is shown in figure 51. At the low temperature of -20 °C, version 2.1 shows only slightly higher static consumption than version 1.1 (figure 51a). As the temperature increases, the gap widens and version 2.1 consumes more power. A similar trend was observed for dynamic operation. However, the difference in power consumption in the worst case (Vdd = 0.6 V and temperature = 85 °C) is not very significant: version 1.1 consumes 144.58 pW of static and 186.02 pW of dynamic power, while version 2.1 consumes 165.10 pW of static and 217.96 pW of dynamic power. Therefore, the additional consumption of version 2.1 over version 1.1 is not significant.
[Figure 51 charts: power (pW) against Vdd (V) from 0.30 V to 0.60 V for ALU_ver1.1 and ALU_ver2.1; panel a) at -20 °C, 27 °C and 85 °C, panel b) at 27 °C and 85 °C.]
Figure 51 Power Consumption Comparison for ALU Version 1.1 and ALU Version 2.1 a) Static Power Consumption at -20 °C, 27 °C and 85 °C for FO = 1 b) Dynamic Power Consumption for FO = 4

Figure 51 includes only the average value of static power and the dynamic power for FO = 4. The static consumption of each operation under the different simulation conditions is provided in appendix C. Similarly, dynamic consumption results for the different fan-out circuits are included in appendix C.

PTL versus CMOS 1-BIT ALU
As mentioned earlier, PTL ALU version 2.1 was chosen as the final version of the PTL design. The CMOS 1-bit ALU is the counterpart of PTL ALU version 2.1. Simulations were carried out for different temperatures, supply voltages and fan-outs as shown in figure 48. All the simulation results are included in appendix C. A part of the simulation results is shown in figure 52. In the sub-threshold region, the CMOS ALU consumes only slightly more static power. However, for Vdd = 0.6 V the CMOS ALU showed significantly higher static consumption (figure 52a). For dynamic dissipation, the CMOS ALU consumes more power, as expected. The only exception was the 0.3 V supply at 85 °C, where the CMOS ALU consumed 13.36 pW whereas the PTL ALU consumed 47.11 pW.
[Figure 52 charts: power (pW) against Vdd (V) from 0.30 V to 0.60 V for ALU-1bit_CMOS and ALU-1bit_PTL; panel a) at 27 °C, panel b) at 27 °C and 85 °C.]
Figure 52 Power Consumption Comparison of 1-Bit ALU a) Static Power at 27 °C for FO = 1 b) Dynamic Power at 27 °C and 85 °C for FO = 4

PTL VS CMOS 8-BIT ALU
Figure 53 shows the power comparison for the CMOS and PTL 8-bit ALU designs. In sub-threshold, the CMOS ALU consumes from 1.75 to a few hundred times more static power than the PTL ALU. For the 0.6 V supply, which is outside the sub-threshold region, the excess consumption of CMOS is even more severe (figure 53a). A similar trend was observed for dynamic consumption (figure 53b).
[Figure 53 charts: power (pW) against Vdd (V) from 0.30 V to 0.60 V for ALU-8bit_CMOS and ALU-8bit_PTL; panel a) at -20 °C, 27 °C and 85 °C, panel b) at 27 °C and 85 °C.]
Figure 53 Power Consumption of 8-Bit ALU PTL and CMOS a) Static Power for FO = 1 b) Dynamic Power for FO = 4
4.3. Result Analysis
The simulations in this project were aimed at designing a power efficient PTL ALU for the targeted application. For this reason, 4 versions of the PTL 1-bit ALU were simulated to find the best version in terms of power consumption, taking into account the number of operations each can offer. The CMOS versions of the 1-bit and 8-bit ALU were also simulated. The results for a large, hierarchical design such as an ALU provide more concrete evidence of the successful implementation of power efficient sub-threshold PTL.

The results show that ALU version 1.1 is more power efficient than version 1. This is because version 1 has 9 control signals compared to the 3 control signals of version 1.1. More signalling nodes mean more switching nodes, which in turn lead to higher dynamic consumption for version 1. Moreover, multiple control signals allow MUX2 to be in an idle state for several operations in static mode, and thus static power consumption is higher for version 1.

The only design difference between ALU versions 2 and 2.1 is the tri-state buffers used in version 2. More circuit components lead to more power consumption, so version 2 consumes more power, as the results show.
Analysis of the results for versions 1.1 and 2.1 showed that version 1.1 is more power efficient, as expected. The circuits used for the additional functionality caused more power dissipation in version 2.1. However, version 2.1 offers 16 operations, twice as many as version 1.1, while requiring only three additional logic gates. Therefore, version 2.1 does not consume significantly more power than version 1.1.

The results confirm that both the 1-bit and 8-bit PTL ALUs have lower static and dynamic consumption than their CMOS counterparts. It was already mentioned in chapter one that the basic circuits used in this project had previously been tested for power consumption and that the PTL versions were found to be more power efficient. Moreover, PTL designs are simple. The total number of transistors used in the PTL ALUs is significantly lower than in the CMOS versions. CMOS ALUs, with a higher number of transistors, have higher circuit capacitance and more switching nodes. These factors explain the high dynamic consumption of the CMOS ALU. Finally, the results validate that PTL is more power efficient than CMOS in sub-threshold design for large and hierarchical circuits.
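The link drawn above between circuit capacitance, switching nodes and dynamic consumption follows the standard first-order model of switching power. The symbols below are not defined in this chapter and are introduced here for illustration only: the switching activity factor, the switched capacitance, the supply voltage and the operating frequency.

```latex
P_{dyn} = \alpha \, C_{L} \, V_{DD}^{2} \, f
```

Under this model, and for the same supply voltage and frequency, a design with more switching nodes and a larger switched capacitance dissipates proportionally more dynamic power, which is consistent with the CMOS ALU results reported in this chapter.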
CHAPTER 5 CONCLUSION AND FUTURE WORK
The project results reported in this dissertation show that sub-threshold pass-transistor logic can be fully implemented in a large and complex hierarchical block such as an Arithmetic Logic Unit. Research data were analysed for both the PTL and CMOS designs, and the results confirmed that the PTL ALU is more power efficient than the CMOS version for sub-threshold design. This project, however, focused on arithmetic and logic circuits. To design a complete processor system, other blocks need to be designed and analysed. Furthermore, there is a wide range of design approaches for such energy-efficient systems.

The research work has some vital findings. Firstly, it provides a solid assessment that basic logic circuits can operate from a sub-threshold supply in pass-transistor style logic, and that quantitative measurement of propagation delay and power consumption is feasible for such designs. This is a fundamental finding in this field of work. Secondly, a fully functional ALU can be designed by integrating basic PTL circuits. This is a significant advancement in this area of work because, as mentioned earlier, only a limited number of publications discuss sub-threshold PTL design, and only over a narrow range of basic logic circuits. Thirdly, the project data include the power comparison of the ALU for PTL and CMOS designs under different ambient temperatures and fan-outs. It shows that PTL logic is power efficient in every condition compared to CMOS logic for sub-threshold operation. For a Vdd of 0.5 V at 27 °C, the 8-bit PTL ALU consumes 153.15 pW of dynamic power whereas the CMOS version consumes 314.21 pW, which is about twice as much.

PTL designs have a longer propagation delay at the sub-threshold supply, which is a major performance concern. However, performance is not an issue for the targeted energy-constrained applications. Nevertheless, special attention was paid to the PTL styles to avoid any prolonged delay.
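The two-to-one figure quoted above for the 8-bit ALU can be checked with simple arithmetic. The sketch below uses only the two dynamic power values stated in the text; the function names `power_ratio` and `saving_percent` are hypothetical.

```python
# Arithmetic check of the 8-bit ALU dynamic power comparison.
# Both values are in pW and are taken from the text above.

CMOS_PW = 314.21
PTL_PW = 153.15


def power_ratio(reference_pw, candidate_pw):
    """How many times larger the reference power is than the candidate."""
    return reference_pw / candidate_pw


def saving_percent(reference_pw, candidate_pw):
    """Power saved by the candidate, as a percentage of the reference."""
    return 100.0 * (reference_pw - candidate_pw) / reference_pw


if __name__ == "__main__":
    print(f"CMOS / PTL ratio: {power_ratio(CMOS_PW, PTL_PW):.2f}")
    print(f"PTL saving: {saving_percent(CMOS_PW, PTL_PW):.1f} %")
```

The ratio comes out slightly above two, so the PTL version saves a little more than half of the dynamic power of the CMOS version for these two values.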
Another issue of this design was temperature dependence. The results suggest that the power consumption varies by a factor of hundreds to a thousand across the simulated temperatures. However, for the sub-threshold supply, PTL circuits showed the reverse temperature effect, in which speed increases with rising temperature. This is a major finding of this project; to the best of the author's knowledge, the reverse temperature effect has been reported only for CMOS design styles, in a very limited number of publications.

This project focused on developing power efficient, practical arithmetic and logic circuits for ALU design. To design the complete system of a typical processor, a large amount of research and design work remains to be done. For instance, a comprehensive cell library of the PTL circuits has to be designed. Optimal layout techniques should be applied since PTL circuits require more area. It is necessary to build a synthesiser specifically for the PTL cells since no commercial PTL synthesis tool is available. Publications [4] and [5] suggest different models of PTL synthesiser targeted at specific cell libraries. Therefore, synthesiser design should be based on the developed cells. Furthermore, the processor architecture should be tailored to the requirements of the targeted applications. The advantages of PTL style design should be considered when designing the other blocks of the processor.

Another aspect of future work is to consider the temperature effect on performance and noise margin. This project investigated the temperature effect only in a very narrow window, since these issues were beyond the scope of this project. To find out whether PTL or CMOS shows better performance under these two factors, further research needs to be carried out. There are a number of research publications [3], [5], [8], [21], [23], [24] and [25] examining these factors for super-threshold operation (with conventional supply voltage), which can serve as a basis for future work. A few of the papers discuss minimising the temperature effect, while others suggest methods for better noise performance. For instance, Markovic and Rabaey in their work [8] assert that near-threshold operation improves circuit performance and temperature dependence.
Technology size can be another prospect of future work. The current project was carried out in AMS 0.35 µm technology. Although this technology is aged, it has the advantage of being cost effective for commercial chip fabrication. However, in the long run it is necessary to adopt newer technology sizes to maintain the sustainability and applicability of the design, considering the advantages offered by PTL.

This project is a fundamental step towards designing a PTL based system (processor) for energy-constrained applications. To develop the whole system, a large amount of research and design work has to be carried out. Moreover, any commercial implementation of such a PTL design has to compete against the popular and widely used CMOS logic styles.
APPENDICES
Appendix A Project Gantt Chart
Table 10: Project Planning and Development

Week  Week beginning
1     13/6
2     20/6
3     27/6
4     4/7
5     11/7
6     18/7
7     25/7
8     1/8
9     8/8
10    15/8
11    22/8
12    29/8
13    5/9
14    12/9
15    21/9

Activities and Milestones
Part 1 - Research and previous work review
  Initial research and project brief submission
  Thorough review on related literature
  Tools familiarization
  Previous project data validation
  Milestone: Presentation of project conceptualization
Part 2 - Design
  Basic circuits design and characterization
  ALU Design and Characterization
  Design extension for CMOS version and characterization
Part 3 - Writing, Demonstration and Submission
  Writing Up
  Milestone: Demonstrate project to project second examiner
  Milestone: Draft of dissertation completion
  Final checking and corrections
  Milestone: Hand-in

Appendix B - Design Files
Appendix B consists of lists of the PTL and CMOS design files with brief circuit descriptions. The Cadence design files are also included in this appendix. Both the lists and the Cadence files are provided in the project zip file.

Appendix C - Detailed Simulation Data
This section includes the characterisation data of the ALUs with the measured currents, which are also provided in the project zip file.
REFERENCES
[1] Ivan Y. Ivanov, Tom J. Kazmierski, "Extremely Low-Power Circuits Based on Pass-Transistor Logic in Sub-Threshold Region", conference paper submitted to VW-FEDA.
[2] Ivan Y. Ivanov, "Sub-Threshold Pass Transistor Logic or Ultra-Low Power Processor, Suitable for Energy Harvester Application", M.S. Dissertation, Electronics and Computer Science, University of Southampton, Southampton, 2011.
[3] K. Yano, Y. Sasaki, K. Rikino, K. Seki, "Top-Down Pass-Transistor Logic Design", IEEE Journal of Solid-State Circuits, vol. 31(6), Jun. 1996, pp. 792-803.
[4] S.-F. Hsiao, Y. Tsai, C.-S. Wen, "Efficient Pass-Transistor-Logic for Sequential Circuits", IEEE Conference on Circuits and Systems, Dec, pp. 1631-1634.
[5] S.-F. Hsiao, M.-Y. Tsai, M.-C. Chen, C.-S. Wen, "An efficient pass-transistor-logic synthesizer using multiplexers and inverters only", Proc. International Symposium on Circuits and Systems (ISCAS), Kobe, May 2005, pp. 2433-2436.
[6] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, "A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor Logic", IEEE Journal of Solid-State Circuits, vol. 28(11), Nov. 1993, pp. 1145-1151.
[7] K. Yano, T. Yamanaka, T. Nishida, M. Saitoh, K. Shimohigashi, A. Shimizu, "A 3.8 ns CMOS 16x16 multiplier using complementary pass transistor logic", Proceedings of the IEEE Custom Integrated Circuits Conference, San Diego, May 1989, pp. 10.4/1-10.4/4.
[8] D. Markovic, C.C. Wang, L.P. Alarcon, L. Tsung-Te, J.M. Rabaey, "Ultra-Low Power Design in near Threshold region", IEEE Journal on Solid-State Circuits, Feb. 2010, pp. 237-252.
[9] B.H. Calhoun, A. Wang, A. Chandrakasan, "Modeling and Sizing for Minimum Energy Operation in Subthreshold Circuits", Proceedings of the IEEE, Sep. 2005, pp. 1778-1786.
[10] B.H. Calhoun, A. Chandrakasan, "Characterizing and modeling minimum energy operation in Subthreshold circuits", Proc. of the 2004 International Symposium on Low Power Electronics and Design, Aug. 2004, pp. 90-95.
[11] A. Wang, B.H. Calhoun, A. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems. Cambridge, MA: Springer, 2006.
[12] D. Blaauw, J. Kitchener, B. Phillips, "Optimizing Addition for Sub-Threshold logic", Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Oct. 2008, pp. 751-756.
[13] D. Zhu, S. Beeby, J. Tudor, N. Harris, "A Credit Card Sized Self Powered Smart Card Sensor Node", Sensors and Actuators A: Physical, 169 (2), 2011, pp. 317-325.
[14] Neil H. E. Weste, D. Harris, CMOS VLSI Design: A Circuit and Systems Perspective. Boston, MA: Pearson Addison Wesley, 2005.
[15] B. Razavi, Design of Analog CMOS Integrated Circuits. International Edition, MA: Singapore: McGraw-Hill, 2001.
[16] V. Moalemi, A. Afzal-Kusha, "Subthreshold Pass Transistor Logic for Ultra-Low Power Operation", IEEE Proc. Symposium on VLSI, Porto Alegre, Mar. 2007, pp. 490-491.
[17] D. Markovic, B. Nikolic, V.G. Oklobdzija, "General Method in Synthesis of Pass-transistor Circuits", Proc. of 22nd International Conference on Microelectronics, Nis, May 2000, pp. 695-698.
[18] H. Soleman, K. Roy, B. Paul, "Robust Ultra-Power Sub-Threshold DTMOS Logic", Rapallo/Portacino Coast, Jul. 2000, pp. 25-30.
[19] N. Limdert, T. Sugii, S. Tang, C. Hu, "Dynamic Threshold Pass-Transistor Logic for Improved delay At Power Supply Voltages", IEEE Journal of Solid-State Circuits, Jan. 1999, pp. 85-89.
[20] J.P. Uyemura, CMOS Logic Circuit Design, MA: Kluwer Academic Publishers, 1999.
[21] V.M. Srivastava, R. Patel, H. Parashar, G. Singh, "Reduction in Parasitic Capacitance for Transmission Gate with The Help of CPL", International Conference on Recent Trends in Information, Telecommunication and Computing, Kochi, Mar. 2010, pp. 218-220.
[22] G. Paul, A. Pal, B.B. Bhattacharya, "A Novel Mapping Technique for DD-Based Circuits Using Leap Cells", International Symposium on Electronic System Design, Bhubaneswar, Dec. 2010, pp. 135-140.
[23] S.-F. Hsiao, M.-Y. Tsai, M.-C. Chen, C.-S. Wen, "Transistor Sizing and Layout Merging of Basic Cells In Pass-transistor Logic Cell library", IEEE International Symposium on VLSI Design, Automation and Test, Apr. 2008, pp. 89-82.
[24] K. Gulati, N. Jayakumar, S.P. Khatri, "A Structured ASIC Design Approach Using Pass Transistor Logic", Proc. International Symposium on Circuits and Systems (ISCAS), New Orleans, May 2007, pp. 1787-1790.
[25] S.-F. Hsiao, M.-Y. Tsai, C.-S. Wen, "Low Area/Power Synthesis Using Hybrid Pass Transistor/CMOS Logic Cells in standard Cell-Based Design Environment", IEEE Transactions on Circuits and Systems, vol. 57(1), Jan. 2010, pp. 21-25.
[26] R. Zimmermann, W. Fichtner, "Low-Power Logic Styles: CMOS versus Pass-transistor Logic", IEEE Journal of Solid-State Circuits, vol. 32(7), Jul. 1997, pp. 1079-1090.
[27] J.-M. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuit: A Design Perspective. 2nd Ed., International Ed, MA: Upper Saddle River, NJ: Prentice Hall, 2003.
[28] I. McNally (2012, August), "Alternative Cell Design Strategy", ELEC6010: Digital IC Design Lecture Notes. Available: https://secure.ecs.soton.ac.uk/bim/notes/did/lecture/pdf/did03.pdf
[29] C. E. Leiserson, J.B. Saxe, "Retiming Synchronous Circuitry", Algorithmica, 1991, Volume 6, Number 1-6, pp. 5-35.
[30] Yu Zhou, Hui Guo, "Application Specific Low Power ALU Design", Embedded and Ubiquitous Computing, 2008. EUC '08. IEEE/IFIP International Conference on, vol. 1, Dec. 2008, pp. 214-220.
[31] Hui Guo, "A Structural Customization Approach for Low Power Embedded Systems Design", Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on & Int'l Conference on Cyber, Physical and Social Computing (CPSCom), 18-20 Dec. 2010, pp. 237-244.
[32] Changhae Park, J.P. John, K. Klein, J. Teplik, J. Caravella, J. Whitfield, K. Papworth, Sunny Cheng, "Reversal of temperature dependence of integrated circuits operating at very low voltages", Electron Devices Meeting, 1995., International, 10-13 Dec. 1995, pp. 71-74.
[33] K. Kanda, K. Nose, H. Kawaguchi, T. Sakurai, "Design impact of positive temperature dependence on drain current in sub-1-V CMOS VLSIs", Custom Integrated Circuits, 1999. Proc. of the IEEE 1999, 1999, pp. 563-566.
[34] K. Maharatna (2012, August), "Arithmetic Logic Unit", ELEC3017 Digital System Design Lecture Notes. Available: https://secure.ecs.soton.ac.uk/notes/elec3017
[35] I. McNally (2012, August), "Pass Transistor Circuits", ELEC3025: Integrated Circuit Design Lecture Notes. Available: https://secure.ecs.soton.ac.uk/bim/notes/icd/lecture/pdf/icd07.pdf
