
Effect of Technology Scaling on Digital CMOS Logic Styles
Mohamed Allam, Mohab Anis and Mohamed Elmasry*
Abstract
In this paper, the main challenges of technology scaling are reviewed in depth. Five popular logic families, namely Conventional CMOS, CPL, Domino, DCVS and MCML, are presented, highlighting their advantages and drawbacks. The behavior of each logic style in deep submicron technologies is analyzed and predicted for future generations. To verify the qualitative analysis, simulations were performed on the basic logic gates, a full adder and a 16-bit Carry Look Ahead adder. The circuits were implemented in 0.8, 0.6, 0.35 and 0.25 µm CMOS technologies.
1 Introduction
Ever since the invention of the first integrated circuit, device dimensions, supply voltage, threshold voltage, and oxide thickness have been scaled down at a dramatic rate over the past three decades [1]. This scaling is considered the main stimulus to the growth of the microelectronics industry. But as technology scales down, many phenomena such as short channel effects, hot carriers and subthreshold leakage currents dominate the functionality of CMOS logic circuits. Depending on the application, the kind of circuit to be implemented and the technology used, different performance aspects vary significantly from one logic style to another. Choosing the appropriate logic style for a certain application is becoming a challenge, since the designer must run exhaustive simulations to evaluate the various implementations. Considerable potential for high speed and power savings exists by means of proper choice of a logic style for implementing combinational circuits. This is because the parameters governing power dissipation and performance are strongly influenced by the chosen logic style. Power dissipation is governed by the supply voltage, operating frequency, nodal switching activity and device sizes. Speed, on the other hand, is affected by the number of inversion levels, the number of devices in series, supply voltage, device sizes and interconnect wiring capacitance. The circuit's robustness with respect to voltage and device scaling, process variations and compatibility with surrounding circuits is also affected by the type of logic style used for implementation. These parameters are also influenced by the technology used, so a logic style that is favorable for a certain application in one technology is not necessarily favorable in another.

A metric that is heavily influenced by technology scaling, and that describes the efficiency of the circuit in terms of performance and power dissipation, is the Energy-Delay Product (EDP) [2]. In order to illustrate the influence of technology scaling on the behavior of digital circuits, Conventional CMOS, CPL, Domino, DCVS and CML logic styles are used to implement the basic logic gates, full adders, and a 16-bit Carry-Look-Ahead (CLA) adder. These circuits are implemented in 0.8, 0.6, 0.35 and 0.25 µm CMOS technologies, under nominal operating conditions, and are all optimized for minimum EDP values. An overview of the most important logic styles is first presented, followed by how logic styles are affected by technology scaling. Finally, simulation results are presented to verify the qualitative analysis.

* M.W. Allam, M.H. Anis and M.I. Elmasry are with the VLSI Research Group, Department of Electrical and Computer Engineering, University of Waterloo, ON N2L 3G1, Canada.

2 Logic Styles
2.1 Conventional CMOS
Logic gates in conventional CMOS are built from an N block and a P block. An AND-OR-Invert (AOI) CMOS gate is shown in Figure 1(a). The N block implements a sum-of-products function to evaluate the 0 state by creating a path from the output to GND. The P block evaluates the 1 state of the output by implementing a product-of-sums function to create a path from $V_{DD}$ to the output node. This is equivalent to stating that the output node is always a low-impedance node in steady state. The N and P networks should be designed so that, whatever the value of the inputs, one and only one of the networks is conducting at steady state. The main drawback of CMOS circuits is the existence of the P block, due to its low mobility. The PMOS devices therefore have to be sized up. Furthermore, the input capacitance of a CMOS gate is large because each input is connected to the gates of at least one PMOS transistor and one NMOS transistor. This also degrades the gate's speed. However, the best gate performance is achieved with a PMOS/NMOS width ratio of $\sqrt{\mu_n/\mu_p}$ [3]. This ratio will eventually approach 1 in Deep Submicron (DSM) technologies, where the carrier drift velocities in NMOS and PMOS transistors become almost equal due to velocity saturation. Another drawback of CMOS is the relatively weak output driving capability due to series transistors in the output stage.
Another impact of the large input capacitance of a CMOS gate is high power dissipation. However, static CMOS circuits have a smaller switching activity and short-circuit current compared to the other logic styles. CMOS is also robust against voltage and transistor scaling, and thus operates reliably at low voltages and minimal transistor sizes. This is attributed to the presence of a static path that restores the correct logic state in the case of noise. Throughout this paper, the term CMOS will be used to denote Conventional CMOS.
0-7803-5809-0/00/$10.00 © 2000 IEEE
IEEE 2000 Custom Integrated Circuits Conference
2.2 Complementary Pass Logic (CPL)
A CPL gate [4] consists of two NMOS logic networks (one for each signal rail), two small pull-up PMOS transistors for swing restoration, and two output inverters for the complementary output signals. Figure 1(b) shows an AOI circuit implemented using CPL. Unlike CMOS logic, the CPL gate creates a path from the output node to one of the input nodes of the gate instead of the power lines. Because the MOS networks are connected to variable gate inputs rather than constant power lines, only one signal path through each network must be active at a time in order to avoid shorts between the inputs. Therefore, each pass-transistor network must realize a multiplexer (MUX) structure. All two-input functions (AND, OR and XOR) are therefore implemented by this basic MUX structure. This is relatively expensive for simple monotonic gates such as AND and OR. In most cases, CPL uses fewer and smaller transistors, especially in XOR- and MUX-based functions. CPL also offers small input loads, good output driving capability due to the output inverters, and a fast differential stage due to the cross-coupled PMOS pull-up transistors. However, most CPL gates require all the inputs and their complements, which increases the routing complexity and overhead, and ultimately increases power and delay. Since the CPL gate is constructed mainly from N transistors, the output voltage swing will be lower than the input swing by the NMOS threshold voltage $V_{THN}$. This could cause DC current to flow through the inverter. A swing restoring circuit should therefore be added after every two or three cascaded gates to restore the full output swing. This in turn adds to the power of the circuit. The layout of pass-transistor cells is not as straightforward and efficient as CMOS, due to the rather irregular transistor arrangements and the high wiring requirements of the double rails.
2.3 Domino Logic
The AOI structure of a domino logic gate [5] is shown in Figure 1(c). It is a non-inverting structure and consists of a dynamic gate stage, a static CMOS inverter, which provides the circuit's output, and a PMOS keeper transistor, which restores the logic at the Domino output node. The dynamic gate stage consists of an NMOS transistor network, which implements the required function, and two transistors (NMOS and PMOS) to which the clock signal is applied to synchronize the operation of the circuit. The CMOS inverter is included for the proper operation of a chain of domino gates, and to increase the driving capability of the gate. The keeper transistor restores the logic and gives the domino gate immunity against charge sharing and charge loss [6]. Any number of logic stages can be cascaded, provided that the sequence can evaluate within the evaluate clock phase. The input signal to a domino gate must therefore satisfy some setup and hold timing constraints for correct operation of the gate [7]. Domino logic has low transistor count and input capacitance, which enhances its speed. Furthermore, since the logic block is constructed only from high-mobility N transistors, the evaluation is fast. Domino logic consumes large power. This is attributed to its high switching activity, because all the output nodes are precharged to $V_{DD}$ each clock cycle, as well as the large clock load switching at full rate.

Domino logic is very susceptible to noise. A voltage at the input as low as $V_{TH}$ could turn on the NMOS pull-down transistor, and the output will eventually reach GND. This is translated to a noise margin (NM) of $V_{TH}$, which is quite low compared to static versions. Some subthreshold leakage current can flow through the NMOS even when the input is 0. This effect becomes more pronounced when the input is not completely 0, but approaches $V_{TH}$ in the presence of noise, causing the N-devices to turn ON. To compensate for the low noise margins, the size of the PMOS keeper must increase, in turn increasing the contention current during evaluation and consequently reducing the gate's performance. This is the typical speed/noise margin trade-off in Domino logic circuits. Another problem of Domino circuits is that only non-inverting logic can be implemented. This is a problem in the implementation of XOR gates and full adders (FA). A Domino style which overcomes this problem is NP-Domino [8]. NP-Domino was used to realize the simulated XOR gates and FAs in this work.

2.4 Differential Cascode Voltage Switch (DCVS)
The static and dynamic DCVS logic were first proposed by Heller et al. as a high performance logic family [9]. The static version suffered from major drawbacks: (1) high dynamic power, (2) limited driving capability and (3) complex design. On the other hand, the dynamic version experienced speed/noise margin trade-offs similar to Domino logic. The dynamic DCVSL (DDCVSL) was therefore proposed. Figure 1(d) presents the architecture of an AOI gate implemented in DDCVSL logic. During the precharge phase (CLK=0), both keeper transistors $Q_1$ and $Q_2$ will be OFF. Unlike domino logic, the keeper transistors will be OFF at the start of the evaluation phase (CLK=1), which reduces the power and delay caused by contention. One branch implements the required function, while the other branch implements its inverse. DDCVSL is considered a general purpose logic style because it may be used to implement both inverting and non-inverting logic circuits. DCVS is more area efficient in implementing complex logic gates. Most complex logic functions may be implemented using one gate only, which makes DCVS logic much faster than CMOS circuits. It is also suitable for implementing gates with XOR functionality, such as arithmetic circuits and MUX style logic gates. Over the past fifteen years, many flavors of Cascode Voltage Switch Logic (CVSL) were introduced. The Differential Cascode Voltage Switch with Pass-Gate logic family (DCVSPG) uses pass logic to implement the logic function of each branch [10]. It avoids the problem of the floating output node that exists in DCVS logic. The Switched Output Differential Structure (SODS) replaces the PMOS latch with a clocked latch to avoid the contention [11]. Charge Recycling Differential Logic (CRDL) reduces power dissipation by shorting the output nodes before each evaluation phase [12].
2.5 MOS Current Mode Logic (MCML)
Figure 1: Full Swing Logic Styles. (a) CMOS, (b) CPL, (c) Domino, (d) DCVS

Figure 2(a) shows the architecture of an MCML inverter/buffer. Transistor $Q_1$ acts as a DC current source controlled by $V_{ref}$. Resistors $R_1$ and $R_2$ are pull-up resistors. The logic function is implemented by the logic block connected between the resistors and the current source. For an inverter/buffer, the logic block is the differential pair constructed by transistors $Q_2$ and $Q_3$. The operation of CML is based on the differential pair circuit. Each differential input variable is connected to a differential pair circuit. The value of the input variable controls the flow of current through the two branches. For example, if $V_{GS}(Q_2)$ is higher than $V_{GS}(Q_3)$, the current passing through $Q_2$ will be higher than that passing through $Q_3$. Therefore, the voltage of node $N_1$ will start to drop until reaching a steady state where the current going through the resistor $R_1$ matches the current going through transistor $Q_2$. The amount of current going through the ON branch ($Q_2$ in the previous case) controls the discharge delay of the logic gate, while the load resistor controls the charging of the output nodes. The output voltage swing $V_S$ is defined as the voltage difference between $N_1$ and $N_2$.

The small output swing of MCML circuits reduces crosstalk between adjacent signals. The constant current source reduces the switching noise and supply fluctuations. For those reasons, MCML is recommended for mixed signal design to reduce the interference between the digital and analog blocks [13], [14]. The reduced output swing also reduces the dynamic power dissipation for long busses. Therefore, MCML may be used in the implementation of bus transceivers. Another important feature of CML circuits is their noise immunity due to the differential nature, which is recommended at high operating frequencies.

However, MCML has some major drawbacks which limit its use in digital systems. First is the static power dissipation due to the constant current source, which is independent of the operating frequency. Therefore, MCML is preferred in high frequency applications only, to reduce the overhead of its static biasing power. MCML circuits are not suitable for power-down mode because of the DC current source. MCML circuits also require special fabrication technologies to implement the large load resistors in a reasonable area, which increases the cost and area of the chip. A reference voltage distribution tree has to be included in the design to distribute $V_{ref}$, leading to larger chip area and more complex routing. Finally, the matching of the rise and fall delays is not an easy task because of its dependency on the load of each gate.

Figure 2: MCML. (a) Inverter, (b) AOI
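The frequency trade-off described for MCML can be made concrete with a first-order sketch. It relies on textbook approximations that are not taken from this paper: a static MCML power of $V_{DD} \cdot I_{bias}$ and a CMOS switching power of $a \cdot C \cdot V_{DD}^2 \cdot f$. The function names and the example numbers are illustrative assumptions only.

```python
def mcml_static_power(vdd, i_bias):
    """Static power of one MCML gate: constant bias current drawn from the supply (W)."""
    return vdd * i_bias


def cmos_dynamic_power(vdd, c_load, freq, activity=1.0):
    """First-order CMOS switching power: activity * C * VDD^2 * f (W)."""
    return activity * c_load * vdd ** 2 * freq


def crossover_frequency(vdd, i_bias, c_load, activity=1.0):
    """Frequency at which the CMOS switching power equals the MCML static power (Hz)."""
    return i_bias / (activity * c_load * vdd)


if __name__ == "__main__":
    # Assumed example values: 2.5 V supply, 100 uA bias current, 50 fF load.
    vdd, i_bias, c_load = 2.5, 100e-6, 50e-15
    f_x = crossover_frequency(vdd, i_bias, c_load)
    print(f"MCML static power: {mcml_static_power(vdd, i_bias) * 1e6:.0f} uW")
    print(f"CMOS matches it at about {f_x / 1e6:.0f} MHz")
```

Below the crossover frequency the constant bias current is pure overhead, which is the reason the text restricts MCML to high frequency applications.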
3 Effect of Technology on Logic Styles
3.1 Velocity Saturation and Mobility Degradation
In order to evaluate the output logic of a gate implemented in some logic style, a series of charging and discharging processes occur at the output node (at which the logic is determined). As the input of a logic gate changes, it causes the output node(s) to either charge or discharge. This is true for logic styles consisting of an N logic block. A static CMOS inverter is a simple example, whose delay is the time taken for the output node to fully charge or discharge.
For full swing logic styles, the NMOS pull-down transistor will go through all the operating phases (cut-off, saturation and linear modes) while discharging the output node. The transistor is initially in the cut-off mode, when the input is 0. As the input increases, the NMOS operates in two regions: saturation and linear. The NMOS will first operate in saturation, where the drain current $I_{DS}$ is large ($I_{DS} \propto (V_{GS} - V_{TH})^{\alpha}$), which discharges the O/P node quickly. $\alpha$ is the velocity saturation index [15], which takes a value of 2 for long-channel devices and around 1.3 for short-channel devices. The NMOS will operate along a constant $V_{GS}$ curve in the saturation region of the typical $I_{DS}$/$V_{DS}$ characteristics plot. When the output node reaches $V_{DD} - V_{THN}$, the NMOS moves from the saturation to the linear region. $I_{DS}$ in the linear region is less than in the saturation region for the same $V_{GS}$ [15], which causes the discharge to slow down. The slowest transition, however, is from cut-off to saturation, because all the charge stored in the depletion region of the NMOS device has to sink before the channel is constructed between the drain and the source. MCML is therefore faster than other logic styles (refer to Figure 2(a)). This is because $Q_2$ and $Q_3$ are never totally OFF, and only experience transitions between the saturation and linear regions, which take a short time. The speed advantage of CML over other logic styles will start to fade as we move deeper in the DSM regime, where saturation currents are reduced compared to the linear currents and no longer follow the long channel behavior ($\alpha$ approaches 1). Not only will the carrier velocity tend to saturate as the channel length is scaled down, but the device's mobility will start to degrade as well. Figures 3(a) and 3(b) show the saturation velocity and mobility degradation of the electron, respectively.
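A minimal sketch of the saturation-current relation quoted above, $I_{DS} \propto (V_{GS} - V_{TH})^{\alpha}$, shows how the benefit of extra gate overdrive shrinks as $\alpha$ falls from 2 toward 1. The proportionality constant `k` and the function names are hypothetical; only the exponent values 2 and 1.3 come from the text.

```python
def alpha_power_current(vgs, vth, alpha, k=1.0):
    """Saturation drain current under the alpha-power relation, I = k * (VGS - VTH)^alpha.

    Returns 0 in cut-off (VGS <= VTH). k is an arbitrary drive constant.
    """
    overdrive = vgs - vth
    if overdrive <= 0:
        return 0.0
    return k * overdrive ** alpha


def current_gain_from_overdrive(scale, alpha):
    """Factor by which the current grows when the overdrive is multiplied by `scale`."""
    return scale ** alpha


if __name__ == "__main__":
    for alpha in (2.0, 1.3, 1.0):
        gain = current_gain_from_overdrive(2.0, alpha)
        print(f"alpha={alpha}: doubling the overdrive multiplies I_DS by {gain:.2f}")
```

With $\alpha = 2$ a doubled overdrive quadruples the current, while with $\alpha = 1.3$ the gain is only about 2.5, which illustrates why saturation currents lose their edge in short-channel devices.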
Figure 3: Velocity Saturation & Mobility Degradation. (a) Velocity saturation, (b) mobility degradation

In NMOS, the saturation velocity is reached at a lower critical electric field compared to PMOS. This indicates that $\mu_n$ is degraded at a much faster rate than $\mu_p$ [16]. Eventually, a point is reached where both NMOS and PMOS have comparable mobilities and switching speeds. This is particularly important for the implementation of CMOS structures, for two reasons. Firstly, CMOS suffers from degraded performance because of the low mobility PMOS transistors. This speed disadvantage will gradually decrease as the technology scales down and $\mu_n$ approaches $\mu_p$. This enhances the performance of CMOS in terms of delay, power and area. Secondly, the optimum noise margin in CMOS is achieved when $\mu_p$ equals $\mu_n$ [17]. With $\mu_p = \mu_n$, the CMOS noise margin is enhanced and equal driving capability is achieved, which keeps the short-circuit current within bounds [3]. Thus CMOS performance and robustness are both enhanced relative to other styles as technology scales down.

3.2 Hot Carrier Effect (HCE)
Another phenomenon that takes place as the technology is scaled down is the hot carrier effect (HCE) [16]. The scaling down of the gate oxide thickness $T_{OX}$ at a higher rate than the supply voltage causes the electric field across the gate to increase, which increases the electron velocity. Electrons that reach a high enough energy leave the silicon and tunnel into the gate oxide. Electrons trapped in the oxide change $V_{TH}$, typically increasing the $V_{TH}$ of NMOS devices ($V_{THN}$) while decreasing the $V_{TH}$ of PMOS devices. MCML may have some trouble with the $V_{TH}$ variation caused by the HCE, because the devices have to be matched for correct functionality. HCE is another reason that makes low voltage operation favorable. Logic families that can work at a lower supply voltage, like MCML (with no degradation in functionality), will get more preference because this will reduce the HCE and the punch-through phenomenon, leading to better reliability and lifetime. Logic styles that can tolerate minor changes in $V_{TH}$ will gain more importance because the HCE and electromigration tend to increase $V_{THN}$ over time. For Domino and DCVS logic, this is translated into a small variation in delay and a better noise margin. On the other hand, the higher $V_{THN}$ may cause MCML to cease functioning. This is attributed to the fact that increasing $V_{THN}$ would decrease the discharge current, causing the voltage swing $V_S$ to be limited in value. When $V_S$ is small, it might cause the following CML stages to malfunction. Circuits implemented using CPL also have degraded performance when affected by HCE, as a larger voltage drop ($V_{THN}$) is produced across the pass transistor. The pass transistor and output inverter will therefore have lower switching speeds, because the current is reduced. Short-circuit currents also take place, adding to the CPL's power dissipation.

3.3 Leakage Currents
The performance of dynamic styles, particularly Domino, will degrade in DSM technologies. As explained in Section 2.3, Domino logic is particularly susceptible to noise, due to the effect of leakage currents. Leakage currents are more pronounced as we move down in the DSM regime. This deteriorates the gate's noise margin. To compensate for the low noise margins, the size of the PMOS keeper must increase, in turn increasing the contention current during evaluation, as well as the loading of the O/P node. This reduces the gate's performance. The rate of improvement in Domino's performance will therefore gradually decrease as we go deeper in DSM technologies. This is another reason that the performance of CMOS circuits is expected to approach that of the dynamic logic gates without compromising noise margins. Figure 4 [18] plots the optimal $V_{TH}$ versus process technology for the static and dynamic cases. It is clear that the optimal $V_{TH}$ values used in static and dynamic circuits diverge. Static circuits need a lower $V_{TH}$ to maintain gate drive with lower $V_{DD}$, while in dynamic circuits it becomes difficult to scale $V_{TH}$ due to noise limits.
Figure 4: Optimal threshold voltage for static and dynamic circuits versus technology (x-axis: Technology (µm))

Figure 5: Scaling trend for VDD and VTH (x-axis: Technology (µm))
3.4 Drain-Induced Barrier Lowering (DIBL)
DIBL causes $V_{TH}$ to be a function of the operating voltage. $V_{TH}$ decreases with $L_{eff}$ for short-channel devices, while an increase in the drain-source voltage $V_{DS}$ also causes $V_{TH}$ to decrease. The latter effect is called DIBL. It reduces the noise margin of dynamic circuits, which is particularly a problem in Domino logic implementations. As mentioned previously, maintaining a sufficient noise margin comes at the expense of reduced performance.
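The dependence of the threshold on $V_{DS}$, and its knock-on effect on leakage, can be sketched with two first-order models that are assumptions here rather than results of this paper: a linear DIBL model $V_{TH} = V_{TH0} - \eta V_{DS}$, and subthreshold leakage that grows by one decade for every $S$ volts of threshold reduction. The coefficient values ($\eta$ = 0.04 V/V, $S$ = 85 mV/decade, $V_{TH0}$ = 0.5 V) are illustrative only.

```python
def vth_with_dibl(vth0, vds, eta):
    """Effective threshold voltage under a linear DIBL model: VTH0 - eta * VDS."""
    return vth0 - eta * vds


def leakage_increase(vth_drop, swing=0.085):
    """Factor by which subthreshold leakage grows when VTH falls by vth_drop volts.

    Assumes one decade of current per `swing` volts (85 mV/decade by default).
    """
    return 10.0 ** (vth_drop / swing)


if __name__ == "__main__":
    vth0, eta = 0.5, 0.04  # assumed zero-bias threshold (V) and DIBL coefficient (V/V)
    for vds in (0.0, 1.0, 2.5):
        vth = vth_with_dibl(vth0, vds, eta)
        factor = leakage_increase(vth0 - vth)
        print(f"VDS={vds:.1f} V  VTH={vth:.3f} V  leakage x{factor:.1f}")
```

Because the Domino noise margin is tied to $V_{TH}$, any such reduction of the effective threshold translates directly into a smaller margin and more leakage on the dynamic node.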
Figure 6: Section of a gate implemented using CPL
3.5 Scaling Down the $V_{DD}/V_{TH}$ Ratio
$V_{TH}$ is scaled down at a relatively slower rate than $V_{DD}$, as shown in Figure 5. The $V_{DD}$ reduction is attributed to reliability restrictions that limit the electric field applied to the gate. Hence, the ratio $V_{DD}/V_{TH}$ drops with technology scaling until it reaches a minimum value of 3 at a feature size of 0.07 µm. This again explains the performance and power degradation of CPL logic styles. To further illustrate this, a section of the CPL circuit is shown in Figure 6. The voltage at the output of the driver circuit is $V_{DD}$, while the pass transistor is initially OFF. As transistor $Q_1$ turns ON, it will start operating in the saturation mode, where its current $I_1 \propto (V_{GS1} - V_{THN})^{\alpha}$. In the case of CPL, $V_{GS1} = V_{DD} - V_{THN}$ (due to the $V_{THN}$ drop), thus $I_1 \propto (V_{DD} - 2V_{THN})^{\alpha}$. If $V_{DD}$ takes the worst case value of $3V_{THN}$ [19], then $I_1 \propto V_{THN}^{\alpha}$. $I_1$ is thus significantly reduced, and the switching speed of $Q_1$ is largely degraded. Furthermore, this will increase the short-circuit current flowing from $V_{DD}$ to GND in the inverter. A further speed degradation occurs as $Q_2$ passes through the saturation and then the linear phases while discharging the O/P node. $Q_2$ starts discharging in the saturation mode when $V_{GS2} = V_{THN}$. Thus $I_2 \propto (V_{GS2} - V_{THN})^{\alpha}$, with $V_{GS2}$ initially at $V_{THN}$ to discharge the O/P node. This produces a very small discharging current, hence a large time delay. This goes on until the output node goes down to $V_{DD} - V_{TH}$, where the keeper turns ON, pulling up the internal node to $V_{DD}$ and hence accelerating the discharge process. This provides an additional delay constraint to CPL. Another problem associated with decreasing the $V_{DD}/V_{TH}$ ratio is the reduction in gate robustness, because the noise margins will dwindle. CPL is also sensitive to voltage and device scaling [20], which again influences the gate's performance, power consumption and robustness.
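The worst-case argument for the pass transistor can be checked numerically. The sketch below compares the drive of a pass transistor whose gate sits one threshold below the supply, $(V_{DD} - 2V_{THN})^{\alpha}$, with the drive of a full-swing gate, $(V_{DD} - V_{THN})^{\alpha}$, as a function of the $V_{DD}/V_{THN}$ ratio. The expressions follow the alpha-power relation as read from the text; the function name and the choice of a full-swing reference are assumptions made for illustration.

```python
def cpl_drive_ratio(vdd_over_vth, alpha):
    """Ratio of the degraded pass-transistor current, (VDD - 2*VTHN)^alpha,
    to the current with a full-swing gate level, (VDD - VTHN)^alpha.

    vdd_over_vth is the VDD/VTHN ratio; VTHN itself cancels out.
    """
    r = vdd_over_vth
    if r <= 2:
        return 0.0  # the degraded gate drive no longer exceeds the threshold
    return ((r - 2.0) / (r - 1.0)) ** alpha


if __name__ == "__main__":
    for ratio in (6.0, 5.0, 4.0, 3.0):
        long_ch = cpl_drive_ratio(ratio, 2.0)
        short_ch = cpl_drive_ratio(ratio, 1.3)
        print(f"VDD/VTHN={ratio:.0f}: alpha=2 -> {long_ch:.2f}, alpha=1.3 -> {short_ch:.2f}")
```

At the worst-case ratio of 3 and $\alpha = 2$, the degraded drive is one quarter of the full-swing drive, which matches the qualitative statement that $I_1$ is significantly reduced.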
3.6 Scaling of Interconnects
CPL has a complex structure and a high wiring overhead due to the dual-rail signals. The wiring (interconnect) capacitance is high, causing the power and delay to grow as well. This becomes worse in the DSM regime, where the RC delay of the interconnects occupies a large fraction of the clock cycle time, reaching over 30% in the 0.25 µm technology [21], as shown in Figure 7. This is another reason for the degradation in CPL's performance. Complex structures implemented with DCVS also suffer from interconnect scaling.
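A rough sketch of why the interconnect RC delay takes a growing share of the cycle, under simplifying assumptions that are not from this paper: wire resistance $\rho L/(W H)$, a parallel-plate capacitance $\varepsilon L W / T$ to the layer below, and no fringing or coupling capacitance. All names and dimensions are hypothetical.

```python
def wire_rc(rho, eps, length, width, height, dielectric_thickness):
    """RC product of a wire with parallel-plate capacitance to the layer below (s)."""
    resistance = rho * length / (width * height)
    capacitance = eps * length * width / dielectric_thickness
    return resistance * capacitance


def rc_scaling_factor(cross_section_scale, length_scale=1.0):
    """Change in RC when width, height and dielectric thickness scale by
    cross_section_scale and the wire length scales by length_scale."""
    return length_scale ** 2 / cross_section_scale ** 2


if __name__ == "__main__":
    rho = 2.7e-8               # assumed resistivity, ohm*m
    eps = 3.9 * 8.854e-12      # assumed oxide permittivity, F/m
    base = wire_rc(rho, eps, 1e-3, 0.5e-6, 0.5e-6, 0.5e-6)
    scaled = wire_rc(rho, eps, 1e-3, 0.25e-6, 0.25e-6, 0.25e-6)
    print(f"RC grows by x{scaled / base:.1f} when the cross-section is halved")
```

In this simplified model, halving the cross-section dimensions while keeping the wire length fixed quadruples the RC product, whereas gate delays shrink with scaling.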
Figure 7: Trend of the ratio of the interconnect RC delay and the clock cycle (x-axis: Technology (µm))

4 Area Considerations
4.1 Technology Scaling and Area
Metal interconnects are needed to connect transistors, route signals and supply power across the integrated circuit chip. As technology scales down, transistor feature sizes scale down linearly, while this is not true for metal wire interconnects due to physical limitations on the metal deposition. The interconnect pitch (metal width + space) is decreasing to exploit integration. However, the interconnect length is kept constant because of the use of more transistors per circuit. This leads to an increase in parasitic capacitances and line resistance. This degrades the chip's performance, and higher power is dissipated per unit area, which consequently raises the chip's temperature. In older technologies, poly layers were used for routing because of their reasonable resistance. This is not the case in DSM technologies, where the impedance of the poly layer grows and is unsuitable for long interconnects. Such limitations lead to the use of extra vias and metal wires in routing, which adds overhead. Copper interconnects are particularly used to reduce the interconnect area, since the physical limitations on copper size are more relaxed. Copper also has lower resistivity, allowing wires to have smaller widths and thus lower interconnect delays. However, many problems are associated with the use of copper wiring, which makes it an expensive alternative [22]. The use of a larger number of metal layers and stacked vias is a technique for improving interconnect density without reducing pitch. For DSM devices, six or more levels of metal are used, whereas older technologies used only two or three. Finally, the interconnect height is scaled at a slower rate than its width. This increases the wire's aspect ratio and consequently reduces the wire resistance. This, however, evokes line coupling, which causes crosstalk, increased power dissipation, and degradation in performance.

4.2 Logic Style and Area
The choice of logic style affects the area in two ways: cell area and routing area. Cell area is a function of the number and size of the devices. It is also dependent on the complexity of the logic cell, since complex gates require more area for connecting the devices of the gate. Generally, the differential logic styles CPL, DCVS and CML are area efficient in implementing arithmetic circuits and XOR-based logic systems. For simple gates such as AND and OR, the single-ended logic styles CMOS and Domino are preferred. Input signals are connected to transistor gates only, which facilitates the usage and characterization of logic cells. The layout of CMOS gates is straightforward and efficient due to the complementary transistor pairs. Routing area is the wire interconnect area for connecting the gates together. Differential logic styles have twice the number of inputs and outputs compared to single-ended logic families, leading to larger interconnect areas. As a rule of thumb, differential logic should be used only for complex gates, especially XOR gates, where it will reduce the total number of logic gates.

5 Results and Analysis
The performance of the logic gates in terms of power and delay is divided into two groups. The first includes the NAND, NOR and AOI gates (Group I). The second group includes the MUX, XOR, and the FA (Group II). Group I gates are usually implemented using single-ended structures. Generally, CMOS is the most efficient style to implement Group I. Its low power consumption and relatively good delay contribute to its low EDP values. The three dynamic styles follow CMOS in terms of minimum EDP. Of these, CML is the most efficient due to its high speed and limited power. It is followed by DCVS and then Domino logic. Domino proves to be the fastest for the NOR gate, but consumes a large amount of power. The high dynamic power associated with dynamic circuits is partly attributed to their high switching activity. CPL is considered the least efficient logic style to implement Group I gates. This is attributed to its exceptionally long delay and considerably high power, proving that AND and OR gates are the least efficient gates that could be realized by CPL.

As for the complex structures in Group II, logic styles having inverted signals and dual rails are usually used to implement these functions efficiently. CML and DCVS are the most efficient styles to implement Group II gates. This is attributed to their differential nature, inverted signal structures, sufficient speed, and tolerable power dissipation. Despite NP-Domino's high speed, its large power degrades its EDP value when implementing XOR and FA. Both static styles, CMOS and CPL, implement Group II gates inefficiently. However, MUXs are best realized using CPL, while CML tops other styles in implementing XOR and FA gates. XOR and MUX are considered the least efficient gates that could be realized using the CMOS implementation, because they require inverted signals as inputs.

Figures 8, 9 and 10 present the average normalized delay, power and EDP of Group I gates, while Figures 11, 12 and 13 present the average normalized delay, power and EDP of Group II gates. In Figure 8 it is clear that the speed enhancement for the logic styles decreases as technology scales down. CMOS, however, has the best enhancement. In Figure 13, CPL had the best EDP values in the 0.8 µm technology, but gradually experiences a relative increase in EDP as we move deeper in the DSM regime. It is also worth noting that CML had high EDP values in the 0.8 µm technology (Figures 10 and 13), but achieves low EDPs as technology is scaled down. This is consistent with [14], because MCML works efficiently in power down technologies. Finally, all six graphs verify that both the delay and power of CMOS gates are relatively enhanced in DSM technologies.

Figure 8: Average Normalized Delay for Group I
Figure 9: Average Normalized Power/MHz for Group I
Figure 10: Average Normalized EDP for Group I
Figure 11: Average Normalized Delay for Group II
Figure 12: Average Normalized Power/MHz for Group II
(Each plot: x-axis Technology (µm) at 0.25, 0.35, 0.6 and 0.8; curves for Conv. CMOS, CPL, Domino, DCVS and CML.)
Table 1 shows the results of the CLA adder. Conventional CMOS proves to have the worst delay, while attaining a somewhat average power dissipation value. Conventional CMOS is therefore the least efficient way to implement the CLA adder. Domino logic comes as the second worst implementation, because of its single-ended structure. All the differential structures have the best EDP when implementing the CLA adder. This is because of the numerous AOI and XOR structures that are used to build the CLA adder. It should be noted that the CPL CLA adder was implemented with single branch structures. This is the main reason for CPL's limited power consumption.
Table 1: CLA Comparison (normalized values)

Metric          Technology (µm)   CMOS    CPL     Domino  DCVS    CML
Power (Norm.)   0.25              1       1.23    1.33    1.57    1.96
Power (Norm.)   0.35              2.12    1.43    7.86    2.96    3.31
Power (Norm.)   0.6               5.82    2.49    11.6    3.9     4.22
Power (Norm.)   0.8               26.6    14.8    50.5    14.6    21.5
Delay (Norm.)   0.25              1       0.62    0.67    0.74    0.6
Delay (Norm.)   0.35              1.65    1.58    0.81    0.91    0.81
Delay (Norm.)   0.6               3.11    1.95    1.62    1.17    1.15
Delay (Norm.)   0.8               3.62    3.16    1.75    1.54    1.88
EDP (Norm.)     0.25              1       0.48    0.59    0.85    0.71
EDP (Norm.)     0.35              5.78    3.57    5.2     2.46    2.17
EDP (Norm.)     0.6               56      9.4     30.7    5.3     5.58
EDP (Norm.)     0.8               348     148     154     34.2    75.4
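
The normalized entries of Table 1 appear to be related by EDP ≈ Power × Delay², which is what one would expect if energy is taken as power × delay. This section does not state the relation explicitly, so it is treated here as an assumption. The following Python sketch, with values transcribed from Table 1 and purely illustrative helper names, checks that assumption and ranks the logic styles by EDP at a given technology node.

```python
# Normalized CLA adder results transcribed from Table 1.
# Column order: CMOS, CPL, Domino, DCVS, CML.
STYLES = ["CMOS", "CPL", "Domino", "DCVS", "CML"]
TECH_UM = [0.25, 0.35, 0.6, 0.8]

POWER = {
    0.25: [1.0, 1.23, 1.33, 1.57, 1.96],
    0.35: [2.12, 1.43, 7.86, 2.96, 3.31],
    0.6: [5.82, 2.49, 11.6, 3.9, 4.22],
    0.8: [26.6, 14.8, 50.5, 14.6, 21.5],
}
DELAY = {
    0.25: [1.0, 0.62, 0.67, 0.74, 0.6],
    0.35: [1.65, 1.58, 0.81, 0.91, 0.81],
    0.6: [3.11, 1.95, 1.62, 1.17, 1.15],
    0.8: [3.62, 3.16, 1.75, 1.54, 1.88],
}
EDP = {
    0.25: [1.0, 0.48, 0.59, 0.85, 0.71],
    0.35: [5.78, 3.57, 5.2, 2.46, 2.17],
    0.6: [56.0, 9.4, 30.7, 5.3, 5.58],
    0.8: [348.0, 148.0, 154.0, 34.2, 75.4],
}


def edp_from_power_delay(power, delay):
    """Assumed relation (not stated in the text): energy = power * delay,
    hence EDP = energy * delay = power * delay**2."""
    return power * delay ** 2


def max_relative_error():
    """Largest relative gap between recomputed and tabulated EDP."""
    worst = 0.0
    for tech in TECH_UM:
        for p, d, e in zip(POWER[tech], DELAY[tech], EDP[tech]):
            worst = max(worst, abs(edp_from_power_delay(p, d) - e) / e)
    return worst


def rank_by_edp(tech):
    """Logic styles ordered from lowest (best) to highest (worst) EDP."""
    return [style for _, style in sorted(zip(EDP[tech], STYLES))]


if __name__ == "__main__":
    print(f"max relative error: {max_relative_error():.3f}")
    for tech in TECH_UM:
        print(tech, rank_by_edp(tech))
```

With the values as transcribed, the recomputed EDP agrees with the tabulated EDP to within about 2 percent, and Conventional CMOS ranks last at every technology node.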
Figure 13: Average Normalized EDP for Group II
[Plot against Technology (µm) at 0.25, 0.35, 0.6 and 0.8; the recoverable legend entries are Conv. CMOS, Domino and DCVS.]

Conclusions
As technology scales down, CMOS is the least affected logic style. Its performance and robustness are enhanced compared to other logic styles. Domino's performance and power will deteriorate because of the leakage currents and the contention caused by the keeper transistor, while DCVS will also suffer from leakage power, but does not have any contention problems during evaluation. Because interconnects are not scaled linearly with technology, the percentage of power consumed in the clock tree will grow. CPL performance degrades much faster than that of other logic styles because of the reduction of the ratio VDD/VTH with technology scaling. The hot carrier effect makes it even worse by increasing VTH over the long term. CPL area will tend to grow, with more power dissipation for the larger area and the complex routing. Although CML tops the logic styles in many circuit implementations in terms of minimum EDP, it is not yet widely used. This is attributed to the fact that CML cannot be used as standard cells, because the RC delay varies from gate to gate according to the fan-in and fan-out. MCML may also have some trouble with VTH variations caused by the hot carrier effects. But if MCML is used at a lower supply voltage, the effect of the hot carriers will be less significant.

References
[1] M. Bohr et al., A high-performance 0.25-µm logic technology optimized for 1.8V operation, IEDM, pp. 847-850, 1996.
[2] R. Gonzalez et al., Supply and threshold voltage scaling for low power CMOS, IEEE JSSC, pp. 1210-1216, 1997.
[3] J.M. Rabaey, Digital Integrated Circuits, Prentice Hall, 1996.
[4] R. Zimmermann and W. Fichtner, Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic, IEEE JSSC, pp. 1079-1090, July 1997.
[5] R.H. Krambeck et al., High-speed Compact Circuits with CMOS, IEEE JSSC, pp. 614-619, 1982.
[6] P. Srivastava et al., Issues in the Design of Domino Logic Circuits, Proc. of IEEE GLSVLSI, pp. 108-112, 1998.
[7] Ruchir Puri, Design Issues in Mixed Static-Domino Circuit Implementations, Proc. IEEE International Conf. on Computer Design, pp. 270-275, Oct. 1998.
[8] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley Publishing Company, 1994.
[9] William R. Griffin and Lawrence G. Heller, Cascode Voltage Switch Logic: A Differential CMOS Logic Family, ISSCC, pp. 16-17, 1984.
[10] Wei Hwang and Fang-shi Lai, Design and Implementation of Differential Cascode Switch with Pass-Gate (DCVSPG) Logic for High-Performance Digital Systems, JSSC, pp. 563-573, April 1997.
[11] A. Barriga, M.J. Bellido, J.L. Huertas, A.J. Acosta and M. Valencia, SODS: A New CMOS Differential Type Structure, JSSC, pp. 835-838, July 1995.
[12] B. Kong, J. Choi, S. Lee and K. Lee, Charge Recycling Differential Logic (CRDL) for Low Power Applications, JSSC, pp. 1267-1276, September 1996.
[13] M. Mizuno et al., A GHz MOS Adaptive Pipeline Technique Using MOS Current-Mode Logic, JSSC, pp. 784-791, June 1996.
[14] M. Yamashina and H. Yamada, MOS current mode logic MCML circuit for low-power GHz processors, NEC Research & Development, vol. 36, no. 1, pp. 54-63, Jan 1995.
[15] T. Sakurai et al., Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas, IEEE JSSC, pp. 584-594, 1990.
[16] T. Hayashi et al., Hot carrier injection in PMOSFETs, OKI Technical Review, pp. 59-62, 1991.
[17] A. Bellaouar and M.I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems, Kluwer Academic Publishers, 1995.
[18] S. Thompson et al., Dual Threshold Voltage and Substrate Bias: Keys to High Performance, Low Power, 0.1 µm Logic Designs, IEEE Symposium on VLSI Technology Tech. Dig., pp. 69-70, 1997.
[19] S. Thompson et al., MOS Scaling: Transistor Challenges for the 21st Century, Intel Technology Journal, Q3, 1998.
[20] K. Yano et al., Top-Down Pass-Transistor Logic Design, IEEE JSSC, pp. 792-803, June 1996.
[21] M. Bohr and Y. Elmansy, Technology for Advanced High-Performance Microprocessors, IEEE Trans. on Electron Devices, vol. 45, pp. 620-625, 1998.
[22] Mark Bohr, Technology development strategies for the 21st century, Applied Surface Science, vol. 100/101, pp. 534-540, 1996.