High Performance, Energy Efficient Master-Slave Flip-Flop Circuits
Uming Ko¹ and Poras T. Balsara²
¹Texas Instruments Incorporated, P.O. Box 655303, M/S 8316, Dallas, TX 75265, uko@daldd.sc.ti.com
²Dept. of Elect. Engg., University of Texas at Dallas, P.O. Box 830668, EC33, Richardson, TX 75083

This paper investigates the performance, power, and energy efficiency of several CMOS master-slave D flip-flops (DFFs). To improve performance and energy efficiency, a push-pull DFF and a push-pull isolation DFF are proposed. Among the five DFFs compared, the proposed push-pull isolation circuit is found to be the fastest, with the highest energy efficiency and a minimum data pulse width property. Effects of using a DPL circuit and a tri-state push-pull driver are studied. The impact of scaling supply voltage alone, and of scaling transistor threshold voltage with supply voltage, on the speed and power consumption of these circuits is also examined.

I. Introduction
DFFs are one of the major functions in finite state machines (FSMs), which in turn are the critical part of control logic. It has been reported in [1] that the control logic of a microprocessor can account for 20% of the processor's power. As more advanced architecture concepts, such as register renaming and out-of-order execution in a superscalar microprocessor [2], continue to prevail, the control logic will likely become more complicated and its power dissipation will likely grow beyond this current level. In addition, to boost processor clock frequency, modern processors typically adopt superpipelined execution [2], which uses DFFs. Enhancing DFF speed can either lead to a higher clock rate or allow more logic depth between two pipeline registers. In this paper, we compare the area, speed, and power of five different DFF implementations: a regular low-risk DFF [3], a low-area DFF, a low-power DFF [4], a proposed push-pull DFF for performance, and another proposed push-pull isolation DFF (PPI-DFF) for performance and energy efficiency. Discussion is then extended to the use of double pass-transistor logic (DPL) for speed [5] and a tri-stated circuit for reducing short-circuit power dissipation. Lastly, the effects of scaling supply voltage at constant threshold voltage, as well as scaling threshold voltage with supply voltage, are examined.

II. Design Techniques and Comparison of Energy Efficiency
A conventional negative edge-triggered DFF, consisting of two level-sensitive latches (16 MOSFETs), is illustrated in Fig. 1(a). The speed of this regular DFF is limited by a two-gate delay (245 ps, Table I) after the clock signal, C, transitions from logic 1 to 0. The advantage of this DFF design is that it involves minimum design risk. A common approach to reduce the area overhead of the regular DFF is to remove the two feedback transmission gates. This low-area DFF is depicted in Fig. 1(b), and it uses 25% fewer transistors. Although the strength of the feedback inverters has been weakened to minimize the short-circuit power dissipation due to voltage contention, this low-area DFF still consumes 18% more total power and is 42% slower (or 76% more energy, Table I) than the regular DFF.

One approach to optimize for power dissipation is to replace the inverter and transmission gate in the feedback path of Fig. 1(a) with a single tri-state inverter, which is referred to as a low-power DFF [4], as shown in Fig. 1(c). The tri-state inverter avoids short-circuit power dissipation in the feedback path, and yields only a 1% reduction in total power and a 3% slower speed (Table I) when compared to the regular DFF. Considering area and energy efficiency, the low-power DFF is comparable to the regular DFF. To optimize for speed, an inverter and a transmission gate are added between the outputs of the master and slave latches to accomplish a push-pull effect at the slave latch, as depicted in Fig. 1(d). This adds four MOSFETs, but reduces the clock-to-output (C-to-Q) delay from two gates in a regular DFF to one gate. One method to reduce the transistor count is to use an nMOSFET for the latches' input [6]. However, the output of the nMOSFET can only reach a voltage level of Vdd - Vt when it is at logic 1, causing a power overhead of up to 50% [5]. A second issue of the nMOSFET input is the speed degradation due to a slow transition from logic 0 to logic 1. Therefore, a full transmission gate is kept in the push-pull DFF. To offset the four MOSFETs added for push-pull, the two transmission gates in the feedback paths are eliminated. Compared to the regular DFF, this push-pull DFF is 31% faster but with a 22% power overhead.

To optimize for energy usage in the push-pull DFF, two pMOSFETs are added to isolate the feedback path. This push-pull isolation DFF (PPI-DFF) increases the transistor count to 18, but achieves a 16% reduction in total power and a speedup of 25% (Table I) relative to the previous push-pull DFF. Compared to the regular DFF, the PPI-DFF improves speed by 56% at an expense of 6% more power. The energy efficiency of this PPI-DFF is enhanced by 45%-122% when compared to the previous four DFFs. Applying a DPL [5] input (Fig. 1(f)) to the PPI-DFF can result in a 20% reduction in the setup time. However, when D is at logic 1 and C switches from logic 1 to 0, a DC path exists (INK-P2-P1-C), leading to a 60% power overhead. Another option is to use a tri-state inverter to replace the push-pull driver of the PPI-DFF, as shown in Fig. 1(g). Though this approach reduces the short-circuit power of the push-pull driver, it weakens the drive strength due to stacked MOSFETs and is 10% less efficient in energy compared to the PPI-DFF.

III. Effects of Scaling Vt and Vdd
SPICE simulation results for various supply voltages (Vdd), using the same device models with constant threshold voltages (Vt), are summarized in Fig. 2-Fig. 5. In Fig. 2, the speed of the low-area DFF degrades faster than the others under low voltages, as it takes longer to resolve the logic value. In Fig. 3, the push-pull and low-area DFFs consistently dissipate higher power due to voltage contention in the feedback loops. For a Vdd range of 1.5-3.5 V, the PPI-DFF's energy efficiency is 42-51% higher than that of the regular, low-power, and push-pull DFFs, and is 218-272% higher than that of the low-area DFF (Fig. 4). From 3.5 V to 1.5 V of Vdd, energy efficiency is improved by a factor of 2 on average. At 1.5 V, the minimum data pulse width of the low-area and push-pull DFFs degrades to 3 ns while the other three DFFs stay at 0.7 ns, which strongly suggests the former should be avoided in low-voltage, high-performance applications (Fig. 5).

Scaling Vt with Vdd can maintain a proper signal-to-noise ratio [5]. Assuming Vt can be kept at 1/5 of Vdd, the effects of scaling both simultaneously down to a Vdd of 1.0 V are summarized in Fig. 6-Fig. 9. In contrast to Fig. 4, Fig. 8 indicates that energy efficiency is improved by a factor of 5.7 when Vdd is scaled from 3.5 V down to 1.0 V. From Fig. 5 to Fig. 9, at a Vdd of 1.5 V, the minimum data pulse of the low-area and push-pull DFFs improves by a factor of 5.8 as the Vt scaling reduces the effects of voltage contention, while that of the other DFFs improves only by a factor of 1.8.

IV. Conclusions
Though the low-area DFF uses up to 33% fewer transistors, the internal voltage contention consumes up to 122% more energy than the rest of the DFFs. Compared to a regular DFF, a low-power and a push-pull DFF improve power dissipation by 1% and delay by 31%, respectively, but end up with a comparable energy efficiency. The proposed PPI-DFF improves speed by 56% at the expense of only 6% more power when compared to a regular DFF. The energy efficiency of this PPI-DFF is 45-122% higher than that of the other DFFs. On average, while scaling supply voltage from 3.5 V to 1.5 V enhances energy efficiency by a factor of 2, scaling it with threshold voltage can boost the efficiency by a factor of 5.7.

References
[1] IBM, "Blue Lightning Technology Preview," IBM, 1993.
[2] M. Johnson, "Superscalar Microprocessor Design," Prentice Hall, NJ, 1991.
[3] N. Weste et al., "Principles of CMOS VLSI Design," Addison-Wesley, 1993.
[4] G. Gerosa et al., "2.2W, 80MHz Superscalar RISC Processor," JSSC, 12/1994.
[5] U. Ko et al., "Low Power Techniques for HP Adder," Trans. on VLSI, 6/1995.
[6] R. Hossain et al., "Low Power Design with DET FF," Trans. on VLSI, 6/1994.
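The transistor counts quoted in Sections II and IV can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: it assumes each transmission gate costs two MOSFETs, the dictionary keys and the helper `fewer_pct` are hypothetical names, and the low-power DFF is left out because the text does not state its count.

```python
# Transistor bookkeeping for the DFFs described in the text.
# Assumption: one transmission gate = 2 MOSFETs (one nMOS + one pMOS).
REGULAR = 16  # regular DFF, as stated in Section II

counts = {
    "regular": REGULAR,
    # low-area: the two feedback transmission gates are removed
    "low_area": REGULAR - 2 * 2,
    # push-pull: +4 MOSFETs (inverter + transmission gate),
    # offset by removing the two feedback transmission gates
    "push_pull": REGULAR + 4 - 2 * 2,
}
# PPI-DFF: two pMOSFETs added to the push-pull DFF to isolate the feedback path
counts["ppi"] = counts["push_pull"] + 2


def fewer_pct(a: int, b: int) -> float:
    """Percentage by which count a is smaller than count b."""
    return 100.0 * (1.0 - a / b)


low_area_vs_regular = fewer_pct(counts["low_area"], counts["regular"])  # 25.0
low_area_vs_ppi = fewer_pct(counts["low_area"], counts["ppi"])          # about 33.3
```

Under these assumptions the low-area DFF comes out at 12 transistors, which agrees with both the "25% fewer" figure relative to the regular DFF and the "up to 33% fewer" figure relative to the 18-transistor PPI-DFF.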

0-7803-3036-6/95/$4.00 © 1995 IEEE
Fig. 1. Schematic of different purposes flip-flops and circuits: (a) Regular DFF, (b) Low-area DFF, (c) Low-power DFF, (d) Push-pull DFF, (e) Push-pull isolation DFF, (f) DPL CKT, (g) 3-state CKT.
Fig. 2. Delay vs. supply voltage. (x-axis: Supply Voltage (volts), 1.5 to 3.5)
Fig. 3. Average power vs. supply voltage. (x-axis: Supply Voltage (volts), 1.5 to 3.5)
Fig. 4. Energy consumption vs. supply voltage. (x-axis: Supply Voltage (volts), 1.5 to 3.5)
Fig. 5. Minimum pulse (= t_setup + t_hold) vs. supply voltage. (x-axis: Supply Voltage (volts), 1.5 to 3.5)
Fig. 6. Dependency of delay on Vt & Vdd scaling. (x-axis: Threshold Voltage, Vt (with Vdd = 5Vt) (volts), 0.2 to 0.7)
Fig. 7. Dependency of power on Vt & Vdd scaling. (x-axis: Threshold Voltage, Vt (with Vdd = 5Vt) (volts), 0.2 to 0.7)
Fig. 8. Dependency of energy on Vt & Vdd scaling. (x-axis: Threshold Voltage, Vt (with Vdd = 5Vt) (volts), 0.2 to 0.7)
Fig. 9. Dependency of min. pulse (= t_setup + t_hold) on Vt & Vdd scaling. (x-axis: Threshold Voltage, Vt (with Vdd = 5Vt) (volts), 0.2 to 0.7)
Curves in Fig. 2-Fig. 9: Regular, Low-area, Low-power, Push-pull, PP Isolation.
TABLE I. Comparison of power, delay, & energy for various DFFs
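
As a purely behavioral illustration of the regular DFF of Fig. 1(a), the following sketch models a negative edge-triggered master-slave flip-flop as two level-sensitive latches. It is an idealized logic model, not the authors' circuit: it ignores gate delays, signal inversions, and all transistor-level effects discussed in the paper, and the class names `Latch` and `MasterSlaveDFF` are hypothetical.

```python
class Latch:
    """Level-sensitive D latch: transparent while enable is 1, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """Negative edge-triggered DFF built from two latches (idealized)."""

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, d: int, c: int) -> int:
        # Master is transparent while C = 1 and holds while C = 0.
        m = self.master.update(d, enable=c)
        # Slave is transparent while C = 0, so Q changes only after C falls.
        return self.slave.update(m, enable=1 - c)


ff = MasterSlaveDFF()
# (D, C) pairs: D = 1 is captured on the first 1 -> 0 transition of C;
# changing D while C stays low has no effect until the next falling edge.
trace = [ff.step(d, c) for d, c in [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]]
# trace == [0, 1, 1, 1, 0]
```

In this model Q follows the value D held just before C fell, which is the behavior the two-gate C-to-Q delay in Section II refers to.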
