Constant Delay Logic Style

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 3, MARCH 2013
Pierce Chuang, Student Member, IEEE, David Li, Student Member, IEEE, and Manoj Sachdev, Fellow, IEEE
Abstract: A constant delay (CD) logic style is proposed in this paper, targeting full-custom high-speed applications. The CD characteristic of this logic style, which holds regardless of the logic type, makes it suitable for implementing complicated logic expressions such as addition. CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage are ready. This feature offers a performance advantage over static and dynamic domino logic styles in a single-cycle multistage circuit block. Several design considerations, including timing window width adjustment and clock distribution, are discussed. Using 65-nm general-purpose CMOS technology, the proposed logic demonstrates an average speedup of 94% and 56% over static and dynamic domino logic, respectively, in five different logic gates. Simulation results of 8-bit ripple carry adders show that CD logic is 39% and 23% faster than the static and dynamic-based adders, respectively. CD logic also demonstrates 39% speedup and 64% (22%) energy-delay product (EDP) reduction from static logic at 100% (10%) data activity in 32-bit carry lookahead adders. For an 8-bit Wallace tree multiplier, CD logic achieves a similar speedup with at least 50% EDP reduction across all data activities.
Index Terms: Adder, constant delay (CD), feedthrough, high-performance logic style, pre-evaluated.
I. INTRODUCTION
High-performance, energy-efficient logic style has always been a popular research topic in the field of VLSI circuits because of the continuous demand for ever-increasing circuit operating frequency. The invention of the dynamic domino logic [Fig. 1(a)] in the 1980s is one of the answers to this demand, as it allows designers to implement high-performance circuit blocks, e.g., arithmetic logic units, at an operating frequency that traditional static and pass transistor CMOS logic styles find difficult to achieve [1]. However, the performance enhancement comes with several costs, including a reduced noise margin, charge sharing, and higher power dissipation due to higher data activity. Several variations of the dynamic domino logic, namely NP domino (NORA domino) [2], zipper domino [3], and data-driven dynamic logic (D^3L) [4], [5], have been proposed, but they have never become widespread in the VLSI industry [6], [7].
Compound domino logic (CDL), in which dynamic and static gates alternate, has become the most popular logic style in high-performance circuit blocks, e.g., 64-bit adders, in modern CPUs [8]-[11]. In this design, the output inverter is replaced with a more complex inverting static gate, e.g., NAND, such that the monotonicity requirement is satisfied while complex logic operations are performed without wasting one inverter delay [12]. Moreover, all the dynamic stages except the first stage can be footless in CDL. This implementation, however, comes at the expense of: 1) increased power consumption due to the possible direct path current during the precharge period; and 2) a reduced noise margin as a result of the unprotected outputs of the dynamic domino logic [6].

Manuscript received May 9, 2011; revised October 13, 2011; accepted February 16, 2012. Date of publication March 13, 2012; date of current version February 20, 2013.
P. Chuang and M. Sachdev are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L5Y9, Canada (e-mail: pichuang@uwaterloo.ca; msachdev@uwaterloo.ca).
D. Li was with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L5Y9, Canada. He is now with Qualcomm Inc., San Diego, CA 92121 USA (e-mail: d4li@engmail.uwaterloo.ca).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2012.2189423
A significant research effort has been dedicated to exploring new logic styles that go beyond dynamic domino logic and CDL. In particular, source-coupled logic (SCL) [13] has shown superior performance that is difficult to achieve with any other logic style. However, it suffers from high power dissipation due to a constant current draw, and its differential nature requires complementary signals. Pseudo-nMOS logic, which uses a single pull-up pMOS transistor, provides both high speed and low transistor count at the expense of high static power consumption as well as reduced output voltage swing. Output prediction logic (OPL) [14] has also shown superior performance in high-speed adders [15]. Nevertheless, OPL requires the generation and distribution of multiphase clock signals with small timing separations and low skews, which are difficult to achieve. While numerous high-speed logic styles have been proposed, dynamic domino logic and CDL still remain the most attractive choices when performance is the primary concern.
In recent years, a new way of logic operation, known as feedthrough logic (FTL) [16], [17], has been proposed and has demonstrated high-performance capability. In a dynamic domino logic gate [Fig. 1(a)], the critical path consists of the nMOS logic transistors. In FTL, however, the roles of the clock and logic transistors are interchanged (Section II-A), and the clock transistor is now the critical path. The first generation of FTL exhibits many shortcomings, including excessive power dissipation and reduced noise margin. To mitigate these problems, we propose a new high-performance logic, which we call constant delay (CD) logic. CD logic provides a local window technique and a self-reset circuit, which enable robust logic operation with minimized power consumption while maintaining FTL's speed advantage. The most distinctive characteristic of CD logic compared with previously proposed logic styles is that the delay is, to a first-order approximation, not affected by the logic expression.^1 Unlike SCL, CD logic does not require complementary signals and can be easily integrated with static and dynamic domino logic. Also, CD logic does not have the problem of constant static power dissipation found in pseudo-nMOS. Furthermore, the clock timing requirement of CD logic is not as stringent as that of OPL. CD logic can achieve robust operation with optimal performance as long as clock signals arrive earlier than the input signals. In this paper, we will demonstrate that: 1) CD logic outperforms other logic styles with better energy efficiency and is particularly suitable for high-performance digital blocks; and 2) CD logic is robust under extreme process and temperature variations.

^1 The delay of CD logic is still a function of the logic expression when all effects are considered, since a more complicated logic expression implies a larger capacitive load.

1063-8210/$31.00 © 2012 IEEE

Fig. 1. Schematic of (a) dynamic domino logic with a footer transistor and (b) FTL.
The rest of this paper is organized as follows. Section II reviews the history of FTL and introduces the proposed CD logic. Section III discusses the design considerations of CD logic and provides a first-order approximation of the unwanted glitch voltage. Section IV analyzes the impact of window width on robustness and compares CD logic with other logic styles in different logic expressions. Section V demonstrates CD logic's performance advantage in single-cycle multistage digital applications such as 32-bit adders. Finally, Section VI concludes with some final remarks.
II. EVOLUTION OF CD LOGIC
A. FTL Logic
FTL [Fig. 1(b)] in CMOS technology was first introduced in [16] and [17]. Its basic operation is as follows: when CLK is high, the predischarge period begins and Out is pulled down to GND through M2. When CLK becomes low, M1 is on, M2 is off, and the gate enters the evaluation period. If the inputs (IN) are logic 1, Out enters the contention mode, in which M1 and the transistors in the nMOS pull-down network (PDN) conduct current simultaneously. If the PDN is off, the output quickly rises to logic 1. In this case, FTL's critical path is always a single pMOS transistor.
Despite its performance advantage, FTL suffers from reduced noise margin, excess direct path current, and nonzero nominal low output voltage, all of which are caused by the contention between M1 and the nMOS PDN during the evaluation period. Furthermore, cascading multiple FTL stages to perform complicated logic evaluations is not practical.
Consider a chain of inverters implemented in FTL, cascaded together and driven by the same clock, as shown in Fig. 2. When CLK is low, M1 of every stage turns on, and the output of every stage begins to rise. This results in false logic evaluations at even-numbered stages (i.e., 2, 4, 6, etc.), since initially there is no contention between M1 and the nMOS PDN because all inputs to the nMOS transistors are reset to logic 0 during the reset period.

Fig. 2. Simulated unwanted glitch at different logic depths in a chain of inverters implemented with FTL. [Plot: voltage (V) versus time (ps) for CLK and nodes N2, N4, N6, and N8.]
B. CD Logic
To mitigate the above-mentioned problems, CD logic is
proposed with a schematic shown in Fig. 3(a). Timing block
(TB) creates an adjustable window period to reduce the static
power dissipation. Logic Block (LB) helps to reduce the
unwanted glitch and also makes cascading CD logic feasible.
A buffer implemented in CD logic with schematics of TB and
LB is shown in Fig. 3(b).
1) CD Logic Operation: Fig. 4 depicts the corresponding CD logic timing diagram and flowchart. For simplicity, we assume that IN comes from dynamic domino logic gates. When CLK is high, CD logic predischarges both X and Y to GND. When CLK is low, CD logic enters the evaluation period and one of three scenarios can take place: the contention, C-Q delay, and D-Q delay modes. The contention mode occurs when CLK is low while IN remains at logic 1. In this case, X is at a nonzero voltage level, which causes Out to experience a temporary glitch. The duration of this glitch is set by the local window width, which is determined by the delay between CLK and CLK_d. When CLK_d becomes high, and if X remains low, then Y rises to logic 1 and turns off M1. Thus the contention period is over, and the temporary glitch at Out is eliminated. The C-Q delay mode takes place when IN makes a transition from high to low before CLK becomes low. When CLK becomes low, X rises to logic 1 and Y remains at logic 0 for the entire evaluation cycle. The delay is measured from the falling edge of CLK to the falling edge of Out, hence the name C-Q delay. The D-Q delay mode utilizes the pre-evaluated characteristic of CD logic to enable high-performance operation. In this mode, CLK falls from high to low before IN transitions, hence X initially rises to a nonzero voltage level. As soon as IN becomes logic 0 while Y is still low, X quickly rises to logic 1. A race condition exists in this case between X and Y. If CLK_d rises much earlier than X, then Y will go to logic 1, turn off M1, and cause a false logic evaluation. If CLK_d rises slightly later than X, then Y will initially rise (thus partially turning off M1) but eventually settle back to logic 0. CD logic can still perform the correct logic operation in this case; however, its performance is degraded because of M1's reduced current drivability. Therefore, it is important to maintain a sufficient window width under process-voltage-temperature (PVT) variations. Table I presents a summary of CD logic's operations.

Fig. 3. CD logic (a) block diagram and (b) buffer. [Schematic labels: M0 to M7, INV1 to INV3, CLK, CLK_d, IN, X, Y, Out, timing block, logic block, and nMOS pull-down network.]

Fig. 4. Timing diagram and flowchart of the proposed CD logic. [Timing diagram signals: CLK, CLK_d, IN, X, Y, and Out, annotated with the window width, the C-Q delay, the D-Q delay, the pre-evaluated region, the contention region, and the temporary glitch.]
Flowchart (all inputs are initially at logic 1 and conditionally transition to 0 before or during the evaluation period):
Predischarge period: CLK is high. X is predischarged to GND and Out is precharged to VDD.
Evaluation period: begins when CLK is low. Is the PDN on (IN = 1)?
  No: C-Q delay mode. X rises to logic 1 and Out discharges to GND.
  Yes: contention mode. Direct path current flows between the pMOS transistor and the PDN; X rises to a nonzero voltage level and Out experiences a temporary glitch. Is the PDN on for the entire evaluation period?
    Yes: the gate remains in contention mode for the entire evaluation period; Y = logic 1 (window closes) eliminates the temporary glitch.
    No: D-Q delay mode. The inputs transition to 0 while the window is still transparent (Y = 0); X rises to logic 1 and Out discharges to GND.

TABLE I
SUMMARY OF CD LOGIC'S OPERATION
| Mode | Scenario | Operation |
|---|---|---|
| Predischarge | CLK is high. | X and Out are predischarged and precharged to GND and VDD, respectively. |
| Contention | IN = 1 for the entire evaluation period. | Direct path current flows from pMOS to PDN. X rises to a nonzero voltage level and Out experiences a temporary glitch. |
| C-Q Delay | IN goes to 0 before CLK transits to low. | X rises to logic 1 and Out is discharged to GND. Delay is measured from CLK to Out. |
| D-Q Delay | IN goes to 0 after CLK transits to low (while window is still open, i.e., Y is still 0). | X initially enters contention mode and later rises to logic 1. Delay is measured from IN to Out. |
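As a reading aid, the mode selection summarized in Table I can be sketched as a small behavioral classifier. This is only an illustration under stated assumptions, not a circuit model: the function and enum names are hypothetical, all times are idealized edge times in picoseconds, and the extra `WINDOW_MISSED` outcome is our own label for the case described in the text where the window closes before the input falls.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    CONTENTION = "contention"
    CQ_DELAY = "C-Q delay"
    DQ_DELAY = "D-Q delay"
    # Hypothetical label: input fell only after Y rose and turned off M1.
    WINDOW_MISSED = "input arrived after the window closed"


def classify_mode(t_clk_fall: float, window_width: float,
                  t_in_fall: Optional[float] = None) -> Mode:
    """Classify one evaluation period of a CD gate (idealized edges).

    t_clk_fall   -- time at which CLK goes low (evaluation begins)
    window_width -- time from the CLK fall until the window closes
    t_in_fall    -- time at which IN falls to 0, or None if IN stays at 1
    """
    if window_width <= 0:
        raise ValueError("window_width must be positive")
    if t_in_fall is None:
        # IN = 1 for the entire evaluation period.
        return Mode.CONTENTION
    if t_in_fall <= t_clk_fall:
        # IN goes to 0 before CLK transits to low.
        return Mode.CQ_DELAY
    if t_in_fall < t_clk_fall + window_width:
        # IN goes to 0 while the window is still open (Y is still 0).
        return Mode.DQ_DELAY
    return Mode.WINDOW_MISSED


if __name__ == "__main__":
    for t_in in (None, -20.0, 60.0, 150.0):
        print(t_in, "->", classify_mode(0.0, 115.0, t_in).value)
```

The 115 ps window in the usage lines is taken from the adder experiments later in the paper; the input arrival times are arbitrary.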
Compared to FTL, where the contention lasts for the entire evaluation period, the TB effectively reduces CD logic's power consumption during the contention mode. The local window technique in the proposed CD gate allows designers to customize the window width for different logic expressions to achieve minimal power dissipation without sacrificing performance. For instance, a multiple-input NAND gate will require a longer window width than a NOR gate because of the larger internal capacitance due to the stacked nMOS transistors. Another advantage of CD logic is that the internal node (X) is always connected to either VDD or GND, thus making the robustness of CD logic comparable to that of static logic, except during the contention mode.
CD logic eliminates the problem of false logic evaluation associated with cascaded FTL. Consider a cascaded CD logic system, in which the inputs to the nMOS PDN are always at logic 1 when first entering the evaluation period, because X and Out are always predischarged and precharged to logic 0 and 1, respectively. Therefore, when CLK is low, CD gates will always first enter the contention mode and conditionally make a low-to-high transition depending on the inputs. This is not the case for the first-stage CD gate, however, as there is no guarantee that its inputs will always be at logic 1. In other words, designers need to ensure that the input signals to the first CD gate arrive earlier than the clock signal, i.e., that it operates in the C-Q delay mode only.

Fig. 5. Simplified schematic of CD logic during contention mode. [Schematic labels: P1, N1, P2, N2, VDD, and VSS; node voltages V1 and VDD - V2.]
III. CD LOGIC DESIGN CONSIDERATIONS
A. CD Logic Sizing
The sizing of INV1-3 and M3-M6 in Fig. 3(b) is close to the minimum size so that they do not create a large area burden. The length of INV1-3 can be altered to provide the required timing window duration according to the designer's choice.
1) CD Logic Versus Pseudo-nMOS: Both pseudo-nMOS and CD logic are ratioed circuits that rely on the correct pMOS-to-nMOS strength ratio to operate correctly. In pseudo-nMOS, the pMOS transistor is often sized to have about one-fourth the strength of the nMOS PDN as a compromise between noise margin and speed [6]. CD logic, on the other hand, always discharges X to GND when CLK is high; therefore, it can be optimized for the low-to-high transition only. Hence, the pMOS clock transistors in CD logic can be upsized to provide more speedup, as long as the output glitch is kept at an acceptable level.
B. Output Glitch
Fig. 5 depicts a simplified schematic of CD logic during the contention mode, where both transistors P1 and N1 are on simultaneously and induce a glitch voltage $V_1$, which in turn generates another smaller glitch $V_2$. By design, $V_1$ should be small [i.e., less than the threshold voltage ($V_t$)]. Hence, P1 operates in the saturation region while N1 is in the linear region. The current equation is given as
$$\frac{1}{2}\mu_p C_{ox}\frac{W_{p1}}{L_{p1}}\left(V_{gsp1}-V_{tp}\right)^2 = \mu_n C_{ox}\frac{W_{n1}}{L_{n1}}\left[\left(V_{gsn1}-V_{tn}\right)V_{dsn1}-\frac{(V_{dsn1})^2}{2}\right] \tag{1}$$
where $\mu_p$ and $\mu_n$ are the hole and electron mobility of pMOS and nMOS transistors, respectively, $C_{ox}$ is the oxide capacitance, $W$ and $L$ are the transistor width and length, respectively, and $V_{gs}$ and $V_{ds}$ are the transistor gate-to-source and drain-to-source voltages, respectively. Assuming same-length devices are used and $\mu_p \approx 0.5\mu_n$, rearranging (1) gives
$$\frac{W_{p1}}{4W_{n1}}\left(V_{DD}-V_{tp}\right)^2 = \left(V_{DD}-V_{tn}\right)V_1-\frac{(V_1)^2}{2}. \tag{2}$$
$V_1$ can be found by solving the quadratic equation
$$V_1 = V_{DD}-V_{tn}-\sqrt{\left(V_{DD}-V_{tn}\right)^2-\frac{W_{p1}}{2W_{n1}}\left(V_{DD}-V_{tp}\right)^2}. \tag{3}$$
By Taylor expansion, the square root term can be approximated to first order as
$$\sqrt{N^2+d} = N+\frac{d}{2N}-\frac{d^2}{8N^3}+\frac{d^3}{16N^5}-\cdots \approx N+\frac{d}{2N}. \tag{4}$$
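Assuming $V_{tp} \approx V_{tn}$, (3) can be approximated as
$$V_1 \approx V_{DD}-V_{tn}-\left(V_{DD}-V_{tn}\right)+\frac{W_{p1}\left(V_{DD}-V_{tp}\right)^2}{4W_{n1}\left(V_{DD}-V_{tn}\right)} \approx \frac{W_{p1}\left(V_{DD}-V_{tp}\right)}{4W_{n1}}. \tag{5}$$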
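To see how close the first-order estimate (5) stays to the closed-form solution (3), the two can be evaluated side by side. The sketch below is only a numerical illustration of those two formulas; the function names are ours, and the device values in the usage note (a 1 V supply, 0.3 V thresholds, and a pMOS-to-nMOS width ratio of 0.25) are hypothetical rather than taken from the paper.

```python
import math


def v1_exact(w_p1, w_n1, vdd, vtn, vtp):
    """Glitch voltage V1 from the closed-form solution (3)."""
    overdrive_n = vdd - vtn
    discriminant = overdrive_n ** 2 - (w_p1 / (2.0 * w_n1)) * (vdd - vtp) ** 2
    if discriminant < 0:
        # The square-law model has no real solution: P1 is too strong
        # relative to N1 for the "N1 in linear region" assumption.
        raise ValueError("no real solution for V1 with this width ratio")
    return overdrive_n - math.sqrt(discriminant)


def v1_first_order(w_p1, w_n1, vdd, vtp):
    """First-order estimate (5), which assumes Vtp is close to Vtn."""
    return w_p1 * (vdd - vtp) / (4.0 * w_n1)


def width_ratio_for_glitch(v1_target, vdd, vtp):
    """Invert (5): the Wp1/Wn1 ratio that yields a target V1."""
    return 4.0 * v1_target / (vdd - vtp)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the formulas.
    vdd, vt = 1.0, 0.3
    print("exact (3):      ", v1_exact(1.0, 4.0, vdd, vt, vt))
    print("first order (5):", v1_first_order(1.0, 4.0, vdd, vt))
    print("ratio for 50 mV:", width_ratio_for_glitch(0.05, vdd, vt))
```

Because the dropped second-order Taylor term is positive here, the first-order value slightly underestimates the exact one, so a ratio chosen from (5) is best treated as a starting point for simulation.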
$V_2$ can also be found through a similar approach. Consider Fig. 5 again: transistor N2 operates in the subthreshold region while P2 is working in the linear mode. Equating the two current equations yields
$$\frac{W_{n2}}{L_{n2}} I_t\, e^{\frac{V_{gsn2}-V_{tn}}{\eta V_T}}\left(1-e^{-\frac{V_{dsn2}}{V_T}}\right) = \mu_p C_{ox}\frac{W_{p2}}{L_{p2}}\left[\left(V_{gsp2}-V_{tp}\right)V_{dsp2}-\frac{(V_{dsp2})^2}{2}\right] \tag{6}$$
where [18], [19]
$$I_t = \mu_o C_{ox}\left(V_T\right)^2 e^{1.8},\quad \eta = 1+\frac{3T_{ox}}{W_{dm}},\quad V_T = \frac{KT}{q} \tag{7}$$
where $\mu_o$ is the zero bias mobility, $\eta$ is the subthreshold swing coefficient, $W_{dm}$ is the maximum depletion layer width, $V_T$ is the thermal voltage, $K$ is the Boltzmann constant, $T$ is the temperature in kelvin, and $q$ is the electron charge. In the case of nMOS transistors, $\mu_o$ is simply $\mu_n$. Rearranging (6) gives
$$\frac{W_{n2} I_t\, e^{\frac{V_1-V_{tn}}{\eta V_T}}}{\mu_p C_{ox} W_{p2}} = \left(V_{DD}-V_1-V_{tp}\right)V_2-\frac{(V_2)^2}{2} \tag{8}$$
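where $e^{-(V_{dsn2})/V_T} \approx 0$ since $(V_{DD}-V_2) \gg V_T$. Solving for $V_2$ then yields
$$V_2 = V_{DD}-V_1-V_{tp}-\sqrt{\left(V_{DD}-V_1-V_{tp}\right)^2-A\,e^{\frac{V_1-V_{tn}}{\eta V_T}}} \tag{9}$$
$$A = \frac{2W_{n2} I_t}{\mu_p C_{ox} W_{p2}} \approx \frac{4W_{n2}}{W_{p2}}\left(V_T\right)^2 e^{1.8}. \tag{10}$$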
Applying Taylor expansion
$$V_2 \approx V_{DD}-V_1-V_{tp}-\left[\left(V_{DD}-V_1-V_{tp}\right)-\frac{A\,e^{\frac{V_1-V_{tn}}{\eta V_T}}}{2\left(V_{DD}-V_1-V_{tp}\right)}\right]. \tag{11}$$
Fig. 6. Simulated output glitch of a five-stage two-input AND gate implemented with CD logic. [Plot: voltage (V) versus time (ns) at nodes N1 to N5.]
Fig. 7. Temporary glitch mean and standard deviation at the output of three-input CD AND and OR gates versus temperature in a Monte Carlo simulation with 7500 iterations. [Plot: voltage (mV) versus temperature (°C); curves: AND3 average glitch, AND3 3σ, OR3 average glitch, and OR3 3σ.]
Finally
$$V_2 \approx \frac{A\,e^{\frac{V_1-V_{tn}}{\eta V_T}}}{2\left(V_{DD}-V_1-V_{tp}\right)}. \tag{12}$$
Equations (5) and (12) provide several first-order design insights for CD logic. For a given $V_1$, designers can quickly estimate the required $W_{p1}$ to $W_{n1}$ ratio. Moreover, $V_1$ is linearly proportional to the shift of $V_t$ and transistor width in the presence of process variations. When $V_1$ is sufficiently small, $V_2$ is approximately zero. As $V_1$ increases, both the numerator and the denominator in (12) contribute to the exponential increase of $V_2$. In multistage CD logic circuitry, $V_1$ of each stage will slowly increase because of the reduced $V_{gs}$ of N1 due to $V_2$ from the preceding stage. This phenomenon is demonstrated in Fig. 6, where the glitch level worsens as it traverses a series of two-input AND gates implemented with CD logic. Equation (12) also suggests that $V_2$ is a strong function of temperature. Fig. 7 illustrates the mean and three standard deviations ($3\sigma$) of the temporary glitch at the output of three-input CD AND and OR gates versus temperature. Clearly, as the temperature increases from 20 °C to 120 °C, the $3\sigma$ glitch level of a three-input AND gate rises from 55 mV to approximately 140 mV.
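The sensitivity of (12) to $V_1$ and to temperature can be explored numerically. The following sketch evaluates (10) and (12) directly; it is a rough illustration only. The function names, the width values, the thresholds, and the subthreshold swing coefficient of 1.5 are all assumptions made for the example, and the temperature dependence of the threshold voltages and mobilities is deliberately ignored, so only the explicit thermal-voltage dependence is captured.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_c):
    """Thermal voltage V_T = KT/q in volts for a temperature in Celsius."""
    return BOLTZMANN_J_PER_K * (temp_c + 273.15) / ELECTRON_CHARGE_C


def v2_first_order(v1, temp_c, w_n2, w_p2, vdd, vtn, vtp, eta):
    """Second-stage glitch V2 from (12), with A taken from (10)."""
    v_t = thermal_voltage(temp_c)
    a = 4.0 * w_n2 / w_p2 * v_t ** 2 * math.exp(1.8)   # (10)
    headroom = vdd - v1 - vtp
    if headroom <= 0:
        raise ValueError("V1 too large: P2 would not be conducting")
    return a * math.exp((v1 - vtn) / (eta * v_t)) / (2.0 * headroom)


if __name__ == "__main__":
    # Hypothetical device values, for illustration only.
    params = dict(w_n2=1.0, w_p2=2.0, vdd=1.0, vtn=0.3, vtp=0.3, eta=1.5)
    for temp_c in (20.0, 70.0, 120.0):
        for v1 in (0.05, 0.10, 0.20):
            v2 = v2_first_order(v1, temp_c, **params)
            print(f"T = {temp_c:5.1f} C, V1 = {v1:.2f} V, V2 = {v2:.3e} V")
```

With thresholds held fixed, the model already shows V2 growing with both V1 and temperature; a real device would add the threshold-voltage shift with temperature on top of this.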
C. Power Consumption
Data activity measures how frequently signals toggle and is defined as
$$\text{data activity} = \frac{\#\text{ of signal transitions}}{\#\text{ of signals} \times \#\text{ of clock cycles}}. \tag{13}$$
Static logic has an empirical data activity of 0.1-0.2 [6], and dynamic domino logic has an activity factor of 0.5. While CD logic's data activity is also 0.5, it always consumes power when it enters the evaluation period. During the evaluation period, CD logic always dissipates power via either dynamic power dissipation (X goes to VDD and Out is discharged to GND) or direct path current (contention mode). While CD logic consumes more power, we believe that CD logic is still an attractive choice in a high-performance full-custom design because: 1) CD logic is only intended to replace the critical path; and 2) power management techniques such as clock gating [20], [21], where the clock connection to an idle module is turned off (gated), will significantly reduce CD logic's dynamic power consumption.
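The definition in (13) can be made concrete with a short calculation. The sketch below is a minimal illustration under one simplifying assumption of ours: each signal is represented by its logic value sampled once at every clock-cycle boundary, so a waveform with n + 1 samples spans n clock cycles, and transitions within a cycle (such as precharge and evaluate pulses) are not counted. The function name is hypothetical.

```python
def data_activity(waveforms):
    """Data activity per (13) for equally long, per-cycle sampled signals.

    waveforms -- list of sequences, one per signal, each holding the
                 logic value at successive clock-cycle boundaries
    """
    if not waveforms:
        raise ValueError("at least one signal is required")
    n_samples = len(waveforms[0])
    if n_samples < 2 or any(len(w) != n_samples for w in waveforms):
        raise ValueError("all signals need the same length (>= 2 samples)")
    n_cycles = n_samples - 1
    transitions = sum(
        1
        for w in waveforms
        for prev, cur in zip(w, w[1:])
        if prev != cur
    )
    return transitions / (len(waveforms) * n_cycles)


if __name__ == "__main__":
    toggling = [0, 1, 0, 1, 0]   # toggles every cycle
    quiet = [0, 0, 0, 0, 0]      # never toggles
    one_edge = [0, 1, 1, 1, 1]   # toggles once in four cycles
    print(data_activity([toggling]))
    print(data_activity([quiet, one_edge]))
```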
D. CD Logic Family
CD logic's LB [Fig. 3(a)] can be modified such that the inverter is replaced by a static gate to achieve even higher performance, since the inverter delay is not wasted. We refer to such a variation as compound CD logic (CCD), analogous to CDL for dynamic domino logic. Another family of CD logic was proposed in [22], where the output inverter is replaced by a dynamic domino logic gate. The analysis in [22] shows that a 64-bit parallel-prefix adder employing this type of logic is superior to its CDL-based counterpart; however, it requires additional design considerations because of the degraded noise margin.
IV. CD LOGIC CHARACTERIZATION
Unless otherwise specified, all simulations in this paper are done at the schematic level (transistor netlists), with extracted parasitic capacitance, in the Cadence design environment using 65-nm CMOS technology. All the CD logic gates are designed such that the worst case glitch level under $6\sigma$ deviation with nominal VDD (1.0 V) is less than 300 mV at 110 °C. The measured power consumption includes clock trees and data buffers, which are both sized to drive a fanout-of-4 (FO4) load. The outputs of all the logic gates drive an identical 20 fF load. The window duration (width) is defined as the time from the 50% point of the falling edge of CLK to the 50% point of the rising edge of node Y. The delay is measured from the 50% switching point of either CLK or data to the 50% switching point of the latest output.
In this section, all logic transistors have a 1-μm effective nMOS width. For CD logic, the pMOS CLK transistor's width is 2.4 μm. The transistor sizings are optimized primarily for delay, because the main objective of this section is to explore CD logic's performance advantage. The clock and data frequencies are set to 2 GHz.
A. Noise Margin Versus Window Width
Noise margin is defined as the dc noise level at the input that generates a false logic evaluation at the output of the same gate and can be computed with the following formula:
$$\text{Noise Margin} = \left|V_{original}-V_{noise}\right| \tag{14}$$
where $V_{original}$ is the expected voltage level without any input noise interference, and $V_{noise}$ is the input dc noise voltage that causes the false logic evaluation. For CD logic, two types of noise margin are defined: logic 1 and logic 0 noise margins.

Fig. 8. Simulated logic 1 and logic 0 noise margin versus window duration for a three-input AND and OR gate. [Plot: noise margin (mV) versus window duration (ps); curves: CD OR3 logic 0 and logic 1 NM, CD AND3 logic 1 and logic 0 NM with and without keeper, dynamic OR3 NM, and dynamic AND3 NM.]

Fig. 9. Normalized delay of five logic expressions implemented in static, dynamic, and CD logic. [Bar chart: normalized delay for AND2, AND3, OR2, OR3, and AOI22; series: static, CD C-Q, CD D-Q, and dynamic.]
Logic 1 noise margin refers to the input dc noise level that causes the CD logic to fail to remain in the contention mode. If IN [Fig. 3(b)], which is supposed to be at full VDD, is degraded by noise, then the glitch level at X may be high enough that Out is falsely discharged. In this case, the noise margin can be calculated as $1\,\mathrm{V} - V_{in}$, where $V_{in}$ is $V_g$ of M7.
Similarly, logic 0 noise margin refers to the input dc noise level that causes the CD logic to fail to evaluate. In this case, if an input that is supposed to be at GND is much higher because of noise, the contention between M1 and M7 will cause X to settle at an intermediate voltage instead of VDD. When CLK_d rises to VDD (window closes), Y will also be charged up through M3 and M4, since M3 is on and M4 is partially on because X is not at VDD. If the voltage level at X is too low, then Y will be charged to VDD through positive feedback and X will be discharged to GND through M7, which is driven by the noise source.
Fig. 8 shows the simulated worst case logic 1 and logic 0 noise margins for a three-input AND gate and OR gate implemented with CD and dynamic domino logic. CD logic's logic 0 noise margin is always much higher than its logic 1 noise margin, suggesting that CD logic is more robust during the C-Q delay and D-Q delay modes than during the contention mode. Moreover, as the window width increases, the logic 0 noise margin improves while the logic 1 noise margin degrades for both logic types. Because the logic 1 noise margin is the smaller of the two, reducing the window duration not only minimizes the power consumption but also improves CD logic's overall robustness. For noise-margin-sensitive applications, a minimum-size nMOS keeper (gate connected to Out, drain connected to X) can be added to improve robustness. In this case, the minimum-size keeper improves the logic 1 noise margin by approximately 60 mV with virtually no degradation in the logic 0 noise margin.

Fig. 10. Normalized average power of five logic expressions at various data activities implemented in static, dynamic, and CD logic. [Bar chart: normalized average power for AND2, AND3, OR2, OR3, and AOI22 at 10%, 50%, and 100% data activity; series: static, dynamic, and CD.]
B. CD Logic Performance
Figs. 9 and 10 illustrate the normalized delay and average power consumption, respectively, of static, dynamic, and CD logic in five logic expressions with various input data activities. The average power is calculated by summing the power consumption of every possible input vector and then dividing by the number of input vector combinations.
CD logic demonstrates superior performance, especially for complicated logic expressions such as Y = AB + CD (AOI22), in the D-Q mode due to the pre-evaluated characteristic. This is demonstrated in Fig. 11, where CD logic is approximately two times faster than dynamic domino logic. This is attributed to: 1) the pre-evaluated characteristic; and 2) the smaller number of transistors in the critical path (3N1P for dynamic, while only 2P1N for CD logic). On the other hand, CD logic's performance is only approximately the same as, or even worse than, that of dynamic domino logic in the C-Q mode. Therefore, it is advantageous to implement CD logic in a single-cycle multistage datapath, because then the pre-evaluated feature (D-Q delay) of CD logic can be fully utilized. The power consumption of CD logic at 50% data activity is at least 3× and 5× higher than that of static logic in AOI22 and the rest of the logic expressions, respectively. This suggests that CD logic should be used only to replace the critical path in a circuit block, since it is not energy efficient to implement an entire system with CD logic. Table II summarizes the total transistor width of static, dynamic, and CD logic. Despite CD logic's additional transistor overhead, the average area of CD logic is 13% smaller and 4.5% larger than that of static and dynamic domino logic, respectively.
Fig. 11. Schematic and timing waveform of (a) dynamic and (b) CD logic. [Waveforms: voltage (V) versus time (ns); (a) shows CLK1, A, B, and OUT1 with a marked delay of 27.56 ps; (b) shows CLK2, A, B, X (OUT2_), and OUT2 with a marked delay of 13.27 ps and the pre-evaluated region; both outputs drive a 20 fF load.]
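TABLE II
STATIC, DYNAMIC, AND CD LOGIC AREA COMPARISON
| Gate | Total width, static (μm) | Total width, dynamic (μm) | Total width, CD (μm) | Transistors, static | Transistors, dynamic | Transistors, CD |
|---|---|---|---|---|---|---|
| AND2 | 11 | 13.6 | 14.96 | 6 | 7 | 17 |
| AND3 | 18 | 20.9 | 19.96 | 8 | 8 | 18 |
| OR2 | 13 | 10.6 | 12.96 | 6 | 7 | 17 |
| OR3 | 24 | 12.6 | 13.96 | 6 | 7 | 18 |
| AOI22 | 27 | 19.6 | 18.96 | 10 | 9 | 19 |
| Average | 18.6 | 15.46 | 16.16 | 7.2 | 7.6 | 17.8 |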
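The averages in Table II and the area comparison quoted in the text (CD logic about 13% smaller than static and about 4.5% larger than dynamic) can be reproduced from the per-gate widths. The short script below only repeats that arithmetic on the table values, using total transistor width as the area proxy as the paper does; the variable and function names are ours.

```python
# Total transistor width (um) per gate, in the order
# AND2, AND3, OR2, OR3, AOI22, copied from Table II.
TOTAL_WIDTH_UM = {
    "static": [11, 18, 13, 24, 27],
    "dynamic": [13.6, 20.9, 10.6, 12.6, 19.6],
    "cd": [14.96, 19.96, 12.96, 13.96, 18.96],
}


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def relative_difference_percent(value, reference):
    """Signed difference of value relative to reference, in percent."""
    return (value / reference - 1.0) * 100.0


average_width = {style: mean(w) for style, w in TOTAL_WIDTH_UM.items()}
cd_vs_static = relative_difference_percent(
    average_width["cd"], average_width["static"])
cd_vs_dynamic = relative_difference_percent(
    average_width["cd"], average_width["dynamic"])

if __name__ == "__main__":
    for style, width in average_width.items():
        print(f"{style:8s} average width: {width:.2f} um")
    print(f"CD vs static:  {cd_vs_static:+.1f}%")
    print(f"CD vs dynamic: {cd_vs_dynamic:+.1f}%")
```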
V. PERFORMANCE ANALYSIS
A. 8-bit Ripple Carry Adders (RCAs)
The simulation setup in this section is similar to that of Section IV. Three 8-bit RCAs using static, dynamic, and CD logic styles are simulated to compare their performance. An RCA with FTL on the critical path is also implemented; however, our analysis indicates that an FTL-based RCA would generate false outputs at the later bits because of the false evaluation phenomenon described earlier. NP-FTL (equivalent to NP-domino, where nMOS-FTL and pMOS-FTL alternate) is also difficult to realize because the output glitch is significant and easily exceeds 500 mV under process variations.
The basic static full adder (FA) is implemented with 28 transistors with sizing strongly in favor of $C_{out}$ computation [6]. The main purpose of this 8-bit RCA is to demonstrate CD logic's performance advantage and to discuss the design considerations that should be taken into account when using CD logic. A more energy-efficient pass-transistor FA design [23] will be implemented in the subsequent analysis to provide a more realistic comparison.
Only the timing-critical carry generation is replaced with dynamic and CD logic, while the noncritical sum computation remains static in all three RCAs. Ten thousand random input vectors are applied to the RCAs to compute the average power consumption. The clock timing is designed in such a way that all the CD logic gates except the first stage are operated in the D-Q mode with a window duration of approximately 115 ps.^2 Fig. 12(a) depicts the RCA block diagram and FA schematic. Fig. 12(b) shows the corresponding worst case timing diagram for CD logic, which occurs when $\{C_{in}, A_0 \ldots A_7, B_0 \ldots B_7\} = \{0, 0 \ldots 0, 1 \ldots 1\}$.
1) Design Considerations in a Multistage System: Fig. 12(b) provides
several insights into designing single-cycle multistage CD circuits.
When CD logic is used with other logic styles and the gate preceding the
CD logic is not a precharge-type logic (e.g., dynamic domino logic), the
inputs (Data) may only make transitions while all CD logic gates are in
the predischarge mode (i.e., CLK1–4 are high). CD logic may also suffer
from additional power consumption due to the possible direct path
current during the predischarge mode. Consider a CD-based RCA with the
worst case input vector. The CD logic carry circuits of FA0–2 enter the
predischarge mode when CLK1 goes high. The internal nodes before the
output inverters of these CD logic gates are discharged to logic 0 by
the nMOS clock transistors, and the outputs ($C_1$ to $C_3$) are charged
to logic 1. If $C_3$ becomes logic 1 before FA3 enters the predischarge
mode (i.e., while CLK2 is still low), then a direct path takes place.³
To avoid this condition, it is necessary that
$T_{discharge} \geq T_{delta}$, where $T_{discharge}$ is the time that
the CD logic takes to charge its output and $T_{delta}$ is the delay
between two adjacent clock signals (CLK1 to CLK2, CLK2 to CLK3, etc.).
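The spacing rule above, that the delay between adjacent clock phases must not exceed the output charging time, can be checked mechanically. The following is a minimal sketch only: the function name, the edge times, and the value used for the charging time are hypothetical and are not taken from the paper's simulations.

```python
def direct_path_risk(clock_edges_ps, t_discharge_ps):
    """Return (earlier phase, later phase, T_delta) for every adjacent clock
    pair whose spacing violates T_discharge >= T_delta."""
    violations = []
    for i in range(len(clock_edges_ps) - 1):
        t_delta = clock_edges_ps[i + 1] - clock_edges_ps[i]
        if t_delta > t_discharge_ps:
            # The later clock rises after the preceding carry has already
            # gone high, so a direct path current is possible.
            violations.append((i + 1, i + 2, t_delta))
    return violations


# Hypothetical rising-edge times (ps) of CLK1..CLK4 and a hypothetical
# T_discharge; both are illustration values, not measured data.
edges_ps = [0.0, 60.0, 110.0, 160.0]
print(direct_path_risk(edges_ps, t_discharge_ps=55.0))
```

With these illustrative numbers only the CLK1 to CLK2 spacing is flagged, which corresponds to the case annotated in Fig. 12(b), where CLK2 needs to arrive earlier than C3.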
2) CD Logic Sizing Strategy: To guarantee a 6σ glitch level of 300 mV at
110 °C, the following sizing strategy is employed.
1) An equally weighted variable is assigned to the width of CD logic's
pMOS pull-up transistors.
² The window duration is a function of the logic expression, the number
of preceding stages that are driven by the same phase clock signal, the
maximum glitch level constraint, and the robustness of the overall
system. Extensive simulation results have indicated that a window
duration of 115 ps provides excellent delay and power performance while
maintaining a sufficient timing margin against PVT and transistor
mismatch variations (Table VII).
³ Consider the CD carry generation circuitry shown in Fig. 12(a), under
the worst case vector condition B = 1 and A = 0. If input C ($C_{in}$
from the preceding stage) rises to logic 1 (originally at logic 0)
before CLK is high, then both pMOS and nMOS transistors are on.
[Fig. 12 graphic. (a) RCA block diagram (FA0, FA1, ..., FA7; critical path from Cin to C8) and FA schematics: sum generation, and carry generation in static, dynamic (with keeper), and CD logic, with transistor sizing annotations. CD logic clock generation: CLK1 (FA0–2), CLK2 (FA3–4), CLK3 (FA5–6), CLK4 (FA7); window period 115 ps; clock and data buffer power is included. (b) Timing diagram over one cycle time: CLK1–CLK4, C1–C7, and Data, with T_discharge and T_delta marked; CLK2 needs to arrive earlier than C3.]
Fig. 12. RCA (a) block diagram and (b) timing diagram implemented with CD logic.
TABLE III
RCA PERFORMANCE COMPARISON
(delay normalized to static in parentheses)

Logic style   Data activity   Delay (ps)     Power (µW)   PDP (fJ)   EDP (fJ·ps)
Static        10%             369.4 (1.00)   50.95        18.8       6944
Static        50%             369.4 (1.00)   254.77       94.1       34761
Static        100%            369.4 (1.00)   509.54       188.2      69521
Dynamic       10%             292.6 (0.79)   142.70       41.8       12231
Dynamic       50%             292.6 (0.79)   336.05       98.3       28763
Dynamic       100%            292.6 (0.79)   401          117.3      34322
CD            10%             224.4 (0.61)   231.85       52         11669
CD            50%             224.4 (0.61)   533.96       119.8      26883
CD            100%            224.4 (0.61)   602.82       135        30294
2) The entire circuitry (i.e., the 8-bit RCA) is then simulated under
the typical corner at 110 °C to determine the glitch level. Extensive
simulation results reveal that, if this glitch level is approximately
65 mV, then the 6σ glitch level will be less than 300 mV.
3) Iterative simulations are performed by sweeping this variable until
the glitch level is around 65 mV.
The equally weighted scheme may clearly not be the optimal solution.
However, different sizing schemes have been explored, and simulation
results indicate that none of them achieves an apparent performance
improvement over the sizing strategy described above.
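The iterative step of the sizing strategy (sweep one shared width variable until the typical-corner glitch is near 65 mV) can be sketched as a simple search. This is an illustration under stated assumptions: `sweep_width` and `toy_glitch_mv` are hypothetical names, and the linear glitch model is a placeholder standing in for a circuit simulation, not a model from the paper.

```python
def sweep_width(simulate_glitch_mv, widths_um, target_mv=65.0):
    """Return the swept width whose simulated glitch is closest to target_mv."""
    return min(widths_um, key=lambda w: abs(simulate_glitch_mv(w) - target_mv))


def toy_glitch_mv(width_um):
    # Placeholder, NOT a circuit model: assumes the glitch grows linearly
    # with the shared pMOS pull-up width. A real flow would call a
    # typical-corner simulation at 110 degrees C here.
    return 40.0 * width_um


# Candidate widths from 0.5 um to 3.0 um in 0.1 um steps (arbitrary range).
widths_um = [round(0.5 + 0.1 * k, 1) for k in range(26)]
best_width_um = sweep_width(toy_glitch_mv, widths_um)
print(best_width_um, toy_glitch_mv(best_width_um))
```

Because a single variable scales every pull-up width, the search is one-dimensional, which keeps the number of simulations small compared with sizing each transistor independently.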
3) RCA Performance: Table III compares static, dynamic, and CD logic
RCAs using various figures of merit at different data activity factors.
The CD-based RCA is approximately 39 and 23% faster than its static and
dynamic counterparts, respectively. On the other hand, the power
consumption of CD logic ranges from 4.55× (at 10% data activity) to
1.18× (at 100% data activity) that of static logic. In terms of the PDP,
CD logic is 2.78× that of static logic at 10% data activity and 0.72×
that of static logic at 100% data activity. CD logic provides a speed
advantage that logic styles such as static and dynamic find difficult to
reach. Therefore, CD logic is suitable for a system where performance is
the most critical factor.
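The PDP and EDP figures of merit follow directly from the delay and power entries of Table III. The sketch below recomputes them; the dictionary and function names are illustrative choices, and only the numeric values come from the table.

```python
# Delay (ps) and power (uW) copied from Table III.
delay_ps = {"static": 369.4, "dynamic": 292.6, "cd": 224.4}
power_uw = {
    "static": {"10%": 50.95, "50%": 254.77, "100%": 509.54},
    "dynamic": {"10%": 142.70, "50%": 336.05, "100%": 401.0},
    "cd": {"10%": 231.85, "50%": 533.96, "100%": 602.82},
}


def pdp_fj(style, activity):
    # uW x ps = 1e-18 J, i.e. 1e-3 fJ
    return power_uw[style][activity] * delay_ps[style] * 1e-3


def edp_fj_ps(style, activity):
    # Energy-delay product: PDP multiplied once more by the delay.
    return pdp_fj(style, activity) * delay_ps[style]


def speedup(fast, slow):
    # Fractional delay reduction of `fast` relative to `slow`.
    return 1.0 - delay_ps[fast] / delay_ps[slow]


for activity in ("10%", "100%"):
    ratio = pdp_fj("cd", activity) / pdp_fj("static", activity)
    print(activity, "CD/static PDP ratio:", round(ratio, 2))
print("speedup vs static:", round(speedup("cd", "static"), 2))
print("speedup vs dynamic:", round(speedup("cd", "dynamic"), 2))
```

The recomputed values agree with the tabulated PDP and EDP entries to within rounding, since the table appears to derive EDP from already rounded PDP values.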
B. 32-bit Carry Lookahead Adder (CLA)
We implement 32-bit CLAs to further analyze CD logic's performance. The
detailed operations of the CLA are described in [6] and the schematic is
displayed in Fig. 13. The 32-bit CLA uses eight 4-bit FAs with dedicated
circuitry to facilitate carry generation. The energy-efficient FA used
in this analysis utilizes a pass-transistor logic style with only 24
transistors for sum generation [23]. For the carry generation, only the
critical path is replaced with a different logic style. The maximum
fan-in is limited to four, except in the case of dynamic domino logic
due to the footer transistor. In this case, the 4-bit critical carry
generation path of the CLA is

$$G_{3:0} = G_3 + P_3\,(G_2 + P_2\,(G_1 + P_1 G_0)) \tag{15}$$

where G and P are the generate ($A \cdot B$) and propagate ($A \oplus B$)
signals, respectively. CDL and CCD logic are implemented to reduce the
number of fan-ins. One can utilize the inversion property and rearrange
(15) to

$$\begin{aligned}
\overline{G_{1:0}} &= \overline{G_1 + P_1 G_0}, & \overline{P_{3:2}} &= \overline{P_3 P_2}, \\
\overline{G_{3:2}} &= \overline{G_3 + P_3 G_2}, & G_{3:0} &= \overline{\overline{G_{3:2}}\,\left(\overline{P_{3:2}} + \overline{G_{1:0}}\right)}.
\end{aligned} \tag{16}$$
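As a sanity check on this rearrangement, the nested form of the 4-bit group generate can be compared exhaustively with the two-level grouping G3:0 = G3:2 + P3:2·G1:0 and with its complemented (De Morgan) form. This is a small verification sketch; the function names are illustrative, and generate and propagate are taken as A AND B and A XOR B.

```python
from itertools import product


def g30_flat(g, p):
    """Nested form: G3 + P3(G2 + P2(G1 + P1 G0))."""
    return g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & g[0])))))


def g30_grouped(g, p):
    """Two-level grouping: G3:0 = G3:2 + P3:2 * G1:0."""
    g10 = g[1] | (p[1] & g[0])
    p32 = p[3] & p[2]
    g32 = g[3] | (p[3] & g[2])
    return g32 | (p32 & g10)


def g30_inverted(g, p):
    """Same grouping built from complemented intermediate signals."""
    n_g10 = 1 - (g[1] | (p[1] & g[0]))
    n_p32 = 1 - (p[3] & p[2])
    n_g32 = 1 - (g[3] | (p[3] & g[2]))
    return 1 - (n_g32 & (n_p32 | n_g10))


mismatches = 0
for a, b in product(range(16), repeat=2):
    g = [((a & b) >> i) & 1 for i in range(4)]  # generate  = A AND B
    p = [((a ^ b) >> i) & 1 for i in range(4)]  # propagate = A XOR B
    carry_out = (a + b) >> 4                    # 4-bit carry-out with Cin = 0
    values = {g30_flat(g, p), g30_grouped(g, p), g30_inverted(g, p), carry_out}
    if len(values) != 1:
        mismatches += 1

print("mismatches:", mismatches)
```

All 256 operand pairs give the same result in the three forms, and that result equals the carry-out of the 4-bit addition with a zero carry-in.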
Therefore, a maximum fan-in of two and three can be achieved with CCD
logic (Section III-D) and CDL, respectively. The critical pMOS
transistor width of CCD logic is slightly smaller than that of CD logic
to satisfy the 300-mV glitch constraint because the pull-up path in the
LB now consists of two pMOS transistors. Footless dynamic domino logic
cannot be applied in this analysis because not all the circuits are
implemented with dynamic domino logic. Therefore, some of the inputs to
the dynamic domino logic in the critical path can come from noncritical
static gates. In order to satisfy the monotonicity requirement and to
avoid the possible direct path current, a footer transistor is required
for all dynamic domino logic gates. On the other hand, if the entire
32-bit CLA is implemented with dynamic domino logic, then the footless
scheme can be applied. However, simulation results indicate that the
power consumption in this case is much higher than that of the current
setup, thus making it a less attractive design.

[Fig. 13 graphic: 32-bit CLA block diagram (eight 4-bit blocks, A0:3/B0:3/S0:3 through A28:31/B28:31/S28:31; carries C3 to C31; group signals G15:0/P15:0, G27:0/P27:0, and G31:28/P31:28; critical path marked) and schematics of the G3:0 carry gate in pseudo-nMOS, CD logic, compound CD (CCD) logic, and dynamic logic, together with the pass-transistor sum generation circuit and transistor sizing annotations. The noncritical path is implemented with static logic (not shown).]
* CCD logic replaces the inverter in the logic block with a complex logic gate to provide better performance, similar to the idea of compound domino logic (CDL) over dynamic logic.
Fig. 13. 32-bit CLA.

TABLE IV
32-BIT CLA PERFORMANCE COMPARISON
(delay in ps, normalized to static in parentheses; power in mW; PDP in pJ; EDP in pJ·ps; worst case leakage in µA)

                           100% data activity   50% data activity    25% data activity    10% data activity    Worst case
Design        Delay        Power  PDP   EDP     Power  PDP   EDP     Power  PDP   EDP     Power  PDP   EDP     leakage
Static        448 (1.00)   5.13   2.3   1030    1.94   0.87  390     0.99   0.45  199     0.42   0.19  85.1    32.9
Dynamic       316 (0.71)   5.27   1.67  526     2.22   0.7   222     1.32   0.42  132     0.8    0.25  79.8    32.1
CDL           287 (0.64)   5.29   1.52  437     2.21   0.63  182     1.33   0.38  110     0.81   0.23  66.7    32.7
Pseudo-nMOS   313 (0.70)   5.34   1.67  523     2.21   0.69  216     1.25   0.39  122     0.68   0.21  66.3    551.2
CD logic      272 (0.61)   5.02   1.37  371     2.31   0.63  171     1.43   0.39  106     0.90   0.25  66.8    33.5
CCD logic     239 (0.53)   5.09   1.22  291     2.34   0.56  134     1.46   0.35  84      0.94   0.22  53.5    33.4
Table IV summarizes the simulation results for the 32-bit CLAs. Power
consumption is calculated with 5000 random input vectors. The
performance enhancement of CD and CCD logic is evident in this case,
with 39 and 47% speedup over the static design, respectively. Both CD
and CCD logic are also faster than pseudo-nMOS, primarily because of the
larger effective pMOS width. The worst case leakage of all the designs
is comparable except in the case of pseudo-nMOS, where it is caused by
the contention between the nMOS PDN and the weak pMOS pull-up
transistor. In this case, pseudo-nMOS's leakage (static power
dissipation) is at least 15× higher than that of the rest of the
designs.
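The speedup and leakage statements can be reproduced from the delay and leakage columns of Table IV. The following is a small sketch with illustrative variable names; the numbers are copied from the table.

```python
# Delay (ps) and worst case leakage copied from Table IV.
delay_ps = {
    "Static": 448, "Dynamic": 316, "CDL": 287,
    "Pseudo-nMOS": 313, "CD": 272, "CCD": 239,
}
leakage = {
    "Static": 32.9, "Dynamic": 32.1, "CDL": 32.7,
    "Pseudo-nMOS": 551.2, "CD": 33.5, "CCD": 33.4,
}

# Speedup over static, as a percentage of the static delay.
speedup_pct = {
    style: 100.0 * (1.0 - d / delay_ps["Static"]) for style, d in delay_ps.items()
}

# Smallest ratio between pseudo-nMOS leakage and any other design's leakage.
min_leak_ratio = min(
    leakage["Pseudo-nMOS"] / value
    for style, value in leakage.items()
    if style != "Pseudo-nMOS"
)

print(round(speedup_pct["CD"]), round(speedup_pct["CCD"]))
print(round(min_leak_ratio, 1))
```

The smallest leakage ratio is taken against the leakiest of the other designs, so it is the conservative figure behind the "at least 15×" statement.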
Figs. 14–16 show the normalized power, PDP, and EDP of all the CLAs
analyzed in this paper, respectively. At 10% (100%) data activity, the
power consumption of CD logic is 2.1× (1.05×) higher than that of static
logic. Compared to the case of the 8-bit RCA, the power consumption
discrepancy between static and CD logic at low data activity is less
obvious, because CD logic only accounts for approximately 10% of the
entire circuitry. CCD logic achieves the lowest PDP at all data
activities except 10%. While higher than that of CCD logic, the PDP of
CD logic is comparable to that of CDL and pseudo-nMOS and is lower than
that of the rest of the designs. At 25% data activity, the EDP reduction
of CD and CCD logic from the other designs is at least 4 and 24%,
respectively. At 10% data activity, CCD logic achieves the lowest EDP
with at least 19% improvement. Notice that the power consumption of a
CLA implemented with CD logic is lower than that of a CLA with dynamic
domino logic at 100% data activity. This is because the width (1 µm) of
CD logic's pull-up pMOS transistors in this CLA is much smaller than
that (2.4 µm, Section IV) of CD logic in a single-stage logic gate.
Hence the power consumption of dynamic domino logic is higher than that
of CD logic due to the higher internal capacitance at 100% data
activity. Similar reasoning applies to a CLA with CCD logic.

[Fig. 14 graphic: normalized power versus data activity factor (100%, 50%, 25%, 10%) for static, dynamic, CDL, pseudo-nMOS, CD logic, and CCD logic.]
Fig. 14. Normalized power of 32-bit CLAs.
[Fig. 15 graphic: normalized PDP versus data activity factor (100%, 50%, 25%, 10%) for the same six logic styles.]
Fig. 15. Normalized PDP of 32-bit CLAs.
[Fig. 16 graphic: normalized EDP versus data activity factor (100%, 50%, 25%, 10%) for the same six logic styles.]
Fig. 16. Normalized EDP of 32-bit CLAs.
[Fig. 17 graphic: block diagram of the 8-bit Wallace tree multiplier: Wallace tree, clocked DFFs, and an 11-bit carry-bypass adder built from 1-bit, 3-bit, and 4-bit FA blocks with 2:1 muxes (bypass signals P8P7P6 and P12P11P10P9), producing S0–S15. The critical path is implemented with various logic styles; the remaining blocks use static logic only.]
Fig. 17. 8-bit Wallace tree multiplier.
[Fig. 18 graphic: normalized power versus data activity factor (100%, 50%, 25%, 10%) for static, dynamic, pseudo-nMOS, and CD logic.]
Fig. 18. Normalized power of 8-bit multipliers.

TABLE V
CD AND CCD LOGIC GLITCH IN 32-BIT CLAs

              CD logic                                 CCD logic
Temp. (°C)    Mean (mV)   σ (mV)   Mean + 6σ (mV)      Mean (mV)   σ (mV)   Mean + 6σ (mV)
85            65.3        23.2     204.5               63.6        26.2     220.8
110           88.5        29.8     267.3               86.8        36.1     303.4
Table V summarizes the mean and σ of CD and CCD logic's worst glitch
level in a Monte Carlo simulation with 2000 samples. The worst case
glitch is calculated by summing the mean and the extrapolated 6σ
deviation.⁴ CCD logic exhibits a higher worst case glitch level than CD
logic at both 85 °C and 110 °C, despite its already lower critical pMOS
effective width. Both designs are approximately within the glitch
constraint set earlier, demonstrating that they are still able to
function properly under extreme process and temperature conditions.
⁴ It is assumed that the glitch variation resulting from process
variations follows a Gaussian distribution.
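The worst case glitch calculation (mean plus six standard deviations, under the Gaussian assumption) is simple enough to recompute from Table V. In this sketch the dictionary layout and names are illustrative, while the 300 mV limit is the glitch constraint stated in the sizing strategy.

```python
GLITCH_LIMIT_MV = 300.0  # 6-sigma glitch constraint from the sizing strategy

# (mean, sigma) of the worst glitch in mV, copied from Table V,
# keyed by (logic style, temperature in degrees C).
glitch_stats = {
    ("CD", 85): (65.3, 23.2),
    ("CD", 110): (88.5, 29.8),
    ("CCD", 85): (63.6, 26.2),
    ("CCD", 110): (86.8, 36.1),
}


def six_sigma_mv(mean_mv, sigma_mv):
    # Extrapolated worst case, assuming a Gaussian glitch distribution.
    return mean_mv + 6.0 * sigma_mv


worst_case_mv = {key: six_sigma_mv(*stats) for key, stats in glitch_stats.items()}
over_limit = [key for key, value in worst_case_mv.items() if value > GLITCH_LIMIT_MV]

for key, value in worst_case_mv.items():
    print(key, round(value, 1))
print("over limit:", over_limit)
```

Only the CCD design at 110 °C lands marginally above 300 mV, which is why the text describes the two designs as approximately within the constraint.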
TABLE VI
8-BIT MULTIPLIER PERFORMANCE COMPARISON
(delay in ps, normalized to static in parentheses; power in mW; PDP in pJ; EDP in pJ·ps; worst case leakage in µA)

                           100% data activity   50% data activity    25% data activity    10% data activity    Worst case
Design        Delay        Power  PDP   EDP     Power  PDP   EDP     Power  PDP   EDP     Power  PDP   EDP     leakage
Static        404 (1.00)   3.52   1.42  575     2.29   0.93  375     1.29   0.52  211     0.67   0.27  109     88.78
Dynamic       294 (0.73)   3.57   1.05  309     2.36   0.7   205     1.39   0.41  121     0.80   0.23  69      89.09
Pseudo-nMOS   292 (0.72)   3.82   1.11  325     2.56   0.75  218     1.57   0.46  134     0.96   0.28  82      619.6
CD logic      243 (0.60)   3.59   0.87  212     2.46   0.60  145     1.49   0.36  88      0.88   0.22  52      89.99
TABLE VII
PVT AND MONTE CARLO PERFORMANCE ANALYSIS OF THE CD AND CCD LOGIC-BASED DESIGNS
(all delays in ps)

Corner   Temp. (°C)   VDD (V)   8-bit RCA (CD)   32-bit CLA (CD)   32-bit CLA (CCD)   8-bit multiplier (CD)
FF       110          1.1       152              189               157                211
FF       110          0.9       199              238               200                229
FF       30           1.1       142              172               147                208
FF       30           0.9       197              229               199                228
FS       110          1.1       182              222               190                223
FS       110          0.9       244              288               250                247
FS       30           1.1       178              206               185                219
FS       30           0.9       265              289               263                249
SF       110          1.1       164              205               169                216
SF       110          0.9       220              259               218                236
SF       30           1.1       152              186               156                212
SF       30           0.9       217              248               214                235
SS       110          1.1       262              321               280                259
SS       110          0.9       369              439               388                327
SS       30           1.1       249              294               266                251
SS       30           0.9       392              437               405                347
Monte Carlo mean                226              274               240                244
Monte Carlo σ                   16.2             17.8              18.2               7
[Fig. 19 graphic: normalized PDP versus data activity factor (100%, 50%, 25%, 10%) for static, dynamic, pseudo-nMOS, and CD logic.]
Fig. 19. Normalized PDP of 8-bit multipliers.
C. 8-bit Wallace Tree Multiplier
Single-cycle two-phase 8-bit Wallace tree multipliers are implemented
and analyzed. The first phase (CLK is high) is a Wallace tree utilizing
3:2 compressors to reduce the number of partial products, and during the
second phase the final addition is carried out by an 11-bit carry bypass
adder, as shown in Fig. 17. Only the critical path of the final adder is
implemented with various logic styles, with the exception of the
multiplexers, while the rest of the circuits remain static. The
simulation setups are similar to those of the 32-bit CLAs.
Table VI summarizes the performance results of the 8-bit multipliers,
and Figs. 18–20 illustrate the normalized power, PDP, and EDP of the
various multipliers under different data activities, respectively. CD
logic achieves a speedup over static logic (40%) similar to that in the
32-bit adders. At 25% data activity, CD logic consumes 16% more power,
but is 30 and 58% more PDP- and EDP-efficient than static logic,
respectively. In the 8-bit multipliers, because the critical path is
only a small portion of the entire circuitry, CD logic has the lowest
PDP and EDP values among all the logic styles across all data
activities. CCD logic and CDL are not included in this analysis because
they are not particularly suitable for this setup: G and P signals are
not generated in this case; instead, direct inputs (A, B), as in the
8-bit RCA, are used to compute the carry. Finally, Table VII shows the
PVT analysis and Monte Carlo simulation results (2000 iterations, 27 °C)
of the proposed CD and CCD logic-based designs. All the designs are
functional under extreme PVT conditions and transistor mismatch and
variations, thereby demonstrating the proposed CD logic's robustness.

[Fig. 20 graphic: normalized EDP versus data activity factor (100%, 50%, 25%, 10%) for static, dynamic, pseudo-nMOS, and CD logic.]
Fig. 20. Normalized EDP of 8-bit multipliers.
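The claim that CD logic has the lowest PDP and EDP at every data activity in the multiplier can be checked against the Table VI entries. The sketch below uses illustrative names; the values are copied from the table in the column order 100%, 50%, 25%, 10%.

```python
activities = ["100%", "50%", "25%", "10%"]

# PDP (pJ) and EDP (pJ*ps) per data activity, copied from Table VI.
pdp_pj = {
    "Static": [1.42, 0.93, 0.52, 0.27],
    "Dynamic": [1.05, 0.7, 0.41, 0.23],
    "Pseudo-nMOS": [1.11, 0.75, 0.46, 0.28],
    "CD": [0.87, 0.60, 0.36, 0.22],
}
edp_pj_ps = {
    "Static": [575, 375, 211, 109],
    "Dynamic": [309, 205, 121, 69],
    "Pseudo-nMOS": [325, 218, 134, 82],
    "CD": [212, 145, 88, 52],
}


def lowest(metric, column):
    # Name of the logic style with the smallest value in the given column.
    return min(metric, key=lambda style: metric[style][column])


cd_lowest_everywhere = all(
    lowest(pdp_pj, i) == "CD" and lowest(edp_pj_ps, i) == "CD"
    for i in range(len(activities))
)

# Fractional EDP reduction of CD logic relative to static logic.
edp_reduction_vs_static = [
    1.0 - cd / static for cd, static in zip(edp_pj_ps["CD"], edp_pj_ps["Static"])
]

print(cd_lowest_everywhere)
print([round(100 * r) for r in edp_reduction_vs_static])
```

The EDP reduction relative to static logic stays above one half at every activity level, with the 25% entry corresponding to the 58% figure quoted in the text.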
VI. CONCLUSION
A new high-performance logic style with a CD characteristic and
self-reset circuitry was proposed. The pre-evaluated feature of CD logic
makes it particularly suitable for a circuit block where a unique
critical path exists and performance is the primary concern. Performance
analysis of 8-bit RCAs reveals that CD logic is 39 and 23% faster than
static and dynamic domino logic, respectively. Simulation results of
32-bit CLAs show a similar speed advantage of CD logic over static
logic. In this setup, CCD logic achieves the lowest PDP at all data
activities except 10%. Also, CCD logic achieves the best EDP results,
with 66% (37%) reduction compared to static logic at 50% (10%) data
activity. The 6σ worst case glitch levels for CD and CCD logic at 110 °C
are 267.3 and 303.4 mV, respectively.
CD logic's advantages in terms of delay and EDP were also demonstrated
in 8-bit Wallace tree multipliers. Compared to the 32-bit adders, CD
logic achieves a similar delay improvement, but has an even better EDP
reduction, primarily because the final adder, which makes up the
critical path of the multiplier, is a relatively small circuit block of
the overall circuitry. At 10% data activity, CD logic is 52, 25, and 37%
more EDP-efficient than static, dynamic, and pseudo-nMOS logic,
respectively.
REFERENCES
[1] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1079–1090, Jul. 1997.
[2] N. Goncalves and H. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structures," IEEE J. Solid-State Circuits, vol. 18, no. 3, pp. 261–266, Jun. 1983.
[3] C. Lee and E. Szeto, "Zipper CMOS," IEEE Circuits Syst. Mag., vol. 2, no. 3, pp. 10–16, May 1986.
[4] R. Rafati, S. Fakhraie, and K. Smith, "A 16-bit barrel-shifter implemented in data-driven dynamic logic (D³L)," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 10, pp. 2194–2202, Oct. 2006.
[5] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, "Low-power split-path data-driven dynamic logic," Circuits Dev. Syst. IET, vol. 3, no. 6, pp. 303–312, Dec. 2009.
[6] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Reading, MA: Addison Wesley, Mar. 2010.
[7] K. Bernstein, High Speed CMOS Design Styles, 1st ed. New York: Springer-Verlag, Aug. 1998.
[8] S. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios, K. R. Mistry, and K. Soumyanath, "Sub-500-ps 64-b ALUs in 0.18-µm SOI/bulk CMOS: Design and scaling trends," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 318–319, Nov. 2001.
[9] S. Mathew, M. Anders, R. Krishnamurthy, and S. Borkar, "A 4 GHz 130 nm address generation unit with 32-bit sparse-tree adder core," in VLSI Circuits Dig. Tech. Papers Symp., 2002, pp. 126–127.
[10] S. K. Mathew, M. A. Anders, B. Bloechel, T. N. Krishnamurthy, and S. Borkar, "A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 44–51, Jan. 2005.
[11] S. Wijeratne, N. Siddaiah, S. Mathew, M. Anders, R. Krishnamurthy, J. Anderson, S. Hwang, M. Ernest, and M. Nardin, "A 9 GHz 65 nm Intel Pentium 4 processor integer execution core," in IEEE Int. Solid-State Circuits Conf. ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 353–365.
[12] I. Sutherland, R. F. Sproull, and D. Harris. (Feb. 1999). Logical Effort: Designing Fast CMOS Circuits [Online]. Available: http://amazon.com/o/ASIN/1558605576/
[13] S. Kiaei, S.-H. Chee, and D. Allstot, "CMOS source-coupled logic for mixed-mode VLSI," in Proc. IEEE Int. Circuits Syst. Symp., New Orleans, LA, May 1990, pp. 1608–1611.
[14] L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen, "Output prediction logic: A high-performance CMOS design technique," in Proc. Comput. Des. Int. Conf., Austin, TX, 2000, pp. 247–254.
[15] K. H. Chong, L. McMurchie, and C. Sechen, "A 64 b adder using self-calibrating differential output prediction logic," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 1745–1754.
[16] V. Navarro-Botello, J. A. Montiel-Nelson, and S. Nooshabadi, "Low power arithmetic circuit in feedthrough dynamic CMOS logic," in Proc. IEEE Int. 49th Midw. Symp. Circuits Syst., Aug. 2006, pp. 709–712.
[17] V. Navarro-Botello, J. A. Montiel-Nelson, and S. Nooshabadi, "Analysis of high-performance fast feedthrough logic families in CMOS," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 6, pp. 489–493, Jun. 2007.
[18] Y. Taur and T. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[19] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
[20] N. Kurd, J. S. Barkarullah, R. O. Dizon, T. D. Fletcher, and P. D. Madland, "A multigigahertz clocking scheme for the Pentium(R) 4 microprocessor," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1647–1653, Nov. 2001.
[21] S. Vangal, N. Borkar, E. Seligman, V. Govindarajulu, V. Erraguntla, H. Wilson, and A. Pangal, "5GHz 32b integer-execution core in 130 nm dual-VT CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Nov. 2002, pp. 334–535.
[22] P. Chuang, D. Li, and M. Sachdev, "Design of a 64-bit low-energy high-performance adder using dynamic feedthrough logic," in Proc. IEEE Int. Circuits Syst. Symp., May 2009, pp. 3038–3041.
[23] M. Aguirre-Hernandez and M. Linares-Aranda, "CMOS full-adders for energy-efficient arithmetic applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 99, pp. 1–5, Apr. 2010.
Pierce Chuang (S'08) received the B.A.Sc. (hons.)
and M.A.Sc. degrees in electrical engineering from
the University of Waterloo, Waterloo, ON, Canada,
in 2008 and 2010, respectively, where he is currently
pursuing the Ph.D. degree in electrical engineering.
His current research interests include low-power
high-performance digital CMOS circuits.
David Li (S'05) received the B.A.Sc. (hons.),
M.A.Sc., and Ph.D. degrees in electrical engineering
from the University of Waterloo, Waterloo, ON,
Canada, in 2005, 2007, and 2011, respectively.
He joined Qualcomm Inc., San Diego, CA, in 2011
to work on custom memory and digital designs.
His current research interests include low-power,
high-performance, and reliable digital designs with
special focus on flip-flop metastability.
Dr. Li received the Best Paper Award at the Inter-
national Symposium on Quality Electronic Design
in 2011.
Manoj Sachdev (F'11) received the B.E. degree
(hons.) in electronics and communication engi-
neering from the University of Roorkee, Roorkee,
India, and the Ph.D. degree from Brunel University,
Uxbridge, U.K., in 1984 and 1996, respectively.
He was with Semiconductor Complex Ltd.,
Chandigarh, India, from 1984 to 1989, where he
was involved in the design of CMOS integrated
circuits. From 1989 to 1992, he was with the ASIC
Division, SGS-Thomson, Agrate, Milan. In 1992,
he joined Philips Research Laboratories, Eindhoven,
The Netherlands, where he worked on various aspects of VLSI testing
and manufacturing. He joined the Electrical and Computer Engineering
Department, University of Waterloo, Waterloo, ON, Canada, as a Professor in
1998. He serves as the Department Chair and holds the University of Waterloo
Research Chair. He has contributed five books and two book chapters, and
has authored over 170 technical articles published in conference proceedings
and journals. He holds more than 30 granted or pending U.S. patents on
various aspects of VLSI circuit design, reliability, and testing. His current
research interests include low-power and high-performance digital circuit
designs, mixed-signal circuit designs, and test and manufacturing issues of
integrated circuits.
Prof. Sachdev was the recipient of the Best Paper Award at the European
Design and Test Conference in 1998, the Honorable Mentioned Award at the
International Test Conference in 1999, the Best Panel Award at the VLSI
Test Symposium in 2004, and the Best Paper Award at the International
Symposium on Quality Electronic Design in 2011. He has served on several
conference committees. He was the Technical Program Co-Chair for the
IEEE Mixed-Signal Test Workshop in 1999 and the Technical Program
Chair for the IEEE IDDQ Test Workshop in 1999 and 2000. From 2007
to 2008, he was an Associate Editor for the IEEE TRANSACTIONS ON
VEHICULAR TECHNOLOGY. From 2008 to 2009, he was the Program Chair
for Microsystems and Nanoelectronics Research Conference, Ottawa, Canada.
He serves on the Editorial Board of the Journal of Electronic Testing:
Theory and Applications. He is the Chair of the Test, Debug, and Reliability
Subcommittee of the IEEE Custom Integrated Circuits Conference.
