


In the fast-growing VLSI industry, transistor density per chip keeps increasing,
following Moore's law. As transistor density increases, area and power consumption
also increase. Design engineers strive to achieve more functionality at higher speed
and lower power while keeping area and cost low.
Circuit design techniques also play an important role in achieving high
performance, low power or low area. Design engineers can choose among different
logic design techniques according to the needs of their design. Some designs must be
very fast regardless of area and power dissipation, for example some real-time
systems, while others require very low power and small area, as in portable devices.
The rapid advancement in VLSI circuits is driven by the increased use of portable
and wireless systems with low power budgets and of microprocessors with higher
speed. To achieve this, transistor sizes and supply voltages are scaled down along
with technology. Because of the larger number of devices per chip, the
interconnection density increases. The interconnection density, together with the
high clock frequency, increases the capacitive coupling of the circuit. As a result,
noise pulses known as crosstalk are generated, leading to logic failures and
increased circuit delay. Furthermore, when the supply voltage is scaled, the threshold
voltage of the device must also be scaled to preserve circuit performance, which
in turn increases the leakage current of the device. Because of their high speed
and low device count, especially compared with complementary CMOS, dynamic logic
circuits are used in a wide variety of applications including microprocessors,
digital signal processors and dynamic memory. A dynamic circuit contains a pull-down
network (PDN) which realizes the desired logic function. According to the basic
theory, a dynamic logic circuit precharges in every clock cycle. Because of the high
frequency of the clock signal, a lot of extra noise is introduced in the circuit,
which consumes additional power and slows the circuit down. There are several
circuit techniques that can reduce the noise of dynamic logic dramatically. These
circuits increase speed and decrease power dissipation compared with other domino
logic styles.


In integrated circuit design, dynamic logic (or sometimes clocked logic) is a
design methodology for combinational logic circuits, particularly those implemented
in MOS technology. It is distinguished from the so-called static logic by exploiting
temporary storage of information in stray and gate capacitances. It was popular in the
1970s and has seen a recent resurgence in the design of high speed
digital electronics, particularly computer CPUs. Dynamic logic circuits are usually
faster than static counterparts, and require less surface area, but are more difficult to
design. Dynamic logic has a higher toggle rate than static logic but the capacitive
loads being toggled are smaller so the overall power consumption of dynamic logic
may be higher or lower depending on various tradeoffs. When referring to a
particular logic family, the dynamic adjective usually suffices to distinguish the
design methodology, e.g. dynamic CMOS or dynamic SOI design.
Dynamic logic is distinguished from so-called static logic in that dynamic
logic uses a clock signal in its implementation of combinational logic circuits. The
usual use of a clock signal is to synchronize transitions in sequential logic circuits.
For most implementations of combinational logic, a clock signal is not even needed.
The largest difference between static and dynamic logic is that in dynamic
logic, a clock signal is used to evaluate combinational logic.
In most types of logic design, termed static logic, there is at all times some
mechanism to drive the output either high or low. In many of the popular logic styles,
such as TTL and traditional CMOS, this principle can be rephrased as a statement
that there is always a low-impedance DC path between the output and either the
supply voltage or the ground. An exception to this definition is the case of
high-impedance outputs, such as a tri-state buffer; however, even in these cases,
the circuit is intended to be used within a larger system where some mechanism will
drive the output, so they do not qualify as distinct from static logic.


In contrast, in dynamic logic, there is not always a mechanism driving the
output high or low. In the most common version of this concept, the output is driven
high or low during distinct parts of the clock cycle. During the time intervals when
the output is not being actively driven, the charge stored on its stray capacitance
keeps it within some tolerance range of the driven level.
Dynamic logic requires a minimum clock rate fast enough that the output
state of each dynamic gate is used or refreshed before the charge in the output
capacitance leaks out enough to cause the digital state of the output to change, during
the part of the clock cycle that the output is not being actively driven.
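The minimum clock rate requirement can be illustrated with a first-order estimate. The sketch below assumes that a constant leakage current discharges the output capacitance linearly, so the hold time is t = C * dV / I; the function name and all numeric values are illustrative assumptions, not figures from this report.

```python
def min_refresh_frequency(c_node, v_droop_max, i_leak):
    """Return the lowest refresh frequency (Hz) that keeps a dynamic node valid.

    First-order model (an assumption): a constant leakage current i_leak (A)
    discharges the node capacitance c_node (F) linearly, and the stored logic
    level is lost once the voltage has dropped by v_droop_max (V).
    """
    if min(c_node, v_droop_max, i_leak) <= 0:
        raise ValueError("all arguments must be positive")
    hold_time = c_node * v_droop_max / i_leak  # t = C * dV / I
    return 1.0 / hold_time


# Hypothetical values: 10 fF node, 0.2 V tolerable droop, 1 nA leakage.
f_min = min_refresh_frequency(10e-15, 0.2, 1e-9)
print(f"minimum refresh frequency ~ {f_min:.3g} Hz")
```

In this model a larger leakage current or a smaller node capacitance raises the minimum clock rate, which is one way to see why leakage growth under scaling makes dynamic nodes harder to hold.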
Static logic has no minimum clock rate; the clock can be paused indefinitely.
While it may seem that doing nothing for long periods of time is not particularly
useful, it leads to two advantages:
1) being able to pause a system at any time makes debugging and testing much
easier, enabling techniques such as single stepping;
2) being able to run a system at extremely low clock rates allows low-power
electronics to run longer on a given battery.
Being able to pause a system at any time for any duration can also be used to
synchronize the CPU to asynchronous events. Compared with alternatives such as
processor bus cycle extension mechanisms (for example, WAIT inputs), using hardware
to gate the clock to a static-core CPU is simpler, is more temporally precise, uses
no program code memory, and uses almost no power in the CPU while it is waiting. In
a basic design, to start waiting, the CPU would write to a register to set a binary
latch bit which would be ANDed or ORed with the processor clock, stopping the
processor. A signal from a peripheral device would reset this latch, resuming CPU
operation. In particular, although many popular CPUs use dynamic logic, only static
cores (CPUs designed with fully static technology) are usable in space satellites,
due to their higher radiation hardness. Most satellites do not use CMOS circuits
anyway; gallium arsenide is more popular in these applications.
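The clock-gating scheme described above can be sketched as a cycle-by-cycle simulation. This is a behavioural illustration only: the function name is hypothetical, the AND-gate variant is used, and the rule that a wake event wins over a simultaneous wait request is an assumption.

```python
def gated_clock(clock, wait_requests, wake_events):
    """Simulate a clock gated by a one-bit wait latch, one list entry per cycle.

    The latch is set when the CPU requests a wait and reset when a peripheral
    signals an event. The core clock is the source clock ANDed with the
    inverse of the latch. A wake event wins over a wait request (assumption).
    """
    latch = False
    core_clock = []
    for clk, wait, wake in zip(clock, wait_requests, wake_events):
        if wake:
            latch = False
        elif wait:
            latch = True
        core_clock.append(bool(clk) and not latch)
    return core_clock


# The CPU requests a wait in cycle 1; a peripheral wakes it in cycle 3.
pulses = gated_clock([1, 1, 1, 1, 1, 1],
                     [0, 1, 0, 0, 0, 0],
                     [0, 0, 0, 1, 0, 0])
print(pulses)
```

While the latch is set, the core receives no clock pulses at all, which is only safe for a static core that holds its state without refresh.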
Dynamic logic, when properly designed, can be over twice as fast as static
logic. It uses only the faster N transistors, which improves transistor sizing
optimizations. Static logic is slower because it has twice the capacitive loading,
higher thresholds, and uses slow P transistors for logic. Dynamic logic can be harder
to work with, but it may be the only choice when increased processing speed is
needed. Most electronics running at over 2 GHz these days require the use of
dynamic logic, although some manufacturers such as Intel have completely switched to
static logic to reduce power consumption. Note that reducing power use not only
extends the running time with limited power sources such as batteries or solar arrays
(as in spacecraft), but it also reduces the thermal design requirements, reducing the
size of needed heatsinks, fans, etc., which in turn reduces system weight and cost.
In general, dynamic logic greatly increases the number of transistors that are
switching at any given time, which increases power consumption over static
CMOS. There are several power saving techniques that can be implemented in a
dynamic logic based system. In addition, each rail can convey an arbitrary number of
bits, and there are no power-wasting glitches. Power-saving clock gating and
asynchronous techniques are much more natural in dynamic logic.
The domino logic design technique is an improved version of the dynamic logic
family. Fig. 1.1 shows a domino logic gate, which consists of a dynamic logic circuit
followed by a static CMOS inverter. The circuit consists of a PMOS pre-charge
transistor MP and an NMOS evaluation transistor MN with their gates connected to
the clock, and an NMOS logic network which implements the required logic function.
During the pre-charge phase (Clock = 0) the output of the dynamic circuit is charged
through the pre-charge transistor MP to VDD, and the output of the inverter
is low. During the evaluation phase (Clock = 1) the evaluation transistor MN is
ON, and the output of the dynamic circuit either discharges to ground or remains
high, depending on the inputs applied to the NMOS network.
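The two-phase operation just described can be captured in a small behavioural model. The sketch below is an idealized illustration, assuming no leakage, charge sharing or noise; the function names are hypothetical and a 2-input AND (two series NMOS devices in the pull-down network) is used as the example.

```python
def domino_gate(pdn_conducts, clock, dynamic_node):
    """Return (new_dynamic_node, output) for one clock phase of a domino gate.

    pdn_conducts: True if the NMOS pull-down network conducts for the inputs.
    clock:        0 = pre-charge phase, 1 = evaluation phase.
    dynamic_node: logic value stored on the dynamic node before this phase.
    """
    if clock == 0:
        dynamic_node = True            # MP on: node pre-charged to VDD
    elif pdn_conducts:
        dynamic_node = False           # MN on and PDN conducts: node discharges
    # otherwise the node floats and holds its pre-charged value
    output = not dynamic_node          # static CMOS inverter
    return dynamic_node, output


def domino_and2(a, b):
    """Evaluate a 2-input domino AND over one full pre-charge/evaluate cycle."""
    node, _ = domino_gate(a and b, clock=0, dynamic_node=False)
    _, out = domino_gate(a and b, clock=1, dynamic_node=node)
    return out


for a in (False, True):
    for b in (False, True):
        print(a, b, "->", domino_and2(a, b))
```

Because the inverter output can only rise during evaluation, the gate is non-inverting, which is why the series pull-down network yields AND rather than NAND.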
In domino logic circuits, other capacitive nodes in the NMOS block can share
the charge of the output node capacitance, which lowers the output voltage level;
this problem is called charge sharing. One of the major advantages of domino logic
over static CMOS is that it can operate at high clock frequencies, and because
there is no PUN, spurious transitions and the corresponding power dissipation are
eliminated. In some logical conditions, however, the output of the dynamic circuit
is pre-charged only to be discharged in the evaluation phase. For example, if this
output is low and the applied inputs again produce a low value, it is charged to a
high voltage during the pre-charge phase and discharged back to low during the
evaluation phase, increasing the power dissipation.

Fig. 1.1 Domino logic circuit
Therefore, the signal activity increases for this circuit design technique, and this
increased signal activity, along with the extra load that the clock line has to drive,
are the main reasons for the high power dissipation of domino logic compared with
static CMOS circuits. The noise margin of domino logic circuits is low compared with
static CMOS circuits, so they are not as scalable as static CMOS, and the transistor
threshold voltage is kept high to reduce leakage in domino logic circuits. Compared
with static CMOS, area is reduced in domino logic circuits because of the reduced
number of PMOS transistors. In addition, there is no short-circuit power dissipation
in domino circuits, and they have strong output driving capability.
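The higher signal activity of domino logic can be made concrete with a small enumeration. The sketch below assumes independent, uniformly random inputs and counts one power-consuming charging event per cycle whenever a node has to be charged from low to high; the function name is hypothetical.

```python
from itertools import product


def activity(logic_fn, n_inputs):
    """Compare per-cycle switching activity of a static and a domino gate.

    Assumes independent, uniformly random inputs (an illustrative assumption).
    Returns (static_activity, domino_activity), where each value is the
    probability per cycle of a power-consuming 0 -> 1 charging event.
    """
    outputs = [bool(logic_fn(*bits)) for bits in product((0, 1), repeat=n_inputs)]
    p1 = sum(outputs) / len(outputs)       # probability that the output is 1
    static_activity = (1 - p1) * p1        # output was 0 and becomes 1
    domino_activity = p1                   # dynamic node discharged, must be re-charged
    return static_activity, domino_activity


or2 = activity(lambda a, b: a or b, 2)
and2 = activity(lambda a, b: a and b, 2)
print("OR2  static, domino:", or2)
print("AND2 static, domino:", and2)
```

In this model the domino activity never falls below the static activity, and the gap is largest for functions whose output is usually high, such as wide OR gates.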
The power consumed in high performance microprocessors has increased to
levels that impose a fundamental limitation to increasing performance and
functionality. If the current trend in increasing power continues, high performance
microprocessors will soon consume thousands of watts. The power density of a high
performance microprocessor will exceed the power density levels encountered in
typical rocket nozzles within the next decade. The generation, distribution, and
dissipation of power are at the forefront of current problems faced by the integrated
circuit industry. The application of aggressive circuit design techniques which only
focus on enhancing circuit speed without considering power is no longer an
acceptable approach in most high complexity digital systems. Dynamic switching
power, the dominant component of the total power consumed in current CMOS
technologies, is quadratically reduced by lowering the supply voltage. Lowering the
supply voltage, however, degrades circuit speed due to reduced transistor currents.
Threshold voltages are scaled to reduce the degradation in speed caused by supply
voltage scaling while maintaining the dynamic power consumption within acceptable
levels. At reduced threshold voltages, however, subthreshold leakage currents
increase exponentially. Energy efficient circuit techniques aimed at lowering leakage
currents are, therefore, highly desirable.
Domino logic circuit techniques are extensively applied in high performance
microprocessors due to the superior speed and area characteristics of domino CMOS
circuits as compared to static CMOS circuits. However, deep sub-micrometer
(DSM) domino logic circuits utilizing low power supply and threshold voltages have
decreased noise margins. As on-chip noise becomes more severe with technology
scaling and increasing operating frequencies, error free operation of domino logic
circuits has become a major challenge. Domino logic is a CMOS-based evolution
of the dynamic logic techniques, which are based on either PMOS or NMOS
transistors. It allows a rail-to-rail logic swing and was developed to speed up
circuits. With this technique, glitch-free operation can be obtained because each
gate can make only one transition. The main problem is charge distribution. The
major motivation for using CMOS domino logic in the design of
combinational logic circuits is low-power, high-speed operation. Various
domino logic circuit techniques which offer better speed, energy efficiency and noise
immunity in DSM technology have been implemented.
Noise immunity in digital dynamic circuits is becoming a major issue with
the progress of advanced VLSI technology. Deep submicron noise is a major issue in
integrated circuit design due to scaling of devices. The term noise designates any
phenomenon that causes the voltage at a non-switching node to deviate from its
nominal value. The various sources of noise in the deep submicron regime are
crosstalk noise due to capacitive coupling between neighbouring interconnects, small
variations in the nominal supply voltage, leakage current, and fluctuations in device
parameters due to process variation. Among the various sources of noise, the
sub-threshold leakage current is the most critical because it increases exponentially
with continued scaling of MOS transistor dimensions. With technology scaling, the
supply voltage is scaled down in each new technology; at the same time, the threshold
voltage VTH of the transistor is also scaled down to achieve high performance, which
leads to a continuous increase in sub-threshold leakage current. The leakage current
also increases due to the continuous reduction in gate oxide thickness. Therefore,
the design of efficient noise-tolerant circuits is an important issue in present-day
VLSI design.
Dynamic domino logic circuits are widely used in high performance
integrated circuits, and they are more compact than static CMOS logic,
particularly when they have wide fan-in. Wide fan-in dynamic logic circuits are
employed in the critical paths of high performance chips. The major limitation of
domino gates is that they have lower noise tolerance than static
CMOS. In static CMOS the switching threshold is equal to VDD/2, but in dynamic
logic the switching threshold is equal to the threshold voltage of the pull-down
NMOS transistors. Feedthrough logic (FTL) is also a type of dynamic logic, with
certain added features that give better performance than its domino
counterpart. Various limitations of domino logic, such as charge sharing and charge
leakage, are also eliminated by FTL. However, FTL is also less noise tolerant.
The noise tolerance of FTL can be improved mainly by increasing the VTH
of the NMOS transistors while they operate in the evaluation phase. This is achieved
by raising the source voltage of the transistors, which protects the gate inputs from
noise injection.
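One mechanism by which a raised source voltage increases VTH is the body effect. The sketch below uses the standard long-channel body-effect expression and assumes a grounded body, so that the source-to-body bias equals the source voltage; the parameter values are hypothetical and are not taken from any process mentioned in this report.

```python
import math


def threshold_with_body_effect(vt0, gamma, phi_f, v_sb):
    """Return the NMOS threshold voltage (V) for a source-to-body bias v_sb (V).

    Uses the standard long-channel body-effect expression
        VTH = VT0 + gamma * (sqrt(2*phi_f + v_sb) - sqrt(2*phi_f))
    vt0, gamma and phi_f are process parameters; the values used below are
    hypothetical and chosen only for illustration.
    """
    if v_sb < 0:
        raise ValueError("this sketch only covers v_sb >= 0")
    return vt0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


for v_source in (0.0, 0.1, 0.2, 0.3):
    vth = threshold_with_body_effect(vt0=0.3, gamma=0.4, phi_f=0.3, v_sb=v_source)
    print(f"source at {v_source:.1f} V -> VTH ~ {vth:.3f} V")
```

Raising the source also reduces the gate-to-source voltage seen by the transistor, so an input noise pulse must be larger before the pull-down network starts to conduct.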

Power dissipation in CMOS circuits is caused by three sources:
1) the leakage current, which is primarily determined by the fabrication
technology and consists of the reverse bias current in the parasitic diodes formed
between the source and drain diffusions and the bulk region of a MOS transistor,
as well as the sub-threshold current that arises from the inversion charge that
exists at gate voltages below the threshold voltage;
2) the short-circuit (rush-through) current, which is due to the DC path
between the supply rails during output transitions; and
3) the charging and discharging of capacitive loads during logic changes.
The dominant source of power dissipation is the charging and discharging of
the load capacitances (also referred to as the dynamic power dissipation).
The power consumed by CMOS circuits can be classified into two categories:
A. Dynamic Power Dissipation
For a fraction of an instant during the operation of a circuit, both the PMOS
and NMOS devices are on simultaneously. The duration of the interval depends on
the input and output transition (rise and fall) times. During this time, a path exists
between VDD and Gnd and a short-circuit current flows. However, this is not the
dominant factor in dynamic power dissipation. The major component of dynamic
power dissipation arises from transient switching behaviour of the nodes. Signals in
CMOS devices transition back and forth between the two logic levels, resulting in
the charging and discharging of parasitic capacitances in the circuit. Dynamic power
dissipation is proportional to the square of the supply voltage. In deep sub-micron
processes, supply voltages and threshold voltages for MOS transistors are greatly
reduced. This, to an extent, reduces the dynamic power dissipation.
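The quadratic dependence on supply voltage can be illustrated with the usual switching-power relation P = activity * C * VDD^2 * f. The sketch below is a minimal numerical example; the function name and the operating point are hypothetical.

```python
def dynamic_power(activity, c_load, vdd, f_clk):
    """Return switching power in watts: P = activity * C * VDD^2 * f."""
    return activity * c_load * vdd ** 2 * f_clk


# Hypothetical operating point: 10% activity, 1 pF switched, 1 GHz clock.
p_high = dynamic_power(0.1, 1e-12, 1.2, 1e9)
p_low = dynamic_power(0.1, 1e-12, 0.6, 1e9)
print(f"P at 1.2 V: {p_high * 1e6:.1f} uW, P at 0.6 V: {p_low * 1e6:.1f} uW")
print(f"ratio: {p_high / p_low:.2f}")
```

Halving the supply voltage cuts the switching power to a quarter in this model, while halving the capacitance or the activity only halves it.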
B. Static Power Dissipation
This is the power dissipation due to leakage currents which flow through a
transistor when no transitions occur and the transistor is in a steady state. Leakage
power depends on gate length and oxide thickness. It varies exponentially with
threshold voltage and other parameters. Reduction of supply voltages and threshold
voltages for MOS transistors, which helps to reduce dynamic power dissipation,
becomes disadvantageous in this case. The sub threshold leakage current increases
exponentially, thereby increasing static power dissipation.
1.4.1 Significance of Voltage:
Because power has a quadratic dependence on voltage, reducing the voltage offers
the most effective means of minimizing power consumption. One way to reduce the
supply voltage without loss in throughput and speed is to modify the VTH of the
devices, keeping in mind adequate noise margins and controlling the increase in
sub-threshold leakage currents. Since the inverse threshold slope (S) of a MOSFET
is invariant with scaling, for every 80-100 mV (depending on the operating
temperature) reduction in VTH the standby current increases by one order of
magnitude. This tends to limit VTH to about 0.3 V for room temperature operation of
CMOS circuits. Another important concern in the low VDD, low VTH regime is the
fluctuation in VTH. Basically, delay increases by 3x for a delta VTH of plus/minus
0.15 V at a VDD of 1 V. This is a major limitation on how low VDD can go unless the
VTH fluctuation is cancelled by circuit techniques such as the self-adjusting
threshold scheme, which reduces the VTH fluctuation to plus/minus 0.05 V at a VDD of
1 V.
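The rule of thumb of one decade of standby current per 80-100 mV can be turned into a quick estimate. The sketch below assumes a purely exponential sub-threshold characteristic with a constant inverse slope S; the function name is hypothetical.

```python
def leakage_multiplier(delta_vth_mv, slope_mv_per_decade):
    """Return the factor by which standby current grows when VTH is lowered.

    delta_vth_mv:        reduction in threshold voltage, in mV (positive).
    slope_mv_per_decade: inverse sub-threshold slope S, in mV per decade.
    """
    return 10 ** (delta_vth_mv / slope_mv_per_decade)


for s in (80, 100):
    factor = leakage_multiplier(200, s)
    print(f"S = {s} mV/decade: 200 mV lower threshold -> x{factor:.0f} leakage")
```

Even a modest threshold reduction therefore multiplies the standby current by orders of magnitude, which is why the threshold voltage cannot follow the supply voltage down indefinitely.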
1.4.2 Physical Capacitance:
Dynamic power consumption depends linearly on the physical capacitance
being switched. So, in addition to operating at low voltages, minimizing capacitances
offers another technique for minimizing power consumption. In order to consider this
possibility we must first understand what factors contribute to the physical
capacitance of a circuit.
Power dissipation is dependent on the physical capacitances seen by
individual gates in the circuit. Estimating this capacitance at the behavioural or
logical levels of abstraction is difficult and imprecise as it requires estimation of the
load capacitances from structures which are not yet mapped to gates in a cell library;
this calculation can however be done easily after technology mapping by using the
logic and delay information from the library.
Interconnect plays an increasing role in determining the total chip area,
delay and power dissipation, and hence, must be accounted for as early as possible
during the design process. The interconnect capacitance estimation is however a
difficult task even after technology mapping due to lack of detailed place and route
information. Approximate estimates can be obtained by using information derived
from a companion placement solution or by using stochastic / procedural
interconnect models. Interconnect capacitance estimation after layout is
straightforward and in general accurate.
With this understanding, we can now consider how to reduce physical
capacitance. From the previous discussion, we recognize that capacitances can be
kept at a minimum by using less logic, smaller devices, and fewer and shorter wires.
Example techniques for reducing the active area include resource sharing, logic
minimization and gate sizing. Example techniques for reducing the interconnect
include register sharing, common sub-function extraction, placement and routing. As
with voltage, however, we are not free to optimize capacitance independently. For
example, reducing device sizes reduces physical capacitance, but it also reduces the
current drive of the transistors, making the circuit operate more slowly. This loss in
performance might prevent us from lowering VDD as much as we might otherwise be
able to do.



This chapter presents a detailed survey of the relevant literature on dynamic and
domino logic circuits in standard journals and databases. There are many low
power design techniques at different abstraction levels of digital system design. This
chapter starts with a literature survey on low power design techniques and justifies
the need for the domino logic circuit technique.
(Sandeep Sangwan, Mrs. Jyoti Kedia, Deepak Kedia)
In this paper, a comparative analysis of various CMOS logic design
techniques is carried out with respect to important constraints such as power, area
and speed. The logic design techniques considered are static CMOS, domino logic,
feedthrough logic (FTL), modified FTL and zigzag keeper. Static CMOS
dissipates less power, but it uses more PMOS transistors, resulting in large
area and, in some cases (when PMOS transistors are in series), in large delay. Domino
logic improves the speed of the circuit and reduces area, but at the cost of large
power dissipation. Feedthrough logic, which is an improved version of the dynamic
logic family, further improves the speed of the circuits, but it also dissipates more
power. The modified FTL greatly reduces the power consumption of FTL, but the delay
increases. The zigzag keeper technique greatly reduces the power consumption, but the
area increases considerably. Finally, a qualitative and quantitative comparison of
the different techniques in terms of power and delay is presented.
Static CMOS and zigzag keeper are good choices where low power is required and
speed and area are not a concern. Zigzag keeper has much lower power dissipation
than static CMOS, but static CMOS has lower area than zigzag keeper. Domino logic
and LP-FTL, in contrast, are good choices where speed is the primary concern. LP-FTL
is faster than domino logic and also solves problems of domino logic such as the
restriction to non-inverting logic, charge sharing and the need for an inverter at
the output, but its area is greater than that of domino logic. This comparison of
the different logic design techniques makes it easier to choose a logic design
technique according to the needs of the design.
(Volkan Kursun, Member, IEEE, and Eby G. Friedman, Fellow, IEEE)
A variable threshold voltage keeper circuit technique is proposed for
simultaneous power reduction and speed enhancement of domino logic circuits. The
threshold voltage of a keeper transistor is dynamically modified during circuit
operation to reduce contention current without sacrificing noise immunity. The
variable threshold voltage keeper circuit technique enhances circuit evaluation speed
by up to 60% while reducing power dissipation by 35% as compared to a standard
domino (SD) logic circuit. The keeper size can be increased with the proposed
technique while preserving the same delay or power characteristics as compared to an
SD circuit. The proposed domino logic circuit technique offers 14% higher noise
immunity as compared to an SD circuit with the same evaluation delay characteristics.
Forward body biasing the keeper transistor is also proposed for improved noise
immunity as compared to an SD circuit with the same keeper size. It is shown that by
applying forward and reverse body biased keeper circuit techniques, the noise
immunity and evaluation speed of domino logic circuits are simultaneously enhanced.
A high-speed, low-power domino logic circuit technique is proposed. The proposed
technique, domino logic with a variable threshold voltage keeper (DVTVK), dynamically
changes the threshold voltage of the keeper with a specific delay after the beginning
of each operational phase (evaluation and precharge) of the domino circuit by varying
the body bias voltage of the keeper transistor. The keeper contention current is
reduced by increasing the keeper threshold voltage, which is done by applying
a reverse body bias to the keeper at the beginning of the evaluation phase. Similarly,
the degradation in noise immunity of DVTVK as compared to SD is avoided by
reducing the keeper threshold voltage to the zero body bias level after a delay greater
than the worst case evaluation delay of a domino logic circuit. Significant
enhancements in speed and reductions in power are achieved when the keeper is
sized for increased noise immunity. The DVTVK and SD circuit techniques are
compared in terms of evaluation delay and power dissipation, assuming the
DVTVK and SD circuits have the same keeper size. The DVTVK technique operates
at up to 60% higher speed while consuming 35% less power as compared to SD.
DVTVK also reduces the PDP by up to 74% as compared to SD. A temporary
degradation in the noise immunity of DVTVK of less than 11% as compared to SD is
observed when the keeper of the DVTVK is reverse body biased. Since the
contention current is significantly reduced with the proposed variable threshold
voltage keeper technique, the keeper transistor in a DVTVK circuit can be sized
larger, offering greater noise immunity with the same delay and power characteristics
as compared to an SD logic circuit.
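The speed, power and PDP figures quoted for DVTVK can be cross-checked with a short calculation. The sketch assumes that the 60% speed enhancement is read as a 60% reduction in evaluation delay and that both maxima occur at the same design point; neither assumption is stated in the summary above.

```python
def relative_pdp(delay_reduction, power_reduction):
    """Return the PDP relative to the baseline, given fractional reductions."""
    return (1 - delay_reduction) * (1 - power_reduction)


# Figures quoted in the summary above, read as reductions versus SD.
pdp = relative_pdp(delay_reduction=0.60, power_reduction=0.35)
print(f"relative PDP = {pdp:.2f}, i.e. a {100 * (1 - pdp):.0f}% reduction")
```

Under these assumptions the relative PDP is 0.40 x 0.65 = 0.26, which matches the quoted reduction of up to 74%.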
(Farshad Moradi, Tuan Vu Cao, Elena I. Vatajelu, Ali Peiravi, Hamid
Mahmoodi, Dag T. Wisland)

Robustness of high fan-in domino circuits is degraded by technology scaling
due to exponential increase in leakage. In this paper, we propose several domino
logic circuit techniques to improve the robustness and performance along with
leakage power. Lower total power consumption is achieved by utilizing proposed
techniques. According to simulations in a TSMC 65 nm CMOS process, the
proposed circuits increase the noise immunity of wide OR gates by at least 3.5X and
show a performance improvement of up to 20% compared to conventional domino
logic circuits. TCAD tools were used for the FinFET simulations.
The floating dynamic node at the beginning of the evaluation phase has made
domino logic circuits increasingly sensitive to noise sources. The dynamic node
is very sensitive to noise sources such as crosstalk, leakage current, charge sharing,
power supply bump, and ground bounce. Since it is a dynamic node, it cannot recover
its data once the data is lost due to noise. For correct behavior, the dynamic node
must remain stable during the evaluation phase, which makes it the most important
node in domino circuits. Conventional domino logic styles include footless standard
domino logic (FLDL), footed standard domino logic (FDL), high speed domino logic
(HSDL), and conditional keeper domino logic (CKL). In general, domino logic is
primarily proposed for high-speed applications. However, the sensitivity of the
dynamic node to noise sources has emerged as a serious design challenge in
scaled technologies. Conventionally, a keeper transistor is added to provide
immunity to noise and leakage for the dynamic node. However, adding this PMOS
keeper transistor degrades performance and increases power dissipation in the circuit.
Upsizing the keeper transistor improves robustness at the cost of higher power
dissipation and delay. In other words, upsizing the keeper increases contention
between the keeper transistor and the evaluation network. Therefore, a small keeper
is desirable for high-speed applications, while a larger keeper is wanted for a
robust design.
In this paper, several domino logic circuit topologies were proposed for
high-speed and leakage-tolerant design. The proposed circuits showed at least a 3.5X
improvement in unity noise gain (UNG) compared to the conventional design.
Furthermore, FinFET-based domino logic styles were simulated to investigate the
advantages of using FinFETs in logic circuit design. Simulation results obtained with
the TCAD tool Taurus showed that FinFET domino logic designs give significantly lower
leakage than designs using bulk CMOS devices. The simulation results show that a
conventional domino circuit consumes 2.7 times less power when implemented with
FinFETs than with bulk CMOS devices.
(Salendra Govindarajulu, Associate Professor, ECE; Dr. T. Jayachandra Prasad)
Sub-threshold leakage power is soon expected to dominate the total power
consumed by a CMOS circuit in deep submicron (DSM) technology. Circuit
techniques aimed at lowering leakage currents are therefore highly desirable. In this
work, low power CMOS designs using dual threshold voltage (dual-Vt) domino
logic are proposed. Single threshold voltage (single-Vt), standard dual-Vt and
modified dual-Vt domino logic circuits are compared in terms of power and speed.
These design styles are compared by performing detailed transistor-level simulations
on benchmark circuits using the DSCH3 and Microwind3 CAD tools.
In this work, the benchmark circuits or2, and2, xor2, or8 and an 8:1 multiplexer
are successfully implemented using CMOS domino logic. Taking the power-delay
product (PDP) as the figure of merit, each of the circuits is compared across the
standard low-Vt, standard high-Vt, standard dual-Vt and modified dual-Vt
technologies. It is observed that the proposed modified dual-Vt technology produces
the lowest PDP among the four techniques. Hence, with the use of low-Vt
transistors in critical timing paths and high-Vt transistors in noncritical timing
paths, the performance characteristics of the domino logic circuit can be
significantly improved. The proposed modified dual-Vt technology is therefore a
better solution for optimizing sub-threshold leakage and minimizing the overall power
consumption of the domino circuit.

(Mahjabeen Mansoori, Prof. Tejaswini Choudri, RKDF Institute of Science &
Technology, Bhopal)
In this paper, a new domino circuit is proposed which has lower leakage
and higher noise immunity without dramatic speed degradation for wide fan-in gates.
In this domino circuit, the evaluation network uses the well-known stacking effect
to reduce leakage. The stacking effect and a current mirror make the
circuit more noise immune and considerably improve the power-delay product
(PDP) compared with other existing domino logic styles. The leakage current is also
decreased by using the footer transistor in a diode configuration, which results in
increased noise immunity. The diode-connected leakage current replica (DCLCR) domino
circuit reduces leakage power consumption while maintaining the same level of delay.
By connecting the gate of transistor MK1 to its drain, power is reduced by up to 14%
as compared to standard footless domino and 24%. Simulation results are reported for
wide fan-in gates (8-input OR logic) designed using a 65-nm high-performance
predictive technology model (PTM).
The leakage current of the evaluation network of dynamic gates has increased
dramatically with technology scaling, especially in wide domino gates,
yielding reduced noise immunity and increased power consumption. In this research
work, two different domino logic styles are implemented; the first is the
diode-connected leakage current replica (DCLCR). The DCLCR circuit takes full
advantage of the leakage current replica, and the novelty of the proposed circuit is
that it reduces leakage power consumption while maintaining the same level of delay.
By connecting the gate of transistor MK1 to its drain, power is reduced by up to 14%
as compared to standard footless domino and 24%.

(Preetisudha Meher, K. K. Mahapatra, Department of Electronics and
Communication Engineering, National Institute of Technology, Rourkela, India-769008)
The dynamic logic style is used in high performance circuit design because of its
high speed and lower transistor count compared with the static CMOS logic style.
However, it is not widely accepted for all types of circuit implementations because
of its low noise tolerance and charge sharing problems. A small noise at the input of
a dynamic logic gate can change the desired output. Domino logic uses a static CMOS
inverter at the output of the dynamic node, which is more noise immune and consumes
very little power compared with other proposed circuits. In this paper we propose a
novel circuit for domino logic which has less noise at the output node and a much
lower power-delay product (PDP) than previously reported designs. The low PDP is
achieved by using a semi-dynamic logic buffer and by reducing the leakage current
when the PDN is not conducting. The paper also analyses the PDP of the circuit at
very low voltages and for different W/L ratios of the transistors.
In this paper, we have proposed a high-speed and low- power domino logic
circuit, which also have noise-tolerance in the output node. The simulation was done
with 90 nm and 1 V CMOS process. The results have shown that the proposed
scheme can work with very high speed and also consuming very low power, which
reduces the PDP of the circuit exponentially. Proposed circuit also shows noise
efficiency because the noise of the output buffer dramatically improved as compared


to previous work. Also the circuit is flexible for wide variety of dynamic logic styles
and adequate for large fan-in gates.
The motivation for minimizing power consumption differs from application to
application. The power requirements of each type of application decide the
trade-off between area, power and delay. CMOS has prevailed as the superior logic
style because of its negligible static power dissipation. The other type of power
dissipation that occurs in CMOS is short-circuit power dissipation, which is also
negligible because the short-circuit current flows for a very short duration. The
major contributor to the total power dissipation in CMOS is dynamic power, and
any measure taken to reduce dynamic power results in a substantial decrease in
overall power dissipation. The three degrees of freedom inherent in the low-power
CMOS design space are voltage, capacitance and switching activity. These three
parameters cannot be optimized independently. Because power depends quadratically
on voltage, voltage reduction is the essential first step in power reduction. The
fact that supply voltage reduction increases circuit delay puts a lower bound on
the supply voltage, typically 2VT. Compatibility should also be considered while
scaling down the supply voltage, since there is some penalty involved in
supporting different supply voltages. Minimizing the capacitance, i.e. the device
and interconnect capacitance, is another technique for minimizing power
dissipation. Capacitance can be kept to a minimum by using less logic, smaller
devices, register sharing, proper placement and routing, etc. Reducing device sizes
reduces physical capacitance, but it also reduces the current drive of the
transistors, making the circuit operate at lower speed.
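The quadratic dependence on voltage and the linear dependence on capacitance can be
illustrated with a minimal sketch based on the standard first-order expression for
dynamic power, P = a * C * VDD^2 * f. The function name and the operating point
below are hypothetical values chosen only for illustration; they are not taken from
the text.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order dynamic switching power P = a * C * Vdd^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point (assumed values, for illustration only).
baseline = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=2.0, freq_hz=100e6)

# Halve the supply voltage, keeping everything else fixed.
half_voltage = dynamic_power(0.2, 1e-9, 1.0, 100e6)

# Halve the switched capacitance, keeping everything else fixed.
half_capacitance = dynamic_power(0.2, 0.5e-9, 2.0, 100e6)

voltage_saving = baseline / half_voltage          # quadratic dependence
capacitance_saving = baseline / half_capacitance  # linear dependence
print(voltage_saving, capacitance_saving)
```

Under this model, halving the supply voltage reduces dynamic power by a factor of
four, whereas halving the capacitance only halves it.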
This loss in performance may force the designer to increase the supply voltage,
and hence to give up a possible quadratic reduction in power through voltage
scaling for a linear reduction through capacitance scaling. The third factor is
the switching activity, governed by frequency and data activity. Certain data
representations such as sign-magnitude have an inherently lower activity than
two's complement. Glitching should be avoided whenever possible, as it causes
unnecessary dynamic power dissipation. Again, activity cannot be optimized
independently, without consideration of the effects on voltage and capacitance.
Low power digital design is an optimization problem at all levels of the
design, i.e. the technology, device, circuit, logic, architecture, algorithm and
system levels. The optimization of different low power solutions is a challenging
task for researchers and largely depends upon the application requirements. Many
researchers and technocrats are working in the low power domain to develop
innovative techniques for achieving this low power objective. Figure 2.1 shows an
integrated low-power design approach and the energy savings that can be achieved
at each level. The percentage of energy savings decreases from the system level to
the technology level. A brief literature survey of low power digital design
techniques is given here; it compares these techniques and emphasizes the need for
the energy recovery technique.
Figure 2.1: Possible Energy Savings at Different Abstraction Levels
At the top abstraction level, i.e. the system level, power management can be
implemented with system logic or software controlling the power consumption of the
processor and peripherals. The research work done shows that:
1. Power dissipation of the PowerPC603 microprocessor can be reduced by up to
80% if clocks are supplied only to the data cache, snooping logic and time base
in stand-by mode.
2. Similarly, power dissipation of the MIPS 4200 microprocessor can be
reduced by up to 70% if the clock frequency is reduced to 25%.
3. Power reductions of 95% and 98% can be achieved in the Hitachi SH7032 and
PowerPC603 microprocessors respectively if all clocks are stopped.
This research has introduced various microprocessor modes, namely
doze, nap, sleep, standby, etc. But the major obstacle to weighing architectural
decisions with respect to power is the difficulty of estimating the power effects
of these trade-offs, especially in the absence of an implementation.
The power dissipation factor is used for comparing algorithms and for
measuring the effect of algorithm-level decisions. This requires estimating both
the algorithm-inherent dissipation and the implementation overhead. The
algorithm-inherent dissipation refers to the power consumed by the execution units
and memory. Experimental results show that memory access and multiplication
operations are power hungry, and hence these functions should be optimized.
The implementation overhead consists of the control, overhead and
implementation-related memory/register power. It has been shown that the
implementation-dependent power is strongly correlated with structural properties
of an algorithm, namely spatial locality and temporal locality. Given a targeted
hardware platform and a number of algorithm properties, techniques can be
developed for predicting the implementation overhead. The low-power goal at the
algorithm level can be achieved in the following ways:
1. If there are several algorithms for a given task, the one with the least
number of operations is generally preferred.
2. Reducing the algorithm-inherent dissipation by replacing energy-consuming
operations with a combination of simpler operations, and reducing memory size
using loop reordering and loop merging transformations.
3. Reducing the implementation overhead by retiming and reducing the chip
area considerably.
Due to the lack of implementation details at the algorithm level, the accuracy
of estimating the implementation overhead is not high. This can be improved only by
stochastic modelling. Researchers have made power estimation at this level more
reliable with the help of the capacitance of an RTL-level module, the Dual Bit Type
(DBT) model and the Activity-Based Control (ABC) model. The accuracy of power
estimation is higher at the architectural level than at the algorithm level.
Various circuit and logic level techniques to reduce power dissipation are:
1. Glitch power reduction:
Glitches occur as a result of statistical delay variations and can be avoided by
precharge (domino) logic. But any hardware solution to avoid glitches will always
add to power consumption.
2. Power reduction by reducing voltage swing:
Power can be reduced by reducing the voltage swing on high-capacitance
nodes and I/Os.
3. Power reduction in timing circuits:
Research shows that the gate-based static flip-flop has lower power
consumption than the transmission-gate flip-flop, the non-precharged TSPC (True
Single Phase Clock) dynamic flip-flop consumes the least power and the RS flip-flop
latch is the most energy efficient.
4. Many CMOS circuit techniques have been proposed in the research literature,
namely standard static logic (SL), Complementary Pass-Transistor Logic (CPL),
Precharged Logic (PCL), Cascode Voltage Switch Logic (CVSL) and PCVSL. The
experimental results reported by the researchers show:
(a) Static logic is the power-lean circuit technique.
(b) PCL and PCVSL are the most power-hungry, but they are faster than the
others.
(c) The differential techniques such as CPL and CVSL consume large power
but are advantageous in self-contained blocks like adders, multipliers etc.
5. Large capacitive loads like the clock network are driven by a tapered inverter
chain, and its power consumption can be reduced by decreasing the tapering factor f.
6. The energy recovery (adiabatic) approach to low power design can be applied to
any digital system. With energy-recovery CMOS, circuit energy is conserved for
later use instead of being allowed to dissipate as heat in the system.
7. To minimize the power dissipated by the clock, a distributed buffering scheme
can be used, together with an optimization scheme to minimize the skews introduced
by the buffers.


The effect of voltage scaling on reduction in power dissipation is unparalleled
by any other technique and offers quadratic power reduction. But as VDD is
lowered, the gate speed is also lowered if the device technology is not changed.
Hence it is imperative to optimize the energy-delay product. It was shown that the
energy-delay product may be expressed in terms of technology parameters as;

The equation shows that if voltage scaling is used to reduce power
without significantly increasing delay, then VT should also be scaled
down appropriately to maintain the ratio (VDD / VT) around four. Unfortunately,
every 0.1V reduction in VT raises the leakage current by ten times. Low VT devices
can be used in active circuit blocks and higher VT devices in idle
circuit blocks. This can be achieved with DTCMOS (Double Threshold CMOS)
devices, or with circuit techniques such as Switched-Source-Impedance, which raises
the source potential, or active well control, which raises the body potential.
Transistor sizing and gate oxide thickness can also be appropriately selected for
optimizing the energy-delay product.
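The rule of thumb quoted above (a tenfold rise in leakage for every 0.1V reduction
in VT) can be turned into a short calculation. This is only a sketch of that rule:
the function name and the 0.1V-per-decade default are illustrative assumptions, and
real devices depend on the sub-threshold slope and temperature.

```python
def leakage_multiplier(vt_reduction_v, volts_per_decade=0.1):
    """Factor by which leakage current grows when VT is lowered.

    Assumes the rule of thumb that leakage rises tenfold for every
    `volts_per_decade` volts of threshold reduction.
    """
    return 10 ** (vt_reduction_v / volts_per_decade)


# Lowering VT by 0.1 V, 0.2 V and 0.3 V under this rule of thumb.
for reduction in (0.1, 0.2, 0.3):
    print(reduction, leakage_multiplier(reduction))
```

Under this assumption, a 0.3V reduction in VT costs roughly three orders of
magnitude in leakage, which is why low-VT devices are reserved for active blocks.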
Alternative novel device structures have been proposed for low power
applications, namely devices based on quantum tunneling, the single-electron
effect, Silicon-on-Insulator (SOI), the surround-gate MOSFET with 60mV per decade
sub-threshold slope, the inter-band tunneling transistor (a PN junction acting as a
non-FET voltage-controlled switch) and the quantum interference transistor.
More accurate models are available at circuit and device level to predict
power dissipation and delay. The major obstacle to weighing architectural decisions
with respect to power is the difficulty of estimating the power effects of these
trade-offs, especially in the absence of an implementation. Due to the lack of
implementation details at the algorithm level, the accuracy of estimating the
implementation overhead is not high. This can be improved only by stochastic
modelling. It has been demonstrated by several researchers that algorithm and
architecture level design decisions can have a dramatic impact on power
consumption. However, there is a need for accurate design automation techniques at
this level of abstraction. None of the low power design methodologies above the
circuit and logic abstraction level can reduce the energy below 0.5CV². Energy
recovery is the only principle that can achieve sub-0.5CV² energy dissipation.
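As a point of reference for the 0.5CV² bound, a commonly quoted first-order
comparison between conventional and adiabatic (energy-recovery) charging of a node
capacitance C is sketched here. The symbols R (effective resistance of the charging
path) and T (ramp time of the supply) are assumptions introduced for illustration
and do not appear in the text above.

```latex
E_{\text{conventional}} = \tfrac{1}{2}\, C V_{DD}^{2},
\qquad
E_{\text{adiabatic}} \approx \frac{RC}{T}\, C V_{DD}^{2}
\quad (T \gg RC)
```

Under these assumptions the adiabatic dissipation falls below 0.5CV² once the ramp
time T exceeds about 2RC, and it keeps falling as the supply is ramped more slowly.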
Power dissipation in digital CMOS circuits has three components: static
power due to leakage currents, dynamic power due to short-circuit currents and
dynamic power due to charging and discharging of node capacitance. The last term,
which is the most significant in VLSI chips, is proportional to the square of the
supply voltage, while its dependence on the remaining factors (capacitance,
frequency and switching activity) is only linear. Therefore, in order to minimise
power consumption the supply voltage should be as low as possible. However, since
the propagation delay of a circuit increases as the supply voltage falls,
supply voltage scaling comes at the expense of throughput degradation. The major
concern of VLSI designers is to reduce the supply voltage whilst keeping
propagation delay as low as possible.
There are two major approaches to power supply scaling, each at a different
design level. The first is the architecture-driven approach, where several
techniques at the architecture and/or algorithm level are applied in order to allow
voltage scaling without performance degradation. However, such techniques,
including the use of parallelism and/or pipelining, may require more devices, give
rise to more control overhead, and complicate the system algorithms (the last point
is a major disadvantage). On the other hand, the device-level approach to supply
voltage reduction could be more promising for future applications.
This section presents a review of low-voltage design techniques at device and
circuit levels for both digital and analogue systems. Firstly, the effect of
threshold voltage scaling on low power design is investigated. Then some clever
ways of controlling the threshold voltage are described. As will be shown, control
of the threshold voltage is a very advantageous strategy for designing low power
circuits. Finally, a low-voltage device technology, Silicon On Insulator, is
presented.



2.4.1 The Impact of Threshold-Voltage Scaling on Low Power CMOS
The propagation delay of a circuit increases as the supply voltage scales down,
and this effect is more dramatic when the supply voltage approaches the device
threshold voltage. Therefore, in order to allow voltage scaling in a CMOS circuit
without any performance loss, it is also necessary to decrease the threshold
voltage as far as possible. The delay of a gate strongly depends on the ratio
VT/VDD, and consequently, with a fixed VT/VDD ratio, delay increases only slowly
with decreasing VDD.
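The dependence of delay on the ratio VT/VDD can be illustrated with the widely used
alpha-power-law delay approximation, delay = k * C * VDD / (VDD - VT)^alpha. This
model is swapped in here for illustration and is not the one used in the work
surveyed above; the exponent alpha = 1.3 and the voltages are assumed values.

```python
def gate_delay(vdd, vt, alpha=1.3, k=1.0, c_load=1.0):
    """Alpha-power-law delay estimate: k * C * Vdd / (Vdd - Vt)**alpha.

    alpha, k and c_load are illustrative assumptions (arbitrary units).
    """
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * c_load * vdd / (vdd - vt) ** alpha


# Scale VDD from 3 V down to 1 V in two different ways.
reference = gate_delay(vdd=3.0, vt=0.6)

# (a) Threshold voltage held at 0.6 V: VT/VDD rises from 0.2 to 0.6.
fixed_vt_slowdown = gate_delay(vdd=1.0, vt=0.6) / reference

# (b) Ratio VT/VDD held at 0.2: VT is scaled to 0.2 V with the supply.
fixed_ratio_slowdown = gate_delay(vdd=1.0, vt=0.2) / reference

print(fixed_vt_slowdown, fixed_ratio_slowdown)
```

With these assumed numbers the gate slows down by more than three times when VT is
held fixed, but by well under one and a half times when VT/VDD is held constant.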
It is clear that, in order to keep the delay increase due to voltage reduction
within bounds, the VT/VDD ratio should be kept constant as far as possible. There
are good opportunities for reducing power consumption in digital CMOS circuits, in
current and future technologies, by reducing both supply and threshold
voltages. Simulation predicted that, by optimising the threshold voltage as well as
the feature size in dynamic CMOS circuits, it is possible to reduce power by a
factor of between 3 and 8 without any speed loss. The results confirm that the
shorter the gate length, the greater the improvement in power consumption. The
simulation predicted that a dynamic CMOS inverter designed in a 0.25 µm process
could consume 8 times less power owing to reduced supply voltage, without any speed
loss. Specifically, its delay was found to be 31ps, which is equivalent to a maximum
clock frequency of about 4GHz, with a switching energy of 5.6fJ, at a supply voltage
of 1.05V.
Furthermore, the simulations showed that systems with minimum-size
devices in static logic could provide more impressive power savings. The switching
energy and the propagation delay of a 0.25 µm static CMOS inverter, operating at a
supply voltage of 0.48V, were found to be 1.1fJ and 27ps respectively, which
indicates a 44 times power reduction at the same speed, compared to the 3V VDD
case.
Finally, using a 0.5 µm gate length and static logic, the supply voltage can be
decreased to as low as 0.13V to get more than 800 times system power reduction
with a 19 times system speed loss. The threshold voltage in this case was reduced to
0.05V. In this case, the delay increase can be compensated for by the use of
parallel or pipelined architectures.
2.4.2 Multiple Threshold Voltage CMOS Circuits
The multi-threshold CMOS (MTCMOS) circuit scheme employs NMOS
and PMOS transistors with different threshold voltages within the same chip. A
simple two-input NAND gate implemented in this topology is shown in Fig 2.2. In
this circuit, VDD is the supply voltage, set at 1V, while VDDV and GNDV are the
virtual power supply rails, serving as the power supplies of the NAND gate, which
consists of low-VT (about 0.2-0.3V) MOSFETs. On the other hand, transistors Q1 and
Q2, which link the virtual supply lines with the real ones, have high threshold
voltages (0.5-0.6V) and serve as sleep control transistors, controlled by the
signal SL applied to their gates: when SL is high, the circuit is in standby mode,
while when SL is low it is in active mode.

Fig 2.2 MTCMOS
The operation of this circuit can be described as follows. Firstly, in the active
mode (SL=0), transistors Q1 and Q2 conduct, and there is a current flow from VDD to
GND. Assuming that both Q1 and Q2 have a quite large gate width, which means
that their on-resistance is quite small, VDDV and GNDV act as real power supply
rails. Therefore, the gate operates regularly and at a high speed because the
threshold voltage of 0.3V is low enough compared to the supply voltage of 1V.
In the sleep mode (SL=1), Q1 and Q2 are turned off, thus disconnecting the
virtual supply from the real supply lines. In this case, subthreshold leakage
currents are suppressed because of the high VT of Q1 and Q2. Simulation showed
that, from both the delay and the power-delay-product point of view, this circuit
works as well as a conventional low-threshold CMOS design. To confirm the
effectiveness of this circuit scheme, Mutoh et al. designed a PLL test chip and
implemented it in a 0.5 µm CMOS process using new MTCMOS standard libraries. The
chip operated at 18MHz, at 1V supply voltage, and its dynamic power dissipation was
found to be less than 5% of the dissipation of the conventional high-VT CMOS LSI
operated at 5V. The most impressive result, however, was that the standby-mode
static power dissipation was 600 times lower than the active-mode static power
dissipation.
However, there are two factors that influence the speed performance of such a logic
circuit. The first is the size of the sleep control devices, Q1 and Q2, which should
be large enough to make the voltage drop across them negligible. The second is the
size of the capacitances CV1 and CV2 of the virtual supply rails. These capacitances
act as temporary voltage sources for the gate and should be large enough for
fast gate performance. Simulation results indicate that the optimum sleep-control
transistor width is five times that of the internal transistors, and that the
virtual supply line capacitances should also be five times larger than the load
capacitance of the gate. Another significant drawback is that an MTCMOS circuit
cannot store data in standby mode; hence special latches have been developed to
overcome this problem and allow this scheme to be applied to memory elements.
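The sleep-control behaviour and the five-times sizing guideline reported above can
be captured in a few lines. This is only a sketch of the rule of thumb: the class,
function names and the example widths and capacitances are hypothetical and are not
part of any MTCMOS library.

```python
from dataclasses import dataclass

SIZING_FACTOR = 5  # guideline quoted in the text for both width and capacitance


@dataclass
class MtcmosSizing:
    sleep_width_um: float       # width of sleep transistors Q1 and Q2
    virtual_rail_cap_ff: float  # capacitance CV1/CV2 of each virtual rail


def size_sleep_network(internal_width_um, load_cap_ff, factor=SIZING_FACTOR):
    """Apply the 5x guideline to the internal transistor width and gate load."""
    return MtcmosSizing(factor * internal_width_um, factor * load_cap_ff)


def operating_mode(sl):
    """SL high puts the circuit in standby; SL low puts it in active mode."""
    return "standby" if sl else "active"


# Hypothetical gate: 2 um internal devices driving a 10 fF load.
sizing = size_sleep_network(internal_width_um=2.0, load_cap_ff=10.0)
print(sizing, operating_mode(0), operating_mode(1))
```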
2.4.3 Substrate Bias Controlled Variable Threshold CMOS
An alternative approach for reducing subthreshold leakage currents in
standby mode, without any speed or yield penalties, was proposed in previous
papers. In these papers, a standby power reduction (SPR) circuit is used for
raising the threshold voltage in standby mode by switching the substrate (bulk)
bias between the supply rails and an additional external supply voltage. Since the
threshold voltage of a device increases as the potential difference |VSB|
increases, it is clear that the SPR circuit should raise this absolute potential
difference when the chip is in sleep mode.
However, the major disadvantage of this technique is that two additional
external power supply rails are required. A more efficient technique for dynamically
varying the threshold voltage has also been presented. This scheme, called variable
threshold-voltage CMOS (VTCMOS), uses simple feedback circuits for substrate
potential control and has some significant advantages: it does not need external
power supplies for bulk bias, it is easy to use, it imposes no speed or area
penalties, and it can be applied to both logic gates and memory elements.
In conclusion, the essential difference between MTCMOS and VTCMOS
approaches is that the VTCMOS approach controls the substrate potential, whilst
MTCMOS controls the power supplies. Since much smaller current (almost none)
flows in the substrate compared to the supply rails, a much simpler circuit can be
applied with the VTCMOS scheme. This leads to negligible penalties in area and
speed performance.
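The dependence of threshold voltage on substrate bias that both the SPR and VTCMOS
schemes exploit is usually written with the standard first-order body-effect
expression. The symbols VT0 (threshold voltage at zero substrate bias), γ
(body-effect coefficient) and φF (Fermi potential, taken as a positive magnitude)
are introduced here for illustration and are not defined in the text above.

```latex
V_T = V_{T0} + \gamma \left( \sqrt{2\phi_F + |V_{SB}|} - \sqrt{2\phi_F} \right)
```

Since the square-root term grows with |VSB|, increasing the source-to-bulk potential
difference in standby raises VT and therefore lowers the subthreshold leakage.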
2.4.4 Silicon On Insulator (SOI) CMOS - A Low Voltage Technology
Up to now, techniques of reducing power supply voltages without any speed
loss in conventional CMOS devices have been described. In the foreseeable future,
however, innovations in conventional CMOS technology are expected to take low
power electronics forward. Developments in Silicon-On-Insulator CMOS (SOI-
CMOS), are believed to be capable of reducing supply voltages below 1V and also
reducing the circuit capacitances as well as simplifying the fabrication process. A
simple inverter implemented in SOI CMOS is shown in Figure 2.3. The transistors
are completely insulated from each other, as the substrate is entirely covered by a


layer of silicon dioxide.

Fig 2.3 SOI CMOS inverter
Three major advantages of this device technology, compared to conventional bulk
CMOS, have been reported. Firstly, as
the source and drain areas are electrically isolated, the channel area (body) floats,
leading to a significant reduction of MOSFET parasitic capacitance, which in turn
leads to less switched capacitance. The second advantage is the elimination of the
body-effect, owing to the dielectric isolation, which leads to better performance in
circuits consisting of devices in a chain. Last but not least, since the body floats, it
can be connected with the gate terminal giving lower subthreshold leakage currents.
It is clear that because of the above features, SOI CMOS is very attractive for low
power design.
Two main techniques for fabricating SOI devices currently exist. The first,
referred to as SIMOX, uses conventional ion implantation of oxygen through the
silicon surface and then thermal processing to create an insulating layer. In the
second, called wafer bonding, two bulk silicon wafers are oxidised, and the two
oxide surfaces are held together and bonded at a moderately high temperature. One
of the initial wafers is thinned by polishing, possibly followed by chemical or plasma
etching, until a thin layer of silicon is left over the oxide layer.
There are two kinds of SOI devices, the fully-depleted (FD) and the
partially-depleted (PD) devices. These two styles of SOI MOSFET differ in both
fabrication strategy and electrical features. In PD SOI transistors in particular,
the silicon thickness is significantly larger than the depletion depth under the
channel, making these devices look similar to bulk MOSFETs. This similarity,
however, leads to undesirable characteristics, like body effect and subthreshold
currents, even though, from the point of view of ease of fabrication, they are
superior to FD SOI devices. On the other hand, FD SOI CMOS transistors are more
suitable for low
An effective modified type of SOI MOSFET, called dynamic threshold-voltage
CMOS (DTCMOS), providing excellent performance at low supply voltages,
is presented in [Assaderaghi94]. The modification is quite simple: the body of the
device is tied to its gate, leading to a forward bias for the source-body junction,
which in turn leads to a VT reduction. This tying also means that the body
potential, and therefore the threshold voltage of the transistor, can dynamically
be adjusted
There are many existing techniques for domino logic. Each technique has
its own advantages as well as disadvantages. The first technique proposed was the
standard footless domino logic technique. J. M. Rabaey et al. in 2002 proposed
Standard Footless Domino Logic (SFLD). This is the most popular and the
conventional dynamic logic style. Here a PMOS keeper transistor is used to avoid
any unwanted discharge of the dynamic node, caused by leakage current and by charge
sharing in the Pull Down Network (PDN), during the evaluation phase. The noise
robustness is therefore high. But keeper upsizing increases current contention
between the keeper transistor and the evaluation network, increasing the power
consumption and evaluation delay of domino circuits. H. Suzuki, C. Kim and K. Roy
proposed a logic style called Diode Partitioned Domino in Feb 2002 for fast tag
comparators. It reduces the parasitic capacitance and enables a smaller keeper
in high fan-in gates. The diode circuit is also improved by an enhanced diode that
boosts the gate voltage of the NMOS diode. Yet it suffers from slightly higher
power dissipation. M. H. Anis, M. W. Allam and M. I. Elmasry in May 2002
proposed a logic style called High Speed Domino logic (HSD). It reduces the current
drawn through the PMOS keeper and the NMOS PDN. This allows a large PMOS keeper to
be retained without performance degradation and leakage current. However,
the area and power overhead of the clock delay circuit remain. H. Mahmoodi and
K. Roy proposed a logic style named Diode Footed Domino logic
(DFD) in March 2004. A diode footer transistor is used in series with the evaluation
network. This improves robustness and noise immunity. With the replica circuit,
more legs are possible for an equal noise margin, and the gate is faster with the
same number of gates. A fairly large safety factor is needed to account for random
on-die process variation, especially FET Vt variation. Ali Peiravi and Mohammad
Asyaei proposed a robust low leakage controlled keeper based domino in 2012, which
reduces leakage current and power but suffers from serious efficiency issues in
terms of area and delay. A. Alvandpour, R. Krishnamurthy and K. Soumyanath proposed
a logic style called Conditional Keeper Domino logic in Jan 2007. This consists of
small and large keeper transistors. The conditional keeper domino has certain
disadvantages, such as increased delay and power dissipation due to upsizing.
Y. Lih, N. Tzartzanis and W. W. Walker proposed a leakage current replica keeper
dynamic circuit in Jan 2007. It improves the scaling of dynamic logic gates.



By the late 1970s complementary metal oxide semiconductor (CMOS) started
to become the process of choice for digital semiconductor designs. CMOS had
originally been proposed by Frank Wanlass in 1963 as a low standby power
technology, since CMOS logic gates dissipate almost no power when the inputs to
the gate do not change. This follows as CMOS contains both PMOS field effect
transistors (FETs), which can efficiently drive a high voltage, or logic one value, and
NMOS transistors, which are good at driving a zero voltage. The presence of
complementary transistors allows CMOS logic gates to be implemented so that the
output voltage level is connected to the power or ground line, but not both. This
ability to avoid contention ensures that if the inputs are not changing, then no power
is dissipated. This was a major advantage of CMOS over the other manufacturing
processes then available, which dissipated constant leakage or bias currents.
In Figure 3.1 the schematic representation of a CMOS static NAND logic
gate is shown. The logic gate has two inputs, A and B. A high logic value at both
inputs A and B turns on transistors MN1 and MN2, while turning off transistors MP1
and MP2. This causes the output Z to be low. When either input A or B is low,
however, the path to the ground line is broken and a path to the power supply (Vdd)
is established. This causes Z to rise. While a NAND gate represents a simple
function, it does show how contention between the power and ground supplies can be
avoided in CMOS circuits. This lack of contention means that when the inputs to a
CMOS circuit do not change, often called a standby or idle state, almost no power
dissipation occurs, except for a small leakage current which flows through the
transistors because a MOSFET is an imperfect switch. Owing to the relentless
scaling of the physical dimensions of CMOS processes, driven by the cost advantages
of having a smaller silicon area for digital functions, MOS transistors have
become less perfect switches, leading to greater leakage current.
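The absence of contention described above can be checked with a minimal
switch-level sketch of the two-input NAND gate. The model treats each transistor as
an ideal switch (it ignores leakage and timing), and the function name is
illustrative; the transistor names follow the description of Figure 3.1.

```python
from itertools import product


def cmos_nand(a, b):
    """Switch-level model of a static CMOS NAND gate with ideal transistors."""
    a, b = bool(a), bool(b)
    pull_down = a and b            # NMOS MN1 and MN2 in series to ground
    pull_up = (not a) or (not b)   # PMOS MP1 and MP2 in parallel to Vdd
    # Exactly one network conducts, so Vdd and ground are never shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


truth_table = {(a, b): cmos_nand(a, b) for a, b in product((0, 1), repeat=2)}
print(truth_table)
```

For every input combination exactly one of the two networks conducts, which is why
an ideal static CMOS gate draws no steady-state current.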


Fig 3.1 Static NAND
The fact that CMOS logic would lead to substantial power savings was apparent to
its inventor, Frank Wanlass, who in 1963 attempted to prove the viability and
technical advantages of CMOS with a monolithic implementation of the technology.
When this proved infeasible, he proved the concept with discrete transistors. His
CMOS implementations reduced standby power by six orders of magnitude compared with
equivalent bipolar and PMOS implementations. While impressive, this advantage of
CMOS would not prove decisive for many years. Early monolithic designs were very
small, so their standby power was also very small as an absolute quantity.
The relative immaturity of MOS transistors meant that in the 1960s bipolar logic
raced ahead of MOS in applications. Transistor-transistor logic (TTL) and
emitter-coupled logic (ECL), developed in 1962 and 1966 respectively, provided
effective digital design techniques for bipolar transistors in the rapidly growing
semiconductor industry. The major user of CMOS in its early years was the watch
industry, where battery life was a more important attribute than speed. Starting in
the 1970s, MOS technology began to mature rapidly, with much of the early industrial
development being driven by Intel. In 1971 Intel released the 4004, the world's
first microprocessor.



The 4004 was built using a 10 µm line width PMOS process and used 2300
transistors running at 108 kHz. In 1974 Intel released the 8-bit 8080, manufactured
in a 6 µm NMOS process. The chip ran at 2MHz and had 6000 transistors. Yield and
cost concerns at the time ensured manufacturers preferred to use a single type of
MOS transistor. Since NMOS transistors were faster than PMOS ones, due to the
higher mobility of electrons over holes, the move to an NMOS process was natural.
Advantages of CMOS Logic:
(1) Robustness (less sensitive to noise).
(2) Simple approach for implementing logic gates.
(3) Easy to translate logic to FETs.
(4) Good noise margins since FETs are in cut off & sizing not critical
(5) No static power dissipation.
(6) Low power consumption.
Disadvantages of CMOS Logic:
(1) Complexity of circuits increases with increased Fan-in.
(2) For an N-input logic gate, 2N transistors are required, which results in a
significantly large implementation area.
(3) Propagation delay of CMOS gates deteriorates rapidly as a function of fan-in.

Fig 3.2 NAND using NMOS only


Fig. 3.2 shows the schematic implementation of a NAND gate using NMOS transistors
only. The PMOS transistors MP1 and MP2 of the CMOS implementation are removed here
and replaced by a resistor, R1. This conceptual resistor is actually
implemented by a depletion-mode NMOS transistor. The NMOS NAND gate output
is at Vdd, or a logic one value, when either of the inputs, A or B, is low. When input
A and input B are both high, the output is driven low. The current-driving ability of
the pull-down NMOS transistors must be much greater than that of the pull-up resistor.
This ensures that the output can be driven to a low voltage, at the cost of higher
power dissipation. In addition to the standby power dissipation, NMOS circuits tend
to be slower than equivalent CMOS circuits. This is due to the need for a weak
pull-up resistor, which results in very slow low-to-high transitions. While these
disadvantages may make NMOS appear unappealing, NMOS designs are more
compact than CMOS circuits. The circuit in Fig. 3.2 uses only two transistors and a
resistor, compared with the four transistors needed by a CMOS design. Since the pull-up
resistor is implemented by another NMOS MOSFET, the NMOS design uses fewer
transistors and a simpler process than the CMOS design. The need to move to CMOS
therefore arose only when the integration level on integrated circuits (ICs) made the
large standby power of the NMOS design unacceptable. For Intel this transition
occurred in 1978, when the 8088/8086 family of microprocessors was introduced
(the designs were almost identical, with the 8088 having an 8-bit bus while the 8086
has a 16-bit bus). With 29,000 transistors and a clock rate of 5 to 10 MHz, the 8086
dissipated 1.5 W. This exceeded the 1 W per chip power limit for plastic packaging.
Increases in integration levels meant that a 32-bit processor would dissipate 5 to 6 W,
leading to severe reliability problems. The CMOS version of the 8086, the 80C86,
consumed only 250 mW.
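The logical behaviour and device counts discussed above can be illustrated with a short Python sketch. It is a purely behavioural model with no electrical detail; the function names are illustrative and are not taken from the original text.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """The output is pulled low only when both pull-down transistors conduct."""
    return 0 if (a and b) else 1


def cmos_transistor_count(n_inputs: int) -> int:
    # Static CMOS: one NMOS and one PMOS transistor per input.
    return 2 * n_inputs


def nmos_transistor_count(n_inputs: int) -> int:
    # NMOS-only: one pull-down transistor per input plus one
    # depletion-mode NMOS acting as the pull-up load.
    return n_inputs + 1


truth_table = {(a, b): nand(a, b) for a, b in product((0, 1), repeat=2)}

for inputs, output in truth_table.items():
    print(inputs, "->", output)
print("CMOS devices:", cmos_transistor_count(2))
print("NMOS devices:", nmos_transistor_count(2))
```

For a 2-input NAND the sketch gives four devices for CMOS and three for the NMOS-only gate (two pull-downs plus the load), which matches the comparison in the text.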
The ability of CMOS to reduce power dissipation with increasing integration
meant that it rapidly emerged as the technology that could best utilize fabrication
advances. It is an advantage that CMOS maintains to this day, with the overwhelming
majority of digital IC designs in the world being manufactured in CMOS, and the
increased convergence of systems onto chips leading CMOS to make strong inroads
into analog and radio frequency (RF) designs. As semiconductor manufacturing
progressed, the largest challenge to the nascent industry was the ability to design and


verify designs using the increasing number of transistors available. This need was
met by the development of a new field of software, often closely tied to dedicated
hardware in its early years, called electronic design automation (EDA). It may have
been assumed that the emergence of ASIC design methodologies would displace all
other techniques for implementing digital CMOS logic. This has not happened, as
many digital designs have specific needs that cannot be achieved by using standard
ASIC techniques. In recent years the capabilities of ASIC tools have increased
The two most common benefits of custom design are its ability to optimize across
the different levels of abstraction in the ASIC design framework and the
opportunity it provides for using logic families other than standard static logic.
(1) The first of these advantages relates to the sequential approach that an
ASIC design methodology uses, by which standard cell library development,
logic synthesis, and physical design are broadly separate processes.
(2) The second advantage of custom design is that it can utilize certain logic
families, specifically dynamic logic, that automated design frameworks have
not traditionally been able to support.
In ICs, dynamic logic (or clocked logic) is a design methodology for digital
logic that was popular in the 1970s. It is distinguished from static logic in
that it uses a clock signal in its implementation of combinational logic circuits.
In dynamic logic the clock signal is used to evaluate the combinational logic,
whereas in sequential circuits the clock is used to synchronize the state
transitions.
When CLK is low
(1) Evaluate Me is off and precharge Mp is on
(2) Output node is precharged to VDD, other nodes may precharge to
VDD-Vth,n depending on values of inputs
When CLK goes high
(1) Evaluate Me is on and precharge Mp is off


(2) Output node may be discharged if inputs have configured a conducting
path to GND, otherwise output node stays charged high.

Fig 3.3 basic dynamic circuit
(3) Inputs must be stable before CLK goes high, because once the output has been
discharged it will not go high again until the next cycle.
(4) For the same reason, noise or glitches on the inputs cannot exceed the Me
threshold, a much more stringent requirement than for static CMOS gates.
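The precharge and evaluate steps listed above can be sketched as a small behavioural model in Python. This is only a logical illustration under simplifying assumptions (no timing, leakage or charge sharing); the class name `DynamicGate` and its methods are hypothetical.

```python
class DynamicGate:
    """Behavioural model of a single dynamic gate (logic levels only)."""

    def __init__(self, pull_down):
        # pull_down: function returning True when the inputs form a
        # conducting path from the output node to GND.
        self.pull_down = pull_down
        self.out = 1

    def precharge(self):
        # CLK low: Mp is on, Me is off, the output node is charged to VDD.
        self.out = 1

    def evaluate(self, *inputs):
        # CLK high: Mp is off, Me is on. The node can only be discharged;
        # nothing in this phase can pull it back up.
        if self.pull_down(*inputs):
            self.out = 0
        return self.out


# Two series NMOS devices in the pull-down network (NAND-type function).
gate = DynamicGate(lambda a, b: bool(a and b))
gate.precharge()
first = gate.evaluate(1, 1)    # conducting path, node discharges
second = gate.evaluate(0, 0)   # inputs changed, node stays discharged
print(first, second)
```

The second call shows why the inputs must be stable before CLK goes high: once the node has been discharged, a later change of the inputs cannot restore it until the next precharge.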
Advantages of Dynamic Logic:
(1) No static power consumption. With the addition of a clock input, it uses a
sequence of (a) pre-charge and (b) evaluation phases.
(2) Increased speed and reduced implementation area.
(3) This logic is twice as fast as the normal static CMOS logic since it uses
only fast N-transistors in its evaluation phase.
(4) It is amenable to transistor sizing optimizations.
(5) Glitches (Dynamic Hazards) do not occur.
3.2.1 Glitches (Dynamic Hazards):
The finite propagation delay from one logic block to the next causes
spurious transitions, which are known as glitches. Gates have a non-zero
propagation delay.


Drawbacks:
(1) More power consumption, because this logic greatly increases the
number of transistors which are switching at any given time.
(2) Problems arise when cascading one gate to the next.

Signal Integrity Issues in Dynamic Design:
There are several important considerations that must be taken into account for
dynamic circuits to function properly: (1) charge leakage, (2) charge sharing,
(3) capacitive coupling and (4) clock feedthrough. Charge leakage and charge
sharing occur in the evaluation phase.
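As an illustration of charge sharing, the following Python sketch applies ideal charge conservation between the precharged output node and one internal node of the pull-down network. The capacitance values are hypothetical, and the model ignores the keeper, leakage and any threshold-voltage limit on the internal node.

```python
def charge_sharing_voltage(vdd, c_load, c_internal, v_internal=0.0):
    """Voltage on the output node after sharing charge with an internal node.

    Ideal charge conservation: the total charge before sharing is spread
    over the sum of both capacitances.
    """
    total_charge = c_load * vdd + c_internal * v_internal
    return total_charge / (c_load + c_internal)


# Hypothetical values: 1 V supply, 10 fF output node, 2 fF internal node
# that starts fully discharged.
v_after = charge_sharing_voltage(vdd=1.0, c_load=10e-15, c_internal=2e-15)
print(f"output node after sharing: {v_after:.3f} V")
```

The larger the internal capacitance relative to the output node capacitance, the further the output node droops, which is why charge sharing is treated as a signal integrity issue.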
Domino logic is a CMOS-based evolution of the dynamic logic techniques that were
based on either PMOS or NMOS transistors. To speed up the circuits this logic was

Fig 3.4 basic domino logic circuit


The AND gate shown in Fig. 3.4 can be used to illustrate the functionality,
the speed advantage, and also some of the challenges involved in using this logic
family. In Fig. 3.4 it can be seen that the two functional inputs, A and B, are also
accompanied by the clock signal, Clk. At first glance this may seem strange, since an
AND gate should be a purely combinational circuit, which unlike latches and flip-
flops does not require the presence of the clock signal. Domino logic is, however, a
clocked logic family, which means that every single logic gate has a clock signal
present. When the clock signal turns low, node N0 (which is called the evaluation or
internal node or dynamic node) goes high, causing the output of the gate to go low.
This represents the only mechanism for the gate output to go low once it has been
driven high. The operating period of the cell when its input clock and output are low
is called the precharge phase or cycle. The next phase, when the clock is high, is
called the evaluate phase or cycle. During the evaluate phase the output of the
domino AND cell can go high provided that both inputs A and B are high, which
causes the evaluation node, N0, to be driven to a low value. The evaluate phase is the
functional operating phase in domino cells, with the precharge phase enabling the
next evaluate phase to occur. The appropriate application of the clock signal ensures
that the critical path in domino cells only traverses through cells in the evaluate
phase. One of the advantages of domino logic over static logic can also be seen
in the schematic in Fig. 3.4. Since the domino cell only switches in the low-to-high
direction, there is no need for the inputs A and B to drive any pull-up PMOS

Fig 3.5 Domino Nand


The lack of a PMOS transistor means that the effective transistor width that loads
down a previous stage of logic, for a particular current drive, favours domino over
static logic. This is critical since the key to high speed is ensuring that a speed
advantage can be gained without loading down the cell greatly.
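The precharge and evaluate behaviour of the domino AND cell described above can be sketched in Python. This is a logic-level illustration only; the function name `domino_and` and the `n0_prev` argument (the value held on node N0 from the previous step) are assumptions made for the sketch.

```python
def domino_and(clk, a, b, n0_prev=1):
    """Return (n0, out) for one step of a domino AND cell.

    n0 is the dynamic (evaluation) node; out is the static inverter output.
    """
    if clk == 0:
        n0 = 1          # precharge phase: N0 is pulled high
    elif a and b:
        n0 = 0          # evaluate phase: pull-down network conducts
    else:
        n0 = n0_prev    # evaluate phase: N0 holds its previous value
    return n0, 1 - n0


n0, out_precharge = domino_and(clk=0, a=1, b=1)
n0, out_eval = domino_and(clk=1, a=1, b=1, n0_prev=n0)
# Inputs drop while the clock is still high: the output stays high.
n0, out_hold = domino_and(clk=1, a=0, b=1, n0_prev=n0)
print(out_precharge, out_eval, out_hold)
```

The last step illustrates the point made in the text: once the output has been driven high, only the next precharge phase can bring it low again.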

Advantages of Domino Logic:
(1) This logic allows rail-to-rail swing.
(2) Domino logic circuits have smaller areas than CMOS.
(3) Parasitic capacitances are smaller, so that higher operating speeds are
possible.
(4) Operation is free of glitches, as each gate can make only one transition.
Disadvantages of Domino Logic:
(1) Degradation of noise immunity due to inevitable leakage current and
charge sharing.
(2) Large power consumption, especially when compared to the static CMOS
logic family.
(3) Only non-inverting structures are possible because of the presence of
the inverting buffer.
(4) Charge distribution may also be a problem.
Several domino circuits have been proposed in the literature, such as
conventional high fan-in domino OR logic with and without a footer transistor,
high-speed domino, conditional keeper domino, and wide OR gate diode-footed domino.
The main goal of these circuit design techniques is to improve power consumption,
delay and area for high circuit performance, especially for wide fan-in circuits.
3.4.1 Standard Footless Domino Logic Circuit (SFLDL)
The footless scheme is characterized by the fact that the discharge of the dynamic
node is faster. This property is exploited by high-performance circuits. The
circuit of the SFLD logic is shown in Fig 3.6. Operation of footless domino is as
follows. Precharge phase: during the precharge phase, i.e. when the clock (CLK)
is LOW, the dynamic node is charged to VDD and the keeper transistor MP2 turns
ON to maintain the voltage of the dynamic node. Evaluation phase: during the
evaluation mode, i.e. when CLK goes HIGH, the dynamic node is either
discharged to ground or remains HIGH depending on the inputs. The keeper
transistor should be large enough to compensate for the charge sharing problem,
and at the same time small enough to limit the contention between the
keeper and the nMOS pull-down transistors when the pull-down network
evaluates the dynamic node to logic zero. Otherwise, the pull-down network
and the keeper transistor compete to drive the dynamic node in opposite directions.
This effect is called contention, and it results in a degradation of speed.

Fig 3.6 Standard Footless Domino Logic Circuit
3.4.2 Standard Footed Domino Logic Circuit (SFDL)
The footer nMOS transistor MN2 is connected to the source of the evaluation
nMOS transistor to obtain the FDL design, which basically reduces the leakage
current. The speed of the SFDL is lower than that of the footless one because of the
stacking effect, but the noise immunity is higher. Fig 3.7 shows the most conventional
footed domino logic circuit. When the clock is low, the dynamic node is precharged to VDD.
In this phase the footer transistor MN2 is turned off, which reduces the leakage
current. When the clock goes high, footer transistor MN2 is turned on, so the state of
the output node depends on the data arriving at the pull-down network.


Fig 3.7 Standard Footed Domino Logic Circuit (SFDL)
3.4.3 High Speed Domino Logic (HS)
The circuit of the HS domino logic is shown in Fig 3.8. In HS domino the
keeper transistor is driven by a combination of the output node and a delayed clock.
The circuit works as follows: at the start of the evaluation phase, when the clock is high,
MP3 turns on and then the keeper transistor MP2 turns OFF. In this way, the
contention between the evaluation network and the keeper transistor is reduced by
turning off the keeper transistor at the beginning of the evaluation mode. After a delay
equal to the delay of two inverters, transistor MP3 turns off. At this moment, if the dynamic
node has been discharged to ground, i.e. if any input has gone high, the nMOS transistor
MN1 remains OFF. Thus the voltage at the gate of the keeper goes to VDD-Vth and
not VDD, causing higher leakage current through the keeper transistor. On the other
hand, if the dynamic node remains high during the evaluation phase (all inputs at
0, standby mode), MN1 turns on and pulls the gate of the keeper transistor low. Thus
the keeper transistor turns on to keep the dynamic node high, fighting the effects of

Fig 3.8 High Speed Domino Logic (HS)
3.4.4 Conditional Keeper Domino Logic (CKD)
Conditional keeper domino employs two keepers, a small keeper and a large keeper. In
this technique, the keeper device (PK) of conventional domino is divided into two
smaller ones, PK1 and PK2. The keeper sizes are chosen such that PK=PK1+PK2.
Such sizing ensures the same level of leakage tolerance as the conventional gate
while improving the speed. The circuit works as follows: in the precharge phase, when
the clock is low, the pull-up transistor is on, so the dynamic node is charged up
to VDD. At the beginning of the evaluation phase, when the clock is high, the precharge
transistor and the large keeper PK1 are off.

Fig 3.9 Conditional Keeper Domino Logic (CKD)



3.4.5 Split Domino Logic (SDL)
As mentioned before, there are many parallel branches in a large fan-in
dynamic OR gate. When the dynamic node voltage remains at VDD, the nMOS
pull-down branches cause a large amount of leakage current. The propagation delay is
increased due to the large parasitic capacitance, which must
be discharged to zero during evaluation. Split domino improves the operation of the
gate by splitting the pull-down network into smaller groups, so that a small
keeper can be used in both situations. Therefore, in theory we need
two keeper transistors, each with a width almost half that of the conventional circuit.
Fig 3.10 shows the 16-bit domino OR gate split in two. The circuit overhead is not as
much as it might look, as there are two static inverters in the conventional domino
circuit in place of the two- and three-input NAND gates, and besides they can be
implemented using minimum-size transistors. The circuit overhead is almost the
same as that of the conditional keeper technique.
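The idea of splitting a wide OR gate can be checked at the logic level with a short Python sketch. It assumes that each half of the pull-down network has its own dynamic node and that a 2-input NAND gate merges the two (active-low) dynamic nodes; this merging arrangement and the function names are assumptions for illustration, not a description of the exact circuit in the figure.

```python
from itertools import product


def conventional_or(bits):
    # Single dynamic node: discharged (0) if any input is high,
    # followed by a static inverter.
    dynamic_node = 0 if any(bits) else 1
    return 1 - dynamic_node


def split_or(bits):
    # Two smaller pull-down networks, each with its own dynamic node.
    half = len(bits) // 2
    node_a = 0 if any(bits[:half]) else 1
    node_b = 0 if any(bits[half:]) else 1
    # A 2-input NAND (assumed) combines the two active-low dynamic nodes.
    return 0 if (node_a and node_b) else 1


def equivalent(n_bits):
    """Exhaustively compare both structures for every input pattern."""
    return all(
        split_or(list(bits)) == conventional_or(list(bits))
        for bits in product((0, 1), repeat=n_bits)
    )


print(equivalent(8))
```

Each half in this sketch sees only half of the parallel leakage branches, which is the property the text relies on when it argues for smaller keepers.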

Fig 3.10 Split Domino Logic (SDL)


3.4.6 Diode Footed Domino (DFD)
In diode footed domino the conventional domino circuit is modified by
adding an nMOS transistor M1 in series with the foot of the evaluation network. This
nMOS transistor is in diode configuration, i.e. its gate and drain terminals are connected
together. Fig 3.11 [10] shows the Diode Footed Domino configuration. The stacking effect
[11] occurs because transistor M1 is connected in series with the evaluation
network. Thus the subthreshold leakage current is reduced as a result of the stacking effect.
DFD circuit works as follow

Fig 3.11 Diode Footed Domino (DFD)



Dynamic logic is used in the implementation of logic circuits for high-speed
designs such as the data path in a microprocessor. However, it is not widely used because
of its disadvantages: it is less noise robust and consumes more power than the
static logic style. Domino logic is made by adding one inverter at the output of the
dynamic gate. A domino gate has an advantage over the dynamic gate because its fan-out
is driven by an inverter, which has a low output impedance and thus increases
the noise immunity of the gate while decreasing the output capacitance. Fig. 4.1
shows the standard domino logic style. The keeper transistor is used to maintain logic
one in the evaluation phase (CLK high) when there is charge leakage from the
dynamic node through the pull-down network (PDN). When the PDN is ON in the
evaluation phase, the dynamic node is discharged to zero through the PDN and the
evaluation transistor. The output inverter starts switching from zero to one and the keeper
transistor starts turning OFF. During this period there is static power
dissipation from Vdd to Gnd.

Fig.4.1. Standard Domino OR gate


During the evaluation phase a small noise signal at the input(s) of a dynamic gate
can change the desired output by discharging the dynamic node. In the worst case,
that of a high fan-in OR gate, the circuit has very low noise tolerance.
Noise robustness can be improved by upsizing the keeper transistor (making it
wider), which makes the keeper (PMOS) more conducting and thus maintains the charge
at the dynamic node. But this comes at the cost of static power dissipation, which
flows from Vdd to Gnd through the keeper transistor when a noise signal arrives at one of
the inputs. To make dynamic circuits more noise robust, different circuit styles have
been proposed.
Fig. 4.2 is an example of a footless domino gate. During the precharge phase,
when the clock is LOW, the precharging PMOS turns ON and the dynamic node is
connected to VDD and precharged to VDD. When the clock goes high, the
evaluation phase starts and the output is evaluated by the pull-down network and
conditionally discharged if any one of the inputs is at logic 1. During the evaluation
period, when all the inputs are at logic 0, the dynamic node should be at logic 1. But
the wide fan-in NMOS pull-down network leaks the charge stored in the capacitance at the
dynamic node due to subthreshold leakage. This is compensated by the
PMOS keeper, which aims to restore the voltage of the dynamic node. When a noise
voltage impulse occurs at any gate input, the keeper may not be able to restore the
voltage level of the dynamic node. The subthreshold leakage current is exponentially
dependent upon VGS. So in the presence of a noise impulse the gate voltage increases,
which leads to an increase in VGS, and the dynamic node is wrongly discharged.
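The exponential dependence of subthreshold leakage on VGS can be illustrated with the standard first-order model, in which the current scales as exp(VGS / (n·VT)), where VT = kT/q is the thermal voltage. The Python sketch below computes only the ratio of leakage currents for a given shift in VGS. The slope factor `n = 1.5` is a hypothetical value chosen for illustration, not a parameter from the simulated technology.

```python
import math

BOLTZMANN = 1.380649e-23         # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.15):
    """Thermal voltage kT/q; 300.15 K corresponds to 27 degrees Celsius."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(delta_vgs, n=1.5, temp_kelvin=300.15):
    """Factor by which subthreshold current changes when VGS shifts by delta_vgs.

    First-order model: I_sub is proportional to exp(VGS / (n * VT)).
    n is the subthreshold slope factor (hypothetical default).
    """
    return math.exp(delta_vgs / (n * thermal_voltage(temp_kelvin)))


# A 100 mV noise bump on the gate versus a 100 mV negative shift of VGS.
print(f"+100 mV: leakage x {leakage_ratio(0.1):.1f}")
print(f"-100 mV: leakage x {leakage_ratio(-0.1):.3f}")
```

Under these assumptions a noise impulse of about 100 mV raises the leakage by roughly an order of magnitude, while an equal negative shift of VGS reduces it by the same factor.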
As the noise behaviour of domino gates is now more important than area, energy
dissipation and delay, several techniques have recently been proposed to
reduce the noise sensitivity of dynamic circuits. All of these techniques reduce the noise
sensitivity, but they carry drawbacks in area, power dissipation and delay.


Fig.4. 2. A typical footless Domino OR gate
To compensate for the leakage current at the dynamic node, a weak transistor
called the keeper transistor is used. The keeper transistor prevents charge loss and keeps
the dynamic node at a strong high when the PDN is OFF. In the first domino proposal the
gate of the keeper transistor is tied to ground, so the keeper is always on. If at
the beginning of evaluation the pull-down network (PDN) turns on, the dynamic
node tends to discharge through the PDN. However, the keeper is injecting charge into
the dynamic node, as it is always on. This is called contention. Furthermore, a
potential DC power consumption problem is generated. To alleviate the potential DC
power consumption problem a feedback keeper was proposed.

Fig 4.3 domino style


A domino circuit that improves noise tolerance is shown in Fig 4.3. At the
beginning of the evaluation phase node C is at 0 V. Noise glitches at the input
temporarily increase the gate-to-source voltage of the corresponding NMOS in the
PDN. The resulting increase in subthreshold current charges node C. During
this process, the gate-to-source voltage of the active NMOS decreases and the
subthreshold leakage current is exponentially reduced.



The proposed novel domino circuit scheme is shown in Fig. 5.1. Transistor
M1 is used as a diode. Due to the voltage drop across M1, the gate-to-source voltage of the
NMOS transistors in the PDN decreases (stacking effect). The proposed circuit
differs from the previous scheme in that it has an additional evaluation transistor M5
with its gate connected to CLK. In the previous scheme, when M1 has a voltage drop due to
the presence of noise signals, M2 starts leaking, which causes the circuit to dissipate
power and also makes it less noise robust. In the proposed scheme M5 causes a stacking
effect and makes the gate-to-source voltage of M2 smaller (M2 less conducting).

Fig. 5.1. Proposed domino circuit scheme


Fig. 5.2. Simulated waveform of proposed scheme

Fig. 5.3. Waveform simulated for the OR gate 1. Clock Input 2. Input A 3. Input B 4. Output for
basic circuit 5. Output for [6] 6. Output for [7] 7. Output for Proposed circuit


Hence the circuit becomes more noise robust and consumes less leakage power.
However, performance degrades because of the stacking effect in the mirror current path.
Performance can be improved by widening M2 (high W/L) to make it more conducting. Due to
the stacking effect of the diode footer the subthreshold leakage also decreases. Due
to the presence of the diode footer there is a voltage drop across it in the
evaluation phase. This voltage drop makes VGS negative, which causes an
exponential reduction in the subthreshold leakage.
Fig. 5.2 shows the simulated waveform of the proposed circuit. This
waveform shows the behaviour of the output node, the dynamic node and
N_Foot together with the input waveform. The waveform was taken in the evaluation
phase of the clock, when the dynamic node evaluates the input.
Fig. 5.3 compares the simulated output of the proposed circuit with that of the
other reference circuits. The waveform shows that the output of the proposed circuit
contains much less noise than the other domino circuits. The
proposed circuit output also shows fewer charging and
discharging events, which reduces the power dissipation of the circuit and also makes the
circuit faster than expected. This reduction in power and delay reduces the
power-delay product (PDP) of the circuit.
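The power-delay product used as the figure of merit here is simply the product of average power and propagation delay, which has units of energy. A minimal Python sketch follows; the numerical values are hypothetical placeholders and are not taken from the simulation results.

```python
def power_delay_product(power_watts, delay_seconds):
    """PDP in joules: average power multiplied by propagation delay."""
    return power_watts * delay_seconds


def percent_reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Hypothetical example: 10 uW at 100 ps versus 8 uW at 90 ps.
pdp_reference = power_delay_product(10e-6, 100e-12)
pdp_proposed = power_delay_product(8e-6, 90e-12)
print(f"reference PDP: {pdp_reference:.3e} J")
print(f"proposed PDP:  {pdp_proposed:.3e} J")
print(f"reduction:     {percent_reduction(pdp_reference, pdp_proposed):.1f} %")
```

Because the PDP multiplies the two quantities, a circuit that lowers both power and delay improves the figure of merit more than either change alone.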
The power is also reduced by the voltage drop across the diode
footer, which makes the VGS of the OFF evaluation network negative, causing an
exponential reduction in subthreshold leakage. This phenomenon reduces the power
consumption of the circuit.



The circuits are simulated using the Cadence Spectre simulator at a temperature of 27
degrees Celsius in a 90 nm bulk CMOS technology. The simulation
was carried out at different values of VGS, and the power dissipation and delay
were measured. The delay and power of the proposed circuit were compared with the
basic CMOS footed and footless domino circuits, the keepered and keeperless
circuits and previously proposed circuits. The tables show that the
power-delay product can be reduced by up to 100% relative to the most recently proposed
schemes. The proposed circuit is therefore a good candidate for high-speed embedded
circuits.
Table 6.1. Power-delay comparison of the proposed circuit with previously reported articles,
simulated with a 2-input OR gate



Table 6.2. Comparison of the OR gate in the proposed domino logic with OR gates designed with the
basic circuit and other reference circuits

Fig. 6.1 compares the power-delay product of the proposed circuit with the basic
keeper, keeperless, footed and footless schemes. It also compares the PDP with
recently proposed articles. The lowest line shows the PDP of the proposed circuit at
different VDC. It can be concluded that the proposed circuit shows a lower PDP than

Fig. 6.1. Comparison of the PDP of the proposed circuit with other reference circuits and the basic
circuits for different supply voltages


The circuit was simulated as an OR gate with a fan-in of 1 to 32 bits and compared
with the conventional basic CMOS domino logics, i.e. footed and footless logics and
also the keepered and keeperless logics. The OR gate of the proposed circuit was also
compared, for different fan-in, with previously reported domino logics and found to
have the lowest PDP. Table 6.2 shows the delay and power dissipation
comparison of the proposed logic with the others. Fig. 6.3 shows the PDP comparison of
all the logics, and it can be seen that the proposed logic has the lowest PDP.

Fig. 6.3. Comparison of the PDP of the proposed circuit with other reference circuits and the basic
circuits for different numbers of fan-in

Table 6.3. UNG and performance measurements with different widths of M2 for the proposed circuit at
90 nm technology for 2 inputs



Transistor M2 plays a crucial role in the leakage and performance of the
gate in the proposed scheme. A larger width improves performance by increasing
speed, but the penalty paid is lower noise robustness and slightly higher power
consumption. Table 6.3 shows the UNG, power and delay measurements for various
widths of M2, simulated at 1 V in 90 nm technology.



The domino CMOS logic circuit family finds a wide variety of applications in
microprocessors, digital signal processors, and dynamic memory due to its high
speed and low device count. Domino logic is a CMOS logic style obtained by adding
a static inverter to the output of the basic dynamic gate circuit. High-performance
noise-tolerant circuit techniques for CMOS dynamic logic and other domino logic
techniques were studied, and the corresponding domino logic circuits were
designed and simulated. The results were studied, and the advantages and
disadvantages were observed.

Advantages of Domino CMOS logic:
(1) High speed
(2) Low device count.
Disadvantages of Domino CMOS logic:
(1) Degradation of noise immunity.
(2) Inevitable leakage currents.
(3) Charge sharing.
(4) Large power consumption.

In all of these techniques, important effects such as subthreshold leakage
currents, threshold voltages, supply voltages, sources of noise, power consumption,
delay and area are considered. A few modifications have also been made to
existing domino techniques to get the desired results. The improved techniques, though
they suffer from a few drawbacks, give better results compared with previous
A new circuit scheme for domino logic is proposed here. The proposed
circuit style is simulated in 90 nm CMOS technology for a bulk CMOS model. When
compared with recent proposals, the proposed scheme shows high power savings as well
as a lower power-delay product with almost the same noise immunity. With this circuit
the PDP can be reduced by up to 100% relative to the recently proposed schemes. The
proposed circuit can be used in the design of high-speed embedded processors where low
power consumption is an essential requirement.