Beruflich Dokumente
Kultur Dokumente
over the former approach, and it is the focus of our discussion design modifications that can help in decreasing the probability
in this paper. In addition, the pulser usage can eliminate the of circuit failure (i.e. enhancing pulsed latch reliability) at
need for some of the clock buffers used in the clock tree, thus different supply voltage values. With the proposed approaches,
providing an additional amount of power and area savings. pulsed latches present a formidable alternative to MSFFs,
Having only one latch between the input and the output, PLs providing higher performance, lower area and power consump-
have lower timing overhead than MSFFs. At the same time, tion, and higher reliability and robustness to different kinds of
since the driving pulse is very short, the transparency period variations.
for the latch becomes very narrow, allowing the PLs to have a The main contribution areas of this paper are:
timing behavior close to that of MSFFs, to the extent that they • A study of the effect of PVT variations on the operation of
are sometimes classified among flip-flop families [9], [10]. pulsed latches in advanced technology nodes, considering
Also, due to the presence of the narrow transparent window the effects on both the pulser and the latches.
of the latch, pulsed latches have an inherent tolerance to clock • Two novel pulse generator designs for the pulsed latch
skew and jitter [2]. Since they have fewer transistors that that can be utilized to increase the reliability of pulsed
are triggered by the clock signal, they have the advantage of latch circuits, while keeping its main advantages of high
reducing a significant amount of clocking power [8], and they performance and low power consumption.
consume much less leakage power compared to MSFFs due • Comprehensive comparisons of reliability, power, and
to the smaller area and fewer transistors. area between different data registers implemented using
One complication in PL design is the choice of the pulser the traditional transmission gate pulsed latches and the
output pulse width. Too short of a pulse width may not proposed reconfigurable pulsed latches.
be enough for the latches to store the input data correctly, The remainder of the paper is organized as follows.
while too long of a pulse width will result in a longer latch Section II discusses some previous work in this area.
transparency window; which, in turn, increase the timing Section III discusses PVT variations and their effects on
overhead or can violate hold time requirements [11]. This pulsed latches behavior. Section IV discusses the two proposed
issue becomes more complicated when considering different design approaches to improve the reliability of pulsed latch
sources of variations. PVT variations have significant impacts under supply voltage scaling. Section V discusses the results
on different circuit components. Since sequential elements, obtained for the enhancement of circuit reliability, power,
in general, are by nature time sensitive elements, they are and area on a case study of a typical register. Finally, some
among the circuit categories that are highly affected by any conclusions are drawn in section VI.
PVT variations [12]. Since pulsed latches, in particular, are
very time sensitive, good study of the effect of different II. P REVIOUS W ORK
sources of variations has to be considered. Since some of Pulsed latches have been always proposed to decrease
these variations, such as voltage and temperature, are temporal power consumption and increase performance. In [13], PLs
variations that change over the operating period of chips, with relatively wide pulse widths were used to allow cycle
careful analysis and design has to be performed to ensure that borrowing and tolerate any clock skew. In order to compensate
reliable circuit operation can always be achieved without any for any hold time issues, deracer circuits were used to block
significant loss in performance, power and area. any incoming fast data before the end of the pulse. In [8],
The study presented in this paper shows that some variation PLs were used as the main sequential elements to increase the
effects on PLs can be compensated with careful analysis performance of the Intel XScale microprocessor without con-
during the design process, while some others can not be com- suming high clock power. Although the minimum pulse widths
pensated without significant degradation in reliability or per- to ensure reliable operation across different PVT corners were
formance. As an example, pulsed latches designed to work at used, delay buffers were still be needed to decrease the risk
certain voltage corner may not work at another voltage corner of hold time violations. Baumann et al. [7] proposed three
without degradation in either performance or reliability. Since options for selectively replacing some of the MSFFs by PLs
sequential elements such as pulsed latch are prolifically used to improve the performance of the ARM926 microprocessor.
within the die, any degradation in their performance or reli- However, some area and power overhead were presented due
ability can significantly affect the performance of the entire to buffer insertion. Baumann et al. [14] proposed replacing
chip or can even cause a large yield loss. In addition, since MSFFs by PLs in an ARM microprocessor. This was used
designing a chip that can perfectly operate at just one voltage to gain some performance improvement that was utilized as
corner is not an acceptable solution nowadays, adding a recon- timing margins to compensate for the within-die variations.
figuration ability to PL can help to reach the required target In [6], a pulse generator with a pulse width control option
design goals. With this added feature, PLs can be configured was presented to enhance the reliability against Bias Tem-
to run with the minimum timing overhead to guarantee correct perature Stability (BTI). In [15], a traditional pulser and
operation at different voltage levels in the presence of different a latch formed by a tri-state inverter and a static keeper
sources of variation. were proposed. The pulse width was chosen large enough
In this paper, we are presenting variability analysis of one to ensure correct operation under different PVT corners,
of the popular topologies of pulsed latches, Transmission assuming that delay cells can be used to fix any hold time
Gate Pulsed Latch (TGPL), studying the effects of process, issues. In [16], the impact of process variations on several
voltage, and temperature variations, as well as proposing pulsed flip-flops were presented, in addition to two techniques
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
to capture the input data and pass it through the internal nodes Fig. 4. The probability of failure of a traditional pulsed latch designed at
nominal supply voltage at different supply voltages and temperatures.
to the storing cross coupled inverters). The area under the
intersection between the two PDFs represents the failure of
write operation, since this is the region where there is a high
PL circuit is designed to operate reliably at an intermediate
probability that the pulse width will be smaller than the time
supply voltage (0.9V as an example), the reliability will
needed by the latch. Alternatively, knowing the information
still significantly degrade at lower voltages, especially at low
about the distribution of the latch write time and for easiness
operating temperatures.
of timing analysis, a maximum value for latch write time can
One possible solution is to design the PL circuit to operate
be calculated for certain sigma value of the designer choice.
with the needed level of reliability at the lowest possible
In this case, the probability of write failure can be calculated as
operating voltage. Since chips usually operate at different
the probability of having the width of the pulser output smaller
supply voltages with different operating modes, when pulsed
than this desired maximum value. In both cases, depending
latches are operating at a voltage higher than that mini-
on the target yield, the designer can determine the minimum
mum value, they will be operating with extra timing margin
acceptable value for circuit failure, and hence, the transistors’
(the pulse width will be larger than the needed width to achieve
dimension can be adjusted to reach the target yield.
the required level of reliability). Hence, this will negatively
affect one of the main advantage of PLs which is their low
B. Voltage Scaling timing overhead, in addition to increasing the risk of hold time
Voltage scaling is a popular run-time technique used for violations. The proposed circuit approaches in this paper will
reducing the power consumption of circuits. It significantly help in forcing PLs to reliably operate with just the needed
decreases both dynamic power (with its two components timing margins at different supply voltages. Hence, this will
of switching power and internal power) and leakage power. assure gaining the maximum benefits of pulsed latches at
On the other hand, the ability to reduce the operating supply different operating modes without any unnecessary waste in
voltage is limited by a minimum value determined usu- the design performance.
ally by some timing constrains (critical path delay as an
example), in addition to some margins for the PVT variations,
and usually adding a margin for aging effects [22]. As the C. Temperature Effect
supply voltage is scaled down, the available voltage headroom Studying the effect of temperature variation on the design
decreases further and the transistors become more sensitive to is very important. Not only does the variation in temperature
any variations. affect leakage power and performance, but it also affects the
The effect of voltage scaling is naturally associated with probability of having an error during circuit operation, as well
the increase of timing delays for different circuit components. as impacting the life span of different chip parts [23]. Factors
While this can be handled at design time for several circuit such as the increase of leakage power with technology process
components, the case may not be as easy for PLs. Since PL scaling, the nonequivalent down scaling of the supply voltage
operation depends on two different components (pulser and when compared to geometry scaling, and the increase in the
latches) of different micro-architectures, the timing of each of dynamic power associated with the increase in performance
them is affected differently. As shown in Fig. 3, voltage scaling required in current designs, all lead to the increase of the oper-
affects the probability distribution of the pulser and the latch ating temperature. Careful study of the effect of temperature
differently. Hence, failure probability calculated at one voltage is required especially for time-sensitive sequential elements.
will not be the same when applying voltage scaling. Indeed, PL The study of temperature effects on pulsed latches is much
reliability degrades significantly when scaling down the supply more critical, since each of the pulser and the latches can
voltage. As shown in Fig. 4, the probability of write failure have a different response to temperature variation. The study
for a PL can increase by up to two order of magnitude when done in this paper shows that both circuits become more
the supply voltage is scaled down by around 30%. Even if the sensitive to process variation with the decrease in temperature.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 5. The effect of temperature variation on both the latch write time and
the pulser pulse width running at nominal supply of 1.05V.
Fig. 6. The effect of temperature variation on both the latch write time and
the pulser pulse width running at scaled supply voltage of 0.7V. Fig. 8. A diagram showing arbitrary PDFs for the pulser and the latch
when (a) operating at nominal supply voltage, (b) scaling down the sup-
ply voltage without configuring the pulser (or having a fixed pulser), and
(c) scaling down the supply voltage and configuring the pulser circuit to
However, the pulser is more significantly affected by temper- generate a wider output pulse .
ature variations. In addition, the entire PL would have high
failure rates when operating at a lower temperature. IV. P ROPOSED D ESIGN A PPROACHES
When running at nominal supply voltage, the transistors
become faster as the temperature decreases. Since the pulser The basic structure of a conventional pulser of the TGPL
is more timing sensitive than the latch, the timing margin is shown in Fig. 7, where the delay unit, usually consisting
between the latch write time and the pulser output pulse of an even number of inverters, is responsible for determining
width will decrease with the decrease in temperature as shown the width of the needed output pulse. To guarantee correct
in Fig. 5. Hence, the probability of write failure is expected operation, the pulser is designed to generate a pulse width
to increase with the decrease of temperature. that is larger than the latch write time.
When scaling down the supply voltage beyond certain As described in the previous section, it is not easy to
limit, the relation between the circuit delays and temperature design a non-configurable pulsed latch circuit that can operate
is revered due to temperature inversion effect [24]. Again, with just the needed timing margins at different supply volt-
due to the higher sensitivity of the pulser over the latch, ages in the presence of process and temperature variations,
the time margin increases as temperature decreases. However, while keeping the needed level of reliability. To be able to
the process variation effect increases significantly with the reach the needed reliability level, the pulser circuit should be
decrease in temperature. As shown in Fig. 6, the standard reconfigured at run time to generate an output pulse whose
deviation for the latch distribution is doubled with the tem- width can be controlled based on the operating condition.
perature decrease from 125◦C to −40◦C, while the standard Shown in Fig. 8(a) is a pulsed latch circuit designed to
deviation for the pulser distribution increases by more than operate correctly at nominal supply voltage with high level
60%. This significant increase in the variation of the latch and of reliability. However, when scaling down the supply voltage
pulser timing with the decrease in temperature leads to the as shown in Fig. 8(b), the circuit become less reliable with
increase of the probability of write failure. higher probability of failure. The required level of reliability
Therefore, for different supply voltages, even when taking can be achieved at the lower supply voltage by increasing the
temperature inversion into account, pulsed latches become width of the generated pulse. As shown in Fig. 8(c), this is
less reliable at lower temperature. Hence, to ensure correct equivalent to shifting the pulser probability distribution to the
operation at different temperatures, pulsed latches should be right, compensating for the increased variation effects at lower
designed at the lowest operating temperature expected for voltages and therefore, decreasing the probability of circuit
the system. This can be a little different from one process failure.
technology to another, but similar results can be expected, In this section, two design approaches are proposed. Both
at least for planar CMOS devices at closer process nodes. approaches depend on controlling the delay path (the delay
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 13. The structure of the circuit used to generate the CTRL signal for
the PL-SW and PL-MUX registers.
Fig. 11. The structure of the pulser used for the PL-SW based register. The control signal (CTRL) used to select the required pulse
width can be generated by one of two methods. In a system
with two supply rails, the CTRL signal can be driven by the
same control signal used to switch the power gates used to
choose the operating supply rail. For a system with a single
supply rail, a circuit can be designed to generate an output
voltage that is dependent on the value of the VDD. For the
proposed register circuits, the circuit in Fig. 13 was designed.
Fig. 12. The structure of the pulser used for the PL-MUX based register. The first stage of this circuit is a voltage divider of the supply
voltage that generates an output a little less than half the VDD.
The output of this voltage divider is used to drive a pseudo-
The traditional PL is designed to have three inverters in its NMOS inverter circuit. The pull down network (PDN) of this
delay path to generate the needed pulse width. The transistor inverter is a strong NMOS, while the pull up network (PUN)
sizing for this pulser (which is also a common part in the is built using a weak PMOS in parallel with a weak NMOS.
two proposed designs) follows rules close to that described The presence of both NMOS and PMOS in the PUN is to
in [26]. All the transistors have the minimum length, and ensure reliable circuit operation at different process corners,
only the transistors widths are varied. The first inverter is especially the fast-NMOS slow-PMOS (FS) corner. The output
chosen to be of minimum size to reduce the load on the clock of this pseudo-NMOS inverter passes through few regular
network. Hence, the sizes of the second and the third inverters CMOS inverters to generate the final CTRL signal. At nominal
are adjusted to determine the needed pulse width. With the supply voltage, when the value of VDD is high, the output of
technology used in this paper, stacked transistors were used the voltage divider will be high enough to strongly turn on
for the second and the third inverters. The NAND gate and its the PDN NMOS of the pseudo-NMOS inverter, generating
following inverter are sized depending on the load they drive in a lower voltage value at its output. Since the PUN of the
order to generate reasonable sharp edges for the output pulse. circuit has weak NMOS and PMOS devices, the generated
For the first proposed pulsed latch approach, which is output of this pseudo-NMOS inverter will be low enough to
the header switches-based design (PL-SW), the pulser header be interpreted by the following CMOS inverter as a logic ’0’.
switches are divided into two switches as shown in Fig. 11. At scaled-down voltage, when the value of VDD is low, the
One switch is always turned ON by tying its gate to ground, output of the voltage divider will be low enough to make
while the other one is turned ON or OFF by the input control the NMOS of the pseudo-NMOS PDN hardly on. Hence,
signal (CTRL). This will add one additional level of voltage the always-on PMOS and NMOS of the PUN will generate
scaling to the delay path’s virtual supply voltage (VDI). a high enough voltage to be interpreted as a logic ’1’ by the
The size of the always-on header switch is chosen to ensure the following CMOS inverter. The regular CMOS inverters are
correct operation at the down-scaled supply voltage with the used to adjust the voltage levels of the CTRL signal and to
required reliability level. The size of the other controllable generate the needed voltage polarity. Since this circuit can
header switch is chosen to reduce the voltage drop across the be shared between different resisters in a design block, any
switches when this switch is turned on, and hence, driving area or power overhead associated with it can be negligible.
VDI to be very close to the main supply voltage VDD when In addition, the same circuit can be repeated to generate
running at nominal VDD. additional control signals (for multiple supply voltage scaling
For the second proposed approach, which is the multi- levels), where the switching threshold for each circuit can be
plexed delay units-based design (PL-MUX), a pulser with two controlled by adjusting the sizes and the threshold voltages of
delay paths was designed as shown in Fig. 12, having either each transistor.
three or five inverters in the delay path. The sizes of the three The experiments were carried out using the Synopsys 28nm
main inverters after the output of the multiplexer were chosen PDK [27], [28]. All of the implementations were examined
to ensure correct operation at nominal VDD with the required at the same frequency of 1 GHz and the rise time of the
reliability level. The sizes of the two inverters at the input input clock signal was chosen to be 50ps. The nominal supply
of the multiplexer were chosen to ensure the same reliable voltage used was 1.05V and the same level of supply voltage
operation when scaling down the VDD. scaling to 0.7V was applied. The analysis for the effect of
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE I
T HE P ROBABILITY OF W RITE FAILURE OF D IFFERENT R EGISTER I MPLEMENTATIONS
AT D IFFERENT S UPPLY V OLTAGES AND D IFFERENT T EMPERATURES
A. Reliability Analysis
Since the traditional PL circuits do not have any configura-
tion ability, they were designed to ensure correct functionality
at nominal supply condition. Correct functionality means
that the pulsed latch circuit can achieve the target level of
reliability (i.e. target probability of failure) when running at
nominal supply at the entire range of temperature values in
Fig. 14. The layouts of the proposed pulser circuits: (a) PL-SW, the presence of process variation.
(b) PL-MUX. When the supply voltage is scaled down, the traditional PL
register becomes more susceptible to write failure. As shown
temperature variation was conducted to cover a typical indus- in Table I, the PW F for the traditional PL register is within
trial range of temperature variation from -40◦C up to 125◦C. the required value at nominal supply voltage. However, PW F
The simulations were done using Hspice and the variability increases when the supply voltage is scaled down to 0.7V.
analysis were carried out using Solido Variation Designer [12] Hence, the circuit becomes much less reliable at this lower
and Matlab. All the simulations and analysis were carried out voltage. The latch Twr−max and the pulser PWs at different
on the post-layout extracted circuits. The power numbers were supply voltages are shown in Fig. 15 for the PL-SW design.
calculated over a few hundred cycles of common activity levels The typical values for the PW are chosen to be the mean
for the different register implementations. The area calcula- of their distributions, while the minimum values for the PW
tions were carried out through layouts drawn with Synopsys are arbitrary chosen to be 3σ lower than the mean. The short
Custom Designer and verified with Synopsys IC Vaildator configuration of the PL-SW is nearly the same architecture
using the same 28nm technology. The layouts of proposed as the traditional PL. When the supply voltage decreases,
PL-SW pulser and PL-MUX pulser are shown in Fig. 14. For the latch Twr−max starts to move away from the typical short
simplicity, the shown layouts are for the pulser circuits only. PW and closer to the minimum short PW, increasing the
However, the reported areas were calculated on the complete probability of write failure as shown in Table I. If the pulser
16-bit register layout, which include the 16 latches in addition can be configured to generate longer PW at lower supply
to some buffers. voltages, the reliable timing relation between the Twr−max and
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 17. Probability distribution function for PL-MUX before and after
Fig. 15. The latch worst WR time and the typical and worst pulser PWs for configuration for VDD = 0.7V at 125◦ C.
the two configurations of PL-SW at different supply voltages at 25◦ C.
Fig. 18. The average energy per cycle normalized to the energy per cycle
Fig. 16. Probability distribution function for PL-SW before and after of the traditional PL register at 1.05V and 125◦ C.
configuration for VDD = 0.7V at 125◦ C.
TABLE II
N ORMALIZED Y IELD P ER U NIT P OWER FOR THE T WO R EGISTER I MPLEMENTATIONS U SING THE T WO P ROPOSED PL D ESIGNS
C. Discussion
This paper presents an analysis of the effect of process
and temperature variations at different supply voltages on
the reliability of the traditional TGPL. This analysis includes
the effect on both the pulser and the latch, in addition to the
relation between them. TGPL was chosen as it is one of the
most efficient PL topologies. Without any configuration ability, Fig. 19. Box plot of pulse width of PL-SW at 0.7V and 25C over 1000-
the TGPL should be designed to operate with high reliability samples Monte Carlo simulation.
at the lowest supply voltage (the pulse width is increased to
ensure correct operation with low failure probability). This will
result in much wider pulser PW when the circuit is operating In addition, the two proposed designs are not significantly
at higher voltages. Hence, this adds unnecessary extra timing affected by the clock rise time. This can be shown in Fig. 19
overhead, in addition to increasing the chances of hold time for PL-SW when running 1000-sample Monte Carlo simu-
violations. These hold time violations can be solved by adding lations for different clock rise time. Also, an output pulse
some delay buffers. However, this will degrade some of the was generated successfully with no failure at both 1.05V and
advantages of PLs which are their lower timing overhead and 0.7V when running 1000-samples of Monte Carlo simulations
their lower sequential and clock network power. Therefore, while varying the clock rise time up to 150ps. Similar results
reconfigurable PLs are needed. In addition, any overhead were also obtained for the PL-MUX design. This makes the
associated with adding the configuration ability should be two proposed designs very comparable to the design proposed
minimized. in [17] for the range of supply voltage studied in this paper.
The two proposed approaches have an advantage of keeping When compared to similar studies presented in the literature,
the required level of reliability at different supply conditions the approaches proposed in this paper have some advantages.
with the minimum power and area overhead when compared In [16], only the effect of process variations on PL was studied
to the static traditional PLs. At the same time, each of the two and the two proposed techniques were not quantified under
approaches is running with just the needed margins at different the effect of temperature and voltage variations. The study
supply voltage values, hence, minimizing the timing overhead. in [18] and [19] considered the effect of PVT variations, but
To evaluate the gained benefits from the proposed designs it didn’t quantify the effect of these variations on the design
when compared to the added power overhead, the design yield, yield. In addition, the covered temperature variation range was
calculated using equation 3, per unit power is calculated for very limited and didn’t study the entire range covered in this
the three different register implementations. The results of the paper. In [15], the pulser used for the proposed PL is similar to
design yield per unit power for the two proposed implementa- the traditional pulser studied in this paper, but it was designed
tions when normalized to that of the traditional PL register at to generate wider pulse for correct operation at different
the same voltage and temperature are shown in Table II. When voltages. The paper reported the usage of delay cells to fix
running at nominal supply voltage, the proposed designs are hold time issues. Therefore, with the very small area overhead
running as reliable as the traditional PL register with negligible of PL-SW and PL-MUX and the elimination of the need to
power overhead at different temperatures. When the supply delay buffers, our two proposed approaches are expected to
voltage is scaled down, only the two proposed designs can save area when compared to the one proposed in [15]. Since
keep the same level of reliability with a very small power PL-SW doesn’t add any significant power overhead, power
overhead at different temperatures. Hence, the two proposed saving are also expected (PL-MUX can also save power if
designs show great advantages over the traditional PL register large number of delay cells must be used). Similarly, [7], [8]
at the entire range of voltages and temperatures. Although reported the need of delay buffers to fix hold time violations,
the two proposed approaches was used for the TGPL, both which could be eliminated to save power and area by using
approaches can be easily used with any other PL topology the proposed reconfigurable approaches.
whose pulser uses a delay chain to control the generated pulse Comparing the results of the two proposed approaches, each
width. has advantages and drawbacks. While the PL-SW design is
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
smaller in area and requires less power, the design of the [2] S. Paik, G.-J. Nam, and Y. Shin, “Implementation of pulsed-latch
power switches is more complicated specially if several control and pulsed-register circuits to minimize clocking power,” in Proc.
IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2011,
level is needed. In addition, when all the switches are ON pp. 640–646.
(i.e. running at nominal supply voltage), there will still be a [3] Y. Shin and S. Paik, “Pulsed-latch circuits: A new dimension in
small voltage drop on the power switches. Hence, the delay ASIC design,” IEEE Des. Test Comput., vol. 28, no. 6, pp. 50–57,
Nov./Dec. 2011.
unit and its consecutive inverter will still run at a slightly lower [4] M. A. Alam, K. Roy, and C. Augustine, “Reliability- and process-
voltage than the rest of the pulser circuit, generating a pulse variation aware design of integrated circuits—A broader perspec-
width slightly larger than expected. One possible solution is to tive,” in Proc. IEEE Int. Rel. Phys. Symp. (IRPS), Apr. 2011,
pp. 4A.1.1–4A.1.11.
increase the sizes of the switches, however, this will increase
[5] E. Consoli, G. Palumbo, J. M. Rabaey, and M. Alioto, “Novel class of
the pulser area. energy-efficient very high-speed conditional push–pull pulsed latches,”
On the other side, the PL-MUX is simpler and easier in IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 7,
design. However, its power overhead is much larger due to the pp. 1593–1605, Jul. 2014.
[6] J. Warnock et al., “Circuit and physical design of the zEnterprise
increase of dynamic power with the extra inverters. In addition, EC12 microprocessor chips and multi-chip module,” IEEE J. Solid-State
the area and power will exponentially increase with each Circuits, vol. 49, no. 1, pp. 9–18, Jan. 2014.
additional voltage scaling level, due to the additional delay [7] T. Baumann et al., “Performance improvement of embedded low-power
microprocessor cores by selective flip flop replacement,” in Proc. 33rd
units and the larger multiplexer. Eur. Solid State Circuits Conf. (ESSCIRC), Sep. 2007, pp. 308–311.
Hence, for designs with few voltage scaling levels, the [8] L. T. Clark et al., “An embedded 32-b microprocessor core for low-
PL-MUX design can be preferable over the PL-SW one, as it power and high-performance applications,” IEEE J. Solid-State Circuits,
is easier in design, generates more precise pulse widths, and its vol. 36, no. 11, pp. 1599–1608, Nov. 2001.
[9] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in
overheads (power and area) are reasonable. On the other hand, the energy-delay-area domain of nanometer CMOS flip-flops: Part I—
for designs with large number of voltage scaling levels, the PL- Methodology and design strategies,” IEEE Trans. Very Large Scale
SW design is preferred, as the area and power overheads of the Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[10] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and compari-
PL-MUX design will be significant. son in the energy-delay-area domain of nanometer CMOS flip-flops:
Part II—Results and figures of merit,” IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 19, no. 5, pp. 737–750, May 2011.
VI. C ONCLUSION [11] S. Paik and Y. Shin, “Pulsed-latch circuits to push the envelope of
ASIC design,” in Proc. Int. SoC Design Conf. (ISOCC), Nov. 2010,
In this paper, an analysis of the effect of PVT variations pp. 150–153.
on the pulsed latch performance was presented. The analysis [12] T. McConaghy, K. Breen, J. Dyck, and A. Gupta, Variation-Aware
considered both the pulser and the latch to evaluate the relia- Design of Custom Integrated Circuits: A Hands-On Field Guide. New
York, NY, USA: Springer-Verlag, 2012.
bility of the entire pulsed latch circuit. In addition, the benefits
[13] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger,
of having a reconfigurable pulsed latch circuit was discussed. T. J. Sullivan, and T. Grutkowski, “The implementation of the
Two novel modifications to add the reconfiguration ability to Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11,
TGPL circuits were proposed. pp. 1448–1460, Nov. 2002.
[14] T. Baumann, D. Schmitt-Landsiedel, and C. Pacha, “Architectural assess-
The benefits of using the proposed design approaches in ment of design techniques to improve speed and robustness in embedded
enhancing the robustness of pulsed latch circuits at different microprocessors,” in Proc. 46th ACM/IEEE Design Autom. Conf. (DAC),
supply voltages were demonstrated using 16-bit registers. Both Jul. 2009, pp. 947–950.
[15] R. Kumar, K. C. Bollapalli, R. Garg, T. Soni, and S. P. Khatri, “A robust
proposed approaches were able to ensure reliable operation of pulsed flip-flop and its use in enhanced scan design,” in Proc. IEEE Int.
the pulsed latch-based register under different supply voltages Conf. Comput. Design (ICCD), Oct. 2009, pp. 97–102.
in the presence of process and temperature variations, without [16] M. Lanuzza, R. De Rose, F. Frustaci, S. Perri, and P. Corsonello, “Impact
any unnecessary timing overhead. Both approaches have a of process variations on pulsed flip-flops: Yield improving circuit-level
techniques and comparative analysis,” in Proc. Int. Workshop Power
very small area overhead of around 3% or less. In addition, Timing Modeling, Optim. Simulation, 2011, pp. 180–189.
the power overhead of both approaches is minimal when [17] S. Dhong et al., “A 0.42 V Vccmin ASIC-compatible pulse-latch
compared to the traditional pulsed latch based register at the solution as a replacement for a traditional master-slave flip-flop in a
digital SoC,” in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2014,
same reliability level. Both approaches are easily scalable to pp. 1–4.
cover different levels of voltage scaling. In addition, they can [18] M. Alioto, E. Consoli, and G. Palumbo, “Variations in nanometer CMOS
be applied to any other pulsed latches topology that depends flip-flops: Part I—Impact of process variations on timing,” IEEE Trans.
Circuits Syst. I, Reg. Papers, vol. 62, no. 8, pp. 2035–2043, Aug. 2015.
on a delay path to generate the output pulse. [19] M. Alioto, E. Consoli, and G. Palumbo, “Variations in nanometer CMOS
flip-flops: Part II—Energy variability and impact of other sources of
variations,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 3,
ACKNOWLEDGMENT pp. 835–843, Mar. 2015.
[20] K. Bernstein et al., “High-performance CMOS variability in the
The authors gratefully acknowledge the staff of Solido 65-nm regime and beyond,” IBM J. Res. Develop., vol. 50, nos. 4–5,
Design Automation Inc. for their support with the Solido pp. 433–449, Jul. 2006.
Variation Designer tool. [21] H. Mahmoodi, S. Mukhopadhyay, and K. Roy, “Estimation of delay vari-
ations due to random-dopant fluctuations in nanoscale CMOS circuits,”
IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1787–1796, Sep. 2005.
R EFERENCES [22] M. Wirnshofer, Variation-Aware Adaptive Voltage Scaling for Digital
CMOS Circuits. Dordrecht, The Netherlands: Springer, 2013.
[1] D. Chinnery and K. Keutzer, Closing the Gap Between ASIC & Custom: [23] A. Khajeh et al., “TRAM: A tool for temperature and reliability
Tools and Techniques for High-Performance ASIC Design. Norwell, MA, aware memory design,” in Proc. Design, Autom. Test Eur. Conf.
USA: Kluwer, 2002. Exhibit. (DATE), Apr. 2009, pp. 340–345.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[24] Y. Pu et al., “Misleading energy and performance claims in sub/near Ahmed M. Eltawil (S’97–M’03–SM’14) received
threshold digital systems,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided the Ph.D. degree from the University of California at
Design (ICCAD), Nov. 2010, pp. 625–631. Los Angeles, Los Angeles, in 2003. Since 2005, he
[25] D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC & has been with the Department of Electrical Engineer-
Custom Tools and Techniques for Low Power Design. New York, NY, ing and Computer Science, University of California
USA: Springer, 2007. at Irvine, Irvine. He is the Founder and the Director
[26] M. Alioto, E. Consoli, and G. Palumbo, Flip-Flop Design in Nanometer of the Wireless Systems and Circuits Laboratory. His
CMOS: From High Speed to Low Energy. Springer International Pub- current research interests are in low power digital
lishing, 2015. circuit and signal processing architectures for wire-
[27] Synopsys 32/28 nm iPDKs, accessed on Jan. 16, 2017. [Online]. less communication systems. He received several
Available: https://www.synopsys.com/community/university- distinguished awards, including the NSF CAREER
program/teaching-resources.html Award in 2010 supporting his research in low power wireless systems.
[28] R. Goldman, K. Bartleson, T. Wood, K. Kranen, V. Melikyan, and
E. Babayan, “32/28 nm educational design kit: Capabilities, deploy-
ment and future,” in Proc. IEEE Asia Pacific Conf. Postgraduate Res.
Microelectron. Electron. (PrimeAsia), Dec. 2013, pp. 284–288.
[29] S. Mukhopadhyay, H. Mahmoodi-Meimand, and K. Roy, “Modeling and
estimation of failure probability due to parameter variations in nano-
scale SRAMs for yield enhancement,” in Symp. VLSI Circuits Dig. Tech.
Papers, Jun. 2004, pp. 64–67.