
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

Variable Latency Speculative Han-Carlson Adder


Darjn Esposito, Davide De Caro, Senior Member, IEEE, Ettore Napoli, Nicola Petra, Member, IEEE, and
Antonio Giuseppe Maria Strollo, Senior Member, IEEE

Abstract—Variable latency adders have been recently proposed in literature. A variable latency adder employs speculation: the exact arithmetic function is replaced with an approximated one that is faster and gives the correct result most of the time, but not always. The approximated adder is augmented with an error detection network that asserts an error signal when speculation fails. Speculative variable latency adders have attracted strong interest thanks to their capability to reduce average delay compared to traditional architectures. This paper proposes a novel variable latency speculative adder based on the Han-Carlson parallel-prefix topology, which proves more effective than the variable latency Kogge-Stone topology. The paper describes the stages in which variable latency speculative prefix adders can be subdivided and presents a novel error detection network that reduces error probability compared to previous approaches. Several variable latency speculative adders, for various operand lengths, using both Han-Carlson and Kogge-Stone topology, have been synthesized using the UMC 65 nm library. Obtained results show that the proposed variable latency Han-Carlson adder outperforms both previously proposed speculative Kogge-Stone architectures and non-speculative adders, when high speed is required. It is also shown that non-speculative adders remain the best choice when the speed constraint is relaxed.

Index Terms—Addition, digital arithmetic, parallel-prefix adders, speculative adders, speculative functional units, variable latency adders.

Manuscript received July 23, 2014; revised December 12, 2014; accepted January 29, 2015. This paper was recommended by Associate Editor C. P. Ravikumar.
The authors are with the Department of Electrical Engineering and Information Technology, University of Napoli “Federico II”, I80125 Naples, Italy (e-mail: dadecaro@unina.it).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2015.2403036

1549-8328 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

ADDERS ARE basic functional units in computer arithmetic. Binary adders are used in microprocessors for addition and subtraction operations as well as for floating point multiplication and division. Therefore adders are fundamental components and improving their performance is one of the major challenges in digital designs. Theoretical research [1] has established lower bounds on area and delay of $n$-bit adders: the former varies linearly with adder size, the latter has an $O(\log_2 n)$ behavior.

High speed adders are based on well established parallel-prefix architectures [1], [2], including Brent-Kung [3], Kogge-Stone [4], Sklansky [5], Han-Carlson [6], Ladner-Fischer [7], Knowles [8]. These standard architectures operate with fixed latency. Better average performance can be achieved by using variable latency adders, which have been recently proposed in literature [9]. A variable latency adder employs speculation: the exact arithmetic function is replaced with an approximated one that is faster and gives the correct result most of the time, but not always. The approximated adder is augmented with an error detection network that asserts an output signal when speculation fails. In this case (misprediction), another clock cycle is needed to obtain the correct result with the help of a correction stage. Since the addition time is one clock cycle when no error occurs and two clock cycles when the speculation fails, the average addition time can be computed as

$$T_{av} = T_{clk}\,(1 - P_{err}) + 2\,T_{clk}\,P_{err} = T_{clk}\,(1 + P_{err}) \tag{1}$$

where $T_{clk}$ is the clock period and $P_{err}$ is the error probability of the speculative adder.

Speculative adders are built upon the observation that the critical path is rarely activated in traditional adders [9]–[13]. In particular, in traditional adders each output depends on all previous bits, so the most significant output depends on all the input bits. Instead, in speculative adders each output depends only on the previous $K$ bits, where $K$ goes as $O(\log_2 n)$ [12]–[15]. This reflects the fact that a propagate chain longer than $O(\log_2 n)$ bits is a very rare event.

A first speculative approach to addition was proposed by Nowick [12] in the asynchronous context, implementing a variable latency adder by cutting the lowest levels of a Kogge-Stone adder. In the synchronous context, Verma et al. [13] propose a variable latency speculative adder; here the speculative addition is realized in the same way as [12], cutting the lower levels of a Kogge-Stone adder. A similar approach is employed in [14]. In [15] a variable latency carry-select adder is introduced, where the adder is fragmented in various windows, each one containing a Kogge-Stone adder.

The Kogge-Stone adder is often used when speed is the primary concern, since it uses the minimum number of logic levels and each cell in the adder tree has a fanout of 2. This comes at the cost of using many propagate-generate cells and many wires that must be routed between stages.

In this paper we propose a novel variable latency speculative adder based on the Han-Carlson [6] parallel-prefix topology. The Han-Carlson topology uses one more stage than the Kogge-Stone adder, while requiring a reduced number of cells and simplified wiring. Thus, it can achieve similar speed performance compared to the Kogge-Stone adder, at lower power consumption and area [16]. We show that a speculative carry tree can be obtained by pruning some intermediate levels of the classical Han-Carlson topology. The paper presents a rigorous derivation of the error detection network and shows that the error detection network required in speculative Han-Carlson adders is significantly faster than the one used by the speculative Kogge-Stone architecture. An extensive set of implementation results for 65 nm CMOS technology shows that the proposed Han-Carlson variable latency adders outperform previously developed variable latency Kogge-Stone architectures. Compared with traditional, non-speculative, adders, our analysis demonstrates that variable latency Han-Carlson adders show appreciable improvements when the highest speed is required; otherwise the burden imposed by error detection and error correction stages overwhelms any advantage.

The paper is organized as follows. In Section II we recall the basic architecture of parallel-prefix adders. The stages in which variable latency speculative prefix adders can be subdivided are presented in Section III where, after a brief review of the Kogge-Stone speculative prefix-processing stage introduced in [12], we present the proposed Han-Carlson speculative topology. A detailed discussion of the error detection stage is also reported in that section. Section IV presents the spatial and timing complexity of the investigated architectures. Section V shows detailed implementation and synthesis results of the proposed adders, for operand sizes ranging from 32 through 128 bits. Section VI concludes the paper with some final remarks.
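As a quick numerical companion to the average addition time (1), the following minimal sketch evaluates $T_{av}$ and checks whether speculation pays off against a fixed-latency adder. The function names and the numbers in the usage comment are illustrative assumptions, not values taken from the paper.

```python
def average_addition_time(t_clk: float, p_err: float) -> float:
    """Average addition time (1): one cycle on a correct guess, two on a misprediction."""
    if not 0.0 <= p_err <= 1.0:
        raise ValueError("p_err must be a probability in [0, 1]")
    return t_clk * (1.0 - p_err) + 2.0 * t_clk * p_err


def speculation_pays_off(t_clk_spec: float, p_err: float, t_clk_exact: float) -> bool:
    """True when the speculative adder has a lower average delay than a
    fixed-latency adder clocked at t_clk_exact (hypothetical helper)."""
    return average_addition_time(t_clk_spec, p_err) < t_clk_exact


if __name__ == "__main__":
    # Hypothetical numbers (ps), chosen only to exercise the formula.
    print(average_addition_time(200.0, 0.01))
    print(speculation_pays_off(200.0, 0.01, 250.0))
```

The second helper makes the trade-off explicit: a shorter clock period only helps while the misprediction penalty $T_{clk} P_{err}$ stays below the clock period saved.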
II. PRELIMINARIES

A. Prefix Addition
The binary addition problem can be formulated as follows: given an $n$-bit augend $A = a_{n-1} \ldots a_0$ and an $n$-bit addend $B = b_{n-1} \ldots b_0$, generate the $n$-bit sum $S = s_{n-1} \ldots s_0$. Let us indicate as $c_i$ the carry out of the $i$-th bit. The sum bit $s_i$ and the carry $c_i$ can be computed as follows:

$$s_i = a_i \oplus b_i \oplus c_{i-1} \tag{2}$$
$$c_i = a_i b_i + a_i c_{i-1} + b_i c_{i-1} \tag{3}$$

In prefix addition we use three stages to compute the sum: pre-processing, prefix-processing and post-processing.
In the pre-processing stage the generate $g_i$ and propagate $p_i$ signals are computed as:

$$g_i = a_i b_i \tag{4}$$
$$p_i = a_i \oplus b_i \tag{5}$$

The condition $g_i = 1$ means that a carry is generated at bit $i$, while the condition $p_i = 1$ means that a carry is propagated through bit $i$.
The concept of generate and propagate can be extended to a block of contiguous bits, from bit $k$ to bit $i$ (with $i \geq k$) as follows:

$$G_{[i:k]} = \begin{cases} g_i & \text{if } i = k \\ g_i + p_i\, G_{[i-1:k]} & \text{otherwise} \end{cases} \tag{6}$$
$$P_{[i:k]} = \begin{cases} p_i & \text{if } i = k \\ p_i\, P_{[i-1:k]} & \text{otherwise} \end{cases} \tag{7}$$

where $0 \leq k \leq i \leq n-1$.
The condition $G_{[i:k]} = 1$ means that a carry is generated in the block $[i:k]$, while the condition $P_{[i:k]} = 1$ means that a carry is propagated through the block. Thus, for any bit $i$, the carry $c_i$ can be expressed as:

$$c_i = G_{[i:0]} + P_{[i:0]}\, c_{-1} \tag{8}$$

where $c_{-1}$ is the input carry of the $n$-bit adder. In the following, for the sake of simplicity, we assume that $c_{-1} = 0$, so that (8) simplifies as:

$$c_i = G_{[i:0]} \tag{9}$$

The block generate and propagate terms are computed in the prefix-processing stage of the adder. To that purpose, the $(G_{[i:k]}, P_{[i:k]})$ couples are expressed with the help of the prefix operator $\circ$ defined as follows:

$$(G_{[i:k]}, P_{[i:k]}) = (G_{[i:j]}, P_{[i:j]}) \circ (G_{[j-1:k]}, P_{[j-1:k]}) = (G_{[i:j]} + P_{[i:j]}\, G_{[j-1:k]},\; P_{[i:j]}\, P_{[j-1:k]}) \tag{10}$$

where $i \geq j > k$. The prefix operator has two important properties: it is associative and it is idempotent. These properties are exploited in the prefix-processing stage to speed up the computation.
Finally, in the post-processing stage, the sum bits are computed using (8) and:

$$s_i = p_i \oplus c_{i-1} \tag{11}$$

B. Han-Carlson and Kogge-Stone Parallel-Prefix Adder Topologies
The pre-processing and post-processing stages of a prefix adder involve only simple operations on signals local to each bit position. Therefore, adder performance mainly depends on the prefix-processing stage.
Fig. 1 shows the Han-Carlson and Kogge-Stone prefix adder topologies. Here black dots represent the prefix operator (10), while white dots are simple placeholders.

Fig. 1. Han-Carlson and Kogge-Stone parallel-prefix topologies.

The Kogge-Stone adder is composed of $\log_2 n$ levels and presents a fanout of two at each level, using a large number of black cells and many wire tracks. A good trade-off between fanout, number of logic levels and number of black cells is given by Han-Carlson. The outer rows of the Han-Carlson topology are Brent-Kung [3] graphs, while the inner rows are Kogge-Stone graphs. The Han-Carlson adder in Fig. 1 uses a single Brent-Kung level at the beginning and at the end of the graph, and the number of levels is $1 + \log_2 n$.

III. VARIABLE LATENCY SPECULATIVE PREFIX ADDERS
Variable latency speculative prefix adders can be subdivided in five stages: pre-processing, speculative prefix-processing, post-processing, error detection and error correction. The error correction stage is off the critical path, as it has two clock cycles to obtain the exact sum when speculation fails.
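As a reference point for the speculative variants discussed next, the three stages of exact prefix addition recalled in Section II can be sketched as follows. This is a minimal behavioral model, assuming a Kogge-Stone prefix stage and a zero input carry as in (9); the function names are illustrative and not taken from the paper.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) pair of a bit or block


def prefix_op(hi: GP, lo: GP) -> GP:
    """Prefix operator (10): merge a more significant block with the
    adjacent less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_carries(g: List[int], p: List[int]) -> List[int]:
    """Prefix-processing: after the loop, node i holds (G[i:0], P[i:0])."""
    n = len(g)
    nodes = list(zip(g, p))
    d = 1
    while d < n:  # levels with distance 1, 2, 4, ...
        nodes = [prefix_op(nodes[i], nodes[i - d]) if i >= d else nodes[i]
                 for i in range(n)]
        d *= 2
    return [G for G, _ in nodes]  # c_i = G[i:0], eq. (9)


def prefix_add(a: int, b: int, n: int) -> Tuple[int, int]:
    """Return (n-bit sum, carry out) of a + b."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # pre-processing, eq. (4)
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # pre-processing, eq. (5)
    c = kogge_stone_carries(g, p)
    # post-processing, eq. (11), with c_{-1} = 0
    s = [p[i] ^ (c[i - 1] if i > 0 else 0) for i in range(n)]
    return sum(bit << i for i, bit in enumerate(s)), c[n - 1]
```

Because the operator is associative, any bracketing of adjacent blocks yields the same $(G_{[i:0]}, P_{[i:0]})$; the different topologies only change which intermediate blocks are built and how many levels that takes.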

A. Pre-Processing
In the pre-processing stage the generate and propagate signals are computed as in (4), (5).

B. Speculative Prefix-Processing
The speculative prefix-processing stage is one of the main differences compared with the standard prefix adders recalled in the previous section. Instead of computing all the $G_{[i:0]}$ and $P_{[i:0]}$ required in (8) to obtain the exact carry values, only a subset of block generate and propagate signals is calculated; in the post-processing stage approximate carry values are obtained from this subset. The output of the speculative prefix-processing stage will also be used in the error detection and in the error correction stages discussed in the following.
The basic assumption behind the speculative prefix-processing stage is that carry signals propagate for no more than $K$ bits, with $K < n$. This assumption is corroborated by the analyses in [13], [17], which demonstrate that having a propagate chain longer than $K$ is a very rare event.

1) Kogge-Stone Topology: The Kogge-Stone speculative prefix-processing stage has been proposed in [12], [13] and can be obtained by pruning the last $p$ levels of a traditional Kogge-Stone adder. In the example shown in Fig. 2, the last level of an $n = 16$ bit Kogge-Stone adder is pruned. As it can be observed, for $i \geq 8$ the length of propagate chains extends for 8 bits, resulting in a speculative prefix-processing stage with $K = 8$.

Fig. 2. Kogge-Stone speculative prefix-processing stage. The last row of an $n = 16$ bit Kogge-Stone adder is pruned, resulting in a speculative prefix-processing stage with $K = 8$.

In general, one has $K = n/2^p$, where $p$ is the number of pruned levels; the number of levels of the speculative stage is correspondingly reduced from $\log_2 n$ to $\log_2 K$ (assuming that $n$ is a power of two).
In general, the computed propagate and generate signals for the speculative Kogge-Stone architecture are:

$$(G_{[i:j]},\, P_{[i:j]}), \quad j = \max(i-K+1,\, 0), \quad i = 0, \ldots, n-1 \tag{12}$$

2) Han-Carlson Topology: The Han-Carlson adder constitutes a good trade-off between fanout, number of logic levels and number of black cells. Because of this, the Han-Carlson adder can achieve equal speed performance with respect to the Kogge-Stone adder, at lower power consumption and area [16]. Therefore it is interesting to implement a speculative Han-Carlson adder.
For these reasons, we have generated a Han-Carlson speculative prefix-processing stage by deleting the last rows of the Kogge-Stone part of the adder. As an example, Fig. 3 shows the Han-Carlson adder of Fig. 1 in which the two Brent-Kung rows at the beginning and at the end of the graph are unchanged, while the last Kogge-Stone row is pruned. This yields a speculative stage with $K = 8$. In general, one has $K = n/2^p$, where $p$ is the number of pruned levels; the number of levels of the speculative Han-Carlson stage reduces from $1 + \log_2 n$ to $1 + \log_2 K$ (assuming that $n$ is a power of two).

Fig. 3. Han-Carlson speculative prefix-processing stage. The last Kogge-Stone row of the $n = 16$ bit graph is pruned, resulting in a speculative prefix-processing stage with $K = 8$.

As it can be observed in Fig. 3, the length of the propagate chains is only $K = 8$ for $i = 9, 11, 13, 15$, while for $i = 10, 12, 14$ the propagate chain length is $K + 1 = 9$.
In general, the computed propagate and generate signals for the speculative Han-Carlson architecture are:

$$(G_{[i:j]},\, P_{[i:j]}), \quad j = \begin{cases} \max(i-K+1,\, 0) & i \text{ odd} \\ \max(i-K,\, 0) & i \text{ even} \end{cases} \tag{13}$$

As it will be apparent in the following, having the propagate lengths equal to $K+1$ for half of the outputs greatly simplifies the error detection.

C. Post-Processing
In the post-processing stage we first compute the approximate carries, $\tilde{c}_i$, and then use them to obtain the approximate sum bits $\tilde{s}_i$ as follows:

$$\tilde{s}_i = p_i \oplus \tilde{c}_{i-1} \tag{14}$$

Similarly to (9), the approximate carries are obtained as the generate signals available in the last level of the prefix-processing stage. For Kogge-Stone we have:

$$\tilde{c}_i = \begin{cases} G_{[i:0]} & \text{for } i < K \\ G_{[i:i-K+1]} & \text{otherwise} \end{cases} \tag{15}$$

and for Han-Carlson:

$$\tilde{c}_i = \begin{cases} G_{[i:0]} & \text{for } i \leq K \\ G_{[i:i-K+1]} & \text{for } i > K,\ i \text{ odd} \\ G_{[i:i-K]} & \text{for } i > K,\ i \text{ even} \end{cases} \tag{16}$$

D. Error Detection
The conditions in which at least one of the approximate carries is wrong (misprediction) are signaled by the error detection stage. In case of misprediction, an error signal is asserted by the error detection stage and the output of the post-processing stage is discarded. The error correction stage will give the correct sum in the next clock period.

1) Kogge-Stone: The error condition $e_i$ for carry $c_i$ can be obtained from (9), (15), using the properties of the propagate and generate signals, as:

$$e_i = \begin{cases} P_{[i:i-K+1]}\, G_{[i-K:0]} & \text{for } i \geq K \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

Thus, the error signal can be expressed as:

$$ERR = \sum_{i=K}^{n-1} P_{[i:i-K+1]}\, G_{[i-K:0]} \tag{18}$$

where the symbol $\sum$ represents the logical OR.
It is important to note that (18) is a necessary and sufficient error condition that requires the calculation of $G_{[i-K:0]}$. Unfortunately, these terms are actually not computed by the speculative prefix-processing stage (avoiding the computation of these terms is the key idea of speculative adders). Thus, in previous papers, (18) is replaced by the following looser relation:

$$ERR = \sum_{i=K}^{n-1} P_{[i:i-K+1]} \tag{19}$$

The last equation is a necessary-only error condition. By using (19), the error signal can be triggered even in absence of actual misprediction. While this does not harm the correct operation of the speculative adder, having a high rate of such “false positive” errors degrades the average addition time (1).
In this paper, instead, we rewrite the necessary and sufficient condition (18) in a form that does not require the $G_{[i-K:0]}$ terms. To that purpose, let us consider the last two terms of the OR in (18), with index $i = n-1$ and $i = n-2$:

$$P_{[n-1:n-K]}\, G_{[n-K-1:0]} + P_{[n-2:n-K-1]}\, G_{[n-K-2:0]} \tag{20}$$

One has:

$$G_{[n-K-1:0]} = g_{n-K-1} + p_{n-K-1}\, G_{[n-K-2:0]} \tag{21}$$

Substituting (21) in (20), the product $P_{[n-1:n-K]}\, p_{n-K-1}\, G_{[n-K-2:0]}$ is absorbed by the second term of (20), so that the terms with index $i = n-1$ and $i = n-2$ of (18) can be simplified as:

$$P_{[n-1:n-K]}\, g_{n-K-1} + P_{[n-2:n-K-1]}\, G_{[n-K-2:0]} \tag{22}$$

Similar simplifications can be realized by considering in (18) the terms $i = n-2$ and $i = n-3$ and so on. Finally one obtains:

$$ERR = \sum_{i=K}^{n-1} P_{[i:i-K+1]}\, g_{i-K} \tag{23}$$

Let us consider, as an example, the prefix-processing stage in Fig. 2. The error signal (23) is given by:

$$ERR = P_{[15:8]}\, g_7 + P_{[14:7]}\, g_6 + \cdots + P_{[8:1]}\, g_0 \tag{24}$$

2) Han-Carlson: The error condition for carry $c_i$ can be obtained from (9), (16) as:

$$e_i = \begin{cases} 0 & \text{for } i \leq K \\ P_{[i:i-K+1]}\, G_{[i-K:0]} & i > K,\ i \text{ odd} \\ P_{[i:i-K]}\, G_{[i-K-1:0]} & i > K,\ i \text{ even} \end{cases} \tag{25}$$

The error signal can be written as:

$$ERR = \sum_{\substack{i > K \\ i \text{ odd}}} P_{[i:i-K+1]}\, G_{[i-K:0]} + \sum_{\substack{i > K \\ i \text{ even}}} P_{[i:i-K]}\, G_{[i-K-1:0]} \tag{26}$$

It can easily be seen that in (26) the terms in the second OR are implied by the terms in the first OR. Let us consider, for instance, the first two terms of the OR, with index $i = K+1$ and $i = K+2$ (assuming that $K$ is even). We have:

$$P_{[K+2:2]}\, G_{[1:0]} = p_{K+2} \cdot P_{[K+1:2]}\, G_{[1:0]} \tag{27}$$

Thus, we can write:

$$ERR = \sum_{\substack{i > K \\ i \text{ odd}}} P_{[i:i-K+1]}\, G_{[i-K:0]} \tag{28}$$

The last equation can be simplified, following an approach similar to the previous subsection. Let us consider the last two terms of the OR in (28), with index $i = n-1$ and $i = n-3$ (assuming that $n$ is even):

$$P_{[n-1:n-K]}\, G_{[n-K-1:0]} + P_{[n-3:n-K-2]}\, G_{[n-K-3:0]} \tag{29}$$

One has:

$$G_{[n-K-1:0]} = G_{[n-K-1:n-K-2]} + P_{[n-K-1:n-K-2]}\, G_{[n-K-3:0]} \tag{30}$$

Substituting (30) in (29), the terms with index $i = n-1$ and $i = n-3$ of (28) can be simplified as:

$$P_{[n-1:n-K]}\, G_{[n-K-1:n-K-2]} + P_{[n-3:n-K-2]}\, G_{[n-K-3:0]} \tag{31}$$

Similar simplifications can be realized by considering in (28) the terms $i = n-3$ and $i = n-5$ and so on. Finally one obtains:

$$ERR = \sum_{\substack{i > K \\ i \text{ odd}}} P_{[i:i-K+1]}\, G_{[i-K:i-K-1]} \tag{32}$$

Let us consider, as an example, the prefix-processing stage in Fig. 3. The error signal (32) is given by:

$$ERR = P_{[15:8]}\, G_{[7:6]} + P_{[13:6]}\, G_{[5:4]} + P_{[11:4]}\, G_{[3:2]} + P_{[9:2]}\, G_{[1:0]} \tag{33}$$

By comparing (23) and (32), it can easily be seen that the number of terms to be OR-ed to obtain the error signal is halved in the Han-Carlson topology, compared to Kogge-Stone.
We name “checking nodes” the nodes of the prefix-processing stage whose outputs are needed to compute the error signal. The checking nodes for both the Kogge-Stone example of Fig. 2 and the Han-Carlson example of Fig. 3 are highlighted as big hatched dots in Fig. 4.

Fig. 4. The nodes of the prefix-processing stage, whose outputs are needed to compute the error signal, are named “checking nodes” and are highlighted as big hatched dots, for the topologies in Fig. 2–3.

As it can be observed, in Kogge-Stone some of the checking cells are at the last level of the graph; their output signals are available after three black cells delay. In Han-Carlson the critical checking cells are in the second last level of the graph and are also available after three black cells delay, in spite of the larger number of levels of the Han-Carlson prefix-processing stage.

Fig. 5. Error correction and detection stages for the proposed speculative Han-Carlson adder of Fig. 3.

From the above observations, it can be concluded that error detection is significantly simplified and potentially faster in Han-Carlson, compared to Kogge-Stone.
As an additional note, the need of driving the gates of the error detection stage increases the fanout of the checking cells, slowing the speculative prefix-processing stage.

E. Error Correction
The error correction stage computes the exact carry signals (9), to be used in case of misprediction.
The error correction stage is composed of the levels of the prefix-processing stage pruned to obtain the speculative adder. Fig. 5 shows the error correction stage of the proposed speculative Han-Carlson adder; the error correction for the Kogge-Stone topology can be obtained similarly.
It can be observed that the inclusion of the error correction stage increases the fanout of some of the cells of the speculative prefix-processing stage, with an adverse effect on adder speed.

F. Post-Processing
The approximate carries are already available at the output of the prefix-processing stage. The post-processing, according to (14), is equal to the one of a non-speculative adder and consists of $n$ XOR gates.

IV. ADDERS CHARACTERIZATION
In this section we provide a characterization of the spatial and timing complexity of the investigated variable latency speculative adders, using either Han-Carlson or Kogge-Stone topologies. Results for non-speculative adders are also reported, for comparison. This will be achieved with the help of simplifying hypotheses on area and speed of the employed gates, with the aim of obtaining an analytic comparison (albeit approximated) between the various topologies. Accurate values of area, speed and power for 65 nm technology will be presented in the next section for a quantitative assessment of variable latency speculative adders. Results of the error rate analysis will also be reported at the end of this section.

A. Spatial and Timing Complexity
In order to estimate adder complexity, we make some simplifying hypotheses.
We assume that the spatial complexity of speculative prefix-processing and error correction is proportional to the number of employed black cells (AO gates). Error detection spatial complexity is simply estimated assuming that it is composed of a set of AND gates (to compute the terms in (23) or in (32)) followed by a tree of two-input OR gates to compute the error signal (see Fig. 5 for an example). According to the model proposed in [18] we assume as unit gate a basic 2-input gate, such as AND gates and OR gates, while we count black cells (AO gates) as two unit gates.
Regarding delay, we assume that the speculative sum delay is proportional to the number of levels of the speculative parallel-prefix stage, plus two additional levels to take into account pre-processing and post-processing. Error detection delay is estimated as the number of OR-tree levels, plus one additional level to take into account the AND gates computing the terms in (23) or in (32). Assuming the unit gate delay model of [18], we count the basic 2-input gates such as AND and OR as one gate delay, with the exception of the XOR gates which we count as two gate delays.
Obtained results are shown in Table I. For speculative adders, spatial complexity is reported as the sum of two contributions. The first one (curly brackets) is the contribution of the speculative prefix stage and error correction stage, the second one (square brackets) is the contribution of the error detection stage. As it can be observed, the two area contributions are both lower in the proposed Han-Carlson speculative adder, compared to Kogge-Stone. It is also worth noting that the spatial complexity of speculative adders is higher than that of non-speculative ones, because of the error detection and correction stages.

TABLE I
SPATIAL AND TIMING COMPLEXITY

Regarding timing complexity, Table I reports the values of both speculative sum and error detection. The Kogge-Stone adder saves two gate levels to perform the speculative sum. However, the critical path traverses the error detection stage and hence the proposed Han-Carlson architecture appears faster than the Kogge-Stone speculative adder, owing to the halved number of terms to be OR-ed (reported in Table I) to obtain the error signal.

B. Error Rate Analysis
The value of error probability is fundamental to understand the degradation of the average addition time (1) caused by misprediction. In order to evaluate error probabilities, the proposed speculative Han-Carlson and the Kogge-Stone topologies have been simulated by using a Monte Carlo approach with a 1% relative error and a 99% confidence level. Input vectors have been chosen uniformly distributed [12], [13]. Table II reports the results of the analysis. For Kogge-Stone we name “Precise” the error detection stage based on equation (23), while we name “Coarse” the one based on the necessary-only error condition (19); a similar naming convention is used for the Han-Carlson topology.

TABLE II
ERROR PROBABILITY VALUES

The Precise error detection stage significantly reduces the error probability, compared to the Coarse one. Moreover, the Han-Carlson speculative adder exhibits a lower error probability than the Kogge-Stone one. This can be interpreted as follows: in the Kogge-Stone speculative prefix stage all the carries are computed independently from each other, whereas in Han-Carlson half of the carries (those in even bit-positions) are calculated from “parent” carries (those in odd bit-positions), through an additional level of the tree. This reduces error probability (if a parent carry is correct the “child” carry will be correct, too).
In the following we will consider only implementations with the Precise error detection stage which, as shown, provides lower error probabilities than Coarse detection.

V. SYNTHESIS RESULTS
We have developed Matlab scripts which generate Verilog descriptions of the proposed variable latency speculative adders, and of their non-speculative counterparts. A synthesis command was used to mark the non-speculative outputs of the speculative adders. We have synthesized these adders in the UMC 65 nm library, for 32 bit, 64 bit, and 128 bit operands.
It is not easy to compare performance (in terms of power, speed, and area) of different designs, since it strongly depends on the timing constraint used during synthesis. The results reported in the following have been obtained by performing several syntheses of the circuits under investigation, varying the timing constraint. In this way we can compare the various topologies and find the most effective ones depending on the required speed.
The dynamic power dissipation has been evaluated after synthesis by extracting the node activities from a back-annotated simulation.

A. The Optimal K Choice
The variable latency parallel-prefix adders depend on the choice of the parameter $K$ (the assumed maximum length of propagate chains, see Section III). One has $K = n/2^p$, where $p$ is the number of pruned levels of the parallel-prefix stage.
The optimal $K$ value descends from the following trade-off: if we increase $K$ we reduce the error probability (with positive effects on the average delay (1)) but we make the parallel-prefix stage slower (fewer levels are pruned) and we also make the error detection slower (because the checking cells descend toward the last levels).
To investigate this trade-off, we have synthesized the variable latency speculative adders for different values of the $K$ parameter. Results, obtained by varying the synthesis timing constraint, for the Han-Carlson topology are displayed in Fig. 6, considering 32, 64, and 128 bit adders. Note that the $x$-axis variable is the average delay, which takes into account the error probability. For comparison, we also report in Fig. 6 the results obtained for the non-speculative Han-Carlson adder.
As it can be observed in Fig. 6, the implementations with the smallest and with the largest value of $K$ prove ineffective, the former because of the high error rate, the latter because a single level is pruned compared to the non-speculative adder.
For 32 bit (Fig. 6(a)) there is a single optimum value of $K$; this value of $K$ is also the best choice for 64 bit (Fig. 6(b)). For 128 bit (Fig. 6(c)) two values of $K$ give similar performance.
Comparison between the variable latency adder and the non-speculative Han-Carlson topology reveals that variable latency adders allow the minimum achievable delay to be reduced. For instance, in the 64 bit case, the minimum achievable delay is about 280 ps for the non-speculative adder and reduces to 225 ps in the variable latency architecture.
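The dependence of the error probability on $K$ discussed above, and the kind of Monte Carlo analysis mentioned in Section IV-B, can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: it models the Kogge-Stone detection conditions alone (Precise: a generate bit immediately followed by $K$ propagate bits; Coarse: any $K$ consecutive propagate bits above bit 0), uses uniformly distributed operands, and makes no attempt to reach the 1% relative error and 99% confidence level of the paper. The function names and trial counts are hypothetical.

```python
import random
from typing import Tuple


def propagate_runs(p: int, K: int) -> int:
    """Bit j of the result is 1 iff p_j, p_{j+1}, ..., p_{j+K-1} are all 1."""
    r = p
    for _ in range(K - 1):
        r &= r >> 1
    return r


def estimate_error_probability(n: int, K: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return (Precise, Coarse) flag rates for n-bit uniform random operands."""
    rng = random.Random(seed)
    precise = coarse = 0
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        g, p = a & b, a ^ b
        # Bit m of `runs` is 1 iff bits m+1 .. m+K are all propagate.
        runs = propagate_runs(p, K) >> 1
        coarse += runs != 0            # some K-bit block propagate is 1
        precise += (runs & g) != 0     # ... and the bit just below it generates
    return precise / trials, coarse / trials


if __name__ == "__main__":
    n = 32
    for K in (4, 8, 16):
        p_precise, p_coarse = estimate_error_probability(n, K, trials=20000)
        # Average delay in units of the clock period, from (1): 1 + P_err.
        print(K, p_precise, p_coarse, 1.0 + p_precise)
```

Running the loop for increasing $K$ shows one side of the trade-off (a falling error rate); the other side, the slower prefix and detection stages, is a circuit-level effect that only synthesis results such as those of Fig. 6 can quantify.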

Fig. 6. Area and power of Han-Carlson speculative and non-speculative adders, as a function of the timing constraint. (a) 32 bit, (b) 64 bit, (c) 128 bit. Different $K$ values are used for speculative adders.

The analysis of area occupation and power dissipation shows that speculative adders are not effective for large average delay. As the timing constraint imposed during synthesis is made tighter, speculative adders become advantageous. For instance, in the 64-bit case, the speculative Han-Carlson adder results in a lower area for an average delay lower than 385 ps, and also in a lower power dissipation once the timing constraint is tight enough. At one such operating point, the variable latency adder exhibits a 20% reduction in area and about a 9% reduction in power compared with the non-speculative adder.

Fig. 7. Comparison of Han-Carlson and Kogge-Stone speculative and non-speculative adders, as a function of the timing constraint. (a) 32 bit, (b) 64 bit, (c) 128 bit.

B. Comparison with Kogge-Stone Variable Latency Speculative Adder
Fig. 7 shows the comparison between the proposed speculative adder and the Kogge-Stone one. Also in this case, we report the performance of non-speculative adders, in order to identify the region where the speculative approach is effective (for the variable latency speculative Kogge-Stone adder, the optimum $K$ value is used for each operand size).

The proposed variable latency Han-Carlson adder outperforms the speculative Kogge-Stone architecture in all the considered cases, confirming the trend highlighted in Table I. As an example, focusing on 64-bit adders, for an average delay lower than 350 ps, the proposed Han-Carlson speculative adder is the best choice in terms of silicon area and power consumption. Moreover, it allows the minimum achievable average delay to be reduced to 225 ps, with an 18% improvement with respect to the Kogge-Stone non-speculative adder and an 11% improvement with respect to the Kogge-Stone speculative adder. For a given value of the average delay, the proposed speculative adders offer 45% area reduction and 35% power saving compared to the Kogge-Stone non-speculative adder.

VI. CONCLUSION
In this paper a novel variable latency Han-Carlson parallel-prefix speculative adder for high-speed applications is proposed. A new, more accurate, error detection network is introduced, which reduces the error probability compared to the previous approaches.
An extensive set of implementation results for 65 nm CMOS technology shows that the proposed Han-Carlson variable latency adders outperform previously developed variable latency Kogge-Stone architectures. Compared with traditional, non-speculative, adders, our analysis demonstrates that variable latency Han-Carlson adders show appreciable improvements when the highest speed is required; otherwise the burden imposed by error detection and error correction stages overwhelms any advantage. Additional work is required to extend the speculative approach to other parallel-prefix architectures, such as Brent-Kung, Ladner-Fischer, and Knowles.

REFERENCES

[1] I. Koren, Computer Arithmetic Algorithms. Natick, MA, USA: A K Peters, 2002.
[2] R. Zimmermann, “Binary adder architectures for cell-based VLSI and their synthesis,” Ph.D. thesis, Swiss Federal Institute of Technology (ETH) Zurich, Zurich, Switzerland, 1998, Hartung-Gorre Verlag.
[3] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Trans. Comput., vol. C-31, no. 3, pp. 260–264, Mar. 1982.
[4] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786–793, Aug. 1973.
[5] J. Sklansky, “Conditional-sum addition logic,” IRE Trans. Electron. Comput., vol. EC-9, pp. 226–231, Jun. 1960.
[6] T. Han and D. A. Carlson, “Fast area-efficient VLSI adders,” in Proc. IEEE 8th Symp. Comput. Arith. (ARITH), May 18–21, 1987, pp. 49–56.
[16] S. K. Mathew, R. K. Krishnamurthy, M. A. Anders, R. Rios, K. R. Mistry, and K. Soumyanath, “Sub-500-ps 64-b ALUs in 0.18-μm SOI/bulk CMOS: Design and scaling trends,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1636–1646, Nov. 2001.
[17] B. Parhami, Computer Arithmetic: Algorithms and Hardware Design. New York: Oxford Univ. Press, 2000.
[18] A. Tyagi, “A reduced-area scheme for carry-select adders,” IEEE Trans. Comput., vol. 42, no. 10, pp. 1163–1170, Oct. 1993.

Darjn Esposito was born in 1989 in Naples, Italy. He received the M.S. degree (with honors) in electronic engineering from the University of Naples “Federico II,” in 2013, where he is currently working toward the Ph.D. degree. His research interests include design of digital VLSI circuits, with particular emphasis on speculative functional units.

Davide De Caro (M'05–SM'09) received the M.S. degree in electronic engineering with honors, in July 1999, and the Ph.D. degree in electronic engineering and computer science, in February 2003, both from the University of Naples “Federico II”, Italy.
He has worked in the area of digital integrated VLSI circuit design for the last fourteen years. Since March 2003 he has been a Researcher at the Department of Electrical Engineering and Information Technology.
Dr. De Caro is author of more than 50 technical papers in international journals and refereed international conferences.

Ettore Napoli was born in Italy in 1971. He received the Electronic engineering degree with honors in 1995, the Ph.D. degree in electronic engineering in 1999, and the Physics degree with honors in 2009.
He has been an Associate Professor, University of Napoli, Italy, since 2005.
He was a Research Associate at the Engineering Dept. of the University of Cambridge, U.K., in 2004. His scientific interests include modeling and design of power semiconductor devices and VLSI circuit design. Prof. Napoli is author or coauthor of more than 100 papers published in international journals and conferences.

Nicola Petra (M'05) received the Laurea degree
[7] R. E. Ladner and M. J. Fischer, “Parallel prefix computation,” J. ACM, and the Ph.D. degree from the University of Napoli
vol. 27, no. 4, pp. 831–838, Oct. 1980. “Federico II,” Italy, in 2002 and 2007 respectively.
[8] S. Knowles, “A Family of Adders,” in Proc. 14th IEEE Symp. Comput. His research interests include design of digital VLSI
Arith., Vail, CO, USA, Jun. 2001, pp. 277–281. circuits for telecommunications and high-perfor-
[9] S.-L. Lu, “Speeding up processing with approximation circuits,” Com- mance arithmetic circuits. He is now working as
puter, vol. 37, no. 3, pp. 67–73, Mar. 2004. a Researcher at the Department of Electronics and
[10] T. Liu and S.-L. Lu, “Performance improvement with circuit-level Telecommunications Engineering of the University
speculation,” in Proc. 33rd Annu. IEEE/ACM Int. Symp. Microarchit. of Napoli “Federico II.” He has authored or coau-
(MICRO-33), 2000, pp. 348–355. thored more than 30 papers on scientific journals and
[11] N. Zhu, W.-L. Goh, and K.-S. Yeo, “An enhanced low-power high- international conferences.
speed Adder For Error-Tolerant application,” in Proc. 2009 12th Int.
Symp. Integr. Circuits (ISIC '09), Dec. 14–16, 2009, pp. 69–72.
[12] S. M. Nowick, “Design of a low-latency asynchronous adder using Antonio Giuseppe Maria Strollo (M'05–SM'06) re-
speculative completion,” IEE Proc. Comput. Digit. Tech., vol. 143, no. ceived the Laurea degree (cum laude) and the Ph.D.
5, pp. 301–307, Sep. 1996. degree in electronic engineering from the University
[13] A. K. Verma, P. Brisk, and P. Ienne, “Variable Latency Speculative of Napoli Federico II, Italy. From 2002 he is full
Addition: A New Paradigm for Arithmetic Circuit Design,” in Proc. professor at the same University. He has published
Design, Autom., Test Eur. (DATE '08), Mar. 2008, pp. 1250–1255. more than 110 papers on international journals
[14] A. Cilardo, “A new speculative addition architecture suitable for two's and conferences. His current research interests are
complement operations,” in Proc. Design, Autom., Test Eur. Conf. design and analysis of VLSI circuits. From 2009
Exhib. (DATE '09), Apr. 2009, pp. 664–669. to 2012 he served as Associate Editor of the IEEE
[15] K. Du, P. Varman, and K. Mohanram, “High performance reliable vari- TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART
able latency carry select addition,” in Proc. Design, Autom., Test Eur. I: REGULAR PAPERS; currently he is Associate Editor
Conf. Exhib. (DATE '12), Mar. 2012, pp. 1257–1262. of Integration, the VLSI Journal.