Beruflich Dokumente
Kultur Dokumente
A Novel Modulo
Adder for
Residue Number System
Shang Ma, Jian-Hao Hu, Member, IEEE, and Chen-Hao Wang
I. INTRODUCTION
(1)
In (1),
, which is referred as correction [5][8]. In
the general modular adder design, the two values,
and
, should be computed firstly. Then, one of them is
selected as the final output. According to the form of the modulus, modular adders can be classified into two types: the general modular adder and the special modular adder.
Manuscript received April 13, 2012; revised October 11, 2012 and December
18, 2012; accepted February 05, 2013. This work was supported in part by
the National Natural Science Foundation of China under Grants 61101033 and
61070696, and by the Fundamental Research Funds for the Central Universities
of China under Grant ZYGX2011J118. This paper was recommended by Associate Editor B.-H. Gwee.
The authors are with the National Key Laboratory of Science and Technology
on Communications, University of Science and Technology of China, Chengdu
611731, China (e-mail: mashang@uestc.edu.cn).
Digital Object Identifier 10.1109/TCSI.2013.2252639
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
MA et al.: A NOVEL MODULO
where and
represent the
carry
generation bit and carry propagation bit respectively.
The prefix computation unit is used to compute the carry information used in the sum computation unit. Prefix computation
can be performed by
(5)
,
,
,
where
stage. The smaller means the shorter
and represents the
delay of the carry chain. The operator in (5) is the prefix
operator and
is the prefix computation result of the
stage from the
bit to the
bit, which is also called group
prefix computation. There are several well known binary prefix
addition structures, such as Sklansky (SK), Brunt-Kung (BK),
Kogge-Stone (KS), Han-Carlson (HC), ELM, and so forth [22].
The prefix structures mentioned above are usually called prefix
trees.
for
After prefix computation, carries
the
bit can be obtained. They can be computed as
adder structure.
(6)
The pre-processing unit is used to generate the carry generation and carry propagation bits
of
. From (3),
when
(8)
(7)
C. Unit-gate Model for Area and Delay Analysis
The unit-gate model is one of the most commonly used
models to estimate the circuit complexity and performance in
VLSI design. In the unit-gate model, simple two-input logic
gates, such as AND, OR, NAND, and NOR, are treated as unit
gates. They have the same area and delay, which are referred as
and
in this paper, respectively. For those more complicated two-input gates, such as XOR and XNOR, their area and
and
in our analysis, respectively.
delay are defined as
Complex logical circuits as well as multi-input gates can be
implemented with 2-input unit gates, and their gate counts
equal the sum of gate counts of the unit gate [22].
III. PROPOSED MODULO
ADDER
Obviously,
the
binary
.
representation
of
is
and
and
be
respeccan be regarded as
(9)
where
is the carry-out bit of adder
.
For
, one of the inputs of
, every bit is 0 except the
least significant bit. Thus,
can be treated as a -bit adder with
the lowest carry-in bit, which is exactly as same as the general
binary adder. And the way pre-processing of
is also similar
with the general binary adder. The difference is that the lowest
carry-in bit should be considered. Therefore, carry generation
and carry propagation bits are
(10)
For adder
, it does not only add the constant
, but also
the carry-out bit
from adder
. It can be regarded as a
three-inputs adder with the lowest carry-in bit. The three inputs
are
,
and
in binary. In this paper,
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
4
Proof: Let
and
be
the binary representations of and , respectively. Then, we
have
,
and
.
According to the parallel prefix algorithm, we have
.
If
and
(12)
From (10) and (12), all of the information required in the
prefix computation is obtained. Furthermore, the carry-out bit
of SCSA,
, is required to compute the carry-out bit of
,
. It is calculated as
(13)
B. Carry Generation Unit
In carry generation unit, the carries
of
can be obtained with the carry generation and
carry propagation bits from the pre-processing unit. Any existing prefix structure can be used to get the carries
in this
paper.
It is worth pointing out that the carry-out bit of SCSA in
the pre-processing unit, as shown in (13), is not involved in
the prefix computation. Instead,
combined with the
carry-out bit of the prefix tree is required to determine the
carry-out bit of
(denoted as
)
(14)
where
, then
, which yields
. Thus, we have
. That is,
.
If
Hence,
, it means that
cant be propagated to
.
, which is irrelevant with . That means
.
.
Thus,
Q. E. D.
Theorem 1 means that
can be determined from
by
simple logic operation. That is the foundation of the carry
correction for the proposed modular adder. We present the procedure of the carry correction in our scheme based on Theorem
1 as following.
For the proposed modulo
adder,
and can be represented as
in binary. The
computation of
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
MA et al.: A NOVEL MODULO
are input signals. And the output is the result of the first
correction, denoted as
(16)
Carry Correction for
From (16),
is the carry information of
or
after the correction
for adder
. Then we can perform the second correction
based on
and let the carry bits of the second correction
be
. Similar to the first correction,
is the carry of
(that is,
) when
. Otherwise,
is the carry of
. That is,
is the
final carry information needed in sum computation unit.
When
,
. The bit 1 in
will
not affect
. Hence,
(17)
, according to Theorem 1
When
and (16), the carries after the second carry correction are
(21)
Substituting (19) into (21), we get
When
, the inputs of adder
in Fig. 2
are
and
. And the carry-in bit
is the carry-out bit of adder , that is, . Considering the least
significant bit of
is 1, we can treat the operation of adder
as the addition of two inputs,
and
, with the lowest carry-in bit 1. That
is, the results and carry information of
,
in (18) are identical
(22)
Substituting (16) into (22), we get
(18)
Thus, we can get the carries of
by modifying the carries
of adder
with Theorem 1.
Combined with the final carry-out bit of
,
, the
carries
required by the proposed modular adder are determined.
Since the second carry correction is performed under the condition that the lowest carry-in bit of adder
is a constant 1,
the propagation bits used in the carry correction unit should
be computed by
and
. From
the above analysis, it is shown that the difference between these
(23)
When
. Similarly, we get
(24)
According to (16), (17), (23) and (24), the carry bits required
by the proposed modular adder are determined as shown in (25),
at the bottom of the page.
(25)
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
6
Let
). Besides,
just
(26)
Then
(30)
(27)
Hence
(31)
(28)
and
(29)
Thus, we can get the carry information that will be used in
the sum computation unit of the proposed modular adder.
D. The Sum Computation
Generally, the sum computation is as same as that in prefixbased binary adder. However,
is the correction result when
is taken into account. That is, if
,
is the carry
bit of
. Otherwise, it is the carry bit of
. Thus,
the partial sum bits of
and
are both required
in the final sum computation. Let and
be the partial sum bits of
and
respectively.
Note that
has been determined in the
(32)
When
(33)
At last, the sum bits are
(34)
In (34),
and
can be obtained at the same time.
Therefore, there is no extra delay compared with other sum computation units.
E. Design Example
The VLSI implementation structure of modulo
adder based on the proposed scheme is shown in Fig. 3(a).
Fig. 3(b) illustrates the function of each module.
Pre-processing Unit
The pattern in Fig. 3 is the pre-processing unit and
used to generate carry generation and carry propagation
bits for the following prefix computation. Since there are
fixed 1 inputs at the 1st and the 4th places, the patterns
and are used for this special situations. The pattern does not cost any resource in unit-gate model.
The computations of these patterns correspond to (10), (11)
and (12).
Prefix Computation
The pattern is the prefix computation unit. In this example, the Sklansky prefix tree is used and there are 11
prefix computation units, which corresponds to (4). The
delay of is determined by its carry generation path
which is one OR gate and one AND gate. However, the
pattern in the final stage of prefix tree is not needed
to compute propagation bits.
The Computation of
The
is computed by pattern in Fig. 3. According to (14),
,
and
can be computed concurrently. Then, we can get
after an OR gate. Thus,
the delay of
computation will not exceeding the
delay of pattern and there is no extra delay. In order
to minimize the delay of , the value of can be selected so that the delay difference of
and
is at
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
MA et al.: A NOVEL MODULO
Fig. 3. Modulo
and
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
8
AREA OF MODULO
TABLE I
ADDER BASED ON UNIT-GATE MODEL
(36)
Obviously, the critical path delay of pre-processing and the first
level prefix computation is
. Meanwhile, we can get
and
in the computation procedure of (36). As result, there is
no extra area.
The delay of carry correction units and sum computation units
are both
. As for prefix operation, its delay depends on the
adopted prefix structure. According to the above analysis, the
critical path delay of the proposed modulo
adder is
the sum of the delay of prefix structure and 7 unit gates. That is
(37)
TABLE II
ADDER WITH DIFFERENT
THE DELAY AND AREA OF MODUL
PREFIX STRUCTURES BASED ON UNIT-GATE MODEL
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
MA et al.: A NOVEL MODULO
TABLE III
THE AREA AND DELAY COMPARISON BASED ON UNIT-GATE MODEL
TABLE IV
ASIC SYNTHESIZED RESULTS FOR THE PROPOSED
MODULAR ADDER WITH DIFFERENT
TABLE V
ASIC SYNTHESIZED RESULTS FOR TIMING OPTIMIZATION I
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
10
TABLE VI
ASIC SYNTHESIZED RESULTS FOR TIMING OPTIMIZATION II
TABLE VIII
ASIC SYNTHESIZED RESULTS FOR AREA OPTIMIZATION
,
TABLE VII
ASIC SYNTHESIZED RESULTS FOR TIMING OPTIMIZATION III
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
MA et al.: A NOVEL MODULO
[3] P. Patronik, K. Berezowski, S. J. Piestrak, J. Biernat, and A. Shrivastava, Fast and energy-efficient constant-coefficient FIR filters using
residue number system, in Proc. Int. Symp. Low Power Electronics
and Design (ISLPED), 2011, pp. 385390.
[4] J. C. Bajard, L. S. Didier, and T. Hilaire, -direct form transposed
and residue number systems for filter implementations, in Proc. IEEE
54th Int. Midwest Symp. Circuits and Systems (MWSCAS), 2011, pp.
14.
[5] M. Bayoumi, G. Jullien, and W. Miller, A VLSI implementation of
residue adders, IEEE Trans. Circuits Syst., vol. CAS-34, no. 3, pp.
284288, Mar. 1987.
[6] S. J. Piestrak, Design of residue generators and multioperand modular
adders using carry-save adders, IEEE Trans. Comput,, vol. 43, no. 1,
pp. 6877, Jan. 1994.
[7] H. Vergos, On the design of efficient modular adders, J. Circuits,
Syst., and Comput., vol. 14, no. 5, pp. 965972, Oct. 2005.
[8] G. Jaberipur, B. Parhami, and S. Nejati, On building general modular adders from standard binary arithmetic components, in Proc. 45th
Asilomar Conf. Signals, Systems, and Computers, 2011, pp. 69.
[9] M. Dugdale, VLSI implementation of residue adders based on binary
adders, IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process.,
vol. 39, no. 5, pp. 325329, May 1992.
[10] A. A. Hiasat, High-speed and reduced-area modular adder structures
for RNS, IEEE Trans. Comput,, vol. 51, no. 1, pp. 8489, Jan. 2002.
[11] R. A. Patel, M. Benaissa, N. Powell, and S. Boussakta, ELMMA: A
new low power high-speed adder for RNS, in Proc. IEEE Workshop
on Signal Processing Systems, Oct. 2004, pp. 95100.
[12] R. A. Patel, M. Benaissa, N. Powell, and S. Boussakta, Novel
power-delay-area-efficient approach to generic modular addition,
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp.
12791292, Jun. 2007.
arithmetic
[13] E. Vassalos, D. Bakalis, and H. T. Vergos, Modulo
units with embedded diminished-to-normal conversion, in Proc. 14th
Euromicro Conf. Digital System Design (DSD), 2011, pp. 468475.
[14] G. Jaberipur and S. Nejati, Balanced minimal latency RNS addition
,, in Proc. 18th Int. Conf. Systems,
for moduli set
Signals and Image Processing (IWSSIP), 2011, pp. 17.
[15] H. T. Vergos and C. Efstathiou, A unifying approach for weighted and
addition, IEEE Trans. Circuits Syst. II,
diminished-1 modulo
Exp. Briefs, vol. 55, no. 10, pp. 10411045, Oct. 2008.
[16] S. H. Lin and M. H. Sheu, VLSI design of diminished-one modulo
adder using circular carry selection, IEEE Trans. Circuits Syst.
II, Exp. Briefs, vol. 55, no. 9, pp. 897901, Sep. 2008.
[17] C. Efstathiou, H. T. Vergos, and D. Nikolos, Fast parallel-prefix
adders, IEEE Trans. Comput., vol. 53, no. 9, pp.
modulo
12111216, Sep. 2004.
[18] R. A. Patel and S. Boussakta, Fast parallel-prefix architectures for
addition with a single representation of zero, IEEE
modulo
Trans. Comput., vol. 56, no. 11, pp. 14841492, Nov. 2007.
[19] P. M. Matutino, R. Chaves, and L. Sousa, Arithmetic units for RNS
and
operations, in Proc. 13th Euromicro Conf.
moduli
Digital System Design: Architecture, Methods and Tools (DSD), 2010,
pp. 243246.
11