
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 10, OCTOBER 2012

propagation delay will be $\tau_0 (1 + 2 C_I / C_L)$, where $\tau_0$ is the delay of a crosstalk-free line, and $C_L$ and $C_I$ are the wire-to-substrate capacitance and inter-wire capacitance, respectively. For uniformly distributed random data, all these techniques incur nearly the same number of switching transitions on average.

VI. CONCLUSION

By exploiting the Fibonacci number system, we proposed a family of Fibonacci coding techniques for crosstalk avoidance. We showed the inter-dependency among the proposed techniques and provided a formal procedure to convert one codeword set into another. We also related our proposed techniques to some of the existing crosstalk avoidance coding techniques. The proposed techniques eliminate crosstalk completely, but not inductance. The worst-case inductance occurs when adjacent lines transition in the same direction. We plan to devise a suitable mechanism to minimize the inductance effects using Fibonacci codes in the future.

New Bit Parallel Multiplier With Low Space Complexity for All Irreducible Trinomials Over $GF(2^n)$
Young In Cho, Nam Su Chang, Chang Han Kim, Young-Ho Park, and Seokhie Hong

Abstract: Koç and Sunar proposed an architecture of the Mastrovito multiplier for the irreducible trinomial $f(x) = x^n + x^k + 1$, where $k \ne n/2$, to reduce the time complexity. Also, many multipliers based on the Karatsuba-Ofman algorithm (KOA) were proposed that sacrificed time efficiency for low space complexity. In this paper, a new multiplication formula which is a variant of KOA is presented. We also provide a straightforward architecture of a non-pipelined bit-parallel multiplier using the new formula. The proposed multiplier has lower space complexity than, and comparable time complexity to, previous Mastrovito multipliers for all irreducible trinomials.

Index Terms: Bit-parallel multiplier, finite field, irreducible trinomial, Mastrovito multiplication, polynomial basis.


I. INTRODUCTION

Efficient hardware implementations of finite field $GF(2^n)$ arithmetic (addition, multiplication, squaring, and inversion) are required for several applications, such as coding theory, computer algebra, and public key cryptography. Among them, multiplication is the basic operation and an important factor in many applications of the arithmetic operations over $GF(2^n)$. Therefore, a number of efficient $GF(2^n)$ multiplication approaches and architectures have been proposed. The efficiency of a hardware implementation is measured in terms of the number of AND and XOR gates (#AND and #XOR) and the total gate delay of the circuit. If $T_A$ and $T_X$ denote the delay of one two-input AND gate and one two-input XOR gate, respectively, then the total gate delay can be expressed as $z_A T_A + z_X T_X$, where $z_A$ and $z_X$ are positive integers.

The efficiency of the multiplier depends on the representation of the field elements and the number of nonzero terms in the irreducible polynomial. In this paper, we consider the polynomial basis (PB), the most widely used representation, and irreducible trinomials for the design of bit-parallel finite field multipliers. In bit-parallel designs, a complete operand word is processed in every cycle: the bits of the input multiplicands are fed in parallel and the bits of the output product word are also obtained in parallel.

Multiplication in $GF(2^n)$ based on the PB is often accomplished by a two-step algorithm consisting of polynomial multiplication and modular reduction. Mastrovito proposed a bit-parallel multiplier, called the Mastrovito multiplier, that combines these two steps to reduce time complexity [5], [6]. The time complexity of

Manuscript received October 04, 2010; revised April 17, 2011; accepted June 11, 2011. Date of publication August 30, 2011; date of current version July 19, 2012. This work was supported by the Ministry of Knowledge Economy (MKE), Korea, under the ITRC support program supervised by the National IT Industry Promotion Agency (NIPA) (NIPA-2011-C1090-1001-0004) and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (2011-0004395). Y. I. Cho and S. Hong are with the Graduate School of Information Management and Security, Korea University, Seoul 136-701, Korea (e-mail: elowey@korea.ac.kr; shhong@korea.ac.kr). N. S. Chang and Y.-H. Park are with the Department of Information Security Systems, Sejong Cyber University, Seoul 143-150, Korea (e-mail: nschang@sjcu.ac.kr; youngho@sjcu.ac.kr). C. H. Kim is with the Department of Information and Security, Semyung University, Jecheon 390-711, Korea (e-mail: chkim@semyung.ac.kr). Digital Object Identifier 10.1109/TVLSI.2011.2162594

1063-8210/$26.00 © 2011 IEEE


Mastrovito multipliers in [1]–[3], [8] is $T_A + (\lceil \log_2 n \rceil + 2)T_X$ for the general irreducible trinomials $f(x) = x^n + x^k + 1$ and $T_A + (\lceil \log_2 n \rceil + 1)T_X$ for special cases, i.e., $k = 1$, $n/2$, and $n = 2^j + 1$ for some $j > 0$. The space complexity of these multipliers is $n^2$ AND gates and $n^2 - 1$ XOR gates. On the other hand, KOA has been considered for implementing multipliers with low space complexity that breaks the quadratic complexity [12], [14]. However, its gate delay is three times that of the existing fast parallel multipliers [4]. Thus its space complexity is lower, but its time complexity is higher, than that of the Mastrovito multipliers.

In this paper, we present a new multiplication formula which is a variant of KOA, and a straightforward architecture of a non-pipelined bit-parallel multiplier using the proposed formula. The proposed formula makes the space complexity of the multiplier lower than that of the existing Mastrovito multipliers. Moreover, the time complexity of the multiplier is comparable to that of the other Mastrovito multipliers. For the range $100 \le n \le 1{,}023$ and $k \le n/2$, there are 1405 irreducible trinomials and 493 values of $n$. Our contributions are as follows.

1) Using the proposed multiplication formula, the space complexity of the new multiplier is lower than that of any other Mastrovito multiplier for all irreducible trinomials.
2) Its time complexity is equal to $T_A + (\lceil \log_2 n \rceil + 1)T_X$ for 331 values of $n$.
3) For the other cases, it is only $1T_X$ higher.

The remainder of this paper is organized as follows. The new multiplication formula is presented in Section II. In Section III, we present the architecture of the proposed multiplier using the new multiplication formula. We finish the paper with a conclusion in Section IV.
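The two-step procedure mentioned in the introduction (polynomial multiplication followed by modular reduction) can be sketched in software as follows. This is only a minimal illustration, not the hardware architecture of any cited multiplier. The function names `clmul`, `reduce_trinomial`, and `gf2n_mul`, and the encoding of a field element as an integer whose bit $i$ holds $a_i$, are assumptions made for this example.

```python
def clmul(a: int, b: int) -> int:
    """Step 1: multiply two polynomials over GF(2) (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def reduce_trinomial(s: int, n: int, k: int) -> int:
    """Step 2: reduce s modulo f(x) = x^n + x^k + 1, using x^n = x^k + 1 (mod f)."""
    for t in range(s.bit_length() - 1, n - 1, -1):
        if (s >> t) & 1:
            # x^t = x^(t-n) * x^n = x^(t-n+k) + x^(t-n)  (mod f)
            s ^= (1 << t) ^ (1 << (t - n + k)) ^ (1 << (t - n))
    return s


def gf2n_mul(a: int, b: int, n: int, k: int) -> int:
    """Two-step PB multiplication in GF(2^n) defined by x^n + x^k + 1."""
    return reduce_trinomial(clmul(a, b), n, k)


# GF(2^3) with f(x) = x^3 + x + 1: x * x^2 = x^3 = x + 1
print(bin(gf2n_mul(0b010, 0b100, 3, 1)))  # prints 0b11
```

A Mastrovito-style multiplier merges the two steps into one combinational network; the sketch keeps them separate only to make each step visible.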

II. NEW BIT-PARALLEL MULTIPLIER

Let $f(u) = u^n + u^k + 1$ be an irreducible polynomial over $GF(2)$ and $GF(2^n) = GF(2)[u]/(f(u))$. If $x$ is a root of $f(u)$ then $\{1, x, x^2, \ldots, x^{n-1}\}$ is a PB over $GF(2^n)$. Two field elements $A$ and $B$ over $GF(2^n)$ can be represented as $A = \sum_{i=0}^{n-1} a_i x^i$ and $B = \sum_{j=0}^{n-1} b_j x^j$, where $a_i, b_j \in GF(2)$. Then, the product of $A$ and $B$ is $C = \sum_{t=0}^{n-1} c_t x^t \equiv \sum_{t=0}^{2n-2} s_t x^t \bmod f(x)$, where $s_t = \sum_{i+j=t} a_i b_j$, $t = 0, 1, \ldots, 2n-2$. In this section, we present a new multiplication formula using the irreducible trinomial $f(u) = u^n + u^k + 1$ where $1 \le k \le n/2$, because there always exists an irreducible trinomial $f(u) = u^n + u^{n-k} + 1$ by the reciprocal property [15].

A. Variant of KOA

First, we divide a field element $A$ into $a_L(x)$ and $a_H(x)x^{n-k}$ in order to have $k$ and $n-k$ terms, respectively, as follows:

$$A = \sum_{i=0}^{n-1} a_i x^i = (a_0 + \cdots + a_{k-1}x^{k-1}) + (a_k x^{-n+2k} + \cdots + a_{n-1}x^{k-1})x^{n-k} = a_L(x) + a_H(x)x^{n-k}.$$

We denote the sum of $a_L(x)$ and $a_H(x)$ by $a_S(x)$. Then

$$a_S(x) = a_L(x) + a_H(x) = \sum_{i=0}^{n-2k-1} a_{k+i}\, x^{i-(n-2k)} + \sum_{i=n-2k}^{n-k-1} \bar{a}_{k+i}\, x^{i-(n-2k)},$$

where $\bar{a}_{k+i} = a_{i-(n-2k)} + a_{k+i}$. In other words, if we denote the vector representations of $a_L(x)$, $a_H(x)$ and $a_S(x)$ by $a_L$, $a_H$ and $a_S$, respectively, then $a_S$ is obtained by adding the $k$ entries of $a_L$ to the last $k$ entries of $a_H$:

$$\begin{array}{rl}
a_L: & (a_0, \ldots, a_{k-1}) \\
a_H: & (a_k, a_{k+1}, \ldots, a_{n-k}, \ldots, a_{n-1}) \\
a_S: & (a_k, a_{k+1}, \ldots, \bar{a}_{n-k}, \ldots, \bar{a}_{n-1})
\end{array}$$

and we generate $b_L(x)$, $b_H(x)$ and $b_S(x)$ from $B$ with the same process. Then we can rewrite the product formula as follows:

$$\begin{aligned}
S = A \times B &= a_L(x)b_L(x) + \bigl(a_L(x)b_H(x) + a_H(x)b_L(x)\bigr)x^{n-k} + a_H(x)b_H(x)x^{2n-2k} \\
&= a_L(x)b_L(x) + \bigl[(a_L(x) + a_H(x))(b_L(x) + b_H(x)) + a_L(x)b_L(x) + a_H(x)b_H(x)\bigr]x^{n-k} + a_H(x)b_H(x)x^{2n-2k} \\
&= a_L(x)b_L(x) + a_L(x)b_L(x)x^{n-k} + a_H(x)b_H(x)x^{n-k} + a_H(x)b_H(x)x^{2n-2k} + a_S(x)b_S(x)x^{n-k}.
\end{aligned}$$

Here we denote $S_1$, $S_2$, and $S_3$ as follows:

$$\begin{aligned}
S_1 &= a_L(x)b_L(x) + a_L(x)b_L(x)x^{n-k}, \\
S_2 &= a_H(x)b_H(x)x^{n-k} + a_H(x)b_H(x)x^{2n-2k}, \\
S_3 &= a_S(x)b_S(x)x^{n-k}.
\end{aligned}$$

Then $C = S \bmod f(x) = S_1 + S_2 + S_3 \bmod f(x)$. We therefore develop closed-form expressions for the coefficients of $C$ in terms of $S_1$, $S_2$, and $S_3$. We need to derive the formulas for $S_1 \bmod f(x)$, $S_2 \bmod f(x)$, and $S_3 \bmod f(x)$, and we devote the following subsections to these formulas.

1) $S_1 \bmod f(x)$: Let $a_L(x)b_L(x) = \sum_{t=0}^{2k-2} p_t x^t$, where $p_t = \sum_{i+j=t} a_i b_j$; then $S_1$ is represented as

$$S_1 = \sum_{t=0}^{2k-2} p_t x^t + x^{n-k} \cdot \sum_{t=0}^{2k-2} p_t x^t = \sum_{t=0}^{2k-2} p_t x^t + \sum_{t=n-k}^{n+k-2} p_{t-n+k}\, x^t. \tag{1}$$

According to the values of $n$ and $k$, we consider three cases for (1) as follows:

$$S_1 = \begin{cases}
p_0 + p_0 x^{n-1} & (k = 1) \\[4pt]
\displaystyle\sum_{t=0}^{2k-2} p_t x^t + \sum_{t=n-k}^{n+k-2} p_{t-n+k}\, x^t & (1 < k < (n+2)/3) \\[4pt]
\displaystyle\sum_{t=0}^{n-k-1} p_t x^t + \sum_{t=n-k}^{2k-2} (p_t + p_{t-n+k})x^t + \sum_{t=2k-1}^{n+k-2} p_{t-n+k}\, x^t & ((n+2)/3 \le k \le n/2).
\end{cases} \tag{2}$$
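The split into $S_1$, $S_2$, and $S_3$ can be checked numerically before any reduction is applied. The sketch below is an illustrative assumption-laden check, not the authors' implementation: polynomials over $GF(2)$ are stored as integer bitmasks, and because $a_H(x)$ has negative exponents, every operand is pre-multiplied by $x^{n-2k}$ so that all exponents are non-negative (the product is then divided by $x^{2(n-2k)}$ at the end). The names `clmul`, `split`, and `koa_variant_product` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials stored as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split(A: int, n: int, k: int):
    """Return (a_L, a_H, a_S), each scaled by x^(n-2k) to avoid negative exponents."""
    m = n - 2 * k
    a_low = (A & ((1 << k) - 1)) << m   # a_L(x) * x^m, from a_0 .. a_{k-1}
    a_high = A >> k                     # a_H(x) * x^m, from a_k .. a_{n-1}
    return a_low, a_high, a_low ^ a_high


def koa_variant_product(A: int, B: int, n: int, k: int) -> int:
    """Unreduced product A*B assembled as S1 + S2 + S3 (requires 1 <= k <= n/2)."""
    m = n - 2 * k
    d = n - k
    aL, aH, aS = split(A, n, k)
    bL, bH, bS = split(B, n, k)
    pL, pH, pS = clmul(aL, bL), clmul(aH, bH), clmul(aS, bS)
    S1 = pL ^ (pL << d)              # a_L b_L + a_L b_L x^(n-k)
    S2 = (pH << d) ^ (pH << 2 * d)   # a_H b_H x^(n-k) + a_H b_H x^(2n-2k)
    S3 = pS << d                     # a_S b_S x^(n-k)
    return (S1 ^ S2 ^ S3) >> (2 * m)  # undo the x^(2m) scaling


# Compare against the schoolbook product for one pair of operands.
print(koa_variant_product(0b1011001101, 0b0110110011, 10, 3)
      == clmul(0b1011001101, 0b0110110011))
```

Only three sub-products (`pL`, `pH`, `pS`) are formed, which is the source of the space saving; the identity itself does not depend on $f(x)$ being irreducible.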


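Since $x^n \equiv 1 + x^k \bmod f(x)$, the terms $\sum_{t=n}^{n+k-2} p_{t-n+k}\, x^t$ need to be reduced as follows:

$$\sum_{t=n}^{n+k-2} p_{t-n+k}\, x^t \equiv \sum_{t=n}^{n+k-2} p_{t-n+k}\,(x^{t-n+k} + x^{t-n}) \bmod f(x) = \sum_{t=k}^{2k-2} p_t x^t + \sum_{t=0}^{k-2} p_{t+k}\, x^t. \tag{3}$$

From (2) and (3), we have

$$S_1 \bmod f(x) = \begin{cases}
p_0 + p_0 x^{n-1} & (k = 1) \\[4pt]
\displaystyle\sum_{t=0}^{k-2} (p_t + p_{t+k})x^t + p_{k-1}x^{k-1} + \sum_{t=n-k}^{n-1} p_{t-n+k}\, x^t & (2 \le k \le n/2).
\end{cases}$$

2) $S_2 \bmod f(x)$: Let $a_H(x)b_H(x)x^{n-k} = \sum_{t=-n+3k}^{n+k-2} q_t x^t$, where $q_t = \sum_{i+j=t+n-k} a_i b_j$; then $S_2$ is represented as

$$S_2 = \sum_{t=-n+3k}^{2k-1} q_t x^t + \sum_{t=2k}^{n+k-2} (q_t + q_{t-n+k})x^t + \sum_{t=n+k-1}^{2n-2} q_{t-n+k}\, x^t.$$

Because the reduction process differs according to the value of $k$, we consider two cases as follows:

$$S_2 = \begin{cases}
\displaystyle\sum_{t=-n+3k}^{1} q_t x^t + \sum_{t=2}^{n-1} (q_t + q_{t-n+k})x^t + \sum_{t=n}^{2n-2} q_{t-n+k}\, x^t & (k = 1) \\[4pt]
\displaystyle\sum_{t=-n+3k}^{2k-1} q_t x^t + \sum_{t=2k}^{n+k-2} (q_t + q_{t-n+k})x^t + \sum_{t=n+k-1}^{2n-2} q_{t-n+k}\, x^t & (2 \le k \le n/2).
\end{cases} \tag{4}$$

Some partial terms in (4) need to be reduced as follows. When $k = 1$,

$$\sum_{t=n}^{2n-2} q_{t-n+k}\, x^t \equiv \sum_{t=0}^{n-2} q_{t+k}\, x^t + \sum_{t=1}^{n-1} q_t x^t. \tag{5}$$

Also, when $2 \le k \le n/2$, we have

$$\begin{aligned}
&\sum_{t=n}^{n+k-2} (q_t + q_{t-n+k})x^t + \sum_{t=n+k-1}^{2n-2} q_{t-n+k}\, x^t \\
&\quad\equiv \sum_{t=n}^{n+k-2} (q_t + q_{t-n+k})(x^{t-n+k} + x^{t-n}) + \sum_{t=n+k-1}^{2n-k-1} q_{t-n+k}\,(x^{t-n+k} + x^{t-n}) \\
&\qquad + \sum_{t=2n-k}^{2n-2} q_{t-n+k}\,(x^{t-2n+k} + x^{t-2n+2k} + x^{t-n}) \\
&\quad= \sum_{t=0}^{k-1} q_{t+k}\, x^t + \sum_{t=k}^{n-2} (q_t + q_{t+k})x^t + q_{n-1}x^{n-1}.
\end{aligned} \tag{6}$$

Therefore, from (4)–(6),

$$S_2 \bmod f(x) = \begin{cases}
\displaystyle\sum_{t=-n+3k}^{-1} q_t x^t + (q_0 + q_k) + q_2 x + \sum_{t=2}^{n-2} (q_{t+k} + q_{t-n+k})x^t + q_{k-1}x^{n-1} & (k = 1) \\[4pt]
\displaystyle\sum_{t=-n+3k}^{-1} q_t x^t + \sum_{t=0}^{k-1} (q_t + q_{t+k})x^t + \sum_{t=k}^{2k-1} q_{t+k}\, x^t + \sum_{t=2k}^{n-2} (q_{t+k} + q_{t-n+k})x^t + q_{k-1}x^{n-1} & (2 \le k \le (n-2)/2) \\[4pt]
\displaystyle\sum_{t=-n+3k}^{-1} q_t x^t + \sum_{t=0}^{k-1} (q_t + q_{t+k})x^t + \sum_{t=k}^{n-2} q_{t+k}\, x^t + q_{k-1}x^{n-1} & (k = (n-1)/2) \\[4pt]
\displaystyle\sum_{t=-n+3k}^{-1} q_t x^t + \sum_{t=0}^{k-1} (q_t + q_{t+k})x^t + \sum_{t=k}^{n-2} q_{t+k}\, x^t & (k = n/2).
\end{cases} \tag{7}$$

3) $S_3 \bmod f(x)$: For a brief representation, we denote $\tilde{a}_i = a_i$ for $k \le i < n-k$ and $\tilde{a}_i = \bar{a}_i$ for $n-k \le i \le n-1$. Then $S_3 = a_S(x)b_S(x)x^{n-k} = \sum_{t=-n+3k}^{n+k-2} r_t x^t$, where $r_t = \sum_{i+j=t+n-k} \tilde{a}_i \tilde{b}_j$. When $k = 1$, $S_3$ is represented as $\sum_{t=-n+3k}^{n-1} r_t x^t$. Otherwise $S_3$ is described by the following formula:

$$\begin{aligned}
S_3 &= \sum_{t=-n+3k}^{n-1} r_t x^t + \sum_{t=n}^{n+k-2} r_t x^t \\
&\equiv \sum_{t=-n+3k}^{n-1} r_t x^t + \sum_{t=n}^{n+k-2} r_t\,(x^{t-n} + x^{t-n+k}) \bmod f(x) \\
&= \sum_{t=-n+3k}^{n-1} r_t x^t + \sum_{t=0}^{k-2} r_{t+n}\, x^t + \sum_{t=k}^{2k-2} r_{t+n-k}\, x^t \\
&= \sum_{t=-n+3k}^{k-2} (r_t + r_{t+n})x^t + r_{k-1}x^{k-1} + \sum_{t=k}^{2k-2} (r_t + r_{t+n-k})x^t + \sum_{t=2k-1}^{n-1} r_t x^t.
\end{aligned} \tag{8}$$

Therefore, we have

$$S_3 \bmod f(x) = \begin{cases}
\displaystyle\sum_{t=-n+3k}^{n-1} r_t x^t & (k = 1) \\[4pt]
\displaystyle\sum_{t=-n+3k}^{k-2} (r_t + r_{t+n})x^t + r_{k-1}x^{k-1} + \sum_{t=k}^{2k-2} (r_t + r_{t+n-k})x^t + \sum_{t=2k-1}^{n-1} r_t x^t & (2 \le k \le n/2).
\end{cases} \tag{9}$$

We have now obtained all entries of $S_i \bmod f(x)$, where $i = 1$, 2, and 3.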
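As a sanity check on the reduction step, the closed form for $S_1 \bmod f(x)$ can be compared against a direct bit-by-bit reduction using $x^n \equiv 1 + x^k$. The sketch below assumes the integer bitmask encoding (bit $t$ of `p` holds $p_t$); the names `reduce_trinomial` and `s1_closed_form` are hypothetical, and the closed form coded here is the two-case expression for $S_1 \bmod f(x)$ given above.

```python
def reduce_trinomial(s: int, n: int, k: int) -> int:
    """Reduce s modulo x^n + x^k + 1 by folding each high bit x^t (t >= n)."""
    for t in range(s.bit_length() - 1, n - 1, -1):
        if (s >> t) & 1:
            # x^t = x^(t-n+k) + x^(t-n)  (mod f)
            s ^= (1 << t) ^ (1 << (t - n + k)) ^ (1 << (t - n))
    return s


def s1_closed_form(p: int, n: int, k: int) -> int:
    """Closed form of S1 mod f(x); p encodes p_0 .. p_{2k-2} of a_L(x) b_L(x)."""
    def bit(t: int) -> int:
        return (p >> t) & 1

    if k == 1:
        return bit(0) | (bit(0) << (n - 1))       # p_0 + p_0 x^(n-1)
    out = 0
    for t in range(k - 1):                        # t = 0 .. k-2
        out |= (bit(t) ^ bit(t + k)) << t         # (p_t + p_{t+k}) x^t
    out |= bit(k - 1) << (k - 1)                  # p_{k-1} x^(k-1)
    for t in range(n - k, n):                     # t = n-k .. n-1
        out |= bit(t - n + k) << t                # p_{t-n+k} x^t
    return out


# S1 = p + p * x^(n-k); compare the two reductions for every p when n = 10, k = 3.
n, k = 10, 3
print(all(reduce_trinomial(p ^ (p << (n - k)), n, k) == s1_closed_form(p, n, k)
          for p in range(1 << (2 * k - 1))))
```

The same pattern (build the unreduced sum, reduce it directly, compare with the closed form) could be applied to (7) and (9), provided the negative-exponent terms are handled by a scaling convention of the reader's choice.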


B. Formulation Using the Variant of KOA

Since the closed-form expressions for the coefficients of $C$ in terms of $S_i \bmod f(x)$, where $i = 1$, 2, and 3, differ according to the range of $k$, we consider seven cases:

1) $k = 1$
2) $1 < k < n/3$
3) $k = n/3$
4) $k = (n+1)/3$
5) $(n+1)/3 < k < (n-1)/2$
6) $k = (n-1)/2$
7) $k = n/2$.

Please note that when $-n+3k \le t \le k-1$, $q_t$ is the same as $r_t$. Thus, $r_t$ can be replaced by $q_t$ and we do not need to compute the sum of $q_t$ and $r_t$ over $GF(2)$. According to the above cases, the reduced result $C$ is expressed differently. For example, we consider Case 3) as follows:

3) $k = n/3$:

$$\begin{aligned}
C &= \sum_{t=0}^{k-2} (p_t + p_{t+k} + q_{t+k} + r_{t+n})x^t + (p_{k-1} + q_{2k-1})x^{k-1} \\
&\quad + \sum_{t=k}^{2k-2} (q_{t+k} + r_t + r_{t+n-k})x^t + (q_{3k-1} + r_{2k-1})x^{2k-1} \\
&\quad + \sum_{t=2k}^{n-2} (p_{t-n+k} + q_{t+k} + q_{t-n+k} + r_t)x^t + (p_{k-1} + q_{k-1} + r_{n-1})x^{n-1}.
\end{aligned} \tag{10}$$

The reduced results $C$ for the other cases are given in electronic appendix A, available at http://www.youngincho.com/multiplier_appendix.

III. ARCHITECTURE

In this section, we present a bit-parallel architecture of the proposed multiplier using the variant of KOA. From the definitions of $p_t$, $q_t$ and $r_t$, some of their partial sums can be reused.

Remark 1: When $k < n/2$ and $-n+3k \le t \le k-1$, we do not need to compute $r_t$ since $q_t$ is equal to $r_t$. Thus, $(n-2k)(n-2k+1)/2$ AND gates and $(n-2k-1)(n-2k)/2$ XOR gates can be saved.

If we denote by $\sigma_t$ the sum $\sum a_i b_j$ over $i + j = t + n - k$ with $k \le i, j \le n-k-1$, where $k \le t \le n-k-2$ and $k < (n-1)/2$, then

$$q_t = \sum_{\substack{i+j=t+n-k \\ \max(i,j) \ge n-k}} a_i b_j + \sigma_t \quad\text{and}\quad r_t = \sum_{\substack{i+j=t+n-k \\ \max(i,j) \ge n-k}} \tilde{a}_i \tilde{b}_j + \sigma_t.$$

Thus, the signal of $\sigma_t$ can be reused. We summarize this fact as Remark 2.

Remark 2: For $k < (n-1)/2$, we need to compute $\sigma_t$ ($k \le t \le n-k-2$) only once. Thus, $(n-2k-1)(n-2k)/2$ AND gates and $(n-2k-2)(n-2k-1)/2$ XOR gates can be saved.

Theorem 1 provides the space complexity of the proposed multiplier.

Theorem 1: The space complexity of the proposed multiplier is

$$\#\text{AND} = n^2 - k^2$$

and

$$\#\text{XOR} = \begin{cases}
n^2 - 1 - (k-1)^2 & (k = 1,\ (n-1)/2) \\
n^2 - 1 - k^2 + k & (1 < k < n/3) \\
n^2 - 1 - n - k^2 + 4k & (n/3 \le k < (n-1)/2) \\
n^2 - (k-1)^2 & (k = n/2).
\end{cases}$$

Proof: By the definition of $\bar{a}_i$ and $\bar{b}_j$, each requires $k$ XOR gates, so we need $2k$ XOR gates. The number of two-input gates for $p_t$ is $k^2$ AND gates and $(k-1)^2$ XOR gates. Each of $q_t$ and $r_t$ requires $(n-k)^2$ AND gates and $(n-k-1)^2$ XOR gates. However, from Remarks 1 and 2, we can save $(n-2k)^2$ AND gates and $(n-2k-1)^2$ XOR gates for Cases 1)–5) and $(n-2k)(n-2k+1)/2$ AND gates for Case 6). Once the partial sums $p_t$, $q_t$ and $r_t$ are obtained, they are summed to generate the coordinate $c_t$. This requires $2n + 2k - 4$ XOR gates for all cases. However, $q_{t+k} + r_{t+n}$ ($0 \le t \le k-2$) and $r_t + r_{t+n-k}$ ($k \le t \le 2k-2$) exist for Cases 2)–5). Note that $r_{t+n}$ ($0 \le t \le k-2$) and $\sigma_{t+k}$ ($0 \le t \le n-2k-2$) appear in those partial sums by Remark 2. Thus we need to compute $\sigma_{t+k} + r_{t+n}$ only once. If $n-2k-1 \ge k-1$ then $k-1$ XOR gates can be saved in computing $\sigma_{t+k} + r_{t+n}$ for Case 2). Otherwise, $n-2k-1$ XOR gates can be saved for Cases 3)–5). Therefore, $n^2 - k^2$ AND gates are required for all cases. We summarize the total number of XOR gates of the multiplier for each case as follows:

$$\begin{aligned}
1),\ 6)&:\ 2k + (k-1)^2 + 2(n-k-1)^2 - (n-2k-1)^2 + 2n + 2k - 4 = n^2 - 1 - (k-1)^2. \\
2)&:\ n^2 - 1 - (k-1)^2 - (k-1) = n^2 - 1 - (k^2 - k). \\
3)\text{–}5)&:\ n^2 - 1 - (k-1)^2 - (n-2k-1) = n^2 - 1 - (n + k^2 - 4k). \\
7)&:\ n^2 - (k-1)^2.
\end{aligned}$$

Now we discuss the time complexity of the proposed multiplier. The design procedure of the proposed multiplier is only applicable in the case where the field-generating irreducible polynomial is fixed. So we consider Case 3) again to discuss the time complexity of the multiplier. For simplicity, we do not present detailed computational procedures of gate delays for the other cases here. However, we note that we have obtained these values for all irreducible trinomials and they are presented in appendix B. From (10), the coefficients of $C$ can be rewritten as

$$\begin{aligned}
c_t &= [p_t] + [p_{t+k} + (q_{t+k} - \sigma_{t+k})] + [\sigma_{t+k} + r_{t+n}] && (0 \le t \le k-2) && (11) \\
c_t &= [p_{k-1}] + [q_{2k-1}] && (t = k-1) && (12) \\
c_t &= [q_{t+k} + (r_t - \sigma_t)] + [\sigma_t + r_{t+n-k}] && (k \le t \le 2k-2) && (13) \\
c_t &= [q_{3k-1} + r_{2k-1}] && (t = 2k-1) && (14) \\
c_t &= [p_{t-2k}] + [q_{t+k} + q_{t-n+k} + r_t] && (2k \le t \le n-2) && (15) \\
c_t &= [p_{k-1}] + [q_{k-1} + r_{n-1}] && (t = n-1). && (16)
\end{aligned}$$

Terms in square brackets should be treated as one single term. Each $\bar{a}_i \bar{b}_j$ of $r_t$ incurs a gate delay of $T_A + T_X$ while $a_i b_j$ incurs only $T_A$. In order to reduce the overall delay needed to compute $c_t$, we perform an XOR operation on a pair of $a_i b_j$ terms while each $\bar{a}_i \bar{b}_j$ is being generated, exploiting parallelism. Thus we consider the critical path length from time $T_A + T_X$. Table I summarizes the gate delay of each partial sum. From (11)–(16) and Table I, we compute the critical path length of $c_t$ ($0 \le t \le n-1$).
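The gate counts stated in Theorem 1 are easy to tabulate. The following sketch simply encodes the four XOR cases and the AND count, alongside the $n^2$ AND and $n^2 - 1$ XOR gates quoted in the introduction for earlier Mastrovito multipliers. The function names are hypothetical, and the $(n, k)$ pairs used are illustrative values only; they are not claimed to give irreducible trinomials.

```python
def and_gates(n: int, k: int) -> int:
    """#AND of the proposed multiplier (Theorem 1)."""
    return n * n - k * k


def xor_gates(n: int, k: int) -> int:
    """#XOR of the proposed multiplier (Theorem 1), for 1 <= k <= n/2."""
    if not 1 <= k <= n / 2:
        raise ValueError("Theorem 1 assumes 1 <= k <= n/2")
    if 2 * k == n:                       # k = n/2
        return n * n - (k - 1) ** 2
    if k == 1 or 2 * k == n - 1:         # k = 1 or k = (n-1)/2
        return n * n - 1 - (k - 1) ** 2
    if 3 * k < n:                        # 1 < k < n/3
        return n * n - 1 - k * k + k
    return n * n - 1 - n - k * k + 4 * k  # n/3 <= k < (n-1)/2


def mastrovito_gates(n: int):
    """(#AND, #XOR) quoted in the introduction for earlier Mastrovito multipliers."""
    return n * n, n * n - 1


for n, k in [(10, 1), (10, 3), (9, 3), (11, 5), (10, 5)]:
    print(n, k, and_gates(n, k), xor_gates(n, k), mastrovito_gates(n))
```

For $k = 1$ the XOR count coincides with $n^2 - 1$, so the saving there comes only from the single AND gate; the gap widens as $k$ grows toward $n/2$.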

It can be seen from (11)–(16) that $c_t$ ($0 \le t \le n-1$) is a sum of one to three terms.

1) $c_t$ has one term: From (14), the gate delay of $c_{2k-1}$ is $\lceil \log_2(\lceil (n+k-1-(3k-1))/2 \rceil + n+k-1-(2k-1)) \rceil T_X = \lceil \log_2(\lceil (n-2k)/2 \rceil + n-k) \rceil T_X = \lceil \log_2(\lceil (3n-4k)/2 \rceil) \rceil T_X$.

2) $c_t$ is a sum of two terms: From (12), the gate delays of $p_{k-1}$ and $q_{2k-1}$ are $\lceil \log_2(\lceil k/2 \rceil) \rceil T_X$ and $\lceil \log_2(\lceil (n-k)/2 \rceil) \rceil T_X$, respectively. Since $k/2 < (n-k)/2$, the critical path length for $c_{k-1}$ is $(\lceil \log_2 \lceil (n-k)/2 \rceil \rceil + 1)T_X$. From (13), the gate delays of $q_{t+k} + (r_t - \sigma_t)$ and $\sigma_t + r_{t+n-k}$, where $\sigma_t$ is the shared partial sum of Remark 2, are $\lceil \log_2(\lceil (n-k-1-t)/2 \rceil + 2 + 2t) \rceil T_X$ and $\lceil \log_2(\lceil (n-3k-1-t)/2 \rceil - 1 - t) \rceil T_X$, respectively. Since $\lceil \log_2(\lceil (n+2k-3)/2 \rceil) \rceil T_X$ is the maximum when $t = 2k-2$, $(\lceil \log_2(\lceil (n+2k-3)/2 \rceil) \rceil + 1)T_X$ is the critical path length for $c_t$ ($k \le t \le 2k-2$). From (15) and (16), the critical path length for $c_t$ ($2k \le t \le n-2$) is $(\lceil \log_2((2n-k-2)/2) \rceil + 1)T_X$ when $t = 2k$, and the critical path length for $c_{n-1}$ is $(\lceil \log_2(n/2) \rceil + 1)T_X$ in the same way.

3) $c_t$ is a sum of three terms: Write $c_t$ as $c_t = d_t^{(1)} + d_t^{(2)} + d_t^{(3)}$. Let $t_1$, $t_2$ and $t_3$ denote the critical path lengths for the partial sums $d_t^{(1)}$, $d_t^{(2)}$ and $d_t^{(3)}$, respectively, and assume $t_1 \le t_2 \le t_3$. Then $c_t$ can be generated as $c_t = (d_t^{(1)} + d_t^{(2)}) + d_t^{(3)}$ and the critical path length for $c_t$ is

$$\max\{t_2 + 2T_X,\ t_3 + T_X\}. \tag{17}$$

From (11), the gate delays of $p_t$, $p_{t+k} + q_{t+k} - \sigma_{t+k}$ and $\sigma_{t+k} + r_{t+n}$ are $\lceil \log_2(\lceil (t+1)/2 \rceil) \rceil T_X$, $\lceil \log_2(\lceil ((k-1-t)/2) + t + 1 \rceil) \rceil T_X = \lceil \log_2(\lceil (k+1+t)/2 \rceil) \rceil T_X$ and $\lceil \log_2(\lceil ((n-2k)/2) + k - 1 - t \rceil) \rceil T_X = \lceil \log_2(\lceil (n-3-3t)/2 \rceil) \rceil T_X$, respectively. Then the critical path length for $c_t$ ($0 \le t \le k-2$) is $\lceil \log_2(n/4) + 2 \rceil T_X$ when $t = (k-1)/2$, from (17).

Table II summarizes the gate delays of the $c_t$'s. The maximum of these values is the time complexity of the multiplier. From Table II, we know that the time complexity of the multiplier for $f(x) = x^n + x^{n/3} + 1$ is

$$T_A + (\lceil \log_2(5n/3 - 2) \rceil + 1)\, T_X.$$

TABLE I. GATE DELAYS OF THE PARTIAL SUMS

TABLE II. GATE DELAYS OF $c_t$'s FOR CASE 3)

A. Comparison

We discuss the efficiency of the multiplier by comparing it to existing multipliers. For the range $100 \le n \le 1{,}023$, there are 1405 irreducible trinomials and 493 values of $n$. The time complexity of the proposed multiplier is $T_A + (\lceil \log_2 n \rceil + 1)T_X$ for 331 values of $n$ and $T_A + (\lceil \log_2 n \rceil + 2)T_X$ for 162 values of $n$. The complexities of the proposed multiplier and some other multipliers for the irreducible trinomials are listed in Table III. As shown in Table III, the space complexity of the proposed multiplier improves on the previously known best result for the same class of fields. The best known time complexity of PB multipliers is $T_A + (\lceil \log_2(n - 1 + \lceil k/2 \rceil + 1) \rceil)T_X$ [16]. In comparison to the other multipliers, [16] requires $(k^2 + 3k)/2 + 1$ more XOR gates. Table IV shows that our space complexity gain is greater than the time complexity loss.

TABLE III. COMPLEXITIES COMPARISON OF SOME TRINOMIAL-BASED MULTIPLIERS

TABLE IV. EFFICIENCY OF THE PROPOSED MULTIPLIER

†: (The number of irreducible trinomials whose time complexity is less than or equal to that of the existing multipliers)/(The number of irreducible trinomials where $100 \le n \le 1023$ and $k \le n/2$).

IV. CONCLUSION

In this paper, we proposed a new non-pipelined Mastrovito multiplier for all irreducible trinomials. The architecture of the proposed multiplier is applicable to the case where the field-generating irreducible trinomial is fixed, since it depends on the degree $k$ of the middle term of $f(x)$. For all irreducible trinomials, the space complexity of the multiplier is lower than that of Mastrovito multipliers. Considering the low space complexity with no sacrifice of time efficiency, we expect it to be comparable to any other Mastrovito multiplier.

REFERENCES
[1] A. Halbutogullari and Ç. K. Koç, "Mastrovito multiplier for general irreducible polynomials," IEEE Trans. Comput., vol. 49, no. 5, pp. 503–518, May 2000.
[2] A. Reyhani-Masoleh and M. A. Hasan, "Low complexity bit parallel architectures for polynomial basis multiplication over $GF(2^m)$," IEEE Trans. Comput., vol. 53, no. 8, pp. 945–959, Aug. 2004.
[3] B. Sunar and Ç. K. Koç, "Mastrovito multiplier for all trinomials," IEEE Trans. Comput., vol. 48, no. 5, pp. 522–527, May 1999.
[4] C. Paar, "Efficient VLSI architectures for bit-parallel computation in Galois fields," Ph.D. dissertation, Univ. Essen, Düsseldorf, Germany, 1994.
[5] E. D. Mastrovito, "VLSI architecture for computation in Galois fields," Dept. Elect. Eng., Linköping University, Linköping, Sweden, 1991.
[6] E. D. Mastrovito, "VLSI architecture for multiplication over finite fields," in Proc. Appl. Algebra, Algebraic Algorithms, Error Correct. Codes, 6th Int. Conf. (AAECC-6), 1998, pp. 297–309.
[7] G. Shou, Z. Mao, Y. Hu, Z. Guo, and Z. Qian, "Low complexity architecture of bit parallel multipliers for $GF(2^m)$," Electron. Lett., vol. 46, no. 19, pp. 1326–1327, 2010.
[8] H. Wu, "Bit-parallel finite field multiplier and squarer using polynomial basis," IEEE Trans. Comput., vol. 51, no. 7, pp. 750–758, Jul. 2002.
[9] H. Wu, M. A. Hasan, and I. F. Blake, "New low-complexity bit-parallel finite field multipliers using weakly dual bases," IEEE Trans. Comput., vol. 47, no. 11, pp. 1223–1234, Nov. 2002.
[10] H. Shen and Y. Jin, "Low complexity bit parallel multiplier for $GF(2^n)$ generated by equally-spaced trinomials," Inform. Process. Lett., vol. 107, no. 6, pp. 211–215, 2008.
[11] J. L. Imaña, J. M. Sánchez, and F. Tirado, "Bit-parallel finite field multipliers for irreducible trinomials," IEEE Trans. Comput., vol. 55, no. 5, pp. 520–533, May 2006.
[12] N. Nedjah and L. de Macedo Mourelle, "A review of modular multiplication methods and respective hardware implementations," Informatica, vol. 30, pp. 111–129, 2006.
[13] N. Petra, D. De Caro, and A. G. M. Strollo, "A novel architecture for Galois fields $GF(2^m)$ multipliers based on Mastrovito scheme," IEEE Trans. Comput., vol. 56, no. 11, pp. 1470–1483, Nov. 2007.
[14] P. L. Montgomery, "Five, six, and seven-term Karatsuba-like formulae," IEEE Trans. Comput., vol. 54, no. 3, pp. 362–369, Mar. 2005.
[15] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Its Applications. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[16] S. Lee, S. Jung, C. Kim, J. Yoon, J. Koh, and D. Kim, "Design of bit parallel multiplier with lower time complexity," in Proc. ICICS, 2004, pp. 127–139.
[17] T. Zhang and K. K. Parhi, "Systematic design of original and modified Mastrovito multipliers for general irreducible polynomials," IEEE Trans. Comput., vol. 50, no. 7, pp. 734–749, Jul. 2001.
