
International Conference on Communication and Signal Processing (ICCOS 2011), Karunya University, Coimbatore-641 114

17-18 March 2011

VLSI IMPLEMENTATION OF CRYPTOGRAPHIC ALGORITHM DESIGNED WITH MULTIMODE MULTIPLIER


M. PRIYADHARSINI
Department of Electronics and Communication Engineering, Kongu Engineering College
Perundurai, India. Email: mullai_muthu@yahoo.co.in

E. D. KANMANI RUBY
Department of Electronics and Communication Engineering, Kongu Engineering College
Perundurai, India. Email: bewin_santhkumar@yahoo.co.in

Abstract. This work presents a highly efficient multimode multiplier supporting prime-field, polynomial-field, and matrix-vector multiplications, based on an asymmetric word-based Montgomery multiplication (MM) algorithm. In this paper, the multimode multiplier supporting matrix-vector and dual-field multiplication is designed and simulated using the ModelSim software tool. It can then be integrated into cryptographic algorithms to verify its performance.

Keywords: AES, Asymmetric Word-Based Modular Multiplication Algorithm, Prime Field, Galois Field (GF)

INTRODUCTION

Many communication applications have been invented to make daily life more convenient, such as credit card transaction systems and Internet transaction services. However, transmitting private data over an insecure network carries significant risk and may result in huge losses. One of the most useful methods of protecting data is to employ a cryptographic system, since the design of cipher algorithms is based on advanced mathematical theory. Recently, there have been many works on designing cost-effective encryption hardware for portable applications [1]-[12]. Some works [1]-[5] focus on area reduction of AES, while others [6]-[10] propose to reduce the hardware cost of both ECC and RSA cryptosystems. Three different area-reduction strategies for the InvSubBytes/SubBytes transformations are proposed in [3], [8]-[10]. In our survey, most papers [1]-[10] focus on area minimization of a single type of cryptosystem (either AES/DES or RSA/ECC). When security on portable applications is considered, users often need both types of cryptographic hardware to speed up certain applications. Thus, it is necessary to study efficient architectures that support dual-type cryptosystems with higher performance but lower area cost. The purpose of this paper is to design a cost-effective multiplier that can enhance the performance of multiple encryption algorithms, particularly AES. Based on a word-based Montgomery multiplication (MM) algorithm [12], [13], we propose a multimode multiplier supporting the essential operations used by AES and the public-key cryptosystems.

A. CIPHER ALGORITHMS

1) AES Algorithm

AES is a private-key block cipher algorithm composed of three key procedures: encryption, decryption, and round-key expansion. It processes 128-b data blocks using keys of three standard lengths: 128, 192, or 256 b. Figure 3.1 shows the AES algorithm. Each 128-b block of information is arranged as a 4 x 4 state and operated on by four primitive transformations. During the encryption/decryption process, the four primitive transformations are executed iteratively in $N_r$ rounds, where the value of $N_r$ is 10, 12, or 14, depending on which key size is selected. In the encryption procedure, the incoming data is first bitwise XORed with an initial key, and then the four transformations are executed in the following order: SubBytes, ShiftRows, MixColumns, and AddRoundKey. Notice that the MixColumns transformation is not performed in the last round. Since each round needs a round key, an initial key is used to generate all round keys before encryption/decryption. The execution sequence is reversed in the decryption process, where the inverse transformations are InvSubBytes, InvShiftRows, InvMixColumns, and AddRoundKey, respectively.

In the AES algorithm, the SubBytes transformation is a nonlinear byte substitution composed of two operations: 1) modular inversion over $GF(2^8)$, modulo the irreducible polynomial $p(x) = x^8 + x^4 + x^3 + x + 1$, and 2) an affine transformation defined as $y = Mx + v$ (with addition over $GF(2)$), where $M$ is an 8 x 8 b matrix, $v$ is an 8-b constant, and $x$/$y$ denotes the 8-b input/output.
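The two SubBytes steps described above can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the hardware design of this paper: the function names are hypothetical, and the concrete values of $M$ (rows with ones at columns i, i+4, i+5, i+6, i+7 mod 8) and $v$ = 0x63 are taken from the AES standard rather than from the text above.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) modulo p(x) = x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:       # degree reached 8: reduce by p(x)
            a ^= poly
        b >>= 1
    return r

def gf_inv(a):
    """Modular inversion over GF(2^8) as a^254; 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

# Assumption (AES standard): row i of M has ones at columns i, i+4, ..., i+7 mod 8.
AFFINE_ROWS = [sum(1 << ((i + k) % 8) for k in (0, 4, 5, 6, 7)) for i in range(8)]
V = 0x63  # Assumption (AES standard): the 8-b constant v.

def affine(x):
    """y = M x + v over GF(2); each output bit is the parity of (row AND x)."""
    y = 0
    for i, row in enumerate(AFFINE_ROWS):
        y |= (bin(row & x).count("1") & 1) << i
    return y ^ V

def sub_bytes(x):
    """SubBytes on one byte: inversion followed by the affine transformation."""
    return affine(gf_inv(x))
```

The inversion is written as repeated multiplication for readability; a hardware design would typically use composite-field arithmetic instead.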

Department of Electronics and Communication, Karunya University


Figure 3.1: AES Algorithm

In the MixColumns transformation, the 128-b data arranged as a 4 x 4 state are operated on column by column. The four elements of each column form a four-term polynomial that is multiplied by the constant polynomial $c(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ modulo $x^4 + 1$. The ShiftRows transformation is a simple operation in which each row of the state is cyclically shifted left by a different offset. The AddRoundKey transformation is a bitwise XOR of each round key and the current state.

2) MM Algorithm

Modular multiplication is the major operation of many popular public-key cryptosystems, and the MM algorithm, proposed by Montgomery in 1985, is the most effective algorithm for computing it. The MM algorithm replaces modular multiplication with a series of additions and right shifts. Given the inputs X, Y, and N, where N is an m-bit modulus and $0 \le X, Y < N$, the output of the MM algorithm is $XY2^{-m} \bmod N$.

Algorithm I: MM(X, Y, N)
Inputs: $X = (x_{m-1}, \ldots, x_1, x_0)_2$, $Y = (y_{m-1}, \ldots, y_1, y_0)_2$, $N = (n_{m-1}, \ldots, n_1, n_0)_2$, $0 \le X, Y < N$
Output: $S = XY2^{-m} \bmod N$
MM(X, Y, N) {
    S = 0;
    for (i = 0 to m-1) {
        P1: q_i = (S + x_i.Y) mod 2;    // parity generation
        P2: S = S + x_i.Y;              // accumulation
        P3: S = (S + q_i.N) / 2;        // reduction
    }
    P4: return S = (S >= N) ? S - N : S;    // final correction
}

The MM algorithm, shown in Algorithm I, consists of four phases: parity generation, accumulation, reduction, and final correction. We use the labels P1, P2, P3, and P4 to mark the four phases, respectively. As shown in Algorithm I, the for-loop iteratively generates the parity ($q_i$), accumulates each partial product, and performs the modular reduction. After the for-loop, the final correction adjusts the result to fall within the range [0, N).

B. AWBMM Algorithm

We modify the MM algorithm into an asymmetric word-based MM (AWBMM) algorithm. The asymmetric feature of the operand size helps us design an efficient multiplier to support MV and dual-field multiplications. In the AWBMM algorithm (Algorithm II), all operands are represented in word-based form, but the word widths of different operands may differ. For instance, an m-bit integer X is represented as u s-bit words ($s \times u = m$), and an m-bit integer Y is represented as v t-bit words ($t \times v = m$). Notice that $X_i$ denotes the ith word of X and that $X_i^{j:k}$ denotes the sequence of bits from the jth bit to the kth bit of $X_i$.

The variable T denotes the parity, and S, Z, k, and l are four variables used to store temporary values in the inner for-loop. Like the MM algorithm, the AWBMM algorithm has four operation phases: parity generation, accumulation, reduction, and final correction, marked with P1, P2, P3, and P4 in Algorithm II, respectively. The difference is that all operands in the AWBMM algorithm are processed word by word.

Algorithm II: AWBMM(X, Y, N)
Inputs: $X = (X_{u-1}, X_{u-2}, \ldots, X_1, X_0)_{2^s}$, $su = m$; $Y = (Y_{v-1}, Y_{v-2}, \ldots, Y_1, Y_0)_{2^t}$, $tv = m$, $at = s$; $N = (N_{v-1}, N_{v-2}, \ldots, N_1, N_0)_{2^t}$; $0 \le X, Y < N$; $W = -N^{-1} \bmod 2^s$
Output: $C = (C_{u-1}, C_{u-2}, \ldots, C_1, C_0)_{2^s} = XY2^{-m} \bmod N$
AWBMM(X, Y, N) {
    C = 0;
    for (i = 0 to u-1) {
        Z = 0;
        P1: T = (C_0 + X_i.Y_0).W mod 2^s;       // parity generation
        for (j = 0 to v-1) {                     // accumulation and reduction
            P2: S = C_j + X_i.Y_j + T.N_j + Z;
            P3: Z = S / 2^t; k = (j - v/u)/u; l = j % (v/u);
                if (v/u <= j <= v-1) C_k^{(l+1)t-1:lt} = S mod 2^t;
        }
        C_{u-1} = Z;
    }
    P4: return C = (C >= N) ? C - N : C;         // final correction
}
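The four phases of Algorithm I can be checked with a short Python sketch. The first function follows the bit-serial MM loop; the second is a simplified word-serial variant that scans X in s-bit words and uses $W = -N^{-1} \bmod 2^s$ for the parity. The second function is an illustrative assumption only: it keeps Y and N as whole integers and therefore does not model the asymmetric word split (t-bit words of Y and N) of Algorithm II. All function names are hypothetical, and N is assumed to be odd.

```python
def mont_mul_radix2(x, y, n, m):
    """Bit-serial Montgomery multiplication: returns x*y*2^(-m) mod n (n odd)."""
    s = 0
    for i in range(m):
        xi = (x >> i) & 1
        s += xi * y              # P2: accumulate the partial product x_i * Y
        q = s & 1                # P1: parity of S + x_i * Y
        s = (s + q * n) >> 1     # P3: add q*N to make the sum even, then halve
    return s - n if s >= n else s    # P4: final correction into [0, n)

def mont_mul_words(x, y, n, s_bits, u):
    """Word-serial variant: returns x*y*2^(-s_bits*u) mod n (n odd).

    Simplified sketch: X is scanned in u words of s_bits each, while Y and N
    are kept as whole integers (not split into t-bit words as in AWBMM).
    """
    r = 1 << s_bits
    w = (-pow(n, -1, r)) % r     # W = -N^(-1) mod 2^s (needs Python 3.8+)
    c = 0
    for i in range(u):
        xi = (x >> (s_bits * i)) & (r - 1)
        c += xi * y                        # accumulation
        t = ((c & (r - 1)) * w) & (r - 1)  # parity word T
        c = (c + t * n) >> s_bits          # reduction: low word becomes zero
    return c - n if c >= n else c          # final correction
```

Because the output carries the factor $2^{-m}$, a caller can verify a result by multiplying it by $2^m$ modulo N and comparing against the ordinary product XY mod N.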

The MV multiplication of (11) can be reformulated as eight vectors XORed together, one for each of the eight columns of the matrix M shown above.
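The column-XOR view of an MV multiplication can be illustrated with a small Python sketch. As a stand-in for the matrix M, the sketch uses the 8 x 8 binary matrix that multiplies a byte by a constant in $GF(2^8)$; this choice, the standard MixColumns coefficient matrix, and all function names are assumptions made for illustration, not the matrix of (11).

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def columns_of(c):
    """Columns C_0..C_7 of the binary matrix that multiplies by constant c."""
    return [gf_mul(c, 1 << i) for i in range(8)]

def mv_mul(cols, b):
    """y = M b over GF(2): XOR together the columns C_i whose bit b_i is 1."""
    y = 0
    for i, col in enumerate(cols):
        if (b >> i) & 1:
            y ^= col
    return y

# Standard AES MixColumns coefficients (rows of the circulant matrix).
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_column(col):
    """One MixColumns column: each output byte XORs four MV products."""
    out = []
    for row in MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= mv_mul(columns_of(coeff), byte)
        out.append(acc)
    return out
```

Each output byte of `mix_column` is the XOR of four MV products of eight column vectors each, that is, up to 32 8-b vectors XORed together.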

II. PROPOSED MULTIMODE MULTIPLIER

Based on the modified AES and MM algorithms, we propose a multimode multiplier to support both MV and dual-field multiplications. The multimode multiplier is modified from the dual-field multiplier, since both that multiplier and ours are designed based on the word-based MM algorithm. Therefore, the description in this section focuses on the matrix-vector multiplication.

A. Proposed Multimode 8x8 b Multiplier

Figure 3.4: Proposed multimode 8 x 8 b multiplier

Figure 3.4 shows the proposed multimode multiplier, using a multimode 8 x 8 b multiplier as an example. Figure 3.4(a) shows the eight partial products of an 8 x 8 b multiplication. The eight partial products are padded with zeros and partitioned into P1, P2, and P3, as shown in Figure 3.4(b). The eight new partial products of P2 are labeled pp1, pp2, ..., pp8 and fed into an XOR tree calculator (XTC), shown in Figure 3.4. It is named XTC because only the sum vectors of each carry-save adder (CSA) are XORed. In the XTC module, the square and circle at the output of a CSA or half adder (HA) denote the produced carry vector and sum vector, respectively. The carry vectors of each CSA and the final sum vector are sent to the Wallace tree accumulator (WTA) module. In the meantime, the other partial products in P1 and P3 are sent to the WTA as well for carry-save accumulation. Finally, an adder converts the WTA's outputs (carry and sum) into a normal binary number.

The final sum vector of the XTC is equal to the XOR of pp1, pp2, ..., pp8. By replacing {pp1, pp2, ..., pp8} with $\{C_0 b_0, C_1 b_1, \ldots, C_7 b_7\}$, the MV product can be obtained at the XTC module. In addition, the polynomial product is easily obtained by concatenating the sum vectors of P1, P2 (final sum vector), and P3. Hence, we get the MV product from the XTC module, the polynomial product from the WTA module, and the integer product from the final-stage adder.

B. Width Selection of Multimode Multiplier

So far, a novel multimode 8 x 8 b multiplier has been presented. The multiplier size is further enlarged to handle all MV multiplications needed in the new AES round function, which requires 64 MV multiplications executed concurrently in each Inv-/MixColumns transformation. Figure 3.5(a) shows the MixColumns transformation represented by 16 x 4 blocks. In the figure, a 128-b input is partitioned into 16 8-b vectors, and the new MixColumns coefficients are defined as {01*}, {02*}, and {03*}. Each block, e.g., {02*}B3, indicates an MV multiplication and produces an 8-b intermediate value. The positions of all blocks are carefully arranged so that the result of the MixColumns transformation is the 16 x 4 intermediate values XORed column by column. For example, the MixColumns transformation of the first column of the 4 x 4 state is represented by the four rightmost columns. The results, i.e., B0, B1, B2, B3, are obtained by vertically XORing the rightmost 4 x 4 intermediate values.

In fact, each 8-b result of the MixColumns transformation, such as B3, can be represented as 32 8-b vectors XORed together. The 32 8-b vectors of each column can therefore be concatenated into 32 128-b vectors. The MixColumns transformation is finally formulated as the XOR of 32 128-b vectors; hence, it needs a 128 x 32 b multiplier to perform the XOR operation. Figure 3.5(b) and (c) show the extension from a multimode 8 x 8 b multiplication (see Figure 3.4) to a multimode 128 x 32 b multiplication. Figure 3.5(b) shows the partial products of a 128 x 32 b multiplication. In the same way, the 32 128-b partial products are padded with zeros in the upper- and bottom-right corners, as shown in Figure 3.5(c). This does not affect the result of the dual-field multiplication, since only zeros are padded into the partial products. The central 32 128-b partial products with the padded zeros are computed by an enlarged XTC. Then, if the new Inv-/MixColumns transformations are needed, the central partial products are simply replaced with the new 32 128-b vectors arranged as in Figure 3.5(a).

Figure 3.5: Arrangement of new MixColumns coefficients and 128-b data
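The idea that one array of partial products can serve several multiplication modes can be modeled behaviorally in Python. This is only a functional sketch under stated assumptions: it ignores the zero padding, the P1/P2/P3 partitioning, and the CSA/WTA structure of the actual multiplier, and all function names are hypothetical.

```python
def partial_products(x, y, width=8):
    """Rows of a width x width multiplication: x shifted by i where y_i = 1."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]

def integer_product(x, y, width=8):
    """Integer (prime-field) mode: partial products are added with carries."""
    return sum(partial_products(x, y, width))

def polynomial_product(x, y, width=8):
    """Polynomial-field mode: partial products are XORed, so no carries."""
    acc = 0
    for pp in partial_products(x, y, width):
        acc ^= pp
    return acc

def mv_product(columns, b):
    """MV mode: the partial products are replaced by C_i * b_i and XORed."""
    acc = 0
    for i, col in enumerate(columns):
        if (b >> i) & 1:
            acc ^= col
    return acc
```

For example, `polynomial_product(0x57, 0x83)` gives the unreduced polynomial product 0x2B79, while `integer_product(0x57, 0x83)` gives the ordinary integer product of the same operands; the only difference between the two modes is whether carries are propagated.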

III. CIPHER CORE ARCHITECTURE

Figure 3.6 shows the proposed cipher core architecture based on a multimode 128 x 32 b multiplier (highlighted by the solid rectangle). Other components include dedicated hardware for the AES function (enclosed by the dashed rectangle), a storage element unit, a main controller, an I/O controller, and an I/O interface.

Figure 3.6 : Proposed cipher core architecture based on a multimode 128x32b multiplier.


IV. RESULTS

In this paper, a high-efficiency and high-performance cipher core based on the multimode multiplier is presented. The multimode multiplier also supports modular addition, subtraction, and multiplication in both the prime and polynomial fields. In addition, the integrated architecture supports more features than other low-cost AES designs, and it supports scalable key sizes for the MM algorithm by changing the storage size. Because the proposed integrated architecture shares hardware resources efficiently, it saves more area than the straightforward method of directly integrating different cipher cores into a single core architecture. When the hardware efficiency for both the AES and MM algorithms is compared, the proposed architecture is expected to achieve higher efficiency.

In [2], the authors present a fully rolled inner-pipelined architecture that uses only two 8-b basis conversion units ($GF(2^4)$ and $GF(2^2)$), while others need 16 conversion and 16 inverse-conversion units.

CONCLUSION

In this paper, a high-efficiency and high-performance cipher core based on the multimode multiplier is presented. The multimode multiplier also supports modular addition, subtraction, and multiplication in both the prime and polynomial fields. In the next phase, composite field arithmetic will be used to decompose the Inv-/SubBytes transformations, so that the AES round function is regrouped into new linear and nonlinear functions. The new Inv-/MixColumns transformations, which are the most area-consuming part, are reformulated as multiple MV multiplications and are then executed efficiently by the proposed multimode multiplier.

REFERENCES

[1] Alam M., Ray S., Mukhopadhyay D., Ghosh S., RoyChowdhury D., and Sengupta I. (2007) "An area optimized reconfigurable encryptor for AES-Rijndael," in Proc. Conf. DATE, Apr., pp. 1-6.
[2] Wang C.-H., Chuang C.-L., and Wu C.-W. (2010) "An efficient multimode multiplier supporting AES and fundamental operations of public-key cryptosystems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, Apr., p. 553.
[3] Lu C.-C. and Tseng S.-Y. "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," in Proc. IEEE Int. Conf. Application-Specific Systems, Architectures, and Processors (ASAP'02), 1063-6862/02.
[4] Crowe F., Daly A., and Marnane W. (2005) "A scalable dual mode arithmetic unit for public key cryptosystems," in Proc. Int. Conf. ITCC, Apr.
[5] Eslami Y., Sheikholeslami A., Gulak P. G., Masui S., and Mukaida K. (2006) "An area-efficient universal cryptography processor for smart cards," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 1, pp. 43-56.
[6] Harris D., Krishnamurthy R., Anders M., Mathew S., and Hsu S. (2005) "An improved unified scalable radix-2 Montgomery multiplier," in Proc. 17th IEEE Symp. Comput. Arithmetic.
[7] Koç Ç. K. and Acar T. (1997) "Montgomery multiplication in GF(2^k)," Des., Codes Cryptography, vol. 14, no. 1, pp. 57-69.
[8] Lai Y.-K., Chang L.-C., Chen L.-F., Chou C.-C., and Chiu C.-W. (2004) "A novel memoryless AES cipher architecture for networking applications," in Proc. IEEE ISCAS.
[9] Li H. and Li J. (2007) "A new compact architecture for AES with optimized ShiftRows operation," in Proc. IEEE ISCAS, pp. 1851-1854.
[10] Lu C.-C. and Tseng S.-Y. (2002) "Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter," in Proc. IEEE Int. Conf. Appl.-Specific Syst., Architectures, Processors, Jul., pp. 277-285.
[11] Mangard S., Aigner M., and Dominikus S. (2003) "A highly regular and scalable AES hardware architecture," IEEE Trans. Comput., vol. 52, no. 4.
[12] Montgomery P. L. (1985) "Modular multiplication without trial division," Math. Comput., vol. 44, no. 170, pp. 519-521.
