
A VLSI Implementation of the Blowfish Encryption/Decryption Algorithm

Michael C.-J. Lin, Youn-Long Lin


Department of Computer Science
National Tsing Hua University
Hsin-Chu, Taiwan 30043, R.O.C.

Supported in part by the National Science Council, R.O.C., under contract no. NSC 88-2215-E-007-025.
2000 IEEE, ISBN 0-7803-5974-7

Abstract: We propose an efficient hardware architecture for the Blowfish algorithm [1]. The speed is up to 4 bit/clock, which is 9 times faster than a Pentium. By applying an operator-rescheduling method, the critical path delay is improved by 21.7%. We have successfully implemented it using the Compass cell library targeted at a 0.6 µm TSMC SPTM CMOS process. The die size is 5.7 x 6.1 mm² and the maximum frequency is 50 MHz.

I. INTRODUCTION

Cryptography is widely applied to protect digital data. There are many kinds of cryptographic algorithms, and most of them require a secret key to encode the data. Once a cryptographic algorithm has been applied to our data, others cannot easily recover the original data without the secret key, so the private data are protected.

The Blowfish algorithm was designed by Bruce Schneier in 1993. It is a symmetric block cipher with a block size of 64 bits. The secret key of Blowfish ranges from 32 bits to 448 bits.

Blowfish has been examined for five years. Serge Vaudenay has examined weak keys in Blowfish. Vincent Rijmen's Ph.D. thesis includes a second-order differential attack on 4-round Blowfish [2]. With a 448-bit key, an exhaustive search must examine 2^448 combinations.

The Blowfish algorithm has many advantages. It is suitable and efficient for hardware implementation. Besides, it is unpatented and no license is required.

The proposed architecture can produce 4 bits of data per clock. Scan chains are also included in this architecture for testing. The die size of the chip is 5.7 x 6.1 mm², and the maximum frequency is up to 50 MHz.

II. BLOWFISH ALGORITHM

The elementary operators of the Blowfish algorithm are table lookup, addition, and XOR. The tables comprise four S-boxes (256 x 32 bits each) and a P-array (18 x 32 bits).

The Blowfish algorithm consists of four steps: table initialization, key initialization, data encryption, and data decryption. Fig. 1 shows the Blowfish encryption algorithm.

    Divide X into two 32-bit halves: XL, XR
    For i = 1 to 16:
        XL = XL ⊕ Pi
        XR = F(XL) ⊕ XR
        Swap XL and XR
    Swap XL and XR (undo the last swap)
    XR = XR ⊕ P17
    XL = XL ⊕ P18
    Concatenate XL and XR

    Divide XL into four eight-bit quarters: a, b, c, and d
    F(XL) = ((S1[a] + S2[b]) ⊕ S3[c]) + S4[d]

Fig. 1 Blowfish algorithm

III. THE PROPOSED ARCHITECTURE

[Fig. 2 DFG of the loop body. Labels in the figure: inputs XL (four 8-bit fields), XR, and P-array; S-box1 to S-box4; 32-bit values a to f; CG and XOR operators; 32-bit result r1.]

[Fig. 3 DFG after rescheduling. Labels in the figure: the same inputs, S-boxes, and 32-bit values a to f; CG and XOR operators; 32-bit result r2.]

A. Operator Rescheduling

When calculating s = a + b, the i-th bit of s is equal to a_i ⊕ b_i ⊕ c_i, where c_i is the carry-in of the i-th bit. Fig. 2 shows the original DFG of the loop body after replacing each add operation with a carry generator (CG) and XOR functions.
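The decomposition of an addition into a carry generator and XOR operators can be sketched in software as follows. This is only a behavioral illustration under stated assumptions: the function names `carry_generator`, `add_via_cg`, and `f_via_cg` are hypothetical, the carry vector is computed with a simple bit-serial loop rather than the hierarchical lookahead structure used on the chip, and all operands are assumed to be 32-bit unsigned values.

```python
MASK32 = 0xFFFFFFFF  # Blowfish operates on 32-bit words


def carry_generator(a: int, b: int, width: int = 32) -> int:
    """Return the carry vector c of a + b: bit i of c is the carry into bit i."""
    carries = 0
    carry = 0  # the carry into bit 0 is always zero
    for i in range(width):
        carries |= carry << i
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        # carry out of bit i = majority(ai, bi, carry-in)
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return carries


def add_via_cg(a: int, b: int) -> int:
    """32-bit addition expressed as s = a XOR b XOR c."""
    return (a ^ b ^ carry_generator(a, b)) & MASK32


def f_via_cg(s1a: int, s2b: int, s3c: int, s4d: int) -> int:
    """F = ((S1[a] + S2[b]) XOR S3[c]) + S4[d], written with only CG and XOR."""
    t = s1a ^ s2b ^ carry_generator(s1a, s2b)  # first addition
    u = t ^ s3c                                # XOR with the third S-box output
    return (u ^ s4d ^ carry_generator(u, s4d)) & MASK32  # second addition
```

Written this way, the round function contains two carry generators and a chain of XORs. Because XOR is associative and commutative, the XORs that do not feed a carry generator can be reordered without changing the result.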
The operators include only carry generators and XOR, so we can use the operator-rescheduling method to reduce the critical path delay. Fig. 3 shows the result of operator rescheduling.

The gray line in these figures shows the critical path. The original critical path delay is two CG delays plus five XOR delays. After rescheduling, the critical path delay is reduced to two CG delays plus two XOR delays; three 2-input XOR delays are hidden. According to the synthesizer's report, the critical path delay improves by about 21.7%.

B. Fast Carry Generator

The fast carry generator is based on a carry-lookahead adder [3]. We construct the carry generator using hierarchical 4-bit carry generators.

C. The System Configuration

Controller

The controller is implemented as a finite state machine and described in a behavioral Verilog model (see Fig. 4).

[Fig. 4 FSM of the controller. Labels in the figure: start, clear, load, idle, initial, encrypt, decrypt; !reset; mode=0 to mode=3; e1 to e4.]

Datapath

The datapath includes ROM modules, SRAM modules, and the main arithmetic units of Blowfish. Fig. 5 shows the datapath architecture.

[Fig. 5 The architecture of the datapath. Labels in the figure: ROM_P, SRAM_P, ROM_Sbox, SRAM_Sbox, CORE, shift registers at DataIn and DataOut, multiplexers, and flip-flops; signals addr_p[3:0], addr_s[7:0], sel0 to sel6, oe_p, we_p, oe_s, we_s[3:0], en_de, clk.]

Because the size of an SRAM module is 2^n words, P1 and P18 are implemented as registers, and the other P-array entries are mapped to a 16 x 32-bit SRAM. We use a shift register under DataIn to expand the 4-bit input to a 64-bit input, and a shift register over DataOut to reduce the 64-bit output to a 4-bit output.

CORE implements the loop of the 16-round iteration. A pipeline stage is added to the output of the SRAM modules. The pipeline stage doubles the performance of the Blowfish hardware at the cost of additional area.

D. DFT Consideration

The controller is made testable by adding scan registers that store the controller signals and scan out the register contents in test mode.

The datapath is described by a Verilog RTL model. All of the flip-flops of the datapath are replaced by scan flip-flops.

IV. EXPERIMENTAL RESULTS

Table 1 shows the features of this chip. The maximum frequency of this Blowfish cipher chip is 50 MHz. Fig. 6 shows the photomicrograph.

Table 1 The chip features

Die size            5.7 x 6.1 mm²
Pad
    Ext. power      5 vdd, 7 gnd
    Int. power      4 pairs
    Input           12
    Output          6
    Clock buffer    PC5C03
Macro
    SRAM            256x32 (x4), 16x32 (x1)
    ROM             256x32 (x4)
Random logic        16K gates

[Fig. 6 Photomicrograph]

V. CONCLUSION

The proposed hardware architecture of the Blowfish algorithm can achieve high-speed data transfer of up to 4 bits per clock, which is 9 times faster than a Pentium. By applying the operator-rescheduling method, the critical path delay is improved by about 21.7%. DFT is also taken into consideration. In particular, the chip is cascadable: if two chips are used, the performance doubles. The test results show that the maximum frequency of this Blowfish cipher chip is 50 MHz. The proposed architecture satisfies the need for high-speed data transfer and can be applied to the security device of a system.

REFERENCES

[1] Bruce Schneier, Applied Cryptography, John Wiley & Sons, Inc., 1996.
[2] The homepage of "Description of a new variable-length key, 64-bit block cipher", http://www.counterpane.com/bfsverlag.html
[3] Patterson and Hennessy, Computer Organization & Design: The Hardware/Software Interface, Morgan Kaufmann, Inc., 1994.