[...] clock. The scan chains are also included in this architecture for testing. The die size of the chip is 5.7 x 6.1 mm2, and the [...]

    For i = 1 to 16:
        XL = XL XOR Pi
        XR = F(XL) XOR XR
        Swap XL and XR
    Swap XL and XR (Undo the last swap.)
    XR = XR XOR P17
    XL = XL XOR P18

    Divide XL into four eight-bit quarters: a, b, c, and d
    F(XL) = ((S1[a] + S2[b]) XOR S3[c]) + S4[d]    (additions modulo 2^32)
Fig. 1 Blowfish algorithm

[Fig. 3 DFG after rescheduling: the 32-bit outputs of S-box1 to S-box4 combined through CG and XOR nodes]

A. Operator Rescheduling
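The loop body reworked in this section is the round of Fig. 1. The following minimal Python sketch models that Feistel structure in executable form. It is a behavioural model only. The names `f`, `encrypt_block` and `decrypt_block` are hypothetical. The P-array and S-boxes are filled with random words rather than derived from a key, because the Blowfish key schedule is not covered here.

```python
import random

MASK32 = 0xFFFFFFFF


def f(xl: int, s: list[list[int]]) -> int:
    """Round function F of Fig. 1; a is the most significant byte of XL."""
    a, b, c, d = (xl >> 24) & 0xFF, (xl >> 16) & 0xFF, (xl >> 8) & 0xFF, xl & 0xFF
    # ((S1[a] + S2[b]) XOR S3[c]) + S4[d], both additions modulo 2^32
    return ((((s[0][a] + s[1][b]) & MASK32) ^ s[2][c]) + s[3][d]) & MASK32


def encrypt_block(xl: int, xr: int, p: list[int], s: list[list[int]]) -> tuple[int, int]:
    """16-round loop of Fig. 1; p[0]..p[17] hold P1..P18."""
    for i in range(16):
        xl ^= p[i]                 # XL = XL XOR Pi
        xr ^= f(xl, s)             # XR = F(XL) XOR XR
        xl, xr = xr, xl            # swap XL and XR
    xl, xr = xr, xl                # undo the last swap
    xr ^= p[16]                    # XR = XR XOR P17
    xl ^= p[17]                    # XL = XL XOR P18
    return xl, xr


def decrypt_block(xl: int, xr: int, p: list[int], s: list[list[int]]) -> tuple[int, int]:
    """Same loop with the subkeys applied in reverse order (P18 down to P1)."""
    for i in range(17, 1, -1):
        xl ^= p[i]
        xr ^= f(xl, s)
        xl, xr = xr, xl
    xl, xr = xr, xl
    xr ^= p[1]                     # P2
    xl ^= p[0]                     # P1
    return xl, xr


# Hypothetical tables: random words stand in for the key-dependent P-array and S-boxes.
rng = random.Random(2000)
P = [rng.getrandbits(32) for _ in range(18)]
S = [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]
ct = encrypt_block(0x01234567, 0x89ABCDEF, P, S)
assert decrypt_block(*ct, P, S) == (0x01234567, 0x89ABCDEF)
```

In Blowfish, decryption reuses the same round structure with the P-array in reverse order. This is why a single loop datapath can serve both directions.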
When calculating $s = a + b$, the $i$-th bit of $s$ is $s_i = a_i \oplus b_i \oplus c_i$, where $c_i$ is the carry into bit $i$.
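This identity lets each 32-bit addition be split into two parts: a carry generator (CG) that produces the carry vector, and a plain bitwise XOR. The Python sketch below models that split. It assumes a CG built from 4-bit lookahead blocks arranged hierarchically, in the spirit of the fast carry generator of Section B. It is a functional model under that assumption, not the chip's netlist, and all function names are hypothetical.

```python
import random


def group_gp(g: list[int], p: list[int]) -> tuple[int, int]:
    """Group generate/propagate for a block of (g, p) bits, least significant first."""
    G, P = 0, 1
    for gj, pj in zip(g, p):
        G = gj | (pj & G)          # carry leaves the block if generated here or propagated
        P &= pj                    # block propagates only if every bit propagates
    return G, P


def block_carries(g: list[int], p: list[int], cin: int) -> list[int]:
    """Carry into each position of one block of at most 4 bits.

    Written as a loop for brevity; in a lookahead unit each carry is a flat
    sum-of-products of g, p and cin rather than a ripple chain."""
    out, c = [], cin
    for gj, pj in zip(g, p):
        out.append(c)
        c = gj | (pj & c)
    return out


def carries(g: list[int], p: list[int], cin: int) -> list[int]:
    """Hierarchical lookahead: 4-bit blocks whose (G, P) pairs feed the next level."""
    if len(g) <= 4:
        return block_carries(g, p, cin)
    blocks = [(g[i:i + 4], p[i:i + 4]) for i in range(0, len(g), 4)]
    gp = [group_gp(bg, bp) for bg, bp in blocks]
    block_cin = carries([G for G, _ in gp], [P for _, P in gp], cin)
    out = []
    for (bg, bp), c in zip(blocks, block_cin):
        out.extend(block_carries(bg, bp, c))
    return out


def carry_generator(a: int, b: int, width: int = 32) -> int:
    """CG: bit i of the result is the carry c_i into bit i of a + b (c_0 = 0)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    return sum(ci << i for i, ci in enumerate(carries(g, p, 0)))


def add_cg_xor(a: int, b: int, width: int = 32) -> int:
    """s = a + b mod 2^width, computed bitwise as s_i = a_i XOR b_i XOR c_i."""
    return (a ^ b ^ carry_generator(a, b, width)) & ((1 << width) - 1)


rng = random.Random(3)
for _ in range(1000):
    x, y = rng.getrandbits(32), rng.getrandbits(32)
    assert add_cg_xor(x, y) == (x + y) & 0xFFFFFFFF
```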
Fig. 2 shows the original DFG of the loop body after each add operation is replaced with a CG and an XOR function.

Supported in part by the National Science Council, R.O.C., under contract no. NSC 88-2215-E-007-025.
2000 IEEE, ISBN 0-7803-5974-7
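XOR is associative and commutative, so XOR operands that do not depend on a carry can be combined early. This is the freedom that operator rescheduling exploits. The Python sketch below writes the XR update of one round twice, with each "+" of F split into XOR and CG as in Fig. 2. The first version keeps source order. The second uses one possible reordering in which only one XOR follows each CG on the CG path. The sketch then checks that both versions agree. This particular schedule is an assumption for illustration and is not claimed to be the exact DFG of Fig. 3. The `cg` helper is a hypothetical behavioural stand-in.

```python
import random

MASK32 = 0xFFFFFFFF


def cg(x: int, y: int) -> int:
    """Carry generator: bit i is the carry into bit i of x + y.

    Behavioural shortcut (uses +) to keep the sketch short; hardware would use a lookahead CG."""
    return ((x + y) ^ x ^ y) & MASK32


def round_original(s1: int, s2: int, s3: int, s4: int, xr: int) -> int:
    """XR update with F in source order, every '+' split into XOR and CG."""
    t1 = s1 ^ s2 ^ cg(s1, s2)      # S1 + S2
    t2 = t1 ^ s3                   # (S1 + S2) XOR S3
    f = t2 ^ s4 ^ cg(t2, s4)       # ... + S4
    return f ^ xr                  # XR = F(XL) XOR XR


def round_rescheduled(s1: int, s2: int, s3: int, s4: int, xr: int) -> int:
    """Same value; XORs that need no carry are moved off the CG path."""
    u = s1 ^ s2 ^ s3               # can run in parallel with the first CG
    v = s4 ^ xr                    # likewise
    t2 = u ^ cg(s1, s2)            # first CG, then one XOR
    w = v ^ t2                     # can overlap with the second CG
    return w ^ cg(t2, s4)          # second CG, then one XOR


rng = random.Random(0)
for _ in range(1000):
    args = [rng.getrandbits(32) for _ in range(5)]
    assert round_original(*args) == round_rescheduled(*args)
```

The check only shows that the reordering preserves the value. The delay saving depends on where the XORs sit relative to the CGs in hardware, and this model does not capture timing.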
The operators include only carry generators and XOR, so an operator-rescheduling method can be used to reduce the critical path delay. Fig. 3 shows the result of operator rescheduling. The gray line in these figures marks the critical path. The original critical path delay is two CG delays plus five XOR delays. After rescheduling, it is reduced to two CG delays plus two XOR delays, so three 2-input XOR delays are hidden. According to the synthesizer's report, the critical path delay improves by about 21.7%.

B. Fast Carry Generator
The fast carry generator is based on a carry-lookahead adder [3]. We construct it hierarchically from 4-bit carry generators.

C. The System Configuration
Controller
The controller is implemented as a finite state machine and described in a behavioral Verilog model. See Fig. 4.

[Fig. 4: controller state diagram; states idle, initial, load, encrypt and decrypt; transition labels start, !reset, clear, mode=0 to mode=3, and e1 to e4]

CORE implements the loop of the 16-round iteration. A pipeline stage is added to the output of the SRAM modules. This pipeline stage doubles the performance of the Blowfish hardware at the cost of additional area.

D. DFT Consideration
The controller is made testable by adding scan registers that store the controller's signals and scan out their contents in test mode. The datapath is described as a Verilog RTL model, and all of its flip-flops are replaced by scan flip-flops.

IV. EXPERIMENTAL RESULTS
Table 1 shows the features of this chip. The maximum frequency of this Blowfish cipher chip is 50 MHz. Fig. 6 shows the photomicrograph.

Table 1 Chip features
Die size            5.7 x 6.1 mm2
Pad
  Ext. power        5 VDD, 7 GND
  Int. power        4 pairs
  Input             12
  Output            6
  Clock buffer      PC5C03
Macro
  SRAM              256x32 (x4), 16x32 (x1)
  ROM               256x32 (x4)
Random logic        16K gates
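The memory macros in Table 1 can be sized by reading each entry as words × word width × number of instances. The 256 × 32 SRAM blocks work out to the size of Blowfish's four 256-entry, 32-bit S-boxes S1 to S4 used by F(XL). This correspondence is an inference from the sizes only.

```latex
\begin{aligned}
\text{SRAM, } 256\times32\ (\times 4) &: 4 \times 256 \times 32 = 32\,768\ \text{bits} = 4\ \text{KiB}\\
\text{SRAM, } 16\times32\ (\times 1) &: 16 \times 32 = 512\ \text{bits}\\
\text{ROM, } 256\times32\ (\times 4) &: 4 \times 256 \times 32 = 32\,768\ \text{bits} = 4\ \text{KiB}
\end{aligned}
```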
[Datapath figure: CORE with SRAM_P, ROM_P, SRAM_Sbox and ROM_Sbox; 32-bit S-box ports s0 to s3 with s*_din/s*_dout; p_din/p_dout; address lines addr_s0, addr_s2, addr_s3; multiplexers and flip-flops; control signals clk, en_de, sel0, sel2, cnl0, cnl1]

V. CONCLUSION
[...] algorithm can achieve high-speed data transfer of up to 4 bits per clock, which is 9 times faster than a Pentium. By [...] delay is improved by about 21.7%. Besides, DFT is also taken [...] means that if two chips are used, the performance is doubled. The [...]
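The 4-bits-per-clock figure and the 50 MHz maximum frequency combine into a peak data rate. The arithmetic below assumes one Blowfish round per clock, so a 64-bit block takes 16 clocks. This assumption is consistent with 4 bits per clock but is not stated explicitly here. The calculation also ignores key setup and pipeline fill.

```latex
\begin{aligned}
\text{bits per clock} &= \frac{64\ \text{bits/block}}{16\ \text{clocks/block}} = 4\\
\text{peak rate at } 50\ \text{MHz} &= 4\ \text{bits/clock} \times 50 \times 10^{6}\ \text{clocks/s} = 200\ \text{Mbit/s}\\
\text{two chips} &= 2 \times 200\ \text{Mbit/s} = 400\ \text{Mbit/s}
\end{aligned}
```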