4. HPM AND CARRY SELECT ADDER
The modified DCT architecture consists of adder,
subtractor and multiplier blocks. These blocks have
the structure shown in Fig. 2. The adder used in this
architecture is a carry select adder, and the multiplier
is a High-Performance Multiplier (HPM).
Fig 2 Adder and Multiplier (block diagram: inputs a1, a2; output s1)
a. HPM
In any integer multiplication, three stages of
operations are performed. In the first stage, the partial
978-1-61284-653-8/11/$26.00 2011 IEEE
product matrix is formed. In the second stage, this
partial product matrix is reduced to a height of two.
In the final stage, these two rows are combined using
a carry propagating adder. HPM is a reduction tree
which is completely regular in structure. The HPM
reduction tree is easy to place and route and has a
logarithmic logic depth. The reduction tree can be
explained with the encircling scheme shown in
Fig. 3. Each dot represents a partial product bit, and at
each step the height of the tree is reduced by one. Finally,
when only two rows of partial products remain, a carry
propagating adder is used to get the result.
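The three stages described above can be sketched in software. The following is a minimal bit-level model, assuming unsigned operands and a generic column compression built from full adders; it does not reproduce the regular wiring of the HPM tree. The function names (partial_products, reduce_to_two_rows, final_add, multiply) are hypothetical and chosen for illustration only.

```python
def partial_products(a, b, n):
    """Stage 1: build the partial product matrix, stored column by column.

    cols[w] holds every partial product bit of weight 2**w for two
    unsigned n-bit operands a and b.
    """
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def reduce_to_two_rows(cols):
    """Stage 2: compress every column until no column holds more than two bits.

    Assumption: a generic full-adder (3:2 counter) compression is used here,
    not the specific HPM cell arrangement.
    """
    cols = [list(c) for c in cols]
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)                          # sum bit, same weight
                nxt[w + 1].append((x & y) | (x & z) | (y & z))    # carry bit, next weight
                k += 3
            nxt[w].extend(col[k:])  # zero, one or two leftover bits pass through
        cols = nxt
    return cols


def final_add(cols):
    """Stage 3: combine the two remaining rows with a carry propagating add."""
    row0 = sum((c[0] if len(c) > 0 else 0) << w for w, c in enumerate(cols))
    row1 = sum((c[1] if len(c) > 1 else 0) << w for w, c in enumerate(cols))
    return row0 + row1


def multiply(a, b, n):
    """Multiply two unsigned n-bit integers through the three stages."""
    return final_add(reduce_to_two_rows(partial_products(a, b, n)))
```

For example, multiply(13, 11, 4) goes through all three stages and agrees with the built-in product 13 * 11.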
The High-Performance Multiplier (HPM) reduction tree
has an O(log N) delay dependence on the word
length N. The connectivity of the adding cells in the
reduction tree is regular, so routing becomes
trivial, which can be exploited in any type of design
method: fully automatic, custom, or somewhere in
between. In contrast to other logarithmic multipliers,
like Wallace, the design effort for a custom-made
HPM multiplier is very limited; in fact it is as low as
for a textbook array multiplier. The predictable
wiring resulting from the regularity of the HPM tree
enables both systematic sizing of logic circuitry
and systematic wire spacing engineering so as to
minimize total multiplier delay.
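To give a feel for what a logarithmic dependence on word length means, the sketch below counts the reduction stages of a generic tree of 3:2 counters using the Dadda height sequence (2, 3, 4, 6, 9, 13, ...). This is an assumption made for illustration: it models a textbook logarithmic reduction tree, not the HPM cell arrangement, and the function name dadda_stages is hypothetical.

```python
def dadda_stages(n):
    """Number of 3:2 reduction stages needed to bring n rows down to two.

    Uses the Dadda height sequence d(1) = 2, d(j+1) = floor(1.5 * d(j)).
    Assumption: generic counter tree, not the HPM wiring itself.
    """
    height, stages = 2, 0
    while height < n:
        height = (3 * height) // 2
        stages += 1
    return stages


if __name__ == "__main__":
    for word_length in (4, 8, 16, 32, 64):
        print(word_length, dadda_stages(word_length))
```

Doubling the word length adds a roughly constant number of stages, whereas a plain array multiplier adds rows in proportion to the word length.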
Proceedings of 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN 2011)
Fig 3 HPM reduction tree (dot diagram, MSB on the left, LSB on the right)
b. Carry select adder
Carry select adders are relatively fast when
compared to ripple carry adders. The schematic is
shown in Fig. 4. The operand bits are divided into a
small number of blocks, and each block is computed
using a ripple carry adder. For each block, the sum is
computed in parallel for both possible input carries,
1 and 0. Multiplexers driven by the carry bits then
choose the right output. The number of logic levels in
this adder is around sqrt(n), where n is the
number of bits.
Fig 4 Carry select adder architecture (block labels: adding 5 bits (carry bit + 4 bit result) and carry 1; adding 5 bits (carry bit + 4 bit result) and carry 0)
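The carry select scheme can be modelled at bit level as follows. This is a minimal sketch, assuming unsigned operands and uniform 4-bit blocks (the block width suggested by the labels of Fig. 4); the helper names and the default word length of 16 bits are hypothetical and not taken from the paper.

```python
def to_bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two little-endian bit lists with a ripple carry chain."""
    out, carry = [], carry_in
    for x, y in zip(a_bits, b_bits):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out, carry


def carry_select_add(a, b, n=16, block=4):
    """Add two unsigned n-bit integers; returns (sum mod 2**n, carry out)."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    result, carry = [], 0
    for lo in range(0, n, block):
        xa, xb = a_bits[lo:lo + block], b_bits[lo:lo + block]
        # In hardware both candidates are computed in parallel for every block.
        sum0, c0 = ripple_carry_add(xa, xb, 0)
        sum1, c1 = ripple_carry_add(xa, xb, 1)
        # Multiplexer: the carry from the previous block selects one candidate.
        if carry:
            block_sum, carry = sum1, c1
        else:
            block_sum, carry = sum0, c0
        result.extend(block_sum)
    return from_bits(result), carry
```

In software the two candidate sums are evaluated one after the other, so the model only checks functional behaviour; the speed advantage comes from evaluating them concurrently in hardware and waiting only for the multiplexer chain.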
5. RESULTS
Initially, the proposed system is simulated in
Matlab. This step is important to validate the
algorithm structure before hardware
implementation in FPGA or ASIC. As the
description language, we chose Verilog
HDL, as it is easier to use and gives better visibility of
hardware details than the alternatives. Further,
Verilog HDL allows the design to be targeted at
devices such as ASICs and FPGAs. Simulation
results of the Matlab model are shown in Fig 5. The
PSNR value for the images is 24 dB.
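For reference, PSNR is conventionally computed from the mean squared error between two images. The sketch below assumes 8-bit pixels (peak value 255) and images given as flat sequences of pixel values, and assumes the comparison is between the input and output images; the paper does not state these details, and the function name psnr is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumptions: flat sequences of pixel values, 8-bit peak value by default.
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Under these assumptions, a PSNR of 24 dB corresponds to a mean squared error of roughly 259 on a 0 to 255 pixel scale.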
Fig 5a Input images
Fig 5b Output images
The original Loeffler DCT architecture, the modified
DCT architecture [12] and the modified DCT
architecture with optimized operators have been
implemented on the same kind of FPGA board, that
is, a Virtex 5 xc5vlx330t. In order to illustrate the
differences in hardware consumption, the FPGA
implementation results are presented in Table 1.
Table 2 compares the compression and
encryption system with and without optimized
operators. From this comparison we can see that
the modified DCT architecture with optimized
operators reduces the area consumption (slices and
Look Up Tables, LUTs). Furthermore, the throughput of
this compression system has increased
compared to the previous works.
Table 1 Synthesis results
Characteristics                        Slice registers  Slice LUT  Fully used LUT  Throughput (MS/s)
Loeffler                               507              1293       316             191.867
Modified DCT                           247              492        162             206.423
Modified DCT with optimized operators  23               432        0               624
An ASIC implementation of the proposed
compression and encryption system has been done
using Cadence tools. The RTL code has been
synthesized in RTL Compiler using SAGE-Modeler
TSMC 0.18 µm technology libraries at 1.8 V to get the
timing, area and power results. The results are
tabulated in Table 3. The backend of the design has
been done in SoC Encounter and the final chip layout is
shown in Fig. 6.
Table 2 Synthesis results
Characteristics                                 Slice registers  Slice LUT  Fully used LUT  Throughput (MS/s)
Compression system without optimized operators  1536             2058       955             206.423
Compression system with optimized operators     367              1085       251             624
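The size of the improvement can be worked out directly from the figures in Table 2. The short calculation below is only arithmetic on those published values; the dictionary keys and function names are hypothetical.

```python
# Values copied from Table 2: compression system without / with optimized operators.
without_opt = {"slice_registers": 1536, "slice_luts": 2058,
               "fully_used_luts": 955, "throughput_ms_s": 206.423}
with_opt = {"slice_registers": 367, "slice_luts": 1085,
            "fully_used_luts": 251, "throughput_ms_s": 624.0}


def reduction_percent(before, after):
    """Relative saving in percent when a resource count drops from before to after."""
    return 100.0 * (before - after) / before


def gain(before, after):
    """Ratio of the new value to the old one."""
    return after / before


area_keys = ("slice_registers", "slice_luts", "fully_used_luts")
savings = {k: round(reduction_percent(without_opt[k], with_opt[k]), 1) for k in area_keys}
throughput_gain = round(gain(without_opt["throughput_ms_s"], with_opt["throughput_ms_s"]), 2)

if __name__ == "__main__":
    print(savings, throughput_gain)
```

On these numbers the optimized operators save roughly 76% of the slice registers, 47% of the slice LUTs and 74% of the fully used LUTs, while throughput rises by a factor of about 3.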
Table 3 Synthesis results
Results                                     Timing (MHz)  Power (mW)  Area (µm²)
Proposed compression and encryption system  83.626        9.88        70490
Fig 6 Chip layout
6. CONCLUSION
In this paper, a new method of simultaneous
compression and encryption is presented. The
modified DCT algorithm is an optimized model in
terms of the number of arithmetic operators. It needs
only 4 multipliers and 14 adders. The arithmetic
operators used in the DCT model are also optimized in
order to increase the throughput and
to decrease the power consumption. The FPGA
implementation of the whole method shows
improvement in terms of throughput, area and power
consumption when compared to existing methods. An
ASIC implementation of the whole method from
RTL to GDSII has been done.
REFERENCES
[1] T.H. Chen, A cost-effective 8x8 2-D IDCT core
processor with folded architecture, IEEE Trans.
Consum. Electron., Vol. 45, No. 2, pp. 333-339, 1999.
[2] A. Alfalou and C. Brosseau, Optical image
compression and encryption methods, OSA:
Advances in Optics and Photonics, Vol. 1, pp. 589-
636, 2009.
[3] C. Loeffler, A. Ligtenberg and G.S.
Moschytz, Practical fast 1-D DCT algorithms with 11
multiplications, IEEE, ICASSP, pp. 988-991, May
1989.
[4] P. Duhamel and H. H'mida, New 2^n DCT
algorithms suitable for VLSI implementation, IEEE,
ICASSP, pp. 1805-1808, November 1987.
[5] C.Y. Pai, W.E. Lynch and A.J. Al-Khalili, Low-
power data-dependent 8x8 DCT/IDCT for video
compression, IEE Proceedings, Vision, Image and
Signal Processing, Vol. 150, pp. 245-254, August
2003.
[6] S. Yu and E.E. Swartzlander Jr, DCT
implementation with distributed arithmetic, IEEE
Transactions on Computers, Vol. 50, No. 9, pp. 985-
991, September 2001.
[7] A. Shams, A. Chidanandan, W. Pan and M.A.
Bayoumi, NEDA: A low-power high-performance
DCT architecture, IEEE Transactions on Signal
Processing, Vol. 54, No. 3, pp. 955-964, 2006.
[8] ISO/IEC JTC1/SC2/WG8, JPEG-8-R8, JPEG
technical specification, 1990.
[9] ISO/IEC JTC1/SC2/WG11, MPEG 90/176,
Coding of moving picture and associated audio,
1990.
[10] ISO/IEC DIS 10918-1, Digital compression and
coding of continuous-tone still image, 1992.
[11] J.F. Blinn, What's the deal with the DCT?,
IEEE Computer Graphics and Applications, pp. 78-
83, July 1993.
[12] M. Jridi and A. Alfalou, A VLSI
Implementation of a New Simultaneous Image
Compression and Encryption Method, IEEE, 2010.