An Optimized Architecture To Perform Image Compression and Encryption Simultaneously Using Modified DCT Algorithm

Proceedings of 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN 2011)
S V V Sateesh, VIT University, Vellore, T, INDIA, sateeshlOvv@gmail.com
R Sakthivel, VIT University, Vellore, T, INDIA, rsakthivel@vit.ac.in
K Nirosha, VIT University, Vellore, T, INDIA, nirosha.kondepati@gmail.com
Harish M Kittur, VIT University, Vellore, T, INDIA, Kittur@vit.ac.in
Abstract:
Traditional fast Discrete Cosine Transform (DCT) / Inverse DCT (IDCT) algorithms have focused on reducing arithmetic complexity. In this manuscript, we implement a new architecture for simultaneous image compression and encryption that is suitable for real-time applications. Contrary to traditional compression algorithms, only selected DCT outputs are calculated. For encryption, a Linear Feedback Shift Register (LFSR) generates random numbers that are added to some of the DCT outputs. Both the DCT algorithm and the arithmetic operators it uses are optimized to reduce operator requirements and increase throughput. A High Performance Multiplier (HPM) is used for integer multiplications. Simulation results show a compression ratio of around 66% and a PSNR of about 24 dB. The throughput of this architecture is 624 Msamples/s at a clock frequency of 78 MHz.
Index Terms- DCT, LFSR, HPM, Carry select
adder, JPEG, FPGA and ASIC.
1. INTRODUCTION
The discrete cosine transform (DCT) has been widely used in speech and image compression because it offers good energy compaction and low computational complexity. It has become an integral part of several standards, such as JPEG, MPEG-2, MPEG-4, CCITT Recommendation H.261 and H.263, and of HDTV applications [1]. Applications such as face recognition and detection require communication systems with a good security level and an acceptable transmission rate. However, for some applications the encryption and compression techniques cannot be deployed independently or in cascade without considering the impact of one technique on the other [2]. To solve this problem, we developed a new technique to simultaneously compress and encrypt
978-1-61284-653-8/11/$26.00 2011 IEEE
images. The main idea of our approach consists, firstly, in multiplexing the spectra of different images (to be compressed and encrypted) transformed by a Discrete Cosine Transform (DCT) and, secondly, in implementing the proposed system in FPGA. Consequently, special attention is given to the implementation of the DCT algorithm in the context of image compression; the DCT is the heart of the proposed compression and encryption system. This paper focuses on reducing the number of multiplications because general purpose multipliers are assumed to be the basic hardware elements for computing the DCT. Many different DCT architectures have been proposed to reduce the number of multipliers. Among these, the DCT algorithm proposed by Loeffler [3] has 11 multipliers and 29 adders, whereas the modified DCT has only 4 multipliers and 14 adders. These multipliers and adders are also optimized to realize a low power and fast DCT architecture: a High Performance Multiplier (HPM) and a carry select adder are used as the multiplier and adder respectively. The HPM reduction tree is completely regular.

This paper is organized as follows: previous work on the DCT is reviewed in section 2; the proposed simultaneous compression and encryption system and the modified DCT architecture are presented in section 3; section 4 explains the HPM and the carry select adder; implementation results are given in section 5; and the last section concludes the paper.
2. RELATED WORK
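The N-point DCT of N input samples x(0), ..., x(N-1) is defined as:

$$X(k) = \sqrt{\frac{2}{N}}\; c(k) \sum_{i=0}^{N-1} x(i) \cos\left(\frac{(2i+1)k\pi}{2N}\right), \quad k = 0, \ldots, N-1 \tag{1}$$

where $c(0) = 1/\sqrt{2}$ and $c(k) = 1$ for $k > 0$.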
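As a point of reference for the fast algorithms discussed in this section, the sketch below evaluates the N-point DCT directly with N multiplications per output. It assumes the standard DCT-II form with orthonormal scaling; the scaling constants and the function name `dct_1d` are choices made for this illustration, not details taken from the proposed architecture.

```python
import math


def dct_1d(x):
    """Direct O(N^2) evaluation of the N-point DCT-II (orthonormal scaling assumed)."""
    n = len(x)
    out = []
    for k in range(n):
        # c(k) = 1/sqrt(2) for the DC term and 1 otherwise (assumed normalization).
        c_k = 1 / math.sqrt(2) if k == 0 else 1.0
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        out.append(math.sqrt(2 / n) * c_k * acc)
    return out


# Eight neighbouring pixels with similar values (illustrative data only).
coefficients = dct_1d([100, 102, 104, 107, 110, 112, 113, 115])
```

For a block of correlated pixels such as this one, most of the energy ends up in the first few entries of `coefficients`, which is the property the proposed system relies on.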
Many fast DCT algorithms are reported in the literature. In [4], the authors show that the theoretical lower limit for an 8-point DCT algorithm is 11 multiplications. Since the number of multiplications in Loeffler's algorithm [3] reaches this theoretical limit, we use it as the reference for this work. In [5], a realization based on the Loeffler algorithm is shown, and a low power design is obtained with it. The authors of [6] use the recursive DCT algorithm, and their design requires less area than conventional algorithms. They use Distributed Arithmetic (DA) multipliers and show that an N-point DCT can be obtained by computing N N/2-point inner products instead of N N-point inner products. In [7], a new DA architecture called NEDA is proposed, aimed at reducing the cost metrics of power and area while maintaining high speed and accuracy in digital signal processing (DSP) applications.
3. PROPOSED SYSTEM ARCHITECTURE
As the DCT is the heart of the proposed compression and encryption system, we concentrate first on optimizing the DCT architecture and then on optimizing the arithmetic operators used in it. In this section, we first present the proposed compression and encryption system architecture and then the proposed DCT architecture for that system.
a. SYSTEM ARCHITECTURE
In this paper, we propose a new technique that carries out compression and encryption simultaneously, using the Discrete Cosine Transform (DCT) and a random number generator respectively. The main idea of our approach consists in multiplexing the spectra of different images, each transformed separately by a DCT. The choice of the DCT is justified by its use in many standards such as JPEG [8], MPEG [9] and ITU-T H.261 [10]. Moreover, fewer DCT coefficients than DFT coefficients are needed to get a good approximation of a typical signal [11]. By applying the DCT, most of the signal information tends to be concentrated in a few low frequency components. Consequently, the higher frequency coefficients are small in magnitude and can be ignored in the compression and encryption process.

The proposed compression and encryption system is shown in Fig. 1. On the left side, the pixels of the image to be compressed arrive serially. To feed the DCT block, the image must be parallelized into blocks of 8 pixels. This is done by a serial to parallel block consisting of 8 flip-flops. The DCT block then transforms the input image and generates only the lower frequency components of the transformed coefficients: by taking into account only the first and second DCT outputs among 8, we can get a good approximation of the input pixels. Let the first DCT output be Dctout1 and the second be Dctout2. The values of Dctout1 are high compared to those of Dctout2, because Dctout1 is the sum of all 8 input pixels whereas Dctout2 is built from differences of input pixels. As explained in the next section, the low value of Dctout2 is due to the spatial correlation between 8 successive pixels of the input image.

In order to ensure a good encryption level against any hacking attempt, we propose to add a positive random value to Dctout2 so that its values are close to those of Dctout1. The security key is sent separately as a private encryption key. Once the secured and compressed information safely reaches the authorized receiver, the image is recovered by reversing the steps of the whole process: subtracting the security key from the received image data and running an inverse DCT.
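The encryption and decryption steps can be sketched as follows. The paper does not give the LFSR width, taps or seed, so the 16-bit Fibonacci LFSR with taps 16, 14, 13 and 11, the 11-bit key width and the names `lfsr16`, `keystream`, `encrypt` and `decrypt` are all assumptions made for illustration.

```python
def lfsr16(state):
    """Advance an assumed 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def keystream(seed, count, bits=11):
    """Return `count` non-negative pseudo-random values of `bits` bits each."""
    state, values = seed, []
    for _ in range(count):
        state = lfsr16(state)
        values.append(state & ((1 << bits) - 1))
    return values


def encrypt(dctout2_values, seed):
    """Add one LFSR value to each Dctout2 coefficient."""
    keys = keystream(seed, len(dctout2_values))
    return [value + key for value, key in zip(dctout2_values, keys)]


def decrypt(encrypted_values, seed):
    """Regenerate the same LFSR sequence from the shared seed and subtract it."""
    keys = keystream(seed, len(encrypted_values))
    return [value - key for value, key in zip(encrypted_values, keys)]


secret_seed = 0xACE1  # hypothetical private key shared with the receiver
original = [-12, 7, 30, -3]  # illustrative Dctout2 values
recovered = decrypt(encrypt(original, secret_seed), secret_seed)
```

In this sketch the seed plays the role of the private key: the receiver regenerates the same sequence and subtracts it before running the inverse DCT. The seed must be non-zero, since an all-zero LFSR state never changes.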
b. MODIFIED DCT ARCHITECTURE
The modified DCT architecture is inspired by the Loeffler DCT model, so the two have similarities. The DCT outputs are calculated in four stages, each with a different number of adders and subtractors. The first stage has four adders and four subtractors, the second stage has two adders and four multipliers, the third stage has three adders and the fourth stage has only one adder. The optimization details of these blocks are explained in the next section.
In traditional DCT algorithms, 8 pixels are given as input and 8 DCT outputs are generated. The modified DCT circuit accepts 8 pixels per clock cycle but generates only 2 outputs, against 8 outputs in the original Loeffler algorithm. The architecture is therefore changed to calculate only the necessary DCT coefficients, Dctout1 and Dctout2. It should be noted that the DCT alone does not compress the image, because it is almost lossless. Usually, quantization and encoding are performed after the DCT step to achieve compression. Quantization makes use of the fact that higher frequency components are less important than low frequency components. It allows varying levels of image compression and quality through the selection of specific quantization matrices. Quality levels ranging from 1 to 100 can be selected, where 1 gives the poorest image quality and highest compression, while 100 gives the best quality and lowest compression. The encoder creates a fixed or variable-length code to represent the quantizer's output and maps the output in accordance with the code. In most cases, a variable-length code is used. An entropy encoder further compresses the quantized values to provide more efficient compression. The most important types of entropy encoders used in lossy image compression techniques are the arithmetic encoder, the Huffman encoder and the run-length encoder. The quantization and encoding steps therefore increase computational time and latency.
Fig 1 Compression and Encryption System (blocks: input image pixels, serial to parallel, modified DCT with optimized arithmetic operators for compression, LFSR random number generator for encryption, parallel to serial, transmission, decryption, output image pixels)

The proposed compression system does not require quantization and encoding stages; the modified DCT architecture itself performs the required compression. To achieve this, the DCT architecture is changed as follows. First, only the calculations needed to obtain Dctout1 and Dctout2 are made. Thus, we can economize 5 multipliers, 2 adders and 2 subtractors compared to the Loeffler architecture. We can also notice that in stages 2, 3 and 4 only adders are used to get the required outputs. Consequently, 6 additional subtractors can be saved. The calculations made in each stage are listed below. Let the 8 input pixels be In1, In2, In3, In4, In5, In6, In7 and In8, and let the output DCT values be Dctout1 and Dctout2.
Stage 1
X1 = In1 + In8;  Y1 = In1 - In8;
X2 = In2 + In7;  Y2 = In2 - In7;
X3 = In3 + In6;  Y3 = In3 - In6;
X4 = In4 + In5;  Y4 = In4 - In5;
Stage 2
X5 = X1 + X2;
X6 = X3 + X4;
Z1 = Y4 * (cos(3π/16) + sin(3π/16));
Z2 = Y3 * (cos(3π/16) - sin(3π/16));
Z3 = Y1 * (cos(π/16) + sin(π/16));
Z4 = Y2 * (cos(π/16) - sin(π/16));
Stage 3
Dctout1 = X5 + X6;
X8 = Z1 + Z4;
X9 = Z2 + Z3;
Stage 4
Dctout2 = X8 + X9;
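Combining all the equations gives the following two DCT output equations:

$$\mathrm{Dctout1} = \mathrm{In1} + \mathrm{In2} + \mathrm{In3} + \mathrm{In4} + \mathrm{In5} + \mathrm{In6} + \mathrm{In7} + \mathrm{In8} \tag{2}$$

$$\begin{aligned}
\mathrm{Dctout2} ={}& \mathrm{Y2}\left(\cos\frac{\pi}{16} - \sin\frac{\pi}{16}\right) + \mathrm{Y3}\left(\cos\frac{3\pi}{16} - \sin\frac{3\pi}{16}\right) \\
&+ \mathrm{Y4}\left(\cos\frac{3\pi}{16} + \sin\frac{3\pi}{16}\right) + \mathrm{Y1}\left(\cos\frac{\pi}{16} + \sin\frac{\pi}{16}\right)
\end{aligned} \tag{3}$$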
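The four stages can be checked with a short floating-point model. This is only a behavioural sketch of the equations as written in this section; the function name `modified_dct` is hypothetical, and the fixed-point word lengths of the hardware are not modelled.

```python
import math

# The four constant factors of stage 2; in hardware these are fixed coefficients,
# so one multiplier per factor is enough.
C1, S1 = math.cos(math.pi / 16), math.sin(math.pi / 16)
C3, S3 = math.cos(3 * math.pi / 16), math.sin(3 * math.pi / 16)


def modified_dct(pixels):
    """Return (Dctout1, Dctout2) for one block of 8 pixels, stage by stage."""
    in1, in2, in3, in4, in5, in6, in7, in8 = pixels
    # Stage 1: four adders and four subtractors.
    x1, y1 = in1 + in8, in1 - in8
    x2, y2 = in2 + in7, in2 - in7
    x3, y3 = in3 + in6, in3 - in6
    x4, y4 = in4 + in5, in4 - in5
    # Stage 2: two adders and four multipliers.
    x5 = x1 + x2
    x6 = x3 + x4
    z1 = y4 * (C3 + S3)
    z2 = y3 * (C3 - S3)
    z3 = y1 * (C1 + S1)
    z4 = y2 * (C1 - S1)
    # Stage 3: three adders.
    dctout1 = x5 + x6
    x8 = z1 + z4
    x9 = z2 + z3
    # Stage 4: one adder.
    dctout2 = x8 + x9
    return dctout1, dctout2


dctout1, dctout2 = modified_dct([100, 102, 104, 107, 110, 112, 113, 115])
```

For a block of identical pixels every difference Y1 to Y4 is zero, so Dctout2 is zero and Dctout1 carries all of the information.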
According to the Dctout2 equation, four multipliers are used. The original DCT architecture requires 2 adders, 2 subtractors and 8 multipliers to compute these outputs, and Loeffler reduces the number of arithmetic operators to 6 multipliers and 6 adders. In this work, Dctout2 is calculated using only 4 multipliers, which economizes 6 adders and 2 multipliers. With these three optimization levels, i.e. stages 2, 3 and 4, the modified DCT architecture requires 4 multipliers and 14 adders to compute relevant and representative outputs for image compression, against the 11 multipliers and 29 adders of Loeffler's algorithm [3].
As mentioned earlier, the values of Dctout1 are high compared to those of Dctout2. On the input side of the proposed method, the pixels are encoded as signed 9-bit values. On the output side, Dctout1 contains the major part of the information, so it is encoded using 12 bits. For the encoding of Dctout2, we can take into account the spatial correlation of images. To bring the Dctout2 values close to those of Dctout1, we add a random number generated by the LFSR, which also encrypts the Dctout2 coefficients. Finally, Dctout2 is encoded using 12 bits. When the hardware is replicated for four images, the compression ratio is given by
$$R = \left(1 - \frac{4 \times (12 + 12)}{4 \times 8 \times 9}\right) \times 100 \approx 66.7\% \tag{4}$$
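The compression ratio quoted in the abstract can be reproduced from the word lengths given above. The sketch assumes that each block of 8 signed 9-bit pixels is replaced by the two 12-bit values Dctout1 and Dctout2; the function name and parameters are illustrative.

```python
def compression_ratio(bits_per_pixel=9, pixels_per_block=8, output_bits=(12, 12), images=1):
    """Percentage of bits saved when each input block is replaced by its DCT outputs."""
    bits_in = images * pixels_per_block * bits_per_pixel
    bits_out = images * sum(output_bits)
    return (1 - bits_out / bits_in) * 100


ratio_one_image = compression_ratio()
ratio_four_images = compression_ratio(images=4)
```

Replicating the hardware for four images scales the input and output bit counts by the same factor, so the ratio stays at about 66.7% under these assumptions.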
4. HPM AND CARRY SELECT ADDER
The modified DCT architecture is built from adder, subtractor and multiplier blocks, which have the structure shown in Fig. 2. The adder used in this architecture is a carry select adder and the multiplier is a High Performance Multiplier (HPM).
Fig 2 Adder and Multiplier (inputs a1 and a2, output s1)
a. HPM
Any integer multiplication is performed in three stages. In the first stage, the partial product matrix is formed. In the second stage, this partial product matrix is reduced to a height of two. In the final stage, the two remaining rows are combined using a carry propagating adder. The HPM is a reduction tree that is completely regular in structure; it is easy to place and route and has a logarithmic logic depth. The reduction tree can be explained with the encircling scheme shown in Fig. 3. Each dot represents a partial product, and at each step the height of the tree is reduced by one. Finally, when two rows of partial products remain, a carry propagating adder is used to get the result.

The High-Performance Multiplier (HPM) reduction tree has an O(log N) delay dependence on the word length N. The connectivity of the adding cells in the reduction tree is regular, so routing becomes trivial; this can be exploited in any type of design method: fully automatic, custom, or somewhere in between. In contrast to other logarithmic multipliers, such as Wallace, the design effort for a custom-made HPM multiplier is very limited; in fact, it is as low as for a textbook array multiplier. The predictable wiring resulting from the regularity of the HPM tree enables both systematic sizing of logic circuitry and systematic wire spacing engineering so as to minimize total multiplier delay.
Fig 3 HPM reduction tree (partial products arranged from MSB to LSB)
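The three multiplication stages can be illustrated with a small behavioural model for unsigned operands. This is a generic carry-save reduction, not the specific HPM wiring, which the sketch does not attempt to reproduce; the function names are hypothetical.

```python
def partial_products(a, b, width):
    """Stage 1: row i is `a` shifted left by i when bit i of `b` is set, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def carry_save(x, y, z):
    """3:2 compression applied bitwise; x + y + z equals sum_row + carry_row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def multiply(a, b, width=8):
    """Multiply two unsigned `width`-bit integers via partial-product reduction."""
    rows = partial_products(a, b, width)
    # Stage 2: each pass replaces three rows by two, reducing the height by one.
    while len(rows) > 2:
        sum_row, carry_row = carry_save(rows[0], rows[1], rows[2])
        rows = rows[3:] + [sum_row, carry_row]
    # Stage 3: final carry-propagating addition of the remaining rows.
    return sum(rows)


product = multiply(13, 11)
```

In hardware the compressors of one level operate in parallel, which is where the logarithmic depth comes from; the sequential loop here only demonstrates that the reduction preserves the product.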
b. Carry select adder
Carry select adders are relatively fast compared to ripple carry adders. The schematic is shown in Fig. 4. The bits are divided into a small number of blocks, and each block is computed using a ripple carry adder. For each block, the computation is done in parallel for both an input carry of 1 and an input carry of 0. Multiplexers driven by the carry bits then choose the right output. The number of logic levels in this adder is around sqrt(n), where n is the number of bits.
Fig 4 Carry select adder Architecture (each 4-bit block is added twice, once with carry 1 and once with carry 0, each producing 5 bits: a carry bit and a 4-bit result)
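A bit-level model makes the selection mechanism concrete. The 12-bit word length and 4-bit block size are assumptions chosen to match the word lengths and block labels mentioned in this paper; the function names are illustrative.

```python
def ripple_add(a, b, carry_in, width):
    """Ripple-carry addition of two `width`-bit values; returns (sum, carry_out)."""
    result, carry = 0, carry_in
    for i in range(width):
        a_bit, b_bit = (a >> i) & 1, (b >> i) & 1
        result |= (a_bit ^ b_bit ^ carry) << i
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
    return result, carry


def carry_select_add(a, b, width=12, block=4):
    """Add two `width`-bit values block by block; returns (sum, carry_out)."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for low in range(0, width, block):
        a_blk, b_blk = (a >> low) & mask, (b >> low) & mask
        # Both candidate results are computed; in hardware this happens in parallel.
        sum0, carry0 = ripple_add(a_blk, b_blk, 0, block)
        sum1, carry1 = ripple_add(a_blk, b_blk, 1, block)
        # The incoming carry acts as the multiplexer select signal.
        selected, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= selected << low
    return result, carry


total, carry_out = carry_select_add(0x0FF, 0x001)
```

Because both candidates of every block are ready before the incoming carry arrives, only the multiplexer delay accumulates from block to block.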
5. RESULTS
Initially, the proposed system was simulated in Matlab. This step is important to validate the algorithm structure before implementing it in FPGA or ASIC. As the description language, we chose Verilog HDL because it is easier to use and gives better visibility of hardware details than the alternatives. Further, Verilog HDL allows different target devices such as ASIC or FPGA. Simulation results of the Matlab model are shown in Fig 5. The PSNR value for the images is 24 dB.
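For reference, the PSNR figure can be computed as sketched below. The peak value of 255 assumes 8-bit grayscale pixels, which the paper does not state explicitly, and the function name `psnr` is illustrative.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Illustrative pixel values only, not data from the paper.
example_psnr = psnr([100, 102, 104, 107], [110, 95, 120, 101])
```

Under the 8-bit assumption, a PSNR of 24 dB corresponds to a root mean square error of roughly 16 gray levels.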
Fig 5a Input images
Fig 5b Output images

The original Loeffler DCT architecture, the modified DCT architecture [12] and the modified DCT architecture with optimized operators have been implemented on the same kind of FPGA board, a Virtex 5 xc5vlx330t. To illustrate the differences in hardware consumption, the FPGA implementation results are presented in Table 1. Table 2 compares the compression and encryption system with and without optimized operators. From this comparison we can notice that the modified DCT architecture with optimized operators reduces the area consumption (slices and Look Up Tables, LUTs). Furthermore, the throughput of this compression system is higher than that of the previous works.
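The throughput figure follows from the block width of the architecture, assuming that one block of 8 pixels is accepted on every clock cycle as described in section 3. The function name is illustrative.

```python
def throughput_msps(clock_mhz, samples_per_cycle=8):
    """Throughput in Msamples/s when `samples_per_cycle` pixels enter per clock cycle."""
    return clock_mhz * samples_per_cycle


optimized_throughput = throughput_msps(78)  # 78 MHz clock quoted in the abstract
```

With the 78 MHz clock quoted in the abstract this gives the 624 MS/s reported for the design with optimized operators.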
Table 1 Synthesis results

Characteristics                         Slice registers   Slice LUT   Fully used LUT   Throughput (MS/s)
Loeffler                                507               1293        316              191.867
Modified DCT                            247               492         162              206.423
Modified DCT with optimized operators   23                432         0                624
An ASIC implementation of the proposed compression and encryption system has been done using Cadence tools. The RTL code has been synthesized in RTL Compiler using SAGE-Modeler TSMC 0.18 µm technology libraries at 1.8 V to obtain the timing, area and power results, which are tabulated in Table 3. The backend of the design has been done in SoC Encounter, and the final chip layout is shown in Fig. 6.
Table 2 Synthesis results

Characteristics                                  Slice registers   Slice LUT   Fully used LUT   Throughput (MS/s)
Compression system without optimized operators   1536              2058        955              206.423
Compression system with optimized operators      367               1085        251              624
Table 3 Synthesis results

Results                                       Timing (MHz)   Power (mW)   Area (µm^2)
Proposed compression and encryption system   83.626         9.88         70490
Fig 6 Chip layout
6. CONCLUSION
In this paper, a new method of simultaneous compression and encryption is presented. The modified DCT algorithm is an optimized model in terms of the number of arithmetic operators: it needs only 4 multipliers and 14 adders. The arithmetic operators used in the DCT model are also optimized in order to increase the throughput and decrease the power consumption. The FPGA implementation of the whole method shows improvement in terms of throughput, area and power consumption when compared to existing methods. An ASIC implementation of the whole method, from RTL to GDSII, has been done.
REFERENCES
[1] T. H. Chen, A cost-effective 8x8 2-D IDCT core processor with folded architecture, IEEE Trans. Consum. Electron., Vol. 45, No. 2, pp. 333-339, 1999.
[2] A. Alfalou and C. Brosseau, Optical Image Compression and Encryption Methods, OSA: Advances in Optics and Photonics, Vol. 1, pp. 589-636, 2009.
[3] C. Loeffler, A. Ligtenberg and G. S. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications, IEEE, ICASSP, pp. 988-991, May 1989.
[4] P. Duhamel and H. H'mida, New 2^n DCT algorithms suitable for VLSI implementation, IEEE, ICASSP, pp. 1805-1808, November 1987.
[5] C. Y. Pai, W. E. Lynch and A. J. Al-Khalili, Low power data-dependent 8x8 DCT/IDCT for video compression, IEE Proceedings, Vision, Image and Signal Processing, Vol. 150, pp. 245-254, August 2003.
[6] S. Yu and E. E. Swartzlander Jr, DCT implementation with distributed arithmetic, IEEE Transactions on Computers, Vol. 50, No. 9, pp. 985-991, September 2001.
[7] A. Shams, A. Chidanandan, W. Pan and M. A. Bayoumi, NEDA: A low-power high-performance DCT architecture, IEEE Transactions on Signal Processing, Vol. 54, No. 3, pp. 955-964, 2006.
[8] ISO/IEC JTC1/SC2/WG8, JPEG-8-R8, JPEG technical specification, 1990.
[9] ISO/IEC JTC1/SC2/WG11, MPEG 90/176, Coding of moving picture and associated audio, 1990.
[10] ISO/IEC DIS 10918-1, Digital compression and coding of continuous-tone still images, 1992.
[11] J. F. Blinn, What's the deal with the DCT?, IEEE Computer Graphics and Applications, pp. 78-83, July 1993.
[12] Maher Jridi and Ayman Alfalou, A VLSI Implementation of a New Simultaneous Image Compression and Encryption Method, IEEE, 2010.