

PG Student, ECE Department, DBIT, Bangalore, Karnataka, India

Assistant Professor ECE Department, DBIT, Bangalore, Karnataka, India

ABSTRACT: This paper proposes a memory-efficient scalable video encoder architecture. The aim is
to keep the video quality good even when the video is scaled down, while using as little memory as
possible. A low-complexity, memory-efficient architecture is used for the scalable video encoder.
The proposed architecture is implemented on a Field Programmable Gate Array (FPGA), and the
hardware is synthesized using Xilinx ISE (Integrated Software Environment). The architecture is
designed to achieve lossless compression while also allowing size and SNR to be scaled down.

INDEX TERMS: Lossless compression; discrete wavelet transform; size scalability; SNR
scalability; image compression


Both Wireless HD and WHDI have been in use, but neither considers the transmission problems that
arise when many devices operate in parallel. Hence a scalable video encoder that can process
high-definition video in real time is needed. An encoder is also useful for converting between
formats: for example, if the input video is .avi and the required output is .flv, an encoder can
be used.

H.264/AVC is a more recent video coding standard. Compared with MPEG-4 Advanced Simple Profile
(ASP) and MPEG-2, it can save 25%-45% and 50%-70% of the bit rate, respectively. The AVC (MPEG-4)
video coding standard is a well-known scalable video codec. The main aim of H.264/AVC was to
create a standard capable of providing good video quality at substantially lower bit rates than
previous standards (i.e., half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without
increasing the complexity of the design so much that it would be impractical or expensive to
implement. A further aim was to provide enough flexibility for the standard to be applied to a
wide variety of applications on a wide variety of networks and systems, including low and high bit
rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T
multimedia telephony systems.

H.264/AVC is today a family of standards made up of several profiles. A particular decoder need
not decode every profile, but it decodes at least one; the decoder specification states which
profiles can be decoded. The standard is a member of the H.26x line of VCEG video coding
standards, and the H.264 name follows the ITU-T naming convention; the MPEG-4 AVC name follows the
naming convention of ISO/IEC MPEG, where the standard belongs to the suite known as MPEG-4.
However, it involves high computational and implementation complexity.

Apart from its high computational complexity and memory access requirements, H.264/AVC has a very
long coding path, which includes prediction, reconstruction, and entropy coding. Parallel
processing is restricted because the reference software adopts only sequential processing of the
blocks within each macroblock (MB). The block-level reconstruction loop of intra prediction
induces bubble cycles, which reduce hardware utilization and throughput. The functions rely on a
reconfigurable engine and multiplexed modes to achieve resource sharing. The coding tools exploit
many data dependencies to increase coding performance, but the penalty is considerable storage
space.

Here a low-complexity scalable video encoder, the size and SNR scalable image-video compression
codec (SS-SIVC), is proposed. It comprises three main modules: the discrete wavelet transform
(DWT), the bit-plane sequencer, and the fast adaptive binary arithmetic coder (M-Coder). Several
architectures exist for the discrete wavelet transform [3]-[6]. Here a memory-efficient pipelined
architecture with a parallel scanning method is used for the 2-D lifting-based DWT in scalable
video encoder applications. In the bit-plane sequencer we try to avoid computational complexity
and to improve run-time performance. Among the entropy coders of the AVC video coding standard,
CABAC (Context-based Adaptive Binary Arithmetic Coder) is one of the most efficient. CABAC
achieves compression efficiency close to the theoretical limit by using context-adaptive
probability estimation of each binary symbol (bin). In its adaptive binary arithmetic coder
(M-coder), CABAC adopts 398 contexts, while the proposed bit-plane sequencer reduces these 398
contexts to only 3. Hence the proposed design is a low-complexity pipelined fast adaptive binary
arithmetic coder (M-Coder). The design is based on a new lossless compression codec with size and
SNR scalability.


The 2-D DWT architecture [2] consists of two 1-D DWT cores and a transposing array register. Each
1-D DWT core consumes two input samples and produces two output coefficients per cycle. Most
interestingly, to reduce the internal buffer size it uses a parallel scanning method instead of a
line-based scanning method. For an N×N tile image with one-level 2-D DWT decomposition and the 9/7
filter, only 4N of temporal memory and a 2×2 register array are required to store the intermediate
coefficients in the column 1-D DWT core. The column-processed data is rearranged in the
transposing array. The results show that the implementation cost is lower than that of earlier
designs and that the architecture can process 1080p HDTV pictures at 30 frames/sec.
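
As a rough sanity check on these figures, the sketch below estimates the minimum clock rate needed to keep up with 1080p at 30 frames/sec when two samples are consumed per cycle, and compares the 4N temporal memory with a full N×N tile buffer. It assumes luma samples only and a single decomposition level; the function names and the tile size N = 512 are illustrative and do not come from [2].

```python
def min_clock_hz(width, height, fps, samples_per_cycle=2):
    """Lowest clock rate that keeps up with the pixel rate (luma only, one DWT level)."""
    pixels_per_second = width * height * fps
    # Ceiling division: a partially filled cycle still costs a full cycle.
    return -(-pixels_per_second // samples_per_cycle)


def temporal_memory_words(n):
    """Temporal buffer size quoted in the text for an N x N tile with the 9/7 filter."""
    return 4 * n


def frame_buffer_words(n):
    """Size of a full N x N tile buffer, for comparison."""
    return n * n


if __name__ == "__main__":
    clock = min_clock_hz(1920, 1080, 30)
    print(f"1080p at 30 frames/s needs at least {clock / 1e6:.3f} MHz")
    n = 512  # hypothetical tile size, chosen only for illustration
    print(f"N={n}: temporal memory {temporal_memory_words(n)} words, "
          f"full tile {frame_buffer_words(n)} words")
```

Under these assumptions the required clock is modest, which is consistent with the claim that the two-samples-per-cycle cores can sustain 1080p at 30 frames/sec.
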
Context-Based Adaptive Binary Arithmetic Coding (CABAC) is a part of the ITU-T/ISO/IEC AVC
standard for video compression [3]. It combines an adaptive binary arithmetic coding method with
context modeling to achieve a high degree of adaptation and redundancy reduction. CABAC includes
low-complexity binary arithmetic coding and probability estimation that are well suited to
efficient hardware and software implementation. Compared with the baseline entropy coding method
of AVC, average bit-rate savings of 9%-14% were achieved on test sequences typical of broadcast
applications, over a range of acceptable video quality of about 30-38 dB.
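
The benefit of context modeling can be illustrated with a small sketch. It keeps one adaptive probability estimate per context and sums the ideal code length, -log2(p), of each bin. This is a simplified count-based estimator written for illustration only: it is not the table-driven state machine of CABAC or the M-coder, and the class and function names are assumptions.

```python
import math


class AdaptiveBinModel:
    """Count-based probability estimate for one context (not the CABAC state table)."""

    def __init__(self):
        # One pseudo-count per symbol so that no probability is ever zero.
        self.counts = [1, 1]

    def probability(self, bit):
        return self.counts[bit] / (self.counts[0] + self.counts[1])

    def update(self, bit):
        self.counts[bit] += 1


def ideal_code_length(bins, contexts):
    """Sum of -log2(p) over all bins, with one adaptive model per context index."""
    models = {}
    total_bits = 0.0
    for bit, ctx in zip(bins, contexts):
        model = models.setdefault(ctx, AdaptiveBinModel())
        total_bits += -math.log2(model.probability(bit))
        model.update(bit)  # adapt after coding, as a decoder could do in step
    return total_bits


if __name__ == "__main__":
    bins = [0] * 50 + [1] * 50
    split = ideal_code_length(bins, [0] * 50 + [1] * 50)  # two contexts
    merged = ideal_code_length(bins, [0] * 100)           # one shared context
    print(f"two contexts: {split:.2f} bits, one context: {merged:.2f} bits")
```

When the two halves of the bin sequence have different statistics, giving each its own context yields a shorter ideal code length than sharing one model, which is the effect that context selection is meant to exploit.
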
The paper Design of Architecture and FPGA Implementation of a Video Encoder proposes a novel VLSI
architecture for a video encoder that processes high-resolution video sequences at real-time
rates [5]. The architecture was realized in Verilog and implemented on a Xilinx XUPVP30 FPGA.
Including an output FIFO of 128 Kbits, the total gate count of the implementation was around
800,000. Following the MPEG-2 standard, it can process color motion pictures of 1600x1200 pixels
in 4:2:0 format at 30 frames per second. The reconstructed picture is of good quality, with PSNR
values of 32 dB or more, and the compression ratio achieved is typically 20 to 40. The main
advantage of the proposed architecture is that it improves throughput by over 30% compared with
an earlier encoder developed by one of its authors. The architecture consists of a Discrete
Cosine Transform and Quantization Processor, Run Length Encoder, Variable Length Coder, Header
Generator, Serializer, FIFOs, and a Master Controller that coordinates all the activities of the
encoder. The video encoder was also coded in Matlab to validate the Verilog realization. The
drawback is that it uses a FIFO and needs a serializer.
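
To put the quoted figures in context, the following sketch computes the uncompressed bit rate of 4:2:0 video and the PSNR between an original and a reconstructed picture. It assumes 8 bits per sample and a peak value of 255; neither is stated in the source, and the function names are illustrative.

```python
import math


def raw_bitrate_420(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate for 4:2:0 video: 1.5 samples per pixel on average."""
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps


def psnr_db(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally long sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    raw = raw_bitrate_420(1600, 1200, 30)
    print(f"raw: {raw / 1e6:.1f} Mbit/s")
    for ratio in (20, 40):
        print(f"compression ratio {ratio}: {raw / ratio / 1e6:.2f} Mbit/s")
```

With these assumptions, a compression ratio of 20 to 40 brings the raw stream down by the corresponding factor, and `psnr_db` gives the quality measure against which the 32 dB figure would be checked.
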
Distributed Video Coding (DVC) is a new coding paradigm for video compression, based on
information-theoretic results for lossless and lossy coding [4]. The primary aim of DVC is
low-complexity video encoding, in which the bulk of the computation is shifted to the decoder.
DVC techniques are of two types: pixel-domain and transform-domain. The transform-domain design
gives better rate-distortion (RD) performance because it compacts the block energy into a few
transform coefficients and exploits the spatial correlation between neighboring samples. For
sequences with low to medium motion activity, DVC gives better RD performance than H.264. The
drawbacks are: no definition of a compressed video transport bit-stream, inconsistent RD
performance relative to H.264 intra across different video streams, no standardization, flicker,
no procedure for coding chroma components, and the need for a feedback channel from decoder to
encoder.
A new lifting processor architecture for JPEG2000 was proposed and implemented on both ASIC and
FPGA [7]. It introduced a new cell structure that executes one unit of the repetitive lifting
arithmetic. The unit cell was optimized on the basis of a detailed analysis of the operational
sequence of the lifting arithmetic, imposing causality so that it can be implemented in hardware.
By repeatedly arranging the unit cells, a new simple lifting kernel was organized, and a lifting
processor for Motion JPEG2000 was realized with the kernel. The proposed processor supports both
lossy and lossless operation, with the (9,7) filter and the (5,3) filter respectively, and can
handle tiles of any size. It also outputs the four types of wavelet coefficients (LL, LH, HL, HH)
continuously and simultaneously, at the same throughput rate as the input. The lifting processor
was implemented in a 0.35 µm CMOS fabrication process. The operating speeds of the ASIC and FPGA
were 150 MHz and 115 MHz, which correspond to data rates of 1.2 Gb/s (219 frames/s) and 0.92 Gb/s
(166 frames/s), respectively. These speeds are more than enough for real-time processing, even of
high-definition images. Comparison with existing architectures showed that the proposed design
uses fewer hardware resources and has lower complexity, at the cost of a larger memory
requirement, and that it operates stably.
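
The lossless property of the (5,3) filter comes from its integer lifting steps, which can be undone exactly. The sketch below is a minimal software illustration of one-level 1-D (5,3) lifting with symmetric boundary extension for even-length signals. It is not the cell structure of [7], and the function names are assumptions.

```python
def lift53_forward(x):
    """One-level reversible (5,3) lifting; returns (low-pass s, high-pass d)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    even, odd = x[0::2], x[1::2]
    half = len(even)
    d = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]  # mirror at the right edge
        d.append(odd[n] - ((even[n] + right) >> 1))       # predict step
    s = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]                # mirror at the left edge
        s.append(even[n] + ((left + d[n] + 2) >> 2))      # update step
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by reversing the update and predict steps."""
    half = len(s)
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - ((left + d[n] + 2) >> 2))
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        x.append(even[n])
        x.append(d[n] + ((even[n] + right) >> 1))
    return x


if __name__ == "__main__":
    signal = [10, 12, 14, 200, 3, 7, 7, 9]
    low, high = lift53_forward(signal)
    print("low:", low, "high:", high)
    print("exact reconstruction:", lift53_inverse(low, high) == signal)
```

Because each step only adds or subtracts an integer computed from values the decoder also has, the inverse recovers the input bit for bit, whatever rounding the shifts introduce.
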


Figure 1 below shows the block diagram of the SS-SIVC encoder. It consists of a 3-level DWT
module, three two-pass quality-driven bit-plane sequencer modules based on the M-coder (BPS),
and a bitstream buffer (BSB).


Fig. 1 Block diagram of the SS-SIVC codec system.

The block diagram of the lifting-based 2-D DWT architecture is shown in Figure 2 below. Here we
use a pipelined architecture with a parallel scanning method, which is memory-efficient for the
2-D lifting-based DWT and fast enough for the application. The architecture contains two 1-D
processors, a temporal buffer, and a transposing buffer. The LL-band coefficients of lower levels
are stored in the LL memory for decomposition at the next higher level. The transpose buffer is
designed with the scan order in mind. Given the 2-input/2-output characteristic of the 1-D
processors, the parallel scan is obtained by combining the line scan methodology with the
two-output characteristic.

Fig. 2 Block diagram of the proposed 2-D DWT architecture.
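
The row-then-column dataflow of a separable 2-D DWT can be sketched in software as two 1-D passes with a transposition between them. For brevity the sketch uses integer Haar lifting as a stand-in for the 5/3 or 9/7 filter, and a plain software transpose in place of the transposing buffer; it does not model the parallel scan or the pipelining of the hardware. All names are illustrative.

```python
def haar_lift_1d(samples):
    """Integer Haar lifting on one line: low-pass half followed by high-pass half."""
    low, high = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        d = b - a                   # predict step
        low.append(a + (d >> 1))    # update step
        high.append(d)
    return low + high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dwt2d_one_level(image):
    """Row pass, transpose, column pass, transpose back; returns (LL, LH, HL, HH)."""
    rows_done = [haar_lift_1d(row) for row in image]
    cols_done = [haar_lift_1d(col) for col in transpose(rows_done)]
    coeffs = transpose(cols_done)
    h2, w2 = len(image) // 2, len(image[0]) // 2
    ll = [r[:w2] for r in coeffs[:h2]]
    hl = [r[w2:] for r in coeffs[:h2]]  # here HL: high-pass along rows, low-pass along columns
    lh = [r[:w2] for r in coeffs[h2:]]  # here LH: low-pass along rows, high-pass along columns
    hh = [r[w2:] for r in coeffs[h2:]]
    return ll, lh, hl, hh


if __name__ == "__main__":
    image = [[0, 10, 0, 10] for _ in range(4)]
    for name, band in zip(("LL", "LH", "HL", "HH"), dwt2d_one_level(image)):
        print(name, band)
```

The LH/HL naming convention varies between authors; the comments state the one used here. Image dimensions are assumed to be even.
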

Figure 3 below shows how an image is decomposed into its bands using filters, together with the
structure of the wavelet decomposition, in which the LL band is decomposed again at each level.

Fig. 3 Filter stages in 2D DWT and the structure of wavelet decomposition
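
The repeated decomposition of the LL band fixes the size of every sub-band. As a small worked example, the sketch below lists the sub-band dimensions for a given number of levels, assuming the image dimensions stay even at every level. The function name and the 1920x1080 example are illustrative.

```python
def subband_layout(width, height, levels):
    """List (level, band, width, height) for a decomposition that re-splits LL."""
    bands = []
    w, h = width, height
    for level in range(1, levels + 1):
        if w % 2 or h % 2:
            raise ValueError("dimensions must stay even at every level in this sketch")
        w, h = w // 2, h // 2
        for name in ("LH", "HL", "HH"):
            bands.append((level, name, w, h))
    bands.append((levels, "LL", w, h))  # only the final LL band is kept
    return bands


if __name__ == "__main__":
    layout = subband_layout(1920, 1080, 3)
    for level, name, w, h in layout:
        print(f"level {level} {name}: {w}x{h}")
    total = sum(w * h for _, _, w, h in layout)
    print("total coefficients:", total)
```

The total number of coefficients equals the number of input pixels, so the decomposition itself neither adds nor removes data; compression comes from the later bit-plane and entropy coding stages.
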

The output of the DWT is taken as the input to bit-plane slicing, after which the further
operations are performed.


Initially a real-time video is taken as input through a webcam and divided into a number of
frames. This gives a series of images, on which the 2-D DWT operation is performed; the output is
stored in a folder. Figure 4 below shows one frame from the continuous video on which the 2-D DWT
operation is performed. Figure 5 shows the output of the DWT.

Fig. 4 Image from video

Fig. 5 Sub bands LL, LH, HL & HH after applying DWT

Figure 6 below shows the output of bit-plane slicing, in which one image is split into 8 planes:
the most significant bits (MSBs) of all pixels lie in one plane, the least significant bits
(LSBs) lie in another, and each bit position in between has a plane of its own.

Fig. 6 Bit plane slicing output
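
Bit-plane slicing as described above can be sketched in a few lines. The example assumes 8-bit unsigned pixel values, as in the text; signed DWT coefficients would additionally need a sign plane, which is not modeled here. The function names are illustrative.

```python
def slice_bit_planes(image, bits=8):
    """Split an image of unsigned integers into `bits` binary planes, MSB plane first."""
    return [[[(pixel >> b) & 1 for pixel in row] for row in image]
            for b in range(bits - 1, -1, -1)]


def merge_bit_planes(planes, keep=None):
    """Rebuild the image from the first `keep` planes (all planes if keep is None)."""
    bits = len(planes)
    keep = bits if keep is None else keep
    height, width = len(planes[0]), len(planes[0][0])
    image = [[0] * width for _ in range(height)]
    for index in range(keep):
        weight = 1 << (bits - 1 - index)  # plane 0 carries the MSB
        for y in range(height):
            for x in range(width):
                image[y][x] += planes[index][y][x] * weight
    return image


if __name__ == "__main__":
    image = [[200, 13], [255, 0]]
    planes = slice_bit_planes(image)
    print("MSB plane:", planes[0])
    print("LSB plane:", planes[-1])
    print("top 4 planes only:", merge_bit_planes(planes, keep=4))
    print("all planes:", merge_bit_planes(planes))
```

Merging only the most significant planes gives a coarser approximation of the image, which is the basis for SNR scalability: sending more planes refines the quality without resending the earlier ones.
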


Here we have performed the DWT on the images obtained from the captured video. We employ the DWT
instead of other transforms because it takes both location and frequency information into
consideration, whereas the others take only one of the two. Results have been obtained up to
bit-plane slicing. The remaining work is to perform the bit-plane sequencer operation, which
produces a sequence from the sliced image, and then the M-Coder operation.


[1] Tsung-Han Tsai, Zong-Hong Li, Hsueh-Yi Lin, and Li-Yang Huang, "Memory-Efficient Scalable
Video Encoder Architecture for Multi-Source Digital Home Environment," Department of Electrical
Engineering, National Central University, Taoyuan, Taiwan, 2013.
[2] Yeong-Kang Lai, Lien-Fei Chen, and Yui-Chih Shih, "A High-Performance and Memory-Efficient
VLSI Architecture with Parallel Scanning Method for 2-D Lifting-Based Discrete Wavelet
Transform," IEEE Transactions on Consumer Electronics, Vol. 55, No. 2, May.
[3] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in
the H.264/AVC video compression standard," IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 13, pp. 620-636, July 2003.
[4] Vijay Kumar Kodavalla and D. P. G. Krishna Mohan, "Distributed Video Coding: CODEC
architecture and implementation," Signal & Image Processing: An International Journal (SIPIJ),
Vol. 2, No. 1, March 2011.
[5] Tung-Chien Chen, Chung-Jr Lian, and Liang-Gee Chen, "Hardware Architecture Design of an
H.264/AVC Video Codec," Department of Electrical Engineering, National Taiwan University,
Taipei, Taiwan, 2006.
[6] B.-F. Wu and C.-F. Lin, "A high-performance and memory-efficient pipeline architecture for
the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec," IEEE Trans. Circuits Syst. Video
Technol., Vol. 15, No. 12, Dec. 2005.
[7] Y.H. Seo and D.W. Kim, "VLSI architecture of line-based lifting wavelet transform for motion
JPEG2000," IEEE Journal of Solid-State Circuits, Vol. 42, No. 2, pp. 431-440, 2007.
[8] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Flipping structure: An efficient VLSI
architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., Vol. 52,
No. 4, pp. 1080-1089, April 2004.