Abstract - Video compression is becoming increasingly important in several applications. Vector quantization (VQ) is a powerful technique for very low bit rate image/video compression and is an attractive technique for mobile multimedia applications. Adaptive VQ techniques provide excellent coding performance at the expense of significant increases in computational complexity, making real-time implementation difficult. In this paper, we propose a VLSI chip-set design to implement a high-performance cache-based (adaptive) VQ (CVQ) technique using VHDL for real-time video compression.
I. INTRODUCTION
Vector quantization (VQ) is becoming increasingly popular in the domain of very low bit rate image/video compression. VQ is particularly attractive in applications such as mobile multimedia, video conferencing, and video telephony. In VQ [1], a set of representative images is decomposed into L-dimensional vectors. An iterative clustering algorithm such as the LBG algorithm is used to generate a codebook (CB) of size N. This codebook is then made available at both the transmitter and the receiver. In the encoding process, the image to be coded is decomposed into L-dimensional vectors. For each input vector $V_i$, CB is searched using a nearest neighbor rule to find the closest codeword $W_j$. Compression is achieved by transmitting the label j corresponding to $W_j$. Images are reconstructed by using j as an address to a table containing the codewords. The computational complexity of VQ for K input vectors of dimension L and a codebook of size N is O(KLN). For example, a 512 × 512 image with vector dimension L = 16 encoded using a codebook of size N = 256 requires approximately 192 million arithmetic operations. This high computational complexity has
been an impediment to real-time implementation in many applications. Recently, special-purpose architectures that implement VQ in real time have been reported in the literature [2]. Several adaptive VQ techniques which improve the coding performance have also been proposed in the literature. However, most adaptive techniques result in further increases in computational complexity, making real-time implementation difficult.
Recently, adaptive VQ techniques [3] based on the cache concept have been reported. For example, a cache VQ technique (CVQ) has been presented in the literature [4], where two codebooks are used, namely a small primary codebook (PC) and a larger secondary codebook (SC). The frequently used codewords are stored in PC, while the less frequently used codewords are stored in SC. To start with, both PC and SC are empty. For each input vector, PC is first searched for a match within a prespecified threshold (Figure 1). If no match is obtained, the input vector is transmitted and is also appended to PC as a new codeword. If a match is obtained, the index of the corresponding codeword is transmitted. When PC becomes full, the least recently used (LRU) codeword is moved from PC to SC, freeing room for the new codeword. From this point on, PC is searched first for a match; if that fails, SC is also searched. A new codeword is appended only if no match is obtained in both codebooks, and in this case the LRU codeword in PC is moved to SC. However, if SC is also full, the LRU codeword in SC is deleted. If a match is obtained in SC, the index of that codeword is transmitted and the codeword is swapped with the LRU codeword in PC.
However, we note that this algorithm and other cache-based VQ algorithms cannot be directly mapped onto the existing architectures, since a software implementation of the LRU replacement algorithm may degrade the real-time performance of CVQ.
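The CVQ update rules described above can be sketched as a small software model. This is only an illustrative sketch, not the hardware design proposed in this paper: the class names `Codebook` and `CVQEncoder`, the squared-error distortion, the tuple-style output symbols, and the choice to restart a codeword's recency when it enters SC are all assumptions made for the example.

```python
from dataclasses import dataclass, field


def distortion(a, b):
    """Squared-error distortion between two vectors (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Codebook:
    capacity: int
    words: list = field(default_factory=list)      # codewords, addressed by slot
    last_used: list = field(default_factory=list)  # logical time of last use

    def full(self):
        return len(self.words) >= self.capacity

    def best_match(self, v, threshold):
        """Slot of the closest codeword within the threshold, or None."""
        best, best_d = None, None
        for slot, w in enumerate(self.words):
            d = distortion(v, w)
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = slot, d
        return best

    def lru_slot(self):
        """Slot holding the least recently used codeword."""
        return min(range(len(self.words)), key=lambda s: self.last_used[s])


class CVQEncoder:
    """Behavioral model of the PC/SC cache scheme (assumes n1 >= 1)."""

    def __init__(self, n1, n2, threshold):
        self.pc, self.sc = Codebook(n1), Codebook(n2)
        self.threshold = threshold
        self.clock = 0

    def encode(self, v):
        self.clock += 1
        now = self.clock
        slot = self.pc.best_match(v, self.threshold)
        if slot is not None:                        # hit in PC: send its index
            self.pc.last_used[slot] = now
            return ("PC", slot)
        slot = self.sc.best_match(v, self.threshold)
        if slot is not None:                        # hit in SC: swap with PC's LRU
            victim = self.pc.lru_slot()
            self.pc.words[victim], self.sc.words[slot] = (
                self.sc.words[slot], self.pc.words[victim])
            self.pc.last_used[victim] = now
            self.sc.last_used[slot] = now
            return ("SC", slot)
        self._insert(list(v), now)                  # miss in both: send the vector
        return ("NEW", tuple(v))

    def _insert(self, w, now):
        if not self.pc.full():
            self.pc.words.append(w)
            self.pc.last_used.append(now)
            return
        victim = self.pc.lru_slot()                 # PC full: demote its LRU word
        demoted = self.pc.words[victim]
        self.pc.words[victim], self.pc.last_used[victim] = w, now
        if not self.sc.full():
            self.sc.words.append(demoted)
            self.sc.last_used.append(now)
        else:                                       # SC full: its LRU word is dropped
            out = self.sc.lru_slot()
            self.sc.words[out], self.sc.last_used[out] = demoted, now
```

A decoder would have to mirror the same updates to stay synchronized with the encoder. The model also makes the cost of the LRU bookkeeping visible: `lru_slot` scans the whole codebook in software, which is the step the text identifies as a bottleneck for real-time operation.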
In this paper, we propose a VLSI chip-set design which implements both the CVQ algorithm and the LRU algorithm in real time. The circuit has been built and tested using VHDL. The details of the architecture are discussed in Section II. In Section III, the VHDL implementation of the design is presented, followed by the conclusions in Section IV.
CCECE/CCGEI '95
II. CVQ ARCHITECTURE
The CVQ architecture consists of five main modules:
• Input delay module (IM)
• Systolic array module (SM)
• LRU module (LM)
• Comparator module (CM)
• Output delay module (OM)
The details of each module follow.
There are two SM modules in the CVQ design:
• SM1 with $L \times N_1$ cells for the PC of size $N_1$,
• SM2 with $L \times N_2$ cells for the SC of size $N_2$.
The block diagram of the SM with $L \times N$ basic cells is shown in Figure 3.
Equation 1
The basic cell in SM is shown in Figure 2. It calculates the distortion between an element of the input vector and the corresponding element of a codeword, and accumulates the distortion (Equation 2).
The CLK signal synchronizes the operation of the cell. The codeword element, $C_{p,j}$, is stored in the RAM cell, C. The input value of the vector element is also passed to the cell's output. The sequence of operations can be expressed as follows:
The systolic array executes the encoding algorithm. Here, each input vector element, $V_{i,j}$ ($i = 1, 2, \ldots, K$ and $j = 1, 2, \ldots, L$), is compared with the corresponding element of each codeword, $C_{p,j}$ ($p = 1, 2, \ldots, N$), stored in the array (the RAMs are in read mode). We note that the elements of the vector are pumped into the array at intervals of one clock cycle. The distortion $CD_{i,p}$ is the cumulative distortion of the ith input vector compared with the pth codeword. This is then fed to a comparator module to select the best codeword. The CLK signal also serves to synchronize the operations of the cells in the array. In the first clock cycle, the input vector element $V_{1,1}$ is fed into the cell $C_{1,1}$. In the next clock cycle this element is fed into $C_{2,1}$, while the second element of the input vector, $V_{1,2}$, is fed to $C_{1,2}$, and so on. After L clock cycles the last element of the input vector, $V_{1,L}$, is fed to the cell $C_{1,L}$. The element $V_{1,1}$ is compared with $C_{N,1}$ at the Nth clock cycle. The sequence of operations is best understood from the cell occupancy diagram (Figure 4), which shows the cells occupied by the elements of the vector at different instants of time.
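The skewed data flow through the array can be checked with a small cycle-by-cycle software model. The sketch below is a behavioral illustration under stated assumptions, not the VHDL design: the function name `systolic_distortions` is hypothetical, the per-element distortion is taken to be the squared error, one new input vector is assumed to enter per clock cycle, and element j of a vector is assumed to reach the first row j - 1 cycles after element 1.

```python
def systolic_distortions(vectors, codebook):
    """Simulate an L x N array of accumulate cells, one clock cycle at a time.

    Returns (cd, done_at), where cd[i][p] is the cumulative distortion of
    input vector i against codeword p and done_at[i][p] is the 1-based clock
    cycle in which that value leaves the last cell of row p.
    """
    K, L, N = len(vectors), len(vectors[0]), len(codebook)
    v_reg = [[None] * L for _ in range(N)]    # element currently held by each cell
    acc_reg = [[None] * L for _ in range(N)]  # partial distortion output of each cell
    cd = [[None] * N for _ in range(K)]
    done_at = [[None] * N for _ in range(K)]

    for t in range(K + L + N - 2):            # 0-based clock index
        new_v = [[None] * L for _ in range(N)]
        new_acc = [[None] * L for _ in range(N)]
        for p in range(N):
            for j in range(L):
                if p == 0:
                    i = t - j                 # skewed input: element j lags by j cycles
                    elem = vectors[i][j] if 0 <= i < K else None
                else:
                    elem = v_reg[p - 1][j]    # element passed on from the previous row
                new_v[p][j] = elem
                if elem is None:
                    continue
                # Partial sum arrives from the neighbouring cell of the same codeword.
                partial = 0 if j == 0 else acc_reg[p][j - 1]
                new_acc[p][j] = partial + (elem - codebook[p][j]) ** 2
        v_reg, acc_reg = new_v, new_acc
        for p in range(N):
            i = t - (L - 1) - p               # vector finishing in row p this cycle
            if 0 <= i < K:
                cd[i][p] = acc_reg[p][L - 1]
                done_at[i][p] = t + 1
    return cd, done_at


vectors = [[1, 2, 3], [4, 5, 6]]
codebook = [[1, 2, 3], [0, 0, 0], [4, 5, 7], [1, 1, 1]]
cd, done_at = systolic_distortions(vectors, codebook)
```

In this model the first vector's distortion against the first codeword is complete at cycle L, and against the Nth codeword at cycle L + N - 1, which matches the occupancy pattern described in the text.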
We note that this algorithm can be mapped onto a pipeline which generates the LRU label at every clock cycle. The schematic diagram of the LRU module is shown in Figure 7. The architecture comprises a chain of identical processing elements (PEs), a decoder, an encoder and a register to store the RP. Each PE consists of a flip-flop and some combinational logic. The implementation is modular and easily expandable. Details of the design are presented in [5].
Each cell consists of a comparator and a label register. The cell determines the label using the following algorithm. If CD,, ≥ TD,,
else
The OM reorganizes the input vector so that all of its elements are output in one clock cycle. This module (Figure 8) is simply the mirror image of IM.
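The mirror-image relationship between the two delay modules can be illustrated with a short software sketch. It assumes, since the internal structure of IM is not given here, that IM delays element j of each vector by j - 1 clock cycles and that OM applies the complementary delay of L - j cycles; the function names `skew` and `deskew` are hypothetical.

```python
def skew(vectors):
    """IM-style skew: element j (0-based) of each vector is delayed by j cycles.

    Returns one row per clock cycle; None marks a lane with no valid data.
    """
    K, L = len(vectors), len(vectors[0])
    return [
        [vectors[t - j][j] if 0 <= t - j < K else None for j in range(L)]
        for t in range(K + L - 1)
    ]


def deskew(stream):
    """OM-style realignment: lane j is delayed by a further L - 1 - j cycles,
    so that all L elements of a vector appear in the same clock cycle."""
    L = len(stream[0])
    return [
        [stream[t - (L - 1 - j)][j] for j in range(L)]
        for t in range(L - 1, len(stream))
    ]


vectors = [[1, 2, 3], [4, 5, 6]]
staggered = skew(vectors)        # rows: [1,_,_], [4,2,_], [_,5,3], [_,_,6]
restored = deskew(staggered)     # equals the original list of vectors
```

Every lane sees a total delay of L - 1 cycles across the two modules, which is why the second module can be built as the mirror image of the first.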
IV. Conclusion
Vector quantization (VQ) is an excellent technique for very low bit rate image/video compression and is attractive for mobile multimedia applications. CVQ is a powerful adaptive VQ technique that provides excellent coding performance at reduced complexity. In this paper, we have presented a VLSI chip-set design to implement the CVQ technique using VHDL for real-time video compression. A behavioral VHDL description of the design has been implemented using the synthesizable part of the VHDL language. Timing analysis demonstrates that this chip set is suitable for real-time video compression.
A behavioral VHDL description of the design has been implemented using the synthesizable part of the VHDL language. The implementation is parameterized by general values of the codebook size, N, and the vector dimension, L. After an initial latency of $L + \max(N_1, N_2)$ clock cycles, the label of the first input vector becomes available. The labels for the subsequent input vectors are output at intervals of one clock cycle. The design has been synthesized (translated and then optimized) and tested. The resulting chip area and speed for the three basic cells are shown in Table 1. We note that area and speed can be improved by using advanced technology libraries.
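The latency figures translate into a simple cycle count. The sketch below reads the initial latency as L + max(N1, N2) clock cycles with one label per cycle thereafter; the function names and the codebook sizes N1 = 64 and N2 = 256 are hypothetical values chosen only for illustration, not figures reported for the design.

```python
def first_label_latency(L, n1, n2):
    """Clock cycles until the label of the first input vector is available."""
    return L + max(n1, n2)


def total_cycles(K, L, n1, n2):
    """Clock cycles to label K vectors, one further label per cycle."""
    return first_label_latency(L, n1, n2) + (K - 1)


L = 16                        # vector dimension (4 x 4 blocks)
K = (512 * 512) // L          # 16384 vectors in a 512 x 512 image
N1, N2 = 64, 256              # assumed PC and SC sizes

print(first_label_latency(L, N1, N2))   # 272
print(total_cycles(K, L, N1, N2))       # 16655
```

Under these assumptions the pipeline fill time is small compared with the number of vectors per frame, so throughput is governed almost entirely by the clock rate.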
V. Acknowledgment
The authors would like to thank Mr. Robert Sawaya for his help in this project and the Ministry of Culture and Higher Education of the Islamic Republic of Iran for the financial support of this project.
VI. References
1) N. M. Nasrabadi and R. A. King, "Image Coding Using Vector Quantization: A Review", IEEE Trans. on Communications, Vol. COM-36, No. 8, pp. 957-971, August 1988.
2) G. A. Davidson, P. R. Cappello and A. Gersho, "Systolic Architectures for Vector Quantization", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-36, pp. 1651-1664, October 1988.
3) S. Panchanathan and M. Goldberg, "A Mini-Max Algorithm for Image Adaptive Vector Quantization", IEE Proceedings: Part I - Communications, Speech and Vision, Vol. 138, No. 1, pp. 53-60, February 1991.
4) F. Idris and S. Panchanathan, "Image Sequence Coding Using Frame Adaptive Vector Quantization", Visual Communications and Image Processing '93, Vol. 2094, pp. 941-952, November 1993.
5) O. Fatemi, F. Idris and S. Panchanathan, "FPGA Implementation of the LRU Algorithm for Video Compression", IEEE Transactions on Consumer Electronics, Vol. 40, No. 3, pp. 337-344, August 1994.
Table 1: Chip area and speed for the three basic cells