
IMPLEMENTATION OF HUFFMAN DECODER IN FPGA USING VERILOG

A PROJECT REPORT Submitted by P. ABHIRAM (41501106001) S. ANAND (41501106005) W. ANAND (41501106006)

in partial fulfillment for the award of the degree of BACHELOR OF ENGINEERING in ELECTRONICS AND COMMUNICATION ENGINEERING

S.R.M. ENGINEERING COLLEGE, KATTANKULATHUR-603 203, KANCHEEPURAM DISTRICT. ANNA UNIVERSITY : CHENNAI - 600 025 MAY 2005

BONAFIDE CERTIFICATE

Certified that this project report "IMPLEMENTATION OF HUFFMAN DECODER IN FPGA USING VERILOG" is the bonafide work of " P. ABHIRAM(41501106001), S. ANAND (41501106005) and W. ANAND (41501106006) " who carried out the project work under my supervision.

Prof. S. JAYASHRI
HEAD OF THE DEPARTMENT
ELECTRONICS AND COMMUNICATION ENGG.
S.R.M. Engineering College
Kattankulathur - 603 203
Kancheepuram District

Mr. B. RAMACHANDRAN
SUPERVISOR
Assistant Professor
ELECTRONICS AND COMMUNICATION ENGG.
S.R.M. Engineering College
Kattankulathur - 603 203
Kancheepuram District

ACKNOWLEDGEMENT

We sincerely acknowledge the guidance and advice of all the people who have helped us in completing this project successfully. We take this opportunity to thank our revered director Dr. T. P. Ganesan for providing us with an opportunity to carry out the project. We take immense pleasure in expressing our thanks to our respected principal Prof. R. Venkatramani, B.E., M.Tech., F.I.E., who has always been a source of inspiration for us.


We are greatly obliged to Dr. S. Jayasri, Ph.D., Head of the Department, Electronics and Communication Engineering, for her constant encouragement throughout the project. We express our gratitude to our guide Mr. Ramachandran, Assistant Professor, Department of Electronics and Communication Engineering, for his guidance and timely suggestions that helped us through the tough phases of the project. We are indebted to Mrs. Susila, Senior Lecturer, Department of Electronics and Communication Engineering, for conducting timely reviews which helped us complete our work in time. We also express our sincere thanks to all our college staff and friends for their suggestions and encouragement.

ABSTRACT

Digital bandwidth compression of data is important due to the limitations inherent in the transmission medium. Algorithms based on Huffman coding are effective and have become feasible, allowing efficient use of bandwidth. The purpose of this project is to present a tutorial on, and an implementation of, Huffman coding and decoding techniques. The project also deals with the implementation of a Huffman decoder on a Xilinx Spartan FPGA using the Verilog Hardware Description Language.

CONTENTS

Abstract
List of Tables
List of Figures

CHAPTER 1
1. Introduction

CHAPTER 2
2. Data Compression
2.1 Fundamental Concepts
2.1.1 Definitions
2.2 Classification of Methods
2.3 A Data Compression Model

CHAPTER 3
3. Introduction to Huffman Coding
3.1 Adaptive Huffman Coding
3.2 Huffman Decoding Techniques
3.2.1 Look-Up Table Method

CHAPTER 4
4. Very Large Scale Integration
4.1 Dealing with VLSI Circuits
4.2 The VLSI Design Process
4.3 Classification of VLSI Designs
4.4 Developments in the Field of VLSI
4.4.1 Reconfigurable Computing

CHAPTER 5
5. Verilog
5.1 A Brief History of Verilog
5.2 Design Flow using Verilog
5.3 System-level Verification
5.4 RTL Design and Testbench Creation
5.5 RTL Verification
5.6 Levels of Abstraction
5.7 Design Process
5.8 System Level
5.9 Synthesizing Verilog

CHAPTER 6
6. FPGA Technology
6.1 General Overview
6.2 FPGA Implementation Overhead
6.3 Performance Characteristics
6.4 FPGA Design Flow
6.4.1 Design Entry
6.4.2 Behavioral Simulation
6.4.3 Synthesis
6.4.4 Post-Synthesis Simulation
6.4.5 Implementation
6.4.6 Rapid Development Cycles

CHAPTER 7
7. Implementation of Huffman Decoder
7.1 Decoding Procedure

CHAPTER 8
8. Applications of Huffman Coding
8.1 Huffman Coding of the Image Data
8.2 The Huffman Coding in MP3
8.3 Huffman Decoding in MP3

CONCLUSION
APPENDIX
REFERENCES

LIST OF TABLES

Table 3.1 Each Character's Appearance Frequency
Table 3.2 Fixed Length Code
Table 3.3 Huffman Code
Table 3.2.1 Look-Up Table Method
Table 8.1 Huffman Coded Cross Image
Table 8.2 Pixel Number and Huffman Code of Cross Mark

LIST OF FIGURES

Figure 2.1 A block-block code
Figure 2.2 Retrieve Message
Figure 2.3 Huffman Code for the Message
Figure 2.4 A dynamic Huffman code
Figure 3.1 System Diagram
Figure 3.2 Huffman Code Decoding
Figure 5.2.1 Design Flow using Verilog
Figure 5.6.1 Levels of Abstraction
Figure 5.7.1 Design Process
Figure 6.1.1 Basic Spartan-II Block Diagram
Figure 6.4.1 HDL Editor
Figure 6.4.2 FSM Editor
Figure 6.4.3 Block Diagram Editor
Figure 7.1 Block Diagram
Figure 8.1 Huffman Coding of the Image Data

1. INTRODUCTION

Compression technology has gained significance due to various reasons. A typical audio clip lasts for at least 3 minutes. This would amount to a storage requirement of around 8-9 MB for PCM audio samples. VLSI technology is enabling faster DSPs providing more computational power. At the same time, memory is becoming cheaper, though not at the same rate as processor speed. The availability of fast processors enabled applications that need high MIPS. But due to relatively expensive on-chip memory and the expensive bandwidth connecting PCs (Personal Computers) through the Internet, the necessity of efficient data compression schemes for storage and transmission was felt. This resulted in a spurt of activities generating Audio, Video, Imaging and Speech codec (coder/decoder) standards.

Storing an uncompressed signal requires a large amount of space. Data compression reduces the number of bits required to store and transmit the information. Due to the explosion of multimedia applications, effective compression techniques are becoming essential. Huffman coding is a loss-less compression technique, often used in lossy compression schemes as the final step after decomposition and quantization of a signal. Huffman coding uses unique variable-length codewords, and no Huffman code is a prefix of another code in the table. For a given probability density function of the symbol set, by assigning short codes to frequently occurring symbols and longer codes to infrequently occurring symbols, Huffman's minimum-redundancy encoding minimizes the average number of bits required to represent the data. Huffman coding is one of the Variable Length Coding (VLC) techniques. It is useful for reducing the bit-rate by exploiting statistical information redundancies with a "minimum set" entropy-coding technique. Usually, entropy coders exploit the symbol probabilities independently of previous symbols, and hence they are optimal for uncorrelated sequences.

2. Data Compression

2.1 Fundamental Concepts

A brief introduction to information theory is provided in this section. The definitions and assumptions necessary to a comprehensive discussion and evaluation of data compression methods are discussed. The following string of characters is used to illustrate the concepts defined: EXAMPLE = aa bbb cccc ddddd eeeeee fffffff gggggggg.

2.1.1 Definitions

A code is a mapping of source messages (words from the source alphabet alpha) into codewords (words of the code alphabet beta). The source messages are the basic units into which the string to be represented is partitioned. These basic units may be single symbols from the source alphabet, or they may be strings of symbols. For string EXAMPLE, alpha = { a, b, c, d, e, f, g, space }. For purposes of explanation, beta will be taken to be { 0, 1 }. Codes can be categorized as block-block, block-variable, variable-block or variable-variable, where block-block indicates that the source messages and codewords are of fixed length and variable-variable codes map variable-length source messages into variable-length codewords. A block-block code for EXAMPLE is shown in Figure 2.1 and a variable-variable code is given in Figure 2.2. If the string EXAMPLE were coded using the Figure 2.1 code, the length of the coded message would be 120; using Figure 2.2 the length would be 30.

source message    codeword
a                 000
b                 001
c                 010
d                 011
e                 100
f                 101
g                 110
space             111

Figure 2.1: A block-block code

source message    codeword
aa                0
bbb               1
cccc              10
ddddd             11
eeeeee            100
fffffff           101
gggggggg          110
space             111

Figure 2.2: A variable-variable code

The oldest and most widely used codes, ASCII and EBCDIC, are examples of block-block codes, mapping an alphabet of 64 (or 256) single characters onto 6-bit (or 8-bit) codewords. These are not discussed, as they do not provide compression. The codes featured in this survey are of the block-variable, variable-variable, and variable-block types.

When source messages of variable length are allowed, the question of how a message ensemble (sequence of messages) is parsed into individual messages arises. Many of the algorithms described here are defined-word schemes. That is, the set of source messages is determined prior to the invocation of the coding scheme. For example, in text file processing each character may constitute a message, or messages may be defined to consist of alphanumeric and non-alphanumeric strings. In Pascal source code, each token may represent a message. All codes involving fixed-length source messages are, by default, defined-word codes. In free-parse methods, the coding algorithm itself parses the ensemble into variable-length sequences of symbols. Most of the known data compression methods are defined-word schemes; the free-parse model differs in a

fundamental way from the classical coding paradigm.

A code is distinct if each codeword is distinguishable from every other (i.e., the mapping from source messages to codewords is one-to-one). A distinct code is uniquely decodable if every codeword is identifiable when immersed in a sequence of

codewords. Clearly, each of these features is desirable. The codes of Figure 2.1 and Figure 2.2 are both distinct, but the code of Figure 2.2 is not uniquely decodable. For example, the coded message 11 could be decoded as either ddddd or bbbbbb. A uniquely decodable code is a prefix code (or prefix-free code) if it has the prefix property, which requires that no codeword is a proper prefix of any other codeword. All uniquely decodable block-block and variable-block codes are prefix codes. The code with codewords { 1, 100000, 00 } is an example of a code which is uniquely decodable but which does not have the prefix property. Prefix codes are instantaneously decodable; that is, they have the desirable property that the coded message can be parsed into codewords without the need for lookahead. In order to decode a message encoded using the codeword set { 1, 100000, 00 }, lookahead is required. For example, the first codeword of the message 100000 0001 is 1, but this cannot be determined until the last (tenth) symbol of the message is read (if the string of zeros had been of odd length, then the first codeword would have been 100000).

In this section, a code has been defined to be a mapping from a source alphabet to a code alphabet; we now define related terms. The process of transforming a source ensemble into a coded message is coding or encoding. The encoded message may be referred to as an encoding of the source ensemble. The algorithm which constructs the mapping and uses it to transform the source ensemble is called the encoder. The decoder performs the inverse operation, restoring the coded message to its original form.

2.2 Classification of Methods

In addition to the categorization of data compression

schemes with respect to message and codeword lengths, these methods are classified as either static or dynamic. A static method is one in which the mapping from the set of messages to the set of codewords is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message ensemble. The classic static defined-word scheme is Huffman coding [Huffman 1952]. In Huffman coding, the assignment of codewords to source messages is based on the probabilities with which the source messages appear in the message ensemble. Messages which appear more frequently are represented by short codewords; messages with smaller probabilities map to longer codewords. These probabilities are determined before transmission begins. A Huffman code for the ensemble EXAMPLE is given in Figure 2.3. If EXAMPLE were coded using this Huffman mapping, the length of the coded message would be 117.

source message    probability    codeword
a                 2/40           1001
b                 3/40           1000
c                 4/40           011
d                 5/40           010
e                 6/40           111
f                 7/40           110
g                 8/40           00
space             5/40           101

Figure 2.3: A Huffman code for the message EXAMPLE (code length = 117)

A code is dynamic if the mapping from the set of messages to the set of codewords changes over time. For example, dynamic Huffman coding involves computing an approximation to the probabilities of occurrence "on the fly", as the ensemble is being transmitted. The assignment of codewords to messages is based

on the values of the relative frequencies of occurrence at each point in time. A message x may be represented by a short codeword early in the transmission because it occurs frequently at the beginning of the ensemble, even though its probability of occurrence over the total ensemble is low. Later, when the more probable messages begin to occur with higher frequency, the short codeword will be mapped to one of the higher probability messages and x will be mapped to a longer codeword. As an illustration, Figure 2.4 presents a dynamic Huffman code table corresponding to the prefix aa bbb of EXAMPLE. Although the frequency of space over the entire message is greater than that of b, at this point in time b has higher frequency and therefore is mapped to the shorter codeword.

source message    probability    codeword
a                 2/6            10
b                 3/6            0
space             1/6            11

Figure 2.4: A dynamic Huffman code table for the prefix aa bbb of message EXAMPLE

Dynamic codes are also referred to in the literature as adaptive, in that they adapt to changes in ensemble characteristics over time. The term adaptive will be used for the remainder of this paper; the fact that these codes adapt to changing characteristics is the source of their appeal. Some adaptive methods adapt to changing patterns in the source while others exploit locality of reference. Locality of reference is the tendency, common in a wide variety of text types, for a particular word to occur frequently for short periods of time and then fall into disuse for long periods.

All of the adaptive methods are one-pass methods; only one scan of the ensemble is required. Static Huffman coding requires two passes: one pass to compute probabilities and determine the mapping, and a second pass for transmission. Thus, as long as the encoding and decoding times of an adaptive method are not substantially greater than those of a static method, the fact that an initial scan is not needed implies a speed improvement in the adaptive case. In addition, the mapping determined in the first pass of a static coding scheme must be transmitted by the encoder to the decoder. The mapping may preface each transmission (that is, each file sent), or a single mapping may be agreed upon and used for multiple transmissions. In one-pass methods the encoder defines and redefines the mapping dynamically, during transmission. The decoder must define and redefine the mapping in sympathy, in essence "learning" the mapping as codewords are received. An algorithm may also be a hybrid, neither completely static nor completely dynamic. In a simple hybrid scheme, sender and receiver maintain identical codebooks containing k static codes. For each transmission, the sender must choose one of the k previously-agreed-upon codes and inform the receiver of his choice (by transmitting first the "name" or number of the chosen code).

2.3 A Data Compression Model

In order to discuss the relative merits of data compression techniques, a framework for comparison must be established. There are two dimensions along which each of the schemes discussed here may be measured: algorithm complexity and amount of compression. When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends upon the number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble. In a data storage application, although the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in order for the scheme to be practical. For a static scheme, there are three algorithms to analyze: the map construction algorithm, the encoding algorithm, and the decoding algorithm. For a dynamic scheme, there are just two algorithms: the encoding algorithm and the decoding algorithm.

When data is compressed, the goal is to reduce redundancy, leaving only the informational content. The measure of information of a source message x (in bits) is -lg p(x). This definition has intuitive appeal; in the case that p(x) = 1, it is clear that x is not at all informative since it had to occur. Similarly, the smaller the value of p(x), the more unlikely x is to appear, hence the larger its information content. The average information content over the source alphabet can be computed by weighting the information content of each source letter by its probability of occurrence, yielding the expression SUM{i=1 to n} [-p(a(i)) lg p(a(i))]. This quantity is referred to as the entropy of a source letter, or the entropy of the source, and is denoted by H. Since the length of a codeword for message a(i) must be sufficient to carry the information content of a(i), entropy imposes a lower bound on the number of bits required for the coded message. The total number of bits must be at least as large as the product of H and the length of the source ensemble. Since the value of H is generally not an integer, variable-length codewords must be used if the lower bound is to be achieved. Given that message EXAMPLE is to be encoded one letter at a time, the entropy of its source can be calculated using the probabilities given in Figure 2.3: H = 2.894, so that the minimum number of bits contained in an encoding of EXAMPLE is 116.
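As a quick check, the value of H can be worked out directly from the probabilities of Figure 2.3 (a verification sketch, in LaTeX notation):

H = -\sum_i p(a_i)\,\lg p(a_i)
  = -\left(\tfrac{2}{40}\lg\tfrac{2}{40} + \tfrac{3}{40}\lg\tfrac{3}{40} + \tfrac{4}{40}\lg\tfrac{4}{40} + \tfrac{5}{40}\lg\tfrac{5}{40} + \tfrac{6}{40}\lg\tfrac{6}{40} + \tfrac{7}{40}\lg\tfrac{7}{40} + \tfrac{8}{40}\lg\tfrac{8}{40} + \tfrac{5}{40}\lg\tfrac{5}{40}\right)
  \approx 2.894

so the lower bound on the coded length is 40 \times 2.894 \approx 115.8 bits, i.e. at least 116 bits for the 40-symbol ensemble.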


Huffman uses average message length, SUM p(a(i)) l(i), as a measure of the efficiency of a code. Clearly the meaning of this term is the average length of a coded message. We will use the term average codeword length to represent this quantity. Since redundancy is defined to be average codeword length minus entropy, and entropy is constant for a given probability distribution, minimizing average codeword length minimizes redundancy.

The amount of compression yielded by a coding scheme can be measured by a compression ratio. The term compression ratio has been defined in several ways. The definition C = (average message length)/(average codeword length) captures the common meaning, which is a comparison of the length of the coded message to the length of the original ensemble. If we think of the characters of the ensemble EXAMPLE as 6-bit ASCII characters, then the average message length is 6 bits. The Huffman code of Figure 2.3 encodes EXAMPLE in 117 bits, or about 2.9 bits per character. This yields a compression ratio of 6/2.9, representing compression by a factor of more than 2. Alternatively, we may say that Huffman encoding produces a file whose size is 49% of the original ASCII file, or that 49% compression has been achieved.
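The arithmetic behind these figures, written out as a brief check (a sketch in LaTeX notation):

C = \frac{\text{average message length}}{\text{average codeword length}} = \frac{6}{117/40} = \frac{6}{2.925} \approx 2.05,
\qquad \frac{2.925}{6} \approx 0.49

so the coded file is roughly 49% of the size of the 6-bit ASCII original.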


3. Introduction to Huffman Coding

The latest trend in data compression is the Variable Length Decoder for Huffman codes, with which the amount of digital data is compressed. Using the decoder we can obtain the original data from the compressed data.

Figure 3.1: System Diagram

Figure 3.1 shows the system diagram. The system is divided into two large blocks: the Sender and the Decoder. The sender transmits the compressed data using Huffman codes, and the decoder uncompresses the data and recovers the original data. Since the Huffman code length is not fixed, the decoder has to find out the length of each code while processing. In the following sections, the basics of Huffman codes are explained so that students with digital design knowledge can take up the design task.

Consider the following example. There is a sentence in which only the 5 characters A to E are used:

CEACDABABCEABADACADABABADEACBADABCADAEE

In total there are 39 characters. Table 3.1 shows each character's appearance frequency.

Symbol    Frequency to appear
A         15
B         7
C         6
D         6
E         5

Table 3.1: Each Character's Appearance Frequency

From Table 3.1, you will notice that the appearance frequency of 'A' is maximum. If you want to express the sentence as digital data, you can assign a fixed-length bit pattern to each alphabet. Please see Table 3.2.

Symbol    Frequency to appear    Bit Pattern    Total bits
A         15                     000            15*3=45
B         7                      001            7*3=21
C         6                      010            6*3=18
D         6                      011            6*3=18
E         5                      100            5*3=15
                                                Total 117 bits

Table 3.2: Fixed Length Code

In order to express the 5 alphabets, a 3-bit fixed-length code is used. Since there are 39 alphabets in total, 39*3 = 117 bits are needed to express the sentence in digital data. The Table 3.2 example uses the 3-bit fixed-length code. Since the appearance frequency of 'A' is high, if we use a shorter code for 'A', the amount of digital data needed to express the sentence can be decreased. One method to do this is to use the Huffman code. Table 3.3 shows the example using the Huffman code. The total number of bits is reduced from 117 to 87 bits.


Symbol    Frequency to appear    Huffman code    Code length    Total bits
A         15                     0               1              15*1=15
B         7                      100             3              7*3=21
C         6                      101             3              6*3=18
D         6                      110             3              6*3=18
E         5                      111             3              5*3=15
                                                                Total 87 bits

Table 3.3: Huffman Code

Using the Huffman code in Table 3.3, the word 'BAD' can be expressed as '1000110', which is 7 bits long. Conversely, the bit stream '01100111' can be analyzed from the beginning and the word 'ADAE' can be recovered. Using Figure 3.2, the decoding method will be explained.

Figure 3.2: Huffman Code Decoding

The bit stream '01100111' has to be analyzed from the beginning in order to find a matching Huffman code. Since the code length is not fixed, only once a matching Huffman code is detected can the first bit of the next code be found. In the example of Figure 3.2, at first the code '0' is found. Then you know the alphabet is 'A' and the code length is 1, and you can restart the analysis from the second bit. After 3 more bits are analyzed, the code '110' is found. Then you know the alphabet is 'D' and the code length is 3. The Huffman code satisfies the condition that "no code of the group matches any prefix of another code of the group". Therefore the decoding method explained above can generate only the original data.

3.1 Adaptive Huffman Coding

Adaptive Huffman coding was first conceived independently by Faller and Gallager. Knuth contributed improvements to the original algorithm, and the resulting algorithm is referred to as algorithm FGK. A more recent version of adaptive Huffman coding is described by Vitter. All of these methods are defined-word schemes which determine the mapping from source messages to codewords based upon a running estimate of the source message probabilities. The code is adaptive, changing so as to remain optimal for the current estimates. In this way, adaptive Huffman codes respond to locality. In essence, the encoder is "learning" the characteristics of the source. The decoder must learn along with the encoder by continually updating the Huffman tree so as to stay in synchronization with the encoder. Another advantage of these systems is that they require only one pass over the data. Of course, one-pass methods are not very interesting if the number of bits they transmit is significantly greater than that of the two-pass scheme.

Interestingly, the performance of these methods, in terms of number of bits transmitted, can be better than that of static Huffman coding. This does not contradict the optimality of the static method, as the static method is optimal only over all methods which assume a time-invariant mapping. The performance of the adaptive methods can also be worse than that of the static method. Upper bounds on the redundancy of these methods are presented in this section. As discussed in the introduction, the adaptive method of Faller, Gallager and Knuth is the basis for the UNIX utility compact. The performance of compact is quite good, providing typical compression factors of 30-40%.

3.2 Huffman Decoding Techniques

The three types of decoding techniques are:
1. Look-up table method
2. N-level look-up table method
3. Binary tree search method

3.2.1 Look-Up Table Method

In this technique, the Huffman tables used by the algorithm should be converted into the form of a look-up table as explained below. The number of rows in the table will be 2^L, where L is the maximum Huffman code length. Tables used for this technique must contain an entry for each symbol with its corresponding Huffman code length. The Huffman code length for each symbol is stored in the table to determine the number of bits used for decoding a symbol. The remaining bits are put back into the bit-stream, where the next symbol starts. The number of bits to be extracted from the bit-stream for every symbol search depends upon the maximum Huffman code length in the table. Huffman symbols are decoded within one search, since the bits extracted from the bit-stream give the address of the symbol in the table.


Procedure for Table Conversion: Table 3.2.1 is used to demonstrate the conversion of a valid Huffman table into look-up form.

Table 3.2.1

The above table has a maximum Huffman code length of 3. The first row of the table has symbol = 0, Huffman code length = 1 and Huffman code = 0. Since the Huffman code length is less than the maximum value, it can have more than one entry in the converted table. Addresses for the look-up table entries are calculated as:

Bits required for appending to the Huffman code = Maximum Huffman code length - Length of Huffman code

Bits required to append to the Huffman code = 3 - 1 = 2:
000 = 0
001 = 1
010 = 2
011 = 3

These are the four addresses from 0 to 3, for which symbol = 0 and Huffman code length = 1 in the converted table. The next row of the table has symbol = 1, Huffman code length = 2 and Huffman code = 3. Number of bits required to append to the Huffman code = 3 - 2 = 1:
110 = 6
111 = 7

Addresses 6 and 7 will have symbol = 1 and Huffman code length = 2. In this way, the other addresses of the table entries are calculated. Since the maximum code length for Table 3.2.1 is 3, the number of rows in the converted table will be 2^L = 2^3 = 8. Table 3.2.2 below shows the converted Huffman table for Table 3.2.1:

Table 3.2.2

Example of a Decoding Procedure: Let "1000110" be a valid bit-stream. Since 3 is the maximum Huffman code length of Table 3.2.2, extract 3 bits from the bit-stream. The first 3 bits give address "100" (= 4), which corresponds to symbol 3 and Huffman code length = 3 in the look-up table. Again extract 3 bits, "011" (= 3), to get the next symbol from the bit-stream. This address gives symbol = 0 and Huffman code length = 1. Since 3 bits were extracted from the bit-stream and the actual length of the Huffman code is 1, 2 bits are put back into the bit-stream. The bit-stream pointer now points to the 5th bit in the bit-stream. Again extract 3 bits from the bit-stream, addressing "110" (= 6); its corresponding symbol = 1 and Huffman code length = 2, so put back 1 bit into the bit-stream. In this way the symbols are searched in the look-up table method. This method requires huge memory for tables having long Huffman codes and is hence inefficient for such tables, but it is very useful for tables having small Huffman codes. This is the fastest Huffman decoding technique.
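To make the table look-up concrete, the following is a minimal Verilog sketch of a single-search decoder for the example above. It is illustrative only, not the code from the appendix. The maximum code length is 3, so 3 bits of the stream address an 8-entry table; the entry at address 5 is assumed to map to symbol 2, since it is not spelled out in the worked example. The surrounding logic would discard "length" bits from the stream and keep the rest for the next search.

module huff_lut_decode (
    input  wire [2:0] bits,    // next 3 bits taken from the bit-stream
    output reg  [1:0] symbol,  // decoded symbol (0..3)
    output reg  [1:0] length   // number of bits actually consumed (1..3)
);
    // Converted look-up table of Table 3.2.2, written as a casez:
    //   addresses 0-3 -> symbol 0 (code 0,   length 1)
    //   address  4    -> symbol 3 (code 100, length 3)
    //   address  5    -> symbol 2 (code 101, length 3)  <- assumed entry
    //   addresses 6-7 -> symbol 1 (code 11,  length 2)
    always @* begin
        casez (bits)
            3'b0??:  begin symbol = 2'd0; length = 2'd1; end
            3'b100:  begin symbol = 2'd3; length = 2'd3; end
            3'b101:  begin symbol = 2'd2; length = 2'd3; end
            default: begin symbol = 2'd1; length = 2'd2; end  // 3'b11?
        endcase
    end
endmodule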


4. Very Large Scale Integration

VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas. Thanks to VLSI, circuits that would have taken boardfuls of space can now be put into a small space a few millimetres across. This has opened up a big opportunity to do things that were not possible before. VLSI circuits are everywhere: your computer, your car, your brand new state-of-the-art digital camera, your cell phone, and so on. All this involves a lot of expertise on many fronts within the same field, which we will look at in later sections.

4.1 Dealing with VLSI Circuits

Digital VLSI circuits are predominantly CMOS based. The way normal blocks like latches and gates are implemented is different from what students have seen so far, but the behaviour remains the same. All the miniaturisation involves new things to consider. A lot of thought has to go into actual implementations as well as design. Let us look at some of the factors involved:

1. Circuit Delays. Large complicated circuits running at very high frequencies have one big problem to tackle - the problem of delays in the propagation of signals through gates and wires, even for areas a few micrometres across. The operation speed is so large that as the delays add up, they can actually become comparable to the clock speeds.

2. Power. Another effect of high operation frequencies is increased consumption of power. This has a two-fold effect: devices consume batteries faster, and heat dissipation increases. Coupled with the fact that surface areas have decreased, heat poses a major threat to the stability of the circuit itself.

3. Layout. Laying out the circuit components is a task common to all branches of electronics. What is special in our case is that there are many possible ways to do this; there can be multiple layers of different materials on the same silicon, there can be different arrangements of the smaller parts for the same component, and so on.

The power dissipation and speed in a circuit present a trade-off; if we try to optimize one, the other is affected. The choice between the two is determined by the way we choose to lay out the circuit components. Layout can also affect the fabrication of VLSI chips, making it either easy or difficult to implement the components on the silicon.

4.2 The VLSI Design Process

A typical digital design flow is as follows:
Specification -> Architecture -> RTL Coding -> RTL Verification -> Synthesis -> Backend -> Tape Out to Foundry to get the end product: a wafer with a repeated number of identical ICs.

All modern digital designs start with a designer writing a hardware description of the IC (using an HDL, or Hardware Description Language) in Verilog/VHDL. A Verilog or VHDL program essentially describes the hardware (logic gates, flip-flops, counters etc.), the interconnect of the circuit blocks and the functionality. Without going into details, we can say that Verilog can be called the "C" of the VLSI industry. This language is used to design circuits at a high level, in two ways. It can either be a behavioural description, which describes what the circuit is supposed to do, or a structural description, which describes what the circuit is made of. There are other languages for describing circuits, such as VHDL, which work in a similar fashion.

Both forms of description are then used to generate a very low-level description that actually spells out how all this is to be fabricated on the silicon chips. This will result in the manufacture of the intended IC.

A typical analog design flow is as follows. In case of analog design, the flow changes somewhat:
Specifications -> Architecture -> Circuit Design -> SPICE Simulation -> Layout -> Parametric Extraction / Back Annotation -> Final Design.

While digital design is highly automated now, only a very small portion of analog design can be automated. There is a hardware description language called AHDL, but it is not widely used, as it does not accurately give us the behavioural model of the circuit because of the complexity of the effects of parasitics on the analog behaviour of the circuit. Many analog chips are what are termed flat or non-hierarchical designs. This is true for small-transistor-count chips such as an operational amplifier, a filter or a power management chip. For more complex analog chips such as data converters, the design is done at a transistor level, building up to a cell level, then a block level and then integrated at a chip level.

4.3 Classification of VLSI Designs

1. Analog: Small-transistor-count precision circuits such as amplifiers, data converters, filters, phase locked loops, sensors etc.

2. Application Specific Integrated Circuits (ASICs): Progress in the fabrication of ICs has enabled us to create fast and powerful circuits in smaller and smaller devices. This also means that we can pack a lot more functionality into the same area. The biggest application of this ability is found in the design of ASICs. These are ICs that are created for specific purposes - each device is created to do a particular job, and do it well. The most common application area for this is DSP - signal filters, image compression, etc. To go to extremes, consider the fact that a digital wristwatch normally consists of a single IC doing all the time-keeping jobs as well as extra features like games, calendar, etc.

3. SoC or Systems on a Chip: These are highly complex mixed-signal circuits (digital and analog all on the same chip). A network processor chip or a wireless radio chip is an example of an SoC.

4.4 Developments in the Field of VLSI

There are a number of directions a person can take in VLSI, and they are all closely related to each other. Together, these developments are going to make possible the visions of embedded systems and ubiquitous computing.

4.4.1 Reconfigurable Computing

Reconfigurable computing is a very interesting and pretty recent development in microelectronics. It involves fabricating circuits that can be reprogrammed on the fly! And no, we are not talking about microcontrollers running with EEPROM inside. Reconfigurable computing involves specially fabricated devices called FPGAs, that when programmed act just like normal electronic circuits. They are so designed that by changing or "reprogramming" the connections between numerous sub-modules, the FPGAs can be made to behave like any circuit we wish.

This fantastic ability to create modifiable circuits again opens up new possibilities in microelectronics. Consider, for example, microprocessors which are partly reconfigurable. We know that running complex programs can benefit greatly if support was built into the hardware itself. We could have a microprocessor that could optimise itself for every task that it tackled! Or consider a system that is too big to implement on hardware that may be limited by cost or other constraints. If we use a reconfigurable platform, we could design the system so that parts of it are mapped onto the same hardware, at different times. One could think of many such applications, not the least of which is prototyping - using an FPGA to try out a new design before it is actually fabricated. This can drastically reduce development cycles, and also save some money that would have been spent in fabricating prototype ICs.

5. Verilog

Verilog is a Hardware Description Language; a textual format for describing electronic circuits and systems. Applied to electronic design, Verilog is intended to be used for verification through simulation, for timing analysis, for test analysis (testability analysis and fault grading) and for logic synthesis. The Verilog HDL is an IEEE standard - number 1364. The first version of the IEEE standard for Verilog was published in 1995. A revised version was published in 2001; this is the current version. The IEEE Verilog standard document is known as the Language Reference Manual, or LRM. This is the complete authoritative definition of the Verilog HDL.


A further revision of the Verilog standard is expected to be published in 2005. Accellera - the organisation that oversees developments in the Verilog language - has also published the SystemVerilog extensions to Verilog, which are also expected to become an IEEE standard in 2005. See the appropriate Know-How section for more details about SystemVerilog. IEEE Std 1364 also defines the Programming Language Interface, or PLI. This is a collection of software routines which permit a bidirectional interface between Verilog and other languages (usually C). Note that Verilog is not an abbreviation for "VHDL HDL"; Verilog and VHDL are two different HDLs. They have more similarities than differences, however.

5.1 A Brief History of Verilog

The history of the Verilog HDL goes back to the 1980s, when a company called Gateway Design Automation developed a logic simulator, Verilog-XL, and with it a hardware description language. Cadence Design Systems acquired Gateway in 1989, and with it the rights to the language and the simulator. In 1990, Cadence put the language (but not the simulator) into the public domain, with the intention that it should become a standard, non-proprietary language. The Verilog HDL is now maintained by a non-profit organisation, Accellera, which was formed from the merger of Open Verilog International (OVI) and VHDL International. OVI had the task of taking the language through the IEEE standardisation procedure.

In December 1995 Verilog HDL became IEEE Std. 1364-1995. A revised version was published in 2001: IEEE Std. 1364-2001. This is the current version, although a further revision is expected in 2005. Accellera have also been developing a new standard, SystemVerilog, which extends Verilog. SystemVerilog is also expected to become an IEEE standard in 2005. For more details, see the SystemVerilog section of Know-How. There is also a draft standard for analog and mixed-signal extensions to Verilog, Verilog-AMS.

5.2 Design Flow using Verilog

The diagram below summarizes the high-level design flow for an ASIC (i.e. gate array, standard cell) or FPGA. In a practical design situation, each step described in the following sections may be split into several smaller steps, and parts of the design flow will be iterated as errors are uncovered.


Figure 5.2.1: Design Flow using Verilog

5.3 System-level Verification

As a first step, Verilog may be used to model and simulate aspects of the complete system containing one or more ASICs or FPGAs. This may be a fully functional description of the system allowing the specification to be validated prior to commencing detailed design. Alternatively, this may be a partial description that abstracts certain properties of the system, such as a performance model to detect system performance bottlenecks. Verilog is not ideally suited to system-level modelling. This is one motivation for SystemVerilog, which enhances Verilog in this area.

5.4 RTL Design and Testbench Creation


Once the overall system architecture and partitioning is stable, the detailed design of each ASIC or FPGA can commence. This starts by capturing the design in Verilog at the register transfer level, and capturing a set of test cases in Verilog. These two tasks are complementary, and are sometimes performed by different design teams in isolation to ensure that the specification is correctly interpreted. The RTL Verilog should be synthesizable if automatic logic synthesis is to be used. Test case generation is a major task that requires a disciplined approach and much engineering ingenuity: the quality of the final ASIC or FPGA depends on the coverage of these test cases. For today's large, complex designs, verification can be a real bottleneck. This provides another motivation for SystemVerilog; it has features for expediting testbench development.

5.5 RTL Verification

The RTL Verilog is then simulated to validate the functionality against the specification. RTL simulation is usually one or two orders of magnitude faster than gate-level simulation, and experience has shown that this speed-up is best exploited by doing more simulation, not spending less time on simulation. In practice it is common to spend 70-80% of the design cycle writing and simulating Verilog at and above the register transfer level, and 20-30% of the time synthesizing and verifying the gates.

5.6 Levels of Abstraction

Verilog descriptions can span multiple levels of abstraction, i.e. levels of detail, and can be used for different purposes at various stages in the design process.


Figure 5.6.1: Levels of Abstraction

At the highest level, Verilog contains stochastic functions (queues and random probability distributions) to support performance modelling. Verilog supports abstract behavioural modelling, so it can be used to model the functionality of a system at a high level of abstraction. This is useful at the system analysis and partitioning stage. Verilog supports Register Transfer Level descriptions, which are used for the detailed design of digital circuits. Synthesis tools transform RTL descriptions to gate level. Verilog supports gate and switch level descriptions, used for the verification of digital designs, including gate and switch level logic simulation, static and dynamic timing analysis, testability analysis and fault grading. Verilog can also be used to describe simulation environments: test vectors, expected results, results comparison and analysis.

With some tools, Verilog can be used to control simulation, e.g. setting breakpoints, taking checkpoints, restarting from time 0, tracing waveforms. However, most of these functions are not included in the 1364 standard, but are proprietary to particular simulators. Most simulators have their own command languages; with many tools this is based on Tcl, which is an industry-standard tool language.

5.7 Design Process

The diagram below shows a very simplified view of the electronic system design process incorporating Verilog. The central portion of the diagram shows the parts of the design process which will be impacted by Verilog.

Figure 5.7.1: Design Process

5.8 System Level

Verilog is not ideally suited for abstract system-level simulation, prior to the hardware-software split. This is to some extent addressed by SystemVerilog. Unlike VHDL, which has support for user-defined types and overloaded operators which allow the designer to abstract his work into the domain of the problem, Verilog restricts the designer to working with predefined system functions and tasks for stochastic simulation, and can be used for modelling performance, throughput and queuing, but only in so far as those built-in language features allow. Designers occasionally use the stochastic level of abstraction for this phase of the design process.


Digital

Verilog is suitable for use today in the digital hardware design process, from functional simulation, manual design and logic synthesis down to gate-level simulation. Verilog tools provide an integrated design environment in this area. Verilog is also suited for specialized implementation-level design verification tools such as fault simulation, switch-level simulation and worst-case timing simulation. Verilog can be used to simulate gate-level fanout loading effects and routing delays through the import of SDF files. The RTL level of abstraction is used for functional simulation prior to synthesis. The gate level of abstraction exists post-synthesis, but this level of abstraction is not often created by the designer; it is a level of abstraction adopted by the EDA tools (synthesis and timing analysis, for example).

Analog

Because of Verilog's flexibility as a programming language, it has been stretched to handle analog simulation in limited cases. There is a draft standard, Verilog-AMS, that addresses analog and mixed-signal extensions.

5.9 Synthesizing Verilog

Synthesis is a broad term often used to describe very different tools. Synthesis can include silicon compilers and function generators used by ASIC vendors to produce regular RAM- and ROM-type structures. Synthesis in the context of this tutorial refers to generating random logic structures from Verilog descriptions. This is best suited to gate arrays and programmable devices such as FPGAs. Synthesis is not a panacea! It is vital to tackle high-level design using Verilog with realistic expectations of synthesis.


The definition of Verilog for simulation is cast in stone and enshrined in the Language Reference Manual. Other tools which use Verilog, such as synthesis, will make their own interpretation of the Verilog language. There is an IEEE standard for Verilog synthesis (IEEE Std. 1364.1-2002), but no vendor adheres strictly to it. It is not sufficient that the Verilog is functionally correct; it must be written in such a way that it directs the synthesis tool to generate good hardware, and moreover, the Verilog must be matched to the idiosyncrasies of the particular synthesis tool being used. We shall tackle some of these idiosyncrasies in this Verilog tutorial. There are currently three kinds of synthesis:

- behavioural synthesis
- high-level synthesis
- RTL synthesis

There is some overlap between these three synthesis domains. We will concentrate on RTL synthesis, which is by far the most common. The essence of RTL code is that operations described in Verilog are tied to particular clock cycles. The synthesized netlist exhibits the same clock-by-clock cycle behaviour, allowing the RTL testbench to be easily re-used for gate-level simulation.
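As a small illustration of what "tied to particular clock cycles" means in practice, here is a minimal synthesizable RTL fragment (an illustrative sketch, not taken from the project code): the register is updated only on a clock edge, so the synthesized gates show the same clock-by-clock behaviour as the source description.

module accumulator (
    input  wire       clk,
    input  wire       rst,
    input  wire       en,
    input  wire [7:0] din,
    output reg  [7:0] acc
);
    // Registered operation: one addition per clock cycle when enabled
    always @(posedge clk) begin
        if (rst)
            acc <= 8'd0;
        else if (en)
            acc <= acc + din;
    end
endmodule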

6. FPGA Technology

FPGAs are chips which are programmed by the customer to perform the desired functionality. The chips may be programmed either:
- once: antifuse technology, e.g. devices manufactured by Quicklogic
- several times: Flash based, e.g. devices manufactured by Actel
- dynamically: SRAM based, e.g. devices manufactured by Actel, Altera, Atmel, Cypress, Lucent, Xilinx

Each technology has its own advantages, which shall be discussed only very briefly:

Antifuse FPGAs
- Devices are configured by burning a set of fuses. Once the chip is configured, it cannot be altered any more.
- Bug fixes and updates are possible for new PCBs, but hardly for already manufactured boards.
- ASIC replacement for small volumes.

Flash FPGAs
- Devices may be re-programmed several thousand times and are non-volatile, i.e. they keep their configuration after power-off.
- With only marginal additional effort, the chips may be updated in the field.
- Expensive.
- Re-configuration takes several seconds.

SRAM FPGAs
- Currently the dominating technology.
- Unlimited re-programming.
- Additional circuitry is required to load the configuration into the FPGA after power-on.
- Re-configuration is very fast; some devices allow even partial re-configuration during operation. This allows new approaches and applications - buzzword: reconfigurable computing. E.g. a circuit that searches for a specific DNA pattern, or a mobile phone that downloads the latest protocol update.

6.1 General Overview

There are several different families of FPGAs available from different semiconductor companies. These device families differ slightly in their architecture and feature set, however most of them follow a common approach: a regular, flexible, programmable architecture of Configurable Logic Blocks (CLBs), surrounded by a perimeter of programmable Input/Output Blocks (IOBs). These functional elements are interconnected by a powerful hierarchy of versatile routing channels. The following paragraphs describe the architecture implemented by Xilinx Spartan-II FPGAs, a device family launched in mid-2000, which is typically used in high-volume applications where the versatility of a fast programmable solution adds benefits. The user-programmable gate array, shown in Figure 6.1.1, is composed of five major configurable elements:
- IOBs provide the interface between the package pins and the internal logic
- CLBs provide the functional elements for constructing most logic
- Dedicated BlockRAM memories of 4096 bits each
- Clock DLLs for clock-distribution delay compensation and clock domain control
- Versatile multi-level interconnect structure

Figure 6.1.1: Basic Spartan-II Block Diagram


As can be seen in Figure 6.1.1, the CLBs form the central logic structure with easy access to all support and routing structures. The IOBs are located around all the logic and memory elements for easy and quick routing of signals on and off the chip. Values stored in static memory cells control all the configurable logic elements and interconnect resources. These values load into the memory cells on power-up, and can reload if necessary to change the function of the device.

6.2 FPGA Implementation Overhead

Having a gate-count metric for a device helps to estimate the overhead created by the programmable logic fabric, compared to a standard-cell ASIC. First, we calculate the number of configuration bits required for a single logic slice. The block RAM bits are excluded from this calculation, as the block RAM implementation is close to the ideal implementation and therefore can be directly compared between ASICs and FPGAs. Each configuration bit has to be stored in a single flip-flop. This flip-flop, in turn, controls a specific attribute of the logic cell's behaviour. Assuming every bit drives a single additional gate is very conservative. These gate numbers may be divided by the typical gates per slice to gain an impression of the gate-per-gate implementation overhead. As the feature of distributed RAM is part of the overhead, we have to take it into account here. Taking the square root of this result gives an impression of the overhead in geometric units. The overhead of 59 (whose square root, about 7.7, is roughly the ratio 1.2 µm / 0.15 µm) implies that a 0.15-micron FPGA easily reaches die-size parity with a 1.2-micron standard-cell ASIC.

6.3 Performance Characteristics

According to the data sheets, Spartan-II devices provide system clock rates up to 200 MHz and internal performance as high as 333 MHz. This section provides the performance

characteristics of some common functions. Unlike the data sheet figures, these examples have been described in VHDL and run through the standard synthesis and implementation tools to achieve an understanding of real-world performance. In the case of multiple inputs and outputs, the worst delay is reported; all values are reported in MHz. For all performance data it should be remembered that about 50% of the delays are caused by routing delays. The routing delays are highly dependent on device utilization and the quality of the place & route process.

6.4 FPGA Design Flow

6.4.1 Design Entry

Design Entry is the process of creating the design and entering it into the development system. The following methods are widely used for design entry:
- HDL Editor
- State Machine Editor
- Block Diagram Editor

Typing a design into an HDL editor is the most obvious way of entering high-level languages like VHDL into the development system. Recent editors offer functionality like syntax highlighting, auto completion or language templates to speed up design entry. The main advantage of using an HDL editor for design entry is that text files are simple to share across tools, platforms and sites. On the other side, text may not be the most convenient way of editing a design; however, this is highly dependent on the design. For creating finite state machines, special editors are available. Using these editors is a convenient way of creating FSMs by graphical entry of bubble diagrams. Most tools create VHDL from the graphics representation, but hide this process completely from the user. The main advantage is that the graphical representation is much easier to understand and maintain. On the other side, sharing a design across tool or platform boundaries may be difficult. For creating structural designs, block diagram editors are available. Like FSM editors, these tools create VHDL or EDIF from the graphical representation and hide this process from the user. Again, the main advantage is that the graphical representation is easier to understand and maintain, with the drawback of reduced compatibility across tool or platform boundaries.

Figure 6.4.1: HDL Editor


Figure 6.4.2: FSM Editor

Figure 6.4.3: Block Diagram Editor


6.4.2 Behavioral Simulation

After design entry, the design is verified by performing behavioral simulation. To do so, a high-level or behavioral simulator is used, which executes the design by interpreting the VHDL code like any other programming language, i.e. regardless of the target architecture. At this stage, FPGA development is much like software development; signals and variables may be watched, procedures and functions may be traced, and breakpoints may be set. The entire process is very fast, as the design is not synthesized, thus giving the developer a quick and complete understanding of the design. The downside of behavioral simulation is that specific properties of the target architecture, namely timing and resource usage, are not covered.
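For the Huffman decoder, behavioral simulation of this kind can be as simple as the following minimal testbench sketch, which reuses the hypothetical huff_lut_decode module sketched in Section 3.2.1 (both the module and the port names are illustrative assumptions, not the project's actual testbench). It applies the bit patterns of the decoding example and prints the decoded symbol and code length, with no reference to the target FPGA.

module tb_huff_lut_decode;
    reg  [2:0] bits;
    wire [1:0] symbol, length;

    // Device under test: the combinational look-up decoder
    huff_lut_decode dut (.bits(bits), .symbol(symbol), .length(length));

    initial begin
        bits = 3'b100; #10 $display("bits=%b -> symbol=%0d length=%0d", bits, symbol, length);
        bits = 3'b011; #10 $display("bits=%b -> symbol=%0d length=%0d", bits, symbol, length);
        bits = 3'b110; #10 $display("bits=%b -> symbol=%0d length=%0d", bits, symbol, length);
        $finish;
    end
endmodule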

6.4.3 Synthesis

Synthesis is the process of translating VHDL to a netlist, which is built from a structure of macros, e.g. adders, multiplexers, and registers. Chip synthesizers perform optimizations, especially hierarchy flattening and optimization of combinational paths. Specific cores, like RAMs or ROMs, are treated as black boxes. Recent tools can duplicate registers, perform re-timing, or optimize their results according to given constraints.

6.4.4 Post-Synthesis Simulation

After performing chip synthesis, post-synthesis simulation is performed. Timing information is either not available or preliminary, based on statistical assumptions which may not reflect the actual design. As the design hierarchy is flattened and optimized, tracing signals is difficult. Due to the mapping of the design into very basic macros, simulation time is lengthy. When post-synthesis results differ from behavioral simulation, most likely initialization values have been omitted, or don't-cares have been resolved in unexpected ways.

6.4.5 Implementation

Implementation is the process of translating the synthesis output into a bitstream suited for a specific target device. This process consists of the following steps:
- Translation
- Mapping
- Place & route

During translation, all instances of target-specific or external cores, especially RAMs and ROMs, are resolved. This step is much like the linking step in software development. The result is a single netlist containing all instances of the design. During mapping, all macro instances are mapped onto the target architecture consisting of LUTs, IOBs, and registers. With this step completed, the design is completely described in primitives of the target architecture. During place & route, all instances are assigned to physical locations on the silicon. This is usually an iterative process, guided by timing constraints provided by the designer. The process continues until the timing constraints are either met, or the tool fails to further improve the timing.


6.4.6 Rapid Development Cycles

The traditional method of designing hardware is a long and winding process, going through many stages with special effort spent on design verification at every stage. This means that the time from drawing board to market is very long. This proves to be rather undesirable in the case of a large expanding market, with many competitors trying to grab a share. We need alternatives to cut down on this time so that new ideas reach the market faster, where the first person to get in normally gains a large advantage.

Another quite different language, that is still under development, is Lava. This is based on an esoteric branch of computer science called "functional programming". FP itself is pretty old, and is radically different from the normal way we write programs. This is because it assumes parallel execution as a part of its structure - it is not based on the normal idea of a "sequence of instructions". This parallel nature is something very suitable for hardware, since logic circuits are inherently parallel in nature. Preliminary studies have shown that Lava can actually create better circuits than VHDL itself, since it affords a high-level view of the system without losing sight of low-level features.

7. Implementation of Huffman decoder
7.1 Block diagram


[Figure 7.1: Block diagram of the Huffman decoder - shift register, look-up table, comparator, output register, and counter]

7.2 Decoding procedure
1. The input is taken bit by bit and stored in a shift register.
2. A new bit gets shifted into the shift register for every clock pulse.
3. The output of the shift register is given to a comparator whose other input comes from a look-up table.
4. The comparator compares the output of the shift register with the look-up table entries.
5. If a match is found, the output is stored in an output register and the contents of the input shift register are cleared.
6. If no match is found, the next bit is shifted in and a comparison with the look-up table is made again.
7. The same process is repeated till a match is found.
8. A counter is used to keep track of the number of bits shifted into the shift register.
9. The above process is repeated till all the bits are decoded.
10. The above procedure is coded in Verilog HDL and simulated using the ModelSim software.
11. The simulated code is then synthesized as explained in the above chapters.
12. The Xilinx design software is used to place and route the gate components according to the design specifications.


13. The software generates a bit file which is then downloaded onto an FPGA Spartan kit.
14. The hardware is then tested with the given set of bitstreams. A minimal sketch of the shift-and-compare logic is given below; the complete decoder source code appears in the Appendix.
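The following fragment sketches the shift-and-compare step described above for just two entries of the look-up table (E = 101 and T = 000). The module name, port names, and the use of ASCII output codes are illustrative choices made for this sketch only; the complete decoder listed in the Appendix handles the full alphabet and all code lengths.

// Minimal sketch of the shift-and-compare decoding step.
// Only two look-up table entries (E = 101, T = 000) are checked here.
module huffman_sketch (
    output reg [7:0] symbol,   // decoded symbol (ASCII), i.e. the output register
    output reg       valid,    // goes high for one clock when a code is recognized
    input            in,       // serial Huffman-coded bit stream
    input            clk
);
    reg [8:0] bitshift;        // serial input shift register
    integer   count;           // counter: number of bits shifted in so far

    initial begin
        bitshift = 9'b0;
        count    = 0;
        symbol   = 8'b0;
        valid    = 1'b0;
    end

    always @(posedge clk) begin
        valid    = 1'b0;
        bitshift = {bitshift[7:0], in};   // shift the new bit into the register
        count    = count + 1;
        if (count == 3) begin             // compare once three bits have arrived
            casex (bitshift)
                9'b??????101: begin symbol = "E"; valid = 1'b1; end
                9'b??????000: begin symbol = "T"; valid = 1'b1; end
                default: ;                // no 3-bit match: a longer code is in progress
            endcase
            if (valid) begin              // match found: clear the register and restart
                bitshift = 9'b0;
                count    = 0;
            end
        end
    end
endmodule

In the full decoder the same comparison is repeated at each code length present in the table (3, 4, 5, 6, 8 and 9 bits), so an input that does not match a short code simply keeps shifting until a longer code from the table matches.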

8. Applications of Huffman coding
8.1 Huffman Coding of the Image Data
Figure 8.1 shows one example of black and white image data. Each pixel (picture element) is represented by 3 bits, so there are 8 levels of pixel intensity.

Figure 8.1. Black and White Image

The left part of Figure 8.1 shows the 8 kinds of pixel intensities; the center number is the level. The brightest pixel is 0, and the larger the number, the darker the pixel. Since there are 8 levels, 3 bits can express all of them. Since the right image (the cross mark) is organized as 8x8 = 64 pixels, the uncompressed image needs 64x3 = 192 bits.


Table 8.1 shows an example of applying the Huffman code to this image. The most frequently appearing pixel, number 0 (white), corresponds to the single-bit code '1'. In total, the cross-mark image can be expressed in 168 bits, giving a compressed size of 168/192 = 87.5% of the original.
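The 168-bit total can be checked directly from the frequencies and code lengths listed in Table 8.1 below:

24x1 + 16x3 + 8x4 + 2x4 + 2x4 + 4x4 + 4x4 + 4x4
= 24 + 48 + 32 + 8 + 8 + 16 + 16 + 16
= 168 bits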

Pixel Number   Frequency   Huffman Code   Code Length   Total Bits
0 (white)          24           1               1            24
1                  16           011             3            48
2                   8           0101            4            32
3                   2           0100            4             8
4                   2           0011            4             8
5                   4           0010            4            16
6                   4           0001            4            16
7 (black)           4           0000            4            16
Total                                                       168 bits

Table 8.1. Huffman Coded Cross Image

Table 8.2 shows in more detail the pixel numbers and the corresponding Huffman codes of the cross mark.

Pixel numbers:
7 0 0 0 0 0 0 7
0 6 1 1 1 1 6 0
0 1 5 2 2 5 1 0
0 1 2 3 4 2 1 0
0 1 2 4 3 2 1 0
0 1 5 2 2 5 1 0
0 6 1 1 1 1 6 0
7 0 0 0 0 0 0 7

Huffman codes:
0000 1    1    1    1    1    1    0000
1    0001 011  011  011  011  0001 1
1    011  0010 0101 0101 0010 011  1
1    011  0101 0100 0011 0101 011  1
1    011  0101 0011 0100 0101 011  1
1    011  0010 0101 0101 0010 011  1
1    0001 011  011  011  011  0001 1
0000 1    1    1    1    1    1    0000


Table 8.2. Pixel Number and Huffman Code of the Cross Mark

8.2. The Huffman Coding in MP3
In the MP3 encoder, at the end of the perceptual coding process, a second compression process is run. However, this second round is not a perceptual coding, but rather a more traditional compression of all the bits in the file, taken together as a whole. To use a loose analogy, you might think of this second run, called the "Huffman coding," as being similar to zip or other standard compression mechanisms (in other words, the Huffman run is completely lossless, unlike the perceptual coding techniques). Huffman coding is extremely fast, as it utilizes a look-up table for spotting possible bit substitutions. In other words, it doesn't have to "figure anything out" in order to do its job. The chief benefit of the Huffman compression run is that it compensates for those areas where the perceptual masking is less efficient. For example, a passage of music that contains many sounds happening at once (i.e., a polyphonic passage) will benefit greatly from the masking filter, while a musical phrase consisting only of a single, sustained note will not. Such a passage can nevertheless be compressed very efficiently with more traditional means, due to its high level of redundancy. On average, an additional 20% of the total file size can be shaved during the Huffman coding.

Huffman Decoding in MP3
The great bulk of the work in the MP3 system as a whole is placed on the encoding process. Since one typically plays files more frequently than one encodes them, this makes sense.


Decoders do not need to store or work with a model of human psychoacoustic principles, nor do they require a bit allocation procedure. All the MP3 player has to worry about is examining the bitstream of header and data frames for spectral components and the side information stored alongside them, and then reconstructing this information to create an audio signal. The player is (often) nothing but a fancy interface onto your collection of MP3 files and playlists and your sound card, encapsulating the relatively straightforward rules of decoding the MP3 bitstream format. While there are measurable differences in the efficiency, and audible differences in the quality, of various MP3 decoders, the differences are largely negligible on computer hardware manufactured in the last few years. That's not to say that decoders just sit in the background consuming no resources. In fact, on some machines and some operating systems you'll notice a slight (or even pronounced) sluggishness in other operations while your player is running. This is particularly true on operating systems that don't feature a finely grained threading model, such as Mac OS and most versions of Windows. Linux and, to an even greater extent, BeOS are largely exempt from MP3 skipping problems, given decent hardware. And of course, if you're listening to MP3 audio streamed over the Internet, you'll get skipping problems if you don't have enough bandwidth to handle the bitrate/sampling frequency of the stream. Some MP3 decoders chew up more CPU time than others, but the differences between them in terms of efficiency are not as great as the differences between their feature sets, or between the efficiency of various encoders. Choosing an MP3 player becomes a question of cost, extensibility, audio quality, and appearance.


CONCLUSION
We have presented a novel method for Huffman decoding using an FPGA. The system can be used directly as the final stage for multimedia decoders like JPEG and MPEG. This method uses a serial method of input with a look-up table approach for decoding. This is a better option compared to the parallel approach to Huffman decoding used in present systems, which, when implemented on an FPGA, requires a larger number of flip-flops and more computational power. Though the system presented above takes more time than the parallel approach for decoding, this will not be a constraint in the present scenario because modern FPGAs use much faster clocks. The coding was done in the Verilog hardware description language instead of the standard VHDL used in present systems, which is more complex than Verilog. The advantages of the above system on an FPGA are that it is reprogrammable and quickly debuggable, and that it has lower power and space requirements on the FPGA chip. The look-up table method is the most common and most widely accepted method for Huffman decoding among the systems available. Thus the proposed system lays minimal strain on the FPGA and extracts maximum output for the given input.

APPENDIX
Look-up table for the English alphabet:
A 1101
B 001101
C 01100
D 0010
E 101
F 111100
G 001110
H 0100
I 1000


J 11111100
K 11111101
L 01111
M 01101
N 1100
O 1110
P 111101
Q 111111100
R 1001
S 0101
T 000
U 01110
V 001100
W 001111
X 111111101
Y 111110
Z 11111111

Source code for the Huffman decoder:

module huffmanmod(out,in,clk); // declaring the Huffman decoder module
input clk,in;
output [8:1] out; // declaring input, output and their types
wire [8:1] out;
reg [8:1] tout;
reg [9:1] bitshift;
integer i=0;
// initializing the design
initial
begin
  // making all registers 0
  bitshift=9'b0;
  tout=8'b0;
end
always @(posedge clk)
begin
  // shifting the bits
  bitshift = {bitshift[8:1],in};
  i=i+1;
  if (i == 9)
    casex(bitshift)
      9'b111111100: begin tout=8'b00010001; bitshift=9'b0; i=0; $display("the output is %d Q",tout); end
      9'b111111101: begin tout=8'b00011000; bitshift=9'b0; i=0; $display("the output is %d X",tout); end
    endcase
  else if (i == 8)
    casex(bitshift)
      9'b?11111111: begin tout=8'b00011010; bitshift=9'b0; i=0; $display("the output is %d Z",tout); end
      9'b?11111100: begin tout=8'b00001010; bitshift=9'b0; i=0; $display("the output is %d J",tout); end
      9'b?11111101: begin tout=8'b00001011; bitshift=9'b0; i=0; $display("the output is %d K",tout); end
    endcase
  else if (i == 6)
    casex(bitshift)
      9'b???001101: begin tout=8'b00000010; bitshift=9'b0; i=0; $display("the output is %d B",tout); end
      9'b???111100: begin tout=8'b00000110; bitshift=9'b0; i=0; $display("the output is %d F",tout); end
      9'b???001110: begin tout=8'b00000111; bitshift=9'b0; i=0; $display("the output is %d G",tout); end
      9'b???111101: begin tout=8'b00010000; bitshift=9'b0; i=0; $display("the output is %d P",tout); end
      9'b???001100: begin tout=8'b00010110; bitshift=9'b0; i=0; $display("the output is %d V",tout); end
      9'b???001111: begin tout=8'b00010111; bitshift=9'b0; i=0; $display("the output is %d W",tout); end
      9'b???111110: begin tout=8'b00011001; bitshift=9'b0; i=0; $display("the output is %d Y",tout); end
    endcase
  else if (i == 5)
    casex(bitshift)
      9'b????01100: begin tout=8'b00000011; bitshift=9'b0; i=0; $display("the output is %d C",tout); end
      9'b????01111: begin tout=8'b00001100; bitshift=9'b0; i=0; $display("the output is %d L",tout); end
      9'b????01101: begin tout=8'b00001101; bitshift=9'b0; i=0; $display("the output is %d M",tout); end
      9'b????01110: begin tout=8'b00010101; bitshift=9'b0; i=0; $display("the output is %d U",tout); end
    endcase
  else if (i == 4)
    casex(bitshift)
      9'b?????1101: begin tout=8'b00000001; bitshift=9'b0; i=0; $display("the output is %d A",tout); end
      9'b?????0010: begin tout=8'b00000100; bitshift=9'b0; i=0; $display("the output is %d D",tout); end
      9'b?????0100: begin tout=8'b00001000; bitshift=9'b0; i=0; $display("the output is %d H",tout); end
      9'b?????1000: begin tout=8'b00001001; bitshift=9'b0; i=0; $display("the output is %d I",tout); end
      9'b?????1100: begin tout=8'b00001110; bitshift=9'b0; i=0; $display("the output is %d N",tout); end
      9'b?????1110: begin tout=8'b00001111; bitshift=9'b0; i=0; $display("the output is %d O",tout); end
      9'b?????1001: begin tout=8'b00010010; bitshift=9'b0; i=0; $display("the output is %d R",tout); end
      9'b?????0101: begin tout=8'b00010011; bitshift=9'b0; i=0; $display("the output is %d S",tout); end
    endcase
  else if (i == 3)
    casex(bitshift)
      9'b??????101: begin tout=8'b00000101; bitshift=9'b0; i=0; $display("the output is %d E",tout); end
      9'b??????000: begin tout=8'b00010100; bitshift=9'b0; i=0; $display("the output is %d T",tout); end
    endcase
  else;
end
// assigning the output to the wire output
assign out=tout;
endmodule

module stimulus; // stimulus block
reg clk;
reg in;
wire [8:1] out;
initial
  clk=1'b0;
always #1 clk = ~clk;


huffmanmod h1(out,in,clk);
initial
begin
  in=1'b1; #2 in=1'b1; #2 in=1'b0; #2 in=1'b1;                            // bitstream 1101
  #2 in=1'b0; #2 in=1'b0; #2 in=1'b1; #2 in=1'b0;                         // bitstream 0010
  #2 in=1'b0; #2 in=1'b0; #2 in=1'b1; #2 in=1'b1; #2 in=1'b1; #2 in=1'b0; // bitstream 001110
  #2 in=1'b1; #2 in=1'b0; #2 in=1'b0; #2 in=1'b0;                         // bitstream 1000
  #2 in=1'b1; #2 in=1'b1; #2 in=1'b0; #2 in=1'b0;                         // bitstream 1100
  #2 in=1'b1; #2 in=1'b1; #2 in=1'b1; #2 in=1'b0;                         // bitstream 1110
  #2 in=1'b0; #2 in=1'b1; #2 in=1'b0; #2 in=1'b1;                         // bitstream 0101
  #2 in=1'b0; #2 in=1'b0; #2 in=1'b0;                                     // bitstream 000
  #2 in=1'b0; #2 in=1'b1; #2 in=1'b1; #2 in=1'b1; #2 in=1'b0;             // bitstream 01110
  #2 in=1'b0; #2 in=1'b0; #2 in=1'b0;                                     // bitstream 000
  #2 in=1'b1; #2 in=1'b1; #2 in=1'b1; #2 in=1'b1;
  #2 $stop;
end
endmodule

Simulation Results
[Simulation waveform plots not reproduced.]



