
Implementation Of Low Power Consumption Convolution Encoder And Viterbi Decoder Using Verilog HDL
Chapter 1 INTRODUCTION
1.1 Overview of the project

Convolutional coding has been used in communication systems including deep space communications and wireless communications. It offers an alternative to block codes for transmission over a noisy channel. An advantage of convolutional coding is that it can be applied to a continuous data stream as well as to blocks of data. IS-95, a wireless digital cellular standard for CDMA (code division multiple access), employs convolutional coding. A third generation wireless cellular standard, under preparation, plans to adopt turbo coding, which stems from convolutional coding. The Viterbi decoding algorithm, proposed in 1967 by Viterbi, is a decoding process for convolutional codes in memoryless noise. The algorithm can be applied to a host of problems encountered in the design of communication systems. The Viterbi decoding algorithm provides both a maximum-likelihood and a maximum a posteriori algorithm. A maximum a posteriori algorithm identifies the code word that maximizes the conditional probability of the decoded code word given the received code word. In contrast, a maximum likelihood algorithm identifies the code word that maximizes the conditional probability of the received code word given the decoded code word. The two algorithms give the same results when the source information has a uniform distribution.
Traditionally, performance and silicon area are the two most important concerns in VLSI design. Recently, power dissipation has also become an important concern, especially in battery-powered applications such as cellular phones, pagers and laptop computers. Power dissipation can be classified into two categories: static power dissipation and dynamic power dissipation. Typically, static power dissipation is due to various leakage currents, while dynamic power dissipation is a result of charging and discharging the parasitic capacitance of transistors and wires.
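As a reminder of why switching activity matters, the dynamic component is commonly approximated by the standard first-order CMOS expression below. The symbols are not taken from this report and are assumptions for illustration: \alpha is the switching activity factor, C_L the switched (parasitic) capacitance, V_{DD} the supply voltage and f the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this approximation, any technique that lowers the number of nodes toggling per clock cycle (a smaller \alpha) or the capacitance being switched reduces dynamic power proportionally.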
Since dynamic power dissipation accounts for about 80 to 90 percent of overall power dissipation in CMOS circuits, numerous techniques have been proposed to reduce it. These techniques can be applied at different levels of
Department of ECE, GITAM UNIVERSITY
digital design, such as the algorithmic level, the architectural level, the gate level and the circuit level. A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that has been encoded using forward error correction based on a convolutional code. The Viterbi algorithm is commonly used in a wide range of communications and data storage applications. It is used for decoding convolutional codes, in baseband detection for wireless systems, and also for detection of recorded data in magnetic disk drives. The requirements for the Viterbi decoder or Viterbi detector, which is a processor that implements the Viterbi algorithm, depend on the application in which it is used. This results in a very wide range of required data throughputs and power or area requirements. Viterbi detectors are used in cellular telephones with low data rates, below about 1 Mb/s, but with very low energy dissipation requirements. They are used for trellis code demodulation in telephone line modems, where the throughput is in the range of tens of kb/s, with restrictive limits on power dissipation and on the area and cost of the chip. At the opposite end, very high speed Viterbi detectors are used in magnetic disk drive read channels, with throughputs over 600 Mb/s. Even at these high speeds, area and power are still limited.
1.2 Motivation
Unlike wired digital networks, wireless digital networks are much more prone to bit errors. Packets of bits that are received are more likely to be damaged and considered unusable in a packetized system. Error detection and correction mechanisms are vital and numerous techniques exist for reducing the effect of bit-errors and trying to ensure that the receiver eventually gets an error free version of the packet. The major techniques used are error detection with Automatic Repeat
Request (ARQ), Forward Error Correction (FEC) and hybrid forms of ARQ and FEC (H-ARQ). This project focuses on FEC techniques. Forward Error Correction (FEC) is the method of transmitting error correction information along with the message. At the receiver, this error correction information is used to correct any bit-errors that may have occurred during transmission. The improved performance comes at the cost of introducing a considerable amount of redundancy in the transmitted code. There are various FEC codes in use today for the purpose of error correction. Most codes fall into one of two major categories: block codes and convolutional codes. Block codes work with fixed length blocks of code. Convolutional codes deal with data sequentially (i.e. taken a few bits at a time), with the output depending on both the present input and previous inputs. In terms of implementation, block codes become very complex as their length increases, whereas convolutional codes are less complex and therefore easier to implement. In
packetized digital networks convolutionally coded data would still be transmitted as packets or blocks. However, these blocks would be much larger than those used by block codes. The fact that convolutional codes are easier to implement, coupled with the emergence of a very efficient convolutional decoding algorithm known as the Viterbi Algorithm, is one of the reasons why convolutional codes have become the preferred method for real time communication technologies. This project studies the use of various error detection and correction techniques for mobile networks with a focus on non-recursive convolutional coding and the Viterbi Algorithm. The constraint length of a non-recursive convolutional code results from the number of stages present in the combinatorial logic of the encoder. The error correction power of a convolutional code increases with its constraint length. However, decoding complexity increases exponentially as the constraint length increases. Fortunately, the efficiency of the Viterbi algorithm allows the use of convolutional coding with quite reasonable constraint lengths in many applications. Due to its high accuracy in finding the most likely sequence of states, the Viterbi algorithm is used in many applications, ranging from communication networks to optical character recognition and even DNA sequence analysis. Recently, interest has grown in the use of certain error correction codes that provide much superior performance. Two of these codes are Low Density Parity Check codes and Turbo Codes. The ideas presented in this thesis are likely to
be relevant to these more advanced codes as well as non-recursive convolutional codes, but this thesis will concentrate on convolutional codes. Since preservation of battery energy is a major concern for mobile devices, it is desirable that the error detection and correction mechanism take the minimum amount of energy to execute. This project explores the possibility of improving the energy efficiency of the Viterbi decoder and develops an algorithm to achieve this.
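To make the exponential growth in decoding complexity concrete, the following minimal sketch counts trellis states and branches per decoding step. It assumes a binary code that accepts one input bit per step and a constraint length K that counts the current input bit plus K - 1 memory cells; the function name trellis_size is hypothetical and chosen only for this illustration.

```python
from typing import Tuple


def trellis_size(constraint_length: int) -> Tuple[int, int]:
    """Return (states, branches per trellis stage) for a binary rate 1/n code.

    Assumption: the constraint length K counts the current input bit plus
    K - 1 memory cells, so the encoder has 2**(K - 1) states.
    """
    if constraint_length < 1:
        raise ValueError("constraint length must be at least 1")
    states = 2 ** (constraint_length - 1)
    branches = 2 * states  # two outgoing branches (input 0 or 1) per state
    return states, branches


for k in (3, 5, 7, 9):
    states, branches = trellis_size(k)
    print(f"K={k}: {states} states, {branches} branch metrics per step")
```

Each increment of the constraint length doubles the number of states the decoder must update per received symbol, which is why the energy cost of decoding grows so quickly.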
1.3 Outline and Context of the Report

This project focuses on the use of the Viterbi Algorithm for forward error correction in mobile networks. It is desirable to keep energy consumption at a minimum in order to optimize use of available battery energy. To obtain good error correcting capability, the constraint length must be kept high, and since the complexity of a convolutional decoder increases exponentially with its constraint length, optimizing the decoding mechanism with respect to energy consumption becomes a worthwhile goal. The growing need for improved energy efficiency of decoders has resulted in several approaches being explored. The main focus of the project is to explore an idea, proposed by Barry Cheetham, which is to switch off the Viterbi decoder and use a simpler decoder when no bit-errors are occurring. It is possible that by doing this, a significant amount of energy could be saved. When bit-errors are detected, the Viterbi decoder can be switched back on to take advantage of its error correction functionality. This process at the receiver depends on having a memory of previous bits received. Correctly maintaining and using this memory (previous history) when switching between the two decoders is one of the main technical challenges in the project. The energy saving mechanism proposed by Barry Cheetham is based on an earlier idea published by Wei Shao, though it is hoped that the new approach will be easier to implement. This algorithm can be developed using Verilog, though it will require a custom designed version of the Viterbi algorithm to be developed from scratch and then adapted to the new energy saving idea. Possible problems that may affect the accuracy and energy saving capabilities of the algorithm must be analyzed and solutions to these problems must be developed. The performance of the resulting
algorithm must be studied in terms of bit-error performance, packet loss rates and processing time. In principle, evaluating the performance of the new technique requires profiling of the energy consumption of the two algorithms involved. To do this accurately would require resources beyond the scope of the project. Verilog provides some profiling facilities, but relating the information obtained to the energy consumption that would be observed in a VLSI implementation of the code is a complex issue. Nevertheless, it is believed that the execution times of particular parts of the algorithms can give some idea of the likely relationship between the energy consumption of these particular parts. Hence, in place of quoting estimates of the likely energy consumption of different techniques, execution times will be quoted with an implicit assumption that this gives a first order approximation to the likely energy consumption. By comparison with the standard Viterbi decoder available in Verilog, an analysis will be made of whether this method provides a significant improvement over existing mechanisms.
1.4 Contributions and main objectives

The main objectives of this project are as follows:
1. An understanding of the background literature relevant to error detection and error control mechanisms as currently used in packetized digital communication networks.
2. A detailed understanding of the concept of convolutional coding, and decoding using the Viterbi algorithm.
3. An implementation of the Viterbi algorithm in Verilog to obtain a custom designed version called My Viterbi and check that it is working correctly by comparing its performance with that of the Viterbi decoder function provided by Verilog (a custom designed Viterbi decoder is needed because Verilog does not provide access to the code).
4. A resolution of questions that still need to be answered about the new algorithm, including the correct initialization of component decoders and the stability of the feedback mechanism.
5. An implementation in Verilog of the new algorithm as a modification of the custom designed Viterbi algorithm.
6. An evaluation of the new algorithm in terms of its accuracy and capacity for achieving energy saving. The analysis will be performed on the basis of bit-error performance, packet loss rates and execution time (considered to provide a first order approximation to energy).
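The two accuracy measures named in the last objective can be sketched as follows. This is only an illustration, not the project's evaluation code: the function names are hypothetical, and it assumes that a packet counts as lost whenever at least one bit error remains after decoding.

```python
from typing import Sequence


def bit_error_rate(sent: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of bit positions in which the decoded bits differ from the sent bits."""
    if len(sent) != len(decoded):
        raise ValueError("sequences must have equal length")
    if not sent:
        return 0.0
    errors = sum(a != b for a, b in zip(sent, decoded))
    return errors / len(sent)


def packet_loss_rate(sent_packets: Sequence[Sequence[int]],
                     decoded_packets: Sequence[Sequence[int]]) -> float:
    """Fraction of packets that still contain at least one bit error.

    Assumption: any residual bit error makes the whole packet unusable.
    """
    if len(sent_packets) != len(decoded_packets):
        raise ValueError("packet lists must have equal length")
    if not sent_packets:
        return 0.0
    lost = sum(list(s) != list(d) for s, d in zip(sent_packets, decoded_packets))
    return lost / len(sent_packets)


print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
print(packet_loss_rate([[0, 1], [1, 1]], [[0, 1], [1, 0]]))
```

Execution time, the third measure, would be recorded separately for each decoder and is not modelled here.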
1.5 Scope of the Project

This project is intended to further develop and implement the energy saving decoding algorithm developed by Barry Cheetham, and to find solutions to some issues that still remained to be resolved at the beginning of this project. The main focus of this project is to provide a working demonstration of the algorithm by implementation in Verilog and to analyze its performance by comparison with the standard Viterbi decoder available in Verilog. The system will be developed using a hard decision Viterbi decoder but may be extended to use a soft decision decoder. The project does not consider the circuit level design of the algorithm but uses a high level approach to test the proposed algorithm. Circuit level design may be considered in future work if it is found that this algorithm promises considerable benefits over existing mechanisms.
Chapter 2
2.1 Overview of VLSI

The first semiconductor chips held one transistor each. Subsequent advances added more and more transistors, and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. This is now known retrospectively as "small-scale integration" (SSI). Improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI), and then to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark and today's microprocessors have many millions of gates and hundreds of millions of individual transistors. At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use. Even VLSI is now somewhat quaint, given the common assumption that all microprocessors are VLSI or better. As of early 2008, billion-transistor processors are commercially available, an example of which is Intel's Montecito Itanium chip. This is expected to become more commonplace as semiconductor fabrication moves from the current generation of 65 nm processes to the next 45 nm generations (while experiencing new challenges such as increased variation across process corners). Another notable example is NVIDIA's 280 series GPU. This microprocessor is unique in that its 1.4 billion transistor count, capable of a teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor count is largely due to the 24 MB L3 cache).
Current designs, as opposed to the earliest devices, use extensive design automation and automated logic synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic functionality. Certain high-performance logic blocks like the SRAM cell, however, are still designed by hand to ensure the highest efficiency (sometimes by bending or breaking established design rules to obtain the last bit of performance by trading stability).
2.2 INTRODUCTION OF VLSI

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistor-based circuits into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have increased in complexity into the hundreds of millions of transistors.
2.3 What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas.
- Put simply, an integrated circuit is many transistors on one chip.
- VLSI covers the design and manufacturing of extremely small, complex circuitry using modified semiconductor material.
- An integrated circuit (IC) may contain millions of transistors, each a few µm in size.
- Applications are wide ranging: most electronic logic devices.
2.3.1 History of scale integration

late 40s: Transistor invented at Bell Labs
late 50s: First IC (JK-FF by Jack Kilby at TI)
early 60s: Small Scale Integration (SSI), 10s of transistors on a chip, followed by Medium Scale Integration (MSI), 100s of transistors on a chip
early 70s: Large Scale Integration (LSI), 1000s of transistors on a chip
early 80s: VLSI, 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)
Ultra LSI is sometimes used for 1,000,000s.

SSI - Small-Scale Integration (0 to 10^2)
MSI - Medium-Scale Integration (10^2 to 10^3)
LSI - Large-Scale Integration (10^3 to 10^5)
VLSI - Very Large-Scale Integration (10^5 to 10^7)
ULSI - Ultra Large-Scale Integration (>= 10^7)
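The ranges above can be read as a simple lookup. The sketch below is illustrative only; the function name is hypothetical, and it assumes that each lower bound is inclusive, since the list does not say to which class a boundary value belongs.

```python
from typing import List, Tuple

# (class name, inclusive lower bound), ordered from largest to smallest.
# Assumption: a count exactly on a boundary belongs to the higher class.
SCALES: List[Tuple[str, int]] = [
    ("ULSI", 10**7),
    ("VLSI", 10**5),
    ("LSI", 10**3),
    ("MSI", 10**2),
    ("SSI", 0),
]


def integration_scale(count: int) -> str:
    """Return the integration class for a given device count per chip."""
    if count < 0:
        raise ValueError("count must be non-negative")
    for name, lower in SCALES:
        if count >= lower:
            return name
    raise AssertionError("unreachable: SSI covers every non-negative count")


for n in (50, 500, 20_000, 2_000_000, 1_400_000_000):
    print(n, integration_scale(n))
```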
2.3.2 Advantages of ICs over discrete components

While we will concentrate on integrated circuits, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:
- Size. Integrated circuits are much smaller: both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.
- Speed. Signals can be switched between logic 0 and logic 1 much quicker within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signal.
- Power consumption. Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.
2.3.4 VLSI and systems

These advantages of integrated circuits translate into advantages at the system level:
- Smaller physical size. Smallness is often an advantage in itself; consider portable televisions or handheld cellular telephones.
- Lower power consumption. Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; a simpler cabinet with less electromagnetic shielding may be feasible, too.
- Reduced cost. Reducing the number of components, the power supply requirements, cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system built from custom ICs can be less, even though the individual ICs cost more than the standard parts they replace.
Understanding why integrated circuit technology has such profound influence on the design of digital systems requires understanding both the technology of IC manufacturing and the economics of ICs and digital systems.
Applications:
- Electronic systems in cars
- Digital electronics control VCRs
- Transaction processing systems, ATMs
- Personal computers and workstations
- Medical electronic systems
2.4 Applications of VLSI

Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications. Electronic systems perform a variety of tasks, some of them visible, some more hidden:
- Personal entertainment systems such as portable MP3 players and DVD players perform sophisticated algorithms with remarkably little energy.
- Electronic systems in cars operate stereo systems and displays; they also control fuel injection systems, adjust suspensions to varying terrain, and perform the control functions required for anti-lock braking (ABS) systems.
- Digital electronics compress and decompress video, even at high-definition data rates, on-the-fly in consumer electronics.
- Low-cost terminals for Web browsing still require sophisticated electronics, despite their dedicated function.
- Personal computers and workstations provide word-processing, financial analysis, and games. Computers include both central processing units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.
- Medical electronic systems measure bodily functions and perform complex processing algorithms to warn about unusual conditions.
The availability of these complex systems, far from overwhelming consumers, only creates demand for even more complex systems.
2.5 VERILOG HDL

Verilog HDL is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level to the switch level. The complexity of the digital system being modelled could vary from that of a simple gate to a complete electronic digital system, or anything in between. The digital system can be described hierarchically and timing can be explicitly modelled within the same description. The Verilog HDL language includes capabilities to describe the behavioural nature of a design, the dataflow nature of a design, a design's structural composition, delays and a waveform generation mechanism including aspects of response monitoring and verification, all modelled using one single language. In addition, the language provides a programming language interface through which the internals of a design can be accessed during simulation, including the control of a simulation run. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a Verilog simulator. The language inherits many of its operator symbols and constructs from the C programming language. Verilog HDL provides an extensive range of modelling capabilities, some of which are quite difficult to comprehend initially. However, a core subset of the language is quite easy to learn and use. This is sufficient to model most applications.
2.5.1 History:
The Verilog HDL language was first developed by Gateway Design Automation in 1983 as a hardware modelling language for their simulator product. At that time it was a proprietary language. Because of the popularity of the simulator product, Verilog HDL gained acceptance as a usable and practical language by a number of designers. In an effort to increase the popularity of the language, the language was placed in the public domain in 1990. Open Verilog International (OVI) was formed to promote Verilog. In 1992 OVI decided to pursue standardization of Verilog HDL as an IEEE standard. This effort was successful and the language became an IEEE standard in 1995. The complete standard is described in the Verilog Hardware Description Language Reference Manual. The standard is called IEEE Std 1364-1995.
2.5.2 Major Capabilities:

Listed below are the major capabilities of the Verilog hardware description language:
- Primitive logic gates, such as and, or and nand, are built into the language.
- Flexibility of creating a user-defined primitive (UDP). Such a primitive could either be a combinational logic primitive or a sequential logic primitive.
- Switch-level modelling primitive gates, such as pmos and nmos, are also built into the language.
- Explicit language constructs are provided for specifying pin-to-pin delays, path delays and timing checks of a design.
- A design can be modelled in three different styles or in a mixed style. These styles are: behavioural style, modelled using procedural constructs; dataflow style, modelled using continuous assignments; and structural style, modelled using gate and module instantiations.
- There are two data types in Verilog HDL: the net data type and the register data type. The net type represents a physical connection between structural elements, while a register type represents an abstract data storage element.
- Figure 2.1 shows the mixed-level modelling capability of Verilog HDL, that is, in one design, each module may be modelled at a different level.
Figure 2.1: Mixed level modelling
- Verilog HDL also has built-in logic functions such as & (bitwise-and) and | (bitwise-or).
- High-level programming language constructs such as conditionals, case statements, and loops are available in the language.
- Notion of concurrency and time can be explicitly modelled.
- Powerful file read and write capabilities are provided.
- The language is non-deterministic under certain situations, that is, a model may produce different results on different simulators; for example, the ordering of events on an event queue is not defined by the standard.
2.6 Verilog synthesis

Synthesis is the process of constructing a gate level netlist from a register-transfer level model of a circuit described in Verilog HDL. Figure 2.2 shows such a process. A synthesis system may, as an intermediate step, generate a netlist that is comprised of register-transfer level blocks such as flip-flops, arithmetic-logic-units, and multiplexers, interconnected by wires. In such a case, a second program called the RTL module builder is necessary. The purpose of this builder is to build, or acquire from a library of predefined components, each of the required RTL blocks in the user-specified target technology.
Figure 2.2: Synthesis process
Having produced a gate level netlist, a logic optimizer reads in the netlist and optimizes the circuit for the user-specified area and timing constraints. These area and timing constraints may also be used by the module builder for appropriate selection or generation of RTL blocks. In this report, we assume that the target netlist is at the gate level. The logic gates used in the synthesized netlists are described in Appendix B. The module building and logic optimization phases are not described in this report. The above figure shows the basic elements of Verilog HDL and the elements used in hardware. A mapping mechanism or a construction mechanism has to be provided that translates the Verilog HDL elements into their corresponding hardware elements, as shown in the figure.
2.7 Tools used and explanation

Requirements:
- Xilinx ISE 9.1i
- ModelSim 6.2
2.8 Introduction about the Software:

Xilinx ISE 8.2i software includes the new Xilinx Smart Compile technology, which makes run times up to 6 times faster than the previous version while maintaining exact design preservation of unchanged logic. ModelSim SE 6.2C is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed language designs.
CHAPTER-3
3.1 THE VITERBI DECODER ALGORITHM
The Viterbi decoding algorithm is a decoding process for convolutional codes over a memoryless channel. Figure 3.1 depicts the normal flow of information over a noisy channel. For the purpose of error recovery, the encoder adds redundant information to the original information, and the output is transmitted through a channel. The input at the receiver end (r) is the information with redundancy and, possibly, noise. The receiver tries to extract the original information through a decoding algorithm and generates an estimate (e). A decoding algorithm that maximizes the probability p(r|e) is a maximum likelihood (ML) algorithm. An algorithm which maximizes p(e|r) through the proper selection of the estimate (e) is called a maximum a posteriori (MAP) algorithm. The two algorithms have identical results when the source information has a uniform distribution.
Figure 3.1 The Convolutional Decoding
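The equivalence of the two criteria for a uniform source follows from Bayes' rule. The short derivation below uses r for the received sequence and e for the estimate, as in the text; the hats and the subscripts MAP and ML are notation introduced here only for illustration.

```latex
\hat{e}_{\mathrm{MAP}}
  = \arg\max_{e} \; p(e \mid r)
  = \arg\max_{e} \; \frac{p(r \mid e)\, p(e)}{p(r)}
  = \arg\max_{e} \; p(r \mid e)\, p(e)
```

```latex
p(e) = \text{const}
  \;\Longrightarrow\;
  \hat{e}_{\mathrm{MAP}} = \arg\max_{e} \; p(r \mid e) = \hat{e}_{\mathrm{ML}}
```

The denominator p(r) can be dropped because it does not depend on e. When every source sequence is equally likely, p(e) is also a constant, and the two maximizations select the same estimate.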
The Viterbi Algorithm was developed by Andrew J. Viterbi and first published in the IEEE Transactions on Information Theory in 1967. It is a maximum likelihood decoding algorithm for convolutional codes. This algorithm provides a method of finding the path through the trellis diagram that has the highest probability of matching the actual transmitted sequence of bits. Since its publication, it has become one of the most popular algorithms in use for convolutional decoding. Apart from being an efficient and robust error correction method, it has the advantage of having a fixed decoding time. This makes it suitable for hardware implementation. The algorithm has found universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is now also commonly used in speech recognition, speech synthesis, keyword spotting, computational linguistics, and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal. The terms Viterbi path and Viterbi algorithm are also applied to related dynamic programming algorithms that discover the single most likely explanation for an observation. For example, in statistical parsing a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is sometimes called the Viterbi parse.
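The trellis search described above can be sketched in software as follows. This is a behavioural Python model for illustration only, not the project's Verilog design and not a low-power architecture. It assumes hard decisions, Hamming-distance branch metrics, and the rate 1/3, constraint length 3 code whose generator polynomials are given in Section 3.2. The function names, the state packing, and the choice to store full survivor paths instead of using a traceback memory are all assumptions made for brevity.

```python
from typing import List, Sequence, Tuple

GENERATORS = ((1, 1, 1), (0, 1, 1), (1, 0, 1))  # taps on (m1, m0, m-1)
N_OUT = len(GENERATORS)
N_STATES = 4  # state = (m0, m-1) packed as 2*m0 + m-1


def branch(state: int, bit: int) -> Tuple[int, Tuple[int, ...]]:
    """Return (next_state, output symbol) for one input bit."""
    m0, m_1 = state >> 1, state & 1
    window = (bit, m0, m_1)
    symbol = tuple(sum(t & b for t, b in zip(taps, window)) % 2
                   for taps in GENERATORS)
    return (bit << 1) | m0, symbol


def encode(bits: Sequence[int]) -> List[int]:
    """Encode and append two zero bits so the trellis ends in state 0."""
    state, out = 0, []
    for bit in list(bits) + [0, 0]:
        state, symbol = branch(state, bit)
        out.extend(symbol)
    return out


def viterbi_decode(received: Sequence[int]) -> List[int]:
    """Hard-decision Viterbi decoding with Hamming-distance branch metrics."""
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    paths: List[List[int]] = [[] for _ in range(N_STATES)]
    for i in range(0, len(received), N_OUT):
        symbol = received[i:i + N_OUT]
        new_metrics = [inf] * N_STATES
        new_paths: List[List[int]] = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue  # state not yet reachable
            for bit in (0, 1):
                nxt, expected = branch(state, bit)
                cost = metrics[state] + sum(a != b for a, b in zip(symbol, expected))
                if cost < new_metrics[nxt]:  # add-compare-select
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-2]  # survivor ending in state 0, flush bits dropped


message = [1, 0, 1, 1, 0, 0, 1]
noisy = encode(message)
noisy[4] ^= 1   # inject two channel bit errors
noisy[13] ^= 1
print(viterbi_decode(noisy) == message)
```

The inner loop is the add-compare-select step that a hardware decoder performs for every state on every received symbol, which is the work the energy saving idea tries to avoid when no errors are present.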
3.2 Convolutional Encoders

Like any error-correcting code, a convolutional code works by adding structured redundant information to the user's data and then correcting errors using this information. A convolutional encoder is a linear system. A binary convolutional encoder can be represented as a shift register. The outputs of the encoder are modulo-2 sums of the values in certain register cells. The input to the encoder is either the unencoded sequence (for non-recursive codes) or the unencoded sequence added to the values of some register cells (for recursive codes). In telecommunication, a convolutional code is a type of error-correcting code in which
Each m-bit information symbol (each m-bit string) to be encoded is transformed into an n-bit symbol, where m/n is the code rate (n ≥ m), and
The transformation is a function of the last k information symbols, where k is the constraint length of the code.
Convolutional codes can be systematic or non-systematic.
Systematic codes are those where the unencoded sequence is a part of the output sequence. Systematic codes are almost always recursive; conversely, non-recursive codes are almost always non-systematic. Convolutional codes are used extensively in numerous applications in order to achieve reliable data transfer, including digital video, radio, mobile communication, and satellite communication. These codes are often implemented in concatenation with a hard-decision code, particularly Reed-Solomon. Prior to turbo codes, such constructions were the most efficient, coming closest to the Shannon limit.
To convolutionally encode data, start with k memory registers, each holding 1 input bit. Unless otherwise specified, all memory registers start with a value of 0. The encoder has n modulo-2 adders (a modulo-2 adder can be implemented with a single Boolean XOR gate, where the logic is: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0), and n generator polynomials, one for each adder (see figure below). An input bit m_1 is fed into the leftmost register. Using the generator polynomials and the existing values in the remaining registers, the encoder outputs n bits. Then all register values are shifted to the right (m_1 moves to m_0, m_0 moves to m_{-1}) and the encoder waits for the next input bit. If there are no remaining input bits, the encoder continues to output until all registers have returned to the zero state. The figure below is a rate 1/3 (m/n) encoder with constraint length (k) of 3. The generator polynomials are G1 = (1,1,1), G2 = (0,1,1), and G3 = (1,0,1). Therefore, the output bits are calculated (modulo 2) as follows:
n_1 = m_1 + m_0 + m_{-1}
n_2 = m_0 + m_{-1}
n_3 = m_1 + m_{-1}
A combination of register cells that forms one of the output streams (or that is added to the input stream for recursive codes) is defined by a polynomial. Let m be the maximum degree of the polynomials constituting a code; then K = m + 1 is the constraint length of the code.
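The rate 1/3, K = 3 encoder described above maps directly onto a few lines of Verilog. The following is a minimal sketch, not the project's own encoder: the module name, port names and the active-low reset are assumptions made for illustration, and the current input bit is used directly as m_1 instead of being held in a third register.

```verilog
// Sketch of the rate-1/3, K = 3 encoder with G1 = (1,1,1), G2 = (0,1,1),
// G3 = (1,0,1). Module and port names are illustrative assumptions.
module conv_enc_r13 (
    input  wire clk,
    input  wire rst_n,   // asynchronous active-low reset (assumed convention)
    input  wire din,     // m_1: current input bit
    output wire n1,      // G1 = (1,1,1)
    output wire n2,      // G2 = (0,1,1)
    output wire n3       // G3 = (1,0,1)
);
    reg m0;    // previous input bit (m_0)
    reg m_1;   // input bit before that (m_{-1} in the text)

    // Modulo-2 sums are XORs of the tapped cells.
    assign n1 = din ^ m0 ^ m_1;
    assign n2 =       m0 ^ m_1;
    assign n3 = din      ^ m_1;

    // Shift right on every clock: m_1 -> m_0, m_0 -> m_{-1}.
    always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            m0  <= 1'b0;
            m_1 <= 1'b0;
        end else begin
            m0  <= din;
            m_1 <= m0;
        end
endmodule
```

With both registers at zero, an input bit of 1 gives (n1, n2, n3) = (1, 0, 1), which matches the three output equations.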
Figure 3.2 The Convolutional Encoder
Figure 3.2 shows a standard convolutional encoder with polynomials (171,133). For the encoder in Figure 3.2, the polynomials are:
g_1(z) = 1 + z + z^2 + z^3 + z^6
g_2(z) = 1 + z^2 + z^3 + z^5 + z^6
Encoder polynomials are usually denoted in octal notation. For the above example, these designations are 1111001 = 171 and 1011011 = 133. The constraint length of this code is 7. An example of a recursive convolutional encoder is shown in Figure 3.3.
Figure 3.3 A recursive convolutional encoder
3.2.1 Trellis Diagram

A convolutional encoder is often seen as a finite state machine. Each state corresponds to some value of the encoder's register. Given the input bit value, from a certain state the encoder can move to one of two other states. These state transitions constitute a diagram which is called a trellis diagram. A trellis diagram for the code in Figure 3.3 is depicted in Figure 3.4. A solid line corresponds to input 0, a dotted line to input 1 (note that encoder states are designated in such a way that the rightmost bit is the newest one). Each path on the trellis diagram corresponds to a valid sequence from the encoder's output. Conversely, any valid sequence from the encoder's output can be represented as a path on the trellis diagram. One of the possible paths is marked in red as an example. Note that each state transition on the diagram corresponds to a pair of output bits. There are only two allowed transitions for every state, so there are two allowed pairs of output bits, and the two other pairs are forbidden. If an error occurs, it is very likely that the receiver will get a set of forbidden pairs, which do not constitute a path on the trellis diagram. So, the task of the decoder is to find the path on the trellis diagram which is the closest match to the received sequence.
Figure 3.4 A trellis diagram corresponding to the encoder on the Figure 3.3
Let's define the free distance d_f as the minimal Hamming distance between two different allowed binary sequences (the Hamming distance is the number of differing bits).
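The free distance is useful because it bounds the error-correcting capability of the code. The relation below is a standard coding-theory result added here for illustration; the symbol t (the number of closely spaced bit errors that can be corrected) is not used in the original text.

```latex
t = \left\lfloor \frac{d_f - 1}{2} \right\rfloor
```

For example, the K = 3 (7, 5) code has d_f = 5, so t = 2.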
Chapter -4
4. Viterbi decoder
A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that has been encoded using a convolutional code. There are other algorithms for decoding a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi algorithm is the most resource-consuming, but it performs maximum likelihood decoding. It is most often used for decoding convolutional codes with constraint lengths K ≤ 10, but values up to K = 15 are used in practice. A hardware Viterbi decoder for a basic code usually consists of the following major blocks:

Branch metric unit (BMU)
Path metric unit (PMU)
Traceback unit (TBU)
Figure 4.1 Block Diagram Of Viterbi Decoder
4.1. Branch Metric Computation (BMC)

A branch metric unit's function is to calculate branch metrics, which are normed distances between every possible symbol in the code alphabet, and the received symbol. There are hard decision and soft decision Viterbi decoders. A hard decision Viterbi decoder receives a simple bitstream on its input, and a Hamming distance is used as a metric. A soft decision Viterbi decoder receives a bitstream containing information about the reliability of each received symbol. The branch metric unit usually compares the expected value with the determined value.
For each state, the Hamming distance between the received bits and the expected bits is calculated. The Hamming distance between two symbols of the same length is the number of bits that differ between them. These branch metric values are passed to Block 2. If soft decision inputs were to be used, the branch metric would be calculated as the squared Euclidean distance between the received and expected symbols. The squared Euclidean distance is given as
(a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2
where a_1, a_2, a_3 and b_1, b_2, b_3 are the three soft decision values of the received and expected bits respectively.
Value   Meaning
000     strongest 0
001     relatively strong 0
010     relatively weak 0
011     weakest 0
100     weakest 1
101     relatively weak 1
110     relatively strong 1
111     strongest 1
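For the hard-decision case, the branch metric is a very small piece of combinational logic. The sketch below assumes a rate 1/2 code, so that each received symbol is two bits wide; the module and signal names are illustrative and are not taken from the project's BMU.

```verilog
// Hard-decision branch metric for a rate-1/2 code: the Hamming distance
// between the received 2-bit symbol and the 2-bit symbol expected on a
// trellis branch. Names are illustrative assumptions.
module bm_hamming2 (
    input  wire [1:0] rx,        // received symbol
    input  wire [1:0] expected,  // symbol expected on this branch
    output wire [1:0] dist       // 0, 1 or 2 differing bits
);
    // XOR gives a 1 in every position where the two symbols differ.
    wire [1:0] diff = rx ^ expected;

    // Count the ones; the 2-bit result cannot overflow.
    assign dist = diff[1] + diff[0];
endmodule
```

A full BMU would instantiate one such comparison per distinct expected symbol (four for a rate 1/2 code) and share the results among the trellis branches.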
Figure 4.2 A recursive convolutional encoder
4.2. Path Metric Computation and Add-Compare-Select (ACS) Unit

A path metric unit sums branch metrics to obtain metrics for 2^(K-1) paths, where K is the constraint length of the code, one of which can eventually be chosen as optimal. On every clock cycle it makes decisions, discarding paths that are known to be non-optimal. The results of these decisions are written to the memory of a traceback unit. The core elements of a PMU are ACS (Add-Compare-Select) units. The way in which they are connected to each other is defined by the specific code's trellis diagram. Since branch metrics are always ≥ 0, there must be an additional circuit preventing the metric counters from overflowing (it is not shown in the image). An alternate method that eliminates the need to monitor the path metric growth is to allow the path metrics to "roll over". To use this method, it is necessary to make sure the path metric accumulators contain enough bits to prevent the "best" and "worst" values from coming within 2^(n-1) of each other. The compare circuit is essentially unchanged.
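One common way to compare path metrics that are allowed to roll over is to look at the sign bit of their modulo-2^n difference. The sketch below illustrates that idea under the condition stated above (the spread between any two metrics stays below 2^(n-1)); the module name and the parameter N are assumptions, and this is not necessarily the comparator used in the project.

```verilog
// Comparison of two path metrics that may have wrapped around.
// Valid only while the true difference between a and b is smaller
// than 2^(N-1). Names are illustrative assumptions.
module pm_less_mod #(
    parameter N = 8              // accumulator width (assumed)
) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    output wire         a_less   // 1 when a is the smaller metric
);
    // The subtraction wraps modulo 2^N, exactly like the accumulators.
    wire [N-1:0] d = a - b;

    // Interpreted as a two's-complement number, d is negative when a < b.
    assign a_less = d[N-1];
endmodule
```

For example, with N = 8, a metric of 250 and a metric that has grown to 259 and wrapped to 3 give d = 247, whose top bit is set, so 250 is correctly reported as the smaller value.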
Figure 4.3 ACS Unit
It is possible to monitor the noise level on the incoming bit stream by monitoring the rate of growth of the "best" path metric. A simpler way to do this is to monitor a single location or "state" and watch it pass "upward" through, say, four discrete levels within the range of the accumulator. As it passes upward through each of these thresholds, a counter is incremented that reflects the "noise" present on the incoming signal.
The path metric, or error probability, for each transition state at a particular time instant is measured as the sum of the path metric for its preceding state and the branch metric between the previous state and the present state. The initial path metric at the first time instant is infinity for all states except state 0. For each state, there are two possible predecessors; the mechanism for calculating the predecessors (and successors) is described later in this chapter. The path metrics from both these predecessors are compared and the one with the smallest path metric is selected. This is the most probable transition that occurred in the original message. In addition, a single bit is also stored for each state which specifies whether the lower or upper predecessor was selected.
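The add, compare and select steps for a single state can be sketched as follows. This is a minimal illustration and not the project's ACS unit: the names, the metric widths and the tie-breaking rule (a tie goes to the higher-numbered predecessor) are assumptions, and no overflow protection is included.

```verilog
// Add-Compare-Select sketch for one trellis state.
// Names and widths are illustrative assumptions.
module acs_unit #(
    parameter PM_W = 8,   // path metric width (assumed)
    parameter BM_W = 2    // branch metric width (assumed)
) (
    input  wire [PM_W-1:0] pm_lo,   // path metric of the lower-numbered predecessor
    input  wire [PM_W-1:0] pm_hi,   // path metric of the higher-numbered predecessor
    input  wire [BM_W-1:0] bm_lo,   // branch metric from the lower-numbered predecessor
    input  wire [BM_W-1:0] bm_hi,   // branch metric from the higher-numbered predecessor
    output wire [PM_W-1:0] pm_out,  // surviving path metric
    output wire            sel_hi   // decision bit: 1 = higher-numbered predecessor survived
);
    // Add: candidate metrics (no overflow protection in this sketch).
    wire [PM_W-1:0] cand_lo = pm_lo + bm_lo;
    wire [PM_W-1:0] cand_hi = pm_hi + bm_hi;

    // Compare: the smaller metric wins; a tie goes to the higher-numbered predecessor.
    assign sel_hi = (cand_hi <= cand_lo);

    // Select.
    assign pm_out = sel_hi ? cand_hi : cand_lo;
endmodule
```

The decision bit `sel_hi` is the single bit per state that is written to the traceback memory. In hardware, the "infinity" initial metric is usually approximated by a large value that still leaves headroom in the accumulator.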
Figure 4.4 A sample implementation of a path metric unit for a specific k=4 decoder
In cases where both paths result in the same path metric to the state, either the higher or lower state may consistently be chosen as the surviving predecessor. For the purpose of this project the higher state is consistently chosen as the surviving predecessor. Finally, the state with the least accumulated path metric at the current time instant is located. This state is called the global winner and is the state from which the traceback operation will begin. This method of starting the traceback operation from the global winner instead of an arbitrary state was described by Linda Brackenbury in her design of an asynchronous Viterbi decoder. It greatly improves the probability of finding the correct traceback path quickly and hence reduces the amount of history information that needs to be maintained. It also reduces the number of updates required to the surviving path. Both these measures result in improved energy savings. The values for the surviving predecessors (also called local winners) and the global winner are passed to Block 3.
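Locating the global winner amounts to finding the minimum of all path metrics. The sketch below shows the idea as a two-level comparator tree for a four-state trellis; a K = 7 decoder would need the same structure extended to 64 inputs. The module name, port names and metric width are assumptions, and ties here go to the lower-numbered state.

```verilog
// Global-winner search for a 4-state trellis: a comparator tree that
// returns the index and value of the smallest path metric.
// Names and widths are illustrative assumptions.
module global_winner4 #(
    parameter PM_W = 8
) (
    input  wire [PM_W-1:0] pm0,
    input  wire [PM_W-1:0] pm1,
    input  wire [PM_W-1:0] pm2,
    input  wire [PM_W-1:0] pm3,
    output wire [1:0]      winner,  // index of the state with the smallest metric
    output wire [PM_W-1:0] pm_min   // the smallest metric itself
);
    // First level: compare neighbouring pairs.
    wire            sel01 = (pm1 < pm0);
    wire            sel23 = (pm3 < pm2);
    wire [PM_W-1:0] min01 = sel01 ? pm1 : pm0;
    wire [PM_W-1:0] min23 = sel23 ? pm3 : pm2;

    // Second level: compare the two pair winners.
    wire sel = (min23 < min01);

    assign pm_min = sel ? min23 : min01;
    assign winner = sel ? {1'b1, sel23} : {1'b0, sel01};
endmodule
```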
Figure 4.5 A sample implementation of an ACS Unit
4.3. Survivor memory unit or Trace back Unit

The back-trace unit restores an (almost) maximum-likelihood path from the decisions made by the PMU. Since it does so in the inverse direction, a Viterbi decoder includes a FILO (first-in-last-out) buffer to reconstruct the correct order. Note that the implementation shown in the image requires double frequency. There are some tricks that eliminate this requirement.
The global winner for the current state is received from Block 2. Its predecessor is selected using the stored local winner information. In this way, working backwards through the trellis, the path with the minimum accumulated path metric is selected. This path is known as the traceback path. A diagrammatic description will help visualize this process: Figure 4.6 shows the trellis diagram for a K = 3 (7, 5) coder with sample input taken as the received data.
The general approach to traceback is to accumulate path metrics for up to five times the constraint length (5 × (K − 1)), find the node with the largest accumulated cost, and begin traceback from this node. However, computing the node which has accumulated the largest cost (either the largest or smallest integral path metric) involves finding the maxima or minima of several (usually 2^(K−1)) numbers, which may be time consuming when implemented on embedded hardware systems. Most communication systems employ Viterbi decoding involving data packets of fixed sizes, with a fixed bit/byte pattern at the beginning, at the end, or at both ends of the data packet. By using the known bit/byte pattern as a reference, the start node may be set to a fixed value, thereby obtaining a perfect maximum likelihood path during traceback.
Figure 4.6 Selected minimum error path for a k=3(7, 5) coder
The state having the minimum accumulated error at the last time instant is State 10, and traceback is started here. Moving backwards through the trellis, the minimum error path out of the two possible predecessors from that state is selected. This path is marked in blue. The actual received data is shown at the bottom, while the expected data is written in blue along the selected path. It is observed that at time slot three there was an error in the received data (11). This was corrected to (10) by the decoder. Local winner information must be stored for five times the constraint length. For a K = 7 decoder, this results in storing history for 7 × 5 = 35 time slots. The state of the decoder at the time instant 35 time slots prior can then be accurately determined. This state value is passed to Block 4. At the next time slot, all the trellis values are shifted left to the previous time slot. The path metric for the most recently received data is then calculated and the minimum error path is computed. If the global winner at this stage is not a child of the previous global winner, the traceback path has to be updated accordingly until the traceback state is a child of the previous state [22].
Figure 4.7 Trace back path unit
Multiple traceback paths are possible and it may be thought that traceback up to the first bit is necessary to correctly determine the surviving path. However, it was found that all possible paths converge within a certain distance or depth of traceback. This information is useful as it allows the setting of a certain traceback depth beyond which it is neither necessary nor advantageous to store path metric and other information. This greatly reduces memory storage requirements and hence the energy consumption of the decoder. Empirical observations showed that a depth of five times the constraint length was sufficient to ensure merging of paths. Therefore, local winner information is stored for 35 slots (five times seven) in the decoder used for this project.
Block 4. Data Input Determination
Now going forwards through the traceback path, the state transitions at successive time intervals are studied and the data bit that would have caused each transition is determined. This represents the decoded output.
Determining Successors to a Particular State
Each state is represented by 6 shift registers (in the case of a K = 7 encoder or decoder). The next state can therefore be obtained by a right shift of the values of the shift registers. The first shift register is given a value of 0. The resulting state represents the next state of the coder if the input bit was 0. By adding 32 (1 × 2^5) to this value, the next state of the coder if the input bit was 1 is obtained.
Determining Predecessors to a Particular State
In a similar way, the first predecessor can be calculated, this time by a left shift of the values of the shift registers. By adding one (1 × 2^0) to this value, the value of the second predecessor to the state is derived.
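The successor and predecessor rules above reduce to shifts and constant additions on the 6-bit state. The following sketch shows them as pure combinational logic; the module and port names are assumptions made for illustration.

```verilog
// Successors and predecessors of a 6-bit state (K = 7), following the
// shift rules described in the text. Names are illustrative assumptions.
module state_neighbours (
    input  wire [5:0] state,
    output wire [5:0] next0,  // next state if the input bit was 0
    output wire [5:0] next1,  // next state if the input bit was 1
    output wire [5:0] pred0,  // first predecessor
    output wire [5:0] pred1   // second predecessor
);
    // Right shift with 0 placed in the first (most significant) register.
    assign next0 = {1'b0, state[5:1]};
    // Adding 32 (1 x 2^5) sets the first register to 1 instead.
    assign next1 = next0 + 6'd32;

    // Left shift; the bit shifted out of the top is discarded.
    assign pred0 = {state[4:0], 1'b0};
    // Adding 1 (1 x 2^0) gives the second predecessor.
    assign pred1 = pred0 + 6'd1;
endmodule
```

For state 5 (binary 000101) this gives successors 2 and 34 and predecessors 10 and 11.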
4.3.1 State Metric Storage

The block stores the partial path metric of each state at the current stage.
4.3.2 Output Generator:

This block generates the decoded output sequence. In the traceback approach, the block incorporates combinational logic, which traces back along the survivor path and latches the path (equivalently the decoded output sequence) to a register.
Figure 4.8 the block diagram of a general Viterbi Decoder
4.4. Encoding Mechanism

Data is coded using a convolutional encoder, as described earlier. It consists of a series of shift registers and associated combinatorial logic. The combinatorial logic is usually a series of exclusive-or gates. The standard K = 7, (171,133) encoder is used for this project. The octal numbers 171 and 133, when represented in binary form, correspond to the connections of the shift registers to the upper and lower exclusive-or gates respectively. Figure 4.9 shows the convolutional encoder used for the project. The encoder uses a series of XOR gates to perform the encoding.
Figure 4.9: Rate=1/2 k=7, (171,133) Convolution Encoder
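A compact way to express the K = 7, (171,133) encoder is to mask the 7-bit window formed by the current input bit and the six stored bits with each octal polynomial and XOR the selected bits together. The sketch below makes two assumptions that are not confirmed by the source: the most significant bit of each octal constant corresponds to the current input bit, and the newest stored bit sits in the most significant register, matching the state numbering used for the successor calculation. Module and port names are illustrative.

```verilog
// Rate-1/2, K = 7 convolutional encoder with generators (171, 133) octal.
// Bit ordering and names are illustrative assumptions.
module conv_enc_171_133 (
    input  wire clk,
    input  wire rst_n,   // asynchronous active-low reset (assumed convention)
    input  wire din,     // current input bit
    output wire out_a,   // output of the 171 (upper) XOR network
    output wire out_b    // output of the 133 (lower) XOR network
);
    reg [5:0] state;                   // state[5] is the newest stored bit

    // Current input followed by the six stored bits.
    wire [6:0] window = {din, state};

    // Mask with the generator and XOR-reduce the selected taps.
    assign out_a = ^(window & 7'o171);
    assign out_b = ^(window & 7'o133);

    // Shift right, inserting the new bit at the top.
    always @(posedge clk or negedge rst_n)
        if (!rst_n) state <= 6'd0;
        else        state <= {din, state[5:1]};
endmodule
```

Starting from the all-zero state, an input bit of 1 produces the output pair (1, 1), since both polynomials include the current-input tap.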
4.5. Decoding Mechanism
There are two main mechanisms by which Viterbi decoding may be carried out, namely the Register Exchange mechanism and the Traceback mechanism. Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the partially decoded output sequence along the path. The advantage of this approach is that it eliminates the need for traceback and hence reduces latency. However, at each stage the contents of each register need to be copied to the next stage. This makes the hardware complex and more energy consuming than the traceback mechanism. Traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Traceback mechanisms prove to be less energy consuming and are hence the approach followed in this project.
Decoding may be done using either hard decision inputs or soft decision inputs. Inputs that arrive at the receiver may not be exactly zero or one. Having been affected by noise, they will have values in between, and even higher or lower than, zero and one. The values may also be complex in nature. In the hard decision Viterbi decoder, each input that arrives at the receiver is converted into a binary value (either 0 or 1). In the soft decision Viterbi decoder, several levels are created and the arriving input is categorized into the level that is closest to its value. If the possible values are split into 8 decision levels, these levels may be represented by 3 bits, and this is known as a 3-bit soft decision. This project uses a hard decision Viterbi decoder for the purpose of developing and verifying the new energy saving algorithm. Once the algorithm is verified, a soft decision Viterbi decoder may be used in place of the hard decision decoder. Figure 4.10 shows the various stages required to decode data using the Viterbi algorithm. The decoding mechanism comprises three major stages, namely the Branch Metric Computation Unit, the Path Metric Computation and Add-Compare-Select (ACS) Unit, and the Traceback Unit. A schematic representation of the decoder is given below.
Figure 4.10: Schematic representation of the Viterbi decoding block
CHAPTER-5
METHODS AND TYPES OF VITERBI DECODER
5.1 REGISTER EXCHANGE METHOD
The register exchange (RE) method is conceptually the simplest and a commonly used technique. Because of the large power consumption and large area required in VLSI implementations of the RE method, the trace back (TB) method is the preferred method in the design of large constraint length, high performance Viterbi decoders. In register exchange, a register assigned to each state contains the information bits for the survivor path from the initial state to the current state. In effect, the register keeps the partially decoded output sequence along the path, as illustrated in Figure 5.1. The register of state S1 at t=3 contains '101'. This is the decoded output sequence along the bold path from the initial state.
Figure 5.1 Register Exchange Method
The register-exchange method eliminates the need to trace back since the register of the final state contains the decoded output sequence. However, this method results in complex hardware due to the need to copy the contents of all the registers in a stage to the next stage. The survivor path information is applied to the least significant bit of each register, and all the registers perform a shift left operation at each stage to make room for the next bits. Hence, each register fills in the survivor path information from the least significant bit toward the most significant bit. The scheme is called shift update. The shift update method is simple to implement but causes high switching activity due to the shift operation and hence results in high power dissipation.
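The shift update step for the register of a single state can be sketched as follows. This is an illustrative sketch rather than the project's code: the names, the default register length of 35 bits, and the way the new information bit is supplied are all assumptions.

```verilog
// Register-exchange cell for one trellis state (shift update).
// Names and the default length are illustrative assumptions.
module re_cell #(
    parameter L = 35                 // survivor register length (assumed, L >= 2)
) (
    input  wire         clk,
    input  wire         decision,    // from the ACS: which predecessor survived
    input  wire [L-1:0] reg_pred0,   // survivor register of predecessor 0
    input  wire [L-1:0] reg_pred1,   // survivor register of predecessor 1
    input  wire         new_bit,     // information bit for the transition into this state
    output reg  [L-1:0] reg_out      // survivor register of this state
);
    // Copy the register of the surviving predecessor.
    wire [L-1:0] survivor = decision ? reg_pred1 : reg_pred0;

    // Shift left and place the new bit in the least significant position.
    always @(posedge clk)
        reg_out <= {survivor[L-2:0], new_bit};
endmodule
```

Every state needs one such cell and all L bits of every cell may toggle on each clock, which is the source of the high switching activity noted above.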
5.2 Trace back mechanism

As noted in Section 4.5, register exchange mechanisms, as explained by Ranpara and Sam Ha, store the partially decoded output sequence along the path. This eliminates the need for traceback and hence reduces latency, but at each stage the contents of each register need to be copied to the next stage, which makes the hardware complex and more energy consuming than the traceback mechanism. Traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Traceback mechanisms prove to be less energy consuming and are hence the approach followed in this project.
5.3 TYPES OF VITERBI DECODING

In order to realize a certain coding scheme, a suitable measure of similarity, or distance metric, between two code words is vital. The two important metrics used to measure the distance between two code words are the Hamming distance and the Euclidean distance. The decoder adopts one or the other depending on the code scheme, required accuracy, channel characteristics and demodulator type.
5.3.1 HARD DECISION VITERBI DECODING

In hard-decision decoding, the path through the trellis is determined using the Hamming distance measure. Thus, the optimal path through the trellis is the path with the minimum Hamming distance. The Hamming distance can be defined as the number of bits that differ between the observed symbol at the decoder and the symbol sent from the encoder. Furthermore, hard decision decoding applies one-bit quantization to the received bits. Hard decision decoding takes a stream of bits, say from the 'threshold detector' stage of a receiver, where each bit is considered definitely one or zero. For example, for binary signalling, received pulses are sampled and the resulting voltages are compared with a single threshold. If a voltage is greater than the threshold, it is considered to be definitely a 'one', regardless of how close it is to the threshold. If it is less, it is definitely zero.
5.3.2 SOFT DECISION VITERBI DECODING

Soft-decision decoding is applied for maximum likelihood decoding when the data is transmitted over a Gaussian channel. In contrast to hard decision decoding, soft-decision decoding uses multi-bit quantization for the received bits, and the Euclidean distance as a distance measure instead of the Hamming distance. The analog waveform at the demodulator is quantized into several levels in order to help the decoder decide more easily. A 3-bit quantization results in an 8-ary output. Soft decision decoding requires a stream of 'soft bits' where we get not only the 1 or 0 decision but also an indication of how certain we are that the decision is correct. One way of implementing this would be to make the threshold detector generate, instead of 0 or 1, say:
000 (definitely 0)
001 (probably 0)
010 (maybe 0)
011 (guess 0)
100 (guess 1)
101 (maybe 1)
110 (probably 1)
111 (definitely 1)
We may call the last two bits 'confidence' bits. This is easy to do with seven voltage thresholds rather than one. This helps when we anticipate errors and have some 'forward error correction' coding built into the transmission.
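The difference between the hard and the 3-bit soft decision can be illustrated with a simple uniform quantizer. The sketch assumes that the demodulator output has already been digitized to an unsigned 8-bit sample, with 0 meaning the strongest 0 and 255 the strongest 1; this input format and all names are assumptions, not part of the original design.

```verilog
// Hard and 3-bit soft decisions from an unsigned 8-bit sample.
// The input format and all names are illustrative assumptions.
module quantize3 (
    input  wire [7:0] sample,  // 0 = strongest 0, 255 = strongest 1 (assumed)
    output wire       hard,    // 1-bit hard decision
    output wire [2:0] soft     // 3-bit soft decision, 000 to 111
);
    // A single threshold at mid-scale gives the hard decision.
    assign hard = sample[7];

    // Eight equal-width levels: keep the three most significant bits.
    assign soft = sample[7:5];
endmodule
```

A sample of 100 (binary 01100100), for instance, gives a hard decision of 0 and a soft decision of 011, the least confident 0. The most significant soft bit always equals the hard decision.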
CHAPTER-6
Applications
The Viterbi algorithm has a wide range of applications, ranging from satellite and space communications to DNA sequence analysis and Optical Character Recognition. An attempt to perform optical character recognition of text was investigated by Neuhoff. The initial approach considered was to create a dictionary which simulated vocabularies. Each time a character was read by the optical reader, the dictionary would be searched for the most likely estimate. The huge computational and storage requirements of this approach made it impractical. However, another approach makes use of statistical information about the language, such as the relative frequency of letter pairs. A maximum a posteriori probability (MAP) of a word is determined based on its probability as the output of the source model. The Viterbi algorithm may then be used to perform this MAP sequence estimation. An interesting application discussed by Metzner investigated, among others, the use of Viterbi decoding with soft decision to increase the probability of successfully transmitting a data packet during a meteor burst. Since meteor trails are made up of ionized material, they can be used for reliable communications. Some characteristics of such meteor burst communication and descriptions of its practical applications are detailed in the literature. Metzner showed that convolutional codes with soft decision were considerably better for meteor burst applications than Reed-Solomon codes. Low power applications of the Viterbi decoder are particularly relevant to many digital communication and recording systems today. As described by Kawokgy and Salama, systems like these are increasingly being used in wireless applications which, being battery operated, require low power consumption. In addition, these systems also require processing speeds of over 100 Mbps to allow multimedia transmission.
Following this trend, many papers have been written on designing low power Viterbi decoding algorithms targeted for next generation wireless applications, particularly CDMA systems. Some of these energy saving ideas that have been investigated are described in the next section.
6.1 Research Work

In mobile networks, decoding capabilities are limited by the receiver, which is a mobile handset. As such, it has limited resources of energy and computation power. Another factor that affects wireless communication is that bandwidth is expensive. Therefore, there is a high demand for codes that can correct errors very efficiently while at the same time using minimum energy. Hence, a lot of past research has focused on how this may be achieved. The fixed T-algorithm is an optimization of the Viterbi algorithm which applies a pruning threshold to the accumulated path metrics of the Viterbi decoder. Instead of storing all the survivor paths for all 2^(K-1) states, only some of the most likely paths are kept at every trellis stage. This results in fewer paths being found and stored. Figure 6.1 shows the result of an experiment conducted by Henning and Chakrabarti [34], which compares normalized energy estimates for the Viterbi and the fixed T-algorithm decoders as they vary with signal to noise ratio (Eb/No) and code rate.
Figure 6.1: Normalised energy estimates for the Viterbi and fixed T-algorithm (Tf) decoders as code rate and signal to noise ratio (Eb/No) vary.
From the graph, it is estimated that a 33% to 83% reduction in energy consumption can be achieved when the signal to noise ratio is between 2.1 and 4 dB. One of the other approaches taken has been to develop an adaptive T-algorithm which adjusts parameters of the decoder based on real-time variations in signal to noise ratio (SNR), code rate and maximum acceptable bit-error rate. The parameters adjusted are the truncation length and pruning threshold of the T-algorithm, along with trace-back memory management. Henning and Chakrabarti demonstrate in their paper how this can achieve a potential energy reduction of 70% to 97.5% as compared to Viterbi decoding. Truncation length refers to the number of bits a path is followed back before a decision is made on the bit that was encoded. By reducing the truncation length, more bits can be decoded per traceback. Similarly, lowering the pruning threshold means fewer paths need to be found and stored. Both of these measures can reduce the number of memory accesses required by the decoder and hence reduce energy consumption. However, these measures may cause a significant reduction in the error correcting capability of the decoder. Nevertheless, adjusting these parameters based on real-time changes in the channel can optimize energy consumption. Figure 6.2 shows the results of an experiment conducted by Henning and Chakrabarti [34] in which the pruning threshold and truncation length are adapted to maintain the bit-error rate below 0.0037. From the graph, it is estimated that an energy consumption reduction of 70% to 97.5% compared to the Viterbi decoder can be achieved when the signal to noise ratio is between 2.1 and 4 dB. However, the adaptive T-algorithm does require additional overhead for monitoring the real-time variations and choosing the appropriate truncation and threshold parameters from a lookup table. Since these operations are not complex, it is assumed that their energy consumption is negligible.
Figure 6.2: Normalised energy estimates for the Viterbi and adaptive T-algorithm (Ta) decoders as code rate and signal to noise ratio (Eb/No) vary while maintaining bit-error rate below 0.0037
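The pruning step of the T-algorithm discussed above can be sketched as a single comparison per state. The sketch assumes the usual formulation, in which a path is kept only if its metric lies within a threshold T of the best (smallest) metric at the current trellis stage; the source does not spell out the exact rule, and the names and widths are illustrative.

```verilog
// T-algorithm pruning test for one state: keep the path only if its
// metric is within `threshold` of the best metric at this stage.
// The rule and all names are illustrative assumptions.
module t_prune #(
    parameter PM_W = 8
) (
    input  wire [PM_W-1:0] pm,         // path metric of this state
    input  wire [PM_W-1:0] pm_min,     // smallest path metric at this stage (pm >= pm_min)
    input  wire [PM_W-1:0] threshold,  // pruning threshold T
    output wire            keep        // 1 = path survives, 0 = path is pruned
);
    assign keep = (pm - pm_min) <= threshold;
endmodule
```

States whose `keep` output is 0 need no ACS update or survivor-memory write in the next stage, which is where the energy saving comes from. An adaptive variant would drive `threshold` from a lookup table instead of a constant.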
Yet another approach, put forward by Jie Jin and Chi-Ying Tsui at the 2006 International Symposium on Low Power Electronics and Design, was to integrate the T-algorithm with a Scarce-State-Transition (SST) decoder structure. The SST structure first pre-decodes the received data (Rx) by performing an inverse operation of the encoder. The pre-decoded signal (Pre-Dec) contains the original message along with bit errors. Pre-Dec is re-encoded and XORed with Rx, the original received data. The operation results in an output which consists mainly of 0s and the errors in the message. This output is then fed to the Viterbi decoder and the errors are corrected. In the end, the pre-decoded data (Pre-Dec) is added to the decoded output of the Viterbi decoder using modulo-2 addition. When channel bit errors are few, most of the Viterbi decoder output bits are zero, which reduces switching activity. The SST structure was used to reduce the switching activity of the decoder and combined with the T-algorithm to reduce the average number of Add-Compare-Select calculations. In their experiments, Jie Jin and Chi-Ying Tsui achieved a 30% to 76% reduction in power consumption over the traditional Viterbi design for a range of SNR values varying from 4 dB to 12 dB.
A different approach, investigated by Sherif Welsen Shaker, Salwa Hussein Elramly and Khaled Ali Shehata at a telecommunications forum held in Belgrade last year (2009), was to use the traceback approach with clock gating. In clock gating, the clock of each register is enabled only when the register updates its survivor path information. This reduces power dissipation. Their simulations showed a 30% reduction in dynamic power dissipation, which gives a good indication of the power reduction on implementation. A similar approach, investigated by Ranpara and Sam Ha and presented at the International ASIC conference in Washington in 1999, was the use of clock gating in combination with a concept known as toggle filtering. Signals may arrive at the inputs of a combinational block at different times, and this causes the block to go through several intermediate transitions before it stabilizes. By blocking early signals, the number of intermediate transitions can be reduced and hence power dissipation can be minimized. This mechanism of blocking early signals until all input signals arrive, called toggle filtering, was used by Ranpara et al. to reduce the energy consumption of the Viterbi decoder. Recently a new approach, targeted towards wireless applications, has been introduced [38]; it involves a pre-traceback architecture for the survivor path memory unit. The start state of decoding is obtained directly through a pointer register pointing to the target traceback state instead of estimating the start state through a recursive traceback operation. This approach makes use of the similarity between the bit write and decode traceback operations to introduce the pre-traceback operation. Effectively a trace forward type of operation, it results in a 50% reduction in survivor memory read operations.
Apart from improving latency by 25%, implementation results predict up to 11.9% better energy efficiency when compared to conventional traceback architecture for typical wireless applications.
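The clock gating idea described in this section can be illustrated with a short Verilog sketch. This is only a generic illustration and not the circuit used by any of the cited authors: the module names clock_gate and gated_survivor_reg, the signal name update and the parameter WIDTH are all assumptions introduced here. The latch keeps the enable stable while the clock is high, so the gated clock cannot glitch, and the register only sees a clock edge when it actually has new survivor information to store.

```verilog
// Hypothetical latch-based clock gate (illustrative names, not from the report).
module clock_gate (clk, enable, gclk);
  input  clk, enable;
  output gclk;
  reg    en_latched;

  // The latch is transparent only while clk is low, so a change on
  // enable during the high phase cannot create a glitch on gclk.
  always @(clk or enable)
    if (~clk) en_latched <= enable;

  assign gclk = clk & en_latched;
endmodule

// Hypothetical survivor register that is clocked only when 'update' is high.
module gated_survivor_reg (clk, reset_n, update, d, q);
  parameter WIDTH = 8;             // assumed width, for illustration only
  input              clk, reset_n, update;
  input  [WIDTH-1:0] d;
  output [WIDTH-1:0] q;
  reg    [WIDTH-1:0] q;
  wire               gclk;

  clock_gate cg (clk, update, gclk);

  always @(posedge gclk or negedge reset_n)
    if (~reset_n) q <= 0;          // active-low asynchronous reset
    else          q <= d;          // stored only on a gated clock edge
endmodule
```

When update stays low, gclk does not toggle and the register's clock input dissipates no dynamic power. On FPGA targets a clock-enable input on the flip-flop is often preferred to a gated clock, so this sketch is closer to what an ASIC flow would use.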
6.3 Low power consumption

For the branch metric of the Viterbi decoder, our design employs a soft-decision method to improve its correction capability. To find the survivor path efficiently, we modify the classical Viterbi decoding algorithm. The modified algorithm is similar to the register-exchange method, with lower latency, but it uses RAM instead of register banks to record the output bit-stream of the survivor path. This is what makes the design low-power. The chip for this design uses about 28.6 K gates in TSMC 0.18 µm CMOS technology. The power consumption of the chip is about 19.5 mW at 100 MHz. The power usage in the implementation is around 367 mW.
Figure 6.3: Demonstration Of Power Consumption
6.4 Summary
This chapter has explained the decoding mechanism of the Viterbi decoder in detail and described a few of its applications. A number of energy saving techniques that have been investigated in the past have been discussed. The next chapter gives a detailed description of the proposed energy saving algorithm that will be used in this project.
CHAPTER-7
SYNTHESIS AND SIMULATION RESULTS
7.1 Sample code

Code for D flip-flop

/******************************************************/
module pDFF(DATA, QOUT, CLOCK, RESET);
/******************************************************/
  parameter WIDTH = 1;
  input  [WIDTH-1:0] DATA;
  input              CLOCK, RESET;
  output [WIDTH-1:0] QOUT;
  reg    [WIDTH-1:0] QOUT;

  always @(posedge CLOCK or negedge RESET)
    if (~RESET) QOUT <= 0;      // active low reset
    else        QOUT <= DATA;
endmodule
Code for branch metric unit (BMU)

module BMU (Reset, Clock2, ACSSegment, Code, Distance);
  input  Reset, Clock2;
  input  [`WD_FSM-1:0]  ACSSegment;
  input  [`WD_CODE-1:0] Code;
  output [`WD_DIST*2*`N_ACS-1:0] Distance;

  wire [`WD_STATE:0] PolyA, PolyB;
  wire [`WD_STATE:0] wA, wB;

  assign PolyA = 9'b110_101_111;   // polynomial code used
  assign PolyB = 9'b100_011_101;

  wire [`WD_STATE:0]  B0,B1,B2,B3,B4,B5,B6,B7;   // WIDTH of B = WD_STATE + 1
  wire [`WD_CODE-1:0] G0,G1,G2,G3,G4,G5,G6,G7;   // encoder output of each branch
  wire [`WD_DIST-1:0] D0,D1,D2,D3,D4,D5,D6,D7;   // output distances

  reg  [`WD_CODE-1:0] CodeRegister;

  always @(posedge Clock2 or negedge Reset)
  begin
    if (~Reset) CodeRegister <= 0;
    else if (ACSSegment == 6'h3F) CodeRegister <= Code;
  end

  assign B0 = {ACSSegment,3'b000};   // The branch to be calculated is
  assign B1 = {ACSSegment,3'b001};   // determined by ACSSegment
  assign B2 = {ACSSegment,3'b010};
  assign B3 = {ACSSegment,3'b011};
  assign B4 = {ACSSegment,3'b100};
  assign B5 = {ACSSegment,3'b101};
  assign B6 = {ACSSegment,3'b110};
  assign B7 = {ACSSegment,3'b111};

  ENC EN0(PolyA,PolyB,B0,G0); assign G1 = ~G0;   // Find the 'correct'
  ENC EN2(PolyA,PolyB,B2,G2); assign G3 = ~G2;   // branch metric
  ENC EN4(PolyA,PolyB,B4,G4); assign G5 = ~G4;
  ENC EN6(PolyA,PolyB,B6,G6); assign G7 = ~G6;

  HARD_DIST_CALC HD0(CodeRegister,G0,D0);   // Calculate its Hamming distance
  HARD_DIST_CALC HD1(CodeRegister,G1,D1);
  HARD_DIST_CALC HD2(CodeRegister,G2,D2);
  HARD_DIST_CALC HD3(CodeRegister,G3,D3);
  HARD_DIST_CALC HD4(CodeRegister,G4,D4);
  HARD_DIST_CALC HD5(CodeRegister,G5,D5);
  HARD_DIST_CALC HD6(CodeRegister,G6,D6);
  HARD_DIST_CALC HD7(CodeRegister,G7,D7);

  assign Distance = {D7,D6,D5,D4,D3,D2,D1,D0};   // bus of distances
endmodule
Code for hamming distance calculation

/*-----------------------------------*/
module HARD_DIST_CALC (InputSymbol, BranchOutput, OutputDistance);
// desc. : performs 2 bits hamming DISTance calculation
/*-----------------------------------*/
  input  [`WD_CODE-1:0] InputSymbol, BranchOutput;
  output [`WD_DIST-1:0] OutputDistance;
  reg    [`WD_DIST-1:0] OutputDistance;

  wire MS, LS;

  assign MS = (InputSymbol[1] ^ BranchOutput[1]);
  assign LS = (InputSymbol[0] ^ BranchOutput[0]);

  always @(MS or LS)
  begin
    OutputDistance[1] <= MS & LS;
    OutputDistance[0] <= MS ^ LS;
  end
endmodule
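The two assignments in HARD_DIST_CALC amount to a half adder: MS and LS each flag one mismatching bit, and the 2-bit output is their arithmetic sum. A short worked check follows, where m and l are shorthand (introduced here only for illustration) for MS and LS, s is the received symbol and b is the branch output:

```latex
\begin{aligned}
m &= s_1 \oplus b_1, \qquad l = s_0 \oplus b_0, \qquad d = m + l,\\
d_1 &= m \wedge l, \qquad d_0 = m \oplus l,\\
(m,l) = (0,0) &\;\Rightarrow\; d = 00_2 = 0,\\
(m,l) \in \{(0,1),(1,0)\} &\;\Rightarrow\; d = 01_2 = 1,\\
(m,l) = (1,1) &\;\Rightarrow\; d = 10_2 = 2.
\end{aligned}
```

For example, a received symbol 10 compared against a branch output 01 gives m = l = 1 and therefore a Hamming distance of 2.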
/*-----------------------------------*/
module ENC (PolyA, PolyB, BranchID, EncOut);
// desc. : encoder to determine branch output
/*-----------------------------------*/
  input  [`WD_STATE:0]  PolyA, PolyB;
  input  [`WD_STATE:0]  BranchID;
  output [`WD_CODE-1:0] EncOut;

  wire [`WD_STATE:0]  wA, wB;
  reg  [`WD_CODE-1:0] EncOut;

  assign wA = PolyA & BranchID;
  assign wB = PolyB & BranchID;

  // Each output bit is the parity (XOR) of the masked branch bits
  always @(wA or wB)
  begin
    EncOut[1] = (((wA[0]^wA[1]) ^ (wA[2]^wA[3])) ^ ((wA[4]^wA[5]) ^ (wA[6]^wA[7])) ^ wA[8]);
    EncOut[0] = (((wB[0]^wB[1]) ^ (wB[2]^wB[3])) ^ ((wB[4]^wB[5]) ^ (wB[6]^wB[7])) ^ wB[8]);
  end
endmodule
Code for Viterbi encoder

/******************************************************/
module viterbi_encode9(X, Y, Clock, Reset);
/******************************************************/
  input        X, Clock, Reset;
  output [1:0] Y;

  wire [1:0] Yt;
  wire       X, Clock, Reset;
  wire [8:0] PolyA, PolyB;
  wire [8:0] wA, wB, ShReg;

  // Alternative polynomial pair, left disabled so that the encoder
  // uses the same polynomials as the BMU:
  //assign PolyA = 9'b111_101_011;
  //assign PolyB = 9'b101_110_001;
  assign PolyA = 9'b110_101_111;
  assign PolyB = 9'b100_011_101;

  assign wA = PolyA & ShReg;
  assign wB = PolyB & ShReg;

  // 9-bit shift register: the input bit followed by eight delayed bits
  assign ShReg[8] = X;
  pDFF dff7(ShReg[8], ShReg[7], Clock, Reset);
  pDFF dff6(ShReg[7], ShReg[6], Clock, Reset);
  pDFF dff5(ShReg[6], ShReg[5], Clock, Reset);
  pDFF dff4(ShReg[5], ShReg[4], Clock, Reset);
  pDFF dff3(ShReg[4], ShReg[3], Clock, Reset);
  pDFF dff2(ShReg[3], ShReg[2], Clock, Reset);
  pDFF dff1(ShReg[2], ShReg[1], Clock, Reset);
  pDFF dff0(ShReg[1], ShReg[0], Clock, Reset);

  assign Yt[1] = wA[0] ^ wA[1] ^ wA[2] ^ wA[3] ^ wA[4] ^ wA[5] ^ wA[6] ^ wA[7] ^ wA[8];
  assign Yt[0] = wB[0] ^ wB[1] ^ wB[2] ^ wB[3] ^ wB[4] ^ wB[5] ^ wB[6] ^ wB[7] ^ wB[8];

  // registered outputs
  pDFF dffy1(Yt[1], Y[1], Clock, Reset);
  pDFF dffy0(Yt[0], Y[0], Clock, Reset);
endmodule
Code for Viterbi decoder

// Module      : VITERBIDECODER
// File        : decoder.v
// Description : Top Level Module of Viterbi Decoder
module VITERBIDECODER (Reset, CLOCK, Active, Code, DecodeOut);
  input  Reset, CLOCK, Active;
  input  [`WD_CODE-1:0] Code;
  output DecodeOut;

  wire [`WD_DIST*2*`N_ACS-1:0] Distance;        // BMG Output

  wire [`WD_FSM-1:0]   ACSSegment;              //
  wire [`WD_DEPTH-1:0] ACSPage;                 // Control Output
  wire CompareStart, Hold, Init;                //

  wire [`N_ACS-1:0]    Survivors;               // ACS Output
  wire [`WD_STATE-1:0] LowestState;             //
  wire TB_EN;

  wire RAMEnable;
  wire ReadClock, WriteClock, RWSelect;
  wire [`WD_RAM_ADDRESS-1:0] AddressRAM;        // RAM AddressBus, generated by TBU and ACSU
  wire [`WD_RAM_DATA-1:0]    DataRAM;           // RAM Databus
  wire [`WD_RAM_DATA-1:0]    DataTB;
  wire [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;

  wire Clock1, Clock2;

  // for metric memory connection
  wire [`WD_METR*2*`N_ACS-1:0] MMPathMetric;
  wire [`WD_METR*`N_ACS-1:0]   MMMetric;
  wire [`WD_FSM-2:0] MMReadAddress;
  wire [`WD_FSM-1:0] MMWriteAddress;
  wire MMBlockSelect;

  // instantiation of Viterbi Decoder Modules
  CONTROL ctl (Reset, CLOCK, Clock1, Clock2, ACSPage, ACSSegment,
               Active, CompareStart, Hold, Init, TB_EN);

  BMU bmu (Reset, Clock2, ACSSegment, Code, Distance);

  ACSUNIT acs (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
               ACSSegment, Distance, Survivors, LowestState,
               MMReadAddress, MMWriteAddress, MMBlockSelect,
               MMMetric, MMPathMetric);

  MMU mmu (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
           ACSSegment[`WD_FSM-1:1], Survivors,
           DataTB, AddressTB,
           RWSelect, ReadClock, WriteClock,
           RAMEnable, AddressRAM, DataRAM);

  TBU tbu (Reset, Clock1, Clock2, TB_EN, Init, Hold, LowestState
Code for main memory unit

// Module      : MMU
// Description : Description of MMU Unit in Viterbi Decoder
module MMU (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
            ACSSegment_minusLSB, Survivors,
            DataTB, AddressTB,
            RWSelect, ReadClock, WriteClock,
            RAMEnable, AddressRAM, DataRAM);

  // connection from Control
  input CLOCK, Clock1, Clock2, Reset, Active, Hold, Init;
  input [`WD_DEPTH-1:0] ACSPage;
  input [`WD_FSM-2:0]   ACSSegment_minusLSB;

  // connection from ACS Unit
  input [`N_ACS-1:0] Survivors;

  // connection from/to TB Unit
  output [`WD_RAM_DATA-1:0] DataTB;
  input  [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;

  // connection from/to RAM
  output RWSelect, ReadClock, WriteClock, RAMEnable;
  output [`WD_RAM_ADDRESS-1:0] AddressRAM;
  inout  [`WD_RAM_DATA-1:0]    DataRAM;

  wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
  reg  dummy, SurvRDY;
  reg  [`WD_RAM_ADDRESS-1:0] AddressRAM;
  reg  [`WD_DEPTH-1:0] TBPage;

  wire [`WD_DEPTH-1:0] TBPage_;
  wire [`WD_DEPTH-1:0] ACSPage;
  wire [`WD_TB_ADDRESS-1:0] AddressTB;

  // Read and Write clock
  // Dummy variable used because Write Clock only occurs every 2 Clocks.
  always @(posedge Clock2 or negedge Reset)
    if (~Reset) dummy <= 0;
    else if (Active) dummy <= ~dummy;

  assign WriteClock = (Active && ~dummy) ? Clock1 : 0;
  assign ReadClock  = (Active && ~Hold)  ? ~Clock1 : 0;

  // For Survivor Buffer:
  // the buffer is used because Data Bus Width is 8, while ACS output
  // is only 4 bits at one time
  always @(posedge Clock1 or negedge Reset)
    if (~Reset) SurvRDY <= 1;
    else if (Active) SurvRDY <= ~SurvRDY;

  ACSSURVIVORBUFFER buff (Reset, Clock1, Active, SurvRDY, Survivors,
                          WrittenSurvivors);

  // every negedge Clock2 : - TBPage is decreased by 1, OR
  //                        - When Init is Active, TBPage equal ACSPage - 1
  always @(negedge Clock2 or negedge Reset)
  begin
    if (~Reset) begin
      TBPage <= 0;
    end
    else if (Init) TBPage <= ACSPage-1;
    else TBPage <= TBPage_;
  end

  assign TBPage_ = TBPage - 1;

  // For RAMs
  assign RAMEnable = 0;
  assign RWSelect  = (Clock2) ? 1 : 0;
  assign DataRAM   = (~Clock2) ? WrittenSurvivors : 'bz;
  assign DataTB    = (Clock2)  ? DataRAM : 'bz;

  // every time Clock2 changes, the Address and Enable for each RAM has to
  // be set so it will be ready when Read/Write Clock occur on the edges of
  // Clock1.
  always @(posedge CLOCK or negedge Reset)
  begin
    if (~Reset) AddressRAM <= 0;
    else if (Active)
    begin
      if (Clock2 == 0)   // this is when write happened
      begin
        AddressRAM <= {ACSPage, ACSSegment_minusLSB};
      end
      else               // this is for read operation
      begin
        AddressRAM <= {TBPage[`WD_DEPTH-1:0], AddressTB};
      end
    end
  end
endmodule

/*-----------------------------------*/
module ACSSURVIVORBUFFER (Reset, Clock1, Active, SurvRDY, Survivors,
                          WrittenSurvivors);
// To accommodate the use of 8 bit wide RAM DATA BUS, the Survivor
// (which is only 4 on every clock) must be buffered first.
/*-----------------------------------*/
  input Reset, Clock1, Active, SurvRDY;
  input  [`N_ACS-1:0] Survivors;
  output [`WD_RAM_DATA-1:0] WrittenSurvivors;

  wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
  reg  [`N_ACS-1:0] WrittenSurvivors_;

  always @(posedge Clock1 or negedge Reset)
  begin
    if (~Reset) WrittenSurvivors_ <= 0;
    else if (Active) WrittenSurvivors_ <= Survivors;
  end
Code for ACS unit (add, compare and select)

// Module      : ACSUNIT
// File        : acs.v
// Description : Description of ACS Unit in Viterbi Decoder
module ACSUNIT (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
                ACSSegment, Distance, Survivors, LowestState,
                MMReadAddress, MMWriteAddress, MMBlockSelect,
                MMMetric, MMPathMetric);
/*-----------------------------------*/
// ACS UNIT consists of :
//   - 4 ACS modules (ACS)
//   - RAM Interface
//   - State with smallest metric finder (LOWESTPICK)
/*-----------------------------------*/
  input Reset, Clock1, Clock2, Active, Init, Hold, CompareStart;
  input [`WD_FSM-1:0] ACSSegment;
  input [`WD_DIST*2*`N_ACS-1:0] Distance;

  // to Survivor Memory
  output [`N_ACS-1:0] Survivors;

  // to TB Unit
  output [`WD_STATE-1:0] LowestState;

  // to Memory Metric
  output [`WD_FSM-2:0] MMReadAddress;
  output [`WD_FSM-1:0] MMWriteAddress;
  output MMBlockSelect;
  output [`WD_METR*`N_ACS-1:0]   MMMetric;
  input  [`WD_METR*2*`N_ACS-1:0] MMPathMetric;

  wire [`WD_DIST-1:0] Distance7,Distance6,Distance5,Distance4,
                      Distance3,Distance2,Distance1,Distance0;
  wire [`WD_METR*`N_ACS-1:0] Metric;
  wire [`WD_METR-1:0] Metric0, Metric1, Metric2, Metric3;
  wire [`WD_METR*2*`N_ACS-1:0] PathMetric;
  wire [`WD_METR-1:0] PathMetric7,PathMetric6,PathMetric5,PathMetric4,
                      PathMetric3,PathMetric2,PathMetric1,PathMetric0;
  wire [`WD_METR-1:0] LowestMetric;

  assign {Distance7,Distance6,Distance5,Distance4,
          Distance3,Distance2,Distance1,Distance0} = Distance;
  assign {PathMetric7,PathMetric6,PathMetric5,PathMetric4,
          PathMetric3,PathMetric2,PathMetric1,PathMetric0} = PathMetric;

  ACS acs0 (CompareStart, Distance1,Distance0,PathMetric1,PathMetric0,
            ACSData0, Metric0);
  ACS acs1 (CompareStart, Distance3,Distance2,PathMetric3,PathMetric2,
            ACSData1, Metric1);
  ACS acs2 (CompareStart, Distance5,Distance4,PathMetric5,PathMetric4,
            ACSData2, Metric2);
  ACS acs3 (CompareStart, Distance7,Distance6,PathMetric7,PathMetric6,
            ACSData3, Metric3);

// global parameters for Viterbi Decoder
// decoder specs :
//   RATE = 1/2, DEPTH = 63, number of ACS = 4
//
// PARAMETER                VALUES               BITS ORDER   WIDTH PARS
// input symbol bits        2                                 `WD_CODE
// number of states         256                  8            `WD_STATE
// iterations each data     256 : 4 = 64         6            `WD_FSM
// iterations until depth   63                   6            `WD_DEPTH
// Surv mem. data bus       8                    3            `WD_DATA
// Surv mem. address bus    256x1x64/8 = 2048    11           `WD_ADDR

// simulation parameters
`define HALF       100
`define FULL       200
`define DPERIOD    (`FULL*128)

// decoder parameters
`define CONSTRAINT 9     // K
`define N_ACS      4     // 4 ACSs
`define N_STATE    256   //
`define N_ITER     64    //
`define WD_STATE   8     //
`define WD_CODE    2     // width of Decoder Input
`define WD_FSM     6     // 256 (states) : 4 (ACSs) = 64 --> log2(64) = 6
`define WD_DEPTH   6     // depth has to be at least 5*(K-1).
`define WD_DIST    2     // Width of Calculated Distance
`define WD_METR    8     // width of metric.

// For survivor memory
`define WD_RAM_DATA    8   // width of RAM Data Bus
`define WD_RAM_ADDRESS 11  // width of RAM Address Bus
`define WD_TB_ADDRESS  5   // width of Address Bus between TB and MMU
                           // --> `WD_RAM_ADDRESS - `WD_DEPTH
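The widths in the parameter listing follow from a few lines of arithmetic, using only figures that appear in the listing itself (K = 9, 4 ACS units, depth 63, 8-bit RAM data bus). The symbols N_states, N_iter and N_words are shorthand introduced here for illustration:

```latex
\begin{aligned}
N_{\text{states}} &= 2^{K-1} = 2^{8} = 256 \quad (K = 9),\\
N_{\text{iter}} &= \frac{256}{4} = 64 \;\Rightarrow\; \texttt{WD\_FSM} = \log_2 64 = 6,\\
\text{depth} &= 63 \;\ge\; 5\,(K-1) = 40 \;\Rightarrow\; \texttt{WD\_DEPTH} = 6,\\
N_{\text{words}} &= \frac{256 \times 1 \times 64}{8} = 2048 \;\Rightarrow\; \texttt{WD\_RAM\_ADDRESS} = \log_2 2048 = 11,\\
\texttt{WD\_TB\_ADDRESS} &= \texttt{WD\_RAM\_ADDRESS} - \texttt{WD\_DEPTH} = 11 - 6 = 5.
\end{aligned}
```

The survivor memory therefore holds one decision bit per state for each of 64 trellis stages, packed eight bits to a word.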
7.2 DESIGN TOOL

Start the ISE software by clicking the Xilinx ISE icon.
Create a New Project and review the project properties that are displayed.
Create an HDL source (Verilog in this project), declaring all inputs, outputs and buffers as required; this opens a window in which the code to be synthesized is written.
Check Syntax after editing the source, and correct any errors.
Run Design Simulation after compilation.
Start synthesis by creating Timing Constraints.
Implement Design and Verify Constraints.
Assign Pin Location Constraints according to the requirements of the FPGA board.
Download the design to the Spartan FPGA board by clicking Configure Device; a .bit file is generated and the message Program Succeeded is displayed.
7.3 ModelSim 6.2C

ModelSim is software used to simulate and verify the functionality of VHDL/Verilog code. In this project it is used for that purpose: for each module, input values are applied and the results are observed.
The following steps are followed to simulate a design in ModelSim:
1. Invoke ModelSim by double-clicking the icon. A window appears containing a menu of commands, the workspace and the library space. Create a directory in which the simulation files will be saved.
2. Create a new file, or add an existing file, using the Add items to the Project window and the Create a new file window.
3. The file then appears in the workspace of the main window; double-click it to open it and write the code.
4. After the code is written, save it and compile it to check for syntax errors. If there are none, a green tick mark appears in the workspace; otherwise a red cross appears, and a red message indicates an error in the code. To correct an error, double-click the error message and the error is highlighted in the source window.
5. Simulate the compiled code by clicking the (+) sign of work in the Library tab. Clicking the file opens a Signals window in which all the signals (internal and external) used in the module are ready for simulation.
6. Select the appropriate signals and add them to the wave window by clicking Add to wave in the toolbar. The signals are then assigned the required values to produce the desired outputs.
7. The waveform window offers different options for viewing the output values in various representations, such as Binary, Hexadecimal, Symbolic, Octal, ASCII, Decimal and Unsigned.
8. Exit ModelSim by selecting File -> Quit from the menu.
9. In this way ModelSim is used for functional verification, or simulation, of the user's code.
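As an alternative to assigning signal values by hand in the wave window, the stimulus can be written as a self-checking testbench. The sketch below exercises the pDFF module from Section 7.1; a copy of pDFF is included so that the file compiles on its own. The testbench name tb_pDFF, the signal names, the 10 ns clock period and the check task are assumptions made for this illustration and are not part of the project sources.

```verilog
`timescale 1ns/1ps

// Copy of the report's D flip-flop so that this sketch is self-contained.
module pDFF(DATA, QOUT, CLOCK, RESET);
  parameter WIDTH = 1;
  input  [WIDTH-1:0] DATA;
  input              CLOCK, RESET;
  output [WIDTH-1:0] QOUT;
  reg    [WIDTH-1:0] QOUT;

  always @(posedge CLOCK or negedge RESET)
    if (~RESET) QOUT <= 0;      // active low reset
    else        QOUT <= DATA;
endmodule

// Hypothetical testbench (names and timing are illustrative assumptions).
module tb_pDFF;
  reg     clk, rst_n, d;
  wire    q;
  integer errors;

  pDFF dut (d, q, clk, rst_n);

  always #5 clk = ~clk;         // 10 ns clock period

  // Compare q with the expected value and count mismatches.
  task check;
    input expected;
    begin
      if (q !== expected) begin
        errors = errors + 1;
        $display("FAIL at %0t: q=%b expected=%b", $time, q, expected);
      end
    end
  endtask

  initial begin
    clk = 0; rst_n = 0; d = 1; errors = 0;
    #12 check(1'b0);                  // reset holds q at 0 despite the edge at 5 ns
    rst_n = 1;
    @(posedge clk); #1 check(1'b1);   // d = 1 is captured on the next rising edge
    d = 0;
    @(posedge clk); #1 check(1'b0);   // d = 0 is captured one period later
    if (errors == 0) $display("PASS");
    $finish;
  end
endmodule
```

If the file is saved as, say, tb_pdff.v (a hypothetical name), it can be compiled and run from the ModelSim command line with vlib work, vlog tb_pdff.v and vsim -c tb_pDFF -do "run -all; quit", or loaded through the GUI steps listed above. The same pattern extends to the encoder and decoder modules once their include file is on the compile path.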
7.4 SCHEMATIC DIAGRAM OF Viterbi Decoder
7.4.1 RTL SCHEMATIC
SIMULATION RESULTS
7.5 WAVEFORM GENERATED
CHAPTER-8
CONCLUSION AND FUTURE WORK
CONCLUSION

We have proposed a high-speed VD design for TCM systems. The precomputation architecture, which incorporates the T-algorithm, improves the decoding speed appreciably. We have also analysed the precomputation algorithm, for which the optimal precomputation steps are calculated and discussed. This algorithm is suitable for TCM systems, which always employ high-rate convolutional codes. Finally, we presented a design case in which both the ACSU and the SMU are modified to decode the signal correctly. Synthesis results show that the VD can improve the maximum decoding speed.
FUTURE WORK
The decoding benefits can be extended in future by using an FPGA device together with a hybrid microprocessor. To improve decoder performance, the Viterbi algorithm can be carried out in reconfigurable hardware. A power-saving architecture suitable for mobile devices can be designed for the above decoder. The Viterbi decoder can also be implemented in Java. The Viterbi algorithm may therefore be used in a variety of scenarios, and its complexity may be greatly reduced in future.
Chapter 9 APPENDIX
SYNTHESIS REPORT
TABLE OF CONTENTS
1) Synthesis Options Summary
2) HDL Compilation
3) Design Hierarchy Analysis
4) HDL Analysis
5) HDL Synthesis
   5.1) HDL Synthesis Report
6) Advanced HDL Synthesis
   6.1) Advanced HDL Synthesis Report
7) Device utilization summary
=========================================================================
*                      Synthesis Options Summary                        *
=========================================================================
---- Source Parameters
Input File Name                    : "VITERBIDECODER.prj"
Input Format                       : mixed
Ignore Synthesis Constraint File   : NO

---- Target Parameters
Output File Name                   : "VITERBIDECODER"
Output Format                      : NGC
Target Device                      : xc2vp20-6-fg676

---- Source Options
Top Module Name                    : VITERBIDECODER
Automatic FSM Extraction           : YES
FSM Encoding Algorithm             : Auto
FSM Style                          : lut
RAM Extraction                     : Yes
RAM Style                          : Auto
ROM Extraction                     : Yes
Mux Style                          : Auto
Decoder Extraction                 : YES
Priority Encoder Extraction        : YES
Shift Register Extraction          : YES
Logical Shifter Extraction         : YES
XOR Collapsing                     : YES
ROM Style                          : Auto
Mux Extraction                     : YES
Resource Sharing                   : YES
Multiplier Style                   : auto
Automatic Register Balancing       : No

---- Target Options
Add IO Buffers                     : YES
Global Maximum Fanout              : 500
Add Generic Clock Buffer(BUFG)     : 16
Register Duplication               : YES
Slice Packing                      : YES
Pack IO Registers into IOBs        : auto
Equivalent register Removal        : YES

---- General Options
Optimization Goal                  : Speed
Optimization Effort                : 1
Keep Hierarchy                     : NO
RTL Output                         : Yes
Global Optimization                : AllClockNets
Write Timing Constraints           : NO
Hierarchy Separator                : /
Bus Delimiter                      : <>
Case Specifier                     : maintain
Slice Utilization Ratio            : 100
Slice Utilization Ratio Delta      : 5

---- Other Options
lso                                : VITERBIDECODER.lso
Read Cores                         : YES
cross_clock_analysis               : NO
verilog2001                        : YES
safe_implementation                : No
Optimize Instantiated Primitives   : NO
tristate2logic                     : Yes
use_clock_enable                   : Yes
use_sync_set                       : Yes
use_sync_reset                     : Yes
=========================================================================
=========================================================================
*                          HDL Compilation                              *
=========================================================================
Compiling verilog file "acs.v" in library work
Compiling verilog include file "params.v"
Module <ACSUNIT> compiled
Module <RAMINTERFACE> compiled
Module <ACS> compiled
Module <COMPARATOR> compiled
Module <LOWESTPICK> compiled
Compiling verilog file "tbu.v" in library work
Compiling verilog include file "params.v"
Module <LOWEST_OF_FOUR> compiled
Module <TBU> compiled
Compiling verilog file "ram.v" in library work
Compiling verilog include file "params.v"
Module <TRACEUNIT> compiled
Module <RAM> compiled
Module <RAMMODULE> compiled
Compiling verilog file "mmu.v" in library work
Compiling verilog include file "params.v"
Module <SMU> compiled
Module <MMU> compiled
Compiling verilog file "bmu.v" in library work
Compiling verilog include file "params.v"
Module <ACSSURVIVORBUFFER> compiled
Module <BMU> compiled
Module <HARD_DIST_CALC> compiled
Compiling verilog file "control.v" in library work
Compiling verilog include file "params.v"
Module <ENC> compiled
Compiling verilog file "decoder.v" in library work
Compiling verilog include file "params.v"
Module <CONTROL> compiled
Module <VITERBIDECODER> compiled
No errors in compilation
Analysis of file <"VITERBIDECODER.prj"> succeeded.
Analyzing hierarchy for module <COMPARATOR> in library <work>.
Analyzing hierarchy for module <LOWEST_OF_FOUR> in library <work>.
Analyzing hierarchy for module <COMPARATOR> in library <work>
=========================================================================
*                           HDL Synthesis                               *
=========================================================================
Synthesizing Unit <CONTROL>.
    Related source file is "control.v".
    Found 4x1-bit ROM for signal <$mux0000>.
    Found 1-bit register for signal <Clock1>.
    Found 1-bit register for signal <Clock2>.
    Found 1-bit register for signal <Hold>.
    Found 1-bit register for signal <TB_EN>.
    Found 1-bit register for signal <CompareStart>.
    Found 6-bit register for signal <ACSPage>.
    Found 6-bit register for signal <ACSSegment>.
    Found 1-bit register for signal <Init>.
    Found 12-bit adder for signal <$AUX_1>.
    Found 4-bit up counter for signal <CompareCount>.
    Found 1-bit register for signal <count>.
    Summary:
        inferred   1 ROM(s).
        inferred   1 Counter(s).
        inferred  19 D-type flip-flop(s).
        inferred   1 Adder/Subtractor

Synthesizing Unit <SMU>.
    Related source file is "ram.v".
    Found 2048-bit register for signal <M_REG_A>.
    Found 2048-bit register for signal <M_REG_B>.
INFO:Xst:738 - HDL ADVISOR - 2048 flip-flops were inferred for signal <M_REG_A>. You may be trying to describe a RAM in a way that is incompatible with block and distributed RAM resources available on Xilinx devices, or with a specific template that is not supported. Please review the Xilinx resources documentation and the XST user manual for coding guidelines. Taking advantage of RAM resources will lead to improved device usage and reduced synthesis time.
INFO:Xst:738 - HDL ADVISOR - 2048 flip-flops were inferred for signal <M_REG_B>. You may be trying to describe a RAM in a way that is incompatible with block and distributed RAM resources available on Xilinx devices, or with a specific template that is not supported. Please review the Xilinx resources documentation and the XST user manual for coding guidelines. Taking advantage of RAM resources will lead to improved device usage and reduced synthesis time.
    Summary:
        inferred 4096 D-type flip-flop(s).
Unit <SMU> synthesized.

Synthesizing Unit <ENC>.
    Related source file is "bmu.v".
    Found 2-bit xor9 for signal <EncOut>.
    Summary:
        inferred   2 Xor(s).
Unit <ENC> synthesized.

Synthesizing Unit <HARD_DIST_CALC>.
    Related source file is "bmu.v".
    Found 1-bit xor2 for signal <OutputDistance<0>>.
    Found 1-bit xor2 for signal <LS>.
    Found 1-bit xor2 for signal <MS>.
Unit <HARD_DIST_CALC> synthesized.

Synthesizing Unit <RAMINTERFACE>.
    Related source file is "acs.v".
    Found 1-bit register for signal <MMBlockSelect>.
    Summary:
        inferred   1 D-type flip-flop(s).
Unit <RAMINTERFACE> synthesized.

Synthesizing Unit <COMPARATOR>.
    Related source file is "acs.v".
    Found 8-bit comparator greater for signal <$cmp_gt0000> created at line 171.
    Found 1-bit xor3 for signal <$xor0000> created at line 173.
    Summary:
        inferred   1 Comparator(s).
        inferred   1 Xor(s).
Unit <COMPARATOR> synthesized.

Synthesizing Unit <ACSSURVIVORBUFFER>.
    Related source file is "mmu.v".
    Found 8-bit tristate buffer for signal <WrittenSurvivors>.
    Found 4-bit register for signal <WrittenSurvivors_>.
    Summary:
        inferred   4 D-type flip-flop(s).
        inferred   8 Tristate(s).
Unit <ACSSURVIVORBUFFER> synthesized.

Synthesizing Unit <TRACEUNIT>.
    Related source file is "tbu.v".
    Found 8-bit register for signal <OutState>.
    Found 1-bit 8-to-1 multiplexer for signal <$COND_5>.
    Found 8-bit register for signal <CurrentState>.
    Found 8-bit register for signal <NextState>.
    Found 1-bit tristate buffer for signal <SurvivorBit>.
    Summary:
        inferred  24 D-type flip-flop(s).
        inferred   1 Multiplexer(s).
        inferred   1 Tristate(s).
Unit <TRACEUNIT> synthesized.

Synthesizing Unit <RAMMODULE>.
    Related source file is "ram.v".
    Found 2048x8-bit single-port distributed RAM for signal <Data_Regs>.
    -----------------------------------------------------------------------
    | ram_style          | Auto                              |            |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 2048-word x 8-bit                 |            |
    |     clkA           | connected to signal <WClock>      | fall       |
    |     weA            | connected to signal <_Enable>     | low        |
    |     addrA          | connected to signal <Address>     |            |
    |     diA            | connected to signal <Data>        |            |
    |     doA            | connected to internal node        |            |
    -----------------------------------------------------------------------
    Found 8-bit tristate buffer for signal <Data>.
    Found 8-bit register for signal <DataBuff>.
    Summary:
        inferred   1 RAM(s).
        inferred   8 D-type flip-flop(s).
        inferred   8 Tristate(s).
Unit <RAMMODULE> synthesized.

Synthesizing Unit <BMU>.
    Related source file is "bmu.v".
    Found 2-bit register for signal <CodeRegister>.
    Summary:
        inferred   2 D-type flip-flop(s).
Unit <BMU> synthesized.
=========================================================================
HDL Synthesis Report

Macro Statistics
# RAMs                                        : 1
 2048x8-bit single-port distributed RAM       : 1
# ROMs                                        : 1
 4x1-bit ROM                                  : 1
# Adders/Subtractors                          : 10
 12-bit adder                                 : 1
 6-bit subtractor                             : 1
 8-bit adder                                  : 8
# Counters                                    : 2
 4-bit up counter                             : 1
 6-bit down counter                           : 1
# Registers                                   : 151
 1-bit register                               : 10
 11-bit register                              : 1
 2-bit register                               : 1
 32-bit register                              : 128
 4-bit register                               : 1
 6-bit register                               : 2
 8-bit register                               : 8
# Comparators                                 : 8
 8-bit comparator greater                     : 8
# Multiplexers                                : 1
 1-bit 8-to-1 multiplexer                     : 1
# Tristates                                   : 5
 1-bit tristate buffer                        : 1
 8-bit tristate buffer                        : 4
# Xors                                        : 40
 1-bit xor2                                   : 24
 1-bit xor3                                   : 8
 1-bit xor9                                   : 8
=========================================================================
Advanced HDL Synthesis Report

Macro Statistics
# RAMs                                        : 1
 2048x8-bit single-port distributed RAM       : 1
# ROMs                                        : 1
 4x1-bit ROM                                  : 1
# Adders/Subtractors                          : 10
 12-bit adder                                 : 1
 6-bit subtractor                             : 1
 8-bit adder                                  : 8
# Counters                                    : 2
 4-bit up counter                             : 1
 6-bit down counter                           : 1
# Registers                                   : 4199
 Flip-Flops                                   : 4199
# Comparators                                 : 8
 8-bit comparator greater                     : 8
# Multiplexers                                : 1
 1-bit 8-to-1 multiplexer                     : 1
# Xors                                        : 40
 1-bit xor2                                   : 24
 1-bit xor3                                   : 8
 1-bit xor9                                   : 8
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 2vp20fg676-6

 Number of Slices:                3168  out of   9280    34%
 Number of Slice Flip Flops:      4218  out of  18560    22%
 Number of 4 input LUTs:          4427  out of  18560    23%
    Number used as logic:         3403
    Number used as RAMs:          1024
 Number of IOs:                      6
 Number of bonded IOBs:              6  out of    404     1%
 Number of GCLKs:                    4  out of     16    25%
Bibliography