
Design of High Speed and Power Efficient Viterbi Decoder

Using VHDL
A thesis submitted in partial fulfillment of the requirements for the award of degree of

B.Tech

In

Electronics and Communication Engineering

BY

RAMRAJ NAGAR (ROLL NO: 108111062)

SHRI PRAKASH (ROLL NO: 108111079)

SHOBHIT KUMAR BHADANI (ROLL NO: 108111078)

ELECTRONICS AND COMMUNICATION ENGINEERING


NATIONAL INSTITUTE OF TECHNOLOGY
TIRUCHIRAPPALLI-620015
May 2015
BONAFIDE CERTIFICATE

This is to certify that the project titled “DESIGN OF HIGH SPEED AND POWER
EFFICIENT VITERBI DECODER USING VHDL” is a bonafide record of the work

done by:  

RAMRAJ NAGAR (ROLL NO: 108111062)

SHRI PRAKASH (ROLL NO: 108111079)

SHOBHIT KUMAR BHADANI (ROLL NO: 108111078)

   
In partial fulfillment of the requirements for the award of the degree of Bachelor
of Technology in Electronics and Communication Engineering of the
NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI,
during the year 2011-2015.

Dr. R.K. Jeyachitra Dr. D. Sriram Kumar


(Project Guide) Head of the Department

Project Viva-voce held on___________________________________________

Internal Examiner External Examiner

ABSTRACT
In modern electronics and communication, encoding and decoding data with VLSI technology is
subject to low-power, small-area and high-speed constraints. The Viterbi Decoder presented
here, built around a survivor path unit, is an attempt to reduce power and cost while
increasing speed compared to a conventional decoder.

This project has three objectives. Firstly, an orthodox Viterbi Decoder is designed and
simulated. For faster operation, a Gate Diffusion Input Logic (GDIL) based Viterbi Decoder
is designed using Xilinx ISE, and it is simulated and synthesized successfully. The proposed
GDIL Viterbi Decoder shows a much lower path delay and low power in the simulation results.

Secondly, the GDIL Viterbi Decoder is compared with our proposed technique, in which a
Survivor Path Unit implements a trace-back method with DRAM and SRAM. Incorporating DRAM
and SRAM in this way stores the path information in a manner that allows fast read access
without requiring physical partitioning of the memory. This leads to an overall gain in
speed with low power consumption.

Thirdly, all the Viterbi Decoders are simulated, synthesized and compared, and the proposed
approach shows the best simulation and synthesis results for low-power, high-speed VLSI
applications. The ACS and TB units of the decoders and their sub-circuits are operated in a
deeply pipelined manner to achieve a high transmission rate. Although a register-exchange
based survivor unit has better throughput than a trace-back unit, introducing a RAM cell
between the ACS array and the output register bank yields a significant reduction in path
delay and an improvement in speed. All the Viterbi Decoder designs are done using Xilinx
ISE 12.4.

Keywords: Branch Metric Unit, Add-Compare-Select Unit, Survivor Memory Unit, Memory
Unit, Viterbi Decoder.
ACKNOWLEDGEMENTS

We wish to record our sincere thanks to our project guide, Dr. R.K. Jeyachitra,
Assistant Professor, Department of Electronics and Communication Engineering,
NIT-Trichy, who constantly motivated us during the entire course of the project and
made a tremendous sacrifice of her valuable time so that we could complete this
project work.

We take this opportunity to thank Dr. D. Sriram Kumar, Head of the Department,
Electronics and Communication Engineering, for allowing us to undertake this
project.

We are also thankful to all the teaching and non-teaching staff of this department.

TABLE OF CONTENTS

BONAFIDE CERTIFICATE
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS

CHAPTER 1 INTRODUCTION
1.1 Viterbi Decoder
1.2 Limitation of existing Viterbi decoder
1.3 Scope of our project
1.4 Conclusion

CHAPTER 2 LITERATURE REVIEW
2.1 Limitation of Orthodox Viterbi decoder
2.2 Limitations of GDIL based Viterbi Decoder with DRAM
2.3 Conclusion

CHAPTER 3 DESIGNING OF ORTHODOX VITERBI DECODER
3.1 Introduction
3.2 Concept of GDIL Viterbi Decoder
3.2.1 Branch Metric Unit using GDIL
3.2.2 Add-Compare-Select Unit using GDIL
3.2.3 Survivor Memory Unit
3.2.4 Complete Integration of GDIL Based Viterbi Decoder
3.3 Simulation Results
3.4 Conclusion

CHAPTER 4 PROPOSED VITERBI DECODER WITH DRAM
4.1 Introduction
4.2 Designing of DRAM based Viterbi Decoder
4.3 Simulation report for the model
4.4 Synthesis result of proposed Viterbi decoder
4.5 Conclusion

CHAPTER 5 PROPOSED VITERBI DECODER WITH SRAM
5.1 Introduction
5.2 Simulation Results
5.3 Synthesis Results of proposed Viterbi decoder
5.4 Conclusion

CHAPTER 6 SIMULATION RESULTS
6.1 Introduction
6.2 Performance comparison of different Viterbi decoder models
6.2.1 Timing report comparison
6.2.2 Power analysis comparison
6.3 Design Planner of Viterbi Decoder
6.4 Graphical Analysis
6.4.1 In terms of time
6.4.2 In terms of power
6.5 Conclusion

CHAPTER 7 CONCLUSION AND FUTURE SCOPE
7.1 Conclusion
7.2 Future Scope of the Project

APPENDIX

REFERENCES

LIST OF FIGURES

Figure No. Title

2.1 RTL View of Orthodox Viterbi Decoder
3.1 RTL Design of BMU
3.2 RTL Design of ACSU
3.3 RTL Design of SMU
3.4 RTL Design of Integrated Viterbi Decoder
4.1 RTL Design of Proposed Viterbi Decoder using DRAM
4.2 Power Analysis Report of Viterbi Decoder using DRAM
4.3 Timing Analysis Report of Viterbi Decoder using DRAM
5.1 RTL Design of Proposed Viterbi Decoder using SRAM
5.2 Timing Analysis Report of Viterbi Decoder using SRAM
5.3 Power Analysis Report of Viterbi Decoder using SRAM
6.1 Design Planner of Viterbi Decoder
6.2 Timing and Delay Analysis
6.3 Power Analysis

LIST OF TABLES

Table No. Title

6.1 Performance Summary (Timing)
6.2 Performance Summary (Power)

ABBREVIATIONS

GDIL Gate Diffusion Input Logic

BMU Branch Metric Unit

ACSU Add-Compare-Select Unit

SMU Survivor Memory Unit

BM Branch Metric

SM State Metric

PM Path Metric

TB Trace Back

DRAM Dynamic Random Access Memory

SRAM Static Random Access Memory

Chapter 1
INTRODUCTION
The Viterbi decoding algorithm, proposed by Viterbi, is a decoding process for
convolutional codes in memoryless noise. The algorithm can be applied to a host of problems
encountered in the design of communication systems. Given a sequence of finite-state
signals corrupted by noise, the Viterbi algorithm finds the most likely noiseless
state-transition sequence through the state diagram.

1.1 Viterbi Decoder

Generally, a Viterbi Decoder consists of three basic computation units: the BMU, the ACSU
and the TBU. The BMU calculates the branch metrics using the Hamming distance or the
Euclidean distance. The ACSU adds each branch metric from the BMU to the corresponding
previous state metric; the sums are called the path metrics. After this addition, the value
of each state is updated and the survivor path is chosen by comparing the path metrics.
The convolutional encoder has a rate of ½ (k/n) and a constraint length of 3. In the
encoder, a 3-bit shift register provides the memory and two modulo-2 adders provide the
convolution operations. For each bit in the input sequence, two bits are output, one from
each of the two modulo-2 adders.
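
The encoder described above can be illustrated in software. The following is a minimal Python sketch and not the VHDL of this project; the generator polynomials (7 and 5 in octal) and the function name conv_encode are assumptions, since the text does not state which register taps feed the two modulo-2 adders.

```python
def conv_encode(bits, g1=0b111, g2=0b101):
    """Rate-1/2, constraint-length-3 convolutional encoder.

    g1 and g2 are ASSUMED generator polynomials (7 and 5 octal);
    the thesis text does not specify the adder taps.
    """
    reg = 0  # 3-bit shift register, newest bit in the MSB position
    out = []
    for b in bits:
        reg = ((b & 1) << 2) | (reg >> 1)          # shift the new bit in
        out.append(bin(reg & g1).count("1") % 2)   # first modulo-2 adder
        out.append(bin(reg & g2).count("1") % 2)   # second modulo-2 adder
    return out


if __name__ == "__main__":
    # Four input bits produce eight output bits (rate 1/2).
    print(conv_encode([1, 0, 1, 1]))
```

With these assumed taps, each input bit yields exactly two output bits, which is the rate-½ behaviour described in the text.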
The decoding procedure compares the received sequence with all the possible sequences
that the encoder can produce and selects the sequence that is closest to the received
sequence. Two paths always merge at each node; the path with the minimum Hamming distance
is retained and the other is terminated. The retained paths are known as survivor paths,
and the final path selected is the continuous path through the trellis with the minimum
aggregate Hamming distance.
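
The decoding procedure can be sketched as a small hard-decision decoder. This Python sketch is only an illustration under stated assumptions: the generator polynomials (7, 5 octal), the start in the all-zero state and the name viterbi_decode are not taken from this project, and the survivor paths are kept as plain lists rather than in a hardware memory.

```python
def viterbi_decode(received, g1=0b111, g2=0b101):
    """Hard-decision Viterbi decoding for a rate-1/2, K=3 code.

    received is a flat list of bits, two per information bit.
    g1 and g2 are ASSUMED generator polynomials (7, 5 octal).
    Returns (decoded bits, aggregate Hamming distance).
    """
    def parity(x):
        return bin(x).count("1") % 2

    n_states = 4                               # 2^(K-1) states for K = 3
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)     # encoder assumed to start in state 0
    paths = [[] for _ in range(n_states)]

    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:          # state not reachable yet
                continue
            for bit in (0, 1):
                reg = (bit << 2) | state       # new bit joins the two stored bits
                expected = [parity(reg & g1), parity(reg & g2)]
                # branch metric: Hamming distance to the received pair
                bm = sum(e != x for e, x in zip(expected, r))
                nxt = reg >> 1                 # next state drops the oldest bit
                cand = metrics[state] + bm     # add
                if cand < new_metrics[nxt]:    # compare and select the survivor
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths

    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best], metrics[best]
```

For example, the codeword for the input 1 0 1 1 0 0 under the assumed taps is 11 10 00 01 01 11; flipping its third bit still decodes to the original input, with an aggregate distance of 1.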

1.2 Limitation of Existing Viterbi Decoder

The existing Viterbi decoder was implemented using conventional CMOS and pseudo-NMOS
techniques, which lead to high switching activity and therefore higher power consumption.
A more efficient technique is needed to implement the same Viterbi Decoder functionality.

1.3 Scope of our project

In this project we develop a Viterbi Decoder using the gate diffusion input logic technique
and then optimize it for speed and power consumption. Speed can be increased by storing the
path information in a memory so that it can be reused in the next fetch. We also examine
how different kinds of memory affect the speed.

1.4 Conclusion
We discussed the Viterbi decoder and its concepts. We also saw the limitations of the
existing Viterbi Decoder and how gate diffusion input logic can be used to improve the
existing model in terms of power and speed.

Chapter 2
LITERATURE REVIEW

Y. Zhu and M. Benaissa (2005) presented a novel ACS scheme that enables high speeds
to be achieved in area-efficient Viterbi Decoders without compromising area and power
efficiency; multilevel pipelining is introduced into the ACS feedback loop. Arkadiy
Morgenshtein et al (2007) used Gate Diffusion Input circuits for asynchronous design and
compared the designs with CMOS asynchronous design. Dalia A. El-Dib and Mohamed I.
Elmasry (2008) discussed the implementation of a Viterbi Decoder based on a modified
register-exchange (RE) method. Song Li and Qing-Ming Yi (2009) proposed a Verilog-based
scheme for implementing a high-speed, low-power bi-directional Viterbi Decoder. Decoding
was done in both the positive and the negative direction, so the delay was half that of
the unidirectional decoder and the decoding speed was greatly improved. Yun-Nan Chang and
Yu Chung Ding (2009) presented a low-power design for the Viterbi Decoder based on a novel
survivor path trace mechanism. Lupin Chen et al (2010) presented a low-power trace-back
(TB) scheme for high constraint length Viterbi Decoders. Xuan-zhong Li et al (2011)
discussed a high-speed Viterbi Decoder based on a parallel radix-4 architecture and a
bit-level carry-save algorithm. Seongjoo Lee (2012) presented an efficient implementation
method for parallel-processing Viterbi Decoders in UWB systems.

2.1 LIMITATIONS OF ORTHODOX VITERBI DECODER

Viterbi Decoders are widely used in communication systems. Logic styles such as
CMOS, pseudo-NMOS and dynamic logic have been used to design the circuits at the ACS
level, but the switching activity in these logic styles is high and hence leads to high
power dissipation.

Fig.2.1: RTL view of Orthodox Viterbi Decoder

Fig.2.1 shows the RTL design of the orthodox Viterbi Decoder. The back-end coding is
done in VHDL and the circuit is synthesized using Xilinx ISE. The delay and power
analysis reports for this Viterbi Decoder clearly show high power dissipation, which is
undesirable for any FPGA-based VLSI circuit design.

2.2 LIMITATIONS IN GDIL BASED VITERBI DECODER WITH DRAM

In the existing work, DRAM was used to store the path information. DRAM has to be refreshed
repeatedly, because without refresh its contents are lost after a certain period of time.
DRAM consumes more power than SRAM, and SRAM is faster than DRAM.

2.3 CONCLUSION
In this chapter we presented the literature review for our project, which traced the
advancement of the Viterbi decoder over time. We also outlined how we will proceed
towards our objective.
Chapter 3
DESIGNING OF ORTHODOX VITERBI DECODER

3.1 Introduction

GDIL is a low-power digital circuit design technique that allows the power consumption,
delay and area of a digital circuit to be reduced. The basic GDIL cell is similar to the
standard CMOS inverter, with two differences: (1) the GDIL cell has three inputs, and
(2) the bulks of both the NMOS and the PMOS transistor are connected to N or P, so they can
be arbitrarily biased, in contrast with the CMOS inverter. The GDIL cell has four
terminals: G (common gate input of the NMOS and PMOS transistors), P (outer diffusion node
of the PMOS transistor), N (outer diffusion node of the NMOS transistor) and D (common
diffusion node of both transistors). The GDIL approach allows a wide range of complex logic
functions to be implemented using only two transistors. The method is suitable for the
design of fast, low-power circuits with a reduced number of transistors (compared to CMOS
and existing Pass Transistor Logic techniques), while improving logic level swing and
static power characteristics and allowing simple top-down design using a small cell
library.
A simple change of the input configuration of the basic GDIL cell corresponds to very
different Boolean functions. Most of these functions are complex (6–12 transistors) in
CMOS and in standard PTL implementations, but very simple (only two transistors per
function) in the GDIL design method. GDIL therefore reduces area, as fewer LUTs and CLBs
are used in FPGA prototyping. The GDIL Viterbi Decoder consists of three blocks: the BMU,
the ACSU and the SMU. All these blocks are designed using GDIL technology, and simulated
and synthesized using Xilinx ISE.

3.2 Concept of GDIL Viterbi Decoder

3.2.1. Branch Metric Unit using GDIL


The branch metric computation block compares the received code symbol with the expected
code symbol and counts the number of differing bits. It is designed using an EXOR gate and
a 3-bit counter. The output of the EXOR gate is fed as the clock input to the 3-bit
counter. The 3-bit counter is built by cascading D flip-flops, with the output of each
flip-flop used as the clock input of the next. The D inputs of all the flip-flops are tied
HIGH. The preset and clear inputs are used to make the counter work as an asynchronous
counter. The RTL schematic diagram of the Branch Metric Unit is shown in fig.3.1.

Flow Chart of BMU

Block diagram of BMU

Fig.3.1: RTL design of Branch Metric Unit

Initially the counter output is “000” when preset = ‘1’ and clear = ‘1’. When the first
clock pulse is applied, the counter output becomes “001”. For the next clock pulse the
output of the counter is “010”. The counter counts all the clock pulses in this way and
returns to “000” after it reaches “111”. The architecture is explained for constraint
length K=3.
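
The behaviour of the BMU can be mimicked in a few lines of Python. This is a behavioural sketch only, not the GDIL circuit: the function name branch_metric is hypothetical, and the modulo-8 wrap simply models the 3-bit counter width mentioned above.

```python
def branch_metric(received, expected):
    """Hamming distance between two equal-length code symbols.

    Mirrors the BMU idea in the text: EXOR the symbols bit by bit and
    count the mismatches. The wrap at 8 models a 3-bit counter.
    """
    count = 0
    for r, e in zip(received, expected):
        if r ^ e:                     # EXOR output is high on a mismatch
            count = (count + 1) % 8   # 3-bit counter: 000 .. 111, then wraps
    return count


if __name__ == "__main__":
    # Print the counter value as a 3-bit string.
    print(format(branch_metric([0, 1], [1, 0]), "03b"))
```

Each mismatch between the received and the expected bit acts like one clock pulse to the counter, so the final count is the branch metric.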

3.2.2. Add-Compare-Select Unit


The ACSU adds the BM to the corresponding PM, compares the new PMs and then
stores the selected PMs in the path metric memory (PMM). At the same time, the ACSU stores
the associated survivor path decisions in the SMU. The PM of the survivor path of each
state is updated and stored back into the PMM. The block diagram of the Add-Compare-Select
Unit is shown in fig.3.2.

Flow Chart of Add-Compare-Select Unit

Block Diagram of Add-Compare-Select Unit


Fig.3.2: RTL design of Add-Compare-Select Unit

The state metric (SMi,j) and the branch metric (BMi,j) are the two inputs to the adder
unit. Each butterfly wing is usually implemented by a module called an ACS module. The two
adders compute the partial path metric of each branch. The comparator compares the two
partial path metrics and the selector selects the appropriate branch. The new partial path
metric updates the state metric of state p, and the survivor path recording block records
the survivor path. The adder unit proposed in the design consists of one half-adder and two
full-adders. The output of the BMU is added to the previous path metric, and the result is
the new path metric for the next branch. The input sequences a0 (1100), b0 (1001),
a1 (1011), b1 (1101), a2 (0111) and b2 (0011) are applied to the adder unit. The first bits
of a0 and b0, i.e. 11, are the inputs of the half-adder, which produces sum1 = ‘0’ and
carry = ‘1’. The half-adder carry is passed to the next full-adder, whose three input bits
a1, b1 and carry are 111, and which produces sum2 = ‘1’ and carry = ‘1’.

This carry is passed to the next full-adder, whose three input bits are a2 = 0, b2 = 0 and
carry = 1, and which produces sum = ‘1’ and carry = ‘0’. All other input bits are
processed similarly. The inputs of the 4-bit comparator are a0, a1, a2, a3 and b0, b1, b2,
b3. When the two inputs are A=0011 and B=1011, the output line A<B goes high, i.e. the
less-than state. When A=0111 and B=0100, the output line A>B goes high, i.e. the
greater-than state. All other input values are handled by the comparator in the same way.
The A<B output is used because it identifies the smaller value, and it is passed on to the
SMU.
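
The adder chain and the comparator described above can be checked with a short behavioural model. This Python sketch is illustrative only: the function names are hypothetical, the adder operands are given least significant bit first, and the comparator inputs are read as MSB-first binary strings, which is an assumption about the bit ordering in the text.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Return (sum, carry) of two bits plus a carry-in."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2


def add3(a, b):
    """Add two 3-bit values given as (bit0, bit1, bit2), LSB first.

    Uses one half-adder followed by two full-adders, as in the text.
    """
    s0, c = half_adder(a[0], b[0])
    s1, c = full_adder(a[1], b[1], c)
    s2, c = full_adder(a[2], b[2], c)
    return (s0, s1, s2), c


def compare4(a, b):
    """4-bit magnitude comparator; a and b are MSB-first bit strings (assumed)."""
    x, y = int(a, 2), int(b, 2)
    return {"A<B": x < y, "A=B": x == y, "A>B": x > y}


if __name__ == "__main__":
    print(add3((1, 1, 0), (1, 1, 0)))      # first bits of the input sequences
    print(compare4("0011", "1011"))
```

With the first bits of the sequences in the text (a0 = b0 = 1, a1 = b1 = 1, a2 = b2 = 0), the model reproduces the sums 0, 1, 1 and a final carry of 0.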

The output of the comparator is used as the select signal for the multiplexer, which
selects the minimum path metric of the decoded message bit in the Viterbi Decoder.
The selector unit consists of four 2x1 multiplexers, and the select signal for all of them
comes from the A<B output of the comparator. Hence the selector selects the minimum path
metric value. The 4-bit inputs are a0=1101, b0=1111, a1=1101, b1=1000, a2=1111, b2=1110,
a3=1001, b3=1110 and the select line value is 1010. When the select line is high, i.e.
‘1’, the output z0 is b0, i.e. ‘1’. When the select line is ‘0’, the output z0 is a0.
The outputs z1, z2 and z3 are obtained from the other inputs in the same way.
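
The add, compare and select steps can be combined into one small model. The Python sketch below is an illustration under assumptions: the names mux2 and acs are hypothetical, and the decision bit is defined here as 1 when the second branch has the smaller metric, which may differ from the select polarity used in the RTL design.

```python
def mux2(a, b, sel):
    """2x1 multiplexer: returns b when sel is 1, otherwise a."""
    return b if sel else a


def acs(pm_a, bm_a, pm_b, bm_b):
    """One add-compare-select step for the two branches entering a state.

    Returns (selected path metric, decision bit). The decision bit is 1
    when branch b wins; this polarity is an ASSUMPTION of the sketch.
    """
    cand_a = pm_a + bm_a                    # add
    cand_b = pm_b + bm_b
    sel = 1 if cand_b < cand_a else 0       # compare
    return mux2(cand_a, cand_b, sel), sel   # select the smaller metric


if __name__ == "__main__":
    # z0 for the sequences in the text: a0 = 1101, b0 = 1111, select = 1010
    a0, b0, select = [1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 1, 0]
    print([mux2(a, b, s) for a, b, s in zip(a0, b0, select)])
    print(acs(2, 1, 0, 2))
```

The decision bit returned by acs is the information that the survivor memory has to store for each state.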

3.2.3. Survivor Memory Unit

The Survivor Memory Unit is designed using a serial-in serial-out shift register,
and the length of the shift register depends on the length of the convolutional encoder.
Fig.3.3 shows the 4x4 memory unit that stores the minimum surviving path for a constraint
length of 3; the clock signal of the SMU is the ACSU output. The 4-bit output of the
selector unit is the input of the SMU. The SMU is designed as a 4x4 shift register using
D flip-flops. Each bit is stored in its own D flip-flop, so each of the 4 shift registers
holds one bit per stage.

Flow Chart of Survivor Memory Unit

Block diagram of Survivor Memory Unit

Fig.3.3: RTL design of Survivor Memory Unit

The D flip-flop input and the clock are driven by the pulse sources “D flip-flop GND PULSE
(0 5 0 1n 1n 30n 60n)” and “CLK GND PULSE (0 5 0 1n 1n 10n 20n)”, and the outputs q1, q2,
q3 and q4 shift according to the D input value and the clock pulse. The number of memory
stages is 2^(k-1), where k is the constraint length. In this GDIL method it varies from 4
to 128 stages.
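
The shifting behaviour and the stage count can be modelled as follows. This is a behavioural Python sketch, not the flip-flop level design; the class name ShiftRegister and the function memory_stages are hypothetical names introduced for illustration.

```python
class ShiftRegister:
    """Serial-in serial-out shift register built from D flip-flop stages."""

    def __init__(self, stages):
        self.q = [0] * stages          # q[0] is the first flip-flop output

    def clock(self, d):
        """Apply one clock edge: shift d in and return the bit shifted out."""
        out = self.q[-1]
        self.q = [d & 1] + self.q[:-1]
        return out


def memory_stages(k):
    """Number of memory stages, 2^(k-1), for constraint length k."""
    return 2 ** (k - 1)


if __name__ == "__main__":
    sr = ShiftRegister(memory_stages(3))   # 4 stages for k = 3
    print([sr.clock(b) for b in [1, 0, 1, 1, 0, 0, 0, 0]])
```

For k = 3 the register has 4 stages, so each input bit appears at the output four clock pulses later; k = 8 gives the 128 stages quoted above.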

3.2.4. Complete Integration of GDIL based Viterbi Decoder

The Viterbi Decoder using the GDIL technique is designed by integrating the BMU, ACSU
and SMU. The proposed GDIL design is shown in fig.3.4.
Two Branch Metric Units are used because there are two possible transitions from one state
to another. Each BMU calculates the branch metric between the expected sequence (the
original random input, ‘a’) and the received sequence (with introduced errors, ‘b’). The
adder unit then adds the branch metric to the previous path metric, the comparator compares
the two paths and the selector selects the path with the smaller metric. The survivor
memory unit stores the path metric value and its corresponding states using the 2x1
multiplexer and a 2-bit shift register to produce the decoded output.

Fig.3.4: RTL design of GDIL Viterbi Decoder

3.3 Simulation Results for above model

The GDIL Viterbi Decoder consists of three major blocks: the BMU, the SPU and the ACSU.
These units are discussed in detail earlier in this chapter. The simulation of the orthodox
Viterbi Decoder is done using Xilinx ISE 12.4. Each block is programmed using Verilog HDL
in the back end. The individual blocks are verified and tested. Finally the complete
Viterbi Decoder is synthesized, placed and routed successfully. Table 2 gives the detailed
timing report of the GDIL Viterbi Decoder.

Timing Report of GDIL Viterbi

3.4 Conclusion

The timing report of the orthodox Viterbi Decoder clearly shows the propagation delay
figures: maximum clock frequency 64.516 MHz, minimum clock period 21.400 ns, clock to
set-up cycle time 21.400 ns, set-up to clock pad delay 87.663 ns and clock pad to output
pad delay 16.635 ns. A larger number of gate levels is also needed to build the orthodox
Viterbi Decoder circuit. These are taken as the limitations of the orthodox Viterbi
Decoder, which is compared with the GDIL-based Viterbi Decoder in order to improve the
overall performance of the circuit.

Chapter 4
PROPOSED VITERBI DECODER USING DRAM

4.1 Introduction

The GDIL Viterbi Decoder is also compared with our proposed technique, in which an
SPU implements a trace-back method with DRAM. Incorporating DRAM in this way stores the
path information in a manner that allows fast read access without requiring physical
partitioning of the DRAM. This leads to an overall gain in speed with low power
consumption.

4.2 Design of DRAM based Viterbi Decoder


As shown in fig.4.1, the proposed DRAM-based GDIL Viterbi Decoder is designed at the
register-transfer level (RTL). As in earlier Viterbi decoder designs, the main block is
divided into two sub-units: the ACS array and the SPU. In the proposed system the metric
calculation, addition, weight comparison and survivor path selection all take place in the
ACS array. The ACS array therefore holds the weight value of each state as the decoder
progresses along the survivor path. The weight values are needed for comparison with the
weights of the other paths in order to determine the survivor path. The addition and
comparison functions that take place within the ACS array determine the path extensions.
Signals indicating these extensions to the survivor paths are passed from the ACS array to
the SPU, which then updates the survivor paths. Two methods exist for implementing the SPU:
the register-exchange method and the trace-back method. Although a register-exchange based
survivor unit has better throughput than a trace-back unit, introducing a RAM cell between
the ACS array and the output register bank in this project yields a significant reduction
in path delay. The TBU operates in the following manner.

Fig.4.1: RTL design of Proposed Viterbi Decoder using DRAM

At each time step, the chosen transitions are stored in a column. One state is chosen as a
starting point and then trace-back begins. In the proposed system the trace-back consists
of repeated reads from the RAM cell. Each read accesses the column that precedes the column
last accessed in the same clock cycle. This places very demanding requirements on the GDIL
Viterbi Decoder using DRAM in high-speed applications, and it requires a larger silicon
chip area.
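
The trace-back operation can be illustrated with a small software model in which a list of columns stands in for the RAM. This Python sketch rests on assumptions that are not taken from the RTL design: a 4-state trellis (K = 3), a state encoded as the two most recent input bits, and one stored decision bit per state that holds the oldest bit of the surviving predecessor. The name trace_back is hypothetical.

```python
def trace_back(decisions, start_state):
    """Recover decoded bits from stored survivor decisions.

    decisions[t][s] is the decision bit stored for state s at time step t:
    the oldest register bit of the surviving predecessor. This memory
    layout and the 4-state (K = 3) trellis are ASSUMPTIONS of the sketch.
    """
    state = start_state
    bits = []
    for column in reversed(decisions):      # each read precedes the last one
        bits.append(state >> 1)             # input bit that led into this state
        d = column[state]                   # read the stored decision
        state = ((state & 1) << 1) | d      # step to the surviving predecessor
    bits.reverse()                          # trace-back yields bits last-first
    return bits


if __name__ == "__main__":
    # Decision columns for the state sequence 0 -> 2 -> 1 -> 2 -> 3
    decisions = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
    print(trace_back(decisions, start_state=3))
```

Each loop iteration corresponds to one read of the column that precedes the one read before it, which is why the read access time of the memory governs the speed of the trace-back.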

4.3 Simulation report for above model


The simulation is performed in Xilinx ISE 12.4 and the design is synthesized for the FPGA
target device Virtex 6. The major constraints for VLSI design are speed, power and area,
and all three have been taken into account in this project. The detailed timing and power
reports describe the delay for high-speed applications and the low power consumption of the
proposed Virtex-based Viterbi Decoder.

Fig.4.2: Power Analysis Report of Viterbi Decoder with DRAM

Fig.4.3: Timing Analysis Report of Viterbi Decoder with DRAM

4.4 Synthesis Results of Proposed Viterbi Decoder


The proposed Viterbi Decoder is simulated and its functionality is verified.
Once functional verification is done, the RTL model is taken through the synthesis process
using Xilinx ISE 12.4. In the synthesis process, the RTL model is converted to a gate-level
netlist mapped to a specific technology library. The modified Viterbi Decoder design is
implemented on an FPGA (Field Programmable Gate Array) of the Virtex 6 family. Many
different devices of the Virtex 6 family are available in the Xilinx ISE tool. To implement
the modified Viterbi Decoder with DRAM, the device “XA9536XL” was chosen with the package
“FG320” and the device speed “-6”. The design of the modified Viterbi Decoder for low power
and high speed is synthesized successfully, and its results are shown in fig.4.3.

4.5 Conclusion
The design was implemented and the results were recorded. The Min Clock to Pad Delay and
the Setup to Clock at the Pad were found to be 15.500 ns and 52.600 ns respectively; the
latter is much lower than the earlier 87.663 ns. This shows a clear improvement in speed
for the target device.

Chapter 5
PROPOSED VITERBI DECODER USING SRAM

5.1 Introduction

In the earlier sections we showed the implementation of the GDIL-based Viterbi Decoder
using DRAM. To further improve the speed and power efficiency, SRAM can be used in place
of DRAM. SRAM retains its data without refresh for as long as power is applied, and is
known for its power efficiency. The SRAM used here stores the path information and provides
much faster read access.

Fig.5.1: RTL design of Proposed Viterbi Decoder using SRAM

As shown in fig.5.1, the proposed SRAM-based GDIL Viterbi Decoder is designed at the
register-transfer level (RTL). As in earlier Viterbi Decoder designs, the main block is
divided into two sub-units: the ACS array and the SPU. In the proposed system the metric
calculation, addition, weight comparison and survivor path selection all take place in the
ACS array. The ACS array therefore holds the weight value of each state as the decoder
progresses along the survivor path. The weight values are needed for comparison with the
weights of the other paths in order to determine the survivor path. The addition and
comparison functions that take place within the ACS array determine the path extensions.
Signals indicating these extensions to the survivor paths are passed from the ACS array to
the SPU, which then updates the survivor paths. In this project, introducing the SRAM cell
between the ACS array and the output register bank yields a significant reduction in path
delay.

The TBU operates in the following manner. At each time step, the chosen transitions are
stored in a column. One state is chosen as a starting point and then trace-back begins. In
the proposed system the trace-back consists of repeated reads from the RAM cell. Each read
accesses the column that precedes the column last accessed in the same clock cycle. This
places very demanding requirements on the GDIL Viterbi Decoder using SRAM in high-speed
applications, and it increases the complexity of the design.

5.2 Simulation Results


The simulation is performed in Xilinx ISE 12.4 and the design is synthesized for the FPGA
target device Spartan 6. The major constraints for VLSI design are speed, power and area,
and all three have been taken into account in this project. The detailed timing and power
reports describe the delay for high-speed applications and the low power consumption of the
proposed Spartan-based Viterbi Decoder.
Fig.5.2: Timing Analysis Report of Viterbi Decoder with SRAM

Fig.5.3: Power Analysis Report of Viterbi Decoder with SRAM

5.3 Synthesis Results of Proposed Viterbi Decoder
The proposed Viterbi Decoder is simulated and its functionality is verified. Once the
functional verification is done, the RTL model is taken through the synthesis process using
Xilinx ISE 12.4. In the synthesis process, the RTL model is converted to a gate level netlist
mapped to a specific technology library. This modified Viterbi Decoder design is implemented
on the Virtex 6 FPGA (Field Programmable Gate Array) family. Many different devices of the
Virtex 6 family are available in the Xilinx ISE tool. To implement this modified Viterbi
Decoder with SRAM, the device named “XA9536XL” has been chosen, with the package “FG320” and
the device speed “-6”. The design of the modified Viterbi Decoder for low power and high
speed is synthesized successfully and its results are analyzed as shown in fig.5.2.

5.4 Conclusion
The design was implemented and the results were recorded. The Min Clock to Pad Delay and
the Setup to Clock at the Pad were found to be 15.500 ns and 20.800 ns respectively, the
latter being much lower than the 87.663 ns of the orthodox decoder. This clearly shows a
remarkable improvement in speed for the target device.

Chapter 6
RESULTS
6.1 Introduction

The preceding chapters dealt with the design of the Viterbi decoder and the reports obtained
from simulation. Here we present a comparative analysis of our proposed model and the
existing models to assess the success of our work.

6.2 Performance comparison of Different Viterbi Decoder Models

The performance measures of the different Viterbi decoder models are compared and the
results are shown in tabulated form. Table 6.1 shows the comparison of the Timing Analysis
Reports of the different models, while Table 6.2 shows the comparison of their Power Analysis
Reports.

6.2.1 Timing Report comparison

Timing Report                    Orthodox Viterbi   GDIL Viterbi Decoder   GDIL Viterbi Decoder
                                 Decoder            with DRAM              with SRAM
Target Device                    XC6SLX45           XC6SLX45               XC6SLX45
                                 (Virtex 6)         (Virtex 6)             (Virtex 6)
Max Clock Frequency              64.516 MHz         64.516 MHz             64.516 MHz
Min Clock Period                 21.400 ns          15.500 ns              15.500 ns
Clock to Set-up Cycle Time       21.400 ns          15.500 ns              15.500 ns
Set-up to Clock Pad Delay        87.663 ns          52.600 ns              20.800 ns
Clock Pad to Output Pad Delay    16.635 ns          5.800 ns               5.800 ns

TABLE 6.1: PERFORMANCE SUMMARY (TIMING)
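
As a quick cross-check of the timing figures, the clock frequency follows from the clock period, and the relative reduction of a delay follows from the orthodox and modified values. The symbols $f_{\max}$, $T_{\min}$ and $\Delta$ are introduced here for illustration only; all numbers are taken from the timing table above.

```latex
\begin{align*}
f_{\max} &= \frac{1}{T_{\min}} = \frac{1}{15.500\ \text{ns}} \approx 64.516\ \text{MHz} \\
\Delta_{\text{set-up, SRAM}} &= \frac{87.663 - 20.800}{87.663} \approx 76.3\,\% \\
\Delta_{\text{set-up, DRAM}} &= \frac{87.663 - 52.600}{87.663} \approx 40.0\,\% \\
\Delta_{\text{clock-to-output}} &= \frac{16.635 - 5.800}{16.635} \approx 65.1\,\%
\end{align*}
```

By the same relation, a clock period of 21.400 ns corresponds to a frequency of about 46.729 MHz.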
6.2.2 Power Analysis comparison

Power Report          Orthodox Viterbi   GDIL Viterbi Decoder   GDIL Viterbi Decoder
                      Decoder            with DRAM              with SRAM
Target Device         XC6SLX45           XC6SLX45               XC6SLX45
                      (Virtex 6)         (Virtex 6)             (Virtex 6)
Leakage Power[1]      3.791 μW           1.007 μW               0.718 μW
Quiescent Power[2]    2.315 μW           1.007 μW               1.043 μW
Dynamic Power[3]      0.739 μW           0.003 μW               0.001 μW

TABLE 6.2: PERFORMANCE SUMMARY (POWER)


[1] Leakage power when the device is powered but not configured.
[2] Power drawn by the device when it is powered up, configured with user logic and there is
no switching activity.
[3] Power that fluctuates as the design runs. It represents the amount of power consumed by
the switching user logic and routing.

6.3 Design Planner of Viterbi Decoder

Fig.6.1: Design Planner of Viterbi Decoder

6.4 Graphical Analysis

A graphical analysis is also performed for all three Viterbi Decoders to give a
complete performance summary. The timing and delay analysis and the power analysis are
shown in fig.6.2 and fig.6.3 respectively.
6.4.1 In terms of Time

Fig.6.2: Timing and Delay Analysis

6.4.2 In terms of Power

Fig.6.3: Power Analysis

6.5 Conclusion

We can conclude that both modified decoders reduce the path delay significantly, which
demonstrates the advantage of modifying the decoder circuit. The power report and
performance summary also show the improvement in speed for the target device.

Chapter 7
CONCLUSION AND FUTURE SCOPE

7.1 Conclusion
The ACS units and their sub-circuits in all three types of decoder have been operated in
a deeply pipelined manner to achieve a high transmission rate. Although the register exchange
based survivor unit has better throughput than the trace back unit, introducing the RAM cell
between the ACS array and the output register bank produced a significant reduction in path
delay. Fast decoding can therefore be achieved with this Viterbi Decoder, which shows
significant results in high speed applications.

7.2 Future Scope of the Project

Among the variety of techniques and concepts used to improve the speed and power
efficiency of the Viterbi Decoder, this project has explored the use of faster memory and the
gate diffused input logic mechanism. Alternative algorithms can be developed and the
complexity of the circuit can be further reduced to gain more speed and accuracy. The
register exchange method can be used along with memory combinations instead of the trace back
method, but the extra overhead of the registers required for storing the carry bits and the
path metrics then has to be sorted out.
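
The register exchange method mentioned above can be sketched for a 4-state trellis as follows. This is a hedged illustration of the general technique rather than a design evaluated in this project: the module name register_exchange_sketch, the history length L and the state update next_state = {input_bit, state[1]} are assumptions.

```verilog
// Hypothetical register exchange survivor unit for a 4-state trellis.
// Assumed state update: next_state = {input_bit, state[1]}, so the two
// predecessors of state s are {s[0], 0} and {s[0], 1}, and the input bit
// on any transition into s is s[1].
module register_exchange_sketch #(parameter L = 8) (
    input          clk,
    input          rst,        // synchronous, active high
    input          en,         // advance one trellis step
    input  [3:0]   decisions,  // decisions[s] = low bit of the surviving predecessor of state s
    output [L-1:0] path0       // decoded-bit history along the survivor into state 0
);
    // One history register per state
    reg [L-1:0] p0, p1, p2, p3;

    // History copied from the predecessor chosen by the ACS array
    wire [L-1:0] src0 = decisions[0] ? p1 : p0;  // state 00 <- 00 or 01
    wire [L-1:0] src1 = decisions[1] ? p3 : p2;  // state 01 <- 10 or 11
    wire [L-1:0] src2 = decisions[2] ? p1 : p0;  // state 10 <- 00 or 01
    wire [L-1:0] src3 = decisions[3] ? p3 : p2;  // state 11 <- 10 or 11

    always @(posedge clk) begin
        if (rst) begin
            p0 <= 0; p1 <= 0; p2 <= 0; p3 <= 0;
        end else if (en) begin
            // Shift in the input bit associated with each destination state
            p0 <= {src0[L-2:0], 1'b0};
            p1 <= {src1[L-2:0], 1'b0};
            p2 <= {src2[L-2:0], 1'b1};
            p3 <= {src3[L-2:0], 1'b1};
        end
    end

    assign path0 = p0;
endmodule
```

All history registers are updated in every step, so no trace back reads are needed, but the register and multiplexer count grows with both the number of states and the history length L.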

APPENDIX

Orthodox Viterbi Decoder

module orthogdi1(a,b,reset,clk,t,b1,b2,b3,c_in,q_d1,q_d2,q_d3,q_d4,d,
                 vd_out,sel,wr_enable);
input a,b,reset,t,b1,b2,b3,c_in,clk,d,sel,wr_enable;
output q_d1,q_d2,q_d3,q_d4,vd_out;
wire q3,q2,q1,q11,q22,q33,s1,s2,s3,s11,s22,s33,c_out,c_out1,
     f0,f1,f2,f3,d11,d12,sm;

// Branch metric units and carry look ahead adders for the two competing paths
bmu out11(a,b,q3,q2,q1,reset,t);
callcla1 out12(s1,s2,s3,c_out,q1,q2,q3,b1,b2,b3,c_in);
bmu value3(a,b,q33,q22,q11,reset,t);
callcla1 value4(s11,s22,s33,c_out1,q11,q22,q33,b1,b2,b3,c_in);

// Compare the two path metrics and select the survivor
comparator4bit1 value5(s1,s2,s3,c_out,s11,s22,s33,c_out1,sm);
mux2_call value10(s1,s2,s3,s11,s22,s33,sm,c_out,c_out1,f0,f1,f2,f3);

// Output register bank: with no memory stage in this variant, the selected
// bits f0..f3 are registered directly (sel and wr_enable are unused here)
call_d_ff value11(f0,f1,f2,f3,clk,reset,q_d1,q_d2,q_d3,q_d4);

// Decoded output path, delayed by two clock cycles
mux1 value12(sm, d, b, d11);
dff_sync_reset value13(d11,clk,reset,d12);
dff_sync_reset value14(d12,clk,reset,vd_out);
endmodule

Branch Metric Unit

module bmu(a,b,q3,q2,q1,reset,t);
input a,b,reset,t;
output q3,q2,q1;
wire qbar1,y,q2,q3,qbar2,qbar3;

xorgate out22(a,b,y);

tff_async_reset out1(t,y,reset,q1,qbar1);
tff_async_reset out2(t,qbar1,reset,q2,qbar2);
tff_async_reset out3(t,qbar2,reset,q3,qbar3);
endmodule

Async T-Flip Flop

module tff_async_reset (
t,      // Toggle enable input
y,      // Clock input
reset,  // Active-low asynchronous reset input
q,      // Q output
qbar    // Qbar output
);

input t,y,reset;
output q,qbar;
reg q;

always @ (posedge y or negedge reset)
begin
    if (~reset)
        q <= 1'b0;
    else if (t)      // toggle on the clock edge only when t is high
        q <= ~q;
end
assign qbar = ~q;

endmodule

Exor Gate

module xorgate(a, b, y);


input a;
input b;
output y;
assign y = a ^ b;

endmodule

Carry Look Ahead Adder

module callcla1(s1,s2,s3, c_out, q1,q2,q3,b1,b2,b3, c_in);


output s1,s2,s3;
output c_out;
input q1,q2,q3,b1,b2,b3;
input c_in;
wire p1, g1, p2, g2, p3, g3;
wire c3, c2, c1;

//compute the p for each stage


assign p1 = q1 ^ b1,
p2 = q2 ^ b2,
p3 = q3 ^ b3;

//compute the g for each stage


assign g1 = q1 & b1,
g2 = q2 & b2,
g3 = q3 & b3;

//compute the carry for each stage


// Note that c_in is equivalent to c0 in the arithmetic equation for
// carry lookahead computation

assign c1 = g1 | (p1 & c_in),


c2 = g2 | (p2 & g1) | (p2 & p1 & c_in),
c3 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & c_in);

//compute sum
assign s1 = p1 ^ c_in,
s2 = p2 ^ c1,
s3 = p3 ^ c2;
//Assign carry output
assign c_out = c3;

endmodule

Comparator

module comparator4bit1(s1,s2,s3,c_out,s11,s22,s33,c_out1,sm);
input s1,s2,s3,c_out,s11,s22,s33,c_out1;
output sm;
wire [3:0] a;
wire [3:0] b;

// Pack both 4-bit sums with the same bit order:
// the carry out is the MSB and s1/s11 is the LSB
assign a = {c_out, s3, s2, s1};
assign b = {c_out1, s33, s22, s11};

// sm = 0 when a is the smaller metric, sm = 1 otherwise
assign sm = (a < b) ? 1'b0 : 1'b1;

endmodule

Multiplexer

module
mux2_call(s1,s2,s3,s11,s22,s33,sm,c_out,c_out1,f0,f1,f2,f3);
input s1,s2,s3,s11,s22,s33,sm,c_out,c_out1;
output f0,f1,f2,f3;

mux1 value6( sm, s1,s11, f0 );


mux1 value7( sm, s2,s22, f1 );
mux1 value8( sm, s3,s33, f2);
mux1 value9( sm, c_out,c_out1, f3);

endmodule
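
The listings instantiate mux1, but this appendix does not define it. A minimal sketch is given here, assuming the port order (select, first input, second input, output) implied by the instantiations and assuming that a select value of 0 passes the first input, which matches the comparator driving sm = 0 when the first metric is the smaller one. Both assumptions are made for illustration and are not taken from the original design files.

```verilog
// Hypothetical 2-to-1 multiplexer; port order and select polarity are assumptions
module mux1(sel, a, b, y);
    input  sel;   // select line (sm in the decoder)
    input  a;     // passed to y when sel = 0
    input  b;     // passed to y when sel = 1
    output y;

    assign y = sel ? b : a;
endmodule

// Small self-checking test bench for the sketch above
module mux1_tb;
    reg  sel, a, b;
    wire y;
    integer i, errors;

    mux1 dut(sel, a, b, y);

    initial begin
        errors = 0;
        // Walk through all eight input combinations
        for (i = 0; i < 8; i = i + 1) begin
            {sel, a, b} = i[2:0];
            #1;
            if (y !== (sel ? b : a)) errors = errors + 1;
        end
        $display("mux1 mismatches: %0d", errors);
        $finish;
    end
endmodule
```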

Survivor Memory Unit

module
call_d_ff(f01,f02,f03,f04,clk,reset,q_d1,q_d2,q_d3,q_d4);
input f01,f02,f03,f04,clk,reset;
output q_d1,q_d2,q_d3,q_d4;

dff_sync_reset out111(f01,clk,reset,q_d1);
dff_sync_reset out112(f02,clk,reset,q_d2);
dff_sync_reset out113(f03,clk,reset,q_d3);
dff_sync_reset out114(f04,clk,reset,q_d4);

endmodule

D-Flip Flop

module dff_sync_reset (
data , // Data Input
clk , // Clock Input
reset , // Reset input
q // Q output
);
input data, clk, reset ;
output q;
reg q;

always @ ( posedge clk)


if (~reset) begin
q <= 1'b0;
end
else
begin
q <= data;
end

endmodule

Viterbi Decoder with DRAM

module ram11(
f0,f1,f2,f3,wr_enable,sel,reset,f01,f02,f03,f04);
input f0,f1,f2,f3,wr_enable,sel,reset;
output f01,f02,f03,f04;
//wire data_out1,data_out2,data_out3;

tristatebuffer3 final333(f01,f0,wr_enable,sel,reset);
tristatebuffer3 final334(f02,f1,wr_enable,sel,reset);
tristatebuffer3 final335(f03,f2,wr_enable,sel,reset);
tristatebuffer3 final336(f04,f3,wr_enable,sel,reset);

//orgate final337(data_out,data_out1,data_out2,data_out3);

endmodule

// ----------------------------------------
module tristatebuffer3(data_out,data_in,wr_enable,sel,reset1);
input data_in, wr_enable, sel, reset1;
output data_out;
wire g, q;

// g acts as the write clock of the storage flip-flop
and_f out1(g, wr_enable, sel);
dff_sync_reset out2(data_in, g, reset1, q);
// The stored bit is placed on data_out under control of sel
tristate out3(q, sel, data_out);
endmodule
// ----------------------------------------
module and_f( y,a,b);
input a;
input b;
output y;
assign y = ~a & ~b;
endmodule

Viterbi Decoder with SRAM

//sram

module debounce(
input clk, // this is 10Mhz
input rst,
input btn_in,
output reg btn_out
);

reg [19:0] count;

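always@(posedge clk)
begin
    if(rst)
    begin
        count <= 0;
        btn_out <= 0;
    end
    else if(btn_in == 1)
    begin
        count <= count + 1'b1;
        if(count == 20'd1000000)
        begin
            // Button held for 1,000,000 cycles: emit a one-cycle pulse
            btn_out <= 1'b1;
            count <= 0;
        end
        else
            btn_out <= 1'b0;
    end
    else
    begin
        // Button released: restart the count and clear the pulse
        count <= 0;
        btn_out <= 1'b0;
    end
end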

endmodule

module seven_seg_decoder(
input [3:0] data_in,
output [7:0] data_out
);

assign data_out = (data_in == 4'b0000) ? 8'b00000011 :


(data_in == 4'b0001) ? 8'b10011111 :
(data_in == 4'b0010) ? 8'b00100101 :
(data_in == 4'b0011) ? 8'b00001101 :
(data_in == 4'b0100) ? 8'b10011001 :
(data_in == 4'b0101) ? 8'b01001001 :
(data_in == 4'b0110) ? 8'b01000001 :
(data_in == 4'b0111) ? 8'b00011111 :
(data_in == 4'b1000) ? 8'b00000001 :
(data_in == 4'b1001) ? 8'b00011001 :
(data_in == 4'b1010) ? 8'b00010001 :
(data_in == 4'b1011) ? 8'b11000001 :
(data_in == 4'b1100) ? 8'b01100011 :
(data_in == 4'b1101) ? 8'b10000101 :
(data_in == 4'b1110) ? 8'b01100001 :
8'b01110001;
endmodule
module seven_seg_top(
input clk,
input rst,
input [15:0] data_in,
output [7:0] seg,
output reg [3:0] ctl
);

reg [12:0] count;


reg clk_low;
reg [1:0] choose;
reg [3:0] din;

seven_seg_decoder seg1 (
.data_in(din),
.data_out(seg)
);

//-------generate a low frequency clock


always@(posedge clk)
begin
    if(rst)
    begin
        count <= 13'b0;
        clk_low <= 1'b0;    // clk_low is a single bit
    end
    else
    begin
        count <= count + 1'b1;
        if(count == 13'd5000)
        begin
            count <= 13'b0;
            clk_low <= ~clk_low;
        end
    end
end

//------choose which seven seg is on-----

always@(posedge clk_low)
begin
if(rst)
choose <= 2'b0;
else
begin
choose <= choose + 1'b1;
end
end

always@(choose,data_in)
begin
case(choose)
2'b00 : begin ctl = 4'b0111; din = data_in[15:12]; end
2'b01 : begin ctl = 4'b1011; din = data_in[11:8]; end
2'b10 : begin ctl = 4'b1101; din = data_in[7:4]; end
2'b11 : begin ctl = 4'b1110; din = data_in[3:0]; end
endcase
end
//-----------

endmodule
module sram_control(
input [3:0] data_in,
input [3:0] address_in,
input write,
input read,
inout [3:0] sram_data,
input clk, //this is 10MHz
input rst,
output [3:0] data_out,
output [3:0] address_out,
output ce,
output reg we,
output oe,
output sram_clk,
output adv,
output cre,
output lb,
output ub,
output [1:0] state
);

parameter [1:0] s0 = 2'b0, s1 = 2'b01, s2 = 2'b10;


reg [1:0] current_state, next_state;
reg [3:0] address = 0;
reg [3:0] data [0:15]; // one entry for each value of the 4-bit address

assign sram_clk = 1'b0;


assign adv = 1'b0;
assign ce = 1'b0;
assign oe = 1'b0;

assign cre = 1'b0;
assign lb = 1'b0;
assign ub = 1'b0;
assign address_out = address;
assign data_out = data[address];
assign state = current_state;
assign sram_data = (we == 1'b0) ? data[address] : 4'bz;

//----------state machine for the control part-----------

always@(posedge clk)
begin
if(rst)
current_state <= s0;
else
current_state <= next_state;
end

always@(current_state,write,read)
begin
case(current_state)
s0 : begin
we = 1'b0;
if(write == 1)
next_state = s1;
else if(read == 1)
next_state = s2;
else
next_state = s0;
end
s1 : begin
we = 1'b0;
next_state = s0;
end
s2 : begin
we = 1'b1;
next_state = s0;
end
default : begin next_state = s0; we = 1'b0; end
endcase
end
//-----this is for the address part------
// if write , address = address + 1
// if read, address = sw

always@(posedge clk)
begin
if(current_state == s1)
address <= address + 1'b1;
else if(current_state == s2)
address <= address_in;
else
address <= address;
end

//-----this is for the data part----


// if write, data = sw
// if read, data = sram_data

always@(posedge clk)
begin
if(current_state == s1)
data[address] <= data_in;
else if(current_state == s2)
data[address] <= sram_data;
else
data[address] <= data[address];
end

endmodule

module sram_top(
input clk,
input rst,
input [3:0] sw,
input write_in,
input read_in,
inout [3:0] sram_data, // bidirectional, driven by sram_control during writes
output ce,
output we,
output oe,
output sram_clk,
output adv,
output cre,
output lb,
output ub,
output [3:0] address_out,
output [7:0] seg,
output [3:0] ctl,
output [1:0] state,
output w_l,
output r_l
);

reg [3:0] data_in;


reg [3:0] address_in;
wire [3:0] data_out;
wire clk_100;
wire clk_10;
wire [15:0] in_seg;
wire write,read;

assign in_seg = {address_in,data_in,address_out,data_out};


assign w_l = write;
assign r_l = read;

dcm top1
(// Clock in ports
.CLK_IN1(clk), // IN
// Clock out ports
.CLK_OUT1(clk_100), // OUT
.CLK_OUT2(clk_10), // OUT
// Status and control signals
.RESET(rst));

sram_control top2 (
.data_in(data_in),
.address_in(address_in),
.write(write),
.read(read),
.sram_data(sram_data),
.clk(clk_10),
.rst(rst),
.data_out(data_out),
.address_out(address_out),
.ce(ce),
.we(we),
.oe(oe),
.sram_clk(sram_clk),
.adv(adv),
.cre(cre),
.lb(lb),
.ub(ub),
.state(state)
);

seven_seg_top top3 (
.clk(clk_100),
.rst(rst),
.data_in(in_seg),
.seg(seg),
.ctl(ctl)
);

debounce top4 (
.clk(clk_10),
.rst(rst),
.btn_in(write_in),
.btn_out(write)
);

debounce top5 (
.clk(clk_10),
.rst(rst),
.btn_in(read_in),
.btn_out(read)
);

always@(posedge clk_100)
begin
if(write == 1)
begin
data_in <= sw;
end
else if(read == 1)
begin
address_in <= sw;
end
end

endmodule

