DESIGN OF HIGH SPEED AND POWER EFFICIENT VITERBI DECODER USING VHDL
A thesis submitted in partial fulfillment of the requirements for the award of the degree of
B.Tech
in
Electronics and Communication Engineering
BY
This is to certify that the project titled “DESIGN OF HIGH SPEED AND POWER
EFFICIENT VITERBI DECODER USING VHDL” is a bonafide record of the work
done by:
In partial fulfillment of the requirements for the award of the degree of Bachelor
of Technology in Electronics and Communication Engineering of the
NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI,
during the year 2011-2015.
ABSTRACT
In the modern era of electronics and communication, encoding and decoding data using VLSI
technology must meet low-power, small-area and high-speed constraints. This Viterbi decoder,
which uses a survivor path unit with the parameters required in the electronics and
communication field, is an attempt to reduce power and cost while increasing speed compared
with a conventional decoder.
This project has three objectives. First, an orthodox Viterbi decoder is designed and
simulated. For faster processing, a Gate Diffusion Input Logic (GDIL) based Viterbi decoder
is then designed, simulated and synthesized successfully using Xilinx ISE. The proposed GDIL
Viterbi decoder shows a much smaller path delay and low power in simulation.
Second, the GDIL Viterbi decoder is compared with our proposed technique, which comprises a
Survivor Path Unit (SPU) that implements a trace-back method with DRAM and SRAM. In this
approach, DRAM and SRAM store the path information in a way that allows fast read access
without requiring physical partitioning of the DRAM and SRAM. This leads to a significant
gain in speed with low power.
Third, all the Viterbi decoders are compared, simulated and synthesized, and the proposed
approach gives the best simulation and synthesis results for low-power, high-speed VLSI
applications. The ACS and TB units of the decoders and their sub-circuits are operated in a
deeply pipelined manner to achieve a high transmission rate. Although a register-exchange
based survivor unit has better throughput than a trace-back unit, introducing a RAM cell
between the ACS array and the output register bank yielded a significant reduction in path
delay and an improvement in speed. All the Viterbi decoders were designed using Xilinx ISE 12.4.
Keywords: Branch Metric Unit, Add-Compare-Select Unit, Survivor Memory Unit, Memory
Unit, Viterbi Decoder.
ACKNOWLEDGEMENTS
We wish to record our sincere thanks to our project guide, Dr. R.K. Jeyachitra,
Assistant Professor, Department of Electronics and Communication Engineering,
NIT-Trichy, who constantly motivated us during the entire course of the project and
made a tremendous sacrifice in giving us her valuable time to complete this project work.
We take this opportunity to thank Dr. D. Sriram Kumar, Head of the Department,
Electronics and Communication Engineering, for allowing us to undertake this
project.
We are also thankful to all the teaching and non-teaching staff of this department.
TABLE OF CONTENTS
BONAFIDE CERTIFICATE
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABBREVIATIONS
CHAPTER 1 INTRODUCTION
5.4 Conclusion
6.4 Graphical Analysis
7.1 Conclusion
APPENDIX
REFERENCES
ABBREVIATIONS
BM Branch Metric
SM State Metric
PM Path Metric
TB Trace Back
Chapter 1
INTRODUCTION
The Viterbi decoding algorithm, proposed by A. J. Viterbi, is a decoding process for
convolutional codes in memoryless noise. The algorithm can be applied to a host of problems
encountered in the design of communication systems. Given a sequence of symbols, the Viterbi
algorithm finds the most likely state transition sequence in a state diagram; that is, given a
sequence of finite-state signals corrupted by noise, it finds the most likely noiseless
finite-state sequence.
Generally, a Viterbi decoder consists of three basic computation units: the Branch Metric
Unit (BMU), the Add-Compare-Select Unit (ACSU) and the Trace-Back Unit (TBU). The BMU
calculates the branch metrics using the Hamming distance or the Euclidean distance. The ACSU
adds each branch metric from the BMU to the previous state metric; these sums are called the
path metrics. The value of each state is then updated and the survivor path is chosen by
comparing the path metrics.
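As a concrete illustration of the BMU computation, the following Verilog sketch computes a hard-decision branch metric as the Hamming distance between a received 2-bit symbol and the symbol expected on one trellis branch of the rate-1/2 code. The module name `bm_hamming` and its ports are hypothetical and are not part of the design in the appendix.

```verilog
// Hard-decision branch metric (sketch): Hamming distance between the
// received 2-bit symbol and the 2-bit symbol expected on a trellis branch.
// Module and port names are illustrative assumptions.
module bm_hamming (
    input  wire [1:0] rx,        // received 2-bit code symbol
    input  wire [1:0] expected,  // symbol the encoder emits on this branch
    output wire [1:0] bm         // branch metric: 0, 1 or 2
);
    wire [1:0] diff = rx ^ expected;                 // 1 marks a mismatching bit
    assign bm = {1'b0, diff[1]} + {1'b0, diff[0]};   // count the mismatches
endmodule
```

For example, rx = 2'b01 compared with expected = 2'b10 gives bm = 2, the largest metric a single symbol can contribute.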
The convolutional encoder has rate k/n = 1/2 and constraint length K = 3. In the encoder,
the 3-bit shift register provides the memory and two modulo-2 adders perform the convolution.
For each bit in the input sequence, two bits are output, one from each of the two modulo-2
adders.
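The encoder described above can be sketched in Verilog as follows. The generator polynomials are not stated in this report, so the sketch assumes the common pair g0 = 111 and g1 = 101 (octal 7 and 5); the module name `conv_enc_k3` and its synchronous active-high reset are also assumptions.

```verilog
// Rate-1/2, constraint length K = 3 convolutional encoder (sketch).
// ASSUMPTION: generators g0 = 111 and g1 = 101 (octal 7, 5); the report
// does not state which generator polynomials it uses.
module conv_enc_k3 (
    input  wire       clk,
    input  wire       rst,   // synchronous, active-high reset (assumed)
    input  wire       u,     // current input bit
    output wire [1:0] c      // c[1] from g0, c[0] from g1
);
    reg [1:0] mem;           // mem[1] = previous bit, mem[0] = bit before that

    // The 3-bit window {u, mem[1], mem[0]} feeds two modulo-2 adders.
    assign c[1] = u ^ mem[1] ^ mem[0];   // g0 = 111
    assign c[0] = u ^ mem[0];            // g1 = 101

    always @(posedge clk) begin
        if (rst) mem <= 2'b00;
        else     mem <= {u, mem[1]};     // shift the new bit in
    end
endmodule
```

With these assumed generators and the register cleared, the input bits 1, 0, 1, 1 produce the coded pairs 11, 10, 00, 01.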
The decoding procedure compares the received sequence with all the possible sequences that
the encoder could produce and selects the sequence closest to the received sequence. Two
paths always merge at each node; the path with the smaller Hamming distance is selected and
the other is terminated. The retained paths are known as survivor paths, and the final path
selected is the continuous path through the trellis with the minimum aggregate Hamming
distance.
1.2 Limitation of Existing Viterbi Decoder
The existing Viterbi decoder was implemented using conventional CMOS and pseudo-NMOS
techniques, which lead to high switching activity and thereby more power consumption. New
techniques are needed to implement the same Viterbi decoder functionality more efficiently.
In this project we develop a Viterbi decoder using the gate diffusion input logic technique
and then optimize it for speed and power consumption. Speed can be increased by using a
memory to store the path information, so that the stored path information can be used in the
next fetch. We also look into optimizing speed by using different kinds of memory.
1.4 Conclusion
We discussed the Viterbi decoder and its concepts. We also saw the limitations of the existing
Viterbi decoder and how gate diffusion input logic concepts can make the existing model better
in terms of power and speed.
Chapter 2
LITERATURE REVIEW
Y. Zhu and M. Benaissa (2005) presented a novel ACS scheme that enables high speeds to be
achieved in Viterbi decoders without compromising area and power efficiency, by introducing
multilevel pipelining into the ACS feedback loop. Arkadiy Morgenshtein et al. (2007) used
Gate Diffusion Input circuits for asynchronous design and compared the designs with CMOS
asynchronous designs. Dalia A. El-Dib and Mohamed I. Elmasry (2008) discussed the
implementation of a Viterbi decoder based on a modified register-exchange (RE) method. Song
Li and Qing-Ming Yi (2009) proposed a Verilog-based scheme for implementing a high-speed,
low-power bidirectional Viterbi decoder; decoding was performed in both the positive and
negative directions, so the delay was half that of a unidirectional decoder and the decoding
speed was greatly improved. Yun-Nan Chang and Yu Chung Ding (2009) presented a low-power
Viterbi decoder design based on a novel survivor path trace mechanism. Lupin Chen et al.
(2010) presented a low-power trace-back (TB) scheme for high-constraint-length Viterbi
decoders. Xuan-zhong Li et al. (2011) discussed a high-speed Viterbi decoder based on a
parallel radix-4 architecture and a bit-level carry-save algorithm. Seongjoo Lee (2012)
presented an efficient implementation method for parallel-processing Viterbi decoders in UWB
systems.
Viterbi decoders are widely used in digital communication systems. CMOS, pseudo-NMOS and
dynamic logic styles have been used to design circuits at the ACS level, but the switching
activity in these logic styles is high, which leads to high power dissipation.
Fig.2.1: RTL view of Orthodox Viterbi Decoder
Fig.2.1 shows the RTL design of the old, orthodox Viterbi decoder, which is simulated. The
back-end coding is done in VHDL and the circuit is synthesized using Xilinx ISE. Synthesis
produces the delay and power analysis reports for this decoder, which clearly show the high
power dissipation that is undesirable in any FPGA-based VLSI circuit design.
In the existing work, DRAM was used to store the path information, and the DRAM must be
refreshed repeatedly because it loses its data after a certain period of time. DRAM consumes
more power than SRAM, and SRAM is faster than DRAM.
2.3 CONCLUSION
In this chapter we presented the literature review of our project, which traced the
advancement of the Viterbi decoder over time. We also outlined how we will proceed towards
our objective.
Chapter 3
DESIGN OF ORTHODOX VITERBI DECODER
3.1 Introduction
GDIL is a low-power digital circuit design technique that reduces the power consumption,
delay and area of digital circuits. The basic GDIL cell is similar to the standard CMOS
inverter, with two differences: (1) the GDIL cell has three inputs, G, P and N; (2) the bulks
of the NMOS and PMOS transistors are connected to N and P respectively, so they can be
arbitrarily biased, in contrast with a CMOS inverter. The GDIL cell has four terminals: G (the
common gate input of the NMOS and PMOS transistors), P (the outer diffusion node of the PMOS
transistor), N (the outer diffusion node of the NMOS transistor) and D (the common diffusion
node of both transistors). The GDIL approach allows a wide range of complex logic functions
to be implemented using only two transistors. The method is therefore suitable for fast,
low-power circuits with a reduced transistor count (compared with CMOS and existing Pass
Transistor Logic techniques), while improving logic-level swing and static power
characteristics and allowing simple top-down design with a small cell library.
A simple change in the input configuration of the basic GDI cell yields very different
Boolean functions. Most of these functions are complex (6 to 12 transistors) in CMOS and in
standard PTL implementations, but very simple (only two transistors per function) in the GDIL
design method. GDIL also reduces area, as fewer LUTs and CLBs are used in FPGA prototyping.
The GDIL Viterbi decoder consists of three blocks: the BMU, the ACSU and the SMU. All of these
blocks are designed using the GDIL technique, then simulated and synthesized using Xilinx ISE.
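The function-by-configuration idea can be illustrated with a behavioural Verilog model. The sketch assumes the usual GDI convention that G = 0 turns on the PMOS (so D follows P) and G = 1 turns on the NMOS (so D follows N). It models only the logic function, not transistor-level effects such as the reduced output swing, and the module names are hypothetical.

```verilog
// Behavioural model of one GDI cell (sketch): G = 0 -> PMOS on, D follows P;
// G = 1 -> NMOS on, D follows N. Transistor-level effects are not modelled.
module gdi_cell (
    input  wire g,   // common gate input
    input  wire p,   // outer diffusion node of the PMOS
    input  wire n,   // outer diffusion node of the NMOS
    output wire d    // common diffusion node (output)
);
    assign d = g ? n : p;
endmodule

// The same cell gives different functions depending on what drives P and N.
module gdi_functions (
    input  wire a, b, c,
    output wire f1,     // A'B      : N = 0, P = B
    output wire f2,     // A' + B   : N = B, P = 1
    output wire f_or,   // A + B    : N = 1, P = B
    output wire f_and,  // AB       : N = B, P = 0
    output wire f_mux   // A'B + AC : N = C, P = B
);
    gdi_cell u_f1  (.g(a), .p(b),    .n(1'b0), .d(f1));
    gdi_cell u_f2  (.g(a), .p(1'b1), .n(b),    .d(f2));
    gdi_cell u_or  (.g(a), .p(b),    .n(1'b1), .d(f_or));
    gdi_cell u_and (.g(a), .p(1'b0), .n(b),    .d(f_and));
    gdi_cell u_mux (.g(a), .p(b),    .n(c),    .d(f_mux));
endmodule
```

Only the connections to P and N change between instances, which is the property that lets each function use a single two-transistor cell.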
Fig.3.1: RTL design of Branch Metric Unit
Initially the counter output is "000" when preset = '1' and clear = '1'. When the first clock
is applied, the counter starts counting clock cycles and its output becomes "001". On the next
clock cycle the output of the counter is "010". The counter continues in this way for every
clock cycle and wraps around to "000" after reaching "111". The architecture is explained here
for a constraint length of K = 3.
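A synchronous version of this counter can be sketched as below. The sketch assumes an active-high synchronous clear and lets the count advance only on clock edges where the expected bit `a` and the received bit `b` differ, so the count accumulates a Hamming distance. This is a swapped-in synchronous alternative, not the appendix BMU, which clocks toggle flip-flops directly from the XOR output; the module name `bmu_counter` is hypothetical.

```verilog
// Synchronous 3-bit mismatch counter (sketch of a BMU-style counter).
// ASSUMPTIONS: active-high synchronous clear; the count advances only when
// the expected bit a and the received bit b differ.
module bmu_counter (
    input  wire       clk,
    input  wire       clr,   // synchronous clear to "000"
    input  wire       a,     // expected bit
    input  wire       b,     // received bit
    output reg  [2:0] q      // running count of mismatches, modulo 8
);
    always @(posedge clk) begin
        if (clr)
            q <= 3'b000;
        else if (a ^ b)
            q <= q + 3'd1;   // 3'b111 + 1 wraps around to 3'b000
    end
endmodule
```

After nine mismatching clock edges the count reads "001", because the counter wraps from "111" back to "000" exactly as described above.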
Flow Chart of Add-Compare-Select Unit
The state metric SM(i,j) and the branch metric BM(i,j) are the two inputs to the adder unit.
Each butterfly wing is usually implemented by a module called an ACS module. The two adders
compute the partial path metric of each branch, the comparator compares the two partial
metrics, and the selector selects the appropriate branch. The new partial path metric updates
the state metric of state p, and the survivor path recording block records the survivor path.
The adder unit proposed in this design consists of two full adders and one half adder.
The output of the BMU is added to the previous path metric, and the result is the new path
metric for the next branch. The input sequences a0 (1100), b0 (1001), a1 (1011), b1 (1101),
a2 (0111) and b2 (0011) are applied to the adder unit. The first bits of a0 and b0, i.e. 11,
are the inputs of the half adder, which produces sum1 = '0' and carry = '1'. The half-adder
carry is passed to the next full adder, whose 3-bit input is a1, b1 and carry, i.e. 111; it
produces sum2 = '1' and carry = '1'.
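The adder structure described above (a half adder for the least significant bit followed by two full adders) can be written structurally in Verilog as in this sketch. The module names are hypothetical; the appendix itself instantiates an adder module named callcla1.

```verilog
// Three-bit ripple-carry adder (sketch): half adder for bit 0, full adders
// for bits 1 and 2. Module names are illustrative assumptions.
module half_adder (input wire a, b, output wire s, c);
    assign s = a ^ b;
    assign c = a & b;
endmodule

module full_adder (input wire a, b, cin, output wire s, cout);
    assign s    = a ^ b ^ cin;
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

module add3 (
    input  wire [2:0] a,     // {a2, a1, a0}, e.g. the branch metric
    input  wire [2:0] b,     // {b2, b1, b0}, e.g. the previous path metric
    output wire [2:0] sum,   // sum[0] is the half-adder sum (sum1 in the text)
    output wire       cout   // final carry
);
    wire c0, c1;             // carries between the bit slices
    half_adder ha0 (.a(a[0]), .b(b[0]),           .s(sum[0]), .c(c0));
    full_adder fa1 (.a(a[1]), .b(b[1]), .cin(c0), .s(sum[1]), .cout(c1));
    full_adder fa2 (.a(a[2]), .b(b[2]), .cin(c1), .s(sum[2]), .cout(cout));
endmodule
```

Applying the first sample of the sequences in the text, a = {a2, a1, a0} = 3'b011 and b = {b2, b1, b0} = 3'b011, gives sum = 3'b110 and cout = 0, which matches the bit-by-bit results described above.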
This carry is passed to the last full adder, whose 3-bit input is a2, b2 and carry, i.e. 001;
it produces sum = '1' and carry = '0'. All other input bits are processed similarly.
The inputs of the 4-bit comparator, a0 to a3 and b0 to b3, are 0011 and 1011. When the two
comparator inputs are A = 0011 and B = 1011, the output line A<B goes high (less-than state).
When A = 0111 and B = 0100, the output line A>B goes high (greater-than state). All other
input values are handled by the comparator in the same way. The A<B output is used because
the decoder keeps the path with the smaller metric, so this result drives the selection of
the surviving path.
The output of the comparator is used as the select signal for the multiplexers, which pass on
the minimum path metric for the decoded message bit in the Viterbi decoder. The selector unit
consists of four 2x1 multiplexers that share a single select line derived from the comparator's
A<B result; as in the appendix comparator, this line (sm) is driven low when A < B, so the
selector passes the minimum path metric. For example, the 4-bit inputs are a0=1101, b0=1111,
a1=1101, b1=1000, a2=1111, b2=1110, a3=1001, b3=1110, and the select line sequence is 1010.
When the select line is '1', the output z0 takes the value of b0 (here '1'); when the select
line is '0', z0 takes the value of a0. The outputs z1, z2 and z3 are obtained in the same way.
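The comparator and selector can be combined into a single compare-select sketch. Following the appendix comparator, the select line is 0 when A < B (so A passes) and 1 otherwise, which means ties pass B; the select line doubles as the survivor decision bit. The module name `compare_select` is hypothetical.

```verilog
// Compare-select stage (sketch): a 4-bit magnitude comparator drives four
// 2:1 multiplexers sharing one select line, so the smaller metric passes.
// sel = 0 when A < B (choose A), sel = 1 otherwise (choose B).
module compare_select (
    input  wire [3:0] pm_a,    // candidate path metric A
    input  wire [3:0] pm_b,    // candidate path metric B
    output wire       sel,     // decision bit: 0 -> A survives, 1 -> B survives
    output wire [3:0] pm_min   // surviving (minimum) path metric
);
    assign sel = (pm_a < pm_b) ? 1'b0 : 1'b1;

    // Four 2:1 multiplexers, one per bit of the path metric.
    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : g_mux
            assign pm_min[i] = sel ? pm_b[i] : pm_a[i];
        end
    endgenerate
endmodule
```

For the comparator examples in the text, A = 0011 and B = 1011 give sel = 0 and pm_min = 0011, while A = 0111 and B = 0100 give sel = 1 and pm_min = 0100.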
The survivor memory unit is designed using serial-in serial-out shift registers, and the
length of each shift register depends on the constraint length of the convolutional encoder.
Fig.3.3 shows the 4x4 memory unit that stores the minimum surviving path; for a constraint
length of 3, the clock signal of the SMU is the ACSU output. The 4-bit output of the selector
unit is the input of the SMU. The SMU is designed as a 4x4 shift register built from D
flip-flops: each bit is stored in a D flip-flop, and each of the four shift registers stores
one bit of the selector output.
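A parameterized sketch of this survivor memory arrangement follows: one serial-in serial-out shift register per selector output bit, each DEPTH flip-flops long. The module name, the DEPTH parameter and the active-high synchronous reset are assumptions made for illustration.

```verilog
// Survivor memory (sketch): four serial-in serial-out shift registers, one
// per selector output bit, each DEPTH D flip-flops long (DEPTH >= 2).
// DEPTH = 4 matches the 4x4 arrangement described for K = 3.
module smu_siso #(parameter DEPTH = 4) (
    input  wire       clk,
    input  wire       rst,     // synchronous, active-high reset (assumed)
    input  wire [3:0] din,     // selector output, one bit per shift register
    output wire [3:0] dout     // bit leaving each register after DEPTH clocks
);
    reg [DEPTH-1:0] sr [0:3];  // sr[k] is shift register k
    integer k;

    always @(posedge clk) begin
        for (k = 0; k < 4; k = k + 1) begin
            if (rst) sr[k] <= {DEPTH{1'b0}};
            else     sr[k] <= {sr[k][DEPTH-2:0], din[k]};  // shift in at the LSB
        end
    end

    genvar g;
    generate
        for (g = 0; g < 4; g = g + 1) begin : g_out
            assign dout[g] = sr[g][DEPTH-1];               // oldest stored bit
        end
    endgenerate
endmodule
```

With DEPTH = 4, a value applied to din before a rising clock edge appears on dout after the fourth rising edge.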
Flow Chart of Survivor Memory Unit
Fig.3.3: RTL design of Survivor Memory Unit
In simulation, the D input of the flip-flop is driven by the source PULSE(0 5 0 1n 1n 30n 60n)
and the clock by PULSE(0 5 0 1n 1n 10n 20n), both referenced to GND (0 V to 5 V, 1 ns rise and
fall times, with 30 ns/60 ns and 10 ns/20 ns pulse width/period respectively). The outputs q1,
q2, q3 and q4 shift according to the D input and the clock pulse. The number of memory stages
is 2^(K-1), where K is the constraint length; in this GDIL method it varies from 4 stages
(K = 3) to 128 stages (K = 8).
The Viterbi decoder using the GDIL technique is designed by integrating all the units: BMU,
ACSU and SMU. The proposed GDIL design is shown in Fig.3.4. Two branch metric units are used
because each state can be reached by two possible transitions. The BMU calculates the branch
metric between the expected sequence (the original random input, 'a') and the received
sequence (with introduced errors, 'b'). The adder unit then adds the branch metric to the
previous path metric, the comparator compares the two paths, and the selector selects the
path with the smaller metric. The survivor memory unit stores the path metric value and its
corresponding states using the 2x1 multiplexer and a 2-bit shift register to produce the
decoded output.
Fig.3.4: RTL design of GDIL Viterbi Decoder
The GDIL Viterbi decoder consists of three major blocks: the BMU, the SPU and the ACSU, each
discussed in detail earlier in this chapter. The simulation of the orthodox Viterbi decoder is
done using Xilinx ISE 12.4, and each block is described in Verilog HDL. The individual blocks
are verified and tested, and finally the complete Viterbi decoder is synthesized, placed and
routed successfully. Table 2 gives the detailed timing report of the GDIL Viterbi decoder.
Timing Report of GDIL Viterbi
3.4 Conclusion
As we have seen the details of timing report of orthodox Viterbi, the table clearly shows
the propagation delay, maximum clock frequency 64.516MHz, minimum clock period
21.400ns, clock to set-up cycle time 21.400ns, set- up to clock pad delay 87.663ns, clock pad
to output pad delay 16.635 ns. And also more number of gate level of integration is done to
design the old Viterbi Decoder circuit. This is taken up as limitations of orthodox Viterbi
which is further compared with the GDIL based Viterbi in order to improve the overall
performance of the circuit.
Chapter 4
PROPOSED VITERBI DECODER USING DRAM
4.1 Introduction
The normal GDIL Viterbi decoder is also compared with our proposed technique, which comprises
an SPU that implements a trace-back method with DRAM. In this approach, the DRAM stores the
path information in a way that allows fast read access without requiring physical
partitioning of the DRAM. This leads to a significant gain in speed with low power.
Fig.4.1: RTL design of Proposed Viterbi Decoder using DRAM
At each time step, the chosen transitions are stored in a column. One state is chosen as the
starting point and then trace-back begins. In the proposed system, the TBU performs repeated
reads from the RAM cell. Each read accesses the column that precedes the column last accessed,
in the same clock cycle. This places very demanding requirements on the GDIL Viterbi decoder
using DRAM for high-speed applications, although it requires a larger silicon chip area.
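The trace-back step can be sketched for the 4-state (K = 3) trellis. The sketch assumes a state encoding of {u(t), u(t-1)} and a stored decision bit per state equal to the oldest bit of the surviving predecessor; neither convention is specified in this report. For readability it unrolls the whole trace-back in one combinational loop, whereas the TBU described above obtains each column through a separate read from the RAM cell. The module name `traceback_k3` is hypothetical.

```verilog
// Trace-back sketch for the 4-state (K = 3) trellis.
// ASSUMPTIONS: state = {u(t), u(t-1)}; the decision bit stored for a state
// is the oldest bit u(t-2) of its surviving predecessor. dec packs L
// columns of 4 decision bits, column t at dec[4*t +: 4].
module traceback_k3 #(parameter L = 8) (
    input  wire [4*L-1:0] dec,          // decision columns, oldest at t = 0
    input  wire [1:0]     start_state,  // e.g. state with the best path metric
    output reg  [L-1:0]   bits          // decoded bit for each time step t
);
    reg [1:0] s;
    integer t;

    always @* begin
        s = start_state;
        for (t = L - 1; t >= 0; t = t - 1) begin
            bits[t] = s[1];               // input bit that led into state s
            s = {s[0], dec[4*t + s]};     // move to the surviving predecessor
        end
    end
endmodule
```

For example, if the decisions recorded for the input sequence 1, 0, 1, 1, 0, 0, 1, 0 are packed as dec = 32'h00120400 (decision bits on the true path, zeros elsewhere) and trace-back starts from state 2'b01, the module returns bits = 8'b01001101, which is the original input read from bits[0] up to bits[7].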
In this project, all three constraints (power, area and speed) have been taken into account.
The detailed timing and power reports describe the delay analysis for high-speed applications
and the power analysis for low-power operation of the proposed Virtex-based Viterbi decoder.
Fig.4.3: Timing Analysis Report of Viterbi Decoder with DRAM
4.4 Conclusion
The design was implemented and the results were recorded. The minimum clock-to-pad delay and
the setup-to-clock time at the pad were found to be 15.500 ns and 52.600 ns respectively; the
setup-to-clock value is much lower than the earlier 87.663 ns. This shows a marked improvement
in speed for the target device.
Chapter 5
PROPOSED VITERBI DECODER USING SRAM
5.1 Introduction
The previous chapter showed the implementation of a GDIL-based Viterbi decoder using DRAM.
To further improve speed and power efficiency, SRAM can be used in place of DRAM. SRAM
retains its data without refresh for as long as power is applied, and it is power efficient.
The SRAM used here stores the path information and provides much faster read access.
As shown in Fig.5.1, the proposed SRAM-based GDIL Viterbi decoder is designed in Register
Transfer Level (RTL) style. As in earlier Viterbi decoder designs, the main block is divided
into two sub-units: the ACS array and the SPU. In the proposed system, metric calculation,
addition, weight comparison and survivor path selection all take place in the ACS array. The
ACS array therefore holds the weight value of each state as the survivor path progresses;
these weights are needed for comparison with the weights of other paths to determine the
survivor path. The computation and comparison functions performed within the ACS array
determine the path extensions. Signals indicating these extensions to the survivor paths are
passed from the ACS array to the SPU, which then updates the survivor paths. By introducing
the SRAM cell between the ACS array and the output register bank, a significant reduction in
path delay has been observed.
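The RAM cell between the ACS array and the trace-back logic can be modelled as a simple dual-port memory that accepts one column of decision bits per clock. The sketch below uses a registered-read style that FPGA synthesis tools commonly map to on-chip RAM; the port names, the 4-bit column width (one decision bit per state for K = 3) and the address width are assumptions.

```verilog
// Decision RAM sketch between the ACS array and the trace-back logic.
// One 4-bit column of decision bits is written per clock; the trace-back
// side reads any stored column. Widths and names are illustrative.
module decision_ram #(parameter ADDR_W = 4) (
    input  wire              clk,
    input  wire              we,     // write enable from the ACS array
    input  wire [ADDR_W-1:0] waddr,  // column being written (current step)
    input  wire [3:0]        wdata,  // decision bits for states 0..3
    input  wire [ADDR_W-1:0] raddr,  // column requested by the trace-back unit
    output reg  [3:0]        rdata   // registered read data
);
    reg [3:0] mem [0:(1 << ADDR_W) - 1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        rdata <= mem[raddr];         // old contents if raddr == waddr
    end
endmodule
```

Because the read is registered, data requested on raddr appears on rdata one clock later, and reading the column being written in the same cycle returns its previous contents.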
The TBU operates as follows. At each time step, the chosen transitions are stored in a
column. One state is chosen as the starting point and then trace-back begins. In the proposed
system, the TBU performs repeated reads from the RAM cell. Each read accesses the column that
precedes the column last accessed, in the same clock cycle. This places very demanding
requirements on the GDIL Viterbi decoder using SRAM for high-speed applications.
5.3 Synthesis Results of Proposed Viterbi Decoder
The proposed Viterbi decoder was simulated and its functionality verified. After functional
verification, the RTL model was synthesized using Xilinx ISE 12.4; in synthesis, the RTL model
is converted to a gate-level netlist mapped to a specific technology library. This modified
Viterbi decoder design is implemented on the Virtex 6 FPGA (Field Programmable Gate Array)
family, for which many devices are available in the Xilinx ISE tool. To implement the modified
Viterbi decoder with SRAM, the device "XA9536XL" was chosen, with the package "FG320" and
speed grade "-6". The modified Viterbi decoder for low power and high speed was synthesized
successfully, and its results are analyzed in Fig.5.2.
5.4 Conclusion
The design was implemented and the results were recorded. The minimum clock-to-pad delay and
the setup-to-clock time at the pad were found to be 15.500 ns and 20.800 ns respectively; the
setup-to-clock value is much lower than the earlier 87.663 ns. This shows a marked improvement
in speed for the target device.
Chapter 6
RESULTS
6.1 Introduction
The previous chapters presented the design of the Viterbi decoders and the reports obtained
from simulation.
This chapter gives a comparative analysis of the proposed models and the existing model to
assess the success of this work.
The performance measures of the different Viterbi decoder models are compared and the results
are tabulated. Table 6.1 compares the timing analysis reports of the different models, and
Table 6.2 compares their power analysis reports.
6.4 Graphical Analysis
The graphical analysis is also performed for all three Viterbi Decoders to have a
complete performance summary. The Timing - Delay Analysis and Power Analysis are
shown in fig.6.2 and fig.6.3 respectively.
6.4.1 In terms of Time
6.5 Conclusion
Chapter 7
CONCLUSION AND FUTURE SCOPE
7.1 Conclusion
The ACS units and their sub-circuits in all three types of decoder have been operated in a
deeply pipelined manner to achieve a high transmission rate. Although a register-exchange
based survivor unit has better throughput than a trace-back unit, introducing the RAM cell
between the ACS array and the output register bank gave a significant reduction in path delay.
This Viterbi decoder achieves fast decoding and shows significant results in high-speed
applications.
Among the variety of techniques used to improve the speed and power efficiency of Viterbi
decoders, this project has explored the use of faster memory and the gate diffusion input
logic mechanism. Alternative algorithms can be developed, and the circuit complexity can be
further reduced, to gain more speed and accuracy. The register-exchange method can be used
with memory combinations instead of the trace-back method, but the extra overhead is then the
registers required for storing the carry bits and the path metrics.
APPENDIX
module orthogdi1(a, b, reset, clk, t, b1, b2, b3, c_in, q_d1, q_d2, q_d3, q_d4, d,
                 vd_out, sel, wr_enable);
input a,b,reset,t,b1,b2,b3,c_in,clk,d,sel,wr_enable;
output q_d1,q_d2,q_d3,q_d4,vd_out;
wire q3,q2,q1,q11,q22,q33,s1,s2,s3,s11,s22,s33,c_out,c_out1,
     f0,f1,f2,f3,f01,f02,f03,f04,d11,d12,sm;
// Two branch metric units and their adders (one per incoming path)
bmu out11(a,b,q3,q2,q1,reset,t);
callcla1 out12(s1,s2,s3,c_out,q1,q2,q3,b1,b2,b3,c_in);
bmu value3(a,b,q33,q22,q11,reset,t);
callcla1 value4(s11,s22,s33,c_out1,q11,q22,q33,b1,b2,b3,c_in);
// Compare the two path metrics and select the smaller one
comparator4bit1 value5(s1,s2,s3,c_out,s11,s22,s33,c_out1,sm);
mux2_call value10(s1,s2,s3,s11,s22,s33,sm,c_out,c_out1,f0,f1,f2,f3);
// RAM cell between the selector and the output register bank (drives f01..f04)
ram11 value15(f0,f1,f2,f3,wr_enable,sel,reset,f01,f02,f03,f04);
// Output register bank
call_d_ff value11(f01,f02,f03,f04,clk,reset,q_d1,q_d2,q_d3,q_d4);
// Decoded-bit path: multiplexer followed by two register stages
mux1 value12(sm,d,b,d11);
dff_sync_reset value13(d11,clk,reset,d12);
dff_sync_reset value14(d12,clk,reset,vd_out);
endmodule
Branch Metric Unit
module bmu(a,b,q3,q2,q1,reset,t);
input a,b,reset,t;
output q3,q2,q1;
wire qbar1,y,q2,q3,qbar2,qbar3;
xorgate out22(a,b,y);
tff_async_reset out1(t,y,reset,q1,qbar1);
tff_async_reset out2(t,qbar1,reset,q2,qbar2);
tff_async_reset out3(t,qbar2,reset,q3,qbar3);
endmodule
module tff_async_reset (
t,// Data Input
y, // Clock Input
reset , // Reset input
q , // Q output
qbar // Qbar output
);
input t,y,reset ;
output q,qbar;
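reg q;
// Toggle on each rising edge of y while t = 1; asynchronous, active-low reset
always @(posedge y or negedge reset)
  if (~reset)
    q <= 1'b0;
  else if (t)
    q <= ~q;
assign qbar = ~q;
endmodule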
Exor Gate
module xorgate(a, b, y);
input a, b;
output y;
assign y = a ^ b;
endmodule
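Carry Look-ahead Adder
module callcla1(s1, s2, s3, c_out, a1, a2, a3, b1, b2, b3, c_in);
input a1, a2, a3, b1, b2, b3, c_in;
output s1, s2, s3, c_out;
wire p1, p2, p3, g1, g2, g3, c1, c2, c3;
//compute propagate and generate terms
assign p1 = a1 ^ b1, p2 = a2 ^ b2, p3 = a3 ^ b3;
assign g1 = a1 & b1, g2 = a2 & b2, g3 = a3 & b3;
//compute look-ahead carries
assign c1 = g1 | (p1 & c_in),
       c2 = g2 | (p2 & g1) | (p2 & p1 & c_in),
       c3 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & c_in);
//compute sum
assign s1 = p1 ^ c_in,
       s2 = p2 ^ c1,
       s3 = p3 ^ c2;
//Assign carry output
assign c_out = c3;
endmodule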
Comparator
module comparator4bit1(s1,s2,s3,c_out,s11,s22,s33,c_out1,sm);
input s1,s2,s3,c_out,s11,s22,s33,c_out1;
output sm;
reg sm;
wire [3:0] a;
wire [3:0] b;
// Pack both path metrics with the same bit order: {carry, s3, s2, s1}
assign a = {c_out, s3, s2, s1};
assign b = {c_out1, s33, s22, s11};
// sm = 0 selects path a (smaller metric); sm = 1 selects path b
always @(a, b)
begin
  if (a < b)
    sm = 1'b0;
  else
    sm = 1'b1;
end
endmodule
Multiplexer
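module mux2_call(s1,s2,s3,s11,s22,s33,sm,c_out,c_out1,f0,f1,f2,f3);
input s1,s2,s3,s11,s22,s33,sm,c_out,c_out1;
output f0,f1,f2,f3;
// Four 2:1 multiplexers: sm = 0 passes path a, sm = 1 passes path b
assign f0 = sm ? s11    : s1;
assign f1 = sm ? s22    : s2;
assign f2 = sm ? s33    : s3;
assign f3 = sm ? c_out1 : c_out;
endmodule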
module call_d_ff(f01,f02,f03,f04,clk,reset,q_d1,q_d2,q_d3,q_d4);
input f01,f02,f03,f04,clk,reset;
output q_d1,q_d2,q_d3,q_d4;
dff_sync_reset out111(f01,clk,reset,q_d1);
dff_sync_reset out112(f02,clk,reset,q_d2);
dff_sync_reset out113(f03,clk,reset,q_d3);
dff_sync_reset out114(f04,clk,reset,q_d4);
endmodule
D-Flip Flop
module dff_sync_reset (
data , // Data Input
clk , // Clock Input
reset , // Reset input
q // Q output
);
input data, clk, reset ;
output q;
reg q;
// Synchronous, active-low reset
always @(posedge clk)
  if (~reset)
    q <= 1'b0;
  else
    q <= data;
endmodule
module ram11(
f0,f1,f2,f3,wr_enable,sel,reset,f01,f02,f03,f04);
input f0,f1,f2,f3,wr_enable,sel,reset;
output f01,f02,f03,f04;
//wire data_out1,data_out2,data_out3;
tristatebuffer3 final333(f01,f0,wr_enable,sel,reset);
tristatebuffer3 final334(f02,f1,wr_enable,sel,reset);
tristatebuffer3 final335(f03,f2,wr_enable,sel,reset);
tristatebuffer3 final336(f04,f3,wr_enable,sel,reset);
//orgate final337(data_out,data_out1,data_out2,data_out3);
endmodule
module tristatebuffer3(data_out,data_in,wr_enable,sel,reset1);
input data_in, wr_enable, sel,reset1;
output data_out;
wire g;
Viterbi Decoder with SRAM
//sram
module debounce(
input clk, // this is 10Mhz
input rst,
input btn_in,
output reg btn_out
);
reg [19:0] count;                  // wide enough to reach 20'd1000000
always@(posedge clk)
begin
if(rst)
begin
count <= 0;
btn_out <= 0;
end
else
begin
if(btn_in == 1)
begin
count <= count + 1'b1;
if(count == 20'd1000000)
begin
btn_out <= 1'b1;
count <= 0;
end
else
btn_out <= 1'b0;
end
end
end
endmodule
module seven_seg_decoder(
input [3:0] data_in,
output [7:0] data_out
);
seven_seg_decoder seg1 (
.data_in(din),
.data_out(seg)
);
begin
count <= 13'b0;
clk_low <= ~clk_low;
end
end
end
always@(posedge clk_low)
begin
if(rst)
choose <= 2'b0;
else
begin
choose <= choose + 1'b1;
end
end
always@(choose,data_in)
begin
case(choose)
2'b00 : begin ctl = 4'b0111; din = data_in[15:12]; end
2'b01 : begin ctl = 4'b1011; din = data_in[11:8]; end
2'b10 : begin ctl = 4'b1101; din = data_in[7:4]; end
2'b11 : begin ctl = 4'b1110; din = data_in[3:0]; end
endcase
end
//-----------
endmodule
module sram_control(
input [3:0] data_in,
input [3:0] address_in,
input write,
input read,
inout [3:0] sram_data,
input clk, //this is 10MHz
input rst,
output [3:0] data_out,
output [3:0] address_out,
output ce,
output reg we,
output oe,
output sram_clk,
output adv,
output cre,
output lb,
output ub,
output [1:0] state
);
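reg [3:0] address;                 // current SRAM address
reg [3:0] data [0:15];             // 16 x 4-bit local copy of the memory
reg [1:0] current_state, next_state;
// State encoding (values assumed): idle, write, read
localparam s0 = 2'd0,
           s1 = 2'd1,
           s2 = 2'd2;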
assign cre = 1'b0;
assign lb = 1'b0;
assign ub = 1'b0;
assign address_out = address;
assign data_out = data[address];
assign state = current_state;
assign sram_data = (we == 1'b0) ? data[address] : 4'bz;
always@(posedge clk)
begin
if(rst)
current_state <= s0;
else
current_state <= next_state;
end
always@(current_state,write,read)
begin
case(current_state)
s0 : begin
we = 1'b0;
if(write == 1)
next_state = s1;
else if(read == 1)
next_state = s2;
else
next_state = s0;
end
s1 : begin
we = 1'b0;
next_state = s0;
end
s2 : begin
we = 1'b1;
next_state = s0;
end
default : begin next_state = s0; we = 1'b0; end
endcase
end
//-----this is for the address part------
// if write , address = address + 1
// if read, address = sw
always@(posedge clk)
begin
if(current_state == s1)
address <= address + 1'b1;
else if(current_state == s2)
address <= address_in;
else
address <= address;
end
always@(posedge clk)
begin
if(current_state == s1)
data[address] <= data_in;
else if(current_state == s2)
data[address] <= sram_data;
else
data[address] <= data[address];
end
endmodule
module sram_top(
input clk,
input rst,
input [3:0] sw,
input write_in,
input read_in,
inout [3:0] sram_data,
output ce,
output we,
output oe,
output sram_clk,
output adv,
output cre,
output lb,
output ub,
output [3:0] address_out,
output [7:0] seg,
output [3:0] ctl,
output [1:0] state,
output w_l,
output r_l
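);
wire clk_100, clk_10;              // 100 MHz and 10 MHz clocks from the DCM
wire write, read;                  // debounced write and read buttons
wire [3:0] data_out;
reg  [3:0] data_in, address_in;    // latched from the switches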
dcm top1
(// Clock in ports
.CLK_IN1(clk), // IN
// Clock out ports
.CLK_OUT1(clk_100), // OUT
.CLK_OUT2(clk_10), // OUT
// Status and control signals
.RESET(rst));
sram_control top2 (
.data_in(data_in),
.address_in(address_in),
.write(write),
.read(read),
.sram_data(sram_data),
.clk(clk_10),
.rst(rst),
.data_out(data_out),
.address_out(address_out),
.ce(ce),
.we(we),
.oe(oe),
.sram_clk(sram_clk),
.adv(adv),
.cre(cre),
.lb(lb),
.ub(ub),
.state(state)
);
seven_seg_top top3 (
.clk(clk_100),
.rst(rst),
.data_in(in_seg),
.seg(seg),
.ctl(ctl)
);
debounce top4 (
.clk(clk_10),
.rst(rst),
.btn_in(write_in),
.btn_out(write)
);
debounce top5 (
.clk(clk_10),
.rst(rst),
.btn_in(read_in),
.btn_out(read)
);
always@(posedge clk_100)
begin
if(write == 1)
begin
data_in <= sw;
end
else if(read == 1)
begin
address_in <= sw;
end
end
endmodule
REFERENCES
[1] F. Sun and T. Zhang, 2005. "Parallel high-throughput limited search trellis decoder
VLSI design," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 9, pp.
1013–1022.
[2] Arun, C and Rajamani, 2008. “Design and VLSI implementation of a low probability
[3] Min Woo Kim and Jun Dong Cho, 2006. "A VLSI Design of High Speed Bit-level Viterbi
Decoder," IEEE Transactions on Electrical & Electronics, vol. 7, no. 10, pp. 309-312.
[4] Mohammad K. Akbari and Ali Jahanian, 2004. "Area efficient, Low Power and
Robust design for Add Compare and Select Units," Proceedings of the IEEE
vhdl.htm
suite/index.htm .
[7] Song Li and Qing-Ming Yi, 2006. "The Design of High-Speed and Low Power
[8] Arun, C and Rajamani, 2008. “Design and VLSI implementation of a low
International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.5,
October 2013
[10] T. Kalavathidevi and C. Venkatesh, "Gate Diffusion Input (GDI) Circuits Based Low