
VLSI Signal Processing

Digital signal processing (DSP) has many
advantages over analog signal processing:
digital signals are more robust than analog
signals with respect to temperature and
process variation, offer higher accuracy,
and more.
DSP systems can be realized using
programmable processors or custom-designed
hardware circuits fabricated using
very large scale integration (VLSI)
technology.

Unit-I
Basics: - Vector Quantization, Decimator and Expander,
Representations of DSP Algorithms: Block Diagrams, Signal-Flow Graph, Data-Flow Graph, Dependence Graph.
Iteration Bound: - Data Flow Graph Representations, Loop Bound
and Iteration Bound, Algorithms for computing Iteration Bound:
Longest Path Matrix Algorithm, Minimum Cycle Mean Algorithm,
Iteration Bound of Multirate Data-Flow Graphs.

Unit-II
Pipelining and Parallel Processing: - Cutset, Feed-Forward Cutset,
Pipelining of FIR Digital Filters: Data-Broadcast Structures, Fine-Grain Pipelining, Parallel Processing: Designing a Parallel FIR
System, Pipelining and Parallel Processing for Low Power:
Pipelining for Low Power, Parallel Processing for Low Power,
Combining Pipelining and Parallel Processing.
Retiming: - Quantitative Description of Retiming, Properties of
Retiming, Solving Systems of Inequalities, Retiming Techniques:
Cutset Retiming and Pipelining, Retiming for Clock Period
Minimization, Retiming for Register Minimization.

Unit-III
Unfolding: - Algorithm for Unfolding, Properties of Unfolding, Critical
Path, Unfolding and Retiming, Applications of Unfolding: Sample
Period Reduction, Word-Level Parallel Processing, Bit-Level Parallel
Processing.
Folding:- Folding Transformation, Register Minimization Techniques:
Lifetime Analysis, Data Allocation using Forward-Backward Register
Allocation, Register Minimization in Folded Architectures: Biquad
Filter, IIR Filter, Folding of Multirate Systems.

Unit-IV
Bit-Level Arithmetic Architectures: - Parallel Multiplication with Sign
Extension, Baugh-Wooley Multipliers, Parallel Multipliers with
Modified Booth Recoding, Interleaved Floor-Plan and Bit-Plane based
Digital Filters.
Computer Arithmetic:- Floating Point Numbers, Floating Point
Addition, Floating Point Multiplication, Floating Point Division,
Floating Point Reciprocal, CORDIC Algorithm: Introduction, Modes,
Architectures, Computation of special functions using CORDIC
Algorithm (e.g. Trigonometric, Hyperbolic, Square Root, etc.)

Text Books
1. K. K. Parhi, VLSI Digital Signal Processing Systems, John
Wiley, 2010.
2. U. Meyer-Baese, Digital Signal Processing with FPGAs,
Springer, 2011

Reference Books
1. P. B. Denyer and D. Renshaw, VLSI Signal Processing,
Addison-Wesley, 1986.
2. R. I. Hartley and K. K. Parhi, Digit-Serial Computation,
Kluwer, 1995.
3. S. Y. Kung, H. J. Whitehouse, T. Kailath, "VLSI and Modern
Signal Processing", Prentice Hall, 1985.

Vector Quantization
Originated as a pattern-matching scheme.
Commonly used for data compression in
speech, image, and video coding, and in
speech recognition.
A lossy compression technique that exploits
the spatial correlation between
neighboring signal samples.
A group of samples is quantized together
rather than individually.

On the encoder side, the vector quantizer takes a group
of input samples, compares this input vector to the
codewords in the codebook, and selects the
codeword with minimum distortion.

The distortion for codeword j is e_j = ||x - c_j||^2,
where C = {c_j} is the N x k codebook matrix with the
j-th codeword vector c_j as its j-th row, x is the input
vector of dimension k, and e = [e_0 e_1 ... e_N-1]^T
is the vector of distortions.
The above searching algorithm is a brute-force
approach, where the distortion between the
input vector and every entry in the codebook
is computed; it is called full-search vector
quantization.
Every full-search operation requires N
distortion computations, and each distortion
computation involves k multiply-add
operations. This algorithm may become a
performance bottleneck for large N.
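The full-search step described above can be sketched as follows (a minimal Python illustration; the codebook and input vector are hypothetical):

```python
def full_search_vq(x, codebook):
    """Full-search VQ: compute the distortion e_j = ||x - c_j||^2
    against every codeword (N distortion computations, each with
    k multiply-adds) and return the index of the minimum."""
    best_j, best_e = 0, float("inf")
    for j, c in enumerate(codebook):
        e = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if e < best_e:
            best_j, best_e = j, e
    return best_j

# Hypothetical codebook: N = 4 codewords of dimension k = 2
C = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(full_search_vq([0.9, 0.1], C))  # 1 (nearest codeword is [1.0, 0.0])
```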

For these cases, a tree-structured
vector quantization scheme can be
used, whose complexity is proportional
to log2(N) rather than N.

Decimator and Expander
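The slide body for this section is a figure; as a brief sketch of the two operators, an M-fold decimator keeps every M-th input sample, y(n) = x(Mn), while an L-fold expander inserts L-1 zeros between consecutive input samples:

```python
def decimate(x, M):
    """M-fold decimator: keep every M-th sample, y(n) = x(Mn)."""
    return x[::M]

def expand(x, L):
    """L-fold expander: insert L-1 zeros between consecutive
    samples, y(n) = x(n/L) when n is a multiple of L, else 0."""
    y = [0] * (len(x) * L)
    y[::L] = x
    return y

print(decimate([1, 2, 3, 4, 5, 6], 3))  # [1, 4]
print(expand([1, 2], 3))                # [1, 0, 0, 2, 0, 0]
```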

Representation of DSP Algorithms

Block Diagrams (BD)
Signal Flow Graph (SFG)
Data Flow Graph (DFG)
Dependence Graph (DG)

Block Diagram
Consists of functional blocks connected
with directed edges.
Represents data flow from its input block
to its output block.
Edges may or may not contain delay
elements.
Can be used to describe both linear single-rate
and nonlinear multirate DSP systems.
Example: 3-tap FIR filter (figure).
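The 3-tap FIR filter in the block diagram computes y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2); a minimal sketch with hypothetical coefficients h0, h1, h2:

```python
def fir3(x, h):
    """3-tap FIR filter y(n) = h0*x(n) + h1*x(n-1) + h2*x(n-2),
    with zero initial conditions for x(-1) and x(-2)."""
    h0, h1, h2 = h
    xm1 = xm2 = 0  # the two delay elements (z^-1) in the block diagram
    y = []
    for xn in x:
        y.append(h0 * xn + h1 * xm1 + h2 * xm2)
        xm2, xm1 = xm1, xn  # shift the delay line
    return y

print(fir3([1, 0, 0, 0], (2, 3, 4)))  # impulse response: [2, 3, 4, 0]
```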

Signal Flow Graph

A collection of nodes and edges.
Nodes represent computation tasks.
2 special types of nodes: source and sink nodes.
Source node: no entering edges; used to
inject external inputs.
Sink node: only entering edges; used to
extract outputs.
Only applicable to linear networks; cannot
be used to describe multirate DSP systems.

Data Flow Graph


Nodes represent computations or functions
or subtasks.
Directed edges represent data path and
each edge has a nonnegative number of
delays associated with it.
Captures the data-driven property of DSP
algorithms, where any node can fire
whenever all its input data are available.
Can be used to describe both linear single
rate and nonlinear multirate DSP systems.

Dependence Graph
A directed graph that shows the dependence of
computations in an algorithm.
Nodes represent computations.
Edges represent precedence constraints
among nodes.
A new node is created whenever a new
computation is called for in the algorithm,
so no node is reused; each node performs
a single computation.

Iteration Bound
Many DSP algorithms contain
feedback loops, which impose an
inherent fundamental lower bound
on the achievable iteration or sample
period.
This bound is referred to as the iteration
bound.
It is not possible to achieve an iteration
period less than the iteration bound, even
when infinitely many processors are
available.

Data Flow Graph Representation

DSP programs are considered to be
nonterminating programs that run from time
index n = 0 to n = infinity.

The input to the program is the sequence x(n) for
n = 0, 1, 2, ..., and the initial condition is y(-1). The
output is the sequence y(n) for n = 0, 1, 2, ....
DSP program is represented using a DFG,
which is a directed graph that describes the
program.
Nodes represent tasks or computations and
each node has an execution time associated
with it.
The edges represent communication between
nodes and each edge has a nonnegative
number of delays associated with it.

An iteration of a node is the execution of
the node exactly once, and an iteration
of the DFG is the execution of each
node in the DFG exactly once.
Each edge describes a precedence
constraint between two nodes: an
intra-iteration precedence constraint if the
edge has zero delays, or an inter-iteration
precedence constraint if the edge has
one or more delays.
The critical path of a DFG is defined to be
the path with the longest computation
time among all the paths that contain
zero delays.
The critical path is the longest path for
combinational rippling in the DFG, so
the computation time of the critical
path is the minimum computation time for
one iteration of the DFG.
A DFG can be classified as nonrecursive
or recursive: a nonrecursive DFG contains no
loops, while a recursive DFG contains at least
one loop.
A recursive DFG has a fundamental limit
on how fast the DSP program can be
implemented in hardware. This limit is
called the iteration bound.

Loop Bound and Iteration Bound

The loop bound represents the lower bound
on the loop computation time per iteration.
The loop bound of the l-th loop is defined as
t_l / w_l, where t_l is the loop computation
time and w_l is the number of delays in the loop.
The critical loop is the loop with the maximum
loop bound, and the loop bound of the critical
loop is the iteration bound:
T_inf = max over l in L of { t_l / w_l },
where L is the set of loops in the DFG.
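The loop-bound formula T_inf = max over loops of t_l / w_l can be sketched directly (the loop list here is hypothetical):

```python
def iteration_bound(loops):
    """loops: list of (t_l, w_l) pairs, the loop computation time
    and the number of delays.  Returns T_inf = max t_l / w_l."""
    return max(t / w for t, w in loops)

# Hypothetical DFG with two loops: (t=4, w=2) and (t=5, w=1)
print(iteration_bound([(4, 2), (5, 1)]))  # critical loop bound = 5.0
```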

Algorithms for computing the Iteration Bound

Longest Path Matrix Algorithm (LPM)
Minimum Cycle Mean Algorithm (MCM)

Longest Path Matrix Algorithm (LPM)
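The slide content for this algorithm is figure-based. As a brief sketch: the LPM algorithm builds matrices L^(1), ..., L^(d), where l^(m)(i,j) is the longest path from delay d_i to delay d_j that passes through exactly m-1 delays (-1 if no such path exists), and T_inf = max over i and m of l^(m)(i,i)/m. A minimal Python sketch with a hypothetical two-delay example:

```python
def lpm_iteration_bound(L1):
    """Longest Path Matrix algorithm (sketch).  L1[i][j] is the
    longest zero-delay path time from delay d_i to delay d_j,
    or -1 if no such path exists."""
    d = len(L1)
    matrices = [L1]
    # l^(m+1)(i,j) = max_k { l^(1)(i,k) + l^(m)(k,j) }, skipping -1 entries
    for _ in range(d - 1):
        Lm = matrices[-1]
        Lnext = [[max([L1[i][k] + Lm[k][j]
                       for k in range(d)
                       if L1[i][k] != -1 and Lm[k][j] != -1],
                      default=-1)
                  for j in range(d)]
                 for i in range(d)]
        matrices.append(Lnext)
    # T_inf = max over m and i of l^(m)(i,i) / m (diagonal entries)
    return max(M[i][i] / m
               for m, M in enumerate(matrices, start=1)
               for i in range(d)
               if M[i][i] != -1)

# Hypothetical loop: d1 --(time 4)--> d2 --(time 2)--> d1
print(lpm_iteration_bound([[-1, 4], [2, -1]]))  # 3.0
```

The loop in the example has total computation time 6 and two delays, so its iteration bound is 6/2 = 3.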

Minimum Cycle Mean Algorithm (MCM)

The cycle means of the new graph G_d are
used to compute the iteration bound, where
G_d can be found from the DFG G for which
we are computing the iteration bound. If d is
the number of delay elements in G, then the
graph G_d has d nodes, where each node
corresponds to one of the delays in G. The weight
w(i,j) of the edge from node i to node j in G_d is the
longest path among all paths in G from delay
d_i to d_j that do not pass through any delays.
If no zero-delay path exists from delay d_i to
d_j, then the edge from i to j does not exist in G_d.

In G_d, the number of edges in a cycle equals the
number of nodes in the cycle, and this equals
the number of delays in the corresponding cycle in G.
The cycle mean of a cycle c in G_d is the sum of
the edge weights in c divided by the number of
edges in c.
The maximum cycle mean of G_d is the maximum
cycle bound of all cycles in G, which is the
iteration bound of G.
To compute the maximum cycle mean of G_d, the
graph G_d' is constructed from G_d by simply
multiplying the weights of the edges by -1; the
maximum cycle mean of G_d is then -1 times the
minimum cycle mean of G_d'.

The minimum cycle mean of G_d' is computed from the
series f(m)(i) for m = 0, 1, ..., d. In some cases we may
encounter f(d)(i) - f(m)(i) = infinity - infinity,
which should be treated as infinity.
Using the right column of the table on the slide,
we can determine
T_inf = -min{-2, -1, -2, infinity} = 2
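The steps above can be sketched with Karp's algorithm on the negated graph G_d' (a sketch that assumes, for simplicity, that every cycle of G_d is reachable from node 0; the example edges are hypothetical):

```python
def max_cycle_mean(n, edges):
    """Maximum cycle mean of G_d via Karp's algorithm on the
    negated graph G_d' (sketch; assumes every cycle is reachable
    from node 0).  edges: list of (u, v, w) with G_d weights."""
    INF = float("inf")
    neg = [(u, v, -w) for u, v, w in edges]      # multiply weights by -1
    # f[m][v] = minimum weight of an m-edge path from node 0 to v
    f = [[INF] * n for _ in range(n + 1)]
    f[0][0] = 0.0
    for m in range(1, n + 1):
        for u, v, w in neg:
            if f[m - 1][u] < INF:
                f[m][v] = min(f[m][v], f[m - 1][u] + w)
    # Karp: min cycle mean of G_d' = min_v max_m (f[n][v]-f[m][v])/(n-m)
    mcm = min(max((f[n][v] - f[m][v]) / (n - m)
                  for m in range(n) if f[m][v] < INF)
              for v in range(n) if f[n][v] < INF)
    return -mcm    # T_inf = -(minimum cycle mean of G_d')

# Hypothetical G_d: two delays on one loop, weights 4 and 2
print(max_cycle_mean(2, [(0, 1, 4), (1, 0, 2)]))  # 3.0
```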

Iteration Bound of Multirate Data-Flow Graphs
This is a 2-step process
1. Construct a single-rate DFG (SRDFG)
that is equivalent to the multirate
DFG (MRDFG).
2. Compute the iteration bound of the
equivalent SRDFG using LPM or MCM
algorithm.

If the nodes U and V are invoked k_U
and k_V times in an iteration, then the
number of samples produced on the
edge from node U to node V in
one iteration is O_UV * k_U, and the
number of samples consumed from the
edge by node V in one iteration is
I_UV * k_V. For a valid schedule these
must balance:
O_UV * k_U = I_UV * k_V

Here a/b denotes floor(a/b), and a%b = a - b*floor(a/b),
where floor(x) is the largest integer less
than or equal to x.

4k_a = 3k_b
k_b = 2k_c
k_c = k_c
3k_c = 2k_a

We have the solution k_a = 3, k_b = 4, k_c = 2.
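The balance equations O_UV k_U = I_UV k_V can be solved mechanically; a sketch assuming a connected graph with consistent rates (the node and edge data below are hypothetical, chosen to reproduce the example above):

```python
from fractions import Fraction
from math import lcm

def repetitions(nodes, edges):
    """Solve O_UV * k_U = I_UV * k_V for the smallest positive
    integers k_U (sketch; assumes a connected graph with
    consistent rates).  edges: list of (U, V, O_UV, I_UV)."""
    k = {nodes[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for u, v, o, i in edges:
            if u in k and v not in k:
                k[v] = k[u] * o / i      # k_V = O_UV k_U / I_UV
                changed = True
            elif v in k and u not in k:
                k[u] = k[v] * i / o
                changed = True
    # scale the rational solution to the smallest integer one
    scale = lcm(*(f.denominator for f in k.values()))
    return {n: int(f * scale) for n, f in k.items()}

# Edges giving 4k_a = 3k_b and k_b = 2k_c
print(repetitions(["a", "b", "c"], [("a", "b", 4, 3), ("b", "c", 1, 2)]))
# {'a': 3, 'b': 4, 'c': 2}
```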
