You are on page 1of 34

1

Pipelining and
Parallel Processing
Shao-Yi Chien
Shao-Yi Chien 2 DSP in VLSI Design
Introduction
Pipelining and parallel are the most important
design techniques in VLSI DSP systems
Make use of the inherent parallel property of
DSP algorithms
Pipelining: different function units working in parallel
Parallel: duplicated function units working in parallel
Shao-Yi Chien 3 DSP in VLSI Design
An Example:
Car Production Line
Throughput: how
many cars produced
in one hour
Latency: how long will
it take to produce one
car
mT
Cost: 1
Throughput: 1/mT
Latency: mT
T T
T
mT
mT
mT
m
Cost: >1
Throughput: 1/T
Latency: >mT
Cost: >n
Throughput: (1/mT)*n
Latency: >mT
n
Shao-Yi Chien 4 DSP in VLSI Design
Pipelining of Digital Filters (1/3)
FIR filter
Critical path: T
M
+2T
A
Shao-Yi Chien 5 DSP in VLSI Design
Pipelining of Digital Filters (2/3)
Pipelined FIR filter
A M sample
T T T +
A M
sample
T T
f
+

1
Shao-Yi Chien 6 DSP in VLSI Design
Pipelining of Digital Filters (3/3)
Schedule
Shao-Yi Chien 7 DSP in VLSI Design
Pipeline
Can reduce the critical path to increase
the working frequency and sample rate
T
M
+2T
A
T
M
+T
A
Shao-Yi Chien 8 DSP in VLSI Design
Drawbacks of Pipelining
Increasing latency (in cycle)
For M-level pipelined system, the number of
delay elements in any path from input to
output is (M-1) greater than the origin one
Increase the number of latches (registers)
Shao-Yi Chien 9 DSP in VLSI Design
How to Do Pipelining?
Put pipelining latches across any feed-forward
cutset of the graph
Cutset
A cutset is a set of edges of a graph such that if these
edges are removed from the graph, the graph
becomes disjoint
Feed-forward cutset
The data move in the forward direction on all the
edges of the cutset
Shao-Yi Chien 10 DSP in VLSI Design
How to Do Pipelining?
Feed-forward cutset
Shao-Yi Chien 11 DSP in VLSI Design
Example
In the SFG in Fig. (a),
all the computation time
for each node is 1 u.t.
(a) Calculate the critical
path
(b) The critical path is
reduced to 2 u.t. by
inserting 3 extra delay
elements. Is it a valid
pipelining?
Shao-Yi Chien 12 DSP in VLSI Design
Answer
(a) The critical path
is A3A5A4A6,
4 u.t.
(b) No, it is not a
valid pipelining. Fig.
(c) is the corrected
one
Shao-Yi Chien 13 DSP in VLSI Design
Fine-Grain Pipelining
Critical path (T
M
=10, T
A
=2)
T
M
+2T
A
=14
T
M
+T
A
=12
T
M1
=6 or T
M2
+T
A
=6
Shao-Yi Chien 14 DSP in VLSI Design
Notes for Pipelining (1/2)
Pipelining is a very simple design technique
which can maintain the input output data
configuration and sampling frequency
T
clk
=T
sample
Supported in many EDA tools
Still has some limitations
Pipeline bubbles
Has some problems for recursive system
Introduces large hardware cost for 2-D or 3-D data
Communication bound
Shao-Yi Chien 15 DSP in VLSI Design
Notes for Pipelining (2/2)
Effective pipelining
Put pipelining registers on the critical path
Balance pipelining
10(2+8): critical path=8
10(5+5): critical path=5
Shao-Yi Chien 16 DSP in VLSI Design
Parallel of Digital Filters (1/5)
Single-input single-output (SISO) system
Multiple-input multiple-output (MIMO) system
3-Parallel System!
Shao-Yi Chien 17 DSP in VLSI Design
Parallel of Digital Filters (2/5)
Parallel processing, block processing
Block size (L): the number of data to be
processed at the same time
Block delay (L-slow)
A latch is equivalent to L clock cycles at the
sample rate
Shao-Yi Chien 18 DSP in VLSI Design
Parallel of Digital Filters (3/5)
The critical path is the same
T
clk
is not equal to T
sample
L-slow
Shao-Yi Chien 19 DSP in VLSI Design
Parallel of Digital Filters (4/5)
Whole system
Shao-Yi Chien 20 DSP in VLSI Design
Parallel of Digital Filters (5/5)
Parallel-to-serial converter
Serial-to-parallel converter
Shao-Yi Chien 21 DSP in VLSI Design
Parallel Processing v.s.
Pipelining
Parallel processing is superior than pipelining
processing for the I/O bottleneck
(communication bounded)
Pipelining only can increase the clock rate
System clock rate = sample rate for pipelining system
Use parallel processing can further lower the required
working frequency
Shao-Yi Chien 22 DSP in VLSI Design
Pipelining-Parallel Architecture
Shao-Yi Chien 23 DSP in VLSI Design
Notes for Parallel Processing
The input/output data access scheme
should be carefully designed, it will cost a
lot sometimes
T
clk
>T
sample
, f
clk
<f
sample
Large hardware cost
Combine with pipelining processing
Shao-Yi Chien 24 DSP in VLSI Design
Low Power Issues
Pipelining and parallel processing are also
beneficial for low power design
Propagation delay
Power consumption
Assume the sampling frequency is the
same
Shao-Yi Chien 25 DSP in VLSI Design
Pipelining for Low Power (1/2)
For M-level pipelining
Critical path is reduced to 1/M
The capacitance is also reduced to C
charge
/M
The supply voltage can be reduced to , and the
propagation delay remains unchanged
Shao-Yi Chien 26 DSP in VLSI Design
Pipelining for Low Power (2/2)
Power consumption:
How about the parameter ?
Solve this equation to get
Shao-Yi Chien 27 DSP in VLSI Design
Example
Parameters
T
M
=10 u.t.
T
A
=2 u.t.
T
m1
=6 u.t.
T
m2
=4 u.t.
C
M
=5C
A
V
t
=0.6V
Normal V
cc
=5V
(a) New supply
voltage?
(b) Power saving
percentage?
Shao-Yi Chien 28 DSP in VLSI Design
Answer
(a)
Origin system:
Pipelined system:
(b)
Invalid value, less
than threshold
voltage
Shao-Yi Chien 29 DSP in VLSI Design
Parallel Processing for Low
Power (1/2)
For L-parallel system
Clock period: T
seq
LT
seq
C
charge
remains unchanged
C
totol
LC
total
Have more time to charge the capacitance, the supply
voltage can be lower
Shao-Yi Chien 30 DSP in VLSI Design
Parallel Processing for Low
Power (2/2)
Power consumption
To derive the parameter
Shao-Yi Chien 31 DSP in VLSI Design
Example
Parameters
T
M
=8 u.t.
T
A
=1 u.t.
T
sample
=9 u.t.
C
M
=8C
A
V
t
=0.45V
Normal V
cc
=3.3V
(a) New supply
voltage?
(b) Power saving
percentage?
Shao-Yi Chien 32 DSP in VLSI Design
Answer
(a)
Origin:
2-parallel system:
(b)
Invalid value, less
than threshold
voltage
Shao-Yi Chien 33 DSP in VLSI Design
Pipelining-Parallel for Low
Power
Shao-Yi Chien 34 DSP in VLSI Design
Conclusion
Pipelining
Reduce the critical path
Increase the clock speed
Reduce power consumption at the same speed
Parallel
Increase effective sampling rate
Reduce power consumption at the same speed