
Chapter 10: Pipelined and Parallel

Recursive and Adaptive Filters


Keshab K. Parhi
Chapter 10 2
Outline
Introduction
Pipelining in 1st-Order IIR Digital Filters
Pipelining in Higher-Order IIR Digital Filters
Parallel Processing for IIR Filters
Combined Pipelining and Parallel Processing for IIR
Filters
Chapter 10 3
First-Order IIR Filter
Consider a 1st-order linear time-invariant recursion (see Fig. 1):
$y(n+1) = a\,y(n) + b\,u(n)$   (10.1)
The iteration period of this filter is $(T_m + T_a)$, where $T_m$ and $T_a$ represent the word-level multiplication time and addition time.
Look-Ahead Computation
In the look-ahead transformation, the linear recursion is first iterated a few times to create additional concurrency.
By recasting this recursion, we can express y(n+2) as a function of y(n) to obtain the following expression (see Fig. 2(a)):
$y(n+2) = a\,[a\,y(n) + b\,u(n)] + b\,u(n+1)$   (10.2)
The iteration bound of this recursion is $2(T_m + T_a)/2 = T_m + T_a$, the same as for the original version, because the amount of computation and the number of logical delays inside the recursive loop have both doubled.
Chapter 10 4
Another recursion equivalent to (10.2) is (10.3); shown in Fig. 2(b), its iteration bound is $(T_m + T_a)/2$, a factor of 2 lower than before:
$y(n+2) = a^2 y(n) + a b\,u(n) + b\,u(n+1)$   (10.3)
Applying (M-1) steps of look-ahead to the iteration of (10.1), we can obtain an equivalent implementation described by (see Fig. 3)
$y(n+M) = a^M y(n) + \sum_{i=0}^{M-1} a^i b\,u(n+M-1-i)$   (10.4)
Note: the loop delay is $z^{-M}$ instead of $z^{-1}$, which means that the loop computation must be completed in M clock cycles (not 1 clock cycle). The iteration bound of this computation is $(T_m + T_a)/M$, which corresponds to a sample rate M times higher than that of the original filter.
The terms $\{ab, a^2 b, \ldots, a^{M-1} b, a^M\}$ in (10.4) can be pre-computed (referred to as pre-computation terms). The second term on the RHS of (10.4) is the look-ahead computation term (referred to as the look-ahead complexity); it is non-recursive and can be easily pipelined.
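As a quick numerical sanity check of (10.4), the following minimal NumPy sketch (not part of the original slides; the coefficients a, b and the look-ahead depth M are arbitrary illustrative values) compares the original recursion (10.1) with its M-step look-ahead form:

```python
import numpy as np

a, b, M = 0.9, 0.5, 4                  # illustrative values, |a| < 1 for stability
u = np.random.randn(64)

# Original recursion (10.1): y(n+1) = a*y(n) + b*u(n), zero initial state
y = np.zeros(len(u) + 1)
for n in range(len(u)):
    y[n + 1] = a * y[n] + b * u[n]

# M-step look-ahead (10.4): y(n+M) = a^M * y(n) + sum_{i=0}^{M-1} a^i * b * u(n+M-1-i)
y_la = np.zeros(len(u) + 1)
y_la[:M] = y[:M]                       # contents of the M loop delays at start-up
for n in range(len(u) - M + 1):
    y_la[n + M] = a**M * y_la[n] + sum(a**i * b * u[n + M - 1 - i] for i in range(M))

assert np.allclose(y, y_la)            # both recursions produce the same output sequence
```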
Chapter 10 5
Fig. 1: 1st-order IIR filter of (10.1). Fig. 2(a): Two-step iterated recursion of (10.2) with loop delay 2D. Fig. 2(b): Equivalent recursion of (10.3) with loop multiplier a^2 and loop delay 2D.
Chapter 10 6
Fig. 3: M-stage Pipelinable 1st-Order IIR Filter (loop multiplier $a^M$ with M delays; non-recursive multipliers $b, ab, \ldots, a^{M-1}b$)
Look-ahead computation has allowed a single serial computation to be
transformed into M independent concurrent computations, and to
pipeline the feedback loop to achieve high speed filtering of a single
time series while maintaining full hardware utilization.
Provided the multiplier and the adder can be conveniently pipelined,
the iteration bound can be achieved by retiming or cutset
transformation (see Chapter 4)
Chapter 10 7
Pipelining in 1st-Order IIR Digital Filters
Look-ahead techniques add canceling poles and zeros with equal angular spacing at a distance from the origin that is the same as that of the original pole. The pipelined filters are always stable provided that the original filter is stable.
The pipelined realizations require a linear increase in complexity
but decomposition techniques can be used to obtain an
implementation with logarithmic increase in hardware with
respect to the number of loop pipeline stages
Example: Consider the 1st-order IIR filter transfer function
$H(z) = 1/(1 - a z^{-1})$   (10.5)
The output sample y(n) can be computed using the input sample u(n) and the past output sample y(n-1) as follows:
$y(n) = a\,y(n-1) + u(n)$   (10.6)
The sample rate of this recursive filter is limited by the computation time of one multiply-add operation.
Chapter 10 8
Pipelining in 1st-Order IIR Digital Filters (continued)
1. Look-Ahead Pipelining for 1st-Order IIR Filters
Look-ahead pipelining adds canceling poles and zeros to the transfer function such that the coefficients of $\{z^{-1}, \ldots, z^{-(M-1)}\}$ in the denominator of the transfer function are zero. Then the output sample y(n) can be computed using the inputs and the output sample y(n-M), so that there are M delay elements in the critical loop, which in turn can be used to pipeline the critical loop by M stages; the sample rate can then be increased by a factor of M.
Example: Consider the 1st-order filter $H(z) = 1/(1 - a z^{-1})$, which has a pole at z = a ($|a| \le 1$). A 3-stage pipelined equivalent stable filter can be derived by adding poles and zeros at $z = a e^{\pm j 2\pi/3}$, and is given by
$H(z) = (1 + a z^{-1} + a^2 z^{-2}) \,/\, (1 - a^3 z^{-3})$
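The cancellation can be checked numerically. The sketch below (illustrative only, with an assumed value of a; not from the slides) multiplies the added canceling terms into the original filter with NumPy polynomial arithmetic and confirms that the frequency response is unchanged while the loop gains 3 delays:

```python
import numpy as np

a = 0.8                                        # illustrative pole, |a| < 1
den = np.array([1.0, -a])                      # original denominator 1 - a z^-1 (coeffs in z^-1)

# Add canceling poles/zeros at a*exp(+/- j*2*pi/3) by multiplying numerator and
# denominator by (1 + a z^-1 + a^2 z^-2).
cancel = np.array([1.0, a, a**2])
num_p = cancel                                 # new numerator: 1 + a z^-1 + a^2 z^-2
den_p = np.convolve(den, cancel)               # new denominator: 1 - a^3 z^-3
print(den_p)                                   # [1, 0, 0, -a^3] -> 3 delays in the loop

# The frequency responses of the original and pipelined filters are identical.
zinv = np.exp(-1j * np.linspace(0, np.pi, 256))
H_orig = 1.0 / np.polyval(den[::-1], zinv)
H_pipe = np.polyval(num_p[::-1], zinv) / np.polyval(den_p[::-1], zinv)
assert np.allclose(H_orig, H_pipe)
```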
Chapter 10 9
Pipelining in 1st-Order IIR Digital Filters (continued)
2. Look-Ahead Pipelining with Power-of-2 Decomposition
With power-of-2 decomposition, an M-stage (for power-of-2 M) pipelined implementation of a 1st-order IIR filter can be obtained by $\log_2 M$ sets of transformations.
Example: Consider a 1st-order recursive filter transfer function described by $H(z) = b z^{-1}/(1 - a z^{-1})$. The equivalent pipelined transfer function can be described using the decomposition technique as follows:
$H(z) = b z^{-1} \prod_{i=0}^{\log_2 M - 1} \big(1 + a^{2^i} z^{-2^i}\big) \,\big/\, \big(1 - a^M z^{-M}\big)$   (10.7)
This pipelined implementation is derived by adding (M - 1) poles and zeros at identical locations. The original transfer function has a single pole at z = a (see Fig. 4(a)).
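A small sketch of the power-of-2 decomposition idea (illustrative values of a and M; the variable names are mine, not from the slides): it builds the $\log_2 M$ decomposed sections of (10.7) and verifies that they telescope against $(1 - a z^{-1})$ to leave an M-delay loop.

```python
import numpy as np

a, M = 0.9, 8                                  # illustrative pole and power-of-2 pipelining level
stages = int(np.log2(M))

# Build prod_{i=0}^{log2(M)-1} (1 + a^(2^i) z^-(2^i)), the decomposed canceling zeros of (10.7).
prod = np.array([1.0])
for i in range(stages):
    stage = np.zeros(2**i + 1)
    stage[0], stage[-1] = 1.0, a**(2**i)       # 1 + a^(2^i) z^-(2^i): one multiplier per stage
    prod = np.convolve(prod, stage)

# (1 - a z^-1) * prod telescopes to (1 - a^M z^-M), so the recursive loop gets M delays.
den = np.convolve([1.0, -a], prod)
expected = np.zeros(M + 1)
expected[0], expected[-1] = 1.0, -a**M
assert np.allclose(den, expected)

print("multipliers:", stages + 2)              # log2(M) + 2, matching the complexity quoted below
```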
Chapter 10 10
The pipelined transfer function has poles at the following locations (see Fig. 4(b) for M = 8): $\{a, a e^{j2\pi/M}, \ldots, a e^{j2\pi(M-1)/M}\}$.
The decomposition of the canceling zeros is shown in Fig. 4(c). The i-th stage of the decomposed non-recursive portion implements $2^i$ zeros located at
$z = a \exp\!\big(j(2n+1)\pi / 2^i\big), \quad n = 0, 1, \ldots, 2^i - 1$   (10.8)
The i-th stage of the decomposed non-recursive portion requires a single pipelined multiplication operation, independent of the stage number i. The multiplication complexity of the pipelined implementation is therefore $(\log_2 M + 2)$.
The finite-precision pipelined filters suffer from inexact pole-zero cancellation, which leads to magnitude and phase errors. These errors can be reduced by increasing the wordlength (see p. 323).
Chapter 10 11
Fig. 4: (a) Pole representation of a 1st-order recursive filter; (b) pole/zero representation of a 1st-order IIR with 8 loop pipeline stages; (c) decomposition based on the pipelined IIR for M = 8.
Chapter 10 12
Pipelining in 1st-Order IIR Digital Filters (continued)
3. Look-Ahead Pipelining with General Decomposition
The idea of decomposition can be extended to an arbitrary number of loop pipelining stages M. If $M = M_1 M_2 \cdots M_p$, then the non-recursive stages implement $(M_1 - 1)$, $M_1(M_2 - 1)$, $\ldots$, $M_1 M_2 \cdots M_{p-1}(M_p - 1)$ zeros, respectively, totaling (M - 1) zeros.
Example (Example 10.3.3, p.325): Consider the 1st-order IIR filter $H(z) = 1/(1 - a z^{-1})$. A 12-stage pipelined decomposed implementation is given by
$H(z) = \sum_{i=0}^{11} a^i z^{-i} \,\big/\, \big(1 - a^{12} z^{-12}\big) = (1 + a z^{-1})(1 + a^2 z^{-2} + a^4 z^{-4})(1 + a^6 z^{-6}) \,\big/\, \big(1 - a^{12} z^{-12}\big)$
This implementation is based on a 2 × 3 × 2 decomposition (see Fig. 5).
Chapter 10 13
The first section implements 1 zero at $z = -a$, the second section implements 4 zeros at $\{a e^{\pm j\pi/3}, a e^{\pm j2\pi/3}\}$, and the third section implements 6 zeros at $\{\pm ja, a e^{\pm j\pi/6}, a e^{\pm j5\pi/6}\}$.
Another decomposed implementation (2 × 2 × 3 decomposition) is given by
$H(z) = (1 + a z^{-1})(1 + a^2 z^{-2})(1 + a^4 z^{-4} + a^8 z^{-8}) \,\big/\, \big(1 - a^{12} z^{-12}\big)$
The first section implements 1 zero at $z = -a$, the second section implements 2 zeros at $\pm ja$, and the third section implements 8 zeros at $\{a e^{\pm j\pi/6}, a e^{\pm j\pi/3}, a e^{\pm j2\pi/3}, a e^{\pm j5\pi/6}\}$.
The third decomposition (3 × 2 × 2 decomposition) is given by
$H(z) = (1 + a z^{-1} + a^2 z^{-2})(1 + a^3 z^{-3})(1 + a^6 z^{-6}) \,\big/\, \big(1 - a^{12} z^{-12}\big)$
The first section implements 2 zeros at $a e^{\pm j2\pi/3}$, the second section implements 3 zeros at $\{-a, a e^{\pm j\pi/3}\}$, and the third section implements 6 zeros at $\{\pm ja, a e^{\pm j\pi/6}, a e^{\pm j5\pi/6}\}$.
Chapter 10 14
Fig. 5(a): Pole-zero locations of a 12-stage pipelined 1st-order IIR filter. Fig. 5(b): Decomposition of the zeros of the pipelined filter for a 2×3×2 decomposition.
Chapter 10 15
Pipelining in Higher-order IIR Digital Filters
Higher-order IIR digital filters can be pipelined by using clustered look-ahead or scattered look-ahead techniques. (For 1st-order IIR filters, these two look-ahead techniques reduce to the same form.)
Clustered look-ahead: Pipelined realizations require a linear
complexity in the number of loop pipelined stages and are not
always guaranteed to be stable
Scattered look-ahead: Can be used to derive stable pipelined IIR
filters
Decomposition technique: Can also be used to obtain area-efficient
implementation for higher-order IIR for scattered look-ahead filters
Constrained filter design techniques: Achieve pipelining without
pole-zero cancellation
Chapter 10 16
The transfer function of an N-th order direct-form recursive filter is described by
$H(z) = \sum_{i=0}^{N} b_i z^{-i} \,\big/\, \big(1 - \sum_{i=1}^{N} a_i z^{-i}\big)$   (10.9)
Equivalently, the output y(n) can be described in terms of the input sample u(n) and the past input/output samples as follows:
$y(n) = \sum_{i=1}^{N} a_i\, y(n-i) + \sum_{i=0}^{N} b_i\, u(n-i) = \sum_{i=1}^{N} a_i\, y(n-i) + z(n)$   (10.10)
where z(n) denotes the non-recursive part $\sum_{i=0}^{N} b_i u(n-i)$.
The sample rate of this IIR filter realization is limited by the throughput of 1 multiplication and 1 addition, since the critical loop contains only a single delay element.
Chapter 10 17
Pipelining in Higher-order IIR Digital Filters (cont'd)
1. Clustered Look-Ahead Pipelining
The basic idea of clustered look-ahead: add canceling poles and zeros to the filter transfer function such that the coefficients of $\{z^{-1}, \ldots, z^{-(M-1)}\}$ in the denominator of the transfer function are 0, so that the output sample y(n) can be described in terms of the cluster of N past outputs $\{y(n-M), \ldots, y(n-M-N+1)\}$.
Hence the critical loop of this implementation contains M delay elements and a single multiplication. Therefore, this loop can be pipelined by M stages, and the sample rate can be increased by a factor of M. This is referred to as M-stage clustered look-ahead pipelining.
Chapter 10 18
Example (Example 10.4.1, p.327): Consider the all-pole 2nd-order IIR filter with poles at {1/2, 3/4}. The transfer function of this filter is
$H(z) = 1 \,/\, \big(1 - \tfrac{5}{4} z^{-1} + \tfrac{3}{8} z^{-2}\big)$   (10.11)
A 2-stage pipelined equivalent IIR filter can be obtained by eliminating the $z^{-1}$ term in the denominator, i.e., by multiplying both the numerator and denominator by $(1 + \tfrac{5}{4} z^{-1})$. The transformed transfer function is given by
$H(z) = \big(1 + \tfrac{5}{4} z^{-1}\big) \,/\, \big[\big(1 - \tfrac{5}{4} z^{-1} + \tfrac{3}{8} z^{-2}\big)\big(1 + \tfrac{5}{4} z^{-1}\big)\big] = \big(1 + \tfrac{5}{4} z^{-1}\big) \,/\, \big(1 - \tfrac{19}{16} z^{-2} + \tfrac{15}{32} z^{-3}\big)$   (10.12)
From this transfer function, we can see that the coefficient of $z^{-1}$ in the denominator is zero. Hence, the critical loop of this filter contains 2 delay elements and can be pipelined by 2 stages.
Chapter 10 19
Computation complexity: The numerator (non-recursive portion) of this
pipelined filter needs (N+M) multiplications, and the denominator (recursive
portion) needs N multiplications. Thus, the total complexity of this pipelined
implementation is (N+N+M) multiplications
Stability: The canceling poles and zeros are utilized for pipelining IIR filters.
However, when the additional poles lie outside the unit circle, the filter
becomes unstable. Note that the filters in (10.12) and (10.13) are unstable.
Similarly, a 3-stage pipelined realization can be derived by eliminating the $\{z^{-1}, z^{-2}\}$ terms in the denominator of (10.11), which can be done by multiplying both numerator and denominator by $(1 + \tfrac{5}{4} z^{-1} + \tfrac{19}{16} z^{-2})$. The new transfer function is given by
$H(z) = \big(1 + \tfrac{5}{4} z^{-1} + \tfrac{19}{16} z^{-2}\big) \,/\, \big(1 - \tfrac{65}{64} z^{-3} + \tfrac{57}{128} z^{-4}\big)$   (10.13)
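The clustered look-ahead transformation can be written as a short routine: starting from C(z) = 1, each step appends the coefficient that cancels the lowest remaining power of $z^{-1}$ in D(z)·C(z). The sketch below (the function name and the iterative formulation are my own illustration; only the starting denominator is taken from (10.11)) reproduces the denominators of (10.12) and (10.13):

```python
import numpy as np

def clustered_lookahead(den, M):
    """Multiply D(z) (coeffs in z^-1, den[0] == 1) by a degree-(M-1) polynomial C(z)
    so that the product has zero coefficients for z^-1 ... z^-(M-1)."""
    c = np.array([1.0])
    for _ in range(M - 1):
        prod = np.convolve(den, c)
        k = len(c)                       # first not-yet-cancelled power of z^-1
        c = np.append(c, -prod[k])       # choose the next coefficient to cancel it
    return c, np.convolve(den, c)        # (added poles/zeros, new denominator)

den = np.array([1.0, -5/4, 3/8])         # denominator of (10.11)
for M in (2, 3):
    c, new_den = clustered_lookahead(den, M)
    print(M, c, new_den)
# M=2: C(z) = 1 + 5/4 z^-1,              D'(z) = 1 - 19/16 z^-2 + 15/32 z^-3   (10.12)
# M=3: C(z) = 1 + 5/4 z^-1 + 19/16 z^-2, D'(z) = 1 - 65/64 z^-3 + 57/128 z^-4  (10.13)
```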
Chapter 10 20
2. Stable Clustered Look-Ahead Filter Design
If the desired pipelining level M does not produce a stable filter, M should be increased until a stable pipelined filter is obtained. To obtain the optimal pipelining level M, numerical search methods are generally used.
Example (Example 10.4.3, p.330): Consider a 5-level (M = 5) pipelined implementation of the following 2nd-order transfer function:
$H(z) = 1 \,/\, \big(1 - 1.5336\, z^{-1} + 0.6889\, z^{-2}\big)$   (10.14)
Stability analysis shows that M = 5 does not meet the stability condition. Thus M is increased to M = 6 to obtain the following stable pipelined filter:
$H(z) = \big(1 + 1.5336 z^{-1} + 1.6630 z^{-2} + 1.4939 z^{-3} + 1.1454 z^{-4} + 0.7275 z^{-5}\big) \,/\, \big(1 - 0.3265 z^{-6} + 0.5011 z^{-7}\big)$   (10.15)
Chapter 10 21
3. Scattered Look-Ahead Pipelining
Scattered look-ahead pipelining: the denominator of the transfer function in (10.9) is transformed so that it contains only the N terms $\{z^{-M}, z^{-2M}, \ldots, z^{-NM}\}$. Equivalently, the state y(n) is computed in terms of the N past scattered states y(n-M), y(n-2M), …, and y(n-NM).
In scattered look-ahead, for each pole in the original filter we introduce (M-1) canceling poles and zeros with equal angular spacing, at the same distance from the origin as the original pole. Example: if the original filter has a pole at z = p, we add (M-1) poles and zeros at $z = p\, e^{j2\pi k/M}$, $k = 1, 2, \ldots, M-1$, to derive a pipelined realization with M loop pipeline stages.
Assume that the denominator of the transfer function can be factorized as follows:
$D(z) = \prod_{i=1}^{N} \big(1 - p_i z^{-1}\big)$   (10.16)
Chapter 10 22
Then the pipelining process using the scattered look-ahead approach can be described by
$H(z) = \dfrac{N(z)}{D(z)} = \dfrac{N(z)\, \prod_{i=1}^{N} \prod_{k=1}^{M-1} \big(1 - p_i e^{j2\pi k/M} z^{-1}\big)}{\prod_{i=1}^{N} \prod_{k=0}^{M-1} \big(1 - p_i e^{j2\pi k/M} z^{-1}\big)} = \dfrac{N'(z)}{D'(z)}, \quad D'(z) = \prod_{i=1}^{N} \big(1 - p_i^M z^{-M}\big)$   (10.17)
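A hedged sketch of (10.17) in code (illustrative pole values; the helper name is mine): for each pole p it multiplies in the (M-1) canceling zeros $p\,e^{j2\pi k/M}$ and forms the new denominator $\prod (1 - p_i^M z^{-M})$, then checks that D(z) times the added zeros equals the new denominator, so H(z) is unchanged.

```python
import numpy as np

def scattered_lookahead(poles, M):
    """Given the poles of D(z) = prod(1 - p_i z^-1), return the extra numerator
    factor and the new denominator of an M-stage scattered look-ahead filter."""
    extra_num = np.array([1.0 + 0j])
    new_den = np.array([1.0 + 0j])
    for p in poles:
        for k in range(1, M):                      # (M-1) canceling zeros per pole
            extra_num = np.convolve(extra_num, [1.0, -p * np.exp(2j * np.pi * k / M)])
        d = np.zeros(M + 1, dtype=complex)         # 1 - p^M z^-M
        d[0], d[-1] = 1.0, -p**M
        new_den = np.convolve(new_den, d)
    return extra_num, new_den

r, theta, M = 0.9, np.pi / 5, 3                    # illustrative complex-conjugate pole pair
poles = [r * np.exp(1j * theta), r * np.exp(-1j * theta)]
extra_num, new_den = scattered_lookahead(poles, M)

# Sanity check: D(z) * extra_num == new_den, so H(z) is unchanged by the transformation.
D = np.array([1.0 + 0j])
for p in poles:
    D = np.convolve(D, [1.0, -p])
assert np.allclose(np.convolve(D, extra_num), new_den)
print(np.round(new_den.real, 6))                   # only the z^0, z^-M, z^-2M terms survive
```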
Example (Example 10.4.5, p.332): Consider the 2nd-order filter with complex-conjugate poles at $z = r e^{\pm j\theta}$. The filter transfer function is given by
$H(z) = 1 \,/\, \big(1 - 2 r \cos\theta\, z^{-1} + r^2 z^{-2}\big)$
We can pipeline this filter by 3 stages by introducing 4 additional poles and zeros, at $z = r e^{j(\theta \pm 2\pi/3)}$ and $z = r e^{-j(\theta \pm 2\pi/3)}$.
Chapter 10 23
(cont'd) The equivalent pipelined filter is then given by
$H(z) = \big(1 + 2r\cos\theta\, z^{-1} + (1 + 2\cos 2\theta)\, r^2 z^{-2} + 2 r^3 \cos\theta\, z^{-3} + r^4 z^{-4}\big) \,/\, \big(1 - 2 r^3 \cos 3\theta\, z^{-3} + r^6 z^{-6}\big)$
When $\theta = 2\pi/3$, only 1 additional pole and zero, at z = r, is required for 3-stage pipelining, since the remaining canceling locations $r e^{j(2\pi/3 + 2\pi/3)} = r e^{-j2\pi/3}$ coincide with the original conjugate pole. The equivalent pipelined filter is then given by
$H(z) = \dfrac{1}{1 + r z^{-1} + r^2 z^{-2}} = \dfrac{1 - r z^{-1}}{(1 + r z^{-1} + r^2 z^{-2})(1 - r z^{-1})} = \dfrac{1 - r z^{-1}}{1 - r^3 z^{-3}}$
Example (Example 10.4.6, p.332): Consider the 2nd-order filter with real poles at $z = r_1$ and $z = r_2$. The transfer function is given by
$H(z) = 1 \,/\, \big(1 - (r_1 + r_2) z^{-1} + r_1 r_2 z^{-2}\big)$
Chapter 10 24
(cont'd) A 3-stage pipelined realization is derived by adding poles (and zeros) at $\{r_1 e^{\pm j2\pi/3}, r_2 e^{\pm j2\pi/3}\}$. The pipelined realization is given by
$H(z) = \dfrac{1 + (r_1 + r_2) z^{-1} + (r_1^2 + r_1 r_2 + r_2^2) z^{-2} + r_1 r_2 (r_1 + r_2) z^{-3} + r_1^2 r_2^2 z^{-4}}{1 - (r_1^3 + r_2^3) z^{-3} + r_1^3 r_2^3 z^{-6}}$
The pole-zero locations of a 3-stage pipelined 2nd-order filter with poles at z = 1/2 and z = 3/4 are shown in Fig. 6.
CONCLUSIONS:
If the original filter is stable, then the scattered look-ahead approach leads to stable pipelined filters, because the additional poles lie at the same distance from the origin as the original poles.
Multiplication complexity: (NM + 1) multiplications for the non-recursive portion in (10.17) and N for the recursive portion, so the total pipelined filter multiplication complexity is (NM + N + 1).
Chapter 10 25
Fig. 6: Pole-zero representation of a 3-stage pipelined equivalent stable filter derived using the scattered look-ahead approach (original poles at z = 1/2 and z = 3/4).
(cont'd) The multiplication complexity is linear with respect to M and is much greater than that of clustered look-ahead. The latch complexity is quadratic in M, because each multiplier is pipelined by M stages.
Chapter 10 26
4. Scattered Look-Ahead Pipelining with Power-of-2
Decomposition
This kind of decomposition technique will lead to a logarithmic
increase in multiplication complexity (hardware) with respect to the
level of pipelining
Let the transfer function of the recursive digital filter be
$H(z) = \sum_{i=0}^{N} b_i z^{-i} \,\big/\, \big(1 - \sum_{i=1}^{N} a_i z^{-i}\big) = N(z)/D(z)$   (10.18)
A 2-stage pipelined implementation can be obtained by multiplying the numerator and the denominator by $\big(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\big)$, i.e. by D(-z). The equivalent 2-stage pipelined implementation is described by
$H(z) = \dfrac{\big(\sum_{i=0}^{N} b_i z^{-i}\big)\big(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\big)}{\big(1 - \sum_{i=1}^{N} a_i z^{-i}\big)\big(1 - \sum_{i=1}^{N} (-1)^i a_i z^{-i}\big)} = \dfrac{N'(z)}{D'(z)}$   (10.19)
where D'(z) contains only even powers of $z^{-1}$.
Chapter 10 27
(cont'd) Similarly, subsequent transformations can be applied to obtain 4-, 8-, and 16-stage pipelined implementations, respectively.
Thus, to obtain an M-stage pipelined implementation (for power-of-2 M), $\log_2 M$ sets of such transformations need to be applied. The resulting equivalent transfer function (with M pipelining stages inside the recursive loop) requires $(2N + N\log_2 M + 1)$ multiplications, i.e. a complexity that is logarithmic with respect to M.
Note: the total number of delays (or latches) is approximately $NM(1 + \log_2 M)$: about NM delays in the non-recursive portion, and $NM\log_2 M$ delays for pipelining each of the multipliers by M stages.
In the decomposed realization, the 1st stage implements an N-th order non-recursive section, and the subsequent stages respectively implement non-recursive sections of order 2N, 4N, …, NM/2.
Chapter 10 28
Example (Example 10.4.7, p.334): Consider a 2nd-order recursive filter described by
$H(z) = \dfrac{Y(z)}{U(z)} = \dfrac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 - 2 r \cos\theta\, z^{-1} + r^2 z^{-2}}$
The poles of the system are located at $z = r e^{\pm j\theta}$ (see Fig. 7(a)). The pipelined filter requires $(2\log_2 M + 5)$ multiplications and is described by
$H(z) = \dfrac{\big(\sum_{i=0}^{2} b_i z^{-i}\big) \prod_{i=0}^{\log_2 M - 1} \big(1 + 2 r^{2^i} \cos(2^i\theta)\, z^{-2^i} + r^{2^{i+1}} z^{-2^{i+1}}\big)}{1 - 2 r^M \cos(M\theta)\, z^{-M} + r^{2M} z^{-2M}}$,  where M is a power of 2.
The 2M poles of the transformed transfer function (shown in Fig. 7(b)) are located at $z = r e^{j(\pm\theta + 2\pi i/M)}$, $i = 0, 1, 2, \ldots, M-1$.
Chapter 10 29
Fig. 7: (also see Fig.10.11, p.336) Pole-zero representation for Example 10.4.7; the decomposition of the zeros is shown in (c).
Chapter 10 30
Parallel Processing in IIR Filters
Parallel processing can also be used in the design of IIR filters.
We first discuss parallel processing for a simple 1st-order IIR filter, and then discuss higher-order filters.
Example (Example 10.5.1, p.339): Consider the transfer function of a 1st-order IIR filter given by
$H(z) = z^{-1} \,/\, \big(1 - a z^{-1}\big)$
where $|a| < 1$ for stability. This filter has only 1 pole, located at z = a. The corresponding input-output relation can be written as y(n+1) = a y(n) + u(n).
Consider the design of a 4-parallel architecture (L = 4) for the foregoing filter. Note that in the parallel system each delay element is referred to as a block delay, where the clock period of the block system is 4 times the sample period. Therefore, the loop update equation should update y(n+4) using y(n) and the inputs.
Chapter 10 31
By iterating the recursion (or by applying the look-ahead technique), we get
$y(n+4) = a^4 y(n) + a^3 u(n) + a^2 u(n+1) + a\, u(n+2) + u(n+3)$
Substituting n = 4k:
$y(4k+4) = a^4 y(4k) + a^3 u(4k) + a^2 u(4k+1) + a\, u(4k+2) + u(4k+3)$   (10.20)
The corresponding architecture is shown in Fig. 8.
Fig. 8: (also see Fig.10.14, p. 340) Loop update for the 4-parallel 1st-order filter
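As a check of the block update (10.20), here is a minimal sketch (illustrative pole value; not from the slides) confirming that iterating $y(4k+4) = a^4 y(4k) + \ldots$ reproduces every 4th output of the sequential recursion:

```python
import numpy as np

a, L = 0.9, 4                              # illustrative pole and block size
u = np.random.randn(80)                    # input length divisible by L

# Sequential reference: y(n+1) = a*y(n) + u(n), y(0) = 0
y = np.zeros(len(u) + 1)
for n in range(len(u)):
    y[n + 1] = a * y[n] + u[n]

# Block loop update (10.20): yb[k] holds y(4k)
yb = np.zeros(len(u) // L + 1)
for k in range(len(u) // L):
    yb[k + 1] = (a**4 * yb[k] + a**3 * u[4*k] + a**2 * u[4*k + 1]
                 + a * u[4*k + 2] + u[4*k + 3])

assert np.allclose(yb, y[::L])             # block outputs match every 4th sequential output
```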
Chapter 10 32
The pole of the original filter is at z = a, whereas the pole of the parallel system is at $z = a^4$, which is much closer to the origin since $|a| < 1$. An important implication of this pole movement is the improved robustness of the system to round-off noise.
A straightforward block-processing structure for L = 4, obtained by substituting n = 4k+4, 4k+5, 4k+6 and 4k+7 in (10.20), is shown in Fig. 9. The hardware complexity of this architecture is $L^2$ multiply-add operations, because L multiply-add operations are required for each output and there are L outputs in total.
The square increase in hardware complexity can be reduced by exploiting the concurrency in the computation (the decomposition property of the scattered look-ahead mode cannot be exploited in the block processing mode, because one hardware delay element represents L sample delays).
Fig. 9: (also Fig.10.15, p.341) A 4-parallel 1
st
-order recursive filter
D
2
a
3
a
) 4 4 ( + k u ) 5 4 ( + k u ) 6 4 ( + k u ) 3 4 ( + k u
4
a
) 7 4 ( + k y
) 3 4 ( + k y
a
D
2
a
3
a
4
a
) 6 4 ( + k y ) 2 4 ( + k y
a
D
D
2
a
3
a
4
a
) 5 4 ( + k y ) 1 4 ( + k y
a
D
3
a
2
a
4
a
) 6 4 ( + k y
) 4 ( k y
a
D
D
Chapter 10 34
Trick: instead of computing y(4k+1), y(4k+2) and y(4k+3) independently, we can use y(4k) to compute y(4k+1), use y(4k+1) to compute y(4k+2), and use y(4k+2) to compute y(4k+3), at the expense of an increase in the system latency. This leads to a significant reduction in hardware complexity.
This method is referred to as incremental block processing; y(4k+1), y(4k+2) and y(4k+3) are computed incrementally.
Example (Example 10.5.2, p.341): Consider the same 1st-order filter as in the last example. To derive its 4-parallel filter structure with minimum hardware complexity, instead of simply repeating the hardware 4 times as in Fig. 9, the incremental computation technique can be used to reduce the hardware complexity.
First, design the circuit for computing y(4k) (the loop update of Fig. 8).
Then, derive y(4k+1) from y(4k), y(4k+2) from y(4k+1), and y(4k+3) from y(4k+2) by using
$y(4k+1) = a\,y(4k) + u(4k)$
$y(4k+2) = a\,y(4k+1) + u(4k+1)$
$y(4k+3) = a\,y(4k+2) + u(4k+2)$
Chapter 10 35
The complete architecture is shown in Fig. 10.
The hardware complexity is reduced from $L^2$ to (2L - 1) multiply-add operations, at the expense of an increase in the computation time for y(4k+1), y(4k+2) and y(4k+3).
Fig. 10: (also see Fig.10.16, p.342) Incremental block filter structure with L = 4
Chapter 10 36
Example (Example 10.5.3, p.342): Consider a 2nd-order IIR filter described by the transfer function (10.21); its pole-zero locations are shown in Fig. 11. Derive a 3-parallel IIR filter where in every clock cycle 3 inputs are processed and 3 outputs are generated.
$H(z) = \big(1 + z^{-1}\big)^2 \,/\, \big(1 - \tfrac{5}{4} z^{-1} + \tfrac{3}{8} z^{-2}\big)$   (10.21)
Since the filter order is 2, 2 outputs need to be updated independently and the 3rd output can be computed incrementally, outside the feedback loop, using the 2 updated outputs. Assume that y(3k) and y(3k+1) are computed using loop update operations and y(3k+2) is computed incrementally.
From the transfer function, we have
$y(n) = \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n), \quad f(n) = u(n) + 2u(n-1) + u(n-2)$   (10.22)
The loop update process for the 3-parallel system is shown in Fig. 12, where y(3k+3) and y(3k+4) are computed using y(3k) and y(3k+1).
Chapter 10 37
Fig. 11: Pole-zero plot for the transfer function (10.21). Fig. 12: Loop update for block size L = 3.
Chapter 10 38
The computation of y(3k+3) using y(3k) & y(3k+1) can be carried out if
y(n+3) can be computed using y(n) & y(n+1). Similarly y(3k+4) can be
computed using y(3k) & y(3k+1) if y(n+4) can be expressed in terms of
y(n) & y(n+1) (see Fig.13). These state update operations correspond to
clustered look-ahead operation for M=2 and 3 cases. The 2-stage and 3-
stage clustered look-ahead equations are derived as:
$y(n) = \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n)$
$y(n) = \tfrac{19}{16}\, y(n-2) - \tfrac{15}{32}\, y(n-3) + f(n) + \tfrac{5}{4}\, f(n-1)$
$y(n) = \tfrac{65}{64}\, y(n-3) - \tfrac{57}{128}\, y(n-4) + f(n) + \tfrac{5}{4}\, f(n-1) + \tfrac{19}{16}\, f(n-2)$   (10.23)
Chapter 10 39
Substituting n = 3k+3 and n = 3k+4 into (10.23), we have the following 2 loop update equations:
$y(3k+3) = \tfrac{19}{16}\, y(3k+1) - \tfrac{15}{32}\, y(3k) + f(3k+3) + \tfrac{5}{4}\, f(3k+2)$
$y(3k+4) = \tfrac{65}{64}\, y(3k+1) - \tfrac{57}{128}\, y(3k) + f(3k+4) + \tfrac{5}{4}\, f(3k+3) + \tfrac{19}{16}\, f(3k+2)$   (10.24)
The output y(3k+2) can be obtained incrementally as follows:
$y(3k+2) = \tfrac{5}{4}\, y(3k+1) - \tfrac{3}{8}\, y(3k) + f(3k+2)$
The block structure is shown in Fig. 14.
Fig. 13: Relationship of the recursive outputs y(n), …, y(n+4)
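To confirm that the loop updates (10.24) together with the incremental output reproduce the original 2nd-order filter, here is a small numerical sketch (random illustrative input; the block bookkeeping is mine, not from the slides):

```python
import numpy as np

u = np.random.randn(90)
up = np.concatenate(([0.0, 0.0], u))          # zero initial conditions
f = up[2:] + 2 * up[1:-1] + up[:-2]           # f(n) = u(n) + 2u(n-1) + u(n-2), eq. (10.22)

# Sequential reference: y(n) = 5/4 y(n-1) - 3/8 y(n-2) + f(n)
y_ref = np.zeros(len(u) + 2)                  # two leading zeros = initial state
for n in range(len(u)):
    y_ref[n + 2] = 5/4 * y_ref[n + 1] - 3/8 * y_ref[n] + f[n]
y_ref = y_ref[2:]

# 3-parallel structure: two loop updates (10.24) + one incremental output per block
y = np.zeros(len(u))
y[0], y[1] = y_ref[0], y_ref[1]               # initial states y(0), y(1)
K = (len(u) - 5) // 3 + 1                     # number of complete blocks
for k in range(K):
    y[3*k + 3] = 19/16 * y[3*k + 1] - 15/32 * y[3*k] + f[3*k + 3] + 5/4 * f[3*k + 2]
    y[3*k + 4] = (65/64 * y[3*k + 1] - 57/128 * y[3*k]
                  + f[3*k + 4] + 5/4 * f[3*k + 3] + 19/16 * f[3*k + 2])
    y[3*k + 2] = 5/4 * y[3*k + 1] - 3/8 * y[3*k] + f[3*k + 2]   # incremental output

assert np.allclose(y[:3*K + 2], y_ref[:3*K + 2])
```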
Chapter 10 40
Fig. 14: Block structure of the 2nd-order IIR filter (L = 3) (also see Fig.10.20, p.344)
Chapter 10 41
Comments
The original sequential system has 2 poles, at {1/2, 3/4}. Now consider the pole locations of the new parallel system. Rewriting the 2 loop update equations in matrix form, Y(3k+3) = A·Y(3k) + F, i.e.
$\begin{bmatrix} y(3k+3) \\ y(3k+4) \end{bmatrix} = \begin{bmatrix} -15/32 & 19/16 \\ -57/128 & 65/64 \end{bmatrix}\begin{bmatrix} y(3k) \\ y(3k+1) \end{bmatrix} + \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}$   (10.25)
where f₁ and f₂ collect the input terms of (10.24).
The eigenvalues of the system matrix A are $(1/2)^3$ and $(3/4)^3$, which are the poles of the new parallel system; thus the parallel system is more stable. Note: the parallel system has the same number of poles as the original system.
For a 2nd-order IIR filter (N = 2), there are in total 3L + [(L-2) + (L-1)] + 4 + 2(L-2) = 7L - 3 multiplications (the numerator part 3L; the overhead of the loop update [(L-2) + (L-1)]; the loop multiplications 4; the incremental computation 2(L-2)). The multiplication complexity is a linear function of the block size L. It can be further reduced by using fast parallel filter structures and substructure sharing for the incrementally computed outputs.
Chapter 10 42
Combined Pipelining and Parallel Processing
For IIR Filters
Pipelining and parallel processing can also be combined for IIR filters
to achieve a speedup in sample rate by a factor LM, where L denotes
the levels of block processing and M denotes stages of pipelining, or to
achieve power reduction at the same speed
Example (Example 10.6.1, p.345): Consider the 1st-order IIR filter with transfer function (10.26). Derive the filter structure with 4-level pipelining and 3-level block processing (i.e., M = 4, L = 3).
$H(z) = 1 \,/\, \big(1 - a z^{-1}\big)$   (10.26)
Because the filter order is 1, only 1 loop update operation is required. The remaining 2 outputs of each block can be computed incrementally.
Chapter 10 43
Since the pipelining level is M = 4, the loop must contain 4 delay elements (shown in Fig. 15). Since the block size is L = 3, each delay element represents a block delay (corresponding to 3 sample delays). Therefore, y(3k+12) needs to be expressed in terms of y(3k) and the inputs (see Fig. 15).
Fig.15: Loop update for the pipelined block system
(also see Fig.10.21, p. 346)
Chapter 10 44
Iterating y(n) = a y(n-1) + u(n) gives the 12-step look-ahead form
$y(n) = a^{12} y(n-12) + a^{11} u(n-11) + \cdots + u(n)$
Substituting n = 3k+12, we get:
$y(3k+12) = a^{12} y(3k) + a^{11} u(3k+1) + \cdots + u(3k+12) = a^{12} y(3k) + a^{6} f_2(3k+6) + a^{3} f_1(3k+9) + f_1(3k+12)$
where
$f_1(3k+12) = a^{2} u(3k+10) + a\,u(3k+11) + u(3k+12)$
$f_2(3k+12) = f_1(3k+12) + a^{3} f_1(3k+9)$
Finally, we have:
$y(3k+12) = a^{12} y(3k) + a^{6} f_2(3k+6) + a^{3} f_1(3k+9) + f_1(3k+12)$
$y(3k+1) = a\,y(3k) + u(3k+1)$
$y(3k+2) = a\,y(3k+1) + u(3k+2)$   (10.27)
The parallel-pipelined filter structure is shown in Fig. 16.
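The following sketch (illustrative value of a; the helper functions f1 and f2 simply implement the definitions above, and the bookkeeping is mine) verifies that the pipelined-block loop update of (10.27) agrees with the plain sequential recursion:

```python
import numpy as np

a = 0.9
u = np.random.randn(120)

# Sequential reference: y(n) = a*y(n-1) + u(n), zero initial state
y_ref = np.zeros(len(u))
for n in range(len(u)):
    y_ref[n] = a * (y_ref[n - 1] if n else 0.0) + u[n]

def f1(n):   # f1(n) = u(n) + a*u(n-1) + a^2*u(n-2)   (first non-recursive section)
    return u[n] + a * u[n - 1] + a**2 * u[n - 2]

def f2(n):   # f2(n) = f1(n) + a^3*f1(n-3)            (second non-recursive section)
    return f1(n) + a**3 * f1(n - 3)

# Loop update of (10.27) plus the two incremental outputs per block (L = 3, M = 4).
y = np.zeros(len(u))
y[:12] = y_ref[:12]                        # initial contents of the four block delays
for k in range(len(u) // 3 - 4):
    y[3*k + 12] = (a**12 * y[3*k] + a**6 * f2(3*k + 6)
                   + a**3 * f1(3*k + 9) + f1(3*k + 12))
    y[3*k + 13] = a * y[3*k + 12] + u[3*k + 13]
    y[3*k + 14] = a * y[3*k + 13] + u[3*k + 14]

assert np.allclose(y, y_ref)
```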
Chapter 10 45
Fig. 16: The filter structure of the pipelined block system with L = 3 and M = 4 (also see Fig.10.22, p.347)
Chapter 10 46
Comments
The parallel-pipelined filter has 4 poles: $\{\pm a^3, \pm j a^3\}$. Since the pipelining level is 4 and the filter order is 1, there are in total 4 poles in the new system, separated by equal angular spacing. Since the block size is 3, the distance of the poles from the origin is $|a|^3$.
Note: the decomposition method is used here in the pipelining phase. The multiplication complexity (assuming the pipelining level M to be a power of 2) can be calculated as in (10.28); it is linear with respect to L and logarithmic with respect to M:
$2(L - 1) + \log_2 M + 1$   (10.28)
Example (Example 10.6.2, p. 347): Consider the 2nd-order filter of Example 10.5.3 again, and design a pipelined-block system for L = 3 and M = 2:
$y(n) = \tfrac{5}{4}\, y(n-1) - \tfrac{3}{8}\, y(n-2) + f(n), \quad f(n) = u(n) + 2u(n-1) + u(n-2)$   (10.29)
Chapter 10 47
A method similar to clustered look-ahead can be used to update y(3k+6)
and y(3k+7) using y(3k) and y(3k+1). Then by index substitution, the final
system of equations can be derived.
Suppose the system update matrix is A. Since the poles of the original system are $\{1/2, 3/4\}$, the eigenvalues of A can be verified to be $\{(1/2)^6, (3/4)^6\}$. The poles of the new parallel-pipelined second-order filter are the square roots of the eigenvalues of A, i.e., $\{\pm(1/2)^3, \pm(3/4)^3\}$.
Comments: in general, the systematic approach below can be used to compute the pole locations of the new parallel-pipelined system:
1. Write the loop update equations using LM-step look-ahead, where M and L denote the levels of pipelining and parallel processing, respectively.
2. Write the state-space representation of the parallel-pipelined filter, where the state matrix A has dimension N × N and N is the filter order.
3. Compute the eigenvalues $\lambda_i$, $1 \le i \le N$, of the matrix A.
4. The NM poles of the new parallel-pipelined system correspond to the M-th roots of the eigenvalues of A, i.e., $(\lambda_i)^{1/M}$.
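A minimal numerical illustration of this four-step procedure for the Example 10.6.2 parameters (the companion-matrix construction of A and all variable names are my own; the slides only state the procedure):

```python
import numpy as np

# Original poles {1/2, 3/4}, block size L = 3, pipelining level M = 2, filter order N = 2.
p1, p2, L, M, N = 0.5, 0.75, 3, 2, 2

# Steps 1-2: the LM-step look-ahead loop update in state-space form. For an all-pole
# 2nd-order filter with state [y(n), y(n-1)]^T, the one-step state matrix is the
# companion matrix of the denominator, and the block system's matrix is its LM-th power.
den = np.poly([p1, p2])                       # 1 - 5/4 z^-1 + 3/8 z^-2
A1 = np.array([[-den[1], -den[2]],
               [1.0,      0.0   ]])
A = np.linalg.matrix_power(A1, L * M)

# Step 3: the eigenvalues of A are the original poles raised to the LM-th power.
lam = np.sort(np.linalg.eigvals(A).real)
assert np.allclose(lam, np.sort([p1**(L * M), p2**(L * M)]))   # (1/2)^6 and (3/4)^6

# Step 4: the NM poles of the parallel-pipelined system are the M-th roots of the
# eigenvalues: here +/-(1/2)^3 and +/-(3/4)^3.
poles = [l**(1 / M) * np.exp(2j * np.pi * k / M) for l in lam for k in range(M)]
assert len(poles) == N * M
print(np.round(np.array(poles), 4))
```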