Adaptive Digital Filters

Branko Kovacevic, Zoran Banjac, Milan Milosavljevic

Academic Mind, University of Belgrade - School of Electrical Engineering
University Singidunum, Belgrade
Springer-Verlag Berlin Heidelberg, 2013

Branko Kovacevic, University of Belgrade, Belgrade, Serbia
Milan Milosavljevic, University of Belgrade and University Singidunum, Belgrade, Serbia
Zoran Banjac, School of Electrical and Computing Engineering of Applied Studies, Belgrade, Serbia
Preface
introduce the reader into the field of advanced modern algorithms, some of which
represent a contribution of the authors of the book. The work in the field of
adaptive signal processing requires the use of a complex mathematical apparatus.
The manner of exposition in this book presumes a detailed presentation of the
mathematical models, a task done by the authors in a clear and consistent way. The
chosen approach enables everyone with a college level of mathematics knowledge
to successfully follow the mathematical derivations and descriptions of algorithms
in the book. The algorithms are presented by flow charts, which facilitates their
practical implementation. The book gives many experimental results and treats the
aspects of practical application of adaptive filtering in real systems. The book will
be useful both to students of undergraduate and graduate studies, and to all of those
who did not have an opportunity to master this important science field during their
formal education.
The authors would like to express their gratitude to the referees for their useful
suggestions and advice, which contributed significantly to the quality of the book.
The text of the book is divided into six chapters.
The first, introductory chapter considers the three most often used
theoretical approaches to the design of linear filters: the conventional approach,
optimal filtering, and adaptive filtering. The remainder of the text analyzes only the
third approach, i.e., adaptive filtering.
Chapter 2 presents the basic structures of adaptive filters. It also considers the
criterion function for the optimization of the parameters of adaptive filters and
analyzes the two basic numerical methods for the determination of the minimum
of the criterion function: the Newton method and the method of steepest descent.
After presenting the basic concept of adaptive filtering, it reviews the standard
and derived adaptive algorithms of the Least Mean Square (LMS) type
and the Recursive Least Squares (RLS) algorithm, for the sake of further analysis and
assessment of possible modifications that would improve the characteristics
of these adaptive algorithms. Also, the potential advantages of
Infinite Impulse Response (IIR) filters impose a need for their more intensive use,
as well as for an analysis of how the solutions proposed for systems with
Finite Impulse Response (FIR) can be adapted to IIR systems.
This is the reason why the second chapter devotes attention to this
problem too.
An analysis of the ability of adaptive algorithms to follow nonstationary
changes in the system, together with the synthesis of efficient algorithms based on
variable forgetting factor, is presented in Chap. 3. A comparison has been made
among a number of strategies for the choice of the forgetting factor (extended
prediction error, parallel adaptation, and the Fortescue-Kershenbaum-Ydstie algorithm)
against their ability to follow nonstationary changes and the complexity of the
implementation of algorithms. The most convenient strategies for the choice of
variable forgetting factor from the practical point of view were emphasized.
Chapter 4 presents an original approach to the design of an FIR-type adaptive
algorithm with a goal to increase the convergence speed in the parameter esti-
mation process. The approach is based on an optimum approach to the
Contents

1 Introduction
  1.1 Conventional Approach to the Design of Digital Filters
  1.2 Optimal Filters
    1.2.1 Wiener Filter
    1.2.2 Kalman Filter
  1.3 Adaptive Filters
2 Adaptive Filtering
  2.1 Introduction
  2.2 Structures of Digital Filters
    2.2.1 Filters with Infinite Impulse Response (IIR Filters)
    2.2.2 Filters with Finite Impulse Response (FIR Filters)
  2.3 Criterion Function for the Estimation of FIR Filter Parameters
    2.3.1 Mean Square Error (Risk) Criterion: MSE Criterion
    2.3.2 Minimization of the Criterion of Mean Square Error (Risk)
  2.4 Adaptive Algorithms for the Estimation of Parameters of FIR Filters
    2.4.1 Least Mean Square (LMS) Algorithm
    2.4.2 Least Squares Algorithm (LS Algorithm)
    2.4.3 Recursive Least Squares (RLS) Algorithm
    2.4.4 Weighted Recursive Least Squares (WRLS) Algorithm with Exponential Forgetting Factor
  2.5 Adaptive Algorithms for the Estimation of the Parameters of IIR Filters
    2.5.1 Recursive Prediction Error Algorithm (RPE Algorithm)
    2.5.2 Pseudo-Linear Regression (PLR) Algorithm
References
Index
1 Introduction
Electrical filters find numerous practical applications, often for very different
purposes. One of the basic applications of filters is suppression of the influence of
noise or interference, with a goal to extract useful components of the signal. A
large subgroup is linear filters, their crucial property being a linear relation
between the signals at their input and at their output. Basically, there are three
theoretical approaches to the design of such filters: (1) conventional approach; (2)
optimal filtering, the so-called Wiener or Kalman filtering and (3) self-adjusting
filters or adaptive filtering.
Although this book is mostly dedicated to the third approach to the design of
linear filters, i.e. to the synthesis of self-adjusting or adaptive digital filtering, for
the sake of completeness the introductory chapter will consider the other two
approaches to the design of such systems.
or excitation signal); (3) system realization. Although these three steps are not
independent, special care is dedicated in the literature to the second step, since the
first step is primarily dependent on the field of application of the filter, while the
third step is connected with the existing implementation technology. It is interesting to note that digital filters are often used to process signals obtained from
continuous signals utilizing analog-to-digital (A/D) converters. When a digital filter
is used to process analog signals, the specification of the digital filter and the
effective continual filters (whose digital approximation is designed) is given in the
frequency domain. This is primarily valid for frequency selective filters, like low-
pass filter (LP), band-pass filter (BP) and high-pass filter (HP). If the sampling
(discretization) period, T, is sufficiently short, no overlap will occur between the
frequency components from different periods in the periodic frequency characteristics (spectrum) of the discretized continual filter (the aliasing effect). Thus in
the Nyquist range $|\Omega| < \pi/T$, where $\Omega$ is the analog angular frequency, the digital filter
will behave virtually identically to the desired continual filter with the frequency
response

$$H_{\mathrm{eff}}(j\Omega) = \begin{cases} H(e^{j\Omega T}), & |\Omega| < \pi/T \\ 0, & |\Omega| \ge \pi/T \end{cases} \quad (1.1)$$
In this case the specification that may be posed to the effective continual filter
may also be posed as the requirements for the digital filter, by introducing the substitution $\omega = \Omega T$ ($\omega$ is denoted as the digital, and $\Omega$ as the analog angular frequency), i.e.
$H(e^{j\omega})$ becomes the specification of the digital filter within one (basic) period of the
infinite and periodic frequency response of the digital filter (which represents a
periodic function of the argument $\omega$, with a period $2\pi$)

$$H(e^{j\omega}) = H_{\mathrm{eff}}\!\left(j\frac{\omega}{T}\right), \quad |\omega| < \pi \quad (1.2)$$
A typical characteristic of a digital filter is shown in Fig. 1.1, plotted for a
normalized (digital) frequency $0 \le \omega \le \pi$.
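Carrying out the substitution $\omega = \Omega T$ numerically (the sampling rate and the analog cutoff frequency below are arbitrary example values, not taken from the book):

```python
import math

# Example values (assumptions): 8 kHz sampling, 1 kHz analog cutoff
fs = 8000.0                       # sampling frequency [Hz]
T = 1.0 / fs                      # sampling (discretization) period
Omega_c = 2 * math.pi * 1000.0    # analog angular frequency [rad/s]

# Digital angular frequency per the substitution omega = Omega * T
omega_c = Omega_c * T             # [rad/sample]

# The Nyquist range |Omega| < pi/T maps onto the basic period |omega| < pi
print(omega_c)                    # pi/4 ~= 0.7854
print(omega_c < math.pi)          # True: inside the basic period
```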
In order to satisfy the posed requirements, the real characteristics must fulfill
the following:
$$1 - \delta_1 \le \left|H(e^{j\omega})\right| \le 1 + \delta_1, \quad |\omega| \le \omega_p, \quad \omega_p = \Omega_p T \quad (1.3)$$

$$\left|H(e^{j\omega})\right| \le \delta_2, \quad \omega_s \le \omega \le \pi, \quad \omega_s = \Omega_s T \quad (1.4)$$
Many practically used filters are specified in the presented manner, without any
limitation posed to their phase characteristics, stability (a system is stable if its
output in a steady or equilibrium state which commences after its transient state
vanishes is dictated by its excitation only) and causality (a causal system has a
property that its output signal or response is equal to zero before the input or
excitation signal is brought to it). At the same time it is known that an Infinite
Impulse Response (IIR) filter must be causal and stable, while the basic
assumption for a Finite Impulse Response (FIR) filter is that its phase is linear,
since such a filter by itself represents a stable and causal system. In any case,
according to the specified requirements one should choose a filter with frequency
characteristics satisfying the posed limits, which represents a problem of functional
approximation (Fig. 1.2).
The traditional approach to the design of IIR filters is based on the transformation
of the transfer function of a continuous filter satisfying given requirements into a
corresponding discrete transfer function [1, 3]. This approach stems from the
following:
- Procedures for the design of continuous filters are well researched, numerous, and they furnish good results;
- Many methods for continuous filter design are very simple, and their transformation to the digital domain significantly simplifies the numerical procedure of digital filter design;
- Many simple methods used for the design of continuous filters do not furnish simple solutions in closed form if directly applied to digital filter design.
The Butterworth filter and the Chebyshev filter are most often used for the
design of continuous filters.
The Butterworth LP filter is designed to satisfy the requirement to have a
maximally flat amplitude-frequency characteristic in the passband. For an N-order
filter this means that the first $(2N-1)$ derivatives of the squared amplitude
characteristic are equal to zero at $\Omega = 0$.

[Fig. 1.2: Satisfactory amplitude-frequency characteristics of an LP filter, with passband ripple bounds $1 \pm \delta_1$, stopband bound $\delta_2$, and band edges $\omega_P$, $\omega_S$]
The zeroes of the polynomial in the denominator (the so-called poles of the
filter) in the last expression are
associated with the function $H_c(s)$. Thus the stable transfer function of a Butterworth filter can be written in the form

$$H_c(s) = \frac{K}{(s - s_1)\cdots(s - s_N)}, \quad (1.8)$$

where the poles $s_i$ are taken from the circle with the radius $\Omega_c$ in the left half-plane of
the s-plane, while the amplification $K$ is calculated from the condition of unit
amplification at zero frequency

$$H_c(0) = 1 \;\Rightarrow\; K = (-1)^N \prod_{i=1,N} s_i. \quad (1.9)$$
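The pole placement and gain computation of (1.8)-(1.9) can be sketched in a few lines; the order N and cutoff $\Omega_c$ below are arbitrary example values, not taken from the book:

```python
import numpy as np

N = 4                 # example filter order (assumption)
Omega_c = 1.0         # example cutoff frequency in rad/s (assumption)

# Left-half-plane poles of the Butterworth filter: equally spaced on the
# circle of radius Omega_c, at angles strictly between pi/2 and 3*pi/2
k = np.arange(1, N + 1)
poles = Omega_c * np.exp(1j * np.pi * (2 * k + N - 1) / (2 * N))

# Gain from the unit-amplification-at-DC condition (1.9):
# H_c(0) = K / prod(-s_i) = 1  =>  K = (-1)^N * prod(s_i)
K = (-1) ** N * np.prod(poles)

def H_c(s):
    return K / np.prod(s - poles)

print(np.all(poles.real < 0))        # True: all poles are stable (LHP)
print(abs(H_c(0.0)))                 # ~= 1 (unit DC gain)
print(abs(H_c(1j * Omega_c)))        # ~= 1/sqrt(2), i.e. -3 dB at the cutoff
```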
$$b = \frac{1}{2}\left(\alpha^{1/N} + \alpha^{-1/N}\right). \quad (1.14)$$
The poles of the filter are determined by a procedure in which in the first step
one sketches $2N$ half-lines in the s-plane with their starting point in the origin,
equidistant (each two adjacent half-lines form an angle equal to $\pi/N$
radians) and positioned symmetrically with regard to the real and the imaginary
axis. In other words, these half-lines form an angle of $i\pi/N$, $i = 0, 1, \ldots, 2N-1$,
with the positive part of the real axis if N is odd, and an angle of
$(i + 0.5)\pi/N$, $i = 0, 1, \ldots, 2N-1$, if N is an even number. After that one determines
the intersections of these half-lines with the circle whose center is in the origin
and whose radius is $a\Omega_c$. The real parts of these intersections, which are negative,
simultaneously represent the real parts of the desired poles. The imaginary parts of
the desired poles are determined from the intersections of the corresponding half-lines
with a circle with its center in the origin and with a radius $b\Omega_c$. The procedure
for the determination of the fourth-order filter poles (N = 4) is shown in Fig. 1.5.
The obtained poles simultaneously lie on the ellipse with semi-axes $a\Omega_c$ and
$b\Omega_c$.
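As a cross-check of this geometric construction, the sketch below uses the standard closed-form Chebyshev pole locations (assumed here as a reference, not quoted from the book; order and ripple are arbitrary example values) and verifies that the poles are stable and lie on the stated ellipse:

```python
import numpy as np

N = 4            # example order (assumption)
eps = 0.5        # example ripple parameter epsilon (assumption)
Omega_c = 1.0    # example cutoff (assumption)

alpha = 1.0 / eps + np.sqrt(1.0 + 1.0 / eps**2)
a = 0.5 * (alpha**(1.0 / N) - alpha**(-1.0 / N))   # minor semi-axis factor
b = 0.5 * (alpha**(1.0 / N) + alpha**(-1.0 / N))   # major semi-axis factor

# Standard closed-form Chebyshev pole locations, used as the reference
m = np.arange(1, N + 1)
theta = (2 * m - 1) * np.pi / (2 * N)
poles = Omega_c * (-a * np.sin(theta) + 1j * b * np.cos(theta))

# All poles lie in the left half-plane (stable) ...
print(np.all(poles.real < 0))                      # True
# ... and on the ellipse with semi-axes a*Omega_c and b*Omega_c
on_ellipse = (poles.real / (a * Omega_c))**2 + (poles.imag / (b * Omega_c))**2
print(np.allclose(on_ellipse, 1.0))                # True
```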
In that case one adopts the following form for the transfer function of the
Chebyshev filter

$$H_c(s) = \frac{K}{(s - s_1)(s - s_2)\cdots(s - s_N)}, \quad (1.15)$$

where

$$K = \frac{\Omega_c^N}{2^{N-1}\varepsilon}. \quad (1.16)$$
A Chebyshev filter of the second kind is characterized by the fact that its
squared amplitude characteristic is given by the following expression
$$\left|H_c(j\Omega)\right|^2 = \frac{1}{1 + \varepsilon^2 V_N^2(\Omega/\Omega_c)}. \quad (1.17)$$
There are several ways to transform a continuous filter into a digital filter. One
of them is the method of impulse invariance, another one is based on the bilinear
transformation, while the third is the so-called matching method [3].

The impulse invariance method starts from the assumption that the impulse
response of a digital filter should be equal to the values of the impulse
response of a continuous filter in the moments of sampling

$$h[n] = T\,h_c(nT). \quad (1.18)$$
However, the matching method is much more often used; it is not based on the
generation of the impulse response in the time domain, but on the transformation of the
filter transfer function. This method starts from the assumption that the transfer
function of a continuous filter can be written as a sum of partial fractions

$$H_c(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k}. \quad (1.19)$$
$$h_c(t) = \begin{cases} \sum\limits_{k=1}^{N} A_k e^{s_k t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \quad (1.20)$$
In that case the impulse response of the digital filter should assume the form

$$h[n] = T\,h_c(nT) = \sum_{k=1}^{N} T A_k e^{s_k nT} u[n] = \sum_{k=1}^{N} T A_k \left(e^{s_k T}\right)^n u[n]. \quad (1.21)$$
$$s = \frac{2}{T}\,\frac{1 - z^{-1}}{1 + z^{-1}}. \quad (1.23)$$
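As an illustration of (1.23), the minimal sketch below applies the bilinear substitution to a first-order analog LP section $H_c(s) = \Omega_c/(s + \Omega_c)$ (an example prototype, not from the book) and checks the preserved DC gain together with the well-known tangent frequency warping $\Omega = (2/T)\tan(\omega/2)$:

```python
import numpy as np

T = 1.0          # example sampling period (assumption)
Omega_c = 1.0    # example analog cutoff of H_c(s) = Omega_c / (s + Omega_c)

def H_analog(s):
    return Omega_c / (s + Omega_c)

def H_digital(z):
    # Bilinear substitution (1.23): s = (2/T) * (1 - z^-1) / (1 + z^-1)
    s = (2.0 / T) * (1.0 - 1.0 / z) / (1.0 + 1.0 / z)
    return H_analog(s)

# DC gain is preserved (z = 1 maps to s = 0)
print(abs(H_digital(1.0)))             # 1.0

# The analog cutoff Omega_c lands at the warped digital frequency
omega_warped = 2.0 * np.arctan(Omega_c * T / 2.0)
mag = abs(H_digital(np.exp(1j * omega_warped)))
print(np.isclose(mag, 2 ** -0.5))      # True: -3 dB at the warped cutoff
```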
Until now we discussed the design of LP filters only. If we desire to design
another type of filter (e.g. an LP filter with another cutoff frequency, or an HP, BP, or
band-stop (BS) filter), it is necessary to transform the complex variable z. Thus if
the filter $H_{LP}(Z)$ is defined, all other types can be obtained in the following manner
[3]

$$H(z) = H_{LP}(Z)\big|_{Z^{-1} = G(z^{-1})}, \quad (1.24)$$

where the function G must be rational, while the interior of the unit circle must
map again into the interior of a unit circle. Constantinides found in 1970 the most
general form of this function [3]
$$Z^{-1} = G(z^{-1}) = \pm \prod_{k=1}^{N} \frac{z^{-1} - a_k}{1 - a_k z^{-1}}. \quad (1.25)$$
According to this one may form the following table, which enables mapping
of the obtained transfer function for an LP filter to the transfer function of a digital
filter of the corresponding LP, HP, BP or BS type. Here $\theta_p$ denotes the passband edge of the LP prototype, and $\omega_p$, or $\omega_{p1}$ and $\omega_{p2}$, the passband edges of the desired filter:

LP: $Z^{-1} = \dfrac{z^{-1} - a}{1 - a z^{-1}}$, with $a = \dfrac{\sin[(\theta_p - \omega_p)/2]}{\sin[(\theta_p + \omega_p)/2]}$

HP: $Z^{-1} = -\dfrac{z^{-1} + a}{1 + a z^{-1}}$, with $a = -\dfrac{\cos[(\theta_p + \omega_p)/2]}{\cos[(\theta_p - \omega_p)/2]}$

BP: $Z^{-1} = -\dfrac{z^{-2} - \frac{2ak}{k+1} z^{-1} + \frac{k-1}{k+1}}{\frac{k-1}{k+1} z^{-2} - \frac{2ak}{k+1} z^{-1} + 1}$, with $a = \dfrac{\cos[(\omega_{p2} + \omega_{p1})/2]}{\cos[(\omega_{p2} - \omega_{p1})/2]}$, $k = \cot\dfrac{\omega_{p2} - \omega_{p1}}{2}\,\tan\dfrac{\theta_p}{2}$

BS: $Z^{-1} = \dfrac{z^{-2} - \frac{2a}{1+k} z^{-1} + \frac{1-k}{1+k}}{\frac{1-k}{1+k} z^{-2} - \frac{2a}{1+k} z^{-1} + 1}$, with $a = \dfrac{\cos[(\omega_{p2} + \omega_{p1})/2]}{\cos[(\omega_{p2} - \omega_{p1})/2]}$, $k = \tan\dfrac{\omega_{p2} - \omega_{p1}}{2}\,\tan\dfrac{\theta_p}{2}$
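The LP-to-LP row of the table can be verified numerically: the change of variable is all-pass, so the prototype's response at $\theta_p$ must reappear at the new edge $\omega_p$. A minimal sketch, with an arbitrary first-order prototype and arbitrary band edges (both assumptions, not from the book):

```python
import numpy as np

theta_p = 0.5 * np.pi   # band edge of the LP prototype (example value)
omega_p = 0.3 * np.pi   # desired band edge of the transformed filter (example)

# LP -> LP transformation coefficient from the table
a = np.sin((theta_p - omega_p) / 2.0) / np.sin((theta_p + omega_p) / 2.0)

def Z_inv(z):
    # All-pass change of variable: Z^-1 = (z^-1 - a) / (1 - a z^-1)
    return (1.0 / z - a) / (1.0 - a / z)

def H_proto(Z):
    # Example first-order digital LP prototype (an assumption)
    return 0.5 * (1.0 + 1.0 / Z)

def H_new(z):
    return H_proto(1.0 / Z_inv(z))

# The substitution maps the unit circle onto itself (all-pass property) ...
z = np.exp(1j * 0.7)
print(np.isclose(abs(Z_inv(z)), 1.0))            # True
# ... and carries the prototype's response at theta_p to omega_p
m1 = abs(H_proto(np.exp(1j * theta_p)))
m2 = abs(H_new(np.exp(1j * omega_p)))
print(np.isclose(m1, m2))                        # True
```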
1.2 Optimal Filters

The design of an optimal filter [4-9] assumes the use of the optimization theory,
with a goal to arrive at a solution that is optimal with regard to a previously defined
criterion. These criteria are usually based on the minimization of the mean square
value of the difference between the actual output of the filter and a reference
signal or a desired filter response, and the obtained structure is often denoted as the
Wiener filter [7-9].
1.2.1 Wiener Filter

[Fig. 1.6: Block diagram of the optimal (Wiener) filtering problem: the input signal y(t) is corrupted by additive noise v(t), giving the measured signal z(t); the filter W(s) produces the output $\hat{y}(t)$; the reference system I(s) produces the reference signal $y_i(t)$; the error signal is $\tilde{y}(t) = y_i(t) - \hat{y}(t)$]

The criterion is the mean square value of the error signal,

$$\mathrm{MSE} = E\{\tilde{y}^2(t)\}. \quad (1.26)$$
Since the mean value of the error signal is zero, i.e. $E\{\tilde{y}(t)\} = 0$, the expression
(1.26) also defines the variance of the error (the error is the deviation of the filter
response from the reference or desired response). The minimization of the MSE
criterion (1.26) is easier to perform in the frequency domain than in the time
domain, which is done by applying Parseval's theorem to (1.26)
$$\mathrm{MSE} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_{\tilde{y}}(s)\,ds, \quad (1.27)$$

where $S_{\tilde{y}}(s)$ is the spectral power density of the stochastic error signal $\tilde{y}(t)$, i.e.
$S_{\tilde{y}}(s) = E\{\tilde{Y}(s)\tilde{Y}(-s)\}\big|_{s=j\omega}$. The reformulation of the optimization problem (1.26)
into the frequency domain, i.e. the relation (1.27), assumes the existence of the error
spectral power density $S_{\tilde{y}}(s)$, i.e. the stationarity of the random signals.
according to Fig. 1.6 one has ~yt yi t ^yt, by applying the Laplace trans-
form (operator) to the previous relation one obtains
Y~ s Yi s Y^ s; 1:28
where
Yi s I sY s; 1:29
Y^ s W sZ s W sY s V s: 1:30
By replacing (1.29) and (1.30) into (1.28), one obtains
Y~ s I s W sY s W sV s: 1:31
Since according to the defining relation for the spectral power density

$$S_{\tilde{y}}(s) = E\{\tilde{Y}(s)\tilde{Y}(-s)\}, \quad (1.32)$$

by replacing (1.31) into (1.32), one may write

$$S_{\tilde{y}}(s) = [I(s) - W(s)]\,S_y(s)\,[I(-s) - W(-s)] + W(s)\,S_v(s)\,W(-s). \quad (1.33)$$

While deriving (1.33) it has been taken into account that $y(t)$ and $v(t)$ are
uncorrelated stochastic signals, i.e. that the cross-spectral power density is
$S_{yv}(s) = E\{Y(s)V(-s)\} = 0$. By introducing the expression (1.33) into (1.27), the
MSE criterion assumes the form

$$\mathrm{MSE} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{ [I(s) - W(s)]\,S_y(s)\,[I(-s) - W(-s)] + W(s)\,S_v(s)\,W(-s) \right\} ds. \quad (1.34)$$
The optimization problem under consideration now reduces to the choice of the
transfer function $W(s)$ so that the MSE criterion (1.34) is minimized. Thus posed, the
optimization problem represents a classical task of the variational calculus [9].
Following the methodology of the variational calculus, let us denote the required
optimal transfer function by $W_o(s)$, an arbitrary fixed transfer function by $g(s)$,
and a scalar parameter with adjustable value by $\varepsilon$, where $W(s) = W_o(s) + \varepsilon g(s)$. In
this case the MSE criterion, as a function of the parameter $\varepsilon$ for different fixed choices
of $g$, will have a qualitative dependence as shown in Fig. 1.7.
It is obvious from Fig. 1.7 that the necessary and sufficient conditions for the
minimum of the criterion are

$$\left.\frac{\partial\,\mathrm{MSE}}{\partial\varepsilon}\right|_{\varepsilon=0} = 0, \qquad \left.\frac{\partial^2\,\mathrm{MSE}}{\partial\varepsilon^2}\right|_{\varepsilon=0} > 0, \quad (1.35)$$
for any variation of the parameter $g$. The application of the variational procedure
to the problem under consideration reduces to the following. By replacing

$$W(s) = W_o(s) + \varepsilon g(s) \quad (1.36)$$

into the MSE criterion (1.34) one obtains

$$\mathrm{MSE}(\varepsilon, g) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{ [I(s) - W_o(s) - \varepsilon g(s)]\,S_y(s)\,[I(-s) - W_o(-s) - \varepsilon g(-s)] + [W_o(s) + \varepsilon g(s)]\,S_v(s)\,[W_o(-s) + \varepsilon g(-s)] \right\} ds, \quad (1.37)$$
so that the first relation in (1.35) (the necessary condition for the minimum of the
criterion) reduces to

$$\left.\frac{\partial\,\mathrm{MSE}}{\partial\varepsilon}\right|_{\varepsilon=0} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} \left\{ W_o(s)[S_y(s) + S_v(s)] - I(s)S_y(s) \right\} g(-s)\,ds + \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} g(s)\left\{ [S_y(s) + S_v(s)]\,W_o(-s) - S_y(s)I(-s) \right\} ds. \quad (1.38)$$

Since the two integrals in (1.38) are equal (which is seen by substituting $s \to -s$ and
using the evenness of the spectral densities), the necessary condition for the minimum
becomes

$$\int_{-j\infty}^{j\infty} \left\{ W_o(s)[S_y(s) + S_v(s)] - I(s)S_y(s) \right\} g(-s)\,ds = 0. \quad (1.39)$$
$$S_z(s) = S_y(s) + S_v(s) = D(s)D(-s), \quad (1.42)$$

where $D(s)$ is a real rational function with zeroes and poles located in the left half
of the s-plane. This ensures that the complex functions $D(s)$ and $D^{-1}(s)$ are
regular in the right half of the s-plane. The task (1.42) is denoted as the problem of
spectral factorization and in the general case it cannot be solved analytically, in
closed form, but requires instead the application of suitable numerical algorithms.
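Since (1.42) must in general be solved numerically, the following sketch factors a simple rational example spectrum (chosen purely for illustration, not taken from the book) by collecting the left-half-plane roots of its numerator and denominator:

```python
import numpy as np

# Example spectrum (an assumption, for illustration):
#   S_z(s) = (4 - s^2) / (1 - s^2),  which is >= 0 on the imaginary axis
S_num = np.array([-1.0, 0.0, 4.0])   # 4 - s^2
S_den = np.array([-1.0, 0.0, 1.0])   # 1 - s^2

def lhp_factor(coeffs):
    """Monic polynomial built from the left-half-plane roots of `coeffs`."""
    roots = np.roots(coeffs)
    return np.poly(roots[roots.real < 0])

D_num = lhp_factor(S_num)   # s + 2
D_den = lhp_factor(S_den)   # s + 1

def D(s):
    return np.polyval(D_num, s) / np.polyval(D_den, s)

def S(s):
    return np.polyval(S_num, s) / np.polyval(S_den, s)

# Verify the factorization S_z(s) = D(s) * D(-s) at a few test points
for s in [0.3j, 1.7j, 0.5 + 0.2j]:
    print(np.isclose(D(s) * D(-s), S(s)))   # True
```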
By replacing (1.42) into (1.39), we obtain

$$\int_{-j\infty}^{j\infty} \left[ W_o(s)D(s) - I(s)S_y(s)D^{-1}(-s) \right] D(-s)\,g(-s)\,ds = 0. \quad (1.43)$$
The next step is to decompose the term $I(s)S_y(s)D^{-1}(-s)$ into partial fractions, i.e. to
represent it in the form

$$I(s)S_y(s)D^{-1}(-s) = A(s) + B(s), \quad (1.44)$$

$$A(s) = \left[ I(s)S_y(s)D^{-1}(-s) \right]_{Fo}, \quad (1.45)$$

where the symbol $[\,\cdot\,]_{Fo}$ denotes the physically realizable part of the corresponding
transfer function in the brackets (the part with all poles in the left half of the s-plane), while $B(s)$ contains the remaining, unrealizable part. By replacing (1.45) into (1.43) one obtains
$$\int_{-j\infty}^{j\infty} \left[ W_o(s)D(s) - A(s) \right] D(-s)\,g(-s)\,ds - \int_{-j\infty}^{j\infty} B(s)D(-s)\,g(-s)\,ds = 0. \quad (1.46)$$
$$\int_{-j\infty}^{j\infty} B(s)D(-s)\,g(-s)\,ds = \oint_{C} B(s)D(-s)\,g(-s)\,ds = \sum_{i=1}^{n} \operatorname{Res}\,(s_i) = 0, \quad (1.47)$$
$$W_o(s) = A(s)D^{-1}(s), \quad (1.49)$$

i.e.

$$W_o(s) = \left[ I(s)S_y(s)D^{-1}(-s) \right]_{Fo} D^{-1}(s). \quad (1.50)$$
In the case when the signal $y(t)$ and the noise $v(t)$ are correlated (the cross-spectral power densities $S_{yv}(s) = E\{Y(s)V(-s)\} \ne 0$ and
$S_{vy}(s) = E\{V(s)Y(-s)\} \ne 0$), the optimal Wiener filter is defined by the transfer
function

$$W_o(s) = \left[ I(s)S_{zy}(s)D^{-1}(-s) \right]_{Fo} D^{-1}(s), \quad (1.51)$$

where

$$D(s)D(-s) = S_z(s) = S_y(s) + S_v(s) + S_{yv}(s) + S_{vy}(s), \quad (1.52)$$

while

$$S_{zy}(s) = S_y(s) + S_{vy}(s). \quad (1.53)$$
The relation (1.51) defines the transfer function of an optimal analog Wiener
filter. In a similar manner one may derive the discrete transfer function of an optimal
digital Wiener filter

$$W_o(z) = \left[ I(z)S_{zy}(z)D^{-1}(z^{-1}) \right]_{Fo} D^{-1}(z), \quad (1.54)$$

where $[\,\cdot\,]_{Fo}$ denotes the physically realizable part of the discrete transfer function in
brackets (a physically realizable discrete transfer function has all its poles within
the region bounded by the unit circle, $|z| \le 1$ [3]; the connection between the complex
s-plane and the complex z-plane is defined as $z = \exp\{sT\}$, where $T$ is the period
of discretization of the analog signal [3]).
The spectral factorization (1.52) now assumes the form

$$D(z)D(z^{-1}) = S_z(z) = S_y(z) + S_v(z) + S_{yv}(z) + S_{vy}(z), \quad (1.55)$$

where the cross-spectral power density is

$$S_{zy}(z) = E\{Z(z)Y(z^{-1})\} = E\{[Y(z) + V(z)]\,Y(z^{-1})\} = E\{Y(z)Y(z^{-1})\} + E\{V(z)Y(z^{-1})\} = S_y(z) + S_{vy}(z). \quad (1.56)$$
Let us note that the discrete version of the analog optimal filter can also be
obtained directly from its analog transfer function, by using some of the discretization techniques described earlier in this chapter.
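To make the idea concrete, the sketch below applies the noncausal simplification of the discrete Wiener filter, $W = S_y/(S_y + S_v)$ (taking $I(z) = 1$ and uncorrelated noise; this is an illustrative reduction, not the book's causal construction (1.54)), to a noisy sinusoid via the FFT:

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(4096)
y = np.sin(2 * np.pi * 0.01 * n)                   # useful signal (example)
sigma2 = 0.25                                      # noise variance (example)
z = y + rng.normal(0.0, np.sqrt(sigma2), n.size)   # measured signal z = y + v

# Idealized spectral densities on the FFT grid (assumed known here):
# the signal's own periodogram and a white-noise floor of variance sigma2
S_y = np.abs(np.fft.fft(y))**2 / n.size
S_v = np.full(n.size, sigma2)

# Noncausal Wiener weighting W = S_y / (S_y + S_v), applied via the FFT
W = S_y / (S_y + S_v)
y_hat = np.real(np.fft.ifft(W * np.fft.fft(z)))

mse_noisy = np.mean((z - y)**2)
mse_wiener = np.mean((y_hat - y)**2)
print(mse_wiener < mse_noisy)   # True: the filter reduces the error power
```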
The Wiener filter did not find wider application in engineering practice, since
Kalman solved the posed optimization problem in the time domain without the
requirement of stationarity of the stochastic signals. Also, the Kalman filter
represents a much simpler solution in the numerical sense compared to the Wiener
filter [7-10].
1.2.2 Kalman Filter

The theory of Kalman filtering starts from the assumption that the nature of the
signal of interest is stochastic and that it is generated by passing a white stochastic
process (white noise) through a linear dynamical system [7-9]. Contrary to the
Wiener theory, in which a linear dynamic system is described by an input-output
relation, i.e. by a corresponding transfer function in the complex domain, here the
system is represented in the time domain by a model in the state space. The quoted
model encompasses a dynamic state equation (which in the case of continual
signals represents a vectorial linear differential equation of the first order, while in
the case of discrete signals this equation becomes a vectorial linear difference
equation of the first order) and an algebraic equation of the system output. In the
discrete case these two equations can be written in the following manner
$$x(k+1) = F\,x(k) + G\,\omega(k), \quad x(0), \quad (1.57)$$

$$y(k) = H\,x(k) + v(k), \quad (1.58)$$
3) $\omega(k)$ is a white stochastic process with zero mean value which excites the
linear model (1.57), the so-called state or process noise, i.e.

$$E\{\omega(k)\} = 0, \qquad E\{\omega(k)\omega^T(j)\} = Q(k)\,\delta_{k,j} \quad (1.60)$$

for each $k, j = 1, 2, \ldots$, where $\delta_{k,j}$ is the Kronecker delta symbol ($\delta_{k,j} = 0$ for $k \ne j$
and $\delta_{k,j} = 1$ for $k = j$), and $Q(k)$ represents the covariance matrix of this noise.
4) $v(k)$ is white stochastic noise with zero mean value, which represents the additive
noise in the output Eq. (1.58), i.e. measurement noise:

$$E\{v(k)\} = 0, \qquad E\{v(k)v^T(j)\} = R(k)\,\delta_{k,j} \quad (1.61)$$

for each $k, j = 1, 2, \ldots$, where $\delta_{k,j}$ is the Kronecker delta symbol, and $R(k)$ represents
the covariance matrix of this noise.
5) $F$, $G$ and $H$ are given matrices of corresponding dimensions, which in the general
case may also depend on the time index $k$. If these matrices are constant, while
the noise covariance matrices $Q(k)$ and $R(k)$ also do not depend on the time
index $k$, i.e. $Q(k) = Q$ and $R(k) = R$, the considered model is time-invariant
or stationary.
6) Further, we assume that the vectorial stochastic variables $x(0)$, $\omega(k)$ and $v(k)$
are mutually uncorrelated, so that

$$E\{\omega(k)v^T(j)\} = 0, \quad E\{\omega(k)[x(0) - m_0]^T\} = 0, \quad E\{v(k)[x(0) - m_0]^T\} = 0 \quad (1.62)$$

for each $k, j = 1, 2, \ldots$.
Let us note that the dynamic Eq. (1.57) represents a model-generator of the state
vector, i.e. the physical mechanism generating the components of the state vector
as the physical variables of interest, while the algebraic Eq. (1.58) describes the
mechanism of measurement (observation) of the output signal, taking into account
the inaccuracy of the sensor itself, expressed through additive noise. Thus the output
signal of interest is generated as a linear combination of the components of the
state vector, additionally contaminated by measurement noise.
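The model (1.57)-(1.58) is straightforward to simulate; the scalar sketch below (all numerical values are illustrative assumptions) generates the state and measurement sequences:

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar example of the model (1.57)-(1.58); all values are assumptions
F, G, H = 0.95, 1.0, 1.0
Q, R = 0.01, 0.1          # covariances of the state and measurement noise
N = 200

x = np.zeros(N)           # state sequence x(k), with x(0) = 0
for k in range(N - 1):
    omega = rng.normal(0.0, np.sqrt(Q))    # state (process) noise omega(k)
    x[k + 1] = F * x[k] + G * omega        # dynamic state equation (1.57)

v = rng.normal(0.0, np.sqrt(R), N)         # measurement noise v(k)
y = H * x + v                              # algebraic output equation (1.58)

print(y.shape)            # (200,)
```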
The Kalman filter itself represents a recursive numerical algorithm generating
an estimation of the immeasurable state vector in the current discrete moment, with
time index $k$, based on the available estimation of the state vector in the preceding
discrete moment, with time index $k-1$, and the newly obtained measurement
in the given discrete moment, $y(k)$. If we introduce the following notation
$\hat{x}(k|k) \equiv \hat{x}(k)$: estimation of the state vector $x(k)$ in the moment $k$, after the
last measurement $y(k)$ has been performed;
$\hat{x}(k|k-1) \equiv \hat{x}^-(k)$: estimation of the state vector $x(k)$ in the moment $k$, before
the last measurement $y(k)$ has been performed,

then the estimation $\hat{x}(k)$ can be formed based on the predictor-corrector algorithm representing a linear combination of the previously known estimation $\hat{x}^-(k)$
and the newly obtained measured information $y(k)$, i.e.

$$\hat{x}(k) = \bar{K}(k)\,\hat{x}^-(k) + K(k)\,y(k), \quad (1.63)$$

where $\bar{K}(k)$ and $K(k)$ are unknown matrices dependent on the time moment $k$.
These matrices represent free parameters in the algorithm (1.63) and they should
be chosen so that the estimation (1.63) gives the best possible approximation of the
immeasurable random vector variable $x(k)$. The problem thus posed imposes the
requirement to define criteria or indicators for the appraisal of the quality of the
estimation. Similar to the well-known criteria for the appraisal of the quality of
measuring equipment, such as its accuracy and precision, the notions of unbiased
estimation and of the total variance of the estimation error are introduced here. If we
introduce the corresponding notation for the estimation errors as

$$\tilde{x}(k) = x(k) - \hat{x}(k), \quad (1.64)$$

$$\tilde{x}^-(k) = x(k) - \hat{x}^-(k), \quad (1.65)$$
this estimation will be equal to the mean (expected) value of the immeasurable
estimated stochastic state vector itself (the so-called accuracy condition), while the
actual realizations of the estimation will lie in some neighborhood around this
mean value, this neighborhood being smaller the smaller the value of the criterion
(1.69), i.e. the higher the precision. Bearing this in mind, the unknown weighting
matrices in the estimation algorithm (1.63) can be determined from the condition of
unbiased estimation (1.66) and from the condition of minimum of the criterion (1.69),
and such an estimation is denoted as the minimum error variance estimation,
or the discrete Kalman filter. Let us note too that $\hat{x}(k)$ is also called the filtered
estimation, and $\hat{x}^-(k)$ the single-step prediction [7-9]. Based on (1.58), (1.63)
and (1.64) we may further write
$$\tilde{x}(k) = x(k) - \bar{K}(k)\hat{x}^-(k) - K(k)[Hx(k) + v(k)] - \bar{K}(k)x(k) + \bar{K}(k)x(k),$$

where the last two addends are included artificially and their sum is zero, from
which it follows, taking into account (1.65), that

$$\tilde{x}(k) = [I - \bar{K}(k) - K(k)H]\,x(k) + \bar{K}(k)\,\tilde{x}^-(k) - K(k)\,v(k). \quad (1.70)$$
Since it is assumed that $E\{v(k)\} = 0$, and if it is also assumed that the previous
estimation was unbiased, i.e. $E\{\tilde{x}^-(k)\} = 0$, the condition (1.66) will be satisfied
for each $x(k)$ only for

$$I - \bar{K}(k) - K(k)H = 0 \;\Rightarrow\; \bar{K}(k) = I - K(k)H. \quad (1.71)$$
Replacing (1.71) into (1.63) and (1.70), it is obtained that

$$\hat{x}(k) = \hat{x}^-(k) + K(k)\left[ y(k) - H\hat{x}^-(k) \right], \quad (1.72)$$

$$P(k) = P^-(k) - K(k)HP^-(k) - P^-(k)H^T K^T(k) + K(k)\left[ HP^-(k)H^T + R \right] K^T(k), \quad (1.74)$$
where $R = E\{v(k)v^T(k)\}$. When deriving the solution (1.74) it was taken into
account that

$$E\{\tilde{x}^-(k)\,v^T(k)\} = 0, \qquad E\{v(k)\,\tilde{x}^{-T}(k)\} = 0, \quad (1.75)$$

since the error $\tilde{x}^-(k)$ depends on the realizations of the state noise $\omega(i)$,
$i = 1, 2, \ldots, k-1$, which is assumed to be uncorrelated with the measurement noise
$v(k)$. By replacing (1.74) into (1.69), we obtain the expression for the total variance of the estimation (filtration) error.
The optimum gain is then obtained from the condition

$$\frac{\partial \sigma^2(k)}{\partial K(k)} = 0. \quad (1.77)$$
The application of the operator of partial differentiation to (1.76) and the cal-
culation of the corresponding partial derivatives requires the knowledge of the
following rules for the differentiation of the matrix trace, as a scalar expression,
over a matrix argument [7]
$$\frac{\partial}{\partial A}\operatorname{Trace}\{BAC\} = B^T C^T, \quad (1.78)$$

$$\frac{\partial}{\partial A}\operatorname{Trace}\{ABA^T\} = 2AB \quad \text{for } B = B^T. \quad (1.79)$$
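Both differentiation rules can be spot-checked against finite differences on random example matrices (an illustrative verification only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
C = rng.normal(size=(n, n))
Bs = B + B.T                      # a symmetric matrix for rule (1.79)

def num_grad(f, A, h=1e-6):
    """Central finite-difference gradient of the scalar f(A) w.r.t. A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            G[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return G

# Rule (1.78): d/dA Trace{B A C} = B^T C^T
g1 = num_grad(lambda A: np.trace(B @ A @ C), A)
print(np.allclose(g1, B.T @ C.T, atol=1e-5))      # True

# Rule (1.79): d/dA Trace{A B A^T} = 2 A B  for symmetric B
g2 = num_grad(lambda A: np.trace(A @ Bs @ A.T), A)
print(np.allclose(g2, 2 * A @ Bs, atol=1e-5))     # True
```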
Applying the partial differentiation operator to (1.76), and using the condition
(1.77), it is obtained that

$$-2\,\frac{\partial}{\partial K(k)}\operatorname{Trace}\{K(k)HP^-(k)\} + \frac{\partial}{\partial K(k)}\operatorname{Trace}\left\{ K(k)\left[ HP^-(k)H^T + R \right] K^T(k) \right\} = 0. \quad (1.80)$$
If we further apply the rule (1.78) to the first term in (1.80), with $B = I$,
$A = K(k)$ and $C = HP^-(k)$, and the rule (1.79) to the second term in (1.80), with
$A = K(k)$ and $B = HP^-(k)H^T + R$, the relation (1.80) reduces to

$$\frac{\partial \sigma^2(k)}{\partial K(k)} = -2P^-(k)H^T + 2K(k)\left[ HP^-(k)H^T + R \right] = 0, \quad (1.81)$$

from which we obtain the optimum amplification matrix

$$K(k) = P^-(k)H^T \left[ HP^-(k)H^T + R \right]^{-1}. \quad (1.82)$$
If the right side of the expression (1.81) is partially differentiated once more,
applying the rule (1.78) to the second addend in (1.81), again choosing $B = I$,
$A = K(k)$ and $C = HP^-(k)H^T + R$, it can be written that

$$\frac{\partial^2 \sigma^2(k)}{\partial K^2(k)} = 2\left[ HP^-(k)H^T + R \right] > 0, \quad (1.83)$$
from which we conclude that the optimum solution (1.82) corresponds to the
minimum of the scalar criterion (1.69). By replacing the optimum amplification
(1.82) into the expression (1.74), we obtain the expression for the covariance
matrix of estimation (filtration) error
$$P(k) = P^-(k) - P^-(k)H^T \left[ HP^-(k)H^T + R \right]^{-1} HP^-(k), \quad (1.84)$$

or, if we introduce the relation (1.82) into the expression (1.84),

$$P(k) = \left[ I - K(k)H \right] P^-(k). \quad (1.85)$$
Relations (1.72), (1.82) and (1.84) or (1.85) define the estimation correction
step based on measurements or the estimation (filtration) step in a discrete Kalman
filter. Obviously the realization of this step assumes that prior to it the prediction
step was realized, i.e. that the values $\hat{x}^-(k)$ and $P^-(k)$ are known. Let us note that
the variable

$$\tilde{y}(k) = y(k) - H\hat{x}^-(k) \quad (1.86)$$
is denoted as the measurement residual or the innovation. Namely, based on the
output Eq. (1.58) it is possible to estimate (predict) its expected value before the
signal itself is measured. Since the measurement noise is assumed to have zero mean
value, and in the moment $k$, immediately before the signal $y(k)$ is measured, the
prediction $\hat{x}^-(k)$ of the immeasurable random state $x(k)$ is known, the expected
value or prediction of the output in the moment $k$ is

$$\hat{y}^-(k) = H\hat{x}^-(k). \quad (1.87)$$
In this way, the complete new measurement in the moment $k$, $y(k)$, does not
introduce entirely new information about the system, since before the measurement
itself we have already predicted its value (1.87) according to the measurement model
(1.58), so that the only new information in the measurement is its residual or
innovation (1.86). Further, according to (1.58), (1.65) and (1.86) we obtain

$$\tilde{y}(k) = H\tilde{x}^-(k) + v(k), \quad (1.88)$$
from which we conclude that the mean (expected) value of the residual is
$E\{\tilde{y}(k)\} = 0$. This follows from the assumption that the measurement noise $v(k)$ has
zero mean value, $E\{v(k)\} = 0$, and that the prediction $\hat{x}^-(k)$ is an unbiased
estimation of the state, i.e. $E\{\tilde{x}^-(k)\} = 0$. The covariance matrix of the residual
(which reduces to a variance in the case of a one-dimensional signal) is given by the
expression

$$S(k) = E\{\tilde{y}(k)\tilde{y}^T(k)\} = HP^-(k)H^T + R. \quad (1.89)$$
To derive the expression (1.89) we used the relations (1.75) and (1.88). Similarly to the measurement noise $v(k)$, the residual $\tilde{y}(k)$ also represents white noise with
zero mean value and with the corresponding covariance matrix $S(k)$ in (1.89)
(the covariance matrix of the noise $v(k)$ is $R$) [7-9].
The prediction of the system state may be regarded as its estimation (filtration) when
a measurement is unavailable (in that case the covariance matrix of the measurement
noise $R \to \infty$, so that the Kalman amplification in (1.82) is equal to zero, i.e. $K = 0$).
Further, if we include $K(k) = 0$ in (1.72) and (1.85), it follows that

$$\hat{x}(k) = \hat{x}^-(k)\big|_{K(k)=0}, \qquad P(k) = P^-(k)\big|_{K(k)=0}.$$
The prediction $\hat{x}^-(k)$ itself represents the extrapolation of the state estimation
(filtration) in the moment $k-1$, $\hat{x}(k-1)$, to the moment $k$, immediately before
the signal $y(k)$ is measured. This extrapolation may be done based on the model
(1.57) itself, bearing in mind that the expected value of the noise $\omega(k)$ is equal to zero.
Then, according to (1.57), we may write

$$\hat{x}^-(k) = F\hat{x}(k-1). \quad (1.90)$$
Relation (1.90) is derived from (1.57) by replacing the state vector xk by
^xk , and the state vector in the previous moment xk 1 with ^xk1 ,
simultaneously neglecting state noise x k. The prediction error, ~xk in (1.65),
can be written as
~xk Fxk Gxk F^xk1 F~xk1 Gxk
so that the covariance matrix of the prediction error, $P(k^-)$ in (1.68), is given by
$$P(k^-) = FP(k-1)F^T + GQG^T. \qquad (1.91)$$
Writing the covariance matrix of the estimation error in the alternative, information form,
$$P^{-1}(k) = P^{-1}(k^-) + H^TR^{-1}H, \qquad (1.93)$$
the expression for the amplification matrix of the Kalman filter (1.82) reduces to
$$\begin{aligned}
K(k) &= P(k)P^{-1}(k)P(k^-)H^T\left[HP(k^-)H^T + R\right]^{-1}\\
&= P(k)\left[P^{-1}(k^-) + H^TR^{-1}H\right]P(k^-)H^T\left[HP(k^-)H^T + R\right]^{-1}\\
&= P(k)H^T\left[I + R^{-1}HP(k^-)H^T\right]\left[HP(k^-)H^T + R\right]^{-1}\\
&= P(k)H^TR^{-1}\left[R + HP(k^-)H^T\right]\left[HP(k^-)H^T + R\right]^{-1},
\end{aligned}$$
from where it stems that
$$K(k) = P(k)H^TR^{-1}. \qquad (1.94)$$
The expression (1.94) shows that the value of Kalman filter amplification
K depends on the size of the covariance matrix of estimation error P, which
represents a figure of merit of the system state estimation quality, and on the size
of the covariance matrix of measurement noise R, which describes the accuracy of
the output signal measurement in the system. A high value of noise, i.e. a high $R$,
and a small filtration error, i.e. a low $P$, show that the residual $\tilde{y}$ is mostly the
consequence of measurement noise. The filter therefore imparts a small weight, i.e. a
small amplification $K$, to this residual, which does not bear important information
about the estimated state, so that the estimated (filtered) state will be close to its
prediction. On the other hand, low measurement noise (a low $R$) and a large error in
the system state estimation (a high $P$) show that the residual contains significant
information about the estimation error, so that the filter, through its large amplification
$K$, weights the residual $\tilde{y}$ in a significantly larger amount compared to the
prediction of the state vector. Also, the size of the covariance matrix of the prediction
error in (1.91) depends directly on the size of the covariance matrix of state noise $Q$
(the mean power of the state noise). A high $Q$ shows the inadequacy of the signal model
in the state space, while a low $Q$ results in a small error covariance matrix, which shows
that the model in the state space represents an adequate approximation of the real
system of interest. This further shows that the matrix of filter amplification is
proportional to $Q$ and inversely proportional to $R$, i.e.
$$K \sim QR^{-1}. \qquad (1.95)$$
The function of a Kalman filter may be represented by the following Table
(Table 1.1).
A high filter amplification $K$ points to a wide filter bandwidth and a faster filter
response to the excitation in the form of the measured signal, i.e. the measurement
residual. However, a high filter amplification simultaneously means a smaller
degree of noise reduction in the system. On the other hand, a smaller filter
amplification means a smaller bandwidth and a slower response to the excitation in the
form of the measurement signal.
For the implementation of the complete recursive algorithm (Kalman filter) it is
necessary to adopt the starting (initial) values $\hat{x}(0)$, $P(0)$, corresponding to the time
index $k = 0$. If we adopt
$$\hat{x}(0) = E\{x(0)\} = m_0, \qquad P(0) = P_0 = E\left\{[x(0) - m_0][x(0) - m_0]^T\right\}, \qquad (1.96)$$
such a choice ensures that the prediction $\hat{x}(1^-)$ is unbiased.
Indeed, since according to (1.90)
$$\hat{x}(1^-) = F\hat{x}(0), \qquad (1.97)$$
if (1.96) is replaced into (1.97), it follows that
$$\hat{x}(1^-) = Fm_0 = FE\{x(0)\} = E\{Fx(0)\}. \qquad (1.98)$$
Expression (1.98) may be expanded by the zero term $GE\{w(1)\}$, from which
we obtain, according to (1.57), the condition for the unbiased prediction
$$\hat{x}(1^-) = E\{Fx(0) + Gw(1)\} = E\{x(1)\}. \qquad (1.99)$$
The unbiased prediction $\hat{x}(1^-)$ further implies, according to (1.70) and (1.71),
unbiased filtration $\hat{x}(1)$, and this further results in an unbiased $\hat{x}(2^-)$, etc.; using
induction, we conclude that the estimation of the state will be unbiased in each
moment $k$. The equations of a digital Kalman filter for a time-invariant system
model are shown in Table 1.2.
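As an illustration of the recursion summarized above, one prediction and filtration cycle can be sketched in a few lines (a minimal NumPy sketch, not the authors' implementation; the covariance update is taken in the common form $P(k) = (I - KH)P(k^-)$, which agrees with $P(k) = P(k^-)$ for $K = 0$ as noted earlier):

```python
import numpy as np

def kalman_step(x_hat, P, y, F, G, H, Q, R):
    """One prediction-filtration cycle of the discrete Kalman filter."""
    # Prediction (extrapolation), Eqs. (1.90)-(1.91):
    x_pred = F @ x_hat                      # x^(k-) = F x^(k-1)
    P_pred = F @ P @ F.T + G @ Q @ G.T      # P(k-) = F P(k-1) F^T + G Q G^T
    # Residual covariance (1.89) and Kalman amplification (1.82):
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Filtration: correct the prediction by the weighted residual (innovation):
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```

For a scalar model all matrices are 1x1; a high $R$ then shrinks $K$ toward zero and the filtered state stays close to the prediction, exactly as discussed above.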
Table 1.2 shows that, in the case of a time-invariant system model, the calculation
of the covariance matrices of the prediction and estimation (filtration) errors
proceeds independently of the calculation of the prediction and estimation of the
system state. This calculation can therefore be done prior to the real-time
implementation of the filter, so that the designer may assess whether the values of the
parameters in the model of the filter were adopted in an adequate manner (the parameters
of the filter are the matrices $F$, $G$, $H$ in the model of the system, the initial values $m_0$
and $P_0$, as well as the noise statistics $Q$ and $R$). To verify the design of a Kalman
filter in practice, it is necessary to check whether the estimation of the state, on a
measurement sample of finite width, is in agreement (i.e. consistent) with the
theoretical assumptions. The statistical criteria for the analysis of the filter consistency are:
- the estimation (filtration) errors $\tilde{x}(k)$, whose elements should represent random variables with a zero mean value and with an expected (mean) amplitude in accordance with the value of the square root of the corresponding diagonal element of the covariance matrix of estimation error, $P(k)$;
- the residual (innovation) $\tilde{y}(k)$, which should satisfy the same assumptions as the error $\tilde{x}(k)$, where it is only necessary to replace the error covariance, $P(k)$, with the corresponding residual covariance, $S(k)$;
- the residual (innovation), which should represent a white stochastic process.
The last two criteria can be tested in real time, during the operation of the filter
itself, while the first criterion, although the most important one, can be applied
only in a simulated experiment, since the real error, i.e. the system state, is not
known in reality [11]. The application of the quoted criteria is based on statistical
decision theory, i.e. hypothesis testing [11-13], and the reader is referred
to the literature to become acquainted with the topic in more detail.
Let us also note that the important properties of the Kalman filter are the
following:
- the Kalman filter is a linear function of the current measurement, $y(k)$;
- the estimation of the system state $\hat{x}(k)$ explicitly depends only on the current measurement $y(k)$, while its dependence on the previous measurements $Y^{k-1} = \{y(0), y(1), \ldots, y(k-1)\}$ reflects only through their influence on the prediction, $\hat{x}(k^-)$;
- the covariance matrices of the prediction errors, $P(k^-)$, and the estimation errors, $P(k)$, can be calculated in advance, before the implementation of the filter itself, for the case of a time-invariant system model;
- the assumptions that the noise of the state, $w(k)$, and of the measurement, $v(k)$, are white and mutually uncorrelated can be relaxed and replaced by assumptions about correlated (colored) state noise, correlated measurement noise and mutually correlated noises; such assumptions require a certain modification of the filter equations [7, 9-11];
- if the stochastic variables $x(0)$, $w(k)$ and $v(k)$ have a Gaussian (normal) distribution, then the conditional probability density function of the state $x(k)$, when the measurements are given up to the current moment $k$, $Y^k = \{y(0), y(1), \ldots, y(k)\}$, is Gaussian (normal) and its expected value is
$$E\left\{x(k)\,|\,Y^k\right\} = \hat{x}(k).$$
In other words, in the quoted case a Kalman filter generates a recursive conditional mathematical expectation, which represents an optimal estimation of the state vector in the sense of the minimal possible covariance matrix of the estimation error, which reaches the Cramér-Rao lower bound [13-15];
- if the stochastic variables $x(0)$, $w(k)$ and $v(k)$ are not Gaussian, then the Kalman filter is optimal only within the class of linear filters, in the sense of minimal covariance of the estimation error within the said class of filters;
- by replacing (1.84) into (1.91), the vectorial difference equation
$$P(k+1^-) = F\left[P(k^-) - P(k^-)H^T\left(HP(k^-)H^T + R\right)^{-1}HP(k^-)\right]F^T + GQG^T \qquad (1.100)$$
is obtained, which is called the Riccati equation; in the case of a time-invariant system, the
solution of the Riccati equation converges asymptotically ($k \to \infty$) to a finite
solution $P$ if the model in the state space is observable [7, 9-11, 16]; the
condition of the model observability implies that the information about the
immeasurable system states is contained in the measured output, which ensures
that the estimation error remains limited; if, additionally, the model in the state
space is controllable, the solution $P$ in the equilibrium or stochastic steady state
is also unique; the controllability condition ensures that the state noise, as a random
excitation signal, acts on all components of the state vector, which prevents the
convergence to zero of the covariance matrix of estimation error $P$, i.e. the
stationary solution $P$ will be a positive definite matrix with all eigenvalues positive
[7, 9-11, 16].
In a stochastic stationary state the amplification matrix of the Kalman filter, $K$, is
constant, and the Kalman filter reduces to the Wiener optimal filter [7, 9, 10, 16].
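The convergence to this stationary state is easy to illustrate numerically: iterating the Riccati equation (1.100) for a scalar time-invariant model drives $P(k^-)$, and with it the amplification $K$, to a constant value (a sketch with purely illustrative values of $F$, $G$, $H$, $Q$, $R$):

```python
# Hypothetical scalar time-invariant model (illustrative values only)
F, G, H = 0.9, 1.0, 1.0
Q, R = 0.1, 1.0

P = 1.0  # initial prediction-error covariance P(0^-)
for _ in range(200):
    # Riccati recursion (1.100), scalar form:
    P_next = F * (P - P * H * (H * P * H + R) ** -1 * H * P) * F + G * Q * G
    if abs(P_next - P) < 1e-12:  # converged to the stationary solution
        break
    P = P_next

# Stationary Kalman amplification, from (1.82):
K = P * H / (H * P * H + R)
```

With these values the recursion is a contraction, so after a few tens of iterations $P$ no longer changes and $K$ is constant, which is exactly the Wiener-filter regime mentioned above.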
1.3 Adaptive Filters
When estimating the filter parameters, one often uses the squared error signal, or the
mean square error (MSE), as the optimization criterion. Depending on the particular
use of the adaptive filter, the measure of the adaptation success may be
based on the value of the estimated filter parameters, on the filter output signal or
on the error signal.
The most often utilized adaptive filter structures are the transversal (or FIR)
structures, owing to their unconditional stability and the relatively simple analysis of
their properties. Although other structures also find wide use, the
subject of this text is the analysis of adaptive FIR filters in a nonstationary
environment, as well as the possibility to increase their convergence speed.
Namely, the estimation of the parameters of real systems of interest is, as a rule,
accompanied by difficulties stemming from the inherent nonlinearity and/or
nonstationarity of the system, as well as from noisy observational measurements.
In signal processing, the statistical properties of the input and the reference
signal determine the environment of the adaptive filter. Although most of the
analyses of adaptive filters in the available literature are based on the assumption
of a stationary environment, the application of adaptive filters is especially con-
venient for nonstationary environments. Nonstationarity can be categorized with
regard to the change of statistical properties of the input signal, the reference signal
or both simultaneously. In this text we consider the cases when the nonstationary
model is a consequence of the variation of the estimated parameters of the filter,
since all of the quoted types of nonstationarity may be represented in this manner.
The most significant measures of the properties of the adaptive filter in such an
environment are the time necessary for the convergence of the algorithm for the
estimation of the filter parameters at the onset of nonstationary changes, and the
achieved accuracy of the estimated parameters after the convergence has been
reached. Due to the mutual incongruity of these two requirements, the standard
adaptive algorithms, adequate for the estimation of parameters in stationary
conditions, do not give a satisfactory estimation of parameters. Basically, these are
algorithms with unlimited memory, which take into account all previous values of
the analyzed signal when estimating the parameters in the next moment.
The result of such an approach is an estimation of the average behavior of the process
in the time interval under consideration. To analyze nonstationary signals it is
necessary to utilize an algorithm with limited memory, which is achieved by
introducing a variable forgetting factor. By generating such a variable forgetting
factor using residuals, it is possible to adequately follow both slow and abrupt
changes of the time-variable system parameters, without simultaneously impairing
significantly the desired quality of estimation in the stationary mode of operation.
Since FIR adaptive filters find very wide application, it is often necessary to
model systems which are basically IIR in nature with an FIR filter. The consequence of this
approach is that the dimension of the vector of the estimated parameters may be
very large in order to achieve satisfactory characteristics of the modeled system.
Besides that, due to the increase of the dimension of the vector of estimated
parameters, the number of iterations necessary for the convergence of the
algorithm for the filter parameter estimation also grows. This fact represents the
motivation for the synthesis of adaptive algorithms with increased convergence
speed. One of the ways to increase the convergence speed, based on the optimal
design of the input signal, is called D-optimality. The essence of the optimal
experiment design is an adequate choice of the variables included in the experiment,
in such a manner that the experiment itself is maximally informative with regard to
the desired application. Besides that, within the class of informative experiments,
some experiments have the property that the vector of estimated parameters, obtained
based on a finite set of data, reaches the desired value much faster than the vector
of estimated parameters obtained under some other conditions. By choosing a
criterion function maximizing the information content of an equivalent experiment,
in which the object is excited by specially designed sequences, and combining it
with the procedure for the estimation of the unknown filter parameters, one
arrives at the solution for the generation of the optimal input signal.
Let us also recall that the investigation of parameter estimation in various
models of real systems resulted in the development of a number of algorithms
possessing theoretically optimal properties with regard to a chosen criterion. In the
majority of cases, the methods for parameter estimation are based on an a
priori accepted assumption that the random processes in the system have a
Gaussian distribution. Numerous practical examples, however, showed an
insufficient justifiability of such an assumption, especially if there are large realizations
of random perturbations in the system. It was established that optimal estimation
procedures, based on the Gaussian assumption, may be very sensitive to the
deviation of the real distribution of the perturbation from the assumed normal
distribution, which results in estimations of less than satisfactory quality in
many applications. On the other hand, the appearance of impulse noise is often met
in practice when processing speech signals, images and biomedical signals, as well as
in the solution of communication problems. To solve these problems one may use
robust methods, based on the development of stochastic procedures which remain
efficient even in conditions of incomplete a priori information about the
perturbations in the system. Bearing in mind the typical applications [17-21], we
analyzed separately the problem of robustification of adaptive algorithms with
regard to impulse noise at the system output. The presented analysis is based
on the application of the methodology of approximate maximum likelihood, also
called in the literature the Huber M-robust estimation.
Chapter 2
Adaptive Filtering
2.1 Introduction
Adaptive linear filters are linear dynamical systems with variable or adaptive
structure and parameters. They have the property of modifying the values of their
parameters, i.e. their transfer function, during the processing of the input signal, in
order to generate a signal at the output which is free of undesired components,
degradation, noise and interference signals. The goal of the adaptation is to adjust
the characteristics of the filter through an interaction with the environment in order
to reach the desired values [4, 5]. The operation of adaptive filters is based on the
estimation of the statistical properties of the signal in its environment, while
modifying the values of its parameters in order to minimize a certain criterion
function. The criterion function may be determined in a number of ways,
depending on the particular purpose of the adaptive filter, but usually it is a
function of some reference signal. The reference signal may be defined as the
desired response of the adaptive filter; in that case the role of the adaptive
algorithm is to adjust the parameters of the adaptive filter in such a way as to
minimize the error signal, which represents the difference between the signal at the
output of the adaptive filter and the reference signal.
The basic processes included in adaptive filtering are digital filtering and
adaptation or adjustment, i.e. the estimation of the parameters (coefficients) of the
filter. The choice of the filter structure and of the criterion function used during the
adaptation process has a crucial influence on the characteristics of the adaptive
filter as a whole.
2.2 Structures of Digital Filters

There are several types of digital filters usable in the design of adaptive filters;
most often they are linear discrete systems, although nonlinear adaptive filters also
find significant application, among which a large group are neural networks
[22]. Digital filters are often categorized depending either on the duration of
the impulse response or on their structure. Two basic types are the IIR (Infinite
Impulse Response) and the FIR (Finite Impulse Response) filters [2-5].
The impulse response of a digital filter is its output signal, i.e. its response,
obtained when a unit impulse (Kronecker delta impulse) is brought to its input:
$$\delta(k) = \begin{cases} 1, & \text{for } k = 0 \\ 0, & \text{for } k \neq 0. \end{cases}$$
The digital filters in which the duration of the impulse response is, theoretically,
infinite are called the infinite impulse response filters or the IIR filters.
Contrary to them, the filters with a finite impulse response are denoted as the FIR
filters.
When categorizing based on the structure, one starts from the output signal. The
filter output may be a function of the actual and the previous samples of the input
signal, $x(k)$, as well as of the previous values of the output signal, $y(k)$. If the actual
value of the output is a function of the previous values of the output, then there
must be a feedback or recursive mechanism in the structure, and thus such filters
are denoted as recursive. Contrary to them, if the output is only a function of the
input signal, we speak about non-recursive filters. In order to obtain an infinite
impulse response, one has to use some kind of recursive filter, and this is the
reason why the terms IIR filter and recursive filter are sometimes used
interchangeably. Similarly, a finite duration of the impulse response is obtained
with non-recursive structures, and thus the terms FIR filter and non-recursive
digital filter are used as synonyms.
The most general structure of a digital filter is the recursive filter shown in
Fig. 2.1. It contains a direct branch with multipliers, whose values are determined
by the parameters $b_i,\ i = 0, 1, \ldots, M$, and a return branch with multipliers,
determined by the parameters $a_i,\ i = 1, 2, \ldots, N$. The actual value of the output
signal $\hat{y}(k)$ is determined by a linear combination of the following weighted
variables: the actual and the previous values of the input signal samples,
$x(k-i),\ i = 0, 1, \ldots, M$, as well as the previous values of the output signal
samples, $\hat{y}(k-i),\ i = 1, 2, \ldots, N$.
In the digital signal processing literature the block diagram in Fig. 2.1 is denoted as the
direct realization [1-3]. This structure represents the design of the filter transfer
function with zeroes and poles, such that the position of the poles is determined by
the values of the parameters $a_i$, while the position of the zeroes is determined by
the parameters $b_i$ [3]. The number of poles and zeroes is determined by the
number of the delay elements ($z^{-1}$). This structure has a very large memory,
theoretically infinite, and thus it is denoted as a filter with infinite impulse
response (IIR). In other words, the impulse response of the filter, which represents
(Fig. 2.1: direct realization of the recursive (IIR) digital filter, with delay elements $z^{-1}$, direct-branch multipliers $b_i$ and return-branch multipliers $a_j$)
the output signal of the filter, $\hat{y}(k)$, to the impulse excitation $x(k) = \delta(k)$, will last
infinitely long, i.e. the signal $\hat{y}(k)$ will decrease to zero only after infinite time (the
transient response of the filter will last infinitely before the filter output reaches
zero value in a steady or equilibrium state).
According to Fig. 2.1, the output signal, $\hat{y}(k)$, of the IIR filter is given by the
linear difference equation
$$\hat{y}(k) = \sum_{i=0}^{M} b_i(k)x(k-i) - \sum_{j=1}^{N} a_j(k)\hat{y}(k-j), \qquad (2.1)$$
where $b_i(k),\ i = 0, 1, 2, \ldots, M$ are the parameters of the IIR filter in the direct branch
in the $k$-th discrete moment, and $a_j(k),\ j = 1, 2, \ldots, N$ are the parameters of the IIR
filter in the return branch in the given $k$-th moment. In the general case $M \le N$, thus
$N$ represents the order of the filter ($N$ represents the minimal number of the delay
elements $z^{-1}$ necessary to physically implement the relation (2.1) using digital
electronic components [2]).
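The difference equation (2.1) translates directly into code. The following sketch uses constant (non-adaptive) parameters for simplicity and assumes the convention $A_k(z^{-1}) = 1 + \sum_j a_j z^{-j}$, so the feedback terms enter with a minus sign:

```python
def iir_filter(b, a, x):
    """Direct realization of Eq. (2.1) with constant parameters:
    y(k) = sum_{i=0}^{M} b[i]*x(k-i) - sum_{j=1}^{N} a[j-1]*y(k-j),
    with zero initial conditions (x(k) = y(k) = 0 for k < 0)."""
    y = []
    for k in range(len(x)):
        # direct branch: weighted actual and previous input samples
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
        # return branch: weighted previous output samples (note the minus sign)
        acc -= sum(a[j - 1] * y[k - j] for j in range(1, len(a) + 1) if k - j >= 0)
        y.append(acc)
    return y
```

For example, `iir_filter([1.0], [-0.5], [1.0, 0.0, 0.0, 0.0])` realizes $\hat{y}(k) = x(k) + 0.5\hat{y}(k-1)$, whose impulse response decays geometrically and, theoretically, never reaches exactly zero, which is the IIR property described above.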
Besides calculating the filter output, $\hat{y}(k)$, the adaptive IIR filter should update
the $M+N$ parameters $a_i$ and $b_i$ in order to optimize a previously defined criterion
function. The parameter update is a more complex task in the case of IIR filters
than for FIR filters, for two reasons. The first reason is that an IIR filter may
become unstable during optimization if the filter poles become positioned outside
the stable region (the unit circle in the complex $z$-plane), and the other is that the
criterion function to be optimized, generally speaking, may have a multitude of local
minima, with the possible consequence that the optimization process ends in one of
the local minima instead of in the global one. Contrary to them, the criterion
functions of the FIR filters (MSE, for instance, as will be shown later) usually have
only one minimum, which also represents the global minimum. In spite of the
quoted difficulties, the recursive adaptive filters find significant practical
applications in control (regulation) systems, especially if the system to be controlled
is recursive. In these applications the adaptive IIR filters with several parameters
may have better properties than FIR filters with several thousands of parameters
[23].
$$B_k(z^{-1}) = \sum_{i=0}^{M} b_i(k)z^{-i}, \qquad A_k(z^{-1}) = 1 + \sum_{j=1}^{N} a_j(k)z^{-j}. \qquad (2.3)$$
Let us note that in the adopted notation the symbol $z^{-1}$ has the meaning of a unit
delay, i.e. $z^{-1}x(k) = x(k-1)$ and $z^{-1}\hat{y}(k) = \hat{y}(k-1)$. The polynomial relation
(2.2) can also be written in the alternative form
$$A_k(z^{-1})\hat{y}(k) = B_k(z^{-1})x(k). \qquad (2.4)$$
If we assume that the filter parameters do not change with time, i.e. that they do
not change with the time index $k$, the filter transfer function, $G(z)$, is obtained as
the ratio of the $z$ transforms of the output, $\hat{y}(k)$, and the input, $x(k)$, signals,
assuming that the initial conditions in the difference equation are zero, i.e. that the
values of the samples of the corresponding signals in (2.1) are equal to zero for
negative values of the time index $k$:
$$G(z) = \frac{Z\{\hat{y}(k)\}}{Z\{x(k)\}} = \frac{\hat{Y}(z)}{X(z)} = \frac{B(z^{-1})}{A(z^{-1})}, \qquad (2.5)$$
where according to (2.1) the polynomials are
$$B(z^{-1}) = \sum_{i=0}^{M} b_i z^{-i}, \qquad A(z^{-1}) = 1 + \sum_{j=1}^{N} a_j z^{-j}, \qquad N \ge M, \qquad (2.6)$$
while $N$ represents the filter order. In the above relation (2.5) $z$ is a complex variable;
the roots of the equation $B(z^{-1}) = 0$ determine the zeroes of the filter, while the
roots of the equation $A(z^{-1}) = 0$ define the poles of the filter (the zeroes and poles of
the filter are also denoted in the literature as critical frequencies, and their position in
the $z$-plane is denoted as the critical frequency spectrum). The dynamical response of
the filter to the input signal is dominantly dependent on the position of the poles in the
$z$-plane, and the necessary and sufficient condition of the filter stability is that the poles
are located within the unit circle, $|z| < 1$. Generally speaking, a filter is stable if the
filter output in the equilibrium or steady state, occurring after the transient process has
ceased, is dictated solely by the excitation signal [3].
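The stability condition can be checked numerically by locating the poles as the roots of the equivalent polynomial $z^N A(z^{-1})$. A small sketch, assuming the convention $A(z^{-1}) = 1 + \sum_{j=1}^{N} a_j z^{-j}$ (the coefficient values in the usage example below are purely illustrative):

```python
import numpy as np

def is_stable(a):
    """Check whether all filter poles lie inside the unit circle.
    a = [a_1, ..., a_N] are the return-branch parameters, so the poles are
    the roots of z^N + a_1 z^(N-1) + ... + a_N = 0, i.e. of z^N A(1/z)."""
    poles = np.roots([1.0] + list(a))
    return bool(np.all(np.abs(poles) < 1.0))
```

For instance, `is_stable([-0.5])` corresponds to a single pole at $z = 0.5$, inside the unit circle, while `is_stable([-2.0, 0.5])` has a pole near $z \approx 1.71$ and fails the condition.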
One of the ways to overcome the drawback of potential instability of an IIR digital filter
is to design a filter with zeroes only, which in comparison with the recursive IIR
structure has only the direct branch, i.e. represents a non-recursive structure. The memory of
such filters is limited, i.e. their impulse response is equal to zero outside some
limited time interval, and because of that they are denoted as filters with finite
impulse response (FIR). In other words, a transient process in such a system, which
is initiated immediately after bringing the excitation and which lasts until the output
signal assumes a stationary value, i.e. until the system enters the equilibrium or steady
state, has a finite duration. A good property of the FIR filters is that their phase
characteristic is completely linear (the transfer function $G(z)$ for
$z = \exp(j\omega T)$, $-\pi \le \omega T \le \pi$, where $j$ denotes the imaginary unit, is denoted as the
amplitude-phase frequency (spectral) characteristic; $|G(\exp(j\omega T))|$ is called
the amplitude, and $\arg\{G(\exp(j\omega T))\}$ the phase frequency (spectral)
characteristic). Another good property is their unconditional stability, and
because of that they represent the basis of the systems for adaptive signal
processing [4, 5]. Two basic structures for the realization of FIR filters are the transversal
structure and the lattice structure. Figure 2.2 shows the structure of a transversal
FIR filter. The filter contains adders, delay elements ($z^{-1}$) and multipliers, defined
by the parameters $\{b_i,\ i = 0, 1, 2, \ldots, M\}$.
The number of the delay elements, $M$, determines the order of the filter and the
duration of its impulse response. The output signal of the filter, $\hat{y}(k)$, is determined
by the values of the parameters $\{b_i\}$ and it represents a linear combination of the
actual and the previous samples of the input signal, $x(k)$. These parameters are the
object of estimation in the adaptive process, i.e. they vary with the time index $k$. In this
manner, according to Fig. 2.2, the filter output signal is defined by the linear
difference equation
(Fig. 2.2: transversal FIR filter structure, with delay elements $z^{-1}$ and multipliers $b_0, b_1, \ldots, b_M$)
$$\hat{y}(k) = \sum_{i=0}^{M} b_i(k)x(k-i). \qquad (2.7)$$
where
$$B_k(z^{-1}) = \sum_{i=0}^{M} b_i(k)z^{-i}, \qquad (2.9)$$
so that, for constant parameters, the transfer function becomes
$$G(z) = \frac{Z\{\hat{y}(k)\}}{Z\{x(k)\}} = \frac{\hat{Y}(z)}{X(z)} = \frac{z^M B(z^{-1})}{z^M}. \qquad (2.10)$$
The roots of the polynomial equation $z^M B(z^{-1}) = 0$ determine the filter zeroes,
while according to the expression (2.10) it is concluded that the filter has a pole $z = 0$
with a multiplicity $M$ (the roots of the equation $z^M = 0$ are the poles of the system).
Since these poles are located within the unit circle $|z| < 1$ in the plane of the
complex variable $z$, the FIR filter represents a system with unconditional stability.
Because of that fact, it is customary in the literature to say that the FIR filter transfer
function has only zeroes, while the transfer function of an IIR filter is said to
have both zeroes and poles (the zeroes are the roots of the polynomial in the numerator,
while the poles are the roots of the polynomial in the denominator of the rational function
representing the filter transfer function). Specifically, if the excitation $x(k)$ is a unit
impulse, i.e. $x(0) = \delta(0) = 1$ and $x(k) = \delta(k) = 0$ for $k \neq 0$, according to (2.7) it is
concluded that $\hat{y}(0) = b_0,\ \hat{y}(1) = b_1,\ \ldots,\ \hat{y}(M) = b_M$, and $\hat{y}(k) = 0$ for $k > M$, i.e. the
impulse response of the filter will last $M+1$ samples, while the coefficients $b_i$ denote
the values of the samples of the filter impulse response in the corresponding discrete
and equidistant moments of signal sampling $t_i = iT,\ i = 0, 1, 2, \ldots, M$, where $T$ is
the sampling or discretization period.
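The property just described is easy to check in code: exciting the transversal computation (2.7) with a unit impulse reproduces the coefficients $b_0, \ldots, b_M$ and then zeros (a sketch with arbitrary, illustrative coefficient values):

```python
def fir_filter(b, x):
    """Transversal FIR filter, Eq. (2.7): y(k) = sum_{i=0}^{M} b[i]*x(k-i),
    with x(k) = 0 for k < 0 (zero initial conditions)."""
    return [sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
            for k in range(len(x))]

# Unit-impulse excitation: the response lasts exactly M+1 samples.
b = [0.5, -0.2, 0.1]                 # illustrative parameters, M = 2
delta = [1.0, 0.0, 0.0, 0.0, 0.0]    # Kronecker delta impulse
h = fir_filter(b, delta)             # impulse response: b_0, b_1, b_2, 0, 0
```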
2.3 Criterion Function for the Estimation of FIR Filter Parameters

The concepts of optimal linear estimation represent the basis for the analysis and
synthesis of adaptive filters [7-10]. The problem of adaptive filtering encompasses
two estimation procedures: the estimation of the desired output signal from the
filter and the estimation of the filter coefficients necessary to achieve the desired
goal. The definition of these estimators depends on the choice of the criterion
function, which defines the quality of the estimation based on the difference
between the estimator output, $\hat{y}(k)$, and the reference or the desired output signal,
i.e. the response $y(k)$.
If we denote the column vector of input data, with a length $M+1$, as $X(k)$,
$$X(k) = \left[x(k)\ \ x(k-1)\ \ x(k-2)\ \ \ldots\ \ x(k-M)\right]^T, \qquad (2.11)$$
and the column vector of the estimated parameters or filter coefficients in the $k$-th
discrete moment of signal sampling as $\hat{\Theta}(k)$,
$$\hat{\Theta}(k) = \left[\hat{b}_0(k)\ \ \hat{b}_1(k)\ \ \hat{b}_2(k)\ \ \ldots\ \ \hat{b}_M(k)\right]^T, \qquad (2.12)$$
where $k$ denotes the actual discrete time moment and $T$ is the matrix operation of
transposing, then the signal at the FIR filter output, $\hat{y}(k)$, may be defined in the
form of a linear regression equation, i.e. as a scalar product of the corresponding
vectors,
$$\hat{y}(k) = X^T(k)\hat{\Theta}(k), \qquad (2.13)$$
i.e.
$$\hat{y}(k) = \hat{\Theta}^T(k)X(k). \qquad (2.14)$$
While designing the optimum solution, the filter is optimized according to the
corresponding criterion function or performance index. The filter is determined
by the parameter vector $\hat{\Theta}(k)$, so that the optimization problem reduces to the
choice of the parameters which will minimize the chosen criterion function. The
choice of the criterion function is a complex problem and most often depends on
the particular application of the filter [9, 16].
One often meets in practice a criterion function defined as the mean square error,
MSE, which represents the averaged (expected) value of the squared difference
between the reference signal, yk, and the estimated actual value of the output
signal, ^yk. The goal is to minimize the mean square value of the error signal,
which in the ideal case has the consequence that the statistical mean error value
tends to zero, and that the filter output signal is as close as possible to the desired
reference signal.
The error signal (Fig. 2.3) is defined according to (2.13) in the following
manner
Fig. 2.3 Structure of an adaptive digital filter. The error signal $e(k)$ appears as the difference
between the reference signal $y(k)$ and the actual filter output $\hat{y}(k)$, and the adaptive algorithm
generates in each step $k$ the parameter vector $\hat{\Theta}(k)$ as the estimation of the unknown
parameters $\Theta(k)$
$$e(k) = y(k) - \hat{y}(k) = y(k) - X^T(k)\hat{\Theta}(k). \qquad (2.15)$$
Assuming that $e(k)$, $y(k)$ and $x(k)$ are stationary random series (the statistical
properties of these signals do not change with time) and that the elements of the
vector $\hat{\Theta}(k)$ are constant, the criterion function $J$ is defined as
$$J = E\left[e^2(k)\right], \qquad (2.16)$$
where $E$ denotes the mathematical expectation with regard to the random variables
$x$ and $y$. The nature of the criterion function (2.16) is probabilistic, and in such
an environment all signals are taken as realizations of stochastic processes, which
points to the fact that for the design of the filter it is necessary to know the
suitable statistical indicators of the signals under consideration, i.e. the
joint probability density function of the random variables $x$ and $y$.
Since the mathematical expectation is a linear operator, i.e. the mathematical
expectation of a sum is equal to the sum of mathematical expectations, and the
mathematical expectation of a product is equal to the product of mathematical
expectations only for statistically independent variables [7, 9, 10], it follows that
$$E\left[e^2(k)\right] = E\left[y^2(k)\right] + \hat{\Theta}^T E\left[X(k)X^T(k)\right]\hat{\Theta} - 2E\left[y(k)X^T(k)\right]\hat{\Theta}. \qquad (2.17)$$
If
$$R = E\left[X(k)X^T(k)\right] = E\begin{bmatrix}
x^2(k) & x(k)x(k-1) & x(k)x(k-2) & \cdots & x(k)x(k-M)\\
x(k-1)x(k) & x^2(k-1) & x(k-1)x(k-2) & \cdots & x(k-1)x(k-M)\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
x(k-M)x(k) & x(k-M)x(k-1) & x(k-M)x(k-2) & \cdots & x^2(k-M)
\end{bmatrix} \qquad (2.18)$$
denotes the autocorrelation matrix of the input signal, and
$$D = E\left[y(k)X(k)\right]$$
denotes the cross-correlation vector of the input and the reference signal, the
criterion function (2.16) may be written as
$$J = E\left[y^2(k)\right] + \hat{\Theta}^TR\hat{\Theta} - 2D^T\hat{\Theta}. \qquad (2.20)$$
(Figure: the quadratic MSE criterion surface as a function of the parameters $b_1$ and $b_2$, with the minimum $J_{min}$ attained at the optimal parameter vector $\Theta_{opt} = b_{opt}$)
The determination of the minimum of the criterion function can be done using
the gradient method [12, 24, 25]. Namely, the MSE gradient is a vector that is
always directed towards the fastest increment of the criterion function, with a
value equal to the slope of the tangent to the criterion function. At the point of the
minimum of the criterion function the slope is zero, so it is necessary to determine
the gradient of the criterion function and equate it to zero in order to obtain the
optimum values of the parameters minimizing the criterion function. The gradient
of the criterion function J, denoted as ∇J or simply ∇, is obtained by differentiating
the expression (2.20) with respect to Ĥ,
2.3 Criterion Function for the Estimation of FIR Filter Parameters 41
∇ = ∂J/∂Ĥ = [∂J/∂b0  ∂J/∂b1  ∂J/∂b2  …  ∂J/∂bM]ᵀ = 2RĤ − 2D.  (2.21)
If (2.21) is made equal to zero, we obtain the Wiener-Hopf equation

∇ = 2RĤ − 2D = 0,  (2.22)

and by solving it we obtain the optimal solution for the parameter vector

Hopt = R⁻¹D.  (2.23)
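As a numerical illustration (a sketch not taken from the book; the filter taps, signal length and random seed are arbitrary choices), the Wiener solution Hopt = R⁻¹D can be obtained by estimating R and D as sample averages over a recorded input/reference pair and solving the resulting linear system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown system: a short FIR filter to be identified (hypothetical example)
b_true = np.array([0.5, -0.3, 0.2])
M = len(b_true) - 1

x = rng.standard_normal(5000)          # stationary white input signal
y = np.convolve(x, b_true)[:len(x)]    # reference signal produced by the unknown system

# Delayed-input vectors X(k) = [x(k), x(k-1), ..., x(k-M)] stacked as rows
X = np.column_stack([np.roll(x, i) for i in range(M + 1)])[M:]
yk = y[M:]

R = X.T @ X / len(X)    # sample estimate of E[X(k) X^T(k)]
D = X.T @ yk / len(X)   # sample estimate of E[y(k) X(k)]

H_opt = np.linalg.solve(R, D)   # Wiener-Hopf solution H_opt = R^{-1} D
```

Since the reference is generated here without noise, the sample-based solution recovers the unknown coefficients essentially exactly; with measurement noise the estimate would only approach them as the number of samples grows.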
Starting from expressions (2.20) and (2.23), the criterion function may be
represented as

J = E[y²(k)] + ĤᵀRĤ − 2HoptᵀRĤ
  = E[y²(k)] − HoptᵀRĤ + (Ĥ − Hopt)ᵀRĤ
  = E[y²(k)] − ĤᵀRHopt + (Ĥ − Hopt)ᵀRĤ.  (2.25)

Hopt = Ĥ − ½ R⁻¹∇.  (2.28)
Equation (2.28) represents Newton's method for the determination of the
root of the vector equation obtained by making the gradient of the criterion
function equal to zero (the necessary condition of the minimum of the adopted
criterion). Knowing the value of Ĥ at any moment of time, together with R and
the corresponding gradient ∇, one can determine the optimal solution Hopt in just a
single step. In practical situations, however, the available information is insufficient
to perform a single-step adaptation. The value of the correlation matrix of
the input signal, R, changes with time under nonstationary conditions and, in the
best case, can only be estimated, similarly to the unknown value of the criterion
function gradient ∇, which must be estimated in each iteration. In order to reduce
the effect of noisy or fluctuating values of these estimates, one modifies (2.28)
in order to reach an algorithm which updates the parameter vector Ĥ in small
increments and converges to Hopt after a number of iterations. In this manner,
starting from (2.28), one reaches Newton's method in an iterative (recursive)
form [12, 24, 25]
Ĥ(k+1) = Ĥ(k) − ½ R⁻¹∇(k),  k = 0, 1, 2, …  (2.29)
where the index k of the gradient of the criterion function denotes that it is
estimated in each iteration according to (2.21). The expression (2.29) can be
generalized by introducing a constant μ, i.e. a dimensionless variable determining
the convergence speed of the iterative process

Ĥ(k+1) = Ĥ(k) − μR⁻¹∇(k).  (2.30)
According to (2.21) it follows that ∇(k) = 2RĤ(k) − 2D, and thus
according to (2.30) one obtains

Ĥ(k+1) = (1 − 2μ)Ĥ(k) + 2μR⁻¹D.  (2.31)

Arranging further (2.31), and taking into account (2.23), one can write

Ĥ(k+1) = (1 − 2μ)Ĥ(k) + 2μHopt
        = (1 − 2μ)²Ĥ(k−1) + 2μ[1 + (1 − 2μ)]Hopt
        ⋮
Ĥ(k) = (1 − 2μ)ᵏ Ĥ(0) + 2μHopt Σ_{i=0}^{k−1} (1 − 2μ)ⁱ.  (2.32)
The vector Ĥ obviously converges to the optimal value Hopt only in the case
when the condition is fulfilled that the geometric series Σ_{i=0}^{k−1} (1 − 2μ)ⁱ is
convergent, i.e.

|1 − 2μ| < 1,  (2.33)

that is

0 < μ < 1,  (2.34)

and in that case

Ĥ(k) = (1 − 2μ)ᵏ Ĥ(0) + Hopt [1 − (1 − 2μ)ᵏ].  (2.35)
From (2.35) it follows that the final solution can be reached in one step for
μ = 0.5, but only under the condition that one knows the accurate values of the
inverse correlation matrix of the input signal, R⁻¹, and the gradient of the criterion
function, ∇, i.e. the cross-correlation vector D. In the case when R⁻¹ and ∇ are
estimated, one usually utilizes values μ ≪ 1, typically smaller than 0.01, to
overcome the problems appearing because of the error introduced by the estimation
of the unknown variables R and ∇.
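The one-step property for μ = 0.5, and the slow geometric convergence obtained with a small μ, can be verified numerically. The sketch below uses hypothetical exact statistics R and D (not from the book) and iterates Ĥ(k+1) = Ĥ(k) − μR⁻¹∇(k):

```python
import numpy as np

# Hypothetical exact second-order statistics (illustrative values only)
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # input autocorrelation matrix
D = np.array([1.0, -0.5])         # cross-correlation vector
H_opt = np.linalg.solve(R, D)     # Wiener solution H_opt = R^{-1} D

R_inv = np.linalg.inv(R)

def newton_iterations(mu, steps):
    """Iterative Newton's method (2.30) with exact R and gradient."""
    H = np.zeros(2)
    for _ in range(steps):
        grad = 2 * R @ H - 2 * D     # gradient (2.21)
        H = H - mu * R_inv @ grad    # Newton step (2.30)
    return H

H_one_step = newton_iterations(mu=0.5, steps=1)    # exact in a single step
H_slow = newton_iterations(mu=0.05, steps=200)     # small steps, geometric convergence
```

With μ = 0.5 the error vanishes in one iteration; with μ = 0.05 the error decays as (1 − 2μ)ᵏ, exactly as (2.35) predicts.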
Newton's method is fundamentally important from the mathematical point of
view; however, it is very demanding in practical applications because of the need to
estimate R and ∇ in each step. It is a gradient search method, as a consequence
of which all elements of the vector Ĥ change in each iteration, with the goal
44 2 Adaptive Filtering
to determine the optimum values of the parameters. These changes are always
directed toward the minimum of the criterion function but, as (2.30) shows, not
necessarily in the direction of the gradient itself.
As mentioned, the main problem with Newton's algorithm is its application
under conditions when one does not know the value of the inverse correlation
matrix of the input signal and the value of the gradient of the criterion function, i.e.
the cross-correlation of the input and the reference signal. Unfortunately, this is a
common case in practice [6]. In that case one most often assumes that the non-
diagonal elements of the correlation matrix are equal to zero. The methods based
on this assumption bear the common name of steepest descent methods, and we
consider them in the text that follows.
The steepest descent method starts from some initial value Ĥ(0). The estimate
in the next step, Ĥ(k+1), is equal to the current estimate Ĥ(k) corrected by
a value in the direction opposite to the direction of the fastest increment
of the function, i.e. of the gradient, in the point Ĥ(k):

Ĥ(k+1) = Ĥ(k) − β∇̂(k),  k = 0, 1, 2, …  (2.36)

The last term in Eq. (2.36) represents the estimated gradient of the criterion
function in the k-th iteration. The scalar parameter β is the convergence factor
determining the size of the correction step; it influences the stability and the
adaptation speed of the algorithm. The dimension of this factor is equal to the
reciprocal of the dimension of the input signal power.
The graphical presentation of this method for M = 1 is given in Fig. 2.6. It can
be shown that the convergence conditions are satisfied for [6]

0 < β < 1/λmax,  (2.37)

where λmax is the largest eigenvalue of the correlation matrix of the input signal R,
which depends on the input signal power, i.e. on the mean expected value of the
squared amplitude of the input signal.
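A minimal numerical sketch of the steepest descent iteration under the stability bound (2.37), again with hypothetical statistics R and D not taken from the book:

```python
import numpy as np

R = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # hypothetical input autocorrelation matrix
D = np.array([1.0, -0.5])         # hypothetical cross-correlation vector
H_opt = np.linalg.solve(R, D)

lam_max = np.linalg.eigvalsh(R).max()
beta = 0.4 / lam_max              # safely inside the bound 0 < beta < 1/lam_max

H = np.zeros(2)
for _ in range(500):
    grad = 2 * R @ H - 2 * D      # true MSE gradient (2.21)
    H = H - beta * grad           # steepest descent step (2.36)
```

Each eigen-mode of the error decays by the factor |1 − 2βλ| per iteration, so any β inside the bound drives H toward Hopt, with the slowest mode set by the smallest eigenvalue.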
[Fig. 2.6: the criterion function J for M = 1, with the gradient vector and the minimum Jmin indicated]
When comparing Eqs. (2.36) and (2.29), one should note that in the case of
Newton's method the information about the gradient is corrected by the value of the
inverse correlation matrix of the input signal, R⁻¹, and by the scalar parameter μ.
This means that in this method the direction of the criterion function search is
corrected to keep it always toward the minimum of the criterion function, while in
the steepest descent method this direction coincides with the fastest increase
(decrease) of the function. The two quoted directions need not coincide in the general
case, and the search path of the criterion function in the application of Newton's
method is shorter, which suggests that the optimization process is faster compared
to the steepest descent method (Fig. 2.7). This advantage stems from the fact that
Newton's method utilizes much more information about the criterion function in
comparison to the steepest descent method. Also, compared to the steepest descent
method, Newton's algorithm is much more complex, since it requires the calculation
or the estimation of the inverse input correlation matrix in each iteration.
However, under real circumstances, in the presence of noise while estimating the
gradient and the input data correlation matrix, it may happen that the steepest
descent method converges much more slowly toward the minimum of the MSE in
comparison to Newton's method or that, for the sake of speed, it converges to a
larger value of the MSE criterion.
Fig. 2.7 Directions of the determination of the minimum of the criterion function for the steepest
descent method and for Newton's method (search trajectories in the parameter plane (b1, b2),
converging to Hopt = [b1opt  b2opt])

Adaptive digital filters, generally taken, consist of two separate units: the digital
filter, with a structure determined to achieve the desired processing (the structure is
known up to the unknown parameter vector), and the adaptive algorithm for the
update of the filter parameters, whose goal is to ensure their fastest possible
convergence to the optimum parameters from the point of view of the adopted
criterion. According to this, it is possible to implement a large number of
combinations of filter structures and adaptive algorithms for parameter estimation.
Most of the adaptive algorithms represent modifications of the standard iterative
procedures for the solution of the problem of minimization of criterion function in
real time. Two important parameters determining the choice of the adaptive
algorithm are the adaptation speed and the expected accuracy of the parameter
estimation after the adaptation is finished. In a general case, there is a discrepancy
between these two requirements. For a given class of adaptive algorithms an
increase of adaptation speed will decrease the accuracy of the estimated param-
eters and vice versa. In this section we consider two basic algorithms for parameter
estimation for the FIR adaptive filters: the Least Mean Square (LMS) and the
Recursive Least Square (RLS) algorithms. The LMS algorithm has a relatively
large importance in applications where it is necessary to minimize the computa-
tional complexity, while the RLS algorithm is popular in the fields of system
identification (the determination of the system model based on experimental data
on input and output signals) and time series analysis (experimentally recorded
signal samples) [24, 25].
The LMS (Least Mean Square) algorithm belongs to the class of steepest descent
methods, but it utilizes a special estimation of the gradient, i.e. it takes as the
criterion the actual (instantaneous) value of the squared error
2.4 Adaptive Algorithms for the Estimation of Parameters of FIR Filters 47
J(k) = e²(k).  (2.38)
Based on the values of Ĥ(k) and e(k) defined in (2.12) and (2.15), the estimate
of the gradient is obtained as

∇̂(k) = ∂e²(k)/∂Ĥ = 2e(k) ∂e(k)/∂Ĥ = 2e(k)[∂e(k)/∂b0  …  ∂e(k)/∂bM]ᵀ = −2e(k)X(k),  (2.39)
i.e. the current estimate of the gradient is the product of the vector of input
signals in the k-th iteration and the corresponding error signal. This represents the
basis of the simplicity of the LMS algorithm, since only a single multiplication
operation per parameter is necessary for the estimation of the gradient.
Starting from the general form of the steepest descent method (2.36) and from (2.39),
one may define the LMS algorithm as

Ĥ(k+1) = Ĥ(k) − β∇̂(k),  (2.40)

Ĥ(k+1) = Ĥ(k) + 2βe(k)X(k),  k = 0, 1, 2, …,  (2.41)
where ∇̂(k) represents the estimate of the gradient (2.39), and β is a scalar
parameter influencing the adaptation speed, the stability of the adaptive algorithm
and the error value after the adaptation process is finished. Since the change of the
value of the parameter vector is based on the estimate of the gradient without
averaging (only the actual value of the error signal, i.e. a single realization, is used)
and not on its real value, one may expect that the adaptive process will be noisy,
i.e. that the parameter estimates will fluctuate (the parameter estimates represent
random variables with a corresponding variance).
It can be shown that (2.39) represents an unbiased estimate of the gradient for
the case when the values of the parameters are constant:

E[∇̂(k)] = −2E[e(k)X(k)] = −2E[y(k)X(k) − X(k)Xᵀ(k)Ĥ] = 2RĤ − 2D = ∇(k).  (2.42)
An averaged estimate ∇̄(k), formed as the arithmetic mean of the previously
calculated gradient estimates with the goal to approximately determine the
mathematical expectation of such a gradient estimate, thus describes well the
accurate value ∇(k). Bearing in mind that the parameter vector Ĥ in (2.40) is
practically updated in each iteration (k = 0, 1, …), it is necessary to limit the value
of the scalar parameter β according to (2.37), in order to ensure the convergence
of the parameter vector Ĥ to Hopt [6].
As a consequence of updating the parameter vector Ĥ in each iteration
according to an insufficiently accurate estimate of the gradient, the adaptive process
is noisy, i.e. it does not follow the steepest descent line toward Hopt. This noise
decreases in time with the advance of the adaptive process, since near Hopt the
value of the gradient is small and the correction term in (2.40) is also small, so that
the parameter estimates are close to their previous values.
From (2.41) it is obvious that the algorithm is simple to implement, because it
does not require the operations of squaring, averaging or differentiation. In the
LMS algorithm each parameter of the filter is updated by adding a weighted value
of the error signal to its actual value

b_i(k+1) = b_i(k) + 2βe(k)x(k−i),  i = 0, 1, …, M;  k = 0, 1, …  (2.43)

The error signal, e(k), is common for all coefficients, while the weight factor
2βx(k−i) is proportional to the value met at the k-th moment in the i-th delay
section of the FIR filter. To calculate the correction terms 2βe(k)x(k−i) one needs
M + 1 multiplication operations, so that each step of the algorithm requires on the
order of 2(M + 1) operations (multiplications and additions), which makes this
algorithm convenient for real-time application [26].
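The coefficient-wise update (2.43) can be sketched as follows; the three-tap unknown system, step size and random seed are arbitrary illustrative choices (not from the book), and the reference is generated without noise for clarity:

```python
import numpy as np

rng = np.random.default_rng(1)

b_true = np.array([0.5, -0.3, 0.2])   # hypothetical unknown FIR system
M = len(b_true) - 1

x = rng.standard_normal(20000)          # white input signal
y = np.convolve(x, b_true)[:len(x)]     # noiseless reference signal

beta = 0.01                 # step size; must satisfy the bound (2.37)
b = np.zeros(M + 1)         # parameter estimates, started from zero

for k in range(M, len(x)):
    Xk = x[k - M:k + 1][::-1]     # input vector [x(k), x(k-1), ..., x(k-M)]
    e = y[k] - b @ Xk             # error signal
    b = b + 2 * beta * e * Xk     # LMS update (2.41)/(2.43)
```

Each iteration costs only one multiply-add per coefficient, which is exactly the complexity argument made above.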
Generally taken, larger values of β yield greater convergence speed, but the
estimation error is also larger, while smaller β gives a smaller asymptotic error of
the parameter estimates. Also, according to (2.37) it can be seen that the value of β
is limited by λmax, i.e. by the power of the input signal x(k).
In order to overcome this problem, one may modify the expression (2.43) so that
the correction factor is normalized with regard to the input signal power

b_i(k+1) = b_i(k) + αe(k)x(k−i) / Σ_{j=0}^{M} x²(k−j).  (2.44)
The expression (2.44) represents the Normalized Least Mean Square (NLMS)
algorithm, where α is a constant which may have a value within the range 0 < α < 2 [27].
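A sketch of the normalized update (2.44) follows; the unknown system, seed and the input scaling are arbitrary choices, and the small regularizer `eps` guarding against division by a near-zero power estimate is an extra practical assumption, not part of (2.44):

```python
import numpy as np

rng = np.random.default_rng(2)

b_true = np.array([0.5, -0.3, 0.2])   # hypothetical unknown FIR system
M = len(b_true) - 1

x = 5.0 * rng.standard_normal(5000)   # input with large power, to stress normalization
y = np.convolve(x, b_true)[:len(x)]   # noiseless reference

alpha = 0.5                 # inside the allowed range 0 < alpha < 2
eps = 1e-8                  # regularizer (assumption, not in the text)
b = np.zeros(M + 1)

for k in range(M, len(x)):
    Xk = x[k - M:k + 1][::-1]
    e = y[k] - b @ Xk
    # normalized LMS update (2.44): step scaled by the instantaneous input power
    b = b + alpha * e * Xk / (Xk @ Xk + eps)
```

Because the step is divided by the instantaneous input power, the same α works regardless of the input signal scale, unlike the fixed β of plain LMS.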
The simplicity and easy implementation make the LMS algorithm very
attractive for many practical applications. Its main deficiency concerns its convergence
properties, which are slow and depend on the characteristics of the input
signal. The LMS algorithm has only a single free variable, the parameter β, whose
change influences the convergence properties and which has a limited range of
possible values, according to (2.37).
Let us consider the structure of the digital FIR filter shown in Fig. 2.8, known
up to the unknown set of parameters b_i, i = 0, 1, …, M. The error signal, e(k), for
this case can be defined as

e(k) = y(k) − [x(k)b0 + x(k−1)b1 + … + x(k−M)bM].  (2.45)

The same as above, the vector of unknown parameters in the k-th sampling
moment is denoted by Ĥ(k) and defined as

Ĥᵀ(k) = [b0(k)  b1(k)  b2(k)  …  bM(k)].  (2.46)
The least squares method is based on the criterion according to which the
estimate of the parameters is optimal if the sum of error squares is minimal. Thus the
criterion function for the LS algorithm is defined by [6]

J(k) = ½ Σ_{i=0}^{k} e²(i).  (2.47)
Let us note that the expression (2.47) for the LS criterion represents an approximation
of the expression (2.16) for the MSE criterion in which the mathematical
expectation is replaced by the corresponding sum. In the LMS criterion (2.38) this
sum contains only a single term, the square of the actual error signal.

[Fig. 2.8: transversal FIR filter structure with delay elements z⁻¹, coefficients b0, b1, b2, …, bM, input x(k), reference y(k) and output ŷ(k)]
If the expression (2.45) is written in matrix form, using the whole data package
{e(i); i = 0, 1, …, k}, one obtains

[e(0)]   [y(0)]   [x(0)   0       0      …  0      ] [b0(k)]
[e(1)]   [y(1)]   [x(1)   x(0)    0      …  0      ] [b1(k)]
[e(2)] = [y(2)] − [x(2)   x(1)    x(0)   …  0      ] [b2(k)]   (2.48)
[ ⋮  ]   [ ⋮  ]   [ ⋮                             ] [ ⋮   ]
[e(k)]   [y(k)]   [x(k)   x(k−1)  x(k−2) …  x(k−M) ] [bM(k)]
or in matrix notation

e(k) = y(k) − Z(k)Ĥ(k),  (2.49)

where e(k) represents the error vector, y(k) is the reference signal vector, and
Z(k) is the input data matrix. When forming this equation it was adopted that the
signals are causal, i.e. equal to zero for k < 0. The criterion function given by (2.47)
can be expressed using the vector e(k), given by (2.49), in the form of a scalar product

J(k) = ½ eᵀ(k)e(k),  (2.50)
or in expanded form

J(k) = ½ [yᵀ(k)y(k) − yᵀ(k)Z(k)Ĥ(k) − Ĥᵀ(k)Zᵀ(k)y(k) + Ĥᵀ(k)Zᵀ(k)Z(k)Ĥ(k)].  (2.51)
By differentiating the criterion (2.51) over the parameter vector Ĥ one obtains

∂J(k)/∂Ĥ(k) = −Zᵀ(k)y(k) + Zᵀ(k)Z(k)Ĥ(k).  (2.52)
The vector minimizing the criterion function (2.50) is obtained by making the
expression (2.52) equal to zero, i.e.

Ĥ(k) = [Zᵀ(k)Z(k)]⁻¹ Zᵀ(k)y(k).  (2.53)
When deriving the expression (2.52) the following rules for the differentiation
of a scalar over a vector were used [7]

∂(yᵀx)/∂x = y,  ∂(xᵀAx)/∂x = 2Ax,  (2.54)

where x and y are column vectors of corresponding dimensions, and A is a
square symmetric matrix. So, for instance, for the second addend in (2.51) it is
adopted that yᵀ = yᵀ(k)Z(k) and x = Ĥ(k), while for the last addend in (2.51)
x = Ĥ(k) and A = Zᵀ(k)Z(k).
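The batch (non-recursive) solution (2.53) can be sketched directly: build the data matrix Z(k) with the causal convention x(j) = 0 for j < 0 and solve the normal equations. The unknown system and the signal length below are arbitrary illustrative choices, not from the book:

```python
import numpy as np

rng = np.random.default_rng(3)

b_true = np.array([1.0, 0.4, -0.2])   # hypothetical unknown FIR system
M = len(b_true) - 1
N = 50

x = rng.standard_normal(N)
y = np.convolve(x, b_true)[:N]        # noiseless reference; x(j) = 0 for j < 0 implied

# Data matrix Z(k): row i is [x(i), x(i-1), ..., x(i-M)], zero-padded for i - j < 0
Z = np.zeros((N, M + 1))
for i in range(N):
    for j in range(M + 1):
        if i - j >= 0:
            Z[i, j] = x[i - j]

# Non-recursive LS estimate (2.53): H = (Z^T Z)^{-1} Z^T y
H = np.linalg.solve(Z.T @ Z, Z.T @ y)
```

With noiseless data and a full-rank Z this recovers the unknown coefficients exactly; the recursive forms derived next compute the same quantity without refactoring Z at every step.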
To obtain a recursive form of the algorithm, let us introduce the gain matrix

P(k) = [Zᵀ(k)Z(k)]⁻¹.  (2.55)

Since the matrix Z(k+1) is obtained by appending the row Xᵀ(k+1) to the matrix
Z(k), it follows

Zᵀ(k+1) = [Zᵀ(k) ⋮ X(k+1)],
Zᵀ(k+1)Z(k+1) = Zᵀ(k)Z(k) + X(k+1)Xᵀ(k+1),  (2.56)
P⁻¹(k+1) = P⁻¹(k) + X(k+1)Xᵀ(k+1),

where, according to (2.11),

Xᵀ(k+1) = [x(k+1)  x(k)  x(k−1)  …  x(k−M+1)],  (2.57)

and the dashed line denotes the columns and rows to be appended to the existing
matrix Z(k) to form the matrix Z(k+1).
Using the identity (lemma on matrix inversion) [24]

(A + BCD)⁻¹ = A⁻¹ − A⁻¹B(C⁻¹ + DA⁻¹B)⁻¹DA⁻¹,  (2.58)

valid for all matrices of corresponding dimensions and a nonsingular matrix
A, one may write the value of the gain matrix in the (k+1)-th discrete moment
P(k+1) = P(k) − P(k)X(k+1)[1 + Xᵀ(k+1)P(k)X(k+1)]⁻¹ Xᵀ(k+1)P(k).  (2.59)
The expression (2.59) is obtained directly from (2.56) and (2.58) if one adopts
in (2.58) that A = P⁻¹(k), B = X(k+1), C = 1, D = Xᵀ(k+1). According to
(2.53) and (2.59) it follows

Ĥ(k+1) = P(k+1)[Zᵀ(k) ⋮ X(k+1)] [y(k) ; y(k+1)]
        = P(k+1)[Zᵀ(k)y(k) + X(k+1)y(k+1)].  (2.60)
Bearing in mind that according to (2.53) and (2.55)

Ĥ(k) = P(k)Zᵀ(k)y(k),  (2.61)

it is further concluded, by expressing Zᵀ(k)y(k) = P⁻¹(k)Ĥ(k) = [P⁻¹(k+1) − X(k+1)Xᵀ(k+1)]Ĥ(k)
and substituting into (2.60), that

Ĥ(k+1) = Ĥ(k) + P(k+1)X(k+1)[y(k+1) − Xᵀ(k+1)Ĥ(k)],

which, together with (2.59), defines the recursive least squares (RLS) algorithm.
The recursive least squares (RLS) algorithm in its original version is suitable for the
estimation of parameters under stationary conditions, i.e. constant estimated
parameters. Basically it is an algorithm with unlimited memory, where all
previous results are taken into consideration with equal weight when the parameter
estimate in the next moment is formed. In the case of a
time-variable system this means that the criterion (2.47) will furnish an estimate
of the average behavior of the process in the time interval under consideration, and
thus such an estimate will not be able to correctly follow the momentary changes
of the parameters of the digital filter model. To overcome this problem it is necessary
to introduce data weighting into the criterion (2.47), i.e. to use
J(k) = ½ Σ_{i=0}^{k} ρ^{k−i} e²(i),  (2.69)

where ρ represents the forgetting factor (FF) determining the effective memory of
the algorithm, with a value within the range

0 < ρ ≤ 1.  (2.70)
For stationary conditions (not changing in time) one applies ρ = 1; in this
case the criterion function defined by (2.69) becomes equal to (2.47), and the
algorithm that recursively minimizes the given criterion has unlimited memory.
In this way the estimated parameters have a high accuracy since, asymptotically
taken, the influence of noisy states is eliminated by averaging. For
conditions in which the estimated parameters change, the forgetting factor ρ = 1 is
not convenient, because the adaptation of the estimated parameters towards the
real values is relatively slow. Because of that one should use ρ < 1. By utilizing
ρ < 1 one obtains different weightings for previous measurements, i.e. the previous
measurements are taken with a smaller weight compared to the more recent ones.
Assuming that a nonstationary signal consists of stationary segments of a given
length, the forgetting factor ρ can be determined in the following manner. Starting
from the assumption that the value of ρ is close to one, one may write
ρᵏ = e^{k ln ρ} = e^{−k ln(1/ρ)} ≈ e^{−k(1−ρ)},

i.e.

ρᵏ ≈ e^{−k/τ},  τ = 1/(1 − ρ).  (2.71)
In this manner, by choosing a forgetting factor ρ < 1, the effective memory of
the algorithm becomes

τ = −1/ln ρ,  (2.72)

which, in the case when the value of ρ is close to one, is approximately equal to

τ ≈ 1/(1 − ρ).  (2.73)
Expression (2.71) shows that the measurements older than τ (k > τ) are allocated
a weight smaller than e⁻¹ ≈ 0.37 (reached at k = τ), compared to the unit weight
allocated to the current measurement (k = 0). In other words, the thus chosen weight
factor ρ in the criterion (2.69) corresponds to an exponentially decaying memory
of the algorithm, where the time constant τ of the exponential curve corresponds to
the memory length in the adopted units of time, i.e. to the number of signal
sampling periods.
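The relation between the exact memory constant (2.72), its approximation (2.73), and the e⁻¹ weight level can be checked with a few lines (ρ = 0.9 is just an example value):

```python
import math

rho = 0.9
tau_exact = -1.0 / math.log(rho)     # effective memory (2.72): about 9.49 samples
tau_approx = 1.0 / (1.0 - rho)       # approximation (2.73): 10 samples

# A measurement exactly tau samples old carries the weight e^-1 ~ 0.37
weight_at_tau = rho ** tau_exact
```

For ρ closer to one the two memory values agree even more closely, which is why (2.73) is the form commonly quoted.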
Through minimization of the criterion function (2.69) one arrives at the
Weighted Recursive Least Squares (WRLS) algorithm. The derivation of the
WRLS algorithm is identical to that of the RLS algorithm. Namely, the criterion
(2.69) can be written in the quadratic matrix form

J(k) = ½ eᵀ(k)W(k)e(k),  (2.74)
where

e(k) = [e(0)  e(1)  …  e(k)]ᵀ,  W(k) = diag[w(0), w(1), …, w(k)],  w(i) = ρ^{k−i},  i = 0, 1, …, k.  (2.75)
In the expanded form,

J(k) = ½ [yᵀ(k)W(k)y(k) − Ĥᵀ(k)Zᵀ(k)W(k)y(k) − yᵀ(k)W(k)Z(k)Ĥ(k) + Ĥᵀ(k)Zᵀ(k)W(k)Z(k)Ĥ(k)].  (2.76)
Similarly to the derivation of the standard RLS algorithm, if one further applies
the rules (2.54) for the differentiation of the corresponding terms in (2.76) over the
vector Ĥ, one obtains the necessary condition for the minimum of the criterion (2.74)

∂J(k)/∂Ĥ(k) = −Zᵀ(k)W(k)y(k) + Zᵀ(k)W(k)Z(k)Ĥ(k) = 0,  (2.77)
and the nonrecursive algorithm of weighted least squares (non-recursive WRLS
algorithm) directly follows from it

Ĥ(k) = [Zᵀ(k)W(k)Z(k)]⁻¹ Zᵀ(k)W(k)y(k).  (2.78)
The recursive version of the WRLS algorithm is obtained from (2.78) in a
manner identical to the one used to derive the recursive RLS algorithm from its
non-recursive form (2.53). According to the expression (2.78) one may write the
block-matrix relation

Ĥ(k+1) = P(k+1)[Zᵀ(k) ⋮ X(k+1)] [ρW(k)  0 ; 0  1] [y(k) ; y(k+1)]
        = P(k+1)[ρZᵀ(k)W(k)y(k) + X(k+1)y(k+1)],  (2.79)
where

P(k+1) = [Zᵀ(k+1)W(k+1)Z(k+1)]⁻¹,  Zᵀ(k+1) = [Zᵀ(k) ⋮ X(k+1)],

W(k+1) = [ρW(k)  0 ; 0  1],  y(k+1) = [y(k) ; y(k+1)].  (2.80)
According to (2.80) it follows further that

Zᵀ(k)W(k)y(k) = P⁻¹(k)Ĥ(k),  (2.84)

from where, after introducing the expression (2.79), one obtains

Zᵀ(k)W(k)y(k) = (1/ρ)[P⁻¹(k+1) − X(k+1)Xᵀ(k+1)]Ĥ(k).  (2.85)
By replacing the expression (2.85) in relation (2.81), one obtains

Ĥ(k+1) = Ĥ(k) + P(k+1)X(k+1)[y(k+1) − Xᵀ(k+1)Ĥ(k)].  (2.86)
The expression

K(k+1) = P(k+1)X(k+1)  (2.87)

defines the gain matrix of the recursive algorithm (2.86) for the estimation of the
digital filter parameters. By replacing the expression (2.82) in the relation (2.87),
the latter can be written in the alternative form

K(k+1) = P(k)X(k+1)[ρ + Xᵀ(k+1)P(k)X(k+1)]⁻¹.  (2.88)
Relations (2.82), (2.86) and (2.87) or (2.88) define the recursive WRLS algorithm,
i.e.

Ĥ(k) = Ĥ(k−1) + K(k)[y(k) − Xᵀ(k)Ĥ(k−1)],  (2.89)

where

P(k) = (1/ρ){P(k−1) − P(k−1)X(k)[ρ + Xᵀ(k)P(k−1)X(k)]⁻¹ Xᵀ(k)P(k−1)}.  (2.90)
To start the recursive procedure (2.88)–(2.90) it is necessary to adopt the initial
values P(0) and Ĥ(0); they are chosen in the same manner as in the RLS
algorithm, i.e. Ĥ(0) = 0 and P(0) = σ²I, where σ² ≫ 1 and I is the unit matrix
of corresponding dimensions.
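The recursions (2.88)–(2.90) with the stated initialization can be sketched as follows; the unknown system, the value σ² = 10⁶, and the data length are arbitrary illustrative choices (not from the book):

```python
import numpy as np

rng = np.random.default_rng(4)

b_true = np.array([0.5, -0.3, 0.2])   # hypothetical unknown FIR system
M = len(b_true) - 1

x = rng.standard_normal(2000)
y = np.convolve(x, b_true)[:len(x)]   # noiseless reference

rho = 0.99                     # forgetting factor
H = np.zeros(M + 1)            # H(0) = 0
P = 1e6 * np.eye(M + 1)        # P(0) = sigma^2 I with sigma^2 >> 1

for k in range(M, len(x)):
    Xk = x[k - M:k + 1][::-1]             # input vector [x(k), ..., x(k-M)]
    K = P @ Xk / (rho + Xk @ P @ Xk)      # gain vector (2.88)
    H = H + K * (y[k] - Xk @ H)           # update driven by the residual (2.89)
    P = (P - np.outer(K, Xk) @ P) / rho   # covariance update (2.90)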
The very name forgetting factor, ρ, suggests that it represents the measure of
taking the previous measurements into account in the estimation process. In other
words, the choice of the value of the forgetting factor determines how quickly
the influence of the previous measurements is neglected. In the estimation of
stationary parameters it is desirable that the algorithm takes all previous measurements
into account with similar weight, because the system does not change with time; in
this case one assumes ρ = 1, i.e. τ → ∞. However, the situation is completely
different if the parameters vary with time. In this case the so-called older measurements
do not have a large significance, because they do not carry information
about the newly occurring changes. Because of that their importance is decreased
by choosing a forgetting factor with a value smaller than 1. For instance,
ρ = 0.9 corresponds, according to (2.73), to an algorithm memory of τ = 10
signal samples.
Through the choice of a forgetting factor ρ < 1 one achieves faster adaptation
of the parameters to the accurate values, in such a manner that better tracking is
obtained with smaller values of ρ (which corresponds to smaller τ), but one
simultaneously increases the variance of the parameter estimates because of the
influence of noisy measurements. On the other hand, through the use of a fixed
forgetting factor ρ < 1, the matrix P(k) is constantly divided by a factor
smaller than 1, which may lead to the consequence that the gain in (2.89)
achieves a very high value, and thus the algorithm becomes very sensitive to
random disturbances or numerical errors propagating through the residual of the
measurements

e(k) = y(k) − Xᵀ(k)Ĥ(k−1).  (2.91)
The basic deficiency of the application of RLS algorithms with a fixed FF in
time-variant systems follows from here. The choice of the time constant τ, i.e. of the
forgetting factor ρ, depends on the expected dynamics of the filter parameter
change, and they should be chosen so that the parameters are approximately constant
on an interval with a length of τ signal samples. Starting from expression (2.71), for
nonstationary signals containing intervals of quasi-stationarity, on the nonstationary
segments it is useful to utilize ρ < 1, which corresponds to a small τ. For stationary
segments one should adopt ρ = 1, which corresponds to a large value of
τ (τ → ∞). In order to achieve an adequate ability of adaptation to the changes of
time-variant systems, which includes nonstationary changes, and simultaneously
to avoid a significant influence on the variance of the estimated parameters in the
intervals without changes, as well as on their accuracy, it is necessary
to vary the forgetting factor adaptively during the operation of the algorithm itself.
A wider discussion of the strategies for choosing a variable forgetting factor during
the application of the adaptive algorithm itself is given in Chap. 3. In practical
situations one sometimes adopts that the forgetting factor ρ varies with time within the
quasi-stationarity interval and exponentially increases to 1 [24]. This corresponds
to choosing
ρ(k) = ρ₀ρ(k−1) + (1 − ρ₀),  k = 1, 2, …  (2.92)

where the usual choice is ρ₀ = 0.99 and ρ(0) = 0.95. According to (2.92) one may
write

ρ(1) = ρ₀ρ(0) + (1 − ρ₀),
ρ(2) = ρ₀ρ(1) + (1 − ρ₀) = ρ₀²ρ(0) + ρ₀(1 − ρ₀) + (1 − ρ₀),
⋮
ρ(k) = ρ₀ᵏρ(0) + (1 − ρ₀)(1 − ρ₀ᵏ)/(1 − ρ₀) = ρ₀ᵏρ(0) + 1 − ρ₀ᵏ.  (2.93)

According to the above expression one obtains

lim_{k→∞} ρ(k) = 1.  (2.94)
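The recursion (2.92), its closed form (2.93), and the limit (2.94) can be verified numerically with the usual choices ρ₀ = 0.99 and ρ(0) = 0.95:

```python
rho0 = 0.99     # usual choice of rho_0
rho = 0.95      # usual choice of rho(0)

history = [rho]
for k in range(1, 1000):
    rho = rho0 * rho + (1.0 - rho0)   # recursion (2.92)
    history.append(rho)

# Closed form (2.93): rho(k) = rho0^k * rho(0) + 1 - rho0^k
k = 100
closed = rho0**k * 0.95 + 1.0 - rho0**k
```

The sequence climbs monotonically from 0.95 toward 1, so early measurements are discounted while the algorithm gradually recovers the unlimited-memory behavior on a quasi-stationary segment.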
FIR adaptive filters have a unimodal criterion function with a single global minimum
and are not susceptible to instability under changes of their parameter values,
because the corresponding filter transfer function has all its poles at the origin,
z = 0, of the complex z-plane, i.e. all poles of the transfer function are within the
stability region (the unit circle, |z| < 1) [3]. The process of convergence of the
parameters of an FIR filter towards the optimal values, corresponding to the
minimum of the adopted criterion, is well researched and the results are
available in the literature [4]. These properties make FIR filters more desirable than other
structures, and because of that they have a very wide practical application [17].
However, with an increase of the length of the impulse response of the modeled
system one must proportionally increase the number of the filter parameters. This
leads to an increased complexity of the adaptive algorithm, a decrease of the
convergence speed and, in the case of an exceptionally long impulse response, to an
unacceptably high complexity of the suitable digital hardware.
It is possible to overcome this deficiency by using adaptive filters with infinite
impulse response, the IIR filters.
The main advantage of the adaptive IIR filters in comparison to the adaptive
FIR filters is that by using the same or even a smaller number of parameters one is
able to significantly better describe a given system for signal transfer and pro-
cessing. The response of this system can be much better described by the output
60 2 Adaptive Filtering
signal from a filter whose transfer function has both zeroes and poles (IIR) in
comparison to a filter whose transfer function has zeroes only (FIR). So for
example an adaptive IIR filter with a sufficiently high order can accurately model
an unknown system described by a certain number of zeroes and poles, while an
adaptive FIR filter can only approximate it. In other words, to describe some
system with a given accuracy, an IIR filter generally requires a much smaller
number of coefficients than the corresponding FIR filter.
Figure 2.1 shows the general structure of an adaptive filter, which may also be an
adaptive IIR filter; x(k) denotes the input signal, ŷ(k) the output signal, e(k) the error
signal, H the unknown parameter vector of the estimated filter, and y(k) the
reference signal. The adaptive IIR filter consists of two basic parts: a digital IIR
filter, determined by the values of the variable parameters of the vector H, and the
corresponding adaptive algorithm according to which the unknown parameters are
updated in order to minimize a given criterion function of the error signal e(k).
Basically there are two approaches to adaptive digital IIR filtering, which
correspond to different formulations of the error signal ek. They are denoted as
the equation error (EE) method and the output error (OE) method. The EE method
is characterized by the updating of the feedback coefficients of the IIR adaptive
filter in the domain of zeroes, which basically leads to the adaptive FIR filters and
the corresponding adaptive algorithms from their domain.
The adaptive IIR filters based on the EE method are shown schematically in
Fig. 2.9. The signal ye(k) is defined by the following expression

ye(k) = −Σ_{i=1}^{N} a_i(k)y(k−i) + Σ_{i=0}^{M} b_i(k)x(k−i),  (2.95)
2.5 Adaptive Algorithms for the Estimation of the Parameters of IIR Filters 61
[Figure: the adaptive IIR filter block with transfer function B(k, z⁻¹)/(1 + A(k, z⁻¹)), input x(k) and output yo(k)]
where fai kg and fbi kg are variable parameters of the IIR filter, estimated by a
suitable adaptive algorithm (the adaptive algorithm recursively minimizes the
adopted criterion function).
The signal ye(k) is a function of the momentary value and the M previous
values of the input signal x(k), as well as of the N previous values of the reference
signal y(k). One should note that it does not depend on the values of the filter
output signal, yo(k). Neither the reference signal y(k) nor the input signal x(k)
depends on the filter coefficients, so the determination of ye(k) with
regard to the filter parameters is a non-recursive procedure. Expression (2.95) can
be written in the polynomial form
ye(k) = B(k, z⁻¹)x(k) − A(k, z⁻¹)y(k),  (2.96)

where the polynomials in the discrete moment kT (T is the signal sampling period) are

A(k, z⁻¹) = Σ_{i=1}^{N} a_i(k)z⁻ⁱ,  B(k, z⁻¹) = Σ_{i=0}^{M} b_i(k)z⁻ⁱ.  (2.97)
The EE adaptive IIR filters thus essentially reduce to adaptive FIR
filters (where A(k, z⁻¹) ≡ 0). They utilize similar adaptive algorithms, with similar
convergence properties, as the FIR adaptive filters [1]. Equation (2.95) can also be
represented in the vectorial form
ye(k) = Hᵀ(k)Xe(k),  (2.99)

which represents the scalar product of the following two vectors

Hᵀ(k) = [b0(k)  b1(k)  …  bM(k)  a1(k)  a2(k)  …  aN(k)],  (2.100)

Xe(k) = [x(k)  x(k−1)  …  x(k−M)  −y(k−1)  −y(k−2)  …  −y(k−N)]ᵀ.  (2.101)
The vector Hk contains estimated parameters in a discrete moment k, and the
vector Xe k contains actual and delayed values of the input and reference signal.
It is important to note that Xe(k) is not a function of the vector H(k). Various
algorithms may be used for the estimation of the parameters, such as the Recursive
Least Squares (RLS) method, the Weighted Recursive Least Squares (WRLS)
method, the Least Mean Square (LMS) method and others [4, 6]. These algorithms
were described in detail in the previous sections.
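Building the EE regressor from delayed inputs and (negated) delayed reference samples can be sketched as below. The filter orders and coefficient values are hypothetical, and the code assumes the sign convention in which the delayed reference samples enter the regressor with a minus sign, consistent with a denominator 1 + A(k, z⁻¹):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical IIR orders and parameter values (not from the book)
M, N = 2, 2
b = np.array([0.5, 0.2, -0.1])     # b_0 .. b_M
a = np.array([-0.3, 0.1])          # a_1 .. a_N

theta = np.concatenate([b, a])     # parameter vector (2.100)

x = rng.standard_normal(100)       # input signal
y = rng.standard_normal(100)       # reference signal (arbitrary here)

def ee_regressor(k):
    """X_e(k) as in (2.101): delayed inputs, then negated delayed references."""
    xs = [x[k - i] for i in range(M + 1)]
    ys = [-y[k - i] for i in range(1, N + 1)]
    return np.array(xs + ys)

k = 10
ye = theta @ ee_regressor(k)       # equation-error output (2.99)
```

Because the regressor does not contain the filter output, ye(k) is linear in the parameters, which is what lets the FIR-style algorithms above be reused unchanged.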
IIR adaptive algorithms based on the EE model may converge to values biased
(shifted) with respect to the optimal ones, which leads to an erroneous estimation of
the parameters. Although the EE adaptive IIR filters have good convergence properties,
in principle they may yield completely unacceptable models if this bias in the
parameter estimates is significant. Let us note that the estimate of the parameter
vector is unbiased if the mathematical expectation of the parameter vector estimate
(the estimated parameters represent a random vector) is equal to the
accurate (optimal) value.
OE adaptive IIR filters update the coefficients of the IIR filter directly, both in
the domain of zeroes and in the domain of poles, including the feedback branch.
In this case the estimated parameters are not biased with respect to the optimal values,
but the adaptive algorithm may converge to a local minimum of the criterion function.
This means that the estimated values do not have to correspond to the optimal
ones. The block diagram of the OE adaptive IIR filter is shown in Fig. 2.10. In
this case the output error, OE, is defined as

eo(k) = y(k) − yo(k).  (2.102)
The error signal $e_o(k)$ is a nonlinear function of the filter coefficients, and the criterion function $\mathrm{MSE} = E\{e_o^2(k)\}$ may have more than one local minimum. The corresponding adaptive algorithms in principle converge more slowly than the EE algorithms and may converge to local minima. However, the distinction of the OE method compared to the EE method is that the adaptive filter generates the output $y_o(k)$ based on the input signal $x(k)$ only, while in the case of the EE method the reference signal, $y(k)$, also takes part in the adaptation process.
The signal at the output from the OE IIR adaptive filter is defined by the
difference equation
2.5 Adaptive Algorithms for the Estimation of the Parameters of IIR Filters 63
$$ y_o(k) = \sum_{i=1}^{N} a_i(k)\,y_o(k-i) + \sum_{i=0}^{M} b_i(k)\,x(k-i). \qquad (2.103) $$
It is seen from (2.103) that the output signal $y_o(k)$ is a function of its $N$ previous values, as well as of the current and the $M$ previous values of the input signal $x(k)$. Such feedback over the output signal significantly influences the adaptive algorithm, making it much more complex than the EE approach. Analogously to expressions (2.96) and (2.97), expression (2.103) can be represented in the polynomial form
$$ y_o(k) = \frac{B(k, z^{-1})}{1 - A(k, z^{-1})}\, x(k), \qquad (2.104) $$
or as a vector equation (scalar product)
$$ y_o(k) = \mathbf{H}^T(k)\,\mathbf{X}_o(k), \qquad (2.105) $$
where the vector of variable parameters $\mathbf{H}(k)$ is defined by (2.100), and
$$ \mathbf{X}_o^T(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y_o(k-1)\;\; y_o(k-2)\;\; \ldots\;\; y_o(k-N)\,]. \qquad (2.106) $$
The output $y_o(k)$ is a nonlinear function of the parameters in the vector $\mathbf{H}$, because the terms $\{y_o(k-i);\ i = 1, 2, \ldots, N\}$, as elements of the vector $\mathbf{X}_o$, are themselves functions of the filter coefficients in the previous $k$ iterations. This fact significantly complicates the synthesis of adaptive algorithms for the estimation of the parameters of a digital filter.
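As an illustration, the recursion (2.103) can be written out directly in code. The following fragment is a minimal sketch assuming causal signals, $x(i) = y_o(i) = 0$ for $i < 0$; the function name and argument layout are illustrative and not taken from the book.

```python
import numpy as np

def oe_filter_output(x, a, b):
    """Output of the OE filter by the difference equation (2.103):
    y_o(k) = sum_{i=1..N} a_i y_o(k-i) + sum_{i=0..M} b_i x(k-i),
    with a = [a_1..a_N] (feedback) and b = [b_0..b_M] (feedforward)."""
    N, M = len(a), len(b) - 1
    y_o = np.zeros(len(x))
    for k in range(len(x)):
        acc = 0.0
        for i in range(1, N + 1):        # feedback part over past outputs
            if k - i >= 0:
                acc += a[i - 1] * y_o[k - i]
        for i in range(M + 1):           # feedforward part over inputs
            if k - i >= 0:
                acc += b[i] * x[k - i]
        y_o[k] = acc
    return y_o
```

Equivalently, each output sample is the scalar product (2.105) of $\mathbf{H} = [\,b_0 \ldots b_M\ a_1 \ldots a_N\,]$ with the data vector $\mathbf{X}_o(k)$ of (2.106).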
Adaptive algorithms which will be described in this section can be represented
in a general form denoted as the Gauss-Newton algorithm [25].
The Gauss-Newton algorithm represents a stochastic version of the Newton
deterministic algorithm (2.30). To illustrate this algorithm, let us consider the
model of a stochastic signal described by the difference equation (the linear
regression equation)
64 2 Adaptive Filtering
$$ y(k) = \mathbf{H}^T \mathbf{X}(k) + e(k), \qquad (2.107) $$
where $y(k)$ and $\mathbf{X}(k)$ are measurable variables, and $\mathbf{H}$ is the unknown parameter vector to be determined. In the above expression $e(k)$ represents a random residual or error, and the natural way to determine $\mathbf{H}$ is to minimize the variance of the error, i.e. the mean square error
$$ J(\mathbf{H}) = \frac{1}{2}\, E\{e^2(k)\} = \frac{1}{2}\, E\big\{[\,y(k) - \mathbf{H}^T\mathbf{X}(k)\,]^2\big\}, \qquad (2.108) $$
where $E\{\cdot\}$ denotes mathematical expectation. Since $J(\mathbf{H})$ is a quadratic function of the argument $\mathbf{H}$, its minimum is obtained by solving the equation
$$ \frac{\partial}{\partial \mathbf{H}}\, J(k) = \nabla J(\mathbf{H}) = -E\big\{\mathbf{X}(k)\,[\,y(k) - \mathbf{H}^T\mathbf{X}(k)\,]\big\} = 0. \qquad (2.109) $$
The quoted problem cannot be solved exactly, since the joint probability density function of the random variables $y(k)$, $\mathbf{X}(k)$, which is necessary to determine the mathematical expectation, is unknown. One way to overcome this difficulty is to replace the unknown mathematical expectation by the corresponding arithmetic mean, i.e. to adopt the approximation
$$ E\{f(x)\} \approx \frac{1}{M}\sum_{i=1}^{M} f(i), \qquad (2.110) $$
which leads to the least squares algorithm, described in detail in the previous section. Another possibility is to apply a stochastic version of the Newton deterministic scheme (2.30)
$$ \mathbf{H}(k) = \mathbf{H}(k-1) - c(k)\,\big[\nabla^2 J(\mathbf{H}(k-1))\big]^{-1}\,\nabla J(\mathbf{H}(k-1)), \qquad (2.111) $$
where the Hessian is
$$ \nabla^2 J(\mathbf{H}(k-1)) = \frac{d^2 J(k)}{d\mathbf{H}^2} = \frac{d}{d\mathbf{H}}\,\nabla J(\mathbf{H}(k-1)) = E\{\mathbf{X}(k)\,\mathbf{X}^T(k)\}. \qquad (2.112) $$
It can be seen that the Hessian is independent of $\mathbf{H}$. The Hessian can be determined as the solution $\mathbf{R}$ of the equation
$$ E\{\mathbf{X}(k)\,\mathbf{X}^T(k)\} - \mathbf{R} = 0. \qquad (2.113) $$
To iteratively solve this equation one may further use the Robbins-Monro
stochastic approximation procedure [24, 28].
A typical problem of stochastic approximation may be formulated in the following manner. Let $\{e(k)\}$ represent a series of stochastic variables with an identical distribution function, where $k$ denotes the index of a discrete moment. Let there further be given a function $Q(x, e(k))$ of two arguments $x$ and $e(k)$, whose form does not have to be known accurately, but such that for each adopted $x$ and each obtained $e(k)$ one can determine the value of the function $Q(\cdot,\cdot)$. The problem is now to determine the solution of the equation
$$ E\{Q(x, e(k))\} = f(x) = 0, \qquad (2.114) $$
where $E\{\cdot\}$ denotes the mathematical expectation with regard to the random variable $e(k)$, and it is assumed that the user does not know the distribution function, i.e. the probability density, of the stochastic variable $e(k)$. The posed problem reduces to the determination of the series $x(k)$, $k = 1, 2, \ldots$, the calculation of the corresponding values of $Q(x, e(k))$ and the determination of the solution of Eq. (2.114). This equation is also denoted as the regression equation. Its trivial solution consists in fixing the variable $x$, determining a large number of values of $Q(x, e(k))$ for the adopted $x$, with the aim of obtaining a good estimate of $f(x)$, and repeating this procedure for a certain number of new values of the variable $x$ until the solution of the regression equation (2.114) is found. Obviously such a procedure is not efficient, since much time is spent estimating $f(x)$ for values of the variable $x$ which differ significantly from the sought solution of (2.114). Robbins and Monro proposed the following iterative solution for the determination of the root of Eq. (2.114)
$$ \hat{x}(k) = \hat{x}(k-1) + c(k)\,Q(\hat{x}(k-1), e(k)), \qquad (2.115) $$
where $\{c(k)\}$ is a series of positive scalar variables which tends to zero as the index $k$ increases. The convergence properties of the proposed procedure were analyzed by Robbins and Monro, Blum and Dvoretsky, where it was shown that under certain conditions the series (2.115) converges to the solution of Eq. (2.114). A typical assumption in these analyses is that the terms of the series $\{e(k)\}$ are independent stochastic vectors, which is not fulfilled in the general case [24, 28]. Especially, for the problem under consideration (2.113), the choice $Q(\mathbf{R}, \mathbf{X}(k)) = \mathbf{X}(k)\mathbf{X}^T(k) - \mathbf{R}$ yields a recursive estimate of the Hessian.
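For instance, applying the Robbins-Monro recursion (2.115) to Eq. (2.113) with $Q(\mathbf{R}, \mathbf{X}(k)) = \mathbf{X}(k)\mathbf{X}^T(k) - \mathbf{R}$ gives a stochastic iteration whose fixed point is $\mathbf{R}^* = E\{\mathbf{X}\mathbf{X}^T\}$. A small sketch, under the assumption of i.i.d. standard normal data vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Robbins-Monro iteration (2.115) applied to Eq. (2.113):
# Q(R, X(k)) = X(k)X^T(k) - R, so the root of E{Q} = 0 is
# R* = E{X X^T}.  The gains c(k) are positive and tend to zero.
dim = 2
R = np.zeros((dim, dim))                 # initial estimate R(0)
for k in range(1, 5001):
    X = rng.standard_normal(dim)         # one realization of the data vector
    c = 1.0 / k                          # step size c(k) -> 0
    R = R + c * (np.outer(X, X) - R)     # R(k) = R(k-1) + c(k) Q(R(k-1), X)

# For i.i.d. standard normal data, E{X X^T} is the identity matrix.
```

With $c(k) = 1/k$ and $\mathbf{R}(0) = 0$ this iteration is exactly the running sample average of $\mathbf{X}(k)\mathbf{X}^T(k)$.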
A modified version of this algorithm, useful for the estimation of the unknown parameters of the IIR filter, is given by the expression [24]
$$ G(k, z) = \sum_{i=0}^{m} g_i(k)\,z^{-i}. \qquad (2.122) $$
The scalar variable $a$ controls the convergence speed of the algorithm, and the matrix $\mathbf{R}(k)$ is updated according to (2.124).
The role of the matrix $\mathbf{R}^{-1}$ is to speed up the convergence of the adaptive algorithm, and the price to pay is an increase in computational complexity. If the value $\mathbf{R}^{-1}(k-1)$ in (2.119) is replaced by a unit matrix $\mathbf{I}$, one obtains an algorithm with worse convergence properties, but also with a lower complexity, of the order of $(M+N)$ arithmetic operations compared to $(M+N)^2$ in the basic algorithm. The parameter vector is most often initialized to $\mathbf{H}(0) = \mathbf{0}$, where $\mathbf{0}$ is the zero-vector of corresponding dimensions, all of whose components are 0, and $\mathbf{R}(0) = \sigma^2\mathbf{I}$, where $\sigma^2$ is a small, positive scalar variable. Other initial values may be defined too, but one has to take care that $\mathbf{R}$ is a positive definite matrix, in order to enable the determination of the inverse matrix $\mathbf{R}^{-1}$, and that the poles of the polynomial $1 - A(k, z^{-1})$ in (2.104) are always within the unit circle of the complex $z$-plane, in order to ensure filter stability.
For the EE method one takes
$$ F(k, z) = G(k, z) = 1, \qquad (2.125) $$
so that
$$ \mathbf{X}_F(k) = \mathbf{X}_e(k), \qquad e_G(k) = e_e(k). \qquad (2.126) $$
The corresponding algorithm is the WRLS algorithm, and if one takes a unit matrix $\mathbf{I}$ for $\mathbf{R}(k-1)$, one obtains the LMS algorithm. The block diagram of the EE WRLS algorithm is given in Table 2.3.
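To make the EE adaptation concrete, the following sketch identifies a first-order IIR system with the LMS update obtained by replacing $\mathbf{R}^{-1}(k-1)$ with the unit matrix; the step size $\mu$ and the test system are illustrative assumptions, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unknown system: y(k) = 0.5*y(k-1) + 1.0*x(k)  (N = 1, M = 0)
x = rng.standard_normal(4000)
y = np.zeros_like(x)
for k in range(len(x)):
    y[k] = (0.5 * y[k - 1] if k > 0 else 0.0) + x[k]

# EE adaptation: y_e(k) = H^T X_e(k) with H = [b_0, a_1] as in (2.100)
# and X_e(k) = [x(k), y(k-1)] as in (2.101); LMS update with step mu.
H = np.zeros(2)
mu = 0.01
for k in range(1, len(x)):
    X_e = np.array([x[k], y[k - 1]])     # regressor built from x and the
    e_e = y[k] - H @ X_e                 # reference signal y (EE model)
    H = H + mu * e_e * X_e               # stochastic-gradient (LMS) step

# In this noise-free case H converges near [b_0, a_1] = [1.0, 0.5].
```

Since the regressor contains the measured reference $y(k-1)$ rather than the filter's own output, each update is linear in $\mathbf{H}$; this is exactly why the EE criterion is unimodal while, with noisy $y$, its minimum can be biased.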
The recursive prediction error (RPE) algorithm updates the parameters of the vector $\mathbf{H}$ so as to minimize the mean square error criterion $\xi = E\{e_o^2(k)\}$, where $e_o$ is the output error. Since in general $\xi$ is an unknown variable, the algorithm is designed to minimize in each iteration the estimated current value of $\xi$, expressed as $\hat{\xi} = e_o^2(k)$; a consequence of this approximation is a relatively noisy estimation of the filter parameters.
$$ \frac{\partial y_o(k)}{\partial a_j(k)} = y_o(k-j) + \sum_{i=1}^{N} a_i(k)\,\frac{\partial y_o(k-i)}{\partial a_j(k)}, \qquad \frac{\partial y_o(k)}{\partial b_j(k)} = x(k-j) + \sum_{i=1}^{N} a_i(k)\,\frac{\partial y_o(k-i)}{\partial b_j(k)}. \qquad (2.129) $$
$$ \frac{\partial y_o(k)}{\partial b_j(k)} = x(k-j) + \sum_{i=1}^{N} a_i(k)\,\frac{\partial y_o(k-i)}{\partial b_j(k)} = \frac{1}{1 - A(k, z^{-1})}\,x(k-j), \qquad (2.132) $$
where the polynomial $A(k, z^{-1})$ is defined by (2.97). While deriving relations (2.131) and (2.132) the operator of unit delay $z^{-1}$ was introduced, so that
$$ \frac{\partial y_o(k-1)}{\partial a_j(k)} = z^{-1}\,\frac{\partial y_o(k)}{\partial a_j(k)} \quad \text{and} \quad \frac{\partial y_o(k-1)}{\partial b_j(k)} = z^{-1}\,\frac{\partial y_o(k)}{\partial b_j(k)}. $$
It follows that in the general form of the algorithm (2.119)-(2.122)
$$ F(k, z) = \frac{1}{1 - A(k, z^{-1})}, \qquad G(k, z^{-1}) = 1. \qquad (2.133) $$
Now it is possible to write the complete RPE algorithm
$$ \hat{\mathbf{H}}(k+1) = \hat{\mathbf{H}}(k) + a\,\mathbf{R}^{-1}(k+1)\,\frac{1}{1 - A(k, z^{-1})}\,\mathbf{X}_o(k)\,e_o(k), \qquad (2.134) $$
where $\mathbf{X}_o(k)$ is defined by expression (2.106), $e_o(k)$ by (2.102) and (2.105), and $\mathbf{R}(k)$ by expression (2.124), while the polynomial $A(k, z^{-1})$ is given by expression (2.97).
Relation (2.134) can also be written in an alternative form, in which the data are first filtered. In this way, the vector of filtered input-output data is defined by the expression
$$ \mathbf{X}_{of}^T(k) = [\,x_f(k)\;\; x_f(k-1)\;\; \ldots\;\; x_f(k-M)\;\; y_f(k-1)\;\; y_f(k-2)\;\; \ldots\;\; y_f(k-N)\,], \qquad (2.139) $$
where
$$ x_f(j) = F(k, z^{-1})\,x(j) = x(j) + \sum_{i=1}^{N} f_i(k)\,x(j-i), \qquad j = k, k-1, \ldots, k-M, \qquad (2.140) $$
$$ y_f(j) = F(k, z^{-1})\,y_o(j) = y_o(j) + \sum_{i=1}^{N} f_i(k)\,y_o(j-i), \qquad j = k-1, \ldots, k-N. \qquad (2.141) $$
Introducing the parameter vector
$$ \mathbf{f}^T(k) = [\,f_1(k),\ f_2(k),\ \ldots,\ f_N(k)\,] \qquad (2.142) $$
and the vectors of input-output data
$$ \mathbf{X}^T(j) = [\,x(j-1),\ x(j-2),\ \ldots,\ x(j-N)\,], \qquad j = k, k-1, \ldots, k-M, \qquad (2.143) $$
one may write
$$ x_f(j) = x(j) + \mathbf{X}^T(j)\,\mathbf{f}(k), \qquad j = k, k-1, \ldots, k-M, \qquad (2.145) $$
$$ y_f(j) = y_o(j) + \mathbf{Y}_o^T(j)\,\mathbf{f}(k), \qquad j = k-1, k-2, \ldots, k-N. \qquad (2.146) $$
If the estimates of the parameters are close to their optimal values, the procedure can be simplified by filtering only the last input and output data instead of the whole sequences in (2.145) and (2.146), respectively.
The main disadvantage of the RPE algorithm is that the filter poles, also used to calculate the derivatives (2.131) and (2.132), may be located outside the unit circle in the complex $z$-plane, which implies the appearance of instability. If the poles remain in this region for longer intervals during the adaptation process, a possibility occurs that the algorithm will diverge. The poles may appear outside the unit circle especially because of the noisy estimation of the gradient, since the approximation $\hat{\xi}(k)$ is used instead of $\xi(k)$. In order to avoid this, it is necessary to permanently monitor the system stability. One of the simplest tests is to check whether $\sum_i |a_i| < 1$ in each iteration. However, there are cases, especially for large values of $N$, in which this criterion is not satisfactory. On the other hand, there are tests which establish system stability with certainty, but computationally they are very complex [3].
In the majority of cases, when the test shows that new parameters lead to instability, they are most often simply discarded, i.e. one takes $\hat{\mathbf{H}}(k+1) = \hat{\mathbf{H}}(k)$. Naturally, this degrades the properties of the algorithm and makes it non-robust, since it may remain in that state for an indeterminate period of time. If the poles are located within the unit circle, the filter will be stable only if it represents a linear and time-invariant system. For systems variable in time, such as the adaptive IIR filters, it is not sufficient merely to follow the position of the poles at discrete time instants in order for the realized system to be efficient in practical situations.
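The two kinds of stability tests mentioned above can be sketched as follows: the simple test checks $\sum_i |a_i| < 1$, while the exact (more expensive) test locates the poles numerically. Function names are illustrative assumptions.

```python
import numpy as np

def sufficiently_stable(a_coeffs):
    """Simple sufficient test from the text: sum_i |a_i| < 1 guarantees
    that the poles of 1/(1 - A(z^{-1})) lie inside the unit circle."""
    return np.sum(np.abs(a_coeffs)) < 1.0

def poles_inside_unit_circle(a_coeffs):
    """Exact test: find the roots of the denominator polynomial
    1 - a_1 z^{-1} - ... - a_N z^{-N}, i.e. of
    z^N - a_1 z^{N-1} - ... - a_N, and check their magnitudes."""
    poly = np.concatenate(([1.0], -np.asarray(a_coeffs, dtype=float)))
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))

# The simple test is only sufficient: a filter can be stable even when
# sum |a_i| >= 1, e.g. a_1 = 1.2, a_2 = -0.32 gives poles 0.8 and 0.4.
```

This illustrates the remark in the text: for larger $N$ the cheap criterion rejects many stable parameter sets, while the exact root computation costs an eigenvalue problem per iteration.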
A block diagram of the RPE algorithm is given in Table 2.4.
Filter coefficients:
$$ \hat{\mathbf{H}}(k) = \hat{\mathbf{H}}(k-1) + a\,\mathbf{R}^{-1}(k)\,\mathbf{X}_{of}(k-1)\,e_o(k-1). $$
Update the data vector:
$$ \mathbf{X}_o^T(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y_o(k-1)\;\; y_o(k-2)\;\; \ldots\;\; y_o(k-N)\,], $$
where $x(i) = y(i) = 0$ for $i < 0$ (causal system).
Calculate the output error:
$$ e_o(k) = y(k) - \mathbf{X}_o^T(k)\,\hat{\mathbf{H}}(k) = y(k) - y_o(k). $$
Calculate the coefficients of the filter that filters $\mathbf{X}_o(k)$:
$$ f_i(k) = \sum_{j=1}^{N} \hat{H}_{M+1+j}(k)\,f_{i-j}(k), \qquad f_0(k) = 1, \quad i = 1, \ldots, N, $$
where $f_{i-j}(k) = 0$ for $i - j < 0$.
Form the vector of coefficients:
$$ \mathbf{f}^T(k) = [\,f_1(k)\;\; f_2(k)\;\; \ldots\;\; f_N(k)\,]. $$
Filter the input and output data:
$$ x_f(k) = x(k) + \mathbf{X}^T(k)\,\mathbf{f}(k), \quad \text{where } \mathbf{X}^T(k) = [\,x(k-1)\;\; \ldots\;\; x(k-N)\,],\ x(i) = 0 \text{ for } i < 0, $$
$$ y_f(k-1) = y_o(k-1) + \mathbf{Y}_o^T(k)\,\mathbf{f}(k), \quad \text{where } \mathbf{Y}_o^T(k) = [\,y_o(k-2)\;\; \ldots\;\; y_o(k-1-N)\,],\ y_o(i) = 0 \text{ for } i < 0. $$
Form the vector of filtered data:
$$ \mathbf{X}_{of}^T(k) = [\,x_f(k)\;\; x_f(k-1)\;\; \ldots\;\; x_f(k-M)\;\; y_f(k-1)\;\; y_f(k-2)\;\; \ldots\;\; y_f(k-N)\,], $$
where $x_f(i) = y_f(i) = 0$ for $i < 0$.
3. Increment the iteration counter $k$ by 1 and repeat the procedure from step 2.
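The steps of Table 2.4 can be condensed into a single update routine. The sketch below follows the listed steps but simplifies $\mathbf{R}^{-1}(k)$ to the unit matrix (the legitimate, slower variant mentioned earlier); the data layout, function names and step size are assumptions made for illustration.

```python
import numpy as np

def past(h, t, n):
    """[h(t-1), ..., h(t-n)] with h(i) = 0 for i < 0 (causal signals)."""
    return np.array([h[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)])

def rpe_step(H, x, y_o, y_k, mu, M, N):
    """One RPE iteration after Table 2.4 (simplified: R = I).
    H = [b_0..b_M, a_1..a_N]; x holds x(0..k), y_o holds y_o(0..k-1),
    y_k is the reference sample y(k); mu plays the role of a."""
    k = len(x) - 1
    # data vector (2.106), current output (2.105), output error (2.102)
    X_o = np.concatenate(([x[k]], past(x, k, M), past(y_o, k, N)))
    yo_k = H @ X_o
    e_o = y_k - yo_k
    # coefficients of the filter 1/(1 - A): f_0 = 1, f_i = sum_j a_j f_{i-j}
    a_hat = H[M + 1:]
    f = np.zeros(N + 1)
    f[0] = 1.0
    for i in range(1, N + 1):
        f[i] = sum(a_hat[j - 1] * f[i - j] for j in range(1, i + 1))
    # filtered data (2.145)-(2.146) and the vector X_of (2.139)
    yo_all = np.append(y_o, yo_k)
    xf = [(x[k - j] if k - j >= 0 else 0.0) + f[1:] @ past(x, k - j, N)
          for j in range(M + 1)]
    yf = [(yo_all[k - j] if k - j >= 0 else 0.0) + f[1:] @ past(yo_all, k - j, N)
          for j in range(1, N + 1)]
    X_of = np.array(xf + yf)
    # parameter update (2.134) with R^{-1} = I
    return H + mu * e_o * X_of, yo_k
```

In a full implementation $\mathbf{R}(k)$ would itself be updated by the Robbins-Monro recursion, and one of the stability tests discussed above would be applied to the new $\hat{a}_i$ before accepting the update.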
The statistical properties of the input and the reference signal determine the
environment of an adaptive filter. Although most of the analyses of adaptive filters
in available literature are based on a stationary environment, the utilization of
adaptive filters shows its advantages primarily in nonstationary environments.
Nonstationarity may be categorized with respect to changes in the statistical properties of the input signal, of the reference signal (including the variation of the estimated system parameters), or of both simultaneously. This chapter considers the cases when the input signal is stationary, although this does not have to be a limiting condition for the application of the analyzed algorithms. Further, it is assumed that the additive noise at the system output is stationary with regard to the reference signal (desired output), so that the considered model of nonstationarity is the one caused by the variation of the values of the estimated filter parameters.
When an adaptive filter operates in a nonstationary environment, the most important measures of its properties are (1) the time necessary for the algorithm to converge after the onset of nonstationary changes and (2) the accuracy of the estimated parameters achieved after the convergence is finished. However, these two requirements are mutually opposed, so it is necessary to define an algorithm representing an optimal tradeoff between them. One of the solutions is the use of adaptive algorithms with a variable forgetting factor.
The choice of a fixed forgetting factor with a value near unity enables efficient tracking of slow changes of the system parameters. However, this approach gives poor results if the changes of the system parameters are abrupt. As stressed in Sect. 2.4.4, the application of a variable forgetting factor in a parameter estimation algorithm ensures different weighting of the previous measurements of the signals.
In the previous chapter it was shown that the forgetting factor $\rho$ in the RLS algorithm corresponds to an asymptotically exponential decrease of memory, with a value defined by Eq. (2.71), i.e.
$$ \tau = \frac{1}{1 - \rho}. \qquad (3.1) $$
If it is assumed that the properties of an environment remain approximately unchanged within an interval $\tau$, it is possible to use (3.1) to determine the adequate value of the forgetting factor $\rho$. Thus for nonstationary signals it is necessary to adaptively change the forgetting factor during the operation of the algorithm. On the nonstationary parts of the signal it is optimal to use a short memory length $\tau = \tau_{min}$, for which $\rho = \rho_{min} < 1$, while on the stationary parts of the signal one should establish a long memory, i.e. $\tau = \tau_{max}$, for which $\rho = \rho_{max} \approx 1$. In this manner one obtains a tradeoff between the desired accuracy and the adaptation speed of the estimated parameters. According to (3.1) it follows, for example, that
$$ \tau_{min} = 20 \Rightarrow \rho_{min} = 0.95; \qquad \tau_{max} = 100 \Rightarrow \rho_{max} = 0.99. \qquad (3.2) $$
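The mapping (3.1)/(3.2) between the asymptotic memory $\tau$ and the forgetting factor $\rho$ is a one-liner to verify:

```python
# Mapping (3.1): tau = 1/(1 - rho), i.e. rho = 1 - 1/tau.
def rho_from_tau(tau):
    return 1.0 - 1.0 / tau

def tau_from_rho(rho):
    return 1.0 / (1.0 - rho)

# Numerical example (3.2): tau_min = 20  -> rho_min = 0.95,
#                          tau_max = 100 -> rho_max = 0.99.
```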
Further, such an approach assumes that the nonstationary signal consists of stationary parts of a certain length in the range between $\tau = \tau_{min}$ and $\tau = \tau_{max}$. However, there is only a low probability that in practical situations the durations of these intervals of stationarity and the moments of their onset will be known. Because of that, during the operation of the parameter estimation algorithm one has to estimate the degree of signal nonstationarity and, based on that knowledge, automatically determine the change of the value of the forgetting factor.
Two convenient ways to adaptively determine the forgetting factor, both based on the energy of the error signal (residual) in one data window, are presented in the papers [9] and [10]. The basic idea is to generate a variable forgetting factor based on the residual error, which increases in the nonstationary parts of the signal, thus indicating the onset of nonstationarity [31, 32].
Reference [9] proposed a procedure for the choice of the variable forgetting factor based on the extended prediction error (EPE algorithm), defined on a data window of length $L$ by
$$ Q(k) = \frac{1}{L}\sum_{i=0}^{L-1} e^2(k-i). \qquad (3.3) $$
Since the error $e$, occurring because of the presence of additive noise at the filter output, is a stochastic process, the idea is to use averaging (summation) to remove (filter out) the stochastic error component caused by additive noise, in order to avoid the erroneous recognition of high additive noise as nonstationarity. However, the value of $L$ (the length of the data window on which one calculates the error energy, i.e. the EPE criterion) must be sufficiently small in comparison to the maximal time constant $\tau_{max}$ (algorithm memory), in order to ensure the best possible registering of potential nonstationarity of the signal.
3.1 Choice of Variable Forgetting Factor 77
The choice of the variable forgetting factor is defined by (3.1), i.e.
$$ \rho(k) = 1 - \frac{1}{\tau(k)}, \qquad (3.4) $$
where [9]
$$ \tau(k) = \frac{\sigma_n^2\,\tau_{max}}{Q(k)}. \qquad (3.5) $$
Here $\sigma_n^2$ denotes the expected (estimated) variance of additive noise, based on a real knowledge of the analyzed stochastic process which generated the measurement data at the filter output. In the stationary parts of the signal the extended prediction error $Q(k)$ tends to the noise variance $\sigma_n^2$, and in this case the maximal asymptotic value of the memory ($\tau_{max}$) controls the adaptation speed. Since the choice of the forgetting factor defined by (3.4) and (3.5) does not guarantee positive values of the forgetting factor $\rho$ in (3.4), it is necessary to limit in advance the bottom value of this factor to $\rho_{min} < 1$. It turns out that this algorithm is efficient in the cases when the signal-to-noise ratio (SNR) is above 20 dB. For an SNR decreasing below 10 dB this algorithm gives poor results (the SNR is defined as $\mathrm{SNR} = 10\log(\sigma_y^2/\sigma_n^2)$, where $\sigma_y^2$ is the variance or mean power of the filter output signal in the absence of additive noise, while $\sigma_n^2$ denotes the variance as a measure of the mean power of additive noise). Besides that, it is necessary to specify in advance the variance of additive noise $\sigma_n^2$, which is not easily determined in many cases. The scheme (3.4), (3.5) for the choice of the variable forgetting factor is very sensitive to this parameter, which in a general case may be estimated from measured data (by their adequate averaging or in some other way). In practical situations, to obtain a heuristic estimate of an unknown variance one often uses the median of the absolute deviations from the median, calculated on a data window of length $L$ [19, 33]
$$ \hat{\sigma}_n = d(k) = \frac{\mathrm{median}\{\,|e(i) - \mathrm{median}(e(i))|\,\}}{0.6745}, \qquad (3.6) $$
where $k$ is the current discrete moment, and the index of discrete time $i$ belongs to the set $i \in \{k, k-1, \ldots, k-L+1\}$. The median represents the middle term of the sample whose elements are sorted into an increasing sequence if the sample length $L$ is an odd number, or the arithmetic mean of the two middle terms of the sorted sample if the sample length $L$ is an even number [16, 19, 33, 34]. The factor 0.6745 ensures that the estimate (3.6) is approximately equal to the standard deviation of the sample, $\sigma_n$, for a sufficiently large length $L$ of the data window, in the case that the terms of the discrete sequence $\{e(i)\}$ are generated according to the normal distribution law with a zero mean value and the variance $\sigma_n^2$.
78 3 Finite Impulse Response Adaptive
Instead of the estimate (3.6) one may also use the arithmetic mean [16, 19, 34]
$$ \hat{\sigma}_n^2 = \frac{1}{L}\sum_{i=0}^{L-1}\,[\,e(k-i) - \bar{e}\,]^2, \qquad \bar{e} = \frac{1}{L}\sum_{i=0}^{L-1} e(k-i). \qquad (3.7) $$
It is not convenient to utilize the estimate (3.7) in situations when the measurement noise has an impulse character, i.e. when it contains sporadic realizations of high intensity denoted as outliers; such an estimate is non-robust under the quoted conditions [17, 18, 19, 21, 33]. The usual choice for the estimate (3.6) is $5 \le L \le 10$, while in the case of the estimate (3.7) one adopts $L \ge 30$ [26, 27, 35, 36].
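The pieces above combine into a compact rule: estimate the noise scale robustly by (3.6) and set the forgetting factor by (3.3)-(3.5) with a lower clamp at $\rho_{min}$. The sketch below is an illustrative assembly of these formulas; the function names and the guard against division by zero are assumptions.

```python
import numpy as np

def mad_std(e):
    """Robust noise-scale estimate (3.6): median absolute deviation from
    the median, scaled by 0.6745 so that it matches the standard
    deviation for normally distributed samples."""
    e = np.asarray(e, dtype=float)
    return np.median(np.abs(e - np.median(e))) / 0.6745

def epe_forgetting_factor(residuals, sigma_n2, tau_max, rho_min=0.95):
    """Variable forgetting factor by the EPE criterion (3.3)-(3.5):
    Q(k) is the residual energy on the window, tau(k) = sigma_n^2 *
    tau_max / Q(k), rho(k) = 1 - 1/tau(k), clamped from below by
    rho_min as required in the text."""
    Q = np.mean(np.square(residuals))     # (3.3), window of length L
    tau = sigma_n2 * tau_max / max(Q, 1e-12)
    return max(1.0 - 1.0 / tau, rho_min)

# On a stationary segment Q(k) ~ sigma_n^2, so rho -> 1 - 1/tau_max;
# when nonstationarity inflates the residuals, rho drops toward rho_min.
```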
One of the most often cited algorithms for the choice of the variable forgetting factor in the recursive least squares (RLS) algorithm was proposed in the paper of Fortescue, Kershenbaum and Ydstie (FKY), after whom it was named the FKY algorithm [10]. The value of the forgetting factor is determined according to the ratio of the current value of the squared error signal and the estimated power of additive noise. The choice of the forgetting factor in the RLS algorithm is given by
$$ \rho(k) = 1 - \frac{e^2(k)}{b_0\,[\,1 + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)\,]}, \qquad (3.8) $$
where $e(k)$ is the current error or residual, and $b_0$ is a constant chosen so as to satisfy the desired estimation quality in the stationary mode of operation. Similarly to the EPE algorithm, the FKY algorithm (3.8) does not guarantee positive values of the forgetting factor, so it is necessary to limit its bottom value to $\rho_{min} < 1$. Basically, this algorithm was developed to ensure a higher robustness (insensitivity) with regard to the input signal characteristics, but it also proved successful in applications in nonstationary environments. As in the previous case, its realization requires knowledge of the characteristics of the additive noise, i.e. of its variance $\sigma_n^2$.
The derivation of the FKY algorithm (3.8) consists of the following steps [10]. Let a discrete system (filter) whose parameters are estimated be described by the following linear regression model
$$ y(k) = \mathbf{X}^T(k)\,\mathbf{H} + n(k), \qquad (3.9) $$
where $n(k)$ is additive noise at the system output. The parameters are estimated by the weighted recursive least squares algorithm
$$ \hat{\mathbf{H}}(k) = \hat{\mathbf{H}}(k-1) + \mathbf{K}(k)\,e(k), \qquad (3.12) $$
$$ e(k) = y(k) - \hat{y}(k) = y(k) - \mathbf{X}^T(k)\,\hat{\mathbf{H}}(k-1), \qquad (3.13) $$
$$ \mathbf{K}(k) = \mathbf{P}(k-1)\,\mathbf{X}(k)\,[\,\rho + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)\,]^{-1}, \qquad (3.14) $$
$$ \mathbf{P}(k) = \frac{1}{\rho}\Big\{\mathbf{P}(k-1) - \mathbf{P}(k-1)\,\mathbf{X}(k)\,[\,\rho + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)\,]^{-1}\,\mathbf{X}^T(k)\,\mathbf{P}(k-1)\Big\}, \qquad (3.15) $$
where the initial value $\mathbf{P}(0) = \sigma^2\mathbf{I}$ represents a unit matrix $\mathbf{I}$ multiplied by a large positive number $\sigma^2 \gg 1$. As will be shown in the further text, the matrix $\mathbf{P}$ has the meaning of the error covariance matrix of the estimated parameters. The role of the forgetting factor is to enable the following of the parameter changes in time-variable systems. The adaptation speed is determined by the asymptotic memory of the algorithm defined by (3.1), i.e.
$$ \tau = \frac{1}{1 - \rho}, \qquad (3.16) $$
which limits the evaluation of the previous signal measurements to $\tau$ time samples. It should be noted that for the choice $\rho = 1$, as the estimation process advances, the value of the matrix $\mathbf{P}$ decreases; a consequence is that the information about the system dynamics, i.e. about the estimated parameters, decreases and finally disappears completely. On the other hand, setting $\rho$ to a value below 1, in order to include information about the changes occurring in the system (its parameters), leads to continuous divisions of the matrix $\mathbf{P}$ by a factor lower than one, which may lead to a sharp increase of its value, as well as to a large sensitivity to disturbances and numerical errors propagating through the residual $e(k)$ in (3.13).
The error signal or residual (3.13) contains the information about the state of the estimator in each discrete moment $k$. Small values of the error signal mean, except in the case of a possible absence of the input signal, that the values of the estimated parameters are close to the desired ones. In that case it is desirable to choose a value of the forgetting factor $\rho$ near unity, in order to take all previous measurements into account comparably. In the case of an increasing error signal one should increase the estimator gain, i.e. decrease the value of the forgetting factor $\rho$ below its unit value, until the estimated parameters are updated to the desired values and the error signal becomes sufficiently small.
According to this requirement, one may define the measure of the information content of the filter, $b(k)$, as a weighted sum of squares of the error signals, which in its recursive form is given as [10]
$$ b(k) = \rho(k)\,b(k-1) + \frac{e^2(k)}{1 + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)}, \qquad (3.17) $$
where $\rho(k)$ is the variable forgetting factor. Let us note that the second addend in (3.17) represents a normalized error, since the term $1 + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)$ represents an estimate of the variance of the error $e(k)$, as will be shown at the end of this chapter.
The choice of $b(k)$ in such a manner as to preserve its constant value, i.e.
$$ b(k) = b(k-1) = \cdots = b(0), \qquad (3.18) $$
defines the strategy for the choice of the forgetting factor in such a manner that in each moment it depends on the measure of the information content of the filter, which is constant. Namely, from (3.17) and (3.18) it directly follows that
$$ \rho(k) = 1 - \frac{e^2(k)}{b_0\,[\,1 + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)\,]}. \qquad (3.19) $$
Starting from (3.16) and (3.19) one obtains for the effective filter memory
$$ \tau(k) = \frac{b_0\,[\,1 + \mathbf{X}^T(k)\,\mathbf{P}(k-1)\,\mathbf{X}(k)\,]}{e^2(k)}. \qquad (3.20) $$
Since $b_0$ is proportional to the sum of the squared error signals, when choosing its value one may start from [10]
$$ b_0 = \sigma_n^2\,\tau_0, \qquad (3.21) $$
where $\sigma_n^2$ is the expected variance of additive noise in (3.9), based on the real knowledge of the stochastic process, and $\tau_0$ represents a nominal filter memory length determining the total speed of the adaptive process. Let us note that, similarly to (2.32), the solution of the difference equation (3.17) is given as
$$ b(k) = \left[\prod_{i=0}^{k}\rho(i)\right] b_0 + \sum_{i=0}^{k} \omega(i)\,e^2(i), $$
where
$$ \omega(0) = 1, \qquad \omega(i) = [\,1 + \mathbf{X}^T(i)\,\mathbf{P}(i-1)\,\mathbf{X}(i)\,]^{-1}, \quad i = 1, \ldots, k. $$
Taking into account (3.18) one concludes that
$$ b_0\left[\,1 - \prod_{i=0}^{k}\rho(i)\,\right] = \sum_{i=0}^{k}\omega(i)\,e^2(i). $$
Since the sum of the squared errors represents an estimate of the variance of additive noise in the model of the filter output signal (3.9), the derived expression implies the relation (3.21).
At the end of this section it is shown that for a choice of $b_0$ according to (3.21), for stationary processes one obtains $E\{\tau(k)\} = \tau_0$ when $k \to \infty$. The sensitivity of the system is determined by the choice of $\tau_0$, so that lower values of $\tau_0$ lead to a more sensitive system, and higher values to a less sensitive one, but with a slower adaptability of the estimated parameters. In summary, the recursive least squares algorithm with the FKY strategy for the choice of the forgetting factor is defined in Table 3.1.
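A compact sketch of this procedure — the WRLS recursions (3.12)-(3.15) driven by the FKY choice (3.8) with $b_0 = \sigma_n^2\tau_0$ from (3.21) — is given below; the identified system, the initialization constants and the clamp $\rho_{min}$ are illustrative assumptions, not values prescribed by the book.

```python
import numpy as np

rng = np.random.default_rng(2)

def fky_rls(xs, ys, M, sigma_n2, tau_0, rho_min=0.5):
    """RLS with the FKY variable forgetting factor, sketched after
    Table 3.1: updates (3.12)-(3.15) with rho(k) chosen by (3.8) and
    b_0 = sigma_n^2 * tau_0 from (3.21); rho is clamped at rho_min."""
    H = np.zeros(M + 1)
    P = 1e4 * np.eye(M + 1)               # P(0) = sigma^2 I, sigma^2 >> 1
    b0 = sigma_n2 * tau_0                 # (3.21)
    for k in range(len(xs)):
        X = np.array([xs[k - i] if k - i >= 0 else 0.0 for i in range(M + 1)])
        e = ys[k] - H @ X                 # residual (3.13)
        r = X @ P @ X
        rho = max(1.0 - e * e / (b0 * (1.0 + r)), rho_min)      # (3.8)
        K = P @ X / (rho + r)             # gain (3.14)
        H = H + K * e                     # parameter update (3.12)
        P = (P - np.outer(K, X @ P)) / rho                      # (3.15)
    return H

# Identify an M = 1 FIR system y(k) = 2 x(k) - x(k-1) + noise.
xs = rng.standard_normal(2000)
noise = 0.01 * rng.standard_normal(2000)
ys = 2.0 * xs - np.concatenate(([0.0], xs[:-1])) + noise
H_hat = fky_rls(xs, ys, M=1, sigma_n2=1e-4, tau_0=100)
```

In the stationary steady state $e^2(k) \approx \sigma_n^2\,[1 + \mathbf{X}^T\mathbf{P}\mathbf{X}]$, so $\rho(k)$ settles near $1 - 1/\tau_0$, as expression (3.20) predicts.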
It should be mentioned that an accurate solution of the problem of keeping $b(k)$ constant, which reduces to the solution of (3.18) in each step, requires the determination of the value of the forgetting factor prior to the determination of the value of the estimator gain $\mathbf{K}(k)$, which would result in a much more complex relation for the choice of the forgetting factor. In the majority of cases the practical difference between this algorithm and the described one is very small, but one must introduce testing of the obtained value of $\rho(k)$, in order to ensure that the forgetting factor does not assume unacceptably small or even negative values. This problem is avoided by limiting the bottom value of the forgetting factor through the introduction of its minimal value $\rho_{min}$.
As shown in the previous section, the algorithm (3.12)-(3.15) minimizes the criterion of weighted squared errors $e(i) = y(i) - \hat{\mathbf{H}}^T\mathbf{X}(i)$, defined by (2.69), i.e.
$$ J(k) = \sum_{i=0}^{k} w(i)\,[\,y(i) - \hat{\mathbf{H}}^T\mathbf{X}(i)\,]^2, \qquad w(i) = \rho^{k-i}. \qquad (3.22) $$
This criterion can also be written in the matrix form
$$ J(k) = \mathbf{e}^T(k)\,\mathbf{W}(k)\,\mathbf{e}(k), \qquad (3.23) $$
where
$$ \mathbf{W}(k) = \begin{bmatrix} w(0) & 0 & \cdots & 0 \\ 0 & w(1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w(k) \end{bmatrix}, \qquad \mathbf{e}(k) = \begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(k) \end{bmatrix}. \qquad (3.24) $$
The nonrecursive least squares algorithm determines $\hat{\mathbf{H}}$ in one step from the condition of the minimum of the criterion (3.22) and is defined by the relation (2.53), or, if one adopts the notation $\hat{\mathbf{H}} = \hat{\mathbf{H}}(k)$,
$$ \hat{\mathbf{H}}(k) = \left[\sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,\mathbf{X}^T(i)\right]^{-1} \sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,y(i), \qquad (3.25) $$
or in the vector form
$$ \hat{\mathbf{H}}(k) = [\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{Z}(k)\,]^{-1}\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{Y}(k), \qquad (3.26) $$
where the matrix and the vectors of the input and output data for a FIR discrete system (filter) are
$$ \mathbf{Z}(k) = \begin{bmatrix} \mathbf{X}^T(0) \\ \mathbf{X}^T(1) \\ \vdots \\ \mathbf{X}^T(k) \end{bmatrix}, \qquad \mathbf{Y}(k) = \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(k) \end{bmatrix}, \qquad \mathbf{X}(k) = \begin{bmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(k-M) \end{bmatrix}. \qquad (3.27) $$
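A direct numerical check of the single-step solution (3.25)-(3.27): build $\mathbf{Z}(k)$ from a causal input sequence, weight with $w(i) = \rho^{k-i}$, and solve the normal equations. The FIR system and the constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Single-step weighted least squares (3.25)-(3.27) for an FIR model of
# order M, with exponential weights w(i) = rho^(k-i) as in (3.22).
M, rho, n = 2, 0.98, 400
x = rng.standard_normal(n)
h_true = np.array([1.0, -0.5, 0.25])      # b_0, b_1, b_2

# data matrix Z(k): row i is X^T(i) = [x(i), x(i-1), ..., x(i-M)],
# with x(i) = 0 for i < 0 (causal signals)
Z = np.array([[x[i - j] if i - j >= 0 else 0.0 for j in range(M + 1)]
              for i in range(n)])
Y = Z @ h_true + 0.01 * rng.standard_normal(n)
w = rho ** np.arange(n - 1, -1, -1)       # w(i) = rho^(k-i), k = n-1

# H_hat = [Z^T W Z]^{-1} Z^T W Y -- solved without forming W explicitly
H_hat = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * Y))
```

The recursive algorithm (3.12)-(3.15) reproduces this single-step solution asymptotically, which is the equivalence discussed in the following paragraph.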
It is assumed here that all relevant signals are causal, i.e. that $x(i) = y(i) = 0$ for $i < 0$. The algorithm (3.12)-(3.15) itself represents a recursive version of the nonrecursive single-step estimation procedure (3.25); this multistep algorithm determines the minimum of the adopted estimation criterion (3.22), and thus the two algorithms are equivalent in the asymptotic sense ($k \to \infty$). If one further substitutes the discrete system model (3.9) into (3.25), one obtains
$$ \hat{\mathbf{H}}(k) = \mathbf{H}(k) + \left[\sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,\mathbf{X}^T(i)\right]^{-1} \sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,n(i), \qquad (3.28) $$
or in the vector form
$$ \hat{\mathbf{H}}(k) = \mathbf{H}(k) + [\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{Z}(k)\,]^{-1}\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{N}(k), \qquad (3.29) $$
where the column-vector of additive noise at the system output is
$$ \mathbf{N}(k) = [\,n(0)\;\; n(1)\;\; \ldots\;\; n(k)\,]^T. \qquad (3.30) $$
$$ \lim_{k\to\infty} \frac{1}{k}\sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,\mathbf{X}^T(i) = E\{w(i)\,\mathbf{X}(i)\,\mathbf{X}^T(i)\}, \qquad (3.33) $$
$$ \lim_{k\to\infty} \frac{1}{k}\sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,n(i) = E\{w(i)\,\mathbf{X}(i)\,n(i)\}. \qquad (3.34) $$
Since $\{w(i)\}$ is a deterministic series, the term $w(i)$ can be moved outside the linear operator $E\{\cdot\}$ in (3.33) and (3.34); thus one concludes that for a sufficiently large $k$ (fulfilled assumption of the law of large numbers) the estimate $\hat{\mathbf{H}}(k)$ will be close to the accurate value of $\mathbf{H}(k)$. In this manner, the expected value of the estimate will be equal to the accurate value of the parameters (unbiased estimation) under the condition that the stochastic variables $\mathbf{X}(i)$ and $n(i)$ are uncorrelated, i.e.
$$ E\{\mathbf{X}(i)\,n(i)\} = 0. \qquad (3.35) $$
Since according to (3.27) $\mathbf{X}(i)$ contains only realizations of the stochastic input excitation signal of the FIR filter, the condition (3.35) will be fulfilled if the additive noise $n(i)$ at the system output (see relation (3.9)) is uncorrelated with the excitation signal. If, besides that, the excitation signal and the additive noise are distributed according to the Gaussian (normal) law, the condition for them to be uncorrelated reduces to the independence of these stochastic variables, which implies [16, 17, 19, 29, 34, 37]
$$ E\{\mathbf{X}(i)\,n(i)\} = E\{\mathbf{X}(i)\}\,E\{n(i)\}; \qquad (3.36) $$
thus if the additive noise at the output of the FIR system has a zero mean value, i.e. $E\{n(i)\} = 0$, the condition (3.35) will be fulfilled and the estimate $\hat{\mathbf{H}}(k)$ will be unbiased (its expected or mean value will be equal to the accurate value of the parameters). Let us note that in the case of IIR systems the observation vector $\mathbf{X}(i)$ in (2.101) contains both the excitation and the delayed system outputs, so that the condition (3.35) will be realized only if the sequence $\{n(i)\}$ in (3.9) is white discrete noise (white noise is uncorrelated in time, so that previous realizations of noise do not depend on the later ones) [16, 17, 19, 29, 34, 37].
Let us further introduce the parameter estimation error
$$ \mathbf{V}(k) = \mathbf{H}(k) - \hat{\mathbf{H}}(k), \qquad (3.37) $$
and the matrix
$$ \mathbf{P}(k) = [\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{Z}(k)\,]^{-1} = \left[\sum_{i=0}^{k} w(i)\,\mathbf{X}(i)\,\mathbf{X}^T(i)\right]^{-1}. \qquad (3.38) $$
In that case, taking into account (3.29), the error covariance matrix for an unbiased estimation (the estimation error is a stochastic variable with a zero mean value, since the measurements are noisy and the excitation signal is also stochastic) is defined by the expression
$$ \mathrm{cov}\{\mathbf{V}(k)\} = E\{\mathbf{V}(k)\,\mathbf{V}^T(k)\} = E\{\mathbf{P}(k)\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,\mathbf{N}(k)\,\mathbf{N}^T(k)\,\mathbf{W}(k)\,\mathbf{Z}(k)\,\mathbf{P}(k)\}. \qquad (3.39) $$
Assuming further that the input matrix $\mathbf{Z}(k)$ is known, i.e. that the excitation signal sequence $\{x(i),\ i = 0, 1, \ldots, k\}$ is given, the conditional matrix of the error covariance is
$$ \mathrm{cov}\{\mathbf{V}(k)\,|\,x(0), \ldots, x(k)\} = \mathbf{P}(k)\,\mathbf{Z}^T(k)\,\mathbf{W}(k)\,E\{\mathbf{N}(k)\,\mathbf{N}^T(k)\}\,\mathbf{W}(k)\,\mathbf{Z}(k)\,\mathbf{P}(k). \qquad (3.40) $$
While deriving (3.40) it was also assumed that the components of the stochastic vector $\mathbf{N}(k)$ in (3.30) are statistically independent of the stochastic input variables $\{x(i),\ i = 0, 1, \ldots, k\}$, so that the conditional covariance of noise $\mathrm{cov}\{\mathbf{N}(k)\,|\,x(0), \ldots, x(k)\}$ is equal to the unconditional covariance of noise $\mathrm{cov}\{\mathbf{N}(k)\} = E\{\mathbf{N}(k)\,\mathbf{N}^T(k)\}$. If, additionally, the stochastic variables $n(i)$ in (3.30) have identical distributions, a zero mean value and an identical variance $\sigma_n^2$, then
$$ \tau = \frac{\sigma_n^2\,\tau_0}{E\{e^2\}} = \frac{\sigma_n^2\,\tau_0}{\mathrm{var}\{e\}}, \qquad \mathrm{var}\{e\} = \sigma_n^2, \qquad (3.47) $$
from which one concludes that the mean or expected value of the algorithm memory (3.20) is
$$ E\{\tau(k)\} = \tau_0. \qquad (3.48) $$
Let us note that while deriving (3.48) it was assumed that the law of large numbers holds, i.e. that $k \to \infty$.
The choice of the forgetting factor based on the FKY algorithm, as proposed in [11], is denoted as the PA-RLS algorithm [38]. The methodology of the choice of the forgetting factor is in this case defined by expression (3.49) as a function of the order $M$ of the adaptive FIR filter and of the ratio $q(k)$ of the nonstationarity to the additive noise power; to equalize the behavior of these two algorithms in the stationary mode one should make the values of these two parameters equal [11].
The derivation of the algorithm itself is shortly given in the next part of this section. Generally, one may consider two types of errors in adaptive filters. The estimation error represents the measure of the accuracy of the estimated parameters in a stationary environment, while in a nonstationary environment an important measure is the one defining the reaction speed to the environment changes, the lag error. The mean square estimation criterion represents a sum of these two errors. When analyzing the properties of adaptive algorithms in nonstationary environments, one of the important steps is the definition of the nonstationarity model, since it turns out that in practical situations the properties of adaptive filters may differ for different nonstationarity models. Even in the case when the nonstationarity model is known, its mathematical modeling is often very demanding or even unattainable. In [11] the so-called random walk (RW) model of nonstationarity was chosen, since it is known that the algorithms showing good properties in environments with the RW model of nonstationarity have similar characteristics in many other nonstationary environments, typical for various applications.
The RLS algorithm with exponentially weighted forgetting factor recursively minimizes the criterion function (3.22), i.e.
$J(k) = \sum_{i=0}^{k} q^{k-i} e^2(i).$   (3.52)
The minimization of the criterion function (3.52) leads to the recursive WRLS algorithm for parameter estimation (3.12)-(3.15)
$\hat H(k+1) = \hat H(k) + \frac{P(k) X(k) e(k)}{q + X^T(k) P(k) X(k)},$   (3.53)
$P(k+1) = \frac{1}{q}\left[ P(k) - \frac{P(k) X(k) X^T(k) P(k)}{q + X^T(k) P(k) X(k)} \right],$   (3.54)
where $e(k) = y(k) - \hat H^T(k) X(k)$ is the error or residual of measurement, and the matrix P is an approximation of the inverse of the auto-correlation matrix of the input signal
$R = E\left\{ X(k) X^T(k) \right\},$
while y(k) is the noisy desired response of the adaptive FIR filter (3.9) in a given discrete moment k, i.e.
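For concreteness, the recursion (3.53)-(3.54) can be sketched in the scalar (single-weight) case; this is an illustration in our own notation, not code from the book:

```python
# Minimal sketch of the exponentially weighted RLS recursion (3.53)-(3.54)
# for a single-parameter model y(k) = h*x(k) + n(k); names are illustrative.
def rls_scalar(xs, ys, q=0.98, p0=100.0):
    h, p = 0.0, p0                        # initial estimate and "covariance"
    for x, y in zip(xs, ys):
        e = y - h * x                     # a priori error e(k)
        k_gain = p * x / (q + x * x * p)  # gain P(k)X(k) / (q + X'PX)
        h = h + k_gain * e                # parameter update, cf. (3.53)
        p = (p - k_gain * x * p) / q      # covariance update, cf. (3.54)
    return h

xs = [1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0]
ys = [0.5 * x for x in xs]                # noiseless data, true h = 0.5
print(rls_scalar(xs, ys))                 # converges toward 0.5
```

With q close to 1 the memory is long and the estimate settles slowly but with low variance; lowering q shortens the memory, which is exactly the trade-off discussed in this section.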
independent of the input signal and of the variations of the parameters H(k). If one introduces the notation V(k) for the vector of parameter estimation error
$V(k) = \hat H(k) - H(k),$   (3.56)
then, taking into account (3.55), the error signal $e(k) = y(k) - \hat H^T(k-1) X(k)$ is given as (3.43), i.e.
Then, omitting the index k from the relation (3.61), one obtains
$L = E\left\{ \frac{P X X^T P e^2}{\left( q + X^T P X \right)^2} \right\} - E\left\{ \frac{P X X^T V V^T + V V^T X X^T P}{q + X^T P X} \right\}.$   (3.62)
$c(\beta) = \frac{1 - \beta}{1 - \beta / \beta_{\mathrm{crit}}}.$   (3.67)
To further simplify the expression (3.62), let us also assume that P is independent of X, which is correct in the case of FIR filter analysis, since the window (memory) of the RLS algorithm is, generally speaking, wider than the autocorrelation of the input signal, R. Besides that, in [39] it was shown that the value $V^T X$ is independent of X and P.
If (3.62) is multiplied by R and the matrix trace operation tr is applied, one obtains
$\mathrm{tr}\{R L\} = \alpha\beta c\, E\left\{ \left( q + X^T P X \right)^{-1} \right\} E\left\{ e^2 \right\} - 2 c\, E\left\{ \left( X^T V \right)^2 \right\} E\left\{ \left( q + X^T P X \right)^{-1} \right\},$   (3.68)
where the variable
$\alpha = \frac{\mathrm{tr}\left( E\left\{ R P X X^T P \left( q + X^T P X \right)^{-2} \right\} \right)}{\mathrm{tr}\left( E\left\{ R P X X^T P \left( q + X^T P X \right)^{-1} \right\} \right) E\left\{ \left( q + X^T P X \right)^{-1} \right\}},$   (3.69)
which follows according to the above assumptions. Let us divide (3.69) by the minimum of the MSE criterion, $\sigma_n^2$, to determine the deviation in the stationary (equilibrium) state, which is defined as
$M_1 \triangleq \frac{D}{\sigma_n^2} = \frac{E\left\{ \left( V^T X \right)^2 \right\}}{\sigma_n^2} = \frac{E\left\{ e^2 \right\}}{\sigma_n^2} - 1.$   (3.71)
Further one obtains
$M_1 = \frac{q/\mu + \alpha\beta}{2 - \alpha\beta},$   (3.72)
where
$\mu \triangleq c M E\left\{ \left( q + X^T P X \right)^{-1} \right\},$   (3.73)
and q represents the ratio between the degree of nonstationarity and the additive noise power
$q = M\, \mathrm{tr}\{R L\} / \sigma_n^2.$   (3.74)
In [39] it was shown that it is a valid approximation to assume
$\alpha\beta \approx \mu$   (3.75)
for $\beta_{\mathrm{crit}} = 2$ and for large values of the filter order M, so that
$M_1 = \frac{q/\mu + \mu}{2 - \mu}.$   (3.76)
To obtain the optimal value of the forgetting factor q, which minimizes the deviation (error) in the stationary (equilibrium) state, one should differentiate the expression (3.76) with respect to the parameter $\mu$. By making this derivative equal to zero one obtains the optimum solution
$\mu^* = \left[ \sqrt{q(q+4)} - q \right] \Big/ 2,$   (3.77)
according to which one also obtains the optimal forgetting factor for a given degree of nonstationarity of the environment and the additive noise power, expressed by the parameter q,
$q^*(q) = \frac{M - \mu^*}{M + \mu^*} = \frac{2M + q - \sqrt{q(q+4)}}{2M - q + \sqrt{q(q+4)}}.$   (3.78)
From (3.78) one concludes that for very small values of the factor q the optimal forgetting factor $q^*$ tends to unity, and if q approaches a large value, then $q^* \to (M-1)/(M+1)$. To determine the optimal forgetting factor $q^*$ one needs an estimation of the factor q. Unfortunately, such an estimation is not observable for an RLS process; however, if the nature of the additive noise (i.e. its mean power $\sigma_n^2$) is known, one can perform the estimation of the factor q in each step in the following manner [38]
$\hat q(k+1) = \max\left\{ 0,\ \left[ \frac{e^2(k+1)}{\sigma_n^2} - 1 \right] \left[ 2 - \mu^*(k) \right] \mu^*(k) - \mu^{*2}(k) \right\}.$   (3.79)
This expression was obtained by applying the current value of the squared error as the criterion measure of the MSE, and by solving (3.72) for q, under the assumption that $\mu = \mu^*$. Based on (3.78) one may now estimate the optimal value of the forgetting factor for the RLS algorithm in the next iteration, i.e.
$q(k+1) = q^*\left( \hat q(k+1) \right).$   (3.80)
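The closed-form optimum (3.77) and the limits of (3.78) quoted above admit a quick numerical sanity check; the following sketch (all names are ours, with the misadjustment written as in (3.76)) verifies both:

```python
import math

def m1(mu, q):                  # steady-state misadjustment, cf. (3.76)
    return (q / mu + mu) / (2.0 - mu)

def mu_star(q):                 # optimal memory parameter, cf. (3.77)
    return (math.sqrt(q * (q + 4.0)) - q) / 2.0

def q_star(q, M):               # optimal forgetting factor, cf. (3.78)
    return (M - mu_star(q)) / (M + mu_star(q))

q, M = 0.5, 16
mu = mu_star(q)
# mu* should beat nearby perturbations of itself:
assert m1(mu, q) < m1(1.1 * mu, q) and m1(mu, q) < m1(0.9 * mu, q)
# limiting behaviour: q -> 0 gives q* -> 1; q -> infinity gives (M-1)/(M+1)
print(q_star(1e-9, M), q_star(1e9, M), (M - 1) / (M + 1))
```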
Since the optimal value of the forgetting factor $q^*$ was based on minimization of the deviation in stationary mode, in order to follow faster the rapid changes occurring after the stationary mode, for which $q^* \approx 1$, in [11] the following modification of the expression (3.78) was introduced
$\bar q(q) = q^*(q) - \frac{b_{\min}}{M},$   (3.81)
where $b_{\min}$ is a small constant, chosen with regard to the value of the deviation in the stationary state.
The resulting algorithm for the forgetting factor now appears as
$q(k) = \frac{2M + \hat q(k) - \sqrt{\hat q(k)\left( \hat q(k) + 4 \right)}}{2M - \hat q(k) + \sqrt{\hat q(k)\left( \hat q(k) + 4 \right)}} - \frac{b_{\min}}{M},$   (3.82)
which represents the desired relation (3.49).
The flow chart of the PA-RLS algorithm is described by the following steps and
is given in Table 3.2.
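One iteration of the PA strategy can be sketched as follows, assuming the noise power $\sigma_n^2$ is known, as (3.79) requires; the function below combines (3.77)-(3.79) with the $b_{\min}$ correction of (3.81), and every name in it is ours:

```python
import math

def mu_star(q_hat):                        # optimal mu, cf. (3.77)
    return (math.sqrt(q_hat * (q_hat + 4.0)) - q_hat) / 2.0

def pa_vff_step(e, sigma2_n, mu_prev, M=16, b_min=0.01):
    """Sketch of one PA-RLS forgetting-factor update: estimate the
    nonstationarity-to-noise ratio via (3.79), then map it to a
    forgetting factor via (3.78) corrected as in (3.81)."""
    m1_inst = e * e / sigma2_n - 1.0       # instantaneous misadjustment
    q_hat = max(0.0, m1_inst * (2.0 - mu_prev) * mu_prev - mu_prev ** 2)
    mu = mu_star(q_hat)
    rho = (M - mu) / (M + mu) - b_min / M  # forgetting factor for next step
    return rho, mu

# stationary case: squared error equal to the noise power -> factor near 1
rho, mu = pa_vff_step(e=1.0, sigma2_n=1.0, mu_prev=0.5)
print(round(rho, 3))                       # prints 0.999
```

In a stationary interval the estimated ratio collapses to zero and the factor stays near its maximum; a large error burst drives $\hat q$ up and the factor down, as described in the text.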
Recursive procedures for the determination of the variable forgetting factor based on (1) the extended prediction error (3.3) [9]; (2) the ratio between the momentary value of the MSE and the estimated power of the additive noise (3.8) [10]; and (3) the ratio between the nonstationarity degree and the additive noise power (3.49) [11], are tested further in this section, in order to perform a more accurate comparison under conditions close to reality. However, before presenting the experimental results, let us consider another general approach for the generation of a recursive least squares algorithm with variable forgetting factor, based on minimization of the generalized criterion of weighted least squares [17, 19, 37]:
$J\left( \hat H, k \right) = k^{-1} \sum_{i=1}^{k} q(k, i)\, e^2\left( i; \hat H \right),$   (3.83)
where the residual e is defined by the expression (3.43). For a sufficiently large k the arithmetic mean at the right side of the expression (3.83) converges to the corresponding mathematical expectation, i.e. the criterion (3.83) reduces to
$J\left( \hat H, k \right) = E\left\{ q(k, i)\, e^2\left( i; \hat H \right) \right\}.$   (3.84)
A convenient form of the factor q was proposed in [17] and is given as
$q(k, i) = \lambda(k)\, q(k-1, i), \qquad 1 \le i \le k-1.$   (3.85)
According to (3.85) one further concludes that
$q(k, i) = \lambda(k)\, q(k-1, i) = \lambda(k) \lambda(k-1)\, q(k-2, i) = \cdots = \lambda(k) \lambda(k-1) \cdots \lambda(i+1)\, q(i, i),$
i.e.
$q(k, i) = \left[ \prod_{j=i+1}^{k} \lambda(j) \right] a(i), \qquad a(i) = q(i, i).$   (3.86)
$J\left( \hat H, k \right) = k^{-1} \sum_{i=1}^{k} q(k, i) \left[ y(i) - \hat H^T X(i-1) \right]^2.$   (3.88)
By differentiating this expression over the parameter vector $\hat H$, from the condition $\partial J(\hat H, k) / \partial \hat H = 0$ one obtains in a single step a non-recursive estimation of the unknown parameter vector $\hat H(k) = \hat H$ in the current moment k, i.e.
$\frac{\partial J\left( \hat H, k \right)}{\partial \hat H} = -2 k^{-1} \sum_{i=1}^{k} q(k, i) \left[ y(i) - X^T(i-1) \hat H \right] \frac{\partial X^T(i-1) \hat H}{\partial \hat H} = 0,$   (3.89)
where
$\frac{\partial X^T(i-1) \hat H}{\partial \hat H} = X(i-1).$   (3.90)
By introducing (3.90) into (3.89) and solving the obtained algebraic equation over $\hat H$, one finally obtains
$\hat H = \hat H(k) = \left[ k^{-1} \sum_{i=1}^{k} q(k, i) X(i-1) X^T(i-1) \right]^{-1} \left[ k^{-1} \sum_{i=1}^{k} q(k, i) X(i-1) y(i) \right].$   (3.91)
In a general case, for an arbitrary q(k, i), there is no recursive version of the estimation (3.91), and thus the derivation of a recursive version requires specifying a convenient form for the weight factor q(k, i) [17]. A possible convenient form is defined by the relation (3.85). Introducing further the notation
$R(k) = \sum_{i=1}^{k} q(k, i) X(i-1) X^T(i-1) = \sum_{i=1}^{k-1} q(k, i) X(i-1) X^T(i-1) + q(k, k) X(k-1) X^T(k-1),$   (3.92)
$e(k) = y(k) - X^T(k-1)\, \hat H(k-1),$   (3.97)
$K(k) = \frac{P(k-1) X(k-1)}{\lambda(k)/a(k) + X^T(k-1) P(k-1) X(k-1)} = a(k) P(k) X(k-1),$   (3.98)
$P(k) = \frac{1}{\lambda(k)} \left[ P(k-1) - \frac{P(k-1) X(k-1) X^T(k-1) P(k-1)}{\lambda(k)/a(k) + X^T(k-1) P(k-1) X(k-1)} \right].$   (3.99)
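In scalar form, one step of the recursion (3.97)-(3.99) looks as follows; since the parameter-update equation itself falls outside the quoted relations, the sketch assumes the standard RLS form $\hat H(k) = \hat H(k-1) + K(k)e(k)$, and all names are ours:

```python
def wrls_step(h, p, x, y, lam=0.98, a=1.0):
    """One scalar step of the weighted RLS recursion (3.97)-(3.99):
    lam plays the role of lambda(k), a of the weight a(k) = q(k,k)."""
    e = y - x * h                               # residual, cf. (3.97)
    denom = lam / a + x * p * x
    k_gain = p * x / denom                      # gain, cf. (3.98)
    h_new = h + k_gain * e                      # assumed update form
    p_new = (p - p * x * x * p / denom) / lam   # covariance, cf. (3.99)
    return h_new, p_new

h, p = 0.0, 100.0
for x in [1.0, -1.0, 1.0, -1.0, 1.0]:
    h, p = wrls_step(h, p, x, 0.3 * x)          # noiseless data, true h = 0.3
print(abs(h - 0.3) < 1e-2)                      # prints True
```

Setting lam = 1 and a = 1 recovers the ordinary growing-memory RLS, which is one way to check an implementation of the weighted variant.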
$D = \log \frac{P(H_1)}{P(H_0)}.$   (3.100)
The logarithm of the likelihood ratio can be defined by using the conditional probability density functions on the given intervals:
$D = \log \frac{f\left( x_{n-N+1}, \ldots, x_n \mid x_{n-N-p+1}, \ldots, x_{n-N}; X_1 \right) f\left( x_{n+1}, \ldots, x_{n+N} \mid x_{n-p+1}, \ldots, x_n; X_2 \right)}{f\left( x_{n-N+1}, \ldots, x_{n+N} \mid x_{n-N-p+1}, \ldots, x_{n-N}; X_3 \right)},$   (3.101)
where:
$f\left( x_1, \ldots, x_n \mid x_{-p+1}, \ldots, x_0; X_j \right).$   (3.102)
$D = \log \frac{K_1 \exp\left( -\frac{1}{2\sigma_1^2} \sum_{i=n-N+1}^{n} u_i^2 \right) K_2 \exp\left( -\frac{1}{2\sigma_2^2} \sum_{i=n+1}^{n+N} u_i^2 \right)}{K_3 \exp\left( -\frac{1}{2\sigma_3^2} \sum_{i=n-N+1}^{n+N} u_i^2 \right)}.$   (3.105)
$\hat\sigma_1^2 = \frac{1}{N} \sum_{i=n-N+1}^{n} e_i^2, \qquad \hat\sigma_2^2 = \frac{1}{N} \sum_{i=n+1}^{n+N} e_i^2, \qquad \hat\sigma_3^2 = \frac{1}{2N} \sum_{i=n-N+1}^{n+N} e_i^2,$   (3.106)
$D = \log \frac{\hat K_1 \hat K_2}{\hat K_3} + \frac{1}{2\hat\sigma_3^2} \sum_{i=n-N+1}^{n+N} e_i^2 - \frac{1}{2\hat\sigma_1^2} \sum_{i=n-N+1}^{n} e_i^2 - \frac{1}{2\hat\sigma_2^2} \sum_{i=n+1}^{n+N} e_i^2.$   (3.107)
Replacing the expressions (3.106) for $\hat\sigma_j^2$ in (3.108), one can obtain the expression for the discrimination function D:
$D(n, N) = 2N \log\left( \frac{1}{2N} \sum_{i=n-N+1}^{n+N} e_i^2 \right) - N \log\left( \frac{1}{N} \sum_{i=n-N+1}^{n} e_i^2 \right) - N \log\left( \frac{1}{N} \sum_{i=n+1}^{n+N} e_i^2 \right).$   (3.109)
D(n, N) is not smooth, and its sudden change indicates a possible abrupt signal change. The discrimination function depends on the signal change rate and on the duration of a change. The function D(n, N) is not appropriate for precise detection of abrupt changes, because of its noisy character. That is the reason why we observe a short trend of the D function in the interval $(n - N/2 + 1,\ n + N/2)$, containing N consecutive values of D(n, N). If $(n_1, n_2)$ denotes this interval, then D(n, N), $n \in (n_1, n_2)$, can be expressed by the linear trend t(n, N) as
$D(n, N) = t(n, N) + e(n, N),$   (3.112)
where the linear trend is
$t(n, N) = a(n, N)\, k + b(n, N), \qquad k = 1, 2, \ldots, N,$   (3.113)
and e(n, N) is the noise component of the D function. The value of a(n, N) is now smooth enough, and represents the slope of the linear trend t. The local maximum of the function a(n, N) is at some $k = n_{\max}$, where D(n, N) suddenly increases in the $(n_1, n_2)$ interval. Similarly, a(n, N) reaches its local minimum at $k = n_{\min}$, when D(n, N) starts decreasing in the $(n_1, n_2)$ interval. If we denote the difference of two consecutive local extremes of a(n, N) by
$\Delta a\left( n_{\max}, n_{\min} \right) = a\left( n_{\max}, N \right) - a\left( n_{\min}, N \right),$   (3.114)
then it can be heuristically shown that $\Delta a$ in (3.114) represents a good detection parameter when the $(n_{\max}, n_{\min})$ interval contains the parts of the analyzed signal where a change appears. The intervals $(n_{\max}, n_{\min})$ with
$\Delta a\left( n_{\max}, n_{\min} \right) > tr,$   (3.115)
where tr denotes a selected threshold, contain the sudden signal changes, while the intervals in between can be treated as quasi-stationary parts of the signal.
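The detection statistic itself is simple to compute from a residual sequence. The following sketch evaluates (3.109) on synthetic residuals whose magnitude jumps, so that D peaks when the windows straddle the change; the indexing and names are our own:

```python
import math

def mglr_d(res, n, N):
    """Discrimination function D(n, N) of (3.109), with res a 0-based
    list of residuals; windows are res[n-N:n] and res[n:n+N]."""
    s1 = sum(r * r for r in res[n - N:n]) / N            # left window
    s2 = sum(r * r for r in res[n:n + N]) / N            # right window
    s3 = sum(r * r for r in res[n - N:n + N]) / (2 * N)  # both windows
    return 2 * N * math.log(s3) - N * math.log(s1) - N * math.log(s2)

# residual magnitude jumps from 1 to 3 at sample 50
res = [1.0 if i < 50 else 3.0 for i in range(100)]
N = 20
d_change = mglr_d(res, 50, N)   # windows straddle the change
d_quiet = mglr_d(res, 25, N)    # quasi-stationary part
print(d_change > d_quiet)       # prints True
```

Fitting a short least-squares line to consecutive D values then gives the slope a(n, N) of (3.113), whose extremes feed the threshold test (3.115).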
The main advantage of the MGLR algorithm, compared to the other well-known abrupt signal change detection procedures, is connected with the calculation of the D function in (3.111). Hence, the MGLR algorithm allows a posterior analysis, since the D function is obtained in closed form, independently of any previously detected signal changes. For the calculation of the residuals $e_j$, needed to obtain the D function, it is possible to use the robust recursive least squares (RRLS) algorithm instead of the conventional parameter estimation procedures (recursive or non-recursive). By combining the robustness property and the ability to track abrupt signal changes, the RRLS algorithm provides a residual that is convenient for further analysis by the MGLR algorithm. Such a residual is suitable for the calculation of the D function in (3.111).
The first step in the implementation of the abrupt signal change detection algorithm is the calculation of the D(k, N) function, where k is the current signal sample and N is the window length (one window of length N, referred to as the referent and test windows, is taken at both sides of the current sample). The function D reaches the maximum
values at the signal changing instants. This function ensures a good measure of signal nonstationarity. The next step is the mapping of the D function to the forgetting factor q, as shown in Fig. 3.2. The maximum of the D function is correlated with the minimum of the forgetting factor and vice versa. At the beginning, it is necessary to know the values of $D_{\min}$ and $D_{\max}$.
The value $D_{\min}$ is taken as 0, and this value remains unchanged during the algorithm, while the starting value of $D_{\max}$ is estimated based on two parameters. The first one is the number of bits used in the A/D conversion of the analog signal, and the second one is the interval length for the parameter estimation. The $D_{\max}$ value is updated during the algorithm. The variable forgetting factor, q, is obtained from the D function, based on the defined reference values $q_{\min}$ and $q_{\max}$, respectively (see Fig. 3.2).
For the calculations in the test window it is necessary to have the residual values, e, in advance, because the D function cannot be generated recursively. There are two approaches to the solution of this problem. One can calculate the D function in advance, i.e. prior to the recursive analysis it is necessary to perform signal preprocessing; the residuals $e_j$, obtained by the standard non-recursive least squares (LS) algorithm, can then be used to obtain the D function. The other way is to calculate the D function by a recursive LS algorithm, robust or conventional, but this method introduces a processing delay of one window length (N).
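The mapping from D to the forgetting factor described above is defined graphically in Fig. 3.2; as an illustration only, a linear interpolation between the reference values (our assumption, not the book's exact curve) would read:

```python
def d_to_rho(d, d_min=0.0, d_max=50.0, rho_min=0.85, rho_max=0.998):
    """Map a discrimination-function value d to a forgetting factor:
    d = d_min -> rho_max and d = d_max -> rho_min, linearly in between.
    The linear shape and the numeric defaults are illustrative guesses;
    the book specifies the mapping only graphically (Fig. 3.2)."""
    d = min(max(d, d_min), d_max)          # clamp into the known range
    frac = (d - d_min) / (d_max - d_min)
    return rho_max - frac * (rho_max - rho_min)

print(d_to_rho(0.0), d_to_rho(50.0))       # maximum and minimum factor
```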
3.2 Experimental Analysis
Fig. 3.4 The value of the estimated parameter H(1) of the FIR filter with the application of a fixed forgetting factor q (left: q = 0.998; right: q = 0.85). (The dashed line represents the change of the estimated factor)
Fig. 3.5 Estimation of the time-variant FIR filter parameter by the application of the RLS algorithm with the EGP strategy for the choice of variable forgetting factor. Top figure shows the variation of the forgetting factor corresponding to test signal 1. Bottom figure shows: line, value of the estimated parameter; dashed line, value of the accurate parameter
Fig. 3.6 Estimation of the time-variant FIR filter parameter by the application of the RLS algorithm with the FKY strategy for the choice of variable forgetting factor. Top figure shows the variation of the forgetting factor corresponding to test signal 1. Bottom figure shows: line, value of the estimated parameter; dashed line, value of the accurate parameter
various values of the signal-to-noise ratio (SNR), and the results presented here are obtained for a value of SNR = 30 dB. Figure 3.4 shows the estimated parameter values for test signal 1, when a fixed forgetting factor (FF) of q = 0.998 and q = 0.85, respectively, is used in the RLS algorithm. The obtained results point to the inertness of the classical recursive least squares (RLS) algorithm after the onset of a nonstationary change for the case when the value of q is close to unity, i.e. q = 0.998. Namely, a long time is necessary to obtain correct estimations of the filter parameter, but after that period the obtained estimations have high accuracy in the stationary (equilibrium or steady) mode of operation.
On the other hand, when using lower values of the forgetting factor q, i.e. q = 0.85, the global trend of the nonstationarity can be estimated much faster, owing to the fact that the RLS algorithm memory decreases and thus a much smaller amount of measurement data is considered, but this is achieved at the price of an increased variance of the estimated parameters. The obtained results show that in practical situations one should utilize an adaptive algorithm with a variable forgetting factor which is determined in each step simultaneously with the estimation of the filter parameters.
Figures 3.5, 3.6 and 3.7 show the results of estimation of the time-variant parameter based on test signal 1, obtained by the RLS algorithm with a variable forgetting factor (VFF), which is based on the EGP, FKY and PA strategies,
Fig. 3.7 Estimation of the time-variant FIR filter parameter by the application of the RLS algorithm with the PA strategy for the choice of variable forgetting factor. Top figure shows the variation of the forgetting factor corresponding to test signal 1. Bottom figure shows: line, value of the estimated parameter; dashed line, value of the accurate parameter
respectively. Here too we considered the case when the filter output is noisy due to Gaussian noise with a zero mean value and with a variance chosen so as to ensure a signal-to-noise ratio of SNR = 30 dB. In the EGP the value of the data window L = 5 was chosen, which is sufficiently small with regard to the maximal time constant (RLS algorithm memory)
$s_{\max} = \frac{1}{1 - q_{\max}} = 500.$
The choice of $s_{\max}$ depends on the previous knowledge of the degree of signal nonstationarity, but it turns out that it does not have a significant influence on the algorithm if values $100 < s_{\max} < 1{,}000$ are chosen, which corresponds to the choice $0.99 < q_{\max} < 0.999$.
Since the value of the variable forgetting factor (VFF) in the EGP and FKY algorithms can reach negative values, a bottom limit value $q_{\min} = 0.85$ was introduced. In spite of the fact that the use of the PA strategy for the choice of the VFF did not require subsequent limitation of the VFF, the bottom value of the VFF factor was limited to 0.85 in order to have a correct comparison of the different strategies. It turns out that all three approaches give a proper VFF factor, i.e. they
Fig. 3.8 Estimation of time-variant parameter b1 (test 2) by the RLS EGP algorithm (SNR = 30 dB, L = 5): a VFF; b extended prediction error Q; c trajectories of the estimated and accurate parameter b1
follow abrupt parameter changes relatively well, with a small variance of the estimated parameters in the intervals without abrupt changes. In the neighborhood of an abrupt filter parameter change the VFF decreases to its minimal value, while in the stationary parts of the signal this factor increases and tends to its maximal value, i.e. q = 0.998. It should be stressed, however, that the FKY algorithm, Fig. 3.6, for the determination of the VFF factor, shows a somewhat slower adaptation to the filter parameter change. Here the estimation quality is significantly influenced by the value of the parameter $b_0$, whose inverse value should tend to a mean time constant $s_0$ in the estimation process. An increase of its value decreases the adaptability to filter parameter changes, while a decrease of this variable increases the variance of the estimated parameters. A value of $b_0 = 0.01$ was chosen here, which corresponds to a mean memory constant $s_0$ of 100 samples. The results of parameter estimation using the PA algorithm, Fig. 3.7, show characteristics similar to those of the EGP algorithm.
Figure 3.8 shows the values of the VFF factor, the extended prediction error and the estimated parameter b1 for the EGP algorithm on test signal 2 (Fig. 3.3). The extended prediction error Q (Fig. 3.8b) obviously detects signal nonstationarity. Near samples 1,650 and 2,250, where abrupt changes of the FIR filter parameters occur, the extended prediction error Q detects high nonstationarity, which leads to a decrease of the forgetting factor VFF and a faster adaptation of the
Fig. 3.9 Estimation of time-variant parameter (test 2) by the RLS FKY algorithm (SNR = 30 dB, $b_0 = 0.01$): a VFF; b trajectories of the estimated and accurate parameter
estimated parameters. When the value of the estimated parameters approaches the desired value, the extended prediction error decreases, and the VFF increases its value towards its maximum.
Figure 3.9 shows the results obtained by the RLS-FKY algorithm. This algorithm shows somewhat larger inertness compared to the EGP algorithm, which is especially marked for slower changes, in the interval between the 1,100th and 1,400th samples, while in the moment of a sudden drop of the filter parameter value it has characteristics similar to those of the EGP algorithm.
The PA-RLS algorithm, Fig. 3.10, shows the best adaptability, which is also reflected in a larger convergence speed and a higher accuracy of the estimated parameters. The parameter q(k), which defines the ratio between the nonstationarity degree and the additive noise power, detects abrupt changes very well and emphasizes them in comparison to linear changes, which results in adequate changes of the forgetting factor VFF.
Figure 3.11 shows comparative characteristics of the estimated parameters for
all three proposed algorithms.
The obtained experimental results, based on simulations, point to the following conclusions:
The use of a variable forgetting factor leads to a better adaptability of the digital filter parameter estimation in comparison to the conventional algorithms with a fixed forgetting factor.
Fig. 3.10 Estimation of time-variant parameter (test 2) by the PA RLS algorithm (SNR = 30 dB, $b_{\min} = 0.01$, $\beta_{\mathrm{crit}} = 2$): a VFF; b ratio of nonstationarity degree and additive noise power; c trajectories of the estimated and accurate parameter
Fig. 3.11 Estimation of time-variant parameter (test 2) by the RLS algorithm with the FKY (dotted line), EGP (dashed line) and PA (solid line) strategies for the choice of the forgetting factor VFF
primarily to avoid stability problems connected with IIR filters. However, since one needs to model a system with both zeros and poles (an IIR system) with a structure containing zeros only (an FIR system), the dimension of the parameter vector in the filter model must be chosen sufficiently large in order to obtain satisfactory results. A consequence of such an approach is an increase of the order of the adaptive FIR filter, and thus of the number of iterations, i.e. the time required to achieve convergence of the adaptive algorithm for parameter estimation. For this reason one is motivated to accelerate the standard RLS algorithm by a convenient choice of the variables included in the adaptive process, one of the most important being the input signal x(k). This choice belongs to the field of optimal experiment planning [26, 41-44]. Input signal design is much more often used in the so-called batch processing of measurement signals or in off-line identification (estimation) processes [41, 42, 45, 46]. The basic idea is to generate the input signal according to a chosen criterion in such a manner as to make the process of adaptive parametric identification of the FIR system, i.e. the estimation of its parameters, as informative as possible, i.e. to contain as much information as possible about the estimated parameters in order to shorten the convergence time of the estimation algorithm. The procedure for the enhancement of the convergence properties of the adaptive algorithm is based on a specific optimal design of the filter input signal, known in the literature as D-optimal design and described in the next part of this chapter.
[Figure: block diagram of the adaptive filter, showing the adaptive filter with parameter vector H, its output y(k), and the error signal e(k)]
4.1 Definition of the Parameter
$y(k) = X^T(k) H + n(k),$   (4.1)
where X(k) is the $(M+1) \times 1$ vector of the input signal and H is the $(M+1) \times 1$ parameter vector
$X^T(k) = \left[ x(k), x(k-1), x(k-2), \ldots, x(k-M) \right], \qquad H^T = \left[ b_0, b_1, b_2, \ldots, b_M \right],$   (4.2)
and M is the filter order. It is necessary to estimate the unknown parameter vector so as to minimize the mean square error (MSE) criterion (2.16). If this criterion is approximated by the arithmetic mean
$J_k\left( \hat H \right) = \frac{1}{k} \sum_{i=1}^{k} e^2\left( i; \hat H \right), \qquad e\left( i; \hat H \right) = y(i) - \hat y\left( i \mid \hat H \right),$   (4.3)
where $\hat y(\cdot \mid \cdot)$ is the prediction of the reference signal (desired response) y, e is the prediction error and $\hat H$ the estimation of the parameter vector H, and if the RLS algorithm (2.64)-(2.66) is applied to minimize the criterion (4.3), one arrives at the recursion for the update of the FIR filter parameters
$\hat H(k) = \hat H(k-1) + K(k)\, e\left( k; \hat H(k-1) \right),$   (4.4)
where
$K(k) = P(k-1) X(k) \left[ 1 + X^T(k) P(k-1) X(k) \right]^{-1}$   (4.5)
represents the measure of the mean distance of the estimation $\hat H(k)$ from the accurate value H, determining thereby the asymptotic convergence speed of the estimated parameter vector $\hat H(k)$ towards the real values H. On the other hand, from the well-known Cramér-Rao inequality it follows that
$k\, E\left\{ \left[ \hat H(k) - H \right] \left[ \hat H(k) - H \right]^T \right\} \ge I_m^{-1},$   (4.9)
$|x(i)| \le 1, \qquad i = 1, 2, \ldots$
The problem of minimization of the criterion (4.11) with regard to the information matrix $I_m$ is called optimum experiment design [28, 41-43]. It should be mentioned that different optimality criteria give as a result different optimal input sequences. For D-optimality the function $\varphi$ is defined by [28, 41-43]
$\varphi\left( I_m \right) = \det I_m^{-1} = \frac{1}{\det I_m},$   (4.12)
where det denotes the determinant. One of its basic advantages compared to other strategies known in the literature is its low computational complexity. Since the value of the input signal, as the variable taking part in the experiment planning, influences only the normalized information matrix N from (4.10), the choice of the function $\varphi$ from (4.12) reduces to
$\varphi\left( N(X_k) \right) = \frac{1}{\det N(X_k)}.$   (4.13)
If N from (4.10) is approximated by the arithmetic mean
$N(X_k) = k^{-1} \sum_{i=1}^{k} X(i) X^T(i),$   (4.14)
while implying that the optimal input satisfies the assumed practical limitation $|x(k)| \le 1$. Besides that, it follows from (4.5) to (4.6) that
$\det P(k) = \det\left[ I - \frac{P(k-1) X(k) X^T(k)}{1 + X^T(k) P(k-1) X(k)} \right] \det P(k-1).$   (4.17)
It is known from matrix theory that if A and B are matrices with dimensions $s \times r$ and $r \times s$, respectively, then $\det(I_s + AB) = \det(I_r + BA)$, where $I_s$ and $I_r$ represent the $s \times s$ and $r \times r$ identity matrices, respectively [45]. According to that, by the adequate choice $A = -P(k-1) X(k) \big/ \left[ 1 + X^T(k) P(k-1) X(k) \right]$ and $B = X^T(k)$ ($r = 1$ and $s = m$), from (4.17) it follows that
$\det P(k) = \frac{\det P(k-1)}{1 + X^T(k) P(k-1) X(k)}.$   (4.18)
i.e.
$\frac{\partial}{\partial x(k)} \left[ X^T(k) P(k-1) X(k) \right] = 0.$   (4.19)
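The determinant identity behind (4.18) is easy to verify numerically: applying the rank-one update of P to a small example should reproduce det P(k) = det P(k-1)/(1 + X'P(k-1)X). The 2x2 matrices below are arbitrary illustrative values:

```python
# Numeric check of (4.18) on a 2x2 example, plain Python for illustration.
def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

P = [[2.0, 0.5], [0.5, 1.0]]                  # P(k-1), symmetric positive
X = [1.0, -2.0]                               # X(k)
Px = [P[0][0] * X[0] + P[0][1] * X[1],        # P(k-1) X(k)
      P[1][0] * X[0] + P[1][1] * X[1]]
quad = X[0] * Px[0] + X[1] * Px[1]            # X' P(k-1) X
# rank-one update P(k) = P(k-1) - P X X' P / (1 + X' P X)
Pk = [[P[i][j] - Px[i] * Px[j] / (1.0 + quad) for j in range(2)]
      for i in range(2)]
lhs = det2(Pk)
rhs = det2(P) / (1.0 + quad)
print(abs(lhs - rhs) < 1e-12)                 # prints True
```

Since the denominator 1 + X'P(k-1)X is the only way x(k) enters, minimizing det P(k) amounts to maximizing this quadratic form, which is what the boundary rule below exploits.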
To solve the optimization problem (4.19), an auxiliary $M \times 1$ vector $X^*(k)$ is introduced in (4.20), together with the partition of the matrix P(k-1) into its first element $P_{11}(k-1)$ and the remaining elements $P_{12}(k-1)$ of its first row, as given in (4.21).
The proposed adaptive algorithm for FIR filters with an optimal input sequence is defined by (4.3)-(4.7), (4.20), (4.21) and (4.23). Equation (4.23) determines in each step a new value of the D-optimal input signal for the filter parameter estimation algorithm.
While implementing the algorithm one may use the following procedure:
Step 1: In the state k ($k \ge M$) let the known values include the $m \times 1$ ($m = M+1$) vector of estimated parameters $\hat H(k-1)$, the $m \times 1$ vector $X(k-1)$ and the $m \times m$ matrix $P(k-1)$ from the previous state $k-1$.
Step 2: Form the $M \times 1$ vector $X^*(k)$ from (4.20) by deleting the last element from the $m \times 1$ vector $X(k-1)$.
Step 3: Form a scalar variable $P_{11}(k-1)$ as the first element of the first row of the matrix $P(k-1)$, and then from the remaining elements of this row form the $1 \times M$ row vector $P_{12}(k-1)$, as given in (4.21).
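Steps 1-3, together with the sign rule (4.23) for maximizing X'(k)P(k-1)X(k) under |x(k)| <= 1, can be put into a short sketch; the partitioning follows the steps above, while the concrete numbers and names are purely illustrative:

```python
def d_optimal_input(P, x_prev):
    """Choose the next D-optimal input sample by the sign rule (4.23):
    X*(k) is X(k-1) without its last element (Step 2), P11 the (1,1)
    element and P12 the rest of the first row of P(k-1) (Step 3)."""
    x_star = x_prev[:-1]                       # Step 2
    p11 = P[0][0]                              # Step 3
    p12 = P[0][1:]
    s = sum(a * b for a, b in zip(p12, x_star)) / p11
    return 1.0 if s >= 0.0 else -1.0           # |x(k)| <= 1, on the boundary

P = [[4.0, -1.0, 2.0],                         # P(k-1), m x m with m = 3
     [-1.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
x_prev = [1.0, -1.0, 1.0]                      # X(k-1)
print(d_optimal_input(P, x_prev))              # prints -1.0
```

Because P(k-1) is positive definite, the quadratic form is maximized on the boundary of the constraint, so the optimal sample is always +1 or -1.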
4.2 Finite Impulse Response Adaptive Filters with Optimal Input
$K(k) = \frac{P(k-1) X(k) \left[ 1 + X^T(k) P(k-1) X(k) \right] - P(k-1) X(k) X^T(k) P(k-1) X(k)}{1 + X^T(k) P(k-1) X(k)} = \left[ P(k-1) - \frac{P(k-1) X(k) X^T(k) P(k-1)}{1 + X^T(k) P(k-1) X(k)} \right] X(k).$   (4.24)
Taking into account (4.5) and (4.6), the expression (4.24) reduces to the form
$K(k) = P(k) X(k),$   (4.25)
where the covariance matrix of the estimation error, P(k), is defined by relations (4.5) and (4.6). Let us also note that by replacing (4.5) into (4.6), the expression for the matrix P(k) can be written as
$P(k) = P(k-1) - P(k-1) X(k) \left[ 1 + X^T(k) P(k-1) X(k) \right]^{-1} X^T(k) P(k-1).$   (4.26)
Using the matrix inversion lemma (2.58), i.e.
$\left( A + BCD \right)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1},$
and choosing $A = P^{-1}(k-1)$, $B = X(k)$, $C = 1$, $D = X^T(k)$, the expression (4.26) assumes its alternative form
Now the RLS estimation of the unknown parameter vector H in relation (4.1) is defined by the expression
$\hat H(k) = \hat H(k-1) + \frac{1}{k} R^{-1}(k) X(k)\, e\left( k; \hat H(k-1) \right),$   (4.33)
where the amplification matrix R(k) is recursively defined by the relation (4.32), i.e.
$R(k) = R(k-1) + \frac{1}{k} \left[ X(k) X^T(k) - R(k-1) \right].$   (4.34)
The optimal input sequence is defined by the relation (4.23), which in the case of the quoted form of the RLS algorithm becomes
$x(k) = \mathrm{sign}\left[ R_{11}^{-1}(k-1) R_{12}(k-1) X^*(k) \right],$   (4.35)
where $R_{11}(k-1)$ is the element in the first row and the first column of the $m \times m$ matrix R(k-1), and $R_{12}(k-1)$ is the $1 \times M$ row vector representing the first row of the matrix R(k-1) from which the first element $R_{11}(k-1)$ was deleted, while the column vector $X^*(k)$ is defined by the expression (4.20). Let us note that in the adopted notation $m = M + 1$, where M is the order of the FIR filter under consideration.
The algorithm (4.33), (4.34) represents a special case of a recursive algorithm minimizing the mean-square criterion
$J(H) = \frac{1}{2} E\left\{ e^T(k; H)\, \Lambda^{-1} e(k; H) \right\},$   (4.36)
where the prediction error or measurement residual is defined by the relation (4.3), i.e.
$e(k; H) = y(k) - \hat y(k \mid H).$   (4.37)
In the expression (4.36) the matrix $\Lambda$ represents a constant weight matrix, and if the system under consideration has a single input and a single output, which is fulfilled in the case under consideration, then the excitation signal x and the output y are scalars, so that the matrix $\Lambda$ has dimensions $1 \times 1$. In the quoted case the scalar factor $\Lambda$ has the meaning of a scaling factor, and usually the value $\Lambda = 1$ is adopted. In the general case of a multivariable system with more inputs and more outputs, the input-output variables x and y represent column vectors, and the choice of the matrix $\Lambda$ influences the accuracy of the estimation procedure. Let us note that in the case of a system with a single input and a single output the least squares criterion (2.47) represents an approximation of the general criterion of weighted least squares (4.36), where it was adopted that $\Lambda = 1$, and the mathematical expectation is approximated by the adequate arithmetic mean.
The general form of the algorithm recursively minimizing the generalized MSE criterion (4.36) is defined by the stochastic Newton scheme
$\hat H(k) = \hat H(k-1) - c(k) \left[ J''\left( \hat H(k-1) \right) \right]^{-1} Q\left( \hat H(k-1);\, e\left( k; \hat H(k-1) \right) \right),$   (4.38)
where the gradient of the argument of the criterion function (the so-called risk or loss function) in the MSE performance index (4.36) is
$Q(H;\, e(k; H)) = \frac{d}{dH} \left[ \frac{1}{2} e^T(k; H)\, \Lambda^{-1} e(k; H) \right] = \frac{d e^T(k; H)}{dH}\, \Lambda^{-1} e(k; H),$   (4.39)
and the matrix of the second derivative of the MSE criterion with regard to the parameter vector, the so-called Hessian, is
$J''(H) = E\left\{ \frac{d^2}{dH^2} \left[ \frac{1}{2} e^T(k; H)\, \Lambda^{-1} e(k; H) \right] \right\} = E\left\{ \frac{d e^T(k; H)}{dH}\, \Lambda^{-1} \frac{d e(k; H)}{dH} \right\} + E\left\{ \frac{d^2 e^T(k; H)}{dH^2}\, \Lambda^{-1} e(k; H) \right\}.$   (4.40)
Adopting the notation
$\psi(k; H) = -\frac{d e^T(k; H)}{dH} = \frac{d \hat y(k \mid H)}{dH},$   (4.41)
and assuming that R(k) is a convenient approximation of the Hessian matrix (4.40) in the k-th discrete moment, the algorithm (4.38), (4.39) reduces to
$\hat H(k) = \hat H(k-1) + c(k) R^{-1}(k)\, \psi\left( k; \hat H(k-1) \right) \Lambda^{-1} e\left( k; \hat H(k-1) \right),$   (4.42)
where c(k) is a series of positive numbers decreasing with an increase of the time index k. Let us note that the formulation of the algorithm (4.38) is based on the approximation of the gradient of the criterion function by the gradient of its argument, which basically reduces to the approximation of the corresponding mathematical expectation by a single realization of the random process, i.e.
$J'(H) = -E\left\{ \psi(k; H)\, \Lambda^{-1} e(k; H) \right\} \approx -\psi(k; H)\, \Lambda^{-1} e(k; H) = Q(H;\, e(k; H)).$
Particularly, for the considered algorithm (4.33), (4.37), the matrices of the predictor model are
$F(H) = \begin{bmatrix} 0_{1 \times M} & 0 \\ I_M & 0_{M \times 1} \end{bmatrix}, \qquad G(H) = \begin{bmatrix} 1 \\ 0_{M \times 1} \end{bmatrix}, \qquad E(H) = H^T, \qquad \psi^T(k; H) = X^T(k),$   (4.46)
where $I_M$ is the identity matrix with dimensions $M \times M$ and $m = M + 1$. Differentiating the expressions (4.44) and (4.45) over H, one obtains
where
$M(H;\, u(k; H);\, u(k)) = \frac{\partial}{\partial H} \left[ F(H) u(k; H) + G(H) u(k) \right],$   (4.49)
$D(H;\, u(k; H)) = \frac{\partial}{\partial H} \left[ E(H) u(k; H) \right].$   (4.50)
Introducing the extended state vector
$q(k; H) = \begin{bmatrix} u(k; H) \\ \mathrm{col}\, \eta(k; H) \end{bmatrix},$   (4.51)
Eqs. (4.44), (4.45), (4.47) and (4.48) can be represented by the following two vector relations defining the extended model in the state space
$q(k+1; H) = A(H) q(k; H) + B(H) u(k),$   (4.52)
$\begin{bmatrix} \hat y(k \mid H) \\ \mathrm{col}\, \psi(k; H) \end{bmatrix} = C(H) q(k; H),$   (4.53)
where col{A} denotes a column vector formed from the matrix A by stacking its columns one after another. The matrices A, B and C in the model (4.52) and (4.53) are obtained according to the concrete model of the system (filter). Specifically, for the considered FIR system, the m × m matrix (Θ_i denotes the i-th element of the parameter vector Θ)

$$\eta(k,\Theta) = \begin{bmatrix} \dfrac{d\varphi(k,\Theta)}{d\Theta_{1}} & \dfrac{d\varphi(k,\Theta)}{d\Theta_{2}} & \cdots & \dfrac{d\varphi(k,\Theta)}{d\Theta_{m}} \end{bmatrix}, \qquad (4.54)$$
where m = M + 1, so that the extended m(m+1) × 1 state vector is

$$q(k,\Theta) = \begin{bmatrix} \varphi(k,\Theta) \\ \dfrac{d\varphi(k,\Theta)}{d\Theta_{1}} \\ \vdots \\ \dfrac{d\varphi(k,\Theta)}{d\Theta_{m}} \end{bmatrix}. \qquad (4.55)$$
Accordingly, the extended state equation (4.52) assumes the block-diagonal form

$$q(k+1,\Theta) = \begin{bmatrix} F(\Theta) & 0 & \cdots & 0 \\ 0 & F(\Theta) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & F(\Theta) \end{bmatrix} q(k,\Theta) + \begin{bmatrix} G(\Theta) \\ 0 \\ \vdots \\ 0 \end{bmatrix} u(k) = A(\Theta)\, q(k,\Theta) + B(\Theta)\, u(k), \qquad (4.56)$$

while, utilizing (4.43) and (4.46), the output Eq. (4.53) assumes the corresponding concrete form (4.57).
Let us note that the matrix A(Θ) contains the matrix F(Θ) in each of its m + 1 diagonal blocks, and contains all zeroes outside the main block diagonal. Such a structure of the matrix A has the consequence that the eigenvalues of A are identical to the eigenvalues of F, only the degree of multiplicity of the eigenvalues of A will be higher in comparison with the multiplicity of the corresponding eigenvalues of F. Thus, the stability of
4.3 Convergence Analysis of Adaptive Algorithms 121
the extended system (4.52), i.e. (4.56), will be determined by the stability of the linear predictor (4.44)–(4.46). Since for the considered FIR filter the matrix F(Θ) is a lower triangular matrix with all zeroes on the main diagonal, all eigenvalues of the matrix F are equal to zero (they are located in the origin of the z-plane), i.e. z = 0 is an eigenvalue of the matrix F with multiplicity m; z = 0 will therefore also be an eigenvalue of the matrix A, but with multiplicity m(m + 1). Hence the extended linear model in the state space is stable, since the eigenvalues of the system matrix (the system poles) are located within the unit circle |z| < 1 in the z-plane
[3]. Let us also note that the relations (4.52) and (4.53) are not convenient for practical calculations, and are utilized only for the theoretical analysis of the properties of algorithms for system parameter estimation, i.e. for the analysis of the propagation of the prediction error and its gradient through the parameter estimation algorithm. The parameter estimation algorithm of the system (filter) is defined by the relation (4.42), while the prediction error (4.37) and its gradient (4.41) can be calculated using the relations (4.52) and (4.53), i.e.
$$\hat{\Theta}(k) = \hat{\Theta}(k-1) + c(k)\, R^{-1}(k)\, \psi\big(k,\hat{\Theta}(k-1)\big)\,\Lambda^{-1} e\big(k,\hat{\Theta}(k-1)\big), \qquad (4.58)$$

$$q\big(i+1,\hat{\Theta}(k-1)\big) = A\big(\hat{\Theta}(k-1)\big)\, q\big(i,\hat{\Theta}(k-1)\big) + B\big(\hat{\Theta}(k-1)\big)\, u(i),$$
$$i = 0, 1, \ldots, k-1; \quad q\big(0,\hat{\Theta}(k-1)\big) = q_{0}, \qquad (4.59)$$

$$\begin{bmatrix} \hat{y}\big(k|\hat{\Theta}(k-1)\big) \\ \operatorname{col}\psi\big(k,\hat{\Theta}(k-1)\big) \end{bmatrix} = C\big(\hat{\Theta}(k-1)\big)\, q\big(k,\hat{\Theta}(k-1)\big), \qquad (4.60)$$

$$e\big(k,\hat{\Theta}(k-1)\big) = y(k) - \hat{y}\big(k|\hat{\Theta}(k-1)\big). \qquad (4.61)$$
The concrete form of the matrices A, B and C depends on the system model and
for the FIR system these matrices are defined by the expressions (4.56) and (4.57).
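The eigenvalue argument behind the stability of the extended model is easy to verify numerically. The sketch below is illustrative (the dimension m is an arbitrary choice): it builds the lower shift matrix F of the FIR model and the block-diagonal matrix A, and confirms that both are nilpotent, so every eigenvalue lies at z = 0, with the multiplicity growing from m to m(m + 1):

```python
import numpy as np

m = 4                                    # m = M + 1, chosen arbitrarily here
F = np.diag(np.ones(m - 1), k=-1)        # lower shift matrix, zeros on the diagonal

blocks = m + 1                           # A = blockdiag(F, ..., F) with m + 1 blocks
A = np.kron(np.eye(blocks), F)

# F and A are nilpotent: F^m = 0 and A^m = 0, hence all eigenvalues are z = 0.
# The multiplicity grows from m (for F) to m*(m+1) (the dimension of A).
print(np.count_nonzero(np.linalg.matrix_power(F, m)))   # 0
print(np.count_nonzero(np.linalg.matrix_power(A, m)))   # 0
print(A.shape)                                          # (20, 20)
```

Nilpotency is checked here instead of computing eigenvalues directly, because eigenvalues of defective (Jordan-block) matrices are numerically ill-conditioned.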
The algorithm (4.58)–(4.61) in its basic form is not recursive, since the calculation of the system state vector q(k, Θ̂(k−1)) requires the use of the complete set of input and output measurements {u(i); i = 0, 1, ..., k}. In order to represent the algorithm in recursive form, it is necessary to introduce additional approximations. The solution of the state difference Eq. (4.59) is
$$q\big(k,\hat{\Theta}(k-1)\big) = A\big(\hat{\Theta}(k-1)\big)^{k}\, q_{0} + \sum_{i=0}^{k-1} A\big(\hat{\Theta}(k-1)\big)^{k-i-1}\, B\big(\hat{\Theta}(k-1)\big)\, u(i). \qquad (4.62)$$
If the model in the state space is stable, i.e. all eigenvalues of the matrix are located within the area |z| < 1 in the z-plane, then A^i(Θ̂(k−1)) will tend exponentially to zero with an increasing power (time index) i. Also, since c(k) is a very small positive number for sufficiently large k, the difference between the estimates Θ̂(k−1) and Θ̂(k−n) will be very small starting from some index n, so that the following approximation can be adopted:
" #
X
k1 Y
k1
^ k 1 qk
q k; H ^
A H s B H^ i ui: 4:63
i0 si1
$$\hat{\Theta}(k) = \hat{\Theta}(k-1) + c(k)\, R^{-1}(k)\, \psi(k)\, e(k). \qquad (4.67)$$
The recursive approximation R(k) of the Hessian matrix (4.40) can be derived in a similar manner. Let us assume that there is a value of the parameter vector Θ⁰ which furnishes a good experimental description of the system (filter) under consideration, in the sense that {e(k, Θ⁰)} is a sequence of independent random variables with zero mean value. Thus, near the minimum point Θ⁰ of the MSE criterion (4.36), the measurement residual e in (4.37) represents a realization of white noise; according to the definition of white noise it is independent of everything that occurred at previous moments of time, i.e. it is independent of the random term d²e/dΘ² in (4.40). In this manner the second term in (4.40) is equal to zero, and taking into account (4.41), the expression (4.40) reduces to
$$J''(\Theta) \approx E\left\{\psi(k,\Theta)\,\Lambda^{-1}\psi^{T}(k,\Theta)\right\}. \qquad (4.68)$$
The approximation of the Hessian (4.68) is known as the Gauss-Newton approximation, or the Gauss-Newton search direction in the parameter vector space. The mathematical expectation (4.68) can be further approximated by the arithmetic mean, assuming that Θ = Θ̂(k−1),
$$R(k) = \frac{1}{k}\sum_{i=1}^{k} \psi\big(i,\hat{\Theta}(i-1)\big)\,\Lambda^{-1}\,\psi^{T}\big(i,\hat{\Theta}(i-1)\big). \qquad (4.69)$$
Let us note that the expression (4.69) cannot be written recursively, since the error gradient ψ depends on the whole set of measurement data about the system input and output up to the current moment, so that ψ itself cannot be written in recursive form. However, if ψ(k, Θ̂(k−1)) in (4.69) is approximated by ψ(k) in (4.65), then (4.69) reduces to
$$R(k) = \frac{1}{k}\sum_{i=1}^{k} \psi(i)\,\Lambda^{-1}\psi^{T}(i). \qquad (4.70)$$
Separating the last term of the sum in (4.70), one obtains

$$R(k) = \frac{1}{k}\sum_{i=1}^{k-1} \psi(i)\,\Lambda^{-1}\psi^{T}(i) + \frac{1}{k}\,\psi(k)\,\Lambda^{-1}\psi^{T}(k)$$
$$= \frac{k-1}{k}\left[\frac{1}{k-1}\sum_{i=1}^{k-1} \psi(i)\,\Lambda^{-1}\psi^{T}(i)\right] + \frac{1}{k}\,\psi(k)\,\Lambda^{-1}\psi^{T}(k). \qquad (4.71)$$
Utilizing the defining expression (4.70), the relation (4.71) assumes the recursive form

$$R(k) = \left(1 - \frac{1}{k}\right) R(k-1) + \frac{1}{k}\,\psi(k)\,\Lambda^{-1}\psi^{T}(k). \qquad (4.72)$$
Finally, if one adopts the notation c(k) = 1/k, the final form of the recursive approximation of the Hessian is obtained:

$$R(k) = R(k-1) + c(k)\left[\psi(k)\,\Lambda^{-1}\psi^{T}(k) - R(k-1)\right], \qquad (4.73)$$

which actually represents an algorithm of stochastic approximation.
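Since the gain c(k) = 1/k makes the recursion (4.73) algebraically identical to the running arithmetic mean (4.70), the equivalence can be checked directly. The following sketch is illustrative only; the regressor statistics are an arbitrary choice and Λ = 1 (the SISO case):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2
Sigma = np.array([[2.0, 0.5],      # target E{psi psi^T}, an arbitrary choice
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)

R = np.eye(m)                      # R(0): any starting matrix
batch = []
K = 20_000
for k in range(1, K + 1):
    psi = L @ rng.standard_normal(m)
    R = R + (1.0 / k) * (np.outer(psi, psi) - R)   # Eq. (4.73) with c(k) = 1/k
    batch.append(np.outer(psi, psi))

# With c(k) = 1/k the recursion reproduces the batch mean (4.70)
print(np.allclose(R, np.mean(batch, axis=0)))      # True
print(np.round(R, 1))                              # close to Sigma
```

The recursive form needs O(m²) memory per step instead of storing the whole history, which is exactly why the stochastic approximation form is preferred in adaptive filtering.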
As emphasized earlier, in single input-single output (SISO) systems, which is the case with the considered FIR filter, Λ is a scalar acting only as a scaling factor, and thus it is usually adopted that Λ = 1. In multiple input-multiple output (MIMO) systems the choice of the matrix Λ influences the accuracy of the estimation procedure, and it is usually adopted that Λ is an approximation of the error covariance matrix

$$\Lambda^{0} = E\left\{e(k,\Theta^{0})\, e^{T}(k,\Theta^{0})\right\}. \qquad (4.74)$$
Such a choice gives the minimal possible estimation error covariance matrix in the case of nonrecursive or batch estimation of the system model parameters (off-line parameter identification) [24, 25, 42–46, 48]. Since the quantity Λ⁰ in (4.74) is generally not known, it is usually approximated recursively, according to the stochastic approximation algorithm (4.73), i.e.

$$\hat{\Lambda}(k) = \hat{\Lambda}(k-1) + c(k)\left[e(k)\, e^{T}(k) - \hat{\Lambda}(k-1)\right]. \qquad (4.75)$$
The convergence analysis of the described recursive algorithm is a complex problem, primarily since the relation (4.67) is coupled with the relations (4.64) and (4.65). Namely, the prediction error e(k) and its gradient ψ(k) are formed using, implicitly, all previous parameter estimates Θ̂(i), i = 1, 2, ..., k, which furnishes a very complex mapping of the measurement data of the system input and output {u(i); i = 1, 2, ..., k} onto the parameter vector space. Two general approaches have been proposed in the literature for the convergence analysis of the considered algorithm:
(1) Joining a corresponding deterministic (ordinary) differential equation to the estimation algorithm (the ODE approach), thus reducing the algorithm convergence problem to the stability analysis of an ordinary differential equation [24, 39].
(2) Joining a corresponding stochastic Lyapunov function to the recurrent stochastic procedure defining the estimation algorithm, and applying martingale theory for the convergence analysis of the introduced function [49, 50].
In the further text we consider the first of the quoted approaches, the so-called ODE approach. The corresponding deterministic differential equation, which asymptotically behaves in the same manner as the recurrent stochastic procedure defining the algorithm for system (filter) parameter estimation, can be derived based on the following considerations. For sufficiently large k the step size c(k) = 1/k in the parameter estimation algorithm (4.67) will be very small, and thus the parameter estimate Θ̂(k) will change ever more slowly with an increasing time index k. This has corresponding consequences in the relations (4.64) and (4.65). Namely, the solution of the difference equation in the state space (4.64) is given by the expression (4.63). Let us further assume that Θ̂(i) belongs to a small neighborhood around a value Θ̄ for k − n ≤ i ≤ k − 1, where Θ̄ belongs to the set of parameters for which the matrix A(Θ) in (4.64) is stable, i.e.

$$\bar{\Theta} \in D_{s} = \left\{\Theta \mid A(\Theta)\ \text{has all eigenvalues within the circle}\ |z| < 1\right\}. \qquad (4.76)$$
In this case, assuming that the mentioned neighborhood around Θ̄ is sufficiently small, one may write

$$\prod_{i=k-n}^{k-1} A\big(\hat{\Theta}(i)\big) \approx A^{n}(\bar{\Theta}), \qquad (4.77)$$

where the norm of the matrix (4.77) is smaller than cλⁿ for some λ < 1 and a constant c.
According to (4.77), the expression (4.63) can be approximated for sufficiently large n as

$$q(k) \approx \sum_{j=k-n}^{k-1} A(\bar{\Theta})^{\,k-j-1}\, B(\bar{\Theta})\, u(j). \qquad (4.78)$$
Extending the sum in (4.78) over the whole interval yields the defining expression (4.79), \(\bar{q}(k,\bar{\Theta}) = \sum_{j=0}^{k-1} A(\bar{\Theta})^{\,k-j-1} B(\bar{\Theta})\, u(j)\), from which

$$\bar{q}(k+1,\bar{\Theta}) = \sum_{j=0}^{k} A(\bar{\Theta})^{\,k-j}\, B(\bar{\Theta})\, u(j) = A(\bar{\Theta}) \sum_{j=0}^{k-1} A(\bar{\Theta})^{\,k-j-1}\, B(\bar{\Theta})\, u(j) + B(\bar{\Theta})\, u(k). \qquad (4.80)$$
Using the defining expression (4.79), the relation (4.80) obtains the recursive form

$$\bar{q}(k+1,\bar{\Theta}) = A(\bar{\Theta})\,\bar{q}(k,\bar{\Theta}) + B(\bar{\Theta})\, u(k), \qquad (4.81)$$

where the initial condition is \(\bar{q}(0,\bar{\Theta}) = 0\).
Under the quoted conditions the following approximations are also valid:

$$\hat{y}(k) \approx \hat{y}(k|\bar{\Theta}), \quad e(k) \approx e(k,\bar{\Theta}), \quad \psi(k) \approx \psi(k,\bar{\Theta}), \qquad (4.82)$$

where

$$\begin{bmatrix} \hat{y}(k|\bar{\Theta}) \\ \operatorname{col}\psi(k,\bar{\Theta}) \end{bmatrix} = C(\bar{\Theta})\,\bar{q}(k,\bar{\Theta}), \qquad (4.83)$$

$$e(k,\bar{\Theta}) = y(k) - \hat{y}(k|\bar{\Theta}). \qquad (4.84)$$
The parameter estimation recursion and the Hessian approximation then become

$$\hat{\Theta}(k) = \hat{\Theta}(k-1) + c(k)\, R^{-1}(k)\, \psi(k,\bar{\Theta})\,\Lambda^{-1} e(k,\bar{\Theta}), \qquad (4.85)$$

$$R(k) = R(k-1) + c(k)\left[\psi(k,\bar{\Theta})\,\Lambda^{-1}\psi^{T}(k,\bar{\Theta}) - R(k-1)\right]. \qquad (4.86)$$
Let us further introduce the defining expressions

$$f(\bar{\Theta}) = E\left\{\psi(k,\bar{\Theta})\,\Lambda^{-1} e(k,\bar{\Theta})\right\}, \qquad (4.87)$$

$$G(\bar{\Theta}) = E\left\{\psi(k,\bar{\Theta})\,\Lambda^{-1}\psi^{T}(k,\bar{\Theta})\right\}. \qquad (4.88)$$
Expanding the recursion (4.85) over two consecutive steps, and denoting by v(i) the zero-mean fluctuation of ψ(i,Θ̄)Λ⁻¹e(i,Θ̄) around its expectation f(Θ̄), one obtains

$$\hat{\Theta}(k+1) \approx \hat{\Theta}(k-1) + \sum_{i=k}^{k+1} c(i)\,\bar{R}^{-1} f(\bar{\Theta}) + \sum_{i=k}^{k+1} c(i)\, v(i). \qquad (4.93)$$

Similarly, it follows from (4.90), with x(i) the zero-mean fluctuation of ψ(i,Θ̄)Λ⁻¹ψᵀ(i,Θ̄) around G(Θ̄), that

$$R(k+1) = R(k) + c(k+1)\left[G(\bar{\Theta}) - R(k)\right] + c(k+1)\, x(k+1), \qquad (4.94)$$

from where, after substituting (4.90) once more, one obtains

$$R(k+1) \approx R(k-1) + \sum_{i=k}^{k+1} c(i)\left[G(\bar{\Theta}) - \bar{R}\right] + \sum_{i=k}^{k+1} c(i)\, x(i). \qquad (4.95)$$
Assuming further that R(k−1) ≈ R̄ and Θ̂(k−1) ≈ Θ̄, according to (4.91), (4.93) and (4.95) one concludes

$$\hat{\Theta}(t') \approx \hat{\Theta}(t) + \Delta t\,\bar{R}^{-1} f(\bar{\Theta}) + \sum_{i=t}^{t'} c(i)\, v(i), \qquad (4.96)$$

where \(\Delta t = \sum_{i=t}^{t'} c(i)\), and
$$R(t') \approx \bar{R} + \Delta t\left[G(\bar{\Theta}) - \bar{R}\right] + \sum_{i=t}^{t'} c(i)\, x(i). \qquad (4.97)$$
Since the third addend in the expressions (4.96) and (4.97) represents an approximation of the corresponding mathematical expectation, and according to the assumption E{v(i)} = E{x(i)} = 0, we have

$$\frac{1}{t'-t+1}\sum_{i=t}^{t'} c(i)\, v(i) \approx E\{c(i)v(i)\} = c(i)\, E\{v(i)\} = 0,$$

$$\frac{1}{t'-t+1}\sum_{i=t}^{t'} c(i)\, x(i) \approx E\{c(i)x(i)\} = c(i)\, E\{x(i)\} = 0,$$

so that
$$\hat{\Theta}(t') \approx \hat{\Theta}(t) + \Delta t\,\bar{R}^{-1} f(\bar{\Theta}), \qquad (4.98)$$

$$R(t') \approx \bar{R} + \Delta t\left[G(\bar{\Theta}) - \bar{R}\right]. \qquad (4.99)$$
Adopting that t′ = t + Δt, according to (4.98) and (4.99) one finally obtains a system of ordinary differential equations (the ODE system):

$$\frac{d\Theta_{d}(t)}{dt} = R_{d}^{-1}(t)\, f\big(\Theta_{d}(t)\big), \qquad (4.100)$$

$$\frac{dR_{d}(t)}{dt} = G\big(\Theta_{d}(t)\big) - R_{d}(t). \qquad (4.101)$$
In the relations (4.100) and (4.101) the index d is used to distinguish the solution of the system of deterministic differential equations from the variables in the recursive algorithm to which this system is joined. However, if for some large t₀ the following is fulfilled,

$$\hat{\Theta}(t_{0}) = \Theta_{d}(\tau_{0}), \quad R(t_{0}) = R_{d}(\tau_{0}), \quad \tau_{0} = \sum_{i=1}^{t_{0}} c(i), \qquad (4.102)$$

then for t > t₀ the estimates satisfy

$$\hat{\Theta}(t) \approx \Theta_{d}(\tau), \quad R(t) \approx R_{d}(\tau), \quad \tau = \sum_{i=1}^{t} c(i). \qquad (4.103)$$
The relation (4.103) shows that, asymptotically (for sufficiently large t), the solution of the joined system of differential equations coincides with the variables in the recursive algorithm. In other words, the trajectory of the estimated parameters generated by the recursive algorithm, after a sufficiently
long time, will follow the solution of the corresponding deterministic (ordinary)
differential equation.
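The ODE correspondence can be illustrated on the simplest possible case: estimating a scalar mean. The sketch below is not from the book; all numerical values are arbitrary. It runs the recursion θ̂(k) = θ̂(k−1) + c(k)[z(k) − θ̂(k−1)] and, in the compressed time τ = Σ c(i), an Euler step of the associated ODE dθ_d/dτ = θ* − θ_d; the two trajectories coincide asymptotically:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_star = 1.5        # stationary point of the associated ODE
theta = 0.0             # stochastic recursion, theta_hat(0)
theta_d = 0.0           # deterministic ODE solution (Euler steps in tau)

K = 5000
for k in range(1, K + 1):
    c = 1.0 / k
    z = theta_star + 0.3 * rng.standard_normal()     # noisy observation
    theta = theta + c * (z - theta)                  # recursive algorithm
    theta_d = theta_d + c * (theta_star - theta_d)   # ODE step, d(tau) = c(k)

print(abs(theta - theta_d))   # small: the trajectory follows the ODE solution
```

The decreasing gain c(k) = 1/k averages out the observation noise, which is exactly the mechanism by which the stochastic trajectory shadows the deterministic ODE solution.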
The asymptotic properties of ordinary differential equations are usually expressed through their stability features. A general method for the analysis of the stability of systems of nonlinear deterministic differential equations was given by Lyapunov, and this method bears the name of the second or direct Lyapunov method [3, 24, 39]. Let us consider the following system of differential equations:

$$\dot{x}(t) = \frac{dx(t)}{dt} = f\big(x(t)\big). \qquad (4.104)$$
A set D_c is denoted as an invariant set of the vector differential Eq. (4.104) if an arbitrary trajectory starting within the set D_c remains within it, i.e.

$$x(0) \in D_{c} \Rightarrow x(t) \in D_{c}\ \text{for all}\ t. \qquad (4.105)$$
An equilibrium or stationary point x̄ of the differential Eq. (4.104) is defined as a solution of the nonlinear vector algebraic equation

$$\dot{x} = 0, \quad f(\bar{x}) = 0. \qquad (4.106)$$
The stationary points represent an invariant set D_c in the sense of (4.105), since it is fulfilled that

$$x(0) = \bar{x} \Rightarrow \dot{x} = f(\bar{x}) = 0 \Rightarrow x(t) = \bar{x}\ \text{for all}\ t.$$
Any invariant set D_c has an attraction domain D_A, such that each trajectory (solution of the differential equation) starting in D_A ends in D_c after an infinitely long time (it converges asymptotically to D_c), i.e.

$$x(0) \in D_{A} \Rightarrow x(t) \to D_{c}\ \text{as}\ t \to \infty. \qquad (4.107)$$
In the general case D_A ⊇ D_c, and if D_A strictly contains D_c, the set D_c is a stable invariant set. If D_A coincides with the whole set on which the solution of the differential equation is defined, then one speaks of global asymptotic stability, i.e. D_c is a globally asymptotically stable invariant set.
The second (direct) Lyapunov method represents a means for stability analysis. Let V(x) be a positive scalar function of the vector argument x, i.e.

$$V(x) \ge 0\ \text{for all}\ x, \qquad (4.108)$$

such that it declines along the trajectories (solutions) of the differential Eq. (4.104), i.e.

$$\frac{dV\big(x(t)\big)}{dt} = \frac{dV\big(x(t)\big)}{dx}\,\dot{x}(t) = V'\big(x(t)\big)\, f\big(x(t)\big) \le 0\ \text{for all}\ x(t), \qquad (4.109)$$
and
$$\frac{dV\big(x(t)\big)}{dt} = 0 \Rightarrow x(t) \in D_{c}. \qquad (4.110)$$
Each function V(x) satisfying the conditions (4.108)–(4.110) is denoted as a Lyapunov function. Let us note that outside the set D_c the function V(x(t)) is a strictly declining function of the argument t. However, since V is a function bounded from below, it cannot decrease indefinitely, i.e. x(t) must converge to D_c. The conditions (4.108)–(4.110) guarantee global asymptotic stability of the set D_c. To determine that D_A is the attraction domain of the set D_c, it is required that (4.109) be fulfilled only for x(t) ∈ D_A, but an additional condition is introduced,

$$c > V(x) \ge 0\ \text{for}\ x \in D_{A}, \quad V(x) = c\ \text{for}\ x \in \partial D_{A}, \qquad (4.111)$$

where ∂D_A is the boundary of the set D_A, in order to ensure that the trajectories (solutions of the differential equation) do not leave the set. It can be shown that if an invariant set D_c has an attraction domain D_A, then a Lyapunov function with the above quoted properties always exists [24, 39].
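As a concrete illustration of the conditions (4.108)–(4.110), the following sketch (the system matrix is an arbitrary choice, not taken from the book) checks the candidate V(x) = xᵀx along an Euler-integrated trajectory of a stable linear system ẋ = Ax. Here V′(x)f(x) = xᵀ(A + Aᵀ)x ≤ 0, and V decreases toward the invariant set D_c = {0}:

```python
import numpy as np

A = np.array([[-1.0,  0.5],
              [-0.5, -2.0]])        # A + A^T is negative definite

def V(x):
    return float(x @ x)             # Lyapunov candidate, V(x) >= 0     (4.108)

def V_dot(x):
    return float(2.0 * x @ (A @ x)) # V'(x) f(x) along trajectories     (4.109)

x = np.array([1.0, -2.0])
dt = 0.01
values = [V(x)]
for _ in range(1000):
    assert V_dot(x) <= 0.0          # the decline condition holds pointwise
    x = x + dt * (A @ x)            # Euler integration of x_dot = A x
    values.append(V(x))

print(values[0], values[-1])        # V strictly decreases toward zero
```

For this A, V̇ = −2x₁² − 4x₂², so V̇ = 0 only at the origin, which is exactly the condition (4.110) singling out the invariant set.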
If the Lyapunov theory is applied to a recursive stochastic algorithm of parameter estimation and to the ordinary differential equation joined to it, the following conclusions can be drawn [24]:

(1) If D_c is an invariant set for the system of nonlinear differential Eqs. (4.100) and (4.101), and D_A is its domain of attraction, then if Θ̂(t) ∈ D_A often enough, the parameter estimate Θ̂(t) will converge with probability one to D_c as t → ∞.
(2) Only the stable stationary points of the system of differential Eqs. (4.100) and (4.101) represent possible convergence points of the recursive algorithm (4.64)–(4.67) and (4.73).
(3) The trajectories Θ_d(t) of the differential Eq. (4.100) are the asymptotic paths of the estimates Θ̂(t) generated by the recursive algorithm (4.67).
Let us also note that the point (1) represents a statement about the local convergence of the parameter estimates. If a stationary point Θ̄ is locally stable, it will have an attraction domain containing a neighborhood of the point Θ̄. Therefore, if the estimation sequence Θ̂(t) belongs to that neighborhood often enough, it will converge to Θ̄.
The practical application of the Lyapunov theory to the stability analysis of the differential equation joined to the recursive algorithm of parameter estimation, and thus to the convergence properties of the considered recursive algorithm, consists of the following steps:

(1) Calculate the prediction error e(k, Θ) and the approximation of its gradient ψ(k, Θ) for a fixed value of the estimated parameter vector Θ.
(2) Using the variables from the previous step, calculate the expected (averaged) directions of the variable corrections in the recursive algorithm, (4.87) and (4.88).
(3) Define the system of deterministic differential Eqs. (4.100) and (4.101), which contain on their right-hand sides the averaged directions of change from the previous point.
(4) Analyze the stability of the system of deterministic differential equations utilizing the Lyapunov theory.
The application of the proposed methodology for convergence analysis to the considered problem of parameter estimation for a digital FIR filter consists of the following. The recursive algorithm for parameter estimation is defined by the relations (4.33)–(4.35), which represent a special form of the general algorithm (4.64)–(4.67) and (4.73), when one adopts the defining relations (4.43), (4.46), (4.56) and (4.57). The system of deterministic differential equations joined to the recursive algorithm is defined by the relations (4.87), (4.88), (4.100) and (4.101), i.e.

$$\frac{d\Theta(t)}{dt} = R^{-1}(t)\, f\big(\Theta(t)\big), \quad \frac{dR(t)}{dt} = G(R, \Theta), \qquad (4.112)$$

where

$$f(\Theta) = E\{X(i)\, e(i,\Theta)\}, \quad G(R, \Theta) = E\left\{X(i)\, X^{T}(i)\right\} - R. \qquad (4.113)$$
Let us adopt the positive function

$$V(\Theta, R) = \frac{1}{2}\, E\left\{e^{2}(i,\Theta)\right\} \qquad (4.114)$$

as a candidate for the Lyapunov function. To show that (4.114) is indeed a Lyapunov function, it is also necessary to show that its derivative is non-positive, so that the conditions (4.108)–(4.110) are satisfied. Indeed, according to (4.114) it follows that

$$\frac{dV(\Theta, R)}{dt} = E\left\{e(i,\Theta)\left[\frac{d e(i,\Theta)}{d\Theta}\right]^{T} \frac{d\Theta(t)}{dt}\right\}. \qquad (4.115)$$
However, the obtained results are asymptotic, i.e. they apply when the number of data is sufficiently large. Regretfully, it is not known how large that number really has to be. The answer to that question, as well as the answers to many other questions connected with the practical application of the analyzed algorithm to a finite set of data, must be based on experimental analysis. To show the properties of the proposed algorithm, we will analyze the problem of local echo cancellation in scrambling systems for full-duplex transmission. The results of the experimental analysis are shown in the continuation.
In this section we analyze the application of the RLS algorithm with optimal input to local echo cancellation in a system for speech scrambling over conventional telephone lines. The case was considered where the scrambler is located between the telephone handset and the base unit. Such a design ensures a very efficient protection from compromising electromagnetic radiation and eavesdropping, and represents one of the frequently met models of protection against unauthorized access to verbal information [18, 34, 51–55].
Although the protection of verbal communication has long been a privilege of state institutions, in recent times there has been a surge of interest in its application in the private sector. Such an interest is a result of the availability and public accessibility of communication systems, as well as of technological advancement, which offers acceptable solutions for protection regarding both price and quality. Among communication systems, the telephone network is the largest system in which it is necessary to solve the problem of protection of verbal information. In this system the problem of protection can be solved either by analog scramblers or by digital coding. These are two fundamentally different approaches, and it is well known that digital coding offers a larger degree of protection than the analog approach [53]. Despite its advantages regarding the protection quality, the basic shortcoming of digital coding is the fact that the resulting waveform occupies a much wider frequency range than that utilized by a speech signal, while the procedure of speech signal compression, aimed at overcoming the quoted problem, poses a relatively complex practical problem [56]. Because of that there is still a relatively large interest in the use of analog scramblers, also due to the need to solve the question of protection on the existing telephone lines within the standard telephone channels, in a manner acceptable regarding both the price and the quality of the transmitted speech signal. In spite of the quoted advantages of analog scramblers, there are many problems to be solved in their design, like the achieved level of protection, synchronization and its maintenance, echo cancellation, etc. Among the problems influencing the speech quality in these applications, the dominant one is local echo, and this problem is especially marked in the configuration considered here.
Fig. 4.2 Block diagram of the system for voice signal scrambling. a Generation of local echo; b Position of local echo canceller block
4.4 Application of Recursive Least Squares Algorithm 133
Fig. 4.3 Identification structure for local echo cancellation: the EC block, excited by x(k), produces the estimate ŷ(k), which is subtracted from y(k); the error e(k) drives the mechanism for EC parameter update
Since the hybrid termination is not ideal, a part of the transmitted signal energy arrives at the receiver on the local side and causes degradation of the communication quality [4, 17, 18, 23]. This signal is called the local echo. The second part of the transmitted signal, called the line echo, is reflected by the impedance discontinuities of the telephone line, to finally also end in the receiver at the local side. The problem of line echo cancellation has been analyzed in detail in the literature [17, 23], so in the following we limit ourselves to the problem of local echo cancellation only. Bearing in mind the presented system design, the local echo cancellation block must be a part of the scrambling system.
The transfer function through the hybrid, H, which is denoted as the local echo path, is not equal to zero. Because of that, the basic problem is to approximate the transfer function of the local echo path and then, subtracting the estimated local echo signal ŷ(k) from the real local echo y(k), to cancel the resulting signal e(k) as much as possible (see Figs. 4.2 and 4.3). This corresponds to the conventional identification scheme of an unknown system, known up to an unknown parameter vector Θ, as represented in Fig. 4.3. Besides that, in applications of local echo cancellation the disturbances or noise n(k) can be modeled as white noise (a white stochastic process) [23].
To reach a desired degree of echo cancellation, it is necessary to minimize the mean square error (the MSE criterion), i.e. MSE = E{e²(k)} [18]. In other words, the basic idea is to use an adaptive filter as the local echo canceller (EC) block, in order to estimate the transfer function of the local echo path in the best possible way [51, 52]. It is known that the transfer function can be adequately modeled as a real rational function with poles and zeroes (it represents a ratio of two polynomials with real parameters) [23]. Besides that, most contemporary EC systems are realized as adaptive FIR filters, which have zeroes only, primarily due to the stability problems that inevitably follow adaptive filters realized in the form of IIR structures [17, 18]. Because of that, we further analyze the use of an adaptive FIR filter for the approximation of the transfer function of a local echo path.
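A minimal sketch of this idea follows. It is not the book's optimal-input algorithm: it uses plain RLS with unit forgetting factor, a white-noise excitation, and the ninth-order echo-path coefficients quoted in the experiments below; the initialization P(0) and the noise level are arbitrary choices. The adaptive FIR canceller identifies the echo path from input/output data:

```python
import numpy as np

rng = np.random.default_rng(3)

h = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1])  # local echo path
M = len(h)

theta = np.zeros(M)          # EC block parameters, theta_hat
P = 1e3 * np.eye(M)          # inverse-correlation matrix of standard RLS
x_buf = np.zeros(M)          # regressor X(k) = [x(k), ..., x(k-M+1)]

for k in range(2000):
    x = rng.standard_normal()                    # excitation x(k)
    x_buf = np.roll(x_buf, 1)
    x_buf[0] = x
    y = h @ x_buf + 0.01 * rng.standard_normal() # echo plus additive noise n(k)
    e = y - theta @ x_buf                        # residual echo e(k)
    g = P @ x_buf / (1.0 + x_buf @ P @ x_buf)    # RLS gain vector
    theta = theta + g * e
    P = P - np.outer(g, x_buf @ P)

print(np.round(theta, 2))    # close to the echo-path coefficients h
```

As θ̂ approaches h, the residual e(k) reduces to the additive noise alone, which is the cancellation goal stated above.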
Starting from the quoted assumptions, the path of the local echo can be described by the difference Eq. (4.1). According to the problem of parameter identification, it is necessary to determine the set of coefficients Θ̂ minimizing the MSE criterion (4.3), from where the recursion for the parameter update of the adaptive FIR filter follows, as given by the formulae (4.4)–(4.6).
Since the basic purpose of such systems is the protection of telephone communications, they are primarily intended for personal use. Because of that, the connection location in the telephone network is often changed, and a consequence is a change of the local echo transfer function in practically every new call [34]. Therefore, in local echo cancellation the adaptation must occur at the beginning of each new call, as a part of the training or initialization procedure (pre-processing before the transmission of the speech itself). Regretfully, during this initialization procedure there is no useful transmission, and the dialogue participants cannot hear each other. This is a motive to accelerate the standard RLS algorithm by the choice of the variables included in the design, especially the input signal x(k), which leads to the generation of the optimal input.
To show the properties of this approach, we simulated the structure for echo cancellation shown in Fig. 4.3. The desired response signal y(k) is formed by bringing the excitation x(k) to an FIR filter of the M-th order and adding to its output independent noise n(k) with normal distribution and zero mean value. The variance of the additive noise is chosen to achieve the desired value of the signal-to-noise ratio, SNR. Various values of SNR were generated to analyze the efficiency of the proposed design for local echo cancellation. The EC block, implemented as an adaptive filter of the same order M, is determined by the parameter vector Θ̂, which is generated by the recursive algorithm for parameter estimation (4.4)–(4.7), (4.20) and (4.23). The same excitation x(k) is applied to the EC block. First, the experiment was performed with a filter of order M = 9 and with the coefficients Θ = {0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1} [57].
Besides that, the application of three types of excitation signals was considered:

(1) The optimal input sequence defined by (4.23), denoted as opt;
(2) White normal noise with zero mean value and unit variance, denoted as gauss;
(3) A pseudo-random binary sequence of ±1 values, denoted as prbs.
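The gauss and prbs excitations above are straightforward to generate. The sketch below is illustrative only: the sequence length is arbitrary, the prbs samples are drawn i.i.d. rather than from a shift-register generator, and the optimal opt sequence of (4.23) is not reproduced, since its construction depends on the internal variables of the estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500

# (2) White normal noise, zero mean, unit variance ("gauss")
gauss = rng.standard_normal(N)

# (3) Pseudo-random binary sequence with values +/-1 ("prbs")
prbs = rng.choice([-1.0, 1.0], size=N)

print(set(np.unique(prbs)))                       # {-1.0, 1.0}
print(abs(gauss.mean()) < 0.2, prbs.var() <= 1.0) # both True
```

Both signals are persistently exciting and spectrally flat, which is why they serve as the conventional baselines against which the optimal input is compared.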
Fig. 4.4 Normalized estimation error for white Gaussian noise, pseudo-random binary sequence and optimal input sequence, for a FIR filter of order M = 9 and various SNR values (10, 20 and 30 dB)
The quality of the parameter estimation is expressed through the normalized estimation error (NEE)

$$\mathrm{NEE}(k) = 10 \log_{10} \frac{\big\|\hat{\Theta}(k) - \Theta\big\|^{2}}{\|\Theta\|^{2}}, \qquad (4.117)$$

where ‖·‖ is the Euclidean norm.
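The NEE factor (4.117) is computed directly; a short sketch, using the coefficient vector from the experiment above:

```python
import numpy as np

def nee_db(theta_hat, theta):
    """Normalized estimation error (4.117), in dB."""
    num = np.linalg.norm(theta_hat - theta) ** 2
    return 10.0 * np.log10(num / np.linalg.norm(theta) ** 2)

theta = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1])

print(nee_db(np.zeros_like(theta), theta))   # 0 dB: estimate no better than zero
print(nee_db(0.9 * theta, theta))            # about -20 dB: 10 % residual error
```

Each 20 dB drop in NEE corresponds to a tenfold reduction of the parameter error norm, which is the scale used in Figs. 4.4 and 4.6.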
Figure 4.4 shows the values of the NEE factor for different SNR values. Three different values of SNR were analyzed: 10, 20 and 30 dB, respectively. The measure of the efficiency of the EC block was determined according to a modified Echo Return Loss Enhancement (ERLE) factor, defined in [18, 23] as

$$\mathrm{ERLE}(k) = 10 \log_{10} \frac{E\left\{\left[y(k) - n(k)\right]^{2}\right\}}{E\left\{\left[e\big(k,\hat{\Theta}(k-1)\big) - n(k)\right]^{2}\right\}}, \qquad (4.118)$$

where e is given by (4.3) and (4.7). In implementing (4.118), the mathematical expectation E{·} was approximated by the arithmetic mean of 1,500 corresponding values [11, 55]. In this way the ERLE factor was calculated after 1,500 time iterations, when the process of filter adaptation was already finished. Figure 4.5
Fig. 4.5 ERLE factor for white Gaussian noise, pseudo-random binary sequence and optimal input sequence for the FIR filter of order M = 9, after 1,500 iterations and for different SNR values
shows the values of the ERLE factor for different SNR values. It can be seen from Figs. 4.4 and 4.5 that the EC block realized on the basis of an FIR filter with the optimal input sequence shows better properties than that obtained when white Gaussian noise or a pseudo-random binary sequence is utilized as the excitation signal.
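The modified ERLE factor (4.118), with the expectations replaced by arithmetic means as described above, can be sketched as follows. The synthetic signals here are arbitrary; the residual is simply scaled to emulate 40 dB of cancellation:

```python
import numpy as np

rng = np.random.default_rng(5)

def erle_db(y, e, n):
    """Modified ERLE (4.118); expectations approximated by sample means."""
    num = np.mean((y - n) ** 2)
    den = np.mean((e - n) ** 2)
    return 10.0 * np.log10(num / den)

N = 1500                                  # averaging length used in the text
n = 0.01 * rng.standard_normal(N)         # additive noise n(k)
echo = rng.standard_normal(N)             # true echo component y(k) - n(k)
y = echo + n                              # observed signal
e = 0.01 * echo + n                       # residual after the EC block

print(round(erle_db(y, e, n), 6))         # 40.0: the echo is suppressed by 40 dB
```

Subtracting n(k) inside (4.118) isolates the echo component, so the factor measures cancellation of the echo itself rather than of the unremovable noise floor.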
Besides that, the following should be stressed:

(1) The initial convergence speed of the normalized estimation error (4.117) is larger when the optimal input sequence is utilized, for all three analyzed SNR cases. This is especially marked for lower values of SNR (see Fig. 4.4 for SNR = 10 dB).
(2) If the optimal input sequence is used, lower values of the normalized estimation error are obtained than when utilizing Gaussian noise or a pseudo-random binary sequence, for all analyzed values of the SNR, which points to more successful parameter estimation and faster convergence.
(3) The ERLE factor (4.118) is larger when utilizing the optimal input sequence, for all analyzed SNR values, which points to a higher efficiency in local echo cancellation.
The echo cancellation structure presented in Fig. 4.3 was also simulated under conditions identical to the previous case, but for larger filter orders, M = 256 and M = 1,000. The results obtained by these simulations are shown in Figs. 4.6 and 4.7. According to the simulation results, one may conclude that the proposed solution for adaptive cancellation of local echo, based on the design of an optimal input sequence, shows better results in comparison with the conventional approach, where white Gaussian noise or a pseudo-random binary sequence is utilized, both for low and for high filter orders. This is especially important for practical applications of the proposed scheme for local echo cancellation. Namely, practical experience shows that satisfactory results in local echo cancellation in the existing public telephone
Fig. 4.6 Normalized estimation error (NEE) for white Gaussian noise, pseudo-random binary sequence and optimal input sequence for the FIR filter of order M = 256 and various SNR values
Fig. 4.7 ERLE factor for different SNR values (FIR filter of order M = 256)
Fig. 4.8 Normalized estimation error (NEE) for white Gaussian noise, pseudo-random binary sequence and optimal input sequence for the FIR filter of order M = 1,000 and various SNR values
network can be obtained using FIR filters with an order M between 128 and 256. However, this is not a limitation on the use of the proposed approach with higher order filters, depending on the particular application, such as acoustic echo cancellation (Figs. 4.8 and 4.9).
Finally, as a conclusion of the performed analysis, one may stress the following:

(1) We analyzed the problem of local echo cancellation in full-duplex scrambled transmission, when the scrambling block is positioned between the handset and the base unit. As a solution to this problem we proposed the use of an adaptive local echo canceller and an adaptive algorithm for local echo cancellation based on the generation of an optimal input sequence according to the D-optimal experiment design. In this approach one generates in each step a new sample of the D-optimal sequence for the estimation of the filter parameters. Because of this, the adaptation of the local echo canceller is defined as a training procedure occurring prior to the beginning of the desired protected communication.
(2) The properties of this approach are shown through simulation results. Compared with traditional adaptive FIR filters with Gaussian or pseudo-random excitation signals, it is shown that, whenever the choice of the adaptive filter order is correct, the proposed solution is better, which is reflected in faster convergence.
Fig. 4.9 ERLE factor for the FIR filter of order M = 1,000, after 10,000 iterations and various SNR values
It was shown in Chap. 3 that the use of variable forgetting factor in nonstationary
environments leads to a better adaptability of parameter estimation compared to
the conventional algorithms with a fixed forgetting factor. A comparative analysis
of several strategies for the choice of VFF showed the advantage of the PA-RLS
algorithm, which is reflected in good tracking of both fast and slow changes of
estimated parameters, with a small variance of the parameter estimation in the
intervals without changes. Starting from the results obtained for the application of the optimal input sequence, presented in Sects. 4.2 and 4.4, the question arises as to the properties of algorithms with an optimal input in a nonstationary environment, i.e., whether it is possible to apply the variable forgetting factor (VFF) based on the PA strategy to algorithms with an optimal input sequence.
To investigate the properties of the modified PA-RLS algorithm with optimal
input in which the strategy for the choice of variable forgetting factor is defined by
(3.7)–(3.9), and the optimal input sequence by (4.20) and (4.23), we simulated the
identification structure shown in Fig. 4.1 in the following manner. The desired
response y(k) is obtained by bringing a signal x(k) to the input of an FIR filter of
the order M, while independent Gaussian noise n(k) with a fixed variance and a zero mean value is added to the output. The noise variance is chosen to achieve the desired SNR factor at the filter output. Various values of SNR were generated in order to analyze the proposed algorithm. The adaptive filter is also an FIR filter of the order M, determined by the vector of estimated parameters Ĥ(k). The same input signal x(k) is brought to the input of the adaptive filter. The experiment was conducted on a ninth-order FIR filter, with coefficients H = {0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1} [57], in which the first coefficient
varied according to Fig. 3.1, as already described in Chap. 3. Besides that, two
types of input signals were analyzed:
(1) Optimal input sequence defined by (4.23), denoted as opt;
(2) White Gaussian noise with a zero mean value and unit variance, denoted as gauss.
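The generation of the desired response described above can be sketched in pure Python; the function names are illustrative and the convolution is written naively, since this is only a sketch of the experimental setup, not the code used for the reported results:

```python
import math
import random

def fir_output(h, x):
    """Output of an FIR filter with coefficients h for input x (zero initial state)."""
    return [sum(h[j] * x[k - j] for j in range(len(h)) if k - j >= 0)
            for k in range(len(x))]

def desired_response(h, x, snr_db, rng):
    """y(k) plus white Gaussian noise whose variance is set by the target SNR."""
    y = fir_output(h, x)
    signal_power = sum(v * v for v in y) / len(y)
    sigma = math.sqrt(signal_power / 10 ** (snr_db / 10.0))
    return [v + rng.gauss(0.0, sigma) for v in y]

rng = random.Random(0)
h = [0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1]  # ninth-order FIR filter from the text
x = [rng.gauss(0.0, 1.0) for _ in range(3000)]      # 'gauss' input: zero mean, unit variance
d = desired_response(h, x, snr_db=10, rng=rng)      # noisy desired response at SNR = 10 dB
```

The adaptive filter then sees the same input x(k) and tries to match d(k).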
The results of parameter estimation obtained by the use of the PA-RLS algorithm for the choice of VFF and the input signals (1) and (2) were compared using the
Fig. 4.10 Normalized estimation error (SNR = 10 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line, Gaussian noise; dashed line, optimal input sequence
4.5 Application of Variable Forgetting Factor to Finite Impulse Response 141
Fig. 4.11 Normalized estimation error (SNR = 20 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line, Gaussian noise; dashed line, optimal input sequence
Fig. 4.12 Normalized estimation error (SNR = 30 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line, Gaussian noise; dashed line, optimal input sequence
Table 4.1 Value of averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 10 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals
(a)
SNR = 10 dB
D I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11
10 Opt -27.00 -18.89 -17.76 -18.14 -16.32 -19.73 -15.51 -16.01 -16.06 -22.90 -31.96
Gauss -24.31 -18.36 -20.31 -18.05 -15.19 -20.48 -15.83 -15.89 -15.56 -22.31 -30.15
25 Opt -26.94 -19.12 -17.74 -18.06 -16.38 -19.30 -15.53 -15.98 -16.02 -23.06 -32.08
Gauss -24.25 -18.59 -20.51 -18.02 -15.27 -19.90 -15.82 -15.80 -15.54 -22.49 -30.12
50 Opt -26.85 -19.44 -17.81 -17.90 -16.49 -18.81 -15.55 -16.03 -15.88 -23.02 -32.29
Gauss -24.12 -19.02 -20.98 -17.81 -15.43 -19.23 -15.84 -15.79 -15.64 -22.36 -30.19
(b)
D Input NEST STAC
10 Opt -19.150 -20.773
Gauss -19.025 -20.221
25 Opt -19.110 -20.787
Gauss -18.965 -20.257
50 Opt -19.044 -20.815
Gauss -18.846 -20.369
Table 4.2 Value of averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 20 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals
(a)
SNR = 20 dB
D I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11
10 Opt -36.85 -24.94 -32.21 -23.07 -27.54 -22.06 -30.91 -22.02 -20.35 -20.65 -32.32
Gauss -34.17 -23.78 -31.74 -23.08 -25.32 -22.59 -28.93 -20.53 -18.86 -20.37 -31.49
25 Opt -36.79 -25.92 -32.01 -23.08 -28.23 -23.08 -31.78 -22.02 -21.76 -21.50 -32.50
Gauss -34.10 -24.66 -31.71 -22.78 -25.81 -23.55 -29.31 -20.32 -19.83 -21.67 -31.69
50 Opt -36.70 -27.10 -31.58 -24.68 -29.19 -24.67 -33.23 -24.01 -21.57 -22.79 -32.68
Gauss -33.97 -26.12 -31.17 -22.73 -26.84 -24.94 -29.82 -21.59 -22.61 -23.36 -31.76
(b)
D Input NEST STAC
10 Opt -22.547 -30.034
Gauss -22.072 -28.421
25 Opt -23.123 -30.513
Gauss -22.597 -28.743
50 Opt -24.650 -30.827
Table 4.3 Value of averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 30 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals
(a)
SNR = 30 dB
D I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11
10 Opt -46.70 -31.07 -39.94 -27.79 -37.63 -29.83 -38.15 -26.65 -29.07 -25.94 -41.01
Gauss -44.02 -30.72 -40.80 -28.16 -35.18 -29.12 -38.25 -25.67 -26.98 -25.96 -40.24
25 Opt -46.64 -32.06 -40.16 -27.86 -38.75 -31.45 -37.93 -27.78 -31.18 -29.53 -41.25
Gauss -43.95 -31.72 -40.89 -27.98 -36.21 -30.53 -38.64 -25.98 -29.22 -27.46 -40.60
50 Opt -46.54 -33.54 -40.25 -31.54 -39.56 -33.21 -38.21 -30.81 -33.25 -31.60 -41.42
Gauss -43.82 -33.48 -40.34 -30.17 -37.43 -32.46 -39.13 -28.98 -31.42 -30.42 -40.75
(b)
D Input NEST STAC
10 Opt -28.256 -38.750
Gauss -27.927 -37.581
25 Opt -29.736 -39.317
Gauss -28.735 -38.252
50 Opt -32.140 -39.872
Gauss -31.102 -38.816
normalized estimation error (4.117), taking into account the changes of the estimated system over time, i.e., the constant vector H was replaced by a time-variant H(k), thus

NEE(k) = 10 log₁₀( ‖H(k) − Ĥ(k)‖² / ‖H(k)‖² ).  (4.119)
Figures 4.10, 4.11 and 4.12 show the values of normalized estimation error
(4.119) for the cases when the SNR factor is 10, 20, and 30 dB, respectively.
Besides that, we separately considered different intervals depending on the
changes of the parameters of the estimated filter. Namely, it is possible to distinctly separate the intervals in which the estimated parameter does not change its value: I1 (1–1,100), I3 (1,400–1,650), I5 (1,650–1,950), I7 (2,150–2,250), I9 (2,250–2,350), I11 (2,350–3,000); the interval in which the estimated parameter changes slowly: I2 (1,100–1,400); the fast change: I6 (1,950–2,150); as well as the abrupt changes: I4 (around 1,650), I8 (around 2,250) and I10 (around 2,350). The bracketed values are the numbers of signal samples in the sequence, while I1–I11 is the notation of the intervals. To compare the influence of the input
signals, in all intervals we analyzed the averaged normalized estimation error
given as
(1/L) Σ_{k=1}^{L} NEE(k),  (4.120)
where L represents the length of the interval under consideration. When averaging
stationary and nonstationary intervals, for the sake of the analysis of fast changes,
which occur almost instantaneously, we introduced the variable D, so the criterion
(4.120) is modified and written in the form (4.121). The variable D defines the
range taken into account before and after nonstationarity occurred, and its values
were considered for 10, 25 and 50 samples. According to this, (4.120) can be
written as
[1/(L2 − L1 + 2D)] Σ_{k=L1−D}^{L2+D} NEE(k),  (4.121)
where L1 and L2 are the beginning and the end of the considered interval, respectively. At the moments of occurrence of abrupt changes L1 = L2, so that the
averaging interval is equal to 2D. Tables 4.1, 4.2, and 4.3 show averaged values of
normalized estimation error (4.121) for different methods of generation of input
sequence on each considered interval separately (I1–I11), as well as on stationary
only (STAC) and nonstationary only (NEST) intervals. The presented values are
obtained for different values of the SNR.
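The interval averaging in (4.121) can be sketched as a small helper, assuming 0-based sample indexing and a list of per-sample NEE values; the function name is hypothetical:

```python
def averaged_nee(nee, L1, L2, D):
    """Average NEE over the interval [L1 - D, L2 + D]; for an abrupt change
    L1 == L2, so the averaging window reduces to 2D samples, cf. (4.121)."""
    lo = max(0, L1 - D)
    hi = min(len(nee), L2 + D)
    window = nee[lo:hi]
    return sum(window) / len(window)
```

When the window is not clipped at the sequence boundaries, len(window) equals L2 − L1 + 2D, the denominator in (4.121).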
The obtained results show that the use of optimal input sequence in PA-RLS
algorithm for the determination of VFF does not compromise its good properties in
the intervals of nonstationary changes of parameter values. Besides that, the initial
of a part of the observations from the main body of the population represents the task of robust statistics [59].
As previously emphasized, the occurrence of inconsistent or surprising observations is denoted in the Anglo-Saxon literature as the problem of outliers. One
classification of possible occurrence of outliers encompasses three types of data
variation: (1) inherent variability which represents a natural property of the data
change which cannot be influenced; (2) execution errors which represent a con-
sequence of external influence which has not been taken into account when
modeling the population; (3) measurement errors which can be regarded as the
errors of measurement data readout and rounding [59]. In the last-mentioned type
of variation, the problem is most often overcome by replacement, if the correction
model is known, or by rejection of such data. For the remaining two types one may
apply the so-called incongruence tests for the detection of the presence of
deviations from the assumed population [13]. The incongruence tests have an
important role in the initial phase of the analysis of a set of measurement data.
Further procedure depends on a particular application, which may lead to a cor-
rection or rejection of incongruent observation or one may interpret these incon-
gruent observations through the identification of the influence of some factors of
particular practical interest.
In the field of estimation, regarded as a separate branch of robust statistics, the treatment of incongruent observations does not necessarily have to start with incongruence tests; the basic interest is to construct an algorithm for estimation of unknown parameters within the adopted stochastic data model which is
relatively insensitive to the possible presence of outliers, deviating insignificantly
from the optimal estimators when outliers do not exist [14, 15, 28, 47].
In this chapter we present a new member of the family of robust algorithms for system identification, based on a statistical approach denoted as M-estimation [14]. The proposed robust LMS algorithm differs from the conventional algorithm in that a nonlinear transformation of the measurement residual is introduced; the goal of this nonlinearity is to give a small weight to the minority of incongruent residuals, so that impulse noise in the desired filter response does not have a large influence on the overall parameter estimation. The properties of the proposed algorithm are compared to the properties
of the existing robust algorithms that are widely utilized for the solution of the quoted problems. Besides that, we analyzed the possibility of applying optimal input sequences to the robust version of the recursive least squares algorithm (the RRLS algorithm), with the goal of improving the convergence properties and the accuracy of the estimated parameters, as well as of using a variable forgetting factor under conditions when, besides impulse noise, one also has to solve the problem of identification of a system with time-variant parameters.
Because of the known sensitivity of robust estimators to signal dynamics
changes, we analyzed the possibility to iteratively estimate the scaling factor in
robust algorithms during the operation of such an adaptive algorithm, in order to
ensure universality of the estimation procedure with regard to its invariance to the
5 Robustification of Finite Impulse Response Adaptive Filters 149
value of the measurement residual, i.e. the dynamic properties of the measurement
signal.
A special problem in the field of estimation is the estimation of a system with
time-variant parameters, because abrupt changes of the parameter values as a rule result in an abrupt rise of the residual, which may be erroneously interpreted as the appearance of impulse interference in the measurement noise. Because of that we
present another new algorithm, which basically represents a combination of a
robust algorithm and an algorithm with variable forgetting factor, and is based on
the synthesis of an outlier detector in the form of a robust median filter and the
application of the recursive least squares algorithm with exponential forgetting
factor for the estimation of time-variant system parameters (FIR filter). The
properties of the proposed algorithms are analyzed experimentally, through a
computer simulation.
Although widely utilized, the least mean square error algorithm (LMS algorithm) is, similarly to other gradient-type algorithms, very sensitive to impulse errors [36, 57, 60]. Impulse interference often appears in images and biomedical signals, as well as in communication problems [19]. This shortcoming of the LMS algorithm is one of the reasons to introduce a modification which would robustify it with regard to impulse interference. Possible solutions to this problem are the well-known algorithms from the literature, denoted as the Robust Mixed Norm (RMN) and the median LMS (MLMS) algorithms [36, 57, 60]. Both algorithms
imply nonlinear transformations, but the main difference is that the MLMS
algorithm achieves robustness through the application of a nonlinear median
operation to the gradient estimation, while in the case of the RMN algorithm one
applies a nonlinear transformation to the residual estimation [36, 57]. The block
diagram of the structure for parametric identification of the system is shown in
Fig. 5.1, where the unknown system is an FIR filter of a known order M, defined by the vector h = [h_0 h_1 … h_{M−1}]ᵀ. A common input signal x(k), representing Gaussian noise with a zero mean value, is brought to the input of the unknown system and of the adaptive filter. Impulse noise n(k) is added to the output y(k) of the unknown system to obtain the noisy desired response d(k). Let us note that the real desired response is y(k), while d(k) represents the measurement of this signal in the presence of additive measurement noise n(k).
The adaptive filter is also an FIR filter of the order M, determined by the parameter vector Ĥ(k) and by a recursive algorithm for parameter estimation, whose purpose is to obtain the best possible approximation of the desired response y(k) by the output signal of the adaptive filter, ŷ(k), and at the same time to filter out the measurement noise n(k). The goal of the parameter estimation algorithm is to
minimize the criterion function, whose main parameter or argument is the esti-
mated error or measurement residual
p(e) = (1 − λ)N(0, 1) + λL(0, 1),  0 < λ < 1,

φ(e) = (1 − λ)e²/2 + λ|e|,  Ψ(e) = (1 − λ)e + λ sign(e),
while the solution of Eq. (5.4) leads to the RMN estimation of the parameter vector H. The choice of λ in each step n is based on the probability λ(n) = P{|d(n)| > δ₀}, where δ₀ is a positive threshold, obtained under the assumption that the desired response d(n) has a distribution N(0, σ_d), which gives λ(n) = 2 erfc(|d(n)|/σ_d). Here erfc represents the complementary error function [57]. For the robust estimation of σ_d one utilizes the estimate
σ̂_d = [ oᵀTo / (n_H − 3) ]^{1/2},
where o represents a vector formed of the n_H last values of the desired response d, sorted in ascending order with regard to their amplitude, and T is an n_H × n_H diagonal matrix, T = Diag(0, 1, 1, …, 1, 1, 0) [57], where n_H denotes the dimension of the parameter vector H.
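The trimming effected by T (discarding the smallest and largest of the sorted window) can be sketched as follows; the normalization n_H − 3 mirrors the formula as reconstructed above and should be treated as an assumption:

```python
import math

def robust_sigma_d(window):
    """Robust scale estimate of the desired response: sort the last n_H samples
    by amplitude, drop the two extremes (the zeros on the diagonal of T), and
    normalize the remaining sum of squares."""
    o = sorted(window, key=abs)          # ascending order of amplitude
    trimmed = o[1:-1]                    # T = Diag(0, 1, ..., 1, 0) zeroes the extremes
    return math.sqrt(sum(v * v for v in trimmed) / (len(window) - 3))
```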
The LMS estimation (5.2) (item 1) is very sensitive to the shape of the distribution function at its ends, since the realizations of impulse noise or outliers, generated by weighted tails deviating from the Gaussian distribution, may adversely influence the parameter estimation. On the other hand, the RMN and LAD estimations (items 2 and 3) are much more robust, in the sense of a smaller sensitivity to the presence of impulse noise. The presented estimators point to the
possibility to design a new member of the RMN estimator class, which would be
situated between the LAD and the LMS estimator. The basic motivation is that the
LMS algorithm, generally taken, gives much better estimation in the absence of
impulse noise, when the LAD is less successful, however the LAD is much more
robust in presence of impulse interference in comparison to the LMS algorithm.
Such a new member of the group of robust algorithms for adaptive filtering is essentially based on Huber's M-robust estimation (5.4) [14]; it is called the robust LMS (RLMS) algorithm [61] and is presented in the rest of this chapter.
When minimizing the maximum asymptotic variance of the estimator error within the class of contaminated normal distributions (normal mixture with weighted tails)

p(e) = (1 − ε)N(0, 1) + ε g(e),  0 < ε < 1,

Huber [14] proposed the Ψ function (the so-called influence function) that corresponds to the probability density function p(e), which in its middle part behaves like N(0, 1), while at the ends it has the form g(e) = L(0, 1), i.e.
φ′(e) = Ψ_H(e) = −[ln p(e)]′ = min(|e|, m) sign(e),  sign(e) = 1 for e > 0, 0 for e = 0, −1 for e < 0,  (5.5)
where m > 0 is a constant ensuring the efficient robustness of the estimator; its value controls the trade-off between the degree of robustness and the degradation of the estimator for the case when the impulse interference generated by the term g(e) = L(0, 1) is absent. It turns out that such an estimator also gives satisfactory results for many other probability density functions g(e) which are characteristic of particular practical applications [14, 15, 36]. An estimator is said to be efficiently robust if its accuracy is high (90–95 %) for an adopted nominal statistical model, for which one usually utilizes the normal distribution N(0, 1). The function Ψ ensures the robust properties of the estimator. In the general case it is bounded, monotonically non-decreasing, and continuous. Monotonicity leads to a unique solution of the M-estimator (5.4), boundedness ensures that particular realizations of impulse interference do not have an arbitrarily high influence on the parameter estimates, and continuity ensures that rounding and cutoff errors, as well as grouped contamination (impulse interference), do not cause large perturbations of the parameter estimates [35].
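The Huber influence function (5.5) and its use inside a gradient update can be sketched as follows; `rlms_step` is a hypothetical single LMS-type iteration with a robustified residual, illustrating the idea rather than reproducing the book's exact recursion (5.9):

```python
def psi_huber(e, m=1.5):
    """Huber influence function: linear for |e| <= m, saturated at +/- m beyond;
    equals min(|e|, m) * sign(e)."""
    return max(-m, min(m, e))

def rlms_step(theta, x_vec, d, mu, m=1.5, s=1.0):
    """One robustified LMS-type step: the residual passes through psi_huber before
    multiplying the regressor, so a single outlier moves the estimate by at most
    mu * s * m * |x| instead of proportionally to the outlier size."""
    e = d - sum(t * xi for t, xi in zip(theta, x_vec))
    w = s * psi_huber(e / s, m)          # scaled influence, cf. (5.6)
    return [t + mu * w * xi for t, xi in zip(theta, x_vec)]
```

Note how an outlier residual of 100 contributes only through the saturated value m, which is exactly the boundedness property discussed above.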
However, the considered M-estimator is sensitive to changes of the signal dynamics. Namely, let us assume that a solution Ĥ is found from (5.4) for some set of residuals E_n, and that the realizations of the random variables e(i) are replaced by corresponding ones whose standard deviation is tripled. A new solution Ĥ, generated by (5.4), does not have to be equal to the previous one. To cancel the influence of the signal dynamics on the values of the estimated parameters, one should modify the expression (5.4) in the following manner:
(1/s) Σ_{i=1}^{n} Ψ̃_H(e(i)/s) ∇e(i) = 0,  (5.6)

where s is a scaling factor and Ψ̃_H = sΨ_H [35]. Namely, the principal goal of the
introduction of the scaling factor is to make the estimator invariant to the value of
the random measurement residual, whose expected or mean value is equal to the
5.1 Robust Least Mean Square Algorithm 153
Fig. 5.2 The RLMS influence function Ψ(e) = min(|e|, m) sign(e) as a function of the residual e
Ĥ(n) = Ĥ(n−1) + μ(n−1) med{e(n; Ĥ(n−1)) X(n)}_{n_H},

where med{·}_{n_H} is the vector m(n), with dimensions M × 1, whose elements are

m_i(n) = med(H̃_i(n), H̃_i(n−1), …, H̃_i(n−n_H+1)),

while H̃_i(n) is the i-th element of the vector e(n; Ĥ(n−1)) X(n).
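The median-of-gradients idea behind this update can be sketched as follows; the gradient buffer handling and the function name are illustrative assumptions:

```python
import statistics

def mlms_step(theta, history, x, d, mu, n_h=10):
    """One MLMS-type step: a componentwise median over the last n_h gradient
    samples e(n)X(n) replaces the current gradient, so a single impulse-corrupted
    sample is voted out of the update."""
    e = d - sum(t * xi for t, xi in zip(theta, x))
    history.append([e * xi for xi in x])          # current gradient sample e(n)X(n)
    if len(history) > n_h:
        history.pop(0)                            # keep a sliding window of n_h samples
    med = [statistics.median(g[i] for g in history) for i in range(len(x))]
    return [t + mu * m for t, m in zip(theta, med)]
```

With a window mostly filled by clean gradients, one impulse-driven gradient does not shift the componentwise median, which is the source of the robustness.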
Convergence properties of the robust RLMS algorithm (5.9) can be analyzed using an approach based on ordinary differential equations (ODE), presented in detail in Sect. 4.3. Under certain conditions (i.e., assuming that s(n) → σ_d, lim_{n→∞} μ(n) = 0 and Σ_{n=1}^{∞} μ(n) = ∞), the following ODE system can be associated with the recurrent stochastic procedure (5.9):
dH(s)/ds = σ_d f(H(s)),  (5.10)
where

f(H(s)) = lim_{n→∞} (1/n) Σ_{i=1}^{n} Ψ_H(e(i)/σ_d) X(i) = E{Ψ_H(e(i)/σ_d) X(i)}.  (5.11)
To analyze the stability properties of the ODE system, let us utilize the direct
method of Lyapunov [24, 39] and define a non-negative function
V(H, σ_d) = (1/2) E{Ψ_H²(ẽ(i))},  ẽ(i) = e(i)/σ_d,  (5.12)
as a candidate for the Lyapunov function. In order to show that (5.12) is really a
Lyapunov function, it is necessary to show that its derivative is not positive [24, 39].
Starting from (5.12), one may write
dV(H, σ_d)/ds = −E{Ψ_H(ẽ(i)) Xᵀ(i)} σ_d⁻¹ dH(s)/ds = −f ᵀ(H) f(H) ≤ 0.  (5.14)
Using (5.14), the stability analysis presented in Sect. 4.3 shows that the
parameter estimate Ĥ(n) from Eq. (5.9) will converge with probability one to the stationary value H* of the Lyapunov function V in (5.12). Generally,

V(Ψ₀, p) ≤ V(Ψ₀, p₀) ≤ V(Ψ, p₀),  (5.18)
where W is an arbitrary criterion function. This means that for an arbitrary
probability p 2 P the covariance matrix of estimation error will not be larger than
the covariance matrix of the estimation error for the most unfavorable probability
density p0 within the given class P, while the arbitrary criterion function W will
give a larger covariance matrix of estimation error for the most unfavorable
probability density p0 with regard to the ML-estimator which is designed
according to the worst probability density function p0 within the given class P.
Let us illustrate the presented result through the following example. Let us have
a noise probability density function p with a zero mean value and with a variance σ_d², i.e., p_n(n) = N(n|0, σ_d²), while the real noise has a contaminated normal probability density

p_c(n) = (1 − ε)N(n|0, σ_d²) + ε g(n),
where 0 ≤ ε ≤ 1, and g represents a symmetric probability density function with a zero mean value and with a variance σ_g² ≫ σ_d². For the standard LMS algorithm, the influence function is defined by

Ψ(x) = Ψ₁(x) = x/σ_d²,

which gives for the scaling factor in (5.15) the values a(Ψ₁, p_n) = σ_d² and a(Ψ₁, p_c) ≈ ε σ_g². It follows from here that the ratio of the norms of the corresponding covariance matrices of estimation errors in (5.15) can be approximated as

a(Ψ₁, p_c)/a(Ψ₁, p_n) ≈ ε σ_g²/σ_d² ≫ 1.
This result shows a decrease of the convergence speed of the LMS algorithm in the presence of impulse noise generated by the probability density function g. Specifically, if g is a Cauchy distribution, for which σ_g² → ∞, the standard LMS algorithm ceases to be operational, i.e., it diverges. On the other hand, the choice of Ψ₀ in (5.17) introduces robustness, so that [14, 28]

a(Ψ₀, p_c) ≤ a(Ψ₀, p₀) = I⁻¹(p₀).
As mentioned earlier, for the class P of contaminated normal distributions p_c, the worst-case distribution p₀ in (5.17) is the normal distribution with exponentially weighted tails, i.e., the normal distribution contaminated by the Laplace distribution, which leads to the coincidence of Ψ₀ with Ψ_H in (5.5).
To show the properties of the RLMS algorithm, we simulated a system for the identification of FIR filter parameters, as presented in the literature [36, 60]. The desired response signal was obtained by bringing noise with a normal distribution and unit power to the input of a ninth-order FIR filter with the coefficients

h = [0.1 0.2 0.3 0.4 0.5 0.4 0.3 0.2 0.1]ᵀ.
Independent noise with a normal distribution was added to the filter output signal, with a fixed variance chosen to ensure a signal-to-noise ratio SNR = 0 dB prior to the addition of impulse noise. Impulse noise was generated using the model

n(k) = a(k)A(k),

where a(k) is a binary independent identically distributed (iid) process with the probabilities

P{a(k) = 1} = 0.01 and P{a(k) = 0} = 0.99.
Here A(k) is a process with a symmetric amplitude distribution, uncorrelated with a(k) and with a variance var{A(k)} = 10⁴/12. The length n_H of the sliding window of the estimator (5.7) was chosen to be 10, in order to decrease as much as possible the probability of the appearance of more than one outlier within the window. The value μ = 0.01 was chosen for the constant of the adaptive gain factor μ(n) = μ/n, in order to ensure an initial convergence speed of the analyzed algorithm as close as possible to the case without impulse noise.
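The impulse noise model n(k) = a(k)A(k) can be sketched as follows; drawing A(k) from a symmetric uniform distribution is one admissible choice for var{A(k)} = 10⁴/12 (it corresponds to amplitudes uniform on [−50, 50]), and the function name is illustrative:

```python
import math
import random

def impulse_noise(n_samples, p=0.01, var_a=1e4 / 12, rng=None):
    """n(k) = a(k)A(k): a(k) is an iid binary process with P{a(k)=1} = p, and
    A(k) is drawn uniformly on [-c, c] with c chosen so that var{A(k)} = var_a."""
    rng = rng or random.Random()
    c = math.sqrt(3.0 * var_a)           # variance of U(-c, c) is c^2 / 3
    return [rng.uniform(-c, c) if rng.random() < p else 0.0
            for _ in range(n_samples)]
```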
Figure 5.3 shows the value of the log-normalized error of the estimated parameters,

10 log₁₀( ‖Ĥ(n) − h‖² / ‖h‖² ),  (5.19)
where ‖·‖ is the Euclidean norm, for different algorithms in the case of pure
Gaussian noise. Obviously the properties of the RLMS and of the MLMS algo-
rithm are very similar to the LMS algorithm, and simultaneously better than the
properties of the RMN algorithm. The LAD algorithm shows in this case the worst
results. Figure 5.4 shows the values of the criterion (5.19) for the same algorithms, obtained for the case of simulation when impulse noise is present. Besides that, Table 5.2 shows the averaged criterion (5.19) for different values of the probability P{a(k) = 1} of the appearance of impulse realizations of noise, calculated based on 100 Monte Carlo attempts. Table 5.3 shows the averaged value of the criterion (5.19), obtained on the basis of 100 Monte Carlo attempts for different values of the impulse noise intensity, expressed by var{A(k)}. Table 5.4 presents the same properties measured for different distributions of the impulse noise amplitude A(k).
Fig. 5.3 Log-normalized estimation error (5.19) [dB] versus the number of iterations for the compared algorithms in the case of pure Gaussian noise

Fig. 5.4 Log-normalized estimation error (5.19) [dB] versus the number of iterations in the presence of impulse noise
It is obvious that the properties of the LMS algorithm become worse, while the
implemented changes do not influence the RLMS algorithm, and at the same time
the RLMS retains better properties compared to the RMN and the LAD algorithm.
Besides that, the RLMS and the MLMS algorithms have comparable results. The
dependence of the estimation quality on the parameters quantitatively determining
the adaptive filtering algorithm is also important from the practical standpoint.
Namely, the form of the nonlinearity Ψ_H in (5.5) is unambiguously determined by the parameter m which, generally speaking, depends on the probability P{a(k) = 1} of the appearance of impulse noise. Since this probability, the so-called contamination index [14], is not accurately known in practical situations, it is necessary to adopt its value in advance. The value quantifying the decrease of the algorithm efficiency under the condition of normal noise is denoted in the literature as the premium [12]. Simulation results show that this factor is small for the RLMS
Table 5.3 Averaged log-normalized estimation error (5.19) for different values of the impulse noise intensity var{A(k)} and P{a(k) = 1} = 0.01, calculated based on 100 Monte Carlo attempts
Algorithm
LMS LAD RMN MLMS RLMS
Var{A(k)} 10^3/12 -4.1910 -2.8602 -3.9423 -4.7064 -4.8478
10^4/12 -1.4289 -2.7743 -3.8918 -4.6313 -4.7765
10^5/12 0.3731 -2.7692 -3.8397 -4.4703 -4.6974
10^6/12 5.0510 -2.7790 -3.8923 -4.4401 -4.8262
Table 5.4 Averaged log-normalized estimation error (5.19) for different distributions of the impulse noise amplitude, calculated based on 100 Monte Carlo attempts
Distribution
Laplace Cauchy Uniform
Algorithm LMS -1.2613 -1.6846 -1.0091
LAD -2.7282 -2.8674 -2.7370
RMN -3.8422 -3.8951 -3.7294
MLMS -4.4369 -4.5313 -4.5375
RLMS -4.7800 -4.6771 -4.8604
algorithm when the value of the parameter m belongs to the interval (1, 2) and when the real value of the variance σ_d² is accurately known, while still ensuring good robustness properties. Since the value of σ_d² is usually not known in practical situations, it is estimated using (5.7). This has only a small influence on the decrease of the mentioned efficiency if the length of the moving window n_H is correctly chosen. As mentioned before, the simulation results show that the value n_H = 10 gives good results in practical situations.
Finally, since the analyzed algorithms are nonlinear, the starting value Ĥ(0) may have a great influence on the quality of the parameter estimation, as may the choice of the value of the parameter μ in the adaptive gain factor μ(n) = μ/n (Fig. 5.5). On the other hand, low sensitivity to the initial conditions is important to obtain practical robustness. This problem can be solved by utilizing an adequate initial value Ĥ(0), obtained by the LMS estimation, while the most convenient value of the variable μ can be determined by simulation, keeping track of the convergence speed. Experiments show that the value μ = 0.01 gives good results.
The obtained results point to the following conclusions:
1. A robust adaptive algorithm for FIR filters, the so-called RLMS algorithm, is proposed, based on the statistical approach denoted as M-estimation and basically representing an approximation of the well-known maximum likelihood method (ML-estimation). Contrary to the ML-estimator, whose criterion function (5.2) is determined according to the exact knowledge of the noise probability density function, in the M-estimation (5.4, 5.5) one starts from the assumption that the noise probability density function cannot be exactly known, and it is chosen instead so that an
Fig. 5.5 Averaged log-normalized estimation error of the RLMS algorithm versus the probability P{a(k) = 1} = c of the appearance of impulse noise
J_i(Ĥ) = (1/i) Σ_{k=1}^{i} φ(e(k; Ĥ)).  (5.24)
The criterion (5.24) follows from (5.20) when one replaces the mathematical expectation by the arithmetic mean. It is implied that i is chosen large enough to ensure that J_i converges to J given in (5.20) [28]. To solve the system of nonlinear equations appearing as the result of the optimality condition ∂J(Ĥ)/∂Ĥ = 0, where ∂J/∂H denotes the gradient or partial derivative operator, one may apply the
Newton-Raphson method [12, 24, 25]. Linearizing J_i(Ĥ) in the vicinity of the estimate Ĥ(i−1), one obtains

J_i(Ĥ) = J_i(Ĥ(i−1)) + [∂J_i(Ĥ(i−1))/∂H]ᵀ (Ĥ − Ĥ(i−1)) + (1/2)(Ĥ − Ĥ(i−1))ᵀ [∂²J_i(Ĥ(i−1))/∂H²] (Ĥ − Ĥ(i−1)) + O(‖Ĥ − Ĥ(i−1)‖²),  (5.25)
where

lim_{‖x‖→0} O(‖x‖)/‖x‖ = 0,  (5.26)
and ‖·‖ denotes the Euclidean norm. The desired value Ĥ(i) is obtained by solving the equation

∂J_i(Ĥ(i))/∂H = 0,  (5.27)

whence one obtains
" #1 " #
o 2
J i
^ 1
Hi oJ i
^ 1
Hk
^ Hi
Hi ^ 1 i i
oH2 oH
^ ^
O Hi Hi 1 :
5:28
Let us note that the algorithm (5.28) essentially represents the Stochastic
Newton-Raphson scheme (4.38) applied to the task of criterion (5.24)
minimization.
Let us assume that, additionally, the following hypotheses are satisfied:

P1: The estimate Ĥ(i) is in the vicinity of the estimate Ĥ(i−1), which implies
(a) O(‖Ĥ(i) − Ĥ(i−1)‖) ≈ 0;
(b) ∂²J_i(Ĥ(i))/∂H² ≈ ∂²J_i(Ĥ(i−1))/∂H² for each sufficiently large i.

P2: The estimate Ĥ(i−1) is optimal in the step i − 1, which gives ∂J_{i−1}(Ĥ(i−1))/∂H = 0.
Further, it follows from (5.24) that

i J_i(Ĥ) = (i − 1) J_{i−1}(Ĥ) + φ(e(i; Ĥ)).  (5.29)

Taking Ĥ = Ĥ(i−1) and twice differentiating (5.29), while utilizing the hypothesis P1(b) and the fact that, because of (5.1),

∂e(i; Ĥ)/∂H = −X(i),
5.2 Robust Recursive Least Squares Algorithm 165
one obtains

i ∂²J_i(Ĥ(i−1))/∂H² = (i − 1) ∂²J_{i−1}(Ĥ(i−1))/∂H² + X(i)Xᵀ(i) Ψ′(e(i; Ĥ(i−1))),  (5.30)
where Ψ = φ′. Besides that, taking also into account the hypothesis P2, one obtains from (5.29), after differentiating and replacing H by Ĥ(i−1),

i ∂J_i(Ĥ(i−1))/∂H = −X(i) Ψ(e(i; Ĥ(i−1))).  (5.31)
Introducing the notation

R(i) = i ∂²J_i(Ĥ(i−1))/∂H²,  (5.32)

and utilizing (5.31), the relation (5.28) assumes the form

Ĥ(i) = Ĥ(i−1) + R⁻¹(i)X(i) Ψ(e(i; Ĥ(i−1))),  e(i; Ĥ) = d(i) − Xᵀ(i)Ĥ(i−1),  (5.33)

R(i) = R(i−1) + Ψ′(e(i; Ĥ(i−1))) X(i)Xᵀ(i).  (5.34)
One often utilizes the matrix P(i) = R⁻¹(i). Using the lemma on matrix
inversion (2.58), one obtains from (5.33) and (5.34)

\[
\hat{\Theta}(k) = \hat{\Theta}(k-1) + P(k)\,X(k)\,\Psi\big(e(k;\hat{\Theta}(k-1))\big), \qquad (5.35)
\]

\[
e(k;\hat{\Theta}) = d(k) - X^{T}(k)\,\hat{\Theta}(k-1),
\]

\[
P(k) = P(k-1)
- \frac{P(k-1)X(k)X^{T}(k)P(k-1)\,\Psi'\big(e(k;\hat{\Theta}(k-1))\big)}
       {1 + \Psi'\big(e(k;\hat{\Theta}(k-1))\big)\,X^{T}(k)P(k-1)X(k)},
\qquad (5.36)
\]
where k = 0, 1, 2, .... The relations (5.35) and (5.36) define the robust recursive
LS (RRLS) algorithm, where Ψ is defined by (5.23). The standard deviation in
(5.23) is unknown and must be estimated. A popular ad hoc robust
estimate of the parameter s in the statistical literature is the median of absolute
deviations (5.7), which was discussed in the previous chapter. This scheme for the
determination of s also suggests adequate values for the parameter m in (5.23).
Since s ≈ σ, one usually takes for m a value close to 1.5. Such a choice gives much
higher efficiency than the RLS algorithm in the case of a Gaussian distribution with
weighted tails, and retains good properties in the case when the data are generated
by a normal distribution density. The program implementation of the RRLS
algorithm is based on the flow diagram presented in Table 5.5.
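As an illustration, the recursion (5.35)–(5.36) can be sketched in a few lines of Python. This is a minimal sketch under assumed conditions (a fixed, known scale s and the Huber nonlinearity with threshold m·s), not the authors' implementation; the function names are ours:

```python
import numpy as np

def huber_psi(e, s, m=1.5):
    """Huber nonlinearity Psi: linear up to m*s, saturated beyond (cf. (5.23))."""
    t = m * s
    return max(-t, min(t, e))

def huber_psi_prime(e, s, m=1.5):
    """Derivative Psi': 1 in the linear zone, 0 in the saturation zone."""
    return 1.0 if abs(e) <= m * s else 0.0

def rrls_step(theta, P, x, d, s, m=1.5):
    """One RRLS update (5.35)-(5.36) for a FIR model d(k) ~ x(k)^T theta."""
    e = d - x @ theta                              # residual e(k; theta(k-1))
    w = huber_psi_prime(e, s, m)                   # Psi'(e) weights the P update
    Px = P @ x
    P = P - (w * np.outer(Px, Px)) / (1.0 + w * (x @ Px))  # matrix inversion lemma
    theta = theta + P @ x * huber_psi(e, s, m)     # robustified gain
    return theta, P
```

Note how the robustness mechanism appears in the code: when Ψ′ = 0 (a saturated, outlier-like residual), the matrix P is left unchanged and the parameter correction stays bounded by m·s.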
The initial values of the parameter vector Θ̂ and the matrix P should be chosen
to enhance the convergence properties of the algorithm in its initial phase. A
possible approach to accelerating the convergence of the parameter estimation, based
on the optimal design of the input signal, is shown in Chap. 4. Taking into account
(4.20)–(4.23) one may define a robust recursive LS algorithm with optimal
input (RRLSO) [62], given by (4.20)–(4.23), (5.35) and (5.36). Additionally, it is
necessary to investigate the practical effects of the use of the optimal input in the
RRLS algorithm under conditions when additive impulse noise is present. In other
words, the question is whether the improved properties of the algorithm, obtained by
the application of the optimal input, are retained in the RRLS algorithm when it
operates in an environment with additive impulse noise. The practical convergence
properties of the RRLSO algorithm, for a finite number of iterations, are
experimentally investigated below. The flow diagram of the RRLSO algorithm is
shown in Table 5.6.
When the filter parameters vary with time, it is natural to utilize the
second (alternative) form of the RRLS algorithm (5.35) and (5.36). By introducing
the substitution [24]

\[
\bar{R}(i) = \gamma(i)\,R(i), \qquad \gamma(i) = i^{-1}, \qquad (5.37)
\]

into (5.33), one obtains

\[
\hat{\Theta}(i) = \hat{\Theta}(i-1) + \gamma(i)\,\bar{R}^{-1}(i)\,X(i)\,\Psi\big(e(i;\hat{\Theta}(i-1))\big). \qquad (5.38)
\]

Besides that, it follows from (5.34) and (5.37) that

\[
\bar{R}(i) = \bar{R}(i-1) + \gamma(i)\left[\Psi'\big(e(i;\hat{\Theta}(i-1))\big)\,X(i)X^{T}(i) - \bar{R}(i-1)\right]. \qquad (5.39)
\]

Sometimes it is more convenient to utilize the matrix P(i) and the forgetting
factor (FF) ρ(i) instead of γ(i) and R̄(i); these are defined in the following manner
[24]

\[
P(i) = \gamma(i)\,\bar{R}^{-1}(i), \qquad \rho(i) = \frac{\gamma(i-1)}{\gamma(i)}\big(1-\gamma(i)\big). \qquad (5.40)
\]

In that case the RRLS algorithm (5.38)–(5.40) becomes, after applying the lemma
on matrix inversion (2.58),

\[
\hat{\Theta}(i) = \hat{\Theta}(i-1) + P(i)\,X(i)\,\Psi\big(e(i;\hat{\Theta}(i-1))\big), \qquad (5.41)
\]

\[
P(i) = \frac{1}{\rho(i)}\left\{
P(i-1)
- \frac{P(i-1)X(i)X^{T}(i)P(i-1)\,\Psi'\big(e(i;\hat{\Theta}(i-1))\big)}
       {\rho(i) + \Psi'\big(e(i;\hat{\Theta}(i-1))\big)\,X^{T}(i)P(i-1)X(i)}
\right\}. \qquad (5.42)
\]
Relations (5.41) and (5.42) define the adaptive RRLS algorithm with a variable
forgetting factor (VFF) ρ. The forgetting factor ρ, where 0 < ρ ≤ 1, imparts
different weights to the previous signal samples, thus enabling the RRLS algorithm
to track changes in the signal. The ways to automatically adjust the forgetting
factor ρ during the operation of the algorithm are considered in detail in Chap. 3.
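For illustration, a fixed-ρ variant of (5.41)–(5.42) can be sketched as follows; this is a hypothetical sketch with an assumed constant forgetting factor rather than one of the adaptive strategies of Chap. 3:

```python
import numpy as np

def rrls_ff_step(theta, P, x, d, s, rho=0.98, m=3.0):
    """One step of the RRLS algorithm with forgetting factor rho, cf. (5.41)-(5.42).

    rho < 1 discounts old samples so the filter can track time-varying
    parameters; rho = 1 recovers the plain RRLS update (5.35)-(5.36).
    """
    e = d - x @ theta                        # residual
    t = m * s
    psi = max(-t, min(t, e))                 # Huber nonlinearity Psi
    w = 1.0 if abs(e) <= t else 0.0          # its derivative Psi'
    Px = P @ x
    P = (P - (w * np.outer(Px, Px)) / (rho + w * (x @ Px))) / rho   # cf. (5.42)
    theta = theta + P @ x * psi                                     # cf. (5.41)
    return theta, P
```

With rho = 0.98, the effective memory is on the order of 1/(1 − rho) = 50 samples, which is the trade-off between tracking speed and estimation variance mentioned above.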
To investigate the properties of the RRLS algorithm with optimal input (the
RRLSO algorithm), we simulated the system structure shown in Fig. 5.1. The
desired system response was obtained by bringing an input signal x(k) to a
ninth-order FIR filter with the coefficients h^T = {0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3,
0.2, 0.1} and with superposition of independent additive Gaussian noise, n(k),
with a zero mean value [36]. The variance of the additive noise was chosen to
ensure a signal-to-noise ratio (SNR) equal to 0 dB before the addition of impulse
noise. Impulse noise was generated according to the model n(k) = a(k)A(k),
where a(k) is a binary independent identically distributed process with the
probabilities P{a(k) = 1} = 0.01 and P{a(k) = 0} = 0.99. Here A(k) is a process
with a symmetrical amplitude distribution, uncorrelated with a(k) and with a
variance var{A(k)} = 10⁴/12 [36]. One realization of the noise thus generated is
shown in Fig. 5.6. The length n_H of the sliding window of the scale factor
estimator in (5.7) was chosen to have the value of 10 samples, to ensure that the
probability of the appearance of more than one outlier (realization of impulse
noise) within the window remains very small. The adaptive FIR filter is of the
same order as the desired FIR filter and is determined by a parameter vector Θ̂(k)
and by the algorithm for recursive estimation of the parameter vector, (5.35) and
(5.36). Two different input signals
[Plot area of Fig. 5.6: amplitude versus sample number, 0–500.]
[Plot area of Fig. 5.7: NEE in dB versus number of iterations, 0–500.]
Fig. 5.7 Normalized estimation error (NEE) for different adaptive algorithms in the case without
impulse noise in the desired response. Dashed line: RLS algorithm with Gaussian noise as the input
signal; bold dashed line: RRLS algorithm with Gaussian noise as the input signal; solid line: RLS
algorithm with the optimal input; bold solid line: RRLS algorithm with the optimal input
x(k) were applied: (1) the optimum input sequence (4.20)–(4.22), and (2) Gaussian
noise with unit variance and a zero mean value. The results of the parameter
estimation obtained by the application of the input signals (1) and (2) were compared
according to the normalized estimation error (NEE) in (5.19).
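The contamination model n(k) = a(k)A(k) used in these simulations can be sketched directly. The uniform amplitude distribution below is our assumption; the text only requires a symmetric distribution with the stated variance:

```python
import numpy as np

def impulse_noise(n_samples, p=0.01, var_a=1e4 / 12.0, seed=0):
    """Bernoulli-gated impulse noise n(k) = a(k) * A(k).

    a(k): i.i.d. binary, P{a(k)=1} = p, P{a(k)=0} = 1 - p.
    A(k): symmetric amplitude distribution (uniform here), uncorrelated
          with a(k), with variance var_a. For var_a = 1e4/12 the uniform
          support works out to [-50, 50].
    """
    rng = np.random.default_rng(seed)
    a = rng.random(n_samples) < p
    half_width = np.sqrt(3.0 * var_a)   # uniform on [-w, w] has variance w**2 / 3
    amp = rng.uniform(-half_width, half_width, n_samples)
    return a * amp
```

With p = 0.01, roughly one sample in a hundred carries an impulse, which matches the choice of a short scale-estimation window designed to contain at most one outlier at a time.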
Figure 5.7 shows the values of the NEE criterion for different algorithms in the
case when there is no impulse noise in the desired response. The analysis of the
obtained results confirmed the initial assumption that the robust RLS algorithm
(RRLS) shows properties similar to the conventional RLS algorithm; however, the
advantage of the use of the optimal input (the RRLSO algorithm) is noticeable and
shows in a faster initial convergence and a higher accuracy of the estimated
parameters. Besides that, it can be noted that the RRLS algorithm with optimal
input (RRLSO) gives somewhat worse results than the conventional RLS algorithm
with optimal input (RLSO); however, its properties are still better than those
of the compared algorithms with Gaussian noise as their random excitation.
Figure 5.8 shows the obtained values of the normalized estimation error (NEE)
for the case when impulse noise is contained in the desired response.
[Plot area of Fig. 5.8: NEE in dB versus number of iterations, 0–500.]
Fig. 5.8 Normalized estimation error (NEE) for different adaptive algorithms in the case with
impulse noise in the desired response. Dashed line: RLS algorithm with Gaussian noise as the input
signal; bold dashed line: RRLS algorithm with Gaussian noise as the input signal; solid line: RLS
algorithm with the optimal input; bold solid line: RRLS algorithm with the optimal input
It was already mentioned that the previously considered M-estimators are sus-
ceptible to changes of signal dynamics. This problem can be overcome by a
modification of the criterion function, as done in (5.6), by introducing a scaling
factor (5.7). This section presents another approach to the recursive generation of
the scaling factor in real time, simultaneously with the generation of the filter
parameter estimates.
Let us assume that the probability distribution density of the stochastic distur-
bance (noise) n(k) is known up to a scaling factor s. If p_n(·) is the nominal
probability distribution density for s = 1, then the probability distribution density
for an arbitrary scaling factor s is given as

\[
p(n) = \frac{1}{s}\,p_n\!\left(\frac{n}{s}\right). \qquad (5.43)
\]
A case of interest is when the disturbance n(k) belongs to a distribution class P
that is known in advance. In that case it is necessary, by maximizing the
Cramér–Rao lower bound in (4.9), to determine first the least favorable probability
distribution density p* (such a probability distribution density carries the minimum
amount of information about the estimated parameters within the given class). The
optimal loss function in the distribution class is the likelihood function Φ in (5.2),
which is determined by the least favorable probability distribution density function
within the given class and now has the form

\[
F\big(e(k;\Theta), s\big)
= -\ln\left[\frac{1}{s}\,p^{*}\!\left(\frac{e(k;\Theta)}{s}\right)\right]
= \ln s - \ln p^{*}\!\left(\frac{e(k;\Theta)}{s}\right)
= \ln s + \Phi\!\left(\frac{e(k;\Theta)}{s}\right). \qquad (5.44)
\]
5.3 Adaptive Estimation of the Scaling Factor in Robust Algorithms 171
According to (5.44) one obtains the relation for the mean (expected) losses, which
simultaneously represents the criterion generator (5.21) for recurrent stochastic
procedures for the estimation of the vector Θ:

\[
J(\Theta,s) = E\big\{F\big(e(k;\Theta),s\big)\big\}
= \ln s + E\left\{\Phi\!\left(\frac{e(k;\Theta)}{s}\right)\right\}. \qquad (5.45)
\]

According to (5.45), analogously to (5.24), one obtains the following empirical
functional (the mathematical expectation is approximated by the arithmetic mean)

\[
J_k(\Theta,s) = \ln s + \frac{1}{k}\sum_{i=1}^{k}\Phi\!\left(\frac{e(i;\Theta)}{s}\right). \qquad (5.46)
\]
Analogously to (5.28), the estimate of the parameter vector is generated by a
Newton–Raphson step,

\[
\hat{\Theta}(k) = \hat{\Theta}(k-1)
- \left[k\,\frac{\partial^{2} J_k(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial\Theta^{2}}\right]^{-1}
\left[k\,\frac{\partial J_k(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial\Theta}\right]
+ O\!\left(\big\|\hat{\Theta}(k)-\hat{\Theta}(k-1)\big\|\right),
\qquad (5.47)
\]

where

\[
\lim_{\|x\|\to 0}\frac{O(\|x\|)}{\|x\|} = 0,
\]

and ‖·‖ represents the Euclidean norm. According to (5.46) one may write

\[
J_k(\Theta,s)
= \ln s + \frac{1}{k}\left[(k-1)\,\frac{1}{k-1}\sum_{i=1}^{k-1}\Phi\!\left(\frac{e(i;\Theta)}{s}\right) + \Phi\!\left(\frac{e(k;\Theta)}{s}\right)\right]
= \ln s + \frac{1}{k}\left[(k-1)\big(J_{k-1}(\Theta,s)-\ln s\big) + \Phi\!\left(\frac{e(k;\Theta)}{s}\right)\right],
\qquad (5.48)
\]

whence it follows that

\[
k\,J_k(\Theta,s) = \ln s + (k-1)\,J_{k-1}(\Theta,s) + \Phi\!\left(\frac{e(k;\Theta)}{s}\right). \qquad (5.49)
\]

By differentiating the last relation twice over Θ, one obtains

\[
k\,\frac{\partial J_k(\Theta,s)}{\partial\Theta}
= (k-1)\,\frac{\partial J_{k-1}(\Theta,s)}{\partial\Theta}
- \frac{X(k)}{s}\,\Psi\!\left(\frac{e(k;\Theta)}{s}\right),
\qquad (5.50)
\]
\[
k\,\frac{\partial^{2} J_k(\Theta,s)}{\partial\Theta^{2}}
= (k-1)\,\frac{\partial^{2} J_{k-1}(\Theta,s)}{\partial\Theta^{2}}
+ \frac{X(k)X^{T}(k)}{s^{2}}\,\Psi'\!\left(\frac{e(k;\Theta)}{s}\right),
\qquad (5.51)
\]

where Ψ = Φ′. If we introduce Θ = Θ̂(k−1) and s = ŝ(k−1) into the relation
(5.51), we obtain

\[
k\,\frac{\partial^{2} J_k(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial\Theta^{2}}
= (k-1)\,\frac{\partial^{2} J_{k-1}(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial\Theta^{2}}
+ \frac{X(k)X^{T}(k)}{\hat{s}^{2}(k-1)}\,\Psi'\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right).
\qquad (5.52)
\]
It remains to define the recurrence relation for the scaling factor s. Similarly to
(5.47), one obtains the relation for the recurrent estimation of the scalar parameter s:

\[
\hat{s}(k) = \hat{s}(k-1)
- \left[k\,\frac{\partial^{2} J_k(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial s^{2}}\right]^{-1}
\left[k\,\frac{\partial J_k(\hat{\Theta}(k-1),\hat{s}(k-1))}{\partial s}\right].
\qquad (5.57)
\]
\[
c(k) = c(k-1) - \frac{1}{\hat{s}^{2}(k-1)}
+ \frac{2\,e(k;\hat{\Theta}(k-1))}{\hat{s}^{3}(k-1)}\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right)
+ \frac{e^{2}(k;\hat{\Theta}(k-1))}{\hat{s}^{4}(k-1)}\,\Psi'\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right).
\qquad (5.62)
\]

Using the relations (5.55)–(5.57) and (5.62), one obtains the final form of the
algorithm for the estimation of the parameter vector with the simultaneous
estimation of the scaling factor s:

\[
\hat{\Theta}(k) = \hat{\Theta}(k-1)
+ \frac{1}{k}\,R^{-1}(k)\,\frac{X(k)}{\hat{s}(k-1)}\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right),
\qquad (5.63)
\]

\[
R(k) = R(k-1)
+ \frac{1}{k}\left[\Psi'\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right)\frac{X(k)X^{T}(k)}{\hat{s}^{2}(k-1)} - R(k-1)\right],
\qquad (5.64)
\]

\[
\hat{s}(k) = \hat{s}(k-1)
+ \frac{1}{c(k)\,\hat{s}^{2}(k-1)}\left[e(k;\hat{\Theta}(k-1))\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right) - \hat{s}(k-1)\right],
\qquad (5.65)
\]

\[
c(k) = c(k-1) - \frac{1}{\hat{s}^{2}(k-1)}
+ \frac{2\,e(k;\hat{\Theta}(k-1))}{\hat{s}^{3}(k-1)}\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right)
+ \frac{e^{2}(k;\hat{\Theta}(k-1))}{\hat{s}^{4}(k-1)}\,\Psi'\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right).
\qquad (5.66)
\]
The first and the second derivatives of the functional (5.45) at the point
(Θ_opt, s_opt) are

\[
\frac{\partial J(\Theta_{\mathrm{opt}}, s_{\mathrm{opt}})}{\partial s}
= \frac{1}{s_{\mathrm{opt}}}
- \frac{1}{s_{\mathrm{opt}}^{2}}\,E\left\{e(k;\Theta_{\mathrm{opt}})\,\Psi\!\left(\frac{e(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}}\right)\right\},
\qquad (5.69)
\]

\[
\frac{\partial^{2} J(\Theta_{\mathrm{opt}}, s_{\mathrm{opt}})}{\partial s^{2}}
= -\frac{1}{s_{\mathrm{opt}}^{2}}
+ E\left\{\frac{e^{2}(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}^{4}}\,\Psi'\!\left(\frac{e(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}}\right)\right\}
+ 2\,E\left\{\frac{e(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}^{3}}\,\Psi\!\left(\frac{e(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}}\right)\right\}.
\qquad (5.70)
\]

For Θ = Θ_opt and s = s_opt we have ∂J(Θ_opt, s_opt)/∂s = 0; thus we obtain
from (5.69) and (5.70)

\[
\frac{\partial^{2} J(\Theta_{\mathrm{opt}}, s_{\mathrm{opt}})}{\partial s^{2}}
= \frac{1}{s_{\mathrm{opt}}^{2}}
+ \frac{1}{s_{\mathrm{opt}}^{4}}\,E\left\{e^{2}(k;\Theta_{\mathrm{opt}})\,\Psi'\!\left(\frac{e(k;\Theta_{\mathrm{opt}})}{s_{\mathrm{opt}}}\right)\right\}.
\qquad (5.71)
\]
If one introduces Fisher's dispersion measure over the class of probability
distribution density functions P, based on the least favorable probability
distribution density p* within the given class P [28],

\[
I_d(p^{*}) = E\left\{\left[\frac{p^{*\prime}(n)}{p^{*}(n)}\right]^{2} n^{2}\right\},
\qquad (5.72)
\]

then, by applying the procedure presented in [28] and utilizing the relations
(5.69)–(5.71), one obtains the following expression for (5.68):

\[
\hat{s}(k) = \hat{s}(k-1)
+ \frac{1}{k\,\big[I_d(p^{*})-1\big]}\left[e(k;\hat{\Theta}(k-1))\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right) - \hat{s}(k-1)\right]
= \hat{s}(k-1)
+ \frac{a}{k}\left[e(k;\hat{\Theta}(k-1))\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right) - \hat{s}(k-1)\right],
\qquad (5.73)
\]

where

\[
a = \frac{1}{I_d(p^{*}) - 1}. \qquad (5.74)
\]
Now the simplified algorithm for the simultaneous estimation of the parameters
and the scaling factor assumes the form

\[
\hat{\Theta}(k) = \hat{\Theta}(k-1)
+ \frac{1}{k}\,R^{-1}(k)\,\frac{X(k)}{\hat{s}(k-1)}\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right),
\qquad (5.75)
\]

\[
R(k) = R(k-1)
+ \frac{1}{k}\left[\Psi'\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right)\frac{X(k)X^{T}(k)}{\hat{s}^{2}(k-1)} - R(k-1)\right],
\qquad (5.76)
\]

\[
\hat{s}(k) = \hat{s}(k-1)
+ \frac{a}{k}\left[e(k;\hat{\Theta}(k-1))\,\Psi\!\left(\frac{e(k;\hat{\Theta}(k-1))}{\hat{s}(k-1)}\right) - \hat{s}(k-1)\right],
\qquad (5.77)
\]

\[
e(k) = y(k) - X^{T}(k)\,\hat{\Theta}(k-1). \qquad (5.78)
\]
Since, according to (5.77), the ability to update the scaling factor decreases as
the training process advances (the correction of the scaling factor is divided by the
number of iterations k), it is desirable to modify (5.77) to enable its application to
signals with abrupt changes of dynamics. Namely, instead of the variable a/k, a
fixed weight factor 0.1 was introduced, with its value chosen according to
experimental results. The choice of this factor is a result of a trade-off between the
increase of the variance of the estimated scaling factor and the rate of change of
the scaling factor at the onset of the changes of signal dynamics.

Table 5.7 Flow diagram of the algorithm for simultaneous estimation of the parameters and the
scaling factor

Step 1: Initialization
  Θ̂(0) = 0; P(0) = r²I, r² ≫ 1; R(0) = P⁻¹(0)
  X(1) = [x(1) x(0) 0 ... 0]^T, input column vector with dimensions (M+1) × 1
  define the length of the data window n_H, 5 ≤ n_H ≤ 10
  define the initial data window E_0 = {d(0), d(1), ..., d(n_H − 1)}
  estimate the standard deviation ŝ(0) according to (5.7) and the data E_0
  e(1; Θ̂(0)) = d(1), the initial measurement residual
  define the function Ψ according to (5.23) and its derivative Ψ′

Step 2: Assuming that in each step k = 1, 2, 3, ... one knows Θ̂(k−1), X(k),
  R(k−1), e(k; Θ̂(k−1)) and ŝ(k−1), calculate: the normalized residual

A flow diagram of the algorithm (5.75)–(5.78) is given in Table 5.7.
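A compact Python sketch of one step of (5.75)–(5.78) follows, with the fixed weight 0.1 in place of a/k as discussed above. The Huber nonlinearity drives the parameter update, while a hard-rejection ("3 sigma") nonlinearity of the form (5.80) drives the scale update; the initialization constants and function names are our assumptions:

```python
import numpy as np

def joint_step(theta, R, s, x, d, k, m=1.5, m_s=3.0, gamma=0.1):
    """One step of simultaneous parameter and scale estimation, cf. (5.75)-(5.78).

    Huber's nonlinearity handles the parameter update; the redescending
    nonlinearity (cf. (5.80)) handles the scale update. A fixed weight
    gamma = 0.1 replaces a/k in (5.77), as the text suggests for signals
    with abrupt changes of dynamics.
    """
    e = d - x @ theta                         # residual (5.78)
    u = e / s                                 # normalized residual
    psi = max(-m, min(m, u))                  # Huber Psi(e/s)
    w = 1.0 if abs(u) <= m else 0.0           # Psi'(e/s)
    R = R + (w * np.outer(x, x) / (s * s) - R) / k           # cf. (5.76)
    theta = theta + np.linalg.solve(R, x) * psi / (s * k)    # cf. (5.75)
    psi_s = u if abs(u) <= m_s else 0.0       # redescending nonlinearity
    s = s + gamma * (e * psi_s - s)           # cf. (5.77), fixed-weight version
    return theta, R, s
```

Because the scale correction is e·Ψ_s(e/s) − s, an impulse residual with |e/s| above the rejection threshold contributes nothing to the scale update, while the clipped Huber term still yields a bounded parameter correction.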
[Plot areas of Fig. 5.9: panels (a) and (b), amplitude versus sample number, 0–9000.]
Fig. 5.9 Additive noise with variable dynamics: (a) without impulse noise, (b) with impulse
noise
where c1 and c2 are conveniently chosen constants that ensure the continuity of
the function, used instead of Huber's saturation nonlinearity in (5.22) [63].
The derivative of the function (5.79), the so-called influence function, has the
form [14, 28, 35]

\[
\Psi(x) = \Phi'(x) =
\begin{cases}
x, & \text{for } |x| \le m\sigma \\
0, & \text{for } |x| > m\sigma
\end{cases}
\qquad (5.80)
\]

where the usual choice is m = 3 (the so-called "3 sigma rule"). For the estimation
of the parameters one still utilizes the nonlinearity (5.22) and its influence function.
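The two nonlinearities, the saturating one used for the parameters and the hard-rejection one (5.80) used for the scaling factor, can be contrasted in a two-line sketch; the threshold t = m·σ is taken as given:

```python
def psi_saturating(x, t):
    """Huber-type influence function: linear up to the threshold t, then clipped."""
    return max(-t, min(t, x))

def psi_redescending(x, t):
    """Hard-rejection influence function (5.80): linear up to t = m*sigma
    (m = 3, the "3 sigma rule"), identically zero beyond it."""
    return x if abs(x) <= t else 0.0
```

A residual of 50 with t = 3 still contributes ±3 through the saturating function, but contributes nothing at all through the redescending one, which is why the latter keeps single impulses out of the scale estimate.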
The results obtained by simulation were compared, using the normalized
estimation error (5.19), with the results obtained by the RRLS algorithm which
utilizes the median of absolute deviations for the robust estimation of the scaling
factor. Figure 5.10 shows the estimates of the scaling factor obtained by the use of
the median of absolute deviations (MAD) and by the iterative method (5.77),
while utilizing the nonlinear transformation of the residual (5.79) and the
corresponding influence function (5.80).
[Plot areas of Fig. 5.10: panels (a) and (b), scaling factor estimate (0–0.4) versus sample number, 0–9000.]
Fig. 5.10 Estimation of scaling factor: (a) using median of absolute deviation and (b) by
iterative method (5.77)
[Plot areas of Figs. 5.11 and 5.12: NEE in dB (−30 to −60) versus number of iterations, 0–9000, for the RRLS algorithm with the MAD scale factor estimate and with the iterative method (5.77) for the scale factor, respectively.]
Figures 5.11 and 5.12 show the values of the normalized estimation errors
obtained by different methods for the estimation of the scaling factor.
The performed experimental and theoretical analysis allows one to draw the
following conclusions:
The algorithm for the iterative estimation of the scaling factor is convenient for
the parameter identification of systems where the additive noise contains abrupt
variance changes besides impulse noise (the nonstationary case). The proposed
algorithm was compared to the conventional approach, the robust recursive least
squares (RRLS) algorithm, in which the noise variance is determined using the
median of absolute deviations. The obtained results show that the proposed
algorithm estimates the scaling factor better, which is reflected in a lower variance
of the estimates. Besides that, in the case when the additive noise is contaminated
by impulse noise, the proposed algorithm proves more efficient than the RRLS
algorithm in which the noise variance is determined using the median of absolute
deviations.
Thus, for applications where it is necessary to estimate the scaling factor more
accurately, with the goal of improving the quality of the filter parameter estimation
in a nonstationary environment characterized by impulse disturbances, the
advantages of the proposed algorithm are obvious; it should be mentioned,
however, that this is obtained at the price of a higher computational complexity.
\[
J_u(\hat{\Theta}, k) = \sum_{i=1}^{k}\Phi\big(e(\hat{\Theta}, i)\big), \qquad (5.82)
\]

and the criterion generator of the weighted least squares estimation with a
forgetting factor is given by the expression (3.88), i.e.

\[
J_\rho(\hat{\Theta}, k) = \sum_{i=1}^{k}\rho(k,i)\,e^{2}(\hat{\Theta}, i). \qquad (5.83)
\]
The three strategies, the so-called EPE, FKY and PA algorithms, which further
concretize the general structure (5.84), were considered in Chap. 3.
In (5.81), λ(k) is a parameter in the k-th step whose value may be 1 or 0. If
λ = 1, the expression (5.81) becomes the criterion function (5.82), which leads to
the RRLS algorithm, and if λ = 0, then (5.81) leads to the criterion (5.83) with a
variable forgetting factor, whose recursive minimization can result in the PA-RLS
algorithm. Since impulse disturbances are sporadic, while the parameter changes
may be continuous, it is desirable that the PA-RLS algorithm dominates, while the
RRLS algorithm is activated only in the intervals where impulse disturbances
occur.
Bearing in mind the assumed nature of the impulse noise n(k) = a(k)A(k), where
a(k) is a binary independent identically distributed process chosen so that
P{a(k) = 1} has a low value close to zero, while P{a(k) = 0} has a value near one,
and A(k) is a process with a symmetrical distribution of amplitude, uncorrelated
with a(k) and having a large variance compared to the additive noise, the following
strategy was proposed for the choice of the parameter λ in each step:
\[
\lambda(k) =
\begin{cases}
1, & c \cdot \mathrm{med}\{|e(k)|\}_{n_H} < \mathrm{mean}\{|e(k)|\}_{n_H} \\
0, & c \cdot \mathrm{med}\{|e(k)|\}_{n_H} \ge \mathrm{mean}\{|e(k)|\}_{n_H}
\end{cases}
\qquad (5.85)
\]
Table 5.9 Flow diagram of the RRLS algorithm with variable forgetting factor and detection of
impulse disturbances

Step 1: Let the parameter vector Θ̂(k−1) in iteration k, k ≥ n_H, be known from the
  previous iteration k−1

Step 2: Calculate the current residual e(k; Θ̂(k−1)) = d(k) − X^T(k)Θ̂(k−1) and define
  the vector E = {e(k), e(k−1), ..., e(k−n_H+1)} of length n_H, implying that the
  previous n_H − 1 values of the residual {e(k−1), ..., e(k−n_H+1)} are known

Step 3: Calculate λ(k) in the data window defined in Step 2:

\[
\lambda(k) =
\begin{cases}
1, & c \cdot \mathrm{med}\{|e(k)|\}_{n_H} < \mathrm{mean}\{|e(k)|\}_{n_H} \\
0, & c \cdot \mathrm{med}\{|e(k)|\}_{n_H} \ge \mathrm{mean}\{|e(k)|\}_{n_H}
\end{cases}
\]

  If λ(k) = 1, set ρ(k) = ρ(k−1) and μ(k) = μ(k−1), and then continue with
  Step 8

Step 4: Calculate the normalized ratio of nonstationarity and additive noise power, q(k):

\[
q(k) = \max\big\{0,\; q(k-1) + \big[1 - q(k-1)\big]\,p(k-1)\big\},
\]

\[
p(k-1) = \frac{\mu(k-1)\left[e^{2}(k)/\hat{\sigma}^{2}(k-1) - 1\right]}{\big[1 + \mu(k-1)\big]^{2}},
\]
previous value in the next nH iterations, which is the number necessary to ensure
that the mean value becomes insensitive to the detected outlier. The block diagram
of the algorithm is given in Table 5.9.
The block for system parameter identification shown in Fig. 5.1 was simulated.
The signal of the desired response was obtained by bringing noise with a normal
distribution and unit power to the input of a ninth-order FIR filter with the
parameters H = {0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1}. Independent noise with
a normal distribution and a fixed variance was added to the filter output signal so
that the signal-to-noise ratio was SNR = 30 dB before the start of the impulse
contamination. Impulse noise was generated according to the model
n(k) = a(k)A(k), with P{a(k) = 1} = 0.01, P{a(k) = 0} = 0.99 and
var{A(k)} = 10⁴/12.
[Plot areas of Fig. 5.13: panels (a)–(c) versus the number of iterations, 0–3000.]
Fig. 5.13 Estimation of the time-variable parameter using the RRLS algorithm with a variable
forgetting factor ρ and an outlier detector, without impulse contamination: (a) value of the variable
forgetting factor, (b) estimated value of the variable parameter b1 and the parameter λ which
detects the presence of impulse noise, (c) additive noise
5.4 Robust Recursive Least Squares Algorithm with Variable Forgetting Factor 185
The length of the sliding window n_H, applied for the determination of the
parameter λ, was chosen as n_H = 5 to minimize the probability of the occurrence
of more than one outlier in the window at the same time, and simultaneously to
decrease the number of iterations during which the algorithm for the forgetting
factor update is inactive. Based on experimental results, we adopted a value of 10
for the constant c as sufficiently good to follow the phenomena of impulse
contamination. The changes of the parameter b1 of the FIR filter, as defined in
Fig. 5.14, include sudden drops and surges, as well as linear growth with a larger
or smaller slope.
Figure 5.13 shows the estimated value of the variable parameter for the case
when no impulse contamination is present in the additive noise. The parameter λ
has a value of 0 during the whole estimation, which is expected since no outliers
occur, so that the leading role in the estimation is assumed by the PA-RLS
algorithm. The adequate estimation of the forgetting factor in accordance with the
variation of the estimated parameters is obvious.
Figure 5.14 shows the results obtained for the case when impulse contamination
is present in the additive noise (Fig. 5.14d). The parameter λ successfully detected
each occurrence of impulse contamination (Fig. 5.14c) and thus prevented the
update of the variable forgetting factor in the interval after the impulse disturbance
occurred. Due to the appearance of impulse disturbances around the 1,200th and
2,300th iterations, one notices insignificant deviations in the estimated parameter
values (Fig. 5.14b); however, they did not significantly influence the overall result
of the parameter estimation.
Finally, let us note the following. It is possible to solve the problem of the
estimation of a time-variable parameter in the presence of impulse disturbances in
additive noise using a robust recursive least squares algorithm with a variable
forgetting factor and a detector of impulse disturbances (outliers). The proposed
algorithm is basically a two-step one, since depending on the detection of impulse
[Plot areas of Fig. 5.15: panels (a)–(d) versus the number of iterations, 0–3000.]
Fig. 5.15 Estimation of the time-variable parameter using the RRLS algorithm with a variable
forgetting factor ρ and with a detector of outliers when impulse contamination is present: (a) value
of the variable forgetting factor, (b) estimated value of the variable parameter b1, (c) the parameter
λ which detects the presence of impulse disturbances, (d) additive noise
Echo is a phenomenon we meet almost every day. During a conversation, one may
hear an echo of speech that occurs because of the reflection of the sound signal
from the walls, the floor or other surrounding objects. Echo always appears when
there is reflection, but it is often imperceptible if the time difference between the
appearance of the original signal (speech) and the arrival of the reflected signal
(echo) is small. However, when the location of the reflection is sufficiently far
from the speaker, as is the case in large empty rooms, the time delay of the
reflected signal is larger and the echo may thus be significantly more marked in
comparison to the original signal.
Echo is also generated in telecommunication networks. In this case the term echo
implies a delayed and distorted version of the original acoustic or electrical signal
which travels back towards its own source due to reflection or some other reason.
From the point of view of transmission quality, echo represents a disturbance
causing a decrease of intelligibility in speech transmission, and an increase of the
error probability in data transfer. The origins of echo should be sought in the
specific requirements regarding the type of transmission, the diversity of terminals
and the requirements for maximum exploitation of the available transmission
systems.
Although data transfer, in the form of telegraphy, preceded speech signal
transmission, speech communication became dominant with time and determined
the development of telecommunication networks. The contemporary development
of computers, which are becoming omnipresent due to their low cost, imposes an
increasing need for data transfer. It is natural that there is a tendency to use the
existing telephone networks for this type of transmission too. However, these
networks are optimized for the transmission of analog speech signals and thus
introduce various distortions in data transfer. The most marked distortions are the
linear ones, which include echo.
On the other hand, regardless of whether one deals with data or speech
transmission, specific requirements in the use of communication equipment (for
instance, acoustic and video teleconferencing, satellite transmission and similar)
generate several different types of echo signals.
The causes, modes and origins of echo in telecommunication networks may be
different, but their common trait is that they decrease the quality of
communications. Thus there is an interest in the practical use of echo cancellers.
The theoretical basis for echo cancellation is in the field of adaptive digital
filtering. This field has been intensively researched during the last several decades,
and the first practical implementation of an echo canceller appeared during the
1960s. However, because of the requirements connected with complex digital
signal processing, wider usage had to wait for the advent of large-scale integration
(LSI) technology. The first echo canceller in very-large-scale integration (VLSI)
technology was implemented in 1980, and this opened new possibilities for the
improvement of the characteristics and functionality of echo cancellers, as well as
for their downsizing and cost decrease.
Following the technological development, the usage of echo cancellers also
evolved, from the original concept of echo cancellation on very long distance lines
to the application in full-duplex systems for data transfer, as well as in cancellation
of acoustic feedback (acoustic echo) in electro-acoustic, tele-audio and video
conferences.
As a rule, modern communication systems contain subsystems for local echo
cancellation based on the principles of adaptive filtering. The technology of digital
signal processors (DSP) ensures new possibilities for the implementation of
complex algorithms for local echo cancellation, optimized with regard to
adaptation speed and accuracy. This is also a reason for an increasing interest in
the improvement of the existing and the generation of new algorithms for a more
efficient solution of the echo cancellation problem.
The goal of this Chapter is to explain the concept of adaptive echo cancellation
based on adaptive digital filtering, with a special emphasis on local echo, to
present the theoretical background, the possibilities and the limitations of this
approach, as well as to present some of the achieved original results contributing
to the improvement of the existing solutions.
The further text presents the basic types of echo signals, their causes and
origins. It points out some conventional ways for the cancellation of this phe-
nomenon, considers their drawbacks and outlines the principles of adaptive echo
cancellation.
Several classes of recursive adaptive algorithms for local echo cancellation are
analyzed from the point of view of accuracy, training speed, adaptation and the
complexity of their implementation. Special care has been dedicated to the
analysis of the influence of training sequences on the performance of the adaptive
local echo canceller. The adjustment of the frequency range to the given
parameters of the adequate communication channel was considered, as well as the
statistical-correlation properties of the training sequences. The performed analysis
encompasses a novel approach based on the possibility of the on-line generation of
optimal input signals, taking as the synthesis criterion the classes of functionals
used in the field of optimal experiment planning. The algorithms of this type are
considered in detail in Chap. 4. Besides that, we proposed the use of robust
recursive algorithms in the case when the echo signal is contaminated by additive
impulse noise or by Gaussian noise with weighted tails. An exhaustive
experimental and theoretical analysis of robust adaptive digital filters is given in
Chap. 5.
6.1 Echo: Causes and Origins 189
In this Section we give an overview of the types and causes of the echo signals
occurring in speech and data transmission. We present some standard solutions for
echo control and the basic concepts of adaptive echo cancellation.
Regardless of their common name, the mechanisms and consequences of echo
appearance in speech transmission and in data transfer are different. Thus the
requirements posed for echo cancellation in these two cases also differ, and we
consider the two cases separately.
A telephone network consists of so-called two-wire and four-wire segments. If the
same line is used for transmission in both the receiving and the emitting direction,
this is called two-wire transmission. This form of transmission is used over shorter
distances, for instance from the subscriber to the local telephone exchange. For
reasons of cost-effectiveness, longer-distance transmission requires signal
multiplexing, which implies the separation of the transmission directions. The
term four-wire for this type of transmission appeared in the period when radio and
multiplexed transmission were not utilized and the separation of directions implied
the existence of at least two transmission lines. Its basic purpose is to ensure signal
amplification and multi-channel transmission. This type of transmission is
schematically shown in Fig. 6.1.
The system shown in Fig. 6.1 is symmetric for the users at the A and B sides.
Assuming that user A talks and user B listens, the speech signal is transferred from
user A along the two-wire segment to the exchange. The hybrid in the exchange
should ensure the transition from the two-wire to the four-wire segment by
forwarding the signal to the emitting direction. In Fig. 6.1 it is located in the top
part of the four-wire segment (Fig. 6.1b). At the other end, in the exchange closer
to user B, the signal is forwarded through the hybrid and the two-wire segment to
user B.
An important element of the network is the hybrid, whose task is to ensure the
conversion from a two-wire to a four-wire segment and vice versa, thereby
separating the emitting and the receiving directions. In the ideal case the hybrid
transmits the incoming signal from the four-wire to the two-wire segment,
attenuating it by 3 dB, but in the opposite direction it attenuates the signal
infinitely, not allowing it to proceed to the emitting direction and thus to make a
circular path. This may be achieved by carefully tuning the hybrid, under the
condition that the parameters of the environment do not change. However, because
of the variable number of users in the network, the requirement that a terminal may
be positioned at various places in the network and the variability of the directions
of the establishment of
190 6 Application of Adaptive Digital Filters
Fig. 6.1 Appearance of echo in an analog telephone network during speech signal
transfer: (a) block diagram of a telephone connection; (b) one of the two desired
transmission directions; (c) talker echo, where the talker hears his own voice
with some delay; (d) listener echo, where the listener also hears a delayed
version of the received signal
the connection, the hybrid cannot be designed in advance to satisfy the conditions
of ideal impedance matching [34].
A consequence of insufficient impedance matching of the hybrid is a decrease of
signal attenuation in the undesired direction; this attenuation may even drop
below 10 dB. Because of this, a part of the signal transferred to the user B
arrives through the hybrid into the part of the four-wire segment that otherwise
serves for the transfer from the user B to the user A. In this manner the user A
may hear his own speech with a delay (Fig. 6.1c). This phenomenon is called the
talker echo. The larger the signal propagation time, determined by the length of
the four-wire segment, the more marked the influence of this phenomenon.
6.1 Echo: Causes and Origins 191
For the same reason the user B may similarly hear a delayed version of the
received signal (Fig. 6.1d). This echo is called the listener echo and is usually
weaker than the talker echo, owing to attenuation along the transmission path
[65, 66].
It should be noted that the useful signal propagates along the four-wire segment
in one direction only, while the path of the echo signal is a loop. Because of
this, one of the first methods of echo control was to introduce an additional
3 dB of attenuation in both transmission directions of the four-wire segment.
In this manner the echo signal, which crosses both pads, is attenuated by an
additional 6 dB. A deficiency of this method is that the useful signal is also
attenuated by an additional 3 dB.
A somewhat more efficient method of echo control is to interrupt the
transmission in one direction whenever there is a speech signal in the other one
(echo suppressor) [67]. However, this method is efficient only if the echo signal
delay is shorter than 100 ms. In satellite transmission, the time necessary for
the radio signal to reach a satellite and return to the earth is about 250 ms,
which means that for a loop path the time delay may be up to 0.5 s [68].
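The attenuation and delay figures above follow from elementary arithmetic; the sketch below merely restates them in code, using the 3 dB pads and the 250 ms one-way satellite delay quoted in the text:

```python
# Echo-loop arithmetic for the figures quoted above. The useful signal
# crosses one 3 dB pad; the echo, whose path is a loop, crosses both.
useful_loss_db = 3.0
echo_loss_db = 2 * 3.0

# Geostationary satellite link: ~250 ms one way, so the echo loop
# (out and back again) accumulates roughly twice that.
one_way_delay_s = 0.250
loop_delay_s = 2 * one_way_delay_s

print(f"extra attenuation of the useful signal: {useful_loss_db:.0f} dB")
print(f"extra attenuation of the echo signal:   {echo_loss_db:.0f} dB")
print(f"echo-loop delay over a satellite link:  {loop_delay_s:.1f} s")
```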
Acoustic echo is one of the dominant problems influencing the quality of speech
signal transmission. It is especially marked in the so-called hands-free telephony,
as well as in acoustic and video teleconferences [69]. The acoustic echo is a signal
occurring due to multiple reflection of sound waves and the establishment of an
acoustic channel between the speaker and the microphone at the terminal. The
appearance of acoustic echo is schematically presented in Fig. 6.2.
Contrary to conventional telephone sets, where the user is in direct physical
contact with the terminal and the acoustic path between the speaker and the
microphone is blocked by their specific placement, here this is not the case.
The acoustic channel arises through multiple reflection of sound waves. The
received signal, after being emitted from the speaker, passes through the
acoustic medium and, distorted and delayed, reaches the microphone, where it may
interfere with the emitted signal. Because of this, normal dialog is disrupted
and, depending on the acoustic properties of the room, the system may become
unstable and acoustic feedback may appear, which completely prevents conversation.
The acoustic medium may be the interior of a room or a car, or an open space.
Because of the diversity of these environments, their acoustic properties are
also different and variable. Reasons for the variability may be movement of the
user, opening or closing of doors or windows, and the like. Changes of the
acoustic properties cause variations of the acoustic echo duration; for
instance, in a car at a sampling frequency of 8 kHz the acoustic echo lasts
several hundred samples, while in a large room it may last up to several
thousand samples [70, 71].
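The tap counts quoted above follow directly from the sampling frequency and the echo decay time. A minimal sketch, where the decay times are illustrative assumptions rather than measurements from the text:

```python
def echo_length_samples(decay_time_s: float, fs_hz: int) -> int:
    """Number of samples spanned by an echo tail of the given duration."""
    return round(decay_time_s * fs_hz)

fs = 8_000  # telephone-band sampling frequency used in the text

car_taps = echo_length_samples(0.050, fs)   # assumed ~50 ms decay in a car
room_taps = echo_length_samples(0.500, fs)  # assumed ~500 ms in a large room

print(car_taps)    # several hundred samples
print(room_taps)   # several thousand samples
```

An adaptive FIR echo canceller must span the whole tail, so the room case needs roughly an order of magnitude more coefficients than the car case.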
If the acoustic properties were known in advance, one could design a filter with
an invariable structure which, by filtering the received signal, would generate
a replica of the acoustic echo and subtract it from the microphone input signal.
Since the acoustic properties vary, a filter with an invariable structure is not
convenient; even under the assumption that the acoustic properties of the
environment do not vary, such a solution would not be universal. The
conventional way to overcome this problem is to switch to half-duplex
transmission, i.e. to break the connection with the microphone while the speaker
is active, or to use a directional microphone in order to reduce the influence
of reflections to a minimum. The deficiency of both solutions is that they
impose restrictions on the user, which make the conversation unnatural.
Fig. 6.3 Block diagram of full-duplex transmission over a single line,
implemented using a hybrid. Echo signals are denoted by dotted lines
In speech transmission echo is disturbing when the delay is such that the
listener can notice it, as for instance in satellite transmissions. In data
transfer a much more important factor is the ratio of the useful signal power to
the echo signal power, because a variation of this ratio varies the error
probability.
More efficient echo control, compared to the mentioned methods, can be obtained
using blocks for adaptive echo cancellation.
The causes of echo are multiple: first, the impossibility of ideally matching
the hybrid; second, an analog telephone network has many impedance
irregularities causing signal reflection toward its source; and third, the
reflection of acoustic waves and the acoustic coupling of speaker and microphone
in hands-free telephony.
For any type of echo, acoustic or electrical, the echo-canceling block must
first model the echo transmission path, then efficiently estimate the parameters
of that path, and finally generate a replica of the echo signal. The generated
signal is subtracted from the incident signal, so that only the desired signal
remains. Adaptive filters are used for the estimation of the transfer function
parameters, since they are able to autonomously update their own parameters,
while simultaneously requiring little foreknowledge about the characteristics of
the echo transmission path [26].
An adaptive filter consists of a digital filter and an adaptive algorithm (Chap. 2).
The digital filter should model the transmission path of the echo signal, and the
adaptive algorithm should update the filter parameter values according to the
changes of the transmission path. Adaptive digital filtering is required in
order to achieve as accurate a replica of the echo signal as possible, since the
transmission path of the echo signal is in principle unknown and time-varying.
The choice of the digital filter and the adaptive algorithm depends on the
particular application and the expected degree of echo cancellation.
Acoustic echo cancellation [7, 62] is based on the identification of the unknown
acoustic transmission path between the speaker and the microphone. Since
acoustic echo occurs due to multiple reflection of the signal, the impulse
response of the acoustic channel is very long. Besides that, under normal
circumstances the acoustic transmission path varies with time and contains
additive noise, which makes the problem very complex. The acoustic echo
cancellation block, implemented as an adaptive filter, is placed in parallel
between the speaker and the microphone, as shown in Fig. 6.4.
In speech transmission, due to the system symmetry, echo is generated at both
ends of the network. Thus it is necessary to include two echo cancellation
blocks. They are placed in the four-wire segment, as close to the echo source as
possible, to render cancellation as efficient as possible. The principle is as
follows: the input signal is brought both to the hybrid and to the block for
adaptive echo cancellation, and the output of that block is subtracted from the
output of the hybrid. The obtained difference represents the error signal for
the adaptive block and enters the process of updating the adaptive block
parameters. A block diagram of such an implementation is shown in Fig. 6.5.
Fig. 6.4 Generation of a replica of the acoustic echo
In data transmission the function of the echo cancellation block is to model two
types of echo signal: local echo and line echo. Local echo occurs because of the
existence of a transmission path between the transmitter and the receiver on the
same side. The usual assumption, which gives satisfactory results in practice,
is that this transmission path is linear and slowly varying in time (for
instance, as a consequence of temperature changes [70, 74]). Line echo usually
has much lower power than local echo; it is in principle nonlinear, varies with
time and has much larger delays. Because of that, although located in the same
place in the network, the echo cancellers for these two cases are realized
separately. The block for local and line echo cancellation is placed in parallel
to the hybrid (Fig. 6.6).
The process of echo cancellation can be divided into two intervals: the training
period, at the beginning of the communication, and the period of tracking the
variations of the characteristics of the echo signal transmission path. The main
goal of the echo canceller in the initial, training period is to reach an
extremely low echo signal level in the shortest possible time. This level should
ensure a satisfactory bit error rate or intelligibility of the received signal,
depending on the transmission type. During the communication itself, on the
other hand, it is necessary to track the variations of the echo channel
parameters in order to maintain a satisfactory level of echo cancellation.
The mechanisms of suppression of local echo and line echo have many
similarities, but also fundamental differences. For instance, during data
transfer there are signals on the line in both directions all the time, so the
problem of correcting the parameter values during the transmission itself
becomes more complex. This is not the case in speech transmission, since the
participants usually talk alternately. Also, data transmission most often
requires a much higher degree of echo suppression than speech transmission [18].
On the other hand, in data transmission the reference signal is a sequence of
symbols which may assume a limited set of values, which simplifies the
implementation.
Fig. 6.5 Adaptive echo cancellation (AEC) for both directions of speech signal transmission
Fig. 6.6 Generation of a replica of the echo signal at the transition between
the four-wire and two-wire segments
filters for the approximation of the echo signal transmission path. Starting
from this, the transmission path of the echo signal, and the echo signal itself,
can be modeled by the linear regression equation

    y(k) = X^T(k) h + v(k),                                   (6.1)

where X(k) represents the data vector and h the parameter vector of the echo
signal transmission path, defined as

    X^T(k) = [x(k)  x(k-1)  x(k-2)  ...  x(k-M)],
                                                              (6.2)
    h^T = [b_0  b_1  b_2  ...  b_M].
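A numerical sketch of the regression model (6.1) and (6.2) may help: the echo path h is an FIR parameter vector, y(k) is formed from the data vector X(k) plus noise v(k), and h is then recovered by least squares. The filter coefficients, noise level and signal length below are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                               # FIR order: parameters b0 .. bM
h = rng.standard_normal(M + 1)      # unknown echo-path parameter vector
x = rng.standard_normal(2000)       # excitation signal x(k)
sigma_v = 0.01                      # additive-noise level (assumed)

def data_vector(x, k, M):
    """X(k) = [x(k), x(k-1), ..., x(k-M)]^T, zero before the start."""
    return np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])

# y(k) = X^T(k) h + v(k), Eq. (6.1)
X = np.array([data_vector(x, k, M) for k in range(len(x))])
y = X @ h + sigma_v * rng.standard_normal(len(x))

# Least-squares estimate of h, i.e. the parameter vector minimizing the MSE
h_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.max(np.abs(h_hat - h)))    # small estimation error
```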
The quoted approach requires the estimation of parameters defined by the
vector h, so that the MSE criterion is minimized.
However, as a consequence of modeling a system with both zeros and poles (IIR)
by a structure with zeros only (FIR), the dimension of the parameter vector (the
order M of the FIR filter) must be large to achieve satisfactory echo
cancellation. Besides that, the number of iterations necessary to reach
convergence of the estimation procedure rises with the number of parameters. One
approach to improving the convergence properties of estimation algorithms is
based on an adequate choice of the input signal x(k) during the training period.
Namely, the input signal may be generated in advance, or generated on-line by an
adequate algorithm during the training itself, taking as the synthesis criterion
the classes of functionals used in the field of optimal experiment planning.
These algorithms are described in detail in Chap. 4.
In this section we consider an echo cancellation system from the point of view
of the influence of excitation signals on the training of the adaptive echo
cancellation block in speech signal transmission. A characteristic of this
situation is the appearance of the so-called talker echo, which is a
consequence of the signal flowing in a loop. Since the signal propagation time
is far longer than for local echo, the dimension of the vector describing the
parameters of the transmission path is also larger. The block for adaptive echo
cancellation is placed in parallel to the hybrid in the four-wire part of the
line. As in data transfer, one needs to update the coefficients of the adaptive
filter. This is performed in the initial period, before the communication
begins, using the emitted excitation sequence and the corresponding echo signal.
The goal of the analysis presented in this section is to ensure a better
understanding of the influence of excitation signals on the performance of the
echo cancellation system.
The model of the adaptive echo cancellation block itself is shown in Fig. 6.7.
The coefficients of the adaptive FIR filter, defined by the vector H(k), are
updated during the training process using the normalized least mean square
(NLMS) algorithm, expressed as [55]

    H(k) = H(k-1) + (K / r_x(k)) e(k) X(k),                   (6.3)

where K is the algorithm convergence and stability factor, lying in the range
0 < K < 2/M; M is the length of the adaptive filter; r_x(k) is the estimated
power of the input signal x(k) at the k-th moment; and X(k) is the column vector
of input signal samples.
The figure of merit for the success of echo cancellation is the echo return
loss enhancement (ERLE) factor, defined as [23]

    ERLE = 10 log10( E{y^2(k)} / E{e^2(k)} ),                 (6.4)

where E{y^2(k)} and E{e^2(k)} are the mathematical expectations of the squared
echo signal y(k) and of the squared signal e(k) remaining after echo
cancellation.
Fig. 6.7 Model of the adaptive echo cancellation block: the input x(k) passes
through the echo signal transmission path and, together with the noise v(k),
forms the echo y(k); the echo cancellation block H(k) produces the replica
ŷ(k), and the difference e(k) = y(k) − ŷ(k) drives the mechanism for parameter
update (MSE minimization)
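The update (6.3) and the measure (6.4) can be sketched together on a synthetic example. The echo path below is an arbitrary short FIR, not one from the text; note also that this sketch normalizes by ||X(k)||², the per-vector equivalent of dividing by M·r_x(k), for which the usual stability range becomes 0 < K < 2.

```python
import numpy as np

rng = np.random.default_rng(1)
path = np.array([0.5, -0.3, 0.2, 0.1, -0.05])  # assumed echo path
M = len(path)
K = 0.5                                         # step size, 0 < K < 2 here
eps = 1e-8                                      # guards division by zero

x = rng.standard_normal(20_000)                 # training excitation x(k)
y = np.convolve(x, path)[: len(x)]              # echo signal y(k)

def regressor(x, k, M):
    """Column vector [x(k), x(k-1), ..., x(k-M+1)]^T, zero-padded."""
    X = x[max(0, k - M + 1): k + 1][::-1]
    return np.pad(X, (0, M - len(X)))

H = np.zeros(M)                                 # adaptive filter H(k)
e = np.zeros(len(x))
for k in range(len(x)):
    X = regressor(x, k, M)
    e[k] = y[k] - H @ X                         # residual after cancellation
    H = H + (K / (X @ X + eps)) * e[k] * X      # NLMS update, Eq. (6.3)

# ERLE = 10 log10(E{y^2} / E{e^2}), Eq. (6.4), over the final samples
tail = slice(-5_000, None)
erle_db = 10 * np.log10(np.mean(y[tail] ** 2) / np.mean(e[tail] ** 2))
print(f"ERLE after training: {erle_db:.1f} dB")
```

With no additive noise, the residual shrinks toward machine precision once the filter has converged, so the measured ERLE is very large; real channels limit it to the levels reported in the figures below.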
An iterative method for the estimation of the mean signal power r_x(k) and the
corresponding mathematical expectations E{y^2(k)} and E{e^2(k)} is given by the
recursive relation [55]
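The recursive relation itself is not reproduced in this excerpt. As a hedged stand-in, a common iterative power estimate of this kind is single-pole (exponential) averaging, where the forgetting factor lam is an assumed tuning constant, not a value from the text:

```python
import random

def recursive_power(samples, lam=0.99, r0=0.0):
    """r(k) = lam * r(k-1) + (1 - lam) * s(k)^2 -- a running estimate
    of the mean signal power, updated one sample at a time."""
    r = r0
    history = []
    for s in samples:
        r = lam * r + (1.0 - lam) * s * s
        history.append(r)
    return history

# For a long unit-variance sequence the estimate settles near 1.
random.seed(0)
sig = [random.gauss(0.0, 1.0) for _ in range(20_000)]
print(recursive_power(sig)[-1])
```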
4. Each of these signals was passed through the FIR filter that simulated the
   line, and in this way signal pairs were obtained for the training of the
   adaptive filter. After the completed training, based on the algorithm defined
   by expression (6.3), the values of the filter parameters are fixed and the
   value of the ERLE factor is followed on the pair of signals defined under 2.
5. The telephone line is simulated as a filter with infinite impulse response
   (IIR filter) of Chebyshev type, with the bandwidth defined so that the filter
   transfer function is as similar as possible to the transfer function of the
   FIR filter defined under 1.
6. The speech signal is passed through the defined IIR filter and the response
   is stored. This signal pair is used to investigate the ERLE factor.
7. The excitation signals are the same as under 3, only the signal pairs for
   training are obtained by passing the corresponding excitation sequences
   through the defined Chebyshev-type IIR filter.
8. The telephone line is simulated by a Butterworth-type IIR filter, with the
   bandwidth defined so that the filter transfer function is as similar as
   possible to the transfer function of the FIR filter defined under 1.
9. Experiments are performed analogously to entries 6-8.
The accuracy of the calculation of filter coefficients and input data is 32-bit.
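The experiment outline above can be imitated numerically. In the simplified sketch below the "line" is a small hand-picked IIR section rather than the Chebyshev or Butterworth designs of the text, and the training excitations are white (wideband) versus heavily low-pass-filtered (narrowband) noise; all coefficients and lengths are illustrative assumptions.

```python
import numpy as np

def iir_line(x, b=(0.2, 0.3, 0.2), a1=-0.6, a2=0.25):
    """Toy IIR 'line': y[k] = b0 x[k] + b1 x[k-1] + b2 x[k-2] - a1 y[k-1] - a2 y[k-2]."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(3) if k >= i)
        if k >= 1:
            acc -= a1 * y[k - 1]
        if k >= 2:
            acc -= a2 * y[k - 2]
        y[k] = acc
    return y

def regressor(x, k, M):
    X = x[max(0, k - M + 1): k + 1][::-1]
    return np.pad(X, (0, M - len(X)))

def train_nlms(x, y, M=32, K=0.5, eps=1e-8):
    H = np.zeros(M)
    for k in range(len(x)):
        X = regressor(x, k, M)
        H += (K / (X @ X + eps)) * (y[k] - H @ X) * X
    return H

def erle_db(x, y, H):
    e = np.array([y[k] - H @ regressor(x, k, len(H)) for k in range(len(x))])
    return 10 * np.log10(np.mean(y ** 2) / np.mean(e ** 2))

rng = np.random.default_rng(2)
x_eval = rng.standard_normal(4_000)            # wideband evaluation pair
y_eval = iir_line(x_eval)

x_wide = rng.standard_normal(20_000)           # wideband training excitation
x_narrow = np.convolve(rng.standard_normal(20_000),
                       np.ones(8) / 8.0, mode="same")  # narrowband excitation

erle_wide = erle_db(x_eval, y_eval, train_nlms(x_wide, iir_line(x_wide)))
erle_narrow = erle_db(x_eval, y_eval, train_nlms(x_narrow, iir_line(x_narrow)))

print(f"ERLE, wideband training:   {erle_wide:.1f} dB")
print(f"ERLE, narrowband training: {erle_narrow:.1f} dB")
```

As in the experiments described below, the filter trained on excitation covering the whole band cancels the line output far better than the one trained on narrowband excitation, whose high-frequency modes are barely excited during training.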
Figure 6.8 shows a family of curves representing the ERLE factor for the case
when the transmission path of the echo signal is simulated by an FIR filter with
a range of 300-3,400 Hz. Training of the adaptive filter was done using signals
with a completely flat amplitude-frequency characteristic (unit value), a random
phase with normal distribution, and different frequency ranges. The notation b,
d, e, f and h corresponds to the previously defined frequency ranges. The
notation "speech" is placed over the curve describing the ERLE factor obtained
after training the adaptive filter on a speech signal.
Figure 6.9 shows the results obtained when the transmission path of the echo
signal was simulated by a Butterworth-type IIR filter; training of the echo
cancellation block was performed using signals with a flat amplitude-frequency
characteristic and a random phase with uniform distribution.
If the transmission path of the echo signal is simulated by a Chebyshev-type IIR
filter, and Gaussian noise with various frequency ranges is used as the excitation
signal, one obtains the results shown in Fig. 6.10.
The presented results point to the following possibilities and conclusions:
1. The basic task of an echo canceller in the initial, training period is to
   reach an extremely low echo signal level in the shortest possible time. This
   level should ensure a satisfactory bit error rate or intelligibility of the
   received signal, depending on the transmission type. One of the approaches to
   increasing the convergence speed and enhancing the accuracy of parameter
   estimation is to utilize adequate excitation (training) signals.
6.3 Analysis of the Influence of Excitation 201
Fig. 6.8 ERLE factor when the echo signal transmission path is simulated by an
FIR filter (ERLE in dB versus number of samples, over the training and tracking
intervals; curves b, d, e, f, h and "speech")
Fig. 6.9 ERLE factor when the echo signal transmission path is simulated by a
Butterworth-type IIR filter (ERLE in dB versus number of samples, over the
training and tracking intervals; curves b, d, e, f, h and "speech")
Fig. 6.10 ERLE factor when the echo signal transmission path is simulated by a
Chebyshev-type IIR filter (ERLE in dB versus number of samples; curves b, d, e,
f, h and "speech")
performance of the adaptive digital filter point to the possibility of on-line
generation of optimal excitation signals, taking as the synthesis criterion the
class of functionals used in the field of optimal experiment planning. This
approach was analyzed in detail in Chap. 4.
Such an approach results in a novel algorithm for FIR adaptive filtering, which
can be successfully applied to adaptive echo cancellation. The algorithm is
based on the use of optimal input sequences. Compared to the conventional FIR
filter excited by white Gaussian noise, this solution proves better if the
filter order is correctly chosen, since a higher ERLE factor is achieved and the
normalized estimation error is lower (Chap. 4).
4. For the case of echo cancellation in speech signal transmission, we analyzed
   the adjustment of the frequency range of the training signal to the given
   parameters of the corresponding communication channel, as well as the
   influence of the statistical-correlation properties of the training
   sequences. The experimental results show that, regardless of the type of
   filter used to simulate the telephone line, echo cancellation depends on the
   characteristics of the excitation signal used for adaptive filter training.
If the adaptive filter is trained using excitation signals with a frequency
range wider than the bandwidth of the simulated line, the ERLE factor is
significantly larger than when the excitation signal has a narrower frequency
range. A decrease of the frequency range of the excitation signal below the
frequency range of the simulated line causes a significant drop of the ERLE
factor.
After training the adaptive filter on a speech signal, the ERLE factor is far
lower than one could intuitively expect.
These experimental results show that a carefully chosen signal for the training
of the echo cancellation block (adaptive digital filter) has a significant
influence on efficient echo cancellation. There is an obvious advantage of a
spectrally flat signal, such as Gaussian (white) noise, over colored signals
like the speech signal.
5. The results presented in Chap. 5 clearly point to the possibility of
   robustification of the standard least squares algorithm, with the goal of
   decreasing or removing the influence of the impulse component of additive
   noise in the procedure of identification of the echo signal transmission
   path. The use of the robust recursive least squares algorithm was proposed in
   Chap. 5, where the criterion function is chosen to give more weight to the
   majority of small residuals than to the smaller portion of large residuals.
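The weighting idea described above can be illustrated with a minimal sketch using Huber's classic score (influence) function, given here as one example of such a robust choice; the threshold delta is an assumed tuning constant, not a value from the text.

```python
def huber_score(e: float, delta: float = 1.5) -> float:
    """psi(e): linear (least-squares-like) for small residuals, clipped
    to +/- delta for large ones, limiting the influence of outliers."""
    if abs(e) <= delta:
        return e
    return delta if e > 0 else -delta

# Small residuals pass through unchanged; a gross outlier is clipped.
print(huber_score(0.3))    # -> 0.3
print(huber_score(25.0))   # -> 1.5
```

Replacing the raw residual by such a clipped score in the update equations is what keeps impulse disturbances from dominating the parameter estimates.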
A comparison of the proposed robust RRLS algorithm (Chap. 5) with the
conventional RLS algorithm (Chap. 2) in the simulation of local echo
cancellation shows that the RRLS algorithm retains the necessary efficiency
under conditions of pure Gaussian noise. When the echo signal is additively
contaminated by a mixture of impulse and Gaussian noise, it is evident that the
non-robust RLS algorithm is very sensitive to disturbances of such a nature and
that its properties are impaired, while the impulse changes have practically no
influence on the robust RRLS algorithm.
According to the obtained results, the RRLS algorithm removes the influence of
impulse noise more efficiently than the standard RLS algorithm.
6. The robust recursive least squares algorithm with optimal input (RRLSO,
   Chap. 5) unites two main problems: estimation of the parameters of the echo
   signal transmission path under conditions when the echo signal is additively
   contaminated by a mixture of impulse and Gaussian noise, and on-line
   generation of a controlled training sequence, with the goal of improving
   estimation accuracy and convergence speed.
Compared to the RRLS algorithm, for a correctly chosen order of the adaptive
filter the RRLSO algorithm gives better results: it suppresses the impulse
component of additive noise and increases the convergence speed and the
accuracy of the estimated parameters, which manifests itself as a smaller
normalized estimation error.
7. Possible directions of future work lead to further evaluation and practical
   implementation of the solutions presented in this text. Also, the potential
   advantages of adaptive filters with infinite impulse response pose the need
   for their more intensive use. Namely, it is known that the transmission path
   of the echo signal is a rational function, but because of the stability and
   convergence properties of the algorithms it is most often modeled by a
   finite impulse response adaptive filter. This leads to increased complexity
   through the growth of the filter order, as well as through the computational
   requirements. Because of that it is necessary to analyze the possibility of
   using adaptive algorithms for IIR filters in the echo cancellation problem,
   in order to solve it even more successfully.
References
Oppenheim AV, Schafer RW, Buck JR (2011) Discrete-time signal processing. Prentice Hall, New Jersey
Mitra SK (2010) Digital signal processing: a computer-based approach. McGraw-Hill, New York
Đurović ZM, Kovacevic BD (2004) Digitalni signali i sistemi: pregled teorije i rešeni zadaci. Akademska misao, Beograd
Widrow B, Stearns SD (1985) Adaptive signal processing. Prentice Hall, New Jersey
Haykin S (2012) Adaptive filter theory. Prentice Hall, Englewood Cliffs
Cowan C, Grant P (1985) Adaptive filters. Prentice Hall, Englewood Cliffs
Gelb A (1974) Applied optimal estimation. MIT Press, Cambridge
Kovacevic B, Filipovic V (1998) Robust real-time identification of linear systems with correlated noise. Int J Control 48(3):993-1010
Sage AP, Melsa JL (1971) Estimation theory with applications to communications and control. McGraw-Hill, New York
Kovacevic BD, Đurović ZM (2008) Fundamentals of stochastic signals, systems and estimation theory with worked examples. Springer, Berlin
Bar-Shalom Y, Li XR (1998) Estimation and tracking: principles, techniques and software. CRC, Danvers
Bard Y (1974) Nonlinear parameter estimation. Academic Press, New York
Wilcox RR (1997) Introduction to robust estimation and hypothesis testing. Academic Press, New York
Huber P (1980) Robust statistics. Wiley, New York
Kovacevic DB (1984) Robust recursive system parameter identification. PhD thesis, University of Belgrade (in Serbian)
Van Trees HL (2001) Detection, estimation and modulation theory, part I-IV. Wiley, New York
Murano K, Unagami S, Amano F (1990) Echo cancellation and applications. IEEE Commun Mag 28(1):49-55
Messerschmitt DG (1984) Echo cancellation in speech and data transmission. IEEE J Sel Areas Commun SAC-2(2):283-296
Veseghi SV (2006) Advanced signal processing and digital noise reduction. Wiley, New York
Banjac Z, Veinovic M, Kovacevic B, Milosavljevic M (2002) An application of adaptive FIR filter with nonlinear optimal input design. In: 14th international conference on digital signal processing, Santorini
Banjac Z, Kovacevic B, Đurović Z, Milosavljevic M (1998) A class of algorithms for local echo cancellation using optimal input design. In: Proceedings of MELECON 98, vol I, Tel-Aviv, Israel, pp 1376-1379
Haykin S (1998) Neural networks: a comprehensive foundation. Prentice Hall, Upper Saddle River
Fan H, Jenkins W (1988) An investigation of an adaptive IIR echo canceller: advantages and problems. IEEE Trans Acoust Speech Sig Process 36(12):1819-1834
Ljung L, Soderstrom T (1983) Theory and practice of recursive identification. MIT Press, Cambridge
Ljung L, Soderstrom T (1987) System identification: theory for the user. Prentice-Hall, Englewood Cliffs
Solo V, Kong X (1999) Adaptive signal processing algorithms: stability and performance. Prentice Hall, New Jersey
Widrow B (1976) Stationary and nonstationary learning characteristics of LMS adaptive filters. Proc IEEE 64(8):1151
Tsypkin YZ (1984) Foundations of informational theory of identification. Nauka, Moscow
Sayed AH (2003) Fundamentals of adaptive filtering. Wiley, NJ
Uzunovic P, Banjac Z, Kovacevic B (2004) Adaptive IIR filtering: advances and problems. In: Proceedings of the international conference on telecommunications, TELFOR, Belgrade
Cho YS, Kim SB, Powers EJ (1991) Time-varying spectral estimation using AR models with variable forgetting factors. IEEE Trans Sig Process 39(6):1422-1426
Fortescue TR, Kershenbaum LS, Ydstie BE (1981) Implementation of self-tuning regulators with variable forgetting factors. Automatica 17(6):831-835
Delgado JC, Tribolet JM (1984) Analog full-duplex speech scrambling systems. IEEE J Sel Areas Commun SAC-2(3):456-489
Hampel F (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69:383
Chambers J, Tanrikulu O, Constantinides AG (1994) Least mean mixed-norm adaptive filtering. Electron Lett 30:1574-1575
Eleftheriou E, Falconer DD (1986) Tracking properties and steady-state performance of RLS adaptive filter algorithms. IEEE Trans Acoust Speech Sig Process 34:1097-1109
Peters SD, Antoniou A (1995) A parallel adaptation algorithm for recursive least squares adaptive filters in nonstationary environments. IEEE Trans Sig Process 43(11):2484-2494
Willems JL (1970) Stability theory of dynamical systems. Nelson, Walton-on-Thames
Glentis GO, Berberidis K, Theodoridis S (1999) Efficient least squares adaptive algorithms for FIR transversal filtering. IEEE Sig Process Mag 16(4):13-41
Mareels IMY, Bitmead RR, Gevers M, Johnson CR, Kosut RL, Poubelle MA (1987) How exciting can a signal really be? Syst Control Lett 8:197-204
Goodwin GC, Payne RL (1977) Dynamic system identification: experiment design and data analysis. Academic Press, New York
Goodwin GC (1987) Identification: experiment design. In: Singh M (ed) Encyclopaedia of systems and control. Pergamon Press, Oxford, pp 2257-2264
Tsypkin YZ (1983) Optimality in identification of linear plants. Int J Syst Sci 14(1):59-74
Barkat M (2005) Signal detection and estimation. Artech House, London
Kovacevic B, Đurović Z, Filipovic V (1996) Robust system identification using optimal input signal design. J Autom Control Univ Belgrade 6(1):19-29
Soderstrom T (1973) An on-line algorithm for approximate maximum likelihood identification of linear dynamic systems. Rep 7308, Lund Institute of Technology
Goodwin GC, Sin KS (1984) Adaptive filtering, prediction and control. Prentice Hall, New Jersey
Poularikas AD, Ramadan ZM (2006) Adaptive filtering primer with MATLAB. CRC Press, Taylor and Francis, Boca Raton
Banjac Z, Kovacevic BD, Milosavljevic MM, Veinovic M (2002) An adaptive FIR filter for echo cancelling using least squares with nonlinear input design. Control Intell Syst 30(1):27-31
Banjac Z, Kovacevic BD, Milosavljevic MM, Veinovic M (2002) Local echo canceller with optimal input for true full-duplex speech scrambling system. IEEE Trans Sig Process 50(5):1877-1882
Becker H, Piper F (1982) Cipher systems: the protection of communications. Northwood Books, London
Cox RV, Tribolet JM (1983) Analog voice privacy systems using TFSP scrambling. Bell Syst Tech J 62:47-61
Park S, Hillman G (1989) On acoustic echo cancellation implementation with multiple cascadable adaptive FIR filter chips. Proc ICASSP 2:952-955
Kovacevic B, Milosavljevic M, Veinovic M, Markovic M (2004) Robust digital speech processing. Academic Mind, Belgrade (in Serbian)
Chambers J, Avlonitis A (1997) A robust mixed-norm adaptive filter algorithm. IEEE Sig Process Lett 4(2):46-48
Astrom K (1980) Maximum likelihood and prediction error methods. Automatica 16:551
Barnett V, Lewis T (1978) Outliers in statistical data. Wiley, New York
Williamson GA, Clarkson PM, Sethares WA (1993) Performance characteristics of the median LMS adaptive filter. IEEE Trans Sig Process 41:667-680
Banjac Z, Kovacevic BD, Veinovic M, Milosavljevic MM (2001) Robust least mean square adaptive FIR filter algorithm. IEE Proc Vision Image Sig Process 148(5):332-336
Banjac Z, Milosavljevic M, Kovacevic B, Veinovic M (1998) Robust RLS algorithm for local echo cancellation using optimal input design. In: Proceedings of the 1st international conference on digital signal processing and its applications, DSPA-98, vol 1, Moscow, Russia, pp 225-231
Banjac Z, Kovacevic BD (2005) Robust parameter and scale factor estimation in nonstationary impulsive noise environment. In: Proceedings of the IEEE conference EUROCON 2005, Belgrade, Serbia and Montenegro
Banjac Z, Kovacevic BD, Veinovic M, Milosavljevic MM (2004) Robust adaptive filtering with variable forgetting factor. WSEAS Trans Circ Syst 3(2):223-229
Sondhi MM, Berkley DA (1980) Silencing echoes on the telephone network. Proc IEEE 68, Aug 1980
Tao YG, Kolwicz K, Gritton CWK, Duttweiler DL (1984) A cascadable VLSI echo canceller. IEEE J Sel Areas Commun SAC-2(2):297-303
Yasukawa H, Furukawa I, Ishiyama Y (1989) Acoustic echo control for high quality audio teleconferencing. In: Proceedings of ICASSP 89, vol 2, Glasgow, Scotland, pp 2041-2044, 23-26 May 1989
Tanrikulu O, Baykal B, Constantinides AG, Chambers JA (1997) Residual echo signal in critically sampled subband acoustic echo cancellers based on IIR and FIR filter banks. IEEE Trans Sig Process 45:901-912
Kuo M, Pan Z (1994) Development and analyses of distributed acoustic echo cancellation microphone system. Sig Process 37(3)
Verhoeckx NAM, van den Elzen HC, Snijders FAM, van Gerwen PJ (1979) Digital echo cancellation for baseband data transmission. IEEE Trans Acoust Speech Sig Process ASSP-27(6)
Falconer D, Mueller KH (1979) Adaptive echo cancellation/AGC structures for two-wire, full-duplex data transmission. Bell Syst Tech J 58(7)
Yip PCW, Etter DM (1990) An adaptive multiple echo canceller for slowly time-varying echo paths. IEEE Trans Commun 38(10)
Medvecky M (1996) Modified NLMS algorithm for acoustic echo cancellation. In: Proceedings IWISP 96, Manchester, UK, pp 683-686, 4-7 Nov 1996
Chen J, Vandewalle J (1989) Study of nonlinear echo canceller with Volterra expansion. In: Proceedings of ICASSP 89, vol 2, Glasgow, Scotland, pp 1376-1379, 23-26 May 1989
Moon TK, Stirling WC (2000) Mathematical methods and algorithms in signal processing. Prentice-Hall, NJ
Index
E
Echo, 187–189
Echo canceller (EC), 131–134
Echo return loss enhancement (ERLE), 136, 137, 139, 198–203
Efficiently robust, 152
Equation error (EE), 60–63, 67
Error signal, 31, 37, 46, 49, 54, 59, 66
Estimation, 78, 88, 89, 91, 94, 98, 100, 109, 112, 115–117, 121–124, 129–131, 135–146, 149–154, 164, 165, 175–176, 179, 194, 197–198, 201–204

G
Gauss–Newton, 63
Global minimum, 33, 39, 59, 61
Gradient, 39–48, 65, 67–72, 118, 119, 121, 123, 130, 149, 150, 153, 163, 171, 202
  estimation, 37

H
Hessian, 64, 65, 118, 119, 122, 123
Huber, 152, 156, 163, 178
I
Identification, 110, 123, 133, 140, 147–150, 158, 180, 184
Impulse interference, 139, 149, 152, 163
Impulse noise, 139, 148, 149, 151, 152, 154, 155, 157–162, 166, 168, 170, 177–183
Infinite impulse response (IIR), 3, 29, 32–39, 59–66, 71, 84, 109, 134, 197, 200–204
Influence function, 152, 155, 157, 163, 178, 183
Iteration, 91, 101, 109, 110, 136–139, 154, 176, 182, 185

K
Kalman filter, 16, 17, 21–26
Kramer–Rao, 27

L
Lag error, 87
Laplace distribution, 151
Laplace transform, 7, 10, 14
Least absolute deviation (LAD), 151, 158
Least mean square (LMS), 46–49, 53, 62, 67, 109, 148–154, 157–162, 202
Least squares algorithm, 48
Line echo, 133, 193, 195
Local echo, 131, 132, 134, 136, 188, 193–195, 197, 204
Lyapunov, 115, 124, 128–131, 155

M
Mathematical expectation, 84, 88, 93, 117, 118, 122, 126, 127, 136, 163, 171
Maximal likelihood, 147, 150, 151, 156, 161, 170
Mean-square criterion, 117
Mean square error (MSE), 10–13, 27, 37, 39–44, 47, 62, 64, 67, 72, 88, 90–92, 109, 111, 117, 122, 123, 131–134, 162, 197
Median, 77, 78, 149, 153, 178, 182
Median least mean square (MLMS), 149, 154, 158–162
Median of absolute deviation (MAD), 153, 154, 178
M-estimator, 150, 152, 170
Mixed normal distribution, 151

N
Newton–Raphson, 163, 164, 171
Newton's method, 135, 146, 168, 169
Noise variance, 77, 179, 180
Nonstationarity, 75–77, 87, 88, 91–93, 101–103, 145, 183
Nonstationary environments, 75, 78, 87, 139, 180
Normal distribution, 134
Normalized estimation error (NEE), 135, 136, 139, 140, 145, 146, 158, 169, 170, 179
Normalized information matrix, 112, 113
Normalized least mean square algorithm (NLMS), 51, 198

O
Optimal filters, 9
Optimal input, 112–115, 132, 134, 136, 138, 140, 146
Optimal output, 109, 110, 117, 133, 134, 150, 166–169, 188
Ordinary differential equations (ODE), 115, 124
Outlier, 147–149, 151, 158, 162, 182, 184–186
Output error (OE), 60, 62, 66–68, 71

P
Parallel adaptation (PA), 92, 96, 101, 103, 104, 106, 108, 140, 181, 182
Parallel adaptation recursive least square (PA-RLS), 86, 92, 140–146, 181, 182, 185, 186
Parameter
  identification, 101, 109, 123, 148, 180
  update, 33, 73
  vector, 37–40, 42, 45–47, 52, 53, 60, 62–66, 68, 69, 110–118, 126, 131–133, 145, 149, 154, 162, 163, 168, 171, 174, 183
Prediction error, 76, 77, 85, 92, 98, 105, 111, 117, 118, 121, 124, 130
Probability density, 38, 97, 150, 152, 156, 157
Pseudo-linear regression (PLR), 72
Pseudo-random binary sequence (PRBS), 135

R
Random disturbance, 147
Random walk (RW), 87–88
S
Scaling factor, 117, 123, 153, 156, 157, 170, 172, 174–181
Scrambler, 131, 133
Signal to noise ratio (SNR), 77, 83, 103, 104, 134–138, 140–146, 168, 177
Steepest descent method, 42–47
Structures of digital filters, 31

Z
Zeroes
  filter, 32, 34–36
  polynomial, 4