
Branko Kovacevic · Zoran Banjac · Milan Milosavljevic

Adaptive Digital Filters

Academic Mind, University of Belgrade - School of Electrical Engineering
University Singidunum, Belgrade
Springer-Verlag Berlin Heidelberg
2013

Branko Kovacevic
University of Belgrade
Belgrade, Serbia

Milan Milosavljevic
University of Belgrade and Singidunum University
Belgrade, Serbia

Zoran Banjac
School of Electrical and Computing Engineering of Applied Studies
Belgrade, Serbia
ISBN 978-3-642-33560-0 ISBN 978-3-642-33561-7 (eBook)


DOI 10.1007/978-3-642-33561-7
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2013935402


ISBN of Academic Mind: 978-86-7466-434-6

© Academic Mind Belgrade and Springer-Verlag Berlin Heidelberg 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts
in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being
entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication
of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher's location, in its current version, and permission for use must always be obtained from Springer
and Academic Mind. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Academic Mind is the official publisher of the University of Belgrade, School of Electrical Engineering


(www.akademska-misao.rs)
Springer is part of Springer Science+Business Media (www.springer.com)
Preface

The book Adaptive Digital Filters appeared as a result of years of cooperation
between the Department for Signal Processing within the Institute of Applied
Mathematics and Electronics, Belgrade, Serbia, and the Division for Automation
that later evolved into the present Department for Signals and Systems within the
School of Electrical Engineering, University of Belgrade. This cooperation started
back in the mid-1970s, with the goal of researching the phenomenon of speech, and
has continued up to the present day. Among the important results of joint scientific research
in the fields of modeling, analysis, processing, recognition, and transmission of
speech signals are, besides mathematical algorithms, program packages, technical
solutions, and electronic instruments, also numerous research papers, either published
in leading international scientific journals or presented in the proceedings of
prestigious international scientific conferences. Since the very inception of their
cooperation, the Institute and the School introduced a custom of informing a wider
circle of domestic researchers and experts, as well as students of mathematics,
electrical engineering, computing, and related areas, about the most important
results of their joint projects. This goal was usually achieved by publishing scientific
monographs in the Serbian language, and this was the way two such monographs
appeared: "Speech Signal Processing and Recognition" (a group of authors from the
Institute and the School, published by the Center of High Military Schools, Belgrade,
1993) and "Robust Digital Processing of Speech Signals" (published by
Academic Mind, Belgrade, 2000). It is important to mention here that the quoted
publications were preceded by a number of master's theses and doctoral dissertations.
In this way, this book represents a continuation of the good practice introduced
by the Institute and the School and presents the results of their joint research
within the past decade. The youngest author of this book, Dr. Zoran Banjac,
finished his master's thesis and doctoral dissertation in the course of his work
within these projects. The aim and the stance of this book were perhaps best defined
by its reviewers, Professors Ljiljana Milić and Dušan Drajić of the School of
Electrical Engineering, University of Belgrade, who wrote in the conclusion of
their review: "The monograph Adaptive Digital Filters presents to our
scientific and expert community an important discipline which has been
underrepresented prior to the appearance of this book. The book first makes the reader
acquainted with the basic terms of filtering and adaptive filtering, to further
introduce the reader into the field of advanced modern algorithms, some of which
represent a contribution of the authors of the book. Work in the field of
adaptive signal processing requires the use of a complex mathematical apparatus.
The manner of exposition in this book presumes a detailed presentation of the
mathematical models, a task done by the authors in a clear and consistent way. The
chosen approach enables everyone with a college level of mathematical knowledge
to successfully follow the mathematical derivations and descriptions of the algorithms
in the book. The algorithms are presented by flow charts, which facilitates their
practical implementation. The book gives many experimental results and treats the
aspects of practical application of adaptive filtering in real systems. The book will
be useful both to students of undergraduate and graduate studies, and to all of those
who did not have an opportunity to master this important scientific field during their
formal education."
The authors would like to express their gratitude to the reviewers for their useful
suggestions and advice, which contributed significantly to the quality of the book.
The text of the book is divided into six chapters.
The first, introductory chapter gives a general consideration of the three most often used
theoretical approaches to the design of linear filters: the conventional approach,
optimal filtering, and adaptive filtering. The rest of the text analyzes only the
third approach, i.e., adaptive filtering.
Chapter 2 presents the basic structures of adaptive filters. It also considers the
criterion function for the optimization of the parameters of adaptive filters and
analyzes the two basic numerical methods for the determination of the minimum
of the criterion function: the Newton method and the method of steepest descent.
After presenting the basic concept of adaptive filtering, it overviews the standard
and the derived adaptive algorithms of the Least Mean Square (LMS) type and the
Recursive Least Squares (RLS) type, for the sake of further analysis and
estimation of the possibilities to modify them in order to improve their characteristics.
Also, the potential advantages of Infinite Impulse Response (IIR) filters impose a
need for their more intensive use, as well as for an analysis of how solutions
proposed for Finite Impulse Response (FIR) systems can be adapted to IIR systems.
This is the reason why the second chapter devotes attention to this problem as well.
An analysis of the ability of adaptive algorithms to follow nonstationary
changes in the system, together with the synthesis of efficient algorithms based on
a variable forgetting factor, is presented in Chap. 3. A comparison has been made
among a number of strategies for the choice of the forgetting factor (extended
prediction error, parallel adaptation, and the Fortescue-Kershenbaum-Ydstie
algorithm) against their ability to follow nonstationary changes and the complexity
of the implementation of the algorithms. The strategies for the choice of a variable
forgetting factor most convenient from the practical point of view are emphasized.
Chapter 4 presents an original approach to the design of an FIR-type adaptive
algorithm with the goal of increasing the convergence speed of the parameter
estimation process. The approach is based on an optimal construction of the input
signal, which belongs to the D-class of optimal experiment planning. The properties
of the proposed algorithm were subsequently analyzed through the practical problem
of local echo cancellation in scrambling systems. Besides that, the possibility has
been shown of applying this approach in nonstationary environments through the
application of a convenient strategy for the choice of a variable forgetting factor.
Robustification of adaptive algorithms against impulsive nonstationary noise in
the desired response is considered in Chap. 5. An original robust algorithm based
on the LMS approach is presented, and an analysis is given of the possibility of
applying the D-optimal input to the robust recursive least squares algorithm in order
to improve the convergence speed. After that, a robust RLS algorithm with recursive
estimation of the scaling factor is introduced for the case when, besides the
impulsive noise, sudden changes of the system dynamics also occur. Besides that,
another novel algorithm is presented which, besides its robust properties against
impulse noise, also has the ability to track nonstationary changes of the values of
the estimated parameters. Contrary to the previous algorithms, this approach
is based on the design of a robust detector of impulse noise, relying on a
robust median filter, and on the application of either a robust or a standard RLS-type
procedure for the estimation of the filter parameters, depending on the detection result.
Chapter 6 is dedicated to the analysis of the possibility of applying the proposed
adaptive digital filters to signal echo cancellation in telecommunication networks.
Let us note at the end that the algorithms and solutions considered in this book
may find application in the wider area of adaptive signal processing, such as adaptive
echo cancellation, adaptive noise cancellation, and adaptive equalization, as
well as in the processing of signals of various physical natures (speech and image
signals, biomedical signals, and signals from radars, sonars, satellites, and other
intelligent sensors).

Belgrade, May 2012 The Authors


Contents

1 Introduction . . . 1
   1.1 Conventional Approach to the Design of Digital Filters . . . 1
   1.2 Optimal Filters . . . 9
       1.2.1 Wiener Filter . . . 9
       1.2.2 Kalman Filter . . . 16
   1.3 Adaptive Filters . . . 27

2 Adaptive Filtering . . . 31
   2.1 Introduction . . . 31
   2.2 Structures of Digital Filters . . . 31
       2.2.1 Filters with Infinite Impulse Response (IIR Filters) . . . 32
       2.2.2 Filters with Finite Impulse Response (FIR Filters) . . . 34
   2.3 Criterion Function for the Estimation of FIR Filter Parameters . . . 36
       2.3.1 Mean Square Error (Risk) Criterion: MSE Criterion . . . 37
       2.3.2 Minimization of the Criterion of Mean Square Error (Risk) . . . 39
   2.4 Adaptive Algorithms for the Estimation of Parameters of FIR Filters . . . 45
       2.4.1 Least Mean Square (LMS) Algorithm . . . 46
       2.4.2 Least Squares Algorithm (LS Algorithm) . . . 49
       2.4.3 Recursive Least Squares (RLS) Algorithm . . . 51
       2.4.4 Weighted Recursive Least Squares (WRLS) Algorithm with Exponential Forgetting Factor . . . 53
   2.5 Adaptive Algorithms for the Estimation of the Parameters of IIR Filters . . . 59
       2.5.1 Recursive Prediction Error Algorithm (RPE Algorithm) . . . 67
       2.5.2 Pseudo-Linear Regression (PLR) Algorithm . . . 72

3 Finite Impulse Response Adaptive Filters with Variable Forgetting Factor . . . 75
   3.1 Choice of Variable Forgetting Factor . . . 75
       3.1.1 Choice of Forgetting Factor Based on the Extended Prediction Error . . . 76
       3.1.2 Fortescue-Kershenbaum-Ydstie Algorithm . . . 78
       3.1.3 Parallel Adaptation Algorithm (PA-RLS Algorithm) . . . 86
       3.1.4 Generalized Weighted Least Squares Algorithm with Variable Forgetting Factor . . . 93
       3.1.5 Modified Generalized Likelihood Ratio: MGLR Algorithm . . . 96
   3.2 Experimental Analysis . . . 101
       3.2.1 Comparative Analysis of Recursive Algorithms for the Estimation of Variable Forgetting Factor (Analysis of RLS Algorithm with EGP, FKY and PA Strategy for the Calculation of Variable Forgetting Factor) . . . 101

4 Finite Impulse Response Adaptive Filters with Increased Convergence Speed . . . 109
   4.1 Definition of the Parameter Identification Problem . . . 110
   4.2 Finite Impulse Response Adaptive Filters with Optimal Input . . . 112
   4.3 Convergence Analysis of Adaptive Algorithms . . . 115
   4.4 Application of Recursive Least Squares Algorithm with Optimal Input for Local Echo Cancellation in Scrambling Systems . . . 131
       4.4.1 Definition of the Local Echo Cancellation Problem in Scrambling Systems . . . 133
       4.4.2 Experimental Analysis . . . 134
   4.5 Application of Variable Forgetting Factor to Finite Impulse Response Adaptive Filter with Optimal Input . . . 139

5 Robustification of Finite Impulse Response Adaptive Filters . . . 147
   5.1 Robust Least Mean Square Algorithm . . . 149
       5.1.1 Robustification of Least Mean Square Algorithm: Robust LMS Algorithm . . . 152
       5.1.2 Stability Analysis of Robust Estimators . . . 155
       5.1.3 Simulation-Based Experimental Analysis . . . 158
   5.2 Robust Recursive Least Squares Algorithm with Optimal Input . . . 162
       5.2.1 Experimental Analysis . . . 168
   5.3 Adaptive Estimation of the Scaling Factor in Robust Algorithms . . . 170
       5.3.1 Experimental Analysis . . . 177
   5.4 Robust Recursive Least Squares Algorithm with Variable Forgetting Factor and with Detection of Impulse Noise . . . 180
       5.4.1 Experimental Analysis . . . 184

6 Application of Adaptive Digital Filters for Echo Cancellation in Telecommunication Networks . . . 187
   6.1 Echo: Causes and Origins . . . 189
       6.1.1 Echo in Speech Transmission . . . 189
       6.1.2 Acoustic Echo . . . 191
       6.1.3 Echo in Data Transfer . . . 192
       6.1.4 Basic Principles of Adaptive Echo Cancellation . . . 193
   6.2 Mathematical Model of an Echo Cancellation System . . . 196
   6.3 Analysis of the Influence of Excitation Signal on the Performance of Echo Cancellation System for Speech Signal Transmission . . . 197

References . . . 205

Index . . . 209
Abbreviations

AECM Asymptotic error covariance matrix


APE Adaptive echo suppression
BP Band pass filter
BS Band stop filter
EC Echo canceller
EE Equation error
EGP Extended prediction error
EPE Extended prediction error
ERLE Echo return loss enhancement
FF Forgetting factor
FIR Finite impulse response
FKY Fortescue-Kershenbaum-Ydstie
HP High pass filter
IIR Infinite impulse response
LAD Least absolute deviation
LMS Least mean square
LP Low pass filter
LS Least square
MAD Median absolute deviation
MIMO Multiple input-multiple output
ML Maximum likelihood
MGLR Modified generalized likelihood ratio
MLMS Median least mean square
MSE Mean square error
NEE Normalized estimation error
NLMS Normalized least mean square
ODE Ordinary differential equation
OE Output error
PA-RLS Parallel adaptation recursive least square
PLR Pseudo linear regression
RLMS Robust least mean square
RLS Recursive least square
RMN Robust mixed norm


RPE Recursive prediction error


RRLS Robust recursive least square
RRLSO RRLS algorithm with optimal input
RW Random walk
SISO Single input-single output
SNR Signal to noise ratio
VFF Variable forgetting factor
WRLS Weighted recursive least square
Chapter 1
Introduction

Electrical filters find numerous practical applications, often for very different
purposes. One of the basic applications of filters is suppression of the influence of
noise or interference, with a goal to extract useful components of the signal. A
large subgroup is linear filters, their crucial property being a linear relation
between the signals at their input and at their output. Basically, there are three
theoretical approaches to the design of such filters: (1) the conventional approach;
(2) optimal filtering, i.e. the so-called Wiener or Kalman filtering; and (3) self-adjusting
filters, i.e. adaptive filtering.
Although this book is mostly dedicated to the third approach to the design of
linear filters, i.e. to the synthesis of self-adjusting or adaptive digital filtering, for
the sake of completeness the introductory chapter will consider the other two
approaches to the design of such systems.

1.1 Conventional Approach to the Design of Digital Filters

The conventional approach is based on the design of frequency selective filters


which require prior knowledge of the spectral contents of the filtered signal.
Strictly speaking, the often used term "frequency-selective filter" suggests that
these systems amplify frequency components from a given range, while suppressing
the components from all other ranges. However, in the general case, all
dynamical systems that modify signals in certain frequency ranges can be
understood as filters [1-3]. A linear filter is basically a linear dynamic system with
constant parameters, and the design itself reduces to the choice of the values of
these parameters so that the system satisfies the predefined requirements regarding
the bandwidth, amplification within the bandwidth, attenuation of frequency
components outside the bandwidth, resonant frequencies, etc.
Conventional filter design implies the following steps: (1) specifications of the
required properties of a linear, time-invariant dynamic system; (2) approximation
of these specifications, utilizing the properties of causal discrete systems (the
response or output of such systems is equal to zero prior to switching on the input


or excitation signal); (3) system realization. Although these three steps are not
independent, special care is dedicated in the literature to the second step, since the
first step is primarily dependent on the field of application of the filter, while the
third step is connected with the existing implementation technology. It is inter-
esting to note that digital filters are often used to process signals obtained from
continuous signals utilizing analog-digital (A/D) converters. When a digital filter
is used to process analog signals, the specification of both the digital filter and the
effective continual filter (whose digital approximation is designed) is given in the
frequency domain. This is primarily valid for frequency-selective filters, like the
low-pass (LP), band-pass (BP) and high-pass (HP) filters. If the sampling
(discretization) period T is sufficiently short, no overlap will occur between the
frequency components from different periods in the periodic frequency characteristic
(spectrum) of the discretized continual filter (the aliasing effect). Thus in
the Nyquist range $|\Omega| < \pi/T$, where $\Omega$ is the analog angular frequency, the digital filter
will behave virtually identically to the desired continual filter with the frequency
response

$$H_{\mathrm{eff}}(j\Omega) = \begin{cases} H(e^{j\Omega T}), & |\Omega| < \pi/T \\ 0, & |\Omega| \ge \pi/T \end{cases} \qquad (1.1)$$

In this case the specifications that may be posed for the effective continual filter
may also be posed as requirements for the digital filter, by introducing the substitution
$\omega = \Omega T$ ($\omega$ is denoted as the digital, and $\Omega$ as the analog angular frequency); i.e.,
$H(e^{j\omega})$ becomes the specification of the digital filter within one (basic) period of the
infinite and periodic frequency response of the digital filter (which represents a
periodic function of the argument $\omega$ with a period $2\pi$):

$$H(e^{j\omega}) = H_{\mathrm{eff}}\!\left(j\frac{\omega}{T}\right), \qquad |\omega| < \pi \qquad (1.2)$$
A typical characteristic of a digital filter is shown in Fig. 1.1, plotted for the
normalized (digital) frequency $0 \le \omega \le \pi$.
Fig. 1.1 Specification of the requested amplitude-frequency characteristic of an LP filter (passband edge $\omega_p$, transition range, stopband edge $\omega_s$, tolerances $\delta_1$ and $\delta_2$)

In order to satisfy the posed requirements, the real characteristic must fulfill
the following:

$$1 - \delta_1 \le |H(e^{j\omega})| \le 1 + \delta_1, \qquad |\omega| \le \omega_p, \quad \omega_p = \Omega_p T \qquad (1.3)$$

$$|H(e^{j\omega})| \le \delta_2, \qquad \omega_s \le \omega \le \pi, \quad \omega_s = \Omega_s T \qquad (1.4)$$
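As a practical aside, a candidate design can be checked against the template (1.3) and (1.4) numerically; a minimal sketch in Python/scipy follows, where the filter order, band edges and tolerances are assumed example values, not taken from the text:

```python
import numpy as np
from scipy import signal

delta1, delta2 = 0.05, 0.01                   # assumed ripple tolerances
omega_p, omega_s = 0.3 * np.pi, 0.5 * np.pi   # assumed band edges (rad/sample)

b, a = signal.butter(8, 0.35)                 # candidate digital LP filter

w, H = signal.freqz(b, a, worN=2048)          # frequency response on [0, pi)
mag = np.abs(H)

# Passband condition (1.3) and stopband condition (1.4)
passband_ok = np.all(np.abs(mag[w <= omega_p] - 1) <= delta1)
stopband_ok = np.all(mag[w >= omega_s] <= delta2)
print(passband_ok, stopband_ok)   # whether this candidate meets the template
```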

Many practically used filters are specified in the presented manner, without any
limitation posed to their phase characteristics, stability (a system is stable if its
output in a steady or equilibrium state which commences after its transient state
vanishes is dictated by its excitation only) and causality (a causal system has a
property that its output signal or response is equal to zero before the input or
excitation signal is brought to it). At the same time it is known that an Infinite
Impulse Response (IIR) filter must be causal and stable, while the basic
assumption for a Finite Impulse Response (FIR) filter is that its phase is linear,
since such a filter by itself represents a stable and causal system. In any case,
according to the specified requirements one should choose a filter with frequency
characteristics satisfying the posed limits, which represents a problem of functional
approximation (Fig. 1.2).
The traditional approach to the design of IIR filters is based on the transformation
of the transfer function of a continuous filter satisfying given requirements into a
corresponding discrete transfer function [1, 3]. This approach stems from the
following:
• Procedures for the design of continuous filters are well researched, numerous,
and they furnish good results;
• Many methods for continuous filter design are very simple, and their transformation
to the digital domain significantly simplifies the numerical procedure of
digital filter design;
• Many simple methods used for the design of continuous filters do not furnish
simple solutions in closed form if directly applied to digital filter design.
The Butterworth filter and the Chebyshev filter are most often used for the
design of continuous filters.
The Butterworth LP filter is designed to satisfy the requirement of a maximally
flat amplitude-frequency characteristic in the passband. For an N-th order
filter this means that the first $2N - 1$ derivatives of the squared amplitude
characteristic at zero frequency should be equal to zero.

Fig. 1.2 Satisfactory amplitude-frequency characteristic of an LP filter

Another important property of the Butterworth filter is the monotonicity of the
amplitude characteristic both in the passband and in the stopband. The squared
amplitude characteristic of a continual (analog) Butterworth filter is defined in the
following manner [1, 3]:

$$|H_c(j\Omega)|^2 = \frac{1}{1 + (j\Omega/j\Omega_c)^{2N}} \qquad (1.5)$$
Some of these characteristics for different values of the parameter N (filter
order) are shown in Fig. 1.3.
With an increasing parameter N, the filter characteristic becomes sharper and
the transition from the passband to the stopband becomes steeper. In other words,
the characteristic is flatter in the passband, and in the stopband it is closer to zero if
the filter order N is higher. According to the squared amplitude characteristic, one
may conclude that
$$H_c(s)H_c(-s) = \frac{1}{1 + \left(\dfrac{s}{j\Omega_c}\right)^{2N}} \qquad (1.6)$$

The zeroes of the polynomial in the denominator (the so-called poles of the
filter) in the last expression are

$$s_k = (-1)^{1/2N} j\Omega_c = \Omega_c\, e^{j\frac{\pi}{2N}(2k + N + 1)}, \qquad k = 0, 1, \ldots, 2N - 1 \qquad (1.7)$$


Thus these 2N zeroes are located on a circle of radius $\Omega_c$ in the s (complex
frequency) plane. The poles of this complex function are therefore symmetrically
distributed around the imaginary axis, and none of them can ever lie on the
imaginary axis. The angular distance between them is $\pi/N$. This means that one
half of them, those located in the left half of the s-plane (the stable poles),
should be associated with the function $H_c(s)$, while the other half should be

Fig. 1.3 Amplitude-frequency characteristics of Butterworth filters of N-th order, N = 2, 4 and 8

associated with the function $H_c(-s)$. Thus the stable transfer function of a
Butterworth filter can be written in the form

$$H_c(s) = \frac{K}{(s - s_1)\cdots(s - s_N)}, \qquad (1.8)$$

where the poles $s_i$ lie on the circle of radius $\Omega_c$ in the left half of the
s-plane, while the amplification K is calculated from the condition of unit
amplification at zero frequency:

$$H_c(0) = 1 \;\Rightarrow\; K = (-1)^N \prod_{i=1}^{N} s_i \qquad (1.9)$$
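As a numerical sketch of (1.7)-(1.9): generate the 2N zeroes of (1.6), keep the stable (left half-plane) half as the poles of $H_c(s)$, compute K from (1.9), and verify that $|H_c(j\Omega_c)|^2 = 1/2$, as (1.5) implies. The order and cutoff are assumed example values.

```python
import numpy as np

N, omega_c = 4, 1.0  # assumed example: filter order and cutoff frequency

# Zeroes of the denominator of Hc(s)Hc(-s) from (1.7): 2N points on a circle
k = np.arange(2 * N)
s_k = omega_c * np.exp(1j * np.pi * (2 * k + N + 1) / (2 * N))

# Keep the N stable poles (left half-plane) for Hc(s), as in (1.8)
poles = s_k[s_k.real < 0]

# Gain from the unit-DC-gain condition (1.9): K = (-1)^N * prod(s_i)
K = np.real((-1) ** N * np.prod(poles))

# Check: |Hc(j*omega_c)|^2 should equal 1/2 (the -3 dB point implied by (1.5))
H = K / np.prod(1j * omega_c - poles)
print(abs(H) ** 2)  # ~0.5
```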

The amplitude-frequency characteristic (frequency response) of a Butterworth
filter is monotonous both in the passband and in the stopband. A much more
efficient solution, in the sense of satisfying the posed requirements with a lower-order
filter, is obtained by designing a filter whose amplitude-frequency characteristic
is nonmonotonous. In this way one arrives at Chebyshev filters, where we
discern two cases. One of them has an amplitude-frequency characteristic
that is nonmonotonous in the passband and monotonous in the stopband (Chebyshev
filter of the first kind), while the other type is characterized by an amplitude-frequency
characteristic monotonous in the passband and nonmonotonous in the stopband
(Chebyshev filter of the second kind). The squared amplitude characteristic of a
Chebyshev filter of the first kind is

$$|H_c(j\Omega)|^2 = \frac{1}{1 + \varepsilon^2 V_N^2(\Omega/\Omega_c)}, \qquad (1.10)$$
where $V_N(x)$ denotes the Chebyshev polynomial of the N-th degree

$$V_N(x) = \cos(N \arccos x) \qquad (1.11)$$

It turns out that there is a recurrence relation between Chebyshev polynomials
(Fig. 1.4)

$$V_{N+1}(x) = 2xV_N(x) - V_{N-1}(x), \qquad (1.12)$$

where $V_0(x) = 1$ and $V_1(x) = x$. Taking into account the form of the Chebyshev
polynomials, it is clear that the value of the function $|H_c(j\Omega)|^2$ varies in the range
$\left[\frac{1}{1+\varepsilon^2},\, 1\right]$ for frequencies in the range $\Omega \in [0, \Omega_c]$.
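The recurrence (1.12) can be checked directly against the closed form (1.11); a tiny sketch with an assumed test point:

```python
import numpy as np

# Chebyshev polynomials via the recurrence (1.12), checked against (1.11)
x = 0.3            # assumed test point in [-1, 1]
V = [1.0, x]       # V0(x) and V1(x)
for _ in range(8):
    V.append(2 * x * V[-1] - V[-2])

N = 5
print(V[N], np.cos(N * np.arccos(x)))  # both values are V_N(x)
```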
The poles of a Chebyshev filter are located on an ellipse in the s-plane. The
ellipse is defined by two circles whose radii are equal to the minor and the major
semi-axes of the ellipse, respectively. The length of the minor axis is $2a\Omega_c$, where

$$a = \frac{1}{2}\left(\alpha^{1/N} - \alpha^{-1/N}\right), \qquad \alpha = \varepsilon^{-1} + \sqrt{1 + \varepsilon^{-2}} \qquad (1.13)$$
2
The length of the major axis is $2b\Omega_c$, where
$$b = \frac{1}{2}\left(\alpha^{1/N} + \alpha^{-1/N}\right) \qquad (1.14)$$

Fig. 1.4 Typical graph of the squared Chebyshev function
The poles of the filter are determined by a procedure in which, in the first step,
one sketches 2N half-lines in the s-plane starting from the origin, equidistant
(each two adjacent half-lines form an angle of $\pi/N$ radians) and positioned
symmetrically with regard to the real and the imaginary axis. In other words, these
half-lines form an angle of $i\pi/N$, $i = 0, 1, \ldots, 2N-1$, with the positive part of
the real axis if N is odd, and an angle of $(i + 0.5)\pi/N$, $i = 0, 1, \ldots, 2N-1$, if N
is even. After that one determines the intersections of these half-lines with the
circle whose center is at the origin and whose radius is $a\Omega_c$. The real parts of
these intersections, which are negative, simultaneously represent the real parts of
the desired poles. The imaginary parts of the desired poles are determined from the
intersections of the corresponding half-lines with a circle centered at the origin
with radius $b\Omega_c$. The procedure of determining the fourth-order filter poles
(N = 4) is shown in Fig. 1.5. The obtained poles simultaneously lie on the ellipse
with semi-axes $a\Omega_c$ and $b\Omega_c$.
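The construction above is easy to reproduce numerically; a minimal sketch based on (1.13) and (1.14), where the real and imaginary parts of the poles come from the circles of radii $a\Omega_c$ and $b\Omega_c$ (the order, ripple factor ε and cutoff are assumed example values):

```python
import numpy as np

N, eps, omega_c = 4, 0.5, 1.0  # assumed example: order, ripple factor, cutoff

alpha = 1 / eps + np.sqrt(1 + 1 / eps**2)          # alpha in (1.13)
a = 0.5 * (alpha ** (1 / N) - alpha ** (-1 / N))   # minor semi-axis factor (1.13)
b = 0.5 * (alpha ** (1 / N) + alpha ** (-1 / N))   # major semi-axis factor (1.14)

# Half-line angles (m + 0.5)*pi/N for even N; real parts are read off the circle
# of radius a*omega_c, imaginary parts off the circle of radius b*omega_c
theta = (np.arange(N) + 0.5) * np.pi / N
poles = -a * omega_c * np.sin(theta) + 1j * b * omega_c * np.cos(theta)

print(poles)  # stable poles, lying on the ellipse with semi-axes a*omega_c, b*omega_c
```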
In that case one adopts the following form for the transfer function of the
Chebyshev filter

$$H_c(s) = \frac{K}{(s - s_1)(s - s_2)\cdots(s - s_N)}, \qquad (1.15)$$

where

$$K = \frac{\Omega_c^N}{2^{N-1}\varepsilon} \qquad (1.16)$$
A Chebyshev filter of the second kind is characterized by the fact that its
squared amplitude characteristic is given by the following expression
$$|H_c(j\Omega)|^2 = \frac{1}{1 + \varepsilon^2 V_N^2\!\left[(\Omega/\Omega_c)^{-1}\right]} \qquad (1.17)$$

Fig. 1.5 Procedure for determination of the poles of a fourth-order Chebyshev filter of the first kind
There are several ways to transform a continuous filter into a digital filter. One
of them is the method of impulse invariance, another is based on the bilinear
transformation, while the third is the so-called matching method [3].
The impulse invariance method starts from the assumption that the impulse
response of a digital filter should be equivalent to the values of the impulse
response of a continuous filter at the sampling moments

$$h(n) = T h_c(nT) \qquad (1.18)$$
However, the matching method is much more often used; it is not based on
generation of impulse response in time domain, but on the transformation of the
filter transfer function. This method starts from the assumption that the transfer
function of a continuous filter can be written as a sum of partial fractions
$$H_c(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k} \qquad (1.19)$$

The corresponding impulse response is then determined as the inverse Laplace


transform of the given transfer function [3], i.e.
$$h_c(t) = \begin{cases} \sum\limits_{k=1}^{N} A_k e^{s_k t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \qquad (1.20)$$
In that case the impulse response of the digital filter should assume the form

$$h(n) = T h_c(nT) = \sum_{k=1}^{N} T A_k e^{s_k nT} u(n) = \sum_{k=1}^{N} T A_k \left(e^{s_k T}\right)^n u(n) \qquad (1.21)$$

where $u(n)$ denotes the unit step sequence ($u(n) = 1$ for $n \ge 0$ and
$u(n) = 0$ for $n < 0$).
This further means that the transfer function of the digital filter, as the Z-transform
of its impulse response [1-3], will assume the form

$$H(z) = \sum_{k=1}^{N} \frac{T A_k}{1 - e^{s_k T} z^{-1}} \qquad (1.22)$$

The application of this method is not motivated by the preservation of the


impulse response, but by the fact that the limited spectrum of the continuous filter
will be preserved in the digital filter as well. However, if the response to step
excitation (or characteristic values related to the step response, like rise time, peak
overshoot, decay time, etc.) is especially important, one may utilize the step
invariance method [3] as a full alternative to this procedure.
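As an illustration of (1.19)-(1.22), here is a sketch that expands an analog prototype into partial fractions and maps each pole via $e^{s_k T}$; the prototype and the sampling period are assumed example values, and scipy's residue routine stands in for the manual partial-fraction step:

```python
import numpy as np
from scipy import signal

T = 0.01  # assumed sampling period

# Assumed analog prototype: 2nd-order Butterworth, cutoff 2*pi*10 rad/s
b, a = signal.butter(2, 2 * np.pi * 10, analog=True)

# Partial-fraction expansion Hc(s) = sum Ak / (s - sk), as in (1.19)
A, s_k, _ = signal.residue(b, a)

def H(z):
    # Digital transfer function (1.22): sum of T*Ak / (1 - exp(sk*T) z^-1)
    return sum(T * Ak / (1 - np.exp(sk * T) / z) for Ak, sk in zip(A, s_k))

# At low frequencies the digital response tracks the analog one
w = 2 * np.pi * 5                         # assumed test angular frequency (5 Hz)
print(H(np.exp(1j * w * T)))              # digital response at z = e^{jwT}
print(signal.freqs(b, a, [w])[1][0])      # analog response at jw, for comparison
```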
On the other hand, the bilinear transformation approach is applied if the effect
of aliasing (overlap of frequency contents from different periods of the frequency
response) should be avoided even if frequency warping is the price to pay. In that
case one simply replaces the complex variable s in the transfer function of the
continuous filter by the expression [3]

$$s = \frac{2}{T}\cdot\frac{1 - z^{-1}}{1 + z^{-1}} \qquad (1.23)$$
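A minimal sketch of the substitution (1.23); scipy's bilinear routine parameterizes it by the sampling rate $f_s = 1/T$, and the prototype and period are assumed example values:

```python
import numpy as np
from scipy import signal

T = 1e-3                                               # assumed sampling period
b, a = signal.butter(2, 2 * np.pi * 50, analog=True)   # assumed analog prototype

# Bilinear substitution (1.23): s -> (2/T)(1 - z^{-1})/(1 + z^{-1})
bz, az = signal.bilinear(b, a, fs=1 / T)

# At low frequencies the digital response matches the analog one (warping grows with w)
w = 2 * np.pi * 10
print(signal.freqs(b, a, [w])[1][0], signal.freqz(bz, az, [w * T])[1][0])
```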
Until now we discussed the design of LP filters only. If we desire to design
another type of filter (e.g. an LP filter with another cutoff frequency, or an HP, BP
or band-stop (BS) filter), it is necessary to transform the complex variable z. Thus if
the filter $H_{LP}(Z)$ is defined, all other types can be obtained in the following manner
[3]

$$H(z) = H_{LP}(Z)\big|_{Z^{-1} = G(z^{-1})}, \qquad (1.24)$$

where the function G must be rational, while the interior of the unit circle must
map again into the interior of the unit circle. Constantinides found in 1970 the most
general form of this function [3]
$$Z^{-1} = G(z^{-1}) = \pm\prod_{k=1}^{N} \frac{z^{-1} - \alpha_k}{1 - \alpha_k z^{-1}} \qquad (1.25)$$

According to this one may render the following table, which enables mapping
of the obtained transfer function for an LP filter to the transfer function of a digital
filter of the corresponding LP, HP, BP and BS type.

Filter type | Transformation | Parameters

LP | $Z^{-1} = \dfrac{z^{-1} - \alpha}{1 - \alpha z^{-1}}$ | $\alpha = \dfrac{\sin\!\left(\frac{\Theta_p - \omega_p}{2}\right)}{\sin\!\left(\frac{\Theta_p + \omega_p}{2}\right)}$

HP | $Z^{-1} = -\dfrac{z^{-1} + \alpha}{1 + \alpha z^{-1}}$ | $\alpha = -\dfrac{\cos\!\left(\frac{\Theta_p + \omega_p}{2}\right)}{\cos\!\left(\frac{\Theta_p - \omega_p}{2}\right)}$

BP | $Z^{-1} = -\dfrac{z^{-2} - \frac{2\alpha k}{k+1}z^{-1} + \frac{k-1}{k+1}}{\frac{k-1}{k+1}z^{-2} - \frac{2\alpha k}{k+1}z^{-1} + 1}$ | $\alpha = \dfrac{\cos\!\left(\frac{\omega_{p2} + \omega_{p1}}{2}\right)}{\cos\!\left(\frac{\omega_{p2} - \omega_{p1}}{2}\right)}, \quad k = \cot\!\left(\frac{\omega_{p2} - \omega_{p1}}{2}\right)\tan\!\left(\frac{\Theta_p}{2}\right)$

BS | $Z^{-1} = \dfrac{z^{-2} - \frac{2\alpha}{1+k}z^{-1} + \frac{1-k}{1+k}}{\frac{1-k}{1+k}z^{-2} - \frac{2\alpha}{1+k}z^{-1} + 1}$ | $\alpha = \dfrac{\cos\!\left(\frac{\omega_{p2} + \omega_{p1}}{2}\right)}{\cos\!\left(\frac{\omega_{p2} - \omega_{p1}}{2}\right)}, \quad k = \tan\!\left(\frac{\omega_{p2} - \omega_{p1}}{2}\right)\tan\!\left(\frac{\Theta_p}{2}\right)$

where $\Theta_p$ denotes the passband edge frequency of the original LP filter, $\omega_p$ the
corresponding edge frequency of the new LP filter (or the cutoff frequency of the
HP filter), and $\omega_{p1}$ and $\omega_{p2}$ the cutoff frequencies for the BP and BS filter types,
respectively.
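As a numerical check of the LP-to-LP entry of the table, the following sketch maps a digital prototype with passband edge $\Theta_p$ to a new edge $\omega_p$ and verifies that the new filter at $\omega_p$ equals the prototype at $\Theta_p$ (all numerical values assumed):

```python
import numpy as np
from scipy import signal

# Digital LP prototype with passband edge theta_p (assumed example values)
theta_p, omega_p = 0.3 * np.pi, 0.5 * np.pi
bz, az = signal.butter(4, theta_p / np.pi)   # scipy normalizes edges to Nyquist = 1

# LP -> LP transformation from the table above
alpha = np.sin((theta_p - omega_p) / 2) / np.sin((theta_p + omega_p) / 2)

def H_new(w):
    zi = np.exp(-1j * w)                  # z^{-1} on the unit circle
    Zi = (zi - alpha) / (1 - alpha * zi)  # mapped variable Z^{-1} = G(z^{-1})
    num = np.polyval(bz[::-1], Zi)        # evaluate H_LP as a polynomial in Z^{-1}
    den = np.polyval(az[::-1], Zi)
    return num / den

# The new passband edge omega_p maps to the prototype edge theta_p
print(abs(H_new(omega_p)), abs(signal.freqz(bz, az, [theta_p])[1][0]))
```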

1.2 Optimal Filters

1.2.1 Wiener Filter

The design of an optimal filter [4-9] assumes the use of optimization theory,
with the goal of arriving at a solution that is optimal with regard to a previously
defined criterion. These criteria are usually based on the minimization of the mean
square value of the difference between the actual output of the filter and a reference
signal or desired filter response, and the obtained structure is often denoted as the
Wiener filter [7-9].
Fig. 1.6 Block diagram of a Wiener filter

A block diagram of a Wiener optimal filter is given in Fig. 1.6. As shown in
Fig. 1.6, the information or useful signal $y(t)$ is contaminated by additive noise
$v(t)$, so that the measured signal (observation) $z(t) = y(t) + v(t)$ is passed
through a linear time-invariant filter (system) generating the output signal $\hat{y}(t)$.
This signal is compared with a reference signal (desired response) $y_i(t)$ in order to
form the error signal $\tilde{y}(t)$. The desired signal is generated by passing the noiseless
input (received) signal $y(t)$ through an ideal (desired) system. The signals $y(t)$ and
$v(t)$ are stationary stochastic processes (their statistical properties do not vary with
time) with zero mean values and given spectral power densities $S_y(s)$ and $S_v(s)$,
respectively (according to the Wiener-Khintchine theorem, the spectral power
density $S_x(s)$ and the autocorrelation function $R_x(\tau) = E\{x(t)x(t+\tau)\}$ of a
stationary stochastic process $x(t)$ with zero mean value represent a Fourier transform
pair, i.e. $S_x(s)|_{s=j\omega} = F\{R_x(\tau)\}$), and they are uncorrelated (their cross-correlation
function is $R_{yv}(\tau) = E\{y(t)v(t+\tau)\} = 0$, i.e. their cross-spectral power density
is $S_{yv}(s)|_{s=j\omega} = F\{R_{yv}(\tau)\} = 0$). The filter itself is modeled by a transfer function
$W(s)$ (the Laplace transform of the filter response to a Dirac impulse
excitation when the initial conditions in the system are equal to zero, or the ratio of
the Laplace transform of the output signal to the Laplace transform of the input
signal under zero initial conditions), so that the output signal of the filter in the
s-domain of complex frequency is given as $\hat{Y}(s) = W(s)Z(s)$. The transfer function
of the reference system is $I(s)$, and it is usually adopted that $I(s) = 1$, so that
the reference (desired) response $y_i(t)$ is equal to the noiseless input signal $y(t)$.
The problem of optimal estimation reduces to the task of choosing the transfer
function $W(s)$ of the filter so that the filter response $\hat{y}(t)$ is the best (optimal)
estimate of the reference response $y_i(t)$ with regard to some adopted criterion. As
the criterion of optimality Wiener adopted the mean or expected value of the
squared error (MSE criterion)

$$\mathrm{MSE} = E\{\tilde{y}^2(t)\} = E\{[y_i(t) - \hat{y}(t)]^2\} \qquad (1.26)$$
Since the mean value of the error signal is zero, i.e. $E\{\tilde{y}(t)\} = 0$, the expression
(1.26) also defines the variance of the error (the error is the deviation of the filter
response from the reference or desired response). The minimization of the MSE
criterion (1.26) is easier to implement in the frequency domain than in the time
domain, which is done by applying Parseval's theorem to (1.26)

$$\mathrm{MSE} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty} S_{\tilde{y}}(s)\, ds, \qquad (1.27)$$

where $S_{\tilde{y}}(s)$ is the spectral power density of the stochastic error signal $\tilde{y}(t)$, i.e.
$S_{\tilde{y}}(s) = E\{\tilde{Y}(s)\tilde{Y}(-s)\}\big|_{s=j\omega}$. The reformulation of the optimization problem (1.26)
into the frequency domain, i.e. the relation (1.27), assumes the existence of the error
spectral power density $S_{\tilde{y}}(s)$, i.e. the stationarity of the random signals. Since
according to Fig. 1.6 one has $\tilde{y}(t) = y_i(t) - \hat{y}(t)$, by applying the Laplace
transform (operator) to the previous relation one obtains

$$\tilde{Y}(s) = Y_i(s) - \hat{Y}(s), \qquad (1.28)$$

where

$$Y_i(s) = I(s)Y(s), \qquad (1.29)$$

$$\hat{Y}(s) = W(s)Z(s) = W(s)[Y(s) + V(s)] \qquad (1.30)$$

By replacing (1.29) and (1.30) into (1.28), one obtains

$$\tilde{Y}(s) = [I(s) - W(s)]Y(s) - W(s)V(s) \qquad (1.31)$$
Since according to the defining relation for the spectral power density

$$S_{\tilde{y}}(s) = E\{\tilde{Y}(s)\tilde{Y}(-s)\}, \qquad (1.32)$$

by replacing (1.31) into (1.32), one may write

$$S_{\tilde{y}}(s) = [I(s) - W(s)]S_y(s)[I(-s) - W(-s)] + W(s)S_v(s)W(-s) \qquad (1.33)$$

While deriving (1.33) it has been taken into account that $y(t)$ and $v(t)$ are
uncorrelated stochastic signals, i.e. that the cross-spectral power density is
$S_{yv}(s) = E\{Y(s)V(-s)\} = 0$. By introducing the expression (1.33) into (1.27), the
MSE criterion assumes the form

$$\mathrm{MSE} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\Big\{[I(s) - W(s)]S_y(s)[I(-s) - W(-s)] + W(s)S_v(s)W(-s)\Big\}\, ds \qquad (1.34)$$

The optimization problem under consideration now reduces to the choice of the
transfer function $W(s)$ so that the MSE criterion (1.34) is minimized. Thus posed,
the optimization problem represents a classical task of the calculus of variations [9].
Following the methodology of the variational calculus, let us denote the required
optimal transfer function by $W_o(s)$, an arbitrary fixed transfer function by $\eta(s)$,
and a scalar parameter with adjustable value by $\varepsilon$, where $W(s) = W_o(s) + \varepsilon\eta(s)$. In
this case the MSE criterion, as a function of the parameter $\varepsilon$ for different fixed
choices of the parameter $\eta$, will have the qualitative dependence shown in Fig. 1.7.
It is obvious from Fig. 1.7 that the necessary and sufficient conditions for the
minimum of the criterion are

$$\frac{\partial\,\mathrm{MSE}}{\partial\varepsilon}\bigg|_{\varepsilon=0} = 0, \qquad \frac{\partial^2\,\mathrm{MSE}}{\partial\varepsilon^2}\bigg|_{\varepsilon=0} > 0, \qquad (1.35)$$
for any variation of the parameter $\eta$. The application of the variational procedure
to the problem under consideration reduces to the following. By replacing

$$W(s) = W_o(s) + \varepsilon\eta(s) \qquad (1.36)$$

into the MSE criterion (1.34) one obtains

$$\mathrm{MSE}(\varepsilon, \eta) = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\Big\{[I(s) - W_o(s) - \varepsilon\eta(s)]S_y(s)[I(-s) - W_o(-s) - \varepsilon\eta(-s)] + [W_o(s) + \varepsilon\eta(s)]S_v(s)[W_o(-s) + \varepsilon\eta(-s)]\Big\}\, ds, \qquad (1.37)$$

Fig. 1.7 MSE criterion as a function of the parameter ε for fixed values of the parameter η

so that the first relation in (1.35) (the necessary condition for the minimum of the
criterion) reduces to

$$\frac{\partial\,\mathrm{MSE}}{\partial\varepsilon}\bigg|_{\varepsilon=0} = \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\big\{W_o(s)[S_y(s) + S_v(s)] - I(s)S_y(s)\big\}\eta(-s)\, ds + \frac{1}{2\pi j}\int_{-j\infty}^{j\infty}\eta(s)\big\{[S_y(s) + S_v(s)]W_o(-s) - S_y(s)I(-s)\big\}\, ds = 0 \qquad (1.38)$$

Bearing in mind the symmetry property of spectral power densities,
$S_y(s) = S_y(-s)$ and $S_v(s) = S_v(-s)$, we conclude that the two integrals in (1.38) are
identical, so that (1.38) reduces to the condition

$$\int_{-j\infty}^{j\infty}\big\{W_o(s)[S_y(s) + S_v(s)] - I(s)S_y(s)\big\}\eta(-s)\, ds = 0 \qquad (1.39)$$

Obviously (1.39) will be fulfilled for an arbitrary value of $\eta(s)$ if

$$W_o(s) = I(s)S_y(s)[S_y(s) + S_v(s)]^{-1} \qquad (1.40)$$
The obtained solution is denoted as the physically unrealizable Wiener filter,
since $W_o(s)$ in the general case may have poles in the right half of the s-plane. Let
us note that, since we perform frequency analysis, i.e. the Fourier transform
($s = j\omega$), the poles in the right half of the s-plane do not indicate an unstable but a
non-causal system, in which the response is generated before the excitation (such a
system is physically unrealizable). Thus, for (1.37) to be an acceptable solution, the
transfer functions $W(s)$, $W_o(s)$ and $\eta(s)$ must be physically realizable (causal) or,
in other words, all poles of these complex functions must be in the left half of the
s-plane (the transfer function is a real rational function, i.e. a ratio of two
polynomials in s with fixed coefficients; the roots of the polynomial in the
denominator define the poles of the system, while the roots of the polynomial in
the numerator define the zeroes of the system). Thus the task reduces to the choice
of a transfer function $W_o(s)$ which will satisfy the condition (1.39), with the
limitation that such a solution must be physically realizable (causal). To this purpose
let us write the spectral power density

$$S_z(s) = E\{Z(s)Z(-s)\} = E\{[Y(s) + V(s)][Y(-s) + V(-s)]\} = S_y(s) + S_v(s) + S_{vy}(s) + S_{yv}(s) \qquad (1.41)$$

in the form ($S_{vy}(s) = S_{yv}(s) = 0$ according to the starting assumption)
$$S_z(s) = S_y(s) + S_v(s) = D(s)D(-s), \qquad (1.42)$$

where $D(s)$ is a real rational function with zeroes and poles located in the left half
of the s-plane. This ensures that the complex functions $D(s)$ and $D^{-1}(s)$ are
regular in the right half of the s-plane. The task (1.42) is denoted as the problem of
spectral factorization and in the general case it cannot be solved analytically, in
closed form, but requires instead the application of suitable numerical algorithms.
By replacing (1.42) into (1.39), we obtain

$$\int_{-j\infty}^{j\infty}\big[W_o(s)D(s) - I(s)S_y(s)D^{-1}(-s)\big]D(-s)\eta(-s)\, ds = 0 \qquad (1.43)$$

The next step is to decompose the term $I(s)S_y(s)D^{-1}(-s)$ into partial fractions,
i.e. to represent it in the form

$$I(s)S_y(s)D^{-1}(-s) = A(s) + B(s), \qquad (1.44)$$

where the rational function $A(s)$ contains the partial fractions stemming from the
poles in the left half of the s-plane, while the rational function $B(s)$ contains the
terms stemming from the poles in the right half of the s-plane. The transfer
function $A(s)$ may also be determined as the Laplace transform of the part of the
impulse response of the system $I(s)S_y(s)/D(-s)$ in the positive time interval $t \ge 0$
(the causal part of the impulse response, since the impulse response in the general
case is non-causal and exists for $-\infty < t < \infty$). Symbolically, $A(s)$ may be
expressed as

$$A(s) = \big[I(s)S_y(s)D^{-1}(-s)\big]_{F_o}, \qquad (1.45)$$

where the symbol $[\,\cdot\,]_{F_o}$ denotes the physically realizable part of the corresponding
transfer function in the brackets. By replacing (1.45) into (1.43) one obtains

$$\int_{-j\infty}^{j\infty}[W_o(s)D(s) - A(s)]D(-s)\eta(-s)\, ds + \int_{-j\infty}^{j\infty}B(s)D(-s)\eta(-s)\, ds = 0 \qquad (1.46)$$

Since all poles of the rational function $B(s)D(-s)\eta(-s)$ are located in the
right half of the s-plane, if the integral along the imaginary axis of the
s-plane is complemented by a semi-circle with infinite radius in order to obtain a
closed contour C that encompasses the whole left half of the s-plane (the integral
along the semi-circle is equal to zero, since the argument $s \to \infty$ on the semi-circle,
and in the rational function being integrated the order of the polynomial in
the numerator is lower than the order of the polynomial in the denominator, so that
the integral along the closed contour C is equal to the integral along the imaginary
axis), the application of Cauchy's residue theorem furnishes
$$\oint_C B(s)D(-s)\eta(-s)\, ds = \int_{-j\infty}^{j\infty}B(s)D(-s)\eta(-s)\, ds = \sum_{i=1}^{n}\mathrm{Res}(s_i) = 0, \qquad (1.47)$$

where $\mathrm{Res}(s_i)$ represents the residue of the integrand function $B(s)D(-s)\eta(-s)$
at the pole $s_i$, $i = 1, \ldots, n$, located within the region encompassed by the closed
contour C. Since the region encompassed by the contour C is the left half of the
s-plane, and the rational integrand function has all its poles in the right half of the
s-plane, it follows that $n = 0$, i.e. there are no poles within this region and the value
of the integral is equal to zero, so that (1.46) reduces to

$$\int_{-j\infty}^{j\infty}[W_o(s)D(s) - A(s)]D(-s)\eta(-s)\, ds = 0 \qquad (1.48)$$

According to (1.48) we conclude that the transfer function of the optimal
physically realizable filter is

$$W_o(s) = A(s)D^{-1}(s), \qquad (1.49)$$

i.e.

$$W_o(s) = \big[I(s)S_y(s)D^{-1}(-s)\big]_{F_o} D^{-1}(s) \qquad (1.50)$$

In the case when the signal $y(t)$ and the noise $v(t)$ are correlated (cross-spectral
power densities $S_{yv}(s) = E\{Y(s)V(-s)\} \ne 0$ and $S_{vy}(s) = E\{V(s)Y(-s)\} \ne 0$),
the optimal Wiener filter is defined by the transfer function

$$W_o(s) = \big[I(s)S_{zy}(s)D^{-1}(-s)\big]_{F_o} D^{-1}(s), \qquad (1.51)$$

where

$$D(s)D(-s) = S_z(s) = S_y(s) + S_v(s) + S_{yv}(s) + S_{vy}(s), \qquad (1.52)$$

while

$$S_{zy}(s) = E\{Z(s)Y(-s)\} = E\{[Y(s) + V(s)]Y(-s)\} = E\{Y(s)Y(-s)\} + E\{V(s)Y(-s)\} = S_y(s) + S_{vy}(s) \qquad (1.53)$$

The relation (1.51) defines the transfer function of an optimal analog Wiener
filter. In a similar manner one may derive the discrete transfer function of an optimal
digital Wiener filter

$$W_o(z) = \big[I(z)S_{zy}(z)D^{-1}(z^{-1})\big]_{F_o} D^{-1}(z), \qquad (1.54)$$

where $[\,\cdot\,]_{F_o}$ denotes the physically realizable part of the discrete transfer function in
the brackets (a physically realizable discrete transfer function has all its poles within
the region bounded by the unit circle, $|z| = 1$ [3]; the connection between the complex
s-plane and the complex z-plane is defined as $z = \exp\{sT\}$, where T is the period
of discretization of the analog signal [3]).
The spectral factorization (1.52) now assumes the form

$$D(z)D(z^{-1}) = S_z(z) = S_y(z) + S_v(z) + S_{yv}(z) + S_{vy}(z), \qquad (1.55)$$

where the cross-spectral power density is

$$S_{zy}(z) = E\{Z(z)Y(z^{-1})\} = E\{[Y(z) + V(z)]Y(z^{-1})\} = E\{Y(z)Y(z^{-1})\} + E\{V(z)Y(z^{-1})\} = S_y(z) + S_{vy}(z) \qquad (1.56)$$
Let us note that the discrete version of the analog optimal filter can also be
directly obtained from its analog transfer function, by using some of the
discretization techniques described in the previous section.
The Wiener filter did not find wide application in engineering practice, since
Kalman solved the posed optimization problem in the time domain without the
requirement of stationarity of the stochastic signals. Also, the Kalman filter
represents a much simpler solution in the numerical sense compared to the Wiener
filter [7-10].
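As a numerical illustration, here is a minimal discrete-time sketch of the physically unrealizable solution (1.40) with $I(s) = 1$, applying the zero-phase gain $S_y/(S_y + S_v)$ per frequency bin; the AR(1) signal model and all numerical values are assumed examples, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096

# Assumed example signal: AR(1) process y(t) = 0.9 y(t-1) + w(t), w white, unit variance
y = np.zeros(n)
w = rng.standard_normal(n)
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + w[t]
z = y + rng.standard_normal(n)      # observation z = y + v, v white, unit variance

# Theoretical spectra: Sy(w) = 1/|1 - 0.9 e^{-jw}|^2 for the AR(1) model, Sv(w) = 1
omega = 2 * np.pi * np.fft.fftfreq(n)
Sy = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * omega)) ** 2
Sv = np.ones(n)

# Non-causal Wiener gain (1.40) with I = 1: W0 = Sy / (Sy + Sv), zero-phase
W0 = Sy / (Sy + Sv)
y_hat = np.real(np.fft.ifft(W0 * np.fft.fft(z)))

print(np.mean((z - y) ** 2), np.mean((y_hat - y) ** 2))  # MSE before / after filtering
```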

1.2.2 Kalman Filter

The theory of Kalman filtering starts from the assumption that the nature of the
signal of interest is stochastic and that it is generated by passing a white stochastic
process (white noise) through a linear dynamical system [7-9]. Contrary to the
Wiener theory, in which a linear dynamic system is described by an input-output
relation, i.e. by a corresponding transfer function in the complex domain, here the
system is represented in the time domain by a state-space model. The quoted
model encompasses a dynamic state equation (which in the case of continual
signals represents a first-order vector linear differential equation, while in
the case of discrete signals this equation becomes a first-order vector linear difference
equation) and an algebraic equation of the system output. In the
discrete case these two equations can be written in the following manner

$$x(k+1) = Fx(k) + G\omega(k), \quad x(0); \qquad (1.57)$$

$$y(k) = Hx(k) + v(k), \qquad (1.58)$$


where

1) $x(k)$ is the state vector of the linear discrete system at the discrete moment $t_k = kT$
(T is the sampling period);

2) $x(0)$ is the unknown initial condition with mathematical expectation $m_0$ and
covariance matrix $P_0$, i.e.

$$E\{x(0)\} = m_0, \qquad E\{[x(0) - m_0][x(0) - m_0]^T\} = P_0; \qquad (1.59)$$

3) $\omega(k)$ is a white stochastic process with zero mean value which excites the
linear model (1.57), the so-called state or process noise, i.e.

$$E\{\omega(k)\} = 0, \qquad E\{\omega(k)\omega^T(j)\} = Q(k)\delta_{k,j} \qquad (1.60)$$

for each $k, j = 1, 2, \ldots$, where $\delta_{k,j}$ is the Kronecker delta symbol ($\delta_{k,j} = 0$ for $k \ne j$
and $\delta_{k,j} = 1$ for $k = j$), and $Q(k)$ represents the covariance matrix of this noise;

4) $v(k)$ is white stochastic noise with zero mean value, which represents the additive
noise in the output Eq. (1.58), i.e. measurement noise:

$$E\{v(k)\} = 0, \qquad E\{v(k)v^T(j)\} = R(k)\delta_{k,j} \qquad (1.61)$$

for each $k, j = 1, 2, \ldots$, where $\delta_{k,j}$ is the Kronecker delta symbol, and $R(k)$ represents
the covariance matrix of this noise;

5) F, G and H are given matrices of corresponding dimensions, which in the general
case may also depend on the time index k. If these matrices are constant, and the
noise covariance matrices $Q(k)$ and $R(k)$ also do not depend on the time
index k, i.e. $Q(k) = Q$ and $R(k) = R$, the considered model is time-invariant
or stationary;

6) further, we assume that the vector stochastic variables $x(0)$, $\omega(k)$ and $v(k)$
are mutually uncorrelated, so that

$$E\{\omega(k)v^T(j)\} = 0, \quad E\{\omega(k)[x(0) - m_0]^T\} = 0, \quad E\{v(k)[x(0) - m_0]^T\} = 0 \qquad (1.62)$$

for each $k, j = 1, 2, \ldots$
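For concreteness, here is a minimal simulation sketch of the model (1.57)-(1.58) for a scalar (SISO) case; F, G, H, Q and R are assumed illustrative values, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
F, G, H = 0.95, 1.0, 1.0      # assumed scalar system "matrices"
Q, R = 0.1, 0.5               # assumed state- and measurement-noise variances

n = 100
x = np.zeros(n + 1)
y = np.zeros(n)
for k in range(n):
    omega_k = np.sqrt(Q) * rng.standard_normal()  # state noise, covariance Q, (1.60)
    v_k = np.sqrt(R) * rng.standard_normal()      # measurement noise, covariance R, (1.61)
    x[k + 1] = F * x[k] + G * omega_k             # state equation (1.57)
    y[k] = H * x[k] + v_k                         # output equation (1.58)
```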
Let us note that the dynamic Eq. (1.57) represents a model-generator of the state
vector, i.e. the physical mechanism generating the components of the state vector
as the physical variables of interest, while the algebraic Eq. (1.58) describes the
mechanism of measurement (observation) of the output signal, taking into account
the sensor inaccuracy itself, as expressed through additive noise. Thus the output
signal of interest is generated as a linear combination of the components of the
state vector, which is additionally contaminated by measurement noise.
The Kalman filter itself represents a recursive numerical algorithm generating
an estimate of the immeasurable state vector at the current discrete moment, with
time index k, based on the available estimate of the state vector at the preceding
discrete moment, with time index k - 1, and the newly obtained measurement
at the given discrete moment, y(k). If we introduce the following notation
$\hat{x}(k|k) = \hat{x}(k)$: the estimate of the state vector $x(k)$ at the moment k, after the
last measurement $y(k)$ has been processed;

$\hat{x}(k|k-1) = \hat{x}(k^-)$: the estimate of the state vector $x(k)$ at the moment k, before
the last measurement $y(k)$ has been processed,

then the estimate $\hat{x}(k)$ can be formed by the predictor-corrector algorithm
representing a linear combination of the previously known estimate $\hat{x}(k^-)$
and the newly obtained measurement information $y(k)$, i.e.

$$\hat{x}(k) = \bar{K}(k)\hat{x}(k^-) + K(k)y(k) \qquad (1.63)$$
where $\bar{K}(k)$ and $K(k)$ are unknown matrices dependent on the time moment k.
These matrices represent free parameters of the algorithm (1.63) and they should
be chosen so that the estimate (1.63) gives the best possible approximation of the
immeasurable random vector variable $x(k)$. The problem thus posed imposes the
requirement to define criteria or indicators for the appraisal of the quality of
estimation. Similar to the well-known criteria for the appraisal of the quality of
measuring equipment, such as accuracy and precision, the notions of unbiased
estimation and of the total variance of the estimation error are introduced here. If we
introduce the corresponding notation for the estimation errors as
$$\tilde{x}(k) = x(k) - \hat{x}(k), \qquad (1.64)$$

$$\tilde{x}(k^-) = x(k) - \hat{x}(k^-), \qquad (1.65)$$

it is then said that the estimate $\hat{x}(k)$ of the state vector $x(k)$ is unbiased if the
mean value (mathematical expectation) of the estimation error is equal to zero, i.e.

$$E\{\tilde{x}(k)\} = 0 \;\Rightarrow\; E\{\hat{x}(k)\} = E\{x(k)\} \qquad (1.66)$$
On the other hand, the corresponding covariance matrices of the estimation errors
are defined as

$$P(k) = E\{\tilde{x}(k)\tilde{x}^T(k)\}, \qquad (1.67)$$

$$P(k^-) = E\{\tilde{x}(k^-)\tilde{x}^T(k^-)\}, \qquad (1.68)$$

so that the total variance of the estimation error is defined as the sum of the diagonal
elements of the matrix $P(k)$, since each diagonal element of this matrix represents
the variance of the estimation error of the corresponding component of the
estimated state vector. In this way the total variance of the estimation error at the
moment k, denoted as $\sigma_k^2$, is given as the trace of the matrix (1.67), i.e.

$$\sigma_k^2 = \mathrm{Trace}\{P(k)\} \qquad (1.69)$$


The condition (1.66) corresponds to the notion of accuracy, and the condition
(1.69) to the notion of precision in measurement technique. Namely, since the
estimate of the state vector is a stochastic variable based on noisy measurements,
the condition (1.66) points out that the mean (expected) value of
this estimate will be equal to the mean (expected) value of the immeasurable and
estimated stochastic state vector itself (the so-called accuracy condition), while the
actual realizations of the estimate will lie in some neighborhood of this
mean value, this neighborhood being smaller for a smaller value of the criterion
(1.69), i.e. when the precision is higher. Bearing this in mind, the unknown weighting
matrices in the estimation algorithm (1.63) can be determined from the condition of
unbiased estimation (1.66) and from the condition of the minimum of the criterion (1.69);
the resulting estimator is denoted as the minimum error variance estimator
or the discrete Kalman filter. Let us also note that $\hat{x}(k)$ is called the filtered
estimate, and $\hat{x}(k^-)$ the single-step prediction [7-9]. Based on (1.58), (1.63)
and (1.64) we may further write

$$\tilde{x}(k) = x(k) - \bar{K}(k)\hat{x}(k^-) - K(k)[Hx(k) + v(k)] + \bar{K}(k)x(k) - \bar{K}(k)x(k),$$

where the last two addends are included artificially (their sum is zero); from this it
follows, taking into account (1.65), that

$$\tilde{x}(k) = [I - \bar{K}(k) - K(k)H]x(k) + \bar{K}(k)\tilde{x}(k^-) - K(k)v(k) \qquad (1.70)$$
Since it is assumed that $E\{v(k)\} = 0$, and if it is also assumed that the previous
estimate was unbiased, i.e. $E\{\tilde{x}(k^-)\} = 0$, the condition (1.66) will be satisfied
for each $x(k)$ only if

$$I - \bar{K}(k) - K(k)H = 0 \;\Rightarrow\; \bar{K}(k) = I - K(k)H \qquad (1.71)$$

Replacing (1.71) into (1.63) and (1.70) it is obtained that

$$\hat{x}(k) = \hat{x}(k^-) + K(k)[y(k) - H\hat{x}(k^-)], \qquad (1.72)$$

$$\tilde{x}(k) = [I - K(k)H]\tilde{x}(k^-) - K(k)v(k) \qquad (1.73)$$


Incorporating (1.73) into the definition (1.67) of the covariance matrix of the
estimation (filtration) error, it follows further that

$$P(k) = P(k^-) - K(k)HP(k^-) - P(k^-)H^T K^T(k) + K(k)[HP(k^-)H^T + R]K^T(k), \qquad (1.74)$$

where $R = E\{v(k)v^T(k)\}$. When deriving the solution (1.74) it was taken into
account that

$$E\{\tilde{x}(k^-)v^T(k)\} = 0, \qquad E\{v(k)\tilde{x}^T(k^-)\} = 0, \qquad (1.75)$$

since the error $\tilde{x}(k^-)$ depends on the realizations of the state noise $\omega(i)$,
$i = 1, 2, \ldots, k-1$, which is assumed to be uncorrelated with the measurement noise
$v(k)$. By replacing (1.74) into (1.69), we obtain the expression for the total
variance of the estimation (filtration) error
$$\sigma_k^2 = \mathrm{Trace}\{P(k^-)\} - 2\,\mathrm{Trace}\{K(k)HP(k^-)\} + \mathrm{Trace}\{K(k)[HP(k^-)H^T + R]K^T(k)\} \qquad (1.76)$$
The last expression was derived utilizing the fact that the second addend in
(1.74) represents the transpose of the third addend, and using the property of the
trace function that $\mathrm{Trace}\{A\} = \mathrm{Trace}\{A^T\}$. The optimum amplification
matrix $K(k)$ of the algorithm corresponds to the minimal total variance of the
estimation error and is determined from the necessary condition for the minimum
of the criterion (1.76)

$$\frac{\partial \sigma_k^2}{\partial K(k)} = 0 \qquad (1.77)$$
The application of the operator of partial differentiation to (1.76) and the
calculation of the corresponding partial derivatives requires the knowledge of the
following rules for the differentiation of the matrix trace, as a scalar expression,
with respect to a matrix argument [7]

$$\frac{\partial}{\partial A}\mathrm{Trace}\{BAC\} = B^T C^T, \qquad (1.78)$$

$$\frac{\partial}{\partial A}\mathrm{Trace}\{ABA^T\} = 2AB \quad \text{for} \quad B = B^T \qquad (1.79)$$
Applying the partial differentiation operator to (1.76), and using the condition
(1.77), it is obtained that

$$-2\frac{\partial}{\partial K(k)}\mathrm{Trace}\{K(k)HP(k^-)\} + \frac{\partial}{\partial K(k)}\mathrm{Trace}\{K(k)[HP(k^-)H^T + R]K^T(k)\} = 0 \qquad (1.80)$$
If we further apply the rule (1.78) to the first term in (1.80), with $B = I$,
$A = K(k)$ and $C = HP(k^-)$, and the rule (1.79) to the second term in (1.80), with
$A = K(k)$ and $B = HP(k^-)H^T + R$, the relation (1.80) reduces to

$$\frac{\partial \sigma_k^2}{\partial K(k)} = -2P(k^-)H^T + 2K(k)[HP(k^-)H^T + R] = 0, \qquad (1.81)$$

from which we obtain the optimum amplification matrix

$$K(k) = P(k^-)H^T[HP(k^-)H^T + R]^{-1} \qquad (1.82)$$

If the right-hand side of the expression (1.81) is partially differentiated once again,
applying the rule (1.78) to the second addend in (1.81), choosing again $B = I$,
$A = K(k)$ and $C = HP(k^-)H^T + R$, it can be written
$$\frac{\partial^2 \sigma_k^2}{\partial K^2(k)} = 2[HP(k^-)H^T + R] \ge 0, \qquad (1.83)$$

from which we conclude that the optimum solution (1.82) corresponds to the
minimum of the scalar criterion (1.69). By replacing the optimum amplification
(1.82) into the expression (1.74), we obtain the expression for the covariance
matrix of the estimation (filtration) error

$$P(k) = P(k^-) - P(k^-)H^T[HP(k^-)H^T + R]^{-1}HP(k^-), \qquad (1.84)$$

or, if we introduce the relation (1.82) into the expression (1.84),

$$P(k) = [I - K(k)H]P(k^-) \qquad (1.85)$$
Relations (1.72), (1.82) and (1.84) or (1.85) define the estimation correction
step based on measurements, i.e. the estimation (filtration) step of the discrete Kalman
filter. Obviously the realization of this step assumes that the prediction step has
been carried out beforehand, i.e. that the values $\hat{x}(k^-)$ and $P(k^-)$ are known.
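To make the predictor-corrector structure concrete, here is a minimal numerical sketch of one cycle: state extrapolation (anticipating relations (1.90) and (1.91), derived below from the model (1.57)), followed by the correction step (1.72), (1.82) and (1.85). The model matrices and measurements are assumed example values.

```python
import numpy as np

# Assumed example: position-velocity model with unit sampling period
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition matrix
G = np.array([[0.5], [1.0]])             # state-noise input matrix
H = np.array([[1.0, 0.0]])               # output (measurement) matrix
Q = np.array([[0.01]])                   # state-noise covariance
R = np.array([[1.0]])                    # measurement-noise covariance

def kalman_step(x_hat, P, y):
    # Prediction (1.90), (1.91): extrapolate the previous filtered estimate
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # Amplification matrix (1.82)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # Correction (1.72) and covariance update (1.85)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x_hat, P = np.zeros((2, 1)), np.eye(2)   # initial condition m0, P0 (assumed)
for y in [1.2, 1.9, 3.1]:                # a few assumed measurements
    x_hat, P = kalman_step(x_hat, P, np.array([[y]]))
print(x_hat.ravel())
```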
Let us note that the variable

$$\tilde{y}(k) = y(k) - H\hat{x}(k^-) \qquad (1.86)$$
is denoted as the measurement residual or the innovation. Namely, based on the
output Eq. (1.58) it is possible to estimate (predict) its expected value before the
signal itself is measured. Since measurement noise is assumed to have a zero mean
value, and at the moment k, immediately before the signal $y(k)$ is measured, the
prediction $\hat{x}(k^-)$ of the immeasurable random state $x(k)$ is known, the expected
value or prediction of the output at the moment k is

$$\hat{y}(k) = H\hat{x}(k^-) \qquad (1.87)$$

In this way, the complete new measurement at the moment k, $y(k)$, does not
introduce entirely new information about the system, since before the measurement
itself we have already predicted its value (1.87) according to the measurement model
(1.58); the only new information in the measurement is its residual or
innovation (1.86). Further, according to (1.58), (1.65) and (1.86) we obtain

$$\tilde{y}(k) = H\tilde{x}(k^-) + v(k), \qquad (1.88)$$
from which we conclude that the mean (expected) value of the residual is
E f~ykg 0. This follows from the assumption that measurement noise v k has
a zero mean value, Efvkg 0, and that the prediction ^xk  is an unbiased
estimation of the state, i.e. Ef~xkg 0. The covariance matrix of the residual
(which represents variance in the case of one-dimensional signal) is given by the
expression

Sk E ~yk~yT k HPk HT R 1:89

To derive the expression (1.89) we used the relations (1.75) and (1.88). Similarly to the measurement noise $v(k)$, the residual $\tilde{y}(k)$ also represents white noise with a zero mean value and with the corresponding covariance matrix $S(k)$ in (1.89) (the covariance matrix of the noise $v(k)$ is $R$) [7–9].

The prediction of the system state may be regarded as its estimation (filtration) when a measurement is unavailable (in that case the covariance matrix of measurement noise $R \to \infty$, so that the Kalman amplification in (1.82) is equal to zero, i.e. $K = 0$). Indeed, if we include $K(k) = 0$ in (1.72) and (1.85), it follows that

$$\hat{x}(k) = \hat{x}(k^-)\big|_{K(k)=0}, \qquad P(k) = P(k^-)\big|_{K(k)=0}.$$

The prediction $\hat{x}(k^-)$ itself represents the extrapolation of the state estimation (filtration) at the moment $k-1$, $\hat{x}(k-1)$, to the moment $k$, but immediately before the signal $y(k)$ is measured. This extrapolation may be done based on the model (1.57) itself, bearing in mind that the expected value of the noise $\omega(k-1)$ is equal to zero. Then, according to (1.57), we may write

$$\hat{x}(k^-) = F\hat{x}(k-1). \qquad (1.90)$$

Relation (1.90) is derived from (1.57) by replacing the state vector $x(k)$ with $\hat{x}(k^-)$, and the state vector at the previous moment, $x(k-1)$, with $\hat{x}(k-1)$, simultaneously neglecting the state noise $\omega(k-1)$. The prediction error, $\tilde{x}(k^-)$ in (1.65), can be written as

$$\tilde{x}(k^-) = Fx(k-1) + G\omega(k-1) - F\hat{x}(k-1) = F\tilde{x}(k-1) + G\omega(k-1),$$

so that the covariance matrix of the prediction error, $P(k^-)$ in (1.68), is given by

$$P(k^-) = FP(k-1)F^T + GQG^T. \qquad (1.91)$$


While deriving the expression (1.91) we took into account that the cross-correlations are

$$E\{\tilde{x}(k-1)\omega^T(k-1)\} = 0, \qquad E\{\omega(k-1)\tilde{x}^T(k-1)\} = 0, \qquad (1.92)$$

since the filtration error $\tilde{x}(k-1)$ depends on the noise realizations $\omega(i)$, $0 < i \le k-2$, and the assumption is that the discrete sequence $\{\omega(i)\}$ is white noise, uncorrelated in time, i.e. the momentary realization of this noise, $\omega(k)$, is not correlated with its prior realizations $\{\omega(i),\ i < k\}$. Since according to the lemma on matrix inversion the expression (1.84) may be written in the form

$$P^{-1}(k) = P^{-1}(k^-) + H^T R^{-1} H, \qquad (1.93)$$

the expression for the amplification matrix of the Kalman filter (1.82) reduces to

$$\begin{aligned} K(k) &= P(k)P^{-1}(k)P(k^-)H^T\left[HP(k^-)H^T + R\right]^{-1} \\ &= P(k)\left[P^{-1}(k^-) + H^T R^{-1}H\right]P(k^-)H^T\left[HP(k^-)H^T + R\right]^{-1} \\ &= P(k)H^T\left[I + R^{-1}HP(k^-)H^T\right]\left[HP(k^-)H^T + R\right]^{-1} \\ &= P(k)H^T R^{-1}\left[R + HP(k^-)H^T\right]\left[HP(k^-)H^T + R\right]^{-1}, \end{aligned}$$

from where it stems that

$$K(k) = P(k)H^T R^{-1}. \qquad (1.94)$$
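As a quick numerical check of this derivation, the short sketch below (Python/NumPy; the matrix values are arbitrary illustrative assumptions, not taken from the text) verifies that the gain computed from (1.82) coincides with the alternative form (1.94) once $P(k)$ is obtained from (1.85):

```python
import numpy as np

# Sketch: numerical check that (1.82) and (1.94) give the same gain.
# All matrix values below are arbitrary illustrative assumptions.
rng = np.random.default_rng(0)
H = rng.standard_normal((2, 3))           # measurement matrix
A = rng.standard_normal((3, 3))
P_minus = A @ A.T + np.eye(3)             # P(k-), symmetric positive definite
R = np.diag([0.5, 2.0])                   # measurement noise covariance

S = H @ P_minus @ H.T + R                 # residual covariance (1.89)
K = P_minus @ H.T @ np.linalg.inv(S)      # optimum amplification (1.82)
P = (np.eye(3) - K @ H) @ P_minus         # filtration error covariance (1.85)

K_alt = P @ H.T @ np.linalg.inv(R)        # alternative form (1.94)
print(np.allclose(K, K_alt))              # True
```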
The expression (1.94) shows that the value of the Kalman filter amplification $K$ depends on the size of the covariance matrix of the estimation error $P$, which represents a figure of merit of the quality of the system state estimation, and on the size of the covariance matrix of the measurement noise $R$, which describes the accuracy of the measurement of the output signal of the system. A high value of noise, i.e. a high $R$, and a small filtration error, i.e. a low $P$, show that the residual $\tilde{y}$ is mostly the consequence of measurement noise; the filter then imparts a small weight, i.e. a small amplification $K$, to this residual, which does not carry important information about the estimated state, so that the estimated (filtered) state will be close to its prediction. On the other hand, a low measurement noise (low $R$) and a large error in the system state estimation (high $P$) show that the residual contains significant information about the estimation error, so that the filter, through its large amplification $K$, weights the residual $\tilde{y}$ significantly more than the prediction of the state vector. Also, the size of the covariance matrix of the prediction error in (1.91) depends directly on the size of the covariance matrix of the state noise $Q$ (the mean power of the state noise). A high $Q$ indicates inadequacy of the signal model in the state space, while a low $Q$ results in a small error covariance matrix, which shows that the model in the state space represents an adequate approximation of the real system of interest. This further shows that the filter amplification matrix is proportional to $Q$ and inversely proportional to $R$, i.e.

$$K \sim QR^{-1}. \qquad (1.95)$$

The function of a Kalman filter may be heuristically summarized as in Table 1.1.
A high filter amplification $K$ indicates a wide filter bandwidth and a faster filter response to the excitation in the form of the measured signal, i.e. the measurement residual. However, a high filter amplification simultaneously means a smaller degree of noise reduction in the system. On the other hand, a smaller filter amplification means a smaller bandwidth and a slower response to the excitation in the form of the measured signal.
For the implementation of the complete recursive algorithm (Kalman filter) it is necessary to adopt starting (initial) values $\hat{x}(0)$, $P(0)$, corresponding to the time index $k = 0$. If we adopt

Table 1.1 Heuristic description of the Kalman filter function

Condition | Amplification K | Filter parameters
Confidence in model (prediction) | Low | P low (adequate model); R high (bad measurement)
Confidence in residual (signal measurement) | High | P low (adequate model) with R low (good measurement); or P high (inadequate model)
$$\hat{x}(0) = E\{x(0)\} = m_0, \qquad P(0) = P_0 = E\left\{[x(0) - m_0][x(0) - m_0]^T\right\}, \qquad (1.96)$$

such a choice ensures that the prediction $\hat{x}(1^-)$ is unbiased. Indeed, since according to (1.90)

$$\hat{x}(1^-) = F\hat{x}(0), \qquad (1.97)$$

if (1.96) is replaced into (1.97), it follows that

$$\hat{x}(1^-) = Fm_0 = FE\{x(0)\} = E\{Fx(0)\}. \qquad (1.98)$$

The expression (1.98) may be expanded by the zero term $GE\{\omega(0)\}$, from which we obtain, according to (1.57), the condition for the unbiased prediction

$$\hat{x}(1^-) = E\{Fx(0) + G\omega(0)\} = E\{x(1)\}. \qquad (1.99)$$

The unbiased prediction $\hat{x}(1^-)$ further implies, according to (1.70) and (1.71), an unbiased filtration $\hat{x}(1)$, and this further results in an unbiased $\hat{x}(2^-)$, etc.; by induction we conclude that the estimation of the state will be unbiased at each moment $k$. The equations of a digital Kalman filter for a time-invariant system model are shown in Table 1.2.
Table 1.2 Flow diagram of the algorithm of digital Kalman filtration

1. Initialization:
– read the constant matrices: $F$, $G$, $H$, $m_0$, $P_0$, $Q$, $R$
– set the initial values of the estimated state vector and of the covariance matrix of the estimation error: $\hat{x}(0) = m_0$, $P(0) = P_0$
2. For each $k = 1, 2, \ldots$, perform:
2.1. The prediction step, consisting of the calculation of:
– the system state prediction: $\hat{x}(k^-) = F\hat{x}(k-1)$
– the covariance matrix of the prediction error: $P(k^-) = FP(k-1)F^T + GQG^T$
– the system output prediction: $\hat{y}(k) = H\hat{x}(k^-)$
2.2. The estimation (filtration) step, consisting of:
– the output signal measurement, $y(k)$
– the calculation of the residual: $\tilde{y}(k) = y(k) - \hat{y}(k)$
– the calculation of the covariance matrix of the residual: $S(k) = HP(k^-)H^T + R$
– the calculation of the amplification matrix: $K(k) = P(k^-)H^T S^{-1}(k)$
– the calculation of the state estimation: $\hat{x}(k) = \hat{x}(k^-) + K(k)\tilde{y}(k)$
– the calculation of the covariance matrix of the estimation (filtration) error: $P(k) = [I - K(k)H]P(k^-)$
3. Increment the iteration (time) counter $k$ by 1 and repeat the procedure starting from step 2.

Table 1.2 shows that, in the case of a time-invariant system model, the calculation of the covariance matrices of the prediction and estimation (filtration) errors proceeds independently of the calculation of the prediction and the estimation of the system state, so that this calculation can be done before the filter itself is implemented in real time; in this way the designer may assess whether the values of the parameters in the filter model were adopted adequately (the parameters of the filter are the matrices $F$, $G$, $H$ in the model of the system, the initial values $m_0$ and $P_0$, as well as the noise statistics $Q$ and $R$). To verify the design of a Kalman filter in practice, it is necessary to check whether the estimation of the state, obtained from a measurement sample of finite width, is in agreement (i.e. consistent) with the theoretical assumptions. The statistical criteria for the analysis of the filter consistency are:
– the estimation (filtration) errors, whose elements should represent random variables with a zero mean value and an expected (mean) amplitude in accordance with the value of the square root of the corresponding diagonal element of the covariance matrix of the estimation error, $P(k)$;


– the residual (innovation) $\tilde{y}(k)$, which should satisfy the same assumptions as the error $\tilde{x}(k)$, whereby it is only necessary to replace the error covariance, $P(k)$, with the corresponding residual covariance, $S(k)$;
– the residual (innovation), which should represent a white stochastic process.

The last two criteria can be tested in real time, during the operation of the filter itself, while the first criterion, although the most important one, can be applied only in a simulated experiment, since the real error, i.e. the system state, is not known in reality [11]. The application of the quoted criteria is based on the theory of statistical decision, i.e. on hypothesis testing [11–13], and the reader is referred to the literature to become acquainted with the topic in more detail.
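As an illustration, the sketch below gives a minimal Python/NumPy implementation of the recursion from Table 1.2; the function name and the organization of the loop are our own assumptions, while the individual formulas are exactly those quoted in the table:

```python
import numpy as np

def kalman_filter(measurements, F, G, H, m0, P0, Q, R):
    """Minimal sketch of the digital Kalman filter from Table 1.2."""
    x_hat, P = m0, P0                           # 1. initialization
    estimates, covariances = [], []
    for y_k in measurements:                    # 2. for k = 1, 2, ...
        # 2.1 prediction step
        x_pred = F @ x_hat                      # state prediction (1.90)
        P_pred = F @ P @ F.T + G @ Q @ G.T      # prediction error covariance (1.91)
        y_pred = H @ x_pred                     # output prediction (1.87)
        # 2.2 estimation (filtration) step
        resid = y_k - y_pred                    # residual / innovation (1.86)
        S = H @ P_pred @ H.T + R                # residual covariance (1.89)
        K = P_pred @ H.T @ np.linalg.inv(S)     # amplification matrix (1.82)
        x_hat = x_pred + K @ resid              # state estimation (1.72)
        P = (np.eye(len(m0)) - K @ H) @ P_pred  # error covariance (1.85)
        estimates.append(x_hat)
        covariances.append(P)
    return estimates, covariances
```

Such an implementation also makes it straightforward to log the residuals $\tilde{y}(k)$ and their covariances $S(k)$, which is exactly what the last two consistency criteria above require.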
Let us also note that the important properties of the Kalman filter are the following:

– The Kalman filter is a linear function of the current measurement, $y(k)$.
– The estimation of the system state, $\hat{x}(k)$, explicitly depends only on the current measurement $y(k)$, while its dependence on the previous measurements $Y^{k-1} = \{y(0), y(1), \ldots, y(k-1)\}$ is reflected only through their influence on the prediction, $\hat{x}(k^-)$.
– The covariance matrices of the prediction errors, $P(k^-)$, and of the estimation errors, $P(k)$, can be calculated in advance, before the implementation of the filter itself, for the case of a time-invariant system model.
– The assumptions that the state noise, $\omega(k)$, and the measurement noise, $v(k)$, are white and mutually uncorrelated can be relaxed and replaced by assumptions about correlated (colored) state noise, correlated measurement noise and mutually correlated noises; such assumptions require certain modifications of the filter equations [7, 9–11].
– If the stochastic variables $x(0)$, $\omega(k)$ and $v(k)$ have a Gaussian (normal) distribution, then the conditional probability density function of the state $x(k)$, given the measurements up to the current moment $k$, $Y^k = \{y(0), y(1), \ldots, y(k)\}$, is Gaussian (normal); its expected value is
$$E\left\{x(k)\,\big|\,Y^k\right\} = \hat{x}(k)$$
and its covariance matrix is
$$E\left\{\left[x(k) - E\{x(k)|Y^k\}\right]\left[x(k) - E\{x(k)|Y^k\}\right]^T \,\Big|\, Y^k\right\} = P(k).$$
In other words, in the quoted case a Kalman filter generates a recursive conditional mathematical expectation, which represents an optimal estimation of the state vector in the sense of the minimal possible covariance matrix of the estimation error, which reaches the Cramér–Rao lower bound [13–15].
– If the stochastic variables $x(0)$, $\omega(k)$ and $v(k)$ are not Gaussian, then the Kalman filter is optimal only within the class of linear filters, in the sense of the minimal covariance of the estimation error within the said class of filters.
– By replacing (1.84) into (1.91) the vectorial difference equation is obtained
$$P\big((k+1)^-\big) = F\left\{P(k^-) - P(k^-)H^T\left[HP(k^-)H^T + R\right]^{-1}HP(k^-)\right\}F^T + GQG^T, \qquad (1.100)$$
which is called the Riccati equation. In the case of a time-invariant system, the solution of the Riccati equation converges asymptotically ($k \to \infty$) to a finite solution $P$ if the model in the state space is observable [7, 9–11, 16]; the condition of model observability implies that the information about the immeasurable system states is contained in the measured output, which ensures that the estimation error remains limited. If, additionally, the model in the state space is also controllable, the solution $P$ in the equilibrium or stochastic steady state is also unique; the controllability condition ensures that the state noise, as a random excitation signal, acts on all components of the state vector, which prevents the convergence of the covariance matrix of the estimation error $P$ to zero, i.e. the stationary solution $P$ will be a positive definite matrix with all eigenvalues positive [7, 9–11, 16].
– In the stochastic stationary state the amplification matrix of the Kalman filter, $K$, is constant, and the Kalman filter reduces to the Wiener optimal filter [7, 9, 10, 16].
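The convergence of the Riccati equation (1.100) toward its stationary solution is easy to observe numerically. The sketch below (with assumed example matrices chosen only for illustration) iterates (1.84) and (1.91) until $P(k^-)$ stops changing, after which the amplification matrix $K$ is constant, i.e. the filter has reduced to the Wiener solution:

```python
import numpy as np

# Sketch: iterate the Riccati equation (1.100) to its stationary solution.
# The model matrices are illustrative assumptions (observable, controllable).
F = np.array([[1.0, 1.0], [0.0, 1.0]])
G = np.array([[0.5], [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.1]])
R = np.array([[1.0]])

P = np.eye(2)                                  # initial P(k-)
for _ in range(1000):
    # filtration step (1.84), then extrapolation (1.91) -> one Riccati step
    Pf = P - P @ H.T @ np.linalg.inv(H @ P @ H.T + R) @ H @ P
    P_next = F @ Pf @ F.T + G @ Q @ G.T
    if np.allclose(P_next, P, atol=1e-12):
        break
    P = P_next

K_stationary = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
print(K_stationary)                            # constant Wiener amplification
```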

1.3 Adaptive Filters

The design of an optimal filter is based on prior knowledge of the signal statistics, and the obtained solution is optimal only in the case when the filtered signal really has the assumed statistical properties. However, it is very often the case in practice that the statistical properties of the signal are not available, or that they are variable. In the latter case the statistical parameters change over time and it becomes very difficult to design an adequate optimal solution.
A convenient approach in the said situations is the design of so-called self-adjusting filters, known as adaptive filters. One of the most important properties of adaptive filters is their ability to adapt, which leads toward the desired system behavior. Instead of having a system strictly defined in advance, adaptive filters utilize the information from their surroundings to set the values of their parameters. This implies that the parameter values change, according to the filter output, through a feedback and a suitable criterion function. The properties of the feedback directly influence the change of the parameter values, through a systematic procedure that is denoted as the learning process or the training period. During the training, the filter output changes in accordance with the reference signal, i.e. the error value decreases with the advancement of the training process.
The design of the adaptive algorithm is fundamental for this approach. The purpose of the adaptive algorithm is the observation of the environment and the adaptation of the filter transfer function in dependence on the observed changes. The algorithm starts from previously defined initial conditions, which do not have to be coordinated with the environment, and based on the momentary values of the input, the output and the reference signal it tends to find the solution for an optimal filter. In a stationary environment it is expected that the filter converges to an optimal filter, while in a nonstationary environment it is expected that the filter will follow the changes in time and modify its parameters accordingly. Adaptive filters do not use any prior knowledge about the statistical properties of the signal, but estimate them based on a realization of the process, i.e. using a suitable sequence of time samples of the signal. There are different ways to estimate these properties and to apply them in the algorithm itself; a consequence of this is the existence of different adaptive algorithms. The choice of an adaptive algorithm depends on the particular application of the adaptive filter, and some of the key parameters are the convergence speed, the adaptation success, the robustness to errors, the ability to follow fast changes, the numerical stability, the computational complexity and the possibility of practical implementation.
A block diagram of an adaptive filter is shown in Fig. 1.8. The adaptive filter includes two processes: (1) the filtering process, which generates an output signal according to the input signal; (2) the adaptive process, whose role is to adjust the filter parameters, i.e. its transfer function, in order to obtain an adequate output signal. The adaptation process is controlled by the error signal, which is also called the residual, and which represents a measure of the adaptation of the filter parameters, i.e. it is the indicator of the conformity between the output and the reference signal.

Fig. 1.8 Block diagram of an adaptive filter

When estimating the filter parameters, one often uses the squared error signal, or the mean square error (MSE), as the optimization criterion. Depending on the particular use of the adaptive filter, the measure of the adaptation success may be based on the values of the estimated filter parameters, on the filter output signal or on the error signal.
The most often utilized adaptive filter structures are the transversal (or FIR) structures, owing to their unconditional stability and the relatively simple analysis of the properties of these filters. Although other structures also find wide use, the subject of this text is the analysis of adaptive FIR filters in a nonstationary environment, as well as the possibility to increase their convergence speed. Namely, the estimation of the parameters of the real systems of interest is as a rule accompanied by difficulties stemming from the inherent nonlinearity and/or nonstationarity of the system, as well as from noisy observational measurements.
In signal processing, the statistical properties of the input and the reference signal determine the environment of the adaptive filter. Although most analyses of adaptive filters in the available literature are based on the assumption of a stationary environment, the application of adaptive filters is especially convenient for nonstationary environments. Nonstationarity can be categorized with regard to the change of the statistical properties of the input signal, of the reference signal, or of both simultaneously. In this text we consider the cases when the nonstationary model is a consequence of the variation of the estimated parameters of the filter, since all of the quoted types of nonstationarity may be represented in this manner. The most significant measures of the properties of the adaptive filter in such an environment are the time necessary for the convergence of the algorithm for the estimation of the filter parameters at the onset of nonstationary changes, and the achieved accuracy of the estimated parameters after the convergence has been reached. Due to the mutual incongruity of these two requirements, the standard adaptive algorithms, adequate for the estimation of parameters in stationary conditions, do not give a satisfactory estimation of parameters. Basically, these are algorithms with unlimited memory, which take into account all previous values of the analyzed signal when estimating the parameters at the next moment. The result of such an approach are estimations of the average behavior of the process in the time interval under consideration. To analyze nonstationary signals it is necessary to utilize an algorithm with limited memory, which is achieved by introducing a variable forgetting factor. By generating such a variable forgetting factor using the residuals, it is possible to adequately follow both slow and abrupt changes of the time-varying system parameters, without at the same time significantly impairing the desired quality of estimation in the stationary mode of operation.
Since FIR adaptive filters find very wide application, it is often necessary to model systems that are essentially IIR with an FIR filter. The consequence of this approach is that the dimension of the vector of the estimated parameters may be very large in order to achieve satisfactory characteristics of the modeled system. Besides that, due to the increase of the dimension of the vector of estimated parameters, the number of iterations necessary for the convergence of the algorithm for the filter parameter estimation also grows. This fact represents the motivation for the synthesis of adaptive algorithms with an increased convergence speed. One of the ways to increase the convergence speed, based on the optimal design of the input signal, is called D-optimality. The essence of the optimal experiment design is an adequate choice of the variables included in the experiment, in such a manner that the experiment itself is maximally informative with regard to the desired application. Besides that, within the class of informative experiments, some experiments have the property that the vector of estimated parameters, obtained based on a finite set of data, reaches the desired value much faster than the vector of estimated parameters obtained under some other conditions. By choosing a criterion function maximizing the information content of an equivalent experiment, in which the object is excited by specially designed sequences, and combining it with the procedure for the estimation of the unknown filter parameters, one arrives at the solution for the generation of the optimal input signal.
Let us also recall that the investigation of parameter estimation in various models of real systems resulted in the development of a number of algorithms possessing theoretically optimal properties with regard to a chosen criterion. In the majority of cases, the methods for parameter estimation are based on an a priori accepted assumption that the random processes in the system have a Gaussian distribution. Numerous practical examples, however, have shown an insufficient justifiability of such an assumption, especially if there are large realizations of random perturbations in the system. It was established that optimal estimation procedures, based on the Gaussian assumption, may be very sensitive to the deviation of the real distribution of the perturbation from the assumed normal distribution, which results in estimations of less than satisfactory quality in many applications. On the other hand, the appearance of impulse noise is often met in practice when processing speech signals, images and biomedical signals, as well as in the solution of communication problems. To solve these problems one may use robust methods, based on the development of stochastic procedures which would be efficient even under conditions of incomplete a priori information about the perturbations in the system. Bearing in mind the typical applications [17–21], we analyzed separately the problem of the robustification of adaptive algorithms with regard to impulse noise at the system output. The presented analysis is based on the application of the methodology of approximate maximum likelihood, also called in the literature Huber's M-robust estimation.
Chapter 2
Adaptive Filtering

2.1 Introduction

Adaptive linear filters are linear dynamical systems with a variable or adaptive structure and parameters. They have the property of modifying the values of their parameters, i.e. their transfer function, during the processing of the input signal, in order to generate a signal at the output which is without undesired components, degradation, noise and interference signals. The goal of the adaptation is to adjust the characteristics of the filter through an interaction with the environment, in order to reach the desired values [4, 5]. The operation of adaptive filters is based on the estimation of the statistical properties of the signal in their environment, while modifying the values of their parameters in order to minimize a certain criterion function. The criterion function may be determined in a number of ways, depending on the particular purpose of the adaptive filter, but usually it is a function of some reference signal. The reference signal may be defined as the desired response of the adaptive filter; in that case the role of the adaptive algorithm is to adjust the parameters of the adaptive filter in such a way as to minimize the error signal, which represents the difference between the signal at the output of the adaptive filter and the reference signal.

The basic processes included in adaptive filtering are digital filtering and adaptation or adjustment, i.e. the estimation of the parameters (coefficients) of the filter. The choice of the filter structure and of the criterion function used during the adaptation process has a crucial influence on the characteristics of the adaptive filter as a whole.

2.2 Structures of Digital Filters

There are several types of digital filters usable in the design of adaptive filters; most often they are linear discrete systems, although nonlinear adaptive filters, among which a large group are neural networks [22], also find significant application. Digital filters are most often categorized either according to the duration of

the impulse response or according to their structure. Two basic types are the IIR (Infinite Impulse Response) and the FIR (Finite Impulse Response) filters [2–5].
The impulse response of a digital filter is its output signal, i.e. its response, obtained when a unit impulse (Kronecker delta impulse) is brought to its input:

$$\delta(k) = \begin{cases} 1, & \text{for } k = 0, \\ 0, & \text{for } k \neq 0. \end{cases}$$
Digital filters in which the duration of the impulse response is, theoretically, infinite are called infinite impulse response filters, or IIR filters. Contrary to them, the filters with a finite impulse response are denoted as FIR filters.
When categorizing based on the structure, one starts from the output signal. The filter output may be a function of the actual and the previous samples of the input signal, $x(k)$, as well as of the previous values of the output signal, $y(k)$. If the actual value of the output is a function of the previous values of the output, then there must be a feedback or a recursive mechanism in the structure, and thus such filters are denoted as recursive. Contrary to them, if the output is only a function of the input signal, we speak about non-recursive filters. In order to obtain an infinite impulse response, one has to use some kind of a recursive filter, and this is the reason why the terms IIR filter and recursive filter are sometimes used interchangeably. Similarly, a finite duration of the impulse response is obtained with non-recursive structures, and thus the terms FIR digital filter and non-recursive digital filter are used as synonyms.

2.2.1 Filters with Infinite Impulse Response (IIR Filters)

The most general structure of a digital filter is the recursive filter shown in Fig. 2.1. It contains a direct branch with multipliers, whose values are determined by the parameters $b_i$, $i = 0, 1, \ldots, M$, and a return (feedback) branch with multipliers, determined by the parameters $a_i$, $i = 1, 2, \ldots, N$. The actual value of the output signal, $\hat{y}(k)$, is determined by a linear combination of the following weighted variables: the actual and the previous values of the input signal samples, $x(k-i)$, $i = 0, 1, \ldots, M$, as well as the previous values of the output signal samples, $\hat{y}(k-i)$, $i = 1, 2, \ldots, N$.

In the digital signal processing literature the block diagram in Fig. 2.1 is denoted as the direct realization [1–3]. This structure represents the design of the filter transfer function with zeroes and poles, such that the position of the poles is determined by the values of the parameters $a_i$, while the position of the zeroes is determined by the parameters $b_i$ [3]. The number of poles and zeroes is determined by the number of the delay elements ($z^{-1}$). This structure has a very large memory, theoretically infinite, and thus it is denoted as the filter with infinite impulse response (IIR). In other words, the impulse response of the filter, which represents the output signal of the filter, $\hat{y}(k)$, to the impulse excitation $x(k) = \delta(k)$, will last infinitely long, i.e. the signal $\hat{y}(k)$ will decrease toward zero only after infinite time (the transient response of the filter will last infinitely before the filter output reaches the zero value in a steady or equilibrium state).

Fig. 2.1 Structure of an IIR recursive filter
According to Fig. 2.1, the output signal, $\hat{y}(k)$, of the IIR filter is given by the linear difference equation

$$\hat{y}(k) = \sum_{i=0}^{M} b_i(k)\,x(k-i) + \sum_{j=1}^{N} a_j(k)\,\hat{y}(k-j), \qquad (2.1)$$

where $b_i(k)$, $i = 0, 1, 2, \ldots, M$, are the parameters of the IIR filter in the direct branch at the k-th discrete moment, and $a_j(k)$, $j = 1, 2, \ldots, N$, are the parameters of the IIR filter in the return branch at the given k-th moment. In the general case $M \le N$, so that $N$ represents the order of the filter ($N$ is the minimal number of the delay elements $z^{-1}$ necessary to physically implement the relation (2.1) using digital electronic components [2]).
Besides calculating the filter output, $\hat{y}(k)$, the adaptive IIR filter should update the $M + N + 1$ parameters $a_j$ and $b_i$, in order to optimize a previously defined criterion function. The parameter update is a more complex task in the case of IIR filters than for FIR filters, for two reasons. The first reason is that an IIR filter may become unstable during the optimization if the filter poles move outside the stable region (the unit circle in the complex z-plane); the other is that the criterion function to be optimized may, generally speaking, have a multitude of local minima, with the possible consequence that the optimization process ends in one of the local minima instead of in the global one. Contrary to this, the criterion functions of FIR filters (MSE, for instance, as will be shown later) usually have only one minimum, which also represents the global minimum. In spite of the quoted difficulties, recursive adaptive filters find significant practical applications in control (regulation) systems, especially if the system to be controlled is itself recursive. In these applications adaptive IIR filters with several parameters may have better properties than FIR filters with several thousands of parameters [23].

Relation (2.1) can also be written in the polynomial form

$$\hat{y}(k) = B_k(z^{-1})\,x(k) + \left[1 - A_k(z^{-1})\right]\hat{y}(k), \qquad (2.2)$$

where the polynomials are

$$B_k(z^{-1}) = \sum_{i=0}^{M} b_i(k)\,z^{-i}, \qquad A_k(z^{-1}) = 1 - \sum_{j=1}^{N} a_j(k)\,z^{-j}. \qquad (2.3)$$

Let us note that in the adopted notation the symbol $z^{-1}$ has the meaning of a unit delay, i.e. $z^{-1}x(k) = x(k-1)$ and $z^{-1}\hat{y}(k) = \hat{y}(k-1)$. The polynomial relation (2.2) can also be written in the alternative form

$$A_k(z^{-1})\,\hat{y}(k) = B_k(z^{-1})\,x(k). \qquad (2.4)$$
If we assume that the filter parameters do not change with time, i.e. that they do not change with the time index $k$, the filter transfer function, $G(z)$, is obtained as the ratio of the z-complex forms of the output, $\hat{y}(k)$, and the input, $x(k)$, signals, assuming that the initial conditions in the difference equation are zero, i.e. that the values of the samples of the corresponding signals in (2.1) are equal to zero for negative values of the time index $k$:

$$G(z) = \frac{\mathcal{Z}\{\hat{y}(k)\}}{\mathcal{Z}\{x(k)\}} = \frac{\hat{Y}(z)}{X(z)} = \frac{B(z^{-1})}{A(z^{-1})}, \qquad (2.5)$$

where, according to (2.1), the polynomials are

$$B(z^{-1}) = \sum_{i=0}^{M} b_i z^{-i}, \qquad A(z^{-1}) = 1 - \sum_{j=1}^{N} a_j z^{-j}, \qquad N \ge M, \qquad (2.6)$$

while $N$ represents the filter order. In the above relation (2.5), $z$ is a complex variable; the roots of the equation $B(z^{-1}) = 0$ determine the zeroes of the filter, while the roots of the equation $A(z^{-1}) = 0$ define the poles of the filter (the zeroes and poles of the filter are also denoted in the literature as critical frequencies, and their position in the z-plane is denoted as the critical frequency spectrum). The dynamical response of the filter to the input signal is dominantly dependent on the position of the poles in the z-plane, and the necessary and sufficient condition of the filter stability is that the poles are located within the unit circle, $|z| = 1$. Generally speaking, a filter is stable if the filter output in the equilibrium or steady state, occurring after the transient process has ceased, is dictated solely by the excitation signal [3].
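A direct implementation of the difference Eq. (2.1) with constant parameters, together with the pole-based stability check described above, might look as in the following sketch (the helper name iir_filter and the numerical values are our own assumptions):

```python
import numpy as np

def iir_filter(x, b, a):
    """Sketch of the IIR difference equation (2.1) with constant b, a."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        y[k] = sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
        y[k] += sum(a[j - 1] * y[k - j] for j in range(1, len(a) + 1) if k - j >= 0)
    return y

# The poles are the roots of A(z^{-1}) = 0, i.e. of z^N - a1 z^{N-1} - ... - aN = 0;
# the filter is stable if all of them lie inside the unit circle.
a = [1.2, -0.4]                                 # assumed feedback parameters
poles = np.roots([1.0] + [-aj for aj in a])
print(np.all(np.abs(poles) < 1.0))              # True -> stable filter
```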

2.2.2 Filters with Finite Impulse Response (FIR Filters)

One of the ways to overcome the potential instability of an IIR digital filter is to design a filter with zeroes only, which in comparison to a recursive IIR

structure has only the direct branch, i.e. a non-recursive structure. The memory of such filters is limited, i.e. their impulse response is equal to zero outside some limited time interval, and because of that they are denoted as filters with finite impulse response (FIR). In other words, a transient process in such a system, which is initiated immediately after bringing the excitation and which lasts until the output signal assumes a stationary value, i.e. until the system enters the equilibrium or steady state, has a finite duration. A good property of FIR filters is that their phase characteristic can be completely linear (the transfer function $G(z)$ for $z = e^{j\omega T}$, $-\pi \le \omega T \le \pi$, where $j$ denotes the imaginary unit, is denoted as the amplitude-phase frequency (spectral) characteristic; here $|G(e^{j\omega T})|$ is called the amplitude, and $\arg\{G(e^{j\omega T})\}$ the phase frequency (spectral) characteristic). Another good property is their unconditional stability, and because of that they represent the basis of systems for adaptive signal processing [4, 5]. Two basic structures for the realization of FIR filters are the transversal structure and the lattice structure. Figure 2.2 shows the structure of a transversal FIR filter. The filter contains adders, delay elements ($z^{-1}$) and multipliers, defined by the parameters $\{b_i,\ i = 0, 1, 2, \ldots, M\}$.
Fig. 2.2 Structure of a transversal FIR filter

The number of the delay elements, $M$, determines the order of the filter and the duration of its impulse response. The output signal of the filter, $\hat{y}(k)$, is determined by the values of the parameters $\{b_i\}$ and represents a linear combination of the actual and the previous samples of the input signal, $x(k)$. These parameters are the object of estimation in the adaptive process, i.e. they vary with the time index $k$. In this manner, according to Fig. 2.2, the filter output signal is defined by the linear difference equation

$$\hat{y}(k) = \sum_{i=0}^{M} b_i(k)\,x(k-i). \qquad (2.7)$$

If the delay operator $z^{-1}$ is introduced, i.e. $z^{-1}x(k) = x(k-1)$, the above relation can be written in the polynomial form

$$\hat{y}(k) = B_k(z^{-1})\,x(k), \qquad (2.8)$$

where the polynomial is

$$B_k(z^{-1}) = \sum_{i=0}^{M} b_i(k)\,z^{-i}, \qquad (2.9)$$

and $M$ represents the filter order. In the case of a stationary or time-invariant system, in which the parameters $b_i$ are constant and do not depend on the time index $k$, the filter transfer function is defined as the ratio between the z-complex forms of the output and the excitation signals (it is assumed that the values of the signal samples are equal to zero for negative values of the time index $k$):

$$G(z) = \frac{\mathcal{Z}\{\hat{y}(k)\}}{\mathcal{Z}\{x(k)\}} = \frac{\hat{Y}(z)}{X(z)} = \frac{z^M B(z^{-1})}{z^M}. \qquad (2.10)$$

The roots of the polynomial equation $z^M B(z^{-1}) = 0$ determine the filter zeroes, while from the expression (2.10) it is concluded that the filter has a pole $z = 0$ with a multiplicity $M$ (the roots of the equation $z^M = 0$ are the poles of the system). Since these poles are located within the unit circle $|z| = 1$ in the plane of the complex variable $z$, the FIR filter represents a system with unconditional stability. Because of that fact, it is customary in the literature to say that the FIR filter transfer function has only zeroes, while the transfer function of an IIR filter is said to have both zeroes and poles (the zeroes are the roots of the polynomial in the numerator, while the poles are the roots of the polynomial in the denominator of the rational function representing the filter transfer function). Specifically, if the excitation $x(k)$ is a unit impulse, i.e. $x(0) = \delta(0) = 1$ and $x(k) = \delta(k) = 0$ for $k \neq 0$, according to (2.7) it is concluded that $\hat{y}(0) = b_0$, $\hat{y}(1) = b_1, \ldots, \hat{y}(M) = b_M$, and $\hat{y}(k) = 0$ for $k > M$, i.e. the impulse response of the filter will last $M + 1$ samples, while the coefficients $b_i$ denote the values of the samples of the filter impulse response at the corresponding discrete and equidistant moments of signal sampling $t_i = iT$, $i = 0, 1, 2, \ldots, M$, where $T$ is the sampling or discretization period.
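The following sketch (with hypothetical coefficient values) implements (2.7) and verifies that the impulse response of an FIR filter indeed lasts $M + 1$ samples and reproduces the coefficients $b_i$:

```python
import numpy as np

def fir_filter(x, b):
    """Sketch of the FIR difference equation (2.7) with constant parameters b."""
    return np.array([sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
                     for k in range(len(x))])

b = [0.5, 1.0, -0.3, 0.2]        # assumed parameters, M = 3
delta = np.zeros(8)
delta[0] = 1.0                   # Kronecker delta impulse
print(fir_filter(delta, b))      # [ 0.5  1.  -0.3  0.2  0.  0.  0.  0. ]
```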

2.3 Criterion Function for the Estimation of FIR Filter Parameters

The concepts of optimal linear estimation represent the basis for the analysis and synthesis of adaptive filters [7–10]. The problem of adaptive filtering encompasses two estimation procedures: the estimation of the desired output signal of the

filter and the estimation of the filter coefficients necessary to achieve the desired
goal. The definition of these estimators depends on the choice of the criterion
function, which defines the quality of the estimation based on the difference
between the estimator input, ^yk, and the reference or the desired output signal,
i.e. the response yk.
If we denote the column vector of the input data, with length $M + 1$, as $X(k)$,

$$X(k) = \left[x(k)\ \ x(k-1)\ \ x(k-2)\ \ \ldots\ \ x(k-M)\right]^T, \qquad (2.11)$$

and the column vector of the estimated parameters, or filter coefficients, at the k-th discrete moment of signal sampling as $\hat{\Theta}(k)$,

$$\hat{\Theta}(k) = \left[\hat{b}_0(k)\ \ \hat{b}_1(k)\ \ \hat{b}_2(k)\ \ \ldots\ \ \hat{b}_M(k)\right]^T, \qquad (2.12)$$

where $k$ denotes the actual discrete time moment and $T$ is the matrix operation of transposition, then the signal at the FIR filter output, $\hat{y}(k)$, may be defined in the form of a linear regression equation, i.e. as a scalar product of the corresponding vectors

$$\hat{y}(k) = X^T(k)\,\hat{\Theta}(k), \qquad (2.13)$$

i.e.

$$\hat{y}(k) = \hat{\Theta}^T(k)\,X(k). \qquad (2.14)$$
While designing the optimum solution, the filter is optimized according to the corresponding criterion function or performance index. The filter is determined by the parameter vector $\hat{\Theta}(k)$, so that the optimization problem reduces to the choice of the parameters which will minimize the chosen criterion function. The choice of the criterion function is a complex problem and most often depends on the particular application of the filter [9, 16].

2.3.1 Mean Square Error (Risk) Criterion: MSE Criterion

One often meets in practice a criterion function defined as the mean square error, MSE, which represents the averaged (expected) value of the squared difference between the reference signal, $y(k)$, and the estimated actual value of the output signal, $\hat{y}(k)$. The goal is to minimize the mean square value of the error signal, which in the ideal case has the consequence that the statistical mean error value tends to zero and that the filter output signal is as close as possible to the desired reference signal.

Fig. 2.3 Structure of an adaptive digital filter. The error signal $e(k)$ appears as the difference between the reference signal $y(k)$ and the actual filter output $\hat{y}(k)$, and the adaptive algorithm generates in each step $k$ the parameter vector $\hat{\Theta}(k)$ as the estimation of the unknown parameters $\Theta(k)$

The error signal (Fig. 2.3) is defined according to (2.13) in the following manner:

$$e(k) = y(k) - \hat{y}(k) = y(k) - X^T(k)\,\hat{\Theta}(k). \qquad (2.15)$$

Assuming that $e(k)$, $y(k)$ and $x(k)$ are stationary random series (the statistical properties of these signals do not change with time) and that the elements of the vector $\hat{\Theta}(k)$ are constant, the criterion function $J$ is defined as

$$\mathrm{MSE} = J = E\left\{e^2(k)\right\} = E\left\{\left[y(k) - X^T(k)\hat{\Theta}\right]^2\right\} = E\left\{y^2(k) + \hat{\Theta}^T X(k)X^T(k)\hat{\Theta} - 2y(k)X^T(k)\hat{\Theta}\right\}, \qquad (2.16)$$

where $E$ denotes the mathematical expectation with regard to the random variables $x$ and $y$. The nature of the criterion function (2.16) is probabilistic, and in such an environment all signals are taken as realizations of stochastic processes, which points to the fact that for the design of the filter it is necessary to know the suitable statistical indicators regarding the signals under consideration, i.e. the joint probability density function of the random variables $x$ and $y$. Since the mathematical expectation is a linear operator, i.e. the mathematical expectation of a sum is equal to the sum of the mathematical expectations, and the mathematical expectation of a product is equal to the product of the mathematical expectations only for statistically independent variables [7, 9, 10], it follows that

$$E\left\{e^2(k)\right\} = E\left\{y^2(k)\right\} + \hat{\Theta}^T E\left\{X(k)X^T(k)\right\}\hat{\Theta} - 2E\left\{y(k)X^T(k)\right\}\hat{\Theta}. \qquad (2.17)$$

The index $k$ is omitted from the vector $\hat{\Theta}(k)$ because of the assumption about its constant value in the current consideration. If

$$R = E\left\{X(k)X^T(k)\right\}$$

denotes the auto-correlation matrix of the input signal

$$R = E\begin{bmatrix} x^2(k) & x(k)x(k-1) & \cdots & x(k)x(k-M) \\ x(k-1)x(k) & x^2(k-1) & \cdots & x(k-1)x(k-M) \\ \vdots & \vdots & \ddots & \vdots \\ x(k-M)x(k) & x(k-M)x(k-1) & \cdots & x^2(k-M) \end{bmatrix} \qquad (2.18)$$

and

$$D = E\left\{y(k)X(k)\right\}$$

denotes the cross-correlation vector of the input and the reference signal

$$D = E\left\{\left[y(k)x(k)\ \ y(k)x(k-1)\ \ y(k)x(k-2)\ \ \ldots\ \ y(k)x(k-M)\right]^T\right\}, \qquad (2.19)$$


it follows that the mean square error, or the statistical mean value of the squared error signal, is

$$J = E\left\{y^2(k)\right\} + \hat{\Theta}^T R\hat{\Theta} - 2D^T\hat{\Theta}. \qquad (2.20)$$
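In practice the expectations in (2.18)–(2.20) are replaced by arithmetic means over a finite data record, as discussed further below. A sketch of such an estimation on synthetic data (all names and numerical values are illustrative assumptions):

```python
import numpy as np

# Sketch: estimate R (2.18), D (2.19) and the criterion J (2.20) from data.
rng = np.random.default_rng(1)
N, M = 10_000, 3
x = rng.standard_normal(N)                              # input signal
theta_true = np.array([1.0, -0.5, 0.25, 0.1])           # assumed system, M+1 taps
X = np.stack([np.concatenate([np.zeros(i), x[:N - i]]) for i in range(M + 1)])
y = theta_true @ X + 0.1 * rng.standard_normal(N)       # reference signal

R_hat = X @ X.T / N                                     # sample estimate of (2.18)
D_hat = X @ y / N                                       # sample estimate of (2.19)
Ey2 = np.mean(y ** 2)

theta = np.zeros(M + 1)                                 # some candidate filter
J = Ey2 + theta @ R_hat @ theta - 2 * D_hat @ theta     # criterion (2.20)
print(J)
```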

2.3.2 Minimization of the Criterion of Mean Square Error (Risk)

In the general case, if the number of the coefficients to be estimated in the adaptive process is equal to $M$, then (2.20) represents a surface in the M-dimensional parameter space. The adaptation process represents the process of searching for the point on that surface which corresponds to the minimal value of the MSE in (2.20), i.e. to the optimal value of the parameter vector $\hat{\Theta}$.
To determine the global minimum of a general criterion function one most often uses some of the algorithms with random search. An important property of the adaptive systems implemented in the form of FIR filters is that their criterion function represents a second-order surface, i.e. a quadratic function of $\hat{\Theta}$. Hence the MSE criterion function has only one, global minimum. In that case one can use much more powerful, deterministic methods of minimization of the criterion function, based on the use of the gradient or some estimation of the gradient, instead of the stochastic methods which are used in the case when the criterion function has several local minima, as is the case in IIR systems [12, 24, 25].
If there is only one parameter, the MSE is described by a parabola (Fig. 2.4); for two parameters, the MSE is a paraboloid (Fig. 2.5), and in the general case, when there is a larger number of parameters, i.e. when $M$ is larger than 2, the surface is described by a hyper-paraboloid. Since the MSE is by definition a positive value, the criterion function represents a convex (bowl-shaped) surface, i.e. it opens in the direction of increasing MSE.

Fig. 2.4 The form of the MSE for the case when M = 1. The criterion function is represented by a parabola, and the parameter vector contains only a single parameter, b

Fig. 2.5 The form of the MSE for the case when M = 2. The contours for constant values of the MSE are represented by the ellipses at the bottom of the graph

The determination of the minimum of the criterion function can be done using the gradient method [12, 24, 25]. Namely, the MSE gradient is a vector that is always directed towards the fastest increase of the criterion function, with a value equal to the slope of the tangent to the criterion function. At the point of the minimum of the criterion function the slope is zero, so it is necessary to determine the gradient of the criterion function and set it equal to zero in order to obtain the optimum values of the parameters minimizing the criterion function. The gradient of the criterion function $J$, denoted as $\nabla J$ or only $\nabla$, is obtained by differentiating the expression (2.20) with regard to $\hat{\Theta}$:

$$\nabla = \frac{\partial J}{\partial \hat{\Theta}} = \left[\frac{\partial J}{\partial b_0}\ \ \frac{\partial J}{\partial b_1}\ \ \frac{\partial J}{\partial b_2}\ \ \ldots\ \ \frac{\partial J}{\partial b_M}\right]^T = 2R\hat{\Theta} - 2D. \qquad (2.21)$$

If (2.21) is made equal to zero, we obtain the Wiener–Hopf equation, and by solving it we obtain the optimal solution for the parameter vector:

$$\nabla = 2R\hat{\Theta} - 2D = 0, \qquad (2.22)$$

$$\Theta_{opt} = R^{-1}D. \qquad (2.23)$$


$\Theta_{opt}$ represents the vector of optimal values of the FIR filter parameters, i.e. those values of the parameters for which the criterion function reaches its minimal value, $J_{min}$. According to (2.20) and (2.23), one obtains

$$J_{min} = E\left\{y^2(k)\right\} + \Theta_{opt}^T R\,\Theta_{opt} - 2D^T\Theta_{opt} = E\left\{y^2(k)\right\} - \Theta_{opt}^T R\,\Theta_{opt}. \qquad (2.24)$$

Starting from the expressions (2.20) and (2.23), the criterion function may be represented as

$$\begin{aligned} J &= E\left\{y^2(k)\right\} + \hat{\Theta}^T R\hat{\Theta} - 2\Theta_{opt}^T R\hat{\Theta} \\ &= E\left\{y^2(k)\right\} - \Theta_{opt}^T R\hat{\Theta} + \left(\hat{\Theta} - \Theta_{opt}\right)^T R\hat{\Theta} \\ &= E\left\{y^2(k)\right\} - \hat{\Theta}^T R\,\Theta_{opt} + \left(\hat{\Theta} - \Theta_{opt}\right)^T R\hat{\Theta}, \end{aligned} \qquad (2.25)$$

i.e., if one introduces (2.24), it follows that

$$J = J_{min} + \left(\hat{\Theta} - \Theta_{opt}\right)^T R\left(\hat{\Theta} - \Theta_{opt}\right). \qquad (2.26)$$

It is obvious from (2.26) that $J$ depends quadratically on $\hat{\Theta}$ and that this function reaches its minimum for $\hat{\Theta} = \Theta_{opt}$.
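Continuing the sketch from Sect. 2.3.1 (and reusing its sample estimates R_hat, D_hat and Ey2), the Wiener–Hopf solution (2.23) and the decomposition (2.26) can be verified numerically:

```python
import numpy as np

theta_opt = np.linalg.solve(R_hat, D_hat)               # Wiener-Hopf solution (2.23)
J_min = Ey2 - theta_opt @ R_hat @ theta_opt             # minimal criterion value (2.24)

theta = theta_opt + 0.1                                 # a perturbed parameter vector
d = theta - theta_opt
J = Ey2 + theta @ R_hat @ theta - 2 * D_hat @ theta     # criterion (2.20)
print(np.isclose(J, J_min + d @ R_hat @ d))             # True, confirming (2.26)
```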
It should be mentioned that other criterion functions may be utilized besides the MSE, for instance the (mean) absolute value of the estimation errors, higher-order moments, etc.; however, such a choice, contrary to the MSE, leads to nonlinear optimization problems [9, 10, 16]. Namely, the complexity of their application and analysis is fundamentally increased, but nonlinear criterion functions nevertheless have an important role in some applications [9, 16].
In most practical cases the shape of the criterion function is not known, nor is its analytical description. From (2.23) it follows that to determine $J_{min}$ it is necessary to know the statistical properties of the input and the reference signal, i.e. the values of the correlation matrix $R$ and the correlation vector $D$. Most often one knows only the measurement sequences of the mentioned signals, and their statistical properties can be obtained only by estimation, based on experimental data. The values of the points on the surface defining the criterion function may be measured or estimated by averaging the MSE in time, in the sense of the approximation of the mathematical expectation by the appropriate arithmetic means. The problem of the determination of the optimal values of the filter

parameters reduces to defining an adequate numerical procedure or algorithm, able to describe the curve or, in the general case, the surface determined by the criterion function, as well as to determine its minimum. The values of the parameters defining the minimum of the criterion function represent the optimal vector $\Theta_{opt}$, which is often also denoted as the accurate values of the parameters.
The majority of adaptive algorithms are based on standard iterative procedures for the solution of minimization problems in real time. To clarify the properties of the usual adaptive algorithms for the minimization of the criterion function, we will consider two basic numerical methods for the iterative minimization of the criterion function: Newton's method and the steepest descent method. Both methods use an estimation of the gradient, $\nabla$, for the determination of the minimum of the criterion function, instead of the accurate value of the gradient, which is not even known in the general case [12, 24, 25].

2.3.2.1 Newton's Method

By multiplying (2.21) by $\frac{1}{2}R^{-1}$ we obtain

$$\frac{1}{2}R^{-1}\nabla = \hat{\Theta} - R^{-1}D. \qquad (2.27)$$

By combining (2.23) and (2.27) it follows that

$$\Theta_{opt} = \hat{\Theta} - \frac{1}{2}R^{-1}\nabla. \qquad (2.28)$$

Equation (2.28) represents Newton's method for the determination of the root of the vector equation obtained by making the gradient of the criterion function equal to zero (the necessary condition for the minimum of the adopted criterion). Knowing the value of $\hat{\Theta}$ at any moment of time, together with $R$ and the corresponding gradient $\nabla$, one can determine the optimal solution $\Theta_{opt}$ in just a single step. In practical situations, however, the available information is insufficient to perform a single-step adaptation. The value of the correlation matrix of the input signal, $R$, changes with time under nonstationary conditions and, in the best case, can only be estimated, similarly to the unknown value of the criterion function gradient $\nabla$, which must be estimated in each iteration. In order to reduce the effect of noisy or fluctuating values of these estimations, one modifies (2.28) in order to reach an algorithm which updates the parameter vector $\hat{\Theta}$ in small increments and converges to $\Theta_{opt}$ after a number of iterations. In this manner, starting from (2.28), one reaches Newton's method in the iterative (recursive) form [12, 24, 25]

$$\hat{\Theta}(k+1) = \hat{\Theta}(k) - \frac{1}{2}R^{-1}\nabla(k), \qquad k = 0, 1, 2, \ldots, \qquad (2.29)$$

where the index $k$ of the gradient of the criterion function denotes that it is estimated in each iteration according to (2.21). The expression (2.29) can be generalized by introducing a constant $\mu$, i.e. a dimensionless variable determining the convergence speed of the iterative process:

$$\hat{\Theta}(k) = \hat{\Theta}(k-1) - \mu R^{-1}\nabla(k-1). \qquad (2.30)$$

According to (2.21) it follows that $\nabla(k-1) = 2R\hat{\Theta}(k-1) - 2D$, and thus according to (2.30) one obtains

$$\hat{\Theta}(k) = (1 - 2\mu)\hat{\Theta}(k-1) + 2\mu R^{-1}D. \qquad (2.31)$$

Rearranging (2.31) further, and taking into account (2.23), one can write

$$\begin{aligned} \hat{\Theta}(k) &= (1 - 2\mu)\hat{\Theta}(k-1) + 2\mu\,\Theta_{opt} \\ &= (1 - 2\mu)^2\hat{\Theta}(k-2) + 2\mu\left[1 + (1 - 2\mu)\right]\Theta_{opt} \\ &\;\;\vdots \\ &= (1 - 2\mu)^k\hat{\Theta}(0) + 2\mu\,\Theta_{opt}\sum_{i=0}^{k-1}(1 - 2\mu)^i. \end{aligned} \qquad (2.32)$$

The vector $\hat{\Theta}$ obviously converges to the optimal value $\Theta_{opt}$ only in the case when the geometric series $\sum_{i=0}^{k-1}(1 - 2\mu)^i$ is convergent, i.e.

$$|1 - 2\mu| < 1, \qquad (2.33)$$

that is

$$0 < \mu < 1, \qquad (2.34)$$

and in that case

$$\hat{\Theta}(k) = (1 - 2\mu)^k\hat{\Theta}(0) + \Theta_{opt}\left[1 - (1 - 2\mu)^k\right]. \qquad (2.35)$$

From (2.35) it follows that the final solution can be reached in one step for $\mu = 0.5$, but only under the condition that one knows the accurate values of the inverse correlation matrix of the input signal, $R^{-1}$, and of the gradient of the criterion function, $\nabla$, i.e. of the cross-correlation vector $D$. In the case when $R^{-1}$ and $\nabla$ are estimated, one usually utilizes values $\mu \ll 1$, typically smaller than 0.01, to overcome the problems appearing because of the errors introduced by the estimation of the unknown variables $R$ and $\nabla$.
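A sketch of the iterative Newton recursion (2.30) on a small synthetic quadratic criterion (with $R$ and $D$ assumed known exactly; the values are illustrative) shows the one-step convergence for $\mu = 0.5$ derived above:

```python
import numpy as np

R = np.array([[2.0, 0.5], [0.5, 1.0]])         # assumed correlation matrix
D = np.array([1.0, -0.5])                      # assumed cross-correlation vector
theta_opt = np.linalg.solve(R, D)
R_inv = np.linalg.inv(R)

mu = 0.5
theta = np.zeros(2)                            # initial parameter vector
for k in range(10):
    grad = 2 * R @ theta - 2 * D               # exact gradient (2.21)
    theta = theta - mu * R_inv @ grad          # Newton update (2.30)
print(np.allclose(theta, theta_opt))           # True (reached after one step)
```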
Newton's method is fundamentally important from the mathematical point of view; however, it is very demanding in practical applications because of the need to estimate $R$ and $\nabla$ in each step. It is a method of gradient search, a consequence of which is that all elements of the vector $\hat{\Theta}$ change in each iteration, with the goal

of determining the optimum values of the parameters. These changes are always directed toward the minimum of the criterion function but, as (2.30) shows, not necessarily in the direction of the gradient itself.
As mentioned, the main problem with Newton's algorithm is its application under the conditions when one knows neither the value of the inverse correlation matrix of the input signal nor the value of the gradient of the criterion function, i.e. the cross-correlation of the input and the reference signal. Unfortunately, this is a common case in practice [6]. In that case one most often assumes that the non-diagonal elements of the correlation matrix are equal to zero. The methods based on this assumption bear the common name of the steepest descent methods, and we consider them in the text that follows.

2.3.2.2 Steepest Descent Method

The steepest descent method is an optimization technique utilizing the gradient of the criterion function to determine its minimum. This method, contrary to Newton's method, updates the values of the vector $\hat{\Theta}$ in each iteration only in the direction of the negative value of the gradient. Since the gradient represents the direction of the fastest increase of the criterion function, the movement in the direction of the negative gradient should ensure the fastest approach to the minimum of the criterion function, which is how this method obtained its name.
According to its definition, the steepest descent method can be described in the following manner [12, 24, 25]:

$$\hat{\Theta}(k+1) = \hat{\Theta}(k) - \beta\nabla(k). \qquad (2.36)$$

The steepest descent method starts from some initial value $\hat{\Theta}(0)$. The estimation in the next step, $\hat{\Theta}(k+1)$, is equal to the current estimation $\hat{\Theta}(k)$ corrected by a value in the direction opposite to the direction of the fastest increase of the function, i.e. of the gradient, at the point $\hat{\Theta}(k)$. The last term in Eq. (2.36) represents the estimated gradient of the criterion function in the k-th iteration. The scalar parameter $\beta$ is the convergence factor determining the size of the correction step; it influences the stability and the adaptation speed of the algorithm. The dimension of this factor is equal to the reciprocal value of the dimension of the input signal power.
The graphical presentation of this method for $M = 1$ is given in Fig. 2.6. It can be shown that the convergence conditions are satisfied for [6]

$$0 < \beta < \frac{1}{\lambda_{max}}, \qquad (2.37)$$

where $\lambda_{max}$ is the largest eigenvalue of the correlation matrix of the input signal, $R$, which depends on the input signal power, i.e. on the mean expected value of the squared amplitude of the input signal.

Fig. 2.6 Graphical presentation of the steepest descent method
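For comparison, a sketch of the steepest descent recursion (2.36) on the same synthetic criterion as in the Newton example above, with $\beta$ chosen inside the stability interval (2.37) (values illustrative):

```python
import numpy as np

R = np.array([[2.0, 0.5], [0.5, 1.0]])         # assumed correlation matrix
D = np.array([1.0, -0.5])                      # assumed cross-correlation vector
theta_opt = np.linalg.solve(R, D)

beta = 0.9 / np.max(np.linalg.eigvalsh(R))     # satisfies 0 < beta < 1/lambda_max
theta = np.zeros(2)
for k in range(500):
    grad = 2 * R @ theta - 2 * D               # exact gradient (2.21)
    theta = theta - beta * grad                # steepest descent update (2.36)
print(np.allclose(theta, theta_opt))           # True, but after many iterations
```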

When comparing Eqs. (2.36) and (2.29), one should note that in the case of Newton's method the information about the gradient is corrected by the value of the inverse correlation matrix of the input signal, $R^{-1}$, and by the scalar parameter $\mu$. This means that in this method the direction of the search of the criterion function is corrected so as to keep it always toward the minimum of the criterion function, while in the steepest descent method this direction coincides with the fastest increase (decrease) of the function. The two quoted directions may not coincide in the general case, and the search path over the criterion function in the application of Newton's method is shorter, which suggests that the optimization process is faster compared to the steepest descent method (Fig. 2.7). This advantage stems from the fact that Newton's method utilizes much more information about the criterion function in comparison to the steepest descent method. On the other hand, compared to the steepest descent method, Newton's algorithm is much more complex, since it requires the calculation or the estimation of the inverse correlation matrix of the input signal in each iteration. However, under real circumstances, in the presence of noise while estimating the gradient and the input data correlation matrix, it may happen that the steepest descent method converges much more slowly toward the minimum of the MSE in comparison to Newton's method, or that, for the sake of speed, it converges to a larger value of the MSE criterion.

2.4 Adaptive Algorithms for the Estimation of Parameters of FIR Filters

Fig. 2.7 Directions of the determination of the minimum of the criterion function for the steepest descent method and for Newton's method

Adaptive digital filters, generally taken, consist of two separate units: the digital filter, with a structure determined so as to achieve the desired processing (the structure is known up to the unknown parameter vector), and the adaptive algorithm for the update of the filter parameters, whose goal is to ensure their fastest possible convergence to the optimum parameters from the point of view of the adopted criterion. According to this, it is possible to implement a large number of combinations of filter structures and adaptive algorithms for parameter estimation. Most of the adaptive algorithms represent modifications of the standard iterative procedures for the solution of the problem of minimization of the criterion function in real time. Two important parameters determining the choice of the adaptive algorithm are the adaptation speed and the expected accuracy of the parameter estimation after the adaptation is finished. In the general case there is a discrepancy between these two requirements: for a given class of adaptive algorithms, an increase of the adaptation speed will decrease the accuracy of the estimated parameters, and vice versa. In this section we consider two basic algorithms for parameter estimation in adaptive FIR filters: the Least Mean Square (LMS) and the Recursive Least Squares (RLS) algorithms. The LMS algorithm has a relatively large importance in applications where it is necessary to minimize the computational complexity, while the RLS algorithm is popular in the fields of system identification (the determination of the system model based on experimental data on the input and output signals) and time series analysis (of experimentally recorded signal samples) [24, 25].

2.4.1 Least Mean Square (LMS) Algorithm

The LMS (Least Mean Square) algorithm belongs to the methods of steepest descent, but it utilizes a special estimation of the gradient, i.e. it takes the actual value of

the squared error signal, $e^2(k)$, instead of the mathematical expectation (MSE) in (2.16). The criterion function in the LMS algorithm is thus defined as [4]

$$J(k) = e^2(k). \qquad (2.38)$$

Based on the values of $\hat{H}(k)$ and $e(k)$ defined in (2.12) and (2.15), the estimation of the gradient is obtained as

$$\hat{\nabla}(k) = \frac{\partial e^2(k)}{\partial \hat{H}} = 2e(k)\frac{\partial e(k)}{\partial \hat{H}} = 2e(k)\begin{bmatrix} \partial e(k)/\partial b_0 \\ \vdots \\ \partial e(k)/\partial b_M \end{bmatrix} = -2e(k)X(k), \qquad (2.39)$$

i.e. the current estimation of the gradient is the product of the vector of input signals in the k-th iteration and the corresponding error signal. This is the basis of the simplicity of the LMS algorithm, since only a single multiplication operation per parameter is necessary for the estimation of the gradient.
Starting from the general form of the steepest descent method (2.36) and from (2.39), one may define the LMS algorithm as

$$\hat{H}(k+1) = \hat{H}(k) - \beta\hat{\nabla}(k), \qquad (2.40)$$

$$\hat{H}(k+1) = \hat{H}(k) + 2\beta e(k)X(k), \quad k = 0, 1, 2, \ldots, \qquad (2.41)$$

where $\hat{\nabla}(k)$ represents the estimation of the gradient (2.39), and $\beta$ is a scalar parameter influencing the adaptation speed, the stability of the adaptive algorithm, and the error value after the adaptation process is finished. Since the change of the parameter vector is based on an estimation of the gradient without averaging (only the actual value of the error signal is used, i.e. a single realization of it) and not on its real value, one may expect the adaptive process to be noisy, i.e. the parameter estimates will fluctuate (they are random variables with a corresponding variance).
It can be shown that (2.39) represents an unbiased estimate of the gradient for the case when the values of the parameters are constant:

$$E\{\hat{\nabla}(k)\} = -2E\{e(k)X(k)\} = -2E\{y(k)X(k) - X(k)X^T(k)\hat{H}\} = 2R\hat{H} - 2D = \nabla(k). \qquad (2.42)$$

Since the mathematical expectation of the gradient estimation (2.42) is equal to the accurate value of the gradient (2.21), the estimation is unbiased. Because of this, the LMS algorithm can be classified as a steepest descent method, but with the qualification that in the classical scheme the gradient is estimated in each iteration, while the values of the parameter vector $\hat{H}$ are updated only after several iterations, once one can claim that the estimation $\hat{\nabla}(k)$, formed as the arithmetic mean of the previously calculated gradient estimates (with the goal of approximately determining the mathematical expectation of such an estimation), describes the accurate value $\nabla(k)$ well. Bearing in mind that the parameter vector $\hat{H}$ in (2.40) is in practice updated in each iteration ($k = 0, 1, \ldots$), it is necessary to limit the value of the scalar parameter $\beta$ according to (2.37), in order to ensure the convergence of the parameter vector $\hat{H}$ to $H_{opt}$ [6].
As a consequence of updating the parameter vector $\hat{H}$ in each iteration, based on an insufficiently accurate estimation of the gradient, the adaptive process is noisy, i.e. it does not follow the steepest descent line toward $H_{opt}$. This noise decreases in time as the adaptive process advances, since near $H_{opt}$ the value of the gradient is small and the correction term in (2.40) is also small, so that the parameter estimates remain close to their previous values.
From (2.41) it is obvious that the algorithm is simple to implement, because it does not require the operations of squaring, averaging or differentiating. In the LMS algorithm each parameter of the filter is updated by adding a weighted value of the error signal to its actual value

$$b_i(k+1) = b_i(k) + 2\beta e(k)x(k-i), \quad i = 0, 1, \ldots, M; \; k = 0, 1, \ldots \qquad (2.43)$$

The error signal $e(k)$ is common for all coefficients, while the weight factor $2\beta x(k-i)$ is proportional to the value met at the k-th moment in the i-th delay section of the FIR filter. To calculate $2\beta e(k)x(k-i)$ for all coefficients one needs $M+1$ arithmetic operations (multiplication and addition). Because of that, each step of the algorithm requires $2(M+1)$ operations, which makes this algorithm convenient for real-time application [26].
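To make the procedure concrete, the following Python/NumPy sketch transcribes the update (2.41); the function name and the zero padding of the input vector for negative time indices are our own illustrative choices.

```python
import numpy as np

def lms(x, y, M, beta):
    """LMS adaptation of an FIR filter with M + 1 coefficients, eq. (2.41)."""
    H = np.zeros(M + 1)                    # parameter vector, eq. (2.46)
    e = np.zeros(len(x))
    for k in range(len(x)):
        # input vector X(k) = [x(k), x(k-1), ..., x(k-M)], x(i) = 0 for i < 0
        X = np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])
        e[k] = y[k] - H @ X                # error signal e(k)
        H = H + 2.0 * beta * e[k] * X      # coefficient update, eq. (2.43)
    return H, e
```

For a stable adaptation the step size beta must respect the bound (2.37) dictated by the input signal power.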
Generally taken, larger values of $\beta$ yield greater convergence speeds, but the estimation error is also larger, while for smaller $\beta$ one also obtains a smaller asymptotic error of the parameter estimation. Also, according to (2.37), the admissible value of $\beta$ is limited by $\lambda_{max}$, the maximal eigenvalue of the input autocorrelation matrix, i.e. by the power of the input signal $x(k)$. In order to overcome this problem, one may modify the expression (2.43) so that the correction factor is normalized with regard to the input signal power

$$b_i(k+1) = b_i(k) + \frac{\alpha\, e(k)x(k-i)}{\sum_{j=0}^{M} x^2(k-j)}. \qquad (2.44)$$

The expression (2.44) represents the Normalized Least Mean Square (NLMS) algorithm, where $\alpha$ is a constant whose value lies within the range $0 < \alpha < 2$ [27].
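A corresponding sketch of the normalized update (2.44) is given below; the small constant eps is an assumption added to guard against division by zero for an all-zero input window and is not part of (2.44).

```python
import numpy as np

def nlms(x, y, M, alpha, eps=1e-12):
    """NLMS adaptation per eq. (2.44), with 0 < alpha < 2."""
    H = np.zeros(M + 1)
    e = np.zeros(len(x))
    for k in range(len(x)):
        X = np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])
        e[k] = y[k] - H @ X
        # correction normalized by the input power sum_j x^2(k-j)
        H = H + alpha * e[k] * X / (X @ X + eps)
    return H, e
```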
The simplicity and easy implementation make the LMS algorithm very attractive for many practical applications. Its main deficiency regards its convergence properties, which are slow and depend on the characteristics of the input signal. The LMS algorithm has only a single free variable, the parameter $\beta$, whose change influences the convergence properties and which has a limited range of admissible values, according to (2.37).

There are a number of algorithms in the literature with better convergence properties than the LMS algorithm, but this is achieved at the cost of an increased computational complexity. One of these is the Least Squares (LS) algorithm.

2.4.2 Least Squares Algorithm (LS Algorithm)

Let us consider the structure of a digital FIR filter shown in Fig. 2.8, which is known up to an unknown set of parameters $b_i$, $i = 0, 1, \ldots, M$. The error signal $e(k)$ for this case can be defined as

$$e(k) = y(k) - x(k)b_0 - x(k-1)b_1 - \cdots - x(k-M)b_M. \qquad (2.45)$$
The same as above, the vector of unknown parameters at the k-th sampling moment is denoted by $\hat{H}(k)$ and defined as

$$\hat{H}^T(k) = [\,b_0(k)\;\; b_1(k)\;\; b_2(k)\;\; \ldots\;\; b_M(k)\,]. \qquad (2.46)$$
The least squares method is based on the criterion according to which the estimation of parameters is optimal if the sum of squared errors is minimal. Thus the criterion function for the LS algorithm is defined by [6]

$$J(k) = \frac{1}{2}\sum_{i=0}^{k} e^2(i). \qquad (2.47)$$

Let us note that expression (2.47) for the LS criterion represents an approxi-
mation of the expression (2.16) for the MSE criterion in which the mathematical
expectation is replaced by the corresponding sum. In the LMS criterion (2.38) this
sum contains only a single term, the square of the actual error signal.

Fig. 2.8 Direct realization of an FIR filter

If the expression (2.45) is written in matrix form, using the whole data package $\{e(i),\; i = 0, 1, \ldots, k\}$, one obtains

$$\begin{bmatrix} e(0) \\ e(1) \\ e(2) \\ \vdots \\ e(k) \end{bmatrix} = \begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ \vdots \\ y(k) \end{bmatrix} - \begin{bmatrix} x(0) & 0 & 0 & \cdots & 0 \\ x(1) & x(0) & 0 & \cdots & 0 \\ x(2) & x(1) & x(0) & \cdots & 0 \\ \vdots & & & & \vdots \\ x(k) & x(k-1) & x(k-2) & \cdots & x(k-M) \end{bmatrix}\begin{bmatrix} b_0(k) \\ b_1(k) \\ b_2(k) \\ \vdots \\ b_M(k) \end{bmatrix} \qquad (2.48)$$

or, in matrix notation,

$$\mathbf{e}(k) = \mathbf{y}(k) - Z(k)\hat{H}(k), \qquad (2.49)$$
where $\mathbf{e}(k)$ represents the error vector, $\mathbf{y}(k)$ is the reference signal vector, and $Z(k)$ is the input data matrix. When forming this equation it was adopted that $y(k) = 0$ for $k < 0$ (causal signal). The criterion function given by (2.47) can be expressed using the vector $\mathbf{e}(k)$, given by (2.49), in the form of a scalar product

$$J(k) = \frac{1}{2}\mathbf{e}^T(k)\mathbf{e}(k), \qquad (2.50)$$
or in the expanded form

$$J(k) = \frac{1}{2}\left[\mathbf{y}^T(k)\mathbf{y}(k) - \mathbf{y}^T(k)Z(k)\hat{H}(k) - \hat{H}^T(k)Z^T(k)\mathbf{y}(k) + \hat{H}^T(k)Z^T(k)Z(k)\hat{H}(k)\right]. \qquad (2.51)$$
By differentiating the criterion (2.51) over the parameter vector $\hat{H}$ one obtains

$$\frac{\partial J(k)}{\partial \hat{H}(k)} = -Z^T(k)\mathbf{y}(k) + Z^T(k)Z(k)\hat{H}(k). \qquad (2.52)$$
The vector minimizing the criterion function (2.50) is obtained by setting the expression (2.52) equal to zero, i.e.

$$\hat{H}(k) = \left[Z^T(k)Z(k)\right]^{-1}Z^T(k)\mathbf{y}(k). \qquad (2.53)$$
When deriving the expression (2.52) the following rules for the differentiation of a scalar over a vector were used [7]

$$\frac{\partial\, \mathbf{y}^T\mathbf{x}}{\partial \mathbf{x}} = \mathbf{y}, \quad \frac{\partial\, \mathbf{x}^T\mathbf{y}}{\partial \mathbf{x}} = \mathbf{y}, \quad \frac{\partial\, \mathbf{x}^T A\mathbf{x}}{\partial \mathbf{x}} = 2A\mathbf{x}, \qquad (2.54)$$

where $\mathbf{x}$ and $\mathbf{y}$ are column vectors of corresponding dimensions, and $A$ is a square and symmetric matrix. Thus, for instance, for the second addend in (2.51) one adopts $\mathbf{y}^T = \mathbf{y}^T(k)Z(k)$ and $\mathbf{x} = \hat{H}(k)$, while for the last addend in (2.51) $\mathbf{x} = \hat{H}(k)$ and $A = Z^T(k)Z(k)$.

The presented procedure for optimal parameter estimation (2.53), obtained by the minimization of the sum of squared errors, represents the least squares (LS) algorithm. It is non-recursive and requires a rather complex computation in each iteration, since the filter coefficients (2.53) are each time recalculated from scratch (such algorithms are denoted as batch or, in the Anglo-Saxon literature, off-line algorithms). A much more convenient form of this algorithm is the recursive one, which updates the values of the coefficients using their previous estimates and the newly obtained measurements of the input signal and the reference signal.
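As an illustration, the batch estimate (2.53) can be computed from a recorded data block as follows; the sketch uses a numerical least-squares solver instead of forming the explicit inverse, a standard numerical precaution rather than a requirement of the method.

```python
import numpy as np

def ls_batch(x, y, M):
    """Batch (off-line) LS estimate of the FIR coefficients, eq. (2.53)."""
    k = len(x)
    Z = np.zeros((k, M + 1))               # data matrix Z(k), eq. (2.48)
    for i in range(k):
        for j in range(min(i, M) + 1):
            Z[i, j] = x[i - j]             # x(i - j); zero for negative indices
    # solves min ||y - Z H||^2, equivalent to (2.53) when Z^T Z is nonsingular
    H, *_ = np.linalg.lstsq(Z, np.asarray(y, dtype=float), rcond=None)
    return H
```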

2.4.3 Recursive Least Squares (RLS) Algorithm

Let the gain matrix in (2.53) be denoted as

$$P(k) = \left[Z^T(k)Z(k)\right]^{-1}. \qquad (2.55)$$

In that case, according to (2.48), the gain matrix in the next step (sampling moment) is

$$P(k+1) = \left[Z^T(k+1)Z(k+1)\right]^{-1} = \left(\left[\,Z^T(k)\;\; X(k+1)\,\right]\begin{bmatrix} Z(k) \\ X^T(k+1) \end{bmatrix}\right)^{-1} = \left[Z^T(k)Z(k) + X(k+1)X^T(k+1)\right]^{-1} = \left[P^{-1}(k) + X(k+1)X^T(k+1)\right]^{-1}, \qquad (2.56)$$
where, according to (2.11),

$$X^T(k+1) = [\,x(k+1)\;\; x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M+1)\,], \qquad (2.57)$$

and the dashed line in the block matrices denotes the column and the row to be appended to the existing matrix $Z(k)$ to form the matrix $Z(k+1)$.
Using the identity (lemma on matrix inversion) [24]

$$\left(A + BCD\right)^{-1} = A^{-1} - A^{-1}B\left(C^{-1} + DA^{-1}B\right)^{-1}DA^{-1}, \qquad (2.58)$$

valid for all matrices of corresponding dimensions and a nonsingular matrix $A$, one may write the value of the gain matrix at the discrete moment $k+1$ as

$$P(k+1) = P(k) - P(k)X(k+1)\left[1 + X^T(k+1)P(k)X(k+1)\right]^{-1}X^T(k+1)P(k). \qquad (2.59)$$

The expression (2.59) is obtained directly from (2.56) and (2.58) if one adopts in (2.58) $A = P^{-1}(k)$, $B = X(k+1)$, $C = 1$, $D = X^T(k+1)$. According to (2.53) and (2.59) it follows that

$$\hat{H}(k+1) = P(k+1)\left[\,Z^T(k)\;\; X(k+1)\,\right]\begin{bmatrix} \mathbf{y}(k) \\ y(k+1) \end{bmatrix} = P(k+1)\left[Z^T(k)\mathbf{y}(k) + X(k+1)y(k+1)\right]. \qquad (2.60)$$
Bearing in mind that, according to (2.53) and (2.55),

$$\hat{H}(k) = P(k)Z^T(k)\mathbf{y}(k), \qquad (2.61)$$

it is further concluded that

$$Z^T(k)\mathbf{y}(k) = P^{-1}(k)\hat{H}(k). \qquad (2.62)$$


Using (2.56), the expression (2.62) can be written in the form

$$Z^T(k)\mathbf{y}(k) = \left[P^{-1}(k+1) - X(k+1)X^T(k+1)\right]\hat{H}(k). \qquad (2.63)$$
By replacing (2.63) into the relation (2.60) one finally obtains the recursive least squares algorithm for the estimation of the unknown parameter vector $H$ of an FIR digital filter

$$\hat{H}(k+1) = \hat{H}(k) + K(k+1)\left[y(k+1) - X^T(k+1)\hat{H}(k)\right], \qquad (2.64)$$
where

$$K(k+1) = P(k+1)X(k+1), \qquad (2.65)$$

or, after the expression (2.59) is replaced into the relation (2.65),

$$K(k+1) = P(k)X(k+1)\left[1 + X^T(k+1)P(k)X(k+1)\right]^{-1}. \qquad (2.66)$$

The starting value of the gain matrix $P$ can be obtained by setting $P(0) = r^2 I$, where $r^2$ is a large positive number and $I$ is the unit matrix of adequate dimensions. The initial value of the parameter vector, $\hat{H}(0)$, can be set to zero. Alternatively, the initial estimation $\hat{H}(0)$ of the unknown parameter vector $H$ of the digital filter can be determined by the non-recursive least squares algorithm (2.53), utilizing an initial package of measurements of the input signal, $x$, and the desired output, $y$, of the filter, with a length of several tens of samples.
The variable

$$e(k+1) = y(k+1) - X^T(k+1)\hat{H}(k) \qquad (2.67)$$

is also denoted as the measurement residual or innovation since, according to relations (2.45), (2.46) and (2.57), the expression

$$\hat{y}(k+1) = X^T(k+1)\hat{H}(k) \qquad (2.68)$$

represents the prediction of the reference signal (desired response or output), $y(k+1)$, based on the input measurement vector $X(k+1)$ and the previous estimation of the filter parameter vector $\hat{H}(k)$. In this manner, the whole measurement of the desired output signal (response), $y(k+1)$, does not introduce new information about the estimated parameters, since even before the measurement $y(k+1)$ was obtained its value could be anticipated according to (2.68), utilizing the digital filter model (2.45) and the previous estimation of the filter parameter vector $\hat{H}(k)$ (relation (2.68) defines the estimated output of the FIR filter before its value has been measured). If the estimation $\hat{H}(k)$ is equal to the accurate value of the parameter vector in (2.45), then $y(k) = \hat{y}(k)$ and $e(k) = 0$ (see Fig. 2.8).
One of the basic deficiencies of this algorithm is its complexity. It can be shown that in each iteration one needs $2.5M^2 + 4M$ multiplication and addition operations ($M$ is the filter order) [6]. The computational complexity thus grows with the square of the filter order $M$, which may often be the limiting factor in certain applications. The computational complexity of the RLS algorithm is much larger compared to the $2(M+1)$ operations per iteration required by the LMS algorithm. On the other hand, the initial convergence properties of the RLS algorithm are significantly better than those of the LMS algorithm. An advantage of the RLS algorithm is also its insensitivity to the correlation properties of the input signal, contrary to the LMS algorithm. Namely, the LS and RLS algorithms do not require a priori information about the statistical properties of the relevant signals, contrary to the LMS algorithm, where the value of the factor $\beta$ is limited by the maximal eigenvalue of the autocorrelation matrix of the input signals.
The complete RLS algorithm is systematized in Table 2.1.
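The steps of Table 2.1 translate directly into a few lines of code; the following Python/NumPy sketch (illustrative names, causal zero padding of the input vector) is one possible transcription.

```python
import numpy as np

def rls(x, y, M, r2=1e6):
    """RLS per Table 2.1: eqs. (2.59), (2.64) and (2.65)."""
    H = np.zeros(M + 1)
    P = r2 * np.eye(M + 1)                 # P(0) = r^2 I, r^2 >> 1
    for k in range(len(x)):
        X = np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])
        e = y[k] - H @ X                   # innovation, eq. (2.67)
        PX = P @ X
        P = P - np.outer(PX, PX) / (1.0 + X @ PX)   # gain matrix, eq. (2.59)
        H = H + (P @ X) * e                # K(k) e(k), eqs. (2.64)-(2.65)
    return H
```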

2.4.4 Weighted Recursive Least Squares (WRLS) Algorithm with Exponential Forgetting Factor

The recursive least squares (RLS) algorithm in its original version is suitable for the estimation of parameters under stationary conditions, i.e. for constant estimated parameters. Basically it is an algorithm with unlimited memory, where all previous measurements are taken into account with equal weight when the parameter estimates are computed at the next moment. In the case of a time-variable system this means that the criterion (2.47) will furnish an estimation of the average behavior of the process over the whole time interval under consideration, and thus such an estimation will not be able to correctly follow momentary changes of the parameters of the digital filter model. To overcome this problem it is necessary

Table 2.1 Flow diagram of the RLS algorithm

1. Initialization:
   $\hat{H}(0) = 0$; $P(0) = r^2 I$, $r^2 \gg 1$
   Read in the first sample of the input signal vector $X(1) = [\,x(1)\;\; 0\;\; \ldots\;\; 0\,]^T$
2. In each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $X(k)$ and $P(k-1)$ are known, calculate:
   Output signal estimation: $\hat{y}(k) = X^T(k)\hat{H}(k-1)$
   Error signal: $e(k) = y(k) - \hat{y}(k) = y(k) - X^T(k)\hat{H}(k-1)$
   Gain matrix: $P(k) = P(k-1) - \dfrac{P(k-1)X(k)X^T(k)P(k-1)}{1 + X^T(k)P(k-1)X(k)}$, $\quad K(k) = P(k)X(k)$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + K(k)e(k)$
   Update of the input vector: $X^T(k+1) = [\,x(k+1)\;\; x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M+1)\,]$, assuming that $x(i) = 0$ for $i \le 0$ (causal excitation signal)
3. Increment the counter $k$ by 1 and repeat the procedure from step 2

to utilize an algorithm with limited memory, which essentially reduces to the introduction of a forgetting factor. In other words, the criterion (2.47) should be modified in such a manner that older measurements take part in it with a decreased weight, in order to enable the RLS algorithm to follow the parameter changes. This means that in the case of filter parameters variable in time one should replace the criterion function (2.47) with an exponentially weighted sum of the squares of error signals

$$J(k) = \frac{1}{2}\sum_{i=0}^{k}\rho^{k-i}e^2(i), \qquad (2.69)$$

where $\rho$ represents the forgetting factor (FF), determining the effective memory of the algorithm, and its value is within the range

$$0 < \rho \le 1. \qquad (2.70)$$
For stationary conditions (not changing in time) one applies $\rho = 1$; in this case the criterion function defined by (2.69) becomes equal to (2.47), and the algorithm that recursively minimizes the given criterion has unlimited memory. In this way the estimated parameters have a high accuracy since, asymptotically taken, the influence of noisy measurements is eliminated by averaging. For the conditions when the estimated parameters change, the forgetting factor $\rho = 1$ is not convenient, because the adaptation of the estimated parameters towards the real values is relatively slow. Because of that one should use $\rho < 1$. By utilizing $\rho < 1$ one obtains different weightings for previous measurements, i.e. the previous measurements are taken with a smaller weight compared to the more recent ones. Assuming that a nonstationary signal consists of stationary segments of a given length, the forgetting factor $\rho$ can be determined in the following manner. Starting from the assumption that the value of $1 - \rho$ is close to zero, one may write

$$\rho^k = e^{k\ln\rho} = e^{k\ln\left(1 - (1-\rho)\right)} \approx e^{-k(1-\rho)},$$

i.e.

$$\rho^k \approx e^{-k/\tau}, \quad \tau = \frac{1}{1-\rho}. \qquad (2.71)$$

In this manner, by choosing a forgetting factor $\rho < 1$, the effective memory of the algorithm becomes

$$\tau = -\frac{1}{\ln\rho}, \qquad (2.72)$$

which, in the case when the value of $\rho$ is close to one, is approximately equal to

$$\tau \approx \frac{1}{1-\rho}. \qquad (2.73)$$
1q
Expression (2.71) shows that measurements older than $\tau$ ($k > \tau$) are allocated a weight smaller than $e^{-1} \approx 0.36$, compared to the unit weight allocated to the current measurement ($k = 0$). In other words, the thus chosen weight factor $\rho$ in the criterion (2.69) corresponds to an exponentially decaying memory of the algorithm, where the time constant $\tau$ of the exponential curve corresponds to the memory length in the adopted units of time, i.e. in the number of signal sampling periods.
Through minimization of the criterion function (2.69) one arrives at the Weighted Recursive Least Squares (WRLS) algorithm. The derivation of the WRLS algorithm is identical to that of the RLS algorithm. Namely, the criterion (2.69) can be written in the quadratic matrix form

$$J(k) = \frac{1}{2}\mathbf{e}^T(k)W(k)\mathbf{e}(k), \qquad (2.74)$$

where

$$\mathbf{e}(k) = \begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(k) \end{bmatrix}, \quad W(k) = \begin{bmatrix} w(0) & 0 & \cdots & 0 \\ 0 & w(1) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w(k) \end{bmatrix}, \quad w(i) = \rho^{k-i}, \;\; i = 0, 1, \ldots, k. \qquad (2.75)$$

Since, according to (2.49), $\mathbf{e}(k) = \mathbf{y}(k) - Z(k)\hat{H}(k)$, by replacing this expression into (2.74) one obtains

1 ^ T kZT kWkyk
J k yT kWkyk  H
2 2:76
 yT kWkZkH^ k H ^ T kZT kWkZkH
^ k

Similarly to the derivation of the standard RLS algorithm, if one applies the rules (2.54) for the differentiation of the corresponding terms in (2.76) over the vector $\hat{H}$, one obtains the necessary condition for the minimum of the criterion (2.74)

$$\frac{\partial J(k)}{\partial \hat{H}(k)} = -Z^T(k)W(k)\mathbf{y}(k) + Z^T(k)W(k)Z(k)\hat{H}(k) = 0, \qquad (2.77)$$
and the non-recursive algorithm of weighted least squares (non-recursive WRLS algorithm) follows directly from it

$$\hat{H}(k) = \left[Z^T(k)W(k)Z(k)\right]^{-1}Z^T(k)W(k)\mathbf{y}(k). \qquad (2.78)$$
The recursive version of the WRLS algorithm is obtained from (2.78) in a manner identical to the one used to derive the recursive algorithm (2.59), (2.64)–(2.66) from its non-recursive form (2.53). According to the expression (2.78) one may write the block-matrix relation

$$\hat{H}(k+1) = P(k+1)\left[\,Z^T(k)\;\; X(k+1)\,\right]\begin{bmatrix} \rho W(k) & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \mathbf{y}(k) \\ y(k+1) \end{bmatrix} = P(k+1)\left[\rho Z^T(k)W(k)\mathbf{y}(k) + X(k+1)y(k+1)\right], \qquad (2.79)$$
where

$$P(k+1) = \left[Z^T(k+1)W(k+1)Z(k+1)\right]^{-1}, \quad Z^T(k+1) = \left[\,Z^T(k)\;\; X(k+1)\,\right],$$

$$W(k+1) = \begin{bmatrix} \rho W(k) & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{y}(k+1) = \begin{bmatrix} \mathbf{y}(k) \\ y(k+1) \end{bmatrix}. \qquad (2.80)$$
According to (2.80) it further follows that

$$P^{-1}(k+1) = \rho Z^T(k)W(k)Z(k) + X(k+1)X^T(k+1) = \rho P^{-1}(k) + X(k+1)X^T(k+1). \qquad (2.81)$$
By applying the lemma on matrix inversion (2.58) to the relation (2.81), adopting $A = \rho P^{-1}(k)$, $B = X(k+1)$, $C = 1$ and $D = X^T(k+1)$, it can be written

$$P(k+1) = \frac{1}{\rho}P(k) - \frac{1}{\rho}P(k)X(k+1)\left[1 + \frac{1}{\rho}X^T(k+1)P(k)X(k+1)\right]^{-1}X^T(k+1)\frac{1}{\rho}P(k). \qquad (2.82)$$

Bearing in mind that, according to (2.78) and (2.80),

$$\hat{H}(k) = P(k)Z^T(k)W(k)\mathbf{y}(k), \qquad (2.83)$$

we conclude that

$$Z^T(k)W(k)\mathbf{y}(k) = P^{-1}(k)\hat{H}(k), \qquad (2.84)$$

from where, after introducing the expression (2.81), one obtains

$$Z^T(k)W(k)\mathbf{y}(k) = \frac{1}{\rho}\left[P^{-1}(k+1) - X(k+1)X^T(k+1)\right]\hat{H}(k). \qquad (2.85)$$
By replacing the expression (2.85) into the relation (2.79), one obtains

$$\hat{H}(k+1) = \hat{H}(k) + P(k+1)X(k+1)\left[y(k+1) - X^T(k+1)\hat{H}(k)\right]. \qquad (2.86)$$
The expression

$$K(k+1) = P(k+1)X(k+1) \qquad (2.87)$$

defines the gain matrix of the recursive algorithm (2.86) for the estimation of the digital filter parameters. By replacing the expression (2.82) into the relation (2.87), the latter can be written in the alternative form

$$K(k+1) = P(k)X(k+1)\left[\rho + X^T(k+1)P(k)X(k+1)\right]^{-1}. \qquad (2.88)$$
Relations (2.82), (2.86) and (2.87) or (2.88) define the recursive WRLS algorithm, i.e.

$$\hat{H}(k) = \hat{H}(k-1) + K(k)\left[y(k) - X^T(k)\hat{H}(k-1)\right], \qquad (2.89)$$

where

$$P(k) = \frac{1}{\rho}\left\{P(k-1) - P(k-1)X(k)\left[\rho + X^T(k)P(k-1)X(k)\right]^{-1}X^T(k)P(k-1)\right\}. \qquad (2.90)$$
2:90
To start the recursive procedure (2.88)(2.90) it is necessary to adopt the initial
values P(0) and H ^ 0 and they are chosen in the same manner as in the RLS
^
algorithm, i.e. H0 0 and P0 r2 I, where r2  1, and I is the unit matrix
with corresponding dimensions.
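For illustration, the recursion (2.88)–(2.90) may be transcribed as follows (a sketch; the default values of rho and r2 are arbitrary illustrative choices).

```python
import numpy as np

def wrls(x, y, M, rho=0.98, r2=1e6):
    """WRLS with exponential forgetting, eqs. (2.88)-(2.90)."""
    H = np.zeros(M + 1)
    P = r2 * np.eye(M + 1)                 # P(0) = r^2 I, r^2 >> 1
    for k in range(len(x)):
        X = np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])
        e = y[k] - H @ X                   # residual, eq. (2.91)
        PX = P @ X
        K = PX / (rho + X @ PX)            # gain vector, eq. (2.88)
        P = (P - np.outer(K, PX)) / rho    # gain matrix update, eq. (2.90)
        H = H + K * e                      # parameter update, eq. (2.89)
    return H
```

Setting rho = 1 recovers the standard RLS algorithm of Table 2.1.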
The very name forgetting factor, $\rho$, suggests that it represents the measure of taking previous measurements into account in the estimation process. In other words, the choice of the value of the forgetting factor determines how quickly the influence of previous measurements is neglected. In the estimation of stationary parameters it is desirable that the algorithm takes all previous measurements into account with similar weight, because the system does not change with time; in this case one assumes $\rho = 1$, i.e. $\tau \to \infty$. However, the situation is completely

different if the parameters vary with time. In this case the so-called older measurements do not have a large significance, because they do not carry information about the newly occurring changes. Because of that their importance is decreased by choosing a forgetting factor with a value smaller than 1. For instance, $\rho = 0.9$ corresponds, according to (2.73), to an algorithm memory of $\tau = 10$ signal samples. Through the choice of a forgetting factor $\rho < 1$ one achieves a faster adaptation of the parameters to the accurate values, in such a manner that better results are obtained with smaller values of $\rho$ (which corresponds to smaller $\tau$), but one simultaneously increases the variance of the parameter estimates because of the influence of noisy measurements. On the other hand, through the use of a fixed forgetting factor $\rho < 1$, the gain matrix $P(k)$ is constantly divided by a factor smaller than 1, which may lead to the consequence that the gain matrix (2.90) achieves a very high value; the algorithm thus becomes very sensitive to random disturbances or numerical errors propagating through the measurement residual

$$e(k) = y(k) - X^T(k)\hat{H}(k-1). \qquad (2.91)$$
The basic deficiency of the application of RLS algorithms with a fixed FF to time-variant systems follows from here. The choice of the time constant $\tau$, i.e. of the forgetting factor $\rho$, depends on the expected dynamics of the filter parameter changes, and they should be chosen so that the parameters remain approximately constant on an interval of the length of $\tau$ signal samples. Starting from expression (2.71), for nonstationary signals containing intervals of quasi-stationarity, on the nonstationary segments it is useful to utilize $\rho < 1$, which corresponds to a small $\tau$. For stationary segments one should adopt $\rho = 1$, which corresponds to a large value of $\tau$ ($\tau \to \infty$). In order to achieve an adequate ability of adaptation to the changes of time-variant systems, including nonstationary changes, and simultaneously to avoid a significant influence on the variance of the estimated parameters in the intervals without changes, as well as on their accuracy, it is necessary to vary the forgetting factor adaptively during the operation of the algorithm itself. A wider discussion of the strategies for choosing a variable forgetting factor within the adaptive algorithm is given in Chap. 3. In practical situations one sometimes adopts that the forgetting factor $\rho$ varies over time within the quasi-stationarity interval and exponentially increases to 1 [24]. This corresponds to choosing

$$\rho(k) = \rho_0\rho(k-1) + (1 - \rho_0), \quad k = 1, 2, \ldots \qquad (2.92)$$
where the usual choice is $\rho_0 = 0.99$ and $\rho(0) = 0.95$. According to (2.92) one may write

$$\rho(1) = \rho_0\rho(0) + (1 - \rho_0),$$
$$\rho(2) = \rho_0\rho(1) + (1 - \rho_0) = \rho_0^2\rho(0) + \rho_0(1 - \rho_0) + (1 - \rho_0),$$

from which one inductively concludes that

$$\rho(k) = \rho_0^k\rho(0) + (1 - \rho_0)\sum_{i=0}^{k-1}\rho_0^{k-1-i},$$

i.e., if one substitutes $k - 1 - i = j$,

$$\rho(k) = \rho_0^k\rho(0) + (1 - \rho_0)\sum_{j=0}^{k-1}\rho_0^{j} = \rho_0^k\rho(0) + (1 - \rho_0)\frac{1 - \rho_0^k}{1 - \rho_0} = \rho_0^k\rho(0) + \left(1 - \rho_0^k\right). \qquad (2.93)$$

According to the above expression one obtains

$$\lim_{k\to\infty}\rho(k) = 1. \qquad (2.94)$$
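A short sketch of the schedule (2.92) (illustrative function name) shows this exponential approach of $\rho(k)$ to unity:

```python
import numpy as np

def forgetting_schedule(n, rho0=0.99, rho_init=0.95):
    """Exponentially increasing forgetting factor, eq. (2.92)."""
    rho = np.empty(n)
    rho[0] = rho_init                      # rho(0)
    for k in range(1, n):
        rho[k] = rho0 * rho[k - 1] + (1.0 - rho0)
    return rho                             # tends to 1, eqs. (2.93)-(2.94)
```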

Table 2.2 systematizes the WRLS algorithm.

2.5 Adaptive Algorithms for the Estimation of the Parameters of IIR Filters

FIR adaptive filters have a unimodal criterion function with a single global minimum and are not susceptible to instability under changes of the values of their parameters, because the corresponding filter transfer function has all its poles at the origin, $z = 0$, of the complex z-plane, i.e. all poles of the transfer function are within the stability region (the unit circle $|z| = 1$) [3]. The process of convergence of the parameters of an FIR filter towards the optimal values, corresponding to the minimum of the adopted criterion, is well researched and the corresponding results are available in the literature [4]. These properties make FIR filters more desirable than other structures, and because of that they have a very wide practical application [17].
However, with an increase of the length of the impulse response of the modeled
system one must proportionally increase the number of the filter parameters. This
leads to an increased complexity of the adaptive algorithm, a decrease of the
convergence speed, and in the case of exceptionally long impulse response also to
unacceptably high complexity of the suitable digital hardware.
It is possible to overcome this deficiency by using adaptive filters with infinite
impulse response, the IIR filters.
The main advantage of the adaptive IIR filters in comparison to the adaptive
FIR filters is that by using the same or even a smaller number of parameters one is
able to significantly better describe a given system for signal transfer and pro-
cessing. The response of this system can be much better described by the output

Table 2.2 Flow diagram of the WRLS algorithm

1. Initialization:
   $\hat{H}(0) = 0$; $P(0) = r^2 I$, $r^2 \gg 1$
   Read in the first sample of the input signal vector $X(1) = [\,x(1)\;\; 0\;\; \ldots\;\; 0\,]^T$
2. In each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $X(k)$ and $P(k-1)$ are known, calculate:
   Output signal estimation: $\hat{y}(k) = X^T(k)\hat{H}(k-1)$
   Error signal: $e(k) = y(k) - X^T(k)\hat{H}(k-1)$
   Gain matrix:
   $P(k) = \dfrac{1}{\rho}\left[P(k-1) - \dfrac{P(k-1)X(k)X^T(k)P(k-1)}{\rho + X^T(k)P(k-1)X(k)}\right]$, $\quad K(k) = P(k-1)X(k)\left[\rho + X^T(k)P(k-1)X(k)\right]^{-1}$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + K(k)e(k)$
   Update of the input vector: $X^T(k+1) = [\,x(k+1)\;\; x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M+1)\,]$, assuming that $x(i) = 0$ for $i \le 0$ (causal excitation signal)
3. Increment the counter $k$ by 1 and repeat the procedure from step 2

signal from a filter whose transfer function has both zeroes and poles (IIR) than by one whose transfer function has zeroes only (FIR). So, for example, an adaptive IIR filter of a sufficiently high order can accurately model an unknown system described by a certain number of zeroes and poles, while an adaptive FIR filter can only approximate it. In other words, to describe some system with a given accuracy, an IIR filter generally requires a much smaller number of coefficients than the corresponding FIR filter. Figure 2.1 shows the general structure of an adaptive filter, which may also be an adaptive IIR filter: $x(k)$ denotes the input signal, $y_o(k)$ the output signal, $e(k)$ the error signal, $H$ the unknown parameter vector of the estimated filter, and $y(k)$ the reference signal. The adaptive IIR filter consists of two basic parts: a digital IIR filter, determined by the values of the variable parameters of the vector $H$, and the corresponding adaptive algorithm by which the unknown parameters are updated so as to minimize a given criterion function of the error signal $e(k)$.
Basically there are two approaches to adaptive digital IIR filtering, which correspond to different formulations of the error signal $e(k)$. They are denoted as the equation error (EE) method and the output error (OE) method. The EE method is characterized by updating the feedback coefficients of the IIR adaptive filter in the domain of zeroes, which basically leads to the adaptive FIR filters and the corresponding adaptive algorithms from their domain. The adaptive IIR filters based on the EE method are shown schematically in Fig. 2.9. The signal $y_e(k)$ is defined by the following expression

$$y_e(k) = \sum_{i=1}^{N} a_i(k)y(k-i) + \sum_{i=0}^{M} b_i(k)x(k-i), \qquad (2.95)$$

Fig. 2.9 Block diagram of an EE IIR digital filter

where $\{a_i(k)\}$ and $\{b_i(k)\}$ are the variable parameters of the IIR filter, estimated by a suitable adaptive algorithm (the adaptive algorithm recursively minimizes the adopted criterion function).

The signal $y_e(k)$ is a function of the momentary value and the $M$ previous values of the input signal $x(k)$, as well as of the $N$ previous values of the reference signal $y(k)$. One should note that it does not depend on the values of the filter output signal, $y_o(k)$. Neither the reference signal $y(k)$ nor the input signal $x(k)$ depend on the filter coefficients, so the determination of $y_e(k)$ with regard to the filter parameters is a non-recursive procedure. Expression (2.95) can be written in the polynomial form

$$y_e(k) = B\left(k, z^{-1}\right)x(k) + A\left(k, z^{-1}\right)y(k), \qquad (2.96)$$

where the polynomials at the discrete moment $kT$ ($T$ is the signal sampling period) are

$$A\left(k, z^{-1}\right) = \sum_{i=1}^{N} a_i(k)z^{-i}, \quad B\left(k, z^{-1}\right) = \sum_{i=0}^{M} b_i(k)z^{-i}. \qquad (2.97)$$

Here $z^{-1}$ represents the unit delay operator, i.e. $x(k)z^{-1} = x(k-1)$. In the EE model the error signal $e_e(k)$ is defined in the following manner

$$e_e(k) = y(k) - y_e(k) = y(k) - B\left(k, z^{-1}\right)x(k) - A\left(k, z^{-1}\right)y(k). \qquad (2.98)$$
It follows from expressions (2.96)–(2.98) that $e_e(k)$ is a linear function of the coefficients of the polynomials $A$ and $B$ (the elements of the vector $H$), i.e. that the criterion function defined as the mean square value of the equation error, MSE $= E\{e_e^2(k)\}$, is a quadratic function with a unique global minimum. Therefore the properties of the EE adaptive IIR filters are similar to those of the adaptive FIR

filters (where $A(k, z^{-1}) = 0$). They utilize similar adaptive algorithms, with similar convergence properties as the FIR adaptive filters [1]. Equation (2.95) can also be represented in the vectorial form

$$y_e(k) = H^T(k)X_e(k), \qquad (2.99)$$

which represents a scalar product of the following two vectors

$$H^T(k) = [\,b_0(k)\;\; b_1(k)\;\; \ldots\;\; b_M(k)\;\; a_1(k)\;\; a_2(k)\;\; \ldots\;\; a_N(k)\,], \qquad (2.100)$$

$$X_e(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y(k-1)\;\; y(k-2)\;\; \ldots\;\; y(k-N)\,]^T. \qquad (2.101)$$
The vector $H(k)$ contains the estimated parameters at the discrete moment $k$, and the vector $X_e(k)$ contains the actual and delayed values of the input and reference signals. It is important to note that $X_e(k)$ is not a function of the vector $H(k)$. Various algorithms may be used for the estimation of the parameters, such as the Recursive Least Squares (RLS), the Weighted Recursive Least Squares (WRLS), the Least Mean Squares (LMS) and others [4, 6]. These algorithms were described in detail in the previous sections.
IIR adaptive algorithms based on the EE model may converge to values shifted (biased) with respect to the optimal ones, which leads to an erroneous estimation of the parameters. Although EE adaptive IIR filters have good convergence properties, they may in principle be completely unacceptable as models if this shift in the parameter estimates is significant. Let us note that the estimation of the parameter vector is unbiased if the mathematical expectation of the parameter vector estimate (the estimated parameters represent a random vector) is equal to its accurate (optimal) value.
OE adaptive IIR filters update the coefficients of the IIR filter in the feedback branch directly, both in the domain of zeroes and in the domain of poles. In this case the estimated parameters are not biased with respect to the optimal values, but the adaptive algorithm may converge to a local minimum of the criterion function. This means that the estimated values do not have to correspond to the optimal ones. The block diagram of the OE adaptive IIR filter is shown in Fig. 2.10. In this case the output error (OE) is defined as

$$e_o(k) = y(k) - y_o(k). \qquad (2.102)$$
The error signal $e_o(k)$ is a nonlinear function of the filter coefficients, and the criterion function MSE $= E\{e_o^2(k)\}$ may have more than one local minimum. The corresponding adaptive algorithms in principle converge more slowly than the EE algorithms and may converge to local minima. However, the distinction of the OE method compared to the EE method is that the adaptive filter generates the output $y_o(k)$ based on the input signal $x(k)$ only, while in the case of the EE method the reference signal, $y(k)$, also takes part in the adaptation process. The signal at the output of the OE IIR adaptive filter is defined by the difference equation

Fig. 2.10 Block diagram of the OE adaptive digital IIR filter
$$y_o(k) = \sum_{i=1}^{N} a_i(k)y_o(k-i) + \sum_{i=0}^{M} b_i(k)x(k-i). \qquad (2.103)$$

It is seen from (2.103) that the output signal $y_o(k)$ is a function of its own $N$ previous values, as well as of the actual and the $M$ previous values of the input signal $x(k)$. Such a feedback over the output signal significantly influences the adaptive algorithm, making it much more complex compared to the EE approach. Analogously to expressions (2.96) and (2.97), the expression (2.103) can be represented in the polynomial form

$$y_o(k) = \frac{B\left(k, z^{-1}\right)}{1 - A\left(k, z^{-1}\right)}x(k), \qquad (2.104)$$
or as a vectorial equation (scalar product)

$$y_o(k) = H^T(k)X_o(k), \qquad (2.105)$$

where the vector of variable parameters $H(k)$ is defined by (2.100), and

$$X_o^T(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y_o(k-1)\;\; y_o(k-2)\;\; \ldots\;\; y_o(k-N)\,]. \qquad (2.106)$$
The output $y_o(k)$ is a nonlinear function of the parameters of the vector $H$, because $\{y_o(k-i),\; i = 1, 2, \ldots, N\}$, as elements of the vector $X_o$, are functions of the filter coefficients in the previous iterations. This fact significantly complicates the synthesis of adaptive algorithms for the estimation of the parameters of such a digital filter.
The adaptive algorithms described in this section can be represented in a general form denoted as the Gauss-Newton algorithm [25]. The Gauss-Newton algorithm represents a stochastic version of the deterministic Newton algorithm (2.30). To illustrate this algorithm, let us consider the model of a stochastic signal described by the difference equation (the linear regression equation)

$$y(k) = H^T X(k) + e(k), \qquad (2.107)$$

where $y(k)$ and $X(k)$ are measurable variables, and $H$ is the unknown parameter vector to be determined. In the above expression $e(k)$ represents a random residual or error, and the natural way to determine $H$ is to minimize the variance of the error, i.e. the mean square error

$$J(H) = \frac{1}{2}E\left\{e^2(k)\right\} = \frac{1}{2}E\left\{\left[y(k) - H^T X(k)\right]^2\right\}, \qquad (2.108)$$

where $E\{\cdot\}$ denotes the mathematical expectation. Since $J(H)$ is a quadratic function of the argument $H$, its minimum is obtained by solving the equation

$$\nabla J(H) = \frac{\partial J}{\partial H} = -E\left\{X(k)\left[y(k) - H^T X(k)\right]\right\} = 0. \qquad (2.109)$$
The quoted problem cannot be solved exactly, since the joint probability density function of the random variables $y(k)$, $X(k)$, which is necessary to determine the mathematical expectation, is unknown. One of the ways to overcome this difficulty is to replace the unknown mathematical expectation by the corresponding arithmetic mean, i.e. to adopt the approximation

$$E\{f(x)\} \approx \frac{1}{M}\sum_{i=1}^{M} f(i), \qquad (2.110)$$

which leads to the least square algorithm, described in detail in the precious
section. Another possibility is to apply a stochastic version of the Newton deter-
ministic scheme (2.30)
 1
Hk Hk  1  ck r2 J Hk  1 rJ Hk  1; 2:111
where the Hessian is

$$\nabla^2 J\left(H(k-1)\right) = \frac{d^2 J(k)}{dH^2} = \frac{d}{dH}\nabla J\left(H(k-1)\right) = E\left\{X(k)X^T(k)\right\}. \qquad (2.112)$$
It can be seen that the Hessian does not depend on $H$. The Hessian can be determined as the solution $R$ of the equation

$$E\left\{X(k)X^T(k) - R\right\} = 0. \qquad (2.113)$$
To iteratively solve this equation one may further use the Robbins-Monro
stochastic approximation procedure [24, 28].
A typical problem of stochastic approximation may be formulated in the following manner. Let $\{e(k)\}$ represent a series of stochastic variables with an identical distribution function, where $k$ denotes the index of a discrete moment. Let there further be given a function $Q(x, e(k))$ of two arguments $x$ and $e(k)$, whose form does not have to be known accurately, but for each adopted $x$ and the

obtained $e(k)$ one can determine the value of the function $Q(\cdot, \cdot)$. The problem is now to determine the solution of the equation

$$E\{Q(x, e(k))\} = f(x) = 0, \qquad (2.114)$$

where $E\{\cdot\}$ denotes the mathematical expectation with regard to the random variable $e(k)$, where it is assumed that the user does not know the distribution function, i.e. the probability density, of the stochastic variable $e(k)$. The posed problem reduces to the determination of a series $x(k)$, $k = 1, 2, \ldots$, the calculation of the corresponding values of $Q(x, e(k))$, and the determination of the solution of the Eq. (2.114). This equation is also denoted as the regression equation. Its trivial solution consists in fixing the variable $x$, determining a large number of values of $Q(x, e(k))$ for the adopted $x$, with the aim of obtaining a good estimation of $f(x)$, and repeating such a procedure for a certain number of new values of the variable $x$ until the solution of the regression equation (2.114) is found. Obviously such a procedure is not efficient, since much time is spent estimating $f(x)$ for values of the variable $x$ which differ significantly from the looked-for solution of (2.114). Robbins and Monro proposed the following iterative procedure for the determination of the root of the Eq. (2.114)
$$\hat{x}(k) = \hat{x}(k-1) + c(k)Q\left(\hat{x}(k-1), e(k)\right), \qquad (2.115)$$

where $\{c(k)\}$ is a series of positive scalar variables tending to zero with an increase of the index $k$. The convergence properties of the proposed procedure were analyzed by Robbins and Monro, Blum, and Dvoretsky, where it was shown that the series (2.115) will, under certain conditions, converge to the solution of the Eq. (2.114). A typical assumption in these analyses was that the terms of the series $\{e(k)\}$ are independent stochastic vectors, which is not fulfilled in the general case [24, 28]. In particular, for the problem under consideration (2.113),

$$\hat{x} = R, \quad e(k) = X(k), \quad Q\left(\hat{x}, e(k)\right) = X(k)X^T(k) - R, \qquad (2.116)$$

so that the algorithm (2.115) reduces to the form

$$R(k) = R(k-1) + c(k)\left[X(k)X^T(k) - R(k-1)\right] = \left[1 - c(k)\right]R(k-1) + c(k)X(k)X^T(k). \qquad (2.117)$$

In this manner, the estimation of the Hessian $\nabla^2 J(H)$ at the moment $k$ is represented by the matrix $R(k)$. Using this estimation, the unknown parameter vector at the moment $k$ can be estimated using the stochastic Newton algorithm (2.111), i.e.

$$H(k) = H(k-1) + c(k)R^{-1}(k)X(k)\left[y(k) - X^T(k)H(k-1)\right], \qquad (2.118)$$

where $X(k)\left[y(k) - X^T(k)H(k-1)\right]$ represents the approximation of the negative gradient $-\nabla J(H)$ at the moment $k-1$, obtained when the mathematical expectation in (2.109) is approximated by a single realization of the random process.
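For illustration, a compact sketch of the recursions (2.117) and (2.118) might read as follows; the gain sequence c(k) = 1/k is one common Robbins-Monro choice and is an assumption here, not prescribed by the text.

```python
import numpy as np

def stochastic_newton(X_seq, y_seq, c=lambda k: 1.0 / k):
    """Stochastic Newton scheme combining eqs. (2.117) and (2.118)."""
    n = len(X_seq[0])
    H = np.zeros(n)
    R = np.eye(n)                          # initial Hessian estimate
    for k, (X, yk) in enumerate(zip(X_seq, y_seq), start=1):
        ck = c(k)
        # Robbins-Monro update of the Hessian estimate, eq. (2.117)
        R = (1.0 - ck) * R + ck * np.outer(X, X)
        # single-realization gradient approximation and update, eq. (2.118)
        H = H + ck * np.linalg.solve(R, X * (yk - X @ H))
    return H
```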

A modified version of this algorithm, useful for the estimation of the unknown parameters of the IIR filter, is given by the expression [24]

$$\hat{H}(k+1) = \hat{H}(k) + \alpha R^{-1}(k+1)X_F(k)e_G(k), \qquad (2.119)$$
where $X_F(k)$ and $e_G(k)$ are filtered versions of the input signal vector $X(k)$ and the error signal $e(k)$, in accordance with

$$X_F(k) = F(k, z)X(k), \quad e_G(k) = G(k, z)e(k), \qquad (2.120)$$

where $X(k)$ and $e(k)$ may be $X_e(k)$ and $e_e(k)$ for the EE method, or $X_o(k)$ and $e_o(k)$ for the OE method, respectively. The filters, i.e. the polynomials, $F(k, z)$ and $G(k, z)$ are defined as

$$F(k, z) = \sum_{i=0}^{n} f_i(k)z^{-i}, \qquad (2.121)$$

$$G(k, z) = \sum_{i=0}^{m} g_i(k)z^{-i}. \qquad (2.122)$$

The scalar variable $\alpha$ controls the convergence speed of the algorithm, and the matrix $R(k)$ is updated as

$$R(k+1) = \rho R(k) + \alpha X_F(k)X_F^T(k), \qquad (2.123)$$


where q 1  a is the forgetting factor. Typical values for q are between 0.9 and
0.99, which corresponds to the effective memory between 10 and 100 samples,
respectively [1] (see section 2.4.4).
From (2.119) it follows that to update the filter parameters it is necessary to know the inverse of the matrix $R$, i.e. $R^{-1}$. Computing the inverse anew is very complex from the computational point of view, and thus $R^{-1}$ is most often updated directly, using the lemma on matrix inversion (2.58) and the expression (2.123)

$$R^{-1}(k+1) = \frac{1}{\rho}\left[R^{-1}(k) - \frac{R^{-1}(k)X_F(k)X_F^T(k)R^{-1}(k)}{\rho/\alpha + X_F^T(k)R^{-1}(k)X_F(k)}\right]. \qquad (2.124)$$

The role of the matrix $R^{-1}$ is to speed up the convergence of the adaptive algorithm, and the price to pay is an increased computational complexity. If the value $R^{-1}(k+1)$ in (2.119) is replaced by the unit matrix $I$, one obtains an algorithm with worse convergence properties, but also with a lower complexity, of the order of $M + N$ arithmetic operations instead of $(M + N)^2$ in the basic algorithm.
The parameter vector is most often initialized as $H(0) = 0$, where $0$ is the zero vector of corresponding dimensions, and $R(0) = r^2 I$, where $r^2$ is a small positive scalar. Other initial values may be defined too, but one has to take care that $R$ is a positive definite matrix, in order to enable the determination of the inverse matrix $R^{-1}$, and that the poles

Table 2.3 Flow diagram of the EE-WRLS algorithm

1. Initialization:
   $\hat{H}(0) = 0$; $R^{-1}(0) = r^2 I$, $r^2 \gg 1$
   Generate the samples of the input signal $x(0)$ and of the reference signal $y(0)$
   Initial equation error: $e_e(0) = y(0)$
   Read in the forgetting factor $0.9 \le \rho \le 0.99$; $\alpha = 1 - \rho$
2. In each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $e_e(k-1)$, $R^{-1}(k-1)$ and $X_e(k)$ are known, calculate:
   Gain matrix:
   $R^{-1}(k) = \dfrac{1}{\rho}\left[R^{-1}(k-1) - \dfrac{R^{-1}(k-1)X_e(k-1)X_e^T(k-1)R^{-1}(k-1)}{\rho/\alpha + X_e^T(k-1)R^{-1}(k-1)X_e(k-1)}\right]$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + \alpha R^{-1}(k)X_e(k-1)e_e(k-1)$
   Update of the data vector:
   $X_e(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y(k-1)\;\; y(k-2)\;\; \ldots\;\; y(k-N)\,]^T$, where $x(i) = y(i) = 0$ for $i < 0$ (causal system)
   Equation error: $e_e(k) = y(k) - X_e^T(k)\hat{H}(k)$
3. Increment the iteration counter $k$ by 1 and repeat the procedure from step 2

of the polynomial $1 - A(k, z^{-1})$ in (2.104) are always within the unit circle of the complex z-plane, in order to ensure filter stability.
For the EE method one takes

$$F(k, z) = G(k, z) = 1, \qquad (2.125)$$

so that

$$X_F(k) = X_e(k), \quad e_G(k) = e_e(k). \qquad (2.126)$$

The corresponding algorithm is the WRLS algorithm, and if one takes the unit matrix $I$ for $R(k+1)$, one obtains the LMS algorithm. The flow diagram of the EE-WRLS algorithm is given in Table 2.3.
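Because the EE formulation is linear in the parameters, the FIR algorithms above can be reused once the regressor (2.101) is formed; a minimal sketch of this construction (illustrative names, causal zero padding) is:

```python
import numpy as np

def ee_regressor(x, y, k, M, N):
    """Regression vector X_e(k) of eq. (2.101) for the EE method."""
    xs = [x[k - i] if k >= i else 0.0 for i in range(M + 1)]
    # delayed reference samples y(k-1), ..., y(k-N), zero for negative time
    ys = [y[k - i] if k >= i else 0.0 for i in range(1, N + 1)]
    return np.array(xs + ys)
```

Feeding this vector and the reference y(k) into the WRLS recursion sketched in Sect. 2.4.4 reproduces the EE-WRLS scheme of Table 2.3.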

2.5.1 Recursive Prediction Error Algorithm (RPE Algorithm)

The recursive prediction error (RPE) algorithm updates the parameters of the vector $H$ so as to minimize the mean square output error, $\xi = E\{e_o^2(k)\}$, where $e_o$ is the output error. Since $\xi$ is in general an unknown variable, the algorithm is designed to minimize in each iteration an estimate of the actual value of $\xi$, expressed through the instantaneous squared error $e_o^2(k)$; the consequence of such an approximation is a relatively noisy estimation of the filter parameters.

The RPE algorithm updates the parameters $H(k)$ in the negative direction of the gradient of the criterion function $\hat{\xi}(k)$. Taking into account (2.102), the gradient of the criterion function

$$\hat{\xi}(k) = \frac{1}{2}e_o^2(k)$$

is

$$\nabla\hat{\xi}(k) = \frac{\partial\hat{\xi}(k)}{\partial H(k)} = e_o(k)\nabla e_o(k) = -e_o(k)\nabla y_o(k), \qquad (2.127)$$
where, according to (2.103),

$$\nabla y_o(k) = \left[\frac{\partial y_o(k)}{\partial a_1(k)}\;\; \frac{\partial y_o(k)}{\partial a_2(k)}\;\; \ldots\;\; \frac{\partial y_o(k)}{\partial a_N(k)}\;\; \frac{\partial y_o(k)}{\partial b_0(k)}\;\; \frac{\partial y_o(k)}{\partial b_1(k)}\;\; \ldots\;\; \frac{\partial y_o(k)}{\partial b_M(k)}\right]^T. \qquad (2.128)$$

The last term in (2.127) stems from the definition of the output error (OE) and the fact that the reference signal $y(k)$ does not depend on the values of the parameters $H(k)$.
Since, according to (2.105), $\nabla y_o(k) = \nabla\left[H^T(k)X_o(k)\right]$, and $X_o(k)$ in (2.106) also contains previous values of the output, $y_o(k-i)$, which are functions of the previous estimations of the parameter vector $H(k)$, an obvious dependence between $X_o(k)$ and $H(k)$ follows. According to the expression (2.103), one may write

$$\frac{\partial y_o(k)}{\partial a_j(k)} = y_o(k-j) + \sum_{i=1}^{N} a_i(k)\frac{\partial y_o(k-i)}{\partial a_j(k)},$$

$$\frac{\partial y_o(k)}{\partial b_j(k)} = x(k-j) + \sum_{i=1}^{N} a_i(k)\frac{\partial y_o(k-i)}{\partial b_j(k)}. \qquad (2.129)$$

If a sufficiently small value is taken for the coefficient $\alpha$ in (2.119), which influences the convergence speed, so that the adaptation is sufficiently slow, one can introduce the following approximation

$$H(k) \approx H(k-1) \approx \cdots \approx H(k-N+1), \qquad (2.130)$$

i.e. one may neglect the mentioned dependence. This is acceptable in the majority of cases, especially if $N$ is low, so that one may write
$$\frac{\partial y_o(k)}{\partial a_j(k)} = y_o(k-j) + \sum_{i=1}^{N} a_i(k)\frac{\partial y_o(k-i)}{\partial a_j(k)} \approx \frac{1}{1 - A\left(k, z^{-1}\right)}\,y_o(k-j), \qquad (2.131)$$

$$\frac{\partial y_o(k)}{\partial b_j(k)} = x(k-j) + \sum_{i=1}^{N} a_i(k)\frac{\partial y_o(k-i)}{\partial b_j(k)} \approx \frac{1}{1 - A\left(k, z^{-1}\right)}\,x(k-j), \qquad (2.132)$$

where the polynomial Ak; z1 is defined by (2.97). While deriving relations
(2.131) and (2.132) the operator of unit delay z1 was introduced, so that
oyo k  1 oyo k o k1 o k
z1 and oyob j k
z1 oy
obj k . It follows that in the general form
oaj k oaj k
of the algorithm (2.119)(2.122)
1  
F k; z ; G k; z1 1: 2:133
1  Ak; z1
Now it is possible to write the complete RPE algorithm

$$\hat{H}(k+1) = \hat{H}(k) + \alpha R^{-1}(k+1)\left[\frac{1}{1 - A\left(k, z^{-1}\right)}X_o(k)\right]e_o(k), \qquad (2.134)$$

where $X_o(k)$ is defined by the expression (2.106), $e_o(k)$ by (2.102) and (2.105), $R(k)$ by the expression (2.124), while the polynomial $A(k, z^{-1})$ is given by the expression (2.97).
Relation (2.134) can be written in the alternative form

$$\hat{H}(k+1) = \hat{H}(k) + \alpha R^{-1}(k+1)X_{of}(k)e_o(k), \qquad (2.135)$$

where

$$X_{of}(k) = \frac{1}{1 - A\left(k, z^{-1}\right)}X_o(k) = F\left(k, z^{-1}\right)X_o(k) \qquad (2.136)$$

is the filtered vector $X_o(k)$ of input–output data.
is the filter vector Xo k of inputoutput data.
Starting from the definition (2.97) of the polynomial $A(k, z^{-1})$, the coefficients of the polynomial

$$F\left(k, z^{-1}\right) = 1 + \sum_{i=1}^{N} f_i(k)z^{-i} \qquad (2.137)$$

in (2.121) (with $n = N$) can be determined according to the relation [8]

$$f_i(k) = \sum_{j=1}^{i} a_j(k)f_{i-j}(k), \quad i = 1, 2, \ldots, N; \quad f_0 = 1. \qquad (2.138)$$

In this way, the vector of filtered input–output data is defined by the expression

$$X_{of}^T(k) = \left[\,x_f(k),\; x_f(k-1),\; \ldots,\; x_f(k-M),\; y_f(k-1),\; y_f(k-2),\; \ldots,\; y_f(k-N)\,\right], \qquad (2.139)$$

where

$$x_f(j) = F\left(k, z^{-1}\right)x(j) = x(j) + \sum_{i=1}^{N} f_i(k)x(j-i), \quad j = k, k-1, \ldots, k-M, \qquad (2.140)$$

$$y_f(j) = F\left(k, z^{-1}\right)y_o(j) = y_o(j) + \sum_{i=1}^{N} f_i(k)y_o(j-i), \quad j = k-1, \ldots, k-N. \qquad (2.141)$$
Introducing the parameter vector

$$\mathbf{f}^T(k) = \left[\,f_1(k),\; f_2(k),\; \ldots,\; f_N(k)\,\right] \qquad (2.142)$$

and the vectors of input–output data

$$\mathbf{X}^T(j) = \left[\,x(j-1),\; x(j-2),\; \ldots,\; x(j-N)\,\right], \quad j = k, k-1, \ldots, k-M, \qquad (2.143)$$

$$\mathbf{Y}_o^T(j) = \left[\,y_o(j-1),\; y_o(j-2),\; \ldots,\; y_o(j-N)\,\right], \quad j = k-1, k-2, \ldots, k-N, \qquad (2.144)$$

the relations (2.140) and (2.141) can be written in the vectorial form

$$x_f(j) = x(j) + \mathbf{X}^T(j)\mathbf{f}(k), \quad j = k, k-1, \ldots, k-M, \qquad (2.145)$$

$$y_f(j) = y_o(j) + \mathbf{Y}_o^T(j)\mathbf{f}(k), \quad j = k-1, k-2, \ldots, k-N. \qquad (2.146)$$
If the estimations of the parameters are close to their optimum values, the procedure can be simplified by filtering only the most recent input and output data instead of the whole sequences in (2.145) and (2.146), respectively.
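The recursion (2.138) and the per-sample filtering (2.145)–(2.146) are straightforward to express directly; in the following sketch (illustrative names) a = [a_1, ..., a_N] denotes the current estimates of the denominator coefficients.

```python
import numpy as np

def f_coefficients(a):
    """Coefficients f_1, ..., f_N of F(k, z^{-1}), recursion (2.138)."""
    N = len(a)
    f = np.zeros(N + 1)
    f[0] = 1.0                             # f_0 = 1
    for i in range(1, N + 1):
        f[i] = sum(a[j - 1] * f[i - j] for j in range(1, i + 1))
    return f[1:]                           # vector f(k), eq. (2.142)

def filter_sample(s, j, f):
    """Filtered sample x_f(j) or y_f(j) per eqs. (2.145)-(2.146)."""
    tail = sum(fi * (s[j - i] if j >= i else 0.0)
               for i, fi in enumerate(f, start=1))
    return s[j] + tail
```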
The main disadvantage of the RPE algorithm is that the filter poles, also used to calculate the derivatives (2.131) and (2.132), may be located outside the unit circle in the complex z-plane, which implies the appearance of instability. If the poles remain in this region for a longer time during the adaptation process, the algorithm may diverge. The poles may move outside the unit circle especially because of the noisy estimation of the gradient, since the approximation $\hat{\xi}(k)$ is used instead of the criterion $\xi(k)$. To avoid this, it is necessary to permanently monitor the system stability. One of the simplest tests is to check whether $\sum_i |a_i| < 1$ in each iteration. However, there are cases, especially for large values of $N$, where this criterion is not satisfactory. On the other hand, there are tests which establish system instability with certainty, but computationally they are very complex [3].
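Both the quick test mentioned above and an exact root check are easy to code; the following sketch illustrates them (numerical root finding for large N has its own accuracy limitations).

```python
import numpy as np

def is_stable(a, quick=False):
    """Stability of 1 - A(k, z^{-1}) in (2.104); a = [a_1, ..., a_N]."""
    a = np.asarray(a, dtype=float)
    if quick:
        return bool(np.sum(np.abs(a)) < 1.0)   # simple sufficient condition
    # exact test: roots of z^N - a_1 z^{N-1} - ... - a_N inside |z| < 1
    poly = np.concatenate(([1.0], -a))
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))
```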

In the majority of cases, when the test shows that new parameters lead to instability, they are most often simply discarded, i.e. one takes $\hat{H}(k+1) = \hat{H}(k)$. Naturally, this degrades the properties of the algorithm and makes it non-robust, since it may remain in that state for an indeterminate period of time. If the poles are located within the unit circle, the filter will be stable only if it represents a linear and time-invariant system. For systems variable in time, such as the adaptive IIR filters, it is not sufficient to follow the position of the poles only at discrete time intervals in order for the realized system to be efficient in practical situations.
A block diagram of the RPE algorithm is given in Table 2.4.

Table 2.4 Flow diagram of the RPE algorithm

1. Initialization:
   $\hat{H}(0) = 0$; $R^{-1}(0) = r^2 I$, $r^2 \gg 1$
   Generate the samples of the input signal $x(0)$ and of the reference signal $y(0)$
   Initial output error: $e_o(0) = y(0) - y_o(0)$, $y_o(0) = 0$
   Read in the forgetting factor $0.9 \le \rho \le 0.99$
   Calculate the convergence factor $\alpha = 1 - \rho$
   Form the initial vector of filtered data $X_{of}(0) = 0$
2. In each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $e_o(k-1)$, $R^{-1}(k-1)$ and $X_{of}(k-1)$ are known, calculate:
   Gain matrix:
   $R^{-1}(k) = \dfrac{1}{\rho}\left[R^{-1}(k-1) - \dfrac{R^{-1}(k-1)X_{of}(k-1)X_{of}^T(k-1)R^{-1}(k-1)}{\rho/\alpha + X_{of}^T(k-1)R^{-1}(k-1)X_{of}(k-1)}\right]$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + \alpha R^{-1}(k)X_{of}(k-1)e_o(k-1)$
   Update of the data vector:
   $X_o^T(k) = [\,x(k)\;\; x(k-1)\;\; \ldots\;\; x(k-M)\;\; y_o(k-1)\;\; y_o(k-2)\;\; \ldots\;\; y_o(k-N)\,]$, where $x(i) = y_o(i) = 0$ for $i < 0$ (causal system)
   Output error: $e_o(k) = y(k) - X_o^T(k)\hat{H}(k) = y(k) - y_o(k)$
   Coefficients of the filter that filters $X_o(k)$:
   $f_i(k) = \sum_{j=1}^{i}\hat{a}_j(k)f_{i-j}(k)$, $\; f_0(k) = 1$, $\; i = 1, \ldots, N$, where $\hat{a}_j(k)$ is the $(M+1+j)$-th component of $\hat{H}(k)$
   Form the vector of coefficients: $\mathbf{f}^T(k) = [\,f_1(k)\;\; f_2(k)\;\; \ldots\;\; f_N(k)\,]$
   Filter the input and output data:
   $x_f(k) = x(k) + \mathbf{X}^T(k)\mathbf{f}(k)$, where $\mathbf{X}^T(k) = [\,x(k-1)\;\; \ldots\;\; x(k-N)\,]$, $x(i) = 0$ for $i < 0$
   $y_f(k-1) = y_o(k-1) + \mathbf{Y}_o^T(k-1)\mathbf{f}(k)$, where $\mathbf{Y}_o^T(k-1) = [\,y_o(k-2)\;\; \ldots\;\; y_o(k-1-N)\,]$, $y_o(i) = 0$ for $i < 0$
   Form the vector of filtered data:
   $X_{of}^T(k) = [\,x_f(k)\;\; x_f(k-1)\;\; \ldots\;\; x_f(k-M)\;\; y_f(k-1)\;\; y_f(k-2)\;\; \ldots\;\; y_f(k-N)\,]$, where $x_f(i) = y_f(i) = 0$ for $i \le 0$
3. Increment the iteration counter $k$ by 1 and repeat the procedure from step 2

2.5.2 Pseudo-Linear Regression (PLR) Algorithm

The pseudo-linear regression (PLR) algorithm represents a simplification of the RPE algorithm, obtained by introducing

$$F(k, z) = G(k, z) = 1. \qquad (2.147)$$

The algorithm itself may be expressed as

$$\hat{H}(k+1) = \hat{H}(k) + \alpha R^{-1}(k+1)X_o(k)e_o(k). \qquad (2.148)$$
Here the gradient $\nabla y_o(k)$ is approximated by $\nabla y_o(k) \approx X_o(k)$. The name of the algorithm stems from the fact that the output of the adaptive filter is a nonlinear function of the parameters $H$, while in the algorithm itself, when calculating the gradient (2.128), the dependence of $X_o(k)$ on the parameters $H$ is neglected. $X_o(k)$ is often denoted as the regression vector and is defined by expression (2.106), while the output signal $y_o(k)$ is defined by expression (2.105). The PLR algorithm is very similar to the RLS algorithm, so their computational complexities are comparable, and both are much lower than that of the RPE algorithm.
A disadvantage of this algorithm is that it does not necessarily converge to the minimum of the MSE criterion, except in the case when the polynomial in the denominator of the transfer function (2.104), $1 - A(k, z^{-1})$, satisfies the Strictly Positive Real (SPR) condition; let us note that the discrete transfer

Table 2.5 Flow diagram of the PLR algorithm

1. Initialization:
   $\hat{H}(0) = 0$; $R^{-1}(0) = r^2 I$, $r^2 \gg 1$
   Generate the samples of the input signal $x(0)$ and of the reference signal $y(0)$
   Initial output error: $e_o(0) = y(0) - y_o(0) = y(0)$
   Read in the forgetting factor $0.9 \le \rho \le 0.99$
   Calculate the convergence factor $\alpha = 1 - \rho$
   Form the initial data vector $X_o(0) = [\,x(0)\;\; 0\;\; \ldots\;\; 0\,]^T$
2. In each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $e_o(k-1)$, $R^{-1}(k-1)$ and $X_o(k-1)$ are known, calculate:
   Gain matrix:
   $R^{-1}(k) = \dfrac{1}{\rho}\left[R^{-1}(k-1) - \dfrac{R^{-1}(k-1)X_o(k-1)X_o^T(k-1)R^{-1}(k-1)}{\rho/\alpha + X_o^T(k-1)R^{-1}(k-1)X_o(k-1)}\right]$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + \alpha R^{-1}(k)X_o(k-1)e_o(k-1)$
   Form the data vector $X_o(k)$ per (2.106), where $x(i) = y_o(i) = 0$ for $i < 0$ (causal signals)
   Output: $y_o(k) = \hat{H}^T(k)X_o(k)$
   Output error (OE): $e_o(k) = y(k) - y_o(k)$
3. Increment the counter $k$ by 1 and repeat the procedure from step 2

function $G(z^{-1})$ is denoted as SPR if $\mathrm{Re}\{G(e^{j\omega})\} > 0$ for all $\omega$, $-\pi < \omega < \pi$, where $j$ is the imaginary unit. If this condition is not satisfied, the obtained results may be absolutely unacceptable [24, 28].
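The SPR condition can be verified numerically on a frequency grid, as in the sketch below; a finite grid can only approximate the requirement for all ω, and the grid density is an arbitrary illustrative choice.

```python
import numpy as np

def denominator_is_spr(a, n_grid=2048):
    """Grid check of Re{1 - A(e^{jw})} > 0 for a = [a_1, ..., a_N]."""
    w = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    zinv = np.exp(-1j * w)                 # z^{-1} evaluated on the unit circle
    A = sum(ai * zinv ** (i + 1) for i, ai in enumerate(a))
    return bool(np.all(np.real(1.0 - A) > 0.0))
```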
Contrary to the RPE algorithm, here it is not necessary to monitor stability during the parameter update. Because of that the PLR algorithm can be used in combination with the RPE algorithm: when the RPE algorithm becomes unstable, one switches to the PLR algorithm until the poles return to the stable region. In this way it is possible to improve the properties of the RPE algorithm, which will ignore the obtained results in the time intervals when the estimated poles are in the unstable region, until the stability criterion is satisfied (Table 2.5).
Let us note at the end that the theory of adaptive IIR filters is still insufficiently researched, since their analysis involves nonlinear systems of high order, and this is one reason for their relatively narrow application. Prior analyses and computer simulations are often necessary to determine with certainty the properties of IIR adaptive algorithms [29, 30]. Thus the analysis and synthesis of adaptive IIR filters in various tasks of processing and transmission of noise-contaminated signals still represents a subject of both theoretical and practical interest.
Chapter 3
Finite Impulse Response Adaptive Filters
with Variable Forgetting Factor

The statistical properties of the input and the reference signal determine the
environment of an adaptive filter. Although most of the analyses of adaptive filters
in available literature are based on a stationary environment, the utilization of
adaptive filters shows its advantages primarily in nonstationary environments.
Nonstationarity may be categorized with respect to changes in the statistical properties of the input signal, of the reference signal (including the variation of the estimated system parameters), or of both simultaneously.
the cases when the input signal is stationary, although it does not have to be the
limiting condition for the application of the analyzed algorithms. Further, it was
assumed that additive noise at the system output is stationary with regard to the
reference signal (desired output), so that we considered a model of nonstationarity
caused by the variation of the value of estimated filter parameters.
When an adaptive filter is in a nonstationary environment, the most important
measures of its properties are (1) the time necessary for the algorithm to converge
after the onset of nonstationary changes; (2) the achieved accuracy of the
estimated parameters after the convergence is finished. However, these two requirements are mutually opposed, so it is necessary to define an algorithm representing an optimal trade-off between them. One of the solutions is the use of adaptive algorithms with a variable forgetting factor.

3.1 Choice of Variable Forgetting Factor

The choice of a fixed forgetting factor with a value near unity enables efficient following of slow changes of the system parameters. However, this approach gives poor results if the changes of the system parameters are abrupt. As stressed in Sect. 2.4.4, the application of a variable forgetting factor in a parameter estimation algorithm assigns different weights to previous signal measurements.
In the previous Chapter it was shown that the forgetting factor $\rho$ in the RLS algorithm corresponds to an asymptotically exponential decrease of memory, with a time constant defined by Eq. (2.71), i.e.


$$\tau = \frac{1}{1-\rho}. \qquad (3.1)$$
If it is assumed that the properties of the environment remain approximately unchanged within an interval $\tau$, it is possible to use (3.1) to determine the adequate value of the forgetting factor $\rho$. Thus, for nonstationary signals it is necessary to adaptively change the forgetting factor during the operation of the algorithm. On the nonstationary parts of the signal it is optimal to use a short memory length, $\tau = \tau_{min}$, for which $\rho = \rho_{min} < 1$, while on the stationary parts of the signal one should establish a long memory, i.e. $\tau = \tau_{max}$, for which $\rho = \rho_{max} \approx 1$. In this manner one obtains a tradeoff between the desired accuracy and the adaptation speed of the estimated parameters. According to (3.1) it follows

$$\tau_{min} = 20 \;\Rightarrow\; \rho_{min} = 0.95, \qquad \tau_{max} = 100 \;\Rightarrow\; \rho_{max} = 0.99. \qquad (3.2)$$
Further, in such an approach it is assumed that the nonstationary signal consists of stationary parts with lengths in the range between $\tau = \tau_{min}$ and $\tau = \tau_{max}$. However, in practical situations there is only a low probability that the durations of these intervals of stationarity and the moments of their onset will be known. Because of that, during the operation of the parameter estimation algorithm one has to estimate the degree of signal nonstationarity and, based on that knowledge, to automatically determine the change of the value of the forgetting factor.
Two convenient ways to adaptively determine the forgetting factor, both based on the energy of the error signal (residual) within one data window, are presented in the papers [9] and [10]. The basic idea is to generate a variable forgetting factor based on the error residual, which increases on the nonstationary parts of the signal, thus pointing out the onset of nonstationarity [31, 32].

3.1.1 Choice of Forgetting Factor Based on the Extended Prediction Error

Reference [9] proposed a procedure for the choice of the variable forgetting factor based on the extended prediction error (the EPE algorithm), defined on a data window of length $L$ as

$$Q(k) = \frac{1}{L}\sum_{i=0}^{L-1} e^2(k-i). \qquad (3.3)$$

Since the error e occurring because of the presence of additive noise at the filter
output is a stochastic process, the idea is to use averaging (summation) to remove
(filter out) the stochastic error component caused by additive noise, in order to avoid
erroneous recognition of strong additive noise as nonstationarity.
However, the value of L (the length of the data window on which one calculates the
error energy, i.e. the EPE criterion) must be sufficiently small in comparison with the
maximal time constant $s_{max}$ (algorithm memory), in order to ensure the best
possible registering of potential nonstationarity of the signal. The choice of the
variable forgetting factor is defined by (3.1), i.e.

$$ q(k) = 1 - \frac{1}{s(k)}, \quad (3.4) $$
where [9]:

$$ s(k) = \frac{\sigma_n^2\, s_{max}}{Q(k)}. \quad (3.5) $$
Here $\sigma_n^2$ denotes the expected (estimated) variance of additive noise, based on
real knowledge of the analyzed stochastic process which generated the measurement
data at the filter output. In the stationary parts of the signal the extended
prediction error $Q(k)$ tends to the noise variance $\sigma_n^2$, and in this case the maximal
asymptotic value of the memory ($s_{max}$) controls the adaptation speed. Since the
choice of the forgetting factor, defined by (3.4) and (3.5), does not guarantee
positive values of the forgetting factor q in (3.4), it is necessary to limit in advance
the bottom value of this factor to $q_{min} < 1$. It turns out that this algorithm is
efficient in the cases when the signal-to-noise ratio (SNR) is above 20 dB. For an
SNR decreasing below 10 dB this algorithm gives poor results (the SNR is
defined as $SNR = 10\log\left(\sigma_y^2/\sigma_n^2\right)$, where $\sigma_y^2$ is the variance or mean power of the filter
output signal in the absence of additive noise, while $\sigma_n^2$ denotes the variance as a
measure of the mean power of additive noise). Besides that, it is necessary to
specify in advance the variance of additive noise $\sigma_n^2$, which is not easily determined
in many cases. The scheme (3.4), (3.5) for the choice of the variable
forgetting factor is very sensitive to this parameter, which in the general case may be
estimated from measured data (by adequate averaging or in some other
way). In practical situations, to obtain a heuristic estimate of an unknown variance
one often uses the median of the absolute deviations from the median, calculated on
a data window of length L [19, 33]:
$$ \sigma_n \approx d(k) = \frac{\mathrm{median}\left\{\left|e(i) - \mathrm{median}\{e(i)\}\right|\right\}}{0.6745}, \quad (3.6) $$
where k is the current discrete moment, and the index of discrete time i belongs to
the set $i \in \{k, k-1, \ldots, k-L+1\}$. The median represents the middle term of the
sample whose elements are sorted in increasing order if the sample length
L is an odd number, or the arithmetic mean of the two middle terms of the sample
sorted in increasing order if the sample length L is an even number [16, 19,
33, 34]. The factor 0.6745 ensures that the estimate (3.6) is approximately equal
to the standard deviation of the sample, $\sigma_n$, for a sufficiently large length L of the
data window, in the case that the terms of the discrete sequence $\{e(i)\}$ are
generated according to the normal distribution law, with a zero mean value and the
variance $\sigma_n^2$. Instead of the estimation (3.6) one may also use the estimate based on the arithmetic mean
[16, 19, 34]:

$$ \sigma_n^2 \approx \frac{1}{L}\sum_{i=0}^{L-1}\left(e(i) - \bar{e}\right)^2, \qquad \bar{e} = \frac{1}{L}\sum_{i=0}^{L-1} e(i). \quad (3.7) $$

It is not convenient to utilize the estimation (3.7) in situations when the measurement
noise has an impulse character, i.e. when it contains sporadic realizations of
high intensity denoted as outliers; in other words, such an estimation is non-robust
under the quoted conditions [17, 18, 19, 21, 33]. The usual choice for the estimation
(3.6) is $5 \le L \le 10$, while in the case of the estimation (3.7) one adopts $L \ge 30$
[26, 27, 35, 36].
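To make the EPE strategy concrete, the following sketch (Python with NumPy; the function name, the default values and the use of the MAD rule (3.6) as a fallback noise estimate are illustrative assumptions of this sketch, not code from [9]) computes the variable forgetting factor (3.4) from the last L residuals via (3.3) and (3.5):

```python
import numpy as np

def epe_forgetting_factor(residuals, s_max=500.0, q_min=0.85, sigma2_n=None):
    """Variable forgetting factor by the EPE strategy, Eqs. (3.3)-(3.5).

    residuals -- the last L prediction errors e(k-L+1), ..., e(k)
    sigma2_n  -- additive-noise variance; if None, it is estimated by the
                 MAD rule (3.6) (an assumption of this sketch).
    """
    e = np.asarray(residuals, dtype=float)
    L = e.size
    # Extended prediction error (3.3): mean residual energy over the window
    Q = np.sum(e ** 2) / L
    if sigma2_n is None:
        # Robust noise scale via the median absolute deviation, Eq. (3.6)
        sigma_n = np.median(np.abs(e - np.median(e))) / 0.6745
        sigma2_n = sigma_n ** 2
    # Memory length (3.5); on stationary segments Q -> sigma2_n, so s -> s_max
    s = sigma2_n * s_max / max(Q, np.finfo(float).eps)
    # Forgetting factor (3.4), limited from below as required in the text
    return max(q_min, 1.0 - 1.0 / s)
```

On stationary segments Q(k) tends to $\sigma_n^2$, so s approaches $s_{max}$ and q approaches $q_{max}$; an abrupt change inflates Q(k) and pulls q toward $q_{min}$, as described above.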

3.1.2 Fortescue–Kershenbaum–Ydstie Algorithm

One of the most frequently cited algorithms for the choice of the variable forgetting
factor in the recursive least squares (RLS) algorithm was proposed in the paper of
Fortescue, Kershenbaum and Ydstie (FKY), after whom it was named the FKY
algorithm [10]. The value of the forgetting factor is determined according to the
ratio of the current value of the squared error signal and the estimated power of
additive noise. The choice of the forgetting factor in the RLS algorithm is given by

$$ q(k) = 1 - \frac{e^2(k)}{b_0\left[1 + X^T(k)P(k-1)X(k)\right]}, \quad (3.8) $$
where e(k) is the current error or residual, and $b_0$ is a constant chosen to satisfy the
desired estimation quality in the stationary mode of operation. Similarly to the EPE
algorithm, the FKY algorithm (3.8) does not guarantee positive values of the
forgetting factor, so it is necessary to limit its bottom value to $q_{min} < 1$.
Basically this algorithm was developed to ensure higher robustness (insensitivity)
with regard to the input signal characteristics, but it also proved
successful in applications in nonstationary environments. As in the previous
case, its realization requires knowledge of the characteristics of additive noise,
i.e. of its variance $\sigma_n^2$.
The derivation of the FKY algorithm (3.8) consists of the following steps [10].
Let a discrete system (filter) whose parameters are estimated be described by the
following linear regression model

$$ y(k) = H^T(k)X(k) + n(k), \quad (3.9) $$

where y(k) is the system output (the noisy desired output of the system),
$H(k) = \left[b_0\; b_1\; b_2\; \ldots\; b_M\right]^T$ is the vector of the estimated parameters (the system
model is known with an accuracy up to the unknown parameter vector), X(k) is the
vector of input signal measurements and n(k) is the additive noise at the system
output. If one defines the vector of estimated parameters $\hat{H}(k)$ for an adaptive FIR filter of order M,

$$ \hat{H}(k) = \left[\hat{b}_0(k)\; \hat{b}_1(k)\; \hat{b}_2(k)\; \ldots\; \hat{b}_M(k)\right]^T, \quad (3.10) $$
then according to the signal model (3.9) the expected output of the adaptive
filter at a moment k (output prediction) is given as

$$ \hat{y}(k) = X^T(k)\hat{H}(k-1), \quad (3.11) $$

where the unknown parameter vector H(k) is replaced by its last available estimate
$\hat{H}(k-1)$, obtained before the output signal y(k) was measured, and the noise n(k) is
approximated by its mean or expected value, which is assumed to be zero.
For the considered FIR system the input data vector is given as
$X(k) = \left[x(k)\; x(k-1)\; \ldots\; x(k-M)\right]^T$, so that (3.9) reduces to a stochastic linear
difference equation (linear regression model)

$$ y(k) = \sum_{i=0}^{M} b_i(k)\,x(k-i) + n(k). $$

Parameter estimation may be achieved by the application of the recursive least
squares algorithm with exponential weighting of the squared error signals (2.89),
the so-called WRLS algorithm, i.e.

$$ \hat{H}(k) = \hat{H}(k-1) + K(k)e(k), \quad (3.12) $$

$$ e(k) = y(k) - \hat{y}(k) = y(k) - X^T(k)\hat{H}(k-1), \quad (3.13) $$

$$ K(k) = P(k-1)X(k)\left[q + X^T(k)P(k-1)X(k)\right]^{-1}, \quad (3.14) $$

$$ P(k) = \frac{1}{q}\left\{P(k-1) - P(k-1)X(k)\left[q + X^T(k)P(k-1)X(k)\right]^{-1}X^T(k)P(k-1)\right\}, \quad (3.15) $$
where the initial value $P(0) = \sigma^2 I$ represents the unit matrix I multiplied by a large
positive number $\sigma^2 \gg 1$. As will be shown later in the text, the matrix P has the
meaning of an error covariance matrix of the estimated parameters. The role of the
forgetting factor is to enable the tracking of parameter changes in time-variable
systems. The adaptation speed is determined by the asymptotic memory of
the algorithm defined by (3.1), i.e.

$$ s = \frac{1}{1-q}, \quad (3.16) $$

which limits the evaluation of the previous signal measurements to s time samples.
It should be noted that for the choice q = 1, as the estimation process
advances, the value of the matrix P decreases, with the consequence that the information

about the system dynamics, i.e. about the estimated parameters, decreases and
finally completely disappears. On the other hand, setting q to a value below 1, in
order to include information about changes occurring in the system (its parameters),
leads to continuous division of the matrix P by a factor lower than one,
which may lead to a sharp increase of its value, as well as to a large sensitivity to
disturbances and numerical errors propagating through the residual e(k) in (3.13).
The error signal or residual (3.13) contains the information about the state of
the estimator at each discrete moment k. Small values of the error signal mean,
except in the case of a possible absence of the input signal, that the values of the
estimated parameters are close to the desired ones. In that case it is desirable to
choose a value of the forgetting factor q near unity, in order to take all previous
measurements into account with comparable weight. In the case of an increasing error signal one
should increase the adaptivity of the estimator, i.e. decrease the value of the forgetting factor
q below its unit value, until the estimated parameters are updated to a desired
value and the error signal becomes sufficiently small.
According to this requirement, one may define the measure of the information
content of the filter, b(k), as a weighted sum of squares of error signals, which in
its recursive form is given as [10]

$$ b(k) = q(k)\,b(k-1) + e^2(k)\left[1 + X^T(k)P(k-1)X(k)\right]^{-1}, \quad (3.17) $$

where q(k) is the variable forgetting factor. Let us note that the second addend in
(3.17) represents a normalized error, since the term $1 + X^T(k)P(k-1)X(k)$
is proportional to an estimate of the variance of the error e(k), as will be shown at the end of
this chapter.
The choice of b(k) in such a manner as to preserve a constant value, i.e.

$$ b(k) = b(k-1) = \cdots = b_0, \quad (3.18) $$

defines the strategy for the choice of the forgetting factor in such a manner that
in each moment it depends on the measure of the information content of the filter,
which is constant. Namely, from (3.17) and (3.18) it directly follows that

$$ q(k) = 1 - \frac{e^2(k)}{b_0\left[1 + X^T(k)P(k-1)X(k)\right]}. \quad (3.19) $$
Starting from (3.16) and (3.19) one obtains for the effective filter memory

$$ s(k) = \frac{b_0}{e^2(k)\left[1 + X^T(k)P(k-1)X(k)\right]^{-1}}. \quad (3.20) $$

Since $b_0$ is proportional to the sum of squared error signals, when choosing its
value one may start from [10]

$$ b_0 = \sigma_n^2\, s_0, \quad (3.21) $$

where $\sigma_n^2$ is the expected variance of additive noise in (3.9), based on the real
knowledge of the stochastic process, and $s_0$ represents a nominal filter memory
length determining the total speed of the adaptive process. Let us note that, similarly
to (2.32), the solution of the difference equation (3.17) is given as

$$ b(k) = \left[\prod_{i=0}^{k} q(i)\right] b_0 + \sum_{i=0}^{k} x(i)\,e^2(i), $$

where

$$ x(0) = 1; \qquad x(i) = \left[1 + X^T(i)P(i-1)X(i)\right]^{-1}, \quad i = 1, \ldots, k. $$

Taking into account (3.18) one concludes that

$$ b_0\left[1 - \prod_{i=0}^{k} q(i)\right] = \sum_{i=0}^{k} x(i)\,e^2(i). $$

Since the sum of the squared errors represents an estimate of the variance of
additive noise in the model of the filter output signal (3.9), the derived expression
implies the relation (3.21).
At the end of this section it is shown that for a choice of $b_0$ according to (3.21),
for stationary processes one obtains $E\{s(k)\} = s_0$ when $k \to \infty$. The sensitivity of
the system is determined by the choice of $s_0$: lower values of $s_0$ lead to a
more sensitive system, and higher values to a less sensitive one, but with slower
adaptation of the estimated parameters.
In summary, the recursive least squares algorithm with the FKY strategy for the
choice of the forgetting factor is defined in Table 3.1.
It should be mentioned that an accurate solution of the problem of keeping b(k)
constant, which reduces to the solution of (3.18) in each step,
requires the determination of the value of the forgetting factor prior to the
determination of the value of the estimator gain K(k), which would
result in a much more complex relation for the choice of the forgetting factor. In a
majority of cases the practical difference between that algorithm and the described
one is very small, but one must test the obtained value of
q(k) in order to ensure that the forgetting factor does not assume unacceptably
small or even negative values. This problem is avoided by limiting the forgetting
factor from below by introducing its minimal value $q_{min}$.
As shown in the previous section, the algorithm (3.12)–(3.15) minimizes the
criterion of weighted squared errors $e(i) = y(i) - \hat{H}^T X(i)$, defined by (2.69), i.e.

$$ J(k) = \sum_{i=0}^{k} w(i)\left[y(i) - \hat{H}^T X(i)\right]^2, \qquad w(i) = q^{k-i}. \quad (3.22) $$

Table 3.1 Determination of the forgetting factor by the FKY strategy

1. Initialization:
   RLS algorithm: $\hat{H}(0) = 0$; $P(0) = \sigma^2 I$, $\sigma^2 \gg 1$.
   Enter the first sample of the input signal vector $X(1) = [x(1)\; x(0)\; 0\; \ldots\; 0]^T$.
   Enter the parameters $\sigma_n^2$, $s_0$ and $q_{min}$.
   Set the value $b_0 = \sigma_n^2 s_0$.
2. At each discrete moment of time $k = 1, 2, \ldots$, assuming that $\hat{H}(k-1)$, $P(k-1)$ and $X(k)$ are known, calculate:
   Estimation (prediction) of the output signal: $\hat{y}(k) = X^T(k)\hat{H}(k-1)$.
   Error signal: $e(k) = y(k) - \hat{y}(k)$.
   Forgetting factor (limited from below by $q_{min}$): $q(k) = 1 - \dfrac{e^2(k)}{b_0\left[1 + X^T(k)P(k-1)X(k)\right]}$.
   Gain matrix of the RLS algorithm: $K(k) = P(k-1)X(k)\left[q(k) + X^T(k)P(k-1)X(k)\right]^{-1}$.
   Coefficients of the filter: $\hat{H}(k) = \hat{H}(k-1) + K(k)e(k)$.
   Covariance matrix update, cf. (3.15): $P(k) = \dfrac{1}{q(k)}\left[P(k-1) - K(k)X^T(k)P(k-1)\right]$.
   Update of the input vector: $X^T(k+1) = [x(k+1)\; x(k)\; x(k-1)\; \ldots\; x(k-M+1)]$,
   where $x(i) = 0$ for $i < 0$ (causal signal).
3. Increment the iteration counter k by 1 and repeat the procedure from step 2.
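As an illustration of Table 3.1, one iteration can be sketched as follows (Python with NumPy; the function signature, variable names and the clipping of q(k) at $q_{min}$ are assumptions of this sketch rather than the authors' implementation):

```python
import numpy as np

def fky_rls_step(theta, P, x_vec, y, b0, q_min=0.85):
    """One WRLS update with the FKY forgetting factor, Eqs. (3.8), (3.12)-(3.15).

    theta -- current parameter estimate H^(k-1), shape (M+1,)
    P     -- matrix P(k-1), shape (M+1, M+1)
    x_vec -- input vector X(k) = [x(k), ..., x(k-M)]
    b0    -- desired information content, b0 = sigma_n^2 * s0, Eq. (3.21)
    """
    y_hat = x_vec @ theta                     # output prediction (3.11)
    e = y - y_hat                             # residual (3.13)
    r = 1.0 + x_vec @ P @ x_vec               # normalization term, cf. (3.46)
    q = max(q_min, 1.0 - e * e / (b0 * r))    # forgetting factor (3.8)
    Px = P @ x_vec
    K = Px / (q + x_vec @ Px)                 # gain vector (3.14)
    theta = theta + K * e                     # parameter update (3.12)
    P = (P - np.outer(K, Px)) / q             # covariance update (3.15)
    return theta, P, q, e
```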

Relation (3.22) can also be written in the vector form

$$ J(k) = e^T(k)W(k)e(k), \quad (3.23) $$

where

$$ W(k) = \begin{bmatrix} w(0) & 0 & \cdots & 0 \\ 0 & w(1) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w(k) \end{bmatrix}, \qquad e(k) = \begin{bmatrix} e(0) \\ e(1) \\ \vdots \\ e(k) \end{bmatrix}. \quad (3.24) $$

The nonrecursive least squares algorithm determines $\hat{H}$ in one step from the condition
of the minimum of the criterion (3.22) and is defined by the relation (2.53),
or, if one adopts the notation $\hat{H} = \hat{H}(k)$,

$$ \hat{H}(k) = \left[\sum_{i=0}^{k} w(i)X(i)X^T(i)\right]^{-1} \sum_{i=0}^{k} w(i)X(i)y(i), \quad (3.25) $$

or in vector form

$$ \hat{H}(k) = \left[Z^T(k)W(k)Z(k)\right]^{-1} Z^T(k)W(k)Y(k), \quad (3.26) $$
where the data matrix and the vectors of input and output data for a FIR discrete system
(filter) are

$$ Z(k) = \begin{bmatrix} X^T(0) \\ X^T(1) \\ \vdots \\ X^T(k) \end{bmatrix}, \qquad Y(k) = \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(k) \end{bmatrix}, \qquad X(k) = \begin{bmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(k-M) \end{bmatrix}. \quad (3.27) $$
It is assumed here that all relevant signals are causal, i.e. that $x(i) = y(i) = 0$
for $i < 0$. The algorithm (3.12)–(3.15) itself represents a recursive version of the
non-recursive single-step estimation procedure (3.25); that is, this multistep algorithm
determines the minimum of the adopted estimation criterion (3.22), and thus these two
algorithms are equivalent in the asymptotic sense ($k \to \infty$). If one further
substitutes the discrete system model (3.9) into (3.25), one obtains

$$ \hat{H}(k) = H(k) + \left[\sum_{i=0}^{k} w(i)X(i)X^T(i)\right]^{-1} \sum_{i=0}^{k} w(i)X(i)n(i), \quad (3.28) $$

or in vector form

$$ \hat{H}(k) = H(k) + \left[Z^T(k)W(k)Z(k)\right]^{-1} Z^T(k)W(k)N(k), \quad (3.29) $$

where the column vector of additive noise at the system output is

$$ N(k) = \left[n(0)\; n(1)\; \ldots\; n(k)\right]^T. \quad (3.30) $$

According to (3.29) one may conclude that the estimate $\hat{H}(k)$ will be
approximately equal to the accurate value of H(k) if the values of the noise realizations
n(i) are much smaller than the values of the components of the system excitation
signal vector X(i), i.e. $|n(i)| \ll \|X(i)\|$ and $|n(i)| \approx 0$, where $\|\cdot\|$ denotes the norm
of a vector, which practically means that the signal-to-noise ratio (SNR) is
satisfactorily high.
Relation (3.28) can be written in the modified form

$$ \hat{H}(k) = H(k) + \left[\frac{1}{k}\sum_{i=0}^{k} w(i)X(i)X^T(i)\right]^{-1} \frac{1}{k}\sum_{i=0}^{k} w(i)X(i)n(i), \quad (3.31) $$

so that one concludes that for sufficiently large k

$$ \hat{H}(k) \approx H(k) + \left(E\left\{w(i)X(i)X^T(i)\right\}\right)^{-1} E\left\{w(i)X(i)n(i)\right\}, \quad (3.32) $$

where $E\{\cdot\}$ denotes the mathematical expectation or mean value. While deriving
(3.32), the law of large numbers [16, 17, 19, 29, 34, 37] was applied, according to
which

$$ \lim_{k\to\infty} \frac{1}{k}\sum_{i=0}^{k} w(i)X(i)X^T(i) = E\left\{w(i)X(i)X^T(i)\right\}, \quad (3.33) $$

$$ \lim_{k\to\infty} \frac{1}{k}\sum_{i=0}^{k} w(i)X(i)n(i) = E\left\{w(i)X(i)n(i)\right\}. \quad (3.34) $$

Since $\{w(i)\}$ is a deterministic sequence, the term w(i) can be moved in front of the
linear operator $E\{\cdot\}$ in (3.33) and (3.34); thus one concludes that for sufficiently
large k (fulfilled assumption of the law of large numbers) the estimate $\hat{H}(k)$ will
be close to the accurate value of H(k). In this manner, the expected value of the
estimate will be equal to the accurate value of the parameters (unbiased estimation)
under the condition that the stochastic variables X(i) and n(i) are
uncorrelated, i.e.

$$ E\{X(i)n(i)\} = 0. \quad (3.35) $$
Since according to (3.27) X(i) contains only realizations of the stochastic input
excitation signal of the FIR filter, the condition (3.35) will be fulfilled if the additive
noise n(i) at the signal output (see relation (3.9)) is uncorrelated with the excitation
signal. If, besides that, the excitation signal and the additive noise are distributed
according to the Gaussian (normal) law, the condition for them to be uncorrelated reduces
to the independence of these stochastic variables, which implies [16, 17, 19, 29,
34, 37]

$$ E\{X(i)n(i)\} = E\{X(i)\}E\{n(i)\}; \quad (3.36) $$

thus if the additive noise at the output of the FIR system has a zero mean value, i.e.
$E\{n(i)\} = 0$, the condition (3.35) will be fulfilled and the estimate $\hat{H}(k)$ will be
unbiased (its expected or mean value will be equal to the accurate value of the
parameters). Let us note that in the case of IIR systems the observation vector X(i)
in (2.101) contains both the excitation and the delayed system outputs, so that
the condition (3.35) will be satisfied only if the sequence $\{n(i)\}$ in (3.9) is white
discrete noise (white noise is uncorrelated in time, so that previous realizations of
noise do not depend on the later ones) [16, 17, 19, 29, 34, 37].
Let us further introduce the parameter estimation error

$$ V(k) = H(k) - \hat{H}(k), \quad (3.37) $$

and the matrix

$$ P(k) = \left[Z^T(k)W(k)Z(k)\right]^{-1} = \left[\sum_{i=0}^{k} w(i)X(i)X^T(i)\right]^{-1}. \quad (3.38) $$

In that case, taking into account (3.29), the error covariance matrix of the
unbiased estimate (the estimation error is a stochastic variable with a zero mean
value, since the measurements are noisy and the excitation signal is also stochastic) is
defined by the expression

$$ \mathrm{cov}\{V(k)\} = E\left\{V(k)V^T(k)\right\} = E\left\{P(k)Z^T(k)W(k)N(k)N^T(k)W(k)Z(k)P(k)\right\}. \quad (3.39) $$

Assuming further that the input matrix Z(k) is known, i.e. that the excitation
signal sequence $\{x(i),\; i = 0, 1, \ldots, k\}$ is given, the conditional matrix of the
error covariance is

$$ \mathrm{cov}\{V(k)\,|\,x(0), \ldots, x(k)\} = P(k)Z^T(k)W(k)\,E\left\{N(k)N^T(k)\right\}\,W(k)Z(k)P(k). \quad (3.40) $$

While deriving (3.40) it was also assumed that the components of the stochastic
vector N(k) in (3.30) are statistically independent of the stochastic input variables
$\{x(i),\; i = 0, 1, \ldots, k\}$, so that the conditional covariance of the noise
$\mathrm{cov}\{N(k)\,|\,x(0), \ldots, x(k)\}$ is equal to the unconditional covariance of the noise
$\mathrm{cov}\{N(k)\} = E\{N(k)N^T(k)\}$. If, additionally, the stochastic variables n(i) in
(3.30) have identical distributions, zero mean value and identical variance $\sigma_n^2$, then

$$ \mathrm{cov}\{N(k)\} = \sigma_n^2\, I, \quad (3.41) $$


where I denotes the unit matrix of adequate dimensions. Taking into account
(3.38) and (3.41), the relation (3.40) becomes

$$ \mathrm{cov}\{V(k)\,|\,x(0), \ldots, x(k)\} = \sigma_n^2\, P(k), \quad (3.42) $$


from which one concludes that the matrix P(k) in (3.38), and in (3.15), is directly
proportional to the estimation error covariance matrix. Thus P(k) represents the
accuracy measure of the operation of the weighted least squares estimation
algorithm. Let us finally note that, according to (3.9), (3.11) and (3.37), the prediction
error or the measurement residual is defined by the expression (assuming
that $H(k-1) \approx H(k)$, i.e. that the parameters are slowly changing)

$$ e(k) = y(k) - X^T(k)\hat{H}(k-1) = X^T(k)V(k-1) + n(k), \quad (3.43) $$
so that the conditional variance of this error, when the excitation signal sequence
x is given up to the k-th moment of time,

$$ \mathrm{var}\{e(k)\,|\,x(0), \ldots, x(k)\} = E\left\{e^2(k)\,|\,x(0), \ldots, x(k)\right\}, \quad (3.44) $$

is defined by the expression

$$ \mathrm{var}\{e(k)\,|\,x(0), \ldots, x(k)\} = X^T(k)\,\mathrm{cov}\{V(k-1)\,|\,x(0), \ldots, x(k)\}\,X(k) + \mathrm{var}\{n(k)\}. \quad (3.45) $$

Taking into account (3.41) and (3.42), it follows further that

$$ \mathrm{var}\{e(k)\,|\,x(0), \ldots, x(k)\} = \sigma_n^2\left[1 + X^T(k)P(k-1)X(k)\right]. \quad (3.46) $$
 
Thus the term $1 + X^T(k)P(k-1)X(k)$ is directly proportional to the variance of the
error, i.e. of the measurement residual. Further, according to (3.20), (3.21) and
(3.46), the algorithm memory is

$$ s = \frac{\sigma_n^2\, s_0}{e^2\,\sigma_n^2/\mathrm{var}\{e\}} = s_0\,\frac{\mathrm{var}\{e\}}{e^2}, \quad (3.47) $$

from which one concludes that the mean or expected value of the algorithm
memory (3.20) is

$$ E\{s(k)\} = s_0. \quad (3.48) $$
Let us note that while deriving (3.48) it was assumed that the law of large
numbers holds, i.e. that $k \to \infty$.

3.1.3 Parallel Adaptation Algorithm (PA-RLS Algorithm)

The choice of the forgetting factor based on the FKY algorithm, as proposed in [11], is
denoted as the PA-RLS algorithm [38]. The methodology of the choice of the
forgetting factor in this case is defined as

$$ q(k) = \frac{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M - \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} + \hat{q}(k)}{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M + \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} - \hat{q}(k)} - \frac{b_{min}}{M}. \quad (3.49) $$
Here M denotes the order of the adaptive FIR filter, and the estimated ratio of the
nonstationarity and additive noise power, $\hat{q}(k)$, is defined as

$$ \hat{q}(k) = \max\left\{0,\; q(k-1)\hat{q}(k-1) + \left[1 - q(k-1)\right]p(k-1)\right\}, $$
$$ p(k-1) = l(k-1)\left\{\left[\frac{e^2(k-1)}{\sigma_n^2} - 1\right]\left[2 - l(k-1)\right] - l(k-1)\right\}, \quad (3.50) $$

where

$$ l(k) = \frac{b(k)\,b_{crit}}{b(k)\,b_{crit} + q(k)\left[b_{crit} - b(k)\right]}, \qquad b(k) = \left[1 - q(k)\right]M. \quad (3.51) $$
In Ref. [11] a value of $b_{crit} = 2$ was proposed, and analogously to $b_0$ in the
FKY algorithm (3.8) it was adopted that $b_{min} = b_0$, with the aim of correctly
comparing these algorithms. Namely, the FKY algorithm tends to keep a constant
value of the information content of the adaptive filter (3.17), based on the previous
knowledge of the characteristics of additive noise and the desired deviation of the
estimated parameters in the stationary mode of operation, i.e. the desired accuracy of
the estimation procedure in a steady state. In order to obtain equal deviations for
these two algorithms in the stationary mode one should make the values of these two
parameters equal [11].
The derivation of the algorithm itself is shortly presented in the remainder of this
section. Generally, one may consider two types of errors in adaptive filters. The
estimation error represents the measure of the accuracy of the estimated parameters
in a stationary environment, while in a nonstationary environment an
important measure is the one defining the reaction speed to environment changes,
the lag error. The mean square estimation criterion represents a sum of these two
errors. When analyzing the properties of adaptive algorithms in nonstationary
environments one of the important steps is the definition of the nonstationarity
model, since it turns out that in practical situations the properties of adaptive filters
may differ for different nonstationarity models. Even in the case when the nonstationarity
model is known, its mathematical modeling is often very demanding or
even unattainable. In [11] the so-called random walk (RW) model of nonstationarity
was chosen, since it is known that the algorithms showing good properties in
environments with the RW model of nonstationarity have similar characteristics in
many other nonstationary environments, typical for various applications.
The RLS algorithm with an exponentially weighted forgetting factor recursively
minimizes the criterion function (3.22), i.e.

$$ J(k) = \sum_{i=0}^{k} q^{k-i}\, e^2(i). \quad (3.52) $$

The minimization of the criterion function (3.52) leads to the recursive WRLS
algorithm for parameter estimation (3.12)–(3.15):

$$ \hat{H}(k+1) = \hat{H}(k) + \frac{P(k)X(k)e(k)}{q + X^T(k)P(k)X(k)}, \quad (3.53) $$

$$ P(k+1) = \frac{1}{q}\left[P(k) - \frac{P(k)X(k)X^T(k)P(k)}{q + X^T(k)P(k)X(k)}\right], \quad (3.54) $$

where $e(k) = y(k) - \hat{H}^T(k)X(k)$ is the error or measurement residual, and the
matrix P is an approximation of the inverse of the autocorrelation matrix of the
input signal,

$$ R = E\left\{X(k)X^T(k)\right\}, $$
while y(k) is the noisy desired response of the adaptive FIR filter (3.9) at a given
discrete moment k, i.e.

$$ y(k) = H^T(k)X(k) + n(k). \quad (3.55) $$

In relation (3.55) H(k) is the unknown parameter vector of the estimated filter,
and n(k) is white Gaussian noise with a zero mean value and a variance $\sigma_n^2$,

independent of the input signal and of the variations of the parameters H(k). If
one introduces the notation V(k) for the vector of the parameter estimation error,

$$ V(k) = H(k) - \hat{H}(k), \quad (3.56) $$

then, taking into account (3.55), the error signal $e(k) = y(k) - \hat{H}^T(k-1)X(k)$ is
given by (3.43), i.e.

$$ e(k) = V^T(k-1)X(k) + n(k). \quad (3.57) $$


To introduce the RW model of nonstationarity, let the parameters of the estimated
filter change in each step according to the vector relation

$$ H(k+1) = H(k) + L(k), \quad (3.58) $$

where L(k) is a random vector variable defining the change of the parameter
vector, such that

$$ E\{L(k)\} = 0 \quad \text{and} \quad E\left\{L(k)L^T(j)\right\} = \delta_{jk}\,\bar{L}, \quad (3.59) $$

where $\delta_{jk}$ is the Kronecker delta symbol,

$$ \delta_{jk} = \begin{cases} 1, & j = k \\ 0, & j \ne k. \end{cases} $$

Relation (3.59) implies that the discrete random sequence $\{L(k)\}$ is a white
stochastic process uncorrelated in time, with a zero mean value and with the
covariance matrix $\bar{L}$.
Let us consider further the value of the MSE criterion in the stationary (equilibrium)
state with the adopted RW model of nonstationarity. From (3.53), (3.56) and
(3.58) it follows that

$$ V(k+1) = V(k) + L(k) - \frac{P(k)X(k)e(k)}{q + X^T(k)P(k)X(k)}. \quad (3.60) $$

If the expression (3.60) is multiplied by its corresponding transpose and
the operator of mathematical expectation (averaging) is applied, one obtains

$$ E\left\{V(k+1)V^T(k+1)\right\} = E\left\{V(k)V^T(k)\right\} + \bar{L} + E\left\{\frac{P(k)X(k)X^T(k)P(k)\,e^2(k)}{\left[q + X^T(k)P(k)X(k)\right]^2}\right\} - E\left\{\frac{P(k)X(k)X^T(k)V(k)V^T(k) + V(k)V^T(k)X(k)X^T(k)P(k)}{q + X^T(k)P(k)X(k)}\right\}. \quad (3.61) $$

In the limiting case when $k \to \infty$, one may write that in the steady (equilibrium) state

$$ E\left\{V(k+1)V^T(k+1)\right\} = E\left\{V(k)V^T(k)\right\}. $$

Then, omitting the index k from the relation (3.61), one obtains

$$ \bar{L} + E\left\{\frac{PXX^TP\,e^2}{\left(q + X^TPX\right)^2}\right\} = E\left\{\frac{PXX^TVV^T + VV^TXX^TP}{q + X^TPX}\right\}. \quad (3.62) $$

To simplify this expression one further adopts [11]

$$ E\{P\} = cR^{-1}, \quad (3.63) $$

which has as a consequence

$$ PR = cI_M + Q, \quad (3.64) $$

where $I_M$ is the unit matrix with dimensions $M \times M$, and Q is a perturbation
matrix independent of the input signal. Since, according to (3.63), P represents the
inverse of the estimate $\hat{R}$ of the matrix R, and the expected value of the
matrix $\hat{R}$ is [39]

$$ E\{\hat{R}\} = \lim_{n\to\infty} \sum_{j=0}^{n} q^j\, E\left\{X(n-j)X^T(n-j)\right\} = \sum_{j=0}^{\infty} q^j\, E\left\{X(j)X^T(j)\right\} = \frac{R}{1-q}, \quad (3.65) $$

it is possible to adopt the approximation that the coefficient c in (3.63) is

$$ c = c_0(1 - q). \quad (3.66) $$
The limitations of the validity of this approximation ($q \to 1$) are often misunderstood,
due to the widespread use of the RLS algorithm. For instance, a value of
the forgetting factor of q = 0.99 may be adopted as close enough to unity
in order to apply (3.66). Regretfully, in practical situations the filter order M often
has large values, and for q = 0.99 and M = 100 the use of the expression (3.66)
results in an error larger than 50 % [11]. Thus for such a nature of the problem it is
much more acceptable to adopt

$$ b \triangleq (1 - q)M \to 0 $$

instead of $q \to 1$. The use of the factor b as a parameter for the convergence
control of an adaptive algorithm also ensures invariance with regard to the order of
the adaptive filter M, which may also belong to a wide interval of values, for
instance from 10 up to 1000.
Let us consider further the dependence of the variable c on the convergence
factor b. To this purpose we will assume that the estimate of the autocorrelation
matrix of the input R, denoted by $\hat{R}$, approaches singularity when b nears some
critical value $b_{crit}$. This suggests a first-order functional dependence between the
parameters c and b, i.e.

$$ c(b) = \frac{1-b}{1-b/b_{crit}}. \quad (3.67) $$
To further simplify the expression (3.62), let us also assume that P is independent
of X, which is correct in the case of FIR filter analysis, since the window
(memory) of the RLS algorithm is, generally speaking, wider than the autocorrelation
of the input signal, R. Besides that, in [39] it was shown that the value $V^TX$ is
independent of X and P.
If (3.62) is multiplied by R and the matrix trace operation $\mathrm{tr}\{\cdot\}$ is applied, one
obtains

$$ \mathrm{tr}\{R\bar{L}\} + abc\,E\left\{\left(q + X^TPX\right)^{-1}\right\}E\left\{e^2\right\} = 2c\,E\left\{\left(X^TV\right)^2\right\}E\left\{\left(q + X^TPX\right)^{-1}\right\}, \quad (3.68) $$

where the variable

$$ a = \frac{\mathrm{tr}\left\{E\left[RPXX^TP\left(q + X^TPX\right)^{-2}\right]\right\}}{\mathrm{tr}\left\{E\left[RPXX^TP\left(q + X^TPX\right)^{-1}\right]\right\}E\left\{\left(q + X^TPX\right)^{-1}\right\}}. \quad (3.69) $$

By introducing the variable a in (3.69) one obtains the identity

$$ cb = \mathrm{tr}\left\{E\left[\frac{RPXX^TP}{q + X^TPX}\right]\right\}. \quad (3.70) $$

When deriving (3.69), it was taken into account that

$$ E\left\{V^TX\,V^TQX\left(q + X^TPX\right)^{-1}\right\} = 0, $$

which follows from the above assumptions. Let us divide (3.68) by the
minimum of the MSE criterion, $\sigma_n^2$, to determine the deviation in the stationary
(equilibrium) state, which is defined as

$$ M_\infty \triangleq \frac{E\left\{e^2\right\}}{\sigma_n^2} = 1 + \frac{E\left\{\left(V^TX\right)^2\right\}}{\sigma_n^2}. \quad (3.71) $$

Further one obtains

$$ M_\infty = 1 + \frac{q/l + ab}{2 - ab}, \quad (3.72) $$

where

$$ l \triangleq cM\,E\left\{\left(q + X^TPX\right)^{-1}\right\}, \quad (3.73) $$

and q represents the ratio between the degree of nonstationarity and the additive
noise power,

$$ q = \frac{M\,\mathrm{tr}\{R\bar{L}\}}{\sigma_n^2}. \quad (3.74) $$

In [39] it was shown that it is a valid approximation to assume

$$ l \approx l^* \approx ab \quad (3.75) $$

for $b_{crit} = 2$ and for large values of the filter order M, so that

$$ M_\infty = 1 + \frac{q/l^* + l^*}{2 - l^*}. \quad (3.76) $$
To obtain the optimal value of the forgetting factor, which minimizes the
deviation (error) in the stationary (equilibrium) state, one should differentiate the
expression (3.76) with respect to the parameter $l^*$. By making this derivative equal
to zero one obtains the optimum solution

$$ l^* = \frac{\sqrt{q(q+4)} - q}{2}, \quad (3.77) $$

according to which one also obtains the optimal forgetting factor for a given
degree of nonstationarity of the environment and the additive noise power,
expressed by the parameter q,

$$ q^*(q) = \frac{(2 - l^*)M - l^*}{(2 - l^*)M + l^*} = \frac{\left[q+4-\sqrt{q(q+4)}\right]M - \sqrt{q(q+4)} + q}{\left[q+4-\sqrt{q(q+4)}\right]M + \sqrt{q(q+4)} - q}. \quad (3.78) $$
From (3.78) one concludes that for very small values of the factor q the optimal
forgetting factor $q^*$ tends to unity, while if q approaches large values, then
$q^* \to (M-1)/(M+1)$. To determine the optimal forgetting factor $q^*$ one needs
an estimate of the factor q. Regretfully, such an estimate is not observable for
an RLS process; however, if the nature of the additive noise (i.e. its mean power
$\sigma_n^2$) is known, one can estimate the factor q in each step in the
following manner [38]:

$$ \hat{q}(k+1) = \max\left\{0,\; q(k)\hat{q}(k) + \left[1 - q(k)\right]l(k)\left[\left(\frac{e^2(k+1)}{\sigma_n^2} - 1\right)\left(2 - l(k)\right) - l(k)\right]\right\}. \quad (3.79) $$

This expression was obtained by applying the current value of the squared error
as the measure of the MSE criterion, and by solving (3.72) for q, under the
assumption that $l \approx l^*$. Based on (3.78) one may now estimate the optimal value of
the forgetting factor for the RLS algorithm in the next iteration, i.e.

$$ q(k+1) = q^*\left(\hat{q}(k+1)\right). \quad (3.80) $$

Since the optimal value of the forgetting factor $q^*$ was based on the minimization
of the deviation in the stationary mode, in order to follow faster the rapid changes
occurring after the stationary mode, for which $q \approx 1$, in [11] the following modification
of the expression (3.78) was introduced:

$$ q(q) = q^*(q) - \frac{b_{min}}{M}, \quad (3.81) $$

where $b_{min}$ is a small constant, chosen with regard to the value of the deviation in
the stationary state.
The resulting expression for the forgetting factor now reads

$$ q(k) = \frac{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M - \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} + \hat{q}(k)}{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M + \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} - \hat{q}(k)} - \frac{b_{min}}{M}, \quad (3.82) $$

which represents the desired relation (3.49).
The flow of the PA-RLS algorithm is described by the steps given in Table 3.2.
Table 3.2 Determination of the forgetting factor by the PA strategy

1. Initialization:
   RLS algorithm: $\hat{H}(0) = 0$; $P(0) = \sigma^2 I$, $\sigma^2 \gg 1$.
   Enter the first sample of the input signal vector $X(1) = [x(1)\; 0\; \ldots\; 0]^T$.
2. At each discrete moment of time $k = 1, 2, \ldots$, calculate:
   Estimation (prediction) of the output signal: $\hat{y}(k) = X^T(k)\hat{H}(k-1)$.
   Error signal: $e(k) = y(k) - \hat{y}(k)$.
   Ratio of nonstationarity and additive noise power, with $l(k-1)$ given by (3.51):
   $\hat{q}(k) = \max\left\{0,\; q(k-1)\hat{q}(k-1) + \left[1-q(k-1)\right]l(k-1)\left[\left(\dfrac{e^2(k-1)}{\sigma_n^2}-1\right)\left(2-l(k-1)\right) - l(k-1)\right]\right\}$
   Forgetting factor:
   $q(k) = \dfrac{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M - \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} + \hat{q}(k)}{\left[\hat{q}(k)+4-\sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)}\right]M + \sqrt{\hat{q}(k)\left(\hat{q}(k)+4\right)} - \hat{q}(k)} - \dfrac{b_{min}}{M}$
   Gain matrix of the RLS algorithm: $K(k) = P(k-1)X(k)\left[q(k) + X^T(k)P(k-1)X(k)\right]^{-1}$
   $P(k) = \dfrac{1}{q(k)}\left[P(k-1) - \dfrac{P(k-1)X(k)X^T(k)P(k-1)}{q(k) + X^T(k)P(k-1)X(k)}\right]$
   Filter coefficients: $\hat{H}(k) = \hat{H}(k-1) + K(k)e(k)$.
   Update of the input vector: $X^T(k+1) = [x(k+1)\; x(k)\; x(k-1)\; \ldots\; x(k-M+1)]$,
   where $x(i) = 0$ for $i < 0$ (causality).
3. Increment the iteration counter k by 1 and repeat the procedure from step 2.
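The PA computation of the forgetting factor from Table 3.2 can be sketched as below (Python with NumPy; only the scalar recursions (3.49)–(3.51) are shown, with the previous values and the noise variance supplied by the caller — an assumption of this sketch — while the RLS update itself is identical to that of Table 3.1):

```python
import numpy as np

def pa_forgetting_factor(q_prev, rho_prev, e_prev, sigma2_n, M,
                         b_min=0.01, b_crit=2.0):
    """Forgetting factor of the PA-RLS algorithm, Eqs. (3.49)-(3.51).

    q_prev   -- forgetting factor q(k-1)
    rho_prev -- previous nonstationarity-to-noise ratio estimate q^(k-1)
    e_prev   -- residual e(k-1)
    """
    # l(k-1) from Eq. (3.51), with b(k-1) = [1 - q(k-1)] * M
    b = (1.0 - q_prev) * M
    l = b * b_crit / (b * b_crit + q_prev * (b_crit - b))
    # Instantaneous estimate p(k-1) and the smoothed ratio, Eq. (3.50)
    p = l * ((e_prev ** 2 / sigma2_n - 1.0) * (2.0 - l) - l)
    rho = max(0.0, q_prev * rho_prev + (1.0 - q_prev) * p)
    # Optimal forgetting factor (3.78), shifted by b_min / M as in (3.82)
    root = np.sqrt(rho * (rho + 4.0))
    num = (rho + 4.0 - root) * M - root + rho
    den = (rho + 4.0 - root) * M + root - rho
    return num / den - b_min / M, rho
```

Note that for rho = 0 the formula returns $1 - b_{min}/M$, i.e. a value near unity, and for large rho it approaches $(M-1)/(M+1) - b_{min}/M$, matching the limiting behavior discussed after (3.78).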

Recursive procedures for the determination of the variable forgetting factor,
based on (1) the extended prediction error (3.3) [9]; (2) the ratio between the
momentary value of the MSE and the estimated power of additive noise (3.8) [10];
and (3) the ratio between the nonstationarity degree and the additive noise power
(3.49) [11], are tested further in this section, in order to perform an accurate
comparison under conditions close to reality. However, before presenting the
experimental results, let us consider another, general approach to the generation of
a recursive least squares algorithm with a variable forgetting factor, based on the
minimization of the generalized criterion of weighted least squares [17, 19, 37].

3.1.4 Generalized Weighted Least Squares Algorithm with Variable Forgetting Factor

As stressed in the previous section, the criterion generating the recursive least
squares algorithm with an exponential forgetting factor q is defined by the expression
(3.22). However, if the system is variable in time, the choice of a fixed
forgetting factor q results in an estimate of the average behavior of the system
(parameters) over the analyzed interval. Therefore, to estimate the momentary
properties of the system (the current values of its time-variable parameters), the
exponential forgetting factor $w(i) = q^{k-i}$ in the criterion (3.22) should, in the general
case, be replaced by some weight function (a forgetting factor variable in time)
which represents an increasing function of the argument $i$, $1 \le i \le k$, for a given
moment of time k. This results in the generalized criterion of weighted least
squares

$$ J(\hat{H}, k) = k^{-1}\sum_{i=1}^{k} q(k,i)\,e^2(i, \hat{H}), \quad (3.83) $$

where the residual e is defined by the expression (3.43). For sufficiently large
k the arithmetic mean on the right side of the expression (3.83) converges to the
corresponding mathematical expectation, i.e. the criterion (3.83) reduces to

$$ J(\hat{H}, k) = E\left\{q(k,i)\,e^2(i, \hat{H})\right\}. \quad (3.84) $$

A convenient form of the factor q was proposed in [17] and is given as

$$ q(k,i) = x(k)\,q(k-1,i), \qquad 1 \le i \le k-1. \quad (3.85) $$

According to (3.85) one further concludes that

$$ q(k,i) = x(k)\,q(k-1,i) = x(k)x(k-1)\,q(k-2,i) = \cdots = x(k)x(k-1)\cdots x(i+1)\,q(i,i), $$

i.e.

" #
Y
k
qk; i x j ai; ai qi; i: 3:86
ji1

The usual choice is x j  1, and especially if x j is constant, i.e. x j q


and 0\q\1, one obtains from (3.86) that

qk; i qki ai; 3:87


where q is a fixed exponential forgetting factor. Starting further from the
expression for the error or residual of measurement (3.43), the criterion (3.83)
becomes

$$ J(\hat{H}, k) = k^{-1}\sum_{i=1}^{k} q(k,i)\left[y(i) - \hat{H}^T X(i-1)\right]^2. \quad (3.88) $$
i1

By differentiating this expression with respect to the parameter vector $\hat{H}$, from the
condition $\partial J(\hat{H}, k)/\partial\hat{H} = 0$ one obtains in a single step a non-recursive estimate of
the unknown parameter vector $\hat{H}(k) = \hat{H}$ at the current moment k, i.e.

$$ \frac{\partial J(\hat{H}, k)}{\partial\hat{H}} = -2k^{-1}\sum_{i=1}^{k} q(k,i)\left[y(i) - X^T(i-1)\hat{H}\right]\frac{\partial X^T(i-1)\hat{H}}{\partial\hat{H}} = 0, \quad (3.89) $$

where

$$ \frac{\partial X^T(i-1)\hat{H}}{\partial\hat{H}} = X(i-1). \quad (3.90) $$
By introducing (3.90) into (3.89) and solving the obtained algebraic equation
for $\hat{H}$, one finally obtains

$$ \hat{H} = \hat{H}(k) = \left[k^{-1}\sum_{i=1}^{k} q(k,i)X(i-1)X^T(i-1)\right]^{-1}\left[k^{-1}\sum_{i=1}^{k} q(k,i)X(i-1)y(i)\right]. \quad (3.91) $$
In the general case, for an arbitrary q(k,i), there is no recursive version of the
estimate (3.91); thus the derivation of a recursive version requires specifying
a convenient form of the weight factor q(k,i) [17]. A possible convenient
form is defined by the relation (3.85). Introducing further the notation

$$ \bar{R}(k) = \sum_{i=1}^{k} q(k,i)X(i-1)X^T(i-1) = \sum_{i=1}^{k-1} q(k,i)X(i-1)X^T(i-1) + q(k,k)X(k-1)X^T(k-1), \quad (3.92) $$

and replacing the expression (3.85) into (3.92), one obtains

$$ \bar{R}(k) = x(k)\bar{R}(k-1) + a(k)X(k-1)X^T(k-1). \quad (3.93) $$
Further derivation fully follows the procedure used when deriving the recursive
forms of the least squares (LS) algorithm and of the weighted least squares (WLS)
algorithm with a fixed exponential forgetting factor. Namely, taking into account
(3.92), (3.85) and (3.93), the relation (3.91) can be written in the following manner:

$$ \hat{H}(k) = \bar{R}^{-1}(k)\left\{\sum_{i=1}^{k-1} q(k,i)X(i-1)y(i) + q(k,k)X(k-1)y(k)\right\} $$
$$ = \bar{R}^{-1}(k)\left\{x(k)\sum_{i=1}^{k-1} q(k-1,i)X(i-1)y(i) + a(k)X(k-1)y(k)\right\} $$
$$ = \bar{R}^{-1}(k)\left\{x(k)\bar{R}(k-1)\hat{H}(k-1) + a(k)X(k-1)y(k)\right\} $$
$$ = \bar{R}^{-1}(k)\left\{\left[\bar{R}(k) - a(k)X(k-1)X^T(k-1)\right]\hat{H}(k-1) + a(k)X(k-1)y(k)\right\} $$
$$ = \bar{R}^{-1}(k)\left\{\bar{R}(k)\hat{H}(k-1) + a(k)X(k-1)\left[y(k) - X^T(k-1)\hat{H}(k-1)\right]\right\}, $$

from which it follows that

$$ \hat{H}(k) = \hat{H}(k-1) + \bar{R}^{-1}(k)X(k-1)\,a(k)\left[y(k) - X^T(k-1)\hat{H}(k-1)\right]. \quad (3.94) $$
Using (3.93) and introducing further the substitution

$$ P(k) = \bar{R}^{-1}(k) = \left[x(k)\bar{R}(k-1) + a(k)X(k-1)X^T(k-1)\right]^{-1} = \left[x(k)P^{-1}(k-1) + a(k)X(k-1)X^T(k-1)\right]^{-1}, $$

after applying the matrix inversion lemma (2.58) one obtains

$$ P(k) = \frac{1}{x(k)}\left\{P(k-1) - \frac{P(k-1)X(k-1)X^T(k-1)P(k-1)}{x(k)/a(k) + X^T(k-1)P(k-1)X(k-1)}\right\}. \quad (3.95) $$
In this manner, the general form of the recursive algorithm with a variable forgetting
factor is defined by the expressions

$$ \hat{H}(k) = \hat{H}(k-1) + K(k)\,e\!\left(k, \hat{H}(k-1)\right), \quad (3.96) $$

$$ e(k) = y(k) - X^T(k-1)\hat{H}(k-1), \quad (3.97) $$

$$ K(k) = \frac{P(k-1)X(k-1)}{x(k)/a(k) + X^T(k-1)P(k-1)X(k-1)} = a(k)P(k)X(k-1), \quad (3.98) $$

$$ P(k) = \frac{1}{x(k)}\left\{P(k-1) - \frac{P(k-1)X(k-1)X^T(k-1)P(k-1)}{x(k)/a(k) + X^T(k-1)P(k-1)X(k-1)}\right\}. \quad (3.99) $$

The derived recursive algorithm (3.96)–(3.99) minimizes the generalized criterion
of weighted least squares (3.83) under the condition that the weight factor
q, the so-called variable forgetting factor, has the form (3.85). Obviously, the
recursive least squares (RLS) algorithm represents a special case of this algorithm
for $x(k) = a(k) = 1$, while the recursive least squares algorithm with a fixed
exponential forgetting factor q in (3.12)–(3.15) is obtained for $x(k) = q$ and
$a(k) = 1$. Finally, the recursive weighted least squares algorithm with a variable
forgetting factor $q = q(k)$ in (3.12)–(3.15) is obtained from the quoted general
form (3.96)–(3.99) for $x(k) = q(k)$ and $a(k) = 1$. The three strategies for the
choice of q(k) in the last algorithm, the so-called EPE (denoted EGP in the experimental
analysis below), FKY and PA algorithms, were discussed in the previous sections. The
quoted strategies concretize the general structure of the variable forgetting factor
(3.85) and lead to practical algorithms for its automatic determination in real time,
during the procedure of filter parameter estimation.
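A minimal sketch of one step of the recursion (3.96)–(3.99) may clarify how the special cases arise (Python with NumPy; the scalar weights omega = x(k) and a = a(k) are passed in by the caller, which is an assumption of this sketch):

```python
import numpy as np

def gwls_step(theta, P, x_vec, y, omega=1.0, a=1.0):
    """One step of the generalized weighted LS recursion, Eqs. (3.96)-(3.99).

    omega, a -- the weights x(k) and a(k) of the factor (3.85);
                omega = a = 1 gives the RLS algorithm,
                omega = q, a = 1 gives WRLS with fixed forgetting factor,
                omega = q(k), a = 1 gives the VFF algorithm.
    x_vec    -- regressor X(k-1), shape (M+1,)
    """
    e = y - x_vec @ theta                          # residual (3.97)
    Px = P @ x_vec
    K = Px / (omega / a + x_vec @ Px)              # gain (3.98)
    theta = theta + K * e                          # parameter update (3.96)
    P = (P - np.outer(K, Px)) / omega              # covariance update (3.99)
    return theta, P, e
```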
Finally, at the end of this chapter we propose a new approach for generating a
variable forgetting factor, based on the modified generalized likelihood ratio
(MGLR) algorithm.

3.1.5 Modified Generalized Likelihood Ratio: MGLR Algorithm

The conventional MGLR algorithm is based on the consideration of signal samples
from three intervals (windows): the reference window, the test window and the joint
window, which is a concatenation of the preceding two (Fig. 3.1). All three windows
have a fixed length: the reference and test window lengths are N, and the joint
window length is 2N samples, on the intervals $[n-N+1, n]$, $[n+1, n+N]$ and
$[n-N+1, n+N]$, respectively. During the analysis procedure, all three
windows slide with a one-sample step, keeping the fixed lengths and mutual relationship.
For a given time instant n, we can consider two hypotheses:
H0: no change has occurred at the n-th time instant;
H1: a change has occurred at the n-th time instant.

Let us denote by D the logarithm of the likelihood ratio between the hypotheses H1
and H0, i.e.:

Fig. 3.1 Positions of the analysis windows in the MGLR algorithm

$$ D = \log\frac{P(H_1)}{P(H_0)}. \quad (3.100) $$

The logarithm of the likelihood ratio can be defined by using the conditional
probability density functions on the given intervals:

$$ D = \log\left\{\frac{f\left(x_{n-N+1}, \ldots, x_n \mid x_{n-N-p+1}, \ldots, x_{n-N}; X_1\right)\,f\left(x_{n+1}, \ldots, x_{n+N} \mid x_{n-p+1}, \ldots, x_n; X_2\right)}{f\left(x_{n-N+1}, \ldots, x_{n+N} \mid x_{n-N-p+1}, \ldots, x_{n-N}; X_3\right)}\right\}, \quad (3.101) $$
where

$$ f\left(x_1, \ldots, x_n \mid x_{-p+1}, \ldots, x_0; X_j\right) \quad (3.102) $$

is the probability density function of the samples $x_i$, $i = 1, 2, \ldots, n$, on the given
interval, generated by the model $X_j$ (j = 1 for the reference, j = 2 for the test, and j = 3
for the joint window), provided that the realization of $x_i$, $i = -p+1, \ldots, 0$, is
known.
If we assume that the signal samples are generated by a white Gaussian
process, we obtain:

$$ f\left(x_1, \ldots, x_n \mid x_{-p+1}, \ldots, x_0; X_j\right) = K_j \exp\left\{-\frac{1}{2\sigma_j^2}\sum_{i=i_j}^{i_j+N_j-1} u_i^2\right\}, \qquad j = 1, 2, 3, \quad (3.103) $$

where $u_i$ is a sequence of normally distributed random variables, and $K_j$ is the
constant

$$ K_j = \left(\frac{1}{\sqrt{2\pi\sigma_j^2}}\right)^{N_j}, \quad (3.104) $$

and $N_1 = N_2 = N$, $i_1 = i_3 = n-N+1$, $i_2 = n+1$, $N_3 = 2N$ (see Fig. 3.1).
Replacing (3.103) in (3.101), we obtain:

   
$$ D = \log\frac{K_1\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{i=n-N+1}^{n} u_i^2\right\}\,K_2\exp\left\{-\frac{1}{2\sigma_2^2}\sum_{i=n+1}^{n+N} u_i^2\right\}}{K_3\exp\left\{-\frac{1}{2\sigma_3^2}\sum_{i=n-N+1}^{n+N} u_i^2\right\}}. \quad (3.105) $$

If we replace $\sigma_i$, $i = 1, 2, 3$, in (3.105) by their sample estimates,

$$ \hat{\sigma}_1^2 = \frac{1}{N}\sum_{i=n-N+1}^{n} e_i^2, \qquad \hat{\sigma}_2^2 = \frac{1}{N}\sum_{i=n+1}^{n+N} e_i^2, \qquad \hat{\sigma}_3^2 = \frac{1}{2N}\sum_{i=n-N+1}^{n+N} e_i^2, \quad (3.106) $$

the expression for D becomes:

$$ D = \log\frac{\hat{K}_1\hat{K}_2}{\hat{K}_3} + \frac{1}{2\hat{\sigma}_3^2}\sum_{i=n-N+1}^{n+N} e_i^2 - \frac{1}{2\hat{\sigma}_1^2}\sum_{i=n-N+1}^{n} e_i^2 - \frac{1}{2\hat{\sigma}_2^2}\sum_{i=n+1}^{n+N} e_i^2, \quad (3.107) $$

where $e_k = \hat{u}_k$ is an estimate of the unmeasurable samples $u_i$, the so-called prediction
error or residual, while $\hat{K}_j = \left(1/\sqrt{2\pi\hat{\sigma}_j^2}\right)^{N_j}$ is the estimate of $K_j$ in (3.104).
Having in mind the expressions for $\hat{\sigma}_j$ and $\hat{K}_j$, the relation (3.107) becomes:

$$ D = \log\frac{\hat{K}_1\hat{K}_2}{\hat{K}_3} = \log\hat{K}_1 + \log\hat{K}_2 - \log\hat{K}_3 = N\log\left(\frac{1}{\sqrt{2\pi\hat{\sigma}_1^2}}\right) + N\log\left(\frac{1}{\sqrt{2\pi\hat{\sigma}_2^2}}\right) - 2N\log\left(\frac{1}{\sqrt{2\pi\hat{\sigma}_3^2}}\right). \quad (3.108) $$

Replacing the expressions (3.106) for $\hat{\sigma}_j$ in (3.108), one obtains the
expression for the discrimination function D:

$$ D(n,N) = 2N\log\left[\frac{1}{2N}\sum_{i=n-N+1}^{n+N} e_i^2\right] - N\log\left[\frac{1}{N}\sum_{i=n-N+1}^{n} e_i^2\right] - N\log\left[\frac{1}{N}\sum_{i=n+1}^{n+N} e_i^2\right]. \quad (3.109) $$

If we denote by $L(c,d)$ the logarithm of the likelihood function obtained on the
basis of the estimated residuals $e_j$ on the interval $[c, d]$, i.e.

$$ L(c,d) = (d - c + 1)\ln\left[\frac{1}{d - c + 1}\sum_{k=c}^{d} e_k^2\right], \quad (3.110) $$

then we obtain for the discrimination function in (3.109)

$$ D(n,N) = L(n-N+1,\, n+N) - L(n-N+1,\, n) - L(n+1,\, n+N). \quad (3.111) $$
The expression (3.111) represents the modified generalized likelihood ratio
(MGLR) algorithm for the hypothesis that a change in the signal model occurred at
the instant n, against the hypothesis that the signal remained unchanged. The function

D(n,N) is not smooth, and its sudden change indicates a possible abrupt signal
change. The discrimination function depends on the signal change rate and on the duration of
the change. The function D(n,N) is not appropriate for precise detection of abrupt
changes because of its noisy character. That is the reason why we observe a
short trend of the D function on the interval $[n - N/2 + 1,\, n + N/2]$, containing
N consecutive values of D(n,N). If $[n_1, n_2]$ denotes this interval, then
D(n,N), $n \in [n_1, n_2]$, can be expressed through the linear trend t(n,N) as

$$ D(n,N) = t(n,N) + \varepsilon(n,N), \quad (3.112) $$
where the linear trend is

$$ t(n,N) = a(n,N)\,k + b(n,N), \qquad k = 1, 2, \ldots, N, \quad (3.113) $$

and $\varepsilon(n,N)$ is the noise component of the D function. The value of a(n,N) is now smooth
enough, and represents the slope of the linear trend t. A local maximum of the function
a(n,N) occurs at some $k = n_{max}$, where D(n,N) suddenly increases within the $[n_1, n_2]$
interval. Similarly, a(n,N) reaches its local minimum at $k = n_{min}$, when D(n,N)
starts decreasing within the $[n_1, n_2]$ interval. If we denote the difference of two
consecutive local extremes of a(n,N) by

$$ Da(n_{max}, n_{min}) = a(n_{max}, N) - a(n_{min}, N), \quad (3.114) $$
then it can be heuristically shown that Da in (3.114) represents a good
detection parameter when the $[n_{max}, n_{min}]$ interval contains the parts of the analyzed
signal where a change appears. The intervals $[n_{max}, n_{min}]$ with

$$ Da(n_{max}, n_{min}) > tr, \quad (3.115) $$

where tr denotes a selected threshold, contain the sudden signal changes, while the
intervals in between can be treated as quasi-stationary parts of the signal.
The main advantage of the MGLR algorithm, compared to other well-known
procedures for the detection of abrupt signal changes, is connected with the calculation
of the D function in (3.111). Namely, the MGLR algorithm allows a posterior analysis, since the
D function is obtained in closed form, independently of previously
detected signal changes. For the calculation of the residual $e_j$, needed for obtaining the
D function, it is possible to use the robust recursive least squares (RRLS) algorithm instead of
the conventional parameter estimation procedures (recursive or non-recursive). By
combining the robustness property and the possibility of tracking abrupt signal
changes, the RRLS algorithm provides a residual that is convenient for further
analysis by the MGLR algorithm, i.e. for the calculation of the D function in (3.111).
The first step in the implementation of the abrupt signal change detection algorithm
is the calculation of the D(k,N) function, where k is the current signal sample, and N is the
window length (one window of length N, referred to as the reference and the test window,

Fig. 3.2 The relation between the discrimination function D, obtained by the MGLR algorithm, and the variable forgetting factor q ($D_{min}$ maps to $q_{max}$ and $D_{max}$ to $q_{min}$)

is taken on each side of the current sample). The function D reaches its maximum
values at the instants of signal changes, and thus ensures a good measure of
signal nonstationarity. The next step is the mapping of the D function to the forgetting factor
q, as shown in Fig. 3.2. The maximum of the D function is correlated with the forgetting
factor minimum, and vice versa. At the beginning, it is necessary to know the values
of $D_{min}$ and $D_{max}$.
$D_{min}$ is taken as 0, and this value remains unchanged during the operation of the
algorithm, while the starting value of $D_{max}$ is estimated based on two parameters:
the first one is the number of bits used in the A/D conversion of the analog signal,
and the second one is the interval length for the parameter estimation. The $D_{max}$
value is updated during the operation of the algorithm. The variable forgetting factor q is obtained
from the D function, based on the defined reference values $q_{min}$ and $q_{max}$,
respectively (see Fig. 3.2).
For the calculations in the test window it is necessary to have the residual values
e in advance, because the D function cannot be generated recursively. There
are two approaches to the solution of this problem. One can calculate the
D function in advance, i.e. prior to the recursive analysis it is necessary to perform a
preprocessing of the signal; the residual $e_j$, obtained by the standard non-recursive least
squares (LS) algorithm, can then be used for obtaining the D function. The other way is
to calculate the D function using a recursive LS algorithm, robust or conventional, but
this method introduces a processing delay of one window length (N).
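As an illustration, a sketch of the computation of D(n,N) by (3.110), (3.111) and of one simple (linear) realization of the mapping of Fig. 3.2 is given below (Python with NumPy; the residual array is assumed to be precomputed, e.g. by a non-recursive LS fit as described above, and the function names and the linear form of the mapping are assumptions of this sketch):

```python
import numpy as np

def log_likelihood(e_sq_segment):
    """L(c, d) of Eq. (3.110) for a segment of squared residuals."""
    n = e_sq_segment.size
    return n * np.log(np.sum(e_sq_segment) / n)

def mglr_forgetting_factor(e, n, N, D_max, q_min=0.85, q_max=0.998):
    """D(n, N) of Eq. (3.111) and its mapping to q (cf. Fig. 3.2).

    e -- residual sequence (0-based array), n -- current sample index,
    N -- reference/test window length; D_max is tracked by the caller.
    """
    e_sq = e ** 2
    joint = e_sq[n - N + 1 : n + N + 1]      # interval [n-N+1, n+N]
    ref = e_sq[n - N + 1 : n + 1]            # interval [n-N+1, n]
    test = e_sq[n + 1 : n + N + 1]           # interval [n+1, n+N]
    D = log_likelihood(joint) - log_likelihood(ref) - log_likelihood(test)
    # Linear map: D = D_min = 0 -> q_max, D = D_max -> q_min (Fig. 3.2)
    frac = np.clip(D / D_max, 0.0, 1.0)
    return q_max - frac * (q_max - q_min), D
```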

3.2 Experimental Analysis

3.2.1 Comparative Analysis of Recursive Algorithms for the Estimation of Variable Forgetting Factor (Analysis of RLS Algorithm with EGP, FKY and PA Strategy for the Calculation of Variable Forgetting Factor)

In the experimental part we analyzed the process of parameter identification
(estimation) for a ninth-order FIR filter with a parameter changing in time, by the
use of the recursive least squares algorithm with a variable forgetting factor (VFF).
The change of the first parameter $b_1(k)$ is defined over an interval of 3,000 iterations.
Testing was done on two synthesized signals which represent nonstationary
changes of the filter. As test signal 1, an abrupt change of the parameter around the 100th sample,
from a value of 0.1 to a value of 0.4, was introduced. Test signal 2 is
somewhat more complex and is represented in Fig. 3.3 [9]. The assumed change of the
parameter comprises sudden drops and surges, as well as linear rises
with a smaller and with a larger slope. The value of the parameter is constant
and equal to 0.1 during the first 1,100 samples, and then it increases linearly to
0.4 over the next 300 samples. After that, the value remains unchanged for the next
250 samples, and then changes suddenly to a value of 0.1 around the 1,650th
sample. During the next 300 samples the parameter retains a constant value, and then it
increases linearly, reaching after 200 samples a value of 0.4 and retaining it for the
next 150 samples. Around the samples 2,250 and 2,350 the parameter also abruptly
changes.
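For reproducibility, the trajectory of $b_1(k)$ just described can be generated as in the sketch below (Python with NumPy; the exact breakpoints follow the description above, while the shape of the two abrupt changes around samples 2,250 and 2,350 — a drop back to 0.1 followed by a return to 0.4 — is an assumption of this sketch, since the text does not state the intermediate values):

```python
import numpy as np

def test_signal2_parameter(n_samples=3000):
    """Piecewise trajectory of b1(k) for test signal 2 (cf. Fig. 3.3)."""
    b1 = np.empty(n_samples)
    b1[:1100] = 0.1                              # constant 0.1
    b1[1100:1400] = np.linspace(0.1, 0.4, 300)   # slow linear rise
    b1[1400:1650] = 0.4                          # constant plateau
    b1[1650:1950] = 0.1                          # abrupt drop to 0.1
    b1[1950:2150] = np.linspace(0.1, 0.4, 200)   # steeper linear rise
    b1[2150:2250] = 0.4                          # short plateau
    # Assumed square pulse for the abrupt changes near samples 2,250/2,350
    b1[2250:2350] = 0.1
    b1[2350:] = 0.4
    return b1
```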
The adopted test signals were utilized to test the adaptivity and accuracy of the
RLS algorithm with the application of the proposed strategies for the choice of the
variable forgetting factor (VFF). The experimental analysis was performed for
Fig. 3.3 Changes of the parameter $b_1$ of an FIR filter of order M = 9, corresponding to test signal 2

Fig. 3.4 The value of the estimated parameter H(1) of the FIR filter with the application of a fixed forgetting factor, q = 0.998 (left) and q = 0.85 (right) (the dashed line represents the change of the estimated parameter)

Fig. 3.5 Estimation of a time-variant FIR filter parameter by the application of the RLS algorithm with the EGP strategy for the choice of the variable forgetting factor. Top: variation of the forgetting factor q(k) corresponding to test signal 1. Bottom: solid line, value of the estimated parameter; dashed line, value of the accurate parameter

Fig. 3.6 Estimation of a time-variant FIR filter parameter by the application of the RLS algorithm with the FKY strategy for the choice of the variable forgetting factor. Top: variation of the forgetting factor q(k) corresponding to test signal 1. Bottom: solid line, value of the estimated parameter; dashed line, value of the accurate parameter

various values of the signal-to-noise ratio (SNR), and the results presented here were
obtained for SNR = 30 dB. Figure 3.4 shows the estimated parameter
values for test signal 1 when a fixed forgetting factor (FF) of q = 0.998 and
q = 0.85, respectively, is used in the RLS algorithm. The obtained results point
to the inertness of the classical recursive least squares (RLS) algorithm after the
onset of a nonstationary change for the case when the value of q is close to unity,
i.e. q = 0.998. Namely, a long time is necessary to obtain correct estimates of
the filter parameter, but after that period the obtained estimates have high
accuracy in the stationary (equilibrium or steady) mode of operation.
On the other hand, when using lower values of the forgetting factor q, i.e.
q = 0.85, the global trend of the nonstationarity can be estimated much faster,
owing to the fact that the RLS algorithm memory decreases and thus a much
smaller amount of measurement data is considered, but this is achieved at the
price of an increased variance of the estimated parameters. The obtained results
show that in practical situations one should utilize an adaptive algorithm with a
variable forgetting factor which is determined in each step simultaneously with the
estimation of the filter parameters.
Figures 3.5, 3.6 and 3.7 show the results of the estimation of the time-variant parameter
based on test signal 1, obtained by the RLS algorithm with a variable
forgetting factor (VFF) based on the EGP, FKY and PA strategies,

Fig. 3.7 Estimation of a time-variant FIR filter parameter by the application of the RLS algorithm with the PA strategy for the choice of the variable forgetting factor. Top: variation of the forgetting factor q(k) corresponding to test signal 1. Bottom: solid line, value of the estimated parameter; dashed line, value of the accurate parameter

respectively. Here, too, we considered the case when the filter output is noisy due to
Gaussian noise with a zero mean value and with a variance chosen so as to ensure a
signal-to-noise ratio of SNR = 30 dB. In the EGP algorithm a data window of length
L = 5 was chosen, which is sufficiently small with regard to the maximal time
constant (RLS algorithm memory)

$$ s_{max} = \frac{1}{1 - q_{max}} = 500. $$

The choice of $s_{max}$ depends on the previous knowledge of the degree of signal
nonstationarity, but it turns out that it does not have a significant influence on the
algorithm if values $100 < s_{max} < 1{,}000$ are chosen, which corresponds to the choice
$0.99 < q_{max} < 0.999$.
Since the value of the variable forgetting factor (VFF) in the EGP and FKY
algorithms can reach negative values, a bottom limit value $q_{min} = 0.85$ was
introduced. In spite of the fact that the use of the PA strategy for the choice of the VFF
did not require a subsequent limitation of the VFF, its bottom value was
limited to 0.85 in order to ensure a correct comparison of the different strategies.
It turns out that all three approaches give a proper VFF factor, i.e. they

Fig. 3.8 Estimation of the time-variant parameter $b_1$ (test signal 2) by the RLS EGP algorithm (SNR = 30 dB, L = 5): a VFF; b extended prediction error Q; c trajectories of the estimated and the accurate parameter $b_1$

follow abrupt parameter changes relatively well, with a small variance of the estimated
parameters in the intervals without abrupt changes. In the neighborhood of
an abrupt filter parameter change the VFF decreases to its minimal value, while in
the stationary parts of the signal this factor increases and tends to its maximal
value, i.e. $q \approx 0.998$. It should be stressed, however, that the FKY algorithm
for the determination of the VFF factor, Fig. 3.6, shows a somewhat slower adaptation
to the filter parameter change. Here the estimation quality is significantly influenced
by the value of the parameter $b_0$, whose inverse value should tend to a mean
time constant $s_0$ in the estimation process. An increase of its value decreases the
adaptability to filter parameter changes, while a decrease of this variable increases
the variance of the estimated parameters. A value of $b_0 = 0.01$ was chosen here,
which corresponds to a mean memory constant $s_0$ of 100 samples. The results of
parameter estimation using the PA algorithm, Fig. 3.7, show characteristics similar
to those of the EGP algorithm.
Figure 3.8 shows the values of the VFF factor, the extended prediction error and
the estimated parameter $b_1$ for the EGP algorithm on test signal 2 (Fig. 3.3). The
extended prediction error Q (Fig. 3.8b) evidently detects the signal nonstationarity.
Near the samples 1,650 and 2,250, where abrupt changes of the FIR filter parameter
occur, the extended prediction error Q detects high nonstationarity, which
leads to a decrease of the forgetting factor (VFF) and a faster adaptation of the

Fig. 3.9 Estimation of the time-variant parameter (test signal 2) by the RLS FKY algorithm (SNR = 30 dB, $b_0 = 0.01$): a VFF; b trajectories of the estimated and the accurate parameter

estimated parameters. When the value of the estimated parameters approaches the
desired value, the extended prediction error decreases, and the VFF increases its
value towards its maximum.
Figure 3.9 shows the results obtained by the RLS-FKY algorithm. This algorithm
shows a somewhat larger inertness compared to the EGP algorithm, which is
especially marked for slower changes, in the interval between the 1,100th and
1,400th samples, while at the moment of the sudden drop of the filter parameter value
it has characteristics similar to those of the EGP algorithm.
The PA-RLS algorithm, Fig. 3.10, shows the best adaptability, which is also
reflected in a larger convergence speed and a higher accuracy of the estimated
parameters. The parameter $\hat{q}(k)$, which defines the ratio between the nonstationarity
degree and the additive noise power, detects abrupt changes very well
and emphasizes them in comparison with linear changes, which results in adequate
changes of the forgetting factor (VFF).
Figure 3.11 shows the comparative characteristics of the estimated parameters for
all three proposed algorithms.
The obtained experimental results, based on simulations, point to the following
conclusions:
- The use of a variable forgetting factor leads to a better adaptability of the digital
filter parameter estimation in comparison with the conventional algorithms with a
fixed forgetting factor.

Fig. 3.10 Estimation of the time-variant parameter (test signal 2) by the PA-RLS algorithm (SNR = 30 dB, $b_{min} = 0.01$, $b_{crit} = 2$): a VFF; b ratio of nonstationarity degree and additive noise power; c trajectories of the estimated and the accurate parameter

Fig. 3.11 Estimation of the time-variant parameter (test signal 2) by the RLS algorithm with the FKY (dotted line), EGP (dashed line) and PA (solid line) strategies for the choice of the forgetting factor (VFF)

- An adequate change of the forgetting factor results in a satisfactory accuracy in
the stationary intervals and an adequate tracking of the nonstationary changes of the filter
parameters.
- The comparative analysis of the EGP, FKY and PA algorithms for the determination
of the variable forgetting factor shows the applicability of all three
algorithms, with an advantage of the PA algorithm, especially marked in the
intervals with linear changes of the filter parameters. In these intervals the PA
algorithm, with a correct choice of the forgetting factor, achieved a faster
convergence and a higher estimation accuracy. It should be noted that the quoted
improvements are a result of the higher mathematical complexity of this
algorithm.
Chapter 4
Finite Impulse Response Adaptive Filters
with Increased Convergence Speed

Generally speaking, the operation of adaptive filtering can be divided into two phases: the phase of parameter estimation convergence and the phase of following the variable parameters of the filter. In the convergence phase, the adaptive algorithm starts from some initial value of the vector of unknown filter parameters and tends to converge to the optimal parameter values, in the sense of the adopted performance index (quality criterion), such as the MSE criterion. After the convergence phase the adaptive filter enters the phase of following, in which the achieved optimal solution is followed. This step is especially important if the
system is time-variant. Two very important criteria for a qualitative assessment of
the performance of adaptive algorithms are the convergence speed and the accu-
racy of the estimated parameters after the convergence has been achieved. The
convergence speed is a characteristic of the adaptive algorithm in the first phase. It is usually defined as the number of iterations necessary for the algorithm to converge sufficiently close to the optimal solution. The accuracy of the estimated parameters is connected with the second phase of adaptive filtering and is a quantitative measure of the difference between the achieved mean square error and the minimal mean square error generated by the optimal filter [1, 4–6, 9, 24, 28].
Algorithms for adaptive filtering may achieve different convergence speeds and
accuracies of estimated parameters. It is often the case that the algorithm achieves
an increased convergence speed at the cost of an increased computational com-
plexity of the algorithm itself (e.g. LMS and RLS algorithm). Besides that, the
convergence speed of some algorithms may depend on the values of the param-
eters included in the design. However, there is generally a trade-off between the achieved convergence speed and the accuracy of the estimated parameters. In the general case, when the algorithm parameters are chosen to obtain faster convergence, the accuracy of the estimated parameters is decreased, and vice versa. According to this, it is desirable to design an algorithm with an increased convergence speed which at the same time will not significantly decrease the accuracy of the estimated parameters.
On the other hand, to solve the problems of parameter identification, echo cancellation, noise suppression, etc., one very often uses an adaptive FIR filter even in the cases when the observed system is basically of the IIR type [4, 17, 40],

primarily to avoid stability problems connected with the IIR filters. However, since
one needs to model a system with both zeroes and poles (IIR system) with a
structure with zeroes only (FIR system), the dimensions of the parameter vector in
the filter model must be chosen sufficiently large in order to obtain satisfactory
results. A consequence of such an approach is the increase of the order of the
adaptive FIR filter, and thus the number of iterations, i.e. the time required to
achieve convergence of the adaptive algorithm for parameter estimation. For this
reason one arrives at the motive to accelerate the standard RLS algorithm by a convenient choice of the variables included in the adaptive process, one of the most important being the input signal x(k). This choice belongs to the field of optimal experiment planning [26, 41–44]. Input signal design is much more often used in the so-called batch processing of measurement signals or in off-line identification (estimation) procedures [41, 42, 45, 46]. The basic idea is to generate the input signal according to a chosen criterion in such a manner as to make the process of adaptive parametric identification of the FIR system, i.e. the estimation of its parameters, as informative as possible, i.e. to make it contain as much information as possible about the estimated parameters, in order to shorten the convergence time of the estimation algorithm. The procedure for the enhancement of the convergence properties of the adaptive algorithm is based on a specific optimal design of the filter input signal, known in literature as D-optimal design and described in the next part of this chapter.

4.1 Definition of the Parameter Identification Problem

Let us consider the problem of parameter identification of an unknown system as defined in Fig. 4.1 (the model of the system is known with an accuracy up to an unknown parameter vector H).
Let the system under consideration be an FIR filter modeled by a linear regression equation

Fig. 4.1 General structure of the system for parameter identification
$$y(k) = X^T(k)H + n(k), \qquad (4.1)$$

where $X(k)$ is the $(M+1) \times 1$-dimensional vector of the input signal, $H$ is the $(M+1) \times 1$-dimensional parameter vector,

$$X^T(k) = [x(k), x(k-1), x(k-2), \ldots, x(k-M)], \qquad H^T = [b_0, b_1, b_2, \ldots, b_M], \qquad (4.2)$$
and $M$ is the filter order. It is necessary to estimate the unknown parameter vector so as to minimize the mean square (MSE) criterion (2.16). If this criterion is approximated by the arithmetic mean

$$J_k(\hat{H}) = \frac{1}{k}\sum_{i=1}^{k} e^2(i, \hat{H}), \qquad e(i, \hat{H}) = y(i) - \hat{y}(i|\hat{H}), \qquad (4.3)$$

where $\hat{y}(\cdot|\cdot)$ is the prediction of the reference signal (desired response) $y$, $e(\cdot,\cdot)$ is the prediction error and $\hat{H}$ is the estimation of the parameter vector $H$, and if the RLS algorithm (2.64)–(2.66) is applied to minimize the criterion (4.3), one arrives at the recursion for the update of the FIR filter parameters

$$\hat{H}(k) = \hat{H}(k-1) + K(k)e(k, \hat{H}(k-1)), \qquad (4.4)$$

where

$$K(k) = P(k-1)X(k)\left[1 + X^T(k)P(k-1)X(k)\right]^{-1}, \qquad (4.5)$$

$$P(k) = P(k-1) - K(k)X^T(k)P(k-1), \qquad (4.6)$$


and $\hat{H}(0)$ and $P(0)$ represent arbitrary initial values. A common choice for the initial conditions is $\hat{H}(0) = 0$ and $P(0) = r^2 I$, where $r^2 \gg 1$ and $I$ is a unit matrix of adequate dimensions. The mean square optimal value for $\hat{y}(\cdot|\cdot)$, based on the adopted signal model (4.1), is given by

$$\hat{y}(k|\hat{H}) = E\{y(k)|X(k), \hat{H}\} = X^T(k)\hat{H}, \qquad (4.7)$$

where $E\{a|b\}$ denotes the conditional expectation of a random variable $a$ if a random variable $b$ is given. Finally, making use of the lemma on matrix inversion (2.58), the matrix $P$ from (4.6) can be written as

$$P(k) = \left[\sum_{i=1}^{k} X(i)X^T(i) + P^{-1}(0)\right]^{-1}. \qquad (4.8)$$
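For concreteness, the recursion (4.4)–(4.6) with the initialization above can be condensed into a few lines of code. The following is a minimal sketch in Python (all names are illustrative, not taken from the text):

```python
import numpy as np

def rls_update(theta, P, x_vec, y):
    """One step of the RLS recursion (4.4)-(4.6).

    theta : (m,) current estimate of the parameter vector H
    P     : (m, m) matrix P(k-1)
    x_vec : (m,) input vector X(k) = [x(k), ..., x(k-M)]
    y     : scalar reference (desired) signal y(k)
    """
    Px = P @ x_vec                           # P(k-1) X(k)
    gain = Px / (1.0 + x_vec @ Px)           # K(k), Eq. (4.5)
    e = y - x_vec @ theta                    # prediction error, Eqs. (4.3), (4.7)
    theta = theta + gain * e                 # parameter update, Eq. (4.4)
    P = P - np.outer(gain, x_vec @ P)        # matrix update, Eq. (4.6)
    return theta, P, e

# Common initialization: H(0) = 0 and P(0) = r^2 I with r^2 >> 1.
M = 9
theta, P = np.zeros(M + 1), 1e4 * np.eye(M + 1)
```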
4.2 Finite Impulse Response Adaptive Filters with Optimal Input

The essence of the design (planning) of experiments is an adequate choice of the variables included, so as to make the experiment maximally informative with regard to the desired application [41, 42, 45, 46]. Besides that, in the class of informative experiments, some experiments are such that the estimated parameter vector $\hat{H}(k)$, obtained on the basis of a finite amount of data, reaches the value of the real parameter vector $H$ much faster than an estimated parameter vector obtained under some other conditions. The asymptotic error covariance matrix (AECM)

$$\lim_{k\to\infty} k\,E\left\{[\hat{H}(k) - H][\hat{H}(k) - H]^T\right\}$$
represents the measure of the mean distance of the estimation $\hat{H}(k)$ from the accurate value $H$, thereby determining the asymptotic convergence speed of the estimated parameter vector $\hat{H}(k)$ towards the real values $H$. On the other hand, from the well-known Cramér–Rao inequality it follows that

$$k\,E\left\{[\hat{H}(k) - H][\hat{H}(k) - H]^T\right\} \ge I_m^{-1}, \qquad (4.9)$$

where $I_m$ is the Fisher information matrix [28, 44]:

$$I_m = I(p)\,E\left\{X(k)X^T(k)\right\} = I(p)\,N(X(k)). \qquad (4.10)$$
The matrix with the dimensions $M \times M$,

$$N(X(k)) = E\left\{X(k)X^T(k)\right\},$$

represents the normalized information matrix, and the scalar variable

$$I(p) = E\left\{\left[p'(n(k))/p(n(k))\right]^2\right\}$$

is the Fisher information amount, where $p(n(k))$ is the probability density function


(pdf) of independent and identically distributed (iid) random variables nk in the
signal model (4.1), which are also independent on the input signal sequence in
(4.2). It follows that the Fisher information matrix (4.10) is a function of the
conditions of the experiment. According to this, the optimal experiment mini-
mizing the averaged difference between the vectors H ^ and H, can be designed by
minimization of a conveniently chosen scalar function / of the matrix argument
Im ; so that in each moment the previously defined criterion is minimized.
J uIm : 4:11
Due to practical reasons, the minimization (4.11) must be considered for the
class of input signals with limited values of amplitude, thus it is adopted that the
input signal satisfies the limitation
$$|x(i)| \le 1, \quad i = 1, 2, \ldots$$

The problem of minimization of the criterion (4.11) with regard to the information matrix $I_m$ is called the optimum experiment [28, 41–43]. It should be mentioned that different optimality criteria result in different optimal input sequences. For D-optimality the function $\phi$ is defined by [28, 41–43]

$$\phi(I_m) = \det\left(I_m^{-1}\right) = \frac{1}{\det(I_m)}, \qquad (4.12)$$

where det denotes a determinant. One of its basic advantages compared to other strategies known in literature is its low computational complexity. Since the value of the input signal, as the variable taking part in the experiment planning, influences only the normalized information matrix $N$ from (4.10), the choice of the function $\phi$ from (4.12) reduces to

$$\phi(N(X(k))) = \frac{1}{\det N(X(k))}. \qquad (4.13)$$
If $N$ from (4.10) is approximated by the arithmetic mean

$$N(X(k)) = k^{-1}\sum_{i=1}^{k} X(i)X^T(i), \qquad (4.14)$$

then, taking into account (4.8), it follows that

$$N(X(k)) = k^{-1}\left[P^{-1}(k) - P^{-1}(0)\right], \qquad (4.15)$$
where $P(0)$ is an arbitrary initial value. In this manner the optimization problem under consideration reduces to the minimization problem

$$\min_{x(k)} \det P(k), \qquad (4.16)$$

while implying that the optimal input satisfies the assumed practical limitation $|x(k)| \le 1$. Besides that, it follows from (4.5) and (4.6) that

$$\det P(k) = \det\left[I - \frac{P(k-1)X(k)X^T(k)}{1 + X^T(k)P(k-1)X(k)}\right]\det P(k-1). \qquad (4.17)$$
It is known from matrix theory that if $A$ and $B$ are matrices with dimensions $s \times r$ and $r \times s$, respectively, then $\det(I_s + AB) = \det(I_r + BA)$, where $I_s$ and $I_r$ represent $s \times s$ and $r \times r$ unit matrices, respectively [45]. According to that, by the adequate choice $A = P(k-1)X(k)/\left[1 + X^T(k)P(k-1)X(k)\right]$ and $B = -X^T(k)$ (with $r = 1$ and $s = m$), from (4.17) it follows that

$$\det P(k) = \frac{1}{1 + X^T(k)P(k-1)X(k)}\,\det P(k-1). \qquad (4.18)$$
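The identity (4.18) is easy to check numerically; a minimal sketch (with arbitrary illustrative data) follows:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
A = rng.standard_normal((m, m))
P_prev = A @ A.T + np.eye(m)          # a positive definite P(k-1)
x = rng.standard_normal(m)            # input vector X(k)

denom = 1.0 + x @ P_prev @ x
P_new = P_prev - np.outer(P_prev @ x, x @ P_prev) / denom   # Eqs. (4.5)-(4.6)
# Both printed values coincide, confirming Eq. (4.18):
print(np.linalg.det(P_new), np.linalg.det(P_prev) / denom)
```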
In this manner the choice of the optimal input signal reduces to

$$\max_{x(k)} X^T(k)P(k-1)X(k),$$

i.e.

$$\frac{\partial}{\partial x(k)}\left[X^T(k)P(k-1)X(k)\right] = 0. \qquad (4.19)$$
To solve the optimization problem (4.19), an auxiliary $M \times 1$ vector is introduced,

$$X_*^T(k) = [x(k-1), x(k-2), \ldots, x(k-M)], \qquad (4.20)$$

and the matrix $P$ is represented in the partitioned or block form

$$P(k-1) = \begin{bmatrix} P_{11}(k-1) & P_{12}(k-1) \\ P_{21}(k-1) & P_{22}(k-1) \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}, \qquad (4.21)$$

where $P_{11}$ is a scalar, $P_{12} = P_{21}^T$ is a $1 \times M$ row vector, $P_{22}$ is an $M \times M$ square matrix, and $m = M+1$. Using (4.20) and (4.21), the condition (4.19) reduces to

$$\frac{\partial}{\partial x(k)}\left[P_{11}(k-1)x^2(k) + 2P_{12}(k-1)X_*(k)x(k) + X_*^T(k)P_{22}(k-1)X_*(k)\right] = 0. \qquad (4.22)$$

By solving (4.22), with the introduced limitation of the input signal amplitude, one arrives at

$$x(k) = \mathrm{sign}\left[P_{11}^{-1}(k-1)P_{12}(k-1)X_*(k)\right]. \qquad (4.23)$$

The proposed adaptive algorithm for FIR filters with optimal input sequence is defined by (4.3)–(4.7), (4.20), (4.21) and (4.23). Equation (4.23) determines in each step a new value of the D-optimal input signal for the filter parameter estimation algorithm.
While implementing the algorithm one may use the following procedure:
Step 1: In the state $k$ ($k \ge M$), let the known values include the $m \times 1$ ($m = M+1$) vector of estimated parameters $\hat{H}(k-1)$, the $m \times 1$ vector $X(k-1)$ and the $m \times m$ matrix $P(k-1)$ from the previous state $k-1$.
Step 2: Form the $M \times 1$ vector $X_*(k)$ from (4.20) by deleting the last element of the $m \times 1$ vector $X(k-1)$.
Step 3: Form the scalar variable $P_{11}^{-1}(k-1)$ as the inverse of the first element of the first row of the matrix $P(k-1)$, and then from the remaining elements of this row form the $1 \times M$ row vector $P_{12}(k-1)$, as given in (4.21).
Step 4: Calculate the D-optimal sample of the input signal $x(k)$ based on (4.23).
Step 5: Form the vector $X(k)$ from (4.2) by shifting the first $m-1$ elements of the vector $X(k-1)$ one position to the right and entering the value $x(k)$ into the first place of the resulting vector.
Step 6: Calculate the current value of the vector of estimated parameters $\hat{H}(k)$ and of the matrix $P(k)$ according to (4.4)–(4.6).
Step 7: Set $k = k+1$ and go to Step 1.
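As an illustration, the seven steps can be condensed into a single routine. The sketch below (Python, with illustrative names; it reuses the rls_update routine sketched in Sect. 4.1) generates the D-optimal sample (4.23) and then performs the RLS update (4.4)–(4.6):

```python
import numpy as np

def d_optimal_iteration(theta, P, x_prev, measure_y):
    """One iteration of RLS identification with D-optimal input.

    theta     : (m,) estimate H(k-1)               (Step 1)
    P         : (m, m) matrix P(k-1)               (Step 1)
    x_prev    : (m,) previous input vector X(k-1)  (Step 1)
    measure_y : callable returning the measured response y(k) to X(k)
    """
    x_star = x_prev[:-1]                      # Step 2: X*(k), Eq. (4.20)
    p11, p12 = P[0, 0], P[0, 1:]              # Step 3: partition (4.21)
    x_new = np.sign(p12 @ x_star / p11)       # Step 4: D-optimal sample, Eq. (4.23)
    if x_new == 0.0:
        x_new = 1.0                           # degenerate case; |x(k)| <= 1 still holds
    x_vec = np.concatenate(([x_new], x_star)) # Step 5: new X(k)
    y = measure_y(x_vec)                      # response of the unknown system
    theta, P, _ = rls_update(theta, P, x_vec, y)   # Step 6: Eqs. (4.4)-(4.6)
    return theta, P, x_vec                    # Step 7: proceed with k+1
```

In a simulation, measure_y would implement the model (4.1), i.e. return $X^T(k)H + n(k)$ for the true parameter vector $H$; in the application of Sect. 4.4 it is the measured local echo.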
In comparison to the standard RLS algorithm, the computational complexity of the resulting algorithm for filter parameter estimation is increased by the calculation of the D-optimal input. The additional operations in each step are: the product of two $M$-dimensional vectors and the multiplication of the thus obtained scalar variable by another scalar variable (Steps 3, 4). Thus, the numbers of necessary addition and multiplication operations are equal to $m$ and $m+1$, respectively, where $m = M+1$. Besides that, it is necessary to store the values of three $m \times 1$ vectors and two $m \times m$ symmetric matrices (Steps 1, 6), as well as to implement several shifting operations (Step 6). It is well known from literature that the standard RLS algorithm requires $2.5M^2 + 4M$ multiplication and addition operations in a single iteration [6]. Using special structures for the amplification matrix $P$, it is possible to reduce the computational complexity of the algorithm so that it becomes proportional to $M$ instead of $M^2$ operations [6, 40]. It can thus be said, especially for filters of a higher order $M$, that the complexity of the proposed algorithm is approximately of the same order as that of the standard RLS algorithm.

4.3 Convergence Analysis of Adaptive Algorithms

Convergence analysis for the proposed algorithm can be carried out using the approach based on ordinary (deterministic) differential equations (ODE), presented in [24, 25, 39]. It can be shown that a recursive stochastic structure can, under adequate conditions, be represented by an equivalent ODE system, and the stability properties of this ODE system can be investigated by the direct method of Lyapunov [24, 25, 39, 47].
The recursive least squares (RLS) algorithm, defined by relations (4.2)–(4.7), can be written in several alternative forms convenient for the theoretical convergence analysis of the parameter estimations $\hat{H}$ towards their accurate values $H$ in the model (4.1). Firstly, the amplification matrix of the RLS algorithm (4.5) can be written as

$$K(k) = \frac{P(k-1)X(k) + P(k-1)X(k)X^T(k)P(k-1)X(k) - P(k-1)X(k)X^T(k)P(k-1)X(k)}{1 + X^T(k)P(k-1)X(k)}$$
$$= \frac{P(k-1)X(k)\left[1 + X^T(k)P(k-1)X(k)\right] - P(k-1)X(k)X^T(k)P(k-1)X(k)}{1 + X^T(k)P(k-1)X(k)}$$
$$= \left[P(k-1) - \frac{P(k-1)X(k)X^T(k)P(k-1)}{1 + X^T(k)P(k-1)X(k)}\right]X(k). \qquad (4.24)$$
Taking into account (4.5) and (4.6), the expression (4.24) reduces to the form

$$K(k) = P(k)X(k), \qquad (4.25)$$

where the covariance matrix of the estimation error, $P(k)$, is defined by relations (4.5) and (4.6). Let us also note that by replacing (4.5) into (4.6), the expression for the matrix $P(k)$ can be written as

$$P(k) = P(k-1) - P(k-1)X(k)\left[1 + X^T(k)P(k-1)X(k)\right]^{-1}X^T(k)P(k-1). \qquad (4.26)$$
Using the matrix inversion lemma (2.58), i.e.

$$(A + BCD)^{-1} = A^{-1} - A^{-1}B\left(C^{-1} + DA^{-1}B\right)^{-1}DA^{-1},$$

and choosing $A = P^{-1}(k-1)$, $B = X(k)$, $C = 1$, $D = X^T(k)$, the expression (4.26) assumes the alternative form

$$P^{-1}(k) = P^{-1}(k-1) + X(k)X^T(k). \qquad (4.27)$$


Introducing further the replacement

$$P(k) = \bar{R}^{-1}(k), \qquad (4.28)$$

the expression (4.27) becomes

$$\bar{R}(k) = \bar{R}(k-1) + X(k)X^T(k), \qquad (4.29)$$

i.e., using the new replacement

$$R(k) = \frac{1}{k}\bar{R}(k), \qquad (4.30)$$

the relation (4.29) reduces to its alternative form

$$kR(k) = (k-1)R(k-1) + X(k)X^T(k), \qquad (4.31)$$

from which one obtains

$$R(k) = \frac{k-1}{k}R(k-1) + \frac{1}{k}X(k)X^T(k). \qquad (4.32)$$
Now the RLS estimation of the unknown parameter vector $H$ in relation (4.1) is defined by the expression

$$\hat{H}(k) = \hat{H}(k-1) + \frac{1}{k}R^{-1}(k)X(k)e(k, \hat{H}(k-1)), \qquad (4.33)$$

where the amplification matrix $R(k)$ is recursively defined by the relation (4.32), i.e.

$$R(k) = R(k-1) + \frac{1}{k}\left[X(k)X^T(k) - R(k-1)\right]. \qquad (4.34)$$
The optimal input sequence is defined by the relation (4.23), which in the case of the quoted form of the RLS algorithm becomes

$$x(k) = \mathrm{sign}\left[R_{11}^{-1}(k-1)R_{12}(k-1)X_*(k)\right], \qquad (4.35)$$

where $R_{11}(k-1)$ is the element in the first row and the first column of the $m \times m$ matrix $R(k-1)$, $R_{12}(k-1)$ is the $1 \times M$ row vector representing the first row of the matrix $R(k-1)$ from which the first element $R_{11}(k-1)$ was deleted, and the column vector $X_*(k)$ is defined by the expression (4.20). Let us note that in the adopted notation $m = M+1$, where $M$ is the order of the FIR filter under consideration.
The algorithm (4.33), (4.34) represents a special case of a recursive algorithm minimizing the mean square criterion

$$J(H) = \frac{1}{2}E\left\{e^T(k, H)K^{-1}e(k, H)\right\}, \qquad (4.36)$$

where the prediction error or measurement residual is defined by the relation (4.3), i.e.

$$e(k, H) = y(k) - \hat{y}(k|H). \qquad (4.37)$$
In the expression (4.36) the matrix $K$ represents a constant weight matrix. If the system under consideration has a single input and a single output, which is fulfilled in the case under consideration, then the excitation signal $x$ and the output $y$ are scalars, so that the matrix $K$ has dimensions $1 \times 1$. In this case the scalar factor $K$ has the meaning of a scaling factor, and usually the value $K = 1$ is adopted. In the general case of a multivariable system with multiple inputs and multiple outputs, the input–output variables $x$ and $y$ represent column vectors, and the choice of the matrix $K$ influences the accuracy of the estimation procedure. Let us note that in the case of a system with a single input and a single output the least squares criterion (2.47) represents an approximation of the general criterion of weighted least squares (4.36), where it is adopted that $K = 1$ and the mathematical expectation is approximated by the adequate arithmetic mean.
The general form of the algorithm recursively minimizing the generalized MSE criterion (4.36) is defined by the stochastic Newton scheme

$$\hat{H}(k) = \hat{H}(k-1) - \gamma(k)\left[J''(\hat{H}(k-1))\right]^{-1}Q\left(\hat{H}(k-1), e(k, \hat{H}(k-1))\right), \qquad (4.38)$$
where the gradient of the argument of the criterion function (the so-called risk or loss function) in the MSE performance index (4.36) is

$$Q(H, e(k,H)) = \frac{d}{dH}\left[\frac{1}{2}e^T(k,H)K^{-1}e(k,H)\right] = \left[\frac{d}{dH}e^T(k,H)\right]K^{-1}e(k,H), \qquad (4.39)$$

and the matrix of the second derivatives of the MSE criterion with regard to the parameter vector, the so-called Hessian, is

$$J''(H) = \frac{d^2}{dH^2}E\left\{\frac{1}{2}e^T(k,H)K^{-1}e(k,H)\right\} = E\left\{\frac{d}{dH}\left[\frac{de^T(k,H)}{dH}K^{-1}e(k,H)\right]\right\}$$
$$= E\left\{\frac{de^T(k,H)}{dH}K^{-1}\frac{de(k,H)}{dH^T}\right\} + E\left\{\frac{d^2e^T(k,H)}{dH^2}K^{-1}e(k,H)\right\}. \qquad (4.40)$$
Adopting the notation

$$\psi(k,H) = -\frac{de^T(k,H)}{dH} = \frac{d\hat{y}(k|H)}{dH}, \qquad (4.41)$$

and assuming that $R(k)$ is a convenient approximation of the Hessian matrix (4.40) in the $k$-th discrete moment, the algorithm (4.38), (4.39) reduces to

$$\hat{H}(k) = \hat{H}(k-1) + \gamma(k)R^{-1}(k)\psi(k, \hat{H}(k-1))K^{-1}e(k, \hat{H}(k-1)), \qquad (4.42)$$
where $\gamma(k)$ is a sequence of positive numbers decreasing with an increase of the time index $k$. Let us note that the formulation of the algorithm (4.38) is based on the approximation of the gradient of the criterion function by the gradient of its argument, which basically reduces to the approximation of the corresponding mathematical expectation by a single realization of the random process, i.e.

$$J'(H) = -E\left\{\psi(k,H)K^{-1}e(k,H)\right\} \approx -\psi(k,H)K^{-1}e(k,H) = Q(H, e(k,H)).$$

Particularly, for the considered algorithm (4.33), (4.37),

$$K = 1, \quad \psi(k,H) = X(k), \quad e(k,H) = y(k) - \hat{y}(k|H), \quad \hat{y}(k|H) = X^T(k)H, \qquad (4.43)$$
where the input data vector $X(k)$ is defined by the relation (4.2).
The output prediction error, $e$, and its gradient, $\psi$, must be calculated according to a linear model in the state space, as proposed in [24]. Namely, such a linear predictor of the output signal (response) filters the system input–output data through a finite-dimensional discrete linear system, described by the discrete state-space model
$$\varphi(k+1, H) = F(H)\varphi(k, H) + G(H)u(k), \qquad (4.44)$$

$$\hat{y}(k|H) = E(H)\varphi(k, H), \qquad (4.45)$$

where $\varphi$ is the filter state vector, $u^T(k) = [y(k)\ x(k)]$ is the vector of the filter input and output, $\hat{y}(k|H)$ is the prediction of the filter output (response), and $F$, $G$ and $E$ are matrices parameterized with regard to the unknown parameter vector $H$ that is being estimated, i.e. they represent matrix functions of the vector argument $H$. Specifically, for the considered filter (4.43),

$$F(H) = \begin{bmatrix} 0_{1\times M} & 0 \\ I_M & 0_{M\times 1} \end{bmatrix}_{m \times m}, \quad G(H) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{bmatrix}_{m \times 2}, \quad E(H) = H^T, \quad \varphi(k, H) = X(k), \qquad (4.46)$$
where $I_M$ is a unit matrix with dimensions $M \times M$ and $m = M+1$. Differentiating the expressions (4.44) and (4.45) over $H$, one obtains

$$\xi(k+1, H) = F(H)\xi(k, H) + M(H, \varphi(k,H), u(k)), \qquad (4.47)$$

$$\psi^T(k, H) = E(H)\xi(k, H) + D(H, \varphi(k,H)), \qquad (4.48)$$

where

$$M(H, \varphi(k,H), u(k)) = \frac{\partial}{\partial H}\left[F(H)\varphi + G(H)u\right], \qquad (4.49)$$

$$D(H, \varphi(k,H)) = \frac{\partial}{\partial H}\left[E(H)\varphi\right]. \qquad (4.50)$$
Introducing the extended state vector

$$q(k, H) = \begin{bmatrix} \varphi(k, H) \\ \mathrm{col}\,\xi(k, H) \end{bmatrix}, \qquad (4.51)$$

Eqs. (4.44), (4.45), (4.47) and (4.48) can be represented by the following two vector relations defining the extended model in the state space:

$$q(k+1, H) = A(H)q(k, H) + B(H)u(k), \qquad (4.52)$$

$$\begin{bmatrix} \hat{y}(k|H) \\ \mathrm{col}\,\psi(k, H) \end{bmatrix} = C(H)q(k, H), \qquad (4.53)$$
where $\mathrm{col}(A)$ denotes a column vector formed from the matrix $A$ in such a manner that its columns are sorted one after another. The matrices $A$, $B$ and $C$ in the model (4.52) and (4.53) are obtained according to the concrete model of the system (filter). Specifically, for the considered FIR system, the $m \times m$ matrix ($H_i$ denotes the $i$-th element of the parameter vector $H$)

$$\xi(k, H) = \left[\frac{d\varphi(k,H)}{dH_1}\ \frac{d\varphi(k,H)}{dH_2}\ \cdots\ \frac{d\varphi(k,H)}{dH_m}\right], \qquad (4.54)$$

where $m = M+1$, so that the extended $m(m+1) \times 1$ state vector is

$$q(k, H) = \begin{bmatrix} \varphi(k,H) \\ d\varphi(k,H)/dH_1 \\ \vdots \\ d\varphi(k,H)/dH_m \end{bmatrix}, \qquad (4.55)$$

so that, according to (4.44), (4.46) and (4.49), the state equation is

$$q(k+1, H) = \begin{bmatrix} F(H) & 0 & \cdots & 0 \\ 0 & F(H) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & F(H) \end{bmatrix} q(k, H) + \begin{bmatrix} G(H) \\ 0 \\ \vdots \\ 0 \end{bmatrix} u(k) = A(H)q(k,H) + B(H)u(k), \qquad (4.56)$$

and, utilizing (4.43) and (4.46), the output Eq. (4.53) assumes the concrete form

$$\begin{bmatrix} \hat{y}(k|H) \\ \psi(k,H) \end{bmatrix} = \begin{bmatrix} H^T & 0 & \cdots & 0 \\ I_m & H_1 I_m & \cdots & H_m I_m \end{bmatrix} q(k,H) = C(H)q(k,H). \qquad (4.57)$$

Let us note that the matrix $A(H)$ contains the matrix $F(H)$ in each of its $m+1$ block-diagonal positions (so that $A$ has dimensions $m(m+1) \times m(m+1)$), and contains all zeroes outside the main block diagonal. A consequence of such a structure of the matrix $A$ is that the eigenvalues of the matrix $A$ are identical to the eigenvalues of the matrix $F$; only the degree of multiplicity of the eigenvalues of the matrix $A$ will be higher in comparison with the multiplicity of the corresponding eigenvalues of the matrix $F$. Thus, the stability of
the extended system (4.52), i.e. (4.56), will be determined by the stability of the linear predictor (4.44)–(4.46). Since for the considered FIR filter the matrix $F(H)$ is a lower triangular matrix with all zeroes on the main diagonal, all eigenvalues of the matrix $F$ are equal to zero (they are located in the origin of the $z$-plane), i.e. $z = 0$ is an eigenvalue of the matrix $F$ with multiplicity $m$, so $z = 0$ will also be an eigenvalue of the matrix $A$, but with multiplicity degree $m(m+1)$. Hence the extended linear model in the state space is stable, since the eigenvalues of the system matrix (the system poles) are located within the unit circle $|z| = 1$ in the $z$-plane [3]. Let us also note that the relations (4.52) and (4.53) are not convenient for practical calculations and are only utilized for the theoretical analysis of the properties of algorithms for system parameter estimation, i.e. for the analysis of the propagation of the prediction error and its gradient through the parameter estimation algorithm. The parameter estimation algorithm of the system (filter) is defined by the relation (4.42), while the prediction error (4.37) and its gradient (4.41) can be calculated using the relations (4.52) and (4.53), i.e.
$$\hat{H}(k) = \hat{H}(k-1) + \gamma(k)R^{-1}(k)\psi(k, \hat{H}(k-1))K^{-1}e(k, \hat{H}(k-1)), \qquad (4.58)$$

$$q(i+1, \hat{H}(k-1)) = A(\hat{H}(k-1))q(i, \hat{H}(k-1)) + B(\hat{H}(k-1))u(i), \quad i = 0, 1, \ldots, k-1; \quad q(0, \hat{H}(k-1)) = q_0, \qquad (4.59)$$

$$\begin{bmatrix} \hat{y}(k|\hat{H}(k-1)) \\ \mathrm{col}\,\psi(k, \hat{H}(k-1)) \end{bmatrix} = C(\hat{H}(k-1))q(k, \hat{H}(k-1)), \qquad (4.60)$$

$$e(k, \hat{H}(k-1)) = y(k) - \hat{y}(k|\hat{H}(k-1)). \qquad (4.61)$$
The concrete form of the matrices $A$, $B$ and $C$ depends on the system model, and for the FIR system these matrices are defined by the expressions (4.56) and (4.57).
The algorithm (4.58)–(4.61) in its basic form is not recursive, since the calculation of the system state vector $q(k, \hat{H}(k-1))$ requires the use of the complete set of input and output measurements $\{u(i),\ i = 0, 1, \ldots, k\}$. In order to represent the algorithm in recursive form, it is necessary to introduce additional approximations. The solution of the state difference Eq. (4.59) is

$$q(k, \hat{H}(k-1)) = A^k(\hat{H}(k-1))q(0) + \sum_{i=0}^{k-1} A^{k-i-1}(\hat{H}(k-1))B(\hat{H}(k-1))u(i). \qquad (4.62)$$
If the model in the state space is stable, i.e. all eigenvalues of the matrix $A$ are located within the area $|z| < 1$ of the $z$-plane, then $A^i(\hat{H}(k-1))$ will exponentially tend to zero with an increase of the degree (time index) $i$. Also, since $\gamma(k)$ is a very small positive number for sufficiently large $k$, the difference between the estimations $\hat{H}(k-1)$ and $\hat{H}(k-n)$ will be very small starting from some index $n$, so that the following approximation can be adopted:

$$q(k, \hat{H}(k-1)) \approx \bar{q}(k) = \sum_{i=0}^{k-1}\left[\prod_{s=i+1}^{k-1} A(\hat{H}(s))\right]B(\hat{H}(i))u(i). \qquad (4.63)$$
The advantage of the expression (4.63) is that $\bar{q}(k)$ can be written in recursive form, using only the last measurement of the system input and output, $u(k)$, instead of the whole measurement set $\{u(i),\ i = 0, 1, \ldots, k\}$, i.e.

$$\bar{q}(k+1) = A(\hat{H}(k))\bar{q}(k) + B(\hat{H}(k))u(k). \qquad (4.64)$$
Replacing the approximation (4.64) into the relation (4.59), the expression (4.60) can also be approximated as

$$\begin{bmatrix} \hat{y}(k|\hat{H}(k-1)) \\ \mathrm{col}\,\psi(k, \hat{H}(k-1)) \end{bmatrix} \approx \begin{bmatrix} \bar{y}(k) \\ \mathrm{col}\,\bar{\psi}(k) \end{bmatrix} = C(\hat{H}(k-1))\bar{q}(k), \qquad (4.65)$$

while the relation (4.61) can be presented in the approximate form

$$e(k, \hat{H}(k-1)) \approx \bar{e}(k) = y(k) - \bar{y}(k). \qquad (4.66)$$
In this manner, utilizing the introduced approximations, the nonrecursive estimation algorithm (4.58) can be written in the recursive form

$$\hat{H}(k) = \hat{H}(k-1) + \gamma(k)R^{-1}(k)\bar{\psi}(k)\bar{e}(k). \qquad (4.67)$$
The recursive approximation $R(k)$ of the Hessian matrix (4.40) can be derived in a similar manner. Let us assume that there is a value of the parameter vector $H_0$ which furnishes a good experimental description of the system (filter) under consideration, in the sense that $\{e(k, H_0)\}$ is a sequence of independent random variables with a zero mean value. Thus, near the minimum point $H_0$ of the MSE criterion (4.36), the measurement residual $e$ in (4.37) represents a realization of white noise, and thus, according to the definition of white noise, it is independent of everything that occurred in the previous moments of time, i.e. it is independent of the random term $d^2e/dH^2$ in (4.40). In this manner, the second term in (4.40) is equal to zero, and taking into account (4.41), the expression (4.40) reduces to

$$J''(H) = E\left\{\psi(k,H)K^{-1}\psi^T(k,H)\right\}. \qquad (4.68)$$

The approximation (4.68) of the Hessian is denoted as the Gauss–Newton approximation, or the Gauss–Newton search direction in the parameter vector space. The mathematical expectation (4.68) can be further approximated by the arithmetic mean, assuming that $H = \hat{H}(k-1)$,

$$R(k) = \frac{1}{k}\sum_{i=1}^{k}\psi(i, \hat{H}(i-1))K^{-1}\psi^T(i, \hat{H}(i-1)). \qquad (4.69)$$
Let us note that the expression (4.69) cannot be written recursively, since the error gradient $\psi$ depends on the whole set of the measurement data about the system input and output up to the current moment, so that $\psi$ itself cannot be written in recursive form. However, if $\psi(k, \hat{H}(k-1))$ in (4.69) is approximated by $\bar{\psi}(k)$ from (4.65), then (4.69) reduces to

$$R(k) = \frac{1}{k}\sum_{i=1}^{k}\bar{\psi}(i)K^{-1}\bar{\psi}^T(i). \qquad (4.70)$$
The expression (4.70) can further be written as

$$R(k) = \frac{1}{k}\sum_{i=1}^{k-1}\bar{\psi}(i)K^{-1}\bar{\psi}^T(i) + \frac{1}{k}\bar{\psi}(k)K^{-1}\bar{\psi}^T(k) = \frac{k-1}{k}\,\frac{1}{k-1}\sum_{i=1}^{k-1}\bar{\psi}(i)K^{-1}\bar{\psi}^T(i) + \frac{1}{k}\bar{\psi}(k)K^{-1}\bar{\psi}^T(k). \qquad (4.71)$$

Utilizing the defining expression (4.70), the relation (4.71) assumes the recursive form

$$R(k) = \left(1 - \frac{1}{k}\right)R(k-1) + \frac{1}{k}\bar{\psi}(k)K^{-1}\bar{\psi}^T(k). \qquad (4.72)$$
Finally, if one adopts the notation $\gamma(k) = 1/k$, the final form of the recursive approximation of the Hessian is obtained,

$$R(k) = R(k-1) + \gamma(k)\left[\bar{\psi}(k)K^{-1}\bar{\psi}^T(k) - R(k-1)\right], \qquad (4.73)$$

which actually represents an algorithm of stochastic approximation.
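In code, the stochastic-approximation update (4.73) of the Hessian estimate is a one-liner; a minimal sketch (illustrative names, scalar weight K) follows:

```python
import numpy as np

def hessian_update(R, psi, k, K=1.0):
    """Gauss-Newton Hessian approximation, Eq. (4.73), with gamma(k) = 1/k.

    R   : (m, m) previous estimate R(k-1)
    psi : (m,) gradient approximation of the output prediction
    K   : scalar weight (K = 1 in the SISO case)
    """
    gamma = 1.0 / k
    return R + gamma * (np.outer(psi, psi) / K - R)
```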
As emphasized earlier, in single input–single output (SISO) systems, which is the case with the considered FIR filter, $K$ is a scalar acting only as a scaling factor, and thus it is usually adopted that $K = 1$. In multivariable, multiple input–multiple output (MIMO) systems the choice of the matrix $K$ influences the accuracy of the estimation procedure, and $K$ is usually adopted as an approximation of the error covariance matrix

$$K_0 = E\left\{e(k, H_0)e^T(k, H_0)\right\}. \qquad (4.74)$$

Such a choice gives the minimal possible estimation error covariance matrix in the case of nonrecursive or batch estimation of the system model parameters (off-line parameter identification) [24, 25, 42–46, 48]. Since the value of $K_0$ in (4.74) is generally not known, it is usually approximated recursively, according to the stochastic approximation algorithm (4.73), i.e.

$$\hat{K}(k) = \hat{K}(k-1) + \gamma(k)\left[\bar{e}(k)\bar{e}^T(k) - \hat{K}(k-1)\right]. \qquad (4.75)$$
In this manner, the general structure of the algorithm recursively minimizing the generalized MSE criterion (4.36), (4.37) is given by the relations (4.64)–(4.67) and (4.73). The considered algorithm represents a recursive, stochastic, time-variant difference equation. The algorithm convergence analysis represents a
complex problem, primarily because the relation (4.67) is coupled with the relations (4.64) and (4.65). Namely, the prediction error $\bar{e}(k)$ and its gradient $\bar{\psi}(k)$ are formed using, implicitly, all the previous parameter estimations $\{\hat{H}(i),\ i = 1, 2, \ldots, k\}$, which furnishes a very complex mapping of the measurement data on the system input and output, $\{u(i),\ i = 1, 2, \ldots, k\}$, to the parameter vector space. Two general approaches were proposed in literature for the convergence analysis of the considered algorithm:
(1) Joining the corresponding deterministic (ordinary) differential equation to the estimation algorithm (the ODE approach), thus reducing the algorithm convergence problem to the stability analysis of an ordinary differential equation [24, 39].
(2) Joining a corresponding stochastic Lyapunov function to the recurrent stochastic procedure defining the estimation algorithm and applying the martingale theory for the convergence analysis of the introduced function [49, 50].
In the further text we consider the first of the quoted approaches, the so-called ODE approach. The corresponding deterministic differential equation, which asymptotically behaves in the same manner as the recurrent stochastic procedure defining the algorithm for system (filter) parameter estimation, can be derived based on the following considerations. For sufficiently large $k$ the step size $\gamma(k) = 1/k$ in the parameter estimation algorithm (4.67) will be very small, thus the parameter estimation $\hat{H}(k)$ will change ever more slowly with an increasing time index $k$. This has the corresponding consequences in the relations (4.64) and (4.65). Namely, the solution of the difference equation in the state space (4.64) is given by the expression (4.63). Let us further assume that $\hat{H}(i)$ belongs to a small neighborhood of the value $\bar{H}$ for $k-n \le i \le k-1$, where the condition is fulfilled that $\bar{H}$ belongs to the set of parameters for which the matrix $A(H)$ in (4.64) is stable, i.e.

$$\bar{H} \in D_s = \{H \mid A(H)\ \text{has all eigenvalues within the circle}\ |z| = 1\}. \qquad (4.76)$$

In this case, assuming that the mentioned neighborhood of $\bar{H}$ is sufficiently small, one may write

$$\prod_{i=k-n}^{k-1} A(\hat{H}(i)) \approx A^n(\bar{H}), \qquad (4.77)$$
where the norm of the matrix (4.77) is smaller than $c\lambda^{k-n}$ for some $\lambda < 1$ and a constant $c$. According to (4.77), the expression (4.63) can be approximated for sufficiently large $n$ as

$$\bar{q}(k) \approx \sum_{j=k-n}^{k-1} A^{k-j-1}(\bar{H})B(\bar{H})u(j). \qquad (4.78)$$
If one further adds to the sum (4.78) the terms corresponding to $A^{k-j-1}(\bar{H})B(\bar{H})u(j)$ for $j < k-n$, which will have an arbitrarily small influence, since $A(\bar{H})$ is a stable matrix, one obtains

$$\bar{q}(k) \approx q(k, \bar{H}) = \sum_{j=0}^{k-1} A^{k-j-1}(\bar{H})B(\bar{H})u(j). \qquad (4.79)$$
The relation (4.79) can be written recursively in the following manner:

$$q(k+1, \bar{H}) = \sum_{j=0}^{k} A^{k-j}(\bar{H})B(\bar{H})u(j) = \sum_{j=0}^{k-1} A^{k-j}(\bar{H})B(\bar{H})u(j) + B(\bar{H})u(k) = A(\bar{H})\sum_{j=0}^{k-1} A^{k-j-1}(\bar{H})B(\bar{H})u(j) + B(\bar{H})u(k). \qquad (4.80)$$

Using the defining expression (4.79), the relation (4.80) obtains its recursive form

$$q(k+1, \bar{H}) = A(\bar{H})q(k, \bar{H}) + B(\bar{H})u(k), \qquad (4.81)$$

where the initial condition is $q(0, \bar{H}) = 0$.
Under the quoted conditions the following approximations are also valid:

$$\bar{y}(k) \approx \hat{y}(k|\bar{H}), \quad \bar{\psi}(k) \approx \psi(k, \bar{H}), \quad \bar{e}(k) \approx e(k, \bar{H}), \qquad (4.82)$$

where

$$\begin{bmatrix} \hat{y}(k|\bar{H}) \\ \mathrm{col}\,\psi(k, \bar{H}) \end{bmatrix} = C(\bar{H})q(k, \bar{H}), \qquad (4.83)$$

$$e(k, \bar{H}) = y(k) - \hat{y}(k|\bar{H}). \qquad (4.84)$$
According to (4.82)–(4.84), when $\hat{H}(k)$ is near $\bar{H}$, $R(k)$ is near $\bar{R}$, and $k$ is sufficiently large, the relations (4.67) and (4.73) can be approximated as

$$\hat{H}(k) = \hat{H}(k-1) + \gamma(k)\bar{R}^{-1}\psi(k, \bar{H})e(k, \bar{H}), \qquad (4.85)$$

$$R(k) = R(k-1) + \gamma(k)\left[\psi(k, \bar{H})K^{-1}\psi^T(k, \bar{H}) - \bar{R}\right]. \qquad (4.86)$$

Let us further introduce the defining expressions

$$f(\bar{H}) = E\left\{\psi(k, \bar{H})K^{-1}e(k, \bar{H})\right\}, \qquad (4.87)$$

$$G(\bar{H}) = E\left\{\psi(k, \bar{H})K^{-1}\psi^T(k, \bar{H})\right\}, \qquad (4.88)$$
where $E\{\cdot\}$ is the mathematical expectation over all input–output variables $\{u(i),\ i = 0, 1, 2, \ldots, k\}$. It is assumed that for sufficiently large $k$ the transient mode in the system (4.81) vanishes and that in a steady (equilibrium or stationary) state the mathematical expectations in (4.87) and (4.88) are constant, i.e. time-invariant. Under the quoted assumptions, taking into account (4.87) and (4.88), the relations (4.85) and (4.86) can be approximated by

$$\hat{H}(k) = \hat{H}(k-1) + \gamma(k)\bar{R}^{-1}f(\bar{H}) + \gamma(k)v(k), \qquad (4.89)$$

$$R(k) = R(k-1) + \gamma(k)\left[G(\bar{H}) - \bar{R}\right] + \gamma(k)\omega(k), \qquad (4.90)$$

where $\{\omega(k)\}$ and $\{v(k)\}$ are sequences of random variables with a zero mean value (zero mathematical expectation). Let us assume further that $\Delta t$ is a small number, and $t$ and $t'$ are moments defined by

$$\sum_{i=t}^{t'} \gamma(i) = \Delta t. \qquad (4.91)$$
According to the relation (4.89) one can write

$$\hat{H}(k+1) = \hat{H}(k) + \gamma(k+1)\bar{R}^{-1}f(\bar{H}) + \gamma(k+1)v(k+1), \qquad (4.92)$$

i.e., after replacing (4.89) into (4.92),

$$\hat{H}(k+1) = \hat{H}(k-1) + [\gamma(k) + \gamma(k+1)]\bar{R}^{-1}f(\bar{H}) + \gamma(k)v(k) + \gamma(k+1)v(k+1) = \hat{H}(k-1) + \left[\sum_{i=k}^{k+1}\gamma(i)\right]\bar{R}^{-1}f(\bar{H}) + \sum_{i=k}^{k+1}\gamma(i)v(i). \qquad (4.93)$$
Similarly, it follows from (4.90) that

$$R(k+1) = R(k) + \gamma(k+1)\left[G(\bar{H}) - \bar{R}\right] + \gamma(k+1)\omega(k+1), \qquad (4.94)$$

from where, after replacing (4.90), one obtains

$$R(k+1) = R(k-1) + [\gamma(k) + \gamma(k+1)]\left[G(\bar{H}) - \bar{R}\right] + \gamma(k)\omega(k) + \gamma(k+1)\omega(k+1) = R(k-1) + \left[\sum_{i=k}^{k+1}\gamma(i)\right]\left[G(\bar{H}) - \bar{R}\right] + \sum_{i=k}^{k+1}\gamma(i)\omega(i). \qquad (4.95)$$

Assuming further that $R(k-1) = \bar{R}$ and $\hat{H}(k-1) = \bar{H}$, according to (4.91), (4.93) and (4.95) one concludes that
$$\hat{H}(t') = \bar{H} + \Delta t\,\bar{R}^{-1}f(\bar{H}) + \sum_{i=t}^{t'}\gamma(i)v(i), \qquad (4.96)$$

$$R(t') = \bar{R} + \Delta t\left[G(\bar{H}) - \bar{R}\right] + \sum_{i=t}^{t'}\gamma(i)\omega(i). \qquad (4.97)$$
Since the third addend in the expressions (4.96) and (4.97) represents an approximation of the corresponding mathematical expectation, and according to the assumption $E\{v(i)\} = E\{\omega(i)\} = 0$, we have

$$\frac{1}{t'-t+1}\sum_{i=t}^{t'}\gamma(i)v(i) \approx E\{\gamma(i)v(i)\} = \gamma(i)E\{v(i)\} = 0,$$

$$\frac{1}{t'-t+1}\sum_{i=t}^{t'}\gamma(i)\omega(i) \approx E\{\gamma(i)\omega(i)\} = \gamma(i)E\{\omega(i)\} = 0,$$

so that one concludes that

$$\hat{H}(t') = \bar{H} + \Delta t\,\bar{R}^{-1}f(\bar{H}), \qquad (4.98)$$

$$R(t') = \bar{R} + \Delta t\left[G(\bar{H}) - \bar{R}\right]. \qquad (4.99)$$
Adopting $t' = t + \Delta t$ and letting $\Delta t \to 0$, according to (4.98) and (4.99) one finally obtains the system of ordinary differential equations (the ODE system)

$$\frac{dH_d(t)}{dt} = R_d^{-1}(t)f(H_d(t)), \qquad (4.100)$$

$$\frac{dR_d(t)}{dt} = G(H_d(t)) - R_d(t). \qquad (4.101)$$
In the relations (4.100) and (4.101) the index $d$ is used to distinguish the solution of the system of deterministic differential equations from the variables in the recursive algorithm to which this system is joined. However, if for some large $t'_0$ the following is fulfilled,

$$\hat{H}(t'_0) = H_d(t_0), \quad R(t'_0) = R_d(t_0), \quad t_0 = \sum_{i=t}^{t'_0}\gamma(i), \qquad (4.102)$$

then for each $t' > t'_0$ the following will be satisfied [24]:

$$\hat{H}(t') = H_d(t), \quad R(t') = R_d(t), \quad t = \sum_{i=t}^{t'}\gamma(i). \qquad (4.103)$$
The relation (4.103) shows that, asymptotically (for sufficiently large $t$), the solution of the joined system of differential equations will coincide with the variables in the recursive algorithm. In other words, the trajectory of the estimated parameters asymptotically generated by the recursive algorithm will, after a sufficiently long time, follow the solution of the corresponding deterministic (ordinary) differential equation.
The asymptotic properties of ordinary differential equations are usually expressed through their stability features. A general method for the analysis of the stability of systems of nonlinear deterministic differential equations was given by Lyapunov, and this method bears the name of the second or direct Lyapunov method [3, 24, 39]. Let us consider the following system of differential equations:

$$\dot{x} = \frac{dx(t)}{dt} = f(x). \qquad (4.104)$$

A set $D_c$ is denoted as an invariant set of the vector differential Eq. (4.104) if an arbitrary trajectory starting within the set $D_c$ remains within it, i.e.

$$x(0) \in D_c \Rightarrow x(t) \in D_c \ \text{for all } t. \qquad (4.105)$$

The equilibrium or stationary point $\bar{x}$ of the differential Eq. (4.104) is defined as the solution of the nonlinear vector algebraic equation

$$\dot{x} = 0, \quad \text{i.e.} \quad f(\bar{x}) = 0. \qquad (4.106)$$

The stationary points represent an invariant set $D_c$ in (4.105), since it is fulfilled that

$$x(0) = \bar{x} \Rightarrow \dot{x} = f(\bar{x}) = 0 \Rightarrow x(t) = \bar{x} \ \text{for all } t.$$
Any invariant set $D_c$ has an attraction domain $D_A$, such that each trajectory (solution of the differential equation) starting in $D_A$ ends in $D_c$ after an infinitely long time (it converges asymptotically to $D_c$), i.e.

$$x(0) \in D_A \Rightarrow x(t) \to D_c \ \text{as } t \to \infty. \qquad (4.107)$$

In a general case $D_A \supseteq D_c$, and if $D_A$ contains $D_c$, the set $D_c$ is a stable invariant set. If $D_A$ coincides with the whole set on which the solution of the differential equation is defined, then one talks about global asymptotic stability, i.e. $D_c$ is a globally asymptotically stable invariant set.
The second (direct) Lyapunov method represents a means for stability analysis. Let $V(x)$ be a positive scalar function of the vector argument $x$, i.e.

$$V(x) \ge 0 \quad \text{for all } x, \qquad (4.108)$$

such that it is declining along the trajectory (solution) of the differential Eq. (4.104), i.e.

$$\frac{dV(x(t))}{dt} = \frac{dV(x(t))}{dx}\,\frac{dx(t)}{dt} = V'(x(t))f(x(t)) \le 0 \quad \text{for all } x(t), \qquad (4.109)$$
and
$$\frac{dV(x(t))}{dt} = V'(x(t))f(x(t)) = 0 \Rightarrow x(t) \in D_c. \qquad (4.110)$$
Each function $V(x)$ satisfying the conditions (4.108)–(4.110) is denoted as a Lyapunov function. Let us note that outside the set $D_c$ the function $V(x(t))$ is a strictly declining function of the argument $t$. However, since $V$ is a function bounded from below, it cannot decrease infinitely, i.e. $x(t)$ must converge to $D_c$. The conditions (4.108)–(4.110) guarantee global asymptotic stability of the set $D_c$. To determine that $D_A$ is the attraction domain of the set $D_c$, it is required that (4.109) be fulfilled only for $x(t) \in D_A$, but an additional condition is introduced,

$$c \ge V(x) \ge 0, \quad V(x) = c \ \text{for } x \in \partial D_A, \qquad (4.111)$$

where $\partial D_A$ is the boundary of the set $D_A$, in order to ensure that the trajectories (solutions of the differential equation) do not leave the set. It can be shown that if an invariant set $D_c$ has an attraction domain $D_A$, then a Lyapunov function with the above quoted properties always exists [24, 39].
If the Lyapunov theory is applied to a recursive stochastic algorithm of
parameter estimation and to the ordinary differential equation joined to it, the
following conclusions can be drawn [24]:
(1) If $D_c$ is an invariant set for the system of nonlinear differential Eqs. (4.100) and (4.101), and $D_A$ is its domain of attraction, then, if $\hat{H}_d(t) \in D_A$ often enough, the parameter estimation $\hat{H}_d(t)$ will converge with probability one to $D_c$ as $t$ grows infinitely ($t \to \infty$).
(2) Only the stable stationary points of the system of differential Eqs. (4.100) and (4.101) represent the possible convergence points of the recursive algorithm (4.64)–(4.67) and (4.73).
(3) The trajectories $H_d(t)$ of the differential Eq. (4.100) are the asymptotic paths of the estimations $\hat{H}(t)$ generated by the recursive algorithm (4.67).
Let us also note that point (1) represents a statement about the local convergence of the parameter estimation. If the stationary point $\bar{H}$ is locally stable, it will have an attraction domain containing a neighborhood of the point $\bar{H}$. Therefore, if the estimation sequence $\hat{H}(t)$ belongs to that neighborhood often enough, it will converge to $\bar{H}$.
The practical application of the Lyapunov theory for the stability analysis of the differential equation joined to the recursive algorithm of parameter estimation, and thus of the convergence properties of the considered recursive algorithm, consists of the following steps:
(1) Calculate the prediction error $e(k, H)$ and the approximation of its gradient $\psi(k, H)$ for a fixed value of the estimated parameter vector $H$.
(2) Using the variables from the previous step, calculate the expected (averaged) directions of the estimate corrections in the recursive algorithm, (4.87) and (4.88).
(3) Define the system of deterministic differential Eqs. (4.100) and (4.101), which contains on its right-hand side the averaged directions of changes from the previous point.
(4) Analyze the stability of the system of deterministic differential equations utilizing the Lyapunov theory.
The application of the proposed methodology for the convergence analysis to the
considered problem of parameter estimation for a digital FIR filter consists of the
following. The recursive algorithm for parameter estimation is defined by relations
(4.33)(4.35), which represent a special form of the general algorithm (4.64)(4.67)
and (4.73), when one adopts the defining relations (4.43), (4.46), (4.56) and (4.57).
The system of deterministic differential equations joined to the recursive algorithm is
defined by the relations (4.87), (4.88), (4.100) and (4.101), i.e.
dHt dRt
R1 tf Ht; GR; H; 4:112
dt dt
where
 
f H EfXiei; Hg; GR; H E XiXT i  R: 4:113
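The behavior of the ODE system (4.112), (4.113) can be illustrated numerically. For the model (4.1) with zero-mean noise one has $f(H) = E\{X(i)[X^T(i)(H^* - H) + n(i)]\} = Q(H^* - H)$ with $Q = E\{X(i)X^T(i)\}$; the following sketch (an illustrative example with a white unit-variance input, so that $Q = I$) integrates the equations by Euler steps:

```python
import numpy as np

H_true = np.array([0.5, -0.3, 0.2])   # illustrative "accurate" parameters H*
m = H_true.size
Q = np.eye(m)                          # E{X(i) X^T(i)} for white unit-variance input

H = np.zeros(m)                        # H(0)
R = 10.0 * np.eye(m)                   # R(0)
dt = 0.01
for _ in range(5000):
    f = Q @ (H_true - H)               # f(H) = E{X(i) e(i, H)}, Eq. (4.113)
    H += dt * np.linalg.solve(R, f)    # dH/dt = R^{-1}(t) f(H(t)), Eq. (4.112)
    R += dt * (Q - R)                  # dR/dt = E{X X^T} - R,      Eq. (4.112)
print(H)   # converges towards H_true, the stationary point f(H) = 0
```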
Let us adopt the positive function

$$V(H, R) = \frac{1}{2}E\left\{e^2(i, H)\right\} \qquad (4.114)$$

as a candidate for the Lyapunov function. To show that (4.114) is indeed a Lyapunov function, it is also necessary to show that its derivative is non-positive, which satisfies the assumptions (4.108) and (4.110). Indeed, according to (4.114) it follows that

$$\frac{dV(H, R)}{dt} = E\left\{e(i, H)\left[\frac{de(i,H)}{dH}\right]^T\right\}\frac{dH(t)}{dt}, \qquad (4.115)$$

from where, according to (4.43), (4.112) and (4.113), one obtains

$$\frac{dV(H, R)}{dt} = -E\left\{e(i,H)X^T(i)\right\}R^{-1}(t)f(H(t)) = -f^T(H(t))R^{-1}(t)f(H(t)) \le 0. \qquad (4.116)$$

The parameter estimation $\hat{H}(t)$ will converge with probability 1 to a stationary point $H^*$ of the Lyapunov function, as defined by the condition (4.110). Thus, the equality sign in (4.116) is valid for $H = H^*$ for which $f(H^*) = 0$, and the point $H^*$ represents a stationary point of the Lyapunov function $V$. The stationary points form a stable invariant set $D_c = \{H^* \mid f(H^*) = 0\}$ such that $P\{\lim_{k\to\infty}\hat{H}(k) \in D_c\} = 1$. In particular, if the stationary point is unique, the filter parameter estimations $\hat{H}(k)$ will converge with probability 1 to the accurate value of the parameter vector $H^*$, i.e. to the unique minimum of the MSE criterion.
However, the obtained results are asymptotic, i.e. they can be applied when the number of data is sufficiently large. Regretfully, it is not known how large that number should really be. The answer to that question, as well as the answers to many other questions connected with the practical application of the analyzed algorithm on a finite set of data, must be based on experimental analysis. To show the properties of the proposed algorithm, we will analyze the problem of local echo cancellation in scrambling systems for full-duplex transmission. The results of the experimental analysis are shown in the continuation.

4.4 Application of Recursive Least Squares Algorithm with Optimal Input for Local Echo Cancellation in Scrambling Systems

In this section we analyze the application of the RLS algorithm with optimal input for local echo cancellation in a system for speech scrambling over conventional telephone lines. The case is considered when the scrambler is located between the telephone handset and the base unit. Such a design ensures a very efficient protection from compromising electromagnetic radiation and eavesdropping, and represents one of the often met models of protection against unauthorized access to verbal information [18, 34, 51–55].
Although the protection of verbal communication has for a long time been a privilege of state institutions, in recent times there has been a surge of interest in its application in the private sector. Such an interest is a result of the availability and public accessibility of communication systems, as well as of the technological advancement which offers acceptable solutions for protection, both regarding the price and the quality. Among communication systems, the telephone network is the largest system in which it is necessary to solve the problem of protection of verbal information. In this system the problem of protection can be solved by analog scramblers or by digital coding. These are two fundamentally different approaches, and it is well known that digital coding offers a higher degree of protection than the analog approach [53]. Besides the advantages regarding the protection quality, the basic shortcoming of digital coding is the fact that the resulting waveform occupies a much wider frequency range than the frequency range utilized by a speech signal, while the procedure of speech signal compression aimed at overcoming the quoted problem represents a relatively complex practical problem [56]. Because of that there is still a relatively large interest in the use of analog scramblers, also due to the need to solve the question of protection on the existing telephone lines within the standard telephone channels, in a manner acceptable both regarding the price and the quality of the transmitted speech signal. In spite of the quoted advantages of analog scramblers, there are many problems to be solved in their design, such as the realized level of protection, synchronization and its maintenance, echo cancellation, etc. Among the many problems influencing the speech quality in these applications, the dominant one is the local echo. This problem is especially marked in the
scrambling systems which should ensure full-duplex communication over two-wire lines when the scrambler is placed between the handset and the base unit [34]. The considered technique for adaptive local echo cancellation encompasses local echo cancellation in real full-duplex systems with a scrambler within the standard telephone channel. It should be mentioned that there is another possibility to solve this problem, in which the speech signal is limited to a half of the available standard telephone channel [54]. When solving the local echo cancellation problem, special care has been dedicated to the analysis of the influence of the training signal structure on the properties of the system for adaptive echo cancellation [51, 52]. The problem of local echo cancellation itself will be additionally considered in Chap. 6.

4.4.1 Definition of the Local Echo Cancellation Problem in Scrambling Systems

A simplified full-duplex scrambling system is shown in Fig. 4.2. In an ideal case the whole energy of the output signal from the transmitter (side A, Fig. 4.2a) is transmitted over the telephone line to the receiver (side B, Fig. 4.2a). However,

Scrambler Scrambler
mic.

A local echo line B


H H

loudsp.
Descrambler Descrambler

(a)
x (k )
Scrambler

local
echo
H
canceller

e (k ) y ( k )
y (k )

Descrambler +

(b)

Fig. 4.2 Block diagram of the system for voice signal scrambling. a Generation of local echo;
b Position of local echo canceller block
Fig. 4.3 Principle of local echo cancellation for one direction of transfer (EC – echo canceller)
since the hybrid termination is not ideal, a part of the transmitted signal energy arrives at the receiver on the local side and causes degradation of the communication quality [4, 17, 18, 23]. This signal is called the local echo. The second part of the transmitted signal, called the line echo, is reflected because of the impedance discontinuities of the telephone line, to finally also end in the receiver at the local side. The problem of line echo cancellation was analyzed in detail in literature [17, 23], so in the following we limit ourselves to the problem of local echo cancellation only. Bearing in mind the presented system design, the local echo cancellation block must be a part of the scrambling system.
The transfer function through the hybrid, H, which is denoted as the local echo path, is not equal to zero. Because of that the basic problem is to approximate the transfer function of the local echo path and then, by subtracting the estimated local echo signal $\hat{y}(k)$ from the real local echo $y(k)$, cancel the resulting signal $e(k)$ as much as possible (see Figs. 4.2 and 4.3). This corresponds to the conventional identification scheme of an unknown system, which is known with an accuracy up to an unknown parameter vector $H$, as represented in Fig. 4.3. Besides that, in applications for local echo cancellation the disturbances or noise, $n(k)$, can be modeled as white noise (a white stochastic process) [23].
Besides that, to reach a desired degree of echo cancellation, it is necessary to minimize the mean square error (the MSE criterion), i.e. $MSE = E\{e^2(k)\}$ [18]. In other words, the basic idea is to use an adaptive filter as the local echo canceller (EC) block, in order to estimate the transfer function of the local echo path in the best possible way [51, 52]. It is known that the transfer function can be adequately modeled as a real rational function with poles and zeroes (it represents a ratio of two polynomials with real parameters) [23]. Besides that, most of the contemporary EC systems are realized as adaptive FIR filters, which have zeroes only, primarily due to the stability problems that inevitably follow adaptive filters realized in the form of IIR structures [17, 18]. Because of that we further analyze the use of an adaptive FIR filter for the approximation of the transfer function of a local echo path.
Starting from the quoted assumptions, the path of the local echo can be described by the difference Eq. (4.1). According to the problem of parameter identification, it is necessary to determine the set of coefficients $\hat{H}$ that minimizes the MSE criterion (4.3), from where the recursion for the parameter update of the adaptive FIR filter follows, as given by formulae (4.4)–(4.6).
Since the basic purpose of such systems is the protection of telephone communications, they are primarily intended for personal use. Because of that the connection location in the telephone network is often changed, and a consequence is the change of the local echo transfer function in practically every new call [34]. Because of that, in local echo cancellation the adaptation must occur at the beginning of each new call, as a part of the training or initialization procedure (pre-processing before the transmission of the speech itself). Regretfully, during this initialization procedure there is no useful transmission and the dialogue participants cannot hear each other. This is a motive to accelerate the standard RLS algorithm by the choice of the variables included in the design, especially the input signal $x(k)$, which leads to the generation of the optimal input.

4.4.2 Experimental Analysis

To show the properties of this approach, we simulated the structure for echo cancellation shown in Fig. 4.3. The signal of the desired response $y(k)$ is formed by bringing the excitation $x(k)$ to an FIR filter of the $M$-th order and by adding to its output independent noise with normal distribution, $n(k)$, with zero mean value. The variance of the additive noise is chosen so as to achieve the desired values of the signal-to-noise ratio, SNR. Various values of SNR were generated to analyze the efficiency of the proposed design for local echo cancellation. The EC block, implemented as an adaptive filter of the same order $M$, is determined by the parameter vector $\hat{H}$, which is generated by the recursive algorithm for parameter estimation (4.4)–(4.7), (4.20) and (4.23). The same excitation, $x(k)$, is applied to the EC block. First the experiment was performed with a filter of order $M = 9$ and with the coefficients $H = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1\}$ [57]. Besides that, the application of three types of excitation signals was considered:
(1) The optimal input sequence defined by (4.23), denoted as "opt";
(2) White normal noise with zero mean value and unit variance, denoted as "gauss";
(3) A pseudo-random binary sequence ±1, denoted as "prbs".
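For reference, the latter two excitations can be generated as in the following sketch (illustrative; a hardware PRBS would normally come from a shift-register generator, and random ±1 samples are used here as a simple stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1500
x_gauss = rng.standard_normal(N)             # (2) white Gaussian noise, N(0, 1)
x_prbs = rng.choice([-1.0, 1.0], size=N)     # (3) +/-1 binary sequence
# (1) "opt" is produced sample-by-sample inside the adaptive loop via Eq. (4.23)
```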

The results of parameter estimation obtained with the excitation signals (1)–(3) were compared on the basis of the normalized estimation error (NEE) [57]
Fig. 4.4 Normalized estimation error for white Gaussian noise, pseudo-random binary sequence and optimal input sequence, for an FIR filter of order M = 9 and various SNR values

$$NEE(k) = 10\log_{10}\frac{\|\hat{H}(k) - H\|^2}{\|H\|^2}, \qquad (4.117)$$

where $\|\cdot\|$ denotes the Euclidean norm.
Figure 4.4 shows the values of the NEE factor for different SNR values. Three different values of SNR were analyzed: 10, 20 and 30 dB. The measure of the efficiency of the EC block was determined according to a modified Echo Return Loss Enhancement (ERLE) factor, defined in [18, 23] as

$$ERLE(k) = 10\log_{10}\frac{E\left\{[y(k) - n(k)]^2\right\}}{E\left\{[e(k, \hat{H}(k-1)) - n(k)]^2\right\}}, \qquad (4.118)$$

where $e(\cdot)$ is given by (4.3) and (4.7). While implementing (4.118), the mathematical expectation $E\{\cdot\}$ was approximated using an arithmetic sum of 1,500 corresponding values [11, 55]. In this way the ERLE factor was calculated after 1,500 time iterations, when the process of filter adaptation was already finished.
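Both performance measures are straightforward to compute from simulation data; a minimal sketch of (4.117) and (4.118) follows (illustrative names):

```python
import numpy as np

def nee_db(H_hat, H_true):
    """Normalized estimation error, Eq. (4.117), in dB."""
    return 10.0 * np.log10(np.sum((H_hat - H_true) ** 2) / np.sum(H_true ** 2))

def erle_db(y, e, n):
    """Modified ERLE factor, Eq. (4.118), in dB. The expectations are
    approximated by arithmetic means over the supplied samples, e.g.
    1,500 values taken after the adaptation has settled."""
    return 10.0 * np.log10(np.mean((y - n) ** 2) / np.mean((e - n) ** 2))
```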

M=9
60
opt
gauss
50 prbs

40
ERLE[dB]
30

20

10

0
30 20 10
SNR [dB]

Fig. 4.5 ERLE factor for white Gaussian noise, pseudo-random binary sequence and optimal
input sequence for the FIR filter order M 9; after 1,500 iterations and for different SNR values

Figure 4.5 shows the values of the ERLE factor for different SNR values. It can be seen from Figs. 4.4 and 4.5 that the EC block realized on the basis of an FIR filter with the optimal input sequence shows better properties than the one obtained when white Gaussian noise or a pseudo-random binary sequence is utilized as the excitation signal.
Besides that, the following should be stressed:
(1) The initial convergence speed of the normalized estimation error (4.117) is larger when the optimal input sequence is utilized, for all three analyzed SNR cases. This is especially marked for lower values of SNR (see Fig. 4.4 for SNR = 10 dB).
(2) If the optimal input sequence is used, lower values of the normalized estimation error are obtained than when utilizing Gaussian noise or a pseudo-random binary sequence, for all analyzed values of the SNR, which points to a more successful parameter estimation and faster convergence.
(3) The ERLE factor (4.118) is larger when utilizing the optimal input sequence, for all analyzed SNR values, which points to a higher efficiency in local echo cancellation.
The echo cancellation structure presented in Fig. 4.3 was also simulated under conditions identical to the previous case, but for larger filter orders, M = 256 and M = 1,000. The results obtained by this simulation are shown in Figs. 4.6, 4.7, 4.8 and 4.9. According to the simulation results, one may conclude that the proposed solution for adaptive cancellation of the local echo, based on the design of the optimal input sequence, shows better results in comparison to the conventional approach where white Gaussian noise or a pseudo-random binary sequence is utilized, both for low and high filter orders. This is especially important for practical applications of the proposed scheme for local echo cancellation. Namely, practical experience shows that satisfactory results in local echo cancellation in the existing public telephone
Fig. 4.6 Normalized estimation error (NEE) for white Gaussian noise, pseudo-random binary sequence and optimal input sequence, for the FIR filter order M = 256 and various SNR values

Fig. 4.7 ERLE factor for white Gaussian noise, pseudo-random binary sequence and optimal input sequence, for the FIR filter order M = 256, after 1,500 iterations and various SNR values
Fig. 4.8 Normalized estimation error (NEE) for white Gaussian noise, pseudo-random binary sequence and optimal input sequence for the FIR filter order M = 1,000 and various SNR values

network can be obtained using FIR filters with an order M between 128 and 256. However, this does not limit the use of the proposed approach with higher-order filters, depending on the particular application, such as acoustic echo cancellation (Figs. 4.8, 4.9).
Finally, as a conclusion of the performed analysis, one may stress the following:
(1) We analyzed the problem of local echo cancellation in full-duplex scrambled transmission, when the scrambling block is positioned between the handset and the base unit. As a solution to this problem we proposed the use of an adaptive local echo canceller and an adaptive algorithm for local echo cancellation based on the generation of an optimal input sequence according to the D-optimal experiment design. In this approach one generates in each step a new sample of the D-optimal sequence for the estimation of the filter parameters. Because of this, the adaptation of the local echo canceller is defined as a training procedure occurring prior to the beginning of the desired protected communication.
(2) The properties of this approach are shown through simulation results. Compared to the traditional adaptive FIR filters with Gaussian or pseudo-random excitation signals, it is shown that whenever the choice of the adaptive filter order is correct, the proposed solution is better, which is reflected in faster

Fig. 4.9 ERLE factor for white Gaussian noise, pseudo-random binary sequence and optimal input sequence for the FIR filter order M = 1,000, after 10,000 iterations and various SNR values

convergence, larger local echo cancellation and a smaller normalized estimation error. Let us note that the D-optimal input sequence is generated anew, according to the changes in the telephone network topology caused by each new call. Besides that, the proposed approach does not introduce limitations that would hinder the realization of additional echo cancellers throughout the duration of the protected connection.
(3) Such an approach enables the design of a new class of devices for speech scrambling with a low level of compromising electromagnetic radiation, which, generally speaking, increases the overall protection level of the whole system. Practical experience shows that satisfactory results in public telephone networks can be obtained for FIR filters between the 128th and 256th order. This, however, does not limit the use of the proposed approach for FIR filters of higher order, for instance 1,000, depending on the particular application (a high FIR filter order is characteristic for acoustic echo cancellation).

4.5 Application of Variable Forgetting Factor to Finite Impulse Response Adaptive Filter with Optimal Input
It was shown in Chap. 3 that the use of a variable forgetting factor in nonstationary environments leads to a better adaptability of parameter estimation compared to the conventional algorithms with a fixed forgetting factor. A comparative analysis of several strategies for the choice of the VFF showed the advantage of the PA-RLS algorithm, which is reflected in good tracking of both fast and slow changes of the estimated parameters, with a small variance of the parameter estimates in the intervals without changes. Starting from the results obtained for the application of the optimal input sequence, presented in Sects. 4.2 and 4.4, the question arises as to what the properties of the algorithms with optimal input are in a nonstationary

environment, i.e. whether it is possible to apply the variable forgetting factor (VFF) based on the PA strategy to the algorithms with an optimal input sequence.

To investigate the properties of the modified PA-RLS algorithm with optimal input, in which the strategy for the choice of the variable forgetting factor is defined by (3.7)–(3.9) and the optimal input sequence by (4.20) and (4.23), we simulated the identification structure shown in Fig. 4.1 in the following manner. The desired response y(k) is obtained by bringing a signal x(k) to the input of an FIR filter of the order M, while independent Gaussian noise n(k) with a fixed variance and a zero mean value is added to the output. The noise variance is chosen so as to achieve a desired SNR factor at the filter output. Various values of SNR were generated with a goal to analyze the proposed algorithm. The adaptive filter is also an FIR filter of the order M, determined by the vector of estimated parameters $\hat{H}(k)$. The same input signal x(k) is brought to the input of the adaptive filter. The experiment was conducted on an FIR filter of the ninth order, with coefficients H = {0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1} [57], in which the first coefficient varied according to Fig. 3.1, as already described in Chap. 3. Besides that, two types of input signals were analyzed:
(1) the optimal input sequence defined by (4.23), denoted as "opt";
(2) white normal noise with a zero mean value and unit variance, denoted as "gauss".
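As a side note, the following sketch shows one way to generate such identification data, choosing the noise variance so that the prescribed SNR is achieved at the filter output; the function name and the specific values are assumptions of this example:

```python
import numpy as np

def make_identification_data(h, x, snr_db, rng=np.random.default_rng(0)):
    """Desired response of an FIR system h driven by x, plus zero-mean
    Gaussian noise scaled so that the output SNR equals snr_db."""
    y = np.convolve(x, h)[:len(x)]                   # noise-free filter output
    noise_var = np.mean(y ** 2) / 10 ** (snr_db / 10.0)
    n = rng.normal(0.0, np.sqrt(noise_var), len(x))  # measurement noise
    return y + n                                     # noisy desired response d(k)

h = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1])
x = np.random.default_rng(1).normal(size=3000)       # "gauss" input
d = make_identification_data(h, x, snr_db=20)
```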
The results of parameter estimation obtained by the use of the PA-RLS algorithm for the choice of the VFF and the input signals (1) and (2) were compared using the normalized estimation error (4.117).
Fig. 4.10 Normalized estimation error (SNR = 10 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line Gaussian noise; dashed line optimal input sequence

Fig. 4.11 Normalized estimation error (SNR = 20 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line Gaussian noise; dashed line optimal input sequence

Fig. 4.12 Normalized estimation error (SNR = 30 dB) for the choice of VFF using the PA-RLS algorithm with the input signal: solid line Gaussian noise; dashed line optimal input sequence

Table 4.1 Value of the averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 10 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals

(a) SNR = 10 dB

D   Input  I1      I2      I3      I4      I5      I6      I7      I8      I9      I10     I11
10  Opt    -27.00  -18.89  -17.76  -18.14  -16.32  -19.73  -15.51  -16.01  -16.06  -22.90  -31.96
10  Gauss  -24.31  -18.36  -20.31  -18.05  -15.19  -20.48  -15.83  -15.89  -15.56  -22.31  -30.15
25  Opt    -26.94  -19.12  -17.74  -18.06  -16.38  -19.30  -15.53  -15.98  -16.02  -23.06  -32.08
25  Gauss  -24.25  -18.59  -20.51  -18.02  -15.27  -19.90  -15.82  -15.80  -15.54  -22.49  -30.12
50  Opt    -26.85  -19.44  -17.81  -17.90  -16.49  -18.81  -15.55  -16.03  -15.88  -23.02  -32.29
50  Gauss  -24.12  -19.02  -20.98  -17.81  -15.43  -19.23  -15.84  -15.79  -15.64  -22.36  -30.19

(b)

D   Input  NEST     STAC
10  Opt    -19.150  -20.773
10  Gauss  -19.025  -20.221
25  Opt    -19.110  -20.787
25  Gauss  -18.965  -20.257
50  Opt    -19.044  -20.815
50  Gauss  -18.846  -20.369
Table 4.2 Value of the averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 20 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals

(a) SNR = 20 dB

D   Input  I1      I2      I3      I4      I5      I6      I7      I8      I9      I10     I11
10  Opt    -36.85  -24.94  -32.21  -23.07  -27.54  -22.06  -30.91  -22.02  -20.35  -20.65  -32.32
10  Gauss  -34.17  -23.78  -31.74  -23.08  -25.32  -22.59  -28.93  -20.53  -18.86  -20.37  -31.49
25  Opt    -36.79  -25.92  -32.01  -23.08  -28.23  -23.08  -31.78  -22.02  -21.76  -21.50  -32.50
25  Gauss  -34.10  -24.66  -31.71  -22.78  -25.81  -23.55  -29.31  -20.32  -19.83  -21.67  -31.69
50  Opt    -36.70  -27.10  -31.58  -24.68  -29.19  -24.67  -33.23  -24.01  -21.57  -22.79  -32.68
50  Gauss  -33.97  -26.12  -31.17  -22.73  -26.84  -24.94  -29.82  -21.59  -22.61  -23.36  -31.76

(b)

D   Input  NEST     STAC
10  Opt    -22.547  -30.034
10  Gauss  -22.072  -28.421
25  Opt    -23.123  -30.513
25  Gauss  -22.597  -28.743
50  Opt    -24.650  -30.827
50  Gauss  -23.750  -29.362

Table 4.3 Value of the averaged normalized estimation error for white Gaussian noise and optimal input sequence (SNR = 30 dB): (a) at each considered interval separately (I1–I11), (b) on stationary only (STAC) and nonstationary only (NEST) intervals

(a) SNR = 30 dB

D   Input  I1      I2      I3      I4      I5      I6      I7      I8      I9      I10     I11
10  Opt    -46.70  -31.07  -39.94  -27.79  -37.63  -29.83  -38.15  -26.65  -29.07  -25.94  -41.01
10  Gauss  -44.02  -30.72  -40.80  -28.16  -35.18  -29.12  -38.25  -25.67  -26.98  -25.96  -40.24
25  Opt    -46.64  -32.06  -40.16  -27.86  -38.75  -31.45  -37.93  -27.78  -31.18  -29.53  -41.25
25  Gauss  -43.95  -31.72  -40.89  -27.98  -36.21  -30.53  -38.64  -25.98  -29.22  -27.46  -40.60
50  Opt    -46.54  -33.54  -40.25  -31.54  -39.56  -33.21  -38.21  -30.81  -33.25  -31.60  -41.42
50  Gauss  -43.82  -33.48  -40.34  -30.17  -37.43  -32.46  -39.13  -28.98  -31.42  -30.42  -40.75

(b)

D   Input  NEST     STAC
10  Opt    -28.256  -38.750
10  Gauss  -27.927  -37.581
25  Opt    -29.736  -39.317
25  Gauss  -28.735  -38.252
50  Opt    -32.140  -39.872
50  Gauss  -31.102  -38.816

In doing so, care was taken to account for the changes of the estimated system over time, i.e. the constant vector H was replaced by a time-variant H(k), so that

$$\mathrm{NEE}(k)=10\log_{10}\frac{\left\|H(k)-\hat{H}(k)\right\|^{2}}{\left\|H(k)\right\|^{2}}.\qquad(4.119)$$
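A minimal sketch of (4.119), assuming the true and estimated coefficient trajectories are stored as arrays (names are illustrative):

```python
import numpy as np

def nee_db(H_true, H_est):
    """Normalized estimation error (4.119) per iteration, in dB.
    H_true, H_est: arrays of shape (num_iterations, M)."""
    err = np.sum((H_true - H_est) ** 2, axis=1)
    return 10.0 * np.log10(err / np.sum(H_true ** 2, axis=1))
```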
Figures 4.10, 4.11 and 4.12 show the values of the normalized estimation error (4.119) for the cases when the SNR factor is 10, 20 and 30 dB, respectively. Besides that, we separately considered different intervals, depending on the changes of the parameters of the estimated filter. Namely, it is possible to distinctly separate the intervals in which the estimated parameter does not change its value: I1 (1–1,100), I3 (1,400–1,650), I5 (1,650–1,950), I7 (2,150–2,250), I9 (2,250–2,350) and I11 (2,350–3,000), then the interval in which the estimated parameter changes slowly: I2 (1,100–1,400), then the fast change I6 (1,950–2,150), as well as the abrupt changes I4 (around 1,650), I8 (around 2,250) and I10 (around 2,350). The bracketed values are the numbers of the signal samples in the sequence, while I1–I11 are the notations of the intervals. To compare the influence of the input signals, in all intervals we analyzed the averaged normalized estimation error given as

$$\frac{1}{L}\sum_{k=1}^{L}\mathrm{NEE}(k),\qquad(4.120)$$

where L represents the length of the interval under consideration. When averaging over stationary and nonstationary intervals, for the sake of the analysis of fast changes, which occur almost instantaneously, we introduced the variable D, so that the criterion (4.120) is modified and written in the form (4.121). The variable D defines the range taken into account before and after the nonstationarity occurred, and its values were considered for 10, 25 and 50 samples. According to this, (4.120) can be written as
$$\frac{1}{L_{2}-L_{1}+2D}\sum_{k=L_{1}-D}^{L_{2}+D}\mathrm{NEE}(k),\qquad(4.121)$$

where $L_{1}$ and $L_{2}$ are the beginning and the end of the considered interval, respectively. At the moments of occurrence of abrupt changes $L_{1}=L_{2}$, so that the averaging interval is equal to 2D. Tables 4.1, 4.2, and 4.3 show the averaged values of the normalized estimation error (4.121) for the different methods of generation of the input sequence at each considered interval separately (I1–I11), as well as on stationary only (STAC) and nonstationary only (NEST) intervals. The presented values are obtained for different values of the SNR.
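For illustration, a sketch of the averaging criterion (4.121), assuming nee is the per-iteration NEE array and the interval limits are given in samples:

```python
import numpy as np

def averaged_nee(nee, L1, L2, D):
    """Averaged NEE over an interval [L1, L2] widened by D samples on
    each side (4.121); for an abrupt change L1 == L2 and the window is 2D."""
    window = nee[L1 - D : L2 + D + 1]
    return np.sum(window) / (L2 - L1 + 2 * D)
```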
The obtained results show that the use of the optimal input sequence in the PA-RLS algorithm for the determination of the VFF does not compromise its good properties in the intervals of nonstationary changes of the parameter values. Besides that, the initial

convergence speed of the normalized estimation error is higher if the optimal input sequence is utilized, in all three analyzed cases, for different values of the SNR factor. In the intervals without nonstationary changes the use of the optimal input sequence generates somewhat lower values of the NEE criterion than the application of Gaussian noise, which is especially marked for SNR values of 20 and 30 dB; this points to more successful parameter estimation and faster convergence during the stationary intervals. The results presented in Tables 4.1, 4.2, and 4.3 support the observed improvement when the optimal input sequence is used. It follows that the D-optimal input sequence can improve the properties of the PA-RLS algorithm in the intervals without nonstationary changes, while at the same time it does not impair its good properties in the intervals of nonstationarity.
Chapter 5
Robustification of Finite Impulse Response Adaptive Filters

Solving the parameter estimation problem in different models of real systems has led to the development of a number of algorithms with theoretically optimal properties with regard to a chosen criterion [12, 24, 25, 42–45, 48, 58]. Namely, the task of identification is the formation of a mathematical model and the determination of its parameters. Since in practical situations one often knows the structure of the mathematical model of the system, based on the knowledge of the natural laws connecting the system variables and the experimental results, the identification task reduces to the estimation of the unknown system parameters based on the measurement of the input and output system variables. Most often used in such models is the probabilistic approach, based on the methods of mathematical statistics, examples being the method of maximum likelihood [24, 48, 58], Bayesian estimation [12, 24, 28, 44] and others. In the majority of cases, and according to the central limit theorem of mathematical statistics, these methods are based on the a priori assumption that the random processes in the system have a Gaussian distribution. However, practical experience has shown an insufficient justifiability of such an assumption in many cases, especially in situations with high rates of occurrence of random disturbances in the system, as well as because of the fact that engineering measurements as a rule contain up to 10 % of anomalous values, which are inconsistent with the rest of the measurement population and are denoted as outliers [13–15, 35, 59]. It was established that optimal estimation procedures based on the Gaussian assumption may be very sensitive to deviations of the real distribution of disturbances from the assumed normal one, primarily in the sense of the weighting of the distribution tails, with the consequence that sporadic realizations of disturbances of high intensity (outliers) produce estimates of poor quality in many practical applications. The causes of these deviations may be a consequence of inadequate modeling of the system, inaccurate measurements, failure of some parts of the system, disturbances in communication pathways, and similar. Thus it is necessary to utilize stochastic procedures which would be significantly insensitive with regard to the adopted assumptions, i.e. which would be efficient also in the cases of incomplete information about the disturbances in the system. These methods are denoted as robust methods. The task of establishing the existence of a deviation


of a part of the observations from the main body of the population represents a task of robust statistics [59].
As previously emphasized, the occurrence of inconsistent or "surprising" observations is denoted in the Anglo-Saxon literature as the problem of outliers. One classification of the possible occurrence of outliers encompasses three types of data variation: (1) inherent variability, which represents a natural property of the data change and cannot be influenced; (2) execution errors, which represent a consequence of an external influence which has not been taken into account when modeling the population; (3) measurement errors, which can be regarded as the errors of measurement data readout and rounding [59]. For the last mentioned type of variation, the problem is most often overcome by replacement, if the correction model is known, or by rejection of such data. For the remaining two types one may apply the so-called incongruence tests for the detection of the presence of deviations from the assumed population [13]. The incongruence tests have an important role in the initial phase of the analysis of a set of measurement data. The further procedure depends on the particular application, which may lead to a correction or rejection of the incongruent observation, or one may interpret these incongruent observations through the identification of the influence of some factors of particular practical interest.

In the field of estimation, regarded as a separate field of robust statistics, the treatment of incongruent observations does not necessarily have to start with incongruence tests; the basic interest is to construct an algorithm for the estimation of the unknown parameters within the adopted stochastic data model which is relatively insensitive to the possible presence of outliers, deviating insignificantly from the optimal estimators when outliers do not exist [14, 15, 28, 47].
In this chapter we present a new member of the family of robust algorithms for system identification, based on the statistical approach denoted as M-estimation [14]. The proposed robust LMS algorithm differs from the conventional algorithm in that a nonlinear transformation of the measurement residual is introduced; the goal of this nonlinearity is to give a small weight to the minority of incongruent residuals, so that impulse noise in the desired filter response does not have a large influence on the overall parameter estimation. The properties of the proposed algorithm are compared to the properties of the existing robust algorithms that are widely utilized for the solution of the quoted problems. Besides that, we analyzed the possibility to apply optimal input sequences to the robust version of the recursive least squares algorithm (the RRLS algorithm), with a goal to improve the convergence properties and the accuracy of the estimated parameters, as well as to use a variable forgetting factor in the conditions when, besides impulse noise, one also has to solve the problem of identification of a system with time-variant parameters.

Because of the known sensitivity of robust estimators to changes of the signal dynamics, we analyzed the possibility to iteratively estimate the scaling factor in robust algorithms during the operation of such an adaptive algorithm, in order to ensure the universality of the estimation procedure with regard to its invariance to the

value of the measurement residual, i.e. the dynamic properties of the measurement signal.

A special problem in the field of estimation is the estimation of a system with time-variant parameters, because abrupt changes of the parameter values result, as a rule, in an abrupt rise of the residual, which may be erroneously interpreted as the appearance of impulse interference in the measurement noise. Because of that we present another new algorithm, which basically represents a combination of a robust algorithm and an algorithm with a variable forgetting factor, and is based on the synthesis of an outlier detector in the form of a robust median filter and the application of the recursive least squares algorithm with an exponential forgetting factor for the estimation of the time-variant system parameters (FIR filter). The properties of the proposed algorithms are analyzed experimentally, through computer simulation.

5.1 Robust Least Mean Square Algorithm

Although widely utilized, the least mean square error algorithm (LMS algorithm) is, similarly to other gradient-type algorithms, very sensitive to impulse errors [36, 57, 60]. The appearance of impulse interference is frequent in images and biomedical signals, as well as in solving communication problems [19]. This shortcoming of the LMS algorithm is one of the reasons to introduce a modification which would robustify it with regard to impulse interference. Possible solutions to this problem are the algorithms well known from the literature, denoted as the Robust Mixed Norm (RMN) and the median LMS (MLMS) algorithms [36, 57, 60]. Both algorithms imply nonlinear transformations, but the main difference is that the MLMS algorithm achieves robustness through the application of a nonlinear median operation to the gradient estimate, while in the case of the RMN algorithm one applies a nonlinear transformation to the residual estimate [36, 57]. The block diagram of the structure for parametric identification of the system is shown in Fig. 5.1, where the unknown system is an FIR filter of a known order M, defined by the vector $h=[h_{0}\;h_{1}\;\ldots\;h_{M-1}]^{T}$. A common input signal x(k), representing Gaussian noise with a zero mean value, is brought to the input of the unknown system and of the adaptive filter. Impulse noise n(k) is added to the output y(k) of the unknown system to obtain the noisy desired response d(k). Let us note that the real desired response is y(k), while d(k) represents the measurement of this signal in the presence of the additive measurement noise n(k).

The adaptive filter is also an FIR filter of the order M, determined by the parameter vector H(k) and by a recursive algorithm for parameter estimation, whose purpose is to obtain the best possible approximation of the desired response y(k) by the output signal of the adaptive filter, $\hat{y}(k)$, and at the same time to filter out the measurement noise n(k). The goal of the parameter estimation algorithm is to

Fig. 5.1 The general structure for parametric identification of the system (the system model is known with accuracy up to an unknown parameter vector): the input x(k) drives both the unknown system h and the adaptive filter H(k); the noise n(k) is added to the system output y(k) to form d(k), and the error e(k) drives the criterion J

minimize the criterion function, whose main argument is the estimation error or measurement residual

$$e(k)=d(k)-\hat{y}(k)=d(k)-X^{T}(k)H(k),\qquad(5.1)$$

where $X(k)=[x(k)\;x(k-1)\;\ldots\;x(k-M+1)]^{T}$. If the sequence $E_{n}=\{e(i),\;i=1,2,\ldots,n\}$ represents a random array formed from identically distributed random variables $e(i)$ whose probability density function is $p(\cdot)$, one may estimate the parameter vector $H$ utilizing the maximum likelihood algorithm (ML estimator), for which the criterion function $J$ is [14, 24, 28, 48]

$$J_{n}=-\sum_{i=1}^{n}\ln p(e(i))=\sum_{i=1}^{n}u(e(i)),\qquad(5.2)$$

where the likelihood function is

$$u(\cdot)=-\ln p(\cdot).$$
It follows from the minimization of the criterion function (5.2), i.e. from its differentiation, that

$$\nabla J_{n}=-\sum_{i=1}^{n}\frac{p'(e(i))}{p(e(i))}\nabla e(i)=\sum_{i=1}^{n}\Psi(e(i))\nabla e(i)=0,\qquad(5.3)$$

where $\Psi$ is the derivative of the function $u$, assuming that it is differentiable, i.e. $u'=\Psi$, while $\nabla$ denotes the gradient. The solution of the equation

$$\sum_{i=1}^{n}\Psi(e(i))\nabla e(i)=0\qquad(5.4)$$
is described as the maximum likelihood estimate of the parameter vector $H$ and is denoted by $\hat{H}$. The estimator based on the solution of Eq. (5.4), independently of the choice of a particular function $\Psi$, is called the M-estimator (approximate maximum likelihood estimator) [14]. Typical examples of this procedure are the following ones:

1. Normal distribution: $p(e)=N(0,1)=c\exp(-e^{2}/2)$, $u(e)=e^{2}/2+c_{1}$, $\Psi(e)=e$; the solution of Eq. (5.4) leads to the LMS estimate of the parameter vector $H$.
2. Laplace distribution: $p(e)=L(0,1)\propto\exp(-|e|)$, $u(e)=|e|$, $\Psi(e)=\mathrm{sign}(e)$; the solution of Eq. (5.4) leads to the Least Absolute Deviation (LAD) estimate of the parameter vector $H$.
3. Mixed normal distribution:

$$p(e)=(1-k)N(0,1)+kL(0,1),\quad 0<k<1,$$
$$u(e)=(1-k)e^{2}/2+k|e|,\qquad \Psi(e)=(1-k)e+k\,\mathrm{sign}(e),$$

while the solution of Eq. (5.4) leads to the RMN estimate of the parameter vector $H$. The choice of $k$ in each step $n$ is based on the probability $k(n)=P\{|d(n)|>d_{0}\}$, where $d_{0}$ is a positive threshold, obtained under the assumption that the desired response $d(n)$ has the distribution $N(0,\sigma_{d})$, which gives $k(n)=2\,\mathrm{erfc}(|d(n)|/\sigma_{d})$. Here $\mathrm{erfc}$ represents the complementary error function [57]. For the robust estimation of $\sigma_{d}$ one utilizes the estimate

$$\hat{\sigma}_{d}=\left[\frac{o^{T}To}{n_{H}-3}\right]^{\frac{1}{2}},$$

where $o$ represents a vector formed of the $n_{H}$ last values of the desired response $d$, sorted in ascending order with regard to their amplitude, and $T$ is an $n_{H}\times n_{H}$ diagonal matrix $T=\mathrm{Diag}(0,1,1,\ldots,1,1,0)$ [57], where $n_{H}$ denotes the dimension of the parameter vector $H$.
The LMS estimation (5.2) (item 1) is very sensitive to the shape of the distribution function at its ends, since the realizations of impulse noise or outliers, generated by "weighted tails" deviating from the Gaussian distribution, may adversely influence the parameter estimation. On the other hand, the RMN and LAD estimations (items 2 and 3) are much more robust, in the sense of a smaller sensitivity to the presence of impulse noise. The presented estimators point to the possibility to design a new member of the RMN estimator class, which would be situated between the LAD and the LMS estimator. The basic motivation is that the LMS algorithm, generally speaking, gives a much better estimate in the absence of impulse noise, when the LAD is less successful; however, the LAD is much more robust in the presence of impulse interference in comparison to the LMS algorithm. Such a new member of the group of robust algorithms for adaptive filtering is essentially based on Huber's M-robust estimation (5.4) [14]; it is called the robust LMS (RLMS) algorithm [61] and is presented in the rest of this chapter.

5.1.1 Robustification of Least Mean Square Algorithm: Robust LMS Algorithm

When minimizing the maximum asymptotic variance of the estimation error within the class of contaminated normal distributions (a normal mixture with weighted tails)

$$p(e)=(1-k)N(0,1)+kg(e),\quad 0<k<1,$$

Huber [14] proposed the $\Psi$ function (the so-called influence function) that corresponds to the probability density function $p(e)$ which in its middle part behaves like $N(0,1)$, while at the ends it has the form $g(e)=L(0,1)$, i.e.

$$u'(e)=\Psi_{H}(e)=-[\ln p(e)]'=\min(|e|,m)\,\mathrm{sign}(e),\qquad \mathrm{sign}(e)=\begin{cases}1,&e>0\\0,&e=0\\-1,&e<0\end{cases}\qquad(5.5)$$
where $m>0$ is a constant ensuring the efficient robustness of the estimator; its value controls the trade-off between the degree of robustness and the estimator degradation in the case when the impulse interference generated by the term $g(e)=L(0,1)$ is absent. It turns out that such an estimator also gives satisfactory results for many other probability density functions $g(\cdot)$ which are characteristic for particular practical applications [14, 15, 36]. An estimator is said to be efficiently robust if its accuracy is high (90–95 %) for the adopted nominal statistical model, for which one usually utilizes the normal distribution N(0,1). The function $\Psi$ ensures the robust properties of the estimator. In the general case it is bounded, monotonically non-decreasing and continuous. Monotonicity leads to a unique solution of the M-estimator (5.4), boundedness ensures that particular realizations of impulse interference do not have an arbitrarily high influence on the parameter estimates, and continuity ensures that rounding and cutoff errors, as well as grouped contamination (impulse interference), do not cause large perturbations of the parameter estimates [35].
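A minimal sketch of the influence function (5.5), vectorized with NumPy; the default value of m anticipates the discussion of the scaling factor below:

```python
import numpy as np

def huber_psi(e, m=1.5):
    """Huber influence function (5.5): linear for |e| <= m, clipped
    (constant magnitude) beyond, which bounds the effect of outliers."""
    return np.minimum(np.abs(e), m) * np.sign(e)
```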
However, the considered M-estimator is sensitive to changes of the signal dynamics. Namely, let us assume that a solution $\hat{H}$ is found from (5.4) for some $E_{n}$, and that the realizations of the random variables e(i) are replaced by the corresponding ones whose standard deviation is tripled. The new solution $\hat{H}$, generated by (5.4), does not have to be equal to the previous one. To cancel the influence of the signal dynamics on the values of the estimated parameters, one should modify the expression (5.4) in the following manner:

$$\frac{1}{n}\sum_{i=1}^{n}\tilde{\Psi}_{H}\left(\frac{e(i)}{s}\right)\nabla e(i)=0,\qquad(5.6)$$

where s is a scaling factor and $\tilde{\Psi}_{H}=s\Psi_{H}$ [35]. Namely, the principal goal of the introduction of the scaling factor is to make the estimator invariant to the value of the random measurement residual, whose expected or mean value is equal to the

unknown standard deviation, which is time-variant. A popular robust estimate of the parameter s in the statistical literature is the median of absolute deviations [MAD in (3.6)]

$$s=\frac{\mathrm{median}\left|e(i)-\mathrm{median}(e(i))\right|}{0.6745}.\qquad(5.7)$$

The factor 0.6745 ensures that the estimate (5.7) is approximately equal to the standard deviation of the sample, for a large enough sample length, in the case when the data $\{e(i)\}$ are generated by the normal distribution function, i.e. $s^{2}\approx\sigma_{e}^{2}$. The robust estimation of the scaling factor s influences the choice of the value of the robustness constant m in (5.5). Since $s\approx\sigma_{d}$, one usually takes for m a value close to 1.5.
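A sketch of the MAD scale estimate (5.7) over a sliding window of residuals:

```python
import numpy as np

def mad_scale(e_window):
    """Median of absolute deviations (5.7), scaled by 0.6745 so that it
    approximates the standard deviation for Gaussian residuals."""
    med = np.median(e_window)
    return np.median(np.abs(e_window - med)) / 0.6745
```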
The solution of the nonlinear system of Eqs. (5.6) cannot be obtained in a closed form and requires the use of iterative (numerical) procedures. Starting from some initial estimate $\hat{H}(0)$ of the unknown parameter vector $H$ and the robust estimate of the scaling factor (5.7), it is possible to use several approaches to the solution of the posed nonlinear problem.
The equation for updating the parameter estimate using the robust LMS algorithm can be derived from the following gradient-type equation, i.e. the algorithm of steepest descent [24, 28]:

$$\hat{H}(n)=\hat{H}(n-1)-\mu(n)\nabla J_{n}\left(\hat{H}(n-1)\right),\qquad(5.8)$$

where $\mu(n)$ is the so-called adaptive gain that influences the convergence properties of the algorithm. Most often one takes $\mu(n)=\mu/n$, $\mu>0$. Let us adopt that $\hat{H}(n-1)$ is the optimal estimate at the moment $(n-1)$, in the sense that $\nabla J_{n-1}\left(\hat{H}(n-1)\right)=0$. Taking $\hat{H}=\hat{H}(n-1)$, and using the fact that, based on (5.1), one may write $\nabla e(n)=-X(n)$, we obtain

$$\nabla J_{n}\left(\hat{H}(n-1)\right)=-\tilde{\Psi}_{H}(e(n)/s(n))X(n),$$

from which it follows that

$$\hat{H}(n)=\hat{H}(n-1)+\mu(n-1)\tilde{\Psi}_{H}(e(n)/s(n))X(n).\qquad(5.9)$$

Here s(n) represents the current estimate of the scaling factor (5.7), calculated utilizing the $n_{H}$ previous values of the measurement residual e.
The value of the derivative $\Psi$ of the criterion function $u$ in the ML criterion (5.2), obtained from Eq. (5.3) for different algorithms, is represented in Fig. 5.2. In robust statistics the function $\Psi$ is also called the influence function [35]. When implementing the algorithm (5.9) one may apply the procedure shown in Table 5.1.
One can notice a high resemblance between this procedure and the LMS algorithm. Namely, Step 4 is a nonlinear transformation of the scaled residual $\tilde{e}$; this step is known as the winsorization of the residual [14].

Fig. 5.2 The shape of the Ψ-function (influence function) for different estimators: LMS: Ψ(e) = e; LAD: Ψ(e) = sign(e); RMN: Ψ(e) = (1−k)e + k sign(e); RLMS: Ψ(e) = min(|e|, m) sign(e)

Table 5.1 Flow diagram of the RLMS algorithm

Step 1: In the iteration $n$, $n\geq n_{H}$, let the vector $\hat{H}(n-1)$ from the previous iteration $(n-1)$ be known.
Step 2: Calculate the current residual $e\left(n;\hat{H}(n-1)\right)$ utilizing (5.1) and define the vector $E_{n}=\{e(n),e(n-1),\ldots,e(n-n_{H}+1)\}$ of length $n_{H}$, implying that the $n_{H}-1$ previous values of the residual $\{e(n-1),\ldots,e(n-n_{H}+1)\}$ are known.
Step 3: Calculate the MAD estimate s(n) using (5.7) and the terms of the vector $E_{n}$ sorted into an ascending series (the median operation requires the formation of this ascending series).
Step 4: Calculate the normalized residual $\tilde{e}(n)=e(n)/s(n)$ and the transformed residual $\tilde{\Psi}_{H}(\tilde{e}(n))=s(n)\Psi_{H}(\tilde{e}(n))$, where $\Psi_{H}(\cdot)$ is given by Eq. (5.5).
Step 5: Calculate the adaptive gain $\mu(n)=\mu/n$.
Step 6: Calculate the current value of the parameter vector $\hat{H}(n)$ using the expression (5.9).
Step 7: Set $n=n+1$ and start from Step 1.
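A compact sketch of one RLMS iteration (Table 5.1, Eq. (5.9)); the variable names are chosen for this example, and the small floor on s guards against a degenerate window:

```python
import numpy as np

def rlms_step(H_est, X, d, e_window, n, mu=0.01, m=1.5):
    """One RLMS iteration: MAD scale, winsorized residual, then a
    gradient step with decaying gain mu/n, as in (5.9)."""
    e = d - X @ H_est                               # residual (5.1)
    e_window = np.append(e_window[1:], e)           # sliding window of n_H residuals
    med = np.median(e_window)
    s = max(np.median(np.abs(e_window - med)) / 0.6745, 1e-12)  # MAD scale (5.7)
    psi = s * np.minimum(np.abs(e / s), m) * np.sign(e)         # winsorization (5.5)
    H_est = H_est + (mu / n) * psi * X              # parameter update (5.9)
    return H_est, e_window
```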

In the case when the noise distribution is normal, it is desirable that the inequality $|\tilde{e}|<m$ is satisfied, since then $\Psi_{H}(\tilde{e})=\tilde{e}$. If this condition is fulfilled, then the estimate of the parameter vector $\hat{H}$ defined by Eq. (5.9) is the true LMS estimate. In the presence of impulse noise, the role of Step 6 is to ensure the LMS estimation of the transformed residuals, with the goal to decrease the influence of the impulse noise. In contrast to the RLMS algorithm, the median LMS algorithm (MLMS algorithm), known from the literature, is defined by the recursion [60]
$$\hat{H}(n)=\hat{H}(n-1)+\mu(n-1)\,\mathrm{med}\left\{e\left(n;\hat{H}(n-1)\right)X(n)\right\}_{n_{H}},$$

where $\mathrm{med}\{\cdot\}_{n_{H}}$ is the vector $m(n)$, with dimensions $M\times 1$, whose elements are

$$m_{i}(n)=\mathrm{med}\left(\tilde{H}_{i}(n),\tilde{H}_{i}(n-1),\ldots,\tilde{H}_{i}(n-n_{H}+1)\right),$$

while $\tilde{H}_{i}(n)$ is the i-th element of the vector $e\left(n;\hat{H}(n-1)\right)X(n)$.
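For comparison, a sketch of the MLMS recursion above, where the median is taken componentwise over the last $n_{H}$ instantaneous gradients; the history buffer grad_hist is an assumption of this example:

```python
import numpy as np

def mlms_step(H_est, X, d, grad_hist, n, mu=0.01):
    """One MLMS update: median-filter each component of the instantaneous
    gradient e(n)X(n) over the last n_H iterations, then take a step."""
    g = (d - X @ H_est) * X                    # instantaneous gradient e(n)X(n)
    grad_hist = np.vstack([grad_hist[1:], g])  # keep the last n_H gradients
    H_est = H_est + (mu / n) * np.median(grad_hist, axis=0)
    return H_est, grad_hist
```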

5.1.2 Stability Analysis of Robust Estimators

Convergence properties of the robust RLMS algorithm (5.9) can be analyzed using an approach based on ordinary differential equations (ODE), presented in detail in Sect. 4.3. Under certain conditions (i.e. assuming that $s(n)\to\sigma_{d}$, $\lim_{n\to\infty}\mu(n)=0$ and $\sum_{n=1}^{\infty}\mu(n)=\infty$), the following ODE system can be associated with the recurrent stochastic procedure (5.9):

$$\frac{dH(\tau)}{d\tau}=\sigma_{d}f(H(\tau)),\qquad(5.10)$$

where

$$f(H(\tau))=\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\Psi_{H}\left(\frac{e(i)}{\sigma_{d}}\right)X(i)=E\left\{\Psi_{H}\left(\frac{e(i)}{\sigma_{d}}\right)X(i)\right\}.\qquad(5.11)$$

To analyze the stability properties of the ODE system, let us utilize the direct method of Lyapunov [24, 39] and define the non-negative function

$$V(H,\sigma_{d})=\frac{1}{2}E\left\{\Psi_{H}^{2}(\tilde{e}(i))\right\},\qquad \tilde{e}(i)=e(i)/\sigma_{d},\qquad(5.12)$$

as a candidate for the Lyapunov function. In order to show that (5.12) is really a Lyapunov function, it is necessary to show that its derivative is non-positive [24, 39]. Starting from (5.12), one may write

$$\frac{dV(H,\sigma_{d})}{d\tau}=E\left\{\Psi_{H}(\tilde{e}(i))\frac{\partial\Psi_{H}(\tilde{e}(i))}{\partial\tilde{e}}\left(\frac{\partial\tilde{e}(i)}{\partial H}\right)^{T}\right\}\frac{dH(\tau)}{d\tau}.\qquad(5.13)$$
Since

$$0<\frac{\partial\Psi_{H}(\tilde{e})}{\partial\tilde{e}}\leq m,\quad m>0,\qquad\frac{\partial\tilde{e}(i)}{\partial H}=-\frac{1}{\sigma_{d}}X(i),$$

from the expressions (5.10) and (5.13) one may conclude that

$$\frac{dV(H,\sigma_{d})}{d\tau}\leq-mE\left\{\Psi_{H}(\tilde{e}(i))X^{T}(i)\right\}\sigma_{d}^{-1}\frac{dH(\tau)}{d\tau}=-mf^{T}(H)f(H)\leq 0.\qquad(5.14)$$
Using (5.14), the stability analysis presented in Sect. 4.3 shows that the parameter estimate $\hat{H}(n)$ from Eq. (5.9) will converge with probability one to a stationary value $H^{*}$ of the Lyapunov function $V(\cdot)$ in (5.12). Generally speaking, the stationary values $H^{*}$ comprise the set $D_{C}=\{H^{*}\,|\,f(H^{*})=0\}$, such that $P\{\lim_{n\to\infty}\hat{H}(n)\in D_{C}\}=1$. For the case that $H^{*}$ is a unique solution, the value $\hat{H}(n)$ converges with probability one to the real value of the parameter vector $H^{*}$.
The judgment about the convergence speed and the robustness properties can be made utilizing the covariance matrix of the estimation error. If the M-estimate (5.6) with $s=\sigma_{d}$ is approximated by replacing the function $\Psi_{H}$ with two terms of its Taylor expansion in the vicinity of $H^{*}$, one may show that the approximate estimate of the asymptotic covariance matrix of the estimation error is [15, 36]

$$V(\Psi,p)=\lim_{n\to\infty}E\left\{n\left[\hat{H}(n)-H^{*}\right]\left[\hat{H}(n)-H^{*}\right]^{T}\right\}=\sigma_{d}^{2}\,a(\Psi,p)\,E^{-1}\left\{X(i)X^{T}(i)\right\},$$
$$a(\Psi,p)=\frac{E\left\{\Psi^{2}(n(i)/\sigma_{d})\right\}}{E^{2}\left\{\Psi'(n(i)/\sigma_{d})\right\}},\qquad(5.15)$$

where $H^{*}$ is the accurate value of the estimated parameter vector of the FIR filter. It is implied here that $\{n(i)\}$ are independent and identically distributed (iid) random variables, not correlated with the input signal x(i). Equation (5.15) depends on $\Psi$ only through the scalar factor $a(\Psi,p)$. To obtain the maximum asymptotic convergence speed, the choice of $\Psi$ should be such as to yield the smallest possible value of $a$. Huber's minimax optimal solution satisfies [14, 28, 44]

$$a(\Psi_{0},p)\leq a(\Psi_{0},p_{0})=I^{-1}(p_{0})\leq a(\Psi,p_{0}),\quad p\in P,\qquad(5.16)$$


and is given by the likelihood function determined according to the least favorable probability density function $p_{0}$ within the adopted class P:

$$\Psi_{0}=\left(-\log p_{0}\right)',\qquad p_{0}=\arg\min_{p\in P}I(p).\qquad(5.17)$$

In the expression (5.17), I(p) represents the Fisher information amount in (4.10), and P is the given class of probability density functions, to which the probability density function of the real, unknown noise $\{n(i)\}$ also belongs. The probability density function $p_{0}$ is the least favorable within the adopted class P because it carries the minimal amount of information about the estimated parameter vector, i.e. the minimal Fisher information amount within the given class P corresponds to it. In other words, the ML estimator, designed according to the least favorable probability density within the given class P, satisfies the minimax inequality for the covariance matrix of the estimation error

$$V(\Psi_{0},p)\leq V(\Psi_{0},p_{0})\leq V(\Psi,p_{0}),\qquad(5.18)$$

where $\Psi$ is an arbitrary criterion function. This means that for an arbitrary probability density $p\in P$ the covariance matrix of the estimation error will not be larger than the covariance matrix of the estimation error for the least favorable probability density $p_{0}$ within the given class P, while an arbitrary criterion function $\Psi$ will give a larger covariance matrix of the estimation error for the least favorable density $p_{0}$ than the ML estimator designed according to that same worst-case probability density function $p_{0}$ within the given class P.
Let us illustrate the presented result through the following example. Let the nominal noise probability density function have a zero mean value and a variance $\sigma_{d}^{2}$, i.e. $p_{n}(n)\sim N(n|0,\sigma_{d}^{2})$, while the real noise has the contaminated normal probability density

$$p_{c}(n)\sim(1-k)N(n|0,\sigma_{d}^{2})+kg(n),$$

where $0\leq k\leq 1$ and $g$ represents a symmetric probability density function with a zero mean value and a variance $\sigma_{g}^{2}\gg\sigma_{d}^{2}$. For the standard LMS algorithm, the influence function is defined by

$$\Psi(x)=\Psi_{1}(x)=x/\sigma_{d}^{2},$$

which gives for the scaling factor in (5.15) the value $a(\Psi_{1},p_{n})=\sigma_{d}^{2}$ and $a(\Psi_{1},p_{c})\approx k\sigma_{g}^{2}\gg\sigma_{d}^{2}$. It follows from here that the ratio of the norms of the corresponding covariance matrices of the estimation errors in (5.15) can be approximated as

$$a(\Psi_{1},p_{c})\big/a(\Psi_{1},p_{n})\approx k\sigma_{g}^{2}\big/\sigma_{d}^{2}\gg 1.$$

This result shows a decrease of the convergence speed of the LMS algorithm in the presence of impulse noise generated by the probability density function $g(\cdot)$. Specifically, if $g(\cdot)$ has a Cauchy distribution, when $\sigma_{g}^{2}\to\infty$, the standard LMS algorithm ceases to be operational, i.e. it diverges. On the other hand, the choice of $\Psi_{0}(\cdot)$ in (5.17) introduces robustness, so that [14, 28]

$$a(\Psi_{0},p_{c})\leq a(\Psi_{0},p_{0})=I^{-1}(p_{0}).$$

As mentioned earlier, for the class P of contaminated normal distributions $p_{c}$, the worst-case distribution $p_{0}$ in (5.17) is the normal distribution with exponentially weighted tails, i.e. the normal distribution contaminated by the Laplace distribution, which leads to the coincidence of $\Psi_{0}$ with $\Psi_{H}$ in (5.5).

5.1.3 Simulation-Based Experimental Analysis

To show the properties of the RLMS algorithm, we simulated a system for the identification of the FIR filter parameters, as presented in the literature [36, 60]. The signal of the desired response was obtained by bringing noise with a normal distribution and unit power to the input of a ninth-order FIR filter with the coefficients

$$h=[0.1\;0.2\;0.3\;0.4\;0.5\;0.4\;0.3\;0.2\;0.1].$$

Independent noise with a normal distribution was added to the filter output signal, with a fixed variance chosen so as to ensure a signal-to-noise ratio SNR = 0 dB prior to the addition of the impulse noise. The impulse noise was generated using the model

$$n(k)=a(k)A(k),$$

where a(k) is a binary independent identically distributed (iid) process with the probabilities

$$P\{a(k)=1\}=0.01\quad\text{and}\quad P\{a(k)=0\}=0.99.$$

Here A(k) is a process with a symmetrical distribution of amplitudes, uncorrelated with a(k) and with the variance $\mathrm{var}\{A(k)\}=10^{4}/12$. The length $n_{H}$ of the sliding window of the estimator (5.7) was chosen to be 10, in order to decrease as much as possible the probability of the appearance of more than one outlier within the window. The value $\mu=0.01$ was chosen for the constant of the adaptive gain factor $\mu(n)=\mu/n$, in order to ensure an initial convergence speed of the analyzed algorithm as close as possible to the case without impulse noise.
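A sketch of this contaminated-noise model, with the probabilities and variance quoted above; the uniform amplitude distribution is one admissible choice (a uniform law on [-w, w] has variance w^2/3, so var = 10^4/12 corresponds to w = 50):

```python
import numpy as np

def impulse_noise(num, p=0.01, amp_var=1e4 / 12, rng=np.random.default_rng(2)):
    """n(k) = a(k)A(k): a(k) is a Bernoulli(p) switch, A(k) a zero-mean
    symmetric amplitude process with the given variance."""
    a = rng.random(num) < p                 # P{a(k) = 1} = p
    half_width = np.sqrt(3.0 * amp_var)     # uniform on [-w, w]: var = w^2 / 3
    A = rng.uniform(-half_width, half_width, num)
    return a * A
```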
Figure 5.3 shows the values of the log-normalized error of the estimated parameters

$$10\log_{10}\frac{\left\|\hat{H}(n)-h\right\|^{2}}{\|h\|^{2}},\qquad(5.19)$$
where $\|\cdot\|$ is the Euclidean norm, for different algorithms in the case of pure Gaussian noise. Obviously, the properties of the RLMS and MLMS algorithms are very similar to those of the LMS algorithm, and at the same time better than the properties of the RMN algorithm. The LAD algorithm shows the worst results in this case. Figure 5.4 shows the values of the criterion (5.19) for the same algorithms, obtained from the simulation for the case when impulse noise is present. Besides that, Table 5.2 shows the averaged criterion (5.19) for different values of the probability P{a(k) = 1} of the appearance of impulse realizations of noise, calculated on the basis of 100 Monte Carlo attempts. Table 5.3 shows the averaged value of the criterion (5.19), obtained on the basis of 100 Monte Carlo attempts, for different values of the impulse noise intensity, expressed by var{A(k)}. Table 5.4 presents the same properties measured for different distributions of the impulse noise amplitude A(k).

Fig. 5.3 Log normalized estimation error (5.19), in dB versus the number of iterations, for different algorithms in the presence of Gaussian noise

Fig. 5.4 Log normalized estimation error (5.19), in dB versus the number of iterations, for different algorithms in the presence of impulse noise

It is obvious that the properties of the LMS algorithm become worse, while the implemented changes do not influence the RLMS algorithm, which at the same time retains better properties compared to the RMN and LAD algorithms. Besides that, the RLMS and MLMS algorithms show comparable results. The dependence of the estimation quality on the parameters quantitatively determining the adaptive filtering algorithm is also important from the practical standpoint. Namely, the form of the nonlinearity $\Psi_{H}(\cdot)$ in (5.5) is unambiguously determined by the parameter m which, generally speaking, depends on the probability P{a(k) = 1} of the appearance of impulse noise. Since this probability, the so-called contamination index [14], is not accurately known in practical situations, it is necessary to define it in advance. The value defining the decrease of the algorithm efficiency under the condition of normal noise is denoted in the literature as the premium [12]. Simulation results show that this factor is small for the RLMS

Table 5.2 Averaged log normalized estimation error (5.19) for different probabilities P{a(k) = 1} and var{A(k)} = 10^4/12, calculated based on 100 Monte Carlo attempts

P{a(k)=1}=c   LMS      LAD      RMN      MLMS     RLMS
0.005        -2.2221  -2.763   -3.8241  -4.7751  -4.7259
0.010        -1.4289  -2.7743  -3.8918  -4.6313  -4.7765
0.015        -1.3900  -2.7819  -3.8472  -4.7921  -4.7462
0.020        -0.1836  -2.7725  -3.8764  -4.6872  -4.7121
0.025         0.9076  -2.8130  -3.9296  -4.6323  -4.7777
0.030         0.3399  -2.7572  -3.6153  -4.5596  -4.6259
0.035         0.9059  -2.6962  -3.2606  -4.4182  -4.3804
0.040         1.4296  -2.7171  -3.5720  -4.5191  -4.5963
0.045         1.6613  -2.7609  -3.6355  -4.4118  -4.5558
0.050         2.5058  -2.6951  -3.3069  -4.3599  -4.3640
0.055         2.0840  -2.7409  -3.2993  -4.4011  -4.4886
0.060         3.3922  -2.6838  -3.0713  -4.3847  -4.3822
0.065         2.6689  -2.6516  -3.5164  -4.2718  -4.3143
0.070         3.0542  -2.6090  -3.2907  -4.0179  -4.2173
0.075         2.2492  -2.6013  -3.2468  -4.0135  -4.1811
0.080         3.6034  -2.6512  -2.8348  -3.9865  -4.2055
0.085         3.9637  -2.6237  -2.4907  -3.9186  -4.1198
0.090         3.6869  -2.5593  -3.2021  -3.9750  -4.1658
0.095         3.1570  -2.6039  -2.8270  -3.9416  -4.0772
0.100         3.7495  -2.6106  -3.1638  -3.8201  -4.0198

Table 5.3 Averaged log normalized estimation error (5.19) for different values of the impulse noise intensity var{A(k)} and P{a(k) = 1} = 0.01, calculated based on 100 Monte Carlo attempts

var{A(k)}   LMS      LAD      RMN      MLMS     RLMS
10^3/12     -4.1910  -2.8602  -3.9423  -4.7064  -4.8478
10^4/12     -1.4289  -2.7743  -3.8918  -4.6313  -4.7765
10^5/12      0.3731  -2.7692  -3.8397  -4.4703  -4.6974
10^6/12      5.0510  -2.7790  -3.8923  -4.4401  -4.8262

Table 5.4 Averaged log normalized estimation error (5.19) for different values of the amplitude distribution, calculated based on 100 Monte Carlo attempts

Algorithm   Laplace   Cauchy    Uniform
LMS         -1.2613   -1.6846   -1.0091
LAD         -2.7282   -2.8674   -2.7370
RMN         -3.8422   -3.8951   -3.7294
MLMS        -4.4369   -4.5313   -4.5375
RLMS        -4.7800   -4.6771   -4.8604

algorithm when the value of the parameter m belongs to the interval (1, 2) and when the real value of the variance $\sigma_{d}^{2}$ is accurately known, ensuring at the same time good robustness properties. Since the value of $\sigma_{d}^{2}$ is usually not known in practical situations, it is estimated using (5.7). This has only a small influence on the decrease of the mentioned efficiency if the length of the moving window $n_{H}$ is correctly chosen. As mentioned before, the simulation results show that the value $n_{H}=10$ gives good results in practical situations.

Finally, since the analyzed algorithms are nonlinear, the starting value $\hat{H}(0)$, as well as the choice of the value of the parameter $\mu$ in the adaptive gain factor $\mu(n)=\mu/n$, may have a great influence on the quality of the parameter estimation (Fig. 5.5). On the other hand, low sensitivity to the initial conditions is important to obtain practical robustness. This problem can be solved by utilizing an adequate initial value $\hat{H}(0)$, obtained by the LMS estimation, while the most convenient value of the variable $\mu$ can be determined by simulation, while keeping track of the convergence speed. Experiments show that the value $\mu=0.01$ gives good results.
The obtained results point to the following conclusions:
1. A robust adaptive algorithm for FIR filters, the so-called RLMS algorithm, is proposed, based on the statistical approach denoted as M-estimation and basically representing an approximation of the well-known method of maximum likelihood (ML estimation). Contrary to the ML estimator, whose criterion function (5.2) is determined according to the exact knowledge of the noise probability density function, in the M-estimation (5.4), (5.5) one starts from the assumption that the noise probability density function cannot be exactly known, and it is chosen instead so that an

Fig. 5.5 Averaged log normalized estimation error (5.19) for different probabilities P{a(k) = 1} and var{A(k)} = 10^4/12, calculated based on 100 Monte Carlo attempts (curves for the LMS, LAD, RMN, MLMS and RLMS algorithms)

estimator criterion function associated with it gives satisfactory results in all situations of interest in concrete applications of the adaptive filter. In other words, such an estimator will have a guaranteed quality of behavior, determined according to the least favorable noise probability density function within the assumed class; i.e. the estimation error variance for an arbitrary noise probability distribution function within the given class will never cross an upper limit, determined according to the least favorable probability density function, which carries the minimum amount of information about the estimated parameter within the assumed distribution class.
2. The proposed RLMS estimator belongs to the class of estimators situated between the often used LAD and LMS estimators; it is insensitive to the presence of impulse noise in the desired filter response and at the same time retains good properties in the cases when only Gaussian noise is present in the desired response. This improvement is achieved with a small increase of the computational complexity in comparison to the classical LMS algorithm.
3. The properties of the proposed algorithm were investigated by simulation, and the obtained results show that the RLMS algorithm can serve as an adequate replacement for other robust methods, especially for the RMN and MLMS algorithms that are well known from the literature.
4. The convergence speed of the RLMS algorithm can be increased by the introduction of a matrix gain factor instead of a scalar one, which results in the robust versions of the RLS algorithm that are described in the rest of this chapter.

5.2 Robust Recursive Least Squares Algorithm with Optimal Input

In this section we consider the possibility to synthesize a robust version of the RLS algorithm (the so-called RRLS algorithm), as well as the possibility to improve the convergence properties of the RRLS algorithm by utilizing optimal input sequences, as shown in Chap. 4.

To this purpose, the problem of recursive parameter estimation, presented in Fig. 5.1, will be considered as the task of estimating the vector h on the basis of the actual signal measurements. If it is of interest to determine the value of the parameter $\hat{H}$ that minimizes the least squares (LS) criterion, one starts from the MSE criterion function [24, 26–33, 44, 45]

$$J(\hat{H})=E\left\{e^{2}(k)\right\},\qquad e(k)=y(k)-X^{T}(k)\hat{H}(k-1),\qquad(5.20)$$

where e(k) is the prediction error or measurement residual. Since (5.20) imparts the same weight to all residuals, a large influence of outliers on the resulting

LS estimates which minimize the criterion (5.20) is possible [13–15]. With the goal to decrease this influence, it is possible to modify the LS criterion according to the M-estimation criterion [14]

$$J(\hat{H})=E\{u(e(k))\},\qquad(5.21)$$

where $u(\cdot)$ is a convex function that depends on the assumed class of disturbances and that should ensure the robustness of the procedure. Namely, the criterion function should ensure a high estimation efficiency for data with a Gaussian distribution and at the same time efficiently suppress sporadic disturbances in the form of impulse interference. Because of these requirements, $u(\cdot)$ should have the form of a square function for small values of the argument. Besides, it is desirable that its derivative, $\Psi=u'$, also denoted as the influence function [35], be a bounded and continuous function. This corresponds to the choice of Huber's robust loss function [14]

$$u(x)=\begin{cases}\dfrac{m|x|}{\sigma}+c_{1},&|x|\geq m\sigma,\\[4pt]\dfrac{x^{2}}{2\sigma^{2}}+c_{2},&|x|<m\sigma,\end{cases}\qquad(5.22)$$

where $c_{1}$ and $c_{2}$ are adequately defined constants that ensure the continuity of the function, and m is chosen in such a manner as to ensure the desired efficiency for the nominal normal model of the data distribution $N(\cdot|0,\sigma^{2})$. Determining the first derivative of the loss function (5.22), one obtains Huber's influence function, also known as Huber's nonlinearity

$$\Psi(x)=\min\left(\frac{|x|}{\sigma^{2}},\frac{m}{\sigma}\right)\mathrm{sign}(x),\qquad \mathrm{sign}(x)=\begin{cases}1,&x>0\\0,&x=0\\-1,&x<0\end{cases}\qquad(5.23)$$

The effect of the use of (5.23) is that one imparts low weights to the small part of the residuals with large absolute values, so that they have a small influence on the final parameter estimates.
While deriving the recursive form of the considered robust LS procedure, the so-called RRLS algorithm, instead of using the optimal criterion given by (5.21), one starts from the empirically obtained criterion

$$J_{i}(\hat{H})=i^{-1}\sum_{k=1}^{i}u\left(e\left(k;\hat{H}\right)\right).\qquad(5.24)$$

The criterion (5.24) follows from (5.21) when one replaces the mathematical expectation by the arithmetic mean. It is implied that such an i is chosen as to ensure that $J_{i}(\cdot)$ converges to $J(\cdot)$ given in (5.21) [28]. To solve the system of nonlinear equations appearing as the result of the optimality condition $\partial J(\hat{H})/\partial\hat{H}=0$, where $\partial J/\partial H$ denotes the gradient or partial derivative operator, one may apply the

 
Newton–Raphson method [12, 24, 25]. Linearizing $J_{i}(\hat{H})$ in the vicinity of the estimate $\hat{H}(i-1)$, one obtains

$$J_{i}(\hat{H})=J_{i}\left(\hat{H}(i-1)\right)+\left[\frac{\partial J_{i}\left(\hat{H}(i-1)\right)}{\partial H}\right]^{T}\left[\hat{H}-\hat{H}(i-1)\right]+\frac{1}{2}\left[\hat{H}-\hat{H}(i-1)\right]^{T}\frac{\partial^{2}J_{i}\left(\hat{H}(i-1)\right)}{\partial H^{2}}\left[\hat{H}-\hat{H}(i-1)\right]+O\left(\left\|\hat{H}-\hat{H}(i-1)\right\|^{2}\right),\qquad(5.25)$$

where

$$\lim_{\|x\|\to 0}\frac{O(\|x\|)}{\|x\|}=0,\qquad(5.26)$$

and $\|\cdot\|$ denotes the Euclidean norm. The desired value $\hat{H}=\hat{H}(i)$ is obtained by solving the equation

$$\partial J_{i}\left(\hat{H}(i)\right)/\partial H=0,\qquad(5.27)$$

whence one obtains

$$\hat{H}(i)=\hat{H}(i-1)-\left[\frac{\partial^{2}J_{i}\left(\hat{H}(i-1)\right)}{\partial H^{2}}\right]^{-1}\left[\frac{\partial J_{i}\left(\hat{H}(i-1)\right)}{\partial H}\right]+O\left(\left\|\hat{H}(i)-\hat{H}(i-1)\right\|\right).\qquad(5.28)$$

Let us note that the algorithm (5.28) essentially represents the stochastic Newton–Raphson scheme (4.38) applied to the task of the minimization of the criterion (5.24).

Let us assume additionally that the following hypotheses are satisfied:

P1: The estimate $\hat{H}(i)$ is in the vicinity of the estimate $\hat{H}(i-1)$, which implies
(a) $O\left(\left\|\hat{H}(i)-\hat{H}(i-1)\right\|\right)\approx 0$;
(b) $\partial^{2}J_{i}\left(\hat{H}(i)\right)/\partial H^{2}\approx\partial^{2}J_{i}\left(\hat{H}(i-1)\right)/\partial H^{2}$ for each sufficiently large i.

P2: The estimate $\hat{H}(i-1)$ is optimal in the step $i-1$, which gives $\partial J_{i-1}\left(\hat{H}(i-1)\right)/\partial H=0$.
Further, it follows from (5.24) that

$$iJ_{i}(\hat{H})=(i-1)J_{i-1}(\hat{H})+u\left(e\left(i;\hat{H}\right)\right).\qquad(5.29)$$

Taking $\hat{H}=\hat{H}(i-1)$ and twice differentiating (5.29), while utilizing the hypothesis P1(b) and the fact that, because of (5.1),

$$\frac{\partial e\left(i;\hat{H}\right)}{\partial H}=-X(i),$$

one obtains

$$i\frac{\partial^{2}J_{i}\left(\hat{H}(i-1)\right)}{\partial H^{2}}=(i-1)\frac{\partial^{2}J_{i-1}\left(\hat{H}(i-1)\right)}{\partial H^{2}}+X(i)X^{T}(i)\Psi'\left(e\left(i;\hat{H}(i-1)\right)\right),\qquad(5.30)$$

where $\Psi=u'$. Besides that, taking into account the hypothesis P2, one obtains from (5.29), after differentiating and replacing $\hat{H}$ by $\hat{H}(i-1)$,

$$i\frac{\partial J_{i}\left(\hat{H}(i-1)\right)}{\partial H}=-X(i)\Psi\left(e\left(i;\hat{H}(i-1)\right)\right).\qquad(5.31)$$
Introducing the notation

$$R(i)=i\frac{\partial^{2}J_{i}\left(\hat{H}(i-1)\right)}{\partial H^{2}},\qquad(5.32)$$

and utilizing (5.31), the relation (5.28) assumes the form

$$\hat{H}(i)=\hat{H}(i-1)+R^{-1}(i)X(i)\Psi\left(e\left(i;\hat{H}(i-1)\right)\right),\qquad e\left(i;\hat{H}\right)=d(i)-X^{T}(i)\hat{H}(i-1),\qquad(5.33)$$

$$R(i)=R(i-1)+\Psi'\left(e\left(i;\hat{H}(i-1)\right)\right)X(i)X^{T}(i).\qquad(5.34)$$

One often utilizes the matrix $P(i)=R^{-1}(i)$. Using the lemma on matrix inversion (2.58), one obtains from (5.33) and (5.34)

$$\hat{H}(k)=\hat{H}(k-1)+P(k)X(k)\Psi\left(e\left(k;\hat{H}(k-1)\right)\right),\qquad e\left(k;\hat{H}\right)=d(k)-X^{T}(k)\hat{H}(k-1),\qquad(5.35)$$

$$P(k)=P(k-1)-\frac{P(k-1)X(k)X^{T}(k)P(k-1)\Psi'\left(e\left(k;\hat{H}(k-1)\right)\right)}{1+\Psi'\left(e\left(k;\hat{H}(k-1)\right)\right)X^{T}(k)P(k-1)X(k)},\qquad(5.36)$$

where $k=0,1,2,\ldots$. The relations (5.35) and (5.36) define the robust recursive LS (RRLS) algorithm, where $\Psi$ is defined by (5.23). The standard deviation $\sigma$ in (5.23) is unknown and must be estimated. The popular ad hoc robust estimate of the parameter s in the statistical literature is the median of absolute deviations (5.7), which was discussed in the previous section. This scheme for the determination of s also suggests adequate values for the parameter m in (5.23). Since $s\approx\sigma$, one usually takes for m a value close to 1.5. Such a choice gives a much higher efficiency than the RLS algorithm in the case of a Gaussian distribution with weighted tails, and retains good properties in the case when the data are generated by a normal distribution density. The program implementation of the RRLS algorithm is based on the flow diagram presented in Table 5.5.

Table 5.5 Flow diagram of the RRLS algorithm

Step 1: Initialization:
  $\hat{H}(0)=0$; $P(0)=r^{2}I$, $r^{2}\gg 1$;
  $X^{T}(1)=[x(1)\;x(0)\;0\ldots 0]$, input column vector with dimensions $M\times 1$;
  define the data window length $n_{H}$, $5\leq n_{H}\leq 10$;
  define the initial data window $E_{0}=\{d(0),d(1),\ldots,d(n_{H}-1)\}$;
  estimate the standard deviation $\sigma(0)=s(0)$ according to (5.7) and the data $E_{0}$;
  $e\left(1;\hat{H}(0)\right)=d(1)$, the initial measurement residual;
  define the function $\Psi$ according to (5.23) and its derivative $\Psi'$.
Step 2: Assuming that in each step $k=1,2,3,\ldots$ one knows $\hat{H}(k-1)$, $X(k)$, $P(k-1)$, $e\left(k;\hat{H}(k-1)\right)$ and $\sigma(k-1)$, calculate:
  the normalized residual $\tilde{e}\left(k;\hat{H}(k-1)\right)=e\left(k;\hat{H}(k-1)\right)/\sigma(k-1)$;
  the nonlinear transformation of the residual $\tilde{\Psi}\left(k;\hat{H}(k-1)\right)=\sigma(k-1)\Psi\left(\tilde{e}\left(k;\hat{H}(k-1)\right)\right)$;
  the gain matrix $P(k)$ according to (5.36);
  the filter parameter estimate $\hat{H}(k)$ based on (5.35);
  generate a sample of the desired response $d(k+1)$;
  generate a sample of the excitation signal $x(k+1)$ and form the input vector $X^{T}(k+1)=[x(k+1)\;x(k)\ldots x(k-M+2)]$, where $x(i)=0$ for $i\leq 0$;
  the error $e\left(k+1;\hat{H}(k)\right)$ according to (5.35);
  form the data vector $E_{k}$ by shifting the elements of the previous data vector one position to the left and introducing $e\left(k+1;\hat{H}(k)\right)$ at the position of the last element to the right;
  the standard deviation $s(k)=\sigma(k)$ according to (5.7) and the data set $E_{k}$.
Step 3: Increment the counter k by 1 and repeat the procedure from Step 2.
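A minimal sketch of one RRLS iteration (5.35)-(5.36), using the Huber nonlinearity (5.23) with the robust scale folded into the residual normalization; the names and the scalar-threshold form of psi_prime are assumptions of this example:

```python
import numpy as np

def rrls_step(H_est, P, X, d, sigma, m=1.5):
    """One RRLS update (5.35)-(5.36). The residual is normalized by the
    robust scale estimate sigma; psi_prime is 1 inside the linear zone of
    Huber's nonlinearity and 0 in the clipped region."""
    e = d - X @ H_est
    e_norm = e / sigma
    psi = sigma * np.minimum(np.abs(e_norm), m) * np.sign(e_norm)
    psi_prime = 1.0 if abs(e_norm) <= m else 0.0
    PX = P @ X
    P = P - psi_prime * np.outer(PX, PX) / (1.0 + psi_prime * (X @ PX))
    H_est = H_est + P @ X * psi
    return H_est, P
```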

The initial values of the parameter vector $\hat{H}$ and the matrix P should be chosen so as to enhance the convergence properties of the algorithm in its initial part. A possible approach to accelerate the convergence of the parameter estimation, based on the optimal design of the input signal, was shown in Chap. 4. Taking into account (4.20)–(4.23), one may define a robust recursive LS algorithm with optimal input (the RRLSO algorithm) [62], given by (4.20)–(4.23), (5.35) and (5.36). Additionally, it is necessary to investigate the practical effects of the use of the optimal input in the RRLS algorithm under conditions when additive impulse noise is present. In other words, the question is posed whether the improved properties of the algorithm, obtained by the application of the optimal input, are retained in the RRLS algorithm when it operates in an environment with additive impulse noise. The practical convergence properties of the RRLSO algorithm, for a finite number of iterations, are experimentally investigated in the text below. The flow diagram of the RRLSO algorithm is shown in Table 5.6.

Table 5.6 Flow diagram of the RRLSO algorithm

Step 1: Initialization
- identical to the case of the RRLS algorithm

Step 2: Assuming that in each step $k = 1, 2, 3, \ldots$ one knows $\hat{H}(k-1)$, $X(k)$, $P(k-1)$, $e(k, \hat{H}(k-1))$ and $\sigma(k-1)$, calculate:
- normalized residual $\tilde{e}(k, \hat{H}(k-1))$ like in the RRLS algorithm
- nonlinear transformation $\tilde{\Psi}(k, \hat{H}(k-1))$ like in the RRLS algorithm
- matrix $P(k)$ like in the RRLS algorithm
- filter parameter estimate $\hat{H}(k)$ like in the RRLS algorithm
- generate the sample $x(k+1)$ of the D-optimal input signal based on (4.23)
- form the input vector $X^{T}(k+1)$ like in the RRLS algorithm
- generate the desired response $d(k+1)$ and calculate the error $e(k+1, \hat{H}(k))$ like in the RRLS algorithm
- form the data vector $E_k$ like in the RRLS algorithm
- calculate the standard deviation $s(k)$ like in the RRLS algorithm

Step 3: Increment the counter $k$ by 1 and repeat the procedure starting from Step 2

When the filter parameters are variable with time, it is natural to utilize the second (alternative) form of the RRLS algorithm (5.35) and (5.36). By introducing the replacement [24]

$$\bar{R}(i) = c(i)R(i), \qquad c(i) = i^{-1}, \qquad (5.37)$$

into (5.33), one obtains

$$\hat{H}(i) = \hat{H}(i-1) + c(i)\bar{R}^{-1}(i)X(i)\,\Psi\big(e(i,\hat{H}(i-1))\big). \qquad (5.38)$$

Besides that, it follows from (5.34) and (5.37) that

$$\bar{R}(i) = \bar{R}(i-1) + c(i)\Big[\Psi'\big(e(i,\hat{H}(i-1))\big)X(i)X^{T}(i) - \bar{R}(i-1)\Big]. \qquad (5.39)$$

Sometimes it is more convenient to utilize the matrix $\bar{P}(i)$ and the forgetting factor (FF) $\rho(i)$ instead of $c(i)$ and $\bar{R}(i)$; these are defined in the following manner [24]

$$\bar{P}(i) = c(i)\bar{R}^{-1}(i), \qquad \rho(i) = \frac{c(i-1)}{c(i)}\big(1 - c(i)\big). \qquad (5.40)$$

In that case the RRLS algorithm (5.38)-(5.40) becomes, after applying the lemma on matrix inversion (2.58),

$$\hat{H}(i) = \hat{H}(i-1) + \bar{P}(i)X(i)\,\Psi\big(e(i,\hat{H}(i-1))\big), \qquad (5.41)$$

$$\bar{P}(i) = \frac{1}{\rho(i)}\left\{\bar{P}(i-1) - \frac{\bar{P}(i-1)X(i)X^{T}(i)\bar{P}(i-1)\,\Psi'\big(e(i,\hat{H}(i-1))\big)}{\rho(i) + \Psi'\big(e(i,\hat{H}(i-1))\big)X^{T}(i)\bar{P}(i-1)X(i)}\right\}. \qquad (5.42)$$

Relations (5.41) and (5.42) define the adaptive RRLS algorithm with a variable forgetting factor (VFF) $\rho$. The forgetting factor $\rho$, where $0 < \rho \le 1$, imparts different weights to the previous signal samples, thus enabling the RRLS algorithm to track changes in the signal. The ways to automatically adjust the forgetting factor $\rho$ during the operation of the algorithm are considered in detail in Chap. 3.
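Under the same assumptions as the earlier sketch (names are ours, reusing huber_psi), the forgetting-factor variant (5.41)-(5.42) changes only the $\bar{P}$ update:

```python
import numpy as np

def rrls_ff_step(theta, P, x, d, sigma, rho, m=1.5):
    """One iteration of the robust RLS update with forgetting factor rho,
    cf. (5.41)-(5.42); 0 < rho <= 1."""
    e = d - x @ theta
    psi, dpsi = huber_psi(e / sigma, m)
    Px = P @ x
    P = (P - dpsi * np.outer(Px, Px) / (rho + dpsi * (x @ Px))) / rho  # (5.42)
    theta = theta + (P @ x) * (sigma * psi)                            # (5.41)
    return theta, P, e
```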

5.2.1 Experimental Analysis

To investigate the properties of the RRLS algorithm with optimal input (RRLSO algorithm) we simulated the system structure shown in Fig. 5.1. The desired system response was obtained by bringing an input signal $x(k)$ to a ninth-order FIR filter with the coefficients $h^{T} = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1\}$ and superposing independent additive Gaussian noise $n(k)$ with zero mean [36]. The variance of the additive noise was chosen to ensure a signal-to-noise ratio (SNR) equal to 0 dB before the addition of impulse noise. Impulse noise was generated according to the model $n(k) = a(k)A(k)$, where $a(k)$ is a binary independent identically distributed process with probabilities $P\{a(k) = 1\} = 0.01$ and $P\{a(k) = 0\} = 0.99$. Here $A(k)$ is a process with a symmetrical amplitude distribution, uncorrelated with $a(k)$ and with a variance $\mathrm{var}\{A(k)\} = 10^4/12$ [36]. One realization of the additive noise generated in this way is shown in Fig. 5.6. The length $n_H$ of the sliding window of the scale factor estimator in (5.7) was chosen to be 10 samples, to ensure that the probability of the appearance of more than one outlier (realization of impulse noise) within the window remains very small. The adaptive FIR filter is of the same order as the desired FIR filter and is determined by a parameter vector $\hat{H}(k)$ and by the algorithm for recursive estimation of the parameter vector, (5.35) and (5.36).
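For reference, the impulse noise model just described can be generated as follows. This is a sketch: the symmetric distribution of $A(k)$ is not specified in the text beyond its variance, so a uniform law on $[-50, 50]$, whose variance is indeed $100^2/12 = 10^4/12$, is used here as one admissible choice:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
a = rng.random(N) < 0.01                 # binary occurrence process, P{a=1}=0.01
A = rng.uniform(-50.0, 50.0, size=N)     # symmetric amplitudes, var = 1e4/12
n_impulse = a * A                        # sporadic large outliers
```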

Fig. 5.6 Additive impulse noise at the output of the desired system (amplitude vs. sample number)
Fig. 5.7 Normalized estimation error (NEE) for different adaptive algorithms in the case without impulse noise in the desired response. Dashed line: RLS algorithm with Gaussian noise as input signal; bold dashed line: RRLS algorithm with Gaussian noise as input signal; solid line: RLS algorithm with optimal input; bold solid line: RRLS algorithm with optimal input

Two different input signals $x(k)$ were applied: (1) the optimal input sequence (4.20)-(4.22); (2) Gaussian noise with zero mean and unit variance. The results of the parameter estimation obtained by applying the input signals (1) and (2) were compared according to the normalized estimation error (NEE) in (5.19).
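Assuming the usual definition behind criterion (5.19), the NEE expresses the squared parameter estimation error normalized by the squared norm of the true parameter vector, in decibels; a one-line sketch:

```python
import numpy as np

def nee_db(theta_hat, theta_true):
    """Normalized estimation error in dB (assumed definition of (5.19))."""
    num = np.sum((theta_hat - theta_true) ** 2)
    den = np.sum(theta_true ** 2)
    return 10.0 * np.log10(num / den)
```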
Figure 5.7 shows the values of the NEE criterion for different algorithms in the case when there is no impulse noise in the desired response. The analysis of the obtained results confirmed the initial assumption that the robust RLS algorithm (RRLS) shows properties similar to the conventional RLS algorithm; however, the advantage of using the optimal input (RRLSO algorithm) is noticeable and manifests as faster initial convergence and higher accuracy of the estimated parameters. Besides that, it can be noted that the RRLS algorithm with optimal input (RRLSO) gives somewhat worse results than the conventional RLS algorithm with optimal input (RLSO); however, its properties are still better than those of the compared algorithms with Gaussian noise as their random excitation.
Figure 5.8 shows the obtained values of the normalized estimation error NEE for the case when impulse noise is contained in the desired response.

Fig. 5.8 Normalized estimation error (NEE) for different adaptive algorithms in the case with impulse noise in the desired response. Dashed line: RLS algorithm with Gaussian noise as input signal; bold dashed line: RRLS algorithm with Gaussian noise as input signal; solid line: RLS algorithm with optimal input; bold solid line: RRLS algorithm with optimal input

The presented experimental results point to the following conclusions:

It is evident that the classical RLS algorithm is very sensitive to disturbances in impulse form and that its resulting parameter estimates are severely degraded and practically inapplicable in real situations.

The application of the optimal input to the classical RLS algorithm also shows sensitivity to disturbances of impulse type, while impulse changes have practically no influence on the RRLS algorithm. In such an environment, too, one observes noticeable improvements resulting from the application of the RRLSO algorithm, i.e. the optimal input to the robust RLS algorithm (RRLS), which are reflected in faster initial convergence and smaller normalized estimation error.

5.3 Adaptive Estimation of the Scaling Factor in Robust Algorithms

It was already mentioned that the previously considered M-estimators are susceptible to changes of signal dynamics. This problem can be overcome by a modification of the criterion function, as done in (5.6), by introducing a scaling factor (5.7). This section presents another approach: recursive generation of the scaling factor in real time, simultaneously with the generation of the filter parameter estimates.

Let us assume that the probability distribution density of the stochastic disturbance (noise) $n(k)$ is known with an accuracy up to a scaling factor $s$. If $p(n)$ is the nominal probability distribution density for $s = 1$, then the probability distribution density for an arbitrary scaling factor $s$ is given as

$$p_s(n) = \frac{1}{s}\,p\!\left(\frac{n}{s}\right). \qquad (5.43)$$
s s
A case of interest is when the disturbance $n(k)$ belongs to a distribution class $P$ that is known in advance. In that case it is necessary, by maximizing the Cramér-Rao lower bound in (4.9), to first determine the least favorable probability distribution density $\bar{p}(n)$ (the density that carries the minimum amount of information about the estimated parameters within the given class). The optimal loss function on the distribution class is the likelihood function $\varphi$ in (5.2), which is determined by the least favorable probability distribution density within the given class and now has the form

$$F\big(e(k,H),s\big) = -\ln\left[\frac{1}{s}\,\bar{p}\!\left(\frac{e(k,H)}{s}\right)\right] = \ln s - \ln \bar{p}\!\left(\frac{e(k,H)}{s}\right) = \ln s + \varphi\!\left(\frac{e(k,H)}{s}\right). \qquad (5.44)$$

According to (5.44) one obtains the relation for the mean (expected) losses, which simultaneously represents the criterion generator (5.21) for recurrent stochastic procedures for the estimation of the vector $H$:

$$J(H,s) = E\big\{F\big(e(k,H),s\big)\big\} = \ln s + E\left\{\varphi\!\left(\frac{e(k,H)}{s}\right)\right\}. \qquad (5.45)$$

According to (5.45), analogously to (5.24), one obtains the following empirical functional (the mathematical expectation is approximated by the arithmetic mean)

$$J_k(H,s) = \ln s + \frac{1}{k}\sum_{i=1}^{k}\varphi\!\left(\frac{e(i,H)}{s}\right). \qquad (5.46)$$
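As an illustrative sanity check (a special case we add here, not worked out in the text): for the nominal Gaussian density the criterion (5.45) reduces to the classical variance estimate,

$$\bar{p}(n) = \frac{1}{\sqrt{2\pi}}\,e^{-n^2/2} \;\Rightarrow\; \varphi(x) = \frac{x^2}{2} + \frac{1}{2}\ln 2\pi, \qquad J(H,s) = \ln s + \frac{E\{e^2(k,H)\}}{2s^2} + \mathrm{const},$$

and setting $\partial J/\partial s = 1/s - E\{e^2(k,H)\}/s^3 = 0$ gives $s^2 = E\{e^2(k,H)\}$, i.e. the scaling factor coincides with the ordinary standard deviation of the residual.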

It is implied that $k$ is chosen to ensure the convergence of $J_k(\cdot)$ toward $J(\cdot)$ from (5.45) [28, 44]. To solve the system of nonlinear equations appearing as a result of the optimality condition $\partial J_k(H,s)/\partial H = 0$, where $\partial/\partial H$ represents the gradient or partial derivative operator, one can utilize the Newton-Raphson method from (5.28). Namely, by expanding $J_k(H,s)$ around the point $\hat{H}(k-1)$ into a Taylor series and by minimizing the thus obtained series over $H$, one obtains

$$\hat{H}(k) = \hat{H}(k-1) - \left[k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),s\big)}{\partial H^2}\right]^{-1}\left[k\,\frac{\partial J_k\big(\hat{H}(k-1),s\big)}{\partial H}\right] + O\big(\hat{H}(k) - \hat{H}(k-1)\big), \qquad (5.47)$$

where

$$\lim_{\|x(k)\| \to 0} \frac{O\big(x(k)\big)}{\|x(k)\|} = 0,$$

and $\|\cdot\|$ represents the Euclidean norm. According to (5.46) one may write

$$J_k(H,s) = \ln s + \frac{1}{k}\left[(k-1)\,\frac{1}{k-1}\sum_{i=1}^{k-1}\varphi\!\left(\frac{e(i,H)}{s}\right) + \varphi\!\left(\frac{e(k,H)}{s}\right)\right] = \ln s + \frac{1}{k}\left[(k-1)\big(J_{k-1}(H,s) - \ln s\big) + \varphi\!\left(\frac{e(k,H)}{s}\right)\right], \qquad (5.48)$$

whence it follows

$$k\,J_k(H,s) = \ln s + (k-1)J_{k-1}(H,s) + \varphi\!\left(\frac{e(k,H)}{s}\right). \qquad (5.49)$$
s
By differentiating the last relation twice over $H$, one obtains

$$k\,\frac{\partial J_k(H,s)}{\partial H} = (k-1)\,\frac{\partial J_{k-1}(H,s)}{\partial H} - \frac{X(k)}{s}\,\Psi\!\left(\frac{e(k,H)}{s}\right), \qquad (5.50)$$

$$k\,\frac{\partial^2 J_k(H,s)}{\partial H^2} = (k-1)\,\frac{\partial^2 J_{k-1}(H,s)}{\partial H^2} + \frac{X(k)X^{T}(k)}{s^2}\,\Psi'\!\left(\frac{e(k,H)}{s}\right), \qquad (5.51)$$
where $\Psi = \varphi'$. If we introduce $H = \hat{H}(k-1)$ and $s = \hat{s}(k-1)$ into the relation (5.51), we obtain

$$k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial H^2} = (k-1)\,\frac{\partial^2 J_{k-1}\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial H^2} + \frac{X(k)X^{T}(k)}{\hat{s}^2(k-1)}\,\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right). \qquad (5.52)$$

By introducing the notation

$$\bar{R}(k) = k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial H^2}, \qquad (5.53)$$

and utilizing the assumptions

P1: the estimate $\hat{H}(k)$ is near the estimate $\hat{H}(k-1)$, so that $O\big(\hat{H}(k) - \hat{H}(k-1)\big) \approx 0$;

P2: the estimate $\hat{H}(k-1)$ is optimal in the step $k-1$, which gives $\partial J_{k-1}\big(\hat{H}(k-1),\hat{s}(k-1)\big)/\partial H = 0$;

the relation (5.47) becomes

$$\hat{H}(k) = \hat{H}(k-1) - \bar{R}^{-1}(k)\left[k\,\frac{\partial J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial H}\right], \qquad (5.54)$$

$$\bar{R}(k) = \bar{R}(k-1) + \Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\frac{X(k)X^{T}(k)}{\hat{s}^2(k-1)}. \qquad (5.55)$$

By combining (5.54) and (5.50) with $H = \hat{H}(k-1)$ and $s = \hat{s}(k-1)$, and utilizing P2, it follows

$$\hat{H}(k) = \hat{H}(k-1) + \bar{R}^{-1}(k)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\frac{X(k)}{\hat{s}(k-1)}. \qquad (5.56)$$

It remains to define the recurrence relation for the scaling factor $s$. Similarly to (5.47), one obtains the relation for the recurrent estimation of the scalar parameter $s$:

$$\hat{s}(k) = \hat{s}(k-1) - \left[k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s^2}\right]^{-1}\left[k\,\frac{\partial J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s}\right]. \qquad (5.57)$$

By differentiating (5.49) twice over $s$, one obtains

$$k\,\frac{\partial J_k(H,s)}{\partial s} = (k-1)\,\frac{\partial J_{k-1}(H,s)}{\partial s} + \frac{1}{s} - \frac{e(k,H)}{s^2}\,\Psi\!\left(\frac{e(k,H)}{s}\right),$$

$$k\,\frac{\partial^2 J_k(H,s)}{\partial s^2} = (k-1)\,\frac{\partial^2 J_{k-1}(H,s)}{\partial s^2} - \frac{1}{s^2} + \frac{2e(k,H)}{s^3}\,\Psi\!\left(\frac{e(k,H)}{s}\right) + \frac{e^2(k,H)}{s^4}\,\Psi'\!\left(\frac{e(k,H)}{s}\right). \qquad (5.58)$$
Let us introduce an additional assumption

P3: the estimate $\hat{s}(k-1)$ is optimal in the step $k-1$, which gives $\partial J_{k-1}\big(\hat{H}(k-1),\hat{s}(k-1)\big)/\partial s = 0$.

Utilizing P3 and setting $H = \hat{H}(k-1)$ and $s = \hat{s}(k-1)$, it follows from (5.58)

$$k\,\frac{\partial J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s} = \frac{1}{\hat{s}(k-1)} - \frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}^2(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right) = \frac{1}{\hat{s}^2(k-1)}\left[\hat{s}(k-1) - e\big(k,\hat{H}(k-1)\big)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\right], \qquad (5.59)$$

$$k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s^2} = (k-1)\,\frac{\partial^2 J_{k-1}\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s^2} - \frac{1}{\hat{s}^2(k-1)} + \frac{2e\big(k,\hat{H}(k-1)\big)}{\hat{s}^3(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right) + \frac{e^2\big(k,\hat{H}(k-1)\big)}{\hat{s}^4(k-1)}\,\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right). \qquad (5.60)$$
Introducing the notation

$$c(k) = k\,\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s^2}, \qquad (5.61)$$

the relation (5.60) can be written as

$$c(k) = c(k-1) - \frac{1}{\hat{s}^2(k-1)} + \frac{2e\big(k,\hat{H}(k-1)\big)}{\hat{s}^3(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right) + \frac{e^2\big(k,\hat{H}(k-1)\big)}{\hat{s}^4(k-1)}\,\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right). \qquad (5.62)$$

Using the relations (5.55)-(5.57) and (5.62) one obtains the final form of the algorithm for the estimation of the parameter vector with simultaneous estimation of the scaling factor $s$, where $\bar{R}(k)$ now denotes, analogously to (5.37)-(5.39), the time-averaged version of (5.53), i.e. (5.53) divided by $k$:

$$\hat{H}(k) = \hat{H}(k-1) + \frac{1}{k}\,\bar{R}^{-1}(k)\,\frac{X(k)}{\hat{s}(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right), \qquad (5.63)$$

$$\bar{R}(k) = \bar{R}(k-1) + \frac{1}{k}\left[\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\frac{X(k)X^{T}(k)}{\hat{s}^2(k-1)} - \bar{R}(k-1)\right], \qquad (5.64)$$

$$\hat{s}(k) = \hat{s}(k-1) - \frac{1}{c(k)\,\hat{s}^2(k-1)}\left[\hat{s}(k-1) - e\big(k,\hat{H}(k-1)\big)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\right], \qquad (5.65)$$

$$c(k) = c(k-1) - \frac{1}{\hat{s}^2(k-1)} + \frac{2e\big(k,\hat{H}(k-1)\big)}{\hat{s}^3(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right) + \frac{e^2\big(k,\hat{H}(k-1)\big)}{\hat{s}^4(k-1)}\,\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right). \qquad (5.66)$$

The proposed algorithm is complex to implement, and it is thus desirable to introduce acceptable approximations in order to simplify it.

Let us assume that the estimates $\hat{H}(k-1)$ and $\hat{s}(k-1)$, for sufficiently large $k$, are close to the optimal solution $H_{opt}$ and $s_{opt}$, and that the following condition is fulfilled

$$\lim_{k\to\infty}\frac{\partial^2 J_k\big(\hat{H}(k-1),\hat{s}(k-1)\big)}{\partial s^2} = \frac{\partial^2 J\big(H_{opt},s_{opt}\big)}{\partial s^2}, \qquad (5.67)$$

which means that instead of the second derivative of the empirical functional (5.46) one utilizes the second derivative of the functional (5.45). Then the relation (5.57) assumes the form

$$\hat{s}(k) = \hat{s}(k-1) - \left[k\,\frac{\partial^2 J\big(H_{opt},s_{opt}\big)}{\partial s^2}\right]^{-1}\left[k\,\frac{\partial J_k\big(H_{opt},s_{opt}\big)}{\partial s}\right]. \qquad (5.68)$$

The first and the second derivative of the functional (5.45) in the point $(H_{opt},s_{opt})$ are

$$\frac{\partial J\big(H_{opt},s_{opt}\big)}{\partial s} = \frac{1}{s_{opt}} - \frac{1}{s_{opt}^2}\,E\left\{e\big(k,H_{opt}\big)\,\Psi\!\left(\frac{e\big(k,H_{opt}\big)}{s_{opt}}\right)\right\}, \qquad (5.69)$$

$$\frac{\partial^2 J\big(H_{opt},s_{opt}\big)}{\partial s^2} = -\frac{1}{s_{opt}^2} + E\left\{\Psi'\!\left(\frac{e\big(k,H_{opt}\big)}{s_{opt}}\right)\frac{e^2\big(k,H_{opt}\big)}{s_{opt}^4}\right\} + 2E\left\{\Psi\!\left(\frac{e\big(k,H_{opt}\big)}{s_{opt}}\right)\frac{e\big(k,H_{opt}\big)}{s_{opt}^3}\right\}. \qquad (5.70)$$

For $H = H_{opt}$ and $s = s_{opt}$ we have $\partial J\big(H_{opt},s_{opt}\big)/\partial s = 0$; thus we obtain from (5.69) and (5.70)

$$\frac{\partial^2 J\big(H_{opt},s_{opt}\big)}{\partial s^2} = \frac{1}{s_{opt}^2} + \frac{1}{s_{opt}^4}\,E\left\{\Psi'\!\left(\frac{e\big(k,H_{opt}\big)}{s_{opt}}\right)e^2\big(k,H_{opt}\big)\right\}. \qquad (5.71)$$
If one introduces the Fisher information for the scale parameter on the class of probability distribution densities $P$, based on the least favorable probability distribution density $\bar{p}$ within the given class $P$ [28],

$$I_d(\bar{p}) = E\left\{\left[\frac{\bar{p}'(n)}{\bar{p}(n)}\right]^2 n^2\right\}, \qquad (5.72)$$

then, by applying the procedure presented in [28] and utilizing the relations (5.69)-(5.71), one obtains the following expression for (5.68)

$$\hat{s}(k) = \hat{s}(k-1) - \frac{1}{k\big(I_d(\bar{p}) - 1\big)}\left[\hat{s}(k-1) - e\big(k,\hat{H}(k-1)\big)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\right] = \hat{s}(k-1) - \frac{a}{k}\left[\hat{s}(k-1) - e\big(k,\hat{H}(k-1)\big)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\right], \qquad (5.73)$$

where

$$a = \frac{1}{I_d(\bar{p}) - 1}. \qquad (5.74)$$
Now the simplified algorithm for the simultaneous estimation of the parameters and the scaling factor assumes the form

$$\hat{H}(k) = \hat{H}(k-1) + \frac{1}{k}\,\bar{R}^{-1}(k)\,\frac{X(k)}{\hat{s}(k-1)}\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right), \qquad (5.75)$$

$$\bar{R}(k) = \bar{R}(k-1) + \frac{1}{k}\left[\Psi'\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\frac{X(k)X^{T}(k)}{\hat{s}^2(k-1)} - \bar{R}(k-1)\right], \qquad (5.76)$$

$$\hat{s}(k) = \hat{s}(k-1) - \frac{a}{k}\left[\hat{s}(k-1) - e\big(k,\hat{H}(k-1)\big)\,\Psi\!\left(\frac{e\big(k,\hat{H}(k-1)\big)}{\hat{s}(k-1)}\right)\right], \qquad (5.77)$$

$$e\big(k,\hat{H}(k-1)\big) = d(k) - X^{T}(k)\hat{H}(k-1). \qquad (5.78)$$
Since, according to (5.77), the ability to update the scaling factor decreases as the training process advances (the correction of the scaling factor is divided by the number of iterations $k$), it is desirable to modify (5.77) to make it applicable to signals with abrupt changes of dynamics. Namely, instead of the variable weight $a/k$ a fixed weight factor of 0.1 was introduced, its value chosen according to experimental results. The choice of this factor is a trade-off between the increase of the variance of the estimated scaling factor and the rate at which the scaling factor estimate can follow the onset of changes of signal dynamics.

Table 5.7 Flow diagram of the algorithm for simultaneous estimation of the parameters and the scaling factor

Step 1: Initialization
- $\hat{H}(0) = 0$; $P(0) = r^2 I$, $r^2 \gg 1$; $\bar{R}(0) = P^{-1}(0)$
- $X^{T}(1) = [x(1)\; x(0)\; 0 \ldots 0]$, input column vector with dimensions $(M+1) \times 1$
- define the length of the data window $n_H$, $5 \le n_H \le 10$
- define the initial data window $E_0 = \{d(0)\; d(1) \ldots d(n_H - 1)\}$
- estimate the standard deviation $\hat{s}(0)$ according to (5.7) and the data $E_0$
- $e(1,\hat{H}(0)) = d(1)$, initial measurement residual
- define the function $\Psi$ according to (5.23) and its derivative $\Psi'$

Step 2: Assuming that in each step $k = 1, 2, 3, \ldots$ one knows $\hat{H}(k-1)$, $X(k)$, $\bar{R}(k-1)$, $e(k,\hat{H}(k-1))$ and $\hat{s}(k-1)$, calculate:
- normalized residual $\tilde{e}(k,\hat{H}(k-1))$
- nonlinear transformation of the residual $\Psi\big(\tilde{e}(k,\hat{H}(k-1))\big)$ and its derivative $\Psi'\big(\tilde{e}(k,\hat{H}(k-1))\big)$
- inverse of the amplification matrix, $\bar{R}(k)$, based on (5.76)
- scaling factor $\hat{s}(k)$ according to (5.77)
- estimate of the filter parameters $\hat{H}(k)$ according to (5.75)
- generate a sample of the desired response $d(k+1)$
- generate a sample of the excitation signal $x(k+1)$ and form the input vector $X^{T}(k+1) = [x(k+1)\; x(k) \ldots x(k-M+1)]$, where $x(i) = 0$ for $i \le 0$
- error $e(k+1,\hat{H}(k))$ according to (5.78)
- form the data vector $E_k$ by shifting the elements of the previous data vector one place to the left and introducing $e(k+1,\hat{H}(k))$ instead of the last element to the right

Step 3: Increment the counter $k$ by 1 and repeat the procedure starting from Step 2
A flow diagram of the algorithm (5.75)-(5.78) is given in Table 5.7.
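A compact sketch of one iteration of (5.75)-(5.78), with the fixed weight 0.1 in place of $a/k$ as just discussed (names are ours; for the scale update we already use the hard-rejection influence function introduced as (5.80) in Sect. 5.3.1, while the parameter update keeps the Huber nonlinearity):

```python
import numpy as np

def joint_step(theta, R, s, x, d, k, m_par=1.5, m_scale=3.0, w=0.1):
    """One iteration of the simplified joint recursion (5.75)-(5.78)."""
    e = d - x @ theta                                   # (5.78)
    u = e / s                                           # normalized residual
    psi = min(max(u, -m_par), m_par)                    # Huber Psi(u), (5.22)
    dpsi = 1.0 if abs(u) <= m_par else 0.0
    R = R + ((dpsi / s**2) * np.outer(x, x) - R) / k            # (5.76)
    theta = theta + np.linalg.solve(R, x) * (psi / (k * s))     # (5.75)
    psi_s = u if abs(u) <= m_scale else 0.0             # hard rejection (5.80)
    s = s - w * (s - e * psi_s)                         # (5.77) with a/k -> 0.1
    return theta, R, s, e
```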

5.3.1 Experimental Analysis

To investigate the properties of the algorithm (5.75)-(5.78) we simulated an adaptive system in the manner described in Sect. 5.2.1, with the difference that in this analysis the variance of the additive noise was not constant. Namely, the variance of the additive noise was chosen so that in the first segment of the signal the signal-to-noise ratio was SNR = 30 dB, in the second segment it was 10 dB, and in the last segment it was 20 dB. Impulse noise, with the characteristics described in Sect. 5.2.1, was added to the additive noise. Figure 5.9 shows the additive noise with and without impulse noise. It should be noted that the change of the signal dynamics in Fig. 5.9b is less easily seen because of the large realizations of impulse noise, but the underlying signal is equal to that shown in Fig. 5.9a; the difference is in the impulse noise only.

Since the scaling factor (5.77) directly depends on the error signal $e(k)$, which is itself very sensitive to the presence of impulse disturbances, in the

Fig. 5.9 Additive noise with variable dynamics: (a) without impulse noise, (b) with impulse noise

modification (5.21) of the criterion function (5.20) it is desirable to utilize a hard rejection function for the risk function $\varphi$,

$$\varphi(x) = \begin{cases} c_1, & |x| \ge m\sigma \\[4pt] \dfrac{x^2}{2\sigma^2} + c_2, & |x| < m\sigma \end{cases} \qquad (5.79)$$

where $c_1$ and $c_2$ are conveniently chosen constants that ensure the continuity of the function, instead of Huber's saturation nonlinearity in (5.22) [63].

The derivative of the function (5.79), the so-called influence function, has the form [14, 28, 35]

$$\Psi(x) = \varphi'(x) = \begin{cases} x, & |x| \le m\sigma \\ 0, & |x| > m\sigma \end{cases} \qquad (5.80)$$

where the usual choice is $m = 3$ (the so-called three-sigma rule). For the estimation of the parameters one still utilizes the nonlinearity (5.22) and its influence function.
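In code, the pair (5.79)/(5.80) can be written as follows (a sketch; $c_2$ is a free constant and $c_1 = m^2/2 + c_2$ enforces continuity at $|x| = m\sigma$):

```python
import numpy as np

def hard_phi(x, sigma=1.0, m=3.0, c2=0.0):
    """Hard rejection risk function (5.79); c1 chosen for continuity."""
    c1 = m**2 / 2.0 + c2
    return np.where(np.abs(x) < m * sigma, x**2 / (2.0 * sigma**2) + c2, c1)

def hard_influence(x, sigma=1.0, m=3.0):
    """Influence function (5.80): identity inside the m-sigma zone, zero outside."""
    return np.where(np.abs(x) <= m * sigma, x, 0.0)
```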
The results obtained by simulation were compared, using the normalized estimation error (5.19), with the results obtained by the RRLS algorithm which utilizes the median of absolute deviation for the robust estimation of the scaling factor. Figure 5.10 shows the estimates of the scaling factor obtained by the use of the median of absolute deviation (MAD) and by the iterative method (5.77), while utilizing the nonlinear transformation of the residual (5.79) and the corresponding influence function (5.80).

Fig. 5.10 Estimation of the scaling factor: (a) using the median of absolute deviation and (b) by the iterative method (5.77)

Table 5.8 Variance of scaling factor estimation: (a) using the method of median of absolute deviation (5.7), (b) by the iterative method (5.77)

Interval        (a)             (b)
100-3,000       1.15·10^-4      1.23·10^-5
3,000-6,000     1.12·10^-3      1.92·10^-4
6,000-9,000     8.9·10^-3       1.25·10^-3

Fig. 5.11 Normalized estimation error (NEE) if no impulse noise is contained in the additive noise. Scaling factor estimation: dashed line, using the median of absolute deviation (5.7) in RRLS; solid line, iterative method (5.77) in RRLS + scale factor

Although both methods of scaling factor estimation react at an approximately equal rate, indicating adequate registering of the changes of signal dynamics, it is noticeable that the estimate of the scaling factor obtained by the MAD method has a variance an order of magnitude larger than that of the adaptive procedure (5.77), as shown in Table 5.8.

In an environment where no impulse noise is present in the additive noise (Fig. 5.11), both of the considered robust algorithms start their process of parameter estimation in a similar manner. At the moment of an abrupt change of the additive noise variance, the standard RRLS algorithm degrades, which is a consequence of an inadequate estimate of the variance of the additive noise. On the other hand, the robust algorithm with simultaneous estimation of the parameters and the scaling factor shows a lower sensitivity to changes of the variance.

Figure 5.12 shows the results of the simulation obtained when the additive Gaussian noise is contaminated by an impulse noise component. In such an environment the classical RLS algorithm, as expected, shows high sensitivity to the presence of impulse noise. The RRLS algorithm and the RRLS algorithm with adaptive estimation of the scaling factor both show robustness to the presence of impulse noise, but the latter algorithm shows better properties in an environment with nonstationary noise. A comparison of the normalized errors of the filter parameter estimates shows that these two methods give very similar results, both for the impulse environment and for purely Gaussian noise.

Fig. 5.12 Normalized estimation error (NEE) if impulse noise is contained in the additive noise, for different algorithms (RLS, RRLS, RRLS + scale factor). Scaling factor estimation: solid line, using the median of absolute deviation (5.7) in RRLS; dashed line, iterative method (5.77) in RRLS + scale factor

Figures 5.11 and 5.12 show the values of the normalized estimation errors obtained with the different methods for the estimation of the scaling factor.

The performed experimental and theoretical analysis allows the following conclusions to be drawn:

The algorithm for iterative estimation of the scaling factor is convenient for the parameter identification of systems where the additive noise contains abrupt variance changes besides impulse noise (the nonstationary case). The proposed algorithm was compared to the conventional approach, the robust recursive least squares (RRLS) algorithm, where the noise variance is determined using the median of absolute deviation. The obtained results show that the proposed algorithm estimates the scaling factor better, which is reflected in the lower variance of the estimates. Besides that, in the case when the additive noise is contaminated by impulse noise, the proposed algorithm proves more efficient than the RRLS algorithm with the noise variance determined using the median of absolute deviation.

Thus, for applications where it is necessary to estimate the scaling factor more accurately, with the goal of improving the quality of the filter parameter estimates in a nonstationary environment characterized by impulse disturbances, the advantages of the proposed algorithm are obvious; it should be mentioned, however, that this is obtained at the price of higher computational complexity.

5.4 Robust Recursive Least Squares Algorithm with Variable Forgetting Factor and with Detection of Impulse Noise

In this section we further analyze the problem of system parameter identification using adaptive FIR filters where, besides the contamination of the desired response by impulse noise, one also encounters variations of the values of the estimated parameters (a nonstationary environment characterized by impulse noise). A block diagram of the system for parameter identification is identical to that in Fig. 5.1. It is known that to successfully follow the variations of parameter values it is desirable to utilize a variable forgetting factor, a topic handled in more detail in Chap. 3, while in order to limit the propagation of errors caused by impulse noise one utilizes robust algorithms, considered in this chapter. However, in environments where both phenomena are present, neither of the mentioned solutions, applied separately, furnishes fully satisfactory results. An attempt to modify the RRLS algorithm by utilizing an algorithm for the estimation of the forgetting factor based on the PA algorithm (Sect. 3.1), instead of a fixed forgetting factor, proved unsuccessful, since such an algorithm does not have a built-in mechanism for determining the cause of an abrupt increase of the error signal. Namely, both the RRLS and the PA-RLS algorithm react to an increase of the error signal value, so that an abrupt change of a parameter value may be erroneously interpreted as the appearance of impulse noise, and vice versa. One of the possible approaches to the solution of the quoted problem has been proposed in the previous section and is based on the recursive estimation of the scaling factor simultaneously with the estimation of the parameters of the filter itself, utilizing the robust RLS (RRLS) algorithm. It is possible to solve this problem in another way, by utilizing a combined criterion [33, 64]:
     
$$J(\hat{H},k) = \lambda(k)\,J_u(\hat{H},k) + \big(1-\lambda(k)\big)\,J_q(\hat{H},k), \qquad (5.81)$$

where the criterion generator of the robust least squares estimation is defined by (5.21), i.e.

$$J_u(\hat{H},k) = k^{-1}\sum_{i=1}^{k}\varphi\big(e(\hat{H},i)\big), \qquad (5.82)$$

and the criterion generator of the weighted least squares estimation with a forgetting factor is given by the expression (3.88), i.e.

$$J_q(\hat{H},k) = k^{-1}\sum_{i=1}^{k}q(k,i)\,e^2(\hat{H},i). \qquad (5.83)$$

Let us recall that for a time-variable (nonstationary) system the choice of a fixed parameter $q$ in (5.83) results in an estimate of the average behavior of the system over the analyzed interval $\{0 \le i \le k\}$. To estimate the momentary properties of the system at the current moment $k$, in the general case the exponential factor $q^{k-i}$ should be replaced by a weight function $q(k,i)$, which is an increasing function of the argument $i$ for a given $k$. A convenient general structure of this factor is defined by the expression (3.85), i.e.

$$q(k,i) = x(k)\,q(k-1,i), \qquad 1 \le i \le k-1. \qquad (5.84)$$

The three strategies, the so-called EPE, FKY and PA algorithms, which further concretize the general structure (5.84), were considered in Chap. 3.

In (5.81), $\lambda(k)$ is a parameter in the $k$-th step whose value may be 1 or 0. If $\lambda = 1$, the expression (5.81) becomes the criterion function (5.82), which leads to the RRLS algorithm, and if $\lambda = 0$ then (5.81) reduces to the criterion (5.83) with a variable forgetting factor, whose recursive minimization results in the PA-RLS algorithm. Since impulse disturbances are sporadic, while the parameter changes may be continuous, it is desirable that the PA-RLS algorithm dominates, while the RRLS algorithm is activated only in the intervals where impulse disturbances occur.

Bearing in mind the assumed nature of the impulse noise $n(k) = a(k)A(k)$, where $a(k)$ is a binary independent identically distributed process chosen so that $P\{a(k)=1\}$ has a low value close to zero, while $P\{a(k)=0\}$ has a value near one, and where $A(k)$ is a process with a symmetrical distribution of amplitudes, uncorrelated with $a(k)$ and having a large variance compared to the additive noise, the following strategy was proposed for the choice of the parameter $\lambda$ in each step:

$$\lambda(k) = \begin{cases} 1, & c\cdot\mathrm{med}\big\{|e(k)|_{n_H}\big\} < \mathrm{mean}\big\{|e(k)|_{n_H}\big\} \\[4pt] 0, & c\cdot\mathrm{med}\big\{|e(k)|_{n_H}\big\} \ge \mathrm{mean}\big\{|e(k)|_{n_H}\big\} \end{cases} \qquad (5.85)$$

where $c$ is a proportionality constant dependent on the variance of the additive noise and on the ratio between the variance of the contamination and the variance of the additive noise. The median of the absolute value of the error signal, $\mathrm{med}\{|e(k)|_{n_H}\}$, and the mean of the absolute value of the error signal, $\mathrm{mean}\{|e(k)|_{n_H}\}$, are calculated on a sliding window containing the $n_H$ previous samples of the residual. The length of the sliding window is chosen to ensure that the probability of the appearance of an outlier (realization of an impulse disturbance) within the data window remains very low.

In other words, the task of determining the value of the parameter $\lambda$ is the detection of the appearance of an impulse disturbance. By properly choosing the length of the sliding window one obtains a significant difference between the median and the mean of the error signal over the considered interval in the case of the appearance of an impulse disturbance. When choosing the constant $c$ one should take care that its value increases with the variance of the additive noise, in order to reduce to a minimum the probability of spurious detection of an outlier. On the other hand, this value must not be too large, in order to enable the registration of outliers with smaller amplitudes.
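The detector (5.85) is easy to implement on the sliding residual window: a single impulse inflates the mean of $|e|$ while barely moving its median, and it is precisely this discrepancy that is tested (a sketch with our own names):

```python
import numpy as np

def detect_outlier(e_window, c=10.0):
    """Return lambda(k) per (5.85): 1 if an impulse disturbance is detected."""
    abs_e = np.abs(np.asarray(e_window, dtype=float))
    return 1 if c * np.median(abs_e) < np.mean(abs_e) else 0
```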
The proposed algorithm basically represents a combination of the RRLS and the PA-RLS algorithm in which the PA-RLS algorithm is dominant; it follows the changes of the estimated parameters and according to these changes updates the value of the forgetting factor $\rho$. In the moments of detection of the appearance of outliers, $\lambda$ assumes the value 1, which activates the RRLS algorithm, and the variable forgetting factor $\rho$ retains its previous value in the next $n_H$ iterations, which is the number of iterations necessary to ensure that the mean value becomes insensitive to the detected outlier.

Table 5.9 Flow diagram of the RRLS algorithm with variable forgetting factor and detection of impulse disturbances

Step 1: Let the parameter vector $\hat{H}(k-1)$ in iteration $k$, $k \ge n_H$, be known from the previous iteration $k-1$.

Step 2: Calculate the current residual $e\big(k,\hat{H}(k-1)\big) = d(k) - X^{T}(k)\hat{H}(k-1)$ and define the vector $E = \{e(k), e(k-1), \ldots, e(k-n_H+1)\}$ with a length $n_H$, implying that the previous values of the residual $\{e(k-1), \ldots, e(k-n_H+1)\}$ are known.

Step 3: Calculate $\lambda(k)$ on the data window defined in Step 2, according to (5.85). If $\lambda(k) = 1$, set $\rho(k) = \rho(k-1)$ and $l(k) = l(k-1)$ and then continue with Step 8.

Step 4: Calculate the normalized ratio of the nonstationarity and additive noise powers, $q(k)$, and from it the value of the variable forgetting factor $\rho(k)$, according to the relations of the PA strategy given in Sect. 3.1.

Step 5: Update the values $l(k)$ and $\beta(k)$ of the PA algorithm (Sect. 3.1).

Step 6: Calculate the current value of the parameter vector $\hat{H}(k)$ using the RLS algorithm:
$$\hat{H}(k) = \hat{H}(k-1) + K(k)e(k),$$
$$K(k) = P(k-1)X(k)\big[\rho(k) + X^{T}(k)P(k-1)X(k)\big]^{-1},$$
$$P(k) = \frac{1}{\rho(k)}\left[P(k-1) - \frac{P(k-1)X(k)X^{T}(k)P(k-1)}{\rho(k) + X^{T}(k)P(k-1)X(k)}\right].$$

Step 7: Set $k = k+1$ and start from Step 2.

Step 8: Calculate the median of absolute deviations $s(k)$, using the data window from Step 2:
$$s = \mathrm{median}\big\{\,|e(i) - \mathrm{median}\{e(i)\}|\,\big\}/0.6745.$$

Step 9: Calculate the influence function $\Psi(x)$ and its derivative $\Psi'(x)$:
$$\Psi(x) = \min\left\{\frac{|x|}{\sigma^2}, \frac{m}{\sigma}\right\}\mathrm{sign}(x), \qquad \mathrm{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$

Step 10: Calculate the current value of the parameter vector $\hat{H}(k)$:
$$\hat{H}(k) = \hat{H}(k-1) + P(k)X(k)\,\Psi\big(e(k,\hat{H}(k-1))\big).$$

Step 11: Set $k = k+1$ and start from Step 1.

The block diagram of the algorithm is given in Table 5.9.
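Schematically, one pass through Table 5.9 can be organized as below (a sketch with our own names; the PA update of $\rho(k)$, Steps 4-5, depends on the relations of Sect. 3.1 and is therefore abstracted as a user-supplied callable, and $P$ is simply kept unchanged during an outlier step):

```python
import numpy as np

def table_5_9_step(theta, P, rho, e_window, x, d, update_rho, c=10.0, m=1.5):
    e = d - x @ theta                                        # Step 2
    e_window = np.append(e_window[1:], e)
    abs_e = np.abs(e_window)
    lam = 1 if c * np.median(abs_e) < np.mean(abs_e) else 0  # Step 3, (5.85)
    if lam == 0:
        rho = update_rho(rho, e)                             # Steps 4-5 (PA)
        Px = P @ x
        K = Px / (rho + x @ Px)                              # Step 6: RLS gain
        P = (P - np.outer(K, Px)) / rho
        theta = theta + K * e
    else:                                                    # rho frozen
        med = np.median(e_window)
        s = np.median(np.abs(e_window - med)) / 0.6745       # Step 8: MAD
        s = max(s, 1e-12)                                    # guard: degenerate window
        psi = np.sign(e) * min(abs(e) / s**2, m / s)         # Step 9
        theta = theta + (P @ x) * psi                        # Step 10
    return theta, P, rho, e_window
```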

5.4.1 Experimental Analysis

The block diagram for system parameter identification shown in Fig. 5.1 was simulated. The signal of the desired response was obtained by bringing noise with normal distribution and unit power to the input of a ninth-order FIR filter with the parameters $H = \{0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1\}$. Independent noise with normal distribution and a fixed variance was added to the filter output signal so that the signal-to-noise ratio was SNR = 30 dB before the start of the impulse contamination. Impulse noise was generated according to the model $n(k) = a(k)A(k)$, with $P\{a(k)=1\} = 0.01$, $P\{a(k)=0\} = 0.99$ and $\mathrm{var}\{A(k)\} = 10^4/12$.

Fig. 5.13 Estimation of a time-variable parameter using the RRLS algorithm with a variable forgetting factor $\rho$ and an outlier detector, without impulse contamination: (a) value of the variable forgetting factor, (b) estimated value of the variable parameter $b_1$ and the parameter $\lambda$ which detects the presence of impulse noise, (c) additive noise

Fig. 5.14 Variation of the $b_1$ parameter of an FIR filter of the order M = 9

The length of the sliding window $n_H$, applied in the determination of the parameter $\lambda$, was chosen as $n_H = 5$ to minimize the probability of the simultaneous occurrence of more than one outlier in the window and, at the same time, to decrease the number of iterations during which the algorithm for the forgetting factor update is inactive. Based on experimental results we adopted the value 10 for the constant $c$ as sufficiently good for following the phenomena of impulse contamination. The variation of the $b_1$ parameter of the FIR filter, defined in Fig. 5.14, includes sudden drops and surges, as well as linear growth with larger or smaller inclination.
Figure 5.13 shows the estimated value of the variable parameter for the case when no impulse contamination is present in the additive noise. The parameter $\lambda$ has the value 0 during the whole estimation, which is expected since no outliers occur, so that the leading role in the estimation is assumed by the PA-RLS algorithm. The adequate estimation of the forgetting factor in accordance with the variation of the estimated parameters is obvious.

Figure 5.15 shows the results obtained for the case when impulse contamination is present in the additive noise (Fig. 5.15d). The parameter $\lambda$ successfully detected each occurrence of impulse contamination (Fig. 5.15c) and thus prevented the update of the variable forgetting factor in the interval after the impulse disturbance occurred. Due to the appearance of impulse disturbances around the 1,200th and 2,300th iterations one notices insignificant deviations in the estimates of the parameter values (Fig. 5.15b); however, they did not significantly influence the total result of the parameter estimation.
Finally, let us note the following. It is possible to solve the problem of the estimation of a time-variable parameter in the presence of impulse disturbances in the additive noise using a robust recursive least squares algorithm with a variable forgetting factor and a detector of impulse disturbances (outliers). The proposed algorithm is basically a two-step one, since depending on the detection of impulse

Fig. 5.15 Estimation of the time-variable parameter using the RRLS algorithm with variable forgetting factor $\rho$ and with a detector of outliers when impulse contamination is present: (a) the value of the variable forgetting factor, (b) estimated value of the variable parameter $b_1$, (c) parameter $\lambda$ which detects the presence of impulse disturbances, (d) additive noise

disturbances it functions either as a robust recursive least squares algorithm or as a recursive least squares algorithm with a strategy for the forgetting factor choice based on parallel adaptation. The obtained results show a satisfactory detection of the appearance of impulse disturbances using the so-called median filter, which ensures correct following of the parameter variations, with simultaneous prevention of the propagation of errors caused by impulse disturbances (Fig. 5.15).

Naturally, instead of the PA-RLS algorithm it is possible to utilize some other algorithm to determine the variable forgetting factor in the estimation of the filter parameters over the nonstationary intervals. The PA-RLS algorithm was chosen because it represents a satisfactory tradeoff between numerical complexity and efficiency in various applications.
Chapter 6
Application of Adaptive Digital
Filters for Echo Cancellation
in Telecommunication Networks

Echo is a phenomenon we meet almost every day. During conversation, one may
hear echo of speech that occurs because of the reflection of the sound signal from the
walls, floor or some other surrounding objects. Echo always appears when there is
reflection, but is often imperceptible if the time difference between the appearance of
the original signal (speech) and the arrival of the reflected signal (echo) is small.
However, when the location of the reflection is sufficiently far from the speaker, as is
the case in large empty rooms, then the time delay of the reflected signal is larger and
echo may thus be significantly more marked in comparison to the original signal.
Echo is also generated in telecommunication networks. In this case the term echo denotes a delayed and distorted version of the original acoustic or electrical signal which returns towards its own source due to reflection or some other reason.
From the point of view of transmission quality, echo represents a disturbance
causing a decrease of intelligibility in speech transmission, and an increase of error
probability in data transfer. The origins of echo should be sought in specific
requirements regarding the type of transmission, diversity of terminals and
requirements for maximum exploitation of the available transmission systems.
Although data transfer, in the form of telegraphy, preceded speech signal
transmission, speech communication became dominant with time and determined
the development of telecommunication networks. The contemporary trend of the
development of computers, which are becoming omnipresent due to their low cost,
imposes an increasing need for data transfer. It is natural that there is a tendency to use the existing telephone networks for this type of transmission too. However, these networks are optimized for the transmission of analog speech signals and thus introduce various distortions in data transfer. The most marked distortions are the linear ones, which include echo.
On the other hand, regardless if one deals with data or speech transmission, due
to specific requirements in the use of communication equipment (like for instance
acoustic and video teleconferences, satellite transmissions and similar) several
different types of echo signal are generated.
The causes, modes and origins of echo in telecommunication networks may be
different, but their common trait is that they decrease the quality of communica-
tions. Thus there is an interest for a practical use of echo cancellers.


The theoretical basis for echo cancellation lies in the field of adaptive digital filtering. This field has been intensively researched over the last several decades, and the first practical implementations of echo cancellers appeared during the 1960s. However, because of the requirements connected with complex digital signal processing, wider usage had to wait for the advent of large-scale integration (LSI) technology. The first echo canceller in very-large-scale integration (VLSI) technology was implemented in 1980, and this opened new possibilities for the improvement of the characteristics and functionalities of echo cancellers, as well as for their downsizing and cost decrease.
Following the technological development, the usage of echo cancellers also
evolved, from the original concept of echo cancellation on very long distance lines
to the application in full-duplex systems for data transfer, as well as in cancellation
of acoustic feedback (acoustic echo) in electro-acoustic, tele-audio and video
conferences.
As a rule, modern communication systems contain subsystems for local echo cancellation based on the principles of adaptive filtering. The technology of digital signal processors (DSP) ensures new possibilities for the implementation of complex algorithms for local echo cancellation, optimized with regard to adaptation speed and accuracy. This is also a reason for the increasing interest in the improvement of the existing algorithms and the development of new algorithms for a more efficient solution of the echo cancellation problem.
The goal of this chapter is to explain the concept of adaptive echo cancellation based on adaptive digital filtering, with a special emphasis on local echo, to present the theoretical background, the possibilities and the limitations of this approach, and to present some of the achieved original results contributing to the improvement of the existing solutions.
The further text presents the basic types of echo signals, their causes and
origins. It points out some conventional ways for the cancellation of this phe-
nomenon, considers their drawbacks and outlines the principles of adaptive echo
cancellation.
Several classes of recursive adaptive algorithms for local echo cancellation are analyzed from the point of view of accuracy, training speed, adaptation and the complexity of their implementation. Special care has been dedicated to the analysis of the influence of training sequences on the performance of the adaptive local echo canceller. The adjustment of the frequency range to the given parameters of the adequate communication channel was considered, as well as the statistical-correlation properties of the training sequences. The performed analysis encompasses a novel approach based on the possibility of the on-line generation of optimal input signals, taking as the synthesis criterion the classes of functionals used in the field of optimal experiment planning. The algorithms of this type are considered in detail in Chap. 4. Besides that, we proposed the use of robust recursive algorithms for the case when the echo signal is contaminated by additive impulse noise or by Gaussian noise with heavy tails. An exhaustive experimental and theoretical analysis of robust adaptive digital filters is given in Chap. 5.

6.1 Echo: Causes and Origins

In this Section we give an overview of the types and the causes of the appearance
of echo signal occurring in speech and data transmission. We present some
standard solutions for echo control and the basic concepts of adaptive echo
cancellation.
Regardless of their common name, the ways and consequences of echo
appearance in speech transmission and data transfer are different. Thus the
requirements posed for echo cancellation in these two cases are different, and we
consider these two cases separately.

6.1.1 Echo in Speech Transmission

A telephone network consists of the so-called two-wire and four-wire segments. If the same line is used for transmission in both the receiving and the emitting direction, the transfer is called two-wire. This form of transfer is used over shorter distances, for instance from the subscriber to the local telephone exchange. For reasons of cost-effectiveness, transfer over longer distances requires signal multiplexing, which implies separation of the transfer directions. The term four-wire for this type of transfer appeared in the period when radio and multiplexed transmission were not utilized and the separation of directions meant the existence of at least two transmission lines. Its basic purpose is to ensure signal amplification and multi-channel transmission. This type of transmission is schematically shown in Fig. 6.1.
The system shown in Fig. 6.1 is symmetric for a user at the A and at the B side.
Assuming that the user A talks and the user B listens, the speech signal is trans-
ferred from the user A along the two-wire segment to the exchange. The hybrid in
the exchange should ensure the transition from the two-wire to the four-wire
segment by forwarding the signal to the emission direction. In Fig. 6.1 it is located
in the top part of the four-wire segment (Fig. 6.1b). At the other end, in the
exchange closer to the user B, the signal is forwarded through the hybrid and the
two-wire segment to the user B.
An important element of the network is the hybrid, whose task is to ensure the conversion from the two-wire to the four-wire segment and vice versa, separating at the same time the emitting and the receiving directions. In an ideal case the hybrid transmits the incoming signal from the four-wire to the two-wire segment, attenuating it by 3 dB, while in the opposite direction it attenuates it infinitely, not allowing it to proceed to the emitting direction and thus make a circular path. This may be achieved by carefully tuning the hybrid, under the condition that the parameters of the environment do not change. However, because of the variable number of users in the network, the requirement that a terminal may be positioned at various places in the network, and the variability of the directions along which connections are established,

Fig. 6.1 Appearance of echo in an analog telephone network in speech signal transfer: (a) block diagram of a telephone connection (two-wire, four-wire, two-wire segments between users A and B), (b) one of the two desired transmission directions, (c) talker echo, the phenomenon that the talker hears his own voice with some delay, (d) listener echo, the phenomenon that the listener also hears a delayed version of the received signal

the connection, the hybrid cannot be designed in advance to satisfy the conditions
of ideal impedance matching [34].
A consequence of the insufficient impedance matching of the hybrid manifests as a decrease of signal attenuation in the undesired direction, so that this attenuation may even drop below 10 dB. Because of that, a part of the signal transferred to user B arrives through the hybrid into the part of the four-wire segment which otherwise serves for the transfer from user B to user A. In this manner, user A may hear his own speech with a delay (Fig. 6.1c). This phenomenon is denoted as talker echo. The larger the signal propagation time, conditioned by the length of the four-wire segment, the more marked is the influence of this phenomenon.

For the same reason user B may hear in a similar way a delayed version of the received signal (Fig. 6.1d). This echo is called listener echo and is usually weaker than the so-called talker echo, owing to the attenuation along the transmission path [65, 66].

It should be noted that the useful signal propagates along a four-wire segment in one direction only, while the path of the echo signal is a loop. Because of that, one of the first methods for echo control was to introduce an additional 3 dB attenuation in both transmission directions of the four-wire segment. In this manner the echo signal is attenuated by an additional 6 dB. A deficiency of this method is that the useful signal is also attenuated by an additional 3 dB.

A somewhat more efficient method for echo control is to interrupt the transmission in one direction if there is a speech signal in the other one (echo suppressor) [67]. However, this method is efficient only if the echo signal delay is shorter than 100 ms. In satellite transmission, the time necessary for the radio signal to reach a satellite and then return to the earth is about 250 ms. This means that in the case of a loop path the time delay may be up to 0.5 s [68].

6.1.2 Acoustic Echo

Acoustic echo is one of the dominant problems influencing the quality of speech signal transmission. It is especially marked in so-called hands-free telephony, as well as in acoustic and video teleconferences [69]. Acoustic echo is a signal occurring due to multiple reflections of sound waves and the establishment of an acoustic channel between the speaker and the microphone at the terminal. The appearance of acoustic echo is schematically presented in Fig. 6.2.

Contrary to conventional telephone sets, where the user is in direct physical contact with the terminal and where the acoustic path between the speaker and the microphone is blocked by their specific location, here this is not the case. The acoustic channel occurs due to multiple reflections of sound waves. The received signal, after being emitted from the speaker, passes through the acoustic medium and, distorted and delayed, reaches the microphone. An interference with the emitted signal may occur here. Because of that the normal dialog is disrupted and, depending on the acoustic properties of the room, a system instability may occur and acoustic feedback may appear, which completely prevents conversation.

The acoustic medium may be the interior of a room or a car, or an open space. Because of the diversity of the environments, their acoustic properties are also different and variable. The reason for the variability may be the movement of the user, the opening or closing of doors and windows, and similar. The changes of the acoustic properties cause variations of the acoustic echo duration, so that, for instance, in a car at a sampling frequency of 8 kHz the duration of the acoustic echo is several hundred samples, while in a large room it may be up to several thousand samples [70, 71].

Fig. 6.2 The appearance of acoustic echo: the acoustic echo is a consequence of the short-circuiting between the speaker and the microphone due to reflections from the surrounding objects

If the acoustic properties were known in advance, one could design a filter with an invariable structure, which would filter the received signal to generate a replica of the acoustic echo and then subtract it from the microphone input signal. Since the acoustic properties vary, the use of a filter with an invariable structure is not convenient. Even under the assumption that the acoustic properties of the environment do not vary, such a solution would not be universal. The conventional way to overcome this problem is to switch to half-duplex transmission, i.e. to break the connection with the microphone while the speaker is active, or to use a directional microphone in order to reduce the influence of reflections to a minimum. The deficiency of both of the quoted solutions is the restrictions they impose on the user, which make the conversation unnatural.

6.1.3 Echo in Data Transfer

It is possible to implement full-duplex transmission using separate transmission lines, one for each direction, or using a single line with a division of the available frequency range between the transmission directions.

A much more elegant and cost-effective solution is obtained using hybrids (Fig. 6.3), which enable the implementation of full-duplex transmission over a single line, utilizing the whole frequency range for transmission in both directions. The role of the hybrid, like in speech transmission, is to ensure the transition from the four-wire to the two-wire form. The hybrid should ensure the signal flow from the emitter at the A side to the receiver at the B side, as well as that from the emitter at the B side to the receiver at the A side. At the same time, it should prevent the signal flow from the emitter to the receiver at the same side. In other words, it

Fig. 6.3 Block diagram of full-duplex transmission over a single line (four-wire, two-wire, four-wire segments between sides A and B), implemented using hybrids. Echo signals (local echo and line echo) are denoted by dotted lines

should ensure a large attenuation of the signals moving in undesired directions, without simultaneously degrading the useful signal.

Under ideal circumstances all the energy of the signal emitted from the transmitter should be transferred to the two-wire segment and reach the receiver at the other side. However, since the hybrid cannot be designed in advance to satisfy the conditions of ideal impedance matching, but only roughly estimated [34], a part of the transmitted signal nevertheless arrives at its own receiving side [72, 73]. This is schematically shown in Fig. 6.3, A side, by a straight dotted line. This signal is called the local echo. A part of the signal arriving at the line is reflected because of the impedance discontinuities of the telephone line [73], as well as due to the impedance mismatch between the line and the hybrid at the B side, and also ends up in its own receiver. This signal is called the line echo. The line echo is schematically shown in Fig. 6.3 as a curved dotted line.

A characteristic of the local echo is that its amplitude is relatively large and its time delay short, compared to the line echo, which has a smaller amplitude and a larger delay. The delay of the line echo may vary from several milliseconds for local calls to 600 ms when satellite connections are utilized [21]. Since the typical attenuation of the signal arriving from the opposite end of the line is about 40 dB [73] and the attenuation of the hybrid in the blocking direction may fall below 10 dB, the ratio between the powers of the useful signal and of the echo signal may be as low as -30 dB. If one takes into account that at least a 20 dB ratio is expected for reliable transmission, then the echo level has to be suppressed by 50 dB, and in some applications the circumstances may require a suppression of up to 70 dB [74].

6.1.4 Basic Principles of Adaptive Echo Cancellation

In speech transmission, echo significantly impairs the communication quality only


if it occurs with a relatively large delay with respect to the original signal. This

implies that the delay is such that the listener can notice it, like for instance in
satellite transmissions. In data transfer a much more important factor is the relative
ratio of the useful signal to echo signal powers, because a variation of this ratio
varies the error probability.
A more efficient echo control compared to the mentioned methods can be
obtained by utilizing blocks for adaptive echo cancellation.
The causes of echo occurrence are multiple: first, the impossibility of ideally matching the hybrid; second, an analog telephone network has many impedance irregularities causing signal reflection toward its source; and third, the reflection of acoustic waves and the acoustic coupling of the speaker and microphone in hands-free telephony.

For any type of echo, acoustic or electrical, the echo-canceling block must first model the echo transmission path, then efficiently estimate the parameters of the transmission path, and then generate a replica of the echo signal. The generated signal is subtracted from the incident signal, in order to leave only the desired signal. Adaptive filters are used for the estimation of the transfer function parameters, since they are able to autonomously update their own parameters, while simultaneously requiring little foreknowledge about the characteristics of the echo transmission path [26].
An adaptive filter consists of a digital filter and an adaptive algorithm (Chap. 2).
The digital filter should model the transmission path of the echo signal, and the
adaptive algorithm should update the filter parameter values according to the
changes of the transmission path. Adaptive digital filtering is required in order to
achieve as accurate replica of the echo signal as possible, since the transmission
path of the echo signal is in principle unknown and variable with time. The choice
of the digital filter and the adaptive algorithm depends on a particular application
and the expected effect in noise cancellation.
Acoustic echo cancellation [7, 62] is based on the identification of the unknown acoustic transmission path between the speaker and the microphone. Since acoustic echo occurs due to multiple reflections of the signal, the impulse response of the acoustic channel is very long. Besides that, under normal circumstances, the acoustic transmission path varies with time and contains additive noise, which makes the problem very complex. The acoustic echo cancellation block, implemented as an adaptive filter, is placed in parallel between the speaker and the microphone, as shown in Fig. 6.4.
In speech transmission, due to the system symmetry, echo is generated at both
ends of the network. Thus it is necessary to include two echo cancellation blocks.
They are placed in the four-wire segment, as close to the echo source as possible,
to render the cancellation as efficient as possible. The principle is as follows: the
input signal is fed both to the hybrid and to the block for adaptive echo
cancellation. The output of that block is then subtracted from the output of the
hybrid. The obtained difference represents the error signal for the adaptive blocks
and enters the process of updating the adaptive block parameters. A block diagram
of such an implementation is shown in Fig. 6.5.

Fig. 6.4 Adaptive cancellation of acoustic echo (the adaptive filter generates a replica of the acoustic echo)

In data transmission the function of the echo cancellation block is to model two
types of echo signal: local echo and line echo. Local echo occurs because of the
existence of a transmission path between the transmitter and the receiver on the
same side. The usual assumption, which gives satisfactory results in practice, is
that this transmission path is linear and slowly varying in time (for instance, as a
consequence of temperature changes [70, 74]). Line echo usually has much lower
power than local echo; it is in principle nonlinear, varies with time and exhibits
much larger delays. Because of that, although located at the same place in the
network, the echo cancellers are realized separately for these two cases. The block
for local and line echo cancellation is placed in parallel to the hybrid (Fig. 6.6).
The process of echo cancellation itself can be divided into two intervals: the
training period, occurring at the beginning of the communication, and the period
of tracking the variations of the characteristics of the echo signal transmission
path. The main goal of the echo canceller in the initial, training period is to reach
an extremely low echo signal level in the shortest possible time. This level should
ensure a satisfactory bit error rate or intelligibility of the received signal,
depending on the transmission type. On the other hand, during the communication
itself, it is necessary to track the variations of the echo channel parameters in order
to maintain a satisfactory level of echo cancellation.
The mechanisms for suppression of the local echo and the line echo have many
similarities, but also fundamental differences. For instance, during data transfer
there are signals on the line in both directions at all times, so the problem of
correcting the parameter values during the transmission itself becomes more
complex. This is not the rule in speech transmission, since the participants usually
talk alternately. Also, data transmission most often requires a much higher degree
of echo suppression than speech transmission [18]. On the other hand, in data
transmission the reference signal is a sequence of symbols which may assume
only a limited set of values, which simplifies the implementation.

Fig. 6.5 Adaptive echo cancellation (AEC) for both directions of speech signal transmission

Fig. 6.6 Adaptive cancellation of local and line echo (the block generating the echo replica is placed in the four-wire segment, next to the hybrid that joins it to the two-wire segment)

6.2 Mathematical Model of an Echo Cancellation System

Echo in telecommunication networks appears as a consequence of the existence of
undesired transmission paths and influences the communication quality. Due to
insufficiently good impedance matching, echo is also generated at the hybrid that
serves as the interface between the four-wire and the two-wire network segments.
Because of that it is necessary to utilize echo cancellers, which are implemented as
adaptive filters. They are placed in parallel to the branch containing the hybrid.
The main task of an adaptive echo canceller is to approximate the parameters of
the echo signal transmission path and to subsequently subtract the estimated echo
signal $\hat{y}(k)$ from the incoming signal $y(k)$, in order to suppress the real
echo signal as much as possible. Besides that, during echo cancellation interference
or noise appears, which may be modeled as a white Gaussian stochastic process
[23]. In order to reach a satisfactory degree of echo cancellation one minimizes the
mean square error (MSE) criterion, $E\{e^2(k)\}$. In other words, the role of the
adaptive filter is to simulate the transmission path of the echo signal as well as
possible. It is well known that the transmission path of the echo signal may be
modeled as a real rational function with both zeroes and poles [23]. However,
since the majority of contemporary adaptive echo cancellers are implemented in
the form of adaptive FIR filters with zeroes only [28, 41], in the further analysis
we utilize adaptive FIR

filters for the approximation of the echo signal transmission path. Starting from
this, the transmission path of the echo signal and the echo signal itself can be
modeled utilizing the linear regression equation

$$y(k) = X^T(k)\,h + v(k), \qquad (6.1)$$

where $X^T(k)$ represents the data vector and $h$ the parameter vector of the echo
signal transmission path, defined as

$$X^T(k) = [\,x(k)\;\; x(k-1)\;\; x(k-2)\;\; \cdots\;\; x(k-M)\,], \qquad
h^T = [\,b_0\;\; b_1\;\; b_2\;\; \cdots\;\; b_M\,]. \qquad (6.2)$$

The quoted approach requires the estimation of the parameters defined by the
vector $h$, so that the MSE criterion is minimized.
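The model (6.1)-(6.2) is straightforward to simulate. The following sketch (in Python; the filter length, the decaying impulse response and the noise level are illustrative assumptions, not values from the text) generates an echo signal as the output of an FIR echo path plus additive white Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 16                                     # illustrative FIR echo-path order
h = rng.standard_normal(M + 1) * np.exp(-0.3 * np.arange(M + 1))  # assumed decaying echo path

N = 1000
x = rng.standard_normal(N)                 # excitation x(k)
v = 0.01 * rng.standard_normal(N)          # additive white Gaussian noise v(k)

y = np.empty(N)
for k in range(N):
    # data vector X(k) = [x(k) x(k-1) ... x(k-M)], zero-padded for k < M
    X = np.array([x[k - i] if k >= i else 0.0 for i in range(M + 1)])
    y[k] = X @ h + v[k]                    # y(k) = X^T(k) h + v(k), Eq. (6.1)
```

Such a pair of sequences x(k), y(k) is exactly what the adaptive FIR filter is trained on in the experiments discussed below.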
However, as a consequence of modeling a system with both zeroes and poles (IIR)
by a structure with zeroes only (FIR), the dimension of the parameter vector (the
order $M$ of the FIR filter) must be large to achieve satisfactory echo cancellation.
Besides that, the number of iterations necessary to reach convergence of the
estimation procedure rises with the number of parameters. An approach to
improving the convergence properties of estimation algorithms is based on an
adequate choice of the input signal $x(k)$ during the training period. Namely, the
input signal $x(k)$ may be generated in advance, or generated on-line by an
adequate algorithm during the training itself, taking as the synthesis criterion the
class of functionals utilized in the field of optimal experiment planning. These
algorithms are described in detail in Chap. 4.

6.3 Analysis of the Influence of the Excitation Signal on the Performance of an Echo Cancellation System for Speech Signal Transmission

In this section we consider an echo cancellation system from the standpoint of the
influence of excitation signals on the training of the adaptive echo cancellation
block in speech signal transmission. A characteristic of this situation is the
appearance of the so-called talker echo, which is a consequence of the signal flow
in a loop. Since the signal propagation time is far longer than for local echo, the
dimension of the vector describing the parameters of the transmission path is also
larger. The block for adaptive echo cancellation is placed in parallel to the hybrid
in the four-wire part of the line. As in data transfer, one needs to update the
coefficients of the adaptive filter. This is performed in the initial period, before the
communication begins, utilizing the emitted excitation sequence and the
corresponding echo signal.
The goal of the analysis presented in this section is to ensure a better under-
standing of the influence of excitation signals on the performance of the echo
cancellation system.

The model of the adaptive echo cancellation block itself is shown in Fig. 6.7.
The coefficients of the adaptive FIR filter, defined by the vector $H(k)$, are updated
during the training process utilizing the normalized least mean square (NLMS)
algorithm, expressed as [55]

$$H(k) = H(k-1) + \frac{K}{r_x^2(k)}\, e(k)\, X(k), \qquad (6.3)$$

where
$K$ is the algorithm convergence and stability factor, located within the range $0 < K < 2/M$;
$M$ is the length of the adaptive filter;
$r_x^2(k)$ is the estimated power of the input signal $x(k)$ at the $k$-th moment;
$X(k)$ is the column vector of input signal samples.
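A minimal sketch of the NLMS training loop (6.3) follows; the exponential power estimator anticipates the recursion (6.5) given below, and the small constant eps, added to avoid division by zero at start-up, is an implementation detail assumed here rather than part of the text:

```python
import numpy as np

def nlms(x, y, M, K, beta=0.99, eps=1e-8):
    """Train an adaptive FIR filter H(k) with the NLMS rule (6.3).

    K must satisfy 0 < K < 2/M for stability; beta is the forgetting
    factor of the input-power estimator."""
    H = np.zeros(M)                   # adaptive filter coefficients H(k)
    rx2 = 0.0                         # running estimate of the input power r_x^2(k)
    e = np.empty(len(x))
    for k in range(len(x)):
        X = np.array([x[k - i] if k >= i else 0.0 for i in range(M)])
        e[k] = y[k] - H @ X                        # error signal e(k)
        rx2 = beta * rx2 + (1.0 - beta) * x[k]**2  # power update, cf. (6.5)
        H = H + (K / (rx2 + eps)) * e[k] * X       # NLMS parameter update (6.3)
    return H, e

# e.g. H, e = nlms(x, y, M=256, K=1.0 / 256)
```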
The figure of merit for the success of echo cancellation is the echo return loss
enhancement (ERLE) factor, defined as [23]

$$\mathrm{ERLE} = 10 \log_{10} \frac{E\{y^2(k)\}}{E\{e^2(k)\}}, \qquad (6.4)$$

where $E\{y^2(k)\}$ and $E\{e^2(k)\}$ are the mathematical expectations of the
squared echo signal $y(k)$ and of the squared signal $e(k)$ remaining after echo
cancellation.

Fig. 6.7 Block diagram of an adaptive echo cancellation system (the transmission path of the echo signal, excited by $x(k)$ and disturbed by $v(k)$, produces $y(k)$; the adaptive FIR filter $H(k)$ generates the replica $\hat{y}(k)$, and the error $e(k) = y(k) - \hat{y}(k)$ drives the mechanism for parameter update based on MSE minimization)



An iterative method for the estimation of the mean signal power $r_x^2(k)$ and the
corresponding mathematical expectations $E\{y^2(k)\}$ and $E\{e^2(k)\}$ is given by
the recursive relations [55]

$$r_x^2(k) = \beta\, r_x^2(k-1) + (1-\beta)\, x^2(k),$$
$$E\{y^2(k)\} = \beta\, E\{y^2(k-1)\} + (1-\beta)\, y^2(k), \qquad (6.5)$$
$$E\{e^2(k)\} = \beta\, E\{e^2(k-1)\} + (1-\beta)\, e^2(k).$$
Let us note that the above relations have the form of a predictor–corrector
method, i.e. they represent a linear combination of the prior estimate and the
newly obtained observation, where the latter is generated from only a single
realization of the stochastic process.
The value of the factor $\beta$ is close to one, for instance 0.99. The adaptive FIR
filter length is $M = 256$. It is assumed that a speech signal is transmitted within
the frequency range of (300–3,400) Hz.
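Combining (6.4) with the smoothers (6.5) gives a running ERLE estimate; a sketch, assuming the echo sequence y(k) and the residual sequence e(k) are available:

```python
import numpy as np

def erle_track(y, e, beta=0.99):
    """Running ERLE estimate (6.4) using the exponential smoothers (6.5)."""
    Ey2, Ee2 = 0.0, 0.0
    erle = np.zeros(len(y))
    for k in range(len(y)):
        Ey2 = beta * Ey2 + (1.0 - beta) * y[k]**2   # E{y^2(k)}
        Ee2 = beta * Ee2 + (1.0 - beta) * e[k]**2   # E{e^2(k)}
        if Ee2 > 0.0:
            erle[k] = 10.0 * np.log10(Ey2 / Ee2)    # ERLE in dB
    return erle
```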
The goal of the designer is to train the adaptive filter to minimize the error signal
mean power $E\{e^2(k)\}$, i.e. to obtain as large an ERLE factor in (6.4) as
possible.
The experiment was implemented in the following steps [29]:
1. The telephone line was simulated as an FIR band-pass filter in the range
300–3,400 Hz.
2. The speech signal was passed through this filter and the obtained signal was
considered the echo. This pair of signals was used to investigate the value of
the ERLE factor.
3. The following were used as excitation signals:
I. Gaussian white noise with a zero mean value, passed through an ideal band-
pass filter with a range of:
a. 0–4,000 Hz
b. 0–3,700 Hz
c. 100–3,600 Hz
d. 200–3,500 Hz
e. 300–3,400 Hz
f. 400–3,300 Hz
g. 500–3,200 Hz
h. 600–3,100 Hz
i. 1,100–2,600 Hz
j. 800–2,900 Hz
II. A signal with a flat amplitude-frequency characteristic in the frequency
ranges defined under (a)–(j) and a phase with a uniform distribution.
III. A signal with a flat amplitude-frequency characteristic in the frequency
ranges defined under (a)–(j) and a phase with a normal distribution.
IV. Speech signal.

4. Each of these signals was passed through the FIR filter that simulated the line,
and in this way signal pairs were obtained for the training of the adaptive filter.
After the completed training, based on the algorithm defined by expression
(6.3), the values of the filter parameters are fixed and the value of the ERLE
factor is monitored on the pair of signals defined under 2.
5. The telephone line is simulated as a filter with infinite impulse response (IIR
filter) of Chebyshev type, and the bandwidth is defined so that the filter transfer
function is as similar as possible to the transfer function of the FIR filter defined
under 1.
6. The speech signal is passed through the defined IIR filter and the response is
stored. This signal pair is used to investigate the ERLE factor.
7. The excitation signals are the same as under 3, only the signal pairs for training
are obtained by passing the corresponding excitation sequences through the
defined Chebyshev-type IIR filter.
8. The telephone line is simulated by a Butterworth-type IIR filter, and the
bandwidth is defined so that the filter transfer function is as similar as possible
to the transfer function of the FIR filter defined under 1.
9. Experiments are performed analogously to entries 6–8.

The accuracy of the calculation of filter coefficients and input data is 32-bit.
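The excitation signals listed under step 3 can be reproduced along the following lines; the sketch below generates the type I and type II/III signals in the frequency domain, where the sampling rate, the signal length and the spread of the normally distributed phase are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, N = 8000, 4096                       # assumed sampling rate (Hz) and length

def bandlimited_noise(f_lo, f_hi):
    """Type I: zero-mean Gaussian white noise through an ideal band-pass filter."""
    S = np.fft.rfft(rng.standard_normal(N))
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    S[(f < f_lo) | (f > f_hi)] = 0.0     # ideal band-pass via FFT zeroing
    return np.fft.irfft(S, n=N)

def flat_random_phase(f_lo, f_hi, uniform=True):
    """Types II/III: unit (flat) magnitude in the band, random phase."""
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    mag = ((f >= f_lo) & (f <= f_hi)).astype(float)
    phase = (rng.uniform(-np.pi, np.pi, f.size) if uniform
             else rng.normal(0.0, np.pi / 2, f.size))  # assumed spread
    return np.fft.irfft(mag * np.exp(1j * phase), n=N)

x_e = bandlimited_noise(300.0, 3400.0)   # range (e)
x_flat = flat_random_phase(300.0, 3400.0)
```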
Figure 6.8 shows a family of curves representing the ERLE factor for the case
when the transmission path of the echo signal is simulated by an FIR filter with
the range 300–3,400 Hz. Training of the adaptive filter was done using signals
with a completely flat amplitude-frequency characteristic (of unit value), a random
phase with normal distribution, and different frequency ranges. The labels b, d, e, f
and h correspond to the previously defined frequency ranges. The label "speech"
is placed over the curve describing the ERLE factor obtained after training the
adaptive filter on a speech signal.
Figure 6.9 shows the results obtained when the transmission path of the echo
signal was simulated by a Butterworth-type IIR filter, while training of the echo
cancellation block was performed using signals with a flat amplitude-frequency
characteristic and a random phase with uniform distribution.
If the transmission path of the echo signal is simulated by a Chebyshev-type IIR
filter, and Gaussian noise with various frequency ranges is used as the excitation
signal, one obtains the results shown in Fig. 6.10.
The presented results point to the following possibilities and conclusions:
1. The basic task of an echo canceller in the initial, training period is to reach an
extremely low echo signal level in the shortest possible time. This level should
ensure a satisfactory bit error rate or intelligibility of the received signal,
depending on the transmission type. One of the approaches to increasing the
convergence speed and enhancing the accuracy of the parameter estimation
process is to utilize adequate excitation (training) signals.

Fig. 6.8 ERLE factor; the echo signal transmission path is simulated by an FIR filter (ERLE in dB versus number of samples; curves correspond to the excitation ranges b, d, e, f, h and to training on speech)

Fig. 6.9 ERLE factor; the echo signal transmission path is simulated by a Butterworth-type IIR filter (ERLE in dB versus number of samples; curves correspond to the excitation ranges b, d, e, f, h and to training on speech)

2. The operation of echo cancellation blocks is based on the estimation of the
parameters of the undesired transmission path, i.e. on the estimation of the
unknown echo transfer function and the generation of a replica of the echo
signal, which is subsequently subtracted from the incoming signal.

Fig. 6.10 ERLE factor; the echo signal transmission path is simulated by a Chebyshev-type IIR filter (ERLE in dB versus number of samples; curves correspond to the excitation ranges b, d, e, f, h and to training on speech)

Basically, two families of algorithms dominate the adaptive estimation field: the
least mean squares (LMS) algorithm and the recursive least squares (RLS)
algorithm, both of which were described in Chap. 2. Their basic characteristics are
given in Table 6.1. Good characteristics of the LMS algorithm include its
relatively low computational complexity, good numerical stability and successful
tracking of slow time variations of parameters. On the other hand, the RLS
algorithm is superior regarding its speed of convergence toward the optimal
parameters, while the main disadvantage of its use is a relatively high
computational complexity [6].
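To make the complexity contrast of Table 6.1 concrete, one update step of each family is sketched below; the step size mu, the forgetting factor lam and the initialization of P are illustrative choices, not values from the text:

```python
import numpy as np

def lms_step(H, X, d, mu):
    """One LMS step: O(M) operations, gradient of the current squared error."""
    e = d - H @ X
    return H + mu * e * X, e

def rls_step(H, P, X, d, lam=0.99):
    """One RLS step: O(M^2) operations; P is the inverse input correlation
    matrix, initialized e.g. as P = (1/delta) * np.eye(M) with small delta."""
    e = d - H @ X                        # a priori error
    g = P @ X / (lam + X @ P @ X)        # gain vector
    H = H + g * e                        # parameter update
    P = (P - np.outer(g, X @ P)) / lam   # update of the inverse correlation matrix
    return H, P, e
```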
Table 6.1 Characteristics of RLS and LMS adaptive algorithms

RLS (recursive least squares) algorithm:
- Values of the adaptive filter parameters are updated by minimizing the least
squares error value
- Fast convergence, independent of the characteristics of the input signal
- Complex algorithm: requires a large number of computations in each iteration

LMS (least mean squares) algorithm:
- Uses the gradient of the current value of the squared error signal to update the
parameters
- Slow convergence, and the convergence properties of the algorithm depend on
the characteristics of the input signal
- Low complexity of the algorithm

3. One of the approaches to increasing the convergence speed and accuracy of the
parameter estimation process is based on the application of adequate excitation
(training) signals. The analysis of the influence of training sequences on the
performance of the adaptive digital filter points to the possibility of on-line
generation of optimal excitation signals, taking as the synthesis criterion the
class of functionals used in the field of optimal experiment planning. This
approach was analyzed in detail in Chap. 4.

Such an approach results in a novel algorithm for FIR adaptive filtering, which
can be successfully applied for the needs of adaptive echo cancellation. The
algorithm is based on the use of optimal input sequences. Compared to the con-
ventional FIR filter where white Gaussian noise is used as the excitation, this
solution proves to be better if the filter order is correctly chosen, since a higher
ERLE factor is achieved and the normalized estimation error is lower (Chap. 4).
4. For the case of echo cancellation in speech signal transmission we analyzed the
adjustment of the frequency range of the excitation to the given parameters of
the corresponding communication channel, as well as the influence of the
statistical-correlation properties of the training sequences. The experimental
results show that, regardless of the type of the filter used to simulate the
telephone line, echo cancellation depends on the characteristics of the
excitation signal utilized for adaptive filter training.

If the adaptive filter is trained using excitation signals with a frequency range
wider than the bandwidth of the simulated line, the ERLE factor is significantly
larger than in the case when the excitation signal has a narrower frequency range.
A decrease of the frequency range of the excitation signal below the frequency
range of the simulated line causes a significant drop of the ERLE factor.
After training the adaptive filter using a speech signal, the ERLE factor is far
lower than one could intuitively expect.
These experimental results show that a carefully chosen signal for the training
of the echo cancellation block (adaptive digital filter) has a significant influence on
efficient echo cancellation. There is an obvious advantage of a spectrally flat signal
and Gaussian (white) noise over colored signals such as the speech signal.
5. The results presented in Chap. 5 clearly point to the possibility of robustifying
the standard least squares algorithm, with the goal to decrease or remove the
influence of the impulse component of additive noise in the procedure of
identifying the echo signal transmission path. The use of the robust recursive
least squares algorithm was proposed in Chap. 5.

A consequence of the presence of impulse disturbances is that the standard
methods for adaptive filtering, which minimize the sum of squared residuals,
generate inaccurate parameter estimates. The errors are manifested as the
appearance of impulses in the trajectories of the estimated parameters. The quoted
difficulty can be overcome using non-square criteria, which result in robust
parameter estimation. Such procedures take into account the non-Gaussian nature
of the additive noise. Robust estimators minimize a sum of weighted residuals,
where the criterion function is chosen to weight the numerous small residuals
more than the smaller portion of large residuals.
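As an illustration of such a non-square criterion, a Huber-type residual weighting of the kind used in Chap. 5 can be sketched as follows; the clipping threshold c and the MAD normalization constant are standard textbook choices assumed here:

```python
import numpy as np

def huber_psi(e, c=1.345):
    """Huber's nonlinearity: identity for small residuals, clipped for large
    ones, so that outliers enter the update with bounded influence."""
    return np.clip(e, -c, c)

def mad_scale(residuals):
    """Scale factor estimate via the median of absolute deviations (MAD)."""
    r = np.asarray(residuals)
    return np.median(np.abs(r - np.median(r))) / 0.6745

# A robustified update then replaces the raw residual e(k) by
# s * huber_psi(e(k) / s), where s = mad_scale(...) is the current scale.
```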
A comparison of the proposed robust RRLS algorithm (Chap. 5) with the
conventional RLS algorithm (Chap. 2) in the simulation of local echo cancellation
shows that the RRLS algorithm retains the necessary efficiency under the
conditions of pure Gaussian noise. In the case when the echo signal is additively
contaminated by a mixture of impulse and Gaussian noise, it is evident that the
non-robust RLS algorithm is very sensitive to disturbances of such a nature and
that its properties are impaired, while the impulse changes practically do not
influence the robust RRLS algorithm.
According to the obtained results, the RRLS algorithm more efficiently removes
the influence of impulse noise compared to the standard RLS algorithm.
6. The robust recursive least squares algorithm with optimal input (RRLSO in
Chap. 5) unites two main problems: the estimation of the parameters of the
echo signal transmission path under conditions when the echo signal is
additively contaminated by a mixture of impulse and Gaussian noise, and the
on-line generation of a controlled training sequence, with the goal of improving
the estimation accuracy and convergence speed.

Compared to the RRLS algorithm, it turns out that, for a correctly chosen order of
the adaptive filter, the RRLSO algorithm gives better results by suppressing the
impulse component of additive noise and by increasing the convergence speed and
the accuracy of the estimated parameters, which manifests itself through a smaller
normalized estimation error.
7. The possible directions of future research lead to a further evaluation and
practical implementation of the solutions presented in this text. Also, the
potential advantages of adaptive filters with infinite impulse response pose the
need for their more intensive use. Namely, it is known that the transmission
path of the echo signal is a rational function, but because of the stability and
convergence properties of the algorithms it is most often modeled by a finite
impulse response adaptive filter. This leads to an increased complexity in terms
of the required filter order, as well as to increased requirements regarding the
computational complexity. Because of that it is necessary to analyze the
possibility of using adaptive algorithms for IIR filters in the echo cancellation
problem, in order to solve this problem even more successfully.
References

Oppenheim AV, Schafer RW, Buck JR (2011) Discrete-time signal processing. Prentice Hall, New Jersey
Mitra SK (2010) Digital signal processing: a computer based approach. McGraw-Hill, New York
Đurović ZM, Kovačević BD (2004) Digitalni signali i sistemi: pregled teorije i rešeni zadaci. Akademska misao, Beograd (in Serbian)
Widrow B, Stearns SD (1985) Adaptive signal processing. Prentice Hall, New Jersey
Haykin S (2012) Adaptive filter theory. Prentice Hall, Englewood Cliffs
Cowan C, Grant P (1985) Adaptive filters. Prentice Hall, Englewood Cliffs
Gelb A (1974) Applied optimal estimation. MIT Press, Cambridge
Kovacevic B, Filipovic V (1998) Robust real-time identification of linear systems with correlated noise. Int J Control 48(3):993–1010
Sage AP, Melsa JL (1971) Estimation theory with applications to communications and control. McGraw-Hill, New York
Kovacevic BD, Đurović ZM (2008) Fundamentals of stochastic signals, systems and estimation theory with worked examples. Springer, Berlin
Bar-Shalom Y, Li XR (1998) Estimation and tracking: principles, techniques and software. CRC, Danvers
Bard Y (1974) Nonlinear parameter estimation. Academic Press, New York
Wilcox RR (1997) Introduction to robust estimation and hypothesis testing. Academic Press, New York
Huber P (1980) Robust statistics. Wiley, New York
Kovacevic BD (1984) Robust recursive system parameter identification. PhD thesis, University of Belgrade (in Serbian)
Van Trees HL (2001) Detection, estimation and modulation theory, part I–IV. Wiley, New York
Murano K, Unagami S, Amano F (1990) Echo cancellation and applications. IEEE Commun Mag 28(1):49–55
Messerschmitt DG (1984) Echo cancellation in speech and data transmission. IEEE J Sel Areas Commun SAC-2(2):283–296
Vaseghi SV (2006) Advanced signal processing and digital noise reduction. Wiley, New York
Banjac Z, Veinovic M, Kovacevic B, Milosavljevic M (2002) An application of adaptive FIR filter with nonlinear optimal input design. In: 14th international conference on digital signal processing, Santorini
Banjac Z, Kovacevic B, Đurović Z, Milosavljevic M (1998) A class of algorithms for local echo cancellation using optimal input design. In: Proceedings of MELECON 98, vol I, Tel-Aviv, Israel, pp 1376–1379
Haykin S (1998) Neural networks: a comprehensive foundation. Prentice Hall, Upper Saddle River
Fan H, Jenkins W (1988) An investigation of an adaptive IIR echo canceller: advantages and problems. IEEE Trans Acoust Speech Sig Process 36(12):1819–1834


Ljung L, Soderstrom T (1983) Theory and practice of recursive identification. MIT Press, Cambridge
Ljung L, Soderstrom T (1987) System identification: theory for the user. Prentice-Hall, Englewood Cliffs
Solo V, Kong X (1999) Adaptive signal processing algorithms: stability and performance. Prentice Hall, New Jersey
Widrow B (1976) Stationary and nonstationary learning characteristics of LMS adaptive filters. Proc IEEE 64(8):1151
Tsypkin YZ (1984) Foundations of informational theory of identification. Nauka, Moscow
Sayed AH (2003) Fundamentals of adaptive filtering. Wiley, NJ
Uzunovic P, Banjac Z, Kovacevic B (2004) Adaptive IIR filtering: advances and problems. In: Proceedings of the international conference on telecommunications, TELFOR, Belgrade
Cho YS, Kim SB, Powers EJ (1991) Time-varying spectral estimation using AR models with variable forgetting factors. IEEE Trans Sig Process 39(6):1422–1426
Fortescue TR, Kershenbaum LS, Ydstie BE (1981) Implementation of self-tuning regulators with variable forgetting factors. Automatica 17(6):831–835
Bard Y (1974) Nonlinear parameter estimation. Academic Press, New York
Delgado JC, Tribolet JM (1984) Analog full-duplex speech scrambling systems. IEEE J Sel Areas Commun SAC-2(3):456–489
Hampel F (1974) The influence curve and its role in robust estimation. J Amer Stat Assoc 69:383
Chambers J, Tanrikulu O, Constantinides AG (1994) Least mean mixed-norm adaptive filtering. Electron Lett 30:1574–1575
Eleftheriou E, Falconer DD (1986) Tracking properties and steady-state performance of RLS adaptive filter algorithms. IEEE Trans Acoust Speech Sig Process 34:1097–1109
Peters SD, Antoniou A (1995) A parallel adaptation algorithm for recursive least squares adaptive filters in nonstationary environments. IEEE Trans Sig Process 43(11):2484–2494
Willems JL (1970) Stability theory of dynamical systems. Nelson, Walton-on-Thames
Glentis GO, Berberidis K, Theodoridis S (1999) Efficient least squares adaptive filtering for FIR transversal filtering. Sig Process Mag 16(4):13–41
Mareels IMY, Bitmead RR, Gevers M, Johnson CR, Kosut RL, Poubelle MA (1987) How exciting can a signal really be? Syst Control Lett 8:197–204
Goodwin GC, Payne RL (1977) Dynamic system identification: experiment design and data analysis. Academic Press, New York
Goodwin GC (1987) Identification: experiment design. In: Singh M (ed) Encyclopaedia of systems and control. Pergamon Press, Oxford, pp 2257–2264
Tsypkin YZ (1983) Optimality in identification of linear plants. Int J Syst Sci 14(1):59–74
Bard Y (1974) Nonlinear parameter estimation. Academic Press, New York
Barkat M (2005) Signal detection and estimation. Artech House, London
Kovacevic B, Đurović Z, Filipovic V (1996) Robust system identification using optimal input signal design. J Autom Control Univ Belgrade 6(1):19–29
Soderstrom T (1973) An on-line algorithm for approximate maximum likelihood identification of linear dynamic systems. Rep 7308, Lund Institute of Technology
Goodwin GC, Sin KS (1984) Adaptive filtering, prediction and control. Prentice Hall, New Jersey
Poularikas AD, Ramadan ZM (2006) Adaptive filtering primer with MATLAB. CRC Press, Taylor and Francis, Boca Raton
Banjac Z, Kovacevic BD, Milosavljevic MM, Veinovic M (2002) An adaptive FIR filter for echo cancelling using least squares with nonlinear input design. Control Intell Syst 30(1):27–31
Banjac Z, Kovacevic BD, Milosavljevic MM, Veinovic M (2002) Local echo canceller with optimal input for true full-duplex speech scrambling system. IEEE Trans Sig Process 50(5):1877–1882
Becker H, Piper F (1982) Cipher systems: the protection of communications. Northwood Books, London
London

Cox RV, Tribolet JM (1983) Analog voice privacy systems using TFSP scrambling. Bell Sys Tech J 62:47–61
Park S, Hillman G (1989) On acoustic echo cancellation implementation with multiple cascadable adaptive FIR filter chips. Proc ICASSP 2:952–955
Kovacevic B, Milosavljevic M, Veinovic M, Markovic M (2004) Robust digital speech processing. Academic Mind, Belgrade (in Serbian)
Chambers J, Avlonitis A (1997) A robust mixed-norm adaptive filter algorithm. IEEE Sig Process Lett 4(2):46–48
Astrom K (1980) Maximum likelihood and prediction error methods. Automatica 16:551
Barnett V, Lewis T (1978) Outliers in statistical data. Wiley, New York
Williamson GA, Clarkson PM, Sethares WA (1993) Performance characteristics of median LMS adaptive filter. IEEE Trans Sig Process 41:667–680
Banjac Z, Kovacevic BD, Veinovic M, Milosavljevic MM (2001) Robust least mean square adaptive FIR filter algorithm. IEE Proc Vision Image Sig Proc 148(5):332–336
Banjac Z, Milosavljevic M, Kovacevic B, Veinovic M (1998) Robust RLS algorithm for local echo cancellation using optimal input design. In: Proceedings of the 1st international conference on digital signal processing and its applications DSPA-98, vol 1, Moscow, Russia, pp 225–231
Banjac Z, Kovacevic BD (2005) Robust parameter and scale factor estimation in nonstationary impulsive noise environment. In: Proceedings of the IEEE conference EUROCON 2005, Belgrade, Serbia and Montenegro
Banjac Z, Kovacevic BD, Veinovic M, Milosavljevic MM (2004) Robust adaptive filtering with variable forgetting factor. WSEAS Trans Circ Syst 3(2):223–229
Sondhi MM, Berkley DA (1980) Silencing echoes on the telephone network. Proc IEEE 68, Aug 1980
Tao YG, Kolwicz K, Gritton CWK, Duttweiler DL (1984) A cascadable VLSI echo canceller. IEEE J Sel Areas Commun SAC-2(2):297–303
Yasukawa H, Furukawa I, Ishiyama Y (1989) Acoustic echo control for high quality audio teleconferencing. In: Proceedings of ICASSP 89, vol 2, Glasgow, Scotland, pp 2041–2044, 23–26 May 1989
Tanrikulu O, Baykal B, Constantinides AG, Chambers JA (1997) Residual echo signal in critically sampled subband acoustic echo cancellers based on IIR and FIR filter banks. IEEE Trans Sig Process 45:901–912
Kuo M, Pan Z (1994) Development and analyses of distributed acoustic echo cancellation microphone system. Sig Process 37(3)
Verhoeckx NAM, van den Elzen HC, Snijders FAM, van Gerwen PJ (1979) Digital echo cancellation for baseband data transmission. IEEE Trans Acoust Speech Sig Process ASSP-27(6)
Falconer D, Mueller KH (1979) Adaptive echo cancellation/AGC structures for two-wire, full-duplex data transmission. Bell Syst Tech J 58(7)
Yip PCW, Etter DM (1990) An adaptive multiple echo canceller for slowly time-varying echo paths. IEEE Trans Commun 38(10)
Medvecky M (1996) Modified NLMS algorithm for acoustic echo cancellation. In: Proceedings IWISP 96, Manchester, UK, pp 683–686, 4–7 Nov 1996
Chen J, Vandewalle J (1989) Study of nonlinear echo canceller with Volterra expansion. In: Proceedings of ICASSP 89, vol 2, Glasgow, Scotland, pp 1376–1379, 23–26 May 1989
Moon TK, Stirling WC (2000) Mathematical methods and algorithms in signal processing. Prentice-Hall, NJ