
Adaptive Filtering
Lecture 6
Steepest Descent Method
Dr. Tahir Zaidi

Mean Square Error (Revisited)

For a transversal filter of length M (tap weights w_0, \ldots, w_{M-1}), the output is written as

y(n) = \sum_{k=0}^{M-1} w_k^*\, u(n-k) = \mathbf{w}^H \mathbf{u}(n),

and the error with respect to a desired response d(n) is

e(n) = d(n) - y(n) = d(n) - \mathbf{w}^H \mathbf{u}(n).
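A small numeric illustration of these two relations (the values and variable names below are arbitrary, chosen only for the example):

import numpy as np

w = np.array([0.5, -0.2, 0.1])      # tap-weight vector w (M = 3)
u = np.array([1.0, 0.3, -0.7])      # tap-input vector u(n) = [u(n), u(n-1), u(n-2)]
d = 0.4                             # desired response d(n)

y = np.conj(w) @ u                  # filter output y(n) = w^H u(n)
e = d - y                           # error e(n) = d(n) - y(n)
print(y, e)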


Mean Square Error (Revisited)

Following these terms, the mean-square error (MSE) criterion is defined as

J(\mathbf{w}) = E\big[\,|e(n)|^2\,\big],

which is quadratic in w!

Substituting e(n) and manipulating the expression, we get

J(\mathbf{w}) = \sigma_d^2 - \sum_{k=0}^{M-1} w_k^*\, p(-k) - \sum_{k=0}^{M-1} w_k\, p^*(-k) + \sum_{k=0}^{M-1}\sum_{i=0}^{M-1} w_k^* w_i\, r(i-k),

where \sigma_d^2 = E[|d(n)|^2] is the variance of the desired response, p(-k) = E[u(n-k)\,d^*(n)] is the cross-correlation between the input and the desired response, and r(i-k) = E[u(n-k)\,u^*(n-i)] is the autocorrelation of the input.


Mean Square Error (Revisited)

For notational simplicity, express the MSE in vector/matrix form:

J(\mathbf{w}) = \sigma_d^2 - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\,\mathbf{w},

where \mathbf{p} = E[\mathbf{u}(n)\,d^*(n)] is the M \times 1 cross-correlation vector and \mathbf{R} = E[\mathbf{u}(n)\,\mathbf{u}^H(n)] is the M \times M autocorrelation matrix of the tap-input vector.
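Written out from the definitions above, the quadratic form follows directly (this intermediate step is implicit in the slide):

\begin{aligned}
J(\mathbf{w}) &= E\big[(d(n)-\mathbf{w}^H\mathbf{u}(n))\,(d(n)-\mathbf{w}^H\mathbf{u}(n))^*\big] \\
              &= E[|d(n)|^2] - \mathbf{w}^H E[\mathbf{u}(n)\,d^*(n)] - E[d(n)\,\mathbf{u}^H(n)]\,\mathbf{w} + \mathbf{w}^H E[\mathbf{u}(n)\,\mathbf{u}^H(n)]\,\mathbf{w} \\
              &= \sigma_d^2 - \mathbf{w}^H\mathbf{p} - \mathbf{p}^H\mathbf{w} + \mathbf{w}^H\mathbf{R}\,\mathbf{w}.
\end{aligned}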


Mean Square Error (Revisited)

We found that the solution (the optimum filter coefficients \mathbf{w}_o) is given by the Wiener-Hopf equations

\mathbf{R}\,\mathbf{w}_o = \mathbf{p} \quad\Longrightarrow\quad \mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}.

Inversion of R can be very costly.

J(w) is quadratic in w, hence convex in w; the error surface has a single minimum at \mathbf{w}_o, and it is global:

J(\mathbf{w}) \ge J(\mathbf{w}_o) = J_{\min} \quad \text{for all } \mathbf{w}.

Can we reach \mathbf{w}_o, i.e. obtain

\lim_{n\to\infty} \mathbf{w}(n) = \mathbf{w}_o,

with a less demanding algorithm?
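For reference, the direct (non-iterative) solution amounts to one linear solve; a minimal sketch (using a linear solve rather than an explicit inverse, which is standard practice):

import numpy as np

def wiener_solution(R, p):
    # Solve R w_o = p directly; the cost is O(M^3) for an M-tap filter,
    # which motivates the cheaper iterative search developed next.
    return np.linalg.solve(R, p)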


Basic Idea of the Method of Steepest Descent

Can we find \mathbf{w}_o in an iterative manner?


Basic Idea of the Method of Steepest Descent

Starting from an initial guess \mathbf{w}(0), generate a sequence \{\mathbf{w}(n)\} with the property

J(\mathbf{w}(n+1)) < J(\mathbf{w}(n)), \quad n = 0, 1, 2, \ldots

Many such sequences can be found following different rules.

The method of steepest descent generates the points using the gradient:

The gradient of J at the point w, \nabla J(\mathbf{w}), gives the direction in which the function increases most.
Then -\nabla J(\mathbf{w}) gives the direction in which the function decreases most.
Release a tiny ball on the surface of J: it rolls along the negative gradient of the surface.


Basic Idea of the Method of Steepest Descent

For notational simplicity, let \mathbf{g} = \nabla J(\mathbf{w}); then, going in the direction given by the negative gradient,

\mathbf{w}(n+1) = \mathbf{w}(n) + \tfrac{1}{2}\mu\,[-\mathbf{g}(n)] = \mathbf{w}(n) - \tfrac{1}{2}\mu\,\mathbf{g}(n).

How far we go along -\mathbf{g} is defined by the step-size parameter \mu.

The optimum step size can be obtained by a line search, which is difficult; generally a constant step size is taken for simplicity.

Then, at each step, the improvement in J is (from a first-order Taylor series expansion)

J(\mathbf{w}(n+1)) \approx J(\mathbf{w}(n)) - \tfrac{1}{2}\mu\,\|\mathbf{g}(n)\|^2 < J(\mathbf{w}(n)).
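The approximation above, written out for the real-valued case (this intermediate step is not on the slide):

J(\mathbf{w}(n+1)) \approx J(\mathbf{w}(n)) + \mathbf{g}^T(n)\,[\mathbf{w}(n+1)-\mathbf{w}(n)]
                  = J(\mathbf{w}(n)) - \tfrac{1}{2}\mu\,\mathbf{g}^T(n)\,\mathbf{g}(n)
                  = J(\mathbf{w}(n)) - \tfrac{1}{2}\mu\,\|\mathbf{g}(n)\|^2.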


Application of SD to Wiener Filter

For \mathbf{w}(n),

\mathbf{w}(n+1) = \mathbf{w}(n) + \tfrac{1}{2}\mu\,[-\nabla J(n)].

From the theory of the Wiener filter we know that

\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\,\mathbf{w}(n).

Then the update equation becomes

\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\,\mathbf{w}(n)], \quad n = 0, 1, 2, \ldots,

which defines a feedback connection.
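A minimal sketch of this recursion in code (the function name, iteration count, and zero initialisation are illustrative choices, not part of the lecture):

import numpy as np

def steepest_descent(R, p, mu, n_iter=500, w0=None):
    # Iterate w(n+1) = w(n) + mu * (p - R w(n)) and record the trajectory.
    M = p.shape[0]
    w = np.zeros(M) if w0 is None else np.array(w0, dtype=float)
    history = [w.copy()]
    for _ in range(n_iter):
        w = w + mu * (p - R @ w)
        history.append(w.copy())
    return w, np.array(history)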


Convergence Analysis

Feedback may cause stability problems under certain conditions.

This depends on
  the step size \mu,
  the autocorrelation matrix \mathbf{R}.

Does SD converge? Under which conditions? What is the rate of convergence?

We may use the canonical representation. Let the weight-error vector be

\mathbf{c}(n) = \mathbf{w}(n) - \mathbf{w}_o;

then the update equation becomes

\mathbf{c}(n+1) = (\mathbf{I} - \mu\,\mathbf{R})\,\mathbf{c}(n).
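The step from the weight update to this form uses \mathbf{R}\,\mathbf{w}_o = \mathbf{p} (written out here for completeness):

\mathbf{c}(n+1) = \mathbf{w}(n+1) - \mathbf{w}_o
              = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\,\mathbf{w}(n)] - \mathbf{w}_o
              = \mathbf{c}(n) - \mu\,\mathbf{R}\,[\mathbf{w}(n) - \mathbf{w}_o]
              = (\mathbf{I} - \mu\,\mathbf{R})\,\mathbf{c}(n).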


Convergence Analysis

Let

\mathbf{R} = \mathbf{Q}\,\boldsymbol{\Lambda}\,\mathbf{Q}^H

be the eigendecomposition of R. Then

\mathbf{c}(n+1) = (\mathbf{I} - \mu\,\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H)\,\mathbf{c}(n).

Using \mathbf{Q}\mathbf{Q}^H = \mathbf{I},

\mathbf{Q}^H\mathbf{c}(n+1) = (\mathbf{I} - \mu\,\boldsymbol{\Lambda})\,\mathbf{Q}^H\mathbf{c}(n).

Apply the change of coordinates

\mathbf{v}(n) = \mathbf{Q}^H\mathbf{c}(n).

Then, the update equation becomes

\mathbf{v}(n+1) = (\mathbf{I} - \mu\,\boldsymbol{\Lambda})\,\mathbf{v}(n).

Convergence Analysis

We know that \boldsymbol{\Lambda} is diagonal, so the k-th natural mode obeys

v_k(n+1) = (1 - \mu\lambda_k)\,v_k(n), \quad k = 1, \ldots, M,

or, with the initial value v_k(0),

v_k(n) = (1 - \mu\lambda_k)^n\,v_k(0).

Note the geometric series.
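A quick numerical check that the coordinate change really decouples the recursion into these modes (the matrix R, step size, and initial vector below are arbitrary illustrative values):

import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])                     # example autocorrelation matrix
mu = 0.1
lam, Q = np.linalg.eigh(R)                     # R = Q diag(lam) Q^H

c0 = np.array([1.0, -2.0])                     # some weight-error vector c(0)
v0 = Q.conj().T @ c0                           # v(0) = Q^H c(0)

n = 25
c_n = np.linalg.matrix_power(np.eye(2) - mu * R, n) @ c0   # c(n) from the matrix recursion
v_n = (1 - mu * lam) ** n * v0                             # v_k(n) = (1 - mu*lam_k)^n v_k(0)
print(np.allclose(Q.conj().T @ c_n, v_n))                  # True: the modes evolve independently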


Convergence Analysis

Obviously, for stability (convergence of every mode) we need

|1 - \mu\lambda_k| < 1 \quad \text{for all } k,

or

-1 < 1 - \mu\lambda_k < 1,

or, simply,

0 < \mu < \frac{2}{\lambda_{\max}}.

Why? Because the mode associated with \lambda_{\max} places the tightest bound on \mu; if it converges, all other modes do as well.

The geometric series results in an exponentially decaying curve with time constant \tau_k, where, letting

(1 - \mu\lambda_k) = e^{-1/\tau_k}, \qquad \tau_k = \frac{-1}{\ln(1 - \mu\lambda_k)}.
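A quick numerical check of this time-constant relation (the values of mu and lambda_k below are arbitrary illustrative choices):

import numpy as np

mu, lam_k = 0.1, 1.5
tau_k = -1.0 / np.log(1 - mu * lam_k)          # time constant of the k-th mode
n = np.arange(50)
print(np.allclose((1 - mu * lam_k) ** n, np.exp(-n / tau_k)))   # True: geometric decay = exp(-n/tau_k)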


Convergence Analysis

We have

\mathbf{v}(n) = \mathbf{Q}^H\mathbf{c}(n) = \mathbf{Q}^H[\mathbf{w}(n) - \mathbf{w}_o],

then

\mathbf{w}(n) = \mathbf{w}_o + \mathbf{Q}\,\mathbf{v}(n),

but v_k(n) = (1 - \mu\lambda_k)^n\,v_k(0). We know that Q is composed of the eigenvectors of R, \mathbf{Q} = [\mathbf{q}_1\ \mathbf{q}_2\ \cdots\ \mathbf{q}_M], then

\mathbf{w}(n) = \mathbf{w}_o + \sum_{k=1}^{M} \mathbf{q}_k\, v_k(0)\,(1 - \mu\lambda_k)^n,

or, for the i-th tap weight (with (\mathbf{q}_k)_i the i-th element of \mathbf{q}_k),

w_i(n) = w_{oi} + \sum_{k=1}^{M} (\mathbf{q}_k)_i\, v_k(0)\,(1 - \mu\lambda_k)^n.

Each filter coefficient decays exponentially towards its optimum value.

The overall rate of convergence is limited by the slowest and the fastest modes.


Convergence Analysis

For a small step size (\mu\lambda_k \ll 1), \ln(1 - \mu\lambda_k) \approx -\mu\lambda_k, so

\tau_k \approx \frac{1}{\mu\lambda_k}.

What is v(0)? The initial value is

\mathbf{v}(0) = \mathbf{Q}^H[\mathbf{w}(0) - \mathbf{w}_o].

For simplicity, assume that \mathbf{w}(0) = \mathbf{0}; then

\mathbf{v}(0) = -\mathbf{Q}^H\mathbf{w}_o.


Convergence Analysis

Transient behaviour:
From the canonical form we know that

J(n) = J_{\min} + \sum_{k=1}^{M} \lambda_k\,|v_k(n)|^2,

then

J(n) = J_{\min} + \sum_{k=1}^{M} \lambda_k\,(1 - \mu\lambda_k)^{2n}\,|v_k(0)|^2.

As long as the upper limit on the step-size parameter (0 < \mu < 2/\lambda_{\max}) is satisfied,

\lim_{n\to\infty} J(n) = J_{\min},

regardless of the initial point \mathbf{w}(0).


Convergence Analysis

The progress of J(n) for n = 0, 1, 2, \ldots is called the learning curve.

The learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the problem.

Number of natural modes = number of filter taps (M).
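A sketch that evaluates this learning curve for a small made-up problem (the matrix R, vector p, and sigma_d^2 below are illustrative, not the lecture's example):

import numpy as np

R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, 0.2])
sigma_d2 = 1.0
mu = 0.3

w_o = np.linalg.solve(R, p)                    # Wiener solution
J_min = sigma_d2 - p @ w_o                     # minimum MSE (real-valued case)
lam, Q = np.linalg.eigh(R)
v0 = -Q.T @ w_o                                # v(0) for w(0) = 0

n = np.arange(100)[:, None]
J = J_min + ((lam * np.abs(v0) ** 2) * (1 - mu * lam) ** (2 * n)).sum(axis=1)
# J is the learning curve: a sum of M exponentials, one per natural mode.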


Example

A predictor with 2 taps (w_1(n) and w_2(n)) is used to find the parameters of the AR process

u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n).

Examine the transient behaviour for
  fixed step size, varying eigenvalue spread;
  fixed eigenvalue spread, varying step size.
The variance \sigma_v^2 of the driving noise v(n) is adjusted so that \sigma_u^2 = 1.
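A sketch of such an experiment in the ideal, known-statistics setting (the AR coefficients and step size below are illustrative choices, not the values used in the lecture):

import numpy as np

a1, a2 = -0.5, 0.25                       # illustrative AR(2) coefficients (stable)

# Two-tap predictor of u(n) from [u(n-1), u(n-2)]: the desired response is d(n) = u(n).
# With sigma_u^2 normalised to 1, the Yule-Walker relations give r(1) and r(2):
r0 = 1.0
r1 = -a1 / (1.0 + a2) * r0
r2 = -a1 * r1 - a2 * r0
R = np.array([[r0, r1],
              [r1, r0]])                  # autocorrelation matrix of the tap inputs
p = np.array([r1, r2])                    # cross-correlation with d(n) = u(n)

w_o = np.linalg.solve(R, p)               # optimum predictor, equals (-a1, -a2)
mu = 0.3
w = np.zeros(2)
for n in range(200):
    w = w + mu * (p - R @ w)              # steepest-descent update
print(w, w_o)                             # w converges towards w_o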



Example

The AR process: u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n), with

\mathbf{R} = \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix}.

Two eigenmodes:

\lambda_1 = r(0) + r(1), \qquad \lambda_2 = r(0) - r(1).

Condition number (eigenvalue spread):

\chi(\mathbf{R}) = \frac{\lambda_{\max}}{\lambda_{\min}}.


Example (Experiment 1)

Experiment 1: keep the step size \mu fixed and change the eigenvalue spread \chi(\mathbf{R}).


Example (Experiment 2)

Keep the eigenvalue spread \chi(\mathbf{R}) fixed and change the step size \mu (maximum value 1.1).


Example (Experiment 2)

Depending on the value of \mu, the learning curve can be

  overdamped: moves smoothly to the minimum (for (very) small \mu),
  underdamped: oscillates towards the minimum (for large \mu < \mu_{\max}),
  critically damped.

Generally, the rate of convergence is slow for the first two cases.


Observations

SD is a deterministic algorithm, i.e. we assume that

  R and p are known exactly.
  In practice they can only be estimated, e.g. by sample averages (a sketch of such an estimator follows this list).

It can have high computational complexity.

SD is a local search algorithm, but for Wiener filtering

  the cost surface is convex (quadratic), so
  convergence is guaranteed as long as 0 < \mu < 2/\lambda_{\max} is satisfied.
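A sketch of the sample-average estimates mentioned above (the estimator form is the standard time average over the N available samples; the function and variable names are mine):

import numpy as np

def estimate_R_p(u, d, M):
    # Time-average estimates of the autocorrelation matrix R and
    # the cross-correlation vector p from data u(0..N-1), d(0..N-1).
    N = len(u)
    R = np.zeros((M, M))
    p = np.zeros(M)
    count = 0
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]   # tap-input vector [u(n), ..., u(n-M+1)]
        R += np.outer(u_vec, u_vec)
        p += u_vec * d[n]
        count += 1
    return R / count, p / count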


Observations

The origin of SD lies in the (first-order) Taylor series expansion, as with many other local search optimization algorithms.

Convergence can be very slow.

To speed up the process, the second-order term can also be included, as in Newton's method:

\mathbf{w}(n+1) = \mathbf{w}(n) - \mathbf{H}^{-1}(n)\,\mathbf{g}(n),

where \mathbf{H}(n) = \nabla^2 J(\mathbf{w}(n)) is the Hessian.

This brings high computational complexity (a matrix inversion per step) and possible numerical stability problems.
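For the quadratic MSE cost, the Newton step can be written out explicitly (a direct consequence of the formulas above, not stated on the slide): with \mathbf{g}(n) = 2[\mathbf{R}\,\mathbf{w}(n) - \mathbf{p}] and \mathbf{H} = 2\mathbf{R},

\mathbf{w}(n+1) = \mathbf{w}(n) - (2\mathbf{R})^{-1}\,2[\mathbf{R}\,\mathbf{w}(n) - \mathbf{p}] = \mathbf{R}^{-1}\mathbf{p} = \mathbf{w}_o,

i.e. Newton's method reaches the Wiener solution in a single step, at the price of inverting \mathbf{R}.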
