
EEL 6537 Spectral Estimation

Jian Li
Department of Electrical and Computer Engineering
University of Florida
Gainesville, FL 32611, USA

Spectral Estimation is an Art
(Petre Stoica)

I hear, I forget;
I see, I remember;
I do, I understand.
(A Chinese philosopher)

What is Spectral Estimation?


From a finite record of a stationary data sequence, estimate how
the total power is distributed over frequencies, or, more practically,
over narrow spectral bands (frequency bins).

Spectral Estimation Methods:


Classical (Nonparametric) Methods
Ex. Pass the data through a set of band-pass filters and measure
the filter output powers.
Parametric (Modern) Approaches
Ex. Model the data as a sum of a few damped sinusoids and
estimate their parameters.
Trade-Offs (Robustness vs. Accuracy):
Parametric methods may offer better estimates if the data closely
agree with the assumed model.
Otherwise, nonparametric methods may be better.

Some Applications of Spectral Estimation


Speech
- Formant estimation (for speech recognition)
- Speech coding or compression
Radar and Sonar
- Source localization with sensor arrays
- Synthetic aperture radar imaging and feature extraction
Electromagnetics
- Resonant frequencies of a cavity
Communications
- Code-timing estimation in DS-CDMA systems

REVIEW OF DSP FUNDAMENTALS


Continuous-Time Signals
Periodic signals:
    x(t) = x(t + T_p).
Fourier Series:
    x(t) = \sum_{k=-\infty}^{\infty} c_k e^{j 2\pi k F_0 t},
    c_k = \frac{1}{T_p} \int_{T_p} x(t) e^{-j 2\pi k F_0 t} dt,
    F_0 = \frac{1}{T_p}.

Ex. FT pair:  e^{j\omega_0 t} \longleftrightarrow 2\pi \delta(\omega - \omega_0).
[Figure: line spectrum X(\omega) with impulses of weight 2\pi c_0, 2\pi c_1, ...
spaced 2\pi F_0 apart.]

Ex. Impulse train  s(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT):
    c_k = \frac{1}{T}  for all k,
    S(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(\omega - \frac{2\pi k}{T}\right).
[Figure: impulse train s(t) at spacing T and its line spectrum S(\omega) at
spacing 2\pi/T.]

Remark:  Periodic Signals <--> Discrete Spectra.

Discrete Signals
Sampling: multiplying x(t) by the impulse train s(t) gives a discrete
signal whose spectrum repeats periodically with period 2\pi/T.
[Figure: x(t), the sampled signal x(t)s(t), and the periodic spectrum.]

Remark:  Discrete Signals <--> Periodic Spectra.
         Discrete Periodic Signals <--> Periodic Discrete Spectra.

Aliasing Problem:
Ex. [Figure: spectral replicas overlap when the sampling rate is too low.]

* Fourier Transform (Continuous-Time vs. Discrete-Time)


Let y(t) = x(t) s(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT).

CTFT:
    Y(\omega) = \int_{-\infty}^{\infty} y(t) e^{-j\omega t} dt
              = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT)\, e^{-j\omega t} dt
              = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\omega n T}.

DTFT:
    Y(\omega) = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\omega n T}.

[Figure: discrete-time signal x(nT) and its DTFT, periodic with period 2\pi/T.]

Remarks: The Discrete-Time Fourier Transform (DTFT) is the same as the
Continuous-Time Fourier Transform (CTFT) with x(nT)\,\delta(t - nT) replaced
by x(nT) and \int replaced by \sum (easy for computers).

For simplicity, we drop T.

[Figure: x(n) and its DTFT X(\omega), periodic with period 2\pi (or 1 in cycles).]

DTFT Pair:
    X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n},
    x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{j\omega n} d\omega.

Remark: For the DTFT, we also have:
    Discrete Periodic Signals <--DTFT--> Periodic Discrete Spectra.
[Figure: periodic x(n) and its discrete spectrum; note the aliasing.]

When x(n + N) = x(n),
DFT Pair:
    X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N},
    x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}.

Ex. Note the aliasing.
[Figure: periodic x(n) and its DFT X(k), shown for one period, indices -10..10.]

Remarks: For periodic sequences, the DFT and DTFT yield similar spectra.
The IDFT (inverse DFT) is the same as the IDTFT (inverse DTFT) with
X(2\pi k / N) replaced by X(k) and \int replaced by \sum (easy for computers).

Effects of Zero-Padding:
[Figure: x(n) and its DTFT X(\omega); the same x(n) zero-padded to 5 points
and to 10 points, with the corresponding DFTs X(k).]

Remark: The more zeroes padded, the closer X(k) is to X(\omega).
X(k) is a sampled version of X(\omega) for finite-duration sequences.
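As a quick numerical check of this remark, here is a minimal NumPy sketch
(the 3-point sequence and the padding lengths are illustrative, not from the
slides): each zero-padded DFT bin equals the DTFT sampled on a denser grid.

```python
import numpy as np

# A short finite-duration sequence (illustrative choice).
x = np.array([1.0, 1.0, 1.0])

# DTFT evaluated on a dense grid (near-continuous reference).
w = np.linspace(-np.pi, np.pi, 2049)
X_dtft = np.exp(-1j * np.outer(w, np.arange(len(x)))) @ x

# DFTs of the same sequence zero-padded to 5 and 10 points.
X5 = np.fft.fft(x, n=5)    # 5 samples of X(omega) at 2*pi*k/5
X10 = np.fft.fft(x, n=10)  # 10 samples of X(omega) at 2*pi*k/10

# Each DFT bin equals the DTFT at its grid frequency: more padding gives a
# denser sampling of the SAME X(omega); the resolution does not change.
k = np.arange(10)
assert np.allclose(
    X10,
    [np.sum(x * np.exp(-2j * np.pi * kk * np.arange(len(x)) / 10)) for kk in k])
```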

Z-Transform

    X(z) = \sum_{n=-\infty}^{\infty} x(n) z^{-n},
    x(n) = \frac{1}{2\pi j} \oint X(z) z^{n-1} dz.

For finite-duration x(n),
    X(z) = \sum_{n=0}^{N-1} x(n) z^{-n}.

The DFT X(k) is related to X(z) as follows:
    X(k) = X(z)\big|_{z = e^{j 2\pi k / N}}.
[Figure: z-plane; X(k) is X(z) evenly sampled on the unit circle.]

Linear Time-Invariant (LTI) Systems


N-th order difference equation:
    \sum_{k=0}^{N} a_k y(n-k) = \sum_{k=0}^{M} b_k x(n-k).

Impulse Response:
    h(n) = y(n)\big|_{x(n)=\delta(n)},
    H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}.

Bounded-Input Bounded-Output (BIBO) Stability:
All poles of H(z) are inside the unit circle for a causal system
(where h(n) = 0, n < 0).

FIR Filter:  N = 0.
IIR Filter:  N > 0.

Minimum Phase: All poles and zeroes of H(z) are inside the unit circle.

ENERGY AND POWER SPECTRAL DENSITIES


Energy Spectral Density of Deterministic Signals
Finite-energy signal if
    0 < \sum_{n=-\infty}^{\infty} |x(n)|^2 < \infty.

Let X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}.

Parseval's Energy Theorem:
    \sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega) d\omega,
    S(\omega) = |X(\omega)|^2.

Remark: |X(\omega)|^2 measures the length of the orthogonal projection of
{x(n)} onto the basis sequence {e^{-j\omega n}}, \omega \in [-\pi, \pi].

Let \rho(k) = \sum_{n=-\infty}^{\infty} x(n) x^*(n-k). Then
    \sum_{k=-\infty}^{\infty} \rho(k) e^{-j\omega k}
      = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(n) x^*(n-k) e^{-j\omega n} e^{j\omega(n-k)}
      = \left[ \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n} \right]
        \left[ \sum_{s=-\infty}^{\infty} x(s) e^{-j\omega s} \right]^*
      = |X(\omega)|^2 = S(\omega).

Remark: S(\omega) is the DTFT of the autocorrelation of the finite-energy
sequence {x(n)}.

Power Spectral Density (PSD) of Random Signals


Let {x(n)} be a wide-sense stationary (WSS) sequence with E[x(n)] = 0.
    r(k) = E[x(n) x^*(n-k)].
Properties of the autocorrelation function r(k):
    r(k) = r^*(-k);
    r(0) \ge |r(k)| for all k;
    0 \le r(0) = average power of x(n).

Def: A is positive semidefinite if z^H A z \ge 0 for any z
(z^H = (z^T)^*: Hermitian transpose).

Let
    A = [ r(0)    r(k)
          r^*(k)  r(0) ]
      = E\left\{ [x(n); x(n-k)] [x^*(n)  x^*(n-k)] \right\}.
Obviously, A is positive semidefinite.
Then all eigenvalues of A are \ge 0, so the determinant of A is \ge 0:
    r^2(0) - |r(k)|^2 \ge 0.

Covariance matrix:

    R = [ r(0)      r(1)      ...  r(m-1)
          r^*(1)    r(0)      ...  r(m-2)
          ...       ...       ...  ...
          r^*(m-1)  r^*(m-2)  ...  r(0)   ].

It is easy to show that R is positive semidefinite.
R is also Toeplitz.
Since R = R^H, R is Hermitian.

Eigendecomposition of R:
    R = U \Lambda U^H,   where  U^H U = U U^H = I
(U is a unitary matrix whose columns are the eigenvectors of R),
    \Lambda = diag(\lambda_1, ..., \lambda_m)
(the \lambda_i are the eigenvalues of R: real and \ge 0).

First Definition of PSD:

    P(\omega) = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k},
    r(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) e^{j\omega k} d\omega.

Or, in cycles,
    P(f) = \sum_{k=-\infty}^{\infty} r(k) e^{-j 2\pi f k},
    r(k) = \int_{-1/2}^{1/2} P(f) e^{j 2\pi f k} df.

Remark: Since r(k) is discrete, P(\omega) and P(f) are periodic, with
period 2\pi (in \omega) and 1 (in f), respectively.
We usually consider \omega \in [-\pi, \pi] or f \in [-1/2, 1/2].

    r(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) d\omega = average power over all frequencies.

[Figure: PSD; the area under P(\omega) between \omega_1 and \omega_2 is the
average power in that band.]

Second Definition of PSD:

    P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}.

This definition is equivalent to the first one under
    \lim_{N\to\infty} \frac{1}{N} \sum_{k=-N+1}^{N-1} |k|\, |r(k)| = 0
(which means that {r(k)} decays sufficiently fast).

Properties of PSD:
    P(\omega) \ge 0 for all \omega.
    For real x(n), r(k) = r(-k), so P(\omega) = P(-\omega), \omega \in [-\pi, \pi].
    For complex x(n), r(k) = r^*(-k).

PSD for LTI Systems
[Diagram: x(n) -> H(\omega) -> y(n).]
    P_y(\omega) = P_x(\omega) |H(\omega)|^2.

Complex (De)Modulation:
    y(n) = x(n) e^{j\omega_0 n}.
It is easy to show that
    r_y(k) = r_x(k) e^{j\omega_0 k},
    P_y(\omega) = P_x(\omega - \omega_0).

Spectral Estimation Problem


From a finite-length record {x(0), ..., x(N-1)}, determine an estimate
\hat{P}(\omega) of the PSD P(\omega) for \omega \in [-\pi, \pi].

Nonparametric Methods:
Periodogram:
Recall the second definition of PSD:
    P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}.

Periodogram:
    \hat{P}_p(\omega) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2.

Remark: \hat{P}_p(\omega) \ge 0 for all \omega.
If x(n) is real, \hat{P}_p(\omega) is even.
E[\hat{P}_p(\omega)] = ?  Var[\hat{P}_p(\omega)] = ?  (to be discussed later on)
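The periodogram is cheap to evaluate on the FFT grid. A minimal NumPy sketch
(the function name and the white-noise test signal are illustrative, not from
the slides):

```python
import numpy as np

def periodogram(x, nfft=None):
    """P_p(omega_k) = |sum_n x(n) e^{-j omega_k n}|^2 / N on the FFT grid
    omega_k = 2*pi*k/nfft; zero padding refines the grid only."""
    x = np.asarray(x)
    N = len(x)
    nfft = nfft or N
    X = np.fft.fft(x, n=nfft)      # zero-pads if nfft > N
    return (np.abs(X) ** 2) / N    # normalize by the data length N

# Example: white Gaussian noise; E[P_p] should be flat at sigma^2 = 1.
rng = np.random.default_rng(0)
Pp = periodogram(rng.standard_normal(256), nfft=1024)
```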

Correlogram (see the first PSD definition):

    Correlogram = \hat{P}_c(\omega) = \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k}.

Unbiased Estimate of r(k):
    k \ge 0:  \hat{r}(k) = \frac{1}{N-k} \sum_{i=k}^{N-1} x(i) x^*(i-k),
    k < 0:    \hat{r}(k) = \hat{r}^*(-k).

Ex. x(i) = 1 for i = 0, 1, 2  (N = 3):

    \hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1   (average of 3 points),
    \hat{r}(1) = \hat{r}(-1) = \frac{1}{2} \sum_{i=1}^{2} (1)(1) = 1   (average of 2 points),
    \hat{r}(2) = \hat{r}(-2) = \frac{1}{1} \sum_{i=2}^{2} (1)(1) = 1   (average of 1 point),
    \hat{r}(3) = \hat{r}(-3) = 0.

[Figure: flat \hat{r}(k) for k = -2, ..., 2 and the resulting \hat{P}_c(\omega).]

Remark:
\hat{r}(k) is a bad estimate of r(k) for large k.
E[\hat{r}(k)] = r(k)   (unbiased).
Proof:
    E[\hat{r}(k)] = E\left[ \frac{1}{N-k} \sum_{i=k}^{N-1} x(i) x^*(i-k) \right]
                  = \frac{1}{N-k} \sum_{i=k}^{N-1} r(k) = r(k).

\hat{P}_c(\omega) based on the unbiased \hat{r}(k) may be < 0.

Biased Estimate of r(k) (used more often!):

    k \ge 0:  \hat{r}(k) = \frac{1}{N} \sum_{i=k}^{N-1} x(i) x^*(i-k),
    k < 0:    \hat{r}(k) = \hat{r}^*(-k).

Remark:
    E[\hat{r}(k)] = \frac{1}{N} \sum_{i=k}^{N-1} E[x(i) x^*(i-k)]
                  = \frac{N-k}{N} r(k)
                  \to r(k)  as  N \to \infty   (asymptotically unbiased).
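A minimal NumPy sketch computing both estimates (function name mine); it
reproduces the two worked examples on these slides:

```python
import numpy as np

def acf_estimates(x, maxlag):
    """Biased and unbiased autocorrelation estimates for k = 0..maxlag."""
    x = np.asarray(x)
    N = len(x)
    r_b = np.empty(maxlag + 1)   # biased: divide by N
    r_u = np.empty(maxlag + 1)   # unbiased: divide by N - k
    for k in range(maxlag + 1):
        s = np.dot(x[k:], np.conj(x[:N - k])).real
        r_b[k] = s / N
        r_u[k] = s / (N - k)
    return r_b, r_u

# The slides' example: x = (1, 1, 1), N = 3.
r_b, r_u = acf_estimates(np.ones(3), 2)
print(r_b)  # approx [1, 2/3, 1/3]  (biased, triangular taper)
print(r_u)  # [1, 1, 1]             (unbiased)
```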

Ex. Same data: x(i) = 1 for i = 0, 1, 2  (N = 3):

    \hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1,
    \hat{r}(1) = \hat{r}(-1) = \frac{1}{3} \sum_{i=1}^{2} (1)(1) = \frac{2}{3},
    \hat{r}(2) = \hat{r}(-2) = \frac{1}{3} \sum_{i=2}^{2} (1)(1) = \frac{1}{3}.

[Figure: triangular \hat{r}(k) for k = -2, ..., 2 and its DTFT \hat{P}_c(\omega).]

Remark:
With the biased \hat{r}(k),  \hat{P}_c(\omega) = \hat{P}_p(\omega) \ge 0 for all \omega.
E[\hat{r}(k)] \ne r(k), but E[\hat{r}(k)] \to r(k) as N \to \infty
(asymptotically unbiased).

    \hat{R} = [ \hat{r}(0)      \hat{r}(1)      ...  \hat{r}(N-1)
                \hat{r}^*(1)    \hat{r}(0)      ...  \hat{r}(N-2)
                ...             ...             ...  ...
                \hat{r}^*(N-1)  \hat{r}^*(N-2)  ...  \hat{r}(0)   ],

with \hat{r}(k) the biased estimate. Then \hat{R} is positive semidefinite.

General Comments on \hat{P}_p(\omega) and \hat{P}_c(\omega)


\hat{P}_p(\omega) and \hat{P}_c(\omega) provide POOR estimates of P(\omega):
the variances of \hat{P}_p(\omega) and \hat{P}_c(\omega) are high.
Reason: \hat{P}_p(\omega) and \hat{P}_c(\omega) come from a single
realization of a random process.

Compute \hat{P}_p(\omega) via FFT.
Recall the DFT (N^2 complex multiplications):
    X(k) = \sum_{i=0}^{N-1} x(i) e^{-j \frac{2\pi}{N} k i},
    \hat{P}_p(\omega_k) = \frac{1}{N} |X(k)|^2.

Let
    W = e^{-j \frac{2\pi}{N}},   N = 2^m.
Then
    X(k) = \sum_{n=0}^{N-1} x(n) W^{kn}
         = \sum_{n=0}^{N/2-1} x(n) W^{kn} + \sum_{n=N/2}^{N-1} x(n) W^{kn}
         = \sum_{n=0}^{N/2-1} \left[ x(n) + W^{Nk/2}\, x\!\left(n + \frac{N}{2}\right) \right] W^{kn}.

Note:
    W^{Nk/2} = e^{-j \frac{2\pi}{N} \frac{Nk}{2}} = e^{-j\pi k}
             = \begin{cases} 1, & \text{even } k \\ -1, & \text{odd } k. \end{cases}

Hence
    X(2p)   = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W^{2pn},   k = 2p = 0, 2, ...,
    X(2p+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x(n + N/2) \right] W^{(2p+1)n},   k = 2p+1,

which requires 2 (N/2)^2 complex multiplications.

This process is continued until 2-point transforms remain.

Remark: An N = 2^m-point FFT requires O(N \log_2 N) complex
multiplications.
Zero padding may be used so that N = 2^m.
Zero padding will not change the resolution of \hat{P}_p(\omega).

FUNDAMENTALS OF ESTIMATION THEORY


Properties of a Good Estimator \hat{a} for a constant scalar a:
Small Bias:
    Bias = E[\hat{a}] - a.
Small Variance:
    Variance = E\{ (\hat{a} - E[\hat{a}])^2 \}.
Consistent:
    \hat{a} \to a as the number of measurements \to \infty.

Ex. Measurement
    y = a + e,
where a is an unknown constant and e is N(0, \sigma^2).
Find \hat{a} from y ?
[Figure: pdf f(y|a), centered at a.]

Maximum Likelihood (ML) Estimate of a:


Say y = 5; we want to find \hat{a} so that it is most likely that the
measurement is 5:
    \frac{\partial f(y|a)}{\partial a}\Big|_{a=\hat{a}_{ML}} = 0
    \Rightarrow \hat{a}_{ML} = y   (here \hat{a}_{ML} = 5),

    E[\hat{a}_{ML}] = E[y] = E[a + e] = a,
    Var[\hat{a}_{ML}] = Var[y] = \sigma^2.

Ex. y = a + e.
Three independent measurements y_1, y_2, y_3 are taken.
\hat{a}_{ML} = ?  Bias = ?  Variance = ?

    f(y_i|a) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i-a)^2}{2\sigma^2}},
    f(y_1, y_2, y_3|a) = \prod_{i=1}^{3} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i-a)^2}{2\sigma^2}}.

Setting \frac{\partial f(y_1,y_2,y_3|a)}{\partial a}\Big|_{a=\hat{a}_{ML}} = 0 gives

    \hat{a}_{ML} = \frac{1}{3}(y_1 + y_2 + y_3),
    E[\hat{a}_{ML}] = E\left[\frac{1}{3}(y_1+y_2+y_3)\right] = a,
    Var[\hat{a}_{ML}] = \frac{1}{9} Var(y_1+y_2+y_3)
                      = \frac{1}{9}(\sigma^2+\sigma^2+\sigma^2) = \frac{\sigma^2}{3}.

Ex. x is a measurement of a uniformly distributed random variable on
[0, \theta], where \theta is an unknown constant. \hat{\theta}_{ML} = ?

    f(x|\theta) = 1/\theta for 0 \le x \le \theta, so the likelihood is
    maximized by the smallest admissible \theta:
    \hat{\theta}_{ML} = x.

Question: What if two independent measurements x_1 and x_2 are taken?

    \hat{\theta}_{ML} = \max(x_1, x_2).

Cramér-Rao Bound.
Let B(a) = E[\hat{a}(r)|a] - a denote the bias of \hat{a}(r), where r is the
measurement. Then

    MSE = E\left[ (\hat{a}(r) - a)^2 \,\big|\, a \right]
        \ge \frac{\left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2}
                 {E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \big| a \right\}}.

* The denominator of the CRB is known as Fisher's Information, I(a).
* If B(a) = 0, the numerator of the CRB is 1.

Proof:
    B(a) = E[\hat{a}(r) - a \,|\, a] = \int [\hat{a}(r) - a]\, f(r|a)\, dr.

Differentiating with respect to a:
    \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a]\, \frac{\partial f(r|a)}{\partial a}\, dr
        - \underbrace{\int f(r|a)\, dr}_{=1},

    1 + \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a]\, \frac{\partial f(r|a)}{\partial a}\, \frac{1}{f(r|a)}\, f(r|a)\, dr.

But \frac{\partial}{\partial a} \ln f(r|a) = \frac{\partial f(r|a)/\partial a}{f(r|a)}, so

    1 + \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a] \left[ \frac{\partial}{\partial a} \ln f(r|a) \right] f(r|a)\, dr
      = \int \left\{ [\hat{a}(r) - a] \sqrt{f(r|a)} \right\}
             \left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right] \sqrt{f(r|a)} \right\} dr.

Schwarz Inequality:
    \left| \int g_1(x) g_2(x)\, dx \right|
      \le \left[ \int g_1^2(x)\, dx \right]^{1/2} \left[ \int g_2^2(x)\, dx \right]^{1/2},
where "=" holds iff g_1(x) = c\, g_2(x) for some constant c (c is
independent of x).

Applying it to the previous expression:
    \left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2
      \le \left\{ \int [\hat{a}(r) - a]^2 f(r|a)\, dr \right\}
          \underbrace{\left\{ \int \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 f(r|a)\, dr \right\}}_{I(a)},
where "=" holds iff
    \hat{a}(r) - a = c\, \frac{\partial}{\partial a} \ln f(r|a)
(where c is a constant independent of r).

Efficient Estimate:
An estimate is efficient if
(a) it is unbiased, and
(b) it achieves the CR bound, i.e., E\{ [\hat{a}(r) - a]^2 | a \} = CRB.

Ex. r = a + e, where a is an unknown constant, e ~ N(0, \sigma^2).
\hat{a}_{ML} = ?  Efficient ?

    f(r|a) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(r-a)^2},
    \ln f(r|a) = \ln \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2}(r-a)^2,
    \frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{2\sigma^2}\, 2(r-a) = -\frac{1}{\sigma^2}(a-r).

Setting \frac{\partial}{\partial a} \ln f(r|a)\big|_{a=\hat{a}_{ML}} = 0 gives
    \hat{a}_{ML} = r.

    \frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{\sigma^2} (\hat{a}_{ML} - a),
which has the form c\,[\hat{a}(r) - a], so
    E[(\hat{a}_{ML} - a)^2 | a] = CRB  \Rightarrow  \hat{a}_{ML} is efficient;
    E[\hat{a}_{ML}] = E[r] = a   (unbiased).

Remark: MSE = Var[\hat{a}_{ML}] = Var[r] = \sigma^2.
    I(a) = E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \big| a \right\}
         = E\left\{ \frac{1}{\sigma^4} (a-r)^2 \right\} = \frac{1}{\sigma^2},
    CRB = \frac{1}{I(a)} = \sigma^2 = Var[\hat{a}_{ML}].

Remarks:
(1) If \hat{a}(r) is unbiased, Var[\hat{a}(r)] \ge CRB.
(2) If an efficient estimate \hat{a}(r) exists, i.e.,
    \frac{\partial}{\partial a} \ln f(r|a) = c\,[\hat{a}(r) - a]   (c independent of r),
then setting 0 = \frac{\partial}{\partial a} \ln f(r|a)\big|_{a=\hat{a}_{ML}(r)}
results in \hat{a}_{ML}(r) = \hat{a}(r).
If an efficient estimate exists, it is \hat{a}_{ML}.
(3) If an efficient estimate does not exist, how good \hat{a}_{ML}(r) is
depends on each specific problem.
No estimator can achieve the CR bound in that case; bounds larger than the
CR bound (for example, Bhattacharyya, Barankin) may be found.

Independent measurements r_1, ..., r_N are available, where the r_i may or
may not be Gaussian. Assume
    \hat{a}_{ML} = \frac{1}{N} \sum_{i=1}^{N} r_i.
Law of large numbers: \hat{a}_{ML} \to a.
Central Limit Theorem: \hat{a}_{ML} has a Gaussian distribution as N \to \infty.

Asymptotic Properties of \hat{a}_{ML}(r_1, ..., r_N):
(a) \hat{a}_{ML}(r_1, ..., r_N) \to a as N \to \infty
    (\hat{a}_{ML} is a consistent estimate).
(b) \hat{a}_{ML} is asymptotically efficient.
(c) \hat{a}_{ML} is asymptotically Gaussian.

Ex. r = g^{-1}(a) + e,  e ~ N(0, \sigma^2).  \hat{a}_{ML} = ?  Efficient ?
Let b = g^{-1}(a); then a = g(b), and
    \frac{\partial}{\partial a} \ln f(r|a)
      = \frac{1}{\sigma^2} \left[ r - g^{-1}(a) \right] \frac{d\, g^{-1}(a)}{da}\Big|_{a=\hat{a}_{ML}} = 0
    \Rightarrow  \hat{a}_{ML} = g(r) = g(\hat{b}_{ML}).

Invariance property of the ML estimator:
If a = g(b), then \hat{a}_{ML} = g(\hat{b}_{ML}).
\hat{a}_{ML} may not be efficient: \hat{a}_{ML} is not efficient if g(\cdot)
is a nonlinear function.

PROPERTIES OF PERIODOGRAM
Bias Analysis
When \hat{r}(k) is the biased estimate,
    E[\hat{P}_p(\omega)] = E[\hat{P}_c(\omega)]
      = E\left[ \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k} \right].

    k \ge 0:  E[\hat{r}(k)] = \frac{N-k}{N} r(k),
    k < 0:    E[\hat{r}(k)] = E[\hat{r}^*(-k)] = \frac{N+k}{N} r(k) = \frac{N-|k|}{N} r(k),

so
    E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k) e^{-j\omega k}.

Bartlett or Triangular Window.

    w_B(k) = 1 - \frac{|k|}{N},  |k| \le N-1  (zero otherwise),

    E[\hat{P}_p(\omega)] = \sum_{k=-\infty}^{\infty} [w_B(k)\, r(k)] e^{-j\omega k}.

Let w_B(k) <--DTFT--> W_B(\omega). Then
    E[\hat{P}_p(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_B(\omega - \psi)\, d\psi.

When \hat{r}(k) is the unbiased estimate,
    E[\hat{P}_c(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_R(\omega - \psi)\, d\psi,
where w_R(k) is the rectangular window of length 2N-1 and
w_R(k) <--DTFT--> W_R(\omega).

[Figure: w_R(k); convolving the true P(\omega) with W_{B,R}(\omega) smears it
into E[\hat{P}(\omega)].]

[Figure: W_B(\omega), showing the main lobe and the side lobes.]

3 dB power width of the main lobe \approx \frac{2\pi}{N}  (or \frac{1}{N} in Hz).

Remark: The main lobe of W_B(\omega) smears or smooths P(\omega):
two peaks in P(\omega) that are separated by less than \frac{2\pi}{N} cannot
be resolved in \hat{P}_p(\omega).

\frac{1}{N} in Hz is called the spectral resolution limit of periodogram
methods.

Remark:
The side lobes of W_B(\omega) transfer power from high-power frequency bins
to low-power frequency bins: leakage.
Smearing and leakage cause more problems for peaky P(\omega) than for flat
P(\omega).
If P(\omega) = \sigma^2 for all \omega, then E[\hat{P}_p(\omega)] = P(\omega).
The bias of \hat{P}_p(\omega) decreases as N \to \infty (asymptotically unbiased).

Variance Analysis
We shall consider the case where x(n) is zero-mean circularly symmetric
complex Gaussian white noise:

    E[x(n) x^*(k)] = \sigma^2 \delta(n-k),
    E[x(n) x(k)] = 0 for all n, k,

which is equivalent to:

    E[Re(x(n))\, Re(x(k))] = \frac{\sigma^2}{2} \delta(n-k),
    E[Im(x(n))\, Im(x(k))] = \frac{\sigma^2}{2} \delta(n-k),
    E[Re(x(n))\, Im(x(k))] = 0.

Remark: The real and imaginary parts of x(n) are N(0, \frac{\sigma^2}{2})
and independent of each other.

Remark: If x(n) is zero-mean complex Gaussian white noise, \hat{P}_p(\omega)
is an unbiased estimate:
    r(k) = \sigma^2 \delta(k),
    E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k) e^{-j\omega k}
      = \sigma^2 = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k} = P(\omega).

For Gaussian complex white noise,
    E[x(k) x^*(l) x(m) x^*(n)]
      = \sigma^4 [\delta(k-l)\delta(m-n) + \delta(k-n)\delta(l-m)].

    E[\hat{P}_p(\omega_1)\hat{P}_p(\omega_2)]
      = \frac{1}{N^2} \sum_{k=0}^{N-1}\sum_{l=0}^{N-1}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}
        E[x(k)x^*(l)x(m)x^*(n)]\, e^{-j\omega_1(k-l)} e^{-j\omega_2(m-n)}
      = \sigma^4 + \frac{\sigma^4}{N^2} \left| \sum_{k=0}^{N-1} e^{j(\omega_1-\omega_2)k} \right|^2
      = \sigma^4 + \frac{\sigma^4}{N^2}
        \left\{ \frac{\sin[(\omega_1-\omega_2)\frac{N}{2}]}{\sin[\frac{\omega_1-\omega_2}{2}]} \right\}^2.

Hence
    \lim_{N\to\infty} E[\hat{P}_p(\omega_1)\hat{P}_p(\omega_2)]
      = P(\omega_1)P(\omega_2) + P^2(\omega_1)\,\delta_{\omega_1,\omega_2},
    \lim_{N\to\infty} E\{ [\hat{P}_p(\omega_1)-P(\omega_1)][\hat{P}_p(\omega_2)-P(\omega_2)] \}
      = \begin{cases} P^2(\omega_1), & \omega_1 = \omega_2 \\
                      0, & \omega_1 \ne \omega_2 \end{cases}
    (uncorrelated if \omega_1 \ne \omega_2).

Remark: \hat{P}_p(\omega) is not a consistent estimate.
If \omega_1 \ne \omega_2, \hat{P}_p(\omega_1) and \hat{P}_p(\omega_2) are
uncorrelated with each other.
This variance result is also true for
    y(n) = \sum_{k=0}^{\infty} h(k) x(n-k),
where x(n) is zero-mean complex Gaussian white noise.
[Diagram: x(n) -> h(n) -> y(n).]

REFINED METHODS
Decrease the variance of \hat{P}(\omega) by increasing bias or decreasing
resolution.

Blackman-Tukey (BT) Method
Remark: The \hat{r}(k) used in \hat{P}_c(\omega) is a poor estimate for
large lags k. For M < N:
    \hat{P}_{BT}(\omega) = \sum_{k=-(M-1)}^{M-1} w(k)\, \hat{r}(k)\, e^{-j\omega k},
where w(k) is called the lag window.

Remark: If w(k) is rectangular, w(k)\hat{r}(k) is a truncated version of
\hat{r}(k).
If \hat{r}(k) is the biased estimate, and w(k) <--DTFT--> W(\omega), then
    \hat{P}_{BT}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega - \psi)\, \hat{P}_p(\psi)\, d\psi.

Remark: The BT spectral estimator is a locally weighted average of the
periodogram \hat{P}_p(\omega).
The smaller the M, the poorer the resolution of \hat{P}_{BT}(\omega), but
the lower the variance:
    Resolution of \hat{P}_{BT}(\omega) \sim \frac{1}{M},
    Variance of \hat{P}_{BT}(\omega) \sim \frac{M}{N}.
For fixed M, \hat{P}_{BT}(\omega) is asymptotically biased, but its variance
\to 0 as N \to \infty.

Question: When is \hat{P}_{BT}(\omega) \ge 0 ?
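A minimal NumPy sketch of the BT estimator with a Bartlett lag window (so
W(\omega) \ge 0 and the estimate stays nonnegative); the function name and
grid size are mine:

```python
import numpy as np

def blackman_tukey(x, M, nfft=1024):
    """BT estimate: window the biased ACF to lags |k| < M, then take its
    DTFT on an nfft-point grid."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Biased ACF estimates for k = 0..M-1.
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(M)])
    w = 1.0 - np.arange(M) / M                     # Bartlett lag window
    rw = np.concatenate([(r * w)[:0:-1], r * w])   # lags -(M-1)..M-1
    lags = np.arange(-(M - 1), M)
    k = np.arange(nfft)
    # DTFT of the windowed ACF evaluated at omega_k = 2*pi*k/nfft.
    P = np.exp(-2j * np.pi * np.outer(k, lags) / nfft) @ rw
    return P.real
```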

Theorem: Let Y(\omega) <--DTFT--> y(n),  -(N-1) \le n \le N-1.
Then Y(\omega) \ge 0 iff the (infinite) Toeplitz matrix

    [ y(0)       y(1)  ...  y(N-1)  0       ...
      y(-1)      y(0)  ...  y(N-2)  y(N-1)  ...
      ...
      y[-(N-1)]  ...        y(0)    y(1)    ...
      ...                                   ... ]

is positive semidefinite.
In other words, Y(\omega) \ge 0 iff
    ..., 0, ..., 0, y[-(N-1)], ..., y(0), y(1), ..., y(N-1), 0, ...
is a positive semidefinite sequence.

Remark: \hat{P}_{BT}(\omega) \ge 0 iff {w(k)\hat{r}(k)} is a positive
semidefinite sequence, i.e., iff

    \hat{R}_{BT} = [ w(0)\hat{r}(0)              ...  w(M-1)\hat{r}(M-1)
                     ...                         ...  ...
                     w[-(M-1)]\hat{r}[-(M-1)]    ...  w(0)\hat{r}(0)     ]

is positive semidefinite, i.e., \hat{R}_{BT} \ge 0.

\hat{R}_{BT} can be written as a Hadamard (elementwise) matrix product:
    \hat{R}_{BT} = W \odot \hat{R},
where the (ij)-th element is (A \odot B)_{ij} = A_{ij} B_{ij}.

Theorem: If A \ge 0 (positive semidefinite) and B \ge 0, then A \odot B \ge 0.

Remark: If \hat{r}(k) is the biased estimate, \hat{P}_p(\omega) \ge 0. Then
if W(\omega) \ge 0, we have \hat{P}_{BT}(\omega) \ge 0.
Remark: Nonnegative definite (positive semidefinite) window sequences:
Bartlett, Parzen.

Time-Bandwidth Product
Equivalent Time Width N_e:
    N_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{w(0)}.

Ex. Rectangular window w_R(n):
    N_e = \frac{\sum_{k=-(M-1)}^{M-1} 1}{1} = 2M - 1.

Ex. Bartlett window:
    w_B(n) = \begin{cases} 1 - \frac{|n|}{M}, & -(M-1) \le n \le M-1 \\
                           0, & \text{else,} \end{cases}
    N_e = M.
[Figure: w_R(n) and w_B(n) over n = -(M-1), ..., M-1.]

Equivalent Bandwidth \beta_e:
    2\pi \beta_e = \frac{\int_{-\pi}^{\pi} W(\omega)\, d\omega}{W(0)}.

Since w(n) <--DTFT--> W(\omega):
    w(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega) e^{j\omega n}\, d\omega
    \Rightarrow  w(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega)\, d\omega,
    W(\omega) = \sum_{n=-(M-1)}^{M-1} w(n) e^{-j\omega n}
    \Rightarrow  W(0) = \sum_{n=-(M-1)}^{M-1} w(n).

Hence
    N_e \beta_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{\frac{1}{2\pi}\int_{-\pi}^{\pi} W(\omega)\,d\omega}
                  \cdot \frac{\int_{-\pi}^{\pi} W(\omega)\, d\omega}{2\pi \sum_{n=-(M-1)}^{M-1} w(n)} = 1

(Time-Bandwidth product).

Remark:
If a signal decays slowly in one domain, it is more concentrated in the
other domain.
The window shape determines the side lobe level relative to W(0).

Ex:
    x(2n) \longleftrightarrow \frac{1}{2} X\!\left(\frac{\omega}{2}\right)
(for the bandlimited case sketched).
[Figure: x(n) with DTFT X(\omega); the compressed x(2n) has a spectrum that
is half as tall and twice as wide.]

Remark: Once the window shape is fixed, M \sim N_e \sim \frac{1}{\beta_e}:
increasing M shrinks the main lobe width.

Window Design for \hat{P}_{BT}(\omega)


Let \beta_m = 3 dB main lobe width. Then
    Resolution of \hat{P}_{BT}(\omega) \sim \beta_m,
    Variance of \hat{P}_{BT}(\omega) \sim \frac{1}{\beta_m N}.

The choice of \beta_m is based on the trade-off between resolution and
variance, and on N.
The choice of window shape is based on leakage, and on N.
Practical rules of thumb:
1. M \le \frac{N}{10}.
2. Window shape based on the trade-off between smearing and leakage.
3. Window shape chosen so that \hat{P}_{BT}(\omega) \ge 0 for all \omega.

Remark: Other methods for nonparametric spectral estimation include the
Bartlett, Welch, and Daniell methods.
All try to reduce variance at the expense of poorer resolution.

Bartlett Method
Split the data into non-overlapping segments:
    x(n):  | x_1(n) | x_2(n) | ... | x_L(n) |

x(n) is an N-point sequence; the x_l(n), l = 1, ..., L, are M-point,
non-overlapping sequences, L = N/M.

    \hat{P}_l(\omega) = \frac{1}{M} \left| \sum_{n=0}^{M-1} x_l(n) e^{-j\omega n} \right|^2,
    \hat{P}_B(\omega) = \frac{1}{L} \sum_{l=1}^{L} \hat{P}_l(\omega).

Remark:
\hat{P}_B(\omega) \ge 0 for all \omega.
For large M and L, \hat{P}_B(\omega) \approx [\hat{P}_{BT}(\omega) using w_R(n)].
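A minimal NumPy sketch of segment-averaging (function name and defaults are
mine):

```python
import numpy as np

def bartlett_psd(x, M, nfft=None):
    """Average the periodograms of L = N // M non-overlapping M-point
    segments (Bartlett's method)."""
    x = np.asarray(x)
    L = len(x) // M
    nfft = nfft or M
    P = np.zeros(nfft)
    for l in range(L):
        seg = x[l * M:(l + 1) * M]
        P += np.abs(np.fft.fft(seg, n=nfft)) ** 2 / M
    return P / L
```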

Welch Method:
x_l(n) may overlap in the Welch method.
x_l(n) may be windowed before computing the periodogram.
[Figure: overlapping segments x_1(n), x_2(n), ..., x_S(n).]

Let w(n) be the window applied to x_l(n), l = 1, ..., S, n = 0, ..., M-1,
and let
    P = power of w(n) = \frac{1}{M} \sum_{n=0}^{M-1} |w(n)|^2.

    \hat{P}_l(\omega) = \frac{1}{MP} \left| \sum_{n=0}^{M-1} w(n) x_l(n) e^{-j\omega n} \right|^2,
    \hat{P}_W(\omega) = \frac{1}{S} \sum_{l=1}^{S} \hat{P}_l(\omega).

Remarks: By allowing the x_l(n) to overlap, we hope to have a larger S,
the number of \hat{P}_l(\omega) we average; 50% overlap is used in general.
Practical examples show that \hat{P}_W(\omega) may offer lower variance
than \hat{P}_B(\omega), but not significantly.
\hat{P}_W(\omega) may be shown to be a \hat{P}_{BT}(\omega)-type estimator,
under reasonable approximations.
\hat{P}_W(\omega) can be easily computed with the FFT, so it is favored in
practice; \hat{P}_{BT}(\omega) is theoretically favored.
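For reference, SciPy's signal module implements this estimator directly; a
sketch (the segment length, overlap, and test signal are illustrative):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

# M = 256-sample Hann-windowed segments with 50% overlap, averaged.
f, Pw = welch(x, fs=1.0, window='hann', nperseg=256, noverlap=128)
```

Setting `noverlap=0` and `window='boxcar'` reduces this to Bartlett's method.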

Daniell Method:
    \hat{P}_D(\omega) = \frac{1}{2\beta} \int_{\omega-\beta}^{\omega+\beta} \hat{P}_p(\psi)\, d\psi.

Remark: \hat{P}_D(\omega) is a special case of \hat{P}_{BT}(\omega) with
    w(n) <--DTFT--> W(\omega) = \begin{cases} \frac{\pi}{\beta}, & \omega \in [-\beta, \beta] \\
                                              0, & \text{else.} \end{cases}
The larger the \beta, the lower the variance, but the poorer the resolution.

Implementation of \hat{P}_D(\omega):
Zero pad x(n) so that it has \tilde{N} points, \tilde{N} >> N.
Calculate \hat{P}_p(\omega_k) with the FFT,
    \omega_k = \frac{2\pi}{\tilde{N}} k,   k = 0, ..., \tilde{N}-1,
then average over 2J+1 adjacent points:
    \hat{P}_D(\omega_k) = \frac{1}{2J+1} \sum_{j=k-J}^{k+J} \hat{P}_p(\omega_j).

PARAMETRIC METHODS
Parametric Modeling
Ex.
    P(f) = \frac{r(0)}{\sqrt{2\pi}\,\sigma_f} e^{-\frac{f^2}{2\sigma_f^2}},   |f| \le \frac{1}{2}.
[Figure: bell-shaped P(f).]

Remark: P(f) is described by 2 unknowns: r(0) and \sigma_f.
Once we know r(0) and \sigma_f, we know P(f), the PSD.
Nonparametric methods assume no knowledge of P(f): too many unknowns.
Parametric methods attempt to estimate r(0) and \sigma_f.

Parsimony Principle:
Better estimates may be obtained by using an appropriate data model with
fewer unknowns.
Appropriate Data Model:
If the data model is wrong, \hat{P}(f) will always be biased.
[Figure: a biased estimate vs. the true PSD.]

To use parametric methods, reasonably correct a priori knowledge of the
data model is necessary.

Rational Spectra:
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2,
    A(\omega) = 1 + a_1 e^{-j\omega} + ... + a_p e^{-jp\omega},
    B(\omega) = 1 + b_1 e^{-j\omega} + ... + b_q e^{-jq\omega}.

Remark: We mostly consider real-valued signals here:
a_1, ..., a_p, b_1, ..., b_q are real coefficients.
Any continuous PSD can be approximated arbitrarily closely by a rational
PSD.

[Diagram: u(n) -> H(\omega) = B(\omega)/A(\omega) -> x(n);
u(n) = zero-mean white noise of variance \sigma^2.]

Remark:
    P_{xx}(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2.
A rational spectrum can thus be associated with a signal obtained by
filtering white noise of power \sigma^2 through a rational filter with
H(\omega) = B(\omega)/A(\omega).

In Difference Equation Form:
    x(n) = -\sum_{k=1}^{p} a_k x(n-k) + \sum_{k=0}^{q} b_k u(n-k).

In Z-Transform Form (z = e^{j\omega}):
    H(z) = \frac{B(z)}{A(z)},
    A(z) = 1 + a_1 z^{-1} + ... + a_p z^{-p},
    B(z) = 1 + b_1 z^{-1} + ... + b_q z^{-q}.

Notation sometimes used:  z^{-1} x(n) = x(n-1)  (unit delay).
Then:  x(n) = \frac{B(z)}{A(z)} u(n).

ARMA Model, ARMA(p,q):
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2.
AR Model, AR(p):
    P(\omega) = \sigma^2 \left| \frac{1}{A(\omega)} \right|^2.
MA Model, MA(q):
    P(\omega) = \sigma^2 |B(\omega)|^2.

Remark: AR models peaky PSDs better.
MA models valley-shaped PSDs better.
ARMA is used for PSDs with both peaks and valleys.

Spectral Factorization:
    H(\omega) = \frac{B(\omega)}{A(\omega)},
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2
              = \sigma^2 \frac{B(\omega) B^*(\omega)}{A(\omega) A^*(\omega)}.

With real coefficients b_1, ..., b_q, a_1, ..., a_p:
    A^*(\omega) = 1 + a_1 e^{j\omega} + ... + a_p e^{jp\omega}
                = 1 + a_1 \frac{1}{z} + ... + a_p \frac{1}{z^p} = A\!\left(\frac{1}{z}\right)
(on the unit circle), so
    P(z) = \sigma^2 \frac{B(z)\, B(1/z)}{A(z)\, A(1/z)}.

Remark: If a_1, ..., a_p, b_1, ..., b_q are complex,
    P(z) = \sigma^2 \frac{B(z)\, B^*(1/z^*)}{A(z)\, A^*(1/z^*)}.

Consider
    P(z) = \sigma^2 \frac{B(z)\, B(1/z)}{A(z)\, A(1/z)}.

Remark: If \alpha is a zero of P(z), so is \frac{1}{\alpha};
if \alpha is a pole of P(z), so is \frac{1}{\alpha}.
Since a_1, ..., a_p, b_1, ..., b_q are real, the poles and zeroes of P(z)
also occur in complex conjugate pairs.
[Figure: z-plane with pole/zero pairs mirrored in the unit circle.]

Remark:
If the poles of \frac{1}{A(z)} are inside the unit circle, then
H(z) = \frac{B(z)}{A(z)} is BIBO stable.
If the zeroes of B(z) are inside the unit circle, then H(z) = \frac{B(z)}{A(z)}
is minimum phase.
We choose H(z) so that both its zeroes and poles are inside the unit circle.
[Diagram: u(n) -> H(z) = B(z)/A(z) -> x(n); a stable, minimum-phase system.]

Relationships Among Models


An MA(q) or ARMA(p,q) model is equivalent to an AR(\infty) model.
An AR(p) or ARMA(p,q) model is equivalent to an MA(\infty) model.
Ex:
    H(z) = \frac{1 + 0.9 z^{-1}}{1 + 0.8 z^{-1}}   (ARMA(1,1))
         = \frac{1}{(1 + 0.8 z^{-1}) \frac{1}{1 + 0.9 z^{-1}}}
         = \frac{1}{(1 + 0.8 z^{-1})(1 - 0.9 z^{-1} + 0.81 z^{-2} - \cdots)}
         = AR(\infty).

Remark: Let ARMA(p,q) = \frac{B(z)}{A(z)} = \frac{1}{C(z)} = AR(\infty).
From a_1, ..., a_p, b_1, ..., b_q we can find c_1, c_2, ..., and vice versa.

Since \frac{B(z)}{A(z)} = \frac{1}{C(z)}, we have B(z) C(z) = A(z):

    (1 + b_1 z^{-1} + ... + b_q z^{-q})(1 + c_1 z^{-1} + c_2 z^{-2} + \cdots)
        = 1 + a_1 z^{-1} + ... + a_p z^{-p}.

Matching the coefficients of z^{-1}, ..., z^{-p} gives (with c_0 = 1)

    a_k = c_k + \sum_{i=1}^{\min(k,q)} b_i\, c_{k-i},   k = 1, ..., p.   (*)

Matching the coefficients of z^{-(p+1)}, ..., z^{-(p+q)}, which must all
vanish, gives

    [ c_p        c_{p-1}  ...  c_{p-q+1}      [ b_1         [ c_{p+1}
      c_{p+1}    c_p      ...  c_{p-q+2}        b_2           c_{p+2}
      ...                                 ]  ·  ...      = -   ...
      c_{p+q-1}  ...           c_p       ]      b_q ]         c_{p+q} ].   (**)

Remark: Once b_1, ..., b_q are computed with (**), a_1, ..., a_p can be
computed with (*).

Computing Coefficients from r(k)


AR signals. Let
    \frac{1}{A(z)} = 1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots,
so
    x(n) = \frac{1}{A(z)} u(n) = u(n) + \alpha_1 u(n-1) + \cdots,
and hence
    E[x(n) u(n)] = \sigma^2,
    E[x(n-k) u(n)] = 0,   k \ge 1.

Since A(z) x(n) = u(n):
    x(n) + a_1 x(n-1) + ... + a_p x(n-p) = u(n),
    [x(n)  x(n-1)  ...  x(n-p)] [1; a_1; ...; a_p] = u(n).

k = 0: multiply by x(n) and take expectations:
    [r(0)  r(1)  ...  r(p)] [1; a_1; ...; a_p] = \sigma^2.   (*)

k \ge 1: multiply by x(n-k):
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p] = 0.   (**)

Stacking (*) and (**) for k = 0, 1, ..., p:

    [ r(0)  r(1)    ...  r(p)
      r(1)  r(0)    ...  r(p-1)
      ...
      r(p)  r(p-1)  ...  r(0)  ] [ 1; a_1; ...; a_p ] = [ \sigma^2; 0; ...; 0 ],

or, keeping only the last p equations (Ra = -r):

    [ r(0)    ...  r(p-1)      [ a_1         [ r(1)
      ...                 ]  ·   ...     = -    ...
      r(p-1)  ...  r(0)  ]       a_p ]         r(p) ].

Remarks:
When we only have N samples, {r(k)} is not available; {\hat{r}(k)} may be
used to replace {r(k)} to obtain \hat{a}_1, ..., \hat{a}_p.
This is the Yule-Walker Method.
R is a positive semidefinite matrix; R is positive definite unless x(n) is
a sum of fewer than p/2 sinusoids.
R is Toeplitz.
The Levinson-Durbin algorithm is used to solve for a efficiently.
AR models are the most frequently used models in practice.
Estimation of AR parameters is a well-established topic.

Remarks:
If {\hat{r}(k)} is a positive definite sequence and a_1, ..., a_p are found
by solving Ra = -r, then the roots of the polynomial
1 + a_1 z^{-1} + ... + a_p z^{-p} are inside the unit circle:
the AR system thus obtained is BIBO stable.
The biased estimate {\hat{r}(k)} should be used in the YW equation to
obtain a stable AR system.
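A minimal NumPy sketch of the Yule-Walker method and the resulting AR PSD
(function names mine; a direct solve is used here instead of
Levinson-Durbin for brevity):

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients a_1..a_p and sigma^2 from the biased
    ACF estimates (the Yule-Walker method)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    a = np.linalg.solve(R, -r[1:p + 1])
    sigma2 = r[0] + np.dot(a, r[1:p + 1])   # equation (*) above
    return a, sigma2

def ar_psd(a, sigma2, nfft=1024):
    """AR spectral estimate P(omega) = sigma^2 / |A(omega)|^2."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n=nfft)
    return sigma2 / np.abs(A) ** 2
```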

Efficient Methods for Solving
    Ra = -r   or   \hat{R}\hat{a} = -\hat{r}:

Levinson-Durbin Algorithm.
Delsarte-Genin Algorithm.
Gohberg-Semencul Formula for R^{-1} or \hat{R}^{-1}
(sometimes, we may be interested not only in a but also in R^{-1}).

Levinson-Durbin Algorithm (LDA)


Let (real signal)
    R_{n+1} = [ r(0)  r(1)    ...  r(n)
                r(1)  r(0)    ...  r(n-1)
                ...
                r(n)  r(n-1)  ...  r(0)  ],   n = 1, 2, ..., p,

    \theta_n = [ a_{n,1}; ...; a_{n,n} ].

LDA solves
    R_{n+1} [1; \theta_n] = [\sigma_n^2; 0; ...; 0]
recursively in n, starting from n = 1.

Remark: For n = 1, 2, ..., p, LDA needs ~ p^2 flops;
regular matrix inverses need ~ p^4 flops.

Let A be symmetric and Toeplitz, A_{ij} = a_{|i-j|}. Let
    b = [b_1; ...; b_n],   \tilde{b} = [b_n; b_{n-1}; ...; b_1]
(the reversed vector). Then if c = A b,  \tilde{c} = A \tilde{b}.

Proof:
    (\tilde{c})_i = c_{n-i+1} = \sum_{k=1}^{n} A_{n-i+1,k}\, b_k
                  = \sum_{k=1}^{n} a_{|n-i+1-k|}\, b_k
                  = \sum_{m=1}^{n} a_{|m-i|}\, b_{n-m+1}
                  = \sum_{m=1}^{n} A_{i,m} (\tilde{b})_m
                  = (A\tilde{b})_i     (with m = n - k + 1).

Consider:
    R_{n+2} [1; \theta_n; 0] = [\sigma_n^2; 0; \alpha_n],
where
    r_n = [r(1); ...; r(n)],
    \alpha_n = r(n+1) + \theta_n^T \tilde{r}_n.

Result: Let
    k_{n+1} = -\frac{\alpha_n}{\sigma_n^2}.
Then
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1],
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2).

Proof:
    R_{n+2} [1; \theta_{n+1}]
      = R_{n+2} [1; \theta_n; 0] + k_{n+1} R_{n+2} [0; \tilde{\theta}_n; 1]
      = [\sigma_n^2; 0; \alpha_n] + k_{n+1} [\alpha_n; 0; \sigma_n^2]
      = [\sigma_n^2 + k_{n+1}\alpha_n;\; 0;\; \alpha_n + k_{n+1}\sigma_n^2]
      = [\sigma_{n+1}^2; 0; 0].

LDA Initialization (n = 1):
    R_2 = [ r(0)  r(1)
            r(1)  r(0) ],
    \theta_1 = -\frac{r(1)}{r(0)} = k_1,
    \sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)}.   (O(1) flops)

For n = 1, 2, ..., p-1, do:
    k_{n+1} = -\frac{r(n+1) + \theta_n^T \tilde{r}_n}{\sigma_n^2}    (n flops)
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2)                      (O(1) flops)
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1].    (n flops)
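A minimal NumPy sketch of this recursion (variable names are mine); its test
case matches the worked example below:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations recursively given r(0..p).
    Returns theta_p = (a_1..a_p), sigma_p^2, and the reflection
    coefficients k_1..k_p."""
    theta = np.array([-r[1] / r[0]])     # theta_1
    sigma2 = r[0] - r[1] ** 2 / r[0]     # sigma_1^2
    k = [theta[0]]                       # k_1 = theta_1
    for n in range(1, p):
        alpha = r[n + 1] + theta @ r[1:n + 1][::-1]   # theta_n^T r~_n
        kn = -alpha / sigma2
        theta = (np.concatenate([theta, [0.0]])
                 + kn * np.concatenate([theta[::-1], [1.0]]))
        sigma2 *= (1.0 - kn ** 2)
        k.append(kn)
    return theta, sigma2, np.array(k)

# Matches the worked example below: r = (1, rho, rho^2).
rho = 0.9
theta, s2, k = levinson_durbin(np.array([1.0, rho, rho ** 2]), 2)
print(theta, s2)   # approx [-0.9, 0.], 0.19
```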

Ex: r(0) = 1, r(1) = \rho, r(2) = \rho^2   (an AR(1)-type correlation).

Straightforward solution of R_3 [1; \theta_2] = [\sigma_2^2; 0; 0]:
    \theta_2 = [a_1; a_2] = [-\rho; 0],   \sigma_2^2 = 1 - \rho^2.

LDA: Initialization:
    \theta_1 = -\frac{r(1)}{r(0)} = -\rho = k_1,
    \sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)} = 1 - \rho^2.

Recursion (\tilde{r}_1 = \rho):
    k_2 = -\frac{r(2) + \theta_1^T \tilde{r}_1}{\sigma_1^2}
        = -\frac{\rho^2 + (-\rho)(\rho)}{1-\rho^2} = 0,
    \sigma_2^2 = \sigma_1^2 (1 - k_2^2) = (1-\rho^2)(1 - 0^2) = 1 - \rho^2,
    \theta_2 = [\theta_1; 0] + k_2 [\tilde{\theta}_1; 1] = [-\rho; 0].

Properties of LDA:
    |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0
iff
    A_n(z) = 1 + a_{n,1} z^{-1} + ... + a_{n,n} z^{-n} = 0
has all its roots inside the unit circle.

    |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0
iff
    R_{n+1} > 0,   n = 1, 2, ..., p.

Proof (for the second property above only): We first use induction to
prove

    U_{n+1}^T R_{n+1} U_{n+1} = D_{n+1},   (***)

where
    U_{n+1}^T = [ 1  a_{n,1}  ...       a_{n,n}
                  0  1        ...       a_{n-1,n-1}
                  ...
                  0  ...      1         a_{1,1}
                  0  ...      0         1          ],
    D_{n+1} = diag(\sigma_n^2, \sigma_{n-1}^2, ..., \sigma_1^2, r(0)).

n = 1:
    [ 1  a_{1,1}     [ r(0)  r(1)     [ 1        0
      0  1       ] ·   r(1)  r(0) ] ·   a_{1,1}  1 ]
    = [ \sigma_1^2  0
        0           r(0) ].

Suppose (***) is true for n = k-1, i.e., U_k^T R_k U_k = D_k.
Consider n = k. Partition
    R_{k+1} = [ r(0)         \tilde{r}_k^T
                \tilde{r}_k   R_k          ],
    U_{k+1} = [ 1                 0
                \tilde{\theta}_k  U_k ].
By the persymmetry lemma, R_{k+1} [1; \theta_k] = [\sigma_k^2; 0] implies
    r(0) + \tilde{r}_k^T \tilde{\theta}_k = \sigma_k^2,
    \tilde{r}_k + R_k \tilde{\theta}_k = 0.
Hence
    U_{k+1}^T R_{k+1} U_{k+1}
      = [ 1  \tilde{\theta}_k^T     [ r(0)         \tilde{r}_k^T     [ 1                 0
          0  U_k^T             ] ·    \tilde{r}_k   R_k          ] ·   \tilde{\theta}_k  U_k ]
      = [ \sigma_k^2  0
          0           U_k^T R_k U_k ]
      = [ \sigma_k^2  0
          0           D_k ] = D_{k+1}.

So U_{n+1}^T R_{n+1} U_{n+1} = D_{n+1} is proven!   (***)

Since U_{n+1}^{-1} R_{n+1}^{-1} U_{n+1}^{-T} = D_{n+1}^{-1},
    R_{n+1}^{-1} = U_{n+1} D_{n+1}^{-1} U_{n+1}^T.
U_{n+1} D_{n+1}^{-1/2} is called the Cholesky factor of R_{n+1}^{-1}.

Consider the determinant of R_{n+1}:
    det(R_{n+1}) = det(D_{n+1}) = r(0) \prod_{k=1}^{n} \sigma_k^2,
    det(R_{n+1}) = \sigma_n^2 \det(R_n).

Hence
    R_{n+1} > 0,   n = 1, 2, ..., p
iff
    r(0) > 0  and  \sigma_k^2 > 0,   k = 1, 2, ..., p.

Recall
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2).

If R_{n+1} > 0, n = 1, 2, ..., p, then r(0) > 0 and \sigma_n^2 > 0, and
    k_{n+1}^2 = \frac{\sigma_n^2 - \sigma_{n+1}^2}{\sigma_n^2}.
Since 0 < \sigma_{n+1}^2 < \sigma_n^2, we get k_{n+1}^2 < 1, i.e., |k_{n+1}| < 1.

Conversely, if |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0, then
    \sigma_0^2 = r(0) > 0,
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2) > 0,   n = 1, 2, ..., p-1,
so R_{n+1} > 0.

MA Signals:
    x(n) = B(z) u(n) = u(n) + b_1 u(n-1) + ... + b_q u(n-q),

    r(k) = E[x(n) x(n-k)]
         = E\{ [u(n) + ... + b_q u(n-q)] [u(n-k) + ... + b_q u(n-q-k)] \}.

    |k| > q:         r(k) = 0,
    0 \le k \le q:   r(k) = \sigma^2 \sum_{l=0}^{q-k} b_l b_{l+k},
    -q < k < 0:      r(k) = r(-k),
with b_0 = 1 and b_1, ..., b_q real.

    P(\omega) = \sum_{k=-q}^{q} r(k) e^{-j\omega k}.

Remarks: Estimating b_1, ..., b_q is a nonlinear problem.
A simple estimator is
    \hat{P}(\omega) = \sum_{k=-q}^{q} \hat{r}(k) e^{-j\omega k}.
* This is exactly the Blackman-Tukey method with a rectangular window of
length 2q+1.
* No matter whether \hat{r}(k) is the biased or unbiased estimate, this
\hat{P}(\omega) may be < 0.
* When the unbiased \hat{r}(k) is used, \hat{P}(\omega) is unbiased.
* To ensure \hat{P}(\omega) \ge 0 for all \omega, we may use the biased
\hat{r}(k) and a window with W(\omega) \ge 0 for all \omega; for this case,
\hat{P}(\omega) is biased. This is again exactly the BT method.
The most-used MA spectral estimator is based on a Two-Stage Least Squares
method; see the discussion of ARMA later.

ARMA Signals (also called the Pole-Zero Model):
    (1 + a_1 z^{-1} + ... + a_p z^{-p}) x(n) = (1 + b_1 z^{-1} + ... + b_q z^{-q}) u(n).
Let us write x(n) as MA(\infty):
    x(n) = u(n) + h_1 u(n-1) + h_2 u(n-2) + ...,
    E[x(n) u(n)] = \sigma^2,
    E[u(n) x(n-k)] = 0,   k \ge 1.
The ARMA model can be written as
    [x(n)  x(n-1)  ...  x(n-p)] [1; a_1; ...; a_p]
      = [u(n)  u(n-1)  ...  u(n-q)] [1; b_1; ...; b_q].

Next we shall multiply both sides by x(n-k) and take E\{\cdot\}.

k = 0, 1, ..., q (with h_0 = 1):
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p]
      = \sigma^2 \sum_{l=k}^{q} b_l\, h_{l-k};
e.g.,
    k = 0:  [r(0)  r(1)  ...  r(p)] [1; a_1; ...; a_p]
              = \sigma^2 [1  h_1  ...  h_q] [1; b_1; ...; b_q],
    k = 1:  [r(1)  r(0)  ...  r(p-1)] [1; a_1; ...; a_p]
              = \sigma^2 (b_1 + b_2 h_1 + ... + b_q h_{q-1}).

k \ge q+1:
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p] = 0,
e.g.,
    [ r(q+1)  r(q)    ...  r(q+1-p)
      r(q+2)  r(q+1)  ...  r(q+2-p)
      ...                           ] [1; a_1; ...; a_p] = 0.

This is the modified YW equation.

To solve for a_1, ..., a_p, we need p equations. Using r(-k) = r(k) gives

    [ r(q)      r(q-1)    ...  r(q-p+1)      [ a_1         [ r(q+1)
      r(q+1)    r(q)      ...  r(q-p+2)        a_2           r(q+2)
      ...                                ]  ·  ...      = -   ...
      r(q+p-1)  r(q+p-2)  ...  r(q)     ]      a_p ]         r(q+p) ].

Remarks:
(1) Replacing r(k) by \hat{r}(k) above, we can solve for
\hat{a}_1, ..., \hat{a}_p.
(2) The matrix on the left side
    is nonsingular under mild conditions;
    is Toeplitz;
    is NOT symmetric.
    Levinson-type fast algorithms exist.

What about the MA part of the ARMA PSD?


Let
    y(n) = (1 + b_1 z^{-1} + ... + b_q z^{-q}) u(n).
The ARMA model becomes
    (1 + a_1 z^{-1} + ... + a_p z^{-p}) x(n) = y(n).
[Diagram: y(n) -> 1/A(z) -> x(n); equivalently x(n) -> A(z) -> y(n).]
Hence
    P_x(\omega) = \left| \frac{1}{A(\omega)} \right|^2 P_y(\omega).

Let \gamma_k be the autocorrelation function of y(n). Then (see MA signals)
    P_y(\omega) = \sum_{k=-q}^{q} \gamma_k e^{-j\omega k},
    \gamma_k = E[y(n) y(n-k)] = E[A(z)x(n) \cdot A(z)x(n-k)]
             = E\left[ \sum_{i=0}^{p} a_i x(n-i) \sum_{j=0}^{p} a_j x(n-j-k) \right]
             = \sum_{i=0}^{p} \sum_{j=0}^{p} a_i a_j\, r(k+j-i)   (a_0 = 1).

Since \hat{a}_1, ..., \hat{a}_p may be computed with the modified YW method,
    \hat{\gamma}_k = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{a}_i \hat{a}_j\, \hat{r}(k+j-i),
    k = 0, 1, ..., q,   \hat{\gamma}_{-k} = \hat{\gamma}_k.

ARMA PSD Estimate:

    \hat{P}(\omega) = \frac{\sum_{k=-q}^{q} \hat{\gamma}_k e^{-j\omega k}}{|\hat{A}(\omega)|^2}.

Remarks:
This method is called the modified YW ARMA spectral estimator.
\hat{P}(\omega) is not guaranteed to be \ge 0 for all \omega, due to the MA
part.
The AR estimates \hat{a}_1, ..., \hat{a}_p have reasonable accuracy if the
ARMA poles and zeroes are well inside the unit circle.
Very poor estimates \hat{a}_1, ..., \hat{a}_p occur when the ARMA poles and
zeroes are closely spaced and near the unit circle (this is the narrowband
signal case).

Ex: Consider
    x(n) = \cos(\omega_1 n + \varphi_1) + \cos(\omega_2 n + \varphi_2),
where \varphi_1 and \varphi_2 are independent and uniformly distributed on
[0, 2\pi]. Then
    r(k) = \frac{1}{2} \cos(\omega_1 k) + \frac{1}{2} \cos(\omega_2 k).
[Figure: \cos(\omega_1 k) and \cos(\omega_2 k) vs. k.]
Note that when \omega_1 \approx \omega_2, large values of k are needed to
distinguish \cos(\omega_1 k) from \cos(\omega_2 k).
Remark: This comment is true for both AR and ARMA models.

Overdetermined Modified Yule-Walker Equation (M > p):

    [ r(q)        ...  r(q-p+1)       [ a_1         [ r(q+1)
      ...                               ...           ...
      r(q+p-1)    ...  r(q)        ] ·  ...      = -  r(q+p)
      ...                               ...           ...
      r(q+M-1)    ...  r(q+M-p)   ]     a_p ]         r(q+M) ].

Remarks:
The overdetermined linear equations may be solved with Least Squares or
Total Least Squares methods.
M should be chosen based on the trade-off between the information
contained in the large lags of r(k) and the accuracy of \hat{r}(k).
Overdetermined YW equations may also be obtained for AR signals.

Solving Linear Equations:


Consider A_{m \times n}\, x_{n \times 1} = b_{m \times 1}.
When m = n and A is full rank, x = A^{-1} b.
When m > n and A has full rank n, a solution exists only if b lies in the
n-dimensional subspace of the m-dimensional space spanned by the columns
of A.
Ex:
    A = [1; 1].
    If b = [3; 3],  x = 3.
    If b = [1; 0],  x does not exist!

Least Squares (LS) Solution for Overdetermined Equations:
Objective of the LS solution:
Let e = Ax - b. Find x_{LS} so that e^H e is minimized.
Let e = [e_1; e_2; ...; e_m]; the squared Euclidean norm is
    e^H e = |e_1|^2 + |e_2|^2 + ... + |e_m|^2.
[Figure: error components e_1, e_2.]

Remarks:  A x_{LS} = b + e_{LS}.
We see that x_{LS} is found by perturbing b so that a solution exists.

    e^H e = (Ax - b)^H (Ax - b)
          = x^H A^H A x - x^H A^H b - b^H A x + b^H b
          = \left[ x - (A^H A)^{-1} A^H b \right]^H (A^H A)
            \left[ x - (A^H A)^{-1} A^H b \right]
            + \left[ b^H b - b^H A (A^H A)^{-1} A^H b \right].

Remark: The 2nd term above is independent of x, so e^H e is minimized by
    x_{LS} = (A^H A)^{-1} A^H b.   (LS solution)

Illustration of the LS solution:
Let A = [a_1 | a_2] and x_{LS} = [x_1; x_2].
[Figure: b is projected onto the plane spanned by a_1 and a_2;
A x_{LS} = x_1 a_1 + x_2 a_2 is the orthogonal projection of b.]

Ex:
    A = [1; 0],   b = [1; 1],   x_{LS} = ?

    x_{LS} = (A^H A)^{-1} A^H b
           = \left( [1\ 0][1; 0] \right)^{-1} [1\ 0][1; 1] = 1,
    A x_{LS} = [1; 0],
    e_{LS} = A x_{LS} - b = [0; -1].

Computational Aspects of LS
Solving the Normal Equations:
    (A^H A)\, x_{LS} = A^H b.   (1)
This equation is called the Normal equation. Let
    A^H A = C,   A^H b = g,   C x_{LS} = g,
where C is positive definite.

Cholesky Decomposition:
    C = L D L^H,
where
    L = [ 1       0       ...  0
          l_{21}  1       ...  0
          ...
          l_{n1}  l_{n2}  ...  1 ]   (lower triangular matrix),
    D = diag(d_1, ..., d_n),   d_i > 0.

Back-Substitution to solve
    L D L^H x_{LS} = g:
Let y = D L^H x_{LS}. Then L y = g:
    y_1 = g_1,
    y_k = g_k - \sum_{j=1}^{k-1} l_{kj}\, y_j,   k = 2, ..., n.
Then L^H x_{LS} = D^{-1} y = [y_1/d_1; ...; y_n/d_n]:
    x_n = \frac{y_n}{d_n},
    x_k = \frac{y_k}{d_k} - \sum_{j=k+1}^{n} l_{jk}^*\, x_j,   k = n-1, ..., 1.

Remark: Solving the Normal equations may be sensitive to numerical errors.

Ex.
    [ 3  3               [ x_1       [ 1
      4  4+\epsilon ]  ·   x_2 ]  =    1 ],   Ax = b,
where \epsilon is a small number.
Exact solution:
    x_1 = \frac{1}{3} + \frac{1}{3\epsilon},   x_2 = -\frac{1}{3\epsilon}.

Assume that, due to truncation errors, \epsilon^2 is rounded to 0. Then
    A^T A = [ 25            25 + 4\epsilon
              25 + 4\epsilon  25 + 8\epsilon ],   A^T b = [7; 7+\epsilon],
whose determinant, -16\epsilon^2, has been rounded away.

Solution to the Normal equation (note the big difference!): the computed
x = (A^T A)^{-1} A^T b is completely unreliable, since the truncated
A^T A is numerically singular.

QR Method (numerically more robust):
    Ax = b.
Using Householder transformations, we can find an orthonormal matrix Q
(i.e., Q Q^H = I) such that
    Q A x = [ T
              0 ] x = Q b = [ z_1
                              z_2 ],
where T is a square, upper triangular matrix, and
    \min e^H e = z_2^H z_2,
    T x_{LS} = z_1   (solved by back-substitution).

Ex. Same system as above:
    [ 3  3               [ x_1       [ 1                    1  [ 3   4
      4  4+\epsilon ]  ·   x_2 ]  =    1 ],   Q = \frac{-}{5}    4  -3 ].
QAx = Qb gives
    [ 5  5 + \frac{4}{5}\epsilon        [ x_1                 [ 7
      0  -\frac{3}{5}\epsilon     ]  ·    x_2 ]  = \frac{1}{5}   1 ],
and back-substitution recovers x_2 = -\frac{1}{3\epsilon},
x_1 = \frac{1}{3} + \frac{1}{3\epsilon}  (the same as the exact solution).

Remark: For a large number of overdetermined equations, the QR method
needs about twice as much computation as solving the Normal equation in (1).
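A minimal NumPy sketch contrasting the two approaches on this example (the
value of epsilon is illustrative):

```python
import numpy as np

eps = 1e-8
A = np.array([[3.0, 3.0],
              [4.0, 4.0 + eps]])
b = np.array([1.0, 1.0])

# Normal equations: forming A^T A squares the condition number.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR: triangularize A itself, then back-substitute.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))  # cond(A^T A) = cond(A)^2
print(x_normal, x_qr)   # the QR solution is far closer to the exact one
```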

Total Least Squares (TLS) solution to Ax = b:


Recall that x_{LS} is obtained by perturbing b only, i.e.,
    A x_{LS} = b + e_{LS},   e_{LS}^H e_{LS} = min.
x_{TLS} is obtained by perturbing both A and b, i.e.,
    (A + E_{TLS})\, x_{TLS} = b + e_{TLS},
    ||[E_{TLS}\ \ e_{TLS}]||_F = minimum,
where ||\cdot||_F is the Frobenius matrix norm,
    ||G||_F^2 = \sum_i \sum_j |g_{ij}|^2,   g_{ij} = (ij)-th element of G.

Illustration of the TLS solution:
[Figure: the fitted straight line minimizes the sum of squared shortest
(orthogonal) distances between the line and the points.]

Let C = [A\ \ b], and let the singular value decomposition (SVD) of C be
    C = U \Sigma V^H.
Remarks: The columns of U are the eigenvectors of C C^H;
the columns of V are the eigenvectors of C^H C.
Both U and V are unitary matrices, i.e.,
    U U^H = U^H U = I,   V V^H = V^H V = I.
\Sigma is diagonal, with real diagonal elements (the singular values)
    \sigma_1 \ge \sigma_2 \ge ... \ge \sigma_{n+1} \ge 0
(\sigma_i^2 are the eigenvalues of C^H C).

Partition
    V = [ V_{11}  V_{12}
          V_{21}  V_{22} ],
where V_{12} is n x 1 and V_{22} is 1 x 1. Then
    x_{TLS} = -V_{12} V_{22}^{-1}.

Remarks:
At low SNR, TLS may be better than LS.
At high SNR, TLS and LS yield similar results.
Markov Estimate:
If the statistics of e = Ax - b are known, they may be used to obtain a
better solution to Ax = b.

ARMA Signals:
Two-Stage Least Squares Method
Step 1: Approximate ARMA(p,q) with AR(L) for a large L.
The YW equation may be used to estimate \hat{a}_1, \hat{a}_2, ..., \hat{a}_L:
    \hat{u}(n) = x(n) + \hat{a}_1 x(n-1) + ... + \hat{a}_L x(n-L),
    \hat{\sigma}^2 = \frac{1}{N-L} \sum_{n=L+1}^{N} \hat{u}^2(n).

Step 2: System Identification
[Diagram: u(n) -> B(z)/A(z) -> x(n).]
Let
    x = [x(0); x(1); ...; x(N-1)],
    \hat{u} = [\hat{u}(0); \hat{u}(1); ...; \hat{u}(N-1)],
    \theta = [a_1; a_2; ...; a_p; b_1; ...; b_q],
    H = [ -x(-1)   ...  -x(-p)     \hat{u}(-1)   ...  \hat{u}(-q)
          -x(0)    ...  -x(1-p)    \hat{u}(0)    ...  \hat{u}(1-q)
          ...
          -x(N-2)  ...  -x(N-p-1)  \hat{u}(N-2)  ...  \hat{u}(N-q-1) ].

LS Solution (real signals):
    x = H\theta + \hat{u},
    \hat{\theta} = (H^T H)^{-1} H^T (x - \hat{u}).

Remarks:
Any elements in H that are unknown are set to zero.
The QR method may be used to solve the LS problem.

Step 3:
    \hat{P}(\omega) = \hat{\sigma}^2\,
      \frac{\left| 1 + \hat{b}_1 e^{-j\omega} + ... + \hat{b}_q e^{-jq\omega} \right|^2}
           {\left| 1 + \hat{a}_1 e^{-j\omega} + ... + \hat{a}_p e^{-jp\omega} \right|^2}.

Remark: The difficult case for this method is when the ARMA zeroes are
near the unit circle.

Further Topics on AR Signals:


Linear Prediction of AR Processes
Forward Linear Prediction:
[Figure: past samples x(n-1), x(n-2), ... are used to predict x(n);
e^f(n) = x(n) - \hat{x}^f(n).]

    \hat{x}^f(n) = -\sum_{i=1}^{m} a_i^f x(n-i),
    e^f(n) = x(n) - \hat{x}^f(n).

Goal: minimize
    \sigma_f^2 = E\{ [e^f(n)]^2 \}
      = E\left\{ \left[ x(n) + \sum_{i=1}^{m} a_i^f x(n-i) \right]^2 \right\}
      = r_{xx}(0) + 2\sum_{i=1}^{m} a_i^f r_{xx}(i)
        + \sum_{i=1}^{m}\sum_{j=1}^{m} a_i^f a_j^f\, r_{xx}(j-i).

Setting \frac{\partial \sigma_f^2}{\partial a_i^f} = 0:
    r_{xx}(i) + \sum_{j=1}^{m} a_j^f\, r_{xx}(j-i) = 0,   i = 1, ..., m,

i.e.,
    [ r_{xx}(0)  r_{xx}(1)    ...  r_{xx}(m)        [ 1           [ \sigma_f^2
      r_{xx}(1)  r_{xx}(0)    ...  r_{xx}(m-1)        a_1^f         0
      ...                                      ]  ·   ...       =   ...
      r_{xx}(m)  r_{xx}(m-1)  ...  r_{xx}(0)  ]       a_m^f ]       0        ].

Remarks: This is exactly the YW equation.
\sigma_f^2 decreases as m increases, leveling off at m = p.
[Figure: \sigma_f^2 vs. m, flat beyond m = p.]

Backward Linear Prediction
[Figure: future samples x(n+1), x(n+2), ... are used to predict x(n).]

    \hat{x}^b(n) = -\sum_{i=1}^{m} a_i^b x(n+i),
    e^b(n) = x(n-m) - \hat{x}^b(n-m),
    \sigma_b^2 = E\{ [e^b(n)]^2 \}.

To minimize \sigma_b^2, we obtain

    [ r_{xx}(0)  ...  r_{xx}(m)        [ 1           [ \sigma_b^2
      ...                       ]  ·     a_1^b    =    0
      r_{xx}(m)  ...  r_{xx}(0) ]        ...           ...
                                         a_m^b ]       0        ],
so (for real data)
    a_i^f = a_i^b  for all i,   \sigma_f^2 = \sigma_b^2.

Consider an AR(p) model and the notation in the LDA. For m = 1, 2, ..., p,
let

    e_m^f(n) = x(n) + \sum_{i=1}^{m} a_{m,i} x(n-i)
             = [x(n)  x(n-1)  ...  x(n-m)] [1; \theta_m],

    e_m^b(n) = x(n-m) + \sum_{i=1}^{m} a_{m,i} x(n-m+i)
             = [x(n-m)  x(n-m+1)  ...  x(n)] [1; \theta_m]
             = [x(n)  ...  x(n-m+1)  x(n-m)] [\tilde{\theta}_m; 1].

Recall the LDA:
    \theta_m = [\theta_{m-1}; 0] + k_m [\tilde{\theta}_{m-1}; 1],
so
    e_m^f(n) = [x(n)  x(n-1)  ...  x(n-m)] [1; \theta_{m-1}; 0]
               + k_m [x(n)  x(n-1)  ...  x(n-m)] [0; \tilde{\theta}_{m-1}; 1]
             = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1).

Similarly,
    e_m^f(n) = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1),
    e_m^b(n) = e_{m-1}^b(n-1) + k_m\, e_{m-1}^f(n).

Lattice Filter for Linear Prediction Errors
[Figure: lattice filter; stage m combines e_{m-1}^f(n) and the delayed
e_{m-1}^b(n-1) through the reflection coefficient k_m to produce e_m^f(n)
and e_m^b(n).]

Remarks: The implementation advantage of lattice filters is that they
suffer from less round-off noise and are less sensitive to coefficient
errors.
If x(n) is AR(p) and m = p, then
    x(n) -> 1 + a_1 z^{-1} + ... + a_p z^{-p} -> u(n)
is a whitening filter.

AR Spectral Estimation Methods


Autocorrelation or Yule-Walker Method: Recall that the YW equation may be
obtained by minimizing
    E\{ e^2(n) \} = E\{ [x(n) - \hat{x}(n)]^2 \},
where
    \hat{x}(n) = -\sum_{k=1}^{p} a_k x(n-k).
The autocorrelation or YW method replaces r(k) in the YW equation with the
biased estimate \hat{r}(k):

    [ \hat{r}(0)    ...  \hat{r}(p-1)      [ \hat{a}_1         [ \hat{r}(1)
      ...                            ]  ·    ...          = -    ...
      \hat{r}(p-1)  ...  \hat{r}(0) ]        \hat{a}_p ]         \hat{r}(p) ].

Covariance or Prony Method


Consider the AR(p) signal
    x(n) = -\sum_{k=1}^{p} a_k x(n-k) + u(n),   n = 0, 1, ..., N-1.
In matrix form,

    [ x(p)         [ x(p-1)  x(p-2)  ...  x(0)           [ a_1       [ u(p)
      x(p+1)   = -   x(p)    x(p+1)  ...  x(1)        ] ·  ...    +    u(p+1)
      ...            ...                                   a_p ]       ...
      x(N-1) ]       x(N-2)  ...          x(N-p-1) ]                   u(N-1) ].

The Prony method finds the LS solution to the overdetermined equation

    [ x(p-1)  ...  x(0)            [ a_1            [ x(p)
      ...                   ]   ·    ...      ~= -    ...
      x(N-2)  ...  x(N-p-1) ]        a_p ]           x(N-1) ].

Remarks:
The Covariance or Prony method minimizes
    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} \hat{u}^2(n)
      = \frac{1}{N-p} \sum_{n=p}^{N-1}
        \left[ x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right]^2.
The Autocorrelation or YW method minimizes
    \hat{\sigma}^2 = \frac{1}{N} \sum_{n=-\infty}^{\infty}
        \left[ x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right]^2,
where those x(n) that are NOT available are set to zero.
For large N, the YW and Prony methods yield similar results.
For small N, the YW method gives poor performance. The Prony method can
give good estimates \hat{a}_1, ..., \hat{a}_p for small N; the Prony method
gives exact estimates for x(n) = a sum of sinusoids.
Since the biased \hat{r}(k) are used in the YW method, the estimated poles
are inside the unit circle. The Prony method does not guarantee stability.
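A minimal NumPy sketch of the covariance (Prony) estimate (function name is
mine):

```python
import numpy as np

def prony_ar(x, p):
    """Covariance (Prony) AR estimate: LS fit of x(n) on its p past
    values over n = p..N-1 (no zero padding of the data)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Columns are x(n-1), ..., x(n-p) for n = p..N-1.
    H = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])
    a = np.linalg.lstsq(H, -x[p:N], rcond=None)[0]
    u = x[p:N] + H @ a                 # prediction errors
    sigma2 = np.dot(u, u) / (N - p)
    return a, sigma2
```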

Modified Covariance or Forward-Backward (F/B) Method


Recall backward linear prediction:
    x(n) = -\sum_{k=1}^{p} a_k^b x(n+k) + e^b(n).
For real data and real AR coefficients, a_k^f = a_k^b = a_k, k = 1, ..., p:

    [ x(1)    x(2)  ...  x(p)          [ a_1            [ x(0)
      x(2)    x(3)  ...  x(p+1)    ] ·   ...      ~= -    x(1)
      ...                                a_p ]            ...
      x(N-p)  ...        x(N-1)   ]                       x(N-p-1) ].

In the F/B method, this backward prediction equation is combined with the
forward prediction equation and the LS solution is found:

    [ x(p-1)  ...  x(0)                           [ x(p)
      ...                                           ...
      x(N-2)  ...  x(N-p-1)      [ a_1              x(N-1)
      x(1)    ...  x(p)       ] ·  ...      ~=  -   x(0)
      ...                          a_p ]            ...
      x(N-p)  ...  x(N-1)   ]                       x(N-p-1) ].

Remark: The F/B method does not guarantee poles inside the unit circle;
in practice, the poles are usually inside the unit circle.

For complex data and a complex model,
    a_k^f = (a_k^b)^* = a_k,   k = 1, ..., p.
Then F/B solves:

    [ x(p-1)     ...  x(0)                          [ x(p)
      ...                                             ...
      x(N-2)     ...  x(N-p-1)     [ a_1              x(N-1)
      x^*(1)     ...  x^*(p)    ] ·  ...      ~= -    x^*(0)
      ...                            a_p ]            ...
      x^*(N-p)   ...  x^*(N-1) ]                      x^*(N-p-1) ].

Remarks on \hat{\sigma}^2:
In the YW method,
    \hat{\sigma}^2 = \hat{r}(0) + \sum_{k=1}^{p} \hat{a}_k \hat{r}(k).
In the Prony method, with
    e_{LS} = [e(p); ...; e(N-1)]:
    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} |e(n)|^2.
In the F/B method, with
    e_{LS} = [e^f(p); ...; e^f(N-1); e^b(0); ...; e^b(N-p-1)]:
    \hat{\sigma}^2 = \frac{1}{2(N-p)}
      \left\{ \sum_{n=p}^{N-1} |e^f(n)|^2 + \sum_{n=0}^{N-p-1} |e^b(n)|^2 \right\}.

Burg Method
Consider real data and a real model. Recall the LDA:
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1].
Thus, if we know \theta_n and k_{n+1}, we can find \theta_{n+1}.
Recall (**):
    e_m^f(n) = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1),
    e_m^b(n) = e_{m-1}^b(n-1) + k_m\, e_{m-1}^f(n),
where
    e_{m-1}^f(n) = x(n) + \sum_{k=1}^{m-1} \hat{a}_{m-1,k}\, x(n-k),
    e_{m-1}^b(n) = x(n-m+1) + \sum_{k=1}^{m-1} \hat{a}_{m-1,k}\, x(n-m+1+k).

k_m is found by minimizing (for \theta_{m-1} given)
    \frac{1}{2} \sum_{n=m}^{N-1} \left\{ [e_m^f(n)]^2 + [e_m^b(n)]^2 \right\},
which gives
    \hat{k}_m = -\frac{2 \sum_{n=m}^{N-1} e_{m-1}^f(n)\, e_{m-1}^b(n-1)}
                     {\sum_{n=m}^{N-1} \left\{ [e_{m-1}^f(n)]^2 + [e_{m-1}^b(n-1)]^2 \right\}}.   (*)

Steps in the Burg method:


Initialization:
    \hat{r}(0) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n),   \hat{\sigma}_0^2 = \hat{r}(0),
    e_0^f(n) = x(n),   n = 1, 2, ..., N-1,
    e_0^b(n) = x(n),   n = 0, 1, ..., N-2.

For m = 1, 2, ..., p:
    Calculate \hat{k}_m with (*);
    \hat{\sigma}_m^2 = \hat{\sigma}_{m-1}^2 (1 - \hat{k}_m^2);
    \theta_m = [\theta_{m-1}; 0] + \hat{k}_m [\tilde{\theta}_{m-1}; 1]   (\theta_1 = \hat{k}_1);
    update e_m^f(n) and e_m^b(n) with (**).

Finally, \hat{\sigma}^2 = \hat{\sigma}_p^2.
Remarks: Since a^2 + b^2 \ge 2|ab|, we have |\hat{k}_m| \le 1, so the Burg
method gives poles that are inside the unit circle.
Different ways of calculating \hat{k}_m are available.
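A minimal NumPy sketch of these steps (function name mine; each lattice
stage drops one sample from the error arrays):

```python
import numpy as np

def burg_ar(x, p):
    """Burg's method: reflection coefficients from forward/backward
    prediction errors; AR polynomial built by the Levinson recursion."""
    x = np.asarray(x, dtype=float)
    ef = x.copy()                      # e_0^f(n)
    eb = x.copy()                      # e_0^b(n)
    theta = np.array([])
    sigma2 = np.dot(x, x) / len(x)     # sigma_0^2 = r(0)
    for m in range(1, p + 1):
        f = ef[1:]                     # e_{m-1}^f(n) over the valid range
        b = eb[:-1]                    # e_{m-1}^b(n-1)
        km = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        theta = (np.concatenate([theta, [0.0]])
                 + km * np.concatenate([theta[::-1], [1.0]]))
        sigma2 *= (1.0 - km ** 2)
        ef, eb = f + km * b, b + km * f   # lattice update, order m
    return theta, sigma2
```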

Properties of AR(p) Signals:


Extension of r(k):
* Given r(0), r(1), ..., r(p).
* From the YW equations we can calculate a_1, a_2, ..., a_p, \sigma^2.
* r(k) = -\sum_{l=1}^{p} a_l\, r(k-l),   k > p.

Another point of view:
* Given r(0), ..., r(p).
* Calculate a_1, ..., a_p, \sigma^2.
* Obtain P(\omega).
* r(k) <--DTFT--> P(\omega).

Maximum Entropy Spectral Estimation


Given r(0), ..., r(p), the remaining r(p+1), ... are extrapolated to
maximize entropy.
Entropy: Let the sample space for a discrete random variable x be
x_1, ..., x_N. The entropy H(x) is
    H(x) = -\sum_{i=1}^{N} P(x_i) \ln P(x_i),   P(x_i) = prob(x = x_i).
For a continuous random variable,
    H(x) = -\int f(x) \ln f(x)\, dx,   f(x) = pdf of x.

For Gaussian random variables,
    x = [x(0); ...; x(N-1)] ~ N(0, R_N),
    H_N = \frac{1}{2} \ln(\det R_N)   (plus a constant).
Since H_N \to \infty as N \to \infty, we consider the entropy rate:
    h = \lim_{N\to\infty} \frac{H_N}{N+1}.
h is maximized with respect to r(p+1), r(p+2), ....
Remark: For the Gaussian case, we obtain the Yule-Walker equations .... !

Maximum Likelihood Estimators:


Exact ML Estimator:
[Diagram: real input u(n) -> 1/A(z) -> real output x(n), n = 0, ..., N-1.]
u(n) is Gaussian white noise with zero mean:
    E[u(n)] = 0,   Var[u(n)] = \sigma^2,   E[u(i)u(j)] = 0, i \ne j.
The likelihood function is
    f = f(x(0), ..., x(N-1) \,|\, a_1, ..., a_p, \sigma^2).
The ML estimates of a_1, ..., a_p, \sigma^2 are found by maximizing f.

    f = f(x(p), ..., x(N-1) \,|\, x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2)
        \cdot f(x(0), ..., x(p-1) \,|\, a_1, ..., a_p, \sigma^2).

* Consider first f_1 = f(x(0), ..., x(p-1) | a_1, ..., a_p, \sigma^2):
    f_1 = \frac{1}{(2\pi)^{p/2} \det^{1/2}(R_p)}
          \exp\left( -\frac{1}{2} x_0^T R_p^{-1} x_0 \right),
    x_0 = [x(0); ...; x(p-1)],
    R_p = [ r(0)    ...  r(p-1)
            ...
            r(p-1)  ...  r(0)   ].

Remark: r(0), ..., r(p-1) are functions of a_1, ..., a_p, \sigma^2 (see,
e.g., the YW system of equations).

* Consider next
    f_2 = f(x(p), ..., x(N-1) \,|\, x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2).
Since
    x(n) + \sum_{k=1}^{p} a_k x(n-k) = u(n):
    u(p)   = x(p) + a_1 x(p-1) + ... + a_p x(0),
    u(p+1) = x(p+1) + a_1 x(p) + ... + a_p x(1),
    ...
    u(N-1) = x(N-1) + a_1 x(N-2) + ... + a_p x(N-p-1).

Let
    u = [u(p); ...; u(N-1)],   x = [x(p); ...; x(N-1)].
Given x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2, x and u are related by a
linear transformation. The Jacobian of the transformation,

    J = [ 1    0    ...  0
          a_1  1    ...  0
          ...
          ...  a_p  ...  1 ],

is lower triangular with a unit diagonal, so det(J) = 1.

0
..

f (u) =

1
(2 2 )

f2

N p
2



1 T
exp 2 u u
2

= f [u(x)] |det(J)|
= f [u(x)].

Let

x(p)
x(p 1)
x(0)

x(p)

x(1)
x(p + 1)
X=
..

x(N 1) x(N 2) x(N p 1)

179

a=

f2 =

1
a1
..
.
ap

u = Xa

 1 T T
1
X X
a .
N p exp 2 2 a

(2 2 )

Remark: Maximizing f = f1 .f2 with respect to a1 , , ap , 2 is


highly non-linear!

180

An Approximate ML Estimator


\hat{a}_1, ..., \hat{a}_p, \hat{\sigma}^2 are found by maximizing f_2 only.
\hat{a}_1, ..., \hat{a}_p are found by minimizing
    a^T X^T X a = u^T u:

    [ x(p-1)  ...  x(0)           [ a_1        [ x(p)          [ u(p)
      ...                   ]  ·    ...     +    ...       =     ...
      x(N-2)  ...  x(N-p-1) ]       a_p ]       x(N-1) ]        u(N-1) ].

This is exactly Prony's method!

    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1}
      \left[ x(n) + \sum_{j=1}^{p} \hat{a}_j x(n-j) \right]^2.
Again, exactly Prony's method!

Accuracy of AR PSD Estimators


Accuracy analysis is difficult.
Results for large N are available due to the Central Limit Theorem.
For large N, the variances of \hat{a}_1, ..., \hat{a}_p,
\hat{k}_1, ..., \hat{k}_p, \hat{\sigma}^2, and \hat{P}(\omega)
are all proportional to \frac{1}{N}; the biases are also \propto \frac{1}{N}.

AR Model Order Selection


Remarks:
Too low an order yields a smoothed/biased PSD estimate.
Too high an order yields spurious peaks / large variance in the PSD
estimate.
Almost all model order estimators are based on an estimate of the power of
the linear prediction error, denoted \hat{\sigma}_k^2, where k is the model
order chosen.

Final Prediction Error (FPE) Method: minimize
    FPE(k) = \frac{N+k}{N-k}\, \hat{\sigma}_k^2.

Akaike Information Criterion (AIC) Method: minimize
    AIC(k) = N \ln \hat{\sigma}_k^2 + 2k.

Remarks:
As N \to \infty, AIC's probability of error in choosing the correct order
does NOT \to 0.
As N \to \infty, AIC tends to overestimate the model order.

Minimum Description Length (MDL) Criterion: minimize
    MDL(k) = N \ln \hat{\sigma}_k^2 + k \ln N.

Remark: As N \to \infty, MDL's probability of error \to 0 (consistent!).

Criterion Autoregressive Transfer (CAT) Method: minimize
    CAT(k) = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{\bar{\sigma}_i^2} - \frac{1}{\bar{\sigma}_k^2},
    \bar{\sigma}_i^2 = \frac{N}{N-i}\, \hat{\sigma}_i^2.

Remarks: None of the above methods works well for small N.
Use these methods to obtain initial order estimates (practical experience
is needed).
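A minimal sketch of AIC/MDL order selection, reusing the levinson_durbin
sketch given earlier to produce the prediction error powers (function name
and defaults mine):

```python
import numpy as np

def select_ar_order(x, kmax, criterion="MDL"):
    """Fit AR(k) for k = 1..kmax via Levinson on the biased ACF and
    minimize the chosen information criterion."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(kmax + 1)])
    scores = []
    for k in range(1, kmax + 1):
        _, s2, _ = levinson_durbin(r, k)   # prediction error power sigma_k^2
        pen = 2 * k if criterion == "AIC" else k * np.log(N)
        scores.append(N * np.log(s2) + pen)
    return 1 + int(np.argmin(scores))
```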

Noisy AR Processes:
    y(n) = x(n) + w(n),
x(n) = AR(p) process,
w(n) = white Gaussian noise with zero mean and variance \sigma_w^2,
x(n) and w(n) independent of each other.

    P_{yy}(\omega) = P_{xx}(\omega) + P_{ww}(\omega)
                   = \frac{\sigma^2}{|A(\omega)|^2} + \sigma_w^2
                   = \frac{\sigma^2 + \sigma_w^2 |A(\omega)|^2}{|A(\omega)|^2}.

Remark: y(n) is an ARMA signal.
a_1, ..., a_p, \sigma^2, \sigma_w^2 may be estimated by
* ARMA methods;
* a large-order AR approximation;
* compensating for the effect of w(n);
* bootstrap or adaptive filtering and AR methods.

Wiener Filter (Wiener-Hopf Filter):
[Diagram: y(n) = x(n) + w(n) -> H(z) -> \hat{x}(n);
e(n) = x(n) - \hat{x}(n), where x(n) is the desired signal.]
H(z) is found by minimizing E\{ |e(n)|^2 \}.
H(z) depends on knowing P_{xy}(\omega).

General Filtering Problem (Complex Signals):
[Diagram: y(n) = x(n) + w(n) -> H(z) -> \hat{d}(n);
e(n) = d(n) - \hat{d}(n), where d(n) is the desired signal.]

Special case d(n) = x(n + m):
1) m > 0:  m-step-ahead prediction;
2) m = 0:  filtering problem;
3) m < 0:  smoothing problem.

Three common filters:
1) General non-causal:
    H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}.
2) General causal:
    H(z) = \sum_{k=0}^{\infty} h_k z^{-k}.
3) Finite Impulse Response (FIR):
    H(z) = \sum_{k=0}^{p} h_k z^{-k}.

Case 1: Non-causal Filter.

    E = E\{ |e(n)|^2 \}
      = E\left\{ \left[ d(n) - \sum_{k=-\infty}^{\infty} h_k y(n-k) \right]
                 \left[ d(n) - \sum_{l=-\infty}^{\infty} h_l y(n-l) \right]^* \right\}
      = r_{dd}(0) - \sum_{l=-\infty}^{\infty} h_l^*\, r_{dy}(l)
        - \sum_{k=-\infty}^{\infty} h_k\, r_{dy}^*(k)
        + \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} r_{yy}(l-k)\, h_k h_l^*.

Remark: For causal and FIR filters, only the limits of the sums differ.
Let h_i = \alpha_i + j\beta_i and set
    \frac{\partial E}{\partial \alpha_i} = 0,   \frac{\partial E}{\partial \beta_i} = 0:
    r_{dy}(i) = \sum_{k=-\infty}^{\infty} h_k^o\, r_{yy}(i-k).

In the Z-domain:
    P_{dy}(z) = H^o(z)\, P_{yy}(z),
which gives the optimum non-causal Wiener filter.

Ex: d(n) = x(n),   y(n) = x(n) + w(n),
    P_{xx}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)},   P_{ww}(z) = 1,
x(n) and w(n) uncorrelated. Optimal filter?

    P_{yy}(z) = P_{xx}(z) + P_{ww}(z)
              = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)} + 1
              = 1.6\, \frac{(1-0.5z^{-1})(1-0.5z)}{(1-0.8z^{-1})(1-0.8z)}.

    r_{dy}(k) = E[d(n+k)\, y^*(n)] = E\{ x(n+k) [x^*(n) + w^*(n)] \} = r_{xx}(k),
    P_{dy}(z) = P_{xx}(z),
    H^o(z) = \frac{P_{dy}(z)}{P_{yy}(z)} = \frac{0.36}{1.6\,(1-0.5z^{-1})(1-0.5z)},
    h^o(k) = 0.3 \left(\frac{1}{2}\right)^{|k|}.
[Figure: h^o(k), a symmetric two-sided exponential with peak 0.3 at k = 0.]

Case 2: Causal Filter.
    H(z) = \sum_{k=0}^{\infty} h_k z^{-k}.
Through derivations similar to Case 1, we have
    r_{dy}(i) = \sum_{k=0}^{\infty} h_k^o\, r_{yy}(i-k),   i \ge 0;   h_k^o = ?

Split H(z) as H(z) = B(z) G(z).
Pick B(z) such that B(z) is stable, causal, and minimum phase, with
    P_{\epsilon}(z) = P_{yy}(z)\, B(z)\, B^*\!\left(\frac{1}{z^*}\right) = 1,
where \epsilon(n) is the output of B(z) driven by y(n); B(z) is called the
whitening filter.

Choose G(z) so that E\{|e(n)|^2\} is minimized:
    r_{d\epsilon}(i) = \sum_{k=0}^{\infty} g_k\, r_{\epsilon\epsilon}(i-k).
Since P_{\epsilon}(z) = 1, r_{\epsilon\epsilon}(k) = \delta(k), so
    r_{d\epsilon}(i) = g_i,   i = 0, 1, 2, ...,
and h_i is the convolution of g_i and b_i.

Note that
    r_{d\epsilon}(i) = E\{ d(n+i)\, \epsilon^*(n) \}
      = E\left\{ d(n+i) \left[ \sum_{k=0}^{\infty} b_k y(n-k) \right]^* \right\}
      = \sum_{k=0}^{\infty} b_k^*\, r_{dy}(i+k).
Since b_k = 0 for k < 0 (causal),
    P_{d\epsilon}(z) = P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right),
but r_{d\epsilon}(i) = g_i for i = 0, 1, ... ONLY.

Let
    [X(z)]_+ = \left[ \sum_{k=-\infty}^{\infty} x_k z^{-k} \right]_+
             = \sum_{k=0}^{\infty} x_k z^{-k}.
Then
    G(z) = \sum_{k=0}^{\infty} g_k z^{-k}
         = \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+,
and the optimum causal Wiener filter is
    H(z) = B(z)\, G(z)
         = B(z) \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+.



Ex. (Same as the previous one.)
    P_{xx}(z) = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)},   P_{ww}(z) = 1,
x(n) and w(n) independent.
[Diagram: x(n) + w(n) -> causal H(z) -> \hat{x}(n); e(n).]

    P_{dy}(z) = P_{xy}(z) = P_{xx}(z),
    P_{yy}(z) = 1.6\, \frac{(1-0.5z^{-1})(1-0.5z)}{(1-0.8z^{-1})(1-0.8z)},
    B(z) = \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z^{-1}}{1-0.5z^{-1}}
    (stable and causal).

    P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right)
      = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)} \cdot \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z}{1-0.5z}
      = \frac{0.36}{\sqrt{1.6}} \cdot \frac{1}{(1-0.8z^{-1})(1-0.5z)}
      = \frac{0.36}{\sqrt{1.6}} \left[ \frac{5/3}{1-0.8z^{-1}} + \frac{\frac{5}{6}z}{1-0.5z} \right],

    G^o(z) = \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+
           = \frac{0.36}{\sqrt{1.6}} \cdot \frac{5/3}{1-0.8z^{-1}},

    H^o(z) = B(z)\, G^o(z)
           = \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z^{-1}}{1-0.5z^{-1}}
             \cdot \frac{0.36}{\sqrt{1.6}}\, \frac{5/3}{1-0.8z^{-1}}
           = 0.375\, \frac{1}{1-0.5z^{-1}},

    h^o(k) = \frac{3}{8} \left(\frac{1}{2}\right)^k U(k),   k = 0, 1, 2, ....

Case 3: FIR Filter:
    H(z) = \sum_{k=0}^{p} h_k z^{-k}.
Again, we can show similarly that
    r_{dy}(i) = \sum_{k=0}^{p} h_k^o\, r_{yy}(i-k),   i = 0, ..., p:

    [ r_{dy}(0)        [ r_{yy}(0)    r_{yy}^*(1)  ...  r_{yy}^*(p)        [ h_0^o
      r_{dy}(1)     =    r_{yy}(1)    r_{yy}(0)    ...  r_{yy}^*(p-1)   ·    h_1^o
      ...                ...                                                 ...
      r_{dy}(p) ]        r_{yy}(p)    r_{yy}(p-1)  ...  r_{yy}(0)     ]      h_p^o ].

Remark: The minimum error E is the smallest in Case (1) and the largest in
Case (3).
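A minimal NumPy sketch of the FIR case, applied to the example above, where
r_xx(k) = 0.8^|k| and the noise is unit white (function name and filter
order are mine, real signals assumed):

```python
import numpy as np

def fir_wiener(r_yy, r_dy):
    """Order-p FIR Wiener filter: solve the (p+1)x(p+1) Toeplitz system
    R_yy h = r_dy built from r_yy(0..p) and r_dy(0..p) (real signals)."""
    p = len(r_dy) - 1
    R = np.array([[r_yy[abs(i - k)] for k in range(p + 1)]
                  for i in range(p + 1)])
    return np.linalg.solve(R, r_dy)

# d(n) = x(n): r_yy(k) = r_xx(k) + delta(k), r_dy(k) = r_xx(k).
p = 10
r_xx = 0.8 ** np.arange(p + 1)
r_yy = r_xx.copy()
r_yy[0] += 1.0
h = fir_wiener(r_yy, r_xx)   # close to the causal solution 0.375 * 0.5^k
```

For large p, a Levinson-type solver exploiting the Toeplitz structure is
preferred over a dense solve.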

Parametric Methods for Line Spectra


    y(n) = x(n) + w(n),
    x(n) = \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)},
\varphi_k = initial phases, independent of each other, uniformly
distributed on [-\pi, \pi];
\alpha_k = amplitudes, constants, > 0;
\omega_k = angular frequencies;
w(n) = zero-mean white Gaussian noise, independent of \varphi_1, ..., \varphi_K.

Remarks:
Applications: Radar, Communications, ....
We are mostly interested in estimating \omega_1, ..., \omega_K.
Once \hat{\omega}_1, ..., \hat{\omega}_K are estimated,
\hat{\alpha}_1, ..., \hat{\alpha}_K and \hat{\varphi}_1, ..., \hat{\varphi}_K
can be found readily. Let \beta_k = \alpha_k e^{j\varphi_k}; then

    [ y(0)            [ 1                   ...  1                     [ \beta_1
      y(1)        ~=    e^{j\omega_1}       ...  e^{j\omega_K}      ] ·  \beta_2
      ...               ...                                              ...
      y(N-1) ]          e^{j(N-1)\omega_1}  ...  e^{j(N-1)\omega_K} ]    \beta_K ].

The amplitude of \beta_k is \alpha_k; the phase of \beta_k is \varphi_k.

Remarks:
    r_{yy}(k) = E\{ y^*(n)\, y(n+k) \}
              = \sum_{i=1}^{K} \alpha_i^2 e^{j\omega_i k} + \sigma^2 \delta(k),
    P_{yy}(\omega) = 2\pi \sum_{i=1}^{K} \alpha_i^2\, \delta(\omega - \omega_i) + \sigma^2.
[Figure: line spectrum plus the white-noise floor.]

Recall that the resolution limit of the periodogram is \frac{1}{N}.
The parametric methods below have resolution better than \frac{1}{N}.
(These methods are the so-called High-Resolution or Super-Resolution
methods.)

Maximum Likelihood Estimator


w(n) is assumed to be a zero-mean circularly symmetric complex Gaussian
random variable with variance \sigma^2. The pdf of w(n) is
    f(w(n)) = \frac{1}{\pi\sigma^2} \exp\left( -\frac{|w(n)|^2}{\sigma^2} \right).

Remark: The real and imaginary parts of w(n) are real Gaussian random
variables with zero mean and variance \frac{\sigma^2}{2}, and the two parts
are independent of each other.

    f(w(0), ..., w(N-1)) = \frac{1}{(\pi\sigma^2)^N}
      \exp\left\{ -\frac{\sum_{n=0}^{N-1} |w(n)|^2}{\sigma^2} \right\}.

The likelihood function of y(0), ..., y(N-1) is
    f = \frac{1}{(\pi\sigma^2)^N}
      \exp\left\{ -\frac{\sum_{n=0}^{N-1} |y(n) - x(n)|^2}{\sigma^2} \right\}.

Remark: The ML estimates of \omega_1, ..., \omega_K, \alpha_1, ..., \alpha_K,
\varphi_1, ..., \varphi_K are found by maximizing f with respect to these
parameters. Equivalently, we minimize
    g = \sum_{n=0}^{N-1} \left| y(n) - \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)} \right|^2.

Remarks: If w(n) is neither Gaussian nor white, minimizing g is called the
non-linear least-squares method, in general.
Let
    y = [y(0); ...; y(N-1)],   \beta = [\beta_1; ...; \beta_K],
    \omega = [\omega_1; ...; \omega_K],
    B = [ 1                   ...  1
          e^{j\omega_1}       ...  e^{j\omega_K}
          ...
          e^{j(N-1)\omega_1}  ...  e^{j(N-1)\omega_K} ].

Then
    g = (y - B\beta)^H (y - B\beta)
      = \left[ \beta - (B^H B)^{-1} B^H y \right]^H (B^H B)
        \left[ \beta - (B^H B)^{-1} B^H y \right]
        + y^H y - y^H B (B^H B)^{-1} B^H y.

Hence
    \hat{\omega} = \arg\max_{\omega}\; y^H B (B^H B)^{-1} B^H y,
    \hat{\beta} = (B^H B)^{-1} B^H y \big|_{\omega=\hat{\omega}}.
\hat{\omega} is a consistent estimate of \omega.

Remarks: For large N,
    E\{ (\hat{\omega} - \omega)(\hat{\omega} - \omega)^T \}
      \approx \frac{6\sigma^2}{N^3}\,
      diag\!\left( \frac{1}{\alpha_1^2}, ..., \frac{1}{\alpha_K^2} \right)   (the CRB).

However, the maximization to obtain \hat{\omega} is difficult to implement:
* the search may not find the global maximum;
* it is computationally expensive.

Special Cases:
1) K = 1:
    \hat{\omega} = \arg\max_{\omega} \underbrace{y^H B (B^H B)^{-1} B^H y}_{g_1},
    B = [1; e^{j\omega}; ...; e^{j(N-1)\omega}],   B^H B = N,
    B^H y = \sum_{n=0}^{N-1} y(n) e^{-j\omega n},
so
    \hat{\omega} = \arg\max_{\omega} \frac{1}{N}
      \left| \sum_{n=0}^{N-1} y(n) e^{-j\omega n} \right|^2,
which corresponds to the highest peak of the periodogram!

2) K > 1, with the frequency separation condition
    \Delta\omega = \inf_{i \ne k} |\omega_i - \omega_k| > \frac{2\pi}{N}.
Since Var(\hat{\omega}_k - \omega_k) \sim \frac{1}{N^3},
    |\hat{\omega}_k - \omega_k| \sim \frac{1}{N^{3/2}} << \frac{2\pi}{N},
so we can resolve all K sine waves by evaluating g_1 at the FFT points
    \bar{\omega}_i = \frac{2\pi}{N}\, i,   i = 0, ..., N-1.
For any K of these points, B^H B = N I (I = identity matrix), which gives
    g_1 = \sum_{k=1}^{K} \frac{1}{N}
      \left| \sum_{n=0}^{N-1} y(n) e^{-j\bar{\omega}_k n} \right|^2.
The K points \bar{\omega}_i that maximize g_1 correspond to the K largest
peaks of the periodogram.

Remarks: The \omega_k estimates obtained by using the K largest peaks of
the periodogram have accuracy |\hat{\omega}_k - \omega_k| \le \frac{2\pi}{N}.
The periodogram is a good frequency estimator. (This was introduced by
Schuster a century ago!)
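A minimal NumPy sketch of this peak-picking frequency estimator (the
function name, zero-padding length, and two-sinusoid test signal are
illustrative):

```python
import numpy as np

def freqs_from_periodogram(y, K, nfft=4096):
    """Frequency estimates = locations of the K largest periodogram peaks
    (zero padding refines the grid, not the resolution)."""
    P = np.abs(np.fft.fft(y, n=nfft)) ** 2 / len(y)
    # Local maxima on the circular frequency grid.
    peaks = np.flatnonzero((P > np.roll(P, 1)) & (P > np.roll(P, -1)))
    top = peaks[np.argsort(P[peaks])[-K:]]
    return np.sort(2 * np.pi * top / nfft)

# Two complex sinusoids in noise (illustrative parameters).
rng = np.random.default_rng(1)
n = np.arange(128)
y = np.exp(1j * 0.9 * n) + 0.8 * np.exp(1j * (1.6 * n + 0.7))
y = y + 0.1 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
print(freqs_from_periodogram(y, K=2))   # close to [0.9, 1.6]
```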

High-Resolution Methods


Statistical performance close to the ML estimator (or the CRB).
Avoid a multidimensional search over the parameter space.
Do not depend on the resolution condition.
All provide consistent estimates.
All give similar performance, especially for large N.
The method of choice is a matter of taste.

Higher-Order Yule-Walker (HOYW) Method:


Let x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)}. Then
    (1 - e^{j\omega_k} z^{-1})\, x_k(n)
      = x_k(n) - e^{j\omega_k} x_k(n-1)
      = \alpha_k e^{j(\omega_k n + \varphi_k)}
        - e^{j\omega_k} \alpha_k e^{j[\omega_k(n-1) + \varphi_k]} = 0,
so (1 - e^{j\omega_k} z^{-1}) is an annihilating filter for x_k(n).

Let A(z) = \prod_{k=1}^{K} (1 - e^{j\omega_k} z^{-1}). Then
    A(z)\, x(n) = 0,
and with y(n) = x(n) + w(n),
    A(z)\, y(n) = A(z)\, w(n).   (*)

Remark:
It is tempting to cancel A(z) from both sides above, but this is wrong
since y(n) \ne w(n)!

Multiplying both sides of (*) by a polynomial \bar{A}(z) of order L - K
gives
    (1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L})\, y(n)
      = (1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L})\, w(n),
where
    1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L} = A(z)\bar{A}(z),
i.e.,
    [y(n)  y(n-1)  ...  y(n-L)] [1; \bar{a}_1; ...; \bar{a}_L]
      = w(n) + ... + \bar{a}_L w(n-L).

Multiplying both sides by [y^*(n-L-1); ...; y^*(n-L-M)] and taking
expectations (the right side is uncorrelated with these past samples), we
get

    [ r_{yy}(L)      r_{yy}(L-1)  ...  r_{yy}(1)       [ \bar{a}_1         [ r_{yy}(L+1)
      r_{yy}(L+1)    r_{yy}(L)    ...  r_{yy}(2)   ] ·   ...          = -    ...
      ...                                                \bar{a}_L ]        r_{yy}(L+M) ],
      r_{yy}(L+M-1)  ...               r_{yy}(M) ]

i.e.,   \Omega\, \bar{a} = -\rho.

Remarks:
When y(0), ..., y(N-1) are the only data available, we first estimate
r_{yy}(i) and replace r_{yy}(i) in the above equation with the estimate
\hat{r}_{yy}(i).
{\hat{\omega}_k} are the angular positions of the K roots of
1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L} nearest the unit circle.
Increasing L and M will give better performance, due to using the
information in higher lags of r(i).
Increasing L and M too much will give worse performance, due to the
increased variance in \hat{r}(i) for large i.

\Omega has rank K if M \ge K and L \ge K.

Proof: Let
    \bar{y}_i(n) = [y(n); y(n-1); ...; y(n-i+1)],
    \bar{x}(n) = [x_1(n); ...; x_K(n)],   x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)},
    \bar{w}_i(n) = [w(n); w(n-1); ...; w(n-i+1)].
Then
    \bar{y}_i(n) = A_i \bar{x}(n) + \bar{w}_i(n),
where
    A_i = [ 1                   ...  1
            e^{-j\omega_1}      ...  e^{-j\omega_K}
            ...
            e^{-j(i-1)\omega_1} ...  e^{-j(i-1)\omega_K} ]
is an i x K Vandermonde matrix; rank(A_i) = K if i \ge K and
\omega_k \ne \omega_l for k \ne l.

Thus
    \Omega = E\{ \bar{y}_M^*(n-L-1)\, \bar{y}_L^T(n-1) \}
           = E\{ A_M^* \bar{x}^*(n-L-1)\, \bar{x}^T(n-1) A_L^T \}
           = A_M^*\, P_{L+1}\, A_L^T,
where
    P_{L+1} = E\{ \bar{x}^*(n-L-1)\, \bar{x}^T(n-1) \}.

Since
    E\{x_i(n)\} = E\{ \alpha_i e^{j(\omega_i n + \varphi_i)} \}
      = \frac{1}{2\pi} \int_{-\pi}^{\pi} \alpha_i e^{j\omega_i n} e^{j\varphi_i}\, d\varphi_i = 0,
    E\{ x_i^*(n-k)\, x_i(n) \} = \alpha_i^2 e^{j\omega_i k},
and since the \varphi_i are independent of each other,
    E\{ x_i^*(n-k)\, x_j(n) \} = 0,   i \ne j,
we have
    P_{L+1} = [ \alpha_1^2 e^{j\omega_1 L}   ...  0
                ...                          ...  ...
                0                            ...  \alpha_K^2 e^{j\omega_K L} ].

Remark: For M \ge K and L \ge K, P_{L+1} is of rank K, and so is \Omega.

Consider the estimated system
    \hat{\Omega}\, \bar{a} = -\hat{\rho},
with \hat{\Omega} and \hat{\rho} built from \hat{r}_{yy}(i).
Remarks: rank(\hat{\Omega}) = \min(M, L) almost surely, due to the errors
in \hat{r}_{yy}(i).
For large N, \hat{r}_{yy}(i) \to r_{yy}(i) makes \hat{\Omega}
ill-conditioned.
For large N, LS estimates of \bar{a}_1, ..., \bar{a}_L therefore give poor
estimates of \omega_1, ..., \omega_K.

Let us use this rank information as follows. Let

                         [ Σ_1   0  ] [ V_1^H ]
Ω̂ = U Σ V^H = [U_1  U_2] [          ] [       ]
                         [  0   Σ_2 ] [ V_2^H ]

denote the singular value decomposition (SVD) of Ω̂, with Σ_1 of size
K × K and Σ_2 of size (L−K) × (L−K) (diagonal elements of Σ arranged
from large to small).

Since Ω̂ is close to rank K, and Ω̂_K = U_1 Σ_1 V_1^H has rank K
(Ω̂_K is the best rank-K approximation of Ω̂ in the Frobenius-norm
sense), Ω̂_K is generally a better estimate of Ω than Ω̂.

The minimum-norm LS solution of Ω̂_K ā = −ρ̂ is

â = −V_1 Σ_1^{−1} U_1^H ρ̂.    (**)

Remark:
Using Ω̂_K to replace Ω̂ gives better frequency estimation.
This result may be explained by the fact that Ω̂_K is closer to Ω
than Ω̂.
The rank-K approximation step is referred to as "noise cleaning".

Summary of the HOYW Frequency Estimator

Step 1: Compute r̂(k), k = 1, 2, ..., L + M.
Step 2: Compute the SVD of Ω̂ and determine â with (**).
Step 3: Compute the roots of

1 + â_1 z^{−1} + ... + â_L z^{−L} = 0.

Pick the K roots that are nearest the unit circle and obtain the
frequency estimates as the angular positions (phases) of these roots.

Remarks: Rule of thumb for selecting L and M: L ≈ M ≈ N/3.
Although one cannot guarantee that the K roots nearest the unit
circle give the best frequency estimates, empirical evidence shows
that this is true most often.
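The summary above maps almost line-for-line onto code. A minimal
Python sketch (numpy only; the signal, noise level, K = 2, and the
choice L = M = N/3 are illustrative assumptions):

import numpy as np

# --- HOYW frequency estimator: a minimal sketch ---
N, K = 300, 2
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)   # assumed example data
y += 0.2 * (np.random.randn(N) + 1j * np.random.randn(N))
L = M = N // 3                                    # rule-of-thumb choice

# Step 1: sample autocovariances r_hat(k), k = 0, ..., L+M.
r = np.array([np.sum(y[k:] * np.conj(y[:N - k])) / N
              for k in range(L + M + 1)])

# Build Omega_hat (M x L) and rho_hat (M,): entry (i, m) is r_hat(L+i-m).
Omega = np.array([[r[L + i - m] for m in range(1, L + 1)]
                  for i in range(1, M + 1)])
rho = r[L + 1:L + M + 1]

# Step 2: SVD, rank-K truncation, minimum-norm LS solution (**).
U, s, Vh = np.linalg.svd(Omega)
a = -(Vh[:K].conj().T / s[:K]) @ (U[:, :K].conj().T @ rho)

# Step 3: roots of 1 + a_1 z^{-1} + ... + a_L z^{-L};
# keep the K roots nearest the unit circle.
roots = np.roots(np.concatenate(([1.0], a)))
nearest = roots[np.argsort(np.abs(np.abs(roots) - 1))[:K]]
w_hat = np.sort(np.mod(np.angle(nearest), 2 * np.pi))
print(w_hat)                                      # close to [0.5, 1.2]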

Some Math Background

Lemma: Let U be a unitary matrix, i.e., U^H U = I. Then

||Ub||₂² = ||b||₂²,   where ||x||₂² = x^H x.

Proof: ||Ub||₂² = b^H U^H U b = b^H b = ||b||₂².

Consider Ax ≈ b, where A is M × L and of rank K, x is L × 1, and
b is M × 1.

SVD of A:

                        [ Σ_1  0 ] [ V_1^H ]
A = U Σ V^H = [U_1  U_2][        ] [       ]
                        [  0   0 ] [ V_2^H ]

Goal: Find the minimum-norm x so that ||Ax − b||₂² = minimum.

||Ax − b||₂² = ||U^H A x − U^H b||₂²
             = ||U^H U Σ V^H x − U^H b||₂²
             = ||Σ y − U^H b||₂²,   where y = V^H x,

               || [ Σ_1  0 ] [ y_1 ]   [ U_1^H b ] ||²
             = || [        ] [     ] − [         ] ||
               || [  0   0 ] [ y_2 ]   [ U_2^H b ] ||₂

             = ||Σ_1 y_1 − U_1^H b||₂² + ||U_2^H b||₂².

To minimize ||Ax − b||₂², we must have

Σ_1 y_1 = U_1^H b   =>   y_1 = Σ_1^{−1} U_1^H b.

Note that y_2 can be anything; ||Ax − b||₂² is not affected.
Let y_2 = 0, so that ||y||₂² = ||x||₂² (by the Lemma) is minimized:

V^H x = y = [ y_1 ]
            [  0  ]

x = V y = [V_1  V_2] [ y_1 ] = V_1 y_1
                     [  0  ]

=>   x = V_1 Σ_1^{−1} U_1^H b,   with ||x||₂² = ||y||₂² = minimum.
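A small numerical check of this construction (Python; A and b are
arbitrary example data, with A built to have exact rank K):

import numpy as np

# Minimum-norm least-squares solution via the truncated SVD.
M, L, K = 8, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)) @ rng.standard_normal((K, L))  # rank K
b = rng.standard_normal(M)

U, s, Vh = np.linalg.svd(A)
x = Vh[:K].conj().T @ ((U[:, :K].conj().T @ b) / s[:K])  # V1 S1^{-1} U1^H b

# Agrees with numpy's pseudoinverse solution (also the min-norm one).
print(np.allclose(x, np.linalg.pinv(A) @ b))             # True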

SVD Prony Method

Recall:

(1 + ā_1 z^{−1} + ... + ā_L z^{−L}) y(n)
      = (1 + ā_1 z^{−1} + ... + ā_L z^{−L}) w(n).   (L ≥ K)

At not too low SNR,

[ y(L)     y(L−1)   ...   y(0)     ] [ 1   ]
[ y(L+1)   y(L)     ...   y(1)     ] [ ā_1 ]
[   ...                   ...      ] [ ... ]  ≈  0.    (*)
[ y(N−1)   y(N−2)   ...   y(N−L−1) ] [ ā_L ]

Remark: If w(n) = 0, Eq. (*) holds exactly, and then (*) gives EXACT
frequency estimates.
Consider next the rank of

      [ x(L−1)   ...   x(0)     ]
X  =  [   ...          ...      ]
      [ x(N−2)   ...   x(N−L−1) ]

Note that

[ x(0)     ]   [ 1                 ...   1                 ] [ α_1 e^{jφ_1} ]
[   ...    ] = [ e^{jω_1}          ...   e^{jω_K}          ] [     ...      ]
[ x(N−L−1) ]   [    ...                  ...               ] [ α_K e^{jφ_K} ]
               [ e^{j(N−L−1)ω_1}   ...   e^{j(N−L−1)ω_K}   ]

and similarly for the other columns of X, so that

      [ 1                 ...  1                ] [ α_1 e^{jφ_1}      0         ] [ e^{j(L−1)ω_1}  ...  e^{jω_1}  1 ]
X  =  [ e^{jω_1}          ...  e^{jω_K}         ] [        ...                  ] [      ...                        ]
      [    ...                 ...              ] [      0        α_K e^{jφ_K}  ] [ e^{j(L−1)ω_K}  ...  e^{jω_K}  1 ]
      [ e^{j(N−L−1)ω_1}   ...  e^{j(N−L−1)ω_K}  ]

Remark: If N − L − 1 ≥ K and L ≥ K, X is of rank K.


From (*),

[ y(L−1)   ...   y(0)     ] [ ā_1 ]       [ y(L)   ]
[   ...          ...      ] [ ... ]  ≈ −  [  ...   ]
[ y(N−2)   ...   y(N−L−1) ] [ ā_L ]       [ y(N−1) ]

Call the matrix on the left Y and the right-hand-side vector y.

Remark: A rank-K approximation of Y has a "noise cleaning" effect.

Let

                 [ Σ_1   0  ] [ V_1^H ]
Y = [U_1  U_2]   [          ] [       ],   Σ_1: K × K,  Σ_2: (L−K) × (L−K),
                 [  0   Σ_2 ] [ V_2^H ]

denote the SVD of Y. Then

[ â_1  ...  â_L ]^T = −V_1 Σ_1^{−1} U_1^H [ y(L)  ...  y(N−1) ]^T.    (**)
Summary of the SVD Prony Estimator

Step 1: Form Y and compute the SVD of Y.
Step 2: Determine â with (**).
Step 3: Compute the roots from â. Pick the K roots that are nearest
the unit circle. Obtain the frequency estimates as the phases of
these roots.

Remark: Although one cannot guarantee that the K roots nearest the
unit circle give the best frequency estimates, empirical results show
that this is true most often.
A more accurate method is obtained by "cleaning" (i.e., a rank-K
approximation of) the matrix [Y ⋮ y].
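A minimal SVD Prony sketch in Python (numpy only; the data and the
choices of K and L ≥ K are illustrative assumptions):

import numpy as np

# --- SVD Prony estimator: a minimal sketch ---
N, K, L = 200, 2, 12
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)   # assumed example data
y += 0.05 * (np.random.randn(N) + 1j * np.random.randn(N))

# Step 1: rows of Y are [y(m-1), ..., y(m-L)] for m = L, ..., N-1.
Y = np.array([y[m - L:m][::-1] for m in range(L, N)])
rhs = y[L:N]

U, s, Vh = np.linalg.svd(Y, full_matrices=False)

# Step 2: a_hat = -V1 S1^{-1} U1^H [y(L) ... y(N-1)]^T   (**)
a = -(Vh[:K].conj().T @ ((U[:, :K].conj().T @ rhs) / s[:K]))

# Step 3: K roots nearest the unit circle -> frequency estimates.
roots = np.roots(np.concatenate(([1.0], a)))
nearest = roots[np.argsort(np.abs(np.abs(roots) - 1))[:K]]
print(np.sort(np.mod(np.angle(nearest), 2 * np.pi)))   # ~ [0.5, 1.2]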

Pisarenko and MUSIC Methods

Remark: The Pisarenko method is a special case of the MUSIC
(MUltiple SIgnal Classification) method.

Recall:

ȳ_M(n) = [y(n)  y(n−1)  ...  y(n−M+1)]^T
x̄(n)   = [x_1(n)  ...  x_K(n)]^T,   x_k(n) = α_k e^{j(ω_k n + φ_k)}
w̄_M(n) = [w(n)  ...  w(n−M+1)]^T

        [ 1                ...   1               ]
A_M  =  [ e^{−jω_1}        ...   e^{−jω_K}       ]
        [    ...                 ...             ]
        [ e^{−j(M−1)ω_1}   ...   e^{−j(M−1)ω_K}  ]

ȳ_M(n) = A_M x̄(n) + w̄_M(n)

Let

R = E{ ȳ_M(n) ȳ_M^H(n) }
  = E{ A_M x̄(n) x̄^H(n) A_M^H } + E{ w̄_M(n) w̄_M^H(n) }
  = A_M P A_M^H + σ² I,

where

    [ α_1²       0   ]
P = [     ...        ]
    [  0       α_K²  ]

Remarks: rank(A_M P A_M^H) = K if M ≥ K.
If M > K, A_M P A_M^H has K positive eigenvalues and M − K zero
eigenvalues. We shall consider M > K below.
Let the positive eigenvalues of A_M P A_M^H be denoted
λ̃_1 ≥ λ̃_2 ≥ ... ≥ λ̃_K.

The eigenvalues of R then fall into two groups:

λ_k = λ̃_k + σ²,   k = 1, ..., K
λ_k = σ²,         k = K+1, ..., M

Let s_1, ..., s_K be the eigenvectors of R that correspond to
λ_1, ..., λ_K, and let S = [s_1, ..., s_K].
Let s_{K+1}, ..., s_M be the eigenvectors of R that correspond to
λ_{K+1}, ..., λ_M, and let G = [s_{K+1}, ..., s_M].

Then

RG = G diag(σ², ..., σ²) = σ² G.

On the other hand,

RG = (A_M P A_M^H + σ² I) G = A_M P A_M^H G + σ² G,

so

A_M P A_M^H G = 0   =>   A_M^H G = 0

(since P is non-singular and A_M has full column rank).

Remark:
Let the linearly independent K columns of A_M define the
K-dimensional signal subspace.
* Then the eigenvectors of R that correspond to the M − K
smallest eigenvalues are orthogonal to the signal subspace.
* The eigenvectors of R that correspond to the K largest
eigenvalues of R span the same signal subspace as A_M, i.e.,
A_M = SC for a K × K non-singular C.

MUSIC:

The true frequency values {ω_k}_{k=1}^{K} are the only solutions of

a_M^H(ω) G G^H a_M(ω) = 0,

where

a_M(ω) = [1  e^{−jω}  ...  e^{−j(M−1)ω}]^T.

Steps in MUSIC:

Step 1: Compute R̂ = (1/N) Σ_{n=M−1}^{N−1} ȳ_M(n) ȳ_M^H(n) and its
eigendecomposition. Form Ĝ, whose columns are the eigenvectors of R̂
that correspond to the M − K smallest eigenvalues of R̂.

Step 2a (Spectral MUSIC): Determine the frequency estimates as
the locations of the K highest peaks of the MUSIC "spectrum"

1 / ( a_M^H(ω) Ĝ Ĝ^H a_M(ω) ),   ω ∈ [−π, π].

Step 2b (Root MUSIC): Determine the frequency estimates as the
angular positions (phases) of the K (pairs of reciprocal) roots of
the equation

a_M^T(z^{−1}) Ĝ Ĝ^H a_M(z) = 0

that are closest to the unit circle, where

a_M(z) = [1  z^{−1}  ...  z^{−M+1}]^T,   i.e.,   a_M(z)|_{z=e^{jω}} = a_M(ω).
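A minimal Spectral MUSIC sketch in Python (numpy only; the data and
the choices of M and K are illustrative assumptions; Root MUSIC would
instead root the polynomial in Step 2b):

import numpy as np

# --- Spectral MUSIC: a minimal sketch ---
N, K, M = 200, 2, 12                      # illustrative choices, M > K
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)
y += 0.1 * (np.random.randn(N) + 1j * np.random.randn(N))

# Step 1: sample covariance of y_M(n) = [y(n), ..., y(n-M+1)]^T.
snaps = np.array([y[t - M + 1:t + 1][::-1] for t in range(M - 1, N)])
R = snaps.T @ snaps.conj() / N

eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues
G = eigvecs[:, :M - K]                    # noise subspace (M-K smallest)

# Step 2a: scan the MUSIC pseudo-spectrum on a fine grid.
grid = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
m = np.arange(M)
A = np.exp(-1j * np.outer(m, grid))       # a_M(w) columns, shape (M, ngrid)
denom = np.sum(np.abs(G.conj().T @ A) ** 2, axis=0)   # a^H G G^H a
P_music = 1.0 / denom

is_peak = (P_music > np.roll(P_music, 1)) & (P_music > np.roll(P_music, -1))
peaks = np.flatnonzero(is_peak)
top = peaks[np.argsort(P_music[peaks])[-K:]]
print(np.sort(grid[top]))                 # ~ [0.5, 1.2]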

Pisarenko Method = MUSIC with M = K + 1.

Remarks:
The Pisarenko method is not as good as MUSIC.
M in MUSIC should not be too large, due to the poor accuracy of
r̂(k) for large k.

ESPRIT Method
(Estimation of Signal Parameters via Rotational Invariance
Techniques)

        [ 1                ...   1               ]
A_M  =  [ e^{−jω_1}        ...   e^{−jω_K}       ]
        [    ...                 ...             ]
        [ e^{−j(M−1)ω_1}   ...   e^{−j(M−1)ω_K}  ]

Let B_1 = first M − 1 rows of A_M,
    B_2 = last M − 1 rows of A_M.

Then B_2 D = B_1, where

     [ e^{jω_1}       0      ]
D  = [         ...           ]
     [  0         e^{jω_K}   ]

Let S_1 and S_2 be formed from S the same way as B_1 and B_2 are
formed from A_M.

Recall that S and A_M span the same subspace, so S = A_M C for a
K × K non-singular C. Then

S_1 = B_1 C = B_2 D C
S_2 = B_2 C   =>   B_2 = S_2 C^{−1}

S_1 = S_2 C^{−1} D C = S_2 Φ,   where Φ = (S_2^H S_2)^{−1} S_2^H S_1.

The diagonal elements of D are the eigenvalues of Φ.

Steps of ESPRIT:
Step 1: Compute Φ̂ = (Ŝ_2^H Ŝ_2)^{−1} Ŝ_2^H Ŝ_1.
Step 2: The frequency estimates are the angular positions of the
eigenvalues of Φ̂.

Remarks:
Ŝ_1 ≈ Ŝ_2 Φ̂ can also be solved with the Total Least Squares method.
Since Φ is a K × K matrix, we do not need to pick the K roots
nearest the unit circle, which could be wrong roots.
ESPRIT does not require a search over the parameter space, as
required by Spectral MUSIC.
All of these remarks make ESPRIT a recommended method!
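A minimal ESPRIT sketch in Python (numpy only; data and the M, K
choices are illustrative assumptions; plain LS is used for Φ̂, though
TLS is also possible, as remarked above):

import numpy as np

# --- ESPRIT: a minimal sketch ---
N, K, M = 200, 2, 12                      # illustrative choices, M > K
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)
y += 0.1 * (np.random.randn(N) + 1j * np.random.randn(N))

# Sample covariance of y_M(n) = [y(n), ..., y(n-M+1)]^T.
snaps = np.array([y[t - M + 1:t + 1][::-1] for t in range(M - 1, N)])
R = snaps.T @ snaps.conj() / N

# Signal subspace: eigenvectors of the K largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R)      # ascending order
S = eigvecs[:, -K:]
S1, S2 = S[:-1, :], S[1:, :]              # first / last M-1 rows

# Step 1: LS solution of S1 = S2 Phi;  Step 2: angles of eigenvalues.
Phi, *_ = np.linalg.lstsq(S2, S1, rcond=None)
w_hat = np.sort(np.mod(np.angle(np.linalg.eigvals(Phi)), 2 * np.pi))
print(w_hat)                              # ~ [0.5, 1.2]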

Sinusoidal Parameter Estimation in the Presence
of Colored Noise via RELAX

y(n) = Σ_{k=1}^{K} α_k e^{jω_k n} + e(n)

α_k:   complex amplitudes, unknown.
ω_k:   unknown frequencies.
e(n):  unknown AR or ARMA noise.

Consider the Non-linear Least-Squares (NLS) method:

g = Σ_{n=0}^{N−1} | y(n) − Σ_{k=1}^{K} α_k e^{jω_k n} |².

Remarks:
α̂_k and ω̂_k, k = 1, ..., K, are found by minimizing g.
When e(n) is zero-mean Gaussian white noise, this NLS method is the
ML method.
When e(n) is non-white noise, the NLS method gives asymptotically
(N → ∞) statistically efficient estimates of ω_k and α_k, despite
the fact that NLS is not an ML method for this case.
The non-linear minimization is a difficult problem.

Remarks:
Concentrating out {α_k} gives

ω̂ = argmax_ω  y^H B (B^H B)^{−1} B^H y,   α̂ = (B^H B)^{−1} B^H y.

Concentrating out {ω_k}, instead of simplifying the problem,
actually complicates the problem.
The RELAX algorithm is a relaxation-based optimization approach.
RELAX is both computationally and conceptually simple.

Preparation:
Let

y_k(n) = y(n) − Σ_{i=1, i != k}^{K} α̂_i e^{jω̂_i n}

* α̂_i and ω̂_i, i != k, are assumed given, known, or estimated.

Let

g_k = Σ_{n=0}^{N−1} | y_k(n) − α_k e^{jω_k n} |².

* Minimizing g_k gives:

ω̂_k = argmax_{ω_k} | Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n} |²

α̂_k = [ (1/N) Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n} ]_{ω_k = ω̂_k}

Remarks:
Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n}  is the DTFT of y_k(n)!
(It can be computed via FFT and zero-padding.)
ω̂_k corresponds to the peak of the Periodogram of y_k(n)!
α̂_k is the peak height (a complex number!) of the DTFT of y_k(n)
(at ω̂_k) divided by N.
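These two closed-form updates are the whole inner computation of
RELAX. A minimal Python helper (numpy; the FFT size is an arbitrary
zero-padding choice, and the function name is ours, not the slides'):

import numpy as np

def estimate_one_sinusoid(yk, nfft=None):
    """One RELAX sub-step: frequency from the Periodogram peak of yk,
    amplitude from the DTFT peak height divided by N (a sketch)."""
    N = len(yk)
    nfft = nfft or 16 * N                  # zero-padding refines the peak
    Y = np.fft.fft(yk, nfft)
    kmax = np.argmax(np.abs(Y))
    w_hat = 2 * np.pi * kmax / nfft        # peak location
    a_hat = Y[kmax] / N                    # complex peak height / N
    return w_hat, a_hat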

The RELAX Algorithm

Step 1: Assume K = 1. Obtain ω̂_1 and α̂_1 from y(n).
Step 2: Assume K = 2. Obtain y_2(n) by using ω̂_1 and α̂_1 obtained
from Step 1. Obtain ω̂_2 and α̂_2 from y_2(n).
Obtain y_1(n) by using ω̂_2 and α̂_2, and re-estimate ω̂_1 and α̂_1
from y_1(n).
Iterate until convergence.
Step 3: Assume K = 3.
Obtain y_3(n) from ω̂_1, α̂_1, ω̂_2, α̂_2. Obtain ω̂_3 and α̂_3 from y_3(n).
Obtain y_1(n) from ω̂_2, α̂_2, ω̂_3, α̂_3. Re-estimate ω̂_1 and α̂_1 from y_1(n).
Obtain y_2(n) from ω̂_1, α̂_1, ω̂_3, α̂_3. Re-estimate ω̂_2 and α̂_2 from y_2(n).
Iterate until g does not decrease significantly anymore!

Step 4: Assume K = 4, ...
Continue until K is large enough!

Remarks:
RELAX is found to perform better than the existing high-resolution
algorithms, especially in obtaining better α̂_k, k = 1, ..., K.
RELAX is more robust to the choice of K and to data-model errors.
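Putting the pieces together, a compact sketch of the RELAX loop in
Python (it reuses the estimate_one_sinusoid helper above; a fixed
number of inner iterations stands in for the "until g stops
decreasing" test, which is a simplification of ours):

import numpy as np

def relax(y, K, n_iter=30):
    """RELAX sketch: cyclically re-estimate each sinusoid from the
    residual left after subtracting the current estimates of the others."""
    N = len(y)
    n = np.arange(N)
    w = np.zeros(K)
    a = np.zeros(K, dtype=complex)
    for k in range(K):                     # Steps 1, 2, 3, ...: grow the model
        for _ in range(n_iter):            # inner re-estimation cycle
            for i in range(k + 1):
                others = sum(a[j] * np.exp(1j * w[j] * n)
                             for j in range(k + 1) if j != i)
                w[i], a[i] = estimate_one_sinusoid(y - others)
    return w, a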
