
EEL 6537 Spectral Estimation

Jian Li
Department of Electrical and Computer Engineering
University of Florida
Gainesville, FL 32611, USA
1
Spectral Estimation is an Art
Petre Stoica
I hear, I forget;
I see, I remember;
I do, I understand.
A Chinese Philosopher.
2
What is Spectral Estimation?
From a finite record of a stationary data sequence, estimate how
the total power is distributed over frequencies, or more practically,
over narrow spectral bands (frequency bins).
3
Spectral Estimation Methods:
Classical (Nonparametric) Methods
Ex. Pass the data through a set of band-pass filters and measure
the filter output powers.
Parametric (Modern) Approaches
Ex. Model the data as a sum of a few damped sinusoids and
estimate their parameters.
Trade-Offs: (Robustness vs. Accuracy)
Parametric methods may offer better estimates if the data closely
agrees with the assumed model.
Otherwise, nonparametric methods may be better.
4
Some Applications of Spectral Estimation
Speech
- Formant estimation (for speech recognition)
- Speech coding or compression
Radar and Sonar
- Source localization with sensor arrays
- Synthetic aperture radar imaging and feature extraction
Electromagnetics
- Resonant frequencies of a cavity
Communications
- Code-timing estimation in DS-CDMA systems
5
REVIEW OF DSP FUNDAMENTALS
Continuous-Time Signals
Periodic signals: $x(t) = x(t + T_p)$.
Fourier Series:
$x(t) = \sum_{k=-\infty}^{\infty} c_k e^{j 2\pi k F_0 t}$,
$c_k = \frac{1}{T_p} \int_{T_p} x(t) e^{-j 2\pi k F_0 t}\,dt$,
$F_0 = \frac{1}{T_p}$.
6
Ex. A periodic rectangular pulse train $x(t)$ (pulse width $\tau$, period $T$) has line spectrum with a sinc envelope.
Ex. $e^{j\Omega_0 t} \xrightarrow{FT} 2\pi\,\delta(\Omega - \Omega_0)$, where $\Omega = 2\pi F$.
7
Ex. Impulse train:
$s(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT)$,
$c_k = \frac{1}{T}$ for all $k$,
$S(\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(\Omega - \frac{2\pi k}{T}\right)$.
Remark:
Periodic Signals $\Longleftrightarrow$ Discrete Spectra.
8
Discrete signals
Ex: Sampling — multiplying $x(t)$ by the impulse train $s(t)$ gives $x(t)s(t)$, whose spectrum repeats with period $2\pi/T$.
Remark:
Discrete Signals $\Longleftrightarrow$ Periodic Spectra.
Discrete Periodic Signals $\Longleftrightarrow$ Periodic Discrete Spectra.
9
Aliasing Problem:
Ex. If the sampling rate is too low, the repeated spectral copies overlap and the original spectrum cannot be recovered.
10
* Fourier Transform (Continuous-Time vs. Discrete-Time)
Let $y(t) = x(t)s(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT)$.
CTFT:
$Y(\Omega) = \int_{-\infty}^{\infty} y(t) e^{-j\Omega t}\,dt = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT)\, e^{-j\Omega t}\,dt = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\Omega nT}$.
DTFT:
$Y(\omega) = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\omega n}$, with $\omega = \Omega T$.
11
Discrete-time signal $x(nT)$; its DTFT $Y(\omega)$ is periodic with period $2\pi$.
Remarks: The Discrete-Time Fourier Transform (DTFT) is the same as
the Continuous-Time Fourier Transform (CTFT) with $x(nT)\,\delta(t - nT)$
replaced by $x(nT)$ and $\int$ replaced by $\sum$ (easy for computers).
12
For simplicity, we drop $T$ (set $T = 1$); the spectrum may be plotted versus $\omega \in [-\pi, \pi]$ or $f \in [-\tfrac{1}{2}, \tfrac{1}{2}]$.
DTFT Pair:
$X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}$,
$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{j\omega n}\,d\omega$.
13
Remark: For the DTFT, we also have:
Discrete Periodic Signals $\xrightarrow{DTFT}$ Periodic Discrete Spectra.
Ex. A periodic sequence $x(n)$ has a DTFT $X(\omega)$ consisting of impulses — note the aliasing.
When $x(n + N) = x(n)$,
DFT Pair:
$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$,
$X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}$.
14
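The DFT pair above can be checked numerically. A minimal sketch (not from the notes) comparing a direct evaluation of the analysis and synthesis equations against numpy's FFT:

```python
import numpy as np

def dft(x):
    # Analysis: X(k) = sum_{n=0}^{N-1} x(n) e^{-j 2 pi k n / N}
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

def idft(X):
    # Synthesis: x(n) = (1/N) sum_{k=0}^{N-1} X(k) e^{j 2 pi k n / N}
    N = len(X)
    k = np.arange(N)
    return np.array([np.sum(X * np.exp(2j * np.pi * k * n / N)) for n in range(N)]) / N

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
X = dft(x)
```

The direct sums cost $N^2$ multiplications; the FFT computes the same $X(k)$ in $O(N \log_2 N)$, as discussed later.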
Ex. Note the aliasing: a periodic sequence $x(n)$ (period $N = 10$) and its DFT $X(k)$, showing one period.
Remarks: For periodic sequences, the DFT and DTFT yield similar
spectra. The IDFT (inverse DFT) is the same as the IDTFT (inverse
DTFT) with $X\!\left(\frac{2\pi k}{N}\right)$ replaced by $X(k)$ and $\int$ replaced by $\sum$ (easy for computers).
15
Effects of Zero-Padding:
Ex. A 5-point sequence $x(n)$ zero-padded to 10 points: the 10-point DFT samples the same DTFT $X(\omega)$ twice as densely.
Remark: The more zeroes padded, the closer $X(k)$ is to $X(\omega)$.
$X(k)$ is a sampled version of $X(\omega)$ for finite duration sequences.
16
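A small sketch of the zero-padding remark (names are illustrative, not from the notes): padding a 5-point sequence to 32 points makes the DFT sample the same underlying DTFT on a denser grid — it does not add resolution.

```python
import numpy as np

x = np.ones(5)                 # 5-point rectangular sequence
X5 = np.fft.fft(x)             # 5 samples of the DTFT
X32 = np.fft.fft(x, n=32)      # zero-padded: 32 samples of the SAME DTFT

# Direct DTFT X(w) = sum_n x(n) e^{-jwn} at the padded bin frequencies
w = 2 * np.pi * np.arange(32) / 32
Xdtft = np.array([np.sum(x * np.exp(-1j * wk * np.arange(5))) for wk in w])
```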
Z-Transform
$X(z) = \sum_{n=-\infty}^{\infty} x(n) z^{-n}$,
$x(n) = \frac{1}{2\pi j} \oint_c X(z) z^{n-1}\,dz$.
For finite duration $x(n)$,
$X(z) = \sum_{n=0}^{N-1} x(n) z^{-n}$.
The DFT $X(k)$ is related to $X(z)$ as follows:
$X(k) = X(z)\big|_{z = e^{j \frac{2\pi}{N} k}}$
($X(k)$ is $X(z)$ evenly sampled on the unit circle of the z-plane).
17
Linear Time-Invariant (LTI) Systems.
$N$th order difference equation:
$\sum_{k=0}^{N} a_k\, y(n-k) = \sum_{k=0}^{M} b_k\, x(n-k)$.
Impulse Response:
$h(n) = y(n)\big|_{x(n) = \delta(n)}$,
$H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}$.
18
Bounded-Input Bounded-Output (BIBO) Stability:
All poles of $H(z)$ are inside the unit circle for a causal system
(where $h(n) = 0$, $n < 0$).
FIR Filter: $N = 0$.
IIR Filter: $N > 0$.
Minimum Phase: All poles and zeroes of $H(z)$ are inside the unit circle.
19
ENERGY AND POWER SPECTRAL DENSITIES
Energy Spectral Density of Deterministic Signals.
Finite energy signal if
$0 < \sum_{n=-\infty}^{\infty} |x(n)|^2 < \infty$.
Let $X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}$.
Parseval's Energy Theorem:
$\sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega)\,d\omega$, where $S(\omega) = |X(\omega)|^2$.
Remark: $|X(\omega)|^2$ measures the length of the orthogonal projection of
$\{x(n)\}$ onto the basis sequence $\{e^{j\omega n}\}$, $\omega \in [-\pi, \pi]$.
20
Let $\rho(k) = \sum_{n=-\infty}^{\infty} x(n)\, x^*(n-k)$. Then
$\sum_{k=-\infty}^{\infty} \rho(k) e^{-j\omega k} = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(n)\, x^*(n-k)\, e^{-j\omega n} e^{j\omega (n-k)}$
$= \left[\sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}\right] \left[\sum_{s=-\infty}^{\infty} x(s) e^{-j\omega s}\right]^* = |X(\omega)|^2 = S(\omega)$.
Remark: $S(\omega)$ is the DTFT of the autocorrelation of the finite
energy sequence $\{x(n)\}$.
21
Power Spectral Density (PSD) of Random Signals.
Let $\{x(n)\}$ be a wide-sense stationary (WSS) sequence with
$E[x(n)] = 0$ and
$r(k) = E[x(n)\, x^*(n-k)]$.
Properties of the autocorrelation function $r(k)$:
$r(k) = r^*(-k)$;
$r(0) \geq |r(k)|$, for all $k$;
$r(0) =$ average power of $x(n)$, and $r(0) \geq 0$.
22
Def: $A$ is positive semidefinite if $z^H A z \geq 0$ for any $z$
($z^H = (z^T)^*$, the Hermitian transpose).
Let
$A = \begin{bmatrix} r(0) & r(k) \\ r^*(k) & r(0) \end{bmatrix} = E\left\{ \begin{bmatrix} x(n) \\ x(n-k) \end{bmatrix} \begin{bmatrix} x^*(n) & x^*(n-k) \end{bmatrix} \right\}$.
Obviously, $A$ is positive semidefinite.
$\Rightarrow$ All eigenvalues of $A$ are $\geq 0$, and the determinant of $A$ is $\geq 0$:
$r^2(0) - |r(k)|^2 \geq 0$.
23
Covariance matrix:
$R = \begin{bmatrix} r(0) & r(1) & \cdots & r(m-2) & r(m-1) \\ r^*(1) & r(0) & \ddots & & r(m-2) \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ r^*(m-1) & r^*(m-2) & \cdots & r^*(1) & r(0) \end{bmatrix}$.
It is easy to show that $R$ is positive semidefinite.
$R$ is also Toeplitz.
Since $R = R^H$, $R$ is Hermitian.
24
Eigendecomposition of $R$:
$R = U \Lambda U^H$,
where $U^H U = U U^H = I$
($U$ is a unitary matrix whose columns are eigenvectors of $R$), and
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$
(the $\lambda_i$ are the eigenvalues of $R$: real and $\geq 0$).
25
First Definition of PSD:
$P(\omega) = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k}$, $\quad r(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) e^{j\omega k}\,d\omega$.
Or
$P(f) = \sum_{k=-\infty}^{\infty} r(k) e^{-j 2\pi f k}$, $\quad r(k) = \int_{-1/2}^{1/2} P(f) e^{j 2\pi f k}\,df$.
Remark: Since $r(k)$ is discrete, $P(\omega)$ and $P(f)$ are periodic, with
period $2\pi$ (in $\omega$) and $1$ (in $f$), respectively.
We usually consider $\omega \in [-\pi, \pi]$ or $f \in [-\tfrac{1}{2}, \tfrac{1}{2}]$.
26
$r(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega)\,d\omega$ = average power over all frequencies.
The area under the PSD between $\omega_1$ and $\omega_2$ gives the average power between $\omega_1$ and $\omega_2$.
27
Second Definition of PSD:
$P(\omega) = \lim_{N \to \infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}$.
This definition is equivalent to the first one under
$\lim_{N \to \infty} \frac{1}{N} \sum_{k=-N+1}^{N-1} |k|\,|r(k)| = 0$
(which means that $\{r(k)\}$ decays sufficiently fast).
Properties of PSD:
$P(\omega) \geq 0$ for all $\omega$.
For real $x(n)$: $r(k) = r(-k)$, $P(\omega) = P(-\omega)$, $\omega \in [-\pi, \pi]$.
For complex $x(n)$: $r(k) = r^*(-k)$.
28
PSD for LTI Systems.
If $x(n)$ is passed through a filter $H(\omega)$ to produce $y(n)$, then
$P_y(\omega) = P_x(\omega)\,|H(\omega)|^2$.
Complex (De)Modulation:
$y(n) = x(n) e^{j\omega_0 n}$.
It is easy to show that
$r_y(k) = r_x(k) e^{j\omega_0 k}$, so
$P_y(\omega) = P_x(\omega - \omega_0)$.
29
Spectral Estimation Problem
From a finite-length record $\{x(0), \ldots, x(N-1)\}$, determine an
estimate $\hat{P}(\omega)$ of the PSD $P(\omega)$, for $\omega \in [-\pi, \pi]$.
Nonparametric Methods:
Periodogram:
Recall the second definition of PSD:
$P(\omega) = \lim_{N \to \infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}$.
Periodogram: $\hat{P}_p(\omega) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2$.
Remark: $\hat{P}_p(\omega) \geq 0$ for all $\omega$.
If $x(n)$ is real, $\hat{P}_p(\omega)$ is even.
$E[\hat{P}_p(\omega)] = ?$ $\mathrm{Var}[\hat{P}_p(\omega)] = ?$ (to be discussed later on)
30
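The periodogram definition above maps directly onto an FFT, as the notes discuss later. A minimal sketch (function name is mine, not from the notes):

```python
import numpy as np

def periodogram(x, nfft=None):
    # P_p(w_k) = (1/N) |sum_{n=0}^{N-1} x(n) e^{-j w_k n}|^2, evaluated via FFT
    N = len(x)
    nfft = nfft or N
    X = np.fft.fft(x, n=nfft)
    return np.abs(X) ** 2 / N

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
P = periodogram(x)
```

Nonnegativity holds by construction, and $\hat{P}_p(0) = |\sum_n x(n)|^2 / N$.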
Correlogram (see the first PSD definition):
$\hat{P}_c(\omega) = \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k}$.
Unbiased estimate of $r(k)$:
$k \geq 0$: $\hat{r}(k) = \frac{1}{N-k} \sum_{i=k}^{N-1} x(i)\,x^*(i-k)$,
$k < 0$: $\hat{r}(k) = \hat{r}^*(-k)$.
31
Ex. Let $x(i) = 1$ for $i = 0, 1, 2$ ($N = 3$). With the unbiased estimate,
$\hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1$ (average of 3 points),
$\hat{r}(1) = \hat{r}(-1) = \frac{1}{2} \sum_{i=1}^{2} (1)(1) = 1$ (average of 2 points),
$\hat{r}(2) = \hat{r}(-2) = \frac{1}{1} \sum_{i=2}^{2} (1)(1) = 1$ (average of 1 point),
$\hat{r}(3) = \hat{r}(-3) = 0$.
The resulting $\hat{r}(k)$ is rectangular in $k$, so the correlogram $\hat{P}_c(\omega)$ can dip below zero.
32
Remark:
$\hat{r}(k)$ is a bad estimate of $r(k)$ for large $k$.
$E[\hat{r}(k)] = r(k)$ (unbiased).
Proof:
$E[\hat{r}(k)] = E\left[ \frac{1}{N-k} \sum_{i=k}^{N-1} x(i)\,x^*(i-k) \right] = \frac{1}{N-k} \sum_{i=k}^{N-1} r(k) = r(k)$.
$\hat{P}_c(\omega)$ based on the unbiased $\hat{r}(k)$ may be $< 0$.
33
Biased Estimate of $r(k)$ (used more often!):
$k \geq 0$: $\hat{r}(k) = \frac{1}{N} \sum_{i=k}^{N-1} x(i)\,x^*(i-k)$,
$k < 0$: $\hat{r}(k) = \hat{r}^*(-k)$.
Remark:
$E[\hat{r}(k)] = \frac{1}{N} \sum_{i=k}^{N-1} E[x(i)\,x^*(i-k)] = \frac{1}{N} \sum_{i=k}^{N-1} r(k) = \frac{N-k}{N}\, r(k)$
$\to r(k)$, as $N \to \infty$ (asymptotically unbiased).
34
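The two lag estimates can be compared on the three-point example the notes use. A minimal sketch (function name is mine):

```python
import numpy as np

def acf(x, k, biased=True):
    # biased:   r(k) = (1/N)     sum_{i=k}^{N-1} x(i) x*(i-k)
    # unbiased: r(k) = (1/(N-k)) sum_{i=k}^{N-1} x(i) x*(i-k)
    N = len(x)
    s = np.sum(x[k:] * np.conj(x[:N - k] if k > 0 else x))
    return s / N if biased else s / (N - k)

x = np.ones(3)   # the three-point example from the slides
```

With $x = (1, 1, 1)$, the biased estimate gives the triangular sequence $1, \tfrac{2}{3}, \tfrac{1}{3}$, while the unbiased estimate is flat at 1.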
Ex. Again let $x(i) = 1$ for $i = 0, 1, 2$ ($N = 3$). With the biased estimate,
$\hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1$,
$\hat{r}(1) = \hat{r}(-1) = \frac{1}{3} \sum_{i=1}^{2} (1)(1) = \frac{2}{3}$,
$\hat{r}(2) = \hat{r}(-2) = \frac{1}{3} \sum_{i=2}^{2} (1)(1) = \frac{1}{3}$.
The resulting $\hat{r}(k)$ is triangular in $k$, and its DTFT $\hat{P}_c(\omega) \geq 0$.
35
Remark:
With the biased $\hat{r}(k)$:
$\hat{P}_c(\omega) = \hat{P}_p(\omega) \geq 0$, for all $\omega$;
$E[\hat{r}(k)] \neq r(k)$, but $E[\hat{r}(k)] \to r(k)$ as $N \to \infty$ (asymptotically unbiased);
$\hat{R} = \begin{bmatrix} \hat{r}(0) & \hat{r}(1) & \cdots & \hat{r}(N-1) \\ \hat{r}^*(1) & \hat{r}(0) & \cdots & \hat{r}(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{r}^*(N-1) & \hat{r}^*(N-2) & \cdots & \hat{r}(0) \end{bmatrix}$,
with $\hat{r}(k)$ the biased estimate. Then $\hat{R}$ is positive semidefinite.
36
General Comments on $\hat{P}_p(\omega)$ and $\hat{P}_c(\omega)$:
$\hat{P}_p(\omega)$ and $\hat{P}_c(\omega)$ provide POOR estimates of $P(\omega)$ (the
variances of $\hat{P}_p(\omega)$ and $\hat{P}_c(\omega)$ are high).
Reason: $\hat{P}_p(\omega)$ and $\hat{P}_c(\omega)$ are from a single realization of a random process.
Computing $\hat{P}_p(\omega)$ via FFT:
Recall the DFT ($N^2$ complex multiplications):
$X(k) = \sum_{i=0}^{N-1} x(i) e^{-j \frac{2\pi}{N} k i}$,
$\hat{P}_p(k) = \frac{1}{N} |X(k)|^2$.
37
Let $W = e^{-j \frac{2\pi}{N}}$, $N = 2^m$. Then
$X(k) = \sum_{n=0}^{N-1} x(n) W^{kn} = \sum_{n=0}^{N/2-1} x(n) W^{kn} + \sum_{n=N/2}^{N-1} x(n) W^{kn}$
$= \sum_{n=0}^{N/2-1} \left[ x(n) + x\!\left(n + \tfrac{N}{2}\right) W^{\frac{Nk}{2}} \right] W^{kn}$.
Note:
$W^{\frac{Nk}{2}} = e^{-j \frac{2\pi}{N} \cdot \frac{Nk}{2}} = e^{-j\pi k} = \begin{cases} 1, & \text{even } k \\ -1, & \text{odd } k \end{cases}$
38
$\Rightarrow$
$X(2p) = \sum_{n=0}^{N/2-1} \left[ x(n) + x\!\left(n + \tfrac{N}{2}\right) \right] W^{2pn}$, $\quad k = 2p = 0, 2, \ldots$,
$X(2p+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x\!\left(n + \tfrac{N}{2}\right) \right] W^{(2p+1)n}$, $\quad k = 2p+1$,
which requires $2\left(\tfrac{N}{2}\right)^2$ complex multiplications.
This process is continued till 2 points.
Remark: An $N = 2^m$-point FFT requires $O(N \log_2 N)$ complex multiplications.
Zero padding may be used so that $N = 2^m$.
Zero padding will not change the resolution of $\hat{P}_p(\omega)$.
39
FUNDAMENTALS OF ESTIMATION THEORY
Properties of a good estimator $\hat{a}$ of a constant scalar $a$:
Small Bias: $\mathrm{Bias} = E[\hat{a}] - a$.
Small Variance: $\mathrm{Variance} = E\left[ (\hat{a} - E[\hat{a}])^2 \right]$.
Consistent: $\hat{a} \to a$ as the number of measurements $\to \infty$.
40
Ex. Measurement
$y = a + e$,
where $a$ is an unknown constant and $e$ is $N(0, \sigma^2)$.
Find $a$ from $y$?
The pdf of $y$, $f(y|a)$, is Gaussian, centered at $a$.
41
Maximum Likelihood (ML) Estimate of $a$:
Say $y = 5$; we want to find $a$ so that it is most likely that the
measurement is 5:
$\frac{\partial f(y|a)}{\partial a}\Big|_{a = \hat{a}_{ML}} = 0 \;\Rightarrow\; \hat{a}_{ML} = y$,
$E[\hat{a}_{ML}] = E[y] = E[a + e] = a$,
$\mathrm{Var}[\hat{a}_{ML}] = \mathrm{Var}[y] = \sigma^2$.
42
Ex. $y = a + e$. Three independent measurements $y_1, y_2, y_3$ are taken.
$\hat{a}_{ML} = ?$ Bias $= ?$ Variance $= ?$
$f(y_i|a) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y_i - a)^2}{2\sigma^2}}$,
$f(y_1, y_2, y_3|a) = \prod_{i=1}^{3} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y_i - a)^2}{2\sigma^2}}$,
$\frac{\partial f(y_1, y_2, y_3|a)}{\partial a}\Big|_{a = \hat{a}_{ML}} = 0 \;\Rightarrow\; \hat{a}_{ML} = \frac{1}{3}(y_1 + y_2 + y_3)$.
$E[\hat{a}_{ML}] = E\left[ \tfrac{1}{3}(y_1 + y_2 + y_3) \right] = a$ (unbiased).
$\mathrm{Var}[\hat{a}_{ML}] = \frac{1}{9} \mathrm{Var}(y_1 + y_2 + y_3) = \frac{1}{9}(\sigma^2 + \sigma^2 + \sigma^2) = \frac{\sigma^2}{3}$.
43
Ex. $x$ is a measurement of a uniformly distributed random
variable on $[0, \theta]$, where $\theta$ is an unknown constant. $\hat{\theta}_{ML} = ?$
$\hat{\theta}_{ML} = x$.
Question: What if two independent measurements $x_1$ and $x_2$ are taken?
$\hat{\theta}_{ML} = \max(x_1, x_2)$.
44
Cramér-Rao Bound.
Let $B(a) = E[\hat{a}(r)|a] - a$ denote the bias of $\hat{a}(r)$, where $r$ is the
measurement. Then
$\mathrm{MSE} = E\left[ (\hat{a}(r) - a)^2 \,\big|\, a \right] \geq \frac{\left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2}{E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \big|\, a \right\}}$.
* The denominator of the CRB is known as Fisher's Information, $I(a)$.
* If $B(a) = 0$, the numerator of the CRB is 1.
45
Proof: $B(a) = E[\hat{a}(r) - a \,|\, a] = \int [\hat{a}(r) - a]\, f(r|a)\,dr$.
$\frac{\partial}{\partial a} B(a) = \int [\hat{a}(r) - a]\, \frac{\partial}{\partial a} f(r|a)\,dr - \underbrace{\int f(r|a)\,dr}_{=1}$
$\Rightarrow 1 + \frac{\partial}{\partial a} B(a) = \int [\hat{a}(r) - a]\, \frac{\partial}{\partial a} f(r|a)\, \frac{f(r|a)}{f(r|a)}\,dr$.
But $\frac{\partial}{\partial a} \ln f(r|a) = \frac{\frac{\partial}{\partial a} f(r|a)}{f(r|a)}$, so
$1 + \frac{\partial}{\partial a} B(a) = \int [\hat{a}(r) - a]\, f(r|a)\, \frac{\partial}{\partial a} \ln f(r|a)\,dr$
$\Rightarrow \left\{ \int [\hat{a}(r) - a] \sqrt{f(r|a)} \cdot \left[ \frac{\partial}{\partial a} \ln f(r|a) \right] \sqrt{f(r|a)}\,dr \right\}^2 = \left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2$.
46
Schwarz Inequality:
$\left| \int g_1(x) g_2(x)\,dx \right| \leq \left[ \int g_1^2(x)\,dx \right]^{1/2} \left[ \int g_2^2(x)\,dx \right]^{1/2}$,
where "=" holds iff $g_1(x) = c\,g_2(x)$ for some constant $c$ ($c$ is
independent of $x$).
$\Rightarrow \left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2 \leq \int [\hat{a}(r) - a]^2 f(r|a)\,dr \cdot \underbrace{\int \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 f(r|a)\,dr}_{I(a)}$,
where "=" holds iff
$\hat{a}(r) - a = c\, \frac{\partial}{\partial a} \ln f(r|a)$
(where $c$ is a constant independent of $r$).
47
Efficient Estimate:
An estimate is efficient if
(a) it is unbiased, and
(b) it achieves the CR bound, i.e., $E\left[ (\hat{a}(r) - a)^2 \,|\, a \right] = \mathrm{CRB}$.
Ex. $r = a + e$, where $a$ is an unknown constant and $e \sim N(0, \sigma^2)$.
$\hat{a}_{ML} = ?$ Efficient?
$f(r|a) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2\sigma^2}(r-a)^2}$,
$\ln f(r|a) = \ln \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2}(r-a)^2$,
$\frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{2\sigma^2} \cdot 2(r-a) = -\frac{1}{\sigma^2}(a - r)$.
48
$\frac{\partial}{\partial a} \ln f(r|a)\Big|_{a = \hat{a}_{ML}} = 0 \;\Rightarrow\; \hat{a}_{ML} = r$,
$\frac{\partial}{\partial a} \ln f(r|a) = -\frac{1}{\sigma^2}(a - \hat{a}_{ML}) \;\Rightarrow\; \sigma^2\, \frac{\partial}{\partial a} \ln f(r|a) = \hat{a}_{ML} - a$
$\Rightarrow \hat{a}_{ML}$ is efficient: $E\left[ (\hat{a}_{ML} - a)^2 \right] = \mathrm{CRB}$.
$E[\hat{a}_{ML}] = E[r] = a$ (unbiased).
Remark: $\mathrm{MSE} = \mathrm{Var}[\hat{a}_{ML}] = \mathrm{Var}[r] = \sigma^2$.
$I(a) = E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \right\} = E\left[ \frac{(a - r)^2}{\sigma^4} \right] = \frac{1}{\sigma^2}$
$\Rightarrow \mathrm{CRB} = \frac{1}{I(a)} = \sigma^2 = \mathrm{Var}[\hat{a}_{ML}]$.
49
Remarks:
(1) If $\hat{a}(r)$ is unbiased, $\mathrm{Var}[\hat{a}(r)] \geq \mathrm{CRB}$.
(2) If an efficient estimate $\hat{a}(r)$ exists, i.e.,
$\frac{\partial}{\partial a} \ln f(r|a) = c\,[\hat{a}(r) - a]$ ($c$ independent of $r$),
then $0 = \frac{\partial}{\partial a} \ln f(r|a)\big|_{a = \hat{a}_{ML}(r)}$ results in $\hat{a}_{ML}(r) = \hat{a}(r)$.
$\Rightarrow$ If an efficient estimate exists, it is $\hat{a}_{ML}$.
(3) If an efficient estimate does not exist, how good $\hat{a}_{ML}(r)$ is
depends on each specific problem.
No estimator can achieve the CR bound. Bounds larger than the
CR bound (for example, Bhattacharyya, Barankin) may be found.
50
Independent measurements $r_1, \ldots, r_N$ available, where the $r_i$ may or
may not be Gaussian.
Assume
$\hat{a}_{ML} = \frac{1}{N} \sum_{i=1}^{N} r_i$.
Law of large numbers: $\hat{a}_{ML} \xrightarrow{N \to \infty} a$.
Central Limit Theorem: $\hat{a}_{ML}$ has a Gaussian distribution as $N \to \infty$.
51
Asymptotic Properties of $\hat{a}_{ML}(r_1, \ldots, r_N)$:
(a) $\hat{a}_{ML}(r_1, \ldots, r_N) \xrightarrow{N \to \infty} a$ ($\hat{a}_{ML}$ is a consistent estimate).
(b) $\hat{a}_{ML}$ is asymptotically efficient.
(c) $\hat{a}_{ML}$ is asymptotically Gaussian.
Ex. $r = g^{-1}(a) + e$, $e \sim N(0, \sigma^2)$. $\hat{a}_{ML} = ?$ Efficient?
Let $b = g^{-1}(a)$; then $a = g(b)$.
$\frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{\sigma^2} \left[ r - g^{-1}(a) \right] \frac{d\,g^{-1}(a)}{da}\Big|_{a = \hat{a}_{ML}} = 0$
$\Rightarrow \hat{a}_{ML} = g(r) = g(\hat{b}_{ML})$.
Invariance property of the ML estimator:
If $a = g(b)$ then $\hat{a}_{ML} = g(\hat{b}_{ML})$.
$\hat{a}_{ML}$ may not be efficient: $\hat{a}_{ML}$ is not efficient if $g(\cdot)$ is a
nonlinear function.
52
PROPERTIES OF THE PERIODOGRAM
Bias Analysis
When $\hat{r}(k)$ is the biased estimate,
$E[\hat{P}_p(\omega)] = E[\hat{P}_c(\omega)] = E\left[ \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k} \right]$,
$k \geq 0$: $E[\hat{r}(k)] = \frac{N-k}{N}\, r(k)$,
$k < 0$: $E[\hat{r}(k)] = E[\hat{r}^*(-k)] = \frac{N+k}{N}\, r^*(-k) = \frac{N-|k|}{N}\, r(k)$,
$\Rightarrow E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k) e^{-j\omega k}$.
53
Bartlett or Triangular Window:
$w_B(k) = 1 - \frac{|k|}{N}$, $\quad |k| \leq N-1$,
$E[\hat{P}_p(\omega)] = \sum_{k=-\infty}^{\infty} [w_B(k)\, r(k)]\, e^{-j\omega k}$.
Let $w_B(k) \xrightarrow{DTFT} W_B(\omega)$. Then
$E[\hat{P}_p(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_B(\omega - \psi)\,d\psi$.
54
When $\hat{r}(k)$ is the unbiased estimate,
$E[\hat{P}_c(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_R(\omega - \psi)\,d\psi$,
where $w_R(k)$ is the rectangular window on $|k| \leq N-1$,
$w_R(k) \xrightarrow{DTFT} W_R(\omega)$.
In either case, $E[\hat{P}(\omega)]$ is the true PSD smoothed by the window transform $W_{B,R}(\omega)$.
55
$W_B(\omega)$ has a main lobe and side lobes; the 3 dB power width of the
main lobe is approximately $\frac{2\pi}{N}$ (or $\frac{1}{N}$ in Hz).
Remark: The main lobe of $W_B(\omega)$ smears or smooths $P(\omega)$:
two peaks in $P(\omega)$ that are separated by less than $\frac{2\pi}{N}$ cannot be
resolved in $\hat{P}_p(\omega)$.
$\Rightarrow \frac{1}{N}$ in Hz is called the spectral resolution limit of periodogram
methods.
56
Remark:
The side lobes of $W_B(\omega)$ transfer power from high-power
frequency bins to low-power frequency bins: leakage.
Smearing and leakage cause more problems for peaky $P(\omega)$ than
for flat $P(\omega)$.
If $P(\omega) = \sigma^2$ for all $\omega$, then $E[\hat{P}_p(\omega)] = P(\omega)$.
The bias of $\hat{P}_p(\omega)$ decreases as $N \to \infty$ (asymptotically unbiased).
57
Variance Analysis
We shall consider the case where $x(n)$ is zero-mean circularly symmetric
complex Gaussian white noise:
$E[x(n)\,x^*(k)] = \sigma^2\,\delta(n-k)$,
$E[x(n)\,x(k)] = 0$ for all $n, k$.
This is equivalent to:
$E[\mathrm{Re}(x(n))\,\mathrm{Re}(x(k))] = \frac{\sigma^2}{2}\,\delta(n-k)$,
$E[\mathrm{Im}(x(n))\,\mathrm{Im}(x(k))] = \frac{\sigma^2}{2}\,\delta(n-k)$,
$E[\mathrm{Re}(x(n))\,\mathrm{Im}(x(k))] = 0$.
Remark: The real and imaginary parts of $x(n)$ are $N(0, \frac{\sigma^2}{2})$ and
independent of each other.
58
Remark: If $x(n)$ is zero-mean complex Gaussian white noise, $\hat{P}_p(\omega)$
is an unbiased estimate:
$r(k) = \sigma^2\,\delta(k)$,
$E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k)\, e^{-j\omega k} = \sigma^2$,
$P(\omega) = \sum_{k=-\infty}^{\infty} r(k)\, e^{-j\omega k} = \sigma^2 = E[\hat{P}_p(\omega)]$.
59
For complex Gaussian white noise,
$E[x(k)\,x^*(l)\,x(m)\,x^*(n)] = \sigma^4 [\delta(k-l)\delta(m-n) + \delta(k-n)\delta(l-m)]$.
$E[\hat{P}_p(\omega_1)\,\hat{P}_p(\omega_2)] = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} E[x(k)\,x^*(l)\,x(m)\,x^*(n)]\, e^{-j\omega_1(k-l)}\, e^{-j\omega_2(m-n)}$
$= \sigma^4 + \frac{\sigma^4}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} e^{-j(\omega_1 - \omega_2)(k-l)}$
$= \sigma^4 + \frac{\sigma^4}{N^2} \left| \sum_{k=0}^{N-1} e^{-j(\omega_1 - \omega_2)k} \right|^2$
$= \sigma^4 + \frac{\sigma^4}{N^2} \left\{ \frac{\sin[(\omega_1 - \omega_2)N/2]}{\sin[(\omega_1 - \omega_2)/2]} \right\}^2$.
60
$\lim_{N \to \infty} E[\hat{P}_p(\omega_1)\,\hat{P}_p(\omega_2)] = P(\omega_1)P(\omega_2) + P^2(\omega_1)\,\delta(\omega_1 - \omega_2)$.
$\lim_{N \to \infty} E\left\{ [\hat{P}_p(\omega_1) - P(\omega_1)][\hat{P}_p(\omega_2) - P(\omega_2)] \right\} = \begin{cases} P^2(\omega_1), & \omega_1 = \omega_2 \\ 0, & \omega_1 \neq \omega_2 \end{cases}$
(uncorrelated if $\omega_1 \neq \omega_2$).
Remark: $\hat{P}_p(\omega)$ is not a consistent estimate.
If $\omega_1 \neq \omega_2$, $\hat{P}_p(\omega_1)$ and $\hat{P}_p(\omega_2)$ are uncorrelated with each other.
This variance result is also true for
$y(n) = \sum_{k=0}^{\infty} h(k)\,x(n-k)$,
where $x(n)$ is zero-mean complex Gaussian white noise.
61
REFINED METHODS
Decrease the variance of $\hat{P}(\omega)$ by increasing bias or
decreasing resolution.
Blackman-Tukey (BT) Method
Remark: The $\hat{r}(k)$ used in $\hat{P}_c(\omega)$ is a poor estimate for large lags $k$.
For $M < N$:
$\hat{P}_{BT}(\omega) = \sum_{k=-(M-1)}^{M-1} w(k)\,\hat{r}(k)\, e^{-j\omega k}$,
where $w(k)$ is called the lag window.
Remark: If $w(k)$ is rectangular, $w(k)\hat{r}(k)$ is a truncated version of $\hat{r}(k)$.
If $\hat{r}(k)$ is the biased estimate, and $w(k) \xrightarrow{DTFT} W(\omega)$,
$\hat{P}_{BT}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega - \psi)\,\hat{P}_p(\psi)\,d\psi$.
62
Remark: The BT spectral estimator is a locally weighted average of the
periodogram $\hat{P}_p(\omega)$.
The smaller the $M$, the poorer the resolution of $\hat{P}_{BT}(\omega)$ but the
lower the variance:
Resolution of $\hat{P}_{BT}(\omega) \sim \frac{1}{M}$;
Variance of $\hat{P}_{BT}(\omega) \sim \frac{M}{N} \xrightarrow[M \text{ fixed}]{N \to \infty} 0$.
For fixed $M$, $\hat{P}_{BT}(\omega)$ is asymptotically biased but its variance $\to 0$.
Question: When is $\hat{P}_{BT}(\omega) \geq 0$?
63
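A minimal sketch of a BT estimator (function name mine, not from the notes), using the biased ACF estimate and a Bartlett lag window — the combination the notes identify as guaranteeing $\hat{P}_{BT}(\omega) \geq 0$:

```python
import numpy as np

def blackman_tukey(x, M, nfft=512):
    # P_BT(w) = sum_{k=-(M-1)}^{M-1} w(k) r(k) e^{-jwk},
    # biased ACF + Bartlett lag window, evaluated on an nfft grid via FFT
    N = len(x)
    r = np.array([np.sum(x[k:] * np.conj(x[:N - k] if k > 0 else x)) / N
                  for k in range(M)])
    c = r * (1.0 - np.arange(M) / M)      # one-sided windowed ACF
    seq = np.zeros(nfft, dtype=complex)   # wrap negative lags to the end
    seq[:M] = c
    seq[-(M - 1):] = np.conj(c[1:][::-1])
    return np.fft.fft(seq).real

rng = np.random.default_rng(2)
x = rng.standard_normal(512)
P = blackman_tukey(x, M=32)
```

Both the periodogram (from the biased ACF) and the Bartlett window transform are nonnegative, so their convolution stays nonnegative up to rounding error.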
Theorem: Let $Y(\omega) \xleftarrow{DTFT} y(n)$, $-(N-1) \leq n \leq N-1$.
Then $Y(\omega) \geq 0$ iff the Toeplitz matrix built from the zero-extended
sequence
$\ldots, 0, y[-(N-1)], \ldots, y(0), \ldots, y(N-1), 0, \ldots$
is positive semidefinite.
In other words, $Y(\omega) \geq 0$ iff
$\ldots, 0, \ldots, 0, y[-(N-1)], \ldots, y(0), y(1), \ldots, y(N-1), 0, \ldots$ is a
positive semidefinite sequence.
64
Remark: $\hat{P}_{BT}(\omega) \geq 0$ iff $\{w(k)\hat{r}(k)\}$ is a positive semidefinite
sequence.
$\Rightarrow \hat{P}_{BT}(\omega) \geq 0$ iff the Toeplitz matrix $\hat{R}_{BT}$ built from the
zero-padded sequence $\{w(k)\hat{r}(k)\}$, $|k| \leq M-1$, is positive
semidefinite, i.e., $\hat{R}_{BT} \geq 0$.
65
$\hat{R}_{BT}$ = (Toeplitz matrix of $\{w(k)\}$) $\odot$ (Toeplitz matrix of $\{\hat{r}(k)\}$),
where $\odot$ is the Hadamard (elementwise) matrix product:
$(A \odot B)_{ij} = A_{ij} B_{ij}$.
66
Theorem:
If $A \geq 0$ (positive semidefinite) and $B \geq 0$, then $A \odot B \geq 0$.
Remark: If $\hat{r}(k)$ is the biased estimate, $\hat{P}_p(\omega) \geq 0$. Then if
$W(\omega) \geq 0$, we have $\hat{P}_{BT}(\omega) \geq 0$.
Remark: Nonnegative definite (positive semidefinite) window
sequences: Bartlett, Parzen.
67
Time-Bandwidth Product
Equivalent Time Width $N_e$:
$N_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{w(0)}$.
Ex. Rectangular window $w_R(n) = 1$, $|n| \leq M-1$:
$N_e = \frac{\sum_{k=-(M-1)}^{M-1} 1}{1} = 2M - 1$.
68
Ex. Bartlett window:
$w_B(n) = \begin{cases} 1 - \frac{|n|}{M}, & -(M-1) \leq n \leq M-1 \\ 0, & \text{else} \end{cases}$
$\Rightarrow N_e = M$.
69
Equivalent Bandwidth $\beta_e$:
$2\pi \beta_e = \frac{\int_{-\pi}^{\pi} W(\omega)\,d\omega}{W(0)}$.
Since $w(n) \xrightarrow{DTFT} W(\omega)$:
$w(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega) e^{j\omega n}\,d\omega \;\Rightarrow\; w(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega)\,d\omega$,
$W(\omega) = \sum_{n=-(M-1)}^{M-1} w(n) e^{-j\omega n} \;\Rightarrow\; W(0) = \sum_{n=-(M-1)}^{M-1} w(n)$.
70
$N_e \beta_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{\frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega)\,d\omega} \cdot \frac{\int_{-\pi}^{\pi} W(\omega)\,d\omega}{2\pi \sum_{n=-(M-1)}^{M-1} w(n)} = 1$
$\Rightarrow N_e \beta_e = 1$ (time-bandwidth product).
Remark:
If a signal decays slowly in one domain, it is more concentrated in
the other domain.
Window shape determines the side lobe level relative to $W(0)$.
71
Ex: Time scaling:
$x(2n) \xrightarrow{DTFT} \frac{1}{2} X\!\left(\frac{\omega}{2}\right)$ —
compressing in time expands in frequency.
Remark: Once the window shape is fixed, $M \propto N_e \propto \frac{1}{\beta_e}$:
$M \uparrow \;\Rightarrow\;$ main lobe width $\downarrow$.
72
Window design for $\hat{P}_{BT}(\omega)$:
Let $\beta_m$ = 3 dB main lobe width. Then
Resolution of $\hat{P}_{BT}(\omega) \sim \beta_m$;
Variance of $\hat{P}_{BT}(\omega) \sim \frac{1}{\beta_m}$.
The choice of $\beta_m$ is based on the trade-off between resolution
and variance, and on $N$.
The choice of window shape is based on leakage, and on $N$.
Practical rules of thumb:
1. $M \leq \frac{N}{10}$.
2. Window shape based on the trade-off between smearing and leakage.
3. Window shape chosen so that $\hat{P}_{BT}(\omega) \geq 0$.
Remark: Other methods for nonparametric spectral
estimation include the Bartlett, Welch, and Daniell methods.
All try to reduce variance at the expense of poorer resolution.
73
Bartlett Method
Split the $N$-point sequence $x(n)$ into $L$ non-overlapping subsequences
$x_1(n), x_2(n), \ldots, x_L(n)$, each of length $M$; $L = \frac{N}{M}$.
$\hat{P}_l(\omega) = \frac{1}{M} \left| \sum_{n=0}^{M-1} x_l(n) e^{-j\omega n} \right|^2$,
$\hat{P}_B(\omega) = \frac{1}{L} \sum_{l=1}^{L} \hat{P}_l(\omega)$.
Remark: $\hat{P}_B(\omega) \geq 0$, $\forall \omega$.
For large $M$ and $L$, $\hat{P}_B(\omega) \approx [\hat{P}_{BT}(\omega)$ using $w_R(n)]$.
74
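A minimal sketch of the Bartlett method (function name mine): split into non-overlapping blocks, average the block periodograms. For unit-variance white noise the averaged estimate should hover near the flat PSD level of 1.

```python
import numpy as np

def bartlett_psd(x, M):
    # L = N // M non-overlapping M-point blocks; average their periodograms
    N = len(x)
    L = N // M
    blocks = x[:L * M].reshape(L, M)
    periodograms = np.abs(np.fft.fft(blocks, axis=1)) ** 2 / M
    return periodograms.mean(axis=0)

rng = np.random.default_rng(3)
x = rng.standard_normal(1024)
P = bartlett_psd(x, M=64)
```

Averaging $L$ blocks cuts the variance by roughly $L$, at the cost of resolution $\sim 1/M$ instead of $\sim 1/N$.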
Welch Method:
The segments $x_l(n)$ may overlap in the Welch method, and
may be windowed before computing the periodogram.
Let $w(n)$ be the window applied to $x_l(n)$, $l = 1, \ldots, S$, $n = 0, \ldots, M-1$,
and let
$P$ = power of $w(n)$ = $\frac{1}{M} \sum_{n=0}^{M-1} |w(n)|^2$.
75
$\hat{P}_l(\omega) = \frac{1}{MP} \left| \sum_{n=0}^{M-1} w(n)\,x_l(n)\, e^{-j\omega n} \right|^2$,
$\hat{P}_W(\omega) = \frac{1}{S} \sum_{l=1}^{S} \hat{P}_l(\omega)$.
Remarks: By allowing the $x_l(n)$ to overlap, we hope to have a larger
$S$, the number of $\hat{P}_l(\omega)$ we average; 50% overlap in general.
Practical examples show that $\hat{P}_W(\omega)$ may offer lower variance
than $\hat{P}_B(\omega)$, but not significantly.
$\hat{P}_W(\omega)$ may be shown to be a $\hat{P}_{BT}(\omega)$-type estimator, under
reasonable approximation.
$\hat{P}_W(\omega)$ can be easily computed with the FFT — favored in practice.
$\hat{P}_{BT}(\omega)$ is theoretically favored.
76
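A minimal sketch of the Welch estimator (function name mine; the Hann window is my choice, not prescribed by the notes), with the $\frac{1}{MP}$ normalization from the slides so that unit-variance white noise maps to a level near 1:

```python
import numpy as np

def welch_psd(x, M, overlap=0.5):
    # Windowed, overlapping segments; each periodogram normalized by
    # M * P, where P = (1/M) sum |w(n)|^2 is the window power.
    w = np.hanning(M)
    Pw = np.mean(w ** 2)
    step = int(M * (1 - overlap))
    starts = range(0, len(x) - M + 1, step)
    segs = np.array([w * x[s:s + M] for s in starts])
    return (np.abs(np.fft.fft(segs, axis=1)) ** 2 / (M * Pw)).mean(axis=0)

rng = np.random.default_rng(4)
x = rng.standard_normal(2048)
P = welch_psd(x, M=128)
```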
Daniell Method:
$\hat{P}_D(\omega) = \frac{1}{2\beta\pi} \int_{\omega - \beta\pi}^{\omega + \beta\pi} \hat{P}_p(\psi)\,d\psi$.
Remark: $\hat{P}_D(\omega)$ is a special case of $\hat{P}_{BT}(\omega)$ with
$w(n) \xrightarrow{DTFT} W(\omega) = \begin{cases} \frac{1}{\beta}, & \omega \in [-\beta\pi, \beta\pi] \\ 0, & \text{else} \end{cases}$
The larger the $\beta$, the lower the variance, but the poorer the
resolution.
77
Implementation of $\hat{P}_D(\omega)$:
Zero-pad $x(n)$ so that it has $\tilde{N}$ points, $\tilde{N} \gg N$.
Calculate $\hat{P}_p(\omega_k)$ with the FFT,
$\omega_k = \frac{2\pi}{\tilde{N}} k$, $\quad k = 0, \ldots, \tilde{N} - 1$.
$\hat{P}_D(\omega_k) = \frac{1}{2J+1} \sum_{j=k-J}^{k+J} \hat{P}_p(\omega_j)$
($2J+1$-point averaging of $\hat{P}_p$).
78
PARAMETRIC METHODS
Parametric Modeling
Ex.
$P(f) = \frac{r(0)}{\sqrt{2\pi}\,\sigma_f}\, e^{-\frac{f^2}{2\sigma_f^2}}$, $\quad |f| \leq \frac{1}{2}$.
Remark: $P(f)$ is described by 2 unknowns: $r(0)$ and $\sigma_f$.
Once we know $r(0)$ and $\sigma_f$, we know $P(f)$, the PSD.
Nonparametric methods assume no knowledge of $P(f)$: too
many unknowns.
Parametric methods attempt to estimate $r(0)$ and $\sigma_f$.
79
Parsimony Principle:
Better estimates may be obtained by using an
appropriate data model with fewer unknowns.
Appropriate Data Model:
If the data model is wrong, $\hat{P}(f)$ will always be biased.
To use parametric methods, reasonably correct a priori
knowledge of the data model is necessary.
80
Rational Spectra:
$P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2$,
$A(\omega) = 1 + a_1 e^{-j\omega} + \cdots + a_p e^{-j\omega p}$,
$B(\omega) = 1 + b_1 e^{-j\omega} + \cdots + b_q e^{-j\omega q}$.
Remark: We mostly consider real-valued signals here:
$a_1, \ldots, a_p, b_1, \ldots, b_q$ are real coefficients.
Any continuous PSD can be approximated arbitrarily closely by a
rational PSD.
81
Filtering zero-mean white noise $u(n)$ of variance $\sigma^2$ through
$H(\omega) = \frac{B(\omega)}{A(\omega)}$ gives $x(n)$ with
$P_{xx}(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2$.
Remark:
Rational spectra can be associated with a signal obtained by
filtering white noise of power $\sigma^2$ through a rational filter with
$H(\omega) = B(\omega)/A(\omega)$.
82
In difference equation form,
$x(n) = -\sum_{k=1}^{p} a_k\, x(n-k) + \sum_{k=0}^{q} b_k\, u(n-k)$.
In Z-transform form, with $z = e^{j\omega}$:
$H(z) = \frac{B(z)}{A(z)}$,
$A(z) = 1 + a_1 z^{-1} + \cdots + a_p z^{-p}$,
$B(z) = 1 + b_1 z^{-1} + \cdots + b_q z^{-q}$.
Notation sometimes used: $z^{-1} x(n) = x(n-1)$ (unit delay).
Then $x(n) = \frac{B(z)}{A(z)}\, u(n)$.
83
ARMA Model: ARMA(p, q):
$P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2$.
AR Model: AR(p):
$P(\omega) = \sigma^2 \left| \frac{1}{A(\omega)} \right|^2$.
MA Model: MA(q):
$P(\omega) = \sigma^2\, |B(\omega)|^2$.
Remark: AR models peaky PSDs better.
MA models valley PSDs better.
ARMA is used for PSDs with both peaks and valleys.
84
Spectral Factorization:
$P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2 = \sigma^2\, \frac{B(\omega)\,B^*(\omega)}{A(\omega)\,A^*(\omega)}$.
With $b_1, \ldots, b_q, a_1, \ldots, a_p$ real coefficients:
$A^*(\omega) = 1 + a_1 e^{j\omega} + \cdots + a_p e^{j\omega p} = 1 + a_1 \frac{1}{z} + \cdots + a_p \frac{1}{z^p} = A\!\left(\frac{1}{z}\right)$,
so
$P(z) = \sigma^2\, \frac{B(z)\,B(1/z)}{A(z)\,A(1/z)}$.
Remark: If $a_1, \ldots, a_p, b_1, \ldots, b_q$ are complex,
$P(z) = \sigma^2\, \frac{B(z)\,B^*(1/z^*)}{A(z)\,A^*(1/z^*)}$.
85
Consider
$P(z) = \sigma^2\, \frac{B(z)\,B(1/z)}{A(z)\,A(1/z)}$.
Remark: If $\beta$ is a zero of $P(z)$, so is $\frac{1}{\beta}$.
If $\alpha$ is a pole of $P(z)$, so is $\frac{1}{\alpha}$.
Since $a_1, \ldots, a_p, b_1, \ldots, b_q$ are real, the poles and zeroes of
$P(z)$ occur in complex conjugate pairs.
86
Remark:
If the poles of $\frac{1}{A(z)}$ are inside the unit circle, $H(z) = \frac{B(z)}{A(z)}$ is BIBO stable.
If the zeroes of $B(z)$ are inside the unit circle, $H(z) = \frac{B(z)}{A(z)}$ is minimum
phase.
We choose $H(z)$ so that both its zeroes and poles are inside the unit
circle: a stable and minimum phase system.
87
Relationships Among Models
An MA(q) or ARMA(p, q) model is equivalent to an AR($\infty$) model.
An AR(p) or ARMA(p, q) model is equivalent to an MA($\infty$) model.
Ex:
$H(z) = \frac{1 + 0.9 z^{-1}}{1 + 0.8 z^{-1}}$ = ARMA(1,1)
$= \frac{1}{(1 + 0.8 z^{-1})\, \frac{1}{1 + 0.9 z^{-1}}} = \frac{1}{(1 + 0.8 z^{-1})(1 - 0.9 z^{-1} + 0.81 z^{-2} - \cdots)}$ = AR($\infty$).
Remark: Let ARMA(p, q) $= \frac{B(z)}{A(z)} = \frac{1}{C(z)}$ = AR($\infty$).
From $a_1, \ldots, a_p, b_1, \ldots, b_q$ we can find $c_1, c_2, \ldots$, and vice versa.
88
Since $\frac{B(z)}{A(z)} = \frac{1}{C(z)}$, we have $B(z)\,C(z) = A(z)$:
$\left(1 + b_1 z^{-1} + \cdots + b_q z^{-q}\right)\left(1 + c_1 z^{-1} + c_2 z^{-2} + \cdots\right) = 1 + a_1 z^{-1} + \cdots + a_p z^{-p}$.
Matching coefficients of powers of $z^{-1}$ gives
$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ c_1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ c_p & \cdots & c_1 & 1 \\ c_{p+1} & c_p & \cdots & c_1 \\ \vdots & \ddots & & \vdots \end{bmatrix} \begin{bmatrix} 1 \\ b_1 \\ b_2 \\ \vdots \\ b_q \end{bmatrix} = \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \\ 0 \\ \vdots \end{bmatrix}. \qquad (*)$
89
The rows below row $p$ of $(*)$ are homogeneous in the $b_i$:
$\begin{bmatrix} c_{p+1} & c_p & \cdots & c_{p-q+1} \\ \vdots & & \ddots & \vdots \\ c_{p+q} & \cdots & & c_p \end{bmatrix} \begin{bmatrix} 1 \\ b_1 \\ \vdots \\ b_q \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$
$\Rightarrow \begin{bmatrix} c_p & \cdots & c_{p-q+1} \\ \vdots & \ddots & \vdots \\ c_{p+q-1} & \cdots & c_p \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_q \end{bmatrix} = -\begin{bmatrix} c_{p+1} \\ \vdots \\ c_{p+q} \end{bmatrix}. \qquad (**)$
Remark: Once $b_1, \ldots, b_q$ are computed with $(**)$, $a_1, \ldots, a_p$ can be
computed with $(*)$.
90
Computing Coefficients from $r(k)$.
AR signals.
Let $\frac{1}{A(z)} = 1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots$, so that
$x(n) = \frac{1}{A(z)}\, u(n) = u(n) + \alpha_1 u(n-1) + \cdots$
$\Rightarrow E[x(n)\,u(n)] = \sigma^2$, $\quad E[x(n-k)\,u(n)] = 0$, $k \geq 1$.
Since $A(z)\,x(n) = u(n)$:
$x(n) + a_1 x(n-1) + \cdots + a_p x(n-p) = u(n)$, i.e.,
$\begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-p) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = u(n)$.
91
$k = 0$: multiply by $x(n)$ and take expectations:
$E\left\{ x(n) \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-p) \end{bmatrix} \right\} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \sigma^2$
$\Rightarrow \begin{bmatrix} r(0) & r(1) & \cdots & r(p) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \sigma^2. \qquad (*)$
92
$k \geq 1$: multiply by $x(n-k)$ and take expectations:
$E\left\{ x(n-k) \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-p) \end{bmatrix} \right\} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = 0$
$\Rightarrow \begin{bmatrix} r(k) & r(k-1) & \cdots & r(k-p) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = 0. \qquad (**)$
93
Combining $(*)$ and $(**)$ for $k = 0, 1, \ldots, p$:
$\begin{bmatrix} r(0) & r(1) & \cdots & r(p) \\ r(1) & r(0) & \cdots & r(p-1) \\ \vdots & \vdots & \ddots & \vdots \\ r(p) & r(p-1) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \sigma^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$.
$Ra = -r$:
$\begin{bmatrix} r(0) & \cdots & r(p-1) \\ \vdots & \ddots & \vdots \\ r(p-1) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} = -\begin{bmatrix} r(1) \\ \vdots \\ r(p) \end{bmatrix}$.
94
Remarks:
When we only have $N$ samples, $\{r(k)\}$ is not available; $\{\hat{r}(k)\}$
may be used to replace $\{r(k)\}$ to obtain $\hat{a}_1, \ldots, \hat{a}_p$.
This is the Yule-Walker Method.
$R$ is a positive semidefinite matrix; $R$ is positive definite unless
$x(n)$ is a sum of fewer than $\frac{p}{2}$ sinusoids.
$R$ is Toeplitz.
The Levinson-Durbin algorithm is used to solve for $a$ efficiently.
AR models are most frequently used in practice.
Estimation of AR parameters is a well-established topic.
95
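A minimal sketch of the Yule-Walker method (function name mine), using the biased ACF estimate as the notes recommend for stability, checked on a simulated AR(1) process $x(n) = 0.5\,x(n-1) + u(n)$ (so $a_1 = -0.5$ and $\sigma^2 = 1$):

```python
import numpy as np

def yule_walker(x, p):
    # Solve R a = -r with the biased ACF estimate;
    # sigma^2 = r(0) + sum_k a_k r(k)  (equation (*) on the slides)
    N = len(x)
    r = np.array([np.sum(x[k:] * np.conj(x[:N - k] if k > 0 else x)) / N
                  for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, -r[1:])
    sigma2 = (r[0] + np.dot(a, r[1:])).real
    return a, sigma2

rng = np.random.default_rng(5)
u = rng.standard_normal(20000)
x = np.zeros_like(u)
for n in range(1, len(u)):
    x[n] = 0.5 * x[n - 1] + u[n]
a, sigma2 = yule_walker(x, p=1)
```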
Remarks:
If $\{\hat{r}(k)\}$ is a positive definite sequence and $\hat{a}_1, \ldots, \hat{a}_p$ are found
by solving $\hat{R}\hat{a} = -\hat{r}$, then the roots of the polynomial
$1 + \hat{a}_1 z^{-1} + \cdots + \hat{a}_p z^{-p}$
are inside the unit circle.
The AR system thus obtained is BIBO stable.
The biased estimate $\{\hat{r}(k)\}$ should be used in the YW equations to
obtain a stable AR system.
96
Efficient Methods for solving $Ra = -r$ or $\hat{R}\hat{a} = -\hat{r}$:
Levinson-Durbin Algorithm.
Delsarte-Genin Algorithm.
Gohberg-Semencul Formula for $R^{-1}$ or $\hat{R}^{-1}$.
(Sometimes we may be interested not only in $a$ but also in $R^{-1}$.)
97
Levinson-Durbin Algorithm (LDA)
Let
$R_{n+1} = \begin{bmatrix} r(0) & r(1) & \cdots & r(n) \\ r(1) & r(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r(1) \\ r(n) & \cdots & r(1) & r(0) \end{bmatrix}$ (real signal), $\quad n = 1, 2, \ldots, p$,
and let
$\theta_n = \begin{bmatrix} a_{n,1} \\ \vdots \\ a_{n,n} \end{bmatrix}$.
98
The LDA solves
$R_{n+1} \begin{bmatrix} 1 \\ \theta_n \end{bmatrix} = \begin{bmatrix} \sigma_n^2 \\ 0 \end{bmatrix}$
recursively in $n$, starting from $n = 1$.
Remark: For $n = 1, 2, \ldots, p$,
the LDA needs $\sim p^2$ flops;
regular matrix inverses need $\sim p^4$ flops.
99
Lemma: Let $A$ be symmetric and Toeplitz. Let
$\tilde{b} = \begin{bmatrix} b_n & b_{n-1} & \cdots & b_1 \end{bmatrix}^T$, with $b = \begin{bmatrix} b_1 & \cdots & b_n \end{bmatrix}^T$.
Then if $c = Ab$,
$\tilde{c} = A\tilde{b}$.
100
Proof: $A_{ij} = a_{|i-j|}$, so
$\tilde{c}_i = c_{n-i+1} = \sum_{k=1}^{n} A_{n-i+1,k}\, b_k = \sum_{k=1}^{n} a_{|n-i+1-k|}\, b_k$
$= \sum_{m=1}^{n} a_{|m-i|}\, b_{n-m+1} \quad (m = n - k + 1)$
$= \sum_{m=1}^{n} A_{m,i}\, \tilde{b}_m = (A\tilde{b})_i$.
101
Consider, with $r_n = \begin{bmatrix} r(1) & \cdots & r(n) \end{bmatrix}^T$ and $\tilde{r}_n$ its reversal:
$R_{n+2} \begin{bmatrix} 1 \\ \theta_n \\ 0 \end{bmatrix} = \begin{bmatrix} R_{n+1} & \begin{matrix} r(n+1) \\ \tilde{r}_n \end{matrix} \\ \begin{matrix} r(n+1) & \tilde{r}_n^T \end{matrix} & r(0) \end{bmatrix} \begin{bmatrix} 1 \\ \theta_n \\ 0 \end{bmatrix} = \begin{bmatrix} \sigma_n^2 \\ 0 \\ \alpha_n \end{bmatrix}$,
where
$\alpha_n = r(n+1) + \theta_n^T \tilde{r}_n$.
102
Result:
Let $k_{n+1} = -\frac{\alpha_n}{\sigma_n^2}$. Then
$\theta_{n+1} = \begin{bmatrix} \theta_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} \tilde{\theta}_n \\ 1 \end{bmatrix}$,
$\sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2)$.
103
Proof:
$R_{n+2} \begin{bmatrix} 1 \\ \theta_{n+1} \end{bmatrix} = R_{n+2} \left\{ \begin{bmatrix} 1 \\ \theta_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} 0 \\ \tilde{\theta}_n \\ 1 \end{bmatrix} \right\} = \begin{bmatrix} \sigma_n^2 \\ 0 \\ \alpha_n \end{bmatrix} + k_{n+1} \begin{bmatrix} \alpha_n \\ 0 \\ \sigma_n^2 \end{bmatrix} = \begin{bmatrix} \sigma_{n+1}^2 \\ 0 \\ 0 \end{bmatrix}$,
using the reversal lemma for the second term and $k_{n+1} = -\alpha_n/\sigma_n^2$.
104
LDA: Initialization ($n = 1$):
$R_2 = \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix}$,
$\theta_1 = -\frac{r(1)}{r(0)}$, $\quad O(1)$ flops,
$\sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)}$, $\quad O(1)$ flops,
$k_1 = \theta_1$.
For $n = 1, 2, \ldots, p-1$, do:
$k_{n+1} = -\frac{r(n+1) + \theta_n^T \tilde{r}_n}{\sigma_n^2}$, $\quad \sim n$ flops,
$\sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2)$, $\quad O(1)$ flops,
$\theta_{n+1} = \begin{bmatrix} \theta_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} \tilde{\theta}_n \\ 1 \end{bmatrix}$, $\quad \sim n$ flops.
105
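The recursion above can be sketched directly (function name mine; assumes `r` is a numpy array of lags $r(0), \ldots, r(p)$ and $p \geq 1$). For the exact AR(1) autocorrelation $r(k) = \frac{4}{3}(0.5)^k$, the recursion should return $\theta = (-0.5, 0, 0)$ with $\sigma^2 = 1$:

```python
import numpy as np

def levinson_durbin(r, p):
    # Solve R_{n+1} [1, theta_n]^T = [sigma_n^2, 0]^T recursively, O(p^2) flops
    theta = np.array([-r[1] / r[0]])
    sigma = r[0] - r[1] ** 2 / r[0]
    k = [theta[0]]
    for n in range(1, p):
        alpha = r[n + 1] + theta @ r[1:n + 1][::-1]   # alpha_n = r(n+1) + theta^T r~
        kn = -alpha / sigma
        theta = np.concatenate([theta, [0.0]]) + kn * np.concatenate([theta[::-1], [1.0]])
        sigma = sigma * (1 - kn ** 2)
        k.append(kn)
    return theta, sigma, k

r = np.array([4 / 3, 2 / 3, 1 / 3, 1 / 6])   # exact AR(1) ACF, phi = 0.5
theta, sigma, k = levinson_durbin(r, 3)
```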
Ex: Solve
$\begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sigma^2 \\ 0 \\ 0 \end{bmatrix}$.
Straightforward solution:
$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = -\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}^{-1} \begin{bmatrix} \rho \\ \rho^2 \end{bmatrix} = -\frac{1}{1-\rho^2} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \begin{bmatrix} \rho \\ \rho^2 \end{bmatrix} = \begin{bmatrix} -\rho \\ 0 \end{bmatrix}$,
$\sigma^2 = 1 - \rho^2$.
106
LDA: Initialization:
$\theta_1 = -\frac{r(1)}{r(0)} = -\rho$,
$\sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)} = 1 - \rho^2$,
$k_1 = \theta_1 = -\rho$, $\quad \tilde{r}_1 = \rho$,
$k_2 = -\frac{r(2) + \theta_1^T \tilde{r}_1}{\sigma_1^2} = -\frac{\rho^2 + (-\rho)\rho}{1 - \rho^2} = 0$,
$\sigma_2^2 = \sigma_1^2 (1 - k_2^2) = (1 - \rho^2)(1 - 0^2) = 1 - \rho^2 = \sigma^2$.
107
$\theta_2 = \begin{bmatrix} \theta_1 \\ 0 \end{bmatrix} + k_2 \begin{bmatrix} \tilde{\theta}_1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\rho \\ 0 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$.
Properties of the LDA:
$|k_n| < 1$, $n = 1, 2, \ldots, p$, and $r(0) > 0$, iff
$A_n(z) = 1 + a_{n,1} z^{-1} + \cdots + a_{n,n} z^{-n} = 0$
has roots inside the unit circle.
$|k_n| < 1$, $n = 1, 2, \ldots, p$, and $r(0) > 0$ iff $R_{n+1} > 0$.
108
Proof (for the second property above only): We first use induction
to prove
$U_{n+1}^T\, R_{n+1}\, U_{n+1} = D_{n+1}, \qquad (*)$
where
$U_{n+1} = \begin{bmatrix} 1 & & & \\ a_{n,1} & 1 & & \\ \vdots & \ddots & \ddots & \\ a_{n,n} & \cdots & a_{1,1} & 1 \end{bmatrix}$, $\quad D_{n+1} = \mathrm{diag}\!\left(\sigma_n^2, \ldots, \sigma_1^2, r(0)\right)$.
109
$n = 1$:
$\begin{bmatrix} 1 & a_{1,1} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix} \begin{bmatrix} 1 & 0 \\ a_{1,1} & 1 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & r(0) \end{bmatrix}$.
110
Suppose $(*)$ is true for $n = k-1$, i.e., $U_k^T R_k U_k = D_k$.
Consider $n = k$:
$U_{k+1}^T\, R_{k+1}\, U_{k+1} = \begin{bmatrix} 1 & \theta_k^T \\ 0 & U_k^T \end{bmatrix} \begin{bmatrix} r(0) & r_k^T \\ r_k & R_k \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \theta_k & U_k \end{bmatrix}$
$= \begin{bmatrix} r(0) + \theta_k^T r_k & r_k^T + \theta_k^T R_k \\ U_k^T r_k & U_k^T R_k \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \theta_k & U_k \end{bmatrix}$.
Since
$R_{k+1} \begin{bmatrix} 1 \\ \theta_k \end{bmatrix} = \begin{bmatrix} \sigma_k^2 \\ 0 \end{bmatrix}$,
111
$\begin{bmatrix} r(0) & r_k^T \\ r_k & R_k \end{bmatrix} \begin{bmatrix} 1 \\ \theta_k \end{bmatrix} = \begin{bmatrix} \sigma_k^2 \\ 0 \end{bmatrix}$
$\Rightarrow r(0) + r_k^T \theta_k = \sigma_k^2$, $\quad r_k + R_k \theta_k = 0 \;\Rightarrow\; r_k^T + \theta_k^T R_k = 0$.
$\Rightarrow U_{k+1}^T\, R_{k+1}\, U_{k+1} = \begin{bmatrix} \sigma_k^2 & 0 \\ U_k^T r_k & U_k^T R_k \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \theta_k & U_k \end{bmatrix}$
$= \begin{bmatrix} \sigma_k^2 & 0 \\ U_k^T r_k + U_k^T R_k \theta_k & U_k^T R_k U_k \end{bmatrix} = \begin{bmatrix} \sigma_k^2 & 0 \\ 0 & D_k \end{bmatrix} = D_{k+1}$.
112
$U_{n+1}^T\, R_{n+1}\, U_{n+1} = D_{n+1}$: $(*)$ proven!
From $(*)$,
$R_{n+1}^{-1} = U_{n+1}\, D_{n+1}^{-1}\, U_{n+1}^T$.
$U_{n+1} D_{n+1}^{-1/2}$ is called the Cholesky factor of $R_{n+1}^{-1}$.
Consider the determinant of $R_{n+1}$:
$\det(R_{n+1}) = \det(D_{n+1}) = r(0) \prod_{k=1}^{n} \sigma_k^2$,
$\det(R_{n+1}) = \sigma_n^2\, \det(R_n)$.
$\Rightarrow R_{n+1} > 0$, $n = 1, 2, \ldots, p$, iff $r(0) > 0$
and $\sigma_k^2 > 0$, $k = 1, 2, \ldots, p$.
113
Recall
$\sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2)$.
If $R_{n+1} > 0$:
$r(0) > 0$, $\sigma_n^2 > 0$, $n = 1, 2, \ldots, p$, and
$k_{n+1}^2 = \frac{\sigma_n^2 - \sigma_{n+1}^2}{\sigma_n^2}$.
Since $0 < \sigma_{n+1}^2 < \sigma_n^2$,
$k_{n+1}^2 < 1 \;\Rightarrow\; |k_{n+1}| < 1$.
If $|k_n| < 1$, $r(0) > 0$:
$\sigma_0^2 = r(0) > 0$,
$\sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2) > 0$, $\quad n = 1, 2, \ldots, p-1$.
114
MA Signals:
$x(n) = B(z)\,u(n) = u(n) + b_1 u(n-1) + \cdots + b_q u(n-q)$,
$r(k) = E[x(n)\,x(n-k)]$
$= E\{[u(n) + \cdots + b_q u(n-q)][u(n-k) + \cdots + b_q u(n-q-k)]\}$:
$|k| > q$: $r(k) = 0$;
$0 \leq k \leq q$: $r(k) = \sigma^2 \sum_{l=0}^{q-k} b_l\, b_{l+k}$;
$-q \leq k < 0$: $r(k) = r(-k)$;
with $b_0 = 1$ and $b_1, \ldots, b_q$ real.
$\Rightarrow P(\omega) = \sum_{k=-q}^{q} r(k)\, e^{-j\omega k}$.
115
Remarks: Estimating $b_1, \ldots, b_q$ is a nonlinear problem.
A simple estimator is
$\hat{P}(\omega) = \sum_{k=-q}^{q} \hat{r}(k)\, e^{-j\omega k}$.
* This is exactly the Blackman-Tukey method with a rectangular
window of length $2q+1$.
* Whether $\hat{r}(k)$ is the biased or unbiased estimate, this $\hat{P}(\omega)$
may be $< 0$.
* When the unbiased $\hat{r}(k)$ is used, $\hat{P}(\omega)$ is unbiased.
* To ensure $\hat{P}(\omega) \geq 0$, $\forall \omega$, we may use the biased $\hat{r}(k)$ and
a window with $W(\omega) \geq 0$, $\forall \omega$. For this case, $\hat{P}(\omega)$ is biased.
This is again exactly the BT method.
The most used MA spectral estimator is based on a Two-Stage
Least Squares Method. See the discussion of ARMA later.
116
ARMA Signals (also called the Pole-Zero Model):
$(1 + a_1 z^{-1} + \cdots + a_p z^{-p})\,x(n) = (1 + b_1 z^{-1} + \cdots + b_q z^{-q})\,u(n)$.
Let us write $x(n)$ as MA($\infty$):
$x(n) = u(n) + h_1 u(n-1) + h_2 u(n-2) + \cdots$
$\Rightarrow E[x(n)\,u(n)] = \sigma^2$, $\quad E[u(n)\,x(n-k)] = 0$, $k \geq 1$.
The ARMA model can be written as
$\begin{bmatrix} 1 & a_1 & \cdots & a_p \end{bmatrix} \begin{bmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n-p) \end{bmatrix} = \begin{bmatrix} 1 & b_1 & \cdots & b_q \end{bmatrix} \begin{bmatrix} u(n) \\ u(n-1) \\ \vdots \\ u(n-q) \end{bmatrix}$.
Next we shall multiply both sides by $x(n-k)$ and take $E\{\cdot\}$.
117
$k = 0$:
$\begin{bmatrix} 1 & a_1 & \cdots & a_p \end{bmatrix} \begin{bmatrix} r(0) \\ r(1) \\ \vdots \\ r(p) \end{bmatrix} = \begin{bmatrix} 1 & b_1 & \cdots & b_q \end{bmatrix} \begin{bmatrix} \sigma^2 \\ \sigma^2 h_1 \\ \vdots \\ \sigma^2 h_q \end{bmatrix}$.
$k = 1$:
$\begin{bmatrix} 1 & a_1 & \cdots & a_p \end{bmatrix} \begin{bmatrix} r(1) \\ r(0) \\ \vdots \\ r(p-1) \end{bmatrix} = \begin{bmatrix} 1 & b_1 & \cdots & b_q \end{bmatrix} \begin{bmatrix} 0 \\ \sigma^2 \\ \vdots \\ \sigma^2 h_{q-1} \end{bmatrix}$.
$\vdots$
118
$k \geq q+1$:
$\begin{bmatrix} 1 & a_1 & \cdots & a_p \end{bmatrix} \begin{bmatrix} r(k) \\ r(k-1) \\ \vdots \\ r(k-p) \end{bmatrix} = \begin{bmatrix} 1 & b_1 & \cdots & b_q \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = 0$.
$\Rightarrow$ stacking rows for $k = q+1, q+2, \ldots$:
$\begin{bmatrix} r(q+1) & r(q) & \cdots & r(q+1-p) \\ r(q+2) & r(q+1) & \cdots & r(q+2-p) \\ \vdots & & & \vdots \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = 0$.
This is the modified YW equation.
119
To solve for $a_1, \ldots, a_p$ we need $p$ equations. Using $r(-k) = r(k)$
gives
$\begin{bmatrix} r(q+1) & r(q) & \cdots & r(q-p+1) \\ r(q+2) & r(q+1) & \cdots & r(q-p+2) \\ \vdots & & & \vdots \\ r(q+p) & r(q+p-1) & \cdots & r(q) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = 0$
$\Rightarrow \begin{bmatrix} r(q) & \cdots & r(q-p+1) \\ \vdots & \ddots & \vdots \\ r(q+p-1) & \cdots & r(q) \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} = -\begin{bmatrix} r(q+1) \\ r(q+2) \\ \vdots \\ r(q+p) \end{bmatrix}$.
120
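A minimal sketch of the modified YW solve (function name mine; `r` holds lags $r(0), r(1), \ldots$ and $r(-k) = r(k)$ is applied via `abs`). Checked on the exact AR(1) autocorrelation, which satisfies the recursion for every lag, so both $q = 0$ and $q = 1$ must recover the same AR coefficient:

```python
import numpy as np

def modified_yule_walker(r, p, q):
    # Rows k = q+1..q+p of  r(k) + sum_j a_j r(k-j) = 0:
    # matrix entry (i, j) = r(q+i-j), using r(-k) = r(k)
    R = np.array([[r[abs(q + i - j)] for j in range(1, p + 1)]
                  for i in range(1, p + 1)])
    rhs = -np.array([r[q + i] for i in range(1, p + 1)])
    return np.linalg.solve(R, rhs)

r = np.array([4 / 3, 2 / 3, 1 / 3])   # exact ACF of x(n) = 0.5 x(n-1) + u(n)
a_q0 = modified_yule_walker(r, p=1, q=0)   # plain YW
a_q1 = modified_yule_walker(r, p=1, q=1)   # modified YW, q = 1
```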
Remarks:
(1) Replacing $r(k)$ by $\hat{r}(k)$ above, we can solve for $\hat{a}_1, \ldots, \hat{a}_p$.
(2) The matrix on the left side:
is nonsingular under mild conditions;
is Toeplitz;
is NOT symmetric.
Levinson-type fast algorithms exist.
121
What about the MA part of the ARMA PSD?
Let $y(n) = (1 + b_1 z^{-1} + \cdots + b_q z^{-q})\,u(n)$.
The ARMA model becomes
$(1 + a_1 z^{-1} + \cdots + a_p z^{-p})\,x(n) = y(n)$,
so filtering $x(n)$ through $A(z)$ gives $y(n)$, and
$P_x(\omega) = \left| \frac{1}{A(\omega)} \right|^2 P_y(\omega)$.
Let $\gamma_k$ be the autocorrelation function of $y(n)$. Then (see MA
signals)
$P_y(\omega) = \sum_{k=-q}^{q} \gamma_k\, e^{-j\omega k}$.
122
$\gamma_k = E[y(n)\,y(n-k)] = E[A(z)x(n) \cdot A(z)x(n-k)]$
$= E\left[ \sum_{i=0}^{p} a_i\, x(n-i) \sum_{j=0}^{p} a_j\, x(n-j-k) \right] = \sum_{i=0}^{p} \sum_{j=0}^{p} a_i a_j\, r(k+j-i)$.
Since $a_1, \ldots, a_p$ may be computed with the modified YW method,
$\hat{\gamma}_k = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{r}(k+j-i)\, \hat{a}_i \hat{a}_j$, $\quad \hat{a}_0 = 1$, $\quad k = 0, 1, \ldots, q$;
$\hat{\gamma}_{-k} = \hat{\gamma}_k$.
123
ARMA PSD Estimate:
$\hat{P}(\omega) = \frac{\sum_{k=-q}^{q} \hat{\gamma}_k\, e^{-j\omega k}}{|\hat{A}(\omega)|^2}$.
Remarks:
This method is called the modified YW ARMA spectral estimator.
$\hat{P}(\omega)$ is not guaranteed to be $\geq 0$, due to the MA part.
The AR estimates $\hat{a}_1, \ldots, \hat{a}_p$ have reasonable accuracy if the
ARMA poles and zeroes are well inside the unit circle.
Very poor estimates $\hat{a}_1, \ldots, \hat{a}_p$ occur when the ARMA poles and
zeroes are closely spaced and near the unit circle. (This is the
narrowband signal case.)
124
Ex: Consider
$x(n) = \cos(\omega_1 n + \varphi_1) + \cos(\omega_2 n + \varphi_2)$,
where $\varphi_1$ and $\varphi_2$ are independent and uniformly distributed on
$[0, 2\pi]$. Then
$r(k) = \frac{1}{2} \cos(\omega_1 k) + \frac{1}{2} \cos(\omega_2 k)$.
Note that when $\omega_1 \approx \omega_2$, large values of $k$ are needed to distinguish
$\cos(\omega_1 k)$ and $\cos(\omega_2 k)$.
Remark: This comment is true for both AR and ARMA models.
125
Overdetermined Modified Yule-Walker Equation ($M > p$):
$\begin{bmatrix} r(q) & \cdots & r(q-p+1) \\ \vdots & \ddots & \vdots \\ r(q+p-1) & \cdots & r(q) \\ \vdots & & \vdots \\ r(q+M-1) & \cdots & r(q+M-p) \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} \approx -\begin{bmatrix} r(q+1) \\ \vdots \\ r(q+p) \\ \vdots \\ r(q+M) \end{bmatrix}$.
126
Remarks:
The overdetermined linear equations may be solved with
Least Squares or Total Least Squares methods.
$M$ should be chosen based on the trade-off between the information
contained in the large lags of $r(k)$ and the accuracy of $\hat{r}(k)$.
Overdetermined YW equations may also be obtained for AR
signals.
127
Solving Linear Equations:
Consider $A_{m \times n}\, x_{n \times 1} = b_{m \times 1}$.
When $m = n$ and $A$ is full rank, $x = A^{-1} b$.
When $m > n$ and $A$ has full rank $n$, a solution exists only if $b$ lies
in the $n$-dimensional subspace of the $m$-dimensional space spanned
by the columns of $A$.
Ex:
$A = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
If $b = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, then $x = 3$.
If $b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, then $x$ does not exist!
128
Least Squares (LS) Solution for Overdetermined Equations:
Objective of LS solution:
Let e = Ax - b. Find x_{LS} so that e^H e is minimized.
Let e = [e_1, e_2, \cdots, e_m]^T.
Euclidean Norm: e^H e = |e_1|^2 + |e_2|^2 + \cdots + |e_m|^2
129
Ex:
[figure: error vector e with components e_1, e_2, \cdots, e_m]
Remarks: A x_{LS} = b + e_{LS}.
We see that x_{LS} is found by perturbing b so that a solution exists.
130
$$e^H e = (Ax-b)^H (Ax-b) = x^H A^H A x - x^H A^H b - b^H A x + b^H b$$
$$= \left[ x - (A^H A)^{-1} A^H b \right]^H (A^H A) \left[ x - (A^H A)^{-1} A^H b \right] + b^H b - b^H A (A^H A)^{-1} A^H b$$
Remark: The 2nd term above is independent of x.
e^H e is minimized if
$$x = (A^H A)^{-1} A^H b \qquad \text{(LS Solution)}$$
131
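The closed form above is easy to check numerically against a library LS solver. A small sketch (the example matrix is my own, chosen only so the system is overdetermined and well conditioned):

```python
import numpy as np

# Check that (A^H A)^{-1} A^H b minimizes ||Ax - b||^2 by comparing the
# normal-equation solution with numpy's least-squares solver.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])

x_ls = np.linalg.solve(A.conj().T @ A, A.conj().T @ b)   # normal equations
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)            # reference LS solve
```

Both routes give the same minimizer here; the later slides show why the normal-equation route can nevertheless be numerically fragile.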
Illustration of LS solution:
Let A = [a_1 \; a_2] and x_{LS} = [x_1, x_2]^T.
[figure: b and its projection A x_{LS} = x_1 a_1 + x_2 a_2 onto the plane spanned by a_1 and a_2]
132
Ex:
$$A = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad x_{LS} = ?$$
$$x_{LS} = (A^H A)^{-1} A^H b = \left( \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1$$
$$A x_{LS} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} (1) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad e_{LS} = A x_{LS} - b = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$$
133
Computational Aspects of LS
Solving Normal Equations
$$\left( A^H A \right) x_{LS} = A^H b. \quad (1)$$
This equation is called the Normal equation.
Let A^H A = C and A^H b = g. Then
C x_{LS} = g, where C is positive definite.
134
Cholesky Decomposition:
$$C = L D L^H,$$
where
$$L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \\ l_{n1} & l_{n2} & \cdots & 1 \end{bmatrix} \;\text{(Lower Triangular Matrix)}, \qquad D = \begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{bmatrix}, \; d_i > 0.$$
135
Back-Substitution to solve:
$$L D L^H x_{LS} = g$$
Let y = D L^H x_{LS}. Then
$$\begin{bmatrix} 1 & 0 & \cdots & 0 \\ l_{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \\ l_{n1} & l_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}$$
$$y_1 = g_1, \qquad y_2 = g_2 - l_{21} y_1, \qquad y_k = g_k - \sum_{j=1}^{k-1} l_{kj} y_j, \quad k = 3, \cdots, n.$$
136
$$\begin{bmatrix} 1 & l^*_{21} & \cdots & l^*_{n1} \\ 0 & 1 & \cdots & l^*_{n2} \\ & & \ddots & \vdots \\ 0 & & & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = L^H x_{LS} = D^{-1} y = \begin{bmatrix} y_1 / d_1 \\ \vdots \\ y_n / d_n \end{bmatrix}$$
$$x_n = \frac{y_n}{d_n}, \qquad x_k = \frac{y_k}{d_k} - \sum_{j=k+1}^{n} l^*_{jk} x_j, \quad k = n-1, \cdots, 1.$$
Remarks:
Solving Normal equations may be sensitive to numerical errors.
137
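The LDL^H factorization plus the two substitution sweeps above can be sketched as follows (a minimal real-valued illustration, assuming C is symmetric positive definite; no matrices are inverted):

```python
import numpy as np

def ldl_solve(C, g):
    """Solve C x = g for real symmetric positive definite C via C = L D L^T,
    then forward substitution (L y = g) and back substitution (L^T x = D^{-1} y)."""
    n = C.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):                       # LDL^T factorization
        d[j] = C[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (C[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    y = np.zeros(n)
    for k in range(n):                       # forward substitution: y_k
        y[k] = g[k] - L[k, :k] @ y[:k]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):           # back substitution: x_k
        x[k] = y[k] / d[k] - L[k + 1:, k] @ x[k + 1:]
    return x
```

Each sweep follows the slide's recursions for y_k and x_k term by term.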
Ex.
$$\begin{bmatrix} 3 & 3 \\ 4 & 4+\epsilon \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad Ax = b,$$
where \epsilon is a small number.
Exact solution:
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} + \frac{1}{3\epsilon} \\ -\frac{1}{3\epsilon} \end{bmatrix}$$
Assume that due to truncation errors, \epsilon^2 = 0.
$$A^H A \doteq \begin{bmatrix} 25 & 25+4\epsilon \\ 25+4\epsilon & 25+8\epsilon \end{bmatrix}, \qquad A^H b = \begin{bmatrix} 7 \\ 7+\epsilon \end{bmatrix}.$$
138
Solution to Normal equation (Note the Big Difference!):
$$x = \left( A^H A \right)^{-1} A^H b \doteq \begin{bmatrix} \frac{1}{4} - \frac{3}{16\epsilon} \\ \frac{3}{16\epsilon} \end{bmatrix},$$
which is drastically different from the exact solution — even the signs of the O(1/\epsilon) terms are wrong.
139
QR Method: (Numerically more robust.)
Ax = b.
Using the Householder transformation, we can find an orthonormal matrix Q (i.e., Q Q^H = I) such that
$$\begin{bmatrix} T \\ 0 \end{bmatrix} x = QAx = Qb = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix},$$
where T is a square, upper triangular matrix, and
$$\min\, e^H e = z_2^H z_2$$
$$T x_{LS} = z_1 \;\Longrightarrow\; \text{Back Substitution to find } x_{LS}$$
140
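The QR route above can be sketched directly on the ill-conditioned example of the previous slides (a minimal illustration; numpy's `qr` stands in for the Householder factorization):

```python
import numpy as np

# QR-based least squares: factor A = Q R, then solve the triangular system
# R x = Q^H b by back substitution, instead of forming A^H A.
eps = 1e-6
A = np.array([[3.0, 3.0],
              [4.0, 4.0 + eps]])
b = np.array([1.0, 1.0])

Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.conj().T @ b)   # back-substitution step
```

Even though A is very ill conditioned (condition number on the order of 1/eps), the QR solution tracks the exact solution closely, which is the point of the slide's comparison with the normal equations.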
Ex.
$$\begin{bmatrix} 3 & 3 \\ 4 & 4+\epsilon \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad Q = \frac{1}{5} \begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix}.$$
QAx = Qb gives
$$\begin{bmatrix} 5 & 5 + \frac{4\epsilon}{5} \\ 0 & -\frac{3\epsilon}{5} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{7}{5} \\ \frac{1}{5} \end{bmatrix}$$
$$x_2 = -\frac{1}{3\epsilon}, \qquad x_1 = \frac{1}{3} + \frac{1}{3\epsilon} \quad \text{(same as the exact solution)}$$
Remark: For a large number of overdetermined equations, the QR method needs about twice as much computation as solving the Normal equation in (1).
141
Total Least Squares (TLS) solution to Ax = b.
Recall that x_{LS} is obtained by perturbing b only, i.e.,
$$A x_{LS} = b + e_{LS}, \qquad e_{LS}^H e_{LS} = \min.$$
x_{TLS} is obtained by perturbing both A and b, i.e.,
$$(A + E_{TLS})\, x_{TLS} = b + e_{TLS}, \qquad \left\| \begin{bmatrix} E_{TLS} & e_{TLS} \end{bmatrix} \right\|_F = \text{minimum},$$
where \|\cdot\|_F is the Frobenius matrix norm,
$$\|G\|_F = \sqrt{ \sum_i \sum_j |g_{ij}|^2 }, \qquad g_{ij} = (i,j)\text{th element of } G.$$
142
Illustration of TLS solution:
[figure: the straight line is found by minimizing the sum of squared shortest (perpendicular) distances between the line and the points]
Let C = [A \;\; b].
Let the singular value decomposition (SVD) of C be
$$C = U \Sigma V^H.$$
Remarks:
- The columns of U are the eigenvectors of C C^H.
- The columns of V are the eigenvectors of C^H C.
- Both U and V are unitary matrices, i.e., U U^H = U^H U = I and V V^H = V^H V = I.
- \Sigma is diagonal and its diagonal elements are the square roots of the eigenvalues of C^H C:
$$\Sigma = \begin{bmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_{n+1} \\ 0 & \cdots & 0 \end{bmatrix}, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n+1} \ge 0, \; \sigma_i \text{ real.}$$
143
144
Partition V (with V_{11}: n \times n, V_{12}: n \times 1, V_{21}: 1 \times n, V_{22}: 1 \times 1):
$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$
$$x_{TLS} = -V_{12} V_{22}^{-1}$$
Remarks:
- At low SNR, TLS may be better than LS.
- At high SNR, TLS and LS yield similar results.
Markov Estimate:
If the statistics of e = Ax - b are known, they may be used to obtain a better solution to Ax = b.
145
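The TLS formula above reduces to a few lines around an SVD of the stacked matrix C = [A b]. A hedged sketch (function name is my own):

```python
import numpy as np

def tls_solve(A, b):
    """Total least squares solution of A x ~ b via the SVD of C = [A b]:
    x_TLS = -V12 V22^{-1}, where V12 (n x 1) and V22 (1 x 1) are the
    last-column blocks of V."""
    n = A.shape[1]
    C = np.column_stack([A, b])
    _, _, Vh = np.linalg.svd(C)
    V = Vh.conj().T
    V12 = V[:n, n:]        # top-right n x 1 block
    V22 = V[n:, n:]        # bottom-right 1 x 1 block
    return (-V12 @ np.linalg.inv(V22)).ravel()
```

On a consistent system (b exactly in the column space of A), the smallest singular value of C is zero and TLS returns the exact solution, matching LS.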
ARMA Signals:
Two Stage Least Squares Method
Step 1: Approximate ARMA(p, q) with AR(L) for a large L.
The YW Equation may be used to estimate \hat{a}_1, \hat{a}_2, \cdots, \hat{a}_L.
$$\hat{u}(n) = x(n) + \hat{a}_1 x(n-1) + \cdots + \hat{a}_L x(n-L).$$
$$\hat\sigma^2 = \frac{1}{N-L} \sum_{n=L+1}^{N} \hat{u}^2(n).$$
146
Step 2: System Identification
[block diagram: \hat{u}(n) -> B(z)/A(z) -> x(n)]
Let
$$x = \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}, \qquad \hat{u} = \begin{bmatrix} \hat{u}(0) \\ \hat{u}(1) \\ \vdots \\ \hat{u}(N-1) \end{bmatrix}.$$
147
$$\theta = \begin{bmatrix} a_1 \\ \vdots \\ a_p \\ b_1 \\ \vdots \\ b_q \end{bmatrix}, \qquad H = \begin{bmatrix} -x(-1) & \cdots & -x(-p) & \hat{u}(-1) & \cdots & \hat{u}(-q) \\ -x(0) & \cdots & -x(-p+1) & \hat{u}(0) & \cdots & \hat{u}(-q+1) \\ \vdots & & & & & \vdots \\ -x(N-2) & \cdots & -x(N-p-1) & \hat{u}(N-2) & \cdots & \hat{u}(N-q-1) \end{bmatrix}$$
148
$$x = H\theta + \hat{u} \quad \text{(real signals)}.$$
LS Solution:
$$\hat\theta = \left( H^T H \right)^{-1} H^T (x - \hat{u})$$
Remarks:
- Any elements in H that are unknown are set to zero.
- The QR Method may be used to solve the LS problem.
Step 3:
$$\hat{P}(\omega) = \hat\sigma^2\, \frac{\left| 1 + \hat{b}_1 e^{-j\omega} + \cdots + \hat{b}_q e^{-j\omega q} \right|^2}{\left| 1 + \hat{a}_1 e^{-j\omega} + \cdots + \hat{a}_p e^{-j\omega p} \right|^2}$$
Remark: The difficult case for this method is when the ARMA zeroes are near the unit circle.
149
Further Topics on AR Signals:
Linear prediction of AR Processes
Forward Linear Prediction
[figure: samples x(n-4), \cdots, x(n-1) used to predict x(n); predictor \hat{x}^f(n) = -\sum_{i=1}^{m} a^f_i x(n-i); error e^f(n) = x(n) - \hat{x}^f(n)]
150
$$e^f(n) = x(n) - \hat{x}^f(n), \qquad \rho^f = E\left[ \left( e^f(n) \right)^2 \right]$$
Goal: Minimize \rho^f.
$$\rho^f = E\left[ \left( x(n) + \sum_{i=1}^{m} a^f_i x(n-i) \right)^2 \right] = r_{xx}(0) + \sum_{i=1}^{m} a^f_i r_{xx}(i) + \sum_{j=1}^{m} a^f_j r_{xx}(j) + \sum_{i=1}^{m} \sum_{j=1}^{m} a^f_i a^f_j r_{xx}(j-i)$$
$$\frac{\partial \rho^f}{\partial a^f_i} = 0 \;\Longrightarrow\; r_{xx}(i) + \sum_{j=1}^{m} a^f_j r_{xx}(j-i) = 0.$$
151
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(m) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(m-1) \\ \vdots & & & \vdots \\ r_{xx}(m) & r_{xx}(m-1) & \cdots & r_{xx}(0) \end{bmatrix} \begin{bmatrix} 1 \\ a^f_1 \\ \vdots \\ a^f_m \end{bmatrix} = \begin{bmatrix} \rho^f \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
Remarks: This is exactly the YW Equation.
\rho^f decreases as m increases.
[figure: \rho^f versus m, flattening out for m \ge p]
152
Backward Linear prediction
[figure: samples x(n+1), \cdots, x(n+4) used to predict x(n)]
$$\hat{x}^b(n) = -\sum_{i=1}^{m} a^b_i x(n+i), \qquad e^b(n) = x(n-m) - \hat{x}^b(n-m), \qquad \rho^b = E\left[ \left( e^b(n) \right)^2 \right].$$
153
To minimize \rho^b, we obtain
$$\begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(m) \\ \vdots & & & \vdots \\ r_{xx}(m) & r_{xx}(m-1) & \cdots & r_{xx}(0) \end{bmatrix} \begin{bmatrix} 1 \\ a^b_1 \\ \vdots \\ a^b_m \end{bmatrix} = \begin{bmatrix} \rho^b \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
$$\Longrightarrow\; a^f_i = a^b_i \;\text{ for all } i, \qquad \rho^f = \rho^b.$$
154
Consider an AR(p) model and the notation in the LDA:
Let m = 1, 2, \cdots, p,
$$e^f_m(n) = x(n) + \sum_{i=1}^{m} a^f_{m,i} x(n-i) = \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-m) \end{bmatrix} \alpha_m,$$
$$e^b_m(n) = x(n-m) + \sum_{i=1}^{m} a^b_{m,i} x(n-m+i) = \begin{bmatrix} x(n-m) & x(n-m+1) & \cdots & x(n) \end{bmatrix} \alpha_m = \begin{bmatrix} x(n) & \cdots & x(n-m+1) & x(n-m) \end{bmatrix} \tilde\alpha_m,$$
where \alpha_m = [1, a_{m,1}, \cdots, a_{m,m}]^T and \tilde\alpha_m denotes \alpha_m with its elements in reversed order.
155
Recall the LDA:
$$\alpha_m = \begin{bmatrix} \alpha_{m-1} \\ 0 \end{bmatrix} + k_m \begin{bmatrix} 0 \\ \tilde\alpha_{m-1} \end{bmatrix}.$$
$$e^f_m(n) = \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-m) \end{bmatrix} \left( \begin{bmatrix} \alpha_{m-1} \\ 0 \end{bmatrix} + k_m \begin{bmatrix} 0 \\ \tilde\alpha_{m-1} \end{bmatrix} \right)$$
$$= \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-m+1) \end{bmatrix} \alpha_{m-1} + k_m \begin{bmatrix} x(n-1) & x(n-2) & \cdots & x(n-m) \end{bmatrix} \tilde\alpha_{m-1}$$
156
$$e^f_m(n) = e^f_{m-1}(n) + k_m e^b_{m-1}(n-1).$$
Similarly,
$$e^b_m(n) = e^b_{m-1}(n-1) + k_m e^f_{m-1}(n).$$
157
Lattice Filter for Linear Prediction Error
[figure: lattice filter with stages k_1, k_2, \cdots, k_m and unit delays z^{-1}; input x(n); stage outputs e^f_1(n), e^b_1(n), e^f_2(n), e^b_2(n), \cdots, e^f_m(n), e^b_m(n)]
Remarks: The implementation advantage of lattice filters is that they suffer from less round-off noise and are less sensitive to coefficient errors.
If x(n) is AR(p) and m = p, then
[figure: whitening filter 1 + a_1 z^{-1} + \cdots + a_p z^{-p}: input x(n), output u(n)]
158
AR Spectral Estimation Methods
Autocorrelation or Yule-Walker method: Recall that the YW Equation may be obtained by minimizing
$$E\left[ e^2(n) \right] = E\left[ \left( x(n) - \hat{x}(n) \right)^2 \right],$$
where
$$\hat{x}(n) = -\sum_{k=1}^{p} a_k x(n-k).$$
The autocorrelation or YW method replaces r(k) in the YW equation with the biased estimate \hat{r}(k):
$$\begin{bmatrix} \hat{r}(0) & \cdots & \hat{r}(p-1) \\ \vdots & \ddots & \vdots \\ \hat{r}(p-1) & \cdots & \hat{r}(0) \end{bmatrix} \begin{bmatrix} \hat{a}_1 \\ \vdots \\ \hat{a}_p \end{bmatrix} = - \begin{bmatrix} \hat{r}(1) \\ \vdots \\ \hat{r}(p) \end{bmatrix}.$$
159
Covariance or Prony Method
Consider the AR(p) signal
$$x(n) = -\sum_{k=1}^{p} a_k x(n-k) + u(n), \quad n = 0, 1, \cdots, N-1.$$
In matrix form,
$$\begin{bmatrix} x(p) \\ x(p+1) \\ \vdots \\ x(N-1) \end{bmatrix} = - \begin{bmatrix} x(p-1) & x(p-2) & \cdots & x(0) \\ x(p) & x(p-1) & \cdots & x(1) \\ \vdots & & & \vdots \\ x(N-2) & \cdots & & x(N-p-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} + \begin{bmatrix} u(p) \\ u(p+1) \\ \vdots \\ u(N-1) \end{bmatrix}$$
160
The Prony Method is to find the LS solution to the overdetermined equation
$$\begin{bmatrix} x(p-1) & \cdots & x(0) \\ \vdots & & \vdots \\ x(N-2) & \cdots & x(N-p-1) \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} \approx - \begin{bmatrix} x(p) \\ \vdots \\ x(N-1) \end{bmatrix}.$$
Remarks:
The Covariance or Prony Method minimizes
$$\hat\sigma^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} \hat{u}^2(n) = \frac{1}{N-p} \sum_{n=p}^{N-1} \left( x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right)^2$$
161
The Autocorrelation Method or YW Method minimizes
$$\hat\sigma^2 = \frac{1}{N} \sum_{n=-\infty}^{\infty} \left( x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right)^2,$$
where those x(n) that are NOT available are set to zero.
- For large N, the YW and Prony methods yield similar results.
- For small N, the YW method gives poor performance. The Prony method can give good estimates \hat{a}_1, \cdots, \hat{a}_p for small N. The Prony method gives exact estimates for x(n) = sum of sinusoids.
- Since biased \hat{r}(k) are used in the YW method, the estimated poles are inside the unit circle. The Prony method does not guarantee stability.
162
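The Prony (covariance) least-squares fit above works directly on the data samples. A minimal numpy sketch (function name is my own; real data assumed):

```python
import numpy as np

def prony_ar(x, p):
    """Covariance (Prony) AR estimate: LS fit of
    x(n) = -sum_k a_k x(n-k) over n = p..N-1, built from the data itself."""
    x = np.asarray(x, float)
    N = len(x)
    # Row for time n: [x(n-1), ..., x(n-p)];  right-hand side: -x(n)
    X = np.array([[x[n - k] for k in range(1, p + 1)] for n in range(p, N)])
    a, *_ = np.linalg.lstsq(X, -x[p:], rcond=None)
    return a
```

For data that satisfies an exact AR recursion (no driving noise), the fit is exact — consistent with the slide's remark that Prony is exact for sums of sinusoids.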
Modified Covariance or Forward-Backward (F/B) Method
Recall Backward Linear Prediction:
$$x(n) = -\sum_{k=1}^{p} a^b_k x(n+k) + e^b(n).$$
For real data and real AR coefficients, a^f_k = a^b_k = a_k, \; k = 1, \cdots, p:
$$\begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-p-1) \end{bmatrix} \approx - \begin{bmatrix} x(1) & x(2) & \cdots & x(p) \\ x(2) & x(3) & \cdots & x(p+1) \\ \vdots & & & \vdots \\ x(N-p) & \cdots & & x(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}$$
In the F/B method, this backward prediction equation is combined with the forward prediction equation and the LS solution is found.
163
$$\begin{bmatrix} x(p-1) & \cdots & x(0) \\ \vdots & & \vdots \\ x(N-2) & \cdots & x(N-p-1) \\ x(1) & \cdots & x(p) \\ \vdots & & \vdots \\ x(N-p) & \cdots & x(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} \approx - \begin{bmatrix} x(p) \\ \vdots \\ x(N-1) \\ x(0) \\ \vdots \\ x(N-p-1) \end{bmatrix}$$
Remarks: The F/B method does not guarantee poles inside the unit circle. In practice, the poles are usually inside the unit circle.
164
For complex data and a complex model,
$$a_k = a^f_k = \left( a^b_k \right)^*, \quad k = 1, \cdots, p.$$
Then F/B solves:
$$\begin{bmatrix} x(p-1) & \cdots & x(0) \\ \vdots & & \vdots \\ x(N-2) & \cdots & x(N-p-1) \\ x^*(1) & \cdots & x^*(p) \\ \vdots & & \vdots \\ x^*(N-p) & \cdots & x^*(N-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} \approx - \begin{bmatrix} x(p) \\ \vdots \\ x(N-1) \\ x^*(0) \\ \vdots \\ x^*(N-p-1) \end{bmatrix}$$
165
Remarks on \hat\sigma^2:
In the YW method,
$$\hat\sigma^2 = \hat{r}(0) + \sum_{k=1}^{p} \hat{a}_k \hat{r}(k).$$
In the Prony Method, let
$$e_{LS} = \begin{bmatrix} e(p) \\ \vdots \\ e(N-1) \end{bmatrix}, \qquad \hat\sigma^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} |e(n)|^2.$$
166
In the F/B Method, let
$$e_{LS} = \begin{bmatrix} e^f(p) \\ \vdots \\ e^f(N-1) \\ e^b(0) \\ \vdots \\ e^b(N-p-1) \end{bmatrix}, \qquad \hat\sigma^2 = \frac{1}{2(N-p)} \left[ \sum_{n=p}^{N-1} \left| e^f(n) \right|^2 + \sum_{n=0}^{N-p-1} \left| e^b(n) \right|^2 \right].$$
167
Burg Method
Consider real data and a real model. Recall the LDA:
$$\alpha_{n+1} = \begin{bmatrix} \alpha_n \\ 0 \end{bmatrix} + k_{n+1} \begin{bmatrix} 0 \\ \tilde\alpha_n \end{bmatrix}$$
Thus, if we know \alpha_n and k_{n+1}, we can find \alpha_{n+1}.
Recall (\ast):
$$e^f_m(n) = e^f_{m-1}(n) + k_m e^b_{m-1}(n-1)$$
$$e^b_m(n) = e^b_{m-1}(n-1) + k_m e^f_{m-1}(n),$$
where
$$e^f_{m-1}(n) = x(n) + \sum_{k=1}^{m-1} a_{m-1,k} x(n-k), \qquad e^b_{m-1}(n) = x(n-m+1) + \sum_{k=1}^{m-1} a_{m-1,k} x(n-m+1+k).$$
168
\hat{k}_m is found by minimizing (for \alpha_{m-1} given)
$$\frac{1}{2} \sum_{n=m}^{N-1} \left\{ \left[ e^f_m(n) \right]^2 + \left[ e^b_m(n) \right]^2 \right\}$$
$$\hat{k}_m = \frac{ -2 \sum_{n=m}^{N-1} e^f_{m-1}(n)\, e^b_{m-1}(n-1) }{ \sum_{n=m}^{N-1} \left\{ \left[ e^f_{m-1}(n) \right]^2 + \left[ e^b_{m-1}(n-1) \right]^2 \right\} }. \quad (\ast\ast)$$
Steps in the Burg method:
Initialization:
$$\hat{r}(0) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n), \qquad \hat\rho_0 = \hat{r}(0),$$
$$e^f_0(n) = x(n), \quad n = 1, 2, \cdots, N-1; \qquad e^b_0(n) = x(n), \quad n = 0, 1, \cdots, N-2.$$
169
For m = 1, 2, \cdots, p:
- Calculate \hat{k}_m with (\ast\ast).
- $$\hat\rho_m = \hat\rho_{m-1} \left( 1 - \hat{k}_m^2 \right)$$
- $$\hat\alpha_m = \begin{bmatrix} \hat\alpha_{m-1} \\ 0 \end{bmatrix} + \hat{k}_m \begin{bmatrix} 0 \\ \tilde{\hat\alpha}_{m-1} \end{bmatrix}, \quad (\hat{a}_1 = \hat{k}_1).$$
- Update e^f_m(n) and e^b_m(n) with (\ast).
Remarks:
- \hat\rho_p = \hat\sigma^2.
- Since a^2 + b^2 \ge 2|ab|, we have |\hat{k}_m| \le 1; the Burg Method gives poles that are inside the unit circle.
- Different ways of calculating \hat{k}_m are available.
170
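The Burg recursion above — reflection coefficient from (\ast\ast), Levinson update of the coefficients, then the lattice error update (\ast) — can be sketched compactly (a minimal real-valued illustration; the function name is my own):

```python
import numpy as np

def burg_ar(x, p):
    """Burg's method: reflection coefficients k_m from forward/backward
    prediction errors; AR coefficients via the Levinson recursion."""
    x = np.asarray(x, float)
    f = x.copy()                      # e_f_{m-1}(n), aligned with b
    b = x.copy()                      # e_b_{m-1}(n)
    a = np.array([1.0])               # [1, a_1, ..., a_m]
    sigma2 = np.mean(x ** 2)          # rho_0 = r(0)
    for m in range(1, p + 1):
        fm = f[1:]                    # e_f_{m-1}(n),   n = m..N-1
        bm = b[:-1]                   # e_b_{m-1}(n-1), n = m..N-1
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        f, b = fm + k * bm, bm + k * fm   # lattice error update (*)
        sigma2 *= (1.0 - k ** 2)          # rho_m = rho_{m-1}(1 - k_m^2)
    return a[1:], sigma2
```

For a pure sinusoid (which satisfies an exact order-2 recursion), the order-2 Burg fit nearly annihilates the signal and the poles stay inside the unit circle, as guaranteed by |k_m| <= 1.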
Properties of AR(p) Signals:
Extension of r(k):
* Given r(0), r(1), \cdots, r(p).
* From the YW Equations we can calculate a_1, a_2, \cdots, a_p, \sigma^2.
* $$r(k) = -\sum_{l=1}^{p} a_l\, r(k-l), \quad k > p.$$
Another point of view:
* Given r(0), \cdots, r(p).
* Calculate a_1, \cdots, a_p, \sigma^2.
* Obtain P(\omega).
* r(k) \longleftrightarrow P(\omega) (DTFT pair).
171
Maximum Entropy Spectral Estimation
Given r(0), \cdots, r(p). The remaining r(p+1), \cdots are extrapolated to maximize entropy.
Entropy: Let the sample space for the discrete random variable x be \{x_1, \cdots, x_N\}. The entropy H(x) is
$$H(x) = -\sum_{i=1}^{N} P(x_i) \ln P(x_i), \qquad P(x_i) = \text{prob}(x = x_i).$$
For a continuous random variable,
$$H(x) = -\int f(x) \ln f(x)\, dx, \qquad f(x) = \text{pdf of } x.$$
172
For Gaussian random variables,
$$x = \begin{bmatrix} x(0) \\ \vdots \\ x(N-1) \end{bmatrix} \sim N(0, R_N), \qquad H_N = \frac{1}{2} \ln(\det R_N).$$
Since H_N \to \infty as N \to \infty, we consider the Entropy Rate:
$$h = \lim_{N \to \infty} \frac{H_N}{N+1}.$$
h is maximized with respect to r(p+1), r(p+2), \cdots.
Remark: For the Gaussian case, we obtain the Yule-Walker equations .... !
173
Maximum Likelihood Estimators:
Exact ML Estimator:
[block diagram: u(n) -> 1/A(z) -> x(n), n = 0, ..., N-1; real inputs, real outputs]
u(n) is Gaussian white noise with zero mean:
$$E[u(n)] = 0, \qquad Var[u(n)] = \sigma^2, \qquad E[u(i)u(j)] = 0, \; i \ne j.$$
The likelihood function is
$$f = f\left( x(0), \cdots, x(N-1) \,\big|\, a_1, \cdots, a_p, \sigma^2 \right).$$
The ML estimates of a_1, \cdots, a_p, \sigma^2 are found by maximizing f.
174
$$f = f\left( x(p), \cdots, x(N-1) \,\big|\, x(0), \cdots, x(p-1), a_1, \cdots, a_p, \sigma^2 \right) \cdot f\left( x(0), \cdots, x(p-1) \,\big|\, a_1, \cdots, a_p, \sigma^2 \right)$$
* Consider first f_1 = f\left( x(0), \cdots, x(p-1) \,\big|\, a_1, \cdots, a_p, \sigma^2 \right):
$$f_1 = \frac{1}{(2\pi)^{p/2}\, \det^{1/2}(R_p)} \exp\left( -\frac{1}{2}\, x_0^T R_p^{-1} x_0 \right),$$
$$x_0 = \begin{bmatrix} x(0) \\ \vdots \\ x(p-1) \end{bmatrix}, \qquad R_p = \begin{bmatrix} r(0) & \cdots & r(p-1) \\ \vdots & \ddots & \vdots \\ r(p-1) & \cdots & r(0) \end{bmatrix}.$$
Remark: r(0), \cdots, r(p-1) are functions of a_1, \cdots, a_p, \sigma^2 (see, e.g., the YW system of equations).
175
* Consider next
$$f_2 = f\left( x(p), \cdots, x(N-1) \,\big|\, x(0), \cdots, x(p-1), a_1, \cdots, a_p, \sigma^2 \right).$$
$$x(n) + \sum_{k=1}^{p} a_k x(n-k) = u(n) \;\Longrightarrow$$
$$u(p) = x(p) + a_1 x(p-1) + \cdots + a_p x(0)$$
$$u(p+1) = x(p+1) + a_1 x(p) + \cdots + a_p x(1)$$
$$\vdots$$
$$u(N-1) = x(N-1) + a_1 x(N-2) + \cdots + a_p x(N-p-1).$$
176
$$\begin{bmatrix} u(p) \\ \vdots \\ u(N-1) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_1 & 1 & 0 & \cdots & 0 \\ a_2 & a_1 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 0 & \cdots & a_p & \cdots & 1 \end{bmatrix} \begin{bmatrix} x(p) \\ x(p+1) \\ \vdots \\ x(N-1) \end{bmatrix} + \begin{bmatrix} a_1 x(p-1) + \cdots + a_p x(0) \\ a_2 x(p-1) + \cdots + a_p x(1) \\ \vdots \\ a_p x(p-1) \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
177
Let
$$u = \begin{bmatrix} u(p) \\ \vdots \\ u(N-1) \end{bmatrix}, \qquad x = \begin{bmatrix} x(p) \\ \vdots \\ x(N-1) \end{bmatrix}.$$
Given x(0), \cdots, x(p-1), a_1, \cdots, a_p, \sigma^2, x and u are related by a linear transformation.
The Jacobian of the transformation:
$$J = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_1 & 1 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & a_p & 1 \end{bmatrix}, \qquad \det(J) = 1.$$
178
$$f(u) = \frac{1}{(2\pi\sigma^2)^{(N-p)/2}} \exp\left( -\frac{1}{2\sigma^2}\, u^T u \right)$$
$$f_2 = f[u(x)]\, |\det(J)| = f[u(x)].$$
Let
$$X = \begin{bmatrix} x(p) & x(p-1) & \cdots & x(0) \\ x(p+1) & x(p) & \cdots & x(1) \\ \vdots & & & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-p-1) \end{bmatrix}$$
179
$$a = \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix}, \qquad u = X a,$$
$$f_2 = \frac{1}{(2\pi\sigma^2)^{(N-p)/2}} \exp\left( -\frac{1}{2\sigma^2}\, a^T X^T X a \right).$$
Remark: Maximizing f = f_1 \cdot f_2 with respect to a_1, \cdots, a_p, \sigma^2 is highly non-linear!
180
An Approximate ML Estimator
\hat{a}_1, \cdots, \hat{a}_p, \hat\sigma^2 are found by maximizing f_2.
\hat{a}_1, \cdots, \hat{a}_p are found by minimizing a^T X^T X a = u^T u:
$$\begin{bmatrix} x(p) & \cdots & x(0) \\ x(p+1) & \cdots & x(1) \\ \vdots & & \vdots \\ x(N-1) & \cdots & x(N-p-1) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} u(p) \\ u(p+1) \\ \vdots \\ u(N-1) \end{bmatrix}.$$
This is exactly Prony's Method!
$$\hat\sigma^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} \left( x(n) + \sum_{j=1}^{p} \hat{a}_j x(n-j) \right)^2.$$
Again, exactly Prony's Method!
181
Accuracy of AR PSD Estimators
- Accuracy analysis is difficult.
- Results for large N are available due to the Central Limit Theorem.
- For large N, the variances of \hat{a}_1, \cdots, \hat{a}_p, \hat{k}_1, \cdots, \hat{k}_p, \hat\sigma^2, \hat{P}(\omega) are all proportional to \frac{1}{N}. Biases \sim \frac{1}{N}.
182
AR Model Order Selection
Remarks:
- An order that is too low yields a smoothed/biased PSD estimate.
- An order that is too high yields spurious peaks/large variance in the PSD estimate.
- Almost all model order estimators are based on the estimate of the power of the linear prediction error, denoted \hat\rho_k, where k is the model order chosen.
183
Final Prediction Error (FPE) Method
minimizes
$$FPE(k) = \frac{N+k}{N-k}\, \hat\rho_k.$$
Akaike Information Criterion (AIC) Method
minimizes
$$AIC(k) = N \ln \hat\rho_k + 2k.$$
Remarks:
- As N \to \infty, AIC's probability of error in choosing the correct order does NOT \to 0.
- As N \to \infty, AIC tends to overestimate the model order.
184
Minimum Description Length (MDL) Criterion
minimizes
$$MDL(k) = N \ln \hat\rho_k + k \ln N.$$
Remark: As N \to \infty, MDL's probability of error \to 0 (consistent!).
Criterion Autoregressive Transfer (CAT) Method
minimizes
$$CAT(k) = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{\hat{\bar\rho}_i} - \frac{1}{\hat{\bar\rho}_k}, \qquad \hat{\bar\rho}_i = \frac{N}{N-i}\, \hat\rho_i.$$
Remarks: None of the above methods works well for small N.
Use these methods to initially estimate orders. (Practical experience needed.)
185
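Given the prediction-error powers \hat\rho_k for candidate orders, the AIC/MDL rules above are one-liners. A minimal sketch (function name and the synthetic \hat\rho values are my own):

```python
import numpy as np

def select_order(rho_by_order, N, rule="MDL"):
    """Pick an AR model order from prediction-error powers rho_by_order[k-1],
    k = 1..K, using AIC or MDL (FPE and CAT follow the same pattern)."""
    ks = np.arange(1, len(rho_by_order) + 1)
    rho = np.asarray(rho_by_order, float)
    if rule == "AIC":
        crit = N * np.log(rho) + 2 * ks
    else:  # MDL
        crit = N * np.log(rho) + ks * np.log(N)
    return int(ks[np.argmin(crit)])
```

On a synthetic error-power curve that drops sharply at order 2 and then flattens, MDL picks 2 while AIC's lighter penalty lets it drift to 3 — a small illustration of AIC's tendency to overestimate.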
Noisy AR Processes:
y(n) = x(n) + w(n)
- x(n) = AR(p) process.
- w(n) = white Gaussian noise with zero mean and variance \sigma^2_w.
- x(n) and w(n) are independent of each other.
$$P_{yy}(\omega) = P_{xx}(\omega) + P_{ww}(\omega) = \frac{\sigma^2}{|A(\omega)|^2} + \sigma^2_w = \frac{\sigma^2 + \sigma^2_w |A(\omega)|^2}{|A(\omega)|^2}.$$
186
Remarks: y(n) is an ARMA signal.
a_1, \cdots, a_p, \sigma^2, \sigma^2_w may be estimated by
* ARMA methods.
* A large order AR approximation.
* Compensating for the effect of w(n).
* Bootstrap or adaptive filtering and AR methods.
187
Wiener Filter: (Wiener-Hopf Filter)
[block diagram: y(n) = x(n) + w(n) -> H(z) -> \hat{x}(n); error e(n) = x(n) - \hat{x}(n), with desired signal x(n)]
- H(z) is found by minimizing E\left[ |e(n)|^2 \right].
- H(z) depends on knowing P_{xy}(\omega).
188
General Filtering Problem: (Complex Signals)
[block diagram: y(n) = x(n) + w(n) -> H(z); error e(n) = d(n) - filter output, with desired signal d(n)]
Special case of d(n): d(n) = x(n+m):
1.) m > 0: m-step ahead prediction.
2.) m = 0: filtering problem.
3.) m < 0: smoothing problem.
189
Three common filters:
1.) General Non-causal:
$$H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}.$$
2.) General Causal:
$$H(z) = \sum_{k=0}^{\infty} h_k z^{-k}$$
3.) Finite Impulse Response (FIR):
$$H(z) = \sum_{k=0}^{p} h_k z^{-k}$$
190
Case 1: Non-causal Filter.
$$\mathcal{E} = E\left[ |e(n)|^2 \right] = E\left[ \left( d(n) - \sum_{k=-\infty}^{\infty} h_k y(n-k) \right) \left( d(n) - \sum_{l=-\infty}^{\infty} h_l y(n-l) \right)^{\!*} \right]$$
$$= r_{dd}(0) - \sum_{l=-\infty}^{\infty} h^*_l r_{dy}(l) - \sum_{k=-\infty}^{\infty} h_k r^*_{dy}(k) + \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} r_{yy}(l-k)\, h_k h^*_l$$
Remark: For Causal and FIR filters, only the limits of the sums differ.
Let h_i = \alpha_i + j\beta_i:
$$\frac{\partial \mathcal{E}}{\partial \alpha_i} = 0, \quad \frac{\partial \mathcal{E}}{\partial \beta_i} = 0 \;\Longrightarrow\; r_{dy}(i) = \sum_{k=-\infty}^{\infty} h^o_k r_{yy}(i-k), \quad \forall i.$$
191
In the Z-domain:
$$P_{dy}(z) = H^o(z) P_{yy}(z),$$
which gives the optimum Non-causal Wiener Filter.
Ex: d(n) = x(n), \; y(n) = x(n) + w(n),
$$P_{xx}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)}, \qquad P_{ww}(z) = 1.$$
x(n) and w(n) are uncorrelated.
Optimal filter?
$$P_{yy}(z) = P_{xx}(z) + P_{ww}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)} + 1 = \frac{1.6\, (1 - 0.5z^{-1})(1 - 0.5z)}{(1 - 0.8z^{-1})(1 - 0.8z)}$$
192
$$r_{dy}(k) = E[d(n+k)\, y^*(n)] = E\{ x(n+k)\, [x^*(n) + w^*(n)] \} = r_{xx}(k).$$
$$P_{dy}(z) = P_{xx}(z)$$
$$H^o(z) = \frac{P_{dy}(z)}{P_{yy}(z)} = \frac{0.36}{1.6\, (1 - 0.5z^{-1})(1 - 0.5z)}$$
$$h^o(k) = 0.3 \left( \frac{1}{2} \right)^{|k|}$$
[figure: h^o(k) versus k, symmetric about k = 0 with peak value 0.3]
193
Case 2: Causal Filter.
$$H(z) = \sum_{k=0}^{\infty} h_k z^{-k}$$
Through similar derivations as for Case 1, we have
$$r_{dy}(i) = \sum_{k=0}^{\infty} h^o_k r_{yy}(i-k), \quad i \ge 0. \qquad h^o_k = ?$$
Split H(z) as
[block diagram: y(n) -> B(z) (whitening filter) -> \epsilon(n) -> G(z); H(z) = B(z) G(z)]
194
$$B(z)\, B^*\!\left( \frac{1}{z^*} \right) = \frac{1}{P_{yy}(z)}$$
195
Pick B(z) such that the system B(z) is stable, causal, and minimum phase.
Note that
$$P_{\epsilon}(z) = P_{yy}(z)\, B(z)\, B^*\!\left( \frac{1}{z^*} \right) = 1.$$
B(z) is called the whitening filter.
Choose G^o(z) so that E\{|e(n)|^2\} is minimized:
$$r_{d\epsilon}(i) = \sum_{k=0}^{\infty} g^o_k r_{\epsilon}(i-k).$$
Since P_{\epsilon}(z) = 1, \; r_{\epsilon}(k) = \delta(k).
$$\Longrightarrow\; r_{d\epsilon}(i) = g^o_i, \quad i = 0, 1, 2, \cdots$$
196
$$h^o_i = \sum_k g^o_k\, b_{i-k} \qquad (\text{since } H^o(z) = B(z) G^o(z)).$$
Note that
$$r_{d\epsilon}(i) = E\{ d(n+i)\, \epsilon^*(n) \} = E\left\{ d(n+i) \sum_{k=0}^{\infty} b^*_k y^*(n-k) \right\} = \sum_{k=0}^{\infty} b^*_k r_{dy}(i+k).$$
Since b_k = 0 for k < 0 (causal),
$$r_{d\epsilon}(i) = \sum_{k} b^*_k r_{dy}(i+k) \;\Longrightarrow\; P_{d\epsilon}(z) = P_{dy}(z)\, B^*\!\left( \frac{1}{z^*} \right).$$
But r_{d\epsilon}(i) = g^o_i for i = 0, 1, \cdots, ONLY.
197
Let
$$[X(z)]_+ = \left[ \sum_{k=-\infty}^{\infty} x_k z^{-k} \right]_+ = \sum_{k=0}^{\infty} x_k z^{-k}.$$
$$G^o(z) = \sum_{k=0}^{\infty} g^o_k z^{-k} \;\Longrightarrow\; G^o(z) = \left[ P_{dy}(z)\, B^*\!\left( \frac{1}{z^*} \right) \right]_+$$
$$H^o(z) = B(z)\, G^o(z) \;\Longrightarrow\; H^o(z) = B(z) \left[ P_{dy}(z)\, B^*\!\left( \frac{1}{z^*} \right) \right]_+$$
198
Ex. (Same as the previous one.)
$$P_{xx}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)}, \qquad P_{ww}(z) = 1, \qquad x(n) \text{ and } w(n) \text{ independent.}$$
[block diagram: y(n) = x(n) + w(n) -> causal H^o(z) -> \hat{x}(n); e(n) = x(n) - \hat{x}(n)]
$$P_{dy}(z) = P_{xy}(z) = P_{xx}(z)$$
$$P_{yy}(z) = \frac{1.6\, (1 - 0.5z^{-1})(1 - 0.5z)}{(1 - 0.8z^{-1})(1 - 0.8z)}.$$
$$B(z) = \frac{1}{\sqrt{1.6}}\, \frac{1 - 0.8z^{-1}}{1 - 0.5z^{-1}} \quad \text{(stable and causal)}$$
199
$$P_{dy}(z)\, B^*\!\left( \frac{1}{z^*} \right) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)} \cdot \frac{1}{\sqrt{1.6}}\, \frac{1 - 0.8z}{1 - 0.5z} = \frac{0.36}{\sqrt{1.6}}\, \frac{1}{(1 - 0.8z^{-1})(1 - 0.5z)}$$
$$= \frac{0.36}{\sqrt{1.6}} \left[ \frac{5/3}{1 - 0.8z^{-1}} + \frac{(5/6)\, z}{1 - 0.5z} \right]$$
$$\left[ P_{dy}(z)\, B^*\!\left( \frac{1}{z^*} \right) \right]_+ = \frac{0.36}{\sqrt{1.6}}\, \frac{5/3}{1 - 0.8z^{-1}} = G^o(z)$$
200
$$H^o(z) = \frac{0.36}{\sqrt{1.6}}\, \frac{5/3}{1 - 0.8z^{-1}} \cdot \frac{1}{\sqrt{1.6}}\, \frac{1 - 0.8z^{-1}}{1 - 0.5z^{-1}} = 0.375\, \frac{1}{1 - 0.5z^{-1}}.$$
$$h^o(k) = \frac{3}{8} \left( \frac{1}{2} \right)^k U(k), \quad k = 0, 1, 2, \cdots.$$
201
Case 3: FIR Filter:
$$H(z) = \sum_{k=0}^{p} h_k z^{-k}$$
Again, we can show similarly
$$r_{dy}(i) = \sum_{k=0}^{p} h^o_k r_{yy}(i-k), \quad i = 0, 1, \cdots, p.$$
$$\begin{bmatrix} r_{dy}(0) \\ r_{dy}(1) \\ \vdots \\ r_{dy}(p) \end{bmatrix} = \begin{bmatrix} r_{yy}(0) & r_{yy}(1) & \cdots & r_{yy}(p) \\ r_{yy}(1) & r_{yy}(0) & & \vdots \\ \vdots & & \ddots & \\ r_{yy}(p) & r_{yy}(p-1) & \cdots & r_{yy}(0) \end{bmatrix} \begin{bmatrix} h^o_0 \\ h^o_1 \\ \vdots \\ h^o_p \end{bmatrix}$$
Remark: The minimum error \mathcal{E} is the smallest in case (1) and the largest in case (3).
202
Parametric Methods for Line Spectra
$$y(n) = x(n) + w(n), \qquad x(n) = \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)}$$
- \varphi_k = initial phases, independent of each other, uniformly distributed on [-\pi, \pi].
- \alpha_k = amplitudes, constants, > 0.
- \omega_k = angular frequencies.
- w(n) = zero-mean white Gaussian noise, independent of \varphi_1, \cdots, \varphi_K.
203
Remarks:
- Applications: Radar, Communications, \cdots.
- We are mostly interested in estimating \omega_1, \cdots, \omega_K.
- Once \hat\omega_1, \cdots, \hat\omega_K are estimated, \hat\alpha_1, \cdots, \hat\alpha_K and \hat\varphi_1, \cdots, \hat\varphi_K can be found readily from \hat\omega_1, \cdots, \hat\omega_K.
Let \beta_k = \alpha_k e^{j\varphi_k}:
$$\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix} \approx \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\omega_1} & e^{j\omega_2} & \cdots & e^{j\omega_K} \\ \vdots & & & \vdots \\ e^{j(N-1)\omega_1} & \cdots & & e^{j(N-1)\omega_K} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix}.$$
The amplitude of \hat\beta_k is \hat\alpha_k. The phase of \hat\beta_k is \hat\varphi_k.
204
Remarks:
$$r_{yy}(k) = E\{ y^*(n)\, y(n+k) \} = \sum_{i=1}^{K} \alpha_i^2 e^{j\omega_i k} + \sigma^2 \delta(k)$$
$$P_{yy}(\omega) = 2\pi \sum_{i=1}^{K} \alpha_i^2\, \delta(\omega - \omega_i) + \sigma^2.$$
[figure: PSD with impulses of areas 2\pi\alpha_1^2, 2\pi\alpha_2^2, 2\pi\alpha_3^2 at \omega_1, \omega_2, \omega_3 above the noise floor \sigma^2]
Recall that the resolution limit of the Periodogram is \sim \frac{1}{N}.
The parametric methods below have resolution better than \frac{1}{N}. (These methods are the so-called High-Resolution or Super-Resolution methods.)
205
Maximum Likelihood Estimator
w(n) is assumed to be a zero-mean circularly symmetric complex Gaussian random variable with variance \sigma^2.
The pdf of w(n) is N(0, \sigma^2):
$$f(w(n)) = \frac{1}{\pi\sigma^2} \exp\left( -\frac{|w(n)|^2}{\sigma^2} \right).$$
Remark: The real and imaginary parts of w(n) are real Gaussian random variables with zero mean and variance \frac{\sigma^2}{2}. The two parts are independent of each other.
206
$$f(w(0), \cdots, w(N-1)) = \frac{1}{(\pi\sigma^2)^N} \exp\left( -\frac{\sum_{n=0}^{N-1} |w(n)|^2}{\sigma^2} \right)$$
The likelihood function of y(0), \cdots, y(N-1) is
$$f = f(y(0), \cdots, y(N-1)) = \frac{1}{(\pi\sigma^2)^N} \exp\left( -\frac{\sum_{n=0}^{N-1} |y(n) - x(n)|^2}{\sigma^2} \right)$$
Remark: The ML estimates of \omega_1, \cdots, \omega_K, \alpha_1, \cdots, \alpha_K, \varphi_1, \cdots, \varphi_K are found by maximizing f with respect to these parameters. Equivalently, we minimize
$$g = \sum_{n=0}^{N-1} \left| y(n) - \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)} \right|^2$$
207
Remark: If w(n) is neither Gaussian nor white, minimizing g is called the non-linear least-squares method, in general.
Let
$$y = \begin{bmatrix} y(0) \\ \vdots \\ y(N-1) \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}, \qquad \omega = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_K \end{bmatrix},$$
$$B = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\omega_1} & e^{j\omega_2} & \cdots & e^{j\omega_K} \\ \vdots & & & \vdots \\ e^{j(N-1)\omega_1} & \cdots & & e^{j(N-1)\omega_K} \end{bmatrix}$$
208
$$g = (y - B\beta)^H (y - B\beta)$$
$$= \left[ \beta - \left( B^H B \right)^{-1} B^H y \right]^H \left( B^H B \right) \left[ \beta - \left( B^H B \right)^{-1} B^H y \right] + y^H y - y^H B \left( B^H B \right)^{-1} B^H y.$$
$$\hat\omega = \arg\max_{\omega}\; y^H B \left( B^H B \right)^{-1} B^H y, \qquad \hat\beta = \left( B^H B \right)^{-1} B^H y \Big|_{\omega = \hat\omega}.$$
Remark: \hat\omega is a consistent estimate of \omega.
209
For large N,
$$E\left[ (\hat\omega - \omega)(\hat\omega - \omega)^H \right] = \frac{6\sigma^2}{N^3} \begin{bmatrix} \frac{1}{\alpha_1^2} & & \\ & \ddots & \\ & & \frac{1}{\alpha_K^2} \end{bmatrix} = \text{CRB}.$$
However, the maximization to obtain \hat\omega is difficult to implement:
* The search may not find the global maximum.
* It is computationally expensive.
210
Special Cases:
1.) K = 1:
$$\hat\omega = \arg\max_{\omega} \underbrace{ y^H B \left( B^H B \right)^{-1} B^H y }_{g_1}, \qquad B = \begin{bmatrix} 1 \\ e^{j\omega} \\ \vdots \\ e^{j(N-1)\omega} \end{bmatrix}, \quad B^H B = N.$$
211
$$B^H y = \begin{bmatrix} 1 & e^{-j\omega} & \cdots & e^{-j(N-1)\omega} \end{bmatrix} \begin{bmatrix} y(0) \\ \vdots \\ y(N-1) \end{bmatrix} = \sum_{n=0}^{N-1} y(n) e^{-j\omega n}$$
$$\hat\omega = \arg\max_{\omega}\; \frac{1}{N} \left| \sum_{n=0}^{N-1} y(n) e^{-j\omega n} \right|^2$$
\hat\omega corresponds to the highest peak of the Periodogram!
212
2.) Resolution condition:
$$\inf_{i \ne k} |\omega_i - \omega_k| > \frac{2\pi}{N}.$$
Since Var(\hat\omega_k - \omega_k) \sim \frac{1}{N^3}, we have |\hat\omega_k - \omega_k| \sim \frac{1}{N^{3/2}}, so
$$\inf_{i \ne k} |\hat\omega_i - \hat\omega_k| > \frac{2\pi}{N}.$$
We can resolve all K sine waves by evaluating g_1 at the FFT points:
$$\omega_i = \frac{2\pi}{N}\, i, \quad i = 0, \cdots, N-1.$$
Any K of these \omega_i give B^H B = NI, I = Identity matrix.
$$g_1 = \sum_{k=1}^{K} \frac{1}{N} \left| \sum_{n=0}^{N-1} y(n) e^{-j\omega_k n} \right|^2.$$
213
[figure: periodogram evaluated on the 2\pi/N grid; the K grid points \omega_i that maximize g_1 correspond to the largest K peaks of the Periodogram]
Remarks:
- \omega_k estimates obtained by using the K largest peaks of the Periodogram have accuracy |\hat\omega_k - \omega_k| \le \frac{2\pi}{N}.
- The periodogram is a good frequency estimator. (It was introduced by Schuster a century ago!)
214
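Peak picking on a zero-padded periodogram, as described above, can be sketched in a few lines (function name is my own; a dense FFT grid stands in for the continuous search):

```python
import numpy as np

def periodogram_freqs(y, K, nfft=4096):
    """Frequency estimates as the locations of the K largest periodogram
    peaks on a zero-padded FFT grid."""
    Y = np.fft.fft(y, nfft)
    P = np.abs(Y) ** 2 / len(y)
    # local maxima on the circular frequency grid
    peaks = [i for i in range(nfft)
             if P[i] >= P[i - 1] and P[i] >= P[(i + 1) % nfft]]
    peaks.sort(key=lambda i: P[i], reverse=True)
    return np.sort(2 * np.pi * np.array(peaks[:K]) / nfft)
```

For two well-separated complex exponentials, the two largest peaks land within a grid cell of the true frequencies.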
High-Resolution Methods
- Statistical performance close to the ML estimator (or CRB).
- Avoid the multidimensional search over the parameter space.
- Do not depend on the Resolution condition.
- All provide consistent estimates.
- All give similar performance, especially for large N.
- The method of choice is a matter of taste.
215
Higher-Order Yule-Walker (HOYW) Method:
Let x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)}.
$$\left( 1 - e^{j\omega_k} z^{-1} \right) x_k(n) = x_k(n) - e^{j\omega_k} x_k(n-1) = \alpha_k e^{j(\omega_k n + \varphi_k)} - e^{j\omega_k}\, \alpha_k e^{j[\omega_k (n-1) + \varphi_k]} = 0$$
\left( 1 - e^{j\omega_k} z^{-1} \right) is an Annihilating filter for x_k(n).
Let
$$A(z) = \prod_{k=1}^{K} \left( 1 - e^{j\omega_k} z^{-1} \right) \;\Longrightarrow\; A(z)\, x(n) = 0.$$
$$y(n) = x(n) + w(n) \;\Longrightarrow\; A(z)\, y(n) = A(z)\, w(n) \quad (\ast)$$
216
Remark: It is tempting to cancel A(z) from both sides above, but this is wrong since y(n) \ne w(n)!
Multiplying both sides of (\ast) by a polynomial \bar{A}(z) of order L - K gives
$$\left( 1 + \bar{a}_1 z^{-1} + \cdots + \bar{a}_L z^{-L} \right) y(n) = \left( 1 + \bar{a}_1 z^{-1} + \cdots + \bar{a}_L z^{-L} \right) w(n),$$
where 1 + \bar{a}_1 z^{-1} + \cdots + \bar{a}_L z^{-L} = A(z)\, \bar{A}(z).
$$\begin{bmatrix} y(n) & y(n-1) & \cdots & y(n-L) \end{bmatrix} \begin{bmatrix} 1 \\ \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} = w(n) + \cdots + \bar{a}_L w(n-L)$$
217
Multiplying both sides by
$$\begin{bmatrix} y^*(n-L-1) \\ \vdots \\ y^*(n-L-M) \end{bmatrix}$$
and taking expectations (the right side vanishes since w(n) is white), we get
$$\begin{bmatrix} r_{yy}(L+1) & \cdots & r_{yy}(1) \\ \vdots & & \vdots \\ r_{yy}(L+M) & \cdots & r_{yy}(M) \end{bmatrix} \begin{bmatrix} 1 \\ \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} = 0.$$
$$\Longrightarrow\; \underbrace{ \begin{bmatrix} r_{yy}(L) & \cdots & r_{yy}(1) \\ \vdots & & \vdots \\ r_{yy}(L+M-1) & \cdots & r_{yy}(M) \end{bmatrix} }_{\Omega} \begin{bmatrix} \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} = - \underbrace{ \begin{bmatrix} r_{yy}(L+1) \\ \vdots \\ r_{yy}(L+M) \end{bmatrix} }_{\rho}, \qquad \Omega\, \bar{a} = -\rho.$$
218
Remarks:
- When y(0), \cdots, y(N-1) are the only data available, we first estimate r_{yy}(i) and replace r_{yy}(i) in the above equation with the estimate \hat{r}_{yy}(i).
- \{\hat\omega_k\}_{k=1}^{K} are the angular positions of the K roots of 1 + \hat{\bar{a}}_1 z^{-1} + \cdots + \hat{\bar{a}}_L z^{-L} nearest the unit circle.
- Increasing L and M will give better performance due to using the information in higher lags of r(i).
- Increasing L and M too much will give worse performance due to the increased variance in \hat{r}(i) for large i.
219
\Omega has rank K if M \ge K and L \ge K.
Proof: Let
$$y_i(n) = \begin{bmatrix} y(n) \\ y(n-1) \\ \vdots \\ y(n-i+1) \end{bmatrix}, \quad w_i(n) = \begin{bmatrix} w(n) \\ w(n-1) \\ \vdots \\ w(n-i+1) \end{bmatrix}, \quad x(n) = \begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}, \quad x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)}.$$
220
$$y_i(n) = \underbrace{ \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{-j\omega_1} & e^{-j\omega_2} & \cdots & e^{-j\omega_K} \\ \vdots & & & \vdots \\ e^{-j(i-1)\omega_1} & \cdots & & e^{-j(i-1)\omega_K} \end{bmatrix} }_{A_i}\, x(n) + w_i(n)$$
A_i = i \times K Vandermonde matrix.
rank(A_i) = K if i \ge K and \omega_k \ne \omega_l for k \ne l.
221
Thus
$$\Omega = E\left\{ \begin{bmatrix} y(n-L-1) \\ \vdots \\ y(n-L-M) \end{bmatrix} \begin{bmatrix} y^*(n-1) & \cdots & y^*(n-L) \end{bmatrix} \right\} = E\left\{ A_M\, x(n-L-1)\, x^H(n-1)\, A_L^H \right\} = A_M P_{L+1} A_L^H,$$
where P_{L+1} = E\left\{ x(n-L-1)\, x^H(n-1) \right\} = E\left\{ x(n-L)\, x^H(n) \right\}.
222
$$E\{x_i(n)\} = E\left\{ \alpha_i e^{j(\omega_i n + \varphi_i)} \right\} = \alpha_i e^{j\omega_i n} \int_{-\pi}^{\pi} e^{j\varphi_i}\, \frac{1}{2\pi}\, d\varphi_i = 0$$
$$E\{ x_i(n-k)\, x_i^*(n) \} = E\left\{ \alpha_i e^{j[\omega_i (n-k) + \varphi_i]}\, \alpha_i e^{-j(\omega_i n + \varphi_i)} \right\} = \alpha_i^2 e^{-j\omega_i k}$$
Since the \varphi_i's are independent of each other,
$$E\left\{ x_i(n-k)\, x_j^*(n) \right\} = 0, \quad i \ne j.$$
223
$$P_{L+1} = E\left\{ \begin{bmatrix} x_1(n-L-1) \\ x_2(n-L-1) \\ \vdots \\ x_K(n-L-1) \end{bmatrix} \begin{bmatrix} x_1^*(n-1) & \cdots & x_K^*(n-1) \end{bmatrix} \right\} = \begin{bmatrix} \alpha_1^2 e^{-j\omega_1 L} & & 0 \\ & \ddots & \\ 0 & & \alpha_K^2 e^{-j\omega_K L} \end{bmatrix}$$
Remark: For M \ge K and L \ge K, P_{L+1} is of rank K, and so is \Omega.
224
Consider
$$\underbrace{ \begin{bmatrix} \hat{r}_{yy}(L) & \cdots & \hat{r}_{yy}(1) \\ \vdots & & \vdots \\ \hat{r}_{yy}(L+M-1) & \cdots & \hat{r}_{yy}(M) \end{bmatrix} }_{\hat\Omega} \begin{bmatrix} \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} \approx - \underbrace{ \begin{bmatrix} \hat{r}_{yy}(L+1) \\ \vdots \\ \hat{r}_{yy}(L+M) \end{bmatrix} }_{\hat\rho}, \qquad \hat\Omega\, \bar{a} \approx -\hat\rho.$$
Remarks:
- rank(\hat\Omega) = \min(M, L) almost surely, due to the errors in \hat{r}_{yy}(i).
- For large N, \hat{r}_{yy}(i) \approx r_{yy}(i) makes \hat\Omega ill conditioned.
- Consequently, direct LS estimates of \bar{a}_1, \cdots, \bar{a}_L give poor estimates of \omega_1, \cdots, \omega_K.
225
Let us use this rank information as follows. Let
$$\hat\Omega = \hat{U} \hat\Sigma \hat{V}^H = \begin{bmatrix} \hat{U}_1 & \hat{U}_2 \end{bmatrix} \begin{bmatrix} \hat\Sigma_1 & 0 \\ 0 & \hat\Sigma_2 \end{bmatrix} \begin{bmatrix} \hat{V}_1^H \\ \hat{V}_2^H \end{bmatrix} \quad \begin{matrix} \} K \\ \} L-K \end{matrix}$$
denote the singular value decomposition (SVD) of \hat\Omega (with the diagonal elements of \hat\Sigma arranged from large to small).
Since \hat\Omega is close to rank K, and \Omega has rank K,
$$\hat\Omega_K = \hat{U}_1 \hat\Sigma_1 \hat{V}_1^H$$
(the best rank-K approximation of \hat\Omega in the Frobenius Norm sense) is generally a better estimate of \Omega than \hat\Omega.
$$\hat\Omega_K\, \hat{\bar{a}} \approx -\hat\rho \;\Longrightarrow\; \hat{\bar{a}} = -\hat{V}_1 \hat\Sigma_1^{-1} \hat{U}_1^H \hat\rho \quad (\ast\ast)$$
226
Remark:
- Using \hat\Omega_K to replace \hat\Omega gives better frequency estimation. This result may be explained by the fact that \hat\Omega_K is closer to \Omega than \hat\Omega.
- The rank approximation step is referred to as noise cleaning.
227
Summary of the HOYW Frequency Estimator
Step 1: Compute \hat{r}(k), k = 1, 2, \cdots, L+M.
Step 2: Compute the SVD of \hat\Omega and determine \hat{\bar{a}} with (**).
Step 3: Compute the roots of
$$1 + \hat{\bar{a}}_1 z^{-1} + \cdots + \hat{\bar{a}}_L z^{-L} = 0.$$
Pick the K roots that are nearest the unit circle and obtain the frequency estimates as the angular positions (phases) of these roots.
Remarks: Rule of Thumb for selecting L and M:
$$L \approx M, \qquad L + M \approx \frac{N}{3}.$$
Although one cannot guarantee that the K roots nearest the unit circle give the best frequency estimates, empirical evidence shows that this is true most often.
228
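The three HOYW steps can be sketched end to end — sample autocorrelations, truncated-SVD solve (**), and root selection (a hedged numpy illustration; function name is my own, and the biased autocorrelation estimate is assumed):

```python
import numpy as np

def hoyw_frequencies(y, K, L, M):
    """HOYW: build Omega a ~ -rho from sample autocorrelations, solve via a
    rank-K truncated SVD, then keep the K polynomial roots nearest |z| = 1."""
    N = len(y)
    r = np.array([np.sum(y[k:] * np.conj(y[:N - k])) / N
                  for k in range(L + M + 1)])        # biased r_hat(k)
    r_at = lambda k: r[k] if k >= 0 else np.conj(r[-k])
    Omega = np.array([[r_at(L + m - j) for j in range(1, L + 1)]
                      for m in range(1, M + 1)])
    rho = np.array([r_at(L + m) for m in range(1, M + 1)])
    U, s, Vh = np.linalg.svd(Omega)
    # rank-K truncated solve:  a = -V1 Sigma1^{-1} U1^H rho
    a = -Vh[:K].conj().T @ ((U[:, :K].conj().T @ rho) / s[:K])
    roots = np.roots(np.concatenate([[1.0], a]))
    roots = sorted(roots, key=lambda z: abs(abs(z) - 1.0))[:K]
    return np.sort(np.angle(np.array(roots)))
```

On noiseless, well-separated complex exponentials the recovered angles fall very close to the true frequencies.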
Some Math Background
Lemma: Let U be a unitary matrix, i.e., U^H U = I. Then \|Ub\|_2^2 = \|b\|_2^2, where \|x\|_2^2 = x^H x.
Proof:
$$\|Ub\|_2^2 = b^H U^H U b = b^H b = \|b\|_2^2.$$
Consider Ax \approx b, where A is M \times L, x is L \times 1, b is M \times 1, and A is of rank K.
229
SVD of A:
$$A = U \Sigma V^H = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix}$$
Goal: Find the minimum-norm x so that \|Ax - b\|_2^2 = minimum.
$$\|Ax - b\|_2^2 = \|U^H A x - U^H b\|_2^2 = \| \Sigma \underbrace{V^H x}_{y} - U^H b \|_2^2 = \left\| \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} - \begin{bmatrix} U_1^H b \\ U_2^H b \end{bmatrix} \right\|_2^2 = \| \Sigma_1 y_1 - U_1^H b \|_2^2 + \| U_2^H b \|_2^2$$
230
To minimize \|Ax - b\|_2^2 we must have
$$\Sigma_1 y_1 = U_1^H b \;\Longrightarrow\; y_1 = \Sigma_1^{-1} U_1^H b.$$
Note that y_2 can be anything and \|Ax - b\|_2^2 is not affected.
Let y_2 = 0 so that \|y\|_2^2 = \|x\|_2^2 = minimum.
$$V^H x = y = \begin{bmatrix} y_1 \\ 0 \end{bmatrix} \;\Longrightarrow\; x = Vy = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} y_1 \\ 0 \end{bmatrix} = V_1 y_1$$
$$\Longrightarrow\; x = V_1 \Sigma_1^{-1} U_1^H b, \qquad \|x\|_2^2 = \|y\|_2^2 = \text{minimum}.$$
231
SVD Prony Method
Recall:
$$\left( 1 + \bar{a}_1 z^{-1} + \cdots + \bar{a}_L z^{-L} \right) y(n) = \left( 1 + \bar{a}_1 z^{-1} + \cdots + \bar{a}_L z^{-L} \right) w(n), \quad (L \ge K).$$
At not too low SNR,
$$\begin{bmatrix} y(n) & y(n-1) & \cdots & y(n-L) \end{bmatrix} \begin{bmatrix} 1 \\ \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} \approx 0 \;\Longrightarrow\; \begin{bmatrix} y(L) & y(L-1) & \cdots & y(0) \\ y(L+1) & y(L) & \cdots & y(1) \\ \vdots & & & \vdots \\ y(N-1) & y(N-2) & \cdots & y(N-L-1) \end{bmatrix} \begin{bmatrix} 1 \\ \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} \approx 0 \quad (\ast)$$
232
Remark: If w(n) = 0, Eq. (*) holds exactly; in that case Eq. (*) gives EXACT frequency estimates.
Consider next the rank of
$$X = \begin{bmatrix} x(L-1) & \cdots & x(0) \\ \vdots & & \vdots \\ x(N-2) & \cdots & x(N-L-1) \end{bmatrix}$$
Note
$$\begin{bmatrix} x(0) \\ \vdots \\ x(N-L-1) \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \\ e^{j\omega_1} & \cdots & e^{j\omega_K} \\ \vdots & & \vdots \\ e^{j(N-L-1)\omega_1} & \cdots & e^{j(N-L-1)\omega_K} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}$$
233
$$\begin{bmatrix} x(1) \\ \vdots \\ x(N-L) \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \\ e^{j\omega_1} & \cdots & e^{j\omega_K} \\ \vdots & & \vdots \\ e^{j(N-L-1)\omega_1} & \cdots & e^{j(N-L-1)\omega_K} \end{bmatrix} \begin{bmatrix} \beta_1 e^{j\omega_1} \\ \vdots \\ \beta_K e^{j\omega_K} \end{bmatrix}$$
$$X = \begin{bmatrix} 1 & \cdots & 1 \\ e^{j\omega_1} & \cdots & e^{j\omega_K} \\ \vdots & & \vdots \\ e^{j(N-L-1)\omega_1} & \cdots & e^{j(N-L-1)\omega_K} \end{bmatrix} \begin{bmatrix} \beta_1 & & 0 \\ & \ddots & \\ 0 & & \beta_K \end{bmatrix} \begin{bmatrix} e^{j(L-1)\omega_1} & \cdots & e^{j\omega_1} & 1 \\ e^{j(L-1)\omega_2} & \cdots & e^{j\omega_2} & 1 \\ \vdots & & & \vdots \\ e^{j(L-1)\omega_K} & \cdots & e^{j\omega_K} & 1 \end{bmatrix}$$
234
Remark: If N - L - 1 \ge K and L \ge K, X is of rank K.
From (*):
$$\underbrace{ \begin{bmatrix} y(L-1) & \cdots & y(0) \\ \vdots & & \vdots \\ y(N-2) & \cdots & y(N-L-1) \end{bmatrix} }_{Y} \begin{bmatrix} \bar{a}_1 \\ \vdots \\ \bar{a}_L \end{bmatrix} \approx - \underbrace{ \begin{bmatrix} y(L) \\ \vdots \\ y(N-1) \end{bmatrix} }_{y}$$
Remark: A rank-K approximation of Y has a Noise Cleaning effect.
Let
$$Y = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} \quad \begin{matrix} \} K \\ \} L-K \end{matrix}$$
235
$$\begin{bmatrix} \hat{\bar{a}}_1 \\ \vdots \\ \hat{\bar{a}}_L \end{bmatrix} = -V_1 \Sigma_1^{-1} U_1^H \begin{bmatrix} y(L) \\ \vdots \\ y(N-1) \end{bmatrix}. \quad (\ast\ast)$$
Summary of the SVD Prony Estimator:
Step 1. Form Y and compute the SVD of Y.
Step 2. Determine \hat{\bar{a}} with (**).
Step 3. Compute the roots from \hat{\bar{a}}. Pick the K roots that are nearest the unit circle. Obtain the frequency estimates as the phases of the roots.
Remark: Although one cannot guarantee that the K roots nearest the unit circle give the best frequency estimates, empirical results show that this is true most often.
A more accurate method is obtained by cleaning (i.e., taking a rank-K approximation of) the matrix [Y \;\; y].
236
Pisarenko and MUSIC Methods
Remark: The Pisarenko method is a special case of the MUSIC (MUltiple SIgnal Classification) method.
Recall:
$$y_M(n) = \begin{bmatrix} y(n) \\ y(n-1) \\ \vdots \\ y(n-M+1) \end{bmatrix}, \qquad A_M = \begin{bmatrix} 1 & \cdots & 1 \\ e^{-j\omega_1} & \cdots & e^{-j\omega_K} \\ \vdots & & \vdots \\ e^{-j(M-1)\omega_1} & \cdots & e^{-j(M-1)\omega_K} \end{bmatrix},$$
237
$$x(n) = \begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n) \end{bmatrix}, \qquad w_M(n) = \begin{bmatrix} w(n) \\ \vdots \\ w(n-M+1) \end{bmatrix}$$
$$y_M(n) = A_M x(n) + w_M(n)$$
$$R = E\left[ y_M(n)\, y_M^H(n) \right] = E\left[ A_M x(n)\, x^H(n) A_M^H \right] + E\left[ w_M(n)\, w_M^H(n) \right]$$
238
$$\Longrightarrow\; R = A_M P A_M^H + \sigma^2 I, \qquad P = \begin{bmatrix} \alpha_1^2 & & 0 \\ & \ddots & \\ 0 & & \alpha_K^2 \end{bmatrix}.$$
Remarks: rank(A_M P A_M^H) = K if M \ge K.
If M > K, A_M P A_M^H has K positive eigenvalues and M - K zero eigenvalues. We shall consider M > K below.
Let the positive eigenvalues of A_M P A_M^H be denoted \tilde\lambda_1, \cdots, \tilde\lambda_K.
The eigenvalues of R fall into two groups:
$$\lambda_k = \tilde\lambda_k + \sigma^2, \quad k = 1, \cdots, K; \qquad \lambda_k = \sigma^2, \quad k = K+1, \cdots, M.$$
239
Let s_1, \cdots, s_K be the eigenvectors of R that correspond to \lambda_1, \cdots, \lambda_K; let S = [s_1, \cdots, s_K].
Let s_{K+1}, \cdots, s_M be the eigenvectors of R that correspond to \lambda_{K+1}, \cdots, \lambda_M; let G = [s_{K+1}, \cdots, s_M].
$$RG = G \begin{bmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{bmatrix} = \sigma^2 G$$
$$RG = \left( A_M P A_M^H + \sigma^2 I \right) G = A_M P A_M^H G + \sigma^2 G$$
$$\Longrightarrow\; A_M P A_M^H G = 0 \;\Longrightarrow\; A_M^H G = 0$$
240
Remark:
- Let the linearly independent K columns of A_M define the K-dimensional signal subspace.
* Then the eigenvectors of R that correspond to the M - K smallest eigenvalues are orthogonal to the signal subspace.
* The eigenvectors of R that correspond to the K largest eigenvalues of R span the same signal subspace as A_M:
$$A_M = SC \;\text{ for some } K \times K \text{ non-singular } C.$$
241
MUSIC:
The true frequency values \{\omega_k\}_{k=1}^{K} are the only solutions of
$$a_M^H(\omega)\, G G^H a_M(\omega) = 0, \qquad a_M(\omega) = \begin{bmatrix} 1 \\ e^{-j\omega} \\ \vdots \\ e^{-j(M-1)\omega} \end{bmatrix}.$$
Steps in MUSIC:
Step 1: Compute
$$\hat{R} = \frac{1}{N} \sum_{n=M}^{N} y_M(n)\, y_M^H(n)$$
and its eigendecomposition. Form \hat{G} whose columns are the eigenvectors of \hat{R} that correspond to the M - K smallest eigenvalues of \hat{R}.
242
Step 2a (Spectral MUSIC): Determine the frequency estimates as the locations of the K highest peaks of the MUSIC spectrum
$$\frac{1}{a_M^H(\omega)\, \hat{G} \hat{G}^H a_M(\omega)}, \quad \omega \in [-\pi, \pi].$$
Step 2b (Root MUSIC): Determine the frequency estimates as the angular positions (phases) of the K (pairs of reciprocal) roots of the equation
$$a_M^H\!\left( \frac{1}{z^*} \right) \hat{G} \hat{G}^H a_M(z) = 0$$
that are closest to the unit circle, where
$$a_M(z) = \begin{bmatrix} 1 & z^{-1} & \cdots & z^{-M+1} \end{bmatrix}^T, \;\text{ i.e., } a_M(z)\big|_{z = e^{j\omega}} = a_M(\omega).$$
243
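Root MUSIC as described above can be sketched as follows (a hedged illustration; function name is my own — the Laurent polynomial coefficients are the diagonal sums of G G^H, and one root of each reciprocal-conjugate pair lies inside the unit circle):

```python
import numpy as np

def root_music(y, K, M):
    """Root MUSIC: noise-subspace eigenvectors G of the sample covariance,
    then the roots of a_M^H(1/z*) G G^H a_M(z) nearest the unit circle."""
    N = len(y)
    R = np.zeros((M, M), complex)
    for n in range(M - 1, N):
        v = np.array([y[n - i] for i in range(M)])   # y_M(n)
        R += np.outer(v, v.conj())
    R /= N
    lam, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    G = V[:, :M - K]                         # M-K smallest -> noise subspace
    C = G @ G.conj().T
    # On |z| = 1 the quadratic form equals sum_{i,j} C[i,j] z^{i-j}; the
    # coefficient of z^d is the d-th diagonal sum, listed in descending d.
    coef = np.array([np.trace(C, offset=k) for k in range(-(M - 1), M)])
    roots = np.roots(coef)
    roots = roots[np.abs(roots) < 1]         # one root per reciprocal pair
    roots = roots[np.argsort(np.abs(np.abs(roots) - 1.0))][:K]
    return np.sort(np.angle(roots))
```

On nearly noiseless data the K selected roots sit essentially on the unit circle at the true frequencies.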
Pisarenko Method = (MUSIC with M = K + 1)
Remarks:
- The Pisarenko method is not as good as MUSIC.
- M in MUSIC should not be too large, due to the poor accuracy of \hat{r}(k) for large k.
244
ESPRIT Method
(Estimation of Signal Parameters by Rotational Invariance Techniques)
$$A_M = \begin{bmatrix} 1 & \cdots & 1 \\ e^{-j\omega_1} & \cdots & e^{-j\omega_K} \\ \vdots & & \vdots \\ e^{-j(M-1)\omega_1} & \cdots & e^{-j(M-1)\omega_K} \end{bmatrix}$$
Let B_1 = first M - 1 rows of A_M, and B_2 = last M - 1 rows of A_M. Then
$$B_2 D = B_1, \qquad D = \begin{bmatrix} e^{j\omega_1} & & 0 \\ & \ddots & \\ 0 & & e^{j\omega_K} \end{bmatrix}$$
245
Let S_1 and S_2 be formed from S the same way as B_1 and B_2 are formed from A_M.
Recall S = A_M C:
$$S_1 = B_1 C = B_2 D C, \qquad S_2 = B_2 C \;\Longrightarrow\; S_2 C^{-1} = B_2$$
$$S_1 = S_2 \underbrace{C^{-1} D C}_{\Phi} \;\Longrightarrow\; \Phi = \left( S_2^H S_2 \right)^{-1} S_2^H S_1.$$
The diagonal elements of D are the eigenvalues of \Phi.
Steps of ESPRIT:
Step 1:
$$\hat\Phi = \left( \hat{S}_2^H \hat{S}_2 \right)^{-1} \hat{S}_2^H \hat{S}_1$$
Step 2: The frequency estimates are the angular positions of the eigenvalues of \hat\Phi.
246
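The two ESPRIT steps above are short in code (a hedged numpy sketch; function name is my own, and a least-squares solve implements the (S_2^H S_2)^{-1} S_2^H S_1 product):

```python
import numpy as np

def esprit(y, K, M):
    """ESPRIT: signal-subspace eigenvectors S of the sample covariance;
    frequencies from the eigenvalues of Phi = (S2^H S2)^{-1} S2^H S1."""
    N = len(y)
    R = np.zeros((M, M), complex)
    for n in range(M - 1, N):
        v = np.array([y[n - i] for i in range(M)])   # y_M(n)
        R += np.outer(v, v.conj())
    R /= N
    lam, V = np.linalg.eigh(R)
    S = V[:, -K:]                  # K largest eigenvalues -> signal subspace
    S1, S2 = S[:-1, :], S[1:, :]   # first / last M-1 rows
    Phi = np.linalg.lstsq(S2, S1, rcond=None)[0]
    return np.sort(np.angle(np.linalg.eigvals(Phi)))
```

No root selection is needed: the K eigenvalues of \hat\Phi directly estimate e^{j\omega_k}, which is one of the slide's arguments for recommending ESPRIT.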
Remarks:
- \hat{S}_2 \Phi \approx \hat{S}_1 can also be solved with the Total Least Squares Method.
- Since \hat\Phi is a K \times K matrix, we do not need to pick the K roots nearest the unit circle, which could be wrong roots.
- ESPRIT does not require the search over the parameter space, as required by Spectral MUSIC.
- All of these remarks make ESPRIT a recommended method!
247
Sinusoidal Parameter Estimation in the Presence of Colored Noise via RELAX
$$y(n) = \sum_{k=1}^{K} \beta_k e^{j\omega_k n} + e(n)$$
- \beta_k = complex amplitudes, unknown.
- \omega_k = unknown frequencies.
- e(n) = unknown AR or ARMA noise.
Consider the Non-linear least-squares (NLS) method:
$$g = \sum_{n=0}^{N-1} \left| y(n) - \sum_{k=1}^{K} \beta_k e^{j\omega_k n} \right|^2$$
248
Remarks:
- \hat\beta_k and \hat\omega_k, k = 1, \cdots, K, are found by minimizing g.
- When e(n) is zero-mean Gaussian white noise, this NLS method is the ML method.
- When e(n) is non-white noise, the NLS method gives asymptotically (N \to \infty) statistically efficient estimates of \hat\omega_k and \hat\beta_k, despite the fact that NLS is not an ML method for this case.
- The non-linear minimization is a difficult problem.
249
Remarks:
- Concentrating out \{\beta_k\} gives
$$\hat\omega = \arg\max_{\omega}\; y^H B \left( B^H B \right)^{-1} B^H y, \qquad \hat\beta = \left( B^H B \right)^{-1} B^H y \Big|_{\omega = \hat\omega}.$$
- Concentrating out \{\omega_k\}, instead of simplifying the problem, actually complicates the problem.
- The RELAX algorithm is a relaxation-based optimization approach.
- RELAX is both computationally and conceptually simple.
250
Preparation:
Let
$$y_k(n) = y(n) - \sum_{i=1,\, i \ne k}^{K} \hat\beta_i e^{j\hat\omega_i n}$$
* \hat\beta_i and \hat\omega_i, i \ne k, are assumed given, known, or estimated.
Let
$$g_k = \sum_{n=0}^{N-1} \left| y_k(n) - \beta_k e^{j\omega_k n} \right|^2.$$
* Minimizing g_k gives:
$$\hat\omega_k = \arg\max_{\omega_k} \left| \sum_{n=0}^{N-1} y_k(n) e^{-j\omega_k n} \right|^2, \qquad \hat\beta_k = \frac{1}{N} \sum_{n=0}^{N-1} y_k(n) e^{-j\omega_k n} \Bigg|_{\omega_k = \hat\omega_k}.$$
251
Remarks:
- \sum_{n=0}^{N-1} y_k(n) e^{-j\omega_k n} is the DTFT of y_k(n)! (It can be computed via FFT and zero-padding.)
- \hat\omega_k corresponds to the peak of the Periodogram!
- \hat\beta_k is the peak height (a complex number!) of the DTFT of y_k(n) (at \hat\omega_k) divided by N.
252
The RELAX Algorithm
Step 1: Assume K = 1. Obtain \hat\omega_1 and \hat\beta_1 from y(n).
Step 2: Obtain y_2(n) by assuming K = 2 and using \hat\omega_1 and \hat\beta_1 obtained from Step 1.
Iterate until convergence:
- Obtain \hat\omega_2 and \hat\beta_2 from y_2(n).
- Obtain y_1(n) by using \hat\omega_2 and \hat\beta_2, and re-estimate \hat\omega_1 and \hat\beta_1 from y_1(n).
Step 3: Assume K = 3.
- Obtain y_3(n) from \hat\omega_1, \hat\beta_1, \hat\omega_2, \hat\beta_2. Obtain \hat\omega_3 and \hat\beta_3 from y_3(n).
- Obtain y_1(n) from \hat\omega_2, \hat\beta_2, \hat\omega_3, \hat\beta_3. Re-estimate \hat\omega_1 and \hat\beta_1 from y_1(n).
- Obtain y_2(n) from \hat\omega_1, \hat\beta_1, \hat\omega_3, \hat\beta_3. Re-estimate \hat\omega_2 and \hat\beta_2 from y_2(n).
Iterate until g does not decrease significantly anymore!
253
Step 4: Assume K = 4, \cdots
Continue until K is large enough!
Remark:
- RELAX is found to perform better than existing high-resolution algorithms, especially in obtaining better \hat\beta_k, k = 1, \cdots, K.
- RELAX is more robust to the choice of K and to data model errors.
254
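The RELAX loop above — add one component at a time, then cyclically re-fit each component from the periodogram peak of its residual — can be sketched as follows (a hedged illustration; function names, the fixed iteration count, and the FFT grid size are my own choices):

```python
import numpy as np

def relax(y, K, n_iter=20, nfft=8192):
    """RELAX sketch: estimate K complex sinusoids by cyclically removing all
    but one component and re-fitting it from the residual's periodogram peak."""
    N = len(y)
    n = np.arange(N)
    omegas, betas = [], []

    def fit_one(res):
        # omega_k from the DTFT peak, beta_k from the peak height / N
        Y = np.fft.fft(res, nfft)
        i = int(np.argmax(np.abs(Y)))
        w = 2 * np.pi * i / nfft
        b = np.sum(res * np.exp(-1j * w * n)) / N
        return w, b

    for k in range(K):                 # Steps 1, 2, 3, ...: grow the model
        res = y - sum(b * np.exp(1j * w * n) for w, b in zip(omegas, betas))
        w, b = fit_one(res)
        omegas.append(w); betas.append(b)
        for _ in range(n_iter):        # cyclic re-estimation of each component
            for i in range(k + 1):
                res = y - sum(b2 * np.exp(1j * w2 * n)
                              for j, (w2, b2) in enumerate(zip(omegas, betas))
                              if j != i)
                omegas[i], betas[i] = fit_one(res)
    return np.array(omegas), np.array(betas)
```

Each inner fit is just an FFT and a peak pick, which is what makes RELAX computationally simple.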