
EEL 6537 Spectral Estimation

Jian Li
Department of Electrical and Computer Engineering
University of Florida
Gainesville, FL 32611, USA

Spectral Estimation is an Art
(Petre Stoica)

I hear, I forget;
I see, I remember;
I do, I understand.
(A Chinese philosopher)

What is Spectral Estimation?


From a finite record of a stationary data sequence, estimate how
the total power is distributed over frequencies, or, more practically,
over narrow spectral bands (frequency bins).

Spectral Estimation Methods:


Classical (Nonparametric) Methods
Ex. Pass the data through a set of band-pass filters and measure
the filter output powers.
Parametric (Modern) Approaches
Ex. Model the data as a sum of a few damped sinusoids and
estimate their parameters.
Trade-Offs (Robustness vs. Accuracy):
Parametric methods may offer better estimates if the data closely
agree with the assumed model.
Otherwise, nonparametric methods may be better.

Some Applications of Spectral Estimation


Speech
- Formant estimation (for speech recognition)
- Speech coding or compression
Radar and Sonar
- Source localization with sensor arrays
- Synthetic aperture radar imaging and feature extraction
Electromagnetics
- Resonant frequencies of a cavity
Communications
- Code-timing estimation in DS-CDMA systems

REVIEW OF DSP FUNDAMENTALS


Continuous-Time Signals
Periodic signals:
    x(t) = x(t + T_p).
Fourier Series:
    x(t) = \sum_{k=-\infty}^{\infty} c_k e^{j 2\pi k F_0 t},
    c_k = \frac{1}{T_p} \int_{T_p} x(t) e^{-j 2\pi k F_0 t} dt,
    F_0 = \frac{1}{T_p}.

Ex. FT pair:  e^{j\omega_0 t} \longleftrightarrow 2\pi \delta(\omega - \omega_0).
[Figure: line spectrum X(\omega) with impulses of weight 2\pi c_0, 2\pi c_1, ...
spaced 2\pi F_0 apart.]

Ex. Impulse train  s(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT):
    c_k = \frac{1}{T}  for all k,
    S(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(\omega - \frac{2\pi k}{T}\right).
[Figure: impulse train s(t) at spacing T and its line spectrum S(\omega) at
spacing 2\pi/T.]

Remark:  Periodic Signals <--> Discrete Spectra.

Discrete Signals
Sampling: multiplying x(t) by the impulse train s(t) gives a discrete
signal whose spectrum repeats periodically with period 2\pi/T.
[Figure: x(t), the sampled signal x(t)s(t), and the periodic spectrum.]

Remark:  Discrete Signals <--> Periodic Spectra.
         Discrete Periodic Signals <--> Periodic Discrete Spectra.

Aliasing Problem:
Ex. [Figure: spectral replicas overlap when the sampling rate is too low.]

* Fourier Transform (Continuous-Time vs. Discrete-Time)


Let y(t) = x(t) s(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT).

CTFT:
    Y(\omega) = \int_{-\infty}^{\infty} y(t) e^{-j\omega t} dt
              = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT)\, e^{-j\omega t} dt
              = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\omega n T}.

DTFT:
    Y(\omega) = \sum_{n=-\infty}^{\infty} x(nT) e^{-j\omega n T}.

[Figure: discrete-time signal x(nT) and its DTFT, periodic with period 2\pi/T.]

Remarks: The Discrete-Time Fourier Transform (DTFT) is the same as the
Continuous-Time Fourier Transform (CTFT) with x(nT)\,\delta(t - nT) replaced
by x(nT) and \int replaced by \sum (easy for computers).

For simplicity, we drop T.

[Figure: x(n) and its DTFT X(\omega), periodic with period 2\pi (or 1 in cycles).]

DTFT Pair:
    X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n},
    x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{j\omega n} d\omega.

Remark: For the DTFT, we also have:
    Discrete Periodic Signals <--DTFT--> Periodic Discrete Spectra.
[Figure: periodic x(n) and its discrete spectrum; note the aliasing.]

When x(n + N) = x(n),
DFT Pair:
    X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N},
    x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}.

Ex. Note the aliasing.
[Figure: periodic x(n) and its DFT X(k), shown for one period, indices -10..10.]

Remarks: For periodic sequences, the DFT and DTFT yield similar spectra.
The IDFT (inverse DFT) is the same as the IDTFT (inverse DTFT) with
X(2\pi k / N) replaced by X(k) and \int replaced by \sum (easy for computers).

Effects of Zero-Padding:
[Figure: x(n) and its DTFT X(\omega); the same x(n) zero-padded to 5 points
and to 10 points, with the corresponding DFTs X(k).]

Remark: The more zeroes padded, the closer X(k) is to X(\omega).
X(k) is a sampled version of X(\omega) for finite-duration sequences.
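As a quick numerical check of this remark, here is a minimal NumPy sketch
(the 3-point sequence and the padding lengths are illustrative, not from the
slides): each zero-padded DFT bin equals the DTFT sampled on a denser grid.

```python
import numpy as np

# A short finite-duration sequence (illustrative choice).
x = np.array([1.0, 1.0, 1.0])

# DTFT evaluated on a dense grid (near-continuous reference).
w = np.linspace(-np.pi, np.pi, 2049)
X_dtft = np.exp(-1j * np.outer(w, np.arange(len(x)))) @ x

# DFTs of the same sequence zero-padded to 5 and 10 points.
X5 = np.fft.fft(x, n=5)    # 5 samples of X(omega) at 2*pi*k/5
X10 = np.fft.fft(x, n=10)  # 10 samples of X(omega) at 2*pi*k/10

# Each DFT bin equals the DTFT at its grid frequency: more padding gives a
# denser sampling of the SAME X(omega); the resolution does not change.
k = np.arange(10)
assert np.allclose(
    X10,
    [np.sum(x * np.exp(-2j * np.pi * kk * np.arange(len(x)) / 10)) for kk in k])
```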

Z-Transform

    X(z) = \sum_{n=-\infty}^{\infty} x(n) z^{-n},
    x(n) = \frac{1}{2\pi j} \oint X(z) z^{n-1} dz.

For finite-duration x(n),
    X(z) = \sum_{n=0}^{N-1} x(n) z^{-n}.

The DFT X(k) is related to X(z) as follows:
    X(k) = X(z)\big|_{z = e^{j 2\pi k / N}}.
[Figure: z-plane; X(k) is X(z) evenly sampled on the unit circle.]

Linear Time-Invariant (LTI) Systems


N-th order difference equation:
    \sum_{k=0}^{N} a_k y(n-k) = \sum_{k=0}^{M} b_k x(n-k).

Impulse Response:
    h(n) = y(n)\big|_{x(n)=\delta(n)},
    H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}.

Bounded-Input Bounded-Output (BIBO) Stability:
All poles of H(z) are inside the unit circle for a causal system
(where h(n) = 0, n < 0).

FIR Filter:  N = 0.
IIR Filter:  N > 0.

Minimum Phase: All poles and zeroes of H(z) are inside the unit circle.

ENERGY AND POWER SPECTRAL DENSITIES


Energy Spectral Density of Deterministic Signals
Finite-energy signal if
    0 < \sum_{n=-\infty}^{\infty} |x(n)|^2 < \infty.

Let X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n}.

Parseval's Energy Theorem:
    \sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(\omega) d\omega,
    S(\omega) = |X(\omega)|^2.

Remark: |X(\omega)|^2 measures the length of the orthogonal projection of
{x(n)} onto the basis sequence {e^{-j\omega n}}, \omega \in [-\pi, \pi].

Let \rho(k) = \sum_{n=-\infty}^{\infty} x(n) x^*(n-k). Then
    \sum_{k=-\infty}^{\infty} \rho(k) e^{-j\omega k}
      = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x(n) x^*(n-k) e^{-j\omega n} e^{j\omega(n-k)}
      = \left[ \sum_{n=-\infty}^{\infty} x(n) e^{-j\omega n} \right]
        \left[ \sum_{s=-\infty}^{\infty} x(s) e^{-j\omega s} \right]^*
      = |X(\omega)|^2 = S(\omega).

Remark: S(\omega) is the DTFT of the autocorrelation of the finite-energy
sequence {x(n)}.

Power Spectral Density (PSD) of Random Signals


Let {x(n)} be a wide-sense stationary (WSS) sequence with E[x(n)] = 0.
    r(k) = E[x(n) x^*(n-k)].
Properties of the autocorrelation function r(k):
    r(k) = r^*(-k);
    r(0) \ge |r(k)| for all k;
    0 \le r(0) = average power of x(n).

Def: A is positive semidefinite if z^H A z \ge 0 for any z
(z^H = (z^T)^*: Hermitian transpose).

Let
    A = [ r(0)    r(k)
          r^*(k)  r(0) ]
      = E\left\{ [x(n); x(n-k)] [x^*(n)  x^*(n-k)] \right\}.
Obviously, A is positive semidefinite.
Then all eigenvalues of A are \ge 0, so the determinant of A is \ge 0:
    r^2(0) - |r(k)|^2 \ge 0.

Covariance matrix:

    R = [ r(0)      r(1)      ...  r(m-1)
          r^*(1)    r(0)      ...  r(m-2)
          ...       ...       ...  ...
          r^*(m-1)  r^*(m-2)  ...  r(0)   ].

It is easy to show that R is positive semidefinite.
R is also Toeplitz.
Since R = R^H, R is Hermitian.

Eigendecomposition of R:
    R = U \Lambda U^H,   where  U^H U = U U^H = I
(U is a unitary matrix whose columns are the eigenvectors of R),
    \Lambda = diag(\lambda_1, ..., \lambda_m)
(the \lambda_i are the eigenvalues of R: real and \ge 0).

First Definition of PSD:

    P(\omega) = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k},
    r(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) e^{j\omega k} d\omega.

Or, in cycles,
    P(f) = \sum_{k=-\infty}^{\infty} r(k) e^{-j 2\pi f k},
    r(k) = \int_{-1/2}^{1/2} P(f) e^{j 2\pi f k} df.

Remark: Since r(k) is discrete, P(\omega) and P(f) are periodic, with
period 2\pi (in \omega) and 1 (in f), respectively.
We usually consider \omega \in [-\pi, \pi] or f \in [-1/2, 1/2].

    r(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\omega) d\omega = average power over all frequencies.

[Figure: PSD; the area under P(\omega) between \omega_1 and \omega_2 is the
average power in that band.]

Second Definition of PSD:

    P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}.

This definition is equivalent to the first one under
    \lim_{N\to\infty} \frac{1}{N} \sum_{k=-N+1}^{N-1} |k|\, |r(k)| = 0
(which means that {r(k)} decays sufficiently fast).

Properties of PSD:
    P(\omega) \ge 0 for all \omega.
    For real x(n), r(k) = r(-k), so P(\omega) = P(-\omega), \omega \in [-\pi, \pi].
    For complex x(n), r(k) = r^*(-k).

PSD for LTI Systems
[Diagram: x(n) -> H(\omega) -> y(n).]
    P_y(\omega) = P_x(\omega) |H(\omega)|^2.

Complex (De)Modulation:
    y(n) = x(n) e^{j\omega_0 n}.
It is easy to show that
    r_y(k) = r_x(k) e^{j\omega_0 k},
    P_y(\omega) = P_x(\omega - \omega_0).

Spectral Estimation Problem


From a finite-length record {x(0), ..., x(N-1)}, determine an estimate
\hat{P}(\omega) of the PSD P(\omega) for \omega \in [-\pi, \pi].

Nonparametric Methods:
Periodogram:
Recall the second definition of PSD:
    P(\omega) = \lim_{N\to\infty} E\left\{ \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2 \right\}.

Periodogram:
    \hat{P}_p(\omega) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n) e^{-j\omega n} \right|^2.

Remark: \hat{P}_p(\omega) \ge 0 for all \omega.
If x(n) is real, \hat{P}_p(\omega) is even.
E[\hat{P}_p(\omega)] = ?  Var[\hat{P}_p(\omega)] = ?  (to be discussed later on)
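The periodogram is cheap to evaluate on the FFT grid. A minimal NumPy sketch
(the function name and the white-noise test signal are illustrative, not from
the slides):

```python
import numpy as np

def periodogram(x, nfft=None):
    """P_p(omega_k) = |sum_n x(n) e^{-j omega_k n}|^2 / N on the FFT grid
    omega_k = 2*pi*k/nfft; zero padding refines the grid only."""
    x = np.asarray(x)
    N = len(x)
    nfft = nfft or N
    X = np.fft.fft(x, n=nfft)      # zero-pads if nfft > N
    return (np.abs(X) ** 2) / N    # normalize by the data length N

# Example: white Gaussian noise; E[P_p] should be flat at sigma^2 = 1.
rng = np.random.default_rng(0)
Pp = periodogram(rng.standard_normal(256), nfft=1024)
```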

Correlogram (see the first PSD definition):

    Correlogram = \hat{P}_c(\omega) = \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k}.

Unbiased Estimate of r(k):
    k \ge 0:  \hat{r}(k) = \frac{1}{N-k} \sum_{i=k}^{N-1} x(i) x^*(i-k),
    k < 0:    \hat{r}(k) = \hat{r}^*(-k).

Ex. x(i) = 1 for i = 0, 1, 2  (N = 3):

    \hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1   (average of 3 points),
    \hat{r}(1) = \hat{r}(-1) = \frac{1}{2} \sum_{i=1}^{2} (1)(1) = 1   (average of 2 points),
    \hat{r}(2) = \hat{r}(-2) = \frac{1}{1} \sum_{i=2}^{2} (1)(1) = 1   (average of 1 point),
    \hat{r}(3) = \hat{r}(-3) = 0.

[Figure: flat \hat{r}(k) for k = -2, ..., 2 and the resulting \hat{P}_c(\omega).]

Remark:
\hat{r}(k) is a bad estimate of r(k) for large k.
E[\hat{r}(k)] = r(k)   (unbiased).
Proof:
    E[\hat{r}(k)] = E\left[ \frac{1}{N-k} \sum_{i=k}^{N-1} x(i) x^*(i-k) \right]
                  = \frac{1}{N-k} \sum_{i=k}^{N-1} r(k) = r(k).

\hat{P}_c(\omega) based on the unbiased \hat{r}(k) may be < 0.

Biased Estimate of r(k) (used more often!):

    k \ge 0:  \hat{r}(k) = \frac{1}{N} \sum_{i=k}^{N-1} x(i) x^*(i-k),
    k < 0:    \hat{r}(k) = \hat{r}^*(-k).

Remark:
    E[\hat{r}(k)] = \frac{1}{N} \sum_{i=k}^{N-1} E[x(i) x^*(i-k)]
                  = \frac{N-k}{N} r(k)
                  \to r(k)  as  N \to \infty   (asymptotically unbiased).
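A minimal NumPy sketch computing both estimates (function name mine); it
reproduces the two worked examples on these slides:

```python
import numpy as np

def acf_estimates(x, maxlag):
    """Biased and unbiased autocorrelation estimates for k = 0..maxlag."""
    x = np.asarray(x)
    N = len(x)
    r_b = np.empty(maxlag + 1)   # biased: divide by N
    r_u = np.empty(maxlag + 1)   # unbiased: divide by N - k
    for k in range(maxlag + 1):
        s = np.dot(x[k:], np.conj(x[:N - k])).real
        r_b[k] = s / N
        r_u[k] = s / (N - k)
    return r_b, r_u

# The slides' example: x = (1, 1, 1), N = 3.
r_b, r_u = acf_estimates(np.ones(3), 2)
print(r_b)  # approx [1, 2/3, 1/3]  (biased, triangular taper)
print(r_u)  # [1, 1, 1]             (unbiased)
```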

Ex. Same data: x(i) = 1 for i = 0, 1, 2  (N = 3):

    \hat{r}(0) = \frac{1}{3} \sum_{i=0}^{2} (1)(1) = 1,
    \hat{r}(1) = \hat{r}(-1) = \frac{1}{3} \sum_{i=1}^{2} (1)(1) = \frac{2}{3},
    \hat{r}(2) = \hat{r}(-2) = \frac{1}{3} \sum_{i=2}^{2} (1)(1) = \frac{1}{3}.

[Figure: triangular \hat{r}(k) for k = -2, ..., 2 and its DTFT \hat{P}_c(\omega).]

Remark:
With the biased \hat{r}(k),  \hat{P}_c(\omega) = \hat{P}_p(\omega) \ge 0 for all \omega.
E[\hat{r}(k)] \ne r(k), but E[\hat{r}(k)] \to r(k) as N \to \infty
(asymptotically unbiased).

    \hat{R} = [ \hat{r}(0)      \hat{r}(1)      ...  \hat{r}(N-1)
                \hat{r}^*(1)    \hat{r}(0)      ...  \hat{r}(N-2)
                ...             ...             ...  ...
                \hat{r}^*(N-1)  \hat{r}^*(N-2)  ...  \hat{r}(0)   ],

with \hat{r}(k) the biased estimate. Then \hat{R} is positive semidefinite.

General Comments on \hat{P}_p(\omega) and \hat{P}_c(\omega)


\hat{P}_p(\omega) and \hat{P}_c(\omega) provide POOR estimates of P(\omega):
the variances of \hat{P}_p(\omega) and \hat{P}_c(\omega) are high.
Reason: \hat{P}_p(\omega) and \hat{P}_c(\omega) come from a single
realization of a random process.

Compute \hat{P}_p(\omega) via FFT.
Recall the DFT (N^2 complex multiplications):
    X(k) = \sum_{i=0}^{N-1} x(i) e^{-j \frac{2\pi}{N} k i},
    \hat{P}_p(\omega_k) = \frac{1}{N} |X(k)|^2.

Let
    W = e^{-j \frac{2\pi}{N}},   N = 2^m.
Then
    X(k) = \sum_{n=0}^{N-1} x(n) W^{kn}
         = \sum_{n=0}^{N/2-1} x(n) W^{kn} + \sum_{n=N/2}^{N-1} x(n) W^{kn}
         = \sum_{n=0}^{N/2-1} \left[ x(n) + W^{Nk/2}\, x\!\left(n + \frac{N}{2}\right) \right] W^{kn}.

Note:
    W^{Nk/2} = e^{-j \frac{2\pi}{N} \frac{Nk}{2}} = e^{-j\pi k}
             = \begin{cases} 1, & \text{even } k \\ -1, & \text{odd } k. \end{cases}

Hence
    X(2p)   = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W^{2pn},   k = 2p = 0, 2, ...,
    X(2p+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x(n + N/2) \right] W^{(2p+1)n},   k = 2p+1,

which requires 2 (N/2)^2 complex multiplications.

This process is continued until 2-point transforms remain.

Remark: An N = 2^m-point FFT requires O(N \log_2 N) complex
multiplications.
Zero padding may be used so that N = 2^m.
Zero padding will not change the resolution of \hat{P}_p(\omega).

FUNDAMENTALS OF ESTIMATION THEORY


Properties of a Good Estimator \hat{a} for a constant scalar a:
Small Bias:
    Bias = E[\hat{a}] - a.
Small Variance:
    Variance = E\{ (\hat{a} - E[\hat{a}])^2 \}.
Consistent:
    \hat{a} \to a as the number of measurements \to \infty.

Ex. Measurement
    y = a + e,
where a is an unknown constant and e is N(0, \sigma^2).
Find \hat{a} from y ?
[Figure: pdf f(y|a), centered at a.]

Maximum Likelihood (ML) Estimate of a:


Say y = 5; we want to find \hat{a} so that it is most likely that the
measurement is 5:
    \frac{\partial f(y|a)}{\partial a}\Big|_{a=\hat{a}_{ML}} = 0
    \Rightarrow \hat{a}_{ML} = y   (here \hat{a}_{ML} = 5),

    E[\hat{a}_{ML}] = E[y] = E[a + e] = a,
    Var[\hat{a}_{ML}] = Var[y] = \sigma^2.

Ex. y = a + e.
Three independent measurements y_1, y_2, y_3 are taken.
\hat{a}_{ML} = ?  Bias = ?  Variance = ?

    f(y_i|a) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i-a)^2}{2\sigma^2}},
    f(y_1, y_2, y_3|a) = \prod_{i=1}^{3} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y_i-a)^2}{2\sigma^2}}.

Setting \frac{\partial f(y_1,y_2,y_3|a)}{\partial a}\Big|_{a=\hat{a}_{ML}} = 0 gives

    \hat{a}_{ML} = \frac{1}{3}(y_1 + y_2 + y_3),
    E[\hat{a}_{ML}] = E\left[\frac{1}{3}(y_1+y_2+y_3)\right] = a,
    Var[\hat{a}_{ML}] = \frac{1}{9} Var(y_1+y_2+y_3)
                      = \frac{1}{9}(\sigma^2+\sigma^2+\sigma^2) = \frac{\sigma^2}{3}.

Ex. x is a measurement of a uniformly distributed random variable on
[0, \theta], where \theta is an unknown constant. \hat{\theta}_{ML} = ?

    f(x|\theta) = 1/\theta for 0 \le x \le \theta, so the likelihood is
    maximized by the smallest admissible \theta:
    \hat{\theta}_{ML} = x.

Question: What if two independent measurements x_1 and x_2 are taken?

    \hat{\theta}_{ML} = \max(x_1, x_2).

Cramér-Rao Bound.
Let B(a) = E[\hat{a}(r)|a] - a denote the bias of \hat{a}(r), where r is the
measurement. Then

    MSE = E\left[ (\hat{a}(r) - a)^2 \,\big|\, a \right]
        \ge \frac{\left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2}
                 {E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \big| a \right\}}.

* The denominator of the CRB is known as Fisher's Information, I(a).
* If B(a) = 0, the numerator of the CRB is 1.

Proof:
    B(a) = E[\hat{a}(r) - a \,|\, a] = \int [\hat{a}(r) - a]\, f(r|a)\, dr.

Differentiating with respect to a:
    \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a]\, \frac{\partial f(r|a)}{\partial a}\, dr
        - \underbrace{\int f(r|a)\, dr}_{=1},

    1 + \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a]\, \frac{\partial f(r|a)}{\partial a}\, \frac{1}{f(r|a)}\, f(r|a)\, dr.

But \frac{\partial}{\partial a} \ln f(r|a) = \frac{\partial f(r|a)/\partial a}{f(r|a)}, so

    1 + \frac{\partial}{\partial a} B(a)
      = \int [\hat{a}(r) - a] \left[ \frac{\partial}{\partial a} \ln f(r|a) \right] f(r|a)\, dr
      = \int \left\{ [\hat{a}(r) - a] \sqrt{f(r|a)} \right\}
             \left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right] \sqrt{f(r|a)} \right\} dr.

Schwarz Inequality:
    \left| \int g_1(x) g_2(x)\, dx \right|
      \le \left[ \int g_1^2(x)\, dx \right]^{1/2} \left[ \int g_2^2(x)\, dx \right]^{1/2},
where "=" holds iff g_1(x) = c\, g_2(x) for some constant c (c is
independent of x).

Applying it to the previous expression:
    \left[ 1 + \frac{\partial}{\partial a} B(a) \right]^2
      \le \left\{ \int [\hat{a}(r) - a]^2 f(r|a)\, dr \right\}
          \underbrace{\left\{ \int \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 f(r|a)\, dr \right\}}_{I(a)},
where "=" holds iff
    \hat{a}(r) - a = c\, \frac{\partial}{\partial a} \ln f(r|a)
(where c is a constant independent of r).

Efficient Estimate:
An estimate is efficient if
(a) it is unbiased, and
(b) it achieves the CR bound, i.e., E\{ [\hat{a}(r) - a]^2 | a \} = CRB.

Ex. r = a + e, where a is an unknown constant, e ~ N(0, \sigma^2).
\hat{a}_{ML} = ?  Efficient ?

    f(r|a) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(r-a)^2},
    \ln f(r|a) = \ln \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2}(r-a)^2,
    \frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{2\sigma^2}\, 2(r-a) = -\frac{1}{\sigma^2}(a-r).

Setting \frac{\partial}{\partial a} \ln f(r|a)\big|_{a=\hat{a}_{ML}} = 0 gives
    \hat{a}_{ML} = r.

    \frac{\partial}{\partial a} \ln f(r|a) = \frac{1}{\sigma^2} (\hat{a}_{ML} - a),
which has the form c\,[\hat{a}(r) - a], so
    E[(\hat{a}_{ML} - a)^2 | a] = CRB  \Rightarrow  \hat{a}_{ML} is efficient;
    E[\hat{a}_{ML}] = E[r] = a   (unbiased).

Remark: MSE = Var[\hat{a}_{ML}] = Var[r] = \sigma^2.
    I(a) = E\left\{ \left[ \frac{\partial}{\partial a} \ln f(r|a) \right]^2 \big| a \right\}
         = E\left\{ \frac{1}{\sigma^4} (a-r)^2 \right\} = \frac{1}{\sigma^2},
    CRB = \frac{1}{I(a)} = \sigma^2 = Var[\hat{a}_{ML}].

Remarks:
(1) If \hat{a}(r) is unbiased, Var[\hat{a}(r)] \ge CRB.
(2) If an efficient estimate \hat{a}(r) exists, i.e.,
    \frac{\partial}{\partial a} \ln f(r|a) = c\,[\hat{a}(r) - a]   (c independent of r),
then setting 0 = \frac{\partial}{\partial a} \ln f(r|a)\big|_{a=\hat{a}_{ML}(r)}
results in \hat{a}_{ML}(r) = \hat{a}(r).
If an efficient estimate exists, it is \hat{a}_{ML}.
(3) If an efficient estimate does not exist, how good \hat{a}_{ML}(r) is
depends on each specific problem.
No estimator can achieve the CR bound in that case; bounds larger than the
CR bound (for example, Bhattacharyya, Barankin) may be found.

Independent measurements r_1, ..., r_N are available, where the r_i may or
may not be Gaussian. Assume
    \hat{a}_{ML} = \frac{1}{N} \sum_{i=1}^{N} r_i.
Law of large numbers: \hat{a}_{ML} \to a.
Central Limit Theorem: \hat{a}_{ML} has a Gaussian distribution as N \to \infty.

Asymptotic Properties of \hat{a}_{ML}(r_1, ..., r_N):
(a) \hat{a}_{ML}(r_1, ..., r_N) \to a as N \to \infty
    (\hat{a}_{ML} is a consistent estimate).
(b) \hat{a}_{ML} is asymptotically efficient.
(c) \hat{a}_{ML} is asymptotically Gaussian.

Ex. r = g^{-1}(a) + e,  e ~ N(0, \sigma^2).  \hat{a}_{ML} = ?  Efficient ?
Let b = g^{-1}(a); then a = g(b), and
    \frac{\partial}{\partial a} \ln f(r|a)
      = \frac{1}{\sigma^2} \left[ r - g^{-1}(a) \right] \frac{d\, g^{-1}(a)}{da}\Big|_{a=\hat{a}_{ML}} = 0
    \Rightarrow  \hat{a}_{ML} = g(r) = g(\hat{b}_{ML}).

Invariance property of the ML estimator:
If a = g(b), then \hat{a}_{ML} = g(\hat{b}_{ML}).
\hat{a}_{ML} may not be efficient: \hat{a}_{ML} is not efficient if g(\cdot)
is a nonlinear function.

PROPERTIES OF PERIODOGRAM
Bias Analysis
When \hat{r}(k) is the biased estimate,
    E[\hat{P}_p(\omega)] = E[\hat{P}_c(\omega)]
      = E\left[ \sum_{k=-(N-1)}^{N-1} \hat{r}(k) e^{-j\omega k} \right].

    k \ge 0:  E[\hat{r}(k)] = \frac{N-k}{N} r(k),
    k < 0:    E[\hat{r}(k)] = E[\hat{r}^*(-k)] = \frac{N+k}{N} r(k) = \frac{N-|k|}{N} r(k),

so
    E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k) e^{-j\omega k}.

Bartlett or Triangular Window.

    w_B(k) = 1 - \frac{|k|}{N},  |k| \le N-1  (zero otherwise),

    E[\hat{P}_p(\omega)] = \sum_{k=-\infty}^{\infty} [w_B(k)\, r(k)] e^{-j\omega k}.

Let w_B(k) <--DTFT--> W_B(\omega). Then
    E[\hat{P}_p(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_B(\omega - \psi)\, d\psi.

When \hat{r}(k) is the unbiased estimate,
    E[\hat{P}_c(\omega)] = \frac{1}{2\pi} \int_{-\pi}^{\pi} P(\psi)\, W_R(\omega - \psi)\, d\psi,
where w_R(k) is the rectangular window of length 2N-1 and
w_R(k) <--DTFT--> W_R(\omega).

[Figure: w_R(k); convolving the true P(\omega) with W_{B,R}(\omega) smears it
into E[\hat{P}(\omega)].]

[Figure: W_B(\omega), showing the main lobe and the side lobes.]

3 dB power width of the main lobe \approx \frac{2\pi}{N}  (or \frac{1}{N} in Hz).

Remark: The main lobe of W_B(\omega) smears or smooths P(\omega):
two peaks in P(\omega) that are separated by less than \frac{2\pi}{N} cannot
be resolved in \hat{P}_p(\omega).

\frac{1}{N} in Hz is called the spectral resolution limit of periodogram
methods.

Remark:
The side lobes of W_B(\omega) transfer power from high-power frequency bins
to low-power frequency bins: leakage.
Smearing and leakage cause more problems for peaky P(\omega) than for flat
P(\omega).
If P(\omega) = \sigma^2 for all \omega, then E[\hat{P}_p(\omega)] = P(\omega).
The bias of \hat{P}_p(\omega) decreases as N \to \infty (asymptotically unbiased).

Variance Analysis
We shall consider the case where x(n) is zero-mean circularly symmetric
complex Gaussian white noise:

    E[x(n) x^*(k)] = \sigma^2 \delta(n-k),
    E[x(n) x(k)] = 0 for all n, k,

which is equivalent to:

    E[Re(x(n))\, Re(x(k))] = \frac{\sigma^2}{2} \delta(n-k),
    E[Im(x(n))\, Im(x(k))] = \frac{\sigma^2}{2} \delta(n-k),
    E[Re(x(n))\, Im(x(k))] = 0.

Remark: The real and imaginary parts of x(n) are N(0, \frac{\sigma^2}{2})
and independent of each other.

Remark: If x(n) is zero-mean complex Gaussian white noise, \hat{P}_p(\omega)
is an unbiased estimate:
    r(k) = \sigma^2 \delta(k),
    E[\hat{P}_p(\omega)] = \sum_{k=-(N-1)}^{N-1} \left( 1 - \frac{|k|}{N} \right) r(k) e^{-j\omega k}
      = \sigma^2 = \sum_{k=-\infty}^{\infty} r(k) e^{-j\omega k} = P(\omega).

For Gaussian complex white noise,
    E[x(k) x^*(l) x(m) x^*(n)]
      = \sigma^4 [\delta(k-l)\delta(m-n) + \delta(k-n)\delta(l-m)].

    E[\hat{P}_p(\omega_1)\hat{P}_p(\omega_2)]
      = \frac{1}{N^2} \sum_{k=0}^{N-1}\sum_{l=0}^{N-1}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}
        E[x(k)x^*(l)x(m)x^*(n)]\, e^{-j\omega_1(k-l)} e^{-j\omega_2(m-n)}
      = \sigma^4 + \frac{\sigma^4}{N^2} \left| \sum_{k=0}^{N-1} e^{j(\omega_1-\omega_2)k} \right|^2
      = \sigma^4 + \frac{\sigma^4}{N^2}
        \left\{ \frac{\sin[(\omega_1-\omega_2)\frac{N}{2}]}{\sin[\frac{\omega_1-\omega_2}{2}]} \right\}^2.

Hence
    \lim_{N\to\infty} E[\hat{P}_p(\omega_1)\hat{P}_p(\omega_2)]
      = P(\omega_1)P(\omega_2) + P^2(\omega_1)\,\delta_{\omega_1,\omega_2},
    \lim_{N\to\infty} E\{ [\hat{P}_p(\omega_1)-P(\omega_1)][\hat{P}_p(\omega_2)-P(\omega_2)] \}
      = \begin{cases} P^2(\omega_1), & \omega_1 = \omega_2 \\
                      0, & \omega_1 \ne \omega_2 \end{cases}
    (uncorrelated if \omega_1 \ne \omega_2).

Remark: \hat{P}_p(\omega) is not a consistent estimate.
If \omega_1 \ne \omega_2, \hat{P}_p(\omega_1) and \hat{P}_p(\omega_2) are
uncorrelated with each other.
This variance result is also true for
    y(n) = \sum_{k=0}^{\infty} h(k) x(n-k),
where x(n) is zero-mean complex Gaussian white noise.
[Diagram: x(n) -> h(n) -> y(n).]

REFINED METHODS
Decrease the variance of \hat{P}(\omega) by increasing bias or decreasing
resolution.

Blackman-Tukey (BT) Method
Remark: The \hat{r}(k) used in \hat{P}_c(\omega) is a poor estimate for
large lags k. For M < N:
    \hat{P}_{BT}(\omega) = \sum_{k=-(M-1)}^{M-1} w(k)\, \hat{r}(k)\, e^{-j\omega k},
where w(k) is called the lag window.

Remark: If w(k) is rectangular, w(k)\hat{r}(k) is a truncated version of
\hat{r}(k).
If \hat{r}(k) is the biased estimate, and w(k) <--DTFT--> W(\omega), then
    \hat{P}_{BT}(\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega - \psi)\, \hat{P}_p(\psi)\, d\psi.

Remark: The BT spectral estimator is a locally weighted average of the
periodogram \hat{P}_p(\omega).
The smaller the M, the poorer the resolution of \hat{P}_{BT}(\omega), but
the lower the variance:
    Resolution of \hat{P}_{BT}(\omega) \sim \frac{1}{M},
    Variance of \hat{P}_{BT}(\omega) \sim \frac{M}{N}.
For fixed M, \hat{P}_{BT}(\omega) is asymptotically biased, but its variance
\to 0 as N \to \infty.

Question: When is \hat{P}_{BT}(\omega) \ge 0 ?
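A minimal NumPy sketch of the BT estimator with a Bartlett lag window (so
W(\omega) \ge 0 and the estimate stays nonnegative); the function name and
grid size are mine:

```python
import numpy as np

def blackman_tukey(x, M, nfft=1024):
    """BT estimate: window the biased ACF to lags |k| < M, then take its
    DTFT on an nfft-point grid."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Biased ACF estimates for k = 0..M-1.
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(M)])
    w = 1.0 - np.arange(M) / M                     # Bartlett lag window
    rw = np.concatenate([(r * w)[:0:-1], r * w])   # lags -(M-1)..M-1
    lags = np.arange(-(M - 1), M)
    k = np.arange(nfft)
    # DTFT of the windowed ACF evaluated at omega_k = 2*pi*k/nfft.
    P = np.exp(-2j * np.pi * np.outer(k, lags) / nfft) @ rw
    return P.real
```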

Theorem: Let Y(\omega) <--DTFT--> y(n),  -(N-1) \le n \le N-1.
Then Y(\omega) \ge 0 iff the (infinite) Toeplitz matrix

    [ y(0)       y(1)  ...  y(N-1)  0       ...
      y(-1)      y(0)  ...  y(N-2)  y(N-1)  ...
      ...
      y[-(N-1)]  ...        y(0)    y(1)    ...
      ...                                   ... ]

is positive semidefinite.
In other words, Y(\omega) \ge 0 iff
    ..., 0, ..., 0, y[-(N-1)], ..., y(0), y(1), ..., y(N-1), 0, ...
is a positive semidefinite sequence.

Remark: \hat{P}_{BT}(\omega) \ge 0 iff {w(k)\hat{r}(k)} is a positive
semidefinite sequence, i.e., iff

    \hat{R}_{BT} = [ w(0)\hat{r}(0)              ...  w(M-1)\hat{r}(M-1)
                     ...                         ...  ...
                     w[-(M-1)]\hat{r}[-(M-1)]    ...  w(0)\hat{r}(0)     ]

is positive semidefinite, i.e., \hat{R}_{BT} \ge 0.

\hat{R}_{BT} can be written as a Hadamard (elementwise) matrix product:
    \hat{R}_{BT} = W \odot \hat{R},
where the (ij)-th element is (A \odot B)_{ij} = A_{ij} B_{ij}.

Theorem: If A \ge 0 (positive semidefinite) and B \ge 0, then A \odot B \ge 0.

Remark: If \hat{r}(k) is the biased estimate, \hat{P}_p(\omega) \ge 0. Then
if W(\omega) \ge 0, we have \hat{P}_{BT}(\omega) \ge 0.
Remark: Nonnegative definite (positive semidefinite) window sequences:
Bartlett, Parzen.

Time-Bandwidth Product
Equivalent Time Width N_e:
    N_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{w(0)}.

Ex. Rectangular window w_R(n):
    N_e = \frac{\sum_{k=-(M-1)}^{M-1} 1}{1} = 2M - 1.

Ex. Bartlett window:
    w_B(n) = \begin{cases} 1 - \frac{|n|}{M}, & -(M-1) \le n \le M-1 \\
                           0, & \text{else,} \end{cases}
    N_e = M.
[Figure: w_R(n) and w_B(n) over n = -(M-1), ..., M-1.]

Equivalent Bandwidth \beta_e:
    2\pi \beta_e = \frac{\int_{-\pi}^{\pi} W(\omega)\, d\omega}{W(0)}.

Since w(n) <--DTFT--> W(\omega):
    w(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega) e^{j\omega n}\, d\omega
    \Rightarrow  w(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\omega)\, d\omega,
    W(\omega) = \sum_{n=-(M-1)}^{M-1} w(n) e^{-j\omega n}
    \Rightarrow  W(0) = \sum_{n=-(M-1)}^{M-1} w(n).

Hence
    N_e \beta_e = \frac{\sum_{n=-(M-1)}^{M-1} w(n)}{\frac{1}{2\pi}\int_{-\pi}^{\pi} W(\omega)\,d\omega}
                  \cdot \frac{\int_{-\pi}^{\pi} W(\omega)\, d\omega}{2\pi \sum_{n=-(M-1)}^{M-1} w(n)} = 1

(Time-Bandwidth product).

Remark:
If a signal decays slowly in one domain, it is more concentrated in the
other domain.
The window shape determines the side lobe level relative to W(0).

Ex:
    x(2n) \longleftrightarrow \frac{1}{2} X\!\left(\frac{\omega}{2}\right)
(for the bandlimited case sketched).
[Figure: x(n) with DTFT X(\omega); the compressed x(2n) has a spectrum that
is half as tall and twice as wide.]

Remark: Once the window shape is fixed, M \sim N_e \sim \frac{1}{\beta_e}:
increasing M shrinks the main lobe width.

Window Design for \hat{P}_{BT}(\omega)


Let \beta_m = 3 dB main lobe width. Then
    Resolution of \hat{P}_{BT}(\omega) \sim \beta_m,
    Variance of \hat{P}_{BT}(\omega) \sim \frac{1}{\beta_m N}.

The choice of \beta_m is based on the trade-off between resolution and
variance, and on N.
The choice of window shape is based on leakage, and on N.
Practical rules of thumb:
1. M \le \frac{N}{10}.
2. Window shape based on the trade-off between smearing and leakage.
3. Window shape chosen so that \hat{P}_{BT}(\omega) \ge 0 for all \omega.

Remark: Other methods for nonparametric spectral estimation include the
Bartlett, Welch, and Daniell methods.
All try to reduce variance at the expense of poorer resolution.

Bartlett Method
Split the data into non-overlapping segments:
    x(n):  | x_1(n) | x_2(n) | ... | x_L(n) |

x(n) is an N-point sequence; the x_l(n), l = 1, ..., L, are M-point,
non-overlapping sequences, L = N/M.

    \hat{P}_l(\omega) = \frac{1}{M} \left| \sum_{n=0}^{M-1} x_l(n) e^{-j\omega n} \right|^2,
    \hat{P}_B(\omega) = \frac{1}{L} \sum_{l=1}^{L} \hat{P}_l(\omega).

Remark:
\hat{P}_B(\omega) \ge 0 for all \omega.
For large M and L, \hat{P}_B(\omega) \approx [\hat{P}_{BT}(\omega) using w_R(n)].
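A minimal NumPy sketch of segment-averaging (function name and defaults are
mine):

```python
import numpy as np

def bartlett_psd(x, M, nfft=None):
    """Average the periodograms of L = N // M non-overlapping M-point
    segments (Bartlett's method)."""
    x = np.asarray(x)
    L = len(x) // M
    nfft = nfft or M
    P = np.zeros(nfft)
    for l in range(L):
        seg = x[l * M:(l + 1) * M]
        P += np.abs(np.fft.fft(seg, n=nfft)) ** 2 / M
    return P / L
```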

Welch Method:
x_l(n) may overlap in the Welch method.
x_l(n) may be windowed before computing the periodogram.
[Figure: overlapping segments x_1(n), x_2(n), ..., x_S(n).]

Let w(n) be the window applied to x_l(n), l = 1, ..., S, n = 0, ..., M-1,
and let
    P = power of w(n) = \frac{1}{M} \sum_{n=0}^{M-1} |w(n)|^2.

    \hat{P}_l(\omega) = \frac{1}{MP} \left| \sum_{n=0}^{M-1} w(n) x_l(n) e^{-j\omega n} \right|^2,
    \hat{P}_W(\omega) = \frac{1}{S} \sum_{l=1}^{S} \hat{P}_l(\omega).

Remarks: By allowing the x_l(n) to overlap, we hope to have a larger S,
the number of \hat{P}_l(\omega) we average; 50% overlap is used in general.
Practical examples show that \hat{P}_W(\omega) may offer lower variance
than \hat{P}_B(\omega), but not significantly.
\hat{P}_W(\omega) may be shown to be a \hat{P}_{BT}(\omega)-type estimator,
under reasonable approximations.
\hat{P}_W(\omega) can be easily computed with the FFT, so it is favored in
practice; \hat{P}_{BT}(\omega) is theoretically favored.
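For reference, SciPy's signal module implements this estimator directly; a
sketch (the segment length, overlap, and test signal are illustrative):

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

# M = 256-sample Hann-windowed segments with 50% overlap, averaged.
f, Pw = welch(x, fs=1.0, window='hann', nperseg=256, noverlap=128)
```

Setting `noverlap=0` and `window='boxcar'` reduces this to Bartlett's method.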

Daniell Method:
    \hat{P}_D(\omega) = \frac{1}{2\beta} \int_{\omega-\beta}^{\omega+\beta} \hat{P}_p(\psi)\, d\psi.

Remark: \hat{P}_D(\omega) is a special case of \hat{P}_{BT}(\omega) with
    w(n) <--DTFT--> W(\omega) = \begin{cases} \frac{\pi}{\beta}, & \omega \in [-\beta, \beta] \\
                                              0, & \text{else.} \end{cases}
The larger the \beta, the lower the variance, but the poorer the resolution.

Implementation of \hat{P}_D(\omega):
Zero pad x(n) so that it has \tilde{N} points, \tilde{N} >> N.
Calculate \hat{P}_p(\omega_k) with the FFT,
    \omega_k = \frac{2\pi}{\tilde{N}} k,   k = 0, ..., \tilde{N}-1,
then average over 2J+1 adjacent points:
    \hat{P}_D(\omega_k) = \frac{1}{2J+1} \sum_{j=k-J}^{k+J} \hat{P}_p(\omega_j).

PARAMETRIC METHODS
Parametric Modeling
Ex.
    P(f) = \frac{r(0)}{\sqrt{2\pi}\,\sigma_f} e^{-\frac{f^2}{2\sigma_f^2}},   |f| \le \frac{1}{2}.
[Figure: bell-shaped P(f).]

Remark: P(f) is described by 2 unknowns: r(0) and \sigma_f.
Once we know r(0) and \sigma_f, we know P(f), the PSD.
Nonparametric methods assume no knowledge of P(f): too many unknowns.
Parametric methods attempt to estimate r(0) and \sigma_f.

Parsimony Principle:
Better estimates may be obtained by using an appropriate data model with
fewer unknowns.
Appropriate Data Model:
If the data model is wrong, \hat{P}(f) will always be biased.
[Figure: a biased estimate vs. the true PSD.]

To use parametric methods, reasonably correct a priori knowledge of the
data model is necessary.

Rational Spectra:
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2,
    A(\omega) = 1 + a_1 e^{-j\omega} + ... + a_p e^{-jp\omega},
    B(\omega) = 1 + b_1 e^{-j\omega} + ... + b_q e^{-jq\omega}.

Remark: We mostly consider real-valued signals here:
a_1, ..., a_p, b_1, ..., b_q are real coefficients.
Any continuous PSD can be approximated arbitrarily closely by a rational
PSD.

[Diagram: u(n) -> H(\omega) = B(\omega)/A(\omega) -> x(n);
u(n) = zero-mean white noise of variance \sigma^2.]

Remark:
    P_{xx}(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2.
A rational spectrum can thus be associated with a signal obtained by
filtering white noise of power \sigma^2 through a rational filter with
H(\omega) = B(\omega)/A(\omega).

In Difference Equation Form:
    x(n) = -\sum_{k=1}^{p} a_k x(n-k) + \sum_{k=0}^{q} b_k u(n-k).

In Z-Transform Form (z = e^{j\omega}):
    H(z) = \frac{B(z)}{A(z)},
    A(z) = 1 + a_1 z^{-1} + ... + a_p z^{-p},
    B(z) = 1 + b_1 z^{-1} + ... + b_q z^{-q}.

Notation sometimes used:  z^{-1} x(n) = x(n-1)  (unit delay).
Then:  x(n) = \frac{B(z)}{A(z)} u(n).

ARMA Model, ARMA(p,q):
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2.
AR Model, AR(p):
    P(\omega) = \sigma^2 \left| \frac{1}{A(\omega)} \right|^2.
MA Model, MA(q):
    P(\omega) = \sigma^2 |B(\omega)|^2.

Remark: AR models peaky PSDs better.
MA models valley-shaped PSDs better.
ARMA is used for PSDs with both peaks and valleys.

Spectral Factorization:
    H(\omega) = \frac{B(\omega)}{A(\omega)},
    P(\omega) = \sigma^2 \left| \frac{B(\omega)}{A(\omega)} \right|^2
              = \sigma^2 \frac{B(\omega) B^*(\omega)}{A(\omega) A^*(\omega)}.

With real coefficients b_1, ..., b_q, a_1, ..., a_p:
    A^*(\omega) = 1 + a_1 e^{j\omega} + ... + a_p e^{jp\omega}
                = 1 + a_1 \frac{1}{z} + ... + a_p \frac{1}{z^p} = A\!\left(\frac{1}{z}\right)
(on the unit circle), so
    P(z) = \sigma^2 \frac{B(z)\, B(1/z)}{A(z)\, A(1/z)}.

Remark: If a_1, ..., a_p, b_1, ..., b_q are complex,
    P(z) = \sigma^2 \frac{B(z)\, B^*(1/z^*)}{A(z)\, A^*(1/z^*)}.

Consider
    P(z) = \sigma^2 \frac{B(z)\, B(1/z)}{A(z)\, A(1/z)}.

Remark: If \alpha is a zero of P(z), so is \frac{1}{\alpha};
if \alpha is a pole of P(z), so is \frac{1}{\alpha}.
Since a_1, ..., a_p, b_1, ..., b_q are real, the poles and zeroes of P(z)
also occur in complex conjugate pairs.
[Figure: z-plane with pole/zero pairs mirrored in the unit circle.]

Remark:
If the poles of \frac{1}{A(z)} are inside the unit circle, then
H(z) = \frac{B(z)}{A(z)} is BIBO stable.
If the zeroes of B(z) are inside the unit circle, then H(z) = \frac{B(z)}{A(z)}
is minimum phase.
We choose H(z) so that both its zeroes and poles are inside the unit circle.
[Diagram: u(n) -> H(z) = B(z)/A(z) -> x(n); a stable, minimum-phase system.]

Relationships Among Models


An MA(q) or ARMA(p,q) model is equivalent to an AR(\infty) model.
An AR(p) or ARMA(p,q) model is equivalent to an MA(\infty) model.
Ex:
    H(z) = \frac{1 + 0.9 z^{-1}}{1 + 0.8 z^{-1}}   (ARMA(1,1))
         = \frac{1}{(1 + 0.8 z^{-1}) \frac{1}{1 + 0.9 z^{-1}}}
         = \frac{1}{(1 + 0.8 z^{-1})(1 - 0.9 z^{-1} + 0.81 z^{-2} - \cdots)}
         = AR(\infty).

Remark: Let ARMA(p,q) = \frac{B(z)}{A(z)} = \frac{1}{C(z)} = AR(\infty).
From a_1, ..., a_p, b_1, ..., b_q we can find c_1, c_2, ..., and vice versa.

Since \frac{B(z)}{A(z)} = \frac{1}{C(z)}, we have B(z) C(z) = A(z):

    (1 + b_1 z^{-1} + ... + b_q z^{-q})(1 + c_1 z^{-1} + c_2 z^{-2} + \cdots)
        = 1 + a_1 z^{-1} + ... + a_p z^{-p}.

Matching the coefficients of z^{-1}, ..., z^{-p} gives (with c_0 = 1)

    a_k = c_k + \sum_{i=1}^{\min(k,q)} b_i\, c_{k-i},   k = 1, ..., p.   (*)

Matching the coefficients of z^{-(p+1)}, ..., z^{-(p+q)}, which must all
vanish, gives

    [ c_p        c_{p-1}  ...  c_{p-q+1}      [ b_1         [ c_{p+1}
      c_{p+1}    c_p      ...  c_{p-q+2}        b_2           c_{p+2}
      ...                                 ]  ·  ...      = -   ...
      c_{p+q-1}  ...           c_p       ]      b_q ]         c_{p+q} ].   (**)

Remark: Once b_1, ..., b_q are computed with (**), a_1, ..., a_p can be
computed with (*).

Computing Coefficients from r(k)


AR signals. Let
    \frac{1}{A(z)} = 1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots,
so
    x(n) = \frac{1}{A(z)} u(n) = u(n) + \alpha_1 u(n-1) + \cdots,
and hence
    E[x(n) u(n)] = \sigma^2,
    E[x(n-k) u(n)] = 0,   k \ge 1.

Since A(z) x(n) = u(n):
    x(n) + a_1 x(n-1) + ... + a_p x(n-p) = u(n),
    [x(n)  x(n-1)  ...  x(n-p)] [1; a_1; ...; a_p] = u(n).

k = 0: multiply by x(n) and take expectations:
    [r(0)  r(1)  ...  r(p)] [1; a_1; ...; a_p] = \sigma^2.   (*)

k \ge 1: multiply by x(n-k):
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p] = 0.   (**)

Stacking (*) and (**) for k = 0, 1, ..., p:

    [ r(0)  r(1)    ...  r(p)
      r(1)  r(0)    ...  r(p-1)
      ...
      r(p)  r(p-1)  ...  r(0)  ] [ 1; a_1; ...; a_p ] = [ \sigma^2; 0; ...; 0 ],

or, keeping only the last p equations (Ra = -r):

    [ r(0)    ...  r(p-1)      [ a_1         [ r(1)
      ...                 ]  ·   ...     = -    ...
      r(p-1)  ...  r(0)  ]       a_p ]         r(p) ].

Remarks:
When we only have N samples, {r(k)} is not available; {\hat{r}(k)} may be
used to replace {r(k)} to obtain \hat{a}_1, ..., \hat{a}_p.
This is the Yule-Walker Method.
R is a positive semidefinite matrix; R is positive definite unless x(n) is
a sum of fewer than p/2 sinusoids.
R is Toeplitz.
The Levinson-Durbin algorithm is used to solve for a efficiently.
AR models are the most frequently used models in practice.
Estimation of AR parameters is a well-established topic.

Remarks:
If {\hat{r}(k)} is a positive definite sequence and a_1, ..., a_p are found
by solving Ra = -r, then the roots of the polynomial
1 + a_1 z^{-1} + ... + a_p z^{-p} are inside the unit circle:
the AR system thus obtained is BIBO stable.
The biased estimate {\hat{r}(k)} should be used in the YW equation to
obtain a stable AR system.
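A minimal NumPy sketch of the Yule-Walker method and the resulting AR PSD
(function names mine; a direct solve is used here instead of
Levinson-Durbin for brevity):

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients a_1..a_p and sigma^2 from the biased
    ACF estimates (the Yule-Walker method)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    a = np.linalg.solve(R, -r[1:p + 1])
    sigma2 = r[0] + np.dot(a, r[1:p + 1])   # equation (*) above
    return a, sigma2

def ar_psd(a, sigma2, nfft=1024):
    """AR spectral estimate P(omega) = sigma^2 / |A(omega)|^2."""
    A = np.fft.fft(np.concatenate(([1.0], a)), n=nfft)
    return sigma2 / np.abs(A) ** 2
```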

Efficient Methods for Solving
    Ra = -r   or   \hat{R}\hat{a} = -\hat{r}:

Levinson-Durbin Algorithm.
Delsarte-Genin Algorithm.
Gohberg-Semencul Formula for R^{-1} or \hat{R}^{-1}
(sometimes, we may be interested not only in a but also in R^{-1}).

Levinson-Durbin Algorithm (LDA)


Let (real signal)
    R_{n+1} = [ r(0)  r(1)    ...  r(n)
                r(1)  r(0)    ...  r(n-1)
                ...
                r(n)  r(n-1)  ...  r(0)  ],   n = 1, 2, ..., p,

    \theta_n = [ a_{n,1}; ...; a_{n,n} ].

LDA solves
    R_{n+1} [1; \theta_n] = [\sigma_n^2; 0; ...; 0]
recursively in n, starting from n = 1.

Remark: For n = 1, 2, ..., p, LDA needs ~ p^2 flops;
regular matrix inverses need ~ p^4 flops.

Let A be symmetric and Toeplitz, A_{ij} = a_{|i-j|}. Let
    b = [b_1; ...; b_n],   \tilde{b} = [b_n; b_{n-1}; ...; b_1]
(the reversed vector). Then if c = A b,  \tilde{c} = A \tilde{b}.

Proof:
    (\tilde{c})_i = c_{n-i+1} = \sum_{k=1}^{n} A_{n-i+1,k}\, b_k
                  = \sum_{k=1}^{n} a_{|n-i+1-k|}\, b_k
                  = \sum_{m=1}^{n} a_{|m-i|}\, b_{n-m+1}
                  = \sum_{m=1}^{n} A_{i,m} (\tilde{b})_m
                  = (A\tilde{b})_i     (with m = n - k + 1).

Consider:
    R_{n+2} [1; \theta_n; 0] = [\sigma_n^2; 0; \alpha_n],
where
    r_n = [r(1); ...; r(n)],
    \alpha_n = r(n+1) + \theta_n^T \tilde{r}_n.

Result: Let
    k_{n+1} = -\frac{\alpha_n}{\sigma_n^2}.
Then
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1],
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2).

Proof:
    R_{n+2} [1; \theta_{n+1}]
      = R_{n+2} [1; \theta_n; 0] + k_{n+1} R_{n+2} [0; \tilde{\theta}_n; 1]
      = [\sigma_n^2; 0; \alpha_n] + k_{n+1} [\alpha_n; 0; \sigma_n^2]
      = [\sigma_n^2 + k_{n+1}\alpha_n;\; 0;\; \alpha_n + k_{n+1}\sigma_n^2]
      = [\sigma_{n+1}^2; 0; 0].

LDA Initialization (n = 1):
    R_2 = [ r(0)  r(1)
            r(1)  r(0) ],
    \theta_1 = -\frac{r(1)}{r(0)} = k_1,
    \sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)}.   (O(1) flops)

For n = 1, 2, ..., p-1, do:
    k_{n+1} = -\frac{r(n+1) + \theta_n^T \tilde{r}_n}{\sigma_n^2}    (n flops)
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2)                      (O(1) flops)
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1].    (n flops)
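A minimal NumPy sketch of this recursion (variable names are mine); its test
case matches the worked example below:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations recursively given r(0..p).
    Returns theta_p = (a_1..a_p), sigma_p^2, and the reflection
    coefficients k_1..k_p."""
    theta = np.array([-r[1] / r[0]])     # theta_1
    sigma2 = r[0] - r[1] ** 2 / r[0]     # sigma_1^2
    k = [theta[0]]                       # k_1 = theta_1
    for n in range(1, p):
        alpha = r[n + 1] + theta @ r[1:n + 1][::-1]   # theta_n^T r~_n
        kn = -alpha / sigma2
        theta = (np.concatenate([theta, [0.0]])
                 + kn * np.concatenate([theta[::-1], [1.0]]))
        sigma2 *= (1.0 - kn ** 2)
        k.append(kn)
    return theta, sigma2, np.array(k)

# Matches the worked example below: r = (1, rho, rho^2).
rho = 0.9
theta, s2, k = levinson_durbin(np.array([1.0, rho, rho ** 2]), 2)
print(theta, s2)   # approx [-0.9, 0.], 0.19
```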

Ex: r(0) = 1, r(1) = \rho, r(2) = \rho^2   (an AR(1)-type correlation).

Straightforward solution of R_3 [1; \theta_2] = [\sigma_2^2; 0; 0]:
    \theta_2 = [a_1; a_2] = [-\rho; 0],   \sigma_2^2 = 1 - \rho^2.

LDA: Initialization:
    \theta_1 = -\frac{r(1)}{r(0)} = -\rho = k_1,
    \sigma_1^2 = r(0) - \frac{r^2(1)}{r(0)} = 1 - \rho^2.

Recursion (\tilde{r}_1 = \rho):
    k_2 = -\frac{r(2) + \theta_1^T \tilde{r}_1}{\sigma_1^2}
        = -\frac{\rho^2 + (-\rho)(\rho)}{1-\rho^2} = 0,
    \sigma_2^2 = \sigma_1^2 (1 - k_2^2) = (1-\rho^2)(1 - 0^2) = 1 - \rho^2,
    \theta_2 = [\theta_1; 0] + k_2 [\tilde{\theta}_1; 1] = [-\rho; 0].

Properties of LDA:
    |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0
iff
    A_n(z) = 1 + a_{n,1} z^{-1} + ... + a_{n,n} z^{-n} = 0
has all its roots inside the unit circle.

    |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0
iff
    R_{n+1} > 0,   n = 1, 2, ..., p.

Proof (for the second property above only): We first use induction to
prove

    U_{n+1}^T R_{n+1} U_{n+1} = D_{n+1},   (***)

where
    U_{n+1}^T = [ 1  a_{n,1}  ...       a_{n,n}
                  0  1        ...       a_{n-1,n-1}
                  ...
                  0  ...      1         a_{1,1}
                  0  ...      0         1          ],
    D_{n+1} = diag(\sigma_n^2, \sigma_{n-1}^2, ..., \sigma_1^2, r(0)).

n = 1:
    [ 1  a_{1,1}     [ r(0)  r(1)     [ 1        0
      0  1       ] ·   r(1)  r(0) ] ·   a_{1,1}  1 ]
    = [ \sigma_1^2  0
        0           r(0) ].

Suppose (***) is true for n = k-1, i.e., U_k^T R_k U_k = D_k.
Consider n = k. Partition
    R_{k+1} = [ r(0)         \tilde{r}_k^T
                \tilde{r}_k   R_k          ],
    U_{k+1} = [ 1                 0
                \tilde{\theta}_k  U_k ].
By the persymmetry lemma, R_{k+1} [1; \theta_k] = [\sigma_k^2; 0] implies
    r(0) + \tilde{r}_k^T \tilde{\theta}_k = \sigma_k^2,
    \tilde{r}_k + R_k \tilde{\theta}_k = 0.
Hence
    U_{k+1}^T R_{k+1} U_{k+1}
      = [ 1  \tilde{\theta}_k^T     [ r(0)         \tilde{r}_k^T     [ 1                 0
          0  U_k^T             ] ·    \tilde{r}_k   R_k          ] ·   \tilde{\theta}_k  U_k ]
      = [ \sigma_k^2  0
          0           U_k^T R_k U_k ]
      = [ \sigma_k^2  0
          0           D_k ] = D_{k+1}.

So U_{n+1}^T R_{n+1} U_{n+1} = D_{n+1} is proven!   (***)

Since U_{n+1}^{-1} R_{n+1}^{-1} U_{n+1}^{-T} = D_{n+1}^{-1},
    R_{n+1}^{-1} = U_{n+1} D_{n+1}^{-1} U_{n+1}^T.
U_{n+1} D_{n+1}^{-1/2} is called the Cholesky factor of R_{n+1}^{-1}.

Consider the determinant of R_{n+1}:
    det(R_{n+1}) = det(D_{n+1}) = r(0) \prod_{k=1}^{n} \sigma_k^2,
    det(R_{n+1}) = \sigma_n^2 \det(R_n).

Hence
    R_{n+1} > 0,   n = 1, 2, ..., p
iff
    r(0) > 0  and  \sigma_k^2 > 0,   k = 1, 2, ..., p.

Recall
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2).

If R_{n+1} > 0, n = 1, 2, ..., p, then r(0) > 0 and \sigma_n^2 > 0, and
    k_{n+1}^2 = \frac{\sigma_n^2 - \sigma_{n+1}^2}{\sigma_n^2}.
Since 0 < \sigma_{n+1}^2 < \sigma_n^2, we get k_{n+1}^2 < 1, i.e., |k_{n+1}| < 1.

Conversely, if |k_n| < 1, n = 1, 2, ..., p, and r(0) > 0, then
    \sigma_0^2 = r(0) > 0,
    \sigma_{n+1}^2 = \sigma_n^2 (1 - k_{n+1}^2) > 0,   n = 1, 2, ..., p-1,
so R_{n+1} > 0.

MA Signals:
    x(n) = B(z) u(n) = u(n) + b_1 u(n-1) + ... + b_q u(n-q),

    r(k) = E[x(n) x(n-k)]
         = E\{ [u(n) + ... + b_q u(n-q)] [u(n-k) + ... + b_q u(n-q-k)] \}.

    |k| > q:         r(k) = 0,
    0 \le k \le q:   r(k) = \sigma^2 \sum_{l=0}^{q-k} b_l b_{l+k},
    -q < k < 0:      r(k) = r(-k),
with b_0 = 1 and b_1, ..., b_q real.

    P(\omega) = \sum_{k=-q}^{q} r(k) e^{-j\omega k}.

Remarks: Estimating b_1, ..., b_q is a nonlinear problem.
A simple estimator is
    \hat{P}(\omega) = \sum_{k=-q}^{q} \hat{r}(k) e^{-j\omega k}.
* This is exactly the Blackman-Tukey method with a rectangular window of
length 2q+1.
* No matter whether \hat{r}(k) is the biased or unbiased estimate, this
\hat{P}(\omega) may be < 0.
* When the unbiased \hat{r}(k) is used, \hat{P}(\omega) is unbiased.
* To ensure \hat{P}(\omega) \ge 0 for all \omega, we may use the biased
\hat{r}(k) and a window with W(\omega) \ge 0 for all \omega; for this case,
\hat{P}(\omega) is biased. This is again exactly the BT method.
The most-used MA spectral estimator is based on a Two-Stage Least Squares
method; see the discussion of ARMA later.

ARMA Signals (also called the Pole-Zero Model):
    (1 + a_1 z^{-1} + ... + a_p z^{-p}) x(n) = (1 + b_1 z^{-1} + ... + b_q z^{-q}) u(n).
Let us write x(n) as MA(\infty):
    x(n) = u(n) + h_1 u(n-1) + h_2 u(n-2) + ...,
    E[x(n) u(n)] = \sigma^2,
    E[u(n) x(n-k)] = 0,   k \ge 1.
The ARMA model can be written as
    [x(n)  x(n-1)  ...  x(n-p)] [1; a_1; ...; a_p]
      = [u(n)  u(n-1)  ...  u(n-q)] [1; b_1; ...; b_q].

Next we shall multiply both sides by x(n-k) and take E\{\cdot\}.

k = 0, 1, ..., q (with h_0 = 1):
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p]
      = \sigma^2 \sum_{l=k}^{q} b_l\, h_{l-k};
e.g.,
    k = 0:  [r(0)  r(1)  ...  r(p)] [1; a_1; ...; a_p]
              = \sigma^2 [1  h_1  ...  h_q] [1; b_1; ...; b_q],
    k = 1:  [r(1)  r(0)  ...  r(p-1)] [1; a_1; ...; a_p]
              = \sigma^2 (b_1 + b_2 h_1 + ... + b_q h_{q-1}).

k \ge q+1:
    [r(k)  r(k-1)  ...  r(k-p)] [1; a_1; ...; a_p] = 0,
e.g.,
    [ r(q+1)  r(q)    ...  r(q+1-p)
      r(q+2)  r(q+1)  ...  r(q+2-p)
      ...                           ] [1; a_1; ...; a_p] = 0.

This is the modified YW equation.

To solve for a_1, ..., a_p, we need p equations. Using r(-k) = r(k) gives

    [ r(q)      r(q-1)    ...  r(q-p+1)      [ a_1         [ r(q+1)
      r(q+1)    r(q)      ...  r(q-p+2)        a_2           r(q+2)
      ...                                ]  ·  ...      = -   ...
      r(q+p-1)  r(q+p-2)  ...  r(q)     ]      a_p ]         r(q+p) ].

Remarks:
(1) Replacing r(k) by \hat{r}(k) above, we can solve for
\hat{a}_1, ..., \hat{a}_p.
(2) The matrix on the left side
    is nonsingular under mild conditions;
    is Toeplitz;
    is NOT symmetric.
    Levinson-type fast algorithms exist.

What about the MA part of the ARMA PSD?


Let
    y(n) = (1 + b_1 z^{-1} + ... + b_q z^{-q}) u(n).
The ARMA model becomes
    (1 + a_1 z^{-1} + ... + a_p z^{-p}) x(n) = y(n).
[Diagram: y(n) -> 1/A(z) -> x(n); equivalently x(n) -> A(z) -> y(n).]
Hence
    P_x(\omega) = \left| \frac{1}{A(\omega)} \right|^2 P_y(\omega).

Let \gamma_k be the autocorrelation function of y(n). Then (see MA signals)
    P_y(\omega) = \sum_{k=-q}^{q} \gamma_k e^{-j\omega k},
    \gamma_k = E[y(n) y(n-k)] = E[A(z)x(n) \cdot A(z)x(n-k)]
             = E\left[ \sum_{i=0}^{p} a_i x(n-i) \sum_{j=0}^{p} a_j x(n-j-k) \right]
             = \sum_{i=0}^{p} \sum_{j=0}^{p} a_i a_j\, r(k+j-i)   (a_0 = 1).

Since \hat{a}_1, ..., \hat{a}_p may be computed with the modified YW method,
    \hat{\gamma}_k = \sum_{i=0}^{p} \sum_{j=0}^{p} \hat{a}_i \hat{a}_j\, \hat{r}(k+j-i),
    k = 0, 1, ..., q,   \hat{\gamma}_{-k} = \hat{\gamma}_k.

ARMA PSD Estimate:

    \hat{P}(\omega) = \frac{\sum_{k=-q}^{q} \hat{\gamma}_k e^{-j\omega k}}{|\hat{A}(\omega)|^2}.

Remarks:
This method is called the modified YW ARMA spectral estimator.
\hat{P}(\omega) is not guaranteed to be \ge 0 for all \omega, due to the MA
part.
The AR estimates \hat{a}_1, ..., \hat{a}_p have reasonable accuracy if the
ARMA poles and zeroes are well inside the unit circle.
Very poor estimates \hat{a}_1, ..., \hat{a}_p occur when the ARMA poles and
zeroes are closely spaced and near the unit circle (this is the narrowband
signal case).

Ex: Consider
    x(n) = \cos(\omega_1 n + \varphi_1) + \cos(\omega_2 n + \varphi_2),
where \varphi_1 and \varphi_2 are independent and uniformly distributed on
[0, 2\pi]. Then
    r(k) = \frac{1}{2} \cos(\omega_1 k) + \frac{1}{2} \cos(\omega_2 k).
[Figure: \cos(\omega_1 k) and \cos(\omega_2 k) vs. k.]
Note that when \omega_1 \approx \omega_2, large values of k are needed to
distinguish \cos(\omega_1 k) from \cos(\omega_2 k).
Remark: This comment is true for both AR and ARMA models.

Overdetermined Modified Yule-Walker Equation (M > p):

    [ r(q)        ...  r(q-p+1)       [ a_1         [ r(q+1)
      ...                               ...           ...
      r(q+p-1)    ...  r(q)        ] ·  ...      = -  r(q+p)
      ...                               ...           ...
      r(q+M-1)    ...  r(q+M-p)   ]     a_p ]         r(q+M) ].

Remarks:
The overdetermined linear equations may be solved with Least Squares or
Total Least Squares methods.
M should be chosen based on the trade-off between the information
contained in the large lags of r(k) and the accuracy of \hat{r}(k).
Overdetermined YW equations may also be obtained for AR signals.

Solving Linear Equations:


Consider A_{m \times n}\, x_{n \times 1} = b_{m \times 1}.
When m = n and A is full rank, x = A^{-1} b.
When m > n and A has full rank n, a solution exists only if b lies in the
n-dimensional subspace of the m-dimensional space spanned by the columns
of A.
Ex:
    A = [1; 1].
    If b = [3; 3],  x = 3.
    If b = [1; 0],  x does not exist!

Least Squares (LS) Solution for Overdetermined Equations:
Objective of the LS solution:
Let e = Ax - b. Find x_{LS} so that e^H e is minimized.
Let e = [e_1; e_2; ...; e_m]; the squared Euclidean norm is
    e^H e = |e_1|^2 + |e_2|^2 + ... + |e_m|^2.
[Figure: error components e_1, e_2.]

Remarks:  A x_{LS} = b + e_{LS}.
We see that x_{LS} is found by perturbing b so that a solution exists.

    e^H e = (Ax - b)^H (Ax - b)
          = x^H A^H A x - x^H A^H b - b^H A x + b^H b
          = \left[ x - (A^H A)^{-1} A^H b \right]^H (A^H A)
            \left[ x - (A^H A)^{-1} A^H b \right]
            + \left[ b^H b - b^H A (A^H A)^{-1} A^H b \right].

Remark: The 2nd term above is independent of x, so e^H e is minimized by
    x_{LS} = (A^H A)^{-1} A^H b.   (LS solution)

Illustration of the LS solution:
Let A = [a_1 | a_2] and x_{LS} = [x_1; x_2].
[Figure: b is projected onto the plane spanned by a_1 and a_2;
A x_{LS} = x_1 a_1 + x_2 a_2 is the orthogonal projection of b.]

Ex:
    A = [1; 0],   b = [1; 1],   x_{LS} = ?

    x_{LS} = (A^H A)^{-1} A^H b
           = \left( [1\ 0][1; 0] \right)^{-1} [1\ 0][1; 1] = 1,
    A x_{LS} = [1; 0],
    e_{LS} = A x_{LS} - b = [0; -1].

Computational Aspects of LS
Solving the Normal Equations:
    (A^H A)\, x_{LS} = A^H b.   (1)
This equation is called the Normal equation. Let
    A^H A = C,   A^H b = g,   C x_{LS} = g,
where C is positive definite.

Cholesky Decomposition:
    C = L D L^H,
where
    L = [ 1       0       ...  0
          l_{21}  1       ...  0
          ...
          l_{n1}  l_{n2}  ...  1 ]   (lower triangular matrix),
    D = diag(d_1, ..., d_n),   d_i > 0.

Back-Substitution to solve
    L D L^H x_{LS} = g:
Let y = D L^H x_{LS}. Then L y = g:
    y_1 = g_1,
    y_k = g_k - \sum_{j=1}^{k-1} l_{kj}\, y_j,   k = 2, ..., n.
Then L^H x_{LS} = D^{-1} y = [y_1/d_1; ...; y_n/d_n]:
    x_n = \frac{y_n}{d_n},
    x_k = \frac{y_k}{d_k} - \sum_{j=k+1}^{n} l_{jk}^*\, x_j,   k = n-1, ..., 1.

Remark: Solving the Normal equations may be sensitive to numerical errors.

Ex.
    [ 3  3               [ x_1       [ 1
      4  4+\epsilon ]  ·   x_2 ]  =    1 ],   Ax = b,
where \epsilon is a small number.
Exact solution:
    x_1 = \frac{1}{3} + \frac{1}{3\epsilon},   x_2 = -\frac{1}{3\epsilon}.

Assume that, due to truncation errors, \epsilon^2 is rounded to 0. Then
    A^T A = [ 25            25 + 4\epsilon
              25 + 4\epsilon  25 + 8\epsilon ],   A^T b = [7; 7+\epsilon],
whose determinant, -16\epsilon^2, has been rounded away.

Solution to the Normal equation (note the big difference!): the computed
x = (A^T A)^{-1} A^T b is completely unreliable, since the truncated
A^T A is numerically singular.

QR Method (numerically more robust):
    Ax = b.
Using Householder transformations, we can find an orthonormal matrix Q
(i.e., Q Q^H = I) such that
    Q A x = [ T
              0 ] x = Q b = [ z_1
                              z_2 ],
where T is a square, upper triangular matrix, and
    \min e^H e = z_2^H z_2,
    T x_{LS} = z_1   (solved by back-substitution).

Ex. Same system as above:
    [ 3  3               [ x_1       [ 1                    1  [ 3   4
      4  4+\epsilon ]  ·   x_2 ]  =    1 ],   Q = \frac{-}{5}    4  -3 ].
QAx = Qb gives
    [ 5  5 + \frac{4}{5}\epsilon        [ x_1                 [ 7
      0  -\frac{3}{5}\epsilon     ]  ·    x_2 ]  = \frac{1}{5}   1 ],
and back-substitution recovers x_2 = -\frac{1}{3\epsilon},
x_1 = \frac{1}{3} + \frac{1}{3\epsilon}  (the same as the exact solution).

Remark: For a large number of overdetermined equations, the QR method
needs about twice as much computation as solving the Normal equation in (1).
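A minimal NumPy sketch contrasting the two approaches on this example (the
value of epsilon is illustrative):

```python
import numpy as np

eps = 1e-8
A = np.array([[3.0, 3.0],
              [4.0, 4.0 + eps]])
b = np.array([1.0, 1.0])

# Normal equations: forming A^T A squares the condition number.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR: triangularize A itself, then back-substitute.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))  # cond(A^T A) = cond(A)^2
print(x_normal, x_qr)   # the QR solution is far closer to the exact one
```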

Total Least Squares (TLS) solution to Ax = b:


Recall that x_{LS} is obtained by perturbing b only, i.e.,
    A x_{LS} = b + e_{LS},   e_{LS}^H e_{LS} = min.
x_{TLS} is obtained by perturbing both A and b, i.e.,
    (A + E_{TLS})\, x_{TLS} = b + e_{TLS},
    ||[E_{TLS}\ \ e_{TLS}]||_F = minimum,
where ||\cdot||_F is the Frobenius matrix norm,
    ||G||_F^2 = \sum_i \sum_j |g_{ij}|^2,   g_{ij} = (ij)-th element of G.

Illustration of the TLS solution:
[Figure: the fitted straight line minimizes the sum of squared shortest
(orthogonal) distances between the line and the points.]

Let C = [A\ \ b], and let the singular value decomposition (SVD) of C be
    C = U \Sigma V^H.
Remarks: The columns of U are the eigenvectors of C C^H;
the columns of V are the eigenvectors of C^H C.
Both U and V are unitary matrices, i.e.,
    U U^H = U^H U = I,   V V^H = V^H V = I.
\Sigma is diagonal, with real diagonal elements (the singular values)
    \sigma_1 \ge \sigma_2 \ge ... \ge \sigma_{n+1} \ge 0
(\sigma_i^2 are the eigenvalues of C^H C).

Partition
    V = [ V_{11}  V_{12}
          V_{21}  V_{22} ],
where V_{12} is n x 1 and V_{22} is 1 x 1. Then
    x_{TLS} = -V_{12} V_{22}^{-1}.

Remarks:
At low SNR, TLS may be better than LS.
At high SNR, TLS and LS yield similar results.
Markov Estimate:
If the statistics of e = Ax - b are known, they may be used to obtain a
better solution to Ax = b.

ARMA Signals:
Two-Stage Least Squares Method
Step 1: Approximate ARMA(p,q) with AR(L) for a large L.
The YW equation may be used to estimate \hat{a}_1, \hat{a}_2, ..., \hat{a}_L:
    \hat{u}(n) = x(n) + \hat{a}_1 x(n-1) + ... + \hat{a}_L x(n-L),
    \hat{\sigma}^2 = \frac{1}{N-L} \sum_{n=L+1}^{N} \hat{u}^2(n).

Step 2: System Identification
[Diagram: u(n) -> B(z)/A(z) -> x(n).]
Let
    x = [x(0); x(1); ...; x(N-1)],
    \hat{u} = [\hat{u}(0); \hat{u}(1); ...; \hat{u}(N-1)],
    \theta = [a_1; a_2; ...; a_p; b_1; ...; b_q],
    H = [ -x(-1)   ...  -x(-p)     \hat{u}(-1)   ...  \hat{u}(-q)
          -x(0)    ...  -x(1-p)    \hat{u}(0)    ...  \hat{u}(1-q)
          ...
          -x(N-2)  ...  -x(N-p-1)  \hat{u}(N-2)  ...  \hat{u}(N-q-1) ].

LS Solution (real signals):
    x = H\theta + \hat{u},
    \hat{\theta} = (H^T H)^{-1} H^T (x - \hat{u}).

Remarks:
Any elements in H that are unknown are set to zero.
The QR method may be used to solve the LS problem.

Step 3:
    \hat{P}(\omega) = \hat{\sigma}^2\,
      \frac{\left| 1 + \hat{b}_1 e^{-j\omega} + ... + \hat{b}_q e^{-jq\omega} \right|^2}
           {\left| 1 + \hat{a}_1 e^{-j\omega} + ... + \hat{a}_p e^{-jp\omega} \right|^2}.

Remark: The difficult case for this method is when the ARMA zeroes are
near the unit circle.

Further Topics on AR Signals:


Linear Prediction of AR Processes
Forward Linear Prediction:
[Figure: past samples x(n-1), x(n-2), ... are used to predict x(n);
e^f(n) = x(n) - \hat{x}^f(n).]

    \hat{x}^f(n) = -\sum_{i=1}^{m} a_i^f x(n-i),
    e^f(n) = x(n) - \hat{x}^f(n).

Goal: minimize
    \sigma_f^2 = E\{ [e^f(n)]^2 \}
      = E\left\{ \left[ x(n) + \sum_{i=1}^{m} a_i^f x(n-i) \right]^2 \right\}
      = r_{xx}(0) + 2\sum_{i=1}^{m} a_i^f r_{xx}(i)
        + \sum_{i=1}^{m}\sum_{j=1}^{m} a_i^f a_j^f\, r_{xx}(j-i).

Setting \frac{\partial \sigma_f^2}{\partial a_i^f} = 0:
    r_{xx}(i) + \sum_{j=1}^{m} a_j^f\, r_{xx}(j-i) = 0,   i = 1, ..., m,

i.e.,
    [ r_{xx}(0)  r_{xx}(1)    ...  r_{xx}(m)        [ 1           [ \sigma_f^2
      r_{xx}(1)  r_{xx}(0)    ...  r_{xx}(m-1)        a_1^f         0
      ...                                      ]  ·   ...       =   ...
      r_{xx}(m)  r_{xx}(m-1)  ...  r_{xx}(0)  ]       a_m^f ]       0        ].

Remarks: This is exactly the YW equation.
\sigma_f^2 decreases as m increases, leveling off at m = p.
[Figure: \sigma_f^2 vs. m, flat beyond m = p.]

Backward Linear Prediction
[Figure: future samples x(n+1), x(n+2), ... are used to predict x(n).]

    \hat{x}^b(n) = -\sum_{i=1}^{m} a_i^b x(n+i),
    e^b(n) = x(n-m) - \hat{x}^b(n-m),
    \sigma_b^2 = E\{ [e^b(n)]^2 \}.

To minimize \sigma_b^2, we obtain

    [ r_{xx}(0)  ...  r_{xx}(m)        [ 1           [ \sigma_b^2
      ...                       ]  ·     a_1^b    =    0
      r_{xx}(m)  ...  r_{xx}(0) ]        ...           ...
                                         a_m^b ]       0        ],
so (for real data)
    a_i^f = a_i^b  for all i,   \sigma_f^2 = \sigma_b^2.

Consider an AR(p) model and the notation in the LDA. For m = 1, 2, ..., p,
let

    e_m^f(n) = x(n) + \sum_{i=1}^{m} a_{m,i} x(n-i)
             = [x(n)  x(n-1)  ...  x(n-m)] [1; \theta_m],

    e_m^b(n) = x(n-m) + \sum_{i=1}^{m} a_{m,i} x(n-m+i)
             = [x(n-m)  x(n-m+1)  ...  x(n)] [1; \theta_m]
             = [x(n)  ...  x(n-m+1)  x(n-m)] [\tilde{\theta}_m; 1].

Recall the LDA:
    \theta_m = [\theta_{m-1}; 0] + k_m [\tilde{\theta}_{m-1}; 1],
so
    e_m^f(n) = [x(n)  x(n-1)  ...  x(n-m)] [1; \theta_{m-1}; 0]
               + k_m [x(n)  x(n-1)  ...  x(n-m)] [0; \tilde{\theta}_{m-1}; 1]
             = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1).

Similarly,
    e_m^f(n) = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1),
    e_m^b(n) = e_{m-1}^b(n-1) + k_m\, e_{m-1}^f(n).

Lattice Filter for Linear Prediction Errors
[Figure: lattice filter; stage m combines e_{m-1}^f(n) and the delayed
e_{m-1}^b(n-1) through the reflection coefficient k_m to produce e_m^f(n)
and e_m^b(n).]

Remarks: The implementation advantage of lattice filters is that they
suffer from less round-off noise and are less sensitive to coefficient
errors.
If x(n) is AR(p) and m = p, then
    x(n) -> 1 + a_1 z^{-1} + ... + a_p z^{-p} -> u(n)
is a whitening filter.

AR Spectral Estimation Methods


Autocorrelation or Yule-Walker Method: Recall that the YW equation may be
obtained by minimizing
    E\{ e^2(n) \} = E\{ [x(n) - \hat{x}(n)]^2 \},
where
    \hat{x}(n) = -\sum_{k=1}^{p} a_k x(n-k).
The autocorrelation or YW method replaces r(k) in the YW equation with the
biased estimate \hat{r}(k):

    [ \hat{r}(0)    ...  \hat{r}(p-1)      [ \hat{a}_1         [ \hat{r}(1)
      ...                            ]  ·    ...          = -    ...
      \hat{r}(p-1)  ...  \hat{r}(0) ]        \hat{a}_p ]         \hat{r}(p) ].

Covariance or Prony Method


Consider the AR(p) signal
    x(n) = -\sum_{k=1}^{p} a_k x(n-k) + u(n),   n = 0, 1, ..., N-1.
In matrix form,

    [ x(p)         [ x(p-1)  x(p-2)  ...  x(0)           [ a_1       [ u(p)
      x(p+1)   = -   x(p)    x(p+1)  ...  x(1)        ] ·  ...    +    u(p+1)
      ...            ...                                   a_p ]       ...
      x(N-1) ]       x(N-2)  ...          x(N-p-1) ]                   u(N-1) ].

The Prony method finds the LS solution to the overdetermined equation

    [ x(p-1)  ...  x(0)            [ a_1            [ x(p)
      ...                   ]   ·    ...      ~= -    ...
      x(N-2)  ...  x(N-p-1) ]        a_p ]           x(N-1) ].

Remarks:
The Covariance or Prony method minimizes
    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} \hat{u}^2(n)
      = \frac{1}{N-p} \sum_{n=p}^{N-1}
        \left[ x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right]^2.
The Autocorrelation or YW method minimizes
    \hat{\sigma}^2 = \frac{1}{N} \sum_{n=-\infty}^{\infty}
        \left[ x(n) + \sum_{k=1}^{p} \hat{a}_k x(n-k) \right]^2,
where those x(n) that are NOT available are set to zero.
For large N, the YW and Prony methods yield similar results.
For small N, the YW method gives poor performance. The Prony method can
give good estimates \hat{a}_1, ..., \hat{a}_p for small N; the Prony method
gives exact estimates for x(n) = a sum of sinusoids.
Since the biased \hat{r}(k) are used in the YW method, the estimated poles
are inside the unit circle. The Prony method does not guarantee stability.
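A minimal NumPy sketch of the covariance (Prony) estimate (function name is
mine):

```python
import numpy as np

def prony_ar(x, p):
    """Covariance (Prony) AR estimate: LS fit of x(n) on its p past
    values over n = p..N-1 (no zero padding of the data)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Columns are x(n-1), ..., x(n-p) for n = p..N-1.
    H = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])
    a = np.linalg.lstsq(H, -x[p:N], rcond=None)[0]
    u = x[p:N] + H @ a                 # prediction errors
    sigma2 = np.dot(u, u) / (N - p)
    return a, sigma2
```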

Modified Covariance or Forward-Backward (F/B) Method


Recall backward linear prediction:
    x(n) = -\sum_{k=1}^{p} a_k^b x(n+k) + e^b(n).
For real data and real AR coefficients, a_k^f = a_k^b = a_k, k = 1, ..., p:

    [ x(1)    x(2)  ...  x(p)          [ a_1            [ x(0)
      x(2)    x(3)  ...  x(p+1)    ] ·   ...      ~= -    x(1)
      ...                                a_p ]            ...
      x(N-p)  ...        x(N-1)   ]                       x(N-p-1) ].

In the F/B method, this backward prediction equation is combined with the
forward prediction equation and the LS solution is found:

    [ x(p-1)  ...  x(0)                           [ x(p)
      ...                                           ...
      x(N-2)  ...  x(N-p-1)      [ a_1              x(N-1)
      x(1)    ...  x(p)       ] ·  ...      ~=  -   x(0)
      ...                          a_p ]            ...
      x(N-p)  ...  x(N-1)   ]                       x(N-p-1) ].

Remark: The F/B method does not guarantee poles inside the unit circle;
in practice, the poles are usually inside the unit circle.

For complex data and a complex model,
    a_k^f = (a_k^b)^* = a_k,   k = 1, ..., p.
Then F/B solves:

    [ x(p-1)     ...  x(0)                          [ x(p)
      ...                                             ...
      x(N-2)     ...  x(N-p-1)     [ a_1              x(N-1)
      x^*(1)     ...  x^*(p)    ] ·  ...      ~= -    x^*(0)
      ...                            a_p ]            ...
      x^*(N-p)   ...  x^*(N-1) ]                      x^*(N-p-1) ].

Remarks on \hat{\sigma}^2:
In the YW method,
    \hat{\sigma}^2 = \hat{r}(0) + \sum_{k=1}^{p} \hat{a}_k \hat{r}(k).
In the Prony method, with
    e_{LS} = [e(p); ...; e(N-1)]:
    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1} |e(n)|^2.
In the F/B method, with
    e_{LS} = [e^f(p); ...; e^f(N-1); e^b(0); ...; e^b(N-p-1)]:
    \hat{\sigma}^2 = \frac{1}{2(N-p)}
      \left\{ \sum_{n=p}^{N-1} |e^f(n)|^2 + \sum_{n=0}^{N-p-1} |e^b(n)|^2 \right\}.

Burg Method
Consider real data and a real model. Recall the LDA:
    \theta_{n+1} = [\theta_n; 0] + k_{n+1} [\tilde{\theta}_n; 1].
Thus, if we know \theta_n and k_{n+1}, we can find \theta_{n+1}.
Recall (**):
    e_m^f(n) = e_{m-1}^f(n) + k_m\, e_{m-1}^b(n-1),
    e_m^b(n) = e_{m-1}^b(n-1) + k_m\, e_{m-1}^f(n),
where
    e_{m-1}^f(n) = x(n) + \sum_{k=1}^{m-1} \hat{a}_{m-1,k}\, x(n-k),
    e_{m-1}^b(n) = x(n-m+1) + \sum_{k=1}^{m-1} \hat{a}_{m-1,k}\, x(n-m+1+k).

k_m is found by minimizing (for \theta_{m-1} given)
    \frac{1}{2} \sum_{n=m}^{N-1} \left\{ [e_m^f(n)]^2 + [e_m^b(n)]^2 \right\},
which gives
    \hat{k}_m = -\frac{2 \sum_{n=m}^{N-1} e_{m-1}^f(n)\, e_{m-1}^b(n-1)}
                     {\sum_{n=m}^{N-1} \left\{ [e_{m-1}^f(n)]^2 + [e_{m-1}^b(n-1)]^2 \right\}}.   (*)

Steps in the Burg method:


Initialization:
    \hat{r}(0) = \frac{1}{N} \sum_{n=0}^{N-1} x^2(n),   \hat{\sigma}_0^2 = \hat{r}(0),
    e_0^f(n) = x(n),   n = 1, 2, ..., N-1,
    e_0^b(n) = x(n),   n = 0, 1, ..., N-2.

For m = 1, 2, ..., p:
    Calculate \hat{k}_m with (*);
    \hat{\sigma}_m^2 = \hat{\sigma}_{m-1}^2 (1 - \hat{k}_m^2);
    \theta_m = [\theta_{m-1}; 0] + \hat{k}_m [\tilde{\theta}_{m-1}; 1]   (\theta_1 = \hat{k}_1);
    update e_m^f(n) and e_m^b(n) with (**).

Finally, \hat{\sigma}^2 = \hat{\sigma}_p^2.
Remarks: Since a^2 + b^2 \ge 2|ab|, we have |\hat{k}_m| \le 1, so the Burg
method gives poles that are inside the unit circle.
Different ways of calculating \hat{k}_m are available.
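A minimal NumPy sketch of these steps (function name mine; each lattice
stage drops one sample from the error arrays):

```python
import numpy as np

def burg_ar(x, p):
    """Burg's method: reflection coefficients from forward/backward
    prediction errors; AR polynomial built by the Levinson recursion."""
    x = np.asarray(x, dtype=float)
    ef = x.copy()                      # e_0^f(n)
    eb = x.copy()                      # e_0^b(n)
    theta = np.array([])
    sigma2 = np.dot(x, x) / len(x)     # sigma_0^2 = r(0)
    for m in range(1, p + 1):
        f = ef[1:]                     # e_{m-1}^f(n) over the valid range
        b = eb[:-1]                    # e_{m-1}^b(n-1)
        km = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        theta = (np.concatenate([theta, [0.0]])
                 + km * np.concatenate([theta[::-1], [1.0]]))
        sigma2 *= (1.0 - km ** 2)
        ef, eb = f + km * b, b + km * f   # lattice update, order m
    return theta, sigma2
```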

Properties of AR(p) Signals:


Extension of r(k):
* Given r(0), r(1), ..., r(p).
* From the YW equations we can calculate a_1, a_2, ..., a_p, \sigma^2.
* r(k) = -\sum_{l=1}^{p} a_l\, r(k-l),   k > p.

Another point of view:
* Given r(0), ..., r(p).
* Calculate a_1, ..., a_p, \sigma^2.
* Obtain P(\omega).
* r(k) <--DTFT--> P(\omega).

Maximum Entropy Spectral Estimation


Given r(0), ..., r(p), the remaining r(p+1), ... are extrapolated to
maximize entropy.
Entropy: Let the sample space for a discrete random variable x be
x_1, ..., x_N. The entropy H(x) is
    H(x) = -\sum_{i=1}^{N} P(x_i) \ln P(x_i),   P(x_i) = prob(x = x_i).
For a continuous random variable,
    H(x) = -\int f(x) \ln f(x)\, dx,   f(x) = pdf of x.

For Gaussian random variables,
    x = [x(0); ...; x(N-1)] ~ N(0, R_N),
    H_N = \frac{1}{2} \ln(\det R_N)   (plus a constant).
Since H_N \to \infty as N \to \infty, we consider the entropy rate:
    h = \lim_{N\to\infty} \frac{H_N}{N+1}.
h is maximized with respect to r(p+1), r(p+2), ....
Remark: For the Gaussian case, we obtain the Yule-Walker equations .... !

Maximum Likelihood Estimators:


Exact ML Estimator:
[Diagram: real input u(n) -> 1/A(z) -> real output x(n), n = 0, ..., N-1.]
u(n) is Gaussian white noise with zero mean:
    E[u(n)] = 0,   Var[u(n)] = \sigma^2,   E[u(i)u(j)] = 0, i \ne j.
The likelihood function is
    f = f(x(0), ..., x(N-1) \,|\, a_1, ..., a_p, \sigma^2).
The ML estimates of a_1, ..., a_p, \sigma^2 are found by maximizing f.

    f = f(x(p), ..., x(N-1) \,|\, x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2)
        \cdot f(x(0), ..., x(p-1) \,|\, a_1, ..., a_p, \sigma^2).

* Consider first f_1 = f(x(0), ..., x(p-1) | a_1, ..., a_p, \sigma^2):
    f_1 = \frac{1}{(2\pi)^{p/2} \det^{1/2}(R_p)}
          \exp\left( -\frac{1}{2} x_0^T R_p^{-1} x_0 \right),
    x_0 = [x(0); ...; x(p-1)],
    R_p = [ r(0)    ...  r(p-1)
            ...
            r(p-1)  ...  r(0)   ].

Remark: r(0), ..., r(p-1) are functions of a_1, ..., a_p, \sigma^2 (see,
e.g., the YW system of equations).

* Consider next
    f_2 = f(x(p), ..., x(N-1) \,|\, x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2).
Since
    x(n) + \sum_{k=1}^{p} a_k x(n-k) = u(n):
    u(p)   = x(p) + a_1 x(p-1) + ... + a_p x(0),
    u(p+1) = x(p+1) + a_1 x(p) + ... + a_p x(1),
    ...
    u(N-1) = x(N-1) + a_1 x(N-2) + ... + a_p x(N-p-1).

Let
    u = [u(p); ...; u(N-1)],   x = [x(p); ...; x(N-1)].
Given x(0), ..., x(p-1), a_1, ..., a_p, \sigma^2, x and u are related by a
linear transformation. The Jacobian of the transformation,

    J = [ 1    0    ...  0
          a_1  1    ...  0
          ...
          ...  a_p  ...  1 ],

is lower triangular with a unit diagonal, so det(J) = 1.

0
..

f (u) =

1
(2 2 )

f2

N p
2



1 T
exp 2 u u
2

= f [u(x)] |det(J)|
= f [u(x)].

Let

x(p)
x(p 1)
x(0)

x(p)

x(1)
x(p + 1)
X=
..

x(N 1) x(N 2) x(N p 1)

179

a=

f2 =

1
a1
..
.
ap

u = Xa

 1 T T
1
X X
a .
N p exp 2 2 a

(2 2 )

Remark: Maximizing f = f1 .f2 with respect to a1 , , ap , 2 is


highly non-linear!

180

An Approximate ML Estimator


\hat{a}_1, ..., \hat{a}_p, \hat{\sigma}^2 are found by maximizing f_2 only.
\hat{a}_1, ..., \hat{a}_p are found by minimizing
    a^T X^T X a = u^T u:

    [ x(p-1)  ...  x(0)           [ a_1        [ x(p)          [ u(p)
      ...                   ]  ·    ...     +    ...       =     ...
      x(N-2)  ...  x(N-p-1) ]       a_p ]       x(N-1) ]        u(N-1) ].

This is exactly Prony's method!

    \hat{\sigma}^2 = \frac{1}{N-p} \sum_{n=p}^{N-1}
      \left[ x(n) + \sum_{j=1}^{p} \hat{a}_j x(n-j) \right]^2.
Again, exactly Prony's method!

Accuracy of AR PSD Estimators


Accuracy analysis is difficult.
Results for large N are available due to the Central Limit Theorem.
For large N, the variances of \hat{a}_1, ..., \hat{a}_p,
\hat{k}_1, ..., \hat{k}_p, \hat{\sigma}^2, and \hat{P}(\omega)
are all proportional to \frac{1}{N}; the biases are also \propto \frac{1}{N}.

AR Model Order Selection


Remarks:
Too low an order yields a smoothed/biased PSD estimate.
Too high an order yields spurious peaks / large variance in the PSD
estimate.
Almost all model order estimators are based on an estimate of the power of
the linear prediction error, denoted \hat{\sigma}_k^2, where k is the model
order chosen.

Final Prediction Error (FPE) Method: minimize
    FPE(k) = \frac{N+k}{N-k}\, \hat{\sigma}_k^2.

Akaike Information Criterion (AIC) Method: minimize
    AIC(k) = N \ln \hat{\sigma}_k^2 + 2k.

Remarks:
As N \to \infty, AIC's probability of error in choosing the correct order
does NOT \to 0.
As N \to \infty, AIC tends to overestimate the model order.

Minimum Description Length (MDL) Criterion: minimize
    MDL(k) = N \ln \hat{\sigma}_k^2 + k \ln N.

Remark: As N \to \infty, MDL's probability of error \to 0 (consistent!).

Criterion Autoregressive Transfer (CAT) Method: minimize
    CAT(k) = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{\bar{\sigma}_i^2} - \frac{1}{\bar{\sigma}_k^2},
    \bar{\sigma}_i^2 = \frac{N}{N-i}\, \hat{\sigma}_i^2.

Remarks: None of the above methods works well for small N.
Use these methods to obtain initial order estimates (practical experience
is needed).
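A minimal sketch of AIC/MDL order selection, reusing the levinson_durbin
sketch given earlier to produce the prediction error powers (function name
and defaults mine):

```python
import numpy as np

def select_ar_order(x, kmax, criterion="MDL"):
    """Fit AR(k) for k = 1..kmax via Levinson on the biased ACF and
    minimize the chosen information criterion."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(kmax + 1)])
    scores = []
    for k in range(1, kmax + 1):
        _, s2, _ = levinson_durbin(r, k)   # prediction error power sigma_k^2
        pen = 2 * k if criterion == "AIC" else k * np.log(N)
        scores.append(N * np.log(s2) + pen)
    return 1 + int(np.argmin(scores))
```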

Noisy AR Processes:
    y(n) = x(n) + w(n),
x(n) = AR(p) process,
w(n) = white Gaussian noise with zero mean and variance \sigma_w^2,
x(n) and w(n) independent of each other.

    P_{yy}(\omega) = P_{xx}(\omega) + P_{ww}(\omega)
                   = \frac{\sigma^2}{|A(\omega)|^2} + \sigma_w^2
                   = \frac{\sigma^2 + \sigma_w^2 |A(\omega)|^2}{|A(\omega)|^2}.

Remark: y(n) is an ARMA signal.
a_1, ..., a_p, \sigma^2, \sigma_w^2 may be estimated by
* ARMA methods;
* a large-order AR approximation;
* compensating for the effect of w(n);
* bootstrap or adaptive filtering and AR methods.

Wiener Filter (Wiener-Hopf Filter):
[Diagram: y(n) = x(n) + w(n) -> H(z) -> \hat{x}(n);
e(n) = x(n) - \hat{x}(n), where x(n) is the desired signal.]
H(z) is found by minimizing E\{ |e(n)|^2 \}.
H(z) depends on knowing P_{xy}(\omega).

General Filtering Problem (Complex Signals):
[Diagram: y(n) = x(n) + w(n) -> H(z) -> \hat{d}(n);
e(n) = d(n) - \hat{d}(n), where d(n) is the desired signal.]

Special case d(n) = x(n + m):
1) m > 0:  m-step-ahead prediction;
2) m = 0:  filtering problem;
3) m < 0:  smoothing problem.

Three common filters:
1) General non-causal:
    H(z) = \sum_{k=-\infty}^{\infty} h_k z^{-k}.
2) General causal:
    H(z) = \sum_{k=0}^{\infty} h_k z^{-k}.
3) Finite Impulse Response (FIR):
    H(z) = \sum_{k=0}^{p} h_k z^{-k}.

Case 1: Non-causal Filter.

    E = E\{ |e(n)|^2 \}
      = E\left\{ \left[ d(n) - \sum_{k=-\infty}^{\infty} h_k y(n-k) \right]
                 \left[ d(n) - \sum_{l=-\infty}^{\infty} h_l y(n-l) \right]^* \right\}
      = r_{dd}(0) - \sum_{l=-\infty}^{\infty} h_l^*\, r_{dy}(l)
        - \sum_{k=-\infty}^{\infty} h_k\, r_{dy}^*(k)
        + \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} r_{yy}(l-k)\, h_k h_l^*.

Remark: For causal and FIR filters, only the limits of the sums differ.
Let h_i = \alpha_i + j\beta_i and set
    \frac{\partial E}{\partial \alpha_i} = 0,   \frac{\partial E}{\partial \beta_i} = 0:
    r_{dy}(i) = \sum_{k=-\infty}^{\infty} h_k^o\, r_{yy}(i-k).

In the Z-domain:
    P_{dy}(z) = H^o(z)\, P_{yy}(z),
which gives the optimum non-causal Wiener filter.

Ex: d(n) = x(n),   y(n) = x(n) + w(n),
    P_{xx}(z) = \frac{0.36}{(1 - 0.8z^{-1})(1 - 0.8z)},   P_{ww}(z) = 1,
x(n) and w(n) uncorrelated. Optimal filter?

    P_{yy}(z) = P_{xx}(z) + P_{ww}(z)
              = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)} + 1
              = 1.6\, \frac{(1-0.5z^{-1})(1-0.5z)}{(1-0.8z^{-1})(1-0.8z)}.

    r_{dy}(k) = E[d(n+k)\, y^*(n)] = E\{ x(n+k) [x^*(n) + w^*(n)] \} = r_{xx}(k),
    P_{dy}(z) = P_{xx}(z),
    H^o(z) = \frac{P_{dy}(z)}{P_{yy}(z)} = \frac{0.36}{1.6\,(1-0.5z^{-1})(1-0.5z)},
    h^o(k) = 0.3 \left(\frac{1}{2}\right)^{|k|}.
[Figure: h^o(k), a symmetric two-sided exponential with peak 0.3 at k = 0.]

Case 2: Causal Filter.
    H(z) = \sum_{k=0}^{\infty} h_k z^{-k}.
Through derivations similar to Case 1, we have
    r_{dy}(i) = \sum_{k=0}^{\infty} h_k^o\, r_{yy}(i-k),   i \ge 0;   h_k^o = ?

Split H(z) as H(z) = B(z) G(z).
Pick B(z) such that B(z) is stable, causal, and minimum phase, with
    P_{\epsilon}(z) = P_{yy}(z)\, B(z)\, B^*\!\left(\frac{1}{z^*}\right) = 1,
where \epsilon(n) is the output of B(z) driven by y(n); B(z) is called the
whitening filter.

Choose G(z) so that E\{|e(n)|^2\} is minimized:
    r_{d\epsilon}(i) = \sum_{k=0}^{\infty} g_k\, r_{\epsilon\epsilon}(i-k).
Since P_{\epsilon}(z) = 1, r_{\epsilon\epsilon}(k) = \delta(k), so
    r_{d\epsilon}(i) = g_i,   i = 0, 1, 2, ...,
and h_i is the convolution of g_i and b_i.

Note that
    r_{d\epsilon}(i) = E\{ d(n+i)\, \epsilon^*(n) \}
      = E\left\{ d(n+i) \left[ \sum_{k=0}^{\infty} b_k y(n-k) \right]^* \right\}
      = \sum_{k=0}^{\infty} b_k^*\, r_{dy}(i+k).
Since b_k = 0 for k < 0 (causal),
    P_{d\epsilon}(z) = P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right),
but r_{d\epsilon}(i) = g_i for i = 0, 1, ... ONLY.

Let
    [X(z)]_+ = \left[ \sum_{k=-\infty}^{\infty} x_k z^{-k} \right]_+
             = \sum_{k=0}^{\infty} x_k z^{-k}.
Then
    G(z) = \sum_{k=0}^{\infty} g_k z^{-k}
         = \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+,
and the optimum causal Wiener filter is
    H(z) = B(z)\, G(z)
         = B(z) \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+.



Ex. (Same as the previous one.)
    P_{xx}(z) = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)},   P_{ww}(z) = 1,
x(n) and w(n) independent.
[Diagram: x(n) + w(n) -> causal H(z) -> \hat{x}(n); e(n).]

    P_{dy}(z) = P_{xy}(z) = P_{xx}(z),
    P_{yy}(z) = 1.6\, \frac{(1-0.5z^{-1})(1-0.5z)}{(1-0.8z^{-1})(1-0.8z)},
    B(z) = \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z^{-1}}{1-0.5z^{-1}}
    (stable and causal).

    P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right)
      = \frac{0.36}{(1-0.8z^{-1})(1-0.8z)} \cdot \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z}{1-0.5z}
      = \frac{0.36}{\sqrt{1.6}} \cdot \frac{1}{(1-0.8z^{-1})(1-0.5z)}
      = \frac{0.36}{\sqrt{1.6}} \left[ \frac{5/3}{1-0.8z^{-1}} + \frac{\frac{5}{6}z}{1-0.5z} \right],

    G^o(z) = \left[ P_{dy}(z)\, B^*\!\left(\frac{1}{z^*}\right) \right]_+
           = \frac{0.36}{\sqrt{1.6}} \cdot \frac{5/3}{1-0.8z^{-1}},

    H^o(z) = B(z)\, G^o(z)
           = \frac{1}{\sqrt{1.6}}\, \frac{1-0.8z^{-1}}{1-0.5z^{-1}}
             \cdot \frac{0.36}{\sqrt{1.6}}\, \frac{5/3}{1-0.8z^{-1}}
           = 0.375\, \frac{1}{1-0.5z^{-1}},

    h^o(k) = \frac{3}{8} \left(\frac{1}{2}\right)^k U(k),   k = 0, 1, 2, ....

Case 3: FIR Filter:
    H(z) = \sum_{k=0}^{p} h_k z^{-k}.
Again, we can show similarly that
    r_{dy}(i) = \sum_{k=0}^{p} h_k^o\, r_{yy}(i-k),   i = 0, ..., p:

    [ r_{dy}(0)        [ r_{yy}(0)    r_{yy}^*(1)  ...  r_{yy}^*(p)        [ h_0^o
      r_{dy}(1)     =    r_{yy}(1)    r_{yy}(0)    ...  r_{yy}^*(p-1)   ·    h_1^o
      ...                ...                                                 ...
      r_{dy}(p) ]        r_{yy}(p)    r_{yy}(p-1)  ...  r_{yy}(0)     ]      h_p^o ].

Remark: The minimum error E is the smallest in Case (1) and the largest in
Case (3).
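A minimal NumPy sketch of the FIR case, applied to the example above, where
r_xx(k) = 0.8^|k| and the noise is unit white (function name and filter
order are mine, real signals assumed):

```python
import numpy as np

def fir_wiener(r_yy, r_dy):
    """Order-p FIR Wiener filter: solve the (p+1)x(p+1) Toeplitz system
    R_yy h = r_dy built from r_yy(0..p) and r_dy(0..p) (real signals)."""
    p = len(r_dy) - 1
    R = np.array([[r_yy[abs(i - k)] for k in range(p + 1)]
                  for i in range(p + 1)])
    return np.linalg.solve(R, r_dy)

# d(n) = x(n): r_yy(k) = r_xx(k) + delta(k), r_dy(k) = r_xx(k).
p = 10
r_xx = 0.8 ** np.arange(p + 1)
r_yy = r_xx.copy()
r_yy[0] += 1.0
h = fir_wiener(r_yy, r_xx)   # close to the causal solution 0.375 * 0.5^k
```

For large p, a Levinson-type solver exploiting the Toeplitz structure is
preferred over a dense solve.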

Parametric Methods for Line Spectra


    y(n) = x(n) + w(n),
    x(n) = \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)},
\varphi_k = initial phases, independent of each other, uniformly
distributed on [-\pi, \pi];
\alpha_k = amplitudes, constants, > 0;
\omega_k = angular frequencies;
w(n) = zero-mean white Gaussian noise, independent of \varphi_1, ..., \varphi_K.

Remarks:
Applications: Radar, Communications, ....
We are mostly interested in estimating \omega_1, ..., \omega_K.
Once \hat{\omega}_1, ..., \hat{\omega}_K are estimated,
\hat{\alpha}_1, ..., \hat{\alpha}_K and \hat{\varphi}_1, ..., \hat{\varphi}_K
can be found readily. Let \beta_k = \alpha_k e^{j\varphi_k}; then

    [ y(0)            [ 1                   ...  1                     [ \beta_1
      y(1)        ~=    e^{j\omega_1}       ...  e^{j\omega_K}      ] ·  \beta_2
      ...               ...                                              ...
      y(N-1) ]          e^{j(N-1)\omega_1}  ...  e^{j(N-1)\omega_K} ]    \beta_K ].

The amplitude of \beta_k is \alpha_k; the phase of \beta_k is \varphi_k.

Remarks:
    r_{yy}(k) = E\{ y^*(n)\, y(n+k) \}
              = \sum_{i=1}^{K} \alpha_i^2 e^{j\omega_i k} + \sigma^2 \delta(k),
    P_{yy}(\omega) = 2\pi \sum_{i=1}^{K} \alpha_i^2\, \delta(\omega - \omega_i) + \sigma^2.
[Figure: line spectrum plus the white-noise floor.]

Recall that the resolution limit of the periodogram is \frac{1}{N}.
The parametric methods below have resolution better than \frac{1}{N}.
(These methods are the so-called High-Resolution or Super-Resolution
methods.)

Maximum Likelihood Estimator


w(n) is assumed to be a zero-mean circularly symmetric complex Gaussian
random variable with variance \sigma^2. The pdf of w(n) is
    f(w(n)) = \frac{1}{\pi\sigma^2} \exp\left( -\frac{|w(n)|^2}{\sigma^2} \right).

Remark: The real and imaginary parts of w(n) are real Gaussian random
variables with zero mean and variance \frac{\sigma^2}{2}, and the two parts
are independent of each other.

    f(w(0), ..., w(N-1)) = \frac{1}{(\pi\sigma^2)^N}
      \exp\left\{ -\frac{\sum_{n=0}^{N-1} |w(n)|^2}{\sigma^2} \right\}.

The likelihood function of y(0), ..., y(N-1) is
    f = \frac{1}{(\pi\sigma^2)^N}
      \exp\left\{ -\frac{\sum_{n=0}^{N-1} |y(n) - x(n)|^2}{\sigma^2} \right\}.

Remark: The ML estimates of \omega_1, ..., \omega_K, \alpha_1, ..., \alpha_K,
\varphi_1, ..., \varphi_K are found by maximizing f with respect to these
parameters. Equivalently, we minimize
    g = \sum_{n=0}^{N-1} \left| y(n) - \sum_{k=1}^{K} \alpha_k e^{j(\omega_k n + \varphi_k)} \right|^2.

Remarks: If w(n) is neither Gaussian nor white, minimizing g is called the
non-linear least-squares method, in general.
Let
    y = [y(0); ...; y(N-1)],   \beta = [\beta_1; ...; \beta_K],
    \omega = [\omega_1; ...; \omega_K],
    B = [ 1                   ...  1
          e^{j\omega_1}       ...  e^{j\omega_K}
          ...
          e^{j(N-1)\omega_1}  ...  e^{j(N-1)\omega_K} ].

Then
    g = (y - B\beta)^H (y - B\beta)
      = \left[ \beta - (B^H B)^{-1} B^H y \right]^H (B^H B)
        \left[ \beta - (B^H B)^{-1} B^H y \right]
        + y^H y - y^H B (B^H B)^{-1} B^H y.

Hence
    \hat{\omega} = \arg\max_{\omega}\; y^H B (B^H B)^{-1} B^H y,
    \hat{\beta} = (B^H B)^{-1} B^H y \big|_{\omega=\hat{\omega}}.
\hat{\omega} is a consistent estimate of \omega.

Remarks: For large N,
    E\{ (\hat{\omega} - \omega)(\hat{\omega} - \omega)^T \}
      \approx \frac{6\sigma^2}{N^3}\,
      diag\!\left( \frac{1}{\alpha_1^2}, ..., \frac{1}{\alpha_K^2} \right)   (the CRB).

However, the maximization to obtain \hat{\omega} is difficult to implement:
* the search may not find the global maximum;
* it is computationally expensive.

Special Cases:
1) K = 1:
    \hat{\omega} = \arg\max_{\omega} \underbrace{y^H B (B^H B)^{-1} B^H y}_{g_1},
    B = [1; e^{j\omega}; ...; e^{j(N-1)\omega}],   B^H B = N,
    B^H y = \sum_{n=0}^{N-1} y(n) e^{-j\omega n},
so
    \hat{\omega} = \arg\max_{\omega} \frac{1}{N}
      \left| \sum_{n=0}^{N-1} y(n) e^{-j\omega n} \right|^2,
which corresponds to the highest peak of the periodogram!

2) K > 1, with the frequency separation condition
    \Delta\omega = \inf_{i \ne k} |\omega_i - \omega_k| > \frac{2\pi}{N}.
Since Var(\hat{\omega}_k - \omega_k) \sim \frac{1}{N^3},
    |\hat{\omega}_k - \omega_k| \sim \frac{1}{N^{3/2}} << \frac{2\pi}{N},
so we can resolve all K sine waves by evaluating g_1 at the FFT points
    \bar{\omega}_i = \frac{2\pi}{N}\, i,   i = 0, ..., N-1.
For any K of these points, B^H B = N I (I = identity matrix), which gives
    g_1 = \sum_{k=1}^{K} \frac{1}{N}
      \left| \sum_{n=0}^{N-1} y(n) e^{-j\bar{\omega}_k n} \right|^2.
The K points \bar{\omega}_i that maximize g_1 correspond to the K largest
peaks of the periodogram.

Remarks: The \omega_k estimates obtained by using the K largest peaks of
the periodogram have accuracy |\hat{\omega}_k - \omega_k| \le \frac{2\pi}{N}.
The periodogram is a good frequency estimator. (This was introduced by
Schuster a century ago!)
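A minimal NumPy sketch of this peak-picking frequency estimator (the
function name, zero-padding length, and two-sinusoid test signal are
illustrative):

```python
import numpy as np

def freqs_from_periodogram(y, K, nfft=4096):
    """Frequency estimates = locations of the K largest periodogram peaks
    (zero padding refines the grid, not the resolution)."""
    P = np.abs(np.fft.fft(y, n=nfft)) ** 2 / len(y)
    # Local maxima on the circular frequency grid.
    peaks = np.flatnonzero((P > np.roll(P, 1)) & (P > np.roll(P, -1)))
    top = peaks[np.argsort(P[peaks])[-K:]]
    return np.sort(2 * np.pi * top / nfft)

# Two complex sinusoids in noise (illustrative parameters).
rng = np.random.default_rng(1)
n = np.arange(128)
y = np.exp(1j * 0.9 * n) + 0.8 * np.exp(1j * (1.6 * n + 0.7))
y = y + 0.1 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
print(freqs_from_periodogram(y, K=2))   # close to [0.9, 1.6]
```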

High-Resolution Methods


Statistical performance close to the ML estimator (or the CRB).
Avoid a multidimensional search over the parameter space.
Do not depend on the resolution condition.
All provide consistent estimates.
All give similar performance, especially for large N.
The method of choice is a matter of taste.

Higher-Order Yule-Walker (HOYW) Method:


Let x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)}. Then
    (1 - e^{j\omega_k} z^{-1})\, x_k(n)
      = x_k(n) - e^{j\omega_k} x_k(n-1)
      = \alpha_k e^{j(\omega_k n + \varphi_k)}
        - e^{j\omega_k} \alpha_k e^{j[\omega_k(n-1) + \varphi_k]} = 0,
so (1 - e^{j\omega_k} z^{-1}) is an annihilating filter for x_k(n).

Let A(z) = \prod_{k=1}^{K} (1 - e^{j\omega_k} z^{-1}). Then
    A(z)\, x(n) = 0,
and with y(n) = x(n) + w(n),
    A(z)\, y(n) = A(z)\, w(n).   (*)

Remark:
It is tempting to cancel A(z) from both sides above, but this is wrong
since y(n) \ne w(n)!

Multiplying both sides of (*) by a polynomial \bar{A}(z) of order L - K
gives
    (1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L})\, y(n)
      = (1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L})\, w(n),
where
    1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L} = A(z)\bar{A}(z),
i.e.,
    [y(n)  y(n-1)  ...  y(n-L)] [1; \bar{a}_1; ...; \bar{a}_L]
      = w(n) + ... + \bar{a}_L w(n-L).

Multiplying both sides by [y^*(n-L-1); ...; y^*(n-L-M)] and taking
expectations (the right side is uncorrelated with these past samples), we
get

    [ r_{yy}(L)      r_{yy}(L-1)  ...  r_{yy}(1)       [ \bar{a}_1         [ r_{yy}(L+1)
      r_{yy}(L+1)    r_{yy}(L)    ...  r_{yy}(2)   ] ·   ...          = -    ...
      ...                                                \bar{a}_L ]        r_{yy}(L+M) ],
      r_{yy}(L+M-1)  ...               r_{yy}(M) ]

i.e.,   \Omega\, \bar{a} = -\rho.

Remarks:
When y(0), ..., y(N-1) are the only data available, we first estimate
r_{yy}(i) and replace r_{yy}(i) in the above equation with the estimate
\hat{r}_{yy}(i).
{\hat{\omega}_k} are the angular positions of the K roots of
1 + \bar{a}_1 z^{-1} + ... + \bar{a}_L z^{-L} nearest the unit circle.
Increasing L and M will give better performance, due to using the
information in higher lags of r(i).
Increasing L and M too much will give worse performance, due to the
increased variance in \hat{r}(i) for large i.

\Omega has rank K if M \ge K and L \ge K.

Proof: Let
    \bar{y}_i(n) = [y(n); y(n-1); ...; y(n-i+1)],
    \bar{x}(n) = [x_1(n); ...; x_K(n)],   x_k(n) = \alpha_k e^{j(\omega_k n + \varphi_k)},
    \bar{w}_i(n) = [w(n); w(n-1); ...; w(n-i+1)].
Then
    \bar{y}_i(n) = A_i \bar{x}(n) + \bar{w}_i(n),
where
    A_i = [ 1                   ...  1
            e^{-j\omega_1}      ...  e^{-j\omega_K}
            ...
            e^{-j(i-1)\omega_1} ...  e^{-j(i-1)\omega_K} ]
is an i x K Vandermonde matrix; rank(A_i) = K if i \ge K and
\omega_k \ne \omega_l for k \ne l.

Thus
    \Omega = E\{ \bar{y}_M^*(n-L-1)\, \bar{y}_L^T(n-1) \}
           = E\{ A_M^* \bar{x}^*(n-L-1)\, \bar{x}^T(n-1) A_L^T \}
           = A_M^*\, P_{L+1}\, A_L^T,
where
    P_{L+1} = E\{ \bar{x}^*(n-L-1)\, \bar{x}^T(n-1) \}.

Since
    E\{x_i(n)\} = E\{ \alpha_i e^{j(\omega_i n + \varphi_i)} \}
      = \frac{1}{2\pi} \int_{-\pi}^{\pi} \alpha_i e^{j\omega_i n} e^{j\varphi_i}\, d\varphi_i = 0,
    E\{ x_i^*(n-k)\, x_i(n) \} = \alpha_i^2 e^{j\omega_i k},
and since the \varphi_i are independent of each other,
    E\{ x_i^*(n-k)\, x_j(n) \} = 0,   i \ne j,
we have
    P_{L+1} = [ \alpha_1^2 e^{j\omega_1 L}   ...  0
                ...                          ...  ...
                0                            ...  \alpha_K^2 e^{j\omega_K L} ].

Remark: For M \ge K and L \ge K, P_{L+1} is of rank K, and so is \Omega.

Consider the estimated system
    \hat{\Omega}\, \bar{a} = -\hat{\rho},
with \hat{\Omega} and \hat{\rho} built from \hat{r}_{yy}(i).
Remarks: rank(\hat{\Omega}) = \min(M, L) almost surely, due to the errors
in \hat{r}_{yy}(i).
For large N, \hat{r}_{yy}(i) \to r_{yy}(i) makes \hat{\Omega}
ill-conditioned.
For large N, LS estimates of \bar{a}_1, ..., \bar{a}_L therefore give poor
estimates of \omega_1, ..., \omega_K.

Let us use this rank information as follows. Let

                         [ Σ_1   0  ] [ V_1^H ]
Ω̂ = U Σ V^H = [U_1  U_2] [          ] [       ]
                         [  0   Σ_2 ] [ V_2^H ]

denote the singular value decomposition (SVD) of Ω̂, with Σ_1 of size
K × K and Σ_2 of size (L−K) × (L−K) (diagonal elements of Σ arranged
from large to small).

Since Ω̂ is close to rank K, and Ω̂_K = U_1 Σ_1 V_1^H has rank K
(Ω̂_K is the best rank-K approximation of Ω̂ in the Frobenius-norm
sense), Ω̂_K is generally a better estimate of Ω than Ω̂.

The minimum-norm LS solution of Ω̂_K ā = −ρ̂ is

â = −V_1 Σ_1^{−1} U_1^H ρ̂.    (**)

Remark:
Using Ω̂_K to replace Ω̂ gives better frequency estimation.
This result may be explained by the fact that Ω̂_K is closer to Ω
than Ω̂.
The rank-K approximation step is referred to as "noise cleaning".

Summary of the HOYW Frequency Estimator

Step 1: Compute r̂(k), k = 1, 2, ..., L + M.
Step 2: Compute the SVD of Ω̂ and determine â with (**).
Step 3: Compute the roots of

1 + â_1 z^{−1} + ... + â_L z^{−L} = 0.

Pick the K roots that are nearest the unit circle and obtain the
frequency estimates as the angular positions (phases) of these roots.

Remarks: Rule of thumb for selecting L and M: L ≈ M ≈ N/3.
Although one cannot guarantee that the K roots nearest the unit
circle give the best frequency estimates, empirical evidence shows
that this is true most often.
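The summary above maps almost line-for-line onto code. A minimal
Python sketch (numpy only; the signal, noise level, K = 2, and the
choice L = M = N/3 are illustrative assumptions):

import numpy as np

# --- HOYW frequency estimator: a minimal sketch ---
N, K = 300, 2
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)   # assumed example data
y += 0.2 * (np.random.randn(N) + 1j * np.random.randn(N))
L = M = N // 3                                    # rule-of-thumb choice

# Step 1: sample autocovariances r_hat(k), k = 0, ..., L+M.
r = np.array([np.sum(y[k:] * np.conj(y[:N - k])) / N
              for k in range(L + M + 1)])

# Build Omega_hat (M x L) and rho_hat (M,): entry (i, m) is r_hat(L+i-m).
Omega = np.array([[r[L + i - m] for m in range(1, L + 1)]
                  for i in range(1, M + 1)])
rho = r[L + 1:L + M + 1]

# Step 2: SVD, rank-K truncation, minimum-norm LS solution (**).
U, s, Vh = np.linalg.svd(Omega)
a = -(Vh[:K].conj().T / s[:K]) @ (U[:, :K].conj().T @ rho)

# Step 3: roots of 1 + a_1 z^{-1} + ... + a_L z^{-L};
# keep the K roots nearest the unit circle.
roots = np.roots(np.concatenate(([1.0], a)))
nearest = roots[np.argsort(np.abs(np.abs(roots) - 1))[:K]]
w_hat = np.sort(np.mod(np.angle(nearest), 2 * np.pi))
print(w_hat)                                      # close to [0.5, 1.2]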

Some Math Background

Lemma: Let U be a unitary matrix, i.e., U^H U = I. Then

||Ub||₂² = ||b||₂²,   where ||x||₂² = x^H x.

Proof: ||Ub||₂² = b^H U^H U b = b^H b = ||b||₂².

Consider Ax ≈ b, where A is M × L and of rank K, x is L × 1, and
b is M × 1.

SVD of A:

                        [ Σ_1  0 ] [ V_1^H ]
A = U Σ V^H = [U_1  U_2][        ] [       ]
                        [  0   0 ] [ V_2^H ]

Goal: Find the minimum-norm x so that ||Ax − b||₂² = minimum.

||Ax − b||₂² = ||U^H A x − U^H b||₂²
             = ||U^H U Σ V^H x − U^H b||₂²
             = ||Σ y − U^H b||₂²,   where y = V^H x,

               || [ Σ_1  0 ] [ y_1 ]   [ U_1^H b ] ||²
             = || [        ] [     ] − [         ] ||
               || [  0   0 ] [ y_2 ]   [ U_2^H b ] ||₂

             = ||Σ_1 y_1 − U_1^H b||₂² + ||U_2^H b||₂².

To minimize ||Ax − b||₂², we must have

Σ_1 y_1 = U_1^H b   =>   y_1 = Σ_1^{−1} U_1^H b.

Note that y_2 can be anything; ||Ax − b||₂² is not affected.
Let y_2 = 0, so that ||y||₂² = ||x||₂² (by the Lemma) is minimized:

V^H x = y = [ y_1 ]
            [  0  ]

x = V y = [V_1  V_2] [ y_1 ] = V_1 y_1
                     [  0  ]

=>   x = V_1 Σ_1^{−1} U_1^H b,   with ||x||₂² = ||y||₂² = minimum.
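A small numerical check of this construction (Python; A and b are
arbitrary example data, with A built to have exact rank K):

import numpy as np

# Minimum-norm least-squares solution via the truncated SVD.
M, L, K = 8, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((M, K)) @ rng.standard_normal((K, L))  # rank K
b = rng.standard_normal(M)

U, s, Vh = np.linalg.svd(A)
x = Vh[:K].conj().T @ ((U[:, :K].conj().T @ b) / s[:K])  # V1 S1^{-1} U1^H b

# Agrees with numpy's pseudoinverse solution (also the min-norm one).
print(np.allclose(x, np.linalg.pinv(A) @ b))             # True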

SVD Prony Method

Recall:

(1 + ā_1 z^{−1} + ... + ā_L z^{−L}) y(n)
      = (1 + ā_1 z^{−1} + ... + ā_L z^{−L}) w(n).   (L ≥ K)

At not too low SNR,

[ y(L)     y(L−1)   ...   y(0)     ] [ 1   ]
[ y(L+1)   y(L)     ...   y(1)     ] [ ā_1 ]
[   ...                   ...      ] [ ... ]  ≈  0.    (*)
[ y(N−1)   y(N−2)   ...   y(N−L−1) ] [ ā_L ]

Remark: If w(n) = 0, Eq. (*) holds exactly, and then (*) gives EXACT
frequency estimates.
Consider next the rank of

      [ x(L−1)   ...   x(0)     ]
X  =  [   ...          ...      ]
      [ x(N−2)   ...   x(N−L−1) ]

Note that

[ x(0)     ]   [ 1                 ...   1                 ] [ α_1 e^{jφ_1} ]
[   ...    ] = [ e^{jω_1}          ...   e^{jω_K}          ] [     ...      ]
[ x(N−L−1) ]   [    ...                  ...               ] [ α_K e^{jφ_K} ]
               [ e^{j(N−L−1)ω_1}   ...   e^{j(N−L−1)ω_K}   ]

and similarly for the other columns of X, so that

      [ 1                 ...  1                ] [ α_1 e^{jφ_1}      0         ] [ e^{j(L−1)ω_1}  ...  e^{jω_1}  1 ]
X  =  [ e^{jω_1}          ...  e^{jω_K}         ] [        ...                  ] [      ...                        ]
      [    ...                 ...              ] [      0        α_K e^{jφ_K}  ] [ e^{j(L−1)ω_K}  ...  e^{jω_K}  1 ]
      [ e^{j(N−L−1)ω_1}   ...  e^{j(N−L−1)ω_K}  ]

Remark: If N − L − 1 ≥ K and L ≥ K, X is of rank K.


From (*),

[ y(L−1)   ...   y(0)     ] [ ā_1 ]       [ y(L)   ]
[   ...          ...      ] [ ... ]  ≈ −  [  ...   ]
[ y(N−2)   ...   y(N−L−1) ] [ ā_L ]       [ y(N−1) ]

Call the matrix on the left Y and the right-hand-side vector y.

Remark: A rank-K approximation of Y has a "noise cleaning" effect.

Let

                 [ Σ_1   0  ] [ V_1^H ]
Y = [U_1  U_2]   [          ] [       ],   Σ_1: K × K,  Σ_2: (L−K) × (L−K),
                 [  0   Σ_2 ] [ V_2^H ]

denote the SVD of Y. Then

[ â_1  ...  â_L ]^T = −V_1 Σ_1^{−1} U_1^H [ y(L)  ...  y(N−1) ]^T.    (**)
Summary of the SVD Prony Estimator

Step 1: Form Y and compute the SVD of Y.
Step 2: Determine â with (**).
Step 3: Compute the roots from â. Pick the K roots that are nearest
the unit circle. Obtain the frequency estimates as the phases of
these roots.

Remark: Although one cannot guarantee that the K roots nearest the
unit circle give the best frequency estimates, empirical results show
that this is true most often.
A more accurate method is obtained by "cleaning" (i.e., a rank-K
approximation of) the matrix [Y ⋮ y].
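A minimal SVD Prony sketch in Python (numpy only; the data and the
choices of K and L ≥ K are illustrative assumptions):

import numpy as np

# --- SVD Prony estimator: a minimal sketch ---
N, K, L = 200, 2, 12
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)   # assumed example data
y += 0.05 * (np.random.randn(N) + 1j * np.random.randn(N))

# Step 1: rows of Y are [y(m-1), ..., y(m-L)] for m = L, ..., N-1.
Y = np.array([y[m - L:m][::-1] for m in range(L, N)])
rhs = y[L:N]

U, s, Vh = np.linalg.svd(Y, full_matrices=False)

# Step 2: a_hat = -V1 S1^{-1} U1^H [y(L) ... y(N-1)]^T   (**)
a = -(Vh[:K].conj().T @ ((U[:, :K].conj().T @ rhs) / s[:K]))

# Step 3: K roots nearest the unit circle -> frequency estimates.
roots = np.roots(np.concatenate(([1.0], a)))
nearest = roots[np.argsort(np.abs(np.abs(roots) - 1))[:K]]
print(np.sort(np.mod(np.angle(nearest), 2 * np.pi)))   # ~ [0.5, 1.2]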

Pisarenko and MUSIC Methods

Remark: The Pisarenko method is a special case of the MUSIC
(MUltiple SIgnal Classification) method.

Recall:

ȳ_M(n) = [y(n)  y(n−1)  ...  y(n−M+1)]^T
x̄(n)   = [x_1(n)  ...  x_K(n)]^T,   x_k(n) = α_k e^{j(ω_k n + φ_k)}
w̄_M(n) = [w(n)  ...  w(n−M+1)]^T

        [ 1                ...   1               ]
A_M  =  [ e^{−jω_1}        ...   e^{−jω_K}       ]
        [    ...                 ...             ]
        [ e^{−j(M−1)ω_1}   ...   e^{−j(M−1)ω_K}  ]

ȳ_M(n) = A_M x̄(n) + w̄_M(n)

Let

R = E{ ȳ_M(n) ȳ_M^H(n) }
  = E{ A_M x̄(n) x̄^H(n) A_M^H } + E{ w̄_M(n) w̄_M^H(n) }
  = A_M P A_M^H + σ² I,

where

    [ α_1²       0   ]
P = [     ...        ]
    [  0       α_K²  ]

Remarks: rank(A_M P A_M^H) = K if M ≥ K.
If M > K, A_M P A_M^H has K positive eigenvalues and M − K zero
eigenvalues. We shall consider M > K below.
Let the positive eigenvalues of A_M P A_M^H be denoted
λ̃_1 ≥ λ̃_2 ≥ ... ≥ λ̃_K.

The eigenvalues of R then fall into two groups:

λ_k = λ̃_k + σ²,   k = 1, ..., K
λ_k = σ²,         k = K+1, ..., M

Let s_1, ..., s_K be the eigenvectors of R that correspond to
λ_1, ..., λ_K, and let S = [s_1, ..., s_K].
Let s_{K+1}, ..., s_M be the eigenvectors of R that correspond to
λ_{K+1}, ..., λ_M, and let G = [s_{K+1}, ..., s_M].

Then

RG = G diag(σ², ..., σ²) = σ² G.

On the other hand,

RG = (A_M P A_M^H + σ² I) G = A_M P A_M^H G + σ² G,

so

A_M P A_M^H G = 0   =>   A_M^H G = 0

(since P is non-singular and A_M has full column rank).

Remark:
Let the linearly independent K columns of A_M define the
K-dimensional signal subspace.
* Then the eigenvectors of R that correspond to the M − K
smallest eigenvalues are orthogonal to the signal subspace.
* The eigenvectors of R that correspond to the K largest
eigenvalues of R span the same signal subspace as A_M, i.e.,
A_M = SC for a K × K non-singular C.

MUSIC:

The true frequency values {ω_k}_{k=1}^{K} are the only solutions of

a_M^H(ω) G G^H a_M(ω) = 0,

where

a_M(ω) = [1  e^{−jω}  ...  e^{−j(M−1)ω}]^T.

Steps in MUSIC:

Step 1: Compute R̂ = (1/N) Σ_{n=M−1}^{N−1} ȳ_M(n) ȳ_M^H(n) and its
eigendecomposition. Form Ĝ, whose columns are the eigenvectors of R̂
that correspond to the M − K smallest eigenvalues of R̂.

Step 2a (Spectral MUSIC): Determine the frequency estimates as
the locations of the K highest peaks of the MUSIC "spectrum"

1 / ( a_M^H(ω) Ĝ Ĝ^H a_M(ω) ),   ω ∈ [−π, π].

Step 2b (Root MUSIC): Determine the frequency estimates as the
angular positions (phases) of the K (pairs of reciprocal) roots of
the equation

a_M^T(z^{−1}) Ĝ Ĝ^H a_M(z) = 0

that are closest to the unit circle, where

a_M(z) = [1  z^{−1}  ...  z^{−M+1}]^T,   i.e.,   a_M(z)|_{z=e^{jω}} = a_M(ω).
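A minimal Spectral MUSIC sketch in Python (numpy only; the data and
the choices of M and K are illustrative assumptions; Root MUSIC would
instead root the polynomial in Step 2b):

import numpy as np

# --- Spectral MUSIC: a minimal sketch ---
N, K, M = 200, 2, 12                      # illustrative choices, M > K
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)
y += 0.1 * (np.random.randn(N) + 1j * np.random.randn(N))

# Step 1: sample covariance of y_M(n) = [y(n), ..., y(n-M+1)]^T.
snaps = np.array([y[t - M + 1:t + 1][::-1] for t in range(M - 1, N)])
R = snaps.T @ snaps.conj() / N

eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues
G = eigvecs[:, :M - K]                    # noise subspace (M-K smallest)

# Step 2a: scan the MUSIC pseudo-spectrum on a fine grid.
grid = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
m = np.arange(M)
A = np.exp(-1j * np.outer(m, grid))       # a_M(w) columns, shape (M, ngrid)
denom = np.sum(np.abs(G.conj().T @ A) ** 2, axis=0)   # a^H G G^H a
P_music = 1.0 / denom

is_peak = (P_music > np.roll(P_music, 1)) & (P_music > np.roll(P_music, -1))
peaks = np.flatnonzero(is_peak)
top = peaks[np.argsort(P_music[peaks])[-K:]]
print(np.sort(grid[top]))                 # ~ [0.5, 1.2]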

Pisarenko Method = MUSIC with M = K + 1.

Remarks:
The Pisarenko method is not as good as MUSIC.
M in MUSIC should not be too large, due to the poor accuracy of
r̂(k) for large k.

ESPRIT Method
(Estimation of Signal Parameters via Rotational Invariance
Techniques)

        [ 1                ...   1               ]
A_M  =  [ e^{−jω_1}        ...   e^{−jω_K}       ]
        [    ...                 ...             ]
        [ e^{−j(M−1)ω_1}   ...   e^{−j(M−1)ω_K}  ]

Let B_1 = first M − 1 rows of A_M,
    B_2 = last M − 1 rows of A_M.

Then B_2 D = B_1, where

     [ e^{jω_1}       0      ]
D  = [         ...           ]
     [  0         e^{jω_K}   ]

Let S_1 and S_2 be formed from S the same way as B_1 and B_2 are
formed from A_M.

Recall that S and A_M span the same subspace, so S = A_M C for a
K × K non-singular C. Then

S_1 = B_1 C = B_2 D C
S_2 = B_2 C   =>   B_2 = S_2 C^{−1}

S_1 = S_2 C^{−1} D C = S_2 Φ,   where Φ = (S_2^H S_2)^{−1} S_2^H S_1.

The diagonal elements of D are the eigenvalues of Φ.

Steps of ESPRIT:
Step 1: Compute Φ̂ = (Ŝ_2^H Ŝ_2)^{−1} Ŝ_2^H Ŝ_1.
Step 2: The frequency estimates are the angular positions of the
eigenvalues of Φ̂.

Remarks:
Ŝ_1 ≈ Ŝ_2 Φ̂ can also be solved with the Total Least Squares method.
Since Φ is a K × K matrix, we do not need to pick the K roots
nearest the unit circle, which could be wrong roots.
ESPRIT does not require a search over the parameter space, as
required by Spectral MUSIC.
All of these remarks make ESPRIT a recommended method!
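A minimal ESPRIT sketch in Python (numpy only; data and the M, K
choices are illustrative assumptions; plain LS is used for Φ̂, though
TLS is also possible, as remarked above):

import numpy as np

# --- ESPRIT: a minimal sketch ---
N, K, M = 200, 2, 12                      # illustrative choices, M > K
n = np.arange(N)
y = np.exp(1j * 0.5 * n) + np.exp(1j * 1.2 * n)
y += 0.1 * (np.random.randn(N) + 1j * np.random.randn(N))

# Sample covariance of y_M(n) = [y(n), ..., y(n-M+1)]^T.
snaps = np.array([y[t - M + 1:t + 1][::-1] for t in range(M - 1, N)])
R = snaps.T @ snaps.conj() / N

# Signal subspace: eigenvectors of the K largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R)      # ascending order
S = eigvecs[:, -K:]
S1, S2 = S[:-1, :], S[1:, :]              # first / last M-1 rows

# Step 1: LS solution of S1 = S2 Phi;  Step 2: angles of eigenvalues.
Phi, *_ = np.linalg.lstsq(S2, S1, rcond=None)
w_hat = np.sort(np.mod(np.angle(np.linalg.eigvals(Phi)), 2 * np.pi))
print(w_hat)                              # ~ [0.5, 1.2]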

Sinusoidal Parameter Estimation in the Presence
of Colored Noise via RELAX

y(n) = Σ_{k=1}^{K} α_k e^{jω_k n} + e(n)

α_k:   complex amplitudes, unknown.
ω_k:   unknown frequencies.
e(n):  unknown AR or ARMA noise.

Consider the Non-linear Least-Squares (NLS) method:

g = Σ_{n=0}^{N−1} | y(n) − Σ_{k=1}^{K} α_k e^{jω_k n} |².

Remarks:
α̂_k and ω̂_k, k = 1, ..., K, are found by minimizing g.
When e(n) is zero-mean Gaussian white noise, this NLS method is the
ML method.
When e(n) is non-white noise, the NLS method gives asymptotically
(N → ∞) statistically efficient estimates of ω_k and α_k, despite
the fact that NLS is not an ML method for this case.
The non-linear minimization is a difficult problem.

Remarks:
Concentrating out {α_k} gives

ω̂ = argmax_ω  y^H B (B^H B)^{−1} B^H y,   α̂ = (B^H B)^{−1} B^H y.

Concentrating out {ω_k}, instead of simplifying the problem,
actually complicates the problem.
The RELAX algorithm is a relaxation-based optimization approach.
RELAX is both computationally and conceptually simple.

Preparation:
Let

y_k(n) = y(n) − Σ_{i=1, i != k}^{K} α̂_i e^{jω̂_i n}

* α̂_i and ω̂_i, i != k, are assumed given, known, or estimated.

Let

g_k = Σ_{n=0}^{N−1} | y_k(n) − α_k e^{jω_k n} |².

* Minimizing g_k gives:

ω̂_k = argmax_{ω_k} | Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n} |²

α̂_k = [ (1/N) Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n} ]_{ω_k = ω̂_k}

Remarks:
Σ_{n=0}^{N−1} y_k(n) e^{−jω_k n}  is the DTFT of y_k(n)!
(It can be computed via FFT and zero-padding.)
ω̂_k corresponds to the peak of the Periodogram of y_k(n)!
α̂_k is the peak height (a complex number!) of the DTFT of y_k(n)
(at ω̂_k) divided by N.
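These two closed-form updates are the whole inner computation of
RELAX. A minimal Python helper (numpy; the FFT size is an arbitrary
zero-padding choice, and the function name is ours, not the slides'):

import numpy as np

def estimate_one_sinusoid(yk, nfft=None):
    """One RELAX sub-step: frequency from the Periodogram peak of yk,
    amplitude from the DTFT peak height divided by N (a sketch)."""
    N = len(yk)
    nfft = nfft or 16 * N                  # zero-padding refines the peak
    Y = np.fft.fft(yk, nfft)
    kmax = np.argmax(np.abs(Y))
    w_hat = 2 * np.pi * kmax / nfft        # peak location
    a_hat = Y[kmax] / N                    # complex peak height / N
    return w_hat, a_hat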

The RELAX Algorithm

Step 1: Assume K = 1. Obtain ω̂_1 and α̂_1 from y(n).
Step 2: Assume K = 2. Obtain y_2(n) by using ω̂_1 and α̂_1 obtained
from Step 1. Obtain ω̂_2 and α̂_2 from y_2(n).
Obtain y_1(n) by using ω̂_2 and α̂_2, and re-estimate ω̂_1 and α̂_1
from y_1(n).
Iterate until convergence.
Step 3: Assume K = 3.
Obtain y_3(n) from ω̂_1, α̂_1, ω̂_2, α̂_2. Obtain ω̂_3 and α̂_3 from y_3(n).
Obtain y_1(n) from ω̂_2, α̂_2, ω̂_3, α̂_3. Re-estimate ω̂_1 and α̂_1 from y_1(n).
Obtain y_2(n) from ω̂_1, α̂_1, ω̂_3, α̂_3. Re-estimate ω̂_2 and α̂_2 from y_2(n).
Iterate until g does not decrease significantly anymore!

Step 4: Assume K = 4, ...
Continue until K is large enough!

Remarks:
RELAX is found to perform better than the existing high-resolution
algorithms, especially in obtaining better α̂_k, k = 1, ..., K.
RELAX is more robust to the choice of K and to data-model errors.
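Putting the pieces together, a compact sketch of the RELAX loop in
Python (it reuses the estimate_one_sinusoid helper above; a fixed
number of inner iterations stands in for the "until g stops
decreasing" test, which is a simplification of ours):

import numpy as np

def relax(y, K, n_iter=30):
    """RELAX sketch: cyclically re-estimate each sinusoid from the
    residual left after subtracting the current estimates of the others."""
    N = len(y)
    n = np.arange(N)
    w = np.zeros(K)
    a = np.zeros(K, dtype=complex)
    for k in range(K):                     # Steps 1, 2, 3, ...: grow the model
        for _ in range(n_iter):            # inner re-estimation cycle
            for i in range(k + 1):
                others = sum(a[j] * np.exp(1j * w[j] * n)
                             for j in range(k + 1) if j != i)
                w[i], a[i] = estimate_one_sinusoid(y - others)
    return w, a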
