Ch7-Short-Time Fourier Transform Analysis and Synthesis

Speech Processing
Short-Time Fourier Transform Analysis and Synthesis

Short-Time Fourier Transform Analysis and Synthesis
Minimum-Phase Synthesis
Speech and audio signals vary with time and can be considered
stochastic signals that carry information.
This necessitates short-time analysis, since a single Fourier
transform (FT) cannot characterize changes in spectral content
over time (i.e., time-varying formants and harmonics).
The discrete-time short-time Fourier transform (STFT) consists of a
separate FT of the signal in the neighborhood of each time instant.
Discrete STFT vs. discrete-time STFT:
The discrete-time STFT is continuous in frequency.
In the discrete STFT, the FT in the STFT analysis is replaced by the discrete FT (DFT).
The resulting STFT is discrete in both time and frequency.
In Linear Prediction and Homomorphic Processing, an underlying
model of the source/filter is assumed. This leads to
model-based analysis/synthesis. Note also that both of these
analysis methods implicitly used short-time analysis
methods (to be presented).
In short-time analysis systems no such model restrictions apply.
December 30, 2015
Veton Kpuska
Short-Time Analysis (STFT)

Two views of the STFT are explored:
1. Fourier-transform view
2. Filter-bank view
Fourier-Transform View
Recall (from Chapter 3):
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
x[n]: time-domain signal.
w[n]: analysis window (or analysis filter), a finite-length, symmetric
sequence of length Nw, with w[n] ≠ 0 for n in [0, Nw-1].
fn[m] = x[m]w[n-m]: short-time section of x[m] at time n, that is, the
signal in frame n.
X(n,ω): Fourier transform of the short-time windowed section fn[m].
Computing the DFT:
$$X(n,k) = X(n,\omega)\Big|_{\omega = \frac{2\pi}{N}k}$$
Thus X(n,k) is the STFT evaluated at every ω = (2π/N)k.
Frequency sampling interval = 2π/N
Frequency sampling factor = N
DFT:
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}$$
Example 7.1
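Let x[n] be a periodic impulse train sequence with period P:
$$x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$$
[Figure: impulse train x[n]; axis marks at -P, 2P, 3P]
Also let w[n] be a triangle of length P:
[Figure: triangular window w[n], P points wide; axis marks at -P/2 and P/2+1]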
Example 7.1
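$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}
= \sum_{m=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \delta[m-lP]\, w[n-m]\, e^{-j\omega m}
= \sum_{l=-\infty}^{\infty} w[n-lP]\, e^{-j\omega lP}$$
The summand is non-zero only for m = lP.
Each term is a window located at lP with linear phase -ωlP.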
Example 7.1
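Since the shifted windows w[n-lP] do not overlap, for each n |X(n,ω)| is
constant in ω and the phase of X(n,ω) is linear in ω.
Computation of the DFT for N = P gives:
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{P}km}
= \sum_{l=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \delta[m-lP]\, w[n-m]\, e^{-j\frac{2\pi}{P}km}
= \sum_{l=-\infty}^{\infty} w[n-lP]\, e^{-j\frac{2\pi}{P}k\,lP}
= \sum_{l=-\infty}^{\infty} w[n-lP]$$
since e^{-j2πkl} = 1. For each n, X(n,k) is therefore constant in k:
it is the DFT of translated, non-overlapping windows with a
phase shift of zero (due to the frequency sampling).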
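The result of Example 7.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper name `stft_direct`, the period P = 8, the signal length, and the causal triangular window built from `np.bartlett` are choices made here, not part of the slides. It evaluates the DFT form of the STFT directly from its definition.

```python
import numpy as np

def stft_direct(x, w, N):
    """X[n, k] = sum_m x[m] w[n-m] exp(-j 2 pi k m / N) for a causal window w."""
    Nw = len(w)
    k = np.arange(N)
    X = np.zeros((len(x), N), dtype=complex)
    for n in range(len(x)):
        # w[n-m] is non-zero only for m in [n-Nw+1, n]; x[m] is taken as 0 for m < 0
        for m in range(max(0, n - Nw + 1), n + 1):
            X[n] += x[m] * w[n - m] * np.exp(-2j * np.pi * k * m / N)
    return X

P = 8                              # assumed period of the impulse train
x = np.zeros(8 * P)
x[::P] = 1.0                       # impulses at n = 0, P, 2P, ...
w = np.bartlett(P + 2)[1:-1]       # triangle with P non-zero points (assumed shape)
X = stft_direct(x, w, N=P)         # DFT length N = P, as in the example
```

With these choices every row of `X` should be real and identical across k, and its value at time n should equal the window sample w[n mod P], which is the "constant" behaviour stated in the example.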
Spectrogram |X(n,ω)|^2
If the analysis window length is less than or equal to a pitch period:
wideband spectrogram (vertical striations).
Otherwise:
narrowband spectrogram (horizontal striations).
How often should the analysis window be applied to the signal?
X(n,k) is decimated by a temporal decimation
factor L:
X(nL,k) = DFT{fnL[m]}
The sections fnL[m] are a subset of the sections fn[m].
How to choose the sampling rates in time (L) and
frequency (N, the FFT length) is addressed in one of
the forthcoming sections.
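A minimal sketch of the time-decimated spectrogram |X(nL,k)|^2 follows. The function name `spectrogram`, the Hamming window, and the test tone are assumptions for illustration. The FFT is taken over each section with a local time origin, which differs from X(nL,k) only by a linear phase and therefore leaves the magnitude unchanged.

```python
import numpy as np

def spectrogram(x, w, L, N):
    """Return |X(nL, k)|^2 for frames whose window lies fully inside x."""
    Nw = len(w)
    frames = []
    for n in range(Nw - 1, len(x), L):      # frame times, L samples apart
        seg = x[n - Nw + 1 : n + 1]         # x[m] for m = n-Nw+1 .. n
        f = seg * w[::-1]                   # f_n[m] = x[m] w[n-m]
        frames.append(np.abs(np.fft.fft(f, N)) ** 2)
    return np.array(frames)

N = 64
t = np.arange(256)
tone = np.cos(2 * np.pi * 8 * t / N)        # sinusoid centered on DFT bin 8
S = spectrogram(tone, np.hamming(N), L=16, N=N)
peak_bins = S[:, : N // 2 + 1].argmax(axis=1)
```

For this stationary tone every frame should peak at bin 8, giving the horizontal line expected of a long-window (narrowband) analysis.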
[Figure: analysis window w[pL-m] sliding over x[m] in steps of L samples, shown for p = 1, 2, 3]
Spectrogram |X(n,ω)|^2
Note that, in ω, X(n,ω) is periodic with period 2π (same as the Fourier
transform) and, for real sequences, is Hermitian symmetric: X(n,ω) = X*(n,-ω).
For real sequences:
Re{X(n,ω)} or |X(n,ω)| is symmetric
Im{X(n,ω)} or arg{X(n,ω)} is anti-symmetric
A time shift results in a linear phase shift (same as in the Fourier
transform). For the shifted sequence x[n-n0]:
$$\tilde{X}(n,\omega) = \sum_{m=-\infty}^{\infty} x[m-n_0]\, w[n-m]\, e^{-j\omega m}
= \sum_{q=-\infty}^{\infty} x[q]\, w[n-n_0-q]\, e^{-j\omega (q+n_0)}
= e^{-j\omega n_0} \sum_{q=-\infty}^{\infty} x[q]\, w[n-n_0-q]\, e^{-j\omega q}
= e^{-j\omega n_0}\, X(n-n_0,\omega)$$
Thus, a shift by n0 in the original time sequence introduces a linear
phase, but also a shift in time, corresponding to a shift of each
short-time section by n0.
Filtering View
In this interpretation the analysis window w[n] is considered to be
the impulse response of a filter.
Thus w[n] is referred to as the analysis filter.
Let us fix the value ω = ωo:
$$X(n,\omega_o) = \sum_{m=-\infty}^{\infty} \left( x[m]\, e^{-j\omega_o m} \right) w[n-m]$$
The above equation represents the convolution of
the sequence x[n]e^{-jωo n} with the sequence w[n].
Thus:
$$X(n,\omega_o) = \left( x[n]\, e^{-j\omega_o n} \right) * w[n]$$
Filtering View
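The product:
x[n]e^{-jωo n}
is a modulation of x[n]: it shifts the spectrum of x[n] so that the
content at frequency ωo moves to the origin, where it is passed by
the lowpass analysis filter w[n].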
Filtering View
Alternate view:
The discrete STFT can also be
interpreted from the filtering
viewpoint.
$$X(n,\omega_o) = e^{-j\omega_o n} \left( x[n] * w[n]\, e^{j\omega_o n} \right)$$
$$X(n,k) = e^{-j\frac{2\pi}{N}kn} \left( x[n] * w[n]\, e^{j\frac{2\pi}{N}kn} \right)$$
This equation leads to the
interpretation of the discrete
STFT as the output of the
filter bank shown on the next
slide.
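The equivalence of the Fourier-transform view and the two filtering views can be checked numerically. The sketch below assumes a causal Hamming window and an arbitrary test frequency; the function names are made up for this illustration.

```python
import numpy as np

def stft_definition(x, w, omega0):
    """X(n, w0) = sum_m x[m] w[n-m] exp(-j w0 m), straight from the definition."""
    Nw = len(w)
    X = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)
        X[n] = np.sum(x[m] * w[n - m] * np.exp(-1j * omega0 * m))
    return X

def stft_lowpass_view(x, w, omega0):
    """Modulate x[n] by exp(-j w0 n), then filter with the lowpass window w[n]."""
    n = np.arange(len(x))
    return np.convolve(x * np.exp(-1j * omega0 * n), w)[: len(x)]

def stft_bandpass_view(x, w, omega0):
    """Filter x[n] with the bandpass filter w[n] exp(j w0 n), then demodulate."""
    n = np.arange(len(x))
    h = w * np.exp(1j * omega0 * np.arange(len(w)))
    return np.exp(-1j * omega0 * n) * np.convolve(x, h)[: len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
w = np.hamming(9)
omega0 = 2 * np.pi * 5 / 32
X_def = stft_definition(x, w, omega0)
X_low = stft_lowpass_view(x, w, omega0)
X_band = stft_bandpass_view(x, w, omega0)
```

All three arrays should agree to numerical precision. The truncation `[: len(x)]` keeps only the times n = 0, ..., len(x)-1 of the full convolution.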
Filtering View
[Figure 7.5 (slide 18): filter-bank interpretation of the discrete STFT; figure not reproduced]
Filtering View
General Properties:
1. If x[n] has length N and w[n] has
length M, then X(n,ω) has length
N+M-1 along n (the length of a linear convolution).
2. The bandwidth of the sequence X(n,ωo) is less than or
equal to that of w[n].
3. The sequence X(n,ωo) has its spectrum
centered at the origin.
Example 7.2
Consider a Gaussian window of the form:
$$w[n] = e^{-a(n-n_o)^2}$$
The discrete STFT with DFT length N can therefore be
considered as a bank of filters with impulse responses:
$$h_k[n] = e^{-a(n-n_o)^2}\, e^{j\frac{2\pi}{N}kn}$$
For x[n] = δ[n], x[n]*hk[n] = hk[n].
If N = 50, corresponding to bandpass filters spaced by
200 Hz for a sampling rate of 10000 samples/s,
then:
Example 7.2
For k = 0, 5, 10, 15 the following is
obtained:
$$h_0[n] = e^{-a(n-n_o)^2}\, e^{j\frac{2\pi}{50}0n} = e^{-a(n-n_o)^2}$$
$$h_5[n] = e^{-a(n-n_o)^2}\, e^{j\frac{2\pi}{50}5n}$$
$$h_{10}[n] = e^{-a(n-n_o)^2}\, e^{j\frac{2\pi}{50}10n}$$
$$h_{15}[n] = e^{-a(n-n_o)^2}\, e^{j\frac{2\pi}{50}15n}$$
Example 7.2
[Figure 7.6: the bandpass filters hk[n] for k = 0, 5, 10, 15; figure not reproduced]
Example 7.3
Consider the filter bank of Example 7.2, which was designed
with a Gaussian window of the form:
$$w[n] = e^{-a(n-n_o)^2}$$
Figure 7.7 shows the Fourier transform magnitudes of the outputs of the
four complex bandpass filters hk[n] for k = 0, 5, 10, and 15, as presented
on the previous slides and depicted in Figure 7.6.
Example 7.3
After Demodulation the resulting bandpass outputs
have the same spectral shape as in the figure but
centered at the origin.
Time-Frequency Resolution Tradeoffs
As discussed in Chapter 3, the basic issue in analysis window selection is the
compromise required between a long window for showing signal detail
in frequency and a short window for representing fine
temporal structure:
$$X(n,\omega) = \mathcal{F}\{f_n[m]\} = \mathcal{F}\{x[m]\, w[n-m]\}
= \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\theta)\, e^{j\theta n}\, X(\omega+\theta)\, d\theta$$
Since both X(ω) and W(ω) are periodic over 2π, the linear convolution is
essentially circular.
From the equation above:
W(ω) smears (smooths) X(ω).
We want W(ω) as narrow as possible, ideally W(ω) = δ(ω), for good
frequency resolution.
W(ω) = δ(ω) would require an infinitely long w[n], which gives
poor time resolution.
These are conflicting goals.
Example 7.4
Figure 7.8 depicts time-frequency resolution
tradeoff:
Tradeoffs
From the previous example, the smoothing interpretation of the
STFT is not valid for non-stationary sequences.
For steady signals, long analysis windows are appropriate;
they yield good frequency resolution, as depicted in
the next figure.
Tradeoffs
However, for short and transient signals (plosive
speech, flaps, diphthongs, etc.), short windows are
preferred in order to capture temporal events.
Shorter windows yield poorer frequency resolution.
Short-Time Synthesis
How can the original sequence be recovered from its
discrete-time STFT?
The inversion is represented mathematically by a
synthesis equation, which expresses a sequence in
terms of its discrete-time STFT.
Recall that for fn[m] = x[m]w[n-m]:
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\, e^{-j\omega m}$$
Thus X(n,ω) and fn[m] form a Fourier transform pair:
$$X(n,\omega) \leftrightarrow f_n[m] = x[m]\, w[n-m]$$
If w[n] ≠ 0 then recovery is complete.
For each n, taking the inverse Fourier transform of the
corresponding function of frequency yields the
sequence fn[m].
Evaluating fn[m] at m = n gives
x[n]w[0].
For w[0] ≠ 0, x[n] can be obtained as fn[n]/w[0].
The process of taking the inverse Fourier transform of
X(n,ω) for a specific n and then dividing by w[0] is
represented by the following relation:
$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$
which is the synthesis equation for the discrete-time
STFT.
In contrast to the discrete-time STFT X(n,ω), the
discrete STFT X(n,k) is not always invertible.
Example 1.
Consider the case when w[n] is bandlimited with
bandwidth B.
Note that if there are frequency components of x[n] which
do not pass through any of the filter regions of the
discrete STFT, then
the discrete STFT is not a unique representation of x[n], and
x[n] cannot be recovered from it (the STFT is not invertible).
Example 2.
Consider X(n,k) decimated in time by a factor L, i.e., the
STFT is computed every L samples.
w[n] is non-zero over its length Nw.
If L > Nw then there are gaps in time where x[n] is not
represented.
Thus in such cases, again, x[n] cannot be recovered.
[Figure: case L > Nw; windows w[pL-m] of length Nw placed every L samples leave gaps in x[m]]
Conclusion:
Constraints must be adopted to ensure
uniqueness and invertibility:
1. Proper/adequate frequency sampling: the frequency sampling
interval 2π/N must not exceed the window bandwidth B
(2π/N ≤ 2π/Nw is sufficient).
2. Proper temporal decimation: L ≤ Nw
Filter Bank Summation (FBS) Method
The traditional short-time synthesis method is
commonly referred to as the Filter Bank Summation
(FBS) method.
FBS is best described in terms of the filtering
interpretation of the discrete STFT.
The discrete STFT is considered to be the set of
outputs of a bank of filters.
The output of each filter is modulated with a complex
exponential.
The modulated filter outputs are summed at each instant
of time to obtain the corresponding time sample of
the original sequence (see Figure 7.5(b) on slide
18).

Filter Bank Summation (FBS) Method
Recall the synthesis equation given earlier:
$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$
The FBS method carries out a discrete version of this
equation by utilizing the discrete STFT X(n,k):
$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}kn}$$
Next, derive conditions that ensure
y[n] = x[n].

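Filter Bank Summation (FBS) Method
From Figure 7.5, analysis followed by synthesis gives:
$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1}
\underbrace{\left[ \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km} \right]}_{X(n,k)}
e^{j\frac{2\pi}{N}kn}$$
Interchanging the order of summation, this equation
reduces to:
$$y[n] = \frac{1}{N\, w[0]}\; x[n] * \left( w[n] \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} \right)$$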

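Filter Bank Summation (FBS) Method
Furthermore, starting from
$$y[n] = \frac{1}{N\, w[0]}\; x[n] * \left( w[n] \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} \right)$$
the sum of complex exponentials is a periodic impulse train with period N, so that
$$y[n] = \frac{1}{N\, w[0]}\; x[n] * \left( w[n]\, N \sum_{r=-\infty}^{\infty} \delta[n-rN] \right)$$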
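The step that turns the sum of complex exponentials into a periodic impulse train is the standard finite geometric-series identity; it is written out here for completeness.

```latex
\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} =
\begin{cases}
N, & n = rN \ \text{(every term equals 1)} \\[4pt]
\dfrac{1 - e^{j2\pi n}}{1 - e^{j\frac{2\pi}{N}n}} = 0, & \text{otherwise}
\end{cases}
\;=\; N \sum_{r=-\infty}^{\infty} \delta[n - rN]
```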

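Filter Bank Summation (FBS) Method
Thus:
$$y[n] = \frac{1}{w[0]}\; x[n] * \left( w[n] \sum_{r=-\infty}^{\infty} \delta[n-rN] \right)$$
y[n] is the output of the convolution of x[n] with the
product of the analysis window and a periodic
impulse sequence.
Note:
$$w[n] \sum_{r=-\infty}^{\infty} \delta[n-rN]$$
reduces to w[0]δ[n], so that y[n] = x[n], if:
the window length Nw ≤ N, or
for Nw > N, w[rN] = 0 for r ≠ 0, that is
$$w[rN] = 0 \quad \text{for } r = \pm 1, \pm 2, \pm 3, \ldots$$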


Filter Bank Summation (FBS) Method
This constraint is known as the FBS constraint.
It must be fulfilled in order to ensure exact signal
synthesis with the FBS method.
This constraint is commonly expressed in the frequency
domain:
$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$
This expression states that the frequency responses of
the analysis filters should sum to a constant across the
entire bandwidth.
We conclude this discussion by stating that a filter
bank with N filters, based on an analysis filter of length
less than or equal to N, is always an all-pass system.
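The FBS constraint can be illustrated with a short numerical experiment. This is a sketch under assumptions: the helper names `stft` and `fbs_synthesis`, the causal Hamming window (chosen because w[0] ≠ 0), and the random test signal are not from the slides.

```python
import numpy as np

def stft(x, w, N):
    """Discrete STFT X[n, k] at every time n (L = 1), causal window of length Nw."""
    Nw = len(w)
    k = np.arange(N)
    X = np.zeros((len(x), N), dtype=complex)
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)
        X[n] = (x[m] * w[n - m]) @ np.exp(-2j * np.pi * np.outer(m, k) / N)
    return X

def fbs_synthesis(X, w0):
    """y[n] = 1/(N w[0]) * sum_k X(n, k) exp(j 2 pi k n / N)."""
    n_times, N = X.shape
    n = np.arange(n_times)[:, None]
    k = np.arange(N)[None, :]
    return np.sum(X * np.exp(2j * np.pi * k * n / N), axis=1) / (N * w0)

rng = np.random.default_rng(1)
x = rng.standard_normal(40)
w = np.hamming(8)                                  # Nw = 8

y_ok = fbs_synthesis(stft(x, w, N=8), w[0])        # Nw <= N: constraint satisfied
y_bad = fbs_synthesis(stft(x, w, N=4), w[0])       # Nw > N and w[4] != 0: violated
```

With N = 8 the output should reproduce x[n]. With N = 4 the window sample w[4] is non-zero, so y[n] picks up an extra term x[n-4]w[4]/w[0] and no longer equals x[n].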
Generalized FBS Method
Note:
$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \sum_{r=-\infty}^{\infty} f[n, n-r]\, X(r,\omega) \right] e^{j\omega n}\, d\omega$$
The smoothing function f[n,m] is referred to as the time-varying synthesis filter.
It can be shown that any f[n,m] that fulfills the condition
below makes the synthesis equation above valid (Exercise
7.6):
$$\sum_{m=-\infty}^{\infty} f[n,-m]\, w[m] = 1$$
Note also that the basic FBS method is obtained by setting
the synthesis filter to be a non-smoothing filter
(for a window normalized so that w[0] = 1):
f[n,m] = δ[m]

Consider the discrete STFT with decimation factor
L. The generalized FBS expression for the synthesized signal is
given by:
$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} \sum_{k=0}^{N-1} f[n, n-rL]\, X(rL,k)\, e^{j\frac{2\pi}{N}nk}$$
Furthermore, consider a time-invariant smoothing
filter:
f[n,m] = f[m]
That is:
f[n,n-rL] = f[n-rL]

Thus
$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} f[n-rL] \sum_{k=0}^{N-1} X(rL,k)\, e^{j\frac{2\pi}{N}nk}$$
This equation yields y[n] = x[n] when the following constraint is
satisfied by the analysis and synthesis filters as well as
the temporal decimation and frequency sampling factors:
$$L \sum_{r=-\infty}^{\infty} f[n-rL]\, w[rL-n+pN] = \delta[p], \quad \text{for all } n$$
For f[m] = δ[m] and L = 1 this method reduces to the basic
FBS method.

We are interested in the case L > 1 and in using f[n] as an
interpolator.
Interpolation FBS methods:
1. Helical interpolation (Portnoff)
2. Weighted overlap-add method (Crochiere)
Overlap-Add (OLA) Method
The FBS method was motivated by the filtering view of the STFT.
The OLA method was motivated by the Fourier-transform view of
the STFT.
In the OLA method:
1. The inverse DFT is taken for each fixed time in the discrete STFT.
2. An overlap-and-add operation between the short-time sections is
performed.
This works provided that the analysis window is designed such that
the overlap-and-add operation effectively eliminates the
analysis window from the synthesized sequence.
The basic idea is that the redundancy within overlapping segments
and the averaging of the redundant samples remove the effect
of windowing.

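Recall the short-time synthesis relation:
$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$
If x[n] is averaged over many short-time segments
and normalized by W(0), then
$$x[n] = \frac{1}{2\pi\, W(0)} \int_{-\pi}^{\pi} \sum_{p=-\infty}^{\infty} X(p,\omega)\, e^{j\omega n}\, d\omega$$
where
$$W(0) = \sum_{n=-\infty}^{\infty} w[n]$$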
The discretized version of OLA is given by:
$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn} \right]$$
The bracketed term is an IDFT: f_p[n] = x[n]w[p-n].
Note that this IDFT relation holds provided that N > Nw. The
expression for y[n] thus becomes:
$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} x[n]\, w[p-n] = x[n]\, \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} w[p-n]$$
Provided that
$$\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$$
then
y[n] = x[n]
This condition is always true, because the sum of the
values of a sequence always equals the value
of its Fourier transform at ω = 0
(the DC value of a signal is
by definition the sum of the signal
values).
For decimation in time by a factor of L, the window must satisfy (Exercise 7.4):
$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
Then x[n] can be synthesized using the following equation:
$$y[n] = \frac{L}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(pL,k)\, e^{j\frac{2\pi}{N}kn} \right]$$
The window condition above is the general constraint imposed by the OLA
method. It requires the sum of all the analysis windows
(obtained by sliding w[n] with L-point increments) to add up to a
constant, as shown in the next figure.
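A minimal sketch of OLA analysis/synthesis with temporal decimation follows. The function name `ola_roundtrip` and the periodic Hann window are assumptions for illustration. As a shortcut, each section is transformed with a local time origin; that differs from X(pL,k) only by a linear phase, which cancels in the DFT/IDFT round trip.

```python
import numpy as np

def ola_roundtrip(x, w, L, N):
    """Analyze x with window w every L samples, then resynthesize by overlap-add."""
    Nw = len(w)
    assert N >= Nw, "DFT length must cover the window to avoid time aliasing"
    W0 = w.sum()                                  # W(0) = sum_n w[n]
    y = np.zeros(len(x))
    for pL in range(0, len(x) + Nw, L):           # frame positions pL
        m = np.arange(pL - Nw + 1, pL + 1)        # support of w[pL - m]
        valid = (m >= 0) & (m < len(x))
        f = np.zeros(Nw)
        f[valid] = x[m[valid]] * w[pL - m[valid]] # f_pL[m] = x[m] w[pL - m]
        F = np.fft.fft(f, N)                      # analysis (DFT of the section)
        f_rec = np.fft.ifft(F).real[:Nw]          # synthesis step 1: inverse DFT
        y[m[valid]] += f_rec[valid]               # synthesis step 2: overlap-add
    return y * L / W0

Nw = 16
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(Nw) / Nw)   # periodic Hann window
rng = np.random.default_rng(2)
x = rng.standard_normal(100)

y_good = ola_roundtrip(x, w, L=4, N=16)    # shifted windows sum to W(0)/L
y_bad = ola_roundtrip(x, w, L=16, N=16)    # no overlap: the sum is not constant
```

For the periodic Hann window, hops of Nw/4 (or Nw/2) make the shifted windows add to the constant W(0)/L, so `y_good` should reproduce x[n]. With L = Nw the window shape remains imprinted on the output.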
Duality of the OLA constraint and the FBS constraint:
FBS:
$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$
OLA:
$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
The FBS method requires that finite-length windows have a length Nw less than the
number of analysis filters N to satisfy the FBS constraint (N > Nw).
Analogously, for the OLA method it can be shown that its constraint is satisfied by all
finite-bandwidth analysis windows whose maximum frequency is less than 2π/L
(where L is the temporal decimation factor).
In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted
window transform replicas to take on the value zero at the frequency origin ω = 0:
$$W(\omega) = 0 \quad \text{at} \quad \omega = \frac{2\pi}{L}k, \quad k \neq 0$$
This is analogous to the FBS constraint for Nw > N, where the window w[n] is required to take
on the value zero at n = N, 2N, 3N, ...
Time-Frequency Sampling
This section gives a different, qualitative view of the time-frequency
sampling concepts for the OLA and FBS constraints, from
the perspective of classical time-domain and
frequency-domain aliasing.
The following discussion serves as an additional summary of the
sampling issues for the two methods and gives
motivation for the earlier statement that sufficient but
not necessary conditions for invertibility of the
discrete STFT are:
1. The analysis window is non-zero over its finite length Nw.
2. The temporal decimation factor L ≤ Nw.
3. The frequency sampling interval 2π/N ≤ 2π/Nw.
Consider the windowed (short-time) signal:
fn[m] = x[m]w[n-m], and
X(n,ω): Fourier transform of fn[m]
The analysis window has duration Nw.
From the Fourier-transform point of view:
Reconstruction of fn[m] from X(n,k) requires a frequency
sampling interval of 2π/Nw or finer.
From the time-domain point of view:
The time decimation interval L is required to meet the Nyquist
criterion based on the bandwidth of the window w[n].
This implies sampling X(n,k) at a time interval
L ≤ 2π/ωc to avoid frequency-domain aliasing of the
time sequence X(n,ω).
ωc is the bandwidth of W(ω).
[Figure: W(ω) with band edges marked at -ωc and ωc]
Sufficient (but not necessary) conditions for
signal reconstruction are:
1. The window is non-zero over its length Nw.
2. Temporal decimation factor L ≤ Nw (L ≤ 2π/ωc).
3. Frequency sampling interval 2π/N ≤ 2π/Nw.
To avoid aliasing:
I. In the time domain: ensure condition 3.
II. In the frequency domain: ensure condition 2.
Time Decimation Sampling
Implication for the use of practical windows:
I. Rectangular window of length Nw
Assuming a bandwidth equal
to the extent of the main lobe:
B = [-2π/Nw, 2π/Nw] = 4π/Nw
$$L \le \frac{2\pi}{B} = \frac{N_w}{2}$$
that is, 50% overlap of the windows.
II. Hamming window of length Nw
Bandwidth B = 8π/Nw
$$L \le \frac{2\pi}{B} = \frac{N_w}{4}$$
that is, 75% overlap of the windows.
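The arithmetic above can be written as a small helper. This is an illustrative sketch only: the function names are made up, and the main-lobe width is expressed as an integer multiple c of 2π/Nw (c = 2 for the rectangular window, c = 4 for the Hamming window, as assumed on this slide) so that the hop can be computed with integer arithmetic.

```python
def max_decimation(Nw, c):
    """Largest integer L with L <= 2*pi/B, where B = c * (2*pi/Nw)."""
    return Nw // c

def overlap_fraction(Nw, L):
    """Fraction of each window shared with the next one for hop L."""
    return 1.0 - L / Nw

Nw = 256
L_rect = max_decimation(Nw, 2)     # rectangular window: B = 4*pi/Nw
L_hamm = max_decimation(Nw, 4)     # Hamming window:     B = 8*pi/Nw
```

For Nw = 256 this gives hops of 128 and 64 samples, i.e., 50% and 75% overlap.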
Summary
OLA Method (DFT of order N)
1. No time-domain aliasing occurs if the window length Nw is such that
2π/N ≤ 2π/Nw, i.e., Nw ≤ N.
2. No frequency-domain aliasing occurs if the decimation
factor L is small enough that the filter bandwidth
ωc ≤ 2π/L.
3. If zeros are allowed in W(ω), then condition 2 can be
relaxed. In this case L can exceed the limit of condition 2,
so that aliasing occurs in frequency, and the sequence can
still be recovered.
Summary
FBS Method
1. No frequency-domain aliasing occurs if the
decimation factor L meets the Nyquist criterion, i.e.,
L ≤ Nw (L ≤ 2π/ωc), where ωc is the bandwidth of w[n].
2. No time-domain aliasing occurs if 2π/N ≤ 2π/Nw, i.e.,
Nw ≤ N.
3. If zeros in w[n] are allowed, then condition 2 can be
relaxed. In this case N can be smaller than Nw, so that
aliasing occurs in time, and the sequence can still be recovered.

Short-Time Fourier Transform Magnitude (STFTM)
The spectrogram is a major tool in speech applications:
the spectrogram is the squared STFT magnitude (STFTM).
It has been suggested that the human ear extracts
perceptual information strictly from a spectrogram-like
representation of speech (J.C. Anderson,
Speech Analysis/Synthesis Based on Perception,
PhD Thesis, MIT, 1984).
Experienced speech researchers have trained
themselves to read the spectrogram itself (Victor
Zue, MIT).
This is the primary topic of FIT-ece5528 Acoustics of
American Speech.

Short-Time Fourier Transform Magnitude (STFTM)
The STFTM discards phase information. It nevertheless has
numerous uses in application areas such as:
Time-scale modification
Speech enhancement
In all these applications, estimating the phase of the
speech is difficult (e.g., in the presence of noise in the signal).
Furthermore, a number of techniques have been
developed to obtain a phase estimate from the STFT
magnitude.
This section introduces the STFTM as an alternative
time-frequency signal representation.
In addition, analysis and synthesis techniques will be
developed for the STFTM.

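Short-Time Fourier Transform Magnitude (STFTM)
Squared-magnitude and autocorrelation
relationship:
$$r[n,m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(n,\omega)|^2\, e^{j\omega m}\, d\omega$$
$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$
r[n,m]: short-time autocorrelation
|X(n,ω)|^2: squared short-time magnitude
m: autocorrelation lag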

Short-Time Fourier Transform Magnitude (STFTM)
Furthermore, the autocorrelation r[n,m] is given by
the convolution of the short-time section with its
time-reversed version:
r[n,m] = fn[m]*fn[-m]
where
fn[m] = x[m]w[n-m]
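The relation between the squared magnitude and the short-time autocorrelation can be verified with a DFT, provided the DFT length is at least 2Nw-1 so that the autocorrelation does not alias in time. In the sketch below, the random vector `f` stands in for one short-time section fn[m]; this and the lengths are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
Nw = 8
f = rng.standard_normal(Nw)          # stands in for one section f_n[m]
N = 2 * Nw                           # N >= 2*Nw - 1 avoids aliasing of r[n, m]

S = np.abs(np.fft.fft(f, N)) ** 2    # samples of |X(n, w)|^2 at w = 2*pi*k/N
r_from_mag = np.fft.ifft(S).real     # r[n, m] for lags m = 0 .. N-1 (circular)

# Direct autocorrelation; index Nw-1 of the "full" output is lag 0
r_direct = np.correlate(f, f, mode="full")
```

The first Nw entries of `r_from_mag` should match lags 0 to Nw-1 of `r_direct`. The maximum lag reduces to the single product f[0]·f[Nw-1], which is what the extrapolation procedure later in the chapter exploits.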
Signal Representation
Under what conditions can the STFTM be used to
represent a sequence uniquely?
Note that:
|F{x[n]}| = |F{-x[n]}|
This is a sign ambiguity, so the STFTM is not a unique representation
in all cases.
However, by imposing certain mild restrictions on:
the analysis window and
the signal,
unique signal representation is indeed possible with
the discrete-time STFTM.
Suppose x[n] is the sum of two
signals, x1[n] and x2[n], occupying
different regions of the n-axis.
Furthermore, suppose that the
gap of zeros between x1[n] and
x2[n] is large enough that there
is no analysis window position for
which the corresponding short-time
section includes non-zero
samples of both x1[n] and x2[n].
Because of the sign ambiguity,
the STFTM of:
x1[n] + x2[n],
x1[n] - x2[n], and
-x1[n] + x2[n]
is the same.
Any uniqueness conditions must include a
restriction on the length of zero gaps between non-zero
portions of the signal x[n].
Sufficient uniqueness conditions are the following:
1. The analysis window w[n] is a known sequence of
finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2
consecutive zero samples, and the sign of its first
non-zero value is known.
If successive STFTM frames correspond to overlapping
signal segments, then:
If the short-time spectral magnitude of the signal segment at
time n is known, then
the spectral magnitude of the adjacent section at time
n+1 must be consistent, in the region of overlap, with
the known short-time section.
If the analysis window is non-zero and of length Nw,
then after dividing out the analysis window, the first
Nw-1 samples of the segment at time n+1 must equal
the last Nw-1 samples of the segment at time n (as illustrated on
the next slide).
If the last sample of a segment can be extrapolated from
its first Nw-1 values, one can repeat this process to
obtain the entire signal x[n].
To develop the procedure for extrapolating the next
sample of a sequence using its STFTM, assume that the
first Nw-1 samples under the analysis window positioned
at time n are known.
That is, the sequence x[n] has been obtained up to time n-1
from its STFTM.
The goal is to compute the sample x[n] from these initial
samples and the STFT magnitude, |X(n,ω)|, or
equivalently r[n,m].
Note that r[n, Nw-1], the maximum lag of the autocorrelation, is
given by the product of the first and last values of the segment:
$$r[n, N_w-1] = \big( w[0]\, x[n] \big)\big( w[N_w-1]\, x[n-(N_w-1)] \big)$$
The first factor contains the new, unknown sample x[n]; the second
contains the oldest sample of the section, which is already known. Hence:
$$x[n] = \frac{r[n, N_w-1]}{w[0]\, w[N_w-1]\, x[n-(N_w-1)]}$$
Note that:
$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$
If the first value of the short-time section,
x[n-(Nw-1)], happens to be equal to zero, one must find
the first non-zero value within the section and again
use the product relation as in the last
expression.
Note that such a sample can be found because it
was assumed that there are at most Nw-2
consecutive zero samples between any two non-zero
samples of x[n].
Sequential extrapolation algorithm
1. Initialize with x[0].
2. Update time n.
3. Compute r[n,Nw-1] from the inverse DFT
of |X(n,k)|^2.
4. Compute:
$$x[n] = \frac{r[n, N_w-1]}{w[0]\, w[N_w-1]\, x[n-(N_w-1)]}$$
5. Return to step (2) and repeat.
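The steps above can be sketched as follows. This is a simplified illustration under several assumptions that are not on the slide: the signal starts at n = 0 and has no zero samples (so no search for the first non-zero value is needed), the STFTM is available at every n (L = 1), the DFT length satisfies N ≥ 2Nw-1, and x[0] is known including its sign. During start-up (n < Nw-1) the section holds only n+1 signal samples, so the largest useful lag is n rather than Nw-1. The function names are hypothetical.

```python
import numpy as np

def stftm_squared(x, w, N):
    """|X(n, k)|^2 for every n, causal window w of length Nw, N-point DFT."""
    Nw = len(w)
    S = np.zeros((len(x), N))
    for n in range(len(x)):
        f = np.zeros(Nw)                      # f[i] holds x[m] w[n-m], m = n-Nw+1+i
        for i in range(Nw):
            m = n - Nw + 1 + i
            if m >= 0:
                f[i] = x[m] * w[n - m]
        S[n] = np.abs(np.fft.fft(f, N)) ** 2
    return S

def sequential_extrapolation(S, w, x0):
    """Recover x[n] sample by sample from |X(n, k)|^2, given x[0]."""
    Nw = len(w)
    x = np.zeros(S.shape[0])
    x[0] = x0                                 # step 1: initialize
    for n in range(1, len(x)):                # step 2: update time
        r = np.fft.ifft(S[n]).real            # step 3: r[n, m] from the IDFT
        lag = min(n, Nw - 1)                  # largest lag with a non-zero product
        x[n] = r[lag] / (w[0] * w[lag] * x[n - lag])   # step 4
    return x

rng = np.random.default_rng(4)
x = rng.choice([-1.0, 1.0], size=30) * rng.uniform(0.5, 1.5, size=30)
w = np.hamming(6)                             # no zeros over its duration
S = stftm_squared(x, w, N=16)
x_rec = sequential_extrapolation(S, w, x[0])
```

Each recovered sample is used in the denominator of a later step, so errors propagate. With noisy or quantized magnitudes this simple recursion can become unstable, which is one reason iterative least-squares estimation is considered later in the chapter.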
Reconstruction from Time-Frequency Samples
To carry out STFTM analysis on a digital computer, the
discrete STFTM must be used.
The uniqueness theory of the STFTM can be easily extended
to the discrete STFTM.
Uniqueness of the STFTM is based on the short-time
autocorrelation functions.
The autocorrelation functions can be obtained even if
the STFTM is sampled in frequency (discrete
STFTM), given adequate frequency sampling.
To consider the effects of temporal decimation with
factor L, note that adjacent short-time sections
now have an overlap of Nw-L instead of Nw-1.
Reconstruction from Time-Frequency Samples
Sufficient uniqueness conditions for the partial
overlap case:
1. The analysis window w[n] is a known sequence of
finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2L
consecutive zero samples.
3. L consecutive samples of x[n] (from the first non-zero
sample) are known.
These conditions are sufficient but not necessary.
Signal Estimation from the Modified STFT or STFTM
Synthesis of a signal from a time-frequency function that is a
modified STFT or STFTM is required in many applications.
Modification may arise due to:
1. Quantization errors (e.g., from speech coding)
2. Time-varying filtering
3. Speech enhancement
4. Signal rate modifications
Limitations:
Modifications in frequency should result in time modifications
that are restricted to within an analysis window (Figure 7.18,
next slide).
Overlapping sections must undergo similar modifications
(Figure 7.19).

Example 7.5. Removal of an
interfering tone.
Consider modifying a valid
X(n,ω) of the short-time segment
fn[m] = x[m]w[n-m]
by inserting a zero gap
where an unwanted interfering sine-wave
component is known to lie.
The interfering signal is removed
with H(n,ω).
The resulting frequency
representation is:
Y(n,ω) = X(n,ω)H(n,ω)
Inverse transforming it gives
a modified short-time
sequence gn[m] that is non-zero
beyond the extent of the
original short-time segment
fn[m] = x[m]w[n-m].

Example 7.6
At time n:
Suppose a time-decimated
STFT, X(nL,ω), is multiplied
by a linear phase factor
e^{jωno} to obtain
Y(nL,ω) = X(nL,ω)e^{jωno}
At time n+1:
X((n+1)L,ω) is multiplied
by the negative of this linear
phase factor, e^{-jωno}, to obtain
Y((n+1)L,ω) = X((n+1)L,ω)e^{-jωno}
The overlapping sections of the
inverse Fourier transforms,
denoted by gnL[m] and
g(n+1)L[m], are not consistent.
Heuristic Application of STFT Synthesis Methods
Although modifications of the STFT or STFTM may
violate some principles, the results may be reasonable.
The effect of modifying the STFT (FBS and OLA) with
another time-frequency function can be shown to be a
time-varying convolution between x[n] and a function
ĥ[n,m]: x[n]*ĥ[n,m].
Let X(n,ω) be modified by a function H(n,ω):
Y(n,ω) = X(n,ω)H(n,ω)
This corresponds to a new short-time segment:
gn[m] = fn[m]*h[n,m]
h[n,m]: time-varying system impulse response
(Chapter 2).

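Heuristic Application of STFT Synthesis Methods
Consider the FBS method (discretization in frequency
gives):
$$Y(n,k) = Y(n,\omega)\Big|_{\omega = \frac{2\pi}{N}k} = X(n,k)\, H(n,k)$$
N-point IDFT of H(n,k):
$$\tilde{h}[n,m] = \sum_{l=-\infty}^{\infty} h[n, m+lN], \quad \text{periodic over } N$$
Then the resulting sequence
can be written as:
$$y[n] = \sum_{m=-\infty}^{\infty} x[n-m]\, \hat{h}[n,m]$$
where, for a window normalized so that w[0] = 1,
$$\hat{h}[n,m] = w[m] \sum_{l=-\infty}^{\infty} h[n, m+lN]$$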

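Heuristic Application of STFT Synthesis Methods
Using the OLA method, it can be shown (see Exercise
7.11) that:
$$\hat{h}[n,m] = w[n] * \sum_{l=-\infty}^{\infty} h[n, m+lN]$$
where * denotes convolution with respect to the time variable n.
Contrasting FBS with OLA:
FBS: multiplication by the window, instantaneous change
OLA: convolution with the window, smoothing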

Heuristic Application of STFT Synthesis Methods
Example 7.7
Suppose we want to deliberately introduce
reverberation into a signal x[n] by convolution with
the filter:
h[n] = δ[n] + δ[n-no]
the Fourier transform of which is:
H(ω) = 1 + e^{-jωno}
Applying this filter as a multiplicative STFT modification gives:
Y(n,ω) = X(n,ω)H(ω)
where
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$
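Example 7.7 (cont.)
Using the OLA method (7.21):
$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} \frac{1}{N} \sum_{k=0}^{N-1} Y(p,k)\, e^{j\frac{2\pi}{N}kn}$$
It is then possible to express y[n] in terms of the
original sequence:
$$y[n] = \sum_{m=-\infty}^{\infty} x[m]
\underbrace{\left[ \frac{1}{N} \sum_{k=0}^{N-1} H(k)\, e^{j\frac{2\pi}{N}k(n-m)} \right]}_{\text{IDFT: } \sum_r h[n-m+rN]}
\frac{1}{W(0)} \underbrace{\sum_{p=-\infty}^{\infty} w[p-m]}_{W(0)}
= \sum_{m=-\infty}^{\infty} x[m]\, \tilde{h}[n-m]$$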
Example 7.7 (cont.)
where
$$\tilde{h}[n] = \sum_{r=-\infty}^{\infty} h[n+rN] = \sum_{r=-\infty}^{\infty} \big( \delta[n+rN] + \delta[n-n_o+rN] \big)$$
is the periodic extension of h[n] over N, of which we
only consider the interval [0,N-1].
This implies that the desired reverberated signal is
obtained only when no < N; otherwise temporal aliasing
will occur (as illustrated in Figure 7.20).
Example 7.7 (cont.)
Time-Scale Modification and Enhancement of Speech
The signal construction methods presented in this
chapter can be applied in a variety of speech
applications.
Time-Scale Modification
In the speech case, we would like to change the articulation rate
(faster or slower) without changing the pitch.
Methods:
Cut & Paste (Fairbanks method):
Discard or duplicate frames in order to speed up or slow down the
articulation, respectively.
Problem:
Pitch period mismatch at adjacent frames causes distortion.
Pitch-synchronous OLA (Scott & Gerber):
Select frame size and location synchronous to pitch periods. The problem of pitch
period mismatch is avoided.
Problem:
Pitch synchronization is not always easy.
STFTM Synthesis:
To avoid pitch synchronization problems, use only the magnitude of the STFT
(i.e., the STFTM).
1. Compute |X(nL,ω)| at an appropriate frame interval (decimation rate) L
(e.g., L=128 at Fs=10000 Hz, and N is several T0 long).
2. Modify the decimation rate to a new rate M (e.g., M=L/2 for a speed-up by a
factor of 2): |Y(nM,ω)| = |X(nL,ω)|
3. Apply the least-squared-error iterative estimation algorithm until
|Y(nM,ω)| has converged.
Problem:
An occasional reverberant characteristic of the synthesized signal is perceived, due to
lack of STFT phase control.
Noise Reduction
A number of techniques have been developed to remove or reduce
additive noise.
The noise-corrupted signal is given by:
y[n] = x[n] + b[n]
STFT synthesis:
Subtract an estimate of the noise spectrum, Ŝb(ω):
$$\hat{X}(nL,\omega) =
\begin{cases}
\left[\, |Y(nL,\omega)|^2 - \alpha\, \hat{S}_b(\omega) \right]^{1/2} e^{j\angle Y(nL,\omega)}, & |Y(nL,\omega)|^2 - \alpha\, \hat{S}_b(\omega) \ge 0 \\
0, & |Y(nL,\omega)|^2 - \alpha\, \hat{S}_b(\omega) < 0
\end{cases}$$
The original phase ∠Y(nL,ω) is retained because the
phase of the noise cannot be reliably estimated in
general.
The factor α controls the degree of noise reduction.
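The subtraction rule can be sketched as a function acting on one or more STFT frames. The function name, the argument layout (frames along the first axis), and the way the noise estimate `Sb` is obtained are assumptions; the slides do not specify how Ŝb(ω) is estimated.

```python
import numpy as np

def spectral_subtraction(Y, Sb, alpha=1.0):
    """Subtract alpha * Sb from |Y|^2, clamp negatives to zero, keep the phase of Y.

    Y  : complex STFT frames Y(nL, k), shape (frames, N)
    Sb : noise power estimate per frequency bin, shape (N,)
    """
    power = np.abs(Y) ** 2 - alpha * Sb
    magnitude = np.sqrt(np.maximum(power, 0.0))    # zero where the difference is negative
    return magnitude * np.exp(1j * np.angle(Y))    # noisy phase is retained

Y = np.array([[3 + 4j, 1j]])
Sb = np.array([9.0, 4.0])
X_hat = spectral_subtraction(Y, Sb, alpha=1.0)
```

In the small example the first bin has |Y|^2 - Sb = 25 - 9 = 16, giving magnitude 4 with the phase of 3 + 4j, while the second bin falls below zero and is set to 0. The enhanced frames would then be passed to one of the synthesis methods (FBS or OLA) described earlier.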
Noise Reduction
STFTM Synthesis:
Ignore phase and use Sequential Extrapolation or
Least-Squared Error estimation method to construct
clean signal.
