
Speech Processing

Short-Time Fourier Transform


Analysis and Synthesis

Short-Time Fourier Transform Analysis


and Synthesis Minimum-Phase Synthesis

Speech and audio signals are time-varying and can be considered stochastic signals that carry information.

This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).

Discrete-time STFT vs. discrete STFT:

The discrete-time short-time Fourier transform (STFT) consists of a separate FT of the signal in the neighborhood of each time instant; it is continuous in frequency.
In the discrete STFT, the FT in the STFT analysis is replaced by the discrete FT (DFT).
The resulting STFT is discrete in both time and frequency.

In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to model-based analysis/synthesis. Note also that both of those analysis methods implicitly used short-time analysis methods (to be presented).

In short-time analysis systems no such restrictions apply.

December 30, 2015

Veton Kpuska

Short-Time Analysis (STFT)


Two views of the STFT are explored:
1. Fourier-transform view
2. Filter-bank view

Fourier-Transform View
Recall (from Chapter 3):

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$

w[n] is a finite-length, symmetric sequence (i.e., window) of length Nw.
w[n] ≠ 0 for n in [0, Nw-1]
w[n]: analysis window or analysis filter

Fourier-Transform View
x[n]: time-domain signal
f_n[m] = x[m]w[n-m]: the short-time section of x[m] at point n, that is, the signal at frame n.
X(n,ω): Fourier transform of the short-time windowed signal data f_n[m].
Computing the DFT:

$$X(n,k) = X(n,\omega)\Big|_{\omega = \frac{2\pi}{N}k}$$

Fourier-Transform View
Thus X(n,k) is the STFT for every ω = (2π/N)k
Frequency sampling interval = 2π/N
Frequency sampling factor = N

DFT:

$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}$$
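The DFT form of the STFT can be sketched directly in NumPy. This is only an illustrative sketch: the function name `discrete_stft`, the hop parameter `L`, and the choice to treat x[n] as zero outside its support are assumptions, not part of the slides.

```python
import numpy as np

def discrete_stft(x, w, N, L=1):
    """Return (times, X) with X[p, k] = sum_m x[m] w[n - m] exp(-j 2 pi k m / N), n = times[p].

    x: signal, assumed zero outside 0..len(x)-1 (an assumption of this sketch)
    w: analysis window, defined on 0..Nw-1
    N: DFT length, L: hop between analysis times (hypothetical parameter names)
    """
    x = np.asarray(x, dtype=complex)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    times = np.arange(0, len(x) + Nw - 1, L)   # every n whose window overlaps the signal
    X = np.zeros((len(times), N), dtype=complex)
    for p, n in enumerate(times):
        m = np.arange(max(0, n - Nw + 1), min(n, len(x) - 1) + 1)
        f = x[m] * w[n - m]                    # short-time section f_n[m]
        buf = np.zeros(N, dtype=complex)
        np.add.at(buf, m % N, f)               # the DFT kernel depends on m only modulo N
        X[p] = np.fft.fft(buf)
    return times, X
```

Keeping the absolute time index m in the exponent (rather than restarting at zero for each frame) matches the definition of X(n,k) used in these slides; many library STFTs use a frame-relative phase reference instead.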


Example 7.1
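Let x[n] be a periodic impulse train sequence:

$$x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$$

[Figure: impulse train with impulses at ..., -P, 0, P, 2P, 3P, ...]

Also let w[n] be a triangle of length P:

[Figure: triangular window, P points, axis marks at -P/2 and P/2+1]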

Example 7.1
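$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}
= \sum_{m=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \delta[m - lP]\, w[n-m]\, e^{-j\omega m}
= \sum_{l=-\infty}^{\infty} w[n - lP]\, e^{-j\omega lP}$$

The sum over m is non-zero only for m = lP.
Each term is a window located at lP with linear phase -ωlP.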

Example 7.1
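Since the shifted windows w[n-lP] do not overlap, |X(n,ω)| is constant in ω and the phase of X(n,ω) is linear in ω.
Computation of the DFT for N = P gives:

$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{P}km}
= \sum_{m=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \delta[m - lP]\, w[n-m]\, e^{-j\frac{2\pi}{P}km}$$

$$X(n,k) = \sum_{l=-\infty}^{\infty} w[n - lP]\, \underbrace{e^{-j\frac{2\pi}{P}k\,lP}}_{=\,1}
= \sum_{l=-\infty}^{\infty} w[n - lP]$$

X(n,k) is constant in k: it is the DFT of translated, non-overlapping windows with a phase shift of zero (due to sampling).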

Spectrogram |X(n,ω)|²
If the analysis window length is no longer than the pitch period:
wideband spectrogram
vertical striations
Otherwise:
narrowband spectrogram
horizontal striations
How often should the analysis window be applied to the signal?
X(n,k) is decimated by a temporal decimation factor L:
X(nL,k) = DFT{f_nL[m]}
The sections f_nL[m] are a subset of the sections f_n[m].

How to choose the sampling rates in time (L) and frequency (N, the FFT length) will be addressed in one of the forthcoming sections.

[Figure: analysis window w[pL-m] sliding over x[m] in steps of L samples, shown for p = 1, 2, 3]


Fourier-Transform View

Note that X(n,ω) is periodic in ω with period 2π (same as the Fourier transform) and, for real sequences, is Hermitian symmetric: X(n,ω) = X*(n,-ω).
For real sequences:
Re{X(n,ω)} or |X(n,ω)| is symmetric
Im{X(n,ω)} or arg{X(n,ω)} is anti-symmetric

A time shift results in a linear phase shift (same as in the Fourier transform):

$$\tilde{X}(n,\omega) = \sum_{m} x[m - n_0]\, w[n-m]\, e^{-j\omega m}
= \sum_{q} x[q]\, w[n - n_0 - q]\, e^{-j\omega (q + n_0)}$$

$$\tilde{X}(n,\omega) = e^{-j\omega n_0} \sum_{q} x[q]\, w[n - n_0 - q]\, e^{-j\omega q}
= e^{-j\omega n_0}\, X(n - n_0, \omega)$$

Thus, a shift by n0 in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift in each short-time section by n0.

Filtering View
In this interpretation w[n] is considered to be a filter whose impulse response is w[n].
Thus w[n] is referred to as the analysis filter.
Let us fix the value of ω = ω_o:

$$X(n,\omega_o) = \sum_{m} \left( x[m]\, e^{-j\omega_o m} \right) w[n-m]$$

The above equation represents the convolution of the sequence x[n]e^{-jω_o n} with the sequence w[n].
Thus:

$$X(n,\omega_o) = \left( x[n]\, e^{-j\omega_o n} \right) * w[n]$$

Filtering View
The product:
x[n]e^{-jω_o n}
is a modulation of x[n] by frequency ω_o: the spectrum of x[n] is shifted so that the component at ω_o moves to the origin.

Filtering View

Alternate view:

$$X(n,\omega_o) = e^{-j\omega_o n} \left( x[n] * w[n]\, e^{j\omega_o n} \right)$$

The discrete STFT can also be interpreted from the filtering viewpoint:

$$X(n,k) = e^{-j\frac{2\pi}{N}kn} \left( x[n] * w[n]\, e^{j\frac{2\pi}{N}kn} \right)$$

This equation leads to the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.
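The two filtering views (modulate then lowpass filter, versus bandpass filter then demodulate) can be checked numerically. The sketch below assumes a finite-length x[n] starting at n = 0; the function names are hypothetical.

```python
import numpy as np

def stft_modulate_then_filter(x, w, omega):
    """X(n, omega) = (x[n] e^{-j omega n}) * w[n]."""
    n = np.arange(len(x))
    return np.convolve(x * np.exp(-1j * omega * n), w)

def stft_filter_then_demodulate(x, w, omega):
    """X(n, omega) = e^{-j omega n} (x[n] * (w[n] e^{j omega n}))."""
    h = w * np.exp(1j * omega * np.arange(len(w)))   # complex bandpass filter h[n]
    y = np.convolve(x, h)
    return np.exp(-1j * omega * np.arange(len(y))) * y
```

Both functions return len(x) + len(w) - 1 samples along n, as expected for a linear convolution.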

Filtering View

[Figure 7.5: filter-bank interpretation of the discrete STFT (slide 18)]

Filtering View
General Properties:
1. If x[n] has length N and w[n] has length M, then X(n,ω) has length N+M-1 along n.
2. The bandwidth of X(n,ω_o) is less than or equal to that of w[n].
3. The sequence X(n,ω_o) has its spectrum centered at the origin.

Example 7.2
Consider a Gaussian window of the form:

$$w[n] = e^{-a(n - n_o)^2}$$

The discrete STFT with DFT length N can therefore be considered as a bank of filters with impulse responses:

$$h_k[n] = e^{-a(n - n_o)^2}\, e^{j\frac{2\pi}{N}kn}$$

For x[n] = δ[n]: x[n]*h_k[n] = h_k[n]

If N = 50, corresponding to bandpass filters spaced by 200 Hz for a sampling rate of 10000 samples/s, then:

Example 7.2
For k = 0, 5, 10, 15 the following is obtained:

$$h_0[n] = e^{-a(n - n_o)^2}\, e^{j\frac{2\pi}{50}0n} = e^{-a(n - n_o)^2}$$

$$h_5[n] = e^{-a(n - n_o)^2}\, e^{j\frac{2\pi}{50}5n}$$

$$h_{10}[n] = e^{-a(n - n_o)^2}\, e^{j\frac{2\pi}{50}10n}$$

$$h_{15}[n] = e^{-a(n - n_o)^2}\, e^{j\frac{2\pi}{50}15n}$$

Example 7.2

[Figure 7.6: complex bandpass filters h_k[n] for k = 0, 5, 10, 15]

Example 7.3

Consider the filter bank of the previous Example 7.2, which was designed with a Gaussian window of the form:

$$w[n] = e^{-a(n - n_o)^2}$$

Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters h_k[n] for k = 0, 5, 10, and 15, as presented in the previous slide and depicted in Figure 7.6.

Example 7.3
After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure but are centered at the origin.

Time-Frequency Resolution Tradeoffs

As seen in Chapter 3, the basic issue in analysis window selection is the compromise required between a long window for showing signal detail in frequency and a short window for representing fine temporal structure:

$$X(n,\omega) = \mathcal{F}\{f_n[m]\} = \mathcal{F}\{x[m]\, w[n-m]\}
= \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\theta)\, e^{j\theta n}\, X(\omega + \theta)\, d\theta$$

Since both X(ω) and W(ω) are periodic over 2π, the linear convolution is essentially circular.
From the equation above:
W(ω) smears (smooths) X(ω).
We want W(ω) as narrow as possible, ideally W(ω) = δ(ω), for good frequency resolution.
W(ω) = δ(ω) requires an infinitely long w[n]:
poor time resolution.
These are conflicting goals.

Example 7.4
Figure 7.8 depicts the time-frequency resolution tradeoff:

[Figure 7.8]

Time-Frequency Resolution Tradeoffs
From the previous example, the smoothing interpretation of the STFT is not valid for non-stationary sequences.
For steady signals, long analysis windows are appropriate; they yield good frequency resolution, as depicted in the next figure.

Time-Frequency Resolution Tradeoffs
However, for short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
Shorter windows yield poor frequency resolution.
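The tradeoff can be made concrete by measuring how the main lobe of W(ω) widens as the window gets shorter. The sketch below is an illustration only: `mainlobe_width` is a hypothetical helper, and the null-to-null width found on a zero-padded FFT grid is just one of several reasonable bandwidth definitions.

```python
import numpy as np

def mainlobe_width(w, nfft=8192):
    """Approximate null-to-null main-lobe width of W(omega) in rad/sample.

    Walks away from omega = 0 on a zero-padded FFT grid until |W| stops decreasing.
    """
    W = np.abs(np.fft.fft(w, nfft))
    k = 1
    while k < nfft // 2 and W[k + 1] < W[k]:
        k += 1
    return 2 * (2 * np.pi * k / nfft)   # the lobe is symmetric about omega = 0

# Long window: narrow main lobe (good frequency resolution, poor time resolution).
width_long = mainlobe_width(np.ones(256))
# Short window: wide main lobe (poor frequency resolution, good time resolution).
width_short = mainlobe_width(np.ones(32))
```

For a rectangular window of length Nw the measured width should come out close to 4π/Nw.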

Short-Time Synthesis
How can the original sequence be obtained back from its discrete-time STFT?
The inversion is represented mathematically by a synthesis equation which expresses a sequence in terms of its discrete-time STFT.
Recall that for f_n[m] = x[m]w[n-m]:

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\, e^{-j\omega m}$$

Thus, taking the inverse Fourier transform for each n:

$$X(n,\omega) \;\longleftrightarrow\; f_n[m] = x[m]\, w[n-m]$$

If w[n] ≠ 0 then recovery is complete.

Short-Time Synthesis
For each n, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence f_n[m].
Evaluating f_n[m] at m = n gives:

f_n[n] = x[n]w[0].
For w[0] ≠ 0, x[n] can be obtained as f_n[n]/w[0].

The process of taking the inverse Fourier transform of X(n,ω) for a specific n and then dividing by w[0] is represented by the following relation:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

which is the synthesis equation for the discrete-time STFT.

Short-Time Synthesis
In contrast to the discrete-time STFT X(n,ω), the discrete STFT X(n,k) is not always invertible.
Example 1.
Consider the case when w[n] is bandlimited with bandwidth B.

Short-Time Synthesis
Note that if there are frequency components of x[n] which do not pass through any of the filter regions of the discrete STFT, then
X(n,k) is not a unique representation of x[n], and
x[n] cannot be recovered from it.

Example 2.
Consider X(n,k) decimated in time by a factor L, i.e., the STFT is applied every L samples.
w[n] is non-zero over its length Nw.
If L > Nw then there are gaps in time where x[n] is not represented.
Thus in such cases, again, x[n] cannot be recovered.

[Figure: case L > Nw; windows w[pL-m] of length Nw applied every L samples leave gaps in x[m]]

Short-Time Synthesis
Conclusion:
Constraints must be adopted to ensure uniqueness and invertibility:
1. Proper/adequate frequency sampling: the frequency sampling interval 2π/N must not exceed the window bandwidth B (for a window of length Nw, 2π/N ≤ 2π/Nw)
2. Proper temporal decimation: L ≤ Nw

Filter Bank Summation (FBS) Method
The traditional short-time synthesis method is commonly referred to as the Filter Bank Summation (FBS) method.
FBS is best described in terms of the filtering interpretation of the discrete STFT:
The discrete STFT is considered to be the set of outputs of a bank of filters.
The output of each filter is modulated with a complex exponential.
The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) in slide 18).

Filter Bank Summation (FBS) Method
Recall the synthesis equation given earlier:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

The FBS method carries out a discrete version of this equation by using the discrete STFT X(n,k):

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}kn}$$

We now derive conditions that ensure y[n] = x[n].

Filter Bank Summation (FBS) Method
From Figure 7.5, analysis followed by synthesis gives:

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1}
\underbrace{\left[ \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km} \right]}_{X(n,k)}
e^{j\frac{2\pi}{N}kn}$$

Interchanging the order of summation, this equation reduces to:

$$y[n] = \frac{1}{N\, w[0]}\; x[n] * \left( w[n] \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} \right)$$

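Filter Bank Summation (FBS) Method
Furthermore, since

$$\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} = N \sum_{r=-\infty}^{\infty} \delta[n - rN]$$

is a periodic impulse train with period N:

$$y[n] = \frac{1}{N\, w[0]}\; x[n] * \left( w[n] \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} \right)
= \frac{1}{N\, w[0]}\; x[n] * \left( w[n]\, N \sum_{r=-\infty}^{\infty} \delta[n - rN] \right)$$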

Filter Bank Summation (FBS) Method
Thus:

$$y[n] = \frac{1}{w[0]}\; x[n] * \left( w[n] \sum_{r=-\infty}^{\infty} \delta[n - rN] \right)$$

y[n] is the output of the convolution of x[n] with the product of the analysis window and a periodic impulse sequence.
Note that the product

$$w[n] \sum_{r=-\infty}^{\infty} \delta[n - rN]$$

reduces to w[0]δ[n], so that y[n] = x[n], if:

the window length Nw ≤ N, or

for Nw > N, w[rN] = 0 for r ≠ 0, that is

$$w[rN] = 0 \quad \text{for } r = \pm 1, \pm 2, \pm 3, \ldots$$


Filter Bank Summation (FBS) Method
This constraint is known as the FBS constraint.
It must be fulfilled in order to ensure exact signal synthesis with the FBS method.
This constraint is commonly expressed in the frequency domain:

$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$

This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
We conclude this discussion by stating that a filter bank with N filters, based on an analysis filter of length less than or equal to N, is always an all-pass system.
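As a numerical check of the FBS method, the sketch below analyzes a signal with the discrete STFT (no temporal decimation) and resynthesizes it with the FBS sum. The function names are hypothetical, and x[n] is assumed to be zero for n < 0.

```python
import numpy as np

def fbs_synthesis(x, w, N):
    """y[n] = 1/(N w[0]) * sum_k X(n, k) e^{j 2 pi k n / N}, with X(n, k) computed on the fly."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    k = np.arange(N)
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)
        f = x[m] * w[n - m]                                  # short-time section f_n[m]
        Xnk = np.exp(-2j * np.pi * np.outer(k, m) / N) @ f   # X(n, k), k = 0..N-1
        y[n] = np.sum(Xnk * np.exp(2j * np.pi * k * n / N)) / (N * w[0])
    return y.real

def fbs_constraint_sum(w, N, omega):
    """sum_k W(omega - 2 pi k / N); equals N w[0] when the FBS constraint holds."""
    n = np.arange(len(w))
    k = np.arange(N)[:, None]
    return np.sum(w * np.exp(-1j * (omega - 2 * np.pi * k / N) * n))
```

With Nw ≤ N the output should match x[n]; with Nw > N and w[rN] ≠ 0 the output contains scaled, delayed copies of x[n].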

Generalized FBS Method

Note:

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \sum_{r=-\infty}^{\infty} f[n, n-r]\, X(r,\omega) \right] e^{j\omega n}\, d\omega$$

The smoothing function f[n,m] is referred to as the time-varying synthesis filter.

It can be shown that any f[n,m] that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6):

$$\sum_{m=-\infty}^{\infty} f[n, -m]\, w[m] = 1$$

Note also that the basic FBS method can be obtained by setting the synthesis filter to be a non-smoothing filter:
f[n,m] = δ[m]

Generalized FBS Method

Consider the discrete STFT with decimation factor L. The generalized FBS synthesized signal is given by:

$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} \sum_{k=0}^{N-1} f[n, n - rL]\, X(rL, k)\, e^{j\frac{2\pi}{N}nk}$$

Furthermore, consider a time-invariant smoothing filter:
f[n,m] = f[m]
That is:
f[n,n-rL] = f[n-rL]

Generalized FBS Method

Thus

$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} \sum_{k=0}^{N-1} f[n - rL]\, X(rL, k)\, e^{j\frac{2\pi}{N}nk}$$

This equation yields y[n] = x[n] when the following constraint is satisfied by the analysis and synthesis filters as well as the temporal decimation and frequency sampling factors:

$$L \sum_{r=-\infty}^{\infty} f[n - rL]\, w[rL - n + pN] = \delta[p], \quad \text{for all } n$$

For f[m] = δ[m] and L = 1 this method reduces to the basic FBS method.

Generalized FBS Method

We are interested in the L > 1 case and in using f[n] as an interpolator.
Interpolation FBS methods:

1. Helical interpolation (Portnoff)
2. Weighted overlap-add method (Crochiere)

Overlap-Add (OLA) Method

The FBS method was motivated by the filtering view of the STFT.
The OLA method was motivated by the Fourier-transform view of the STFT.

In the OLA method:

1. The inverse DFT is taken for each fixed time in the discrete STFT,
2. An overlap-and-add operation between the short-time sections is performed.

This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.
The basic idea is that the redundancy within overlapping segments and the averaging of the redundant samples remove the effect of windowing.

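Overlap-Add (OLA) Method

Recall the short-time synthesis relation:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

If x[n] is averaged over many short-time segments and normalized by W(0), then

$$x[n] = \frac{1}{2\pi\, W(0)} \int_{-\pi}^{\pi} \sum_{p=-\infty}^{\infty} X(p,\omega)\, e^{j\omega n}\, d\omega$$

where

$$W(0) = \sum_{n=-\infty}^{\infty} w[n]$$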

Overlap-Add (OLA) Method

The discretized version of OLA is given by:

$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty}
\underbrace{\left[ \frac{1}{N} \sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn} \right]}_{\text{IDFT: } f_p[n] = x[n]\, w[p-n]}$$

Note that the above IDFT relation is true provided that N > Nw. The expression for y[n] thus becomes:

$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} x[n]\, w[p-n]
= x[n]\, \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} w[p-n]$$

which, provided that

$$\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$$

gives
y[n] = x[n]

The condition on the sum of w[p-n] is always true, because the sum of the values of a sequence always equals the value of its Fourier transform at ω = 0 (the DC value of a signal is by definition the sum of the signal values).

Overlap-Add (OLA) Method

For decimation in time by a factor of L, it can be shown (Exercise 7.4) that if:

$$\sum_{p=-\infty}^{\infty} w[pL - n] = \frac{W(0)}{L}$$

then x[n] can be synthesized using the following equation:

$$y[n] = \frac{L}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(pL,k)\, e^{j\frac{2\pi}{N}kn} \right]$$

The condition on the window is the general constraint imposed by the OLA method. It requires that the sum of all the analysis windows (obtained by sliding w[n] with L-point increments) add up to a constant, as shown in the next figure.
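The decimated OLA synthesis can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, x[n] is taken as zero outside its support, and the test window is a periodic Hamming window chosen because it satisfies the OLA constraint for L = Nw/4.

```python
import numpy as np

def ola_window_sums(w, L):
    """sum_p w[pL - n] for n = 0..L-1; constant (= W(0)/L) iff the OLA constraint holds."""
    w = np.asarray(w, dtype=float)
    return np.array([w[(-n) % L::L].sum() for n in range(L)])

def ola_synthesis(x, w, N, L):
    """y[n] = L/W(0) * sum_p IDFT_N{X(pL, k)}[n], with analysis frames every L samples."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    assert N >= Nw, "DFT length must cover the window to avoid time aliasing"
    y = np.zeros(len(x))
    for n in range(0, len(x) + Nw - 1, L):
        m = np.arange(max(0, n - Nw + 1), min(n, len(x) - 1) + 1)
        buf = np.zeros(N, dtype=complex)
        buf[m % N] = x[m] * w[n - m]       # f_n[m], stored modulo N
        X = np.fft.fft(buf)                # analysis: X(n, k)
        f = np.fft.ifft(X).real            # synthesis: inverse DFT of this frame
        y[m] += f[m % N]                   # overlap and add
    return L * y / np.sum(w)
```

For example, with `w = 0.54 - 0.46*np.cos(2*np.pi*np.arange(16)/16)` and L = 4 the window sums are constant and the signal is recovered; with L = 5 the sums vary with n and the output is amplitude-modulated.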


Overlap-Add (OLA) Method

Duality of the OLA constraint and the FBS constraint:

FBS:

$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$

OLA:

$$\sum_{p=-\infty}^{\infty} w[pL - n] = \frac{W(0)}{L}$$

The FBS method requires that finite-length windows have a length Nw less than the number of analysis filters N to satisfy the FBS constraint (N > Nw).
Analogously, for the OLA method it can be shown that its constraint is satisfied by all finite-bandwidth analysis windows whose maximum frequency is less than 2π/L (where L is the temporal decimation factor).

In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted window transform replicas to take on the value zero at the frequency origin ω = 0, which is equivalent to:

$$W(\omega) = 0 \quad \text{at } \omega = \frac{2\pi}{L}k, \; k \neq 0$$

This is analogous to the FBS constraint for Nw > N, where the window w[n] is required to take on the value zero at n = ±N, ±2N, ±3N, ...


Time-Frequency Sampling

This section gives a different, qualitative view of the time-frequency sampling concepts for the OLA and FBS constraints, from the perspective of classical time-domain and frequency-domain aliasing.

The following discussion serves as an additional summary of sampling issues for those two methods. It motivates our earlier statement that sufficient but not necessary conditions for invertibility of the discrete STFT are:

1. The analysis window is non-zero over its finite length Nw.
2. The temporal decimation factor L ≤ Nw.
3. The frequency sampling interval 2π/N ≤ 2π/Nw.

Time-Frequency Sampling
Consider the windowed/short-time signal:

f_n[m] = x[m]w[n-m], and
X(n,ω): Fourier transform of f_n[m]
Analysis window duration: Nw

From the Fourier-transform point of view:

Reconstruction of f_n[m] from X(n,k) requires a frequency sampling interval of 2π/Nw or finer.

From the time-domain point of view:

The time decimation interval L is required to meet the Nyquist criterion based on the bandwidth of the window w[n].
This implies sampling X(n,k) at a time interval L ≤ 2π/ω_c to avoid frequency-domain aliasing of the time sequence X(n,ω).
ω_c is the bandwidth of W(ω); the slide's sketch marks the band as [-ω_c, ω_c].


Time-Frequency Sampling
Sufficient (but not necessary) conditions for signal reconstruction are:

1. The window is non-zero over its length Nw.
2. Temporal decimation factor L ≤ Nw (L ≤ 2π/ω_c).
3. Frequency sampling interval 2π/N ≤ 2π/Nw.

To avoid aliasing:

I. In the time domain: ensure condition 3.
II. In the frequency domain: ensure condition 2.

Time Decimation Sampling

Implication for the use of practical windows:

I. Rectangular window of length Nw
Assuming the bandwidth is equal to the extent of the main lobe, [-2π/Nw, 2π/Nw], so that B = 4π/Nw:

$$L \le \frac{2\pi}{B} = \frac{N_w}{2}$$

i.e., 50% overlap in windows.

II. Hamming window of length Nw
The main lobe is twice as wide as that of the rectangular window, B = 8π/Nw:

$$L \le \frac{2\pi}{B} = \frac{N_w}{4}$$

i.e., 75% overlap in windows.
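The arithmetic behind the two overlap figures can be sketched in a few lines. The helper `max_hop` and its `mainlobe_factor` argument (4 for the rectangular window, 8 for the Hamming window, from B = factor·π/Nw) are hypothetical names introduced for this illustration.

```python
def max_hop(Nw, mainlobe_factor):
    """Largest integer L with L <= 2*pi/B, where B = mainlobe_factor * pi / Nw.

    2*pi/B simplifies to 2*Nw/mainlobe_factor, so integer arithmetic is enough.
    """
    return (2 * Nw) // mainlobe_factor

def overlap_fraction(Nw, L):
    """Fraction of each window shared with its neighbor when the hop is L."""
    return 1 - L / Nw

L_rect = max_hop(256, 4)      # rectangular window, B = 4*pi/Nw
L_hamming = max_hop(256, 8)   # Hamming window, B = 8*pi/Nw
```

For Nw = 256 this gives hops of 128 and 64 samples, i.e., 50% and 75% overlap.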

Summary

OLA Method (DFT of order N)

1. No time aliasing occurs if the window length Nw is such that 2π/N ≤ 2π/Nw.
2. No frequency-domain aliasing occurs if the decimation factor L is small enough that the filter bandwidth ω_c ≤ 2π/L.
3. If zeros are allowed in W(ω), then condition 2 can be relaxed. In this case we can under-sample in time and still recover the sequence.

Summary
FBS Method
1. No frequency-domain aliasing occurs if the decimation factor L meets the Nyquist criterion, i.e., L ≤ Nw (L ≤ 2π/ω_c), where ω_c is the bandwidth of w[n].
2. No time-domain aliasing occurs if 2π/N ≤ 2π/Nw, i.e., Nw ≤ N.
3. If zeros in w[n] are allowed, then condition 2 can be relaxed. In this case we can under-sample in frequency and still recover the sequence.

Short-Time Fourier Transform Magnitude (STFTM)
The spectrogram is a major tool in speech applications:
The spectrogram is the squared STFT magnitude (STFTM).
It has been suggested that the human ear extracts perceptual information strictly from a spectrogram-like representation of speech (J.C. Anderson, Speech Analysis/Synthesis Based on Perception, PhD Thesis, MIT, 1984).
Experienced speech researchers have trained themselves to read the spectrogram itself (Victor Zue, MIT).
Primary topic of FIT-ece5528 Acoustics of American Speech.

Short-Time Fourier Transform Magnitude (STFTM)
The STFTM discards phase information, yet it has numerous uses in application areas:

Time-scale modification
Speech enhancement

In all these applications, estimating the phase of the speech is difficult (e.g., because of noise in the signal).
Furthermore, a number of techniques have been developed to obtain a phase estimate from an STFT magnitude.
This section introduces the STFTM as an alternative time-frequency signal representation.
In addition, analysis and synthesis techniques will be developed for the STFTM.

Short-Time Fourier Transform Magnitude (STFTM)
Squared-magnitude and autocorrelation relationship:

$$r[n,m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(n,\omega)|^2\, e^{j\omega m}\, d\omega$$

$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$

r[n,m]: short-time autocorrelation
|X(n,ω)|²: short-time squared magnitude
m: autocorrelation lag

Short-Time Fourier Transform Magnitude (STFTM)
Furthermore, the autocorrelation r[n,m] is given by the convolution of the short-time section with its time-reversed version:
r[n,m] = f_n[m]*f_n[-m]
where

f_n[m] = x[m]w[n-m]
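The relationship between the squared magnitude and the short-time autocorrelation can be verified numerically for one short-time section. The sketch assumes a real section f_n[m] and a DFT length of at least twice the section length minus one, so that the circular autocorrelation returned by the inverse DFT is not aliased; the function names are hypothetical.

```python
import numpy as np

def autocorr_from_magnitude(f, N):
    """Lags 0..N-1 of r[n, m], obtained as the inverse DFT of |X(n, k)|^2."""
    mag2 = np.abs(np.fft.fft(f, N)) ** 2
    return np.fft.ifft(mag2).real

def autocorr_direct(f):
    """Lags 0..len(f)-1 of r[n, m] = f_n[m] * f_n[-m] for a real section f."""
    full = np.convolve(f, f[::-1])      # lags -(len-1) .. (len-1)
    return full[len(f) - 1:]
```

The largest lag returned by either function is the product of the first and last samples of the section, which is the property used later for sequential extrapolation.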

Signal Representation
Under what conditions can the STFTM be used to represent a sequence uniquely?
Note that:
|F{x[n]}| = |F{-x[n]}|
This is an ambiguity, so the STFTM is not a unique representation in all cases.
However, by imposing certain mild restrictions on:
the analysis window and
the signal,
unique signal representation is indeed possible with the discrete-time STFTM.

Signal Representation

Suppose x[n] is the sum of two signals, x1[n] and x2[n], occupying different regions of the n-axis.
Furthermore, suppose that the gap of zeros between x1[n] and x2[n] is large enough that there is no analysis window position for which the corresponding short-time section includes non-zero samples of both x1[n] and x2[n].
Because of the sign ambiguity, the STFTM of:
x1[n] + x2[n],
x1[n] - x2[n], and
-x1[n] + x2[n]
is the same.

Signal Representation

Any uniqueness conditions must include a restriction on the length of zero gaps between non-zero portions of the signal x[n].
Sufficient uniqueness conditions are the following:
1. The analysis window w[n] is a known sequence of finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2 consecutive zero samples, and the sign of its first non-zero value is known.

Signal Representation
If the successive STFTM frames correspond to overlapping signal segments, then:
If the short-time spectral magnitude of the signal segment at time n is known, then
the spectral magnitude of the adjacent section at time n+1 must be consistent, in the region of overlap, with the known short-time section.
If the analysis window is non-zero and of length Nw, then after dividing out the analysis window, the first Nw-1 samples of the segment at time n+1 must equal the last Nw-1 samples of the segment at time n (as illustrated in the next slide).
If the last sample of a segment can be extrapolated from its first Nw-1 values, one could repeat this process to obtain the entire signal x[n].


Signal Representation
To develop the procedure for extrapolating the next sample of a sequence using its STFTM, assume that the first Nw-1 samples under the analysis window positioned at time n are known.

That is, the sequence x[n] has been obtained up to some time n-1 from its STFTM.

The goal is to compute the sample x[n] from these initial samples and the STFT magnitude |X(n,ω)|, or equivalently r[n,m].

Signal Representation

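Note that r[n, Nw-1], the maximum lag of the autocorrelation, is given by the product of the first and last values of the short-time segment:

$$r[n, N_w - 1] = \big( w[0]\, x[n] \big) \big( w[N_w - 1]\, x[n - (N_w - 1)] \big)$$

Here w[0]x[n] is the last value of the section (the unknown sample) and w[Nw-1]x[n-(Nw-1)] is its first value (already known). Solving for x[n]:

$$x[n] = \frac{r[n, N_w - 1]}{w[0]\, w[N_w - 1]\, x[n - (N_w - 1)]}$$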

Signal Representation
Note that:

$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$

If the first value of the short-time section, x[n-(Nw-1)], happens to be equal to zero, we must find the first non-zero value within the section and again use the product relation as in the last expression.
Note that such a sample can be found because it was assumed that there are at most Nw-2 consecutive zero samples between any two non-zero samples of x[n].

Signal Representation
Sequential extrapolation algorithm
1. Initialize with x[0]
2. Update time n
3. Compute r[n,Nw-1] from the inverse DFT of |X(n,k)|².
4. Compute:

$$x[n] = \frac{r[n, N_w - 1]}{w[0]\, w[N_w - 1]\, x[n - (N_w - 1)]}$$

5. Return to step (2) and repeat
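The steps above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, x[n] is assumed one-sided (zero for n < 0), the STFTM is taken at every sample (no temporal decimation), and the DFT length is assumed to be at least 2Nw-1 so that the autocorrelation lags are not aliased. During start-up, and whenever the first sample of a section is zero, the sketch uses the lag to the first non-zero sample of the section instead of the fixed lag Nw-1.

```python
import numpy as np

def stft_magnitude_sq(x, w, N):
    """|X(n, k)|^2 for n = 0..len(x)-1, with x taken as zero for n < 0."""
    Nw = len(w)
    xp = np.concatenate([np.zeros(Nw - 1), x])
    out = np.empty((len(x), N))
    for n in range(len(x)):
        seg = xp[n:n + Nw] * w[::-1]       # x[n-Nw+1 .. n] weighted by w[Nw-1 .. 0]
        out[n] = np.abs(np.fft.fft(seg, N)) ** 2
    return out

def sequential_extrapolation(mag2, w, x0, tol=1e-12):
    """Recover x[n] from |X(n, k)|^2 given the first sample x0 (including its sign)."""
    n_total, N = mag2.shape
    Nw = len(w)
    assert N >= 2 * Nw - 1, "DFT too short: autocorrelation lags would alias"
    x = np.zeros(n_total)
    x[0] = x0
    for n in range(1, n_total):
        r = np.fft.ifft(mag2[n]).real                   # r[n, m], m = 0..N-1
        lo = max(0, n - Nw + 1)
        nonzero = [m for m in range(lo, n) if abs(x[m]) > tol]
        if not nonzero:
            raise ValueError("zero gap too long for this window length")
        m0 = nonzero[0]                                 # first non-zero sample of the section
        lag = n - m0                                    # largest lag with a non-zero product
        x[n] = r[lag] / (w[0] * w[lag] * x[m0])
    return x
```

Because each new sample is obtained by dividing by an earlier estimate, errors in the magnitude propagate forward; this sketch is only meant for clean, unmodified STFTM data.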

Reconstruction from Time-Frequency Samples

To carry out STFTM analysis on a digital computer, the discrete STFTM must be applied.
The uniqueness theory of the STFTM can be easily extended to the discrete STFTM.
Uniqueness of the STFTM is based on the short-time autocorrelation functions.
The autocorrelation functions can be obtained even if the STFTM is sampled in frequency (discrete STFTM), provided the frequency sampling is adequate.
To consider the effects of temporal decimation with factor L, we note that adjacent short-time sections now have an overlap of Nw-L instead of Nw-1.

Reconstruction from Time-Frequency Samples

Sufficient uniqueness conditions for the partial overlap case:
1. The analysis window w[n] is a known sequence of finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2L consecutive zero samples.
3. L consecutive samples of x[n] (starting from the first non-zero sample) are known.
These are sufficient but not necessary conditions.

Signal Estimation from the Modified STFT or STFTM

Synthesis of a signal from a time-frequency function, a modified STFT or STFTM, is required in many applications.
Modification may arise due to:
1. Quantization errors (e.g., from speech coding)
2. Time-varying filtering
3. Speech enhancement
4. Signal rate modifications

Limitations:

Modifications in frequency should result in time modifications that are restricted to within an analysis window (Figure 7.18, next slide).
Overlapping sections must undergo similar modifications (Figure 7.19).

Signal Estimation from the Modified STFT or STFTM

Example 7.5. Removal of an interfering tone.

Consider modifying a valid X(n,ω) of the short-time segment f_n[m] = x[m]w[n-m] by inserting a zero gap where an unwanted interfering sine-wave component is known to lie.
The interfering signal is removed with H(n,ω).
The resulting frequency representation is:
Y(n,ω) = X(n,ω)H(n,ω)
Inverse transforming it gives a modified short-time sequence g_n[m] that is non-zero beyond the extent of the original short-time segment f_n[m] = x[m]w[n-m].

Signal Estimation from the Modified STFT or STFTM

Example 7.6

At time n:
Suppose a time-decimated STFT, X(nL,ω), is multiplied by a linear phase factor e^{jωn_o} to obtain
Y(nL,ω) = X(nL,ω)e^{jωn_o}
At time (n+1):
X((n+1)L,ω) is multiplied by the negative of this linear phase factor, e^{-jωn_o}, to obtain
Y((n+1)L,ω) = X((n+1)L,ω)e^{-jωn_o}
The overlapping sections of the inverse Fourier transforms, denoted by g_nL[m] and g_(n+1)L[m], are not consistent.

Heuristic Application of STFT Synthesis Methods
Although modifications of the STFT or STFTM may violate some principles, the results may be reasonable.
The resulting effect of modifying the STFT (with FBS or OLA synthesis) by another time-frequency function can be shown to be a time-varying convolution between x[n] and a function ĥ[n,m]: x[n]*ĥ[n,m].
Let X(n,ω) be modified by a function H(n,ω):
Y(n,ω) = X(n,ω)H(n,ω)
This corresponds to a new short-time segment:
g_n[m] = f_n[m]*h[n,m]
h[n,m]: time-varying system impulse response (Chapter 2).

Heuristic Application of STFT Synthesis Methods
Consider the FBS method (discretization in frequency) to obtain:

$$Y(n,k) = Y(n,\omega)\Big|_{\omega = \frac{2\pi}{N}k} = X(n,k)\, H(n,k)$$

N-point IDFT of H(n,k):

$$\tilde{h}[n,m] = \sum_{l=-\infty}^{\infty} h[n, m + lN], \quad \text{periodic over } N$$

Then the resulting sequence can be written as:

$$y[n] = \sum_{m} x[n-m]\, \hat{h}[n,m]$$

where

$$\hat{h}[n,m] = w[m] \sum_{l=-\infty}^{\infty} h[n, m + lN]$$

Heuristic Application of STFT Synthesis Methods
Using the OLA method, it can be shown (see Exercise 7.11) that:

$$\hat{h}[n,m] = w[n] * \sum_{l=-\infty}^{\infty} h[n, m + lN]$$

where the convolution is taken over the time index n.

Contrasting FBS with OLA:

FBS: multiplication by the window; instantaneous change
OLA: convolution with the window; smoothing

Heuristic Application of STFT Synthesis Methods
Example 7.7
Suppose we want to deliberately introduce reverberation into a signal x[n] by convolution with the filter:
h[n] = δ[n] + δ[n-n_o]
whose Fourier transform is:
H(ω) = 1 + e^{-jωn_o}
The STFT of the resulting signal is given by:
Y(n,ω) = X(n,ω)H(ω)
where

$$X(n,\omega) = \sum_{m} x[m]\, w[n-m]\, e^{-j\omega m}$$

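Example 7.7 (cont.)

Using the OLA method (7.21):

$$y[n] = \frac{1}{W(0)} \sum_{p} \left[ \frac{1}{N} \sum_{k=0}^{N-1} Y(p,k)\, e^{j\frac{2\pi}{N}kn} \right]$$

It is then possible to express y[n] in terms of the original sequence:

$$y[n] = \frac{1}{W(0)} \sum_{m} x[m]
\underbrace{\left[ \frac{1}{N} \sum_{k=0}^{N-1} H(k)\, e^{j\frac{2\pi}{N}k(n-m)} \right]}_{\text{IDFT: } \sum_r h[n - m + rN]}
\underbrace{\sum_{p} w[p-m]}_{W(0)}
= \sum_{m} x[m]\, \tilde{h}[n-m]$$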

Example 7.7 (cont.)

where

$$\tilde{h}[n] = \sum_{r} h[n + rN] = \sum_{r} \big( \delta[n + rN] + \delta[n - n_o + rN] \big)$$

is the periodic extension of h[n] over N, of which we only consider the interval [0, N-1].
This implies that the original reverberated signal is obtained only when n_o < N; otherwise temporal aliasing will occur (as illustrated in Figure 7.20).
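The temporal aliasing can be seen directly by sampling H(ω) = 1 + e^{-jωn_o} at N frequencies and taking the inverse DFT. The function name below is hypothetical; the sketch only illustrates the periodic extension and does not run the full OLA synthesis.

```python
import numpy as np

def sampled_reverb_impulse_response(n0, N):
    """One period of sum_r h[n + rN] for h[n] = delta[n] + delta[n - n0]."""
    k = np.arange(N)
    H = 1 + np.exp(-2j * np.pi * k * n0 / N)   # H(omega) sampled at omega = 2 pi k / N
    return np.fft.ifft(H).real

h_ok = sampled_reverb_impulse_response(5, 16)      # n0 < N: echo stays at n = 5
h_alias = sampled_reverb_impulse_response(20, 16)  # n0 > N: echo wraps around to n = 4
```

With n_o = 20 and N = 16 the echo appears at n = 20 mod 16 = 4 instead of n = 20.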


Time-Scale Modification and Enhancement of Speech
The signal construction methods presented in this chapter can be applied in a variety of speech applications.
Time-Scale Modification
In the speech case, we would like to change the articulation rate (faster, slower) without changing the pitch.


Time-Scale Modification

Methods:

Cut & Paste (Fairbanks method):

Discard or duplicate frames in order to speed up or slow down the articulation, respectively.
Problem:
Pitch period mismatch at adjacent frames causes distortion.

Pitch-synchronous OLA (Scott & Gerber)

Select frame size and location synchronous to pitch periods. The problem of pitch period mismatch is avoided.
Problem:
Pitch synchronization is not always easy.

STFTM Synthesis

To avoid pitch synchronization problems, use only the magnitude of the STFT (i.e., the STFTM).
1. Compute |X(nL,ω)| at an appropriate frame interval (decimation rate) L (e.g., L = 128 at Fs = 10000 Hz, with N several T0 long).
2. Modify the decimation rate to a new rate M (e.g., M = L/2 for a speed-up by a factor of 2): |Y(nM,ω)| = |X(nL,ω)|
3. Apply the Least-Squared Error iterative estimation algorithm until |Y(nM,ω)| has converged.
Problem:
An occasional reverberant characteristic of the synthesized signal is perceived due to the lack of STFT phase control.


Noise Reduction
A number of techniques have been developed to remove/reduce additive noise.
The noise-corrupted signal is given by:
y[n] = x[n] + b[n]
STFT synthesis:
Subtract the noise spectrum S_b(ω):

$$\hat{X}(nL,\omega) =
\begin{cases}
\left[\, |Y(nL,\omega)|^2 - \alpha\, S_b(\omega) \right]^{1/2} e^{j\angle Y(nL,\omega)}, & \text{if } |Y(nL,\omega)|^2 - \alpha\, S_b(\omega) \ge 0 \\
0, & \text{if } |Y(nL,\omega)|^2 - \alpha\, S_b(\omega) < 0
\end{cases}$$

The original phase spectrum ∠Y(nL,ω) is retained because the phase of the noise cannot be reliably estimated in general.
The factor α controls the degree of noise reduction.
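The subtraction rule can be sketched on an array of STFT frames. This is a minimal illustration: the function name, the array layout (frames along the first axis), and the default alpha are assumptions, and estimating the noise power spectrum S_b(ω) is left outside the sketch.

```python
import numpy as np

def spectral_subtract(Y, Sb, alpha=1.0):
    """Return sqrt(max(|Y|^2 - alpha*Sb, 0)) * exp(j*angle(Y)) for each frame.

    Y:  complex STFT frames, shape (frames, N)
    Sb: noise power spectrum estimate, shape (N,)
    """
    power = np.abs(Y) ** 2 - alpha * Sb
    magnitude = np.sqrt(np.maximum(power, 0.0))    # set to zero where the difference is negative
    return magnitude * np.exp(1j * np.angle(Y))    # keep the noisy phase
```

The modified frames would then be passed to a synthesis method such as OLA to obtain the enhanced time signal.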

Noise Reduction
STFTM synthesis:
Ignore the phase and use the sequential extrapolation or Least-Squared Error estimation method to construct the clean signal.
