
Fundamentals of DSP for music analysis

Juan P Bello
Digital Signals
According to the sampling theorem: fs > 2fmax
Otherwise there is another, lower-frequency signal that shares the same samples as the original signal (an alias).

Related to the wagon-wheel effect:


http://www.michaelbach.de/ot/mot_strob/index.html

[Figure: sampling/reconstruction chain with an anti-aliasing LPF before sampling and an anti-imaging LPF after reconstruction]
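As a quick illustration of aliasing (a minimal numpy sketch with assumed values, not from the original slides): a 900 Hz cosine sampled at fs = 1000 Hz produces exactly the same samples as a 100 Hz cosine.

```python
import numpy as np

fs = 1000                                   # sampling rate: violates fs > 2*fmax for a 900 Hz tone
n = np.arange(32)
x_900 = np.cos(2 * np.pi * 900 * n / fs)    # 900 Hz tone, above Nyquist (500 Hz)
x_100 = np.cos(2 * np.pi * 100 * n / fs)    # its alias at fs - 900 = 100 Hz

assert np.allclose(x_900, x_100)            # indistinguishable once sampled
```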
Block processing and spectrum

[Figure: N-sample memory buffer]

For block processing, signal data is sent to a buffer and processed as a block. The buffer is then filled with new data.
A common example is spectral analysis using the DFT.
The spectrum of a signal segment shows the energy distribution across the frequency range.
Discrete Fourier Transform
The spectrum of a digital signal, x(n), can be calculated as:

X(k) = \mathrm{DFT}[x(n)] = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \quad k = 0, 1, \dots, N-1

The resulting N samples X(k) are complex-valued:

X(k) = X_R(k) + jX_I(k)

|X(k)| = \sqrt{X_R^2(k) + X_I^2(k)}

\varphi(k) = \arctan\frac{X_I(k)}{X_R(k)}, \quad k = 0, 1, \dots, N-1
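To make the definition concrete, here is a minimal numpy sketch (illustrative only; the signal and sample rate are assumptions) that computes the DFT directly and extracts magnitude and phase:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) e^{-j 2 pi n k / N}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    return np.exp(-2j * np.pi * k * n / N) @ x

fs = 8000                               # assumed sample rate
n = np.arange(64)
x = np.cos(2 * np.pi * 1000 * n / fs)   # 1 kHz test tone
X = dft(x)
mag = np.abs(X)                         # |X(k)|
phase = np.angle(X)                     # phi(k)
assert np.allclose(X, np.fft.fft(x))    # matches the library FFT
```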
Discrete Fourier Transform
The resulting spectrum is composed of N equidistant frequency
points from 0 to (N-1)fs/N Hz in steps of fs/N
If the N samples x(n) are real-valued (as in the case of audio signals) then the N DFT samples can be defined in terms of conjugate pairs of the form:

X(k) = X^*(N - k)

[Figure: magnitude |X(k)| and phase φ(k) spectra, symmetric about k = N/2]

That means that the DFT of a real-valued signal x(n) is half-redundant. The complete information is obtained by looking at X(k), k = 0, 1, ..., N/2 (frequencies up to fs/2).
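A quick numpy check of this symmetry (an illustrative sketch; the test signal is an assumption):

```python
import numpy as np

N = 16
x = np.random.randn(N)                  # real-valued signal
X = np.fft.fft(x)

# Conjugate symmetry: X(k) = conj(X(N-k)) for k = 1..N-1
for k in range(1, N):
    assert np.allclose(X[k], np.conj(X[N - k]))

# rfft returns only the non-redundant half: k = 0..N/2 (N/2 + 1 bins)
Xr = np.fft.rfft(x)
assert len(Xr) == N // 2 + 1
assert np.allclose(Xr, X[: N // 2 + 1])
```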
Inverse DFT (IDFT)
The IDFT allows for the transformation of a spectrum in discrete frequency to a signal in discrete time.
It can be calculated as follows:

x(n) = \mathrm{IDFT}[X(k)] = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi nk/N}, \quad n = 0, 1, \dots, N-1

The fast version of the DFT is known as the Fast Fourier Transform (FFT) and its inverse as the IFFT. The FFT is an algorithm that computes the DFT, normally O(N²) operations, in O(N log N) operations.
Furthermore, there are a number of tricks to express the IDFT in terms of the FFT.
The FFT is so fast that even time-domain operations, like convolution, can be performed faster using the FFT and IFFT instead.
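As an illustration of that last point, a small sketch of FFT-based (fast) convolution, assuming numpy; the array lengths are arbitrary:

```python
import numpy as np

x = np.random.randn(4096)        # signal
h = np.random.randn(512)         # impulse response

# Direct (time-domain) convolution: O(len(x) * len(h))
y_direct = np.convolve(x, h)

# Fast convolution: zero-pad both to the full output length,
# multiply the spectra, and transform back -- O(M log M)
M = len(x) + len(h) - 1
y_fft = np.fft.irfft(np.fft.rfft(x, M) * np.fft.rfft(h, M), M)

assert np.allclose(y_direct, y_fft)
```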
What is Fourier saying?

Any periodic sound can be described by the summation of a number of sinusoids with time-varying amplitudes and phases.
Thus a complex spectrum is just a snapshot of those sinusoids' parameters.

[Figure: magnitude |X(k)| and phase φ(k) of the spectrum, k = 0 ... N/2]
Frequency resolution
As we now know, the frequency resolution is Δf = fs/N.
It can be seen that to increase resolution we need to increase N.
However, that implies a loss of temporal resolution.
A possible solution is to zero-pad, i.e. to add zero-valued samples until we reach the desired length N.
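A minimal numpy sketch of zero-padding (the lengths and sample rate are assumptions): the DFT is evaluated on a denser frequency grid without taking a longer segment.

```python
import numpy as np

fs = 8000
N = 256
n = np.arange(N)
x = np.hanning(N) * np.cos(2 * np.pi * 1050 * n / fs)   # windowed 1050 Hz tone

X = np.fft.rfft(x)               # bin spacing fs/N = 31.25 Hz
X_zp = np.fft.rfft(x, 4 * N)     # zero-padded to 4N: bin spacing fs/(4N) ~ 7.8 Hz

# The zero-padded spectrum is an interpolated version of the original one;
# it does not add information, but it samples the same spectrum more densely.
```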
Leakage
In theory the DFT of a sinusoid shows a single spectral line at f0.

[Figure: DFT of a sinusoid with a single line at f0]

In practice, unless we perform f0-synchronous analysis, there are discontinuities (sharp changes) at the segment boundaries that introduce noise. Thus the spectral line around f0 is smeared.
This is known as spectral leakage.

[Figure: DFT of a sinusoid with energy smeared around f0]
Windowing
Segmenting is equivalent to multiplying the signal by an N-length rectangular window of unit amplitude.
Multiplication in the time domain is equivalent to convolution in the frequency domain.
The transform of a rectangular window is a sinc function (sin(x)/x).
We can set N = kT0 (with T0 the period in samples and k a positive integer), thus eliminating the discontinuities.
Alternatively we can use a window that smoothly reduces the signal to zero at the boundaries.
Possible examples include Hamming, Blackman, Hanning, triangular, Gaussian and Kaiser-Bessel windows.
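For illustration, a small numpy sketch comparing a rectangular segment to a Hann-windowed one (the signal, length and sample rate are assumptions):

```python
import numpy as np

fs = 8000
N = 512
n = np.arange(N)
x = np.cos(2 * np.pi * 1050 * n / fs)            # 1050 Hz is not a multiple of fs/N

X_rect = np.abs(np.fft.rfft(x))                  # rectangular window: strong leakage
X_hann = np.abs(np.fft.rfft(np.hanning(N) * x))  # Hann window: sidelobes heavily attenuated

# Energy far away from the 1050 Hz region is much lower with the Hann window.
```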
Time-frequency representation
The Short-time Fourier Transform (STFT)
Independent DFTs are calculated on windowed segments
The segments usually overlap to compensate for the loss of
temporal resolution
Produces a spectrogram (or phasogram)
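A compact STFT sketch in numpy (window, hop size and FFT length are assumptions), producing a magnitude spectrogram and a phase "phasogram":

```python
import numpy as np

def stft(x, N=1024, R=256):
    """STFT with a Hann window of length N and hop size R."""
    w = np.hanning(N)
    frames = [w * x[s:s + N] for s in range(0, len(x) - N + 1, R)]
    return np.array([np.fft.rfft(f) for f in frames])   # shape: (num_frames, N/2 + 1)

x = np.random.randn(4 * 1024)       # placeholder signal
X = stft(x)
spectrogram = np.abs(X)             # magnitude over time and frequency
phasogram = np.angle(X)             # phase over time and frequency
```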
Time-frequency representation
A waterfall representation (just a different view)
The Phase Vocoder
Phase Vocoder Basics
Phase vocoder refers to a group of signal processing techniques performed
in the spectral domain

First developed as a computer music tool (Flanagan, 1966). It is now


considered a classical signal processing tool with a widely recognised
standard implementation
The theory remains largely unchanged, although many improvements and
different implementations have been proposed
Easily used in processing applications due to the intuitive nature of time-
frequency distributions.
Phase Vocoder Basics
Basic principle of the PV:

A signal is decomposed over a dictionary of sinusoidal bases, or Fourier bases, taking into account magnitude and phase information.
The signal is decomposed over short, windowed time frames to analyse the evolution of its feature distribution over time (STFT).
The short-time spectra can be modified or transformed.
The processed spectrum is passed through an IFFT and overlap-added in the time domain, yielding the output signal.
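A bare-bones sketch of that analysis/modification/synthesis chain in numpy (identity modification, Hann windows, 75% overlap; all parameters are assumptions, and real implementations add proper phase handling for non-trivial modifications):

```python
import numpy as np

def phase_vocoder_identity(x, N=1024, R=256):
    """Analyse, (trivially) modify and resynthesise x by STFT + overlap-add."""
    w = np.hanning(N)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for s in range(0, len(x) - N + 1, R):
        X = np.fft.rfft(w * x[s:s + N])     # analysis
        # ... modify X here (time-scaling, pitch-shifting, filtering, ...) ...
        frame = np.fft.irfft(X, N)          # synthesis
        y[s:s + N] += w * frame             # overlap-add with a synthesis window
        norm[s:s + N] += w * w              # track window overlap for normalisation
    norm[norm == 0] = 1.0
    return y / norm                         # compensate analysis/synthesis windowing

x = np.random.randn(8192)
y = phase_vocoder_identity(x)               # y closely approximates x away from the edges
```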
Phase Vocoder Theory
Consider a windowed STFT of the signal x(m):

X(n, k) = \sum_{m=-\infty}^{\infty} x(m)\, h(n - m)\, e^{-j 2\pi mk/N}

where X(n,k) is a time-varying spectrum (in its polar form):

X(n, k) = |X(n, k)|\, e^{j\varphi(n, k)}

n is the short-time frame start
k is the frequency bin number
N is the window length
m is the summation index
h(m) is a sliding window (e.g. Hanning)
Filter Bank Model

Let us define:

W_N = e^{-j 2\pi / N}, \quad \Omega_k = \frac{2\pi}{N} k

Computation of the time-varying spectrum can be seen as a parallel bank of N bandpass filters with impulse responses given by:

h_k(n) = h(n)\, W_N^{-nk}, \quad k \in [0, N - 1]

such that:

y_k(n) = h(n) * [x(n)\, W_N^{nk}]
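An illustrative numpy check of the base-band form (all parameters are assumptions): modulating x(n) by W_N^{nk} and low-pass filtering with the window h gives the k-th STFT bin as a function of time.

```python
import numpy as np

N, k, fs = 64, 5, 8000
n = np.arange(2048)
x = np.cos(2 * np.pi * 700 * n / fs)           # arbitrary test signal
h = np.hanning(N)                              # sliding analysis window / lowpass filter

mod = np.exp(-2j * np.pi * k * n / N)          # W_N^{nk}
y_k = np.convolve(x * mod, h)                  # y_k(n) = h(n) * [x(n) W_N^{nk}]

# Direct STFT bin X(n0, k) at one time index n0, for comparison:
n0 = 1000
m = np.arange(n0 - N + 1, n0 + 1)
X_n0_k = np.sum(x[m] * h[n0 - m] * np.exp(-2j * np.pi * m * k / N))
assert np.allclose(y_k[n0], X_n0_k)
```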


Filter Bank Model
Complex base-band implementation: modulates the signal components at frequency Ω_k down to baseband (using W_N^{nk}), followed by low-pass filtering of the signal using the filter h(n).
Complex band-pass implementation: filters the signal directly using h_k(n), thus leading to complex-valued band-pass signals.
Filter Bank Model
Since we can assume that x(n) is real, bins which are symmetric about the Nyquist frequency will be conjugate pairs of the form:

\tilde{X}(n, k) = \tilde{X}^*(n, N - k)

This property can be used to simplify the analysis, resulting in a more meaningful interpretation:

y_k(n) = |X(n, k)| \left( e^{j\tilde{\varphi}(n,k)} + e^{-j\tilde{\varphi}(n,k)} \right) = 2 |X(n, k)| \cos(\tilde{\varphi}(n, k)), \quad k \in [1, N/2 - 1]

leading to:

\tilde{\varphi}(n, k) = \varphi(n, k) + \Omega_k n

y(n) = \sum_{k=0}^{N/2} y_k(n)
FFT/IFFT Model
Phase unwrapping
Phase unwrapping is the process of transforming the cyclic phase (constrained to the unit circle) into a linear function of time.
This is done by adding the cumulative phase variation given by Ω_k n.
Ω_k is the frequency of the k-th sinusoid of our FFT analysis and is equal to 2πk/N:

\tilde{\varphi}(n, k) = \frac{2\pi k}{N} n + \varphi(n, k)

As the converse of unwrapping, we define a principal argument function (princarg) to map the phase to the ]-π, π] range.
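A one-line numpy version of such a principal-argument function (a common formulation, not necessarily the slides' exact one):

```python
import numpy as np

def princarg(phase):
    """Wrap an arbitrary phase value into the ]-pi, pi] range."""
    return np.mod(phase + np.pi, -2 * np.pi) + np.pi

assert np.isclose(princarg(3 * np.pi / 2), -np.pi / 2)
assert np.isclose(princarg(-3 * np.pi / 2), np.pi / 2)
```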
Target and deviation phases
If a stable sinusoid with frequency Ω_k exists, we can calculate a target phase value as the sum of the previous unwrapped phase plus the expected phase increment.
This expected phase increment is equal to the frequency of the sinusoid multiplied by the time shift (one hop size R):

\tilde{\varphi}_t((s+1)R) = \tilde{\varphi}(sR) + \Omega_k R

The deviation phase is simply the difference between the measured phase and the target phase at a particular time.
Because of the unwrapping we need to use the principal argument function:

\tilde{\varphi}_d((s+1)R) = \mathrm{princarg}\left[ \tilde{\varphi}((s+1)R) - \tilde{\varphi}_t((s+1)R) \right]
Instantaneous frequency
From the previous it follows that an expected unwrapped phase can be calculated as the sum of the target plus the deviation phase:

\tilde{\varphi}_u((s+1)R) = \tilde{\varphi}_t((s+1)R) + \tilde{\varphi}_d((s+1)R)

This is useful to define an unwrapped phase difference, by subtracting the previous unwrapped phase from the expected unwrapped phase:

\Delta\tilde{\varphi}((s+1)R) = \tilde{\varphi}_u((s+1)R) - \tilde{\varphi}(sR)

\Delta\tilde{\varphi}((s+1)R) = \Omega_k R + \mathrm{princarg}\left[ \tilde{\varphi}((s+1)R) - \tilde{\varphi}(sR) - \Omega_k R \right]

Finally, we can define the instantaneous frequency as the rate of angular rotation, i.e. the unwrapped phase difference divided by the time between successive frames:

f_i((s+1)R) = \frac{\Delta\tilde{\varphi}((s+1)R)}{2\pi R} f_s
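A small numpy sketch of that computation for one bin of two successive STFT frames (hop size, window, bin and signal are assumptions; princarg is the function sketched earlier):

```python
import numpy as np

def princarg(phase):
    return np.mod(phase + np.pi, -2 * np.pi) + np.pi

fs, N, R, k = 8000, 1024, 256, 130
f0 = 1017.0                                    # true frequency, between bin centres
n = np.arange(2 * N)
x = np.cos(2 * np.pi * f0 * n / fs)
w = np.hanning(N)

phi_prev = np.angle(np.fft.rfft(w * x[0:N]))[k]        # phase at frame sR
phi_curr = np.angle(np.fft.rfft(w * x[R:R + N]))[k]    # phase at frame (s+1)R

omega_k = 2 * np.pi * k / N                    # bin frequency (rad/sample)
dphi = omega_k * R + princarg(phi_curr - phi_prev - omega_k * R)
f_i = dphi / (2 * np.pi * R) * fs              # instantaneous frequency in Hz

print(f_i)   # close to f0 = 1017 Hz, rather than the bin centre k*fs/N ~ 1015.6 Hz
```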
Example App: Time-Scaling

[Audio examples: Guitar (time-scaled by 20%) and Pop (time-scaled by 15%), each as Original / Standard PV / Adaptive PV]
Useful References
Zölzer, U. (Ed). DAFX: Digital Audio Effects. John Wiley and Sons (2002)
Chapter 1: Zölzer, U., Introduction.
Chapter 8: Arfib, D., Keiler, F. and Zölzer, U., Time-frequency Processing.
Chapter 10: Amatriain, X., Bonada, J., Loscos, A. and Serra, X., Spectral Processing.
Good read, Chapter 2: Dutilleux, P. and Zölzer, U., Filters.

Pohlmann, K., Principles of Digital Audio. McGraw-Hill, Inc. (1995)

Serra, M., Introducing the Phase Vocoder, in Musical Signal Processing, Swets and Zeitlinger Publishers, 1997.
De Götzen, A., Arfib, D., Bernardini, N., Traditional(?) implementation of a phase vocoder: the tricks of the trade, pp. 37-44, DAFx, Verona, Italy, 2000.
Freed, A., Rodet, X., Depalle, P., Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware, Int. Conf. on Signal Processing Applications and Technology (ICSPAT'92), 1992.
Laroche, J., Time and Pitch Scale Modification of Audio Signals, in Applications of Digital Signal Processing to Audio and Acoustics, Ch. 7, Kluwer Academic Publishers, 1998.
Laroche, J., Dolson, M., Improved phase vocoder time-scale modification of audio, IEEE Trans. on Speech and Audio Processing, Vol. 7, No. 3, 1999.
