
Fundamentals of DSP for music analysis

Juan P Bello
Digital Signals
According to the sampling theorem: fs > 2fmax
Otherwise there is another, lower-frequency signal that shares the same samples as the original signal (an alias).

Related to the wagon-wheel effect:


http://www.michaelbach.de/ot/mot_strob/index.html

[Figure: sampling/reconstruction chain with an anti-aliasing LPF before sampling and an anti-imaging LPF after reconstruction]
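As a quick illustration of aliasing (a minimal numpy sketch with assumed values, not from the original slides): a 900 Hz cosine sampled at fs = 1000 Hz produces exactly the same samples as a 100 Hz cosine.

```python
import numpy as np

fs = 1000                                   # sampling rate: violates fs > 2*fmax for a 900 Hz tone
n = np.arange(32)
x_900 = np.cos(2 * np.pi * 900 * n / fs)    # 900 Hz tone, above Nyquist (500 Hz)
x_100 = np.cos(2 * np.pi * 100 * n / fs)    # its alias at fs - 900 = 100 Hz

assert np.allclose(x_900, x_100)            # indistinguishable once sampled
```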
Block processing and spectrum

[Figure: N-sample memory buffer]

For block processing, signal data is sent to a buffer and processed as a block. The buffer is then filled with new data.
A common example is spectral analysis using the DFT.
The spectrum of a signal segment shows the energy distribution across the frequency range.
Discrete Fourier Transform
The spectrum of a digital signal, x(n), can be calculated as:

X(k) = \mathrm{DFT}[x(n)] = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}, \quad k = 0, 1, \dots, N-1

The resulting N samples X(k) are complex-valued:

X(k) = X_R(k) + jX_I(k)

|X(k)| = \sqrt{X_R^2(k) + X_I^2(k)}

\varphi(k) = \arctan\frac{X_I(k)}{X_R(k)}, \quad k = 0, 1, \dots, N-1
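To make the definition concrete, here is a minimal numpy sketch (illustrative only; the signal and sample rate are assumptions) that computes the DFT directly and extracts magnitude and phase:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) e^{-j 2 pi n k / N}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    return np.exp(-2j * np.pi * k * n / N) @ x

fs = 8000                               # assumed sample rate
n = np.arange(64)
x = np.cos(2 * np.pi * 1000 * n / fs)   # 1 kHz test tone
X = dft(x)
mag = np.abs(X)                         # |X(k)|
phase = np.angle(X)                     # phi(k)
assert np.allclose(X, np.fft.fft(x))    # matches the library FFT
```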
Discrete Fourier Transform
The resulting spectrum is composed of N equidistant frequency
points from 0 to (N-1)fs/N Hz in steps of fs/N
If the N samples x(n) are real-valued (as in the case of audio signals) then the N DFT samples can be defined in terms of conjugate pairs of the form:

X(k) = X^*(N - k)

[Figure: magnitude |X(k)| and phase φ(k) spectra, symmetric about k = N/2]

That means that the DFT of a real-valued signal x(n) is half-redundant. The complete information is obtained by looking at X(k), k = 0, 1, ..., N/2 (frequencies up to fs/2).
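A quick numpy check of this symmetry (an illustrative sketch; the test signal is an assumption):

```python
import numpy as np

N = 16
x = np.random.randn(N)                  # real-valued signal
X = np.fft.fft(x)

# Conjugate symmetry: X(k) = conj(X(N-k)) for k = 1..N-1
for k in range(1, N):
    assert np.allclose(X[k], np.conj(X[N - k]))

# rfft returns only the non-redundant half: k = 0..N/2 (N/2 + 1 bins)
Xr = np.fft.rfft(x)
assert len(Xr) == N // 2 + 1
assert np.allclose(Xr, X[: N // 2 + 1])
```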
Inverse DFT (IDFT)
The IDFT allows for the transformation of a spectrum in discrete frequency to a signal in discrete time.
It can be calculated as follows:

x(n) = \mathrm{IDFT}[X(k)] = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi nk/N}, \quad n = 0, 1, \dots, N-1

The fast version of the DFT is known as the Fast Fourier Transform (FFT) and its inverse as the IFFT. The FFT is an algorithm that computes the DFT, normally O(N²) operations, in O(N log N) operations.
Furthermore, there are a number of tricks to express the IDFT in terms of the FFT.
The FFT is so fast that even time-domain operations, like convolution, can be performed faster using the FFT and IFFT instead.
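As an illustration of that last point, a small sketch of FFT-based (fast) convolution, assuming numpy; the array lengths are arbitrary:

```python
import numpy as np

x = np.random.randn(4096)        # signal
h = np.random.randn(512)         # impulse response

# Direct (time-domain) convolution: O(len(x) * len(h))
y_direct = np.convolve(x, h)

# Fast convolution: zero-pad both to the full output length,
# multiply the spectra, and transform back -- O(M log M)
M = len(x) + len(h) - 1
y_fft = np.fft.irfft(np.fft.rfft(x, M) * np.fft.rfft(h, M), M)

assert np.allclose(y_direct, y_fft)
```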
What is Fourier saying?

Any periodic sound can be described by the summation of a number of sinusoids with time-varying amplitudes and phases.
Thus a complex spectrum is just a snapshot of those sinusoids' parameters.

[Figure: magnitude |X(k)| and phase φ(k) of the spectrum, k = 0 ... N/2]
Frequency resolution
As we now know, the frequency resolution is Δf = fs/N.
It can be seen that to increase resolution we need to increase N.
However, that implies a loss of temporal resolution.
A possible solution is to zero-pad, i.e. to add zero-valued samples until we reach the desired length N.
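A minimal numpy sketch of zero-padding (the lengths and sample rate are assumptions): the DFT is evaluated on a denser frequency grid without taking a longer segment.

```python
import numpy as np

fs = 8000
N = 256
n = np.arange(N)
x = np.hanning(N) * np.cos(2 * np.pi * 1050 * n / fs)   # windowed 1050 Hz tone

X = np.fft.rfft(x)               # bin spacing fs/N = 31.25 Hz
X_zp = np.fft.rfft(x, 4 * N)     # zero-padded to 4N: bin spacing fs/(4N) ~ 7.8 Hz

# The zero-padded spectrum is an interpolated version of the original one;
# it does not add information, but it samples the same spectrum more densely.
```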
Leakage
In theory the DFT of a sinusoid shows a single spectral line at f0.

[Figure: DFT of a sinusoid with a single line at f0]

In practice, unless we perform f0-synchronous analysis, there are discontinuities (sharp changes) at the segment boundaries that introduce noise. Thus the spectral line around f0 is smeared.
This is known as spectral leakage.

[Figure: DFT of a sinusoid with energy smeared around f0]
Windowing
Segmenting is equivalent to multiplying the signal by an N-length rectangular window of unit amplitude.
Multiplication in the time domain is equivalent to convolution in the frequency domain.
The transform of a rectangular window is a sinc function (sin(x)/x).
We can set N = kT0 (with T0 the period in samples and k a positive integer), thus eliminating the discontinuities.
Alternatively we can use a window that smoothly reduces the signal to zero at the boundaries.
Possible examples include Hamming, Blackman, Hanning, triangular, Gaussian and Kaiser-Bessel windows.
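For illustration, a small numpy sketch comparing a rectangular segment to a Hann-windowed one (the signal, length and sample rate are assumptions):

```python
import numpy as np

fs = 8000
N = 512
n = np.arange(N)
x = np.cos(2 * np.pi * 1050 * n / fs)            # 1050 Hz is not a multiple of fs/N

X_rect = np.abs(np.fft.rfft(x))                  # rectangular window: strong leakage
X_hann = np.abs(np.fft.rfft(np.hanning(N) * x))  # Hann window: sidelobes heavily attenuated

# Energy far away from the 1050 Hz region is much lower with the Hann window.
```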
Time-frequency representation
The Short-time Fourier Transform (STFT)
Independent DFTs are calculated on windowed segments
The segments usually overlap to compensate for the loss of
temporal resolution
Produces a spectrogram (or phasogram)
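A compact STFT sketch in numpy (window, hop size and FFT length are assumptions), producing a magnitude spectrogram and a phase "phasogram":

```python
import numpy as np

def stft(x, N=1024, R=256):
    """STFT with a Hann window of length N and hop size R."""
    w = np.hanning(N)
    frames = [w * x[s:s + N] for s in range(0, len(x) - N + 1, R)]
    return np.array([np.fft.rfft(f) for f in frames])   # shape: (num_frames, N/2 + 1)

x = np.random.randn(4 * 1024)       # placeholder signal
X = stft(x)
spectrogram = np.abs(X)             # magnitude over time and frequency
phasogram = np.angle(X)             # phase over time and frequency
```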
Time-frequency representation
A waterfall representation (just a different view)
The Phase Vocoder
Phase Vocoder Basics
Phase vocoder refers to a group of signal processing techniques performed
in the spectral domain

First developed as a computer music tool (Flanagan, 1966). It is now


considered a classical signal processing tool with a widely recognised
standard implementation
The theory remains largely unchanged, although many improvements and
different implementations have been proposed
Easily used in processing applications due to the intuitive nature of time-
frequency distributions.
Phase Vocoder Basics
Basic principle of the PV:

A signal is decomposed over a dictionary of sinusoidal bases, or Fourier bases, taking into account magnitude and phase information.
The signal is decomposed over short, windowed time frames to analyse the evolution of its feature distribution over time (STFT).
The short-time spectra can be modified or transformed.
The processed spectrum is passed through an IFFT and overlap-added in the time domain, yielding the output signal.
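A bare-bones sketch of that analysis/modification/synthesis chain in numpy (identity modification, Hann windows, 75% overlap; all parameters are assumptions, and real implementations add proper phase handling for non-trivial modifications):

```python
import numpy as np

def phase_vocoder_identity(x, N=1024, R=256):
    """Analyse, (trivially) modify and resynthesise x by STFT + overlap-add."""
    w = np.hanning(N)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for s in range(0, len(x) - N + 1, R):
        X = np.fft.rfft(w * x[s:s + N])     # analysis
        # ... modify X here (time-scaling, pitch-shifting, filtering, ...) ...
        frame = np.fft.irfft(X, N)          # synthesis
        y[s:s + N] += w * frame             # overlap-add with a synthesis window
        norm[s:s + N] += w * w              # track window overlap for normalisation
    norm[norm == 0] = 1.0
    return y / norm                         # compensate analysis/synthesis windowing

x = np.random.randn(8192)
y = phase_vocoder_identity(x)               # y closely approximates x away from the edges
```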
Phase Vocoder Theory
Consider a windowed STFT of the signal x(m):

X(n, k) = \sum_{m=-\infty}^{\infty} x(m)\, h(n - m)\, e^{-j 2\pi mk/N}

where X(n,k) is a time-varying spectrum (in its polar form):

X(n, k) = |X(n, k)|\, e^{j\varphi(n, k)}

n is the short-time frame start
k is the frequency bin number
N is the window length
m is the summation index
h(m) is a sliding window (e.g. Hanning)
Filter Bank Model

Let us define:

W_N = e^{-j 2\pi / N}, \quad \Omega_k = \frac{2\pi}{N} k

Computation of the time-varying spectrum can be seen as a parallel bank of N bandpass filters with impulse responses given by:

h_k(n) = h(n)\, W_N^{-nk}, \quad k \in [0, N - 1]

such that:

y_k(n) = h(n) * [x(n)\, W_N^{nk}]
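An illustrative numpy check of the base-band form (all parameters are assumptions): modulating x(n) by W_N^{nk} and low-pass filtering with the window h gives the k-th STFT bin as a function of time.

```python
import numpy as np

N, k, fs = 64, 5, 8000
n = np.arange(2048)
x = np.cos(2 * np.pi * 700 * n / fs)           # arbitrary test signal
h = np.hanning(N)                              # sliding analysis window / lowpass filter

mod = np.exp(-2j * np.pi * k * n / N)          # W_N^{nk}
y_k = np.convolve(x * mod, h)                  # y_k(n) = h(n) * [x(n) W_N^{nk}]

# Direct STFT bin X(n0, k) at one time index n0, for comparison:
n0 = 1000
m = np.arange(n0 - N + 1, n0 + 1)
X_n0_k = np.sum(x[m] * h[n0 - m] * np.exp(-2j * np.pi * m * k / N))
assert np.allclose(y_k[n0], X_n0_k)
```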


Filter Bank Model
Complex base-band implementation: modulates the signal components at frequency Ω_k down to baseband (using W_N^{nk}), followed by low-pass filtering of the signal using the filter h(n).
Complex band-pass implementation: filters the signal directly using h_k(n), thus leading to complex-valued band-pass signals.
Filter Bank Model
Since we can assume that x(n) is real, bins which are symmetric about the Nyquist frequency will be conjugate pairs of the form:

\tilde{X}(n, k) = \tilde{X}^*(n, N - k)

This property can be used to simplify the analysis, resulting in a more meaningful interpretation:

y_k(n) = |X(n, k)| \left( e^{j\tilde{\varphi}(n,k)} + e^{-j\tilde{\varphi}(n,k)} \right) = 2 |X(n, k)| \cos(\tilde{\varphi}(n, k)), \quad k \in [1, N/2 - 1]

leading to:

\tilde{\varphi}(n, k) = \varphi(n, k) + \Omega_k n

y(n) = \sum_{k=0}^{N/2} y_k(n)
FFT/IFFT Model
Phase unwrapping
Phase unwrapping is the process of transforming the cyclic phase (constrained to the unit circle) into a linear function of time.
This is done by adding the cumulative phase variation given by Ω_k n.
Ω_k is the frequency of the k-th sinusoid of our FFT analysis and is equal to 2πk/N:

\tilde{\varphi}(n, k) = \frac{2\pi k}{N} n + \varphi(n, k)

As the converse of unwrapping, we define a principal argument function (princarg) to map the phase to the ]-π, π] range.
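A one-line numpy version of such a principal-argument function (a common formulation, not necessarily the slides' exact one):

```python
import numpy as np

def princarg(phase):
    """Wrap an arbitrary phase value into the ]-pi, pi] range."""
    return np.mod(phase + np.pi, -2 * np.pi) + np.pi

assert np.isclose(princarg(3 * np.pi / 2), -np.pi / 2)
assert np.isclose(princarg(-3 * np.pi / 2), np.pi / 2)
```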
Target and deviation phases
If a stable sinusoid with frequency Ω_k exists, we can calculate a target phase value as the sum of the previous unwrapped phase plus the expected phase increment.
This expected phase increment is equal to the frequency of the sinusoid multiplied by the time shift (one hop size R):

\tilde{\varphi}_t((s+1)R) = \tilde{\varphi}(sR) + \Omega_k R

The deviation phase is simply the difference between the measured phase and the target phase at a particular time.
Because of the unwrapping we need to use the principal argument function:

\tilde{\varphi}_d((s+1)R) = \mathrm{princarg}\left[ \tilde{\varphi}((s+1)R) - \tilde{\varphi}_t((s+1)R) \right]
Instantaneous frequency
From the previous it follows that an expected unwrapped phase can be calculated as the sum of the target plus the deviation phase:

\tilde{\varphi}_u((s+1)R) = \tilde{\varphi}_t((s+1)R) + \tilde{\varphi}_d((s+1)R)

This is useful to define an unwrapped phase difference, by subtracting the previous unwrapped phase from the expected unwrapped phase:

\Delta\tilde{\varphi}((s+1)R) = \tilde{\varphi}_u((s+1)R) - \tilde{\varphi}(sR)

\Delta\tilde{\varphi}((s+1)R) = \Omega_k R + \mathrm{princarg}\left[ \tilde{\varphi}((s+1)R) - \tilde{\varphi}(sR) - \Omega_k R \right]

Finally, we can define the instantaneous frequency as the rate of angular rotation, i.e. the unwrapped phase difference divided by the time between successive frames:

f_i((s+1)R) = \frac{\Delta\tilde{\varphi}((s+1)R)}{2\pi R} f_s
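A small numpy sketch of that computation for one bin of two successive STFT frames (hop size, window, bin and signal are assumptions; princarg is the function sketched earlier):

```python
import numpy as np

def princarg(phase):
    return np.mod(phase + np.pi, -2 * np.pi) + np.pi

fs, N, R, k = 8000, 1024, 256, 130
f0 = 1017.0                                    # true frequency, between bin centres
n = np.arange(2 * N)
x = np.cos(2 * np.pi * f0 * n / fs)
w = np.hanning(N)

phi_prev = np.angle(np.fft.rfft(w * x[0:N]))[k]        # phase at frame sR
phi_curr = np.angle(np.fft.rfft(w * x[R:R + N]))[k]    # phase at frame (s+1)R

omega_k = 2 * np.pi * k / N                    # bin frequency (rad/sample)
dphi = omega_k * R + princarg(phi_curr - phi_prev - omega_k * R)
f_i = dphi / (2 * np.pi * R) * fs              # instantaneous frequency in Hz

print(f_i)   # close to f0 = 1017 Hz, rather than the bin centre k*fs/N ~ 1015.6 Hz
```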
Example App: Time-Scaling

[Audio examples: Guitar (time-scaled by 20%) and Pop (time-scaled by 15%), each as Original / Standard PV / Adaptive PV]
Useful References
Zölzer, U. (Ed). DAFX: Digital Audio Effects. John Wiley and Sons (2002)
Chapter 1: Zölzer, U., Introduction.
Chapter 8: Arfib, D., Keiler, F. and Zölzer, U., Time-frequency Processing.
Chapter 10: Amatriain, X., Bonada, J., Loscos, A. and Serra, X., Spectral Processing.
Good read, Chapter 2: Dutilleux, P. and Zölzer, U., Filters.

Pohlmann, K., Principles of Digital Audio. McGraw-Hill, Inc. (1995)

Serra, M., Introducing the Phase Vocoder, in Musical Signal Processing, Swets and Zeitlinger Publishers, 1997.
De Götzen, A., Arfib, D., Bernardini, N., Traditional(?) implementation of a phase vocoder: the tricks of the trade, pp. 37-44, DAFx, Verona, Italy, 2000.
Freed, A., Rodet, X., Depalle, P., Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware, Int. Conf. on Signal Processing Applications and Technology (ICSPAT'92), 1992.
Laroche, J., Time and Pitch Scale Modification of Audio Signals, in Applications of Digital Signal Processing to Audio and Acoustics, Ch. 7, Kluwer Academic Publishers, 1998.
Laroche, J., Dolson, M., Improved phase vocoder time-scale modification of audio, IEEE Trans. on Speech and Audio Processing, Vol. 7, No. 3, 1999.
