Contents:
Concrete problems
– musical instrument classification
– musical genre classification
– percussive instrument transcription
– music segmentation
– speaker recognition, language recognition, sound effects retrieval, context awareness, video segmentation using audio, ...
Closely related to sound source recognition in humans
– includes segmentation (perceptual sound separation) in polyphonic signals
Many efficient methods have been developed in the speech / speaker recognition field
Example: musical instrument recognition
Different acoustic properties of sound sources make them recognizable
– properties result from sound production mechanism
But: even a single source produces varying sounds
– we must find something characteristic of the source, not of individual sound events alone: source invariants
Figure: flute (left) and clarinet (right) – what to measure? [Eronen & Klapuri, 2000]
Simplified:
A. Feature extraction
B. Classification (using models)
1. training phase: learn from training examples, collect statistics for feature values
2. classify new instances based on what was learnt in the training phase
Figure: system block diagram – signal → segmentation → feature extraction → model training → models; in classification, the extracted features are compared against the models (classify → recognition result)
2 Feature extraction
Spectral features
Cepstral coefficients
Cepstral coefficients c(k) are a very convenient way to
model spectral energy distribution
c(k) = IDFT{ log |DFT{x(n)}| }
where DFT denotes the discrete Fourier transform and IDFT its inverse
In Matlab
% see "type rceps"
c = real( ifft( log( abs( fft(x)))));
– only the real part, real( ), is taken because finite numerical precision produces an infinitesimally small imaginary part
Cepstral coefficients are calculated in short frames over time
– can be further modeled by calculating e.g. the mean and variance of each coefficient over time
Only the first M cepstrum coefficients are used as features
– using all the coefficients would model the exact spectrum
– coarse spectral shape is modeled by the first coefficients
– precision is selected by the number of coefficients taken
– the first coefficient (energy) is usually discarded
Usually M = fs / (2000 Hz) is a good first guess for M
Figure: piano spectrum (thin black line) and the spectrum modeled with the first 50 coefficients (thick blue line) or 10 coefficients (red broken line); axes: frequency (Hz) vs. log magnitude
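A minimal Matlab sketch of the above procedure (sampling rate, frame length, hop size and the signal are placeholder assumptions, not values prescribed by the slides): cepstra are computed frame by frame, the energy coefficient c(0) is discarded, the first M coefficients are kept, and their mean and variance over time form the feature vector.

% Sketch: frame-wise cepstral features (assumed frame/hop sizes).
fs  = 16000;                         % sampling rate (assumption)
x   = randn(fs, 1);                  % placeholder signal, 1 s of noise
N   = 512; hop = 256;                % frame length and hop (assumptions)
M   = round(fs / 2000);              % rule of thumb from the slides
w   = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));   % Hamming window
nFrames = floor((length(x) - N) / hop) + 1;
C = zeros(M, nFrames);
for t = 1:nFrames
    idx   = (t-1)*hop + (1:N);
    frame = x(idx) .* w;                         % windowed frame
    c = real(ifft(log(abs(fft(frame)) + eps)));  % real cepstrum
    C(:, t) = c(2:M+1);                          % discard c(0), keep M coefficients
end
featMean = mean(C, 2);               % statistics over time
featVar  = var(C, 0, 2);
feature  = [featMean; featVar];      % final feature vector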
A drawback of the cepstral coefficients: linear frequency scale
Perceptually, the frequency ranges 100–200 Hz and 10–20 kHz should be approximately equally important
– the standard cepstral coefficients do not take this into account
– logarithmic frequency scale would be better
Why mimic perception?
– typically we want to classify sounds according to perceptual (dis)similarity
– perceptually relevant features often lead to robust classification, too
Desirable for features:
– small change in feature vector → small perceptual change (and vice versa)
→ Mel-frequency cepstral coefficients fulfill this criterion
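A minimal Matlab sketch of mel-frequency cepstral coefficients for one frame, assuming a triangular mel filterbank and an (unscaled) DCT-II; all parameter values are placeholders and this is not necessarily the exact MFCC recipe used in the course.

% Sketch: MFCCs for a single frame (assumed parameters).
fs = 16000; N = 512; nBands = 26; M = 13;       % assumptions
f    = (0:N/2) * fs / N;                        % FFT bin frequencies (Hz)
mel  = @(f) 2595 * log10(1 + f/700);            % Hz -> mel
imel = @(m) 700 * (10.^(m/2595) - 1);           % mel -> Hz
edges = imel(linspace(mel(0), mel(fs/2), nBands + 2));
H = zeros(nBands, N/2 + 1);                     % triangular mel filterbank
for b = 1:nBands
    lo = edges(b); c = edges(b+1); hi = edges(b+2);
    H(b, :) = max(0, min((f - lo)/(c - lo), (hi - f)/(hi - c)));
end
x = randn(N, 1);                                % placeholder frame
P = abs(fft(x)).^2;                             % power spectrum
E = H * P(1:N/2 + 1);                           % mel-band energies
D = cos(pi * (0:M)' * ((1:nBands) - 0.5) / nBands);   % DCT-II basis (unscaled)
mfcc = D * log(E + eps);                        % cepstrum of the log mel energies
mfcc = mfcc(2:end);                             % discard the energy term c(0)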
Spectral centroid
SC = \frac{\sum_{k=1}^{K} f(k)\,|X(k)|^2}{\sum_{k=1}^{K} |X(k)|^2}
where |X(k)| and f(k) are the magnitude and frequency of component k
Bandwidth
BW = \frac{\sum_{k=1}^{K} (f(k) - SC)^2\,|X(k)|^2}{\sum_{k=1}^{K} |X(k)|^2}
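A minimal Matlab sketch of both measures for one analysis frame; the frame content and sizes below are placeholders.

% Sketch: spectral centroid and bandwidth of one frame (assumed setup).
fs = 16000; N = 512;
x  = randn(N, 1);                       % placeholder frame
X  = fft(x);
K  = N/2 + 1;                           % use the non-negative frequencies
f  = (0:K-1)' * fs / N;                 % frequency of each component (Hz)
P  = abs(X(1:K)).^2;                    % squared magnitudes |X(k)|^2
SC = sum(f .* P) / sum(P);              % spectral centroid (Hz)
BW = sum((f - SC).^2 .* P) / sum(P);    % bandwidth around the centroid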
More specific ones... for example for harmonic sounds:
– spectral irregularity: standard deviation of harmonic amplitudes
from spectral envelope
– even and odd harmonic content (open vs. closed acoustic pipes)
Temporal features
Short-time energy
STE = \frac{1}{N} \sum_{n=1}^{N} x(n)^2
– lousy feature as such, but different statistics of STE are useful
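A minimal Matlab sketch of frame-wise STE and the kind of statistics mentioned above; frame and hop sizes are assumed values.

% Sketch: frame-wise short-time energy and its statistics (assumed sizes).
fs = 16000; x = randn(fs, 1);           % placeholder signal
N = 512; hop = 256;                     % frame length and hop (assumptions)
nFrames = floor((length(x) - N) / hop) + 1;
STE = zeros(nFrames, 1);
for t = 1:nFrames
    frame  = x((t-1)*hop + (1:N));
    STE(t) = sum(frame.^2) / N;         % energy of frame t
end
steStats = [mean(STE); var(STE)];       % statistics of STE used as features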
More features...
Feature selection
Features with large values may have a larger influence in the cost
function than features with small values (e.g. Euclidean distance)
For N available data points of the dth feature (d = 1, 2, ..., D):
mean: m_d = \frac{1}{N} \sum_{n=1}^{N} x_{nd}
variance: \sigma_d^2 = \frac{1}{N-1} \sum_{n=1}^{N} (x_{nd} - m_d)^2
normalized feature: \hat{x}_{nd} = \frac{x_{nd} - m_d}{\sigma_d}
The resulting normalised features have zero mean and unit variance
– if desired, feature weighting can be performed separately after normalisation
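A minimal Matlab sketch of the normalisation, assuming the feature data is collected into an N-by-D matrix X with one data point per row (placeholder data).

% Sketch: z-score normalisation of a feature matrix X (N points x D features).
N = 200; D = 5;
X = rand(N, D) * diag([1 10 100 0.1 1]);   % placeholder data with mixed scales
m = mean(X, 1);                            % per-feature means (1 x D)
s = std(X, 0, 1);                          % per-feature std deviations (1/(N-1))
Xn = (X - repmat(m, N, 1)) ./ repmat(s, N, 1);   % normalised features
% Each column of Xn now has (approximately) zero mean and unit variance.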
Principal component analysis
Figure: example two-dimensional feature data in the (x1, x2) plane
Idea: find a linear transform A such that when applied to the feature
vectors x, the resulting new features y are uncorrelated,
→ the covariance matrix of the feature vectors y is diagonal.
Define the covariance matrix of x (here we use the training data!) by
C_x = E{ (x − m)(x − m)^T }
Because C_x is real and symmetric, it is always possible to find a set of D
orthonormal eigenvectors, and hence it can be diagonalised:
A C_x A^T = Λ
where A is a D-by-D matrix whose rows are formed from the eigenvectors of C_x
and Λ is a diagonal matrix with the corresponding eigenvalues on its diagonal.
Now the transform can be written as
y = A(x − m)
where m is the mean of the feature vectors x.
Eigenvectors
Recall the eigenvector equations:
C e_d = λ_d e_d
(C − λ_d I) e_d = 0
where e_d are the eigenvectors and λ_d the eigenvalues
In Matlab, [V, Lambda] = eig(Cx) returns the eigenvectors as the columns of V and the eigenvalues on the diagonal of Lambda; the matrix with the eigenvectors as its rows is then A = V'.
Transformed features
The transformed feature vectors y = A(x − m) have zero mean, and their covariance matrix is
C_y = E{ A(x − m)(A(x − m))^T } = A C_x A^T = Λ
i.e. C_y is diagonal and the features are thus uncorrelated.
Furthermore, let Λ^{1/2} denote the diagonal matrix whose elements are the square roots of the eigenvalues of C_x. Then the transformed vectors
y = Λ^{−1/2} A(x − m)
have uncorrelated elements with unit variance:
C_y = (Λ^{−1/2} A) C_x (Λ^{−1/2} A)^T = ... = I
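A minimal Matlab sketch of the decorrelating and whitening transforms, assuming the training data is an N-by-D matrix X with one feature vector per row (placeholder data, not the course's reference code).

% Sketch: PCA decorrelation and whitening of feature vectors (rows of X).
N = 500; D = 3;
X  = randn(N, D) * [2 0 0; 1 1 0; 0 0.5 0.3];   % placeholder correlated data
m  = mean(X, 1);                                % mean feature vector (1 x D)
Xc = X - repmat(m, N, 1);                       % mean-removed data
Cx = (Xc' * Xc) / (N - 1);                      % covariance matrix estimate
[V, Lambda] = eig(Cx);                          % eigenvectors as columns of V
A  = V';                                        % rows of A are the eigenvectors
Y  = Xc * A';                                   % decorrelated: y = A(x - m)
Yw = Xc * A' * diag(1 ./ sqrt(diag(Lambda)));   % whitened: y = Lambda^(-1/2) A (x - m)
% cov(Y) is (approximately) diagonal, cov(Yw) approximately the identity.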
Transformed features
Figures: the transform steps illustrated
– mean removal: y' = x − m
– left: decorrelated features → basis vectors orthogonal, e_1^T e_2 = 0
– right: variance scaling → basis vectors orthonormal, e_1^T e_1 = e_2^T e_2 = 1
If only the M eigenvectors with the largest eigenvalues are kept as the rows of A_M, the transform
\hat{x} = A_M (x − m)
gives an approximation of x, which essentially is the projection of x onto the subspace spanned by those M orthonormal eigenvectors. This projection is optimal in the sense that it minimises the mean square error (MSE)
E{ ||\hat{x} − x||^2 }
among all approximations with M components.
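Continuing the PCA sketch above with the same hypothetical variables, one can keep only the M components with the largest eigenvalues and measure the reconstruction error.

% Sketch: keep the M largest-eigenvalue components (continues the PCA sketch).
M = 2;
[~, order] = sort(diag(Lambda), 'descend');     % rank eigenvalues
AM  = V(:, order(1:M))';                        % M x D, rows = top eigenvectors
YM  = Xc * AM';                                 % M-dimensional representations
Xr  = YM * AM;                                  % back-projection onto the subspace
mse = mean(sum((Xr - Xc).^2, 2));               % mean square reconstruction error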
Principal component analysis
Here C_k = E{ (x − m_k)(x − m_k)^T } is the covariance matrix of class k and
p(ω_k) the prior probability of the class ω_k, i.e., p(ω_k) = N_k / N, where N_k is
the number of samples from class ω_k (out of N samples in total). The within-class
scatter matrix S_w is the prior-weighted sum of the class covariances,
S_w = \sum_k p(ω_k) C_k
and trace{S_w}, i.e. the sum of the diagonal elements of S_w, is a measure of the
average variance of the features over all classes.
4 Classification methods
Example
Feature space
Distance metrics
Choice of the distance metric is very important
– Euclidean distance metric:
D_E(x, y) = \sqrt{(x − y)^T (x − y)} = \sqrt{\sum_d (x_d − y_d)^2}
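A minimal Matlab sketch of distance-based classification, assuming one template (e.g. class mean) feature vector per class; the numbers are placeholders.

% Sketch: nearest-template classification with the Euclidean distance.
templates = [0 0; 3 1; -2 4];            % one D-dim template per class (3 classes)
xNew = [2.5 0.8];                        % feature vector to classify
d = sqrt(sum((templates - repmat(xNew, size(templates, 1), 1)).^2, 2));
[~, classIdx] = min(d);                  % class with the smallest distance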
Statistical classification
Bayes' rule:
P(ω_i | x) = \frac{P(x | ω_i)\,P(ω_i)}{P(x)}
where P(ωi|x) denotes the probability of class ωi given an observed
feature vector x
Note: we do not know P(ωi|x), but P(x|ωi) can be estimated from the
training data
Moreover, since P(x) is the same for all classes, we can perform
classification based on (MAP classifier):
ω_i = \arg\max_i { P(x | ω_i) P(ω_i) }
A remaining task is to parametrize and learn P(x|ωi) and to define (or
learn) the prior probabilities P(ωi)
The class-conditional densities P(x | ω_i) can be modeled with Gaussian mixture models (GMM):
P(x | ω_i) = \sum_{q=1}^{Q} w_{i,q}\, N(x; µ_{i,q}, Σ_{i,q})
where w_{i,q} are the weights and N(x; µ, Σ) is a Gaussian distribution
Figure: GMM in one dimension [Heittola, MSc thesis, 2004]
Each mixture component is a multivariate Gaussian:
N(x; µ, Σ) = \frac{1}{\sqrt{(2π)^D |Σ|}} \exp\left( -\frac{1}{2} (x − µ)^T Σ^{-1} (x − µ) \right)
GMM is a parametric model
– parameters: weights w_{i,q}, means µ_{i,q}, covariance matrices Σ_{i,q}
– diagonal covariance matrices can be used if the features are decorrelated → far fewer parameters
Parameters of the GMM of each class are estimated to maximize P(x | ω_i), i.e., the probability of the training data of each class ω_i
– iterative EM algorithm
Toolboxes: insert training data → get the model → use it to classify new instances
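A minimal Matlab sketch of evaluating a diagonal-covariance GMM density for one feature vector; the parameter values below are placeholders, whereas in practice they would come from EM training.

% Sketch: evaluate a diagonal-covariance GMM density p(x | class).
D = 2; Q = 3;
w  = [0.5 0.3 0.2];                 % component weights (sum to 1)
mu = [0 0; 2 1; -1 3];              % Q x D component means (placeholders)
sg = ones(Q, D);                    % Q x D component variances (placeholders)
x  = [0.5 0.2];                     % feature vector to evaluate
p  = 0;
for q = 1:Q
    d = x - mu(q, :);
    g = exp(-0.5 * sum(d.^2 ./ sg(q, :))) / sqrt((2*pi)^D * prod(sg(q, :)));
    p = p + w(q) * g;               % weighted sum of Gaussian components
end
% p is P(x | omega_i) under this GMM.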
MAP classification
After having learned P(x | ω_i) and P(ω_i) for each class, maximum a posteriori classification of a new instance x:
ω_i = \arg\max_i { P(x | ω_i) P(ω_i) }
Usually we have a sequence of feature vectors x_{1:T}, and
ω_i = \arg\max_i \left\{ \prod_{t=1}^{T} P(x_t | ω_i) P(ω_i) \right\}
Since the logarithm is order-preserving, we can equivalently maximize
ω_i = \arg\max_i \left\{ \log \prod_{t=1}^{T} P(x_t | ω_i) P(ω_i) \right\} = \arg\max_i \left\{ \sum_{t=1}^{T} \log\left[ P(x_t | ω_i) P(ω_i) \right] \right\}
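A minimal Matlab sketch of MAP classification of a feature-vector sequence using per-class diagonal-covariance GMMs, following the sum-of-logs form above; all model parameters are placeholders rather than trained values.

% Sketch: MAP classification of a sequence with per-class GMMs (placeholder models).
D = 2; Q = 2; T = 20; nClasses = 2;
X = randn(T, D) + 1;                               % placeholder observation sequence
priors = [0.5 0.5];                                % P(omega_i)
mu{1} = [0 0; 1 1];  sg{1} = ones(Q, D);  w{1} = [0.5 0.5];   % class 1 GMM
mu{2} = [2 2; 3 1];  sg{2} = ones(Q, D);  w{2} = [0.6 0.4];   % class 2 GMM
score = zeros(nClasses, 1);
for i = 1:nClasses
    for t = 1:T
        p = 0;                                     % P(x_t | omega_i) under the GMM
        for q = 1:Q
            d = X(t, :) - mu{i}(q, :);
            g = exp(-0.5 * sum(d.^2 ./ sg{i}(q, :)));
            g = g / sqrt((2*pi)^D * prod(sg{i}(q, :)));
            p = p + w{i}(q) * g;
        end
        score(i) = score(i) + log(p * priors(i));  % sum of log[P(x_t|w_i)P(w_i)]
    end
end
[~, winner] = max(score);                          % MAP class for the whole sequence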