Beruflich Dokumente
Kultur Dokumente
in
Audio and Acoustic Signal Processing
ICASSP 2011
Malcolm Slaney, Yahoo! Research Silicon Valley, USA
Patrick A. Naylor, Imperial College London, UK
What do we mean by ‘Trends’?
ACT TS
I N ow
# p a V I TY E M E o n
per C H IEV c a n d efore
s su
b mi
A
g s w e l d n ’t b
tted Th i n c ou
w e
that
Research
CU
Th RIOS
i
int ngs t ITIES
E S to do kno erest hat m
LE NG an t w w ing i gh
L w
CHA gs we o yet b
ha ut t b e
t to we
in d
Th can’t OPPORTUNITIES d o d on
but Things we didn’t
wit ’t
ht
he
know we m
wanted to do
Historical Trends – ICASSP Submissions
Submissions
Year
Historical Trends – ICASSP Accepted Papers
Accepted Papers
Year
Music at ICASSP
• Three sessions
– SS-L5: Music Signal Processing Exploiting Musical Knowledge
– AE-L3: Music Signal Processing
– AE-P7: Music Signal Processing
• Reasons
– New EDICS
– More content
– Commercially relevant
Big Datasets
• Million-Song Dataset
– Distribute features
– Columbia and EchoNest
MIREX Competition
Musical Separation
• Sound separation
– Uses
• Understanding (key, melody)
• Transcriptions
• Multipitch estimation
– With better models
• HMM
• Scores
– Techniques
• NMF
• Matching Pursuit
• PLCA
From: POLYPHONIC AUDIO-TO-SCORE ALIGNMENT BASED ON BAYESIAN LATENT HARMONIC ALLOCATION HIDDEN MARKOV MODEL. Akira Maezawa,
Hiroshi G. Okuno, Tetsuya Ogata, Kyoto University, Japan; Masataka Goto, National Institute of Advanced Industrial Science and Technology, Japan
Music Research
• Tagging
– Genre
– Emotion
• Miscellaneous
– Morphing
– Similarity
AASP-P2.1: SOUND MORPHING BY FEATURE INTERPOLATION, Marcelo Caetano, Xavier Rodet, Institut de Recherche
et Coordination Acoustique/Musique, France
Applications vs. Algorithms?
If I’m going to be Queen, I suppose I
will not have much time left for
Audio and Acoustic Signal Processing
My Expectation
Maximization
algorithm has converged !
If there is a trend towards things that look nice (applications), let’s not
lose sight of the fundamental power behind them (algorithms).
Microphone Array Signal Processing
APPLICATIONS TASKS
• Hearing aids • Localization of sources
• TV / Entertainment • Tracking
• Extraction/Separation
• Inference of room geometry
GEOMETRY and DISTRIBUTION
• Linear, planar/cylindrical,
spherical, distributed
• Spacing and orientation
the Eigenmike®
– mh Acoustics
32 elements
8.4 cm rigid sphere
2 - 64 elements, 0.5 m, linear ‘wing’ array
Planar Array Geometry Directivity Index
measured vs theoretical
Source Separation
Colin Cherry
‘TRENDSETTER’
Source Separation
• Dereverberation technology
real-world applications
• Dereverberation1 technology
real-world applications
Clean Reverb-
Speech eration
HMM Model
Recurring Theme
S P A R S I T Y!
History of Sparsity at ICASSP
I thought:
Matching Pursuit Dumb!
Compressed Sensing
L1 Regularization
SS-L7.1: SPEECH PROCESSING WITH A CORTICAL REPRESENTATION OF AUDIO, Nima Mesgarani, Shihab Shamma,
University of Maryland, United States
Deep Belief Network
Basis 1
Basis 2
Industrial Perspectives
• Trend
– Origin: Old English trendan ‘revolve, rotate’, of Germanic origin
• Better representations
– Sparse
• Matching pursuit
• DBNs
• features for recognition
– New angles (cortical and textures)
– Subspaces (latent and otherwise)