Frédéric Bevilacqua
Ircam
Real Time Musical Interactions Team
Frederic.Bevilacqua@ircam.fr
http://imtr.ircam.fr
Plan
Research Context
Digital Musical Instruments
Gesture and Music
Gesture Analysis/Recognition of Musicians' Gestures
Mapping between Gestures and Sounds
Gesture Following and Recognition
Applications
[Overview diagram]
Gesture capture: sensors, video, game interfaces, motion capture
Analysis: gesture analysis, sound analysis
Interaction paradigms: synchronization, gesture-sound mapping
Sound synthesis: analysis/synthesis, concatenative synthesis, physical models
Output: sound synthesis, audio processing, visualization
Contexts
Music Technology
Human-Machine Interaction
Gesture Research, cognitive sciences
Digital Musical Instruments
Interface
Sound Synthesis
Interaction Design
New design
Instrument-like
MIDI keyboard
Marimba Lumina (Buchla)
http://fr.youtube.com/watch?v=FNIKY5kGwLg
Instrument-inspired
Augmented Instruments
HyperCello
Tod Machover / Yo-Yo Ma
(1991)
Theremin, 1928
Alternative controllers
http://fr.youtube.com/watch?v=U1L-mVGqug4
Commercial interfaces
http://mopho.stanford.edu/
Da Fact
reactable
http://www.reactable.com/
Installation Grainstick
Applications
Rehabilitation (?)
Sonification of gesture/action (?)
Affective computing
Bill Buxton
http://www.billbuxton.com/
buxtonIRGVideos.html
http://www.youtube.com/watch?v=Arrus9CxUiA
Musical Interfaces
action-perception loop
importance of timing and synchronization
requirements: low latency (< 10 ms)
practical and easy to use
interesting to use
creative
Sound-producing
Sound-modifying
Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). Concepts and Methods in Research on Music-related Gestures. In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.
[Figure 4.7: Examples of where different types of music-related movements (sound-producing, ancillary and communicative) may be found in piano performance. Labels: phrasing, expressive, theatrical, excitation, support, entrained, modification.]
Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). Concepts and Methods in Research on Music-related Gestures. In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.
sensor attached on the bow
hybrid system + analysis
E. Schoonderwaldt, N. Rasamimanana, F. Bevilacqua. Combining accelerometer and video camera: Reconstruction of bow velocity profiles. NIME 2006.
[Plots: acceleration extrema amax, amin (a.u.) for spiccato and détaché bowing]
2 violin players, 2 tempi (60 bpm, 120 bpm), dynamics (pp, mf, ff)
N. Rasamimanana et al., GW 2005, Lecture Notes in Artificial Intelligence 3881, pp. 145-155, 2006.
Similar works
PCA + KNN
D. Young. Classification of common violin bowing techniques using gesture data from a playable measurement system. In NIME 2008 Proceedings, 2008.
Bowing - Segmentation
[Plot: acceleration (a.u.) vs. time for détaché and martelé strokes]
Processing chain (a sketch follows below):
1. offset removal
2. gesture intensity computation
3. segmentation: peak selection (amax, amin)
4. characterization/recognition: kNN
5. OSC out
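A minimal sketch of this chain in Python (NumPy/SciPy/scikit-learn; the peak spacing, the intensity measure, and the two-value feature set are illustrative assumptions, not the exact implementation from the paper):

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.neighbors import KNeighborsClassifier

def segment_strokes(acc, fs, min_gap_s=0.1):
    """Offset removal, intensity computation, peak-based segmentation."""
    acc = acc - acc.mean()                          # offset removal
    intensity = np.abs(acc)                         # crude gesture intensity
    peaks, _ = find_peaks(intensity, distance=int(min_gap_s * fs))
    return peaks                                    # candidate stroke boundaries

def stroke_features(acc, peaks):
    """Characterize each stroke by its acceleration extrema (amax, amin)."""
    return np.array([[acc[a:b].max(), acc[a:b].min()]
                     for a, b in zip(peaks[:-1], peaks[1:])])

# kNN recognition on strokes labeled by bowing style (détaché, martelé, ...)
knn = KNeighborsClassifier(n_neighbors=3)
# knn.fit(training_features, training_labels)       # prerecorded labeled strokes
# labels = knn.predict(stroke_features(acc, segment_strokes(acc, fs)))
```

The predicted labels would then be sent out via OSC with an OSC library of choice.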
BogenLied
[Diagram: accelerometers + wireless module and mic → receiver hub → soundcard → gesture processing and sound processing → spatialized sound (6 channels)]
Bowing styles
[Plots: acceleration vs. velocity (mm/s) over time for détaché, martelé, spiccato; with video]
Bowing model: Minimum Jerk
[Plots: original velocity profiles for martelé and détaché vs. model fits (trapezoid, minimum jerk, minimizing cost J_d); acceleration in mm/s²]
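For reference, the standard closed-form minimum-jerk profile used for this kind of comparison (this is the textbook point-to-point solution, not code from the study):

```python
import numpy as np

def min_jerk(x0, x1, T, t):
    """Position and velocity of the minimum-jerk trajectory x0 -> x1 over T."""
    tau = np.clip(t / T, 0.0, 1.0)
    pos = x0 + (x1 - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = (x1 - x0) / T * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return pos, vel   # vel is the smooth bell shape fitted to bow velocity
```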
Gestural Co-articulation
[Plots: détaché and martelé sequences modeled with minimum-impulse and minimum-jerk segments]
Co-articulation effect
Mapping
The conceptual framework distinguishes between syntax and semantics and, in between, a connection layer that consists of expressiveness spaces and mappings (affect/emotion).
Signal-based layer: the signal-based layer extracts the relevant signal-based descriptions of the syntactical features.
Antonio Camurri, Gualtiero Volpe, Giovanni De Poli, Marc Leman, "Communicating Expressiveness and Affect in Multimodal Interactive Systems," IEEE MultiMedia, vol. 12, no. 1, pp. 43-53, Jan. 2005.
Mapping
Spatial vs Temporal; Direct vs Indirect
Direct:
- sensor data directly connected to music parameters
- relationship manually set
Indirect:
- uses machine learning techniques to set the relationship
Mapping (Spatial)
one-to-one: sensor data → sound parameters
one-to-many: sensor data → sound parameters
many-to-one: sensor data → sound parameters
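A toy illustration of the three spatial strategies (sensor and parameter names are hypothetical):

```python
# one-to-one: one sensor value drives exactly one sound parameter
def one_to_one(sensors):
    return {"pitch": sensors["bend"]}

# one-to-many: one sensor value fans out to several sound parameters
def one_to_many(sensors):
    return {"cutoff": sensors["pressure"], "gain": sensors["pressure"] ** 2}

# many-to-one: several sensor values combine into one sound parameter
def many_to_one(sensors):
    return {"brightness": 0.5 * sensors["tilt"] + 0.5 * sensors["speed"]}
```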
Mapping: Spatial vs Temporal
Neural networks: mostly static postures
Synchronization and recognition
[Diagram: Glove-Talk. Inputs: x, y, z, roll, pitch, yaw every 1/60 second; 10 flex angles, 4 abduction angles, thumb and pinkie rotation, wrist pitch and yaw every 1/100 second; contact switches; foot pedal. A preprocessor feeds a V/C decision network, a vowel network and a consonant network; a combining function plus fixed pitch and fixed stop mappings drive the speech synthesizer.]
Fels, S. S. and Hinton, G. E. Glove-Talk: A neural network interface between a data-glove and a speech synthesizer.
IEEE Trans. on Neural Networks, vol. 4, no. 1, 1993.
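Glove-Talk's vowel/consonant networks were purpose-built; as a generic, hedged sketch of the idea (learning an indirect mapping from glove features to synthesis parameters with a small neural network; the data and parameter names below are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: one row per frame of hand data (flex angles, position, orientation)
# Y: synthesizer parameters demonstrated for those hand shapes
X = np.random.rand(500, 16)                      # placeholder training poses
Y = np.random.rand(500, 3)                       # e.g. pitch + two formants

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, Y)
params = net.predict(X[:1])                      # new pose -> sound parameters
```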
Conducting gestures
Several works on conducting gestures
[Figure: conga graph. Bounce detector, rotate, and zero-crossing features, gated by the current state (S1-S5), drive a state machine whose states sit at progress values 0.00, 0.12, 0.31, 0.63 and 0.84; the machine outputs beat and tempo.]
Figure: The left figure shows the conga graph for the four-beat neutral-legato gesture profile. Five features are detected, which are used to trigger the progress of a state machine that also acts as a beat predictor. The input to the state machine is the current progress (0 to 1) of the baton as it moves through one complete cycle of the gesture, starting at the first beat. The right figure shows the corresponding beat pattern that is tracked: numbered circles indicate beats; squared labels indicate the features that are tracked and the state that they correspond to.
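A hedged sketch of such a progress-driven state machine (thresholds taken from the figure; the detector logic is reduced to boolean feature events, and the output is simplified to one tempo estimate per completed cycle):

```python
THRESHOLDS = [0.00, 0.12, 0.31, 0.63, 0.84]   # expected progress at S1..S5

class BeatTracker:
    def __init__(self):
        self.state = 0                # index into THRESHOLDS
        self.last_cycle_time = None

    def update(self, progress, feature_fired, now):
        """progress: 0..1 within one gesture cycle; returns tempo on wrap."""
        if not feature_fired or progress < THRESHOLDS[self.state]:
            return None
        self.state = (self.state + 1) % len(THRESHOLDS)
        if self.state != 0:
            return None               # still mid-cycle
        tempo = None
        if self.last_cycle_time is not None:
            tempo = 60.0 / (now - self.last_cycle_time)   # cycles per minute
        self.last_cycle_time = now
        return tempo
```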
[Tables: summary of latency results for the four-beat neutral-legato profile and for the up-down profile]
E. Lee, I. Grüll, H. Kiel, and J. Borchers. conga: a framework for adaptive conducting gesture analysis. In NIME 06: Proceedings of the 2006 conference on New interfaces for musical expression, pages 260-265, Paris, France, 2006.
Score following
[Diagram: observations from the audio, score positions, Hidden Markov Model; decoding yields the position in the score]
Score following
[truncated excerpt] The model is based on a Semi-Markov structure: each event is represented by a single state (instead of several), using an explicit duration distribution d_j(u) for each state. St denotes the discrete random variable taking values i from a state space; if the system is in state m at time t, then St = m.
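The practical difference, sketched with made-up numbers: a standard HMM state with self-transition probability a has a geometric duration (mean 1/(1-a)), whereas a semi-Markov state draws its duration from an explicit distribution d_j(u), e.g. centered on the note length given by the score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard HMM: staying with probability a gives a geometric duration,
# heavily skewed, with mean 1/(1 - a) frames.
a = 0.9
hmm_durations = rng.geometric(1 - a, size=10_000)        # mean ~10, std ~9.5

# Semi-Markov state: explicit duration distribution d_j(u), e.g. tightly
# centered on the expected note length (10 frames here).
smm_durations = np.maximum(1, rng.normal(10, 1.5, size=10_000).round())

print(hmm_durations.mean(), hmm_durations.std())
print(smm_durations.mean(), smm_durations.std())         # much tighter
```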
[Diagram: off-line, a score parser converts the score into symbols for HMM modeling; in real time, features extracted from the audio stream feed inference & decoding against the model, yielding position, tempo, and score actions]
Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F., Rasamimanana, N. Continuous realtime gesture following and recognition. Lecture Notes in Computer Science (LNCS), Gesture in Embodied Communication and Human-Computer Interaction, Springer Verlag, 2009.
F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, N. Leroy, "Wireless sensor interface and gesture-follower for music pedagogy", Proc. of the International Conference on New Interfaces for Musical Expression (NIME 07), pp. 124-129, 2007.
Goals
following: time progression of the performed gesture
recognition/characterization: similarity of the performed gesture to prerecorded gestures
Requirements
Gesture?
multimodal data
image descriptors
accelerometers, gyroscope, magnetometers
pitch, loudness
MFCCs, ...
time
transition probabilities
Markov Chains
HMM structures
[Diagrams: left-to-right chains over time, with one state every sample or one state every two samples]
Hybrid Approach
Hybrid between:
- template-based Dynamic Time Warping
- Linear Dynamics Model
- HMM
[Diagram: recorded example over time]
Synchronization/following
Recognition
Anticipation (prediction)
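For the template side, the classic (offline) dynamic time warping distance is shown below; the gesture follower's contribution is doing this kind of alignment causally in real time, which this sketch does not attempt:

```python
import numpy as np

def dtw_distance(ref, perf):
    """O(N*M) DTW between two sequences of feature frames (frames x dims)."""
    N, M = len(ref), len(perf)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = np.linalg.norm(ref[i - 1] - perf[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M]    # lower = more similar after optimal time alignment
```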
Time warping
[Plots: x, y, z acceleration over time; references vs. time-warped performed gesture]
Learning phase
Transition matrix
left-to-right Markov chain
states regularly spaced in time
transition matrix set by the sampling rate
direct relationship between state number i and time
(T = 1/(1-a), where a is the self-transition probability)
Emission probabilities
values from the time profile
calculated or set by user
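A sketch of this learning phase (one state per template frame, Gaussian emissions centered on the frames; sigma and a are user-set, as on the slide, and the file name is a placeholder):

```python
import numpy as np

def left_to_right_matrix(n_states, a=0.5):
    """Stay with probability a, advance with 1 - a (expected dwell 1/(1-a))."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = a, 1.0 - a
    A[-1, -1] = 1.0                              # final state is absorbing
    return A

template = np.loadtxt("recorded_gesture.txt")    # placeholder: frames x dims
A = left_to_right_matrix(len(template))
means, sigma = template, 0.1                     # one Gaussian per state
```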
Forward Calculation
b = state probabilities for the given observation O(t_n)
A = transition matrix
α(t_{n+1}) = A [α(t_n) · b]
α(t_{n+1}) = state probability at t = t_{n+1}
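In code, with the Gaussian emissions from the learning phase (a minimal sketch in the slide's notation; with A[i, j] = p(i → j), the matrix acts transposed):

```python
import numpy as np

def forward_step(alpha, A, obs, means, sigma):
    """alpha(t_{n+1}) = A [alpha(t_n) * b]; returns it normalized + its mass."""
    # b: likelihood of the new observation under each state's Gaussian
    b = np.exp(-0.5 * np.sum((means - obs) ** 2, axis=1) / sigma**2)
    alpha_new = A.T @ (alpha * b)        # with A[i, j] = p(i -> j)
    mass = alpha_new.sum()               # how well the observation fits
    return alpha_new / mass, mass        # normalize to avoid underflow
```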
Decoding phase
state with maximum probability at time t → time progression
sum over states i of α_i → likelihood at time t
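Continuing the sketch, decoding reads both quantities from the forward variables at each frame:

```python
import numpy as np

def decode(alpha, mass):
    """Read time progression and match likelihood at the current frame."""
    position = int(np.argmax(alpha))   # most probable state ~ time progression
    return position, float(mass)       # mass ~ similarity to this reference

# With one forward pass per recorded reference, the reference with the highest
# running likelihood is the recognized gesture; its position gives the
# synchronization point.
```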
[Plots: robustness to scaling, offset, and noise]
music performance
gesture = acceleration, angular velocity, pressure, audio energy
Conducting
[Figure: state sequences over time (samples 400-1600, states 1-8). Classical HMM: a handful of states q_t. Gesture Follower: one state per sample. Segmental HMM: trajectory segments q_1, q_2, q_3, ..., q_T of lengths l_n.]
[Plot: audio descriptor profile (0-0.8) over time 10-18 s, decoded into segmental primitives s1-s9 with labels f2-f6 and I1, I2, I4]
[1] J. Bloit, N. Rasamimanana, and F. Bevilacqua. Modeling and segmentation of audio descriptor
profiles with segmental models. Pattern Recognition Letters, 2009.
[2] J. Bloit, N. Rasamimanana, and F. Bevilacqua. Towards morphological sound description using
segmental models. In DAFX, Como, Italy, 2009.
Thanks to
Atelier les Feuillantines and students, Rémy Muller, Jean-Philippe Lambert, Alice Daquet, Anthony Sypniewski, Donald Glowinski, Bertha Bermudez and EG|PC, Myriam Gourfink, Richard Siegal, Hillary Goidell, Florent Berenger, Florence Baschet