Frédéric Bevilacqua
Ircam
Real Time Musical Interactions Team
Frederic.Bevilacqua@ircam.fr
http://imtr.ircam.fr
Plan
Research Context
Digital Musical Instruments
Gesture and Music
Gesture Analysis/Recognition of Musicians' Gestures
Mapping between Gestures and Sounds
Gesture Following and Recognition
Applications
[Overview diagram]
Gesture capture: sensors, video, game interfaces, motion capture
Analysis: gesture analysis, sound analysis
Interaction paradigms: synchronization, gesture-sound mapping
Sound synthesis: analysis/synthesis, concatenative synthesis, physical models
Output: sound synthesis, audio processing, visualization
Contexts
Music Technology
Human-Machine Interaction
Gesture Research, cognitive sciences
Digital Musical Instruments
Interface
Sound Synthesis
Interaction Design
New design
Instrument-like
MIDI keyboard
Marimba Lumina (Buchla)
http://fr.youtube.com/watch?v=FNIKY5kGwLg
Instrument-inspired
Augmented Instruments
HyperCello
Tod Machover / Yo-Yo Ma
(1991)
Theremin, 1928
Alternative controllers
http://fr.youtube.com/watch?v=U1L-mVGqug4
Commercial interfaces
http://mopho.stanford.edu/
Da Fact
reactable
http://www.reactable.com/
Installation Grainstick
Applications
Rehabilitation (?)
Sonification of gesture/action (?)
Affective computing
Bill Buxton
http://www.billbuxton.com/
buxtonIRGVideos.html
http://www.youtube.com/watch?v=Arrus9CxUiA
Musical Interfaces
action-perception loop
importance of timing and synchronization
requirements: low latency (< 10 ms)
practical and easy to use
interesting to use
creative
Sound-producing
Sound-modifying
Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). Concepts and Methods in Research on Music-related Gestures. In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.
[Figure 4.7: Examples of where different types of music-related movements (sound-producing, ancillary and communicative) may be found in piano performance. Labels: phrasing, expressive, theatrical, excitation, support, entrained, modification.]
Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). Concepts and Methods in Research on Music-related Gestures. In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.
sensor attached on the bow
hybrid system + analysis
E. Schoonderwaldt, N. Rasamimanana, F. Bevilacqua. Combining accelerometer and video camera: Reconstruction of bow velocity profiles. NIME 2006.
[Plots: acceleration extrema amax, amin (a.u.) for spiccato and détaché bowing]
2 violin players, 2 tempi (60 bpm, 120 bpm), dynamics (pp, mf, ff)
N. Rasamimanana et al., GW 2005, Lecture Notes in Artificial Intelligence 3881, pp. 145-155, 2006.
Similar works
PCA + KNN
D. Young. Classification of common violin bowing techniques using gesture data from a playable measurement system. In NIME 2008 Proceedings, 2008.
Bowing - Segmentation
[Plot: acceleration (a.u.) vs. time for détaché and martelé strokes]
Processing chain (a sketch follows below):
1. offset removal
2. gesture intensity computation
3. segmentation: peak selection (amax, amin)
4. characterization/recognition: kNN
5. OSC out
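A minimal sketch of this chain in Python (NumPy/SciPy/scikit-learn; the peak spacing, the intensity measure, and the two-value feature set are illustrative assumptions, not the exact implementation from the paper):

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.neighbors import KNeighborsClassifier

def segment_strokes(acc, fs, min_gap_s=0.1):
    """Offset removal, intensity computation, peak-based segmentation."""
    acc = acc - acc.mean()                          # offset removal
    intensity = np.abs(acc)                         # crude gesture intensity
    peaks, _ = find_peaks(intensity, distance=int(min_gap_s * fs))
    return peaks                                    # candidate stroke boundaries

def stroke_features(acc, peaks):
    """Characterize each stroke by its acceleration extrema (amax, amin)."""
    return np.array([[acc[a:b].max(), acc[a:b].min()]
                     for a, b in zip(peaks[:-1], peaks[1:])])

# kNN recognition on strokes labeled by bowing style (détaché, martelé, ...)
knn = KNeighborsClassifier(n_neighbors=3)
# knn.fit(training_features, training_labels)       # prerecorded labeled strokes
# labels = knn.predict(stroke_features(acc, segment_strokes(acc, fs)))
```

The predicted labels would then be sent out via OSC with an OSC library of choice.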
BogenLied
[Diagram: accelerometers + wireless module and mic → receiver hub → soundcard → gesture processing and sound processing → spatialized sound (6 channels)]
Bowing styles
[Plots: acceleration vs. velocity (mm/s) over time for détaché, martelé, spiccato; with video]
Bowing model: Minimum Jerk
[Plots: original velocity profiles for martelé and détaché vs. model fits (trapezoid, minimum jerk, minimizing cost J_d); acceleration in mm/s²]
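For reference, the standard closed-form minimum-jerk profile used for this kind of comparison (this is the textbook point-to-point solution, not code from the study):

```python
import numpy as np

def min_jerk(x0, x1, T, t):
    """Position and velocity of the minimum-jerk trajectory x0 -> x1 over T."""
    tau = np.clip(t / T, 0.0, 1.0)
    pos = x0 + (x1 - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    vel = (x1 - x0) / T * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    return pos, vel   # vel is the smooth bell shape fitted to bow velocity
```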
Gestural Co-articulation
[Plots: détaché and martelé sequences modeled with minimum-impulse and minimum-jerk segments]
Co-articulation effect
Mapping
The conceptual framework distinguishes between syntax and semantics and, in between, a connection layer that consists of expressiveness spaces and mappings (affect/emotion).
Signal-based layer: the signal-based layer extracts the relevant signal-based descriptions of the syntactical features.
Antonio Camurri, Gualtiero Volpe, Giovanni De Poli, Marc Leman, "Communicating Expressiveness and Affect in Multimodal Interactive Systems," IEEE MultiMedia, vol. 12, no. 1, pp. 43-53, Jan. 2005.
Mapping
Spatial vs Temporal; Direct vs Indirect
Direct:
- sensor data directly connected to music parameters
- relationship manually set
Indirect:
- uses machine learning techniques to set the relationship
Mapping (Spatial)
one-to-one: sensor data → sound parameters
one-to-many: sensor data → sound parameters
many-to-one: sensor data → sound parameters
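A toy illustration of the three spatial strategies (sensor and parameter names are hypothetical):

```python
# one-to-one: one sensor value drives exactly one sound parameter
def one_to_one(sensors):
    return {"pitch": sensors["bend"]}

# one-to-many: one sensor value fans out to several sound parameters
def one_to_many(sensors):
    return {"cutoff": sensors["pressure"], "gain": sensors["pressure"] ** 2}

# many-to-one: several sensor values combine into one sound parameter
def many_to_one(sensors):
    return {"brightness": 0.5 * sensors["tilt"] + 0.5 * sensors["speed"]}
```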
Mapping: Spatial vs Temporal
Neural networks: mostly static postures
Synchronization and recognition
[Diagram: Glove-Talk. Inputs: x, y, z, roll, pitch, yaw every 1/60 second; 10 flex angles, 4 abduction angles, thumb and pinkie rotation, wrist pitch and yaw every 1/100 second; contact switches; foot pedal. A preprocessor feeds a V/C decision network, a vowel network and a consonant network; a combining function plus fixed pitch and fixed stop mappings drive the speech synthesizer.]
Fels, S. S. and Hinton, G. E. Glove-Talk: A neural network interface between a data-glove and a speech synthesizer.
IEEE Trans. on Neural Networks, vol. 4, no. 1, 1993.
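Glove-Talk's vowel/consonant networks were purpose-built; as a generic, hedged sketch of the idea (learning an indirect mapping from glove features to synthesis parameters with a small neural network; the data and parameter names below are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: one row per frame of hand data (flex angles, position, orientation)
# Y: synthesizer parameters demonstrated for those hand shapes
X = np.random.rand(500, 16)                      # placeholder training poses
Y = np.random.rand(500, 3)                       # e.g. pitch + two formants

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, Y)
params = net.predict(X[:1])                      # new pose -> sound parameters
```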
Conducting gestures
Several works on conducting gestures
[Figure: conga graph. Bounce detector, rotate, and zero-crossing features, gated by the current state (S1-S5), drive a state machine whose states sit at progress values 0.00, 0.12, 0.31, 0.63 and 0.84; the machine outputs beat and tempo.]
Figure: The left figure shows the conga graph for the four-beat neutral-legato gesture profile. Five features are detected, which are used to trigger the progress of a state machine that also acts as a beat predictor. The input to the state machine is the current progress (0 to 1) of the baton as it moves through one complete cycle of the gesture, starting at the first beat. The right figure shows the corresponding beat pattern that is tracked: numbered circles indicate beats; squared labels indicate the features that are tracked and the state that they correspond to.
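A hedged sketch of such a progress-driven state machine (thresholds taken from the figure; the detector logic is reduced to boolean feature events, and the output is simplified to one tempo estimate per completed cycle):

```python
THRESHOLDS = [0.00, 0.12, 0.31, 0.63, 0.84]   # expected progress at S1..S5

class BeatTracker:
    def __init__(self):
        self.state = 0                # index into THRESHOLDS
        self.last_cycle_time = None

    def update(self, progress, feature_fired, now):
        """progress: 0..1 within one gesture cycle; returns tempo on wrap."""
        if not feature_fired or progress < THRESHOLDS[self.state]:
            return None
        self.state = (self.state + 1) % len(THRESHOLDS)
        if self.state != 0:
            return None               # still mid-cycle
        tempo = None
        if self.last_cycle_time is not None:
            tempo = 60.0 / (now - self.last_cycle_time)   # cycles per minute
        self.last_cycle_time = now
        return tempo
```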
[Tables: summary of latency results for the four-beat neutral-legato profile and for the up-down profile]
E. Lee, I. Grüll, H. Kiel, and J. Borchers. conga: a framework for adaptive conducting gesture analysis. In NIME 06: Proceedings of the 2006 conference on New interfaces for musical expression, pages 260-265, Paris, France, 2006.
Score following
[Diagram: observations from the audio, score positions, Hidden Markov Model; decoding yields the position in the score]
Score following
[truncated excerpt] The model is based on a Semi-Markov structure: each event is represented by a single state (instead of several), using an explicit duration distribution d_j(u) for each state. St denotes the discrete random variable taking values i from a state space; if the system is in state m at time t, then St = m.
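The practical difference, sketched with made-up numbers: a standard HMM state with self-transition probability a has a geometric duration (mean 1/(1-a)), whereas a semi-Markov state draws its duration from an explicit distribution d_j(u), e.g. centered on the note length given by the score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard HMM: staying with probability a gives a geometric duration,
# heavily skewed, with mean 1/(1 - a) frames.
a = 0.9
hmm_durations = rng.geometric(1 - a, size=10_000)        # mean ~10, std ~9.5

# Semi-Markov state: explicit duration distribution d_j(u), e.g. tightly
# centered on the expected note length (10 frames here).
smm_durations = np.maximum(1, rng.normal(10, 1.5, size=10_000).round())

print(hmm_durations.mean(), hmm_durations.std())
print(smm_durations.mean(), smm_durations.std())         # much tighter
```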
[Diagram: off-line, a score parser converts the score into symbols for HMM modeling; in real time, features extracted from the audio stream feed inference & decoding against the model, yielding position, tempo, and score actions]
Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F., Rasamimanana, N. Continuous realtime gesture following and recognition. Lecture Notes in Computer Science (LNCS), Gesture in Embodied Communication and Human-Computer Interaction, Springer Verlag, 2009.
F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, N. Leroy, "Wireless sensor interface and gesture-follower for music pedagogy", Proc. of the International Conference on New Interfaces for Musical Expression (NIME 07), pp. 124-129, 2007.
Goals
following: time progression of the performed gesture
recognition/characterization: similarity of the performed gesture to prerecorded gestures
Requirements
Gesture?
multimodal data
image descriptors
accelerometers, gyroscope, magnetometers
pitch, loudness
MFCCs, ...
time
transition probabilities
Markov Chains
HMM structures
[Diagrams: left-to-right chains over time, with one state every sample or one state every two samples]
Hybrid Approach
Hybrid between:
- template-based Dynamic Time Warping
- Linear Dynamics Model
- HMM
[Diagram: recorded example over time]
Synchronization/following
Recognition
Anticipation (prediction)
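For the template side, the classic (offline) dynamic time warping distance is shown below; the gesture follower's contribution is doing this kind of alignment causally in real time, which this sketch does not attempt:

```python
import numpy as np

def dtw_distance(ref, perf):
    """O(N*M) DTW between two sequences of feature frames (frames x dims)."""
    N, M = len(ref), len(perf)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = np.linalg.norm(ref[i - 1] - perf[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M]    # lower = more similar after optimal time alignment
```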
Time warping
[Plots: x, y, z acceleration over time; references vs. time-warped performed gesture]
Learning phase
Transition matrix
left-to-right Markov chain
states regularly spaced in time
transition matrix set by the sampling rate
direct relationship between state number i and time
(T = 1/(1-a), where a is the self-transition probability)
Emission probabilities
values from the time profile
calculated or set by user
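A sketch of this learning phase (one state per template frame, Gaussian emissions centered on the frames; sigma and a are user-set, as on the slide, and the file name is a placeholder):

```python
import numpy as np

def left_to_right_matrix(n_states, a=0.5):
    """Stay with probability a, advance with 1 - a (expected dwell 1/(1-a))."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i], A[i, i + 1] = a, 1.0 - a
    A[-1, -1] = 1.0                              # final state is absorbing
    return A

template = np.loadtxt("recorded_gesture.txt")    # placeholder: frames x dims
A = left_to_right_matrix(len(template))
means, sigma = template, 0.1                     # one Gaussian per state
```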
Forward Calculation
b = state probabilities for the given observation O(t_n)
A = transition matrix
α(t_{n+1}) = A [α(t_n) · b]
α(t_{n+1}) = state probability at t = t_{n+1}
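In code, with the Gaussian emissions from the learning phase (a minimal sketch in the slide's notation; with A[i, j] = p(i → j), the matrix acts transposed):

```python
import numpy as np

def forward_step(alpha, A, obs, means, sigma):
    """alpha(t_{n+1}) = A [alpha(t_n) * b]; returns it normalized + its mass."""
    # b: likelihood of the new observation under each state's Gaussian
    b = np.exp(-0.5 * np.sum((means - obs) ** 2, axis=1) / sigma**2)
    alpha_new = A.T @ (alpha * b)        # with A[i, j] = p(i -> j)
    mass = alpha_new.sum()               # how well the observation fits
    return alpha_new / mass, mass        # normalize to avoid underflow
```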
Decoding phase
state with maximum probability at time t → time progression
sum over states i of α_i → likelihood at time t
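Continuing the sketch, decoding reads both quantities from the forward variables at each frame:

```python
import numpy as np

def decode(alpha, mass):
    """Read time progression and match likelihood at the current frame."""
    position = int(np.argmax(alpha))   # most probable state ~ time progression
    return position, float(mass)       # mass ~ similarity to this reference

# With one forward pass per recorded reference, the reference with the highest
# running likelihood is the recognized gesture; its position gives the
# synchronization point.
```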
[Plots: robustness to scaling, offset, and noise]
music performance
gesture = acceleration, angular velocity, pressure, audio energy
Conducting
[Figure: state sequences over time (samples 400-1600, states 1-8). Classical HMM: a handful of states q_t. Gesture Follower: one state per sample. Segmental HMM: trajectory segments q_1, q_2, q_3, ..., q_T of lengths l_n.]
[Plot: audio descriptor profile (0-0.8) over time 10-18 s, decoded into segmental primitives s1-s9 with labels f2-f6 and I1, I2, I4]
[1] J. Bloit, N. Rasamimanana, and F. Bevilacqua. Modeling and segmentation of audio descriptor
profiles with segmental models. Pattern Recognition Letters, 2009.
[2] J. Bloit, N. Rasamimanana, and F. Bevilacqua. Towards morphological sound description using
segmental models. In DAFX, Como, Italy, 2009.
Thanks to
Atelier les Feuillantines and students, Rémy Muller, Jean-Philippe Lambert, Alice Daquet, Anthony Sypniewski, Donald Glowinski, Bertha Bermudez and EG|PC, Myriam Gourfink, Richard Siegal, Hillary Goidell, Florent Berenger, Florence Baschet