
Gesture Control of Music Systems

Frédéric Bevilacqua
Ircam
Real Time Musical Interactions Team
Frederic.Bevilacqua@ircam.fr
http://imtr.ircam.fr

Plan

Research Context
Digital Musical Instruments
Gesture and Music
Gesture Analysis/Recognition of Musicians' Gestures
Mapping between Gestures and Sounds
Gesture Following and Recognition
Applications

IRCAM- Real Time Musical Interactions

Sound Synthesis
- analysis/synthesis
- concatenative synthesis
- physical models

Gesture Capture
- sensors
- video
- game interfaces

Digital Musical Instruments

in: sound capture, motion capture
→ sound analysis, gesture analysis
→ interaction paradigms, synchronization, gesture-sound mapping
out: sound synthesis, audio processing, visualization

Contexts
- Music technology
- Human-machine interaction
- Gesture research, cognitive sciences

Digital Musical Instruments: Interface + Sound Synthesis + Interaction Design

IRCAM- Real Time Musical Interactions

Digital Music Instruments


Instrument-like: replicates an acoustic instrument

Instrument-inspired: gesture or interface inspired by an acoustic instrument, but the final musical goal differs from that of the acoustic instrument

Extended instrument, Augmented Instrument, Hyper Instrument: acoustic instrument with additional sensors

Alternate controller: new design

Marcelo M. Wanderley and Philippe Depalle. 2004. "Gestural Control of Sound Synthesis." Proceedings of the IEEE, vol. 92, no. 4 (April), pp. 632-644.
Eduardo R. Miranda and Marcelo M. Wanderley. New Digital Musical Instruments: Control and Interaction beyond the Keyboard. A-R Editions, 2006.

IRCAM- Real Time Musical Interactions

Instrument-like

MIDI keyboard

EWI Electronic Wind Controller (AKAI)

Marimba Lumina (Buchla)
http://fr.youtube.com/watch?v=FNIKY5kGwLg

IRCAM- Real Time Musical Interactions

Instrument-inspired

MIDI violin - Suguru Goto

IRCAM- Real Time Musical Interactions

Augmented Instruments

HyperCello - Tod Machover / Yo-Yo Ma (1991)

Clarinet & DataGlove - Butch Rovan

IRCAM- Real Time Musical Interactions

Theremin, 1928

IRCAM- Real Time Musical Interactions

Alternative controllers

The Hands - Michel Waisvisz

Le Méta-Instrument - Serge de Laubier

http://fr.youtube.com/watch?v=U1L-mVGqug4
IRCAM- Real Time Musical Interactions

The Hands, Michel Waisvisz

IRCAM- Real Time Musical Interactions

Georgia Tech's Guthman Musical Instrument Competition (2009)

Jaime Oliver's Silent Drum

Georgia Tech's Guthman Musical Instrument Competition (2009)

The Slabs - David Wessel (CNMAT, Berkeley)

Commercial interfaces

IRCAM- Real Time Musical Interactions

Stanford Laptop Orchestra (SLOrk)

IRCAM- Real Time Musical Interactions

http://mopho.stanford.edu/

Da Fact

IRCAM - Real Time Musical Interactions

reactable

http://www.reactable.com/
IRCAM - Real Time Musical Interactions

reactable

IRCAM - Real Time Musical Interactions

Installation Grainstick

Cité des Sciences, Paris

Pierre Jodlowski, Raphaël Thibault

Ircam
IRCAM - Real Time Musical Interactions

Applications

Music & new media:
- professional music performance, composition
- music pedagogy
- music games

HCI: interaction paradigms using expressive gestures

Rehabilitation (?)
Sonification of gesture/action (?)
IRCAM- Real Time Musical Interactions

Links to the HCI field

Notion of embodied interaction:
- P. Dourish, Where The Action Is: The Foundations of Embodied Interaction, MIT Press
- M. Leman, Embodied Music Cognition and Mediation Technology, MIT Press

Tangible interfaces, augmented reality

Affective computing

Collaborative and distributed interaction

IRCAM- Real Time Musical Interactions

Bill Buxton

http://www.billbuxton.com/
buxtonIRGVideos.html

http://www.youtube.com/watch?v=Arrus9CxUiA

IRCAM- Real Time Musical Interactions

Musical Interfaces

Action-perception loop: importance of timing and synchronization
Requirement: low latency (< 10 ms)

Practical: from triggering events... to using continuous gestures

Design space (from A. R. Jensenius, PhD thesis, 2007): easy to use / interesting to use / creative

Notion of expressivity: a measure of how a gesture is performed

Notions of goal and efficiency differ from those in standard HCI
IRCAM- Real Time Musical Interactions

"Clearly, electronic music systems allow much freedom for the


performer, because the mappings between control units, on the one hand, and sound production units, on the other hand, are not constrained by any biomechanical regularities. (...) However, as most electronic music performers know, it is exactly this freedom of mapping that may disturb the sense of contact and of non-mediation."

"Can we find a way of interacting with machines so that artistic expression can be fully integrated with contemporary technologies?"

Marc Leman, Embodied Music Cognition and Mediation Technology, MIT Press.

Gesture and Music

Gesture and Music


Some references:

Cadoz, C. and M. M. Wanderley. "Gesture - Music." In M. M. Wanderley and M. Battier (Eds.), Trends in Gestural Control of Music. Ircam - Centre Pompidou, Paris, France, 2000, pp. 71-94.

Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). "Concepts and Methods in Research on Music-related Gestures." In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.

IRCAM - Real Time Musical Interactions

Action spaces also help set up expectations when perceiving a performance: we may be surprised if a musician performs outside such conventional spaces, for example by playing with the fingers directly on the strings of the piano; much musical experimentation happens through exploring the boundaries of these performance spaces.

Types of Musical Gestures

Sound-producing

Sound-modifying

Ancillary, sound-accompanying, and communicative

[Figure 4.5: the action space, an imaginary box surrounding the performer within which movements can be made; action spaces are indicated for sound-producing and sound-modifying movements, and for ancillary, sound-accompanying and communicative movements.]

Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). "Concepts and Methods in Research on Music-related Gestures." In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.

IRCAM - Real Time Musical Interactions

Types of Musical Gestures


[Figure 4.7: examples of where different types of music-related movements (sound-producing, ancillary and communicative) may be found in piano performance. Labels in the figure include: phrasing, expressive, theatrical, excitation, support, entrained, modification.]

Jensenius, A. R., M. M. Wanderley, R. I. Godøy and M. Leman (2010). "Concepts and Methods in Research on Music-related Gestures." In Godøy, R. I. and M. Leman (Eds.), Musical Gestures: Sound, Movement and Meaning. Routledge.

IRCAM - Real Time Musical Interactions

Capturing Musician Motion


Violin bowing:
- sensor attached on the bow
- 3D optical motion capture
- hybrid system (accelerometer + video camera)

F. Bevilacqua, N. Rasamimanana, E. Fléty, S. Lemouton, F. Baschet. "The augmented violin project: research, composition and performance report." NIME 06.
E. Schoonderwaldt, N. Rasamimanana, F. Bevilacqua. "Combining accelerometer and video camera: reconstruction of bow velocity profiles." NIME 2006.

Capturing Musician Gestures

Direct capture of movement, pressure, etc. using sensors

Indirect capture based on sound analysis

IRCAM- Real Time Musical Interactions

Bowing styles characterization


[Figure: acceleration profiles [a.u.] for Martelé, Spiccato and Détaché strokes; features amax and amin extracted from each stroke.]

2 violin players, 2 tempi (60 bpm, 120 bpm), dynamics (pp, mf, ff)

N. Rasamimanana et al., GW 2005, Lecture Notes in Artificial Intelligence 3881, pp. 145-155, 2006.

Similar works

PCA + kNN:
D. Young. "Classification of common violin bowing techniques using gesture data from a playable measurement system." In Proceedings of NIME 2008.

IRCAM- Real Time Musical Interactions

Bowing - Segmentation

[Figure: acceleration [a.u.] over time for détaché and martelé strokes, showing the segmentation of individual strokes.]
IRCAM- Real Time Musical Interactions

Bowing recognition: real-time implementation (Max/MSP)

OSC in
→ median filter
→ offset removal
→ gesture intensity computation
→ 1st-order filtering
→ peak detection
→ segmentation
→ peak selection (amax, amin)
→ characterization/recognition (kNN)
→ OSC out
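For illustration only, a minimal offline Python sketch of an analogous chain (the thresholds, feature set and the toy nearest-neighbour classifier are assumptions; the actual system runs in Max/MSP with OSC input and output):

```python
import numpy as np

def median_filter(x, width=5):
    """Running median to suppress sensor spikes (cf. the median filter stage)."""
    x = np.asarray(x, dtype=float)
    pad = width // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + width]) for i in range(len(x))])

def segment_strokes(acc, threshold=0.1):
    """Offset removal + naive segmentation: a stroke is a region where the
    acceleration magnitude stays above a (hypothetical) threshold."""
    acc = np.asarray(acc, dtype=float)
    acc = acc - acc.mean()                      # offset removal
    active = np.abs(acc) > threshold
    edges = np.diff(active.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    return acc, list(zip(starts, ends))

def stroke_features(acc, segments):
    """Peak selection: per-stroke (amax, amin) features."""
    return np.array([[acc[s:e].max(), acc[s:e].min()] for s, e in segments])

def knn_classify(features, train_feats, train_labels, k=3):
    """Toy k-nearest-neighbour recognition of bowing styles."""
    train_labels = np.asarray(train_labels)
    out = []
    for f in features:
        d = np.linalg.norm(train_feats - f, axis=1)
        nearest = train_labels[np.argsort(d)[:k]].tolist()
        out.append(max(set(nearest), key=nearest.count))
    return out
```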

IRCAM- Real Time Musical Interactions

BogenLied

[Setup: accelerometers + wireless module on the bow and a microphone feed a receiver hub and soundcard; gesture processing and sound processing drive spatialized sound (6 channels).]

IRCAM- Real Time Musical Interactions

Bowing styles: acceleration vs. velocity

détaché, martelé, spiccato

IRCAM- Real Time Musical Interactions

Influence on bowing frequency


[Figure: bow position, velocity, and acceleration profiles over time, with accompanying video.]

N. Rasamimanana et al., GW 2007, Lecture Notes in Artificial Intelligence.

IRCAM- Real Time Musical Interactions

Bowing model

Minimum impulse: trapezoidal velocity profile → continuous control
Minimum jerk (discrete) → ballistic control

[Figure: measured bow velocity (mm/s) compared with trapezoidal (minimum-impulse) and minimum-jerk model fits.]
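As a side illustration, a small Python sketch of the two idealized velocity profiles (the standard minimum-jerk polynomial and a symmetric trapezoid; durations and amplitudes are illustrative, not those of the study):

```python
import numpy as np

def minimum_jerk_velocity(t, T, D):
    """Velocity of a minimum-jerk movement of duration T and amplitude D
    (Flash & Hogan 1985): bell-shaped, zero at both ends."""
    tau = np.clip(np.asarray(t, dtype=float) / T, 0.0, 1.0)
    return (D / T) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)

def trapezoidal_velocity(t, T, D, ramp_fraction=0.2):
    """Symmetric trapezoidal ("minimum impulse") velocity profile: linear
    ramps of duration ramp_fraction*T and a plateau, scaled so that the
    integral over [0, T] equals the displacement D."""
    t = np.asarray(t, dtype=float)
    tr = ramp_fraction * T
    v_plateau = D / (T - tr)                    # trapezoid area = v * (T - tr)
    v = np.full_like(t, v_plateau)
    v = np.where(t < tr, v_plateau * t / tr, v)
    v = np.where(t > T - tr, v_plateau * (T - t) / tr, v)
    return np.clip(v, 0.0, None)

t = np.linspace(0.0, 0.5, 200)                   # a 0.5 s bow stroke
v_jerk = minimum_jerk_velocity(t, T=0.5, D=0.3)  # 30 cm of bow displacement
v_trap = trapezoidal_velocity(t, T=0.5, D=0.3)
```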

IRCAM- Real Time Musical Interactions

Bowing style - scale

Finding the best model (minimum impulse / trapezoidal vs. minimum jerk) for martelé and détaché strokes.

N. Rasamimanana, F. Bevilacqua. "Effort-based analysis of bowing movements: evidence of anticipation effects." Journal of New Music Research.

IRCAM- Real Time Musical Interactions

Gestural Co-articulation

détaché / martelé

Minimum Impulse / Minimum Jerk

IRCAM- Real Time Musical Interactions

Co-articulation effect

a major difficulty for segmentation and characterization

use "di-gestures"? (similarly to diphones in speech)

can be used to anticipate (towards intention?)

expressivity is linked to co-articulation

IRCAM- Real Time Musical Interactions

Gesture to Sound Mapping

Mapping

Wanderley, M. 2001. Performer-Instrument Interaction: Applications to Gestural Control of Music. PhD Thesis. Paris, France: Université Pierre et Marie Curie - Paris VI.
See also:
"Mapping Strategies in Interactive Computer Music." Organised Sound, 7(2), Marcelo Wanderley (Ed.).
Wanderley, M. and Battier, M. (Eds.). Trends in Gestural Control of Music. IRCAM, Centre Pompidou, 2000.

Mapping

Low level vs. high level

Layered conceptual framework (from syntax to semantics):
- Linguistic-based descriptions of semantic properties (meaning, affect, emotion, expressiveness, and so on)
- Gesture-based descriptions as trajectories in spaces (affect/emotion/expressiveness spaces and mappings)
- Signal-based descriptions of the syntactical features (the signal-based layer extracts the relevant low-level features)

Antonio Camurri, Gualtiero Volpe, Giovanni De Poli, Marc Leman. "Communicating Expressiveness and Affect in Multimodal Interactive Systems." IEEE MultiMedia, vol. 12, no. 1, pp. 43-53, Jan. 2005.

IRCAM- Real Time Musical Interactions

Mapping

Spatial vs. Temporal:
- Spatial: relationship independent of the temporal ordering of the data
- Temporal: relationship between temporal processes

Direct vs. Indirect:
- Direct: sensor data directly connected to music parameters; relationship manually set
- Indirect: uses machine learning techniques to set the relationship
IRCAM- Real Time Musical Interactions

Mapping (Spatial)

one-to-one
sensor data

sound parameters

IRCAM- Real Time Musical Interactions

Mapping Musical Instruments


IDMIL lab, McGill

IRCAM- Real Time Musical Interactions

Mapping

one-to-many
sensor data

sound parameters

IRCAM- Real Time Musical Interactions

Mapping

many-to-one

sensor data

sound parameters
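To make the three spatial strategies concrete, a small hypothetical Python sketch (the sensor and parameter names are illustrative only):

```python
# One-to-one: each sensor value drives exactly one synthesis parameter.
def one_to_one(bow_pressure):
    return {"filter_cutoff_hz": 200.0 + 5000.0 * bow_pressure}

# One-to-many: a single sensor value drives several parameters at once.
def one_to_many(bow_velocity):
    return {
        "amplitude": bow_velocity,
        "brightness": 0.3 + 0.7 * bow_velocity,
        "vibrato_rate_hz": 4.0 + 2.0 * bow_velocity,
    }

# Many-to-one: several sensor values are combined into a single parameter.
def many_to_one(bow_velocity, bow_pressure):
    return {"loudness": 0.5 * bow_velocity + 0.5 * bow_pressure}
```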

IRCAM- Real Time Musical Interactions

Simple or complex mapping?

Hunt, A., and Kirk, R. 2000. "Mapping Strategies for Musical Performance." In M. Wanderley and M. Battier (Eds.), Trends in Gestural Control of Music. Ircam, Centre Pompidou.

The Physical Sliders Interface

The Multiparametric Interface

IRCAM- Real Time Musical Interactions

Conclusions of Hunt and Kirk study

- The multiparametric interface allowed people to think gesturally, or to mentally rehearse sounds as shapes.
- The majority of users felt that the multiparametric interface had the most long-term potential.
- Several users reported that the multiparametric interface was fun.

IRCAM- Real Time Musical Interactions

Mapping

Spatial vs. Temporal:
- Spatial: relationship independent of the temporal ordering of the data
- Temporal: relationship between temporal processes

IRCAM- Real Time Musical Interactions

Indirect Mapping using Machine Learning Techniques

Neural networks: mostly static postures

Principal Component Analysis: data dimension reduction (see the sketch below)

Finite State Machines: modeling sequences of postures

DTW and HMM methods: recognition of temporal profiles
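As an illustration of the dimension-reduction idea, a minimal PCA sketch in plain NumPy (the sensor data is hypothetical; a real system would then map the reduced components, directly or via further learning, to synthesis parameters):

```python
import numpy as np

def pca_reduce(sensor_frames, n_components=2):
    """Project multi-channel sensor frames (n_frames x n_channels) onto
    their first principal components (dimension reduction)."""
    centered = sensor_frames - sensor_frames.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Example: 9-channel IMU data reduced to 2 control dimensions.
frames = np.random.randn(1000, 9)
controls = pca_reduce(frames, n_components=2)
```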
IRCAM- Real Time Musical Interactions

Synchronization and
recognition

Sydney Fels: Glove-TalkII
An adaptive interface that maps hand gestures to speech using neural networks.

[Figure: continuum of gesture-to-speech mappings, with approximate time per gesture (msec) ranging from roughly 10-30 ms (artificial vocal tract, AVT) up to about 600 ms (spelling-based generators).]

[System diagram: right-hand data (x, y, z, roll, pitch, yaw every 1/60 s; 10 flex angles, 4 abduction angles, thumb and pinkie rotation, wrist pitch and yaw every 1/100 s), contact switches and a foot pedal feed a preprocessor; a vowel/consonant decision network, a vowel network, a consonant network, and fixed pitch / fixed stop mappings are combined and sent to a speech synthesizer.]

Fels, S. S. and Hinton, G. E. "Glove-Talk: A neural network interface between a data-glove and a speech synthesizer." IEEE Transactions on Neural Networks, vol. 4, no. 1, 1993.
IRCAM- Real Time Musical Interactions

Conducting gestures

Several works on conducting gestures:
- study of professional conducting gestures: beat detection, tempo, anticipation
- public installations
- music pedagogy

[Figure (Lee et al. 2006): the conga gesture graph for the four-beat neutral-legato gesture profile. Feature detectors (bounce detector, rotate, zero crossing) trigger the progress of a state machine (states S1-S5) that also acts as a beat predictor; its input is the current progress (0 to 1) of the baton through one complete cycle of the gesture, and its outputs are beat and tempo.]

[Tables: summary of latency results for the four-beat neutral-legato and up-down profiles.]

E. Lee, I. Grüll, H. Kiel, and J. Borchers. "conga: a framework for adaptive conducting gesture analysis." In NIME '06: Proceedings of the 2006 Conference on New Interfaces for Musical Expression, pages 260-265, Paris, France.

IRCAM - Real Time Musical Interactions

Multimodal Music Stand (MMSS)

[Figure 1: multimodal interface system model. Figure 3: prototype MMSS with electric-field-sensor antennas mounted at the corners of the stand.]

Electric-field sensing: four electric-field sensors (based on the Theremin circuit topology, but with all timing calculations done digitally) capture bodily and instrumental gestures. The overall intensity of the four antennas gives the performer's proximity to the stand (z-axis); the individual antenna signal strengths correspond to the quadrants of the x-y space, giving up/down and left/right gestures, visualized in Max/MSP/Jitter. In contrast with the Theremin's dual-channel approach, three-dimensional proximity can be sensed.

Multimodal detection layer: integrates audio and visual classification results (pitch and amplitude estimates, face detection, angle estimates) with the electric-field proximity data, all communicated via OSC. A GUI lets composers define the gestures occurring in a piece, asynchronously or synchronously ordered, in a single modality or as combinations across modalities (e.g., a gaze to the side-mounted camera together with a certain loudness of playing and proximity to one antenna). Upon detection of a pre-defined gesture, a trigger is sent to the synthesis machine as an OSC message; sensor data can also control continuously variable musical parameters such as brightness or density.

Overholt, D., Thompson, J., Putnam, L., Bell, B., Kleban, J., Sturm, B., and Kuchera-Morin, J. 2009. "A multimodal system for gesture recognition in interactive music performance." Computer Music Journal 33(4), 69-82.

IRCAM - Real Time Musical Interactions

Score Following - Antescofo~

Born in 2008; official release: Nov. 2009, IRCAM Forum.
Used in concerts worldwide; 2 workshops, 10 active pieces, active collaborations with composers.

A. Cont. "ANTESCOFO: Anticipatory Synchronization and Control of Interactive Parameters in Computer Music." International Computer Music Conference, Northern Ireland, 2008.

Score following

For score following references:
http://cosmal.ucsd.edu/arshia/index.php?n=Main.Scofobib
http://imtr.ircam.fr/imtr/Score_Following_History

Best systems use Markov/semi-Markov modelling of musical events.

[Diagram: the performer's audio is analyzed into observations; a hidden Markov model of the score (trained off-line) is decoded to estimate the position in the score.]
IRCAM - Real Time Musical Interactions

Score following

Antescofo (Anticipatory Score Follower)

Given the score's state-space representation, the real-time system extracts instantaneous beliefs (observation probabilities) of the audio features calculated from the stream with regard to the states of the score graph, and integrates this instantaneous belief with past and future beliefs in order to decode the position and tempo in real time.

[Fig. 3, general system diagram: the score is parsed off-line into a score graph; in real time, features extracted from the audio stream feed inference and decoding, which output score position, tempo, and score actions.]

Arshia Cont. "A coupled duration-focused architecture for realtime music-to-score alignment." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009 (in press).

IRCAM - Real Time Musical Interactions

Score Following / Gesture Follower


[Diagram: score following models the score as a sequence of symbols with HMMs; the gesture follower applies the same HMM modeling directly to the continuous signal.]

IRCAM - Real Time Musical Interactions

gesture follower @ Ircam


http://imtr.ircam.fr/imtr/Gesture_Follower

Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F., Rasamimanana, N. "Continuous realtime gesture following and recognition." Lecture Notes in Computer Science (LNCS), Gesture in Embodied Communication and Human-Computer Interaction, Springer Verlag, 2009.
F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, N. Leroy. "Wireless sensor interface and gesture-follower for music pedagogy." Proc. of the International Conference on New Interfaces for Musical Expression (NIME 07), pp. 124-129, 2007.

Goals

Hypothesis: the meaning of a gesture lies in its temporal evolution.

Real-time gesture analysis:
- gesture following: time progression of the performed gesture
- recognition/characterization: similarity of the performed gesture to prerecorded gestures

Requirements:
- simple learning procedure, with a single example
- adaptation to the user's idiosyncrasies
- continuous analysis from the beginning of the gesture
IRCAM - Real Time Musical Interactions

Gesture?

Any continuous data stream of parameters (typically 0.1 to 1000 Hz):
- from motion capture systems: accelerometers, gyroscopes, magnetometers, image descriptors
- from sound descriptors: pitch, loudness, MFCCs, ...
- multimodal data

IRCAM - Real Time Musical Interactions

Time Profile Modeling: HMM

[Diagram: a recorded time profile (sensor value vs. time) is modeled as a Markov chain; each state carries a probability density function over sensor values, and states are linked by transition probabilities.]
IRCAM - Real Time Musical Interactions

HMM structures

[Diagram: two model structures, with one state every sample or one state every two samples of the recorded profile; the state spacing and allowed transitions set the maximum relative speed between performed and recorded gesture (e.g., 2).]


IRCAM- Real Time Musical Interactions

Hybrid Approach

Hybrid between:
- template-based Dynamic Time Warping
- linear dynamics models
- HMM

Similar to S. Rajko et al. (ASU), also developed in an artistic context:

G. Qian, T. Ingalls and J. James. "Real-time Gesture Recognition with Minimal Training Requirements and On-line Learning." IEEE Conference on Computer Vision and Pattern Recognition, 2007.
S. Rajko and G. Qian. "A Hybrid HMM/DPA Adaptive Gesture Recognition Method." ISVC 2005, pp. 227-234.

IRCAM- Real Time Musical Interactions

Real-time time warping

[Diagram: the gesture parameter of the performed gesture (live) is aligned in time to a recorded example.]

- Synchronization/following
- Recognition
- Anticipation (prediction)
IRCAM- Real Time Musical Interactions

Time warping

[Figure: 3-axis acceleration (x, y, z) of the reference gestures and of the time-warped performed gesture, over time.]
IRCAM- Real Time Musical Interactions

Learning phase

Transition matrix:
- left-to-right Markov chain
- states regularly spaced in time
- transition matrix set by the sampling rate
- direct relationship between state number i and time (T = 1/(1 - a), where a is the self-transition probability)

Emission probabilities:
- values taken from the recorded time profile
- calculated or set by the user
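A minimal NumPy sketch of this learning step, under the simplifying assumptions of one state per sample of the recorded template, a user-set self-transition probability a, and a user-set Gaussian width sigma for the emissions (the full follower also allows transitions that skip states, giving the maximum relative speed of 2 mentioned earlier; they are omitted here):

```python
import numpy as np

def learn_template(template, a=0.5):
    """Left-to-right transition matrix from a recorded template, one state
    per sample: each state has self-transition probability a and moves to
    the next state with probability 1 - a, so the expected dwell time per
    state is T = 1 / (1 - a)."""
    n = len(template)
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = a
        A[i, i + 1] = 1.0 - a
    A[n - 1, n - 1] = 1.0                       # last state is absorbing
    return A

def emission_probs(observation, template, sigma=0.1):
    """Emission probability b_i of the current observation for each state:
    a Gaussian centred on the recorded template value (sigma user-set)."""
    template = np.asarray(template, dtype=float)
    return np.exp(-0.5 * ((observation - template) / sigma) ** 2)
```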
IRCAM- Real Time Musical Interactions

Forward Calculation

State probabilities for a given observation O(t_n): b
Transition matrix: A

α(t_{n+1}) = A [ α(t_n) ∘ b ]

(state probability vector at t = t_{n+1})
IRCAM- Real Time Musical Interactions

Decoding phase

Using the forward computation [Rabiner 89] (causal!)

Compute the probability α_i of being at state i.

[Figure: distribution of α_i over states i.]

IRCAM- Real Time Musical Interactions

Decoding phase

State with maximum probability at time t → time progression

Σ_i α_i = likelihood at time t
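Continuing the learning-phase sketch, a hedged illustration of the causal forward update and decoding (the real follower additionally windows the computation for long sequences and handles multi-dimensional observations):

```python
import numpy as np

def forward_step(alpha, A, b):
    """One causal forward update, alpha(t_{n+1}) = A[alpha(t_n) * b] in the
    slide's notation; with the row-stochastic A built above this becomes a
    transposed product. Returns the normalized state probabilities and the
    pre-normalization sum, used as the likelihood at time t."""
    alpha_new = A.T @ (alpha * b)
    likelihood = alpha_new.sum()
    return alpha_new / (likelihood + 1e-12), likelihood

def decode(alpha):
    """Time progression: index of the most probable state, i.e. the current
    position within the recorded example."""
    return int(np.argmax(alpha))

# Toy usage (relies on learn_template and emission_probs from the sketch above):
template = np.sin(np.linspace(0.0, np.pi, 100))           # recorded example
A = learn_template(template, a=0.5)
alpha = np.zeros(len(template)); alpha[0] = 1.0
for obs in np.sin(np.linspace(0.0, np.pi, 130)):           # slower performance
    alpha, lik = forward_step(alpha, A, emission_probs(obs, template))
    position = decode(alpha)                                # progression in the template
```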
IRCAM- Real Time Musical Interactions

Evaluation with synthesized signals

[Figure: mean following error (in samples) between a test signal and the reference signal, as a function of scaling, offset, and added noise.]
IRCAM- Real Time Musical Interactions

Following long sequences

Computation of α on a sliding window.

IRCAM- Real Time Musical Interactions

Gesture Follower - Context


dance
(performance
and installation)
music pedagogy

music performance

IRCAM- Real Time Musical Interactions

StreicherKreis - Florence Baschet

gesture =
acceleration
angular velocity
pressure
audio energy

Synchronizing Sound to Gesture

IRCAM- Real Time Musical Interactions

Music Pedagogy applications

Conducting

Atelier des Feuillantines - Fabrice Guédy


IRCAM- Real Time Musical Interactions

Homo Ludens (Richard Siegal - The Bakery)

IRCAM - Real Time Musical Interactions

Recognizing movement qualities

Sarah Fdili Alaoui (PhD work)


Collaboration with the dance company Emio Greco | PC
IRCAM- Real Time Musical Interactions

Towards Segmental Models


Goal: classification/segmentation of sounds and gestures based on their temporal evolutions
Approach: segmental HMM models

[Diagram comparing three models of an observation sequence y_t over time: a classical HMM, where each hidden state q_t emits one sample per step; the Gesture Follower, with one left-to-right state per step; and a segmental HMM, where each state q_n emits a whole trajectory segment [y_{t_n - l_n + 1}, ..., y_{t_n - 1}, y_{t_n}] of duration l_n.]

IRCAM- Real Time Musical Interactions

Sound and gesture morphologies


Classification/segmentation on a violin database (pitch/loudness profiles)
PhD work of Julien Bloit & Projet Interlude

Modelling by primitive assembling:
[Diagram: primitives f1-f6 assembled from elementary segment models s1-s9.]

Segmentation on a continuous stream:
[Figure: normalized loudness profile of a sequence of sfz notes over time (s), segmented and labeled with the primitives.]

[1] J. Bloit, N. Rasamimanana, and F. Bevilacqua. "Modeling and segmentation of audio descriptor profiles with segmental models." Pattern Recognition Letters, 2009.
[2] J. Bloit, N. Rasamimanana, and F. Bevilacqua. "Towards morphological sound description using segmental models." In DAFx, Como, Italy, 2009.
IRCAM- Real Time Musical Interactions

Hierarchical / Two-level Modeling


1. Temporal segments
2. Sequence of segments

[Diagram: a gesture signal over time decomposed into labeled temporal segments (e.g., N, D, A), modeled at a second level as a sequence of segments.]

IRCAM- Real Time Musical Interactions

Credits and Acknowledgements


Real Time Musical Interaction team:
Frédéric Bevilacqua, Tommaso Bianco, Julien Bloit, Riccardo Borghesi, Baptiste Caramiaux, Arshia Cont, Arnaud Dessein, Sarah Fdili Alaoui, Emmanuel Fléty, Vassilios-Fivos Maniatakos, Norbert Schnell, Diemo Schwarz, Fabrice Guédy, Alain Bonardi, Nicolas Rasamimanana, Bruno Zamborlin

Current support of the projects:
ANR projects: Interlude, Topophonie (France)
EU-ICT project SAME

Thanks to:
Atelier les Feuillantines and students, Remy Muller, Jean-Philippe Lambert, Alice Daquet, Anthony Sypniewski, Donald Glowinski, Bertha Bermudez and EG|PC, Myriam Gourfink, Richard Siegal, Hillary Goidell, Florent Berenger, Florence Baschet

IRCAM- Real Time Musical Interactions
