ABSTRACT

We describe a system for the automatic generation of a 3D animation of a drummer playing along with a given piece of music. The input, consisting of a sound wave, is analysed to determine which drums are struck at what moments. The Standard MIDI File format is used to store the recognised notes. From this higher-level description of the music, the animation is generated. The system is implemented in Java and uses the Java3D API for visualisation.

Figure 1: An overview of the system (audio signal, percussion recognizer, MIDI events, animation generator, 3D animation)
1. INTRODUCTION
In this paper we describe preliminary results of our research on virtual musicians. The objective of this project is to generate animated virtual musicians that play along with a given piece of music. The input of this system consists of a sound wave, originating from e.g. a CD or a real-time recording.

There are many possible uses for an application like this, ranging from the automatic generation of music videos to interactive music performance systems where musicians play together in a virtual environment. In the last case, the real musicians could be located on different sites, and their virtual counterparts could be viewed in a virtual theatre by a world-wide audience. Additionally, our department is currently working on instructional agents that can teach music, for which the work we describe in this paper will be a good foundation.

For our first virtual musicians application, we have restricted ourselves to an animated drummer. However, the system is flexible enough to allow an easy extension to other instruments.

As figure 2 shows, the total task can be separated into two independent subtasks:

- An analysis of the sound signal and transcription of the percussion part. The system has to determine which drums are hit, at what moments in time. Concentrating on percussion sounds has certain advantages and disadvantages; this is further discussed in section 2.

- The creation of the movements of a 3D avatar playing on a drum kit. A more detailed explanation of this part is given in sections 3 and 4.

2. THE PERCUSSION RECOGNISER

This part of the system is responsible for the translation from a low-level description of the music (the sound wave) to an abstract, high-level description of all percussion sounds that are present in the signal. These recognised notes are stored as MIDI events.

Many attempts in the field of musical instrument recognition concentrate on pitched sounds [1]. As explained in [9], this is a rather different task than recognising percussive sounds, which have a sharp attack, short duration, and no clearly defined pitch. As shown in [9], individual, monophonic samples of drums and cymbals can be classified very well. In this approach, a few frames of the spectrum, measured from the onset of the sounds, were matched against a database of spectral templates.
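As a concrete illustration of this template-matching idea, classification of an onset spectrum against a database of spectral templates can be sketched in Java. This is a minimal sketch under assumptions of our own (toy 4-bin spectra, Euclidean distance, hypothetical names), not the actual classifier of [9]:

```java
// Hypothetical sketch: a block of spectral values measured at a sound's
// onset is compared against a database of spectral templates; the nearest
// template names the drum. Sizes and values are purely illustrative.
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateMatcher {

    /** Euclidean distance between two flattened blocks of spectral frames. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Return the label of the template closest to the onset spectrum. */
    static String classify(double[] onsetSpectrum, Map<String, double[]> templates) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double d = distance(onsetSpectrum, e.getValue());
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy 4-bin "spectra": the bass drum template has low-frequency
        // energy, the hi-hat template has high-frequency energy.
        Map<String, double[]> templates = new LinkedHashMap<>();
        templates.put("BASS",  new double[]{1.0, 0.6, 0.1, 0.0});
        templates.put("HIHAT", new double[]{0.0, 0.1, 0.7, 1.0});

        double[] observed = {0.9, 0.5, 0.2, 0.1}; // close to the bass template
        System.out.println(classify(observed, templates)); // prints BASS
    }
}
```

The polyphony problem discussed next is exactly what such a single-nearest-template scheme cannot handle on its own.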
In our highly polyphonic, real-life situation, however, the input signal may contain many percussive sounds played simultaneously, and non-percussive instruments (such as guitar and vocals) may be mixed through the signal as well. Therefore, special techniques are needed to separate the percussive sounds from the other sounds. Other researchers have already tried to solve the same problem [13, 14]: Sillanpää et al. subtract harmonic components from the input signal to filter out non-percussive sounds. Furthermore, they stress the importance of top-down processing: using temporal predictions to recognise soft sounds that are partially masked by louder sounds [14]. Puckette's Pure Data program has an object called bonk that uses the difference between subsequent short-time spectra to determine whether a new attack has occurred.

We are still developing this part of the system; therefore we cannot yet present a final solution to this problem. We plan to solve the problem of polyphony by adding examples that consist of multiple sounds played together to the collection of spectral templates. For example: a bass drum, snare drum and hi-hat played together. For an off-line situation, where the complete input signal is already known, we plan to apply clustering methods to all fragments of the signal that contain a strong attack. This is based on our hypothesis that specific drum sounds will sound very similar throughout a piece of music. This is especially plausible for commercial recordings, and/or in the case that the music contains sampled drum sounds.

Figure 2: An overview of the system

3. BASIC ALGORITHMS

In this section, we describe how our system generates animations automatically. The various algorithms discussed here are kept rather simple on purpose, to maintain a clear view on the system as a whole. In section 4, more advanced techniques (that give better results) will be explained.

3.1. Overview of the system

A general overview of the animation generation is shown in figure 2. An abstract description of the animation (in this case, a list of time-stamped MIDI events) is transformed into a concrete animation. This lower-level description of the animation is defined in terms of key frames [4] that can directly be used by the graphical subsystem to animate objects in the scene.

Our implementation uses the Java3D engine for visualisation purposes [7]; the geometry of the 3D objects we have used has been created using the Virtual Reality Modeling Language (VRML, [15]).

3.2. Pre-calculated versus real-time animation

In our current off-line implementation, the piece of music to be played is completely known in advance as a list of MIDI events. Therefore, the entire animation can be computed before it is started. In a real-time situation, where the system has to respond to incoming MIDI events, this would not be possible. In that case, a short animation should be constructed and started immediately for each note that occurs in the input.

A great advantage of pre-calculating the entire animation is that the transitions between strokes will be much smoother: for each note we already know which drum will be struck next, and the arm can already start moving towards that drum.
3.3. Polyphony Issues

Monophonic instruments (such as the trumpet or the flute) are relatively easy to animate, because each possible sound corresponds to exactly one pose of all fingers, valves, etcetera, and only one pose can be active at each moment in time. Highly polyphonic instruments (such as the piano) are much more difficult, because there are many different ways (fingerings) to play the same piece of music, and a search method is needed to find a good solution [8]. The drum kit can be viewed as lying in between these two extreme examples: up to four sounds can be started simultaneously.

3.4.2. Other Parameters

Other parameters that are defined in the drum kit model:

- For each event type, a preferred hand: -1 (left) or 1 (right).

- For each event type, a parameter minTimeGap that determines how fast that particular event type can be played with one hand. This parameter will be explained in more detail in section 3.6.

3.5. MIDI Parsing
(Footnote: ...events, that is: multiple events on the same channel, with the same time stamp, the same note number and the same velocity. These extra events do not contain new information, nor do they increase the velocity, therefore we can discard them.)

3.7. Pose Creation

Figure 3: The graphical poser interface, applied to the left arm

A graphical user interface (GUI) is provided to create poses manually. Figure 3 shows a screenshot of the GUI applied to the left arm. A pose consists of a set of angles or translation values: one for each degree of freedom. With the horizontal sliders, the user can change these values.

...bal, but we speak of a ride bell (or cup bell) when the stick hits the small cup at the center of the cymbal (this gives a bell-like sound, hence the name).

2. For each event type, there is a preferred (default) hand that should be used if possible. A parameter defaultHand_eventType is specified for all event types. In our implementation, the SNARE and RIM events have the default hand set to left, while right is the default hand for all other events.

A parameter minTimeGap is defined, that determines how fast an event can be played with one hand. This parameter can have a different value for different event types, because the tendency to alternate hands varies from one drum type to another. For example, the hi-hat is usually played with the right hand; only in very demanding situations (fast rolls) will both hands be used. On the other hand, hand alternation on the high tom is much more common.
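A hedged Java sketch of how these per-event-type parameters can drive hand assignment: every event first gets its per-type default hand, and the middle of three same-hand strokes is flipped when two of them fall closer together than that type's minTimeGap. The event names and millisecond values below are illustrative stand-ins, not the paper's actual model values (except for the SNARE/RIM left-hand defaults mentioned above):

```java
// Hedged sketch of a drum kit model (-1 = left hand, 1 = right hand) and a
// two-phase hand assignment built on it: defaults first, then alternation.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HandAssignment {

    static final Map<String, Integer> PREFERRED_HAND = new HashMap<>();
    static final Map<String, Long> MIN_TIME_GAP = new HashMap<>();
    static {
        PREFERRED_HAND.put("SNARE", -1);     // left, as in the paper
        PREFERRED_HAND.put("RIM", -1);       // left
        PREFERRED_HAND.put("HIHAT", 1);      // right is the default elsewhere
        PREFERRED_HAND.put("HIGH_TOM", 1);
        MIN_TIME_GAP.put("SNARE", 150L);     // ms; illustrative values
        MIN_TIME_GAP.put("RIM", 150L);
        MIN_TIME_GAP.put("HIHAT", 120L);     // the hi-hat rarely alternates
        MIN_TIME_GAP.put("HIGH_TOM", 250L);  // the high tom alternates sooner
    }

    static class Event {
        final String type; final long time; int hand;
        Event(String type, long time) { this.type = type; this.time = time; }
    }

    static void assignHands(List<Event> events) {
        for (Event e : events) e.hand = PREFERRED_HAND.get(e.type);   // phase 1
        for (int i = 0; i + 2 < events.size(); i++) {                 // phase 2
            Event e1 = events.get(i), e2 = events.get(i + 1), e3 = events.get(i + 2);
            if (e1.hand == e2.hand && e2.hand == e3.hand
                    && (e2.time - e1.time <= MIN_TIME_GAP.get(e1.type)
                        || e3.time - e2.time <= MIN_TIME_GAP.get(e3.type))) {
                e2.hand = -e2.hand; // otherHand
            }
        }
    }

    public static void main(String[] args) {
        List<Event> roll = new ArrayList<>();
        roll.add(new Event("HIHAT", 0));     // a fast hi-hat roll, 100 ms apart
        roll.add(new Event("HIHAT", 100));
        roll.add(new Event("HIHAT", 200));
        assignHands(roll);
        // The middle stroke is moved to the other hand.
        System.out.println(roll.get(0).hand + " " + roll.get(1).hand + " " + roll.get(2).hand);
    }
}
```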
These principles are implemented in algorithm 3.1. It consists of two phases:

1. default hand assignment

2. hand alternation

Algorithm 3.1 A simple algorithm for event distribution

    iterate over all events e:
        hand(e) := preferredHand(type(e))
    iterate over all triplets of subsequent events (e1,e2,e3):
        if hand(e1)=hand(e2)=hand(e3)
           AND (Time(e2) - Time(e1) <= minTimeGap(type(e1))
                OR Time(e3) - Time(e2) <= minTimeGap(type(e3)))
        then
            hand(e2) := otherHand(hand(e2))

Figure 4: MID TOM UP    Figure 5: MID TOM DOWN

For each limb, two poses should be specified for each drum event type that it supports: the DOWN pose (the exact situation on contact) and the UP pose (the situation just before and just after the hitting moment). Examples of UP and DOWN poses are shown in figures 4 and 5.

Once a good position is achieved, it can be stored in the pre-defined list of poses. The entire list can be saved to disk, to preserve the information for a next session.

3.7.1. Motivation

We have chosen to set the poses manually through a GUI interface, instead of using motion capture [16] or inverse kinematics, for the following reasons:

Costs: Motion capture equipment is expensive, and requires a complete setup with a real drum kit that matches the 3D kit. If one would want to change something in the 3D drum kit (for example, moving a tom-tom) the whole capturing would have to be done all over again.

Simplicity: there are only a small number of poses, and they have to be set only once for a new drum kit configuration.

Flexibility: besides setting poses for the arms and legs, the interface can also be used for the hi-hat stand and pedal, the cymbal stands, the parts of the bass pedal, and for giving the snare, bass drum and tom-toms their position and orientation in the 3D scene.

...(that contains only drum events that should be played by that arm/leg) is parsed in the correct temporal order. For each abstract animation event e, a Stroke is added to the animation time line. A Stroke consists of three concrete animation events (i.e. key frames): (e_before, e_contact, e_after). The parameter delta is a constant that determines the time between the key frames within a stroke (100 ms is a useful value). See figure 6 for a graphical representation of a Stroke that will be used throughout this chapter.
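The Stroke just described can be sketched numerically: one abstract animation event becomes three key frames spaced by the constant delta. Class and field names here are assumptions, not the system's actual code:

```java
// Sketch of a Stroke: key frames before, contact and after, spaced by delta
// (100 ms, the value suggested in the text).
public class Stroke {

    static final long DELTA_MS = 100;

    final long beforeTime, contactTime, afterTime;

    Stroke(long eventTimeMs) {
        this.beforeTime  = eventTimeMs - DELTA_MS; // UP pose
        this.contactTime = eventTimeMs;            // DOWN pose
        this.afterTime   = eventTimeMs + DELTA_MS; // UP pose
    }

    public static void main(String[] args) {
        Stroke s = new Stroke(1000);
        System.out.println(s.beforeTime + " " + s.contactTime + " " + s.afterTime);
        // 900 1000 1100
    }
}
```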
3.7.2. Implementation
arm:

- the shoulder can rotate around its local X, Y and Z axes;

- the elbow can rotate around its local X and Y axes, to make the lower arm twist and the elbow bend, respectively;

- the wrist can rotate around its local X and Z axes.

hi-hat:

- the pedal can rotate around its local Z axis;

- the upper part (the stick to which the upper cymbal is attached) can be translated along the Y axis.

3.8. Key Frame Generation

Figure 6: A basic Stroke, consisting of key frames before, contact and after

If the time gap between subsequent animation events e1 and e2 is less than delta, their key frames will overlap, and special care has to be taken. We distinguish between two cases:

- If e1 and e2 are of the same event type (e.g. both are SNARE events), the last key frame of e1 and the first key frame of e2 are replaced by an interpolated key frame eNew: the less time between e1 and e2, the closer the new key frame will be to the DOWN key frame, as can be seen from figure 7.

[Figure: key frame space diagram, angle over time for events e1 and e2 of type(e1), with the UP pose marked]

- ...e2_after is shortened. A parameter a (0 < a < 1) determines the fraction of the time between the events that is used for moving the arm from e1_after to e2_before.

Figure 9: angle(t) [key frame space diagram for events e1, e2, e3 with the UP poses of type(e1) and type(e2), time fractions (1-a)/2, a, (1-a)/2]

3.8.2.1. Pedals
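The first overlap case above can be sketched numerically: for two same-type events closer together than delta, the after-frame of e1 and the before-frame of e2 are merged into one frame that lies the closer to the DOWN pose the smaller the gap is. The linear blend and scalar pose representation below are assumptions of the sketch:

```java
// Sketch of the same-event-type overlap case: the merged key frame eNew sits
// halfway in time and is interpolated towards DOWN as the gap shrinks.
public class OverlapMerge {

    static final double DELTA_MS = 100.0;
    static final double UP = 1.0, DOWN = 0.0;

    /** Returns {time, poseValue} of the merged key frame eNew. */
    static double[] merge(double t1Contact, double t2Contact) {
        double gap = t2Contact - t1Contact;
        // gap == 2*delta: full UP pose reached; gap == 0: stays at DOWN.
        double pose = DOWN + (UP - DOWN) * Math.min(1.0, gap / (2.0 * DELTA_MS));
        return new double[] { (t1Contact + t2Contact) / 2.0, pose };
    }

    public static void main(String[] args) {
        double[] eNew = merge(1000, 1100); // strokes 100 ms apart
        System.out.println(eNew[0] + " " + eNew[1]); // 1050.0 0.5
    }
}
```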
...tween ring and middle finger of the ...

...filtering as explained in section 4.2.

Our second algorithm, which solves these shortcomings, uses default hand assignments for all possible pairs of events. For example, we can define that whenever RIDE and HIHATOPEN are played together, the RIDE is played with the right hand and the HIHAT with the left. We should keep some flexibility, as these constraints do not have to be equally strong for all pairs: for example, SNARE+CRASH can be played as left-right just as easily as right-left.

The drum kit model is extended with a function pair(eventType, eventType), that returns a floating-point value in the range [-1..1]. The semantics of this value are as follows:

The improved hand assignment algorithm uses just the pair(a, b) function for simultaneous events. For events [e1, e2] with a time gap Δt greater than zero, the default hand values are taken into account as well.

For each event with index I in the event list, a hand assignment value is calculated twice: in the pair [event(I-1), event(I)] and in the pair [event(I), event(I+1)]. Afterwards, these two values are averaged to yield the final hand assignment value for event(I).

For a pair [e1, e2] the hand assignment values (hand(e1), hand(e2)) are calculated in the following way:

    Δt = Time(e2) - Time(e1)
    hand(e1) = α^Δt · pair(e1, e2) + (1 - α^Δt) · defaultHand(e1)
    hand(e2) = α^Δt · (-pair(e1, e2)) + (1 - α^Δt) · defaultHand(e2)

The decreasing exponential function α^Δt (0 < α < 1) ensures that the default hand values are taken more into account when there is more time between e1 and e2, at the same time lowering the influence of the pair-wise hand preference.

...methods consist of the following steps:

1. generate all possible solutions

2. assign a distance value to each solution (e.g. based on distances between drums, penalties for using a certain hand for a certain event type, etcetera)

3. take the solution with the lowest distance value.

Problems with this approach lie in the design of a good distance function, and in the large number of possible solutions (footnote 7: ...be distributed over the 2 hands in 2^n ways). We have not (yet) implemented a shortest-path algorithm in our system.

In a real drum kit, one can observe that some drums or cymbals are more elastic than others, i.e. the drum stick bounces more on one object than on another. Besides the object itself, the elasticity is also dependent on the way of playing: the stick will bounce back more on the hi-hat when it is played closed than when it is played open.

To simulate this phenomenon, we extend the drum kit model with an elasticity parameter el_eventType in the range [0..1] for each drum event type. The value of el_eventType determines how far the drum stick should bounce back to its initial position after contact. In this definition, 0 means no elasticity while 1 corresponds to maximum elasticity. The elasticity values are now used in the following way: for each stroke, the TR_after key frame is interpolated between the UP and the DOWN pose:

    TR_before = TR_UP
    TR_contact = TR_DOWN
    TR_after = TR_DOWN + el_eventType · (TR_UP - TR_DOWN)

From this, one can easily deduce that ...
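The two drum kit model extensions above, the pair-wise hand blend with the decaying weight α^Δt and the elasticity bounce-back of the after key frame, can be sketched numerically. The value of α and the reduction of a pose to a single scalar are assumptions of this sketch:

```java
// Numeric sketch of the blended hand assignment (pair-wise preference
// decaying with alpha^dt towards the default hands) and the elasticity
// interpolation of the after key frame.
public class DrumKitExtensions {

    static final double ALPHA = 0.99;               // 0 < alpha < 1, per ms (assumed)
    static final double TR_UP = 1.0, TR_DOWN = 0.0; // scalar stand-ins for poses

    /** hand value for e1 of a pair with time gap dt (ms). */
    static double handE1(double pair, double defaultHand, long dt) {
        double w = Math.pow(ALPHA, dt);
        return w * pair + (1 - w) * defaultHand;
    }

    /** hand value for e2: the pair preference enters with the opposite sign. */
    static double handE2(double pair, double defaultHand, long dt) {
        double w = Math.pow(ALPHA, dt);
        return w * (-pair) + (1 - w) * defaultHand;
    }

    /** {before, contact, after} key frame values for elasticity el in [0..1]. */
    static double[] strokeFrames(double el) {
        return new double[] { TR_UP, TR_DOWN, TR_DOWN + el * (TR_UP - TR_DOWN) };
    }

    public static void main(String[] args) {
        System.out.println(handE1(1.0, -1.0, 0));  // simultaneous: pair wins, 1.0
        System.out.println(handE2(1.0, -1.0, 0));  // opposite sign: -1.0
        System.out.println(strokeFrames(0.25)[2]); // low-elasticity bounce: 0.25
    }
}
```

With a large Δt the weight α^Δt vanishes and both hand values approach the per-event defaults, matching the intent stated above.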
4.3.2. Note Velocities

In the basic algorithm (see section 3.5), we did not take the velocities vel_event of the DrumEvents into account. It would of course be more convincing to use different animations for different velocities. Using different animations for different velocities will result in more natural behavior: the UP position should be closer to the drum surface for softer notes, and further away in the case of loud notes. The key frames [TR_before, TR_contact, TR_after] that make up a Stroke can therefore be defined as follows (see also figure 11):

    TR_before = TR_DOWN + vel_event · diff
    TR_contact = TR_DOWN
    TR_after = TR_DOWN + vel_event · el_eventType · diff
    diff = TR_UP - TR_DOWN

[Figure 11: strokes between TR_UP and TR_DOWN with different velocity and elasticity values: vel=1.0 el=0.5, vel=1.0 el=0.25, vel=0.5 el=1.0, vel=0.5 el=0.5]

...poses are defined for the neck joint, and for each beat note a Stroke is created. We have used the SNARE event on the left hand as an approximation of beat notes. Finding the real beat in a MIDI file is far from trivial, and many other researchers have addressed this problem [3, 2, 5, 6]. Our system could very well be integrated with an intelligent beat detector to create even better looking behaviour.

4.3.4. Key Frame Interpolation

After the basic key frames are set, the motion is fine-tuned by inserting extra key frames, applying a different interpolation script between certain key frame types (before/contact/after). These scripts can also be different for each joint. The example scripts shown in figure 12 create rather convincing results, because the stick moves slightly behind the hand, resulting in a whip-like motion. These interpolation scripts were derived by observing the motion of a human drummer.

[Figure 12 content: interpolation script tables for the elbow, wrist and stick joints, with transitions from before / from contact / from after to before / to contact / to after, built from the current and next key frame time stamps]
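The velocity- and elasticity-scaled key frames of section 4.3.2 can be checked with a small numeric sketch; as before, a pose is reduced to a single scalar per joint and the TR_UP/TR_DOWN values are illustrative:

```java
// Numeric sketch of the velocity-scaled key frames: softer notes keep the
// stick closer to the drum surface, and the bounce-back after contact is
// scaled by both velocity and elasticity.
public class VelocityKeyFrames {

    static final double TR_UP = 1.0, TR_DOWN = 0.0;

    /** Returns {before, contact, after} for one stroke; vel, el in [0..1]. */
    static double[] keyFrames(double vel, double el) {
        double diff = TR_UP - TR_DOWN;
        double before  = TR_DOWN + vel * diff;      // softer note: lower UP position
        double contact = TR_DOWN;
        double after   = TR_DOWN + vel * el * diff; // bounce scaled by elasticity
        return new double[] { before, contact, after };
    }

    public static void main(String[] args) {
        double[] kf = keyFrames(0.5, 0.5); // vel=0.5, el=0.5 as in figure 11
        System.out.println(kf[0] + " " + kf[1] + " " + kf[2]); // 0.5 0.0 0.25
    }
}
```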
Figure 12: example interpolation scripts for the elbow and the wrist and stick joints

4.3.3. Extra avatar animation

In this section, a number of extensions are discussed that animate parts of the avatar that were not animated at all in the basic system. This helps a great deal to make the avatar look alive.

4.3.3.1. The head

The head of the avatar is animated, to create the effect that the avatar follows his hands with his eyes. First, we create poses for the head: one for each event type that is supported by the hands. These poses rotate the head so that the eyes are pointed at the associated drum or cymbal. If we then use all events that are played by e.g. the right hand to create a key frame time line, the head appears to follow this hand.

4.3.3.2. The neck

The neck joint is used to make the avatar nod with his head on the beat: UP and DOWN ...

4.4. Implementation Notes

The Java3D API is used for the implementation, because it is platform-independent and supports a wide range of geometry file formats. Moreover, our virtual theatre [12] is currently being ported from VRML to Java3D. The SMF format (Standard MIDI File) is used as intermediate file format between the percussion recogniser and the animation generator. A great advantage of using the SMF is that it allows us to use MIDI files (which are widely available on the WWW) to test the animation generator independently of the percussion recogniser.

For the synchronisation of the animation and the sound, a separate thread is used, which looks up the current audio position and adjusts the start time of the animation accordingly.
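The synchronisation idea can be sketched as follows. The AudioClock interface is a hypothetical stand-in for whatever audio back end supplies the playback position; the 50 ms polling interval is an assumption:

```java
// Sketch of the synchronisation thread: periodically read the current audio
// position and shift the animation start time so that animation time and
// audio time agree.
public class SyncThread extends Thread {

    interface AudioClock { long positionMs(); } // hypothetical audio back end

    private final AudioClock audio;
    volatile long animationStartMs; // wall-clock time at which animation time 0 plays

    SyncThread(AudioClock audio, long initialStartMs) {
        this.audio = audio;
        this.animationStartMs = initialStartMs;
        setDaemon(true);
    }

    /** The start time that makes animation time equal to audio time now. */
    static long adjustedStart(long nowMs, long audioPositionMs) {
        return nowMs - audioPositionMs;
    }

    @Override public void run() {
        while (!isInterrupted()) {
            animationStartMs = adjustedStart(System.currentTimeMillis(), audio.positionMs());
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
        }
    }

    public static void main(String[] args) {
        // If it is wall-clock time 10000 ms and the audio is 2500 ms in,
        // the animation must behave as if it started at 7500 ms.
        System.out.println(adjustedStart(10_000, 2_500)); // 7500
    }
}
```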
5. CONCLUSION

We have chosen a GUI-based pose editor and script-based key frame interpolation. A screenshot is shown in figure 3. This proves to be a very flexible solution, since there are only a small number of poses, and they have to be set only once for a new drum kit configuration. The system could be extended with motion capturing, dynamics and inverse kinematics to create even more realistic behaviour, but at the cost of losing simplicity and flexibility. The interpolation scripts create natural motion, while the hand assignment algorithm ensures the arms will not cross. Motion capture would require the setup of the virtual drum kit to exactly match the setup of the real kit, so changes cannot easily be made.

The animation results can be viewed at our web site: http://wwwhome.cs.utwente.nl/kragtwij/science/

6. REFERENCES

... In Proceedings of the International Computer Music Conference, pages 171-174, Sept. 1995.

[7] The Java3D API. http://java.sun.com/products/java-media/3D/.

[8] J. Kim. Computer animation of pianists' hand. In Eurographics '99 Short Papers and Demos, pages 117-120, Milan, 1999.

[9] M. Kragtwijk. Recognition of percussive sounds using evolving fuzzy neural networks. Technical report, University of Otago, Dunedin, New Zealand, July 2000. Report of a practical assignment.

[10] T. Lokki, J. Hiipakka, R. Hänninen, T. Ilmonen, L. Savioja, and T. Takala. Real-time audiovisual rendering and contemporary audiovisual art. Organised Sound, 3(3):219-233, 1998.

[11] The General MIDI specification. http://www.midi.org/about-midi/gm/gm1sound.htm.