HENKJAN HONING
Music Cognition Group, University of Amsterdam
WHILE THE MOST COMMON WAY of evaluating a computational model is to see whether it shows a good fit with
the empirical data, recent literature on theory testing
and model selection criticizes the assumption that this
is actually strong evidence for the validity of a model.
This article presents a case study from music cognition
(modeling the ritardandi in music performance) and
compares two families of computational models (kinematic and perceptual) using three different model selection criteria: goodness-of-fit, model simplicity, and the
degree of surprise in the predictions. In the light of what
counts as strong evidence for a model's validity,
namely that it makes limited range, nonsmooth, and
relatively surprising predictions, the perception-based
model is preferred over the kinematic model.
HOW SHOULD WE SELECT among computational
models of cognition? This question has recently
attracted much discussion (Pitt, Myung, &
Zhang, 2002; Roberts & Pashler, 2000, 2002; Rodgers &
Rowe, 2002). While the most common way of evaluating
a computational model is to see whether it shows a good
fit with the empirical data, the discussion addresses
problems that might arise with the assumption that this
is actually strong evidence for the validity of a model.
Music Perception, Volume 23, Issue 5, pp. 365-376, ISSN 0730-7829, electronic ISSN 1533-8312. © 2006 University of California. All rights reserved. Please direct all requests for permission to photocopy or reproduce article content through the University of California Press's Rights and Permissions website at www.ucpress.edu/journals/rights.htm
A considerable number of theories on the use of expressive timing in music performance make predictions on
the final ritardandi (or final ritard), that is, the typical
slowing down at the end of a music performance, especially in music from the Western Baroque and
Romantic periods (Hudson, 1996). This characteristic
slowing down can also be observed in, for instance,
Javanese gamelan music and some pop and jazz genres.
Together with the fade-out, it is one of the most
common ways of marking the ending of a piece of music
in Western culture. Several approaches have been
suggested to explain the typical ritardandi found in
music performance, most notably the relation between
[Equations (1) and (2)]
In this generalized model, w is defined over the interval (0, 1], and q over the interval (0, ∞]. Furthermore,
the curvature of q is dependent on w. When w
FIG. 1. Predictions of the final ritard made by the kinematic model. The x axis indicates the normalized score position, the y axis the
normalized tempo. The left panel shows some of the predictions that can be made by varying q for curvature (q = .1, 1, 2, and 3, each with
w = 0.3; q = 2 is a model of constant braking force, and q = 3 a model of constant braking power); the right panel shows some of the
predictions that can be obtained by varying w for slope (w = 0.2, 0.4, 0.6, and 0.8, each with q = 3).
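
The curves described in Figure 1 can be approximated with a short sketch. It assumes the functional form usually attributed to Friberg and Sundberg (1999), v(x) = [1 + (w^q - 1)x]^(1/q), which is not printed in the text above, so it should be read as an assumption rather than as the article's own equation. The function name `kinematic_tempo` is hypothetical.

```python
def kinematic_tempo(x, q, w):
    """Normalized tempo at normalized score position x.

    Assumed form (not taken from the text above):
        v(x) = (1 + (w**q - 1) * x) ** (1 / q)
    where w is the final tempo (slope) and q the curvature.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if q <= 0 or not 0.0 < w <= 1.0:
        raise ValueError("expected q > 0 and 0 < w <= 1")
    return (1.0 + (w ** q - 1.0) * x) ** (1.0 / q)


# Parameter values listed for the left panel of Figure 1: vary q at w = 0.3.
positions = [i / 10 for i in range(11)]
for q in (0.1, 1, 2, 3):
    curve = [round(kinematic_tempo(x, q, 0.3), 3) for x in positions]
    print(f"q={q}, w=0.3: {curve}")
```

Under this assumed form every curve starts at tempo 1 and ends at tempo w, and q = 1 gives a straight line, which matches the roles of "slope" and "curvature" given in the caption.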
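$$\phi(t) = \frac{t - t_x}{p}, \qquad t_x - \frac{p}{2} \le t < t_x + \frac{p}{2} \tag{3}$$

[Equations (4) and (5)]

$$\ldots\ \frac{p}{2\pi}\,\operatorname{sech}^2\bigl(\gamma(\cos 2\pi\phi(t^*) - 1)\bigr)\,\sin 2\pi\phi(t^*) \tag{6}$$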
[FIG. 2: panels C0, C1, and C2; y axis: normalized tempo.]
1. A log scale will make the areas shown in Figure 2 symmetrical. A
linear scale is used for easier comparison with the existing literature.
2. As it turned out, it is not too important which models are used for
the categorization component: Different combinations of quantizers
and a tempo tracker gave roughly similar behavior (Honing, 2005).
See error bars in Figure 2.
3. Of course, some performers challenge this. For instance, the late
Glenn Gould took tempi or used timing that would alter the actual
notated rhythm on occasion. However, it could be argued that the
mere existence of such a perceptual boundary, and just crossing it,
makes an interpretation ambiguous and interesting (cf. Desain &
Honing, 2003).
Goodness of fit (GOF) is the most common way of testing the validity of a theory. It indicates the precision
with which a model fits a particular sample of observed
data. The predictions of the model are compared with
the observed data and the discrepancy between the two
is measured using an error measure.
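
As a minimal sketch of such a comparison, the snippet below computes two common discrepancy measures between observed and predicted tempo values. The passage does not say which error measure was used; the tables in this article report r2, which is assumed here to be the squared Pearson correlation. The function names and the example numbers are hypothetical and serve only as illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed reading of 'r2')."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov ** 2 / (var_o * var_p)


# Made-up normalized tempo values, for illustration only.
observed = [1.00, 0.90, 0.75, 0.60, 0.40]
predicted = [1.00, 0.88, 0.78, 0.58, 0.42]
print("RMSE:", rmse(observed, predicted))
print("r2:  ", r_squared(observed, predicted))
```

Note that a high r2 only says that the predicted curve tracks the observed one for this sample; it says nothing about how many other curves the model could have fitted equally well.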
I shall use GOF to compare the two models on measurements of final ritardandi taken from Friberg and
Sundberg (1999). This set (the F&S99 set) consists
of twelve harpsichord performances of compositions by
J. S. Bach. Sundberg and Verrillo (1980) used a slightly
larger set of twenty performances (see footnote 4) that includes the
F&S99 set; these are referred to as the S&V80 set, and
will be used for comparison.
To obtain the best fit, for the K model the q and w
parameters were varied (see Eq. 2), as was the voffset
parameter, which adds a constant to the model. This
third parameter was added to eliminate a propagating
effect of sometimes misfitting the first normalized
4. Sundberg and Verrillo (1980) originally reported on 24 measurements. Of this set, only 20 measurements were made available to the
author.
TABLE 1. Results for the K (Kinematic) and P (Perception-Based) Model as Fitted on the F&S99 Dataset.

                                       K model                          P model
Music example          Id     Notes    q     w     voffset   r²        par. 1   par. 2   par. 3   r²
W. clav. I Prel. 1     WIP    10       2.4   .32   .010      .98       0.1      0.7      1.0      .85
W. clav. II Prel. 1    WP1    8        2.1   .50   .000      .98       0.0      1.0      1.0      .99
W. clav. II Prel. 2    WP2    7        2.5   .51   .030      .97       0.0      0.6      1.0      .99
W. clav. II Fug. 3     WF3    6        1.1   .48   .020      .97       0.0      0.7      1.0      .95
W. clav. II Fug. 5     WF5A   7        4.3   .51   .010      .99       0.0      0.2      1.0      .93
W. clav. II Fug. 5     WF5B   8        2.6   .38   .020      .98       0.2      0.0      0.8      .89
Eng. Suite 1 All.      E1A    6        2.0   .45   .030      .98       0.0      0.4      1.0      .88
Eng. Suite 2 All.      E2A    11       3.8   .37   .020      .98       0.0      1.0      1.0      .85
Fr. Suite 4 Cour.      F4C    6        4.1   .50   .020      .99       0.0      0.5      1.0      .89
Fr. Suite 6 All.       F6A    7        2.3   .44   .020      .98       0.0      0.6      1.0      .96
Fr. Suite 6 Cour.      F6C    7        5.0   .46   .010      .99       0.0      0.0      0.8      .76
It. Conc. Mvt. 3       IC3    7        1.2   .34   .010      .97       0.1      0.9      1.0      .85
Mean                                   2.7   .44   .002      .98       0.03     0.55     0.97     .90
SD                                     1.2   .07   .019      .01       0.07     0.35     0.08     .07

TABLE 2. Results for Datasets F&S99 and S&V80 with (+) and without (-) the Last IOI.

                 K model            P model
Set      Last    Mean r²   SD       Mean r²   SD
F&S99    +       .98       .01      .90       .07      *
F&S99    -       .89       .08      .97       .04      *
S&V80    +       .95       .05      .91       .07      n.s.
S&V80    -       .86       .09      .97       .05      **

* p < .01. ** p < .001.

5. For the other parameters of the P model, the default settings were
used (i.e., the period of the oscillator is initially set to the first
observed interval in the input, and the initial phase is set to zero).
Furthermore, since the global tempo (absolute IOIs) is not considered relevant to the K model, for the P model the fit was selected for
the tempo that gave the best results.
FIG. 3. The response area (gray) for the K model (left) and the P model (right). Circles indicate the measurements of the F&S99 dataset
(closed circles indicate mean tempo; error bars indicate
one standard deviation). The y axis of both panels shows normalized tempo.
7. For alternatives in visualizing the complexity of a model, see Pitt,
Myung, and Zhang (2002).
FIG. 4. The response area (gray) for the P model modulated by note density (left) and rhythmic structure (right). The left panel shows the
influence of note density for a rhythm of six IOIs (open squares) and twelve IOIs (closed triangles). The right panel shows the influence
of isochronous data (open triangles) versus nonisochronous data (closed triangles). The y axis of both panels shows normalized tempo.
much smaller portion of the full square. So, in conclusion, the P model exhibits much less flexibility than the
K model, despite the larger complexity in terms of the
functional form of the former model (cf. equations).
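
One rough way to put a number on "portion of the full square" is sketched below for the K model. It rests on several assumptions that are not taken from the text: the kinematic form v(x) = [1 + (w^q - 1)x]^(1/q), an arbitrarily chosen parameter grid, and the simplification that the response area is the band between the lowest and highest predicted curve. All names are hypothetical.

```python
def kinematic_tempo(x, q, w):
    # Assumed kinematic form; w is the final tempo, q the curvature.
    return (1.0 + (w ** q - 1.0) * x) ** (1.0 / q)


def response_envelope(model, param_grid, positions):
    """Lowest and highest prediction at each score position."""
    lows, highs = [], []
    for x in positions:
        values = [model(x, *params) for params in param_grid]
        lows.append(min(values))
        highs.append(max(values))
    return lows, highs


def covered_fraction(lows, highs):
    """Mean band height: a crude estimate of the share of the unit
    square (score position by normalized tempo) that is reachable."""
    return sum(h - l for l, h in zip(lows, highs)) / len(lows)


positions = [i / 20 for i in range(21)]
# Illustrative grid only; the ranges are not the ones used in the article.
grid = [(q, w)
        for q in (0.1, 0.5, 1, 2, 3, 5, 10)
        for w in (0.05, 0.2, 0.4, 0.6, 0.8, 1.0)]

lows, highs = response_envelope(kinematic_tempo, grid, positions)
print("estimated covered fraction:", round(covered_fraction(lows, highs), 3))
```

The same two helper functions could be applied to any other model that maps a score position and a parameter tuple to a tempo, which is what makes the response area usable as a common yardstick for flexibility across models with different functional forms.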
Another aspect that can be analyzed using the
response area visualization is the influence of the structural characteristics of the input data. In Honing (2005)
it was shown that the K model is insensitive to note density and rhythmic structure: The predicted shape of the
ritard is not affected by these factors. However, these factors were shown to have an effect on the predictions
made by the P model. Using the response area visualization, this can also be shown in the current context.
In Figure 4a the response area for the P model is shown
for six notes (marked with open squares, repeated from
Figure 3b) and for an input twice as dense (i.e., 12 notes;
marked with closed triangles). It shows that note density has an effect on the size of the response area: The
more notes (per time unit), the larger the response area.
This is in line with the idea that one would expect a
ritard of many notes to allow for a deep rubato, while
one of only a few notes is likely to be less deep (i.e., less
slowing down), simply because there is less material
with which to communicate the change of tempo to the
listener (cf. Honing, 2005).
Furthermore, as is shown in Figure 4b, the rhythmic
structure also has an effect on the response area of the
P model. When the input pattern (marked with open
triangles, repeated from Figure 4a) is grouped into
notes of different duration (shown for rhythm 1-2-3-1-2-3 in Figure 4b; marked with closed triangles), the
response area shrinks. This is in line with research in
[Figure: y axis: normalized tempo; regions labeled Possible, Plausible, and Predicted.]
8. While there are studies of just noticeable differences (JND) of single timing deviations in an isochronous sequence (e.g., Michon,
1964), one cannot make a tempo curve prediction from these.
[Figure: panels A (Weak Support), B (Weak Support), C (Weak Support), and D (Strong Support); y axis: normalized tempo; labels: K model, P model, Possible, Plausible, Predicted.]
not have this quality (cf. Figure 4a). The latter can fit a
considerably larger number of shapes of ritards, and is
hence more complex. A challenge for future research is
to see whether the predictions made by the P model
generally hold, that is, whether empirical data that is
systematically varied for note density and rhythmic
structure will stay outside the predicted forbidden zone.
For now, and in the light of what counts as strong evidence for a model, namely making precise (constrained),
nonsmooth, and relatively surprising predictions, the
P model can be preferred over the K model. The general
aim of this article was to show how current issues in
model selection can be of use to the computational
modeling of music cognition.
Author Note
References
CLARKE, E. F. (1999). Rhythm and timing in music. In
D. Deutsch (Ed.), Psychology of music (2nd ed., pp. 473-500).
New York: Academic Press.
CLARKE, E. F. (2001). Meaning and the specification of motion
in music. Musicae Scientiae, 5, 213-234.
DANNENBERG, R. B., & MONT-REYNAUD, B. (1987). Following a
jazz improvisation in real time. In Proceedings of the 1987
International Computer Music Conference (pp. 241-248). San
Francisco: International Computer Music Association.
DESAIN, P., & HONING, H. (1989). The quantization of musical
time: A connectionist approach. Computer Music Journal, 13,
56-66.
DESAIN, P., & HONING, H. (1992). The quantization problem:
Traditional and connectionist approaches. In M. Balaban,
K. Ebcioglu, & O. Laske (Eds.), Understanding music with AI:
Perspectives on music cognition (pp. 448-463). Cambridge,
MA: MIT Press.
DESAIN, P., & HONING, H. (2003). The formation of rhythmic
categories and metric priming. Perception, 32, 341-365.
DESAIN, P., HONING, H., THIENEN, H. VAN, & WINDSOR, W. L.
(1998). Computational modeling of music cognition:
Problem or solution? Music Perception, 16, 151-166.
EITAN, Z., & GRANOT, R. Y. (2005). How music moves: Musical
parameters and listeners' images of motion. Music Perception,
23, 221-247.
EPSTEIN, D. (1994). Shaping time. New York: Schirmer.
FELDMAN, J., EPSTEIN, D., & RICHARDS, W. (1992). Force
dynamics of tempo change in music. Music Perception, 10,
185-204.
FODOR, J. (2000). The mind doesn't work that way. The scope and
limits of computational psychology. Cambridge, MA: MIT
Press.
FRIBERG, A., & SUNDBERG, J. (1999). Does music performance
allude to locomotion? A model of final ritardandi derived
from measurements of stopping runners. Journal of the
Acoustical Society of America, 105, 1469-1484.
GABRIELSSON, A. (1999). Music performance. In D. Deutsch
(Ed.), Psychology of music (2nd ed., pp. 506-602). New York:
Academic Press.
GJERDINGEN, R. (1994). Apparent motion in music? Music
Perception, 11, 335-370.
HONING, H. (2003). The final ritard: On music, motion, and
kinematic models. Computer Music Journal, 27, 66-72.
HONING, H. (2005). Is there a perception-based alternative to
kinematic models of tempo rubato? Music Perception, 23,
79-85.
HUDSON, R. (1996). Stolen time: History of tempo rubato.
London: Clarendon Press.
JACOBS, A. M., & GRAINGER, J. (1994). Models of visual word
recognition – Sampling the state of the art. Journal of