A COMPUTATIONAL MODEL OF THE COGNITION OF TONALITY
Andrew J. Milne, M.A., B.A. (Hons)
Submitted for the degree of Doctor of Philosophy
Faculty of Mathematics, Computing and Technology
The Open University
April 2013
Andrew J. Milne, M.A., B.A. (Hons): A Computational Model of the Cognition of Tonality, © April 2013
Music expresses that which cannot be put into words and that which cannot remain silent.
Victor Hugo
Dedicated to Laurie
,, ,,,
ABSTRACT
Tonality is the organization of pitches, both simultaneously and across time, so that certain pitches and chords are heard as attracted, in varying degrees, to other pitches and chords. Most art music from the seventeenth to the nineteenth centuries, and popular music to the present day, is heavily steeped in a musical language that makes use of tonality to define a central most attractive pitch or chord called the tonic. It is widely thought that the feelings of expectancy and resolution induced by movements towards and away from the tonic allow composers to imbue tonal music with meaning and emotion.
In this dissertation, I identify and model some of the innate processes by which feelings of tension, resolution, stability, and so forth, are induced by successions of pitches and chords, irrespective of their harmonic consonance. By innate, I mean processes that do not require the learning of a musical corpus; such processes are important because they provide explanations for why tonal music, and our cognition of it, take the specific forms they do.
To do this, I introduce a novel family of mathematical methods (metrics applied to expectation tensors) for calculating the similarity of pitch collections. Importantly, such tensors can represent not just the notated pitches of tones, but also their spectral pitches (their harmonics). I then demonstrate how these techniques can be used to model participants' ratings of the fits of tones in microtonal melodies, and the fits of all twelve chromatic pitches to an established key centre (Krumhansl's probe tone data). The techniques can also be generalized to predict the tonics of any arbitrarily chosen scale, even scales with unfamiliar tunings.
In summary, I demonstrate that psychoacoustic processes, which are innate and universal, play an important role in our cognition of tonality.
PUBLICATIONS
Some ideas and figures have previously appeared in the following publications:
Milne, A. J., Sethares, W. A., and Plamondon, J. (2008). Tuning continua and keyboard layouts. Journal of Mathematics and Music, 2(1):1–19.
Milne, A. J. (2010). Tonal music theory: A psychoacoustic explanation? In Demorest, S. M., Morrison, S. J., and Campbell, P. S., editors, Proceedings of the 11th International Conference on Music Perception and Cognition, pages 597–600, University of Washington, Seattle, USA.
Milne, A. J., Prechtl, A., Laney, R., and Sharp, D. B. (2010). Spectral pitch distance & microtonal melodies. Poster presented at the International Conference for Music Perception and Cognition, University of Washington, Seattle, USA, 23–27 August 2010.
Milne, A. J., Sethares, W. A., Laney, R., and Sharp, D. B. (2010). Metrics for pitch collections. In Demorest, S. M., Morrison, S. J., and Campbell, P. S., editors, Proceedings of the 11th International Conference on Music Perception and Cognition, pages ;;o, University of Washington, Seattle, USA.
Milne, A. J., Sethares, W. A., Laney, R., and Sharp, D. B. (2011). Modelling the similarity of pitch collections with expectation tensors. Journal of Mathematics and Music, 5(1):1–20.
Prechtl, A., Milne, A. J., Holland, S., Laney, R., and Sharp, D. B. (2012). A MIDI sequencer that widens access to the compositional possibilities of novel tunings. Computer Music Journal, 36(1):42–54.
ACKNOWLEDGMENTS
Firstly, I would like to thank my two supervisors, Dr. Robin Laney and Dr. David Sharp, for the huge amounts of time they spent reading various drafts of material as it progressed, and making numerous useful suggestions. They, correctly, insisted I presented my ideas in the simplest and most accessible form possible, something I cannot pretend to have always achieved.
Professor Bill Sethares has been a valued colleague and friend since we wrote our first paper together in 2007. Since that time, Bill has acted as an unofficial external supervisor for both my Master's degree (completed in 2009) and over the period I have been working on this PhD dissertation. He introduced me to many key mathematical concepts, such as linear algebra and metrics, that have proved invaluable for my work. Chapter 3 draws upon a journal paper I co-authored with Bill and my two supervisors. The concepts and their development are mine: the use of Gaussian smoothing, generalizing to higher dimensions, the use of tensors, the differences between absolute and relative embeddings, and breaking the tensors into subspaces to simplify their calculation. But these all drew upon Bill's seminal input, which was the suggestion of formal metrics between vectors in the pitch domain. Bill also helped me to generalize the tensor simplifications into dimensions higher than three (above which human visualization becomes somewhat challenging). I would also like to thank Dr. Bridget Webb for helpful discussions on techniques for enumerating the subspaces of the tensors so they can be appropriately included and excluded.
Also in relation to this chapter, I would like to thank our two anonymous reviewers at the Journal of Mathematics and Music, particularly for the suggestions to change our matrix notation into tensor notation, and to avoid the excessive use of subscripts (something which I only had time to partially rectify in the published paper, but which I fully completed for this dissertation). I would also like to thank Professors Tuomas Eerola and Petri Toiviainen for allowing an early phase of this project to be developed as part of the Music, Mind and Technology Master's programme at the University of Jyväskylä, and Margo Schulter for setting an early challenge, the solution to which provided some important insights.
Section 2.3 and Chapter 4 draw heavily upon a paper I co-authored with my supervisors for the journal Music Perception. I would like to thank the three anonymous reviewers, all of whom made extremely helpful suggestions for improving the work, and Jim Plamondon (my isomorphic co-conspirator), who also gave me many useful comments on these two sections. I would like to thank Stefan Kreitmayer for assistance with the JavaScript parts of the Max patch used in the experiment, Professor Paul Garthwaite for advice on appropriate statistical tests, and my participants for taking time out of their busy schedules to listen to weird musical stimuli, and rate them.
I would like to express my profound gratitude to Dr. Simon Holland, mostly for being Simon. But, more specifically, for being an ever stimulating and fun sounding-board, inspiring novel avenues of thought, and for his enthusiasm and interest in my work. I would also like to thank all the other members of The Stern Brocot Band (Simon Holland, Vassilis Angelis, Maxime Canelli, Anthony Prechtl, and Simon Rolph) for allowing some progressive music-making to sneak its way into my allotted research time. Without creativity, it's all just numbers!
And the great and irreplaceable constants throughout this process, who deserve the most thanks, are Christine and Astra for keeping life normal and creating a happy family environment in the face of my long hours of work and neurotic obsessing with finding simplifications of operations applied to multi-dimensional arrays of numbers.
CONTENTS
1 INTRODUCTION
2 MODELLING THE COGNITION OF TONALITY
2.1 Why is Tonality Important?
2.2 A Strategy for Modelling Tonality
2.2.1 Methodology
2.3 Models of Mental States: Nature and Nurture
2.3.1 Explanation and Prediction
2.3.2 Mental Processes: Nature and Nurture
2.3.3 Circularity and Explanation
2.3.4 Bottom-Up and Top-Down Models and Causal Claims
2.4 Existing Models of Tonality
2.5 My Models of Tonality
3 MODELLING THE SIMILARITY OF PITCH COLLECTIONS
3.1 An Introduction to Metrics and Similarity
3.2 Category Domain Embeddings
3.3 Pitch Domain Embeddings
3.4 Expectation Tensors
3.4.1 Monad Expectation Tensors
3.4.2 Dyad Expectation Tensors
3.4.3 Triad Expectation Tensors
3.4.4 R-ad Expectation Tensors
3.5 Metrics
3.6 Applications
3.6.1 Tonal Distances
3.6.2 Temperaments
3.6.3 Musical Set Theory
3.7 Discussion
4 A MODEL OF MELODIC AFFINITY
4.1 The Model of Melodic Affinity
4.1.1 Affinity: Pitch Similarity and Horizontal Familiarity
4.1.2 Consonance: Roughness, Toneness, and Vertical Familiarity
4.1.3 Minimization and Control of Confounding Variables
4.2 Method
4.2.1 Participants
4.2.2 Stimuli and Apparatus
4.2.3 Procedure
4.3 Results
4.3.1 Data Aggregated Over All Stimuli and Participants
4.3.2 Data Aggregated Over Participants Only
4.4 Discussion
4.4.1 Matching Timbre and Tuning
4.4.2 Scales for Tones with Harmonic Spectra
4.5 Conclusion
5 A MODEL OF THE PROBE TONE DATA AND SCALIC TONALITY
5.1 The Models
5.1.1 Krumhansl 90b: corpus prevalence model.
5.1.2 Lerdahl 88: pitch space model.
5.1.3 Butler 89: aggregate context pitch multiplicity model.
5.1.4 Parncutt 89: aggregated context pitch class salience model.
5.1.5 Leman 00: short-term memory model.
5.1.6 Krumhansl 90a: consonance model.
5.1.7 Smith 97: cumulative consonance model.
.. Parncutt / 8 ,,. virtual pitch class mod-
els. o,
5.1.9 Milne 12: spectral pitch class similarity models.
5.2 A Template-Independent Theory of Tonality
5.3 Scalic Tonality
5.3.1 Fit Profiles for 12-TET Scales
5.3.2 Fit Profiles for Microtonal Scales
5.4 Conclusion
6 CONCLUSION
6.1 Contributions
6.2 Limitations
6.3 Future Work
6.4 Implications
REFERENCES
A SMOOTHING WIDTH AND THE DIFFERENCE LIMEN
B TENSORS, TENSOR OPERATIONS, AND THEIR NOTATION
C COMPUTATIONAL SIMPLIFICATION OF EXPECTATION TENSORS
D FORMAL SPECIFICATION OF THE MELODIC AFFINITY MODEL
E CROSS-VALIDATION CORRELATION AND ROOT MEAN SQUARED ERROR
F FORMAL SPECIFICATION OF THE PROBE TONE DATA MODEL
LIST OF FIGURES
Figure 2.1 Four categories of mental processes by which physical stimuli are transduced into subjective mental states.
Figure 2.2 Aggregated over people and time, composers create a feedback loop.
Figure 2.3 A path diagram demonstrating a loop-enhanced effect.
Figure 2.4 Three causal models that can account for correlation between a musical corpus (previous musical events) and mental states (both aggregated over time and across individuals). Physical stimuli are in the top row; mental processes, categorized into nature and nurture, are in the second row; the resulting subjective mental state is in the bottom row. By definition, top-down (nurture) processes are those that receive an arrow from previous music events; bottom-up (nature) processes do not. Note that, as explained in Section 2.3.2, a causal path from previous music events to the current music event is not meaningful.
Figure 2.5 The circle-of-fifths.
Figure 2.6 A Tonnetz.
Figure 3.1 Pitch domain embeddings of two tones: one with a pitch of ,oo cents, the other with a pitch of ,o cents. On the left, no smoothing is applied, so their distance under any standard metric is maximal; on the right, Gaussian smoothing (standard deviation of , cents) is applied, so their distance under any standard metric is small.
Figure 3.2 Spectral pitch distances of a Cmaj reference triad and all possible 12-TET triads that contain a perfect fifth. Spectral pitch distance is calculated with smoothing of o., cents and roll-off of o.,a. The greyscale indicates the spectral pitch distance with the reference triad (the darker the shade, the lower the distance and hence the greater the modelled affinity). A selection of major and minor triads are labelled: upper case for major, lower case for minor.
Figure 3.3 The cosine distance (on relative dyad expectation embeddings with a Gaussian smoothing kernel of , cents standard deviation) between a just intonation major triad (0, 386.3, 702) and all n-TETs from n = 2 to n = 102.
Figure 3.4 The cosine distance between relative dyad embeddings of a just intonation major triad {0, 386.3, 702} and a ,-tone β-chain whose β-tuning ranges from 0 to ,,,., cents. The smoothing is Gaussian with standard deviations of o cents (left side), and , cents (right side). The two zooms show the distance minima occurring at the meantone (504 and 696 cents) and helmholtz (498 and 702 cents) tunings, and how their relative levels change as a function of smoothing width.
Figure 3.5 The cosine distance between relative dyad embeddings (right) and relative triad embeddings (left) of a just intonation major triad {0, 386.3, 702} and a 7-tone β-chain whose β-tuning ranges from 0 to ,,,., cents. The smoothing is Gaussian with a standard deviation of , cents.
Figure 3.6 The cosine distance (using a Gaussian smoothing kernel with a , cents standard deviation) between a just intonation Bohlen-Pierce "major" triad {0, 884.4, 1466.9}, with a period of 1902 cents, and a ,-tone β-chain whose β-tuning ranges from 0 to 1901.9 cents.
Figure 4.1 The full model of empirical affinity (i.e., affinity as reported by participants). Physical stimuli are in the top row, mental processes in the middle row, and subjective mental states, and empirical reports thereof, in the bottom row. Mental processes with an arrow from previous music events are nurture processes (i.e., horizontal and vertical familiarities); those without are nature processes (i.e., pitch similarity, roughness, and toneness). In the experiment described later, the impact of horizontal familiarity on affinity, and roughness on consonance, is minimized; for this reason, these processes (and submodels thereof) are not included in the final model of the experimental data, which is why they are greyed out and their causal paths are dashed.
Figure 4.2 The effect of smoothing (convolving) a spectrum with a discrete approximation of a normal distribution with a standard deviation of o cents.
Figure 4.3 Virtual pitch weights for a harmonic complex tone as modelled by cross-correlating the spectral pitch vector in Figure 4.2b with a harmonic template.
Figure 4.4 Inherent roughness, modelled with Sethares' aoo routine, over a continuum of generated tunings (the generator is a fifth-like interval ranging from oo to ;a cents). This tuning continuum includes a variety of n-TETs, only a few of which have a low value of n (these are labelled). Note that there is a broad local minimum of sensory dissonance when the partials are close to harmonic (approx. o,o cents), and narrow local minima at low-n n-TETs.
Figure 4.5 The final model of the empirical data: affinity is modelled by either spectral or virtual pitch similarity, consonance by harmonicity. The empirical data (participants' responses) is modelled as a combination of affinity and consonance. Pitch similarity and toneness are nature processes; vertical familiarity is a nurture process by definition, because one of its inputs is previous music events (see Sec. 2.3.2).
Figure 4.6 Results aggregated over all participants: the squares represent the o different pairs of stimuli the participants were presented with; the shading of each square indicates the ratio of occasions when the matched, rather than unmatched, timbre was chosen (white for all matched, black for all unmatched). The rows (labelled by n-TET) represent the different underlying tunings (or, equivalently, the matched timbre's spectral tuning); the columns represent the different unmatched timbres' spectral tunings. The bottom row and rightmost column show ratios aggregated over underlying tunings and unmatched timbres, respectively. The bottom-right square is the ratio aggregated over all tunings and unmatched timbres. Black stars indicate significantly more than half of the choices were for matched timbres, white stars indicate significantly more were for unmatched timbres (using a two-tailed exact binomial test
indicates significance at the .001 level. The first column demonstrates that model 2 is significantly better than all the other models except model , (it is better, but not significantly so). The first row shows that each of the two submodels (spectral pitch distance and harmonicity) is significantly better than the null model, while the first column shows that both submodels together (i.e., model 2) is better than either submodel alone.
Table ,.o Statistical analysis and evaluations of the model and its parameters (the logistic part of the model does not include a constant term). Standard errors were derived from a numerical estimation of the Hessian matrix; the z-score and p-value were calculated from a signed-rank test on the cross-validation as described in the main text.
Table . Correlations $r$, cross-validation correlations $r_{CV}$ (both with 22 degrees of freedom), and cross-validation root mean squared errors of cross-validation (RMSECV) of the predictions made by a variety of models compared with Krumhansl and Kessler's (1982) probe tone data. The cross-validation statistics are the means of these statistics taken over twenty runs of 12-fold cross-validation (the cross-validation statistics are explained in Appendix E). The null model is an intercept-only model, i.e., all values are modelled by their mean. The remaining models are described in the main text. The models are ordered by their cross-validation statistics or, when these are missing, by their correlation.
1
INTRODUCTION
Imagine listening (or actually listen) to J. S. Bach's first Prelude in C major (BWV 846). I choose this piece because it is well known, and it clearly exemplifies many of the important characteristics of tonal music that I discuss in this dissertation (including present day tonal music). The first four bars produce a gentle wave of tension and release, most noticeable with the resolution of the chord in the third bar to the chord in the fourth bar (each bar, in this piece, consists of a single arpeggiated chord). In the fifth bar, the music seems to start afresh and take a new journey for the next few bars until, in bar 11, there is a feeling of a somewhat temporary resolution: not a final destination, but a brief resting place. In bar 12, the tension increases considerably, and this is released (to some extent) in the subsequent bar. This two-bar pattern of tension (even-numbered bar) then partial release (odd-numbered bar) repeats up to bar 21, where the tension is increased further. This tension is sustained at a high overall level, occasionally peaking, until the final resolution to Cmaj in the 36th bar, which provides a strong feeling of release, resolution, and closure. This is a crude analysis and the details may differ for different listeners, but I use it to show how feelings such as tension, release, motion, rest, stability, closure, and so forth, are typical responses to tonal music.
The primary research question addressed by this dissertation is: Can we identify, and model, the causal processes by which feelings of tension, resolution, stability, and so forth, are induced by successions of pitches and chords?
This research question is relevant to Western tonal music because, in such music, pitches are organized across time (as melodies and chord progressions), and the feelings of tension and resolution this induces are amongst tonal music's most important perceptual characteristics. Indeed, the feelings of expectancy and fulfilment aroused by tense chords resolving to stable chords (or the lack of fulfilment when they do not) is one of the principal means by which music communicates emotion and gains meaning (Meyer, 1956; Huron, 2006).
For example, the chord progression Fmaj–Gmaj sets up a powerful expectancy that the following chord will be Cmaj: finishing on this chord produces a strong sense of resolution and closure. Moving, instead, to Amin has a somewhat surprising but pleasing effect: there is some sense of resolution but it is not as strong. Moving to Imaj, however, gives a quite different effect: it sounds genuinely surprising, even somewhat clumsy.
Inevitably, there are many possible causes of such responses to music. I restrict my focus in two ways. Firstly, I concentrate on those causes that are not learned, that is, hard-wired or innate responses to physical stimuli. This is in contrast to Meyer and Huron and many contemporary researchers (e.g., Krumhansl 1990, Lerdahl 2001, Pearce and Wiggins 2006) who argue that tonal expectancy is down to no more than familiarity (the learning of common patterns). Instead, I propose that innate perceptual and cognitive mechanisms play an important additional role, and one that is actually more fundamental because they may underlie music perception across cultural boundaries and are able to make predictions about which musical forms are more likely to arise. This is discussed, in depth, in Section 2.3.
Secondly, I consider only those processes that are not a function of changing levels of consonance and dissonance. For example, consider moving from the harmonically dissonant chord G7 to the consonant chord Cmaj: such a progression induces a powerful feeling of closure, and it is plausible that this is due to the transition from dissonance to consonance. However, there is clearly more to it than just this; for instance, consider the progression G7–Imaj, which similarly transitions from dissonance to consonance, but induces little feeling of closure. Or, consider the above-mentioned progression Fmaj–Gmaj–Cmaj, where all the chords are equally consonant, but which does induce a strong feeling of closure.
By constraining my research area in these two ways, I can state my aim precisely: I am seeking to identify and model the innate processes by which feelings of tension, resolution, stability, and so forth, are induced by successions of pitches and chords, irrespective of their harmonic consonance.
In Chapter 2, I consider the overall background to this research, and I use existing literature to explain the development of my theory, and to help shape the definitions I use throughout this dissertation. Firstly, I consider various common meanings of the word tonality, and introduce the definition I use for this dissertation: tonality is the organization of pitches, both simultaneously and across time, so that certain pitches or chords are heard as attracted, in varying degrees, to other pitches and chords. I also describe how tonality induces feelings described by terms like affinity, tension, activity, expectancy, resolution, stability, and so forth. In Section 2.1, I discuss the importance of tonality (notably the manipulation of expectancy and resolution) in communicating meaning and emotion in music. In Section 2.2, I outline a broad strategy by which modelling the cognition of tonality may be approached. I hypothesize tonality is due, in part, to three mechanisms: familiarity, consonance, and affinity (affinity is the melodic analogue of consonance, the extent to which sequential pitches or chords are perceived to fit). Given an overall context of pitches (such as a previously established scale), we expect pitches or chords that are unfamiliar, dissonant, and have low affinity to move to pitches or chords that are familiar, consonant, and have high affinity. My focus is on modelling innate causes of affinity, and using these to model tonal cognition. I also discuss the broad methodological framework of this research: my aim is to produce parsimonious mathematical (computational) models of perception tested against experimentally obtained data. I also seek models with explanatory as well as predictive power, as explained in Section 2.3.
Section 2.3 is a more substantial section in which I categorize, in a novel way, the different processes by which physical stimuli can induce subjective mental states (sensations, feelings, and concepts), and discuss the complex interlinked causal roles played by each of these processes. It is in this section that I discriminate between the predictive and explanatory powers of models, identify the circular causal processes that operate between the perceptions of listeners and composers and the musical repertoire they create, and argue that only bottom-up models of innate processes are capable of providing effective explanations for tonal cognition. In Section 2.4, I briefly review existing theories and models of tonal cognition (I provide more focused reviews in later chapters). In Section 2.5, I give an overview of the models I have developed to meet the above research aim, and how they are empirically tested.
In Chapter 3, I introduce the general mathematical techniques that underlie all of the more specific models used in subsequent chapters. These constitute a novel family of mathematical methods for calculating the similarity of pitch collections (Secs. 3.2–3.5). The similarity values are derived by standard metrics applied to pairs of expectation tensors (a tensor is also known as a multi-way array), which embed either pitches or pitch classes.
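As a rough illustration of the idea in the paragraph above, the following Python sketch embeds pitch collections as Gaussian-smoothed vectors over pitch classes and compares them with the cosine distance. It is a simplified, one-dimensional stand-in for the expectation tensors described in this dissertation, not the author's implementation; the function names, the 1-cent bin resolution, the 1200-cent period, and the 10-cent smoothing width are all assumptions chosen for the example.

```python
import math

PERIOD = 1200  # assumed period in cents (one octave)


def smoothed_embedding(pitches_cents, sigma_cents):
    """Return a pitch-class vector with one bin per cent.

    Each pitch contributes an unnormalized Gaussian bump, wrapped
    around the period so that 0 and 1200 cents are the same pitch
    class. This is an illustrative assumption, not the thesis code.
    """
    vector = [0.0] * PERIOD
    for pitch in pitches_cents:
        for bin_index in range(PERIOD):
            # Circular distance between the bin and the pitch.
            diff = abs(bin_index - pitch) % PERIOD
            diff = min(diff, PERIOD - diff)
            vector[bin_index] += math.exp(-0.5 * (diff / sigma_cents) ** 2)
    return vector


def cosine_distance(x, y):
    """One minus the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


just_major = smoothed_embedding([0, 386.3, 702], sigma_cents=10)
tet_major = smoothed_embedding([0, 400, 700], sigma_cents=10)
tet_minor = smoothed_embedding([0, 300, 700], sigma_cents=10)

d_major = cosine_distance(just_major, tet_major)
d_minor = cosine_distance(just_major, tet_minor)
print(d_minor > d_major)
```

Under these assumed settings, the 12-TET major triad comes out closer to the just intonation major triad than the 12-TET minor triad does, because the smoothed thirds of the two major triads partially overlap while the minor third is too far away to overlap at all.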
\[
h(m_0) = f\left( \sum_{t \in T_m} s(m_0, m_t) \right), \tag{2.2}
\]
where the musical corpus contains values of $m_t$ over $t \in T_m$.
The variables entering this equation are $m_0$ and $m_t$, as illustrated by the causal paths in Fig. 2.1. If we are modelling the mental state of a single participant or a group of participants with a similar cultural background, $m_t$ can reasonably enter as constants or parameters to be optimized. If we are modelling mental states as a function of the current musical event and the cultural background of each participant, $m_t$ can enter as a variable. [7]
The summation in (2.2) is a weighted count of the previous musical events in a corpus. They are weighted by their similarity to the current musical event. The resulting sum can be normalized into a weighted prevalence by dividing by $|T_m|$. So, if there are lots of events in the corpus similar to the current event, the resulting value will be high.
[7] In modelling terms, all values of $m_t$ over $t$ will be adjusted en masse by switching between corpuses. Clearly, the corpuses should appropriately reflect the participants' cultural background.
The function $f$ converts this weighted prevalence into the resulting mental state. This function may be very straightforward: we might use a linear function of the prevalence to model the perceived consonance of chords so that common chords are modelled as consonant and uncommon chords as dissonant (using a corpus appropriate to the participant). Or we may wish to model the expectancy of a chord, given the two previous chords, by the prevalence of three-chord progressions in an appropriate corpus.
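The weighted-prevalence model described above can be sketched in a few lines of Python. This is a minimal illustration only: the exact-match similarity, the linear choice of f, the function names, and the toy corpus with its counts are all invented for the example and are not taken from the dissertation.

```python
def weighted_prevalence(current_event, corpus, similarity):
    """Sum the similarity of the current event to every corpus event,
    then divide by the corpus size (the |T_m| normalization)."""
    total = sum(similarity(current_event, past_event) for past_event in corpus)
    return total / len(corpus)


def exact_match(a, b):
    """Crude similarity assumed for the example: 1 if identical, else 0."""
    return 1.0 if a == b else 0.0


def linear_f(prevalence, slope=1.0, intercept=0.0):
    """A linear choice of f, one of the simple options the text mentions."""
    return intercept + slope * prevalence


# Hypothetical toy corpus of three-chord progressions (invented counts).
toy_corpus = (
    [("Fmaj", "Gmaj", "Cmaj")] * 6
    + [("Fmaj", "Gmaj", "Amin")] * 3
    + [("Fmaj", "Gmaj", "Dmin")] * 1
)

expectancy_c = linear_f(
    weighted_prevalence(("Fmaj", "Gmaj", "Cmaj"), toy_corpus, exact_match)
)
expectancy_a = linear_f(
    weighted_prevalence(("Fmaj", "Gmaj", "Amin"), toy_corpus, exact_match)
)
print(expectancy_c, expectancy_a)
```

A graded similarity function (for example, one that gives partial credit to progressions sharing two of three chords) could be swapped in for `exact_match` without changing the rest of the sketch.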
2.3.2.3 Extrinsic nature processes
Extrinsic nature processes, in the bottom-left corner of Figure 2.1, are a function of non-musical events (past and present) and the current musical event. In these processes, a response is associated with a stimulus by similarity (not familiarity): a piece of music with a fast tempo and bouncy melody can communicate excitement or arousal by analogy with bodily movement or non-verbal speech patterns, or a crack in a vocal performance may suggest the emotional fragility of the performer. It would seem that such processes can communicate a wide range of broad feelings: Juslin and Laukka aoo, provide numerous examples of musical features that carry meanings that can be associated, by similarity, with speech prosody and body movement. In semiotics these processes are termed iconic or indexical signification (Sloboda and Juslin use the related term iconic sources of emotion), and it includes Juslin and Västfjäll's (2008) mechanisms of emotional contagion and visual imagery.
Extrinsic nature processes are, by definition, due to perceived similarities between musical and non-musical events. This can be mathematically modelled by the mental states induced by non-musical events that are, in some sense, similar to the current musical event.
One simple mathematical formalization of this process is the following model:
\[
h(m_0) = f\left( \sum_{n \in N} h_n(n)\, s(m_0, n) \right). \tag{2.3}
\]
This model has a single variable $m_0$, with the differing values of $n$ entering the model as constants parameterized by the range of values considered, which is denoted by the set $N$. The function $h_n$ may be modelled as being culturally determined; that is, members of different cultures can be modelled as, on average, responding to the same non-musical events in different ways.
The summation is a weighted sum of the mental states induced by different non-musical events. They are weighted by the similarity of each non-musical event and the current musical event. This can be transformed into a weighted mean by dividing by $|N|$. So, if there are some non-musical events similar to the current musical event, and others that are dissimilar, the resulting mental state is modelled as the weighted mean of the mental states induced by the former non-musical events. For example, we may wish to model the arousal produced by a piece of music, and model certain characteristics of music (e.g., high loudness, high pitch, bright timbre, fast tempo) as being similar to the physical manifestations of arousal (e.g., fast and energetic body movements and vocalizations). If the current musical event has these properties, it will have a high similarity with these non-musical events, hence the resulting mental state is modelled as high in arousal.
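The arousal example above can be made concrete with a small Python sketch of the similarity-weighted sum, divided by the number of non-musical events as the text describes. Everything specific here is hypothetical: the two-feature description of events (speed and energy scaled to 0..1), the similarity measure, the identity choice of f, and the arousal values are assumptions made purely for illustration.

```python
def extrinsic_nature_state(current_event, non_musical_events, similarity,
                           f=lambda x: x):
    """Similarity-weighted sum of the states induced by non-musical
    events, divided by the number of events, then passed through f."""
    total = sum(
        state * similarity(current_event, features)
        for features, state in non_musical_events
    )
    return f(total / len(non_musical_events))


def feature_similarity(a, b):
    """Assumed similarity: 1 minus the mean absolute feature difference,
    for features scaled to the range 0..1."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical non-musical events: ((speed, energy), induced arousal).
events = [
    ((0.9, 0.9), 0.9),  # fast, energetic movement: high arousal
    ((0.1, 0.2), 0.1),  # slow, gentle movement: low arousal
]

fast_loud_music = (0.8, 0.9)
slow_quiet_music = (0.2, 0.1)

arousal_fast = extrinsic_nature_state(fast_loud_music, events, feature_similarity)
arousal_slow = extrinsic_nature_state(slow_quiet_music, events, feature_similarity)
print(arousal_fast > arousal_slow)
```

In this sketch the stored arousal value plays the role of $h_n(n)$ and the feature tuple for the music plays the role of $m_0$; a culturally specific $h_n$ would correspond to supplying a different list of events and induced states.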
2.3.2.4 Intrinsic nature processes
Intrinsic nature processes, in the bottom-right corner of Figure 2.1, have just one type of input variable: the current musical event. In such processes, a specific type of musical event is effectively hard-wired to a mental response. At a basic level, any given stimulus is associated with a raw feel or sensation, for example, the private subjective experience of pitch, timbre, or loudness. Also at a basic level, a stimulus may induce a basic emotional response: analogously to how the smell of bread baking in an oven may induce a feeling of hunger, or the flash of a big cat's teeth may induce fear, a sudden loud sound may stimulate shock, surprise, or even pain. However, the feelings involved may not always be quite so basic. For instance, there is a well-established link between stimulus complexity and pleasure (Berlyne, 1970), and it has been demonstrated how music that is too simple for a participant is found tedious, and music that is too complex is found unpleasant, with a pleasure-maximizing "Goldilocks" complexity somewhere in-between (Vitz, 1966; Heyduk, 1975; North and Hargreaves, 1995). It is plausible that there is an evolutionary advantage to such behaviour: it is useful for us to invest time solving solvable problems, but also useful that we should avoid situations where we cannot properly resolve, or make out, our percepts (such as venturing into the dark woods at night). This process includes Juslin and Västfjäll's (2008) mechanism of brain stem reflex.
A possible concrete example of this is the displeasure associated with dissonant chords that are rough (due to the rapid beating of partials that are close in frequency) and have no clear root (the spectral content has no clear fundamental). In both cases, the resulting percept is more complex than that produced by a chord with low roughness and a strong root: in the first case, because of the distracting beating and the difficulty of resolving the individual frequency components; in the second case, because of the difficulty in finding a single representative root pitch. A causal link from complexity to pleasure makes it possible for an artist to play aesthetic games; for example, in music, a greater symmetry may be exposed only gradually such that each segment (e.g., interval, chord, phrase, section) is somewhat complex (asymmetrical) but, when all segments have been played, a more general symmetry or pattern is apparent. There may be other ways in which complexity can be used to communicate higher-level feelings, and there may be other intrinsic nature responses that can be similarly manipulated.
Intrinsic nature processes are due only to some function of the current musical variable. This can be mathematically formalized by the following simple model:

$$h(m_0) = f(m_0), \tag{2.4}$$

which has a single variable $m_0$.
A very simple example is using log-frequency as a model of pitch height; in other models, such as the affinity model that will be described later, the variable may be multidimensional and the function more complex. Purely psychoacoustic models fall into this category, but this category also includes models that use core knowledge (Spelke and Kinzler, 2007) and perceptual principles (like Gestalt), or mathematical procedures (such as similarity measures and pattern detection), which are not psychoacoustic in nature.
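As a minimal sketch of such a single-variable model, the following Python function maps a frequency to a log-frequency pitch height. The function name, the 440 Hz reference, and the choice of semitone units are illustrative assumptions and are not specified in the text.

```python
import math

def pitch_height(frequency_hz, reference_hz=440.0):
    """Pitch height f(m0) modelled as log-frequency.

    Returned in semitones above reference_hz. The reference and the
    semitone scaling are assumptions made for illustration only.
    """
    if frequency_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(frequency_hz / reference_hz)

# Doubling the frequency raises the modelled pitch height by one octave.
octave_up = pitch_height(880.0)
```

The model has a single input (the current frequency) and no dependence on previously heard events, which is what places it in the intrinsic nature category.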
2.3.2.5 Causal interactions
There are also interactions between these four processes (not shown in Figure 2.1, to avoid a "spaghetti" effect). For example, a stimulus that is initially perceived as complex will become less so with familiarity (as demonstrated by North and Hargreaves (1995)). In this way, a previously unpleasantly dissonant tone cluster may become pleasant with sufficient familiarity. This is because familiarity tends to reduce perceived complexity; hence intrinsic nurture processes influence intrinsic nature processes. This can be visualized as a causal path extending from the intrinsic nurture mental process to the intrinsic nature process, or, mathematically, by treating the function f in (2.4) as being parameterized by (2.2). If the intrinsic nature function f is relatively inflexible under the influence of (2.2), it is reasonable to think of the nature process as the more fundamental, or underlying, process.
This relationship also works in reverse, in that a stimulus that evokes a strong innate response is likely more salient and, therefore, probably easier to learn. For example, the pitch associated with the fundamental of a harmonic complex tone is a very salient (probably innate) perceptual response to such tones, whereas the loudness of the seventh harmonic is not a particularly salient perceptual response. For this reason, we find it easier to learn patterns of pitch than patterns of the seventh harmonic's loudness. This can be visualized as a causal path extending from the intrinsic nature mental process to the intrinsic nurture process, or, mathematically, by weighting the similarity function by the salience implied by (2.4).
Furthermore, salient pitch patterns are more likely to be mentally associated with non-musical events than are non-salient patterns of seventh harmonic loudness, which shows that an intrinsic nature process can also affect an extrinsic nurture process.
In similar ways, each type of process can affect each other type of process, and this illustrates just how difficult it is to cleanly separate out nature from nurture in an experimental investigation. For example, in Chapter 5, I model the affinity of chords and scales by the similarities of the pitches of their partials. However, this presupposes some understanding of what a chord and a scale are, and what "affinity" means in a musical context. If an individual were raised in a culture that did not use pitch-based music, his or her responses would probably be very different to my participants' and, in an experimental setting, questions such as "how strongly does this chord produce a feeling of closure or completion?" would probably be completely alien. However, this does not preclude the possibility that, if a pitch-based music were to develop in this culture, it may be more likely to develop in certain directions than others: in directions that are both constrained by, and make use of, underlying innate processes of perception and cognition.
Although causal interactions between the different types of process can occur, as described above, it is important to point out that certain causal paths are unavailable under the definitions given above. For instance, it may seem that previous musical events causally affect the current musical event. For example, it might seem that if a certain musical event is very common in the corpus (i.e., $\sum_t s(m_0, m_t)$ is high), this implies the probability of $m_0$ is increased and that this would have a direct impact on a nature process. But the value of $m_0$, which enters into equations (2.1) to (2.4), is not its probability of occurring. Rather, $m_0$, in all these models, is taken as a given, which implies its probability of occurring is conditioned out. For this reason, using the definitions I have given above, there can be no causal link between previous musical events and the current musical event.
Although there is no way to completely disentangle the different processes that occur in perception and cognition, the above four-fold categorization is a useful theoretical framework. It is useful because it clarifies the way that different mental processes operate, and the models that are appropriate for them. Notably, it enables us to conceptually differentiate those aspects of perception and cognition that are in essence innate and universal, in that they are, to a meaningful extent, unaffected by typical cultural experiences. It also clarifies how the different processes causally influence each other, and shows how they can operate not just simultaneously but also in concert, with certain stimuli taking on a strong emotional or conceptual charge that is mutually reinforced by all four processes.
In Section 2.3.3, I explore another important causal pathway that occurs over time: this is from the mental processes of composers and performers to the repertoire that will become the previous musical events of a future generation of listeners. In Section 2.3.4, I briefly describe some ways in which the effects of nurture processes can be minimized in an experimental setting.
2.3.2.6 The mental state
The result of the four processes, and their interactions, is a mental state, shown at the centre of Figure 2.1. This mental state can be directly observed only by introspection, which is subject to many possible errors and distortions due to the act of observation itself (which disrupts the experience) and preconceptions of what the experience should be (such distortions are discussed at length by Petitmengin and Bitbol (2009)). Furthermore, because it is difficult to communicate to a participant a specific categorization of a subjective state, and any such state is frequently a mixture of any number of other states, each of which may have fuzzy boundaries, it is possible the participant will rate untargeted mental states that have not been asked for. For example, in the experiment described in Chapter 4, participants were asked to rate how well melodic tones fitted together (their affinity); the data indicate, however, that they were also rating the inherent consonance of the timbres used.
Furthermore, introspection is blind to the processes that lead to this final mental state (see Petitmengin and Bitbol (2009) for a discussion of process blindness): people may have little ability to report accurately about their cognitive processes and even have a literal lack of awareness that a process of any kind is occurring until the moment that the result appears (Nisbett and Wilson, 1977, p. 241). This blindness makes modelling difficult but, then again, we are also blind to the processes (natural laws) that underlie the physical world too (if we weren't, modelling would be a trivial exercise!).
Underlying all of these nature and nurture processes are psychoacoustical and cognitive universalities: the peripheral auditory system is sensitive to a limited frequency range, and it exhibits specific acoustical properties that are innate and universal. The central auditory system relies on neural and perceptual processes that are, to a large extent, innate and universal, and there are higher-level cognitive processes that can also be thought of as innate and universal (such as Gestalt principles of perceptual grouping). For example, in any model of musical perception (even of nurture processes) it makes sense to model these peripheral mechanisms (e.g., not including ultrasonic frequencies) in the first stages, and to take account of general cognitive processes throughout the model. It is in this sense that conventional symbolic music notation, such as note names (which characterize a complex tone's fundamental), can provide useful simple variables that already encapsulate psychoacoustic processes. Ultimately, the value of any music variable in the framework models discussed above is a function of an acoustical input variable that can be fully characterized, in the time domain, by its pressure.

Figure 2.2. Aggregated over people and time, composers create a feedback loop. (Diagram nodes: current music event, non-music event, previous music events, listening processes, mental state, composition processes.)
$$\Bigl(\sum_{i=1} \bigl|x[i] - y[i]\bigr|^{p}\Bigr)^{1/p} = \lVert \mathbf{x} - \mathbf{y} \rVert_{p} \quad \text{for } p \geq 1, \tag{3.1}$$

where $|\cdot|$ is the absolute value and $\lVert\cdot\rVert_{p}$ is the p-norm. Different values of p in the $L_p$ metrics correspond to some familiar distances: for example, p = 1 is the taxicab distance, p = 2 is the Euclidean distance, and p = ∞ is the maximum value distance (also known as the Chebyshev distance), which can be written $\max_i |x[i] - y[i]|$.
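The $L_p$ family can be sketched in a few lines of Python. The function name `lp_distance` and the two example vectors are illustrative assumptions; the infinite-p case is handled separately because it reduces to a maximum rather than a sum.

```python
import math

def lp_distance(x, y, p=2.0):
    """L_p distance between two equal-length sequences, for p >= 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1 for a metric")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Chebyshev (maximum value) distance.
        return max(diffs, default=0.0)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Two illustrative four-element vectors differing in their last two entries.
x = (48, 60, 67, 74)
y = (48, 60, 64, 72)
taxicab = lp_distance(x, y, p=1)           # 3 + 2
euclidean = lp_distance(x, y, p=2)         # sqrt(9 + 4)
chebyshev = lp_distance(x, y, p=math.inf)  # max(3, 2)
```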
A similarity measure is a closely related function, except it increases when distance decreases and vice versa. Typically, a similarity measure s(x, y) takes a value of zero when x and y are maximally dissimilar, and a value of unity when x = y. Any distance metric d(x, y) can be transformed into a similarity measure by taking the function $e^{-d(x,y)}$ (Chen et al., 2009); this is equivalent to Shepard's proposed universal law of generalization, which relates distance in psychological space to perceived similarity (Shepard, 1987). However, a commonly used similarity measure for vectors is the cosine of the angle between them, which is given by

$$s_{\cos}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{(\mathbf{x}'\mathbf{x})(\mathbf{y}'\mathbf{y})}}, \tag{3.2}$$

where x and y are column vectors and ′ is the matrix transpose operator, which converts a column vector into a row vector, and vice versa. For vectors all of whose elements are non-negative, this takes a value between zero (when they are orthogonal) and unity (when they are parallel).

Although cosine similarity can be transformed into a true distance metric, called angular distance, by

$$d(\mathbf{x}, \mathbf{y}) = \arccos\bigl(s_{\cos}(\mathbf{x}, \mathbf{y})\bigr), \tag{3.3}$$

the simpler cosine distance

$$d_{\cos} = 1 - s_{\cos}(\mathbf{x}, \mathbf{y}) \tag{3.4}$$

is a commonly used semi-metric (a semi-metric fulfils the first three conditions for a metric, but not the triangle inequality).
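The three measures above can be sketched as follows. The function names and the three test vectors are illustrative assumptions. The vectors are chosen so that cosine distance visibly violates the triangle inequality while angular distance does not.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between x and y, as in (3.2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

def angular_distance(x, y):
    """Angle between x and y in radians, as in (3.3)."""
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    s = max(-1.0, min(1.0, cosine_similarity(x, y)))
    return math.acos(s)

def cosine_distance(x, y):
    """One minus the cosine similarity, as in (3.4)."""
    return 1.0 - cosine_similarity(x, y)

# b lies midway (in angle) between the orthogonal vectors a and c.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
direct = cosine_distance(a, c)
via_b = cosine_distance(a, b) + cosine_distance(b, c)
```

Here `direct` exceeds `via_b`, so going through `b` is "shorter" than the direct route under cosine distance, which is why it is only a semi-metric.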
Applications of these measures to the expectation tensors are discussed in depth in Section 3.5.
3.2 Category Domain Embeddings
Pitch vectors and pitch class vectors, both of which are widely used in music theory, are examples of category domain embeddings (this is a novel designation, which I introduced in Milne et al. (2011b)). In such embeddings, the values of the elements indicate pitches or pitch classes (usually in semitones), and their index (position in the vector) represents a categorical value such as the type of voice (e.g., bass, tenor, alto, or soprano). For example, the pitch vector (48, 60, 67, 74) can represent a bass part playing the MIDI pitch number 48, the tenor playing 60, the alto 67, and the soprano 74.

As implied by the equations in the previous section, when using standard metrics (such as the $L_p$ and cosine discussed above) between two such vectors, the resulting distances are based only on the pitches in matching positions in the two vectors.
$$x_e[j] = \sum_{i=1}^{I} x[i, j], \tag{3.12}$$

which is the column sum of the pitch class response matrix X, so

$$\mathbf{X}_e^{(1)} = \mathbf{1}_I' \mathbf{X}, \tag{3.13}$$

where $\mathbf{1}_I$ is an I-dimensional column vector of ones, and ′ is the transpose operator, so $\mathbf{1}_I'$ is a row vector of I ones.
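Applied to Example 3.3.2, Equation (3.13) produces $\mathbf{X}_e^{(1)} = (0.5, 0.25, 0.3, 0.6, 0.3, 0, 0.25, 0.5, 0.25, 0, 0, 0.25)$. As shown in Example 3.4.1, when there is no probabilistic smoothing, and every tone has a salience of 1, the monad expectation vector is equivalent to a multiplicity function of the rounded pitch class vector; that is, $x_e[j] = \sum_{i=1}^{I} \delta\bigl(j - [x_{\mathrm{pc}}[i]]\bigr)$, where $\delta(\cdot)$ equals one when its argument is zero and zero otherwise, and $[\cdot]$ denotes rounding to the nearest integer.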
A particularly useful application of the absolute monad expectation vector (and one which forms the basis for most of the models in this dissertation) is to embed collections of spectral pitches or pitch classes. For simplicity, I denote the resulting vectors spectral pitch vectors or spectral pitch class vectors. Almost all musical instruments produce complex tones, which comprise numerous partials (sine wave components) at differing frequencies. Each of these components can be embedded as a spectral pitch in a spectral pitch vector. This allows for the perceived similarity of any two complex tones (or chords comprising such tones) to be modelled with a similarity measure applied to their spectral pitch vectors. This is the technique I utilize in Chapters 4 and 5 to model experimentally obtained data. To simplify notation in these later chapters, I write spectral pitch vectors as simple vectors (i.e., $\mathbf{x}$) rather than as tensors (i.e., $\mathbf{X}_e^{(1)}$).
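A rough sketch of the idea follows, under stated assumptions: partials are taken to be exact harmonics, the weight of harmonic n is a hypothetical roll-off $n^{-\rho}$, and no smoothing is applied (each partial is rounded to the nearest pitch class). None of these parameter choices is taken from the text; they serve only to make the example concrete.

```python
import math

def spectral_pitch_class_vector(pitch_class, n_harmonics=12, rho=1.0, J=12):
    """Unsmoothed spectral pitch class vector of one harmonic complex tone.

    Harmonic n lies 12*log2(n) semitones above the fundamental. The
    weight n**(-rho) is an assumed roll-off used for illustration.
    """
    v = [0.0] * J
    for n in range(1, n_harmonics + 1):
        pc = pitch_class + 12.0 * math.log2(n)
        v[math.floor(pc + 0.5) % J] += n ** (-rho)
    return v

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

c_tone = spectral_pitch_class_vector(0)
g_tone = spectral_pitch_class_vector(7)
f_sharp_tone = spectral_pitch_class_vector(6)
fifth = cosine_similarity(c_tone, g_tone)
tritone = cosine_similarity(c_tone, f_sharp_tone)
```

Under these assumptions, two tones a perfect fifth apart share more heavily weighted partials than two tones a tritone apart, so `fifth` comes out larger than `tritone`.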
In Section 3.6, I also demonstrate some models that use the higher-order tensors (subsequently described in Secs. 3.4.2 to 3.4.4). However, I do not formally test these higher-order tensors against empirical data, and they do not form part of the models in Chapters 4 and 5. This means it is not essential for the reader to fully assimilate their formal properties in order to understand the models that form the core of this dissertation. Having said that, these higher-order tensors are an important independent thread of my research in music cognition, and I hope to test them against new empirical data in future work (see Chapter 6).
The relative monad expectation scalar $\widehat{X}_e^{(0)} \in \mathbb{R}$ is a scalar that gives the expected overall number of tones that will be perceived (at any pitch).

Example 3.4.2. Given $\mathbf{x}_{\mathrm{pc}} = (0, 3, 3, 7)$, $\mathbf{x}_{w} = (1, 1, 1, 1)$, and no smoothing, we perceive a total of four tones. This is naturally represented by the order-0 tensor (scalar)

$$\widehat{X}_e^{(0)} = 4.$$
The relative monad expectation scalar can be calculated by summing $\mathbf{X}_e^{(1)}$ over j or, more straightforwardly, as the sum of the elements of the weighting vector:

$$\widehat{X}_e^{(0)} = \sum_{j=0}^{J-1} x_e[j] = \mathbf{1}_I' \mathbf{X} \mathbf{1}_J = \sum_{i=1}^{I} x_w[i], \tag{3.14}$$

where $\mathbf{1}_J$ is a J-dimensional column vector of ones. Applied to Example 3.3.2, (3.14) gives $\widehat{X}_e^{(0)} = 3.2$.
3.4.2 Dyad Expectation Tensors
The absolute dyad expectation tensor $\mathbf{X}_e^{(2)} \in \mathbb{R}^{J \times J}$ is a matrix that indicates the expected number of tone pairs that will be perceived as corresponding to any given dyad of absolute pitches.
Example 3.4.3. Let us embed the previously used $\mathbf{x}_{\mathrm{pc}} = (0, 3, 3, 7)$, $\mathbf{x}_{w} = (1, 1, 1, 1)$, and apply no smoothing. These imply there are two ordered pairs of tones with pitches 0 and 3 (one pair is the first and second tones in $\mathbf{x}_{\mathrm{pc}}$, the other pair is the first and third tones in $\mathbf{x}_{\mathrm{pc}}$); there is one ordered pair with pitches 0 and 7 (the first and fourth tones in $\mathbf{x}_{\mathrm{pc}}$); there are two ordered pairs of tones with pitches 3 and 3 (one pair is the second and third tones in $\mathbf{x}_{\mathrm{pc}}$, the other pair is the third and second tones in $\mathbf{x}_{\mathrm{pc}}$); and there are two ordered pairs of tones with the pitches 3 and 7 (one pair is the second and fourth tones in $\mathbf{x}_{\mathrm{pc}}$, the other pair is the third and fourth tones in $\mathbf{x}_{\mathrm{pc}}$); and so forth. This can be conveniently represented by the order-2 tensor (matrix)

$$\mathbf{X}_e^{(2)} = \left(\begin{array}{cccccccccccc}
0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right),$$

where each entry is identified by two indices, which both range from 0 to 11. The indices specify a pair of pitches (so the top-left element indicates the pitches 0 and 0, the next element to the right indicates the pitches 0 and 1, the element below this indicates the pitches 1 and 1, etc.), while the value of the entry indicates the expected number of ordered tone pairs perceived at that pair of pitches. Specifically, the zeroth row shows there are two ordered tone pairs with pitches 0 and 3, because $x_e[0, 3] = 2$, and there is one tone pair with pitches 0 and 7 because $x_e[0, 7] = 1$. And so forth.
Absolute dyad expectation tensors are useful for comparing the absolute dyadic structures of two pitch collections; for example, to compare scales according to the number of dyads they share: the scales C major and F major contain many common dyads and so have a small distance; the scales C major and F♯ major contain just one common dyad {B, F} and so have a large distance. These distances are calculated with cosine distance (3.4) and J = 12.
I now describe a method for calculating these tensors. With two tones indexed by i = 1 and 2, there are two ordered pairs: (1, 2) and (2, 1). The probability of perceiving tone 1 as having pitch j and tone 2 as having pitch k is given by $x[1, j]\, x[2, k]$ (these are elements taken from the pitch class response matrix X defined earlier). Similarly, the probability of perceiving tone 2 as having pitch j and tone 1 as having pitch k is given by $x[2, j]\, x[1, k]$. Given two tones, the expected number of ordered tone pairs that will be perceived as having pitches j and k is, therefore, given by $x[1, j]\, x[2, k] + x[2, j]\, x[1, k]$. Similarly, given three tones indexed by i = 1, 2, and 3, there are six ordered pairs: (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), and the expected number of ordered tone pairs perceived as having pitches j and k is given by the sum of the six probabilities implied by these pairs.
Generalizing for any number of tones, the absolute dyad expectation tensor, $\mathbf{X}_e^{(2)} \in \mathbb{R}^{J \times J}$, contains elements

$$x_e[j, k] = \sum_{\substack{(i_1, i_2) \in \mathcal{I}^2 : \\ i_1 \neq i_2}} x[i_1, j]\, x[i_2, k], \tag{3.15}$$

where $\mathcal{I} = \{1, 2, \ldots, I\}$, so $\mathcal{I}^2$ is all ordered pairs from $\mathcal{I}$, and element indices j and k indicate the pitches j and k (note that i ranges from 1 to I, while j and k most conveniently range from 0 to J − 1 when using pitch classes). The element value indicates the expected number of ordered pairs of tones perceived as having those pitches. For example, $x_e[0, 7] = 1$ indicates a single pair of pitches at values 0 and 7.
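The sum over ordered pairs of distinct tones can be written out directly. This is a brute-force sketch; the function name is an illustrative assumption, and the response matrix is built without smoothing for the collection (0, 3, 3, 7) with unit weights.

```python
from itertools import permutations

def absolute_dyad_tensor(X):
    """J x J matrix of expected ordered tone pairs, by direct summation.

    X is an I x J response matrix given as a list of rows.
    """
    I, J = len(X), len(X[0])
    T = [[0.0] * J for _ in range(J)]
    # permutations(range(I), 2) yields every ordered pair with i1 != i2.
    for i1, i2 in permutations(range(I), 2):
        for j in range(J):
            for k in range(J):
                T[j][k] += X[i1][j] * X[i2][k]
    return T

# Unsmoothed response matrix: one row per tone, a single 1 at its pitch class.
X = [[1.0 if j == pc else 0.0 for j in range(12)] for pc in (0, 3, 3, 7)]
T = absolute_dyad_tensor(X)
```

The entries of `T` sum to I(I − 1) = 12, the number of ordered pairs of distinct tones among four tones.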
Equation (3.15) requires $O(I^2)$ operations for each element. Using the tensor methods described in Appendix C, this can be expressed directly in terms of X, in a way that requires only $O(I)$ operations per element:

$$\mathbf{X}_e^{(2)} = \bigl(\mathbf{1}_I' \mathbf{X}\bigr) \circ \bigl(\mathbf{1}_I' \mathbf{X}\bigr) - \bigl(\mathbf{X}' \mathbf{X}\bigr), \tag{3.16}$$

where $\circ$ denotes the outer product (also known as the tensor product). The outer product is fully explained in Appendix B but, in brief, it multiplies together all possible pairings of elements and applies no summation. For example, the outer product of two vectors $\mathbf{x} \in \mathbb{R}^{M}$ and $\mathbf{y} \in \mathbb{R}^{N}$ is a matrix $\mathbf{Z} \in \mathbb{R}^{M \times N}$, each of whose elements $z[m, n]$ is the product $x[m]\, y[n]$. For two column vectors, therefore, $\mathbf{x} \circ \mathbf{y} = \mathbf{x}\mathbf{y}'$. As shown in Appendix B, this operation can be generalized to apply to tensors of any order.
The tensor product $(\mathbf{1}_I' \mathbf{X}) \circ (\mathbf{1}_I' \mathbf{X})$ is equivalent to $\sum_{(i_1, i_2) \in \mathcal{I}^2} x[i_1, j]\, x[i_2, k]$, while the matrix product $\mathbf{X}'\mathbf{X}$ is equivalent to $\sum_{i_1 = i_2} x[i_1, j]\, x[i_2, k]$; hence subtracting the latter from the former, as in (3.16), is equivalent to the expression in (3.15). This is explained in greater detail in Appendix C.
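The subtraction can be sketched element by element: the outer product of the column sums counts all ordered pairs, and the Gram matrix entry removes the pairs in which a tone is paired with itself. The function name and the small fractional matrix `X_small` are illustrative assumptions.

```python
def absolute_dyad_tensor_fast(X):
    """Dyad tensor computed as outer(1'X, 1'X) minus X'X."""
    J = len(X[0])
    columns = list(zip(*X))
    s = [sum(column) for column in columns]  # column sums, 1'X

    def gram(j, k):
        # Entry [j, k] of X'X: the terms with i1 == i2, O(I) per element.
        return sum(a * b for a, b in zip(columns[j], columns[k]))

    return [[s[j] * s[k] - gram(j, k) for k in range(J)] for j in range(J)]

# Unsmoothed example collection (0, 3, 3, 7) with unit weights.
X = [[1.0 if j == pc else 0.0 for j in range(12)] for pc in (0, 3, 3, 7)]
T_fast = absolute_dyad_tensor_fast(X)

# A small matrix with fractional "probabilities", to exercise the identity
# when rows are not one-hot.
X_small = [[0.75, 0.25, 0.0], [0.0, 0.5, 0.5], [0.1, 0.0, 0.9]]
T_small = absolute_dyad_tensor_fast(X_small)
```

Once the column sums are available, each element costs one length-I inner product, rather than a pass over all I(I − 1) ordered pairs.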
The relative dyad expectation tensor $\widehat{\mathbf{X}}_e^{(1)} \in \mathbb{R}^{J}$ is a vector that indicates the expected number of tone pairs that will be perceived as corresponding to any given dyad of relative pitches (i.e., an interval).

Example 3.4.4. Let us embed the previously used $\mathbf{x}_{\mathrm{pc}} = (0, 3, 3, 7)$, $\mathbf{x}_{w} = (1, 1, 1, 1)$, and apply no smoothing. These imply that two ordered pairs of pitches make an interval of size zero (one pair is the second and third tones in $\mathbf{x}_{\mathrm{pc}}$, the other is the third and second tones in $\mathbf{x}_{\mathrm{pc}}$); there are two ordered pairs of tones making an interval of size 3 (one pair is the first and second tones in $\mathbf{x}_{\mathrm{pc}}$, the other is the first and third tones in $\mathbf{x}_{\mathrm{pc}}$); there are two ordered pairs of tones making an interval of size 4 (one pair is the second and fourth tones in $\mathbf{x}_{\mathrm{pc}}$, the other is the third and fourth tones in $\mathbf{x}_{\mathrm{pc}}$); there is one ordered pair of tones making an interval of size 5 (the fourth and first tones in $\mathbf{x}_{\mathrm{pc}}$); there is one ordered pair of tones making an interval of size 7 (the first and fourth tones in $\mathbf{x}_{\mathrm{pc}}$); there are two ordered pairs of tones making an interval of size 8 (one pair is the fourth and second tones in $\mathbf{x}_{\mathrm{pc}}$, the other pair is the fourth and third tones in $\mathbf{x}_{\mathrm{pc}}$); there are two ordered pairs of tones making an interval of size 9 (one pair is the second and first tones in $\mathbf{x}_{\mathrm{pc}}$, the other pair is the third and first tones in $\mathbf{x}_{\mathrm{pc}}$). This can be conveniently represented by the vector

$$\widehat{\mathbf{X}}_e^{(1)} = (2, 0, 0, 2, 2, 1, 0, 1, 2, 2, 0, 0),$$

where the index (which here ranges from 0 to 11) indicates the size of the interval, and its value represents the number of ordered tone pairs perceived as having that interval.
It is useful to note that pitch collections with the same interval content and weights have the same relative dyad embedding. For example, the pitch vectors (0, 3, 3, 7) and (0, 4, 4, 7) have identical relative dyad embeddings.
Relative dyad expectation tensors are useful for comparing the intervallic structures of two or more pitch collections regardless of transposition; for example, to compare the number of intervals that two pitch collections have in common, or to compare different pitch collections by the number, and tuning accuracy, of a specific set of privileged intervals they each contain (for a specific application, see the example in Section 3.6 that compares thousands of scale tunings to a set of just intonation intervals).
The relative dyad expectation vector is given by applying row shifts to $\mathbf{X}_e^{(2)}$ so that $k \mapsto k + j$ (circular row shifts, so that $k \mapsto k + j \bmod J$, when using pitch classes), and then summing over j; that is,

$$\hat{x}_e[k] = \sum_{j} x_e[j, k + j], \tag{3.17}$$

where k + j is taken modulo J when pitch class vectors are used. The index k indicates an interval, of size k, above j; for example, $\hat{x}_e[7] = 1$ indicates one interval of size 7. Assuming the independence of tone saliences, the values are the expected number of ordered tone pairs perceived as having that interval, regardless of transposition.
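The shift-and-sum operation can be sketched as follows, using pitch classes so that all index arithmetic is modulo J. The function names are illustrative assumptions; the dyad matrix is built from column sums and the Gram matrix, without smoothing.

```python
def absolute_dyad_tensor(X):
    """Dyad matrix via outer(1'X, 1'X) minus X'X."""
    J = len(X[0])
    columns = list(zip(*X))
    s = [sum(column) for column in columns]
    return [[s[j] * s[k] - sum(a * b for a, b in zip(columns[j], columns[k]))
             for k in range(J)] for j in range(J)]

def relative_dyad_vector(T):
    """Entry k sums T[j][(j + k) mod J] over j, i.e. along a wrapped diagonal."""
    J = len(T)
    return [sum(T[j][(j + k) % J] for j in range(J)) for k in range(J)]

def one_hot_rows(pitch_classes, J=12):
    # Unsmoothed response matrix with unit weights (illustrative helper).
    return [[1.0 if j == pc else 0.0 for j in range(J)] for pc in pitch_classes]

rel_a = relative_dyad_vector(absolute_dyad_tensor(one_hot_rows((0, 3, 3, 7))))
rel_b = relative_dyad_vector(absolute_dyad_tensor(one_hot_rows((0, 4, 4, 7))))
```

Both collections produce the same vector, since they contain the same intervals with the same multiplicities.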
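Equation (3.17) can also be written as autocorrelations:

$$\widehat{\mathbf{X}}_e^{(1)} = \bigl(\mathbf{1}'\mathbf{X}\bigr) \star \bigl(\mathbf{1}'\mathbf{X}\bigr) - \sum_{i=1}^{I} \mathbf{x}[i] \star \mathbf{x}[i], \tag{3.18}$$

where $\star$ denotes cross-correlation and $\mathbf{x}[i]$ is the ith row of the pitch class response matrix X. This is the autocorrelation of the column sum of the pitch class response matrix X minus the autocorrelations of each of its rows.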
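A direct (non-FFT) sketch of the autocorrelation form follows. The function names are illustrative assumptions, and circular cross-correlation is taken here as $(f \star g)[k] = \sum_j f[j]\, g[(j + k) \bmod J]$, which matches the index convention of (3.17).

```python
def circular_cross_correlation(f, g):
    """(f * g)[k] = sum over j of f[j] * g[(j + k) mod J]."""
    J = len(f)
    return [sum(f[j] * g[(j + k) % J] for j in range(J)) for k in range(J)]

def relative_dyad_vector_autocorr(X):
    """Autocorrelation of the column sums minus each row's autocorrelation."""
    s = [sum(column) for column in zip(*X)]
    total = circular_cross_correlation(s, s)
    for row in X:
        own = circular_cross_correlation(row, row)
        total = [t - o for t, o in zip(total, own)]
    return total

# Unsmoothed response matrix for (0, 3, 3, 7) with unit weights.
X = [[1.0 if j == pc else 0.0 for j in range(12)] for pc in (0, 3, 3, 7)]
rel = relative_dyad_vector_autocorr(X)
```

Subtracting the per-row autocorrelations removes exactly the contributions in which a tone is paired with itself.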
$\lvert \mathcal{F}(f) \rvert^{2}\bigr)$, where $\star$ is circular cross-correlation (so $f \star f$ is the autocorrelation of $f$), and $\mathcal{F}$ denotes the Fourier transform.
other triple is the first, third, and fourth tones in $\mathbf{x}_{\mathrm{pc}}$). There are two ordered triples with pitches 3, 3, and 7 (one triple is the second, third, and fourth tones in $\mathbf{x}_{\mathrm{pc}}$, the other is the third, second, and fourth tones in $\mathbf{x}_{\mathrm{pc}}$). There are no ordered triples at any other triad of pitches. This can be conveniently represented by an order-3 tensor (a three-way array, or cube, of numbers). In such a tensor, each entry is identified by three indices, each of which ranges from 0 to 11. The indices indicate a triad of pitches, while the value of that entry indicates the expected number of ordered tone triples perceived at that triad of pitches. So, for this example, the tensor is all zeros except for entries with index values that are a permutation of (0, 3, 3), (0, 3, 7), and (3, 3, 7), which have a value of 2 (i.e., $x_e[0, 3, 3] = x_e[3, 0, 3] = x_e[3, 3, 0] = x_e[0, 3, 7] = x_e[0, 7, 3] = x_e[3, 0, 7] = x_e[3, 7, 0] = x_e[7, 0, 3] = x_e[7, 3, 0] = x_e[3, 3, 7] = x_e[3, 7, 3] = x_e[7, 3, 3] = 2$).
Absolute triad expectation tensors are useful for comparing the absolute triadic structures of two pitch collections; for example, to compare two scales according to the number of triads they share: the scales C major and F major have many triads in common (e.g., {C, E, G}, {C, D, E}, and {D, F, G} are found in both scales) and so have a small distance (.170); the scales C major and F♯ major have no triads in common (they share only two notes {B, F}) and so have the maximal distance of 1. These distances are calculated with the cosine distance (3.4) with J = 12.
I now describe a method for calculating these tensors. Given three tones indexed by 1, 2, and 3, there are six ordered triples: (1, 2, 3), (2, 1, 3), (2, 3, 1), (1, 3, 2), (3, 1, 2), (3, 2, 1); the probabilities of perceiving each triple as having pitches j, k, and ℓ, respectively, are $x[1, j]\, x[2, k]\, x[3, \ell]$, $x[2, j]\, x[1, k]\, x[3, \ell]$, $x[2, j]\, x[3, k]\, x[1, \ell]$, $x[1, j]\, x[3, k]\, x[2, \ell]$, $x[3, j]\, x[1, k]\, x[2, \ell]$, and $x[3, j]\, x[2, k]\, x[1, \ell]$. Given three tones, the expected number of ordered tone triples perceived as having pitches j, k, ℓ is given by the sum of the above probabilities.
Generalizing for any number of tones, the absolute triad expectation tensor, $\mathbf{X}_e^{(3)} \in \mathbb{R}^{J \times J \times J}$, contains elements

$$x_e[j, k, \ell] = \sum_{\substack{(i_1, i_2, i_3) \in \mathcal{I}^3 : \\ i_1 \neq i_2,\; i_1 \neq i_3,\; i_2 \neq i_3}} x[i_1, j]\, x[i_2, k]\, x[i_3, \ell], \tag{3.19}$$

where $\mathcal{I} = \{1, 2, \ldots, I\}$, and $\mathcal{I}^3$ is all ordered triples of elements from $\mathcal{I}$, and j, k, and ℓ indicate the pitch classes j, k, and ℓ (note that i ranges from 1 to I, while j, k, and ℓ most conveniently range from 0 to J − 1 when using pitch classes). Assuming the independence of tone saliences, the element value indicates the expected number of ordered triples of tones perceived as having those three pitches. For example, $x_e[0, 4, 7] = 1$ indicates a single triad of pitches at values 0, 4, and 7.
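The triple sum can be evaluated by brute force. This sketch stores only the non-zero entries in a dictionary keyed by (j, k, ℓ); the sparse representation and the function name are illustrative choices, not part of the formal definition.

```python
from itertools import permutations

def absolute_triad_tensor(X):
    """Sparse triad tensor: {(j, k, l): expected number of ordered triples}."""
    I, J = len(X), len(X[0])
    T = {}
    # Every ordered triple of pairwise-distinct tone indices.
    for i1, i2, i3 in permutations(range(I), 3):
        for j in range(J):
            if X[i1][j] == 0:
                continue
            for k in range(J):
                if X[i2][k] == 0:
                    continue
                for l in range(J):
                    value = X[i1][j] * X[i2][k] * X[i3][l]
                    if value:
                        T[(j, k, l)] = T.get((j, k, l), 0.0) + value
    return T

# Unsmoothed response matrix for (0, 3, 3, 7) with unit weights.
X = [[1.0 if j == pc else 0.0 for j in range(12)] for pc in (0, 3, 3, 7)]
T3 = absolute_triad_tensor(X)
```

For this collection only twelve entries are non-zero, each with value 2, and together they account for all 24 ordered triples of distinct tones.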
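Equation (3.19) requires $O(I^3)$ operations for each element, but can be simplified to $O(I)$ by using the tensor methods of Appendix C:

$$\begin{aligned}
\mathbf{X}_e^{(3)} ={}& \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \circ \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \circ \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \\
&- \Bigl( \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \circ \bigl(\mathbf{X}'\mathbf{X}\bigr) \Bigr)_{\langle 1,2,3 \rangle} \\
&- \Bigl( \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \circ \bigl(\mathbf{X}'\mathbf{X}\bigr) \Bigr)_{\langle 2,1,3 \rangle} \\
&- \Bigl( \bigl(\mathbf{1}_I'\mathbf{X}\bigr) \circ \bigl(\mathbf{X}'\mathbf{X}\bigr) \Bigr)_{\langle 3,1,2 \rangle} \\
&+ 2 \bigl( \mathbf{X}' \odot \mathbf{X}' \odot \mathbf{X}' \bigr) \mathbf{1}_I ,
\end{aligned} \tag{3.20}$$

where $\odot$ denotes the Khatri-Rao product, and the angle bracketed subscripts denote mode permutations, both of which are explained in full in Appendix B. This equation is a higher-order generalization of (3.16) in that the first line calculates the summation in (3.19), but over all terms (i.e., $(i_1, i_2, i_3) \in \mathcal{I}^3$). The next three lines remove summations over any terms where $i_1 = i_2$, $i_1 = i_3$, and $i_2 = i_3$. But this removes the term $i_1 = i_2 = i_3$ too many (three) times, so two of them are put back by the final line. The process used here is similar to the inclusion-exclusion principle used in combinatorics to count the numbers of elements in intersecting sets. The method is explained in greater detail in Appendix C.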
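The inclusion-exclusion step can be sketched for a single element, written with plain sums rather than the tensor operators of Appendix B. Here `s` is the vector of column sums, `gram(a, b)` is entry [a, b] of the Gram matrix of X, and `cube` is the sum over tones of the triple product; all names, and the small fractional matrix `X_small`, are illustrative assumptions.

```python
def triad_element(X, j, k, l):
    """Element [j, k, l] of the triad tensor by inclusion-exclusion.

    all triples
      - (i2 == i3) - (i1 == i3) - (i1 == i2)
      + 2 * (i1 == i2 == i3)
    """
    columns = list(zip(*X))
    s = [sum(column) for column in columns]

    def gram(a, b):
        return sum(p * q for p, q in zip(columns[a], columns[b]))

    cube = sum(p * q * r for p, q, r in zip(columns[j], columns[k], columns[l]))
    return (s[j] * s[k] * s[l]
            - s[j] * gram(k, l)   # terms with i2 == i3
            - s[k] * gram(j, l)   # terms with i1 == i3
            - s[l] * gram(j, k)   # terms with i1 == i2
            + 2 * cube)           # restore the over-subtracted i1 == i2 == i3

# Unsmoothed response matrix for (0, 3, 3, 7) with unit weights.
X = [[1.0 if j == pc else 0.0 for j in range(12)] for pc in (0, 3, 3, 7)]
# Fractional rows, to check the identity beyond the one-hot case.
X_small = [[0.75, 0.25, 0.0], [0.0, 0.5, 0.5], [0.1, 0.0, 0.9]]
```

Each of the five terms is a product of quantities that cost at most one pass over the I tones, which is where the reduction from cubic to linear cost per element comes from.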
The relative triad expectation tensor $\widehat{\mathbf{X}}_e^{(2)} \in \mathbb{R}^{J \times J}$ is a matrix that indicates the expected number of ordered tone triples perceived at any possible relative triad (the latter is characterized by a pair of single-reference intervals). By single-reference, I mean that both intervals are measured with respect to the same pitch; for instance, a three-voice minor triad can be characterized by the interval pair 3 and 7 because it has intervals of those sizes with respect to pitch 0 (i.e., 3 − 0 and 7 − 0). The same triad can also be represented by the interval pair 4 and 9 because it has intervals of those sizes with respect to pitch 3 (i.e., 7 − 3 and 0 − 3 mod 12), or by the interval pair 5 and 8 because it has intervals of those sizes with respect to pitch 7 (i.e., 0 − 7 mod 12 and 3 − 7 mod 12).
Example 3.4.6. Let us embed the previously used $\mathbf{x}_{\mathrm{pc}} = (0, 3, 3, 7)$, $\mathbf{x}_{w} = (1, 1, 1, 1)$, and apply no smoothing. These imply there are two ordered tone triples with the single-reference interval pair 0 and 4 (one triple is the second, third, and fourth elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the third, second, and fourth elements of $\mathbf{x}_{\mathrm{pc}}$); there are two ordered tone triples with the interval pair 0 and 9 (one triple is the second, third, and first elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the third, second, and first elements of $\mathbf{x}_{\mathrm{pc}}$); there are two ordered tone triples with the interval pair 3 and 3 (one triple is the first, second, and third elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the first, third, and second elements of $\mathbf{x}_{\mathrm{pc}}$); there are two ordered tone triples with the interval pair 3 and 7 (one triple is the first, second, and fourth elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the first, third, and fourth elements of $\mathbf{x}_{\mathrm{pc}}$); there are two ordered tone triples with the interval pair 4 and 9 (one triple is the second, fourth, and first elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the third, fourth, and first elements of $\mathbf{x}_{\mathrm{pc}}$); there are two ordered tone triples with the interval pair 5 and 8 (one triple is the fourth, first, and second elements of $\mathbf{x}_{\mathrm{pc}}$, the other is the fourth, first, and third elements of $\mathbf{x}_{\mathrm{pc}}$). This can be conveniently represented by the order-2 tensor (matrix)

$$\widehat{\mathbf{X}}_e^{(2)} = \left(\begin{array}{cccccccccccc}
0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}\right),$$

where each entry is identified by two indices, which both range from 0 to 11. The indices specify a pair of single-reference intervals (so the top-left element indicates the intervals 0 and 0, the next element to the right indicates the intervals 0 and 1, the element below this indicates the intervals 1 and 1, etc.), while the value of the entry indicates the expected number of ordered tone triples perceived as having that pair of intervals. Specifically, the zeroth row shows there are two ordered tone triples with single-reference intervals of sizes 0 and 4, because $\hat{x}_e[0, 4] = 2$, and there are two tone triples with single-reference intervals of sizes 0 and 9, because $\hat{x}_e[0, 9] = 2$; the third row shows there are two ordered tone triples with single-reference intervals of sizes 3 and 3, because $\hat{x}_e[3, 3] = 2$, and there are two triples with single-reference intervals of sizes 3 and 7, because $\hat{x}_e[3, 7] = 2$. And so forth.
It is interesting to note that different pitch collections with the same interval content and weights may have different relative triad embeddings. For example, the pitch vectors (0, 3, 3, 7) and (0, 4, 4, 7) have non-identical relative triad embeddings.
Relative triad expectation tensors are useful for comparing the triadic structures of two or more pitch collections, regardless of transposition; for example, to compare the number of triad types two pitch collections have in common, or to compare pitch collections by the number, and tuning accuracy, of a specific set of privileged triads they each contain (for a specific application, see the example in Section 3.6 that compares thousands of scale tunings against a just intonation triad).
The relative triad expectation matrix is given by applying mode shifts to $\mathbf{X}_e^{(3)}$ so that $k \mapsto k + j$ and $\ell \mapsto \ell + j$ (circular mode shifts, so that $k \mapsto k + j \bmod J$ and $\ell \mapsto \ell + j \bmod J$, when embedding pitch classes), and then summing over j:

$$\hat{x}_e[k, \ell] = \sum_{j} x_e[j, k + j, \ell + j], \tag{3.21}$$

where k + j and ℓ + j are taken modulo J when used with pitch class vectors. Element indices k and ℓ indicate two intervals, of sizes k and ℓ, above j, which together make a triad. Assuming independence of tone saliences, the element values are the expected number of ordered tone triples perceived as corresponding to that relative triad. For example, $\hat{x}_e[4, 7] = 1$ indicates one ordered tone triple containing single-reference intervals of sizes 4 and 7.
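The mode shifts and summation over j can be sketched by folding a sparse absolute triad tensor: each entry at pitches (j, k, ℓ) contributes to the interval pair ((k − j) mod J, (ℓ − j) mod J). The function names, the sparse dictionary representation, and the brute-force construction of the triad tensor are illustrative assumptions.

```python
from itertools import permutations, product

def absolute_triad_tensor(X):
    """Sparse triad tensor {(j, k, l): value}, by direct summation."""
    J = len(X[0])
    T = {}
    for tones in permutations(range(len(X)), 3):
        # Only pitches with non-zero response can contribute.
        support = [[j for j in range(J) if X[i][j] != 0] for i in tones]
        for pitches in product(*support):
            value = 1.0
            for i, j in zip(tones, pitches):
                value *= X[i][j]
            T[pitches] = T.get(pitches, 0.0) + value
    return T

def relative_triad_matrix(T, J=12):
    """Sum entries of T that share the same pair of intervals above j."""
    M = [[0.0] * J for _ in range(J)]
    for (j, k, l), value in T.items():
        M[(k - j) % J][(l - j) % J] += value
    return M

def one_hot_rows(pitch_classes, J=12):
    # Unsmoothed response matrix with unit weights (illustrative helper).
    return [[1.0 if j == pc else 0.0 for j in range(J)] for pc in pitch_classes]

M_a = relative_triad_matrix(absolute_triad_tensor(one_hot_rows((0, 3, 3, 7))))
M_b = relative_triad_matrix(absolute_triad_tensor(one_hot_rows((0, 4, 4, 7))))
```

Although the two collections share a relative dyad vector, `M_a` and `M_b` differ; for instance, only the second has a non-zero entry at the interval pair (4, 4).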
In the same way the relative dyad expectation vector can be considered to be a psychoacoustically informed generalization of the interval class vector, the relative triad expectation matrix can be thought of as a generalization of the trichord-class vector (also known as a 3-class vector), which is used in musical set theory (Lewin, 1987; Castrén, 1994; Buchler, 2001; Kuusi, 2001). It is also equivalent to Lewin's EMB(CANON, /X/, Y) function, taken over all trichords in the set class /X/, where the canonical group CANON is transposition operations, not inversions (Lewin, 1987, p. 106). The tensor representation also has the advantage that the indexing of its elements is directly related to the pitch structure, rather than just following Forte's set class designations (1973).

3.4.4 R-ad Expectation Tensors

The definitions and techniques of the previous sections can be generalized to R-ads of any order (so long as R ≤ I).
; :ourtttc nr st:ttartv or rtcn cottrctos
An absolute R-ad expectation tensor, $X_e^{(R)} \in \mathbb{R}^{J^R}$, indicates the expected number of ordered tone R-tuples that will be perceived as corresponding to any given R-ad of absolute pitches. It contains elements

$$x_e[j_1, j_2, \ldots, j_R] = \sum_{\substack{(i_1, \ldots, i_R) \in \mathcal{I}^R : \\ i_n \neq i_p}} \; \prod_{r=1}^{R} x[i_r, j_r], \tag{3.22}$$

where $\mathcal{I} = \{1, 2, \ldots, I\}$. Element indices $j_1, j_2, \ldots, j_R$ indicate the pitches $j_1, j_2, \ldots, j_R$ (note that $i$ ranges from 1 to $I$, while $j_1, j_2, \ldots, j_R$ most conveniently range from 0 to $J - 1$ when using pitch classes). Assuming the independence of tone saliences, element value indicates the expected number of ordered R-tuples of tones perceived as having those R pitches. For example, $x_e[0, 4, 7, \ldots, 11] = 1$ indicates a single ordered R-tuple of tones with pitches 0, 4, 7, ..., and 11.
As explained in Appendix C, this can also be expressed directly in tensor notation.
X
(R)
e
=
_
_
1
J
R
E
I
R
_
X
R
R+1,1,R+2,2,...,...,R+R,R
_
R
1
I
R
. ,.a,
Equations 3.22 and 3.23 are symbolically concise, but cumbersome to calculate since each element of $X_e^{(R)}$ requires $O(I^R)$ operations. Fortunately, this can be reduced to $O(I)$ by breaking 3.23 into subspaces which are then added and subtracted in a manner analogous to that shown in ,.o and ,.ao. This process is fully explained in Appendix C. As shown in the MATLAB routines at http://www.dynamictonality.com/expectation_tensors_files/, the computational complexity can be further reduced by exploiting the sparsity of the tensors to calculate only non-zero values; furthermore, due to their construction, the tensors are invariant with respect to any transposition of their indices, so only non-duplicated elements need to be calculated. To minimize memory requirements, the tensors can be stored in a sparse format.
A relative R-ad expectation tensor indicates the expected number of ordered tone R-tuples perceived at any possible relative R-ad (the latter characterized by $R - 1$ single-reference intervals). It is invariant with respect to transposition of the pitch collection. The absolute R-ad expectation tensors are transformed into relative R-ad expectation tensors by shifting modes $2, 3, \ldots, R$ of $X_e^{(R)}$ so that $j_r \to j_r + j_1$ (circularly shifting modes so that $j_r \to j_r + j_1 \bmod J$, when embedding pitch classes), and then summing over $j_1$. This creates an order-$(R-1)$ relative R-ad expectation tensor in $\mathbb{R}^{J^{R-1}}$ with elements

$$x_e[j_2, j_3, \ldots, j_R] = \sum_{j_1} x_e[j_1, j_2 + j_1, \ldots, j_R + j_1]. \tag{3.24}$$

Element indices $j_2, \ldots, j_R$ indicate a set of $R - 1$ intervals above $j_1$ which together make an R-ad; assuming the independence of tone saliences, element value indicates the expected number of ordered R-tuples of tones that are perceived as corresponding to that relative R-ad (set of $R - 1$ single-reference intervals). For example, $x_e[j_2, j_3, \ldots, j_R] = 1$ indicates one R-tuple of tones containing single-reference intervals of sizes $j_2, j_3, \ldots, j_R$.
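For unsmoothed embeddings, the absolute and relative R-ad tensors can be held in a sparse form that stores only the non-zero entries. The sketch below is an illustration under stated assumptions, not the MATLAB routines cited in the text: tones have unit salience and integer pitch classes, $J = 12$, and the names `absolute_rad` and `relative_rad` are hypothetical. It enumerates ordered R-tuples directly, so it does not implement the $O(I)$ simplification of Appendix C.

```python
from collections import Counter
from itertools import permutations


def absolute_rad(pitch_classes, R, J=12):
    """Sparse absolute R-ad tensor as {(j1, ..., jR): count}."""
    counts = Counter()
    # permutations() works on positions, so duplicated pitches count as distinct tones
    for tones in permutations(pitch_classes, R):
        counts[tuple(p % J for p in tones)] += 1
    return counts


def relative_rad(absolute, J=12):
    """Shift modes 2..R circularly by j1 and sum over j1."""
    rel = Counter()
    for (j1, *rest), value in absolute.items():
        rel[tuple((j - j1) % J for j in rest)] += value
    return rel


major = relative_rad(absolute_rad([0, 4, 7], R=3))
print(major[(4, 7)])
```

Because the relative tensor is indexed by intervals above the first pitch, any transposition of the input collection produces the same `Counter`.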
As before, the relative R-ad expectation tensors can be viewed as generalizations of subset-class vectors of cardinality R (also called n-class vectors, where n = R). In comparison to these vectors, the tensors have the advantage of a principled system of indexing, as well as meaningfully accounting for duplicated pitches and the uncertainties of pitch perception.

3.5 METRICS

The focus of the previous section has been on different ways of embedding a single collection of pitches into a tensor. In this section, I discuss methods to measure the distance between, or similarity of, any two such tensors. In particular I discuss the two common distances introduced in Section ,. (the $L_p$ and the cosine), which are used in the applications of Section 3.6.
It is reasonable to model the perceived pitch distance between any two tones with their absolute pitch difference (e.g., the pitch distance between tones with pitch values of o, and oo semitones is , semitones). The $L_p$-metrics are calculated from absolute differences so they provide a natural choice for calculating the overall distance between pairs of category domain pitch vectors. When there are $I$ different tones in each vector, there are $I$ different pitch differences; the value of $p$ determines how these are totalled (e.g., $p = 1$ gives the taxicab measure, which simply adds the distances moved by the different voices; $p = 2$ gives the Euclidean measure; $p = \infty$ gives the maximum value distance, which is the largest distance moved by any voice). As discussed in Section ,.a, the use of such metrics is a well-established procedure.
The metrics may be based on the intervals between pairs of pitch, or pitch class, vectors in $\mathbb{R}^{I}$:

$$d_w(x_{pc}, y_{pc}; p) = \left( \sum_{i=1}^{I} w[i] \, \bigl| x_{pc}[i] - y_{pc}[i] \bigr|^{p} \right)^{1/p}, \tag{3.25}$$

where the weights $w[i]$ may be sensibly chosen to be the product of the saliences $w[i] = x_w[i] \, y_w[i]$ from (3.7) (Parncutt, 1989). The metrics may also treat the unordered pitch class intervals:

$$d_c(x_{pc}, y_{pc}; p) = \left( \sum_{i=1}^{I} w[i] \min_{k \in \mathbb{Z}} \bigl| x_{pc}[i] - y_{pc}[i] - kJ \bigr|^{p} \right)^{1/p}. \tag{3.26}$$

Equation 3.25 provides a measure of pitch height distance while 3.26 provides a measure of pitch class (or chroma) distance.
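As a small numerical illustration of these two metrics, the sketch below implements the weighted $L_p$ distance and its pitch class variant. It is a minimal sketch under stated assumptions: the function names are hypothetical, the weights default to 1, and $J$ defaults to 12 (semitone pitch classes). The minimum over $k$ is computed as the circular distance.

```python
import numpy as np


def lp_distance(x, y, p=2.0, w=None):
    """Weighted L_p distance between two pitch vectors of equal length."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    w = np.ones_like(diff) if w is None else np.asarray(w, dtype=float)
    if np.isinf(p):
        # limit p -> infinity: largest difference among tones with non-zero weight
        return float(np.max(diff[w > 0], initial=0.0))
    return float(np.sum(w * diff ** p) ** (1.0 / p))


def pc_distance(x, y, p=2.0, w=None, J=12.0):
    """As lp_distance, but each difference is minimised over octave shifts kJ."""
    d = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) % J
    circular = np.minimum(d, J - d)
    return lp_distance(circular, np.zeros_like(circular), p=p, w=w)


chord_a, chord_b = [60, 64, 67], [60, 65, 69]
print(lp_distance(chord_a, chord_b, p=1))       # taxicab: total voice movement
print(lp_distance(chord_a, chord_b, p=np.inf))  # largest movement of any voice
print(pc_distance([0], [7]), pc_distance([0], [5]))
```

With the pitch class variant, a perfect fifth and a perfect fourth both give a distance of 5 semitones, as described for unordered pitch class intervals.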
To calculate the distance between two pitch domain expectation tensors $X_e^{(R)}$ and $Y_e^{(R)} \in \mathbb{R}^{\overbrace{\scriptstyle J \times J \times \cdots \times J}^{R}}$, the $L_p$ and cosine distances can be applied in an entrywise fashion. The simplest way to write this is to reshape the tensors into column vectors $x$ and $y \in \mathbb{R}^{J^R}$, which may be applied in ,. and ,.a.

Footnote. Unordered pitch class intervals, also known as interval classes or undirected intervals, are commonly used in musical set theory. They give the same value to any given interval and its inversion; for example, both a perfect fifth and a perfect fourth are represented by the value of 5 semitones.

The cosine similarity of two vectors is equivalent to their uncentred correlation, and the use of such metrics is an established procedure in music theory and cognition (Krumhansl, 1990; Scott and Isaacson, 1998; Rogers, 1999).
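The reshape-then-compare step can be sketched as follows. This is an illustrative sketch only: it assumes the cosine distance is one minus the cosine similarity, and it uses a hypothetical helper `absolute_dyad_tensor` that builds an unsmoothed, unit-salience absolute dyad tensor with $J = 12$.

```python
import numpy as np


def cosine_distance(X, Y):
    """1 - cosine similarity of two tensors after flattening them to vectors."""
    x = np.asarray(X, dtype=float).ravel()
    y = np.asarray(Y, dtype=float).ravel()
    return 1.0 - float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


def absolute_dyad_tensor(pitch_classes, J=12):
    """Count ordered pairs of distinct tones at each pair of pitch classes."""
    T = np.zeros((J, J))
    for a, pa in enumerate(pitch_classes):
        for b, pb in enumerate(pitch_classes):
            if a != b:
                T[pa % J, pb % J] += 1.0
    return T


d = cosine_distance(absolute_dyad_tensor([0, 1, 4, 6]),
                    absolute_dyad_tensor([0, 1, 3, 7]))
print(round(d, 3))
```

The two collections share only the pitch classes 0 and 1, so 2 of the 12 ordered pairs coincide and the distance is $1 - 2/12$.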
But lower-dimensional tunings (principally one and two-dimensional) also have a number of musically useful features; notably, they facilitate modulation between keys, they can generate scales with simply patterned structures (equal step scales in the case of 1-D tunings, well-formed scales in the case of 2-D tunings (Carey, 2007)), and the tuning of all tones in the scale can be meaningfully controlled, by a musician, with a single parameter (Milne et al., 2007).

Given the structural advantages of low-dimensional generated scales, it is useful to find examples of such scales that also contain a high proportion of tone-tuples whose pitches approximate privileged higher-dimensional intervals and chords (such as just intonation). A familiar example is the 12-TET chromatic scale generated by the 100 cent semitone, which contains twelve major and twelve minor triads tuned reasonably close to their just intonation versions. Another familiar example is the meantone tuning of the diatonic scale, which is generated by a period (octave) of approximately 1200 cents and a generator (fifth) of approximately 697 cents; this scale contains three major and three minor triads whose tuning is very close to just intonation (closer than the 12-TET scale). There are, however, numerous alternative, and less familiar, possibilities.

Footnote. Just intonation intervals and chords have low integer frequency ratios. They are typically thought to sound more consonant than the tunings used in 12-tone equal temperament.

Footnote. Well-formed scales (Carey and Clampitt, 1989), or MOS scales (Wilson, 1975), are special cases of 2-D scales, each of whose generic intervals (as measured by number of scale steps) comes in no more than two specific sizes (as measured by a log-frequency unit like cents). In order to construct an MOS scale given a specific period and generator, the generator must be iterated precisely a number of times that yields a scale satisfying these requirements. The familiar anhemitonic pentatonic and diatonic scales are MOS scales with a period of 1200 cents and a generator of approximately 700 cents (the generator is iterated four times for the pentatonic scale, and two additional times for the diatonic scale). Numerous unfamiliar possibilities become available with non-standard period and generator tunings (Erlich, 2006).

Figure 3.3. The cosine distance (on relative dyad expectation embeddings with a Gaussian smoothing kernel of 3 cents standard deviation) between a just intonation major triad (0, 386.3, 702) and all n-TETs from n = 2 to n = 102. (Horizontal axis: number of equal divisions of the octave; vertical axis: distance.)
Given a privileged pitch class collection embedded in an expectation tensor, it is easy to calculate its distance from a set of n-TETs up to any given value of n.

Example 3.6.2. 1-D approximations to 4:5:6 (JI major triad). The JI (just intonation) major triad contains all (and only) the common-practice harmonic consonances (i.e., the perfect fifth and fourth, and the major and minor thirds and sixths). It is, therefore, interesting to find tunings that produce simple scales containing lots of good approximations of these intervals. The just intonation major triad has pitches with frequency ratios of 4:5:6, hence it is three-dimensional (it is factorized by the three primes 2, 3, and 5). This means there is no equally tempered scale (which is one-dimensional, by definition) that can precisely match all of these just intonation intervals. However, certain equally tempered scales may provide reasonable approximations. Figure 3.3 shows the cosine distance between the relative dyad expectation tensor embeddings of the JI major triad and all n-TETs from n = 2 to 102.

Observe that the distances approach a flat line where increasing n is no longer beneficial, and that the most prominent minima fall at the familiar 12-TET and at other alternative n-TETs (e.g., 19-, 22-, 31-, 34-, 41-, 46-, and 53-TET) that are well-known in the microtonal literature.
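A comparison of this kind can be sketched in a few lines. The code below is a rough illustration, not the routines used for the figure, and rests on several assumptions: pitch classes are binned at 1-cent resolution ($J = 1200$), every tone has unit salience, smoothing is a circular Gaussian with a 3-cent standard deviation, and the cosine distance is one minus the cosine similarity. The relative dyad vector is computed with an FFT-based circular cross-correlation, subtracting each tone's pairing with itself; all function names are hypothetical.

```python
import numpy as np

J = 1200  # assumed pitch granularity: 1-cent bins over a 1200-cent octave


def response_matrix(pitches_cents, sigma=3.0):
    """One row per tone: a circular Gaussian centred on its pitch class."""
    centres = np.asarray(pitches_cents, dtype=float)[:, None] % J
    d = np.abs(np.arange(J)[None, :] - centres)
    d = np.minimum(d, J - d)  # circular distance in cents
    X = np.exp(-0.5 * (d / sigma) ** 2)
    return X / X.sum(axis=1, keepdims=True)


def relative_dyad_vector(X):
    """x_e[k] = sum over ordered pairs i1 != i2 of sum_j x[i1, j] x[i2, j + k]."""
    F = np.fft.rfft(X, axis=1)
    # all ordered pairs (|sum F|^2) minus each tone paired with itself (sum |F|^2)
    spectrum = np.abs(F.sum(axis=0)) ** 2 - (np.abs(F) ** 2).sum(axis=0)
    return np.fft.irfft(spectrum, n=J)


def cosine_distance(x, y):
    return 1.0 - float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


def ntet_distances(target_cents=(0.0, 386.3, 702.0), n_max=102, sigma=3.0):
    """Distance from the target collection to every n-TET, n = 2..n_max."""
    target = relative_dyad_vector(response_matrix(target_cents, sigma))
    result = {}
    for n in range(2, n_max + 1):
        scale = np.arange(n) * J / n
        result[n] = cosine_distance(target, relative_dyad_vector(response_matrix(scale, sigma)))
    return result


distances = ntet_distances()
best = sorted(distances, key=distances.get)[:8]
print(best)
```

Under these assumptions 12-TET scores a markedly lower distance than its neighbours 11- and 13-TET; the exact values depend on the assumed bin width and smoothing.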
A two-dimensional tuning has two generating intervals with sizes, in log(f), denoted $\alpha$ and $\beta$. All intervals in the tuning can be generated by $\alpha$ and $\beta$; that is, they have sizes $j\alpha + k\beta$ where $j, k \in \mathbb{Z}$. For example, all possible notes in the quarter-comma meantone tuning can be generated by an octave of 1200 cents and a generator of 696.58 cents (i.e., $\alpha = 1200$ and $\beta = 696.58$, so all intervals take the form $1200j + 696.58k$).

It is often convenient to create a scale from a β-chain, which comprises all pitches $j\alpha + k\beta$ where $k$ is restricted to a limited range of successive integers. For example, a 19-tone β-chain might consist of the notes $j\alpha - 9\beta, j\alpha - 8\beta, \ldots, j\alpha + 8\beta, j\alpha + 9\beta$. Given an arbitrary set of higher-dimensional privileged intervals with a period of repetition $\rho$ (typically 1200 cents), how can a β-chain with similarly sized intervals be found? In general, it is convenient to fix the tuning of $\alpha$ to $\rho/n$, for $n \in \mathbb{N}$, because this ensures the resulting generated scale repeats at the period whatever the value of $\beta$. So, once $\alpha$ is chosen, the procedure is to generate β-chains of a given cardinality and to iterate the size of $\beta$ over the desired range. At each iteration, the cosine distance to the set of privileged intervals is measured using the relative dyad expectation embeddings. This is illustrated in the following example.
Example 3.6.3. 2-D approximations to 4:5:6 (JI major triad): comparing smoothing widths. Given a 19-tone β-chain with $\alpha = 1200$, we may wish to find the tunings of $\beta$ that produce a large number of good approximations to the intervals found in a just intonation major triad. The just intonation major triad is three-dimensional (as discussed in the previous example), while the β-chain is, by definition, two-dimensional. This means the latter can only approximate the former, never precisely match it. But we can iterate through values of $\beta$ to find β-chains that contain large numbers of intervals that are close to the just intonation intervals.

Figure 3.4. The cosine distance between relative dyad embeddings of a just intonation major triad {0, 386.3, 702} and a 19-tone β-chain whose β-tuning ranges from 0 to 1199.9 cents. The smoothing is Gaussian with standard deviations of 6 cents (left side), and 3 cents (right side). The two zooms show the distance minima occurring at the meantone (504 and 696 cents) and helmholtz (498 and 702) tunings, and how their relative levels change as a function of smoothing width.

Figure 3.4 shows the distance between the relative dyad embeddings of a just intonation major triad and 19-tone β-tunings ranging over $0 \le \beta \le 1199.9$ cents in increments of 0.1 cents ($\alpha = 1200$ cents). When using a single smoothing width, this type of chart is perfectly symmetrical about the centre line passing through 0 and 600 cents. This is because a β-chain generated by $\beta = B$ cents is identical to that generated by $\beta = \alpha - B$ (Milne et al., 2008). This means we can utilize such a chart to compare two different smoothing widths. In Figure 3.4, the right-hand side shows the effect of using a Gaussian kernel with a standard deviation of 3 cents; the left-hand side has a standard deviation of 6 cents.
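The iteration over β can be sketched as below. This is a minimal illustration rather than the code behind the figure, under the same assumptions as one might make for any such embedding: 1-cent pitch class bins, unit saliences, circular Gaussian smoothing, and cosine distance taken as one minus cosine similarity. The function names and the default scan step of 1 cent are hypothetical choices; the text uses increments of 0.1 cents.

```python
import numpy as np

J = 1200  # assumed 1-cent bins; alpha is taken to be the 1200-cent octave


def response_matrix(pitches_cents, sigma):
    centres = np.asarray(pitches_cents, dtype=float)[:, None] % J
    d = np.abs(np.arange(J)[None, :] - centres)
    d = np.minimum(d, J - d)  # circular distance in cents
    X = np.exp(-0.5 * (d / sigma) ** 2)
    return X / X.sum(axis=1, keepdims=True)


def relative_dyad_vector(X):
    F = np.fft.rfft(X, axis=1)
    # cross-correlation over all ordered pairs, minus each tone with itself
    spectrum = np.abs(F.sum(axis=0)) ** 2 - (np.abs(F) ** 2).sum(axis=0)
    return np.fft.irfft(spectrum, n=J)


def cosine_distance(x, y):
    return 1.0 - float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


def beta_chain(beta, n_tones=19):
    """Pitch classes k * beta (mod 1200) for successive integers k centred on 0."""
    half = n_tones // 2
    return (np.arange(-half, n_tones - half) * beta) % J


def chain_distance(beta, target_cents=(0.0, 386.3, 702.0), n_tones=19, sigma=3.0):
    target = relative_dyad_vector(response_matrix(target_cents, sigma))
    chain = relative_dyad_vector(response_matrix(beta_chain(beta, n_tones), sigma))
    return cosine_distance(target, chain)


def scan(step=1.0, **kwargs):
    """Distance for each beta in [0, 1200) at the given step in cents."""
    betas = np.arange(0.0, J, step)
    return betas, np.array([chain_distance(b, **kwargs) for b in betas])


print(chain_distance(696.6), chain_distance(650.0))
print(chain_distance(504.0), chain_distance(696.0))  # mirror-image tunings
```

Calling `scan(sigma=3.0)` and `scan(sigma=6.0)` gives the two halves needed for a comparison of smoothing widths, since each curve is symmetrical about 600 cents.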
Observe the following distance minima at different β-tunings: o,. cents corresponds to the familiar meantone temperament, ,,., cents to the helmholtz temperament, ,,a., cents to the sensipent temperament, ,;. cents to the würschmidt temperament, ,;,., cents to the magic temperament, ,;. to the hanson temperament, a;.o cents to the orson temperament, ;o., cents to the tetracot temperament (the names for each of these temperaments have been taken from Erlich (2006)). It is interesting to note that the classic meantone tunings of approximately 504 or 696 cents are deemed closer than the helmholtz tunings of approximately 498 or 702 cents when the smoothing has a 6 cent standard deviation, and vice versa when the smoothing has a 3 cent standard deviation. In future experiments, the smoothing width could be used as a free parameter that is adjusted to best fit participants' ratings of how well various temperaments approximate just intonation.
Example 3.6.4. 2-D approximations to 4:5:6 (JI major triad): comparing dyad and triad embeddings. Figure 3.5 compares the distance between a just intonation major triad and seven-tone β-chains (e.g., the notes $j\alpha - 3\beta, j\alpha - 2\beta, \ldots, j\alpha + 2\beta, j\alpha + 3\beta$, with $\beta$ iterated over the sizes 0 to 1199.9 cents in increments of 0.1 cents) when embedded in relative dyad and relative triad expectation tensors. The left side shows triad embeddings, the right side shows dyad embeddings.

Observe that, for low cardinality generated scales (like this seven-tone scale) and a smoothing width of , cents, only a few tunings provide tone triples that are close to the just intonation major triad: the meantone generated scale ($\beta \approx 696$ cents) contains three major triads; the magic scale ($\beta \approx 820$ cents) contains two major triads; the porcupine scale ($\beta \approx 1{,}037$ cents) contains two major triads (but with less accurate tuning than the magic); the hanson scale ($\beta \approx 883$ cents) contains only one major triad (tuned extremely close to just intonation). As the cardinality of the β-chain is increased, the distances between the triadic embeddings approach those of the dyadic.
Figure 3.5. The cosine distance between relative dyad embeddings (right) and relative triad embeddings (left) of a just intonation major triad {0, 386.3, 702} and a 7-tone β-chain whose β-tuning ranges from 0 to 1199.9 cents. The smoothing is Gaussian with a standard deviation of , cents.

Example 3.6.5. 2-D approximations to 3:5:7 (7-limit Bohlen-Pierce triad). The above two examples have used familiar tonal structures (the octave of 1200 cents and the major triad), but the methods are equally applicable to any alternative structure. One such is the Bohlen-Pierce scale, which is intended for spectra containing only odd numbered harmonics. It has a period of 3/1 (the tritave), which is approximated by 1902 cents. The 3:5:7 triad, which is approximated by {0, 884.4, 1466.9} cents, is treated as a consonance. Figure 3.6 shows the distance of a β-chain of , notes with $0 \le \beta \le 1901.9$ cents with a Gaussian smoothing of , cents standard deviation. The closest tuning is found at ,,,. cents, which is almost equivalent to $3 \times 1902/13$ and so corresponds to the 13-equal divisions of the tritave tuning suggested by Bohlen and Pierce (Bohlen, 1978; Mathews et al., 1984).
Figure 3.6. The cosine distance (using a Gaussian smoothing kernel with a , cents standard deviation) between a just intonation Bohlen-Pierce "major" triad {0, 884.4, 1466.9}, with a period of 1902 cents, and a ,-tone β-chain whose β-tuning ranges from 0 to 1901.9 cents.

3.6.3 Musical Set Theory

In musical set theory, there is a rich heritage of measures used to model the perceived distances between pitch collections (e.g., Forte (1973), Castrén (1994), Buchler (2001), Kuusi (2001)). Expectation tensors can generalize traditional embeddings in a number of ways: (a) they model the inaccuracies of pitch perception; (b) they can embed pitch collections in any tuning (up to the pitch granularity determined by $J$); (c) they can meaningfully deal with pitch collections that contain duplicated pitches (such as when two voices play the same pitch); (d) they can be populated with pitches or pitch classes; (e) they can embed absolute or relative pitches or pitch classes; (f) they can generalize subset-class vectors, but utilize a principled indexing that does not rely upon Forte numbers (see footnote 20).

The relative dyad embedding is of the $T_n I$ type; that is, it is invariant with respect to transposition and inversion of the pitch collection it is derived from (see footnote 21). It is also invariant over Z-relations (Z-related collections, such as {0, 1, 4, 6} and {0, 1, 3, 7}, have the same interval content but are not related by transposition or inversion (Forte, 1973)). Relative triad (and higher-ad) embeddings are invariant only with respect to transposition; that is, they are of the $T_n$ type (e.g., the inversion of a major triad is a minor triad and, although these two chords contain the same intervals, they have different embeddings in a relative triad matrix). When used with pitch class vectors, absolute embeddings have only period (octave) invariance; when used with pitch vectors, they have no invariances.

Footnote 20. A set class is an equivalence class for pitch class sets that differ only by transposition. Forte numbers are the numerical labels used by Forte to index set classes of all possible cardinalities (under the assumption of 12-TET, this ranges from the empty set up to the set of cardinality 12, and there are a total of 352 different set classes) (Forte, 1973). Given a pitch class set, its subset-class vector of cardinality n (also termed an n-class vector) indicates the number of occurrences of each set class of cardinality n, indexed by their Forte number (Kuusi, 2001).

Footnote 21. In musical set theory, pitch class sets that are invariant with respect to transposition belong to the same $T_n$ class. For example, {0, 4, 7} and {1, 5, 8} are in the same $T_n$ class because the latter can be transposed down one semitone to make it equal to the former ({1 − 1, 5 − 1, 8 − 1} = {0, 4, 7}). Pitch class sets that are invariant with respect to both transposition and inversion belong to the same $T_n I$ class. For example, {0, 4, 7} and {1, 4, 8} are in the same $T_n I$ class because the latter can be converted into
Example 3.6.6. Distances between pc-sets related by Z-relation, inversion, and transposition. Table 3.2 shows the cosine distances between the absolute and relative dyad and triad embeddings of pitch class vector (0, 1, 4, 6), its Z-relation (0, 1, 3, 7), its inversion (0, 2, 5, 6), and its transposition (1, 2, 5, 7).

It is reasonable to think that perceptions of pc-set similarity may be determined by both their absolute and relative pitch structures. To model this, pc-set similarity can be modelled as a linear combination of the distances between absolute and relative embeddings of differing orders. For example, adding relative dyad and absolute monad distances gives a non-zero distance between pc-sets with differing interval content like (0, 1, 4, 5) and (0, 1, 4, 6), but also takes into account their absolute pitches, thus ensuring (0, 1, 4, 5) is closer to its transposition (4, 5, 8, 9) than it is to its transposition (2, 3, 6, 7) (e.g., adding the two distance functions, with no weighting, gives summed cosine distances of 0.533, 0.5 and 1, respectively).
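The three summed distances quoted above can be reproduced with a short sketch. It assumes unsmoothed embeddings with unit saliences, $J = 12$, and a cosine distance defined as one minus the cosine similarity; the function names are hypothetical.

```python
import numpy as np


def monad_vector(pcs, J=12):
    """Absolute monad embedding: count of tones at each pitch class."""
    v = np.zeros(J)
    for p in pcs:
        v[p % J] += 1.0
    return v


def relative_dyad_vector(pcs, J=12):
    """Count of ordered pairs of distinct tones at each interval size."""
    v = np.zeros(J)
    for a, pa in enumerate(pcs):
        for b, pb in enumerate(pcs):
            if a != b:
                v[(pb - pa) % J] += 1.0
    return v


def cosine_distance(x, y):
    return 1.0 - float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


def combined_distance(a, b):
    """Unweighted sum of absolute monad and relative dyad cosine distances."""
    return (cosine_distance(monad_vector(a), monad_vector(b))
            + cosine_distance(relative_dyad_vector(a), relative_dyad_vector(b)))


reference = (0, 1, 4, 5)
for other in [(0, 1, 4, 6), (4, 5, 8, 9), (2, 3, 6, 7)]:
    print(other, round(combined_distance(reference, other), 3))
```

The two transpositions have identical relative dyad vectors, so their distances from (0, 1, 4, 5) differ only through the number of shared pitch classes.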
3.7 DISCUSSION

In this chapter, I have presented a novel family of embeddings and metrics for modelling the perceived distance between pitch collections. The embeddings can be realized in a manner that conforms with established psychoacoustic data on pitch perception (through the use of Gaussian smoothing) and may be useful as components in broader models of the perception and cognition of music. Indeed, to model any specific aspect of musical perception, a variety of appropriate embeddings may be linearly combined, with their weightings, the weightings of the tone saliences (if appropriate), and the type of metric, as free parameters to be determined from empirical data.

Footnote 21 (continued). the former by inversion (which gives {−1, −4, −8}) and then transposition up eight semitones ({−1 + 8, −4 + 8, −8 + 8} = {0, 4, 7}).

Table 3.2. Cosine distances between a selection of pc-sets related by Z-relation, inversion, and transposition. The distances are calculated with four different types of embedding.

                              (0, 1, 4, 6)   Z-relation     inversion      transposition
                                             (0, 1, 3, 7)   (0, 2, 5, 6)   (1, 2, 5, 7)
Absolute dyad embeddings
  (0, 1, 4, 6)                0              .833           .833           1
  (0, 1, 3, 7)                .833           0              1              .833
  (0, 2, 5, 6)                .833           1              0              .833
  (1, 2, 5, 7)                1              .833           .833           0
Relative dyad embeddings
  (0, 1, 4, 6)                0              0              0              0
  (0, 1, 3, 7)                0              0              0              0
  (0, 2, 5, 6)                0              0              0              0
  (1, 2, 5, 7)                0              0              0              0
Absolute triad embeddings
  (0, 1, 4, 6)                0              1              1              1
  (0, 1, 3, 7)                1              0              1              1
  (0, 2, 5, 6)                1              1              0              1
  (1, 2, 5, 7)                1              1              1              0
Relative triad embeddings
  (0, 1, 4, 6)                0              1              1              0
  (0, 1, 3, 7)                1              0              .5             1
  (0, 2, 5, 6)                1              .5             0              1
  (1, 2, 5, 7)                0              1              1              0
The models demonstrated in this chapter differ from those of, for example, Krumhansl, Lerdahl, or those taking a neo-Riemannian or Tonnetz-based approach, because they are built from explicit psychoacoustic first principles (using Gaussian smoothing to model the frequency difference limen). Furthermore, unlike the traditional pitch embeddings used in set class theory, they are able to deal in a meaningful way with non-standard tunings and when more than one tone plays the same, or a very similar, pitch.

This chapter has focused on expectation tensors, but the underlying pitch class response matrices can also be used to generate salience (rather than expectation) tensors; these give the probability of perceiving any given R-ad of pitches (rather than the expected number of tone-tuples perceived at a given R-ad of pitches). Developing computational simplifications for higher-order salience tensors is work that remains to be done.

The embeddings and metrics described in this chapter are also applicable to other domains: a tone, as defined at the start of this chapter, can be thought of as a member of a class of discrete and linear stimuli. A stimulus is discrete when it can be combined with other such stimuli, yet still be individually perceived (e.g., many tones may be sounded together, but still be individually heard; even the separate spectral pitches of a harmonic complex tone may be consciously perceived); a stimulus is linear when it can be characterized by a scalar that is the variable in a linear psychophysical function (e.g., a tone can be characterized by its log-frequency, which is linearly related to its perceived pitch height). In this generalized context, a period indicates the size (in the units of the psychophysical function) at which perceptual equivalence may occur (e.g., pitches that are octaves apart). These generalized definitions indicate how the same methods may be applied to the perception of any other (even non-auditory) discrete stimuli that can be transformed, with a link function, to make the psychophysical function linear. An obvious example is the perception of timing in rhythms: the physical time of a percussive event is linearly related to the perceived time of the event, and a bar (or some multiple, or division, thereof) can be thought of as representing the period. In this context, the smoothing represents perceptual or cognitive inaccuracies in timing; for example, it might be possible to embed a rhythmic motif containing four events in a relative tetrad expectation matrix in the time domain, and compare this with a selection of other similarly embedded rhythm patterns to find one with the closest match (i.e., one that contains the greatest number of patterns that are similar to the complete motif).

In the subsequent two chapters, however, I focus my attention on the embedding of spectral pitches into absolute monad expectation tensors (spectral pitch vectors). In the next chapter, I use the cosine similarities of such vectors to model the affinity of microtonally pitched tones with non-harmonic spectra. In the chapter after that, I use these cosine similarities of spectral pitch vectors to model Krumhansl's probe tone data, and to predict the tonal functions of pitches and chords in a variety of scales.
4

A MODEL OF MELODIC AFFINITY

In the previous chapter, I introduced a psychoacoustically derived method for embedding spectral (or virtual) pitches into a spectral (or virtual) pitch vector (virtual pitches were defined in Sec. a.,). I suggested the cosine similarity of any two such vectors could be used to model the perceived affinity (the melodic analogue of consonance) of the tones or chords they embed. I also provided specific examples in Sec. ,.o.. The principal aims of the experiment and models described in this chapter are: (a) to test the spectral pitch similarity model against experimental data; (b) to test whether it is modelling an underlying psychoacoustic process, rather than a learned response; (c) to determine the strength of the psychoacoustic effect modelled by spectral (or virtual) pitch similarity.

In addition to these aims, I also utilize cross-correlation between the spectral pitch vector of a tone and the spectral pitch vector of a harmonic template (defined in Sec. ,..) to create a harmonicity model of the former's toneness (defined in Sec. ,..a). This is also tested as a model for affinity. Furthermore, I also embed virtual pitches, instead of spectral, to see if virtual pitch similarity can also provide an effective model of affinity.

As I demonstrate later, the experimental data collected for this chapter confirm that spectral pitch similarity and harmonicity are effective models of affinity, and combining them produces a model with a large effect size that is highly significant.
x
1 |x|/w.
,.a :rnou a
coordinates equivalent to that pitch. This can be represented by a zeroth-order Markov process, and hence by a stochastic matrix $P_{\mathrm{AbsGen}} \in [0, 1]^{N \times N}$, all of whose rows are identical.
The final stochastic matrix of pitch transitions $P_{\mathrm{Pitch}}$ is given by an appropriately normalized entrywise product of the above four stochastic matrices; that is,

$$P_{\mathrm{Pitch}} = D M, \quad \text{where} \quad M = P_{\mathrm{RelPitch}} \circ P_{\mathrm{AbsPitch}} \circ P_{\mathrm{RelGen}} \circ P_{\mathrm{AbsGen}}, \tag{4.1}$$

where $\circ$ denotes the Hadamard (entrywise) product, and $D \in \mathbb{R}^{N \times N}$ is a diagonal row-normalization matrix that ensures the elements in every row of $P_{\mathrm{Pitch}}$ sum to 1; that is,

$$D = \begin{pmatrix}
|m_1|^{-1} & 0 & 0 & \cdots & 0 \\
0 & |m_2|^{-1} & 0 & \cdots & 0 \\
0 & 0 & |m_3|^{-1} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & 0 \\
0 & 0 & 0 & 0 & |m_N|^{-1}
\end{pmatrix}, \tag{4.2}$$

where $|m_n|^{-1}$ is unity divided by the sum of the elements in the $n$th row of $M$.
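The normalized entrywise product can be sketched as follows. This is a minimal illustration with hypothetical names; the four input matrices here are random row-stochastic placeholders, not the triangular-distribution matrices described in the text, and every row of $M$ is assumed to have a non-zero sum.

```python
import numpy as np


def combine_transition_matrices(*matrices):
    """Return D @ M, where M is the Hadamard product of the given matrices."""
    M = np.ones_like(np.asarray(matrices[0], dtype=float))
    for P in matrices:
        M = M * np.asarray(P, dtype=float)  # entrywise (Hadamard) product
    D = np.diag(1.0 / M.sum(axis=1))        # reciprocal row sums on the diagonal
    return D @ M


rng = np.random.default_rng(0)
N = 5
# placeholder stochastic matrices: each row is a random probability vector
placeholders = [rng.dirichlet(np.ones(N), size=N) for _ in range(4)]
P_pitch = combine_transition_matrices(*placeholders)
print(P_pitch.sum(axis=1))
```

In practice the same result is obtained more cheaply by dividing each row of $M$ by its sum, without forming the diagonal matrix explicitly.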
The values, given above, for the means and spreads of the triangular distributions used to create the four stochastic matrices $P_{\mathrm{RelPitch}}$, $P_{\mathrm{AbsPitch}}$, $P_{\mathrm{RelGen}}$, and $P_{\mathrm{AbsGen}}$, were initially chosen by informed guesswork and then refined, by trial and error, to produce musically pleasing melodies. Furthermore, for each different melody the inter-onset-interval for eighth-notes was randomly chosen, with a uniform distribution, over the range o,,;o ms (o,, beats per minute), whose mean of ,,. ms (,, bpm) equates to a medium tempo; the articulation (ratio of note-length to inter-onset-interval) was randomly chosen from the range o.;a to o.,,, whose mean of o.o equates to the average articulation used by organists (Jerkert, 2004).

The timbre used was moderately bright and had a quick, but non-percussive sounding, attack and a full sustain level. With harmonic partials, it sounded somewhat like a brass or bowed-string instrument. It was created by using a spectrum with partial amplitudes of $1/i$ where $i$ is the number of the partial (if all partials had been in the same phase and tuned to a harmonic series this would give a sawtooth waveform). To slightly mellow the timbre, the tones were then passed through The Viking's low-pass filter set to give a small resonant peak. A small amount of delayed-onset vibrato was added to give the sound life, and a small amount of reverb/ambience to emulate the sound of a small recital room. The stimuli were listened to with closed-back circumaural headphones in a quiet room. The adapted version of The Viking used in the experiment, the Max/MSP patch that generated the random melodies, and audio files for a sample of the stimuli can be downloaded from http://www.dynamictonality.com/melodic_affinity_files/.
4.2.3 Procedure

Each participant listened to 60 different randomly generated melodies. Each melody was played in an n-TET randomly chosen from eleven possibilities: 3-TET, 4-TET, 5-TET, 7-TET, 10-TET, 11-TET, 12-TET, 13-TET, 15-TET, 16-TET, and 17-TET. For each melody, the participant could use a mouse or touchpad to switch (toggle) between two different timbres: one timbre was matched (its partials matched the underlying tuning); the other was unmatched (its partials were matched to a different n-TET randomly chosen from the same list). Each melody could be repeated, by the participant, as many times as wished; most trials were completed in a,o minutes. For each participant, no pair of underlying tuning and unmatched spectral tuning occurred more than once. For each pair of stimuli, the participant was asked to make a single choice of timbre for which all or most of the following criteria were best met for the different notes of the melody:

- they have the greatest affinity
- they fit together best
- they sound most in-tune with each other
- they sound the least surprising.

Without further experimental tests, it is impossible to say whether or not these four features are measuring the same latent concept of affinity for all participants. However, for this work, these four descriptions constitute my operationalization of affinity.

The following data were recorded for each melody:

- the tuning of the melody, and the tuning of the unmatched timbre's partials
- whether the matched timbre (coded with a 1) or unmatched timbre (coded with a 0) was chosen
- the tempo and articulation values.

For each participant, age, sex, musical taste, and musical experience or training were also collected. General comments were also asked for.
When stimuli are characterized by their underlying tuning (or,
equivalently, the tuning of the matched timbre's partials) and the
tuning of the unmatched timbre's partials (which must be different, by
definition), there are ${}^{11}P_2 = 11 \times 10 = 110$ different
possible stimuli. The 60 different stimuli listened to by each
participant were sampled randomly without replacement (uniform
distribution) from the 110. This means that, on average, each possible
pair of underlying tuning and unmatched tuning has been tested
$44 \times 60/110 = 24$ times, each underlying tuning (row of Fig. ,.o)
$44 \times 60/11 = 240$ times, and each unmatched spectral tuning
(column of Fig. ,.o) $44 \times 60/11 = 240$ times. In total there were
$44 \times 60 = 2640$ observations of 110 different stimuli.
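The counting argument above can be checked with a short sketch. The list of tunings is taken from the eleven n-TETs named in the procedure; the function name `sample_trials` and the use of Python's `random` module are illustrative assumptions, not a description of the Max/MSP patch actually used.

```python
from itertools import permutations
import random

# The eleven n-TETs used in the experiment.
TUNINGS = [3, 4, 5, 7, 10, 11, 12, 13, 15, 16, 17]

# A stimulus is an ordered pair (underlying tuning, unmatched spectral tuning);
# permutations() guarantees the two tunings differ.
stimuli = list(permutations(TUNINGS, 2))

def sample_trials(n_trials=60, seed=None):
    """Hypothetical helper: one participant's stimuli, drawn uniformly
    without replacement from the full set of possible stimuli."""
    rng = random.Random(seed)
    return rng.sample(stimuli, n_trials)

n_participants = 44
total_observations = n_participants * 60
mean_tests_per_stimulus = total_observations / len(stimuli)

print(len(stimuli), total_observations, mean_tests_per_stimulus)
```

Because sampling is without replacement, no participant can meet the same pair of underlying and unmatched tunings twice, which is the constraint stated in the procedure.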
4.3 Results
The experimental data, aggregated over all participants, are summarized
in Figure ,.o. The squares represent the 110 different pairs of
stimuli the participants were presented with, the shade of each square
[Figure: grid of shaded squares, some marked with one to three stars.
Vertical axis: melody and matched timbre tuning (n-TET): 3, 4, 5, 7, 10,
11, 12, 13, 15, 16, 17, All. Horizontal axis: unmatched timbre tuning
(n-TET): 3, 4, 5, 7, 10, 11, 12, 13, 15, 16, 17, All.]
Figure ,.o. Results aggregated over all participants. The squares
represent the 110 different pairs of stimuli the participants were
presented with; the shading of each square indicates the ratio of
occasions when the matched, rather than unmatched, timbre was chosen
(white for all matched, black for all unmatched). The rows (labeled by
n-TET) represent the different underlying tunings (or, equivalently,
the matched timbre's spectral tuning); the columns represent the
different unmatched timbres' spectral tunings. The bottom row and
rightmost column show ratios aggregated over underlying tunings and
unmatched timbres, respectively. The bottom-right square is the ratio
aggregated over all tunings and unmatched timbres. Black stars indicate
significantly more than half of the choices were for matched timbres;
white stars indicate significantly more were for unmatched timbres,
using a two-tailed exact binomial test D.D.,.
As described in Section ,.., the spectral pitch vectors for the timbre
and the harmonic template are cross-correlated (non-circular
cross-correlation is used) to produce a virtual pitch vector that gives
the unnormalized correlation between the two vectors over differing
offsets between their lowest partials D.,. This serves as a model for
virtual pitch weights at log-frequencies relative to the timbre's
lowest partial; this is illustrated in Figure ,.,, which shows the
virtual pitch vector for a complex harmonic tone with $\sigma = 10.3$
and $\rho = 0.42$.
For all underlying tunings and timbres (matched or unmatched), the
virtual pitch similarities between all possible intervals are
calculated by taking the cosine similarities of their virtual pitch
vectors and, as described above, their expected values for all pairs of
matched and unmatched timbres are calculated D.D.,.
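The pipeline described above (smoothed spectrum, non-circular cross-correlation with a harmonic template, cosine similarity between transposed virtual pitch vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 1-cent grid, the use of twelve partials, the salience law `i ** (-rho)`, and all function names are assumptions made for the example.

```python
import numpy as np

def spectral_pitch_vector(n_partials=12, rho=0.42, sigma=10.3, length=6000):
    """Smoothed spectrum of a harmonic complex tone on a 1-cent grid.

    Assumptions: partial i lies 1200*log2(i) cents above the lowest partial,
    has salience i**(-rho), and is smeared by a Gaussian of s.d. sigma cents.
    """
    vec = np.zeros(length)
    for i in range(1, n_partials + 1):
        vec[int(round(1200 * np.log2(i)))] += i ** (-rho)
    half = int(np.ceil(4 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    return np.convolve(vec, kernel / kernel.sum(), mode="same")

def virtual_pitch_vector(timbre, template):
    """Non-circular cross-correlation of a timbre with a harmonic template."""
    return np.correlate(timbre, template, mode="full")

def transpose(vec, cents):
    """Shift a pitch vector upwards by a whole number of cents (zero-filled)."""
    out = np.zeros_like(vec)
    out[cents:] = vec[:len(vec) - cents]
    return out

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

harmonic = spectral_pitch_vector()
vp = virtual_pitch_vector(harmonic, harmonic)

# Virtual pitch similarity of two tones a perfect fifth and a tritone apart.
sim_fifth = cosine_similarity(vp, transpose(vp, 702))
sim_tritone = cosine_similarity(vp, transpose(vp, 600))
print(sim_fifth, sim_tritone)
```

Transposing a tone by some number of cents simply shifts its virtual pitch vector by the same amount, which is why one vector and a shifted copy suffice to compare the two tones of an interval.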
The value of $f_V(m, n_1, n_2; \rho, \sigma)$ is given by subtracting
the expected virtual pitch similarity of consecutive tones with timbre
$n_2$ from the expected virtual pitch similarity of consecutive tones
with timbre $n_1$ D... It contains just two free parameters: spectral
roll-off $\rho \in (-\infty, \infty)$, and smoothing width
$\sigma \in [0, \infty)$. This submodel is used, in part, to model the
probability of choosing a matched timbre over an unmatched timbre. For
concision, this submodel may be denoted by the abbreviated form $f_V$,
and the vector of its values over the 110 tested stimuli is denoted
$\mathbf{f}_V$.
4.3.2.3 Harmonicity comparison submodel
The toneness and vertical familiarity of a timbre are both modelled by
its harmonicity. The harmonicity submodel
$f_H(n_1, n_2; \rho, \sigma)$ calculates the harmonicity of the matched
timbre minus the harmonicity of the unmatched timbre. It is not
affected by the underlying tuning $m$.
Harmonicity is here calculated as the maximum value in the
cross-correlation of the timbre and a harmonic template; that is, it is
the maximum value found in the virtual pitch vector D... This means that
the more similar the pattern of the timbre's spectral contents is to
the harmonic template, the greater its harmonicity.
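As a rough illustration of harmonicity as the peak of a cross-correlation, the sketch below compares timbres whose partials have been tempered to different n-TETs. Several details are assumptions made only for this example: partials are tempered by rounding each harmonic to the nearest n-TET step, twelve partials are used, saliences follow `i ** (-rho)`, and the function names are hypothetical.

```python
import numpy as np

def pitch_vector(partial_cents, rho, sigma, length=4600):
    """Smoothed spectrum on a 1-cent grid; partial i has salience i**(-rho)."""
    vec = np.zeros(length)
    for i, cents in enumerate(partial_cents, start=1):
        vec[int(round(cents))] += i ** (-rho)
    half = int(np.ceil(4 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    return np.convolve(vec, kernel / kernel.sum(), mode="same")

def harmonic_partials(n_partials=12):
    return [1200 * np.log2(i) for i in range(1, n_partials + 1)]

def tempered_partials(n_tet, n_partials=12):
    """Assumed tempering rule: move each harmonic to the nearest n-TET step."""
    step = 1200 / n_tet
    return [step * round(c / step) for c in harmonic_partials(n_partials)]

def harmonicity(partial_cents, rho=0.42, sigma=10.28):
    """Maximum of the non-circular cross-correlation with a harmonic template."""
    timbre = pitch_vector(partial_cents, rho, sigma)
    template = pitch_vector(harmonic_partials(), rho, sigma)
    return float(np.correlate(timbre, template, mode="full").max())

def f_H(n1, n2, rho=0.42, sigma=10.28):
    """Harmonicity of the timbre tempered to n1-TET minus that tempered to n2-TET."""
    return (harmonicity(tempered_partials(n1), rho, sigma)
            - harmonicity(tempered_partials(n2), rho, sigma))

print(f_H(12, 3))
```

The underlying tuning of the melody does not appear anywhere in `f_H`, which mirrors the statement that this submodel depends only on the two spectral tunings.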
The value of $f_H(n_1, n_2; \rho, \sigma)$ is given by subtracting the
harmonicity of timbre $n_2$ from the harmonicity of timbre $n_1$ D.a. As
with the other submodels, it contains just two free parameters:
spectral roll-off $\rho \in (-\infty, \infty)$, and smoothing width
$\sigma \in [0, \infty)$. They are, however, not necessarily expected to
be identical to the related parameters in the spectral pitch similarity
model, because it is possible that non-identical processes are
occurring. In the former, I am modelling a comparison between two
events held in short-term memory; in the latter, one of the items (the
harmonic template) does not reside in short-term memory: it either
resides in long-term memory or is embodied in some innate cognitive
process. We would, however, expect the parameters to be identical to
those used in the virtual pitch similarity model, because they both
involve comparison with a harmonic template. This submodel is used, in
part, to model the probability of choosing a matched timbre over an
unmatched timbre. For concision, this submodel may be denoted by the
abbreviated form $f_H$, and the vector of its values over the 110 tested
stimuli is denoted $\mathbf{f}_H$.
4.3.2.4 Three candidate models of participants' responses
The spectral and virtual pitch similarity submodels are for mental
processes triggered by successive tones (horizontal musical features);
the harmonicity model is for mental processes triggered by individual
tones (vertical musical features). I test three different models that
combine one of the two horizontal submodels with the vertical submodel
(two of the resulting models contain the same two submodels but utilize
a different parameterization). In addition to the smoothing width and
roll-off parameters required by each submodel, each candidate model
requires two additional parameters, $\beta_1$ and $\beta_2$, to set the
relative weights of its two submodels.
The first candidate model utilizes the two submodels spectral pitch
similarity $f_S(m, n_1, n_2; \rho_S, \sigma_S)$ and harmonicity
$f_H(n_1, n_2; \rho_H, \sigma_H)$. This model has a total of six
parameters: the submodel weights $\beta_1$ and $\beta_2$, and the
independent smoothing widths and roll-offs for both submodels (as shown
after the semicolons).
The second candidate model utilizes the same two submodels, but uses
identical smoothing widths and roll-offs for both submodels: spectral
pitch similarity $f_S(m, n_1, n_2; \rho, \sigma)$ and harmonicity
$f_H(n_1, n_2; \rho, \sigma)$. This model, therefore, has a total of
four parameters: the submodel weights $\beta_1$ and $\beta_2$, and the
smoothing width and roll-off parameters used for both submodels. It is,
therefore, a more parsimonious version of the first model and reflects
the possibility that spectral pitch and harmonicity do indeed derive
from closely related perceptual processes (see the last paragraph in
Section 4.3.2.3).
The third candidate model utilizes the two submodels virtual pitch
similarity $f_V(m, n_1, n_2; \rho, \sigma)$ and harmonicity
$f_H(n_1, n_2; \rho, \sigma)$. This model, therefore, has a total of
four parameters: the submodel weights $\beta_1$ and $\beta_2$, and the
smoothing width and roll-off parameters used for both submodels.
A model containing both $f_S(m, n_1, n_2; \rho, \sigma)$ and
$f_V(m, n_1, n_2; \rho, \sigma)$ (i.e., spectral and virtual
similarities) was not tested because these two submodels are too highly
correlated: $f_S$ (as optimized in the first model) and $f_V$ (as
optimized in the third model) have a correlation over the stimuli of
$r(108) = .95$.
The method by which the two submodels are combined is determined by the
data: the collected data are the numbers of matched (rather than
unmatched) timbres chosen, and the total number of trials, for each of
the 110 different pairs of stimuli (observations). As such, the data
for each stimulus are presumed to be random observations from a
binomial distribution and are, therefore, most appropriately modelled
with a logistic regression upon the submodels ($f_S$ and $f_H$ in the
first and second models, $f_V$ and $f_H$ in the third model).
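To make the binomial set-up concrete, here is a minimal sketch of a logistic regression on two submodel outputs without a constant term, fitted by Newton's method. It only covers the weights $\beta_1$ and $\beta_2$; in the thesis the roll-off and smoothing parameters inside the submodels are optimized as well, which this sketch does not attempt. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_binomial_logit(X, successes, trials, n_iter=50):
    """Maximum-likelihood weights for binomial counts, no constant term.

    X: one row per stimulus, one column per submodel value.
    successes: number of times the matched timbre was chosen.
    trials: number of times each stimulus was presented.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (successes - trials * p)
        hessian = X.T @ (X * (trials * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Synthetic, noise-free illustration (not the experimental data).
rng = np.random.default_rng(0)
X = rng.normal(scale=0.5, size=(110, 2))      # stand-ins for f_S and f_H
true_beta = np.array([1.0, 2.0])
trials = np.full(110, 24.0)
successes = trials / (1.0 + np.exp(-X @ true_beta))

beta_hat = fit_binomial_logit(X, successes, trials)
print(beta_hat)
```

The matrix called `hessian` here is the negative Hessian of the log-likelihood; inverting it at the optimum gives the usual approximate covariance of the weights, which is one common way to obtain standard errors.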
duces a significantly higher score than the null model, while model 2
(which uses both of them) has a significantly higher score than either
of the submodels alone. The effect sizes of the spectral pitch distance
and harmonicity models are similar, but their effects are different and
hence complementary. This is confirmed by the significantly higher
score achieved by model 2, which contains both spectral pitch
similarity and harmonicity.
For model 2, the optimized parameter values and their standard errors,
and statistical tests for the whole model, are summarized in
Table ,.o. The standard errors were calculated from a numerically
estimated Hessian matrix of the optimized parameter values. The
Wilcoxon signed-rank test for the whole model, as detailed above, is
highly significant, $z = 9.10$, $p < .001$, and a Hosmer-Lemeshow test
indicates the model's predictions are not significantly different to
the observed data, $\chi^2(8, N = 2638) = 4.60$, $p = .800$. With the
optimized values for $\rho$ and $\sigma$, the two predictors $f_S$ and
$f_H$ have low correlation over all stimuli, $r(108) = .09$,
$p = .334$, so there are no concerns with multicollinearity.
Figure ,., is a scatter plot, for all 110 stimuli, of the observed
numbers of matched timbre choices against the predicted number of
matched timbre choices.

Table ,.o. Statistical analysis and evaluations of the model and its
parameters (the logistic part of the model does not include a constant
term). Standard errors were derived from a numerical estimation of the
Hessian matrix; the z-score and p-value were calculated from a
signed-rank test on the cross-validation, as described in the main
text.

Parameter                                  Value    SE      95% CI
$\sigma$   smoothing                       10.28    o.;o    .,a to .o
$\rho$     roll-off                        0.42     o.      o.a to o.o,
$\beta_1$  spectral similarity weight      ,.       o.      a. to ,.
$\beta_2$  harmonicity weight              ,,.oo    o.,,    .,; to ;o.o,

Overall model
Wilcoxon signed-rank: $z = 9.10$, $p < .001$
Hosmer-Lemeshow: $\chi^2(8, N = 2638) = 4.60$, $p = .800$

[Figure: scatter plot. Horizontal axis: observed numbers of matched
timbres, 0 to 30. Vertical axis: predicted numbers of matched timbres,
0 to 30.]
Figure ,.,. For all 110 observations, this scatter plot compares the
observed numbers of matched timbres chosen by participants with those
predicted by model 2.
For those more accustomed to linear regression statistics, I provide
correlation, standardized coefficient, and $R^2$ values from a linear
regression of the log-odds (logits) of choosing a matched timbre. The
values of $\rho$ and $\sigma$ were fixed to the optimized values 0.42
and 10.28 obtained for model 2, and the following linear model (with a
constant term) was optimized in the standard way with ordinary least
squares:

$$\log\left(\frac{P(Y = 1 \mid m, n_1, n_2)}{1 - P(Y = 1 \mid m, n_1, n_2)}\right) = \beta_0 + \beta_1 f_S(m, n_1, n_2; 0.42, 10.28) + \beta_2 f_H(n_1, n_2; 0.42, 10.28).$$
Because the values of $\rho = 0.42$ and $\sigma = 10.28$ were
originally optimized in model 2, and $f_S$ and $f_H$ are nonlinear with
respect to them, it would be misleading to provide standard
F-statistics and F-tests (the degrees of freedom cannot be
established), so these are not reported. The optimized model has
$R^2 = .59$; that is, 59% of the variance in the log-odds of choosing a
matched timbre is accounted for by this linear version of the model.
The standardized coefficient values are ., for spectral similarity
$f_S$ and .;o for harmonicity $f_H$ (the unstandardized coefficients
are ,.,, ,;.,;, and .o for the intercept term, similar to those of the
logistic model). The correlation between the log-odds of choosing a
matched timbre and $f_S$ over all stimuli is $r(108) = .32$,
$p < .001$; the correlation between the log-odds and $f_H$ is
$r(108) = .66$, $p < .001$. Following Cohen's familiar guidelines on
categorizing effect sizes, spectral similarities have a medium effect
size, while harmonicity and the complete model have large effect sizes.
The optimized parameter values for the smoothing width and spectral
roll-off (10.28 cents and 0.42, respectively) are reassuringly
plausible. Under experimental conditions, the frequency difference
limen (just noticeable difference) corresponds to approximately , cents,
which would be modelled by a smoothing width of , cents (as explained
in App. A). In an experiment like this, in which the stimuli are more
explicitly musical, we would expect the standard deviation to be
somewhat wider than this, and the value of approximately 10 cents seems
eminently reasonable.
The roll-off values are also highly plausible. The optimized roll-off
in saliences (0.42) approximately corresponds to the loudnesses of the
partials in the stimuli's timbres.[19] This value emphasizes the
importance of the lower partials, and suggests that correspondences
between the tempered third partial of the timbre and the tempered
perfect fifths and fourths in the melody are important (when using
matched timbres, tempered perfect fifths and fourths have frequency
ratios of tempered 3/2 and tempered 4/3, respectively).
The data generated by the second model are shown in Figure ,.oa, and
can be usefully compared with the observed data shown in Figure ,.o.
The individual contributions of the spectral pitch similarity and
harmonicity predictors are shown in Figures ,.ob and ,.oc (in both
cases the parameter values are identical to those used in the full
model).
4.4 Discussion
The experimental data strongly support the hypotheses that melodic
affinity is increased when the tunings of scale degrees and partials
are matched, and when the tones have close-to-harmonic partials. I have
also shown how a combination of spectral pitch similarity and
harmonicity can model, in a more precise way, the relationship between
spectral tuning, melodic tuning, and perceived affinity. Importantly,
the experimental procedure allows us to eliminate the confounding
top-down influence of horizontal familiarity (that part of affinity
that is a function of each interval's prevalence). In the absence of
this confound, we can see that spectral pitch similarity has a
medium-sized effect on perceived affinity. As explained in
Section ,..a, the experiment cannot determine whether the impact of
harmonicity is because it models our familiarity with harmonic complex
tones, or whether it is modelling an innate process; either way, its
effect size is strong.
It is interesting to explore the implications of these results in two
different areas: the use of microtonally tempered spectra for
microtonal scales, and the relationship between scales and tunings that
provide good-fitting melodies when using tones produced by conventional
wind and string instruments and the human voice (the majority of which
have close-to-harmonic partials).

[19] The timbres had partials with amplitudes of approximately
$i^{-1}$, where $i$ is the partial number. According to Stevens' power
law, perceived loudness corresponds, approximately, to amplitude
(pressure) to the power of 0.6; hence the loudness of each partial is
approximately $i^{-0.6}$, which would be equivalent to a $\rho$ value
of 0.6.

[Figure: three grids of shaded squares, panels (a) to (c). Vertical
axis of each panel: melody and matched timbre tuning (n-TET): 3, 4, 5,
7, 10, 11, 12, 13, 15, 16, 17, All. Horizontal axis of each panel:
unmatched timbre tuning (n-TET): 3, 4, 5, 7, 10, 11, 12, 13, 15, 16, 17,
All.]
Figure ,.o. The modelled data. (a) Data simulated by model 2: logistic
regression on spectral pitch similarity $f_S(m, n)$ and harmonicity
$f_H(m, n)$. (b) Data simulated by a logistic regression on spectral
pitch similarity $f_S(m, n)$. (c) Data simulated by a logistic
regression on harmonicity $f_H(m, n)$.
4.4.1 Matching Timbre and Tuning
In the Dynamic Tonality section of Sethares et al. (2009), a procedure
is given to retune partials to match a wide variety of scale-tunings
(scales generated by two intervals, a period and a generator, both of
which can take any size). The procedure was aesthetically motivated on
the grounds that it can reduce the sensory dissonance of prevalent
intervals and chords in the underlying scale. The experimental data
presented here suggest that the matching of partials to low-n n-TETs can
also make microtonal melodies more in-tune and fitting. Indeed, it was
my practical experience with Dynamic Tonality synthesizers (noticing,
for example, how much more in-tune -TET melodies sound when the
spectral tuning is matched) that motivated this experiment in the first
place.[20]
Having said that, it is also clear that timbres with partials close in
frequency to the familiar harmonic template were typically preferred by
participants. This means that, in matching partials to a low-n n-TET,
one is trading the increased consonance and affinity of intervals for
possibly dissonant timbres.
4.4.2 Scales for Tones with Harmonic Spectra
The majority of pitched Western instruments have timbres whose partials
are tuned to a harmonic series (e.g., bowed string, wind instruments,
and the voice), or close to such a spectrum (e.g., plucked and hammered
string instruments). Figure ,. shows the spectral pitch similarity of
pairs of tones with harmonic spectra separated by an interval whose
size is shown on the horizontal axis. Each tick corresponds to one
12-TET semitone, and a total range of just over one octave is covered.

[20] There are, currently, three Dynamic Tonality synthesizers which
allow for a variety of tunings and for the spectral tuning to be
matched: TransFormSynth (an analysis-resynthesis synthesizer), The
Viking (an additive-subtractive synthesizer), and 2032 (a modal
physical modelling synthesizer). They are freeware, and can be
downloaded from http://www.dynamictonality.com.

[Figure: line chart. Horizontal axis: melodic interval size
(semitones), 0 to 14. Vertical axis: 0 to 1.]
Figure ,.. The spectral pitch similarity of harmonic complex tones with
differing interval sizes. This chart is calculated with the parameter
values for spectral similarity as optimized in Section ,.,.a.,
(smoothing of 10.3 cents, and roll-off of 0.42). The graph bears a clear
resemblance to the sensory dissonance charts of, for example, Plomp and
Levelt (1965) and Sethares (2005), with maxima of modelled affinity at
simple frequency ratios like 2/1, 3/2, 4/3, and so forth.
Clearly, the intervals with the highest spectral pitch similarity
(other than the unison) are the octave, the perfect fifth, and the
perfect fourth. There is significant empirical evidence that the octave
is universally recognized as an interval with extremely high affinity
(Woolhouse (2009) cites numerous examples). The high spectral pitch
similarity of perfect fifths and perfect fourths tallies nicely with
historical evidence. For example, ancient Greek scales were typically
based on conjunct and disjunct tetrachords. The two outer tones of a
tetrachord span a perfect fourth of frequency ratio 4/3 and, within
this perfect fourth, lie two additional tones that could take on a wide
variety of different tunings. The outer fourth was, however, always
fixed. When a second tetrachord is placed a whole-tone above the top
note of the first tetrachord (i.e., a perfect fifth above the bottom
note), the entire octave is spanned to make a seven-tone scale. If the
two tetrachords have identical internal structure, the resulting scale
is rich in high spectral pitch similarity perfect fourths and perfect
fifths. This technique of scale construction might, therefore, be seen
as a heuristic for creating high-affinity scales. Indeed, the bounding
fourths potentially provide perceptually secure start and end points
for a melody that traverses the more challenging tones in between. For
an in-depth examination of the history, and mathematical, perceptual,
and aesthetic properties of tetrachords, see Chalmers (,,o), and for a
discussion of the affinity (CDC-1) of the perfect fourth and fifth see
Tenney (1988).
The diatonic and pentatonic scales, which are so ubiquitous in Western
music, are the richest in terms of perfect fifths and fourths. This is
because they are actually generated by a continuous chain of either of
these intervals: there is no five-tone scale with more perfect fifths
and fourths than the anhemitonic pentatonic, and no seven-tone scale
with more perfect fifths and fourths than the diatonic. Such scales,
therefore, maximize the number of the highest affinity non-octave
intervals. Hypotheses that account for the development of such scales
by the consonance of perfect fifths and fourths suffer from the
uncomfortable historical fact that these scales were privileged well
before harmony became commonplace in the West. Rephrasing this in terms
of melodic affinity, rather than harmonic consonance, avoids this
problem. It is also worth noting that Huron's (1994) analysis of the
aggregate dyadic consonance of all possible scales selected from 12-TET
indicates that the most familiar Western scales (the diatonic,
ascending melodic minor, and harmonic minor) have the highest aggregate
consonance for 7-tone scales, the Japanese Ritsu and blues scale are
highly ranked for 6-tone scales, and the anhemitonic pentatonic is the
highest ranked 5-tone scale. This is relevant because affinity is
correlated with consonance (see Figure ,.), so similar results may hold
if aggregated dyadic spectral pitch similarity were used instead of
aggregated dyadic consonance. In this case, Huron's conclusions could
be comfortably extended to scales developed for melodic music.
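The claim that no five-tone or seven-tone subset of 12-TET contains more perfect fifths and fourths than the anhemitonic pentatonic and the diatonic can be verified by brute force. The sketch below is an illustrative check rather than part of the thesis; the function names are hypothetical, and a fifth is taken to be 7 semitones (its inversion, the fourth, is the same pair of pitch classes).

```python
from itertools import combinations

def count_fifths(scale, n=12, fifth=7):
    """Number of pitch-class pairs in the scale a perfect fifth/fourth apart."""
    pcs = set(scale)
    return sum(1 for pc in pcs if (pc + fifth) % n in pcs)

def best_scales(size, n=12):
    """All subsets of the given size with the maximal number of fifths."""
    scored = [(count_fifths(s), s) for s in combinations(range(n), size)]
    top = max(score for score, _ in scored)
    return top, [s for score, s in scored if score == top]

DIATONIC = (0, 2, 4, 5, 7, 9, 11)     # C major as pitch classes
PENTATONIC = (0, 2, 4, 7, 9)          # anhemitonic pentatonic on C

top7, winners7 = best_scales(7)
top5, winners5 = best_scales(5)
print(top7, len(winners7), top5, len(winners5))
```

Each winning set is a transposition of the diatonic or pentatonic scale, which follows from the fact that fifths arrange the twelve pitch classes in a single cycle: a subset has the most fifths exactly when it forms an unbroken segment of that cycle.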
It would seem, therefore, that both historical and contemporary musical
practice support the notion that spectral pitch similarity is a
universal component of melodic affinity, and that this phenomenon has
contributed to the types of scales that are, and have been, privileged
in melodic music.
4.5 Conclusion
The experimental data give strong support for the hypotheses that the
perceived affinity of successive tones is a monotonic function of both
their spectral pitch similarity and their inherent harmonicity. The
effect size of spectral pitch similarity is medium; the effect size of
harmonicity is strong. The experimental intervention also suggests that
spectral pitch similarity is modelling a nature process: one that does
not require the learning of musical regularities.
In the next chapter, I build on this result and apply a related model
to the relationships between tones and chords, and between chords and
scales. In this way, I use spectral pitch similarity to model
Krumhansl's probe tone data, and the stability and tonic-ness of chords
in a variety of scales. I also present some hypotheses about how
spectral pitch similarity may explain some additional aspects of
tonality.
5
A MODEL OF THE PROBE TONE DATA AND SCALIC TONALITY
The Krumhansl and Kessler (1982) probe tone data are participants'
ratings of the fits of the twelve chromatic pitch classes to a
previously established tonal context. They are widely considered to be
one of the most important sets of experimentally obtained data about
tonal perception. This is because, given a key, they can summarize the
stability or tonic-ness of pitch classes, and they have a high
correlation with the prevalences of pitch classes in Western tonal
music. They have also proved effective at modelling perceived inter-key
distance, and predicting the key of music as it plays. Furthermore,
they are thought to be robust because similar results have been
obtained in numerous related experiments.
Clearly, any proposed model of tonal perception should be able to
effectively model these data. In light of the previously demonstrated
success of the bottom-up spectral pitch similarity model at explaining
the perceived affinity of microtonal pitches, it makes sense to test a
related model for the probe tone data.
In this chapter, I test three spectral pitch similarity models of the
probe tone data (they differ only in their parameterizations). I also
compare my models against a number of previously suggested models,
some of which are bottom-up, some of which are top-down. My own
models have amongst the highest fit with the data and, being bottom-up,
have wide explanatory scope.
Furthermore, I extend the model beyond the probe tone data and use it
as a novel method to successfully predict the tonics of the diatonic
scale and a variety of chromatic alterations of the diatonic scale
(e.g., the harmonic minor and harmonic major scales). The model is also
generalizable to any possible tuning, and I demonstrate this by
predicting possible tonics for a selection of microtonal scales.

[Figure: two panels, (a) Major context and (b) Minor context.
Horizontal axis: pitch class (relative to tonal context) of probe tone,
0 to 11. Vertical axis: mean rating of probe tone's fit with the
previously established major (or minor) context, 1 to 7.]
Figure 5.1. Krumhansl and Kessler's major and minor tonal hierarchies.
The probe tone data are commonly thought to represent tonal
hierarchies: learned templates that assign fit levels to pitch classes
once a specific major or minor key has been established (a deeper
discussion is provided later in this introduction and in Sec. 5.2).
However, because my model provides a bottom-up explanation for the
probe tone data, it does not require learned templates. Such templates
may exist, but my model is agnostic on this point. In reality, it is
likely that a variety of long-term memory templates do play a
supportive role and, in Section 5.2, I discuss the interplay between
long-term memory (top-down) processes and bottom-up processes.
For the probe tone experiment, ten participants rated the degree of fit
on a seven-point scale with 1 designated 'fits poorly' and 7 designated
'fits well' (Krumhansl and Kessler, 1982). These well-known results are
illustrated in Figure 5.1.
The major or minor tonal context was established by playing one of
four musical elements: just the tonic triad I, the cadence IV-V-I, the
cadence II-V-I, or the cadence VI-V-I. For example, to establish the
key of C major, the chord progressions Cmaj, Fmaj-Gmaj-Cmaj,
Dmin-Gmaj-Cmaj, and Amin-Gmaj-Cmaj were used; to establish the key of
C minor, the chord progressions Cmin, Fmin-Gmaj-Cmin, Ddim-Gmaj-Cmin,
and A♭maj-Gmaj-Cmin were used. A cadence is defined by Krumhansl and
Kessler (1982, p. 352) as 'a strong key-defining sequence of chords
that most frequently contains the V and I chords of the new key'; the
above three cadences are amongst the most common in Western music. Each
element, and its twelve probes, was listened to four times by each
participant. For each context, the ratings of fit were highly
correlated over its four different elements (mean correlations for the
different elements were $r(10) = .90$ in major and $r(10) = .91$ in
minor), so the ratings were averaged to produce the results shown in
Figure 5.1. All listeners had a minimum of five years' formal
instruction on an instrument or voice, but did not have extensive
training in music theory.
All context elements and probes were played with octave complex tones
(also known as OCTs or Shepard tones). Such tones contain partials that
are separated only by octaves (i.e., they contain only $2^{n-1}$th
harmonics, where $n \in \mathbb{N}$), and the centrally pitched partials
have a greater amplitude than the outer partials; precise
specifications are given in Krumhansl and Kessler (1982). Octave
complex tones have a clear pitch chroma but an unclear pitch height; in
other words, although they have an obvious pitch, it is not clear in
which octave this pitch lies. The stated purpose of using OCTs was to
'minimize the effect of pitch height differences between the context
and probe tones, which strongly affected the responses of the least
musically oriented listeners in [an] earlier study' (Krumhansl, 1990,
p. ao).
However, OCTs are unnatural acoustical events: no conventional musical
instrument produces such spectra, and they have to be artificially
synthesized. Musical instruments typically produce harmonic complex
tones (HCTs) in which most harmonics are present, and such timbres
contain a greater multiplicity of interval sizes between the harmonics
(e.g., frequency ratios such as 3/2, 4/3, 5/3, and 5/4, in addition to
the 2/1 octaves found in OCTs). Krumhansl and Kessler (1982, p. ,,)
describe the OCT timbre as 'an organlike sound, without any clearly
defined lowest or highest component frequencies.'
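The difference in interval content between the two kinds of tone can be illustrated by listing the frequency ratios between pairs of partials. This is a small sketch for illustration only: the numbers of partials (five for the octave complex tone, six for the harmonic complex tone) and the function names are arbitrary assumptions, and amplitudes are ignored.

```python
from fractions import Fraction
from itertools import combinations

def octave_complex_harmonics(n_partials):
    """Harmonic numbers 2**(n-1) for n = 1, 2, ..., n_partials."""
    return [2 ** (n - 1) for n in range(1, n_partials + 1)]

def harmonic_complex_harmonics(n_partials):
    """Harmonic numbers 1, 2, ..., n_partials."""
    return list(range(1, n_partials + 1))

def interval_ratios(harmonics):
    """All frequency ratios (higher/lower) between pairs of partials."""
    return {Fraction(hi, lo) for lo, hi in combinations(sorted(harmonics), 2)}

oct_ratios = interval_ratios(octave_complex_harmonics(5))
hct_ratios = interval_ratios(harmonic_complex_harmonics(6))

print(sorted(oct_ratios))
print(sorted(hct_ratios))
```

Every ratio in the octave complex tone is a whole number of octaves, whereas the first six harmonics alone already contain fifths, fourths, thirds, and sixths.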
The use of OCTs, rather than HCTs, may affect the resulting ratings of
fit; that is, if HCTs had been used instead, it is possible the results
may have been, to some extent, different, even after taking account of
pitch height effects (e.g., Parncutt (2011, p. ,,,) points out that the
experimental data obtained by Budrys and Ambrazevicius (2008) indicate
that HCTs may reverse the fits of the minor third and perfect fifth,
pitch classes 3 and 7, in the minor context). Despite this, the high
correlation between the obtained ratings and the prevalences of scale
degrees in samples of tonal music (e.g., $r(10) = .89$ for major and
$r(10) = .86$ for minor; Krumhansl, 1990) suggests that any such
distortions are relatively small.
The probe tone data are thought to be important and robust. However, it
should be borne in mind that only ten participants were involved, and
they were all musically trained. The extent to which such results can
generalize must, therefore, be open to some question. Parncutt (2011)
cites a number of examples of similar experiments that have produced
similar, but non-identical, results (Krumhansl and Shepard, 1979;
Thompson, 1986; Steinke et al., ,,,, ,,,; Budrys and Ambrazevicius,
2008). Considering the probe tone data may be biased due to the small
number of participants, and affected by the precise context-setting
elements used, it is important that any model is judged not only on its
goodness of fit, but also on the clarity and plausibility of its
hypotheses, the extent to which it can explain the data, and its
ability to make predictions beyond this single experiment.
The data are important because they can be generalized to predict
aspects of music that were not explicitly tested in the experiment.
Notably, the degree of fit can be used to model the stability or
tonic-ness of the pitches and chords found in major-minor tonality, as
originally suggested by Krumhansl (1990, pp. o 8 ), and reiterated by
Parncutt (2011, p. ,,,). Also, the data have been used to model
perceived inter-key distances (Krumhansl and Kessler, 1982), and to
predict the key, dynamically, of music as it plays (Krumhansl, 1990;
Toiviainen and Krumhansl, 2003). However, there is no obvious way to
use these data to account for certain important aspects of tonality:
(a) Why is the primary major scale the diatonic, while the primary
minor scale is the non-diatonic harmonic minor scale
$(\hat{\mathbf{y}} - \bar{\mathbf{y}})'(\hat{\mathbf{y}} - \bar{\mathbf{y}}) / (\mathbf{y} - \bar{\mathbf{y}})'(\mathbf{y} - \bar{\mathbf{y}})$,
where $'$ is the transpose operator, which turns a column vector into a
row vector, $\mathbf{y}$ is a column vector of the empirical data,
$\bar{\mathbf{y}}$ is a column vector all of whose entries are the mean
of the empirical data and, critically, $\hat{\mathbf{y}}$ is a column
vector of the model's predictions after having been fitted by simple
linear regression.
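The ratio defined in this footnote can be computed directly. The following is a minimal sketch assuming NumPy; the function name `variance_explained` and the two short arrays are hypothetical illustrations, not the probe tone data or any model output from the thesis.

```python
import numpy as np

def variance_explained(y, model_output):
    """Ratio (y_hat - y_bar)'(y_hat - y_bar) / (y - y_bar)'(y - y_bar),
    where y_hat is the model output after simple linear regression onto y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(model_output, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # simple linear regression
    y_hat = intercept + slope * x
    y_bar = np.full_like(y, y.mean())
    return float((y_hat - y_bar) @ (y_hat - y_bar)
                 / ((y - y_bar) @ (y - y_bar)))

# Hypothetical ratings and raw model values, for illustration only.
y = np.array([6.0, 2.0, 3.5, 2.5, 4.5, 4.0, 2.5, 5.0, 2.5, 3.5, 2.0, 3.0])
x = np.array([0.9, 0.2, 0.4, 0.3, 0.6, 0.6, 0.2, 0.8, 0.3, 0.5, 0.3, 0.4])

print(variance_explained(y, x))
```

For a simple linear regression this ratio coincides with the squared Pearson correlation between the raw model values and the data, which is why fitting the model's output by regression first matters: without that step the ratio depends on the arbitrary scale and offset of the model's output.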
[Figure: two panels, (a) Major context and (b) Minor context.
Horizontal axis: pitch class (relative to tonal context) of probe tone,
0 to 11. Vertical axis: modelled fits of probe tones with the major (or
minor) tonic triad, 1 to 7.]
Figure 5.2. The circles show the probe tone data, the upwards pointing
triangles show the data as modelled by model a, the rightwards pointing
triangles show the data as modelled by model b, and the downwards
pointing triangles show the data as modelled by model c.
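The optimized parameter values all seem plausible: for model a, the spectral roll-off is 0.52 and the smoothing width is 5.71; for model b, they are 0.77 and 6.99, with the third parameter at 0.63; for model c, they are 0.67 and 5.95, with the third parameter at 0.50.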
Furthermore, I demonstrate in Section .,.a that my model produces similar results for the diatonic scale tuned to 12-TET and to quarter-comma meantone. In this latter section, I also explore some interesting tonal effects produced when the tuning, and resulting scales, are radically different to those obtained in 12-TET.
.,.. Major (Guidonian) hexachord.
This six-tone scale formed the basis of much medieval music theory and pedagogy (Berger, 1987). It is equivalent to a diatonic scale with the fourth or seventh scale degree missing. For instance, the C hexachord contains the pitches C, D, E, F, G, A. There is no B or B♭ to fill the gap between A and C. In modal music, the note used to fill the gap was either a hard B (a B♮) or a soft B (a B♭).² The choice of hard or soft was not notated but was made by performers to avoid simultaneous or melodic tritones; this practice is called musica ficta (Berger, 1987).
¹ Although basic tonal effects may be invariant, equal temperaments do provide certain advantages because they facilitate unlimited modulation, and enharmonic substitution and modulation. In that sense, they open up greater compositional resources.
² The shapes of the natural and flat symbols derive from two different ways of writing the letter b.
Figure .,. Modelled pitch class and chord fit with the Guidonian hexachord. (a) Modelled fit of major hexachord pitches: spectral pitch similarity of all pitch classes and hexachord pitch classes, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of major hexachord chords: spectral pitch similarity of hexachord triads and pitches, plotted against triad root (major are dark, minor are light).
In Figure .,, I will assume that pitch class 0 corresponds to C. Figure .,a shows that the pitch classes E and F (4 and 5), which are a semitone apart, are the least well-fitting of the hexachord tones. In Gregorian chant, the mode's final pitch was D, E, F, or G (corresponding to the modes protus, deuterus, tritus, and tetrardus). Of these modes, Figure .,a shows that the pitch classes with the highest fit are at D and G (2 and 7), which suggests these two modes have the most stable final pitches. This tallies with statistical surveys, referenced in Parncutt (2011), which indicate these two modes are the most prevalent. The relative fits of D and G are even higher when the hexachord has a Pythagorean tuning in which all its fifths have the frequency ratio 3/2; such tunings were prevalent prior to the fifteenth century (Lindley, ao,).
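As a worked illustration of the Pythagorean tuning mentioned above, the following sketch builds the C hexachord from a chain of pure 3/2 fifths (F, C, G, D, A, E) and reports each pitch in cents above C. The function name and the choice of C as the 0-cent reference are assumptions made for this example only.

```python
import math

# Size of a pure fifth (frequency ratio 3/2) in cents, roughly 701.955
FIFTH_CENTS = 1200 * math.log2(3 / 2)


def pythagorean_hexachord_cents():
    """Return the C hexachord in Pythagorean tuning as {note name: cents above C}."""
    # The six hexachord tones form a single chain of fifths; C is position 0.
    chain = ["F", "C", "G", "D", "A", "E"]
    cents = {}
    for position, name in enumerate(chain, start=-1):
        # Stack fifths and fold the result back into one octave
        cents[name] = (position * FIFTH_CENTS) % 1200
    # Order the notes by pitch rather than by position in the chain
    return dict(sorted(cents.items(), key=lambda item: item[1]))


for name, value in pythagorean_hexachord_cents().items():
    print(f"{name}: {value:.1f} cents")
```

In this tuning the whole tones are about 203.9 cents and the single semitone (E to F) is about 90.2 cents, so the hexachord differs audibly from its 12-TET counterpart while keeping every fifth pure.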
When we look at the modelled fit of each of the hexachord's major and minor triads with all the pitches in the hexachord, the results are quite different (Figure .,b). Here, every major or minor chord has identical fit with this scale. It is as if the Guidonian hexachord, when used for major/minor triad harmony, has no identifiable best-fitting tonic chord. As shown in the next example, all of this changes when that missing seventh degree is specified, thereby producing a specific diatonic scale.
.,..a Diatonic major scale.
The diatonic scale, regardless of its mode, has numerous properties that make it perceptually and musically useful. A number of those properties follow from its well-formedness (Wilson, 1975; Carey and Clampitt, 1989), such as Myhill's property, maximal evenness, uniqueness, coherence, and transpositional simplicity (space precludes explaining these properties here; they are summarized in Prechtl et al., 2012). Furthermore, it contains numerous consonant intervals (approximations of low integer frequency ratios), and supports a major or minor triad on all but one of its scale degrees. For tonal-harmonic music, the major scale (e.g., C, D, E, F, G, A, B) is the most important and prevalent mode of the diatonic scale. The only other mode that comes close is the Aeolian (e.g., A, B, C, D, E, F, G, or C, D, E♭, F, G, A♭, B♭), also known as the natural minor scale, which is one of the three scale forms associated with the minor scale (the other two are the harmonic minor, in which the Aeolian's seventh degree is sharpened, and the ascending melodic minor, in which the sixth and seventh degrees are sharpened).
Figure .. Modelled pitch class and chord fit with the major scale. (a) Modelled fit of major scale pitches: spectral pitch similarity of all pitch classes and diatonic major scale pitch classes, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of major scale chords: spectral pitch similarity of diatonic major scale triads and pitches, plotted against triad root (major are dark, minor are light).
The addition of a seventh tone to the hexachord, thereby making a diatonic scale, makes the fits of its triads more heterogeneous. Figure .b illustrates this with the diatonic major scale: note how the Ionian and Aeolian tonics (roots on pitch classes 0 and 9, respectively) are modelled as having greater fit than all the remaining triads. This, correctly, suggests they are the most appropriate tonics of the diatonic scale (the major scale's tonic and the natural minor scale's tonic, respectively). This is also reflected in the common usage of the submediant chord (vi) as a substitute for the tonic (I) in deceptive cadences (Piston and Devoto, 1987, p. ,,; Macpherson, 1920, p. oo), and the frequent modulation of minor keys to their relative major (Piston and Devoto, 1987, p. o). It is also interesting to observe that the fourth and seventh degrees of the major scale have lower fit than the remaining tones. This possibly explains why these two scale degrees function as leading tones in tonal-harmonic music: scale degree
7 resolving to 1, and 4 resolving to
1, a resolution to the tonic's root, over IV–I or ii–I which contain the scale degrees 1 and 3.
However, this suggests that iii–I would also provide an effective cadence because it too has the worst-fitting
7
in iii (due to it being the chord's fifth) makes it less active, while the
lower fit of
I cadences.
These additional hypotheses (the importance of semitone resolutions from poor-fit tones to roots of good-fit triads, and the decreased fit of pitches that are the thirds of chords) seem promising; in future work, I hope to precisely specify these effects, and use them to model responses to a variety of chord progressions and scalic contexts.
.,.., Harmonic minor scale.
An important aspect of the minor tonality is that the harmonic minor scale is favoured over the diatonic natural minor scale, particularly in common practice cadences where the harmonic minor V–i is nearly always used in preference to the natural minor v–i (Piston and Devoto, 1987, p. ,,). The harmonic minor scale is equivalent to the Aeolian mode with a sharpened seventh degree. This change has an important effect on the balance of chordal fits, and goes some way to explaining why this scale forms the basis of minor tonality in Western music.
Figure .. Modelled pitch class and chord fit with the harmonic minor scale. (a) Modelled fit of harmonic minor scale pitches: spectral pitch similarity of all pitch classes and harmonic minor pitch classes, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of harmonic minor scale chords: spectral pitch similarity of harmonic minor triads and pitches, plotted against triad root (major are dark, minor are light).
Figure .a shows that 6 and 1, 5, and
Figure 5.11. Modelled pitch class and chord fit with the 1/4-comma meantone diatonic major scale. (b) Modelled fit of 1/4-comma diatonic major scale chords.
are robust over such changes in the underlying tuning of the diatonic
scale.
.,.a.a 22-TET 1L, 6s porcupine scale.
In the following three examples, I look at different well-formed scales that are subsets of 22-tone equal temperament. The names of these temperaments (porcupine, srutal, and magic) are commonly used in the microtonal community, and are explained in greater detail in Erlich (2006) and the website http://xenharmonic.wikispaces.com/. In all of these scales, the tunings (rounded to the nearest cent) of the major triads are (0, 382, 709), and the tunings of the minor triads are (0, 327, 709). These tunings are, by most standard metrics, closer to the just intonation major and minor triads than those in 12-TET. For each scale, the spectral pitch class similarities suggest one or more triads that will function as tonics. I do not, at this stage, present any empirical data to substantiate or contradict these claims, but I suggest that collecting such empirical data (tonal responses to microtonal scales) will be a useful method for testing bottom-up models of tonality. Audio examples of the scales, their chords, and some of the cadences described below, can be downloaded from www.dynamictonality.com/probe_tone_files/.
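The triad tunings quoted above can be reproduced with a short calculation. The sketch below assumes, for illustration, that the 22-TET major triad spans 0, 7, and 13 steps and the minor triad 0, 6, and 13 steps, and it uses the just intonation ratios 5/4, 6/5, and 3/2 as reference points. All names are hypothetical, and the comparison covers only intervals measured from the root, so it is one possible metric rather than a definitive measure of closeness.

```python
import math


def edo_cents(steps, n_edo):
    """Size in cents of `steps` steps of an n_edo-tone equal temperament."""
    return steps * 1200 / n_edo


def ratio_cents(numerator, denominator):
    """Size in cents of a just frequency ratio."""
    return 1200 * math.log2(numerator / denominator)


# Assumed step counts for the 22-TET triads (root, third, fifth)
MAJOR_22 = [round(edo_cents(s, 22)) for s in (0, 7, 13)]
MINOR_22 = [round(edo_cents(s, 22)) for s in (0, 6, 13)]

# Deviation from just intonation, in cents, of selected intervals
ERRORS = {
    "22-TET major third": edo_cents(7, 22) - ratio_cents(5, 4),
    "12-TET major third": edo_cents(4, 12) - ratio_cents(5, 4),
    "22-TET minor third": edo_cents(6, 22) - ratio_cents(6, 5),
    "12-TET minor third": edo_cents(3, 12) - ratio_cents(6, 5),
    "22-TET fifth": edo_cents(13, 22) - ratio_cents(3, 2),
    "12-TET fifth": edo_cents(7, 12) - ratio_cents(3, 2),
}

print(MAJOR_22, MINOR_22)
for label, error in ERRORS.items():
    print(f"{label}: {error:+.1f} cents")
```

The thirds come out closer to just intonation in 22-TET, while the fifth is closer in 12-TET, which is why the overall verdict depends on the metric chosen.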
The porcupine scale has seven tones and is well-formed: it contains one large step of size 218 cents and six small steps of size 164 cents (hence its signature 1L, 6s), and the scale pitch classes are indicated with dark bars in Figure 5.12a. Figure 5.12b shows that the major triad on and the minor triad on , are modelled as the best-fitting. This suggests that, within the constraints of this scale, they may function as tonics. The worst-fitting pitch classes are o and a, which can both lead to the root of the minor triad on ,. Neither of these potential leading tones is the third of any triad in this scale, which possibly reduces their effectiveness when using triadic harmony. However, the above suggests the most effective cadences in this scale will be the minor chord on a leading to the minor chord on ,, the major chord on (whose fifth is pitch class o) leading to the minor chord on ,, or a variety of seventh chords containing both o and a (like the dominant seventh built on whose third is o and seventh is a) also leading to the minor chord on ,. Using Roman numerals, taken relative to the minor tonic on pitch class ,, these are iii, IIIi, and III7i, respectively.
Figure 5.12. Modelled pitch class and chord fit with the porcupine 1L, 6s scale. (a) Modelled fit of porcupine 1L, 6s scale pitches: spectral pitch similarity of the 22 pitch classes and the 1L, 6s well-formed scale pitches, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of porcupine 1L, 6s scale chords: spectral pitch similarity of the 1L, 6s scale's triads and pitches, plotted against triad root (major are dark, minor are light).
.,.a., 22-TET 2L, 8s srutal scale.
This ten-tone microtonal scale, first suggested by Erlich (1998), is unusual in that it repeats at the half-octave (it is well-formed within this half-octave interval). This repetition accounts for why the fit levels shown in Figure 5.13 also repeat at each half-octave. It contains two large steps of size 164 cents, and eight small steps of size 109 cents. The scale pitches are indicated with dark bars in Figure 5.13a. The modelled fits suggest there are two possible major triad tonics (on pitch classes 4 and 15) and two possible minor tonics (on pitch classes 2 and 13). The roots of both the minor chords can be approached by a poorer-fitting leading tone (pitch classes 0 and 11) than can the major (pitch classes 2, 6, 13, and 17). This suggests effective cadences can be formed with the major chord on 15 (whose third is pitch class 0) proceeding to the minor chord on 2 (or their analogous progressions a half-octave higher), or a variety of seventh chords such as the dominant seventh on 4 (whose seventh is pitch class 0). Using Roman numerals relative to the minor tonic on 2 or 13, these are VII–i and II7–i, respectively. These cadences can be thought of as slightly different tunings of the familiar 12-TET progressions V–i and ♭II7–i.
Figure 5.13. Modelled pitch class and chord fit with the srutal 2L, 8s scale. (a) Modelled fit of srutal 2L, 8s scale pitches: spectral pitch similarity of the 22 pitch classes and the 2L, 8s well-formed scale pitches, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of srutal 2L, 8s scale chords: spectral pitch similarity of the 2L, 8s scale's triads and pitches, plotted against triad root (major are dark, minor are light).
.,.a., 22-TET 3L, 7s magic scale.
This microtonal scale also has ten tones, and is well-formed with respect to the octave (so no repetition at the half-octave); it has three large steps of size 273 cents and seven small steps of size 55 cents. As before, the dark bars in Figure 5.14a indicate the scale pitches. In this scale, every degree that is a root of a major triad is also a root of a minor triad and vice versa. For this reason, in Figure 5.14b, only the better fitting (major or minor) is shown on the chart; for the pitch class 9, however, the major and minor triad have equal fit, so this should be borne in mind.
The modelled fits, in Figure 5.14b, suggest two possible major tonics with roots on pitch classes a and , and two possible minor tonics on pitch classes , and o. Figure 5.14a shows that, in terms of fit, pitch class ; looks like a promising leading tone to the root of the minor triad on o. However, this pitch class is not the third of any triad in the scale. The other leading tone contenders are on and , and both of these can be triad thirds. This implies the major chord on a, and the major or minor chord on 9, may function as tonics in this scale. This suggests effective cadences can be formed with the major chord on o (whose third is pitch class ) proceeding to the major triad on pitch class a, or the major chord on pitch class (whose third is pitch class ) proceeding to the major or minor triad on pitch class 9. In Roman numeral notation, relative to their respective tonics, these are VIII, VIII, and VIIi. Interestingly, in all these examples the cadences are (in terms of 12-TET) similar to a major chord, whose root is pitched in-between V and VI, proceeding to I or i (the distance between these roots is 764 cents).
Figure 5.14. Modelled pitch class and chord fit with the magic 3L, 7s scale. (a) Modelled fit of magic 3L, 7s scale pitches: spectral pitch similarity of the 22 pitch classes and the 3L, 7s well-formed scale pitches, plotted against pitch class (scale pitches are dark, nonscale pitches are light). (b) Modelled fit of magic 3L, 7s scale chords: spectral pitch similarity of the 3L, 7s scale's triads and pitches, plotted against triad root (major are dark, minor are light, but the triad on 9 can be either).
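The step sizes quoted for the three 22-TET scales follow from their signatures. The sketch below assumes that the large and small steps span 4 and 3 degrees of 22-TET for porcupine, 3 and 2 for srutal, and 5 and 1 for magic; these step counts are inferred from the signatures and cent values rather than stated in the text, and the names used are hypothetical.

```python
N_EDO = 22

# name: (number of large steps, large step in 22-TET degrees,
#        number of small steps, small step in 22-TET degrees)
SCALES = {
    "porcupine 1L, 6s": (1, 4, 6, 3),
    "srutal 2L, 8s": (2, 3, 8, 2),
    "magic 3L, 7s": (3, 5, 7, 1),
}


def degrees_to_cents(degrees, n_edo=N_EDO):
    """Convert a number of equal-temperament degrees to cents."""
    return degrees * 1200 / n_edo


def step_sizes(name):
    """Return (large, small) step sizes in rounded cents for a named scale."""
    n_large, large, n_small, small = SCALES[name]
    # The steps of one period of the scale must add up to a full octave
    if n_large * large + n_small * small != N_EDO:
        raise ValueError(f"steps of {name} do not sum to {N_EDO} degrees")
    return round(degrees_to_cents(large)), round(degrees_to_cents(small))


for scale_name in SCALES:
    print(scale_name, step_sizes(scale_name))

# Root distance of the magic-scale cadences, assuming it spans 14 degrees
print(round(degrees_to_cents(14)))
```

Under these assumptions the root movement of about 764 cents lies between a 12-TET fifth (700 cents) and major sixth (900 cents), consistent with the description of a root pitched in-between V and VI.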
., CONCLUSION
In this chapter, I have shown that there are at least two types of plausible bottom-up model (Parncutt's virtual pitch class commonality models, and my spectral pitch class similarity models) that can explain why the probe tone data take the form they do. I argue that bottom-up explanations, such as these, are able to account not just for the existence of fit profiles (as provided by top-down models), but also for the specific form they take. In light of both theories' ability to explain and predict the data, I suggest that there is now little reason to believe the probe tone data are a function purely of top-down processes. I cannot, on the basis of the probe tone data, determine whether the primary mechanism is spectral pitch class or virtual pitch class similarity. To distinguish between these effects would require novel experiments.
I have also used my model in the reverse direction: to predict candidate tonic triads for a number of scales that are subsets of the full twelve chromatic pitch classes. The results accord well with music theory. Furthermore, I have also suggested some additional mechanisms that may account for strong cadences (a poor-fitting tone moving to the root of a best-fitting triad) and how this, in turn, may cause the diatonic scale to become more oriented to its major (Ionian) tonic rather than its minor (Aeolian) tonic. I also suggest a possible reason for why the seventh degree loses much of its activity (need to resolve) when it is the fifth of the mediant (iii) chord. And, in combination, these mechanisms support the use of V–I as a cadential chord progression. These latter hypotheses are somewhat speculative because they have not been included in a formal mathematical model, but I feel they are promising ideas that warrant further investigation.
I have also claimed my model can challenge the notion that there is a tonal hierarchy, which is an unchanging (or slowly evolving) template against which recently heard pitches are compared. Rather, in my template-independent theory, I suggest that any given musical context automatically generates a corresponding profile of fits. For certain commonly-used scales, such profiles may become embedded as tonal hierarchies (templates), as might the tonal implications of common melodic lines and harmonic progressions. But crucially, under my theory, the templates are not the initial cause of tonal functionality; rather, they are an effect of the more basic and universal psychoacoustic process of spectral (or virtual) pitch similarity. This implies that our cognition of tonality, and the types of tonal musics we create, are (to some extent) constrained and shaped, in a nontrivial and predictable way, by our perceptual and cognitive apparatus.
Finally, I have pointed to the way in which microtonal scales can also be analysed with this technique, and how this may become an important means to explore our general perception of tonality, and to test models thereof. Ideally, any model that purports to explain, from the bottom up, how Western tonality works should also be able to make useful predictions for the possibly different tonalities evoked by completely different scales and tunings.
6
CONCLUSION
In this dissertation, my aim has been to identify and model the innate processes by which feelings of tension, resolution, stability, and so forth, are induced by successions of pitches and chords, irrespective of their harmonic consonance.
I have chosen to focus on innate mental processes: those aspects of music perception that are not due to long-term familiarity with a specific corpus of music. This is because only such processes can explain (specify the causal origins of) associations between acoustical events and mental phenomena.
In order to do this, I have postulated that, given a context of pitches, such as a scale, those chords that are the most consonant, have the greatest affinity, and are the most familiar will tend to be heard as the most stable and tonic-like. Familiarity is, by definition, a top-down process, but both consonance and affinity have plausible bottom-up psychoacoustic models. I have focused my attention on bottom-up models of affinity, because it is clear that consonance can provide only a partial answer (e.g., in the context of a C major scale, the root-position triads Cmaj and Gmaj have identical consonance but differing levels of stability).
When an instrument plays a single notated pitch, it actually produces a multiplicity of spectral (and possibly virtual) pitches. Following Terhardt and Parncutt, I have hypothesized that the affinity of any two tones or chords is due, in part, to the similarity of their spectral or virtual pitches. In order to effectively model such similarities, I have developed a novel family of representations of pitches called expectation tensors.
In Chapter 3, I demonstrated how expectation tensors can model the uncertainties of pitch perception by smearing each pitch over a range of possible values, and the width of the smearing can be related to experimentally determined frequency difference limens. The tensors can embed either absolute or relative pitches: in the latter case, embeddings of pitch collections that differ only by transposition have zero distance, a useful feature that relates similarity to structure. Furthermore, tensors of any order (dimensionality) can be formed, allowing the embeddings to reflect the (absolute or relative) monad (pitch), dyad (interval), triad, tetrad, and so forth, content of the pitch collection. The distance between expectation tensors of the same order can be determined with any standard metric or similarity measure (such as $L_p$ or cosine).
I also demonstrated how absolute monad expectation tensors can be used to embed the spectral pitches (or pitch classes) of tones or chords, and how the cosine similarity of such vectors can be used to model the affinity of the tones or chords they embed. In Chapter 4, I described an experiment that eliminates the confounding top-down influence of horizontal familiarity (that part of affinity that is a function of each interval's prevalence). In the absence of this confound, the data indicate that spectral pitch similarity is an effective bottom-up model of affinity. In other words, there is a psychoacoustic component to listeners' perception of the extent to which tones with different pitches fit together: the greater the similarity of their spectra, the greater their affinity. The data also show that a spectral pitch based model of harmonicity (toneness) is also correlated with perceived affinity but, in this case, this may be modelling either an innate or a learned process.
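The idea of comparing smeared spectral pitch class vectors by cosine similarity can be sketched as follows. This is a simplified illustration under stated assumptions, not the model as implemented for the dissertation: the 1-cent resolution, the use of twelve harmonics, the 1/n^rho weighting, the truncated Gaussian kernel, and the default values rho = 0.7 and sigma = 6.0 are all choices made for this example, and every function name is hypothetical.

```python
import math


def spectral_pitch_class_vector(pitches_cents, n_harmonics=12, rho=0.7, sigma=6.0):
    """Smeared spectral pitch class vector (1200 bins of 1 cent) for a set of tones.

    Each tone contributes its first n_harmonics partials; partial n is weighted
    by n ** -rho (roll-off) and smeared with a Gaussian of width sigma cents.
    """
    vector = [0.0] * 1200
    reach = math.ceil(4 * sigma)  # truncate the Gaussian kernel at +/- 4 sigma
    kernel = [(d, math.exp(-(d * d) / (2 * sigma * sigma)))
              for d in range(-reach, reach + 1)]
    for pitch in pitches_cents:
        for n in range(1, n_harmonics + 1):
            weight = n ** -rho
            # Pitch class of the n-th harmonic, folded into one octave
            centre = round(pitch + 1200 * math.log2(n)) % 1200
            for offset, value in kernel:
                vector[(centre + offset) % 1200] += weight * value
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def modelled_fit(context_cents, probe_cents, **params):
    """Similarity between a context (e.g., a triad) and a single probe tone."""
    context = spectral_pitch_class_vector(context_cents, **params)
    probe = spectral_pitch_class_vector([probe_cents], **params)
    return cosine_similarity(context, probe)


c_major_triad = [0, 400, 700]  # 12-TET pitch classes in cents
for probe in (0, 600, 700):
    print(probe, round(modelled_fit(c_major_triad, probe), 3))
```

With these settings, a probe on the triad's fifth scores higher than one a tritone from its root, which is the kind of ordering the similarity measure is meant to capture; the exact values depend on the assumed parameters.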
I additionally showed how these results indicate that certain common scales (such as those based on tetrachords or generated by perfect fifths, like the pentatonic and diatonic) maximize the affinity of the harmonic complex tones produced by the human voice and most pitched musical instruments. Such scales may, therefore, be a natural consequence of our perceptual apparatus, irrespective of culture. I also suggest that synthetic sounds with timbres matched to the underlying tuning can be used not just to maximize consonance, but also to maximize melodic affinity for microtonal scales.
In Chapter 5, I showed that spectral pitch class similarity can also model Krumhansl's probe tone data with considerable accuracy. In light of the ability of this model, and Parncutt's virtual pitch model, to both explain and predict the data, I suggest there is now little support for the widely held belief that tonal perceptions are due only to top-down processes. In other words, I have provided compelling evidence that bottom-up processes play an important role in the perception of tonal fits. This implies there is no requirement to interpret the probe tone data as representing a long-term memory template (a tonal hierarchy). I do not argue such templates do not exist; it seems highly plausible that learned processes do play a meaningful role. But, importantly, bottom-up explanations mean the initial causal impetus for their forms comes from psychoacoustically based processes. In other words, our cognition of tonality, and the types of tonal musics we create, are constrained and shaped, in a nontrivial and predictable way, by our perceptual and cognitive apparatus.
I also used the spectral pitch class similarity model to predict, for a variety of scales, which triads are likely to function as tonics, and which pitches are likely to function as tense leading tones. The model's predictions concur with music theory. I additionally suggested three mechanisms that may account for why: the diatonic scale is biased towards its major (Ionian) mode rather than its natural minor (Aeolian) mode; the seventh degree of the major scale loses much of its tension when it is the fifth of the iii chord; and the V–I progression is so important in cadences. These mechanisms are that (a) given a scale, the strongest sense of harmonic resolution is induced when a bad-fitting tone moves by a small interval (e.g., a semitone) to the root of a best-fitting chord, (b) there is a feedback mechanism whereby the increased salience of a tonic degree affects the instability of its leading tones, (c) the tension of pitches is a function of their affinity with their local harmonic context (chord degree) in addition to their scalic context (scale degree).
At this stage, these latter hypotheses are somewhat speculative because they have not been included in a formal mathematical model. Finally, I have pointed to the way in which microtonal scales can also be analysed with this same technique.
In summary, I have developed a novel set of techniques to model the perceived similarity of pitch collections, and I have used them to build bottom-up models for certain important aspects of tonal cognition (affinity and tonal stability). Experimental tests of the models have shown them to be effective at both predicting and explaining these aspects of tonality.
6.1 CONTRIBUTIONS
The first novel contribution provided in this dissertation is my four-fold categorization of mental processes relevant to music cognition: extrinsic nurture, intrinsic nurture, extrinsic nature, and intrinsic nature. This categorization is related to that provided by Sloboda and Juslin (2001), but differs in that they do not explicitly separate intrinsic processes into nature and nurture. I additionally illustrate how these categories are related to those used in semiotic theory and those suggested by Juslin and Västfjäll (2008). The principal purpose of my categorization is to enable a clear and unambiguous distinction to be made between top-down and bottom-up models: the former require, as an input variable, a statistical analysis of a musical corpus; the latter do not.
The second novel contribution is the development of expectation tensors. Prior to expectation tensors, there had been no generalized method to represent collections of pitches, intervals, triads, and so forth within a principled probabilistic framework and incorporating basic psychoacoustic processes of pitch perception. Furthermore, by generalizing the resulting embeddings into multi-dimensional forms (i.e., tensors) and allowing for pitches to be represented in either absolute or relative form, I have constructed a family of pitch embeddings that are generalizations of a number of familiar embeddings used in musical set theory (i.e., interval vectors and other subset-class vectors).
The third novel contribution has been the development of an expectation tensor (spectral pitch vector) model of tonal affinity. This model has a small number of parameters (spectral roll-off and smoothing width). It has some similarities to Parncutt's virtual pitch commonality model, but it differs in that: it uses spectral rather than virtual pitches; it does not assume each pitch is categorized as a chromatic pitch class regardless of its precise tuning; it inherits the principled psychoacoustic and probabilistic foundations of the expectation tensors.
The fourth novel contribution has been utilizing microtonal stimuli to experimentally disambiguate innate and learned processes (as defined in this dissertation). The use of microtonal stimuli in experimental investigations of music perception is rare; I am aware of only a few researchers who have used microtonal stimuli (Vos 1982, 1984, 1986, 1988; Vos and van Vianen 1985b,a; Bucht and Huovinen 2004; and Ferrer 2007) and in none of these cases are the microtonal stimuli used to disentangle nature and nurture (as defined in this dissertation).
The fifth novel contribution has been to demonstrate that the perceived fit of successive pitches (affinity) is a causal function of their spectral pitch similarity (a model of an innate mental process).
The sixth novel contribution has been to show that affinity is also a function of the harmonicity of the timbres used. It is not possible to say whether this is due to participants' familiarity with harmonic complex tones, or whether it is modelling an innate preference for such timbres.
The seventh novel contribution has been to demonstrate unequivocally that a bottom-up psychoacoustic model can account for Krumhansl's probe tone data, and that the same model also makes realistic predictions about the musical functions of chords in a variety of familiar scales.
The eighth novel contribution is that I have applied my model to microtonal scales. As far as I know, this is the only model able to make predictions about the perceived tonal effects (affinities, stabilities, tensions) of successively played pitches and chords within such scales.¹ At this time, I have no empirical evidence to support these predictions.
6.2 LIMITATIONS
None of the experiments conducted for this dissertation have enabled me to determine the relative importance of spectral and virtual pitches to tonal perception. However, it seems that models containing only spectral pitches (or pitch classes) are highly effective. The only way to distinguish the effect sizes of spectral and virtual pitches is to conduct experiments where their predictions are sufficiently uncorrelated. Prior to conducting and building the full models, I had presumed the microtonal melodies experiment (Ch. 4) would distinguish between these two models. In fact, it turned out the data produced by spectral and virtual pitch models were highly correlated, r(108) = .95, as shown in Sec. ,.,.a.,. Different experimental stimuli are required to ensure these two models' predictions are less highly correlated.
In Chapter 5, I made some additional hypotheses to account for additional aspects of tonality, such as why the diatonic scale's Ionian tonic is stronger than its Aeolian (natural minor) tonic, and how the tensions and stabilities of pitches may be affected by their chord degree as well as their scale degree. However, I have not provided a formal mathematical model for these hypotheses. In the next section, I briefly outline another hypothesis that may account for these features, and which deals with successions of chords rather than the relationships between a given scale and its chords. It is my intention to create a model that embodies all of these proposed processes.
However, an additional problem is that there is a lack of experimental data encapsulating many important tonal effects. The probe tone data measure certain important aspects of tonal perception and, as shown earlier, I have successfully modelled these data. But they do not capture aspects such as: (a) the effects of differing orderings of any given set of chords, (b) the effects of a wide variety of different chord progressions, (c) the effects of different types of voice-leading, (d) the effects of chord choice, irrespective of their aggregate pitch class content (e.g., the chord progressions Fmaj–Gmaj–Cmaj and Dmin–Emin–Cmaj have very different tonal effects, but both have the same aggregate pitch content). Furthermore, the use of octave rather than harmonic complex tones may have unintended consequences.
¹ Most bottom-up models of consonance/dissonance (e.g., Sethares ,,,) can make predictions for microtonal intervals and chords, but they are not designed to model the effects produced by successive pitches and chords.
In the next section, I discuss ways in which some of these limitations
can be overcome in future work.
6.3 FUTURE WORK
In order to gain a deeper understanding of tonality, there is a vital
need to obtain more experimental data, particularly data that provide more
detailed information about the effects mentioned above. In earlier work,
I conducted a small experiment designed to illuminate these properties
(Milne, 2009b,a). Thirty-five participants were asked to rate, on a
seven-point scale, the degree of closure produced by the final chord
in a variety of three-chord progressions. In order to minimize the
effects of consonance/dissonance, all chords were major or minor triads
in root position. To maximize ecological validity, realistic sounding
timbres were used (the chords were played by a sampled string quartet)
and conventional rules of voice-leading were followed (i.e., minimizing
the voice-leading distance, whilst avoiding parallel octaves and fifths
and, as much as possible, hidden octaves and fifths).² This experiment
provided very useful data, and the participants' responses were highly
correlated with each other (mean inter-participant correlation of
r(33) = .49, Cronbach's α = .97) (Milne, 2009a). However, the stimuli
were limited in scope. Ignoring voice-leading, and transposition with
respect to the final chord, there are 24 × 24 × 2 = 1152 different such
chord progressions; I tested only 72 (6%) of these.
² It is widely believed that parallel and hidden fifths are avoided in
common practice music in order to preserve the perceived independence of
the voices (Huron, 2001). Hidden octaves and fifths occur when two voices
move in similar motion to an octave or fifth. They are generally
considered less objectionable than parallel octaves and fifths,
particularly when one or both voices are the alto and tenor (inner) voices.
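The count of 1152 can be checked by brute-force enumeration. The sketch below is only an illustration: it assumes the three factors are 24 possible triads (12 roots, major or minor) for each of the first two chords, and 2 qualities for a final chord whose root is fixed to factor out transposition. The names `TRIADS` and `count_progressions` are hypothetical.

```python
from itertools import product

QUALITIES = ("maj", "min")
# Assumption: 24 triads = 12 pitch-class roots x 2 qualities.
TRIADS = [(root, quality) for root in range(12) for quality in QUALITIES]


def count_progressions():
    """Count three-chord progressions, ignoring voice-leading.

    The final chord's root is fixed at 0 so that progressions differing
    only by transposition are counted once; only its quality varies.
    """
    finals = [(0, quality) for quality in QUALITIES]
    return sum(1 for _ in product(TRIADS, TRIADS, finals))


total = count_progressions()
print(total)
```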
In order to gain more extensive data, I recently conducted a much
more comprehensive version of the experiment. I asked each participant
(from a total of 120 participants) to rate a random selection of 128
chord progressions from a total of 642. This means each of the 642
progressions was rated, on average, by 120 × 128/642 ≈ 24 participants.
The 642 progressions constitute more than half of all possible such
progressions; furthermore, the 510 excluded progressions were those
which traverse a large distance in the cycle-of-fifths and can, therefore,
be reasonably considered rare in Western music. I have not yet had a
chance to fully analyze or model these data. But they provide a hugely
powerful resource of information for future research and, notably, to
test models of tonality that are designed to capture the effects of chord
ordering, voice-leading, and chord degree.
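The average number of raters per progression follows directly from the design: 120 participants each rating 128 of 642 progressions. The simulation below is a hedged sketch of such a design, assuming each participant's selection is drawn uniformly without replacement (the text says only "a random selection"); `simulate_coverage` and its seed are hypothetical. It also shows how unevenly the ratings might fall across progressions.

```python
import random


def simulate_coverage(n_participants=120, n_rated=128, n_progressions=642, seed=0):
    """Return how many participants rated each progression.

    Assumption: each participant rates a uniform random subset,
    sampled without replacement, of the available progressions.
    """
    rng = random.Random(seed)
    counts = [0] * n_progressions
    for _ in range(n_participants):
        for idx in rng.sample(range(n_progressions), n_rated):
            counts[idx] += 1
    return counts


counts = simulate_coverage()
mean_raters = sum(counts) / len(counts)  # always 120 * 128 / 642
print(round(mean_raters, 2), min(counts), max(counts))
```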
In particular, I intend to use these data to test the hypotheses I made
in Chapter 5 and reiterated above. Namely that: (a) given a scale, the
strongest sense of harmonic resolution is induced when a bad-fitting
tone moves by a small interval (e.g., a semitone) to the root of a
best-fitting chord; (b) there is a feedback mechanism whereby the
increased salience of a tonic degree affects the instability of its
leading tones; (c) the tension of pitches is a function of their affinity
with their local harmonic context (chord degree) in addition to their
scalic context (scale degree). Of course, the first task is to formally
embody these principles in a mathematical model so they can be tested
against these data.
However, there is an additional novel hypothesis I wish to mention.
This hypothesis may also account for the above tonal effects and others,
and these data are ideal to test it. It can be exemplified by returning
to the figure introduced in Chapter ,. I reproduce this figure, and its
minor version, in Figure 6.1.
The hypothesis has two components. Firstly, given two chords
played in sequence, we may hear one or both of these chords as
perturbations (alterations) of similar chords that have greater spectral
pitch similarity. For instance (and as shown in Fig. 6.1a), the chord pair
Cmaj–Dmaj has lower spectral pitch similarity than the similar chord
pair Cmaj–Dmin. The hypothesis implies that when the lower affinity
Cmaj–Dmaj is played, it may be heard as a perturbation of the higher
affinity Cmaj–Dmin. More specifically, the played pitch F♯ may be
heard as an alteration of the pitch F.
[Figure 6.1: two panels, (a) Major reference triad and (b) Minor reference
triad. Axes: "Roots and fifths" and "Thirds", each running from −700 to 700.]
Figure 6.1. Spectral pitch similarities of a Cmaj or Cmin reference triad and
all possible 12-TET triads that contain a perfect fifth. Spectral pitch
similarity is calculated with the previously optimized smoothing
of o., cents and roll-off of o.,a.
Secondly, when a pitch is heard as perturbed in this way, it is heard to
resolve when it continues, in the same direction as its perturbation,
to the next available pitch with high affinity. For example, in the chord
progression Cmaj–Dmaj–Gmaj, the tone F♯ is resolved by moving to
G (i.e., in the same direction as its upwards perturbation from F). In
this way, the F♯ acts as a low affinity passing tone between an F (that is
implied due to its higher affinity) and the played G, which follows.
This may account for the perceived instability (or activity) of the V in
the IV–V progression (of which Cmaj–Dmaj is an example), and for its
perceived need to resolve to I (the Gmaj chord). Numerous similar
examples can be found; for example, the spectral pitch similarity of Cmaj
and F♯maj is lower than that of Cmaj and F♯min, which suggests that,
when the former is played, the tone A♯ may be heard as an active
alteration of the higher affinity tone A and, hence, seeks resolution to B
in the cadence Cmaj–F♯maj–Bmin (a ♭II–V–i cadence). Further examples
are provided in Milne et al. (2011b).
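The kind of comparison this hypothesis relies on can be sketched in a few lines. The code below is a simplified illustration, not the model used in this dissertation: it assumes harmonic complex tones with 12 partials weighted by n^(−roll-off), a circular Gaussian smoothing over 1200 one-cent bins, and cosine similarity between the resulting spectral pitch class vectors. The parameter values (`rolloff=0.6`, `sigma=6.0`) and all function names are placeholders chosen for illustration, not the optimized values referred to in the Figure 6.1 caption.

```python
import numpy as np


def spectral_pitch_class_vector(pitch_classes, n_harmonics=12, rolloff=0.6, sigma=6.0):
    """Smoothed spectral pitch class vector over 1200 one-cent bins.

    pitch_classes: semitones above C, e.g. a C major triad is [0, 4, 7].
    rolloff and sigma are placeholder values (assumptions).
    """
    vec = np.zeros(1200)
    for pc in pitch_classes:
        for n in range(1, n_harmonics + 1):
            cents = (100.0 * pc + 1200.0 * np.log2(n)) % 1200.0
            vec[int(round(cents)) % 1200] += n ** (-rolloff)
    # Circular Gaussian kernel, applied by circular convolution via the FFT.
    offsets = np.arange(1200)
    dist = np.minimum(offsets, 1200 - offsets)
    kernel = np.exp(-0.5 * (dist / sigma) ** 2)
    return np.real(np.fft.ifft(np.fft.fft(vec) * np.fft.fft(kernel)))


def spectral_pitch_similarity(chord_a, chord_b, **kwargs):
    """Cosine similarity of two chords' spectral pitch class vectors."""
    a = spectral_pitch_class_vector(chord_a, **kwargs)
    b = spectral_pitch_class_vector(chord_b, **kwargs)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


CMAJ, DMAJ, DMIN = [0, 4, 7], [2, 6, 9], [2, 5, 9]
sim_cmaj_dmaj = spectral_pitch_similarity(CMAJ, DMAJ)
sim_cmaj_dmin = spectral_pitch_similarity(CMAJ, DMIN)
print(sim_cmaj_dmaj, sim_cmaj_dmin)
```

Swapping in other chord pairs (for instance the F♯maj and F♯min triads against Cmaj) gives a quick way to look for further candidate perturbations, subject to the caveat that the parameter values here are assumed.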
By combining all these additional hypotheses, it may be possible to
model, from the bottom up, the feelings of tension, activity, resolution,
and so forth, that are amongst the most characteristic features of
harmonic tonality. A model to explore this bottom-up account of tonal
functionality is in progress, and I plan to report on its effectiveness in
future publications.
The work described in this dissertation can also be extended in other
ways. To date, I have not used any of the relative or higher-dimensional
tensors in formal models of music cognition. In Sections ,.o.a and ,.o.,,
I showed how such expectation tensors provide effective methods for
calculating the similarity between any arbitrary tuning of a scale and
just intonation (or other privileged) referents, and how they can
generalize many of the methods used in musical set theory. It will be
interesting to see how such tensors may become useful in models of tonal
perception. For instance, Kuusi (2001) collected ratings from
participants for the perceived closeness of chords; these data could be
modelled by a linear combination of the distances between a variety of
tensors of differing orders.
Another research opportunity deriving from the tensors is to
develop methods for creating salience (rather than expectation) tensors. In
Section ,.,, I showed how expectation tensors sum the elements x[i, j]
in the pitch response matrix X to give the expected numbers of tones
(or ordered tone pairs, tone triples, etc.) heard at any given pitch (or
dyad, triad, etc.). An alternative strategy is to derive the probability of
hearing any given pitch, dyad, triad, and so forth (regardless of how
many tones, or tuples of tones, we may hear playing it). The resulting
tensors will have elements with values in the interval [0, 1]. It is quite
straightforward to naively calculate the elements of such a tensor. For
example, the elements of an absolute monad salience tensor are given
by $x_s[j] = 1 - \prod_{i=1}^{I} (1 - x[i, j])$ (compare this with the
absolute monad expectation tensor, which is
$x_e[j] = \sum_{i=1}^{I} x[i, j]$). However, the method
of inclusion-exclusion of tensor subspaces, which greatly reduces the
computational complexity of the higher-order tensors, cannot be
directly replicated for the salience tensors. Despite this, it should be
possible to find a related method of inclusion-exclusion that will provide
similar reductions in computational complexity.
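The contrast between the two absolute monad tensors can be made concrete with a small numerical sketch. It assumes the pitch response matrix X has one row per tone i and one column per pitch j, with each entry x[i, j] read as the probability that tone i is heard at pitch j; the expectation tensor sums each column, whereas the salience tensor is one minus the product of (1 − x[i, j]) down each column. The function names and the toy matrix are illustrative only.

```python
import numpy as np


def expectation_monad(X):
    """Expected number of tones heard at each pitch: sum over tones i."""
    return X.sum(axis=0)


def salience_monad(X):
    """Probability that at least one tone is heard at each pitch.

    Naive calculation: 1 - prod_i (1 - x[i, j]), treating the tones as
    independent (an assumption of this sketch).
    """
    return 1.0 - np.prod(1.0 - X, axis=0)


# Toy pitch response matrix: 2 tones (rows) by 3 pitches (columns).
X = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.0, 0.3]])

x_e = expectation_monad(X)
x_s = salience_monad(X)
print(x_e, x_s)
```

Where only one tone contributes to a pitch the two tensors agree; they diverge where several tones overlap, because the salience value saturates at 1 while the expectation keeps growing.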
A further research opportunity opened up by the expectation tensors
is to build in the option for an additional form of invariance: scale
invariance. Currently, the tensors can embed pitches or pitch classes (only
the latter have invariance with respect to the octave), and these
embeddings can be represented in an absolute or a relative form (only the
latter has invariance with respect to transposition). Scale invariance may
also have useful cognitive applications: under a relative dyad embedding
that is also scale invariant, the representations of, for example, pitch
sets {C,, D,, E♭,} and {C,, E,, F♯,} would be identical; the latter is
essentially a scaled (stretched) version of the former (whole tones become
major thirds, and semitones become whole tones). With a means to
indicate the temporal ordering of the embedded intervals, they would
become contour invariant: melodies with the same contour shapes but
differing contour depths would be invariant under such an
embedding. Melodies with similar contours would have similar embeddings
under a standard metric. I have some preliminary ideas about how such
embeddings may be constructed.
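One very simple way to picture a scale-invariant relative dyad representation is to list all intervals between distinct tones and divide them by the largest interval. This is purely an illustrative assumption: the text does not say how such embeddings would be constructed, and the sketch below uses plain lists rather than smoothed tensors. Pitches are given as MIDI note numbers, with the octave chosen arbitrarily; the function names are hypothetical.

```python
from itertools import permutations


def relative_dyads(pitches):
    """Sorted intervals (in semitones) over all ordered pairs of distinct tones.

    Taking differences makes the result invariant to transposition.
    """
    return sorted(q - p for p, q in permutations(pitches, 2))


def scale_invariant_dyads(pitches):
    """Relative dyads divided by the largest interval (illustrative choice)."""
    dyads = relative_dyads(pitches)
    span = max(abs(d) for d in dyads)
    return [d / span for d in dyads]


c_d_eflat = [60, 62, 63]   # C, D, E-flat
c_e_fsharp = [60, 64, 66]  # C, E, F-sharp: every interval doubled

print(relative_dyads(c_d_eflat), relative_dyads(c_e_fsharp))
print(scale_invariant_dyads(c_d_eflat) == scale_invariant_dyads(c_e_fsharp))
```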
A further research question that could be tackled is to devise
experimental stimuli that can more clearly distinguish between the effects
of virtual and spectral pitch similarity. One way to achieve this would
be to create synthetic timbres whose modelled virtual pitches are
distinctly different to their spectral pitches. Participants would then
judge the affinities of differently sized intervals that match either the
spectral or the virtual pitches.
And, on the distant horizon, is the grand aim of constructing a
complete model of tonality comprising submodels for each of the four
categories of mental processes illustrated in Figure a.. (intrinsic,
extrinsic, nature, and nurture).
6.4 IMPLICATIONS
Understanding the extent to which the cultural artefacts of humankind
are due to innate processes of human perception and cognition, and
how much they are down to learning, is a key question. In this
dissertation, I have posited that innate processes play an important role
in the shaping of tonal music.
So, does this mean there is only one true form of music, one ideal
to which artists may aspire, and against which existing pieces must be
judged? No. Rather, it just suggests likely iconic significations for
certain musical events. For example, we can argue there is a likely
association between the discomfort induced by dissonance, or poor affinity,
and negatively valenced feelings. However, existing symbols (and the
contexts within which they operate) are often deliberately subverted by
the artist who seeks to make the familiar, unfamiliar.³ The cultural
context is also important: the zeitgeist may consider the discomfort
of disaffinity or dissonance as a positive aesthetic, and a sophisticated
audience may feel that excessive consonance and affinity is too
simplistic, too unlike reality, to communicate in an authentic or
meaningful manner.
³ This process is known as defamiliarization; Shklovsky (1965, p. 12)
writes, "the technique of art is to make objects 'unfamiliar', to make
forms difficult, to increase the difficulty and length of perception
because the process of perception is an aesthetic end in itself and must
be prolonged", as quoted in Thompson ,.
In the end, it is the artist (his or her aesthetic sensibilities) and the
culture within which he or she operates that dictate how the loose
emotional connotations of musical events may become transformed
and transfigured into the remarkable art form that is music. "All good
music resembles something. Good music stirs by its mysterious
resemblance to the objects and feelings which motivated it" (Cocteau,
1918, p. .).⁴
⁴ "Toute bonne musique est ressemblante. La bonne musique émeut par cette
ressemblance mystérieuse avec les objets et les sentiments qui l'ont
motivée."
REFERENCES
Agmon, I. ,,. Iunctional harmony revisited. A prototype-
theoretic approach. Mos:: 1|ecry 8e:troo, ;a.,oa,. Cited
on page ,a.
Balzano, G. . ,o. The group-theoretic description of a-fold and
microtonal pitch systems. Ccooter Mos:: coroc|, ,,.oo,. Cited
on pages ,, and ,,.
Barbour, . M. ,. 1oo:og co1 1eoercoeot. H:stcr::c| 8or:ey.
Michigan 8tate College Press, Iast Iansing, Michigan. Cited on
page ,.
Berger, K. ,;. Mos::c I::tc. 1|ecr:es cj :::1eotc| Ioe:t:cos :o 1c-
:c| Ic|y|coy jrco Mcr:|ettc 1c Ic1c:c tc C:csec Zcr|:oc. Cambridge
University Press. Cited on page ;.
Berlyne, D. I. ,;o. Novelty, compleity and hedonic value. Ier:e-
t:co co1 Isy:|c|ys::s, .a;,ao. Cited on page a,.
Bernstein, . G. and Oenham, A. . aoo,. Pitch discrimination
of diotic and dichotic tone complees. Darmonic resolvability or
harmonic number 1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c,
,o.,,a,,,,,. Cited on page ,o.
Bigand, I., Poulin, B., Tillmann, B., Madurell, I., and DAdamo, D. A.
aoo,. 8ensory versus cognitive components in harmonic priming.
coroc| cj Lxer:oeotc| Isy:|c|cgy. Hooco Ier:et:co co1 Ierjcroco:e,
a,.,;. Cited on page aa.
Bohlen, D. ,;. , tonstufen in der duodezime. :ost::c, ,,a.;o
o. Cited on page ,.
Bregman, A. 8. ,,o. o1:tcry 8:eoe oc|ys:s. MIT Press, Cambridge,
MA, U8A. Cited on page ,.
Brown, D. ,. The interplay of set content and temporal contet in
a functional theory of tonality perception. Mos:: Ier:et:co, ,.a,
ao. Cited on page ,o.
Brown, . C. ,,a. Musical fundamental frequency tracking using
a pattern recognition method. 1|e coroc| cj t|e :cost::c| 8c::ety cj
oer::c, ,a,.,,,,oa. Cited on page o;.
Buchler, M. aoo. Relative saturation of intervals and set classes. A
new approach to complementation and pcset similarity. coroc| cj
Mos:: 1|ecry, ,a.ao,,,,. Cited on pages ;; and ,.
Bucht, 8. and Duovinen, I. aoo,. Perceived consonance of harmonic
intervals in ,-tone equal temperament. In Parncutt, R., Kessler, A.,
and Zimmer, I., editors, Irc:ee1:ogs cj t|e Ccojereo:e co Ioter1:s::|:ocry
Mos::c|cgy, Graz, Austria. Cited on page ,.
Budrys, R. and Ambrazevicius, R. aoo. Tonal vs atonal. Percep-
tion of tonal hierarchies. In I. Cambouropoulos, I. C., Parncutt, R.,
8olomos, M., 8tefanou, D., and Tsougras, C., editors, Irc:ee1:ogs cj
t|e ,t| Ccojereo:e co Ioter1:s::|:ocry Mos::c|cgy, pages ,o,;, Aristotle
University, Thessaloniki, Greece. Cited on page o.
Busemeyer, . R. and Diederich, A. aoo. Ccgo:t::e Mc1e|:og. 8AGI,
Ios Angeles, U8A. Cited on page a.
Butler, D. ,,. Describing the perception of tonality in music. Acri-
tique of the tonal hierarchy theory and a proposal for a theory of in-
tervallic rivalry. Mos:: Ier:et:co, o,.a,a,a. Cited on pages ,
;, and ;.
Caplin, W. ,,. Tonal function and metrical accent. A historical
perspective. Mos:: 1|ecry 8e:troo, .,. Cited on page ,a.
Carey, N. aoo;. Coherence and sameness in well-formed and pair-
wise well-formed scales. coroc| cj Mct|eoct::s co1 Mos::, a.;,,.
Cited on page o.
Carey, N. and Clampitt, D. ,,. Aspects of well-formed scales. Mo-
s:: 1|ecry 8e:troo, a.;aoo. Cited on pages o and ;;.
Castrn, M. ,,,. Re:re|. 8:o:|cr:ty Mecsore jcr 8et-C|csses. 8ibelius
Academy, Delsinki. Cited on pages , ;;, and ,.
Chalmers, . ,,o. D:::s:cos cj t|e 1etrc:|cr1. Irog Peak Music. Cited
on pages ; and ,.
Chandler, D. aooa. 8eo:ct::s. 1|e cs::s. Routledge, Iondon. Cited
on page o.
Chen, 8., Ma, B., and Zhang, K. aoo,. On the similarity metric and
the distance metric. 1|ecret::c| Ccooter 8::eo:e, ,oa,a.a,o
a,;o. Cited on page .
Chew, I. aooo. 1cucr1s c oct|eoct::c| oc1e| cj tcoc|:ty. PhD disserta-
tion, Massachusetts Institute of Technology. Cited on page ,,.
Chew, I. aoo. Real-time pitch spelling using the spiral array. Cco-
oter Mos:: coroc|, a,a.o;o. Cited on page ,,.
Chew, I. aooo. 8licing it all ways. mathematical models for tonal
induction, approimation, and segmentation using the spiral array.
INIORM8 coroc| co Ccoot:og, ,.,o. Cited on page ,,.
Cocteau, J. 1918. Le Coq et l'Arlequin: Notes autour de la musique.
Éditions de la Sirène, Paris. Cited on page 203.
Cuddy, I. I. and Thompson, W. I. ,,a. Asymmetry of perceived
key movement in chorale sequences. Converging evidence from a
probe-tone analysis. Isy:|c|cg::c| Resecr:|, ,a.,. Cited on
page ,o.
Dahlhaus, C. ,o. Tonality. In 8adie, 8., editor, 1|e Neu Crc:e
D::t:cocry cj Mos:: co1 Mos:::cos, volume ,, pages . Macmillan,
Iondon, st edition. Cited on page ;.
Dahlhaus, C. ,,o. 8to1:es co t|e Or:g:o cj Hcroco:: 1coc|:ty. Princeton
University Press, Oford. Cited on pages ; and ,.
Dennett, D. C. ,,,. Cognitive science as reverse engineering. 8ev-
eral meanings of top-down and bottom-up. In Prawitz, D.,
8kyrms, B., and Westersthl, D., editors, Lcg::, Met|c1c|cgy, co1 I|:-
|csc|y cj 8::eo:e IX, pages o;,o,, Amsterdam. Ilsevier 8cience.
Cited on page aa.
Deutsch, D. ,,;. 1|e Ic|r:: cj Rec|:ty. 1cucr1s c 1|ecry cj L:eryt|:og.
Penguin Books, Iondon. Cited on pages , and ao.
Dretske, I. aooo. Perception without awareness. In Gendler, T. 8.
and Dawthorne, ., editors, Ier:etoc| Lxer:eo:e. Oford University
Press, Oford, UK. Cited on page ,.
Irlich, P. ,,. Tuning, tonality, and twenty-two-tone tempera-
ment. Xeo|croco:ko, ;. Cited on page o.
Irlich, P. aooo. A middle path between just intonation and the
equal temperaments, part . Xeo|croco:ko, .,,,. Cited on
pages o, ,o, ;,, and .
Iuler, I. ;,,. 1eotcoeo oc:ce t|ecr:ce oos::ce ex :ert:ss:so:s |croco:ce
r:o::::s 1:|o::1e excs:tce. 8aint Petersburg Academy, 8aint Peters-
burg. Cited on page ,,.
Ierguson, 8., 8chubert, I., and Dean, R. T. ao. Continuous subjec-
tive loudness responses to reversals and inversions of a sound record-
ing of an orchestral ecerpt. Mos::ce 8::eot:ce, ,.,;,o. Cited
on page aa.
Ierrer, R. aoo;. 1|e rc|e cj t:o|re :o t|e oeocr:zct:co cj o::rctcoc| :oter-
:c|s. Masters, University of yvskyl. Cited on page ,.
Itis, I. . ,,. 1rc:te :co|et 1e |c t|ecr:e et 1e |c rct:oe 1e ||croco:e.
8chlesinger, Paris. Cited on page ,.
Iorte, A. ,;,. 1|e 8tro:tore cj tcoc| Mos::. Yale University Press.
Cited on pages ;a, ;;, ,, ,a, and ,,.
Gell, A. ,,. The language of the forest. Iandscape andphonological
iconism in Umeda. In Dirsch, I. and ODanlon, M., editors, 1|e
ot|rcc|cgy cj Lco1s:ce, pages a,aa,. Clarendon Press, Oford.
Cited on page a,.
Goldstein, . I. ,;,. An optimum processor theory for the central
formation of the pitch of comple tones. coroc| cj t|e :cost::c| 8c-
::ety cj oer::c, ,.,,oo. Cited on page oa.
Green, B. and Butler, D. aooa. Irom acoustics to 1cosy:|c|cg:e. In
Christensen, T., editor, 1|e Cco|r:1ge H:stcry cj estero Mos:: 1|e-
cry, pages a,oa;. Cambridge University Press, Cambridge. Cited
on page ,.
Green, D. M. and 8wets, . A. ,oo. 8:goc| Dete:t:co 1|ecry co1 Isy-
:|c|ys::s. Wiley, New York. Cited on page oa.
Darrison, D. ,,,. Hcroco:: Ioo:t:co :o C|rcoct:: Mos::. Reoeue1
Doc|:st 1|ecry co1 co ::coot cj :ts Ire:e1eots. University of Chicago
Press, Chicago and Iondon. Cited on pages ,a and ,.
Delmholtz, D. I. I. ;;. Oo t|e 8eosct:cos cj 1coe cs c I|ys:c|cg::c|
cs:s jcr t|e 1|ecry cj Mos::. Dover, New York. Cited on pages ,
and o.
Deyduk, R. G. ,;. Rated preference for music composition as it
relates to compleity and eposure frequency. Ier:et:co co1 Isy-
:|c|ys::s, ;.,,. Cited on page a,.
Doning, D. aooo. Computational modeling of music cognition. A
case study on model selection. Mos:: Ier:et:co, a,.,o,;o. Cited
on page ;.
Dostinsky, O. ;,. D:e Le|re :co 1eo oos:kc|:s:|eo K| cogeo. L:o
e:trcg zor cest|et:s:|eo eg oro1oog 1er Hcroco:e-Le|re. D. Domini-
cus, Prague. Cited on page ,,.
Duron, D. ,,. Tonal consonance versus tonal fusion in polyphonic
sonorities. Mos:: Ier:et:co, ,a.,,. Cited on pages , and ,.
Duron, D. ,,,. Interval-class content in equally tempered pitch-
class sets. Common scales ehibit optimumtonal consonance. Mos::
Ier:et:co, ,.a,,o. Cited on pages ,, ,, oa, and o,.
Huron, D. 2001. Tone and voice: A derivation of the rules of
voice-leading from perceptual principles. Music Perception, 19(1):1–64.
Cited on pages ,, o, , and ,;.
Duron, D. aooo. 8ueet ot:::ct:co. Mos:: co1 t|e Isy:|c|cgy cj Lxe:-
tct:co. MIT Press, Cambridge, MA, U8A. Cited on pages a, o, ,
a, ,, and ,o.
Dutchinson, W. and Knopo, I. ,;. The acoustic component of
western consonance. Ioterjc:e, ;.a,. Cited on pages o and o,.
Dyer, B. aoo. Tonality. In 8adie, 8. and Tyrrell, ., editors, 1|e Neu
Crc:e D::t:cocry cj Mos:: co1 Mos:::cos, volume a, pages ,,,.
Macmillan, Iondon, and edition. Cited on page ;.
Dyer, B. aooa. Tonality. In Christensen, T., editor, 1|e Cco|r:1ge
H:stcry cj estero Mos:: 1|ecry, pages ;ao;a. Cambridge Univer-
sity Press, Cambridge. Cited on pages ; and o.
airazbhoy, N. A. ,,. 1|e Rcgs cj Ncrt| Io1:co Mos::. 1|e:r 8tro:-
tore co1 L:c|ot:co. Popular Prakashan, st revised edition. Cited on
page ,.
erket, . aoo,. Music articulation in the organ. In Irc:ee1:ogs cj c:ot
c|t::-Ncr1:: :cost::s Meet:og, Mariehamn, Aland, Iinland. Cited
on page a.
uhsz, Z. aoa. A mathematical study of note association paradigms
in dierent folk music cultures. coroc| cj Mct|eoct::s co1 Mos::,
o,.o,. Cited on page o.
uslin, P. N. and Iaukka, P. aoo,. Ipression, perception, and in-
duction of musical emotions. A review and a questionnaire study
of everyday listening. coroc| cj Neu Mos:: Resecr:|, ,,,.a;a,.
Cited on pages o and a;.
uslin, P. N. and Vstqll, D. aoo. Imotional responses to music.
The need to consider underlying mechanisms. e|c::corc| co1 rc:o
8::eo:es, ,.,oa. Cited on pages a,, ao, a;, a,, and ,,.
Kameoka, A. and Kuriyagawa, M. ,o,. Consonance theory parts
and a. 1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c, ,o.,,o,.
Cited on pages o,, o, and o,.
Kelley, R. T. aoo,. Reconciling tonal conicts. Mod-; transforma-
tions in chromatic music. In Icer 1e|::ere1 ct t|e se:co1 coooc| oeet:og
cj t|e Mos:: 1|ecry 8c::ety cj t|e M:1 t|cot:: :o I|:|c1e||:c, Ieoosy|:c-
o:c. Cited on page ,a.
Kim, .-D. aoo,. Istimating classication error rate. Repeated cross-
validation, repeated hold-out and bootstrap. Ccootct:coc| 8tct:st::s
co1 Dctc oc|ys:s, ,.,;,,;,. Cited on page ,o.
Korte, B. and Vygen, . aoo;. Cco|:octcr:c| Ot:o:zct:co. 1|ecry co1
|gcr:t|os. 8pringer, Berlin, ,th edition. Cited on page o.
Krumhansl, C. I. ,,o. Ccgo:t::e Icoo1ct:cos cj Mos::c| I:t:|. Oford
University Press, Oford. Cited on pages a, ,;, ,, ,o, ,a, ,,, ,,
,o, , ,, ,,, o, , ,, , ,, and o.
Krumhansl, C. I. and Kessler, I. . ,a. Tracing the dynamic changes
in perceived tonal organization in a spatial representation of musical
keys. Isy:|c|cg::c| Re::eu, ,,.,,,,o. Cited on pages iv, o,
,;, ,, ,,, o, ,, and .
Krumhansl, C. I. and 8hepard, R. N. ,;,. Quantication of the
hierarchy of tonal functions within a diatonic contet. coroc| cj
Lxer:oeotc| Isy:|c|cgy. Hooco Ier:et:co co1 Ierjcroco:e, .;,,,.
Cited on page o.
Kuusi, T. 2001. Set-Class and Chord: Examining Connection Between
Theoretical Resemblance and Perceived Closeness. PhD thesis, Sibelius
Academy, Helsinki, Finland. Cited on pages ;;, ,, ,a, and aoo.
Iansberg, M. I. ,o. The icon in semiotic theory. Correot ot|rc-
c|cgy, a.,,,. Cited on page a,.
Ieman, M. aooo. An auditory model of the role of short-term mem-
ory in probe-tone ratings. Mos:: Ier:et:co, ;,.,o,. Cited on
pages , oo, and ;.
Ierdahl, I. ,. Tonal pitch space. Mos:: Ier:et:co, ,.,,o.
Cited on page o.
Ierdahl, I. aoo. 1coc| I:t:| 8c:e. Oford University Press, Oford.
Cited on pages a, ,a, ,, and o.
Iewandowski, 8. and Iarrell, 8. ao. Ccootct:coc| Mc1e|:og :o Ccg-
o:t:co. Ir:o::|es co1 Irc:t::e. 8AGI, Ios Angeles, U8A. Cited on
page ao.
Iewin, D. ,,. Re. Intervallic relations between two collections of
notes. coroc| cj Mos:: 1|ecry, ,a.a,,o. Cited on page ;.
Iewin, D. ,;. Ceoerc|:ze1 Mos::c| Ioter:c|s co1 1rcosjcroct:cos. Yale
University Press, New Daven, CT. Cited on page ;;.
Iewin, D. aoo. 8pecial cases of the interval function between pitch-
class sets X and Y. coroc| cj Mos:: 1|ecry, ,.a,. Cited on
page ;.
Iindley, M. ao,. Pythaorean intonation. In Crc:e Mos:: Oo|:oe.
Oxjcr1 Mos:: Oo|:oe. Oford University Press. Cited on page ;o.
Ionguet-Diggins, D. C. ,oa. Two letters to a musical friend. 1|e
Mos:: Re::eu, a,.a,,a,. Cited on page ,,.
Iowinsky, I. I. ,o. 1coc|:ty co1 tcoc|:ty :o 8:xteeot|-Ceotory Mos::.
University of California Press, Ios Angeles. Cited on page .
MacCallum, R. M., Mauch, M., Burt, A., and Ieroi, A. M. aoa. Ivo-
lution of music by public choice. Irc:ee1:ogs cj t|e Nct:coc| :c1eoy
cj 8::eo:es cj t|e Uo:te1 8tctes cj oer::c. Cited on page ,,.
Macpherson, 8. ,ao. Me|c1y co1 Hcrocoy. 1rect:se jcr t|e 1ec:|er
co1 t|e 8to1eot. oseph Williams, Iondon. Cited on pages ,o, ;;,
o, and .
Malmberg, C. I. ,. The perception of consonance and dissonance.
Isy:|c|cg::c| Mcocgrc|s, aa.,,,,. Cited on pages o and o,.
Mashinter, K. aooo. Calculating sensory dissonance. 8ome discrep-
ancies arising from the models of Kameoka 8 Kuriyagawa, and
Dutchinson 8 Knopo. Lo:r::c| Mos::c|cgy Re::eu, a.o,.
Cited on page o,.
Mathews, M. V., Roberts, I. A., and Pierce, . R. ,,. Iour new
scales based on nonsuccessive-integer-ratio chords. coroc| cj t|e
:cost::c| 8c::ety cj oer::c, ;8.8o. Cited on page ,.
Mathieu, W. A. ,,;. Hcroco:: Lxer:eo:e. Inner Traditions,
Rochester. Cited on page ,a.
McDermott, . D., Iehr, A. ., and Oenham, A. . aoo. Indi-
vidual dierences reveal the basis of consonance. Correot :c|cgy,
ao.o,o,. Cited on page ,.
Meyer, I. B. ,o. Loct:co co1 Meco:og :o Mos::. University of
Chicago Press, Chicago. Cited on pages a and o.
Milne, A. J. 2009a. A psychoacoustic model of harmonic cadences.
Master's thesis, University of Jyväskylä, Finland. Cited on pages ,,
,,, , ,, , o, and ,;.
Milne, A. J. 2009b. A psychoacoustic model of harmonic cadences:
A preliminary report. In Louhivuori, J., Eerola, T., Saarikallio, S.,
Himberg, T., and Eerola, P., editors, ESCOM 2009 Proceedings, pages
328–337, Jyväskylä, Finland. Cited on page ,;.
Milne, A. . aoo. Tonal music theory. A psychoacoustic eplana-
tion In Demorest, 8. M., Morrison, 8. ., and Campbell, P. 8.,
editors, Irc:ee1:ogs cj t|e t| Ioteroct:coc| Ccojereo:e co Mos:: Ier:e-
t:co co1 Ccgo:t:co, pages ,;ooo, University of Washington, 8eattle,
U8A. Cited on page .
Milne, A. ., Carl, M., 8ethares, W. A., Noll, T., and Dolland, 8.
aoa. 8cratching the scale labyrinth. In Agon, C., Amiot, I.,
Andreatta, M., Assayag, G., Bresson, ., and Mandereau, ., editors,
Mct|eoct::s co1 Ccootct:co :o Mos:: MCM zo, volume o;ao of
LNI, pages o,, Berlin. 8pringer-Verlag. Cited on page ;,.
Milne, A. . and Prechtl, A. aoo. New tonalities with the Thummer
and The Viking. In Crossan, A. and Kaaresoja, T., editors, Irc:ee1:ogs
cj t|e ,r1 Ioteroct:coc| Hct:: co1 o1:tcry Ioterc:t:co Des:go crks|c,
volume a, pages aoaa, yvskyl, Iinland. Cited on page o.
Milne, A. J., Sethares, W. A., Laney, R., and Sharp, D. B. 2011b.
Modelling the similarity of pitch collections with expectation
tensors. Journal of Mathematics and Music, 5(1):1–20. Cited on pages o
and aoo.
Milne, A. ., 8ethares, W. A., and Plamondon, . aoo;. Isomorphic
controllers and Dynamic Tuning. Invariant ngering over a tuning
continuum. Ccooter Mos:: coroc|, ,,.,a. Cited on page o.
Milne, A. ., 8ethares, W. A., and Plamondon, . aoo. Tuning con-
tinua and keyboard layouts. coroc| cj Mct|eoct::s co1 Mos::, a.
,. Cited on pages o, ,, and ;.
Moore, B. C. ,;,. Irequency dierence limens for short-duration
tones. coroc| cj t|e :cost::c| 8c::ety cj oer::c, ,.ooo,. Cited
on page a,,.
Moore, B. C., Glasberg, B. R., and 8hailer, M. . ,,. Irequency
and intensity dierence limens for harmonics within comple tones.
1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c, ;a.ooo. Cited on
pages aa and a,,.
Nisbett, R. I. and Wilson, T. D. ,;;. Telling more than we can
know. Verbal reports on mental processes. Isy:|c|cg::c| Re::eu,
,,.a,a,. Cited on page ,,.
North, A. C. and Dargreaves, D. . ,,. 8ubjective compleity, fa-
miliarity, and liking for popular music. Isy:|coos::c|cgy, ,.;;,,.
Cited on pages ,, a,, and ,o.
Parncutt, R. ,. Revision of Terhardts psychoacoustical model of
the roots of a musical chord. Mos:: Ier:et:co, o.o,,. Cited on
pages ,, oo, o;, o,, o,, and o.
Parncutt, R. ,,. Hcrocoy. Isy:|cc:cost::c| rcc:|. 8pringer-
Verlag. Cited on pages , o, ,, ,,, and ,.
Parncutt, R. ,,a. Dow middle is middle C Terhardts virtual
pitch weight and the distribution of pitches in music. Unpublished.
Cited on page .
Parncutt, R. ,,,. Template-matching models of musical pitch and
rhythmperception. coroc| cj NeuMos:: Resecr:|, a,.,o;. Cited
on page oo.
Parncutt, R. ao. The tonic as triad. Key proles as pitch salience
proles of tonic triads. Mos:: Ier:et:co, a,.,,,,o. Cited on
pages o, , a, , o,, oo, ;, ;,, and ;o.
Parncutt, R. and Dair, G. ao. Consonance and dissonance in music
theory and psychology. Disentangling dissonant dichotomies. cor-
oc| cj Ioter1:s::|:ocry Mos:: 8to1:es, a.,oo. Cited on page ,.
Parncutt, R. and Prem, D. aoo. The relative prevalence of me-
dieval modes and the origin of the leading tone. Poster presented
at International Conference of Music Perception and Cognition
ICMPCo, 8apporo, apan, aa, August aoo. Cited on pages oo
and ;,.
Pearce, M. T. and Wiggins, G. A. aooo. Ipectation in melody. The
inuence of contet and learning. Mos:: Ier:et:co, a,.,;;,o.
Cited on pages a, aa, ,;, ,, ,o, ,, and o.
Peterson, I. R. and Peterson, M. . ,,. 8hort-term retention of
individual verbal items. coroc| cj Lxer:oeotc| Isy:|c|cgy, .,,
,. Cited on pages a, and .
Petitmengin, C. and Bitbol, M. aoo,. The validity of rst-person
descriptions as authenticity and coherence. coroc| cj Ccos::cosoess
8to1:es, ooa.,o,,o,. Cited on pages ,a and ,,.
Piston, W. and Devoto, M. ,;. Hcrocoy. Norton, New York, th
edition. Cited on pages ,o, ;;, ;, ;,, and .
Pitt, M. A., Myung, I. ., and Zhang, 8. aooa. Toward a method of
selecting among computational models of cognition. Isy:|c|cg::c|
Re::eu, o,,.,;a,,. Cited on page ;.
Plomp, R. and Ievelt, W. . M. ,o. Tonal consonance and critical
bandwidth. 1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c, ,,.,
oo. Cited on pages , o,, ,, and ,,.
Pratt, G. ,,o. 1|e Dyoco::s cj Hcrocoy. Ir:o::|es co1 Irc:t::e. Oford
University Press, Oford, UK. Cited on pages ,o and ;,.
Prechtl, A., Milne, A. ., Dolland, 8., Ianey, R., and 8harp, D. B.
aoa. A MIDI sequencer that widens access to the compositional
possibilities of novel tunings. Ccooter Mos:: coroc|, ,o.,a,.
Cited on page ;;.
Quinn, I. aoo. Darmonic function without primary triads. Cited
on page ,a.
Ramachandran, V. 8. and Dubbard, I. M. aoo. 8ynaesthesiaa win-
dow into perception, thought and language. coroc| cj Ccos::cosoess
8to1:es, a.,,,. Cited on page a,.
Rameau, .-P. ;ao. Nco:eco systoe 1e oos:oe t|ecr:oe. Paris. Cited
on page ,a.
Riemann, D. ,,,. Ideen zu einer lehre von den tonvorstel-
lungen. c|r|o:| 1er :||:ct|ek Ieters, aaaao. Cited on page ,,.
Roederer, . G. aoo. 1|e I|ys::s co1 Isy:|c|ys::s cj Mos::. 8pringer,
,th edition. Cited on pages , and a.
Rogers, D. W. ,,,. A geometric approach to pcset similarity. Ier-
se:t::es cj Neu Mos::, ,;.;;,o. Cited on page .
Rubner, Y., Tomasi, C., and Guibas, I. . aooo. The Iarth Movers
Distance as a metric for image retrieval. Ioteroct:coc| coroc| cj Cco-
oter 1:s:co, ,oa.,,a. Cited on page o.
8chenker, D. ,,. Hcrocoy. University of Chicago Press, Chicago.
Cited on page ,a.
8chenker, D. ,;. Ccooterc:ot. 8chirmer, New York. Cited on
page ,a.
8choenberg, A. ,o,. 8tro:torc| Ioo:t:cos cj Hcrocoy. Iaber and Iaber,
Iondon, and edition. Cited on page a.
8choenberg, A. ,;. 1|ecry cj Hcrocoy. Iaber and Iaber, Iondon,
,rd revised edition. Cited on page ,.
8cott, D. and Isaacson, I. . ,,. The interval angle. A similarity
measure for pitch-class sets. Ierse:t::es cj Neu Mos::, ,o.o;,a.
Cited on page .
8ethares, W. A. ,,,. Iocal consonance and the relationship be-
tween timbre and scale. 1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c,
,,,.aaa. Cited on pages o, and ,o.
8ethares, W. A. aoo. 1oo:og, 1:o|re, 8e:troo, 8:c|e. 8pringer Ver-
lag, Iondon, and edition. Cited on pages viii, , oa, o,, ,,
,, and ,,.
8ethares, W. A., Milne, A. ., Tiedje, 8., Prechtl, A., and Plamondon,
. aoo,. 8pectral tools for Dynamic Tonality and audio morphing.
Ccooter Mos:: coroc|, ,,a.;,. Cited on pages ,, o, ;,
and ,,.
Shao, J. 1997. An asymptotic theory for linear model selection. Statistica Sinica, 7:221–264.
Shepard, R. N. 1987. Toward a universal law of generalization for psychological science. Science, 237(4820):1317–1323.
8hklovsky, V. ,o. Art as technique. In Iemon, I. T. and Reis, M. .,
editors, Ross:co Icroc|:st Cr:t:::so. Icor Lsscys, pages ,a,. Univer-
sity of Nebraska Press. Cited on page aoa.
8loboda, . A. and uslin, P. N. aoo. Mos:: co1 Loct:co, chapter Psy-
chological perspectives onmusic andemotion, pages ;o,. Oford
University Press, Oford. Cited on pages aa, a,, and ,,.
8mith, A. B. ,,;. A cumulative method of quantifying tonal con-
sonance in musical key contets. Mos:: Ier:et:co, a.;.
Cited on page oa.
8pelke, I. 8. and Kinzler, K. D. aoo;. Core knowledge. De:e|coeotc|
8::eo:e, o.,,o. Cited on page ,o.
8teinke, W. R., Cuddy, I. I., and Dolden, R. R. ,,,. Perception
of musical tonality as assessed by the probe-tone method. Ccoc1:co
:cost::s, a.o. Cited on page o.
8teinke, W. R., Cuddy, I. I., and Dolden, R. R. ,,;/,,. Dissoci-
ation of musical tonality and pitch memory fromnonmusical cogni-
tive abilities. Ccoc1:co coroc| cj Lxer:oeotc| Isy:|c|cgy, .,o,,,.
Cited on page o.
Stone, M. 1977. An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. Journal of the Royal Statistical Society, 39(1):44–47.
8tumpf, C. ,o. 1cosy:|c|cg:e, volume a. Ieipzig. Cited on
page ,.
8turges, D. A. ,ao. The choice of a class interval. coroc| cj t|e oer-
::co 8tct:st::c| ssc::ct:co, a,.ooo. Cited on page ao.
8un, R. aoo. Doc|:ty cj t|e M:o1. cttco-U rcc:| 1cucr1 Ccg-
o:t:co. Psychology Press. Cited on page aa.
Tenney, . ,. H:stcry cj Ccoscoco:e co1 D:sscoco:e. Icelsior
Music, New York. Cited on pages ,, , ,,, ,,, and ,.
Terhardt, I. ,,. The concept of musical consonance. A link be-
tween music and psychoacoustics. Mos:: Ier:et:co, ,.a;oa,.
Cited on page ,.
Terhardt, I., 8toll, G., and 8eewann, M. ,a. Pitch of comple sig-
nals according to virtual-pitch theory. Tests, eamples, and predic-
tions. 1|e coroc| cj t|e :cost::c| 8c::ety cj oer::c, ;,.o;o;.
Cited on pages ,, , ,, and oo.
Thompson, K. ,. reck:og t|e C|css rocr. Necjcroc|:st I:|o oc|-
ys:s. Princeton University Press. Cited on page aoa.
Thompson, W. I. ,o. o1geoeots cj key :|coge :o c:| :|crc|e ex:erts.
o :o:est:gct:co cj t|e seos:t:::ty tc keys, :|cr1s, co1 :c:::og. PhD disser-
tation, Queens University at Kingston. Cited on page o.
Toiviainen, P. and Krumhansl, C. I. aoo,. Measuring and modeling
real-time responses to music. The dynamics of tonality induction.
Ier:et:co, ,ao.;,;oo. Cited on pages ,o, o, and ;.
Trout, . D. aoo;. The psychology of scientic eplanation. I|:|csc|y
Ccocss, a,.o,,. Cited on page ao.
Tymoczko, D. aooo. 8upporting online material for the geomtery of
musical chords. 8::eo:e. Cited on pages and ;.
Vitz, P. C. ,oo. Aect as a function of stimulus variation. coroc| cj
Lxer:oeotc| Isy:|c|cgy, ;.;,;,. Cited on page a,.
von Oettingen, A. oo. Hcroco:esysteo:o 1oc|er Lotu::k|oog. Dorpat.
Cited on page ,,.
Vos, . ,a. The perception of pure and mistuned musical fths and
major thirds. Thresholds for discrimination and beats. Ier:et:co co1
Isy:|c|ys::s, ,a,.a,;,,. Cited on page ,.
Vos, . ,,. 8pectral eects in the perception of pure and tem-
pered intervals. Discrimination and beats. Ier:et:co co1 Isy:|c|ys::s,
,a.;,. Cited on page ,.
Vos, . ,o. Purity ratings of tempered fths and major thirds. Mos::
Ier:et:co, ,,.aaa;. Cited on page ,.
Vos, . ,. 8ubjective acceptability of various regular twelve-tone
tuning systems in two-part musical fragments. coroc| cj t|e :cost::c|
8c::ety cj oer::c, ,o.a,,a,,a. Cited on page ,.
Vos, . and van Vianen, B. G. ,a. The eect of fundamental fre-
quency on the discriminablity between pure and tempered fths and
major thirds. Ier:et:co co1 Isy:|c|ys::s, ,;o.o;,. Cited on
page ,.
Vos, . and van Vianen, B. G. ,b. Thresholds for discrimination
between pure and tempered intervals. The relevance of nearly coin-
ciding harmonics. coroc| cj t|e :cost::c| 8c::ety cj oer::c, ;;.;o
;. Cited on page ,.
Vos, P. G. and Troost, . M. ,,. Ascending and descending melodic
intervals. 8tatistical ndings and their perceptual relevance. Mos::
Ier:et:co, o,.,,,,o. Cited on page .
Wescott, R. W. ,;. Iinguistic iconism. Lcogocge, ,;a.,o,a.
Cited on page a,.
Widdess, R. ,. Aspects of form in North Indian alap and dhru-
pad. In Widdess, D. R., editor, Mos:: co1 1rc1:t:co. Lsscys co s:co
C Ot|er Mos::s Ireseote1 tc Lcoreo I::keo, pages ,,a. Cambridge
University Press, Cambridge, UK. Cited on page ,.
Wilding-White, R. ,o. Tonality and scale theory. coroc| cj Mos::
1|ecry, a.a;ao. Cited on page o,.
Wilson, I. ,;. Ietter to Chalmers pertaining to moments-of-
symmetry/Tanabe cycle. Unpublished. Cited on pages o and ;;.
Woolhouse, M. aoo;. Ioter:c| :y:|es co1 t|e :cgo:t:co cj :t:| cttrc:t:co :o
estero tcoc|-|croco:: oos::. PhD dissertation, Cambridge Univer-
sity, Department of Music, Cambridge. Cited on page ,o.
Woolhouse, M. aoo,. Modelling tonal attraction between adja-
cent musical elements. coroc| cj Neu Mos:: Resecr:|, ,,.,;,;,.
Cited on pages aa, ,;, and ,,.
Woolhouse, M. and Cross, I. aoo. Using interval cycles to model
Krumhansls tonal hierarchies. Mos:: 1|ecry 8e:troo, ,a.oo;.
Cited on page .
Zucchini, W. aooo. An introduction to model selection. coroc| cj
Mct|eoct::c| Isy:|c|cgy, ,,.,o. Cited on page ;.
Zwicker, I. and Iastl, D. ,,,. Isy:|cc:cost::s. Ic:ts co1 oc1e|s.
8pringer, Berlin. Cited on pages and ,.
A
SMOOTHING WIDTH AND THE DIFFERENCE LIMEN
The frequency difference limen, also known as the just noticeable frequency difference, is determined in a two-alternative forced-choice (2-AFC) experiment. In this type of experiment, participants are presented with numerous pairs of successively played tones. The tones in each pair have either the same or differing frequencies, and the participant is tasked with categorizing them accordingly.
The frequency difference limen is normally defined as the frequency difference at which the correct response rate indicates a $d'$ (also known as d prime) of approximately one (there are alternative definitions). The value of $d'$
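\[
\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \circ \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix}
= \begin{pmatrix} 1 \cdot 5 & 3 \cdot 7 \\ 2 \cdot 6 & 4 \cdot 8 \end{pmatrix}
= \begin{pmatrix} 5 & 21 \\ 12 & 32 \end{pmatrix} . \tag{B.1}
\]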
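The outer (tensor) product of a tensor $\mathbf{A}$ of size $I \times J \times \cdots$ and a tensor $\mathbf{B}$ of size $L \times M \times \cdots$ produces a tensor of size $I \times J \times \cdots \times L \times M \times \cdots$ containing all possible products of their elements. If $\mathbf{C} = \mathbf{A} \otimes \mathbf{B}$, then $c[i, j, \ldots, \ell, m, \ldots] = a[i, j, \ldots]\, b[\ell, m, \ldots]$. For example,
\[
\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \otimes \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix}
= \left(\begin{array}{cc|cc}
1 \cdot 5 & 1 \cdot 7 & 3 \cdot 5 & 3 \cdot 7 \\
1 \cdot 6 & 1 \cdot 8 & 3 \cdot 6 & 3 \cdot 8 \\ \hline
2 \cdot 5 & 2 \cdot 7 & 4 \cdot 5 & 4 \cdot 7 \\
2 \cdot 6 & 2 \cdot 8 & 4 \cdot 6 & 4 \cdot 8
\end{array}\right)
= \left(\begin{array}{cc|cc}
5 & 7 & 15 & 21 \\
6 & 8 & 18 & 24 \\ \hline
10 & 14 & 20 & 28 \\
12 & 16 & 24 & 32
\end{array}\right) . \tag{B.2}
\]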
The $2 \times 2$ partitions help to visualize the four modes of the resulting tensor: stepping from a partition to the one below increments the index of the first mode; stepping from a partition to the one on its right increments the index of the second mode; stepping down a row, within the same partition, increments the index of the third mode; stepping rightwards by a column, within the same partition, increments the index of the fourth mode. The symbol $\otimes^R$ denotes the $R$th outer power of a tensor, that is, $\mathbf{A}^{\otimes R} = \overbrace{\mathbf{A} \otimes \mathbf{A} \otimes \cdots \otimes \mathbf{A}}^{R}$.
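The outer product example can be checked numerically. The following is a minimal NumPy sketch (the choice of NumPy and all variable names are assumptions made for illustration; they are not part of the thesis software). It builds the order-4 tensor and then lays its four modes out in the partitioned arrangement described above.

```python
import numpy as np

A = np.array([[1, 3], [2, 4]])
B = np.array([[5, 7], [6, 8]])

# Outer (tensor) product: C[i, j, l, m] = A[i, j] * B[l, m]
C = np.multiply.outer(A, B)              # shape (2, 2, 2, 2)

# Partitioned layout: block row = i, block column = j,
# row within block = l, column within block = m.
C_blocks = C.transpose(0, 2, 1, 3).reshape(4, 4)
print(C_blocks)

# Second outer power of A (an order-4 tensor)
A_pow2 = np.multiply.outer(A, A)
```

For two matrices, the partitioned layout coincides with what `np.kron(A, B)` returns; the explicit transpose is used here only to make the correspondence between modes and partitions visible.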
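The Khatri-Rao product is the matching columnwise Kronecker product of matrices. The Khatri-Rao product of a matrix of size $I \times N$ and a matrix of size $J \times N$ is a matrix of size $IJ \times N$, which may be interpreted as a tensor of size $I \times J \times N$. If $\mathbf{C} = \mathbf{A} \odot \mathbf{B}$, then $c[i, j, n] = a[i, n]\, b[j, n]$. This can be naturally extended to successive Khatri-Rao products of matrices: if $\mathbf{F} = \mathbf{A} \odot \mathbf{B} \odot \cdots \odot \mathbf{D}$, then $f[i, j, \ldots, \ell, n] = a[i, n]\, b[j, n] \cdots d[\ell, n]$ (the rows of the matrices, indexed here by $n$, must have the same dimension). In Mathematica, this product can be written Outer[Times, a, b, ..., d, 1], where the final 1 specifies the level at which the outer product is calculated.
For example,
\[
\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \odot \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix}
= \left(\begin{array}{cc}
1 \cdot 5 & 1 \cdot 6 \\
3 \cdot 7 & 3 \cdot 8 \\ \hline
2 \cdot 5 & 2 \cdot 6 \\
4 \cdot 7 & 4 \cdot 8
\end{array}\right)
= \left(\begin{array}{cc}
5 & 6 \\
21 & 24 \\ \hline
10 & 12 \\
28 & 32
\end{array}\right) , \tag{B.3}
\]
and
\[
\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \odot \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix} \odot \begin{pmatrix} 9 & 11 \\ 10 & 12 \end{pmatrix}
= \left(\begin{array}{cc|cc}
1 \cdot 5 \cdot 9 & 3 \cdot 7 \cdot 11 & 1 \cdot 6 \cdot 9 & 3 \cdot 8 \cdot 11 \\
1 \cdot 5 \cdot 10 & 3 \cdot 7 \cdot 12 & 1 \cdot 6 \cdot 10 & 3 \cdot 8 \cdot 12 \\ \hline
2 \cdot 5 \cdot 9 & 4 \cdot 7 \cdot 11 & 2 \cdot 6 \cdot 9 & 4 \cdot 8 \cdot 11 \\
2 \cdot 5 \cdot 10 & 4 \cdot 7 \cdot 12 & 2 \cdot 6 \cdot 10 & 4 \cdot 8 \cdot 12
\end{array}\right)
= \left(\begin{array}{cc|cc}
45 & 231 & 54 & 264 \\
50 & 252 & 60 & 288 \\ \hline
90 & 308 & 108 & 352 \\
100 & 336 & 120 & 384
\end{array}\right) . \tag{B.4}
\]
As before, the partitions indicate the resulting tensor's modes. The symbol $\odot^R$ denotes the $R$th Khatri-Rao power.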
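The elementwise definitions of the Khatri-Rao product translate directly into index notation. The sketch below uses `numpy.einsum` to build the order-3 and order-4 tensors of the two examples; NumPy and the variable names are assumptions made for illustration only.

```python
import numpy as np

A = np.array([[1, 3], [2, 4]])
B = np.array([[5, 7], [6, 8]])
D = np.array([[9, 11], [10, 12]])

# Two factors: C[i, j, n] = A[i, n] * B[j, n]
C = np.einsum('in,jn->ijn', A, B)            # shape (I, J, N)
C_matrix = C.reshape(-1, C.shape[-1])        # the same data as an IJ x N matrix

# Three factors: F[i, j, l, n] = A[i, n] * B[j, n] * D[l, n]
F = np.einsum('in,jn,ln->ijln', A, B, D)

# Partitioned layout: block row = i, block column = j,
# row within block = l, column within block = n.
F_blocks = F.transpose(0, 2, 1, 3).reshape(4, 4)
print(F_blocks)
```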
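The inner (dot) product is like the tensor product but additionally contracts (sums over the product of) the last index of the first tensor with the first index of the second tensor: if $\mathbf{C} = \mathbf{A} \bullet \mathbf{B}$, then $c[\ldots, i, j, \ell, m, \ldots] = \sum_k a[\ldots, i, j, k]\, b[k, \ell, m, \ldots]$ (the inner two modes of $\mathbf{A}$ and $\mathbf{B}$, indexed here by $k$, must have the same dimension). For an order-$R$ tensor and an order-$S$ tensor, this results in an order-$(R + S - 2)$ tensor. For example,
\[
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \bullet \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}
= 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32 , \tag{B.5}
\]
and
\[
\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \bullet \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix}
= \begin{pmatrix} 1 \cdot 5 + 3 \cdot 6 & 1 \cdot 7 + 3 \cdot 8 \\ 2 \cdot 5 + 4 \cdot 6 & 2 \cdot 7 + 4 \cdot 8 \end{pmatrix}
= \begin{pmatrix} 23 & 31 \\ 34 & 46 \end{pmatrix} . \tag{B.6}
\]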
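The inner product examples, and the order-$(R + S - 2)$ rule, can be illustrated with `numpy.tensordot`, which with `axes=1` contracts the last mode of its first argument with the first mode of its second. This is a sketch only; NumPy and the variable names are assumptions.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
A = np.array([[1, 3], [2, 4]])
B = np.array([[5, 7], [6, 8]])

uv = np.tensordot(u, v, axes=1)      # order 1 + 1 - 2 = 0 (a scalar)
AB = np.tensordot(A, B, axes=1)      # order 2 + 2 - 2 = 2 (the matrix product)

# An order-4 tensor contracted with an order-2 tensor gives order 4 + 2 - 2 = 4
T = np.multiply.outer(A, B)
TB = np.tensordot(T, B, axes=1)
print(int(uv), AB.tolist(), TB.ndim)
```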
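C
COMPUTATIONAL SIMPLIFICATION OF EXPECTATION TENSORS
The general form of the expectation tensors is, as shown in Section ,.,.,,
\[
x_e[j_1, j_2, \ldots, j_R] = \sum_{\substack{(i_1, \ldots, i_R) \in \mathcal{I}^R : \\ i_n \neq i_p}} \; \prod_{r=1}^{R} x[i_r, j_r] , \tag{C.1}
\]
which can be written in tensor notation as
\[
\mathbf{X}_e^{(R)} = \Bigl( \bigl( \mathbf{1}_{J^R} \otimes \mathbf{E}_{I^R} \bigr) \circ \bigl\langle \mathbf{X}^{\otimes R} \bigr\rangle_{(R+1,\, 1,\, R+2,\, 2,\, \ldots,\, R+R,\, R)} \Bigr) \bullet^R \mathbf{1}_{I^R} , \tag{C.2}
\]
where $\mathbf{1}_{J^R} \in \mathbb{R}^{J^R}$ is a tensor with $R$ modes, each of dimension $J$, all of whose elements are ones, the angle brackets denote the index permutation given by their subscript, $\bullet^R$ denotes the $R$th inner product, and $\mathbf{E}_{I^R} \in \mathbb{R}^{I^R}$ is the tensor of nonrepeated indices, whose elements are
\[
e[i_1, \ldots, i_R] = \begin{cases} 0 & \text{if } i_n = i_p \\ 1 & \text{otherwise.} \end{cases} \tag{C.3}
\]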
To understand the construction in (C.2), observe that the outer product $\mathbf{1}_{J^R} \otimes \mathbf{E}_{I^R}$ extends the tensor of nonrepeated indices into $R$ additional modes, each of dimension $J$. Since $\mathbf{X}$ is an $I \times J$ matrix, $\mathbf{X}^{\otimes R} \in \mathbb{R}^{I \times J \times I \times J \times \cdots \times I \times J}$ is of order $2R$. The index permutation reshapes $\mathbf{X}^{\otimes R}$ into an element of $\mathbb{R}^{J^R \times I^R}$. The Hadamard product with the permuted $\mathbf{X}^{\otimes R}$, therefore, sets all entries occurring at locations with repeated indices to zero. These are precisely the entries that are excluded from the summation (C.1). The $R$th inner product then sums over the $R$ different $I$-dimensional modes to collapse to the desired tensor in $\mathbb{R}^{J^R}$.
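The expression takes this form due to the constraints on which index values are summed over. Both forms (C.1) and (C.2) are cumbersome to calculate directly. Were there no constraint on which indices in (C.1) are summed over, (C.2) would take the form
\[
\bigl\langle \mathbf{X}^{\otimes R} \bigr\rangle_{(R+1,\, 1,\, R+2,\, 2,\, \ldots,\, R+R,\, R)} \bullet^R \mathbf{1}_{I^R} . \tag{C.4}
\]
This requires $(IJ)^R$ multiplications, but can be reduced to $J^R$ multiplications by rearranging it to
\[
\bigl( \mathbf{1}_I' \mathbf{X} \bigr)^{\otimes R} , \tag{C.5}
\]
where the prime denotes matrix transposition. This suggests an alternative way of calculating (C.2): sum all of the terms and then subtract the terms that should be excluded.
For example, consider the $R = 2$ case. The unconstrained term is $(\mathbf{1}_I' \mathbf{X})^{\otimes 2}$ and the term corresponding to the repeated indices is $(\mathbf{X}'^{\odot 2}) \bullet \mathbf{1}_I$, which simplifies to $\mathbf{X}'\mathbf{X}$, so
\[
\mathbf{X}_e^{(2)} = \bigl( \mathbf{1}_I' \mathbf{X} \bigr)^{\otimes 2} - \mathbf{X}'\mathbf{X} . \tag{C.6}
\]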
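The $R = 2$ simplification can be checked against a direct evaluation of the constrained sum. The following NumPy sketch compares a brute-force sum over all ordered pairs of distinct row indices with the outer product of the column sums minus $\mathbf{X}'\mathbf{X}$. The random test matrix, its size, and all names are assumptions chosen for illustration.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
I, J = 4, 3
X = rng.random((I, J))                       # arbitrary I x J test matrix

# Direct evaluation for R = 2: sum over ordered pairs with i1 != i2
direct = np.zeros((J, J))
for i1, i2 in permutations(range(I), 2):
    direct += np.outer(X[i1], X[i2])

# Simplified form: unconstrained term minus the repeated-index term
col_sums = X.sum(axis=0)                     # the row vector 1' X
simplified = np.outer(col_sums, col_sums) - X.T @ X

print(np.allclose(direct, simplified))
```

The brute-force loop visits $I(I - 1)$ index pairs, each costing $J^2$ multiplications, whereas the simplified form needs one $J \times J$ outer product and one matrix product.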
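The process for $R = 3$ is similar. The unconstrained term is $(\mathbf{1}_I' \mathbf{X})^{\otimes 3}$. There are three terms corresponding to the $i = j$ constraint, the $j = k$ constraint, and the $i = k$ constraint; each is equal to one of the transpositions of $(\mathbf{1}_I' \mathbf{X}) \otimes (\mathbf{X}'\mathbf{X})$. The term in which all three indices are equal is $(\mathbf{X}'^{\odot 3}) \bullet \mathbf{1}_I$. Combining these gives
\[
\begin{aligned}
\mathbf{X}_e^{(3)} ={}& \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \otimes \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \otimes \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \\
&- \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,2,3)}
 - \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(2,1,3)}
 - \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(3,1,2)} \\
&+ 2 \bigl( \mathbf{X}'^{\odot 3} \bigr) \bullet \mathbf{1}_I .
\end{aligned} \tag{C.7}
\]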
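The same kind of check can be made for $R = 3$. In the sketch below, each transposition of the pairwise-constraint term is written as an `einsum` whose output subscripts place the unconstrained mode in the first, second, or third position. The test matrix and all names are assumptions for illustration.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
I, J = 4, 3
X = rng.random((I, J))

# Direct evaluation for R = 3: sum over ordered triples of distinct indices
direct = np.zeros((J, J, J))
for i1, i2, i3 in permutations(range(I), 3):
    direct += np.einsum('a,b,c->abc', X[i1], X[i2], X[i3])

s = X.sum(axis=0)                            # 1' X
G = X.T @ X                                  # X' X
T3 = np.einsum('ia,ib,ic->abc', X, X, X)     # all three row indices equal

simplified = (
    np.einsum('a,b,c->abc', s, s, s)         # unconstrained term
    - np.einsum('a,bc->abc', s, G)           # second and third indices equal
    - np.einsum('b,ac->abc', s, G)           # first and third indices equal
    - np.einsum('c,ab->abc', s, G)           # first and second indices equal
    + 2 * T3                                 # correction for all three equal
)
print(np.allclose(direct, simplified))
```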
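An analogous procedure can be followed for any value of $R$, though this becomes increasingly difficult because the number of terms grows as $R!$. Each term represents a unique minimal set of different index constraints. For example, one term A might have the index constraints $i_1 = i_2$ and $i_3 = i_4$. Another term B might have no constraint on $i_1$ but have $i_2 = i_3 = i_4$. When the indices are ordered sequentially, the term can be calculated by writing each constraint as a subterm of the form
\[
\bigl( \mathbf{X}'^{\odot c} \bigr) \bullet \mathbf{1}_I , \tag{C.8}
\]
where $c$ is the number of indices in that constraint, and then taking the outer product of the different subterms. For instance, with index constraints A, (C.8) is
\[
\Bigl( \bigl( \mathbf{X}'^{\odot 2} \bigr) \bullet \mathbf{1}_I \Bigr) \otimes \Bigl( \bigl( \mathbf{X}'^{\odot 2} \bigr) \bullet \mathbf{1}_I \Bigr) ,
\]
which simplifies to $(\mathbf{X}'\mathbf{X}) \otimes (\mathbf{X}'\mathbf{X})$. With index constraints B, (C.8) is
\[
\bigl( \mathbf{X}' \bullet \mathbf{1}_I \bigr) \otimes \Bigl( \bigl( \mathbf{X}'^{\odot 3} \bigr) \bullet \mathbf{1}_I \Bigr) ,
\]
which simplifies to $(\mathbf{1}_I' \mathbf{X}) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr)$. The permutation of the indices in the constraints of a term is given by the corresponding permutation of that term's tensor. For example, the term with constraints $i_1 = i_3$ and $i_2 = i_4$ (a permutation of A) is represented by $\bigl\langle (\mathbf{X}'\mathbf{X}) \otimes (\mathbf{X}'\mathbf{X}) \bigr\rangle_{(1,3,2,4)}$, while the term with no constraint on $i_2$ and $i_1 = i_3 = i_4$ (a permutation of B) is represented by $\bigl\langle (\mathbf{1}_I' \mathbf{X}) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr) \bigr\rangle_{(2,1,3,4)}$.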
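For example, the $R! = 24$ terms for the $R = 4$ case can be written
\[
\begin{aligned}
\mathbf{X}_e^{(4)} ={}& \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \otimes \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \otimes \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \otimes \bigl( \mathbf{1}_I' \mathbf{X} \bigr) \\
&- \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,2,3,4)}
 - \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,3,2,4)} \\
&- \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,4,2,3)}
 - \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(2,3,1,4)} \\
&- \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(2,4,1,3)}
 - \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{1}_I' \mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(3,4,1,2)} \\
&+ 2 \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr) \bigr\rangle_{(1,2,3,4)}
 + 2 \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr) \bigr\rangle_{(2,1,3,4)} \\
&+ 2 \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr) \bigr\rangle_{(3,1,2,4)}
 + 2 \bigl\langle ( \mathbf{1}_I' \mathbf{X} ) \otimes \bigl( ( \mathbf{X}'^{\odot 3} ) \bullet \mathbf{1}_I \bigr) \bigr\rangle_{(4,1,2,3)} \\
&+ \bigl\langle ( \mathbf{X}'\mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,2,3,4)}
 + \bigl\langle ( \mathbf{X}'\mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,3,2,4)}
 + \bigl\langle ( \mathbf{X}'\mathbf{X} ) \otimes ( \mathbf{X}'\mathbf{X} ) \bigr\rangle_{(1,4,2,3)} \\
&- 6 \bigl( \mathbf{X}'^{\odot 4} \bigr) \bullet \mathbf{1}_I .
\end{aligned} \tag{C.9}
\]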
While expressions like (C.7) and (C.9) are harder to visualize than the more compact form (C.2), they can be calculated more efficiently: the unsimplified form has $O\bigl( (IJ)^R \bigr)$ multiplications, the simplified form has $O\bigl( I J^R \bigr)$ (a ratio of $1 : I^{R-1}$). Such simplifications are key in being able to calculate the practical examples of Section ,.o, of the main text, some of which use large values for $I$ (oa in Ex. ,.o.a, and , in Ex. ,.o.,).
D
FORMAL SPECIFICATION OF THE MELODIC AFFINITY MODEL
In this appendix, I provide a full mathematical definition of the models, and the form of the empirical data, that are presented in Chapter ,. Each of the mathematical steps is described in a more verbal manner in Sections ,.., ,..a, and ,.,.a. The MATLAB function Affinity_model_final.m that implements these equations can be downloaded from http://www.dynamictonality.com/melodic_affinity_files/.
In the experiment, there were 44 participants, and each participant listened to 60 different stochastically generated melodies. The melodies were in one of eleven different microtonal tunings (the pitch intervals were different to those found in standard Western music). Furthermore, there were eleven different timbres used (each timbre had differently tuned partials, that is, overtones or frequency components). The microtonal scales are indexed by $m \in \{1, 2, \ldots, 11\}$, and the timbres are indexed by $n \in \{1, 2, \ldots, 11\}$.
Each observation involved the participant listening to a single melody in tuning $m$ played with two different timbres $n_1$ and $n_2$. The participant chose the timbre in which the melody's notes had the greater affinity (fitted the best). In all, 110 different stimuli (i.e., different values of the tuple $(m, n_1, n_2)$) were tested, but each participant listened to a randomly selected (uniform distribution without replacement) subset of 60 of these. A choice of $n_1$ was coded 1, a choice of $n_2$ was coded 0, and missing data was coded NaN; this results in a data matrix $\mathbf{Y} \in \{0, 1, \mathrm{NaN}\}^{110 \times 44}$, each column containing 50 randomly located NaNs.
For timbre $n$, let $\mathbf{a}(n) \in \mathbb{Z}^{12}$ be a vector of the pitches (log-frequencies) of the first twelve partials, indexed by $i$. The units of pitch are cents (1200th of an octave) above the first partial; the pitch of the first partial $a_1(n)$ is, therefore, always 0.
Let $\mathbf{w} \in \mathbb{R}^{12}$ be a vector of weights for the above 12 partials. The weights are the same, irrespective of the timbre $n$. The elements of $\mathbf{w}$ are also indexed by $i$, and their values are parameterized by a roll-off value $\rho \in \mathbb{R}$, so that
\[
w_i(\rho) = i^{-\rho} \qquad i = 1, \ldots, 12 . \tag{D.1}
\]
This means that when $\rho = 0$, all partials have a weight of 1; as $\rho$ increases, the weights of the higher partials are reduced.
The partials (their pitches and weights) are embedded in a cents domain indicator vector $\mathbf{b}(n; \rho) \in \mathbb{R}^{6000}$ whose elements are indexed by $j$:
\[
b_j(n; \rho) = \sum_{i=1}^{12} w_i(\rho)\, \delta\bigl[ j - 1 - a_i(n) \bigr] \qquad j = 1, \ldots, 6000 , \tag{D.2}
\]
where $\delta[z]$ is the Kronecker delta function, which equals 1 when $z = 0$, and equals 0 when $z \neq 0$. This equation means that the vector $\mathbf{b}(n; \rho)$ is all zeros except for twelve elements: for $i = 1$ to $12$, its $(a_i(n) + 1)$th element has a value of $w_i$.
The twelve delta "spikes" in $\mathbf{b}(n; \rho)$ are smeared by non-circular convolution with a discrete Gaussian kernel $\mathbf{c}(\sigma)$ parameterized with a standard deviation $\sigma \in [0, \infty)$ to give a spectral pitch vector $\mathbf{d}(n; \rho, \sigma) \in \mathbb{R}^{6000}$, which is indexed by $k$:
\[
d_k(n; \rho, \sigma) = \sum_{j=1}^{6000} b_j(n; \rho)\, c_{k-j+1}(\sigma) \qquad k = 1, \ldots, 6000 , \tag{D.3}
\]
where $c_z = 0$ if $z < 1$. For the sake of computational efficiency, the method used here differs slightly from that described in Ch. , where every partial is embedded into a separate vector and separately convolved before being summed into a final spectral pitch vector. The spectral pitch vectors resulting from the two methods differ slightly only when the smeared pitches overlap in frequency. This would occur only when two partials are close in frequency, which is not the case for the timbres used in this experiment.
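The three steps above (roll-off weights, embedding as spikes, Gaussian smearing) can be sketched as follows. This is not the downloadable MATLAB function: it is a hedged NumPy illustration in which the function names, the harmonic test timbre, and in particular the kernel (a centred Gaussian truncated at four standard deviations and normalized to sum to one) are assumptions; the kernel indexing used in the original implementation may differ.

```python
import numpy as np

def harmonic_partials_cents(n_partials=12):
    """Pitches, in cents above the first partial, of a harmonic timbre."""
    return np.rint(1200 * np.log2(np.arange(1, n_partials + 1))).astype(int)

def spectral_pitch_vector(partial_cents, rho, sigma, length=6000):
    """Weight, embed, and smear a set of partials (0-based cents indices)."""
    a = np.asarray(partial_cents, dtype=int)
    i = np.arange(1, len(a) + 1)
    w = i ** (-float(rho))                   # roll-off weights, i^(-rho)
    b = np.zeros(length)
    np.add.at(b, a, w)                       # spikes at the partials' pitches
    if sigma == 0:
        return b                             # no smearing
    half = int(np.ceil(4 * sigma))           # assumed truncation of the kernel
    z = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (z / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(b, kernel, mode='same')   # non-circular convolution

harmonic = harmonic_partials_cents()
b = spectral_pitch_vector(harmonic, rho=1.0, sigma=0.0)
d = spectral_pitch_vector(harmonic, rho=1.0, sigma=6.0)
```

Embedding all twelve partials in one vector and convolving once, as here, corresponds to the efficiency shortcut described in the text.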
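This vector is cross-correlated with the spectral pitch vector of a harmonic template $\mathbf{d}(n_c; \rho, \sigma)$ (one with partials whose frequencies are integer multiples of the first partial) to produce a virtual pitch vector $\mathbf{g}(n; \rho, \sigma) \in \mathbb{R}^{11999}$, which is indexed by $\ell$:
\[
g_\ell(n; \rho, \sigma) = \sum_{k=1}^{6000} d_k(n_c; \rho, \sigma)\, d_{k+\ell-6000}(n; \rho, \sigma) \qquad \ell = 1, \ldots, 11999 , \tag{D.4}
\]
where $d_x = 0$ if $x < 1$ or $x > 6000$.
The harmonicity $h(n; \rho, \sigma) \in \mathbb{R}$ of the spectral pitch vector $\mathbf{d}(n; \rho, \sigma)$ is given by the maximum value of the above cross-correlation:
\[
h(n; \rho, \sigma) = \max\bigl( \mathbf{g}(n; \rho, \sigma) \bigr) . \tag{D.5}
\]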
The spectral pitch similarity $s(n, u) \in (0, 1)$ of two tones with timbre $n$ making an interval of size $u$ cents is given by
\[
s(n, u; \rho, \sigma) = \frac{\sum_k d_k(n; \rho, \sigma)\, d_{k-u}(n; \rho, \sigma)}{\sum_k \bigl( d_k(n; \rho, \sigma) \bigr)^2} . \tag{D.6}
\]
This equation gives the cosine of the angle between the vector $\mathbf{d}(n; \rho, \sigma)$ and a transformation of itself that is shifted $u$ elements to the right, and serves as a similarity measure between them.
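A small sketch of this shifted similarity follows. It assumes a non-negative integer shift and zero-fills the elements that are shifted in from the left; the function name and the toy vector are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def shifted_similarity(d, u):
    """Similarity of d with a copy of itself shifted u elements to the right."""
    d = np.asarray(d, dtype=float)
    shifted = np.zeros_like(d)
    if u < len(d):
        shifted[u:] = d[:len(d) - u]     # shifted[k] = d[k - u], zero-filled
    return float(d @ shifted / (d @ d))

toy = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(shifted_similarity(toy, 0), shifted_similarity(toy, 2))
```

With the toy vector, a shift of two elements aligns one of the two spikes with the other, so half of the squared magnitude overlaps.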
The probability of an interval of size $u$ cents occurring between two successive tones, given microtonal tuning $m$, is denoted $p_U(u \mid m)$. As described in Section ,.a.a, this probability distribution has eight independent parameters, which were constant across all stimuli. This implies that the expected spectral pitch similarity $s(m, n; \rho, \sigma)$ of successive tones with timbre $n$, given a microtonal tuning of $m$, is
\[
\begin{aligned}
s(m, n; \rho, \sigma) &= \mathrm{E}_u\bigl[ s(n, u; \rho, \sigma) \mid m \bigr] \\
&= \sum_u p_U(u \mid m)\, s(n, u; \rho, \sigma) .
\end{aligned} \tag{D.7}
\]
Similarly, the virtual pitch similarity $v(n, u) \in (0, 1)$ of two tones with timbre $n$ making an interval of size $u$ cents is given by
\[
v(n, u; \rho, \sigma) = \frac{\sum_k g_k(n; \rho, \sigma)\, g_{k-u}(n; \rho, \sigma)}{\sum_k \bigl( g_k(n; \rho, \sigma) \bigr)^2} , \tag{D.8}
\]
and the expected virtual pitch similarity $v(m, n; \rho, \sigma)$ of successive tones with timbre $n$, given a microtonal tuning of $m$, is
\[
\begin{aligned}
v(m, n; \rho, \sigma) &= \mathrm{E}_u\bigl[ v(n, u; \rho, \sigma) \mid m \bigr] \\
&= \sum_u p_U(u \mid m)\, v(n, u; \rho, \sigma) .
\end{aligned} \tag{D.9}
\]
From (D.5), (D.7), and (D.9), I construct three predictors for the probability of choosing timbre $n_1$ given $(m, n_1, n_2)$:
\[
f_S(m, n_1, n_2; \rho, \sigma) = s(m, n_1; \rho, \sigma) - s(m, n_2; \rho, \sigma) \tag{D.10}
\]
\[
f_V(m, n_1, n_2; \rho, \sigma) = v(m, n_1; \rho, \sigma) - v(m, n_2; \rho, \sigma) \tag{D.11}
\]
\[
f_H(n_1, n_2; \rho, \sigma) = h(n_1; \rho, \sigma) - h(n_2; \rho, \sigma) . \tag{D.12}
\]
Three models, indexed by $i$, of the experimentally obtained data (the probabilities of choosing timbre $n_1$ for 110 different values of $(m, n_1, n_2)$) were created from combinations of these predictors and differing parameterizations:
\[
\begin{aligned}
\text{model } i &= P\bigl( Y = 1 \mid m, n_1, n_2; i \bigr) = \frac{1}{1 + e^{-z_i}} , \quad \text{where} \\
z_1 &= \beta_1 f_S(m, n_1, n_2; \rho_S, \sigma_S) + \beta_2 f_H(n_1, n_2; \rho_H, \sigma_H) \\
z_2 &= \beta_1 f_S(m, n_1, n_2; \rho, \sigma) + \beta_2 f_H(n_1, n_2; \rho, \sigma) \\
z_3 &= \beta_1 f_V(m, n_1, n_2; \rho, \sigma) + \beta_2 f_H(n_1, n_2; \rho, \sigma) .
\end{aligned} \tag{D.13}
\]
The overall form is a logistic model because the data are probabilities. A constant term is not used because (D.10)–(D.12) imply that if $n_1 = n_2$, then $f_S = f_V = f_H = 0$.
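The logistic form without a constant term can be sketched in a few lines. The function below is an illustration only: its name, the coefficient values, and the predictor values are hypothetical, and the predictors are assumed to be the differences between the two timbres described above.

```python
import math

def choice_probability(predictors, coefficients):
    """Logistic probability of choosing the first timbre, with no constant term.

    predictors: difference scores (hypothetical values for illustration)
    coefficients: one weight per predictor (hypothetical values)
    """
    z = sum(beta * f for beta, f in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))

# Identical timbres give zero differences, hence a probability of exactly 0.5
p_same = choice_probability([0.0, 0.0], [2.0, 3.0])

# Swapping the two timbres negates every predictor and complements the probability
p = choice_probability([0.2, -0.1], [1.5, 0.5])
p_swapped = choice_probability([-0.2, 0.1], [1.5, 0.5])
print(p_same, p, p_swapped)
```

Omitting the constant term is what guarantees the chance-level prediction for identical timbres and the complementary prediction when the presentation order is swapped.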
E
CROSS-VALIDATION CORRELATION AND ROOT MEAN SQUARED ERROR
For each of the models discussed in Chapter , I performed twenty runs of 12-fold cross-validation of the models. Each of the twenty runs utilizes a different 12-fold partition of the probe tone data, each fold containing 2 samples. Within each run, one fold is removed and denoted the validation set; the remaining folds are aggregated and denoted the training set. The parameters of the model are optimized to minimize the sum of squared errors between the model's predictions and the 22 samples in the training set. Cross-validation statistics, which measure the fit of the predictions to the validation set, are then calculated. This whole process is done for all twelve folds and this constitutes a single run of the 12-fold cross-validation. The same process is used for all twenty runs of the 12-fold cross-validation, each run using a different 12-fold partition of the data. The cross-validation statistics are averaged over all twelve folds in all twenty runs.
More formally: let the data set of $I$ samples be partitioned into $K$ folds (the probe tone data comprise 24 values, so $I = 24$, and I use 12-fold cross-validation, so $K = 12$). Let $k[i]$ be the fold of the data containing the $i$th sample. The cross-validation is repeated, each time with a different $K$-fold partition, a total of $J$ times. The cross-validation correlation of the $j$th run of the cross-validation is given by
\[
r_{\mathrm{CV}}[j] = \sqrt{ 1 - \sum_{i=1}^{I} \bigl( y_i - \hat{y}_i^{\backslash k[i]} \bigr)^2 \Big/ \sum_{i=1}^{I} \bigl( y_i - \bar{y} \bigr)^2 } , \tag{E.1}
\]
where $\hat{y}_i^{\backslash k[i]}$ denotes the fitted value for the $i$th sample returned by the model estimated with the $k[i]$th fold of the data removed, and $\bar{y}$ is the mean of all the sample values $y_i$. The final cross-validation correlation statistic is the mean over the $J$ runs of the cross-validation (in our analysis, $J = 20$):
\[
r_{\mathrm{CV}} = \frac{1}{J} \sum_{j=1}^{J} r_{\mathrm{CV}}[j] . \tag{E.2}
\]
The root mean squared error of the $j$th run of the cross-validation is given by
\[
\mathrm{RMSECV}[j] = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \bigl( y_i - \hat{y}_i^{\backslash k[i]} \bigr)^2 } , \tag{E.3}
\]
where $\hat{y}_i^{\backslash k[i]}$ denotes the fitted value for the $i$th sample returned by the model estimated with the $k[i]$th fold of the data removed. The final root mean squared error of the cross-validation statistic is the mean over the $J$ runs of the cross-validation:
\[
\mathrm{RMSECV} = \frac{1}{J} \sum_{j=1}^{J} \mathrm{RMSECV}[j] . \tag{E.4}
\]
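The two statistics can be sketched as below, given the observed values and, for each run, the out-of-fold fitted values. This is an illustration under stated assumptions: the function name and toy data are hypothetical, the cross-validation correlation is assumed to be the square root of one minus the ratio of the two sums of squares, and a negative value under the root (a model worse than the mean) is clipped to zero, which is a choice made here rather than something specified in the text.

```python
import math

def cv_statistics(y, fitted_per_run):
    """Mean cross-validation correlation and RMSECV over several runs.

    y: observed sample values
    fitted_per_run: one list per run of out-of-fold fitted values, where each
        value comes from a model estimated without that sample's fold
    """
    n = len(y)
    mean_y = sum(y) / n
    total_ss = sum((v - mean_y) ** 2 for v in y)
    r_runs, rmse_runs = [], []
    for fitted in fitted_per_run:
        error_ss = sum((v - f) ** 2 for v, f in zip(y, fitted))
        # Assumed form: sqrt(1 - SSE / SST), clipped at zero
        r_runs.append(math.sqrt(max(0.0, 1.0 - error_ss / total_ss)))
        rmse_runs.append(math.sqrt(error_ss / n))
    return sum(r_runs) / len(r_runs), sum(rmse_runs) / len(rmse_runs)

y = [1.0, 2.0, 3.0, 4.0]
runs = [[1.0, 2.0, 3.0, 4.0],      # a perfect run
        [1.0, 2.0, 3.0, 5.0]]      # one sample mispredicted by 1
r_cv, rmse_cv = cv_statistics(y, runs)
print(r_cv, rmse_cv)
```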
F
FORMAL SPECIFICATION OF THE PROBE TONE DATA MODEL
In this appendix, I give a formal mathematical specification of the probe tone model described in Chapter . The techniques used are based on those introduced in Chapter ,. The MATLAB routines that embody these techniques can be downloaded from http://www.dynamictonality.com/probe_tone_files/.
Let a chord comprising $M$ tones, each of which contains $N$ partials, be represented by the matrix $\mathbf{X}_f \in \mathbb{R}^{M \times N}$. Each row of $\mathbf{X}_f$ represents a tone in the chord, and each element of the row is the frequency of a partial of that tone. In our model, I use the first twelve partials (so $N = 12$); this means that, if $\mathbf{X}_f$ is a three-tone chord, it will be a $3 \times 12$ matrix.
The first step is to convert the partials' frequencies into pitch class cents values:
\[
x_{\mathrm{pc}}[m, n] = \bigl\lfloor 1200 \log_2 \bigl( x_f[m, n] / x_{\mathrm{ref}} \bigr) \bigr\rceil \bmod 1200 , \tag{F.1}
\]
where $\lfloor \cdot \rceil$ is the nearest integer function, and $x_{\mathrm{ref}}$ is an arbitrary reference frequency (e.g., the frequency of middle C). These values are then collected into a single pitch class vector denoted $\mathbf{x}_{\mathrm{pc}} \in \mathbb{Z}^{12M}$ indexed by $i$ such that $x_{\mathrm{pc}}[m, n] \mapsto x_{\mathrm{pc}}[i]$, where $i = (m - 1)N + n$.
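The conversion to pitch class cents and the row-by-row flattening can be sketched with the standard library alone. The function name, the reference frequency of 1, and the just-intonation triad used as test data are assumptions for illustration; note that Python's `round` resolves exact ties to the nearest even integer, which matters only for values falling exactly halfway between two cents.

```python
import math

def pitch_class_cents(freq, ref):
    """Nearest-integer cents above `ref`, reduced modulo 1200."""
    return round(1200 * math.log2(freq / ref)) % 1200

# Hypothetical three-tone chord (frequency ratios 1 : 5/4 : 3/2),
# each tone having 12 harmonic partials
fundamentals = [1.0, 1.25, 1.5]
X_f = [[f0 * n for n in range(1, 13)] for f0 in fundamentals]

# Row-major flattening: 1-based (m, n) maps to i = (m - 1) * N + n
x_pc = [pitch_class_cents(freq, 1.0) for row in X_f for freq in row]
print(len(x_pc), x_pc[0], x_pc[12], x_pc[24])
```

Python's `%` operator returns a non-negative result for a positive modulus, so partials below the reference frequency are also mapped into the range 0 to 1199.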
Let each of the partials have an associated weight $x_w[m, n]$, which represents their salience, or probability of being heard. I test three models: a, b, and c. Given model $\ell$, where $\ell \in \{\mathrm{a}, \mathrm{b}, \mathrm{c}\}$ denotes the model, the saliences of the tonic triad are parameterized by a roll-off value $\rho \in \mathbb{R}$, and a chord-degree weighting value $\omega \in [0, 1]$, so that
\[
x_w[m, n] = \omega^{[m \notin R_\ell]}\, n^{-\rho} \qquad m = 1, \ldots, M, \text{ and } n = 1, \ldots, 12 , \tag{F.2}
\]
where $[m \notin R_\ell]$
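\[
x_{\mathrm{pcr}}[i, k] = \sum_{j=0}^{1199} x_{\mathrm{pcs}}[i, j]\, g\bigl[ (k - j) \bmod 1200 \bigr] \qquad i = 1, \ldots, 12N, \text{ and } k = 0, \ldots, 1199 . \tag{F.5}
\]
In this implementation, I make use of the circular convolution theorem, which allows (F.5) to be calculated efficiently with fast Fourier transforms; that is, $\mathbf{f} \ast \mathbf{g} = \mathcal{F}^{-1}\bigl( \mathcal{F}(\mathbf{f}) \circ \mathcal{F}(\mathbf{g}) \bigr)$, where $\ast$ is circular convolution, $\mathcal{F}$ denotes the Fourier transform, $\circ$ is the Hadamard (elementwise) product, and $\mathbf{f}$ stands for $\mathbf{x}_{\mathrm{pcs}}[i]$.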
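The circular convolution theorem can be demonstrated with NumPy's FFT routines. In this sketch the function names and the construction of the kernel (a Gaussian of circular distance on a 1200-cent circle, normalized to sum to one) are assumptions made for illustration, and the width of 6 cents is an arbitrary example value; the original routines may define the kernel differently.

```python
import numpy as np

def circular_convolve(f, g):
    """Circular convolution computed via the convolution theorem."""
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

def wrapped_gaussian(sigma, size=1200):
    """Discrete Gaussian of circular distance, normalized to sum to 1."""
    k = np.arange(size)
    distance = np.minimum(k, size - k)       # distance from 0 around the circle
    g = np.exp(-0.5 * (distance / sigma) ** 2)
    return g / g.sum()

spike = np.zeros(1200)
spike[0] = 1.0                               # a single partial at pitch class 0
smeared = circular_convolve(spike, wrapped_gaussian(6.0))

# Cross-check the FFT route against the defining sum on a small example
rng = np.random.default_rng(1)
f_small, g_small = rng.random(8), rng.random(8)
direct = np.array([sum(f_small[j] * g_small[(k - j) % 8] for j in range(8))
                   for k in range(8)])
print(np.allclose(circular_convolve(f_small, g_small), direct))
```

Because the convolution is circular, the spike at pitch class 0 is smeared symmetrically onto both pitch class 1 and pitch class 1199.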
Equation (F.5) can be interpreted as adding random noise (with a Gaussian distribution) to the original pitch classes in $\mathbf{X}_{\mathrm{pcs}}$, thereby simulating perceptual pitch uncertainty. The standard deviation $\sigma$ of the Gaussian distribution models the pitch difference limen (just noticeable difference; App. A). In laboratory experiments with sine waves, the pitch difference limen is approximately , cents in the central range of frequency (Moore, 1973; Moore et al., 1984). We would expect the pitch difference limen in the more distracting setting of listening to music to be somewhat wider. Indeed, the value of $\sigma$ was optimized, with respect to the probe tone data, at approximately 6 cents.
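Each element $x_{\mathrm{pcr}}[i, k]$ of this matrix models the probability of the $i$th partial in $\mathbf{x}_{\mathrm{pc}}$ being heard at pitch class $k$. In order to summarize the responses to all the pitches, I take the column sum, which gives a vector of the expected numbers of partials heard at pitch class $k$. This 1200-element row vector is denoted a spectral pitch class vector $\mathbf{x}$:
\[
\mathbf{x} = \mathbf{1}' \mathbf{X}_{\mathrm{pcr}} , \tag{F.6}
\]
where $\mathbf{1}$ is a column vector of $12N$ ones, and $'$ denotes matrix transposition.
The similarity of two spectral pitch class vectors $\mathbf{x}$ and $\mathbf{y}$ is the cosine of the angle between them:
\[
s(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}\mathbf{y}'}{\sqrt{(\mathbf{x}\mathbf{x}')(\mathbf{y}\mathbf{y}')}} . \tag{F.7}
\]
This similarity value falls between 0 and 1, where 1 implies the two vectors are parallel, and 0 implies they are orthogonal.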
I use this model to establish the similarities of a variety of probes with respect to a context. Let the context be represented by the spectral pitch class vector $\mathbf{x}$, and let the $P$ different probes $\mathbf{y}_p$ be collected into a matrix of spectral pitch class vectors denoted $\mathbf{Y} \in \mathbb{R}^{P \times 1200}$. The column vector of $P$ similarities between each of the probes and the context is then denoted $\mathbf{s}(\mathbf{x}, \mathbf{Y}) \in \mathbb{R}^{P}$. For example, the context may be a major triad built from HCTs and the probes may be single HCTs at the twelve chromatic pitches. In this case, the thirty-six harmonics from the context (12 partials for each of the three different chord tones) are embedded into a single spectral pitch class vector $\mathbf{x}$, as described in (F.1)–(F.6). The 12 harmonics of each of the twelve differently pitched probe tones are embedded into twelve spectral pitch class vectors $\mathbf{y}_p$. The similarities of the context and the twelve probes are calculated, as described in (F.7), to give the vector of their similarities $\mathbf{s}(\mathbf{x}, \mathbf{Y})$.
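Models a, b, and c can now be summarized in mathematical form. Let the vector of probe tone data for both contexts be denoted $\mathbf{d} \in \mathbb{R}^{24}$; let the vector of associated modelled similarities be denoted $\mathbf{s}(\mathbf{x}, \mathbf{Y}; \rho, \sigma, \omega, \ell) \in \mathbb{R}^{24}$, where $\rho$, $\sigma$, $\omega$ are the roll-off, smoothing, and chord degree weighting parameters discussed above, and $\ell \in \{\mathrm{a}, \mathrm{b}, \mathrm{c}\}$ denotes the model; let $\mathbf{1}$ be a column vector of 24 ones:
\[
\mathbf{d} = \alpha \mathbf{1} + \beta\, \mathbf{s}(\mathbf{x}, \mathbf{Y}; \rho, \sigma, \omega, \ell) + \boldsymbol{\epsilon} , \tag{F.8}
\]
where $\alpha$ and $\beta$ are the linear intercept and slope parameters, and $\boldsymbol{\epsilon}$ is a vector of 24 unobserved errors that captures unmodelled effects or random noise.
Each model's parameter values were optimized, iteratively, to minimize the sum of squared residuals between the model's predictions and the empirical data; that is, the optimized parameter values for model $\ell$ are given by
\[
(\hat{\alpha}, \hat{\beta}, \hat{\rho}, \hat{\sigma}, \hat{\omega})[\ell]
= \operatorname*{arg\,min}_{\alpha, \beta, \rho, \sigma, \omega}
\bigl( \mathbf{d} - \alpha \mathbf{1} - \beta\, \mathbf{s}(\rho, \sigma, \omega, \ell) \bigr)'
\bigl( \mathbf{d} - \alpha \mathbf{1} - \beta\, \mathbf{s}(\rho, \sigma, \omega, \ell) \bigr) , \tag{F.9}
\]
where $\operatorname{arg\,min} f(\theta)$ returns the value of $\theta$ that minimizes the value of $f(\theta)$.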