hfgc

© All Rights Reserved

Als PDF, TXT **herunterladen** oder online auf Scribd lesen

1 Aufrufe

hfgc

© All Rights Reserved

Als PDF, TXT **herunterladen** oder online auf Scribd lesen

- Segmentation of Brain Tumour from MRI image – Analysis of K-means and DBSCAN Clustering
- A Hybrid Particle Swarm Optimization Algorithm to Human Skin Region Detection
- PhotoSketch: Internet Image Montage
- Barney Kessel Wisdom
- Aspect Ratio Processing DTV HDTV
- SEGMENTATION OF HISTOPATHOLOGICAL IMAGES USING FAST FUZZY C-MEANS APPROACH
- Digital Image Processing
- gould1964a.pdf
- Analyzing Tunes Par Ti
- A DADGAD CHRISTMAS ( Doug Young ).pdf
- The Study of Harmony - Diether de la Motte.pdf
- Partiture - Walking Bass Introduction
- Concept Checklist _Nat 3, 4, 5, Higher & Adv Higher
- TechTipConnectingChordChangesSmoothly.pdf
- artofaccompanyin00linduoft
- analyzing_11th_13th_chords.pdf
- Contour Detection and Hierarchical Image Segmentation
- Fdp on Im & Dm Using Ost
- Prac_6
- ML Project

Sie sind auf Seite 1von 6

STMS IRCAM-CNRS-UPMC

johan.pauwels@ircam.fr, florian.kaiser@ircam.fr, geoffroy.peeters@ircam.fr

curring patterns in the feature sequence cause a perception

This paper describes a novel way to combine a well-proven of a higher structure. In a self-similarity matrix this be-

method of structural segmentation through novelty detec- comes visible as stripes on the off-diagonals [4]. Novelty-

tion with a recently introduced method based on harmonic based systems try to identify transitions between two con-

analysis. The former system works by looking for peaks in trasting parts, which are also perceived as structural bound-

novelty curves derived from self-similarity matrices. The aries by humans [1]. Initially, these methods were the

latter relies on the detection of key changes and on the dif- dual approach of homogeneity-based methods [10], as one

ferences in prior probability of chord transitions according was looking exclusively for transitions between two dis-

to their position in a structural segment. Both approaches tinct sections that are similar according to some musical

are integrated into a probabilistic system that jointly esti- property [3]. Recently however, this approach has been

mates keys, chords and structural boundaries. The nov- extended to also include contrast between a homogeneous

elty curves are herein used as observations. In addition, and a non-homogeneous section [6].

chroma profiles are used as features for the harmony analy- The method we propose builds upon the novelty-based

sis. These observations are then subjected to a constrained method of [3], but integrates this with a novel approach

transition model that is musically motivated. An informa- that is based on the estimated harmony of the piece [16].

tion theoretic justification of this model is also given. Fi- Previous efforts of deriving a structural segmentation from

nally, an evaluation of the resulting system is performed. It harmony have mostly been concerned with using chroma

is shown that the combined system improves the results of features for the construction of a self-similarity matrix, in-

both constituting components in isolation. stead of or in addition to timbre-related features [5]. We

however work with a higher-level harmony description,

1. INTRODUCTION in the form of key and chord estimates. In this sense,

our approach is somewhat similar to previous systems by

Structural segmentation of music is the process in Maddage [11] or Lee [8], but in contrast to their systems,

which an audio recording is divided into a number of ours works simultaneously, not sequentially. They first ex-

non-overlapping sections that correspond to the macro- tract key and chord estimates, which they subsequently use

temporal organisation of a piece. These entities usually as inputs for a structure estimation. A sequential system

take the form of verses and choruses in popular music, or can also be constructed the other way around, using an es-

of movements in classical music. The obtained sections timate of the structure as input to aid with chord estima-

can then be used for interactive listening, audio summa- tion. An example of this kind is the method of Mauch [12].

rization, synchronization or as an intermediate step in fur- We, on the other hand, construct a probabilistic system that

ther content-based indexing. jointly estimates keys, chords and structural boundaries. It

Traditional approaches to structural segmentation have is based on the assumption that some chord combinations

been categorized into three categories [14]: repetition- are more common around structural boundaries. This is es-

based, novelty-based and homogeneity-based methods. A pecially clear when they are expressed as relative chords in

mid-level representation, called self-similarity matrix, is a key, as this gives a musicologically richer representation.

often used for these task. It is obtained from the feature se- These relative chord combinations will then be used as evi-

quence by comparing each instance with all time-delayed dence for structural boundaries, together with the positions

copies of itself according to some similarity measure. The of key changes and peaks in the novelty measure.

result is a visualisation of the musical structure. It was

In the remainder of this paper, we’ll first give an out-

originally introduced into the music domain by Foote [2].

line of our probabilistic system in Section 2.1. Then we’ll

go deeper into the details of how to integrate the harmony-

Permission to make digital or hard copies of all or part of this work for based and the novelty-based approach into this framework

personal or classroom use is granted without fee provided that copies are in Section 2.2, respectively Section 2.3. Afterwards, we

not made or distributed for profit or commercial advantage and that copies describe the experiments we performed and analyse the re-

bear this notice and the full citation on the first page. sults in Section 3. We conclude with some closing remarks

c 2013 International Society for Music Information Retrieval. and ideas for future work in Section 4.

2. A PROBABILISTIC SYSTEM FOR THE JOINT

ESTIMATION OF KEYS, CHORDS AND

STRUCTURAL BOUNDARIES

2.1 Overview

In this section we will describe the probabilistic system

that we propose for the determination of a structural seg-

mentation of a track, along with an estimation of the local Figure 1. An example annotated sequence with the three

keys and chords. As a starting point we use the system of structure dependent positions indicated

Pauwels et al. [15] for the simultaneous estimation of keys

and chords. It consists of an HMM in which each state

represents a combination of a key and a chord. We extend P (xt |st , kt , ct ) = P (yt |st , kt , ct ) P (zt |st , kt , ct ). The

it by letting each state q represent a structural position in final probability to be maximized will therefore be

addition to a key and a chord. A key k can take one of Nk T

values, a chord one of Nc values and the structural posi- Ŝ, K̂, Ĉ = arg max

Y

P (yt |st , kt , ct ) P (zt |st , kt , ct )

tions s can take one of two values: L which means that q is t=1

the last state of a structural segment or O which means that

P (st , kt , ct |st−1 , kt−1 , ct−1 )

it is not. Finally, we add a single state to handle the case

when no chord is being played, notably at the beginning Of these three terms, the chroma observation prob-

and end of a recording. In this state, the key will accord- ability P (yt |st , kt , ct ) and the transition probability

ingly take a “no-key” value and the structural position will P (st , kt , ct |st−1 , kt−1 , ct−1 ) will be set by the harmony-

take a value of s = R. In summary, q = (s, k, c) with based component of our system, while the novelty observa-

s ∈ {L, O} , k ∈ {K1 , . . . , KNk } , c ∈ {C1 , . . . , CNc } or tion probability P (zt |st , kt , ct ) will be set by the novelty-

q = (R, no-key, no-chord). based component.

Finding the most likely sequence of key, chord and We use different observations for both components to

structure labels given a sequence of observations X = capture different types of information, in hopes of them

{x1 , x2 , . . . , xT } then amounts to finding the state se- being complementary. For instance, a change of instru-

quence Q̂ = {q̂1 , q̂2 , . . . , q̂T } that optimally explains these mentation won’t be detected as a structure boundary by the

observations. Because the state variable q consists per harmony-based component, while the novelty-based com-

definition of the combination of a chord, key and struc- ponent won’t recognize a chord sequence that is typical for

ture variable, these three optimal sequences will always be an ending. The choice of their respective observations re-

jointly estimated. Afterwards, the structural boundaries Ŝ flects this.

can be derived from the optimal state sequence by inserting

a boundary for every transition from a state where s = L 2.2 The harmony-based component

to one where s = O, or from or to a state with s = R.

2.2.1 Motivation and information theoretical justification

The derivation of the optimal key K̂ and chord sequence

Ĉ from the latter is even more trivial. The basic premise upon which our approach is built, is that

By applying Bayes’ theorem, and further assuming the chord sequences, and more specifically chord pairs, exhibit

first order Markov property and independence of the ob- a different prior probability depending on their position

servations, we can rewrite the probability to be maximized with respect to structural boundaries. In addition to the

to specificity of these chord combinations, we also argue that

T the number of distinct chord pairs at the end of a segment

is lower compared to all possible chord sequences in the

Y

Ŝ, K̂, Ĉ = arg max P (xt |st , kt , ct )

t=1 middle of a structural segment. Structural boundaries seem

P (st , kt , ct |st−1 , kt−1 , ct−1 ) an implausible place to experiment with some less com-

mon chord combinations, established by well-known, dia-

The time-interval t here indicates the index of the interbeat tonic chord combinations. Real world examples that sup-

segments, where the beats are estimated by ircambeat [17]. port these statements are the observation that movements

The probabilities of this HMM will now be determined by in classical music typically end with one of a select num-

the combination of 2 different components that are each ber of chord combinations, called cadences, or that musical

using their separate observations. The first component is section changes in jazz and blues are often preceded by so-

a method to estimate a structural segmentation simultane- called “turn-arounds”.

ously with the harmony of a piece. It uses chroma features. In order to verify these statements in a methodological

The second component determines structural boundaries way, we first identify three categories of chord pairs based

based on a novelty measure that is derived from timbral on their position with respect to structural boundaries. An

features. For clarity reasons, we introduce separate nota- example sequence for which these three categories are in-

tions for the chroma observations yt and for the novelty dicated can be found in Figure 1. The first class is named

measure zt (so xt = [yt zt ]). We will consider the nov- final and this contains the chord pairs that form the two last

elty observations independent of the chroma observations: chords of a structural segment. The second category we’ll

Isophonics Quaero C1′ and C2′ represent the collection of all relative chords

major minor major minor that appear as first, respectively second, element in the bi-

intra 6.17 6.25 4.29 4.98 grams. The model perplexity is defined as the exponential

inter 3.91 4.52 2.99 2.37 of the entropy H (C1′ , C2′ |m) expressed in nats:

final 2.91 3.51 2.36 3.18

P P (C1′ , C2′ |m) = exp (H (C1′ , C2′ |m))

Table 1. Perplexity of relative chord transition models per

X

mode according to structural position = exp − P (c′1 , c′2 |m) log P (c′2 |c′1 , m)

c′1 ,c′2

call inter and this consists of all chord pairs that straddle a X

= exp − P (c′1 |m)

structural boundary. The last class of chord pairs is called

c′1

intra and this includes all the remaining chord pairs, the

ones that occur in the beginning and middle of a structural X

segment. P (c′2 |c′1 , m) log P (c′2 |c′1 , m)

c′2

To reason about chord pairs in a musicologically more

informative way, we will interpret them as relative chords This expresses the mean prior uncertainty of a bigram as

interpreted in a key. This representation reflects more a function of its mode. A lower value means that the

closely the way scholars analyse harmonic movement. transition probability is concentrated into fewer combina-

Also in accordance to musicological analysis, both chords tions of two chords. The perplexities for both data sets

will be interpreted in the same key. Because of the for- can be found in Table 1 and as can be seen, they confirm

ward motion in music, this will be the key annotated at the our hypothesis: the values for the “intra”-model are indeed

time of the first chord. In mathematical notation, we will significantly higher than those for the “inter” and “final”-

define a key k as the combination of a tonic t and a mode model and this for both corpora.

m. A chord c is defined as the combination of a root r For the calculation of the transition probabilities in the

and a type p. Both t and r belong to one of the 12 different following paragraphs, we will make use of the information

pitch classes. We restrict ourselves in this paper to 2 modes captured in these structure dependent relative chord transi-

and 4 chord types (m ∈ {major, (natural)minor}, p ∈ tion models. The reasoning will therefore be reversed: in-

{maj, min, dim, aug}). We then define a relative chord stead of showing that a structural boundary often suggests

c′ with respect to a key k by expressing the root r as the a specific set of relative chord pairs, we’ll use the occur-

interval between the tonic and the root: i = d(t, r). There- rence of such relative chord combinations as evidence to

fore we can equivalently express a key-chord pair as a key- estimate structural boundaries.

relative chord pair (k, c) = (t, m, r, p) = (t, m, i, p) =

(k, c′ ). In order to take the inherent shift-invariance in 2.2.2 Transition probabilities

harmony analysis into account, we just ignore the tonic The transition probabilities P (st , kt , ct |st−1 , kt−1 , ct−1 )

of the key and only keep the mode. For sequences of 2 are calculated by a prior musicological model that con-

key-chord pairs, we end up with just a mode and a pair of sists of a number of submodels. By introducing some mu-

relative chords as a representation for the local harmony: sicologically motivated constraints to the transition prob-

mn , c′n , c′n+1 where n is the chord index. abilities, we want to enforce a number of relationships

We then construct relative chord transition models for between the concepts of keys, chords and structural seg-

each of the three structural categories by counting occur- ments. These will ensure that our estimation always pro-

rences of all successions of 2 consecutive relative chords duces sensible results and have as an added benefit that this

in a corpus that is annotated with keys, chords and struc- also speeds up the calculation. The first three constraints

tural segments. Sequences that do not appear in the data we impose are 1) a key change kt 6= kt−1 is only allowed

set are assigned a probability using Kneser-Ney smooth- to occur together with a chord change ct 6= ct−1 , 2) a struc-

ing [7]. The corpus should be annotated such that all the tural segment must contain at least two different chords (or

positions are indicated where at least one of key, chord or a single no-chord), 3) there must be a change in chord or

structural segment changes. We have two such data sets at in key between segments. These three limitations can be

our disposal. The first one is the publicly available “Iso- easily enforced by ensuring that every state change implies

phonics” set and more specifically the subset that has been a chord change. This makes the state duration model effec-

used for the MIREX 2010 chord estimation competition. tively a chord duration model that we control by a single

It consists of 217 full songs, mostly by the Beatles (180 parameter Ps :

songs), the remainder by Queen (20) and Zweieck (17).

The second one is a private data set called “Quaero” and it P (st , kt , ct |st−1 , kt−1 , ct−1 ) =

(

contains 53 songs from a number of diverse artists in the Ps st = st−1 ∧ kt = kt−1

popular genre. We can now quantify the difference in rel- , ∀ct = ct−1 (1)

0 st 6= st−1 ∨ kt 6= kt−1

ative chord distribution between the various structure de-

pendent transition models by calculating the bigram model We use the same value for Ps as found in the original sys-

perplexity P P (C1′ , C2′ |m) per mode m for each of them. tem [15].

Ps

The remaining probabilities P (st , kt , ct |st−1 , kt−1 , s=R

ct−1 ), ∀ct 6= ct−1 of the chord changing transitions are

k=no-k, c=no-c

calculated by the combination of three submodels. We fur- s=O s=L

ther apply Bayes’ theorem repeatedly to arrive at a decom-

Ps Ps

position into three terms

ﬁnal

P (st , kt , ct |st−1 , kt−1 , ct−1 ) Nk

kt=kt-1

Nk

= P (st |st−1 , kt−1 , ct−1 ) P (ct |st , st−1 , kt−1 , ct−1 ) ct≠ct-1

The first term P (st |st−1 , kt−1 , ct−1 ) will be used to intra

kt=kt-1

control the ease of changing the structure variable s and inter

thus to control the insertion rate of segment boundaries. ct≠ct-1

We use a simple model that ignores the key and chord in-

fluence and consists of a single parameter ω that balances Figure 2. State diagram of the system

the probability of going to s = O or s = L after leaving

s = O.

structure. The two upper quadrants consist of block diag-

ω st−1 = O, st = O onal matrices with Nk blocks of side Nc , the lower left

1 − ω s

t−1 = O, st = L

quadrant is dense and the lower right quadrant is a diago-

P (st |st−1 ) = nal matrix. In comparison to a system that only estimates

1 s t−1 = L, st = O

keys and chords concurrently, the number of states gets

0 st−1 = L, st = L

doubled by repeating every key-chord state for s = O and

s = L. On the other hand, because of the sparsity of the

We can already recognize the structure-dependent rela- transition matrix, the increase in the number of transitions

tive chord transition model of the previous section in the remains limited. More specifically, the number of transi-

second term P (ct |st , st−1 , kt−1 , ct−1 ). Our three cate- 2

tions is (Nk Nc ) + 2Nk Nc2 + Nk Nc + 1, which corre-

gories of chord transitions – inter, intra and final – each sponds in our configuration to an increase of 8% instead of

correspond to a certain combination of the state variables. the theoretically maximum of 400% that would be reached

The intra model will be used when st−1 = O and st = O, for a dense transition matrix. This sparsity is exploited in

inter when st−1 = L and st = O and final when st−1 = O the implementation of the Viterbi algorithm to limit the in-

and st = L. Finally, from our definition of L it follows crease in computation time.

that when st−1 = L and st = L, only the self probability

Ps should be allowed, to account for the fact that the last 2.2.4 Chroma emission probabilities

chord of a structural segment can – and most likely will –

last more than one time step. The other probabilities are As our chroma features, we use the implementation by

set to zero. Ni et al. [13] known as Loudness Based Chromagrams.

Since we already established that a key change implies These are 24-dimensional vectors that represent the loud-

a chord change, we can neglect the influence of the chords ness of each of the 12 pitch classes in both the tre-

ct−1 , ct in the third term P (kt |st , st−1 , kt−1 , ct , ct−1 ). ble and the bass spectrum. They are calculated with a

We thus end up with P (kt |st , st−1 , kt−1 ). Furthermore, hop size of 23 ms and are afterwards averaged over the

we impose the supplemental constraint that a key change interbeat interval. We make the assumption that keys

can only occur between segments. Mathematically, this and chords can be independently tested for compliance

can be expressed as st−1 = O ⇒ P (kt |st , st−1 , kt−1 ) = with an observation and that the structure position is

δkt ,kt−1 with δ the Kronecker-delta. For the inter key tran- conditionally independent of the observations, such that

sitions P (kt |st = O, st−1 = L, kt−1 ), we reuse the the- P (yt |st , kt , ct ) = P (yt |ct ) P (yt |kt ). The chord acous-

oretical model from [15], based on Lerdahl’s distance [9] tic probability P (yt |ct ) is modelled as a multi-variate

between keys. Gaussian with full covariance matrix. Its parameters are

In Figure 2 one can find a simplified state diagram of trained on the aforementioned “Isophonics” data set. The

our system, in which states are regrouped by the structure key acoustic probability P (yt |kt ) is calculated by taking

variable. Only the transitions from the point of view of the the cosine similarity between the observation vector yt and

structure variable are drawn in order not to overload the Temperley’s key templates [18]. These represent the stabil-

picture, but the constraints on key and chord transitions are ity of each of the 12 pitch classes relative to a given key.

indicated next to the arrows. The names of the structure de-

pendent relative chord transition models are also indicated. 2.3 The novelty-based component

The other major component of our system is a novelty-

2.2.3 System complexity

based structural segmentation algorithm. We’ll use a

The result of adding the additional constraints is that the simple implementation that is conceptually very close to

complete transition matrix will have a well-defined, sparse Foote’s original proposal [3]. First, a self-similarity ma-

Original novelty curve values of the novelty curve by v, then the transformation

1 is the following:

0.5

where j is the index of the novelty curve, which has a fixed

0 sample rate of 4 Hz, and w the size of a window around the

0 50 100 150 j-th value. An example of this transformation can be seen

Peak salienced novelty curve in Figure 3, where the original signal is represented in the

1

first row and the processed one in the second. The effect

is that valleys now reach all the way down to 0 and that

0.5 highs indicate peaks with a high salience. As evident in

our example, there can be quite a large difference between

0 the saliences of the peaks corresponding to annotated seg-

0 50 100 150 ments. In order to diminish the differences, we apply a log

Log compressed peak salienced novelty curve compression, which is subsequently rescaled to the interval

1 [0, 1] for convenience:

0.5 vj′′ = log10 1 + αvj′

0 of Figure 3. Finally, we want to account for small de-

0 50 100 150

viations of the peak positions with respect to the actual

Figure 3. An example of the transformations of the novelty structural boundaries. Therefore we look for extrema of

curve the log-compressed novelty curve in the current beat seg-

ment and the segments that are adjacent on each side. For

P (zt |st = L) and P (zt |st = R) this will be the maxi-

mum and for P (zt |st = O) the minimum, so that the fi-

trix is calculated from a sequence of MFCC’s and the first

nal dimension of zt will actually be 2-dimensional: one

four spectral moments (spectral centroid, spread, skew-

dimension with the local minima of v ′′ and one with its lo-

ness and kurtosis). The similarity measure that is used

cal maxima. This step also changes the time scale from a

is the cosine similarity. From this matrix, a time vary-

fixed step size of 250 ms to beat-synchronous observations

ing novelty curve is derived by convolving the matrix with

(index j to t).

a two-dimensional novelty kernel along its diagonal. We

use a kernel size of 22.5 s and a step size of 250 ms. In

a stand-alone system, the peaks of the resulting novelty 3. EXPERIMENTAL RESULTS

curve are detected and structural boundaries are inserted at In this section, we will evaluate our system for various

those positions. In our case however, we’ll use the com- configurations. We evaluate the structural segmentation

plete curve to calculate the novelty observation probability by calculating the precision P (tol) and recall R (tol) be-

P (zt |st , kt , ct ). tween the generated and the annotated structure. They are

We assume that the novelty observations are condition- a function of a tolerance interval tol whose purpose is to

ally independent of chord and key state, such that we end allow for small deviations from the desired result to be still

up with P (zt |st , kt , ct ) = P (zt |st ). We thus need to considered correct. The precision is defined as the number

model the novelty observations for each of the possible of estimated boundaries for which an annotated boundary

values of st , i.e. O, L or R. By definition, there is a clear lies within the tolerance interval centered around its posi-

relation between the peaks of the novelty curve and a high tion divided by the total number of estimated boundaries.

probability of being in a state that will insert a structural The recall on the other hand, is the relative number of an-

boundary upon leaving it (s = O or s = R). Therefore notated boundaries that have an estimated boundary within

we model the structure acoustic probability for the L and its tolerance interval. Both measures are combined in an F-

R states as a (half) Gaussian centered on 1. Likewise, the measure F (tol). These measures are calculated for every

probability of the s = O states will be modelled by a half song of the data set and are afterwards averaged to give one

Gaussian centered on 0. However, we won’t use the values global result.

of the novelty curve directly as the observations zt for the We calculate the results for two tolerance intervals, 0.5 s

novelty observation probability P (zt |st ). A first remark is and 3 s, in accordance with the MIREX structure seg-

that the novelty curves are designed to be used in conjunc- mentation competition. Both the harmony-based and the

tion with peak picking. Therefore their relative value with novelty-based system were tested separately, as well as the

respect to surrounding valleys is what matters, and not their combination of both approaches. For the harmony-based

absolute value as is desired for our probabilistic system. In and the combined system, we performed our experiments

order to adapt the novelty values to our use, we therefore twice: once with the structure dependent relative chord

first transform the values of the curve. If we represent the models derived from the Isophonics data set and once with

F (3s) F (0.5s) [3] J. Foote. Automatic audio segmentation using a mea-

harmony-based (Isophonics) 54.72 34.90 sure of audio novelty. In Proc. ICME, 2000.

harmony-based (Quaero) 52.44 29.47

novelty-based 61.84 33.53 [4] M. Goto. A chorus section detection method for mu-

combined (Isophonics) 64.38 35.41 sical audio signals and its application to a music lis-

combined (Quaero) 64.13 34.08 tening station. IEEE Trans. Audio, Speech, Language

Process., 15(5), 2006.

Table 2. Results on the Isophonics data set

[5] K. Jensen. Multiple scale music segmentation using

rhythm, timbre, and harmony. EURASIP Journal Ad-

those from the Quaero set. The results on the Isophonics vances in Signal Processing, 073205, 2007.

data can be found in Table 2. We can see that the com-

bination of both approaches has a synergetic effect. Sep- [6] F. Kaiser and G. Peeters. Multiple hypotheses at multi-

arately, they each have different strengths. The harmony- ple scales for audio novelty computation within music.

based approach is better in precisely locating the structure In Proc. ICASSP, 2013.

boundaries, as apparent from the F (0.5s) results, while [7] R. Kneser and H. Ney. Improved backing-off for m-

the novelty-based approach performs better when a larger gram language modeling. In Proc. ICASSP, 1995.

deviation is allowed. As could be expected, the harmony-

based and combined systems work better with the Isophon- [8] K. Lee. A system for acoustic chord transcription and

ics relative chord models, since they are perfectly matched key extraction from audio using hidden Markov mod-

with the test set. However, most of the synergy remains els trained on synthesized audio. PhD thesis, Stanford

when using the models learned on the Quaero set, showing University, 2008.

the generality of these models. Additionally, the combined

[9] F. Lerdahl. Tonal pitch space. Oxford University Press,

system is also less sensitive to the choice of relative chord

New York, 2001.

models than the harmony-based method is.

[10] M. Levy and M. Sandler. Structural segmentation of

4. CONCLUSION musical audio by constrained clustering. IEEE Trans-

actions on Audio, Speech and Language Processing,

In this paper, we proposed a method for structure estima- 16(2), 2008.

tion by combining 2 different approaches. The first is a

traditional way of segmenting structure based on a timbral [11] N. C. Maddage. Automatic structure detection for pop-

novelty measure. The second is based on a harmonic anal- ular music. IEEE MultiMedia, 13(1), 2006.

ysis that is performed concurrently with the structure es-

timation. It makes use of chroma features. Together they [12] M. Mauch and S. Dixon. Using musical structure to

form a probabilistic system for the simultaneous estimation enhance automatic chord transcription. In Proc. ISMIR,

of keys, chords and structure boundaries. We’ve shown 2009.

that the combination of both approaches works better than [13] Y. Ni, M. McVicar, R. Santos-Rodrı́guez, and T.

each of the two systems on its own. De Bie. An end-to-end machine learning system for

In the future, we will experiment with a post-processing harmonic analysis of music. IEEE Transactions on Au-

step to extend our resulting structural segmentation into a dio, Speech and Language Processing, 20(6), 2012.

full structure estimation that includes the identification and

labelling of repeated segments. After all, for each of our [14] J. Paulus, M. Müller, and A. Klapuri. Audio-based mu-

estimated segments we have a harmonic analysis available sic structure analysis. In Proc. ISMIR, 2010.

that could be used as a feature for the clustering of similar

[15] J. Pauwels, J.-P. Martens, and M. Leman. Improving

segments, in addition to the more low-level features that

the key extraction accuracy of a simultaneous key and

are currently used for this task.

chord estimation system. In Proc. ICME, 2011.

the joint estimation of keys, chords and structural

This work was partly supported by the Quaero Program boundaries. In Proc. ACM Multimedia, 2013.

funded by Oseo French agency.

[17] G. Peeters and H. Papadopoulos. Simultaneous beat

and downbeat-tracking using a probabilistic frame-

6. REFERENCES

work: theory and large-scale evaluation. IEEE Trans-

[1] M. J. Bruderer, M. McKinney, and A. Kohlrausch. actions on Audio, Speech and Language Processing,

Structural boundary perception in popular music. In 19(6), 2011.

Proc. ISMIR, 2006.

[18] D. Temperley. The cognition of basic musical struc-

[2] J. Foote. Visualizing music and audio using self- tures. MIT Press, 1999.

similarity. In Proc. ACM Multimedia, 1999.

- Segmentation of Brain Tumour from MRI image – Analysis of K-means and DBSCAN ClusteringHochgeladen vonInternational Journal of Research in Engineering and Science
- A Hybrid Particle Swarm Optimization Algorithm to Human Skin Region DetectionHochgeladen vonJournal of Computing
- PhotoSketch: Internet Image MontageHochgeladen vonromulusx
- Barney Kessel WisdomHochgeladen vonPoss Hum
- Aspect Ratio Processing DTV HDTVHochgeladen vonMax Power
- SEGMENTATION OF HISTOPATHOLOGICAL IMAGES USING FAST FUZZY C-MEANS APPROACHHochgeladen vonIJSTE
- Digital Image ProcessingHochgeladen vonSasmito Adibowo
- gould1964a.pdfHochgeladen vonIgor Reina
- Analyzing Tunes Par TiHochgeladen vonRobert Lee
- A DADGAD CHRISTMAS ( Doug Young ).pdfHochgeladen vonJohn
- The Study of Harmony - Diether de la Motte.pdfHochgeladen vonAnte B.K.
- Partiture - Walking Bass IntroductionHochgeladen vonpepeche7
- Concept Checklist _Nat 3, 4, 5, Higher & Adv HigherHochgeladen vonalfordlive
- TechTipConnectingChordChangesSmoothly.pdfHochgeladen vonRobert Robinson
- artofaccompanyin00linduoftHochgeladen vonLucas De Paula
- analyzing_11th_13th_chords.pdfHochgeladen vonkwintoos
- Contour Detection and Hierarchical Image SegmentationHochgeladen vonTrường Giang
- Fdp on Im & Dm Using OstHochgeladen vonsomenath_sengupta
- Prac_6Hochgeladen voncaranmiroslav
- ML ProjectHochgeladen vonsurendra parla
- At s Article 01021003Hochgeladen vonmahaaraja112
- Image 3Hochgeladen vonapi-3719303
- DIP QuestionsHochgeladen vonNithish Kumar
- A Retinal Vessel Detection Approach Based on Shearlet Transform and Indeterminacy Filtering on Fundus ImagesHochgeladen vonAnonymous 0U9j6BLllB
- A REVIEW ON SEGMENTATION PROBLEMS ON GUJARATI HANDWRITTEN TEXT DOCUMENTHochgeladen vonJournal 4 Research
- Forming Chords & Harmonising the ScaleHochgeladen vonSanjay Chandrakanth
- A Novel Hybrid Approach Using Kmeans Clustering and Threshold filter for Brain Tumor DetectionHochgeladen vonseventhsensegroup
- 32 Tavenar - Lamb.pdfHochgeladen voncuadra
- Diabolus in musicaHochgeladen vonWenilton Daltro
- Niethammer2008 DWI Miccai Tubular Fiber Bundle SegmentationHochgeladen vonmarcniethammer

- Music and Emotion Seven QuestionsHochgeladen vonReem BintMajid
- Classification of Dance Music by Periodicity PatternsHochgeladen vonMaría Marchiano
- Bennett Consolidating Music SchenesHochgeladen vonMaría Marchiano
- ChemeroHochgeladen vonmarquesvaleria
- Empirical Analysis of Track Selection and Ordering in Electronic Dance Music Using Audio Feature Extraction by Thor Kell & George TzanetakisHochgeladen vonFlorin Ciocanu
- Windows Finale Read Me.rtfHochgeladen vonAustin Carlson
- 2008.TSQArticleHochgeladen vonMaría Marchiano
- 546Hochgeladen vonMaría Marchiano
- 227-233-1-PBHochgeladen vonMaría Marchiano
- Despidos Masivos en Argentina Caracterizaci n de La Situacion y an Lisis Del Impacto Sobre La Salud f Sica y Mental 2015 2016 VfinalHochgeladen vonMaría Marchiano
- physicality_feedback.pdfHochgeladen vonlupustar
- HyndmanHochgeladen vonMaría Marchiano
- Neuroscience_and_Education_A_Philosophic.pdfHochgeladen vonMaría Marchiano
- 2009Asensio-ReviewonNortecWoMHochgeladen vonMaría Marchiano
- When to Use Hierarchical Linear ModelingHochgeladen vonMaría Marchiano
- Keller y Rieger (Editorial). Musical Movement and SynchronizationHochgeladen vonMaría Marchiano
- Synesthesia. a New Approach to Understanding the Development of PerceptionHochgeladen vonMaría Marchiano
- lahavHochgeladen vonMaría Marchiano
- Stravinsky - 4 Russian Songs VoicePianoHochgeladen vonMarcia Kaiser
- gallese estética 2017Hochgeladen vonMaría Marchiano
- 64c46bab1840d40402f8a2eac476df1505c9.pdfHochgeladen vonMaría Marchiano
- Figure 1.pdfHochgeladen vonMaría Marchiano
- sage_journals_template_0.docxHochgeladen vonMaría Marchiano
- 4026-10440-1-PBHochgeladen vonMaría Marchiano
- Action Perception PfordresherHochgeladen vonMaría Marchiano
- Interactive Time TravelHochgeladen vonMaría Marchiano
- Weber n Varela - Life After KaHochgeladen vonCécile Carmona
- Exploring LanguageHochgeladen vonMaría Marchiano

- Lsf7 Release NotesHochgeladen vonKarol Korytkowski
- Sap Qm TablesHochgeladen vonAnonymous yahXGrlz8
- Computers Basic MCQ Questions 2Hochgeladen vonK Ashok Anand
- Semester ViiiHochgeladen vonHarikrishnan Rp
- Introduction to DBMSHochgeladen vonRajeev Singh
- Mindtree.pdfHochgeladen vonASHISH JAIN
- 1991-06 HP Journal PapersHochgeladen vonElizabeth Williams
- Peakfit4 12 BrochureHochgeladen vonTiago Pacheco
- 62 Aloha Labor Scheduler User Guide UeHochgeladen vonFederico Franic
- DHCP Client DORA ProcessHochgeladen vonDhaya Nithi
- Information System in Business WPrldHochgeladen vonJames Brian Flores
- 4.5.4 Pulse Code Modulation.pdfHochgeladen vonFirdaus
- Luo converterHochgeladen vonAndrei Cocor
- 2016 Annual ReportHochgeladen vonMoises A. Almendares
- Ford KnC Suspension EnHochgeladen vonpandiyarajmech
- PAD Agreement and InstructionsHochgeladen vonLouis-Philippe Couture
- g8m5l6- comparing linear functions and graphsHochgeladen vonapi-276774049
- #fullscreenonHochgeladen vonAna Filipa Caetano
- BibliographyHochgeladen vonBasu
- BondsHochgeladen vonAnonymous CpOlOtqlsA
- The Implementation of Hospital Information System (His) in Tertiary Hospitals in Malaysia- A Qualitative StudyHochgeladen vonAmree Zaid
- Datasheet Tda 9351psHochgeladen vonTrevor Paton
- MGT300 - Ch03 ExerciseHochgeladen vonkakpisz
- ODI 12c Getting Started VM Installation GuideHochgeladen vonrams08
- How-To Configure SAP Business Objects Access Control 5.3 for SAP NetWeaver Portal 7.0Hochgeladen vonRichard Palotai
- i Option Workbook bizhub c754Hochgeladen vonbhripul
- SAP APO DP Lifecycle PlanningHochgeladen vonSAPAPOOnlinetraining
- Televerification Centres-SOW 2017Hochgeladen vondebaduttakar
- OOP_lab1Hochgeladen vonPete Joempraditwong
- image-guide.pdfHochgeladen vonkokome35