
Ph.D Research Proposal:
Coordinating Knowledge Within an Optical
Music Recognition System
J. R. McPherson
March, 2001

1 Introduction to Optical Music Recognition


Optical Music Recognition (OMR), sometimes also called musical score recognition or simply score recognition, is the process of automatically extracting musical meaning from a printed musical score. Music notation provides a rich description of the composer's ideas, but ultimately sheet music is open to some degree of interpretation by performers.
Performance considerations aside, the advantages of a computerised representation of a musical score are numerous. These include:
• the ability to automatically transpose the part for a particular instrument;
• conversion to other musical formats or notations — for example to Braille, to the input formats of various software packages, or for re-typesetting a score published in an outdated style;
• allowing musicians to read the music from a computer display, for example
to eliminate the need for page turns [GWMD96, McP99];
• a form of compression, resulting in smaller data sizes [BI98];
• ease of sharing and archiving;
• increased ease of editing (using appropriate software), aiding in composition; and
• automatic indexing and retrieval of information [MSBW97].

1.1 General framework for OMR


The automated process of extracting musical meaning from sheet music normally follows a number of specialised steps, performed in a fixed order.
The first step is to acquire a digital form of the sheet music that a computer can access. Today, this step is fairly easy, with the widespread availability of cheap scanner hardware that can create both colour and monochrome digital images at a resolution of three hundred dots per inch or higher, which is more than adequate for our processing purposes.

The second step is to apply various image processing techniques to the acquired image. This is necessary to recognise the symbols that make up the page — for example, lines and note heads. This step is the hardest, and is often broken up into two or more separate steps.
The final step is to determine the musical meaning (also called the musical semantics) of the image, based on the objects found in the previous step. In Common Music Notation (CMN), for example, objects like notes and rests have musical qualities such as pitch, volume and duration; objects such as slurs, accents and trills affect individual notes; and objects such as tempo markings, key signatures and time signatures affect the notes that follow.
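As an illustration of these semantic qualities, the objects just described might be represented with simple data structures. Everything below — the field names, the `apply_accent` helper — is a hypothetical sketch for clarity, not drawn from any existing OMR system.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    pitch: str                 # e.g. "C4"
    duration: float            # length in beats
    articulations: list = field(default_factory=list)  # accents, trills, ...

@dataclass
class Context:
    key_signature: str = "C major"   # affects the notes that follow
    time_signature: str = "4/4"
    tempo_bpm: int = 120

def apply_accent(note: Note) -> Note:
    # an accent affects an individual note; here we simply record it
    note.articulations.append("accent")
    return note

n = apply_accent(Note(pitch="G4", duration=1.0))
```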

1.2 Background and Starting Base


Common Music Notation (CMN), also called Western staff notation or Western
music notation, is the notation most widely used today — an example of CMN
is shown in Figure 1. Other music notations include guitar tablature, plainsong
notation, sacred harp notation, and various Asian, African and Indian musical
notations.

Figure 1: A sample of Common Music Notation: Handel's Sonata V for flute and piano.

Ideally, an OMR system should not be limited to any particular set of symbols. It should be possible to add rules that allow the system to 'understand' a new notation without making significant internal changes to the system. This is referred to as extensibility.
Bainbridge's CANTOR system [Bai97] was one of the first fully extensible optical music recognition systems developed. Most prior work was limited to small subsets of CMN, and often made assumptions about staff lines, such as there always being five lines per staff. While CANTOR still has the restriction that the music must be stave-based, it allows an arbitrary number of lines per staff.
Here, extensible refers to the fact that one of the design goals was to research
and design a system that did not have hard-coded shapes built into it. This
research led to the formation of Primela — a Primitive Expression Language
for describing specific musical shapes. A set of Primela descriptions can be written to describe a particular music notation, then loaded and used at run-time to process an image.

CANTOR consists of four main steps:
Staff line identification, which locates staves, removes staff lines and locates objects in the bitmap;
Primitive Recognition, which identifies basic shapes, such as (for the CMN Primela descriptions) slurs, noteheads, tails, accidentals, and lines;
Primitive Assembly, which joins the basic primitives found into musical objects — for example, noteheads, stems and tails into a note; and
Musical Semantics, which determines musical qualities such as the pitch and duration of the musical objects found, and can output various musical file formats.
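The four stages can be sketched as a linear data flow. The function bodies below are placeholder stand-ins; only the stage names and their ordering come from the description of CANTOR above.

```python
# Illustrative sketch of CANTOR's four stages chained as a pipeline.
# All return values are fabricated stand-ins for real intermediate data.

def staff_line_identification(bitmap):
    # locate staves, remove staff lines, locate objects in the bitmap
    return {"staves": 2, "objects": ["blob1", "blob2", "blob3"]}

def primitive_recognition(located):
    # identify basic shapes: noteheads, stems/tails, accidentals, ...
    return [{"shape": "notehead"}, {"shape": "stem"}, {"shape": "tail"}]

def primitive_assembly(primitives):
    # join primitives (e.g. notehead + stem + tail) into a note
    return [{"object": "note", "parts": [p["shape"] for p in primitives]}]

def musical_semantics(objects):
    # determine pitch/duration and emit an encoded music file
    return {"format": "encoded", "notes": len(objects)}

result = musical_semantics(primitive_assembly(
    primitive_recognition(staff_line_identification("fake-bitmap"))))
```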

2 Areas of Research
Most current projects in the field of OMR are concerned with improving the
accuracy of the various components, particularly the pattern recognition stages.
Instead of focusing solely on the individual components, I wish to research
and create methods that improve the overall system not merely by improving
components in isolation, but by improving how they interact with each other so
as to maximise the amount of musical information gained from the image.
Part of my research will involve determining and evaluating appropriate
methods for the process controlling the interaction, known as the coordinator.

2.1 Coordinating interaction between components


Determining how best to coordinate the information received from the OMR components will be the main focus of the thesis.
Figure 2 shows how most current systems operate. The different phases of the OMR system are performed in a linear sequence, and each phase's output becomes the next phase's input. This also means that each phase is tightly coupled to both the preceding and following phases, as they must share common data structures and formats.

Figure 2: The current “pipeline” approach — scanned music passes, with image enhancement between stages, through staff line identification, musical object location and musical feature classification; musical semantics (drawing on musical knowledge) then produces an encoded music data file.

However, this model has some limitations. Most seriously, errors made in an early step will propagate through the following steps. For example, when performing musical semantics analysis on the recognised components, an error may be detected, such as a bar of music not having enough (or too many) notes in it. Because this type of error cannot be corrected within the current context, the system is forced to output something that it knows is not quite right. (Some errors, however, such as a missing or mis-detected accidental in a key signature, could conceivably be corrected in this context.) The system's overall accuracy would be improved by using this newly-gained context to re-perform a previous stage, hopefully correcting the error in light of the new information.
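The bar-length check mentioned above can be sketched as follows; the representation of durations and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def check_bar(durations, time_signature=(6, 8)):
    """durations: list of (numerator, denominator) note lengths,
    e.g. (1, 8) for a quaver. Returns a verdict string."""
    beats, unit = time_signature
    expected = Fraction(beats, unit)
    # sum the recognised note lengths exactly, avoiding float error
    total = sum((Fraction(n, d) for n, d in durations), Fraction(0))
    if total < expected:
        return "too few notes"
    if total > expected:
        return "too many notes"
    return "ok"

verdict = check_bar([(1, 8)] * 5)   # a quaver missing from a 6/8 bar
```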

Figure 3: The proposed “coordinated” approach — between the image and the music representation, a co-ordinator mediates all communication among the modules: page layout, staff processing, primitive location, primitive identification, primitive assembly, and musical semantics/analysis.

Figure 3 shows a possible revised framework to allow feedback to earlier stages. All execution is controlled by a coordinating process — the modules cannot communicate directly. The idea here is that the top-level process controls the flow of execution, based on a number of variables. Part of the research is to determine the choice of variables used to control program flow, and what effect these variables have on both the performance and the run-time behaviour of the system.
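A minimal sketch of such a coordinating process, assuming a simple capability registry, might look like the following; all names are invented for illustration.

```python
# Sketch of a coordinator that owns all control flow: modules are registered
# under a capability name and never call each other directly.

class Coordinator:
    def __init__(self):
        self.modules = {}   # capability name -> callable module
        self.log = []       # record of execution, for studying run-time behaviour

    def register(self, capability, fn):
        self.modules[capability] = fn

    def run(self, capability, data):
        self.log.append(capability)
        return self.modules[capability](data)

coord = Coordinator()
coord.register("staff_processing", lambda img: img + ["staves"])
coord.register("primitive_location", lambda d: d + ["primitives"])

state = coord.run("staff_processing", ["image"])
state = coord.run("primitive_location", state)
```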
This type of framework would also encourage looser integration between the various components. Loosely integrated components would allow, for example, the addition of several 'competing' components capable of performing the same or similar steps, whose results could be compared for discrepancies by the coordinator. This would provide either greater confidence that the results are right, if the different components agree, or particular areas that should be further examined, if the results conflict.
Another advantage is that this framework allows for modules that do not directly perform any music processing but still provide additional context. An example is a component that detects the scan quality (perhaps from the level of noise in the bitmap); if the quality is low, then tolerances could be lowered, or a set of descriptions specifically designed for noisy data could be used.
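Such a scan-quality module might, for instance, estimate noise as the fraction of isolated 'on' pixels and adjust the matching tolerance accordingly; the noise measure and threshold values below are invented for illustration.

```python
def speckle_fraction(bitmap):
    """Crude noise estimate: fraction of 'on' pixels with no 'on' neighbour."""
    on = {(x, y) for y, row in enumerate(bitmap)
                 for x, v in enumerate(row) if v}
    if not on:
        return 0.0
    isolated = sum(
        1 for (x, y) in on
        if not any((x + dx, y + dy) in on
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   if (dx, dy) != (0, 0))
    )
    return isolated / len(on)

def choose_tolerance(noise, clean_tol=0.9, noisy_tol=0.75, cutoff=0.05):
    # lower the pattern-matching tolerance for noisy scans
    return noisy_tol if noise > cutoff else clean_tol

clean = [[1, 1], [0, 0]]                       # two adjacent pixels: no speckle
noisy = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]      # two isolated pixels: all speckle
```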

2.2 Page Layout


I would like to put some effort into investigating and/or designing algorithms that use a priori knowledge to determine possible "object types" before invoking the lower-level recognition subsystems, such as staff location or character/text recognition. This more general area of research is known as document image analysis, and there are techniques that might be researched and improved with respect to the OMR domain. This could involve the system keeping a history of processed documents, to aid in predicting the layout of future documents, and using prior knowledge to decide that there may be a title and author somewhere near the top of the page. The proposed coordinated approach for the OMR system could then decide whether or not to test this hypothesis, given knowledge gained about this area of the page from other sources.

2.3 Classification Algorithms for feature extraction


One of the more recent developments in the field of OMR is the use of machine-learning techniques to develop shape descriptions, given a set of training data [Ala95, BAD99, SD98]. These techniques could be investigated to design feature sets for the classification of musical primitives, either for the current 'Primela' framework or for some new, replacement method of differentiating objects.

2.4 Illustration of the Concept


A prototype — based on the CANTOR code and still a work in progress — is currently capable of using message passing to provide feedback from a particular phase to earlier phases. While not yet very advanced, the following example demonstrates the potential improvement that the methods under investigation may offer.
Figure 4(a) shows a small extract from the Clarinet Concerto by Mozart. The extract is from the pianist's part, with the clarinetist's part displayed above the piano stave. This incidentally also demonstrates how OMR must be able to deal with symbols at different scales within the same piece. Figures 4(b) and 4(c) show the vertical lines and the flats, respectively, that were found by CANTOR in the pattern recognition stage. There are some errors in both of these classifications:
There are a few mis-identified vertical lines: the time signature (6/8) was just broken enough to pass as two vertical lines. The musical semantics module could notice that there was no time signature, and yet extra vertical lines where a time signature might be expected, and allow the system to re-examine this area. Also, the two letter 'l's of the word "Allegro" were, not unreasonably, determined to be vertical lines, as they were close enough to the staff to be checked. However, they are unlikely to have any musical meaning in CMN, and are also close to other textual characters.
There are four naturals in the extract that were determined to be flats, due to the default descriptions used. This could be solved by writing Primela descriptions that correctly differentiate between flats and naturals for the particular fonts used in this piece of music, but it would be more elegant to correct these automatically with semantic analysis, by noticing that accidentals rarely appear that have no effect on the note, given either the last occurring key signature or an accidental on the same note earlier in the same bar.
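This redundancy check can be sketched as follows, under the simplifying assumption that the key signature and the accidentals already seen in the bar are kept as small dictionaries; all names are illustrative.

```python
def accidental_is_redundant(note, accidental, key_signature, bar_accidentals):
    """note: letter name, e.g. "B";
    key_signature: dict letter -> accidental, e.g. {"B": "flat"};
    bar_accidentals: accidentals already seen earlier in this bar."""
    # the alteration already in force: an earlier accidental in the bar
    # overrides the key signature, which overrides the default natural
    implied = bar_accidentals.get(note, key_signature.get(note, "natural"))
    return accidental == implied

# e.g. a recognised 'flat' on B when the key signature already flattens B
redundant = accidental_is_redundant("B", "flat", {"B": "flat"}, {})
```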
Unfortunately, in this particular case there are also missing flats in the key signatures of two of the staves. These could also be picked up using semantic analysis, by noticing that one staff did have a key signature, so the others probably should as well. This, coupled with the fact that there will be unrecognised objects in the position where a key signature could be expected, should provide enough context for the recognition stage to look there again for a key signature.
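A sketch of this cross-staff consistency check, assuming the recognised key signatures are collected per staff (the data layout is an assumption):

```python
def staves_to_reexamine(key_signatures):
    """key_signatures: dict staff-index -> recognised signature, or None
    where no key signature was found. Returns staves to re-examine."""
    found = [k for k in key_signatures.values() if k is not None]
    if not found:
        # no staff has a key signature: nothing suspicious
        return []
    # some staves have one, so staves without one are flagged
    return sorted(i for i, k in key_signatures.items() if k is None)

suspect = staves_to_reexamine({0: "3 flats", 1: None, 2: None})
```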
Lastly, for whatever reason, the first chord in the second bar did not have a note stem recognised as a vertical line — see the circled area within each figure to locate this object. (CANTOR currently checks for vertical lines before checking for accidentals, although this order is user-defined in the Primela descriptions.) Because of this, the shape passed the tests as possibly being a flat.
This is as far as CANTOR goes. However, when the prototype system assembles the primitives together, it notices that this particular flat does not have a notehead in the appropriate position to its immediate right.
The primitive assembly module now issues a request to the coordinator to
check this primitive’s classification again. Note that if the request is rejected,
the primitive assembly stage has already been completed, and processing can
continue regardless. The coordinator determines that the pattern recognition
module is capable of fulfilling this request, so passes the request to it. This stage
now takes account of this new context, and subsequently rejects the shape as
possibly being a flat (Figure 4(d)). Currently this context (that is, the primitive
could not be assembled) is accounted for by re-testing the object for the same
classification, but with a higher threshold for passing.
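The raised-threshold re-test just described can be sketched as follows; the scores and threshold values are invented for illustration.

```python
def classify(shape_score, threshold=0.6):
    # accept the shape as a flat only if its match score passes the threshold
    return "flat" if shape_score >= threshold else None

def recheck(shape_score, old_threshold=0.6, raise_by=0.2):
    # re-test the same classification, but make it harder to pass,
    # reflecting the new context that the primitive could not be assembled
    return classify(shape_score, threshold=old_threshold + raise_by)

first = classify(0.65)    # passes the ordinary test
second = recheck(0.65)    # rejected under the stricter re-test
```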
While this may seem like a small step, it can have an impact on the final output — it is the difference between the music as written and an incorrect note resulting in a discord. Unfortunately, the prototype does not yet use this new context to correctly identify this shape, in this case as a vertical line.
The prototype system does not currently perform semantic analysis. As the
above discussion shows, there are plenty of opportunities to use musical context
for improvement in the recognition stages. The key will be finding a generalised
approach for this task.

3 Intended Schedule and Requirements


This research will be carried out using existing equipment within the department. No extra computing (or other) resources are expected to be required.
The following is an estimate of the work likely to be completed. Depending
on the progress made during these tasks, other work, such as that mentioned
in Sections 2.2 and 2.3, might be undertaken. Also, new developments by other
researchers may cause a change in direction or scope for this research.
Task                                                      Months
Continue research, complete first prototype                    6
Experimentation with prototype                                 2
Write-up of methods, ideas and findings                        1
Investigate and create other coordinators                     12
Comparisons between coordinators and other OMR systems         3
Completion of write-up                                         5
Total:                                                        29
Note that some of this work has already been done during enrolment for a Masters degree, which began in July 2000.
There are currently no foreseen ethical issues arising from this research. If
at a later date it is necessary to perform evaluation studies on various methods
and/or software, then ethical approval from the school’s Ethics Committee will
be sought.

(a) The starting image

(b) The vertical lines found by CANTOR

(c) The flats found by CANTOR

(d) The flats found by CANTOR with coordination

Figure 4: Part of the first line of the Rondo from Mozart’s Clarinet Concerto,
with area of interest circled

References
[Ala95] Jarmo T. Alander. Indexed bibliography of genetic algorithms in optics and image processing. Report 94-1-OPTICS, University of Vaasa, Department of Information Technology and Production Economics, 1995. ftp.uwasa.fi/cs/report94-1/gaOPTICSbib.ps.Z.
[BAD99] Bruce A. Draper, Jose Bins, and Kyungim Baek. ADORE: Adaptive object recognition. In Proceedings of the International Conference on Vision Systems, pages 522–537, Las Palmas de Gran Canaria, Spain, January 1999.
[Bai97] David Bainbridge. Extensible Optical Music Recognition. PhD thesis, University of Canterbury, Christchurch, New Zealand, 1997.
[BI98] David Bainbridge and Stuart Inglis. Musical image compression. In Proceedings of the IEEE Data Compression Conference, pages 209–218, Snowbird, Utah, 1998. IEEE.
[GWMD96] Christopher Graefe, Derek Wahila, Justin Maguire, and Orya Dasna. Designing the muse: A digital music stand for the symphony musician. In Proceedings of the CHI '96 Conference on Human Factors in Computing Systems, page 436, Vancouver, Canada, 1996. ACM.
[McP99] J. R. McPherson. Page turning — score automation for musicians. B.Sc. Honours thesis, University of Canterbury, New Zealand, 1999.
[MSBW97] Rodger J. McNab, Lloyd A. Smith, David Bainbridge, and Ian H. Witten. The New Zealand Digital Library MELody inDEX, May 1997.
[SD98] Marc Vuilleumier Stückelberg and David Doermann. On musical score recognition using probabilistic reasoning. In Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR '98. IEEE, 1998.
