Research Proposal:
Coordinating Knowledge Within an Optical
Music Recognition System
J. R. McPherson
March, 2001
The second step is to apply various image processing techniques to the
acquired image. This is necessary to recognise the symbols that make up the
page — for example, lines and note heads. This step is the hardest, and is often
broken up into two or more separate steps.
The final step is to determine the musical meaning (also called the musical
semantics) of the image based on the objects found in the previous step. In CMN,
for example, objects like notes and rests have musical qualities such as pitch,
volume and duration; objects such as slurs, accents and trills affect individual
notes; and objects such as tempo markings, key signatures and time signatures
affect the notes that follow.
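The split between intrinsic qualities and context-dependent ones can be illustrated with a small sketch (the names and data structures here are hypothetical, not taken from any OMR system): a key signature alters the pitch of the notes that follow it.

```python
from dataclasses import dataclass

# Hypothetical sketch: objects such as notes carry intrinsic qualities
# (pitch, duration), while context symbols such as a key signature
# modify the notes that follow them.
@dataclass
class Note:
    pitch: int        # MIDI note number
    duration: float   # in quarter-note units

def apply_key_signature(notes, sharpened_pitch_classes):
    """Raise by a semitone every note whose pitch class the signature sharpens."""
    return [Note(n.pitch + 1, n.duration)
            if n.pitch % 12 in sharpened_pitch_classes else n
            for n in notes]

# A key signature containing F-sharp (pitch class 5) affects every later F.
melody = [Note(65, 1.0), Note(67, 0.5)]          # F4, G4
adjusted = apply_key_signature(melody, {5})
print([n.pitch for n in adjusted])               # [66, 67] -- F4 raised to F#4
```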
Ideally, an OMR system should not be limited to any particular set of sym-
bols. It should be possible to add rules that allow the system to ‘understand’ a
new notation without making significant internal changes to the system. This
is referred to as extensibility.
Bainbridge’s CANTOR system [Bai97] was one of the first fully extensible
optical music recognition systems developed. Most prior work was limited to
work on small subsets of CMN, and often made assumptions about staff lines,
such as that there were always five lines per staff. While CANTOR still has the
restriction that the music must be stave-based, there can be an arbitrary number
of lines per staff.
Here, extensible refers to the fact that one of the design goals was to research
and design a system that did not have hard-coded shapes built into it. This
research led to the formation of Primela — a Primitive Expression Language
for describing specific musical shapes. A set of Primela descriptions can be
written to describe a particular music notation and then loaded and used at
run-time, to process an image.
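Primela's actual syntax is defined in [Bai97]; purely to illustrate the idea of run-time-loaded shape descriptions, here is a sketch in Python (the JSON format and every field name in it are invented for this example, and are not Primela):

```python
import json

# Invented description format: each entry names a shape and the constraints a
# candidate object must satisfy. Supporting a new notation would mean writing
# new descriptions like these, not changing the recognition code.
descriptions = json.loads("""
[
  {"name": "notehead", "min_width": 4, "max_width": 12, "filled": true},
  {"name": "flat",     "min_width": 3, "max_width": 8,  "filled": false}
]
""")

def matching_shapes(width, filled):
    """Return the names of all loaded descriptions this object could match."""
    return [d["name"] for d in descriptions
            if d["min_width"] <= width <= d["max_width"] and d["filled"] == filled]

print(matching_shapes(width=6, filled=True))   # ['notehead']
```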
CANTOR consists of four main steps:

- Staff line identification, which locates staffs, removes staff lines and
locates objects in the bitmap.
- Primitive recognition, which identifies basic shapes, such as (for the CMN
Primela descriptions) slurs, noteheads, tails, accidentals, and lines.
- Primitive assembly, which joins the basic primitives found into musical
objects, for example combining noteheads, stems and tails into a note.
- Musical semantics, which determines musical qualities such as pitch and
duration of the musical objects found, and can output various musical file
formats.
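The four stages chain naturally into a pipeline. A minimal sketch follows; the stage names track the text above, but the function bodies are placeholders of my own, not CANTOR's algorithms:

```python
# Illustrative pipeline only: each stage's body is a stand-in.
def identify_staff_lines(image):
    # Locate staffs, remove staff lines, return the remaining objects.
    return {"staves": 1, "objects": image}

def recognise_primitives(found):
    # Classify each remaining object as a basic shape.
    return [{"shape": obj} for obj in found["objects"]]

def assemble_primitives(primitives):
    # Join primitives into musical objects, e.g. notehead + stem -> one note.
    return [{"note": [p["shape"] for p in primitives]}]

def analyse_semantics(music_objects):
    # Determine pitch/duration and emit output; here, just count the objects.
    return len(music_objects)

stages_out = analyse_semantics(
    assemble_primitives(
        recognise_primitives(
            identify_staff_lines(["notehead", "stem"]))))
print(stages_out)  # 1 -- the two primitives were assembled into a single note
```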
2 Areas of Research
Most current projects in the field of OMR are concerned with improving the
accuracy of the various components, particularly the pattern recognition stages.
Instead of focusing solely on the individual components, I wish to research
and create methods that improve the overall system by improving how those
components interact with each other, so as to maximise the amount of musical
information gained from the image.
Part of my research will involve determining and evaluating appropriate
methods for the process controlling the interaction, known as the coordinator.
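One possible shape for such a coordinator, sketched below (the registration/request protocol is an assumption of mine, not a description of the prototype): modules advertise the request types they can service, and the coordinator routes each request to a capable module or rejects it.

```python
class Coordinator:
    """Routes requests between recognition modules (hypothetical design)."""

    def __init__(self):
        self.handlers = {}

    def register(self, request_type, handler):
        # A module declares that it can service this kind of request.
        self.handlers[request_type] = handler

    def request(self, request_type, payload):
        handler = self.handlers.get(request_type)
        if handler is None:
            return None              # rejected: the caller continues regardless
        return handler(payload)

coord = Coordinator()
# A pattern-recognition module offers to re-check primitive classifications.
coord.register("reclassify", lambda shape: {"shape": shape, "accepted": False})
print(coord.request("reclassify", "flat"))   # routed to the registered module
print(coord.request("transpose", "flat"))    # None -- no module can service it
```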
However, this model has some limitations. Most seriously, errors made in
an early step will propagate through the following steps. For example, when
performing musical semantics analysis on the recognised components, an error
may be detected, such as a bar of music not having enough (or too many)
notes in it. Because this type of error cannot be corrected within the current
context, the system is forced to output something that it knows is not quite
right. (Some errors, however, such as a missing or mis-detected accidental in
a key signature, could conceivably be corrected in this context.) What would
improve the system’s overall accuracy would be to use this newly-gained context
to re-perform a previous stage, and hopefully correct the error given this new
information.
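A sketch of that feedback idea (the thresholds, scores and function names are invented for illustration): semantic analysis counts the beats in a bar, and a failed check triggers a second recognition pass over that bar.

```python
def check_bar(durations, beats_per_bar):
    # Semantic check: do the recognised durations fill the bar exactly?
    return sum(durations) == beats_per_bar

def recognise_bar(candidates, threshold):
    # Stand-in for the pattern-recognition stage: keep each candidate note
    # whose classification score clears the threshold.
    return [duration for duration, score in candidates if score >= threshold]

# Five candidate crotchets in a 4/4 bar; one barely passed classification.
bar = [(1.0, 0.9), (1.0, 0.55), (1.0, 0.9), (1.0, 0.9), (1.0, 0.9)]
result = recognise_bar(bar, threshold=0.5)
if not check_bar(result, beats_per_bar=4):
    # Too many notes: re-run recognition with a stricter threshold.
    result = recognise_bar(bar, threshold=0.7)
print(result)  # [1.0, 1.0, 1.0, 1.0] -- the marginal fifth note is dropped
```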
near the top of the page. The proposed coordinated approach for the OMR
system could then decide whether or not to test this hypothesis given knowledge
gained about this area of the page from other sources.
to locate this object. (CANTOR currently checks for vertical lines before check-
ing for accidentals, although this is user-defined in the Primela descriptions.)
Because of this, the shape passed the tests as possibly being a flat.
This is as far as CANTOR goes. However, when the prototype system
assembles the primitives together, it notices that this particular flat does not
have a notehead in the appropriate position to its immediate right.
The primitive assembly module now issues a request to the coordinator to
check this primitive’s classification again. Note that if the request is rejected,
the primitive assembly stage has already been completed, and processing can
continue regardless. The coordinator determines that the pattern recognition
module is capable of fulfilling this request, and passes the request on to it. This stage
now takes account of this new context, and subsequently rejects the shape as
possibly being a flat (Figure 4(d)). Currently this context (that is, the primitive
could not be assembled) is accounted for by re-testing the object for the same
classification, but with a higher threshold for passing.
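The re-test amounts to running the same classifier again with a raised acceptance threshold. Schematically (the scores and threshold values are invented):

```python
def classify_as_flat(score, threshold):
    # Stand-in for the primitive-recognition test for a flat.
    return score >= threshold

score = 0.6                                     # the shape barely passed at first
print(classify_as_flat(score, threshold=0.5))   # True: accepted on the first pass
# No notehead to its immediate right -> re-test with a higher threshold.
print(classify_as_flat(score, threshold=0.75))  # False: rejected on the re-test
```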
While this may seem like a small step, it can have an impact on the final
output — this is the difference between the music as written, and an incorrect
note resulting in a discord. Unfortunately, the prototype does not yet use this
new context to correctly identify this shape, in this case as a vertical line.
The prototype system does not currently perform semantic analysis. As the
above discussion shows, there are plenty of opportunities to use musical context
for improvement in the recognition stages. The key will be finding a generalised
approach for this task.
[Figure 4: Part of the first line of the Rondo from Mozart’s Clarinet Concerto,
with area of interest circled; panel (a) shows the starting image.]
References
[Ala95] Jarmo T. Alander. Indexed bibliography of genetic algorithms in
optics and image processing. Report 94-1-OPTICS, University of Vaasa,
Department of Information Technology and Production Economics, 1995.
ftp.uwasa.fi/cs/report94-1/gaOPTICSbib.ps.Z.
[BAD99] Bruce A. Draper, Jose Bins, and Kyungim Baek. ADORE: Adaptive
object recognition. In Proceedings of the International Conference on Vision
Systems, pages 522–537, Las Palmas de Gran Canaria, Spain, Jan 1999.
[Bai97] David Bainbridge. Extensible Optical Music Recognition. PhD thesis,
University of Canterbury, Christchurch, New Zealand, 1997.
[BI98] David Bainbridge and Stuart Inglis. Musical image compression. In
Proceedings of the IEEE Data Compression Conference, pages 209–218,
Snowbird, Utah, 1998. IEEE.
[GWMD96] Christopher Graefe, Derek Wahila, Justin Maguire, and Orya
Dasna. Designing the muse: A digital music stand for the symphony musician.
In Proceedings of the CHI ’96 Conference on Human factors in computing
systems, page 436, Vancouver, Canada, 1996. ACM.
[McP99] J. R. McPherson. Page turning — score automation for musicians.
B.Sc Honours thesis, University of Canterbury, New Zealand, 1999.
[MSBW97] Rodger J. McNab, Lloyd A. Smith, David Bainbridge, and Ian H.
Witten. The New Zealand Digital Library MELody inDEX, May 1997.
[SD98] Marc Vuilleumier Stückelberg and David Doermann. On musical score
recognition using probabilistic reasoning. In Proceedings of the Fifth
International Conference on Document Analysis and Recognition, ICDAR ’98.
IEEE, 1998.