
Partitions: a taxonomy of types and representations and an overview of coding techniques

Manuel Menezes de Sequeira 1,2 and Diogo Cortez


Instituto de Telecomunicações, IST, 1096 LISBOA CODEX, Portugal

Recognising the importance of partition or shape coding in the field of image and video coding, and the fact that a systematisation of the subject would simplify the comparison of partition coding techniques, this paper proposes a classification of partition types and partition representations into a taxonomy tree. The partition type level of the tree classifies the possible partition types that may have to be coded. The partition representation level classifies the possible representations for each partition type.
This paper also overviews the partition coding techniques which address the considered partition types and which use the identified partition representations. Emphasis is given to binary partition coding techniques proposed within the framework of MPEG-4.

1 Introduction

During the last decades, considerable research effort has been put into the field of image and video coding. Pixel-based coding techniques, whose first steps were taken in the seventies [46,26,27], matured during the eighties and the beginning of the nineties into widespread standards such as ITU-T H.261 [24], ISO/IEC MPEG-1 [44], ISO/IEC MPEG-2/ITU-T H.262 [45], and, more recently, ITU-T H.263 [25]. Meanwhile, research on new techniques had begun in the early eighties [38]; these new techniques have been identified [61] as using mid-level vision concepts (such as regions and textures) instead of the pixel-based low-level vision concepts used before.

1 Partially supported by JNICT.


2 Corresponding author. Address for correspondence: Torre Norte, 10-15, IT,
IST, 1096 LISBOA CODEX, Portugal. Tel.: +351.1.8418461. Fax: +351.1.8418472.
Email: Manuel.Sequeira@it.ist.utl.pt.

Preprint submitted to Elsevier Preprint 21 November 2010


In 1993, the ISO MPEG group started a new standardisation effort: MPEG-4 [52]. There will be important differences between MPEG-4 and the existing standards: (a) it will permit easy access to and manipulation of the contents of video sequences, in terms of scene objects, in addition to meeting the usual requirements of compression, quality, and cost [53]; and (b) it will be based on a flexible syntax, enabling it to “survive” longer by allowing and encouraging evolution.

Obtaining a description of a scene (in a video sequence) in terms of a set of objects, that is, image analysis, is one of the most important tasks in modern image and video representation. It is also one of the main challenges faced by researchers in this field. Image analysis algorithms usually produce partitions of the scenes into two-dimensional (or three-dimensional) regions. These partitions usually have to be coded during the image and video representation process. It has been recognised that partition information will account for a significant percentage of the bit stream (e.g., [23]). It is thus very important to develop efficient partition coding techniques.

The comparison of techniques proposed in the literature has often been hampered by the lack of systematisation of the subject. This paper attempts to fill this gap by proposing a taxonomy of partition types and representations. The partition coding techniques that address each of the considered partition types and use the identified partition representations are then reviewed.

2 Definitions

Digital images, whether two- or three-dimensional (the third dimension is usually time), are obtained through a digitisation process, which involves sampling and quantisation. The sampling pattern often takes the form of a lattice [57]. For instance, rectangular and hexagonal lattices can be used for a two-dimensional image, see Figure 1.

An image graph is a simple graph [54], defining a neighbourhood system, associated with an image. The neighbourhood systems N4, N8, and N6, where each vertex (except those on the border of the image) has 4, 8, and 6 neighbours, respectively, are usually associated with images sampled according to rectangular and hexagonal lattices, see Figure 2. Each set of pixels S in an image also has an associated graph, which is the maximal subgraph of the image graph with the set of vertices S. A maximal subgraph is a subgraph to which no further edges from the original graph can be added without also adding some vertices (in this case, pixels). Henceforward, S will be used interchangeably to mean 'the set of vertices S' or 'the maximal subgraph associated with the set of vertices S'.

[Figure 1: lattice sites plotted in the x-y plane, with basis vectors u0 and u1.]

Fig. 1. Examples of (a) rectangular and (b) hexagonal lattices (u0 and u1 are the lattice basis vectors).

[Figure 2: pixels (graph vertices) joined by graph edges.]

Fig. 2. Examples of sampling lattices with different neighbourhood systems: (a) and (b) rectangular sampling lattice with N4 and N8, respectively; (c) hexagonal sampling lattice with N6.

A partition is a digital image where the value at each pixel is a label identifying
the class to which that pixel belongs. If the number of labels is restricted to
two, the partition is binary; if more than two labels are possible, the partition
is said to be mosaic. Partitions are usually obtained by segmenting a digital
image.

A class is the set of all pixels in a partition having that class’s label. A region
consists of the pixels in a connected component of a class (seen as a maximal
subgraph).

The class adjacency graph (CAG) is a graph with one vertex for each class in the partition (plus an extra one representing the outside of the image) and an edge between any two classes containing pixels that are adjacent in the image graph. The region adjacency graph (RAG) is defined similarly for regions.
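As an illustration only, assuming a partition stored as a list of rows of labels and the N4 system, regions can be found as the connected components of each class with a breadth-first search. The same adjacency helper then yields the CAG when applied to the class labels and the RAG when applied to the region numbers. The function names and the "outside" marker are hypothetical.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OUTSIDE = "outside"  # extra vertex representing the outside of the image


def label_regions(partition, offsets=N4):
    """Number the regions: connected components of each class."""
    h, w = len(partition), len(partition[0])
    region = [[None] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if region[r][c] is not None:
                continue
            region[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill within the class
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and region[ny][nx] is None
                            and partition[ny][nx] == partition[r][c]):
                        region[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return region, count


def adjacency_graph(labels, offsets=N4):
    """Edges between different labels of adjacent pixels; labels on the
    image border are also linked to the OUTSIDE vertex."""
    h, w = len(labels), len(labels[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    if labels[y][x] != labels[ny][nx]:
                        edges.add(frozenset((labels[y][x], labels[ny][nx])))
                else:
                    edges.add(frozenset((labels[y][x], OUTSIDE)))
    return edges
```

For a partition whose class 0 is split into two columns by class 1, the CAG has three edges while the RAG has five, since the two regions of class 0 appear as separate vertices.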

Two partitions are said to be class or region equivalent if they divide the image
into equally shaped classes or regions. Two partitions are said to be equal if,
apart from being class equivalent, the labels of each class are equal in both

partitions. The partitions are said to be class topologically equivalent if they
have the same labels and the CAGs are equal.
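A minimal sketch of two of these relations, assuming partitions stored as lists of rows of labels: equality compares labels pixel by pixel, whereas class equivalence only requires a one-to-one correspondence between the labels of the two partitions. Region equivalence could be tested the same way after replacing labels by region numbers. The function names are illustrative.

```python
def equal(p, q):
    """Equal partitions: identical label at every pixel."""
    return p == q


def class_equivalent(p, q):
    """Same classes (same pixel sets) up to a one-to-one relabelling."""
    if len(p) != len(q) or any(len(a) != len(b) for a, b in zip(p, q)):
        return False
    forward, backward = {}, {}  # label mappings p -> q and q -> p
    for row_p, row_q in zip(p, q):
        for a, b in zip(row_p, row_q):
            # Each label must always map to the same partner, both ways.
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                return False
    return True
```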

The line graph of a partition is obtained by duality of its planar image graph, see Figure 3. The contours of a partition are usually defined as the subgraph of the line graph containing all the edges that stand between pixels with different labels (i.e., belonging to different classes), together with their end vertices.
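For the rectangular case, the contour on the line graph can be sketched as follows. The sketch assumes that line-graph vertices are pixel corners (row, column), with 0 <= row <= h and 0 <= column <= w, and it considers only edges between two pixels inside the image; ignoring the image border is a simplification made here.

```python
def contour_edges(partition):
    """Line-graph edges (cracks between pixels) separating different labels."""
    h, w = len(partition), len(partition[0])
    edges = set()
    for r in range(h):
        for c in range(w):
            if c + 1 < w and partition[r][c] != partition[r][c + 1]:
                # vertical crack between pixels (r, c) and (r, c + 1)
                edges.add(((r, c + 1), (r + 1, c + 1)))
            if r + 1 < h and partition[r][c] != partition[r + 1][c]:
                # horizontal crack between pixels (r, c) and (r + 1, c)
                edges.add(((r + 1, c), (r + 1, c + 1)))
    return edges
```

A single pixel with a label different from its surroundings yields four edges forming a closed loop around it.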

[Figure 3: pixels with the line graph edges and vertices superimposed.]

Fig. 3. Line graphs: (a) rectangular and (b) hexagonal.

Contours can also be defined directly on the pixels, by selecting only the pixels
at the borders between regions.

Generally, the contour information allows only for a representation of partitions up to region equivalence. If class equivalence, or equality, is required, then information about which regions belong to which classes (region-class information) is necessary.

Several kinds of contour vertices, as shown in Figure 4, are defined according to their degree (number of incident edges [54]): 1 – dead ends, 2 – regular vertices, 3 – junctions, and 4 – crossings.
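Given the contour edges as pairs of vertices, this classification reduces to counting incident edges. The sketch below uses hypothetical names; on the hexagonal line graph degree 4 cannot occur.

```python
from collections import Counter

VERTEX_KIND = {1: "dead end", 2: "regular", 3: "junction", 4: "crossing"}


def classify_vertices(edges):
    """Map each contour vertex to its kind, based on its degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {v: VERTEX_KIND.get(d, f"degree {d}") for v, d in degree.items()}
```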

[Figure 4: examples of regular vertices, junctions, crossings, and dead ends.]

Fig. 4. Variety of vertices on contour graphs: (a) and (b) rectangular and hexagonal line graphs, and (c) rectangular image graph.

The concepts of path (a sequence of edges in a graph connecting successive vertices), circuit (a path which ends at the starting vertex), and loop (a circuit which cannot be segmented into two circuits) can be defined over a graph [54].

Circuits and paths are simple if they do not contain repeated edges. An Euler
circuit is a simple circuit containing all the edges of a given graph.

3 Taxonomy of partition types and representations

A taxonomy of partition types and representations is proposed in this section. The two main levels of the taxonomy, partition type and partition representation, can be seen to correspond to the first steps taken when developing a partition coding technique: the identification of the problem to be solved corresponds to the identification of the partition type addressed by the coding technique, and the selection of the partition representation corresponds to selecting the kind of data the coding technique will manipulate. Thus, different representations, usually leading to different techniques, can be used for the same type of partitions.

During the description of the taxonomy tree levels, square brackets will be
used to specify the codes representing each branch of the tree.

3.1 Partition type

The partition types can be organised in a tree with the following levels:

(i) Space – Are partitions two- [2D] or three-dimensional [3D]?
(ii) Lattice – What sort of sampling lattice was used for digitising the images from which the partitions were obtained (e.g., rectangular [R] or hexagonal [H])?
(iii) Graph – What kind of graph is superimposed on the partition (usually a neighbourhood system [Nn] is specified)?
(iv) Classes – Are partitions binary [B] or mosaic [M]?
(v) Connectivity – Are the classes connected [C] or can they be disconnected [D] (on the chosen image graph)?

Figure 5 shows the partition type levels of the taxonomy tree. The leaves of the taxonomy tree correspond to different types of partition. Each type of partition can be specified by answering the five questions listed. For instance, the answers (i) two-dimensional [2D], (ii) hexagonal [H], (iii) 6-neighbourhood [N6], (iv) mosaic [M], and (v) connected [C] (or, with codes, 2DHN6MC) define a type of partitions that lie in a two-dimensional space, correspond to digital images sampled according to a hexagonal lattice, are structured according to the hexagonal graph, can have more than two classes, and have only connected classes (the concepts of class and region are equivalent in this case).

[Figure 5: tree with levels Space (2D or 3D), Lattice (hexagonal H or rectangular R), Graph (N6 for hexagonal; N4 or N8 for rectangular), Classes (binary B or mosaic M), and Connectivity (connected C or disconnected D); two-dimensional leaves 2DHN6Bc, 2DHN6Mc, 2DRN4Bc, 2DRN4Mc, 2DRN8Bc, 2DRN8Mc; the example given in the text, 2DHN6MC, is highlighted.]

Fig. 5. The partition type taxonomy tree (in bold, the example given in the text). c stands for either C (connected classes) or D (disconnected classes).

Notice that the branches under “3D” are not drawn in the figure, since this paper mainly addresses two-dimensional partitions. At the partition representation level, however, three-dimensional partitions will be considered in more detail (see next section).

3.2 Partition representation

This section introduces more levels of detail into the taxonomy tree, related to the representation chosen for the partitions. Two- and three-dimensional partitions will be dealt with separately.

3.2.1 Two-dimensional partitions

The first important decision to be made regards mosaic partitions:

(i) Handling – Should the mosaic partitions be handled as such (a single mosaic partition) [M] or should they be separated into a collection of binary partitions (each one corresponding to a different class in the original mosaic partition) [B]?

As will be discussed in Section 4.1, handling mosaic partitions as collections of binary partitions is often of paramount importance. For instance, when classes should be readily accessible from the coded bit stream, a collection of binary partitions may allow easier access to the various objects in a scene than the original mosaic partition.

It has been seen in Section 2 that a partition can be represented in two different ways: either by the label of each pixel, or by contour information plus region-class information. When class equivalence is the aim, the region-class information describes how the regions are clustered into a certain number of classes.

Hence, the next level in the taxonomy will be:

(ii) How – How should the partition be represented? With pixel labels [L] or
with contours [C]?

For partitions represented with contours, other choices have to be made: How should the contours be represented? What sort of neighbourhood system does the line graph have? These questions lead to two further levels of partition representation in the taxonomy tree:

(iii) Where – Where should contours be defined? On the image graph or on the line graph? That is, should the contour be defined on pixels [P] or on edges [E] (understood here as borders between pixels, i.e., edges of the line graph)?
(iv) Graph – What is the kind of neighbourhood system [Nn] of the graph of which the contour is a subgraph?

Figure 6 shows the partition representation levels of the taxonomy tree for the
two-dimensional case.

[Figure 6: under each two-dimensional partition type (2DHN6Bc, 2DHN6Mc, 2DRN4Bc, 2DRN4Mc, 2DRN8Bc, 2DRN8Mc), levels Handling (binary B or mosaic M), How (labels L or contours C), Where (pixels P or edges E), and Graph (neighbourhood system Nn of the contour graph, e.g., N3 for edges of the hexagonal line graph); the example given in the text, 2DHN6Mc-BCEN3, is highlighted.]

Fig. 6. The partition representation taxonomy tree for the two-dimensional case (in bold, the example given in the text). c stands for either C (connected classes) or D (disconnected classes).

The 2DHN6Mc partition type, with a representation separated into binary class partitions and using contours defined on edges with an N3 neighbourhood system, is coded as 2DHN6Mc-BCEN3, that is:

– Partition type – Two-dimensional, hexagonal grid, N6 graph, mosaic, classes connected or disconnected according to whether c is C or D.
– Partition representation – Mosaic treated as independent binary partitions, contours, edges, N3 graph.
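To make the coding convention concrete, here is a hypothetical parser for such codes; it is not part of the taxonomy proposal. It assumes a hyphen (or en dash) between the type and representation parts, the letters introduced in Sections 3.1 and 3.2.1, and only the two-dimensional representation branches.

```python
import re

TYPE_RE = re.compile(r"(2D|3D)([HR])N(\d+)([BM])([CDc])")
REPR_RE = re.compile(r"([BM])(?:(L)|C([PE])N(\d+))")


def parse_code(code):
    """Parse codes such as '2DHN6MC' or '2DHN6Mc-BCEN3' into a dict."""
    code = code.replace("\u2013", "-")  # accept an en dash as separator
    type_part, _, repr_part = code.partition("-")
    m = TYPE_RE.fullmatch(type_part)
    if not m:
        raise ValueError(f"bad partition type: {type_part!r}")
    space, lattice, n, classes, conn = m.groups()
    result = {
        "space": space,
        "lattice": {"H": "hexagonal", "R": "rectangular"}[lattice],
        "graph": f"N{n}",
        "classes": {"B": "binary", "M": "mosaic"}[classes],
        "connectivity": {"C": "connected", "D": "disconnected",
                         "c": "connected or disconnected"}[conn],
    }
    if repr_part:
        r = REPR_RE.fullmatch(repr_part)
        if not r:
            raise ValueError(f"bad representation: {repr_part!r}")
        handling, labels, where, line_n = r.groups()
        result["handling"] = {"B": "binary", "M": "mosaic"}[handling]
        if labels:
            result["how"] = "labels"
        else:
            result["how"] = "contours"
            result["where"] = {"P": "pixels", "E": "edges"}[where]
            result["contour graph"] = f"N{line_n}"
    return result
```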

3.2.2 Three-dimensional partitions

As can be seen in Figure 7, two representations may be considered for three-dimensional partitions: stick to three dimensions [3D], or slice the partition along the time domain and use two-dimensional methods [2D]. The two-dimensional partition slices can be predicted [Inter]; otherwise they are considered independent [Intra]. When prediction is used, it may [M] or may not [F] use motion compensation ('M' stands for “motion” and 'F' for “fixed”). The motion information may be either estimated from the three-dimensional partition [23] or input from external sources (e.g., from a motion estimator working on the original three-dimensional image). Notice that the “slicing” to two dimensions establishes a connection to one of the two-dimensional branches at the top of the representation taxonomy shown in Figure 6, depending on the type of the resulting two-dimensional (possibly predicted) partitions.
[Figure 7: starting from the leaves of the 3D branch of the partition type taxonomy tree, levels Approach (3D or 2D), Prediction (Intra or Inter), and Compensation (motion compensated M or fixed F); the 2D branches lead to the top of the two-dimensional representation taxonomy.]

Fig. 7. The partition representation taxonomy tree for the three-dimensional case.

3.3 Representation properties

Choosing the representation for the partitions (of a given type) depends on the properties of each representation and on how well suited they are to the task at hand. Pros and cons related to some of the levels of the partition representation taxonomy tree are listed below:

– Handling (only for mosaic partitions): (a) Mosaic – A single connected contour graph can separate several regions, which leads to coding efficiency when a contour representation is used; however, access to the shape of a single class is not easy, since the regions (and classes) are not represented individually. (b) Binary – The classes are represented independently, and thus easy access to each class is provided, though at the expense of lower coding efficiency.
– How: (a) Labels – In this case, the identification of the class to which each
pixel in the partition belongs is very simple, though the shapes of the classes
are not directly represented. (b) Contours – The shapes of the classes are
directly represented, albeit at the expense of requiring somewhat involved
algorithms to ascertain the class of a given pixel [50,58,2].
– Where: (a) Pixels – Representing contours on pixels poses a number of problems, especially in the case of mosaic partitions, since using all border pixels leads to unnecessary repetition on both sides of a border; when this problem is avoided by using only one side of each border, other problems arise: e.g., how should one-pixel-wide regions (or parts of regions) be distinguished from the borders of thick regions? Although these problems have solutions, often somewhat involved ones, coding contours on pixels does not seem to achieve higher compression than coding contours on edges [13] (see also Section 4.6). (b) Edges – This is usually a more elegant way of representing contours, which in addition typically provides more compression than pixel-based contours [13].

4 Overview of partition coding techniques

Once the type of partitions to code has been ascertained and a partition representation selected, according to the taxonomy defined in the previous section, there are usually a number of available coding techniques. This section overviews some of these techniques. Special attention will be paid to two-dimensional partitions.

4.1 Objectives of coding

Picard [53] identified three performance criteria that classical video source coders attempt to minimise: rate, distortion, and cost. The first relates to the compression of the data to be transmitted, by reducing redundancy and also irrelevancy, if information losses are admissible during coding. The second pertains to the need to keep the quality of the signal as high as possible, according to a possibly subjective criterion, and applies only to lossy coding techniques. The third has to do with implementation costs.

A “fourth [performance] criterion” was also identified by Picard [53]: minimise “content access work”. That is, the contents of a coded video sequence should be as easy to access and manipulate as possible. Such access to, or manipulation of, individual objects requires them to be coded as independently as possible in the bit stream. This “fourth criterion” is also being addressed in MPEG-4, and is related to one of the most important MPEG-4 functionalities: object scalability.

Whether to use lossy partition coding techniques is an important question. It is true that some inherently lossy techniques, such as parametric curves, can yield good compression [29]. However, it may be difficult, for some applications, to establish sound partition coding quality criteria. Also, when the scene objects (possibly corresponding to classes or sets of classes) are to be manipulated individually, e.g., pasting an object into a different scene, the effects of lossy partition coding can be severe, since pieces of the real object may be lost, pieces of the background may be introduced, and the object may even be deformed. This suggests that lossless partition coding techniques are preferable, and that simplifications should be introduced into the partitions carefully during the segmentation process, before partition coding.

However, if lossy coding is acceptable, the losses are usually constrained so that there is:

(i) Class topological equivalence: the classes should be maintained in number and adjacency relations; that is, the CAG should not be altered. A stronger constraint can be imposed if the RAG is not allowed to change.
(ii) Small displacement of borders: the borders between the regions should change as little as possible (according to some error criterion); other constraints may be imposed, for instance on errors associated with the area and position of the regions.

4.2 Mosaic vs. binary partitions

When easy access to the contents of the video sequence is required, the shapes
of the various objects (e.g., a class or a set of classes in a partition) will
have to be coded independently. This requirement can be imposed even if the
segmentation process resulted in a mosaic partition, reducing the problem to
the coding of a series of binary partitions (see the “handling” level in Figure 6).

The independent coding of binary partitions also arises naturally when a layered scene representation, as proposed by Wang and Adelson [61], is used. Layered representations of the scenes are also used in the MPEG-4 Video Verification Model 3.0 (VM3): each layer corresponds to a two-dimensional object of arbitrary shape, whose time snapshots are called Video Object Planes (VOPs) [52,48]. The shape of the objects represented by VOPs can be associated with binary partitions. 4 However, if the content of the VOPs is coded through region-based techniques, then mosaic partitions will also be necessary within each VOP.

Thus, the coding of both binary and mosaic partitions may be important when easy access to the contents of the video sequences is required.

4.3 Partition models

The coding efficiency always depends on the characteristics of the partitions being coded. Most techniques aim at genericness, though this is a somewhat hard to define property. By genericness it is often meant that the techniques perform well on average. The problem with this definition is that often little is known about the statistics of the partitions which need to be coded. This is a general problem in image processing: is there a statistical model for the images to process? In the case of partition coding, the statistical characterisation of input partitions depends both on the original images and on the segmentation algorithm used upstream. 5 Hence, most techniques do not address a specific model of input partitions, making only some general assumptions such as: 6

(i) the regions tend to contain a significant number of pixels, i.e., small regions are improbable;
(ii) the classes tend to contain a small number of regions;
(iii) the contours (borders between regions) tend to be simple (not ragged);
(iv) the region interiors tend not to contain too many small holes.

4 Actually the shapes of the VOPs can be specified in MPEG-4 using “binary shape”, i.e., a binary partition, or “grey scale shape”, which is an alpha plane specifying the transparency of each pixel.
5 Such a dependency makes it difficult to evaluate the performance of a partition coder by itself.
6 See for instance Chapter 10 of [28].

4.4 Class coding

Class coding is necessary when (1) only class equivalence is required, (2) the partitions used have disconnected classes (see the “connectivity” level in Figure 5), and (3) the explicit labels of the partition pixels have not (yet) been coded (which is the case for contour coding techniques and for some label coding techniques). The objective of class coding is to establish which regions are grouped in the same class. This issue will not be discussed at length here. However, note that the coding methods used should take into account that:

(i) the explicit class labels are not required, since class equivalence is enough;
(ii) adjacent regions cannot belong to the same class, for otherwise they would
be a single region (this can help reduce the amount of data to transmit).

If partition equality is required, then the class labels should be coded explicitly for each region in the partition. When the classes are connected, the fact that each label appears only once can be used to reduce the amount of data to transmit: the number of labels still available decreases by one with each label transmitted, so that no information at all is needed for the last region once the next-to-last label has been transmitted.
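The saving can be quantified under simple assumptions: each label is coded with a uniform model, independently of the others, and the constraint that adjacent regions cannot share a class is ignored. With n connected classes (one label per region), the number of remaining choices goes n, n-1, ..., 1, giving log2(n!) bits instead of n·log2(n). The function below is an illustrative sketch, not a coder.

```python
import math


def label_bits(num_regions, num_labels, connected_classes):
    """Ideal bits to send one label per region under a uniform model."""
    if connected_classes:
        # Every label is used exactly once: n, n-1, ..., 1 remaining
        # choices, the last one costing log2(1) = 0 bits.
        return sum(math.log2(k) for k in range(num_labels, 0, -1))
    return num_regions * math.log2(num_labels)
```

For four connected classes this gives log2(24), about 4.58 bits, against 8 bits when labels are coded without exploiting the constraint.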

4.5 Label coding

Label coding techniques code partitions whose representation is based on pixel labels. The cases of binary and mosaic partitions will be addressed separately in the following.

4.5.1 Binary partitions

Binary partitions can be seen as binary (or two-tone or bi-level) images. Therefore, the techniques available for coding binary images are good candidates for coding binary partitions. While lossless techniques can be applied without any problems, lossy techniques often do pose some problems, since the type of losses they allow does not generally take into account the requirements identified in Section 4.1 for lossy partition coding.

Reviews of binary image coding can be found in [36,30] and, specifically for fax, in [31]. The lossless coding standards ITU-T T.4 and T.6 (Group 3 and Group 4 facsimile) [16,17] and ITU-T T.82 (JBIG, for progressive coding of binary images) [32] use techniques of increasing compression efficiency:

– ITU-T T.4 – Uses one-dimensional run-length encoding (RLE) and, optionally, also the two-dimensional modified relative element address designate (MREAD) codes, both followed by variable length coding (VLC). In the two-dimensional mode, every k-th line is coded with RLE (k is set to 2 for low resolution images and to 4 for high resolution images), while all the other lines are coded with MREAD.
– ITU-T T.6 – Is similar to ITU-T T.4, though the two-dimensional mode is always used and k is set to infinity, so that only MREAD is used. The resulting codes are called modified MREAD (MMREAD).
– ITU-T T.82 – Uses the arithmetic Q-Coder [51] to code the pixel values. The probabilities for the Q-Coder are estimated using a local context (a template) for the current pixel. Since JBIG uses resolution layers for progressive coding, two types of templates exist: the first is used in the lowest resolution layer and includes only pixels already transmitted in that layer, while the second is used for all the other layers and includes pixels not only from the current layer but also from the layer immediately below in resolution.
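The one-dimensional mode of ITU-T T.4 can be illustrated by its run-length stage alone. The sketch assumes 0 for white and 1 for black and produces alternating run lengths that start with a white run (of length zero when the line starts with black). The variable length code tables of the standard and the MREAD mode are not reproduced.

```python
def run_lengths(line):
    """Alternating white/black run lengths, starting with a white run."""
    runs, current, length = [], 0, 0
    for pixel in line:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current, length = pixel, 1
    runs.append(length)
    return runs


def decode_runs(runs):
    """Inverse mapping: expand run lengths back into a line of pixels."""
    line, colour = [], 0
    for n in runs:
        line.extend([colour] * n)
        colour ^= 1  # runs alternate between white and black
    return line
```

For example, the line 1 1 0 0 0 1 becomes the runs 0, 2, 3, 1.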

A technique based on a modified MMREAD code, applied to 16 × 16 blocks, has been proposed for the coding of binary alpha maps in the framework of MPEG-4 [1]. This technique was adopted in VM3 [48] after the last round of core experiments on binary shape coding [59] (see also Section 4.7). Two techniques related to JBIG [5,6] were also evaluated during the core experiments. Both use arithmetic codes with probabilities estimated from a local context around the pixel to be coded.
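The context-based approach can be sketched by computing the ideal code length that an arithmetic coder would approach. This is neither JBIG's template nor the Q-Coder: the sketch assumes a hypothetical four-pixel causal template and simple adaptive counts per context, and it reports a bit count instead of producing a bit stream.

```python
import math
from collections import defaultdict

# Hypothetical causal template: W, NW, N, NE (all coded before the
# current pixel in raster order). Real templates are larger.
TEMPLATE = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def context(image, r, c):
    """Context number from already-coded neighbours (0 outside the image)."""
    ctx = 0
    for dr, dc in TEMPLATE:
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(image) and 0 <= cc < len(image[0])
        ctx = (ctx << 1) | (image[rr][cc] if inside else 0)
    return ctx


def ideal_code_length(image):
    """Bits with adaptive per-context estimates (counts start at 1)."""
    counts = defaultdict(lambda: [1, 1])
    bits = 0.0
    for r in range(len(image)):
        for c in range(len(image[0])):
            n = counts[context(image, r, c)]
            pixel = image[r][c]
            bits -= math.log2(n[pixel] / (n[0] + n[1]))
            n[pixel] += 1  # update the model after coding the pixel
    return bits
```

An all-white 16 × 16 block costs only log2(257), about 8 bits, because the single context quickly learns that white is almost certain.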

Among all the other techniques that have been proposed for binary partition coding, the morphological skeleton [41] (and, more recently, [33]) is especially relevant, mainly because this technique has lately evolved to cover mosaic partitions efficiently as well [8] (see Section 4.5.2). This technique represents the shape of a region by a set of skeleton points and a so-called quench function: the region is the union of structuring elements (of a certain shape) centred on the skeleton points and scaled according to the value of the quench function at each point.
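The reconstruction side of this representation can be sketched in a few lines. The sketch assumes square structuring elements (a quench value q gives a (2q+1) × (2q+1) square) and a skeleton given as a mapping from points to quench values; the element shape is an assumption here, and skeleton extraction is not shown.

```python
def reconstruct(skeleton, height, width):
    """Union of squares of radius q centred on each skeleton point."""
    image = [[0] * width for _ in range(height)]
    for (r, c), q in skeleton.items():
        # Clip the scaled structuring element to the image.
        for rr in range(max(0, r - q), min(height, r + q + 1)):
            for cc in range(max(0, c - q), min(width, c + q + 1)):
                image[rr][cc] = 1
    return image
```

Two skeleton points two pixels apart, each with quench value 1, reconstruct a 3 × 5 rectangle.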

Since binary partitions are a special case of mosaic partitions, techniques developed for the latter may also be applied to the former, either directly or with simplifying changes, even though they do not take into account the special characteristics of binary partitions.

4.5.2 Mosaic partitions

The case of mosaic partitions is more complex. The coding of mosaic partitions has received less attention than the coding of binary partitions (however, see [8,7,60]). It is possible, nevertheless, to use binary partition coding techniques by first converting the mosaic partitions into bit planes. 7
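The bit-plane conversion of footnote 7 can be sketched as follows, assuming that a four-colouring of the regions is already available as a label image with values 0 to 3 (computing such a colouring is not shown). Each returned plane is a binary partition that could be coded with the techniques of Section 4.5.1.

```python
def to_bit_planes(colours):
    """Split a 0..3 colour image into its low and high bit planes."""
    low = [[v & 1 for v in row] for row in colours]
    high = [[(v >> 1) & 1 for v in row] for row in colours]
    return low, high


def from_bit_planes(low, high):
    """Recombine the two binary planes into the colour image."""
    return [[l | (h << 1) for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
```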

A technique using the concept of geodesic skeleton, where the regions are described by a set of skeleton points and a quench function, was recently proposed [8]. This technique was developed for mosaic partitions, and is thus also applicable in the binary case; it is, in a sense, an extension of the technique proposed in [41] for binary partitions (see Section 4.5.1). The authors claim that “the geodesic skeleton is preferable to chain code whenever there are many isolated and short contour arcs to be coded”, which seems to be the case when 3D···-2DInterM partition representations (motion-predicted 2D partitions corresponding to time slices of a 3D partition) are used.

7 For instance, using the Four-colour theorem [54], the regions in a partition can be perfectly identified by painting them with only four colours, so that no two adjacent regions share a colour. Hence, each region can be identified by a two-bit label, and thus two bit-planes are sufficient for representing the partition. Each of the two bit-planes can be coded independently using (lossless) binary partition coding techniques. Notice that some borders are present in both bit-planes, so this method cannot yield optimal results.

A method which is also related to geodesic skeletons has been proposed in [60]. It represents regions as a union of structuring elements with appropriate translations and scalings. Both techniques ([8,60]) allow the structuring elements to overlap already coded regions, thus avoiding duplicate coding of borders and reducing the required bit rate. Both techniques are lossy and, again, can be used for mosaic and binary partitions.

Another interesting technique, based on Johnson-Mehl tessellations, has been proposed in [7]. 8 The idea is to find germs and their germination times for each region such that the original partition is reproduced well when the germs are allowed to grow until they reach other growing germs. Though the proposed technique is lossy, it can easily be made lossless. According to the authors, the technique performed worse than the other techniques studied (straight line and polygonal approximation, chain codes, and geodesic skeletons).
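A simplified view of the growth process: with unit growth speed, each pixel belongs to the germ that reaches it first, i.e., the germ minimising its germination time plus its distance to the pixel. The sketch assumes Euclidean distance on pixel coordinates; the technique in [7] must also find the germs and times, which is not shown.

```python
import math


def johnson_mehl(germs, height, width):
    """Label pixels by first arrival: t_i + |p - g_i| (unit speed).

    germs: list of ((row, col), germination_time)."""
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            arrival = [t + math.hypot(r - gr, c - gc) for (gr, gc), t in germs]
            # Ties go to the lower-numbered germ.
            labels[r][c] = min(range(len(germs)), key=arrival.__getitem__)
    return labels
```

Delaying the germination of one germ shifts the border between the two cells towards it, which is how germination times encode region sizes.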

4.6 Contour coding

At least three families of contour coding techniques can be distinguished:

(i) Chain codes – The contour graph is coded by a string of symbols representing the direction of the “chain” connecting a vertex to the next vertex on the contour. Each of these strings is called a chain code. Symbols may also represent direction changes, which makes the chain codes differential.
(ii) Parametric curves – The contours are approximated by parametric curves, whose coefficients are then coded; the most common examples are approximations by straight lines and by splines (in general, by polynomials).
(iii) Transform codes – The contours are represented as parametric curves which are coded using transform methods, followed by coefficient quantisation, in a one-dimensional equivalent of transform image coding.

All these techniques involve two steps: first the representation is changed by transforming the contours into strings of symbols (e.g., changes in chain direction, spline parameters, control points or transform coefficients, possibly quantised), and then these symbols are entropy coded.

8 [7] also contains a good review of partition coding techniques.

For contours defined on pixels, it is also possible to use techniques developed for binary image coding. The idea is to paint black, against a white background, all the border pixels in the partition and then use one of the techniques mentioned in Section 4.5.1. Notice, however, that lossless techniques should in general be used, since lossy techniques were not usually developed with partition coding in mind.

4.6.1 Chain codes

The contour graph is a subgraph of either the line graph (for contours defined on edges) or the image graph (for contours defined on pixels), and usually consists of a collection of paths on the original graph. Contours can thus be represented by a string of symbols indicating which of the neighbours of the current graph vertex belongs to the contour or, equivalently, the direction of the (chain) “link” connecting it to the next vertex on the contour: these strings are called chain codes [18,19,64]. When the symbols represent direction changes, the chain codes are said to be differential [14,22]. The simplest partitions are those whose contour graph consists of disconnected loops, that is, circuits in which each vertex has exactly two neighbours in the contour graph.
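As an illustration, the following Python sketch computes the absolute and the differential chain codes of a closed contour given as a list of (row, column) vertices. The direction numbering follows the usual Freeman convention (0 for east, increasing counterclockwise in 45° steps), with the assumption that rows grow downwards; the function names are hypothetical.

```python
# Freeman 8-direction numbering: (row step, column step) -> symbol.
# Rows grow downwards, so "north" is a step of -1 in the row index.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_code(path):
    """Absolute chain code of a closed path of 8-neighbouring (row, col) vertices."""
    closed = path + path[:1]  # return to the initial vertex
    return [DIRECTIONS[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(closed, closed[1:])]


def differential_chain_code(codes):
    """Direction changes modulo 8; the first link is taken relative to the last one."""
    return [(codes[i] - codes[i - 1]) % 8 for i in range(len(codes))]


rectangle = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
codes = chain_code(rectangle)
print(codes)                           # [0, 0, 6, 4, 4, 2]
print(differential_chain_code(codes))  # [6, 0, 6, 6, 0, 6]
```

Only two symbols occur in the differential code of this clockwise rectangle (0 for going straight, 6 for a 90° turn to the right), which is what makes differential codes attractive for entropy coding. For contours defined on edges of a rectangular lattice, only the even directions 0, 2, 4 and 6 can occur.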

Binary partitions are generally simpler to code than mosaic partitions. The main difference stems from the fact that, for binary partitions, all vertices in the contour graph (at least for contours defined on the line graph) have an even number of neighbours: two for images sampled with hexagonal lattices, and two or four for images sampled with rectangular lattices. That is, the connected components of such graphs have Euler circuits, i.e., they can be “drawn without lifting the pencil”, according to a well-known theorem 9 in graph theory [54].
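The parity property can be checked numerically. The Python sketch below counts, for every pixel corner (a vertex of the line graph of a rectangular lattice), how many of its four incident edges separate pixels with different labels. It is a minimal sketch under stated assumptions: labels are given as a 2-D list, pixels outside the image are treated as label 0 (so a binary partition must use labels 0 and 1), and `corner_degrees` is a hypothetical name.

```python
def corner_degrees(labels):
    """Degree of every pixel corner in the contour graph defined on edges.

    Corner (i, j) touches pixels (i-1, j-1), (i-1, j), (i, j-1) and (i, j);
    each of its four incident edges belongs to the contour when the two
    pixels it separates have different labels.
    """
    rows, cols = len(labels), len(labels[0])

    def get(r, c):
        # Assumption: pixels outside the image belong to region 0.
        return labels[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    degrees = {}
    for i in range(rows + 1):
        for j in range(cols + 1):
            a, b = get(i - 1, j - 1), get(i - 1, j)
            c, d = get(i, j - 1), get(i, j)
            degree = (a != b) + (c != d) + (a != c) + (b != d)
            if degree:
                degrees[(i, j)] = degree
    return degrees


binary = [[0, 1, 1],
          [0, 1, 0],
          [1, 1, 0]]
mosaic = [[1, 2],
          [3, 3]]
print(all(d % 2 == 0 for d in corner_degrees(binary).values()))  # True
print(corner_degrees(mosaic)[(1, 1)])                            # 3
```

For binary labels the four comparisons around a corner always sum to an even number, since each of the four pixels enters exactly two of them; with three or more labels, degree-3 junctions such as the one at corner (1, 1) of the mosaic example can appear.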

Mosaic partitions with contours defined on edges require special treatment, since the existence of junction vertices (vertices with degree 3, see Figure 4) precludes the definition of contours as disconnected circuits. There are at least two ways of dealing with this problem:

(i) Ignore junctions and crossings – Select one of the exits and leave the others to be coded as separate contours; since initial contour points are costly to code, this solution is not optimal.

(ii) Code junctions and crossings explicitly [42] – Select one of the exits, but also code information about the junction or crossing so that the coder can later “return” and continue following the remaining exits (one in the case of a junction, two in the case of a crossing).

9 “A connected multigraph [and hence also a simple graph] has an Euler circuit if and only if each of its vertices has even degree” [54]. This theorem solves the so-called Königsberg bridges problem.
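Option (ii) can be sketched as a traversal that, whenever it leaves a vertex with more than one unused exit, pushes that vertex on a stack so that it can later return to it instead of coding a new initial point. The Python below is a hypothetical illustration, not the scheme of [42]: the contour graph is assumed to be a simple graph given as an adjacency dictionary, exits are taken in list order, and the way the junction information would actually be signalled to the decoder is not modelled.

```python
def trace_with_junctions(adjacency, start):
    """Cover all edges reachable from `start` with paths. Every path after the
    first begins at a junction or crossing visited earlier (popped from a
    stack), so no new initial point has to be coded for it."""
    used = set()   # undirected edges already followed
    stack = [start]
    paths = []
    while stack:
        vertex = stack.pop()
        path = [vertex]
        while True:
            exits = [w for w in adjacency[vertex] if frozenset((vertex, w)) not in used]
            if not exits:
                break
            if len(exits) > 1:
                stack.append(vertex)  # come back later for the other exits
            nxt = exits[0]
            used.add(frozenset((vertex, nxt)))
            path.append(nxt)
            vertex = nxt
        if len(path) > 1:
            paths.append(path)
    return paths


# Two junctions J1 and J2 joined by three contour segments (through a, b, c).
theta = {"J1": ["a", "b", "c"], "J2": ["a", "b", "c"],
         "a": ["J1", "J2"], "b": ["J1", "J2"], "c": ["J1", "J2"]}
print(trace_with_junctions(theta, "a"))
# [['a', 'J1', 'b', 'J2', 'a'], ['J2', 'c', 'J1']]
```

Here the second path starts at junction J2, which was pushed when the first path passed through it. Starting from J1 instead yields a single path covering all six edges, which illustrates how the choice of starting point and exit order changes the result.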

When junctions and crossings are explicitly coded, the compression obtained
when coding a connected component of a contour depends strongly on the way
the connected component is followed: where to start, which exit to follow first
at each junction or crossing, etc. The problem of coding can then be seen as
a problem of minimising the bit rate given a certain syntax of representation.
This problem is similar to “the Königsberg bridges problem generalised”, that
is, to the problem of making a line drawing without lifting the pencil and
minimising the length of the redrawn lines [3].

When contours are defined on pixels, the concepts of junction and crossing require a more involved definition and treatment [40,13]. In the case of binary partitions, the problem may be solved by again ignoring the presence of vertices of degree larger than two in the contour graph. Another problem of contours defined on pixels is posed by one-pixel-wide regions or parts of regions, which make it difficult to use a stopping condition as simple as “stop when the initial vertex of the contour is reached”, which is often used when coding contours defined on edges. Such regions may also require a “turning back” (180°) direction in the chain codes; since this direction is rarely used, it may make some VLCs (Huffman codes, for instance) inefficient. 10

In general, chain codes correspond to the specification of a subgraph, consisting of a set of paths, in the underlying image or line graph. A connected component of a contour consists of a set of paths linked at junctions and crossings. These paths can be represented by: 1. a position for the first vertex of the path, possibly indicated implicitly by previous crossing or junction information, and 2. a string of symbols, the chain codes, which may include information about crossings and junctions. Both the first vertex position and the chain codes are then entropy coded. The construction of the chain codes may also include contour simplification procedures.

Several techniques have been proposed in the literature for entropy coding the initial vertices and the chain codes: 1. zero-order Huffman and arithmetic coding (adaptive or not) [43,42], which tend to be inefficient, since region borders are usually very different from a Brownian random walk through the image or line graph; 2. nth-order Huffman and arithmetic coding (adaptive or not) [13,43,14]; 3. Ziv-Lempel coding [65,63], which is a form of “dictionary-based coding” [36]; and 4. run-length coding, which groups chain codes into runs of related symbols [34,43], usually corresponding to straight line segments [37,4,43] (and hence consisting either of a single symbol or of two symbols with adjacent directions, which satisfy the conditions defined by Rosenfeld in [55]).

10 Consider an alphabet consisting of two symbols A and B with equal probabilities 0.5: the corresponding Huffman code has one bit per symbol. If a third symbol C, improbable but possible, is added, with probabilities p(A) = 0.495, p(B) = 0.495 and p(C) = 0.01, the code-word lengths become 1, 2 and 2 bits, respectively. The average number of bits per symbol is then 1.505, about 40% worse than the minimum (the entropy) of 1.071.
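The arithmetic of footnote 10 can be reproduced with a small Huffman code-length computation. The Python below is a minimal sketch with hypothetical function names. It builds the code lengths by repeatedly merging the two least probable groups of symbols and compares the resulting average length with the entropy.

```python
import heapq
import math


def huffman_lengths(probs):
    """Code-word length of each symbol of a binary Huffman code for `probs`."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    tie = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, tie, group1 + group2))
        tie += 1
    return lengths


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


probs = {"A": 0.495, "B": 0.495, "C": 0.01}
lengths = huffman_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(sorted(lengths.values()))                     # [1, 2, 2]
print(round(average, 3), round(entropy(probs), 3))  # 1.505 1.071
```

Which of A or B receives the one-bit code word depends only on tie-breaking; the average length is the same either way. An arithmetic coder can approach the entropy more closely, which avoids most of this penalty when a rarely used symbol must be kept in the alphabet.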

In the framework of the MPEG-4 core experiments on binary shape coding [59], extensions to basic or differential chain codes have been proposed. In [20,59] a lossy multi-grid chain code is proposed which, according to the authors, reduces the coding cost by an average of 25% relative to differential chain codes. In [62] a method is proposed which decomposes a (differential) chain code into two chain codes with half the resolution, plus additional codes if lossless coding is desired.

4.6.2 Parametric curves

These techniques approximate contours (or contour segments) by parametric functions, usually polynomials. The functions can usually be represented by either a set of coefficients or a set of control points [56,15]. The coefficients or the coordinates of the control points are quantised and then entropy coded. Notice that when polynomials of degree one are used (with rectangular coordinates), the contours are approximated by polygons. The use of control points [49] simplifies the quantisation process, since it is simpler to control the errors introduced by quantising the coordinates of control points than the errors introduced by quantising the coefficients of a polynomial. In the case of mosaic partitions, the crossings and junctions of contours (as defined in Section 4.6.1) are frequently selected as control points [15,39].

One of the most important problems in the parametric curve representation of contours is error control. Iterative techniques are commonly used, which successively split the contour until a sufficiently small approximation error is obtained for each resulting segment [15,39]. The error is frequently calculated from the geometrical distance between the parametric curves and the real contours [39,21], but some researchers propose using the contrast across the contours, assuming it is available [15].
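A common instance of such iterative splitting is the Ramer-Douglas-Peucker procedure, sketched below in Python. This sketch assumes that the error is the maximum Euclidean distance from the contour points to the approximating straight segment, which is the geometric criterion mentioned above; the contrast-based criterion of [15] is not modelled. It approximates an open contour segment, for example one running between two junctions, by a polygon and returns the indices of the selected control points. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygon_approximation(contour, tolerance):
    """Indices of the control points of a polygon approximating an open
    contour segment so that no contour point lies farther than `tolerance`."""
    control = [0]

    def split(first, last):
        # Find the contour point farthest from the current straight segment.
        worst, worst_dist = None, 0.0
        for k in range(first + 1, last):
            d = point_segment_distance(contour[k], contour[first], contour[last])
            if d > worst_dist:
                worst, worst_dist = k, d
        if worst is not None and worst_dist > tolerance:
            split(first, worst)    # refine the left part first...
            control.append(worst)  # ...so indices come out in order
            split(worst, last)

    split(0, len(contour) - 1)
    control.append(len(contour) - 1)
    return control


l_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(polygon_approximation(l_shape, tolerance=0.5))  # [0, 3, 6]
```

The control points selected this way, here the two end points and the corner, are what would then be quantised and entropy coded.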

When control points are used, their differences along the contour graph are
usually entropy coded. These methods deal with junctions and crossings in a
very similar way to chain coding techniques (see Section 4.6.1).

As part of the MPEG-4 core experiments on binary shape coding [59], parametric curve techniques have also been evaluated [21,47,35,12] (some of these techniques stem from the earlier work in [29]). These techniques approximate the contours with polygons or splines using a set of control points, again chosen with a split algorithm. The choice of approximation method is made either for each contour segment (between control points) or for each object. The proposed techniques also take advantage of temporal redundancy between the control points of successive partitions. One-dimensional transform coding methods, some of them multi-resolution, are proposed to code the residual error between the parametric curve approximation and the actual contours (see the next section).

4.6.3 Transform codes

The contours are first represented as parametric curves taking values in ℝ, if the contour (or contour segment) being coded can be represented by a polar function centred somewhere in the image, or in ℝ² for other kinds of contours (or contour segments). These parametric curves (still a lossless representation) are then coded using transform methods [10], in a one-dimensional equivalent of the transform coding used in image coding (e.g., the DCT in JPEG, H.261, H.262, and H.263), i.e., the parametric curves are transformed and the resulting coefficients are quantised and entropy coded.
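One classic way to apply a transform to a closed contour in ℝ² is to treat each vertex (x, y) as the complex number x + iy and take the discrete Fourier transform of the resulting periodic sequence, in the spirit of the Fourier coding of [10]. The Python sketch below keeps only the lowest frequencies and quantises them uniformly. The plain O(n²) DFT, the truncation rule, and the parameter names `keep` and `step` are illustrative assumptions. The entropy coding of the quantised levels is omitted.

```python
import cmath
import math


def dft(z):
    """Plain O(n^2) discrete Fourier transform of a complex sequence."""
    n = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(coeffs):
    """Inverse of dft()."""
    n = len(coeffs)
    return [sum(coeffs[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def encode_contour(points, keep, step):
    """Quantised Fourier coefficients of a closed contour; frequencies whose
    absolute value exceeds `keep` are discarded."""
    coeffs = dft([complex(x, y) for x, y in points])
    n = len(coeffs)
    levels = []
    for m, c in enumerate(coeffs):
        freq = m if m <= n // 2 else m - n  # signed frequency of bin m
        if abs(freq) <= keep:
            levels.append(complex(round(c.real / step), round(c.imag / step)))
        else:
            levels.append(0j)
    return levels


def decode_contour(levels, step):
    """Dequantise and invert the transform, returning (x, y) points."""
    return [(z.real, z.imag) for z in idft([q * step for q in levels])]


circle = [(20 + 10 * math.cos(2 * math.pi * k / 16),
           20 + 10 * math.sin(2 * math.pi * k / 16)) for k in range(16)]
levels = encode_contour(circle, keep=2, step=0.5)
print(sum(1 for q in levels if q != 0))  # 2
rebuilt = decode_contour(levels, step=0.5)
```

A sampled circle needs only two non-zero coefficients (the mean position and one first-order term), so it is rebuilt almost exactly; less regular contours spread their energy over more frequencies, and `keep` and `step` then trade bit rate against accuracy.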

Transform codes have also been under scrutiny in the MPEG-4 core exper-
iments on binary shape coding [59], both for contour coding proper and for
coding the residual error after using parametric curve methods.

The first technique considered in the core experiments uses a polar representation of the contour [11]. The contour is represented by a function
of the polar angle, whose value is the distance between the centroid and the
contour in the direction defined by the angle. 11 The one-dimensional DCT of
the distance function is calculated and then its coefficients are quantised and
VLC coded. Some contours cannot be properly represented by a parametric
function of the polar angle (since more than one contour point may occur for
a single angle). Hence, parts of the contour may have to be left out. These
parts are handled separately using chain codes (see Section 4.6.1). This tech-
nique can also take advantage of the temporal redundancy between successive
partitions.
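The polar representation of [11] can be sketched as follows. This Python is a simplified illustration rather than the tool evaluated in the core experiments. The region is assumed to be given as a binary mask, and the centroid is computed as in footnote 11. The distance function is sampled at equally spaced angles by marching along each ray until the region is first left, so parts of a contour beyond that first exit are simply lost, as described in the text. An orthonormal DCT-II is then followed by uniform quantisation of the first `keep` coefficients. All names and parameters are hypothetical.

```python
import math


def radial_function(mask, samples, step=0.25):
    """Distance from the region centroid to the region border at `samples`
    equally spaced polar angles, found by marching outwards along each ray."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    cr = sum(r for r, _ in pixels) / len(pixels)  # centroid row
    cc = sum(c for _, c in pixels) / len(pixels)  # centroid column
    radii = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        dist = 0.0
        while True:
            r = round(cr - (dist + step) * math.sin(theta))  # rows grow downwards
            c = round(cc + (dist + step) * math.cos(theta))
            if not (0 <= r < len(mask) and 0 <= c < len(mask[0])) or not mask[r][c]:
                break  # first exit from the region: stop here
            dist += step
        radii.append(dist)
    return radii


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def encode_radii(radii, keep, qstep):
    """Uniformly quantise the first `keep` DCT coefficients; drop the rest."""
    return [round(c / qstep) if k < keep else 0 for k, c in enumerate(dct(radii))]


def decode_radii(levels, qstep):
    return idct([q * qstep for q in levels])


disc = [[(r - 10) ** 2 + (c - 10) ** 2 <= 64 for c in range(21)] for r in range(21)]
radii = radial_function(disc, samples=32)
levels = encode_radii(radii, keep=8, qstep=0.05)
rebuilt = decode_radii(levels, qstep=0.05)
print(max(abs(a - b) for a, b in zip(rebuilt, radii)))  # worst-case radial error
```

Because the transform is orthonormal, the quantisation error in each kept coefficient maps directly into the error of the rebuilt distance function, which keeps the relationship between `qstep` and the contour error easy to reason about.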

The other transform coding techniques tested in the MPEG-4 core experiments use either the one-dimensional DST or DCT to code not the contour itself, but the residual error (distance) between a parametric curve approximation and the actual contour [47,35,12]. In [12] the distance between the approximated and actual contours is calculated either horizontally or vertically, depending on the slope of the line between the control points of the contour segment being encoded. This substantially reduces the computation relative to the usual orthogonal distance method. In [47] a multi-resolution version of the DST is used, so as to provide contour (object) scalability.

11 The centroid is the point whose coordinates are the average of the coordinates of all the pixels in the region enclosed by the contour.
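The simplification described for [12] comes from replacing a point-to-line projection by a single coordinate difference. The Python below is a minimal sketch. It assumes that the vertical distance is used when the chord between two control points is closer to horizontal (|slope| ≤ 1) and the horizontal distance otherwise; this selection rule and the name `axis_residuals` are assumptions, not details taken from [12]. The resulting residual sequence is what a one-dimensional DST or DCT would then code.

```python
def axis_residuals(points, a, b):
    """Signed residuals between contour points and the chord from control
    point a to control point b, measured along a single coordinate axis."""
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        raise ValueError("control points must be distinct")
    if abs(dx) >= abs(dy):
        # Shallow chord: vertical distance y - y_chord(x).
        return [y - (ay + dy * (x - ax) / dx) for x, y in points]
    # Steep chord: horizontal distance x - x_chord(y).
    return [x - (ax + dx * (y - ay) / dy) for x, y in points]


print(axis_residuals([(1, 1), (2, 0), (3, 1)], (0, 0), (4, 1)))  # [0.75, -0.5, 0.25]
```

Each residual needs only a few additions, one multiplication and one division, with no square root, whereas the orthogonal distance requires a projection onto the chord.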

4.7 Evaluation of coding techniques

The evaluation of the various existing partition coding techniques is an important issue, though out of the scope of this paper. As mentioned before, [7] contains a good review of some partition coding techniques together with their evaluation.

Recently, MPEG-4 has finished a round of core experiments on binary shape coding [59], in which techniques for binary partition coding were evaluated within a common framework. The core experiments usually take place between MPEG meetings. If the best of the evaluated techniques is also better than the technique used in the current version of the VM, the VM is updated. Hence, the VM is continuously evolving and serves as the reference against which all coding techniques or tools are compared. The current VM, VM3 [48], uses modified MMREAD codes [1] for binary shape coding.

5 Conclusion

A systematisation of the field of partition coding has been proposed in the form of a taxonomy tree. The tree is divided into two main levels: partition type and partition representation.

The partition type level classifies the possible partition types that may have to
be coded. The partition types are classified according to: i. space (2D or 3D),
ii. sampling lattice, iii. superimposed graph structure, iv. number of classes in
the partition, and v. class connectivity.

The partition representation level classifies the possible representations for each partition type. The representation of two-dimensional partitions is classified according to: i. whether mosaic partitions should be broken into a set
of binary partitions, ii. whether the partition should be represented by pixel
labels or by contours, and iii. whether contours should be defined on the pix-
els or on the edges. The representation of three-dimensional partitions, on the
other hand, is classified according to whether three-dimensional partitions are
represented by successive two-dimensional partitions, each corresponding to a
time instant, whether prediction from the previous two-dimensional partition
in the sequence is used, and whether motion compensation is also used.

The proposed systematisation is believed to simplify the comparison between
partition coding techniques, by establishing clearly which type of partitions a
given partition coding technique addresses, and which partition representation
that technique is based on.

An overview of the partition coding techniques available for each partition type and the corresponding partition representations has been presented. The overview includes techniques evaluated under the MPEG-4 core experiments on binary shape coding [59]. The extension of the taxonomy tree with a systematisation of partition coding techniques is left for further study.

An issue of interest, which will also be left for further study, is the extension of the taxonomy tree to include a branch for line drawings or contours that may be open (and which are therefore not the dual of any partition). This is of interest since contour-based coding, or image reconstruction from edges [22,9,15], with its long history, still seems to have a large potential in image coding.

Acknowledgement

The authors would like to acknowledge the valuable comments of Prof. Fer-
nando Pereira and of the anonymous reviewers.

References

[1] Technical description for MPEG-4 first round of test. Technical Description
ISO/IEC JTC1/SC29/WG11 MPEG95/0354, Toshiba, November 1995.
[2] S. M. Ali and R. E. Burge. A new algorithm for extracting the interior of
bounded regions based on chain coding. Computer Vision, Graphics, and Image
Processing, 43:256–264, 1988.
[3] Richard Bellman and K. L. Cooke. The Königsberg bridges problem
generalized. Journal of Mathematical Analysis and Applications, 25:1–7, 1969.
[4] Michael James Biggar and A. G. Constantinides. Thin line coding techniques.
In Proceedings of the International Conference on Digital Signal Processing,
Florence, Italy, September 1987.
[5] Frank Bossen and Touradj Ebrahimi. A simple and efficient binary shape coding
technique based on bitmap representation. Technical Description ISO/IEC
JTC1/SC29/WG11 MPEG96/0964, EPFL, July 1996.
[6] Noel Brady. Adaptive arithmetic encoding for shape coding. Technical
Description ISO/IEC JTC1/SC29/WG11 MPEG96/0975, Teltec Ireland
(Dublin City University), ACTS/MoMuSys, July 1996.

[7] Patrick Brigger, Antoni Gasull, Chuang Gu, Ferran Marqués, Fernand
Meyer, and Christophe Oddou. Contour coding. CEC Deliverable
R2053/UPC/GPS/DS/R/006/b1, EPFL, UPC, CMM, LEP, December 1993.

[8] Patrick Brigger and Murat Kunt. Morphological shape representation for very
low bit-rate video coding. Signal Processing: Image Communication, 7(4–
6):297–311, November 1995.

[9] Stefan Carlsson. Sketch based coding of grey level images. Signal Processing,
15(1):57–83, July 1988.

[10] R. Chellappa and R. Bagdazian. Fourier coding of image boundaries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):102–105, January 1984.

[11] Yu-Shin Cho, Shi-Hwa Lee, Jae-Seob Shin, and Yang-Seock Seo. Results of core
experiments on comparison of shape coding tools (S4). Technical Description
ISO/IEC JTC1/SC29/WG11 MPEG96/0717, Samsung AIT, March 1996.

[12] Yu-Shin Cho, Shi-Hwa Lee, Jae-Seob Shin, and Yang-Seock Seo. Shape
coding tool: Using polygonal approximation and reliable error residue sampling
method. Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0565,
Samsung AIT, January 1996.

[13] Diogo Cortez. Classificação e codificação de contornos. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, May 1995.

[14] Murray Eden and Michel Kocher. On the performance of a contour coding
algorithm in the context of image coding part I: Contour segment coding. Signal
Processing, 8(4):381–386, July 1985.

[15] F. Eryurtlu, A. M. Kondoz, and B. G. Evans. Very low-bit-rate segmentation-based video coding using contour and texture prediction. IEE Proceedings – Vision, Image, and Signal Processing, 142(5):253–261, October 1995.

[16] Standardization of Group 3 facsimile apparatus for document transmission. Recommendation T.4, CCITT, 1980.

[17] Facsimile coding schemes and coding control functions for Group 4 facsimile
apparatus. Recommendation T.6, CCITT, 1984.

[18] Herbert Freeman. On the encoding of arbitrary geometric configurations. IRE Transactions on Electronic Computers, 10:260–268, June 1961.

[19] Herbert Freeman. Computer processing of line-drawing images. Computing Surveys, 6(1):57–97, March 1974.

[20] Antoni Gasull, Ferran Marqués, and Juan A. García. Lossy image contour coding with multiple grid chain code. In Proceedings of the Workshop on Image Analysis and Synthesis in Image Coding (WIASIC’94), page B4, Berlin, Germany, October 1994. Heinrich-Hertz-Institut.

[21] Peter Gerken, Michael Wollborn, and Stefan Schultz. Polygon/spline
approximation of arbitrary image region shapes as proposal for MPEG-
4 tool evaluation – technical description. Technical Description ISO/IEC
JTC1/SC29/WG11 MPEG95/0360, RACE/MAVT, University of Hannover,
Robert Bosch GmbH, and Deutsche Telekom AG, November 1995.

[22] Donald Norman Graham. Image transmission by two-dimensional contour coding. Proceedings of the IEEE, 55(3):336–346, March 1967.

[23] Chuang Gu and Murat Kunt. Contour simplification and motion compensated
coding. Signal Processing: Image Communication, 7(4–6):279–296, November
1995.

[24] Draft revision of recommendation H.261: Video codec for audiovisual services at p×64 kbit/s, CCITT study group XV, TD 35, 1989. Signal Processing: Image Communication, 2(2):221–239, August 1990.

[25] Video coding for low bitrate communication. Draft Recommendation H.263,
ITU-T, December 1995.

[26] Ali Habibi. Hybrid coding of pictorial data. IEEE Transactions on Communications, COM-22(5):614–624, May 1974.

[27] Ali Habibi. Survey of adaptive image coding techniques. IEEE Transactions
on Communications, COM-25(11):1275–1284, November 1977.

[28] Robert M. Haralick and Linda G. Shapiro. Computer and Robot Vision,
volume I. Addison-Wesley Publishing Company, Inc., Reading, Massachusetts,
1992.

[29] Michael Hötter. Object-oriented analysis-synthesis coding based on moving two-dimensional objects. Signal Processing: Image Communication, 2(4):409–428, December 1990.

[30] Thomas S. Huang. Coding of two-tone images. IEEE Transactions on Communications, COM-25(11):1406–1424, November 1977.

[31] Roy Hunter and A. Harry Robinson. International digital facsimile coding
standards. Proceedings of the IEEE, 68(7):854–867, July 1980.

[32] Progressive bi-level image compression. Recommendation T.82, ITU-T, March 1993.

[33] Rémi Jeannot, Demin Wang, and Véronique Haese-Coat. Binary image
representation and coding by a double-recursive morphological algorithm.
Signal Processing: Image Communication, 8(3):241–266, April 1996.

[34] Toru Kaneko and Masashi Okudaira. Encoding of arbitrary curves based on the
chain code representation. IEEE Transactions on Communications, 33(7):697–
707, July 1985.

[35] Jong-Lak Kim, Jong-Il Kim, Jong-Tae Lim, Jin-Hun Kim, Han-Soo Kim, Kyu-
Hwan Chang, and Seong-Dae Kim. Daewoo proposal for object scalability.
Technical Description ISO/IEC JTC1/SC29/WG11 MPEG96/0554, Daewoo
Electronics CO.LTD. and KAIST, January 1996.

[36] Weidong Kou. Digital Image Compression: Algorithms and Standards. Kluwer
Academic Publishers, Boston, 1995.

[37] M. K. Kundu, B. B. Chaudhuri, and D. Dutta Majumder. A generalised digital contour coding scheme. Computer Vision, Graphics, and Image Processing, 30:269–278, 1985.

[38] Murat Kunt, Athanassios Ikonomopoulos, and Michael Kocher. Second-generation image-coding techniques. Proceedings of the IEEE, 73(4):549–574, April 1985.

[39] Michael S. Landy and Yoav Cohen. Vectorgraph coding: Efficient coding of
line drawings. Computer Vision, Graphics, and Image Processing, 30:331–344,
1985.

[40] Yuh-Tay Liow. A contour tracing algorithm that preserves common boundaries between regions. CVGIP: Image Understanding, 53(3):313–321, May 1991.

[41] Petros A. Maragos and Ronald W. Schafer. Morphological skeleton representation and coding of binary images. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(5):1228–1244, October 1986.

[42] Ferran Marqués, Josep Sauleda, and Antoni Gasull. Shape and location coding
for contour images. In Proceedings of the Picture Coding Symposium (PCS’93),
page 18.6, Lausanne, Switzerland, March 1993.

[43] T. H. Morrin, II. Chain-link compression of arbitrary black-white images. Computer Graphics and Image Processing, 5:172–189, 1976.

[44] Coding of moving pictures and associated audio for digital storage media up to
about 1,5 Mbit/s. International Standard 11172, ISO/IEC, 1993.

[45] Generic coding of moving pictures and associated audio information. Draft
Recommendation H.262, Draft International Standard 13818, ITU-T, ISO/IEC,
January 1995.

[46] Arun N. Netravali and John O. Limb. Picture coding: A review. Proceedings
of the IEEE, 68(3):366–406, March 1980.

[47] Kevin O’Connell and Damon Tull. Motorola MPEG-4 contour-coding tool
technical description. Technical Description ISO/IEC JTC1/SC29/WG11
MPEG95/0447, Motorola, November 1995.

[48] Ad hoc Group on MPEG-4 Video VM Editing. MPEG-4 video verification model version 3.0. Document ISO/IEC JTC1/SC29/WG11 N1277, ISO, July 1996.

[49] David W. Paglieroni and Anil K. Jain. A control point theory for boundary
representation and matching. In Proceedings of the International Conference
on Acoustics, Speech and Signal Processing (ICASSP’85), pages 1851–1854,
Tampa, Florida, 1985. IEEE, Signal Processing Society.

[50] Theo Pavlidis. Contour filling in raster graphics. Computer Graphics, 15(3):29–
36, July 1981.

[51] William B. Pennebaker, Joan L. Mitchell, Glen G. Langdon, Jr., and Ronald B. Arps. An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder. IBM Journal of Research and Development, 32(6):717–726, November 1988.

[52] Fernando Pereira. MPEG4: a new challenge for the representation of audio-
visual information. In Proceedings of the Picture Coding Symposium (PCS’96),
pages 7–16, Melbourne, Australia, March 1996.

[53] Rosalind W. Picard. Content access for image/video coding: “the fourth
criterion”. Technical Report 295, MIT Media Lab: Perceptual Computing
Section, 1994.

[54] Kenneth H. Rosen. Discrete Mathematics and its Applications. McGraw-Hill, Inc., New York, 1991.

[55] Azriel Rosenfeld. Digital straight line segments. IEEE Transactions on Computers, 23(12):1264–1269, December 1974.

[56] Philippe Saint-Marc, Hillel Rom, and Gérard Medioni. B-spline contour
representation and symmetry detection. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 15(11):1191–1197, November 1993.

[57] Jean Serra. Image Analysis and Mathematical Morphology, volume I. Academic
Press, Inc., San Diego, California, 1993.

[58] Uri Shani. Filling regions in binary raster images: a graph-theoretical approach.
Computer Graphics (SIGGRAPH’80 Proceedings), 14(3):321–327, July 1980.

[59] MPEG Video Subgroup. Core experiments on MPEG-4 video shape coding.
Document ISO/IEC JTC1/SC29/WG11 N1326, ISO, July 1996.

[60] Shape coding with an optimized morphological region description. Contribute COST211ter, Simulation Subgroup, SIM(92)23, U.C.L., February 1992.

[61] John Y. A. Wang and Edward H. Adelson. Representing moving images with
layers. IEEE Transactions on Image Processing, 3(5):625–638, September 1994.

[62] Shuichi Watanabe, Hisashi Saiga, Hiroyuki Katata, and Hiroshi Kusao. Binary
shape coding based on hierarchical chain codes. Technical Description ISO/IEC
JTC1/SC29/WG11 MPEG96/1045, Sharp Corporation, July 1996.

[63] Terry A. Welch. A technique for high-performance data compression. IEEE Computer, 17(6):8–19, June 1984.

[64] C. A. Wüthrich and Peter Stucki. An algorithmic comparison between square-
and hexagonal-based grids. CVGIP: Graphical Models and Image Processing,
53(4):324–339, July 1991.

[65] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data
compression. IEEE Transactions on Information Theory, IT-23(3):337–343,
May 1977.
