
Pattern Recognition Letters 19 (1998) 255–259

A graph distance metric based on the maximal common subgraph


Horst Bunke^a,*, Kim Shearer^b
^a Institut für Informatik und angewandte Mathematik, University of Bern, Bern, Switzerland
^b Department of Computer Science, Curtin University of Technology, Perth, WA, Australia
Received 22 July 1997; revised 12 November 1997

Abstract

Error-tolerant graph matching is a powerful concept that has various applications in pattern recognition and machine
vision. In the present paper, a new distance measure on graphs is proposed. It is based on the maximal common subgraph of
two graphs. The new measure is superior to edit distance based measures in that no particular edit operations together with
their costs need to be defined. It is formally shown that the new distance measure is a metric. Potential algorithms for the
efficient computation of the new measure are discussed. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Error-tolerant graph matching; Distance measure; Maximal common subgraph; Graph edit distance; Metric

* Corresponding author. E-mail: bunke@iam.unibe.ch.
0167-8655/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII: S0167-8655(97)00179-7

1. Introduction

Graphs are among the most general and powerful data structures, and they are useful in a wide variety of applications. For example, in computer vision and pattern recognition, graphs are often used to represent unknown objects, which are to be recognized, and known models, which are stored in a database. Thus, the recognition problem turns into a graph matching problem. Applications of graph matching in pattern recognition and machine vision include character recognition (Lu et al., 1991; Cordella et al., 1997), schematic diagram interpretation (Lee et al., 1990; Messmer and Bunke, 1996), shape analysis (Pearce et al., 1994), image registration (Christmas et al., 1995), 3-D object recognition (Cho and Kim, 1992; Wong, 1992) and video indexing (Shearer et al., 1997).

Classical algorithms of graph matching include graph and subgraph isomorphism (Read and Corneil, 1977; Ullman, 1976). However, due to errors and distortions in the input data and the models, approximate, or error-tolerant, graph matching methods are needed in many applications. One way to cope with errors and distortions is graph edit distance (Shapiro and Haralick, 1981; Bunke, 1997). Here one introduces a set of edit operations, for example, the deletion, insertion and substitution of nodes and edges, and defines the similarity of two graphs in terms of the shortest (or least cost) sequence of edit operations that transforms one graph into the other. Another approach to error-tolerant graph matching is based on the maximal common subgraph of two graphs (Horaud and Skordas, 1989; Levinson, 1992).

When defining distance or similarity measures, certain properties are desirable. For example, one may wish that the distance from object A to B is the same as the distance from B to A (symmetry).

Speaking more generally, it is often desired that the distance measure $d$ fulfills the properties of a metric:
1. $d(A,B) = 0 \Leftrightarrow A = B$,
2. $d(A,B) = d(B,A)$,
3. $d(A,B) + d(B,C) \geq d(A,C)$.
Usually edit distance measures are not metrics. Only if the costs of the underlying edit operations satisfy certain conditions do the properties listed above hold. But these conditions are sometimes too restrictive, or incompatible with the considered problem domain.

In the present paper, we propose a new graph distance measure that is based on the maximal common subgraph of two graphs. The main contribution of the paper is the formal proof that the new distance measure is a metric. An advantage of the new distance measure over graph edit distance is that it does not depend on edit costs. It is well known that any edit distance measure critically depends on the costs of the underlying edit operations, but the problem of how these edit costs are obtained is still unsolved. Using the new distance measure, this problem can be avoided.

In the next section of this paper we present basic definitions. The following section first defines the maximal common subgraph based distance measure and then shows that the measure is a metric. Concluding remarks make up the final section, including a discussion of potential algorithms for the computation of the new distance measure.

2. Basic definitions

In this paper, we consider graphs with labeled nodes and edges. Let $L_V$ and $L_E$ denote the finite sets of node and edge labels, respectively. Unlabeled graphs are obtained as a special case if $|L_V| = |L_E| = 1$.

Definition 1. A graph is a 4-tuple $G = (V, E, \mu, \nu)$, where
- $V$ is a finite set of vertices,
- $E \subseteq V \times V$ is the set of edges,
- $\mu: V \to L_V$ is a function assigning labels to the vertices,
- $\nu: E \to L_E$ is a function assigning labels to the edges.
If $V = \emptyset$, then $G$ is called the empty graph.

Definition 2. Given a graph $G = (V, E, \mu, \nu)$, a subgraph of $G$ is a graph $S = (V_S, E_S, \mu_S, \nu_S)$ such that
- $V_S \subseteq V$,
- $E_S = E \cap (V_S \times V_S)$,
- $\mu_S$ and $\nu_S$ are the restrictions of $\mu$ and $\nu$ to $V_S$ and $E_S$, respectively, i.e.,
$$\mu_S(v) = \begin{cases} \mu(v) & \text{if } v \in V_S,\\ \text{undefined} & \text{otherwise,} \end{cases} \qquad \nu_S(e) = \begin{cases} \nu(e) & \text{if } e \in E_S,\\ \text{undefined} & \text{otherwise.} \end{cases}$$
The notation $S \subseteq G$ is used to indicate that $S$ is a subgraph of $G$.

Definition 3. A bijective function $f: V \to V'$ is a graph isomorphism from a graph $G = (V, E, \mu, \nu)$ to a graph $G' = (V', E', \mu', \nu')$ if
- $\mu(v) = \mu'(f(v))$ for all $v \in V$,
- for any edge $e = (v_1, v_2) \in E$ there exists an edge $e' = (f(v_1), f(v_2)) \in E'$ such that $\nu(e) = \nu'(e')$, and
- for any edge $e' = (v'_1, v'_2) \in E'$ there exists an edge $e = (f^{-1}(v'_1), f^{-1}(v'_2)) \in E$ such that $\nu'(e') = \nu(e)$.

Definition 4. An injective function $f: V \to V'$ is a subgraph isomorphism from $G$ to $G'$ if there exists a subgraph $S \subseteq G'$ such that $f$ is a graph isomorphism from $G$ to $S$.

Note that finding a subgraph isomorphism from $G$ to $G'$ implies finding a subgraph of $G'$ isomorphic to the whole of $G$. This distinction becomes important in later discussion.

Definition 5. Let $G$, $G_1$ and $G_2$ be graphs. $G$ is a common subgraph of $G_1$ and $G_2$ if there exist subgraph isomorphisms from $G$ to $G_1$ and from $G$ to $G_2$.

Definition 6. A common subgraph $G$ of $G_1$ and $G_2$ is maximal if there exists no other common subgraph $G'$ of $G_1$ and $G_2$ that has more nodes than $G$.
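Definitions 1–6 can be made concrete with a small exhaustive-search sketch. It is an illustration under stated assumptions, not the authors' method: the names Graph, induced_subgraph, is_isomorphism and mcs_size are hypothetical, edges are taken as ordered (directed) pairs exactly as $E \subseteq V \times V$ in Definition 1, labels are plain strings, and subgraphs are induced as Definition 2 requires. Because it tries every node subset of $G_1$ against every injective assignment into $G_2$, its running time is exponential, so it is only meant for graphs with a handful of nodes.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from itertools import combinations, permutations
from typing import Dict, Iterable, Tuple

# Assumptions for this sketch: edges are directed (ordered pairs, E ⊆ V x V)
# and node/edge labels are plain strings.
Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class Graph:
    """G = (V, E, mu, nu) as in Definition 1; V = keys of mu, E = keys of nu."""
    mu: Dict[Node, str] = field(default_factory=dict)  # mu: V -> L_V
    nu: Dict[Edge, str] = field(default_factory=dict)  # nu: E -> L_E

    def __post_init__(self) -> None:
        # Definition 1 requires E ⊆ V x V.
        for u, v in self.nu:
            if u not in self.mu or v not in self.mu:
                raise ValueError(f"edge {(u, v)!r} uses a node not in V")

    def __len__(self) -> int:
        return len(self.mu)  # |G| = |V|


def induced_subgraph(g: Graph, nodes: Iterable[Node]) -> Graph:
    """Subgraph per Definition 2: E_S = E ∩ (V_S x V_S), labels restricted."""
    keep = set(nodes)
    return Graph(
        mu={v: g.mu[v] for v in keep},
        nu={(u, v): lab for (u, v), lab in g.nu.items() if u in keep and v in keep},
    )


def is_isomorphism(f: Dict[Node, Node], g: Graph, h: Graph) -> bool:
    """Check Definition 3 for a candidate node mapping f from g to h."""
    # f must be a bijection V(g) -> V(h).
    if set(f) != set(g.mu) or set(f.values()) != set(h.mu) or len(g) != len(h):
        return False
    # Node labels must be preserved.
    if any(g.mu[v] != h.mu[f[v]] for v in g.mu):
        return False
    # Both edge conditions of Definition 3: the image of E is exactly E',
    # with matching edge labels.
    image = {(f[u], f[v]): lab for (u, v), lab in g.nu.items()}
    return image == h.nu


def mcs_size(g1: Graph, g2: Graph) -> int:
    """|mcs(g1, g2)| (Definitions 5 and 6) by exhaustive search; exponential time."""
    nodes1, nodes2 = list(g1.mu), list(g2.mu)
    # Try the largest possible common subgraph first and stop at the first hit.
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for s1 in combinations(nodes1, k):
            sub1 = induced_subgraph(g1, s1)
            for s2 in permutations(nodes2, k):
                if is_isomorphism(dict(zip(s1, s2)), sub1, induced_subgraph(g2, s2)):
                    return k
    return 0  # only the empty graph is common to g1 and g2
```

For example, if g1 is the labeled path a → b → c with node labels x, y, x and edge labels e, and g2 is a single edge p → q labeled like a → b, then mcs_size(g1, g2) returns 2, while comparing g1 with a one-node graph carrying an unused label returns 0.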

The maximal common subgraph of two graphs $G_1$ and $G_2$ will be denoted by $\mathrm{mcs}(G_1,G_2)$. Notice that $\mathrm{mcs}(G_1,G_2)$ is not necessarily unique for two given graphs $G_1$ and $G_2$. The number of nodes of a graph $G = (V, E, \mu, \nu)$ is given by $|V|$. For notational convenience, we also denote the number of nodes of $G$ by $|G|$.

3. Graph distance measure

Definition 7. The distance of two non-empty graphs $G_1$ and $G_2$ is defined as
$$d(G_1,G_2) = 1 - \frac{|\mathrm{mcs}(G_1,G_2)|}{\max(|G_1|,|G_2|)}.$$
An example is shown in Fig. 1. Here we have $|G_1| = 5$, $|G_2| = 4$ and $|\mathrm{mcs}(G_1,G_2)| = 3$. Hence, $d(G_1,G_2) = 1 - 3/5 = 0.4$.

Theorem 1. For any non-empty graphs $G_1$, $G_2$ and $G_3$, the following properties hold true:
1. $0 \leq d(G_1,G_2) \leq 1$,
2. $d(G_1,G_2) = 0 \Leftrightarrow G_1$ and $G_2$ are isomorphic to each other,
3. $d(G_1,G_2) = d(G_2,G_1)$,
4. $d(G_1,G_3) \leq d(G_1,G_2) + d(G_2,G_3)$.

Proof. Properties 1–3 follow directly from Definition 7. In the following proof of the triangle inequality we distinguish two cases:

Case A. The graphs $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ are disjoint or, speaking more strictly, the maximal common subgraph of $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ is empty. For a Venn diagram illustration see Fig. 2(a).

Let $m_{12} = |\mathrm{mcs}(G_1,G_2)|$, $m_{23} = |\mathrm{mcs}(G_2,G_3)|$ and $m_{13} = |\mathrm{mcs}(G_1,G_3)|$. Then the following relation holds true:
$$m_{12} + m_{23} \leq |G_2|. \tag{1}$$
Property 4 in Theorem 1 is equivalent to the following inequality:
$$1 - \frac{m_{12}}{\max(|G_1|,|G_2|)} + 1 - \frac{m_{23}}{\max(|G_2|,|G_3|)} \geq 1 - \frac{m_{13}}{\max(|G_1|,|G_3|)}. \tag{2}$$
Since the right-hand side of (2) is at most 1 by Property 1, it suffices to show that the left-hand side of this inequality is always greater than or equal to 1, which is equivalent to
$$\max(|G_1|,|G_2|)\,\max(|G_2|,|G_3|) \geq m_{12}\max(|G_2|,|G_3|) + m_{23}\max(|G_1|,|G_2|). \tag{3}$$
We proceed by a simple case analysis.

Case A.1: $|G_1| \geq |G_2| \geq |G_3|$. Here Eq. (3) is equivalent to
$$|G_1|\,|G_2| \geq m_{12}|G_2| + m_{23}|G_1|. \tag{4}$$
From Eq. (1) we conclude that
$$|G_1|\,|G_2| \geq m_{12}|G_1| + m_{23}|G_1| \geq m_{12}|G_2| + m_{23}|G_1|.$$

Case A.2: $|G_1| \geq |G_3| \geq |G_2|$. Here Eq. (3) becomes
$$|G_1|\,|G_3| \geq m_{12}|G_3| + m_{23}|G_1|. \tag{5}$$
Using Eq. (1) again we conclude
$$|G_1|\,|G_3| \geq |G_1|\,|G_2| \geq m_{12}|G_1| + m_{23}|G_1| \geq m_{12}|G_3| + m_{23}|G_1|.$$

The remaining four cases, $|G_2| \geq |G_1| \geq |G_3|$, $|G_2| \geq |G_3| \geq |G_1|$, $|G_3| \geq |G_1| \geq |G_2|$ and $|G_3| \geq |G_2| \geq |G_1|$, can be shown similarly.
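The arithmetic step of Case A, namely that Eq. (1) together with the size bounds forces the left-hand side of (2) to be at least 1, can be spot-checked exhaustively for small node counts with exact rational arithmetic. The sketch assumes Eq. (1) rather than checking it on actual graphs; it only enumerates admissible values of $|G_1|, |G_2|, |G_3|, m_{12}, m_{23}$, which covers all six size orderings at once, including the four left to the reader. The helper names distance and check_case_a are hypothetical. Such a check is a sanity test, not a proof; what covers all sizes is the argument above, or equivalently the observation that both maxima are at least $|G_2|$, so that $m_{12}/\max(|G_1|,|G_2|) + m_{23}/\max(|G_2|,|G_3|) \leq (m_{12}+m_{23})/|G_2| \leq 1$.

```python
from fractions import Fraction
from itertools import product


def distance(mcs: int, n1: int, n2: int) -> Fraction:
    """d(G1, G2) of Definition 7 from |mcs(G1, G2)|, |G1| and |G2| (non-empty graphs)."""
    if n1 < 1 or n2 < 1 or not 0 <= mcs <= min(n1, n2):
        raise ValueError("need non-empty graphs and 0 <= |mcs| <= min(|G1|, |G2|)")
    return 1 - Fraction(mcs, max(n1, n2))


def check_case_a(max_nodes: int) -> int:
    """Assert that the left-hand side of (2) is >= 1 for every Case A size tuple.

    Assumption: Eq. (1) holds; the graph-theoretic argument for it is not
    checked here. Returns the number of (n1, n2, n3, m12, m23) tuples tested.
    """
    checked = 0
    sizes = range(1, max_nodes + 1)
    for n1, n2, n3 in product(sizes, repeat=3):      # all six orderings
        for m12 in range(min(n1, n2) + 1):
            for m23 in range(min(n2, n3) + 1):
                if m12 + m23 > n2:                    # violates Eq. (1)
                    continue
                lhs = distance(m12, n1, n2) + distance(m23, n2, n3)
                assert lhs >= 1, (n1, n2, n3, m12, m23)
                checked += 1
    return checked


print(distance(3, 5, 4))    # Fig. 1 example; prints 2/5
print(check_case_a(6) > 0)  # prints True
```

Using Fraction instead of floating point keeps the comparison with 1 exact, so boundary cases such as $m_{12} + m_{23} = |G_2|$ cannot be misjudged by rounding.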
Fig. 1. An example of Definition 7: (a) a graph $G_1$; (b) a graph $G_2$; (c) the maximal common subgraph, $\mathrm{mcs}(G_1,G_2)$, of $G_1$ and $G_2$. Here we have $d(G_1,G_2) = 0.4$.

Case B. Here we assume that the maximal common subgraph of $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ is not empty (see Fig. 2(b)).

Fig. 2. Illustration of disjoint and overlapping common subgraphs: (a) the maximal common subgraphs $\mathrm{mcs}(G_1,G_2) = g_{12}$ and $\mathrm{mcs}(G_2,G_3) = g_{23}$ are disjoint; (b) $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ share a common subgraph $g$, i.e., $g = \mathrm{mcs}(\mathrm{mcs}(G_1,G_2), \mathrm{mcs}(G_2,G_3))$.

Let $m = |\mathrm{mcs}(\mathrm{mcs}(G_1,G_2), \mathrm{mcs}(G_2,G_3))| > 0$. It follows that the maximal common subgraph of $G_1$ and $G_3$ has size greater than or equal to $m$, i.e., $m_{13} \geq m$. Furthermore, it follows that
$$m_{12} + m_{23} - m \leq |G_2|, \qquad m \leq m_{12}, \qquad m \leq m_{23}. \tag{6}$$
We will show that
$$1 - \frac{m_{12}}{\max(|G_1|,|G_2|)} + 1 - \frac{m_{23}}{\max(|G_2|,|G_3|)} \geq 1 - \frac{m}{\max(|G_1|,|G_3|)}, \tag{7}$$
which implies property 4 of Theorem 1, because $m \leq m_{13}$ makes the right-hand side of (7) at least as large as the right-hand side of (2). Obviously, inequality (7) is equivalent to
$$\begin{aligned} \max(|G_1|,|G_2|)\max(|G_2|,|G_3|)\max(|G_1|,|G_3|) \geq{} & m_{12}\max(|G_2|,|G_3|)\max(|G_1|,|G_3|) \\ & + m_{23}\max(|G_1|,|G_2|)\max(|G_1|,|G_3|) \\ & - m\max(|G_1|,|G_2|)\max(|G_2|,|G_3|). \end{aligned} \tag{8}$$

Again we proceed by case analysis.

Case B.1: $|G_1| \geq |G_2| \geq |G_3|$. Here Eq. (8) is equivalent to
$$|G_1|\,|G_1|\,|G_2| \geq m_{12}|G_1|\,|G_2| + m_{23}|G_1|\,|G_1| - m|G_1|\,|G_2|,$$
which can be simplified to
$$|G_1|\,|G_2| \geq m_{12}|G_2| + m_{23}|G_1| - m|G_2| = (m_{12} - m)|G_2| + m_{23}|G_1|. \tag{9}$$
From Eq. (6) it follows that
$$|G_1|\,|G_2| \geq m_{12}|G_1| + m_{23}|G_1| - m|G_1| = (m_{12} - m)|G_1| + m_{23}|G_1|,$$
from which we get Eq. (9) due to $m_{12} \geq m$ and $|G_1| \geq |G_2|$.

Case B.2: $|G_1| \geq |G_3| \geq |G_2|$. Here Eq. (8) becomes
$$|G_1|\,|G_1|\,|G_3| \geq m_{12}|G_1|\,|G_3| + m_{23}|G_1|\,|G_1| - m|G_1|\,|G_3|,$$
which can be simplified to
$$|G_1|\,|G_3| \geq m_{12}|G_3| + m_{23}|G_1| - m|G_3|. \tag{10}$$
We proceed analogously to Case B.1:
$$|G_1|\,|G_3| \geq |G_1|\,|G_2| \geq m_{12}|G_1| + m_{23}|G_1| - m|G_1| \geq m_{12}|G_3| + m_{23}|G_1| - m|G_3|. \tag{11}$$
The remaining cases can be shown similarly. □

From Theorem 1 it follows in particular that our proposed distance measure is a metric.¹

¹ Strictly speaking, this statement is only true if isomorphic graphs are regarded as equal. But this assumption is certainly justified in most applications.

4. Discussion and conclusion

We have shown that the graph distance measure of Definition 7 is in fact a metric. As discussed earlier, it is often difficult to form a metric from edit distance measures. Therefore, in applications where the properties of a metric are important, the maximal common subgraph metric could be used.

One application where this is important is information retrieval from image and video databases (Chang et al., 1987; Lee and Hsu, 1992; Shearer et al., 1997). This area relies heavily on browsing to locate required database elements. Thus it is necessary for the distance measure chosen to be well behaved to allow sensible navigation of the database. The use of a metric, such as that proposed, for the distance measure ensures that the behaviour of the similarity retrieval will be consistent and comprehensible, aiding the user in their search task.

Classical algorithms for computing the maximal common subgraph of two graphs are based on maximal clique detection (Levi, 1972) or backtracking (McGregor, 1982). These algorithms are conceptually simple, but have a high computational complexity. For example, the worst case time complexity of the method described by Levi (1972) is $O((nm)^n)$, where $n$ and $m$ denote the number of nodes of the two graphs under consideration. Recently, however, a new algorithm has been developed which uses preprocessing of a database of model graphs to detect the maximal common subgraph from an input graph to the models in the database with worst case time complexity of $O(2^n)$ (Shearer et al., 1997). This algorithm has demonstrated near real-time behaviour in a video indexing application.

In a recent paper, it has been shown that maximal common subgraph computation can be regarded as a special case of graph edit distance computation under a particular cost function (Bunke, 1997). An immediate consequence is that any algorithm for graph edit distance computation can be used to compute the maximal common subgraph if it is run under the cost function given by Bunke (1997). This opens up additional possibilities for the computation of the distance measure proposed in this paper, particularly with respect to an efficient algorithm for graph edit distance computation reported by Bunke and Messmer (1997).

References

Bunke, H., 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Lett. 18 (8), 689–694.
Bunke, H., Messmer, B., 1997. Recent advances in graph matching. Internat. J. Pattern Recognition Artif. Intell. 11 (1), 169–203.
Chang, S., Shi, Q., Yan, C., 1987. Iconic indexing by 2D strings. IEEE Trans. Pattern Anal. Machine Intell. 9 (3), 413–428.
Cho, C.J., Kim, J.J., 1992. Recognizing 3-D objects by forward checking constrained tree search. Pattern Recognition Lett. 13 (8), 587–597.
Christmas, W.J., Kittler, J., Petrou, M., 1995. Structural matching in computer vision using probabilistic relaxation. IEEE Trans. Pattern Anal. Machine Intell. 17 (8), 749–764.
Cordella, L., Foggia, P., Sansone, C., Vento, M., 1997. Subgraph transformations for the inexact matching of attributed relational graphs. In: Jolion, J.-M., Kropatsch, W. (Eds.), Preproceeding GbR97: IAPR Workshop on Graph Based Representations, Lyon.
Horaud, R., Skordas, T., 1989. Stereo correspondence through feature grouping and maximal cliques. IEEE Trans. Pattern Anal. Machine Intell. 11 (11), 1168–1180.
Lee, S., Hsu, F., 1992. Spatial reasoning and similarity retrieval of images using 2D C-string knowledge representation. Pattern Recognition 25 (3), 305–318.
Lee, S.W., Kim, J.H., Groen, F.C.A., 1990. Translation-, rotation-, and scale invariant recognition of hand-drawn symbols in schematic diagrams. Internat. J. Pattern Recognition Artif. Intell. 4 (1), 1–15.
Levi, G., 1972. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo 9, 341–354.
Levinson, R., 1992. Pattern associativity and the retrieval of semantic networks. Comput. Math. Appl. 23, 573–600.
Lu, S.W., Ren, Y., Suen, C.Y., 1991. Hierarchical attributed graph representation and recognition of handwritten Chinese characters. Pattern Recognition 24, 617–632.
McGregor, J.J., 1982. Backtrack search algorithms and the maximal common subgraph problem. Software Practice and Experience 12, 23–34.
Messmer, B., Bunke, H., 1996. Automatic learning and recognition of graphical symbols in engineering drawing. In: Kasturi, R., Tombre, K. (Eds.), Graphics Recognition, Lecture Notes in Computer Science, vol. 1072. Springer, Berlin, 1996, pp. 123–134.
Pearce, A., Caelli, T., Bischof, W.F., 1994. Rulegraphs for graph matching in pattern recognition. Pattern Recognition 27 (9), 1231–1246.
Read, R.C., Corneil, D.G., 1977. The graph isomorphism disease. J. Graph Theory 1, 339–363.
Shapiro, L.G., Haralick, R.M., 1981. Structural descriptions and inexact matching. IEEE Trans. Pattern Anal. Machine Intell. 3, 504–519.
Shearer, K., Bunke, H., Venkatesh, S., Kieronska, D., 1997. Efficient graph matching for video indexing. In: Jolion, J.-M., Kropatsch, W. (Eds.), Preproceeding GbR97: IAPR Workshop on Graph Based Representations, Lyon.
Ullman, J.R., 1976. An algorithm for subgraph isomorphism. J. ACM 23 (1), 31–42.
Wong, E.K., 1992. Model matching in robot vision by subgraph isomorphism. Pattern Recognition 25 (3), 287–304.
