
International Journal of Computer Science & Network Solutions, March 2015, Volume 3, No. 3
http://www.ijcsns.com    ISSN 2345-3397

Review of Ontology Matching Approaches and Challenges
Sarawat Anam, Yang Sok Kim, Byeong Ho Kang, Qing Liu
School of Engineering and ICT, University of Tasmania, Australia
Department of Management Information Systems, Keimyung University, Korea
Autonomous Systems, CSIRO Computational Informatics, Hobart, Tasmania, Australia

Abstract
Ontology mapping aims to solve the semantic heterogeneity problems such as ambiguous entity names,
different entity granularity, incomparable categorization, and various instances of different ontologies. The
mapping helps to search or query data from different sources. Ontology mapping is necessary in many
applications such as data integration, ontology evolution, data warehousing, e-commerce and data exchange
in various domains such as purchase order, health, music and e-commerce. It is performed by ontology
matching approaches that find semantic correspondences between ontology entities. In this paper, we
review state-of-the-art ontology matching approaches. We describe the approaches according to instance-based, schema-based, combined instance- and schema-based, usage-based, element-level, and structure-level matching. The analysis of the existing approaches helps to reveal some challenges in ontology mapping, such as handling ontology matching errors, user involvement and reusing previous match operations. We explain how to handle these challenges using a new strategy in order to increase performance.

Keywords: Ontology, ontology mapping, matching approaches, hybrid approach, challenges and
performance.

I. Introduction

Ontology represents knowledge as a set of concepts within a domain and the relationships
between those concepts. Ontology encompasses several data and conceptual models such as sets
of terms, classifications and database schemas (Shvaiko and Euzenat, 2008). There are some
differences and similarities between schemas and ontologies. Database schemas often do not provide explicit semantics because the semantics are specified at design time and are frequently not available (Noy and Klein, 2004). Ontologies, in contrast, are logical systems that provide explicit semantics. The similarities between schemas and ontologies are that both provide a vocabulary of terms with which a domain of interest is described, and both constrain the meaning of the terms used in the vocabulary (Uschold and Gruninger, 2004). For ontology matching, external resources such as domain ontologies or WordNet are necessary, whereas such dictionaries are not needed for schema matching because the meaning is encoded in the schemas. An ontology consists of a set of discrete entities such as classes, properties and instances. Ontology mapping, which provides a unified view to the users, is necessary to manage the heterogeneity between two ontologies. Ontology
matching is a required task for ontology mapping that finds semantic correspondences between a
pair of entities. The necessity of ontology mapping is illustrated below.
For example, suppose a user wants to buy a computer for under 500 dollars. He starts to search the web-based databases of different shops which sell electronic products. A web service application provides services for searching computers in the databases of different computer-selling retailers based on the user's input parameters. The problem is that all of the databases have been created independently, so the same product may have different names. For example, consider a query that selects computers from a database where Brand is Toshiba and UnitPrice is $500; this query will not work against all of the databases. An example of ontology mapping is given in Figure 1.

Figure 1. Example of mapping between two ontologies

To illustrate the ontology mapping problem, we use two OWL schemas, the ontologies O1 and O2 in Figure 1. Each ontology consists of classes, properties and instances. Classes are shown in rectangles, e.g., PurchaseOrder. Items is a subclass of PurchaseOrder. Television, Refrigerator and Computers are subclasses of Items. The properties of Computers are Brand and UnitPrice. The properties have instances such as Acer, $500, etc. Using a matching algorithm based on string similarity metrics and text processing techniques, the confidence measure between Computers in O1 and Personal_Computers in O2 can be 0.7. If the threshold value for determining a correct mapping is 0.5, meaning that the algorithm considers every pair of entities with a confidence measure higher than 0.5 to be a correct mapping element, the algorithm returns the mapping decision TRUE to the user. Another matching algorithm matches TSB in O1 with Toshiba in O2 according to the meaning of the entities and likewise returns the mapping decision TRUE. After determining the mapping between pairs of entities of the two ontologies, it is necessary to generate query expressions in order to translate data instances of the mapped entities under an integrated entity. If users then pose the query to the integrated database, it will retrieve the appropriate answer.
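
As a minimal illustration of this thresholded matching decision, the following Python sketch uses difflib's sequence similarity as a stand-in for a real string metric; the 0.5 threshold follows the example above:

    # Minimal sketch of the thresholded mapping decision described above,
    # using Python's difflib ratio as a stand-in for a real string metric.
    from difflib import SequenceMatcher

    def confidence(a, b):
        # normalised similarity in [0, 1] between two entity names
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    THRESHOLD = 0.5  # pairs above this are taken as correct mappings

    conf = confidence("Computers", "Personal_Computers")
    print(round(conf, 2), conf > THRESHOLD)  # ~0.67 -> mapping decision TRUE
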
Ontology mapping is used in many applications, e.g., ontology evolution (Noy et al., 2006),
data integration (Talukdar et al., 2010), data warehousing (Dessloch et al., 2008). Ontology
evolution is used for changing deployed ontologies. Data integration combines data residing in
multiple data sources. Data warehousing extracts data from multiple heterogeneous data sources,
transforms the data into an appropriate format and loads the data into the warehouse. In these
applications, mapping is the first step to run the actual system. Besides, there are some other
applications where ontology mapping is necessary, e.g., peer-to-peer information sharing
(Atencia et al., 2011), web service composition (Vaccari et al., 2009), search (Kitamura et al.,
2008) and query answering (He and Chang, 2006).
Some surveys have been conducted on ontology matching approaches. The survey (Wache et
al., 2001) focuses on the ontology-based approaches used for information integration. In the
survey (Shvaiko and Euzenat, 2005), the authors distinguish approximate and exact techniques at
schema-level, and syntactic, semantic, and external techniques at element and structure-level.
They also focus on the current state of the art schema/ontology matching approaches, techniques
and tools, and provide a comparative survey of the existing ontology matching techniques and
systems. Ontology based information extraction techniques are reviewed by (Wimalasuriya and
Dou, 2010). Shvaiko and Euzenat (2013) review state-of-the-art ontology matching approaches and compare the performance of recent approaches for ontology matching. They also address some challenges for ontology matching. Though much research has been conducted on ontology matching (Peukert et al., 2011, Ngo and Bellahsene, 2012), users still need to select a large number of classifiers, and the final results are sent to users to manually delete incorrect matches and add correct ones. The approaches need prior knowledge, so they are time-intensive and costly; an ontology matching approach with less human intervention is necessary. Therefore, it is useful to provide an updated summary of the existing approaches. In this research, we describe the ontology matching approaches according to instance-based, schema-based, combined instance- and schema-based, usage-based, element-level, and structure-level matching. We also identify some challenges in the existing approaches and provide guidelines for handling them.

II. Matching Dimensions

(Shvaiko and Euzenat, 2005) classify schema matching systems according to three dimensions: the input of the systems, the characteristics of the matching process, and the output of the systems. These dimensions are described below:

A. Input Dimensions

The ontology matching systems have been classified depending on the data/conceptual
models they take as input. For example, some systems, such as GLUE (Doan et al., 2002), SAMBO (Lambrix and Tan, 2011), LogMap (Jiménez-Ruiz and Grau, 2011), Malform-SVM (Ichise, 2008), MaF (Martinez-Gil et al., 2012), RiMOM (Li et al., 2009) and ASMOV (Jean-Mary et al., 2009), take OWL as input. Other systems, such as COMA++ (Aumueller et al., 2005), Anchor-Flood (Seddiqui and Aono, 2009), GOMMA (Kirsten et al., 2011), YAM++ (Ngo and Bellahsene, 2012), AgreementMaker (Cruz et al., 2009), Falcon (Hu et al., 2008) and PRIOR+ (Mao et al., 2010), support both RDFS and OWL models. The systems are
also classified according to different types of input information such as instance-based,
schema-based, both instance and schema-based and usage-based. They are described below:

1. Instance-based Matching

Instance-based ontology matching determines the similarity between ontology concepts from the similarity of the instances associated with those concepts (Rahm, 2011). For example, in Figure 1, the matching between TSB in O1 and Toshiba in O2 is instance-based. The semantics of ontology concepts depend on the instances, so instance-based matching can determine high-quality correspondences (Rahm, 2011). There are some
advantages of instance-based ontology matching (Thor et al., 2007): 1) the number of
instances is usually higher than the number of concepts, which helps to determine the degree of concept similarity; and 2) the match accuracy of the approach can be high even when there are some instance mismatches.
For determining the similarity between instances, all the instances of an ontology concept are combined into a virtual document. In this way, many virtual documents are created from all the ontology concepts. The documents are then compared with each other using a document similarity measure such as TF/IDF to complete the matching. The approach has been implemented in some systems including RiMOM (Li et al., 2009) and COMA++ (Massmann and Rahm, 2008). In COMA++, Massmann and Rahm use website names and descriptions for document similarity. Instance overlapping methods are also used for determining concept similarity. In COMA++, URLs are used to identify the overlap between web directories such as Yahoo and Google by considering URL usage. Four similarity measures, Base-k similarity, Dice, minimum and maximum, are used for determining URL-based similarity. URL matching alone achieves an average f-measure of 60%, and 79% on average when combined with name and description matching. The instance overlapping method is also used for matching large life science ontologies (Kirsten et al., 2007) and product catalogs (Thor et al., 2007). (Thor et al., 2007) use the similarity of associated instances for deriving the similarity between concepts. They also use hyperlinks between data sources and general object matching for performing instance matching. (Hoshiai et al., 2004) compare feature vectors between a pair of concepts using keywords found in the instances, and the similarity between the feature vectors is determined by a structural matcher. There are other instance-based matching systems such as GLUE (Doan et al., 2002) and SAMBO (Lambrix and Tan, 2011).
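
A sketch of the virtual-document idea is given below, assuming scikit-learn is available; the concept and instance names are taken from Figure 1 for illustration only:

    # Sketch of instance-based matching via virtual documents: all instances
    # of a concept are concatenated into one document, and the documents are
    # compared by TF-IDF cosine similarity (assumes scikit-learn).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    o1_instances = {"Computers": ["Acer", "TSB", "$500"]}
    o2_instances = {"Personal_Computers": ["Acer", "Toshiba", "$500"]}

    docs = [" ".join(v) for v in o1_instances.values()] + \
           [" ".join(v) for v in o2_instances.values()]
    tfidf = TfidfVectorizer().fit_transform(docs)
    sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    print(f"Computers vs Personal_Computers: {sim:.2f}")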

2. Schema-based Matching

Instance-based matching is not always feasible for making knowledge discovery easy and systematic, because it suffers from problems such as lack of expressivity and schema heterogeneity (Jain et al., 2010b). These problems can be addressed by schema-based matching between the data sources. Schema-based ontology matching determines the similarity between ontology concepts from schema-level information. For example, in Figure 1, the matching between UnitPrice in O1 and Price in O2 is schema-based. The advantage of schema-based matching is that it benefits both the Artificial Intelligence and Semantic Web communities in applications such as querying, reasoning, data integration, data mining and knowledge discovery (Anam et al., 2015a). After determining the mapping between two schemas, the next step is to generate query expressions that automatically translate data instances of these schemas under an integrated schema (Shvaiko and Euzenat, 2005).

Some other schema-based matching systems are S-Match (Giunchiglia, 2003), Anchor-Flood (Seddiqui and Aono, 2009), ASMOV (Jean-Mary et al., 2009) and Falcon (Hu et al., 2008). S-Match implements semantic matching using a propositional satisfiability (SAT) decider.
Anchor-Flood starts matching from a small number of concepts called anchor. Then it
incrementally matches neighbors such as super-concepts, sub-concepts, and siblings of each
anchor until no further matches are found, thereby building small segments (fragments) out
of the ontologies to be matched. The system uses terminological matchers, structural matchers and a background knowledge base (WordNet). ASMOV (Jean-Mary et al., 2009) uses terminological, structural and extensional matchers for similarity calculation between two ontologies, and examines disjoint-subsumption contradictions and subsumption incompleteness for semantic verification. The matching process is repeated with the obtained alignments as input until no new correspondences are found. Falcon (Hu et al., 2008) is an automatic partition-based system that uses a divide-and-conquer approach for ontology matching.

3. Instance and Schema-based Matching

There are some schema and ontology mapping systems which support both instance and
schema-based matching. Among them, COMA++ (Aumueller et al., 2005) is a generic
schema and ontology matching system where simple, hybrid and reuse oriented matchers are
used. (Martinez-Gil et al., 2012) develop an ontology matching framework, MaF, that uses the largest number of algorithms and can test the largest number of algorithm combinations. However, the framework gives no scope to correct and validate the results if the selected algorithms produce wrong mappings. Some other instance and schema-based
systems are: GOMMA (Kirsten et al., 2011) and YAM++ (Ngo and Bellahsene, 2012).

4. Usage-based Matching

Usage-based schema matching has also been introduced in the literature. Elmeleegy et al. (2008) propose a usage-based schema matching approach that exploits information extracted from query logs to find correspondences between attributes in the relational schemas to be matched. The approach does not depend on the schema information or the data instances. In the approach, first, co-occurrence patterns are determined between attributes, together with additional features such as their use in joins and with aggregation functions; second, a genetic algorithm is applied to find the highest-scoring mappings according to a scoring function that measures the similarity between the features of the matching attributes. The advantage of the approach is that it can match schemas even if their attribute names are opaque or the layouts of the schemas are different. Nandi and Bernstein (2009) develop the Hamster approach, which uses the click log for keyword queries of an entity search engine. The approach matches schema elements if the distributions of keyword queries that cause click-throughs on their instances are similar. However, the problem with usage-based matching is that it is very difficult to obtain suitable usage data (Rahm, 2011).
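
The co-occurrence idea behind usage-based matching can be sketched as follows; the query logs are illustrative, and the genetic-algorithm search of Elmeleegy et al. is omitted:

    # Toy sketch of usage-based matching: attributes whose co-occurrence
    # contexts in the query logs overlap are candidate matches.
    from collections import defaultdict

    def cooccurrence_profile(query_log):
        profile = defaultdict(set)
        for attrs in query_log:  # attributes used together in one query
            for a in attrs:
                profile[a] |= set(attrs) - {a}
        return profile

    p1 = cooccurrence_profile([["brand", "unitprice"], ["brand", "model"]])
    p2 = cooccurrence_profile([["maker", "price"], ["maker", "model"]])

    def score(a, b):  # Jaccard overlap of the co-occurrence contexts
        union = p1[a] | p2[b]
        return len(p1[a] & p2[b]) / len(union) if union else 0.0

    print(score("brand", "maker"))  # both co-occur with "model"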

B. Process Dimensions

The classification of the matching process depends on the approximate or exact nature of its computation (Shvaiko and Euzenat, 2005). Exact algorithms compute the absolute solution to a problem, while approximate algorithms sacrifice exactness for performance (Ehrig and Sure, 2004). The techniques used in schema/ontology matching systems are either approximate or exact. Interpretation of the input data is another way of analyzing the matching algorithms: (Shvaiko and Euzenat, 2005) identify three large classes, namely syntactic, external and semantic.

C. Output Dimensions

The ontology matching systems have also been classified based on the match results they produce. Each mapping element of the match result can have a mapping expression that specifies how the source schema and target schema are related. In mapping expressions, matching cardinality is also considered (Rahm and Bernstein, 2001). There are four types of matching cardinality: 1:1, 1:n, n:1 and n:m. If matching is done element by element, that is, one element of the source schema is matched with one element of the target schema, it is called 1:1 matching cardinality; for example, id of a source schema is matched to sid of a target schema. When more than one element of the source schema is matched with one element of the target schema, it is called n:1 matching cardinality; the mapping of firstname and lastname to fullname is an n:1 matching. The 1:n cardinality is the opposite of n:1; if address is matched to street and city, it is a 1:n matching. When more than one element of the source schema is mapped to more than one element of the target schema, it is called n:m matching cardinality. Most ontology matching systems, such as GLUE (Doan et al., 2002), SAMBO (Lambrix and Tan, 2011), LogMap (Jiménez-Ruiz and Grau, 2011), Malform-SVM (Ichise, 2008), MaF (Martinez-Gil et al., 2012), RiMOM (Li et al., 2009), COMA++ (Aumueller et al., 2005), Anchor-Flood (Seddiqui and Aono, 2009), GOMMA (Kirsten et al., 2011), YAM++ (Ngo and Bellahsene, 2012), Falcon (Hu et al., 2008) and PRIOR+ (Mao et al., 2010), produce 1:1 alignments as output. ASMOV (Jean-Mary et al., 2009) supports only n:m match cardinality. The AgreementMaker (Cruz et al., 2009) system outputs all of 1:1, 1:n, n:1 and n:m alignments.
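
The following sketch illustrates mapping expressions for three of these cardinalities as simple Python transforms over a source record; the field names are the hypothetical ones used above:

    # Illustrative mapping expressions for 1:1, n:1 and 1:n cardinalities
    # (field names are hypothetical, following the examples above).
    source = {"id": 7, "firstname": "Ada", "lastname": "Lovelace",
              "address": "12 Main St, Hobart"}

    target = {"sid": source["id"]}                                       # 1:1
    target["fullname"] = source["firstname"] + " " + source["lastname"]  # n:1
    target["street"], target["city"] = source["address"].split(", ")     # 1:n
    print(target)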

III. Classification of the Ontology Matching Approaches

(Rahm and Bernstein, 2001) classify schema-based matching into element-level and structure-level matching, and instance-based matching into element-level matching only. According to (Shvaiko and Euzenat, 2005), element-level matching is divided into syntactic and external, and structure-level matching is divided into syntactic, external and semantic. The ontology matching approaches are described below:

A. Element Level Matching

Element level matching compares the names of entities using terminological matching methods and combination methods. It also uses constraint-based matching techniques and linguistic matching resources.

1. Terminological Matching

Terminological matching is a basic approach that compares matching entities using string similarity metrics and text processing techniques. String similarity metrics are measures that calculate the degree of similarity between the names and name descriptions of ontology entities. Normalized similarity metrics produce a numeric value ranging from 0 to 1. The performance of string similarity metrics has been analysed in the context of name
matching in the available literature. (Cohen et al., 2003) analyse the performance of the
string similarity metrics in the name-matching tasks. They compare the performance of the
string metrics in three categories: edit-distance metrics such as Levenshtein, Jaro, Jaro-Winkler, Needleman-Wunsch, Smith-Waterman and N-gram; token-based distance metrics such as TFIDF, Cosine and Jaccard; and hybrid metrics such as Monge-Elkan. They find that the performance of Monge-Elkan, TFIDF and SoftTFIDF is the best within each category.
(Jimenez et al., 2009a) perform some experiments on 12 name matching data sets comparing
generalized Monge-Elkan with three representative character-based string measures:
Bigrams, Edit distance, and Jaro similarity. They find that the generalized Monge-Elkan
method outperforms the original Monge-Elkan method when character-based measures are
used to compare tokens. In the record linkage problem, the similarity measures are used to
quantify the degree of similarity or closeness of two data entities (Koudas et al., 2006).
(Stoilos et al., 2005) develop a string metric for ontology alignment and compare it to other string metrics such as Levenshtein, Jaro-Winkler, Monge-Elkan, Smith-Waterman, Needleman-Wunsch, 3-gram and substring on a subset of the OAEI benchmark test set.
They find that the performances of Monge-Elkan and Smith-Waterman are very poor on
their test set.
String similarity metrics alone do not provide good matching performance. To increase performance, text processing techniques such as tokenization, synonym expansion, abbreviation expansion, stop word removal, stemming and translation (Cruz et al., 2009, Jean-Mary et al., 2009, Jain et al., 2010a, Li et al., 2009, Lambrix and Tan, 2011, Mitra et al., 1999, Madhavan et al., 2001b, Madhavan et al., 2001a) are necessary. (Cheatham and Hitzler, 2013) report evaluation results for string similarity metrics on the ontology alignment problem. Their results show 1) that the performance of different string similarity metrics varies greatly for some types of ontologies, 2) that text processing strategies are in many cases unhelpful and in some cases counter-productive, and 3) that appropriately chosen text processing strategies can improve performance. (Anam et al., 2014a) compare the names of schemas using different string metrics and find the best string metric and threshold value. In addition, they use text processing techniques to process schema names that contain combined words, abbreviations and synonyms. They analyze the performance differences under various settings such as string-metric + tokenization, string-metric + abbreviation, string-metric + synonym, string-metric + abbreviation + tokenization, string-metric + tokenization + synonym and string-metric + tokenization + abbreviation + synonym. Finally, they identify the best string metric, threshold value and text processing technique. ASMOV (Jean-Mary et al., 2009) uses tokenization, string equality and Levenshtein distance in its terminological matching; it also uses WordNet and UMLS as background knowledge. Falcon (Hu et al., 2008) uses tokenization, string equality and Winkler-based similarity for matching concepts.
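
A sketch of one such setting, string-metric + tokenization, is shown below with a plain Levenshtein-based similarity; the implementation is illustrative, not the one used by any particular system:

    # Sketch of terminological matching: normalised Levenshtein similarity
    # after a simple tokenization step (splitting camel case and underscores).
    import re

    def levenshtein(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def tokenize(name):
        # "UnitPrice" -> "unit price"; "unit_price" -> "unit price"
        return re.sub(r"(?<=[a-z])(?=[A-Z])|_", " ", name).lower()

    def similarity(a, b):
        a, b = tokenize(a), tokenize(b)
        return 1 - levenshtein(a, b) / max(len(a), len(b))

    print(similarity("UnitPrice", "unit_price"))  # 1.0 after tokenization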

2. Matching by Combination Methods

In order to improve performance by handling matching errors, combination methods such as machine learning, knowledge engineering, neural networks, and hybrid approaches can be applied.

Machine Learning based Matching

Machine learning techniques have been used in some systems, including GLUE (Doan et al., 2002) and SAMBO (Lambrix and Tan, 2011), for instance-based matching. GLUE focuses on matching ontologies based on product catalogs or web directories, and SAMBO handles biomedical ontologies. GLUE uses TFIDF for name matching and synonyms as a string pre-processing approach. The system consists of a distribution estimator, a similarity estimator and a relaxation labeler. It uses a multi-strategy learning approach in which Naive Bayes base learners classify text and a meta-learner combines the predictions of the base learners, assigning weights to each, in order to find matches among a set of instances. SAMBO uses similarity-based matchers including terminological, structural (concept to concept) and background knowledge based (UMLS and WordNet) matchers. In the terminological matching, it uses n-gram and edit distance for matching lists of words. The results are produced by combining all the matchers based on a user-defined threshold. In the structure level matching, it considers two concepts similar if they stand in similar is-a or part-of relationships to already matched concepts. These systems work in two phases: training and matching. In the training phase, concept classifiers are learned from the available instances of one ontology. In the matching phase, the learned concept classifiers are applied to the instances of the other ontology in order to determine the concepts an instance is predicted to belong to. For deriving concept similarities and concept correspondences, the instance-concept associations are aggregated using a Jaccard-based set similarity measure. However, GLUE was evaluated only on small ontologies of between 31 and 331 concepts, and SAMBO on small ontologies of between 10 and 112 concepts (Rahm, 2011). Though machine learning is promising for element similarity, it needs a well-prepared training dataset, which is difficult to obtain for large datasets. It also needs to rebuild the training model if the schema data changes over time.

Machine learning techniques have also been used in systems that perform both instance and schema-based matching. YAM++ (Ngo and Bellahsene, 2012) is an ontology matching system that uses machine learning techniques such as decision trees, Support Vector Machines and Naive Bayes to combine string similarity metrics in order to produce mappings at the element level. Malform-SVM (Ichise, 2008), another machine learning framework for ontology matching, constructs features using word list similarity, concept hierarchy similarity and structure similarity, and feeds them into a Support Vector Machine (SVM) to predict correct and incorrect mappings.
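
A sketch of this learned combination of similarity metrics is given below, in the spirit of YAM++ and Malform-SVM; it assumes scikit-learn, and the tiny training set is purely illustrative:

    # Sketch of combining several similarity features with a learned
    # classifier (assumes scikit-learn; training pairs are illustrative).
    from difflib import SequenceMatcher
    from sklearn.tree import DecisionTreeClassifier

    def features(a, b):
        a, b = a.lower(), b.lower()
        edit_sim = SequenceMatcher(None, a, b).ratio()
        ta, tb = set(a.split("_")), set(b.split("_"))
        token_jaccard = len(ta & tb) / len(ta | tb)
        return [edit_sim, token_jaccard]

    pairs = [("unitprice", "price", 1), ("brand", "items", 0),
             ("computers", "personal_computers", 1), ("items", "order", 0)]
    X = [features(a, b) for a, b, _ in pairs]
    y = [label for _, _, label in pairs]

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([features("unit_price", "price")]))  # 1 = match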

Knowledge Engineering based Matching

Knowledge engineering based approaches have been used in some ontology matching systems (Aumueller et al., 2005, Chai et al., 2008, Kirsten et al., 2011) which support both instance and schema-based matching. COMA++ (Aumueller et al., 2005) is a generic schema and ontology matching system in which simple, hybrid and reuse-oriented matchers are used. The system combines multiple matching algorithms in a flexible way and supports multiple schemas and ontologies. In the system, users need to select the best combination of matchers; however, determining the best combination is not easy. GOMMA (Kirsten et al., 2011) is a component-based infrastructure which manages, matches and analyses many versions of different life science ontologies. In the system, three functional components, Match, DIFF and Evolution, operate on ontologies, entities and mappings. It considers both instance and schema level matching. For matching ontologies, two types of matchers are used: metadata-based matchers (linguistic, child, path and similarity flooding algorithms) and annotation-based matchers. The system scaled to large ontologies successfully in the OAEI 2011 competition. It provides the OnEX user interface, which gives users facilities to add and delete concepts and attributes, and to query ontology and mapping versions and statistics. However, it does not support reasoning-based diagnosis. The knowledge engineering approach, i.e., obtaining knowledge from experts and incorporating it into expert systems, is difficult and time-consuming. The difficulties arise because experts never report how they reach a decision; rather, they justify why the decision is correct. These justifications vary markedly with the context in which they are required, but in context they are accurate and adequate; the difficulties in knowledge engineering arise from taking the justification out of context.

Neural Network based Matching

Neural networks have also been used in ontology matching systems. (Mao et al., 2010) propose PRIOR+, a generic neural network based ontology matching approach. It consists of three major modules: the IR-based similarity generator, the adaptive similarity filter and weighted similarity aggregator, and the neural network based constraint satisfaction solver. The approach measures the linguistic and structural similarities of ontologies in a vector space model. For the linguistic similarity, it pre-processes each element with stop word removal, stemming and tokenization techniques. For the structural similarity, it considers the hierarchical structure of the classes. It estimates the harmony of each similarity from its corresponding matrix. Then, to remove false mappings, it adaptively aggregates the different similarities and uses the aggregated results to improve overall mapping performance through an interactive activation and competition (IAC) neural network based constraint satisfaction model. The approach thereby addresses some problems of ontology mapping: 1) aggregating multiple similarities from multiple mapping strategies, 2) avoiding manually set parameters in aggregation functions, and 3) handling ontology constraints.

Hybrid approach

There are some limitations to the above combination methods. Machine learning approaches need to rebuild a training model if schema data changes over time; knowledge engineering approaches need time-consuming knowledge acquisition; and neural networks are not good for large datasets. To avoid these limitations, an incremental knowledge engineering approach, Censored Production Rules (CPR) based Ripple Down Rules (RDR) (Kim et al., 2012), can be applied. The approach improves performance incrementally for schema mapping (Anam et al., 2014b), but it needs to create rules one by one, which is time-intensive. Therefore, it is necessary to apply a hybrid approach. Such a hybrid approach, Hybrid-RDR, has been implemented in (Anam et al., 2015b) for schema mapping. The approach combines a machine learning algorithm with an incremental knowledge acquisition approach. The advantage of the Hybrid-RDR approach is that only one classification model is created by a decision tree from a small amount of schema data, and a knowledge base is then built incrementally by adding rules to solve the schema matching problems. The approach saves time in two ways. Firstly, it does not re-create the classification model when schema data changes over time. Secondly, it does not classify all the related schemas one by one by manually creating rules. The Hybrid-RDR approach is useful where large numbers of validation cases are at hand. The same hybrid approach can be applied to ontology mapping.

3. Constraint-based Matching Techniques

Constraint-based matching techniques handle the internal constraints that are applied to the definitions of entities, such as datatypes, keys and cardinality (Shvaiko and Euzenat, 2005). Datatypes include string, integer, varchar and date. Keys can be primary, unique and foreign. Constraint-based matching should be considered when measuring similarity between schemas if the schemas contain constraint information (Rahm and Bernstein, 2001). For example, if the datatype and key of attribute empName in one dataset are varchar and primary respectively, and the datatype and key of attribute ename in another dataset are string and unique respectively, then the attributes are matched, because string is equivalent to varchar and a primary key is equivalent to a unique key. Some matching systems use this technique: SMB (Marie and Gal, 2008) considers name and domain constraints for matching schemas, and CUPID (Madhavan et al., 2001a) combines names, datatypes and constraints at the finest level of granularity.
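
A sketch of the datatype and key equivalence check is shown below; the equivalence tables are illustrative:

    # Sketch of constraint-based matching: datatypes and key kinds are
    # compared through small equivalence tables (tables are illustrative).
    DATATYPE_EQUIV = [{"string", "varchar", "text"},
                      {"integer", "int", "number"}]
    KEY_EQUIV = [{"primary", "unique"}]

    def constraints_match(dt1, key1, dt2, key2):
        dt_ok = any(dt1 in s and dt2 in s for s in DATATYPE_EQUIV)
        key_ok = any(key1 in s and key2 in s for s in KEY_EQUIV)
        return dt_ok and key_ok

    # empName (varchar, primary) vs ename (string, unique), as above
    print(constraints_match("varchar", "primary", "string", "unique"))  # True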

4. Linguistics-based Matching Resources

An ontology entity pair can be matched according to its semantics using linguistic matching resources such as domain and common knowledge thesauri. The pair telephone and contact is an example of synonym matching, where the purchase order domain knowledge thesaurus created for COMA (Do and Rahm, 2002) is used as an external dictionary. The common knowledge thesaurus WordNet (Miller, 1995) is used for measuring the semantic similarity between entities. WordNet is partitioned into nouns, verbs, adjectives and adverbs, which are organized into synonym sets, each representing one underlying lexical concept. Synonym sets are also called synsets, and these sets are interlinked by different relations such as hypernymy, hyponymy, antonymy, meronymy and holonymy. For instance, the entities section and chapter are semantically similar and can be matched using WordNet.
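
A WordNet-based semantic similarity check can be sketched as follows, assuming NLTK with the WordNet corpus installed (nltk.download("wordnet")):

    # Sketch of a WordNet-based semantic check between two entity names
    # (assumes NLTK with the WordNet corpus downloaded).
    from nltk.corpus import wordnet as wn

    def semantic_similarity(a, b):
        # best path similarity over all sense pairs; None scores become 0
        scores = [s1.path_similarity(s2) or 0.0
                  for s1 in wn.synsets(a) for s2 in wn.synsets(b)]
        return max(scores, default=0.0)

    print(semantic_similarity("section", "chapter"))  # closely related senses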

B. Structure Level Matching

The result of element level matching is used in structure level matching to identify the structural similarity between a pair of nodes from two ontologies by analyzing the positions of the nodes in the hierarchical structure of the graphs. Structure matching is used to adjust incorrect matches from the element matching phase. (Melnik et al., 2002) present a graph matching algorithm called Similarity Flooding (SF) and explore its usability for schema matching. The approach converts schemas into directed labelled graphs and uses fixpoint computation to determine the matches between corresponding nodes of the graphs. They use the idea that two nodes match based on the matching of their neighborhoods. The algorithm is used in some systems (Kirsten et al., 2011, Ngo and Bellahsene, 2012) for matching the hierarchical structure of a full graph. In other systems (Wang, 2011, Li et al., 2009, Mao et al., 2010, Seddiqui and Aono, 2009), similarity propagation is used for matching the nodes of a graph. (Madhavan et al., 2001a) present an algorithm for matching tree structures based on children-context and leaf-context. The contexts are described below:

Children-context. In this context, the similarity of the child nodes is used for determining the structural similarity between inner nodes of the graphs. That is, two non-leaf elements are structurally similar if their immediate children sets are highly similar according to element level matching.

Leaf-context. In this context, the similarity of the leaf nodes is used for determining the structural similarity between inner nodes of the graphs. The leaf sets in the two trees are similar if they are similar according to element level matching and if the elements in their respective vicinities (ancestors and siblings) are similar. Two non-leaf elements are similar if their leaf sets are highly similar, even if their immediate children are not.
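
A sketch of the children-context idea is given below; element_sim stands for any element level similarity, here approximated with difflib for illustration:

    # Sketch of children-context matching: two inner nodes are similar if
    # their child sets are highly similar under element level matching.
    from difflib import SequenceMatcher

    def element_sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def children_context_sim(children1, children2):
        if not children1 or not children2:
            return 0.0
        # average, over both sides, of each child's best match on the other
        best1 = [max(element_sim(c1, c2) for c2 in children2) for c1 in children1]
        best2 = [max(element_sim(c1, c2) for c1 in children1) for c2 in children2]
        return (sum(best1) + sum(best2)) / (len(best1) + len(best2))

    print(children_context_sim(["Television", "Computers"],
                               ["TV", "Personal_Computers"]))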

The above algorithm has been used in some systems (Do and Rahm, 2002, Aumueller et al., 2005). Other algorithms, such as role similarity analysis (Martinez-Gil et al., 2012) and structural proximities, clustering and GMO (Hu et al., 2008), are also used for structure level matching. The similarity between nodes can also be computed based on their relations (Maedche and Staab, 2002). In S-Match (Giunchiglia and Shvaiko, 2003), semantic structure matching is implemented by a propositional satisfiability (SAT) decider. The algorithm works only on Directed Acyclic Graphs (DAGs) and is-a links. SAT deciders are correct and complete decision procedures for propositional logics, and SAT allows only and all possible mappings between elements to be found. The Similarity Flooding algorithm (Melnik et al., 2002) is used for calculating similarity in some systems (Ngo and Bellahsene, 2012, Kirsten et al., 2011). Anchor-Flood (Seddiqui and Aono, 2009) uses internal and external similarities and iterative anchor-based similarity propagation in its structure level matching. A weighted sum of the domain and range similarities is used for computing an iterative fixpoint and for measuring hierarchical and restriction similarities in ASMOV (Jean-Mary et al., 2009).

C. Aggregation Functions

The mappings discovered by element level and structure level matching are combined using aggregation functions. We denote the similarity values found from element level matching and structure level matching by esim and ssim respectively. The aggregation functions are described below:

Harmonic mean: The harmonic mean is calculated by function (1):

Harmonic mean = 2*esim*ssim / (esim + ssim)    (1)

This combination strategy is used in the systems (Do and Rahm, 2002, Ngo et al., 2011a).

Average: The average similarity is calculated by dividing the sum of the similarity values for a pair by the number of similarity functions, as in function (2):

Avg = (esim + ssim) / 2    (2)

The matching systems which use this strategy include (Do and Rahm, 2002, Volz et al., 2009, Jimenez et al., 2009b).

Minimum: This strategy returns the minimum of the two similarity values, as in function (3):

Min = min(esim, ssim)    (3)

This combination strategy is used in some matching systems (Do and Rahm, 2002, Volz et al., 2009, Massmann and Rahm, 2008).

Maximum: This strategy returns the maximum of the two similarity values, as in function (4):

Max = max(esim, ssim)    (4)

The matching systems that use this strategy include (Do and Rahm, 2002, Volz et al., 2009, Massmann and Rahm, 2008).

Weighted: This strategy returns a weighted sum of the similarity values. The weight for structure level matching, Wstruct, is derived from a threshold value, and element level matching receives the complementary weight (Ngo et al., 2011b). The weighted similarity of an entity pair e1, e2 is calculated by function (5):

wsim(e1, e2) = Wstruct * ssim(e1, e2) + (1 - Wstruct) * esim(e1, e2)    (5)

This combination strategy is used in some matching systems (Do and Rahm, 2002, Ngo and Bellahsene, 2012, Madhavan et al., 2001b).
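
The five aggregation functions (1)-(5) can be written directly as follows; the sample scores are illustrative:

    # The aggregation functions (1)-(5) over an element level score esim
    # and a structure level score ssim (sample values are illustrative).
    def harmonic(esim, ssim):
        return 2 * esim * ssim / (esim + ssim) if esim + ssim else 0.0

    def average(esim, ssim): return (esim + ssim) / 2
    def minimum(esim, ssim): return min(esim, ssim)
    def maximum(esim, ssim): return max(esim, ssim)

    def weighted(esim, ssim, w_struct):
        return w_struct * ssim + (1 - w_struct) * esim

    esim, ssim = 0.8, 0.6
    for f in (harmonic, average, minimum, maximum):
        print(f.__name__, round(f(esim, ssim), 3))
    print("weighted", weighted(esim, ssim, w_struct=0.4))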

IV. State of the Art Ontology Matching Systems

We now describe some state of the art ontology matching systems based on the classifications
provided in Section III.

Malform-SVM. Malform-SVM (Ichise, 2008) is an ontology matching framework that takes ontologies in OWL as input and produces 1:1 mapping output between concepts. The framework uses different types of concept similarity measures, including string-based, graph-based, instance classification and knowledge-based similarity. It uses prefix, suffix, edit distance and n-gram for measuring string-based similarity, and WordNet for knowledge-based similarity measures between concepts. In addition, it measures similarity based on word lists and the concept hierarchy. In the structure level matching, it uses 16 similarity measures for parents. The framework constructs features using the above similarity measures and feeds them into a Support Vector Machine (SVM) to predict correct and incorrect mappings. The framework does not provide a GUI and needs classified examples, as it uses a supervised machine learning algorithm.

Falcon. Falcon (Hu et al., 2008) is an automatic system that uses a divide-and-conquer approach for ontology matching. It handles large ontologies represented in OWL and RDFS formats. The approach first partitions the ontologies with structure-based partitioning to separate the entities (classes and properties) of each ontology into a set of small clusters using the ROCK agglomerative clustering algorithm (Guha et al., 2000). It then constructs blocks out of these clusters. In the second phase, the blocks from the distinct ontologies are matched based on a threshold using the I-SUB string comparison technique. Finally, in the third phase, the results of the so-called V-Doc (a linguistic matcher) and GMO (an iterative structural matcher) techniques are combined via sequential composition to discover alignments between the matched block pairs. According to the results of the anatomy track of OAEI-2007, the performance of Falcon was better than that of other systems like S-Match and COMA. For matching the schemas of the contest, Falcon took several minutes to complete, whereas other systems took hours or even days.

RiMOM. RiMOM (Li et al., 2009) is an instance-based and schema-based ontology matching system. It takes ontologies in OWL format as input and produces output with 1:1 mapping cardinality. It is one of the first systems to implement dynamic multi-strategy selection of matchers. Two types of similarity measure are used in this system: linguistic similarity and structural similarity. In the linguistic matching, it matches elements consisting of names, comments and instance values using edit distance over entity labels and vector distance over the comments and instances of entities. It also uses WordNet as background knowledge. In the structure level matching, it uses the Similarity Flooding algorithm for matching concept to concept, concept to property and property to property. The system was among the best performing prototypes in the OAEI contests up to 2009.

AgreementMaker. AgreementMaker (Cruz et al., 2009) is a system that comprises a number of automatic matching methods considering conceptual and structural levels of granularity, a user interface, a set of evaluation strategies, user feedback, and both types of component: schema only, or schema and instances. It matches ontologies of various domains such as geospatial, environmental and biomedical. It handles ontologies in XML, RDFS, OWL and N3, and outputs 1:1, 1:n, n:1 and n:m alignments. The matching process consists of two phases: similarity computation and alignment selection. In the similarity computation, it uses terminological and structural matchers. The system combines matchers in three layers. In the first layer, matchers compare concept features such as labels, comments, annotations and instances that are represented as TFIDF vectors. The features are compared using cosine similarity, edit distance and Jaro-Winkler. The system also uses WordNet as background knowledge during matching. In the second layer, structural ontology properties are matched using descendants' similarity inheritance and siblings' similarity contribution. The matching results of the first two layers are combined and filtered based on a user-defined threshold. The final results are sent to the user for feedback (approval, rejection or modification). In terms of f-measure in the oriented matching track, the best results were produced by SAMBO, RiMOM and AgreementMaker in OAEI-2008, OAEI-2009 and OAEI-2010 respectively.

LILY. LILY (Wang, 2011) is an ontology matching system that takes RDF ontologies as input and extracts a semantic subgraph for each entity. The information in the semantic subgraph comprises basic descriptions such as identifier, label and comments; concept descriptions such as class hierarchies, related properties and instances; and property descriptions such as hierarchies (Melnik et al., 2002), domains, ranges, restrictions and related instances. It uses both the linguistic and structural information in the semantic subgraphs to generate 1:1 alignments. It combines three matching strategies for generating matching tasks. In the system, Generic Ontology Matching (GOM) for normal-size ontologies and Large Scale Ontology Matching (LOM) for large-scale ontologies are used to generate matching tasks. For large-scale ontologies, the system uses positive and negative reduction anchors to reduce the time complexity of matching. The system also uses ontology mapping debugging for verifying and improving the mapping results. The matching process proceeds in three steps: 1) extracting semantic subgraphs, 2) computing alignment similarity and 3) propagating similarity based on the semantic subgraphs. The system uses a classic image threshold selection algorithm to select the threshold automatically and generates 1:1 alignments using a stable marriage strategy. It does not use WordNet as background knowledge and does not consider ontology constraints. In the system, the size of the subgraph must be set manually according to the mapping task.

LogMap. LogMap (Jiménez-Ruiz and Grau, 2011) is an ontology matching system that maps ontologies in OWL format in the biomedicine domain. The system can deal with semantically rich ontologies such as SNOMED CT, the National Cancer Institute Thesaurus (NCI) and the Foundational Model of Anatomy (FMA), which contain tens of thousands of classes. The system uses highly optimized data structures for lexically and structurally indexing the input ontologies. In the lexical indexation, LogMap uses the labels of the classes in each ontology and their lexical variations, and uses WordNet or the UMLS lexicon to enrich the indexes. In the structural indexation, it uses an interval labelling schema to capture the hierarchical structure of the classes. The lexical indexes are used to compute an initial set of anchor mappings and to assign a confidence value to each of them. The system then works iteratively, discovering new mappings starting from the initial anchor mappings and using the ontologies' extended class hierarchies. The system detects incoherencies and uses a greedy diagnosis algorithm for repair, to improve the quality of the resulting alignment; the goal of the repair process is to restore coherence by minimally changing the input. LogMap2 (Jiménez-Ruiz et al., 2012) is an improved version of the LogMap system that maps ontologies in the biomedicine domain. The system supports user interaction during matching (which is essential for use cases requiring very accurate mappings), scalability to large ontologies, and reasoning-based diagnosis. However, in both LogMap and LogMap2, the reasoning-based techniques aggravate the scalability problem, which restricts their combination with more effective and complex matching strategies.

An ontology modularization technique is used to detect incoherent concepts, and a repair technique is used to minimize the incoherence and the removal of matches from the input alignment (Santos et al., 2013). In the system, conflict sets of mappings are computed using depth-first search in the core fragment structure, and confidence values are used to filter the conflict sets. For removing conflict sets of mappings, two approaches are applied: 1) compute all disjoint clusters of conflicting sets, and 2) compute and remove the mappings that belong to the highest number of unresolved conflict sets using a predefined depth-first search. However, the repair technique causes a loss of recall, which still needs to be addressed.

YAM++. YAM++ (Ngo and Bellahsene, 2012) is a machine learning based ontology matching system. It takes ontologies in languages such as N3, RDF and OWL as input and produces 1:1 mappings between entities as output via element level and structure level matchers. At the element level, the input ontologies are parsed to extract labels and comments for each entity. At this level, a terminological matcher and an extensional matcher are used. The terminological matcher uses different string similarity metrics to compute similarity scores between entities. It also uses machine learning algorithms (decision tree, Support Vector Machine and Naive Bayes) to combine the string similarity metrics in order to produce mappings at the element level. The extensional matcher uses external instances with the ontologies to find similar instances from the two ontologies and to discover new mappings between instances. At the structure level, the system uses the results of element level matching to identify the structural similarity of entities by analyzing their positions in the hierarchical structure of the ontologies. The Similarity Flooding algorithm (Melnik et al., 2002) is used to calculate structural similarity. The authors also use a global diagnosis optimization method (Meilicke, 2011) for semantic matching to refine candidate mappings. The system has a Graphical User Interface (GUI) for selecting different configurations and displaying matching results.

MaF. (Martinez-Gil et al., 2012) develop an ontology matching framework, MaF, that takes two OWL ontologies as input and produces 1:1 mappings between the ontologies. It performs both element and structure level matching at both the instance and schema levels. In the element level matching, the framework uses concept similarity analysis (CSA2) algorithms, including distance-based, name-based and WordNet-based methods. It uses role similarity analysis (RSA2) algorithms, including class, object property and datatype property methods, in the structure level matching. It also uses hybrid similarity analysis (HSA2) algorithms that combine CSA2 and RSA2 for mapping between ontologies. In addition, it combines different mapping results using average, maximum, minimum, Minkowski distance, weighted product and weighted sum aggregation functions. To obtain the most promising mapping result, the framework allows users to filter the mapping results using hard, delta and proportional thresholds. It uses the largest number of algorithms and can test the largest number of algorithm combinations. However, the framework is user-dependent, and users need to gather experience to select the appropriate algorithm for mapping. The framework gives no scope to correct and validate the results if the selected algorithm produces wrong matching results.

OMReasoner. OMReasoner (Shen et al., 2014) is another ontology matching system that combines multiple individual matchers: string similarity metrics (prefix, suffix, edit distance) for syntactic matching; an external dictionary (WordNet) and a description logic (DL) reasoner for semantic matching; and constraint-based matching techniques. The multiple matching results are combined by a weighted summarizing algorithm (WeightSum) and a maximum method (Max). There are some limitations to the system. 1) It uses a threshold in the syntactic matching to determine whether a similarity is regarded as equivalence, but applying one specific threshold value to all ontologies is not feasible. 2) It does not use preprocessing techniques such as stemming and tokenization, which are necessary to eliminate specific characters and to separate compound words respectively. 3) It does not use the comments and labels of concepts for matching. 4) It employs a description logic (DL) reasoner with external rules to reason about the ontology matching; however, the reasoning technique is time-intensive and does not have a large impact on the results. Finally, it does not consider structure level matching, which is necessary to match the hierarchical context of concepts.

XMap++. XMap++ (Djeddi and Khadir, 2014) matches two ontologies using string, linguistic and structure-based similarity matchers. It uses cosine similarity as a string similarity measure in order to match the labels, names and identities of the two ontologies. Bing Translator is used to translate labels in different languages. WordNet is used as a linguistic matcher for matching words semantically; the system loads WordNet fully into memory in order to reduce the time needed for matching large ontologies. At the structure level, it uses adjacency relationships (subClassOf and is-a) to correct entities which are not matched by the string and linguistic matchers. Aggregation, selection and combination operators are used to combine the similarity values produced by the three matching techniques. Finally, a threshold value is applied to filter the result. However, the system does not take into account the comments of entities or instance information. In addition, it cannot avoid multiple accesses to the Microsoft Translator within the matching process.

AOTL. AOTL (Ontology Alignment at the Terminological and Linguistic level) (Khiat and Benaissa, 2014) is an ontology matching system that performs matching only at the element level. It takes two ontologies and extracts their entities: names, labels, properties and instances. It calculates similarities between entities using string similarity algorithms such as Levenshtein distance, block distance, Jaro, SLIM-Winkler, Jaro-Winkler, Smith-Waterman and Needleman-Wunsch at the terminological level, and uses the external resource WordNet at the linguistic level. It represents the calculated similarities in a matrix and applies a filter to identify the alignment. However, choosing one specific threshold value for all ontologies is not feasible, as each ontology has its own characteristics. The system does not consider structure level matching and, in addition, cannot process ontologies written in different languages.

MassMtch. MassMtch (Schadd and Roos, 2014) performs both element and structure level ontology matching. It uses 3-grams, Jaccard and WordNet at the element level, and a hybrid similarity at the structure level. Virtual document similarity, which uses a weighted combination of the descriptions of concepts, is applied in the system. All the similarities are combined using an average aggregation function. Finally, a naive descending extraction algorithm is applied to the aggregated similarity matrix to determine the final mappings. However, the system does not handle multi-lingual ontologies. In addition, it has runtime and memory problems for large ontologies.

A. Summary of the Ontology Matching Systems

The summary of the ontology matching systems is given in TABLE I. The Systems column gives the names of the ontology matching systems. The Input column describes the types of input used by the systems. The Match cardinality column provides information about the outputs produced by the systems. Schema/instance describes whether a system performs mapping at the instance level, the schema level or both. The GUI column indicates whether or not a system has a Graphical User Interface. The Techniques column indicates whether machine learning-based, rule-based or neural network-based techniques are used. The Element level matching and Structure level matching columns describe the methods used at each matching level.

TABLE I
SUMMARY OF ONTOLOGY MATCHING SYSTEMS

Systems | Input | Match cardinality | Schema/instance | GUI | Techniques | Element level matching | Structure level matching
GLUE (Doan et al., 2002) | OWL | 1:1 | instance | no | Naive Bayes | TFIDF, synonym, lexical and domain knowledge | -
COMA++ (Aumueller et al., 2005) | SQL, W3C XSD, RDF, OWL | 1:1 | schema and instance | yes | rules | string similarity metrics, text processing techniques | children and leaf matching
SAMBO (Lambrix and Tan, 2011) | OWL | 1:1 | instance | yes | Naive Bayes | string similarity metrics, WordNet | interactive structural similarity based on is-a, part-of hierarchies
GOMMA (Kirsten et al., 2011) | OBO, OWL, RDF | 1:1 | schema and instance | yes | rules | n-gram, Levenshtein, external knowledge sources | similarity flooding algorithm
LogMap (Jiménez-Ruiz and Grau, 2011) | OWL | 1:1 | schema | no | rules | WordNet or UMLS lexicon | interval labelling schema
LILY (Wang, 2011) | RDF | 1:1 | schema and instance | no | - | linguistics | similarity propagation based on semantic subgraphs
YAM++ (Ngo and Bellahsene, 2012) | N3, RDF, OWL | 1:1 | schema and instance | yes | decision tree, SVM, Naive Bayes | string similarity metrics | similarity flooding
Malform-SVM (Ichise, 2008) | OWL | 1:1 | schema and instance | no | SVM | string similarity metrics, WordNet | path list and word list
MaF (Martinez-Gil et al., 2012) | OWL | 1:1 | schema and instance | yes | - | concept similarity analysis (CSA2) algorithms | role similarity analysis (RSA2) algorithms
Anchor-Flood (Seddiqui and Aono, 2009) | RDFS, OWL | 1:1 | schema | yes | - | tokenization, string equality, Winkler-based similarity, WordNet | internal and external similarities; iterative anchor-based similarity propagation
RiMOM (Li et al., 2009) | OWL | 1:1 | schema and instance | no | - | edit distance, vector distance, WordNet | similarity propagation
ASMOV (Jean-Mary et al., 2009) | OWL | n:m | schema | no | - | tokenization, string equality, Levenshtein distance, WordNet, UMLS | iterative fix point computation, hierarchical and restriction similarities
AgreementMaker (Cruz et al., 2009) | XML, RDFS, OWL, N3 | 1:1, 1:n, n:1, n:m | schema and instance | yes | - | cosine, edit distance, Jaro-Winkler, WordNet | descendant and sibling similarities
PRIOR+ (Mao et al., 2010) | XML, RDF/RDFS, OWL | 1:1 | schema and instance | no | neural network | edit distance, stop word removal, stemming, tokenization | propagation of original information
Falcon (Hu et al., 2008) | RDFS, OWL | 1:1 | schema | - | - | I-SUB string comparison technique | structural proximities, clustering, GMO
OMReasoner (Shen et al., 2014) | OWL | 1:1 | schema | - | - | prefix, suffix, edit distance, WordNet | -
XMap++ (Djeddi and Khadir, 2014) | OWL | 1:1 | schema | - | - | cosine, WordNet, Bing Translator | adjacency relationships (subClassOf and is-a)
AOTL (Khiat and Benaissa, 2014) | OWL | 1:1 | schema | - | - | string similarity metrics | -
MassMtch (Schadd and Roos, 2014) | OWL | 1:1 | schema | - | - | 3-grams, Jaccard, WordNet | hybrid similarity

V. Challenges of the Ontology Matching Systems

Ontology matching systems face several challenges, such as handling ontology matching
errors, user involvement and reuse of previous match operations. These challenges are
described below:

1. Handling Ontology Matching Errors

Ontology matching approaches (Ngo and Bellahsene, 2012, Aumueller et al., 2005) report
that a main reason for low performance is ontology matching errors, namely false positives
(where an irrelevant match between concepts is reported as relevant) and false negatives
(where a relevant match between concepts is reported as irrelevant). However, they do not
describe how to handle these problems. If false positives are high, precision becomes low; if
false negatives are high, recall becomes low. Precision reaches 1.0 only when there are no
false positives. Precision reflects the reliability of the match predictions, whereas recall
measures the proportion of real matches that are found. If precision is high but recall is very
low, the overall performance is still very low, so it is important to increase recall as well.
YAM++ uses a machine learning approach for element-level matching. The system iterates
the matching many times to increase the similarity scores between the ontologies, because if
the matching methods are run only once to produce new mappings, the accuracy of the new
results is poor. Iterating many times can remove incorrect mappings, but the process is time
consuming. The system shows that performance becomes low because of false positives and
false negatives; however, it does not show how to handle the errors algorithmically in order
to increase performance. In the system, the mappings of two ontologies are displayed in a
Graphical User Interface (GUI), and users can manually modify or remove incorrect
mappings or add new mappings using their own knowledge. This approach is not efficient.
First, the users need proper knowledge of the domain. Second, manually correcting mappings
is time consuming and error prone. COMA++ follows the same procedure for correcting
mappings.
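
In standard terms, with TP, FP and FN denoting true positives, false positives and false negatives respectively, the measures discussed above are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```

The F1 measure makes the trade-off explicit: a matcher with high precision but very low recall still obtains a low F1, which is why reducing false negatives matters as much as reducing false positives.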


In order to handle the matching errors semi-automatically, it is necessary to apply a
hybrid approach. We have applied a hybrid-RDR approach (Anam et al., 2015b) for schema
mapping. In this approach, only one machine learning classification model is needed, and
rules are created to correct its errors through incremental knowledge acquisition. The same
approach can be applied for handling the matching errors of ontologies.
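
The following is a minimal sketch of how such RDR-style error handling could look: an expert's correction of a false positive is captured as an exception rule instead of a one-off manual edit. The class, the condition functions and the difflib-based similarity are illustrative assumptions, not the hybrid-RDR code:

```python
from difflib import SequenceMatcher

def string_sim(a: str, b: str) -> float:
    """Stand-in for the single ML model's similarity score (illustrative only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class RDRNode:
    """One rule in a binary Ripple-Down Rules tree."""
    def __init__(self, condition, conclusion: bool):
        self.condition = condition    # predicate over an entity-name pair
        self.conclusion = conclusion  # True = match, False = no match
        self.except_child = None      # consulted when this rule fires
        self.else_child = None        # consulted when this rule does not fire

    def classify(self, e1: str, e2: str, default: bool = False) -> bool:
        if self.condition(e1, e2):
            if self.except_child is not None:
                # An exception rule may override this rule's conclusion.
                return self.except_child.classify(e1, e2, self.conclusion)
            return self.conclusion
        if self.else_child is not None:
            return self.else_child.classify(e1, e2, default)
        return default

# Base rule: accept pairs that the (stub) model scores highly.
root = RDRNode(lambda a, b: string_sim(a, b) >= 0.7, conclusion=True)

# An expert spots a false positive such as ("ShipTo", "ShipDate") and adds an
# exception rule, so the correction is stored instead of being applied by hand.
root.except_child = RDRNode(lambda a, b: a[-2:].lower() != b[-2:].lower(),
                            conclusion=False)

print(root.classify("CompanyName", "CompanyName"))  # True  (base rule holds)
print(root.classify("ShipTo", "ShipDate"))          # False (exception fires)
```

Because each correction is stored as a rule in context, later cases of the same kind are handled automatically rather than being corrected by hand again.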

2. User Involvement

Ontology matching can be done manually, semi-automatically or automatically. Manual
matching relies heavily on expert intervention and is extremely time intensive. If this
process is carried out by an ordinary user rather than an expert, the results are unreliable and
error prone. In addition, different experts may have different opinions about the correctness
of a matching. On the other hand, fully automatic matching is not feasible because of the
complexity of the ontology entities in the datasets. Semi-automatic matching can remove the
burden of both the manual and the automatic matching processes. In semi-automatic matching
approaches, it is necessary to set some parameters for correcting and validating the matching
results. COMA++ (Aumueller et al., 2005) supports a large number of matchers, from which
users need to select the best combination. However, selecting matchers is not feasible if the
users do not have proper knowledge of the system. To avoid matcher selection, users tend to
fall back on the default combination, but for the widely differing matching problems of
diverse domains the default configuration is not appropriate.

YAM++ (Ngo and Bellahsene, 2012) supports different classifiers for different mapping
scenarios. In the system, the user is asked either to select an appropriate classifier or to use
the default classifier, which has been trained over a large mapping knowledge base. However,
default classifiers cannot cope with datasets from various domains. In addition, the system
accepts the user's preference between precision and recall during the generation process, and
without proper knowledge it is not easy to provide such a preference. In order to reduce user
involvement, it is necessary to build an ontology matching system using the human heuristics
method of expert systems, in which features of the ontology entities give the users the
knowledge needed to perform matching. This process has been applied in the hybrid-RDR
(Anam et al., 2015b) schema matching system for schema mapping. The same process can be
used to develop an ontology matching system with less human intervention, where users do
not need to select matcher combinations or classifiers.

3. Reuse of Previous Match Operations

Reuse of previous match operations is different from, and more efficient than, reuse of
previous match results. Reuse of previous schema matching results was introduced by Rahm
and Bernstein (2001) to improve the effectiveness and efficiency of schema matching
systems. Some systems, namely COMA (Do and Rahm, 2002), COMA++ (Aumueller et al.,
2005) and the corpus-based approach (Madhavan et al., 2005), reuse previous matching
results. However, reusing previous matching results has some limitations. Firstly, the
ontology entities of the new dataset, for which the repository is searched, must be the same
as the ontology entities of the already mapped datasets. For example, if the entity
CompanyName of one dataset is matched to the entity Name of another dataset and the
mapping information is stored, then in order to reuse the mapping information, new datasets
must contain the same entities, CompanyName and Name. Secondly, the datasets should
represent the same domain, such as purchase order.
Storing previous match operations can solve the above problems. This approach has been
implemented in the hybrid-RDR (Anam et al., 2015b) system. In the system, the match
operations are stored as rules, and it is not necessary to create the rules from scratch for each
new task. The rules that are used for matching the schema entities of two datasets are reused
later for matching the entities of other, new datasets. For example, the rule Jaro(e1, e2) >= 0.5
is satisfied by the schema entities CompanyName (e1) and Name (e2), and the rule is stored
in the knowledge base. Later, the rule can be reused for other schema entities such as
ItemNumber and Number. Jaro(e1, e2) >= 0.5 means that if the value of the Jaro-Winkler
function applied to the entities e1 and e2 is greater than or equal to the threshold 0.5, then the
conclusion is TRUE. This approach also reduces the effort of matching large schemas. If an
ontology matching system reuses previous matching operations, it is possible to increase
performance by adding rules incrementally.
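
A minimal sketch of this rule-reuse idea is shown below. The class, the knowledge-base list and the match function are hypothetical, and a difflib-based ratio stands in for the Jaro-Winkler metric; this is not the hybrid-RDR implementation:

```python
from difflib import SequenceMatcher

def jaro_like(a: str, b: str) -> float:
    """Stand-in for the Jaro-Winkler measure (any string metric would fit here)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class MatchRule:
    """A stored match operation: a metric plus a threshold, reusable across datasets."""
    def __init__(self, name: str, metric, threshold: float):
        self.name = name
        self.metric = metric
        self.threshold = threshold

    def applies(self, e1: str, e2: str) -> bool:
        return self.metric(e1, e2) >= self.threshold

knowledge_base: list[MatchRule] = []

def match(e1: str, e2: str):
    # Try the stored rules first; a new rule is only acquired when none applies.
    for rule in knowledge_base:
        if rule.applies(e1, e2):
            return True, rule.name
    return False, None

# A rule acquired while mapping one dataset pair ...
knowledge_base.append(MatchRule("Jaro(e1,e2)>=0.5", jaro_like, 0.5))

# ... is reused unchanged on entities from a completely different dataset pair.
print(match("CompanyName", "Name"))   # (True, 'Jaro(e1,e2)>=0.5')
print(match("ItemNumber", "Number"))  # (True, 'Jaro(e1,e2)>=0.5')
```

Because the rule stores the operation (metric plus threshold) rather than the concrete entity pair, it applies to any future pair that satisfies the condition, which is what distinguishes reuse of match operations from reuse of match results.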

VI. Conclusion and Future Work

In this research, we have presented the necessity of ontology matching approaches and
described the applications of ontology mapping. We have also described the ontology
matching systems according to the classifications instance-based, schema-based, instance and
schema-based, usage-based, element-level and structure-level, and have provided a feature-based
comparison of the systems. We have found that combinations of methods such as machine
learning, knowledge engineering and neural networks have been used in state-of-the-art
ontology matching systems. These methods have both advantages and limitations. In order to
retain the advantages and reduce the limitations, we have proposed a hybrid approach that
combines machine learning and knowledge engineering for ontology mapping. In future work,
we will develop an ontology matching system using this hybrid approach. In this research, we
have also identified some challenges of ontology mapping, namely handling ontology
matching errors, user involvement and reusing previous match operations, and have described
ways of handling these challenges. In future work, we will address these challenges and
compare our system with other systems in terms of performance and quality of match.

Acknowledgement

Autonomous Systems, Digital Productivity and Service Flagship, and the Tasmanian node of the
Australian Centre for Broadband Innovation are assisted by a grant from the Tasmanian
Government which is administered by the Tasmanian Department of Economic Development,
Tourism and the Arts.


References
i. Anam, S., Kim, Y., Kang, B. and Liu, Q. (2014a). Evaluation of Terminological Schema
Matching and Its Implications for Schema Mapping. PRICAI 2014: Trends in Artificial
Intelligence. Springer International Publishing.
ii. Anam, S., Kim, Y. and Liu, Q. (2014b). Incremental Schema Mapping. Knowledge
Management and Acquisition for Smart Systems and Services. Springer International
Publishing.
iii. Anam, S., Kim, Y.S., Kang, B.H. and Liu, Q. (2015a). Linked Data Provenance: State of
the Art and Challenges. the 3rd Australasian Web Conference, AWC 2015. Sydney,
Australia: CRPITT.
iv. Anam, S., Kim, Y.S., Kang, B.H. and Liu, Q. (2015b). Schema Mapping Using Hybrid
Ripple-Down Rules. the Thirty-Eighth Australasian Computer Science Conference,
ACSC 2015. Sydney, Australia: CRPITT.
v. Atencia, M., Euzenat, J., Pirrò, G. and Rousset, M.-C. (2011). Alignment-based trust for
resource finding in semantic P2P networks. The Semantic Web – ISWC 2011. Springer.
vi. Aumueller, D., Do, H.-H., Massmann, S. and Rahm, E. (2005). Schema and ontology
matching with COMA++. Proceedings of the 2005 ACM SIGMOD international
conference on Management of data, ACM, 906-908.
vii. Chai, X., Sayyadian, M., Doan, A., Rosenthal, A. and Seligman, L. (2008). Analyzing
and revising data integration schemas to improve their matchability. Proc. VLDB Endow.,
1, 773-784.
viii. Cheatham, M. and Hitzler, P. (2013). String similarity metrics for ontology alignment.
The Semantic Web – ISWC 2013. Springer.
ix. Cohen, W.W., Ravikumar, P. and Fienberg, S.E. (2003). A Comparison of String
Distance Metrics for Name-Matching Tasks. IJCAI-03 Workshop on Information
Integration, 73-78.
x. Cruz, I.F., Antonelli, F.P. and Stroe, C. (2009). AgreementMaker: efficient matching for
large real-world schemas and ontologies. Proceedings of the VLDB Endowment, 2, 1586-
1589.
xi. Dessloch, S., Hernández, M.A., Wisnesky, R., Radwan, A. and Zhou, J. (2008). Orchid:
Integrating schema mapping and ETL. Data Engineering, 2008. ICDE 2008. IEEE 24th
International Conference on, IEEE, 1307-1316.
xii. Djeddi, W.E. and Khadir, M.T. (2014). XMap++: Results for OAEI 2014.
xiii. Do, H.-H. and Rahm, E. (2002). COMA: a system for flexible combination of schema
matching approaches. Proceedings of the 28th international conference on Very Large
Data Bases, VLDB Endowment, 610-621.


xiv. Doan, A., Madhavan, J., Domingos, P. and Halevy, A. (2002). Learning to map between
ontologies on the semantic web. Proceedings of the 11th international conference on
World Wide Web, ACM, 662-673.
xv. Ehrig, M. and Sure, Y. (2004). Ontology mapping – an integrated approach. The Semantic
Web: Research and Applications. Springer.
xvi. Elmeleegy, H., Ouzzani, M. and Elmagarmid, A. (2008). Usage-based schema matching.
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, IEEE,
20-29.
xvii. Giunchiglia, F. and Shvaiko, P. (2003). Semantic matching. Knowl. Eng. Rev., 18(3),
265-280.
xix. Guha, S., Rastogi, R. and Shim, K. (2000). ROCK: A robust clustering algorithm for
categorical attributes. Information systems, 25, 345-366.
xx. He, B. and Chang, K.C.-C. (2006). Automatic complex schema matching across web
query interfaces: A correlation mining approach. ACM Transactions on Database
Systems (TODS), 31, 346-395.
xxi. Hoshiai, T., Yamane, Y., Nakamura, D. and Tsuda, H. (2004). A semantic category
matching approach to ontology alignment. Proceedings of the 3rd International
Workshop on the Evaluation of Ontology-based Tools.
xxii. Hu, W., Qu, Y. and Cheng, G. (2008). Matching large ontologies: A divide-and-conquer
approach. Data & Knowledge Engineering, 67, 140-160.
xxiii. Ichise, R. (2008). Machine learning approach for ontology mapping using multiple
concept similarity measures. Computer and Information Science, 2008. ICIS '08. Seventh
IEEE/ACIS International Conference on, IEEE, 340-346.
xxiv. Jain, P., Hitzler, P., Sheth, A.P., Verma, K. and Yeh, P.Z. (2010a). Ontology alignment
for linked open data. The Semantic Web – ISWC 2010. Springer.
xxv. Jain, P., Hitzler, P., Yeh, P.Z., Verma, K. and Sheth, A.P. (2010b). Linked Data Is
Merely More Data. AAAI Spring Symposium: Linked Data Meets Artificial Intelligence.
xxvi. Jean-Mary, Y.R., Shironoshita, E.P. and Kabuka, M.R. (2009). Ontology matching with
semantic verification. Web Semantics: Science, Services and Agents on the World Wide
Web, 7, 235-251.
xxvii. Jiménez-Ruiz, E. and Grau, B.C. (2011). LogMap: Logic-based and scalable ontology
matching. The Semantic Web – ISWC 2011. Springer.
xxviii. Jiménez-Ruiz, E., Grau, B.C., Zhou, Y. and Horrocks, I. (2012). Large-scale Interactive
Ontology Matching: Algorithms and Implementation. ECAI, 444-449.


xxix. Jimenez, S., Becerra, C., Gelbukh, A. and Gonzalez, F. (2009a). Generalized Mongue-Elkan
Method for Approximate Text String Comparison. In: Gelbukh, A. (ed.) Computational
Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg.
xxx. Jimenez, S., Becerra, C., Gelbukh, A. and Gonzalez, F. (2009b). Generalized Mongue-Elkan
Method for Approximate Text String Comparison. Computational Linguistics and
Intelligent Text Processing. Springer.
xxxi. Khiat, A. and Benaissa, M. (2014). AOT/AOTL Results for OAEI 2014.
xxxii. Kim, Y.S., Compton, P. and Kang, B.H. (2012). Ripple-down rules with censored
production rules. Knowledge Management and Acquisition for Intelligent Systems.
Springer.
xxxiii. Kirsten, T., Gross, A., Hartung, M. and Rahm, E. (2011). GOMMA: a component-based
infrastructure for managing and analyzing life science ontologies and their evolution. J.
Biomedical Semantics, 2, 6.
xxxiv. Kirsten, T., Thor, A. and Rahm, E. (2007). Instance-based matching of large life science
ontologies. Data Integration in the Life Sciences, Springer, 172-187.
xxxv. Kitamura, Y., Segawa, S., Sasajima, M., Tarumi, S. and Mizoguchi, R. (2008). Deep
semantic mapping between functional taxonomies for interoperable semantic search. The
semantic web. Springer.
xxxvi. Koudas, N., Sarawagi, S. and Srivastava, D. (2006). Record linkage: similarity measures
and algorithms. Proceedings of the ACM SIGMOD international conference on
Management of data. Chicago, IL, USA: ACM.
xxxvii. Lambrix, P. and Tan, H. (2011). SAMBO-A System for Aligning and Merging
Biomedical Ontologies. Web Semantics: Science, Services and Agents on the World Wide
Web, 4.
xxxviii. Li, J., Tang, J., Li, Y. and Luo, Q. (2009). RiMOM: A dynamic multistrategy ontology
alignment framework. Knowledge and Data Engineering, IEEE Transactions on, 21,
1218-1232.
xxxix. Madhavan, J., Bernstein, P.A., Doan, A. and Halevy, A. (2005). Corpus-based schema
matching. Data Engineering, 2005. ICDE 2005. Proceedings. 21st International
Conference on, IEEE, 57-68.
xl. Madhavan, J., Bernstein, P.A. and Rahm, E. (2001a). Generic Schema Matching with
Cupid. Proceedings of the 27th International Conference on Very Large Data Bases.
Morgan Kaufmann Publishers Inc.
xli. Madhavan, J., Bernstein, P.A. and Rahm, E. (2001b). Generic schema matching with
Cupid. VLDB 2001, 49-58.
xlii. Maedche, A. and Staab, S. (2002). Measuring similarity between ontologies. Knowledge
engineering and knowledge management: Ontologies and the semantic web. Springer.


xliii. Mao, M., Peng, Y. and Spring, M. (2010). An adaptive ontology mapping approach with
neural network based constraint satisfaction. Web Semantics: Science, Services and
Agents on the World Wide Web, 8, 14-25.
xliv. Marie, A. and Gal, A. (2008). Boosting schema matchers. On the Move to Meaningful
Internet Systems: OTM 2008. Springer.
xlv. Martinez-Gil, J., Navas-Delgado, I. and Aldana-Montes, J.F. (2012). MaF: An ontology
matching framework. Journal of Universal Computer Science, 18, 194-217.
xlvi. Massmann, S. and Rahm, E. (2008). Evaluating Instance-based Matching of Web
Directories. WebDB.
xlvii. Meilicke, C. (2011). Alignment incoherence in ontology matching. Universitätsbibliothek
Mannheim.
xlviii. Melnik, S., Garcia-Molina, H. and Rahm, E. (2002). Similarity flooding: A versatile
graph matching algorithm and its application to schema matching. Data Engineering,
2002. Proceedings. 18th International Conference on, 2002. IEEE, 117-128.
xlix. Miller, G.A. (1995). WordNet: a lexical database for English. Communications of the
ACM, 38, 39-41.
l. Mitra, P., Wiederhold, G. and Jannink, J. (1999). Semi-automatic integration of
knowledge sources. Proceedings of Fusion.
li. Nandi, A. and Bernstein, P.A. (2009). HAMSTER: using search clicklogs for schema and
taxonomy matching. Proceedings of the VLDB Endowment, 2, 181-192.
lii. Ngo, D., Bellahsene, Z. and Coletta, R. (2011a). A generic approach for combining
linguistic and context profile metrics in ontology matching. On the Move to Meaningful
Internet Systems: OTM 2011. Springer.
liii. Ngo, D.H. and Bellahsene, Z. (2012). YAM++: (not) Yet Another Matcher for Ontology
Matching Task. BDA'2012: 28e journées Bases de Données Avancées.
liv. Ngo, D.H., Bellahsene, Z. and Coletta, R. (2011b). YAM++ – Results for OAEI 2011.
ISWC'11: The 6th International Workshop on Ontology Matching, 228-235.
lv. Noy, N.F., Chugh, A., Liu, W. and Musen, M.A. (2006). A framework for ontology
evolution in collaborative environments. The Semantic Web – ISWC 2006. Springer.
lvi. Noy, N.F. and Klein, M. (2004). Ontology evolution: Not the same as schema evolution.
Knowledge and information systems, 6, 428-440.
lvii. Peukert, E., Eberius, J. and Rahm, E. (2011). AMC – A framework for modelling and
comparing matching systems as matching processes. Data Engineering (ICDE), 2011
IEEE 27th International Conference on, IEEE, 1304-1307.
lviii. Rahm, E. (2011). Towards large-scale schema and ontology matching. Schema matching
and mapping. Springer.

lix. Rahm, E. and Bernstein, P.A. (2001). A survey of approaches to automatic schema
matching. The VLDB Journal, 10, 334-350.
lx. Santos, E., Faria, D., Pesquita, C. and Couto, F. (2013). Ontology alignment repair
through modularization and confidence-based heuristics. arXiv preprint arXiv:1307.5322.
lxi. Schadd, F.C. and Roos, N. (2014). Alignment Evaluation of MaasMatch for the OAEI
2014 Campaign.

lxii. Seddiqui, M.H. and Aono, M. (2009). An efficient and scalable algorithm for segmented
alignment of ontologies of arbitrary size. Web Semantics: Science, Services and Agents
on the World Wide Web, 7, 344-356.
lxiii. Shen, G., Liu, Y., Wang, F., Sun, J., Wang, Z., Huang, Z. and Kang, D. (2014).
OMReasoner: Combination of Multi-matchers for Ontology Matching: Results for OAEI
2014.
lxiv. Shvaiko, P. and Euzenat, J. (2005). A survey of schema-based matching approaches.
Journal on Data Semantics IV. Springer.
lxv. Shvaiko, P. and Euzenat, J. (2008). Ten challenges for ontology matching. On the Move
to Meaningful Internet Systems: OTM 2008. Springer.
lxvi. Shvaiko, P. and Euzenat, J. (2013). Ontology matching: state of the art and future
challenges. Knowledge and Data Engineering, IEEE Transactions on, 25, 158-176.
lxvii. Stoilos, G., Stamou, G. and Kollias, S. (2005). A string metric for ontology alignment.
Proceedings of the 4th international conference on The Semantic Web. Galway, Ireland:
Springer-Verlag.
lxviii. Talukdar, P.P., Ives, Z.G. and Pereira, F. (2010). Automatically incorporating new
sources in keyword search-based data integration. Proceedings of the 2010 ACM SIGMOD
International Conference on Management of Data, ACM, 387-398.
lxix. Thor, A., Kirsten, T. and Rahm, E. (2007). Instance-based matching of hierarchical
ontologies. BTW, 436-448.
lxx. Uschold, M. and Gruninger, M. (2004). Ontologies and semantics for seamless
connectivity. ACM SIGMod Record, 33, 58-64.
lxxi. Vaccari, L., Shvaiko, P. and Marchese, M. (2009). A geo-service semantic integration in
spatial data infrastructures. International Journal of Spatial Data Infrastructures
Research, 4, 24-51.
lxxii. Volz, J., Bizer, C., Gaedke, M. and Kobilarov, G. (2009). Discovering and maintaining
links on the web of data, Springer.


lxxiii. Wache, H., Voegele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H.
and Hübner, S. (2001). Ontology-based integration of information – a survey of existing
approaches. IJCAI-01 Workshop: Ontologies and Information Sharing, 108-117.
lxxiv. Wang, P. (2011). Lily results on SEALS platform for OAEI 2011. Proc. of 6th OM
Workshop, 156-162.
lxxv. Wimalasuriya, D.C. and Dou, D. (2010). Ontology-based information extraction: An
introduction and a survey of current approaches. Journal of Information Science.
