A Flexible Framework To Experiment With Ontology Learning Techniques 2008

Available online at www.sciencedirect.com
Knowledge-Based Systems 21 (2008) 192-199

www.elsevier.com/locate/knosys
A flexible framework to experiment with ontology learning techniques

Ricardo Gacitua *, Pete Sawyer, Paul Rayson
Lancaster University, Computing Department, Infolab21, South Drive, LA1 4WA Lancaster, UK
Available online 23 November 2007
Abstract
Ontology learning refers to extracting conceptual knowledge from several sources and building an ontology from scratch, or enriching or adapting an existing ontology. It uses methods from a diverse spectrum of fields such as natural language processing, artificial intelligence and machine learning. However, a crucial challenge is to evaluate quantitatively the usefulness and accuracy of both individual techniques and combinations of techniques when applied to ontology learning. It is an interesting problem because there are no published comparative studies.
We are developing a flexible framework for ontology learning from text which provides a cyclical process that involves the successive application of various NLP techniques and learning algorithms for concept extraction and ontology modelling. To deal with the above challenge, the framework provides support to evaluate the usefulness and accuracy of different techniques and possible combinations of techniques into specific processes. We show our framework's efficacy as a workbench for testing and evaluating concept identification. Our initial experiment supports our assumption about the usefulness of our approach.
Crown copyright © 2007 Published by Elsevier B.V. All rights reserved.
Keywords: Semantic Web; Ontologies; Ontology learning; NLP methods; Machine learning methods
1. Introduction
The Semantic Web is an evolving extension of the
World-Wide Web, in which content is encoded in a formal
and explicit way, and can be read and used by software
agents [2]. It depends heavily on the proliferation of ontologies. An ontology constitutes a formal conceptualization of a particular domain shared by a group of people. In complex domains, identifying, defining, and conceptualizing a domain manually can be a costly and error-prone task. This problem can be eased by generating an ontology semi-automatically.
Most domain knowledge about domain entities and their properties and relationships is embodied in text collections with varying degrees of explicitness and precision. Ontology learning from text has therefore been among the most important strategies for building an ontology. Machine learning and automated language-processing techniques have been used to extract concepts and relationships from structured and unstructured data, such as databases and text. For instance, Cimiano et al. [7] use statistical analysis to extract terms and produce a taxonomy. Similarly, Reinberger and Spyns [21] use shallow linguistic parsing for concept formation and identify some types of relationships by using prepositions.
* Corresponding author. Tel.: +44 1524 510563.
E-mail addresses: r.gacitua@lancs.ac.uk (R. Gacitua), sawyer@lancs.ac.uk (P. Sawyer), p.rayson@lancs.ac.uk (P. Rayson).
Researchers have realized that the output of the ontology learning process is far from being perfect [14]. One problem is that in most cases it is not obvious how to use, configure and combine techniques from different fields for a specific domain. Although there are a few published results about combinations of techniques, for instance [23], the problem is far from being solved. For example, some researchers use different text processing techniques such as stopwords filtering [5], lemmatization [4] or stemming [13] to generate a set of pre-processed data as input for concept identification. However, there are no comparative studies that show the effectiveness of these linguistic pre-processing techniques. An additional problem for ontology learning is that most frameworks use a predefined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations, or the ability to include new ones. Reinberger et al. [22] point out that: "To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning".
0950-7051/$ - see front matter. Crown copyright © 2007 Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.knosys.2007.11.009
Our motivation is to help to make the ontology learning process controllable. Because of this, it is important to know the contribution of the available techniques and the efficiency of a technique combination. We think that the failure to evaluate the relative efficacy of different NLP techniques is likely to hinder the development of effective learning and knowledge acquisition support for ontology engineering. To address this problem, we propose both a flexible framework and an integrated tool suite to configure and combine techniques applied to ontology learning. The general architecture of our solution integrates an existing linguistic tool (Wmatrix [20]), which provides part-of-speech (POS) and semantic tagging, an ontology workbench for information extraction, and an existing open source ontology editor called Protege [16] (see footnote 1). This work is part of a larger project to build ontologies semi-automatically by processing a collection of domain texts. It involves dealing with four fundamental issues: extracting the relevant domain terminology, discovering concepts, deriving a concept hierarchy, and identifying and labeling ontological relations. Our work involves the innovative adaptation, integration and application of existing NLP and machine learning techniques in order to answer the following research question:
Can shallow analysis of the kind enabled by a range of linguistic and statistical NLP and corpus linguistic techniques identify key domain concepts? Can it do it with sufficient confidence in the correctness and completeness of the result?
The main contributions of our project are:
• Providing ontology engineers with a coordinated and integrated tool for knowledge object extraction and ontology modelling.
• Evaluating the contribution of different NLP and machine learning techniques and their combinations for ontology learning.
• Proposing a guideline to configure and combine techniques applied to ontology learning.
In this paper we present the results achieved so far:
• The definition of a framework which provides support for testing different NLP and machine learning techniques to support the semi-automatic ontology learning process.
• A prototype workbench for knowledge object extraction which provides support for the framework. This workbench integrates a set of NLP and corpus linguistics techniques for experimenting with them.
• Comparative analysis using a set of linguistic and statistical techniques.
Footnote 1: http://protege.stanford.edu/
The remainder of our paper is organized as follows. We
begin by introducing related work. Then, we present the
main parts of the framework by describing and characterizing each of the activities that form the process. Next, we
present experiments using a set of linguistic and statistical
techniques. Finally, we discuss the results of the experiments and present the conclusions.
2. Background
In recent years, a number of frameworks that support ontology learning processes have been reported. They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management, as shown by the following work:
• ASIUM [11] learns verb frames and taxonomic knowledge, based on statistical analysis of syntactic parsing of French texts.
• Text2Onto [6] is a complete re-design and re-engineering of KAON TextToOnto. It combines machine learning approaches with basic linguistic processing such as tokenization, lemmatization and shallow parsing. It is based on the GATE framework [8].
• OntoLearn [24] learns by interpreting compounds through compositional interpretation.
• OntoLT [3] learns concepts through term extraction by statistical methods and the definition of linguistic patterns, as well as mapping to ontological structures.
• DODDLE II [25] learns taxonomic and non-taxonomic relations using co-occurrence analysis, exploiting a machine-readable dictionary (WordNet) and domain-specific text.
• WEBKB [9] combines Bayesian learning and First Order Logic rule learning methods to learn instances and instance extraction rules from World-Wide Web documents.
All the above combine linguistic analysis methods with machine learning algorithms to find potentially interesting concepts and relationships between them. However, only Text2Onto has been designed with a central management component that allows various algorithms for ontology learning to be plugged in. Since it is based on the GATE framework, it is flexible with respect to the set of linguistic algorithms used, because GATE applications can be configured by replacing existing algorithms or adding new ones. Similarly, our framework uses a plug-in based structure so it can include new algorithms. In contrast, however, it can include techniques from existing linguistic and ontology tools by using Java APIs (Application Programming Interfaces) directly where possible. In addition, Text2Onto defines user interaction as a core aspect, whereas our framework provides support to process algorithms in an unsupervised mode as well. In the next section we describe our Ontology Acquisition Framework before explaining in the subsequent section how our framework supports evaluation.
Fig. 1. OntoLancs framework.
3. The ontology framework: OntoLancs
Our research project principally addresses the issue of quantitatively evaluating the usefulness or accuracy of techniques and combinations of techniques applied to ontology learning. We have integrated a first set of natural language processing, corpus linguistics and machine learning techniques for experimentation. They are: (a) POS grouping, (b) stopwords filtering, (c) frequency filtering, (d) POS filtering, (e) lemmatization, (f) stemming, (g) frequency profiling, (h) concordance, (i) lexico-syntactic patterns, (j) co-occurrence by distance, and (k) collocation analysis. Our framework facilitates experiments with different NLP and machine learning techniques in order to assess their efficiency and effectiveness, including the performance of various combinations of techniques. All such functions are being built into a prototype workbench to evaluate and refine existing techniques using a range of domain document corpora.
In this paper several existing knowledge acquisition techniques are selected for performing the concept acquisition process in order to evaluate the performance of the selected techniques.
3.1. Phases of the ontology framework
This section describes the ontology framework.
The workflow of our ontology framework proceeds through the stages of (i) semi-automatic abstraction and classification of domain concepts, (ii) encoding them in the OWL ontology language [10], and (iii) editing them using an enhanced version of an existing editor, Protege. This set of tools provides ontology engineers with a coordinated and integrated workbench for extracting terms and modelling ontologies.
There are four main phases in the process, as shown in Fig. 1. Below we provide detailed descriptions of these phases.
Phase 1: part-of-speech (POS) and semantic annotation of corpus. Domain texts are tagged morpho-syntactically and semantically using Wmatrix. The system assigns a semantic category to each word employing a comprehensive semantic category scheme called the UCREL Semantic Analysis System (USAS) [19] (http://www.comp.lancs.ac.uk/ucrel/usas/). USAS is a framework for undertaking the automatic semantic analysis of text. The semantic tagset used by USAS has a hierarchical semantic taxonomy containing 21 major discourse fields and 232 fine-grained semantic fields. In addition, USAS combines several resources including the CLAWS POS tagger [12], which is used to assign POS tags to words.
Phase 2: extraction of concepts. The domain terminology is extracted from the tagged domain corpus by identifying a list of candidate domain terms (Candidate Domain Term Forest). In this phase the system provides a set of NLP and machine learning techniques which an ontology engineer can combine for identifying candidate concepts. Where a domain ontology exists, it can be used as a reference ontology to calculate the precision and recall of the techniques used when applied to a set of domain documents. The DARPA Agent Markup Language (DAML) Library (http://www.daml.org/ontologies/) provides a rich set of domain ontologies and thus can be used as the basis for our experiments. We initially plan to apply the framework and workbench to a set of domain documents for which a domain ontology exists.
Fig. 2. OntoLancs workbench combining NLP techniques.
Phase 3: domain ontology construction. In this phase, a domain lexicon is built. Definitions for each concept are extracted automatically from several on-line sources, such as WordNet [15] and online dictionaries (Webster, http://www.m-w.com/, and Cambridge Dictionary Online, http://dictionary.cambridge.org/). Concepts extracted during the previous phase are then added to a bootstrap ontology. We assume that a hierarchical classification of terms, rather than a fully defined ontology, will be sufficient for the first stage of our project.
Phase 4: domain ontology editing. In the final phase, the bootstrap ontology is turned into light OWL language, and then processed using an ontology editor to manage the versioning of the domain ontology and to modify and improve it. For the editor, we will use Protege. Phase 4 is currently future work, since our current concern is to identify the best set of techniques to integrate in phase 2.
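As a rough illustration of how a hierarchical classification of terms (Phase 3) might be encoded in OWL (Phase 4), the following Python sketch serialises a parent/child mapping as `owl:Class` declarations with `rdfs:subClassOf` links in RDF/XML. The function name `hierarchy_to_owl` and the example class names are hypothetical; the paper does not describe its encoding step at this level of detail, and the sketch omits the ontology header and all properties.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Standard W3C namespace URIs for RDF, RDF Schema and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def hierarchy_to_owl(parents: Dict[str, Optional[str]]) -> str:
    """Serialise a term hierarchy (term -> parent or None) as RDF/XML.

    Hypothetical helper: every term becomes an owl:Class, and every
    non-root term gets one rdfs:subClassOf link to its parent.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    ET.register_namespace("owl", OWL)
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, parent in parents.items():
        cls = ET.SubElement(
            root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": f"#{name}"}
        )
        if parent is not None:
            ET.SubElement(
                cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": f"#{parent}"}
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical bootstrap hierarchy for the football domain.
bootstrap = {"Player": None, "Goalkeeper": "Player", "Referee": None}
print(hierarchy_to_owl(bootstrap))
```

A file produced this way could then be opened in an ontology editor for manual refinement, which is the role the paper assigns to Protege.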
3.2. An integrated ontology workbench

This section provides a brief description of the implementation of the first phase in the prototype workbench. Our framework is designed to include a set of NLP and machine learning techniques, and to enable its enhancement with new techniques in future. In other words, we are concerned with providing a core set of linguistic utilities within an open architecture that can accept new plug-in techniques as they become available. Initially, the techniques are organized and executed as a pipeline (see Fig. 2): the output of one technique forms the input for the next. When a technique is selected, a new tab is created in the main panel. An optional linguistic technique (grouping by POS) is included at the beginning. In future versions of our framework a graphical workflow engine will provide support for the composition of complex ensemble techniques.
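The pipeline organisation described above can be sketched in a few lines of Python. This is only an illustration of the idea that each technique consumes the previous technique's output; the names `Technique`, `run_pipeline`, `lowercase` and `drop_short` are hypothetical and are not part of the OntoLancs workbench.

```python
from typing import Callable, Iterable, List

# A technique maps a list of candidate terms to a new list of candidate terms.
Technique = Callable[[List[str]], List[str]]


def lowercase(terms: List[str]) -> List[str]:
    """Hypothetical normalisation step: lower-case every term."""
    return [t.lower() for t in terms]


def drop_short(terms: List[str]) -> List[str]:
    """Hypothetical filter: discard terms shorter than three characters."""
    return [t for t in terms if len(t) >= 3]


def run_pipeline(terms: List[str], techniques: Iterable[Technique]) -> List[str]:
    """Apply the techniques in order; each output is the next input."""
    for technique in techniques:
        terms = technique(terms)
    return terms


candidates = ["Referee", "of", "Goal", "Kick"]
print(run_pipeline(candidates, [lowercase, drop_short]))
# ['referee', 'goal', 'kick']
```

Because every step shares the same signature, a new technique can be added to the list without changing the others, which is the property a plug-in architecture relies on.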
The output from any technique is represented in XML format. In the first phase we use Wmatrix to get POS tags, lemmas and semantic tags for each word. The integration between Wmatrix and the ontology workbench provides a platform for dealing with the scalability problem. Running on a powerful server, Wmatrix is capable of processing large volumes of corpora. Furthermore, the workbench has the BNC corpus pre-loaded: a balanced synchronic text corpus containing 100 million words with morphosyntactic annotation (http://www.natcorp.ox.ac.uk/). In order to identify a preliminary set of concepts, the workbench provides functions to analyze the corpus and filter the candidates using POS tags and absolute frequency as preliminary filters. Fig. 2 shows the GUI of the workbench.
Extraction of concepts: at present, a first set of linguistic techniques has been implemented. It comprises:
(i) POS grouping and filtering by POS group. This provides an option to select a set of POS tag categories and filter the list of terms.
(ii) Stopwords filtering. This eliminates stop words when the texts are analyzed. However, stop words are not necessarily eliminated when extracting phrases (for example: "players of football"). As a result, n-grams may contain stop words.
(iii) Frequency filtering. This provides an option to filter the list of terms by frequency ranges.
(iv) Lemmatization. This is a process wherein the inflectional and variant forms of a word are reduced to their lemma, that is, their base form.
(v) Stemming. This conflates terms to a common stem using Porter's algorithm [17].
(vi) Frequency profiling [18]. This technique can be used to discover key words in a corpus which differentiate the domain corpus from a normative corpus, such as the British National Corpus.
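To make the first three filters more concrete, here is a minimal Python sketch of stop-word filtering, grouping by POS at three levels of granularity, and filtering by a frequency range. It assumes tokens arrive as (word, tag) pairs; the tag-to-group mapping, the stop-word list and the function names are hypothetical, and the tags merely follow the CLAWS-style examples (VV0, VVI) used in the experiments section.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mapping from specific CLAWS-style tags to generic groups.
GENERIC_POS: Dict[str, str] = {
    "VV0": "verb", "VVI": "verb", "NN1": "noun", "NN2": "noun",
}

# Hypothetical stop-word list; a real list would be much longer.
STOP_WORDS: Set[str] = {"the", "of", "a"}


def group_by_pos(tokens: Iterable[Tuple[str, str]], mode: str) -> Counter:
    """Count (word, category) pairs, skipping stop words.

    mode is "specific" (keep the tag), "generic" (map the tag to a
    coarse group) or "any" (ignore the tag altogether).
    """
    if mode not in ("specific", "generic", "any"):
        raise ValueError(f"unknown grouping mode: {mode}")
    counts: Counter = Counter()
    for word, tag in tokens:
        word = word.lower()
        if word in STOP_WORDS:
            continue
        if mode == "specific":
            category = tag
        elif mode == "generic":
            category = GENERIC_POS.get(tag, "other")
        else:
            category = "any"
        counts[(word, category)] += 1
    return counts


def filter_by_frequency(
    counts: Counter, minimum: int, maximum: int
) -> Dict[Tuple[str, str], int]:
    """Keep only the terms whose frequency lies in [minimum, maximum]."""
    return {term: n for term, n in counts.items() if minimum <= n <= maximum}


tokens = [("kick", "VV0"), ("kick", "VVI"), ("Kick", "NN1"),
          ("the", "AT"), ("referee", "NN1")]
print(group_by_pos(tokens, "generic"))
print(filter_by_frequency(group_by_pos(tokens, "any"), 2, 10))
```

The coarser the grouping, the more occurrences are pooled under a single candidate term, which in turn changes which terms survive the frequency filter.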
4. Experiments
In this section we describe the mechanism our framework provides for evaluating the efficacy of different NLP techniques for the crucial second phase of the ontology learning process described in Section 3.1.
The experiments were designed to extract a set of candidate concepts from a domain corpus using a combination of NLP and machine learning techniques, and to check the correspondence between the candidate concepts and the classes of a DAML reference ontology. In order to assess the efficiency of the techniques we used the precision and recall values as a comparative measure. Note that although we do not believe that automatic ontology creation is possible or desirable, our experiments to select appropriate techniques are conducted in an unsupervised manner. In addition, it is important to remember here that we are not trying to validate the reference ontology itself. Our main aim is to validate NLP and machine learning techniques applied to concept identification. Clearly, the extent to which the chosen reference ontology reflects the consensus about the domain is a threat to validity. To try to minimise this, we selected reference ontologies from the DAML Library. These ontologies are not guaranteed to be validated, but their existence in the public domain provides an initial, weak, first level of confidence in their validity. A second threat to validity results from the corpus of domain documents. This can be mitigated by selecting domains for which much public-domain documentation is readily available.
For the experiment described here, we built a football corpus which comprises 102,543 words. Football, that is soccer, is an attractive domain because at least one DAML football ontology already exists and there is a wealth of domain documentation, ranging from rules published by the sport's governing bodies to many thousands of words published daily in match reports across the world. All documents were gathered by running the Google query "Football Game". Then we selected those written by FIFA (Fédération Internationale de Football Association) and mainly published on football web sites.
For our reference ontology, we selected a football ontology (see footnote 7) which has 199 classes. This ontology is used to annotate videos in order to produce personalized summaries of soccer matches.
Although we cannot ensure the conceptual correctness
of the DAML reference ontology and its correspondence
with the application context of our domain corpus, we
assumed as a preliminary premise that the DAML reference ontology is valid in order to evaluate our concept
extraction process.
Once we had assembled our football document corpus, we applied the following combination of linguistic techniques to a set of candidate terms returned by Wmatrix: Group and Filter by POS, lemmatization/stemming, and frequency profiling. Before applying the techniques in combination, we excluded a pre-defined list of stop words which are not useful for identifying concepts.
Note that these are presented here simply to illustrate how our evaluation mechanism works. We could have used any of the framework's built-in NLP techniques.
First, we grouped the initial list of candidate terms using different categories. These categories represent the results of applying different NLP techniques, as described below: (a) Filter by Group by POS, which provides an option for selecting a set of POS tag categories and filtering the list of terms. In this case, we used three sorts of word grouping: (i) Using specific POS tags. For instance, Kick_VV0 (base form of lexical verb) is considered different from Kick_VVI (infinitive). (ii) Using a generic POS tag. For instance, Kick_VV0 and Kick_VVI are turned into Kick_verb. (iii) POS-independent. In this case, we used only a word with a generic category: any. For instance, Kick_noun and Kick_verb are turned into Kick_any. Second, we applied a morphological method (lemmatization or stemming) to the set of candidate terms in order to use a canonical form for each word. Third, we applied frequency profiling techniques and filtered the set of terms by using the log-likelihood measure (LL), which provides a measure of statistical significance. We used two filters: the 95th percentile (5% level; p < 0.05; critical value = 3.84; an LL greater than 3.84 indicates 95% confidence in the result's reliability) and the 99th percentile (1% level; p < 0.01; critical value = 6.63; an LL greater than 6.63 indicates 99% confidence in the result's reliability). Finally, we checked the lexical correspondence between the candidate terms and the classes in the DAML reference ontology.
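The frequency profiling step can be illustrated with a short Python sketch of the log-likelihood statistic as it is commonly defined for corpus comparison: with a word occurring a times in a domain corpus of c words and b times in a reference corpus of d words, the expected values are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and LL = 2(a ln(a/E1) + b ln(b/E2)). The function name and the word frequencies in the example are hypothetical; only the corpus sizes (102,543 words and roughly 100 million words) and the critical values come from the text.

```python
import math

# Critical values quoted in the text (chi-squared, one degree of freedom).
CRITICAL_95 = 3.84   # p < 0.05
CRITICAL_99 = 6.63   # p < 0.01


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood of a word's frequency difference between two corpora.

    a: frequency in the domain corpus      c: size of the domain corpus
    b: frequency in the reference corpus   d: size of the reference corpus
    """
    expected_domain = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    # A zero frequency contributes nothing (the limit of x*ln(x) is 0).
    if a > 0:
        total += a * math.log(a / expected_domain)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2.0 * total


# Hypothetical frequencies for one candidate term.
ll = log_likelihood(a=500, b=1000, c=102_543, d=100_000_000)
print(ll > CRITICAL_95, ll > CRITICAL_99)
```

Note that the statistic alone does not say whether a term is over-represented or under-represented in the domain corpus; a sketch intended for real use would also compare the relative frequencies a/c and b/d.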
In order to evaluate quantitatively the results of this process we used the precision and recall values. Applied in this context, the metrics are defined as:
Precision: the number of classes of the reference ontology which were matched by a concept returned by applying the selected NLP techniques to the document corpus, divided by the number of candidate terms.
Recall: the number of classes of the reference ontology which were matched by a concept returned by applying the selected NLP techniques to the document corpus, divided by the number of ontology classes.
Footnote 7: http://www.lgi2p.ema.fr/~ranwezs/ontologies/soccerV2.0.daml
We defined a set of combinations of NLP and machine learning techniques, grouped by the use of a morphological technique, and then obtained precision and recall values for each combination (see Fig. 3).
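The precision and recall definitions given above can be written down directly. The Python sketch below assumes that "lexical correspondence" means a case-insensitive exact match between a candidate term and a class name; that matching rule, the function name and the example data are assumptions made for illustration, not details taken from the paper.

```python
from typing import Iterable, Tuple


def precision_recall(
    candidates: Iterable[str], ontology_classes: Iterable[str]
) -> Tuple[float, float]:
    """Return (precision, recall) of candidate terms against ontology classes.

    Matching is a case-insensitive exact string comparison (an assumption).
    """
    candidate_set = {term.lower() for term in candidates}
    class_set = {name.lower() for name in ontology_classes}
    matched = class_set & candidate_set
    precision = len(matched) / len(candidate_set) if candidate_set else 0.0
    recall = len(matched) / len(class_set) if class_set else 0.0
    return precision, recall


# Hypothetical data: two of four candidates match two of four classes.
terms = ["referee", "goal", "following", "greater"]
classes = ["Referee", "Goal", "Attribute", "Player"]
print(precision_recall(terms, classes))  # (0.5, 0.5)
```

With a long unsupervised candidate list and a fixed set of ontology classes, the denominator of precision is much larger than that of recall, so low precision alongside moderate recall is what this definition would lead one to expect.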
The results of the first evaluation, after applying grouping by POS, stemming or lemmatization, and frequency profiling techniques, showed low values of recall and precision. This is a consequence of the fact that we used an unsupervised method and applied a limited number of techniques for identifying domain concepts.
In the above experiments, although we applied only one NLP and one machine learning technique to the set of candidate terms, we collected a reasonable number of classes matched with the ontology: all combinations that were independent of the morphosyntactic technique had a recall above 42% (see Table 1). Applying the stemming technique before applying the frequency profiling technique to the set of candidate terms produced the lowest values of recall: all were above 32% and below 34% (see Table 2). In the case of precision, the results were lower than those obtained independently of the morphosyntactic technique. In contrast, applying lemmatization before applying the frequency profiling technique produced the best results. In particular, the set of candidate terms filtered by using 95% confidence produced values of recall above 47% (see Table 3). In the case of precision, the results were higher than in the other cases (3.43% was the highest value).
Table 1
Performance using different techniques: morphosyntactic technique independent
Combination   Recall (%)   Precision (%)
A1            45.45        2.36
A2            44.71        2.83
A3            44.98        2.39
A4            43.74        2.88
A5            44.02        2.42
A6            42.79        2.97

Table 2
Performance using different techniques: stemming
Combination   Recall (%)   Precision (%)
S1            33.33        2.25
S2            32.52        2.65
S3            33.33        2.25
S4            32.52        2.65
S5            33.33        2.34
S6            32.25        2.77

Table 3
Performance using different techniques: lemmatization
Combination   Recall (%)   Precision (%)
L1            47.62        2.98
L2            45.45        3.43
L3            47.62        2.98
L4            45.45        3.43
L5            47.14        3.16
L6            45.45        3.43

Fig. 3. Technique combination.

From the experiments, we can conclude that the lemmatization technique produces better precision and recall than the stemming technique for the domain concept acquisition process. Stemming just finds any base form, which does not even need to be a word in the language. Because of this, most of the stems generated by Porter's algorithm could not be recognized as English words. For instance, in the soccer corpus the term "referee" was reduced to "refere". In contrast, lemmatization finds the actual root of a word, which comes from a morphological analysis. Our results are consistent with other studies. For instance, Alkula [1] suggested that lemmatization may be a better approach than stemming.
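The "referee" example can be reproduced with a small Python sketch. It implements only one rule of Porter's algorithm, step 5a (removal of a final "e" depending on the stem's measure m), which appears to be the rule that turns "referee" into "refere"; it is not a full stemmer, and the helper names are hypothetical.

```python
VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: 'y' is a vowel only when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's measure m: the number of vowel-to-consonant transitions."""
    m = 0
    previous_was_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and previous_was_vowel:
            m += 1
        previous_was_vowel = not consonant
    return m


def ends_cvc(stem: str) -> bool:
    """True if the stem ends consonant-vowel-consonant, last not w, x or y."""
    if len(stem) < 3:
        return False
    i = len(stem) - 1
    return (
        is_consonant(stem, i - 2)
        and not is_consonant(stem, i - 1)
        and is_consonant(stem, i)
        and stem[i] not in "wxy"
    )


def step_5a(word: str) -> str:
    """Step 5a only: drop a final 'e' if m > 1, or if m == 1 and not CVC."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


print(step_5a("referee"))  # refere
print(step_5a("rate"))     # rate
```

The rule is purely orthographic: it never checks whether the result is an English word, which is why the output may fail to match a class label in a reference ontology where a lemma would succeed.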
The precision and recall values obtained in the experiment are affected by the following factors:
(1) We used an unsupervised method. The preliminary list of concepts tagged by POS contains several words that carry general concepts and thus should not be considered domain concepts, for instance "following", "greater", etc. These kinds of terms can be filtered by a human ontology engineer, and indeed, our overall aim is to develop our framework so that it provides effective support rather than full automation. Nevertheless, for the evaluation of the efficacy of individual NLP techniques or combinations of them, objective results can only be acquired by applying them in an unsupervised way.
(2) The reference domain ontology contains artificial classes which cannot be derived from a domain corpus automatically. Most ontologies contain artificial classes defined for grouping classes generally. For instance, the soccer ontology which was used as a reference to appraise the knowledge extraction process contains classes such as "Attribute", "Boolean-type", "False-value", etc. Such terms would not be expected to figure prominently in the domain corpus, since they are an artifact of the ontology's taxonomic structuring mechanism rather than of the football domain.
In spite of factor (2), the results provided a reasonable indication of the relative efficacy of the selected NLP and machine learning techniques, and offer useful insights into, for example, the usefulness of filtering on different parts of speech. In addition, we can confirm that applying lemmatization to a set of candidate terms produces better results than applying stemming, for ontology learning. We expect that values of recall and precision will become higher when new techniques to identify multiword terms are included and human supervision to filter general concepts is provided.
5. Conclusions and further work
In this paper, we have described an ongoing project which proposes a flexible framework for the ontology learning process. This framework is designed as a cyclical process to experiment with different techniques and combinations of techniques. It provides support to determine which techniques or combinations provide optimal performance for the ontology learning process. An ontology engineer can decide which techniques or combinations will be used to extract concepts and turn them into an ontology. In future versions of our framework a graphical workflow engine will provide support for the composition of complex ensemble techniques.
Our research project addresses an important challenge for ontology research, i.e., how to validate innovative natural language processing approaches for the purpose of capturing knowledge objects which are contained in domain-specific texts. In a first stage, it involves dealing with three fundamental issues: extracting the relevant domain terminology, discovering concepts, and deriving a concept hierarchy. Additional future work includes using semantic filters. Our initial experiment supports our assumption about the usefulness of our approach: evaluating the effectiveness of the techniques for ontology learning. The experiments suggest that the lemmatization technique may be a better approach than stemming.
The preliminary results reinforce our belief that the availability of linguistic tools integrated into a practical ontology engineering process can potentially aid the rapid development of domain ontologies. Our ontology engineering environment, OntoLancs, is unique in providing not only a framework for integrating linguistic techniques, but also the possibility of an experimental platform for identifying the most effective techniques or combinations.
References
[1] R. Alkula, From plain character strings to meaningful words: producing better full text databases for inflectional and compounding languages with morphological analysis software, Inf. Retr. 4 (3-4) (2001) 195-208.
[2] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web: a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities, Sci. Am. 284 (5) (2001) 34.
[3] P. Buitelaar, M. Sintek, OntoLT version 1.0: middleware for ontology extraction from text, in: Proc. Demo Session at the International Semantic Web Conference, 2004.
[4] P. Buitelaar, S. Ramaka, Unsupervised ontology-based semantic tagging for knowledge markup, in: S.B. Wray Buntine, A. Hotho (Eds.), Proc. Workshop on Learning in Web Search co-located with the International Conference on Machine Learning, Bonn, 2005.
[5] S. Bloehdorn, P. Cimiano, A. Hotho, Learning ontologies to improve text clustering and classification, in: M. Spiliopoulou, R. Kruse, A. Nurnberger, C. Borgelt, W. Gaul (Eds.), From Data and Information Analysis to Knowledge Engineering: Proc. GfKl 2005, vol. 30, Springer, Magdeburg, 2006.
[6] P. Cimiano, J. Volker, Text2onto: a framework for ontology learning and data-driven change discovery, in: Proc. NLDB 2005, Lecture Notes in Computer Science, vol. 3513, Springer, Alicante, 2005, pp. 227-238.
[7] P. Cimiano, L. Schmidt-Thieme, A. Pivk, S. Staab, Learning taxonomic relations from heterogeneous evidence, in: P. Buitelaar, P. Cimiano, B. Magnini (Eds.), Ontology Learning from Text: Methods, Applications and Evaluation, vol. 123, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005, pp. 59-73.
[8] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, GATE: a framework and graphical development environment for robust NLP tools and applications, in: Proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
[9] M. Craven, D. DiPasquo, D. Freitag, A.K. McCallum, T.M. Mitchell, K. Nigam, S. Slattery, Learning to construct knowledge bases from the World Wide Web, Artif. Intell. 118 (1/2) (2000) 69-113.
[10] M. Dean, G. Schreiber, OWL Web Ontology Language Reference, W3C (2004).
[11] D. Faure, T. Poibeau, First experiences of using semantic knowledge learned by ASIUM for information extraction task using INTEX, in: Proc. OL2000 Workshop on Ontology Learning, Berlin, 2000.
[12] R. Garside, The CLAWS word-tagging system, in: The Computational Analysis of English: A Corpus-based Approach, Longman, London, 1987.
[13] J.-U. Kietz, A. Maedche, R. Volz, A method for semi-automatic ontology acquisition from a corporate intranet, in: Proc. of Workshop Ontologies and Text, co-located with EKAW2000, Juan-les-Pins, 2000.
[14] A. Maedche, S. Staab, Mining ontologies from text, in: Proc. 12th European Workshop on Knowledge Acquisition, Modeling and Management, Lecture Notes in Computer Science, vol. 1937, Springer-Verlag, London, pp. 189-202.
[15] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (11) (1995) 39-41.
[16] N. Noy, R.W. Fergerson, M.A. Musen, The knowledge model of protege-2000: combining interoperability and flexibility, in: Proc. EKAW2000, Juan-les-Pins, 2000.
[17] M.F. Porter, An algorithm for suffix stripping, Program 14 (3) (1980) 130-137.
[18] P. Rayson, R. Garside, Comparing corpora using frequency profiling, in: Proc. of the Workshop on Comparing Corpora, Association for Computational Linguistics, Morristown, NJ, 2000, pp. 1-6.
[19] P. Rayson, D. Archer, S.L. Piao, T. McEnery, The UCREL semantic analysis system, in: Proc. of the Workshop on Beyond Named Entity Recognition: Semantic Labelling for NLP Tasks, Lisbon, 2004, pp. 7-12.
[20] P. Rayson, Matrix: a statistical method and software tool for linguistic analysis through corpus comparison, Ph.D. Thesis, Computing Department, Lancaster University, UK, 2003.
[21] M.L. Reinberger, P. Spyns, Discovering knowledge in texts for the learning of DOGMA-inspired ontologies, in: Proc. ECAI04 Workshop Ontology Learning and Population, Valencia, 2004, pp. 19-24.
[22] M.-L. Reinberger, P. Spyns, J. Pretorius, W. Daelemans, Automatic initiation of an ontology, in: Proc. ODBASE 2004, Lecture Notes in Computer Science, Springer-Verlag, 2004, pp. 600-617.
[23] M. Sabou, C. Wroe, C.A. Goble, H. Stuckenschmidt, Learning domain ontologies for semantic web service descriptions, J. Web Sem. 3 (4) (2005) 340-365.
[24] P. Velardi, R. Navigli, A. Cucchiarelli, F. Neri, Evaluation of OntoLearn, a methodology for automatic learning of domain ontologies, in: P. Buitelaar, P. Cimiano, B. Magnini (Eds.), Ontology Learning from Text: Methods, Applications and Evaluation, vol. 123, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005.
[25] T. Yamaguchi, Acquiring conceptual relationships from domain-specific texts, in: Alexander Maedche, Steffen Staab, Claire Nedellec, Eduard H. Hovy (Eds.), Proc. IJCAI2001 Workshop on Ontology Learning, vol. 38, Seattle, 2001.
