Query Based Info Retrieval

A SURVEY ON QUERY ENRICHMENT BASED ON ONTOLOGY
Submitted By,
C.R.Sangeetha
K.Anitha Priya
R.Lavanya Sree.
Contact No.: 9787049144.

Abstract.
Retrieval of correct and precise information at the right time is essential in knowledge-intensive tasks requiring quick decision-making. In this paper, we propose a method for utilizing ontologies to enhance the quality of information retrieval (IR) by query enrichment. We explain how a retrieval system can be tuned by adapting ontologies to provide both an in-depth understanding of the user's needs and an easy integration with standard vector-space retrieval systems. The ontology concepts are adapted to the domain terminology by computing a feature vector for each concept. The feature vector is used to enrich a provided query. The ontology and the whole retrieval system are under development as part of a Semantic Web standardization project for the Norwegian oil and gas industry.

Introduction
Many business procedures and much business knowledge are written in natural language and stored in huge information repositories. The procedures are meant to support and guide employees in daily operations. The sizes of these information repositories are constantly increasing, confronting companies with the problem of efficient and effective information retrieval and reuse. Retrieval of precise information at the right time is crucial in knowledge-intensive tasks (e.g., monitoring the performance of subsea equipment in oil and gas production). However, finding information that is both relevant and of good quality in this sea of information is not a trivial task.

An increasing number of recent information retrieval systems make use of ontologies (more details in section 6) to help users clarify their information needs and come up with semantic representations of documents. A particular concern with these semantic approaches is integration with traditional commercial search technologies. In this paper, by contrast, we discuss how utilizing ontologies in the query process can enhance typical IR systems. In particular, we use text-mining techniques to tailor the ontology concepts to domain terminology, i.e. the terms used in documents, which are not necessarily aligned to standard terminology. Later, these tailored concepts are used to enrich the query to improve the retrieval quality.

Word Sense Disambiguation
Word sense disambiguation is known as one of the most complex tasks in the area of artificial intelligence. We do not even attempt here a survey of the field, but we refer the interested reader to the Senseval home page (http://www.senseval.org/) for a collection of state-of-the-art sense disambiguation methods and the results of public competitions in this area. During the most recent Senseval evaluation, the best system in the English all-words task (Mihalcea and Moldovan, 2001) reached 69% precision and recall, a performance that Gonzalo et al. (1998) claim to be well below the threshold that produces improvements in a text retrieval task. However, for a query expansion task it is not necessary to pursue high recall, but rather high precision. As we show in sections 3 and 4, even expanding only monosemous words in a query may produce a significant improvement over the unexpanded query.

Therefore we developed an algorithm that may be tuned to produce high precision, possibly at the price of low recall. The algorithm belongs to the class of structural pattern recognition methods (Pavlidis, 1977). Structural pattern recognition is particularly useful when instances have an inherent, identifiable organization, which is not captured by feature vectors. In our work we use a graph representation to describe instances (word senses).

In short, the algorithm is as follows:

Creation of semantic networks
For every w_k ∈ Q (the words of the query) and every synset S_j^k of w_k (where S_j^k is the j-th sense of w_k in WordNet) we create a semantic net. Semantic nets are automatically built using the following semantic relations: hyperonymy (car is-a vehicle, denoted with →@), hyponymy (its inverse, →~), meronymy (room has-a wall, →#), holonymy (its inverse, →%), pertainymy (dental pertains-to tooth, →\), attribute (dry value-of wetness, →=), similarity (beautiful similar-to pretty, →&), gloss (→gloss), topic (→topic), and domain (→dl). Every relation is directly extracted from WordNet, except for gloss, topic and domain. The topic and gloss relations are obtained by parsing, with an NL processor, the SemCor sentences that include a given synset S_j^k and the WordNet concept definitions (called glosses), respectively. SemCor is an annotated corpus where each word in a sentence is assigned a sense selected from the WordNet sense inventory for that word; an example is the following:

Movement#7 itself was#7 the chief#1 and often#1 the only#1 attraction#4 of the primitive#1 movies#1 of the nineties#1.

The topic relations extracted from SemCor identify semantic co-occurrences between two related nodes of the semantic network (e.g. chief#1 →topic attraction#4).
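To make the sense-tag notation concrete, the following minimal Python sketch reads the example sentence quoted above into (word, sense number) pairs. The function names are hypothetical, and `cooccurrence_pairs` is a deliberate simplification: it lists every pair of tagged senses in a sentence as a candidate co-occurrence, whereas the text obtains topic relations by parsing with an NL processor, so the real relations would be a filtered subset of these candidates.

```python
import re
from itertools import combinations

# Sense-tagged sentence quoted in the text ("word#sense" tokens).
SENTENCE = ("Movement#7 itself was#7 the chief#1 and often#1 the only#1 "
            "attraction#4 of the primitive#1 movies#1 of the nineties#1.")

# A tagged token is a run of letters, a '#', and a sense number.
TAGGED = re.compile(r"([A-Za-z]+)#(\d+)")

def tagged_senses(sentence):
    """Return (lowercased word, sense number) for every tagged token."""
    return [(word.lower(), int(num)) for word, num in TAGGED.findall(sentence)]

def cooccurrence_pairs(senses):
    """Hypothetical helper (assumption, not the authors' method):
    every unordered pair of tagged senses in one sentence, as a
    candidate for a topic relation."""
    return list(combinations(senses, 2))

senses = tagged_senses(SENTENCE)
pairs = cooccurrence_pairs(senses)
print(senses)
print((("chief", 1), ("attraction", 4)) in pairs)
```

Untagged words such as "itself" and "the" are skipped, so only sense-annotated tokens can take part in a candidate relation.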
As far as the gloss relation is concerned, it is worth noting that words in glosses do not have sense tags in WordNet; therefore we use an algorithm for gloss disambiguation that is a variation of the WSD algorithm described in this section. For example, for sense #1 of bus, “a vehicle carrying many passengers; …”, the following relations are created:

Finally, the domain relation is extracted from the set of domain labels (e.g. tourism, chemistry, economy) assigned to WordNet synsets by a semiautomatic methodology described in (Magnini and Cavaglia, 2000). To reduce the dimension of a semantic network (SN), we consider only concepts at a distance not greater than 3 relations from S_j^k (the SN center). The dimension of the SN has been experimentally tuned.

Figure 1 is an example of SN generated for sense #1 of bus.
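The distance limit can be illustrated with a small breadth-first traversal. The sketch below is only an illustration under stated assumptions: the relation graph is a hand-made toy (its nodes, sense numbers and edges are invented, not taken from WordNet), the function name `build_semantic_net` is hypothetical, and only the relation labels and the limit of 3 relations come from the text.

```python
from collections import deque

# Toy relation graph: node -> [(relation label, target node)].
# Illustrative only; edges and sense numbers are NOT real WordNet data.
RELATIONS = {
    "bus#1": [("@", "public_transport#1"),
              ("#", "roof#2"),
              ("gloss", "passenger#1")],
    "public_transport#1": [("@", "transport#1")],
    "transport#1": [("@", "instrumentality#3")],
    "instrumentality#3": [("@", "artifact#1")],
    "passenger#1": [("@", "traveler#1")],
}

def build_semantic_net(center, relations, max_distance=3):
    """Collect nodes and edges within max_distance relations of center."""
    distance = {center: 0}   # shortest number of relations from the center
    edges = []               # (source, relation label, target) triples
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # boundary node: keep it, but do not expand further
        for label, target in relations.get(node, []):
            edges.append((node, label, target))
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return distance, edges

distance, edges = build_semantic_net("bus#1", RELATIONS)
print(sorted(distance.items(), key=lambda item: item[1]))
print("artifact#1" in distance)
```

In this toy graph, artifact#1 lies four relations from the center and is therefore left out, while instrumentality#3, at exactly three relations, is kept. Breadth-first order guarantees that each node is recorded at its shortest distance from the center.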
Intersecting semantic networks and scoring configurations

Related Work
Traditional information retrieval techniques (i.e., the vector-space model) have the advantage of being fast and giving fair results. However, it is difficult to represent the content of the documents meaningfully using these techniques. That is, after the documents are indexed, they become a “bag of terms”, and hence the semantics is partly lost in the process.

In order to increase the quality of IR, much effort has been put into annotating documents with semantic information [8, 12, 13, 23]. That is a tedious and labor-intensive task. Furthermore, hardly any search engines use metadata when indexing documents. AltaVista (http://www.altavista.com/), one of the last major search engines to do so, dropped its support in 2002 [16]. The main reason is that the meta information can be, and has been, misused by content providers to give documents a misleadingly higher ranking than they should have had [16]. However, there is still a vision that for ontology-based IR systems on the Semantic Web, “it is necessary to annotate the web’s content with terms defined in ontology” [17].

The work related to our approach comes from two main areas: ontology-based IR in general, and approaches to query expansion in particular. General approaches to ontology-based IR can be further subdivided into Knowledge Base (KB) driven and vector-space-model driven approaches. KB approaches use reasoning mechanisms and ontological query languages to retrieve instances. Documents are either treated as instances or annotated using ontology instances [5, 6, 17, 19]. These approaches focus on retrieving instances rather than documents. Some approaches are combined with ontological filtering [4, 7, 15].

There are approaches combining ontology-based IR and the vector-space model. For instance, some start with semantic querying using ontology query languages and use the resulting instances to retrieve relevant documents [19, 20]. [20] uses weighted annotation when associating documents with ontology instances. The weights are based on the frequency of occurrence of the instances in each document. Another approach combines ontology usage with the vector-space model by extending a non-ontological query. There, the ontology is used to disambiguate queries: a simple text search is run on the concepts’ labels, and users are asked to choose the proper term interpretation. A similar approach is described in [25], where documents are associated with concepts in the ontology. The concepts in the query are matched to the concepts of the ontology in order to retrieve terms, which are then used to calculate document similarity.

[4] uses ontologies for retrieval and filtering of domain information across multiple domains. There, each ontology concept is defined as a domain feature with detailed information relevant to the domain, including relationships with other features. The relationships used are hypernyms (super class), hyponyms (sub class), and synonyms. Unfortunately, [4] provides no details on how a domain feature is created.

Most query enrichment approaches, such as [2, 24, 26], do not use ontologies. Query expansion is typically done by extending the provided query terms with synonyms or hyponyms (cf. [21]). Some approaches focus on using ontologies in the process of enriching queries [4, 6, 25]. However, the ontology in such cases typically serves as a thesaurus containing synonyms and hypernyms/hyponyms, and the context of each term is not considered, i.e. every term is equally weighted. [24] uses query expansion based on a similarity thesaurus. Weighting of terms is used to reflect the domain knowledge, and the query expansion is done by similarity measures. Similarly, [2] describes a conceptual query expansion in which the query concepts are created from a result set. Both approaches show an improvement compared to simple term-based queries. The approach presented in [26] is the most similar to ours; however, [26] does not use ontologies but relies on query concepts. Two techniques are used to create the feature vectors of the query concepts, i.e. based on the document set and on the result set of a user query.

In contrast to the related work discussed above, we emphasize the main features of our approach as follows. Our approach relies on domain knowledge represented in an ontology when constructing feature vectors; the traditional vector-space retrieval model is then used for the information retrieval task, where the feature vectors are used to enrich provided queries. The main advantage of our approach is that the concepts of an ontology are tailored to the terminology of the document collection, which can vary a lot even within the same domain.

Conclusions and Future Work
In this paper, we have proposed a method for utilizing ontologies to improve the retrieval quality. The concepts in the ontology are associated with contextual definitions in terms of weighted feature vectors, tailoring the ontology to the content of the document collection. Further, the feature vectors are used to enrich a provided query. Query enrichment by feature vectors provides a means to bridge the gap between query terms and the terminology used in a document set, while still employing the knowledge encoded in the ontology. We have also proposed that concepts and ordinary terms or keywords of the query should be handled differently, since they have different roles.

The main architectural components and techniques constituting the method have been presented in this paper. As the research reported here is still in progress, we have not been able to formally evaluate the approach. Preliminary results indicate, though, that the quality of the feature vectors is very important for the quality of the search result.

In future work we are planning to inspect and tackle the following issues. First, there is a need to refine the term weight computation. Here we will investigate alternative methods for assigning relevant terms to the ontology concepts, i.e. using association rules, and evaluate the influence on the search results. Second, we will also look into alternative methods for post-processing of the results, utilizing the semantic relations in the ontology for better ranking and navigation.

References:
www.informatik.uni-tier.de
www.wikipedia.com
