
THE EFFECTS OF QUERY COMPLEXITY, EXPANSION AND STRUCTURE
ON INFORMATION RETRIEVAL SYSTEMS

Abstract

In this paper we test the effects of query complexity, expansion and structure on retrieval
performance, measured as precision and recall, in probabilistic text retrieval. Complexity
refers to the number of search facets or intersecting concepts in a query. Facets were divided into major and
minor facets on the basis of their importance with respect to a corresponding request. Two complexity
levels were tested: high complexity refers to queries using all search facets identified from requests, while low
complexity refers to queries formulated with major facets only. There were five expansion types:
(1) the first query version was an unexpanded, original query with one search key for each search concept
(original search concepts) elicited from the test thesaurus;
(2) the synonyms of the original search keys were added to the original query;
(3) search keys representing the narrower concepts of the original search concepts were added to the
original query;
(4) search keys representing the associative concepts of the original search concepts were added to the
original query;
(5) all previous expansion keys were cumulatively added to the original query.
Query structure refers to the syntactic structure of a query expression, marked with query operators and
parentheses. The structure of queries was either weak (queries with no differentiated relations between
search keys, except weights) or strong (different relationships between search keys).
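
The query versions described above can be sketched in a few lines of Python. This is only an illustrative sketch: the thesaurus entries, the names `THESAURUS`, `EXPANSION_TYPES` and `build_query`, and the simplification that each facet holds a single concept are assumptions made here, not part of the test setting of the study.

```python
from typing import Dict, List

# Hypothetical toy thesaurus: every concept, key and relation below is invented
# for illustration and is not taken from the test thesaurus used in the study.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "pollution": {
        "key": ["pollution"],
        "synonyms": ["contamination"],
        "narrower": ["smog", "acid rain"],
        "associative": ["emissions"],
    },
    "forest": {
        "key": ["forest"],
        "synonyms": ["woodland"],
        "narrower": ["rainforest"],
        "associative": ["logging"],
    },
    "legislation": {
        "key": ["legislation"],
        "synonyms": ["law"],
        "narrower": ["statute"],
        "associative": ["regulation"],
    },
}

# Thesaurus relations added by each of the five expansion types (1 = unexpanded).
EXPANSION_TYPES: Dict[int, List[str]] = {
    1: [],
    2: ["synonyms"],
    3: ["narrower"],
    4: ["associative"],
    5: ["synonyms", "narrower", "associative"],
}


def build_query(major: List[str], minor: List[str],
                complexity: str, expansion: int) -> Dict[str, List[str]]:
    """Return the search keys of each facet for one query version."""
    # Low complexity keeps the major facets only; high complexity keeps all facets.
    facets = major if complexity == "low" else major + minor
    query: Dict[str, List[str]] = {}
    for concept in facets:
        entry = THESAURUS[concept]
        keys = list(entry["key"])  # the original search key
        for relation in EXPANSION_TYPES[expansion]:
            keys.extend(entry[relation])  # expansion keys are added to the original query
        query[concept] = keys
    return query


major_facets = ["pollution", "forest"]
minor_facets = ["legislation"]

low_unexpanded = build_query(major_facets, minor_facets, "low", 1)
high_full = build_query(major_facets, minor_facets, "high", 5)
print(low_unexpanded)
print(high_full)
```

Combining the two complexity levels with the five expansion types in this way gives ten key sets per request; the query structure is then a separate choice applied to each key set.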

1. Problems in Search Key Selection and Query Formulation

Information retrieval (IR) is the activity of information searching and the name of a research
area. IR is concerned with the processes involved in the representation, storage, searching and
finding of information which is relevant to a requirement for information desired by a human
user. The reasons why people seek information are called information needs.

Information needs may be categorised into (1) verificative, (2) conscious topical, and
(3) muddled or ill-defined needs. The first category refers to situations where documents with
known properties are sought, e.g., by author name, titles of known authors, etc. The second
type implies that the topic is known and definable, but less exact than in the first category.
A person looking for information has some level of understanding of it. In the third category
are the cases in which a person wishes to find new knowledge and concepts in domains he is not
familiar with.

2. Retrieval Techniques: Exact and Partial Matching

Query and document representations are compared using some retrieval or matching technique.
Figure 5 shows a classification of retrieval techniques. An essential differentiation lies
between exact and partial match. The former is identified with matching based on Boolean logic;
the latter is diversified, the most studied cases being the vector space model (Salton & McGill
1983) and probabilistic retrieval. Exact match means that the conditions given in a query must
be fulfilled exactly in a document to be retrieved. Boolean retrieval has been the prevalent
technique of operational IR systems until recently.

3. From Requests to Queries

An information need should be formulated into the language of an IR system in order to retrieve
documents. In this chapter we shall present the transformation process. As soon as the request
is stated, it dominates the scene, and the words of the request govern query formulation. This
is related to the label effect, which means that an information need is expressed by words and
phrases that do not wholly describe the need but label it.
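
To make the distinction between exact and partial match in Section 2 concrete, the following minimal Python sketch compares a Boolean conjunction with a simple ranking. The documents, the function names and the scoring rule (counting how many query keys a document contains) are assumptions chosen for brevity; the count is only a stand-in for a partial match technique and is neither the vector space model nor a probabilistic model.

```python
from typing import Dict, List, Set

# Hypothetical document representations: sets of index keys (invented data).
DOCS: Dict[str, Set[str]] = {
    "d1": {"forest", "pollution", "law"},
    "d2": {"forest", "logging"},
    "d3": {"pollution"},
    "d4": {"weather"},
}


def exact_match(required: List[str], docs: Dict[str, Set[str]]) -> List[str]:
    """Boolean AND: retrieve only the documents that contain every key."""
    return sorted(d for d, keys in docs.items() if all(k in keys for k in required))


def partial_match(query: List[str], docs: Dict[str, Set[str]]) -> List[str]:
    """Rank documents by the number of query keys they contain (a crude score)."""
    scores = {d: sum(k in keys for k in query) for d, keys in docs.items()}
    # Best score first; ties are broken by document id to keep the order stable.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0]


query = ["forest", "pollution"]
exact = exact_match(query, DOCS)
ranked = partial_match(query, DOCS)
print(exact)   # ['d1']
print(ranked)  # ['d1', 'd2', 'd3']
```

Exact match returns an unordered set that satisfies the query conditions completely, whereas partial match also returns documents that satisfy them only in part and orders them by score.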

Vector Space and Probabilistic Models

The In-Query Retrieval System

4. Query Construction

We adopt the model of three abstraction levels, that is, we distinguish between conceptual,
linguistic and string levels. First, search concepts are identified from a request. Second, the
concepts are connected to expressions, which, in turn, are replaced by search keys. As the
result of this process, a request is translated into a query.

4.1 Conceptual level

A conceptual or facet analysis of the request is an essential part of query formulation. At the
conceptual level a query should be formulated without any ties to query languages; rather, it
should respect the information need behind the request. A result of this analysis is a
conceptual query plan. The conceptual query plan may be described by complexity, specificity,
and coverage. Complexity is the number of facets in the conceptual query plan. A facet is
represented in the plan if any of the concepts representing the facet is included. Complexity is
connected to the syntagmatic relations of facets (or to conjunction of concepts). Specificity
refers to the specificity of the concepts representing facets. If the concepts are of the same
specificity level as the request, the plan is fully specific. Specificity is connected to
conceptual hierarchies. Coverage is the number of concepts in the conceptual query plan.
Coverage is connected to the paradigmatic relations of the concepts in a facet (or to
disjunction of concepts).

4.2 Linguistic level

The conceptual query plan is turned into the level of expressions by naming the concepts with
accurate words and phrases. A search expression refers to words and phrases of natural language,
to common codes and abbreviations, and to terms of documentation languages. Search keys are
either string constants, which correspond to expressions at string level, or string patterns,
which match several string constants and are typically formed from constants by applying wild
cards.

4.3 String level

At the string level search expressions are turned into search keys (character strings) suitable
for matching in the index of a database, and the expressed query becomes a formal query. The
number of search keys in a query is a concept analogous to coverage, and is referred to as
broadness. In a strict sense, complexity and coverage belong to the conceptual query plan, but
are used to describe formal queries as well, because no confusion should arise. In other words,
the attributes of the conceptual level are passed on to the linguistic and string levels. In
all, abstraction levels should be understood as overlapping layers, and a concept as a kernel
item, which has representatives at other levels.

5. Query Structures

In the literature the expression "structured queries" typically refers to queries formulated
with the Boolean operators in contrast to natural language queries, which are sets of words.
Thus, structure may be understood as relationships between search keys in a query, and it is
expressed in query languages by operators or search key weights, which guide the matching of
search keys and document representations. The structure of queries may be described as weak or
strong. A weak query structure does not indicate facet or concept structure with operators
(i.e., queries with a single or no explicit operator; no differentiated relations between search
keys). In a strong query structure search keys representing different concepts or facets are
separated by operators (i.e., queries with several operators; differentiated relationships
between search keys).

Queries can be classified according to their structural features:
1) are concepts identified?
2) are facets identified?
3) are concepts weighted?
4) are facets weighted?
5) are search keys based on single words or phrases?
6) are search keys weighted?
7) by what operators are facets / concepts / search keys connected?

6. Query Expansion

Query formulation, reformulation, and expansion have been studied extensively because the
selection of good search keys is difficult but crucial for good results. Real users' requests
and/or queries do not usually contain all the expressions, and not necessarily the best
expressions, that might be used to describe the concepts of interest. Typically, requests are
short, and each concept in them is named by one expression only, and queries may contain only a
few search expressions. (Bates, Wilde & Siegfried 1993, 2327; Lu & Keefer 1995.) The first query
formulation typically acts as an entry to the search system and is followed by browsing and
query reformulations. The first query may be reformulated by adding search keys with or without
reweighting (NB: both search keys and text keys may be reweighted), the process known as query
expansion (QE). QE may be specified by the sources of expansion keys, by the methods of
selecting expansion keys, and by the methods of adding expansion keys to queries.

In general, QE is not tied to any retrieval technique, but may be applied with any of them.
However, the effects of QE could be dependent on retrieval technique and query structure,
although little consideration has been given to query structures in partial match retrieval.

Conclusion

In this study we have tested the effects of query structure, complexity and expansion on
retrieval performance in partial match retrieval. We have demonstrated that when the broadness
of queries is low, the query structure does not have significant effects on retrieval
performance, but when queries are expanded, in other words, when the number of search keys is
increased, the query structure affects retrieval performance substantially. The best
effectiveness for broad queries is yielded by facet- or concept-based query structures that
treat all keys representing a facet or a concept as instances of one key.
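
As an illustration of the facet-based structure mentioned above, in which all keys representing a facet are treated as instances of one key, the following Python sketch scores two documents against one expanded query under a weak and a strong structure. The query, the documents, the function names and both counting rules are assumptions made for this example; they show the mechanism only and do not reproduce the probabilistic matching or the results of the study.

```python
from typing import Dict, List, Set

# Hypothetical expanded query: facet -> search keys (invented for illustration).
QUERY: Dict[str, List[str]] = {
    "pollution": ["pollution", "contamination", "smog", "emissions"],
    "forest": ["forest", "woodland", "rainforest"],
}

# Hypothetical documents, each reduced to a set of index keys.
DOCS: Dict[str, Set[str]] = {
    "one_facet": {"pollution", "contamination", "smog", "emissions"},
    "both_facets": {"smog", "rainforest"},
}


def weak_score(query: Dict[str, List[str]], doc_keys: Set[str]) -> int:
    """Weak structure: one flat bag of keys, every matching key counts separately."""
    flat = [k for keys in query.values() for k in keys]
    return sum(k in doc_keys for k in flat)


def strong_score(query: Dict[str, List[str]], doc_keys: Set[str]) -> int:
    """Facet-based structure: the keys of a facet act as instances of one key,
    so a facet counts once however many of its keys match."""
    return sum(any(k in doc_keys for k in keys) for keys in query.values())


weak = {d: weak_score(QUERY, keys) for d, keys in DOCS.items()}
strong = {d: strong_score(QUERY, keys) for d, keys in DOCS.items()}
print(weak)    # {'one_facet': 4, 'both_facets': 2}
print(strong)  # {'one_facet': 1, 'both_facets': 2}
```

In this toy case the flat bag of keys favours the document that repeats many keys of a single facet, while the facet-based structure favours the document that covers both facets. The gap between the two scoring rules can only open up once facets carry several keys, that is, once the query has been expanded.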