
Information Retrieval

Acknowledgements:
Dr Mounia Lalmas (QMW)
Dr Joemon Jose (Glasgow)

Course Text
Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999.
ISBN: 0-201-39829-X

Introduction

Example of an information need in the context of the World Wide Web:
Find all documents containing information on computer courses which (1) are offered by universities in the south of England, and (2) are accredited by the BCS/IEE bodies. To be relevant, a document must include information on admission requirements, as well as an e-mail address and phone number for contact purposes.

Information Retrieval

Representation, storage, organisation, and access to information items

(Usually) keyword-based representation


[Figure: information need, query, documents, search engine / retrieval system, set of retrieved documents, useful or relevant information to the user]

Primary goal of an IR system:
Retrieve all the documents that are relevant to a user query, while retrieving as few non-relevant documents as possible.

User tasks
Pull technology
User requests information in an interactive manner
Three retrieval tasks:
Browsing (hypertext)
Retrieval (classical IR systems)
Browsing and retrieval (modern digital libraries and web systems)
Push technology
Automatic and permanent pushing of information to the user
Software agents
Example: a news service
Filtering (a retrieval task) of relevant information for later inspection by the user
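The filtering task under push technology can be sketched as matching each incoming item against a standing user profile. This is a minimal sketch under stated assumptions, not a description of any real news service. The profile is a hypothetical set of keywords. An item is kept when it shares at least `min_overlap` terms with the profile, which is an illustrative rule chosen here.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def filter_stream(items, profile, min_overlap=2):
    """Keep each incoming item that shares at least `min_overlap` terms
    with the standing profile (a hypothetical matching rule)."""
    inbox = []
    for item in items:
        shared = tokenize(item) & profile
        if len(shared) >= min_overlap:
            inbox.append(item)  # stored for later inspection by the user
    return inbox

# Hypothetical profile and incoming news items.
profile = {"neural", "speech", "recognition"}
news = [
    "Neural networks improve speech recognition accuracy",
    "Local elections held across the region",
    "New speech synthesis toolkit released",
]
print(filter_stream(news, profile))
# ['Neural networks improve speech recognition accuracy']
```

The user never issues a query here. The profile stands in for a permanent information need, and matching items simply accumulate for later reading.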

Documents
Unit of retrieval
A passage of free text
Composed of text: strings of characters from an alphabet
Composed of natural language
Examples: a newspaper article, a journal paper, a dictionary definition, email messages
Size of documents
Arbitrary: newspaper article vs. journal paper vs. email

What is a document?

Representation of documents
Set of index terms or keywords
Extracted directly from the text
Specified by human subjects (information science): metadata
Most concise representation
Poor quality of retrieval
Full text representation
Most complete representation
High computational cost
For large collections: reduce the set of representative keywords
Elimination of stop words
Stemming
Identification of noun phrases
Further compression
Structure representation
Chapter, section, sub-section, etc.

Document term descriptors to access texts
Generation of descriptors for text:
By hand
By analysing the text
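Generating descriptors by analysing the text can be sketched as a small pipeline that applies two of the keyword-reduction steps listed above: stop-word elimination and stemming. This is a minimal sketch. `STOP_WORDS` is a tiny illustrative list. `stem` is a deliberately crude suffix stripper, not the Porter stemmer or any other standard algorithm. Noun-phrase identification and compression are omitted.

```python
import re

# A tiny illustrative stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}

# Crude suffix stripping (NOT the Porter stemmer), tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Tokenise, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

print(index_terms("The indexed documents and the indexing of queries"))
# ['index', 'document', 'index', 'queri']
```

Conflating "indexed" and "indexing" into one term is the purpose of stemming. The stem "queri" shows that stems need not be real words, only consistent ones.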

The retrieval process

[Figure: information need, formulation, query; documents, indexing, document representation; retrieval functions; retrieved documents; relevance feedback]

Queries
Information need
User term descriptors characterising the user need
Simple queries
Composed of two or three, perhaps even dozens, of keywords
e.g., as in web retrieval
Boolean queries
e.g., neural networks AND speech recognition
Context queries
Proximity search, phrase queries
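Boolean and phrase queries can both be answered from a positional inverted file. Such a file records, for every term, the documents and word positions where that term occurs. The sketch below is illustrative only. The function names and the four-document collection are hypothetical. It contrasts reading the slide's example as a plain AND of four terms with requiring "neural networks" and "speech recognition" to occur as phrases.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into a list of word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}: a positional inverted file."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def boolean_and(index, terms):
    """Ids of documents containing every term (intersection of postings)."""
    if not terms:
        return []
    postings = [set(index.get(t, {})) for t in terms]
    return sorted(set.intersection(*postings))

def phrase_query(index, phrase):
    """Ids of documents in which the phrase's terms occur consecutively."""
    terms = tokenize(phrase)
    hits = []
    for doc_id in boolean_and(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.append(doc_id)
    return hits

# Hypothetical four-document collection.
docs = [
    "Neural networks for speech recognition",
    "Speech recognition with hidden Markov models",
    "Recognition of speech by neural networks",
    "Networks of neural cells",
]
index = build_positional_index(docs)
print(boolean_and(index, ["neural", "networks", "speech", "recognition"]))  # [0, 2]
both = set(phrase_query(index, "neural networks")) & set(phrase_query(index, "speech recognition"))
print(sorted(both))  # [0]
```

Document 2 contains all four words but not the phrase "speech recognition", so only the phrase-based reading excludes it. The stored positions would also support proximity search by allowing a gap of up to k words instead of exactly one. That extension is not shown.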


Best-Match Retrieval
Compare the terms in a document and the query
Compute the similarity between each document in the collection and the query, based on the terms they have in common
Sort the documents in order of decreasing similarity with the query
The output is a ranked list displayed to the user; the top-ranked documents are those the system judges most relevant
Document term descriptors to access texts
User term descriptors characterising the user need
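Best-match retrieval can be sketched using cosine similarity between raw term-count vectors. This is one common choice of similarity function; the slides do not prescribe a particular one. Term weighting such as tf-idf is omitted. The function names and example documents are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Raw term-frequency vector of a text (no term weighting)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(q, d):
    """Cosine similarity; only terms shared by q and d add to the dot product."""
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """(doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = vectorize(query)
    scores = [(i, cosine(q, vectorize(text))) for i, text in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical collection and query.
docs = [
    "computer courses in England",
    "computer science courses and computer labs",
    "history of England",
]
for doc_id, score in rank("computer courses", docs):
    print(doc_id, round(score, 3))
```

Document 1 ranks first because "computer" occurs twice in it. Document 2 shares no terms with the query and scores 0, so it ends up at the bottom of the ranked list.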


Conceptual View of Text Retrieval

[Figure: queries, documents, similarity computation, retrieved documents]

Expanded view of a text retrieval system

[Figure: queries, documents, indexing, indexed documents, similarity computation, ranked documents, retrieved documents]


Process of retrieving info

[Figure: user need, user interface, user feedback, text, text operations, logical view, query operations, query, indexing, index (inverted file), similarity computation, retrieved docs, ranking, ranked docs, document repository manager, text repository]


Key Topics
Indexing text documents
Retrieving text documents
Evaluation
Query reformulations
Search engines = IR + link structure + name interpretation

Information Retrieval vs Information Extraction
Information Retrieval
Given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant documents [recall].
Information Extraction
Extract from the text what the document means.
IR systems can FIND documents but need not understand them.
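The precision and recall criteria above can be computed from the set of retrieved documents and the set of relevant documents. This is a minimal sketch, assuming relevance judgments are available as a set of document identifiers. Returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: the system returns 4 documents; 3 documents are relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p, round(r, 3))  # 0.5 0.667
```

Here 2 of the 4 retrieved documents are relevant, giving a precision of 0.5. The system found 2 of the 3 relevant documents, giving a recall of about 0.67.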
