Acknowledgements:
Dr Mounia Lalmas (QMW)
Dr Joemon Jose (Glasgow)
Course Text
Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: 0-201-39829-X
Introduction
Information Retrieval
Documents
Search Engine
Retrieval System
User tasks
Pull technology
  User requests information in an interactive manner
  3 retrieval tasks
    Browsing (hypertext)
    Retrieval (classical IR systems)
    Browsing and retrieval (modern digital libraries and web systems)
Push technology
  Automatic and permanent pushing of information to user
  Software agents
  Example: news service
  Filtering (retrieval task) relevant information for later inspection by user
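The filtering task under push technology can be sketched as matching each incoming item against a standing user profile. This is only a minimal illustration, assuming a profile is a set of keywords and an item counts as relevant when it shares at least `min_overlap` words with it. The names `filter_stream`, `profile` and `min_overlap` are hypothetical and do not come from the course.

```python
def filter_stream(items, profile, min_overlap=1):
    """Keep items whose text shares at least min_overlap words with the profile."""
    profile = {w.lower() for w in profile}
    kept = []
    for item in items:
        words = set(item.lower().split())
        if len(words & profile) >= min_overlap:
            kept.append(item)  # set aside for later inspection by the user
    return kept


# Hypothetical news feed and user profile.
news = [
    "Election results announced today",
    "New search engine indexes the web",
    "Football cup final tonight",
]
relevant = filter_stream(news, profile={"search", "retrieval", "web"})
```

Here only the second item shares words with the profile, so it is the only one kept.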
Documents
Unit of retrieval
A passage of free text
  composed of text, strings of characters from an alphabet
  composed of natural language
Size of documents: arbitrary
  newspaper article vs. journal paper vs. email
What is a document?
Representation of documents
Large collections
Structure representation
Document term descriptors to access texts
Generation of descriptors for text
  By hand
  By analysing the text
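Generating descriptors by analysing the text can be sketched as follows. This is a simplified illustration, not the course's method: it assumes descriptors are the most frequent terms left after dropping a tiny stop list. `STOP_WORDS`, `extract_descriptors` and `top_k` are hypothetical names.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "is", "for"}


def extract_descriptors(text, top_k=5):
    """Return the top_k most frequent non-stop-word terms in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_k)]


doc = "Retrieval of documents: the indexing of documents is key to retrieval of documents."
descriptors = extract_descriptors(doc, top_k=2)
```

For this sentence the two descriptors are "documents" (three occurrences) and "retrieval" (two occurrences).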
[Figure: diagram with components: Documents, Indexing, Document representation, Formulation, Query, Retrieval functions, Retrieved documents, Relevance feedback]
Queries
Information need
User term descriptors characterising the user need
Simple queries
  composed of two or three, perhaps even dozens, of keywords
  e.g., as in web retrieval
Boolean queries
Context queries
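A Boolean query can be illustrated with set operations over the documents that contain each term. The sketch below assumes a toy index mapping terms to sets of document ids. The index contents and the helper `docs_with` are made up for illustration.

```python
# Hypothetical toy index: term -> set of ids of documents containing it.
index = {
    "information": {1, 2, 3},
    "retrieval": {1, 3},
    "extraction": {2},
}
all_docs = {1, 2, 3}


def docs_with(term):
    """Return the set of documents containing term (empty if unknown)."""
    return index.get(term, set())


# information AND retrieval
and_result = docs_with("information") & docs_with("retrieval")
# retrieval OR extraction
or_result = docs_with("retrieval") | docs_with("extraction")
# information AND NOT extraction
not_result = docs_with("information") & (all_docs - docs_with("extraction"))
```

A document either satisfies the Boolean expression or it does not, so the result is an unordered set with no ranking.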
Best-Match Retrieval
Document term descriptors to access texts
User term descriptors characterising the user need
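One simple way to picture best-match retrieval is to score each document by how many of its term descriptors also appear among the user's term descriptors, then list the best matches first. This is a sketch under that assumption. The scoring rule, the function `best_match` and the sample data are illustrative choices, not taken from the slides.

```python
def best_match(query_terms, doc_descriptors):
    """Rank document ids by how many descriptors they share with the query."""
    query = set(query_terms)
    scores = {doc_id: len(query & set(terms))
              for doc_id, terms in doc_descriptors.items()}
    # Keep only documents with a non-zero score, best first.
    ranked = sorted((d for d in scores if scores[d] > 0),
                    key=lambda d: scores[d], reverse=True)
    return [(d, scores[d]) for d in ranked]


# Hypothetical document descriptors.
docs = {
    "d1": ["information", "retrieval", "indexing"],
    "d2": ["information", "extraction"],
    "d3": ["football", "results"],
}
ranking = best_match(["information", "retrieval"], docs)
```

Unlike an exact Boolean match, a document that shares only some of the query descriptors (here `d2`) is still returned, just lower in the list.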
[Figure: diagram with components: Documents, Similarity Computation, Retrieved Documents]
[Figure: diagram with components: Documents, Indexing, Indexed Documents, Similarity Computation, Retrieved Documents, Ranked Documents]
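The chain of indexing, similarity computation and ranking can be sketched with TF-IDF weights and cosine similarity. The slides do not name a particular similarity measure, so this is one common choice swapped in for illustration. The functions `tfidf_vectors`, `cosine` and `rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Index tokenised docs: return ({doc id: {term: tf * idf}}, idf table)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = {d: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for d, tokens in docs.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, docs):
    """Return ids of documents with non-zero similarity, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    scored = [(cosine(q, vec), d) for d, vec in vectors.items()]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]


docs = {
    "d1": "information retrieval ranks documents".split(),
    "d2": "information extraction fills templates".split(),
    "d3": "retrieval of ranked documents".split(),
}
ranked = rank("ranked retrieval".split(), docs)
```

In this toy collection `d3` matches both query terms and is ranked first, `d1` matches only "retrieval", and `d2` shares no term with the query and is not retrieved. No stemming is applied, so "ranks" and "ranked" are treated as different terms.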
[Figure: diagram with components: User need, Text, Text Operations, Logical view, Query Operations, Query, Indexing, Inverted file, Index, Similarity Computation, Retrieved docs, Ranking, Ranked docs, Document Repository Manager, Text repository]
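The inverted file produced by the indexing step can be sketched as a mapping from each term to the documents that contain it. This is a minimal illustration that assumes whitespace tokenisation and lower-casing as the only text operations. `build_inverted_file` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical text repository: document id -> text.
docs = {
    1: "information retrieval",
    2: "information extraction",
    3: "retrieval evaluation",
}
inverted = build_inverted_file(docs)
```

Looking up a query term then only touches that term's list, for example `inverted["retrieval"]`, instead of scanning every document in the repository.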
Key Topics
Indexing text documents
Retrieving text documents
Evaluation
Query reformulations
Search Engines = IR + Link Structure + Name Interpretation
Information Retrieval vs Information Extraction
Information Retrieval
Information Extraction
IR