Organizational Remarks
Exercises: Please register for the exercises by sending me an email (huerst@informatik.uni-freiburg.de) by Friday, May 5th, with
- your name,
- Matrikelnummer (student ID number),
- Studiengang (degree program),
- plans for the exam.
This is only to organize the exercises and has no effect if you decide to drop the course later.
[Figure: system overview diagram with the components INDEXING, INDEX, QUERY, SEARCH, and PERFORMANCE EVALUATION]
Proximity, e.g.
Phrases
Often used (esp. for web search): Quotes, e.g. "New York City". Advantage: Easy, and seems to work well (about 10% of web queries are such phrases according to Manning et al. [2])
How do we support this?
- We need word positions.
- We need all original words (e.g. no stop word removal in "University of Freiburg").
- We need an efficient way to do this.
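[Figure: postings lists extended with word positions; each entry has the form docID:count[positions]]
Sample positional entries: 23:4[3,12,46,78]  25:3[43,120,221]  32:6[12,20,57,200,322,481]
Sample plain postings (document IDs only): 18 23 25 47 and 25 47 53 55
NEW 23535: ..., 25:6[41,87,136,...], ...
YORK 9421: ..., 25:2[42,137], ...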
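The phrase lookup illustrated by the NEW / YORK postings can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function names `build_positional_index` and `phrase_query`, the whitespace tokenizer, and the 1-based word positions are all assumptions made for the example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}; positions are 1-based word offsets.

    No stop words are removed, so phrases such as
    "University of Freiburg" stay searchable.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, phrase):
    """Return {doc_id: [start positions]} where the words occur consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return {}
    # Candidate documents must contain every word of the phrase.
    candidates = set(index[words[0]])
    for w in words[1:]:
        candidates &= set(index[w])
    hits = {}
    for doc_id in sorted(candidates):
        # Position sets of the 2nd, 3rd, ... word in this document.
        later = [set(index[w][doc_id]) for w in words[1:]]
        # A start position p matches if word i is found at p + i.
        starts = [p for p in index[words[0]][doc_id]
                  if all(p + i in s for i, s in enumerate(later, start=1))]
        if starts:
            hits[doc_id] = starts
    return hits

# Only the positions visible on the slide for document 25 (the rest is elided there).
slide_index = {"new": {25: [41, 87, 136]}, "york": {25: [42, 137]}}
print(phrase_query(slide_index, "New York"))  # {25: [41, 136]}
```

With the slide's entries for document 25, the phrase matches at positions 41-42 and 136-137, while the occurrence of NEW at position 87 is not followed by YORK.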
Positional Indexes
Also works for queries such as "University [word]1 Freiburg" or "University NEAR Freiburg"
Problem: Size
- Need to store additional info (positions) on top of an already large index (stop words!)
- Approx. size: 2-4 times the original index, 1/2 the size of the uncompressed documents [2]
In practice: Combinations exist, e.g. index with names as phrases, useful biwords, and store position
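A proximity query such as University NEAR Freiburg can be answered from the same positional postings. The sketch below assumes that NEAR means "at most k words apart, in either order"; the function name `near_query` and the parameter `k` are hypothetical and not taken from the lecture.

```python
def near_query(postings_a, postings_b, k):
    """Return {doc_id: [(pos_a, pos_b), ...]} for occurrences at most k words apart.

    postings_a and postings_b map doc_id -> sorted list of word positions.
    """
    hits = {}
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        pa, pb = postings_a[doc_id], postings_b[doc_id]
        pairs = []
        start = 0
        for a in pa:
            # Positions of b left of the window [a - k, a + k] can be
            # skipped for good, because pa is sorted in ascending order.
            while start < len(pb) and pb[start] < a - k:
                start += 1
            j = start
            while j < len(pb) and pb[j] <= a + k:
                pairs.append((a, pb[j]))
                j += 1
        if pairs:
            hits[doc_id] = pairs
    return hits

# Made-up postings for illustration.
university = {25: [41, 87, 136], 23: [3, 12]}
freiburg = {25: [42, 137], 32: [12]}
print(near_query(university, freiburg, 1))  # {25: [(41, 42), (136, 137)]}
```

Because both position lists are sorted, the window start only moves forward, so each document is processed in time roughly linear in the number of positions plus the number of reported pairs.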
Wildcards (Cont.)
General wildcards, e.g. f*ball
(matches e.g. fußball, federball, ...)
Permuterm index:
dictionary = all permuterms (rotations of each term with the end marker $ appended), postings = dictionary terms containing this rotation
Query:
Rotate the query so that * is at the end (e.g. ball$f*) and get the postings from the permuterm index (e.g. ball$fuß, ball$feder, ...)
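The permuterm lookup can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions: the names `rotations`, `build_permuterm_index`, and `wildcard_lookup` are hypothetical, only queries with exactly one * are handled, and the prefix search is a linear scan where a real system would use a B-tree or a sorted dictionary.

```python
from collections import defaultdict

def rotations(term):
    """All rotations of term + '$' (the permuterms of the term)."""
    s = term + "$"
    return [s[i:] + s[:i] for i in range(len(s))]

def build_permuterm_index(terms):
    """Map each permuterm to the set of dictionary terms that produce it."""
    index = defaultdict(set)
    for t in terms:
        for r in rotations(t):
            index[r].add(t)
    return index

def wildcard_lookup(index, query):
    """Answer a query with exactly one '*', e.g. 'f*ball'."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    prefix, suffix = query.split("*")
    key = suffix + "$" + prefix  # 'f*ball' -> 'ball$f' (the '*' is now at the end)
    matches = set()
    # Linear scan for illustration only; see the lead-in.
    for permuterm, terms in index.items():
        if permuterm.startswith(key):
            matches |= terms
    return sorted(matches)

idx = build_permuterm_index(["fussball", "federball", "football", "ball", "handball"])
print(wildcard_lookup(idx, "f*ball"))  # ['federball', 'football', 'fussball']
```

The price of this simplicity is dictionary size: a term of n characters contributes n + 1 permuterms.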
Structural Queries
In practice: Often semi-structured documents
Structural queries: Use the available structure to better specify the information need
This requires storing structure information, e.g. in a parametric index
encoded in the dictionary:
OTTMANN.AUTHOR → 9, 17, 19, 28
OTTMANN.TITLE → 12, 26, 44, 48
OTTMANN.BODY → 8, 9, 17, 23
or in the postings:
OTTMANN 8.BODY 9.AUTHOR, 9.BODY 12.TITLE
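Both encodings of a parametric index can be sketched as follows. The documents, the zone names as dictionary keys, and the function names `index_zone_in_dictionary`, `index_zone_in_postings`, and `search_zone` are hypothetical, chosen so that the result for OTTMANN reproduces the postings shown above for documents 8, 9, and 12.

```python
from collections import defaultdict

def index_zone_in_dictionary(docs):
    """Dictionary entries are 'TERM.ZONE'; postings are sorted doc IDs."""
    index = defaultdict(set)
    for doc_id, zones in docs.items():
        for zone, text in zones.items():
            for word in text.upper().split():
                index[f"{word}.{zone}"].add(doc_id)
    return {key: sorted(ids) for key, ids in index.items()}

def index_zone_in_postings(docs):
    """Dictionary entries are plain terms; each posting lists the zones."""
    index = defaultdict(dict)
    for doc_id, zones in docs.items():
        for zone, text in zones.items():
            for word in set(text.upper().split()):
                index[word].setdefault(doc_id, []).append(zone)
    return {term: {d: sorted(z) for d, z in sorted(post.items())}
            for term, post in index.items()}

def search_zone(postings_index, term, zone):
    """Documents in which term occurs inside the given zone."""
    postings = postings_index.get(term.upper(), {})
    return [d for d, zones in postings.items() if zone in zones]

# Made-up documents for illustration.
docs = {
    8: {"AUTHOR": "Mueller", "TITLE": "Retrieval", "BODY": "as Ottmann showed"},
    9: {"AUTHOR": "Ottmann", "TITLE": "Algorithms", "BODY": "Ottmann writes"},
    12: {"AUTHOR": "Schmidt", "TITLE": "Interview with Ottmann", "BODY": "a talk"},
}
by_postings = index_zone_in_postings(docs)
print(by_postings["OTTMANN"])  # {8: ['BODY'], 9: ['AUTHOR', 'BODY'], 12: ['TITLE']}
print(search_zone(by_postings, "ottmann", "BODY"))  # [8, 9]
```

Encoding the zone in the dictionary multiplies the number of dictionary entries by up to the number of zones, whereas encoding it in the postings keeps one entry per term and makes queries across several zones a single lookup.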
Proximity, e.g.