
1

3/24/2011, Thesis Proposal Defense

Retrieval and Evaluation Techniques for

PERSONAL INFORMATION

Jinyoung Kim
Advisor : W. Bruce Croft
2

* Outline

•  Introduction

•  Retrieval Models (completed work)

•  Evaluation Techniques (completed work)

•  Proposed Work
3

PROBLEM OVERVIEW
4

* Personal Information Retrieval (PIR)


•  Retrieval of people’s own information
•  An example of desktop search
5

* Another Example

A tweet can have this much information!
6

* Characteristics & Related Areas


•  Many document types
•  Related area : Federated search

•  Unique metadata for each type


•  Related area : Structured document retrieval

•  Long-term interaction with a single user


•  Related area : Interactive IR / Search personalization

•  People mostly do re-finding


•  Related area : Known-item finding
7

* Previous Work for PIR


•  Major Focus
•  User Interface Issues [Dumais03,06]
•  Desktop-specific Features [Soules06] [Cohen08]

•  Limitations
•  Each study proposed its own retrieval method
•  Each study was evaluated in a different user study
•  None of them performed a comparative evaluation
8

* Contributions Overview
•  General Retrieval Models for PIR
•  Term-based search model
•  Associative browsing model

[Figure: a keyword query is answered by term-based search over personal documents of many types (e-mail, bookmark, blog post, webpage, daily journal, news), which are also linked for associative browsing]

•  Evaluation Models for PIR
•  Simulation-based evaluation method
•  Game-based evaluation method

•  Novel Techniques for Related Areas


•  A retrieval method for structured document retrieval
•  A type prediction method for structured document retrieval
•  An adaptive method for creating browsing suggestions
•  Evaluation methods for known-item finding
9

* Major Publications
•  [ECIR09]
•  A Probabilistic Retrieval Model for Semi-structured Data
•  Jinyoung Kim, Xiaobing Xue and W. Bruce Croft in ECIR'09

•  [CIKM09]
•  Retrieval Experiments using Pseudo-Desktop Collections
•  Jinyoung Kim and W. Bruce Croft in CIKM'09

•  [SIGIR10]
•  Ranking using Multiple Document Types in Desktop Search
•  Jinyoung Kim and W. Bruce Croft in SIGIR'10

•  [CIKM10]
•  Building a Semantic Representation for Personal Information
•  Jinyoung Kim, Anton Bakalov, David A. Smith and W. Bruce Croft in CIKM'10
10

RETRIEVAL MODELS FOR


PERSONAL INFORMATION
(Completed Work)
11

* Term-based Search Model [SIGIR10]


•  Type-specific Ranking
•  Contribution : PRM-S retrieval method
•  Type Prediction
•  Contribution : FQL & Feature-based method
•  Combine into the Final Result
•  Rank-list merging using CORI [Callan,Lu,Croft95]
12

* Probabilistic Retrieval Model for Semi-structured data [ECIR09]

•  User’s Query

James Registration

•  Implicit Query-Field Mapping


13

* Mixture of Field LM vs. PRM-S


[Figure: query terms q1 ... qm scored against fields f1 ... fn; MFLM uses fixed field weights w1 ... wn, while PRM-S uses per-term mapping probabilities P(Fj | qi)]

MFLM:   $P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P_{QL}(q_i \mid f_j)$

PRM-S:  $P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P_M(F_j \mid q_i) \, P_{QL}(q_i \mid f_j)$

•  PRM-S outperforms MFLM [Ogilvie03] & BM25F [Robertson04]

(Using the TREC W3C Email Collection / Measured in MRR)
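To make the two formulas concrete, here is a minimal Python sketch (not the papers' implementation) of both scoring functions; the per-field query-likelihoods and mapping probabilities are made-up toy values.

```python
import math

# Toy per-field query-likelihoods P_QL(q_i | f_j) for one document,
# e.g. estimated from smoothed field language models (values invented).
P_QL = {
    "james":        {"from": 0.020, "to": 0.010, "subject": 0.001, "body": 0.002},
    "registration": {"from": 0.001, "to": 0.001, "subject": 0.030, "body": 0.005},
}

def mflm_score(query, field_weights):
    """Mixture of Field LMs: fixed field weights w_j shared by all query terms."""
    score = 0.0
    for q in query:
        score += math.log(sum(field_weights[f] * P_QL[q][f] for f in field_weights))
    return score

def prms_score(query, mapping_prob):
    """PRM-S: per-term mapping probabilities P_M(F_j | q_i) replace the fixed weights."""
    score = 0.0
    for q in query:
        score += math.log(sum(mapping_prob[q][f] * P_QL[q][f] for f in mapping_prob[q]))
    return score

query = ["james", "registration"]
weights = {"from": 0.25, "to": 0.25, "subject": 0.25, "body": 0.25}
mapping = {  # per-term mapping probabilities, e.g. inferred from collection statistics
    "james":        {"from": 0.5, "to": 0.4, "subject": 0.05, "body": 0.05},
    "registration": {"from": 0.05, "to": 0.05, "subject": 0.7, "body": 0.2},
}
print(mflm_score(query, weights), prms_score(query, mapping))
```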


14

* Type Prediction Methods


•  Field-based collection Query-Likelihood (FQL) [SIGIR10]
•  Calculate a QL score for each field of a collection
•  Combine the field-level scores into a collection score

[Figure: a document with fields f1, f2, ..., fn]

•  Feature-based Method [SIGIR10]
•  Combine existing type-prediction methods
•  Grid Search / SVM for finding combination weights
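A rough sketch of the FQL idea under simplifying assumptions: each document type has collection-level field language models, the query-likelihood of every field is computed, and the field scores are combined into one collection score (here a weighted sum in log space with uniform weights; the exact combination in [SIGIR10] may differ).

```python
import math

# Hypothetical collection-level field language models P(term | field, collection).
# In practice these would be estimated from all documents of each type.
field_lms = {
    "email":   {"from": {"james": 0.03}, "subject": {"registration": 0.02},
                "body": {"james": 0.004, "registration": 0.006}},
    "webpage": {"title": {"registration": 0.01},
                "body": {"james": 0.001, "registration": 0.003}},
}

EPS = 1e-6  # crude smoothing for unseen terms

def field_ql(query, lm):
    """Query likelihood of one field's collection-level language model (log space)."""
    return sum(math.log(lm.get(q, EPS)) for q in query)

def fql_score(query, collection, field_weights=None):
    """FQL: compute a QL score per field, then combine them into a collection score."""
    fields = field_lms[collection]
    if field_weights is None:                       # assumption: uniform field weights
        field_weights = {f: 1.0 / len(fields) for f in fields}
    return sum(field_weights[f] * field_ql(query, lm) for f, lm in fields.items())

query = ["james", "registration"]
scores = {c: fql_score(query, c) for c in field_lms}
print(max(scores, key=scores.get), scores)          # predicted document type
```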
15

* Type Prediction Performance


•  Pseudo-desktop Collections

•  CS Collection

(% of queries with correct prediction)

•  FQL improves performance over CQL


•  Combining features improves the performance further
16

* Would term-based search be sufficient?


•  Term-based search doesn’t always work
•  Sometimes a user doesn’t have ‘good’ keywords
•  Search is not always a preferred option [Teevan04]

•  Associative browsing as a solution


•  Human memory has an association mechanism [Tulving73]

[Figure: the system overview again; associations between documents provide an access path toward the target document when term-based search fails]
17

* Known-item Finding with Associative Browsing [CIKM10]
•  Associative Browsing Model
•  Extract concepts from document metadata
•  Build a network of concepts and documents
•  By combining features based on user’s feedback

[Figure: keyword search leads into a concept space (person, place, event concepts) and a document space (e-mail, bookmark, webpage, blog post, daily journal, news); links are scored by features such as term vector similarity, temporal similarity, tag similarity, string similarity, path/type similarity, and co-occurrence similarity]
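As an illustration only (not the [CIKM10] implementation), browsing suggestions for a document can be ranked by a weighted combination of the similarity features named in the figure; the feature values and weights below are invented.

```python
# Hypothetical feature values for candidate links from one source document.
candidates = {
    "email:travel-receipt": {"term": 0.62, "temporal": 0.90, "tag": 0.0, "path_type": 0.3},
    "webpage:conf-hotel":   {"term": 0.55, "temporal": 0.40, "tag": 1.0, "path_type": 0.0},
    "journal:2011-03-20":   {"term": 0.10, "temporal": 0.95, "tag": 0.0, "path_type": 0.0},
}

def link_score(features, weights):
    """Score a candidate link as a weighted sum of its similarity features."""
    return sum(weights[name] * value for name, value in features.items())

def rank_suggestions(candidates, weights, k=10):
    """Return the top-k browsing suggestions for display."""
    return sorted(candidates, key=lambda c: link_score(candidates[c], weights), reverse=True)[:k]

weights = {"term": 1.0, "temporal": 0.5, "tag": 0.8, "path_type": 0.2}  # to be learned from clicks
print(rank_suggestions(candidates, weights, k=3))
```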


18

* Known-item Finding with Associative Browsing [CIKM10]
•  User Interface
•  User clicks on suggestions for browsing
•  System uses the click data for training feature weights
•  Grid Search / RankSVM as learning methods

[Screenshot: browsing suggestions shown in the user interface]
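A minimal sketch of the grid-search option: enumerate candidate weight vectors and keep the one that ranks the clicked suggestion highest on average (MRR) over logged browsing events. The click-log layout and feature values are hypothetical; RankSVM would replace the exhaustive search with pairwise learning.

```python
import itertools

# Hypothetical click log: for each browsing event, the feature vectors of the
# shown candidates and which candidate the user actually clicked.
click_log = [
    {"candidates": {"a": (0.9, 0.1), "b": (0.2, 0.8), "c": (0.4, 0.4)}, "clicked": "b"},
    {"candidates": {"d": (0.7, 0.3), "e": (0.6, 0.9)},                  "clicked": "e"},
]

def mrr(weights, log):
    """Mean reciprocal rank of the clicked item under a given weight vector."""
    total = 0.0
    for event in log:
        scores = {c: sum(w * f for w, f in zip(weights, feats))
                  for c, feats in event["candidates"].items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        total += 1.0 / (ranked.index(event["clicked"]) + 1)
    return total / len(log)

# Grid search over a coarse set of weight values for the two features.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best = max(itertools.product(grid, repeat=2), key=lambda w: mrr(w, click_log))
print("learned weights:", best, "MRR:", mrr(best, click_log))
```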
19

* The Quality of Browsing Suggestions


•  For Concept Browsing

(Using the CS Collection, Measured in MRR)

•  For Document Browsing

•  The gains are partly due to personalization


20

* Summary – Retrieval Models


•  Term-based Search vs. Associative Browsing

                     Term-based Search     Associative Browsing
User's Knowledge     On the target item    On a related item
User's Input         Typing a query        Clicking on suggestions

•  Technical Contributions
•  Exploiting field structure: PRM-S retrieval method, FQL type prediction method
•  Exploiting user feedback: feature-based type prediction, feature-based browsing suggestions
21

EVALUATION METHODS FOR


PERSONAL INFORMATION
(Completed Work)
22

* Challenges in Personal Search Evaluation


•  Hard to create a ‘test-collection’
•  Each user has different documents and habits

•  Privacy concerns
•  People will not donate their documents and queries for research

•  Can’t we just do some diary study?


•  Deploy software to users’ machine and see the long-term usage
23

* Problems with User Studies


•  It’s costly
•  A ‘working’ system should be implemented
•  Participants should be using it for a long time

•  Experimental control is hard


•  You need to double the participant for each control variable

•  Data is not reusable by third parties


•  The findings cannot be repeated by others

•  How can we evaluate with low cost and repeatability?


24

* Solution : Simulated Evaluation


•  Basic idea: simulate part of the user's interaction

Components of evaluation and how each method handles them:
•  Collection (documents, metadata, usage logs): the user's own documents in a diary study; documents collected from public sources in DocTrack and the pseudo-desktop method
•  Task (known-item finding, topical search, ...): actual tasks in a diary study; simulated tasks in DocTrack and the pseudo-desktop method
•  Interaction (query, click-through, scroll, ...): human interaction in a diary study and DocTrack; algorithmic generation in the pseudo-desktop method
25

* DocTrack Game [SIGIR10]


•  Procedure
•  Collect public documents in UMass CS department
•  Build a web interface where participants can find documents
•  Ask CS department people to join and compete

•  Benefits
•  Participants are motivated to contribute the data
•  Resulting queries and logs are reusable
•  Free from privacy concerns
•  Low cost compared to a traditional user study
26

* DocTrack Game

[Figure: game flow between system and user, ending at the target item]
•  System: randomly choose two candidate documents
•  User: skim through both documents (15 seconds each)
•  System: randomly pick one of them as the target document ("Find it!")
•  User: use keyword search to find the target document
•  System: generate a ranked list for each keyword search
27

* Pseudo-desktop Method [CIKM09]


•  Collect documents of reasonable size and variety
•  Filter an existing email collection by a person’s name
•  Use web search to collect documents mentioning the person

•  Generate queries automatically


•  Randomly select a target document
•  Take terms from the document algorithmically

Example query: "James Registration"
28

* Query Generation and Validation


•  Parameters of Query Generation
•  Choice of extent : Document [Azzopardi07] vs. Field [CIKM09]
•  Choice of term : Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]

•  Validation by Manual Queries


•  Compare Query-terms [CIKM09]
•  Compare Retrieval Scores [Azzopardi07]
•  Two-sided Kolmogorov-Smirnov test

•  Experimental Results
•  Field-based generation shows higher validity using both methods
(in TREC Email and Pseudo-desktop collections)
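The generation parameters above can be sketched as follows, assuming a target document is given as field-to-term-count mappings; the query length, IDF values, and sampling details are assumptions, not the exact procedure of [CIKM09] or [Azzopardi07].

```python
import random

def term_distribution(counts, idf, mode):
    """Build a sampling distribution over terms: uniform, TF, IDF or TF-IDF."""
    if mode == "uniform":
        w = {t: 1.0 for t in counts}
    elif mode == "tf":
        w = dict(counts)
    elif mode == "idf":
        w = {t: idf.get(t, 1.0) for t in counts}
    else:  # "tfidf"
        w = {t: c * idf.get(t, 1.0) for t, c in counts.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}

def generate_query(doc_fields, idf, extent="field", mode="tfidf", length=2):
    """Generate a known-item query for one target document."""
    if extent == "document":                      # document-based extent [Azzopardi07]
        counts = {}
        for field_counts in doc_fields.values():
            for t, c in field_counts.items():
                counts[t] = counts.get(t, 0) + c
    else:                                         # field-based extent [CIKM09]
        counts = doc_fields[random.choice(list(doc_fields))]
    dist = term_distribution(counts, idf, mode)
    terms, probs = zip(*dist.items())
    return random.choices(terms, weights=probs, k=length)

doc = {"subject": {"registration": 2, "deadline": 1}, "from": {"james": 1}}
idf = {"registration": 3.2, "deadline": 2.5, "james": 4.0}
print(generate_query(doc, idf))
```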
29

* Summary – Evaluation Methods


•  Comparison of Evaluation Methods
                     User Study                     DocTrack             Pseudo-desktop
Simulated Part       None                           Collection / Task    Collection / Task / Interaction
Human Involvement    Actual user (privacy issue)    Game participants    None

•  Our Contributions
•  DocTrack game + CS Collection
•  A platform for game-based user study in PIR

•  Pseudo-desktop method + Pseudo-desktop Collection


•  Field-based query generation method
30

* Community Efforts based on the Datasets


31

PROPOSED WORK
32

* Contributions Review
•  In Personal Information Retrieval
•  Two general retrieval methods
•  Two novel evaluation methods

•  In Related Areas
•  Structured Document Retrieval
•  Previous work: Mixture of Field LM, BM25F
•  Completed: PRM-S [ECIR09], PRM-D [CIKM09]
•  Proposed: Analyzing and improving PRM-S

•  Federated Search
•  Previous work: Collection Query Likelihood, Vertical Selection
•  Completed: Field-based CQL [SIGIR10], Feature-based Type Prediction [SIGIR10]

•  Associative Browsing
•  Previous work: Cosine similarity (find-similar)
•  Completed: Feature-based Suggestion [CIKM10]
•  Proposed: User Modeling for Browsing

•  Known-item Finding
•  Previous work: Query Generation, PageHunt Game
•  Completed: Field-based Query Generation [CIKM09], DocTrack Game [SIGIR10]
•  Proposed: Improving Query Generation
33

* Analyzing the PRM-S Method


•  Factors affecting the performance of PRM-S
•  Query characteristics
•  Length, field mappings
•  Collection characteristics
•  Number, languages of fields

•  Understanding these characteristics


•  Experiment with query generation methods
•  Experiment with various collections
•  REXA Academic Paper Collection (with actual query logs)
•  Enron Email Collection (queries collected by DocTrack)
34

* Improving the PRM-S Method


•  Improve the estimation of mapping probability
•  Mapping probability is estimated independently per query term

•  Other sources of mapping estimation


•  Dependency between query-terms
•  Dependency in term occurrence across different fields
•  Phrase (bi-gram)

•  Cast it as a sequential labeling problem


•  Conditional Random Field as Learning Method
•  Requires lots of training queries
[Figure: each query term (Term1, Term2, Term3) is assigned a field label (F1, F2, F3) in sequence, from Start to End]
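One possible sketch of the sequential-labeling formulation, using the third-party sklearn-crfsuite package (a choice made here for illustration; the proposal only fixes CRFs as the learning method). Features capture the term itself, its neighbors (term dependency), and which fields the term occurs in; the tiny training set is invented.

```python
# A rough sketch of labeling query terms with fields as a sequence.
import sklearn_crfsuite

def term_features(query, i, field_stats):
    """Features for query term i: the term, its neighbors, and field-occurrence flags."""
    feats = {"term": query[i],
             "prev": query[i - 1] if i > 0 else "<s>",
             "next": query[i + 1] if i + 1 < len(query) else "</s>"}
    for field, vocab in field_stats.items():
        feats[f"in_{field}"] = query[i] in vocab
    return feats

# Hypothetical training data: queries with manually labeled field mappings.
field_stats = {"from": {"james"}, "subject": {"registration", "meeting"}}
queries = [["james", "registration"], ["meeting", "james"]]
labels = [["from", "subject"], ["subject", "from"]]

X = [[term_features(q, i, field_stats) for i in range(len(q))] for q in queries]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, labels)
print(crf.predict([[term_features(["james", "meeting"], i, field_stats) for i in range(2)]]))
```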


35

* More Realistic Query Generation Method


•  HMM-based Query Generation Method
•  Searcher remembers aspects of the target item in sequence
•  Each aspect (field) generates query-terms

[Figure: an HMM whose hidden states are fields (F1, F2, F3) between Start and End; each state emits a query term (Term1, Term2, Term3)]
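A minimal sketch of this generation process with assumed parameters: hidden states are fields, transitions model which aspect the searcher recalls next, and each state emits a term from that field of the target document.

```python
import random

# Assumed HMM parameters (in the proposal they would be estimated from human queries).
transitions = {                       # P(next state | current state)
    "<start>": {"from": 0.5, "subject": 0.5},
    "from":    {"subject": 0.6, "<end>": 0.4},
    "subject": {"from": 0.1, "subject": 0.3, "<end>": 0.6},
}

def sample(dist):
    """Draw one key of a dict proportionally to its value."""
    keys, probs = zip(*dist.items())
    return random.choices(keys, weights=probs, k=1)[0]

def hmm_generate_query(doc_fields):
    """Walk the field states from <start> to <end>, emitting one term per state
    from the corresponding field of the target document."""
    query, state = [], "<start>"
    while True:
        state = sample(transitions[state])
        if state == "<end>":
            return query
        terms = doc_fields.get(state)
        if terms:                                  # emit a term from this field's LM
            query.append(sample(terms))

doc = {"from": {"james": 1.0}, "subject": {"registration": 0.7, "deadline": 0.3}}
print(hmm_generate_query(doc))
```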

•  Parameter Estimation
•  Based on human-generated queries
•  But how do we get a large quantity of manual queries?
36

* Improving Query Generation by Crowdsourcing

•  Interaction Scenario: a loop between the algorithm and human annotators
•  Gather a set of manual queries by asking "What's the query you would use to find the document?"
•  Initialize the query-generation parameters from the manual queries
•  Evaluate generated queries by asking "Which of the following queries would you use to find the document?"
•  Refine the query-generation parameters
•  Stop when generated queries are indistinguishable from manual queries
37

* Probabilistic User Modeling for PIR


•  Motivation
•  A user study with two access methods (term-based search and associative browsing) is expensive

[Figure: the system overview with keyword search and associative browsing over personal documents]

•  Can we simulate the user interaction in such circumstances?

•  Unified user model as a solution
•  Term-based search
•  Associative browsing
•  State transitions

[Figure: a state diagram with Start, Search, Browsing, and End states; transitions are labeled "Type Keyword" and "Click on Result"]
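A toy simulation of such a unified model: the simulated user moves between search and browsing according to state-transition probabilities and stops when the target is found or the session ends. All probabilities, the success model, and the step budget are assumptions.

```python
import random

# Assumed state-transition probabilities of the simulated searcher.
transitions = {
    "start":  {"search": 1.0},
    "search": {"browse": 0.4, "search": 0.3, "end": 0.3},
    "browse": {"browse": 0.5, "search": 0.2, "end": 0.3},
}

def step(state):
    options, probs = zip(*transitions[state].items())
    return random.choices(options, weights=probs, k=1)[0]

def simulate_session(p_find_by_search=0.3, p_find_by_browse=0.2, max_steps=20):
    """Return (found, number of actions) for one simulated known-item session."""
    state, steps = "start", 0
    while steps < max_steps:
        state = step(state)
        steps += 1
        if state == "end":
            return False, steps
        p_find = p_find_by_search if state == "search" else p_find_by_browse
        if random.random() < p_find:        # target clicked in the current result list
            return True, steps
    return False, steps

sessions = [simulate_session() for _ in range(10000)]
success = sum(found for found, _ in sessions) / len(sessions)
print(f"success rate: {success:.2f}")
```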
38

* Plan for Proposed Work


•  2011/3 - 2011/5
•  Analysis and improvement on PRM-S and query-generation

•  2011/9 - 2011/12
•  Unified user modeling for known-item finding

•  2012/1 – 2012/5
•  Additional experiments

•  2012/6 – 2012/8
•  Finalize the thesis
39

REFERENCES
40

* Major References
•  Desktop Search
•  Stuff I’ve Seen [Dumais et al.]
•  Semi-structured Document Retrieval
•  Mixture of Field Language Model [Ogilvie & Callan]
•  Federated Search
•  CORI method for rank-list merging [Callan et al.]
•  Associative Browsing
•  Find-similar method [Smucker & Allan]
•  Known-item Finding
•  Query generation method [Azzopardi et al.]
•  Human Computation Game
•  PageHunt [Ma & Chandrasekar]
41

OPTIONAL SLIDES
42

* Probabilistic Retrieval Model for Semi-structured data (PRM-S) [ECIR09]

•  Infer the mapping probability P(Fj | qi)

$P_M(F_j \mid q_i) = \frac{P_M(q_i \mid F_j)\, P_M(F_j)}{P(q_i)} \propto P_{QL}(q_i \mid F_j)\, P_M(F_j)$

•  Use P(Fj | qi) as per-term field weights

$P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P_M(F_j \mid q_i)\, P_{QL}(q_i \mid f_j)$

[Figure: query terms q1 ... qm mapped to fields f1 ... fn with probabilities P(Fj | qi)]
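A short sketch of this estimation: with collection-level field language models and a prior over fields (both toy values here), the mapping probability is the normalized product of the two, as in the formula above.

```python
# Toy collection-level field language models P_QL(term | field) for an e-mail
# collection (values are illustrative, not from the paper).
collection_field_lm = {
    "from":    {"james": 0.02,  "registration": 0.0001},
    "to":      {"james": 0.015, "registration": 0.0001},
    "subject": {"james": 0.001, "registration": 0.01},
    "body":    {"james": 0.002, "registration": 0.004},
}
field_prior = {f: 1.0 / len(collection_field_lm) for f in collection_field_lm}  # uniform P_M(F_j)

def mapping_probability(term):
    """P_M(F_j | q_i) proportional to P_QL(q_i | F_j) * P_M(F_j), normalized over fields."""
    unnorm = {f: lm.get(term, 1e-6) * field_prior[f] for f, lm in collection_field_lm.items()}
    z = sum(unnorm.values())
    return {f: v / z for f, v in unnorm.items()}

for term in ("james", "registration"):
    print(term, mapping_probability(term))
```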
43

* Merging into the Final Result


•  What we have for each collection
•  Type-specific ranking
•  Type score

•  CORI Algorithm for Merging [Callan,Lu,Croft95]


•  Use normalized collection and document scores
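A sketch of result merging in the CORI style, assuming the commonly published normalization and combination heuristic D'' = (D' + 0.4 * D' * C') / 1.4; whether the thesis uses exactly these constants is not stated on this slide.

```python
def min_max(x, lo, hi):
    """Normalize a score into [0, 1]; degenerate ranges map to 0."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def cori_merge(ranked_lists, collection_scores):
    """ranked_lists: {collection: [(doc_id, score), ...]}; collection_scores: {collection: score}.
    Returns one merged ranked list using the common CORI heuristic (constants assumed)."""
    c_lo, c_hi = min(collection_scores.values()), max(collection_scores.values())
    merged = []
    for coll, docs in ranked_lists.items():
        c_norm = min_max(collection_scores[coll], c_lo, c_hi)
        d_scores = [s for _, s in docs]
        d_lo, d_hi = min(d_scores), max(d_scores)
        for doc_id, s in docs:
            d_norm = min_max(s, d_lo, d_hi)
            merged.append((doc_id, (d_norm + 0.4 * d_norm * c_norm) / 1.4))
    return sorted(merged, key=lambda x: x[1], reverse=True)

lists = {"email": [("e1", -3.2), ("e2", -4.0)], "webpage": [("w1", -2.5), ("w2", -5.1)]}
print(cori_merge(lists, {"email": 0.8, "webpage": 0.3}))
```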
* A User Model for Associative Browsing
•  User’s level of knowledge
•  Random : randomly click on a ranked list
•  Informed
•  Oracle : always click on the best possible item

•  User’s browsing behavior


•  Fan-out : the number of clicks per ranked list
•  BFS vs. DFS : the order in which documents are visited

[Figure: with fan-out 1 the user visits suggestions 1 through 7 in a single chain; with fan-out 2 the user clicks two suggestions per list, visiting them in BFS or DFS order]
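A small sketch of these browsing-behavior parameters: given per-document ranked suggestion lists (invented here), the simulated user clicks the top fan_out suggestions of each visited document and proceeds in BFS or DFS order until the target is found or a click budget is spent. The exact click-counting convention is an assumption.

```python
from collections import deque

# Hypothetical ranked suggestion lists (document -> ordered suggestions).
suggestions = {
    "d1": ["d2", "d5", "d7"],
    "d2": ["d3", "d4"],
    "d5": ["d6"],
    "d3": [], "d4": [], "d6": [], "d7": [],
}

def browse(start, target, fan_out=2, order="bfs", budget=20):
    """Return the number of clicks needed to reach the target, or None."""
    frontier, visited, clicks = deque([start]), {start}, 0
    while frontier and clicks < budget:
        doc = frontier.popleft() if order == "bfs" else frontier.pop()
        children = [d for d in suggestions.get(doc, [])[:fan_out] if d not in visited]
        if order == "dfs":
            children.reverse()       # so the top-ranked suggestion is popped first
        for child in children:
            clicks += 1
            visited.add(child)
            if child == target:
                return clicks
            frontier.append(child)
    return None

print("BFS clicks:", browse("d1", "d6", fan_out=2, order="bfs"))
print("DFS clicks:", browse("d1", "d6", fan_out=2, order="dfs"))
```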
45

EXPERIMENTAL RESULTS
Term-based Search Model
46

* Experimental Setting
•  Pseudo-desktop Collections
•  Crawl of W3C mailing list & documents
•  Automatically generated queries
•  100 queries / average length of 2

•  CS Collection
•  UMass CS department webpages, emails, etc.
•  Human-formulated queries from DocTrack game
•  984 queries / average length 3.97

•  Other details
•  Mean Reciprocal Rank was used for evaluation
47

* Collection Statistics
•  Pseudo-desktop Collections

(#Docs (Length))

•  CS Collection
48

* Validation of Generated Queries


In W3C Email Collection [CIKM09]
•  Compare Query-terms

•  Compare the Distribution of Retrieval Scores


49

* Type Prediction Performance


•  Pseudo-desktop Collections

•  CS Collection

(% of queries with correct prediction)

•  FQL improves performance over CQL


•  Combining features improves the performance further
50

* Retrieval Performance
•  Pseudo-desktop Collections

Best: use the best type-specific retrieval method
Oracle: predict the correct type perfectly

•  CS Collection

(Mean Reciprocal Rank)


51

EXPERIMENTAL RESULTS
Associative Browsing Model
52

* Experimental Setting
•  Collections
•  Two personal collections of volunteers & their click data
•  CS Collection & click data collected from the DocTrack game

•  Collection Statistics

•  The Role of Browsing


53

* The Quality of Browsing Suggestions


•  For Concept Browsing

•  For Document Browsing
