
1

3/24/2011, Thesis Proposal Defense

Retrieval and Evaluation Techniques for

PERSONAL INFORMATION

Jinyoung Kim
Advisor : W. Bruce Croft
2

* Outline

•  Introduction

•  Retrieval Models (completed work)

•  Evaluation Techniques (completed work)

•  Proposed Work
3

PROBLEM OVERVIEW
4

* Personal Information Retrieval (PIR)


•  Retrieval of people’s own information
•  An example of desktop search
5

* Another Example

A tweet can have this much information!
6

* Characteristics & Related Areas


•  Many document types
•  Related area : Federated search

•  Unique metadata for each type


•  Related area : Structured document retrieval

•  Long-term interaction with a single user


•  Related area : Interactive IR / Search personalization

•  People mostly do re-finding


•  Related area : Known-item finding
7

* Previous Work for PIR


•  Major Focus
•  User Interface Issues [Dumais03,06]
•  Desktop-specific Features [Soules06] [Cohen08]

•  Limitations
•  Each study proposed its own retrieval method
•  Each study was evaluated in a different user study
•  None of them performed a comparative evaluation
8

* Contributions Overview
•  General Retrieval Models for PIR
•  Term-based search model
•  Associative browsing model

[Figure: a keyword query is answered by term-based search over personal documents of many types (e-mail, bookmark, blog post, webpage, daily journal, news), which are also linked for associative browsing]

•  Evaluation Models for PIR
•  Simulation-based evaluation method
•  Game-based evaluation method

•  Novel Techniques for Related Areas


•  A retrieval method for structured document retrieval
•  A type prediction method for structured document retrieval
•  An adaptive method for creating browsing suggestions
•  Evaluation methods for known-item finding
9

* Major Publications
•  [ECIR09]
•  A Probabilistic Retrieval Model for Semi-structured Data
•  Jinyoung Kim, Xiaobing Xue and W. Bruce Croft in ECIR'09

•  [CIKM09]
•  Retrieval Experiments using Pseudo-Desktop Collections
•  Jinyoung Kim and W. Bruce Croft in CIKM'09

•  [SIGIR10]
•  Ranking using Multiple Document Types in Desktop Search
•  Jinyoung Kim and W. Bruce Croft in SIGIR'10

•  [CIKM10]
•  Building a Semantic Representation for Personal Information
•  Jinyoung Kim, Anton Bakalov, David A. Smith and W. Bruce Croft in CIKM'10
10

RETRIEVAL MODELS FOR


PERSONAL INFORMATION
(Completed Work)
11

* Term-based Search Model [SIGIR10]


•  Type-specific Ranking
•  Contribution : PRM-S retrieval method
•  Type Prediction
•  Contribution : FQL & Feature-based method
•  Combine into the Final Result
•  Rank-list merging using CORI [Callan,Lu,Croft95]
12

* Probabilistic Retrieval Model for Semi-structured data [ECIR09]

•  User’s Query

James Registration

•  Implicit Query-Field Mapping


13

* Mixture of Field LM vs. PRM-S


[Figure: query terms q1 ... qm scored against fields f1 ... fn; MFLM uses fixed field weights w1 ... wn, while PRM-S uses per-term mapping probabilities P(Fj | qi)]

MFLM:   $P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P_{QL}(q_i \mid f_j)$

PRM-S:  $P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P_M(F_j \mid q_i) \, P_{QL}(q_i \mid f_j)$

•  PRM-S outperforms MFLM [Ogilvie03] & BM25F [Robertson04]

(Using the TREC W3C Email Collection / Measured in MRR)
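To make the two formulas concrete, here is a minimal Python sketch (not the papers' implementation) of both scoring functions; the per-field query-likelihoods and mapping probabilities are made-up toy values.

```python
import math

# Toy per-field query-likelihoods P_QL(q_i | f_j) for one document,
# e.g. estimated from smoothed field language models (values invented).
P_QL = {
    "james":        {"from": 0.020, "to": 0.010, "subject": 0.001, "body": 0.002},
    "registration": {"from": 0.001, "to": 0.001, "subject": 0.030, "body": 0.005},
}

def mflm_score(query, field_weights):
    """Mixture of Field LMs: fixed field weights w_j shared by all query terms."""
    score = 0.0
    for q in query:
        score += math.log(sum(field_weights[f] * P_QL[q][f] for f in field_weights))
    return score

def prms_score(query, mapping_prob):
    """PRM-S: per-term mapping probabilities P_M(F_j | q_i) replace the fixed weights."""
    score = 0.0
    for q in query:
        score += math.log(sum(mapping_prob[q][f] * P_QL[q][f] for f in mapping_prob[q]))
    return score

query = ["james", "registration"]
weights = {"from": 0.25, "to": 0.25, "subject": 0.25, "body": 0.25}
mapping = {  # per-term mapping probabilities, e.g. inferred from collection statistics
    "james":        {"from": 0.5, "to": 0.4, "subject": 0.05, "body": 0.05},
    "registration": {"from": 0.05, "to": 0.05, "subject": 0.7, "body": 0.2},
}
print(mflm_score(query, weights), prms_score(query, mapping))
```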


14

* Type Prediction Methods


•  Field-based collection Query-Likelihood (FQL) [SIGIR10]
•  Calculate a QL score for each field of a collection
•  Combine the field-level scores into a collection score

[Figure: a document with fields f1, f2, ..., fn]

•  Feature-based Method [SIGIR10]
•  Combine existing type-prediction methods
•  Grid Search / SVM for finding combination weights
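A rough sketch of the FQL idea under simplifying assumptions: each document type has collection-level field language models, the query-likelihood of every field is computed, and the field scores are combined into one collection score (here a weighted sum in log space with uniform weights; the exact combination in [SIGIR10] may differ).

```python
import math

# Hypothetical collection-level field language models P(term | field, collection).
# In practice these would be estimated from all documents of each type.
field_lms = {
    "email":   {"from": {"james": 0.03}, "subject": {"registration": 0.02},
                "body": {"james": 0.004, "registration": 0.006}},
    "webpage": {"title": {"registration": 0.01},
                "body": {"james": 0.001, "registration": 0.003}},
}

EPS = 1e-6  # crude smoothing for unseen terms

def field_ql(query, lm):
    """Query likelihood of one field's collection-level language model (log space)."""
    return sum(math.log(lm.get(q, EPS)) for q in query)

def fql_score(query, collection, field_weights=None):
    """FQL: compute a QL score per field, then combine them into a collection score."""
    fields = field_lms[collection]
    if field_weights is None:                       # assumption: uniform field weights
        field_weights = {f: 1.0 / len(fields) for f in fields}
    return sum(field_weights[f] * field_ql(query, lm) for f, lm in fields.items())

query = ["james", "registration"]
scores = {c: fql_score(query, c) for c in field_lms}
print(max(scores, key=scores.get), scores)          # predicted document type
```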
15

* Type Prediction Performance


•  Pseudo-desktop Collections

•  CS Collection

(% of queries with correct prediction)

•  FQL improves performance over CQL


•  Combining features improves the performance further
16

* Would term-based search be sufficient?


•  Term-based search doesn’t always work
•  Sometimes a user doesn’t have ‘good’ keywords
•  Search is not always a preferred option [Teevan04]

•  Associative browsing as a solution


•  Human memory has an association mechanism [Tulving73]

[Figure: the system overview again; associations between documents provide an access path toward the target document when term-based search fails]
17

* Known-item Finding with Associative Browsing [CIKM10]
•  Associative Browsing Model
•  Extract concepts from document metadata
•  Build a network of concepts and documents
•  By combining features based on user’s feedback

[Figure: keyword search leads into a concept space (person, place, event concepts) and a document space (e-mail, bookmark, webpage, blog post, daily journal, news); links are scored by features such as term vector similarity, temporal similarity, tag similarity, string similarity, path/type similarity, and co-occurrence similarity]
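As an illustration only (not the [CIKM10] implementation), browsing suggestions for a document can be ranked by a weighted combination of the similarity features named in the figure; the feature values and weights below are invented.

```python
# Hypothetical feature values for candidate links from one source document.
candidates = {
    "email:travel-receipt": {"term": 0.62, "temporal": 0.90, "tag": 0.0, "path_type": 0.3},
    "webpage:conf-hotel":   {"term": 0.55, "temporal": 0.40, "tag": 1.0, "path_type": 0.0},
    "journal:2011-03-20":   {"term": 0.10, "temporal": 0.95, "tag": 0.0, "path_type": 0.0},
}

def link_score(features, weights):
    """Score a candidate link as a weighted sum of its similarity features."""
    return sum(weights[name] * value for name, value in features.items())

def rank_suggestions(candidates, weights, k=10):
    """Return the top-k browsing suggestions for display."""
    return sorted(candidates, key=lambda c: link_score(candidates[c], weights), reverse=True)[:k]

weights = {"term": 1.0, "temporal": 0.5, "tag": 0.8, "path_type": 0.2}  # to be learned from clicks
print(rank_suggestions(candidates, weights, k=3))
```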


18

* Known-item Finding with Associative Browsing [CIKM10]
•  User Interface
•  User clicks on suggestions for browsing
•  System uses the click data for training feature weights
•  Grid Search / RankSVM as learning methods

[Screenshot: browsing suggestions shown in the user interface]
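A minimal sketch of the grid-search option: enumerate candidate weight vectors and keep the one that ranks the clicked suggestion highest on average (MRR) over logged browsing events. The click-log layout and feature values are hypothetical; RankSVM would replace the exhaustive search with pairwise learning.

```python
import itertools

# Hypothetical click log: for each browsing event, the feature vectors of the
# shown candidates and which candidate the user actually clicked.
click_log = [
    {"candidates": {"a": (0.9, 0.1), "b": (0.2, 0.8), "c": (0.4, 0.4)}, "clicked": "b"},
    {"candidates": {"d": (0.7, 0.3), "e": (0.6, 0.9)},                  "clicked": "e"},
]

def mrr(weights, log):
    """Mean reciprocal rank of the clicked item under a given weight vector."""
    total = 0.0
    for event in log:
        scores = {c: sum(w * f for w, f in zip(weights, feats))
                  for c, feats in event["candidates"].items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        total += 1.0 / (ranked.index(event["clicked"]) + 1)
    return total / len(log)

# Grid search over a coarse set of weight values for the two features.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best = max(itertools.product(grid, repeat=2), key=lambda w: mrr(w, click_log))
print("learned weights:", best, "MRR:", mrr(best, click_log))
```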
19

* The Quality of Browsing Suggestions


•  For Concept Browsing

(Using the CS Collection, Measured in MRR)

•  For Document Browsing

•  The gains are partly due to personalization


20

* Summary – Retrieval Models


•  Term-based Search vs. Associative Browsing

                     Term-based Search     Associative Browsing
User's Knowledge     On the target item    On a related item
User's Input         Typing a query        Clicking on suggestions

•  Technical Contributions
•  Exploiting field structure: PRM-S retrieval method, FQL type prediction method
•  Exploiting user feedback: feature-based type prediction, feature-based browsing suggestions
21

EVALUATION METHODS FOR


PERSONAL INFORMATION
(Completed Work)
22

* Challenges in Personal Search Evaluation


•  Hard to create a ‘test-collection’
•  Each user has different documents and habits

•  Privacy concerns
•  People will not donate their documents and queries for research

•  Can’t we just do some diary study?


•  Deploy software to users’ machine and see the long-term usage
23

* Problems with User Studies


•  It’s costly
•  A ‘working’ system should be implemented
•  Participants should be using it for a long time

•  Experimental control is hard


•  You need to double the participant for each control variable

•  Data is not reusable by third parties


•  The findings cannot be repeated by others

•  How can we evaluate with low cost and repeatability?


24

* Solution : Simulated Evaluation


•  Basic idea: simulate part of the user's interaction

Components of evaluation and how each method handles them:
•  Collection (documents, metadata, usage logs): the user's own documents in a diary study; documents collected from public sources in DocTrack and the pseudo-desktop method
•  Task (known-item finding, topical search, ...): actual tasks in a diary study; simulated tasks in DocTrack and the pseudo-desktop method
•  Interaction (query, click-through, scroll, ...): human interaction in a diary study and DocTrack; algorithmic generation in the pseudo-desktop method
25

* DocTrack Game [SIGIR10]


•  Procedure
•  Collect public documents in UMass CS department
•  Build a web interface where participants can find documents
•  Ask CS department people to join and compete

•  Benefits
•  Participants are motivated to contribute the data
•  Resulting queries and logs are reusable
•  Free from privacy concerns
•  Low cost compared to a traditional user study
26

* DocTrack Game

[Figure: game flow between system and user, ending at the target item]
•  System: randomly choose two candidate documents
•  User: skim through both documents (15 seconds each)
•  System: randomly pick one of them as the target document ("Find it!")
•  User: use keyword search to find the target document
•  System: generate a ranked list for each keyword search
27

* Pseudo-desktop Method [CIKM09]


•  Collect documents of reasonable size and variety
•  Filter an existing email collection by a person’s name
•  Use web search to collect documents mentioning the person

•  Generate queries automatically


•  Randomly select a target document
•  Take terms from the document algorithmically

Example query: "James Registration"
28

* Query Generation and Validation


•  Parameters of Query Generation
•  Choice of extent : Document [Azzopardi07] vs. Field [CIKM09]
•  Choice of term : Uniform vs. TF vs. IDF vs. TF-IDF [Azzopardi07]

•  Validation by Manual Queries


•  Compare Query-terms [CIKM09]
•  Compare Retrieval Scores [Azzopardi07]
•  Two-sided Kolmogorov-Smirnov test

•  Experimental Results
•  Field-based generation shows higher validity using both methods
(in TREC Email and Pseudo-desktop collections)
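The generation parameters above can be sketched as follows, assuming a target document is given as field-to-term-count mappings; the query length, IDF values, and sampling details are assumptions, not the exact procedure of [CIKM09] or [Azzopardi07].

```python
import random

def term_distribution(counts, idf, mode):
    """Build a sampling distribution over terms: uniform, TF, IDF or TF-IDF."""
    if mode == "uniform":
        w = {t: 1.0 for t in counts}
    elif mode == "tf":
        w = dict(counts)
    elif mode == "idf":
        w = {t: idf.get(t, 1.0) for t in counts}
    else:  # "tfidf"
        w = {t: c * idf.get(t, 1.0) for t, c in counts.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}

def generate_query(doc_fields, idf, extent="field", mode="tfidf", length=2):
    """Generate a known-item query for one target document."""
    if extent == "document":                      # document-based extent [Azzopardi07]
        counts = {}
        for field_counts in doc_fields.values():
            for t, c in field_counts.items():
                counts[t] = counts.get(t, 0) + c
    else:                                         # field-based extent [CIKM09]
        counts = doc_fields[random.choice(list(doc_fields))]
    dist = term_distribution(counts, idf, mode)
    terms, probs = zip(*dist.items())
    return random.choices(terms, weights=probs, k=length)

doc = {"subject": {"registration": 2, "deadline": 1}, "from": {"james": 1}}
idf = {"registration": 3.2, "deadline": 2.5, "james": 4.0}
print(generate_query(doc, idf))
```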
29

* Summary – Evaluation Methods


•  Comparison of Evaluation Methods
                     User Study                     DocTrack             Pseudo-desktop
Simulated Part       None                           Collection / Task    Collection / Task / Interaction
Human Involvement    Actual user (privacy issue)    Game participants    None

•  Our Contributions
•  DocTrack game + CS Collection
•  A platform for game-based user study in PIR

•  Pseudo-desktop method + Pseudo-desktop Collection


•  Field-based query generation method
30

* Community Efforts based on the Datasets


31

PROPOSED WORK
32

* Contributions Review
•  In Personal Information Retrieval
•  Two general retrieval methods
•  Two novel evaluation methods

•  In Related Areas
•  Structured Document Retrieval
•  Previous work: Mixture of Field LM, BM25F
•  Completed: PRM-S [ECIR09], PRM-D [CIKM09]
•  Proposed: Analyzing and improving PRM-S

•  Federated Search
•  Previous work: Collection Query Likelihood, Vertical Selection
•  Completed: Field-based CQL [SIGIR10], Feature-based Type Prediction [SIGIR10]

•  Associative Browsing
•  Previous work: Cosine similarity (find-similar)
•  Completed: Feature-based Suggestion [CIKM10]
•  Proposed: User Modeling for Browsing

•  Known-item Finding
•  Previous work: Query Generation, PageHunt Game
•  Completed: Field-based Query Generation [CIKM09], DocTrack Game [SIGIR10]
•  Proposed: Improving Query Generation
33

* Analyzing the PRM-S Method


•  Factors affecting the performance of PRM-S
•  Query characteristics
•  Length, field mappings
•  Collection characteristics
•  Number, languages of fields

•  Understanding these characteristics


•  Experiment with query generation methods
•  Experiment with various collections
•  REXA Academic Paper Collection (with actual query logs)
•  Enron Email Collection (queries collected by DocTrack)
34

* Improving the PRM-S Method


•  Improve the estimation of mapping probability
•  Mapping probability is estimated independently per query term

•  Other sources of mapping estimation


•  Dependency between query-terms
•  Dependency in term occurrence across different fields
•  Phrase (bi-gram)

•  Cast it as a sequential labeling problem


•  Conditional Random Field as Learning Method
•  Requires lots of training queries
[Figure: each query term (Term1, Term2, Term3) is assigned a field label (F1, F2, F3) in sequence, from Start to End]
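One possible sketch of the sequential-labeling formulation, using the third-party sklearn-crfsuite package (a choice made here for illustration; the proposal only fixes CRFs as the learning method). Features capture the term itself, its neighbors (term dependency), and which fields the term occurs in; the tiny training set is invented.

```python
# A rough sketch of labeling query terms with fields as a sequence.
import sklearn_crfsuite

def term_features(query, i, field_stats):
    """Features for query term i: the term, its neighbors, and field-occurrence flags."""
    feats = {"term": query[i],
             "prev": query[i - 1] if i > 0 else "<s>",
             "next": query[i + 1] if i + 1 < len(query) else "</s>"}
    for field, vocab in field_stats.items():
        feats[f"in_{field}"] = query[i] in vocab
    return feats

# Hypothetical training data: queries with manually labeled field mappings.
field_stats = {"from": {"james"}, "subject": {"registration", "meeting"}}
queries = [["james", "registration"], ["meeting", "james"]]
labels = [["from", "subject"], ["subject", "from"]]

X = [[term_features(q, i, field_stats) for i in range(len(q))] for q in queries]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, labels)
print(crf.predict([[term_features(["james", "meeting"], i, field_stats) for i in range(2)]]))
```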


35

* More Realistic Query Generation Method


•  HMM-based Query Generation Method
•  Searcher remembers aspects of the target item in sequence
•  Each aspect (field) generates query-terms

[Figure: an HMM whose hidden states are fields (F1, F2, F3) between Start and End; each state emits a query term (Term1, Term2, Term3)]
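A minimal sketch of this generation process with assumed parameters: hidden states are fields, transitions model which aspect the searcher recalls next, and each state emits a term from that field of the target document.

```python
import random

# Assumed HMM parameters (in the proposal they would be estimated from human queries).
transitions = {                       # P(next state | current state)
    "<start>": {"from": 0.5, "subject": 0.5},
    "from":    {"subject": 0.6, "<end>": 0.4},
    "subject": {"from": 0.1, "subject": 0.3, "<end>": 0.6},
}

def sample(dist):
    """Draw one key of a dict proportionally to its value."""
    keys, probs = zip(*dist.items())
    return random.choices(keys, weights=probs, k=1)[0]

def hmm_generate_query(doc_fields):
    """Walk the field states from <start> to <end>, emitting one term per state
    from the corresponding field of the target document."""
    query, state = [], "<start>"
    while True:
        state = sample(transitions[state])
        if state == "<end>":
            return query
        terms = doc_fields.get(state)
        if terms:                                  # emit a term from this field's LM
            query.append(sample(terms))

doc = {"from": {"james": 1.0}, "subject": {"registration": 0.7, "deadline": 0.3}}
print(hmm_generate_query(doc))
```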

•  Parameter Estimation
•  Based on human-generated queries
•  But how do we get a large quantity of manual queries?
36

* Improving Query Generation by Crowdsourcing

•  Interaction Scenario: a loop between the algorithm and human annotators
•  Gather a set of manual queries by asking "What's the query you would use to find the document?"
•  Initialize the query-generation parameters from the manual queries
•  Evaluate generated queries by asking "Which of the following queries would you use to find the document?"
•  Refine the query-generation parameters
•  Stop when generated queries are indistinguishable from manual queries
37

* Probabilistic User Modeling for PIR


•  Motivation
•  A user study with two access methods (term-based search and associative browsing) is expensive

[Figure: the system overview with keyword search and associative browsing over personal documents]

•  Can we simulate the user interaction in such circumstances?

•  Unified user model as a solution
•  Term-based search
•  Associative browsing
•  State transitions

[Figure: a state diagram with Start, Search, Browsing, and End states; transitions are labeled "Type Keyword" and "Click on Result"]
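A toy simulation of such a unified model: the simulated user moves between search and browsing according to state-transition probabilities and stops when the target is found or the session ends. All probabilities, the success model, and the step budget are assumptions.

```python
import random

# Assumed state-transition probabilities of the simulated searcher.
transitions = {
    "start":  {"search": 1.0},
    "search": {"browse": 0.4, "search": 0.3, "end": 0.3},
    "browse": {"browse": 0.5, "search": 0.2, "end": 0.3},
}

def step(state):
    options, probs = zip(*transitions[state].items())
    return random.choices(options, weights=probs, k=1)[0]

def simulate_session(p_find_by_search=0.3, p_find_by_browse=0.2, max_steps=20):
    """Return (found, number of actions) for one simulated known-item session."""
    state, steps = "start", 0
    while steps < max_steps:
        state = step(state)
        steps += 1
        if state == "end":
            return False, steps
        p_find = p_find_by_search if state == "search" else p_find_by_browse
        if random.random() < p_find:        # target clicked in the current result list
            return True, steps
    return False, steps

sessions = [simulate_session() for _ in range(10000)]
success = sum(found for found, _ in sessions) / len(sessions)
print(f"success rate: {success:.2f}")
```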
38

* Plan for Proposed Work


•  2011/3 - 2011/5
•  Analysis and improvement on PRM-S and query-generation

•  2011/9 - 2011/12
•  Unified user modeling for known-item finding

•  2012/1 – 2012/5
•  Additional experiments

•  2012/6 – 2012/8
•  Finalize the thesis
39

REFERENCES
40

* Major References
•  Desktop Search
•  Stuff I’ve Seen [Dumais et al.]
•  Semi-structured Document Retrieval
•  Mixture of Field Language Model [Ogilvie & Callan]
•  Federated Search
•  CORI method for rank-list merging [Callan et al.]
•  Associative Browsing
•  Find-similar method [Smucker & Allan]
•  Known-item Finding
•  Query generation method [Azzopardi et al.]
•  Human Computation Game
•  PageHunt [Ma & Chandrasekar]
41

OPTIONAL SLIDES
42

* Probabilistic Retrieval Model for Semi-structured data (PRM-S) [ECIR09]

•  Infer the mapping probability P(Fj | qi)

$P_M(F_j \mid q_i) = \frac{P_M(q_i \mid F_j)\, P_M(F_j)}{P(q_i)} \propto P_{QL}(q_i \mid F_j)\, P_M(F_j)$

•  Use P(Fj | qi) as per-term field weights

$P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P_M(F_j \mid q_i)\, P_{QL}(q_i \mid f_j)$

[Figure: query terms q1 ... qm mapped to fields f1 ... fn with probabilities P(Fj | qi)]
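A short sketch of this estimation: with collection-level field language models and a prior over fields (both toy values here), the mapping probability is the normalized product of the two, as in the formula above.

```python
# Toy collection-level field language models P_QL(term | field) for an e-mail
# collection (values are illustrative, not from the paper).
collection_field_lm = {
    "from":    {"james": 0.02,  "registration": 0.0001},
    "to":      {"james": 0.015, "registration": 0.0001},
    "subject": {"james": 0.001, "registration": 0.01},
    "body":    {"james": 0.002, "registration": 0.004},
}
field_prior = {f: 1.0 / len(collection_field_lm) for f in collection_field_lm}  # uniform P_M(F_j)

def mapping_probability(term):
    """P_M(F_j | q_i) proportional to P_QL(q_i | F_j) * P_M(F_j), normalized over fields."""
    unnorm = {f: lm.get(term, 1e-6) * field_prior[f] for f, lm in collection_field_lm.items()}
    z = sum(unnorm.values())
    return {f: v / z for f, v in unnorm.items()}

for term in ("james", "registration"):
    print(term, mapping_probability(term))
```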
43

* Merging into the Final Result


•  What we have for each collection
•  Type-specific ranking
•  Type score

•  CORI Algorithm for Merging [Callan,Lu,Croft95]


•  Use normalized collection and document scores
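A sketch of result merging in the CORI style, assuming the commonly published normalization and combination heuristic D'' = (D' + 0.4 * D' * C') / 1.4; whether the thesis uses exactly these constants is not stated on this slide.

```python
def min_max(x, lo, hi):
    """Normalize a score into [0, 1]; degenerate ranges map to 0."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def cori_merge(ranked_lists, collection_scores):
    """ranked_lists: {collection: [(doc_id, score), ...]}; collection_scores: {collection: score}.
    Returns one merged ranked list using the common CORI heuristic (constants assumed)."""
    c_lo, c_hi = min(collection_scores.values()), max(collection_scores.values())
    merged = []
    for coll, docs in ranked_lists.items():
        c_norm = min_max(collection_scores[coll], c_lo, c_hi)
        d_scores = [s for _, s in docs]
        d_lo, d_hi = min(d_scores), max(d_scores)
        for doc_id, s in docs:
            d_norm = min_max(s, d_lo, d_hi)
            merged.append((doc_id, (d_norm + 0.4 * d_norm * c_norm) / 1.4))
    return sorted(merged, key=lambda x: x[1], reverse=True)

lists = {"email": [("e1", -3.2), ("e2", -4.0)], "webpage": [("w1", -2.5), ("w2", -5.1)]}
print(cori_merge(lists, {"email": 0.8, "webpage": 0.3}))
```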
* A User Model for Associative Browsing
•  User’s level of knowledge
•  Random : randomly click on a ranked list
•  Informed
•  Oracle : always click on the best possible item

•  User’s browsing behavior


•  Fan-out : the number of clicks per ranked list
•  BFS vs. DFS : the order in which documents are visited

[Figure: with fan-out 1 the user visits suggestions 1 through 7 in a single chain; with fan-out 2 the user clicks two suggestions per list, visiting them in BFS or DFS order]
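A small sketch of these browsing-behavior parameters: given per-document ranked suggestion lists (invented here), the simulated user clicks the top fan_out suggestions of each visited document and proceeds in BFS or DFS order until the target is found or a click budget is spent. The exact click-counting convention is an assumption.

```python
from collections import deque

# Hypothetical ranked suggestion lists (document -> ordered suggestions).
suggestions = {
    "d1": ["d2", "d5", "d7"],
    "d2": ["d3", "d4"],
    "d5": ["d6"],
    "d3": [], "d4": [], "d6": [], "d7": [],
}

def browse(start, target, fan_out=2, order="bfs", budget=20):
    """Return the number of clicks needed to reach the target, or None."""
    frontier, visited, clicks = deque([start]), {start}, 0
    while frontier and clicks < budget:
        doc = frontier.popleft() if order == "bfs" else frontier.pop()
        children = [d for d in suggestions.get(doc, [])[:fan_out] if d not in visited]
        if order == "dfs":
            children.reverse()       # so the top-ranked suggestion is popped first
        for child in children:
            clicks += 1
            visited.add(child)
            if child == target:
                return clicks
            frontier.append(child)
    return None

print("BFS clicks:", browse("d1", "d6", fan_out=2, order="bfs"))
print("DFS clicks:", browse("d1", "d6", fan_out=2, order="dfs"))
```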
45

EXPERIMENTAL RESULTS
Term-based Search Model
46

* Experimental Setting
•  Pseudo-desktop Collections
•  Crawl of W3C mailing list & documents
•  Automatically generated queries
•  100 queries / average length of 2

•  CS Collection
•  UMass CS department webpages, emails, etc.
•  Human-formulated queries from DocTrack game
•  984 queries / average length 3.97

•  Other details
•  Mean Reciprocal Rank was used for evaluation
47

* Collection Statistics
•  Pseudo-desktop Collections

(#Docs (Length))

•  CS Collection
48

* Validation of Generated Queries


In W3C Email Collection [CIKM09]
•  Compare Query-terms

•  Compare the Distribution of Retrieval Scores


49

* Type Prediction Performance


•  Pseudo-desktop Collections

•  CS Collection

(% of queries with correct prediction)

•  FQL improves performance over CQL


•  Combining features improves the performance further
50

* Retrieval Performance
•  Pseudo-desktop Collections

Best: use the best type-specific retrieval method
Oracle: predict the correct type perfectly

•  CS Collection

(Mean Reciprocal Rank)


51

EXPERIMENTAL RESULTS
Associative Browsing Model
52

* Experimental Setting
•  Collections
•  Two personal collections of volunteers & their click data
•  CS Collection & click data collected from the DocTrack game

•  Collection Statistics

•  The Role of Browsing


53

* The Quality of Browsing Suggestions


•  For Concept Browsing

•  For Document Browsing
