PERSONAL INFORMATION
Jinyoung Kim
Advisor: W. Bruce Croft
* Outline
• Introduction
• Retrieval Models (completed work)
• Evaluation Techniques (completed work)
• Proposed Work
PROBLEM OVERVIEW
* Another Example
• Limitations
• Each study proposed its own retrieval method
• Each study was evaluated in a different user study
• None of them performed a comparative evaluation
* Contributions Overview
• General Retrieval Models for PIR
• Term-based search model
• Associative browsing model
[Figure: a keyword query feeds term-based search; associative browsing links items such as e-mail and bookmark]
* Major Publications
• [ECIR09]
• A Probabilistic Retrieval Model for Semi-structured Data
• Jinyoung Kim, Xiaobing Xue and W. Bruce Croft in ECIR'09
• [CIKM09]
• Retrieval Experiments using Pseudo-Desktop Collections
• Jinyoung Kim and W. Bruce Croft in CIKM'09
• [SIGIR10]
• Ranking using Multiple Document Types in Desktop Search
• Jinyoung Kim and W. Bruce Croft in SIGIR'10
• [CIKM10]
• Building a Semantic Representation for Personal Information
• Jinyoung Kim, Anton Bakalov, David A. Smith and W. Bruce Croft in CIKM'10
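• User’s Query: James Registration
[Figure: a document with fields f1 ... fn scored against query terms q1 ... qm, once with fixed field weights w1 ... wn and once with per-term mapping probabilities P(F1|qi) ... P(Fn|qi)]
Fixed field weights:
$$P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} w_j \, P_{QL}(q_i|f_j)$$
Per-term mapping probabilities:
$$P(Q|d) = \prod_{i=1}^{m} \sum_{j=1}^{n} P_M(F_j|q_i) \, P_{QL}(q_i|f_j)$$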
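The two scoring rules on this slide, one with fixed field weights w_j and one with per-term mapping probabilities P_M(F_j|q_i), can be sketched as below. This is only an illustrative sketch: the Jelinek-Mercer smoothing, the way the mapping probability is estimated from collection-level field statistics with a uniform field prior, and all function names are assumptions, not details given on the slide.

```python
def p_ql(term, field_tokens, coll_field_tokens, lam=0.9):
    """P_QL(term | field) with Jelinek-Mercer smoothing (assumed smoothing choice)."""
    p_doc = field_tokens.count(term) / len(field_tokens) if field_tokens else 0.0
    p_coll = (coll_field_tokens.count(term) / len(coll_field_tokens)
              if coll_field_tokens else 0.0)
    return lam * p_doc + (1.0 - lam) * p_coll


def mapping_probs(term, coll_fields):
    """P_M(F_j | term), estimated from collection-level field text (assumed estimator)."""
    raw = {f: (toks.count(term) / len(toks) if toks else 0.0)
           for f, toks in coll_fields.items()}
    total = sum(raw.values())
    if total == 0.0:
        # Unseen term: fall back to a uniform distribution over fields.
        return {f: 1.0 / len(coll_fields) for f in coll_fields}
    return {f: v / total for f, v in raw.items()}


def score_fixed(query, doc, coll_fields, weights):
    """Product over query terms of the weighted sum over fields, with fixed w_j."""
    score = 1.0
    for q in query:
        score *= sum(weights[f] * p_ql(q, doc.get(f, []), coll_fields[f])
                     for f in coll_fields)
    return score


def score_mapped(query, doc, coll_fields):
    """Same form, but each term gets its own field weights P_M(F_j | q_i)."""
    score = 1.0
    for q in query:
        pm = mapping_probs(q, coll_fields)
        score *= sum(pm[f] * p_ql(q, doc.get(f, []), coll_fields[f])
                     for f in coll_fields)
    return score
```

With a query such as "James Registration", the per-term weights let "james" lean on a sender-like field and "registration" on a subject-like field, while fixed weights spread both terms over all fields equally.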
• Feature-based Method [SIGIR10]
• Combine existing type-prediction methods
• Grid Search / SVM for finding combination weights
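As a rough illustration of the grid-search option above, the sketch below combines per-type scores from several type-prediction methods with a weighted sum and searches a coarse weight grid for the best training accuracy. The method names, the score format, and the accuracy objective are hypothetical choices made for the example; the slide does not specify them.

```python
from itertools import product


def combine(scores_by_method, weights):
    """Weighted sum of the per-type scores produced by each method."""
    types = next(iter(scores_by_method.values())).keys()
    return {t: sum(w * scores_by_method[m][t] for m, w in weights.items())
            for t in types}


def predict(scores_by_method, weights):
    """Return the document type with the highest combined score."""
    combined = combine(scores_by_method, weights)
    return max(combined, key=combined.get)


def grid_search(train, methods, steps=5):
    """train: list of (scores_by_method, true_type) pairs.

    Tries every weight vector on a grid with `steps` intervals per method
    and keeps the first one that reaches the highest training accuracy.
    """
    grid = [i / steps for i in range(steps + 1)]
    best_weights, best_acc = None, -1.0
    for combo in product(grid, repeat=len(methods)):
        if sum(combo) == 0:
            continue  # all-zero weights carry no information
        weights = dict(zip(methods, combo))
        acc = sum(predict(s, weights) == y for s, y in train) / len(train)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc
```

The grid grows exponentially with the number of methods, which is one reason a learned combiner such as an SVM is a natural alternative when many features are combined.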
• CS Collection
[Figure: two access methods. A keyword query feeds term-based search; associative browsing links items such as e-mail, bookmark, blog post, webpage, daily journal and news documents]
[Figure: associative browsing model. Keyword search leads into a concept space (person, place, event) and a document space; associations between documents use term vector similarity, temporal similarity and tag similarity, and produce browsing suggestions]
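One way to picture how the three similarity signals named in the figure could produce browsing suggestions is a weighted combination used to rank candidate items. The sketch below is an assumption-laden illustration: the exact feature definitions (cosine over term counts, exponential time decay, Jaccard over tags), the item format, and the weights are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity between raw term-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def temporal(a_time, b_time, scale=86400.0):
    """Similarity that decays with the time gap in seconds (assumed form)."""
    return math.exp(-abs(a_time - b_time) / scale)


def jaccard(a_tags, b_tags):
    """Overlap between two tag sets."""
    a, b = set(a_tags), set(b_tags)
    return len(a & b) / len(a | b) if a | b else 0.0


def suggest(current, candidates, weights, k=3):
    """Rank candidates by a weighted sum of the three similarity features."""
    def score(c):
        return (weights["term"] * cosine(current["tokens"], c["tokens"])
                + weights["time"] * temporal(current["time"], c["time"])
                + weights["tag"] * jaccard(current["tags"], c["tags"]))
    return sorted(candidates, key=score, reverse=True)[:k]
```

In a feature-based setup the weights would be learned from user clicks rather than set by hand; here they are plain inputs so the ranking step stays visible.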
• Technical Contributions
• Exploiting field structure
• PRM-S retrieval method
• FQL type prediction method
• Exploiting user feedback
• Feature-based type prediction
• Feature-based browsing suggestions
• Privacy concerns
• People will not donate their documents and queries for research
• Benefits
• Participants are motivated to contribute the data
• Resulting queries and logs are reusable
• Free from privacy concerns
• Low cost compared to a traditional user study
[Figure: interaction between system and user; example query: James Registration]
• Experimental Results
• Field-based generation shows higher validity using both methods (in the TREC Email and Pseudo-desktop collections)
• Our Contributions
• DocTrack game + CS Collection
• A platform for game-based user study in PIR
PROPOSED WORK
* Contributions Review
• In Personal Information Retrieval
• Two general retrieval methods
• Two novel evaluation methods
• In Related Area
• Structured Document Retrieval
• Previous work: Mixture of Field LM, BM25F
• Contributions (completed): PRM-S [ECIR09], PRM-D [CIKM09]
• Contributions (proposed): Analyzing PRM-S, Improving PRM-S
• Federated Search
• Previous work: Collection Query Likelihood (CQL), Vertical Selection
• Contributions (completed): Field-based CQL [SIGIR10], Feature-based Type Prediction [SIGIR10]
[Figure: state sequence Start → F1 → F2 → F3 → End]
• Parameter Estimation
• Based on human-generated queries
• But how do we get a large quantity of manual queries?
• Interaction Scenario
[Figure: the algorithm and a human are both asked "What’s the query you would use to find the document?"]
• Gather a set of manual queries
• Query generation
• Parameter refinement
• Generated queries indistinguishable from manual queries
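The loop in this scenario, estimating generation parameters from manual queries and then generating new queries, might look roughly like the sketch below. It assumes a very simple field-based generator in which each query term is drawn by first picking a field and then picking a term from that field; the add-one smoothing, the data layout, and the function names are hypothetical.

```python
import random


def estimate_field_probs(manual_pairs, fields):
    """Estimate how often query terms come from each field.

    manual_pairs: list of (query_terms, doc), where doc maps field -> tokens.
    A term found in several fields is credited fractionally to each.
    """
    counts = {f: 1.0 for f in fields}  # add-one smoothing (assumed)
    for query, doc in manual_pairs:
        for term in query:
            hits = [f for f in fields if term in doc.get(f, [])]
            for f in hits:
                counts[f] += 1.0 / len(hits)
    total = sum(counts.values())
    return {f: counts[f] / total for f in fields}


def generate_query(doc, field_probs, length, rng):
    """Draw `length` terms: pick a non-empty field, then a term from it."""
    fields = [f for f in field_probs if doc.get(f)]
    weights = [field_probs[f] for f in fields]
    query = []
    for _ in range(length):
        field = rng.choices(fields, weights=weights, k=1)[0]
        query.append(rng.choice(doc[field]))
    return query
```

A call such as `generate_query(doc, probs, 2, random.Random(0))` yields a two-term query for the target document; refinement would repeat the estimate step as more manual queries arrive.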
• Motivation
• User study is expensive with two access methods
[Figure: keyword search and associative browsing over items such as e-mail, bookmark, blog post, webpage, daily journal and news documents]
• 2011/9 – 2011/12
• Unified user modeling for known-item finding
• 2012/1 – 2012/5
• Additional experiments
• 2012/6 – 2012/8
• Finalize the thesis
REFERENCES
* Major References
• Desktop Search
• Stuff I’ve Seen [Dumais et al.]
• Semi-structured Document Retrieval
• Mixture of Field Language Model [Ogilvie & Callan]
• Federated Search
• CORI method for rank-list merging [Callan et al.]
• Associative Browsing
• Find-similar method [Smucker & Allan]
• Known-item Finding
• Query generation method [Azzopardi et al.]
• Human Computation Game
• PageHunt [Ma & Chandrasekar]
OPTIONAL SLIDES
[Figure: fan-out 1 illustration over items 1 to 7]
EXPERIMENTAL RESULTS
Term-based Search Model
* Experimental Setting
• Pseudo-desktop Collections
• Crawl of W3C mailing list & documents
• Automatically generated queries
• 100 queries / average length of 2
• CS Collection
• UMass CS department webpages, emails, etc.
• Human-formulated queries from DocTrack game
• 984 queries / average length 3.97
• Other details
• Mean Reciprocal Rank was used for evaluation
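For known-item queries, where each query has exactly one target document, Mean Reciprocal Rank averages 1/rank of the target over all queries. A minimal sketch follows, assuming each run is given as a ranked list of document ids plus the target id (the data layout and function names are illustrative):

```python
def reciprocal_rank(ranked_ids, target_id):
    """1/rank of the target document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, target_id) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, t) for r, t in runs) / len(runs)
```

A target found at rank 1 contributes 1.0, at rank 2 contributes 0.5, and a missed target contributes 0.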
* Collection Statistics
• Pseudo-desktop Collections
(#Docs (Length))
• CS Collection
• CS Collection
* Retrieval Performance
• Pseudo-desktop Collections
• Best: use the best type-specific retrieval method
• Oracle: predict the correct type perfectly
• CS Collection
EXPERIMENTAL RESULTS
Associative Browsing Model
* Experimental Setting
• Collections
• Two personal collections of volunteers & their click data
• CS Collection & click data collected from the DocTrack game
• Collection Statistics