Computational Journalism
Columbia Journalism School
Week 6: Hybrid Filtering
October 16, 2015
Filtering Comments
Comment voting
Up minus down votes,
plus time decay
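A minimal sketch of the idea, assuming a simple exponential decay: net votes are multiplied by a factor that halves every `half_life_hours`. The function name, the half-life parameter and its default are illustrative assumptions, not the formula of any particular site.

```python
def decayed_score(ups, downs, age_hours, half_life_hours=12.0):
    """Net votes damped by exponential time decay (illustrative only)."""
    net = ups - downs
    # Assumed decay: the score halves every `half_life_hours`.
    decay = 0.5 ** (age_hours / half_life_hours)
    return net * decay


# A fresh comment with few votes can outrank an old comment with many.
fresh = decayed_score(ups=6, downs=1, age_hours=1)
old = decayed_score(ups=25, downs=5, age_hours=48)
```

With these assumed numbers the one-hour-old comment scores higher than the two-day-old one, which is the behaviour time decay is meant to produce.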
p = 0.333
p = 0.6875
Confidence interval
Given observed p, the interval that the true p has a given
probability of lying inside.
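One common way to compute such an interval for an up-vote fraction is the Wilson score interval. The sketch below is an assumption about method, since the slides do not say which interval is used. The vote counts (1 of 3, and 11 of 16) are also assumed, chosen only because they reproduce p = 0.333 and p = 0.6875.

```python
import math


def wilson_interval(ups, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    if n == 0:
        return (0.0, 1.0)  # no votes: the true p could be anything
    p = ups / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


# Assumed counts matching the observed proportions on the slide.
few_votes = wilson_interval(1, 3)     # p = 0.333
more_votes = wilson_interval(11, 16)  # p = 0.6875
```

The interval narrows as votes accumulate. Sorting comments by the lower bound, rather than by the raw p, is one way to avoid ranking a comment highly on the strength of very few votes.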
User-item matrix
No content analysis. We know nothing about what is in each
item.
Typically very sparse: a user hasn't watched even 1% of all
movies.
The filtering problem is guessing the unknown entries in the
matrix. High guessed values are things the user would want to see.
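A sparse user-item matrix can be sketched as a dictionary of dictionaries that stores only the known ratings. The users, items and ratings below are made up for illustration, and `density` and `unknown_entries` are hypothetical helper names.

```python
# Hypothetical data: user -> {item: rating}. Absent keys are unknown entries.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 5},
    "cat": {"C": 1, "D": 5},
}


def all_items(ratings):
    """Every item rated by at least one user."""
    return {item for row in ratings.values() for item in row}


def density(ratings):
    """Fraction of matrix entries that are known."""
    known = sum(len(row) for row in ratings.values())
    return known / (len(ratings) * len(all_items(ratings)))


def unknown_entries(ratings):
    """The (user, item) pairs the filter would have to guess."""
    items = sorted(all_items(ratings))
    return [(u, i) for u in sorted(ratings) for i in items
            if i not in ratings[u]]
```

This toy matrix is far denser than a real one would be. The point is only that the filter's job is to fill in the pairs returned by `unknown_entries`.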
Filtering process
Similar items
Item similarity
Cosine similarity!
Generating a recommendation
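The item-similarity and recommendation steps above can be sketched as follows. Two things here are assumptions, not taken from the slides: unknown ratings are treated as zero when computing cosine similarity, and an unseen item is scored by the similarity-weighted sum of the user's own ratings. The data and function names are hypothetical.

```python
import math

# Hypothetical data: user -> {item: rating}. Absent keys are unknown entries.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 5},
    "cat": {"C": 1, "D": 5},
}


def item_vector(ratings, item):
    """Column of the user-item matrix: user -> rating for one item."""
    return {u: row[item] for u, row in ratings.items() if item in row}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (missing entries count as 0)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank unseen items by similarity-weighted sum of the user's ratings."""
    seen = ratings[user]
    items = {i for row in ratings.values() for i in row}
    scores = {}
    for candidate in items - set(seen):
        cand_vec = item_vector(ratings, candidate)
        score = sum(cosine(cand_vec, item_vector(ratings, item)) * r
                    for item, r in seen.items())
        if score > 0:  # drop items with no overlap with anything seen
            scores[candidate] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


recs = recommend(ratings, "ann")
```

In this toy data, item C is recommended to "ann" because "bob", who rated A and B much as she did, also rated C highly. Item D shares no raters with anything she has seen, so it is not scored.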
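[Figure: graphical model for rating prediction. Labels: i users, j items, variation in user topics, user rating of item.]
[Figure: topic model plate diagram. Labels: concentration parameters, topics in doc, topic for word, word in doc, words in topics. Plates: N words in doc, D docs, K topics.]
[Figure: combined model. Labels: variation in per-user topics, topics for user, user rating of doc.]
[Figure: comparison of "content only" and "content + social".]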
Item Content
My Data
who I follow
social network structure,
other users' likes
How to evaluate/optimize?
Netflix: try to predict the rating that the user gives a
movie after watching it.
Amazon: sell more stuff.
Google web search: human raters, plus an A/B test of every
change
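For the rating-prediction objective in the Netflix example, one standard score is root mean squared error (RMSE) between predicted and actual ratings, which was the metric of the Netflix Prize. A minimal sketch, with made-up ratings in the usage lines:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predictions versus the ratings users actually gave.
error = rmse([4.5, 3.0, 2.0], [5, 3, 1])
```

A lower RMSE means the filter's guesses are closer to the users' ratings. It says nothing about the other objectives listed here, such as sales or rater judgments.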
How to evaluate/optimize?
Does the user understand how the filter works?
Can they configure it as desired?
Can they correctly predict what they will and won't
see?
How to evaluate/optimize?
Can it be gamed? Spam, "user-generated
censorship," etc.
r(S, U, {P}, {B}) ∈ [0, 1]
How to evaluate/optimize?
Does it improve the user's life?