
2010 IEEE Fourth International Conference on Semantic Computing

MySpace Video Recommendation with Map-Reduce on Qizmt

Yohan Jin, Minqing Hu, Harbir Singh, Daniel Rule and Mikhail Berlyant
Data Mining, MySpace Inc.
407 North Maple Drive
Beverly Hills, CA, USA
{ychin, mihu, hsingh, drule, mberlyant}@myspace-inc.com

Zhuli Xie
Yahoo! Inc.
3333 West Empire Avenue
Burbank, CA, USA
zxie@yahoo-inc.com


Abstract: Recent years have seen a surge in online video content, which is often used as a communication medium and information resource by users. The explosive growth in content has created the need for effective recommendation systems that can help users discover meaningful and interesting videos. In this paper, we present a large-scale Map-Reduce video recommendation system. Our approach includes item-to-item collaborative filtering using video views data, and content analysis of video metadata to extract feature representations for identifying similar videos for recommendation. Recommendation results are further filtered through a refinement stage using semantic similarity. We show how the integrated pipeline is implemented in Qizmt, a .Net Map-Reduce framework. Additionally, our approach is capable of updating the video recommendation index with hourly added video data. We describe our recommendation approach using a portion (23 million) of all videos from MySpace and undertake quantitative as well as qualitative evaluation.

I. INTRODUCTION
Recommender systems have become a vital part of most web companies that have a significant online presence. These systems form a specific type of information filtering technique that attempts to present items (movies, music, books, news, images, web pages) that are likely of interest to the user, by comparing the user's profile to reference characteristics such as information from the item or from the user's social environment. Input is taken either directly or indirectly from users, and based on user needs, preferences and usage patterns, recommendations for personalized products and services are provided. The goal is to ease the information search, discovery and decision processes for the user.
In this paper, we explore both Collaborative Filtering (CF) and Content-based (CB) methods for video recommendation. Our recommendation approach is item-based, i.e., the recommendations are per-item rather than per-user. We present an item-based collaborative filtering method and a content-based approach to finding similar videos. Both methods are employed: the item-based collaborative filtering utilizes the video views data, while the content-based method makes use of the video metadata. The results from the two approaches are then blended to generate the recommendation indexes, which are further improved through a refinement stage. Our system can be easily modified to generate personalized recommendations. This can be achieved by finding videos similar to each of the videos in a user's history, aggregating those videos, and then recommending the most correlated ones.

User-based CF recommendation is not practical for our task, as we face a severe new user problem: half of the videos are watched by visitors, who either do not have a MySpace account or are not logged in when watching videos. Moreover, user-based CF is computationally expensive and thus poses a scalability challenge when dealing with large datasets (tens of millions of items). Even though item-based CF has the advantage of taking computation offline and generating recommendation indexes quickly, it is still limited by the new item problem, i.e., a new video cannot be included in recommendations until it has been watched by a sufficient number of users. For example, more than 47% of the videos on MySpace were watched by only one or two users during a 90-day period. This limitation of CF methods prompted us to also leverage content-based techniques. In our system, we extract keywords as features from the content, i.e., the video metadata (title, tags, description) in which a user describes the content of the uploaded video. However, much user-generated metadata is short and noisy, so many videos may not have a sufficient set of features. To address the problems induced by noisy metadata, we also introduce a refinement step based on semantic similarity, using a modified Normalized Google Distance [15], to improve the quality of the recommendations.

Some facts about the video streaming data on MySpace Video may help the reader better understand the problem we are dealing with. During a typical 90-day period, there were more than 382 million video views, in which 13.4 million videos were consumed by 113 million users. From another angle, only 3.8K popular videos (with 10K or more viewers) attracted 166 million video views (nearly 43% of total views).

This paper makes the following contributions:
1) Proposing a practical video recommendation engine which works on top of Map-Reduce with reasonable running time (less than 30 minutes with 23 million videos on a 160-core cluster).
2) Suggesting a refinement algorithm as a post-processing stage and demonstrating its efficiency for higher quality and more compact recommendations.


This paper is organized as follows: Section II discusses related efforts in video recommendation research. Section III explains the data used by our recommendation system and shows its characteristics and challenges. Section IV describes our proposed recommendation system, which combines all sub-parts into a whole pipeline. The following sections present the details of the proposed algorithms (Section V) and experimental results with a focus on the refinement improvement (Section VI). Finally, conclusions and future work are discussed in Section VII.
II. RELATED WORK

The development of recommender systems is a challenging task and various approaches have been proposed. The approaches are classified into categories such as collaborative filtering, content-based methods, and hybrid approaches, which combine collaborative and content analysis [3].
In Collaborative Filtering (CF) based systems, users receive recommendations based on people who have similar tastes and preferences. CF does not rely on the content descriptions of items, but purely depends on preferences expressed by a set of users. These preferences can either be expressed explicitly by numeric ratings, or indicated implicitly by user behaviors, such as clicking on a hyperlink, purchasing a book or reading a particular news article. CF-based recommender systems have been used in diverse areas such as e-commerce (Amazon [11], Netflix [7]), news (GroupLens [10]), news personalization [8] and movie recommendation (MovieLens [14]).
CF algorithms are further classified into memory-based approaches and model-based approaches. Memory-based CF methods predict a user's ratings based on the ratings given by like-minded users, making use of past ratings and comparisons between users and/or items. In user memory-based systems, the predicted rating of an item for a given user depends upon the ratings of the same item by similar users. Similarly, in item memory-based systems, the predicted rating depends upon the ratings of other similar items by the same user. In contrast to memory-based methods, model-based CF uses machine learning algorithms to predict ratings by learning a descriptive model of user preferences. Several model-based collaborative filtering approaches exist, using Bayesian models, regression, and neural networks [9]. The advantage of these approaches is that the scalability issue is handled by separating the offline task of creating user models from the real-time task of recommendation generation. This often comes at the cost of tuning a significant number of parameters, making adoption difficult in practical scenarios and leading to variations in recommendation accuracy in the presence of dimensionality reduction techniques [17]. In this work the item memory-based approach is used.
Some recommender approaches pertaining to video include VideoReach [12], an online video recommendation system that finds relevant videos based on textual, visual and aural relevance using 13K online videos. A film recommendation system that classifies MPEG coded films using extracted low-level audio-visual features is presented in [16]. This differs from our approach, as we do not use low-level physical features, primarily to avoid issues related to the proprietary nature of the data and the complexity given the large size of the dataset. Another approach, Music Video Miner, which performs video segmentation and feature extraction to obtain an abstract representation of a video using audio, visual and transcript features, is presented in [4]. An approach providing video suggestions using a user-video graph is presented in [6]. Our approach differs from theirs in that we use semantic information obtained from video metadata.

The rapid growth in recommender systems has seen approaches attempting to deal with the challenges of scalability, accommodating new data, comprehensibility, and improving recommendation accuracy. An approach addressing the scalability issue using CF-based incremental clustering is presented in [8]. In our recommendation approach, we leverage the Map-Reduce framework to provide a scalable solution for processing a large dataset, and we use an alternate way to improve the quality of recommendation results through a refinement stage. In terms of handling new data, we use incremental learning and updating. The results in [10] indicate that allowing users to know more about the result generating process can help them understand the strengths and weaknesses of a recommender system; with this knowledge, users can make low-risk decisions. We provide explanations in terms of common features, using concepts of semantic similarity, when providing recommendations for candidate items, which in our study are videos.

III. DATA EXPLORATION


In this section we describe the data used in our video recommendation system. MySpace Video allows users to upload and share video clips with others. The video sources include not only normal users but also hundreds of content partners who provide MySpace with high quality videos, such as TV shows, news reports, cartoon episodes, etc. Every day tens of thousands of videos are uploaded. MySpace has accumulated more than 50 million video clips and the number is still growing rapidly. Such a large number of videos makes it a challenging task for recommendation systems to find related or interesting videos efficiently [6]. In fact, our data show that only a very small percentage (< 0.3%) of videos were popular enough to be watched by more than 1000 users, and more than 47% of the videos were watched by only one or two users during a 90-day period. Therefore, if we only use the video views data to make recommendations, a large portion of the videos will not receive any recommendations. This issue is sometimes referred to as the cold start or new item problem [5], which is commonly seen in CF-based methods. To solve this problem, we employ a content-based approach which uses the metadata associated with videos. While many user-uploaded videos may just be about personal everyday events, such videos can interest other people as well. E.g.,


a user who just watched a clip of her friend dancing to Hannah Montana would likely watch similar mimic shows from other unknown people and have some more laughs. In such situations, we provide users with similar videos determined from the textual data associated with the videos, such as the title, description, and tags entered by the users. At the same time, the user views data provide other evidence that some videos are of interest to people even though they cannot be determined as related from the associated text. In Section V we will show that our recommendation system combines the recommendation results generated by our pipeline, which deals with the user views data and the video metadata separately. Below we describe the two kinds of data used by our pipeline.
A. Co-Viewed Videos

MySpace allows users to watch videos on the website without logging into the site with their MySpace user accounts. If we only used the user views data from authenticated users, more than 40% of the total user views would be lost. In our recommendation pipeline, we therefore use cookies to represent users instead of user accounts. When two videos are watched by a number of users within a certain period of time, it is an indication that both videos interest the users in some way. Such user views data are commonly collected by recommendation systems based on Collaborative Filtering, such as the one used by Amazon.com [11]. We used the video views data collected over a moving window of the last 90-day period. It was also shown in [13] that the majority of videos uploaded to a social website accumulate only a very low percentage of their total views after the first 90 days. During an average 90-day period, there were more than 382 million video views, in which 13.4 million videos were consumed by 113 million users (both authenticated users and guests). Nearly 4.1 million videos were viewed only once, while the most popular videos were watched by nearly 1 million users. On average, a video was viewed by about 30 users. A similar picture can be seen from another aspect: more than 65 million users watched only one video during the 90-day period. On average, one user watched about 3 videos. About 166 million video views (nearly 43% of total views) were on only 3.8K popular videos (with 10K or more viewers). Based on the above view data, 382 million recommendations were generated, covering 2.2 million videos, i.e., 16% of the videos that were viewed at least once in the 90-day period. On average, there are 56 recommendations per video.

Fig. 1. Distribution of Keyword Counts in MySpace Videos.

B. Video Metadata


When a user uploads a video, she is required to provide the title, description, a category selected from 17 options, tags, and the primary language used in the video. The length limits for the title, description, and tags are 64, 3000, and 65 characters respectively. However, many videos are provided with very limited information. We regard such videos as text-poor videos, and the recommendations related to them are generally of low quality in text-based approaches. Among the 23 million videos we used in our experiments, more than 13 million belong to this kind. To improve the quality of recommendations for such videos, we introduce a refinement process which will be discussed in Section V-C. In comparison, the textual information of the videos provided by MySpace content partners is rich in general; e.g., they can provide multiple categories from more options. In Figure 1, we show the distributions of all videos and of our partner-provided videos with respect to the number of keywords identified from each video. The distinction between the two is obvious: while more than half of all videos have very few (1-5) keywords, less than 3% of partner-provided videos fall into that range. In Figure 2, we show the metadata for a typical video from each type of content source.

Sample Video Metadata From Content Partner:
Title: Earth
Description: The first film in the Disneynature series, earth, narrated by JAMES EARL JONES, tells the remarkable story of three animal families and their amazing journey across the planet we all call home. earth combines rare action, unimaginable scale and impossible locations by capturing the most intimate moments of our planet's wildest and most elusive creatures. Directors Alastair Fothergill and Mark Linfield, the acclaimed creative team behind the Emmy Award winning Planet Earth, combine forces again to bring this epic adventure to the big screen, beginning Earth Day 2009.
Tags: Earth, Disneynature, Trailer, Park

Sample Video Metadata From Normal User:
Title: Bumble Bee
Description: Bumble Bee pollinating
Tags: Bumble, Bee

Fig. 2. Sample Video Metadata.

When we make recommendations based on the video metadata, we actually try to find similar videos according to the keywords identified from them. Each video is represented by a feature vector in which each keyword is a feature. From all the videos we build a keyword table, which includes both unigrams and bigrams identified from the video metadata after removing all stop words. The value of a feature in a video's feature vector is a weighted TF*IDF, as commonly used in Information Retrieval, which will be discussed in Section V-B.


IV. RECOMMENDATION PIPELINE


Our recommendation system is built upon MySpace's Map-Reduce system, called Qizmt [1]. Qizmt uses C# and is .Net based. It is tailored toward our specific internal usage, with the Map-Reduce spirit intact. For example, our system enables most Map-Reduce jobs to execute faster in subsequent runs. This makes incremental learning of recommendations fast when new videos are uploaded, and makes real-time recommendation generation possible.

Figure 3 shows the video recommendation pipeline. Both metadata and video views data are fed into the system, the item-based collaborative filtering and content-based approaches work in parallel to generate recommendations, and these recommendations are then blended to produce the final result. On the first run, a snapshot of the existing video metadata and of the 90-day video views data is processed. Subsequent runs operate only on newly uploaded videos and new video views, and the resulting recommendation delta is added to the recommendation repository.

Fig. 3. Video Recommendation Pipeline.
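For illustration, the following Python sketch shows one plausible blending step, assuming each approach emits (video_id, score) pairs; the per-source max-score normalization and equal source weighting are our assumptions for illustration rather than the exact rule used in the pipeline.

def blend(cf_recs, cb_recs, top_k=20):
    # cf_recs, cb_recs: lists of (video_id, score) from the CF and CB paths.
    # Normalize each source's scores to [0, 1], then sum per video so that
    # videos surfaced by both approaches rank higher.
    merged = {}
    for recs in (cf_recs, cb_recs):
        top = max((score for _, score in recs), default=0.0)
        if top <= 0.0:
            continue
        for video, score in recs:
            merged[video] = merged.get(video, 0.0) + score / top
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]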


V. APPROACHES

A. Collaborative Filtering

We use an item-to-item collaborative filtering method similar to [11]. As discussed in the previous section, we generate recommendations not at the user level but by finding videos with similar content, as half of our video consumption comes from visitors. (We can always provide more personalized recommendations for logged-in users by using their video watching history in the 90-day window.) It is possible to compute the similarity between two videos in various ways, including the cosine measure, local popularity, and the Jaccard index [6]. We choose local popularity because it provides the highest recommendation coverage without significantly losing recommendation quality. Coverage is defined as the percentage of videos that receive recommendations among all videos. The following algorithm calculates the similarity between video pairs using local popularity and generates the recommendation index.
ALGORITHM 1: Recommendation Index Generation

For each user u:
  1. Gather the video watching history of user u: v_1, v_2, ..., v_n
  2. For each video v_i from v_1 to v_{n-1} in the history list:
     2.1 Record that a user watched v_i and v_{i+1}
For each video pair (v_i, v_j):
  1. Compute the similarity between v_i and v_j: record the total number
     of users who watched the pair (v_i, v_j), #u
  2. If #u > threshold, the video pair (v_i, v_j, #u) is kept in the
     recommendation index table

A threshold is used to remove video pairs with a low score, which is more likely caused by chance than by true correlation. In practice, we choose 4, which means a video pair must be co-viewed by at least 5 users to be recommended. Local popularity also provides easy-to-understand evidence of how a recommendation was produced, e.g., "346 users who watched this video also watched ...".

The computation of the recommendation index table (the recommendation repository in our pipeline) is time intensive, with $O(N^2 M)$ as the worst case, where N is the total number of videos and M is the total number of users. In practice, however, it is closer to $O(NM)$, as most users watch very few videos in the 90-day window. This algorithm can be easily implemented in Map-Reduce, as shown below:

ALGORITHM 2: Map-Reduce Logic

Input:  user video viewing history list: (u_i, v_j), ...
Map:    (key: u_i, value: v_j)
Reduce: (key: u_i, values: v_1, v_2, ...)
        Generate all possible video pairs from v_1, v_2, ...
Output: video pairs list: (v_i, v_j)

Input:  video pairs list: (v_i, v_j)
Map:    (key: (v_i, v_j), value: 1)
Reduce: (key: (v_i, v_j), values: 1, 1, ...)
        Aggregate the values and generate the local popularity score
Output: video recommendation pairs: (v_i, v_j, #score)

Given the recommendation index table, the recommendation set for any video v_i can be retrieved, and the videos with the highest scores are rendered as recommendations for v_i.
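For illustration, the following Python sketch simulates the two Map-Reduce passes of Algorithm 2 in memory; the function and variable names are ours, and a production Qizmt job would express the same map and reduce logic as C# mapper/reducer classes. Like Algorithm 2, the sketch generates all pairs from each user's history.

from collections import defaultdict
from itertools import combinations

THRESHOLD = 4  # a pair must be co-viewed by more than THRESHOLD users

def build_recommendation_index(view_log):
    # view_log: iterable of (user_id, video_id) view events.
    # Pass 1: group views by user, then "reduce" each user's history
    # into unordered co-viewed video pairs.
    history = defaultdict(set)
    for user, video in view_log:
        history[user].add(video)

    # Pass 2: map each pair to a count of 1 and reduce by summing,
    # which yields the local popularity score #u for every pair.
    pair_counts = defaultdict(int)
    for videos in history.values():
        for v_i, v_j in combinations(sorted(videos), 2):
            pair_counts[(v_i, v_j)] += 1

    # Keep only pairs above the threshold in the index table.
    index = defaultdict(list)
    for (v_i, v_j), score in pair_counts.items():
        if score > THRESHOLD:
            index[v_i].append((v_j, score))
            index[v_j].append((v_i, score))
    for recs in index.values():
        recs.sort(key=lambda r: r[1], reverse=True)  # highest score first
    return index

# Five users co-view videos "a" and "b": the pair score is 5 (> 4),
# so each video recommends the other.
views = [(u, v) for u in ("u1", "u2", "u3", "u4", "u5") for v in ("a", "b")]
index = build_recommendation_index(views)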

B. Content Based Approach


Now we present the content based approach that finds
similar videos based on the textual descriptions. It first extracts
keywords from video metadata and then measures video
similarities based on the textual features.
1) Feature Extraction: Currently, for each video on MySpace there are four types of metadata that describe the video content, namely title, description, tags, and categories. While tags and categories are lists of individual keywords, titles and descriptions contain phrases and sentences with syntactic structure. We apply different rules when extracting features from the different types of metadata. We preprocess the title and description with part-of-speech (POS) tagging, and then extract valid words and phrases, such as nouns, verbs, and adjectives, as features. While we analyze the syntactic role of each term in the title and description, for tags we do no further processing beyond extracting a list of individual terms, since tags are already user-defined keywords. We do not include categories as features, as there are fewer than 100 categories a video can belong to on MySpace Video, and categories cannot simply be treated as normal keywords. For all nouns, verbs, adjectives, and adverbs, we first filter out the stop words¹ and obtain a list of bigrams and unigrams as features.

¹Some words are too common to help in understanding the context, so they are defined as stop words. The most common examples are "the", "a", "is" and so on, but the set of stop words can differ for each application. We built a stop word list by referencing publicly available resources on the web, and edited it after a preliminary data experiment.
After obtaining the set of features for a given video, we assign an appropriate weight to each feature. Based on the assumption that syntactic role is related to semantic importance within a sentence (e.g., a noun is more important than a pronoun or an adjective), we apply different weights to different syntactic roles. The final feature score is obtained by multiplying the term score, which is determined by the term distribution, by the weight given by the feature's syntactic role or metadata type:

$$FeatureScore = TermScore \times Weight \quad (1)$$

To compute the term score, we apply TF·IDF, which is the term frequency normalized by the general importance of the term. The term frequency (TF) is the number of times a term appears in the given document, and the inverse document frequency (IDF) is obtained by dividing the number of all documents by the number of documents containing the term:

$$TermScore = TF \cdot IDF = n_{i,j} \cdot \log\frac{|D|}{|\{d_j : t_i \in d_j\}|} \quad (2)$$

where $n_{i,j}$ is the number of occurrences of term $t_i$ in document $d_j$, $|D|$ is the total number of documents, and $|\{d_j : t_i \in d_j\}|$ is the number of documents in which term $t_i$ appears. Here, a document means a video metadata file, and we count document frequencies for all unigrams and bigrams over all available metadata files. Finally, each feature is assigned a TF·IDF term score multiplied by the weight given by its POS tag and metadata type. Table I lists the weights, which are defined by a heuristic assumption.

TABLE I
FEATURE WEIGHT GIVEN BY METADATA TYPE OR POS TAGS

Metadata Type or POS            Weight
Tag, Noun Phrase                0.8
Proper Noun                     0.8
Phrasal Verb, Other bigram      0.8
Noun, Verb                      0.7
Adjective                       0.5
Adverb                          0.2

When a multiword feature (n-gram) is selected, any single word (unigram) that is part of the selected feature is excluded from the feature list.
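To make Eqs. 1 and 2 and Table I concrete, the following Python sketch computes weighted TF·IDF feature vectors for a small corpus; it assumes POS tagging and n-gram extraction have already produced (term, weight class) pairs, and the helper names and weight-class keys are ours.

import math
from collections import Counter

# Weights from Table I (our key naming).
WEIGHTS = {"tag": 0.8, "noun_phrase": 0.8, "proper_noun": 0.8,
           "bigram": 0.8, "noun": 0.7, "verb": 0.7,
           "adjective": 0.5, "adverb": 0.2}

def feature_vectors(docs):
    # docs: one list of (term, weight_class) pairs per video metadata file.
    n_docs = len(docs)
    df = Counter()                    # document frequency, Eq. 2 denominator
    for doc in docs:
        df.update({term for term, _ in doc})
    vectors = []
    for doc in docs:
        tf = Counter(term for term, _ in doc)      # n_{i,j} in Eq. 2
        weight_class = dict(doc)
        vec = {}
        for term, n_ij in tf.items():
            term_score = n_ij * math.log(n_docs / df[term])        # Eq. 2
            vec[term] = term_score * WEIGHTS[weight_class[term]]   # Eq. 1
        vectors.append(vec)
    return vectors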
2) Similarity Computation: To find similar videos, we compute the cosine similarity between all pairs of videos by checking their common features. The cosine similarity between two vectors is defined by the cosine of the angle between them.



In our case of measuring the similarity between two videos, each video is defined as an n-dimensional vector of the n features extracted from its metadata. The similarity between videos A and B is defined as follows:

$$S_{A,B} = \cos\theta = \frac{A \cdot B}{|A||B|} = \frac{\sum_i w_{i,A}\, w_{i,B}}{\sqrt{\sum_i w_{i,A}^2}\,\sqrt{\sum_i w_{i,B}^2}} \quad (3)$$


where $w_{i,A}$ is the score of term i in video A, the weighted TF·IDF defined in Eq. 1. Although each video's metadata yields a different number of features, the similarity is normalized by the product $|A||B|$ of the Euclidean norms of the vectors (which can be interpreted as document lengths), so that the final similarity value lies between 0 and 1. This method finds videos with a similar distribution of features while avoiding bias from a single popular feature.
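A minimal Python sketch of Eq. 3, operating on the sparse feature vectors produced by the feature extraction step above (dictionaries mapping each feature to its weighted TF·IDF score):

import math

def cosine_similarity(vec_a, vec_b):
    # Only features shared by both videos contribute to the dot product.
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[f] * vec_b[f] for f in common)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    # Lies in [0, 1] because all weighted TF-IDF scores are non-negative.
    return dot / (norm_a * norm_b)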


C. Refinement using Semantic Similarity

The recommendation output sometimes contains irrelevant videos in addition to similar videos, due to noise in the extracted keywords, which may lead to poor overall recommendation quality. For instance, consider a video with the following metadata:
Title: Kia KND-4
Description: The Kia KND-4 makes its North American debut at the Los Angeles show.
Category: Automotive, Entertainment
Keyword Tags: road, concept, track, magazine, KIA, Production, rt, KND-4
After keyword extraction on the title and description, combined with the keyword tags, we have the word set: American, debut, concept, Kia, Los Angeles, road, track, magazine, KIA, Production, rt, KND-4. It is apparent that Kia, KND-4 and road are keywords related to the topic of the new Kia car model, and recommended videos matched on these keywords will be relevant. However, recommended videos that match noisy keywords like magazine are bad recommendations for this video.

In order to identify the set of keywords most related to the topic of the video and provide users with more relevant recommendations, we propose a semantic refinement procedure: for a video x and its recommended videos, first compute the semantic similarity between the selected keywords of video x, then segment those keywords into relevant and irrelevant sets, and finally use the relevant keyword set to filter out irrelevant videos from the recommendations of video x. We utilize the idea of Normalized Google Distance (NGD) [15] to compute semantic similarities between keywords. Instead of querying Google for word hit counts, we use MySpace's own data to obtain the hit counts. To avoid issuing an individual query for each pair of keywords, we pre-compute an NGD table of words so that we only do lookups during the refinement step. This NGD table is updated from time to time to reflect the latest changes in the MySpace data. The detailed refinement procedure is as follows:

Step 1: Collect the N unique feature-ids from the top M recommended videos;
Step 2: For each feature-id, compute the sum of its semantic distances to the other N-1 feature-ids;
Step 3: Divide the N collected feature-ids into two sets (relevant, irrelevant) by this summed semantic distance, measured using NGD (Normalized Google Distance) [15];
Step 4: Remove from the refined recommendations all videos that do not have any feature-ids belonging to the relevant set.
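The Python sketch below illustrates Steps 1 through 4. The NGD formula follows [15]; the hit-count arguments stand in for our pre-computed MySpace-corpus table, and splitting the keywords at the mean summed distance in Step 3 is an illustrative assumption, since the exact partitioning rule is not fixed above.

import math

def ngd(x, y, hits, pair_hits, n):
    # Normalized Google Distance [15], with hit counts taken from an
    # offline corpus (MySpace metadata) rather than Google queries.
    fx, fy = math.log(hits[x]), math.log(hits[y])
    fxy = math.log(pair_hits[(x, y)])
    return (max(fx, fy) - fxy) / (math.log(n) - min(fx, fy))

def refine(keywords, recommendations, ngd_table):
    # keywords: feature-ids collected from the top M recommendations (Step 1).
    # recommendations: list of (video_id, feature_id_set) pairs.
    # ngd_table: pre-computed pairwise NGD lookup table.
    totals = {k: sum(ngd_table[(k, o)] for o in keywords if o != k)
              for k in keywords}                       # Step 2
    cutoff = sum(totals.values()) / len(totals)        # Step 3 (assumed rule)
    relevant = {k for k, total in totals.items() if total <= cutoff}
    # Step 4: drop videos with no feature-id in the relevant set.
    return [(vid, fids) for vid, fids in recommendations
            if fids & relevant]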

VI. EXPERIMENTAL RESULTS

A. Experimental Data

We ran all experiments for the proposed approach on a Qizmt cluster with 160 cores. The results show that our video recommendation engine generates a huge index within a reasonable computational time.

TABLE II
COMPUTATIONAL TIME AND COVERAGE RATE ANALYSIS

        Running Time (mins)            Coverage Rate
        Initial      Subsequent
CF      -            -                 16.21%
CB      27           16                58.22%

The CF method uses the last 90 days of video views data, while the CB method uses all active videos. Table II shows the computational time and coverage rate. To update the total of 23 million videos with newly arrived hourly data, the CB method requires 16 minutes. As for coverage rate, since CF relies on video views it covers only part of the total engaged videos. On the other hand, the CB method finds related videos whenever textual similarity exists, which covers many more videos in the recommendation results (58.22% vs. 16.21%).
To automatically measure overall performance in terms of recommendation quality, we introduce a heuristic based on the pre-defined categories. Whenever a new video is uploaded, one of the categories (shown in the first column of Table III) must be selected, whichever the uploader deems the most appropriate. When we use this type of category matching across 23 million videos, it gains the power of the majority. Along with title, tags and description, the user-selected category is another available metadata resource. We did not use category information as a video feature; here, however, we use it to check overall recommendation quality automatically. We use Mean Average Precision (MAP) for automatic recommendation evaluation; MAP is commonly used for computing video/image retrieval accuracy [2]:

$$MAP = \frac{\sum_{i=1}^{k} i/r_i}{R} \quad (4)$$

where k is the number of top-ranked recommendations considered, $r_i$ is the rank of the i-th relevant video, and R is the number of relevant videos.


Using Eq. 4, we compute the MAP value of each category. For example, suppose video x has category Sports and has 4 recommendations, of which only the rank-1 and rank-3 recommendations are of the same Sports category. For video x, we count correct recommendations by matching the category of the query video (Sports) against each recommended video, ordered by rank up to the top 4. If the total number of relevant videos is 5, then MAP = (1/1 + 2/3)/5 = 0.333. In Table III, we show each category's MAP value averaged across all query videos. For this experiment, we ran the recommendation engine to generate all possible pair combinations from the 23 million active videos and chose 104K query videos uploaded during a one-week period. Although we narrow the query size to 104K videos, each query video can receive any video from the 23-million-video population as a recommendation.
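A direct Python transcription of Eq. 4, using category match as the relevance test as in this evaluation, reproduces the worked example:

def mean_average_precision(ranked_categories, query_category, num_relevant):
    # Eq. 4: sum i/r_i over the relevant results, where r_i is the rank of
    # the i-th relevant video, divided by R, the number of relevant videos.
    score, hits = 0.0, 0
    for rank, category in enumerate(ranked_categories, start=1):
        if category == query_category:
            hits += 1                 # this is the i-th relevant video
            score += hits / rank      # i / r_i
    return score / num_relevant

# Ranks 1 and 3 match "Sports" and R = 5, giving (1/1 + 2/3)/5 = 0.333.
print(mean_average_precision(
    ["Sports", "Music", "Sports", "Animals"], "Sports", 5))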
Fig. 4. MAP accuracy enhancement through refinement on premium contents.

TABLE III
USER-DEFINED CATEGORY BASED MEAN AVERAGE PRECISION ACCURACIES

Categories                  CF       CFprm    CB       CBprm
Animals                     0.3421   0.5911   0.2217   0.5061
Animation/CGI               0.5540   0.7738   0.3026   0.6730
Automotive                  0.7149   0.6198   0.4103   0.5866
Comedy and Humor            0.7003   0.7364   0.3788   0.6103
Entertainment               0.7621   0.7840   0.4341   0.9120
Extreme Videos              0.6599   0.4641   0.2758   0.4264
Instructional               0.4093   0.8087   0.2944   0.5838
Music                       0.7345   0.6720   0.5382   0.6572
News and Politics           0.4347   0.4166   0.2524   0.5205
Schools and Education       0.4720   0.3315   0.2028   0.3900
Science and Technology      0.2577   0.3423   0.3247   0.3878
Video Blogging              0.7262   0.6264   0.2857   0.6647
Sports                      0.7987   0.7025   0.4137   0.7426
Travel and Vacations        0.6494   0.3568   0.2721   0.4718
Video Games                 0.4799   0.7595   0.2903   0.5082
Weird Stuff                 0.7200   0.5227   0.3084   0.4608
Overall                     0.5885   0.5943   0.3254   0.5689

Fig. 5. MAP accuracy enhancement through refinement on user contents.


In Table III, we compare several different approaches and datasets. The CF method, based on video views data, shows much better overall results than the CB (keywords) method. Another interesting observation comes from the premium-only video experiment. Different from user-generated videos, premium videos normally have much richer textual descriptions, which makes our proposed content-based approach far more powerful at finding closely related videos as recommendations. As we can observe in Table III, CBprm (premium) improves from 32.54% to 56.89%. On the other hand, the MAP value for CF (video views data) does not change much between the user-generated and premium-only video populations, as CF does not use the textual descriptions of videos.
Figures 4 and 5 demonstrate that the refinement process can improve recommendation quality. We apply refinement as a post-processing step in our pipeline. In terms of running time and resources, computing the semantic similarity between metadata distributions and making the refining decision are relatively quick processes, and we can see that all categories show increased MAP accuracy. Refinement can also reduce the recommendation index size, which is normally large. This experiment verified that refinement can compact the initial recommendation index and enhance recommendation quality.

B. Click-through Evaluation

In this section we show a click-through comparison study of the proposed video recommendation system on the MySpace Video site, using one day's data from Nov 16, 2009. This is only one day's worth of data, but other days show similar performance. On the MySpace Video site, for each publicly viewable video, recommendations are shown in a module alongside another module showing editorially selected Featured Videos. This setup helps us measure the absolute performance of the recommendations, in addition to comparing them to the manually created list. Figure 6 shows information for the top 5000 videos shown in the US for that day. Recommended videos have an average click-through rate of 11.3%, with over 25% of the recommended videos having a click-through rate larger than 15%. By comparison, Featured Videos had a click-through rate of 0.36%.


Fig. 6. Click-Through Rate Evaluation.

VII. CONCLUSIONS


In this paper, we presented a recommendation system which uses an item-based approach, unifying collaborative filtering and content-based methods with two heterogeneous input data sources: video views and textual metadata. We showed that our proposed system works for a large-scale video site, MySpace Video. As a case study, this paper has demonstrated the challenges of large-scale content recommendation systems: heterogeneous, noisy and huge data. We also proposed a refinement algorithm which improves the recommendation quality significantly, and, as a practical application, showed the refinement process in Map-Reduce in detail. Different from the original Normalized Google Distance, we showed how to use the metadata resource as an offline corpus, which makes finding semantic distances between co-occurring textual keywords an inexpensive process. Compared with the relatively quick refinement process, the quality improvement is significant and the final recommendation index is more compact. In the future, we plan to expand user-given textual descriptions through external knowledge and to analyze the incorporation of a social component into recommendation generation.
VIII. ACKNOWLEDGMENTS
We would like to thank Monisha Kanoth and Igor Deck for
assisting in data collection. The work reported in this paper
was carried out when the authors were working at MySpace
Inc.
REFERENCES

[1] MySpace Qizmt - MySpace's MapReduce Framework. [Online]. Available: http://qizmt.myspace.com/
[2] TREC Video Retrieval Evaluation, http://www-nlpir.nist.gov/projects/trecvid/, 2009.
[3] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, June 2005.
[4] L. Agnihotri, N. Dimitrova, J. Kender, and J. Zimmerman, "Music videos miner," in Proceedings of the Eleventh ACM International Conference on Multimedia. ACM, 2003, pp. 442-443.
[5] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations," in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2002, pp. 253-260.
[6] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly, "Video suggestion and discovery for YouTube: taking random walks through the view graph," in WWW '08: Proceedings of the 17th International Conference on World Wide Web. ACM, 2008, pp. 895-904.
[7] J. Bennett and S. Lanning, "The Netflix Prize," in Proceedings of KDD Cup and Workshop, 2007.
[8] A. Das, M. Datar, A. Garg, and S. Rajaram, "Google News personalization: Scalable online collaborative filtering," in 16th International World Wide Web Conference. ACM, 2007, pp. 271-280.
[9] K. Yu, A. Schwaighofer, V. Tresp, X. Xu, and H.-P. Kriegel, "Probabilistic memory-based collaborative filtering," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 56-69, Jan 2004.
[10] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl, "GroupLens: Applying collaborative filtering to Usenet news," Communications of the ACM, vol. 40, no. 3, pp. 77-87, Mar 1997.
[11] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: Item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003.
[12] T. Mei, B. Yang, X.-S. Hua, L. Yang, S.-Q. Yang, and S. Li, "VideoReach: an online video recommendation system," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2007, pp. 767-768.
[13] L. Michael, "Study: Videos Live Fast, Die Young On Web," http://www.businessinsider.com/2008/6/online-video, 2008.
[14] B. Miller, I. Albert, S. Lam, J. Konstan, and J. Riedl, "MovieLens unplugged: experiences with an occasionally connected recommender system," in IUI Conference, 2003, pp. 263-266.
[15] R. L. Cilibrasi and P. M. B. Vitanyi, "The Google similarity distance," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 370-383, Mar 2007.
[16] M. Sugano, M. Furuya, A. Yoneyama, Y. Takishima, and Y. Nakajima, "Framework for context-based film recommendation system," International Conference on Consumer Electronics, vol. 7, no. 11, pp. 299-300, Jan 2006.
[17] Y. Moshfeghi, D. Agarwal, B. Piwowarski, and J. M. Jose, "Movie recommender: Semantically enriched unified relevance model for rating prediction in collaborative filtering," in Proceedings of the 31st European Conference on Information Retrieval. Springer, 2009, pp. 54-65.

