
184 IEEE SYSTEMS JOURNAL, VOL. 8, NO. 1, MARCH 2014

Cloud-Based Mobile Multimedia Recommendation System With User Behavior Information
Yijun Mo, Jianwen Chen, Senior Member, IEEE, Xia Xie, Changqing Luo, and
Laurence Tianruo Yang, Member, IEEE

Abstract—Facing massive multimedia services and contents on the Internet, mobile users usually spend a lot of time finding the content that interests them. Therefore, various context-aware recommendation systems have been proposed. Most of those proposed systems deploy a large number of context collectors at terminals and access networks. However, context collecting and exchanging result in heavy network overhead, and context processing consumes substantial computation. In this paper, a cloud-based mobile multimedia recommendation system which can reduce network overhead and speed up the recommendation process is proposed. The users are classified into several groups according to their context types and values. With accurate classification rules, the context details do not need to be computed, and the network overhead is reduced. Moreover, user contexts, user relationships, and user profiles are collected from video-sharing websites to generate multimedia recommendation rules based on the Hadoop platform. When a new user request arrives, the rules are extended and optimized to make real-time recommendation. The results show that the proposed approach can recommend desired services with high precision, high recall, and low response delay.

Index Terms—Cloud computation, minimal spanning tree, multimedia service recommendation, user behavior analysis.

I. INTRODUCTION

ACCORDING to Cisco's latest forecast, two-thirds of the world's mobile data traffic and 62% of the consumer Internet traffic will be video by the end of 2015. The sum of all forms of video (TV, video on demand, Internet, and P2P) will continue to be approximately 90% of global consumer traffic by 2015. Internet users post a large number of video clips on video-sharing websites and social network applications every day [1]. The video content may be duplicate, similar, related, or quite different. Facing billions of multimedia webpages, online users usually have a hard time finding their favorites. This situation is even worse for mobile users because of limited screen size and low bandwidth. Helping mobile users obtain their desired content lists from billions of webpages in a short time is very challenging [2]. Some video-sharing websites recommend video lists for end users according to video classification, video description tags, or watching history. However, these recommendations are not accurate and are often not consistent with the end users' interests. To improve this, some websites also provide users with a search engine to find their desired videos quickly. However, searching is based on keywords. In most cases, mobile users do not have any keyword in mind when they search. Favorite video recommendation techniques are commercially driven and are important for mobile multimedia applications.

Several successful video recommendation algorithms and systems have been developed and exploited. For example, Google has adopted a content-based filtering (CB) recommender system in its AdWords services. The Google search engine returns search results with keyword-related advertisements. However, those advertisements are usually neglected by end users. This is mainly because of the biased decisions of users' favorite content [3]. Unfortunately, Google AdWords had been removed from the right side of the page. Amazon and Taobao have achieved great success in recent years. They have introduced collaborative filtering (CF) recommender systems into their e-commerce websites to help users find the goods that interest them [4]. The users' interests are identified by matching the click and concern patterns among a group of users. The basic concept is to use the behavior of a large group of people to predict individual interests. Therefore, highly popular content is considered to be of common interest, while less popular content is usually not judged to be of interest to users. As a result, content that is less popular but does interest a user will never be recommended to that user. Another famous recommender system, based on social network filtering (SNF), is exploited by Facebook. On Facebook, the social network is formed according to social signals, such as space links, user concerns, content forwards, and user interactions. Users can recommend content to their social network. That has become a trend in content recommendation. However, recommendation satisfaction, cold start, and timeliness in content recommendation are still three challenging issues [5].

Manuscript received July 29, 2012; revised December 27, 2012; accepted May 8, 2013. Date of publication January 16, 2014; date of current version February 5, 2014. This work was supported in part by the National Key Projects of China under Grants 2012ZX03002010 and 2009ZX03004-004-004-04 and in part by the National Science Foundation of China under grants No. 61001070 and 61201219. (Corresponding author: X. Xie.)

Y. Mo is with the Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: moyj@hust.edu.cn).

J. Chen is with the Department of Computer Science, University of California, Los Angeles, CA 90095 USA (e-mail: jianwen.chen@ieee.org; xu-feng@live.com).

X. Xie and C. Luo are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: shelicy@mail.hust.edu.cn; chqluo2013@gmail.com).

L. T. Yang is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China, and also with the Department of Computer Science, St. Francis Xavier University, Antigonish, NS B2G 2W5, Canada (e-mail: ltyang@gmail.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSYST.2013.2279732

1932-8184 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
MO et al.: CLOUD-BASED MOBILE MULTIMEDIA RECOMMENDATION SYSTEM WITH USER BEHAVIOR INFORMATION 185

For almost all of the existing recommendation algorithms, the typical system consists of two essential components: 1) a content recommender that takes charge of user interest identification, user interest recommendation, and result reranking and 2) various collectors that collect user context and activities, content attributes, and updates. In recommendation system initialization, a small amount of contextual information, e.g., time and location, is collected [6]. To capture the interests of users in a ubiquitous environment, more and more contextual information, such as user opinions, watching times, and video ages, is logged in the recommendation system [4]. Real-time recommendation cannot be guaranteed due to the inevitable increase in computation. User interests and content clustering are often used to narrow the searching range of related content.

In this paper, we propose a mobile multimedia recommender system based on user behavior. The system is implemented on the Hadoop platform to satisfy the huge computation requirements of real-time recommendation systems. Compared with traditional recommender systems, there are three differences: 1) the collector and user profiles are decentralized into several computing nodes; 2) user behavior clusters are collected rather than only user profiles; and 3) a graph-based optimization mechanism is introduced into the recommender to speed up the recommendation process. The proposed system makes the following contributions.

1) User clusters are collected instead of detailed user profiles. More and more user contexts and profiles will be delivered and exchanged as the number of collectors and Hadoop nodes increases. To avoid the explosion of network overhead, user-behavior-based clustering is performed first, and the collectors calculate user clusters according to the clustering rules and then report only the user cluster to the recommender.

2) The Hadoop platform is used in the proposed multimedia recommendation system. On the platform, user clusters and multimedia content are collected, distributed, and stored into the Hadoop distributed file system (HDFS). During user content recommendation, those data are partitioned into several chunks, the chunks are processed simultaneously by several mappers, and then, the results are reduced and merged together [7]. The MapReduce procedure can speed up existing recommendation algorithms, such as CB, CF, or SNF (social-network-based filter).

3) Recommendation rules are reordered to improve scalability and real-time recommendation. Existing recommendation systems always recommend a ranked list to users after training from some given data. However, if the content changes or a new keyword appears, a fixed list is still provided. In our work, according to recommendation rules, the recommender searches a real-time ranked list for users. Furthermore, considering the influence of rule execution order, we propose a graph-based rule reordering method to reduce searching latency.

The rest of this paper is organized as follows. Section II discusses the related work and proposes the cloud-assisted system architecture. Section III presents the user behavior model, user-behavior-based clustering, clustering-based user profile collecting, and cloud-based recommendation rule reasoning in detail. Further discussion about system performance optimization is presented in Section IV. Section V describes the implementation of the proposed recommender system and presents a comprehensive evaluation of the system. Section VII concludes this paper and discusses future work.

II. RELATED WORK AND SYSTEM ARCHITECTURE

For emerging mobile devices and services, various context-aware service platforms, such as SPICE [8], CASD [9], and uPnP-based architecture [10], have been developed to provide mobile users with their favorite services and applications. Recommendation systems based on users' preferences have been applied to user favorite recommendation for several years. In this section, we review existing recommendation systems and present the architecture of the proposed cloud-assisted recommendation system.

A. Recommendation System

Recommendation systems focus on a specific domain. For example, Google News provides personalized news recommendation services for a substantial number of online readers. Amazon uses a recommender system to help users find their desired products. YouTube uses user watching history to predict and recommend videos for users. In general, four categories of algorithms have been exploited by recommender systems: CB recommendation [11]–[13], CF-based recommendation [14]–[19], context-aware recommendation, and graph-based recommendation [20].

CB recommendation: The systems make recommendations based on the similarities of content titles, tags, or descriptions. Some systems find items of interest to the user based on the user's individual reading history in terms of content. CB recommender systems are easy to implement. However, in some scenarios, simply representing the user's profile information by a bag of words is not sufficient to capture the exact interests of the user.

CF-based recommendation: The systems make recommendations based on abundant user transaction histories and content popularity. In these systems, an individual user's interests are predicted from a group of similar users [15]–[17]. To obtain the content rating and users' similarity, statistics and feedback methods are used [18], [19]. CF systems require sufficient historical consumption records and feedback. Otherwise, prediction, implicit feedback, or opinion classification methods should be adopted to solve cold-start issues [5].

Context-aware recommendation: The aforementioned systems provide stable recommendations without considering user context information. In fact, user interests vary according to location, time, and emotion. Context-aware recommendation systems combine the user context sensed on the smartphone with the long-term user profile to assist the user in selecting better services, photographs, or videos dynamically. Context is a difficult concept to capture and describe; fuzzy ontologies and semantic reasoning are used to augment and enrich the description of context [21], [22].

Graph-based recommendation: A graph is built in these systems to calculate the correlation between recommendation objects, so that the recommendation problem turns into a node selection problem on a graph. Incorporating conversion content and contextual information, links on video pages are converted to an undirected weighted graph. Furthermore, the graph is partitioned for recommending videos of latent topics or long-tail videos. Besides that, users' cotagging behaviors and friendships in a social network can be described by a graph, and then, random walk with restarts is applied on the social graph to recommend items [4], [14], [20].

With the huge increase of user numbers, user contexts, user profiles, and video contents, recommendation systems require more and more computation capacity. To meet the huge computation requirements, CF algorithms and context-aware algorithms have been implemented on cloud-computing platforms to improve the performance and scalability of the recommender system [7].

B. Cloud-Based Recommender System

Based on the aforementioned description, we propose a novel cloud-based recommender system for video applications. The framework of the proposed system is illustrated in Fig. 1. The system includes two parts: recommendation training and real-time recommending. Recommendation training components collect user contexts, user relationships, and user profiles and then cluster and filter the behavior data on the Hadoop platform to obtain recommendation rules. When a user requests new videos, real-time recommending components extend the request to recommendation rules and return the recommendation lists in accordance with optimized rules. The four major components and procedures in our framework are described as follows.

1) User behavior collecting: Video surfing behavior relies on user contexts (time, location, network type, etc.), user interests (browsed content, access patterns, and preferred keywords and categories), and friend recommendation (reviewed, replied, commented, and forwarded relationship). Most of the user contexts cannot be retrieved on application servers; application plug-ins are loaded on the user terminal to collect the contexts. With the increase of context types and online users, networking and computing resources will be consumed quickly. Furthermore, high-dimensional contexts are very challenging for recommendation algorithms. In order to avoid these issues, initially collected contexts are clustered at the server side; then, clustering rules are used by the application plug-ins to calculate the user clusters, and the clusters are reported to context collectors instead of the former user contexts. By this approach, networking and computing load is relieved. To guarantee that the clusters stay fresh, context collecting and clustering will be restarted periodically. The user's social connections and profiles are collected by the collectors at the server side.

2) User content clustering: The user's social connections and user profiles are exploited by this component to find user content similarity. Social connections are retrieved from the user's actions on videos shared by other users. In addition to content categories of user profiles, several communities are formed. In each community, content descriptions (titles, tags, and resolutions) and content access patterns (click behaviors on videos, access frequency, and access time) are mapped into a six-attribute tuple; then, the user content clustering algorithm is executed on the tuple to obtain user interest clusters and user content similarity. The component is implemented within the MapReduce framework.

3) Dynamic recommendation rule generating: If all similar users and user content lists of a user are stored in his profile, the system must allocate more storage space for each user as the numbers of users and videos increase. As a result, the system becomes unscalable and adds latency to searching the recommendation lists. Considering user contexts, some user content lists will be duplicated. To resolve this issue, recommendation rules are extracted from user context clusters and user content clusters. The rules are composed dynamically during real-time recommendation.

4) Optimized real-time recommendation: This is the only real-time component; it accepts the user's new requests and returns the recommendation lists to the user. The procedure translates the user's request into recommendation rules on the basis of request keywords and implicit user contexts, and then searches for user favorites according to the rules. To guarantee user experience, the procedure must provide a real-time response. The system adjusts the execution order of the rules based on the weighted graph to reduce searching latency.

Fig. 1. Cloud-assisted recommender system framework.

III. USER BEHAVIOR CONSTRUCTION

Making suitable recommendations for mobile users relies on accurate and complete user behavior models. Intuitively, user behavior is highly influenced by the user's neighbors in the social network, varies as the environment changes, and inherits from history. Therefore, our system considers mainly three kinds of user behavior: access preferences, social activities, and reading interests. They are extracted respectively from user contexts,

connection maps, and user profiles. Those data are crawled on Tudou (a top video-sharing website in China), except that user contexts are collected from an Android-based Tudou client. More details are presented in the following sections.

A. User Contexts and Access Preferences

It has been verified that user contexts are essential to provide users with the right services in ubiquitous networks. Analyzing users' access attributes on Tudou, we can draw a similar conclusion: access network types and devices affect the resolution, length, and bit rate of requested videos; access time and user location affect the accessed video categories of the mobile user. Under the combined effect of those contexts, the same user may show different interests. For example, a 3G user on a journey prefers short clips or TV series with a low bit rate, while the same user with WiFi in the office neglects video length and prefers clips with higher resolution. Some users watch videos of specific types at fixed times. Therefore, some research studies use domain ontologies directly to augment and enrich contextual information, and input the contextual information into a suitable reasoner to recommend content automatically. Because contextual information such as time and location is likely to change, user contexts have to be reported promptly. However, Tudou has approximately 100 million clicks generated by 20 million users every day. Supposing that only 10% of users access from mobile networks, at least 240 MB of context messages will be delivered to the server every day (in a context message, access time occupies 8 B, leave time occupies 8 B, location occupies 8 B, and network type and device type occupy 2 B). If domain ontologies are adopted, context messages bring more than 1 GB of network overhead every day. The large volume of contextual information puts great pressure on reasoning at the server side. Besides that, context collection invades users' privacy.

As mentioned previously, a user usually watches videos of specific types in a particular context, which indicates that users can be clustered, each user cluster has a unique access preference, and user context collection can be replaced by user cluster collection. To cluster users and to collect user clusters, we developed Tudou plug-ins on the user terminal with a novel user context collecting protocol. The flowchart is described in Fig. 2.

1) The Tudou plug-ins on initial users or contract users collect basic user contexts, such as network type (CX_net), device type (CX_dt), access time (CX_t0), leave time (CX_t1), and location (CX_loc). The contexts are reported to the collectors at the server side.

2) The collectors store the contexts into an input tuple with five attributes ⟨CX_net, CX_dt, CX_t0, CX_t1, CX_loc⟩.

3) The properties of the related accessed video are mapped into an output tuple with five attributes ⟨V_res, V_len, V_rate, V_age, V_cat⟩, where V_res represents video resolution, V_len denotes video length, V_rate denotes video bit rate, V_age is video age, and V_cat represents video categories.

4) Based on the input and output tuples, we adopt subspace clustering through attribute clustering (SCA) to abandon redundant attributes and to get user clusters. In this phase, we drop noise points in the result, merge some small clusters, and then generate rough clustering rules according to the values of the cluster attributes.

5) The server maps the clustering rules into key-value pairs and pushes possible rule pairs to mobile users. A typical list of rule pairs is ⟨ClusterID, 0x12⟩, ⟨Attributes, 3⟩, ⟨Network, wifi⟩, ⟨Location, static⟩, ⟨Viewing, "30-60"⟩. The rule means that the cluster with ID 0x12 has three attributes. The users in the cluster access the network by WiFi, and they usually view a video for 30 to 60 min without moving. In a rule, if the value of an attribute is a range, it is denoted as "lowerbound-higherbound" (e.g., "30-60"). To explain more clearly, we use the meaning of the values instead of the true values.

6) When a user requests videos, the plug-ins collect user contexts and calculate the cluster to which the user belongs. The cluster is reported to the collectors at the server side for later clustering and recommending.

The cluster information will be combined with user reading interests to obtain the complete recommending rules in the later phase.

Fig. 2. Protocol of user context collection.

B. User Connections and Social Activities

On multimedia-sharing websites, such as Flickr, Facebook, and Twitter, users assign tags to the resources. According to analyses of tagging information in former research studies, users with cotagging behaviors show high similarity on specific items. Besides that, online users often click the resources recommended by the users they follow and the groups they are interested in. Based on the implicit user–user and user–resource relationships in social networks, the recommendation system can achieve better performance and lower time cost.

Fig. 3. Protocol of user context collection.

Tudou had approximately 80 million registered users and 17 592 groups as of February 2012. The group information cannot be used to recommend items directly due to data sparsity

in each group and incomplete user coverage. Fortunately, users often co-comment on popular videos and add others who provide them with favorite videos as idols. These behaviors hide some group information. Therefore, we are concerned with three kinds of user relationship: idol–fan profiles, co-commenting behaviors, and interest groups. Based on these connection relationships, we construct a weighted graph like Fig. 3.

The graph is composed of three kinds of subgraph. The subgraph with red links is constructed from idol–fan profiles; if a user adds another user as his idol, the two users are linked together. If two users take each other as idols, there are two edges between the users. The subgraph with blue links is generated from user co-comment behaviors. If a user comments on a video of interest, an edge links the user and the video. The subgraph with green links is plotted based on user interest groups. If a user joins an interest group, he is connected with the group. It is complex to recommend items based on the graph directly.

With the help of intermediate videos, we merge the three subgraphs into one graph of user–user community. The new graph G = {V, E} consists of node set V and edge set E. V = {u_1, u_2, ..., u_i, ..., u_m}, where node u_i denotes the ith user. E = {e_1, e_2, ..., e_j, ..., e_n}, where edge e_j denotes the correlation between two users. The weight of node u_i [Weight_node(u_i)] and the weight of edge e_j [Weight_edge(e_j)] are calculated as follows:

$$\mathrm{Weight}_{node}(u_i) = \alpha \cdot fans + \beta \cdot comments + \gamma \cdot grpusers$$

$$\mathrm{Weight}_{edge}(e_j) = \alpha \cdot \frac{1 + MuFans(j0, j1)}{2} + \beta \cdot \frac{CoCom(j0, j1)}{com_{j0} + com_{j1}} + \gamma \cdot \frac{CoGrp(j0, j1)}{grp_{j0} + grp_{j1}}$$

$$\alpha > 0,\ \beta > 0,\ \gamma > 0;\ \alpha + \beta + \gamma = 1 \tag{1}$$

where α, β, and γ represent the influence of idol–fan profiles, co-commenting behaviors, and interest groups. fans is the number of the ith user's fans, comments is the number of comments on the videos posted by the ith user, and grpusers is the size of the group created by the ith user. MuFans(j0, j1) depends on whether the two users are mutual idols. If they are, it is 1; otherwise, it is 0. com_j0 and com_j1 are the numbers of comments made by the two users linked by the jth edge, while CoCom(j0, j1) is the number of co-comments of the two users. grp_j0 and grp_j1 are the numbers of groups joined by the two users, while CoGrp(j0, j1) is the number of groups joined by both users.

If the weight of an edge is lower than a threshold δ, the users linked by the edge are divided into two subgraphs, and in this way the graph is partitioned into several subgraphs. The tunable value δ depends on the probability distribution of Weight_edge. For example, the graph in Fig. 3 is partitioned into two subgraphs. The partitioned groups and the weights of nodes are adopted as important parts for interest extraction.

C. User Profiles and Viewing Interests

Based on context clusters and social communities, we can make a rough recommendation for users. More accurate recommendation depends on the user's reading interests in content. A user's reading interests are extracted from his profiles, which keep track of what videos he has viewed. Former research studies build user profiles by exploring three different but related dimensions: topic distribution, similar access patterns, and preferred entities. However, similar access patterns have been discussed previously, and preferred entities are hard to detect from videos. We construct the user's profile from two aspects: video content and video attributes; each aspect includes several dimensions.

1) Video content: It is characterized by a probability vector of keywords in the video title and tags, and the vector is denoted as {key_1, pro_1, key_2, pro_2, ...}. If keywords or tags are synonyms, the related elements are merged.

2) Video attributes: Besides video content, the videos of interest to a user have many specific attributes, such as video length, video resolution, video popularity, and video age, which are denoted as a list {va_1, va_2, ...}. Analyzing historic attribute lists from a user's profiles, we obtain the probability of the user's interests in specific video clusters.

The aforementioned profiles are not only explored by their own users during recommending but also used by other users during CF. On Tudou, 80 million users click 100 million times a day, and the data increase every day. Recommending based on these items would consume a great deal of computation and bring large latency. On the other hand, users' interests vary from time to time, and old profiles introduce a lot of noise during recommending. To overcome these problems, we combine the group results in Section III-B, put the profiles of the users in the same group into one set, and adopt the k-means clustering algorithm on the set to obtain interest clusters. By doing so, the searching space is narrowed down to one cluster when making real-time recommendation. For example, searching in the profiles of 10 days requires an n-dimensional query in 1000 million profiles; after the profiles have been divided into M groups with N clusters each on average, the search requires log_2 M + log_2 N one-dimensional searches and an n-dimensional query in 1000 million/(M * N) profiles on average.

IV. CLOUD-ASSISTED CLUSTERING

As mentioned previously, various clustering algorithms are adopted to analyze user behavior and to obtain the recommending rules. For example, SCA is used to cluster user contexts, graph partition is exploited to get community groups, and k-means is introduced into viewing interest clustering. Although the algorithms are executed offline, the work is still unacceptably time consuming. Therefore, we deploy the clustering algorithms on Hadoop, a well-known MapReduce-based cloud platform provided by Apache. More details are illustrated in Fig. 4.

1) User profiles in HDFS are cut into s chunks. Each chunk includes profiles of different users, and the profiles of the same user may be stored in several chunks. To balance resources and processing latency among Hadoop nodes,

Fig. 6. Examples of rules.


are separately pai and pbj , A and B have T attributes,
and vaia and vaib are separately the ith attributes of
A and B. Sim(A, B)’s include two kinds of simi-
larity, SimSem(A, B) denotes semantic similarity of
Fig. 4. Protocol of user context collection. keywords, SimAttr(A, B) represents similarity of
attributes, minSD(ai, B) is minimal semantic distance
between keyword ai and all keywords in B, and
minSD(bj, A) is minimal semantic distance between
keyword bj and all keywords in A. The semantic distance
is calculated according to semantic tree/dictionary; more
details will not be discussed here.
4) According to the similarity, the mapper puts the most
similar profiles into one cluster and computes the means
of similarity.
5) To guarantee the stability of the cluster, the central point
of the cluster is rechosen to make the means of similarity
minimal. If the new central point is the same as the old
one, the mapper task terminates. Otherwise, k central
points are updated.
6) Steps 3–5 repeat until the central point remains un-
changed or the means of similarity is below a threshold.
Each mapper obtains k clustering rules, each with a central profile, a list of profile attributes, and an upper bound of the means of similarity.
7) The intermediate rules are input into the reducer to merge ks clusters into k clusters.
8) The reducing procedure is like the map phase. k central points are chosen from ks profiles; if the profiles are the same, the related clusters are merged directly. Otherwise, the reducer performs the same iteration as the mapper to get k stable clusters.
With the help of the Hadoop platform, we obtain clustering rules rapidly. The rules include the cluster ID, center point of user profiles, concerned attributes, and threshold of similarity. During recommendation, the rules are adopted instead of the recommendation lists used in previous work.

Fig. 5. Protocol of user context collection. (a) Mapper. (b) Reducer.

chunk size is decided by the computation capacity and memory storage of the nodes.
2) Clustering mappers are triggered as in Fig. 5(a) to process the profiles.
3) A mapper randomly chooses k profiles from a chunk as the central points of k clusters and calculates the cosine similarity between the remaining profiles and the central profiles as

$$\mathrm{Sim}(A,B) = \mathrm{SimSem}(A,B) \cdot \mathrm{SimAttr}(A,B)$$

$$\mathrm{SimSem}(A,B) = \frac{\sum_{i\in I,\, j\in J} \left[ p_{a_i}\, p_{b_j}\, \mathrm{minSD}(a_i,B)\, \mathrm{minSD}(b_j,A) \right]}{\sqrt{\sum_{i\in I} p_{a_i} \left[\mathrm{minSD}(a_i,B)\right]^2}\; \sqrt{\sum_{j\in J} p_{b_j} \left[\mathrm{minSD}(b_j,A)\right]^2}}$$

$$\mathrm{SimAttr}(A,B) = \frac{\sum_{i\in T} \left( v_{a_i a}\, v_{a_i b} \right)}{\sqrt{\sum_{i\in T} (v_{a_i a})^2}\; \sqrt{\sum_{i\in T} (v_{a_i b})^2}} \tag{2}$$

where A and B are two profiles, A has I keywords $a_i$, B has J keywords $b_j$, the probabilities of $a_i$ and $b_j$

V. OPTIMIZED REAL-TIME RECOMMENDATION

Through cloud-based clustering and analysis of user behaviors, the recommendation rules can be obtained. The rules guide the recommender on how to recommend videos to users when new requests arrive. Typical rules are illustrated in Fig. 6.
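As a rough illustration of how a stored clustering rule might be applied to an incoming profile, the Python sketch below uses the attribute part of (2), SimAttr, as a cosine similarity over a rule's concerned attributes. The `Rule` fields mirror the rule contents named in the text (cluster ID, center point of user profiles, concerned attributes, and threshold of similarity). The field names, the dictionary representation of a profile, the example values, and the decision to accept a match when the similarity reaches the threshold are all assumptions made for this example, not details given in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    cluster_id: str
    center: dict       # attribute name -> value of the central profile
    attributes: list   # concerned attributes of this rule
    threshold: float   # assumed: minimum similarity required for a match


def sim_attr(a, b, attributes):
    """Cosine similarity of two profiles over the given attributes (SimAttr in (2))."""
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in attributes)
    norm_a = math.sqrt(sum(a.get(t, 0.0) ** 2 for t in attributes))
    norm_b = math.sqrt(sum(b.get(t, 0.0) ** 2 for t in attributes))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a profile with no value on these attributes cannot match
    return dot / (norm_a * norm_b)


def match_rule(profile, rules):
    """Return the cluster id of the best rule that reaches its threshold, else None."""
    best_id, best_sim = None, -1.0
    for rule in rules:
        s = sim_attr(profile, rule.center, rule.attributes)
        if s >= rule.threshold and s > best_sim:
            best_id, best_sim = rule.cluster_id, s
    return best_id


# Hypothetical rules and profile, for illustration only.
rules = [
    Rule("m", {"duration": 10.0, "resolution": 360.0}, ["duration", "resolution"], 0.9),
    Rule("x", {"iphone": 1.0, "cloud": 1.0}, ["iphone", "cloud"], 0.9),
]
print(match_rule({"iphone": 1.0, "cloud": 0.8}, rules))  # prints "x"
```

Only the attribute similarity is shown; the semantic part SimSem would additionally need a semantic tree or dictionary, which the text does not specify.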
Fig. 7. Examples of request.

Fig. 8. Examples of rule execution reorder.

In the example of Fig. 6, the users who access the websites via WiFi from 15:00 to 24:00 and move within a small range are classified into cluster m. They often watch videos of a particular category for at least 10 min at 360P resolution. The videos that show how the iPhone pushes messages of interest to users via the cloud platform are grouped into cluster x. When a new iPhone user requests a video introducing cloud techniques, his request is extended as in Fig. 7(a), according to his contexts, input keywords, and implicit keywords. Matching the query against the recommendation rules in Fig. 6, we obtain the recommendation lists for the user.

The matching procedure translates implicit access content rules into search dimensions. The rules are executed sequentially, one by one. Intuitively, if the execution order is left unoptimized, the search latency cannot be guaranteed. In our tests on Tudou, 30% of the videos last more than 10 min, 80% of the videos have a resolution above 360P, 22 982 videos are tagged as “iPhone,” and 45 570 videos are tagged as “cloud.” If the resolution rule is executed first, 48 million videos are picked up, and the remaining rules are executed on 48 million videos. If instead the rule with “iPhone” is executed first, the remaining rules are executed on only 22 982 videos. The latter order clearly reduces both the result sets and the latency.

Based on this observation, the execution order of the recommendation rules should be adjusted before matching. We use a weighted graph to decide the order. Each atomic rule is taken as a node, and the node is weighted by the summary statistics of the rule. Two rules are connected by an edge, and the edge is weighted by the summary statistics under the constraint of the two rules. The rules [Fig. 7(b)] can be translated to a graph as in Fig. 8.

Before rule execution, we choose an initial rule according to the summary statistics. For example, node C (“iphone” key1 ?X?K1) in Fig. 8 is chosen as the first rule because the weight of the node is 360 000, which is the smallest among all nodes. After node C is executed, the search space is narrowed down to 360 000 items. Under the constraint of node C, the summary statistics of edges CA, CB, CD, and CE are compared, and D is chosen as the second rule because CD has the smallest weight. Repeating this procedure, we obtain a spanning tree {C, D, A, E, B}. Rule execution follows the order of the spanning tree, and the search space is narrowed gradually to about 50 items, without considering duplicate removal. Note that the summary statistics of the edges are obtained from evaluation on samples or inferred from the weights of the related nodes, and they need to be updated in a timely manner during spanning tree construction. Note also that the context clustering rule should be translated to a content clustering rule before reordering.

VI. EXPERIMENTAL RESULTS AND ANALYSIS

Our recommender system includes two parts. One part is the application plug-in on the mobile terminal, which is implemented on the Android platform, collects user contexts, and determines the cluster to which the user belongs according to the context clustering rule. The core part is the recommender at the server side, which is implemented on the Hadoop platform and includes user context clustering, user group partition, user profile clustering, recommendation rule generation, and real-time recommending. We deploy the recommender system on a small-sized cloud platform with four nodes.

To evaluate the performance of the system, training data and test data must be gathered. We collect user contexts via the Android client and survey questionnaires from 30 volunteers in the National Engineering Laboratory for Next Generation Internet Access System. We also extract public information from Tudou, such as user group information, video attributes, video descriptions/tags, and user profiles. The first three kinds of information are easy to obtain by crawling Tudou. To get user profiles, we collect the profiles from the 30 volunteers and from user comment records. Therefore, the videos without comments are removed during preprocessing. Considering the timeliness of user behaviors and user interests, all data are collected from October 1, 2011, to February 18, 2012. After preprocessing, 17 592 user groups, 974 000 video items, and 293 000 users with 1 450 000 user profiles are stored.

After gathering the data, we divide the user profiles into two sets. One set of profiles is used as training data and denoted as T, and the other set is used as verification data and denoted as V. Our recommender system recommends a list with no more than 20 items, which is denoted as R. Following other works, we define two metrics, Precision(R, V) and Recall(R, V), as in (3) to evaluate the performance of our work

$$\mathrm{Precision}(R,V) = \frac{\mathrm{size}(R \cap V)}{\mathrm{size}(R)}, \qquad \mathrm{Recall}(R,V) = \frac{\mathrm{size}(R \cap V)}{\mathrm{size}(V)}. \tag{3}$$
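The two metrics in (3) reduce to set arithmetic on item identifiers: the number of recommended items that also appear in the verification set, divided by the size of R for precision and by the size of V for recall. The following Python sketch is illustrative only. The function name and the example identifiers are made up, and returning 0.0 for an empty set is a convention chosen here, not something the paper states.

```python
def precision_recall(recommended, verified):
    """Precision(R, V) and Recall(R, V) as defined in (3), on collections of item ids."""
    r, v = set(recommended), set(verified)
    hits = len(r & v)                      # size(R ∩ V)
    precision = hits / len(r) if r else 0.0
    recall = hits / len(v) if v else 0.0
    return precision, recall


# Hypothetical recommendation list R and verification set V.
R = ["v1", "v2", "v3", "v4"]
V = ["v2", "v4", "v9"]
precision, recall = precision_recall(R, V)
print(precision, recall)
```

Two of the four recommended items are in V, so the precision is 0.5 and the recall is two thirds.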
Fig. 9. Recommendation quality comparison of user profile cluster numbers.


Fig. 10. Recommendation quality comparison of user numbers in interest
group.

By examining the precision and recall for the volunteers and for the users whose profiles were used in both the training and verification phases, we obtain a curve of average precision versus recall. We compare the precision–recall curve and the run-time recommendation latency in several scenarios. We also compare the training latency and the run-time latency on Hadoop with those on a stand-alone machine. The proposed rule reordering method reduces the latency of personalized real-time content recommendation.
Effect of number of clusters: In our recommender system, three kinds of clusters are used. We discuss the cumulative effect of the clusters. First, we evaluate the effect of the user profile cluster number. Then, the number of attribute clusters is evaluated. Finally, we analyze the number of user group clusters. With this cumulative method, we also evaluate the improvement of recommendation quality step by step.

Fig. 9 shows the comparison results for different profile cluster numbers. As the number of user profile clusters k increases from 1 to 32, precision and recall improve consistently. Increasing k beyond 32 yields less improvement in recommendation quality, but the user profiles are clustered into more small groups, and small groups benefit real-time recommendation. The figure also shows that a very large cluster number leads to severe quality degradation. For example, the recommendation quality at k = 300 is even lower than that at k = 1. The reason is that the few profiles in a small group do not cover user interests, and we do not adopt collaborative filtering in the experiment.

Based on the comparison of user profile cluster numbers, we set the profile cluster number k to 32 and evaluate the effect of the interest group number on the same data set. The result is shown in Fig. 10. The effect of the interest group number g is similar to that of the profile cluster number k. When the group number is less than 16, recommendation quality improves as g increases. If the number deviates from the optimal value, the quality degrades gradually. However, the effect of the group number g is smaller than that of the profile cluster number k.

Next, we set k to 32 and g to 16 and check the effect of the user context cluster number. The result is shown in Fig. 11.

Fig. 11. Recommendation quality comparison of user context cluster numbers.

From the figure, we conclude that the effect of the context cluster number c is very small in our experiments. This may be caused by the small sample set of user contexts: only 30 volunteers reported their contexts, the volunteers are students from the same laboratory, and they access the website in similar environments.

Clustering latency: In our experiments, we deploy our recommender system on four HP ProLiant Blade servers with Xeon 2.4-GHz CPUs and 8-GB DDR3 memory. Comparing the latency of the three clustering algorithms with and without the cloud, we plot the clustering latency for different cluster sizes in Fig. 12. The figure shows that the cloud platform does not reduce clustering latency when the cluster number is small, but it improves the performance of the system as the cluster number increases.

Real-time recommendation latency: Finally, we compare the real-time recommendation latency of three methods: CF, the rule-based algorithm without optimization, and the rule-based algorithm with optimization. The result is given in Fig. 13. The figure shows that the rule-based algorithms reduce latency by a factor of about six compared with CF. If the rule-based algorithm is optimized by the weighted graph, the latency is reduced by another 50%.
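The weighted-graph optimization compared here is the rule reordering of Section V. A minimal Python sketch of that ordering step follows, under stated assumptions: the first rule is the node with the smallest weight, and each subsequent rule is the unexecuted one with the smallest edge weight to the most recently executed rule. The text does not say whether edges are compared from the last rule only or from all executed rules, so that choice is an assumption, as is the fallback to the smaller node weight when an edge statistic is missing. All numbers except the weight of node C are hypothetical, and the statistics are kept static, whereas the paper notes that they should be updated during construction.

```python
def order_rules(node_weight, edge_weight):
    """Greedy execution order: most selective rule first, then the cheapest next rule."""
    def edge(u, v):
        # Assumed fallback: without a sampled edge statistic, use the smaller node weight.
        return edge_weight.get(frozenset((u, v)), min(node_weight[u], node_weight[v]))

    remaining = set(node_weight)
    current = min(remaining, key=lambda n: (node_weight[n], n))
    order = [current]
    remaining.remove(current)
    while remaining:
        # Pick the unexecuted rule with the smallest edge weight to the current rule.
        current = min(remaining, key=lambda n: (edge(current, n), n))
        order.append(current)
        remaining.remove(current)
    return order


# Hypothetical summary statistics; only the weight of C (360 000) comes from the text.
node_weight = {"A": 2_000_000, "B": 48_000_000, "C": 360_000, "D": 900_000, "E": 5_000_000}
edge_weight = {
    frozenset(("C", "A")): 40_000, frozenset(("C", "B")): 300_000,
    frozenset(("C", "D")): 6_000, frozenset(("C", "E")): 90_000,
    frozenset(("D", "A")): 700, frozenset(("D", "E")): 2_500,
    frozenset(("D", "B")): 5_000, frozenset(("A", "E")): 120,
    frozenset(("A", "B")): 600, frozenset(("E", "B")): 50,
}
print(order_rules(node_weight, edge_weight))  # ['C', 'D', 'A', 'E', 'B']
```

With these made-up weights the sketch reproduces the order {C, D, A, E, B} from the example in Section V; different statistics would of course give a different order.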
Fig. 12. Recommendation latency comparison of clustering algorithms on cloud or no cloud.

Fig. 13. Real-time recommendation latency comparison.

VII. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a cloud-assisted recommender system for videos. Based on the MapReduce platform, we have analyzed three kinds of user behaviors, including user contexts, interest groups, and user profiles. In line with the different characteristics of the three kinds of information, we adopt SCA, graph partition, and k-means, respectively. Unlike other recommender systems, we store recommendation rules instead of recommendation lists. Additionally, a graph-based rule reordering method is used in real-time recommending. Evaluation shows that the proposed system provides higher quality of recommendation with lower training latency and recommending latency.

In this paper, user profiles have been obtained from co-comment information, but users often leave no comment after viewing a video of interest, which leads to errors during clustering. For future work, we plan to handle the data sparsity of user profiles. Another important point that should be studied is the design of a distributed recommendation cache to improve the recommendation hit rate. The cache can also reduce the computation pressure caused by the large number of concurrent rule reorderings and executions.

ACKNOWLEDGMENT

The authors would like to thank Q. Chen and H. Wu for their work on data collecting and preprocessing.

REFERENCES

[1] C.-F. Lai, Y.-M. Huang, and H.-C. Chao, “DLNA-based multimedia sharing system for OSGI framework with extension to P2P network,” IEEE Syst. J., vol. 4, no. 2, pp. 262–270, Jun. 2010.
[2] K.-D. Chang, C.-Y. Chen, J.-L. Chen, and H.-C. Chao, “Challenges to next generation services in IP multimedia subsystem,” J. Inf. Process. Syst., vol. 6, no. 2, pp. 129–146, Jun. 2010.
[3] D. Li, Q. Lv, X. Xie, L. Shang, H. Xia, T. Lu, and N. Gu, “Interest-based real-time content recommendation in online social communities,” Knowl.-Based Syst., vol. 28, pp. 1–12, Apr. 2012.
[4] X. Wu, Y. Zhang, J. Guo, and J. Li, “Web video recommendation and long tail discovering,” in Proc. IEEE ICME, 2008, pp. 369–372.
[5] D. Poirier, F. Fessant, and I. Tellier, “Reducing the cold-start problem in content recommendation through opinion classification,” in Proc. IEEE/WIC/ACM Int. Conf. WI-IAT, 2010, pp. 204–207.
[6] M.-H. Kuo, L.-C. Chen, and C.-W. Liang, “Building and evaluating a location-based service recommendation system with a preference adjustment mechanism,” Exp. Syst. Appl., vol. 36, no. 2, pp. 3543–3554, Mar. 2009.
[7] Z.-D. Zhao and M.-S. Shang, “User-based collaborative-filtering recommendation algorithms on Hadoop,” in Proc. WKDD, 2010, pp. 478–481.
[8] C. Cordier, F. Carrez, H. Van Kranenburg, C. Licciardi, J. Van der Meer, A. Spedalieri, J. P. Le Rouzic, and J. Zoric, “Addressing the challenges of beyond 3G service delivery: The SPICE service platform,” in Proc. Workshop ASWN, 2006, pp. 1–29.
[9] P. Pawar and A. Tokmakoff, “Ontology-based context-aware service discovery for pervasive environments,” in Proc. IEEE Int. Workshop Service Integr. Pervasive Environ., Jun. 2006, pp. 1–7.
[10] C.-F. Lai, S.-Y. Chang, Y.-M. Huang, J. H. Park, and H.-C. Chao, “A portable uPnP-based high performance content sharing system for supporting multimedia devices,” J. Supercomput., vol. 55, no. 2, pp. 269–283, Feb. 2011.
[11] M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web. Berlin, Germany: Springer-Verlag, 2007, pp. 325–341.
[12] E. Gabrilovich, S. Dumais, and E. Horvitz, “Newsjunkie: Providing personalized newsfeeds via analysis of information novelty,” in Proc. WWW, 2004, pp. 482–490.
[13] L. Li, D. Wang, T. Li, D. Knox, and B. Padmanabhan, “SCENE: A scalable two-stage personalized news recommendation system,” in Proc. SIGIR, 2011, pp. 125–134.
[14] Z. Wang, Y. Tan, and M. Zhang, “Graph-based recommendation on social networks,” in Proc. Int. Asia-Pac. APWEB Conf., 2010, pp. 116–122.
[15] K. Ali and W. van Stam, “TiVo: Making show recommendations using a distributed collaborative filtering architecture,” in Proc. ACM SIGKDD, 2004, pp. 394–410.
[16] T. Hofmann, “Latent semantic models for collaborative filtering,” ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 89–115, Jan. 2004.
[17] Z. Zheng, H. Ma, R. Lyu, and I. King, “WSRec: A collaborative filtering based web service recommender system,” in Proc. IEEE Int. Conf. ICWS, 2009, pp. 437–444.
[18] G. Go, J. Yang, H. Park, and S. Han, “Using online media sharing behavior as implicit feedback for collaborative filtering,” in Proc. IEEE Int. Conf. Social Comput., 2010, pp. 439–445.
[19] Z. N. Chan, W. Gaaloul, and S. Tata, “Collaborative filtering technique for web service recommendation based on user-operation combination,” in Proc. OTM, 2010, pp. 222–239.
[20] S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly, “Video suggestion and discovery for YouTube: Taking random walks through the view graph,” in Proc. WWW, 2008, pp. 895–904.
[21] A. C. M. Costa, R. S. S Guizzardi, G. Guizzardi, and J. G. P. Filho, “COReS: Context-aware, ontology-based recommender system for service recommendation,” in Proc. Ubiquitous Mobile Inf. Collab. Syst., 2007, pp. 1–15.
[22] A. C. G. C. A. Cimino, B. Lazzerini, and F. Marcelloni, “Situation-aware mobile service recommendation with fuzzy logic and semantic web,” in Proc. ISDA, 2009, pp. 1037–1042.

Yijun Mo received the B.Eng. degree in electrical and electronics engineering, the M.Phil. degree, and the Ph.D. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999, 2001, and 2008, respectively.
Since November 2009, he has been an Associate Professor with HUST. His research interests include wireless networks, semantic networks and service composite, and multimedia communication.
Jianwen Chen (SM’12) received the Ph.D. degree in electrical engineering from Tsinghua University, Beijing, China, in 2007. His Ph.D. research focused on video compression algorithm design, video codec hardware architecture design, and embedded video codec algorithm optimization and implementation.
From 2007 to 2010, he was a Staff Researcher with IBM Research, where he conducted cutting-edge research on wireless communication systems and multicore video-coding architectures. From September 2010 to September 2012, he was with the research group in the Department of Electrical Engineering, University of California, Los Angeles (UCLA), Los Angeles, CA, USA, where he furthered the research on high-efficiency video-coding techniques, wireless networking systems, and high-performance computing systems and applications. Since October 2012, he has been a Senior Visiting Scholar with the Human Visio Research Center of Harvard, where he is focusing on visual quality evaluation, 3D video experience, and media cloud systems. He has authored more than 50 papers. His current research interests include multimedia communication over networks, video coding, and wireless communication network systems.
Dr. Chen has more than 50 standard proposals for MPEG, AVS, and VCEG since 2003. Since February 2012, he has served as the Chairman of the MPEG Internet Video Codec Ad Hoc Group. He was nominated as the Chancellor’s Postdoctoral Researcher of UCLA in 2012. He has served as a reviewer/organizer for many academic journals and conferences, such as the IEEE TRANSACTIONS ON WIRELESS COMMUNICATION, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE Visual Communication and Image Processing, the IEEE International Symposium on Circuits and Systems, and the IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION.

Xia Xie received the Ph.D. degree in computer science from Huazhong University of Science and Technology (HUST), Wuhan, China.
She is an Associate Professor with the School of Computer Science and Technology, HUST. Her current research interests include data mining, performance evaluation, and parallel and distributed computing.

Changqing Luo received the Ph.D. degree in electrical engineering from Beijing University of Posts and Telecommunications, Beijing, China, in 2011.
He is an Assistant Professor with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. During his Ph.D. study, he was a visiting student with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada, for half a year and with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada, for half a year. His current research interests include algorithms and optimization for wireless networks, green communication, and mobile cloud computing.

Laurence Tianruo Yang (M’97) received the B.E. degree in computer science from Tsinghua University, Beijing, China, and the Ph.D. degree in computer science from the University of Victoria, Victoria, BC, Canada.
He is a Professor with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. His current research interests include parallel and distributed computing and embedded and ubiquitous/pervasive computing. His research is supported by the National Sciences and Engineering Research Council and the Canada Foundation for Innovation.
