This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2016.2598704, IEEE
Transactions on Circuits and Systems for Video Technology
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. XX, NO. XX, DECEMBER 2015 1
1051-8215 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 4. Comparison of the topic distribution for "Adele's Music" generated by NMF and SNMF (with the regularization weight set to 10). The horizontal axis is time (in days) and the vertical axis is the topic's weight in each day's log data.
Fig. 5. Comparison of the distributions along the timeline for two topics generated by SNMF: (a) Adele's pregnancy and (b) Adele's Music.
investigate their influences on performance in the experiments section. It should be noted that, in general, we choose a relatively large $K$, as we do not want to miss social events that are less prominent than popular ones. The side effect of a large $K$ is the risk of over-splitting topics. Therefore, an additional fusion step is introduced in the next subsection, in which more informative clues such as the search log data are taken into account to merge similar topics.

2) Topic Fusion: After the factorization step, we have $K$ topics $\{t_1, \ldots, t_K\}$ and two matrices $W$ and $H$. To characterize a topic, the most intuitive clues are its distributions over the query vocabulary and over the timeline. These two distributions can be obtained directly from $W$ and $H$. Another useful clue from the search log data is the set of search log URLs, which have proven to be effective for query clustering [40]. The assumption is that queries triggering the same URL are very likely to have similar semantics. Consequently, two topics should be semantically correlated if they have similar distributions over the click-URL space $U$ defined in Section III-B. In this paper, we combine these three clues to measure the similarity between two topics, and merge all the topics in an unsupervised way.

A. Topic similarity over queries. Given a topic $t_k$ ($1 \le k \le K$), its distribution over the queries can be approximated by the $k$-th column of $W$. As $W$ is a non-negative matrix, the $k$-th column can be transformed into a distribution $P_Q(q_i|t_k)$ by normalizing it with the sum of its elements, i.e., $P_Q(q_i|t_k) = W_{ik} / \sum_{j=1}^{|Q|} W_{jk}$. Then, the distance between two topics $t_k$ and $t_l$ over the query vocabulary is defined by the symmetric Kullback-Leibler divergence, as

\[ \mathrm{dist}_Q(t_k, t_l) = KL_Q(t_k, t_l) = \frac{1}{2} \sum_{i=1}^{|Q|} \left( P_Q(q_i|t_k) \ln \frac{P_Q(q_i|t_k)}{P_Q(q_i|t_l)} + P_Q(q_i|t_l) \ln \frac{P_Q(q_i|t_l)}{P_Q(q_i|t_k)} \right). \qquad (5) \]

B. Topic similarity over timeline. Similarly, a topic $t_k$'s distribution over the timeline, $P_D(d_i|t_k)$, can be approximated by normalizing the $k$-th row of $H$. That is, $P_D(d_i|t_k) = H_{ki} / \sum_{j=1}^{|D|} H_{kj}$. However, the similarity of $t_k$ and $t_l$ over the timeline cannot be directly measured by the K-L divergence between $P_D(d_i|t_k)$ and $P_D(d_i|t_l)$. This is because the timelines are not always well aligned: even for two topics describing the same story, the temporal distributions may still have a slight time lag. To deal with the alignment issue, we shift one distribution forward and backward with a small offset (one day in the implementation), and select the smallest symmetric K-L divergence as the distance between $t_k$ and $t_l$, as

\[ \mathrm{dist}_D(t_k, t_l) = \min \{ KL_D(t_k, t_l; \delta),\ \delta \in \{-1, 0, 1\} \}, \qquad (6) \]

where $KL_D(t_k, t_l; \delta)$ is the shift-enabled K-L divergence mentioned above, and $\delta$ is the offset in days.

C. Topic similarity over search log URLs. From the search log, the relationships between the search log URLs and the queries can be described by a $|U| \times |Q|$ matrix $L$ in which each element $L_{ij}$ denotes the number of times that the URL $u_i$ is clicked given the query $q_j$. By multiplying $L$ and $W$, we can propagate a topic's weights over queries to the search log URLs. Next, a topic $t_k$'s distribution over the search log URLs is defined as $P_U(u_i|t_k) = (LW)_{ik} / \sum_{j=1}^{|U|} (LW)_{jk}$; and the corresponding distance between $t_k$ and $t_l$ is

\[ \mathrm{dist}_U(t_k, t_l) = KL_U(t_k, t_l), \qquad (7) \]

where $KL_U$ has the same form as $KL_Q$ in (5).

The three distance scores in (5)-(7) are simply added up to describe the overall distance between two topics. Then, agglomerative hierarchical clustering is adopted to merge similar topics in a bottom-up way. We selected complete linkage as the merge criterion for the clustering, to ensure strong connections between the merged topics. The stop threshold is automatically estimated by identifying the significant jump in the ascending sorted distance scores of all topic pairs.

3) Event Ranking: The last step is to distinguish event-related topics from others. Although this is essentially a classification problem, collecting enough unbiased training data is quite difficult in practice. Therefore, we treat it as a ranking problem, to leverage several heuristics summarized from a number of observations. Similar to the above part, these heuristics are based on the distributions of a topic over the timeline, over the query vocabulary, and over the search log URLs.

Fig. 5 shows the distributions of two topics along the timeline. One is a generic topic about Adele's music and the other is about her pregnancy. It is clear that the curve of her
search log URLs are similar. That is, social events have more concentrated distributions than generic topics. In reality, the numbers of queries and URLs associated with a social event are much smaller than those of a generic topic. For example, the numbers of search log URLs related to Adele's lyrics and to Adele's pregnancy are of different orders of magnitude.

A natural choice for measuring the degree of concentration of a distribution is entropy. To promote topics with more concentrated distributions, another two ranking scores are defined as

\[ \mathrm{score}_q(t_k) = 1.0 + \frac{1}{\ln |Q|} \sum_{i=1}^{|Q|} P_Q(q_i|t_k) \ln P_Q(q_i|t_k) \qquad (9) \]

\[ \mathrm{score}_u(t_k) = 1.0 + \frac{1}{\ln |U|} \sum_{i=1}^{|U|} P_U(u_i|t_k) \ln P_U(u_i|t_k) \qquad (10) \]

Lastly, the ranking score of a topic $t_k$ associated with some events is defined as

\[ \mathrm{rank}^{topic}_{event}(t_k) = \mathrm{score}_t(t_k) \times \mathrm{score}_q(t_k) \times \mathrm{score}_u(t_k). \qquad (11) \]

the search results of queries from social events.
Penalize those images which have similar ones in the search results of queries from popular topics.

To do this, for each social event, the 5 most dominant queries are selected for image searching. For each query, thumbnails of the top 100 images returned by a commercial search engine are downloaded. In this way, we construct a candidate photo set for the event, denoted by $I_{event}$, which has 500 thumbnails. Similarly, the top 10 queries from profile topics are used to collect a set of the most representative images of that celebrity, denoted by $I_{profile}$, which has 1000 thumbnails in total. For each celebrity, $I_{profile}$ is shared across various social events. Before processing, blur features and dark features are used to remove low-quality photos. In the following subsections, we introduce how to measure the content similarities among images in $I_{event}$ and $I_{profile}$, and how to re-rank photos in $I_{event}$ based on these similarities. These steps help identify the photos that best represent the event in question.
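Returning to the topic fusion step, the three distances in (5)-(7) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the small constant `EPS` used to avoid taking the logarithm of zero, and the zero-padding applied when a timeline is shifted are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # assumed smoothing constant to avoid log(0); not from the paper


def normalize(weights):
    """Turn non-negative weights (a column of W, a row of H) into a distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def symmetric_kl(p, q, eps=EPS):
    """Symmetric K-L divergence in the form of Eq. (5)."""
    total = 0.0
    for a, b in zip(p, q):
        a, b = a + eps, b + eps
        total += a * math.log(a / b) + b * math.log(b / a)
    return 0.5 * total


def shift(p, offset):
    """Shift a timeline by `offset` days, pad with zeros, then renormalize."""
    n = len(p)
    shifted = [0.0] * n
    for i, v in enumerate(p):
        j = i + offset
        if 0 <= j < n:
            shifted[j] = v
    return normalize(shifted) if sum(shifted) > 0 else shifted


def timeline_distance(p, q, offsets=(-1, 0, 1)):
    """Eq. (6): smallest symmetric K-L divergence over small day offsets."""
    return min(symmetric_kl(shift(p, o), q) for o in offsets)


def topic_distance(pq_k, pq_l, pd_k, pd_l, pu_k, pu_l):
    """Overall distance: the three scores of Eqs. (5)-(7) added up."""
    return (symmetric_kl(pq_k, pq_l)
            + timeline_distance(pd_k, pd_l)
            + symmetric_kl(pu_k, pu_l))
```

With this sketch, two timelines that differ only by a one-day lag get a distance of zero under `timeline_distance`, whereas the unshifted `symmetric_kl` between them is large. The pairwise `topic_distance` values would then feed the complete-linkage clustering described in the text.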
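The entropy-based concentration scores in (9) and (10), and their combination in (11), can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the timeline score $\mathrm{score}_t$ is taken as a given input because its definition is not reproduced here, and terms with zero probability are treated as contributing zero.

```python
import math


def concentration_score(dist):
    """Eqs. (9)/(10): 1.0 + sum(p ln p) / ln(n), i.e. one minus normalized entropy."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two outcomes to normalize by ln(n)")
    # Terms with p == 0 are skipped, following the convention 0 ln 0 = 0.
    neg_entropy = sum(p * math.log(p) for p in dist if p > 0)
    return 1.0 + neg_entropy / math.log(n)


def event_rank(score_t, pq, pu):
    """Eq. (11): product of the timeline, query and URL scores."""
    return score_t * concentration_score(pq) * concentration_score(pu)
```

A uniform distribution scores 0 and a distribution concentrated on a single query or URL scores 1, so topics with peaked distributions are promoted, as the text intends.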
1) Image Similarity Measures: To measure image similarity, we considered both global and local image features in this paper. Global features are extracted from a whole image, and are suitable for identifying fully duplicate images. By contrast, local features describe a local image patch, and have been widely used for recognizing partial duplicates. Supporting partial duplicate detection is quite important in this step, as many images have been edited (e.g., cropping or stitching) before being published online.

The global feature adopted in this paper is the block-based intensity histogram [27]. Each image is divided into 64 ($8 \times 8$) blocks, and for the $i$-th block a 256-dimensional intensity histogram $g_i$ is computed based on the pixels within that block. Consequently, the global feature-based similarity between two images $I_x$ and $I_y$ is defined as$^{2}$

\[ \mathrm{sim}_{hist}(I_x, I_y) = \frac{1}{64} \sum_{i=1}^{64} \max\{ 1.0 - \| g_i^{I_x} - g_i^{I_y} \|_2,\ 0 \}. \qquad (12) \]

For local feature-based similarity measurements, we choose the classic SIFT (Scale-Invariant Feature Transform) feature and follow the matching process proposed in [22]. In [22], there is a geometric verification process which ensures that the remaining SIFT correspondences between two images are compliant with each other. This is a very strong assumption, and two images are very likely to be partial duplicates of each other if the number of surviving SIFT correspondences is larger than a threshold. For two images $I_x$ and $I_y$, the local feature-based similarity is defined as

\[ \mathrm{sim}_{sift}(I_x, I_y) = \begin{cases} 1 & \mathrm{inlier}(I_x, I_y) > \tau_{sift} \\ 0 & \mathrm{inlier}(I_x, I_y) \le \tau_{sift} \end{cases}, \qquad (13) \]

where $\mathrm{inlier}(I_x, I_y)$ is the number of surviving SIFT correspondences between $I_x$ and $I_y$, and the threshold $\tau_{sift}$ is set to 12 as suggested in [22].

Both the global and local similarity measurements can be accelerated via off-the-shelf indexing technologies like k-d trees or hashing kernels [39], which have proven to be very efficient for million-scale image retrieval. Therefore, the computation cost in this paper is affordable.

Finally, the integrated content similarity between images $I_x$ and $I_y$ is defined as

\[ \mathrm{sim}(I_x, I_y) = \max\{ \mathrm{sim}_{hist}(I_x, I_y),\ \mathrm{sim}_{sift}(I_x, I_y) \}. \qquad (14) \]

2) Event Photo Re-ranking: To promote a photo $I_x \in I_{event}$ which has duplicates in $I_{event}$, we define the weighting score $w^{+}(I_x)$ as

\[ w^{+}(I_x) = \sum_{I_y \in I_{event},\ I_y \ne I_x} \mathrm{sim}(I_x, I_y). \qquad (15) \]

According to the definition, the more duplicates in $I_{event}$, the more important the photo $I_x$ is. Similarly, to punish photos with similar ones in $I_{profile}$, another weighting score $w^{-}(I_x)$ is defined as

\[ w^{-}(I_x) = 1.0 - \max_{I_y \in I_{profile}} \{ \mathrm{sim}(I_x, I_y) \}. \qquad (16) \]

$w^{-}$ becomes very small if $I_x$ has similar images in $I_{profile}$. Finally, the new ranking score of a photo $I_x \in I_{event}$ is

\[ \mathrm{rank}^{img}_{event}(I_x) = \frac{|I_{event}| - \mathrm{idx}(I_x)}{|I_{event}|} \times w^{+}(I_x) \times w^{-}(I_x), \qquad (17) \]

where $\mathrm{idx}(I_x)$ is the zero-based index of the photo $I_x$ in the search results returned by search engines. According to the new ranking scores, the photos with the highest scores are considered to be the most representative images of that event.

IV. DATA ANALYSIS

In this paper, we use the image search log collected by a commercial search engine, consisting of queries, clicks and search results from July to December 2012. After filtering the log data with the 200 celebrities' names, we obtain more than 190 million log records. In other words, for each celebrity, there are on average around 5000 log records per day. The data has been processed by the search engine to remove private information, and each log record has three main fields: time, query, and click-URL, as shown in Fig. 7. Given a celebrity, only records with a query containing the celebrity's name are retained for further event detection. To guarantee data quality, we also ignore log records whose click-URL is empty. It should be noted that users sometimes click more than one URL in a given set of search results. In such a situation, there will be multiple records, each of which corresponds to one clicked URL. This preserves more information about query and URL pairs, which is helpful in measuring the similarity among different log records. For example, the 4th and 5th rows in Fig. 7 are two different click-URLs for the single query "Jennifer Lopez Movies".

V. EVALUATION AND DISCUSSION

A. Experimental Settings

For evaluation, the first step is to choose a list of celebrities. In this paper, we select target celebrities from three main data resources: (1) Google Zeitgeist 2012$^{3}$, which contains the hottest celebrities in search queries; (2) the most popular celebrities in Yahoo!$^{4}$, the list from the largest internet portal; and (3) the Forbes Celebrity 100 list$^{5}$. After removing candidates which have few log records or no related ground truth, we come up with a list of 200 celebrities who are singers, actors/actresses, and politicians.

For quantitative performance measurement, the most challenging step is to prepare a benchmark dataset with ground truth labels. In practice this turns out to be a laborious task. Ten websites, as listed in Table I, are adopted for ground truth generation. For each website, we first develop a site-specific crawler to download pages containing celebrity-related social events. Then, we manually write regular expressions to extract event-related information from every fetched web page. In this way, we convert these web pages into a table of structured data with three fields: celebrity name, event time, and event description. To provide a convincible

2 Although the $l_2$ distance is not the best one to measure the similarity of two histograms, it works well in practice. We adopt the $l_2$ distance mainly because it can be easily accelerated via off-the-shelf indexing technologies.
3 http://www.google.com/zeitgeist/2012/
4 http://omg.yahoo.com/top-celebrities/
5 http://www.forbes.com/celebrities/
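The event photo re-ranking of (15)-(17) can be sketched in Python as below. This is a hedged illustration rather than the authors' code: `rerank_event_photos` is a hypothetical name, photos are arbitrary objects, and `sim` stands for any pairwise similarity in [0, 1] such as the combined measure in (14). When the profile set is empty, the sketch assumes no penalty.

```python
def rerank_event_photos(event_photos, profile_photos, sim):
    """Return event photos sorted by the score of Eq. (17), highest first.

    event_photos is in search-engine order, so the list index is idx(I_x).
    sim(a, b) is any similarity in [0, 1], e.g. the maximum in Eq. (14).
    """
    n = len(event_photos)
    scored = []
    for idx, x in enumerate(event_photos):
        # Eq. (15): reward photos with many near-duplicates among event results.
        w_plus = sum(sim(x, y) for j, y in enumerate(event_photos) if j != idx)
        # Eq. (16): penalize photos that resemble the celebrity's profile images.
        w_minus = 1.0 - max((sim(x, y) for y in profile_photos), default=0.0)
        # Eq. (17): keep some weight on the original search position.
        score = (n - idx) / n * w_plus * w_minus
        scored.append((score, x))
    # Python's sort is stable, so ties keep their search-engine order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored]


def toy_sim(a, b):
    """Toy similarity for the example: photos match if their labels share a prefix."""
    return 1.0 if a[0] == b[0] else 0.0


ranked = rerank_event_photos(["p1", "p2", "b1", "b2"], ["p9"], toy_sim)
```

In the toy example, the two photos that resemble the profile image ("p1", "p2") drop to the bottom although the search engine ranked them first, while the mutually duplicated event photos ("b1", "b2") rise to the top.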
Fig. 7. A snapshot of the web search log data used for celebrity social event mining.
TABLE I. 10 WEBSITES FOR GROUND TRUTH GENERATION
Fig. 9. Comparisons of the performance with different numbers of topics K in the topic factorization (Micro-Precision, Micro-Recall, Macro-Precision and Macro-Recall).
Fig. 10. Comparisons of the overall performance with different approaches.
Fig. 11. Subjective evaluation for the relevance of event photos returned by Bing, Google, and the proposed approach.

violently varying timeline distribution (refer to the example shown in Fig. 4) cannot accurately characterize the temporal evolution of a topic, and will hurt the next steps of topic fusion and event ranking. When the regularization weight was increased, the performance improved steadily, which demonstrates the necessity of adopting SNMF for topic factorization. However, when the weight becomes large enough, the performance starts creeping down. There are two reasons for the performance drop: (i) a strong smoothing operation will weaken peaks (just like the one shown in Fig. 5) which reflect the occurrence of some events; and (ii) the strong regularization factor will dominate the cost function in equation (4), so that the obtained $W \times H$ cannot approximate the original matrix $D$ very well. According to Fig. 8, we set the weight to 10 in the following experiments.

For the number of topics $K$, we vary it in the range of 10 to 50. The performance curves are shown in Fig. 9. From Fig. 9, it is noted that as $K$ increases, the performance curves have a clear trend of rising, holding steady, and then falling. When $K$ is small, some social events are easily mixed with other popular topics; when $K$ is very large, the fusion step in Section IV-B may fail to merge some relevant topics. Both situations hurt performance. By contrast, these curves are relatively stable for the range $20 \le K \le 40$, which indicates the effectiveness of the topic fusion component. We set $K = 40$ in the following experiments.

Lastly, we compare the overall performance of the proposed method with three other approaches. One is the straightforward abnormal query-based strategy mentioned in the beginning of Section IV; and the second is the approach proposed by Zhao et al. [40], which also utilized web search logs. In [40], the query and URL pairs in log data are represented with a bipartite graph, based on which a novel clustering method is adopted to group queries/URLs into events. The third one [1] converts the time series data into time-interval sequences of temporal abstractions and presents a minimal predictive recent temporal patterns framework to select the event patterns. For the convenience of comparison, the number of abnormal queries and the number of clusters in [40] are both set to the ground truth event number $gt(c_i)$. Fig. 10 shows the experimental results. It is clear that the abnormal query-based solution achieves the worst performance. As we argue in Section IV, statistics at query level are very noisy and unreliable. The scores of [40] and [1] are not good enough either. This is because (1) some popular topics (events) will dominate the clustering; (2) sometimes event-related search log URLs are too sparse to bridge related queries; and (3) the time patterns in the search log can be overwhelmed by the common noisy data. Therefore, topic factorization and event ranking are both necessary components in the solution.

C. Evaluation of Event Photo Relevance

To measure the relevance of social event photos, we have to resort to subjective evaluation. Ten undergraduate students were invited as judges. For each event, the judges first read the related webpages (through search log URLs) to learn the story. Then, photos returned by Google, Bing, and our approach were presented to the judges in a random order. Each photo was assigned one of three scores: perfect, relevant, and irrelevant. Perfect means the photo is about both the celebrity and the event, relevant means the photo is at least about the celebrity, and irrelevant means the photo is totally wrong. Considering the cost of human judges, we randomly selected 50 correctly discovered event topics for evaluation; and for each event, only the top five photos returned by the search engines and our approach were labeled. The comparison results are shown in Fig. 11. The event photo re-ranking method introduced in Section V is helpful in identifying event-relevant photos from image search results. To give a concrete impression of the event photo selection, some example cases are shown in Fig. 12, in which perfectly relevant photos are marked with blue rectangles.

D. Examples of Event Storyboard

Finally, the storyboard will be generated using the selected events with relevant photos. To ensure the high quality of the photos, photos of low visual quality will be eliminated first. Besides, time and location contexts are other important factors for the storyboard photos. For each detected event, the
Fig. 13. The storyboards for Tom Cruise and Barack Obama, from July 2012 to December 2012.
TABLE II. HUMAN EVALUATION FOR STORYBOARD AND HUMAN WEB PAGE. EACH METHOD IS EVALUATED BY 10 PERSONS (SCALE 1-10, HIGHER IS BETTER).
Fig. 14. A celebrity social event system: (a) home page, (b) list view of topics related to Jennifer Aniston, (c) list view of hot events in October, and (d) relevant images about the event "girlfriend", developed on Windows Phone 8.

social event system. Users can interactively switch among four views: people-centric, timeline-centric, month-centric and topic-centric.

Beyond celebrities, events of other entities can be detected from the search log data using a similar strategy. For entities such as landmarks and brands, we can trace their development and evolution along a timeline with related photos, which makes it easier for users to learn more about them.

VI. CONCLUSIONS

In this paper, we use search logs as the data source to generate social event storyboards automatically. Unlike the data in common text mining, search logs have short, sparse text queries, and the data size is much bigger than that of some news websites or blogs. Based on these features, we do not use the query text information for the analysis. Instead, structural and statistical information is used for topic extraction and event detection in our work, which fits the data well. Furthermore, we add time information to SNMF in our approach, which makes it easier to discover social events compared with traditional NMF methods. Our work performs better than traditional works in this area, e.g. [40], because we can distinguish the topics in a way that surfaces the events most appealing to common users. The associated images were selected, using the features and relationships of the image search results, to make up a storyboard along a timeline that gives a good representation of the mined events.

REFERENCES

[1] C. Alexander, B. Fayock, and A. Winebarger. Automatic event detection and characterization of solar events with iris, sdo/aia and hi-c. In AAS/Solar Physics Division Meeting, volume 47, 2016.
[2] J. Allan, J. G. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study final report. 1998.
[3] S. Arora, R. Ge, and A. Moitra. Learning topic models - going beyond svd. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on, pages 1-10. IEEE, 2012.
[4] N. Babaguchi, S. Sasamori, T. Kitahashi, and R. Jain. Detecting events from continuous media by intermodal collaboration and knowledge use. In Multimedia Computing and Systems, 1999. IEEE International Conference on, volume 1, pages 782-786. IEEE, 1999.
[5] P. N. Bennett, R. W. White, W. Chu, S. T. Dumais, P. Bailey, F. Borisyuk, and X. Cui. Modeling the impact of short- and long-term behavior on search personalization. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 185-194. ACM, 2012.
[6] D. M. Blei. Introduction to probabilistic topic models. Comm. ACM, 55(4):77-84, 2012.
[7] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of machine Learning research, 3:993-1022, 2003.
[8] Y.-J. Chang, H.-Y. Lo, M.-S. Huang, and M.-C. Hu. Representative photo selection for restaurants in food blogs. In Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, pages 1-6. IEEE, 2015.
[9] H. L. Chieu and Y. K. Lee. Query based event extraction along a timeline. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 425-432. ACM, 2004.
[10] T.-C. Chou and M. C. Chen. Using incremental plsi for threshold-resilient online event analysis. Knowledge and Data Engineering, IEEE Transactions on, 20(3):289-299, 2008.
[11] H. Cui, J.-R. Wen, J.-Y. Nie, and W.-Y. Ma. Probabilistic query expansion using query logs. In Proceedings of the 11th international conference on World Wide Web, pages 325-332. ACM, 2002.
[12] S. Essid and C. Fevotte. Smooth nonnegative matrix factorization for unsupervised audiovisual document structuring. Multimedia, IEEE Transactions on, 15(2):415-425, 2013.
[13] G. P. C. Fung, J. X. Yu, H. Liu, and P. S. Yu. Time-dependent event hierarchy construction. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 300-309. ACM, 2007.
[14] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 50-57. ACM, 1999.
[15] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 133-142. ACM, 2002.
[16] N. Kawamae. Trend analysis model: trend consists of temporal words, topics, and timestamps. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 317-326. ACM, 2011.
[17] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems, pages 556-562, 2001.
[18] J. Li and C. Cardie. Timeline generation: Tracking individuals on twitter. In Proceedings of the 23rd international conference on World wide web, pages 643-652. ACM, 2014.
[19] Z. Li, B. Wang, M. Li, and W.-Y. Ma. A probabilistic model for retrospective news event detection. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 106-113. ACM, 2005.
[20] A. Liu, W. Lin, and M. Narwaria. Image quality assessment based on gradient similarity. Image Processing, IEEE Transactions on, 21(4):1500-1512, 2012.
[21] H. Liu, J. He, Y. Gu, H. Xiong, and X. Du. Detecting and tracking topics and events from web search logs. ACM Transactions on Information Systems (TOIS), 30(4):21, 2012.
[22] D. G. Lowe. Object recognition from local scale-invariant features. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, volume 2, pages 1150-1157. IEEE, 1999.
[23] Q. Mei, C. Liu, H. Su, and C. Zhai. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of the 15th international conference on World Wide Web, pages 533-542. ACM, 2006.
[24] T. Mei, Y. Rui, S. Li, and Q. Tian. Multimedia search reranking: A literature survey. ACM Computing Surveys (CSUR), 46(3):38, 2014.
[25] M. Platakis, D. Kotsakos, and D. Gunopulos. Searching for events in the blogosphere. In Proceedings of the 18th international conference on World wide web, pages 1225-1226. ACM, 2009.
[26] S. D. Roy, T. Mei, W. Zeng, and S. Li. Towards cross-domain learning for social video popularity prediction. Multimedia, IEEE Transactions on, 15(6):1255-1267, 2013.
[27] Y. Rui, T. S. Huang, and S.-F. Chang. Image retrieval: Current techniques, promising directions, and open issues. Journal of visual communication and image representation, 10(1):39-62, 1999.
[28] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. Clustering query refinements by user intent. In Proceedings of the 19th international conference on World wide web, pages 841-850. ACM, 2010.
[29] S. Song, Q. Li, and N. Zheng. Understanding a celebrity with his salient events. In Active Media Technology, pages 86-97. Springer, 2010.
[30] Y. Suhara, H. Toda, and A. Sakurai. Event mining from the blogosphere using topic words. In ICWSM, 2007.
[31] S. Tan, C.-W. Ngo, J. Xu, and Y. Rui. Celebrowser: An example of browsing big data on small device. In Proceedings of International Conference on Multimedia Retrieval, page 514. ACM, 2014.
[32] T. C. Walber, A. Scherp, and S. Staab. Smart photo selection: Interpret the gaze as personal interest. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2065-2074. ACM, 2014.
Jun Xu is a Ph.D. student at the University of Science and Technology of China (USTC). He received the B.E. degree from USTC in 2015. His research interests include data mining, video analysis, pattern recognition, computer vision, and multimedia content analysis. He was an intern at Microsoft Research, Beijing, China, from 2012 to 2013 and from 2014 to 2016.

Rui Cai is a Lead Researcher at Microsoft Research Asia. He received the B.E. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China, in 2001 and 2006, respectively. His research interests include web search and data mining, machine learning, pattern recognition, computer vision, multimedia content analysis, and signal processing. He is a member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).

Houqiang Li (SM'12) received the B.S., M.Eng., and Ph.D. degrees from the University of Science and Technology of China (USTC), Hefei, China, in 1992, 1997, and 2000, respectively, all in electronic engineering.
He is currently a Professor at the Department of Electronic Engineering and Information Science, USTC. He has authored or co-authored over 100 papers in journals and conferences. His current research interests include video coding and communication, multimedia search, and image/video analysis.
Dr. Li served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY from 2010 to 2013 and has been on the Editorial Board of the Journal of Multimedia since 2009. He has served on technical/program committees and organizing committees, and as Program Co-Chair and Track/Session Chair, for over ten international conferences. He received the Best Paper Award at Visual Communications and Image Processing in 2012, at the International Conference on Internet Multimedia Computing and Service in 2012, and at the International Conference on Mobile and Ubiquitous Multimedia in 2011, and was a senior author of the Best Student Paper of the 5th International Mobile Multimedia Communications Conference in 2009.