Video Shot Boundary Detection Using Visual Bag-of-Words

ABSTRACT:
This paper presents a system for accurate shot change detection, which is useful for video
content analysis. In recent years, the techniques used in image analysis and video processing
have converged. Many computation- and memory-intensive analysis methods have also become
available for per-frame processing of videos, owing to the increased computing power of desktop
computers and to efficient implementations on multiple cores and graphics processing units
(GPUs). Our main contribution is to solve the problem of shot boundary detection with a popular
image analysis (object detection) approach: the visual Bag-of-Words (BoW). The colour histogram
is the main baseline for shot boundary detection and is also at the core of many top methods, but
our BoW method, of similar complexity in terms of parameters, clearly outperforms it.
Interestingly, an AND-combination of colour and BoW histogram detection is clearly superior to
either technique alone, indicating that colour and local features provide complementary
information for video analysis.

INTRODUCTION:
Video shot boundary detection, also called cut detection, is a field of research in video
processing. The objective of cut detection is to find the positions in a video at which one scene
is replaced by another with different visual content. The problem settings in image and video
processing/analysis are almost equivalent, but the adopted approaches have diverged because
many video processing tasks, such as video shot boundary detection, require per-frame
processing. For example, one hour of video contains approximately 100,000 frames, so a
processing time of one second per frame would take more than 27 hours in total. In such tasks,
fast-to-compute features such as colour histograms have typically been used. On the other hand,
benchmark databases for image analysis have also become very large; for example, ImageNet
contains nearly 15 million annotated images. This has set new demands for approaches, and
development has produced not only new techniques but also more efficient implementations of
existing ones. Amongst the various structural levels (i.e., frame, shot, scene, etc.), shot-level
organization has been regarded as suitable for browsing and content-based retrieval. In this
paper, we therefore adopt another popular approach from content-based image retrieval, the Bag
of Visual Words. Shot boundary detection is usually the entry phase for automatic video indexing
and browsing, and it has been studied extensively in recent years.
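The frame-count arithmetic in the paragraph above can be checked as follows. The frame rate of 25 to 30 frames per second is our assumption; the text does not state it.

```latex
\begin{aligned}
3600\,\mathrm{s} \times (25 \text{ to } 30)\,\mathrm{frames/s} &= 90{,}000 \text{ to } 108{,}000 \text{ frames} \approx 10^{5} \text{ frames},\\
10^{5}\,\mathrm{frames} \times 1\,\mathrm{s/frame} &= 10^{5}\,\mathrm{s} = \frac{10^{5}}{3600}\,\mathrm{h} \approx 27.8\,\mathrm{h}.
\end{aligned}
```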
Nowadays, video has become a popular form of entertainment. Video processing has found
applications in different domains such as video indexing, video compression, and video access.
It is a relatively new area, and shot boundary detection in digital video has attracted many
researchers. As the amount of user-generated video increases, large collections of popular
videos are available on websites, and searching them is becoming a tedious task. Viewers require
control over the data, so video browsing and indexing applications are developed. Here, we adopt
a state-of-the-art method for processing massive amounts of images, the Bag-of-Words (BoW)
method: dense SIFT for feature detection and representation, k-means clustering for codebook
generation, L1-normalisation of codebook histograms, and the Euclidean distance for code
matching. In this paper, our main focus is to apply this method to shot boundary detection.
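A minimal Java sketch of the last two steps of this pipeline (L1-normalisation of codebook histograms and Euclidean distance for matching) is given below. It assumes the histograms have already been computed by the earlier stages (dense SIFT and k-means, not shown); the class name BowDistance is hypothetical and not taken from the authors' implementation.

```java
import java.util.Arrays;

// Hypothetical helper: compares bag-of-words histograms (assumed to be raw codeword counts).
final class BowDistance {

    // L1-normalisation: divide by the sum of absolute values, so non-negative counts sum to 1.
    static double[] l1Normalize(double[] h) {
        double sum = 0.0;
        for (double v : h) sum += Math.abs(v);
        double[] out = new double[h.length];
        if (sum == 0.0) return out; // an empty histogram stays all-zero
        for (int i = 0; i < h.length; i++) out[i] = h[i] / sum;
        return out;
    }

    // Euclidean (L2) distance between two equal-length vectors.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[] prev = l1Normalize(new double[] {4, 4, 0, 0});
        double[] next = l1Normalize(new double[] {0, 0, 5, 5});
        System.out.println(Arrays.toString(prev)); // [0.5, 0.5, 0.0, 0.0]
        System.out.println(euclidean(prev, next)); // 1.0
    }
}
```

Normalising first makes the distance independent of how many descriptors a frame yields; a large distance between consecutive frames is then a candidate shot boundary. How that distance is thresholded is not specified in this section.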

LITERATURE SURVEY:
Video shot boundary detection and content-based retrieval analysis (Truong and Venkatesh,
2007; Sivic and Zisserman, 2003)
Video shot boundary detection is the first step before higher-level processing in video
abstraction (Truong and Venkatesh, 2007) and content-based retrieval (Sivic and Zisserman,
2003). In such analysis, shots are usually considered the basic units, and thus the success of
boundary detection affects the whole processing pipeline. Shot detection has been studied both
within specific applications and as a problem in its own right, and a wide variety of methods
have been proposed.

Fuzzy logic approach for detection of video shot boundaries.


A fuzzy logic approach has been proposed to integrate hybrid features for detecting shot
boundaries in general videos. It contains two processing modes: one is dedicated to the detection
of abrupt shot cuts, including short dissolves, and the other to the detection of gradual shot
transitions. The two modes are unified by a mode selector, which decides which mode the scheme
should work in to achieve the best possible detection performance.

Introduction to the subject and an analysis of the best approaches with the common
benchmark data (Smeaton et al., 2010)
A good introduction to the subject and an analysis of the best approaches with the
common benchmark data were provided in a TRECVid survey (Smeaton et al., 2010), which
summarised the findings over seven years of the TRECVid shot boundary competition. The vast
majority of the best-performing methods utilise colour histograms and machine learning
algorithms, such as GMMs (Gaussian Mixture Models) (Kang and Hua, 2005) or HMMs (Hidden
Markov Models) (Pruteanu-Malinici and Carin, 2008). It is noteworthy that the colour histogram
difference, which is considered the baseline method, performs notably well and is virtually
parameter-free except for the difference detection threshold (Gargi et al., 2000).
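To make the baseline concrete, the sketch below computes a coarse RGB histogram per frame and flags a cut wherever the L1 difference between consecutive histograms exceeds a threshold, the single parameter mentioned above. The 64-bin quantisation, the packed 0xRRGGBB frame format and the class name ColourHistogramCuts are illustrative assumptions, not the exact setup evaluated by Gargi et al. (2000).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the colour-histogram-difference baseline (hypothetical class name).
final class ColourHistogramCuts {

    // Assumed quantisation: 2 bits per RGB channel -> 4 * 4 * 4 = 64 bins.
    static final int BITS = 2;

    // L1-normalised RGB histogram of one frame given as packed 0xRRGGBB pixels
    // (an empty frame yields an all-zero histogram).
    static double[] histogram(int[] rgbPixels) {
        int levels = 1 << BITS;
        double[] h = new double[levels * levels * levels];
        if (rgbPixels.length == 0) return h;
        for (int p : rgbPixels) {
            int r = ((p >> 16) & 0xFF) >> (8 - BITS);
            int g = ((p >> 8) & 0xFF) >> (8 - BITS);
            int b = (p & 0xFF) >> (8 - BITS);
            h[(r * levels + g) * levels + b] += 1.0;
        }
        for (int i = 0; i < h.length; i++) h[i] /= rgbPixels.length;
        return h;
    }

    // L1 distance between two histograms; lies in [0, 2] for normalised histograms.
    static double difference(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    // Returns every index i such that a cut is declared between frame i - 1 and frame i.
    static List<Integer> detectCuts(List<int[]> frames, double threshold) {
        List<Integer> cuts = new ArrayList<>();
        double[] prev = null;
        for (int i = 0; i < frames.size(); i++) {
            double[] cur = histogram(frames.get(i));
            if (prev != null && difference(prev, cur) > threshold) cuts.add(i);
            prev = cur;
        }
        return cuts;
    }
}
```

Because the histogram discards pixel positions, moderate camera or object motion within a shot changes it little, which is one reason this simple baseline performs well.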

Video Shot Boundary Detection Using Visual Bag-of-Words.


This work addresses shot boundary detection using a popular image analysis (object detection)
approach, the visual bag-of-words (BoW). The baseline approach for shot boundary detection has
been the colour histogram, which is at the core of many top methods, but our BoW method, of
similar complexity in terms of parameters, clearly outperforms colour histograms. Interestingly,
an AND-combination of colour and BoW histogram detection is clearly superior, indicating that
colour and local features provide complementary information for video analysis.

Baseline methods (Lazebnik et al., 2006; Leibe et al., 2008; Cao et al., 2010)

Many variants and extensions of the baseline BoW method are available (e.g., (Lazebnik et
al., 2006; Leibe et al., 2008; Cao et al., 2010)), but often the basic method performs best
(Tuytelaars et al., 2010), and for large-scale problems the most efficient discriminative methods
are no longer feasible (Deng et al., 2010). For this work, we adopt the recent implementation in
(Deng et al., 2010).

Efficient Closed-Form Solution to Generalized Boundary Detection.


A unified formulation for boundary detection with a closed-form solution has been proposed. It
is applicable to the localization of different types of boundaries, such as intensity edges and
occlusion boundaries from video and RGB-D cameras.

PROBLEM DEFINITION:
In previous work, the techniques used in image analysis and video processing have converged.
Many computation- and memory-intensive image analysis methods have become available for
per-frame processing of videos, owing to the increased computing power of desktop computers and
efficient implementations on multiple cores and graphics processing units. However, shot
boundary detection had not been approached with a popular image analysis (object detection)
technique.

EXISTING SYSTEM:
In partial video segmentation, only certain parts are extracted from a video and the rest is
disregarded, so the original video cannot be reproduced. This is common in surveillance
scenarios and in highlight extraction for sports videos, where the parts of the video in which
nothing happens are left out. A special case of partial video scene segmentation is video
skimming. In contrast to video scene segmentation, where videos are indexed at the scene level,
the purpose of video skimming is to summarize the most important scenes of a video, so that
viewers get the most important information it contains in a fraction of its duration. Some of the
approaches presented here may be related to video skimming, but because the authors of the
corresponding papers explain their highlight scene extraction methods in detail, we decided to
mention them in this survey.

Fig: Problem of Existing System


A survey of segmentation methods was presented by Vendrig and Worring [77] ten years ago. It
focuses only on visual segmentation methods for movies and TV series. Scene segmentation
algorithms are analyzed according to how shots are compared (sequentially or group-wise) and
the temporal distance function, and a classification framework is defined to categorize them
into four classes. The advantages and disadvantages of the different classes are discussed, and
an evaluation method for video segmentation is presented. In graph-based approaches, the scene
detection task is transformed into a graph partitioning problem. All graph-based algorithms have
in common that shots are clustered based on similarity (in most cases also based on temporal
closeness) and arranged in a graph representation. Video scene segmentation using graph-based
solutions works best in restricted environments, especially for videos with recurring types of
scenes, such as news broadcasts or talk shows. The accuracy is lower when this method is applied
to motion pictures, because movies have dynamic environments and directors rely on different
camera techniques to trigger certain emotions in the audience. Algorithms based on stochastic
methods address the boundary detection problem with stochastic models: an optimal solution is
approximated by maximizing the a posteriori probability that the estimated scene boundaries are
correct. Stochastic approaches can achieve high accuracy, but a lot of data is needed in advance
to determine the stochastic models and to create training sets.

Besides the need for improved accuracy, fast processing of visual information is another
important issue, particularly for embedding these algorithms in semi-automatic video editing
tools. To this end, GPU-based implementations of the previously described analysis techniques
could significantly reduce the required computation time, thus contributing to the acceleration
of the overall video editing procedure towards the creation of content for interactive TV
applications.

OUTLINE OF PROPOSED SYSTEM:

Fig: outline of proposed system

The automatic understanding of multimedia content, particularly capturing the meaning of
real-life images and video content, has long been one of the major challenges in the multimedia
community. This paper introduces the detection and understanding of high-level events that are
depicted in, or otherwise related to, the video content. After introducing the notion of a video
event and motivating event-based analysis and organization of video content, we review the
existing literature on complex video event detection. Subsequently, we introduce in more detail a
very promising class of techniques that use traditional concept- or category-based video analysis
results as the stepping stone for detecting complex events in video. This provides the reader
with a comprehensive overview of the state of the art in the timely research topic of video event
understanding.
The Bag-of-Words (BoW) method for processing massive amounts of images consists of dense
SIFT for feature detection and representation, k-means clustering for codebook generation,
L1-normalisation of codebook histograms, and the Euclidean distance for code matching. Our main
contribution is to apply this method to shot boundary detection. In addition, we compare
video-specific codebooks, generated from the local features extracted from an input video, to a
general codebook generated from the ImageNet descriptors. Moreover, we study the effect of
varying the codebook size, which is the most important parameter of BoW. The experiments are
performed using the TRECVid 2007 shot boundary detection competition data. We compare our
approach to the frame windows method, which was among the top performers and can be
considered the baseline method for shot boundary detection. In this way, our experiments detect
video shot boundaries with the help of BoW. Shot boundary detection has been an area of active
research, and many automatic techniques have been developed to detect frame transitions in
video sequences. Altogether, there are many ways of detecting shot boundaries.
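Codebook generation by k-means can be sketched as plain Lloyd iterations over the local descriptors, with k playing the role of the codebook size discussed above. This is an illustrative sketch, not the implementation adopted in the paper: descriptors are assumed to be given as double[][] (dense SIFT extraction is not shown), initialisation picks k distinct random descriptors, and KMeansCodebook is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: visual codebook via plain k-means (Lloyd's algorithm); hypothetical class name.
final class KMeansCodebook {

    // Returns k codewords (cluster centres) learned from the given local descriptors.
    static double[][] build(double[][] descriptors, int k, int maxIter, long seed) {
        int n = descriptors.length;
        if (k < 1 || k > n) throw new IllegalArgumentException("need 1 <= k <= number of descriptors");
        int dim = descriptors[0].length;

        // Initialise the centres with k distinct descriptors chosen at random.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));
        double[][] centres = new double[k][];
        for (int c = 0; c < k; c++) centres[c] = descriptors[order.get(c)].clone();

        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int it = 0; it < maxIter; it++) {
            // Assignment step: each descriptor goes to its nearest centre.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(centres, descriptors[i]);
                if (c != assign[i]) { assign[i] = c; changed = true; }
            }
            if (!changed) break; // assignments are stable: converged

            // Update step: move each centre to the mean of its assigned descriptors.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += descriptors[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centre
                for (int d = 0; d < dim; d++) centres[c][d] = sums[c][d] / counts[c];
            }
        }
        return centres;
    }

    // Index of the centre closest to x under squared Euclidean distance.
    static int nearest(double[][] centres, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centres.length; c++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double t = x[j] - centres[c][j];
                d += t * t;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```

For a video-specific codebook, the input would be descriptors from sampled frames of that video; for a general codebook, descriptors from a generic image collection such as ImageNet. Larger k gives finer visual words at a higher quantisation cost per frame.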
The simplest way is to compare the pixel values of corresponding frames. An alternative to
pixel matching is to use a grey-scale or colour histogram. Another way of detecting shot
boundaries is by edge change. Other methods use predefined models, objects, regions, or
spatio-temporal sub-sampling to detect camera breaks. In our method, we examine how well the
general codebook performs compared to a specific codebook, which is regenerated for every input
video. Specific codebooks are generated using features extracted from selected frames (one frame
per second in our implementation) and the k-means clustering method. Our main focus is on
detecting shot boundaries with an efficient bag-of-features method; our method performed better
than the colour histogram baseline, which is used by many state-of-the-art methods. We also
study the motion activity descriptor for shot boundary detection in video sequences.
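The simplest technique listed above, comparing the pixel values of corresponding frames, can be sketched as a mean absolute grey-level difference compared against a threshold. Frames are assumed to be equally sized int[] arrays of grey levels in 0..255; the threshold and the class name PixelDifference are hypothetical.

```java
// Sketch: pixel-wise frame differencing (hypothetical class name).
final class PixelDifference {

    // Mean absolute difference of grey levels (0..255) between two equally sized frames.
    static double meanAbsDiff(int[] greyA, int[] greyB) {
        if (greyA.length != greyB.length || greyA.length == 0)
            throw new IllegalArgumentException("frames must be non-empty and equally sized");
        long sum = 0;
        for (int i = 0; i < greyA.length; i++) sum += Math.abs(greyA[i] - greyB[i]);
        return (double) sum / greyA.length;
    }

    // True if the two frames differ by more than the (free) threshold parameter.
    static boolean isCut(int[] greyA, int[] greyB, double threshold) {
        return meanAbsDiff(greyA, greyB) > threshold;
    }
}
```

Because every pixel is compared at a fixed position, camera or object motion inside a shot also produces large differences, which is one reason histogram-based features are generally preferred.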

Aims and Objectives:


Our main aim is to address the problem of video shot boundary detection using visual
Bag-of-Words.
Problem settings in image and video processing/analysis are almost equivalent, but the adopted
approaches have diverged because many video processing tasks, such as video shot boundary
detection, require per-frame processing. For example, one hour of video contains approximately
100,000 frames, and a processing time of one second per frame would take more than 27 hours in
total. In such tasks, fast-to-compute features such as colour histograms have typically been
used. On the other hand, benchmark databases for image analysis have also become very large; for
example, ImageNet contains nearly 15 million annotated images. This has set new demands for
approaches, and development has produced not only new techniques but also more efficient
implementations of existing ones. Accordingly, our main focus is on:
1 An efficient bag-of-features method for detecting shot boundaries. Our method performed
better than the baseline (note that colour histograms are used by many state-of-the-art methods).
2 We investigate the effect of the codebook size and whether the codebook should be video
specific or general, both being important computational considerations.
3 We show how the combination of colour histograms and local feature histograms provides
clearly superior results indicating that colour and local features provide complementary
information for video processing.
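Item 3 relies on an AND-combination of two detectors. One minimal way to express it, assuming each detector reports a set of frame indices at which it fires, is to keep a boundary only when both detectors agree. The tolerance parameter tol (how many frames apart the two detections may be) and the class name AndCombination are our assumptions; with tol = 0 the combination reduces to a plain set intersection.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of an AND-combination of two shot-boundary detectors (hypothetical class name).
final class AndCombination {

    // Keeps a colour-histogram cut only if the BoW detector also fired within +/- tol frames.
    static SortedSet<Integer> combine(NavigableSet<Integer> colourCuts,
                                      NavigableSet<Integer> bowCuts, int tol) {
        if (tol < 0) throw new IllegalArgumentException("tol must be >= 0");
        SortedSet<Integer> both = new TreeSet<>();
        for (int c : colourCuts) {
            // subSet(from, true, to, true) is the inclusive window [c - tol, c + tol].
            if (!bowCuts.subSet(c - tol, true, c + tol, true).isEmpty()) both.add(c);
        }
        return both;
    }
}
```

Requiring agreement trades recall for precision: a false alarm from one feature type is discarded unless the other feature type independently sees a change at the same place.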
Our approach is based on the bag of visual words model, which originally comes from
representing documents in terms of form and semantics. The bag of words model has been widely
used in classification, recognition, content-based image retrieval, and detection. Inspired by the
image classification algorithm proposed by Csurka et al., which has proved effective on static
image datasets, we propose to classify sports videos by extracting key frames and classifying
each of them. The final classification result for a video combines the outputs of all of its key
frames. Our method contains the following four main steps:

1 Descriptors of video frames are detected and extracted with the SURF approach.
2 The descriptors are then used to form the visual word vocabulary (codebook) with a
clustering algorithm.
3 A histogram representation is formed by counting the occurrences of each visual word in
each frame.
4 A multi-class classifier takes the histogram representation of a frame as a feature vector and
then determines which class to assign the test frame to.

Fig: Illustration of our four-step method based on the Bag-of-Words model
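Step 3, together with the quantisation implied by step 2, can be sketched as follows: each descriptor of a frame is assigned to its nearest visual word under squared Euclidean distance, and the histogram counts how often each word occurs. The descriptors are assumed to come from step 1 (SURF extraction is not shown), and VisualWordHistogram is a hypothetical name; the resulting int[] is the feature vector that step 4 would pass to the classifier.

```java
import java.util.Arrays;

// Sketch of step 3: bag-of-visual-words histogram of one frame (hypothetical class name).
final class VisualWordHistogram {

    // Index of the codeword closest to x under squared Euclidean distance.
    static int nearestWord(double[][] codebook, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int w = 0; w < codebook.length; w++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double t = x[j] - codebook[w][j];
                d += t * t;
            }
            if (d < bestDist) { bestDist = d; best = w; }
        }
        return best;
    }

    // Counts how many of the frame's descriptors fall into each visual word.
    static int[] histogram(double[][] codebook, double[][] frameDescriptors) {
        int[] h = new int[codebook.length];
        for (double[] x : frameDescriptors) h[nearestWord(codebook, x)]++;
        return h;
    }

    public static void main(String[] args) {
        double[][] codebook = {{0, 0}, {10, 10}};
        double[][] descriptors = {{1, 0}, {0, 1}, {9, 9}, {0, 0}};
        System.out.println(Arrays.toString(histogram(codebook, descriptors))); // [3, 1]
    }
}
```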

PLAN OF EXECUTION:

Task                                               Effort       Deliverables
Analysis of existing systems & comparison          4 weeks
  with the proposed one
Literature survey                                  1 week
Designing & planning                               1+2 weeks
  o System flow                                    1 week
  o Designing modules & their deliverables         2 weeks      Modules design document
Implementation                                     8 weeks      Primary system
Testing                                            3 weeks      Test reports
Thesis                                             1 week       Complete formal project report

Phase     Task               Description
Phase 1   Analysis           Analyze the information related to remote surveillance.
Phase 2   Literature survey  Collect raw data and elaborate on literature surveys.
Phase 3   Design             Assign the modules and design the process flow control.
Phase 4   Implementation     Implement the code for all the modules and integrate all the modules.
Phase 5   Testing            Test the code and the overall process to check whether it works properly.
Phase 6   Thesis             Prepare the thesis for this project with conclusion and future
                             enhancement.

Fig: Timeline of Phases 1 to 6 (Aug/14 to Feb/15)

S/W REQUIREMENTS:

Language : Java

Database : MySQL

Tool used : Eclipse

H/W REQUIREMENTS:

Processor : Pentium IV 1.1 GHz

RAM : 1 GB

Hard Disk : 80 GB

Future Scope:
In future work, building on this video shot boundary detection using Bag-of-Words, we will
investigate other low-level video processing tasks with the BoW approach and optimize our
implementation to run at least at frame rate. The paper has also presented a novel approach to
classifying sports videos. The proposed method consists of four main steps: descriptors are
detected and extracted by SURF, a visual word vocabulary is formed, a histogram representation is
constructed, and a multi-class classifier is applied. We have collected a large real-world dataset
with high diversity, including 600 videos with a total of more than 6000 minutes for 10 different
kinds of sports. Our system shows that the Bag-of-Words model is highly appropriate for the sports
video shot classification problem. Extensive experimental setups demonstrate the advantages of
different parameters, such as codebook size and classifier kernel. In future, we are going to
integrate more sports into the dataset. We will try to improve our model to speed up the running
time and to reduce confusion between classes. We would also like to integrate state-of-the-art
shot boundary detection to automatically provide the shots for classification.

Conclusion:
In this way, we address the low-level video processing task of video shot boundary detection
using an approach from object detection, the visual Bag-of-Words (BoW). We utilised available
efficient implementations, and our method, which has equal complexity in terms of the number of
parameters, achieved clearly superior performance to the baseline.

REFERENCES:
Truong, B. and Venkatesh, S. (2007). Video abstraction: A systematic review and classification.
ACM Trans. on Multimedia Computing, Communications and Applications (ACM TOMCCAP),
3(1).

Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching
in videos. In ICCV.

Smeaton, A., Over, P., and Doherty, A. (2010). Video shot boundary detection: Seven years of
TRECVid activity. Computer Vision and Image Understanding, 114:411-418.

Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid
matching for recognizing natural scene categories. In CVPR.

Pruteanu-Malinici, I. and Carin, L. (2008). Infinite Hidden Markov Models for Unusual-Event
Detection in Video. IEEE Trans. on Image Processing, 17(5):811-822.

Joyce, R. A. and Liu, B. (2006). Temporal segmentation of video using frame and histogram
space. IEEE Trans. on Multimedia, 8(1):130-140.

Gargi, U., Kasturi, R., and Strayer, S. H. (2000). Performance characterization of video-shot
change detection methods. IEEE Trans. Circuits Syst. Video Techn., 10(1):1-13.

Kang, H.-W. and Hua, X.-S. (2005). To learn representativeness of video frames. In ACM
International Conference on Multimedia.
