
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online)
Volume 4, Issue 4, July-August (2013), pp. 257-266
IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
QUERY CLIP GENRE RECOGNITION USING TREE PRUNING TECHNIQUE FOR VIDEO RETRIEVAL
Vilas Naik1, Vishwanath Chikaraddi2, Prasanna Patil3

1, 2, 3 Department of CSE, Basaveshwar Engineering College, Bagalkot, India

ABSTRACT

The efficiency of a retrieval system depends on the search methodology it employs; an inappropriate methodology renders the system ineffective. In recent years multimedia storage has grown and its cost has fallen, so video repositories now hold huge numbers of videos, and retrieving the videos relevant to a user's interest from such a large repository is difficult. An effective recognition-based video retrieval system is therefore essential for finding videos relevant to a user query within a huge collection. This paper presents an approach that retrieves videos from a repository by recognizing the genre of a user's query clip. The method extracts regions of interest (ROIs) from every frame of the query clip based on motion descriptors. These ROIs are treated as objects and compared, for object recognition, with similar objects in a knowledge base prepared from videos of various genres; a tree pruning technique is then employed to recognize the genre of the query clip, and videos of the same genre are retrieved from the repository. The method is evaluated by experimentation on a data set containing three genres: sports, movie, and news videos. Experimental results indicate that the proposed algorithm is effective in genre recognition and retrieval.

Keywords: Genre recognition, Motion detection, Video retrieval, Visual query, ROI, Tree pruning.

1. INTRODUCTION

During recent years, methods have been developed for retrieving videos based on common visual features such as color, texture, shape, and motion. These features are used to measure similarity between a query and the videos in a repository. Despite sustained efforts in recent years, the paramount challenge remains bridging the semantic gap: low-level features are easily measured and computed, but the starting point of the retrieval process is typically a high-level query from a human. Translating the question posed by a human into the low-level features seen by the computer illustrates the difficulty of bridging the semantic gap. Video retrieval is thus an important technology in the design of video search engines and in extracting a preliminary set of related videos from a database. Content-based visual information retrieval (CBVIR) is the application of computer vision to video retrieval, that is, to the problem of searching for an intended video in large databases. Content in this context may refer to colors, shapes, textures, or any other information that can be derived from the image or video itself. Without the ability to examine video content, searches must rely on metadata such as captions or keywords, which may be laborious or expensive to produce. Content-based video retrieval is a challenging and important problem of practical value: it helps users retrieve desired videos efficiently from a large database, based on the video content, through user interaction. A video retrieval system can be roughly divided into two major components: a module for extracting representative features from video frames, and a similarity model for finding similar video frames in the database. Many approaches use different kinds of features to represent a video frame, including histograms, shape information, texture, and text analysis, and a few integrate several features to improve retrieval performance. However, the frames of a video may not be globally homogeneous; each frame may contain different objects. Moreover, frames of videos of the same genre contain similar sets of objects, possibly in different spatio-temporal combinations.
Therefore, extracting objects from a video and annotating them over the video is an important step toward finding similar videos. The proposed work is a method that segments objects from frames, recognizes them, and employs tree pruning to identify the spatio-temporal combination of these objects in frames, and then in the video, in order to recognize the video's genre. The rest of this paper is organized as follows. Section 2 provides a literature overview. Section 3 presents the proposed algorithm and its details. Section 4 discusses the experimentation and results. Finally, Section 5 gives the conclusions.

2. RELATED WORK

The literature presents numerous algorithms and techniques for retrieving significant videos from a database, owing to the widespread interest in content-based video retrieval across a large number of applications. Some recent research related to content-based video retrieval is discussed in this section. Traditional text-based search suffers from the following drawbacks: manual annotations are time consuming and costly to produce; as the number of media items in a database grows, so does the difficulty of locating the required information; manually annotating all attributes of the media content is a difficult task [1]; manual annotations fall short in handling differences of subjective perception; and acquiring all attributes of the content of any medium is unachievable [2]. For these reasons, a good search technique for a Content-Based Video Retrieval (CBVR) system is required. In other words, content-based search examines the actual image content, where content refers to colors, shapes, textures, or any other information that can be obtained from the image directly [3]. Recently, CBVR systems have been widely studied. In CBVR, vital information is extracted automatically by applying signal processing and pattern recognition techniques to the audio and video signals. Digital video systems need to efficiently index, store, and retrieve the visual information in a multimedia database. Video has both spatial and temporal dimensions, and a video index should capture the spatio-temporal contents of the scene. To achieve this, a framework typically works in three basic steps: shot segmentation, feature extraction, and finally similarity matching for effective retrieval of the query clip. This approach has established a general framework for retrieval from a new perspective. The query example may be an image, a shot, or a clip. A shot is a sequence of frames continuously captured by the same camera, while a clip is a series of shots describing a particular event. Current techniques for content-based video retrieval can be broadly classified into two categories. The first is frame sequence matching: [4] proposed a scheme that matches videos based on the similarity of temporal activity; it finds similar actions and, furthermore, provides precise temporal localization of the actions in the matched videos. Video sequences are represented as sequences of feature vectors called fingerprints, and the fingerprint of the query video is matched against the fingerprints of database videos using sequential matching. In [5] the authors achieve a compact shot representation by integrating the color and spatial features of individual frames; in the video matching step, a shot similarity measure is defined to locate occurrences of similar video clips in the database. In [6] an original two-phase scheme for video similarity detection is proposed: for each video sequence, two kinds of signatures with different granularities, coarse and near-coarse, are extracted, and in the second phase the query video example is compared with the results of the first phase according to the similarity measure of the near signature, achieving better results than conventional approaches. Many works [7], [8], [9] and [10] have designed their models around frame sequence matching. The second type is key-frame based shot matching: in [11] an algorithm using key-frames of abrupt transitions is implemented, extracting image features (color, texture, and motion) around the key frames.
For each key frame in the query, a similarity value is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. In [12] an efficient algorithm for video sequence matching is proposed, using the modified Hausdorff distance and the directed divergence of histograms between successive frames. To match video sequences effectively at a low computational load, the authors use key frames extracted by the cumulative directed divergence and compare the sets of key frames using the modified Hausdorff distance. The same key-frame based shot matching approach is used by [13], [14], [15], [16] and [17]. Frame sequence matching derives from the sequential correlation matching widely used in the signal processing domain. These methods usually focus on frame-by-frame comparison between two clips in order to find sequences of frames that are consistently similar. Their common drawback is the heavy computational cost of exhaustive search. Although techniques exist [18] to improve the linear scanning speed, their time complexity remains at least linear in the size of the database, and these approaches are susceptible to alignment problems when comparing clips with different encoding rates. In the second category, each video shot is compactly represented by a key-frame, and video sequence matching is achieved by comparing the visual features of key-frames, which reduces the computational cost. The problem with these approaches is that they ignore the temporal variations and correlations between key-frames within an individual shot; it is also unclear which image should serve as the key-frame of a shot. To strike a good balance between search accuracy and computational cost, this paper proposes an integrated approach to shot matching.
In contrast to previous approaches, our approach analyzes all frames within a shot to extract more visual features for shot representation. Because no single visual feature best represents video content, we integrate several visual features to capture the spatio-temporal information more accurately. The proposed method is a video retrieval mechanism based on a video clip query: it first identifies the genre of the query clip and then retrieves videos of the same genre from the video library. For genre recognition, a manually trained tree pruning technique is employed.


3. PROPOSED ALGORITHM FOR GENRE RECOGNITION AND RETRIEVAL

The proposed algorithm extracts sample objects as regions of interest (ROIs) from every frame, based on the detection of significant motion in the video. Features are then extracted from each ROI and stored in a feature file; these features are used for matching. The frames of the user's query clip are extracted and its regions of interest are segmented out. Euclidean distance based matching is adopted to match the ROIs with those stored in the database, and the identified ROIs are annotated over the ROIs in the video itself. Finally, the system runs a tree pruning based technique to recognize the genre of the query clip and retrieve the relevant videos from the archive. The proposed algorithm is described in the following steps:

Step 1: One video at a time is read from the library and its frames are extracted. Each frame is compared with the previous one and motion is detected; frames with significant motion differences are separated as key frames.
Step 2: Regions of interest are detected using a bounding box and are classified.
Step 3: The mean and standard deviation of the RGB and HSV color channels are extracted from the regions detected by the bounding box.
Step 4: The query video clip is read, its frames are extracted, and motion detection is run. Key frames and ROIs are extracted along with their features.
Step 5: Euclidean distance based matching is adopted to match the ROIs with those stored in the database, and the identified object names are annotated over the objects in the video itself.
Step 6: The system then runs a tree pruning based technique to retrieve the relevant videos.
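As a rough illustration of Step 1, the motion-based key-frame selection can be sketched as below. This is a minimal Python sketch, not the paper's MATLAB implementation: frames are simplified to flat lists of grayscale pixel values, and the threshold value is a hypothetical choice.

```python
def frame_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def extract_key_frames(frames, threshold=10.0):
    """Motion-based key-frame selection: keep a frame only when its mean
    absolute difference from the last kept frame exceeds the threshold."""
    keys = []
    prev = None
    for idx, frame in enumerate(frames):
        if prev is None or frame_diff(frame, prev) > threshold:
            keys.append(idx)
            prev = frame
    return keys

# Synthetic 4-pixel "frames": three nearly identical, then a sudden change.
frames = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2], [200, 200, 200, 200]]
print(extract_key_frames(frames))  # [0, 3] -- only frames 0 and 3 are kept
```

Comparing against the last kept frame (rather than the immediate predecessor) matches the description in Section 3: runs of similar frames are collapsed and only the first is retained.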
[Figure 4.1 is a block diagram: videos from the video archive and the query video each pass through motion-based key-frame identification, then region segmentation and feature extraction; Euclidean distance based matching compares the two feature sets, tree pruning confirms the genre, and the retrieval result is produced.]

Figure 4.1 Block diagram of the proposed model



Key frame identification
The proposed method works on the group of frames extracted from a video. It takes the list of frames in the order in which they were extracted and uses a predefined threshold to decide whether two video frames are similar. Its main function is to choose a small number of representative key frames for the video. Starting from the first frame in the sorted list, consecutive frames within the threshold are considered similar; the process repeats while frames remain similar, all similar frames are discarded, and the first is kept as the key frame. The process then restarts from the next frame outside the threshold and repeats for all frames of the video.

Identification of region of interest
Visually attentive regions in images are detected using the bounding box technique. The bounding box function draws a rectangle around a region of interest. The rectangle containing the region is a 1-by-2Q vector, where Q is the number of image dimensions (ndims(L), ndims(BW), or numel(CC.ImageSize)); it stores the upper-left corner followed by the width of the bounding box along each dimension.

Feature extraction
Once the regions of interest in the video shot have been segmented and tracked, the features of each region of interest are computed and stored in the feature library. For each region of interest, the mean and standard deviation of the RGB and HSV channels are extracted.

Matching and retrieval
In retrieval, the database video clips similar to the query clip are retrieved by measuring the similarity between the query clip and the database video clips. When a query clip is given to the proposed retrieval system, all the aforesaid features are extracted, as was done for the database video clips.
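The per-ROI features described above (mean and standard deviation of each RGB and HSV channel) can be sketched as follows. This is a minimal Python illustration, not the paper's MATLAB code; the ROI is assumed to be given as per-channel lists of pixel values, and the example values are hypothetical.

```python
import math

def channel_stats(pixels):
    """Mean and (population) standard deviation of one color channel."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, math.sqrt(var)

def roi_feature_vector(roi):
    """Concatenate the mean/std of every channel (e.g. R, G, B, H, S, V)
    into a single feature vector for the region of interest."""
    features = []
    for channel in roi:
        mean, std = channel_stats(channel)
        features.extend([mean, std])
    return features

# Hypothetical 4-pixel ROI with two channels.
roi = [[10, 20, 30, 40], [0, 0, 10, 10]]
print(roi_feature_vector(roi))  # [25.0, 11.18..., 5.0, 5.0]
```

With all six color channels this yields a 12-dimensional vector per ROI, which is what the Euclidean matching step then compares.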
Then, with the aid of the Euclidean distance mechanism, similarity is measured between every database video clip and the query clip; using this result, videos are finally retrieved with the help of tree pruning. The distance metric can be viewed as a similarity measure, which is the key component in content-based retrieval. Here the Euclidean distance between the ROIs of the videos in the database and the ROI of the query video is calculated and used for ranking: the query video's ROI is more similar to a database video's ROI when the distance is smaller. Let x and y be the feature vectors of a database ROI and the query ROI, respectively. The Euclidean distance measure can then be defined as

d(x, y) = sqrt( sum_{i=1}^{n} (x_i - y_i)^2 )
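The Euclidean distance based ranking of ROIs can be sketched as below; a minimal Python illustration with hypothetical feature vectors and genre labels, not the paper's MATLAB code.

```python
def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def rank_rois(query_fv, database_fvs):
    """Rank database ROI feature vectors by distance to the query;
    a smaller distance means a more similar ROI."""
    scored = [(euclidean(query_fv, fv), name) for name, fv in database_fvs]
    return [name for _, name in sorted(scored)]

# Hypothetical 2-D feature vectors for three database ROIs.
db = [("news", [0.9, 0.1]), ("sports", [0.2, 0.8]), ("movie", [0.5, 0.5])]
print(rank_rois([0.25, 0.75], db))  # ['sports', 'movie', 'news']
```

The first entry of the ranking is the nearest ROI, which is what the genre recognition and retrieval steps consume.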

Finally, the existing video search algorithm utilizes a tree-structured hierarchy and subtree pruning to reduce the search space while traversing the tree from root to leaf nodes for a given query video.



Figure 4.2. Illustration of the notion of NRC. (a) Tree-structured hierarchy with the NRC r_k associated with a node k, k = 1, 2, 3, 4. (b) The NRC r_x of a node x for the cluster C_x.

As illustrated in Figure 4.2, each node x has a feature vector that represents its whole cluster, i.e., its subtree, within the bound of what we call the NRC. The NRC, denoted r_x, is defined as the maximum distance between the node x and its subordinate leaf nodes, i.e., those belonging to C_x, and is computed as follows:

r_x = max_{i ∈ C_x} d(x, i)        (1)

where d(·) denotes a distance metric between two feature vectors. To retrieve all similar videos whose distance from the query q is within a threshold value ε, every node in the tree hierarchy would need to be visited, but an irrelevant cluster C_x can be pruned without degrading the recall rate of retrieval unless the following triangle inequality holds:

d(q, x) ≤ ε + r_x        (2)

The evaluation of d(x, y) in (2) involves a distance computation between two high-dimensional feature vectors.
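The NRC-based subtree pruning described above can be sketched as follows. This is a minimal Python illustration under assumed data structures (nested dictionaries for nodes), not the implementation from the cited work: each node stores a representative feature vector and its NRC, and a whole cluster is skipped when d(q, x) > ε + r_x.

```python
def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def search(node, query, eps, dist, results):
    """Recursively collect leaf videos within eps of the query, pruning
    any cluster whose representative violates d(q, x) <= eps + r_x."""
    d = dist(query, node["vec"])
    if not node["children"]:            # leaf: a single database video
        if d <= eps:
            results.append(node["name"])
        return
    if d > eps + node["nrc"]:           # triangle inequality: safe to prune
        return
    for child in node["children"]:
        search(child, query, eps, dist, results)

# Hypothetical two-cluster hierarchy over 1-D feature vectors; each NRC is
# the node's maximum distance to the leaves below it.
tree = {"vec": [0.5], "nrc": 0.5, "children": [
    {"vec": [0.1], "nrc": 0.1, "children": [
        {"vec": [0.0], "nrc": 0.0, "children": [], "name": "news1"},
        {"vec": [0.2], "nrc": 0.0, "children": [], "name": "news2"}]},
    {"vec": [0.9], "nrc": 0.1, "children": [
        {"vec": [0.8], "nrc": 0.0, "children": [], "name": "sports1"},
        {"vec": [1.0], "nrc": 0.0, "children": [], "name": "sports2"}]}]}

hits = []
search(tree, [0.05], eps=0.2, dist=l2, results=hits)
print(hits)  # ['news1', 'news2'] -- the sports cluster is pruned unvisited
```

For the query 0.05, the sports cluster's representative is at distance 0.85 > 0.2 + 0.1, so its entire subtree is pruned without affecting recall.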

4. EXPERIMENTATION AND RESULTS

The proposed content-based video retrieval system is implemented in MATLAB (R2010b), and its performance is analyzed using the precision and recall evaluation metrics. The experiments are conducted on a Windows XP based system with 3 GB of RAM, on a data set containing 200 videos.

4.1 Datasets
We performed experiments on a dataset of 200 videos obtained from the YouTube website (www.youtube.com). The collected videos belong to the categories sports, news, and movies. A sample snapshot of the input videos of the proposed system is given in Figure 5.1.

Figure 5.1: A sample snapshot for the input database


4.2 Identification of region of interest
This representation captures the significant visual content in a shot by processing every frame in it. The identified regions of interest (IOROI) for the SPORTS video are constructed as shown in Figure 5.2.

Fig 5.2 a) Snapshot of query video

Figure 5.2 b) A sample snapshot of regions of interest

4.3 Feature extraction
For each region of interest, the mean and standard deviation of the RGB and HSV channels are extracted.

4.4 Video retrieval using tree pruning
The mean and standard deviation features of the RGB and HSV channels are extracted from the ROIs of the query video and matched with the features of the ROIs stored in the library using the Euclidean distance mechanism. For retrieval, the proposed system finally uses the tree pruning technique, in which a tree-structured hierarchy is used: each node is associated with an ROI image or a feature vector that represents all of the images belonging to its subtree. In the same context, each child represents a disjoint subset of the images and thus partitions the subtree rooted at its parent node into smaller units. Each leaf node corresponds to a single video in the database. With the node radius for cluster (NRC), defined as the maximum distance between a node and its descendants (its cluster), stored at each intermediate node, the triangle inequality is applied to reduce the search space by pruning irrelevant clusters. The computed matching score is used to retrieve the videos from the dataset, and the videos retrieved for the corresponding input videos are shown in Figure 5.3.

Figure 5.3 A sample snapshot of retrieved videos



4.5 Quantitative analysis
The performance of the proposed system is evaluated on the input dataset using the precision and recall measures; Graph 1 shows the precision-recall plot. For the quantitative analysis, videos from each category are given to the proposed system and the results are evaluated with the measures defined as follows:

Precision = (number of relevant videos retrieved) / (total number of videos retrieved)
Recall = (number of relevant videos retrieved) / (total number of relevant videos in the dataset)
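The precision and recall measures used in this evaluation can be sketched as follows; a minimal Python illustration (the paper's implementation is in MATLAB) with hypothetical retrieved and ground-truth video ID lists.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given the list of retrieved
    videos and the list of ground-truth relevant videos."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical sports query: 5 videos retrieved, 8 relevant in the dataset.
p, r = precision_recall(["s1", "s2", "s3", "s4", "m1"],
                        ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"])
print(p, r)  # 0.8 0.5
```

A precision of 1.00 with recall .625, as reported for the sports genre below, thus means every retrieved video was relevant but only 62.5% of the relevant videos were found.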

Table 5.1 Experimental results

VIDEO GENRE | RECALL | PRECISION
SPORTS      | .625   | 1.00
MOVIES      | .538   | .92
NEWS        | .625   | 1.00

5. CONCLUSION AND FUTURE WORK

An algorithm for content-based video retrieval is designed and evaluated on a sufficient number of videos of different genres. The algorithm is implemented in MATLAB R2010b and executed on an Intel Core 2 Duo 2.66 GHz processor with 3 GB of RAM. The algorithm first extracts regions of interest through motion estimation; the features of these ROIs are extracted and matched with the ROIs of the query video, and retrieval of videos is finally carried out using tree pruning. The proposed method has been tested on different genres of videos, namely sports, movies, and news clips. Its performance is evaluated by the precision and recall measures. The experimental results on standard video datasets reveal that the proposed model is robust and retrieves videos of various genres efficiently. The system can be further extended with more complex features, including shape and texture descriptors such as wavelet moments.

REFERENCES

[1] Chia-Hung Wei and Chang-Tsun Li (2004), "Content-based multimedia retrieval: introduction, applications, design of content-based retrieval systems, feature extraction and representation", International Journal of Wireless and Microwave Technologies (IJWMT), ISSN 2076-1449.
[2] Milan Petkovic and Willem Jonker (2003), Content-Based Video Retrieval, Kluwer Academic Publishers, Boston, Monograph, 168 p., Hardcover, ISBN 978-1-4020-7617-6.
[3] John Eakins and Margaret Graham (1999), "Content-based Image Retrieval", University of Northumbria at Newcastle, JISC Technology Applications Programme Report 39.
[4] Mohan, R. (1998), "Video sequence matching", Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 3697-3700.
[5] Tan, Y., Kulkarni, S. & Ramadge, P. (1999), "A framework for measuring video similarity and its application to video query by example", International Conference on Image Processing, pp. 106-110.
[6] Naphade, M., Yeung, M. & Yeo, B. (2000), "A novel scheme for fast and efficient video sequence matching using compact signatures", SPIE Conference on Storage and Retrieval for Media Databases, pp. 564-572.
[7] Hoad, T. & Zobel, J. (2003), "Fast video matching with signature alignment", ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA, pp. 262-269.
[8] Ren, W. & Singh, S. (2004), "Video sequence matching with spatio-temporal constraints", International Conference on Pattern Recognition, pp. 834-837.
[9] Kim, C. & Vasudev, B. (2005), "Spatiotemporal sequence matching for efficient video copy detection", IEEE Transactions on Circuits and Systems for Video Technology 15(1), 127-132.
[10] Toguro, M., Suzuki, K., Hartono, P. & Hashimoto, S. (2005), "Video stream retrieval based on temporal feature of frame difference", Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Volume 2, pp. 445-448.
[11] Jain, A., Vailaya, A. & Wei, X. (1999), "Query by video clip", Multimedia Systems 7, 369-384.
[12] Lienhart, R., Effelsberg, W. & Jain, R. (2000), "VisualGREP: A systematic method to compare and retrieve video sequences", Multimedia Tools and Applications 10(1), 47-72.
[13] Liu, X., Zhuang, Y. & Pan, Y. (1999), "A new approach to retrieve video by example video clip", ACM International Conference on Multimedia, pp. 41-44.
[14] Kim, S. & Park, R. (2002), "An efficient algorithm for video sequence matching using the modified Hausdorff distance and the directed divergence", IEEE Transactions on Circuits and Systems for Video Technology 12(7), 592-596.
[15] Diakopoulos, N. & Volmer, S. (2003), "Temporally tolerant video matching", ACM SIGIR Workshop on Multimedia Information Retrieval, Toronto, Canada.
[16] Peng, Y. & Ngo, C. (2004), "Clip-based similarity measure for hierarchical video retrieval", ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 53-60.
[17] Luo, H., Fan, J., Satoh, S. & Ribarsky, W. (2007), "Large scale news video database browsing and retrieval via information visualization", ACM Symposium on Applied Computing, Seoul, Korea, pp. 1086-1087.
[18] Kashino, K., Kurozumi, T. & Murase, H. (2003), "A quick search method for audio and video signals based on histogram pruning", IEEE Transactions on Multimedia 5(3), 348-357.
[19] Reeja S. R. and N. P. Kavya (2012), "Motion Detection for Video Denoising - The State of the Art and the Challenges", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, pp. 518-525, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[20] Vilas Naik and Raghavendra Havin (2013), "Entropy Features Trained Support Vector Machine Based Logo Detection Method for Replay Detection and Extraction from Sports Videos", International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, pp. 20-30, ISSN Print: 0976-6448, ISSN Online: 0976-6456.
