

A Full Reference Algorithm for Dropped Frames Identification in Uncompressed Video Using Genetic Algorithm

1Manish K Thakur, 2Vikas Saxena, 3J P Gupta
1 Jaypee Institute of Information Technology, Noida, India, mthakur.jiit@gmail.com
2 Jaypee Institute of Information Technology, Noida, India, vikas.saxena@jiit.ac.in
3 Sharda University, Greater Noida, India, jaip.gupta@gmail.com

Abstract
Identifying dropped frames in a given video stream is a challenging task for the research community because of the heavy computation it requires. To preserve quality of service during any processing of visual information, frame drop is highly undesirable. Contemporary works identify frame drop with full reference algorithms (where reference and distorted video streams are both available for comparison), reduced reference algorithms (where some information about the reference video is available), or no reference algorithms (where no information about the reference video is available). This paper presents a novel full reference heuristic approach using a genetic algorithm which identifies the dropped frame indices in a distorted video stream with respect to the original video stream. The proposed algorithm efficiently identifies dropped frame indices even if the reference video stream contains repeated frames and the distorted stream is also spatially distorted, with low or high spatial distortion. The proposed algorithm was simulated and tested with 12 video streams. Simulation results suggest that it is more efficient on video streams with fewer repeated frames.

Keywords: Frame Drop; Temporal Distortion; Spatial Distortion; Peak Signal To Noise Ratio
(PSNR); Longest Increasing Sequence; Genetic Algorithm

1. Introduction
In the last decade, the enormous and unprecedented growth of information across various domains has demanded high-end multimedia technology to support effective communication, storage, retrieval, compression, and editing of data on various platforms. Owing to the various processing steps applied to visual information, it may be distorted by the time it reaches the end user [1-4]. Apart from accidental distortions during processing, this includes malicious intent or video tampering, where one might distort the pixels of a video frame, drop some video frames, or insert averaged frames into the video [5-8]. These distortions lead to loss of visual information and, whether introduced intentionally or accidentally, must be identified.

Distortions in a video stream appear either at the intra-frame or the inter-frame level. Intra-frame distortions, or spatial distortions (SD), arise from changes to the pixel bits of a reference video frame; these can have several causes, such as bit errors introduced during communication or other manipulations of pixel bits during compression, bit inversion, acquisition, watermarking, editing, and storage. Inter-frame distortions, or temporal distortions (TD), arise with respect to time, e.g. dropping of video frames, swapping of video frames, and frame averaging [8-10].

Figure 1 depicts an example of SD: figure 1a is an original (undistorted) video frame and figure 1b is the distorted frame, spatially distorted by introduced bit errors as highlighted in blue and orange. Figure 2 depicts an example of TD: figure 2a is the original video stream of 5 frames, whereas figures 2b, 2c, and 2d are temporally distorted streams produced by frame drop, frame swapping, and frame averaging respectively. With frame drop or frame averaging the order of the unaffected frames in the distorted video is preserved, while frame swapping changes the frame order.

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 6, Number 20, November 2012. doi:10.4156/jdcta.vol6.issue20.61


[Figure 1 here: two 8x8 blocks of pixel intensity values. Panel (a) consists of identical rows "20 18 16 14 12 10 80 60"; in panel (b) several values are altered slightly (low SD, highlighted orange) or substantially (high SD, highlighted blue).]

Figure 1. (a) original video frame and (b) spatially distorted video frame
[Figure 2 here: four strips of numbered frames. (a) frames 1-5; (b) frames 1, 2, 4, 5; (c) frames 1, 2, 4, 3, 5; (d) frames 1, 2, 3, Avg, 5.]
Figure 2. (a) original video frames, (b) 3rd frame dropped, (c) frames 3 and 4 swapped, and (d) average frame inserted at frame number 4

As frame drop is highly undesirable, identifying dropped frames in a video stream is a challenging task for the research community. There is significant work, in terms of quality metrics, that identifies dropped frames under full reference (FR), reduced reference (RR), and no reference (NR) settings. S. Wolf identified dropped frame detection as a problem requiring heavy computation [11]. Therefore, to reduce the required computational time, we present in this paper a novel full reference algorithm which identifies the dropped frame indices in a distorted video stream with respect to a given reference (or original) video stream. As shown in section 4, the presented heuristic efficiently identifies the dropped frame indices. We use the genetic algorithm's mutation operator to resolve issues which arise after applying the heuristic. This paper is organized as follows: section 2 describes the problem, section 3 presents the proposed solution, and section 4 presents the simulation and analysis of the solution approach, followed by the conclusion and references.

2. The Problem
Many challenges arise during the processing of multimedia data, which are voluminous and continuous in nature. One such processing step is communication over a multimedia network, where, due to packet loss, video frames may not be received at the user's end, resulting in dropped video frames. Frame drop in a multimedia communication network is accidental, and a simple error recovery is to replay the last received frame. However, this remedy comes at the cost of the received stream's quality, which is undesirable. Further, video frames in a video stream may be dropped with malicious intent. In either case it is required to identify the dropped frames and analyse their impact on video quality. Researchers are continuously addressing this problem. One contemporary work, presented by S. Wolf, proposed NR and RR metrics to identify dropped frames by maintaining a motion energy time history of a video clip [11]. As these are NR or RR metrics,


where little or no information about the reference video stream is available, other distortions (SD) that may be introduced during processing were not handled. Another contemporary work, carried out by the Graphics and Media Lab of Moscow State University (MSU), presented a dropped frame metric which identifies dropped frames by calculating each frame's difference with the previous one [12]. As it computes only the difference of each frame with the previous frame, the metric cannot explore other frames for similarity. Nishikawa K et al. presented a no reference method for estimating the degradation in a video stream due to packet loss, by estimating the peak signal to noise ratio (PSNR) between frames affected by packet loss and the originally encoded frames [13]. Like MSU's metric, this method also uses only the similarity between adjacent frames, and is thus unable to explore other (non-adjacent) frames for similarity.

In the following we define the problem of dropped frames identification in a full reference (FR) mode, addressing issues left unhandled in contemporary works, namely spatially distorted video streams and comparing a video frame with all candidate frames for similarity.

Say we are given a reference video stream VR with m video frames VR1, VR2, VR3, .., VRm. The reference video VR is distorted (intentionally or accidentally) by one of the TDs, frame drop, to give a distorted video stream VD with n video frames VD1, VD2, .., VDn, where m > n. Further, some video frames of the distorted video stream VD are spatially distorted as well, by introducing bit errors at different pixel bits. The introduced SD may be high or low. Apart from dropped frames in VD, the following cases are possible for the introduced SD:

C2.1: There is no SD in the video frames of VD.
C2.2: Some video frames of VD are distorted with low SD (we call a change of 5 to 10% in at least 10% of pixel bits low SD, as highlighted in orange in figure 1b).
C2.3: Some video frames of VD are distorted with high SD (we call a change of more than 10% in at least 10% of pixel bits high SD, as highlighted in blue in figure 1b).

It is desired to identify the dropped frames in the distorted video stream VD with respect to the original video stream VR, which we call the reference video stream.

3. Proposed Solution
As a solution to the problem defined in the previous section, this section presents a heuristic approach which identifies dropped frame indices by maximizing the overall similarity between the video frames of VR and VD. Since we only deal with frame drop (i.e. the frame order of VR is retained in VD), a frame VDi of VD may be present at indices ranging from i to i + (dropped frames count) in VR. Therefore, unlike the previous approaches (section 2) where a frame was compared only with adjacent frames, it must now be compared with this whole range of frames for similarity. In the example of figure 3, where VR has m frames and VD has n = m-1 frames, the first frame of VD will be either the first or the second frame of VR. Thus the first frame of VD need only be compared with the first two frames of VR for similarity, and so on.

There are several quality metrics which, through different indexes, quantify the introduced noise or the similarity between two given images or video frames; PSNR, mean square error (MSE), and the structural similarity index (SSIM) are a few of them [14-18]. A PSNR of 100 dB between two video frames indicates that there is no noise in either frame, i.e. both are identical, whereas a PSNR of less than 100 dB indicates noise in the compared frames. Because PSNR is the simplest of these to compute, we use the PSNR index to quantify the dissimilarity between two video frames.
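As a concrete illustration of this comparison step, PSNR can be computed from the mean square error between two frames. The sketch below treats a frame as a flat list of 8-bit greyscale intensities and, following the paper's convention, reports 100 dB for identical frames; the cap and the flat-list representation are our simplifying assumptions:

```python
import math

def psnr(frame_a, frame_b, max_val=255):
    """PSNR in dB between two equally sized greyscale frames.

    Frames are flat lists of pixel intensities. Identical frames would
    give infinite PSNR; like the paper, we cap the value at 100 dB so
    that "100 dB" reads as "no noise".
    """
    assert len(frame_a) == len(frame_b)
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return 100.0                      # identical frames
    return min(100.0, 10 * math.log10(max_val ** 2 / mse))
```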


[Figure 3 here: VR with m frames (VR1, VR2, VR3, .., VRm-1, VRm) aligned above VD with n = m-1 frames (VD1, VD2, VD3, .., VDn), indicating which frames of VR each frame of VD is compared with.]

Figure 3. An example of the compared frames of VR (m frames) and VD (n = m-1 frames)

The following paragraphs describe the steps of our solution approach:

Step 3.1: Input the reference video VR = {VR1, VR2, .., VRm} with m frames and the distorted video VD = {VD1, VD2, .., VDn} with n frames, where m > n.

Step 3.2: Compute the PSNR of each frame of VD with respect to m-n+1 frames of VR (the ith frame of VD is compared with the ith to (i+m-n)th frames of VR) and store these values in a two dimensional matrix with n rows, corresponding to the length of VD, and m+2 columns: m columns corresponding to the length of VR plus two additional fields, Max and Index, storing the highest PSNR in a row and its index respectively, i.e. the index is the column number holding the highest PSNR in that row. If more than one field in a row has the same highest PSNR, store the smaller index in the Index field. We call this matrix the PSNR difference matrix, DiffMat.

The following cases can arise in the Index field of DiffMat, as elaborated by the examples in figures 4, 5, and 6, where DiffMat has been created with 12 reference video frames and 6 distorted video frames. As 6 video frames are dropped, the ith frame of VD (for all i = 1 to n) has been compared with the ith to (i+6)th frames of VR and the computed PSNR has been stored in the respective field of DiffMat.

C3.2.1: If there are only TD or frame drop errors (i.e. no SD) in VD, then at least one field in each row of DiffMat will contain a PSNR of 100 dB, indicating the matching frame of VD in VR. Since there is no SD, the indices in the Index field will be in increasing order. This case is illustrated by the example in figure 4.

C3.2.2: If, along with TD, some or all frames of VD are also spatially distorted, then the Max field of a row of DiffMat might or might not be 100 dB. Since there is SD in some or all frames of VD, the indices in the Index field may be (a) in increasing order or (b) in random order.
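The construction of DiffMat in Step 3.2 can be sketched as follows; `psnr` stands for any frame-comparison function returning a value in dB (a hypothetical helper, not defined in the paper), and each row is returned as a dictionary keyed by 1-based VR column:

```python
def build_diffmat(vr, vd, psnr):
    """Step 3.2 sketch: for each frame VD[i], compute PSNR against the
    m-n+1 candidate frames VR[i] .. VR[i+m-n], and record the highest
    PSNR (Max) and its smallest column index (Index, 1-based)."""
    m, n = len(vr), len(vd)
    diffmat, index = [], []
    for i in range(n):
        row = {j + 1: psnr(vd[i], vr[j]) for j in range(i, i + m - n + 1)}
        best = max(row.values())
        # on a tie, the smaller column wins, per the paper
        idx = min(j for j, v in row.items() if v == best)
        diffmat.append(row)
        index.append(idx)
    return diffmat, index
```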
Cases C3.2.2a and C3.2.2b are illustrated by the examples in figures 5 and 6 respectively.

Step 3.3: Find the longest increasing sequence (LIS) [19-22] in the Index field of DiffMat. The following cases (C3.3.1 and C3.3.2) arise depending on the length of the identified LIS.
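The LIS over the Index field can be found in O(n log n) by patience sorting; the sketch below is our own formulation and returns the positions of one strictly increasing subsequence:

```python
import bisect

def lis_indices(seq):
    """O(n log n) longest strictly increasing subsequence, via patience
    sorting with parent pointers. Returns 0-based positions of one LIS."""
    if not seq:
        return []
    tails = []       # smallest tail value of an increasing run of each length
    tails_pos = []   # position in seq of each tail
    parent = [None] * len(seq)
    for i, x in enumerate(seq):
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_pos.append(i)
        else:
            tails[k] = x
            tails_pos[k] = i
        parent[i] = tails_pos[k - 1] if k > 0 else None
    # walk the parent pointers back from the end of the longest run
    out, i = [], tails_pos[-1]
    while i is not None:
        out.append(i)
        i = parent[i]
    return out[::-1]
```

On the Index field of figure 6, {1, 2, 6, 10, 9, 12}, this recovers the length-5 subsequence 1, 2, 6, 9, 12 discussed in case C3.3.2 below.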
[Figure 4 here: DiffMat for m = 12 reference frames and n = 6 distorted frames VD1..VD6 with no SD. Each row's Max field is 100 dB and the Index field reads 1, 2, 6, 7, 9, 12, i.e. in increasing order.]

Figure 4. An example of case C3.2.1 where DiffMat is created with m = 12 and n = 6


[Figure 5 here: DiffMat for m = 12 and n = 6 with SD. The Max field reads 65, 57, 59, 63, 80, 64 dB (no entry reaches 100 dB) and the Index field reads 1, 2, 6, 7, 9, 12, still in increasing order.]

Figure 5. An example of case C3.2.2a where DiffMat is created with m = 12 and n = 6


[Figure 6 here: DiffMat for m = 12 and n = 6 with SD. The Max field reads 65, 57, 59, 68, 80, 63 dB and the Index field reads 1, 2, 6, 10, 9, 12, i.e. not in increasing order.]

Figure 6. An example of case C3.2.2b where DiffMat is created with m = 12 and n = 6


[Figure 7 here: the 12-entry arrays PopArr and PfxSum for the running example, as described in the caption below.]

Figure 7. (a) Enter 1 in PopArr for the LIS 1, 2, 6, 9, 12, with MinFit = 54 dB; (b) the remaining entries of PopArr are randomly filled with 1 or 0 subject to the allowed number of 1s, with MinFit = 63 dB; (c) the prefix sum of PopArr is stored in the array PfxSum

C3.3.1: The length of the identified LIS is equal to n (the number of frames in VD). This indicates exactly one unique match of each frame of VD to a frame of VR. We call these indices the frames of VR present in VD, and all other indices the dropped frame indices. In the examples of figures 4 and 5, all indices (i.e. 6 indices) of the Index field form an increasing sequence, so the LIS is {1, 2, 6, 7, 9, 12}, which we call the frame indices of VR present in VD. The other indices of VR, {3, 4, 5, 8, 10, 11}, are the dropped frame indices in VD.

C3.3.2: The length of the identified LIS is less than n (the number of frames in VD). This indicates that some frames of VD have no unique matching frame in VR. This case is elaborated by the example of figure 6, where the identified LIS has length 5 (less than n = 6): either 1, 2, 6, 10, 12 with an average PSNR (over all n = 6 frames) of 52 dB, or 1, 2, 6, 9, 12 with an average PSNR (over all n = 6 frames) of 54 dB.

The objective of the proposed heuristic is to select the combination of indices (of length n) which produces maximum overall similarity (i.e. maximum average PSNR over n frames). Exploring all combinations of n frames is an exponential time problem and requires huge computation for larger values of m and n. To obtain a feasible solution we present an approach using a genetic algorithm (GA) which maximizes the overall similarity. As a GA does not guarantee an optimal or near optimal solution, we use it with a partially guided initial population so that accuracy can be ensured. Use steps 3.3.1 onwards if case C3.3.2 is encountered.
Step 3.3.1: Create the initial population from the identified LIS (of length less than n) having maximum average PSNR and store it in the population array PopArr: set a field of PopArr to 1 if that index is part of the identified LIS. Call the maximum average PSNR the minimum fitness score MinFit, and the optimal average PSNR (i.e. 100 dB) MaxFit. In the example of figure 6, the LIS 1, 2, 6, 9, 12 is selected to create the initial population, as its average PSNR is larger. As indicated in figure 7a, the population array PopArr is filled with 1 at positions 1, 2, 6, 9, and 12. MinFit for this example is 54 dB, whereas MaxFit is 100 dB.

Step 3.3.2: As the total number of 1s in the population array PopArr must be n, randomly fill the remaining entries with 1 or 0 such that the total number of 1s does not exceed n.
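Steps 3.3.1 and 3.3.2 together can be sketched as follows (a simplified illustration; `lis_cols` is assumed to hold the 1-based VR columns of the chosen LIS):

```python
import random

def initial_population(m, n, lis_cols):
    """Steps 3.3.1-3.3.2 sketch: seed PopArr (length m, 1-based columns)
    with 1s at the LIS columns, then randomly set further entries to 1
    until exactly n entries are 1."""
    pop = [0] * m
    for c in lis_cols:
        pop[c - 1] = 1
    free = [i for i in range(m) if pop[i] == 0]
    for i in random.sample(free, n - len(lis_cols)):
        pop[i] = 1
    return pop
```

For the figure 6 example, `initial_population(12, 6, [1, 2, 6, 9, 12])` sets the five LIS positions and one further random position to 1.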


As indicated in figure 7b, apart from the entries set to 1 in PopArr for the LIS in the preceding example, we randomly set only one further entry to 1 (because in this example only 6 entries of 1 are allowed), at index 11, and the remaining fields to 0.

Step 3.3.3: Create the prefix sum array PfxSum, which stores the prefix sum [23] of the array PopArr (figure 7c). Compute the fitness score of the population by averaging the PSNR values DiffMat[PfxSum[i]][i] for all i = 1 to m where PopArr[i] = 1. If the fitness score of the current population is greater than MinFit, assign the current fitness score to MinFit and copy PopArr into Final. Generate the new population in PopArr using the mutation operator: use bit string mutation [24] to flip two randomly chosen bits of the current population, ensuring that one of the flipped bits is 0 and the other is 1. As the count of 1s is unchanged by this mutation, there will be n entries of 1 in the newly generated population, placed at random positions in PopArr. In the preceding example, the fitness score of the current population is 63 dB, which is greater than MinFit = 54 dB; thus the updated MinFit is 63 dB and Final is {1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1}.

Step 3.3.4: Repeat step 3.3.3 until the kth population or until MinFit = MaxFit.

Step 3.3.5: We say that a frame of VR is present in VD if its index in Final is 1, i.e. Final[i] = 1, for i from 1 to m. Every i from 1 to m with Final[i] = 0 is a dropped frame index of VR.
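The fitness evaluation and the count-preserving bit string mutation of step 3.3.3 might look as follows; `score(row, col)` is our stand-in for a 1-based lookup into DiffMat, not part of the paper's notation:

```python
import itertools
import random

def mutate(pop):
    """Step 3.3.3 sketch: bit string mutation that preserves the number
    of 1s by flipping one random 1 to 0 and one random 0 to 1."""
    child = pop[:]
    ones = [i for i, b in enumerate(child) if b == 1]
    zeros = [i for i, b in enumerate(child) if b == 0]
    child[random.choice(ones)] = 0
    child[random.choice(zeros)] = 1
    return child

def fitness(pop, score):
    """Average PSNR of the matching implied by pop. PfxSum, the prefix
    sum of PopArr, maps VR column j (1-based j+1 here) to the VD row it
    is matched with; `score(row, col)` returns DiffMat[row][col] in dB."""
    pfx = list(itertools.accumulate(pop))          # PfxSum
    vals = [score(pfx[j], j + 1) for j, b in enumerate(pop) if b]
    return sum(vals) / len(vals)
```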

4. Simulation and Analysis


4.1. Analysis
The proposed solution approach for identifying dropped frames mainly performs three operations: (a) creation of DiffMat, (b) identification of the LIS, and (c) creation of the population list and generation of the next population using the mutation operator, if required. Creating DiffMat requires computation of the order O((m-n+1)·n), whereas identifying the LIS in the Index field of DiffMat is of the order O(n log n). In the best case, if there is no or nominal frame drop (i.e. m ≈ n), n computational steps are required to create DiffMat, whereas in the worst case (the product (m-n+1)·n is largest when about half the frames are dropped, n ≈ m/2), of the order of m² computations are required.

The length of the identified LIS will be approximately n for problems of the nature described in cases C2.1 and C2.2. It will be exactly n if all video frames of VR are unique (i.e. no repeated frames in VR) and there is no SD (case C2.1), i.e. there is a single occurrence of a 100 dB PSNR in each row of DiffMat. It will be approximately n if some video frames of VR are repeated (i.e. two identical frames in VR) or frames of VD are spatially distorted with low SD. In the worst case the length of the LIS will be much less than n. Creation of the population list and generation of the next population are required only in the scenario of case C3.3.2. The best case for creation of the population list arises when the length of the identified LIS is n-1 (or n). For problems of the nature of case C2.3 (where, along with frame drop, frames of VD are spatially distorted with high SD), the scenario of case C3.3.2 is more likely.

4.2. Simulation
We simulated the scheme presented in section 3 and analysed its performance in terms of percentage accuracy for the cases mentioned in section 2. As a video stream may contain repeated frames, the cases of section 2 (C2.1, C2.2, and C2.3) have been extended by adding a repeated frames scenario to each case. To analyse the performance we conducted experiments under the following cases, based on the introduced SD (no, low, or high SD) and the repeated frames in each video stream:

C4.1: No SD in the frames of VD and all frames of VR are unique.
C4.2: No SD in the frames of VD and (a) 1% or (b) 5% repeated frames in VR.
C4.3: Some frames of VD distorted by low SD and all frames of VR are unique.


C4.4: Low SD in some frames of VD and (a) 1% or (b) 5% repeated frames in VR.
C4.5: High SD in some frames of VD and all frames of VR are unique.
C4.6: High SD in some frames of VD and (a) 1% or (b) 5% repeated frames in VR.

To conduct experiments in all listed cases, we required three types of reference video stream: (a) video streams in which all frames are unique, (b) video streams in which 1% of frames are repeated, and (c) video streams in which 5% of frames are repeated. We used 12 video streams (figure 8) for these experiments. These video streams (bus, coastguard, galleon, football, Stefan, container, garden, hall_monitor, tt, Akiyo, carphone, and foreman) are publicly available at http://media.xiph.org/video/derf/ in compressed format. We converted all video streams into raw video streams and conducted our experiments with these raw videos.

[Figure 8 here: one thumbnail each of the 12 test video streams: bus, coastguard, galleon, football, Stefan, container, garden, hall_monitor, tt, Akiyo, carphone, and foreman.]

Figure 8. Videos (http://media.xiph.org/video/derf/) used as test video streams

As we do not have information about repeated frames in the listed video streams, we assumed that the video frames in each stream are unique. Therefore, to conduct experiments for the mentioned cases we intentionally dropped some (1% or 5%) video frames and inserted repeated frames such that the length of the video stream was unchanged. For our experiments we thus have three sets of reference video streams: the original video stream with unique frames (figure 8), a stream with 1% repeated frames, and a stream with 5% repeated frames.

We also created sets of distorted video streams from these reference video streams, in two phases: first we introduced SD (low or high) and then we introduced TD by dropping video frames. We used the least significant bit (LSB) watermarking scheme [25] to introduce SD into the frames of a video stream, embedding a watermark at (a) the LSB and (b) the 6th LSB of a pixel in a video frame. When the watermark is embedded in the LSB there is minimal change to the pixel value, so we call it low SD, whereas embedding the watermark at the 6th LSB results in a major change (10 to 25%) to the pixel value, so we call it high SD. Finally, we randomly dropped 1% to 5% of the frames of these spatially distorted video streams to obtain a set of distorted video streams VD having fewer frames. It is then required to identify all the dropped frame indices.

The performance of the proposed algorithm has been analysed in terms of the percentage accuracy of identifying the dropped frames in each case (C4.1 to C4.6). The percentage accuracy has been computed according to equation 1 for each case (C4.1 to C4.6) and each percentage of dropped frames (1% to 5%).
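The low/high SD construction can be illustrated at the level of a single pixel: writing a watermark bit into the LSB perturbs an 8-bit value by at most 1, while writing into the 6th LSB perturbs it by up to 32. This helper is our illustration of the bit-level effect, not the watermarking scheme of [25] itself:

```python
def embed_bit(pixel, bit, position):
    """Force the given bit (0 or 1) at `position` (0 = LSB, 5 = 6th LSB)
    of an 8-bit pixel value. Embedding at the LSB changes the pixel by
    at most 1 (low SD); embedding at the 6th LSB changes it by up to 32
    (high SD)."""
    return (pixel & ~(1 << position)) | (bit << position)
```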


Accuracy = (100 / N) * sum_{i=1..N} (a_i / b_i)        (1)

where N is the number of experiments conducted for a given case (C4.1 to C4.6) and percentage of dropped frames (1% to 5%), i is the experiment number, a_i is the count of successfully identified dropped frame indices in experiment i, and b_i is the count of dropped frames in experiment i.

The accuracy obtained while identifying the dropped frame indices for each video stream under the different experimental cases has been analysed as follows:

A4.1: Analysis based upon repeated frames: (a) no repeated frames, (b) 1% repeated frames, and (c) 5% repeated frames.
A4.2: Analysis based upon SD in video frames: (a) no SD, (b) low SD, and (c) high SD.

We analysed the performance of each video stream under the above categories (figures 11 and 12), but as an example we first present the simulation steps and analysis of our approach for the video stream bus (figures 9 and 10). There are 126 video frames in the uncompressed video bus, all of which we assumed to be unique. As discussed, we need three reference versions of each video stream, so we created two more video streams, bus1 and bus2 (each with 126 frames), from bus. To create bus1, we introduced 1% repeated frames (2 frames) into bus by dropping 2 frames and inserting 2 repeated frames. Similarly, bus2 was created by dropping 7 random frames from bus and introducing 5% repeated frames (7 frames). The video stream bus was used as the reference stream in cases C4.1, C4.3, and C4.5, bus1 in cases C4.2a, C4.4a, and C4.6a, and bus2 in cases C4.2b, C4.4b, and C4.6b.

To create distorted video streams, we first introduced SD by embedding a watermark (at the LSB and the 6th LSB pixel bits) into the video frames of each reference video stream (bus, bus1, and bus2) and then introduced TD. After inserting SD we have in total 9 video streams of bus (three spatially distorted streams each for bus, bus1, and bus2). We introduced TD by randomly dropping 1% to 5% of the video frames (2 to 7 frames) of each spatially distorted stream of bus.

For each percentage of dropped frames in a particular case (C4.1 to C4.6) we created 5 distorted video streams: for a drop of 1% of frames in case C4.1 we created 5 distorted streams, for a drop of 2% of frames in case C4.1 another 5, and so on. Altogether, for case C4.1 with reference video stream bus we have 25 distorted video streams (5 for each percentage of dropped frames), and similarly 25 distorted streams of bus in each of the cases C4.2a, C4.2b, etc.
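Under our reading of equation (1), the per-case accuracy averages 100·a_i/b_i over the experiments run for one case and drop rate; a minimal sketch (each experiment given as an (a_i, b_i) pair):

```python
def percentage_accuracy(experiments):
    """Equation (1) sketch: average, over the N experiments run for one
    case and drop percentage, of 100 * a_i / b_i, where a_i is the
    number of correctly identified dropped frame indices and b_i the
    number of frames actually dropped in experiment i."""
    return sum(100.0 * a / b for a, b in experiments) / len(experiments)
```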




Figure 9. Analysis of video stream bus having (a) No, (b) 1%, and (c) 5% repeated frames

Figure 10. Analysis of video stream bus having (a) No, (b) Low, and (c) High SD




Figure 11. Analysis of video streams (including bus) having (a) No, (b) 1%, and (c) 5% repeated frames

Figure 12. Analysis of video streams (including bus) having (a) No, (b) Low, and (c) High SD

After creating the reference and distorted video streams, we computed the dropped frame indices using our proposed algorithm (section 3). It was simulated for a maximum of 100 generations of the population (if MinFit = MaxFit, the next population is not generated) using the mutation operator. The following paragraphs present the performance analysis of the proposed scheme for the video stream bus.

Our approach performs best when there is no or low SD and all frames of bus are unique (figures 9a, 10a, and 10b). Here (C4.1 and C4.3) our algorithm successfully identified all dropped frame indices. Similar observations were made for the other video streams (figures 11a, 12a, and 12b), from which we infer that if there are no repeated frames and no or low SD, our algorithm successfully identifies all dropped frame indices. The obvious reason is that there is a single matching index for each frame of the distorted video stream in the reference video stream, so the length of the LIS equals the length of the distorted video.

As observed in figures 9c and 10c, our algorithm's performance is worst when frames are distorted with high SD and there are many repeated frames in the reference video stream. Similar observations were made for the other video streams in figures 11c and 12c. The obvious reason is the conflict in index selection caused by repeated frames. As observed in figure 9, our algorithm's performance decreases as the number of repeated frames in bus increases. Similarly, we observed (figure 10) that its performance decreases as the SD in the frames of bus increases. Similar observations were made in figures 11 and 12 for the other video streams.

4.3. Comparison with Other Schemes


As discussed in section 2, contemporary works include those of S. Wolf, MSU's dropped frame metric, and Nishikawa K et al. Wolf's method [11] identifies dropped frames when there is no or reduced information about the reference video stream, so SD in video frames is left unhandled. We instead present a full reference algorithm able to efficiently identify dropped frames even when frames are spatially distorted with low or high SD, as may occur during processing. Comparing a frame with its adjacent frame is the main step of MSU's dropped frame metric [12] and of Nishikawa K et al. [13], which prevents those approaches from exploring other frames in the stream for similarity. As presented in section 3, our algorithm explores candidate dropped frame indices over the entire video stream, giving it an edge over previous schemes in efficiently (section 4.2) identifying the dropped frame indices in a distorted video stream with respect to the reference video stream.

5. Conclusion
In this paper we have presented a full reference algorithm which efficiently identifies the dropped frame indices in a distorted video stream with respect to a reference video stream. The analysis section shows that the minimum computation required by our algorithm is n + n log n, i.e. n


computations for creating DiffMat (when there is nominal frame drop) and n log n computations to identify the LIS. In the worst case the computation required by our algorithm is m² + n log n + k, i.e. of the order of m² computations for creating DiffMat, n log n computations to identify the LIS, and k computational steps (if required) to generate k populations with the mutation operator, along with their fitness score computation. If the length of the LIS is much less than n, then identifying the dropped frames requires exploring all combinations so that the best match can be selected, which is an exponential time problem requiring heavy computation. Use of the genetic operator reduces this to k steps, at the cost of a small loss of accuracy (section 4.2) in identifying the dropped frame indices. Simulation results based upon 50 experiments in each case show that the proposed algorithm successfully identifies all dropped frame indices when there is no or low SD in the distorted video stream and all frames of the reference video stream are unique, whereas its accuracy lies approximately between 76% and 90% (worst obtained accuracy) for the cases with high SD in the distorted video stream and many repeated frames in the reference video stream.

6. References
[1] Moorthy A K, Seshadrinathan K, Soundararajan R, Bovik A C, "Wireless video quality assessment: A study of subjective scores and objective algorithms", IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 587-599, 2010.
[2] Seshadrinathan K, Soundararajan R, Bovik A C, Cormack L K, "Study of subjective and objective quality assessment of video", IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1427-1441, 2010.
[3] Seshadrinathan K, Bovik A C, "Motion tuned spatio-temporal quality assessment of natural videos", IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 335-350, 2010.
[4] Chikkerur S, Sundaram V, Reisslein M, Karam L J, "Objective video quality assessment methods: A classification, review, and performance comparison", IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 165-182, 2011.
[5] http://www.cyberlawsindia.net/index1.html Accessed 26 April 2012.
[6] http://cyber.law.harvard.edu/metaschool/fisher/integrity/Links/Articles/winick.html Accessed 26 April 2012.
[7] Li Y, Gao X, Ji H, "A 3D wavelet based spatial-temporal approach for video watermarking", in Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, ICCIMA'03, 2003.
[8] Gulliver S R, Ghinea G, "The perceptual and attentive impact of delay and jitter in multimedia delivery", IEEE Transactions on Broadcasting, vol. 53, no. 2, pp. 449-458, 2007.
[9] Yim C, Bovik A C, "Evaluation of temporal variation of video quality in packet loss networks", Signal Processing: Image Communication, vol. 26, pp. 24-38, 2011.
[10] Pinson M H, Wolf S, Cermak G, "HDTV subjective quality of H.264 vs. MPEG-2, with and without packet loss", IEEE Transactions on Broadcasting, vol. 56, no. 1, pp. 86-91, 2010.
[11] Wolf S, "A no reference (NR) and reduced reference (RR) metric for detecting dropped video frames", Fourth International Workshop on Video Processing and Quality Metrics for Consumer Electronics, VPQM 2009.
[12] http://compression.ru/video/quality_measure/metric_plugins/dfm_en.htm Accessed 26 April 2012.
[13] Nishikawa K, Munadi K, Kiya H, "No-reference PSNR estimation for quality monitoring of Motion JPEG2000 video over lossy packet networks", IEEE Transactions on Multimedia, vol. 10, no. 4, pp. 637-645, 2008.
[14] Winkler S, Mohandas P, "The evolution of video quality measurement: From PSNR to hybrid metrics", IEEE Transactions on Broadcasting, vol. 54, no. 3, pp. 1-9, 2008.
[15] Wang Z, Sheikh H R, Bovik A C, "Objective video quality assessment", in The Handbook of Video Databases: Design and Applications, B. Furht and O. Marques, eds., CRC Press, pp. 1041-1078, 2003.
[16] Girod B, "What's wrong with mean-squared error", in Digital Images and Human Vision, A. B. Watson, ed., MIT Press, pp. 207-220, 1993.
[17] Yuanjiang Li, Yuehua Li, "Passive millimeter-wave image denoising based on improved algorithm of non-local mean", IJACT, vol. 4, no. 10, pp. 158-164, 2012.
[18] H. Duan, G. Chen, "A new digital halftoning algorithm by integrating modified pulse-coupled neural network with random number generator", JDCTA, vol. 6, no. 12, pp. 29-37, 2012.
[19] Saks M, Seshadhri C, "Estimating the longest increasing sequence in polylogarithmic time", in Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pp. 458-467, 2010.
[20] Chan W T, Zhang Y, Fung S P Y, Ye D, Zhu H, "Efficient algorithms for finding a longest common increasing subsequence", in Proceedings of the 16th Annual International Symposium on Algorithms and Computation, Hainan, China, pp. 665-674, 2005.
[21] Sakai Y, "A linear space algorithm for computing a longest common increasing subsequence", Information Processing Letters, vol. 99, issue 5, pp. 203-207, 2006.
[22] Baik J, Deift P, Johansson K, "On the distribution of the length of the longest increasing subsequence of random permutations", Journal of the American Mathematical Society, vol. 12, no. 4, pp. 1119-1178, 1999.
[23] http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf Accessed 26 April 2012.
[24] http://en.wikipedia.org/wiki/Mutation_%28genetic_algorithm%29 Accessed 26 April 2012.
[25] Abdullah B, Rosziati I, Nazib M S, "A new digital watermarking algorithm using combination of least significant bit (LSB) and inverse bit", Journal of Computing, vol. 3, issue 4, pp. 1-8, 2011.
