Background subtraction is a technique for separating foreground objects from the background in a sequence of video frames. Background/foreground segmentation is arguably one of the most popular research topics in computer vision. It finds use in many video applications, such as video surveillance, traffic monitoring, and gesture recognition for human-machine interfaces, to name a few. Many methods exist for background subtraction, each with different strengths and weaknesses in terms of performance and computational requirements. My project focuses mainly on two of these techniques, Frame Differencing and Approximate Median, whose results I have compared across different video sequences. These techniques were chosen because they are computationally efficient enough for many low-power applications, and because they are representative of the background subtraction implementations found in today's video applications.
1. Introduction
BGS techniques are defined by the background model and the foreground detection process. According to [1], the background model is described by three aspects: the initialization, the representation, and the update process of the scene background. A correct initialization allows acquiring a background of the scene without errors. For instance, techniques that analyse video sequences with moving objects present throughout the whole sequence should consider different initialization schemes to avoid acquiring an incorrect background of the scene. The representation describes the mathematical techniques used to model the value of each background pixel. For instance, unimodal sequences (where the variation of background pixels follows a unimodal scheme) need simpler models to describe the background of the scene than multimodal ones (where background pixels, due to scene dynamism, vary following more complex schemes). The update process allows incorporating specific global changes into the background model, such as those owing to illumination and viewpoint variation. Additionally, these techniques usually include pre-processing and post-processing stages to improve the final foreground detection results.
These changes might introduce errors, and hence they must be dealt with by the background model.
1.2 Classification of techniques
The BGS techniques are classified here according to their model representation, in order to identify the most relevant parameters and implementation details that might diverge from the referenced work.
value in the current frame, also following a running average scheme with a learning-rate parameter. The initial Gaussian deviation value, the threshold, and the learning rate are the analysed parameters. Gamma method (G) [8]: in practice, this represents each background pixel with a running average of its previous values. The decision on whether a pixel belongs to the background is made by summing the squared differences between the pixel values of a square spatial window centred on the considered pixel and the corresponding background model values, and setting a threshold over that sum. A theoretical development, based on the assumption that the pixel variation follows a Gaussian, concludes that the thresholded quantity follows a Chi-square distribution, which guides the threshold selection (in fact, a probability). This threshold is the analysed parameter.
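As a rough illustration of the running-average representation and windowed squared-difference test described above, here is a hedged Python/NumPy sketch; the learning rate `alpha`, the window half-size, and the function names are illustrative assumptions, not values from the referenced work:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background update: B <- (1 - alpha) * B + alpha * F."""
    return (1.0 - alpha) * bg + alpha * frame

def window_ssd(frame, bg, y, x, half=1):
    """Sum of squared differences over a (2*half+1)^2 window centred on (y, x)."""
    fw = frame[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    bw = bg[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    return float(np.sum((fw - bw) ** 2))

# A pixel would be declared foreground when window_ssd(...) exceeds a threshold
# chosen from the Chi-square distribution, as described above.
```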
2.1.1 MATLAB

For the project, I chose to use MATLAB as the programming language. It is a high-level language that specializes in data analysis and computing mathematical problems. MATLAB's official website can be found at www.mathworks.com. The program environment has an interactive command window that allows users to test and experiment with code line by line. Users can also save their code into an M-file and run it as a program. The MATLAB Help Navigator is also very useful: it properly categorizes and provides detailed explanations and sample usages of all functions. Just like C++ and Java, the language syntax provides loops and conditional statements for programming purposes. The language was chosen over C++ and Java because it has many built-in functions specific to image processing, and because it can compute large mathematical expressions faster than other languages. These advantages suit the project perfectly, given the large matrix computations required during the extraction process. Some minor problems occurred during the course of the project. The first was that MATLAB was a completely new language and environment for me; I had to familiarize myself with it by working through simple tutorials and exploring the programming environment. Another problem was that MATLAB takes a long time to run the segmentation code.
When compared to C++ and Java, MATLAB can calculate matrices more quickly, but as an interpreted language it takes a long time to process the large video files. Lastly, the MATLAB software environment requires a lot of memory to run. During start-up and execution, Windows often cannot provide enough memory for MATLAB and will sometimes shut it down automatically.
2.2 Assumptions
Assumptions for the project may alter the choice of a practical extraction algorithm, since many different techniques can extract backgrounds/foregrounds from video. A couple of assumptions concern the background environment of the video. First, the user may choose the location of the video: it can be filmed either indoors or outdoors. Second, the lighting in the video must remain constant, owing to the difficulty that arises when a light source changes its brightness or location. Also, the background of the video must be static: no moving objects are allowed in the background, as even slight movement, such as a reflection off a window, can create unwanted noise. Lastly, the software is not required to run in real time; this assumption greatly reduces its complexity. The videos were taken in two settings, indoor and outdoor environments. (The earlier discussed problems of moving backgrounds and illumination have caused some minor deviations in the results, but overall the approach seems to work fine.)
2.3 Algorithm
I will now discuss the two basic models used for background/foreground subtraction, together with the algorithm and steps involved in each. Both models take a similar approach in the sense that a frame difference is compared against a threshold; in frame differencing, for instance, this thresholding step is what performs the background/foreground detection.

3.1 Frame Difference
3.1.1 Algorithm
fi: a pixel in the current frame, where i is the frame index.
fi-1: a pixel in the previous frame (fi and fi-1 are located at the same position).
di: the absolute difference of fi and fi-1.
bi: the background/foreground mask.
T: the threshold value.
Steps
1. di = |fi - fi-1|
2. If di > T, fi belongs to the foreground; otherwise, it belongs to the background.

3.1.2 Drawbacks

A major (perhaps fatal) flaw of this method is that for objects with uniformly distributed intensity values (such as the side of a car), the interior pixels are interpreted as part of the background. Another problem is that objects must be continuously moving. If an object stays still for more than a frame period (1/fps), it becomes part of the background.

3.1.3 Advantages

This method does have two major advantages. One obvious advantage is the modest computational load. Another is that the background model is highly adaptive. Since the background is based solely on the previous frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be precise). As we'll see later on, the frame difference method subtracts out extraneous background noise (such as waving trees) much better than the more complex approximate median and mixture-of-Gaussians methods.

3.1.4 Challenges

A challenge with this method is determining the threshold value. (This is also a problem for the other methods.) The threshold is typically found empirically, which can be tricky: set too low, it classifies noise and minor background variation as foreground; set too high, it misses genuine foreground pixels.
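The two steps above can be sketched in a few lines. This is a hedged Python/NumPy equivalent of the method (the MATLAB implementation actually used in the project is given in section 4); the function name and types are my own:

```python
import numpy as np

def frame_difference(prev_gray, curr_gray, thresh=25):
    """Mark pixels where |f_i - f_{i-1}| > T as foreground (steps 1 and 2)."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    mask = diff > thresh
    # Foreground keeps the current pixel value; background is set to 0.
    fg = np.where(mask, curr_gray, 0).astype(np.uint8)
    return fg, mask
```

In a full loop, `prev_gray` would simply be replaced by `curr_gray` after each frame, which is what makes the background model adapt at 1/fps.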
Figure 2 : Background from one of the videos using Frame differencing model
3.2 Approximate Median

The approximate median method, originally developed for the tracking of piglets in images, has since seen wide implementation in the background subtraction literature and has been applied to a wide range of background subtraction scenarios.

3.2.1 Logic

The approximate median method works as follows: if a pixel in the current frame has a value larger than the corresponding background pixel, the background pixel is incremented by 1. Likewise, if the current pixel is less than the background pixel, the background is decremented by 1. In this way, the background eventually converges to an estimate where half the input pixels are greater than the background and half are less than it, approximately the median (convergence time will vary with the frame rate and the amount of movement in the scene).

3.2.2 Algorithm

The steps involved in the approximate median method are:
1. Read the video.
2. Input the frame.
3. Convert the frame to grayscale.
4. Determine the threshold.
5. Compute the frame difference value.
6. Compare the threshold with the frame difference values.
7. Where the frame value is greater than the background value, make the background lighter (increment it).
8. Where the frame value is less than the background value, make the background darker (decrement it).
9. Subplot the frame, foreground, and background.
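The increment/decrement update in steps 7 and 8 can be sketched as follows (a hedged Python/NumPy version of the logic; the project's actual MATLAB implementation appears in section 4):

```python
import numpy as np

def approximate_median_update(bg, frame):
    """Move each background pixel one grey level toward the current frame."""
    bg16 = bg.astype(np.int16)
    # +1 where the frame is brighter, -1 where it is darker, 0 where equal.
    step = (frame > bg).astype(np.int16) - (frame < bg).astype(np.int16)
    return np.clip(bg16 + step, 0, 255).astype(np.uint8)
```

Repeated over many frames, this drives each background pixel toward the running median of its input values.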
3.2.3 Efficiency

The approximate median method does a much better job at separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.
4. Coding
This section describes the code used for the project; the code for each model is given in a separate section. The video file, along with its directory, must be specified in the source, and a threshold must be chosen for the background to be visible. The results show a slight deviation in the output because of the problems discussed in section 1.1. The code is given in the same format as it appears in MATLAB.
source = aviread('Filename');                       % read AVI file with the given name
thresh = 25;
bg = source(1).cdata;                               % background frame
bg_bw = rgb2gray(bg);                               % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                           % read in frame
    fr_bw = rgb2gray(fr);                           % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw));   % cast operands as double to avoid negative overflow

    for j = 1:width                                 % if fr_diff > thresh, pixel is in foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
        end
    end

    bg_bw = fr_bw;

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(fr_bw)
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);             % put frames into movie
end

% movie2avi(M, 'frame_difference_output', 'fps', 30);   % save movie as avi
% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                           % read in frame
    fr_bw = rgb2gray(fr);                           % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw));   % cast operands as double to avoid negative overflow

    for j = 1:width                                 % if fr_diff > thresh, pixel is in foreground
        for k = 1:height
            if (fr_diff(k,j) > thresh)
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
            if (fr_bw(k,j) > bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) + 1;        % make background lighter
            elseif (fr_bw(k,j) < bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) - 1;        % make background darker
            end
        end
    end

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(uint8(bg_bw))
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);             % put frames into movie
end

% movie2avi(M, 'approximate_median_background', 'fps', 15);   % save output as movie
5. Results
In this section I will show the results obtained with the two techniques at different thresholds and for different video sequences.
5.1 Results for Frame Difference

5.1.1 Test Video 1

Threshold value 15
Threshold value 25
Threshold value 45
Threshold value 25
Threshold value 45
Threshold value 45
6. Discussion

As you can see, the approximate median method does a much better job at separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.
We do see some trails behind the larger objects (the cars). This is due to updating the background at a relatively high rate (30 fps). In a real application, the frame rate would likely be lower (say, 15 fps).
7. Conclusion
Each technique has its merits. For frame differencing, one obvious advantage is the modest computational load. Another is that the background model is highly adaptive: since the background is based solely on the previous frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be precise). A challenge with this method, however, is determining the threshold value. Approximate median is a very good compromise: it offers performance near what can be achieved with higher-complexity methods (according to my research and the academic literature), and it costs little more in computation and storage than frame differencing.
Appendices
References
[1] Cristani, M., Bicego, M., Murino, V.: Multi-level background initialization using Hidden Markov Models. In: First ACM SIGMM Int. Workshop on Video Surveillance, pp. 11-20 (2003)
[2] Piccardi, M.: Background subtraction techniques: a review. In: SMC 2004, vol. 4, pp. 3099-3104 (2004)
[3] Cheung, S.-C., Kamath, C.: Robust techniques for background subtraction in urban traffic video. In: Panchanathan, S., Vasudev, B. (eds.) Proc. Electronic Imaging: Visual Communications and Image Processing (Part One), SPIE, vol. 5308, pp. 881-892 (2004)
[4] Cucchiara, R.: People Surveillance, VISMAC Palermo (2006)
[5] Ewerth, R., Freisleben, B.: Frame difference normalization: an approach to reduce error rates of cut detection algorithms for MPEG videos. In: ICIP, pp. 1009-1012 (2003)
[6] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (1999)
[7] Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, MA (1974)
[7] Wren, A., Darrell, P.: Pfinder: real-time tracking of the human body. PAMI (1997)
[8] Cavallaro, A., Steiger, O., Ebrahimi, T.: Semantic video analysis for adaptive content delivery and automatic description. IEEE Transactions on Circuits and Systems for Video Technology 15(10), 1200-1209 (2005)
[9] Stauffer, C.: Adaptive background mixture models for real-time tracking. In: CVPR (1999)
[10] Carminati, L., Benois-Pineau, J.: Gaussian mixture classification for moving object detection in video surveillance environment. In: ICIP, pp. 113-116 (2005)
[11] Comaniciu, D.: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 603 (2002)
[12] Elgammal, A.M., Harwood, D., Davis, L.S.: Non-parametric model for background subtraction. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 751-767. Springer, Heidelberg (2000)
[13] Zivkovic, Z., Van Der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (2006)
[14] Stenger, B., Ramesh, V., Paragios, N., Coetzee, F., Buhmann, J.M.: Topology free Hidden Markov Models: application to background modeling. In: Eighth Int. Conf. on Computer Vision, ICCV 2001, vol. 1, pp. 294-301 (2001)
[15] Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: Proc. Int. Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 302-309 (2004)
[16] Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A corpus for motion-based video-object segmentation. In: IEEE International Conference on Image Processing (Workshop on Multimedia Information Retrieval), ICIP 2008, San Diego, USA (2008)
[17] El Baf, F., Bouwmans, T., Vachon, B.: Comparison of background subtraction methods for a multimedia application. In: 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007, Maribor, Slovenia, pp. 385-388 (2007)
[18] Parks, D.H., Fels, S.S.: Evaluation of background subtraction algorithms with post-processing. In: IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, AVSS 2008, pp. 192-199 (2008)
[19] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773-780 (May 2006)
[20] NVIDIA Corporation: NVIDIA CUDA Programming Guide (2007)
[21] AMD/ATI: ATI CTM (Close to Metal) Guide (2007)
[22] Fung, J.: Advances in GPU-based image processing and computer vision. In: SIGGRAPH (2009)
[23] Lee, S.-j., Jeong, C.-s.: Real-time object segmentation based on GPU. In: 2006 International Conference on Computational Intelligence and Security, pp. 739-742 (November 2006)
[24] Carr, P.: GPU accelerated multimodal background subtraction. In: Digital Image Computing: Techniques and Applications (December 2008)
[25] Apple Inc.: Core Image Programming Guide (2008)
[26] NVIDIA Corporation: NVIDIA CUDA Best Practices Guide (2010)
[27] VSSN 2006 Competition (2006). [Online]. Available: http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/
[28] PETS 2009 Benchmark Data (2009). [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html
[30] www.eetimes.com