
Abstract

Background subtraction is a technique for separating foreground objects from the background in a sequence of video frames. Background/foreground segmentation is arguably one of the most popular research topics in computer vision. It finds use in many video applications, such as video surveillance, traffic monitoring, and gesture recognition for human-machine interfaces, to name a few. Many methods exist for background subtraction, each with different strengths and weaknesses in terms of performance and computational requirements. My project focuses on two of these techniques, frame differencing and approximate median, and compares their results on different video sequences. These techniques were chosen because they are computationally efficient enough for low-power applications and because they are representative of the background subtraction implementations found in today's video applications.

1. Introduction

BGS techniques are defined by the background model and the foreground detection process. According to [1], the background model is described by three aspects: the initialization, the representation and the update process of the scene background. A correct initialization allows the background of the scene to be acquired without errors. For instance, techniques that analyse video sequences with moving objects present throughout the whole sequence should consider different initialization schemes to avoid acquiring an incorrect background of the scene. The representation describes the mathematical techniques used to model the value of each background pixel. For instance, unimodal sequences (where background pixel variation follows a unimodal scheme) need simpler models to describe the background of the scene than multimodal ones (where background pixels, due to scene dynamism, vary following more complex schemes). The update process allows specific global changes, such as those owing to illumination and viewpoint variation, to be incorporated into the background model. Additionally, these techniques usually include pre-processing and post-processing stages to improve the final foreground detection results.

1.1 Challenges for any model


The background image is not fixed; it must adapt to:

- illumination changes,
- motion changes, and
- changes in the background geometry.

These changes may introduce errors, so they must be dealt with wisely.

1.2 Classification of techniques

The BGS techniques are classified below according to their model representation, in order to identify their most relevant parameters and the implementation details that might diverge from the referenced works.

1.2.1 Basic Models


Frame differencing (FD) [5]: also known as temporal difference, this method uses the previous frame as the background model for the current frame. A threshold, T, on the squared difference between model and frame decides between foreground and background. This threshold is the analysed parameter.

Median filtering (MF) [3]: uses the median of the previous N frames as the background model for the current frame. As in FD, a threshold on the model-frame difference decides; this threshold is the only analysed parameter. The method is reported to be very robust, but it requires memory resources to store the last N frames.
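To make the MF model concrete, here is a minimal Matlab sketch in the style of the code in Section 4. The buffer size N and threshold T are assumed values, not taken from [3], and for the first N frames the buffer is only partially filled, so the median is only meaningful once the buffer is full.

N = 50; T = 25;                              % assumed buffer size and threshold
source = aviread('Filename');                % video source, as used in Section 4
[height, width] = size(rgb2gray(source(1).cdata));
buf = zeros(height, width, N);               % circular buffer of the last N grayscale frames
for i = 1:length(source)
    fr_bw = double(rgb2gray(source(i).cdata));
    buf(:, :, mod(i-1, N) + 1) = fr_bw;      % overwrite the oldest buffered frame
    bg = median(buf, 3);                     % per-pixel median over the buffer = background model
    fg_mask = abs(fr_bw - bg) > T;           % foreground where the model-frame difference exceeds T
end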

1.2.2 Parametric Models


Simple Gaussian (SG) [7]: represents each background pixel's variation with a Gaussian distribution. For every new frame, a pixel is determined to belong to the background if it falls within a deviation band, kσ, around the mean. The parameters of each Gaussian are updated with the current frame pixel using a running average scheme [6], controlled by a learning factor α. The initial deviation value, σ0, and α are the analysed parameters.

Mixture of Gaussians (MoG) [9]: represents each background pixel's variation with a set of weighted Gaussian distributions. Distributions are ordered according to their weight; the most relevant ones (until the accumulated weight exceeds a threshold, T) are considered to model the background, while the remaining ones model the foreground. A pixel is decided to belong to the background if it falls within a deviation band, kσ, around the mean of any of the Gaussians that model it. The update process is only performed on the Gaussian distribution that describes the pixel value in the current frame, again following a running average scheme with learning factor α. The initial Gaussian deviation value, σ0, the threshold T, and α are the analysed parameters.

Gamma method (G) [8]: in practice represents each background pixel with a running average of its previous values. The decision on a pixel belonging to the background is made by summing the squared differences between the pixel values of a square spatial window centred on the considered pixel and the corresponding background model values, and setting a threshold over this sum. A theoretical development, based on the assumption that the pixel variation follows a Gaussian, concludes that the threshold function follows a Chi-square distribution, which bases the threshold selection on a probability. This threshold is the analysed parameter.
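As an illustration of the SG model, here is a minimal per-pixel Matlab sketch of the running-average update. The learning factor alpha, the initial deviation sigma0 and the deviation multiplier k are assumed values, not taken from [7].

alpha = 0.01; k = 2.5; sigma0 = 15;          % assumed parameter values
source = aviread('Filename');
mu = double(rgb2gray(source(1).cdata));      % initialize the per-pixel mean with the first frame
v = sigma0^2 * ones(size(mu));               % initialize the per-pixel variance
for i = 2:length(source)
    x = double(rgb2gray(source(i).cdata));
    fg_mask = abs(x - mu) > k * sqrt(v);     % outside k deviations around the mean => foreground
    mu = (1 - alpha) .* mu + alpha .* x;     % running-average update of the mean [6]
    v = (1 - alpha) .* v + alpha .* (x - mu).^2;   % running-average update of the variance
end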

1.2.3 Non-Parametric Models


Histogram-based approach (Hb) [15]: represents each background pixel's variation with a histogram of its last N values, which is re-computed every L frames (L << N). A threshold is set on each normalized histogram (hence being different for each pixel), so that incoming pixels whose normalized histogram value is over the threshold are considered to be background. The value of the threshold is the analysed parameter.

Kernel Density Estimation (KDE) [12]: estimates the probability density function (pdf) of each pixel for each frame by averaging the effect of a set of kernel functions (typically Gaussian) centred at the pixel's values in the N previous frames. A pixel is determined to belong to the background if its probability of belonging to the modelled distribution is higher than a threshold, th, which is the only analysed parameter.
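For KDE, a minimal Matlab sketch of the decision for a single pixel with a Gaussian kernel follows. The buffer size N, kernel bandwidth h and probability threshold th are assumed values, not taken from [12], and the sample buffer here is a random stand-in for the pixel's real history.

N = 30; h = 10; th = 1e-3;                   % assumed buffer size, kernel bandwidth, threshold
samples = double(randi(255, 1, N));          % stand-in for one pixel's last N grayscale values
x = 128;                                     % incoming pixel value to classify
p = mean(exp(-(x - samples).^2 ./ (2*h^2)) ./ (h*sqrt(2*pi)));   % pdf estimate: average of N Gaussian kernels
isBackground = p > th;                       % background if the estimated density exceeds the threshold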

2. Equipment and Methodology

2.1 Equipment used


The equipment used for any setting includes a digital camera and a processing tool that performs the extraction. Because background subtraction algorithms typically process lower-resolution grayscale video, video editing and compression software may also be required.

2.1.1 MATLAB

For the project, I chose Matlab as the programming language. It is a high-level language that specializes in data analysis and computing mathematical problems. Matlab's official website can be found at www.mathworks.com. The program environment has an interactive command window that allows users to test and experiment with code line by line. Users can also save their code into an M-file and run the program. The Matlab Help Navigator is also very useful: it properly categorizes and provides detailed explanations and sample usages of all functions. Just like C++ and Java, the language syntax provides loops and conditional statements. The language was chosen over C++ and Java because it has many built-in functions specific to image processing, and its numerical routines compute large matrix operations quickly. These advantages suit the project perfectly, given the large matrix computations required during the extraction process. There were some minor problems during the project. The first was that Matlab was a completely new language and environment for me; I had to familiarize myself with it by working through simple tutorials and exploring the programming environment. Another problem was that Matlab takes a long time to run the segmentation code.

Although Matlab's matrix operations are fast compared to C++ and Java, large video files still take a long time to process in an interpreted language. Lastly, the Matlab environment requires a lot of memory to run: during start-up and heavy computation, Windows often cannot provide enough memory, and Matlab will sometimes shut down automatically.

2.2 Assumptions
Assumptions for the project may alter the choice of a practical extraction algorithm. There are many different techniques that can extract the background/foreground from videos, and a couple of assumptions are made about the background environment of the video. The first assumption allows the user to choose the location of the video: it can be filmed either indoors or outdoors. Secondly, the lighting in the video must remain constant, because of the difficulty that arises when a light source changes its brightness or location. Also, the background of the video must be static; no moving objects are allowed in the background, as even slight movement, such as a reflection off a window, can create unwanted noise. Lastly, the software is not required to run in real time, which greatly reduces its complexity. The videos were taken for two settings: indoor and outdoor environments. (The problems of moving backgrounds and illumination discussed earlier caused some minor deviations in the results, but overall the methods run fine.)

2.3 Algorithm

The two models use a similar approach in the sense that a frame difference is compared with a threshold. For instance, in frame differencing, the step that does the job of background/foreground detection is the thresholded absolute difference: a pixel is foreground when |fi - fi-1| > T (see Section 3.1.1).

Each algorithm is taken up in detail with its technique below.

3. Designs and Logic

I will now discuss the two basic models used for background/foreground subtraction. The algorithm and steps involved are given with each.

3.1 Frame Difference


Frame difference is arguably the simplest form of background subtraction. The current frame is simply subtracted from the previous frame, and if the difference in pixel values for a given pixel is greater than a threshold, T, the pixel is considered part of the foreground.

3.1.1 Algorithm

Figure 1: Algorithm for the frame differencing model

The algorithm of two-frame-based B/F detection is described below.


fi: the pixel value in the current frame, where i is the frame index.
fi-1: the pixel value in the previous frame (fi and fi-1 are located at the same position).
di: the absolute difference of fi and fi-1.
bi: the B/F mask.
T: the threshold value.

Steps

1. di = |fi - fi-1|
2. If di > T, fi belongs to the foreground (bi = 1); otherwise, it belongs to the background (bi = 0).

3.1.2 Drawbacks

A major (perhaps fatal) flaw of this method is that for objects with uniformly distributed intensity values (such as the side of a car), the interior pixels are interpreted as part of the background. Another problem is that objects must be continuously moving: if an object stays still for more than a frame period (1/fps), it becomes part of the background.

3.1.3 Advantages

This method does have two major advantages. One obvious advantage is the modest computational load. Another is that the background model is highly adaptive: since the background is based solely on the previous frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be precise). As we will see later on, the frame difference method subtracts out extraneous background noise (such as waving trees) much better than the more complex approximate median and mixture of Gaussians methods.

3.1.4 Challenges

A challenge with this method is determining the threshold value. (This is also a problem for the other methods.) The threshold is typically found empirically, which can be tricky: if it is set too low, noise and minor background variation pass through as foreground; if it is set too high, parts of the genuine foreground are blocked. A sketch of an empirical threshold sweep is shown below.
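Because the threshold is found empirically, one simple way to explore it is to sweep a few candidate values over a single frame pair and inspect the resulting masks. This is only a sketch; the candidate list is an assumption, chosen to match the values tested in Section 5.

source = aviread('Filename');
prev = double(rgb2gray(source(1).cdata));
cur = double(rgb2gray(source(2).cdata));
candidates = [15 25 45];                     % candidate thresholds, as tested in Section 5
for t = 1:length(candidates)
    fg_mask = abs(cur - prev) > candidates(t);        % frame difference mask at this threshold
    subplot(1, length(candidates), t), imshow(fg_mask)
    title(sprintf('T = %d', candidates(t)))
end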

Figure 2: Background from one of the videos using the frame differencing model

3.2 Approximate Median


In median filtering, the previous N frames of video are buffered, and the background is calculated as the median of the buffered frames. Then (as with frame difference), the background is subtracted from the current frame and thresholded to determine the foreground pixels. Median filtering has been shown to be very robust and to have performance comparable to higher-complexity methods. However, storing and processing many frames of video (as is often required to track slower-moving objects) requires an often prohibitively large amount of memory. This can be alleviated somewhat by storing and processing frames at a rate lower than the frame rate, thereby lowering storage and computation requirements at the expense of a slower-adapting background.

A more efficient compromise was devised back in 1995 by UK researchers N.J.B. McFarlane and C.P. Schofield. While doing government-funded research on piglet tracking in large commercial farms, they came up with an efficient recursive approximation of the median filter. Their approximate median method, presented in their seminal paper "Segmentation and tracking of piglets in images", has since seen wide implementation in the background subtraction literature and been applied to a wide range of background subtraction scenarios.

3.2.1 Logic

The approximate median method works as follows: if a pixel in the current frame has a value larger than the corresponding background pixel, the background pixel is incremented by 1. Likewise, if the current pixel is less than the background pixel, the background is decremented by 1. In this way, the background eventually converges to an estimate where half the input pixels are greater than the background and half are less than the background, approximately the median. (Convergence time will vary with the frame rate and the amount of movement in the scene.)

3.2.2 Algorithm

The steps involved in the approximate median method are:

1. Read the video.
2. Input the frame.
3. Convert the frame to grayscale.
4. Determine the threshold.
5. Determine the frame difference value.
6. Compare the threshold with the frame difference values.
7. Where the current pixel value is greater than the background value, increment the background (make it lighter).
8. Where the current pixel value is less than the background value, decrement the background (make it darker).
9. Subplot the frame, foreground and background.

3.2.3 Efficiency

The approximate median method does a much better job of separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.

4. Coding
This section describes the code used for the project. The code for the different models is given in separate sections. The video file, along with its directory, must be specified in the source, and a threshold must be chosen for the background to be visible. The results show a slight deviation in the output because of the problems discussed in Section 1.1. The code is given in the same format as it appears in Matlab.

The statement

movie2avi(M, 'frame_difference_output', 'fps', 30);

is used to save the output as an avi file, named 'frame_difference_output', running at 30 frames per second.


4.1 Frame Difference

source = aviread('Filename');                  % give the path of the video file
thresh = 25;                                   % set threshold
bg = source(1).cdata;                          % read in 1st frame as background frame
bg_bw = rgb2gray(bg);                          % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                      % read in frame
    fr_bw = rgb2gray(fr);                      % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw));  % cast operands as double to avoid negative overflow

    for j = 1:width
        for k = 1:height
            if (fr_diff(k,j) > thresh)         % if fr_diff > thresh, pixel is in foreground
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
        end
    end

    bg_bw = fr_bw;                             % current frame becomes the background for the next frame

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(fr_bw)
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);        % put frames into movie
end

% movie2avi(M, 'frame_difference_output', 'fps', 30);   % save movie as avi
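As a side note, the nested pixel loops above can be replaced by vectorized Matlab operations, which are typically much faster on large frames. A sketch of an equivalent inner step, using the same variables as the code above:

fr_diff = abs(double(fr_bw) - double(bg_bw));   % same difference image as above
mask = fr_diff > thresh;                        % logical foreground mask, all pixels at once
fg = double(fr_bw) .* mask;                     % keep frame values where the mask is true, 0 elsewhere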

4.2 Approximate Median


source = aviread('File name');                 % give path of the video
thresh = 28;                                   % set threshold
bg = source(1).cdata;                          % read in 1st frame as background frame
bg_bw = double(rgb2gray(bg));                  % convert background to greyscale

% ----------------------- set frame size variables -----------------------
fr_size = size(bg);
width = fr_size(2);
height = fr_size(1);
fg = zeros(height, width);

% --------------------- process frames -----------------------------------
for i = 2:length(source)
    fr = source(i).cdata;                      % read in frame
    fr_bw = rgb2gray(fr);                      % convert frame to grayscale
    fr_diff = abs(double(fr_bw) - double(bg_bw));  % cast operands as double to avoid negative overflow

    for j = 1:width
        for k = 1:height
            if (fr_diff(k,j) > thresh)         % if fr_diff > thresh, pixel is in foreground
                fg(k,j) = fr_bw(k,j);
            else
                fg(k,j) = 0;
            end
            if (fr_bw(k,j) > bg_bw(k,j))       % approximate median update:
                bg_bw(k,j) = bg_bw(k,j) + 1;   % step the background up towards the frame
            elseif (fr_bw(k,j) < bg_bw(k,j))
                bg_bw(k,j) = bg_bw(k,j) - 1;   % step the background down towards the frame
            end
        end
    end

    figure(1), subplot(3,1,1), imshow(fr)
    subplot(3,1,2), imshow(uint8(bg_bw))
    subplot(3,1,3), imshow(uint8(fg))

    M(i-1) = im2frame(uint8(fg), gray);        % save output frame into movie
end

% movie2avi(M, 'approximate_median_background', 'fps', 15);   % save movie as avi
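Here too the per-pixel loops can be vectorized. In particular, the approximate-median update collapses to a single sign() step, which increments, decrements or leaves each background pixel exactly as the loop above does. A sketch using the same variables:

fr_bw_d = double(fr_bw);
fg = fr_bw_d .* (abs(fr_bw_d - bg_bw) > thresh);   % thresholded foreground, vectorized
bg_bw = bg_bw + sign(fr_bw_d - bg_bw);             % +1 / -1 / 0 step towards the current frame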

5. Results

In this section I will show the results obtained for the two techniques at different thresholds and for different video sequences.

5.1 Results for Frame Difference

5.1.1 Test Video 1

[Figures: frame difference output at threshold values 15, 25 and 45]

5.1.2 Test Video 2

[Figures: frame difference output at threshold values 15, 25 and 45]

5.2 Results for Approximate Median

5.2.1 Test Video 1

[Figures: approximate median output at threshold values 15, 25 and 45]

5.2.2 Test Video 2

[Figures: approximate median output at threshold values 15, 25 and 45]

6. Observations and Analysis of Results



6.1 Frame Differencing


The frame difference method subtracts out extraneous background noise (such as waving trees) much better than the more complex approximate median and mixture of Gaussians methods. As can be seen, a major (perhaps fatal) flaw of this method is that for objects with uniformly distributed intensity values (such as the side of a car), the interior pixels are interpreted as part of the background. Another problem is that objects must be continuously moving: if an object stays still for more than a frame period (1/fps), it becomes part of the background.

6.2 Approximate Median


The approximate median method works as follows: if a pixel in the current frame has a value larger than the corresponding background pixel, the background pixel is incremented by 1. Likewise, if the current pixel is less than the background pixel, the background is decremented by 1.

The background eventually converges to an estimate where half the input pixels are greater than the background and half are less than the background, approximately the median. (Convergence time will vary with the frame rate and the amount of movement in the scene.)

As you can see, the approximate median method does a much better job at separating the entire object from the background. This is because the more slowly adapting background incorporates a longer history of the visual scene, achieving about the same result as if we had buffered and processed N frames.

We do see some trails behind the larger objects (the cars). This is due to updating the background at a relatively high rate (30 fps). In a real application, the frame rate would likely be lower (say, 15 fps).

The processing time increases with longer frame sequences.

7. Conclusion

Each technique has its merits. In frame differencing, one obvious advantage is the modest computational load. Another is that the background model is highly adaptive: since the background is based solely on the previous frame, it can adapt to changes in the background faster than any other method (at 1/fps, to be precise). A challenge with this method is determining the threshold value. Approximate median is a very good compromise: it offers performance near what can be achieved with higher-complexity methods (according to my research and the academic literature), and it costs not much more in computation and storage than frame differencing.



References


[1] Cristani, M., Bicego, M., Murino, V.: Multi-level background initialization using Hidden Markov Models. In: First ACM SIGMM Int. Workshop on Video Surveillance, pp. 11–20 (2003)
[2] Piccardi, M.: Background subtraction techniques: a review. In: SMC 2004, vol. 4, pp. 3099–3104 (2004)
[3] Cheung, S.-C., Kamath, C.: Robust techniques for background subtraction in urban traffic video. In: Panchanathan, S., Vasudev, B. (eds.) Proc. Electronic Imaging: Visual Communications and Image Processing (Part One), SPIE, vol. 5308, pp. 881–892 (2004)
[4] Cucchiara, R.: People Surveillance. VISMAC, Palermo (2006)
[5] Ewerth, R., Freisleben, B.: Frame difference normalization: an approach to reduce error rates of cut detection algorithms for MPEG videos. In: ICIP, pp. 1009–1012 (2003)
[6] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (1999)
[7] Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: Real-time tracking of the human body. PAMI (1997)
[8] Cavallaro, A., Steiger, O., Ebrahimi, T.: Semantic video analysis for adaptive content delivery and automatic description. IEEE Transactions on Circuits and Systems for Video Technology 15(10), 1200–1209 (2005)
[9] Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: CVPR (1999)
[10] Carminati, L., Benois-Pineau, J.: Gaussian mixture classification for moving object detection in video surveillance environment. In: ICIP, pp. 113–116 (2005)
[11] Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 603–619 (2002)
[12] Elgammal, A.M., Harwood, D., Davis, L.S.: Non-parametric model for background subtraction. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1843, pp. 751–767. Springer, Heidelberg (2000)
[13] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773–780 (2006)
[14] Stenger, B., Ramesh, V., Paragios, N., Coetzee, F., Buhmann, J.M.: Topology free Hidden Markov Models: Application to background modeling. In: Eighth Int. Conf. on Computer Vision, ICCV 2001, vol. 1, pp. 294–301 (2001)
[15] Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: Proc. Int. Conf. on Computer Vision and Pattern Recognition, CVPR, pp. 302–309 (2004)
[16] Tiburzi, F., Escudero, M., Bescós, J., Martínez, J.M.: A corpus for motion-based video-object segmentation. In: IEEE International Conference on Image Processing (Workshop on Multimedia Information Retrieval), ICIP 2008, San Diego, USA (2008)
[17] El Baf, F., Bouwmans, T., Vachon, B.: Comparison of background subtraction methods for a multimedia application. In: 14th International Conference on Systems, Signals and Image Processing, IWSSIP 2007, Maribor, Slovenia, pp. 385–388 (2007)
[18] Parks, D.H., Fels, S.S.: Evaluation of background subtraction algorithms with post-processing. In: IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, AVSS 2008, pp. 192–199 (2008)
[19] Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognition Letters 27(7), 773–780 (May 2006)
[20] NVIDIA Corporation: NVIDIA CUDA Programming Guide (2007)
[21] AMD/ATI: ATI CTM (Close To Metal) Guide (2007)
[22] Fung, J.: Advances in GPU-based image processing and computer vision. In: SIGGRAPH (2009)
[23] Lee, S.-j., Jeong, C.-s.: Real-time object segmentation based on GPU. In: 2006 International Conference on Computational Intelligence and Security, pp. 739–742 (November 2006)
[24] Carr, P.: GPU accelerated multimodal background subtraction. In: Digital Image Computing: Techniques and Applications (December 2008)
[25] Apple Inc.: Core Image Programming Guide (2008)
[26] NVIDIA Corporation: NVIDIA CUDA Best Practices Guide (2010)
[27] VSSN 2006 Competition (2006). [Online]. Available: http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/
[28] PETS 2009 Benchmark Data (2009). [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html
[30] http://www.eetimes.com

