
Available online at www.sciencedirect.com

ScienceDirect

Procedia Technology 25 (2016) 536–543


Global Colloquium in Recent Advancement and Effectual Researches in Engineering, Science and Technology (RAEREST 2016)

A Background Foreground Competitive Model for Background Subtraction in Dynamic Background

Rashid M E*, Vinu Thomas

Model Engineering College, Thrikkakkara, Kochi 682021, India

* Corresponding author: rashid@mec.ac.in

Abstract

Background subtraction removes the relatively motionless background information from video frames in order to detect moving objects. Several methods have been proposed for nonparametric modelling of the background. In this paper, a novel method for simultaneous nonparametric modelling of background and foreground is proposed, and the two models are used to classify pixels as foreground or background in a competitive manner. Selective updating of the background and foreground models is employed to accommodate changes in the background, and both temporal and spatial dependencies of pixels are utilized in the model updating. The proposed method gives a higher Percentage of Correct Classification (PCC) score on dynamic backgrounds than the other methods, which is verified using standard databases.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of RAEREST 2016.
doi: 10.1016/j.protcy.2016.08.142

Keywords: Background subtraction; Visual surveillance; Multi-object tracking

1. Introduction

Moving object detection is an important task in automatic visual surveillance [3].

The three key steps in automatic surveillance are detection of moving objects, tracking of such objects across frames, and analysis of object tracks to recognize their behaviour [3]. Moving objects can be detected by subtracting the background image from the current frame of a video scene. Typical approaches for detecting the foreground pixels in a frame compare each frame against a model of the background and then select the pixels that do not fit the model [1,3]. An independent decision is made for each pixel, possibly taking into account information from neighbouring pixels. Background (BG) subtraction is complicated by changes in the background [1,2] such as gradual and sudden illumination variation, changes introduced to the background, camera vibration due to wind, dynamic background (water surface, waving flags, curtains, trees, etc.), moving objects with colour or other features similar to the background, and shadows of moving objects. Figure 1 shows some real-world examples of such instances.

Figure 1. Challenges in background subtraction: (a) rippling water and fountain, (b) waving tree, (c) shadow of a moving object, (d) and (e) illumination variation with time, (f) and (g) variation due to camera vibration.

Many background modelling algorithms have been proposed in the literature. However, the problem of detecting or extracting moving objects in a complex environment is far from fully resolved [10,15,16]. The paper is organized as follows: Section 2 provides a brief survey of existing methods for BG subtraction. Section 3 presents the proposed BGFG-model-based background subtraction. Section 4 presents classification results and their performance evaluation. Section 5 concludes the paper.

2. Literature Survey

A robust method of determining background intensities is to take the background as the average of the previous n frames, or to use a moving average in order to minimize memory requirements [4]. To handle changes in illumination, an exponential forgetting factor is employed. A simple recursive filter was proposed by Elgammal et al. [5] to estimate the median: the running estimate of the median is incremented by one if the input pixel intensity is larger than the estimate and decremented by one if it is smaller. This estimate eventually converges to a value such that half of the input pixels are larger than it and the other half smaller. In the Minimum-Maximum filter [6], three values are estimated for each pixel using a training sequence without foreground objects: the minimum intensity (Min), the maximum intensity (Max), and the maximum intensity difference between consecutive frames (D). These values are estimated over several frames and are periodically updated for background regions. All of the above are unimodal techniques. The background of a scene may contain many non-static objects, such as rippling water, tree branches, and bushes, whose movement depends on the wind in the scene. This kind of background motion causes the pixel intensity values to vary significantly with time.
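As an illustration of the recursive median update described above, the following is a minimal numpy sketch (not the authors' code); the grayscale input, the int16 working type, and the detection threshold of 25 are our assumptions.

```python
import numpy as np

def update_median_estimate(median_est, frame):
    """One step of the recursive median filter [5]: increment the running
    estimate where the new frame is brighter, decrement it where the frame
    is darker. The estimate converges to the per-pixel temporal median."""
    return median_est + (frame > median_est).astype(np.int16) \
                      - (frame < median_est).astype(np.int16)

# Usage sketch: 'frames' is assumed to be a list of H x W uint8 frames.
# median_est = frames[0].astype(np.int16)        # int16 avoids uint8 wrap-around
# for frame in frames[1:]:
#     f = frame.astype(np.int16)
#     median_est = update_median_estimate(median_est, f)
#     foreground = np.abs(f - median_est) > 25   # threshold 25 is an assumed value
```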


In such cases, a multimodal assumption is essential [13,18]. Multimodal approaches are categorized as parametric or non-parametric [11,18]. In parametric methods, the temporal probability density of each pixel is modelled in a parametric manner. A popular parametric approach is the Gaussian mixture model (GMM) [7], where the values observed over time at each pixel are represented by a weighted mixture of Gaussians. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This approach deals with lighting changes, repetitive motion from clutter, and long-term scene changes. In non-parametric methods, a set of real values sampled from a pixel's recent history is stored and used to estimate a probability density function, which determines whether or not the pixel in the current frame belongs to the background. Elgammal et al. [2] used a kernel density estimation (KDE) approach, representing the background model of each pixel by its values in the last N frames. Mittal and Paragios [19] proposed an adaptive kernel density estimation technique to address the dynamic background case. In [20], Seki et al. exploited the spatial correlation of pixels by using the co-occurrence of image variations. Jung [8] made use of the co-occurrence of pixels in a small area, and Chen et al. [9] used the co-occurrence of pixels in a more differentiated manner.
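For concreteness, a simplified per-pixel sketch of the GMM update just described is given below; the learning rate, the initial variance for a replaced Gaussian, and the background weight threshold t_bg = 0.7 are assumptions rather than values from [7], and the matched-component update is simplified.

```python
import numpy as np

def gmm_pixel_step(x, means, variances, weights, alpha=0.01, t_bg=0.7):
    """Simplified Stauffer-Grimson update [7] for one grayscale pixel x.
    means, variances, weights are length-K float arrays, one per Gaussian.
    Returns True if x is explained by a 'background' Gaussian."""
    d = np.abs(x - means)
    matched = d < 2.5 * np.sqrt(variances)        # within 2.5 standard deviations
    if matched.any():
        k = int(np.argmin(np.where(matched, d, np.inf)))  # closest match
        means[k] += alpha * (x - means[k])
        variances[k] += alpha * ((x - means[k]) ** 2 - variances[k])
        weights *= (1.0 - alpha)
        weights[k] += alpha
    else:                                         # no match: replace weakest Gaussian
        k = int(np.argmin(weights))
        means[k], variances[k], weights[k] = x, 15.0 ** 2, 0.05
    weights /= weights.sum()
    order = np.argsort(-weights / np.sqrt(variances))     # rank by weight/std
    n_bg = int(np.searchsorted(np.cumsum(weights[order]), t_bg)) + 1
    return k in order[:n_bg]                      # background covers t_bg of the weight
```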

Combined background (BG) and foreground (FG) models incorporating spatial and temporal information have been described recently for BG subtraction. These models include prior information about the FG in the BG modelling. Sheikh et al. [11] described a BG model that competes with an explicit FG model to provide the best description of the visual appearance of a scene. Hao et al. [12] suggested a spatiotemporal KDE-based background model together with a Gaussian formulation describing the spatial correlation of moving objects for foreground modelling; the generated background and foreground models are employed to produce a background frame, which is updated based on certain rules. This paper proposes nonparametric modelling of both background and foreground, with pixels classified as background or foreground by the two models in a competitive manner.

3. Proposed Methodology

Given a sample $S = \{x_i\}_{i=1}^{N}$ from a distribution with density function $p(x)$, an estimate $\hat{p}(x)$ of the density at $x$ can be calculated [2,11] using

$$\hat{p}(x) = \frac{1}{N}\sum_{i=1}^{N} K_{\sigma}(x - x_i) \qquad (1)$$

where $K_{\sigma}$ is a kernel function with bandwidth $\sigma$. We can think of (1) as estimating the pdf by averaging the effect of a set of kernel functions centred at each data point.

Let $x_{b_1}, x_{b_2}, \ldots, x_{b_N}$ be samples of background intensity values and $x_{f_1}, x_{f_2}, \ldots, x_{f_N}$ be samples of foreground intensity values for a pixel. Given these samples, we can obtain an estimate of the pixel intensity probability density function at any intensity using kernel density estimation. Given the observed intensity $x_t$ at time $t$, we can estimate the background and foreground probabilities for this observation as

$$P_b(x_t) = \frac{1}{N}\sum_{i=1}^{N} K_{\sigma}(x_t - x_{b_i}) \qquad (2)$$

$$P_f(x_t) = \frac{1}{N}\sum_{i=1}^{N} K_{\sigma}(x_t - x_{f_i}) \qquad (3)$$

Using the probability estimates, the pixel is classified in a competitive way by the two estimates and is considered to be a background pixel if $P_b(x_t) > P_f(x_t)$.

Similar to the approach proposed by Barnich et al. [13] and Wang et al. [17], a specific form for the probability distribution function is not employed in the proposed method. Separate models are used for


background and foreground. A new value is compared to the background and foreground model samples to classify it as foreground or background. As proposed by Barnich et al. [13], a spatial neighbourhood is considered for the initialization of the background model samples, and the complementary values are taken as the initial foreground model samples. Each pixel $x$ is modelled by a collection of $N$ background sample values

$$M_b(x) = \{v_{b_1}, v_{b_2}, \ldots, v_{b_N}\} \qquad (4)$$

taken in previous frames and by a collection of $N$ foreground sample values

$$M_f(x) = \{v_{f_1}, v_{f_2}, \ldots, v_{f_N}\} \qquad (5)$$

taken in previous frames. To classify a pixel value $v(x)$ according to its corresponding models $M_b(x)$ and $M_f(x)$, we compare it to the closest values within the sets of samples by defining a circle $S_R(v(x))$ of radius $R$ centred on $v(x)$. If $n_1$ is the number of BG model samples within the circle and $n_2$ is the number of FG model samples within the circle, then the pixel is classified as

$$\text{if } n_1 > n_2\text{, BG pixel; otherwise, FG pixel} \qquad (6)$$

While computing $n_1$ and $n_2$, the highest value is limited to a cardinality term $\#_{\min}$. The classification process is illustrated in Fig. 2, where dots represent samples of $M_b(x)$ and + symbols represent samples of $M_f(x)$. As the circle intersects 12 $M_f(x)$ samples and only 3 $M_b(x)$ samples, the pixel value $v(x)$ is taken as an FG pixel.


Fig. 2. Competitive classification of a pixel value with a set of BG and FG model samples in a 2-D Euclidean colour space $(C_1, C_2)$.

Updating the BG model is essential to adapt to lighting changes and other variations in the background; similarly, updating the FG model is essential to adapt to new foreground objects. A selective update mechanism is used for both models, with the model to be updated selected according to the following condition:

$$\text{if } n_1 > n_2\text{: update BG model;}\quad \text{if } n_1 < n_2\text{: update FG model;}\quad \text{if } n_1 = n_2\text{: no model update} \qquad (7)$$

While updating a model, a randomly selected model sample value will be replaced with the pixel value.
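The classification rule (6) and selective update rule (7) can be sketched as below; this is our reading of the procedure, not the authors' implementation. The radius R = 20 is an assumed working value, while N = 20 and #min = 10 follow the optimized values reported in Section 4. Initialization would follow [13]: background samples drawn from the spatial neighbourhood in early frames, with complementary values as initial foreground samples.

```python
import random
import numpy as np

def classify_and_update(v, M_b, M_f, R=20.0, n_min=10):
    """Competitive classification (6) with selective update (7) for one pixel.
    v is the current pixel value (scalar or colour vector); M_b and M_f are
    lists of N background / foreground model samples for that pixel."""
    v = np.atleast_1d(np.asarray(v, dtype=float))
    n1 = sum(np.linalg.norm(v - np.atleast_1d(s)) < R for s in M_b)
    n2 = sum(np.linalg.norm(v - np.atleast_1d(s)) < R for s in M_f)
    n1, n2 = min(n1, n_min), min(n2, n_min)   # counts are capped at #min
    if n1 > n2:                               # BG pixel: refresh a random BG sample
        M_b[random.randrange(len(M_b))] = v
        return 'BG'
    if n2 > n1:                               # FG pixel: refresh a random FG sample
        M_f[random.randrange(len(M_f))] = v
        return 'FG'
    return 'FG'                               # tie: FG by (6), but no update by (7)
```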


4. Results and Discussion


The performance of the proposed algorithm is compared against three methods: moving average [4], recursive median [5], and GMM [7]. All code was developed in MATLAB, except for GMM, for which the implementation available in the MATLAB Computer Vision Toolbox was used.

Figure 3. Moving object detection in frame number 375 of the CMS dataset: (a) input image, (b) ground truth, (c) moving average method, (d) recursive median method, (e) GMM, (f) proposed BGFG.

The experiments were carried out on the Carnegie Mellon test image sequence (CMS) [11] and on two test image sequences in the CDW datasets [14]: CDW\dynamic Background\canoe and CDW\dynamic Background\overpass. Some of the challenges in these datasets are a noisy or non-stationary background, slow foreground, waving trees, rippling water, movement along the camera axis (i.e., radial motion), moving objects with colour similar to the background, cast and moving shadows, and illumination variation.

Figure 4. Moving object detection in frame number 965 of the CDW canoe dataset: (a) input image, (b) ground truth, (c) moving average method, (d) recursive median method, (e) GMM, (f) proposed BGFG.


The parameters N and #min of the proposed method were optimized to obtain the best PCC score; N = 20 and #min = 10 gave the best result. Only the CDW\dynamic Background\overpass dataset was used for this optimization.

Figure 3 shows the background subtraction results obtained when the proposed method and the other methods were tested on the CMS dataset, where the background variation is low. The GMM method and the proposed method give the best results. Figure 4 shows the results on the CDW\dynamic Background\canoe dataset. Rippling water that makes the background dynamic, a moving object (a human wearing a cap) with colour similar to the background, and a large, slowly moving object (the canoe) are some of the challenges in this video. The number of moving-object pixels detected is higher for the proposed method than for the other methods, and the proposed method also successfully rejects many of the pixels on the water surface.

Figure 5 shows the results on the CDW\dynamic Background\overpass dataset, whose challenges include a waving tree and rippling water. Again, the number of moving-object pixels detected is higher for the proposed method than for the other methods, and the proposed method successfully rejects many of the pixels corresponding to the waving tree.

Figure 5. Moving object detection in frame number 2480 of the CDW overpass dataset: (a) input image, (b) ground truth, (c) moving average method, (d) recursive median method, (e) GMM, (f) proposed BGFG.

A quantitative evaluation was conducted on the three datasets. The conventional Recall and Precision metrics, widely used in the literature [11,12], are adopted as quantitative evaluation measures. They are defined as follows:

$$\text{Precision} = \frac{\text{Number of correctly detected foreground pixels}}{\text{Number of detected foreground pixels}} \qquad (8)$$

$$\text{Recall} = \frac{\text{Number of correctly detected foreground pixels}}{\text{Number of foreground pixels in ground truth}} \qquad (9)$$

The precision and recall measured on the datasets using the proposed method and the comparison methods are given in Tables 1 and 2. The proposed method gives the highest precision, whereas recall is highest for the recursive median method.


Table 1. Detection performance: Precision.

Video          Moving average   Recursive median   GMM     Proposed BGFG
CMS            0.383            0.454              0.661   0.728
CDW canoe      0.173            0.201              0.429   0.510
CDW overpass   0.490            0.541              0.635   0.781
Average        0.349            0.399              0.575   0.673

Table 2. Detection performance: Recall.

Video          Moving average   Recursive median   GMM     Proposed BGFG
CMS            0.729            0.754              0.674   0.667
CDW canoe      0.566            0.646              0.510   0.638
CDW overpass   0.576            0.639              0.430   0.590
Average        0.624            0.680              0.538   0.632

According to Elhabian et al. [18], the best metric for comparing the performance of binary classifiers is the percentage of correct classification (PCC), which combines the effects of precision and recall:

$$PCC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)$$

where TP (true positives) counts the number of correctly detected foreground pixels, FP (false positives) counts the number of background pixels incorrectly classified as foreground, TN (true negatives) counts the number of correctly classified background pixels, and FN (false negatives) counts the number of foreground pixels incorrectly classified as background.
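All three measures can be computed directly from binary masks; a short sketch follows, assuming True marks foreground in both the prediction and the ground truth.

```python
import numpy as np

def evaluate(pred, gt):
    """Precision (8), Recall (9) and PCC (10) from boolean foreground masks."""
    tp = np.sum(pred & gt)          # correctly detected foreground pixels
    fp = np.sum(pred & ~gt)         # background pixels labelled foreground
    tn = np.sum(~pred & ~gt)        # correctly classified background pixels
    fn = np.sum(~pred & gt)         # foreground pixels labelled background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    pcc = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, pcc
```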

Table 3. Detection performance: PCC.

Video          Moving average   Recursive median   GMM     Proposed BGFG
CMS            0.947            0.959              0.978   0.979
CDW canoe      0.822            0.924              0.932   0.947
CDW overpass   0.897            0.910              0.922   0.942
Average        0.889            0.931              0.944   0.956

The PCC scores measured on the datasets using the proposed method and the comparison methods are given in Table 3. The proposed method gives the highest PCC score of all the methods.


5. Conclusion


A competitive background-foreground model is used for background subtraction. The algorithm was applied to challenging dynamic video sequences and compared with the results obtained using other methods, and it produces reliable classification results. Temporal consistency of pixel values will be considered in future work.

References

[1] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. In Proc. IEEE Int. Conf. Comput. Vis., vol. 1, pp. 255-261, Sep. 1999.
[2] A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis. Background and foreground modeling using non-parametric kernel density estimation for visual surveillance. Proc. of the IEEE, vol. 90, no. 7, pp. 1151-1163, July 2002.
[3] S. C. Cheung and C. Kamath. Robust techniques for background subtraction in urban traffic video. In Proc. Electronic Imaging: Visual Communications and Image Processing (Part One), SPIE, no. 5308, pp. 881-892, 2004.
[4] A. Cavallaro and T. Ebrahimi. Video object extraction based on adaptive background and statistical change detection. In Proc. Vis. Commun. Image Process., pp. 465-475, Jan. 2001.
[5] A. Elgammal, R. Duraiswami, and L. S. Davis. Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 11, pp. 1499-1504, 2003.
[6] I. Haritaoglu, D. Harwood, and L. S. Davis. W4: Real-time surveillance of people and their activities. IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 809-830, Aug. 2000.
[7] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Ft. Collins, USA, vol. 2, pp. 246-252, 1999.
[8] C. R. Jung. Efficient background subtraction and shadow removal for monochromatic video sequences. IEEE Trans. Multimedia, vol. 11, no. 3, pp. 571-577, Apr. 2009.
[9] S. Chen, J. Zhang, Y. Li, and J. Zhang. A hierarchical model incorporating segmented regions and pixel descriptors for video background subtraction. IEEE Trans. Industrial Informatics, vol. 8, no. 1, pp. 118-127, Feb. 2012.
[10] C. Liu, P. C. Yuen, and G. Qiu. Object motion detection using information theoretic spatio-temporal saliency. Pattern Recognition, vol. 42, no. 11, pp. 2897-2906, Nov. 2009.
[11] Y. Sheikh and M. Shah. Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 11, pp. 1778-1792, Nov. 2005.
[12] J. Y. Hao, C. Li, Z. Kim, and Z. Xiong. Spatio-temporal traffic scene modeling for object motion detection. IEEE Trans. Intelligent Transportation Systems, vol. 14, no. 1, pp. 295-302, March 2013.
[13] O. Barnich and M. Van Droogenbroeck. ViBe: A universal background subtraction algorithm for video sequences. IEEE Trans. Image Processing, vol. 20, no. 6, June 2011.
[14] wordpress-jodoin.dmi.usherb.ca/cdw2012/
[15] Y. Benezeth, P. Jodoin, B. Emile, H. Laurent, and C. Rosenberger. Review and evaluation of commonly implemented background subtraction algorithms. In Proc. IEEE Int. Conf. Pattern Recognition, pp. 1-4, Dec. 2008.
[16] M. Cristani and V. Murino. A spatial sampling mechanism for effective background subtraction. In Proc. VISAPP, pp. 403-412, 2007.
[17] H. Wang and D. Suter. A consensus-based method for tracking: Modelling background scenario and foreground appearance. Pattern Recognition, vol. 40, pp. 1091-1105, Mar. 2007.
[18] S. Elhabian, K. El-Sayed, and S. Ahmed. Moving object detection in spatial domain using background removal techniques: State-of-the-art. Recent Patents on Computer Science, vol. 1, pp. 32-54, Jan. 2008.
[19] A. Mittal and N. Paragios. Motion-based background subtraction using adaptive kernel density estimation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 302-309, 2004.
[20] M. Seki, T. Wada, H. Fujiwara, and K. Sumi. Background subtraction based on cooccurrence of image variations. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., vol. 2, pp. 65-72, 2003.