
Real-time foreground-background segmentation using codebook model

Kyungnam Kim a,*, Thanarat H. Chalidabhongse b, David Harwood a, Larry Davis a

a Computer Vision Lab, Department of Computer Science, University of Maryland, College Park, MD 20742, USA
b Faculty of Information Technology, King Mongkut's Institute of Technology, Ladkrabang, Bangkok 10520, Thailand

* Corresponding author. Tel.: +1 301 405 8368; fax: +1 301 314 9658. E-mail addresses: knkim@umiacs.umd.edu (K. Kim), thanarat@it.kmitl.ac.th (T.H. Chalidabhongse), harwood@umiacs.umd.edu (D. Harwood), lsd@umiacs.umd.edu (L. Davis). URL: http://www.cs.umd.edu/knkim.

Abstract

We present a real-time algorithm for foreground-background segmentation. Sample background values at each pixel are quantized into codebooks which represent a compressed form of background model for a long image sequence. This allows us to capture structural background variation due to periodic-like motion over a long period of time under limited memory. The codebook representation is efficient in memory and speed compared with other background modeling techniques. Our method can handle scenes containing moving backgrounds or illumination variations, and it achieves robust detection for different types of videos. We compared our method with other multimode modeling techniques.
In addition to the basic algorithm, two features improving the algorithm are presented: layered modeling/detection and adaptive codebook updating.
For performance evaluation, we have applied perturbation detection rate analysis to four background subtraction algorithms and two videos of different types of scenes.
© 2005 Elsevier Ltd. All rights reserved.

1. Introduction

The capability of extracting moving objects from a video sequence captured using a static camera is a typical first step in visual surveillance. A common approach for discriminating moving objects from the background is detection by background subtraction. The idea of background subtraction is to subtract or difference the current image from a reference background model. The subtraction identifies non-stationary or new objects.

1.1. Related work

The simplest background model assumes that the intensity values of a pixel can be modeled by a single unimodal distribution. This basic model is used in [1,2]. However, a single-mode model cannot handle multiple backgrounds, like waving trees. The generalized mixture of Gaussians (MOG) in [3] has been used to model complex, non-static backgrounds. Methods employing MOG have been widely incorporated into algorithms that utilize Bayesian frameworks [4], dense depth data [5], color and gradient information [6], mean-shift analysis [7], and region-based information [8].
MOG does have some disadvantages. Backgrounds having fast variations are not easily modeled accurately with just a few Gaussians, and it may fail to provide sensitive detection (as mentioned in [9]). In addition, depending on the learning rate used to adapt to background changes, MOG faces a trade-off. For a low learning rate, it produces a wide model that has difficulty detecting a sudden change to the background. If the model adapts too quickly, slowly moving foreground pixels will be absorbed into the background model, resulting in a high false negative rate. This is the foreground aperture problem described in [10].

To overcome these problems, a non-parametric technique estimating the probability density function at each pixel from many samples using the kernel density estimation technique was developed in [9]. It is able to adapt very quickly to changes in the background process and to detect targets with high sensitivity. A more advanced approach using adaptive kernel density estimation was recently proposed in [11]. However, the non-parametric technique in [9] cannot be used when long time periods are needed to sufficiently sample the background (for example, when there is significant wind load on vegetation), due mostly to memory constraints. Our algorithm constructs a highly compressed background model that addresses that problem.
Pixel-based techniques assume that the time series of observations is independent at each pixel. In contrast, some researchers [5,8,10] employ a region- or frame-based approach by segmenting an image into regions or by refining low-level classification obtained at the pixel level. Markov random field techniques employed in [12,13] can also model both temporal and spatial context. Algorithms in [14,15] aim to segment the foreground objects in dynamic textured backgrounds (e.g., water, escalators, waving trees, etc.). Furthermore, Amer et al. [16] describe interactions between low-level object segments and high-level information such as tracking or event description.

1.2. Proposed algorithm

Our codebook (CB) background subtraction algorithm was intended to sample values over long times, without making parametric assumptions. Mixed backgrounds can be modeled by multiple codewords. The key features of the algorithm are:

• an adaptive and compact background model that can capture structural background motion over a long period of time under limited memory. This allows us to encode moving backgrounds or multiple changing backgrounds;
• the capability of coping with local and global illumination changes;
• unconstrained training that allows moving foreground objects in the scene during the initial training period;
• layered modeling and detection allowing us to have multiple layers of background representing different background layers.

In Section 2, we describe the codebook construction algorithm and the color and brightness metric used for detection. We show, in Section 3, that the method is suitable for both stationary and moving backgrounds in different types of scenes, and applicable to compressed videos such as MPEG. Important improvements to the above algorithm are presented in Section 4: layered modeling/detection and adaptive codebook updating. In Section 5, a performance evaluation technique, perturbation detection rate analysis, is used to evaluate four pixel-based algorithms. Finally, conclusion and discussion are presented in Section 6.

2. Background modeling and detection

The CB algorithm adopts a quantization/clustering technique, inspired by Kohonen [18,19], to construct a background model from long observation sequences. For each pixel, it builds a codebook consisting of one or more codewords. Samples at each pixel are clustered into the set of codewords based on a color distortion metric together with brightness bounds. Not all pixels have the same number of codewords. The clusters represented by codewords do not necessarily correspond to single Gaussian or other parametric distributions. Even if the distribution at a pixel were a single normal, there could be several codewords for that pixel. The background is encoded on a pixel-by-pixel basis.
Detection involves testing the difference of the current image from the background model with respect to color and brightness differences. If an incoming pixel meets two conditions, it is classified as background: (1) the color distortion to some codeword is less than the detection threshold, and (2) its brightness lies within the brightness range of that codeword. Otherwise, it is classified as foreground.

2.1. Construction of the initial codebook

The algorithm is described for color imagery, but it can also be used for gray-scale imagery with minor modifications. Let $X$ be a training sequence for a single pixel consisting of $N$ RGB-vectors: $X = \{x_1, x_2, \ldots, x_N\}$. Let $C = \{c_1, c_2, \ldots, c_L\}$ represent the codebook for the pixel consisting of $L$ codewords. Each pixel has a different codebook size based on its sample variation. Each codeword $c_i$, $i = 1, \ldots, L$, consists of an RGB vector $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$ and a 6-tuple $aux_i = \langle \check{I}_i, \hat{I}_i, f_i, \lambda_i, p_i, q_i \rangle$. The tuple $aux_i$ contains intensity (brightness) values and temporal variables described below:

• $\check{I}$, $\hat{I}$: the min and max brightness, respectively, of all pixels assigned to this codeword;
• $f$: the frequency with which the codeword has occurred;
• $\lambda$: the maximum negative run-length (MNRL), defined as the longest interval during the training period that the codeword has NOT recurred;
• $p$, $q$: the first and last access times, respectively, that the codeword has occurred.
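As a concrete illustration of this bookkeeping, the following sketch shows one plausible Python representation of a codeword; the field names are ours, and nothing here is the authors' actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Codeword:
    # v_i = (R_bar, G_bar, B_bar): the running mean color of the codeword
    R: float
    G: float
    B: float
    # aux_i = <I_min, I_max, f, lam, p, q>
    I_min: float  # min brightness of all pixels assigned to this codeword
    I_max: float  # max brightness of all pixels assigned to this codeword
    f: int        # frequency with which the codeword has occurred
    lam: int      # lambda: maximum negative run-length (MNRL)
    p: int        # first access time
    q: int        # last access time
```

A codebook is then simply a list of such codewords, one list per pixel, so codebook sizes can vary across pixels.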

In the training period, each value, $x_t$, sampled at time $t$ is compared to the current codebook to determine which codeword $c_m$ (if any) it matches ($m$ is the matching codeword's index). We use the matched codeword as the sample's encoding approximation. To determine which codeword will be the best match, we employ a color distortion measure and brightness bounds. The detailed algorithm is given below.

Algorithm for Codebook construction

I. $L \leftarrow 0$,¹ $C \leftarrow \emptyset$ (empty set)
II. for $t = 1$ to $N$ do
  (i) $x_t = (R, G, B)$, $I \leftarrow \sqrt{R^2 + G^2 + B^2}$
  (ii) Find the codeword $c_m$ in $C = \{c_i \mid 1 \le i \le L\}$ matching to $x_t$ based on two conditions (a) and (b):
    (a) $\mathrm{colordist}(x_t, v_m) \le \epsilon_1$
    (b) $\mathrm{brightness}(I, \langle \check{I}_m, \hat{I}_m \rangle) = \mathrm{true}$
  (iii) If $C = \emptyset$ or there is no match, then $L \leftarrow L + 1$. Create a new codeword $c_L$ by setting
    $v_L = (R, G, B)$
    $aux_L = \langle I, I, 1, t - 1, t, t \rangle$.
  (iv) Otherwise, update the matched codeword $c_m$, consisting of $v_m = (\bar{R}_m, \bar{G}_m, \bar{B}_m)$ and $aux_m = \langle \check{I}_m, \hat{I}_m, f_m, \lambda_m, p_m, q_m \rangle$, by setting
    $v_m \leftarrow \left( \frac{f_m \bar{R}_m + R}{f_m + 1}, \frac{f_m \bar{G}_m + G}{f_m + 1}, \frac{f_m \bar{B}_m + B}{f_m + 1} \right)$
    $aux_m \leftarrow \langle \min\{I, \check{I}_m\}, \max\{I, \hat{I}_m\}, f_m + 1, \max\{\lambda_m, t - q_m\}, p_m, t \rangle$.
end for
III. For each codeword $c_i$, $i = 1, \ldots, L$, wrap around $\lambda_i$ by setting $\lambda_i \leftarrow \max\{\lambda_i, (N - q_i + p_i - 1)\}$.

¹ $\leftarrow$ means assignment.
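The sketch below transcribes Steps I-III for one pixel under stated assumptions: it reuses the hypothetical Codeword class above and calls the colordist and brightness tests in the forms sketched in Sections 2.3 and 2.4 below. It illustrates the published pseudocode; it is not the authors' code.

```python
import numpy as np

def construct_codebook(samples, eps1, alpha=0.5, beta=1.25):
    """samples: sequence of (R, G, B) training values for ONE pixel."""
    C = []                                    # I. L <- 0, C <- empty set
    N = len(samples)
    for t, (R, G, B) in enumerate(samples, start=1):          # II.
        x = np.array([R, G, B], dtype=float)
        I = np.linalg.norm(x)                 # (i) I = sqrt(R^2 + G^2 + B^2)
        match = None
        for cw in C:                          # (ii) first codeword meeting (a) and (b)
            v = np.array([cw.R, cw.G, cw.B])
            if colordist(x, v) <= eps1 and brightness(I, cw, alpha, beta):
                match = cw
                break
        if match is None:                     # (iii) create a new codeword
            C.append(Codeword(R, G, B, I, I, 1, t - 1, t, t))
        else:                                 # (iv) update the matched codeword
            f = match.f
            v = np.array([match.R, match.G, match.B])
            match.R, match.G, match.B = (f * v + x) / (f + 1)
            match.I_min = min(I, match.I_min)
            match.I_max = max(I, match.I_max)
            match.f = f + 1
            match.lam = max(match.lam, t - match.q)
            match.q = t
    for cw in C:                              # III. wrap-around on lambda
        cw.lam = max(cw.lam, N - cw.q + cw.p - 1)
    return C
```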

The two conditions (a) and (b) in Step II(ii), detailed in Eqs. (2) and (3) later, are satisfied when the pure colors of $x_t$ and $c_m$ are close enough and the brightness of $x_t$ lies between the acceptable brightness bounds of $c_m$. Instead of finding the nearest neighbor, we just find the first codeword to satisfy these two conditions. $\epsilon_1$ is the sampling threshold (bandwidth). One way to improve the speed of the algorithm is to relocate the most recently updated codeword to the front of the codebook list. Most of the time, the matched codeword was the first codeword thus relocated, making the matching step efficient.
Note that reordering the training set almost always results in codebooks with the same detection capacity. Reordering the training set would require maintaining all or a large part of it in memory. Experiments show that one-pass training is sufficient. Retraining or other simple batch processing methods do not affect detection significantly.

2.2. Maximum negative run-length

We refer to the codebook obtained from the previous step as the fat codebook. It contains all the codewords that represent the training image sequence, and may include some moving foreground objects and noise. In the temporal filtering step, we refine the fat codebook by separating the codewords that might contain moving foreground objects from the true background codewords, thus allowing moving foreground objects during the initial training period. The true background, which includes both static pixels and moving background pixels, usually is quasi-periodic (values recur in a bounded period). This motivates the temporal criterion of MNRL ($\lambda$), which is defined as the maximum interval of time that the codeword has not recurred during the training period. For example, as shown in Fig. 1, a pixel on the tip of a tree was sampled to plot its intensity variation over time. The codeword of sky-color has a very small $\lambda$, around 15, and that of tree-color has 100. However, the codeword of the person's body has a very large $\lambda$, 280.
Let $M$ and $T_M$ denote the background model (which is a refined codebook after temporal filtering) and the threshold value, respectively. Usually, $T_M$ is set equal to half the number of training frames, $N/2$:

$M = \{c_m \mid c_m \in C \wedge \lambda_m \le T_M\}$.  (1)

Codewords having a large $\lambda$ will be eliminated from the codebook by Eq. (1). Even though a codeword has a large frequency $f$, a large $\lambda$ means that it is mostly a foreground event which was stationary only for that period. On the other hand, a codeword having a small $f$ and a small $\lambda$ could be a rare background event occurring quasi-periodically. We can use $\lambda$ as a feature to discriminate the actual background codewords from the moving foreground codewords.
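In code, the temporal filtering of Eq. (1) is a one-line selection over the fat codebook; this sketch continues the hypothetical helpers above.

```python
def temporal_filter(C, N):
    """Keep codewords that recurred at least every T_M = N/2 frames (Eq. (1))."""
    T_M = N // 2
    return [cw for cw in C if cw.lam <= T_M]
```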

[Figure: intensity-versus-time plot (values 0-250 over frames 0-250) for a pixel sampled on the tip of the tree. Most of the time the pixel shows sky colors; the tree shows up quasi-periodically; the person occupied the pixel over one long interval (around frame 244).]
Fig. 1. Example showing how MNRL is used.

If $T_M = N/2$, all the codewords should recur at least every $N/2$ frames. We note that we also experimented with the combination of the frequency $f$ and $\lambda$, but $\lambda$ alone performs almost the same as that combination.
Experiments on many videos reveal that only 6.5 codewords per pixel (on average) are required for the background acquisition in order to model 5 min of outdoor video captured at 30 frames/s. By contrast, indoor videos are simpler, having one or two background values nearly everywhere. This reasonable number of codewords means that our method achieves a high compression of the background model. This allows us to capture variable moving backgrounds over a very long period of training time with limited memory.

2.3. Color and brightness

To deal with global and local illumination changes such as shadows and highlights, algorithms generally employ normalized colors (color ratios). These techniques typically work poorly in dark areas of the image. The dark pixels have higher uncertainty² than the bright pixels, since the color ratio uncertainty is related to brightness. Brightness should be used as a factor in comparing color ratios. This uncertainty makes the detection in dark regions unstable. The false detections tend to be clustered around the dark regions. This problem is discussed in [17].

² Consider two pairs of color values at the same Euclidean distance in RGB space: $\langle 10, 10, 10 \rangle$ and $\langle 9, 10, 11 \rangle$ for dark pixels, $\langle 200, 200, 200 \rangle$ and $\langle 199, 200, 201 \rangle$ for bright pixels. Their distortions in normalized colors are $\frac{|10-9|+|10-10|+|10-11|}{30} = \frac{2}{30}$ and $\frac{|200-199|+|200-200|+|200-201|}{200} = \frac{2}{200}$, respectively.

Hence, we observed how pixel values change over time under lighting variation. Fig. 2(b) shows the pixel value distributions in the RGB space where 4 representative pixels are sampled from the image sequence of the color-chart in Fig. 2(a). In the sequence, captured in a lab environment, the illumination changes over time by decreasing or increasing the light strength to make the pixel values darker or brighter. The pixel values are mostly distributed in an elongated shape along the axis going toward the origin point $(0, 0, 0)$.
Based on this observation, we developed a color model, depicted in Fig. 3, to perform a separate evaluation of color distortion and brightness distortion. The motivation of this model is that background pixel values lie along the principal axis of the codeword, along with the low and high bound of brightness, since the variation is mainly due to brightness. When we have an input pixel $x_t = (R, G, B)$ and a codeword $c_i$ where $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$,

$\|x_t\|^2 = R^2 + G^2 + B^2$,
$\|v_i\|^2 = \bar{R}_i^2 + \bar{G}_i^2 + \bar{B}_i^2$,
$\langle x_t, v_i \rangle^2 = (\bar{R}_i R + \bar{G}_i G + \bar{B}_i B)^2$.

The color distortion $\delta$ can be calculated by

$p^2 = \|x_t\|^2 \cos^2 \theta = \frac{\langle x_t, v_i \rangle^2}{\|v_i\|^2}$,
$\mathrm{colordist}(x_t, v_i) = \delta = \sqrt{\|x_t\|^2 - p^2}$.  (2)

Our color distortion measure can be interpreted as a brightness-weighted version in the normalized color space. This is equivalent to geometrically rescaling (normalizing) a codeword vector to the brightness of an input pixel. This way, the brightness is taken into consideration for measuring the color distortion, and we avoid the instability of normalized colors.
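Eq. (2) amounts to measuring the orthogonal distance from $x_t$ to the line through the origin and $v_i$; a minimal sketch follows, with the zero-norm guard being our own defensive assumption.

```python
import numpy as np

def colordist(x, v):
    """Color distortion of Eq. (2): orthogonal distance from pixel x to the
    brightness axis through the origin and codeword vector v."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    x2 = np.dot(x, x)                      # ||x_t||^2
    v2 = np.dot(v, v)                      # ||v_i||^2
    if v2 == 0.0:                          # degenerate black codeword (our guard)
        return float(np.sqrt(x2))
    p2 = np.dot(x, v) ** 2 / v2            # ||x_t||^2 cos^2(theta)
    return float(np.sqrt(max(x2 - p2, 0.0)))   # delta; max() guards round-off
```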
To allow for brightness changes in detection, we store the $\check{I}$ and $\hat{I}$ statistics, which are the min and max brightness of all pixels assigned to a codeword, in the 6-tuple defined in Section 2.1. We allow the brightness change to vary within a certain range that limits the shadow level and highlight level.

Fig. 2. The distributions of 4 pixel values of the color-chart image sequence having illumination changes over time: (a) original color-chart image,
(b) 3D plot of pixel distributions.


The range for each codeword is $[I_{low}, I_{hi}]$, defined as

$I_{low} = \alpha \hat{I}$, $\quad I_{hi} = \min\left\{\beta \hat{I}, \frac{\check{I}}{\alpha}\right\}$,

where $\alpha < 1$ and $\beta > 1$. Typically, $\alpha$ is between 0.4 and 0.7,³ and $\beta$ is between 1.1 and 1.5.⁴ This range $[I_{low}, I_{hi}]$ becomes a stable range during codebook updating. The logical brightness function in Section 2.1 is defined as

$\mathrm{brightness}(I, \langle \check{I}, \hat{I} \rangle) = \begin{cases} \mathrm{true} & \text{if } I_{low} \le \|x_t\| \le I_{hi}, \\ \mathrm{false} & \text{otherwise.} \end{cases}$  (3)

³ These typical values are obtained from experiments. 0.4 allows large brightness bounds, but 0.7 gives tight bounds.
⁴ $\beta$ is additionally used for limiting $I_{hi}$ since shadows (rather than highlights) are observed in most cases.

[Figure: geometry of the color model in RGB space, showing a codeword $v_i$, an input pixel $x_t$, the color-distortion decision boundary $\epsilon$, and the brightness bounds $I_{low}$ and $I_{hi}$ along the axis through the origin $O$.]
Fig. 3. The proposed color model: a separate evaluation of color distortion and brightness distortion.

2.4. Foreground detection

Subtracting the current image from the background model is straightforward. Unlike MOG or [9], which compute probabilities using costly floating point operations, our method does not involve probability calculation. Indeed, the probability estimate in [9] is dominated by the nearby training samples. We simply compute the distance of the sample from the nearest cluster mean. This is very fast and shows little difference in detection compared with the probability estimate. The subtraction operation $\mathrm{BGS}(x)$ for an incoming pixel value $x$ in the test set is defined as:

Algorithm for Background subtraction

I. $x = (R, G, B)$, $I \leftarrow \sqrt{R^2 + G^2 + B^2}$
II. For all codewords in $M$ in Eq. (1), find the codeword $c_m$ matching to $x$ based on two conditions:
  $\mathrm{colordist}(x, c_m) \le \epsilon_2$
  $\mathrm{brightness}(I, \langle \check{I}_m, \hat{I}_m \rangle) = \mathrm{true}$
  Update the matched codeword as in Step II(iv) of the algorithm for codebook construction.
III. $\mathrm{BGS}(x) = \begin{cases} \text{foreground} & \text{if there is no match,} \\ \text{background} & \text{otherwise.} \end{cases}$

$\epsilon_2$ is the detection threshold. The pixel is detected as foreground if no acceptable matching codeword exists. Otherwise it is classified as background.
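Putting Eqs. (2) and (3) together, the per-pixel subtraction test can be sketched as follows; the alpha and beta defaults are taken from the typical ranges above, and the function and field names are the hypothetical ones used in the earlier sketches.

```python
import numpy as np

def brightness(I, cw, alpha=0.5, beta=1.25):
    """Eq. (3) with I_low = alpha * I_max and I_hi = min(beta * I_max, I_min / alpha)."""
    I_low = alpha * cw.I_max
    I_hi = min(beta * cw.I_max, cw.I_min / alpha)
    return I_low <= I <= I_hi

def bgs(x, M, eps2, alpha=0.5, beta=1.25):
    """BGS(x): classify one pixel value x against its filtered codebook M."""
    x = np.asarray(x, dtype=float)
    I = np.linalg.norm(x)
    for cw in M:
        v = np.array([cw.R, cw.G, cw.B])
        if colordist(x, v) <= eps2 and brightness(I, cw, alpha, beta):
            # a full implementation would also update cw here, as in Step II(iv)
            return "background"
    return "foreground"
```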
2.5. Review of multimode modeling techniques

Here, we compare our method with other multimode background modeling techniques: MOG [3] and Kernel [9]. The characteristics of each algorithm are listed in Table 1.

Table 1
Characteristics of background modeling algorithms

                                   MOG [3]                           Kernel [9]                          CB (proposed)
Model representation               Mixture of Gaussians              Kernel density                      Codebook
Model evaluation                   Probability density estimation    Probability density estimation      Distance
Parametric modeling                Yes                               No                                  No
Color metric                       RGB only                          Normalized color r, g and s         Rescaled RGB and brightness
                                                                     (brightness)
Background memorization capacity   As much as K Gaussians hold       Short-term (N samples),             Almost practically infinite memory
                                                                     long-term (N samples)
Memory usage                       Small                             Large                               Compact
Processing speed                   Slow                              Slow                                Fast
Model maintenance                  Online updating with K Gaussians  Short- and long-term models         Layered modeling and detection
                                                                                                         using cache
• Unlike MOG, we do not assume that backgrounds are multimode Gaussians. If this assumption, by chance, were correct, then MOG would get accurate parameters, and would be very accurate. But this is not always true. The background distribution could be very different from normal, as we see in compressed videos such as MPEG.
• In contrast to Kernel, we do not store raw samples to maintain the background model. These samples are huge, but do not cover a long period of time. The codebook models are so compact that we can maintain them with very limited memory.
• Ours handles multiple backgrounds well. There is no restriction on the number of backgrounds. It can model trees which move longer than the raw sample size of Kernel. Even the rare background events, which meet the quasi-periodicity condition, survive as backgrounds.
• Unconstrained training using MNRL filtering allows moving foreground objects in the training sequence.
• Our codebook method does not evaluate probabilities, which is very computationally expensive. We just calculate the distance from the cluster means. That makes the operations fast.
• MOG uses the original RGB variables and does not separately model brightness and color. MOG currently does not model covariances, which are often large and caused by variation in brightness. It is probably best to explicitly model brightness. Kernel uses normalized colors and brightness; the normalized color has uncertainty related to brightness. To cope with the problem of illumination changes such as shading and highlights, we calculate a brightness difference as well as a color difference of rescaled RGB values.

3. Detection results and comparison

Most existing background subtraction algorithms fail to work with low-bandwidth compressed videos, mainly due to spatial block compression that causes block artifacts, and temporal block compression that causes abnormal distribution of encoding (random spikes). Fig. 4(a) is an image extracted from an MPEG video encoded at 70 kbits/s. Fig. 4(b) depicts a 20-times scaled image of the standard deviations of blue(B)-channel values in the training set. It is easy to see that the distribution of pixel values has been affected by the blocking effects of MPEG. The unimodal model in Fig. 4(c) suffers from these effects. For the compressed video, CB eliminates most compression artifacts; see Figs. 4(c)-(f).
In a compressed video, pixel intensities are usually quantized into a few discontinuous values based on an encoding scheme. Their histograms show several spike distributions, in contrast to continuous bell-shaped distributions for an uncompressed video. MOG has low sensitivity around its Gaussian tails, and less frequent events produce low probability with high variance. Kernel's background model, which contains a recent N-frame history of pixel values, may not cover some background events which were quantized before the N frames. If Gaussian kernels are used, the same problems occur as in the MOG case. CB is based on a vector quantization technique. It can handle these discrete quantized samples, once they survive temporal filtering ($\lambda$-filtering).
Fig. 5 illustrates the ability of the codebooks to model multiple moving backgrounds; the trees behind the person move significantly in the video. For the test sequence⁵ used in Fig. 5(a), further comparison of our method was done with 10 different algorithms, and the results are described in [10].
In areas such as building gates, highways, or pathways where people walk, it is difficult to obtain good background models without filtering out the effects of foreground objects. We applied the algorithms to a test video in which people are always moving in and out of a building (see Fig. 6). By $\lambda$-filtering, our method was able to obtain the most complete background model.

⁵ We would like to thank K. Toyama and J. Krumm at Microsoft Research for providing us with this image sequence.

Fig. 4. Detection results on a compressed video: (a) original image, (b) standard deviations, (c) unimodal model in [2], (d) MOG, (e) Kernel, (f) CB
(proposed).

Fig. 5. Detection results on multiple moving backgrounds: (a) original image, (b) MOG, (c) Kernel, (d) CB (proposed).

Multiple backgrounds moving over a long period of time cannot be well trained with techniques having limited memory constraints. A sequence of 1000 frames recorded at 30 frames/s (fps) was trained. It contains trees moving irregularly over that period. The number of Gaussians allowed for MOG was 10. A sample of size 300 was used to represent the background. Fig. 7 shows that CB captures most multiple background events; here we show typical false alarms for a frame containing no foreground objects. This is due to a compact background model represented by quantized codewords.
The implementation of the approach is quite straightforward and is faster than MOG and Kernel. Table 2 shows the speeds to process the results in Figs. 7(b)-(d) on a 2 GHz dual Pentium system. Note that the training time of Kernel is mostly used for reading and storing samples.

Table 2
Processing speed in frames/s

                        MOG    Kernel   CB
Background training     8.3    40.8     39.2
Background subtraction  12.1   11.1     30.7

Fig. 6. Detection results on training of non-clean backgrounds: (a) original image, (b) MOG, (c) Kernel, (d) CB (proposed).

Fig. 7. Detection results on very long-time backgrounds: (a) original image, (b) MOG, (c) Kernel, (d) CB (proposed).

Regarding memory usage for the results in Figs. 7(b)-(d): MOG requires 5 floating point numbers⁶ (RGB means, a variance, and a weight for each distribution), so 10 Gaussians correspond to 200 bytes. Kernel needs 3 bytes for each sample; 300 samples amount to 900 bytes. In CB, we have 5 floating point numbers ($\bar{R}$, $\bar{G}$, $\bar{B}$, $\check{I}$, $\hat{I}$) and 4 integers ($f$, $\lambda$, $p$, $q$); the average⁷ number of codewords at each pixel, 4 codewords, can be stored in 112 bytes.

⁶ Floating point: 4 bytes, integer: 2 bytes.
⁷ The number of codewords depends on the variation of pixel values.
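As a quick sanity check of these byte counts (using the stated sizes, 4-byte floats and 2-byte integers):

```python
mog    = 10 * (5 * 4)          # 10 Gaussians x 5 floats           -> 200 bytes
kernel = 300 * 3               # 300 samples x 3 bytes             -> 900 bytes
cb     = 4 * (5 * 4 + 4 * 2)   # 4 codewords x (5 floats + 4 ints) -> 112 bytes
print(mog, kernel, cb)         # 200 900 112
```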

4. Improvements

In order to make our technique more practically useful in a visual surveillance system, we improved the basic algorithm by layered modeling/detection and adaptive codebook updating.

4.1. Layered modeling and detection: model maintenance

The motivation of layered modeling and detection is to still be able to detect foreground objects against new backgrounds which were obtained during the detection phase. If we do not have those background layers, interesting foreground objects (e.g., people) will be detected mixed with other stationary objects (e.g., a car). The scene can change after initial training, for example, by parked cars, displaced books, etc. These changes should be used to update the background model. We do this by defining an additional model $H$, called a cache, and three parameters: $T_H$, $T_{add}$, and $T_{delete}$. The periodicity of an incoming pixel value is filtered by $T_H$, as we did in the background modeling. The values re-appearing for a certain amount of time ($T_{add}$) are added to the background model as non-permanent, short-term background. We assume that the background obtained during the initial background modeling is permanent. Background values not accessed for a long time ($T_{delete}$) are deleted from the background model. Thus, a pixel can be classified into four subclasses: (1) background found in the permanent background model, (2) background found in the non-permanent background model, (3) foreground found in the cache, and (4) foreground not found in any of them. This adaptive modeling capability also allows us to capture changes to the background scene (see Fig. 8).

Fig. 8. Layered modeling and detection: a woman placed a box on the desk and then the box has been absorbed into the background model as non-permanent. Then a purse is put in front of the box. The purse is detected against both the box and the desk.

Only two layers of background are described here, but this can be extended to multiple layers. The detailed procedure is given below:

I. After training, the background model $M$ is obtained. Create a new model $H$ as a cache.
II. For an incoming pixel $x$, find a matching codeword in $M$. If found, update the codeword.
III. Otherwise, try to find a matching codeword in $H$ and update it. For no match, create a new codeword $h$ and add it to $H$.
IV. Filter out the cache codewords based on $T_H$:
  $H \leftarrow H - \{h_i \mid h_i \in H, \lambda \text{ of } h_i \text{ is longer than } T_H\}$

V. Move the cache codewords staying for enough time to $M$:
  $M \leftarrow M \cup \{h_i \mid h_i \in H, h_i \text{ stays longer than } T_{add}\}$
VI. Delete the codewords not accessed for a long time from $M$:
  $M \leftarrow M - \{c_i \mid c_i \in M, c_i \text{ not accessed for } T_{delete}\}$
VII. Repeat the process from Step II.

Layered modeling and detection can also be used for further analysis of scene change detection. As shown in Fig. 9, a man unloads two boxes after parking his car. The car and the two boxes are labeled with different coloring based on their first-access-times as non-permanent backgrounds, while the man is still detected as foreground.
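A per-pixel sketch of Steps II-VI is given below. find_match, update_codeword, and create_codeword are hypothetical helpers in the spirit of Section 2, and reading staying time from p (first access) and staleness from q (last access) is our interpretation of "stays longer than T_add" and "not accessed for T_delete".

```python
def maintain_layers(x, t, M, H, T_H, T_add, T_delete):
    """One iteration of the layered maintenance loop for a single pixel."""
    cw = find_match(x, M)                     # II. match against background model
    if cw is not None:
        update_codeword(cw, x, t)
    else:
        h = find_match(x, H)                  # III. otherwise match the cache
        if h is not None:
            update_codeword(h, x, t)
        else:
            H.append(create_codeword(x, t))
    H[:] = [hi for hi in H if hi.lam <= T_H]        # IV. filter cache by T_H
    moved = [hi for hi in H if t - hi.p > T_add]    # V. promote long-staying
    M.extend(moved)                                 #    codewords (non-permanent)
    H[:] = [hi for hi in H if t - hi.p <= T_add]
    # VI. drop stale codewords; the initial (permanent) background codewords
    # would be exempted from this deletion in a full implementation.
    M[:] = [ci for ci in M if t - ci.q <= T_delete]
```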
4.2. Adaptive codebook updating: detection under global illumination changes

Global illumination changes (for example, due to moving clouds) make it difficult to conduct background subtraction in outdoor scenes. They cause over-detection, false alarms, or low sensitivity to true targets. Good detection requires equivalent false alarm rates over time and space. We discovered from experiments that variations of pixel values are different (1) at different surfaces (shiny or muddy), and (2) under different levels of illumination (dark or bright). Codewords should be adaptively updated during illumination changes. Exponential smoothing of the codeword vector and variance with suitable learning rates is efficient in dealing with illumination changes. It can be done by replacing the updating formula of $v_m$ with

$v_m \leftarrow \gamma x_t + (1 - \gamma) v_m$

and appending

$\sigma_m^2 \leftarrow \rho \delta^2 + (1 - \rho) \sigma_m^2$

to Step II(iv) of the algorithm for codebook construction. $\gamma$ and $\rho$ are learning rates. Here, $\sigma_m^2$ is the overall variance of color distortion in our color model, not the variance of RGB. $\sigma_m$ is initialized when the algorithm starts. Finally, the function colordist() in Eq. (2) is modified to

$\mathrm{colordist}(x_t, v_i) = \frac{\delta}{\sigma_i}$.

We tested a PETS2001⁸ sequence which is challenging in terms of multiple targets and significant lighting variation. Fig. 10(a) shows two sample points (labeled 1 and 2) which are significantly affected by illumination changes, and Fig. 10(b) shows the brightness changes of those two points. As shown in Fig. 10(d), adaptive codebook updating eliminates the false detection which occurs on the roof and road in Fig. 10(c).

⁸ IEEE International Workshop on Performance Evaluation of Tracking and Surveillance 2001, at http://www.visualsurveillance.org/PETS2001.
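The smoothing updates above can be appended to the matched-codeword update as a few lines; the gamma and rho values are illustrative, and the codeword is assumed to carry an extra sigma2 field initialized at start-up.

```python
import numpy as np

def adaptive_update(cw, x, delta, gamma=0.05, rho=0.05):
    """Exponential smoothing of the codeword mean and of the variance of the
    color distortion delta (Section 4.2); sigma2 is NOT the RGB variance."""
    v = np.array([cw.R, cw.G, cw.B])
    cw.R, cw.G, cw.B = gamma * np.asarray(x, dtype=float) + (1.0 - gamma) * v
    cw.sigma2 = rho * delta ** 2 + (1.0 - rho) * cw.sigma2

# Detection then uses the normalized distortion colordist(x_t, v_i) / sigma_i.
```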
5. Performance evaluation using PDR analysis

In this section we evaluate the performance of several background subtraction algorithms using perturbation detection rate (PDR) analysis. PDR measures, given a false alarm rate (FA-rate), the sensitivity of a background subtraction algorithm in detecting low contrast targets against a background as a function of contrast ($\Delta$), also depending on how well the model captures mixed (moving) background events. As an alternative to the common method of ROC analysis, it does not require foreground targets or knowledge of foreground distributions. PDR graphs show how sensitively an algorithm detects foreground targets at a certain contrast ($\Delta$) to the background as the contrast increases. A detailed discussion of PDR analysis is reported in [21].
We evaluate four algorithms: CB (proposed), MOG [3], KER [9], and UNI [2]. UNI was added to evaluate a single-mode technique in contrast to the multi-mode ones. Since the algorithm in [9] can work with either normalized colors (KER) or RGB colors (KER.RGB), it has two separate graphs. Fig. 11 shows the representative empty frames from the two test videos.
Fig. 12 depicts an example of foreground detection, showing differences in detection sensitivity for two algorithms due to differences in the color metrics. These differences reflect the performance shown in the PDR graph in Fig. 13. The video image in Fig. 12(a) shows someone with a red sweater standing in front of the brick wall of somewhat different reddish color shown in Fig. 11(a). There are detection holes through the sweater (and face) and more shadows behind the person in the MOG result (Fig. 12(b)). The holes are mainly due to a difference in color balance and not overall brightness. The CB result in Fig. 12(c) is much better for this small contrast. After inspection of the image, the magnitude of contrast $\Delta$ was determined to be about 16 in the missing spots. Fig. 13 shows a large difference in detection for this contrast, as indicated by the vertical line.
Fig. 14 shows how sensitively the algorithms detect foregrounds against a scene containing moving backgrounds (trees). In order to sample enough moving background events, 300 frames are allowed for training. A window is placed to represent moving backgrounds, as shown in Fig. 11(b). PDR analysis is performed on the window, with the FA-rate obtained only within the window: a window false alarm rate (instead of a frame false alarm rate).

Fig. 9. The leftmost column: original images; the middle column: color-labeled non-permanent backgrounds; the rightmost column: detected foreground. The video shows that a man parks his car on the lot and takes out two boxes. He walks away to deliver them.

The PDR graph (Fig. 14) for the moving background window is generally shifted right, indicating reduced sensitivity of all algorithms for moving backgrounds. Also, it shows differences in performance among the algorithms, with CB and KER performing best. CB and KER, both of which model mixed backgrounds and separate color/brightness, are most sensitive, while, as expected, UNI does not perform well as in the previous case because it was designed for single-mode backgrounds. KER.RGB and MOG are also less sensitive outdoors.

Fig. 10. Results of adaptive codebook updating for detection under global illumination changes. Detected foregrounds on frame 1105 are labeled with green color: (a) original image (frame 1), (b) brightness changes, (c) before adaptive updating, (d) after adaptive updating.

Fig. 11. The sample empty-frames of the two videos used in the experiments: (a) red-brick wall, (b) parking lot.

Fig. 12. Sensitive detection at small contrast showing the differences in color metrics of the algorithms: (a) a red-brick wall frame including a person
in a red sweater, (b) MOG, (c) CB (proposed).

6. Conclusion and discussion

Our new adaptive background subtraction algorithm, which is able to model a background from a long training sequence with limited memory, works well on moving backgrounds, illumination changes (using our color distortion measures), and compressed videos having irregular intensity distributions. It has other desirable features: unconstrained training and layered modeling/detection. Comparison with other multimode modeling algorithms shows that the codebook algorithm has good properties on several background modeling problems.

[Figure: detection rate (%) versus perturbation contrast $\Delta$ for the 'red-brick wall' video at a false alarm rate of 0.01%, comparing CB, MOG, KER, KER.RGB, and UNI.]
Fig. 13. PDR for the red-brick wall video in Fig. 11(a).

[Figure: detection rate (%) versus perturbation contrast $\Delta$ on the window of the 'parking lot' video at a window false alarm rate of 0.1%, comparing CB, MOG, KER, KER.RGB, and UNI.]
Fig. 14. PDR for the window on moving background (Fig. 11(b)).

In summary, our major contributions are as follows:

(1) We propose a background modeling technique efficient in both memory and speed. Experiments show that nearest neighbor classification, which is computationally very efficient, is as effective as probabilistic classification (both Kernel and MOG) for our application. Practically, even when computing probabilities of pixel measurements coming from the background, these probabilities are dominated by the nearest component of the background mixture.
(2) The most important lesson, based on our experience, for analyzing color videos is that using an appropriate color model is critical for obtaining accurate detection, especially in low light conditions such as in shadows. Using RGB directly lowers detection sensitivity because most of the variance at a pixel is due to brightness, and absorbing that variability into the individual RGB components results in a lower true detection rate for any desired false alarm rate. In other words, an algorithm would have to allow greater color variability than the data actually requires in order to accommodate the intrinsic variability in brightness. Using normalized colors, on the other hand, is undesirable because of their high variance at low brightness levels; in order to maintain sufficiently low detection error rates at low brightness, one necessarily sacrifices sensitivity at high brightness. This is due to using an angular measure between normalized color coordinates for detection. The color model proposed in this paper, on the other hand, maintains a constant false alarm rate across, essentially, the entire range of brightness levels. One would expect that modifying other background subtraction algorithms, such as the MOG algorithm, to use this more appropriate color model would bring their performance much closer to that of the codebook algorithm.

We have applied the PDR analysis to four background subtraction algorithms and two videos of different types of scenes. The results reflect obvious differences among the algorithms as applied to the particular types of background scenes. We also provided a real video example of differences among the algorithms with respect to sensitive foreground detection, which is consistent with the PDR simulation.
Automatic parameter selection is an important goal for visual surveillance systems, as addressed in [20]. Two of our parameters, $\epsilon_1$ in Section 2.1 and $\epsilon_2$ in Section 2.4, can be automatically determined. Their values depend on variation within a single background distribution, and are closely related to false alarm rates. Preliminary experiments on many videos show that automatically chosen threshold parameters $\epsilon_1$ and $\epsilon_2$ are sufficient. However, they are not always acceptable, especially for highly compressed videos where we cannot always measure the robust parameter accurately. In this regard, further investigation could be done to obtain robust parameters.

References

[1] Wren CR, Azarbayejani A, Darrell T, Pentland A. Pfinder: real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997;19(7):780-5.
[2] Horprasert T, Harwood D, Davis LS. A statistical approach for real-time robust background subtraction and shadow detection. IEEE Frame-Rate Applications Workshop, Kerkyra, Greece; 1999.
[3] Stauffer C, Grimson WEL. Adaptive background mixture models for real-time tracking. IEEE International Conference on Computer Vision and Pattern Recognition 1999;2:246-52.
[4] Lee DS, Hull JJ, Erol B. A Bayesian framework for Gaussian mixture background modeling. IEEE International Conference on Image Processing 2003.
[5] Harville M. A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models. European Conference on Computer Vision 2002;3:543-60.
[6] Javed O, Shafique K, Shah M. A hierarchical approach to robust background subtraction using color and gradient information. IEEE Workshop on Motion and Video Computing (MOTION'02); 2002.
[7] Porikli F, Tuzel O. Human body tracking by adaptive background models and mean-shift analysis. IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS-ICVS); 2003.
[8] Cristani M, Bicego M, Murino V. Integrated region- and pixel-based approach to background modelling. Proceedings of IEEE Workshop on Motion and Video Computing; 2002.
[9] Elgammal A, Harwood D, Davis LS. Non-parametric model for background subtraction. European Conference on Computer Vision 2000;2:751-67.
[10] Toyama K, Krumm J, Brumitt B, Meyers B. Wallflower: principles and practice of background maintenance. International Conference on Computer Vision 1999;255-61.
[11] Mittal A, Paragios N. Motion-based background subtraction using adaptive kernel density estimation. IEEE Conference on Computer Vision and Pattern Recognition 2004.
[12] Paragios N, Ramesh V. A MRF-based real-time approach for subway monitoring. IEEE Conference on Computer Vision and Pattern Recognition 2001.
[13] Wang D, Feng T, Shum H, Ma S. A novel probability model for background maintenance and subtraction. The 15th International Conference on Vision Interface; 2002.
[14] Zhong J, Sclaroff S. Segmenting foreground objects from a dynamic textured background via a robust Kalman filter. IEEE International Conference on Computer Vision 2003.
[15] Monnet A, Mittal A, Paragios N, Ramesh V. Background modeling and subtraction of dynamic scenes. IEEE International Conference on Computer Vision 2003.
[16] Amer A, Dubois E, Mitiche A. Real-time system for high-level video representation: application to video surveillance. Proceedings of SPIE International Symposium on Electronic Imaging, Conference on Visual Communication and Image Processing (VCIP); 2003.
[17] Greiffenhagen M, Ramesh V, Comaniciu D, Niemann H. Statistical modeling and performance characterization of a real-time dual camera surveillance system. Proceedings of International Conference on Computer Vision and Pattern Recognition 2000;2:335-42.
[18] Kohonen T. Learning vector quantization. Neural Networks 1988;1:3-16.
[19] Ripley BD. Pattern recognition and neural networks. Cambridge: Cambridge University Press; 1996.
[20] Scotti G, Marcenaro L, Regazzoni C. A S.O.M. based algorithm for video surveillance system parameter optimal selection. IEEE Conference on Advanced Video and Signal Based Surveillance 2003.
[21] Chalidabhongse TH, Kim K, Harwood D, Davis L. A perturbation method for evaluating background subtraction algorithms. Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS); 2003.
