Beruflich Dokumente
Kultur Dokumente
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/224368826
CITATIONS
READS
86
4 authors, including:
Dimitrios Makris
Sergio Velastin
SEE PROFILE
SEE PROFILE
Abstract
Introduction
Previous research
The challenge
DSP
.....
DSP
Event DB
HOST
Video
storage
Object DB
IP
NETWORK
ALERT
MANAGER
ALERT
CONFIG
Motion estimation
Object detection
Blob-based tracking
Object classification
Figure 2:Structure of the video-processing software
If we consider the architecture of the system, and also its
practical aspects related to commissioning and configuration,
it becomes evident that the object classification module must
adhere to the following specifications:
The solution must run in real-time on DSP boards. This is
crucial to perform low level feature extraction at high
frame rates without creating bottlenecks in a multi
channel system.
The training process of a scene-dependent classification
module should be quick and easy to perform.
The solution should distinguish among multiple classes
and also allow for all possible deployments (e.g.
detecting planes in an airport, boats in a seaport or bags
in a lounge).
In the next sections, two main approaches to address the
highlighted challenges are explored:
1. A scene-independent framework that needs neither
complex configuration nor training (once the design of
the classifier has been completed).
2. A scene-dependent classifier that requires a training stage
for each camera view. However, the labelling process
lasts no more than few minutes per camera.
Scene-independent approach
FEATURE DESCRIPTION
Heightk
Current height
Widthk
Current width
Ratiok
AvgHeight
Average height
VarHeight
AvgWidth
Average width
VarWidth
AvgRatio
VarRatio
AvgArea
Average size
AreaRatio
Dispersedness
Dleft
Dright
Dtop
Dbottom
AvgSpeed
Average speed
VarSpeed
Bag
5
0
0
0
0
0
Bag
9
1
0
0
0
0
4.3
Adaboost
Adaboost is a popular method for supervised learning
introduced in [3]. An Adaboost network is built of weighted
weak classifiers, hence it is able to automatically select the
most significant features. Training can be executed offline,
but, like k-means, it needs a wide range of samples.
We use the Modest Adaboost algorithm of the GML
AdaBoost Matlab Toolbox [13]. The weak classifiers are tree
learners made of up to 5 thresholds. For multi-class
recognition, a One-Vs-All approach is used [9], as it can be
simply implemented without affecting accuracy.
Adaboost is a fast classification algorithm however its
computational cost is certainly more significant than k-means.
In the testing phase, the tree of solutions has to be explored,
multiplying weights and adding values for a maximum
number of TN*N*CL cycles; where CL is the number of
classes (for multi-class recognition), TN is the total number
of weak classifiers (200 in our case) and N is the number of
Bag
10
1
0
0
0
0
The results reported in Section 4 demonstrate that a sceneindependent solution cannot provide sufficiently accurate
results. Despite normalization, features are not sufficiently
stable among different scenes due to the variations in view,
camera angle, object position and direction. The approach
proposed in this section tries to overcome these problems,
while keeping the required configuration time to a minimal
amount. The idea is to train the system separately for each
camera view, so that it can adapt to the different
characteristics of the scene.
The training process uses data acquired by the videosurveillance system during a fixed period of time (for
example, one week). Samples to be presented to the user, for
labelling, are selected through an automatic analysis of the
feature vectors. This analysis finds the optimal description of
the feature space using a maximum of 32 samples, even when
performing multi-class recognition (the value 32 is a trade-off
Number of
k-means
Learningbased
samples
Person
17
82%
94%
Motorbikes
5
20%
80%
Small group
4
100%
50%
Large group
6
83%
Crowd
5
20%
80%
Car
25
88%
100%
Van
8
62%
75%
Bus
4
100% (lorry)
100%
TOTAL
74
70%
92%
Table 5: Objects correctly classified by k-means (algorithm
described in Par. 4.2) and our learning-based solution. We
consider 74 consistent tracks appearing in a video sequence
35 minutes long; the TOTAL is affected by the number of
objects belonging to each class.
The scene-dependent learning-based algorithm can also be
trained to recognize erroneous tracks. In Table 6, all tracks
are considered, and the class error is added. The camera
view used for this test presents groups of objects, cluttering
and strong shadows, so wrong detections are quite frequent.
Results demonstrate that the classifier can cope with these
challenging conditions, assigning false objects to the class
error.
Table 6 also shows the results achieved by the Adaboost
classification algorithm on the same data set (by using a
leave-1-out approach, the training set consists of 364 objects).
Adaboost is implemented as described in Section 4.3, but the
classifier is trained only on the scene of interest (i.e. Adaboost
is used in this case for scene-dependent classification, thus
increasing its accuracy). In this way, the two learning-based
scene-dependent algorithms can be compared. The results
Conclusions
Acknowledgements
The authors acknowledge financial support from DBERR,
Knowledge Transfer Partnership, under grant number
KTP006256.
References
[1] S. Boragno, B. Boghossian, J. Black, D. Makris, S.A.
Velastin, A DSP-based system for detecting vehicles
stopping in prohibited area, Advanced Video and Signal
Based Surveillance (AVSS2007), pp. 260-265, (2007).
[2] O. Chapelle, P. Haffner, and V. N. Vapnik, Support
vector
machines
for
histogram-based
image
classification, IEEE Trans. on Neural Networks, 10(5):
pp. 10551064, (Sep. 1999).
[3] Y. Freund and R.E. Schapire, A decision-theoretic
generalization of on-line learning and an application to
boosting, Journal of Computer and System Sciences,
55(1): pp. 119-139, (1997).
[4] O. Javed and M. Shah, Tracking And Object
Classification For Automated Surveillance, In Proc. 7th
European Conf. on Computer Vision ECCV 2002,
Springer, Vol. 4, pp. 343-357, (2002).
[5] A. Levin, P. Viola, and Y. Freund. Unsupervised
improvement of visual detectors using co-training,
ICCV 2003, Vol. 1, pp. 626-633, (2003).
[6] A. Lipton, H. Fujiyoshi, and R.S. Patil, Moving Target
Classification and Tracking from Real-Time Video,
Proc. IEEE Workshop Applications of Computer Vision
(WACV), pp. 8-14, (Oct. 1998).
[7] J.B. MacQueen, Some methods for classification and
analysis of multivariate observations, Proceedings of the
5th Berkeley Symposium on Mathematical Statistics and
Probability, pp. 281297, (1967).
[8] J.-P. Renno, D. Makris and G.A. Jones, Object
Classification in Visual Surveillance Using Adaboost,
CVPR 07, pp. 1-8, (2007).
[9] R. Rifkin and A. Klautau, In Defense of One-Vs-All
Classification, Journal of Machine Learning Research
5, pp. 101-141, (2004)
[10] C. Stauffer, Minimally-supervised Classification using
Multiple Observation Sets, ICCV 2003, pp. 297-304,
(2003).
[11] T.N. Tan and K.D. Baker, Efficient Image Gradient
Based Vehicle Localization, IEEE Tran. on Image
Processing, Vol.9, No.8, pp. 1343-1356. (Aug. 2000).
[12] A. Vailaya, M. A. T. Figueiredo, A K. Jain, and HongJiang Zhang, Image Classification for Content-Based
Indexing, IEEE transactions on image processing, vol.
10, no.1, pp. 117-130, (2001).
[13] A. Vezhnevetz, GML Adaboost Matlab Toolbox,
http://research.graphicon.ru/machine-learning/gmladaboost-matlab-toolbox.html. Accessed: 28th March
2008.
[14] P.Viola and M.Jones, Real-time Object Detection,
Cambridge Research Laboratory, Technical Report
Series, (2001).
[15] Q. Zang and R.Klette, Object Classification and
Tracking in Video Surveillance, In Proc. Computer
Analysis of Images and Patterns, pp. 198-205, (2003).