home monitoring, elderly care, and smart environments to security and surveillance in public or corporate buildings. Computer vision based solutions have the potential for very discriminating detection and very low false alarm rates [5]-[6].

Surveillance systems are typically categorized into three distinct generations [7]. The first generation uses analog equipment throughout the complete system. The second generation uses analog cameras but digital back-end equipment. The third generation completes the digital transformation: the cameras themselves convert the video signal to digital before sending it, sometimes over IP [8]. Over the last two decades, research in computer vision has been very active, and scene interpretation and understanding have received the lion's share of the scientific community's effort in this field [9]. That is mostly due to the specific interest of governments in automatic video surveillance for homeland security. That orientation was largely helped by the hardware (cameras and computers) becoming cheaper. Consequently, many projects were started that aimed to develop intelligent video surveillance systems, like CROMATICA [10], VSAM [11], or ADVISOR [12]. However, despite investigators' hard work, it is clear that a big effort is still needed before surveillance systems become really useful [13]. Robustness of the activity detection, tracking and understanding modules is one of the crucial problems still to be investigated in a more systematic manner [14]. These projects were started in order to build an Intelligent Visual System which will add a brick towards solving the problem of robustness. Other known problems, like handling occlusions [15], may also be investigated.

3. SYSTEM ARCHITECTURE
Video surveillance is increasingly found in academic institutions. It is used to oversee the safety of faculty members, staff and students, as well as to protect assets from vandalism and theft. Moreover, campuses may be extensive, especially in the case of universities, and be comprised of several buildings, accesses and parking lots to monitor. In this environment, video surveillance is used in particular to: monitor access to the institution's perimeter; monitor equipment and data; detect and follow acts of vandalism and theft; recognize license plates; and support criminal investigations and control access.

Since educational institutions often have an IP network infrastructure, it is beneficial to set up digital video surveillance systems [16]. For the above reasons, we have implemented our IVSS in our University for testing. Basically, the system is composed of a set of IP cameras plugged directly into the local network hub. A human-computer interface and a storage space are also plugged into this system. The main advantage of such an architecture is its flexibility. The main goal is to create a robust, adaptive system that is flexible enough to handle variations in lighting, moving scene clutter, multiple moving objects and other arbitrary changes to the observed scene. Consequently, in this architecture, the system should be able to:
i) register different viewpoints and create virtual displays of the facility or area;
ii) track and classify objects;
iii) overlay tracking information on a virtual display constructed from the observations of multiple cameras;
iv) learn standard and abnormal behaviors of objects;
v) selectively store video. Low-bandwidth tracking information could be continually stored, allowing the observer to query the system about activities.

The architecture enables a single human operator to monitor activities over a broad area using a distributed network of video sensors. The sensor platforms are mainly autonomous, notifying the operator only of salient information as it occurs, and engaging the operator minimally to alter platform operations. We believe that developing the capabilities to deploy and, most importantly, to process the data from such a big number of cameras will impact existing surveillance and monitoring methods. The architecture of our proposed system focuses on a reliable link between image processing and video content analysis, as seen in figure 3.1. Hence,
integration of image processing within the digital video networked surveillance system itself is inevitable. The proposed IVSS contains all the modules (video capture, image analysis, image understanding, event generator and field experience). Moreover, it contains an auto-learning module and a video retrieval module.

…stored in the field experience module for easier access and future detection [16]. In the context of this architecture, we have built on an existing framework for detecting, tracking and classifying activities, in a variety of directions. Tracking methods will be extended to incorporate multiple cameras. This will require coordination between the cameras to ensure that the same object is being tracked in each, as well as to merge statistical information about the tracked object into a coherent framework.
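The paper does not detail how the statistical information from several cameras is merged. As one illustrative possibility, per-camera position estimates of the same tracked object can be fused by inverse-variance weighting; the function name and the weighting scheme below are our own assumptions, not the paper's method.

```python
# Illustrative sketch: fuse per-camera position estimates of one tracked
# object into a single coherent estimate. Each camera reports a position
# and a variance (confidence); more confident cameras get more weight.

def fuse_estimates(estimates):
    """estimates: list of ((x, y), variance) pairs, one per camera.
    Returns the inverse-variance weighted mean position."""
    weights = [1.0 / var for (_, var) in estimates]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, estimates)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, estimates)) / total
    return (x, y)
```

With equal variances this reduces to a plain average; a camera with half the variance pulls the fused estimate twice as strongly towards its own reading.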
by a green frame tracking in real-time the moving person. The video capture module delivers a video stream acquired from the camera, and then each I-frame of the stream is smoothed with the second derivative in time of the temporal Gaussian function.

If fn is the intensity of the nth I-frame of the shot, then the absolute difference function Δn is:

Δn = | fn − fn−1 |    (1)

The result of the difference is binarized in order to separate changed pixels from the others. To do this, a threshold function is used and a motion image Mn can be extracted:

Mn(u, v) = fn(u, v) if Δn(u, v) ≥ T, 0 if Δn(u, v) < T    (2)

where T is an appropriate threshold chosen after several tests performed on the scenes of the environment. To separate the regions of interest from the rest of the image, binary statistical morphological operators (erosion and dilatation) are used as follows:

- Binary statistical erosion: if the structuring element SE and the filtering threshold th are fixed, the output of binary statistical erosion at pixel i is:

fe(i) = 1 if M1(i) ≥ th, 0 otherwise    (3)

where M1(i) is the number of pixels of value 1 inside SE. It allows eliminating the noisy isolated pixels returned by the changed-pixel detector.

- Binary mathematical dilatation: if the structuring element SE is fixed, the output of binary dilatation at pixel i is:

fd(i) = 1 if M1(i) ≥ 1, 0 otherwise    (4)

This operation allows recovering interesting pixels eliminated by erosion, by filling the holes present inside interesting regions. Then, the moving sections must be grouped into motion regions Rn(i). This is done using a connected component criterion. It allows grouping different motion sections susceptible to be part of the same region, or grouping the residual motion parts into one motion region. This property is useful for identifying humans, who are not rigid, and is also useful when the moving object is occluded by another target. After the motion region is determined, targets are morphologically dilated (twice) and then eroded. Subsequently, moving targets are clustered into motion regions using a connected components criterion.

Figure 4.1: Original images and detected motion regions

The algorithm works effectively and satisfactorily when the scene includes many moving objects or humans. Each time a person enters the scene, the system encapsulates the moving body shape in a numbered frame for proper tracking through time. The multi-object motion detection is illustrated by figure 4.2, where we have two persons coming towards each other, then passing near each other. The system shows the two numbered frames coming closer, merging, and then separating again.

Figure 4.2: Video stream showing the detection of 2 persons when they cross in front of the camera.
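The detection pipeline of Eqs. (1)-(4) can be sketched as follows. The paper's system is implemented in C++; this Python sketch, with a 3x3 structuring element and our own function names, is only illustrative.

```python
# Illustrative sketch of the motion-image pipeline (Eqs. 1-4):
# frame differencing, thresholding, statistical erosion and dilation.

def motion_image(prev, curr, T):
    """Eqs. (1)-(2): keep current intensity where |f_n - f_{n-1}| >= T."""
    return [[c if abs(c - p) >= T else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def _count_ones(img, i, j):
    """Number of 1-pixels under a 3x3 structuring element centred at (i, j)."""
    h, w = len(img), len(img[0])
    return sum(img[a][b]
               for a in range(max(0, i - 1), min(h, i + 2))
               for b in range(max(0, j - 1), min(w, j + 2)))

def statistical_erosion(img, th):
    """Eq. (3): a pixel survives only if at least `th` pixels under SE are set;
    this kills the isolated noise pixels left by the change detector."""
    return [[1 if _count_ones(img, i, j) >= th else 0
             for j in range(len(img[0]))] for i in range(len(img))]

def dilation(img):
    """Eq. (4): a pixel is set if any pixel under SE is set; fills holes."""
    return [[1 if _count_ones(img, i, j) >= 1 else 0
             for j in range(len(img[0]))] for i in range(len(img))]
```

In use, the motion image is first binarized (nonzero pixels set to 1), eroded to remove isolated noise, then dilated to recover the pixels the erosion removed inside genuine regions.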
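The grouping of moving pixels into motion regions Rn(i) by a connected-components criterion can likewise be sketched (8-connectivity assumed; pure Python for clarity, whereas the paper's implementation is in C++).

```python
# A minimal connected-components grouping: cluster the set pixels of a
# binary motion image into regions using 8-connectivity and a BFS flood fill.
from collections import deque

def motion_regions(binary):
    """Label 8-connected regions; returns a list of pixel-coordinate sets."""
    h, w = len(binary), len(binary[0])
    seen, regions = set(), []
    for si in range(h):
        for sj in range(w):
            if binary[si][sj] and (si, sj) not in seen:
                queue, region = deque([(si, sj)]), set()
                seen.add((si, sj))
                while queue:
                    i, j = queue.popleft()
                    region.add((i, j))
                    for a in range(i - 1, i + 2):
                        for b in range(j - 1, j + 2):
                            if (0 <= a < h and 0 <= b < w
                                    and binary[a][b] and (a, b) not in seen):
                                seen.add((a, b))
                                queue.append((a, b))
                regions.append(region)
    return regions
```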
4.2 Moving Object Fuzzy Classification
The main difficulty with metrical classification is that, for example, multiple humans close together can be misclassified as vehicles, a partly occluded vehicle may look like a human, or some background clutter may appear as a vehicle. To overcome this problem, an additional hypothesis is used. The main idea is to record all potential motion regions PRn from the first frame of the stream. Each one of these potential regions must be observed along some frames of the shot to determine whether it persists or not, and so decide whether to continue classifying it. To do this, for each new frame, each previous motion region PRn−1 is matched to the spatially closest current motion region Rn according to a mutual proximity rule. After this process, each previous potential motion region PRn−1 which has not been matched to a current region is removed from the list of accepted motion regions, and any current motion region Rn which has not been matched is considered a new potential region. At each frame, the new classification according to the metric operators, dispersion and ratio, is used to update the classification hypothesis. The main advantage of this method is that if an occluded object is misclassified, it will be correctly classified with the passage of time. Another advantage is that unstable motions appearing in the background will be classified as non-identified regions.

The motivation for using geometry features is that they are computationally inexpensive and invariant to lighting conditions or viewpoint [18]. On the other hand, it is clear that the human, with its smaller and more complex shape, will have a larger dispersion than a vehicle. If we define an appropriate membership function μ for the object, the area a and the perimeter p of the object can be calculated as follows.

Area of a fuzzy set:

a(μ) = Σ_{m=1..M} Σ_{n=1..N} μ(m, n)    (5)

Perimeter of a fuzzy set:

p(μ) = Σ_{m=1..M} Σ_{n=1..N−1} |μ(m, n) − μ(m, n+1)| + Σ_{n=1..N} Σ_{m=1..M−1} |μ(m, n) − μ(m+1, n)|    (6)

where M and N are the dimensions of the image. Based on the perimeter and the area, the dispersion and the ratio of a fuzzy set can be determined as follows:

Dispersion = (Perimeter)² / Area,  Ratio = Length / Width    (7)

The classified motion regions are used as templates for metrical training algorithms. The fuzzy system is based on two inputs, the dispersion and the ratio of the motion regions, and three outputs: one output for humans, one for vehicles and one for non-identified objects. For every input, we have two fuzzy sets: one for the category of humans and the other for the category of vehicles. It is clear that the most obvious types of targets of interest in our IVSS application are humans and vehicles. For the time being we did not assign any outdoor camera to vehicle tracking, but this issue is among the future research objectives, so we set up the classification without testing vehicles for now. Many experiments have been conducted to evaluate the range of the ratio and dispersion for humans and vehicles. For the sake of meeting Saudi standards, we always experiment with people wearing Saudi clothes besides ones wearing western clothes. For this reason, two classifiers to detect these two groups have been implemented. The metric is based on the knowledge that humans are, in general, smaller than vehicles, and that they have more complex shapes.

4.3 Object Tracking Approach
Many tracking techniques are based on mathematical methods that make it possible to predict an object's position in a frame based on its movement in the previous frames. Tracking several objects at the same time poses many challenges. Each object detected in a frame must be associated with its corresponding object in the subsequent frame. This matching is done based on the objects' characteristics (e.g., corners, area, ratios, etc.) or their model of appearance. Occlusions (regions hidden by others) represent a major difficulty for tracking objects. A video surveillance system may lose track of an object if it is totally or partially obstructed over a certain period of time. The known difficulties in object tracking, which remain largely open problems, could arise from: abrupt object motion, changing appearance of objects and scenes, self-occlusion, and occlusion by structure.
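The shape features of Section 4.2, Eqs. (5)-(7), can be computed as follows; a Python sketch with our own function names, operating on an M x N grid of membership values in [0, 1].

```python
# Sketch of the fuzzy shape metrics of Eqs. (5)-(7): area and perimeter of a
# membership map, and the dispersion feature used to separate humans
# (complex shapes, high dispersion) from vehicles (compact, low dispersion).

def fuzzy_area(mu):
    """Eq. (5): a(mu) = sum of all membership values."""
    return sum(sum(row) for row in mu)

def fuzzy_perimeter(mu):
    """Eq. (6): sum of membership differences between 4-neighbours."""
    M, N = len(mu), len(mu[0])
    horiz = sum(abs(mu[m][n] - mu[m][n + 1])
                for m in range(M) for n in range(N - 1))
    vert = sum(abs(mu[m][n] - mu[m + 1][n])
               for n in range(N) for m in range(M - 1))
    return horiz + vert

def dispersion(mu):
    """Eq. (7): Perimeter^2 / Area."""
    return fuzzy_perimeter(mu) ** 2 / fuzzy_area(mu)
```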
Once objects have been detected, the next logical step is to track them. Tracking has a number of benefits. Firstly, the detection phase is quite computationally expensive, so by using tracking, the detection step does not need to be computed for each frame. After detecting moving objects, the IVSS tracks their movement over the video stream. Each task requires locating each tracked object from one image to another. The Kalman filter is another powerful tool for analyzing motion. The filter can be used to predict the real position of the blob being tracked with better accuracy than the raw sensor data. The Kalman filter uses the history of measurements to build a model of the state of the system that maximizes the probability for the position of the target based on the past measurements [20].

5. EXTENSION OF THE TRACKING METHOD TO MULTIPLE CAMERAS
The aim of tracking description in a multiple-camera configuration is to make a link between the tracking and the analysis processes. It is then important to establish correspondences between the objects in different image sequences taken by different cameras. Consequently, target tracking and data fusion also need to be addressed. The success of data fusion depends on how well the data is represented, how reliable and adequate the data model is, and how accurate and applicable the prior knowledge is. Figure 5.1 shows the environment of our IVSS, which has been implemented in the college of computer science. Camera calibration is a necessary step to make it possible to calculate the actual size and speed of the objects in the scene. It establishes the correspondence between the geometry of the scene and that of the image. For fixed cameras, a 2D context can be defined by the system administrator identifying areas in the image such as input/output regions, zones to ignore, etc.

Figure 5.1: IP-Camera network in the College building (Environment View)

The interface used in our IVSS is shown in Figure 5.2, whereby the operator can have a general view of what is happening in the area under surveillance. The two cameras of type CIVS-IPC-3431 (denoted K and L) were installed in the server room and in the nearby corridor for the purpose of identifying persons accessing the server room and checking their access rights. The ten cameras of type CIVS-IPC-4300 have been installed in the corridors of the first floor of the department to cover a wide closed area where students move and access the lecture rooms, faculty offices, administration offices and toilets. The ten cameras were denoted A, B, C, D, E, F, G, H, I and J, as illustrated by figure 5.1. The idea is to create an interface with a mosaic of all available cameras; when clicking on an image, we can see it in a bigger size or in full-screen mode. While designing the interface, we had several choices depending on the development language. With the Java language, we have the Swing library for developing desktop applications. With C++, we have MFC, GTK or the QT Framework, which all offer a complete SDK for developing portable cross-platform applications, especially GTK and QT. Accordingly, Java being a higher-level language, we prefer C++ for an intensive, resource-consuming application like video processing.
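The Kalman predict/update cycle used for blob tracking in Section 4.3 can be sketched as follows. The paper does not give its state model or noise settings, so the constant-velocity model, the class name and the noise values below are illustrative assumptions.

```python
# A minimal constant-velocity Kalman filter for one blob coordinate.
# State: [position, velocity]; measurement: position only (H = [1, 0]).

class Kalman1D:
    def __init__(self, pos, q=1e-3, r=1.0):
        self.x = [pos, 0.0]                      # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]        # state covariance
        self.q, self.r = q, r                    # process / measurement noise

    def predict(self, dt=1.0):
        """Propagate state with F = [[1, dt], [0, 1]]; returns predicted pos."""
        x, v = self.x
        self.x = [x + dt * v, v]
        P = self.P
        # P <- F P F^T + Q (Q approximated as diag(q, q) for simplicity)
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.x[0]

    def update(self, z):
        """Fold in a position measurement z; returns corrected position."""
        s = self.P[0][0] + self.r                # innovation covariance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s   # Kalman gain
        y = z - self.x[0]                        # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        P = self.P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]
```

Fed a blob centre moving at constant speed, the filter quickly learns the velocity and its prediction lands near the next measurement, which is exactly what the tracker needs between detection frames.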
…surveillance. It is then essential for tracking analysis to establish a link between the different zones.

Table 5.1: Camera link table for tracking analysis based on left exit
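A camera link table like Table 5.1 can be encoded as a simple lookup from (camera, exit side) to the neighbouring camera expected to pick up the target. The adjacency below is invented for illustration; the paper's actual table is not reproduced in the text.

```python
# Hypothetical encoding of a camera link table: for each (camera, exit-side)
# pair, the camera on which a tracked person should reappear. Letters follow
# the naming of figure 5.1, but this particular adjacency is made up.

CAMERA_LINKS = {
    ("A", "left"): "B",
    ("B", "left"): "C",
    ("B", "right"): "A",
    ("C", "right"): "B",
}

def next_camera(camera, exit_side):
    """Camera expected to pick up a target leaving `camera` via `exit_side`;
    None means the target has left the monitored area."""
    return CAMERA_LINKS.get((camera, exit_side))
```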
After detecting blobs in the monitored area, the next step was to represent the evolution of these blobs on a map in real time. Bottom, top and VP exits can easily be derived. A map of the area was drawn for this purpose. The position of the different cameras is visible on the map, and each camera was assigned a letter to represent it. Once the camera scale is fixed, each camera tracker is given the coordinates of the detected blobs, and they are displayed on the map as shown in Figure 5.4.

Figure 5.4: Display on the map (red point) of a tracked person (shown at bottom left) from one of the cameras

6. ACTIVITY RECOGNITION
In this section we proceed to an evaluation of the most important methods used in either the abstraction or the event modeling phase, independently of the taxonomy used. However, we will use the taxonomy proposed by Lavee et al. [21] only as an indication for categorizing the methods. The objective of this section is to show the strengths and shortcomings of some of the most important methods, which will help investigators choose their tools depending on their problems.

The traditional techniques which fall under this title focus on the event recognition problem and do not consider the semantic representation (understanding), which makes their role simple, thus realizing good results in their scope. Minimal semantic knowledge is needed in building the event classifiers in this category using techniques like Nearest Neighbor, Support Vector Machines and Neural Networks. Often, they may be fully specified from training data. These techniques are often applied to the abstraction stage. The methods mentioned above all involve supervised learning. They are applicable for known scenes where the types of object motions are already known. Another member of the neural networks family, namely the self-organizing neural networks (like Kohonen networks), is suited to behavior understanding when the object motions are unrestricted. Among the abnormal behaviors, we tackled those based on two parameters: the existence or not of interaction between objects (humans here), and the event being normal or abnormal (a falling or running person, a punch (involving 2 persons) and a crowd rushing in the wrong direction during a given time). As the output is binary in all the scenarios, the statistical method chosen to discriminate the events is the Support Vector Machine. As a machine has to make the decisions, the broad machine learning topic pops up. As OpenCV2 comes with a machine learning library, ml.lib, we decided to use its C++ API to implement the scenarios.

6.1 Support Vector Machine (SVM)
The basic idea of Support Vector Machines is to find the optimal hyperplane that splits a dataset into different categories. That hyperplane is chosen so that the distance to the nearest data points of the classes is maximized. The following figure gives an idea with a simple example with only 2 categories in the plane.

Figure 6.1: The red line (H2) is the optimal in this example

Globally, the SVM is seen as a set of supervised learning methods that analyze data and recognize patterns. It takes a set of input data and predicts, for each given input, which of two possible classes forms the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as
wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on [22]. In fact, using the geometry of the frame associated with the detected motion of the recognized object, we may categorize some basic activities like running, jumping and falling.

Running: may be detected when the speed of the frame from image to image goes beyond a predetermined threshold. In fact, a speed value less than this threshold characterizes normal walking. The result is shown in Figure 6.2 below. This case depicts running to the right or the left, parallel to the image plane. Detecting running away from or towards the camera is still under implementation.

Figure 6.2: Detection of the running behavior

Jumping: This case is detected when the position of the frame from image to image suddenly goes up then down. Moreover, the speed of this up-and-down motion should be greater than a predetermined threshold, so as not to confuse it with the normal displacement of the frame during the motion detection process. Figure 6.3 shows a detection of a human jump behavior.

Figure 6.3: Detection of jump behavior

Falling: This behavior can easily be detected for a single person when the size of the frame suddenly changes its dimensions and becomes squeezed downward. Generally, the center of the tracking frame should go suddenly down relative to the previous position. Figure 6.4 shows the detection of falling and running behavior simultaneously.

Figure 6.4: Detection of falling (a) and running (b) behaviors

Generally, most of the research done in the field of IVSS concerns people wearing western clothes. This constitutes, in fact, another problem to be solved in our case, which is discerning the person wearing white Saudi clothing from the mostly white background of our environment in the college. The difficulties that we faced during this first phase of system implementation are part of the problems linked to the third generation of video systems, which are numerous and need to be addressed quickly in order to push this technology to maturation.

7. CONCLUSION
Feature extraction and classification, even though investigations have lasted more than two decades now, remain a big challenge. Many methods were used to detect moving objects, like background subtraction and others. However, each of them presents drawbacks, like ghosts for the background subtraction method. We may notice in the experimental results, presented through the different sections of this paper, that sometimes only two persons are present in the scene but the frames are not consecutively numbered. This is mainly due to the feature extraction and classification algorithm, which sometimes classifies a shadow as another moving person and assigns to it another frame number. But after a couple of seconds the frame disappears automatically, due to the fact that the tracking does not support the sudden deformation of the frame. In fact, finding the events of interest and identifying the behavior is not a trivial task. This may be the bigger challenge in our IVSS. Many approaches are presented but harder work is yet to be done. The computational cost for some methods is very high, which makes their use difficult. Most of the Saudis are wearing white
dress during most of the time of the year. Hence, discerning the moving person from the mostly white background of our environment in the college was a very tedious task. Actually, we shifted to red head cover detection, but even this head cover may be white for many Saudis. At the beginning, we tried raising the threshold value used for motion image segmentation, but soon we discovered that it causes the shadow and some minor light changes to appear as a moving foreground blob.

Acknowledgement: The authors would like to thank the King Abdul-Aziz City for Science and Technology (KACST) for funding this research project AT/29/314.

8. REFERENCES
[1] In Su K., Hong Seok Choi, Yi Kwang Moo, Choi Jin Young, and Kong Seong G. Intelligent visual surveillance: a survey. International Journal of Control, Automation, and Systems, 8:926-939, 2010.
[2] Valera M. and Velastin S. A. Intelligent distributed surveillance systems: a review. IEE Proc. Vis. Image Signal Process., Vol. 152, No. 2, April 2005, pp. 192-204.
[3] Hannah M. Dee and Sergio A. Velastin. How close are we to solving the problem of automated visual surveillance? A review of real-world surveillance, scientific progress and evaluative mechanisms. Machine Vision and Applications, 19:329-343, September 2008.
[4] Vallejo D., et al. A cognitive surveillance system for detecting incorrect traffic behaviors. Elsevier, Expert Systems with Applications (2009), doi:10.1016/j.eswa.2009.01.034
[5] SAGEM et al. Integrated surveillance of crowded areas for public security. Website, 2007. http://www.iscaps.reading.ac.uk/about.htm
[6] Gouaillier V. and Fleurant A. Intelligent video surveillance: Promises and challenges. Technological and commercial intelligence report. Technical report, CRIM and Technopole Defence and Security, 2009.
[7] Weiming H., Tieniu T., Liang W., and Maybank S. A survey on visual surveillance of object motion and behaviors. Systems, Man and Cybernetics, Part C, IEEE Transactions on, 34(3):334-352, 2004.
[8] Hampapur A., Brown L., Connell J., Pankanti S., Senior A. and Tian Y. Smart surveillance: applications, technologies and implications. IBM T.J. Watson Research Centre, www.research.ibm.com/peoplevision/, Mar. 2008.
[9] Duque D., Santos H., and Cortez P. Prediction of abnormal behaviors for intelligent video surveillance systems. In Computational Intelligence and Data Mining, 2007 (CIDM 2007), IEEE Symposium on, pages 362-367, April 2007.
[10] Khoudour L. et al. Project CROMATICA. In Alberto Del Bimbo, editor, Image Analysis and Processing, volume 1311 of Lecture Notes in Computer Science, pages 757-764. Springer Berlin / Heidelberg, 1997.
[11] Collins R. T., Takeo K., Hironobu F., David D., Yanghai T., Tolliver D., Enomoto N., Hasegawa O., Peter B. and Lambert W. VSAM: A System for Video Surveillance and Monitoring. Technical Report CMU-RI-TR-00-12.
[12] Siebel N. T. and Maybank S. The ADVISOR Visual Surveillance System. http://www-sop.inria.fr/orion/ADVISOR/
[13] Teddy Ko. A survey on behavior analysis in video surveillance for homeland security applications. Applied Image Pattern Recognition Workshop, pp. 1-8, 2008.
[14] Chmelar P. Content Analysis of Distributed Video Surveillance Data for Retrieval and Knowledge Discovery. Thesis, Brno University of Technology, 2007.
[15] Tomi D. Raty. Survey on contemporary remote surveillance systems for public safety. IEEE Transactions on Systems, Man, and Cybernetics, pp. 493-515, September 2010.
[16] Turaga P., Chellappa R., Subrahmanian V. S., and Udrea O. Machine recognition of human activities: A survey. Circuits and Systems for Video Technology, IEEE Transactions on, 18(11):1473-1488, Nov. 2008.
[17] Peter L. Venetianer and Hongli Deng. Performance evaluation of an intelligent video surveillance system - a case study. Computer Vision and Image Understanding, 114(11):1292-1302, 2010.
[18] Hampapur A., Brown L., Connell J., Ekin A., Haas N., Lu M., Merkl H., Pankanti S. Smart video surveillance: exploring the concept of multiscale spatiotemporal tracking. IEEE Signal Processing Mag., vol. 22, no. 2, pp. 38-51, Mar. 2005.
[19] Dee H., Velastin S. A. How close are we to solving the problem of automated visual surveillance? A review of real-world surveillance, scientific progress and evaluative mechanisms. Machine Vision and Applications, 19(5-6), September 2008, pp. 329-343.
[20] Goh K-S., Miyahara K., Radhakrishan R., Xiong Z., Divakaran A. Audio-Visual Event Detection Based on Mining of Semantic Audio-Visual Labels. SPIE Conference on Storage and Retrieval for Multimedia Databases, Vol. 5307, pp. 292-299, January 2004.
[21] Lavee G., Rivlin E., and Rudzsky M. Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video. IEEE Trans. on Systems, Man, and Cybernetics, Vol. 39, No. 5, September 2009.
[22] Foresti G., Micheloni C., Snidaro L., Remagnino P., Ellis T. Active video-based surveillance system: the low-level image and video processing techniques needed for implementation. IEEE Signal Processing Mag., vol. 22, no. 2, pp. 25-37, Mar. 2005.