

Object detection and measurement using stereo images

Christian Kollmitzer

Electronic Engineering
University of Applied Sciences Technikum Wien
Hchstdtplatz 4, A-1200 Vienna Austria
kollmitz@technikum-wien.at

Abstract: This paper presents an improved method for detecting objects in stereo images and for calculating the distance, size, and speed of these objects in real time. A standard background subtraction method is applied to the left and right image; subsequently, a method known as subtraction stereo calculates the disparity of the detected objects. This calculation is supported by several additional parameters such as the object center, the color distribution, and the object size. The disparity is used to verify the plausibility of detected objects and to calculate the distance and position of each object. From position and distance the size of the object can be extracted; additionally, the speed of objects can be calculated when they are tracked over several frames. A dense disparity map produced during the learning phase serves as an additional means of improving detection accuracy and reliability.

Keywords: Computer Vision, Stereo Vision, Foreground Segmentation, Disparity Map, Subtraction Stereo

1 Introduction

In surveillance systems and autonomous mobile robots, cameras are common sensors for detecting objects. A standard one-camera system detects objects by differentiating them from the background by motion, brightness, or color. This leads to misinterpretations and ambiguous situations, especially when an object is similar to the background or occluded, or when the background changes rapidly, e.g. under changing illumination.
In a setup where a static camera observes a scene, objects have to be distinguished from the background. The evaluated system learns over time to decide when background or foreground is present. Improving the robustness of this detection requires information beyond that provided by a normal color camera. This additional information can be provided by a second camera, which yields depth information and allows the position of objects in the scene to be calculated.
All computational processes are evaluated on a computer system with an i5 processor running at 2.4 GHz. The target is a detection frame rate of 20 fps at a resolution of 640x480 pixels. All algorithms use the OpenCV libraries [7] and run under Windows 7.

2 Camera rig

The camera rig consists of two identical cameras mounted on a stable bar with a base distance of 413 mm (Fig. 1).
Base distance: The resolution of the distance (Z) measurement is determined by the horizontal distance (T) between the left and right camera.

Z = (f · T) / offset    (1)

offset = horizontal offset between the corresponding image points of the left and right camera in pixels; f = focal length in pixels; T = base distance in mm; Z = distance in mm. The pixel unit appears in both numerator and denominator; the resulting unit is therefore mm.

ΔZ = (Z² · d) / (f · T)    (2)

The resolution of the distance measurement can be improved by increasing the base distance (T) or by increasing the focal length (f; tele lens); d = minimal disparity = 1 pixel. In this setting the base distance was chosen so that distances up to 20 m can be measured. The focal length is 530 pixels, which gives a field of view suitable for the surveillance of indoor and outdoor areas without camera movement.
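As a worked example of Eq. (1) with these settings: an object at Z = 10 m (T = 413 mm, f = 530 pixels) produces an offset of (530 · 413) / 10 000 ≈ 21.9 pixels, while at Z = 20 m the offset drops to about 10.9 pixels; nearer objects are therefore localized with finer disparity steps.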

Camera resolution: The resolution is a tradeoff between exact object detection and computational cost. A standard resolution of 640x480 was chosen, with the option of using 1280x1024 for higher accuracy.

Fig. 1. Camera rig


3 Image calibration and rectification

The cameras have to be calibrated and the acquired images rectified to obtain congruent images that allow disparity calculation. For cameras covering a larger distance range, the calibration is best performed in two steps. First, each camera is calibrated separately by means of chessboard patterns, which are presented several times and yield the intrinsic parameters covering the misalignment of the camera chip, the focal distance, and the distortion of the lenses. In a second step the rig is calibrated by presenting a chessboard in several positions; this yields the extrinsic parameters, i.e. the distance and the rotation of the cameras relative to each other.
With these calibration parameters both images are rectified, which leads to horizontally aligned images and eases the disparity calculation (Fig. 2): the search for identical points in both images is reduced to a horizontal search. After calibration the focus setting of the cameras must not change; therefore all automatic focus adjustments of the cameras have to be turned off.
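As an illustration, the two-step procedure could be sketched with OpenCV's calibration functions as follows. This is a minimal sketch; the chessboard dimensions, the square size, and the view_pairs list of synchronized grayscale image pairs are assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Assumed chessboard geometry; the paper does not specify it.
PATTERN = (9, 6)            # inner corners per row and column
SQUARE_MM = 30.0            # size of one chessboard square in mm

# 3D coordinates of the chessboard corners in the board plane (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_pts, pts_l, pts_r = [], [], []
for gray_l, gray_r in view_pairs:    # assumed synchronized grayscale pairs
    ok_l, c_l = cv2.findChessboardCorners(gray_l, PATTERN)
    ok_r, c_r = cv2.findChessboardCorners(gray_r, PATTERN)
    if ok_l and ok_r:                # keep views seen by both cameras
        obj_pts.append(objp)
        pts_l.append(c_l)
        pts_r.append(c_r)

size = view_pairs[0][0].shape[::-1]  # image size as (width, height)

# Step 1: intrinsic parameters of each camera separately.
_, K1, D1, _, _ = cv2.calibrateCamera(obj_pts, pts_l, size, None, None)
_, K2, D2, _, _ = cv2.calibrateCamera(obj_pts, pts_r, size, None, None)

# Step 2: extrinsic parameters of the rig, intrinsics held fixed.
_, K1, D1, K2, D2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, pts_l, pts_r, K1, D1, K2, D2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# Rectification maps: remapped images become row-aligned.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
map_lx, map_ly = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
map_rx, map_ry = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
rect_l = cv2.remap(gray_l, map_lx, map_ly, cv2.INTER_LINEAR)
rect_r = cv2.remap(gray_r, map_rx, map_ry, cv2.INTER_LINEAR)
```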

Fig. 2. Rectified images

4 Disparity calculation

Disparity calculation is a matching problem: the positions of identical points in the left and the right image have to be detected. Due to rectification the search can be limited to horizontal lines. Several algorithms have already been presented and evaluated; the computational effort (matching cost) of different methods and algorithms is evaluated in [1].
Better algorithms find more correct corresponding points, but typically at a higher matching cost. In this evaluation the semi-global block matching disparity algorithm is used [5]. The implementation of this algorithm has a high computational cost, which depends on the resolution of the images.
Disparity calculation time with semi-global block matching per frame:

Resolution 320x240: 39 ms
Resolution 640x480: 210 ms
Resolution 1280x1024: 920 ms

For a real-time application, calculation times should be below 50 ms to achieve a frame rate of 20 fps. In the further evaluation this method is therefore used only during the learning phase of 100 frames, where it produces a dense disparity map of the background at a resolution of 320x240 pixels that serves as the reference background. An even denser disparity map can be achieved by averaging all images during the learning phase, forming a stable background (Fig. 3).
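A minimal sketch of this learning phase, assuming OpenCV's SGBM implementation with illustrative matcher parameters; grab_rectified_pair is a hypothetical capture helper returning rectified grayscale frames.

```python
import cv2
import numpy as np

# Semi-global block matching; parameter values are illustrative assumptions.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,          # search range, must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,               # penalty for small disparity changes
    P2=32 * 5 * 5)              # penalty for large disparity changes

LEARN_FRAMES = 100
sum_l = sum_r = None
for _ in range(LEARN_FRAMES):
    rect_l, rect_r = grab_rectified_pair()   # assumed capture helper
    l32 = rect_l.astype(np.float32)
    r32 = rect_r.astype(np.float32)
    sum_l = l32 if sum_l is None else sum_l + l32
    sum_r = r32 if sum_r is None else sum_r + r32

# Averaging all learning images yields a stable background pair,
# from which one dense disparity map of the background is computed.
bg_l = (sum_l / LEARN_FRAMES).astype(np.uint8)
bg_r = (sum_r / LEARN_FRAMES).astype(np.uint8)
# SGBM returns fixed-point disparities scaled by 16.
background_disparity = sgbm.compute(bg_l, bg_r).astype(np.float32) / 16.0
```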

Fig. 3. Left image and corresponding background disparity image

5 Background registration

During this evaluation different algorithms have been tested, starting with a method called modified codebook [4][5]. The quality of detecting objects and separating them from the background is good, but the computational effort is very high.

Codebook background registration time per frame:

Resolution 320x240: 49 ms
Resolution 640x480: 230 ms
Resolution 1280x1024: 1700 ms

The combination of the semi-global block matching disparity algorithm and background registration with the modified codebook results in a calculation time of 440 ms per frame at the intended resolution of 640x480 pixels. With these algorithms the average frame rate is about 2 fps, which is not acceptable for surveillance purposes.
This problem has been solved by evaluating a different type of background registration and disparity calculation. For the background registration an adaptive median background subtraction is used [6]. During a learning phase of 40 images the median of each pixel's history is calculated and used as the background. During the detection phase this background is subtracted from the actual image, and a threshold function is used to distinguish between foreground and background.
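The adaptive median update of [6] can be approximated as sketched below; the threshold value and the grab_left capture helper are assumptions.

```python
import cv2
import numpy as np

THRESHOLD = 30                  # assumed foreground threshold (not in paper)

def update_background(background, frame):
    """Approximate-median update: nudge each background pixel one gray
    level toward the current frame; over time this converges to the
    per-pixel median of the frame history."""
    frame = frame.astype(np.int16)
    background += (frame > background).astype(np.int16)
    background -= (frame < background).astype(np.int16)
    return background

def foreground_mask(background, frame):
    """Subtract the learned background and threshold the difference."""
    diff = cv2.absdiff(frame, background.astype(np.uint8))
    return np.where(diff > THRESHOLD, 255, 0).astype(np.uint8)

# Learning phase of 40 frames, then detection; grab_left is an
# assumed helper returning grayscale left-camera frames.
background = grab_left().astype(np.int16)
for _ in range(40):
    background = update_background(background, grab_left())
mask = foreground_mask(background, grab_left())
```

Because each update moves the background by at most one gray level per frame, slow illumination changes are absorbed while fast-moving objects remain foreground.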

6 Subtraction Stereo

The matching cost of the disparity calculation can be reduced by a method known as subtraction stereo [2]. Here the disparity calculation is not applied to the whole left and right image but to the stereo images after background registration; thus only areas which have changed and differ from the background enter the disparity calculation. This can be done in several ways. One, proposed in a paper by K. Umeda [2], calculates the horizontal distance of detected foreground objects. Evaluating this method results in noisy position estimates, caused by varying object sizes due to the background registration.

In this evaluation the method is modified so that only the left image undergoes background registration. After smoothing the foreground pixels with a procedure known as connected components [7], the foreground pixels are clustered and objects are identified. The bounding box of an identified object is used to cut a template out of the rectified left image. The next step is to search the image data of the right image for this template. The search can be limited to a rectangle which lies to the left of the object position in the left image, on the same horizontal line. The search process uses normed correlation.


R(x, y) = Σ P(x', y') · I(x + x', y + y') / sqrt( Σ P(x', y')² · Σ I(x + x', y + y')² )    (3)

(P = template patch, I = right search image; the sums run over all template pixels (x', y'))

The best correspondence is found by searching for the maximum within the result area. This position marks the center of the object in the right image, and the distance between the object centers in the right and left image represents the disparity.
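This modified subtraction stereo step could be sketched with OpenCV's connected components and normed-correlation template matching as follows; min_area and the exact extent of the search band are assumptions.

```python
import cv2
import numpy as np

def object_disparities(rect_l, rect_r, fg_mask, min_area=200):
    """For each foreground object in the left image, find its match in
    the right image and return (center_x, center_y, disparity)."""
    results = []
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue                           # suppress small noise blobs
        template = rect_l[y:y + h, x:x + w]
        # The matching object lies to the LEFT in the right image, on the
        # same rows; restrict the search to that horizontal band.
        band = rect_r[y:y + h, 0:x + w]
        score = cv2.matchTemplate(band, template, cv2.TM_CCORR_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(score)   # maximum = best match
        disparity = x - max_loc[0]             # horizontal shift in pixels
        cx, cy = centroids[i]
        results.append((cx, cy, disparity))
    return results
```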

From the disparity and the position of the object in the image, the 3D position of the object can be calculated. If the center of the object is tracked over several images, an x/y diagram of the projected path can be drawn (Fig. 7).

The object center position is calculated as follows (d = disparity; T = base distance of the cameras; f = focal length of the cameras in pixels; x, y = pixel coordinates of the object center relative to the image center; X, Y, Z = position of the object center in 3D space):

Z = (f · T) / d    (4)
X = (x · Z) / f    (5)
Y = (y · Z) / f    (6)
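Eqs. (4)-(6) translate directly into code. The rig values T = 413 mm and f = 530 pixels are taken from Section 2; measuring x and y from the image center is an assumption about the coordinate convention.

```python
# Back-project an object center to 3D using Eqs. (4)-(6).
T_MM = 413.0        # base distance in mm (Section 2)
F_PX = 530.0        # focal length in pixels (Section 2)

def center_to_3d(u, v, disparity, img_w=640, img_h=480):
    """u, v: pixel position of the object center; returns X, Y, Z in mm."""
    x = u - img_w / 2.0            # coordinates relative to the image center
    y = v - img_h / 2.0
    Z = F_PX * T_MM / disparity    # Eq. (4)
    X = x * Z / F_PX               # Eq. (5)
    Y = y * Z / F_PX               # Eq. (6)
    return X, Y, Z
```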

The calculated position is inserted in the display (Fig. 4). Additionally, certain points can be selected by the user in the right and left image to determine the coordinates of these points. This allows distances and positions of reference points within the field of view to be measured (Fig. 5).

Fig. 4. Left and right image with object detection and position

Fig. 5. Distance measurement


7 Object tracking

The center position of a registered object (Fig. 6) is tracked, which allows the path of tracked objects to be drawn in an x/y diagram (Fig. 7).

Fig. 6. Frame of object tracking video with position of center

This diagram (Fig. 7) shows that the person first walked in the z direction away from the camera (blue) and then walked towards the camera in a wiggly line (red). Some positions are still not correctly recognized; this has to be improved by filtering. The resolution changes with distance; at 20 m distance the resolution is 75 cm.

Fig. 7. Object center tracking projection (x/y in dm)


8 Object measurement

From the center coordinates the position of the center can be calculated. The speed calculation uses the center positions, determines the distance between the centers of neighboring frames, and measures the time between frames (Fig. 8). The height and width of objects can be calculated as an estimate or, with more computational effort, in detail.
The estimate assumes that the object is far away and that all points of the object have nearly the same distance to the camera. If the calculation has to be more detailed, all individual measuring points have to undergo a disparity detection similar to the center disparity detection.
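A sketch of these estimates; the helper names and the use of the 20 fps target as the frame time are assumptions.

```python
import math

FPS = 20.0                      # target frame rate (Section 1)

def object_speed(center_prev, center_curr, fps=FPS):
    """Speed in mm/s from the 3D centers of two neighboring frames."""
    return math.dist(center_prev, center_curr) * fps

def object_size(width_px, height_px, Z, f=530.0):
    """Far-object estimate: all object points are assumed to share the
    center depth Z, so pixel extents scale with Z / f; returns
    (width, height) in mm."""
    return width_px * Z / f, height_px * Z / f
```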

Fig. 8. Detected object with distance, height, width and speed

The error of the distance measurement has been verified against a laser measuring device and lies below 10%.

9 Algorithm

In the complete algorithm, left and right images are acquired, and during a learning phase of 100 images a dense disparity map of the background is produced with a running-average process and the semi-global block matching method.

Moving objects are detected in both images with the adaptive median background subtraction method; detected pixels are assigned to objects with the connected components method. This produces two images holding the detected objects.

A second disparity map is produced from the two images with the detected objects. This disparity map is calculated by evaluating the horizontal difference of the object centers in the left and right image.
A third disparity map is calculated using the subtraction stereo method described in Section 6.

The first disparity map represents the background and can be used to measure the distance between the objects and the background.

The second disparity map decides whether an object is visible in both images and whether a detailed detection is reasonable.

The third disparity map is used to calculate object properties such as center, size, and speed (Fig. 9).

Fig. 9. Object detection algorithm

The computational cost of this algorithm for up to three objects is 50 ms per frame at a resolution of 640x320 pixels, which is sufficient for real-time use. The calculation time depends on the object size and on the number of objects.

10 Discussion

These methods allow a stable localisation of objects. Based on this work, improvements such as better shadow removal [3], multi-object tracking, and identifying multiple objects in a crowd are under consideration. To achieve a better distance resolution the cameras should be mounted at a greater distance, which complicates the calibration. As calibration is crucial for the calculation accuracy, methods for automatic calibration and recalibration should be developed.

11 Acknowledgments

The work reported in this article has been done within the framework of the European FP7-SEC project INDECT (http://www.indect-project.eu).

12 References

1. Hirschmüller H., Scharstein D.: Evaluation of Stereo Matching Costs on Images with Radiometric Differences. IEEE Transactions on Pattern Analysis and Machine Intelligence (2008)
2. Umeda K., et al.: Subtraction Stereo - A Stereo Camera System That Focuses on Moving Regions. Proceedings of SPIE-IS&T Electronic Imaging, Vol. 7239, Three-Dimensional Imaging Metrology (2009)
3. Terabayashi K., et al.: Improvement of Human Tracking in Stereoscopic Environment Using Subtraction Stereo with Shadow Detection. International Journal of Automation Technology, Vol. 5, No. 6, pp. 924-931 (2011)
4. Kollmitzer C., Weichselbaum J., Hager C.: Background Modeling by Combining Codebook Method and Disparity Maps. Proceedings of MCSS (2010)
5. Kim K., Chalidabhongse T.H., Harwood D., Davis L.S.: Real-Time Foreground-Background Segmentation Using Codebook Model. Real-Time Imaging (2005)
6. Lo B., Velastin S.: Automatic Congestion Detection System for Underground Platforms. Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video, and Speech Processing, pp. 158-161, Hong Kong (2001)
7. Bradski G., Kaehler A.: Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Sebastopol, CA (2008)
