John Martin
Introduction
As automotive traffic patterns in many large cities grow more complex, the study of these
traffic flows requires new tools which can bridge the gap between treating traffic flows as
a statistical fluid and examining individual driver behavior. Presented here is a tool
which can track individual vehicles as they pass an overhead visible light camera. The
result of such tracking will be a set of trajectories from which human driver behavior
and vehicle interaction may be studied.
This paper will describe the image pattern recognition techniques used to observe
vehicles as well as the tracking techniques used to tie each frame’s pattern recognition
results together. In addition, we will discuss reasons for moving away from the motion
segmentation techniques commonly used in the past as well as the current limitations of
applying these techniques.
Existing Systems
Previous research in automotive tracking systems has not been completely successful. In
[6] Kalman-Snakes provide automobile contour tracking after an initial motion
segmentation step. The authors of [2] use block matching to find optical flow, and a
priori knowledge of the road geometry is used to handle stationary vehicles. In [14]
background estimation isolates foreground objects as “blobs”, and principal component
analysis is then used to classify the blobs and estimate their orientation.
Finding Cars by Appearance
Since we have dismissed motion segmentation as unreliable for this application, the
traffic tracker must be able to identify a certain pattern of pixels as being the image of a
vehicle. Ideally, one could set up a lookup table which would contain every pixel pattern
possibility coupled with a binary value indicating the pattern’s class, either “vehicle” or
“no vehicle”. Classification would be a simple matter of retrieving the binary value
residing at the current pattern's location in the lookup table. Unfortunately, aside from the
difficulties of training such an ideal classifier, no computer, current or future, has
enough memory for a lookup table with 256^6400 entries (an 80x80 pixel image fragment
with 256 discrete levels per pixel) [9].
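The size of this hypothetical table is easy to check; a quick Python sketch, purely to illustrate the scale:

```python
# One lookup-table entry per possible pattern: an 80x80 fragment has 6400
# pixels, each with 256 possible values, giving 256**6400 distinct patterns.
n_patterns = 256 ** (80 * 80)
print(f"256^6400 has {len(str(n_patterns))} decimal digits")
```

The count has more than fifteen thousand decimal digits, far beyond any conceivable memory.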
Because this “ideal” pattern classifier is not possible to implement, a more realistic
pattern classifier might search for simple features which are specific to images of
vehicles. Unfortunately, we have not been able to find simple, specific vehicle features
which are invariant between vehicle types, reflectivity, orientation and lighting.
Our initial attempts at solving this vehicle recognition problem used geometric primitive
templates. This earlier algorithm discovered edges of cars in the input image with the
Canny edge detector [22], and used those edges to fit ellipses [21]. If the ellipses were
the correct length and width for a vehicle, a vehicle was assumed to exist at the center of
the ellipse. Unfortunately, the resulting algorithm proved unwieldy and highly dependent
on the setting of thresholds. With proper tweaking, the algorithm could work for short
periods of time on example video, but, generally, the algorithm would fail when
presented with geometries and lighting different from the situation for which the various
thresholds were set. As an example, closely packed vehicles at traffic light queues made
it particularly difficult to assign distinct, correct ellipses. Also, a shadow could eliminate
the distinct edges on one side of the vehicle and thereby change the geometry of the
vehicle significantly. The ellipse fitting experiments revealed that using simple templates
is a problem because there is no systematic method of extending simple templates to new
geometries and lighting conditions.
A pattern classifier can be used as a scanner which moves across an input image,
classifying an 80x80 sub image at each image location. Figure 1 attempts to illustrate the
scanner. Each sub image selected by the scanner is then fed into a classifier
which decides whether there is a car centered in the sub image. Figure 2 shows several
80x80 sub images which do have cars at their center.
Figure 1 The scanner selects every 80x80 square sub image from the original image and tries to
discover whether there is a car in the middle of each sub image.
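The scanning procedure described above might be sketched as follows; the step size and the `classify` callback are placeholders, not the paper's implementation:

```python
import numpy as np

def scan(image, classify, win=80, step=4):
    """Slide a win x win window across a 2-D grayscale image and return the
    centers of windows the classifier labels as 'car'.  `classify` is any
    function mapping a win x win array to True/False; `step` trades
    thoroughness for speed."""
    h, w = image.shape
    hits = []
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            if classify(image[y:y + win, x:x + win]):
                hits.append((x + win // 2, y + win // 2))
    return hits
```

A toy `classify` such as `lambda sub: sub.mean() > 128` exercises the loop; the real classifier is the subject of the sections that follow.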
Support Vector Machines
In the first attempt to apply pattern recognition techniques to this problem, we used a
support vector machine (SVM) [17] to indicate whether a certain set of pixels could be
classified as a vehicle. This SVM classifier was then applied to each 80x80 sub image
within the original image to find car image locations.
This method showed promise; one could even say that it "worked". The primary
drawback was that a single computer could require more than a hundred seconds to scan
one frame of size 768x200 for vehicles. The use of multiple networked computers
brought the frame evaluation time down below ten seconds, but the complexity of
messaging across a network made this solution an undesirable one.
The SVM classifier was trained with approximately 300 features selected from a sub
image’s Haar wavelet coefficients. These features were selected manually, with no
algorithmic guidance other than generally picking low frequency coefficients in the
vicinity of the center of each sub image under consideration. In short, these 300 features
were selected by best guess. Unfortunately, leaving feature selection to “best guess” is
not a particularly good way of designing a classifier. Also, selecting features from a
complete basis set such as the Haar basis is limiting. An overcomplete basis set provides
a richer set of features to choose from.
Viola [23] also discusses how to "cascade" simple classifiers such that certain regions of the
input image are eliminated from consideration early in the classification process. Because
no further computational resources are devoted to these eliminated regions, this early
elimination can produce significant increases in classification speed.
Viola's boosting procedure reweights the training set so that training set patterns
which are already "taken care of" by previously trained classifiers do not heavily
impact the training of the current weak classifier.
The integral image II contains, at each location (x, y), the sum of all pixels above and to
the left of that location:

II(x, y) = Σ_{x'=0}^{x} Σ_{y'=0}^{y} I(x', y')

The sum of the pixels within any rectangle with corners A, B, C and D may then be found
with four lookups:

Area = II(B) − II(C) − II(A) + II(D)

Figure 3
This speedy method of discovering the sum of pixels within a rectangle can be used to
evaluate a limited set of overcomplete operators which can be used as weak classifiers.
These operators may be thought of as match detectors. The filter kernel for several types
of these operators may be found in Figure 4, where gray regions are zero, white regions
are 1 and black regions are -1. After the integral transform has been used to find the sum
of pixels in both the white and black regions, the operator value is found by subtracting
the black region sum from the white region sum.
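The integral-image bookkeeping described above can be sketched in a few lines of numpy; the corner handling here assumes zero-based array indices rather than the corner labels of Figure 3:

```python
import numpy as np

def integral_image(img):
    """II(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] using four table lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

Evaluating a white-minus-black feature operator then costs a handful of lookups regardless of the rectangle sizes.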
Using the Features
For example, if one desired a simple classifier which identifies BMW logos (Figure 5)
within an input image, the first feature type shown in Figure 4 could be used.
Conveniently, this feature shares the “center of gravity symbol” appearance with the
BMW logo. The feature could be centered within the 80x80 input region, and the size
increased such that it looked like the feature operator shown in Figure 6.
Viola’s Classifiers
Viola's adaptation of AdaBoost provides a method of weighting and selecting these feature
operators such that a collection of the operators may classify a complex object, such as a
car, with a reasonable error rate.
Viola defines a simple classifier based on features evaluated with the integral
transform [23]. This weak classifier consists of:
h(x) = 1 if p f(x) < p θ
       0 otherwise
where h is the classifier, p is the parity, θ is the threshold, f is one of the integral
transform-type features discussed above, and x is the 80x80 sub image. The classifier
reports a value of 1 when it believes that a car has been found, and it reports a value of 0
when a car has not been found. Training a classifier which uses a given feature f consists
of discovering the threshold and parity which maximize its classification performance
within the training set.
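Training one such weak classifier amounts to a search over thresholds and parities; a minimal sketch, assuming the feature has already been evaluated on every training example:

```python
import numpy as np

def train_weak(feature_vals, labels, weights):
    """Pick the threshold theta and parity p minimizing the weighted error of
    h(x) = 1 if p*f(x) < p*theta else 0.
    `feature_vals` holds f(x) for every training example; `labels` are 0/1."""
    best = (np.inf, 0.0, 1)                  # (error, theta, parity)
    for theta in np.unique(feature_vals):
        for parity in (1, -1):
            pred = (parity * feature_vals < parity * theta).astype(int)
            err = weights[pred != labels].sum()
            if err < best[0]:
                best = (err, theta, parity)
    return best
```

On a toy feature where low values mean "car", the search recovers a zero-error threshold.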
• The final classifier is:

h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t
       0 otherwise

where α_t = log(1/β_t)
Note that the weight associated with each training example is decreased as weak
classifiers correctly classify the training example. In this way, as weak classifiers are
trained, they are less responsive to training examples which are already “covered” by
previously trained weak classifiers. Also note that the final classifier sums the weak
classifiers’ results with weights based on the weighted error of the classifier within the
training set.
Unlike the pseudo-code description above, the training code used in the car detection
application does not exhaustively search for the “best” weak classifier on each iteration
of the main “for” loop. It merely looks at several thousand random classifiers and selects
the best classifier. This short-cut substantially decreases training time. However, since
each iteration is not necessarily finding the best classifier, the final classifier’s
performance may suffer.
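The overall boosting loop, including the random-subset shortcut just described, might look like the following sketch over decision stumps. The function names, sample size, and epsilon guards are assumptions for illustration, not the author's code:

```python
import numpy as np

def boost(features, labels, rounds, sample=200, seed=0):
    """AdaBoost over decision stumps.  `features` is (n_examples, n_features)
    and `labels` are 0/1.  Each round examines only `sample` randomly chosen
    feature columns rather than all of them (the shortcut described above)."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    w = np.full(n, 1.0 / n)
    model = []                           # list of (alpha, (feature, theta, parity))
    for _ in range(rounds):
        w = w / w.sum()                  # normalize the example weights
        best = None
        cols = rng.choice(features.shape[1],
                          size=min(sample, features.shape[1]), replace=False)
        for j in cols:
            for theta in np.unique(features[:, j]):
                for p in (1, -1):
                    pred = (p * features[:, j] < p * theta).astype(int)
                    err = w[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, theta, p, pred)
        err, j, theta, p, pred = best
        beta = (err + 1e-10) / (1 - err + 1e-10)
        w = w * beta ** (pred == labels)  # shrink weights of covered examples
        model.append((np.log(1 / beta), (j, theta, p)))
    return model

def strong_classify(model, x):
    """1 if the alpha-weighted vote reaches half the total alpha (Viola's rule)."""
    total = sum(a for a, _ in model)
    score = sum(a for a, (j, t, p) in model if p * x[j] < p * t)
    return int(score >= 0.5 * total)
```

The weight-shrinking line implements the reweighting discussed above: examples the current stump gets right contribute less to the next round's error.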
Cascade of Classifiers
Viola’s feature evaluation is fast, but scanning an image with a 1000 feature classifier
remains time consuming. Thus, initial stages of classification are performed with
relatively inaccurate, but simple, collections of weak classifiers. Areas of the image
which are unlikely to correspond to a vehicle are rejected early in the process. For
instance, in the first stage of the classifier, there are only three features used, shown in
Figure 7. The resulting three feature classifier can find image regions which are not
likely to be vehicles as shown in Figure 8. The portions of the image which are not filled
in with red pixels are non-vehicle regions and these regions may be removed from further
consideration. After this initial classification step, more complex classifiers may then be
applied to areas of the image still under consideration. Figures 9,10 and 11 illustrate this
process. Each classifier stage becomes more complex and time consuming, but each
stage also eliminates sections of the image. The complex latter stage classifiers will
never see the regions of the image which are “easy” to dismiss as non-car. Depending on
the complexity of the image, large speed increases can be realized.
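The early-rejection structure is simple to express; `stages` and the toy stage functions in the usage below are hypothetical:

```python
def cascade_classify(window, stages):
    """`stages` is an ordered list of boolean classifier functions, cheapest
    and least accurate first.  A window must pass every stage to be declared
    a car; any stage may reject it, so most windows never reach the
    expensive later stages."""
    for stage in stages:
        if not stage(window):
            return False          # rejected early: no further work is spent
    return True
```

For example, with `stages = [lambda w: w > 0, lambda w: w > 10]`, a window failing the cheap first test never invokes the second.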
Figure 8 The red regions represent areas of the image which are still under consideration as “car-
like”. This is only a quick, first stage analysis of the image, which uses only the three features shown
in Figure 7.
Figure 9 The red regions represent areas of the image which are still under consideration as “car-
like”. This is the sixth stage of a 20 stage classifier. While some of the red regions are correct, there
is still considerable noise.
Figure 10 The red regions represent areas of the image which are still under consideration as “car-
like”. This is the eleventh stage of a 20 stage classifier. Much of the noise has been eliminated.
Figure 11 The red regions represent areas of the image which are still under consideration as “car-
like”. This is the final stage of a 20 stage classifier. While all the noise has been eliminated, the car
second from the front on the far side of the queue is no longer detected.
However, classifier stages with small numbers of operators cannot provide the necessary
classification accuracy. Thus, much of the classifier’s heavy-lifting is done in the
classifier’s later stages.
The positive training set remains constant for all classifier stages while the negative
training set consists of negative images which the previous stage failed to classify
correctly. The false positives of the previous stage are used to train the current stage such
that the classifier stages have varied “talents.”
As discussed above, using a support vector machine classifier [16] to scan the entire input
image is rather time consuming. However, a support vector machine classifier may be
employed as a final stage after less time consuming classifiers have, hopefully, classified
most of the image as "not car". The final support vector machine uses three hundred of Viola's
integral transform features to form a binary valued SVM input vector. The SVM was
tested on a labeled set of test images which were kept separate from the training images,
and its accuracy was 98%.
Say there are two classes of example patterns (vehicle or no vehicle, positive or negative,
for example). Each example pattern may be expressed as a vector and placed as a point
within a vector space shown below in two dimensions.
Figure 1 Classifying set A and set B
If the two classes of example patterns are separable, each class forms its own cloud of
points in the vector space and a plane may be drawn between the two clouds of points.
New example vectors are classified by evaluating the side of the separating plane on
which they lie. Implicit here is the assumption that vectors of the same class lie together
in their vector space.
In general, however, the pattern vector space is not limited to two dimensions as shown in
the above figure. Say there is an example set S = {(X_i, y_i)}_{i=1}^{m}, where each X_i is a
pattern vector of size n, there are m example vectors, and the y_i are simply labels which
indicate each vector's class, y_i ∈ {−1, 1}. The classifier then takes the following form:
Equation 1
f(X) = Σ_{i=1}^{m} λ_i y_i X_i^T X + b
f(X) = 0 is a hyper-plane separating the two classes of X_i. The λ_i's and the origin offset
b are selected during training such that the margin between the training set points and the
hyper-plane is maximized. For many vectors X_i, the corresponding scalar λ_i will be very
close to zero. These X_i may be neglected in the classifier's summation. The remaining
X_i, with non-zero λ_i, are called support vectors.
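Equation 1 translates directly into code; the support vectors, multipliers, and offset below are assumed to come from a prior training step:

```python
import numpy as np

def svm_decision(X, support_X, support_y, lam, b):
    """Equation 1: f(X) = sum_i lam_i * y_i * (X_i . X) + b.
    Vectors with lam_i = 0 may simply be omitted from the lists."""
    return sum(l * yi * np.dot(Xi, X)
               for l, yi, Xi in zip(lam, support_y, support_X)) + b

def svm_classify(X, support_X, support_y, lam, b):
    """Report the side of the separating hyper-plane on which X lies."""
    return 1 if svm_decision(X, support_X, support_y, lam, b) >= 0 else -1
```

With one support vector per class on either side of the origin, points are classified by which vector they lie nearer.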
When the two classes are not linearly separable, the pattern vectors may be projected
into a higher dimensional "feature" space, where a separating hyper-plane can often be
found. For instance, in the figure below, two classes (x’s and o’s) are placed in a 2D
space, and the classes are not linearly separable. However, one could imagine an ellipse
might be drawn such that the x’s are inside the ellipse and the o’s are outside.
One way of finding the separating ellipse would be to project these 2D vectors [z_1, z_2]
into a higher dimensional space [z_1, z_2, z_1^2, z_2^2, z_1 z_2] where this ellipse becomes
a hyper-plane. [16]
The major drawback to finding a hyper-plane in a higher dimensional space is that each
of the inner products in Equation 1 requires a number of multiplications equal to the
dimension of this higher dimensional feature space. This problem would seem to limit
the dimensionality of the feature space. However, most projections into the feature space
are accomplished with an implicit mapping expressed as a kernel function which defines
the inner product between two vectors in the feature space:
Equation 2
K(X, Z) = φ(X) · φ(Z)
where K is the kernel function and the vector valued function φ () is the mapping from
the original input space to feature space. K is selected such that φ () is complex enough
to possibly separate linearly inseparable classes, while K itself is kept reasonably simple
and relatively computationally non-intensive.
Equation 3
f(X) = Σ_{i=1}^{m} λ_i y_i K(X_i, X) + b
Notice how the kernel function in equation 3 takes care of evaluating the inner product in
feature space. The mapping function φ () does not need to be evaluated at all. [16]
There are two primary criteria used when selecting an SVM kernel function:
1) The kernel function must provide a rich feature space [7]
2) The kernel function must be computationally non-intensive
Equation 4
( n ,n )
K ( X , Z ) = ( X • Z )2 = ∑ (x
( k , j ) = (1,1)
k x j )( z k z j ) = φ ( X ) • φ ( Z )
Here, from Equation 4, we can see that this kernel makes φ(X) = (x_k x_j)_{(k,j)=(1,1)}^{(n,n)},
where x_k is the kth element of X. In this particular case, the mapping function φ provides
an n^2 dimensional space in which to find a separating hyper-plane, while computing an inner
product in this n^2 dimensional space only requires n multiplications for the inner product
and one more to square the result.
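The identity in Equation 4 can be checked numerically; `phi` below is the explicit, and expensive, feature map that the kernel avoids evaluating:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map: all n^2 pairwise products x_k * x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# n multiplications plus one squaring on the left ...
lhs = np.dot(x, z) ** 2
# ... equals an n^2-dimensional inner product on the right
rhs = np.dot(phi(x), phi(z))
assert np.isclose(lhs, rhs)
```

The two sides agree to machine precision for any pair of vectors, which is exactly why φ() never needs to be computed.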
The complexity of the kernel is only one factor affecting the amount of time it takes to
classify a certain input vector. Additionally, the number of λ_i's, m, linearly affects the
amount of time it takes to evaluate the classification function. Fortunately, the set of
support vectors discovered during training is not necessarily the smallest set of vectors
which can describe the decision surface f ( X ) = 0 .
If the quadratic, homogeneous, kernel is used in the classifier in equation 3, the following
expression results for the SVM classifier:
Equation 5
f(X) = Σ_{i=1}^{m} λ_i y_i (X_i^T X)^2 + b
which may be expressed in matrix form as [see Appendix A]:
Equation 6
f ( X ) = X T AX + b
Equation 8

A_uv = Σ_{i=1}^{m} λ_i y_i x_iu x_iv
With this exact simplification method, a quadratic, homogeneous kernel classifier may be
greatly simplified by finding the eigenvectors and eigenvalues of the symmetric matrix
A . However, this method works only on quadratic, homogeneous kernels, which may or
may not offer the mapping and the feature space complexity required.
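The matrix form can be verified numerically, and the eigendecomposition shows how the summation collapses to at most n terms. All the values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
support_X = rng.normal(size=(5, 3))              # hypothetical support vectors
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])        # their labels
lam = np.array([0.2, 0.4, 0.1, 0.3, 0.4])        # their multipliers
b = -0.1

# A_uv = sum_i lam_i * y_i * x_iu * x_iv  -- the matrix of Equation 6
A = sum(l * yi * np.outer(Xi, Xi) for l, yi, Xi in zip(lam, y, support_X))

X = rng.normal(size=3)
f_kernel = sum(l * yi * np.dot(Xi, X) ** 2
               for l, yi, Xi in zip(lam, y, support_X)) + b
f_matrix = X @ A @ X + b
assert np.isclose(f_kernel, f_matrix)

# A is symmetric, so its eigendecomposition rewrites f(X) with at most
# n terms (here 3), no matter how many support vectors there were.
w, V = np.linalg.eigh(A)
f_eig = sum(wi * np.dot(vi, X) ** 2 for wi, vi in zip(w, V.T)) + b
assert np.isclose(f_matrix, f_eig)
```

The last few lines are the substance of the simplification: the cost of evaluating the classifier drops from m kernel evaluations to at most n squared projections.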
After a frame is scanned, some locations within the image are suspected of being cars.
However, there are many more suspected vehicle locations than there are vehicles in the
image. These suspected vehicle locations must be brought together in clusters, such that
there is, ideally, only one measurement for each vehicle.
The clustering method employed in this application uses a Delaunay [25] triangulation.
The locations within the image which are suspected cars are triangulated as shown in
Figure 12. After triangulation, triangles which have a large area or are long and thin are
discarded, and small, well-behaved triangles are kept as vehicle locations. The centroids
of the resulting polygons are considered to be the final measurement of the vehicle's
location. The area of the resulting measurement polygon is considered to be the
measurement’s confidence because actual vehicles tend to produce many vehicle
indicators, while false positives tend to be isolated.
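A sketch of this triangulate-and-filter step using scipy's Delaunay implementation; the area and edge-length thresholds are arbitrary placeholders:

```python
import numpy as np
from scipy.spatial import Delaunay

def cluster_detections(points, max_area=200.0, max_edge=30.0):
    """Triangulate suspected vehicle locations; keep the vertices of small,
    well-shaped triangles and drop isolated hits and long, thin triangles."""
    points = np.asarray(points, float)
    tri = Delaunay(points)
    keep = set()
    for simplex in tri.simplices:
        a, b, c = points[simplex]
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (b[1] - a[1]) * (c[0] - a[0]))
        longest = max(np.linalg.norm(b - a), np.linalg.norm(c - b),
                      np.linalg.norm(a - c))
        if area <= max_area and longest <= max_edge:
            keep.update(int(i) for i in simplex)
    return points[sorted(keep)]
```

A tight cluster of hits survives while an isolated false positive, connected only by long triangle edges, is discarded.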
Tracking Vehicles
A Kalman filter [26] is initialized for each untracked measurement which
exceeds confidence thresholds. This Kalman filter is updated with measurements as long
as a measurement appears within some distance of the Kalman filter’s current state. This
track-to-measurement association window varies in size depending on the confidence in
the Kalman filter’s internal states. A Kalman filter instance which has not received an
update for several frames has a larger track-to-measurement association window than a
Kalman filter which received an update in the previous frame.
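A minimal constant-velocity Kalman track with a covariance-driven gate might look like this; the noise levels and the trace-based gate formula are assumptions for the sketch, not the values used in the paper:

```python
import numpy as np

class Track:
    """Constant-velocity Kalman filter for one vehicle.  The association
    gate grows with the state covariance, so a track that has coasted for
    several frames accepts measurements from farther away."""
    F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # state transition
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # position is measured

    def __init__(self, z):
        self.x = np.array([z[0], z[1], 0.0, 0.0])       # position + velocity
        self.P = np.eye(4) * 10.0

    def predict(self, q=1.0):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + np.eye(4) * q

    def gate(self, scale=3.0):
        """Association window radius, grown from the position covariance."""
        return scale * np.sqrt(np.trace(self.P[:2, :2]))

    def update(self, z, r=1.0):
        S = self.H @ self.P @ self.H.T + np.eye(2) * r
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Coasting through `predict` without an `update` inflates P, widening the gate exactly as described above; an update shrinks it again.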
Track-to-Measurement Association
For each frame, there is a set of measurements which must be associated with a set of
existing tracks. While some methods attempt to use probabilistic methods of associating
measurements with tracks [20], this application uses a simpler method which requires
that each track either be associated with a single measurement, or not associated with a
measurement at all. The association algorithm first tries to group tracks with high-
confidence measurements. If a track is near one of the high confidence measurements, it
is updated with that measurement. The remaining measurements are then placed in
confidence groups and applied to the remaining, unassociated tracks until there are no
further measurements. This ordering of measurements gives a higher priority to high-
confidence measurements, and it seeks to avoid a situation where a track is updated with
the closest measurement without regard to the measurement’s confidence.
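The confidence-ordered greedy association can be sketched as follows, here in one dimension and with a fixed gate for brevity:

```python
GATE = 5.0   # association window, in pixels (placeholder value)

def associate(tracks, measurements):
    """Greedy, confidence-ordered association.  `tracks` are 1-D positions
    for the sketch; `measurements` is a list of (position, confidence).
    Each measurement, highest confidence first, claims the nearest
    still-unassociated track within the gate."""
    pairs, free = [], set(range(len(tracks)))
    for pos, conf in sorted(measurements, key=lambda m: -m[1]):
        best, best_d = None, None
        for i in free:
            d = abs(tracks[i] - pos)
            if d <= GATE and (best is None or d < best_d):
                best, best_d = i, d
        if best is not None:
            free.discard(best)
            pairs.append((best, pos))
    return pairs
```

Because high-confidence measurements are consumed first, a track is never captured by a nearby low-confidence measurement when a better one is available.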
Tracks which have not been updated by an actual measurement use their internal states
and the Kalman filter’s system model to find their location in the next frame.
Track Criteria
If a track does not meet certain geometric criteria, it is judged to have not come from a
vehicle, and it is dropped. Specifically, tracks must have a minimum length and a
minimum extent. Length is the linear distance between the start and end. Extent is the
size of the smallest rectangle which can be drawn around the track. If a track lies outside
the preset geometric parameters, it is rejected.
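These two geometric tests are straightforward; the thresholds below, and the use of the bounding rectangle's diagonal as its "size", are illustrative assumptions:

```python
import numpy as np

def track_ok(points, min_length=20.0, min_extent=15.0):
    """Reject tracks that are too short or too confined.  `points` is an
    (n, 2) sequence of positions; length is the start-to-end distance, and
    extent is taken here as the diagonal of the axis-aligned bounding
    rectangle around the track."""
    pts = np.asarray(points, float)
    length = np.linalg.norm(pts[-1] - pts[0])
    extent = np.linalg.norm(pts.max(axis=0) - pts.min(axis=0))
    return length >= min_length and extent >= min_extent
```

A track that merely jitters around one spot fails both tests and is dropped.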
World Coordinates
Measurements taken from an image are not terribly useful unless there is a method of
associating a location in world coordinates with each pixel location in the image. It is
necessary to find a function which can transform an image location in pixels to a world
location in feet or meters, regardless of lens distortions. This process of finding an
image-to-world transformation function is often referred to as “calibrating” a camera, and
the image-to-world function and its parameters are often called a “camera calibration”.
Camera Calibration
There are two general types of camera calibration parameters, extrinsic and intrinsic.
Intrinsic calibration parameters are parameters which describe the distortions intrinsic to
the camera, while extrinsic parameters describe the orientation of the camera with respect
to the world coordinates. A commonly used method for finding intrinsic camera
parameters is described in [27]. This camera calibration method is encapsulated in an
easy-to-use Matlab toolbox [28] which requires the user to take images of a checkerboard
pattern held in front of the camera at various angles. The Matlab script then calculates
the intrinsic camera parameters.
After the intrinsic camera parameters are found, it is necessary to find the extrinsic
parameters of the camera. While it is assumed that the location of the camera has been
surveyed by GPS, the orientation of the camera is not immediately known and must be
calculated by knowing the correspondence between world and image coordinates for
three points in the image. Since the intrinsic camera parameters are already known,
accurate vectors in a camera-centered coordinate system may be found for each of the
three surveyed image locations under consideration. If the world locations for each of the
three image locations have been surveyed by GPS, a linear transformation R may be
found such that:
V_w^i = R V_c^i

where V_w^i represents the ith unit vector which extends from the camera's focal point
towards the surveyed object in world coordinates, and V_c^i represents the very same vector
expressed in camera coordinates. R is a 3x3 matrix which represents the rotation
between the world coordinate system and the camera coordinate system. Suppose that
three world coordinate vectors exist such that:

V_w = [V_w^1  V_w^2  V_w^3]

and three camera coordinate vectors form a matrix such that:

V_c = [V_c^1  V_c^2  V_c^3]

then:

V_w = R V_c

and if the vector sets contained in the matrices V_w and V_c are not coplanar, R may be
found by:

R = V_w V_c^{-1}
The rotation matrix R should be very nearly orthonormal. Indeed, a reasonable sanity
check on the result could consist of checking that the dot products of pairs of matrix
columns are close to zero. One could also check whether the norm of each column is close
to unity. There are several reasons why R may not be orthonormal; a non-exhaustive list
follows:
• The surveyed vectors were not normalized
If greater accuracy is desired, more than three surveyed points may be used. Since
Vw = RVc is overdetermined for more than three vectors, least-squares or some other
minimization method may be employed.
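The least-squares solution for R, together with the orthonormality sanity checks mentioned above, can be sketched with numpy; the rotation and ray directions below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
a = 0.3                                   # a made-up camera orientation
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])

# Unit vectors toward four surveyed points, expressed in camera coordinates
Vc = rng.normal(size=(3, 4))
Vc /= np.linalg.norm(Vc, axis=0)
Vw = R_true @ Vc                          # the same rays in world coordinates

# Vw = R Vc is overdetermined for more than three points, so solve
# Vc.T @ R.T = Vw.T in the least-squares sense
R = np.linalg.lstsq(Vc.T, Vw.T, rcond=None)[0].T

# The sanity checks from the text: unit columns, mutually orthogonal columns
assert np.allclose(np.linalg.norm(R, axis=0), 1.0, atol=1e-6)
assert np.allclose(R.T @ R, np.eye(3), atol=1e-6)
```

With noisy survey data the asserts would need looser tolerances; large violations would indicate one of the problems listed above.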
The intrinsic calibration provides the following parameters:
• principal point in x
• principal point in y
• focal length x
• focal length y
• four radial distortion coefficients
C code for using the above parameters may be found in Intel’s OpenCV library. The
OpenCV function "icvNormalizeImagePoints" will use the above parameters to convert
an image point (x_i, y_i) to a normalized image point (x_c, y_c). In this case "normalized"
means that the original image point, (x_i, y_i), has been transformed as if the original
image had been captured with an imaginary, ideal, pinhole camera with unity focal
length. "icvNormalizeImagePoints" will return a vector V_c for image point (x_i, y_i).
V_c = (x_c, y_c, 1) is a vector which points from the camera's focal point to the object in the
real world which had been represented in the original image by the pixel location (x_i, y_i).
If we transform Vc such that it is aligned with the standard world coordinate axes, we
have:
Vw = RVc
where the rotation matrix R is found as described above. The world coordinate
representation of the original image point (x_i, y_i) may be found by finding the intersection
of V_w with the ground plane.
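Intersecting V_w with the ground plane reduces to one line of algebra, assuming the ground plane is z = 0 in world coordinates and the camera position is known:

```python
import numpy as np

def ground_point(Vw, camera_pos):
    """Intersect the ray from camera_pos along Vw with the plane z = 0.
    Requires Vw to point downward (Vw[2] < 0 for a camera above ground)."""
    t = -camera_pos[2] / Vw[2]            # camera height over downward rate
    return camera_pos + t * Vw
```

For a camera 10 units above the ground, `ground_point(np.array([1., 0., -1.]), np.array([0., 0., 10.]))` lands 10 units out along x, at (10, 0, 0).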
One of the primary limitations of the tracking algorithm is the pattern recognition. The
pattern classifier has been primarily trained on the data collected during the
November/December 2001 test at the intersection of Refugee Road and Winchester Pike
in Columbus, Ohio. The degree to which the vehicle classification depends on the
specific backgrounds and lighting conditions encountered during this period of operation
is unknown.
Figure 13 Digital photograph classifier results from an Ohio State U. parking garage
The 80x80 image segments, which are fed into the classifier, are large enough to
encompass large amounts of background. This large classification window is necessary
because the vehicles often do not appear with significant amounts of detail (dark vehicle
casting a long shadow, for instance). In these situations recognizing a vehicle is difficult,
even for a human, without significant amounts of context on either side of the vehicle.
This means that the recognition is very background dependent. For instance, a dark blob
encountered within lane markers might be classified as a vehicle while a dark blob
encountered elsewhere is not. However, a more robust system might rely more on
internal vehicle details (windows, wheels) than on the more general vehicle body.
Large trucks remain a problem. Some large trucks are classified correctly; a light colored
truck cab with dark windows will generally result in a correct classification, as shown in
Figure 14. However, in general, trucks are not classified correctly. Also, a large
semi-tractor trailer may obscure several cars from the camera's view; in these cases, the
track is almost always lost or led astray.
All truck detections are generalizations from cars. Trucks were not included in the
training set, in either the positive or negative classes. Thus, large trucks and their trailers
may produce many false positives. If these false positives are persistent enough,
trucks may cause spurious tracks which may need to be deleted later. At some point, a
classifier designed specifically for trucks may need to be designed.
Recognition Limitations
Although the pattern recognition-based traffic tracking implemented here is more robust
than techniques such as background estimation, the pattern recognition can be easily
misled. Different lighting conditions may still cause objects within a frame of video to
have an appearance very different from any of the training data. While the goal of the
pattern recognition is to generalize from the training set such that never-before-observed
vehicles are recognized as vehicles, lighting and shadows may still cause a recognition
error.
Figure 15 Dazzle paint camouflage.
Figure 15 shows a ship painted in a type of wartime camouflage which attempts to
disrupt recognition of the object, rather than trying to blend in with the background.
Normally, the human eye could detect the above ship many miles away, and both the type
and direction of the vessel would be easily discernible. However, the dazzle paint
camouflage disrupts the lines which normally encode information about the type and
direction of the vessel. The success and widespread use of such camouflage prior to the
advent of radar shows that in some cases, even human pattern recognition is not
necessarily robust.
Figure 16 shows a car tracking situation where the false positives caused by shadows are
a significant problem. The top image of Figure 16 shows regions of the image which are
recognized as cars, while the bottom image shows the same scene with the current tracks
overlaid. The shadow cast by the upper portion of the lower right hand
utility pole is interpreted strongly, and wrongly, as a car. This mistaken measurement is
labeled "1" in the upper image while the corresponding track is labeled "2721" in the
lower image. In addition, the group of bushes next to the gas station in the middle of the
image is being tracked as a vehicle, and the bushes on the right side of the screen are
being tracked as a vehicle as well. The top of the lower right utility pole is generating a
false positive. Finally there are two false positive measurements in the upper left of the
image. Generally, classification performance seems to decline in the presence of strong
sunlight and shadows. Figure 17 shows the same scene under diffuse lighting conditions.
One can see that the incidence of false positives has declined.
One possible reason for the difference in classification performance between direct and
diffuse lighting is that there are fewer ways of lighting an object with diffuse lighting
than with direct lighting. In other words, the specification of a direct lighting source
requires a direction and an intensity, while the specification of a diffuse lighting source
requires only an intensity. Thus, an object viewed under direct lighting has more
variation in its appearance than an object viewed under diffuse lighting. The shadows
cast by other scene objects are another major problem with direct lighting. Some
algorithms may attempt to eliminate some of the lighting variation by employing
techniques such as brightness plane subtraction. Currently, however, no such algorithms
are being used for this application.
Figure 16 Tracking under sunny lighting conditions
Possible Improvements
There are three primary ways in which the system’s performance may be improved and
generalized. First, one could place recently discovered false negative images into the
positive training set. This would allow car images which were mistakenly identified as
"not-car" to be incorporated into the classifier's training. Similarly, one could place
newly discovered false positives into the negative training set. These additions to the training
images would allow the classifier to learn from its mistakes. Second, tracking could be
improved by using the probabilistic data association techniques discussed in [20].
Currently, the data association is “nearest neighbor” with some attempts to favor high
confidence measurements. Third, one could try to find a method of decreasing the
lighting variation before even applying the classifier. If there were less lighting variation,
the classifier’s task would not be as difficult.
Conclusion
This vehicle classifier and tracker provide a method of studying traffic flow which has
not previously existed. At the present time, there is no other means of finding multiple
vehicle trajectories in complex traffic situations. The inherent flexibility and extensibility
of the pattern classifier at the heart of the system speaks well for operation in varied
environments.
References
[1] M.S. Bartlett, H.M. Lades, and T.J. Sejnowski. Independent component representations for
face recognition. In Proceedings of the SPIE Conference on Human Vision and Electronic
Imaging III, volume 3299, 1998
[2] F. Bartolini, V. Capellini, and C. Giani. Motion estimation and tracking for urban
traffic monitoring, International Conference on Image Processing, volume 3, 1996
[3] C.J.C. Burges. Simplified support vector decision rules. In International Conference on
Machine Learning, 1996
[4] R. Collobert and S. Bengio. SVMTorch: Support Vector Machines for Large-Scale
Regression Problems. Journal of Machine Learning Research, 1:143-160, 2001
[5] T. Downs, K.E. Gates, A. Masters. Exact Simplification of Support Vector Solutions,
In Journal of Machine Learning Research 2 2001
[6] D. Koller, J. Weber and J. Malik, Towards realtime visual based tracking in cluttered
traffic scenes, In: Proc. of the Intelligent Vehicles Symposium 1994, October 1994, Paris,
France
[9] H. Schneiderman and T. Kanade Object Detection Using the Statistics of Parts, In
International Journal of Computer Vision 2002
[10] R.A. Singer Estimating optimal tracking filter performance for manned
maneuvering targets, IEEE Tran. Aerospace Electronics Systems, 1971
[11] Z Sun, G. Bebis, and R. Miller, “Quantized wavelet features and support vector
machines for on-road vehicle detection,” The Seventh International Conference on
Control, Automation, Robotics and Vision, December, 2002, Singapore
[12] R.Y. Tsai. “A versatile camera calibration technique for high-accuracy 3D machine
vision metrology using off-the-shelf TV cameras and lenses,” IEEE Journal of Robotics
and Automation, Vol RA-3, No. 4, August 1987, pages 323-344
[13] M. Turk and A. Pentland. Eigen Faces for Recognition. Journal of Cognitive
Neuroscience, 3(1), 1991.
[14] H. Veeraraghaven, O. Masoud, and N. Papanikolopoulos. Managing Suburban
Intersections Through Sensing. Technical Report, Intelligent Transportation Systems
Institute, University of Minnesota, December 2002
[15] F. Rosenblatt “The Perceptron: A Probabilistic Model for Information Storage and
Organization in the Brain,” Cornell Aeronautical Laboratory, Psychological Review
1958, v65, No. 6
[17] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York 1995
[19] C. Rasmussen and G.D. Hager “Probabilistic Data Association Methods for Tracking
Complex Visual Objects” IEEE Trans. on Pattern Analysis and Machine Intelligence,
June 2001
[20] Y. Bar-Shalom and T. Fortmann. Tracking and Data Association. Academic Press,
1988
[21] A. W. Fitzgibbon and R.B. Fisher. “A Buyer’s Guide to Conic Fitting”. Proc. 5th
British Machine Vision Conference, Birmingham 1995
[23] P. Viola and M. Jones, Robust Real-Time Object Detection. International Journal of
Computer Vision, 2002
[25] L.J. Guibas and J. Stolfi. Primitives for the Manipulation of General
Subdivisions and the Computation of Voronoi Diagrams. ACM Transactions on Graphics,
4(2):74-123, April 1985
[26] R.E. Kalman. A New Approach to Linear Filtering and Prediction Problems.
Transactions of the ASME, Journal of Basic Engineering, vol. 82, pp. 35-45, 1960
[27] Z. Zhang. “A Flexible New Technique for Camera Calibration.”, IEEE Trans. on
Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000.
[28] Jean-Yves Bouguet, Camera Calibration Toolbox for Matlab, April 2002. Available:
http://www.vision.caltech.edu/bouguetj/calib_doc
Appendix A
Starting from the classifier with the homogeneous quadratic kernel (Equation 5):

f(X) = Σ_{i=1}^{m} λ_i y_i (X_i^T X)^2 + b

Writing each squared inner product as an inner product between n^2-element vectors of
pairwise coordinate products:

f(X) = Σ_{i=1}^{m} λ_i y_i [(x_iu x_iv)_{(u,v)=(1,1)}^{(n,n)}] · [(x_u x_v)_{(u,v)=(1,1)}^{(n,n)}] + b

Since the second factor does not depend on i, the summation may be moved inside the
first factor:

f(X) = [Σ_{i=1}^{m} λ_i y_i (x_iu x_iv)_{(u,v)=(1,1)}^{(n,n)}] · [(x_u x_v)_{(u,v)=(1,1)}^{(n,n)}] + b

which is the quadratic form

f(X) = X^T A X + b

where

A_uv = Σ_{i=1}^{m} λ_i y_i x_iu x_iv