Full Text 01

Institutionen fr systemteknik
Department of Electrical Engineering

Examensarbete
An Optimization Based Approach to Visual

Odometry Using Infrared Images
Examensarbete utfrt i Reglerteknik

vid Tekniska hgskolan i Linkping
av
Emil Nilsson
LiTH-ISY-EX--10/4386--SE
Linkping 2010

Linkpings universitet
SE-581 83 Linkping, Sweden
Linkpings tekniska hgskola

581 83 Linkping
An Optimization Based Approach to Visual

Odometry Using Infrared Images
Examensarbete utfrt i Reglerteknik

vid Tekniska hgskolan i Linkping
av
Emil Nilsson
Handledare:
Christian Lundquist
isy, Linkpings universitet
Jacob Roll
Autoliv Electronics
David Forslund
Autoliv Electronics
Examinator:
Thomas Schn
isy, Linkpings universitet
Linkping, 15 June, 2010
Avdelning, Institution
Division, Department
Datum
Date
Division of Automatic Control

SE-581 83 Linkping, Sweden
Sprk
Language
Rapporttyp
Report category
ISBN
Svenska/Swedish
Licentiatavhandling
ISRN
Engelska/English
Examensarbete

C-uppsats
D-uppsats
vrig rapport
2010-06-15
Serietitel och serienummer ISSN
Title of series, numbering

URL fr elektronisk version
http://www.control.isy.liu.se
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57981
Titel
Title
En optimeringsbaserad metod fr visuell odometri med infrarda bilder

An Optimization Based Approach to Visual Odometry Using Infrared Images
Frfattare Emil Nilsson

Author
Sammanfattning
Abstract
The goal of this work has been to improve the accuracy of a pre-existing algorithm
for vehicle pose estimation, which uses intrinsic measurements of vehicle motion
and measurements derived from far infrared images.
Estimating the pose of a vehicle, based on images from an on-board camera and
intrinsic measurements of vehicle motion, is a problem of simultanoeus localization
and mapping (SLAM), and it can be solved using the extended Kalman lter
(EKF). The EKF is a causal lter, so if the pose estimation problem is to be solved
o-line acausal methods are expected to increase estimation accuracy signicantly.
In this work the EKF has been compared with an acausal method for solving the
SLAM problem called smoothing and mapping (SAM) which is an optimization
based method that minimizes process and measurement noise.
Analyses of how improvements in the vehicle motion model, using a number
of dierent model extensions, aects accuracy of pose estimates have also been
performed.
Nyckelord
Keywords
visual odometry, smoothing and mapping, SAM, SLAM, automobile motion model,
FIR, monocular
Abstract
The goal of this work has been to improve the accuracy of a pre-existing algorithm
for vehicle pose estimation, which uses intrinsic measurements of vehicle motion
and measurements derived from far infrared images.
Estimating the pose of a vehicle, based on images from an on-board camera and
intrinsic measurements of vehicle motion, is a problem of simultanoeus localization
and mapping (SLAM), and it can be solved using the extended Kalman lter
(EKF). The EKF is a causal lter, so if the pose estimation problem is to be solved
o-line acausal methods are expected to increase estimation accuracy signicantly.
In this work the EKF has been compared with an acausal method for solving the
SLAM problem called smoothing and mapping (SAM) which is an optimization
based method that minimizes process and measurement noise.
Analyses of how improvements in the vehicle motion model, using a number
of dierent model extensions, aects accuracy of pose estimates have also been
performed.
Sammanfattning
Mlet med detta examensarbete var att frbttra precisionen hos en redan existerande algoritms skattningar av ett fordons pose (position och orientering), som
anvnder interna mtningar av fordonets rrelse samt mtningar erhllna frn
infrarda bilder.
Att skatta ett fordons pose, utifrn bilder frn en kamera ombord p farkosten
samt interna mtningar av fordonsrrelse, r ett problem av typen samtidig lokalisering och kartering (SLAM), och det kan lsas med ett utkat Kalmanlter (EKF).
EKF r ett kausalt lter, s om skattning av pose ska utfras i efterhand kan
icke-kausala metoder istllet anvndas och sdana metoder frvntas ge avsevrt
frbttrad precision i skattningarna. I detta examensarbete har EKF jmfrts med
en icke-kausal metod att lsa SLAM-problemet som kallas glttning och kartering
(SAM), en optimeringsbaserad metod som minimerar process- och mtbrus.
ven analyser av hur frbttringar av fordonets rrelsemodell, med ett antal
olika modelltillgg, pverkar precisionen i skattningarna av pose har genomfrts.
Acknowledgments
First of all I would like to thank David Forslund and Jacob Roll at Autoliv Electronics for the help, in all kinds of aspects, during this masters thesis. Of great
help has also Christian Lundquist and my examiner Thomas Schn been, constantly sharing ideas and answering questions throughout my work. My thanks
also goes to Zoran Sjanic and Martin Skoglund for giving me an introduction to
SAM.
Furthermore I want to take the opportunity to thank Thomas Karlsson and
Gran Forsling at the Mathematics department for, during all of my years at the
university, always allowing me to ask them questions, giving me helpful solutions
and explanations to mathematical problems and queries.
Last but not least, I would like to thank my family and Anna for all encouragement and support you have given me throughout the years.
vii
Contents
1 Introduction
1.1 Background . . . . . . . .
1.2 Problem Formulation . . .
1.3 Autoliv . . . . . . . . . .
1.4 Camera System Overview
1.5 Related Work . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
1
2
3
4
2 System Modeling
2.1 Coordinate Frames . . . . .
2.2 Rotation Matrices . . . . .
2.3 Vehicle Process Model . . .
2.3.1 Basic Model . . . . .
2.3.2 Model Extensions . .
2.4 Landmark Parametrization
2.5 Measurement Model . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
6
7
7
9
11
13
3 Visual Odometry and SLAM

3.1 Extended Kalman Filter . . . . .
3.2 Feature Detection . . . . . . . . .
3.3 Feature Association . . . . . . . .
3.4 Smoothing and Mapping . . . . .
3.5 Feature Association Improvement
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
15
15
18
19
19
25
4 Results
4.1 Performance Measures . . . . . . . . . . .
4.1.1 Landmark Measurement Residuals
4.1.2 Trajectory Plot in Camera Image .
4.2 SAM and Model Extensions . . . . . . . .
4.3 Feature Association Improvement . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
27
27
27
28
29
54
5 Concluding Remarks
5.1 Conclusions . . . . . . . . . . . . . . . . .
5.1.1 SAM . . . . . . . . . . . . . . . . .
5.1.2 Vehicle Process Model Extensions
5.1.3 Feature Association Improvement .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
57
57
57
57
58
ix
Contents
5.2
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Bibliography
A Nomenclature
A.1 Mathematical Notations .
A.2 Abbreviations . . . . . . .
A.3 Landmark States . . . . .
A.4 Vehicle States . . . . . . .
A.5 Variables, Parameters and
A.6 Indices . . . . . . . . . . .
A.7 Top Notations . . . . . . .
59
61
. . . . . .
. . . . . .
. . . . . .
. . . . . .
Functions
. . . . . .
. . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
63
63
63
63
64
64
65
65
B Proof of Matrix Singularity
66
C Derivation of Landmark Measurement Residual Limit
67
D Results
68
Chapter 1
Introduction
This section describes the problem to be solved in this work, and the context of
the problem.
1.1
Background
Autoliv Electronics has, together with the Division of Automatic Control at Linkping University, developed an o-line software tool for estimating the sequence of
poses of a vehicle (with pose meaning position and orientation), using images taken
by a far infrared (FIR) camera and measurements of vehicle acceleration, speed
and yaw rate. This tool uses sensor fusion of the vehicle measurements and FIR
images in order to perform simultaneous localization and mapping (SLAM), so it
is not only the vehicle poses that are estimated, but also positions of landmarks
seen by the camera. Figure 1.1 illustrates the SLAM problem; it shows ve consecutive vehicle poses, and measurements of landmarks from the dierent vehicle
positions. Note that although the camera only measures the direction to landmarks, it is possible to determine a landmarks position as the vehicle moves and
more measurements of the landmark are gathered.
1.2
Problem Formulation
The problem to be solved in this work is to nd out how the o-line estimation of
vehicle poses can be improved in terms of accuracy.
Before this work, pose estimation was performed using an extended Kalman
lter (EKF). Since the EKF is a causal method, and the pose sequence estimation
is performed o-line, using an acausal method was expected to improve the estimation signicantly, so the rst order of action was to implement such a method.
The problem of recovering the best possible estimate of the poses and landmark
positions is called smoothing and mapping (SAM, see for example [1]), and the
method proposed was linear least squares SAM, which means iteratively linearizing the vehicle and measurement models and nding a pose sequence estimate by
1
Introduction
Figure 1.1: Simultaneous localization (of the vehicle, illustrated by rectangles) and
mapping (of the landmarks, illustrated by circles) is performed by making some
sort of position measurements of the landmarks (e.g. bearing measurements) from
the dierent vehicle positions.
solving a linear least squares problem. Other proposed areas with potential for
increasing the estimation accuracy was improvements regarding the vehicle model,
feature detection and association algorithms (how to extract measurements from
the FIR images) and robustness against moving objects.
Since the ground truth for the vehicle poses is unknown, the estimation accuracy must be evaluated by indirect measures, so an important part of this work is
to nd good measures of estimation accuracy.
1.3
Autoliv
Autoliv is a worldwide leading developer and manufacturer of automotive safety

systems such as seatbelts, airbags and safety electronics, with most of the major
automobile manufacturers as their customers.
The main activity of Autolivs subsidiary company Autoliv Electronics, for
whom this work is performed, is to delevop and manufacture electronic control
units (ECU) for controlling air bag deployment. Not so long ago Autoliv Electronics started development of the Night Vision pedestrian detection system [5],
a system that in spite of darkness is able to warn drivers when pedestrians are in
the vehicles path, or moving towards this path. Although the system currently
does not automatically detect and warn for animals, they can be seen on the Night
Vision display, giving the driver a chance to detect them by using the display in a
way similar to the use of rear-view mirrors. The Night Vision system, originally
without the pedestrian detection, has been available in production cars since 2005.
Figure 1.2 shows the Night Vision system in action, along with visual images for
comparison.
1.4 Camera System Overview
(a) City
(b) Countryside
Figure 1.2: These two images show comparisons between visual images and Night
Vision images
1.4
Camera System Overview
The Night Vision camera registers radiation in the far infrared (FIR) region at
30 Hz, with a resolution of 320240 pixels, and a spectral range of 814 m. An
advantage of using a FIR sensor, compared to near infrared (NIR) sensors, is that
body heat radiation from humans and animals lies within the FIR region ([13]
suggests that it is easier to detect distant pedestrian using FIR compared to using
NIR). On the other hand, non-living objects and structures are not registered so
clearly by a FIR camera, so for the sake of estimating vehicle poses the FIR camera
might not be the best choice. The camera system is passive, meaning that it does
not actively illuminate the road ahead with FIR radiation. Figure 1.3 shows the
camera when it is mounted to the car.
(a) Camera mounting
(b) Close-up view of camera mounting
Figure 1.3: The images in this gure show the placement of the Night Vision
camera.
1.5
Introduction
Related Work
The EKF-based estimation of the vehicle poses that Autoliv used before this work
is described in [9]. Descriptions of a few versions of SAM can be found in [1] and
[11], where [11] describes the application of SAM to a problem similar to the one
in this work. The work in [1] is on the other hand more focused on fast execution
when using SAM, using factorizations of the information matrix, and [4] describes
how to further improve the execution time by using incremental updates of the
factorized information matrix.
In [8] it is shown how the uncertainty in the distance (depth) to landmarks
in images is more adequately represented by Gaussian noise when the distance
parametrization used is the inverse depth than when regular depth is used.
The model improvements for vehicle pitch dynamics evaluated during this work,
namely constant oset, a second order model and inuence from acceleration, are
described in [2].
Bundle adjustment (which is a computer vision term corresponding to what is
here referred to as SAM) is compared to ltering in [12]. The authors evaluate
bundle adjustment and ltering in terms of computational cost for decreasing
the robot pose estimate covariance, and found that ltering based methods only
might be the better choice when the available processing resources are small; they
conclude that bundle adjustment is superior in all other cases. Bundle adjustment
is still rather costly, and the authors of [10] state that this is because of the choice
of a single privileged coordinate frame; a choice which they avoid by taking a
relative approach to bundle adjustment. This approach uses some of the poses as
references, and for each other pose or landmark there is one reference pose which
it is related to.
In the solution developed during this work feature detection is performed using the Harris corner detector described in [3], and for feature description image
patches are used. For association of features in new images normalized crosscorrelation (NCC), see for example [7], between feature patches and the new images is used. A problem with using image patches as feature descriptors is that
they are not invariant to changes in scale and rotation, and [9] suggests using the
so-called Scale-invariant feature transform (SIFT) described in [6], instead of the
Harris detector and NCC of image patches. The SIFT algorithm detects and describes features in images and two of its advantages is that it is invariant to scale
and orientation of the features.
Chapter 2
System Modeling
The system of a vehicle moving in unknown surroundings, with measurements of
landmarks which are stationary in these surroundings, and measurements of the
vehicle motion, is modeled by a state space model. To begin with we have the
state of the vehicle (describing the current status of the position and motion of the
vehicle) at time t, denoted xvt . Based on the vehicle state xvt1 and some input ut
(in our case it is the acceleration of the vehicle), the next vehicle state xvt is given
by
(2.1a)
xvt = f (xvt1 , ut ) + wt ,
where f is the motion model function and wt is Gaussian process noise which
compensates for model errors. The position at time t of the jth landmark is
parametrized by its state xlj,t and the landmark motion model becomes
xlj,t = xlj,t1 ,
(2.1b)
since we assume that all landmarks are stationary. Then we have, at time t, the
vehicle measurements ytv according to
ytv = hv (xvt ) + evt ,
(2.1c)
where hv is the vehicle measurement function and evt is Gaussian measurement

noise. Finally, the landmark measurements of the jth landmark at time t is denoted
l
yj,t
and given by
l
yj,t
= hl (xvt , xlj,t ) + elj,t ,
(2.1d)
where hl is the landmark measurement function and elj,t is Gaussian measurement

noise.
2.1
Coordinate Frames
There are three relevant coordinate frames for the combined vehicle and camera
system:
5
System Modeling
World (w): This is considered an inertial frame and it is xed to the surroundings of the vehicle.
Vehicle body (b): This frame is xed to the vehicle, with its origin located
in the middle of the rear axis. Coordinate frame b coincide with w at the
start of a scenario.
Camera (c): This frame is xed relative to b, and it is positioned in the
optical center of the camera.
For all three coordinate frames the x-axis is pointing forward, the y-axis is
pointing to the right and the z-axis is pointing downwards.
The yaw, pitch and roll angles describe rotations around z-, y- and x-axes
respectively, and unless stated otherwise, positive direction of angles are given by
the right-hand grip rule applied to the coordinate frame axis around which the
rotation occurs.
2.2
Rotation Matrices
This section describes how coordinates of a xed point is transformed between

three dimensional coordinate systems that have dierent orientation but the same
origin.
The three basic rotation matrices for three dimensional space are given by
1
0
0
Rx () = 0 cos sin ,
0 sin
cos
cos 0 sin
1
0 ,
Ry () = 0
sin 0 cos
cos sin 0
cos 0 .
Rz () = sin
0
0
1
(2.2a)
(2.2b)
(2.2c)
For any rotation matrix we have that R() = R()1 , and we also have
R = RT by denition.
1
If the angles yaw (), pitch () and roll () are used to describe the orientation
of coordinate frame (e.g. a vehicle) as seen from coordinate frame (e.g. the
world), the rotation matrix R for converting -coordinates to -coordinates is
given by
R = Rx ()Ry ()Rz () =
= Rx ()T Ry ()T Rz ()T .
(2.3)
2.3 Vehicle Process Model
Since rotating from to is the inverse of rotating from to (R = (R )1 )

we get
R = Rz ()Ry ()Rx ().
(2.4)
Equation (2.5) concludes this theory section by dening a notation for the
rotation matrix, as a function of the three rotation angles:
R(, , ) = R .
2.3
(2.5)
Vehicle Process Model
This section describes the vehicle process models, in other words the models for
how vehicle states changes over time. The rst section is about the basic model
(also described in [9]) that was used before this work, and the remaining sections
are about the model extensions that have been tested during this work.
2.3.1
Basic Model
Before this work the used vehicle process model was the one described in this
section.
With the states described in Table 2.1, the vehicle state vector at time t is
given by
pt
vx,t
t
v
,
xt =
t
t
t
px,t
pt = py,t .
pz,t
(2.6a)
(2.6b)
The input signal ut is given by

ut = v x,t ,
(2.7)
where the vehicle acceleration v x,t in practice is measured. By treating v x,t as an

input signal instead of as a measurement we avoid having to incorporate it as a
state in the vehicle state vector.
With T as the sampling time, L as the wheel base of the vehicle and C as a
pitch damping parameter, we have
System Modeling
State
p
vx
Description
Vehicle position in world coordinates.
Velocity of the vehicle in its forward direction.
Yaw angle (z-axis rotation), relative to the world coordinate frame.
Front wheel angle, relative to the vehicles forward direction.
Road pitch angle, relative to the world coordinate frame xy-plane.
Pitch angle of the vehicle, relative to the road.
Table 2.1: This table contains short descriptions of the vehicle states.
px,t + T vx,t cos t cos t

py,t + T vx,t sin t cos t
pz,t T vx,t sin t
vx,t + T ut+1
,
f (xvt , ut+1 ) =
T vx,t
t + L tan t
t
Ct
(2.8)
which describes the vehicle process model.

The vehicle process noise wt is independent and Gaussian, according to
wt = B(xvt1 )t N (0, Q(xvt1 )),
Q(xvt )
B(xvt )Q B(xvt )T ,
T cos t cos t
0
T sin t cos t
0
0
B(xt ) = T sin t
,
1
0
0
I 44
vx
t

t

t =
t N (0, Q ),

t
t
vx
q
Q =
,
(2.9a)
(2.9b)
(2.9c)
(2.9d)
where all the q-variables are process noise variance parameters.
(2.9e)
2.3 Vehicle Process Model
2.3.2
Model Extensions
This section describes the vehicle process model extensions tested during this work.
Note that these model extensions can be used together in any combination.
We need to introduce some new notation: In order to describe the vehicle
process model for one of the vehicle states, we use superscripts. An example: For
the yaw angle () process model we would use the notation f .
Constant Oset in Car Pitch Angle
Since the stationary pitch angle of the camera, relative to the road, might be nonzero, due to the current load in the vehicle or misaligned camera mounting, the
state for the camera pitch oset 0 is added to the state vector. This oset state
is interpreted as the stationary car pitch angle, around which the car pitch angle
oscillates. The process model for car pitch angle and car pitch angle oset 0
becomes
f (xvt , ut+1 ) = C(t 0t ) + 0t ,
0
f (xvt , ut+1 ) = 0t ,
(2.10a)
(2.10b)
where C is the previously mentioned pitch damping parameter.

The camera pitch oset models a constant oset angle, so the process noise
variance for 0t is zero. The process noise variance for is independent of whether
0 is included in the vehicle state and process model or not.
Acceleration Oset in Car Pitch Angle
Acceleration (including deceleration) of the vehicle has signicant eects on the
car pitch angle. The model of this eect is that the stationary car pitch angle,
when the acceleration u is constant, becomes Ku, where K is a vehicle geometry
dependent parameter. This leads to that
f (xvt , ut+1 ) = C(t Kut+1 ) + Kut+1 .
(2.11)
This model is very similar to the constant oset model for car pitch; the differences are that for acceleration no new states are required, and the oset due to
acceleration is not necessarily constant.
Note that the choice of using a linear relation between acceleration and stationary car pitch angle is mainly motivated by the fact that the springs, which are
part of the vehicle suspension, have a linear relation between displacement and
force (this is known as Hookes law).
The process noise variance for should not be changed when adding the car
pitch acceleration oset to the process model.
10
System Modeling
Second Order Car Pitch Model

A natural extension of the rst order model in (2.8) for the car pitch is to use
a second order model, modeling the car pitch as a damped harmonic oscillator,
instead of as oscillation-free damping which the rst order model describes.
With as the characteristic time (sometimes called relaxation time) and fn
as the natural frequency of the car pitch system the dierential equation for car
pitch motion is
1
torque
.
(2.12)
+ 2 + (2fn )2 =
moment of inertia
Since the torque is unknown it will be represented by process noise and subsequently set to zero in
1
+ 2 + (2fn )2 = 0,
(2.13)
which is the time-continuous equation (without process noise) that the car pitch
process model will be based on.
Discretization and approximations of derivatives are performed according to
t = (tT ),
t+1 t
,
T
t+1 t
t = (tT
)
.
T
t = (tT
(2.14a)
(2.14b)
(2.14c)
The order of the linear ordinary dierential equation in (2.13) is two. This
means that a state space representation of the car pitch system requires two states;
is already a vehicle state, and is added to the vehicle state vector.
In the absence of process noise, we can identify t+1 as f (xvt , ut+1 ) and t+1
as f (xvt , ut+1 ), so (2.13) and (2.14) gives
f (xvt , ut+1 ) = t + T t ,
(
)
T
v
f (xt , ut+1 ) = 1 2
t T (2fn )2 t .
(2.15a)
(2.15b)
For the second order model of the car pitch to have the same characteristic
time (i.e. the same damping) as the rst order model in (2.8) we seek = (C).
The rst order model in (2.8) is a discretization of the continuous time model
+
1
= 0.
(2.16)
2.4 Landmark Parametrization
11
Using (2.14) to discretize (2.16) results in

(
t+1
)
t .
(2.17)
A comparison of (2.8) and (2.17) nally gives
T
.
1C
(2.18)
Using the second order model for car pitch, the process noise variance for is
lowered by an order of magnitude compared to the process noise variance when
using the basic model, making tightly connected to .
Then the process noise
variance for is set to approximately correspond to that of for when the basic
model is used, resulting in a value one or two orders of magnitude higher than the
basic process noise variance.
Roll Angle
For several reasons, such as that curves of country roads are banked, and that the
car may roll quite a bit when driving on uneven roads such as dirt roads, letting
the roll angle be constant at zero, which is done in the basic model where roll is
not a vehicle state, might be an inadequate approximation. The process model for
the combined roll angle of the car and road is given by
f (xvt , ut+1 ) = t .
(2.19)
Since the roll and pitch angles of automobiles have similiar behaviour in terms
of amplitude and natural frequency, the process noise variance for the roll angle
is set to be approximately the same as the basic model car pitch process noise
variance.
2.4
Landmark Parametrization
In this section the landmark state is described. At time t we have Mt number

of visible landmarks. (Visible means that the landmark has been measured; a
landmark may very well be non-visible although it is present in the FIR image,
but it cannot be visible if it is not in the image.) The landmark index, which is
its identication, of the visible landmark number i {1, 2, . . . , Mt } at time t is
denoted jt (i).
With the landmark states described in Table 2.2, the landmark state vector is
given by
12
System Modeling
xljt (1),t
xl
jt (2),t
,
xlt =
..
.
xljt (Mt ),t
w
kj,t
w
j,t
,
xlj,t =
w
j,t
j,t
w
kj,t,x
w
w
kj,t
= kj,t,y
.
w
kj,t,z
State
kw
w
w
(2.20a)
(2.20b)
(2.20c)
Description
The position in world coordinates of the camera at the time when the
landmark was rst seen.
The azimuth angle of the landmark as seen from k w , relative to world
coordinate frame directions.
The elevation angle of the landmark as seen from k w , relative to world
coordinate frame directions, with positive angles towards the positive
z-axis.
The inverse depth (which is the inverse of the distance) from k w to the
landmark.
Table 2.2: This table contains short descriptions of the landmark states.
The reason for using inverse depth rather than regular distance is that we want
a more natural way to assign uncertainty to the estimates of distance. Since the
uncertainty of state estimates will be represented by normal distributions we get
higher uncertainty for large distances and lower uncertainty for small distances by
using inverse depth, and this is precisely what we want.
w
of landmark j
The landmark state xlj,t is a parametrization of the position lj,t
at time t, and the relationship between position and state, with landmark position
given in world coordinates, is
w
lj,t
w
cos w
cos j,t
j,t
1
w
w
cos w
= kj,t
+
.
j,t sin j,t
j,t
sin w
j,t
|
{z
}
mw
j,t
(2.21)
2.5 Measurement Model
2.5
13
Measurement Model
The model for the vehicle measurements is

(
h
(xvt )
vx,t
t
(
=
)
vx,t
,
vx,t
L tan t
(2.22)
where L is the wheel base of the vehicle. The landmark measurement model is
given by
(xvt , xlj,t )
Pn (pcj,t )
pcj,t
1
pcj,t,x
( c )
pj,t,y
,
pcj,t,z
pcj,t,x
= pcj,t,y = pc (xvt , xlj,t ) =
pcj,t,z
1 cb bw
w
b
=
R (R (j,t (kj,t
p t ) + mw
j,t ) j,t c ),
j,t
Rcb = R(c , c , c )T ,
{
R(t , t + t , t )T
bw
R =
R(t , t + t , 0)T
(2.23a)
(2.23b)
(2.23c)
if model contains ,
otherwise.
(2.23d)
where cb is the position of the camera in the vehicle body coordinate frame, and
Pn (pc ) is the so-called normalized pinhole projection of a point pc , which is given
in camera coordinates. Furthermore, Pn generates normalized camera coordinates
and c , c and c are the yaw, pitch and roll angles of the camera, relative to the
vehicle body coordinate frame.
Both of the two measurement noises evt and elj,t are independent and Gaussian,
according to
evt N (0, S v ),
elj,t N (0, S l ),
( vx
)
s
0
Sv =
,
0 s
( c
)
s
0
Sl =
,
0 sc
(2.24a)
(2.24b)
(2.24c)
(2.24d)
where all the s-variables are measurement noise variance parameters.

The FIR camera is of course digital, so
images consisting of pixels.
)T
( it generates
y
To translate
between
pixel
coordinates
and
normalized
camera coordi(
)T
l
nates y z , which is the kind of coordinates landmark measurements yj,t
are
given in, we use
14
System Modeling
( )
y
y ic
y
f
y
,
=
z
z ic
z
fz
(2.25)
(
)T
where yic zic denotes the image center (which is the intersection of the optical
axis and the image plane), and fy and fz are the focal lengths (given in pixels) in
y-direction and z-direction respectively.
Chapter 3
Visual Odometry and SLAM

The contents of this chapter describes how estimation of vehicle and landmark
states is performed, by rst explaining how the extended Kalman lter (EKF) is
applied to our problem, and then describing the smoothing and mapping (SAM)
algorithm, which can be seen as a method to rene the state estimates from the
EKF. Finally an improvement of the feature association is described.
3.1
Extended Kalman Filter
With xvt and xlt given by (2.6a) and (2.20a) respectively, the true state vector at
time t is given by
( v)
xt
,
(3.1)
xt =
xlt
and x
t|s is the estimate of xt using measurements up until time s {t 1, t}.
If the model and the initial values of x
and P are accurate, then we have
Pt|s = cov(xt x
t|s ). The model used is of course not perfectly accurate, but we
can still interpret Pt|s as a measure of how good x
t|s approximates xt . However,
because of the linearization done in the EKF, P tends to be underestimated, saying
that x
is less uncertain than it actually is, and this must be kept in mind if we are
looking at the actual values in P (e.g. plotting the condence bounds for landmark
positions). Pt|s can be decomposed according to
)
(
v
vl
Pt|s
Pt|s
Pt|s =
,
(3.2)
lv
l
Pt|s
Pt|s
v
vl
lv
where Pt|s
is the covariance of the vehicle states, Pt|s
and Pt|s
is the covariance
l
between the landmark states and the vehicle states, and Pt|s is the covariance of
the landmark states.
The vector ytl contains all the landmark measurements for time t, and hlt (xt )
is the corresponding measurement function; they are dened in (3.3). Remember
15
16
from Chapter 2 that the landmarks which are measured at time t have indices
jt (i), i {1, 2, . . . , Mt }, where Mt is the number of measured landmarks at time
t. We have that
yjl t (1),t
yl
jt (2),t
,
ytl =
..
.
yjl t (Mt ),t
l v l
h (xt , xjt (1),t )

hl (xv , xl
t
jt (2),t )
.
hlt (xt ) =
..
.
hl (xvt , xljt (Mt ),t )
(3.3a)
(3.3b)
The EKF uses linearized measurement and process models given by

f (xvt1 , ut )
Ft =
,
v
xvt1
xt1 =
xv
t1|t1

hv (xvt )
v
Ht =
,
xvt xv =xv
t
t|t1

l

(x
)
h
t
t

Htl =
.
xt xt =xt|t1
(3.4a)
(3.4b)
(3.4c)
We then get the time update (i.e. the prediction) of the states according to
x
vt|t1 = f (
xvt1|t1 , ut ),
x
lt|t1
Pt|t1 =
x
lt1|t1 ,
(
Ftv
0
(3.5a)
(
0
Ftv
Pt1|t1
I
0
0
I
)T
(
+
Qt
0
)
0
,
0
(3.5b)
(3.5c)
where Qt = Q(
xvt1|t1 ) (see (2.9)).
The next step is the measurement update, which adjusts the state predictions
from the time update, using measurements. Stl is the noise covariance for measurement ytl , so it is a block diagonal matrix with Mt number of S l matrices on its
diagonal. With S l and S v given by (2.24), the measurement update is given by
3.1 Extended Kalman Filter
17
x
t|t = x
t|t1 + Ktv (ytv hv (
xvt|t1 )) + Ktl (ytl hlt (
xt|t1 )),
(
)
v
l l
v
Pt|t = Pt|t1 Kt Ht 0 Pt|t1 Kt Ht Pt|t1 ,
)1
(
)T ( ( v
)
(
)T
Ht 0 Pt|t1 Htv 0 + S v
Ktv = Pt|t1 Htv 0
,
(
)1
Ktl = Pt|t1 (Htl )T Htl Pt|t1 (Htl )T + S l
.
(3.6a)
(3.6b)
(3.6c)
(3.6d)
When a new feature is extracted from an image, a corresponding landmark

is added to the landmark state vector x
t|t and the covariance matrix Pt|t must
be expanded to include the uncertainty of the new landmarks states. With p
as the vehicle position, (0 as the )initial inverse depth, P0 as the initial variance
T
for inverse depth, y nl = yynl yznl
as the measured position of the feature (in
normalized camera coordinates), and superscipt nl meaning new landmark, the
new
new state vector x
new
t|t and covariance matrix Pt|t are given by
( )
x
t|t
,
x
nl
pt|t + Rwb cb
arctan y nl + t|t
y
x
nl =
arctan y nl t|t ,
z
0
{
R(t ,
t + t , t ) if model contains ,
Rwb =
R(t ,
t + t , 0) otherwise,
0
Pt|t 0 0
0
0 0
0
new
JT ,
Pt|t
=J
0
0 Sl
0
0
0 0 P0
(
I
nl
J = xnl
x
nl
0
0 0 xt|t 0 0
pt|t
x
new
t|t =
t|t
(3.7a)
(3.7b)
(3.7c)
(3.7d)
0 0
0
0
)
x
nl
y nl
x
nl
0
(3.7e)
where lines are added for better readability, at the same place in the dierent
matrices (which are of equal size). These lines are placed so they separate the old
covariance from the covariance of the new landmark state. Remember that cb is
the position of the camera in the vehicle body coordinate frame, and note that
the covariance for the new landmarks camera position state will be given by the
nl
covariance for the vehicle position in combination with pxt|t .
Due to the linearization of the non-linear landmark measurement function hl
it sometimes happens that the latest estimation of a landmarks inverse depth is
negative. This can happen to landmarks that are distant but with measurements
18
indicating that the current distance estimate is too low, so the distance is increased
by lowering the inverse depth estimation. To avoid this problem all inverse depth
estimates in x
lt|t are forced to be above a positive threshold min , thereby setting
a maximum allowed distance for landmarks. If min is small enough, making the
maximum allowed distance large enough, the measurement errors (for landmarks
further away than the maximum allowed distance) due to forcing the inverse depth
estimates is negligible.
v
Algorithm 1 describes the EKF algorithm used. P1|0
is the initial vehicle state
covariance, tf is the interval for when to search for new features, and M max is the
maximal number of visible landmarks.
Algorithm 1 Sensor fusion using EKF
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
Initialize all elements in x

v1|1 to zero, except for vx,1|1 and 1|1 which are given
v
v
by measurements y1 using (2.22) with y1v = hv (
xv1|1 ). Let P1|1 = P1|0
.
l
Generate y1 by extracting features from the rst image.
Expand x
1|1 and P1|1 according to (3.7) using y1l .
for t = 2 to N do
Generate x
t|t from x
t1|t1 and Pt|t from Pt1|t1 using vehicle measurements ytv , but no landmark measurements.
Predict landmark measurements.
Associate features, resulting in landmark measurements ytl . (Landmarks
whose features cannot be associated with high precision are removed.)
Update x
t|t and Pt|t using landmark measurements ytl . (Landmarks with
highly unlikely measurements are removed.)
for all j do
Make sure j,t|t min .
end for
Make sure Pt|t is symmetric.
if t 0 (mod tf ) and Mt < M max then
Extract new features (at most M max Mt new features).
end if
end for
3.2
Feature Detection
In order to nd distinctive, and thereby trackable, features in the image some

sort of feature detection must be performed. In this work the so-called Harris
detector (see [3]) is used; this detector nds areas in the image where the image
contains signicant change in more than one direction (thus for examle corners,
but not straight lines, will be detected). The feature detection algorithm, used in
Algorithm 1 and given by Algorithm 2, is run at xed time intervals, and only if
the number of visible landmarks Mt is lower than the maximum allowed number
of visible landmarks M max .
3.3 Feature Association
19
Algorithm 2 Feature detection

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
Smooth the image in order to suppress noise.

Compute the image gradient.
Compute the Harris measure from the image gradient.
Mask the image where the road is expected to be, near the image borders, and
around predicted positions of existing features, by setting the Harris measure
to zero in these regions.
Set nM = M max Mt .
while nM > 0 and maximum of Harris measure is above a threshold do
Extract a xed-size image patch for the feature around the position of the
maximum for the Harris measure.
Mask the image around the new feature, by setting the Harris measure to
zero in this region.
Decrease nM by one.
end while
for all extracted features do
Expand the landmark state and covariance with the new landmark that the
extracted feature represents.
end for
3.3
Feature Association
In order to track a feature in a sequence of frames we must perform association,

i.e. determining the position of the feature in question in all of the images in the
sequence. In this work features are represented and characterized by a small image
patch of each feature. Association of one feature image patch with a camera image,
i.e. determining where in the camera image the feature image patch (and thus the
feature) is located, is performed using normalized cross-correlation (NCC). The
feature association in Algorithm 1 is performed according to Algorithm 3.
3.4
Smoothing and Mapping
The idea behind smoothing and mapping (SAM) is to simultaneously estimate the
vehicle states and the landmark states, utilizing both previous, present and future
measurements, as opposed to the EKF which cannot utilize future measurements.
This acausal estimation is performed by solving weighted least squares problems
iteratively, until desired accuracy of the converging solution is achieved.
Since the landmarks are stationary each landmark is represented by a timeindependent landmark state. The vehicle however is not stationary, so the vehicle
states for all the time steps t {1, 2, . . . , N } are included in the state vector of
the SAM algorithm. Mathematical descriptions of the states used in the SAM
algorithm can be found in (3.8).
20
Algorithm 3 Feature association

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
for all features in xlt do

Predict feature position in image.
Select a search region around the predicted position.
if search region is completely within the image then
Compute normalized cross-correlation (NCC) of the current image (limited to the search region) and the feature image patch representing the
landmark.
Find the maximum of the NCC.
Compute the maximum of the NCC when a small area around the original
maximum has been zeroed out.
if second maximum NCC-value is not close to original maximum then
Save the position of the original maximum as the new measurement of
the feature.
end if
end if
end for
Not all landmarks from the EKF run are included in the SAM run; a threshold
for the minimum number of times a landmark has been visible is used to sort out
landmarks with few measurements. With M number of landmarks included in
the SAM run, the landmark index of landmark number i {1, 2, . . . , M } in the
SAM run is denoted j(i). The state vector x, containing both landmark states
and vehicle states, is given by
( v)
x
,
xl
v
x1
xv2

xv = . ,
..
x=
xvN
l
xj(1)
xl
j(2)
xl =
.. ,
.
xlj(M )
w
j
l
.
xj = w
j
j
(3.8a)
(3.8b)
(3.8c)
(3.8d)
The variable x
is the current estimate of x. During the rst SAM iteration
x
is based on the state estimate from the EKF. The vehicle states xvt are simply
3.4 Smoothing and Mapping
21
collected from all the time steps of the EKF run. Landmark states xlj on the
other hand are based on the last estimation by the EKF of the landmark state in
question.
Instead of including the camera positions in the landmark states xlj we use the
vehicle state xvtc (j) , from the time tc (j) when landmark j was rst seen, to calculate
the camera position. This can be done since we assume that the camera is rmly
mounted in the car. We can recompose a landmark state using the function g,
which returns the complete landmark state (i.e. camera position, azimuth angle,
elevation angle and inverse depth) according to
)
ptc (j) + Rwb cb
,
=
xlj
{
R(tc (j) , tc (j) + tc (j) , tc (j) )
=
R(tc (j) , tc (j) + tc (j) , 0)
(
g(xvtc (j) , xlj )

Rwb
(3.9a)
if model contains ,
otherwise.
(3.9b)
Note that xlj,t , with time index, denotes the landmark state that includes camera
position, whereas xlj , without time index, denotes the landmark state that excludes camera position.
l
Since the number of landmark measurements yj,t
is not the same for every time
step, k is used to numerate the complete series of landmark measurements. We
introduce the notation jk for the index of the landmark associated with measurement k, and similarly tk for the time when measurement k was performed. Using
the just introduced notation, yjl k ,tk is the measurement we get from landmark
measurement number k.
Vehicle measurements ytv are performed once every time step, so no notation
other than the time t is required to numerate these measurements.
Linearization at x = x
of the process model and measurement model, and
using the fact that the input signal ut is non-variable (it is actually a measurement
although we treat it as an input signal) results in
x
vt + xvt = f (
xvt1 , ut ) + Ft xvt1 + wt ,
v
v v
xt ) + Htv xvt + evt ,
yt = h (
(3.10a)
(3.10b)
yjl k ,tk = hl (
xvtk , g(
xvtc (jk ) , x
ljk )) + Hkl xvtk +
+
where
Hkl,c xvtc (jk )
Jk xljk
eljk ,tk ,
(3.10c)
(3.10d)
22
Ft =
Htv =
Hkl =
Hkl,c =

f (xvt1 , ut )
,
v
xvt1
xt1 =
xv
t1

hv (xvt )
,
xvt xv =xv
t
t

hl (xvt , g(
xvtc (jk ) , x
ljk ))
,

v v
xvt
xt =
xt
k

l v
v
l
h (
xtk , g(xtc , x
jk ))
,

v v
xvtc
xtc =
xt

hl (
xvtk , g(
xvtc (jk ) , xlj ))
Jk =

xlj
(3.11a)
(3.11b)
(3.11c)
(3.11d)
c (jk )
,
xlj =
xlj
(3.11e)
are Jacobians of the process model and measurement model.

By using the residuals
xvt1 , ut ),
at = x
vt f (
xvt )),
cvt = ytv hv (
(3.12a)
(3.12b)
clk = yjl k ,tk hl (

xvtk , g(
xvtc (jk ) , x
ljk )),
(3.12c)
we can form a least squares optimization expression, minimizing the sum of squared
weighted measurement and process noises, according to
x = arg min
x
= arg min
((
t
((
wt 2Q1
t
evt 2(S v )1
)
+
)
eljk ,tk 2(S l )1
)
Ft xvt1 xvt at 2Q1 + Htv xvt cvt 2(S v )1 + (3.13)
t
)
Hkl xvtk + Hkl,c xvtc (jk ) + Jk xljk clk 2(S l )1 ,
v0 , which we dont
where Qt = Q(
xvt1 ) (see (2.9)). Note that for t = 1 we need x
v
have, so we let F1 = 0, a1 = 0 and Q1 = P1|0 , which is the initial vehicle state
covariance for the EKF estimation. This way the resulting term xv1 2Q1 makes
1
sure that the smoothed state estimate x
s stays reasonably close to x
.
The vector norm P 1 used in (3.13) is the so called Mahalanobis distance,
and it is given by
e2P 1 = eT P 1 e = (P T /2 e)T (P T /2 e) = P T /2 e22 .
(3.14)
3.4 Smoothing and Mapping
23
A problem with using the Mahalanobis distance is that Qt is singular (see Appendix B for a proof), so its inverse does not exist. To get around this problem
we add I, where > 0 is a small number, to Qt before the inversion.
T /2
T /2
By collecting all the weighted Jacobians (Qt

Ft , Qt
, (S v )T /2 Htv , . . . )
T /2
in a matrix A, and stacking all the weighted residuals (Qt
at , (S v )T /2 clk and
l T /2 v
(S )
ct ) in a vector b, the least squares optimization can be rewritten to be of
the form of a standard linear least squares problem, according to
x = arg minAx b22 ,
(3.15)
and an example of the structure of A is available in Example 3.1.

Thereafter the sought smoothed state vector x
s is obtained according to
x
s = x
+ x
(3.16)
In order to get good accuracy for the state estimate we repeat the SAM algorithm, using x
=x
s , until x becomes smaller than some predened threshold.
Example 3.1
To show the structure of the matrix A, an example scenario is used to illustrate
how A is composed of the Jacobians Htv , Hkl , Hkl,c and Jk , but with all the weight
matrices omitted for better visibility (the symbol used later on is to be interpreted as "roughly similar to"). The example scenario has:
N = 5 time steps.
M = 2 landmarks.
Measurements of landmark j = 1 at times t {1, 2, 3, 4}.
Measurements of landmark j = 2 at times t {3, 4, 5}.
This results in the following landmark measurements, enumerated by k
k
tk
jk
1
1
1
2
2
1
3
3
1
4
3
2
5
4
1
6
4
2
7
5
2
The structure of A, as generated by Algorithm 4, is described in (3.17). Remember that the subscript k of Hkl,c and Hkl denotes landmark measurement number, while the subscript t of Htv denotes time.
Note that:
A11 relates vehicle states to vehicle state prediction errors.
A21 relates vehicle states to measurement prediction errors.
24

A22 relates landmark states to measurement prediction errors.
(
A=
A11
A21
0
A22
I
F2
H1v
l,c
H + H1l
1
l,c
H2
H l,c
H5l,c
I
F3
I
F4
I
F5
I
J1
H2v
H2l
J2
H3v
H3l
l,c
H4 + H4l
H6l,c
H7l,c
J3
H4v
H5l
H6l
J5
H5v
H7l
J4
J6
(3.17)
J7
It is time comsuming to evaluate the Jacobian Hkl,c , connecting camera position

to vehicle state. If only the Jacobians partial derivatives with respect to vehicle
position are allowed to be non-zero, it is possible to reduce the execution time
of the evaluation whilst the speed of convergence is marginally lowered, and the
approximations eect on the vehicle pose is almost undetectable. As a result of
this, the implementation uses the approximate version of Hkl,c . The interpretation
of this approximation is that we disregard change in camera position that stems
from changes in the vehicle state angles tc (jk ) and tc (jk ) .
Just as with the EKF algorithm, and because of the same reason, it is possible
to after a SAM iteration get negative estimates of landmarks inverse depths. This
problem is solved in the SAM algorithm like it is in the EKF algorithm, namely
by forcing inverse depth estimates below min up to this minimal allowed value.
Algorithm 4 summarizes the implementation of the SAM algorithm. nv is the
number of states in the vehicle model (i.e. the length of xvt , for any t). nl is the
number of states in the SAM landmark model, for one landmark.
3.5
25
Feature Association Improvement
In order to derive FIR images measurements that contains much information about
the vehicle poses we want many measurements, but at the same time we want few
landmark states since also these are estimated from the measurements. A method
to increase the number of landmark measurements per landmark is described below.
A major disadvantage with association of features using normalized crosscorrelation of image patches is that change in scale is not handled. To overcome
this issue, without implementing a whole new feature representation, scaling of
the image patches that describes the features can be performed according to
Lj,t
Dj,t
Dj,t L0
=2
+ 1,
2
w
w
lj,t
kj,t
1
=
,
= w
w
wb
b
lj,t (
pt + R c )
j,t lj,t (
pt + Rwb cb )
(3.18a)
(3.18b)
where Lj,t is the size of the enlarged patch for landmark j at time t, L0 is the size
of the original patch (with size being the length of the quadratic patches sides).
pt , kw , j,t and lw are EKF estimates of states and variables described in Section
j,t
j,t
w
2.4, and Rwb is the same as in (3.7). Note that lj,t
is not a state and is therefore
not itself estimated, it is instead calculated using the estimated landmark states.
The quantity Dj,t is a scale factor which, based on the EKF estimate of landmark state and vehicle state, estimates how much larger landmark j will appear in
the image at time t, compared to when the landmark was rst seen and the image
patch describing the feature was stored. The scale factor is simply calculated as
the distance to the landmark when it was rst measured, divided by the current
distance to it.
Scaling the feature patches can, besides prolonging the life-time of landmarks,
also be expected to improve the quality of landmark measurements. The key to
this is that when scaling is not used, and we have a landmark patch which ought
to be enlarged, then the association of this patch with new image might become
oset in some direction. Consider for example the top of a spruce tree and lets
say that the feature patch describing this landmark is taken far away so that the
patch depicts the upper half of the tree. If we associate this feature patch with
an image taken at half the original distance to the tree it is likely that we succeed
in nding an association, but the patch will be associated with the top quarter of
the tree, so the measurement is higher up than it should be.
Note that Lj,t is Dj,t L0 rounded to the nearest odd integer. This is done to
make sure that also the enlarged patch has one pixel which represents the center
of the patch. Also note that Dj,t is an approximation of the real scale change, but
this approximation holds since the landmarks are small compared to the distance
to them, resulting in small angles.
26
Algorithm 4 Sensor fusion using SAM

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
Generate x
v from all x
vt|t .
Generate x
l from all x
lt|t , using the last estimate of the individual landmarks.
Exclude landmarks that has fewer measurements than a predened threshold.
Update yjl k ,tk (taken from the EKF algorithm) by removing measurements
belonging to landmarks removed in the previous step.
repeat
Initialize a and c as empty column vectors, and A21 and A22 as empty
matrices. (These variables will grow iteratively as new rows are added to
them.)
for t = 1 to N do
if t = 1 then
(
v
v)
v T /2
Set A11 = (P1|0
)
I 0n (N 1)n .
v
Set a = 0n 1 .
else
v
v)
T /2 ( nv (t2)nv
Append Qt
0
Ft I 0n (N t)n to A11 .
T /2
Append Qt
at to a.
end if
(
v
v)
Append (S v )T /2 02(t1)n Htv 02(N t)n to A21 .
Append a two rows of zeros to A22 .
Append (S v )T /2 cvt to c.
for all k : tk = t do
if t = tc (jk ) then (
v
v)
Append (S l )T /2 02(t1)n Hkl,c + Hkl 02(N t)n to A21 .
else
(
v
v
v)
(S l )T /2 02(tc (jk )1)n Hkl,c 02(ttc (jk )1)n Hkl 02(N t)n
is appended to A21 .
end if
(
)
l
l
Append (S l )T /2 02(jk 1)n Jk 02(M jk )n to A22 .
Append (S l )T /2 clk to c.
end for
end for (
)
A11
0
Let A =
.
(A
)21 A22
a
Let b =
.
c
Calculate x and x
s according to (3.15) and (3.16) respectively.
s
Set x
=x
.
for all j do
Make sure j min .
end for
until max abs(x ) is smaller than predened threshold
Chapter 4
Results
This chapter contains a description of the result presentation, followed by the
results of using SAM and vehicle model extensions, and lastly results of introducing
scale change compensation to the feature association.
4.1
Performance Measures
The results in terms of vehicle pose estimation accuracy of SAM and vehicle model
extensions are given in the form of two measures. The rst one, landmark measurement residuals, is an indirect but quantied measure of pose estimation accuracy,
whereas the second measure, trajectory plot in camera image, is direct but nonquantied and also to some extent subjective.
4.1.1
Landmark Measurement Residuals
(
The root mean square, RMS(v), of a vector v = v1
v
u
N
u1
RMS(v) = t
v2 .
N i=1 i
. . . vN
)T
RN is
(4.1)
To measure how well image positions for landmarks are predicted, i.e. how
far o the predictions were from the measurements on average, the RMS(
cl ) of
l
the landmark measurement residuals c (given in pixels) is used. With fy and fz
as the focal lengths (given in pixels) in y-direction and z-direction respectively,
N
M k = t=1 Mt as the total number of landmark measurements and te (j) as the
end time for landmark j, i.e. the latest time that it was measured, cl is given
according to
27
28
Results
cl1
cl = ... ,
cl k
( M
)
fy 0
l
ck =
cl ,
0 fz k
{
xvtk |tk , x
ljk ,te (jk )|te (jk ) )
yjl k ,tk hl (
l
ck =
l
l v
yjk ,tk h (
xtk , g(
xvtc (jk ) , x
ljk ))
(4.2a)
(4.2b)
if EKF estimates are used,
if SAM estimates are used.
(4.2c)
Note that clk is given in normalized camera coordinates.

The fact that landmark measurements are discretized (they are given as integer
pixel positions) results in that there is a lower limit for the RMS of the landmark
1
measurement residuals, and it is RMS(
cl ) ' 2
0.29. See Appendix C for the
3
derivation of this limit.
A drawback with this pose estimation measure is that since it is based on landmark measurements it depends on what types of landmarks that are extracted
from the image sequence, and the extracted landmarks are not necessarily the
same when using a dierent subset of the model extensions described in 2.3.2. For
example: Let us consider a data sequence where we with one model extension
get a couple of erroneously associated landmarks and no erroneous associations
with another model extension, then it is dicult to compare the pose estimation
accuracy based on the RMS(
cl ) values from these two cases. To make the results
more dependable we take the RMS(
cl ) values for all the 16 combinations of vehicle process model extensions for both EKF and SAM (a total of 32 values per
sequence) and calculate a few averages of the RMS(
cl ) values for each sequence.
It should also be noted that the measurements of course are not the truth,
and the quality of them depends on how well the feature association algorithm
performs.
4.1.2
Trajectory Plot in Camera Image
Plotting the sequence of estimated vehicle positions (the trajectory) in the rst
FIR camera image provides an intuitive way to evaluate the accuracy of the pose
estimates.
The vehicle pose consists of three-dimensional position and three-dimensional
orientation, and the part of the pose sequence that the trajectory basically shows is
position and yaw angle (since yaw is tightly connected to the sequence of positions),
but not pitch angle or roll angle.
The obvious drawback of using the trajectory plot as a measure of pose estimation accuracy is that it is not quantitative but instead has to be assessed
subjectively.
4.2 SAM and Model Extensions
4.2
29
SAM and Model Extensions
We introduce the following abbreviations for the vehicle process model extensions
described in Section 2.3.2:
Oset: Constant oset in car pitch angle.
Acc: Acceleration oset in car pitch angle.
Roll: Roll angle.
2nd: Second order model for car pitch.
For each data sequence used to evaluate the SAM algorithm and model extensions proposed in this thesis, both EKF and SAM estimates has been generated
for all the 16 combinations of vehicle process model extensions, and all values of
RMS(
cl ) for the sequences are found in Appendix D. This section summarizes the
results found in Appendix D, and for each sequence a few vehicle trajectories are
plotted in the rst FIR camera image of the sequence. To illustrate the trajectory
and landmarks a map of estimated landmark positions and vehicle trajectory is
shown for each sequence.
It is important to note that not all sequences contains behaviour that the model
extensions are ment to handle. If for example the true car pitch oset is zero the
oset model extension cannot be expected to improve the pose estimates for that
particular sequence. The same goes for the acceleration oset in car pitch angle
for any sequence in which the vehicle moves at constant speed. To give an idea
about which model extension that are given a chance to improve the results for a
particular sequence, all sequences below are accompanied by a short description
of its pitch motion, roll motion and acceleration, but note that it is generally
unknown whether or not the car pitch has a constant oset.
30
Results
Sequence A
The car pitch and roll oscillates quite a bit during this sequence, but the acceleration is virtually zero. Table 4.1 presents the results for the mean landmark
measurement residuals, when using SAM compared to using only EKF, and when
using the dierent vehicle model extensions, compared to not using them. Figure
4.1 shows a top view of the estimated map and trajectory for Sequence A, while
Figures 4.2 and 4.3 show trajectory plots which compares the dierent vehicle
model extensions, and SAM versus EKF.
RMS(
cl )
2.71
0.51
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model
Vehicle process model

Oset Acc Roll 2nd
0.52
0.51 0.48 0.50
0.50
0.51 0.54 0.53
(b) With and without model extensions, using SAM.
Table 4.1: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence A.
200
150
y [m]
100
50
50
0
500
1000
1500
x [m]
Figure 4.1: Sequence A: This gure show a map of the SAM estimates, using the
basic vehicle process model.
31
Figure 4.2: Sequence A: Trajectory for EKF estimates (dashdotted) and SAM
estimates (solid), using the basic vehicle process model.
Figure 4.3: Sequence A: Trajectory for SAM estimates, using the models oset
(dotted), acc (dashed), roll (dashdotted) and 2nd (solid).
32
Results
Sequence B
In this sequence there are moderate amounts of pitch and roll motion, but the
acceleration is very close to zero. Table 4.2 presents the results for the mean
landmark measurement residuals, when using SAM compared to using only EKF,
and when using the dierent vehicle model extensions, compared to not using them.
Figure 4.4 shows a top view of the estimated map and trajectory for Sequence B,
while Figures 4.5 and 4.6 show trajectory plots which compares the dierent vehicle
RMS(
cl )
1.18
0.46
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.45
0.46 0.41 0.46
0.46
0.46 0.50 0.45
Table 4.2: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence B.
60
40
y [m]
20
0
20
40
60
100
200
300
400
500
600
700
x [m]
Figure 4.4: Sequence B: This gure show a map of the SAM estimates, using the
33
Figure 4.5: Sequence B: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.6: Sequence B: Trajectory for SAM estimates, using the models oset
34
Results
Sequence C
In this sequence the car pitch oscillates moderately, but roll angle oscillation is
low. However there is some roll motion due to road geometry, and the level of
acceleration is low. Table 4.3 presents the results for the mean landmark measurement residuals, when using SAM compared to using only EKF, and when using
the dierent vehicle model extensions, compared to not using them. Figure 4.7
shows a top view of the estimated map and trajectory for Sequence C, while Figures 4.8 and 4.9 show trajectory plots which compares the dierent vehicle model
extensions, and SAM versus EKF.
RMS(
cl )
3.70
0.65
Method
EKF
SAM
(a) EKF versus SAM.

Oset Acc Roll 2nd
0.61
0.67 0.62 0.64
0.69
0.64 0.68 0.66
RMS(
cl )
With model
Without model
Table 4.3: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence C.
50
y [m]
50
100
150
0
200
400
600
800
1000
x [m]
Figure 4.7: Sequence C: This gure show a map of the SAM estimates, using the
35
Figure 4.8: Sequence C: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.9: Sequence C: Trajectory for SAM estimates, using the models oset
36
Results
Sequence D
This sequence contains some roll motion but not so much. The acceleration is
low and the pitch oscillates moderately. Table 4.4 presents the results for the
mean landmark measurement residuals, when using SAM compared to using only
EKF, and when using the dierent vehicle model extensions, compared to not
using them. Figure 4.10 shows a top view of the estimated map and trajectory
for Sequence D, while Figures 4.11 and 4.12 show trajectory plots which compares
the dierent vehicle model extensions, and SAM versus EKF.
RMS(
cl )
1.68
0.58
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.54
0.59 0.55 0.59
0.61
0.57 0.61 0.57
Table 4.4: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence D.
80
60
40
y [m]
20
0
20
40
60
80
100
200
400
600
800
1000
x [m]
Figure 4.10: Sequence D: This gure show a map of the SAM estimates, using the
37
Figure 4.11: Sequence D: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.12: Sequence D: Trajectory for SAM estimates, using the models oset
38
Results
Sequence E
This sequence contains almost no acceleration, low roll motion, but the pitch
oscillates moderately. Table 4.5 presents the results for the mean landmark measurement residuals, when using SAM compared to using only EKF, and when
4.13 shows a top view of the estimated map and trajectory for Sequence E, while
RMS(
cl )
0.91
0.45
Method
EKF
SAM
(a) EKF versus SAM.

Oset Acc Roll 2nd
0.44
0.45 0.44 0.45
0.46
0.45 0.45 0.45
RMS(
cl )
With model
Without model
Table 4.5: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence E.
40
20
y [m]
0
20
40
60
80
100
200
300
400
500
600
700
x [m]
Figure 4.13: Sequence E: This gure show a map of the SAM estimates, using the
39
Figure 4.14: Sequence E: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.15: Sequence E Trajectory for SAM estimates, using the models oset
40
Results
Sequence F
In this sequence both pitch and roll motion is rather extensive, and there is also
a little bit of acceleration. Table 4.6 presents the results for the mean landmark
4.16 shows a top view of the estimated map and trajectory for Sequence F, while
RMS(
cl )
1.56
0.52
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.51
0.52 0.49 0.52
0.53
0.51 0.55 0.51
Table 4.6: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence F.
20
0
y [m]
20
40
60
80
100
200
400
600
800
x [m]
Figure 4.16: Sequence F: This gure show a map of the SAM estimates, using the
41
Figure 4.17: Sequence F: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.18: Sequence F Trajectory for SAM estimates, using the models oset
42
Results
Sequence G
This sequence exhitibs moderate roll motion and acceleration but extensive pitch
motion. Table 4.7 presents the results for the mean landmark measurement residuals, when using SAM compared to using only EKF, and when using the dierent
vehicle model extensions, compared to not using them. Figure 4.19 shows a top
view of the estimated map and trajectory for Sequence G, while Figures 4.20 and
4.21 show trajectory plots which compares the dierent vehicle model extensions,
and SAM versus EKF.
RMS(
cl )
2.78
0.55
Method
EKF
SAM
(a) EKF versus SAM.

Oset Acc Roll 2nd
0.54
0.56 0.52 0.56
0.57
0.54 0.58 0.54
RMS(
cl )
With model
Without model
Table 4.7: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence G.
0
20
y [m]
40
60
80
100
120
140
100
200
300
400
500
600
700
x [m]
Figure 4.19: Sequence G: This gure show a map of the SAM estimates, using the
43
Figure 4.20: Sequence G: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.21: Sequence G Trajectory for SAM estimates, using the models oset
44
Results
Sequence H
This sequence exhibits a large constant oset in car pitch (resulting in that the
horizon is on average 15-20 pixels below the image center, this can be seen in
the images below), extensive roll motion, moderate acceleration, but not much
car pitch oscillation. Table 4.8 presents the results for the mean landmark measurement residuals, when using SAM compared to using only EKF, and when
4.22 shows a top view of the estimated map and trajectory for Sequence H, while
Method
EKF
SAM
RMS(
cl )
4.23
0.70
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.62
0.69 0.61 0.76
0.78
0.71 0.79 0.64
Table 4.8: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence H.
20
15
y [m]
10
5
0
5
10
0
50
100
150
200
250
300
350
x [m]
Figure 4.22: Sequence H: This gure show a map of the SAM estimates, using the
45
Figure 4.23: Sequence H: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.24: Sequence H Trajectory for SAM estimates, using the models oset
46
Results
Sequence I
In this sequence pitch and roll motion is moderate, and there is some acceleration.
Table 4.9 presents the results for the mean landmark measurement residuals, when
using SAM compared to using only EKF, and when using the dierent vehicle
model extensions, compared to not using them. Figure 4.25 shows a top view of
the estimated map and trajectory for Sequence I, while Figures 4.26 and 4.27 show
trajectory plots which compares the dierent vehicle model extensions, and SAM
versus EKF.
RMS(
cl )
2.16
0.44
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.42
0.43 0.39 0.44
0.46
0.45 0.49 0.44
Table 4.9: These tables show the eect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence I.
80
60
40
y [m]
20
0
20
40
60
0
200
400
600
800
x [m]
Figure 4.25: Sequence I: This gure show a map of the SAM estimates, using the
47
Figure 4.26: Sequence I: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.27: Sequence I Trajectory for SAM estimates, using the models oset
48
Results
Sequence J
This sequence contains extensive acceleration and moderate roll and pitch motion.
Table 4.10 presents the results for the mean landmark measurement residuals,
when using SAM compared to using only EKF, and when using the dierent
vehicle model extensions, compared to not using them. Figure 4.28 shows a top
view of the estimated map and trajectory for Sequence J, while Figures 4.29 and
4.30 show trajectory plots which compares the dierent vehicle model extensions,
and SAM versus EKF.
RMS(
cl )
3.33
0.68
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.64
0.66 0.66 0.69
0.73
0.71 0.71 0.68
Table 4.10: These tables show the eect on the mean landmark measurement
residuals of using SAM and vehicle model extensions for Sequence J.
20
10
y [m]
10
20
30
0
100
200
300
400
x [m]
Figure 4.28: Sequence J: This gure show a map of the SAM estimates, using the
49
Figure 4.29: Sequence J: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.30: Sequence J Trajectory for SAM estimates, using the models oset
50
Results
Sequence K
This sequence exhibits moderate roll motion, and extensive acceleration and pitch
motion. Table 4.11 presents the results for the mean landmark measurement
residuals, when using SAM compared to using only EKF, and when using the
dierent vehicle model extensions, compared to not using them. Figure 4.31 shows
a top view of the estimated map and trajectory for Sequence K, while Figures
4.32 and 4.33 show trajectory plots which compares the dierent vehicle model
extensions, and SAM versus EKF.
RMS(
cl )
3.57
0.75
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.68
0.74 0.68 0.80
0.82
0.76 0.81 0.70
residuals of using SAM and vehicle model extensions for Sequence K.
15
10
5
y [m]
0
5
10
15
20
25
30
0
50
100
150
200
250
300
x [m]
Figure 4.31: Sequence K: This gure show a map of the SAM estimates, using the
51
Figure 4.32: Sequence K: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.33: Sequence K Trajectory for SAM estimates, using the models oset
52
Results
Sequence L
In this sequence both the acceleration and roll motion are extensive, but the pitch
oscillation is moderate. Table 4.12 presents the results for the mean landmark
4.34 shows a top view of the estimated map and trajectory for Sequence L, while
RMS(
cl )
2.42
0.66
Method
EKF
SAM
(a) EKF versus SAM.
RMS(
cl )
With model
Without model

Oset Acc Roll 2nd
0.69
0.60 0.57 0.69
0.63
0.72 0.76 0.64
residuals of using SAM and vehicle model extensions for Sequence L.
0
20
y [m]
40
60
80
100
120
140
200
400
600
800
x [m]
Figure 4.34: Sequence L: This gure show a map of the SAM estimates, using the
53
Figure 4.35: Sequence L: Trajectory for EKF estimates (dashdotted) and SAM
Figure 4.36: Sequence L Trajectory for SAM estimates, using the models oset
54
Results
Smoothed Trajectory
Figure 4.37 shows a part of the trajectory of Sequence A, where it clearly can be
seen that the EKF estimated trajectory is rather rough while the SAM estimated
trajectory is smooth.
Figure 4.37: Sequence A: Part of the trajectory for EKF estimates (dashdotted)
and SAM estimates (solid), using the basic vehicle process model.
4.3
The feature patch scaling described in Section 3.5 is intended to improve the quality of the landmark measurements and to increase the life-time of the landmarks,
with life-time being the number of times a landmark has been visible (measured),
resulting in more measurements per landmark state.
For a number of sequences measures of life-time has been generated, both with
and without the patch scaling, using the basic vehicle process model, and these
results are presented in table 4.13. The results are ambiguous, showing both
improvement and degradation of the mean landmark life-time.
Mean life-time
Sequence A
Sequence B
Sequence C
Sequence D
Sequence E
Sequence F
Sequence G
Sequence H
Sequence I
Sequence J
Sequence K
Sequence L
Scaling
24.9
27.0
47.7
33.5
32.1
20.8
22.7
32.1
40.4
35.5
37.5
14.2
55
No scaling
26.6
22.6
41.2
32.8
26.9
17.6
23.2
33.2
37.7
36.0
49.1
14.2
Table 4.13: Life-time of landmarks, with and without scaling of feature patches.
Chapter 5
Concluding Remarks
This work was aimed at improving the accuracy of vehicle pose estimates, and
this chapter contains conclusions about how the dierent investigated methods
performed, based on the results in the previous chapter, and a discussion of areas
which might be interesting to investigate in order to further improve the pose
estimation accuracy.
It should be noted that this work is in no way exclusive to or dependent on
FIR images, other kind of images such as NIR images or visual images may in
principle be used instead of FIR images.
5.1
Conclusions
This section contains the conclusions, based on the results, of this work.
5.1.1
SAM
It is clear from the sequences presented in Chapter 4 that the SAM algorithm outperforms the EKF algoritm in terms of pose estimation accuracy. This conclusion
is supported both by the landmark measurement residual measure, and by the
trajectory plots, which in the cases where the EKF estimates are clearly erroneous
the SAM estimates are considerably better.
Another aspect of SAM is that the estimated trajectory is smooth, as expected
(remember that SAM literally means smoothing and mapping), while the EKF
generates estimates that are generally much less smooth, especially when we get
erroneous association of landmarks. An example of this is found in gure 4.37.
5.1.2
Vehicle Process Model Extensions
As can be seen in the results, the process model extensions are not able to improve
the pose estimates as much as the SAM algorithm. They do however result in some
improvements, and a summary of the results along with conclusions are given in
this section.
57
58
Concluding Remarks
Constant Oset in Car Pitch Angle

The landmark measurement residual measures shows that this model extension can
give some improvements in pose estimation accuracy. The magnitude of the improvement of course depends heavily on how much oset the car pitch angle really
is, which is why this model extension cannot provide improvements in accuracy
for all sequences.
Note that if we have a sequence with considerable acceleration, and we use
the constant oset model extension but not the acceleration model extension,
then the constant oset model extension will erroneously try to describe the pitch
angle change (caused by the acceleration) as an oset.
Acceleration Oset in Car Pitch Angle
It can be seen from the results that if a sequence contains extensive acceleration
this model extension can decrease the value of the landmark measurement residual
measure, indicating improved pose estimation accuracy. To get such results the
acceleration has to be at least in the order of 1 m/s2 . As a reference, a regular
automobile can gain speed at a rate of 23 m/s2 och brake at a rate of 510
m/s2 , so under normal driving conditions accelerations of 1 m/s2 or more are
common. An advantage of this model extension is that it does not introduce any
new vehicle states, so it can be used without compromising the estimation accuracy
for acceleration-free sequences.
Roll Angle
This model extension diers from the others in that it introduces a new dimension
for the description of vehicle motion. The results reects this by showing that it
is the model extension which provides the largest performance improvement, at
least in terms of landmark measurement residuals. However, for some sequences
the roll angle has had a tendency to drift slowly, so if longer sequences are to be
analyzed this might become a problem.
Second Order Model for Car Pitch
The results for this model extension indicate that it provides no improvement,
but rather that it decreases performance. This might be a result of that the true
car pitch oscillation process is described adequately enough by the basic model,
so that the second order model basically only adds the hassle of estimating more
vehicle states, making it more dicult to estimate the pose accurately. Another
possibility is that the used values of the model parameters (natural frequency and
relaxation time) are not properly set.
5.1.3
The results show that introducing scale change correction for association of feature
patches most of the time gives landmarks with basically unchanged or slightly
5.2 Future Work
59
longer life-time, but the results of sequence K shows that the scaling might decrease
the life-time, although the life-time with scaling was still high for this sequence,
so it is possible that the risk of shortened life-time is not so big for sequence in
which the life-time is low when scaling is not used. Another possibility for that
sequence is that when no scaling was used there were some long-lived landmarks
with erroneous association, and introducing scaling benecially decreased the lifetime of these landmarks by letting only the correct associations remain.
5.2
Future Work
During this work it has become more and more clear that the problem with erroneous association of features is an important issue to be dealt with, since the
measurements are the foundation on which SAM and model extensions relies. The
natural way to improve landmark association is to use some other feature descriptor, and possibly also change the feature detection algorithm, but a straightforward
approach of reducing the rate of erroneous feature association is to add logic in
the SAM algorithm that removes unstable landmarks. Such an approach might
also result in robustness to moving objects, at least to some extent. Another way
to improve the measurements derived from the camera images is to extract image
measurements of yaw, pitch and roll change rates that are not based on landmarks,
but instead uses phase correlation or some similar method.
Bibliography
[1] F. Dellaert and M. Kaess. Square root SAM: Simultaneous localization and
mapping via square root information smoothing. International Journal of
Robotics Research, 25(12):11811203, 2006.
[2] E.D. Dickmanns. Dynamic Vision for Perception and Control of Motion.
Springer, Secaucus, NJ, USA, 2007.
[3] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of the 4th Alvey Vision Conference, pages 147151, Manchester, UK,
August 1988.
[4] M. Kaess, A. Ranganathan, and F. Dellaert. iSAM: Incremental Smoothing
and Mapping. IEEE Transactions on Robotics, 24(6):13651378, 2008.
[5] Q. Lin, F. Tjrnstrm, J. Roll, and B. Wass. Developing a far infrared based
night-vision system with pedestrian detection. VDI Berichte, (2038):153158,
2008.
[6] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91110, 2004.
[7] Y. Ma, S. Soatto, J. Kosecka, and S.S. Sastry. An Invitation to 3-D Vision:
From Images to Geometric Models. Springer, 2004.
[8] J.M.M. Montiel, J. Civera, and A.J. Davison. Unied inverse depth
parametrization for monocular SLAM. In Proceedings of Robotics: Science
and Systems (RSS), Philadelphia, USA, August 2006.
[9] T.B. Schn and J. Roll. Ego-motion and indirect road geometry estimation
using night vision. In IEEE Intelligent Vehicles Symposium, Proceedings,
pages 3035, Xian, Shaanxi, China, June 2009.
[10] G. Sibley, C. Mei, I. Reid, and P. Newman. Adaptive Relative Bundle Adjustment. In Proceedings of Robotics: Science and Systems (RSS), Seattle,
USA, June 2009.
[11] Z. Sjanic, M. Skoglund, T. B. Schn, and F. Gustafsson. Solving the SLAM
Problem for Unmanned Aerial Vehicles Using Smoothed Estimates. In Proceedings of the Reglermte (Swedish Control Conference), Lund, Sweden,
June 2010.
61
62
Bibliography
[12] H. Strasdat, J.M.M. Montiel, and A.J. Davison. Real-time Monocular SLAM:
Why Filter?
In Proceedings of the IEEE International Conference on
Robotics and Automation (ICRA), Anchorage, Alaska, USA, May 2010.
[13] O. Tsimhoni, J. Brgman, and M.J. Flannagan. Pedestrian detection with
near and far infrared night vision enhancement. LEUKOS - Journal of Illuminating Engineering Society of North America, 4(2):113128, 2007.
Appendix A
Nomenclature
A.1
Mathematical Notations
f
N (, )
v2
arg minf (x)
time derivative df
dt of function f
normal distribution with mean and covariance matrix
2-norm of vector v
argument which minimizes the function f
max abs(v)
x
I nn
A.2
EKF
FIR
NCC
NIR
RMS
SAM
SLAM
A.3
kw
w
w
maximum of the absolute values of the elements in vector v

ooring of scalar x
identity matrix with n rows and columns
Abbreviations
extended Kalman lter
far infrared
normalized cross-correlation
near infrared
root mean square
smoothing and mapping
simultaneous localization and mapping
Landmark States
initial camera position in three dimensions
azimuth angle
elevation angle
inverse depth
63
64
A.4
p
vx
A.5
a
A
c
cb
C
e
f
F
g
h
H
j
J
k
l
L
m
M
N
P
Pn
Q
R
S
t
T
u
w
x
y
Nomenclature
Vehicle States
position in three dimensions
velocity of the vehicle in its forward direction
yaw angle
front wheel angle
road pitch angle
vehicle pitch angle
vehicle pitch angle oset
vehicle pitch angular acceleration
roll angle
Variables, Parameters and Functions

vehicle state prediction residuals
matrix describing the optimization problem
measurement residuals
camera position given in vehicle body coordinates
car pitch damping parameter
measurement noise
vehicle state process function, focal length
Jacobian of f with respect to vehicular states
landmark state in six parameters as a function of landmark index
and SAM states
measurement function
Jacobian of h with respect to all states or vehicular states
landmark index
Jacobian of h with respect to landmark states
measurement number
landmark position
wheel base (distance between front and rear wheel axes) of the vehicle
directional unit vector from initial camera position to landmark
number of landmarks, number of visible landmarks
total number of time steps
state covariance
normalized pinhole projection function
process noise covariance
rotation matrix
measurement noise covariance
time
sampling time
input signal
vehicle state process noise
state vector
measurement vector
A.6 Indices
A.6
b
c
j
k
l
t
v
w
x
y
z
A.7
x
Indices
vehicle body coordinate frame
camera, camera coordinate frame
landmark index
measurement number
landmark
time
vehicle
world coordinate frame
x-coordinate
y-coordinate
z-coordinate
Top Notations
state estimate from EKF algorithm
state estimate from SAM algorithm
landmark measurements given in pixel coordinates
65
Appendix B
Proof of Matrix Singularity

This appendix provides a proof of the statement that the process noise covariance
matrix Q(xvt ) is singular (non-invertible).
We have
(2.9)
Qt = Q(xvt ) = B(xvt )Q B(xvt )T Rn

B(xvt )
nv m
Q R
m < nv ,
mm
nv
(B.1a)
(B.1b)
(B.1c)
(B.1d)
which results in
rank Qt = rank(B(xvt )Q B(xvt )T )
min(rank(B(xvt )), rank(Q B(xvt )T ))
/
/
rank(B(xvt )) min(nv , m) = m
(B.2)
m.
Finally, rank Qt < nv since m < nv , and we have that Qt is not of full rank, hence
it is singular.
66
Appendix C
Derivation of Landmark
Measurement Residual Limit
Suppose that the undiscretized landmark measurement y (given in pixels) is noise
free and unbiased, and that we have the prediction yp = y. The measurement
ym is y rounded to the nearest integer, resulting in the landmark measurement
residual cl (
y ) according to
cl (
y ) = ym yp = round(
y ) y =
y + 0.5 y.
(C.1)
Note that we are considering one scalar component of a single measurement, i.e.
we are looking at either the y-component or the z-component of the measurement.
The residual is periodic, so the RMS of it can be calculated over one period,
and we assume that y is uniformly distributed.
v
u
u
u
l
RMS(
c (
y )) = t
1
0.5 (0.5)
0.5
v
u 0.5
u
u
cl (
y )2 d
y=t
(0 y)2 d
y=
0.5
0.5
v
[ ]
u 0.5
(
)
u
0.5
1
y3
1 1
u
2
y d
y=
=
+
=
=t
3 0.5
3 23
23
(C.2)
0.5
1
= 0.29.
2 3
The conclusion from this is that regardless of how good the landmark measurement predictions are (as a result of an accurate vehicle process model), RMS(
cl )
cannot be expected to be below approximately 0.29. (Note however that this limit
is not strict in a mathematical sense, since by chance we might get samples of
y resulting in a RMS(
cl ) smaller than the limit, but the probability for this to
happen decreases rapidly as the number of measurements increases.)
67
Appendix D
Results
Tables D.1-D.12 show the complete results in terms of landmark measurement
residuals for all the sequences presented in the results chapter.
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
2.50
0.55
2.91
0.54
2.49
0.55
2.73
0.55
2.54
0.53
2.90
0.54
2.88
0.50
2.88
0.53
2.47
0.49
2.53
0.53
2.65
0.40
2.88
0.49
2.91
0.56
2.78
0.50
2.56
0.39
2.77
0.52
Table D.1: Sequence A: Landmark measurement residuals.
68
69
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
1.02
0.49
1.39
0.52
1.00
0.49
0.94
0.41
0.90
0.49
1.37
0.51
1.36
0.39
1.34
0.50
0.95
0.41
0.99
0.51
1.10
0.45
1.36
0.39
1.34
0.50
1.31
0.39
1.20
0.44
1.31
0.39
Table D.2: Sequence B: Landmark measurement residuals.

Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
3.02
0.71
6.02
0.62
3.14
0.83
3.45
0.65
3.18
0.71
4.59
0.61
2.82
0.60
5.09
0.63
3.46
0.65
2.77
0.74
3.35
0.53
2.87
0.58
5.13
0.60
3.78
0.63
3.07
0.69
3.47
0.62
Table D.3: Sequence C: Landmark measurement residuals.
70
Results
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
1.44
0.66
1.81
0.56
1.34
0.64
1.93
0.55
1.08
0.65
1.82
0.56
1.64
0.50
1.69
0.57
1.90
0.55
1.09
0.66
1.91
0.55
1.65
0.50
1.74
0.57
1.57
0.51
1.97
0.65
2.37
0.56
Table D.4: Sequence D: Landmark measurement residuals.

Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
0.99
0.47
0.86
0.44
0.97
0.47
0.95
0.45
0.98
0.47
0.86
0.44
0.90
0.45
0.82
0.43
0.93
0.45
0.97
0.47
0.94
0.45
0.88
0.42
0.83
0.43
0.84
0.44
0.92
0.45
0.85
0.44
Table D.5: Sequence E: Landmark measurement residuals.
71
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
1.46
0.55
1.74
0.53
1.48
0.53
1.47
0.48
1.37
0.56
1.84
0.53
1.61
0.49
1.64
0.52
1.47
0.48
1.31
0.64
1.39
0.49
1.69
0.49
1.80
0.53
1.57
0.49
1.40
0.48
1.69
0.49
Table D.6: Sequence F: Landmark measurement residuals.

Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
1.95
0.53
3.43
0.54
2.04
0.66
2.03
0.50
3.00
0.61
3.50
0.54
2.75
0.54
4.72
0.55
1.80
0.52
2.17
0.65
2.51
0.53
3.10
0.48
3.78
0.57
2.47
0.51
2.11
0.53
3.12
0.55
Table D.7: Sequence G: Landmark measurement residuals.
72
Results
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
3.91
0.75
4.49
0.67
4.20
0.87
3.91
0.87
3.48
0.97
5.89
0.65
3.50
0.42
4.20
0.65
3.18
0.48
3.63
0.95
3.63
0.69
3.01
0.41
5.59
0.80
5.65
0.68
3.47
0.64
5.86
0.68
Table D.8: Sequence H: Landmark measurement residuals.

Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
2.10
0.51
2.66
0.55
2.19
0.51
1.87
0.36
2.12
0.53
2.35
0.53
1.83
0.36
2.35
0.42
1.66
0.36
1.97
0.43
2.91
0.53
1.81
0.36
2.58
0.43
1.76
0.36
2.70
0.46
1.74
0.35
Table D.9: Sequence I: Landmark measurement residuals.
73
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
2.84
0.75
4.04
0.73
2.83
0.69
2.99
0.72
2.58
0.77
3.49
0.67
3.70
0.57
3.96
0.67
3.00
0.65
2.96
0.73
2.93
0.83
3.65
0.63
3.73
0.64
4.25
0.61
2.41
0.66
3.85
0.56
Table D.10: Sequence J: Landmark measurement residuals.

Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
4.26
0.84
7.54
0.72
3.41
0.75
3.16
0.72
3.14
0.88
3.64
0.67
3.25
0.63
2.96
0.81
3.24
0.66
3.91
1.08
3.33
0.87
3.10
0.59
2.81
0.76
3.48
0.59
2.58
0.75
3.24
0.66
Table D.11: Sequence K: Landmark measurement residuals.
74
Results
Oset Acc Roll 2nd
RMS(
cl )
EKF SAM
2.53
0.68
2.82
0.68
2.59
0.68
1.93
0.60
2.02
0.77
2.57
0.61
2.15
0.67
4.19
1.29
1.78
0.59
2.02
0.66
1.83
0.56
2.30
0.57
2.87
0.66
2.53
0.52
2.24
0.50
2.38
0.53
Table D.12: Sequence L: Landmark measurement residuals.

Full Text 01

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Full Text 01

Hochgeladen von

Copyright:

Verfügbare Formate

Institutionen fr systemteknik

Department of Electrical Engineering

An Optimization Based Approach to Visual

Examensarbete utfrt i Reglerteknik

Department of Electrical Engineering

Linkpings tekniska hgskola

An Optimization Based Approach to Visual

Examensarbete utfrt i Reglerteknik

Linkping, 15 June, 2010

Division of Automatic Control

En optimeringsbaserad metod fr visuell odometri med infrarda bilder

Frfattare Emil Nilsson

3 Visual Odometry and SLAM

B Proof of Matrix Singularity

C Derivation of Landmark Measurement Residual Limit

Autoliv is a worldwide leading developer and manufacturer of automotive safety

1.4 Camera System Overview

Camera System Overview

(a) Camera mounting

(b) Close-up view of camera mounting

where hv is the vehicle measurement function and evt is Gaussian measurement

where hl is the landmark measurement function and elj,t is Gaussian measurement

This section describes how coordinates of a xed point is transformed between

2.3 Vehicle Process Model

Since rotating from to is the inverse of rotating from to (R = (R )1 )

Vehicle Process Model

The input signal ut is given by

where the vehicle acceleration v x,t in practice is measured. By treating v x,t as an

px,t + T vx,t cos t cos t

pz,t T vx,t sin t

which describes the vehicle process model.

where all the q-variables are process noise variance parameters.

2.3 Vehicle Process Model

where C is the previously mentioned pitch damping parameter.

Second Order Car Pitch Model

2.4 Landmark Parametrization

Using (2.14) to discretize (2.16) results in

A comparison of (2.8) and (2.17) nally gives

In this section the landmark state is described. At time t we have Mt number

2.5 Measurement Model

The model for the vehicle measurements is

where all the s-variables are measurement noise variance parameters.

Visual Odometry and SLAM

Extended Kalman Filter

Visual Odometry and SLAM

h (xt , xjt (1),t )

The EKF uses linearized measurement and process models given by

3.1 Extended Kalman Filter

When a new feature is extracted from an image, a corresponding landmark

Visual Odometry and SLAM

Initialize all elements in x

In order to nd distinctive, and thereby trackable, features in the image some

3.3 Feature Association

Algorithm 2 Feature detection

Smooth the image in order to suppress noise.

In order to track a feature in a sequence of frames we must perform association,

Smoothing and Mapping

Visual Odometry and SLAM

Algorithm 3 Feature association

for all features in xlt do

3.4 Smoothing and Mapping

g(xvtc (j) , xlj )

Hkl,c xvtc (jk )

Visual Odometry and SLAM