Sensor Fusion and Its Applications
Edited by Dr. Ciza Thomas
Published by Sciyo
Janeza Trdine 9, 51000 Rijeka, Croatia
All chapters are Open Access articles distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license, which permits users to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by Sciyo, authors have the right to republish it, in whole or in part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source.
Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book.
The technology of sensor fusion combines pieces of information coming from different sources/sensors, resulting in enhanced overall system performance with respect to separate sensors/sources. Different sensor fusion methods have been developed to optimize the overall system output in a variety of applications for which sensor fusion is useful: sensor networks, security, medical diagnosis, navigation, biometrics, environmental monitoring, remote sensing, measurement, robotics, etc. The variety of techniques, architectures, and levels of sensor fusion makes it possible to bring solutions to problems in diverse disciplines.
This book explores the latest practices and research in the area of sensor fusion, collecting novel ideas, theories, and solutions from across the field. It is intended to serve researchers, academics, and practitioners, and to be a comprehensive, up-to-date resource for sensor fusion systems designers. The book is also appropriate as an upper-division undergraduate or graduate-level textbook, and should interest researchers who need to process and interpret sensor data in most scientific and engineering fields.
Initial chapters in this book provide a general overview of sensor fusion. The later chapters
focus mostly on the applications of sensor fusion. Much of this work has been published
in refereed journals and conference proceedings and these papers have been modified and
edited for content and style. With contributions from the world’s leading fusion researchers
and academicians, this book has 22 chapters covering the fundamental theory and cutting-
edge developments that are driving this field.
Several people have made valuable contributions to this book. All researchers who have contributed are kindly acknowledged: without them, this would not have been possible. Jelena Marusic and the rest of the Sciyo staff provided technical and editorial assistance that improved the quality of this work.
Editor
1. Introduction
In the field of information fusion, state estimation is a fundamental task [1-3]. Traditional state estimation is a process that applies statistical principles to estimate the dynamic (or static) state of a target from the measurement information, including its errors, of a single measuring system. However, a single measuring system cannot provide enough information to satisfy the system requirements for target control, and it limits the precision and robustness of the state estimation. Therefore, developing and researching information fusion estimation theory and methods is the only way to obtain state estimation with high precision and robustness.
The traditional estimation method for the target state (parameter) can be traced back to the age of Gauss: in 1795, Gauss presented least squares estimation (LSE), which was then applied to orbit determination for space targets. At the end of the 1950s, Kalman presented a linear filtering method, which is widely applied in target state estimation and can be regarded as a recursive form of LSE [4]. At present, these two algorithms are the common algorithms in multi-sensor state fusion estimation, respectively called the batch processing fusion algorithm and the sequential fusion algorithm.
The classical LSE is unbiased, consistent and efficient, as well as algorithmically simple and easy to operate, when applied to a standard multi-sensor information fusion system (characterized by linear state and measurement equations and uncorrelated additive noise with zero mean) [5]. However, because of differences in measuring principles, sensor characteristics, and measuring environments, some non-standard multi-sensor information fusion systems often have to be treated in actual applications, mainly the following:
1) Systematic errors, mixed errors and random disturbance factors, as well as nonlinear and uncertain factors (colored noise), existing in the multi-sensor measurement information [6];
2) Uncertain and nonlinear factors existing in the multi-sensor fusion system model, expressed in two aspects: one is strong uncertain and nonlinear factors in the model structure, and the other is time-varying and uncertain factors in the model parameters [7];
3) Correlation between system noise and measurement noise in the dynamic system, or correlation among sub-filter estimates, as well as uncertain system parameters and unknown covariance information [8-9].
If the traditional batch processing fusion algorithm or sequential fusion algorithm is still used while ignoring the above situations, the optimal estimation results cannot be obtained. It is therefore essential and significant to research optimal fusion estimation algorithms for non-standard multi-sensor systems with the above characteristics [10].
In the next three sections, the research work in this chapter focuses on non-standard multi-sensor information fusion systems with, respectively, nonlinear, uncertain and correlated factors found in actual multi-sensor systems, and then presents the corresponding resolution methods.
Firstly, a modeling method based on semi-parametric modeling is researched to solve state fusion estimation in non-standard multi-sensor fusion systems, eliminating the nonlinear mixed errors and uncertain factors existing in the multi-sensor information and thereby realizing the optimal fusion estimation of the state.
Secondly, multi-model fusion estimation methods based respectively on multi-model adaptive estimation and on interacting multiple model fusion are researched to deal with nonlinear and time-varying factors existing in multi-sensor fusion systems and thereby realize the optimal fusion estimation of the state.
Thirdly, self-adaptive optimal fusion estimation for non-standard multi-sensor dynamic systems is researched. A self-adaptive fusion estimation strategy is introduced to solve local dependency and system parameter uncertainty existing in multi-sensor dynamic systems and thereby realize the optimal fusion estimation of the state.
number of parameters to be estimated, the treatment not only lowered the state fusion estimation accuracy, but also increased the computational complexity of the matrix inversion. In addition, robust estimation theory and its research address the problem of incomplete handling of abnormal values and of systems affected by large deviations [13]. A first-order Gauss-Markov process is used to analyze and handle the random noise in measurement information. However, most of these treatments are based on artificial experience and strong hypotheses, which are sometimes so contrary to the actual situation that they cast doubt on the feasibility and credibility of the state fusion estimation.
The main reason for the failure to solve the above-mentioned problems is that there is no suitable uncertainty modeling method, i.e. no suitable mathematical model to describe the non-linear mixed-error factors in the multi-sensor measurement information [14].
The partly linear model (also called the semi-parametric model) can serve as a suitable mathematical model to describe the non-linear mixed-error factors in the measurement information [15]. A semi-parametric model has both parametric and non-parametric components. Its advantage is that it focuses on the main part of the information (i.e. the parametric component) without neglecting the role of the interference terms (the non-parametric component). The semi-parametric model is a set of tools for solving practical problems with broad application prospects. On the one hand, it solves problems which are difficult for a parametric model or a non-parametric model alone, thus enhancing the adaptability of the model; on the other hand, it overcomes the excessive loss of information of the non-parametric method, describes practical problems closer to reality, and makes fuller use of the information provided by the data, eliminating or weakening the impact of non-linear factors on the state fusion estimation accuracy more effectively.
This section attempts to introduce the idea of semi-parametric modeling into non-standard multi-sensor state fusion estimation theory. It establishes a non-standard multi-sensor state fusion estimation model based on semi-parametric regression and its corresponding parametric and non-parametric algorithm. While determining the unknown parameters, it can also distinguish between nonlinear factors and uncertainties, or between systematic errors and accidental errors, so as to enhance the state fusion estimation accuracy.
system, are difficult to express completely by parameters. In the first place, there are many factors which affect the value of the nonlinearity, and not all of them can be considered when establishing mathematical models. Secondly, some relatively simple functional relations are chosen to substitute for the true functional relation between those factors and their parameters, so the established functional model is often only an approximate expression of the practical problem; that is to say, there exists model representation error. When the model error is small, omitting it has little influence on the assessment of the general state of the system. But when the model error is comparatively large, neglecting it will exert a strong influence and lead to wrong conclusions. Therefore, we mainly focus on the refinement of the state fusion estimation model under the condition of non-linear uncertain factors (those non-linear uncertainties which are not fully parameterized), introducing semi-parametric regression analysis to establish non-standard multi-sensor information fusion estimation theory based on semi-parametric regression and its corresponding fusion estimation algorithm.
(1) Semi-parametric Regression Model
Assume the unified linear model of a standard multi-sensor fusion system is:

Y_N = H X + v_N

where Y_N is the observation vector, X the state vector to be estimated by fusion, v_N the observation error, and H the mapping matrix between the measurement information and the state to be estimated. In this model, v_N is supposed to be zero-mean white noise. That is to say, apart from the observation error, the observation vector Y_N is completely a function of the state to be assessed. However, if the model is not accurate, with nonlinear uncertainties, the above formula cannot strictly hold and should be replaced by:

Y_N = H_N X + S_N + v_N    (2.1)

where S_N(t) is the model error, which describes an unknown functional relationship; it is a function of some variable t.
Currently, there are three theoretical methods of using the semi-parametric model to estimate errors with nonlinear factors: partly linear model estimation by approximate parameterization, partly linear model estimation by regularization matrix compensation, and two-stage partly linear model estimation [16]. But the solution process implies that the algorithm realization is comparatively complex, and the estimation accuracy depends on knowledge of the characteristics of the non-parametric component as well as on the choice of basis functions. Taking partly linear model estimation by regularization matrix compensation for instance, the programming of key factors like the regularization matrix and the smoothing factor is highly hypothetical, with some elements presumed in advance; furthermore, the solution process is very complex. If there is an error, or something that cannot meet the model requirements, in the solution of the smoothing factor and the regularization matrix R_s, it will directly make the semi-parametric fusion model unsolvable.
Here, we propose an algorithm for state fusion estimation based on mutual-iteration semi-parametric regression: through compensation of the error of the non-standard multi-sensor fusion model, spectrum feature analysis of the non-linear uncertainties, and an aliasing-frequency estimation method of
State Optimal Estimation for Nonstandard Multi-sensor Information Fusion System 5
decoupling to define the best-fitting model, a mutual iteration is established between the model-compensation step and the semi-parametric-regression state fusion estimation, isolating the non-linear uncertainties and eliminating their influence on the accuracy of the state fusion estimation.
(2) Expressing the nonlinear uncertainties by basis functions: a method for decoupled parameter estimation of the aliasing frequencies.
According to signal processing theory, in actual data processing the true signal, the model errors and the random errors often lie in different frequency ranges. The frequency components included in the measurement model error are higher than the true signal frequency but lower than the random errors, so it can be called a sub-low-frequency error [17-18]. It is difficult for the classical least squares estimation method to distinguish between non-linear model errors and random errors. However, the error characteristics of the measurement model can be extracted from the residual errors of the multi-sensor observations. Namely, it is possible to improve the state estimation accuracy if the model-error part of the residual (caused mainly by the non-linear uncertainties) can be separated from the random noise and the impact of the model error deducted in each iteration of the solution.
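The separation idea above can be sketched numerically. The following is a minimal illustration, not from the chapter: a moving-average low-pass filter (an assumed, simple choice of LPF) is applied to a residual that mixes a sub-low-frequency model error with white noise; all signal parameters are invented for the demonstration.

```python
import numpy as np

def moving_average_lpf(x, width):
    """Simple moving-average low-pass filter (illustrative choice of LPF)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)

model_error = 0.3 * np.sin(2 * np.pi * 8.0 * t)  # sub-low-frequency model error
white_noise = 0.3 * rng.standard_normal(t.size)  # random measurement noise

# The residual after removing the estimated state contribution contains
# model error plus white noise.
residual = model_error + white_noise

# Low-pass filtering the residual recovers the model-error component,
# since it lies below the noise band.
extracted = moving_average_lpf(residual, width=25)

rms_before = np.sqrt(np.mean((residual - model_error) ** 2))
rms_after = np.sqrt(np.mean((extracted - model_error) ** 2))
print(rms_after < rms_before)  # the LPF output tracks the model error more closely
```

The filter width trades off noise suppression against distortion of the sub-low-frequency component; the chapter's aliasing-frequency decoupling plays the role that the fixed cutoff plays here.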
Consider that the nonlinear factor S in the semi-parametric model can be fitted in the following polynomial modulation function form:

S(t) = Σ_{m=0}^{M-1} ( Σ_{i=0}^{N_m-1} a_i^{(m)} t^i ) exp{j2πf_m t} ≝ Σ_{m=0}^{M-1} b_m(t) exp{j2πf_m t}    (2.2)

Noting b̂_0(t) = LPF[ỹ_0(t)], where LPF[·] denotes a low-pass filter, the observation model of the amplitude envelope is deduced from Formula (2.2):

b̂_0(t) = Σ_{i=0}^{N_0-1} a_i^{(0)} t^i + ε(t)    (2.4)
Y_{i1} = S_i + v_{i1}
Y_{i2} = S_i + v_{i2}
  ⋮                      i = 1, 2, …, N
Y_{iL} = S_i + v_{iL}

where v_{i1}, v_{i2}, …, v_{iL} are white noise series, and Y_{1j}, Y_{2j}, …, Y_{Nj} is replaced by:

X̂_BCS = (H^T R^{-1} H)^{-1} H^T R^{-1} (Y − Ŝ)    (2.6)
And Ŝ is the fitted value of the nonlinear uncertain error vector; then its expectation is:

E[X̂_BCS] = E[(H^T R^{-1} H)^{-1} H^T R^{-1} (Y − Ŝ)] = (H^T R^{-1} H)^{-1} H^T R^{-1} H X = X    (2.7)

so X̂_BCS is an unbiased estimate of X. The estimate X̂_WLSE computed by the weighted least squares fusion method is:

X̂_WLSE = (H^T R^{-1} H)^{-1} H^T R^{-1} Y = (H^T R^{-1} H)^{-1} H^T R^{-1} (H X + Ŝ)    (2.8)
Its expectation is:

E[X̂_WLSE] = X + (H^T R^{-1} H)^{-1} H^T R^{-1} Ŝ

which is biased by the unmodeled error term.
Theorem 2.3: In the presence of nonlinear error factors, the estimation accuracy of X̂ obtained by mutual-iteration semi-parametric-regression state fusion is higher than that obtained by the weighted least squares fusion method.
Proof: Let the semi-parametric state fusion estimation accuracy be Cov[X̂_BCS]; then:

Cov[X̂_BCS] = E[(X̂_BCS − X)(X̂_BCS − X)^T] = (H^T R^{-1} H)^{-1}    (2.11)

However, the accuracy Cov[X̂_WLSE] obtained by the weighted least squares fusion method is:

Cov[X̂_WLSE] = E[(X̂_WLSE − X)(X̂_WLSE − X)^T]
            = E[(X̂_BCS + P − X)(X̂_BCS + P − X)^T] = (H^T R^{-1} H)^{-1} + P^T P    (2.12)

Thus the estimation accuracy of X̂ based on mutual-iteration semi-parametric-regression state fusion is superior to that obtained by the weighted least squares fusion method.
It can also be predicted that as the non-linear error factors increase, the estimation accuracy is bound to decrease more and more. But the mutual-iteration semi-parametric state fusion estimation method can separate the white noise series in the observation noise from the non-linear error factors, cancelling their influence on the state fusion estimation accuracy through the fitted estimates. In the presence of nonlinearity errors, the state estimator obtained by the iterative semi-parametric state fusion estimation method will be the optimal estimate of the true value.
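Theorem 2.3's comparison can be illustrated with a small Monte-Carlo sketch. All matrices and the error vector S below are invented for the example, and the fitted Ŝ is taken as exact for simplicity: the weighted least squares estimate (2.8), which ignores S, is biased, while the compensated estimate (2.6) is not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed linear fusion model Y = H X + S + v (all values illustrative)
H = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0,  1.0],
              [1.0, -1.0]])
X_true = np.array([2.0, -1.0])
R = 0.05 * np.eye(4)                    # measurement noise covariance
S = np.array([0.5, -0.4, 0.3, 0.2])     # non-linear model-error vector

Rinv = np.linalg.inv(R)
G = np.linalg.inv(H.T @ Rinv @ H) @ H.T @ Rinv   # common gain matrix

trials = 2000
err_wlse = np.zeros(2)
err_comp = np.zeros(2)
for _ in range(trials):
    v = rng.multivariate_normal(np.zeros(4), R)
    Y = H @ X_true + S + v
    X_wlse = G @ Y           # (2.8): ignores the model error S
    X_comp = G @ (Y - S)     # (2.6): compensates with the fitted S-hat (here exact)
    err_wlse += (X_wlse - X_true) / trials
    err_comp += (X_comp - X_true) / trials

# The average error of the uncompensated estimate shows the bias G @ S;
# the compensated estimate's average error is near zero.
print(np.linalg.norm(err_comp) < np.linalg.norm(err_wlse))
```

In practice Ŝ comes from the low-pass residual fitting of the previous subsection rather than being known exactly, so the compensation removes most, not all, of the bias.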
Obtaining the optimal state fusion estimate is the process of first using multiple models to approach the dynamic performance of the system, and then fusing the multi-model, multi-sensor tracking measurements of the controlled object. This is in essence the problem of multi-model fusion estimation [22]. Its basic idea is to map the uncertainty of the parameter space (or model space) to a model set. Based on a parallel estimator for each model, the state estimate of the system is the optimal fusion of the estimates obtained by the estimators corresponding to the models. As it is very difficult to analyze such a system, one method is to use a linear stochastic control system to approximate the original nonlinear system, and to employ linear-regression-model thinking to solve the nonlinear problem that would otherwise have to be handled by uncertain control systems [23]. The fusion approach diagram is shown in Fig. 3.1.
[Fig. 3.1: model-matched estimators (Model 1, 2, 3, …, with parameters ρ = ρ_i) process the measurement Z in parallel; their local estimates X̂_i are combined by fusion estimation into the optimal estimate X̂_opt.]
Here, since the different operational modes of the stochastic uncertain system are handled by a group of parallel estimators, the input of each estimator is the control input u and the measurement information Z of the system, while the output of each estimator is its output residual and state estimate X̂_i under a single model. According to the residual information, a hypothesis-testing principle is used to assign the model weight of the estimator corresponding to each model, reflecting the probability that that model is in effect at the given time. The overall system state estimate is then the weighted average of the state estimates of the estimators.
method. If the conditional mean of the system state is taken as the estimate, the global estimate is the probability-weighted sum of the estimates of all estimators. That is:

X̂_{k|k} = E[X_k | Z^k] = Σ_{i=1}^{N} X̂_{ik} P(M_{ik} | Z^k)    (3.5)
2) Hard Decision: The global estimate is always approximated from the estimated values of some of the estimators. The principle for selecting these estimators is maximum possible matching of the model with the current mode, and the final state estimate is mandatory. If only one model is selected among all models by maximum probability, its estimated value is taken as the global one.
3) Random Decision: The global estimate is determined approximately from a randomly selected sequence of estimated models. The first fusion mode is the main method of multi-model fusion estimation. With the approximation of nonlinear systems and the improvement of system fault tolerance, the tendency of multi-model fusion is toward designing real-time adaptive weighting factors and realizing adaptive control between models.
In practice, according to the model structures and fusion methods, multi-model fusion algorithms can be divided into two categories: (1) fixed multiple model (FMM); (2) interacting multiple model (IMM) [24-25]. The latter is designed to overcome the shortcomings of the former. It can extend the system to new modes without changing the structure of the system, but it requires some prior information about the probability density functions, and the switching between the various models must satisfy a Markov process.
Closely related to the fixed-structure MM algorithms, there is a virtually ignored question: the performance of an MM estimator depends heavily on the model set used. There is a dilemma here: more models should be added to improve the estimation accuracy, but using too many models will not only increase computation but also reduce the estimator's performance.
There are two ways out of this dilemma: 1) designing a better model set (but so far the available theoretical results are still very limited); 2) using a variable model set.
Multi-model Adaptive Estimation (MMAE) and the Interacting Multiple Model are discussed among the multi-model estimation methods later in this chapter.
residual error is used to calculate the conditional probability p_i, under the conditions of the actual measured values and the actual parameter vector a, by a hypothesis-testing algorithm. The conditional probability is used to weigh the correctness of each Kalman filter state estimate. The probability-weighted average of the state estimates forms the mixed state estimate of the actual system, X̂_MMAE. The multiple model adaptive estimator is shown in Fig. 3.2.
(2) The Filtering Algorithm in MMAE
Step 1: Parallel Filtering Equations
The Kalman filter of the i-th (i = 1, 2, …, N) linear model is:

X_i(t_k) = Φ_i X_i(t_{k-1}) + C_i u(t_{k-1}) + Γ_i w_i(t_{k-1})
Z_i(t_k) = H_i X_i(t_k) + v_i(t_k)    (3.6)

The symbols have the same meanings as those of Formula (3.4). In addition, the system noise w_i(t_k) and observation noise v_i(t_k) are both zero-mean white noise, satisfying for all k, j:

E[w_i(t_k)] = 0
E[v_i(t_k)] = 0
E[w_i(t_k) w_i^T(t_j)] = Q_i δ_{k,j}    (3.7)
E[v_i(t_k) v_i^T(t_j)] = R_i δ_{k,j}
E[w_i(t_k) v_i^T(t_j)] = 0
[Fig. 3.2: N parallel Kalman filters, each based on one model, receive the control input u and the measurement Z; filter i outputs residual r_i and state estimate X̂_i. A hypothesis-testing algorithm turns the residuals into the parameter estimate â and probabilities P_1, …, P_N, and the probability-weighted sum Σ forms X̂_MMAE.]
The Kalman filter algorithm uses the above model to determine the time updates of the Kalman filter state estimate and measurement prediction, the optimal estimate update equation, and the state estimation error covariance matrix. Based on the Kalman filter model, the time update equations of the Kalman filter state estimate are as follows:

X̂_i(k/k-1) = Φ_i X̂_i(k-1/k-1) + C_i u(k-1)
Ẑ_i(k/k-1) = H_i X̂_i(k/k-1)    (3.8)

The time update equation of the state estimation error covariance matrix is:

P_i(k/k-1) = Φ_i P_i(k-1/k-1) Φ_i^T + Γ_i Q_i Γ_i^T    (3.9)

The Kalman filter state estimate achieves the measurement update by the following formula:

X̂_i(k/k) = X̂_i(k/k-1) + K_i(k) r_i(k)    (3.10)

and the Kalman gain is:

K_i(k) = P_i(k/k-1) H_i^T A_i(k)^{-1}    (3.11)

The O-C residual vector is the deviation obtained by subtracting the Kalman estimate based on previous measurements, Ẑ_i(k/k-1), from the measured value Z_i(k), that is:

r_i(k) = Z_i(k) − H_i X̂_i(k/k-1)    (3.12)

Its variance matrix is:

A_i(k) = H_i P_i(k/k-1) H_i^T + R_i    (3.13)

And the update equation of the state estimate covariance matrix is:

P_i(k/k) = [I − K_i(k) H_i] P_i(k/k-1)    (3.14)
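Equations (3.8)-(3.14) can be collected into one predict/update routine. The sketch below drives it with an illustrative constant-velocity model; the function name and all numerical values are ours, not the chapter's.

```python
import numpy as np

def kf_step(x, P, z, Phi, C, u, Gamma, Q, H, R):
    """One predict/update cycle of the i-th filter, Eqs. (3.8)-(3.14)."""
    # Time update, Eqs. (3.8) and (3.9)
    x_pred = Phi @ x + C @ u
    P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T
    # O-C residual and its variance, Eqs. (3.12) and (3.13)
    r = z - H @ x_pred
    A = H @ P_pred @ H.T + R
    # Gain, measurement update, covariance update, Eqs. (3.11), (3.10), (3.14)
    K = P_pred @ H.T @ np.linalg.inv(A)
    x_new = x_pred + K @ r
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, r, A

# Illustrative constant-velocity model (all numbers assumed, not from the text)
dt = 1.0
Phi = np.array([[1.0, dt], [0.0, 1.0]])
C = np.zeros((2, 1)); u = np.zeros(1)
Gamma = np.eye(2); Q = 1e-4 * np.eye(2)
H = np.array([[1.0, 0.0]]); R = np.array([[0.25]])

rng = np.random.default_rng(2)
x_true = np.array([0.0, 1.0])
x, P = np.zeros(2), 10.0 * np.eye(2)
for _ in range(50):
    x_true = Phi @ x_true
    z = H @ x_true + rng.normal(0.0, 0.5, size=1)
    x, P, r, A = kf_step(x, P, z, Phi, C, u, Gamma, Q, H, R)

print(abs(x[0] - x_true[0]) < 1.0)  # position estimate tracks the true state
```

In MMAE, one such filter runs per model i; the residual r and its variance A returned here feed the model-probability computation of Step 2.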
Step 2: Solving the Model Probability
The residual of each single linear model at any moment can be obtained through the calculation of the local filtering equation of each parallel filter system. On the basis of the residual information and a hypothesis-testing principle, the model probability corresponding to each estimator model is designed to reflect, in real time, the possibility that the system is in that model at the given moment. Two representations of the model probability are given below:
1) Representation of the model probability based on the statistical properties of the residuals
It is well known that if the single Kalman model and the system model match, the residual is a zero-mean Gaussian white noise sequence, and its variance matrix can be obtained from Formula (3.13). Therefore, the conditional probability density function of the measured value Z(t_k) of the i-th (i = 1, 2, …, N) filter model at the k-th moment is:

f(Z(t_k) | H_i, Z(t_{k-1})) = 1 / ((2π)^{m/2} |A_i|^{1/2}) · exp{−(1/2) r_i^T(k) A_i^{-1} r_i(k)}    (3.15)

Defining the following objective function:

J_i(k) = p(θ_i | Z^k) = p_i(t_k) = Pr{H = H_i | Z(t_k) = Z_k}    (3.16)
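Formula (3.15) can be evaluated directly from each filter's residual and residual covariance, and normalizing the resulting likelihoods gives the model probabilities. A minimal sketch (the residual values and variances are invented):

```python
import numpy as np

def residual_likelihood(r, A):
    """Gaussian likelihood of a filter residual, Eq. (3.15)."""
    m = r.size
    quad = r @ np.linalg.inv(A) @ r
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(A)))

# Residuals and residual covariances of three parallel filters (illustrative):
# model 1 matches the data (small residual), models 2 and 3 do not.
residuals = [np.array([0.1]), np.array([1.5]), np.array([-2.0])]
variances = [np.array([[0.5]]), np.array([[0.5]]), np.array([[0.5]])]

likelihoods = np.array([residual_likelihood(r, A)
                        for r, A in zip(residuals, variances)])
probabilities = likelihoods / likelihoods.sum()  # normalized model probabilities

print(probabilities.argmax())  # the matching model receives the largest weight
```

These normalized probabilities are the weights P_i used in the Σ block of Fig. 3.2; in a recursive implementation each new probability would also be multiplied by the previous-step probability before normalization.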
satisfied by the model. Secondly, IMM can incorporate new operation modes of the estimated system without changing the structure of the system. Furthermore, the amount of computation in IMM is moderate, and it has advantages for nonlinear filtering.
(1) The Fusion Architecture of IMM
Assume a certain system can be described by the following state and measurement equations:

X(k+1) = Φ(k, m(k)) X(k) + w(k, m(k))
Z(k) = H(k, m(k)) X(k) + v(k, m(k))    (3.23)

where X(k) is the system state vector and Φ(k, m(k)) the state transition matrix; w(k, m(k)) is zero-mean Gaussian white noise with variance Q(k, m(k)); Z(k) is the measurement vector and H(k, m(k)) the observation matrix; v(k, m(k)) is zero-mean Gaussian white noise with variance R(k, m(k)); and w(k, m(k)) and v(k, m(k)) are uncorrelated.
Here m(k) denotes the mode in effect at sampling time t_k. At time t_k, the effective representation of m_i is m_i(k) = {m(k) = m_i}, and the set of all possible system modes is M. The model transitions are governed by the probabilities:

P{m_i(k+1) | m_j(k)} = π_{ji},  m_i, m_j ∈ M    (3.24)

and

Σ_{i=1}^{N} π_{ji} = 1,  j = 1, 2, …, N    (3.25)

When measurement information is received, the actual transition probability between models is the maximum posterior probability based on the above π_{ji} and the measurement set {Z^k}.
The core of the interacting multiple model algorithm is to modify the filter's input/output using the actual transition probabilities above. A schematic of the interacting multiple model algorithm is given in Fig. 3.3.
μ_{j|i}(k-1 | k-1) = P{m_j(k-1) | m_i(k), Z^{k-1}} = (1/c̄_i) π_{ji} μ_j(k-1)    (3.29)

where

c̄_i = Σ_{j=1}^{N} π_{ji} μ_j(k-1)    (3.30)
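Equations (3.29)-(3.30) amount to a column-wise normalization of the transition matrix weighted by the previous model probabilities. A minimal sketch with invented numbers:

```python
import numpy as np

# Markov model-transition matrix pi[j, i] = P{ m_i(k+1) | m_j(k) } and the
# previous-cycle model probabilities mu (both illustrative).
pi = np.array([[0.95, 0.05],
               [0.10, 0.90]])
mu = np.array([0.6, 0.4])

# Normalizing constants, Eq. (3.30): c_i = sum_j pi[j, i] * mu[j]
c = pi.T @ mu

# Mixing probabilities, Eq. (3.29): mu_{j|i} = pi[j, i] * mu[j] / c_i
mixing = (pi * mu[:, None]) / c[None, :]

print(np.allclose(mixing.sum(axis=0), 1.0))  # each column sums to one
```

Each column i of `mixing` tells filter i how to blend the previous sub-filter estimates into its mixed initial condition at the start of the cycle.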
Λ_i(k) is the likelihood function of model m_i at the k-th time; the likelihood value is calculated from the residual error and its updated covariance. That is:

Λ_i(k) = N[v_i(k); 0, S_i(k)] = |2π S_i|^{-1/2} exp{−(1/2) v_i^T S_i^{-1} v_i}    (3.39)
Step 4: Output Fusion
The final output state is obtained by weighting and combining all sub-model state estimates, namely, by summing the product of each model's state estimate and its model probability:

X̂(k | k) = Σ_{i=1}^{N} X̂_i(k | k) μ_i(k)    (3.40)

Simultaneously, the estimated covariance matrix is:

P(k | k) = Σ_{i=1}^{N} μ_i(k) {P_i(k | k) + b b^T}    (3.41)

where

b = X̂_i(k | k) − X̂(k | k)    (3.42)
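Equations (3.40)-(3.42) combine the sub-model estimates by their probabilities, with the spread term b b^T inflating the fused covariance where the models disagree. A minimal sketch with invented estimates:

```python
import numpy as np

def imm_output_fusion(estimates, covariances, mu):
    """Probability-weighted combination of sub-model estimates, Eqs. (3.40)-(3.42)."""
    x_fused = sum(m * x for m, x in zip(mu, estimates))       # Eq. (3.40)
    P_fused = np.zeros_like(covariances[0])
    for m, x, P in zip(mu, estimates, covariances):
        b = x - x_fused                                       # Eq. (3.42)
        P_fused += m * (P + np.outer(b, b))                   # Eq. (3.41)
    return x_fused, P_fused

# Two sub-model estimates with their covariances and probabilities (illustrative)
estimates = [np.array([1.0, 0.0]), np.array([3.0, 1.0])]
covariances = [0.2 * np.eye(2), 0.4 * np.eye(2)]
mu = np.array([0.7, 0.3])

x, P = imm_output_fusion(estimates, covariances, mu)
print(x)  # probability-weighted mean of the sub-model estimates
```

Note that the fused covariance exceeds the plain weighted average of the P_i whenever the sub-model estimates differ, which is the intended honest accounting of model uncertainty.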
As will be readily seen, while IMM estimation takes the historical information of the mode into account at the k-th time, it also mixes the previous estimation information at the beginning of each cycle, avoiding the exponential growth with time of the complexity of the optimal estimation. This is the main aspect that distinguishes the interacting multiple model algorithm from non-interacting multiple model estimation.
The Federated Kalman Filter (FKF), a special form of distributed fusion, was proposed by the American scholar N.A. Carlson in 1988. It has been considered a new information fusion method directed solely at synthesizing the estimation information of the sub-filters. The sub-filters form a parallel structure, and each adopts the Kalman filter algorithm to process its own sensor measurements. In order to make the accuracy of the master filter approach that of centralized fusion estimation, the feature distinguishing the combined filter from the general distributed filter is that the combined filter applies the variance upper-bound technique and an information distribution principle to eliminate the correlation between the sub-filter estimates of the individual sensors, and distributes the global state estimate and noise information of the system to each sub-filter without changing the form of the sub-filter algorithm. It therefore has the advantages of a simpler algorithm, better fault tolerance, and ease of implementation. As the information distribution factors determine the performance of the combined filter, their selection rules have become the focus of recent research and debate [28]. Under the present circumstances, the main objective and research direction in this field is to search for and design an information distribution scheme that is simple, effective and self-adaptive.
4.1 Analysis and Decoupling of the Correlation in the Combined Filter
The system description is given as:

X(k+1) = Φ(k+1, k) X(k) + Γ(k+1, k) w(k)    (4.1)
Z_i(k+1) = H_i(k+1) X_i(k+1) + v_i(k+1),  i = 1, 2, …, N    (4.2)

where Z_i(k+1) is the measurement of the i-th sensor at time k+1 and H_i(k+1) the mapping matrix of the i-th sensor at time k+1. Assume E[w(k)] = 0, E[w(k) w^T(j)] = Q(k) δ_{kj}, E[v_i(k)] = 0, and E[v_i(k) v_i^T(j)] = R_i(k) δ_{kj}.
Theorem 4.1: In the multi-sensor information fusion system described by Equations (4.1) and (4.2), if the local estimates are uncorrelated, the globally optimal fusion estimate of the state, X̂_g, has the following general form:

X̂_g = P_g Σ_{i=1}^{N} P_i^{-1} X̂_i = P_g P_1^{-1} X̂_1 + P_g P_2^{-1} X̂_2 + … + P_g P_N^{-1} X̂_N
P_g = (Σ_{i=1}^{N} P_i^{-1})^{-1} = (P_1^{-1} + P_2^{-1} + … + P_N^{-1})^{-1}    (4.3)
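Formula (4.3) can be sketched directly. The local estimates below are invented; with equal covariances the fused estimate reduces to the plain average:

```python
import numpy as np

def fuse_uncorrelated(estimates, covariances):
    """Globally optimal fusion of uncorrelated local estimates, Eq. (4.3)."""
    P_g_inv = sum(np.linalg.inv(P) for P in covariances)
    P_g = np.linalg.inv(P_g_inv)
    x_g = P_g @ sum(np.linalg.inv(P) @ x
                    for x, P in zip(estimates, covariances))
    return x_g, P_g

# Two local (sub-filter) estimates of the same scalar state (illustrative)
x1, P1 = np.array([2.0]), np.array([[1.0]])
x2, P2 = np.array([4.0]), np.array([[1.0]])

x_g, P_g = fuse_uncorrelated([x1, x2], [P1, P2])
print(x_g, P_g)  # equal accuracies -> the fused estimate is the average
```

The fused covariance P_g is always at least as small as each local P_i in the positive semi-definite sense, which is the benefit of fusion; the theorem's precondition that the local estimates are uncorrelated is exactly what the variance upper-bound technique of the next subsection is designed to enforce.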
the combined Kalman filter (the fusion center), with X̂_i(k | k), P_i(k | k) being the estimate and covariance matrix of the i-th sub-filter and X̂_m(k | k), P_m(k | k) the estimate and covariance matrix of the master filter. If there is no feedback from the fusion center to the sub-filters, then when the master filter completes the fusion at time k, X̂_m(k | k) = X̂(k | k) and P_m(k | k) = P(k | k). The forecast of the master filter is (because it receives no measurements, the master filter has only time updates, no measurement updates):

X̂_m(k+1 | k) = Φ(k) X̂(k | k)
P_m(k+1 | k) = Φ(k) P(k | k) Φ^T(k) + Γ(k) Q(k) Γ^T(k)    (4.4)

where the meanings of Φ(k), Γ(k) and Q(k) are the same as above. As the i-th sub-filter has both time updates and measurement updates, it satisfies:

X̂_i(k+1 | k+1) = X̂_i(k+1 | k) + K_i(k+1)(Z_i(k+1) − H_i(k+1) X̂_i(k+1 | k))
              = Φ(k) X̂_i(k | k) + K_i(k+1)(Z_i(k+1) − H_i(k+1) Φ(k) X̂_i(k | k))    (4.5)
Accordingly, the estimation error is:

X̃_i(k+1 | k+1) = X(k+1 | k+1) − X̂_i(k+1 | k+1)
 = Φ(k) X(k | k) + Γ(k) w(k) − Φ(k) X̂_i(k | k)
   − K_i(k+1)[H_i(k+1)(Φ(k) X(k | k) + Γ(k) w(k)) + v_i(k+1) − H_i(k+1) Φ(k) X̂_i(k | k)]
 = (I − K_i(k+1) H_i(k+1))(Φ(k) X̃_i(k | k) + Γ(k) w(k)) − K_i(k+1) v_i(k+1)    (4.6)

Therefore, the covariance of any sub-filter i and the master filter m at time (k+1) is:

P_{i,m}(k+1) = Cov(X̃_i(k+1 | k+1), X̃_m(k+1 | k+1))
 = (I − K_i(k+1) H_i(k+1)) Φ(k) P_{i,m}(k) Φ^T(k)
   + (I − K_i(k+1) H_i(k+1)) Γ(k) Q(k) Γ^T(k)    (4.9)

As can be seen, only on the condition that both Q(k) = 0 and P_{i,j}(k) = 0 are the filtering errors of each sub-filter and the master filter uncorrelated at time (k+1). In the usual case, both conditions are hard to satisfy.
In addition, define:

$$B_i(k+1) = \big(I - K_i(k+1)H_i(k+1)\big)\Phi(k), \qquad C_i(k+1) = \big(I - K_i(k+1)H_i(k+1)\big)\Gamma(k), \qquad i = 1, 2, \dots, N \quad (4.10)$$
Then:

$$\begin{pmatrix} P_{1,1}(k+1) & \cdots & P_{1,N}(k+1) & P_{1,m}(k+1) \\ \vdots & & \vdots & \vdots \\ P_{N,1}(k+1) & \cdots & P_{N,N}(k+1) & P_{N,m}(k+1) \\ P_{m,1}(k+1) & \cdots & P_{m,N}(k+1) & P_{m,m}(k+1) \end{pmatrix}$$
$$= \begin{pmatrix} B_1(k+1)P_{1,1}(k)B_1^{T}(k+1) & \cdots & B_1(k+1)P_{1,N}(k)B_N^{T}(k+1) & B_1(k+1)P_{1,m}(k)\Phi^{T}(k) \\ \vdots & & \vdots & \vdots \\ B_N(k+1)P_{N,1}(k)B_1^{T}(k+1) & \cdots & B_N(k+1)P_{N,N}(k)B_N^{T}(k+1) & B_N(k+1)P_{N,m}(k)\Phi^{T}(k) \\ \Phi(k)P_{m,1}(k)B_1^{T}(k+1) & \cdots & \Phi(k)P_{m,N}(k)B_N^{T}(k+1) & \Phi(k)P_m(k)\Phi^{T}(k) \end{pmatrix}$$
$$+ \begin{pmatrix} C_1(k+1)Q(k)C_1^{T}(k+1) & \cdots & C_1(k+1)Q(k)C_N^{T}(k+1) & C_1(k+1)Q(k)\Gamma^{T}(k) \\ \vdots & & \vdots & \vdots \\ C_N(k+1)Q(k)C_1^{T}(k+1) & \cdots & C_N(k+1)Q(k)C_N^{T}(k+1) & C_N(k+1)Q(k)\Gamma^{T}(k) \\ \Gamma(k)Q(k)C_1^{T}(k+1) & \cdots & \Gamma(k)Q(k)C_N^{T}(k+1) & \Gamma(k)Q(k)\Gamma^{T}(k) \end{pmatrix} \quad (4.11)$$

in which the first term can be written compactly as

$$\mathrm{diag}\big(B_1(k+1), \dots, B_N(k+1), \Phi(k)\big)\begin{pmatrix} P_{1,1}(k) & \cdots & P_{1,N}(k) & P_{1,m}(k) \\ \vdots & & \vdots & \vdots \\ P_{N,1}(k) & \cdots & P_{N,N}(k) & P_{N,m}(k) \\ P_{m,1}(k) & \cdots & P_{m,N}(k) & P_{m,m}(k) \end{pmatrix}\mathrm{diag}\big(B_1^{T}(k+1), \dots, B_N^{T}(k+1), \Phi^{T}(k)\big)$$
As can be seen, because of the common process noise $w(k)$, $P_{i,j}(k+1) = 0$ cannot be obtained even if $P_{i,j}(k) = 0$. The "variance upper-bound" technique can be used to eliminate this correlation. By matrix theory, an upper bound exists for the square matrix composed of the $Q(k)$ blocks in Formula (4.11):
$$\begin{pmatrix} Q(k) & \cdots & Q(k) & Q(k) \\ \vdots & & \vdots & \vdots \\ Q(k) & \cdots & Q(k) & Q(k) \\ Q(k) & \cdots & Q(k) & Q(k) \end{pmatrix} \le \begin{pmatrix} \beta_1^{-1}Q(k) & & & 0 \\ & \ddots & & \\ & & \beta_N^{-1}Q(k) & \\ 0 & & & \beta_m^{-1}Q(k) \end{pmatrix} \quad (4.12)$$

with $\beta_1 + \beta_2 + \cdots + \beta_N + \beta_m = 1$, $0 \le \beta_i \le 1$.
As can be seen, the upper bound in Formula (4.12) is "more positive definite" than the original matrix; that is, the difference between the upper-bound matrix and the original matrix is positive semi-definite.
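The bound in (4.12) can be checked numerically. The sketch below (with an illustrative $Q$ and illustrative allocation factors, not values from the chapter) builds the block matrix of identical $Q$ blocks, builds the block-diagonal upper bound $\mathrm{diag}(\beta_i^{-1}Q)$, and verifies that their difference is positive semi-definite:

```python
import numpy as np

# Illustrative 2x2 positive definite Q and factors beta_1..beta_N, beta_m
# summing to 1 (the information conservation constraint of (4.12)).
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
betas = [0.4, 0.3, 0.2, 0.1]
n_blocks = len(betas)

# M: (N+1)x(N+1) block matrix with every block equal to Q.
M = np.tile(Q, (n_blocks, n_blocks))

# D: block-diagonal upper bound with beta_i^{-1} Q on the diagonal.
D = np.zeros_like(M)
for i, b in enumerate(betas):
    D[2 * i:2 * i + 2, 2 * i:2 * i + 2] = Q / b

# D - M must be positive semi-definite (all eigenvalues >= 0,
# up to floating-point noise).
eig = np.linalg.eigvalsh(D - M)
assert eig.min() >= -1e-9
```

The inequality holds for any positive definite $Q$ and any factors summing to 1, by the Cauchy-Schwarz inequality applied to $\mathrm{diag}(\beta_i^{-1}) - \mathbf{1}\mathbf{1}^{T}$.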
A similar upper-bound can also be set in the initial state covariance P0 . That is:
Fig. 4.1. Structure of the combined (federated) filter. Each sub-system $i$ delivers its measurements $Z_i$ to sub-filter $i$, which performs a time update and produces the local estimate $\hat{X}_i, P_i$; the master filter, driven by the public reference system, performs time updates only and outputs $\hat{X}_m, P_m$; the optimal fusion block combines all local estimates into the global estimate $\hat{X}_g, P_g$, which is redistributed to the sub-filters and the master filter with the information allocation factors $\beta_i$.
From the filter structure shown in Fig. 4.1, the fusion process of the combined filter can be divided into the following four steps.
Step 1 (Initial values and information distribution): Let the global state at the initial time be $X_0$ with process noise covariance $Q_0$, and let the state estimate vector, system noise covariance matrix and state covariance matrix of the $i$-th local filter be $\hat{X}_i, Q_i, P_i$, $i = 1, \dots, N$, with the corresponding quantities of the master filter defined likewise. The information is distributed by:
$$\hat{X}_i(k|k) = \hat{X}_g(k|k), \quad P_i^{-1}(k|k) = \beta_i P_g^{-1}(k|k), \quad Q_i^{-1}(k) = \beta_i Q^{-1}(k), \quad i = 1, 2, \dots, N, m$$
where the factors $\beta_i$ must satisfy the information conservation principle:
$$\beta_1 + \beta_2 + \cdots + \beta_N + \beta_m = 1, \qquad 0 \le \beta_i \le 1$$
Step 2 (Time update): The time update is conducted independently in each filter, with the following algorithm:
$$\hat{X}_i(k+1|k) = \Phi(k+1|k)\hat{X}_i(k|k)$$
$$P_i(k+1|k) = \Phi(k+1|k)P_i(k|k)\Phi^{T}(k+1|k) + \Gamma(k+1|k)Q_i(k)\Gamma^{T}(k+1|k), \quad i = 1, 2, \dots, N, m \quad (4.16)$$
Step 3 (Measurement update): Since the master filter receives no measurements, there is no measurement update in it; the measurement update occurs only in the local sub-filters, by the following formulas:
$$P_i^{-1}(k+1|k+1)\hat{X}_i(k+1|k+1) = P_i^{-1}(k+1|k)\hat{X}_i(k+1|k) + H_i^{T}(k+1)R_i^{-1}(k+1)Z_i(k+1)$$
$$P_i^{-1}(k+1|k+1) = P_i^{-1}(k+1|k) + H_i^{T}(k+1)R_i^{-1}(k+1)H_i(k+1), \quad i = 1, 2, \dots, N \quad (4.17)$$
Step 4 (Optimal information fusion): The information of the state equation and of the process equation is apportioned by the information distribution so as to eliminate the correlation among the sub-filters. The core algorithm of the combined filter then fuses the local information of every local filter to obtain the optimal state estimate:
$$\hat{X}_g(k|k) = P_g(k|k)\sum_{i=1,\dots,N,m} P_i^{-1}(k|k)\hat{X}_i(k|k) \quad (4.18)$$
$$P_g(k|k) = \Big(\sum_{i=1,\dots,N,m} P_i^{-1}(k|k)\Big)^{-1} = \big(P_1^{-1}(k|k) + P_2^{-1}(k|k) + \cdots + P_N^{-1}(k|k) + P_m^{-1}(k|k)\big)^{-1}$$
The workflow of the combined filter is completed by these processes of information distribution, time update, measurement update and information fusion. Obviously, since the variance upper-bound technique is adopted to remove the correlation between the sub-filters and the master filter, and among the sub-filters themselves, by enlarging the initial covariance matrix and the process noise covariance of each sub-filter by a factor of $1/\beta_i$, the result of each local filter is no longer optimal. However, the information lost through the variance upper-bound technique is re-synthesized in the final fusion process, so that the globally optimal solution is recovered.
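The four-step cycle above can be sketched in a few lines. The following is a minimal single-step illustration with two sub-filters and a master filter; the system matrices, measurements and allocation factors are illustrative assumptions, not values from the chapter:

```python
import numpy as np

n = 2
Phi = np.array([[1.0, 1.0], [0.0, 1.0]])    # transition matrix
Q = 0.01 * np.eye(n)                         # process noise covariance
H = [np.eye(n), np.eye(n)]                   # sub-filter measurement matrices
R = [0.5 * np.eye(n), 1.0 * np.eye(n)]      # measurement noise covariances
beta = [0.4, 0.4, 0.2]                       # beta_1, beta_2, beta_m (sum to 1)

# Step 1: information distribution from the previous global estimate.
xg, Pg = np.zeros(n), np.eye(n)
x = [xg.copy() for _ in range(3)]
P = [Pg / b for b in beta]                   # covariances enlarged by 1/beta_i
Qi = [Q / b for b in beta]                   # process-noise shares

# Step 2: time update in every filter, master filter included (4.16).
for i in range(3):
    x[i] = Phi @ x[i]
    P[i] = Phi @ P[i] @ Phi.T + Qi[i]

# Step 3: measurement update in the sub-filters only (4.17),
# written in information form; z holds hypothetical measurements.
z = [np.array([1.1, 0.9]), np.array([0.8, 1.2])]
for i in range(2):
    info_mat = np.linalg.inv(P[i]) + H[i].T @ np.linalg.inv(R[i]) @ H[i]
    info_vec = np.linalg.inv(P[i]) @ x[i] + H[i].T @ np.linalg.inv(R[i]) @ z[i]
    P[i] = np.linalg.inv(info_mat)
    x[i] = P[i] @ info_vec

# Step 4: optimal information fusion (4.18).
Pg = np.linalg.inv(sum(np.linalg.inv(Pi) for Pi in P))
xg = Pg @ sum(np.linalg.inv(P[i]) @ x[i] for i in range(3))
```

In a running filter, Step 4's output would be redistributed in the next cycle's Step 1.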
From the above analysis of state fusion estimation structures, the centralized fusion structure provides the minimum-variance optimal fusion estimate of the system state. In the combined filter, the optimal fusion algorithm synthesizes the local filtering estimates into the global state estimate. Owing to the variance upper-bound technique, the local filters become suboptimal, but the global filter obtained after their synthesis is globally optimal; that is, the combined filtering process is equivalent to the centralized fusion filtering process in estimation accuracy, while its algorithm is greatly simplified.
Combining the above findings from the literature, rules for determining the information distribution factor should be considered from two aspects.
1) When the conditions required by Kalman filtering are met, such as exact statistical properties of the noise, it is known from the filter performance analysis in Section 4.2 that, if the information distribution factors satisfy the information conservation principle, the combined filter is globally optimal. In other words, the globally optimal estimation accuracy is unrelated to the particular values of the information distribution factors, although these values do influence the estimation accuracy of the individual sub-filters. In the information distribution process, each sub-filter receives the process information $\beta_i Q^{-1}$, $\beta_i P_g^{-1}$, and the Kalman filter automatically weights the information according to its quality: the smaller the value of $\beta_i$, the lower the process information weight, and the more the accuracy of the sub-filter depends on the accuracy of its measurement information; conversely, the accuracy of the sub-filter depends on the accuracy of the process information.
2) When the statistical properties of the noise are unknown, or a subsystem fails, the global estimate obviously loses optimality and degrades in accuracy, and it becomes necessary to determine the information distribution factors adaptively. The factors are then determined dynamically from the sub-filter accuracies, to overcome the loss of accuracy caused by the faulty subsystem and to retain relatively high accuracy in the global estimate. In determining adaptive information distribution factors, it should be considered that a less precise sub-filter be allocated a smaller information factor, so that the overall output of the combined filtering model achieves better fusion performance, i.e., higher estimation accuracy and fault tolerance.
In the Kalman filter, the trace of the error covariance matrix $P$ contains the variances of the estimated state vector or of its linear combinations, so the estimation accuracy of the filter can be assessed through the trace of $P$. This motivates the following definition:
Definition 4.1: The estimation accuracy attenuation factor of the $i$-th local filter is:
$$\mathrm{EDOP}_i = \mathrm{tr}\big(P_i P_i^{T}\big) \quad (4.19)$$
where $\mathrm{EDOP}_i$ (Estimation Dilution of Precision) is the attenuation factor of estimation accuracy, measuring the estimation error covariance matrix of the $i$-th local filter, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
The attenuation factor $\mathrm{EDOP}_i$ in fact characterizes $P_i$ through a matrix norm: the bigger the norm, the larger the corresponding estimation error covariance, and hence the poorer the filtering effect; and vice versa.
According to the definition of the estimation accuracy attenuation factor, the information distribution factors of the combined filtering process are computed as:
$$\beta_i = \frac{\mathrm{EDOP}_i}{\mathrm{EDOP}_1 + \mathrm{EDOP}_2 + \cdots + \mathrm{EDOP}_N + \mathrm{EDOP}_m} \quad (4.20)$$
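Definitions (4.19) and (4.20) translate directly into code. In this sketch the local covariance matrices are hypothetical stand-ins chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical local error covariance matrices: two sub-filters and
# the master filter (illustrative values only).
P_locals = [np.diag([0.1, 0.1]),   # accurate sub-filter
            np.diag([1.0, 2.0]),   # degraded sub-filter
            np.diag([0.5, 0.5])]   # master filter

# EDOP_i = tr(P_i P_i^T), per Definition 4.1 (4.19).
edop = [np.trace(P @ P.T) for P in P_locals]

# Information distribution factors per (4.20).
beta = [e / sum(edop) for e in edop]

# The factors satisfy the information conservation principle
# by construction.
assert abs(sum(beta) - 1.0) < 1e-12
```

Because the factors are normalized EDOP values, they always sum to one and can be recomputed at every fusion cycle from the current sub-filter covariances.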
5. Summary
This chapter has focused on non-standard multi-sensor information fusion systems affected by various nonlinear, uncertain and correlated factors, which are widespread in practice because of differences in the measuring principles and characteristics of the sensors, as well as in the measuring environment.
Aiming at these non-standard factors, three relatively advanced resolution schemes, based on semi-parametric modeling, multi-model fusion and self-adaptive estimation, have been presented, together with the corresponding fusion estimation models and algorithms.
(1) By introducing the concept of semi-parametric regression analysis into non-standard multi-sensor state fusion estimation theory, the corresponding fusion estimation model and a parametric/non-parametric solution algorithm were established. The model error caused by nonlinear and uncertain factors is separated out by semi-parametric modeling, which weakens its influence on the precision of the state fusion estimate; moreover, it was proved in theory that the state estimate obtained by this algorithm is the optimal fusion estimate.
(2) Two multi-model fusion estimation methods, based respectively on multi-model adaptive estimation and on interacting multiple models, were investigated to deal with the nonlinear and time-varying factors existing in multi-sensor fusion systems and to realize optimal fusion estimation of the state.
(3) A self-adaptive fusion estimation strategy was introduced to handle local correlation and system parameter uncertainty in multi-sensor dynamic systems and to realize optimal fusion estimation of the state. The fusion model of the federated filter and its optimality were studied; fusion algorithms for the cases of correlated and uncorrelated sub-filters were presented; the structure and algorithmic scheme of the federated filter were designed; and its estimation performance, which is greatly influenced by the information allocation factors, was analyzed. The selection of the information allocation factors, determined dynamically and self-adaptively from the squared eigenvalues of the covariance matrices, was also discussed in this chapter.
6. References
Hall, D. L., Llinas, J. Handbook of Multisensor Data Fusion. Boca Raton, FL, USA: CRC Press, 2001
Bedworth, M., O'Brien, J. The Omnibus Model: A New Model of Data Fusion. IEEE Transactions on Aerospace and Electronic Systems, 2000, 15(4): 30-36
Heintz, F., Doherty, P. A Knowledge Processing Middleware Framework and its Relation to the JDL Data Fusion Model. Proceedings of the 7th International Conference on Information Fusion, 2005, pp. 1592-1599
Llinas, J., Waltz, E. Multisensor Data Fusion. Norwood, MA: Artech House, 1990
Li, X. R., Zhu, Yunmin, Han, Chongzhao. Unified Optimal Linear Estimation Fusion - Part I: Unified Models and Fusion Rules. Proc. 2000 International Conference on Information Fusion, July 2000
Wang, Jiongqi, Zhou, Haiyin, Zhao, Deyong, et al. State Optimal Estimation with Nonstandard Multi-sensor Information Fusion. System Engineering and Electronics, 2008, 30(8): 1415-1420
Fisher, K. A., Maybeck, P. S. Multiple Model Adaptive Estimation with Filter Spawning. IEEE Transactions on Aerospace and Electronic Systems, 2002, 38(3): 755-768
Bar-Shalom, Y., Campo, L. The Effect of the Common Process Noise on the Two-sensor Fused-track Covariance. IEEE Transactions on Aerospace and Electronic Systems, 1986, Vol. 22: 803-805
Morariu, V. I., Camps, O. I. Modeling Correspondences for Multi Camera Tracking Using Nonlinear Manifold Learning and Target Dynamics. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2006, pp. 545-552
Stubberud, S. C., Kramer, K. A., et al. Data Association for Multisensor Types Using Fuzzy Logic. IEEE Transactions on Instrumentation and Measurement, 2006, 55(6): 2292-2303
Hammerand, D. C., Oden, J. T., Prudhomme, S., Kuczma, M. S. Modeling Error and Adaptivity in Nonlinear Continuum Systems. NTIS No: DE2001-780285/XAB
Crassidis, J. L., et al. A Real-time Error Filter and State Estimator. AIAA-94-3550, 1994: 92-102
Flammini, A., Marioli, D., et al. Robust Estimation of Magnetic Barkhausen Noise Based on a Numerical Approach. IEEE Transactions on Instrumentation and Measurement, 2002, 16(8): 1283-1288
Donoho, D. L., Elad, M. On the Stability of the Basis Pursuit in the Presence of Noise. http://www-stat.stanford.edu/~donoho/reports.html
Sun, H. Y., Wu, Y. Semi-parametric Regression and Model Refining. Geospatial Information Science, 2002, 4(5): 10-13
Green, P. J., Silverman, B. W. Nonparametric Regression and Generalized Linear Models. London: Chapman and Hall, 1994
Maragos, P., Sun, F. K. Measuring the Fractal Dimension of Signals: Morphological Covers and Iterative Optimization. IEEE Transactions on Signal Processing, 1998(1): 108-121
Sugihara, G., May, R. M. Nonlinear Forecasting as a Way of Distinguishing Chaos from Measurement Error in Time Series. Nature, 1990, 344: 734-741
Roy, R., Paulraj, A., Kailath, T. ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989, 37: 984-995
Aufderheide, B., Prasad, V., Bequette, B. W. A Comparison of Fundamental Model-based and Multi Model Predictive Control. Proceedings of the IEEE 40th Conference on Decision and Control, 2001: 4863-4868
Air Traffic Trajectories Segmentation Based on Time-series Sensor Data
1. Introduction
ATC is a critical, safety-related area requiring strict validation in real conditions (Kennedy & Gardner, 1998), and a domain where the amount of data has undergone exponential growth due to the increase in the number of passengers and flights. This has led to the need for automation processes to help the work of human operators (Wickens et al., 1998). These automation procedures can basically be divided into two different processes: the required online tracking of the aircraft (along with the decisions required according to this information) and the offline validation of that tracking process. The latter is usually separated into two sub-processes: segmentation (Guerrero & Garcia, 2008), covering the division of the initial data into a series of different segments, and reconstruction (Pérez et al., 2006; García et al., 2007), which covers the approximation, with different models, of the segments the trajectory was divided into. The reconstructed trajectories are used for the analysis and evaluation processes over the online tracking results.
This validation assessment of ATC centers is done with recorded datasets (usually named opportunity traffic), used to reconstruct the necessary reference information. The reconstruction process transforms multi-sensor plots to a common coordinate frame and organizes the data into trajectories of individual aircraft. Then, for each trajectory, segments of different modes of flight (MOF) must be identified, each one corresponding to a time interval in which the aircraft is flying with a different type of motion. These segments are a valuable description of the real data, providing information to analyze the behavior of the target objects (where uniform motion flight and maneuvers are performed, their magnitudes, durations, etc). The performance assessment of ATC multisensor/multitarget trackers requires this reconstruction analysis based on the available air data, in a domain usually named opportunity trajectory reconstruction (OTR) (Garcia et al., 2009).
OTR consists of a batch process in which all the available real data from all available sensors is used to obtain smoothed trajectories for all the individual aircraft in the area of interest. It requires accurate association between original and reconstructed trajectory measurements, bias estimation and correction to align all sensor measures, and adaptive multisensor smoothing to obtain the final interpolated trajectory. It should be pointed out that this is an off-line batch process, potentially quite different from the usual real-time data fusion systems used in ATC, due to the differences in the data processing order and its specific processing techniques, along with the different availability of information (the whole trajectory can be used by the algorithms in order to perform the best possible reconstruction).
OTR works as a special multisensor fusion system, aiming to estimate the target kinematic state, in which we take advantage of both past and future target position reports (a smoothing problem). In the ATC domain, the typical sensors providing data for reconstruction are the following:
• Radar data, from primary (PSR), secondary (SSR), and Mode S radars (Shipley, 1971). These measurements have random errors on the order of hundreds of meters (with a value that increases linearly with distance to the radar).
• Multilateration data from Wide Area Multilateration (WAM) sensors (Yang et al., 2002). They have much lower errors (on the order of 5-100 m), also showing a linear relation between error and distance to the sensor positions.
• Automatic dependent surveillance (ADS-B) data (Drouilhet et al., 1996). Its quality depends on the aircraft equipment, with the general trend to adopt GPS/GNSS, having errors on the order of 5-20 meters.
The complementary nature of these sensor techniques allows a number of benefits (a high degree of accuracy, extended coverage, systematic error estimation and correction, etc), and brings new challenges for the fusion process in order to guarantee an improvement with respect to any of those sensor techniques used alone.
After a preprocessing phase to express all measurements in a common reference frame (the stereographic plane used for visualization), the studied trajectories have measurements with the following attributes: detection time, stereographic projections of the x and y components, covariance matrix, and real motion model (MM) (an attribute only included in simulated trajectories, used for algorithm learning and validation). With these input attributes, we look for a domain transformation that allows us to classify our samples into a particular motion model with maximum accuracy, according to the model we are applying.
The movement of an aircraft in the ATC domain can be simplified into a series of basic MM's. The most commonly considered ones are uniform, accelerated and turn MM's. The general idea of the algorithm proposed in this chapter is to analyze these models individually and exploit the available information in three consecutive phases.
The first phase receives the information in the common reference frame and the analyzed model, and produces, as its output, a set of synthesized attributes which are handled by a learning algorithm in order to obtain the classification of the different trajectory measurements. These synthesized attributes are based on domain transformations according to the analyzed model by means of local information analysis (their value is based on the definition of segments of measurements from the trajectory). They are obtained for each measurement belonging to the trajectory (in fact, this process can be seen as data pre-processing for the data mining techniques (Famili et al., 1997)).
The second phase applies data mining techniques (Eibe, 2005) over the synthesized attributes from the previous phase, providing as its output an individual classification for each measurement belonging to the analyzed trajectory. This classification identifies the measurement according to the model introduced in the first phase (determining whether it belongs to that model or not).
The third phase, receiving the data mining classification as its input, refines it according to the knowledge of the possible MM's and their transitions, correcting possible misclassifications, and provides the final classification for each of the trajectory's measurements. This refinement is performed by means of the application of a filter.
Finally, segments are constructed over those classifications (by joining adjacent measurements with the same classification value). These segments fall into two different categories: those belonging to the analyzed model (which are already a final output of the algorithm) and those which do not belong to it, which have to be processed by different models. It must be noted that the number of measurements processed by each model is reduced with each application of this cycle (due to the segments already obtained as a final output); thus, more detailed models with lower complexity should be applied first. Using the introduced division into three MM's, the proposed order is the following: uniform, accelerated and finally turn model. Figure 1 explains the algorithm's approach:
Fig. 1. Overview of the algorithm's approach. The trajectory input data and the analyzed model enter the first phase (domain transformation), which produces the synthesized attributes; the second phase (data mining techniques) turns these into preliminary classifications; the third phase (results filtering) refines them into the final classifications, from which the final segmentation results are obtained.
The validation of the algorithm is carried out by generating a set of test trajectories as representative as possible. This implies not using exact covariance matrices (but estimations of their values), and carefully choosing the shapes of the simulated trajectories. We have based our results on four types of simulated trajectories, each having two different samples. Uniform, turn and accelerated trajectories are a direct validation of our three basic MM's. The fourth trajectory type, racetrack, is a typical situation during landing procedures. The validation is performed, for a fixed model, with the results of its true positives rate (TPR, the rate of measurements correctly classified among all those belonging to the model) and false positives rate (FPR, the rate of measurements incorrectly classified among all those not belonging to the model). This work will show the results of the three consecutive phases using a uniform motion model.
The rest of this work is organized as follows: the second section deals with the problem definition, both in general and particularized for the chosen approach. The third section presents the general algorithm in detail, followed by three sections detailing the three phases of that algorithm when the uniform movement model is applied: the fourth section presents the different alternatives for the domain transformation and chooses among them the ones included in the final algorithm; the fifth presents some representative machine learning techniques to be applied to obtain the classification results; and in the sixth the filtering refinement over the previous results is introduced, leading to the segment synthesis processes. The seventh section covers the results obtained over the explained phases, determining the machine learning technique used and providing the segmentation results, both numerically and graphically, to provide the reader with easy validation tools over the presented algorithm. Finally, a conclusions section based on the presented results closes the chapter.
2. Problem definition
2.1 General problem definition
As presented in the introduction, each analyzed trajectory $T^i$ is composed of a collection of sensor reports (or measurements), which are defined by the following vector:
In the general definition of this problem these segments are obtained by comparison with a test model applied over different windows (aggregations) of measurements coming from our trajectory, in order to obtain a fitness value, finally deciding the segmentation operation as a function of that fitness value (Mann et al., 2002; Garcia et al., 2006).
We may consider the division of offline segmentation algorithms into different approaches. One possible approach is to consider the whole data from the trajectory and the segments obtained as the problem's basic division unit (a global approach), where the basic operation of the segmentation algorithm is the division of the trajectory into those segments (examples of this approach are the bottom-up and top-down families (Keogh et al., 2003)). In the ATC domain, there have been approaches based on a direct adaptation of online techniques, basically combining the results of the forward application of the algorithm (the pure online technique) with its backward application (applying the online technique to the time series reversed according to the measurements' detection times) (Garcia et al., 2006). An alternative is based on obtaining a different classification value for each of the trajectory's measurements (along with their local information) and obtaining the segments as a synthesized solution built upon that classification (basically, by joining adjacent measures sharing the same MM into a common segment). This approach allows the application of several refinements over the classification results before the final synthesis is performed, and thus is the one explored in the solution presented in this chapter.
$$B(x_m^i) = \{x_j^i\}, \quad j \in [m-p, \dots, m, \dots, m+p] \quad (3)$$

$$B(x_m^i) = \{x_j^i\}, \quad t_j \in [t_m - T, \dots, t_m, \dots, t_m + T] \quad (4)$$

Once we have chosen a window around our current measurement, we have to apply a function to that segment in order to obtain its transformed value. This general classification function $F(\vec{x}_j^i)$, using measurement boundaries, may be represented with the following formulation:

$$F(\vec{x}_m^i) = F(\vec{x}_m^i \mid T^i) \Rightarrow F(\vec{x}_m^i \mid B(\vec{x}_m^i)) = F_p(\vec{x}_{m-p}^i, \dots, \vec{x}_m^i, \dots, \vec{x}_{m+p}^i) \quad (5)$$
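The two window definitions (3) and (4) can be sketched directly. In this illustration a trajectory is a plain list of (t, x, y) tuples with made-up values; the helper names are hypothetical, not from the chapter:

```python
# Illustrative trajectory: (detection time, x, y) per measurement.
traj = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.5), (2.0, 2.1, 1.0),
        (3.0, 2.9, 1.4), (4.0, 4.0, 2.1), (5.0, 5.2, 2.4)]

def window_by_samples(traj, m, p):
    """(3): B(x_m) = {x_j}, j in [m-p, ..., m+p], clipped at the ends."""
    return traj[max(0, m - p):m + p + 1]

def window_by_time(traj, m, T):
    """(4): B(x_m) = {x_j}, t_j in [t_m - T, ..., t_m + T]."""
    tm = traj[m][0]
    return [s for s in traj if tm - T <= s[0] <= tm + T]

assert len(window_by_samples(traj, 2, 1)) == 3
assert len(window_by_time(traj, 2, 1.5)) == 3
```

With uniformly sampled data the two definitions coincide; they differ when the report rate varies, which is common with opportunity traffic.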
From this formulation of the problem we can already see some of the choices available: how to choose the segments (according to (3) or (4)), which classification function to apply in (5), and how to perform the final segment synthesis. Figure 2 shows an example of the local approach for trajectory segmentation.
Fig. 2. Segmentation issue example: a trajectory plotted in X-Y coordinates, highlighting the trajectory input data, the analyzed segment and the analyzed measure.
changes). To obtain the final output for the model analysis, the isolated measurements will
be joined according to their classification in the final segments of the algorithm.
The formalization of these phases and the subsequent changes performed on the data are presented in the following vectors, representing the input and output data of our three processes:

Input data: $\vec{x}_j^i = \{t_j, x_j, y_j, \sigma_j, \mathrm{MM}_j\}$

Domain transformation: $Dt(\vec{x}_j^i) = F(\vec{x}_j^i \mid B(\vec{x}_j^i)) = \{Pc_j^k\}, \; k = 1, \dots, M$
$Pc_j^k$ = pre-classification $k$ for measurement $j$, $M$ = number of pre-classifications included

Classification process: $Cl(Dt(\vec{x}_j^i)) = Cl(\{Pc_j^k\}) = C_j$
$C_j$ = automatic classification result for measurement $j$ (including filtering refinement)

Final output: $T^i = \{sg_1^i, \dots, sg_L^i\}$
$sg_l^i$ = final segments obtained by the union process
4. Domain transformation
The first phase of our algorithm covers the process in which we must synthesize an attribute from our input data to represent each of the trajectory's measurements in a transformed domain, and choose the appropriate thresholds in that domain to effectively differentiate the measurements which belong to our model from those which do not.
The following key parameters of this phase are presented, along with the different alternatives compared for them (it must be noted that the possibilities compared here are not the only possible ones, but representative examples of different possible approaches):
• Transformation function: correlation coefficient / best linear unbiased estimator (BLUE) residue
• Segmentation granularity: segment study / independent study
• Segment extension (time / samples) and segment resolution (length of the segment, using the boundary units imposed by the previous decision)
• Threshold choosing technique: choice of a threshold to classify data in the transformed domain
Each of these parameters requires an individual validation in order to build the actual final algorithm tested in the experimental section, and each of them will be analyzed in an individual section to achieve this task.
In order to use the BLUE residue we need a model for the uniform MM, represented by the following equation:

$$\begin{pmatrix} x_j \\ y_j \end{pmatrix} = \begin{pmatrix} 1 & t_j & 0 & 0 \\ 0 & 0 & 1 & t_j \end{pmatrix}\begin{pmatrix} x_0 \\ v_x \\ y_0 \\ v_y \end{pmatrix} + \begin{pmatrix} n_{x,j} \\ n_{y,j} \end{pmatrix} \quad (7)$$

Stacking these rows for every measurement of the segment gives $Z = H\theta + n$, whose best linear unbiased estimate is the weighted least-squares solution:

$$\hat{\theta} = \big(H^{T}R^{-1}H\big)^{-1}H^{T}R^{-1}Z \quad (8)$$

where $R$ is the (block-diagonal) covariance matrix of the measurement noise. With these values we may calculate the interpolated positions for our two variables and the associated residue.
The BLUE residue is presented normalized (the residue divided by the length of the segment in number of measurements), in order to take advantage of its interesting statistical properties, which may be used in the algorithm design and hence allow us to obtain more accurate results when it is used as our transformation function.
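A sketch of this computation for a single window follows: fit the uniform (constant-velocity) model of (7) by the weighted least squares of (8) and form the normalized residue. The trajectory, noise level and window size are illustrative assumptions, not values from the chapter:

```python
import numpy as np

# Simulated window of truly uniform motion with Gaussian noise.
t = np.linspace(0.0, 5.0, 11)
sigma = 0.05
rng = np.random.default_rng(1)
x = 1.0 + 2.0 * t + rng.normal(0.0, sigma, t.size)
y = 0.5 - 1.0 * t + rng.normal(0.0, sigma, t.size)

# Stack (7) for every measurement: z = H theta + n,
# with theta = (x0, vx, y0, vy).
H = np.zeros((2 * t.size, 4))
H[0::2, 0] = 1.0; H[0::2, 1] = t
H[1::2, 2] = 1.0; H[1::2, 3] = t
z = np.empty(2 * t.size)
z[0::2] = x; z[1::2] = y
W = np.eye(2 * t.size) / sigma**2          # R^{-1}, equal variances assumed

# BLUE estimate (8) and the normalized residue.
theta = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
r = z - H @ theta
res = (r @ W @ r) / t.size                 # residue / number of measurements
```

For a window that really follows the uniform model, the un-normalized weighted residue is approximately chi-squared distributed, which is the statistical property the threshold selection relies on.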
To obtain a classification value from either the CC or the BLUE residue, these values must be compared with a certain threshold. The CC threshold must be a value close, in absolute value, to 1, since that indicates a strong correlation between the variables. The BLUE residue threshold must consider the chi-squared approximation which can be applied to its value (detailed in the threshold-choosing technique section). In any case, to compare the two techniques and choose the best one, the threshold can be chosen by means of their TPR and FPR values (manually choosing a threshold which has a zero FPR value with the highest possible TPR value).
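The manual threshold selection just described can be sketched as a simple scan over candidate thresholds. The residues and ground-truth labels below are illustrative stand-ins (label True meaning the measurement really belongs to the uniform model):

```python
# Hypothetical normalized residues and ground-truth membership labels.
residues = [0.2, 0.4, 0.5, 0.9, 1.1, 2.5, 3.0, 4.2]
labels   = [True, True, True, True, False, False, False, False]

def rates(threshold):
    """TPR and FPR when classifying 'uniform' for residues below threshold."""
    pred = [r < threshold for r in residues]
    tp = sum(p and l for p, l in zip(pred, labels))
    fp = sum(p and not l for p, l in zip(pred, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# Pick the operating point with the largest TPR among those with zero FPR.
best = max((rates(th) for th in residues),
           key=lambda r: (r[0] if r[1] == 0 else -1))
assert best == (1.0, 0.0)
```

Sweeping the threshold over all candidate values and plotting the resulting (FPR, TPR) pairs is exactly what produces the ROC curves used in the comparison below.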
To facilitate the performance comparison between the two introduced domain transformations, we may resort to ROC curves (Fawcett, 2006), which allow us to compare their behavior by representing their TPR against their FPR. The result of this comparison is shown in figure 3.
Fig. 3. Comparison between the two presented domain transformations: CC and BLUE residue
The comparison shows that the introduction of the sensor's noise information is vital for the accuracy of the domain transformation, and thus the BLUE residue is chosen for this task.
Applying an individual transformed value to each measurement divides the trajectory into
shorter segments according to the transformation results (this choice will be analysed in the
following section), while applying the same transformed value to a whole segment
introduces restrictions related to the precision which that length imposes (longer
segments may deal better with the noise in the measurements but, at the same time,
obtain worse results because the same transformed value is applied to a greater number of
measurements).
The ROC curve results for this comparison, using segments composed of thirty-one
measurements, are shown in figure 4.
Given the presented design criterion, which stresses the importance of low FPR values, we
may see that individual transformed values perform much better in that range (leftmost side
of the figure), leading us, along with the considerations previously exposed, to choose them
for the algorithm's final implementation.
With the units given by the available information, Figure 5 shows the effect of different
resolutions over a given turn trajectory, along with the results over those resolutions.
Observing the presented results, where the threshold has been calculated according to the
procedure explained in the following section, we may determine the resolution effects. Short
segments exhibit several handicaps: on the one hand, they are more susceptible to noise
effects; on the other hand, in some cases, long smooth non-uniform MM segments may
be accurately approximated with short uniform segments, causing the algorithm to bypass
them (these effects can be seen in the lower resolutions shown in figure 5). Longer segments
allow us to treat the noise effects more effectively (with resolution 31 there are already no
misclassified measurements during non-uniform segments) and make the identification of
non-uniform segments possible, avoiding the possibility of obtaining an accurate
approximation of these segments using uniform ones (as can be seen with resolution 91).
However, long segments also make the measurements close to a non-uniform MM increase
their transformed value (as their surrounding segment starts to get into the non-uniform
MM), leading to the fact that more measurements around the non-uniform segments will be
pre-classified incorrectly as non-uniform (resolution 181). A different example of the effects
of resolution in these pre-classification results may be looked up in (Guerrero et al., 2010).
There is, as we have seen, no clear choice for a single resolution value. Lower resolutions
may allow us to obtain more precise results at the beginning and end of non-uniform
segments, while higher resolution values are capital to guarantee the detection of those non-
uniform segments and the appropriate treatment of the measurements noise. Thus, for this
first phase, a multi-resolution approach will be used, feeding the second phase with the
different pre-classifications of the algorithm according to different resolution values.
thres = 2 − 4/N + 3·√(4/N − 8/N²),   N = (kmax − kmin + 1)    (12)
This threshold depends on the resolution of the segment, N, which also influences the
residue value in (10). It is interesting to notice that the highest threshold value is reached
with the lowest resolution. This is a logical result, since to be able to maintain the TPR value
(having fixed it with the inequality at 99%) with short segments, a high threshold value is
required, in order to counteract the noise effects (while longer segments are more resistant
to that noise and thus the threshold value may be lower).
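The dependence of such a threshold on N can be illustrated with a generic χ² quantile approximation. The sketch below is not the chapter's equation (12): it uses the Wilson–Hilferty transform (a standard approximation to χ² quantiles) to show that a 99%-quantile threshold per measurement decreases as the resolution N grows, which is the qualitative behaviour discussed above.

```python
import math
from statistics import NormalDist

def chi2_quantile(k, p):
    """Wilson-Hilferty approximation to the p-quantile of a chi-squared
    distribution with k degrees of freedom."""
    z = NormalDist().inv_cdf(p)          # standard-normal quantile
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * math.sqrt(c)) ** 3

# Normalized (per-degree-of-freedom) 99% threshold for several resolutions.
thresholds = {N: chi2_quantile(N, 0.99) / N for N in (11, 31, 91)}
```

Evaluating the dictionary shows the monotone decrease with N: short segments need a higher normalized threshold to keep the TPR fixed.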
We would like to determine how precisely our χ² distribution represents our normalized
residue in non-uniform trajectories with estimated covariance matrix. In the following
figures we compare the optimal result of the threshold choice (dotted lines), manually
chosen, to the results obtained with equation (12). Figure 6 shows the used trajectories for
this comparison, along with the proposed comparison between the optimal TPR and the one
obtained with (12) for increasing threshold values.
In the two trajectories in figure 6 we may appreciate two different distortion effects
introduced by our approximation. The turn trajectory shows an underestimation of our TPR
due to the inexactitude in the covariance matrix R. This inexactitude assumes a higher
noise than the one actually present in the trajectory, and thus will make us choose a higher
threshold than necessary in order to obtain the desired TPR margin.
In the racetrack trajectory we perceive the same underestimation at the lower values of the
threshold, but then our approximation crosses the optimal results and reaches a value over
it. This is caused by the second distortion effect, the maneuver’s edge measurements. The
measurements close to a maneuver beginning or end tend to have a higher residue value
than the theoretical one for a uniform trajectory (due to their proximity to the non-uniform
segments), making us increase the threshold value to classify them correctly (which causes
the optimal result to show a lower TPR in the figure). These two effects show that a heuristic
tuning may be required in our χ² distribution in order to adapt it to these distortion effects.
Bayesian networks (Jensen & Graven-Nielsen, 2007) are directed acyclic graphs whose nodes
represent variables, and whose missing edges encode conditional independencies between
the variables. Nodes can represent any kind of variable, be it a measured parameter, a latent
variable or a hypothesis. Special simplifications of these networks are Naive Bayes networks
(Rish, 2001), where the variables are considered independent. This supposition, even though
it may be considered a very strong one, usually introduces a faster learning when the
number of training samples is low, and in practice achieves very good results.
Artificial neural networks are computational models inspired by biological neural networks,
consisting of an interconnected group of artificial neurons which process information using
a connectionist approach to computation. Multilayer perceptrons (MPs) (Gurney, 1997) are
feed-forward neural networks having an input layer, an undetermined number of hidden
layers and an output layer, with nonlinear activation functions. MPs are universal function
approximators, and thus they are able to distinguish non-linearly separable data. One of the
handicaps of this approach is the configuration difficulty it exhibits (dealing
mainly with the number of neurons and hidden layers required for the given problem). The
Weka tool is able to determine these values automatically.
caused by their surrounding noise. Figure 7 shows the results of this filtering process
applied to an accelerated trajectory.
In figure 7, the lowest values (0.8 for post-filtered results, 0.9 for pre-filtered ones and 1 for
the real classification) indicate that the measurement is classified as uniform, whereas their
respective higher ones (1+ its lowest value) indicate that the measurement is classified as
non-uniform. This figure shows that some measurements previously misclassified as non-
uniform are corrected.
The importance of this filtering phase is not usually reflected in the TPR, bearing in mind
that the number of measurements affected by it may be very small, but the number of
output segments can vary significantly. In the example in figure 7, the pre-filtered
classification would have produced nine different segments, whereas the post-filtered
classification outputs only three. This change highlights the importance of the
filtering process.
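Assuming a simple sliding-window median filter over the binary pre-classification sequence (the window length below is our assumption, not a value fixed by the chapter), this refinement step may be sketched as:

```python
def median_filter(labels, window=5):
    """Sliding-window median filter over a binary classification sequence.
    The window is assumed odd; border measurements keep their original value."""
    h = window // 2
    out = list(labels)
    for i in range(h, len(labels) - h):
        win = sorted(labels[i - h:i + h + 1])
        out[i] = win[h]          # median of the window
    return out
```

An isolated misclassified measurement (a one-sample spike) is absorbed by its neighbours, while genuine runs of a class survive unchanged.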
The method to obtain the output segments is extremely simple after this median filter
application: starting from the first detected measurement, one segment is built according to
that measurement classification, until another measurement i with a different classification
value is found. At that point, the first segment is defined with boundaries [1, i-1] and the
process is restarted at measurement i, repeating this cycle until the end of the trajectory is
reached.
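The segment-building cycle just described translates directly into code (an illustrative sketch; the 1-based boundaries follow the text's convention):

```python
def build_segments(classification):
    """Group consecutive measurements sharing a classification value into
    segments with 1-based boundaries [start, end]."""
    segments, start = [], 1
    for i in range(2, len(classification) + 1):
        if classification[i - 1] != classification[start - 1]:
            # A different classification value ends the current segment.
            segments.append((start, i - 1, classification[start - 1]))
            start = i
    # Close the last segment at the end of the trajectory.
    segments.append((start, len(classification), classification[start - 1]))
    return segments
```

For instance, a trajectory classified as uniform, non-uniform and uniform again yields three segments with contiguous boundaries.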
7. Experimental validation
The division of the algorithm into different consecutive phases introduces validation
difficulties, as the results are mutually dependent. Throughout this work, we have tried to
show those validations along with the technique explanations when it was unavoidable (as
occurred in the first phase, due to the influence of the choices in its different parameters)
and postpone the rest of the cases for a final validation over a well-defined test set (second
and third phases, along with the overall algorithm performance).
This validation process is carried out by generating a set of test trajectories as
representative as possible, which implies not using exact covariance matrices (but
estimations of their value) and carefully choosing the shapes of the simulated trajectories.
We have based our results on four kinds of simulated trajectories, each having two different
samples. Uniform, turn and accelerated trajectories are a direct validation of our three
identified basic MMs, while the fourth trajectory type, racetrack, is a typical situation
during landing procedures.
This validation will be divided into three different steps: the first one will use the whole
data from these trajectories, obtain the transformed multi-resolution values for each
measurement and apply the different presented machine learning techniques, analyzing the
obtained results and choosing a particular technique to be included in the algorithm as a
consequence of those results.
Having determined the technique to be used, the second step will apply the described
refinement process to those classifications, obtaining the final classification results (along
with their TPR and FPR values). Finally, the segmentations obtained for each trajectory are
shown along with the real classification of each trajectory, to allow the reader to perform a
graphical validation of the final results.
Trajectory      EM Clustering    C 4.5            Naive Bayes      Bayesian networks  Multilayer perceptron
                TPR     FPR      TPR     FPR      TPR     FPR      TPR     FPR        TPR     FPR
Racetr. 1       0.903   0        0.719   0        0.903   0        0.903   0          0.903   0
Racetr. 2       0.966   0.036    0.625   0        0.759   0        0.759   0          0.966   0.036
Turn 1          0.975   0        1       1        0.918   0        0.914   0          0.975   0
Turn 2          0.994   0.019    0.979   0        0.987   0        0.987   0          0.994   0.019
Accel. 1        0.993   0        0.993   0        0.993   0        0.993   0          0.993   0
Accel. 2        0.993   0.021    0.993   0.021    0.993   0.021    0.993   0.021      0.993   0.021
Whole dataset   0.965   0.078    0.941   0.003    0.956   0.096    0.956   0.096      0.973   0.155
Table 1. Results presentation over the introduced dataset for the different proposed machine
learning techniques
Fig. 8. TPR and FPR results comparison for the different machine learning techniques over
the whole dataset.
From the results above we can determine that the previous phase has performed an accurate
job, given that all the different techniques are able to obtain high TPR and low FPR
results. When we compare them, the relationship between the TPR and the FPR does not
allow a clear choice between the five techniques. If we resort to multi-objective optimization
terminology (Coello et al., 2007) (which is, in fact, what we are applying: trying to obtain an
FPR as low as possible together with a TPR as high as possible), we may discard the two
Bayesian approaches, as they are dominated (in terms of Pareto dominance) by the C 4.5
solution. That leaves us the choice between EM clustering (with the lowest FPR value),
C 4.5 (the most balanced between FPR and TPR values) and the multilayer perceptron (with
the highest TPR). According to our design criterion, we will incorporate into the algorithm
the technique with the lowest FPR: EM clustering.
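The Pareto-dominance argument can be made concrete with a small helper. The (TPR, FPR) values below are hypothetical, chosen only to illustrate the mechanics, not the exact figures of Table 1:

```python
def dominates(a, b):
    """a, b are (TPR, FPR) pairs; a dominates b when it is at least as good
    in both objectives (TPR maximized, FPR minimized) and strictly better
    in at least one of them."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(points):
    """Keep only the non-dominated (TPR, FPR) points."""
    return {name: p for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)}
```

A point with both a lower TPR and a higher FPR than some other technique is discarded, exactly as the two Bayesian approaches are discarded above.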
In the previous results we can see that the filtering does improve the results in some
trajectories, even though the numerical TPR and FPR results do not vary greatly (the
effect, as noted in the filtering section, is more noticeable in the number of segments,
given that every misclassified measurement might have meant the creation of an additional
output segment).
The overall segmentation output shows difficulties dealing with the racetrack trajectories.
This is caused by the fact that their uniform segments inside the oval are close to two
different non-uniform ones, which raises their transformed values to those typical of
non-uniform measurements, so they are classified accordingly by the machine learning
technique. These difficulties decrease the value of the TPR, but this misclassification
can be corrected by the non-uniform model cycles which are applied after the
uniform one detailed throughout this work. The rest of the trajectories are segmented in a
satisfactory way (all of them show the right number of output segments, apart from an
additional non-uniform segment in one of the completely uniform ones, caused by the very
high measuring noise in that area).
Figure 9 shows the original trajectory with its correct classification along with the
algorithm’s results.
8. Conclusions
The automation of ATC systems is a complex issue which relies on the accuracy of its low
level phases, determining the importance of their validation. That validation is faced in this
work with an inherently offline processing, based on a domain transformation of the noisy
measurements with three different motion models and the application of machine learning
and filtering techniques, in order to obtain the final segmentation into these different
models. This work has analyzed and defined in depth the uniform motion model and the
algorithm’s performance according to this model. The performance analysis is not trivial,
since only one of the motion models in the algorithm is presented and the results obtained
are, thus, only partial. Even so, these results are encouraging, having obtained good TPR
and FPR values in most trajectory types, and a final number of segments which are
reasonably similar to the real ones expected. Some issues have been pointed out, such as the
behaviour of measurements belonging to uniform motion models when they are close to
two different non-uniform segments (a typical situation during racetrack trajectories), but
the final algorithm’s results are required in order to deal with these issues properly. Future
lines include the complete definition of the algorithm, including the non uniform motion
models and the study of possible modifications in the domain transformation, in order to
deal with the introduced difficulties, along with the validation with real trajectories.
9. References
Allchin, D.,(2001) "Error Types", Perspectives on Science, Vol.9, No.1, Pages 38-58. 2001.
Baxes, G.A. (1994) “Digital Image Processing. Principles & Applications”, Wiley and Sons
Coello, C. A., Lamont, G. B., Van Veldhuizen, D. A. (2007) “Evolutionary Algorithms for
Solving Multi-Objective Problems” 2nd edition. Springer
Dellaert, F. (2002) “The Expectation Maximization Algorithm”. Technical Report number GIT-
GVU-02-20. College of Computing, Georgia Institute of Technology
Drouilhet, P.R., Knittel, G.H., Vincent, A., Bedford, O. (1996) “Automatic Dependent
Surveillance Air Navigation System”. U.S. Patent n. 5570095. October, 1996
Eibe, F. (2005) “Data Mining: Practical Machine Learning Tools and Techniques”. Second Edition.
Morgan Kauffman
Famili, A., Sehn, W., Weber, R., Simoudis, E. (1997) “Data Preprocessing and Intelligent Data
Analysis” Intelligent Data Analysis Journal. 1:1-28, March, 1997.
Fawcett, T. (2006) “An introduction to ROC analysis”. Pattern Recognition Letters, 27. Pages:
861-874. International Association for Pattern Recognition
Garcia, J.; Perez, O.; Molina, J.M.; de Miguel, G.; (2006) “Trajectory classification based on
machine-learning techniques over tracking data”. Proceedings of the 9th International
Conference on Information Fusion. Italy. 2006.
Garcia, J., Molina, J.M., de Miguel, G., Besada, A. (2007) “Model-Based Trajectory
Reconstruction using IMM Smoothing and Motion Pattern Identification”. Proceedings
of the 10th International Conference on Information Fusion. Canada. July 2007.
Garcia, J., Besada, J.A., Soto, A. and de Miguel, G. (2009) “Opportunity trajectory
reconstruction techniques for evaluation of ATC systems“. International Journal of
Microwave and Wireless Technologies. 1 : 231-238
Guerrero, J.L and Garcia J. (2008) “Domain Transformation for Uniform Motion Identification in
Air Traffic Trajectories” Proceedings of the International Symposium on Distributed
Computing and Artificial Intelligence (Advances in Soft Computing, Vol. 50), pp.
403-409, Spain, October 2008. Springer
Guerrero, J.L., Garcia, J., Molina, J.M. (2010) “Air Traffic Control: A Local Approach to the
Trajectory Segmentation Issue“. Proceedings for the Twenty Third International
Conference on Industrial, Engineering & Other Applications of Applied Intelligent
Systems, part III, Lecture Notes in Artificial Intelligence, Vol. 6098, pp. 498-507.
Springer
Gurney, K. (1997) “An introduction to Neural Networks”. CRC Press.
Jensen, F.B., Graven-Nielsen, T. (2007) “Bayesian Networks and Decision Graphs”. Second
edition. Springer.
Kay, S.M. (1993) “Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory“.
Prentice Hall PTR.
Kennedy D., Gardner, A. B. (1998) “Tools for analysing the performance of ATC surveillance
radars”, IEE Colloquium on Specifying and Measuring Performance of Modern Radar
Systems. March, 1998. 6/1-6/4.
Keogh, E, Chu, S., Hart, D., Pazzani, M. (2003) “Segmenting Time Series: A Survey and
Novel Approach”. In: Data Mining in Time Series Databases, second edition, pp. 1-21.
World Scientific
Mann, R. Jepson, A.D. El-Maraghi, T. (2002) “Trajectory segmentation using dynamic
programming”. Proceedings for the 16th International Conference on Pattern
Recognition. 2002.
Meyer, P. (1970) “Introductory Probability and Statistical Applications” Second edition. Addison
Wesley.
Pérez, O., García, J., Molina, J.M. (2006) “Neuro-fuzzy Learning Applied to Improve the
Trajectory Reconstruction Problem". Proceedings of the International Conference on
Computational Intelligence for Modelling Control and Automation and
International Conference on Intelligent Agents Web Technologies and International
Commerce (CIMCA06). Australia, November 2006. IEEE Computer Society.
Picard, R.; Cook, D. (1984). "Cross-Validation of Regression Models". Journal of the American
Statistical Association 79 (387). Pages 575–583.
Quinlan, J.R. (1993) “C4.5: Programs for Machine Learning”. Morgan Kaufmann
Rish, I. (2001) “An empirical study of the naive Bayes classifier”. IJCAI 2001 : Workshop on
Empirical Methods in Artificial Intelligence.
Shipley, R. (1971) “Secondary Surveillance Radar in ATC Systems: A description of the
advantages and implications to the controller of the introduction of SSR facilities”.
Aircraft Engineering and Aerospace Technology. 43: 20-21. MCB UP Ltd.
Wickens, C.D., Mavor, A.S., Parasuraman, R. and McGee, J. P. (1998) “The Future of Air
Traffic Control: Human Operators and Automation”. The National Academies Press,
Washington, D.C.
Yang, Y.E. Baldwin, J. Smith, A. Rannoch Corp., Alexandria, VA. (2002) “Multilateration
tracking and synchronization over wide areas”. Proceedings of the IEEE Radar
Conference. August 2002. IEEE Computer Society.
Yin, L., Yang, R., Gabbouj, M., Neuvo, Y. (1996) “Weighted Median Filters: A Tutorial”,
IEEE Trans. on Circuits and Systems, 43(3), pages. 157-192.
Distributed Compressed Sensing of Sensor Data
1. Introduction
Intelligent Information processing in distributed wireless sensor networks has many differ-
ent optimizations by which redundancies in data can be eliminated, and at the same time the
original source signal can be retrieved without loss. The data-centric nature of sensor
networks is modeled, which allows environmental applications to measure correlated data
by periodic data aggregation. In the distributed framework, we explore how Compressed
Sensing could be used to represent the measured signals in their sparse form, and we model
the framework to reproduce the individual signals from the ensembles in their sparse form,
as expressed in equations (1, 3). The processed signals are then represented by their common
component, which is captured by its significant coefficients, and by the variation
components, which are also sparse and are projected onto the scaling and wavelet functions
of the correlated component.
The overall representation of the basis preserves the temporal (intra-signal) and spatial (inter-
signal) characteristics. All of these scenarios correspond to measuring properties of physical
processes that change smoothly in time, and in space, and thus are highly correlated. We
show by simulation that the framework using cross-layer protocols can be extended using
sensor fusion, and data-centric aggregation, to scale to a large number of nodes.
Measurements show temporal correlation with inter-sensor data, so the signal is further
divided into many blocks which represent constant variance. In terms of the OSI layers, the
pre-processing is done at the physical layer; in our case it is a wireless channel with
multi-sensor intervals. The network-layer data aggregation is based on variable-length
prefix coding, which minimizes the number of bits before transmitting them to a sink. In
terms of the OSI layers, data aggregation is done at the data-link layer, buffering
periodically before the packets are routed through the upper network layer.
2. Pre-Processing
As different sensors are connected to each node, the nodes have to periodically measure the
values of the given parameters, which are correlated. The inexpensive sensors may not be
calibrated, and need processing of correlated data according to intra- and inter-sensor
variations. The pre-processing algorithms accomplish two functions: one is to use a minimal
number of measurements at each sensor, and the other is to represent the signal in its
lossless sparse representation.
Consider a real-valued signal x ∈ R^N indexed as x(n), n ∈ {1, 2, ..., N}. Suppose that the basis
Ψ = [Ψ1, ..., ΨN] provides a K-sparse representation of x; that is, x is a linear combination
of K vectors chosen from Ψ, where nk are the indices of those vectors and ϑ(n) are the
coefficients; the concept is extendable to tight frames (Dror Baron (Marco F. Duarte)).
Alternatively, we can write in matrix notation x = Ψϑ, where x is an N × 1 column vector, the
sparse basis matrix is N × N with the basis vectors Ψn as columns, and ϑ(n) is an N × 1
column vector with K nonzero elements. Using ‖·‖p to denote the ℓp norm, we can write that
‖ϑ‖0 = K; we can also write the set of nonzero indices as Ω ⊆ {1, ..., N}, with |Ω| = K. Various
expansions, including wavelets (Dror Baron (Marco F. Duarte)), Gabor bases (Dror Baron
(Marco F. Duarte)) and curvelets (Dror Baron (Marco F. Duarte)), are widely used for
representation and compression of natural signals, images, and other data.
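A minimal numerical illustration of the K-sparse model x = Ψϑ follows. For simplicity we use an assumed random orthonormal basis rather than one of the wavelet or Gabor bases mentioned above; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 3

# An assumed random orthonormal N x N basis Psi (any orthonormal basis
# would serve this illustration equally well).
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# K-sparse coefficient vector: K nonzero entries at the index set Omega.
theta = np.zeros(N)
omega = rng.choice(N, size=K, replace=False)
theta[omega] = rng.standard_normal(K)

x = Psi @ theta            # x = Psi * theta, in matrix notation
theta_rec = Psi.T @ x      # orthonormality recovers the coefficients exactly
```

Because Ψ is orthonormal, projecting x back onto the basis returns the K-sparse coefficient vector with |Ω| = K nonzero entries.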
There are many factored possibilities of x = PΘ. Among these factorizations, the unique
representation of smallest dimensionality of Θ is the sparsity level of the signal x under this
model, which is the smallest interval among the sensor readings distinguishable after
cross-layer aggregation.
Fig. 1. Bipartite graphs for distributed compressed sensing.
DCS enables distributed coding algorithms to exploit both intra- and inter-signal correlation
structures. In a sensor network deployment, a number of sensors measure signals that are
each individually sparse in some basis and also correlated from sensor to sensor. If the
separate sparse bases are projected onto the scaling and wavelet functions of the correlated
sensors (common coefficients), then all the information needed to individually recover each
of the signals at the joint decoder is already stored. This does not require any
pre-initialization between sensor nodes.
xj = zC + zj ,   j = 1, ..., J    (4)
X = PΘ (5)
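The joint sparsity model of equation (4) can be illustrated with toy values (the sizes, indices and coefficients below are our assumptions, chosen only to show the structure):

```python
import numpy as np

N, J = 8, 3

# Sparse common component z_C (K_C = 2 nonzeros) shared by every sensor.
z_common = np.zeros(N)
z_common[[1, 4]] = [2.0, -1.0]

# Each sensor j adds its own sparse innovation z_j (K_j = 1 nonzero),
# giving x_j = z_C + z_j as in equation (4).
signals = []
for j in range(J):
    z_j = np.zeros(N)
    z_j[j] = 0.5
    signals.append(z_common + z_j)
```

Each x_j stays individually sparse (at most K_C + K_j nonzeros), and the common component cancels when two sensor signals are subtracted, leaving only the innovations.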
We now introduce a bipartite graph G = (VV, VM, E), as shown in Figure 1, that represents the
relationships between the entries of the value vector and its measurements. The graph
encodes the common and innovation components KC and Kj (1 ≤ j ≤ J), as well as the joint
sparsity D = KC + Σj Kj. The set of edges E is defined as follows:
• An edge is connected for each KC coefficient that is not in common with Kj.
• An edge is connected for each Kj coefficient that is in common with Kj.
A further optimization can be performed to reduce the number of measurements made by
each sensor: the number of measurements is now proportional to the maximal overlap of the
inter-sensor ranges, and not a constant as shown in equation (1). This is calculated from the
common coefficients KC and Kj: if there are common coefficients in Kj, then one of the KC
coefficients is removed and the common ZC is added; this change does not affect the
reconstruction of the original measurement signal x.
Substituting d-distance for x and k for the number of bits transmitted, we equate as in
equation (7).
Fig. 2. Cost function for managing node density (n = 100, Tx range = 50 m, CONST; cluster
areas 440 m², 140 m², 75 m², 50 m²)
Notice that the expression in equation (10) has the form of a linear relationship with slope k,
and scaling the argument induces a linear shift of the function, and leaves both the form and
slope k unchanged. Plotting on a log scale, as shown in Figure 3, we get a long tail showing
that a few nodes dominate the transmission power compared to the majority, similar to the
Power Law (S. B. Lowen and M. C. Teich (1970)).
Properties of power laws (scale invariance): the main property of power laws that makes
them interesting is their scale invariance. Given a relation f(x) = ax^k, or any homogeneous
polynomial, scaling the argument x by a constant factor causes only a proportionate scaling
of the function itself. From equation (10), we can infer that the property is scale invariant
even when clustering c nodes in a given radius k.
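Scale invariance can be checked numerically in a few lines (the constants a, k, c and the sample point are illustrative choices of ours):

```python
a, k = 2.0, 1.5              # illustrative power-law parameters

def f(x):
    return a * x ** k        # power law f(x) = a * x^k

c, x0 = 3.0, 5.0
scaled = f(c * x0)           # scale the argument by a constant factor c...
rescaled = (c ** k) * f(x0)  # ...which only rescales the function by c^k
```

The two quantities agree up to floating-point rounding, confirming f(cx) = c^k f(x).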
This is validated by the simulation results (Vasanth Iyer (G. Rama Murthy)) obtained in
Figure (2), which show optimal results (minimum loading per node (Vasanth Iyer (S.S.
Iyengar))) when clustering is ≤ 20%, as expected from the above derivation.
Sensors   S1         S2         S3         S4         S5         S6         S7          S8
Value     4.7 ± 2.0  1.6 ± 1.6  3.0 ± 1.5  1.8 ± 1.0  4.7 ± 1.0  1.6 ± 0.8  3.0 ± 0.75  1.8 ± 0.5
Group     -          -          -          -          -          -          -           -
Table 1. A typical set of random measurements from sensors showing non-linearity in ranges
better handle passive listening, and use low-power listening (LPL). The performance
characteristics of MAC-based protocols for varying deployment densities (small, medium
and high) are shown in Figure 3. As can be seen, it uses best-effort routing (least cross-layer
overhead) and maintains a constant throughput; the depletion curve for the MAC also
follows the Power Law depletion curve, and has a higher bound when power-aware
scheduling such as LPL and sleep states are further used for idle optimization.
1. Assumption 1: How well the individual sensor signal sparsity can be represented.
2. Assumption 2: What would be the minimum number of measurements possible by using
the joint sparsity model from equation (5).
3. Assumption 3: The maximum possible basis representations for the joint ensemble
coefficients.
4. Assumption 4: A cost-function search which allows representing the best basis without
overlapping coefficients.
5. Assumption 5: Result validation using regression analysis, with a package such as R
(Owen Jones (Robert Maillardet)).
The design framework allows pre-processing of the individual sensors' sparse
measurements, and uses a computationally efficient algorithm to perform in-network data
fusion.
As an example data-set, we will use random measurements obtained by multiple sensors,
shown in Table 1. It has two groups of four sensors each; as shown, the mean values are the
same for both groups, while the variances due to random sensor measurements vary with
time. The buffer is created according to design criterion (1), which preserves the sparsity of
the individual sensor readings; this takes three values for each sensor, as shown in
Figure (4).
In the case of post-processing algorithms, which optimize the space and the number of bits
needed to represent multi-sensor readings, the fusing sensor calculates the average or mean
of the values to be aggregated into a single value. From our example data, we see that both
data-sets give the same end result, in this case µ = 2.7, as shown in the output plot of Figure
4(a). The sparse representation specified by design criterion (1) is not used by
post-processing algorithms; because of this, dynamic features are lost during the data
aggregation step.
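This loss of information under mean-based aggregation can be reproduced with the central values of Table 1 (the per-sensor spreads, which differ strongly between the two groups, are noted only in comments):

```python
from statistics import mean

# Central values of the two sensor groups of Table 1 (S1-S4 and S5-S8);
# group 1 has spreads 2.0, 1.6, 1.5, 1.0 and group 2 has 1.0, 0.8, 0.75, 0.5.
group1 = [4.7, 1.6, 3.0, 1.8]
group2 = [4.7, 1.6, 3.0, 1.8]

# Averaging collapses both groups to one identical aggregate value,
# discarding the very different per-sensor uncertainties.
agg1, agg2 = mean(group1), mean(group2)
```

Both groups aggregate to the same single value even though their measurement uncertainties differ, which is precisely the dynamic feature lost by this kind of post-processing.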
The pre-processing step uses the Discrete Wavelet Transform (DWT) (Arne Jensen and
Anders la Cour-Harbo (2001)) on the signal, and may have to apply the decomposition
recursively to arrive at a sparse representation; this pre-process is shown in Figure 4(b). This
step uses design criterion (1), which specifies the small number of significant coefficients
needed to represent the given measured signal. As seen in Figure 4(b), each level of
decomposition reduces the size of the coefficients. As memory is constrained, we use up to
four levels of decomposition, with a possible 26 different representations, as computed by
equation (2). This uses design criterion (3) for lossless reconstruction of the original signal.
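A minimal Haar DWT sketch (our own illustrative implementation, not the construction of Jensen & la Cour-Harbo) shows how the recursive decomposition concentrates a smooth signal into a few significant coefficients:

```python
import math

def haar_step(s):
    """One Haar analysis step: pairwise averages (scaling coefficients)
    and pairwise differences (wavelet/detail coefficients)."""
    a = [(s[2 * i] + s[2 * i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
    d = [(s[2 * i] - s[2 * i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
    return a, d

def haar_dwt(signal, levels):
    """Recursive multi-level decomposition of the approximation branch."""
    coeffs, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.append(detail)
    return approx, coeffs

# A constant (perfectly smooth) signal decomposed over three levels.
approx, details = haar_dwt([2.7] * 8, 3)
```

For the constant signal all detail coefficients vanish, so the whole signal is carried by a single significant approximation coefficient, the sparse representation sought by design criterion (1).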
The next step of pre-processing is to find the best basis: we let a vector Basis, of the same
length as the vector of cost values, represent the basis; this method uses Algorithm 1. The
indexing of the two vectors is the same, and they are enumerated in Figure 4(b), where we
have marked a basis with shaded boxes. This basis is then represented by the vector. The
basis search, which is part of design criterion (4), allows representing the best coefficients
for inter- and intra-sensor features. It can be noticed that the values are not averages or
means of the signal representation; the actual sensor outputs are preserved. Design criterion
(2) calibrates the minimum possible sensitivity of the sensor. The output in Figure 4(b)
shows the constant estimate of S3, S7, which is ZC = 2.7 from equation (4).
Sensors S1 S2 S3 S4 S5 S6 S7 S8
i.i.d.1 2.7 0 1.5 0.8 3.7 0.8 2.25 1.3
i.i.d.2 4.7 1.6 3 1.8 4.7 1.6 3 1.8
i.i.d.3 6.7 3.2 4.5 2.8 5.7 2.4 3.75 2.3
Table 2. Sparse representation of sensor values from Table:1
To represent the variance across the four sensors, a basis search is performed which finds
coefficients of sensors that match the same columns. In this example, we find Zj = 1.6, 0.75
from equation (4), which are the innovation components.
Basis = [0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Correlated range = [0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
      | 4.0  3.20  3.00   2.00  2.00  1.60  1.5     1.0   |
      | 3.2  2.56  2.40   1.60  1.60  1.28  1.20    0.80  |
      | 3.0  2.40  2.250  1.50  1.50  1.20  1.125   0.75  |
Σ =   | 2.0  1.60  1.50   1.00  1.00  0.80  0.75    0.5   |      (13)
      | 2.0  1.60  1.50   1.00  1.00  0.80  0.75    0.5   |
      | 1.6  1.28  1.20   0.80  0.80  0.64  0.60    0.4   |
      | 1.5  1.20  1.125  0.75  0.75  0.60  0.5625  0.375 |
      | 1.0  0.80  0.750  0.50  0.50  0.40  0.375   0.250 |
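As an observation of ours (not a claim made in the text), the entries of the matrix in equation (13) coincide with the pairwise products of the per-sensor deviations listed in Table 1, which can be checked quickly:

```python
import numpy as np

# Per-sensor deviations (the "±" spreads of Table 1, both groups combined).
sigma = np.array([2.0, 1.6, 1.5, 1.0, 1.0, 0.8, 0.75, 0.5])

# 8 x 8 matrix of pairwise products sigma_i * sigma_j.
Sigma = np.outer(sigma, sigma)
```

Spot-checking entries (e.g. row 1, column 1 gives 4.0 and row 3, column 7 gives 1.125) reproduces equation (13), and the matrix is symmetric by construction.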
5. Conclusion
In this chapter, we have discussed a distributed framework for correlated multi-sensor
measurements and data-centric routing. The framework uses compressed sensing to reduce
the number of required measurements. The joint sparsity model further allows the system
accuracy to be defined in terms of the lowest range which can be measured by a group of
sensors. The sensor fusion algorithms allow estimating the physical parameter being
measured without any inter-sensor communications. The reliability of the pre-processing
and sensor faults are discussed by comparing DWT and covariance methods.
The complexity model developed describes the encoding and decoding of the data. The
model tends to be easy for encoding, and builds more complexity at the joint decoding level,
carried out by the nodes which have more resources and act as decoders.
Post-processing and data aggregation are discussed with cross-layer protocols at the
network and MAC layers; their implication for data-centric routing using DHT is discussed
and compared with the DCS model. Even though these routing algorithms are power-aware,
the model does not scale in terms of accurately estimating the physical parameters at the
sensor level, making sensor-driven processing more reliable for such applications.
6. Theoretical Bounds
The computational complexities and their theoretical bounds are derived for the categories of
sensor pre-processing, post-processing, and routing algorithms.
6.1 Pre-Processing
Theorem 1. The Slepian-Wolf rate region for two arbitrarily correlated sources x and y is bounded by
the following inequalities:

R_x \geq H(x \mid y), \quad R_y \geq H(y \mid x), \quad R_x + R_y \geq H(x, y)  (14)
Theorem 2. Minimal spanning tree (MST) computational and time complexity for the correlated den-
drogram. Considering first the computational complexity, let us assume n patterns in d-dimensional
space, to be merged into c clusters using dmin ( Di , Dj ) as the distance measure of similarity. We need
to calculate, once and for all, the n(n − 1) interpoint distance table. The space complexity is n2 , which
we reduce to lg(n) entries. Finding the minimum distance pair (for the first merging) requires stepping
through the complete list, keeping the index of the smallest distance. Thus, for the first step, the
complexity is O(n(n − 1))(d2 + 1) = O(n2 d2 ). For c clusters the number of steps is n(n − 1) − c
unused distances. The full time complexity is O(n(n − 1) − c), i.e., O(cn2 d2 ).
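The merge loop analyzed in Theorem 2 can be sketched as a minimal single-linkage (dmin) clustering in pure Python; the 1-d absolute-difference metric below is an illustrative stand-in for the d-dimensional interpoint distance table:

```python
# Minimal single-linkage agglomerative clustering sketch.
# Each merge step scans for the pair of clusters with the smallest
# dmin(Di, Dj), the smallest interpoint distance between two clusters.

def single_linkage(points, c):
    """Merge points into c clusters using the dmin similarity measure."""
    clusters = [[p] for p in points]
    dist = lambda a, b: abs(a - b)  # 1-d toy metric for illustration
    while len(clusters) > c:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Two well-separated groups collapse into two clusters.
result = single_linkage([0.1, 0.2, 0.3, 5.0, 5.1], 2)
```

Each pass over the pairwise distances is the O(n^2) scan counted in the theorem; the repeated scans give the quoted overall bound.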
Algorithm 1 DWT: using a cost function to search for the best sparse representation of a
signal.
1: Mark all the elements on the bottom level.
2: Let j = J.
3: Let k = 0.
4: Compare the cost v1 of element k on level j − 1 (counting from the left on that level)
to the sum v2 of the cost values of elements 2k and 2k + 1 on level j.
5: If v1 ≤ v2 , all marks below element k on level j − 1 are deleted, and element k is marked.
6: If v1 > v2 , the cost value v1 of element k is replaced with v2 . Let k = k + 1. If there are
more elements on level j (i.e., if k < 2 j−1 − 1), go to step 4.
7: j = j − 1. If j > 1, go to step 3.
8: The marked sparse representation has the lowest possible cost value, having no overlaps.
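A direct transcription of Algorithm 1 can be sketched as follows (the cost tree is represented as a list of per-level cost lists with level J at the bottom; the tree values in the example are illustrative, not from the chapter):

```python
# Best-basis search over a binary cost tree (Algorithm 1 sketch).
# cost[j][k] holds the cost of element k on level j.  A parent keeps its own
# representation when it is at least as cheap as its two children combined;
# otherwise it inherits the children's total cost.

def best_basis(cost):
    J = len(cost) - 1
    marked = {(J, k) for k in range(len(cost[J]))}    # step 1: mark bottom level
    cost = [row[:] for row in cost]                   # work on a copy
    for j in range(J, 0, -1):                         # steps 2-7, bottom-up
        for k in range(len(cost[j - 1])):
            v1 = cost[j - 1][k]
            v2 = cost[j][2 * k] + cost[j][2 * k + 1]  # children on level j
            if v1 <= v2:                              # step 5: parent wins
                # delete all marks below element k on level j - 1
                marked = {(lj, lk) for (lj, lk) in marked
                          if not (lj >= j and lk // 2 ** (lj - j + 1) == k)}
                marked.add((j - 1, k))
            else:                                     # step 6: keep children
                cost[j - 1][k] = v2
    return marked, cost[0][0]                         # lowest possible cost

# Three-level example: children (1, 2) beat parent 5; children (3, 1) do not beat 3.
basis, total = best_basis([[7.0], [5.0, 3.0], [1.0, 2.0, 3.0, 1.0]])
```

For this example the algorithm marks the two leftmost bottom-level elements plus the right element on the middle level, for a total cost of 6, which is lower than the root cost of 7.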
6.2 Post-processing
Theorem 3. Properties of prefix coding: for any compression algorithm which assigns prefix codes to
be uniquely decodable, let us define the Kraft number K = Σ 2−L , a measure of the size of the set of
codeword lengths L. We see that if L is 1, 2−L is 0.5, and we cannot have more than two L’s of 0.5: if
there were more than two L’s of 0.5, then K > 1. Similarly, L can be as large as we want, so 2−L , and
hence K, can be as small as we want. Thus we can intuitively see that there must be a strict upper
bound on K, and no lower bound. It turns out that a prefix code exists for a set of codeword lengths
IF AND ONLY IF:
K≤1 (15)
The above equation is the Kraft inequality. The success of transmission can further be calculated: for a
minimum prefix code, a = 0.5, as 2−L ≤ 1 for unique decodability.
Iteration a = 0.5
In order to extend this scenario with distributed source coding, we consider the case of separate encoders
for each source, xn and yn . Each encoder operates without access to the other source.
Iteration 0.5 ≤ a ≤ 1.0
As in the previous case, it uses correlated values as a dependency and constructs the code-book. The
compression rate, or efficiency, is further enhanced by increasing the correlated CDF above a > 0.5.
This produces a very efficient code-book, and the design is independent of any decoder reference
information. A success threshold is therefore also predictable: if a = 0.5 and the cost is between L = 1.0
and 2.0, the success is 50%, and for a = 0.9 and L = 1.1, the success is 71%.
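The Kraft inequality check described in Theorem 3 can be sketched in a few lines (the codeword-length inputs are illustrative, not taken from the chapter):

```python
# Kraft inequality for prefix codes: K = sum over codewords of 2**(-L)
# must not exceed 1 for a uniquely decodable prefix code to exist.

def kraft_number(lengths):
    return sum(2.0 ** -L for L in lengths)

def prefix_code_exists(lengths):
    return kraft_number(lengths) <= 1.0

# Two codewords of length 1 saturate the bound; a third violates it.
assert prefix_code_exists([1, 1])          # K = 1.0, e.g. codes 0 and 1
assert not prefix_code_exists([1, 1, 1])   # K = 1.5 > 1, no prefix code
assert prefix_code_exists([1, 2, 3, 3])    # K = 1.0, e.g. 0, 10, 110, 111
```

The second case shows the "no more than two lengths of 0.5" observation in the theorem; the third shows a complete code whose Kraft number exactly meets the bound.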
7. References
S. Lowen and M. Teich. (1970). Power-Law Shot Noise, IEEE Trans. Inform. Theory, volume 36,
pages 1302-1318, 1970.
Slepian, D. & Wolf, J. (1973). Noiseless coding of correlated information sources. IEEE
Transactions on Information Theory, Vol. 19, No. 4, pp. 471-480.
Bhaskar Krishnamachari, S.S. Iyengar. (2004). Distributed Bayesian Algorithms for Fault-
Tolerant Event Region Detection in Wireless Sensor Networks, In: IEEE TRANSAC-
TIONS ON COMPUTERS,VOL. 53, NO. 3, MARCH 2004.
Dror Baron, Marco F. Duarte, Michael B. Wakin, Shriram Sarvotham, and Richard G. Baraniuk.
(2005). Distributed Compressive Sensing. In Proc: Pre-print, Rice University, Texas,
USA, 2005.
Vasanth Iyer, G. Rama Murthy, and M.B. Srinivas. (2008). Min Loading Max Reusability Fusion
Classifiers for Sensor Data Model. In Proc: Second international Conference on Sensor
Technologies and Applications, Volume 00 (August 25 - 31, SENSORCOMM 2008).
Vasanth Iyer, S.S. Iyengar, N. Balakrishnan, Vir. Phoha, M.B. Srinivas. (2009). FARMS: Fusion-
able Ambient Renewable MACS, In: SAS-2009,IEEE 9781-4244-2787, 17th-19th Feb,
New Orleans, USA.
Vasanth Iyer, S. S. Iyengar, Rama Murthy, N. Balakrishnan, and V. Phoha. (2009). Distributed
source coding for sensor data model and estimation of cluster head errors using
bayesian and k-near neighborhood classifiers in deployment of dense wireless sensor
networks, In Proc: Third International Conference on Sensor Technologies and Applications
SENSORCOMM, 17-21 June. 2009.
Vasanth Iyer, S.S. Iyengar, G. Rama Murthy, Kannan Srinathan, Vir Phoha, and M.B. Srinivas.
INSPIRE-DB: Intelligent Networks Sensor Processing of Information using Resilient
Encoded-Hash DataBase. In Proc. Fourth International Conference on Sensor Tech-
nologies and Applications, IARIA-SENSORCOMM, July, 18th-25th, 2010, Venice,
Italy (archived in the Computer Science Digital Library).
GEM: Graph EMbedding for Routing and DataCentric Storage in Sensor Networks Without
Geographic Information. Proceedings of the First ACM Conference on Embedded
Networked Sensor Systems (SenSys). November 5-7, Redwood, CA.
Owen Jones, Robert Maillardet, and Andrew Robinson. Introduction to Scientific Program-
ming and Simulation Using R. Chapman & Hall/CRC, Boca Raton, FL, 2009. ISBN
978-1-4200-6872-6.
Arne Jensen and Anders la Cour-Harbo. Ripples in Mathematics, Springer Verlag 2001. 246
pp. Softcover ISBN 3-540-41662-5.
S. S. Iyengar, Nandan Parameshwaran, Vir V. Phoha, N. Balakrishnan, and Chuka D Okoye,
Fundamentals of Sensor Network Programming: Applications and Technology.
ISBN: 978-0-470-87614-5 Hardcover 350 pages December 2010, Wiley-IEEE Press.
Adaptive Kalman Filter for Navigation Sensor Fusion 65
Tsu-Pin Weng
EverMore Technology, Inc., Hsinchu
Taiwan
1. Introduction
As a form of optimal estimator characterized by recursive evaluation, the Kalman filter (KF)
(Bar-Shalom et al., 2001; Brown & Hwang, 1997; Gelb, 1974; Grewal & Andrews, 2001) has
been shown to be the filter that minimizes the variance of the estimation mean square error
(MSE) and has been widely applied to navigation sensor fusion. Nevertheless, in
Kalman filter designs, divergence due to modeling errors is critical. Utilization of the KF
requires that all the plant dynamics and noise processes be completely known, and that the
noise process be zero-mean white noise. If the input data do not reflect the real model, the
KF estimates may not be reliable, and disagreement between the theoretical and actual
behavior of the filter may lead to divergence problems. For example, if the Kalman filter is
provided with information that the process behaves a certain way, whereas, in fact, it
behaves a different way, the filter will continually attempt to fit an incorrect process signal.
Furthermore, the measurement situation may not provide sufficient information to
estimate all the state variables of the system; in that case the estimation error covariance
matrix becomes unrealistically small and the filter disregards the measurement.
In various circumstances there are uncertainties in the system model and noise
description, and the assumptions on the statistics of disturbances are violated: in a
number of practical situations the availability of a precisely known model is unrealistic,
because in the modelling step some phenomena are disregarded, and a way to take
them into account is to consider a nominal model affected by uncertainty. The fact that the KF
depends strongly on predefined system and measurement models forms a major drawback. If
the theoretical behavior of the filter and its actual behavior do not agree, divergence
problems tend to occur. The adaptive algorithm has been one of the approaches to prevent
divergence of the Kalman filter when precise knowledge of the models is not
available.
To fulfil the requirement of achieving filter optimality, or to prevent the divergence
problem of the Kalman filter, the so-called adaptive Kalman filter (AKF) approach (Ding, et al,
2007; El-Mowafy & Mohamed, 2005; Mehra, 1970, 1971, 1972; Mohamed & Schwarz, 1999;
Hide et al., 2003) has been one of the promising strategies for dynamically adjusting the
parameters of the supposedly optimum filter based on the estimates of the unknown
parameters, for on-line estimation of motion as well as of the signal and noise statistics
from the available data. Two popular types of adaptive Kalman filter algorithms are the
innovation-based adaptive estimation (IAE) approach (El-Mowafy & Mohamed, 2005;
Mehra, 1970, 1971, 1972; Mohamed & Schwarz, 1999; Hide et al., 2003) and the adaptive
fading Kalman filter (AFKF) approach (Xia et al., 1994; Yang et al., 1999, 2004; Yang & Xu,
2003; Zhou & Frank, 1996), which is a type of covariance scaling method in which
suboptimal fading factors are incorporated. The AFKF incorporates suboptimal fading
factors as a multiplier to enhance the influence of innovation information, improving the
tracking capability in high dynamic maneuvering.
The Global Positioning System (GPS) and inertial navigation systems (INS) (Farrell, 1998;
Salychev, 1998) have complementary operational characteristics, and the synergy of both
systems has been widely explored. GPS is capable of providing accurate position
information. Unfortunately, the data are prone to jamming or loss due to the limitations
of the electromagnetic waves that form the basis of its operation. The system cannot
work properly in areas with signal blockage and attenuation, which may
deteriorate the overall positioning accuracy. The INS is a self-contained system that
integrates three acceleration components and three angular velocity components with
respect to time and transforms them into the navigation frame to deliver position, velocity
and attitude components. For short time intervals, the integration with respect to time of the
linear acceleration and angular velocity monitored by the INS results in accurate velocity,
position and attitude. However, the errors in position coordinates increase unboundedly as a
function of time. GPS/INS integration is an adequate solution for a navigation
system with superior performance in comparison with either a GPS or an INS stand-
alone system, and is typically carried out through the Kalman filter.
Therefore, the design of a GPS/INS integrated navigation system depends heavily on the
design of the sensor fusion method. Navigation sensor fusion using the AKF will be discussed.
A hybrid approach will be presented and its performance will be evaluated on loosely-
coupled GPS/INS navigation applications.
This chapter is organized as follows. In Section 2, preliminary background on adaptive
Kalman filters is reviewed. An IAE/AFKF hybrid adaptation approach is introduced in
Section 3. In Section 4, illustrative examples on navigation sensor fusion are given.
Conclusions are given in Section 5.
E[w_k w_i^T] = \begin{cases} Q_k, & i = k \\ 0, & i \neq k \end{cases}; \quad
E[v_k v_i^T] = \begin{cases} R_k, & i = k \\ 0, & i \neq k \end{cases}; \quad
E[w_k v_i^T] = 0 \text{ for all } i \text{ and } k  (2)

where Q_k is the process noise covariance matrix, R_k is the measurement noise covariance
matrix, \Phi_k = e^{F \Delta t} is the state transition matrix, \Delta t is the sampling interval,
E[\cdot] represents expectation, and superscript “T” denotes matrix transpose.
The discrete-time Kalman filter algorithm is summarized as follows:
Prediction steps/time update equations:

\hat{x}_{k+1}^- = \Phi_k \hat{x}_k  (3)
P_{k+1}^- = \Phi_k P_k \Phi_k^T + Q_k  (4)

Correction steps/measurement update equations:

K_k = P_k^- H_k^T [H_k P_k^- H_k^T + R_k]^{-1}  (5)
\hat{x}_k = \hat{x}_k^- + K_k [z_k - H_k \hat{x}_k^-]  (6)
P_k = [I - K_k H_k] P_k^-  (7)
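For illustration, a scalar (one-dimensional) version of the prediction and correction equations (3)-(7) can be sketched as follows; with scalar states, the transposes and matrix inverse reduce to ordinary arithmetic (the tracking example and its noise values are illustrative, not from the chapter):

```python
# Scalar Kalman filter sketch of equations (3)-(7).

def kf_step(x, P, z, Phi, Q, H, R):
    # Prediction (time update), equations (3)-(4)
    x_pred = Phi * x
    P_pred = Phi * P * Phi + Q
    # Correction (measurement update), equations (5)-(7)
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_new = x_pred + K * (z - H * x_pred)   # innovation-weighted update
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Track a constant state observed in noise: the estimate moves toward 1.0
# and the error covariance shrinks with each measurement.
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.95]:
    x, P = kf_step(x, P, z, Phi=1.0, Q=0.0, H=1.0, R=0.1)
```

With Q = 0 the filter behaves like a recursive averager; a nonzero Q would keep the gain from decaying, which is exactly the lever the adaptive methods below manipulate.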
A limitation in applying the Kalman filter to real-world problems is that the a priori statistics of
the stochastic errors in both the dynamic process and measurement models are assumed to be
available, which is difficult in practice because the noise statistics may change with time.
As a result, the set of unknown, time-varying statistical parameters of
noise, {Q_k , R_k } , needs to be estimated simultaneously with the system state and the error
covariance. Two popular types of adaptive Kalman filter algorithms are the
innovation-based adaptive estimation (IAE) approach (El-Mowafy and Mohamed, 2005;
Mehra, 1970, 1971, 1972; Mohamed and Schwarz, 1999; Hide et al., 2003; Caliskan & Hajiyev,
2000) and the adaptive fading Kalman filter (AFKF) approach (Xia et al., 1994; Zhou & Frank,
1996), a type of covariance scaling method in which suboptimal fading factors are
incorporated.
Defining \hat{C}_k as the statistical sample variance estimate of C_k , the matrix \hat{C}_k can be
computed by averaging inside a moving estimation window of size N:

\hat{C}_k = \frac{1}{N} \sum_{j=j_0}^{k} \upsilon_j \upsilon_j^T  (11)

where N is the number of samples (usually referred to as the window size) and j_0 = k - N + 1 is the
first sample inside the estimation window. The window size N is chosen empirically (a good
size for the moving window may vary from 10 to 30) to give some statistical smoothing.
More detailed discussion can be found in Gelb (1974), Brown & Hwang (1997), and
Grewal & Andrews (2001).
The benefit of the adaptive algorithm is that it keeps the covariance consistent with the real
performance. The innovation sequences have been utilized by the correlation and
covariance-matching techniques to estimate the noise covariances. The basic idea behind the
covariance-matching approach is to make the actual value of the covariance of the residual
consistent with its theoretical value. This leads to an estimate of R_k :

\hat{R}_k = \hat{C}_k - H_k P_k^- H_k^T  (12)

Based on the residual-based estimate, the estimate of the process noise Q_k is obtained:

\hat{Q}_k = \frac{1}{N} \sum_{j=j_0}^{k} \Delta x_j \Delta x_j^T + P_k - \Phi_k P_{k-1} \Phi_k^T  (13)

where \Delta x_k = \hat{x}_k - \hat{x}_k^-. This equation can also be written in terms of the innovation
sequence:

\hat{Q}_k \approx K_k \hat{C}_k K_k^T  (14)

For a more detailed derivation of these equations, see Mohamed & Schwarz (1999).
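Scalar versions of the covariance-matching estimates can be sketched as follows (the window contents and filter quantities below are illustrative values, not taken from the chapter's experiments):

```python
# Covariance-matching adaptation sketch, scalar forms of equations (11)-(14):
# estimate the innovation variance over a moving window, then update R and Q
# so the filter's statistics match the observed residuals.

def estimate_C(innovations, N):
    window = innovations[-N:]                 # last N samples, eq. (11)
    return sum(v * v for v in window) / len(window)

def adapt_R(C_hat, H, P_pred):
    return C_hat - H * P_pred * H             # eq. (12)

def adapt_Q(C_hat, K):
    return K * C_hat * K                      # innovation form, eq. (14)

innovations = [0.9, -1.1, 1.0, -0.8, 1.2]     # illustrative residual sequence
C_hat = estimate_C(innovations, N=5)          # sample innovation variance
R_hat = adapt_R(C_hat, H=1.0, P_pred=0.2)     # measurement noise estimate
Q_hat = adapt_Q(C_hat, K=0.5)                 # process noise estimate
```

Note that (12) can go negative if the window is too small or P is overestimated, which is one practical reason the window size N must be chosen with care.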
M_k = H_k \Phi_k P_k \Phi_k^T H_k^T  (17)
N_k = C_0 - R_k - H_k Q_k H_k^T  (18a)

where

C_0 = \begin{cases} \dfrac{\upsilon_0 \upsilon_0^T}{2}, & k = 0 \\[4pt] \dfrac{\rho C_{0,k-1} + \upsilon_k \upsilon_k^T}{1 + \rho}, & k \geq 1 \end{cases}  (19)

Equation (18a) can be modified by multiplying an innovation enhancement weighting factor
\gamma and adding an additional term:

N_k = \gamma C_0 - R_k - H_k Q_k H_k^T  (18b)
In the AFKF, the key parameter is the fading factor matrix \lambda_k . The factor \gamma is introduced
to increase the tracking capability through increased weighting of the covariance matrix
of the innovation. The value of the weighting factor \gamma , usually selected empirically, is tuned
to improve the smoothness of state estimation; a larger \gamma provides stronger tracking
capability. The fading memory approach tries to estimate a scale factor to
increase the predicted variance components of the state vector. The variance estimation
method directly calculates the variance factor for the dynamic model.
There are some drawbacks with a constant factor: e.g., as the filtering proceeds, the
precision of the filtering will decrease because the effects of old data become less and
less. The ideal way is to use a varying scale factor determined according to the
accuracy of the dynamic and observation models.
\lambda_{i,k} = \begin{cases} \alpha_i c_k, & \alpha_i c_k > 1 \\ 1, & \alpha_i c_k \leq 1 \end{cases}  (21)

where

c_k = \frac{tr[N_k]}{tr[\alpha M_k]}  (22)

and

N_k = \gamma V_k - R_k - H_k Q_k H_k^T  (23)
M_k = H_k \Phi_k P_k \Phi_k^T H_k^T  (24)

V_k = \begin{cases} \upsilon_0 \upsilon_0^T, & k = 0 \\[4pt] \dfrac{\rho V_{k-1} + \upsilon_k \upsilon_k^T}{1 + \rho}, & k \geq 1 \end{cases}  (25)
The key parameter in the STKF is the fading factor matrix \lambda_k , which depends on three
parameters: (1) \alpha_i ; (2) the forgetting factor \rho ; and (3) the softening factor \beta .
These parameters are usually selected empirically. The \alpha_i \geq 1, i = 1, 2, \ldots, m , are a priori
selected. If from a priori knowledge we know that x will have a large
change, then a large \alpha_i should be used so as to improve the tracking capability of the STKF.
On the other hand, if there is no a priori knowledge about the plant dynamics, it is common to
select \alpha_1 = \alpha_2 = \cdots = \alpha_m = 1 . In such a case, the STKF based on multiple fading factors
degenerates to an STKF based on a single fading factor. The range of the forgetting factor is
0 < \rho \leq 1 , for which 0.95 is commonly used. The softening factor \beta is utilized to improve
the smoothness of state estimation. A larger \beta (with value no less than 1) leads to better
estimation accuracy, while a smaller \beta provides stronger tracking capability. The value is
usually determined empirically through computer simulation, and \beta = 4.5 is a commonly
selected value.
\tilde{\upsilon}_k = \frac{\upsilon_k}{\sqrt{C_k}}  (28)

To avoid the factor approaching 0, it is common to choose

\alpha = \begin{cases} 1, & \left| \tilde{\upsilon}_k \right| \leq c \\[4pt] \dfrac{c}{\left| \tilde{\upsilon}_k \right|}, & \left| \tilde{\upsilon}_k \right| > c \end{cases}  (29)

The a priori value of the factor is usually selected empirically. If from a priori knowledge we
have the knowledge that x will have a large change, then a small value should be used so as to
improve the tracking capability. The range of the factor is 0 < \alpha \leq 1 . The factor is utilized to
improve the smoothness of state estimation: a larger value ( \alpha \to 1 ) leads to better estimation
accuracy, while a smaller value provides stronger tracking capability. The value is usually
determined empirically through personal experience or computer simulation using a
heuristic searching scheme. In the case that \alpha = 1 , the filter deteriorates to a standard Kalman filter.
In Equation (29), the threshold c = 0.5 is an average value commonly used. To increase the
tracking capability, the time-varying suboptimal scaling factor needs to be properly
designed for on-line tuning of the covariance of the predicted state, which adjusts the filter
gain; accordingly, the improved version of the AFKF is able to provide better estimation
accuracy.
\mathrm{DOM} = C_k - \hat{C}_k  (31)

Kalman filtering with motion detection is important in target tracking applications. The
innovation information at the present epoch can be employed to timely reflect the change
in vehicle dynamics. Selecting the degree of divergence (DOD) as the trace of the innovation
covariance matrix at the present epoch (i.e., a window size of one), we have:

\xi = \upsilon_k^T \upsilon_k = tr(\upsilon_k \upsilon_k^T)  (32)
For each of the approaches, only one scalar value needs to be determined, so the
fuzzy rules can be simplified, resulting in a reduced computational burden.
The logic of the adaptation algorithm using the covariance-matching technique is described as
follows. When the actual covariance value \hat{C}_k is observed, if its value is within the range
predicted by theory, C_k , and the difference is very near to zero, this indicates that both
covariances match almost perfectly. If the actual covariance is greater than its theoretical
value, the value of the process noise should be increased; if the actual covariance is less than
its theoretical value, the value of the process noise should be decreased. The fuzzy logic
approach (Abdelnour et al., 1993; Jwo & Chang, 2007; Loebis et al., 2007; Mostov & Soloviev, 1996;
Sasiadek et al., 2000) is popular mainly due to its simplicity, even though other
approaches such as neural networks and genetic algorithms may also be applicable. With the
fuzzy logic approach, based on rules of the kind:
IF〈antecedent〉THEN〈consequent〉
the following rules can be utilized to implement the idea of covariance matching:
A. \hat{C}_k is employed
(1) IF〈 \hat{C}_k \cong 0 〉THEN〈 Q_k is unchanged〉 (This indicates that \hat{C}_k is near zero;
the process noise statistic should be retained.)
(2) IF〈 \hat{C}_k > 0 〉THEN〈 Q_k is increased〉 (This indicates that \hat{C}_k is larger than zero;
the process noise statistic is too small and should be increased.)
(3) IF〈 \hat{C}_k < 0 〉THEN〈 Q_k is decreased〉 (This indicates that \hat{C}_k is less than zero;
the process noise statistic is too large and should be decreased.)
B. DOM is employed
(1) IF〈 DOM \cong 0 〉THEN〈 Q_k is unchanged〉 (This indicates that \hat{C}_k is about the same
as C_k ; the process noise statistic should be retained.)
(2) IF〈 DOM > 0 〉THEN〈 Q_k is decreased〉 (This indicates that \hat{C}_k is less than C_k ;
the process noise statistic should be decreased.)
(3) IF〈 DOM < 0 〉THEN〈 Q_k is increased〉 (This indicates that \hat{C}_k is larger than C_k ;
the process noise statistic should be increased.)
C. DOD ( \xi ) is employed
Suppose that \xi is employed as the test statistic and T represents the chosen threshold.
The following fuzzy rules can be utilized:
(1) IF〈 \xi \geq T 〉THEN〈 Q_k is increased〉 (A failure or maneuvering is reported; the
process noise statistic is too small and needs to be increased.)
(2) IF〈 \xi < T 〉THEN〈 Q_k is decreased〉 (There is no failure or maneuvering; the
process noise statistic is too large and needs to be decreased.)
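The DOD-based rules above can be sketched as a simple crisp (non-fuzzy) adaptation loop; the doubling/halving step size is an illustrative choice, not prescribed by the text:

```python
# Crisp sketch of the DOD rules: xi = v' v (eq. (32)) compared against a
# threshold T decides whether the process noise Q is inflated or shrunk.

def adapt_process_noise(Q, innovation, T, step=2.0):
    xi = sum(v * v for v in innovation)  # tr(v v') for a vector innovation
    if xi >= T:
        return Q * step    # failure/maneuver suspected: inflate Q
    return Q / step        # quiescent: shrink Q

Q = 1.0
Q = adapt_process_noise(Q, [3.0, 4.0], T=10.0)   # xi = 25 >= T, Q doubled
Q = adapt_process_noise(Q, [0.1, 0.2], T=10.0)   # xi = 0.05 < T, Q halved
```

A fuzzy implementation replaces the hard threshold with membership functions so that Q changes smoothly near T rather than jumping, which is the point of the FLAS discussed later.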
or, alternatively

P_{k+1}^- = \lambda_P \Phi_k P_k \Phi_k^T + Q_k  (39a)

On the other hand, the covariance matrix can also be approximated by

P_{k+1}^- = \lambda_P (\Phi_k P_k \Phi_k^T + Q_k)  (39b)

where \lambda_P = diag(\lambda_1, \lambda_2, \ldots, \lambda_m) . The main difference between the various
adaptive fading algorithms lies in the calculation of the scale factor \lambda_P . One approach is
to assign the scale factors as constants. When \lambda_i \geq 1 ( i = 1, 2, \ldots, m ), the filtering is in
a steady-state processing mode, while with \lambda_i < 1 the filtering may tend to be unstable. For
the case \lambda_i = 1 , it deteriorates to the standard Kalman filter. There are some drawbacks with
constant factors: e.g., as the filtering proceeds, the precision of the filtering will decrease because
the effects of old data tend to become less and less. The ideal way is to use time-varying factors
determined according to the accuracy of the dynamic and observation models.
When there is deviation due to changes in the covariance and measurement noise, the
corresponding innovation covariance matrix can be rewritten as:

C_k = H_k P_k^- H_k^T + R_k

and

C_k = \lambda_P H_k P_k^- H_k^T + \lambda_R R_k  (40)

To enhance the tracking capability, the time-varying suboptimal scaling factor is
incorporated for on-line tuning of the covariance of the predicted state, which adjusts the filter
gain; accordingly, the improved version of the AFKF is obtained. The optimum fading
factors can be calculated through the single factor:

\lambda_i = (\lambda_P)_{ii} = \max\left\{ 1, \frac{tr(\hat{C}_k)}{tr(C_k)} \right\}, \quad i = 1, 2, \ldots, m  (41)

where tr[\cdot] is the trace of the matrix and \lambda_i \geq 1 is a scaling factor. Increasing \lambda_i
will improve the tracking performance.
(2) Adaptation on the measurement noise covariance. As the strength of the measurement noise
changes with the environment, incorporation of the fading factor alone is not able to attain the
expected estimation accuracy. To resolve this, the ATS needs a mechanism for
R-adaptation in addition to P-adaptation, to adjust the noise strengths and improve the filter
estimation performance.
A parameter representing the ratio of the actual innovation covariance, based on the
sampled sequence, to the theoretical innovation covariance matrix can be defined by one of
the following methods:
(a) Single factor

\lambda_j = (\lambda_R)_{jj} = \frac{tr(\hat{C}_k)}{tr(C_k)}, \quad j = 1, 2, \ldots, n  (42a)

(b) Multiple factors

\lambda_j = \frac{(\hat{C}_k)_{jj}}{(C_k)_{jj}}, \quad j = 1, 2, \ldots, n  (42b)

It should be noted from Equation (40) that increasing R_k will lead to increasing C_k , and
vice versa: time-varying R_k leads to time-varying C_k . The value of \lambda_R is introduced
in order to reduce the discrepancies between C_k and R_k . The adaptation can be implemented
through the simple relation:

\hat{R}_k = \lambda_R R_k  (43)
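The single-factor computations of (41)-(43) can be sketched as follows (scalar traces and illustrative values, not from the chapter's experiments):

```python
# Single-factor sketch of the two ATS scale factors, eqs. (41)-(43):
# lambda_P is floored at one, while lambda_R may fall below one when
# the measurement noise shrinks.

def scale_factors(tr_C_hat, tr_C):
    lam_P = max(1.0, tr_C_hat / tr_C)   # P-adaptation fading factor, eq. (41)
    lam_R = tr_C_hat / tr_C             # R-adaptation factor, eq. (42a)
    return lam_P, lam_R

# Divergent case: actual innovation covariance exceeds theory, both factors > 1.
lam_P, lam_R = scale_factors(tr_C_hat=2.5, tr_C=2.0)
assert lam_P == 1.25 and lam_R == 1.25

# Quiet case: lam_P is floored at 1, but lam_R shrinks R via R = lam_R * R.
lam_P, lam_R = scale_factors(tr_C_hat=1.0, tr_C=2.0)
assert lam_P == 1.0 and lam_R == 0.5
```

The asymmetry between the two factors is exactly the remark made in the chapter's conclusion: (λ_P)_ii is kept no smaller than one, whereas (λ_R)_jj has no such limitation.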
Further details of the adaptive tuning loop are illustrated by the flow charts shown in
Figs. 1 and 2, where two architectures are presented. Fig. 1 shows system architecture #1
and Fig. 2 shows system architecture #2, respectively. In Fig. 1, the flow chart contains
two portions, of which the block indicated by dotted lines is the adaptive tuning system
(ATS) for tuning the values of both the P and R parameters; in Fig. 2, the flow chart contains
three portions, of which the two blocks indicated by dotted lines represent the R-
adaptation loop and the P-adaptation loop, respectively.
[Flow chart: initialization ( \hat{x}_0 , P_0 ) → measurement update \hat{x}_k = \hat{x}_k^- + K_k[z_k - \hat{z}_k] , P_k = (I - K_k H_k) P_k^- → innovation \upsilon_k = z_k - \hat{z}_k → adaptive tuning system, which computes \hat{C}_k and C_k and then the R-adaptation factor (\lambda_R)_{jj} and the P-adaptation factor (\lambda_P)_{ii} → prediction \hat{x}_{k+1}^- = \Phi_k \hat{x}_k , P_{k+1}^- = \lambda_P(\Phi_k P_k \Phi_k^T + Q_k) .]
Fig. 1. Flow chart of the IAE/AFKF hybrid AKF method - system architecture #1
An important remark needs to be pointed out. When system architecture #1 is employed,
only one window size is needed. It can be seen that the measurement noise covariance part of the
innovation covariance matrix has not yet been updated when the fading factor
calculation is performed. In system architecture #2, the latest information on the measurement
noise strength is already available when the fading factor is calculated. However,
one should notice that utilization of the ‘old’ (i.e., before R-adaptation) information is
required; otherwise, unreliable results may occur, since the deviation of the innovation
covariance matrix due to the measurement noise cannot be correctly detected. One strategy
for avoiding this problem is to use two different window sizes, one for the R-
adaptation loop and the other for the P-adaptation loop.
[Flow chart: initialization ( \hat{x}_0 , P_0 ) → measurement update \hat{x}_k = \hat{x}_k^- + K_k[z_k - \hat{z}_k] , P_k = (I - K_k H_k) P_k^- → separate R-adaptation loop (window size N_R ; computes \hat{C}_k , C_k , the factor (\lambda_R)_{jj} and R_k = \lambda_R R_k ) and P-adaptation loop (window size N_P ; computes \hat{C}_k , C_k and (\lambda_P)_{ii} ) → prediction \hat{x}_{k+1}^- = \Phi_k \hat{x}_k , P_{k+1}^- = \lambda_P(\Phi_k P_k \Phi_k^T + Q_k) .]
Fig. 2. Flow chart of the IAE/AFKF hybrid AKF method - system architecture #2
z_k = \begin{bmatrix} n_{INS} - n_{GPS} \\ e_{INS} - e_{GPS} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} n \\ e \\ v_n \\ v_e \\ a_u \\ a_v \\ \omega \\ r \end{bmatrix} + v_k  (46)

Further simplification of the above two models leads to

\frac{d}{dt} \begin{bmatrix} n \\ e \\ v_n \\ v_e \\ \delta \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} n \\ e \\ v_n \\ v_e \\ \delta \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ w_n \\ w_e \\ w_\delta \end{bmatrix}  (47)

and

z_k = \begin{bmatrix} n_{INS} - n_{GPS} \\ e_{INS} - e_{GPS} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} n \\ e \\ v_n \\ v_e \\ \delta \end{bmatrix} + v_k  (48)

respectively, where \delta denotes the remaining augmented error state and v_k is the measurement noise.
[Figure: body-frame acceleration components a_u and a_v shown relative to the North and East axes.]
(A) Example 1: utilization of the fuzzy adaptive fading Kalman filter (FAFKF) approach
The first illustrative example is taken from Jwo & Huang (2009). Fig. 4 shows the strategy
for GPS/INS navigation processing based on the FAFKF mechanism. The GPS
navigation solution based on least-squares (LS) is solved at the first stage. The residual
between the GPS LS solution and the INS-derived data is used as the measurement of the KF.
[Flow chart: the INS output x_INS feeds the measurement prediction h(x*); the GPS solution x_GPS is differenced with it to form the KF measurement; the AFKF estimates the INS errors, which correct the INS output to give the corrected output \hat{x} ; the innovation information drives the FLAS, which determines the threshold c.]
Fig. 4. GPS/INS navigation processing using the FAFKF for the illustrative example 1.
The experiment was conducted on a simulated vehicle trajectory originating from the (0, 0)
m location. The simulated trajectory of the vehicle and the INS derived position are shown
as in Fig. 5. The trajectory of the vehicle can be approximately divided into two categories
according to the dynamic characteristics. The vehicle was simulated to conduct constant-
velocity straight-line during the three time intervals, 0-200, 601-1000 and 1401-1600s, all at a
speed of 10 m/s. Furthermore, it conducted counterclockwise circular motion with radius
2000 meters during 201-600 and 1001-1400s where high dynamic maneuvering is involved.
The following parameters were used: window size N =10; the values of noise standard
deviation are 2e-3 m /s 2 for accelerometers and 5e-4 m /s 2 for gyroscopes.
The presented FLAS is of the If-Then form and consists of 3 rules, with the innovation υ and the
innovation covariance \hat{C}_k as the inputs. The fuzzy rules are designed as follows:
Fig. 5. Trajectory for the simulated vehicle (solid) and the INS derived position (dashed)
(c) Threshold c
Fig. 6. Membership functions for the inputs and output
Fig. 7. East and north components of navigation errors and the 1-σ bound based on the
FAFKF method
Fig. 9. Trajectories of the threshold c (top) from the fuzzy logic output, and the
corresponding fading factor \lambda_k (bottom)
[Flow chart: the INS output x_INS feeds the measurement prediction h(x*); the GPS solution x_GPS is differenced with it to form the KF measurement; the KF estimates the INS errors, which correct the INS output to give the corrected output \hat{x} ; the innovation information drives the ATS, which determines \lambda_P and \lambda_R .]
Fig. 10. GPS/INS navigation processing using the IAE/AFKF Hybrid AKF for the
illustrative example 2
Fig. 11. Trajectory for the simulated vehicle (solid) and the INS derived position (dashed)
Fig. 12. The solution from the integrated navigation system without adaptation as compared
to the GPS navigation solutions by the LS approach
Fig. 13. The solutions for the integrated navigation system with and without adaptation
In the real world, the measurement noise will normally be changing in addition to the change of
process noise or dynamics, such as maneuvering. In such cases, both P-adaptation and R-
adaptation tasks need to be implemented. In the following discussion, results will be
provided for the case when the measurement noise strength is changing in addition to the
change of process noise.
Fig. 14. East and north components of navigation errors and the 1-σ bound based on the
method without measurement noise adaptation
Fig. 15. East and north components of navigation errors and the 1-σ bound based on the
proposed method (with adaptation on both estimation covariance and measurement noise
covariance matrices)
Fig. 16. Reference (true) and calculated standard deviations for the east (top) and north
(bottom) components of the measurement noise variance values
Fig. 17. East and north components of navigation errors and the 1-σ bound based on the
proposed method when the threshold setting is not incorporated
Fig. 18. Reference (true) and calculated standard deviations for the east and north
components of the measurement noise variance values when the threshold setting is not
incorporated
5. Conclusion
This chapter presents the adaptive Kalman filter for navigation sensor fusion. Several types
of adaptive Kalman filters has been reviewed, including the innovation-based adaptive
estimation (IAE) approach and the adaptive fading Kalman filter (AFKF) approach. Various
types of designs for the fading factors are discussed. A new strategy through the
hybridization of IAE and AFKF is presented with an illustrative example for integrated
navigation application. In the first example, the fuzzy logic is employed for assisting the
AFKF. Through the use of fuzzy logic, the designed fuzzy logic adaptive system (FLAS) has
been employed as a mechanism for timely detecting the dynamical changes and
implementing the on-line tuning of threshold c , and accordingly the fading factor, by
monitoring the innovation information so as to maintain good tracking capability.
In the second example, the conventional KF approach is coupled with the adaptive tuning
system (ATS), which provides two system parameters: the fading factor and the measurement
noise covariance scaling factor. The ATS serves as a mechanism for timely detection of
dynamical and environmental changes and for on-line parameter tuning by monitoring the
innovation information, so as to maintain good tracking capability and estimation accuracy.
Unlike some AKF methods, the proposed method has the merits of good computational
efficiency and numerical stability: the matrices in the KF loop remain positive definite. Two
remarks should be noted when using the method: (1) the window sizes can be set differently
to avoid filter degradation/divergence; (2) the fading factors (λ_P)_ii should always be larger
than one, while (λ_R)_jj has no such limitation. Simulation experiments for navigation sensor
fusion have been provided to illustrate the effectiveness of the approach: the AKF method
demonstrates remarkable improvement in both navigational accuracy and tracking capability.
Fusion of Images Recorded with Variable Illumination 91
1. Introduction
The results of an automated visual inspection (AVI) system depend strongly on the image
acquisition procedure. In particular, the illumination plays a key role in the success of the
subsequent image processing steps. The choice of an appropriate illumination is especially
critical when imaging 3D textures. In this case, 3D or depth information about a surface can
be recovered by combining 2D images generated under varying lighting conditions. For this
kind of surface, diffuse illumination can lead to a destructive superposition of light and
shadows, resulting in an irreversible loss of topographic information. For this reason, directional
illumination is better suited to inspect 3D textures. However, such textures exhibit a
different appearance under varying illumination directions. In consequence, the surface
information captured in an image can change drastically when the position of the light source
varies. The effect of the illumination direction on the image information has been analyzed in
several works [Barsky & Petrou (2007); Chantler et al. (2002); Ho et al. (2006)]. The changing
appearance of a texture under different illumination directions makes its inspection and
classification difficult. However, these appearance changes can be used to improve the knowledge
about the texture or, more precisely, about its topographic characteristics. Therefore, series of
images generated by varying the direction of the incident light between successive captures
can be used for inspecting 3D textured surfaces. The main challenge arising with the variable
illumination imaging approach is the fusion of the recorded images, needed to extract the
relevant information for inspection purposes.
This chapter deals with the fusion of image series recorded using variable illumination
direction. The next section presents a short overview of related work, particularly focused
on the well-known photometric stereo technique. As detailed in Section 2, photometric stereo
allows recovering the surface albedo and topography from a series of images. However, this
method and its extensions present some restrictions, which make them inappropriate for some
problems like those discussed later. Section 3 introduces the imaging strategy on which the
proposed techniques rely, while Section 4 provides some general information fusion concepts
and terminology. Three novel approaches addressing the stated information fusion problem
are described in Section 5. These approaches have been selected to cover a wide spectrum
of fusion strategies, which can be divided into model-based, statistical and filter-based
methods. The performance of each approach is demonstrated with concrete automated visual
inspection tasks. Finally, some concluding remarks are presented.
terns, more specifically, than diffuse lighting. In this sense, a variable directional illumination
strategy presents an optimal framework for surface inspection purposes.
The imaging system presented in the following is characterized by a fixed camera position
with its optical axis parallel to the z-axis of a global Cartesian coordinate system. The camera
lens is assumed to perform an orthographic projection. The illumination space is defined as
the space of all possible illumination directions, which are completely defined by two angles:
the azimuth ϕ and the elevation angle θ; see Fig. 1.
An illumination series S is defined as a set of B images g(x, b_b), where each image shows the
same surface part, but under a different illumination direction given by the parameter vector
b_b = (ϕ_b, θ_b)^T:

S = {g(x, b_b), b = 1, …, B}, (1)

with x = (x, y)^T ∈ R². The illuminant positions selected to generate a series {b_b, b = 1, …, B}
represent a discrete subset of the illumination space. In this sense, the acquisition of an image
series can be viewed as a sampling of the illumination space.
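The sampling of the illumination space described above can be sketched in a few lines. The following is a minimal illustration, not part of the acquisition system in the text: the fixed elevation angle, the function names, and the synthetic shading used as a stand-in for the camera are all assumptions chosen for the example.

```python
import numpy as np

def illumination_directions(B, theta_deg=45.0):
    """Sample B azimuth angles uniformly over [0, 360); fixed elevation."""
    phis = np.arange(B) * (360.0 / B)          # azimuth angles phi_b
    return [(phi, theta_deg) for phi in phis]  # b_b = (phi_b, theta_b)

def acquire_series(surface, directions):
    """Stand-in for the camera: modulate a synthetic surface image with
    the azimuth to mimic direction-dependent shading (toy model)."""
    series = []
    for phi, theta in directions:
        shading = 0.5 * (1.0 + np.cos(np.deg2rad(phi)))
        series.append(surface * shading)
    return np.stack(series)                    # series S, shape (B, M, N)

surface = np.ones((8, 8))
dirs = illumination_directions(B=4)
S = acquire_series(surface, dirs)              # discrete sampling of the
                                               # illumination space
```

Each slice S[b] corresponds to one image g(x, b_b) of the series; increasing B refines the sampling of the illumination space.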
Besides point light sources, illumination patterns can also be considered to generate illumination
series. The term illumination pattern refers here to a superposition of point light sources.
One approach described in Section 5 uses sector-shaped patterns to illuminate the surface
simultaneously from all elevation angles in the interval θ ∈ [0°, 90°] for an arbitrary azimuth
angle; see Fig. 2. In this case, we refer to a sector series S_s = {g(x, ϕ_b), b = 1, …, B} as an
image series in which only the azimuthal position of the sector-shaped illumination pattern
varies.
approaches can be classified according to the utilized sensor type into passive, active and
a mix of both (passive/active). Additionally, the sensor configuration can be divided into
parallel or serial. If the fusion approaches are analyzed by considering the nature of the
sensors' information, they can be grouped into recurrent, complementary or cooperative. Finally,
if the hierarchies of the input and output data classes (data, feature or decision) are
considered, the fusion methods can be divided into different architectures: data input-data output
(DAI-DAO), data input-feature output (DAI-FEO), feature input-feature output (FEI-FEO),
feature input-decision output (FEI-DEO) and decision input-decision output (DEI-DEO). The
described categorizations are the most frequently encountered in the literature. Table 1 shows
the fusion categories according to the described viewpoints. The shaded boxes indicate those
image fusion categories covered by the approaches presented in this chapter.
This chapter is dedicated to the fusion of image series in the field of automated visual inspection
of 3D textured surfaces. Therefore, from the viewpoint of the application area, the approaches
presented in the next section can be assigned to the field of robotics. The objectives
of the machine vision tasks are the detection and classification of defects. Now, if we analyze
the approaches considering the sensor type, we find that the specific sensor, i.e., the camera, is
a passive sensor. However, the whole measurement system presented in the previous section
can be regarded as active, if we consider the targeted excitation of the object to be inspected
by the directional lighting. Additionally, the acquisition system comprises only one camera,
which captures the images of the series sequentially after systematically varying the illumination
configuration. Therefore, we can speak here of serial virtual sensors.
More interesting conclusions can be found when analyzing the approaches from the point
of view of the involved data. To reliably classify defects on 3D textures, it is necessary to
consider all the information distributed along the image series simultaneously. Each image in
the series contributes to the final decision with a necessary part of information. That is, we
are fusing cooperative information. Now, if we consider the hierarchy of the input and output
data classes, we can globally classify each of the fusion methods in this chapter as DAI-DEO
approaches. Here, the input is always an image series and the output is always a symbolic
result (segmentation or classification). However, a deeper analysis allows us to decompose
each approach into a concatenation of DAI-FEO, FEI-FEO and FEI-DEO fusion architectures.
Schemes showing these information processing flows will be discussed for each method in
the corresponding sections.
where g_x(b) is the intensity signal at a fixed location x as a function of the illumination
parameters b. This signal allows us to derive a set of model-based features, which are extracted
individually at each location on the surface and are independent of the surrounding locations.
The features considered in the following method are related to the macrostructure (the local
orientation) and to reflection properties associated with the microstructure of the surface.
L_d = k_d · cos(θ − θ_n). (4)
The assignment of the variables θ (angle of the incident light) and θn (angle of the normal
vector orientation) is explained in Fig. 3.
Fig. 3. Illumination direction, direction of observation, and local surface normal n are in-plane
for the applied 1D case of the reflection model. The facet, which reflects the incident light into
the camera, is tilted by ε with respect to the normal of the local surface spot.
The forescatter reflection is described by a geometric model according to [Torrance & Sparrow
(1967)]. The surface is considered to be composed of many microscopic facets, whose normal
vectors diverge from the local normal vector n by the angle ε; see Fig. 3. These facets are
normally distributed and each one reflects the incident light like a perfect mirror. As the
surface is assumed to be isotropic, the facet distribution function p_ε(ε) is rotationally
symmetric:
p_ε(ε) = c · exp(−ε² / (2σ²)). (5)
We define a surface spot as the surface area which is mapped onto a pixel of the sensor. The
reflected radiance of such spots with the orientation θn can now be expressed as a function of
the incident light angle θ:
L_fs = (k_fs / cos(θ_r − θ_n)) · exp(−(θ + θ_r − 2θ_n)² / (8σ²)). (6)
The parameter σ denotes the standard deviation of the facets' deflection, and it is used as
a feature to describe the degree of specularity of the surface. The observation direction of
the camera θ_r is constant for an image series and is typically set to 0°. Further effects of the
original facet model of Torrance and Sparrow, such as shadowing effects between the facets,
are either neglected or absorbed into the constant factor k_fs.
The reflected radiance L_r leads to an irradiance reaching the image sensor. For constant small
solid angles, it can be assumed that the radiance L_r is proportional to the intensities detected
by the camera:

g_x(θ) ∝ L_r(θ). (7)
Considering Eqs. (3)-(7), we can formulate our model for the intensity signals detected by the
camera as follows:

g_x(θ) = k_d · cos(θ − θ_n) + (k_fs / cos(θ_r − θ_n)) · exp(−(θ + θ_r − 2θ_n)² / (8σ²)). (8)

This equation will be subsequently utilized to model the intensity of a small surface area (or
spot) as a function of the illumination direction.
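The intensity model of Eq. (8) is straightforward to evaluate numerically. The sketch below uses illustrative parameter values (k_d, k_fs, σ and the spot orientation θ_n are assumptions chosen for the example, not values from the text):

```python
import numpy as np

def intensity(theta, theta_n, k_d=1.0, k_fs=0.5, sigma=0.1, theta_r=0.0):
    """g_x(theta) of Eq. (8): diffuse term plus forescatter facet term.
    All angles are in radians."""
    diffuse = k_d * np.cos(theta - theta_n)
    forescatter = (k_fs / np.cos(theta_r - theta_n)
                   * np.exp(-(theta + theta_r - 2.0 * theta_n) ** 2
                            / (8.0 * sigma ** 2)))
    return diffuse + forescatter

# The exponential term is maximal where theta + theta_r = 2 * theta_n,
# i.e. in the mirror configuration of the surface spot.
theta_n = np.deg2rad(10.0)
g_peak = intensity(2.0 * theta_n, theta_n)        # mirror configuration
g_off = intensity(2.0 * theta_n + 0.5, theta_n)   # away from the lobe
```

Scanning `theta` over the illumination elevations of a series and fitting Eq. (8) to the measured g_x(θ) would yield the per-pixel parameters k_d, k_fs, σ and θ_n used as features later in the chapter.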
n(x) = (cos φ(x) sin ϑ(x), sin φ(x) sin ϑ(x), cos ϑ(x))^T. (9)
5.1.3 Segmentation
Segmentation methods are often categorized into region-oriented and edge-oriented approaches.
Whereas the former merge regions by evaluating some kind of homogeneity criterion, the
latter rely on detecting the contours between homogeneous areas. In this section, we make
use of region-oriented approaches. The performance is demonstrated by examining the
surface of two different cutting inserts: a new part, and a worn one showing abrasion on
its top; see Fig. 4.
Fig. 4. Test surfaces: (left) new cutting insert; (right) worn cutting insert. The shown images
were recorded with diffuse illumination (just for visualization purposes).
the parameters of the reflection model was sufficient to achieve a satisfactory segmentation.
Further, other surface characteristics of interest could be detected by exploiting the remaining
surface model parameters.
Fig. 5. Pseudo-colored representation of the derivatives p(x) and q(x) of the surface normal:
(left) new cutting insert; (right) worn cutting insert. The worn area is clearly visible in the
rightmost image, as marked by a circle.
Fig. 6. Results of the region-based segmentation of the feature images p(x) and q(x): (left)
new cutting insert; (right) worn cutting insert. In the rightmost image, the worn regions were
correctly discerned from the intact background.
Fig. 7 shows a segmentation result based on the model parameters k_d(x), k_fs(x) and σ(x).
This result was obtained by thresholding the three parameter signals, and then combining
them by a logical conjunction. The right image in Fig. 7 compares the segmentation result
with a manual selection of the worn area.
Fig. 7. Result of the region-based segmentation of the defective cutting insert based on the
parameters of the reflection model: (left) segmentation result; (right) overlay of an original
image, a selection of the defective area by a human expert (green), and the segmentation
result (red). This result was achieved using a different raw dataset than for Figs. 5 and 6. For
this reason, the cutting inserts are depicted with both a different rotation angle and a different
magnification.
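The threshold-and-conjunction step described above can be sketched as follows. The threshold values and the comparison directions (which side of each threshold counts as "worn") are illustrative assumptions; the text does not specify them.

```python
import numpy as np

def segment(k_d, k_fs, sigma, t_kd=0.8, t_kfs=0.3, t_sigma=0.15):
    """Threshold the three reflection-model parameter images and combine
    the binary maps by a logical conjunction."""
    mask_kd = k_d < t_kd          # deviation in the diffuse term
    mask_kfs = k_fs > t_kfs       # deviation in the forescatter term
    mask_sigma = sigma > t_sigma  # increased facet-angle spread
    return mask_kd & mask_kfs & mask_sigma   # logical conjunction

# Tiny 2x2 parameter images with made-up values for demonstration.
k_d = np.array([[1.0, 0.5], [0.9, 0.4]])
k_fs = np.array([[0.1, 0.6], [0.2, 0.7]])
sigma = np.array([[0.1, 0.3], [0.1, 0.2]])
defect_map = segment(k_d, k_fs, sigma)
```

Only pixels where all three criteria agree are marked, which is what makes the conjunction robust against a single-parameter outlier.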
5.1.4 Discussion
The segmentation approach presented in this section utilizes significantly more information
than conventional methods relying on the processing of a single image. Consequently, it is
able to distinguish a larger number of surface characteristics. The region-based segmentation
methodology allows exploiting multiple clearly interpretable surface features, thus enabling a
discrimination of additional nuances. For this reason, a more reliable segmentation of surfaces
with arbitrary characteristics can be achieved.
Fig. 8 illustrates the fusion process flow (feature extraction as a DAI-FEO step followed by
segmentation as an FEI-DEO step, yielding a global DAI-DAO architecture of type DAI-DEO).
Basically, the global DAI-DEO architecture can be seen as the concatenation of two fusion
steps. First, features characterizing the 3D texture are extracted by fusing the irradiance
information distributed along the images of the series. These features, e.g., surface normal
and reflection parameters, are then combined in the segmentation step, which gives as output
a symbolic (decision-level) result.
where g^(1)(x), …, g^(B)(x) denote the individual images of the defined series.
and description of different approaches and implementations of ICA algorithms can be found
in [Hyvärinen & Oja (2000)].
The calculation of an independent component s_i is achieved by means of the inner product of
a row vector w_i^T of the ICA matrix W and an observed vector v:

s_i = ⟨w_i, v⟩ = Σ_{k=1}^{m} w_i^(k) · v^(k), (13)

where w_i^(k) and v^(k) are the k-th components of the vectors w_i and v respectively. This step is
called feature extraction, and the vectors w_i, which can be understood as filters, are called
feature detectors. In this sense, the s_i can be seen as features of v. However, in the literature,
the concept of feature is not uniquely defined: usually a_i is denoted as the feature, while s_i
corresponds to the amplitude of the feature in v. In the following sections, the concept of
feature will be used for s_i and a_i interchangeably.
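The feature extraction of Eq. (13) is a plain matrix-vector product. In the sketch below, the matrix W is a toy example made up for illustration; it is not the output of an actual ICA algorithm:

```python
import numpy as np

# Rows w_i of W act as filters ("feature detectors"); the observed vector v
# is a vectorized 4-pixel image patch. Values are illustrative only.
W = np.array([[0.5,  0.5, 0.5,  0.5],
              [0.5, -0.5, 0.5, -0.5]])
v = np.array([1.0, 2.0, 3.0, 4.0])

s = W @ v          # s_i = <w_i, v>, Eq. (13), for all i at once
```

In practice W would be learned from training image series (e.g. with a FastICA implementation), and each s_i would be the amplitude of the corresponding learned feature in the patch v.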
(Figure: overview of the method. Training phase: a set of image series for training → learning
of ICA features → sorting and selection of features. Detection phase: the image series of the
surface to be inspected → feature extraction → generation of the image series → thresholding
→ segmented defect image.)
Basically, Eq. (16) gives a measure of the similarity of the pixel intensity distributions between
the individual images a_i^(1,…,4)(x) of an image vector a_i(x). A low value of f(a_i(x)) denotes a high
similarity. The image series of the basis a_i(x) are then sorted by this measure. As defects
introduce local variations of the intensity distribution between the images of a series, the
lower the value of f(a_i(x)), the better a_i(x) describes the background.
Whole images are simply obtained by generating contiguous image patches and then joining
them together. The segmented defect image is obtained following the thresholding scheme
shown in Fig. 10. This scheme can be explained as follows:
(Fig. 10: thresholding scheme. The absolute differences between the original and the generated
images are thresholded and then summed across the series to yield the segmented defect image.)
• When the absolute value of the difference between an original image g^(1,…,4)(x) and the
generated one g_gen^(1,…,4)(x) exceeds a threshold Thresh_a, then these areas are considered
as possible defects.
• When possible defective zones occur at the same position in at least Thresh_b different
individual images of the series, then this area is considered as defective.
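The two-threshold scheme above can be sketched directly in array operations. The tiny four-image series below is synthetic illustrative data; the threshold values follow those quoted for Fig. 12 (Thresh_a = 30, Thresh_b = 2):

```python
import numpy as np

def segment_defects(originals, generated, thresh_a=30.0, thresh_b=2):
    """originals, generated: arrays of shape (B, M, N)."""
    candidates = np.abs(originals - generated) > thresh_a  # possible defects
    votes = candidates.sum(axis=0)          # per-pixel count over the series
    return votes >= thresh_b                # defective areas

generated = np.full((4, 5, 5), 100.0)       # background generated per image
originals = generated.copy()
originals[:3, 2, 2] += 50.0   # deviation visible in 3 of the 4 images
originals[0, 0, 0] += 50.0    # outlier present in a single image only
defects = segment_defects(originals, generated)
```

The vote across the series is what suppresses the single-image outlier while keeping the deviation that recurs at the same position.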
Fig. 11. Image series of a tested surface. (a)-(d): Original images. (e)-(h): Generated texture
images.
(a) Possible defects (Thresha = 30). (b) Segmented defect image (Threshb = 2).
Fig. 12. Possible defective areas and image of segmented defects of a varnished wood surface.
5.2.6 Discussion
A method for defect detection on textured surfaces was presented. The method relies on
the fusion of an image series recorded with variable illumination, which provides a better
visualization of topographical defects than a single image of a surface. The proposed method
can be considered as filter-based: a filter bank (a set of feature detectors) is learned by applying
ICA to a set of training surfaces. The learned filters allow a separation of the texture from the
(Figure: fusion architecture of the defect detection method. Image series → feature extraction
and sorting (DAI-FEO) → background generation and subtraction (FEI-FEO) → sum and
threshold (FEI-DEO); the global architecture is DAI-DEO.)
A common approach to construct an invariant feature from g(x) is integrating over the
transformation space P:

f̃(g(x)) = ∫_P f(t(p){g(x)}) dp. (18)

Equation (18) is known as the Haar integral. The function f := f(s), which is parametrized
by a vector s, is an arbitrary, local kernel function, whose objective is to extract relevant
information from the pattern. By varying the kernel function parameters defined in s, different
features can be obtained in order to achieve a better and more accurate description of the
pattern.
In this approach, we aim at extracting invariant features with respect to the 2D Euclidean
motion, which involves rotation and translation in R². Therefore, the parameter vector of
the transformation function is given as follows: p = (τ_x, τ_y, ω)^T, where τ_x and τ_y denote the
translation parameters in x and y direction, and ω the rotation parameter. In order to
guarantee the convergence of the integral, the translation is considered cyclical [Schulz-Mirbach
(1995)]. For this specific group, Eq. (18) can be rewritten as follows:

f̃_l(g(x)) = ∫_P f_l(t(τ_x, τ_y, ω){g(x)}) dτ_x dτ_y dω, (19)

where f̃_l(g(x)) denotes the invariant feature obtained with the specific kernel function f_l :=
f(s_l) and l ∈ {1, …, L}. For the discrete case, the integration can be replaced by summations
as:
f̃_l(g(x)) = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} Σ_{k=0}^{K−1} f_l(t_{ijk}{g_{mn}}). (20)

Here, t_{ijk} and g_{mn} are the discrete versions of the transformation and the gray-scale image
respectively, K = 360°/Δω, and M × N denotes the image size.
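The discrete sums of Eq. (20) can be sketched as nested loops over cyclic shifts and rotations. In this toy version the rotation step is fixed to 90° (K = 4), so that np.rot90 yields exact rotations without interpolation, and the two-pixel product kernel is an illustrative stand-in for a kernel function f_l; both are assumptions for the example.

```python
import numpy as np

def kernel(img):
    """Toy local kernel f_l evaluated at the origin."""
    return img[0, 0] * img[0, 1]

def invariant(img):
    total = 0.0
    for k in range(4):                        # sum over rotations (K = 4)
        rot = np.rot90(img, k)
        for i in range(rot.shape[0]):         # cyclic translation in x
            for j in range(rot.shape[1]):     # cyclic translation in y
                shifted = np.roll(np.roll(rot, -i, axis=0), -j, axis=1)
                total += kernel(shifted)
    return total

img = np.array([[1.0, 2.0], [3.0, 4.0]])
f1 = invariant(img)                        # unchanged by cyclic shifts ...
f2 = invariant(np.roll(img, 1, axis=0))
f3 = invariant(np.rot90(img))              # ... and by 90-degree rotations
```

Because the sums run over the whole (discrete) transformation group, the same multiset of kernel values is accumulated for the original and the transformed image, which is exactly the invariance property the text relies on.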
where the vector (m′, n′)^T is the translated and rotated vector (m, n)^T:

(m′, n′)^T = R(kΔω) · (m, n)^T − (i, j)^T, with R(kΔω) = [cos(kΔω), sin(kΔω); −sin(kΔω), cos(kΔω)]. (24)
The kernel function f_lijk extracts the information of the image series, considering the latter as
a whole. That is, the calculation of the kernel function for each value of i, j and k implies a first
fusion process. Introducing f_lijk(S) in Eq. (22), the invariant feature f̃_l(S) can be expressed
as:

f̃_l(S) = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} Σ_{k=0}^{K−1} f_lijk(S). (27)
The summations over i, j and k are necessary to achieve invariance against 2D rotation and
translation. However, as a consequence of the summations, much of the information extracted
by f_lijk gets lost. Therefore, the resulting feature f̃_l(S) presents a low capability to discriminate
between classes. For the application of interest, a high discriminability is especially important,
because different kinds of varnish defects can be very similar. For this reason, the integration
method to achieve invariance is used only for the rotation transformation, as explained in the
following paragraphs.
An alternative to this summation is the utilization of histograms, which are inherently invariant
to translation. This option has the advantage of avoiding the loss of information resulting
from the summation over i and j, so that the generated features have a better capability to
represent different classes [Siggelkow & Burkhardt (1998)]. Then, considering all values of i and
j, a fuzzy histogram H_c^l(S) is constructed from the rotation invariants f̃_ij^l(S) [Schael (2005)],
where c = 1, …, C denotes the histogram bins. Finally, the resulting histogram represents our
feature, invariant against 2D Euclidean motion:
5.3.3 Results
The presented method is used to detect and classify defects on varnished wood surfaces.
Given the noisy background due to the substrate texture and the similarity between defect
classes, the extracted feature should have good discriminability characteristics
[Pérez Grassi et al. (2006)]. A first step in this direction was made by using histograms,
instead of integration, to achieve translation invariance. Another key aspect is the proper
selection of the kernel function f_lijk. For the results presented below, the kernel function is a
vectorial function f_lijk(r_{1,l}, r_{2,l}, α_l, β_l, a_l, Δϑ_l), whose q-th element is given by:
f_lijk^q(S) = (1/B) · |g̃(u_l^q) − g̃(v_l^q)|. (31)

Here, the vectors u_l^q and v_l^q are defined as:

u_l^q = ((r_{1,l} cos(α_l + q Δϑ_l), −r_{1,l} sin(α_l + q Δϑ_l))^T; 0), v_l^q = ((r_{2,l} cos(β_l + q Δϑ_l), −r_{2,l} sin(β_l + q Δϑ_l))^T; a_l), (32)
where 1 ≤ q ≤ Q_l and Q_l = 360°/Δϑ_l. According to Eqs. (31) and (32), two circular
neighborhoods with radii r_{1,l} and r_{2,l} are defined in the images b = 0 and b = a_l respectively. Both
circumferences are sampled with a frequency given by the angle Δϑ_l. This sampling results
in Q_l points per neighborhood, which are addressed through the vectors u_l^q and v_l^q
correspondingly. Each element f_lijk^q(S) of the kernel function is obtained by taking the absolute
value of the difference between the intensities at the positions u_l^q and v_l^q with the same q. In
Fig. 14, the kernel function for a given group of parameters is illustrated. In this figure, the
pairs of points u_l^q and v_l^q involved in the calculation of each element f_lijk^q of f_lijk are
linked by segments.
Fig. 14. Kernel function f_lijk for an image series with B = 4 (Δϕ = 90°). Function parameters:
a_l = 2, r_{1,l} = 0.5 r_{2,l}, α_l = 45°, β_l = 90° and Δϑ_l = 180° (Q_l = 2). The lines between the
points represent the absolute value of the difference (in the figure, the subindex l has been
suppressed from the function parameters for clarity).
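The point-pair kernel of Eqs. (31) and (32) can be sketched as follows. Bilinear interpolation is omitted here, positions are rounded to the nearest pixel, and the neighborhood center (cx, cy) is an explicit argument; all three are simplifying assumptions for the example.

```python
import numpy as np

def kernel_elements(series, r1, r2, alpha, beta, a, dtheta_deg, cx, cy):
    """Elements f^q of Eqs. (31)-(32): Q point pairs on two circles with
    radii r1 and r2, read from images b = 0 and b = a of the series."""
    B = series.shape[0]
    Q = int(round(360.0 / dtheta_deg))
    out = []
    for q in range(1, Q + 1):
        ang1 = np.deg2rad(alpha + q * dtheta_deg)
        ang2 = np.deg2rad(beta + q * dtheta_deg)
        ux = cx + int(round(r1 * np.cos(ang1)))  # point on circle 1, image 0
        uy = cy - int(round(r1 * np.sin(ang1)))
        vx = cx + int(round(r2 * np.cos(ang2)))  # point on circle 2, image a
        vy = cy - int(round(r2 * np.sin(ang2)))
        out.append(abs(series[0, uy, ux] - series[a, vy, vx]) / B)
    return np.array(out)

# Synthetic series: image b = 0 is bright, the others are dark.
series = np.zeros((4, 9, 9))
series[0] = 1.0
f = kernel_elements(series, r1=2, r2=3, alpha=45, beta=90, a=2,
                    dtheta_deg=180, cx=4, cy=4)
```

Summing such kernel evaluations over all positions (i, j) and rotations k, as in Eq. (27), then yields the rotation-invariant features described above.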
Using the defined kernel function, a vectorial feature f̃_ij^l(S) invariant against rotation is
obtained using Eq. (28), where f̃_ij^l(S) = (f̃_ij^{l1}(S), …, f̃_ij^{lQ}(S)). Then, a fuzzy
histogram H_c^{lq}(S) is constructed from each element f̃_ij^{lq}(S) of f̃_ij^l(S). This results in
a sequence of Q histograms H_c^{lq}, which represents our final invariant feature:

f̃_l(S) = (H_c^{l1}(S), …, H_c^{lQ}(S)). (33)
The performance of the resulting feature f̃_l(S) is tested in the classification of different varnish
defects on diverse wood textures. The classification is performed by a Support Vector Machine
(SVM), and the features f̃_l(S) are extracted locally from the image series by analyzing small
image windows (32 × 32 pixels). Fig. 15 shows some classification results for five different
classes: no defect, bubble, ampulla, fissure and crater. These results were generated using
image series consisting of eight images (B = 8) and ten different parameter vectors of the
kernel function (L = 10).
5.3.4 Discussion
The presented method extracts features invariant against rotation and translation from
illumination series, which were generated by systematically varying the azimuth of the
illumination source. Taking the surface characteristics and the image acquisition process into
account, a kernel function has been defined which allows the extraction of relevant
information. For the generation of the invariant features, two methods have been applied: the
invariance against rotation has been achieved by integration over the transformation space,
while the invariance against translation was obtained by building fuzzy histograms. The
classification of the obtained features is performed by an SVM. The obtained features have
been successfully used in the detection and classification of finishing defects on wood
surfaces.
In Fig. 16 the fusion architecture is shown schematically. The information processing can be
represented as a concatenation of different fusion blocks. The first and second processing steps
perform the invariant feature extraction from the data. Finally, the SVM classifier generates a
symbolic output (decision level data): the classes of the detected defects.
(Diagram: image series → rotation invariant (DAI-FEO) → rotation and translation invariant
(FEI-FEO) → SVM (FEI-DEO) → defect class; the global architecture is DAI-DEO.)
Fig. 16. Fusion architecture scheme for the method based on invariant features.
6. Conclusions
The illumination configuration chosen for any image acquisition stage plays a crucial role
for the success of any machine vision task. The proposed multiple image strategy, based
on the acquisition of images under a variable directional light source, results in a suitable
framework for defect detection and surface assessment problems. This is mainly due to the
enhanced local contrast achieved in the individual images. Another important fact is that
the cooperative information distributed along the image series provides a better and more
complete description of a surface topography. Three methods encompassing a wide field of
signal processing and information fusion strategies have been presented. The potentials and
benefits of using multi-image analysis methods and their versatility have been demonstrated
with a variety of nontrivial and demanding machine vision tasks, including the inspection of
varnished wood boards and machined metal pieces such as cutting inserts.
The fusion of images recorded with variable illumination direction has its roots in the well-
known photometric stereo technique developed by [Woodham (1980)]. In its original for-
mulation, the topography of Lambertian surfaces can be reconstructed. Since then, many au-
thors have extended its applicability to surfaces with different reflection characteristics. In this
chapter, a novel segmentation approach that estimates not only the surface normal direction
but also reflectance properties was presented. As shown, these properties can be efficiently
used as features for the segmentation step. This segmentation approach makes use of more
information than conventional methods relying on single images, thus enabling a discrimi-
nation of additional surface properties. It was also shown, that, for some specific automated
visual inspection problems, an explicit reconstruction of the surface profile is neither necessary
nor efficient. In this sense, two novel problem-specific methods for detection of topographic
defects were presented: one of them filter-based and the other relying on invariant statistical
features.
Fusion of images recorded with variable illumination remains an open research area. Within
this challenging field, some new contributions have been presented in this chapter. On the
one hand, two application-oriented methods were proposed. On the other hand, a general
segmentation method was presented, which can be seen as an extension of the well established
photometric stereo technique.
Fusion of Images Recorded with Variable Illumination 113
7. References
Barsky, S. & Petrou, M. (2007). Surface texture using photometric stereo data: classification and
direction of illumination detection, Journal of Mathematical Imaging and Vision 29: 185–
204.
Beyerer, J. & Puente León, F. (2005). Bildoptimierung durch kontrolliertes aktives Sehen und
Bildfusion, Automatisierungstechnik 53(10): 493–502.
Chantler, M. J., Schmidt, M., Petrou, M. & McGunnigle, G. (2002). The effect of illuminant ro-
tation on texture filters: Lissajous’s ellipses, Vol. 2352 of Proceedings of the 7th European
Conference on Computer Vision-Part III, London, UK. Springer-Verlag, pp. 289–303.
Dasarathy, B. V. (1997). Sensor fusion potential exploitation-innovative architectures and il-
lustrative applications, Proceedings of the IEEE 85(1): 24–38.
Drbohlav, O. & Chantler, M. J. (2005). Illumination-invariant texture classification using single
training images, Texture 2005: Proceedings of the 4th International Workshop on
Texture Analysis and Synthesis, Beijing, China, pp. 31–36.
Gonzalez, R. C. & Woods, R. E. (2002). Digital image processing, Prentice Hall, Englewood Cliffs,
NJ.
Haralick, R. M. & Shapiro, L. G. (1992). Computer and Robot Vision, Vol. II, Reading, MA:
Addison-Wesley.
Heizmann, M. & Beyerer, J. (2005). Sampling the parameter domain of image series, Image
Processing: Algorithms and Systems IV, San José, CA, USA, pp. 23–33.
Ho, Y.-X., Landy, M. & Maloney, L. (2006). How direction of illumination affects visually
perceived surface roughness, Journal of Vision 6: 634–648.
Hyvärinen, A. & Oja, E. (2000). Independent component analysis: algorithms and applica-
tions, Neural Netw. 13(4-5): 411–430.
Lambert, G. & Bock, F. (1997). Wavelet methods for texture defect detection, ICIP ’97: Pro-
ceedings of the 1997 International Conference on Image Processing (ICIP ’97) 3-Volume Set-
Volume 3, IEEE Computer Society, Washington, DC, USA, p. 201.
Lindner, C. (2009). Segmentierung von Oberflächen mittels variabler Beleuchtung, PhD thesis, Tech-
nische Universität München.
Lindner, C. & Puente León, F. (2006). Segmentierung strukturierter Oberflächen mittels vari-
abler Beleuchtung, Technisches Messen 73(4): 200–207.
McGunnigle, G. (1998). The classification of textured surfaces under varying illuminant direction,
PhD thesis, Heriot-Watt University.
McGunnigle, G. & Chantler, M. J. (2000). Rough surface classification using point statistics
from photometric stereo, Pattern Recognition Letters 21: 593–604.
Nachtigall, L. & Puente León, F. (2009). Merkmalsextraktion aus Bildserien mittels der Independent Component Analyse, in G. Goch (ed.), XXIII. Messtechnisches Symposium des
Arbeitskreises der Hochschullehrer für Messtechnik e.V. (AHMT), Shaker Verlag, Aachen,
pp. 227–239.
Ojala, T., Pietikäinen, M. & Mäenpää, T. (2002). Multiresolution gray-scale and rotation in-
variant texture classification with local binary patterns, IEEE Transactions on Pattern
Analysis and Machine Intelligence 24(7): 971–987.
Penirschke, A., Chantler, M. J. & Petrou, M. (2002). Illuminant rotation invariant classifica-
tion of 3D surface textures using Lissajous’s ellipses, 2nd International Workshop on
Texture Analysis and Synthesis, Copenhagen, pp. 103–107.
Pérez Grassi, A., Abián Pérez, M. A. & Puente León, F. (2008). Illumination and model-based
detection of finishing defects, Reports on Distributed Measurement Systems, Shaker
Verlag, Aachen, pp. 31–51.
Pérez Grassi, A., Abián Pérez, M. A., Puente León, F. & Pérez Campos, M. R. (2006). Detection
of circular defects on varnished or painted surfaces by image fusion, Proceedings of
the IEEE International Conference on Multisensor Fusion and Integration for Intelligent
Systems .
Puente León, F. (1997). Enhanced imaging by fusion of illumination series, in O. Loffeld (ed.),
Sensors, Sensor Systems, and Sensor Data Processing, Vol. 3100 of Proceedings of SPIE,
SPIE, pp. 297–308.
Puente León, F. (2001). Model-based inspection of shot peened surfaces using fusion tech-
niques, Vol. 4189 of Proceedings of SPIE on Machine Vision and Three-Dimensional Imag-
ing Systems for Inspection and Metrology, SPIE, pp. 41–52.
Puente León, F. (2002). Komplementäre Bildfusion zur Inspektion technischer Oberflächen,
Technisches Messen 69(4): 161–168.
Puente León, F. (2006). Automated comparison of firearm bullets, Forensic Science International
156(1): 40–50.
Schael, M. (2005). Methoden zur Konstruktion invarianter Merkmale für die Texturanalyse, PhD
thesis, Albert-Ludwigs-Universität Freiburg.
Schulz-Mirbach, H. (1995). Anwendung von Invarianzprinzipien zur Merkmalgewinnung in der
Mustererkennung, PhD thesis, Technische Universität Hamburg-Harburg.
Siggelkow, S. & Burkhardt, H. (1998). Invariant feature histograms for texture classification,
Proceedings of the 1998 Joint Conference on Information Sciences (JCIS’98) .
Torrance, K. E. & Sparrow, E. M. (1967). Theory for off-specular reflection from roughened
surfaces, J. of the Optical Society of America 57(9): 1105–1114.
Tsai, D.-M., Tseng, Y.-H., Chao, S.-M. & Yen, C.-H. (2006). Independent component analysis
based filter design for defect detection in low-contrast textured images, ICPR ’06:
Proceedings of the 18th International Conference on Pattern Recognition, IEEE Computer
Society, Washington, DC, USA, pp. 231–234.
Tsai, D. M. & Wu, S. K. (2000). Automated surface inspection using Gabor filters, The Interna-
tional Journal of Advanced Manufacturing Technology 16(7): 474–482.
Woodham, R. J. (1980). Photometric method for determining surface orientation from multiple
images, Optical Engineering 19(1): 139–144.
Xie, X. (2008). A review of recent advances in surface defect detection using texture analysis
techniques, Electronic Letters on Computer Vision and Image Analysis 7(3): 1–22.
Camera and laser robust integration in engineering and architecture applications
1. Introduction
1.1 Motivation
The 3D modelling of objects and complex scenes constitutes a field of multi-disciplinary
research full of challenges and difficulties, ranging from the accuracy and reliability of the
geometry and the radiometric quality of the results to the portability and cost of the products,
without forgetting the aim of automating the whole procedure. To this end, a wide variety
of passive and active sensors is available, among which digital cameras and laser scanners
play the main role. Even though these two types of sensors can work separately, the best
results are attained when they are merged. The following table (Table 1) gives an overview
of the advantages and limitations of each technology.
The comparison between the laser scanner and the digital camera (Table 1) stresses the
incomplete character of the information derived from a single sensor. Therefore, we reach
the conclusion that an integration of data sources and sensors must be achieved to improve
the quality of procedures and results. Nevertheless, this sensor fusion poses a wide range of
difficulties, derived not only from the different nature of the data (2D images and 3D
scanner point clouds) but also from the different processing techniques related to the
properties of each sensor. In this sense, an original sensor fusion approach is proposed and
applied to architecture and archaeology. This approach aims at achieving a high level of
automation while providing high-quality results.
- On-site integration, which relies on a "physical" fusion of both sensors. This approach
consists of a specific hardware structure that is calibrated beforehand. This solution
provides higher automation and readiness in the data acquisition procedures, but also a
higher dependency and a lack of flexibility in both the data acquisition and its processing.
Examples of this kind of fusion are the commercial solutions of Trimble and Leica, both
equipped with digital cameras housed inside the device. These cameras exhibit a very poor
resolution (<1 Mp). To gain access to higher-quality cameras, other manufacturers provide
an exterior, calibrated frame to which a reflex camera can be attached. Faro Photon, Riegl
LMS-Z620, Leica HDS6100 and Optech Ilris-3D are some of the laser systems that have
incorporated this external sensor.
Even though these approaches may suggest that sensor fusion is a straightforward
question, actual practice is rather different, since the photo shooting time must coincide
with the scanning time; thus the illumination conditions, as well as other conditions
regarding the position of the camera or the environment, may be far from the desired ones.
- Office integration, which consists of achieving the sensor fusion in the laboratory, as the
result of a processing procedure. This approach permits more flexibility in the data
acquisition, since it requires neither a previously fixed and rigid framework nor a
predetermined time of exposure. Nevertheless, this gain in flexibility poses the challenge of
developing an automatic or semi-automatic procedure to "tune" two data sources with
different constructive fundamentals. According to Kong et al. (2007), sensor fusion can be
divided into three categories: the sensorial level (low level), the feature level (intermediate
level) and the decision level (high level). At the sensorial level, raw data are acquired from
diverse sensors. This step is already solved in the on-site integration case, but it is really
difficult to afford when the sensors are not calibrated to each other. In this sense, the
question is to compute the rigid transformation (rotation and translation) that renders the
relationship between both sensors, besides the camera model (camera calibration). The
feature level merges the extraction and matching of several feature types. Feature extraction
deals with elements such as corners, interest points, borders and lines, which are extracted,
labeled, located and matched through different algorithms. The decision level implies
taking advantage of hybrid products derived from the processed data itself combined with
expert decision-making.
Regarding the first two levels (sensorial and feature), several authors have addressed the
fusion of the digital camera and the laser scanner through different approaches linked to
different working environments. Levoy et al. (2000), in their project "Digital Michelangelo",
carry out a camera pre-calibration for integration with the laser scanner without any user
interaction. In a similar context, Rocchini et al. (1999) obtain a fusion between the image
and the laser model by means of an interactive selection of corresponding points.
Nevertheless, both approaches are only applied to small objects such as sculptures and
statues. To deal with the more complicated situations arising from complex scenes, Stamos
and Allen (2001) present an automatic fusion procedure between the laser model and the
camera image. In this case, 3D lines are extracted by means of a segmentation of the point
clouds. After this, the 3D lines are matched with the borders extracted from the images.
Some geometric constraints that are common in urban scenes, such as orthogonality and
parallelism, are considered. Consequently, this algorithm only works well in urban scenes
where these conditions are met. In addition, the user must set different thresholds in the
segmentation process. All the above methodologies require previous knowledge of the
interior calibration parameters. To minimize this drawback, Aguilera and Lahoz (2006)
exploit single-view modelling to achieve an automatic fusion between a laser scanner and
an uncalibrated digital camera. In particular, the fusion of the two sensors is solved
automatically through the search of 2D and 3D correspondences, supported by two spatial
invariants: two distances and an angle. Nevertheless, some assumptions, such as the use of
special targets and the presence of certain geometric constraints in the image (vanishing
points), are required to undertake the problem. More recently, Gonzalez-Aguilera et al.
(2009) have developed an automatic method to merge the digital image and the laser model
by means of correspondences between the range image (laser) and the camera image. The
main contribution of this approach resides in the use of a level hierarchy (pyramid) that
takes advantage of robust estimators, as well as of geometric constraints that ensure a
higher accuracy and reliability. The data are processed and tested by means of the software
USALign.
Although there are many methodologies that try to advance the fusion of both sensors by
taking advantage of the sensorial and feature levels, the radiometric and spectral properties
of the sensors have not received enough attention. This issue is critical when the matching
concerns images from different parts of the electromagnetic spectrum: visible (digital
camera), near infrared (laser scanner) and medium/far infrared (thermal camera), and
when the aspiration is to automate the whole procedure. Owing to the different ways in
which the pixel is formed, some methodologies developed for the visible image processing
context may work inappropriately or not work at all.
On this basis, this chapter on sensor fusion presents a method that has been developed and
tested for the fusion of the laser scanner, the digital camera and the thermal camera. The
structure of the chapter is as follows. In the second part, we address the generalities related
to data acquisition and pre-processing for the laser scanner, the digital camera and the
thermal camera. In the third part, we present the specific methodology, based on a
semi-automatic procedure supported by techniques of close-range photogrammetry and
computer vision. In the fourth part, a robust registration of sensors based on a spatial
resection is presented. In the fifth part, we show the experimental results derived from the
sensor fusion. A final part is devoted to the main conclusions and the expected future
developments.
2. Pre-processing of data
In this section we describe the treatment of the input data in order to prepare them for the
established workflow.
average scanning direction. When this angle is large, the automatic fusion
procedures will become difficult to undertake.
The following picture (Fig. 1) depicts the three questions mentioned above:
Fig. 1. Factors that influence the data acquisition with the laser scanner and the
digital/thermal camera
Through a careful planning of the acquisition framework, taking into account the issues
referred to above, some rules and basic principles should be stated (Mancera-Taboada et al.,
2009). These are particularised for the case studies analyzed in section 5, focussing on
objects related to the architectural and archaeological field. In all of them the input data are
the following:
The point cloud is the input data in the case of the laser scanner and exhibits a 3D
character with specific metric and radiometric properties. In particular, the cartesian
coordinates XYZ associated to each point are accompanied by an intensity value
associated to the energy of the return of each laser beam. The image that is formed
from the point cloud, the range image, has radiometric properties derived from the
wavelength used, that is, the near or medium infrared. This image depends on
factors such as the object material, the distance between the laser scanner and the
object, the incidence angle between the scanner rays and the surface normal, and
the illumination of the scene. Also, in some cases, this value can be extended with a
visible RGB colour value associated to each point.
The visible digital image is the input data coming from the digital camera and
presents a 2D character with specific metric and radiometric properties. Firstly, its
geometric resolution must be in agreement with the object size and with the
scanning resolution. Ideally, the number of elements in the point cloud would equal
the number of pixels in the image; in this way, a perfect correspondence could be
achieved between the image and the point cloud, and the maximum performance
could be obtained from both data sets. In addition, given the field of view of each
sensor, we should seek to cover the whole object with a single image. Where this
cannot be achieved, we should rely on an image mosaic in which each image
(previously processed) is registered individually. On the other hand, from a
radiometric point of view, the images obtained from the digital camera should
present a homogeneous illumination, avoiding, as far as possible, high contrasts
and any backlighting.
The thermal digital image is the input data coming from the thermal camera and
presents a 2D character with specific metric and radiometric properties. From a
geometric point of view, thermal images have a low resolution and a high radial
lens distortion. From a radiometric point of view, the distribution of values does
not depend, as it does in the visible part of the electromagnetic spectrum, on the
intensity gradient produced by the energy reflected by the object, but on the
thermal gradient of the object itself as well as on the object emissivity. This
represents a drawback in the fusion process.
To obtain the photo coordinates (xA, yA) of a three-dimensional point (XA, YA, ZA), the values
of the exterior orientation parameters (XS, YS, ZS, ω, φ, κ) must have been computed. These are
the target unknowns we address when we undertake the sensor registration procedure. As
this is a question that must be solved through an iterative process, it becomes necessary to
provide the system of equations (1) with a set of initial solutions that will stand for the
exterior orientation of the virtual camera. The registration procedure will lead to a set of
corrections in such a way that the final result will be the desired position and attitude.
In this process it is necessary to define a focal length to perform the projection onto the
range image. To achieve the best results and to preserve the initial configuration, the same
focal length of the camera image will be chosen.
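The projection described here follows the standard collinearity model of photogrammetry. The sketch below is a generic formulation of that model, not the chapter's own code; in particular, the omega-phi-kappa rotation convention and the sign conventions are assumptions.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """Rotation matrix from the three attitude angles (standard
    omega-phi-kappa photogrammetric convention; an assumption here,
    since the chapter does not spell out the parameterization)."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(point, camera_center, omega, phi, kappa, focal):
    """Collinearity equations: map an object point (XA, YA, ZA) to
    photo coordinates (xA, yA) for a camera at (XS, YS, ZS)."""
    R = rotation_matrix(omega, phi, kappa)
    d = R.T @ (np.asarray(point, float) - np.asarray(camera_center, float))
    xA = -focal * d[0] / d[2]
    yA = -focal * d[1] / d[2]
    return xA, yA
```

In a registration procedure, these equations would be linearized around the initial exterior orientation and the corrections estimated iteratively, as the text describes.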
Z_k = (Σ_{i=1}^{n} w_i Z_i) / (Σ_{i=1}^{n} w_i)   (2)
where Z_k is the digital level of the empty pixel, Z_i are the digital values of the neighbouring
pixels, w_i is the weighting factor and n is the number of points involved in the
interpolation. Specifically, this weighting factor is defined as the inverse of the squared
distance between the pixel k and the i-th neighbouring pixel:
w_i = 1 / d_{k,i}^2   (3)
The neighbouring area is defined as a standard mask of 3x3 pixels, although this size may
change depending on the image conditions. In this way, we ensure a correct interpolation
within the empty pixels of the image according to its specific circumstances.
In (2), only one radiometric channel is addressed, either because the original point cloud
data have only one channel (as is common for intensity data) or because these data have
been transformed from an RGB distribution of the original camera attached to the laser
scanner to a single luminance value. Finally, together with the creation of the range image, a
matrix of equal size is generated which stores the object coordinates corresponding to the
point cloud. This matrix will be used in the sensor fusion procedure.
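The interpolation of equations (2) and (3) can be sketched as follows; the function and array names are illustrative, not the authors' implementation:

```python
import numpy as np

def fill_empty_pixels(img, mask, radius=1):
    """Fill empty range-image pixels (mask == True) with the
    inverse-distance-squared weighted mean of their neighbours,
    following equations (2) and (3). radius=1 gives the standard
    3x3 mask mentioned in the text."""
    out = img.copy()
    rows, cols = img.shape
    for r, c in zip(*np.nonzero(mask)):
        num, den = 0.0, 0.0
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                if mask[i, j] or (i == r and j == c):
                    continue  # skip other empty pixels and the centre itself
                d2 = (i - r) ** 2 + (j - c) ** 2   # squared distance d_{k,i}^2
                num += img[i, j] / d2               # w_i * Z_i with w_i = 1/d^2
                den += 1.0 / d2
        if den > 0:
            out[r, c] = num / den
    return out
```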
Fig. 4. Real photo (left), ideal photo (centre) and displacement due to the radial distortion
(right)
Note in Fig. 4 that, if the camera optical elements were free from radial distortion effects,
the relationship between the image (2D) and the object (3D) would be linear; but such
distortions are rather more complicated than this linear model, and so the transformation
between image and object needs to account for this question.
On the other hand, the modelling of the radial distortion is far from simple because, first of
all, there is little agreement in the scientific community on a standard model to render this
phenomenon. This leads to difficulties in the comparison and interpretation of the different
models, so it is not easy to assess the accuracy of the methodology. As a result, empirical
approaches are commonly used (Sánchez et al., 2004).
In our case, the radial distortion has been estimated by means of the so-called Gaussian
model as proposed by Brown (1971). This model represents a "raw" determination of the
radial distortion distribution and does not account for any constraint to render the
correlation between the focal length and such distribution (4):
dr = k_1 r'^3 + k_2 r'^5   (4)
For the majority of lenses and applications, this polynomial can be reduced to the first term
without a significant loss in accuracy.
In particular, the parameters k_1 and k_2 of the Gaussian model have been estimated by
means of the software sv3DVision (Aguilera and Lahoz, 2006), which enables estimating
these parameters from a single image. To achieve this, it takes advantage of diverse
geometric constraints such as straight lines and vanishing points. In those case studies, such
as archaeological ones, in which these elements are scarce, the radial distortion parameters
have been computed with the aid of the open-source software Fauccal (Douskos et al.,
2009).
Finally, it is important to state that the radial distortion parameters require constant
updating, especially for consumer-grade compact cameras, since a lack of robustness and
stability in their design affects the stability of the focal length. A detailed analysis of this
question is developed by Sanz (2009) in his Ph.D. thesis. In particular, Sanz analyses the
following factors of instability in the modelling of radial lens distortion: switching the
camera on and off, use of zoom and focus, and setting of the diaphragm aperture.
Once the camera calibration parameters are known, they must be applied to correct the
radial distortion effects. Nevertheless, the direct application of these parameters may
produce some voids in the final image, since pixel positions are defined as integer numbers
(Fig. 5); that is, neighbouring pixels in the original image may not maintain this condition
after applying the distortion correction.
Fig. 5. Left: original image with radial distortion. Right: image without radial distortion,
corrected by the direct method.
To avoid this situation without resorting to an interpolation technique, which would
increase the computing time considerably, an indirect method based on Newton-Raphson
(Süli, 2003) has been adapted in order to correct the images of radial lens distortion. In
particular, the corrected image matrix is considered as the input data, so that for every
target position in that matrix (xu,yu), the corresponding position in the original image
(xd,yd) is computed.
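A minimal sketch of this indirect step, assuming the Gaussian model of equation (4) with the corrected radius given by r = r' + dr (the sign convention and the stopping tolerance are assumptions, since the chapter does not give its exact update rule):

```python
def distorted_radius(r_u, k1, k2, tol=1e-10, max_iter=50):
    """Given an undistorted (corrected) radius r_u, recover the radius r'
    in the original distorted image by Newton-Raphson iteration on
    f(r') = r' + k1*r'**3 + k2*r'**5 - r_u."""
    r = r_u  # the undistorted radius is a good starting guess
    for _ in range(max_iter):
        f = r + k1 * r**3 + k2 * r**5 - r_u
        df = 1.0 + 3.0 * k1 * r**2 + 5.0 * k2 * r**4
        step = f / df
        r -= step
        if abs(step) < tol:
            break
    return r
```

Applied per pixel, this yields (xd, yd) for each (xu, yu) by scaling the coordinates with the ratio r'/r_u, so the corrected image is filled without voids.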
Treatment of the background of the visible image. Usually, when we acquire an image, some
additional information from the scene background, unrelated to the object of study, is
recorded. In contrast, a main feature of the range image is that it contains no information at
all corresponding to the background (by default, this information is white), because it has
been defined from the distances to the object. This disagreement has an impact on the
matching quality between the elements placed at the object edges, since their
neighbourhood and the radiometric parameters related to them are modified by the scene
background.
Of all the elements that may appear in the background of an image shot outdoors (which is
the case of architectural facades), the most common is the sky. This situation cannot be
extrapolated to elements in interior scenes or to situations in which the illumination
conditions are uniform (cloudy days), for which this background correction would not be
necessary. Nevertheless, for the remaining cases in which the sky appears clear, the
background radiometry will be close to blue and, consequently, it becomes necessary to
proceed to its rejection. This is achieved thanks to its particular radiometric qualities
(Fig. 6).
Fig. 6. Before (Left) and after (Right) of rejecting the sky from camera image.
The easiest and most automatic way is to use the blue channel of the original image, that is,
to obtain an image whose digital levels are the third coordinate in the RGB space, and to
filter it depending on this value. The sky radiometry exhibits the largest values of the blue
component within the image (close to a digital level of 1, in a range from 0 to 1), far above
the blue channel values that building facades may present (whose digital level usually
spans from 0.4 to 0.8). Thus, we just have to implement a conditional instruction by which
all pixels whose blue channel value is higher than a certain threshold (controlled by the
user) are substituted by white.
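That conditional substitution can be sketched as follows; the 0.9 default threshold is illustrative, since the chapter leaves its value to the user:

```python
import numpy as np

def remove_sky(rgb, threshold=0.9):
    """Replace pixels whose blue channel exceeds a user-controlled
    threshold with white, as described for clear-sky backgrounds.
    rgb: float array (H, W, 3) with values in [0, 1]."""
    out = rgb.copy()
    sky = out[..., 2] > threshold   # third coordinate of the RGB space
    out[sky] = 1.0                  # substitute by white
    return out
```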
Conversion of colour models: RGB->YUV. At this stage, the RGB radiometric information is
transformed into a scalar luminance value. To achieve this, the YUV colour space is used,
because one of its main characteristics is that it is the model which most closely renders the
behaviour of the human eye: the retina is more sensitive to the light intensity (luminance)
than to the chromatic information. Accordingly, this space is defined by the following three
components: Y (luminance component), U and V (chromatic components). The equation
that relates the luminance of the YUV space with the coordinates of the RGB space is:
Y = 0.299 R + 0.587 G + 0.114 B   (5)
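The conversion can be sketched in a few lines, using the standard YUV (BT.601) luminance weights:

```python
import numpy as np

def luminance(rgb):
    """Scalar luminance from RGB, standard YUV/BT.601 weights
    (the common Y = 0.299R + 0.587G + 0.114B convention).
    rgb: array whose last axis holds the R, G, B components."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```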
Texture extraction. With the target of accomplishing a radiometric uniformity that supports
the efficient treatment of the images (range, visible and thermal) in their intensity values, a
region-based texture extraction has been applied. The extraction of texture information for
image fusion purposes has been scarcely treated in the scientific literature, but some
experiments show that it can yield interesting results in cases of low-quality images
(Rousseau et al., 2000; Jarc et al., 2007). The fusion procedure that has been developed
requires, in particular, the texture extraction of the thermal and range images. Usually, two
filters are used for this type of task: Gabor (1946) or Laws (1980). In our case, we use the
Laws filter. Laws developed a set of 2D convolution kernels composed of combinations of
four one-dimensional scalar filters. Each of these one-dimensional filters extracts a
particular feature from the image texture: level (L), edge (E), spot (S) and ripple (R). The
one-dimensional kernels are as follows:
L5 = ( 1   4   6   4   1 )
E5 = ( -1  -2   0   2   1 )
S5 = ( -1   0   2   0  -1 )                              (6)
R5 = ( 1  -4   6  -4   1 )
The combination of these kernels gives 16 different filters. Among them, according to Jarc
et al. (2007), the most useful are E5L5, S5E5, S5L5 and their transposes. In particular,
considering that our thermal-camera case studies correspond to architectural buildings, the
filters E5L5 and L5E5 have been applied in order to extract horizontal and vertical textures,
respectively.
Finally, each of the images filtered by the convolution kernels was scaled to the range 0-255
and processed by histogram equalization and contrast enhancement.
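The filtering step above can be sketched as follows; the 2D Laws mask is the outer product of two 1D kernels, and the helper only covers the 0-255 scaling (the equalization and contrast-enhancement steps are omitted from this sketch):

```python
import numpy as np
from scipy.signal import convolve2d

# One-dimensional Laws kernels of equation (6): level, edge, spot, ripple.
L5 = np.array([1, 4, 6, 4, 1])
E5 = np.array([-1, -2, 0, 2, 1])
S5 = np.array([-1, 0, 2, 0, -1])
R5 = np.array([1, -4, 6, -4, 1])

def laws_filter(image, vertical, horizontal):
    """Filter an image with the 2D Laws mask formed by the outer product
    of two 1D kernels, e.g. laws_filter(img, E5, L5) for the E5L5 mask."""
    kernel = np.outer(vertical, horizontal)
    return convolve2d(image, kernel, mode='same', boundary='symm')

def rescale(img):
    """Scale a filter response to the 0-255 range."""
    lo, hi = img.min(), img.max()
    return np.zeros_like(img) if hi == lo else 255.0 * (img - lo) / (hi - lo)
```

Because E5 sums to zero, the E5L5 response vanishes on uniform regions and peaks on horizontal texture transitions.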
An apparent solution would be to create a range and/or thermal image of the same size as
the visible image. This solution presents an important drawback: in the case of the range
image it would demand increasing the number of points in the laser point cloud and, in the
case of the thermal image, increasing the number of thermal pixels. Both solutions would
require new data acquisition procedures, relying on a higher scanning resolution in the case
of the range image and on the generation of a mosaic from the original images in the case of
the thermal image. Both approaches have been disregarded for this work because they are
not flexible enough for our purposes. Instead, we have chosen to resize all the images after
they have been acquired and pre-processed, seeking a balance between the number of
pixels of the image with the highest resolution (visible), the image with the lowest
resolution (thermal) and the number of laser points. This sizing transformation is rendered
by the 2D affine transformation (8):
R_Img = C_Img · A_1
R_Img = T_Img · A_2   (8)
A_{1,2} = | a  b  c |
          | d  e  f |
          | 0  0  1 |
where A_1 contains the affine transformation between the range image and the camera
image, A_2 contains the affine transformation between the range image and the thermal
image, and R_Img, C_Img and T_Img are the matrices of the range image, the visible image
and the thermal image, respectively.
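Applying a transformation of the form (8) to pixel coordinates can be sketched as follows (the function name is illustrative; in practice the six parameters a–f would be estimated from correspondences between the two images):

```python
import numpy as np

def affine_map(points, A):
    """Apply a 2D affine transformation in homogeneous coordinates.
    points: (N, 2) array of pixel coordinates;
    A: 3x3 matrix [[a, b, c], [d, e, f], [0, 0, 1]] as in equation (8)."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    return (pts @ A.T)[:, :2]
```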
After the resizing of the images we are prepared to start the sensor fusion.
3. Sensor fusion
One of the main targets of the sensor fusion strategy that we propose is the flexibility to use
multiple sensors, so that the laser point cloud can be rendered with radiometric information
and, vice versa, the images can be enriched with the metric information provided by the
laser scanner. From this point of view, the sensor fusion processing that we describe in the
following pages requires extraction and matching approaches that ensure accuracy,
reliability and uniqueness of the results.
remains invariant to rotations and scale changes and an edge-line detector invariant to
intensity variations on the images.
The Harris operator provides stable and invariant spatial features that represent a good
support for the matching process. This operator shows the following advantages when
compared with other alternatives: high accuracy and reliability in the localization of
interest points, and invariance in the presence of noise. The threshold of the detector to
assess the behaviour of an interest point is fixed as the relation between the eigenvalues of
the autocorrelation function of the kernel (9) and the standard deviation of the Gaussian
kernel. In addition, a non-maximum suppression is applied to obtain the interest points:
$$R = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2 = \det(M) - k\,\mathrm{trace}(M)^2 \qquad (9)$$
where R is the response parameter of the interest point, λ1 and λ2 are the eigenvalues of M, k is an empirical value and M is the auto-correlation matrix. If R is negative, the point is labeled as an edge; if R is small, it is labeled as a flat region; and if it is positive, the point is labeled as an interest point.
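As a minimal numeric sketch of the response of equation (9), assuming the usual empirical value k = 0.04 and accumulating the auto-correlation matrix over a small window of gradient samples:

```python
import numpy as np

# Minimal sketch of the Harris response of equation (9):
# R = det(M) - k * trace(M)^2, with M the auto-correlation matrix
# accumulated over a window of image gradients. k = 0.04 is an assumed
# empirical value, not taken from the chapter.

def harris_response(Ix, Iy, k=0.04):
    """Harris response for one window of gradient samples Ix, Iy."""
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.det(M) - k * np.trace(M) ** 2

# Corner-like window: strong gradients in both directions -> R > 0.
corner = harris_response(np.array([1.0, 0.0, 1.0, 0.0]),
                         np.array([0.0, 1.0, 0.0, 1.0]))
# Edge-like window: gradients in one direction only -> R < 0.
edge = harris_response(np.array([1.0, 1.0, 1.0, 1.0]),
                       np.array([0.0, 0.0, 0.0, 0.0]))
print(corner, edge)
```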
On the other hand, the Förstner algorithm is one of the most widespread detectors in the field of terrestrial photogrammetry. Its performance (10) is based on analyzing the Hessian matrix and classifying each point as a point of interest according to the following parameters:
$$q = 1 - \left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2 = \frac{4\det(N)}{\mathrm{tr}(N)^2}, \qquad w = \frac{\det(N)}{\mathrm{tr}(N)} \qquad (10)$$
where q is the ellipse circularity parameter, λ1 and λ2 are the eigenvalues of N, w is the point weight and N is the Hessian matrix. The use of the q-parameter allows us to avoid the edges, which are not suitable for the purposes of the present approach.
The recommended application of the selection criteria is as follows: firstly, remove those edges with a parameter (q) close to zero; next, check that the average precision of the point
Camera and laser robust integration in engineering and architecture applications 129
(w) does not exceed the tolerance imposed by the user; finally, apply a non-maximum
suppression to ensure that the confidence ellipse is the smallest in the neighbourhood.
Edge detection: the Canny filter. The Canny edge detector is the most appropriate for edge detection in images with a presence of regular elements because it meets three conditions that are determinant for our purposes:
- Accuracy in the location of the edge, ensuring the largest closeness between the extracted edges and the actual edges.
- Reliability in the detection of the points in the edge, minimizing the probability of detecting false edges because of the presence of noise and, consequently, minimizing the loss of actual edges.
- Unicity in obtaining a unique edge, ensuring edges with a maximum width of one pixel.
Mainly, the Canny edge detector consists of a multi-phase procedure in which the user must choose three parameters: a standard deviation and two threshold levels. The result is a binary image in which black pixels indicate the edges while the rest of the pixels are white.
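A highly simplified sketch of these three parameters in action is given below: a Gaussian smoothing (standard deviation) followed by a double threshold on the gradient magnitude. The full Canny detector also performs non-maximum suppression and hysteresis edge tracking, which are omitted here for brevity:

```python
import numpy as np

# Highly simplified sketch of the three Canny parameters mentioned above:
# a Gaussian smoothing (standard deviation sigma) followed by a double
# threshold on the gradient magnitude. Non-maximum suppression and
# hysteresis edge tracking, which the full detector performs, are omitted.

def smooth(img, sigma):
    """Separable Gaussian smoothing with edge padding."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    conv = lambda m: np.convolve(np.pad(m, r, mode="edge"), g, mode="valid")
    out = np.apply_along_axis(conv, 0, img)
    return np.apply_along_axis(conv, 1, out)

def double_threshold_edges(img, sigma=1.0, low=0.1, high=0.3):
    sm = smooth(img, sigma)
    gy, gx = np.gradient(sm)
    mag = np.hypot(gx, gy)
    strong = mag >= high * mag.max()   # confident edge pixels
    weak = mag >= low * mag.max()      # candidates for hysteresis linking
    return strong, weak

img = np.zeros((20, 20))
img[:, 10:] = 1.0                      # vertical step edge
strong, weak = double_threshold_edges(img)
print(strong.sum(), weak.sum())
```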
Line segmentation: Burns. The linear segments of an image represent one of the most important features for digital processing since they support the three-dimensional interpretation of the scene. Nevertheless, the segmentation procedure is not straightforward because noise and radial distortion complicate its accomplishment. Achieving a high quality segmentation demands extracting, as limit points of a segment, those that best define the line that can be adjusted to the edge. To do so, the segmentation procedure that has been developed is, once more, structured in a multi-phase fashion in which a series of stages are chained, pursuing a set of segments (1D) defined by the coordinates of their limit points. The processing time of the segmentation phase depends linearly on the number of pixels that have been labeled as edge pixels in the previous phase. Hence, the choice of the three Canny parameters described above becomes crucial.
The segmentation phase starts with the scanning of the edge image (from top to bottom and from left to right), seeking candidate pixels to be labeled as belonging to the same line. The basic idea is to group the edge pixels according to similar gradient values, this step being similar to the Burns method. In this way, every pixel is compared with its eight neighbours for each of the gradient directions. The pixels that show a similar orientation
130 Sensor Fusion and Its Applications
will be labeled as belonging to the same edge: from here we will obtain a first clustering of
the edges, according to their gradient.
Finally, aiming at refining and adapting the segmentation to our purposes, the edges resulting from the labeling stage are filtered by means of a minimum edge length parameter. In our case, we want to extract only the most relevant lines to describe the object, in order to find the most favourable features to support the matching process. To do so, the length of the labeled edges is computed and compared with a threshold length set by the user. If this length is larger than the threshold value, the edge is turned into a segment which receives as limit coordinates the coordinates of the centres of the first and last pixels in the edge. On the contrary, if the length is smaller than the threshold level, the edge is rejected (Fig. 7).
Fig. 7. Edge and line extraction with the Canny and Burns operators.
3.2 Matching
Taking into account that the images involved in the fusion problem (range, visible and thermal) are very different in their radiometry, we must undertake a robust strategy to ensure a unique solution. To this end, we deal with two feature based matching strategies: the interest point based matching strategy (Li and Zouh, 1995; Lowe, 2005) and the edge and line based matching strategy (Dana and Anandan, 1993; Keller and Averbuch, 2006), both integrated in a hierarchical and pyramidal procedure.
$$\rho = \frac{\sigma_{HR}}{\sigma_H\,\sigma_R} \qquad (11)$$

where ρ is the cross-correlation coefficient, σHR is the covariance between the windows of the visible image and the range image, σH is the standard deviation of the visible image and σR is the standard deviation of the range image. The interest point based matching relies on closeness and similarity measures of the grey levels within the neighbourhood.
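As a sketch, the coefficient defined above can be computed for two grey-level windows as follows; the sample windows are illustrative:

```python
import numpy as np

# Sketch of the cross-correlation coefficient of equation (11): the
# covariance of the grey levels of two windows divided by the product of
# their standard deviations. The sample windows below are illustrative.

def ncc(win_a, win_b):
    """Normalised cross-correlation between two equally sized windows."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

a = np.array([[10.0, 20.0], [30.0, 40.0]])
b = 2.0 * a + 5.0       # linear radiometric change leaves the coefficient at 1
print(ncc(a, b), ncc(a, -a))
```

Note that a linear radiometric change (gain and offset) does not alter the coefficient, which is precisely why this measure is useful for images with different radiometry.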
Later, at the last level of the pyramid, in which the image is processed at its real resolution, the strategy is based on least squares matching (Grün, 1985). For this task, the initial approximations are taken from the results of the area based matching applied on the previous levels. The localization and shape of the matching window are estimated from the initial values and recomputed until the differences between the grey levels come to a minimum (12),
$$F(x, y) = r_0 + r_1\,G(ax + by + \Delta x,\; cx + dy + \Delta y) \qquad (12)$$

where F and G represent the reference and the matching window respectively, a, b, c, d, Δx and Δy are the geometric parameters of an affine transformation, while r1 and r0 are the radiometric parameters of a linear transformation, more precisely the gain and the offset, respectively.
Finally, even though the matching strategy has been applied in a hierarchical fashion, the
particular radiometric properties of both images, especially the range image, may lead to
many mismatches that would affect the results of the sensor fusion. Hence, the proposed
approach has been reinforced including geometric constraints relying on the epipolar lines
(Zhang et al., 1995; Han and Park, 2000). Particularly and taking into account the case of the
laser scanner and the digital camera, given a 3D point in object space P, and being pr and pv
its projections on the range and visible images, respectively and being Ol and Oc the origin
of the laser scanner and the digital camera, respectively, we have that the plane defined by
P, Ol and Oc is named the epipolar plane. The intersections of the epipolar plane with the
range and visible images define the epipolar lines lr and lv. The location of an interest point pr on the range image that matches a point pv on the visible image is constrained to lie on the epipolar line lr of the range image (Fig. 8). To compute these epipolarity constraints, the Fundamental Matrix is used (Hartley, 1997), taking eight homologous points as input (Longuet-Higgins, 1981). In this way, once we have computed the Fundamental Matrix we can build the epipolar geometry and limit the search space for the matching points to one dimension: the epipolar line. As this strategy is an iterative process, the threshold levels to be applied in the matching task will vary in an adaptive way until we have reduced the search as much as possible and reached the maximum accuracy and reliability.
In order to ensure the accuracy of the Fundamental Matrix, the iterative strategy has been supported by the RANSAC algorithm (RANdom SAmple Consensus) (Fischler and Bolles, 1981). This technique computes the mathematical model for a randomly selected dataset and evaluates the number of points of the global dataset which satisfy this model within a given threshold. The final accepted model is the one which incorporates the largest set of points with the minimum error.
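The RANSAC principle just described can be sketched as below, illustrated on a simple 2D line model rather than the fundamental matrix (the data and thresholds are illustrative): a model is repeatedly fitted to a random minimal sample and the model supported by the largest inlier set is kept.

```python
import numpy as np

# Sketch of the RANSAC principle described above, on a 2D line model:
# fit a candidate to a random minimal sample (two points), count the points
# within a distance threshold, and keep the best-supported candidate.

def ransac_line(pts, n_iter=200, threshold=0.1, rng=None):
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(pts), dtype=bool)
    best_model = None
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                       # skip degenerate vertical sample
        m = (y2 - y1) / (x2 - x1)          # slope of the candidate line
        c = y1 - m * x1                    # intercept
        dist = np.abs(pts[:, 1] - (m * pts[:, 0] + c))
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (m, c)
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus two gross outliers (mismatches).
x = np.linspace(0, 1, 10)
pts = np.column_stack([x, 2 * x + 1])
pts = np.vstack([pts, [[0.5, 9.0], [0.2, -7.0]]])
model, inliers = ransac_line(pts, rng=0)
print(model, inliers.sum())
```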
The solution proposed takes advantage of line based matching (Hintz & Zhao, 1990; Schenk, 1986) that exploits the direction criterion, the distance criterion and the attribute similarity criterion in a combined way. Nevertheless, this matching process is seriously limited by the ill-conditioning of both images: the correspondence can be of several lines to one (due to discontinuities), several lines regarded as independent may be part of the same one, and some lines may be wrong or may not exist at all (Luhmann et al., 2006). This is the reason for pre-processing these images with a texture filtering (Laws, 1980) as described in section (2.3.2). This yields four images with a larger degree of radiometric similarity and with horizontal and vertical textures extracted, on which we can support our line based matching.
In the following lines we describe the three criteria we have applied for the line based matching:
Direction criterion. As a first approach to the line based matching we classify the lines according to their direction. To this end, we take the edge orientation and the gradient of the image itself as reference. The main goal in this first step is to classify the lines according to their horizontal and vertical direction, rejecting any other direction. In those cases in which we work with oblique images, a more sophisticated option could be applied to classify the linear segments according to the three main directions of the object (x, y, z) based on vanishing points (Gonzalez-Aguilera and Gomez-Lahoz, 2008).
Distance criterion. Once we have classified the lines according to their direction, we take their distance attribute as the second criterion to search for the homologous line. Obviously, considering the different radiometric properties of both images, an adaptive threshold should be established, since the distance of a matched line could present variations.
Intersection criterion. In order to reinforce the matching of lines based on their distance, a specific strategy has been developed based on computing intersections between lines (corner points). Particularly, a buffer area (50x50 pixels) is defined where horizontal and vertical lines are enlarged to their intersection. In this sense, those lines that share a similar intersection point will be labelled as homologous lines.
As a result of the application of these three criteria, a preliminary line based matching based on the fundamental matrix is performed (see section 3.2.1). More precisely, the best eight intersections of matched lines serve as input data for the fundamental matrix. Once we have computed the Fundamental Matrix we can build the epipolar geometry and limit the search space for the matching lines to one dimension: the epipolar line. As this strategy is an iterative process, the threshold levels to be applied in the matching task will vary in an adaptive way until we have reduced the search as much as possible and reached the maximum accuracy and reliability.
4. Spatial resection
Once we have solved the matching task, through which we have related the images to each other (range image, visible image and thermal image), we proceed to solve the spatial resection. The parameters to be determined are the exterior parameters of the cameras (digital and thermal) with respect to the laser scanner.
The question of the spatial resection is well known in classical aerial photogrammetry (Kraus, 1997). It is solved by establishing the relationship between the image points, the homologous object points and the point of view through a collinearity constraint (1).
We must bear in mind that the precision of the data in both systems (image and object) is different since their lineage is different, so we must define an adequate weighting for the stochastic model. This leads to the so-called unified approach to least squares adjustment (Mikhail and Ackerman, 1976) in the form:
$$L + BV - AX = 0 \qquad (13)$$
where L is the independent term vector, B is the Jacobian matrix of the observations, V is the vector of the residuals, A is the Jacobian matrix of the unknowns and X is the vector of unknowns. The normal equation system we get after applying the least squares criterion is in the form:
$$A^T M^{-1} A X - A^T M^{-1} L = 0, \qquad \text{where} \quad M = B W^{-1} B^T \qquad (14)$$
This equation is equivalent to the least squares solution we obtain when directly solving the so-called observation equation system. In this case we can say that the matrix M plays the role of weighting the equations (instead of the observations). Please note that this matrix is obtained from the weighting of the observations (through the matrix W) and from the functional relationship among them expressed by the Jacobian matrix B. In this way, this matrix operates in the equation system as a geometrical counterpart of the metrical relationship between the precision of the different observations (image and object).
From equation (13) and its solution (14) we can obtain the adjusted residuals:

$$V = W^{-1} B^T \left(B W^{-1} B^T\right)^{-1} (AX - L) \qquad (15)$$
According to the covariance propagation law (Mikhail and Ackermann, 1976), the cofactor matrix of the estimated parameters is obtained from the equation:

$$Q_{\hat{x}} = \left(A^T M^{-1} A\right)^{-1} A^T M^{-1} Q_L M^{-1} A \left(A^T M^{-1} A\right)^{-1}, \qquad Q_L = B W^{-1} B^T = M \qquad (16)$$
and so, the covariance matrix of the spatial resection is given by:
$$C_{\hat{x}} = \hat{\sigma}_0^2\, Q_{\hat{x}} \qquad (17)$$
The square root of the elements in the main diagonal of the matrix provides the standard
deviation of the exterior orientation parameters.
Finally, the mean square error is obtained from:

$$e.m.c. = \sqrt{\frac{V^T W V}{2n - 6}} \qquad (18)$$
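The adjustment of equations (13)-(15) can be sketched numerically as follows; the toy example (direct observations of a single unknown, with B = I and unit weights) is illustrative and not taken from the chapter:

```python
import numpy as np

# Numeric sketch of the unified least squares adjustment of equations
# (13)-(15): L + BV - AX = 0, solved via the normal equations
# (A^T M^-1 A) X = A^T M^-1 L with M = B W^-1 B^T, and residuals
# V = W^-1 B^T M^-1 (AX - L).

def unified_least_squares(A, B, W, L):
    M = B @ np.linalg.inv(W) @ B.T
    Mi = np.linalg.inv(M)
    X = np.linalg.solve(A.T @ Mi @ A, A.T @ Mi @ L)
    V = np.linalg.inv(W) @ B.T @ Mi @ (A @ X - L)
    return X, V

# Toy example: four direct observations of a single unknown x
# (l_i + v_i - x = 0, i.e. B = I and A a column of ones), unit weights.
L = np.array([10.0, 10.2, 9.9, 10.3])
A = np.ones((4, 1))
B = np.eye(4)
W = np.eye(4)
X, V = unified_least_squares(A, B, W, L)
print(X)   # reduces to the weighted mean of the observations
```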
On the other hand, with the aim of comparing the results and analyzing their validity, we have also solved the spatial resection by means of the so-called Direct Linear Transformation (DLT). The DLT equations result from a re-parameterization of the collinearity equations (Kraus, 1997) in the following way:
$$\begin{pmatrix} x_A - x_p \\ y_A - y_p \\ -f \end{pmatrix} = \lambda\, R \begin{pmatrix} X_A - X_S \\ Y_A - Y_S \\ Z_A - Z_S \end{pmatrix} \qquad (19)$$
in which (XS, YS, ZS) are the object coordinates of the point of view, (XA, YA, ZA) are the object coordinates of an object point, (xA, yA) are the image coordinates of its homologous image point, f is the focal length, (xp, yp) are the image coordinates of the principal point, R is the 3x3 rotation matrix and λ is the scale factor.
If we expand the terms and divide the equations by each other to eliminate the scale factor we have:

$$x_A = \frac{(x_p r_{31} - f r_{11})(X_A - X_S) + (x_p r_{32} - f r_{12})(Y_A - Y_S) + (x_p r_{33} - f r_{13})(Z_A - Z_S)}{r_{31}(X_A - X_S) + r_{32}(Y_A - Y_S) + r_{33}(Z_A - Z_S)}$$

$$y_A = \frac{(y_p r_{31} - f r_{21})(X_A - X_S) + (y_p r_{32} - f r_{22})(Y_A - Y_S) + (y_p r_{33} - f r_{23})(Z_A - Z_S)}{r_{31}(X_A - X_S) + r_{32}(Y_A - Y_S) + r_{33}(Z_A - Z_S)} \qquad (20)$$
$$x_A = \frac{L_1 X_A + L_2 Y_A + L_3 Z_A + L_4}{L_9 X_A + L_{10} Y_A + L_{11} Z_A + 1}, \qquad y_A = \frac{L_5 X_A + L_6 Y_A + L_7 Z_A + L_8}{L_9 X_A + L_{10} Y_A + L_{11} Z_A + 1} \qquad (21)$$
This expression relates the image coordinates (xA, yA) with the object coordinates (XA, YA, ZA) and, consequently, is useful to reference the images to the laser model. The relationship between the mathematical parameters (L1, ..., L11) and the geometrical parameters is as follows:
$$L_1 = \frac{x_p r_{31} - f r_{11}}{D} \qquad L_2 = \frac{x_p r_{32} - f r_{12}}{D} \qquad L_3 = \frac{x_p r_{33} - f r_{13}}{D}$$

$$L_4 = \frac{(f r_{11} - x_p r_{31})X_S + (f r_{12} - x_p r_{32})Y_S + (f r_{13} - x_p r_{33})Z_S}{D}$$

$$L_5 = \frac{y_p r_{31} - f r_{21}}{D} \qquad L_6 = \frac{y_p r_{32} - f r_{22}}{D} \qquad L_7 = \frac{y_p r_{33} - f r_{23}}{D} \qquad (22)$$

$$L_8 = \frac{(f r_{21} - y_p r_{31})X_S + (f r_{22} - y_p r_{32})Y_S + (f r_{23} - y_p r_{33})Z_S}{D}$$

$$L_9 = \frac{r_{31}}{D} \qquad L_{10} = \frac{r_{32}}{D} \qquad L_{11} = \frac{r_{33}}{D} \qquad D = -(r_{31} X_S + r_{32} Y_S + r_{33} Z_S)$$
$$\begin{pmatrix} X_S \\ Y_S \\ Z_S \end{pmatrix} = -\begin{pmatrix} L_1 & L_2 & L_3 \\ L_5 & L_6 & L_7 \\ L_9 & L_{10} & L_{11} \end{pmatrix}^{-1} \begin{pmatrix} L_4 \\ L_8 \\ 1 \end{pmatrix}$$

$$x_p = \frac{L_1 L_9 + L_2 L_{10} + L_3 L_{11}}{L_9^2 + L_{10}^2 + L_{11}^2} \qquad y_p = \frac{L_5 L_9 + L_6 L_{10} + L_7 L_{11}}{L_9^2 + L_{10}^2 + L_{11}^2}$$

$$R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} = D \begin{pmatrix} \dfrac{x_p L_9 - L_1}{f} & \dfrac{x_p L_{10} - L_2}{f} & \dfrac{x_p L_{11} - L_3}{f} \\ \dfrac{y_p L_9 - L_5}{f} & \dfrac{y_p L_{10} - L_6}{f} & \dfrac{y_p L_{11} - L_7}{f} \\ L_9 & L_{10} & L_{11} \end{pmatrix} \qquad (23)$$

$$D^2 = \frac{1}{L_9^2 + L_{10}^2 + L_{11}^2}$$
To solve the spatial resection both models are effective. Nevertheless, some differences must be remarked:
- The DLT is a linear model and therefore requires neither an iterative process nor initial values for the first iteration (both derived from the Taylor series expansion).
- The number of parameters to be solved when using the DLT is 11, so we need at least 6 measured control points (2 equations for each point), whereas the number of parameters to be solved when using the collinearity equations is 6 (if we are only solving for the exterior orientation) or 9 (if we are solving for the exterior orientation and for the three parameters of the interior orientation that describe the camera, without taking into account any systematic error such as radial lens distortion). Therefore, we will need three control points in the first case, and five in the second case.
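The linear character of the DLT can be sketched as below: each control point contributes two equations of the form (21), linear in L1..L11. The control point coordinates and parameter values are synthetic, chosen only to exercise the algebra:

```python
import numpy as np

# Sketch of a linear DLT solution of equation (21): each control point gives
# two linear equations in the eleven parameters L1..L11. The values below
# are synthetic and illustrative, not taken from the chapter.

def dlt_project(Lp, XYZ):
    """Project an object point with DLT parameters Lp, equation (21)."""
    X, Y, Z = XYZ
    den = Lp[8] * X + Lp[9] * Y + Lp[10] * Z + 1.0
    x = (Lp[0] * X + Lp[1] * Y + Lp[2] * Z + Lp[3]) / den
    y = (Lp[4] * X + Lp[5] * Y + Lp[6] * Z + Lp[7]) / den
    return x, y

def dlt_solve(obj_pts, img_pts):
    """Recover L1..L11 linearly from >= 6 non-coplanar control points."""
    rows, rhs = [], []
    for (X, Y, Z), (x, y) in zip(obj_pts, img_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z])
        rhs.append(x)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z])
        rhs.append(y)
    sol, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                              rcond=None)
    return sol

L_true = np.array([1.0, 0.1, -0.2, 5.0, 0.05, 1.1, 0.3, -2.0,
                   0.001, 0.002, -0.001])
obj_pts = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10),
           (10, 10, 0), (10, 0, 10), (5, 7, 3)]
img_pts = [dlt_project(L_true, p) for p in obj_pts]
L_est = dlt_solve(obj_pts, img_pts)
print(np.allclose(L_est, L_true))
```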
Concerning the reliability of the spatial resection, it is important to stress that, in spite of the robust computing methods that we have applied at the matching stage, some mismatches may still persist among the candidate homologous points, so the final accuracy could be reduced. These blunders are not easy to detect because their influence is distributed over all the points in the adjustment. As is well known, the least squares approach allows blunders to be detected when the geometry is robust, that is, when the conditioning of the design matrix A is good; but when the geometric design is weak, the high residual which should be related with the gross error is distributed over the other residuals. Consequently, it becomes necessary to apply statistical tests such as the test of Baarda (Baarda, 1968) and/or the test of Pope (Pope, 1976), as well as robust estimators that can detect and eliminate such wrong observations.
Regarding the statistical tests, they are affected by some limitations, some of which are related with the workflow described up to here. These are:
- If the data set presents a bias, that is, if the errors do not follow a Gaussian distribution, the statistical tests will lose a large part of their performance.
- From the actual set of available statistical tests, only the test of Pope can work without previously knowing the variance of the observations. Unfortunately, this is usually the case in photogrammetry.
- As stated before, in the case of working under a weak geometry, the probability that these tests do not perform adequately greatly increases. In addition, these tests are only able to reject one observation at each iteration.
On the contrary, these statistical tests exhibit the advantage that they may be applied in a fully automated fashion, thus avoiding interaction with the user.
The test of Baarda (Baarda, 1968) assumes that the theoretical variance is not known and therefore uses the a priori variance (σ0²). It also works on the assumption that the standard deviation of the observations is known. The test is based on the fact that the residuals are normally (Gaussian) distributed. The test indicator is the normalised residual (zi), defined as:

$$z_i = \frac{(P v)_i}{\sigma_0 \sqrt{(P Q_{vv} P)_{ii}}} \qquad (24)$$
where P is the matrix of weights, vi is the i-th residual and Qvv is the cofactor matrix of the residuals. This indicator is compared with the critical value of the test to accept or reject the null hypothesis (H0), which is defined in terms of the significance level α, the normal distribution N, the Fisher-Snedecor distribution F and the chi-square (χ²) distribution.
The critical value of Baarda (Tb) takes into account the level of significance α as well as the power of the test β. Therefore, certain combinations of α and β are used in the majority of cases. The most common is α = 0.1% and β = 20%, which leads to a critical value of 3.291.
If the null hypothesis is rejected, we assume that there is a gross error among the observations. The procedure then consists of eliminating from the adjustment the point with the largest normalised residual and repeating the test of Baarda to check if there are more gross errors. The iterative application of this strategy is called data snooping (Kraus, 1997) and makes it possible to detect multiple blunders and to reject them from the adjustment.
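The data snooping loop can be sketched as follows, shown for the simplest possible model (repeated observations of one unknown with unit weights, for which the diagonal of Qvv is 1 - 1/n); the observation values and the use of the critical value 3.291 are illustrative:

```python
import numpy as np

# Sketch of data snooping: after each adjustment, the observation with the
# largest normalised residual (24) is removed if it exceeds the critical
# value (3.291 for alpha = 0.1%), and the adjustment is repeated. Model:
# repeated observations of one unknown, unit weights, a priori sigma0 = 1.

def data_snooping(obs, critical=3.291):
    obs = list(obs)
    while len(obs) > 1:
        v = np.array(obs) - np.mean(obs)             # residuals
        sigma0 = 1.0                                  # a priori std (assumed)
        # For the mean with P = I, the diagonal of Qvv is 1 - 1/n.
        z = np.abs(v) / (sigma0 * np.sqrt(1.0 - 1.0 / len(obs)))
        worst = int(np.argmax(z))
        if z[worst] <= critical:
            break                                     # no blunder remains
        obs.pop(worst)                                # reject the blunder
    return obs

clean = data_snooping([10.0, 10.1, 9.9, 10.05, 9.95, 25.0])
print(clean)
```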
The test of Pope (Pope, 1976) is used when the a priori variance (σ0²) is not known or cannot be determined. In its place, the a posteriori variance (26) is used:

$$\hat{\sigma}_0^2 = \frac{V^T P V}{r} \qquad (26)$$
This statistical test is usually applied in photogrammetry since it is very common that the a priori variance is not known. The null hypothesis (H0) is that all the residuals (vi) follow a normal (Gaussian) distribution whose standard deviation is the normalised residual standard deviation:

$$\hat{\sigma}_{v_i} = \hat{\sigma}_0 \sqrt{q_{v_i v_i}} \qquad (27)$$

where qvivi is the i-th element of the main diagonal of the cofactor matrix of the residuals
(Qvv). On the contrary, the alternative hypothesis (Ha) states that in the set of observations there is a gross error that does not behave according to the normal distribution and, thus, must be eliminated. We therefore establish as statistical indicator the standardised residual (wi), obtained as:

$$w_i = \frac{v_i}{\hat{\sigma}_{v_i}} \qquad (28)$$
Please note that in this test we use the standardised residuals (wi), while in the test of Baarda we use the normalised residuals (zi). The only difference is the use of the a posteriori and the a priori variance, respectively.
Since the residuals are computed using the a posteriori variance, they will not be normally distributed (25) but rather will follow a Tau distribution. The critical value of the Tau distribution may be computed from the tables of the t-student distribution (Heck, 1981) according to:
$$\tau_{r, \alpha_0} = \frac{\sqrt{r}\; t_{r-1, \alpha_0/2}}{\sqrt{r - 1 + t_{r-1, \alpha_0/2}^2}} \qquad (29)$$
where r is the number of degrees of freedom of the adjustment and α0 the significance level for a single observation, which is computed from the total significance level (α) and the number of observations (n):

$$\alpha_0 = 1 - (1 - \alpha)^{1/n} \approx \frac{\alpha}{n} \qquad (30)$$
If the alternative hypothesis is accepted, the standardised residual wi will be regarded as a blunder and hence eliminated from the adjustment. The procedure is repeated until the null hypothesis is verified for all the remaining points, in a similar way as with the data snooping technique described above.
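Equations (29)-(30) can be sketched numerically as below; the degrees of freedom, significance level and number of observations are illustrative values:

```python
import numpy as np
from scipy import stats

# Sketch of the critical value of the Tau distribution, equations (29)-(30),
# computed from the t-student distribution as proposed by Heck (1981).
# The argument values used below are illustrative.

def tau_critical(r, alpha, n):
    """Critical Tau value for r degrees of freedom and n observations."""
    alpha0 = 1.0 - (1.0 - alpha) ** (1.0 / n)     # per-observation level (30)
    t = stats.t.ppf(1.0 - alpha0 / 2.0, r - 1)    # two-tailed t quantile
    return np.sqrt(r) * t / np.sqrt(r - 1 + t ** 2)   # equation (29)

print(tau_critical(r=10, alpha=0.05, n=20))
```

Note that the Tau critical value is bounded above by the square root of the degrees of freedom, which follows directly from the form of equation (29).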
There are many robust estimators (Domingo, 2000), since each of them modifies the weighting function in a particular way. The most common robust estimators are:

$$\text{Minimum sum:} \quad p(v_i) = \frac{1}{|v_i|} \qquad (31)$$

$$\text{Huber:} \quad p(v) = \begin{cases} 1 & \text{for } |v| \le a \\[4pt] \dfrac{a}{|v|} & \text{for } |v| > a \end{cases}$$

$$\text{Modified Danish:} \quad p(v_i) = e^{-v_i^2}$$
In our case, the three robust estimators (31) have been implemented, adapted and combined with the statistical tests in the spatial resection adjustment in order to detect the gross errors and to improve the accuracy and reliability of the sensor fusion. Particularly, the robust estimators have been applied in the first iterations to filter the worst blunders from the observations, and afterwards the statistical tests have been applied to detect and eliminate the rest of the gross errors.
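The use of a robust weight function inside an iterative adjustment can be sketched as iteratively reweighted least squares with the Huber weights of (31); the model (repeated observations of one unknown), the observations and the tuning constant a = 1.5 are illustrative assumptions:

```python
import numpy as np

# Sketch of iteratively reweighted least squares with the Huber weight
# function of (31): observations with |v| <= a keep full weight, larger
# residuals are down-weighted by a/|v|. Model: repeated observations of one
# unknown. The threshold a = 1.5 and the data are assumed, not from the text.

def huber_weights(v, a=1.5):
    w = np.ones_like(v)
    big = np.abs(v) > a
    w[big] = a / np.abs(v[big])
    return w

def irls_mean(obs, a=1.5, n_iter=50):
    x = np.mean(obs)                       # initial estimate
    for _ in range(n_iter):
        v = obs - x                        # residuals
        w = huber_weights(v, a)            # down-weight suspected blunders
        x = np.sum(w * obs) / np.sum(w)    # reweighted estimate
    return x

obs = np.array([10.0, 10.2, 9.8, 10.1, 9.9, 30.0])   # one gross error
print(np.mean(obs), irls_mean(obs))
```

The gross error drags the plain mean far from the inliers, while the robustly weighted estimate stays close to them, which is exactly the behaviour sought in the first iterations of the spatial resection adjustment.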
5. Experimental results
In order to assess the capabilities and limitations of the sensor fusion methodology developed, some experiments have been undertaken by using the USALign software (González-Aguilera et al., 2009). In the following pages, three case studies are outlined. The reasons for presenting these cases are based on the possibilities of integrating different sensors: laser scanner, digital camera and thermal camera.
The goal is to obtain a 3D textured mapping from the fusion of both sensors. In this way, we will be able to render the facade with high accuracy and, consequently, to aid its documentation, dissemination and preservation.
Fig. 9. Input data. Left: Image acquired with the digital camera. Right: Range image
acquired with the laser scanner.
The next step is to apply an interest point extraction procedure by means of the Förstner operator, working on criteria of precision and circularity. After it, a robust matching procedure based on a hierarchical approach is carried out with the following parameters: cross correlation coefficient: 0.70 and search kernel size: 15 pixels. As a result, we obtain 2677 interest points, from which only 230 are identified as homologous points. This low percentage is due to the differences in texture between both images. In addition, the threshold chosen for the matching is high in order to provide good input data for the next step: the computation of the Fundamental Matrix. This matrix is computed by means of the algorithm of Longuet-Higgins with a threshold of 2.5 pixels and represents the basis for establishing the epipolar constraints. Once these are applied, the number of homologous points increases to 317 (Fig. 10).
Fig. 10. Point matching based on epipolar constraints. Left: Digital image. Right: range
image.
The next step is to obtain the exterior parameters of the camera in the laser point cloud system. An iterative procedure based on the spatial resection adjustment is used in combination with robust estimators as well as the statistical test of Pope. As a result of this process, 18 points are eliminated. The output is the position and attitude parameters of the camera related to the point cloud, together with some quality indices that give an idea of the accuracy of the fusion process of both sensors (Table 2).
Finally, once the spatial resection parameters have been computed, a texture map is obtained (Fig. 11). This allows us to integrate in the same product both the radiometric properties of the high resolution camera and the metric properties of the laser scanner.
Fig. 11. Left: Back projection error on pixels with a magnification factor of 10. Right: Texture
mapping as the result of the fusion of both sensors
The goal is to obtain a 3D textured mapping from the fusion of both sensors. In this way, we will be able to render with high accuracy the rock paintings of the Cave of Llonín and, hence, contribute to their representation and preservation.
Particularly, in the generation of the range image a total of 6480 points have been processed. This step yields an improvement in image quality by mitigating the effects of the empty pixels and by increasing the resolution, which is usually lower than that of the digital camera. The digital image is corrected for radial lens distortion effects and transformed from RGB values to luminance values as described in section (2.3.2) (Fig. 12).
Fig. 12. Input data. Left: Image acquired with the camera. Right: Range image acquired with
the laser scanner.
The next step is to apply an interest point extraction procedure by means of the Harris operator, and a robust matching procedure based on a hierarchical approach with the following parameters: cross correlation coefficient: 0.80 and search kernel size: 15 pixels. As a result, we obtain 1461 interest points, from which only 14 are identified as homologous points. This low rate is due to the uncertainty involved in bridging the gap between the textures of both images. In addition, the threshold chosen for the matching is high to avoid bad results that could distort the computation of the Fundamental Matrix. This matrix is computed by means of the algorithm of Longuet-Higgins with a threshold of 2.5 pixels and represents the basis for establishing the epipolar constraints. This, in turn, leads to an improvement of the procedure, and thus the matching yields as many as 63 homologous points (Fig. 13).
Fig. 13. Point matching based on epipolar constraints. Left: Digital image. Right: range
image.
Afterwards, the exterior parameters of the camera are referenced to the laser point cloud in an iterative procedure based on the spatial resection adjustment, in which robust estimators as well as the statistical test of Pope play a major role. As a result, we obtain the following parameters: position and attitude of the camera related to the point cloud, and some quality indices that give an idea of the accuracy of the fusion process of both sensors (Table 3).
Finally, once the spatial resection parameters are computed, a texture map is obtained (Fig. 14). This allows us to integrate in the same product both the radiometric properties of the high resolution camera and the metric properties of the laser scanner.
Fig. 14. Left: Back projection error on pixels with a magnification factor of 5. Right: Texture
mapping as the result of the fusion of both sensors.
Fig. 15. Faro Photon (Left); FLIR SC640 thermal camera (Right).
Nevertheless, in the case of the thermal image, we find the opposite situation: the resolution is low (640x480 pixels) and the pixel size projected on the object, taking into account the technical specifications and a shooting distance of 20 metres, is 5 cm. The following image (Fig. 16) shows the input data of this case study.
Fig. 16. Input data: (Left) Range image (GSD 1 cm) obtained with the laser scanner Faro
Photon. (Right) Thermal image (GSD 5 cm) acquired with the thermal camera SC640 FLIR.
In relation with the methodology we have developed, it can be divided into four parts: i) pre-processing of the range and thermal images; ii) feature extraction and matching; iii) registration of the images; iv) generation of hybrid products.
The automatic pre-processing tasks that prepare the images for the matching process are diverse. Nevertheless, due to the specific properties of the images, the most important stage undertaken at this level is a texture extraction based on the Laws filters. In this way, we manage to make the images uniform. Particularly, the range and thermal images are convolved with the filters E5L5 and L5E5, which are sensitive to horizontal and vertical edges respectively (Fig. 17). Both images of each case are added to obtain an output image free from any orientation bias.
Fig. 17. Texture images derived from the range image (a)(b) and thermal image (c)(d).
Afterwards, we apply a feature extraction and matching process. Particularly, edges and lines are extracted by using the Canny and Burns operators, respectively. The working parameters for these operators are: standard deviation: 1, Gaussian kernel size: 5x5, superior threshold: 200, inferior threshold: 40 and minimum length of lines: 20 pixels. A total amount of 414 linear segments are extracted from the range image, whereas the number of segments extracted from the thermal image is 487 (Fig. 18).
Fig. 18. Linear features extraction on the range image (Left) and on the thermal image
(Right) after applying the Canny and Burns operators.
In the next step, and taking into account the extracted linear features and their attributes (direction, length and intersection), a feature based matching procedure is undertaken. Particularly, the intersection between the most favourable horizontal and vertical lines is computed and used as input data for the fundamental matrix. As a result, the epipolar constraints are applied iteratively to reinforce the line matching and thus to compute the registration of the thermal camera, supported by robust estimators and the statistical test of Pope. The following table (Table 6) shows the results of this stage.
Finally, once both sensors are registered to each other, the following products can be derived: a 3D thermal model and a thermal orthophoto (Fig. 19). These hybrid products combine the qualitative properties of the thermal image with the quantitative properties of the laser point cloud. In fact, the orthophoto may be used as a matrix where the rows and columns are related with the planimetric coordinates of the object while the pixel value represents the temperature.
Fig. 19. Hybrid products from the sensor fusion: 3D thermal model (Left); thermal
orthophoto (GSD 5 cm) (Right).
148 Sensor Fusion and Its Applications
The most relevant advantages of the proposed approach are the following:
The integration of the three sensors analyzed here (laser scanner, digital camera and
thermal camera) is feasible, and automation of the process can be achieved. In this way, we
can overcome the incomplete character of the information derived from a single sensor.
More specifically, we have seen that the initial differences between the sources (geometric,
radiometric and spectral) may be resolved by taking advantage of the many procedures that
the photogrammetric and computer vision communities have developed over the last two
decades.
In this sense, it is also important to stress that these strategies must work: a) at both the
pre-processing and processing levels; b) in a multi-disciplinary fashion, where strategies are
designed to exploit the strengths of certain approaches while minimizing the weaknesses of
others; c) through iterative and hierarchical approaches, based on the idea that the first
rough and simple solutions are the starting point for a better approximation, which can
only be undertaken if the previous one is good enough.
On the other hand, the main drawbacks revealed by this work are the following:
The processing is still far from acceptable computing times. At least in the unfavourable
cases (cases 2 and 3), there is still a long way to go in reducing the computing time. We
think that seeking a better integration of the strategies used here, or developing new ones,
may lead to an optimization in this sense.
Likewise, there is considerable room to improve towards full automation. User interaction
is required mainly to define threshold levels, and there is a wide field of research to
improve this. It is important to note that this improvement should not rely on greater
complexity of the procedures involved in the method, since this would aggravate the
aforementioned computational effort. It is therefore a sensitive problem that must be
tackled in a holistic way.
The data and processing presented here deal with conventional image frames. It would be
of great help if approaches covering line-scanning cameras or fisheye cameras were
proposed.
Finally, regarding future lines of work, the advantages and drawbacks stated above point
out the main directions. Some new strategies should be tested in the immediate future:
developing a line-based computation of the spatial resection, developing a self-calibration
process to render both the calibration parameters of each sensor and the relationship
between them, working on a better integration and automation of the multiple procedures,
and generalizing these approaches to other fields such as panoramic images.
7. References
Abdel-Aziz, Y.I. & Karara, H.M. (1971). Direct linear transformation from comparator
coordinates into space coordinates in close range photogrammetry. Proceedings of
the Symposium on close range photogrammetry, pp. 1-18, The American Society of
Photogrammetry: Falls Church.
Aguilera, D.G. & Lahoz, J. G. (2006). sv3DVision: didactical photogrammetric software for
single image-based modeling, Proceedings of International Archives of
Photogrammetry, Remote Sensing and Spatial Information Sciences 36(6), pp. 171-179.
Baarda, W. (1968). A testing procedure for use in geodetic networks, Netherlands Geodetic
Commission Publications on Geodesy. New Series, 2 (5), Delft.
Brown, D. C. (1971). Close Range Camera Calibration. Photogrammetric Engineering.
Burns, B. J., Hanson, A.R. & Riseman, E.M. (1986) Extracting Straight Lines, IEEE
Transactions on Pattern Analysis and Machine Intelligence, pp. 425-455.
Canny, J. F. (1986). A computational approach to edge detection. IEEE Trans. Pattern Analysis
and Machine Intelligence, pp. 679-698.
Dana, K. & Anandan, P. (1993). Registration of visible and infrared images, Proceedings of the
SPIE Conference on Architecture, Hardware and Forward-looking Infrared Issues in
Automatic Target Recognition, pp. 1-12, Orlando, May 1993.
Domingo, A. (2000). Investigación sobre los Métodos de Estimación Robusta aplicados a la
resolución de los problemas fundamentales de la Fotogrametría. PhD thesis.
Universidad de Cantabria.
Douskos V.; Grammatikopoulos L.; Kalisperakis I.; Karras G. & Petsa E. (2009). FAUCCAL:
an open source toolbox for fully automatic camera calibration. XXII CIPA
Symposium on Digital Documentation, Interpretation & Presentation of Cultural Heritage,
Kyoto, 11-15 October 2009.
Fischler, M. A., & R. C. Bolles, (1981). Random sample consensus: A paradigm for model
fitting with application to image analysis and automated cartography.
Communications of the ACM, 24(6), pp. 381-395.
Förstner, W. & Guelch, E. (1987). A fast operator for detection and precise location of distinct
points, corners and center of circular features. ISPRS Conference on Fast Processing of
Photogrammetric Data, pp. 281-305, Interlaken, Switzerland.
Gabor, D. (1946) Theory of Communication, Journal of Institute for Electrical Engineering, Vol.
93, part III. n.º 26. pp. 429-457.
González-Aguilera, D. & Gómez-Lahoz, J. (2008). From 2D to 3D Through Modelling Based
on a Single Image. The Photogrammetric Record, vol. 23, nº. 122, pp. 208-227.
González-Aguilera, D.; Rodríguez-Gonzálvez, P. & Gómez-Lahoz, J. (2009). An automatic
procedure for co-registration of terrestrial laser scanners and digital cameras, ISPRS
Journal of Photogrammetry & Remote Sensing 64(3), pp. 308-316.
Grün, A. (1985). Adaptive least squares correlation: A powerful image matching technique.
South African Journal of Photogrammetry, Remote Sensing and Cartography 14 (3),
pp.175-187.
Han, J.H. & Park, J.S. (2000). Contour Matching Using Epipolar Geometry, IEEE Trans. on
Pattern Analysis and Machine Intelligence, 22(4), pp.358-370.
Harris, C. & Stephens, M. J. (1988). A combined corner and edge detector. Proceedings of
Alvey Vision Conference, pp. 147-151.
Hartley, R. I. (1997). In defence of the 8-point algorithm. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 19(6), pp. 580-593.
Heck (1981). The influence of individual observations on the result of a compensation and
the search for outliers in the observations. AVN, 88, pp. 17-34.
Hintz, R.J. & Zhao, M. Z. (1990). Demonstration of Ideals in Fully Automatic line Matching
of Overlapping Map Data, Auto-Carto 9 Proceedings. p.118.
Jarc, A.; Pers, J.; Rogelj, P.; Perse, M., & Kovacic, S. (2007). Texture features for affine
registration of thermal (FLIR) and visible images. Proceedings of the 12th Computer
Vision Winter Workshop, Graz University of Technology, February 2007.
Keller, Y. & Averbuch, A. (2006) Multisensor Image Registration via Implicit Similarity.
IEEE Trans. Pattern Anal. Mach. Intell. 28(5), pp. 794-801.
Kong, S. G.; Heo, J.; Boughorbel, F.; Zheng, Y.; Abidi, B. R.; Koschan, A.; Yi, M. & Abidi,
M.A. (2007). Adaptive Fusion of Visual and Thermal IR Images for Illumination-
Invariant Face Recognition, International Journal of Computer Vision, Special Issue on
Object Tracking and Classification Beyond the Visible Spectrum 71(2), pp. 215-233.
Kraus, K. (1997). Photogrammetry, Volume I, Fundamentals and Standard Processes. Ed.
Dümmler (4ª ed.) Bonn.
Laws, K. (1980). Rapid texture identification. In SPIE Image Processing for Missile Guidance,
pp. 376–380.
Levoy, M.; Pulli, K.; Curless, B.; Rusinkiewicz, S.; Koller, D.; Pereira, L.; Ginzton, M.;
Anderson, S.; Davis, J.; Ginsberg, J.; Shade, J. & Fulk, D. (2000). The Digital
Michelangelo Project: 3-D Scanning of Large Statues. Proceedings of SIGGRAPH.
Li, H. & Zhou, Y-T. (1995). Automatic EO/IR sensor image registration. Proceedings of
International Conference on Image Processing, Vol. 2, pp. 161-164.
Longuet-Higgins, H. C. (1981). A computer algorithm for reconstructing a scene from two
projections. Nature 293, pp. 133-135.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints, International
Journal of Computer Vision, 60(2), pp. 91-110.
Luhmann, T.; Robson, S.; Kyle, S. & Harley, I. (2006). Close Range Photogrammetry: Principles,
Methods and Applications. Whittles, Scotland, 510 pages.
Mancera-Taboada, J., Rodríguez-Gonzálvez, P. & González-Aguilera, D. (2009). Turning
point clouds into 3d models: The aqueduct of Segovia. Workshop on Geographical
Analysis, Urban Modeling, Spatial statistics, pp. 520-532, Yongin (Korea)
Mikhail, E.M. & Ackerman, F. (1976) Observations and least squares. New York. University
Press of America.
Mitka, B. & Rzonca, A. (2009). Integration of photogrammetric and 3D laser scanning data
as a flexible and effective approach for heritage documentation. Proceedings of 3D
Virtual Reconstruction and Visualization of Complex Architectures, Trento, Italy.
Pope, A. J. (1976). The statistics of residuals and the detection of outliers. NOAA Technical
Report NOS 65 NGS 1, National Ocean Service, National Geodetic Survey, US
Department of Commerce. Rockville, MD, Washington, 133pp.
Rocchini, C.; Cignoni, P. & Montani, C. (1999). Multiple textures stitching and blending on
3D objects. 10th Eurographics Rendering Workshop, pp. 127-138.
Rousseau, F.; Fablet, R. & Barillot, C. (2000) Density based registration of 3d ultrasound
images using texture information, Electronic Letters on Computer Vision and Image
Analysis, pp. 1–7.
Sánchez, N.; Arias, B.; Aguilera, D. & Lahoz, J. (2004). Análisis aplicado de métodos de
calibración de cámaras para usos fotogramétricos. TopCart 2004, ISBN 84-923511-2-
8, pp. 113-114.
Sanz, E. (2009). Control de la deformación en sólidos mediante técnicas de fotogrametría de
objeto cercano: aplicación a un problema de diseño estructural. PhD thesis.
Universidad de Vigo.
Schenk T. (1986). A Robust Solution to the Line-Matching Problem in Photogrammetry and
Cartography, Photogrammetric Engineering and Remote Sensing 52(11), pp. 1779-1784.
Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data.
Proceedings of the ACM National Conference, pp. 517–524.
Stamos, I., & Allen, P. K. (2001). Automatic registration of 3-D with 2-D imagery in urban
environments. IEEE International conference on computer vision pp. 731-736.
Straßer, W. (1974) Schnelle Kurven-und Flaechendarstellung auf graphischen Sichtgeraeten,
PhD thesis, TU Berlin.
Süli, E. & Mayers, D. (2003). An Introduction to Numerical Analysis, Cambridge University
Press, ISBN 0-521-00794-1.
Zhang, Z.; Deriche, R.; Faugeras, O. & Luong, Q-T. (1995). A robust technique for matching
two uncalibrated images through the recovery of the unknown epipolar geometry.
Artificial intelligence, 78(1-2), pp. 87-119.
Spatial Voting With Data Modeling 153
1. Introduction
Our detailed problem is one of having multiple orthogonal sensors that are each able to
observe different objects, but none of which can see the whole assembly comprised of such
objects. Further, our sensors are moving, so positional uncertainties are associated with
each observed object. Associating multiple time-based observations of a single object, by
fusing future covariance updates with our best estimate to date and knowing which
estimate to assign to which current location estimate, becomes a problem of bias
elimination or compensation. Once we have established which objects to fuse together, we
must next determine which objects are close enough together to be related into a possible
assembly. This requires a decision about which objects to group together. But if a group of
nearby objects is found, how are their spatial locations best used? Simply doing a
covariance update and combining their state estimates yields a biased answer, since the
objects are "not" the same and "not" superimposed. Therefore, naive covariance updating
yields estimates of assemblies within which we find no objects. Our proposed spatial
correlation and voting algorithm solves this spatial object fusion problem.
The spatial voting (SV) concept for the object-to-assembly aggregation problem is based on
the well-known principles of voting, geometry, and image processing using 2-D convolution
(Jaenisch et al., 2008). Our concept is an adaptation of the subjects as covered in Hall and
McCullen (2004), which are limited to multiple-sensor, single-assembly cases; we extend
them to multiple orthogonal sensors and multiple assemblies (or aspects of the same
assembly). Hall and McCullen describe general voting as a democratic process. Hard
decisions from M sensors are counted as votes with a majority or plurality decision rule. For
example, if M sensors observe a phenomenon and make an identity declaration by ranking n
different hypotheses, an overall declaration of identity is formed by summing the number of
sensors that declare each hypothesis to be true and taking the largest sum as the winner.
From this, it is easy to see that voting often reduces to probabilities or confidences and their
efficient mathematical combination. Typically, this is where Bayes' rule or other
covariance-combining methods are used, such as covariance updating and Klein's Boolean
voting logic (Klein, 2004). However, all of these methods are still probabilistic.
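The plurality rule described above can be made concrete in a few lines (a generic sketch, not code from the chapter):

```python
from collections import Counter

def fuse_declarations(declarations):
    """Plurality vote over hard identity declarations from M sensors.
    Returns the winning hypothesis and the fraction of sensors supporting it."""
    counts = Counter(declarations)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(declarations)
```

The supporting fraction is exactly the point the text makes: even "hard" voting collapses to a confidence that then needs a combination rule.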
SV reduces object location uncertainty using an analog method by stacking and tallying,
which results in a vote. It is common with probabilistic methods to assume the assembly is
at the center of the estimate with some uncertainty. In our spatial approach, we don’t make
that assumption. Rather, SV states that the object is located with confidence somewhere in
the area. The larger the area, the higher the confidence that one or more objects will be
contained within its boundary, which is the opposite approach taken in traditional
covariance confidence updating.
2. Approach
In SV, the sensor report ellipses are stacked, and the aggregation becomes a tally or vote.
This results in a growing landscape of overlapping ellipses. By collecting assembly object
estimates throughout one full epoch, a full landscape of all the best available assembly
sensor reports and locations is obtained. By convolving this array with a 2-D spatial kernel,
correlation can be achieved over a mission-controlled spatial extent chosen for the kernel.
For our application, the grid is a 128 x 128 element array with a total size of 128 x 128
meters. Each unit on the grid is 1 square meter, and we wish to fuse elements up to 5 meters
apart using a suitable 5m x 5m kernel. Spatial averaging blends high regions in the array
that are approximately 5 meters apart or less into a continuous blob. Each blob is then
isolated by calculating an adaptive threshold from the frame and zeroing everything below
the threshold. The resultant hills are projected down to the x- and y-axes independently,
and the local sub-pixel regions are extracted. The pixel regions are used to calculate spatial
extent, in the form of a covariance of the extracted region, for the newly discerned
assembly. Finally, each located assembly is evaluated to estimate the total confidence of the
assembly being significant or anomalous, and the assembly is labeled accordingly.
3. Algorithm Description
A flowchart for the SV algorithm is given in Fig. 1. SV has been implemented in MathCAD
14, and this implementation is included as the SV simulation in Figs. 8-13 later in this
chapter. The SV simulation allows Monte Carlo cases to be generated (explained in Section
5). The example case described in this section is the first Monte Carlo case (FRAME = 0)
generated by the MathCAD simulation. To initialize the SV process, first define the
dimensions of the detection space. For our example case, the grid is 128 x 128 grid units,
each 1 meter by 1 meter in size. Next, define the spatial extent over which objects are to be
fused together as assemblies. This defines the size (or spatial extent) of the spatial
convolution kernel (spatial correlator) that we apply to the detection space once the ellipse
representations defined by centroid and covariance (derived encompassing rectangles) are
stacked. The kernel size is given by
\mathrm{Kernel\ Size} = \frac{\mathrm{Spatial\ Extent}}{\mathrm{Grid\ Resolution}} \qquad (1)
where kernel size is in number of grid units, and spatial extent and grid resolution are in
meters.
The spatial convolution kernel (Jain, 1989; NASA, 1962) is the equivalent kernel shown in
Equation (2c): a 5m x 5m matrix (with very little effect past 2 m) resulting from the
convolution of the two low-pass (smoothing) kernels used in image processing given in
Equations (2a) and (2b).

\mathrm{Kernel}_{\mathrm{LowPass}} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix} \;(2a) \qquad \mathrm{Kernel}_{\mathrm{Gaussian}} = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \;(2b) \qquad \text{(both 3m x 3m)}

\mathrm{Kernel}_{\mathrm{SpatialConvolution}} = \mathrm{Kernel}_{\mathrm{LowPass}} ** \mathrm{Kernel}_{\mathrm{Gaussian}} = \begin{bmatrix} 1 & 3 & 4 & 3 & 1 \\ 3 & 11 & 16 & 11 & 3 \\ 4 & 16 & 24 & 16 & 4 \\ 3 & 11 & 16 & 11 & 3 \\ 1 & 3 & 4 & 3 & 1 \end{bmatrix} \;(2c) \qquad \text{(5m x 5m)}
The spatial convolution kernel in Equation (2c) defines the shape of a Gaussian distribution,
and by convolving the spatial convolution kernel with the detection space, it is converted
into a correlation map describing how each pixel neighborhood in the detection space is
correlated with the spatial convolution kernel.
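Equation (2c) can be checked numerically; the sketch below builds the 5x5 kernel by full 2-D convolution of the two 3x3 kernels (numpy only, with an explicit full-convolution helper):

```python
import numpy as np

def conv2_full(a, b):
    """Full 2-D convolution via explicit accumulation (numpy only)."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

LP = np.array([[1, 1, 1], [1, 3, 1], [1, 1, 1]], float)   # Eq. (2a)
G  = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float)   # Eq. (2b)
K5 = conv2_full(LP, G)                                    # Eq. (2c), 5x5
K9 = conv2_full(K5, K5)                                   # one self-convolution: 9x9
```

Self-convolving K5 is also the enlargement step referred to in Equations (3)-(4): each self-convolution grows the equivalent kernel (5x5 to 9x9, and so on) until it matches the desired spatial extent.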
To enlarge the kernel to match the spatial extent, the spatial convolution kernel in Equation
(2c) is convolved with itself until the equivalent kernel size corresponds to the extent. The
number of times that the kernel in Equation (2c) is convolved with itself is given in
Equation (3), and the final equivalent kernel, obtained by convolving the original spatial
convolution kernel n times with itself, is given in Equation (4), where Kernel_{n-1} is the
result of convolving the spatial convolution kernel with itself n-1 times.
The estimated object’s position is described by the sensor report using position centroid and
covariance (which defines the location and size of the uncertainty region). The centroid and
covariance are given by
\bar{X} = \begin{bmatrix} \bar{x} \\ \bar{y} \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix}

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i

\sigma_{xx} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2 \qquad \sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y}) \qquad \sigma_{yy} = \frac{1}{N}\sum_{i=1}^{N}(y_i-\bar{y})^2 \qquad (5)
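Equation (5) in code (population 1/N form, as in the chapter):

```python
import numpy as np

def report_centroid_cov(xs, ys):
    """Centroid and 2x2 covariance of N position reports, Eq. (5)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    dx, dy = xs - xs.mean(), ys - ys.mean()
    centroid = np.array([xs.mean(), ys.mean()])
    cov = np.array([[(dx * dx).mean(), (dx * dy).mean()],
                    [(dx * dy).mean(), (dy * dy).mean()]])
    return centroid, cov
```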
where xi and yi are the individual sensor reports of position estimates used to derive the
centroid and covariance. We begin with the sensor report consisting of the position centroid
\bar{X} and the covariance \Sigma. From the covariance, the one-sigma distances from the
centroid along the semi-major (a) and semi-minor (b) axes of the ellipse are given by
a^2 = \tfrac{1}{2}\left[\sigma_{xx}+\sigma_{yy}+\sqrt{(\sigma_{yy}-\sigma_{xx})^2+4\sigma_{xy}^2}\,\right] \quad \text{(along semi-major axis)}

b^2 = \tfrac{1}{2}\left[\sigma_{xx}+\sigma_{yy}-\sqrt{(\sigma_{yy}-\sigma_{xx})^2+4\sigma_{xy}^2}\,\right] \quad \text{(along semi-minor axis)} \qquad (6)
and the angle of rotation of the semi-major axis is given in Equation (7); the lengths a and b
and the rotation angle are then used to define a rotated ellipse of the form

\theta = \frac{1}{2}\arctan\left(\frac{2\sigma_{xy}}{\sigma_{yy}-\sigma_{xx}}\right) \qquad \frac{\left[(x-h)\cos\theta+(y-k)\sin\theta\right]^2}{a^2}+\frac{\left[(x-h)\sin\theta-(y-k)\cos\theta\right]^2}{b^2} = 1 \qquad (7)

where h is the centroid x value, k is the centroid y value, and a and b are defined in Equation
(6). The ellipse in Equation (7) defines the perimeter of the elliptical region; to define the
entire region encompassed by the ellipse, we simply change the equality (=) in Equation (7)
to the less-than-or-equal-to inequality, so that the function includes not only the boundary
but also the locations contained within it.
Because the actual location of each object is unknown, the only information that is available
is contained in the sensor report in the form of a centroid and covariance. It is an incorrect
assumption that the object is located at the center of the ellipse; because if this were true
then the covariance information would not be needed since the true position would be
defined by the centroid alone.
The semi-major axis length, semi-minor axis length, and rotation angle are converted into
covariance using

\Sigma = R(\theta)\begin{bmatrix} a^2 & 0 \\ 0 & b^2 \end{bmatrix} R(\theta)^{T}, \qquad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad (8)

Fig. 2 shows two examples of starting with the rotation angle and the semi-major and
semi-minor axis lengths and deriving the covariance matrix and corresponding ellipse.
Fig. 2. Starting with the semi-major axis length, semi-minor axis length, and rotation angle,
the covariance matrices and ellipses are derived using Equation (8).
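Equations (6)-(8) form a round trip between covariance and ellipse parameters. A numpy sketch follows; note that for the rotation angle it uses the common atan2 form, which may differ from the chapter's Equation (7) by axis convention:

```python
import numpy as np

def axes_from_cov(cov):
    """Semi-axis lengths and rotation angle from a 2x2 covariance, Eqs. (6)-(7)."""
    sxx, sxy, syy = cov[0, 0], cov[0, 1], cov[1, 1]
    r = np.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)
    a = np.sqrt(0.5 * (sxx + syy + r))           # semi-major
    b = np.sqrt(0.5 * (sxx + syy - r))           # semi-minor
    theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)
    return a, b, theta

def cov_from_axes(a, b, theta):
    """Inverse conversion (Eq. (8)): rotate diag(a^2, b^2) by theta."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ np.diag([a ** 2, b ** 2]) @ R.T
```

The eigenvalues of the resulting covariance are a^2 and b^2, which is the geometric content of Equation (6).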
If the ellipses are placed into the detection grid directly, artifacts are introduced by aliasing
and pixelization of the ellipse boundary. Also, as an ellipse becomes small relative to the
detection grid size, its overall shape approaches a rectangle. Therefore, to minimize
scale-dependent artifacts, encompassing rectangles with well-defined boundaries replace
each ellipse (Press et al., 2007). The semi-major axis of the ellipse is the hypotenuse of the
triangle, and completing the rectangle yields the first approximation to an equivalent
rectangle. Finally, the width of the rectangle is scaled to the semi-minor axis length to
preserve the highest spatial confidence extent reported in the covariance. The lengths of the
sides of the rectangle placed instead of the ellipse are given by
158 Sensor Fusion and Its Applications
\Delta x = 2a\cos\theta, \quad \Delta y = \max(2a\sin\theta,\; 2b\sin\theta) \qquad \text{if } \sigma_{xx} > \sigma_{yy}

\Delta x = \max(2a\sin\theta,\; 2b\sin\theta), \quad \Delta y = 2a\cos\theta \qquad \text{if } \sigma_{yy} > \sigma_{xx} \qquad (9)

where \theta is again the rotation angle of the semi-major axis given in Equation (7). The
length of the semi-minor axis modifies the size of the rectangle subject to the conditions
given in Equation (9), which preserve the axis with the greatest spatial location confidence.
The value at each grid location inside the rectangle is

\mathrm{Value} = \frac{1}{N} = \frac{1}{\Delta x\, \Delta y} \qquad (10)
where \Delta x and \Delta y are the sides of the rectangle. This form is used if the analysis
assumes that a smaller area increases the confidence that it contains an object. If the
converse is assumed, then N = 1, which implies that the confidence of an object being
contained in a larger area is weighted higher than that in a smaller area when the spatial
vote (stacking) occurs. Either form may be used to determine how many pedigree
covariance reports are associated with each assembly, by using the sum of the values
mapped into the assembly location as a checksum threshold.
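A direct transcription of Equations (9)-(10) follows. Note that, as printed, both arguments of the max carry sin(theta); since a >= b that max always selects the 2a sin(theta) term, so one of them is plausibly a cos in the original. The transcription keeps the printed form:

```python
import numpy as np

def rect_sides(a, b, theta, sxx, syy):
    """Side lengths of the rectangle that replaces the ellipse, Eq. (9) as printed."""
    if sxx > syy:
        dx = 2.0 * a * np.cos(theta)
        dy = max(2.0 * a * np.sin(theta), 2.0 * b * np.sin(theta))
    else:
        dx = max(2.0 * a * np.sin(theta), 2.0 * b * np.sin(theta))
        dy = 2.0 * a * np.cos(theta)
    return dx, dy

def cell_value(dx, dy, smaller_is_better=True):
    """Per-cell vote weight, Eq. (10): 1/N with N = dx*dy, else N = 1."""
    return 1.0 / (dx * dy) if smaller_is_better else 1.0
```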
Now that the rectangle extent and value are defined, the rectangles are stacked into the
detection grid one at a time. This is accomplished by adding the value in each rectangle (1
or 1/Area, depending on how scoring is done for testing sensor report overlap: if 1's are
placed, the number of overlaps is the maximum value in the subarray; if 1/Area is used, a
sum greater than Area indicates overlap) to the current location in the grid where the
rectangle is being stacked. As an example (obtained from the MathCAD 14 implementation
in Figs. 8-13), Fig. 3 (left) shows 38 original sensor reports along a stretch of road, and (left
center) shows the detection grid after stacking the 38 rectangles that represent the sensor
reports and applying the spatial convolution kernel.
Fig. 3. (Left) Original 38 sensor reports, (Left Center) the detection grid after stacking and
applying the spatial convolution kernel, (Right Center) After applying the threshold, and
(Right) locations in the graph on the left converted into a 0/1 mask.
Next, a threshold is automatically calculated to separate the background values in the
detection grid from those that represent the assemblies (anomaly detection). This threshold
is calculated as the minimum value of the non-zero grid locations plus a scale factor times
the range of the values in the detection grid (set in the MathCAD implementation to 0.3
times the range, maximum minus minimum, of the non-zero values) in Fig. 3 (left center).
Values below this threshold are set to zero, while those above it retain their values. Fig. 3
(right center) shows the result of applying the threshold to the detection grid in Fig. 3 (left
center); the resulting assembly blobs (fused objects) are shown. Fig. 3 (right) shows the
mask formed by setting all values above the threshold to one and all those below to zero.
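The thresholding step above can be sketched directly (the 0.3 scale factor is the chapter's MathCAD setting):

```python
import numpy as np

def adaptive_threshold(grid, scale=0.3):
    """Zero everything below min(nonzero) + scale * range(nonzero); also return
    the 0/1 mask of surviving locations."""
    nz = grid[grid > 0]
    if nz.size == 0:
        return grid.copy(), np.zeros_like(grid, dtype=int)
    t = nz.min() + scale * (nz.max() - nz.min())
    kept = np.where(grid >= t, grid, 0.0)
    mask = (kept > 0).astype(int)
    return kept, mask
```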
In order to isolate the assemblies, which are now simply blobs in the detection grid, we
compute blob projections onto both the x-axis and the y-axis of the grid by summing the
mask in Fig. 3 (right) across the rows (y-axis projection) and down the columns (x-axis
projection). Fig. 4 shows examples of these assembly shadow projections, which are
calculated using

DX_j = \sum_{k=0}^{ngrid-1} D_{k,j} \qquad \text{and} \qquad DY_i = \sum_{k=0}^{ngrid-1} D_{i,k} \qquad (11)

where D is the array shown in Fig. 3 (right), ngrid is the grid size (128 in our example), DX_j
is the x-axis projection for the jth column, and DY_i is the y-axis projection for the ith row.
Fig. 4. Example assembly shadow projections for the x-axis (left) and y-axis (right) for the
mask shown in Fig. 3 (right).
Once these assembly shadow projections are calculated, they are renormalized so that all
non-zero locations have a value of one while zero locations remain zero. Using the
assembly shadow projections, horizontal and vertical lines are placed across the detection
grid corresponding to the transition points from zero to one and from one to zero in the
graphs in Fig. 4. Regions of the grid formed by intersections of these lines, labeled 1
through 45 in Fig. 5, are the candidate assemblies that are identified (including zero
frames).
Each of the 45 candidate assembly subframes is processed to remove zero frames by
determining whether any non-zero locations exist within its boundary. This is done by
extracting the assembly subframe into its own separate array and calculating the maximum
value of the array. If the maximum value is non-zero, then at least one grid unit in the array
is part of an object assembly, and the subframe is kept.
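The projection, transition detection, and zero-frame removal steps can be sketched together (a minimal numpy version; helper names are illustrative, not from the chapter):

```python
import numpy as np

def shadow_regions(mask):
    """Eq. (11): project the 0/1 mask onto each axis, use the 0->1 / 1->0
    transitions to delimit candidate assembly subframes, and keep only the
    non-empty (non-zero-frame) intersections."""
    px = (mask.sum(axis=0) > 0).astype(int)   # renormalized x-axis projection DX
    py = (mask.sum(axis=1) > 0).astype(int)   # renormalized y-axis projection DY

    def runs(p):
        """[start, stop) index pairs of each run of ones in a 0/1 vector."""
        d = np.diff(np.concatenate([[0], p, [0]]))
        return list(zip(np.where(d == 1)[0], np.where(d == -1)[0]))

    frames = []
    for y0, y1 in runs(py):
        for x0, x1 in runs(px):
            if mask[y0:y1, x0:x1].max() > 0:   # discard zero frames
                frames.append((y0, y1, x0, x1))
    return frames
```

Re-running the same routine inside each kept subframe gives the second-pass isolation that reduces the 45 candidates of Fig. 5 to the final set.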
Fig. 5. (Left) Identification of candidate assembly subframes using the shadow projections
of Equation (11) and Fig. 3; (Center) applying the shadow projection algorithm a second
time to region 7 to further isolate assemblies. After processing the subframes a second time,
a total of 12 candidate assemblies have been located.
This is repeated for each of the identified subframes. Once the subframes have been
processed, the process is repeated on each subframe, one at a time, to further isolate regions
within it into candidate assemblies. This breaks subframes into smaller subframes, and for
every subframe found, the centroid and covariance are calculated. This information is used
by the object and assembly tracking routines to improve object and assembly position
estimates as a function of motion and time. As a result of this processing, the number of
candidate assembly subframes in Fig. 5 is reduced from 45 to 12 (the number of isolated red
regions in the grid). Applying the SV algorithm to the stacked rectangles in Fig. 6 (center)
yields the final result shown in Fig. 6 (right).
Fig. 6. (Left) Sensor reports, (Center) rectangles stacked in the detection grid with SV
smoothing applied, (Right) final result after applying spatial voting.
4. Heptor
From an isolated SV target, we have available the geospatial distribution attributes (the X
and Y, or latitude and longitude, components characterized independently, including
derivatives across cluster size transitions) and, if physics-based features exist, Brightness
(including derivatives across cluster size transitions), Amplitude, Frequency, Damping, and
Phase. Each of these attributes is characterized with a fixed template of descriptive
parametric and non-parametric (fractal) features, collectively termed a Heptor (a vector of
seven features) and defined in Equations (12) - (18) as:
\sigma = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}(x_j-\bar{x})^2} \qquad (12)

\mathrm{Skew} = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j-\bar{x}}{\sigma}\right)^{3} \qquad (13)

\mathrm{Kurt} = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j-\bar{x}}{\sigma}\right)^{4} - 3 \qquad (14)

M_6 = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j-\bar{x}}{\sigma}\right)^{6} - 15 \qquad (15)

M_8 = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{x_j-\bar{x}}{\sigma}\right)^{8} - 105 \qquad (16)

D = \lim_{J \to 0}\frac{\log N_J}{\log(1/J)} \qquad (17)

D_H = 1 + \log\left(\sqrt{\frac{1}{N-1}\sum_{j=1}^{N-1}\left(\frac{x_{j+1}-x_j}{\mathrm{Range}}\right)^{2}}\,\right) \qquad (18)
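The parametric part of the Heptor, Equations (12)-(16), can be sketched as follows (the two fractal features of Equations (17)-(18) are omitted from this sketch):

```python
import numpy as np

def heptor_parametric(x):
    """Eqs. (12)-(16): sample std (1/(N-1)), skew, and excess 4th/6th/8th
    standardized moments (3, 15, 105 are the Gaussian reference values)."""
    x = np.asarray(x, float)
    n = x.size
    sigma = np.sqrt(((x - x.mean()) ** 2).sum() / (n - 1))
    z = (x - x.mean()) / sigma
    return np.array([sigma,
                     (z ** 3).mean(),
                     (z ** 4).mean() - 3.0,
                     (z ** 6).mean() - 15.0,
                     (z ** 8).mean() - 105.0])
```

On a large Gaussian sample the last four entries tend to zero, which is why the constants 3, 15, and 105 are subtracted: the features measure departure from normality.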
The basic Heptor can be augmented by additional features when they exist; in their
absence, the Heptor represents an excellent starting point. Equations (19) to (23) list some
additional features that can be used to augment the Heptor, for example

\bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j \quad (19) \qquad \mathrm{Min} = \min(x) \quad (20)
The features are associated with a category variable (class): 1 = no target (nominal) and
2 = target (off-nominal). The next step is to derive a classifier that associates the features
with class in a practical fashion. Here we use the Data Model, since very few examples are
available, not enough to discern statistical behavior. For this, we want to populate a
knowledge base that encodes all examples encountered as observed and identified species.
This mandates a scalable, bottom-up approach that is well suited to the Group Method of
Data Handling (GMDH) approach to polynomial network representation.
5. SV Simulation in MathCAD 14
SV has been implemented in MathCAD 14, C, and Java. Included in this work (Figs. 8-13) is
the MathCAD 14 implementation, which facilitates generating the Monte Carlo ensemble
cases used to derive the Data Model decision architecture described in later sections of this
chapter.
[Figs. 8-13: MathCAD 14 listing of the SV algorithm; the worksheet content does not
survive text extraction and is summarized here. Figs. 8 and 9 contain the main program
body in lettered sections: simulation parameters (a 128 meter by 128 meter grid at 1 meter
resolution, ntgt = 3 targets per epoch, up to sx = 6 components per target, isigma = 1 for
placing ellipses); B. Kernel Initialization (the initial 3 x 3 LowPass and Gaussian kernels,
the kernel size ksize = 5 needed to detect the selected maximum separation, and the number
of LP/G passes required to build the full kernel); C. Calculate final correlator kernel;
target, road, and false alarm placement (triangles are truth targets, X's are detections);
and I. Heptor Candidate Assemblies for Decision Architecture, where kkk1 is the Monte Carlo
parameter to vary (increment by +1) and FRAME increments using the Animation tool under
Tools/Animation/Record. Figs. 10-13 list the individual MathCAD function programs,
including MAPCOMP (detection centroid bookkeeping), ADELL (maps points around an ellipse
for plotting), CALCOV (determines non-zero locations in each candidate assembly and calls
COVAR), mean/min2/max2 helper routines, HEP (the Heptor feature calculation), PLCR (places
rectangles, calling E2RC, ellipse to rectangle, for multiple covariance ellipses and
returning an array of stacked rectangles with the min and max x and y extents of each),
HEPFRAME (calls the Heptor for each candidate assembly frame, determines whether a truth
location falls within the assembly for use in the Data Model K-G algorithm, and appends
the features to feats.out), an FFT-based 2-D correlation routine, and MAX (maximum
non-zero value in a 2-D array).]
The first two pages (Figs. 8 and 9) list the overall structure of the SV algorithm
implementation (the main program body), and each of these two pages has been broken up into
lettered sections with brief descriptions of each section. The remaining four pages
(Figs. 10-13) are individual MathCAD programs that implement each of the specific functions
used in SV, along with a general description of each function. When the MathCAD 14 document
is loaded, a single case is generated. To vary the road and object placements, new
individual cases can be generated by increasing the value of kkk1 (Fig. 9, Section I at the
bottom of the figure) in integer steps. Alternatively, Monte Carlo cases can be generated
using the Tools/Animation/Record pull-down menu to load the movie recording capability in
MathCAD 14. Place a fence around the kkk1 equation, set the FRAME variable to range from 0
to the number of Monte Carlo cases desired, and set the time step to 1. The resultant
HEPTOR features for each Monte Carlo case are written into the file feats.out by the
HEPFRAME function (note: delete this file from the directory containing the MathCAD 14
document before starting this process so that only the selected Monte Carlo cases are
written into the file).
6. Classifier KG algorithm
To derive a general mathematical Data Model (Jaenisch and Handley, 2003), it is necessary
to combine multiple input measurement variables to provide a classifier in the form of an
analytical math model. Multivariate linear regression is used to derive an O(3n) Data Model
fusing multiple input measurement sources or data sets and associated target label
definitions. This is accomplished using a fast algorithm (flowchart in Fig. 14) that derives
the coefficients of an approximation to the Kolmogorov-Gabor (KG) polynomial (which
Kolmogorov and Gabor proved to be a universal function, or mathematical model, for any
dynamic process), which takes all available inputs in all possible combinations raised to
all possible powers (orders).
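For reference, the KG polynomial referred to above has the standard form found in the GMDH literature (shown here as a sketch, not reproduced from this chapter):

```latex
y = a_0 + \sum_{i=1}^{n} a_i x_i
    + \sum_{i=1}^{n}\sum_{j=i}^{n} a_{ij}\, x_i x_j
    + \sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=j}^{n} a_{ijk}\, x_i x_j x_k + \cdots
```

with one coefficient for every combination of inputs at every order, which is why the full multinomial grows impractically fast.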
The full KG multinomial is impractical to derive directly. One method for approximating
the KG polynomial is the Group Method of Data Handling (GMDH) algorithm (Madala and
Ivakhnenko, 1994), which the author has improved upon as Data Modeling. Data Modeling uses
multivariable linear regression to fit combinations of input variables (up to a
user-specified number at a time), finding the minimum error using either correlation or
root sum square (RSS) differences between the regression output and the objective function.
The best of these combinations (a user-specified number) are retained and used as
metavariables (new inputs), and the process is repeated at the next layer. Layering is
terminated when the desired overall RSS difference is reached (Jaenisch and Handley, 2009).
Figs. 16-20 on the following pages contain a MathCAD 14 implementation of the Data Model
K-G algorithm that was used to build the decision architecture in Section 7; as before,
Fig. 16 is broken up into sections for explanation.
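The layer-by-layer fitting just described can be sketched in a few lines. This is an illustrative GMDH-style approximation under assumed details (a quadratic two-input building block, least-squares fit, RSS ranking); the parameter names nlay and nfwd mirror Fig. 16, but the code is not the author's Data Model implementation.

```python
# GMDH-style layering sketch: fit every pair of inputs with a quadratic
# building block, keep the best 'nfwd' fits as metavariables, repeat.
import itertools
import numpy as np

def block_design(x1, x2):
    # quadratic building block: 1, x1, x2, x1*x2, x1^2, x2^2
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

def gmdh_layered(X, y, nlay=2, nfwd=3):
    Z = X
    for _ in range(nlay):
        candidates = []
        for i, j in itertools.combinations(range(Z.shape[1]), 2):
            A = block_design(Z[:, i], Z[:, j])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            pred = A @ coef
            rss = float(np.sum((y - pred) ** 2))  # root-sum-square criterion (squared)
            candidates.append((rss, pred))
        candidates.sort(key=lambda c: c[0])
        # best fits become the next layer's metavariables (new inputs)
        Z = np.column_stack([p for _, p in candidates[:nfwd]])
    return Z[:, 0], candidates[0][0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] * X[:, 1] + X[:, 2] ** 2
pred, rss = gmdh_layered(X, y)
print(rss)
```

In a full implementation the retained building blocks are stored so the layered model can be replayed on new data; here only the fitted values are propagated, which is enough to show the layering idea.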
7. Decision Architecture
It is possible to identify an optimal subset of the exemplars using available support vector
finding machines; however, a good rule of thumb is to use 10% of the available exemplars.
The SV algorithm in Figs. 8-13 was run for 50 epochs (FRAME ranging from 0 to 49),
generating a total of 320 exemplars. The first 1/3 of these points (107 exemplars) was used
as input into the MathCAD 14 document in Figs. 16-20. Fig. 16 shows the output results
from this Data Model graphically at the bottom of the page. Two thresholds were set (a lower
threshold at 0.89 and an upper threshold at 1.92), and the exemplar cases that fell
between the two thresholds were pulled out as the support vectors (87 of the 107 original
cases were selected as support vectors) using the EXTR function provided.
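The threshold pull just described amounts to a boolean mask over the Data Model outputs. A minimal sketch (the function name and interface are illustrative, not EXTR's actual MathCAD signature):

```python
# Keep only exemplars whose Data Model output falls strictly between the
# lower and upper thresholds; these become the "support vector" cases.
import numpy as np

def extract_support_vectors(X, model_out, lthresh=0.89, uthresh=1.92):
    keep = (model_out > lthresh) & (model_out < uthresh)
    return X[keep], keep

X = np.arange(10).reshape(5, 2).astype(float)
out = np.array([0.5, 1.0, 1.5, 2.0, 0.9])
sv, mask = extract_support_vectors(X, out)
print(len(sv))  # 3 exemplars fall inside (0.89, 1.92)
```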
Starting with these 87 exemplars, a new Data Model was generated using the decision
architecture construction/execution flowchart in Fig. 15. Each node was constructed using
the exemplars siphoned from the previous node (using EXTUP in the MathCAD document).
The number of layers (nlay) was changed to 2 to make the Data Models shorter for
publication in this work. A total of 3 nodes (a bulk filter plus 2 resolvers) were required
to learn these 87 support vector exemplars (with care taken to preserve each Data Model's
BASIC source code written out from the MathCAD 14 document at each siphon point, along
with the exemplar data).
[Fig. 15 flowchart: HEPTOR data (SV) feeds a Determine Support Vectors step and then
Node 1 (Bulk Filter). At each node the output z is converted to a confidence; if
Conf > 90%, the example is DECLAREd, otherwise it is deferred to the next node: Node 2
(Resolver 1), and so on through Node N (Resolver N-1). Examples still below 90% confidence
at the final resolver are REJECTed.]
[Fig. 16: Part 1 of the MathCAD 14 implementation to derive a multivariable Data Model;
the worksheet content does not survive text extraction and is summarized here.
A. Set parameters: prec = 2 (calculation precision), exitc = 0.0001 (exit criteria),
nfwd = 3 (number of metavariables carried forward), nlay = 4 (number of layers),
maxvar = 3 (maximum inputs per building block), maxorder = 3 (maximum building block
order), X1 = READPRN("feats.out") (data file from SV), nins = 7 (number of inputs),
outcol = 7 (output data column), nsamp = 107 (number of samples used to build the Data
Model). B. Pull out samples and sort into ascending order on output (for visualization).
C. Supply names for inputs and output (input names z1 to z7; output name "y").
D. Z-score inputs and output using rflr, mean, and ADEV. E. Perform the multivariable
linear regression (K-G) algorithm process. F. Undo the Z-scoring on the output to score
the model (rss = 2.99); the graph at the bottom of the page plots kgmodel and Y against
the lower threshold lthresh = 0.89 and the upper threshold uthresh = 1.92.]
[Fig. 17: supporting utility programs; the worksheet content does not survive text
extraction. Recoverable marginal descriptions: rflr(x, p) rounds x to p decimal places of
precision; mean(A) calculates the mean of a 1-D vector A; ADEV(x, n) calculates the
average deviation of x (n is the number of points in x); COMBIN(A) returns combinations,
where each column of A is a variable and each row an example; REMDUP(A) removes any
duplications in the output from COMBIN; FIT(n, v, din, dout, pc) performs the
multivariable linear regression; MV embeds the Z-score of an input in the BASIC code
export; NAM(i, j, k) makes temporary file names; and p(x, m, s) is the normal probability
distribution exp(-0.5((x - m)/s)^2).]
Fig. 17. Part 2 of MathCAD 14 implementation to derive a multivariable Data Model.
[Figs. 18-20: the remaining MathCAD programs of the Data Model K-G implementation; the
worksheet content does not survive text extraction and is summarized from the recoverable
marginal notes. RBLOCK uses the combinatorial algorithm (COMBIN/REMDUP) to determine power
combinations and calls FIT; SCR ranks candidate regressions; NEST and DM manage the layered
model construction; CHKEQN reads Data Model coefficients from file and generates Data Model
values; WPG and WOB perform the BASIC source code exportation (writing all except the
header); SWT switches value locations and SORT2/CSORT implement the sorting routines
(upper and lower sort branches); the final model is written to fnldm.prn.]
Fig. 21 shows the results of processing all 87 training exemplars through the bulk filter
and 2 resolver Data Models in this process (Jaenisch et al., 2002; 2010). All examples for
which the Data Model returns a value inside the lower and upper thresholds (labeled
Declare on each graph) are declared targets, while those outside the thresholds are
deferred until the last resolver, where a reject decision is made. Rollup equations for
each node in the decision architecture are also provided under each graph in Fig. 21. The
coefficient in front of each variable is derived by first determining how many times the
variable occurs in the actual multinomial, normalizing each count by dividing by the
number of occurrences of the least frequently occurring variable, summing these normalized
values, and dividing each by the sum. By normalizing by the least frequently occurring
variable first and then turning the number into a percentage by dividing by the result
sum, the coefficients describe the key and critical feature contributions in the full
Data Model.
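The rollup-coefficient recipe just described is a short counting exercise. A sketch (the occurrence counts below are invented for illustration, not taken from Fig. 21):

```python
# Count occurrences of each variable in the multinomial, normalize by the
# least frequent variable's count, then divide by the sum so the rollup
# coefficients total 1 (a percentage-style contribution).
from collections import Counter

def rollup_coefficients(variable_occurrences):
    counts = Counter(variable_occurrences)
    least = min(counts.values())
    normalized = {v: c / least for v, c in counts.items()}
    total = sum(normalized.values())
    return {v: n / total for v, n in normalized.items()}

occ = ["StdDev"] * 4 + ["Skew"] * 2 + ["Kurt"] * 2
coeffs = rollup_coefficients(occ)
print(coeffs)
```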
[Fig. 21 graphs (plot content not reproducible in text form): Bulk Filter with Declare
(0.6 < Out < 1.28) and Defer (Out < 0.6) regions; Resolver 1 with DC (0.79 < Out < 1.47)
and R (Out < 0.79); Resolver 2 with DC (0.95 < Out < 1.07), achieving Pd = 1, Pfa = 0 on
the training exemplars. Under each graph is the node's rollup equation, expressing it as a
weighted combination of HEPTOR features (StdDev, Skew, Kurt, DfJ, DfH, and M6).]
Fig. 21. Results from processing the training examples through the bulk filter Data Model
classifier and the ambiguity resolvers. The entire decision architecture flows through the
bulk filter and, if required, through as many of the ambiguity resolvers as needed until
either a reject or declare is determined.
The 3 BASIC files saved from deriving each of the nodes were combined into the single
decision architecture BASIC program given in Figs. 22 and 23. The value for each node in
the decision architecture was converted into a confidence using the normal probability
distribution defined by

Conf = exp(-0.5((Val - m)/s)^2)    (25)

where Val is the value returned by the individual node in the decision architecture, m is
the average of the upper and lower declare thresholds, and s (normally the standard
deviation of the distribution) is the value required so that Equation 25 returns a value
of 0.9 (90% confidence) at the declaration thresholds. Since the confidence equals 90%
exactly at the declaration thresholds, no potential target with a confidence of less than
90% is ever declared; such cases are labeled defer by the decision architecture. All 320
examples were processed through the decision architecture, yielding a probability of
detection (Pd) of 0.65 and a probability of false alarm (Pfa) of 0.16.
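Choosing s so that the confidence curve passes through 0.9 at the thresholds is a one-line solve. A sketch using the Section 7 thresholds (the function name is illustrative, not from the chapter's BASIC code):

```python
# Equation 25 sketch: m is the midpoint of the declare thresholds, and s
# is set so conf(threshold) equals the 0.9 target exactly.
import math

def node_confidence(val, lthresh, uthresh, target=0.9):
    m = 0.5 * (lthresh + uthresh)
    half = 0.5 * (uthresh - lthresh)
    s = half / math.sqrt(-2.0 * math.log(target))  # conf(m +/- half) = target
    return math.exp(-0.5 * ((val - m) / s) ** 2)

conf = node_confidence(0.89, lthresh=0.89, uthresh=1.92)
print(round(conf, 3))  # 0.9 at the lower threshold by construction
```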
CLS
ON ERROR RESUME NEXT
OPEN "feats.out" FOR INPUT AS #1
OPEN "dearch.out" FOR OUTPUT AS #2
DO UNTIL EOF(1)
INPUT #1, z1, z2, z3, z4, z5, z6, z7, trth
'z1 to z7 are heptor elements
'trth=truth class from SV file for comparison
GOSUB node1
IF p1 >= .9 THEN
class=1
ELSE
GOSUB node2
IF p2 >= .9 THEN
class=1
ELSE
GOSUB node3
IF p3 >= .9 THEN
class=1
ELSE
class=2
END IF
END IF
END IF
PRINT #2, class, trth
LOOP
CLOSE
END
node1:
aa=.35+.47*(z6-1.52)/(.14)
aa=aa-.52*(z1-370.42)/(73.83)
aa=aa+.4*(z2+.06)/(.56)
aa=aa-.07*(z6-1.52)/(.14)*(z6-1.52)/(.14)
aa=aa+.37*(z1-370.42)/(73.83)*(z6-1.52)/(.14)
aa=aa+.17*(z6-1.52)/(.14)*(z2+.06)/(.56)
aa=aa-.09*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)
aa=aa+.33*(z1-370.42)/(73.83)*(z2+.06)/(.56)
aa=aa+.07*(z2+.06)/(.56)*(z2+.06)/(.56)
aa=aa-.04*(z6-1.52)/(.14)*(z6-1.52)/(.14)*(z6-1.52)/(.14)
aa=aa-.02*(z6-1.52)/(.14)*(z1-370.42)/(73.83)*(z6-1.52)/(.14)
aa=aa+.01*(z2+.06)/(.56)*(z6-1.52)/(.14)*(z6-1.52)/(.14)
aa=aa-.02*(z6-1.52)/(.14)*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)
aa=aa-.05*(z6-1.52)/(.14)*(z1-370.42)/(73.83)*(z2+.06)/(.56)
aa=aa+.16*(z2+.06)/(.56)*(z2+.06)/(.56)*(z6-1.52)/(.14)
aa=aa-.02*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)*(z2+.06)/(.56)
aa=aa+.09*(z2+.06)/(.56)*(z2+.06)/(.56)*(z1-370.42)/(73.83)
aa=aa-.02*(z2+.06)/(.56)*(z2+.06)/(.56)*(z2+.06)/(.56)
ab=.19-.14*(z3+.91)/(.53)
ab=ab-.39*(z1-370.42)/(73.83)
ab=ab+.41*(z6-1.52)/(.14)
ab=ab+.02*(z3+.91)/(.53)*(z3+.91)/(.53)
ab=ab-.05*(z1-370.42)/(73.83)*(z3+.91)/(.53)
ab=ab-.05*(z3+.91)/(.53)*(z6-1.52)/(.14)
ab=ab-.06*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)
ab=ab-.01*(z1-370.42)/(73.83)*(z6-1.52)/(.14)
ab=ab-.1*(z6-1.52)/(.14)*(z6-1.52)/(.14)
ab=ab-.03*(z3+.91)/(.53)*(z3+.91)/(.53)*(z3+.91)/(.53)
ab=ab-.09*(z3+.91)/(.53)*(z1-370.42)/(73.83)*(z3+.91)/(.53)
ab=ab-.04*(z6-1.52)/(.14)*(z3+.91)/(.53)*(z3+.91)/(.53)
ab=ab+.08*(z3+.91)/(.53)*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)
ab=ab-.16*(z3+.91)/(.53)*(z1-370.42)/(73.83)*(z6-1.52)/(.14)
ab=ab+.02*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)
ab=ab-.06*(z6-1.52)/(.14)*(z6-1.52)/(.14)*(z3+.91)/(.53)
ab=ab-.01*(z1-370.42)/(73.83)*(z1-370.42)/(73.83)*(z6-1.52)/(.14)
ab=ab+.08*(z6-1.52)/(.14)*(z6-1.52)/(.14)*(z1-370.42)/(73.83)
ab=ab-.01*(z6-1.52)/(.14)*(z6-1.52)/(.14)*(z6-1.52)/(.14)
ba=-.04+2*ab
ba=ba+.11*aa
ba=ba+.39*(z2+.06)/(.56)
ba=ba+1.3*ab*ab
ba=ba-2.28*aa*ab
ba=ba-.43*ab*(z2+.06)/(.56)
ba=ba+.24*aa*aa
ba=ba+.06*aa*(z2+.06)/(.56)
ba=ba-.03*(z2+.06)/(.56)*(z2+.06)/(.56)
ba=ba-.55*ab*ab*ab
ba=ba-.24*ab*aa*ab
ba=ba+.46*(z2+.06)/(.56)*ab*ab
ba=ba+.38*ab*aa*aa
ba=ba-2.29*ab*aa*(z2+.06)/(.56)
ba=ba-.67*aa*aa*aa
ba=ba-.79*(z2+.06)/(.56)*(z2+.06)/(.56)*ab
ba=ba+1.34*aa*aa*(z2+.06)/(.56)
ba=ba+.55*(z2+.06)/(.56)*(z2+.06)/(.56)*aa
ba=ba-.1*(z2+.06)/(.56)*(z2+.06)/(.56)*(z2+.06)/(.56)
da1=ba*.5+1.54
p1=EXP(-.5*((da1-.94)/.75) ^ 2)
RETURN
node2:
aa=-.58-.27*(z3+.98)/(.5)
aa=aa-.55*(z1-343.23)/(50.85)
aa=aa+1.12*(z6-1.54)/(.14)
aa=aa+1.16*(z3+.98)/(.5)*(z3+.98)/(.5)
aa=aa+.72*(z1-343.23)/(50.85)*(z3+.98)/(.5)
aa=aa-.53*(z3+.98)/(.5)*(z6-1.54)/(.14)
aa=aa+.32*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
aa=aa-.57*(z1-343.23)/(50.85)*(z6-1.54)/(.14)
aa=aa+.1*(z6-1.54)/(.14)*(z6-1.54)/(.14)
aa=aa+.19*(z3+.98)/(.5)*(z3+.98)/(.5)*(z3+.98)/(.5)
aa=aa+1.15*(z3+.98)/(.5)*(z1-343.23)/(50.85)*(z3+.98)/(.5)
aa=aa-.39*(z6-1.54)/(.14)*(z3+.98)/(.5)*(z3+.98)/(.5)
aa=aa+.48*(z3+.98)/(.5)*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
aa=aa-1.34*(z3+.98)/(.5)*(z1-343.23)/(50.85)*(z6-1.54)/(.14)
aa=aa-.02*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
aa=aa+.21*(z6-1.54)/(.14)*(z6-1.54)/(.14)*(z3+.98)/(.5)
aa=aa-.34*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)*(z6-1.54)/(.14)
aa=aa+.75*(z6-1.54)/(.14)*(z6-1.54)/(.14)*(z1-343.23)/(50.85)
aa=aa-.29*(z6-1.54)/(.14)*(z6-1.54)/(.14)*(z6-1.54)/(.14)
ab=.29+.3*(z6-1.54)/(.14)
ab=ab-.24*(z1-343.23)/(50.85)
ab=ab+.15*(z2+.23)/(.5)
ab=ab-.45*(z6-1.54)/(.14)*(z6-1.54)/(.14)
ab=ab+.69*(z1-343.23)/(50.85)*(z6-1.54)/(.14)
ab=ab-.85*(z6-1.54)/(.14)*(z2+.23)/(.5)
ab=ab-.31*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
ab=ab+1.02*(z1-343.23)/(50.85)*(z2+.23)/(.5)
ab=ab-.3*(z2+.23)/(.5)*(z2+.23)/(.5)
ab=ab-.3*(z6-1.54)/(.14)*(z6-1.54)/(.14)*(z6-1.54)/(.14)
ab=ab+.39*(z6-1.54)/(.14)*(z1-343.23)/(50.85)*(z6-1.54)/(.14)
ab=ab-.5*(z2+.23)/(.5)*(z6-1.54)/(.14)*(z6-1.54)/(.14)
ab=ab+.15*(z6-1.54)/(.14)*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
ab=ab-.04*(z6-1.54)/(.14)*(z1-343.23)/(50.85)*(z2+.23)/(.5)
ab=ab-.16*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)
ab=ab-.03*(z2+.23)/(.5)*(z2+.23)/(.5)*(z6-1.54)/(.14)
ab=ab+.46*(z1-343.23)/(50.85)*(z1-343.23)/(50.85)*(z2+.23)/(.5)
ab=ab-.13*(z2+.23)/(.5)*(z2+.23)/(.5)*(z1-343.23)/(50.85)
ab=ab-.05*(z2+.23)/(.5)*(z2+.23)/(.5)*(z2+.23)/(.5)
ba=.45-.88*ab
ba=ba+.46*aa
ba=ba+.11*(z3+.98)/(.5)
ba=ba+.74*ab*ab
ba=ba-.47*aa*ab
ba=ba-.47*ab*(z3+.98)/(.5)
ba=ba-.07*aa*aa
ba=ba-.99*aa*(z3+.98)/(.5)
ba=ba-.09*(z3+.98)/(.5)*(z3+.98)/(.5)
ba=ba+.86*ab*ab*ab
ba=ba-.14*ab*aa*ab
ba=ba-.64*(z3+.98)/(.5)*ab*ab
ba=ba+.19*ab*aa*aa
ba=ba-.3*ab*aa*(z3+.98)/(.5)
ba=ba-.13*aa*aa*aa
ba=ba+.49*(z3+.98)/(.5)*(z3+.98)/(.5)*ab
ba=ba+1.1*aa*aa*(z3+.98)/(.5)
ba=ba+.12*(z3+.98)/(.5)*(z3+.98)/(.5)*aa
ba=ba-.03*(z3+.98)/(.5)*(z3+.98)/(.5)*(z3+.98)/(.5)
da2=ba*.35+1.77
p2=EXP(-.5*((da2-1.13)/.75) ^ 2)
RETURN
node3:
aa=1.94+14.85*(z3+.96)/(.55)
aa=aa-12.06*(z4+8.56)/(4.29)
aa=aa+2.26*(z7-1.36)/(.02)
aa=aa-35.27*(z3+.96)/(.55)*(z3+.96)/(.55)
aa=aa+91.74*(z4+8.56)/(4.29)*(z3+.96)/(.55)
aa=aa-.31*(z3+.96)/(.55)*(z7-1.36)/(.02)
aa=aa-57.86*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
aa=aa+.8*(z4+8.56)/(4.29)*(z7-1.36)/(.02)
aa=aa-.09*(z7-1.36)/(.02)*(z7-1.36)/(.02)
aa=aa-9.12*(z3+.96)/(.55)*(z3+.96)/(.55)*(z3+.96)/(.55)
aa=aa+11.75*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z3+.96)/(.55)
aa=aa-6.98*(z7-1.36)/(.02)*(z3+.96)/(.55)*(z3+.96)/(.55)
aa=aa-2.62*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
aa=aa+12.73*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z7-1.36)/(.02)
Fig. 22. BASIC source code for the decision architecture (Part 1 of 2).
Spatial Voting With Data Modeling 177
aa=aa+.16*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
aa=aa+.69*(z7-1.36)/(.02)*(z7-1.36)/(.02)*(z3+.96)/(.55)
aa=aa-6.08*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)*(z7-1.36)/(.02)
aa=aa-1.16*(z7-1.36)/(.02)*(z7-1.36)/(.02)*(z4+8.56)/(4.29)
aa=aa-.14*(z7-1.36)/(.02)*(z7-1.36)/(.02)*(z7-1.36)/(.02)
ab=1.15+11.53*(z3+.96)/(.55)
ab=ab-11.27*(z4+8.56)/(4.29)
ab=ab+.72*(z6-1.55)/(.13)
ab=ab-28.13*(z3+.96)/(.55)*(z3+.96)/(.55)
ab=ab+73.45*(z4+8.56)/(4.29)*(z3+.96)/(.55)
ab=ab+.21*(z3+.96)/(.55)*(z6-1.55)/(.13)
ab=ab-47.14*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
ab=ab-1.2*(z4+8.56)/(4.29)*(z6-1.55)/(.13)
ab=ab+.6*(z6-1.55)/(.13)*(z6-1.55)/(.13)
ab=ab-.05*(z3+.96)/(.55)*(z3+.96)/(.55)*(z3+.96)/(.55)
ab=ab-7.74*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z3+.96)/(.55)
ab=ab+10.68*(z6-1.55)/(.13)*(z3+.96)/(.55)*(z3+.96)/(.55)
ab=ab+11.2*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
ab=ab-23*(z3+.96)/(.55)*(z4+8.56)/(4.29)*(z6-1.55)/(.13)
ab=ab-2.98*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)
ab=ab-7.06*(z6-1.55)/(.13)*(z6-1.55)/(.13)*(z3+.96)/(.55)
ab=ab+12.47*(z4+8.56)/(4.29)*(z4+8.56)/(4.29)*(z6-1.55)/(.13)
ab=ab+7.43*(z6-1.55)/(.13)*(z6-1.55)/(.13)*(z4+8.56)/(4.29)
ab=ab+.15*(z6-1.55)/(.13)*(z6-1.55)/(.13)*(z6-1.55)/(.13)
ac=-2.24+2.01*(z7-1.36)/(.02)
ac=ac-1.26*(z1-341.35)/(48.93)
ac=ac-.55*(z2+.31)/(.48)
ac=ac+.68*(z7-1.36)/(.02)*(z7-1.36)/(.02)
ac=ac+.31*(z1-341.35)/(48.93)*(z7-1.36)/(.02)
ac=ac+.39*(z7-1.36)/(.02)*(z2+.31)/(.48)
ac=ac+.4*(z1-341.35)/(48.93)*(z1-341.35)/(48.93)
ac=ac-.77*(z1-341.35)/(48.93)*(z2+.31)/(.48)
ac=ac+.75*(z2+.31)/(.48)*(z2+.31)/(.48)
ac=ac-.4*(z7-1.36)/(.02)*(z7-1.36)/(.02)*(z7-1.36)/(.02)
ac=ac+.22*(z7-1.36)/(.02)*(z1-341.35)/(48.93)*(z7-1.36)/(.02)
ac=ac+.26*(z2+.31)/(.48)*(z7-1.36)/(.02)*(z7-1.36)/(.02)
ac=ac-.12*(z7-1.36)/(.02)*(z1-341.35)/(48.93)*(z1-341.35)/(48.93)
ac=ac+.75*(z7-1.36)/(.02)*(z1-341.35)/(48.93)*(z2+.31)/(.48)
ac=ac+.32*(z1-341.35)/(48.93)*(z1-341.35)/(48.93)*(z1-341.35)/(48.93)
ac=ac-.44*(z2+.31)/(.48)*(z2+.31)/(.48)*(z7-1.36)/(.02)
ac=ac-.24*(z1-341.35)/(48.93)*(z1-341.35)/(48.93)*(z2+.31)/(.48)
ac=ac+.01*(z2+.31)/(.48)*(z2+.31)/(.48)*(z1-341.35)/(48.93)
ac=ac+.14*(z2+.31)/(.48)*(z2+.31)/(.48)*(z2+.31)/(.48)
ba=.53-.33*ac
ba=ba+.01*ab
ba=ba-.24*aa
ba=ba+.04*ac*ac
ba=ba-.16*ab*ac
ba=ba-.17*ac*aa
ba=ba-.12*ab*ab
ba=ba-.1*ab*aa
ba=ba+.14*aa*aa
ba=ba-.06*ac*ac*ac
ba=ba+.13*ac*ab*ac
ba=ba+.53*aa*ac*ac
ba=ba+.31*ac*ab*ab
ba=ba+.15*ac*ab*aa
ba=ba-.02*ab*ab*ab
ba=ba-.06*aa*aa*ac
ba=ba-.15*ab*ab*aa
ba=ba-.08*aa*aa*ab
ba=ba+.01*aa*aa*aa
da3=ba*.11+1.94
p3=EXP(-.5*((da3-1.01)/.13) ^ 2)
RETURN
Fig. 23. BASIC source code for the decision architecture (Part 2 of 2).
8. Summary
We use the Spatial Voting (SV) process for fusing spatial positions in a 2-D grid. This
process yields a centroid and covariance estimate as the basis of robust cluster identification.
We calculate a series of geospatial features unique to the identified cluster and attempt to
identify unique and consistent features to enable automated target recognition. We define
the geospatial features and outline our process of deriving a decision architecture populated
with Data Models. We attempt to identify the support vectors of the feature space and
enable the smallest subsample of available exemplars to be used for extracting the analytical
rule equations. We present details of the decision architecture derivation process. We
construct ambiguity resolvers to further sieve and classify mislabeled sensor hits by
deriving a new resolver Data Model that further processes the output from the previous
layer. In this fashion, through a cascade filter, we are able to demonstrate unique
classification and full assignment of all available examples, even in high-dimensional
spaces.
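As a concrete recap of the first step summarized above, the centroid and covariance estimate for a cluster of grid positions is a two-line computation (the detection coordinates below are invented for illustration):

```python
# Summarize a cluster of 2-D detections by its centroid and covariance,
# the quantities SV uses as the basis for its ellipse/rectangle steps.
import numpy as np

pts = np.array([[10.0, 12.0], [11.0, 13.0], [9.5, 12.5], [10.5, 11.8]])
centroid = pts.mean(axis=0)
cov = np.cov(pts.T)  # 2x2 spatial covariance of the cluster
print(centroid, cov.shape)
```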
9. Acknowledgements
The author would like to thank James Handley (LSEI) for programming support and
proofreading this document; and Dr. William “Bud” Albritton, Jr., Dr. Nat Albritton, Robert
Caspers, and Randel Burnett (Amtec Corporation) for their assistance with applications
development, as well as their sponsorship of and technical discussions with the author.
10. References
Hall, D.L., & McMullen, S.A.H. (2004), Mathematical Techniques in Multisensor Data Fusion,
Artech House, ISBN 0890065586, Boston, MA, USA.
Jaenisch, H.M., Albritton, N.G., Handley, J.W., Burnett, R.B., Caspers, R.W., & Albritton Jr.,
W.P. (2008), “A Simple Algorithm For Sensor Fusion Using Spatial Voting
(Unsupervised Object Grouping)”, Proceedings of SPIE, Vol. 6968, pp. 696804-
696804-12, ISBN 0819471593, 17-19 March 2008, Orlando, FL, USA.
Jaenisch, H.M., & Handley, J.W. (2009), “Analytical Formulation of Cellular Automata Rules
Using Data Models”, Proceedings of SPIE, Vol. 7347, pp. 734715-734715-13, ISBN
0819476137, 14-15 April 2009, Orlando, FL, USA.
Jaenisch, H.M., & Handley, J.W. (2003), “Data Modeling for Radar Applications”,
Proceedings of IEEE Radar Conference 2003, ISBN 0780379209, 18-19 May 2003,
Huntsville, AL, USA.
Jaenisch, H.M., Handley, J.W., Albritton, N.G., Koegler, J., Murray, S., Maddox, W., Moren,
S., Alexander, T., Fieselman, W., & Caspers, R.T., (2010), “Geospatial Feature Based
Automatic Target Recognition (ATR) Using Data Models”, Proceedings of SPIE,
Vol. 7697, 5-9 April 2010, Orlando, FL, USA.
Jaenisch, H.M., Handley, J.W., Massey, S., Case, C.T., & Songy, C.G. (2002), “Network
Centric Decision Architecture for Financial or 1/f Data Models”, Proceedings of
SPIE, Vol. 4787, pp. 86-97, ISBN 0819445541, 9-10 July 2002, Seattle, WA, USA.
Jain, A.K. (1989), Fundamentals of Digital Image Processing, Prentice-Hall, ISBN 0133361659,
Englewood Cliffs, NJ.
Klein, L. (2004), Sensor and Data Fusion, SPIE Press, ISBN 0819454354, Bellingham, WA.
Madala, H., & Ivakhnenko, A. (1994), Inductive Learning Algorithms for Complex Systems
Modeling, CRC Press, ISBN 0849344387, Boca Raton, FL.
National Aeronautics and Space Administration (NASA) (1962), Celestial Mechanics and Space
Flight Analysis, Office of Scientific and Technical Information, Washington, DC.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. (2007), Numerical Recipes: The
Art of Scientific Computing, 3rd Edition, Cambridge University Press, ISBN
0521880688, Cambridge, UK.
8. Hidden Markov Model as a Framework for Situational Awareness
Abstract
In this chapter we present a hidden Markov model (HMM) based framework for situational
awareness that utilizes multi-sensor, multiple-modality data. Situational awareness is a
process that comes to a conclusion based on the events that take place over a period of
time across a wide area. We show that each state in the HMM is an event that leads to a
situation and that the transition from one state to another is determined based on the
probability of detection of certain events using multiple sensors of multiple modalities,
thereby using sensor fusion for situational awareness. We show the construction of the
HMM and apply it to data collected using a suite of sensors on a PackBot.
1. Introduction
Situational awareness (SA) is a process of conscious effort to process sensory data to extract
actionable information to accomplish a mission over a period of time, with or without
interaction with the sensory systems. Most of the information is time dependent and usually
follows a sequence of states. This is where Markov or hidden Markov models are useful in
analyzing the data and extracting the actionable information from the sensors. To gain a better
understanding, the following section elaborates on situation awareness.
Situational Awareness is the ability to generate actionable knowledge through the use of
timely and accurate information about the Army enterprise, its processes, and external
factors.1

1 http://www.army.mil/armyBTKC/focus/sa/index.htm
Endsley and Garland (Endsley & Mataric, 2000) define SA as “knowing what is going on
around you". There is usually a torrent of data coming through the sensors; situational
awareness is sifting through all that data, extracting the information that is actionable, and
predicting the situation ahead. The awareness of the situation ahead lets one plan the data
collection from the right set of sensors. SA allows selective attention to the information. Some
other pertinent definitions are provided here (Beringer & Hancock, 1989):
SA requires an operator to “quickly detect, integrate and interpret data gathered from the
environment. In many real-world conditions, situational awareness is hampered by two
factors. First, the data may be spread throughout the visual field. Second, the data are
frequently noisy" (Green et al., 1995).
Situation awareness is based on the integration of knowledge resulting from recurrent
situation assessments (Sarter & Woods, 1991).
“Situation awareness is adaptive, externally-directed consciousness that has as its products
knowledge about a dynamic task environment and directed action within that environment"
(Smith & Hancock, 1995).
In a sensor world, situation awareness is obtained by gathering data using multi-modal,
multiple sensors distributed over an area of interest. Each sensor modality obtains data
within its operating range. For example, video observes the scene within its field of view, and
acoustic sensors record sound within their audible (sensitive) range. In this chapter, several
sensor modalities and the data they present will be considered. Proper information from each
sensor, or from a combination of sensors, will be extracted to understand the surrounding scene.
Extraction of the right information depends mostly on previous knowledge or previous situation
awareness. Understanding the contribution of each sensor modality to SA is key to
the development of algorithms pertinent to SA. Clearly, the information one would like to
obtain for SA depends on the mission. In order to help us better understand the functionality
of each modality, three different missions are considered as exemplars here, namely, (a) urban
terrain operations, (b) difficult terrain such as tunnels, caves, etc., and (c) battlefield.
– Evacuation of embassies
– Seize ports and airfields
– Counter weapons of mass destruction (WMD)
– Seize enemy leaders
• Sustained urban combat
From the above list of operations that may take place in an urban area, clearing buildings
and protecting them is one of the major missions. Often, once a building is cleared, one may
leave some sensors in the building to monitor it for intruders. Another important
operation is perimeter protection, in which several sensors are deployed around the perimeter
of a building or a place. These sensors detect any person approaching the perimeter and report
to the command center for further investigation and action. Next we consider operations in
difficult terrain.
Borders between warring nations and between rich and poor nations have become porous
to the illegal transportation of people, drugs, weapons, etc. Operations in these areas include:
(a) detection of tunnels using various sensing modalities and (b) making sure that the tunnels
remain cleared once they are cleared. Detection of tunnels requires different kinds of sensors.
Clearly, the requirements for different operations are different. To be successful in these
operations, one needs to have a clear understanding of the situation. Situation awareness comes
from the sensors deployed on the ground and in the air, and from human intelligence. The sensor
data is processed for the right information to get the correct situation awareness. The next
section presents various sensors that could be used to monitor the situation.
power consumption by traditional sensors, (c) a wide area of operation requiring many sensors,
(d) the limited field of view of radar and video, and (e) new modalities offering better insight
into the situation. Most of the sensors for situation awareness are deployed in an area of interest
and left there for days, weeks, or months before being attended to. This necessitates low-power,
low-cost sensors that can be deployed in the field in large quantities.
Now, we will present some of the sensors that may be deployed in the field and discuss their
utility.
Acoustic Sensors: While imaging sensors (for example, cameras and video) act as the eyes,
acoustic sensors fulfill the role of ears in the sensing world. These microphones capture the
sounds generated by various events taking place in their vicinity, such as a vehicle traveling
on a nearby road, mortar/rocket launches and detonations, bullets whizzing by, and of course
sounds made by people, animals, etc., to name a few. These are passive sensors; that is,
unlike radar, they do not transmit any signals, hence they can be used for stealth operations.
There are several types of microphones, namely, condenser, piezoelectric, dynamic,
carbon, magnetic and micro-electro-mechanical systems (MEMS) microphones. Each microphone
has its own characteristic response in terms of sensitivity to sound pressure and
frequency of operation. Each application demands a different type of microphone,
depending on the signals that are being captured. For example, detection
of motor vehicles requires microphones with a frequency response equal to or greater
than the highest engine harmonic frequency. On the other hand, capturing a transient event
such as the shock wave generated by a supersonic bullet requires a microphone with a frequency
response of 100 kHz or more. When the microphones are used in an array configuration, such
as a linear, circular or tetrahedral array, the signals from all the microphones can be processed
to estimate the angle of arrival (AoA) of the target. Figure 1 shows a single microphone
and a tetrahedral array. The microphones in the tetrahedral array (Figure 1b) are covered by
foam balls to reduce wind noise.
Seismic Sensors: These are also called geophones. They are used to detect vibrations
in the ground caused by events taking place within the sensing range of the sensors. Just
as with acoustic sensors, seismic sensors are passive sensors. Typical applications
for these sensors include (a) detection of vehicles (both civilian and military vehicles) by
capturing the signals generated by a moving vehicle, (b) perimeter protection, by capturing the
vibrations caused by the footsteps of a person walking, (c) detection of explosions, etc. The
Indonesian tsunami in December 2004 was devastating to the people. However, several animals
sensed the vibrations in the ground caused by the giant waves coming to the shore, ran to the
hills or elevated areas, and survived the tsunami. Figure 2 shows different seismic sensors. The
spikes are used to couple the sensor to the ground by burying the spikes in the ground.
Magnetic Sensors: Magnetic (B-field) sensors can be used to detect ferromagnetic materials
carried by people, e.g., keys, firearms, and knives. These sensors may also detect the usage of
computer monitors. There are several types of magnetic sensors, namely, (a) the flux gate
magnetometer and (b) the coil-type magnetic sensor. The coil-type magnetic sensor has a higher
frequency response than the flux gate magnetometer. One can use multiple sensors in order to
detect the flux change in all three X, Y and Z directions. The sensitivity of a magnetic sensor
depends on the type as well as the construction of the sensor. Figure 3 shows two types
of magnetic sensors.
Fig. 3. (a) Flux gate magnetometer, (b) Coil type magnetic sensor
Electrostatic or E-field Sensors: These are passive sensors that detect static electric charge
built up on targets, or any electric field in the vicinity of the sensor. Some of the sources
of static electric charge are (a) clothes rubbing against the body, (b) combing hair, and
(c) a bullet or projectile traveling in the air, which builds up charge on the bullet. All electric
transmission lines have an electric field surrounding them; this field gets perturbed by a
target in the vicinity, and the perturbation can be detected by E-field sensors. Figure 4 shows
some of the E-field sensors that are commercially available.
Passive Infrared (PIR) Sensor: These are passive sensors that detect the infrared radiation
emitted by targets. They are motion detectors: if a person walks in front of one, the sensor
generates an output proportional to the temperature of the body and inversely proportional to
the distance between the person and the sensor. Figure 5 shows a picture of a PIR sensor.
Chemical Sensor: These sensors are similar to the carbon monoxide detectors used in
buildings. Some of these sensors can detect multiple chemicals. Usually, they employ several
wafers. Each wafer reacts to a particular chemical in the air, changing the resistivity of the
wafer. The change in resistivity in turn changes the output voltage, indicating the presence
of that chemical.
Infrared Imagers: There are several IR imagers, depending on the frequency band they operate
in, namely, long-wave IR, medium-wave IR, and forward looking infrared (FLIR). These
sensors take a thermal image of the target in their field of view. A typical IR imager
is shown in Figure 6.
Visible Imagers: These are regular video cameras. They take pictures in the visible spectrum
and have different resolutions and different fields of view depending on the lens used. Figure 6
shows a picture of a typical video camera.
In the next section, we present a description of the unattended ground sensors.
Unattended ground sensors (UGS) are in general placed in the area of interest inconspicuously
and left to operate for several days or months. In general, these are low-power sensors that are
meant to last for several days or months before the batteries are replaced. There are several
manufacturers that make UGS systems.
Bayesian methods provide a way of reasoning about partial beliefs under conditions of
uncertainty using a probabilistic model, encoding probabilistic information that permits us to
compute the probability of an event. The main principle of Bayesian techniques lies in the
inversion formula:

p(H | e) = p(e | H) p(H) / p(e)

where H is the hypothesis, p(e | H) is the likelihood, p(H) is called the prior probability, p(H | e)
is the posterior probability, and p(e) is the probability of the evidence. The belief associated
with the hypothesis H is updated based on this formula when new evidence arrives. This approach
forms the basis for reasoning with Bayesian belief networks. Figure 7 shows how the evidence
is collected using hard and soft methods.
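As a small illustration (a hypothetical sketch with made-up numbers, not from the chapter), the inversion formula can be applied to a single piece of evidence, expanding p(e) by the law of total probability:

```python
# Hypothetical example of the Bayesian inversion formula:
# p(H|e) = p(e|H) p(H) / p(e), with p(e) expanded over H and not-H.

def posterior(likelihood_h, prior_h, likelihood_not_h):
    """Return p(H|e) given p(e|H), p(H) and p(e|not H)."""
    p_e = likelihood_h * prior_h + likelihood_not_h * (1.0 - prior_h)
    return likelihood_h * prior_h / p_e

# Illustrative numbers (assumptions, not from the chapter):
# H = "a person is present", e = "acoustic sensor detects voice".
p = posterior(likelihood_h=0.9, prior_h=0.2, likelihood_not_h=0.1)
print(round(p, 3))  # 0.692
```

A weak prior (0.2) combined with a strong likelihood ratio (0.9 vs. 0.1) yields a posterior near 0.7, illustrating how evidence updates belief.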
Nodes in Bayesian networks (Pearl, 1986; 1988) represent hypotheses, and information is
transmitted from each node (at which evidence is available or belief has been updated) to
adjacent nodes in a directed graph. Use of the Bayesian rule for a large number of variables
requires estimating joint probability distributions and computing the conditional probabilities.
For example, if no assumption on the dependencies is made, that is, all variables are dependent
on each other, then the joint distribution must be expanded by the chain rule,

p(X1 , X2 , · · · , Xn ) = p(X1 ) p(X2 | X1 ) · · · p(Xn | X1 , X2 , · · · , Xn−1 ).

If the dependencies are modeled as shown in Figure 8, then the joint probability distribution
is much simpler and is given by the product, over the nodes, of the probability of each variable
conditioned on its parents alone.
Let G(V, E) be a directed acyclic graph with a set of vertices V = {v1 , v2 , · · · , vn } and a set
of edges E = {e1,2 , e1,3 , · · · , ei,j }, with i ≠ j ∈ {1, 2, · · · , n}. Note that the directed edge ei,j
connects the vertex vi to the vertex vj and exists if and only if there is a relationship between
nodes vi and vj . Node vi is the parent of node vj and vj is the descendant of node vi . Let us
denote the random variable associated with the node vi by Xvi . For simplicity, let us denote
Xi = Xvi . Let pa(vi ) denote the parent nodes of the node vi . For a Bayesian belief network the
following properties must be satisfied:
• Each variable is conditionally independent of its non-descendants given its parents
• Each variable is dependent on its parents
This property is called the local Markov property. Then the joint probability distribution is given
by

p(X1 , X2 , · · · , Xn ) = ∏i=1..n p(Xi | pa(Xi )) (3)
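The factorization in (3) can be sketched for a tiny two-node network loosely inspired by the ‘sound' → ‘human voice' pair in Figure 8; the conditional probability tables below are hypothetical:

```python
# Sketch: evaluating the joint distribution of a small Bayesian belief
# network via equation (3), p(X1..Xn) = prod_i p(Xi | pa(Xi)).
# Binary "sound" -> "human voice" network; all numbers are illustrative.

p_sound = {1: 0.3, 0: 0.7}                         # p(sound)
p_voice_given_sound = {(1, 1): 0.4, (0, 1): 0.6,   # p(voice | sound = 1)
                       (1, 0): 0.0, (0, 0): 1.0}   # no sound -> no voice

def joint(sound, voice):
    """p(sound, voice) factored by the local Markov property."""
    return p_sound[sound] * p_voice_given_sound[(voice, sound)]

# The factored joint must still sum to 1 over all assignments.
total = sum(joint(s, v) for s in (0, 1) for v in (0, 1))
print(round(joint(1, 1), 2), round(total, 2))  # 0.12 1.0
```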
Now it is possible to associate meaning with the links in the Bayesian belief network, and hence
to specify what is needed to turn the graphical dependence structure of a BBN into a probability
distribution. In Figure 8 the nodes labeled ‘sound' and ‘human voice' are related. The
node ‘sound' is the parent of the ‘human voice' node, since without sound there is no human
voice. The link shows that relation. Similarly, other nodes in Figure 8 are related to each other
with certain probabilities. Each node in the BBN represents a state and provides the situation
awareness. A process closely related to the BBN is the Markov process. Both Markov and hidden
Markov processes are presented in the next section.
One of the important applications of the Markov model is in speech recognition, where the states
are hidden but the measured parameters depend on the state the model is in. This important
model is called the hidden Markov model (HMM). A more detailed description of the model
is presented in the next section.
observation is made. The process is repeated with the other sensors. The entire process
generates a sequence of observations O = O1 , O2 , · · · , OM , where Oi ∈ V. This is similar to the
urn-and-ball problem presented in (Rabiner, 1989). One of the problems could be: given the
observation sequence, what is the probability that it corresponds to a car, truck or tank?
An HMM in Figure 10 is characterized by (Rabiner, 1989):
1. The number of states N. Let S denote the set of states, given by, S = {S1 , S2 , · · · , S N }
and we denote the state at time t as qt ∈ S.
2. Size of the alphabet M, that is, the number of distinct observable symbols
V = { v1 , v2 , · · · , v M }.
3. The state transition probability distribution A = {aij }, where

aij = P[qt+1 = Sj | qt = Si ], 1 ≤ i, j ≤ N. (5)

4. The probability distribution of each alphabet symbol vk in state j, B = {bj (vk )}, where

bj (vk ) = P[vk at t | qt = Sj ], 1 ≤ j ≤ N; 1 ≤ k ≤ M. (6)

5. The initial state distribution π = {πi }, where

πi = P[q1 = Si ], 1 ≤ i ≤ N. (7)
Clearly, the HMM is completely specified if N, M, A, B, and π are specified, and it can be used to
generate an observation sequence O = O1 , O2 , · · · , OT (Rabiner, 1989). Three questions arise
with HMMs, namely:
• Question 1: Given the observation sequence O = O1 , O2 , · · · , OT and the model λ =
{A, B, π}, how does one compute P(O | λ), that is, the probability of the observation
sequence,
• Question 2: Given the observation sequence O = O1 , O2 , · · · , OT and the model λ,
how does one compute the optimal state sequence Q = q1 q2 · · · qT that best explains
the observed sequence, and
• Question 3: How does one optimize the model parameters λ = {A, B, π} to maximize
P(O | λ).
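To make the notation concrete, here is a minimal, hypothetical two-state, two-symbol model λ = (A, B, π) in the sense of equations (5)-(7), used to generate an observation sequence; all numbers are illustrative:

```python
import random

# A hypothetical HMM specification lambda = (A, B, pi): N = 2 states,
# M = 2 observable symbols. Rows of A and B sum to 1.
A  = [[0.8, 0.2],    # a_ij = P(q_{t+1} = S_j | q_t = S_i)
      [0.3, 0.7]]
B  = [[0.9, 0.1],    # b_j(v_k) = P(v_k at t | q_t = S_j)
      [0.2, 0.8]]
pi = [0.6, 0.4]      # pi_i = P(q_1 = S_i)

def sample(weights, rng):
    """Draw an index according to a discrete distribution."""
    r, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def generate(T, seed=0):
    """Generate an observation sequence O_1..O_T from the model."""
    rng = random.Random(seed)
    q = sample(pi, rng)
    obs = []
    for _ in range(T):
        obs.append(sample(B[q], rng))   # emit a symbol from state q
        q = sample(A[q], rng)           # then transition
    return obs

print(generate(5))
```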
Returning to the problem posed in Figure 9, we will design a separate N-state HMM for
each vehicle passage. It is assumed that the vehicles travel at near-constant velocity and that the
experiment starts when the vehicle approaches a known position on the road. For training
purposes the experiment is repeated with each vehicle traveling at different positions on the
road, for example, left, right, middle or some other position. Now, for each HMM a model has
to be built. In Section 3.4 we show how to build an HMM. This is the same as finding the solution
to Question 3. The answer to Question 2 provides the meaning of the states. Recognition of the
observations is given by the solution to Question 1.
Solution to Question 1: Given the observation sequence O and the model λ, estimate P(O | λ).
Let the observed sequence be

O = O1 , O2 , · · · , OT

and let one specific state sequence that produced the observation O be

Q = q1 , q2 , · · · , qT .

Assuming statistical independence of the observations, the probability of the observation
sequence O given the state sequence Q follows from (6):

P(O | Q, λ) = bq1 (O1 ) bq2 (O2 ) · · · bqT (OT ).

The probability of the state sequence Q can be computed using (5) and (7) and is given by

P(Q | λ) = πq1 aq1 q2 aq2 q3 · · · aqT−1 qT . (10)

Finally, the probability of the observation sequence O is obtained by summing over all possible
Q and is given by

P(O | λ) = ∑all Q P(O | Q, λ) P(Q | λ). (11)
There are efficient ways to compute the probability of the observation sequence given by (11),
which will not be discussed here. Interested readers should consult (Rabiner, 1989).
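As a sketch (with hypothetical model parameters, not the chapter's vehicle data), equation (11) can be evaluated directly by enumerating every state sequence, and checked against the efficient forward algorithm alluded to above:

```python
from itertools import product

# Brute-force evaluation of equation (11), P(O|lambda) summed over every
# state sequence Q, checked against the forward algorithm (Rabiner, 1989).
# The model numbers are hypothetical.
A  = [[0.8, 0.2], [0.3, 0.7]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def brute_force(O):
    total = 0.0
    T, N = len(O), len(pi)
    for Q in product(range(N), repeat=T):           # all N^T sequences
        p = pi[Q[0]] * B[Q[0]][O[0]]                # pi_q1 * b_q1(O_1)
        for t in range(1, T):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]    # a_{q_{t-1} q_t} b_{q_t}(O_t)
        total += p
    return total

def forward(O):
    """O(N^2 T) forward recursion instead of O(N^T) enumeration."""
    alpha = [pi[i] * B[i][O[0]] for i in range(len(pi))]
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(pi))) * B[j][o]
                 for j in range(len(pi))]
    return sum(alpha)

O = [0, 1, 1, 0]
print(abs(brute_force(O) - forward(O)) < 1e-12)  # True
```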
Sensor fusion is supposed to lead to better situational awareness. However, fusion of
multi-modal data is difficult, as few joint probability density functions exist for
mixed modalities. Fusion mostly depends on the application at hand. The problem is further
complicated if one has to fuse events that take place over a period of time and over a wide
area. If they are time dependent, the relevance of the data observed at different times becomes an
issue. We opted to fuse information, that is, probabilities of detection of an event. In
a majority of cases Bayesian networks (Singhal & Brown, 1997; 2000) are used for fusion.
In this chapter we use Dempster-Shafer fusion (Hall & Llinas, 2001; Klein, 2004) for fusing
multi-modal multi-sensor data.
Figure 12 shows the robot with 4 microphones, a 3-axis seismic sensor, a PIR sensor, a chemical
sensor, 3 coil-type magnetometers (one coil for each of the X, Y and Z axes), three flux gate
magnetometers, a 3-axis E-field sensor, and visible video and IR imaging sensors. The goal is to
assess the situation based on the observations of various sensor modalities over a period of time
in the area covered by the sensor range. We enacted the data collection scenario with several
features built in to observe the happenings inside the office room and assess the situation.
• A person walks into the office room - this triggers PIR, B & E-field and seismic sensors.
• She occasionally talks - the acoustic sensor picks up the voice.
• She sits in front of a computer.
• She turns on the computer.
– B & E-field sensors observe the power surge caused by turning on the computer.
– Acoustic sensors observe the characteristic chime of Windows turning on.
– The person’s movements are picked up by the PIR sensor.
– Visible video shows a pattern on the computer screen showing activity on the
computer.
– The IR imager picks up the reflected thermal profile of the person in front of the
monitor.
• She types on the keyboard - sound is detected by the acoustic sensor.
• She turns off the computer.
– Windows turning off sound is observed by the acoustic sensor.
– The power surge after shutdown is observed by the B-field sensor.
In the next section we present the data from various sensors, show the events detected by
each sensor, and present some of the signal processing done to identify the events.
Acoustic sensor data analysis: In the case of acoustic sensors, we look for any human
or machine activity. This is done by observing the energy levels in 4 bands, that is, 20-250 Hz,
251-500 Hz, 501-750 Hz and 751-1000 Hz, corresponding to voice indicative of human
presence. These four energy levels become the feature set, and a classifier (Damarla et al., 2007;
2004; Damarla & Ufford, 2007) is trained with this feature set collected with a person talking
and not talking. The algorithm used to detect a person is presented in the references (Damarla
et al., 2007; 2004; Damarla & Ufford, 2007) and is outlined here.
Let X = [X1 , X2 , · · · , XN ]T denote the feature vector. Its mean vector is

M = E{X} = [m1 , m2 , · · · , mN ]T

and its covariance matrix is

Σ = E{(X − M)(X − M)T } =
[ σ11 σ12 · · · σ1N ]
[ σ21 σ22 · · · σ2N ]
[ · · · ]
[ σN1 σN2 · · · σNN ] .

Assume that for each of the R classes (person present and person not present) we know the
a priori probability and the particular N-variate normal probability density function p(X | i);
that is, we know R normal density functions. Denoting the mean vectors Mi and the covariance
matrices Σi for i = 1, 2, · · · , R, we can write

p(X | i) = (2π)−N/2 |Σi |−1/2 exp[−(1/2)(X − Mi )T Σi−1 (X − Mi )] (12)
where Mi = (mi1 , mi2 , · · · , miN ). Let us define H0 and H1 as the null (no person) and
human-present hypotheses. The likelihood of each hypothesis is defined as the probability of the
observation, i.e., the feature, conditioned on the hypothesis,

lHj (Xs ) = p(Xs | Hj ) (13)

for j = 0, 1 and s ∈ S, where S = {acoustic, PIR, seismic}. The conditional probability is
modeled as a Gaussian distribution given by (12),

p(Xs | Hj ) = N(Xs ; μs,j , σ2s,j ). (14)

Now, (13)-(14) can be used to determine the posterior probability of human presence given a
single sensor observation. Namely,

p(H1 | Xs ) = lH1 (Xs ) p(H1 ) / [lH0 (Xs ) p(H0 ) + lH1 (Xs ) p(H1 )] (15)

where p(H0 ) and p(H1 ) represent the prior probabilities for the absence and presence of a
human, respectively. We assume an uninformative prior, i.e., p(H0 ) = p(H1 ) = 0.5.
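A scalar sketch of equations (12)-(15), with hypothetical class means and variances standing in for trained values:

```python
import math

# Gaussian likelihoods for H0 (no person) and H1 (person present) for a
# single scalar feature, combined with uninformative priors p(H0)=p(H1)=0.5.
# The means and variances below are hypothetical, not from the chapter.

def gaussian(x, mu, sigma2):
    """Univariate normal density, the N = 1 case of equation (12)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def p_h1_given_x(x, params_h0=(0.1, 0.05), params_h1=(0.8, 0.1)):
    l0 = gaussian(x, *params_h0)              # l_H0(x) = p(x | H0)
    l1 = gaussian(x, *params_h1)              # l_H1(x) = p(x | H1)
    return 0.5 * l1 / (0.5 * l0 + 0.5 * l1)   # equation (15)

# A feature value near the H1 mean yields a high posterior of presence.
print(round(p_h1_given_x(0.75), 3))
```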
In the office room scenario, we are looking for any activity on the computer. The Windows
operating system produces a distinct sound whenever a computer is turned on or off. This
distinct sound has a 75-78 Hz tone, and the data analysis looks for this tone. The acoustic data
processing is depicted in the flow chart shown in Figure 13, and Figure 14 shows the spectrum of
the acoustic data when a person is talking and when the Windows operating system comes on.
The output of the acoustic sensor is Pi , i = 1, 2, 3, corresponding to three situations, namely,
(i) a person talking, (ii) the computer chime and (iii) no acoustic activity.
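The band-energy feature extraction described above can be sketched as follows; the sampling rate and frame length are assumed values, not the chapter's:

```python
import numpy as np

# Energy in the 20-250, 251-500, 501-750 and 751-1000 Hz bands of one
# acoustic frame; the four energies form the classifier feature vector.

FS = 4096           # assumed sampling rate, Hz (one-second frame below)
BANDS = [(20, 250), (251, 500), (501, 750), (751, 1000)]

def band_energies(frame):
    """Return the 4-element feature vector of spectral energies."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    return [spectrum[(freqs >= lo) & (freqs <= hi)].sum() for lo, hi in BANDS]

# A 150 Hz test tone should put almost all its energy in the first band.
t = np.arange(FS) / FS
feats = band_energies(np.sin(2 * np.pi * 150 * t))
print(np.argmax(feats))  # 0
```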
Seismic Sensor Data Analysis: We analyze the seismic data for the footfalls of a person walking.
The gait frequency of a normal walk is around 1-2 Hz. We use the envelope of the signal instead
of the signal itself to extract the gait frequency (Damarla et al., 2007; Houston & McGaffigan,
2003). We also look for the harmonics associated with the gait frequency. Figure 15 shows the
flow chart for seismic data analysis. We use the 2-15 Hz band to determine the probability of a
person walking in the vicinity. The seismic sensor provides two probabilities: (i) the probability
of a person walking and (ii) the probability of nobody present.
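The envelope-based gait analysis can be sketched as follows, using a rectify-and-smooth envelope and a synthetic footfall-like signal; the sampling rate and test signal are assumptions:

```python
import numpy as np

# Rectify the seismic signal, smooth it into an envelope, and find the
# dominant envelope component; footfalls show up near 1-2 Hz.

FS = 200  # assumed seismic sampling rate, Hz

def gait_frequency(sig):
    """Estimate the dominant envelope frequency in the 1-15 Hz band."""
    win = FS // 10                                   # 0.1 s moving average
    env = np.convolve(np.abs(sig), np.ones(win) / win, mode="same")
    spec = np.abs(np.fft.rfft(env - env.mean()))     # drop the DC term
    freqs = np.fft.rfftfreq(len(env), d=1.0 / FS)
    band = (freqs >= 1.0) & (freqs <= 15.0)
    return freqs[band][np.argmax(spec[band])]

# Synthetic footfalls: 40 Hz bursts repeating twice per second.
t = np.arange(10 * FS) / FS
sig = np.sin(2 * np.pi * 40 * t) * (np.sin(2 * np.pi * 2 * t) > 0.5)
print(gait_frequency(sig))
```

On this synthetic signal the estimate lands at the 2 Hz burst rate, the analog of the gait fundamental the chapter looks for.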
PIR sensor data analysis: These are motion detectors: if a person walks in front of one, it
gives an output proportional to the temperature of the body and inversely proportional to
the distance of the person from the sensor. Figure 16 shows the PIR sensor data collected in the
office room. Clearly, one can see a large amplitude when a person walked by the sensor. The
smaller amplitudes correspond to the person seated in the chair in front of the computer and
moving slightly (note that the chair is obstructing the full view of the person and only part
of the body is seen by the PIR sensor). In order to assess the situation, both seismic and PIR
sensor data can be used to determine whether a person entered the office room. The seismic
sensor does not require line of sight, unlike the PIR sensor; they complement each other.
Magnetic sensor (B-field sensor) data analysis: We used both flux gate and coil magnetometers.
The former has a low frequency response, while the coil magnetometer provides a high
frequency response. A total of six sensors were used: three flux gate magnetometers, one for
each of the X, Y, and Z directions, and three coil magnetometers. The coil magnetometers are
placed on the X, Y, and Z axes to measure the magnetic flux in the respective directions. Figure
17 clearly shows the change in magnetic flux when a computer is turned on and off. Similar
signals are observed on the Y and Z axes.
E-Field Sensor data analysis: We used three E-field sensors, one on each axis. The output
of the X-axis E-field sensor is shown in Figure 18. A spike appears in the E-field sensor output
when the computer is turned on; however, we did not observe any spike or change in
amplitude when the computer is turned off.
Visible and IR imaging sensors: Several frames of visible and IR images of the office room
and its contents are taken over a period of time. In this experiment, the images are used to
determine if the computers are on or off and if anybody is sitting in front of the computer,
in order to assess the situation. Due to the limited field of view of these sensors, only a partial
view of the room is visible, and it is often difficult to observe a person in the room. Figure 19
shows a frame of visible imagery showing only the shoulder of a person sitting in front of a
computer. Figure 20 shows an IR frame with a thermal image of the person in front of the
computer due to reflection. Most of the thermal energy radiated by the person in front of the
computer monitor is reflected by the monitor, and this reflected thermal energy is detected by
the IR imager. The IR imager algorithm processes the silhouette reflected from the monitor:
first the Hough transform (Hough, 1962) is used to determine the line patterns of an object, and
then elliptical and rectangular models are used to detect a person (Belongie et al., 2002; Dalal &
Triggs, 2005; Wang et al., 2007) in front of the monitor and provide the probability of a person
being present in the room. The visible imager algorithm determines the brightness of the monitor
and its varying patterns and provides the probability that the computer is on. In the next section
we present the framework for the HMM.
In Section 3.3, we present an HMM with hypothetical states and show how they can be reached
based on the information observed. Although we present these states as determined by the
output of some process, hence making them deterministic rather than hidden states, this is done
for conceptual purposes only. In Section 3.4 we present the HMM where the states are hidden
and can be reached only through particular observations.

Fig. 19. Visible image showing a person in front of the computer before it is turned on
Fig. 20. IR image frame showing the thermal reflection of a person in front of the computer
3.3 Relation between HMM states and various states of Situational Awareness
Based on the situation we are interested in assessing, the HMM is designed with four states
as shown in Figure 21. The states are as follows:
• S0 denotes the state when there is no person in the office room,
• S1 denotes the state when a person is present in the office room,
• S2 denotes the state when a person is sitting in front of a computer and
• S3 denotes the state when a computer is in use.
The above-mentioned states are just a sample and can be extended to any number based on
the situation one is trying to assess on the basis of observations using multi-modal sensors.
We now discuss how each state is reached, what sensor data is used, and how it is used.
This also illustrates that the HMM achieves sensor fusion, as each state transition is
made on the observations of all or a subset of the sensors.
State S0 : This is the initial state of the HMM. We use acoustic, seismic, PIR and visible video
data to determine the presence of a person. Each sensor gives a probability of detection, a
probability of no detection and a confidence level, denoted by (Pd, Pnd, Pc), as shown in Figure
22. These probabilities are fused using the Dempster-Shafer (Hall & Llinas, 2001; Klein, 2004)
fusion paradigm to determine the overall probability. There will be a transition from state S0 to
S1 if this probability exceeds a predetermined threshold; otherwise the HMM will remain in
state S0 . The Dempster-Shafer fusion paradigm used is presented here.
Dempster-Shafer fusion rule: To combine the results from two sensors (s1 and s2 ), the fusion
algorithm uses the Dempster-Shafer rule of combination (Hall & Llinas, 2001; Klein, 2004).
The total probability mass committed to an event Z defined by the combination of evidence is

s1,2 (Z) = s1 (Z) ⊕ s2 (Z) = K ∑X∩Y=Z s1 (X) s2 (Y) (16)

where ⊕ denotes the orthogonal sum and K, the normalization factor, is given by

K−1 = 1 − ∑X∩Y=φ s1 (X) s2 (Y). (17)

This is basically the sum of the mass products from Sensor 1 and Sensor 2 whose set intersection
is Z, divided by 1 minus the total mass of the pairs from s1 and s2 that have empty intersection.
The rule is used to combine all three probabilities (Pd, Pnd, Pc) of sensors s1 and s2 . The
resultant probabilities are then combined with the probabilities of the next sensor.
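A sketch of the combination rule (16)-(17) on a two-element frame of discernment, with the uncommitted (confidence) mass assigned to the whole frame; the mass values are hypothetical:

```python
from itertools import product

# Dempster's rule over the frame {detect, no-detect}; mass on the whole
# frame models the uncommitted "confidence" portion of (Pd, Pnd, Pc).

FRAME = frozenset({"d", "n"})

def combine(m1, m2):
    """m12(Z) = K * sum over X cap Y = Z of m1(X) m2(Y), per (16)-(17)."""
    combined, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            combined[z] = combined.get(z, 0.0) + mx * my
        else:
            conflict += mx * my                    # X cap Y is empty
    K = 1.0 / (1.0 - conflict)                     # equation (17)
    return {z: K * v for z, v in combined.items()}

# Two sensors with hypothetical masses (Pd, Pnd, uncommitted).
s1 = {frozenset({"d"}): 0.6, frozenset({"n"}): 0.1, FRAME: 0.3}
s2 = {frozenset({"d"}): 0.5, frozenset({"n"}): 0.2, FRAME: 0.3}
fused = combine(s1, s2)
print(round(fused[frozenset({"d"})], 3))  # 0.759
```

Two moderately confident detections reinforce each other: the fused detection mass (0.759) exceeds either sensor's alone, which is the behavior the state-S0 threshold test relies on.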
State S1 : This is the state when there is a person in the room. There are three transitions
that can take place from this state, namely, (1) transition to state S2 , (2) transition back to
state S0 and (3) remaining in the same state.
Transition to S2 happens if any one of the following takes place: (a) the computer turn-on
chime is heard, (b) the magnetic and E-field sensors detect a change in flux and field,
(c) the IR imager detects an image on the monitor or (d) the visible imager detects the changing
images that appear during the Windows start-up process.
The HMM remains in state S1 if there is activity in the PIR, acoustic or seismic channels but
none of the events described for the transition to S2 . Figure 23 shows the data processing in
each sensor modality.
State S2 : This is the state where a person is in front of the computer. The transition from this
state to S3 takes place when there is keyboard activity, or when the IR imager detects a hand on
the keyboard and the PIR detects slight motion. The transition from S2 to S1 takes place
when the computer is turned off, as detected by the acoustic and magnetic sensors.
State S3 : This is the state where the computer is in use. As long as keyboard activity is
detected using the acoustic and IR imagers the HMM remains in state S3 ; if no keyboard
activity is detected, it transitions to S2 .
Data processing in state S2 is shown in Figure 24. Data processing in S3 is straight forward.
We discussed what processing is done at each state and how the probabilities are estimated.
The transition probabilities of HMM are generated based on several observations with people
entering into the computer room, sitting in front of the computer, turning it on, using it for a
period of time, turning it off and leaving the office room.
Data processing of the various sensors depends on the state of the machine, and the confidence levels of the various sensor modalities also change based on the state of the HMM. For example, in state S2 the PIR sensor monitoring a person in a chair produces small-amplitude changes, as shown in Figure 16; in normal processing those outputs would not result in high probability, but in this case they are given high probability. In state S3 the acoustic sensor detects the tapping on the keyboard; this sound is often very light, so the sensor is given higher confidence levels than normal. In order to accommodate such state-dependent confidence levels, the state information must be part of the processing in a deterministic system, whereas in an HMM the states transition automatically based on the outputs of the sensor observations. In Section 3.4 an HMM is built for the above problem.
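The deterministic alternative discussed above can be sketched as a simple state machine; the event names below are illustrative assumptions, not notation from the chapter:

```python
# A minimal sketch of the deterministic state machine described above.
# Event names are illustrative assumptions, not the chapter's notation.
TRANSITIONS = {
    "S0": {"person_enters": "S1"},
    "S1": {"person_leaves": "S0",
           "computer_chime": "S2", "flux_and_efield": "S2",
           "ir_monitor_image": "S2", "visible_startup_images": "S2"},
    "S2": {"keyboard_activity": "S3", "computer_off": "S1"},
    "S3": {"no_keyboard_activity": "S2"},
}

def step(state, event):
    """Return the next state; unrecognized events keep the current state,
    e.g. PIR/acoustic/seismic activity in S1."""
    return TRANSITIONS[state].get(event, state)

state = "S0"
for ev in ["person_enters", "computer_chime", "keyboard_activity",
           "no_keyboard_activity", "computer_off", "person_leaves"]:
    state = step(state, ev)
print(state)  # the full scenario returns to S0
```

Hard-coding the rules like this is exactly what the HMM avoids: the state changes follow from the sensor observations themselves.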
202 Sensor Fusion and Its Applications
Clearly, some processes can be combined to reduce the number of variables. For example, acoustic and seismic data can be processed together for detection of human presence. Fewer variables simplify the code table needed to train the HMM. Alternatively, one can use the output of the process in Figure 22 as one variable, the output of the process in Figure 23 as another variable, and so on. Let us assume that each variable gives a binary output; for example, in the case of acoustic data analysis, X1 = 0 implies no human voice and X1 = 1 implies the presence of human voice. At each instant of time we observe X = { X1 , X2 , · · · , X8 }, which can take 28 = 256 different values. Each distinct vector X is an observation symbol, so the alphabet contains 256 symbols.
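The mapping from the eight binary variables to one of the 256 observation symbols can be sketched as follows (the bit ordering is an assumption made for illustration):

```python
# Sketch: pack the eight binary sensor variables X1..X8 into one of the
# 2^8 = 256 observation symbols. Treating X1 as the most significant bit
# is an arbitrary (but fixed) convention.
def to_symbol(x):
    """x is a list of eight 0/1 values [X1, ..., X8]; returns 0..255."""
    assert len(x) == 8 and all(v in (0, 1) for v in x)
    symbol = 0
    for bit in x:
        symbol = (symbol << 1) | bit
    return symbol

print(to_symbol([0, 0, 1, 0, 0, 0, 0, 0]))  # -> 32
```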
The data collection scenario in Section 3.1 is enacted several times, each enactment with some variation. While enacting the scenario, for each time step t we make an observation Ot = { O1t , O2t , · · · , O8t }, where Oit = Xi . Each observation Ot is associated with a state Si , for i ∈ {0, 1, 2, 3, 4}, based on the ground truth. For example, an observation Ot = {0, 0, 1, 0, 0, 0, 0, 0} at time step t is associated with state S0 if there is no person present, or with state S1 if there is a person in the room. This is the training phase. The association generates a table with 9 columns, the first 8 columns corresponding to the observations and the 9th column corresponding to the states.
This table should be as large as possible. Next, the HMM model λ = { A, B, π } is developed.
Estimation of π: The initial state distribution is estimated as

πi = n1i / ne , (18)

where n1i denotes the number of enactments that start in state Si and ne the total number of enactments.
O1 O2 O3 O4 O5 O6 O7 O8  State
 0  0  1  0  0  0  0  0      0
 0  0  1  0  0  0  0  0      0
 0  0  1  0  0  0  0  0      0
 1  0  1  0  0  0  0  0      1
 1  0  1  1  0  0  0  0      1
 :  :  :  :  :  :  :  :      :
 0  0  0  0  0  0  1  1      2
 0  0  0  1  0  0  1  1      3
 0  0  1  0  0  0  0  0      0
Table 1. Exemplar observations and the state assignment
Estimation of A: A is the state transition probability distribution A = { aij }, where

aij = p( qt+1 = Sj | qt = Si ), 1 ≤ i, j ≤ N.

In order to compute aij , we need to count how many times a transition from state Si to Sj occurs in Table 1; let this number be denoted by nij . Note that nij need not be equal to nji . Then

aij = nij / nT , (19)

where nT denotes the total number of rows in Table 1.
Estimation of B: B is the probability distribution of each symbol vk in state j, B = { bj (vk ) }, where

bj (vk ) = p( vk at t | qt = Sj ), 1 ≤ j ≤ N; 1 ≤ k ≤ M.

In order to compute bj (vk ), we first count the number of times nj the state Sj has occurred in Table 1. Out of these, we count the number of times the pattern vk = { O1 , O2 , · · · , O8 } has occurred and denote this number by nvk . Then

bj (vk ) = nvk / nj . (20)
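Equations (18)-(20) are simple counting estimates and can be sketched as follows; the per-enactment list structure is an assumed encoding of Table 1 (one list of rows per run), and Eq. (19) is normalized by the total number of rows nT exactly as in the text:

```python
from collections import Counter

def estimate_hmm(enactments, n_states=5):
    """Estimate lambda = (A, B, pi) by counting, following Eqs. (18)-(20).

    `enactments` is a list of runs; each run is a list of
    (observation_tuple, state) rows, i.e. Table 1 split per enactment.
    Note: Eq. (19) normalizes by the total number of rows n_T, as in the text.
    """
    pi = [0.0] * n_states
    A = [[0.0] * n_states for _ in range(n_states)]
    B = {}  # B[j][v_k] = b_j(v_k)
    n_T = sum(len(run) for run in enactments)

    # Eq. (18): fraction of enactments that start in state i
    for run in enactments:
        pi[run[0][1]] += 1.0 / len(enactments)

    # Eq. (19): a_ij = n_ij / n_T over consecutive row pairs
    for run in enactments:
        for (_, s), (_, s_next) in zip(run, run[1:]):
            A[s][s_next] += 1.0 / n_T

    # Eq. (20): b_j(v_k) = n_vk / n_j
    n_j = Counter(s for run in enactments for _, s in run)
    counts = Counter((s, obs) for run in enactments for obs, s in run)
    for (s, obs), n in counts.items():
        B.setdefault(s, {})[obs] = n / n_j[s]
    return A, B, pi
```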
We have now shown how to compute the model λ = { A, B, π }; it can be used to determine the state, and hence the situation, when a new pattern is observed. It is worth noting that several educational institutes have developed HMM packages for the MATLAB programming language that are available on the Internet (HMM Toolbox, n.d.).
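Determining the most likely state sequence for a new observation sequence is typically done with the Viterbi algorithm (Rabiner, 1989); a generic log-domain sketch, not tied to any particular toolbox, is:

```python
import math

def viterbi(obs, A, B, pi):
    """Most likely state sequence for the symbol sequence `obs`, given
    lambda = (A, B, pi). Standard textbook Viterbi in the log domain;
    B is a dict mapping state -> {symbol: probability}."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    delta = [log(pi[i]) + log(B.get(i, {}).get(obs[0], 0.0))
             for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            ptr.append(best)
            delta.append(prev[best] + log(A[best][j])
                         + log(B.get(j, {}).get(o, 0.0)))
        back.append(ptr)
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):   # backtrack through the pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]
```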
In this chapter we showed how an HMM can be used to provide situational awareness based on its states. We also showed how to build an HMM, and that fusion happens within the HMM.
4. References
Belongie, S., Malik, J. & Puzicha, J. (2002). Shape matching and object recognition using shape
contexts, IEEE Trans. Pattern Anal. Mach. Intell. Vol. 24(No. 4): 509–522.
Beringer, D. & Hancock, P. (1989). Summary of the various definitions of situation awareness,
Proc. of Fifth Intl. Symp. on Aviation Psychology Vol. 2(No.6): 646 – 651.
Bernardin, K., Ogawara, K., Ikeuchi, K. & Dillmann, R. (2003). A hidden Markov model based
sensor fusion approach for recognizing continuous human grasping sequences, Proc.
3rd IEEE International Conference on Humanoid Robots pp. 1 – 13.
Bruckner, D., Sallans, B. & Russ, G. (2007). Hidden Markov models for traffic observation,
Proc. 5th IEEE Intl. Conference on Industrial Informatics pp. 23 – 27.
Dalal, N. & Triggs, B. (2005). Histograms of oriented gradients for human detection, IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) Vol.
1: 886 – 893.
Damarla, T. (2008). Hidden Markov model as a framework for situational awareness, Proc. of
Intl. Conference on Information Fusion, Cologne, Germany .
Damarla, T., Kaplan, L. & Chan, A. (2007). Human infrastructure & human activity detection,
Proc. of Intl. Conference on Information Fusion, Quebec City, Canada .
Damarla, T., Pham, T. & Lake, D. (2004). An algorithm for classifying multiple targets using
acoustic signatures, Proc. of SPIE Vol. 5429(No.): 421 – 427.
Damarla, T. & Ufford, D. (2007). Personnel detection using ground sensors, Proc. of SPIE Vol.
6562: 1 – 10.
Endsley, M. R. & Mataric, M. (2000). Situation Awareness Analysis and Measurement, Lawrence
Erlbaum Associates, Inc., Mahwah, New Jersey.
Green, M., Odom, J. & Yates, J. (1995). Measuring situational awareness with the ideal ob-
server, Proc. of the Intl. Conference on Experimental Analysis and Measurement of Situation
Awareness.
Hall, D. & Llinas, J. (2001). Handbook of Multisensor Data Fusion, CRC Press: Boca Raton.
HMM Toolbox (n.d.).
URL: www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
Hough, P. V. C. (1962). Method and means for recognizing complex patterns, U.S. Patent
3069654 .
Houston, K. M. & McGaffigan, D. P. (2003). Spectrum analysis techniques for personnel de-
tection using seismic sensors, Proc. of SPIE Vol. 5090: 162 – 173.
Klein, L. A. (2004). Sensor and Data Fusion - A Tool for Information Assessment and Decision
Making, SPIE Press, Bellingham, Washington, USA.
Maj. Houlgate, K. P. (2004). Urban warfare transforms the corps, Proc. of the Naval Institute .
Pearl, J. (1986). Fusion, propagation, and structuring in belief networks, Artificial Intelligence
Vol. 29: 241 – 288.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,
Morgan Kaufmann Publishers, Inc.
Press, D. G. (1998). Urban warfare: Options, problems and the future, Summary of a conference
sponsored by MIT Security Studies Program .
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech
recognition, Proc. of the IEEE Vol. 77(2): 257 – 285.
Sarter, N. B. & Woods, D. (1991). Situation awareness: A critical but ill-defined phenomenon,
Intl. Journal of Aviation Psychology Vol. 1: 45–57.
Singhal, A. & Brown, C. (1997). Dynamic Bayes net approach to multimodal sensor fusion,
Proc. of SPIE Vol. 3209: 2 – 10.
Singhal, A. & Brown, C. (2000). A multilevel Bayesian network approach to image sensor
fusion, Proc. ISIF, WeB3 pp. 9 – 16.
Smith, D. J. (2003). Situation(al) awareness (sa) in effective command and control, Wales .
Smith, K. & Hancock, P. A. (1995). The risk space representation of commercial airspace, Proc.
of the 8th Intl. Symposium on Aviation Psychology pp. 9 – 16.
Wang, L., Shi, J., Song, G. & Shen, I. (2007). Object detection combining recognition and
segmentation, Eighth Asian Conference on Computer Vision (ACCV) .
Multi-sensorial Active Perception for Indoor Environment Modeling
1. Introduction
For many applications, the information provided by individual sensors is often incomplete, inconsistent, or imprecise. For problems involving detection, recognition and reconstruction tasks in complex environments, it is well known that no single source of information can provide the absolute solution, quite apart from the computational complexity involved. The merging of multisource data can create a more consistent interpretation of the system of interest, in which the associated uncertainty is decreased.
Multi-sensor data fusion, also known simply as sensor data fusion, is a process of combining evidence from different information sources in order to make a better judgment (Llinas & Waltz, 1990; Hall, 1992; Klein, 1993). Although the notion of data fusion has always been around, most multisensor data fusion applications have been developed very recently, making it an area of intense research in which new applications are being explored constantly. On the surface, the concept of fusion may appear straightforward, but the design and implementation of fusion systems is an extremely complex task. Modeling, processing, and integrating different sensor data for knowledge interpretation and inference are challenging problems. These problems become even more difficult when the available data is incomplete, inconsistent or imprecise.
In robotics and computer vision, the rapid advance of science and technology, combined with the reduction in the cost of sensor devices, has brought these two areas, previously considered independent, together to serve each other's needs. A central topic of investigation in both areas is the recovery of the three-dimensional structure of large-scale environments. In a large-scale environment the complete scene cannot be captured from a single reference frame or given position, so an active way of capturing the information is needed. In particular, having a mobile robot able to build a 3D map of the environment is very appealing, since it can be applied to many important applications, for example the virtual exploration of remote places for security or efficiency reasons. These applications depend not only on the correct transmission of visual and geometric information but also on the quality of the information captured. The latter is closely related to the notion of active perception as well as to the uncertainty associated with each sensor. In particular, the behavior any artificial or biological system should follow to accomplish certain tasks (e.g., extraction, simplification and filtering) is strongly influenced by the data supplied by its sensors. This data is in turn dependent on the perception criteria associated with each sensorial input (Conde & Thalmann, 2004).
A vast body of research on 3D modeling and virtual reality applications has focused on the fusion of intensity and range data with promising results (Pulli et al., 1997; Stamos & Allen, 2000), and more recently (Guidi et al., 2009). Most of these works consider the complete acquisition of 3D points from the object or scene to be modeled, focusing mainly on the registration and integration problems.
In the area of computer vision, the idea of extracting the shape or structure from an image has been studied since the end of the 1970s. Scientists in computer vision were mainly interested in methods that reflect the way the human eye works. These methods, known as "shape-from-X", extract depth information by using visual patterns of the images, such as shading, texture, binocular vision and motion, among others. Because of the type of sensors used, these methods are categorized as passive sensing techniques, i.e., data is obtained without emitting energy; they typically involve mathematical models of image formation and of how to invert them. Traditionally, these models are based on the physical principles of light interaction. However, due to the difficulty of inverting them, it is necessary to assume several things about the physical properties of the objects in the scene, such as the type of surface (Lambertian, matte) and albedo, assumptions which may not be suitable for real complex scenes.
However, although many elegant algorithms based on traditional approaches for depth recovery have been developed, the fundamental problem of obtaining precise data is still a difficult task. In particular, achieving geometric correctness and realism may require data collection from different sensors as well as the correct fusion of all these observations. Good examples are stereo cameras, which can produce volumetric scans economically; however, these cameras require calibration or produce range maps that are incomplete or of limited resolution. In general, using only 2D intensity images will provide sparse measurements of the geometry, which are unreliable unless some simple geometry about the scene to model is assumed. By fusing 2D intensity images with range finding sensors, as first demonstrated in (Jarvis, 1992), a solution to 3D vision is realized, circumventing the problem of inferring 3D from 2D.
One aspect of great importance in 3D model reconstruction is to have a fast, efficient and simple data acquisition process from the sensors and yet a good and robust reconstruction. This is crucial when dealing with dynamic environments (e.g., people walking around, illumination variation, etc.) and systems with limited battery life. We can simplify the way the data is acquired by capturing only partial but reliable range information of regions of interest. In previous research work, the problem of three-dimensional scene recovery using incomplete sensorial data was tackled for the first time, specifically, by using intensity images and a limited number of range data (Torres-Méndez & Dudek, 2003; Torres-Méndez & Dudek, 2008). The main idea is based on the fact that the underlying geometry of a scene can be characterized by the visual information and its interaction with the environment, together with its inter-relationships with the available range data. Figure 1 shows an example of how a complete and dense range map is estimated from an intensity image and the associated partial depth map. The statistical relationships between the visual and range data were analyzed in terms of small patches or neighborhoods of pixels, showing that the contextual information of these relationships can provide enough information to infer complete and dense range maps. The dense depth maps with their corresponding intensity images are then used to build 3D models of large-scale man-made indoor environments (offices, museums, houses, etc.).
Fig. 1. An example of the range synthesis process. The data fusion of intensity and incomplete range data is carried out to reconstruct a 3D model of the indoor scene. Image taken from (Torres-Méndez, 2008).
In that research work, the sampling strategies for measuring the range data were determined beforehand and remained fixed (vertical and horizontal lines through the scene) during the data acquisition process. These sampling strategies sometimes carried critical limitations for obtaining an ideal reconstruction, as the quality of the input range data, in terms of the geometric characteristics it represents, did not capture the underlying geometry of the scene to be modeled. As a result, the synthesis of the missing range data was very poor.
In the work presented in this chapter, we solve the above-mentioned problem by selecting in an optimal way the regions where the initial (minimal) range data must be captured. Here, the term optimal refers, in particular, to the fact that the range data to be measured must truly represent relevant information about the geometric structure. Thus, the input range data in this case must be good enough to estimate, together with the visual information, the rest of the missing range data.
Both sensors (camera and laser) must be fused (i.e., registered and then integrated) in a common reference frame. The fusion of visual and range data involves a number of aspects to be considered, as the data is not of the same nature with respect to its resolution, type and scale. Images of real scenes, i.e., those that represent a meaningful concept in their content, depend on the regularities of the environment in which they are captured (Van Der Schaaf, 1998). These regularities can be, for example, the natural geometry of objects and their distribution in space, the natural distributions of light, and the regularities that depend on the viewer's position. This is particularly difficult considering the fact that at each given position the mobile robot must capture a number of images and then analyze the optimal regions where the range data should be measured. This means that the laser must be directed to those regions with accuracy, and the incomplete range data must then be registered with the intensity images before applying the statistical learning method to estimate complete and dense depth maps.
The statistical study of these images can help in understanding these regularities, which are not easily acquired from physical or mathematical models. Recently, there has been some success in applying statistical methods to computer vision problems (Freeman & Torralba, 2002; Srivastava et al., 2003; Torralba & Oliva, 2002). However, more studies are needed on the analysis of the statistical relationships between intensity and range data. Having meaningful statistical tendencies could be of great utility in the design of new algorithms to infer the geometric structure of objects in a scene.
The outline of the chapter is as follows. In Section 2 we present related work on the problem of 3D environment modeling, focusing on approaches that fuse intensity and range images. Section 3 presents our multi-sensorial active perception framework, which statistically analyzes natural and indoor images to capture the initial range data. This range data, together with the available intensity, is used to efficiently estimate dense range maps. Experimental results under different scenarios are shown in Section 4, together with an evaluation of the performance of the method.
2. Related Work
For the fundamental computer vision problem of recovering the geometric structure of objects from 2D images, different monocular visual cues have been used, such as shading, defocus, texture, edges, etc. With respect to binocular visual cues, the most common are those obtained from stereo cameras, from which a depth map can be computed in a fast and economical way. For example, the method proposed in (Wan & Zhou, 2009) uses stereo vision as a basis to estimate dense depth maps of large-scale scenes; it generates depth map mosaics with different angles and resolutions, which are later combined into a single large depth map. The method presented in (Malik & Choi, 2008) is based on the shape-from-focus approach and uses a defocus measure based on an optical transfer function implemented in the Fourier domain. In (Miled & Pesquet, 2009), the authors present a novel stereo-based method that helps to estimate depth maps of scenes that are subject to changes
in illumination. Other works propose combining different methods to obtain the range maps. For example, in (Scharstein & Szeliski, 2003) a stereo vision algorithm and structured light are used to reconstruct scenes in 3D. However, the main disadvantage of the above techniques is that the obtained range maps are usually incomplete or of limited resolution, and in most cases a calibration is required.
Another way of obtaining a dense depth map is by using range sensors (e.g., laser scanners), which obtain geometric information in a direct and reliable way. A large number of 3D scanners are available on the market; however, cost is still the major concern, and the more economical ones tend to be slow. An overview of the different systems available for acquiring the 3D shape of objects is presented in (Blais, 2004), highlighting some of the advantages and disadvantages of the different methods. Laser range finders directly map the acquired data into a 3D volumetric model, thus having the ability to partly avoid the correspondence problem associated with passive visual techniques. Indeed, scenes with no textural details can be easily modeled. Moreover, laser range measurements do not depend on scene illumination.
More recently, statistical learning techniques have been used to recover the geometric structure from 2D images. For humans, interpreting the geometric information of a scene by looking at one image is not a difficult task. However, for a computational algorithm this is difficult, as some a priori knowledge about the scene is needed.
For example, in (Torres-Méndez & Dudek, 2003) a method was presented for the first time to estimate dense range maps based on the statistical correlation between intensity and available range data, as well as edge information. Other, more recent studies, such as (Saxena & Chung, 2008), show that it is possible to recover the missing range data in sparse depth maps using statistical learning approaches together with the appropriate characteristics of objects in the scene (e.g., edges or cues indicating changes in depth). Other works combine different types of visual cues to facilitate the recovery of depth information or the geometry of objects of interest.
In general, no matter what approach is used, the quality of the results will strongly depend on the type of visual cues used and the preprocessing algorithms applied to the input data. The key role of our active perception process is to capture range data from the places where the visual cues of the images show depth discontinuities. Man-made indoor environments have inherent geometric and photometric characteristics that can be exploited to help in the detection of this type of visual cue.
Given that our method is based on a statistical analysis, the images in the database must have characteristics and properties similar to those of the scenes of interest; as we focus on man-made scenes, the database should contain those types of scenes. However, we start our experiments using a publicly available image database, the van Hateren database, which contains natural images. As this database exhibits important changes of depth in its scenes, which is the main characteristic to be considered, our method can still be functional.
The statistical analysis of small patches implemented here is based in part on the Feldman and Yunes algorithm (Feldman & Yunes, 2006). This algorithm extracts characteristics of interest from an image through the observation of an image database and obtains an internal representation that concentrates the relevant information in the form of a ternary variable. To generate the internal representation we follow three steps. First, we reduce (in scale) the images in the database (see Figure 2). Then, each image is divided into patches of the same size (e.g., 13×13 pixels); with these patches we form a new database, which is decomposed into its principal components by applying PCA to extract the most representative information, which is usually contained in the first five eigenvectors. In Figure 3 the eigenvectors are depicted. These eigenvectors are the filters that are used to highlight certain characteristics of the intensity images, specifically the regions with relevant geometric information.
The last step consists of applying a threshold in order to map the filtered images onto a ternary variable, assigning the value -1 to very low values, 1 to high values and 0 otherwise. In this way we obtain an internal representation, where k represents the number of filters (eigenvectors) and G is the set of pixels of the scaled image.
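The filtering and thresholding steps can be sketched in plain Python as follows; the filter coefficients and the threshold values below are placeholders, since in practice the filters are the first five PCA eigenvectors and the thresholds are chosen from the response statistics:

```python
def apply_filter(image, filt):
    """Valid-mode 2-D correlation of `image` with `filt` (both 2-D lists).
    A plain-Python stand-in for the PCA-eigenvector filtering step."""
    n, m = len(filt), len(filt[0])
    h, w = len(image) - n + 1, len(image[0]) - m + 1
    return [[sum(image[r + i][c + j] * filt[i][j]
                 for i in range(n) for j in range(m))
             for c in range(w)] for r in range(h)]

def ternary(filtered, low=-1.0, high=1.0):
    """Map filter responses onto the ternary variable {-1, 0, 1}:
    -1 for very low responses, 1 for high responses, 0 otherwise."""
    return [[-1 if v < low else (1 if v > high else 0) for v in row]
            for row in filtered]
```

For instance, a two-tap vertical difference filter such as `[[1], [-1]]` responds strongly at horizontal edges, which the threshold then marks with ±1.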
Fig. 2. Some of the images taken from the van Hateren database. These images are reduced by a scale factor of 2.
Fig. 3. The first 5 eigenvectors (zoomed out). These eigenvectors are used as filters to
highlight relevant geometric information.
The internal representation gives information about the changes in depth, as shown in Figure 4. It can be observed that, depending on the filter used, the representation gives a different orientation of the depth discontinuities in the scene. For example, if we use the first filter, the highlighted changes are the horizontal ones; if we apply the second filter, the discontinuities obtained are the vertical ones.
This internal representation is the basis for capturing the initial range data, from which we can obtain a dense range map.
Fig. 5. Camera and laser scanner orientation and world coordinate system. Image taken
from (Torres-Méndez & Dudek, 2008).
In order to compute the maximum a posteriori (MAP) estimate of the depth value Ri of a voxel Vi, we first need to build an approximation of the conditional probability P(fi | fNi) and sample from it. For each new depth value Ri ∈ R to estimate, the samples that correspond to the neighborhood system of voxel i, i.e., Ni, are taken, and the distribution of Ri is built as a histogram of all the values that occur in the sample. The neighborhood system Ni (see Figure 6) is a subset of the infinite real set of voxels, denoted by Nreal. Taking the MRF model as a basis, it is assumed that the depth value Ri depends only on the intensity and range values of its immediate neighbors defined in Ni. If we define a set Ω(Ri) that contains all occurrences of Ni in Nreal, then the conditional probability distribution of Ri can be estimated through a histogram based on the depth values of the voxels representing each Ni in Ω(Ri). Unfortunately, the sample is finite and there exists the possibility that no neighbor has exactly the same characteristics in intensity and range; for that reason we use the heuristic of finding the most similar value in the available finite sample Ω'(Ri), where Ω'(Ri) ⊆ Ω(Ri). Now, let Ap be a local neighborhood system for voxel p, composed of the neighbors located within a radius r, and defined as:

Ap = { Aq ∈ N : dist(p, q) ≤ r }. (3)
In the non-parametric approximation, the depth value Rp of voxel Vp with neighborhood Np is synthesized by selecting the neighborhood Nbest that is most similar to Np. All neighborhoods Aq in Ap that are similar to Nbest are included in Ω'(Rp) as follows:

‖Np − Aq‖ ≤ (1 + ε) ‖Np − Nbest‖. (5)
The similarity measure between two neighborhoods Na and Nb is defined over the partial data of the two neighborhoods and is calculated as follows:
‖Na − Nb‖ = Σ v ∈ Na,Nb Gσ(v − v0) · D, (6)
where v0 represents the voxel located at the center of the neighborhoods Na and Nb, and v is a neighboring voxel of v0. Ia and Ra are the intensity and range values to be compared. G is a Gaussian kernel applied to each neighborhood so that voxels located near the center have more weight than those located far from it. In this way we can build a histogram of the depth values Rp at the center of each neighborhood in Ω'(Ri).
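A sketch of such a Gaussian-weighted neighborhood comparison is given below; the exact difference term D combining intensity and range is an assumption, as is the dictionary encoding of a neighborhood:

```python
import math

def similarity(na, nb, sigma=1.0):
    """Gaussian-weighted distance between two neighborhoods, in the spirit
    of Eq. (6). Each neighborhood is a dict mapping voxel offsets (dx, dy)
    to (intensity, range) pairs; the comparison runs over offsets where
    both neighborhoods have data (the 'partial data' of the text).
    The absolute intensity+range difference term is an assumption."""
    total = 0.0
    for v in set(na) & set(nb):
        ia, ra = na[v]
        ib, rb = nb[v]
        # Gaussian kernel: offsets near the center (0, 0) weigh more
        g = math.exp(-(v[0] ** 2 + v[1] ** 2) / (2 * sigma ** 2))
        total += g * (abs(ia - ib) + abs(ra - rb))
    return total
```

Identical neighborhoods have distance 0; smaller values mean better matches, so the best candidate Nbest minimizes this quantity.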
Let Np be an n×n window centered at Vp; then for each voxel Vp we calculate its priority value as follows:

P(Vp) = [ Σ i ∈ Np C(Vi) F(Vi) ] / ( |Np| + 1 ), (8)
where |·| indicates the total number of voxels in Np. Initially, C(Vi) for each voxel Vi is assigned the value 1 if the associated ternary value is 1, 0.8 if its ternary value is 0, and 0.2 if it is -1. F(Vi) is a flag function that takes the value 1 if the intensity and range values of Vi are known, and 0 if its range value is unknown. In this way, voxels with greater priority are synthesized first.
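The priority computation of Eq. (8) can be sketched as follows, using the confidence values C(Vi) assigned above; encoding the window contents as flat lists is an assumption made for illustration:

```python
def priority(ternary_values, known_flags):
    """Priority of a voxel's window N_p, following Eq. (8).
    `ternary_values` holds the ternary value of each voxel in the window;
    `known_flags` says whether its range value is known (the flag F)."""
    conf = {1: 1.0, 0: 0.8, -1: 0.2}  # C(V_i) as assigned in the text
    s = sum(conf[t] * (1.0 if f else 0.0)
            for t, f in zip(ternary_values, known_flags))
    return s / (len(ternary_values) + 1)  # |N_p| + 1, as in Eq. (8)
```

Windows with many known voxels on or near depth discontinuities (ternary value 1) thus score highest and are synthesized first.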
4. Experimental Results
In order to evaluate the performance of the method, we use three databases, two of which are available on the web. One is the Middlebury database (Hiebert-Treuer, 2008), which contains intensity and dense range maps of 12 different indoor scenes containing objects with a great variety of textures. The other is the USF database from the CESAR lab at Oak Ridge National Laboratory. This database has intensity and dense range maps of indoor scenes containing regular geometric objects with uniform textures. The third database was created by capturing images using a stereo vision system in our laboratory; its scenes contain regular geometric objects with different textures. As we have ground truth range data from the public databases, we first simulate sparse range maps by eliminating some of the range information using sampling strategies that follow different patterns (squares, vertical and horizontal lines, etc.). The sparse depth maps are then given as input to our algorithm to estimate dense range maps. In this way, we can compare the ground-truth dense range maps with those synthesized by our method and obtain a quality measure for the reconstruction.
To evaluate our results we compute a well-known metric, the mean absolute residual (MAR) error. The MAR error of two matrices R1 and R2 is defined as

MAR = Σ i,j |R1(i, j) − R2(i, j)| / (# unknown range voxels). (9)
In general, computing the MAR error alone is not a good mechanism to evaluate the success of the method; for example, a few results with a high MAR error inflate the average. For this reason, we also compute the absolute difference at each pixel and show the result as an image, so that we can evaluate our performance visually.
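Both evaluation measures can be sketched as follows; restricting the sum to the voxels whose range was unknown is equivalent to Eq. (9) when the known voxels are reproduced exactly, which is an assumption about the setup:

```python
def mar_error(r1, r2, unknown_mask):
    """Mean absolute residual (Eq. 9): absolute difference between ground
    truth r1 and synthesized r2, averaged over the voxels whose range was
    unknown (i.e., synthesized). All arguments are 2-D lists."""
    total, count = 0.0, 0
    for i, row in enumerate(unknown_mask):
        for j, was_unknown in enumerate(row):
            if was_unknown:
                total += abs(r1[i][j] - r2[i][j])
                count += 1
    return total / count

def abs_diff_image(r1, r2):
    """Per-pixel absolute difference, for visual evaluation."""
    return [[abs(a - b) for a, b in zip(ra, rb)]
            for ra, rb in zip(r1, r2)]
```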
In all the experiments, the size of the neighborhood N is 3×3 pixels for one experimental set and 5×5 pixels for the other. The search window varies between 5 and 10 pixels. The missing range data in the sparse depth maps varies between 30% and 50% of the total information.
4.1 Range synthesis on sparse depth maps with different sampling strategies
In the following experiments, we have used the first two databases described above. For each of the input range maps in the databases, we first simulate a sparse depth map by eliminating a given amount of range data from the dense maps. The areas with missing depth values follow an arbitrary pattern (vertical lines, horizontal lines, squares); the size of these areas depends on the amount of information eliminated for the experiment (from 30% up to 50%). After obtaining a simulated sparse depth map, we apply the proposed algorithm; the result is a synthesized dense range map. We compare our results with the ground truth range map by computing the MAR error and also an image of the absolute difference at each pixel.
Figure 8 shows the experimental setup for one of the scenes in the Middlebury database; in Figure 8b the ground truth range map is depicted. Figure 9 shows the synthesized results for different sampling strategies for the baby scene.
(a) Intensity image. (b) Ground truth dense range map. (c) Ternary variable image.
Fig. 8. An example of the experimental setup to evaluate the method (Middlebury database).
Fig. 9. Experimental results after running our range synthesis method on the baby scene. The first column of each set shows the incomplete depth maps and the second column the synthesized dense range maps.
In the results shown in Figure 9a, most of the missing information is concentrated in a bigger area compared to 9b. It can be observed that in some cases it is not possible to obtain a good reconstruction, as there is little information about the inherent statistics of the intensity and its relationship with the available range data. In the synthesized map corresponding to the set of Figure 9a following a sampling strategy of vertical lines, we can observe that there is no information about the object to be reconstructed, and for that reason it does not appear in the result. However, in the set of images of Figure 9b the same sampling strategies were used and the same amount of range information as in 9a is missing, but in these incomplete depth maps the unknown information is distributed over four different regions. For this reason, there is much more information about the scene, and the quality of the reconstruction improves considerably, as can be seen. In the set of Figure 9c, the same amount of unknown depth values is shown but with a greater distribution over the range map; in this set, the variation between the reconstructions is small due to the amount of available information. A factor that affects the quality of the reconstruction is the existence of textures in the intensity images, as it affects the ternary variable computation. For the case of the Middlebury database, the images have a great variety of textures, which directly affects the values of the ternary variable, as can be seen in Figure 8c.
220 Sensor Fusion and Its Applications
(a) Intensity image. (b) Ground truth dense range map. (c) Ternary variable image.
Fig. 10. An example of the experimental setup to evaluate the proposed method (USF
database).
4.2 Range synthesis on sparse depth maps obtained from the internal representation
We conducted experiments in which the sparse depth maps contain range data only in the
regions indicated by the internal representation. Apart from greatly reducing the
acquisition time, the initial range then represents all the relevant variations related to
depth discontinuities in the scene, and the dense range map is therefore expected to be
estimated more efficiently.
Figure 10 shows an image from the USF database with its corresponding ground-truth range
map and ternary variable image. In the USF database, contrary to the Middlebury database,
the scenes are larger, objects are located at different depths, and the texture is uniform.
Figure 10c depicts the ternary variable, which represents the initial range given as input,
together with the intensity image, to the range synthesis process. Because the objects have
a uniform texture, the depth discontinuities can be better appreciated.
Figure 11 shows the synthesized dense range map. As before, the quality of the
reconstruction depends on the available information. Good results are obtained because the
known range is distributed around the missing range. It is important to determine which
values within the available information have the greatest influence on the reconstruction,
so that they can be given high priority.
In general, the experimental results show that the ternary variable influences the quality
of the synthesis, especially in areas with depth discontinuities.
Fig. 11. The synthesized dense range map obtained from the initial range values indicated in Figure 10c.
(a) Left (stereo) image. (b) Ternary variable images. (c) Sparse depth maps.
Fig. 12. Input data for three scenes captured in our laboratory.
5. Conclusion
We have presented an approach to recover dense depth maps based on the statistical
analysis of visual cues. The extracted visual cues represent regions indicating depth
discontinuities in the intensity images. These are the regions where range data should be
captured; they constitute the range data given as input, together with the intensity map,
to the range estimation process. Additionally, the internal representation of the intensity
map is used to assign priority values to the initial range data. The range synthesis is
improved because the order in which the voxels are synthesized is established from these
priority values.
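The priority-driven ordering described above can be sketched with a heap. The function name and the voxel-to-priority mapping are hypothetical interfaces, and the actual synthesis step for each voxel is omitted.

```python
import heapq

def synthesis_order(priorities):
    """Yield voxels in decreasing priority, so that voxels near likely
    depth discontinuities are synthesized first.

    priorities: mapping voxel -> priority value derived from the
    internal representation (hypothetical interface).
    """
    heap = [(-p, v) for v, p in priorities.items()]  # negate: max-priority first
    heapq.heapify(heap)
    while heap:
        _, voxel = heapq.heappop(heap)
        yield voxel
```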
The quality of the results depends on the amount and type of the initial range information,
in terms of the variations captured in it. In other words, if the correlation between the
intensity and the available range data represents (even partially) the correlation of the
intensity near regions with missing range data, we can establish the statistics to be
searched for in the available input data.
Also, as in many non-deterministic methods, we have seen that the results depend on a
suitable selection of some parameters: one is the neighborhood size (N) and the other the
radius of search (r). With the method proposed here, the synthesis near edges (areas
presenting depth discontinuities) is improved compared with prior work in the literature.
While a broad variety of problems has been covered with respect to the automatic 3D
reconstruction of unknown environments, several open problems and unanswered questions
remain. With respect to data collection, a key issue in our method is the quality of the
observable range data, in particular the type of geometric characteristics that can be
extracted in relation to the objects or scene that the range data represent. If the range
data do not capture the inherent geometry of the scene to be modeled, then the range
synthesis of the missing range values will be poor. The experiments presented in this
chapter were based on acquiring the initial range data in a directed way, such that the
captured regions reflect important changes in the geometry.
6. Acknowledgements
The author gratefully acknowledges financial support from CONACyT (CB-2006/55203).
7. References
Besl, P.J. & McKay, N.D. (1992). A method for registration of 3D shapes. IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, 239-256, 1992.
Blais, F. (2004). A review of 20 years of range sensor development. Journal of Electronic
Imaging, Vol. 13, No. 1, 231–240, 2004.
Conde, T. & Thalmann, D. (2004). An artificial life environment for autonomous virtual
agents with multi-sensorial and multi-perceptive features. Computer Animation
and Virtual Worlds, Vol. 15, 311-318, ISSN: 1546-4261.
Feldman, T. & Younes, L. (2006). Homeostatic image perception: An artificial system.
Computer Vision and Image Understanding, Vol. 102, No. 1, 70–80, ISSN:1077-3142.
Freeman, W.T. & Torralba, A. (2002). Shape recipes: scene representations that refer to the
image. Adv. In Neural Information Processing Systems 15 (NIPS).
Guidi, G. & Remondino, F. & Russo, M. & Menna, F. & Rizzi, A. & Ercoli, S. (2009). A Multi-
Resolution Methodology for the 3D Modeling of Large and Complex Archeological
Areas. International Journal of Architectural Computing, Vol. 7, No. 1, 39-55, Multi
Science Publishing.
Hall, D. (1992). Mathematical Techniques in Multisensor Data Fusion. Boston, MA: Artech House.
Harris, C. & Stephens, M. (1988). A combined corner and edge detector. In Fourth Alvey
Vision Conference, Vol. 4, pp. 147–151, 1988, Manchester, UK.
Hiebert-Treuer, B. (2008). Stereo datasets with ground truth.
http://vision.middlebury.edu/stereo/data/scenes2006/.
Jarvis, R.A. (1992). 3D shape and surface colour sensor fusion for robot vision. Robotica, Vol.
10, 389–396.
Klein, L.A. (1993). Sensor and Data Fusion Concepts and Applications. SPIE Opt. Engineering Press,
Tutorial Texts, Vol. 14.
Klette, R. & Schlüns, K. (1998). Computer Vision: Three-Dimensional Data from Images. Springer,
Singapore. ISBN: 9813083719, 1998.
Llinas, J. & Waltz, E. (1990). Multisensor Data Fusion. Boston, MA: Artech House.
Lowe, D.G. (1999). Object recognition from local scale-invariant features. In Proceedings of the
International Conference on Computer Vision ICCV, 1150–1157.
Malik, A.S. & Choi, T.-S. (2007). Application of passive techniques for three dimensional
cameras. IEEE Transactions on Consumer Electronics, Vol. 53, No. 2, 258–264, 2007.
Malik, A. S. & Choi, T.-S. (2008). A novel algorithm for estimation of depth map using image
focus for 3D shape recovery in the presence of noise. Pattern Recognition, Vol. 41,
No. 7, July 2008, 2200-2225.
Miled, W. & Pesquet, J.-C. (2009). A convex optimization approach for depth estimation
under illumination variation. IEEE Transactions on image processing, Vol. 18, No. 4,
2009, 813-830.
Pulli, K. & Cohen, M. & Duchamp, M. & Hoppe, H. & McDonald, J. & Shapiro, L. & Stuetzle,
W. (1997). Surface modeling and display from range and color data. Lecture Notes
in Computer Science 1310: 385-397, ISBN: 978-3-540-63507-9, Springer Berlin.
Saxena, A. & Chung, S. H. (2008). 3D depth reconstruction from a single still image.
International journal of computer vision, Vol. 76, No. 1, 2008, 53-69.
Scharstein, D. & Szeliski, R. (2003). High-accuracy stereo depth maps using structured light.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1,
pp. 195–202.
Stamos, I. & Allen, P.K. (2000). 3D model construction using range and image data. In
Proceedings of the International Conference on Vision and Pattern Recognition, 2000.
Srivastava, A., Lee, A.B., Simoncelli, E.P. & Zhu, S.C. (2003). On advances in statistical
modeling of natural images. Journal of the Optical Society of America, Vol. 53, No. 3,
375–385, 2003.
Torralba, A. & Oliva, A. (2002). Depth estimation from image structure. IEEE Trans. Pattern
Analysis and Machine Intelligence, Vol. 24, No. 9, 1226–1238, 2002.
Torres-Méndez, L. A. & Dudek, G. (2003). Statistical inference and synthesis in the image
domain for mobile robot environment modeling. In Proc. of the IEEE/RSJ Conference
on Intelligent Robots and Systems, Vol. 3, pp. 2699–2706, October, Las Vegas, USA.
Torres-Méndez, L. A. & Dudek, G. (2008). Inter-Image Statistics for 3D Environment
Modeling. International Journal of Computer Vision, Vol. 79, No. 2, 137-158, 2008.
ISSN: 0920-5691.
Torres-Méndez, L. A. (2008). Inter-Image Statistics for Mobile Robot Environment Modeling.
VDM Verlag Dr. Muller, 2008, ISBN: 3639068157.
Van Der Schaaf, A. (1998). Natural Image Statistics and Visual Processing. PhD thesis,
Rijksuniversiteit Groningen, 1998.
Wan, D. & Zhou, J. (2009). Multiresolution and wide-scope depth estimation using a dual-
PTZ-camera system. IEEE Transactions on Image Processing, Vol. 18, No. 3, 677–682.
10
Mathematical Basis of Sensor Fusion in Intrusion Detection Systems
Balakrishnan Narayanaswamy
Associate Director, Indian Institute of Science, Bangalore
India
1. Introduction
Intrusion Detection Systems (IDSs) gather information from a computer or a network and
analyze it to identify possible security breaches against the system or the network. The
network traffic (with embedded attacks) is often complex because of multiple communication
modes, the deformable nature of user traits, evasion of attack-detection and
network-monitoring tools, changes in user and attacker behavior over time, and the
sophistication of the attacker's attempts to avoid detection. This affects the accuracy and
the reliability of any IDS. A survey of the IDSs available in the literature shows distinct
preferences for detecting a certain class of attacks with improved accuracy, while
performing moderately on the other classes. The availability of enormous computing power
has made it possible to develop and implement IDSs of different types on the same network.
With the advances in sensor fusion, it has become possible to obtain a more reliable and
accurate decision for a wider class of attacks by combining the decisions of multiple IDSs.
Clearly, sensor fusion for performance enhancement of IDSs requires very complex
observations, combinations of decisions, and inferences via scenarios and models. Although
fusion in the context of enhancing intrusion detection performance has been discussed
earlier in the literature, there is still a lack of theoretical analysis and understanding,
particularly with respect to the correlation of detector decisions. This chapter undertakes
a theoretical study of why and how sensor fusion algorithms work when the decisions from
multiple detectors are combined. With a precise understanding of why, when, and how
particular sensor fusion methods can be applied successfully, progress can be made towards
a powerful new tool for intrusion detection: the ability to automatically exploit the
strengths and weaknesses of different IDSs. The issue of performance enhancement using
sensor fusion is therefore a topic of great draw and depth, offering wide-ranging
implications and a fascinating community of researchers to work within.
This chapter introduces a mathematical basis for sensor fusion that provides sufficient
support for the acceptability of sensor fusion in the performance enhancement of IDSs. It
justifies the novelties of, and provides supporting proof for, the Data-dependent Decision
(DD) fusion architecture. The neural-network learner unit of the DD fusion architecture
aids improved intrusion detection sensitivity and false-alarm reduction. The theoretical
model is developed initially without any knowledge of the available detectors or the
monitoring data. The empirical evaluation that augments the mathematical analysis is
illustrated using the DARPA data set as well as real-world network traffic. The
experimental results confirm the analytical findings of this chapter.
2. Related Work
Krogh & Vedelsby (1995) prove that, at a single data point, the quadratic error of the
ensemble estimator is guaranteed to be less than or equal to the average quadratic error of
the component estimators. Hall & McMullen (2000) state that if the tactical rules of
detection require that a particular certainty threshold be exceeded for attack detection,
then the fused decision provides added detection up to 25% greater than the detection at
which any individual IDS alone exceeds the threshold. This added detection equates to
increased tactical options and to an improved probability of true negatives (Hall &
McMullen, 2000). Another attempt to illustrate the quantitative benefit of sensor fusion is
provided by Nahin & Pokoski (1980), whose work demonstrates the benefits of multisensor
fusion and also provides some conceptual rules of thumb.
Chair & Varshney (1986) present an optimal data fusion structure for a distributed sensor
network, which minimizes the cumulative average risk. The structure weights each individual
decision according to the reliability of the sensor, the weights being functions of the
probability of false alarm and the probability of detection. The maximum a posteriori (MAP)
test and the Likelihood Ratio (LR) test require either exact knowledge of the a priori
probabilities of the tested hypotheses or the assumption that all the hypotheses are
equally likely. This limitation is overcome in the work of Thomopoulos et al. (1987), who
use the Neyman-Pearson test to derive an optimal decision fusion rule. Baek & Bommareddy
(1995) present optimal decision rules for problems involving n distributed sensors and m
target classes.
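The Chair-Varshney structure just described can be sketched as follows. The function name and the prior `p1` are hypothetical, and the per-sensor detection and false-alarm probabilities are assumed known, as in the original formulation; this is an illustrative sketch rather than the paper's exact algorithm.

```python
import math

def chair_varshney(decisions, pd, pf, p1=0.5):
    """Chair-Varshney-style optimal fusion (sketch): weight each local
    binary decision by the reliability of its sensor.

    decisions[i] in {0, 1}: local decision of IDS i
    pd[i], pf[i]: detection / false-alarm probabilities (assumed known)
    p1: prior probability of an attack (hypothetical value).
    Returns 1 (attack) when the fused log-likelihood ratio is positive.
    """
    llr = math.log(p1 / (1.0 - p1))          # prior term
    for u, d, f in zip(decisions, pd, pf):
        # log-likelihood contribution of local decision u
        llr += math.log(d / f) if u == 1 else math.log((1.0 - d) / (1.0 - f))
    return 1 if llr > 0 else 0
```

Note that a reliable sensor (high `pd`, low `pf`) contributes a large weight, so its decision can override several weak sensors.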
Aalo & Viswanathan (1995) perform numerical simulations of the correlation problem to study
the effect of error correlation on the performance of distributed detection systems. The
system performance is shown to deteriorate when the correlation between the sensor errors
is positive and increasing, while the performance improves considerably when the
correlation is negative and increasing. Drakopoulos & Lee (1995) derive an optimum fusion
rule for the Neyman-Pearson criterion and use simulation to study its performance for a
specific type of correlation matrix. Kam et al. (1995) consider the case in which the
class-conditioned sensor-to-sensor correlation coefficients are known, and express the
result in compact form using the Bahadur-Lazarsfeld expansion of the probability density
functions. Their approach generalizes the method adopted by Chair & Varshney (1986) for
solving the data fusion problem for fixed binary local detectors with statistically
independent decisions. Blum et al. (1995) study the problem of locally most powerful
detection for correlated local decisions.
The next section attempts a theoretical modeling of sensor fusion applied to intrusion
detection, with little or no knowledge regarding the detectors or the network traffic.
3. Theoretical Analysis
The choice of when to perform the fusion depends on the types of sensor data available and
the types of preprocessing performed by the sensors. Fusion can occur at various levels:
1) the input data level, prior to feature extraction; 2) the feature vector level, prior to
identity declaration; and 3) the decision level, after each sensor has made an independent
declaration of identity.
Sensor fusion is expected to yield both qualitative and quantitative benefits for the
intrusion detection application. Its primary aim is to detect intrusions and to make
reliable inferences that may not be possible from a single sensor alone. The particular
quantitative improvement in estimation that results from using multiple IDSs depends on the
performance of the specific IDSs involved, namely their observational accuracy. The fused
estimate thus takes advantage of the relative strengths of each IDS, resulting in an
improved estimate of the intrusion detection. Error analysis techniques also provide a
means for determining the specific quantitative benefits of sensor fusion in the case of
intrusion detection: they distinguish phenomena that are likely from mere chance
occurrences.
To perform the theoretical analysis, it is necessary to model the process under
consideration. Consider a simple fusion architecture, as given in Fig. 1, with n individual
IDSs combined by means of a fusion unit. To start with, consider a two-class problem with
the detectors responding in a binary manner. Each local detector collects an observation
x_j ∈ ℝ^m and transforms it to a local decision s_i^j ∈ {0, 1}, i = 1, 2, ..., n, where the
decision is 0 when the traffic is detected as normal and 1 otherwise. Thus s_i^j is the
response of the i-th detector to a network connection belonging to class j ∈ {0, 1}, where
the classes correspond to normal traffic and attack traffic respectively. These local
decisions s_i^j are fed to the fusion unit to produce a unanimous decision y = s^j, which
is supposed to minimize the overall cost of misclassification
and improve the overall detection rate.
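The architecture just described can be sketched with the simplest possible fusion rule, a majority vote over the binary local decisions; this is only an illustrative baseline, not the rule the chapter ultimately develops.

```python
def majority_fusion(decisions):
    """A minimal fusion unit f: declare an attack (1) when a strict
    majority of the n local binary IDS decisions s_i are 1."""
    return 1 if 2 * sum(decisions) > len(decisions) else 0
```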
The fundamental problem of network intrusion detection can be viewed as a detection task:
to decide whether a network connection x is a normal one or an attack. Assume a set of
unknown features e = {e1, e2, ..., em} that are used to characterize the network traffic.
The feature extractor is given by ee(x) ⊂ e. It is assumed that this observed variable has
a deterministic component and a random component, and that their relation is additive. The
deterministic component is due to the fact that the class is discrete in nature, i.e.,
during detection it is known that the connection is either normal or an attack. The
imprecise component is due to random processes that affect the quality of the extracted
features; indeed, it has a distribution governed by the extracted feature set, often in a
nonlinear way. The source of distortion in the extracted network features ee(x) is ignored,
and the noise component is assumed to be random (while in fact this may not be the case
when all possible variations can be systematically incorporated into the base-expert
model). Here x is the sniffed network traffic, ee is a feature extractor, and θi is a set
of parameters associated with the detector indexed i. Several types of intrusion detectors
exist, all of which can be represented in this form.
Sensor fusion results in the combination of data from sensors competent on partially
overlapping frames. The output of a fusion system is characterized by a variable s, which
is a function of the uncertain variables s1, ..., sn, the outputs of the individual IDSs:

s = f(s1, ..., sn)

where f(·) is the fusion function. The variables s1, ..., sn are independent (i.e.,
information about any group of them does not change the belief about the others), but
imprecise and dependent on the class of observation.
The variance of the IDSs determines how good their average quality is when each IDS acts
individually: lower variance corresponds to better performance. The covariance among
detectors measures their dependence: the greater the dependence, the smaller the gain
obtained from fusion.
Let us consider two cases. In the first case, n responses are available for each access and
these n responses are used independently of each other. The average of the variances of the
individual responses over all i = 1, 2, ..., n, denoted (σ_av^j)², is given as:

(σ_av^j)² = (1/n) Σ_{i=1}^{n} (σ_i^j)²    (4)

In the second case, all n responses are combined using the mean operator. The variance over
many accesses, denoted (σ_fusion^j)² and called the variance of the average, is given by:

(σ_fusion^j)² = (1/n²) Σ_{i=1}^{n} (σ_i^j)² + (1/n²) Σ_{i=1}^{n} Σ_{k=1, k≠i}^{n} ρ_{i,k}^j σ_i^j σ_k^j    (5)
where ρ_{i,k}^j is the correlation coefficient between the i-th and k-th detectors, with j
taking the different class values. The first term is the average variance of the base
experts scaled by 1/n, while the second term is the covariance between the i-th and k-th
detectors for i ≠ k, since the term ρ_{i,k}^j σ_i^j σ_k^j is by definition the covariance.
On analysis, it is seen that:

(σ_fusion^j)² ≤ (σ_av^j)²    (6)

When two detector scores are merged by a simple mean operator, the resultant variance of
the final score is reduced with respect to the average variance of the two original scores.
Since 0 ≤ ρ_{i,k}^j ≤ 1,

(1/n) (σ_av^j)² ≤ (σ_fusion^j)²    (7)
Equations 6 and 7 give the upper and lower bounds of (σ_fusion^j)², attained under full
correlation and under uncorrelated detectors, respectively. Any positive correlation
results in a variance between these bounds. Hence, by combining responses using the mean
operator, the resultant variance is assured to be smaller than the average (though not
necessarily the minimum) variance.

Fusion of the scores reduces variance, which in turn results in a reduction of error (with
respect to the case where scores are used separately). To measure explicitly the factor of
reduction in variance:

(1/n) (σ_av^j)² ≤ (σ_fusion^j)² ≤ (σ_av^j)²    (8)

Factor of reduction in variance: v_r = (σ_av^j)² / (σ_fusion^j)², with 1 ≤ v_r ≤ n.
This clearly indicates that the reduction in variance is larger when more detectors are
used; i.e., the larger n, the better the combined system, even if the hypotheses of the
underlying IDSs are correlated. This comes at the cost of increased computation,
proportional to n. The reduction in variance of the individual classes results in less
overlap between the class distributions; thus the chance of error is reduced, which in turn
results in improved detection. This forms the argument of this chapter for why fusion using
multiple detectors works for the intrusion detection application.
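The bounds in equations 6-8 can be checked numerically with a small Monte Carlo sketch; the sample size, the common correlation value, and the unit variances below are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: n detectors with unit variance and a common
# positive correlation rho between every pair of detector scores.
n, rho = 5, 0.3
cov = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)

rng = np.random.default_rng(42)
scores = rng.multivariate_normal(np.zeros(n), cov, size=200_000)

avg_var = scores.var(axis=0).mean()      # (sigma_av)^2, equation 4
fused_var = scores.mean(axis=1).var()    # (sigma_fusion)^2, equation 5

# Equation 8: (1/n)(sigma_av)^2 <= (sigma_fusion)^2 <= (sigma_av)^2
assert avg_var / n <= fused_var <= avg_var
vr = avg_var / fused_var                 # factor of reduction in variance
assert 1.0 <= vr <= n
```

With rho = 0.3 the fused variance sits strictly between the two bounds, as any positive correlation predicts; setting rho = 0 pushes it toward the lower bound 1/n.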
From the above analysis using a mean operator for fusion, the conclusions drawn are the
following. The analysis explains and shows that fusing two systems of different
performances is not always beneficial. The theoretical analysis shows that if the weaker
IDS has a (class-dependent)
variance three times larger than that of the best IDS, the gain due to fusion breaks down.
This is even more true for correlated base experts, as correlation penalizes this limit
further. It is also seen that fusing two uncorrelated IDSs of similar performance always
results in improved performance. Finally, fusing two correlated IDSs of similar performance
is beneficial only when the covariance of the two IDSs is less than their variance.
It is necessary to show that a lower bound on accuracy results in the case of sensor
fusion. This can be proved as follows. Given the fused output s = Σ_i w_i s_i, the
quadratic errors of a sensor indexed i, (e_i), and of the fused sensor, (e_fusion), are
given by:

e_i = (s_i − c)²    (10)

and

e_fusion = (s_fusion − c)²    (11)

respectively, where w_i is the weight on the i-th detector and c is the target. The
ambiguity of the i-th sensor is defined as:

a_i = (s_i − s)²    (12)
The squared error of the fused sensor equals the weighted average squared error of the
individuals, minus a term that measures the average ambiguity. Non-uniform weights are
allowed (with the constraint Σ_i w_i = 1); hence the general form of the ensemble output is
s = Σ_i w_i s_i, and the average ambiguity is:

a_fusion = Σ_i w_i a_i = Σ_i w_i (s_i − s)²    (13)

Expanding equation 13, the error due to the combination of several detectors is obtained as
the difference between the weighted average error of the individual detectors and the
ambiguity among the fusion member decisions:

e_fusion = Σ_i w_i (s_i − c)² − Σ_i w_i (s_i − s)²    (14)

The ambiguity among the fusion member decisions is always positive, and hence the
combination of several detectors is expected to be better than the average over the
individual detectors. This result turns out to be very important for the focus of this
chapter.
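The decomposition in equation 14 is an algebraic identity and can be verified numerically; the scores, weights, and target below are arbitrary illustrative values.

```python
import numpy as np

# Arbitrary illustrative values: four detector scores, normalized
# weights (summing to 1), and a target c.
s = np.array([0.2, 0.9, 0.5, 0.7])
w = np.array([0.4, 0.3, 0.2, 0.1])
c = 1.0

s_bar = w @ s                            # fused output s = sum_i w_i s_i
e_fusion = (s_bar - c) ** 2              # fused error, equation 11
e_avg = w @ (s - c) ** 2                 # weighted individual errors
a_bar = w @ (s - s_bar) ** 2             # ambiguity term, equation 13

# Equation 14: fused error = weighted average error minus ambiguity
assert abs(e_fusion - (e_avg - a_bar)) < 1e-12
```

Because the ambiguity term `a_bar` is non-negative, the fused error can never exceed the weighted average of the individual errors, which is exactly the lower-bound argument made above.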
4. Solution Approaches
In the fusion problem, the solution approaches depend on whether there is any knowledge
regarding the traffic and the intrusion detectors. This section initially assumes no
knowledge of the IDSs and the intrusion detection data, and later a knowledge of the
available IDSs and an evaluation dataset. There is an arsenal of different theories of
uncertainty, and of methods based on these theories, for making decisions under
uncertainty. There is no consensus as to which method is most suitable for problems with
epistemic uncertainty, where information is scarce and imprecise. The choice of
heterogeneous detectors is expected to result in decisions that conflict or concur,
completely or partially. The detectors can be categorized by their output s_i: a
probability (within the range [0, 1]), a Basic Probability Assignment (BPA) m (within the
range [0, 1]), a membership function (within the range [0, 1]), a distance metric (greater
than or equal to zero), or a log-likelihood ratio (a real number).
Consider a body of evidence (F, m), where F represents the set of all focal elements and m
their corresponding basic probability assignments. This analysis, performed without any
knowledge of the system or the data, attempts to prove the acceptability of sensor fusion
for improving intrusion detection performance, and hence is unlimited in scope. In this
analysis the Dempster-Shafer fusion operator is used, since it is well suited to the
intrusion detection application, as explained below.
Dempster-Shafer theory considers two types of uncertainty: 1) that due to imprecision in
the evidence, and 2) that due to conflict. Non-specificity and strife measure the
uncertainty due to imprecision and conflict, respectively. The larger the focal elements of
a body of evidence, the more imprecise the evidence and, consequently, the higher the
non-specificity. When the evidence is precise (all focal elements consist of a single
member), non-specificity is zero. The importance of Dempster-Shafer theory in intrusion
detection is that, in order to track statistics, it is necessary to model the distribution
of decisions. If these decisions are probabilistic assignments over the set of labels, the
distribution function will be too complicated to retain precisely. The Dempster-Shafer
theory of evidence solves this problem by simplifying the opinions to Boolean decisions, so
that each detector decision lies in a space having 2^|Θ| elements, where Θ defines the
working space. In this way, the full set of statistics can be specified using 2^|Θ| values.
The DS rule corresponds to a conjunction operator, since it builds the belief induced by
accepting two pieces of evidence, i.e., by accepting their conjunction. Shafer developed
the DS theory of evidence based on the model that all the hypotheses in the Frame of
Discernment (FoD) are exclusive and the frame is exhaustive. The purpose is to combine or
aggregate several independent and equally reliable sources of evidence expressing their
belief on the set. The aim of using the DS theory of
fusion is that, with any set of decisions from heterogeneous detectors, sensor fusion can
be modeled as utility maximization. DS combination conceives novel categories that classify
empirical evidence in a novel way and are possibly better able to discriminate the relevant
aspects of emergent phenomena. Novel categories detect novel empirical evidence that may be
fragmentary, irrelevant, contradictory, or supportive of particular hypotheses.
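Dempster's rule of combination can be sketched for the two-class intrusion frame; the mass values assigned to the two hypothetical IDSs below are illustrative.

```python
def dempster_combine(m1, m2):
    """Dempster's rule: conjunctive combination of two BPAs, normalized
    by the non-conflicting mass. Focal elements are frozensets."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb          # mass falling on the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two-class frame of discernment: normal vs. attack traffic.
N, A = frozenset({"normal"}), frozenset({"attack"})
theta = N | A                                # total ignorance
ids1 = {A: 0.7, theta: 0.3}                  # illustrative masses from IDS 1
ids2 = {A: 0.6, theta: 0.4}                  # illustrative masses from IDS 2
fused = dempster_combine(ids1, ids2)         # belief in "attack" is reinforced
```

Here two detectors that each lean towards "attack" combine to a mass of 0.88 on the attack hypothesis, illustrating how concurring evidence is reinforced by the conjunctive rule.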
The DS theory approach for quantifying the uncertainty in the performance of a detector and
assessing the improvement in system performance consists of three steps:
1. Model uncertainty by considering each variable separately; then derive a model that
considers all variables together.
2. Propagate uncertainty through the system, which results in a model of uncertainty in
the performance of the system.
3. Assess the system performance enhancement.
In Dempster-Shafer theory, Θ is the Frame of Discernment (FoD), which defines the working
space for the desired application. The FoD is expected to contain all propositions of which
the information sources (IDSs) can provide evidence. When a proposition corresponds to a
subset of a frame of discernment, the frame is said to discern that proposition. The
elements of the frame Θ are assumed to be exclusive propositions. This constraint is always
satisfied in the intrusion detection application because of the discrete nature of the
detector decision. The belief in the likelihood of the traffic being in an anomalous state
is expressed by the various IDSs by assigning mass to subsets of the FoD.
The DS theory is a generalization of classical probability theory, with its additivity
axiom excluded or modified. The probability mass function (p) is a mapping which indicates
how the probability mass is assigned to the elements. The Basic Probability Assignment
(BPA) function (m), on the other hand, is a set mapping, and the two can be related
∀A ⊆ Θ as m(A) = Σ_{B⊆A} p(B); hence m(A) relates to a belief structure. The mass m is very
near to the probabilistic mass p, except that it is shared not only by the single
hypotheses but also by the unions of hypotheses.
In DS theory, rather than knowing exactly how the probability is distributed over the
elements B ∈ Θ, we only know, via the BPA function m, that a certain quantity of
probability mass is somehow divided among the focal elements. Because of this less specific
knowledge about the allocation of the probability mass, it is difficult to assign exactly
the probability associated with the subsets of the FoD; instead, two measures are assigned:
(1) the belief (Bel) and (2) the plausibility (Pl), which correspond to the lower and upper
bounds on the probability:

Bel(A) ≤ p(A) ≤ Pl(A)

where the belief function Bel(A) measures the minimum uncertainty value about proposition
A, and the plausibility Pl(A) reflects the maximum uncertainty value about A.
The following are the key assumptions made in the fusion of intrusion detectors:
• If some of the detectors are imprecise, the uncertainty about an event can be quantified
by the maximum and minimum probabilities of that event. The maximum (minimum) probability
of an event is the maximum (minimum) of all probabilities that are consistent with the
available evidence.
• The process of asking an IDS about an uncertain variable is a random experiment whose
outcome can be precise or imprecise. There is randomness because every time a different IDS
observes the variable, a different decision can be expected. An IDS can be precise and
provide a single value, or imprecise and provide an interval. Therefore, if the information
about uncertainty consists of intervals from multiple IDSs, there is uncertainty due to
both imprecision and randomness.
If all IDSs are precise, then the pieces of evidence from these IDSs point precisely to specific
values. In this case, a probability distribution of the variable can be built. However, if the IDSs
provide intervals, such a probability distribution cannot be built, because it is not known
which specific values of the random variable each piece of evidence supports.
The equation Bel(A) + Bel(Ā) = 1, which is equivalent to Bel(A) = Pl(A), holds for all subsets
A of the FoD if and only if Bel's focal elements are all singletons. In this case, Bel is an
additive probability distribution. Whether normalized or not, the DS method satisfies the two
The problem is formalized as follows. Considering the network traffic, assume a traffic space
Θ, which is the union of the different classes, namely the attack and the normal. The attack
class contains different types of attacks, and the classes are assumed to be mutually exclusive.
Each IDS assigns to an observed traffic sample x ∈ Θ a decision denoting that the traffic
sample comes from a class which is an element of the FoD, Θ. With n IDSs used for the
combination, the decision of each of the IDSs is considered for the final decision of the
fusion IDS.
This chapter presents a method to detect unknown traffic attacks with an increased degree
of confidence by making use of a fusion system composed of detectors. Each detector observes
the same traffic on the network and detects the attack traffic with an uncertainty index. The
frame of discernment consists of singletons that are exclusive (Ai ∩ Aj = ∅, ∀ i ≠ j) and
exhaustive, since the FoD consists of all the expected attacks which the individual IDS detects,
or else the detector fails to detect an attack by recognizing it as normal traffic. All the
constituent IDSs that take part in the fusion are assumed to have a global point of view about
the system rather than
Mathematical Basis of Sensor Fusion in Intrusion Detection Systems 235
separate detectors being introduced to give a specialized opinion about a single hypothesis.
The DS combination rule gives the combined mass of the two evidences m_1 and m_2 on any
subset A of the FoD as m(A), given by:

m(A) = [ ∑_{X∩Y=A} m_1(X) m_2(Y) ] / [ 1 − ∑_{X∩Y=∅} m_1(X) m_2(Y) ]     (15)
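Equation 15 can be sketched in code; the representation of a BPA as a dictionary keyed by frozenset subsets of the FoD is an assumption made for illustration:

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination (eq. 15): combine two BPAs given as
    dicts mapping frozenset subsets of the FoD to their masses."""
    combined, conflict = {}, 0.0
    for x, mx in m1.items():
        for y, my in m2.items():
            inter = x & y
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mx * my
            else:
                conflict += mx * my   # mass falling on the empty set
    # renormalize by 1 - K, K being the total conflicting mass
    return {a: v / (1.0 - conflict) for a, v in combined.items()}
```

With two IDSs each assigning most of their mass to {attack} and the rest to the whole frame, the combined mass on {attack} exceeds either individual mass, illustrating the confidence gain discussed below.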
Specifically, if a particular detector indexed i taking part in the fusion has probability of
detection m_i(A) for a particular class A, the fusion is expected to result in a probability m(A)
for that class which is greater than m_i(A) ∀ i and A. Thus the confidence in detecting a
particular class is improved, which is the key aim of sensor fusion. The above analysis is
simple, since it considers only one class at a time. The variances of the two classes can be
merged, the resultant variance being the sum of the normalized variances of the individual
classes; hence the class label can be dropped.
Assuming the attack connection and normal connection scores to have mean values µ_I and
µ_NI respectively, with µ_I > µ_NI without loss of generality, let σ_I and σ_NI be the standard
deviations of the attack connection and normal connection scores. The two types of errors
committed by IDSs are often measured by the False Positive Rate (FP_rate) and the False
Negative Rate (FN_rate). FP_rate is calculated by integrating the normal score distribution
from a given threshold T in the score space to ∞, while FN_rate is calculated by integrating
the attack score distribution from −∞ to the given threshold T. The threshold T is the unique
point where the error is minimized, i.e., where the difference between FP_rate and FN_rate is
minimized by the following criterion:
T = argmin_T ( | FP_rate(T) − FN_rate(T) | )     (16)
At this threshold value, the resultant error due to FP_rate and FN_rate is a minimum. This is
because FN_rate is an increasing function (a cumulative distribution function, cdf) and
FP_rate is a decreasing function (1 − cdf); T is the point where these two functions intersect.
Decreasing the error introduced by the FP_rate and the FN_rate implies an improvement in
the performance of the system.

FP_rate = ∫_{T}^{∞} p_{k=NI}(y) dy     (17)

FN_rate = ∫_{−∞}^{T} p_{k=I}(y) dy     (18)
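The threshold criterion of equations 16-18 can be evaluated numerically for Gaussian score distributions. The grid search and the distribution parameters in the sketch below are illustrative assumptions, not values from the chapter:

```python
import math

def norm_cdf(x, mu, sigma):
    """Standard result: Gaussian cdf via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def find_threshold(mu_ni, sigma_ni, mu_i, sigma_i, steps=10000):
    """Grid-search the T minimizing |FP_rate(T) - FN_rate(T)| (eq. 16)."""
    lo, hi = mu_ni - 4.0 * sigma_ni, mu_i + 4.0 * sigma_i
    best_t, best_gap = lo, float('inf')
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        fp = 1.0 - norm_cdf(t, mu_ni, sigma_ni)   # eq. 17: normal scores above T
        fn = norm_cdf(t, mu_i, sigma_i)           # eq. 18: attack scores below T
        gap = abs(fp - fn)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t
```

For symmetric unit-variance score distributions the threshold lands midway between the two means, as expected from the intersection argument above.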
The fusion algorithm accepts decisions from many IDSs, where a minority of the decisions are
false positives or false negatives. A good sensor fusion system is expected to give a result that
accurately represents the decisions of the correctly performing individual sensors, while
minimizing the influence of the decisions from erroneous IDSs. Approximate agreement
emphasizes precision, even when this conflicts with system accuracy. Sensor fusion, however,
is concerned solely with the accuracy of the readings, which is appropriate for sensor
applications, even though increased precision within known accuracy bounds would be
beneficial in most cases. Hence the following strategy is adopted:
• The false alarm rate FP_rate can be fixed at an acceptable value α_0 and the detection
rate then maximized. Based on the above criteria, a lower bound on accuracy can be
derived.
• The detection rate is always higher than the false alarm rate for every IDS, an
assumption that is trivially satisfied by any reasonably functional sensor.
• Determine whether the accuracy of the IDS after fusion is indeed better than the
accuracy of the individual IDSs, in order to support the performance enhancement of
the fusion IDS.
• Discover the weights on the individual IDSs that give the best fusion.
Given the acceptable false alarm rate FP_rate = α_0, choose the threshold T that maximizes
TP_rate and thus minimizes FN_rate:

TP_rate = Pr[ ∑_{i=1}^{n} w_i s_i ≥ T | attack ]     (19)

FP_rate = Pr[ ∑_{i=1}^{n} w_i s_i ≥ T | normal ] = α_0     (20)
The fusion of IDSs becomes meaningful only when FP ≤ FPi ∀ i and TP ≥ TPi ∀ i. In order
to satisfy these conditions, an adaptive or dynamic weighting of IDSs is the only possible
alternative. The model of the fusion output is given as:

s = ∑_{i=1}^{n} w_i s_i,   TP_i = Pr[s_i = 1 | attack],   FP_i = Pr[s_i = 1 | normal]     (21)
where TPi is the detection rate and FPi is the false positive rate of any individual IDS indexed
i. It is required to provide a low value of weight to any individual IDS that is unreliable, hence
meeting the constraint on false alarm as given in equation 20. Similarly, the fusion improves
the TPrate , since the detectors get appropriately weighted according to their performance.
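The model of equations 19-21 can be checked empirically from sampled binary decisions. The helper below is an illustrative sketch, not the chapter's implementation; the weights, threshold, and decision samples are assumptions:

```python
def rates(weights, threshold, attack_runs, normal_runs):
    """Empirical TP_rate and FP_rate of the weighted fusion rule
    (eqs. 19-21), from lists of per-IDS binary decision vectors observed
    under attack and under normal traffic respectively."""
    def hit(decisions):
        # fusion fires when the weighted sum reaches the threshold
        return sum(w * s for w, s in zip(weights, decisions)) >= threshold
    tp = sum(hit(d) for d in attack_runs) / len(attack_runs)
    fp = sum(hit(d) for d in normal_runs) / len(normal_runs)
    return tp, fp
```

Sweeping the threshold with such samples gives the empirical operating point; lowering the weight of an unreliable IDS visibly lowers the FP_rate, in line with the constraint of equation 20.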
Fusion of the decisions from various IDSs is expected to produce a single decision that is
more informative and accurate than any of the decisions from the individual IDSs. Then the
question arises as to whether it is optimal. Towards that end, a lower bound on variance for
the fusion problem of independent sensors, or an upper bound on the false positive rate or a
lower bound on the detection rate for the fusion problem of dependent sensors is presented
in this chapter.
The successful operation of a multiple sensor system critically depends on the methods that
combine the outputs of the sensors. A suitable rule can be inferred using training examples,
where the errors introduced by the various individual sensors are unknown and not
controllable. The choice of the sensors has been made and the system is available, and the
fusion rule for the system has to be obtained. A system of n sensors IDS_1, IDS_2, ..., IDS_n
is considered; corresponding to an observation with parameter x, x ∈ ℜ^m, sensor IDS_i
yields an output s_i, s_i ∈ ℜ^m, according to an unknown probability distribution p_i. A
training l-sample (x_1, y_1), (x_2, y_2), ..., (x_l, y_l) is given, where y_j = (s_j^1, s_j^2, ..., s_j^n)
and s_j^i is the output of IDS_i in response to the input x_j. The problem is to estimate a
fusion rule f : ℜ^{nm} → ℜ^m, based on the sample, such that the expected square error is
minimized over a family of fusion rules based on the given l-sample.
Consider n independent IDSs, with the decision of each being a random variable with a
Gaussian distribution of zero mean vector and diagonal covariance matrix (σ_1², σ_2², ..., σ_n²).
Assume s to be the expected fusion output, which is the unknown deterministic scalar
quantity to be estimated, and ŝ to be the estimate of the fusion output. In most cases the
estimate is a deterministic function of the data. Then the mean square error (MSE) associated
with the estimate ŝ for a particular test data set is given as E[(s − ŝ)²]. For a given value of s,
there are two basic kinds of errors:
• Random error, which is also called precision or estimation variance.
• Systematic error, which is also called accuracy or estimation bias.
Both kinds of errors can be quantified by the conditional distribution of the estimates pr (ŝ − s).
The MSE of a detector is the expected value of the error and is due to the randomness or due
to the estimator not taking into account the information that could produce a more accurate
result.
The MSE is the absolute error used to assess the quality of the sensor in terms of its variation
and unbiasedness. For an unbiased sensor, the MSE is the variance of the estimator, and the
root mean squared error (RMSE) is the standard deviation. The standard deviation measures
the accuracy of a set of probability assessments. The lower the RMSE, the better the estimator
in terms of both precision and accuracy. Thus, reduced variance can be considered an index
of improved accuracy and precision of any detector. Hence, the reduction in variance of the
fusion IDS is proved in this chapter to show its improved performance. The Cramer-Rao
inequality can be used for deriving the lower bound on the variance of an estimator.
The Cramer-Rao lower bound is used to get the best achievable estimation performance. Any
sensor fusion approach which achieves this performance is optimum in this regard. CR in-
equality states that the reciprocal of the Fisher information is an asymptotic lower bound on
the variance of any unbiased estimator ŝ. Fisher information is a method for summarizing the
influence of the parameters of a generative model on a collection of samples from that model.
In this case, the parameters we consider are the means of the Gaussians. Fisher information
is the variance of the score, i.e., of the partial derivative of the logarithm of the likelihood
function of the network traffic with respect to the parameter σ²:
score = ∂/∂σ² ln( L(σ²; s) )     (23)
Basically, the score tells us how sensitive the log-likelihood is to changes in the parameter. It
is a function of the variance σ² and the detection s, and this score is a sufficient statistic for
the variance. The expected value of the score is zero, and hence the Fisher information is
given by:

E[ ( ∂/∂σ² ln( L(σ²; s) ) )² | σ² ]     (24)
Fisher information is thus the expectation of the squared score. A random variable carrying
high Fisher information implies that the absolute value of the score is often high.
If the prior probabilities of detection of the various IDSs are known, the weights w_i,
i = 1, ..., n, can be assigned to the individual IDSs. The idea is to estimate the local accuracy of the IDSs. The
decision of the IDS with the highest local accuracy estimate will have the highest weighting
on aggregation. The best fusion algorithm is supposed to choose the correct class if any of the
individual IDS did so. This is a theoretical upper bound for all fusion algorithms. Of course,
the best individual IDS is a lower bound for any meaningful fusion algorithm. Depending
on the data, the fusion may sometimes be no better than Bayes. In such cases, the upper and
lower performance bounds are identical and there is no point in using a fusion algorithm. A
further insight into the CRB can be gained by understanding how each IDS affects it. With the
architecture shown in Fig. 1, the model is given by ŝ = ∑_{i=1}^{n} w_i s_i. The bound is
calculated from the effective variance of each of the IDSs, σ̂_i² = σ_i² / w_i², which are then
combined to give the CRB as 1 / ∑_{i=1}^{n} (1/σ̂_i²).
The weight assigned to an IDS is inversely proportional to its variance. This is due to the
fact that, if the variance is small, the IDS is expected to be more dependable. The bound on
the smallest variance of an estimate ŝ is given as:

σ̂² = E[(ŝ − s)²] ≥ 1 / ∑_{i=1}^{n} (w_i² / σ_i²)     (26)
It can be observed from equation 26 that any IDS decision that is not reliable will have a very
limited impact on the bound. This is because a non-reliable IDS will have a much larger
variance than the other IDSs in the group; σ̂_n² ≫ σ̂_1², ..., σ̂_{n−1}², and hence
1/σ̂_n² ≪ 1/σ̂_1², ..., 1/σ̂_{n−1}². The bound can then be approximated as
1 / ∑_{i=1}^{n−1} (1/σ̂_i²).
Also, it can be observed from equation 26 that the bound shows the asymptotically optimum
behavior of minimum variance. With σ̂_i² > 0 and σ̂_min² = min_i [σ̂_1², ..., σ̂_n²],

CRB = 1 / ∑_{i=1}^{n} (1/σ̂_i²) < σ̂_min² ≤ σ̂_i²     (27)
From equation 27 it can also be shown that perfect performance is apparently possible with
enough IDSs: the bound tends to zero as more and more individual IDSs are added to the
fusion unit.

CRB_{n→∞} = Lt_{n→∞} 1 / (1/σ̂_1² + ... + 1/σ̂_n²)     (28)

With equal variances σ̂²,

CRB_{n→∞} = Lt_{n→∞} 1 / (n/σ̂²) = Lt_{n→∞} σ̂²/n = 0     (29)
From equations 28 and 29 it can be easily interpreted that increasing the number of IDSs to a
sufficiently large number will drive the performance bound towards perfect estimates. Also,
due to the monotone decreasing nature of the bound, the IDSs can be chosen to make the
performance as close to perfect as desired.
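The bound of equations 26-29 can be computed directly. The helper below is a sketch; the variances and weights fed to it are hypothetical, chosen to illustrate the monotone decrease with n and the negligible impact of an unreliable IDS:

```python
def crb(variances, weights=None):
    """Cramer-Rao bound for the weighted fusion of independent, unbiased
    IDSs: effective variances are sigma_i^2 / w_i^2, and the bound is
    1 / sum(1 / eff_var_i)  (cf. eqs. 26-27)."""
    if weights is None:
        weights = [1.0] * len(variances)   # equal weighting by default
    eff = [v / (w * w) for v, w in zip(variances, weights)]
    return 1.0 / sum(1.0 / e for e in eff)
```

Adding a very high-variance (unreliable) IDS leaves the bound almost unchanged, while adding reliable IDSs drives it monotonically towards zero and always below the smallest individual variance.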
As an illustration, let us consider a system with three individual IDSs, with a joint density at
the IDSs having a covariance matrix of the form:

Σ = [ 1     ρ_12  ρ_13
      ρ_21  1     ρ_23     (30)
      ρ_31  ρ_32  1    ]
The false alarm rate (α) at the fusion center, where the individual decisions are aggregated,
can be written as:

α_max = 1 − Pr(s_1 = 0, s_2 = 0, s_3 = 0 | normal) = 1 − ∫_{−∞}^{t} ∫_{−∞}^{t} ∫_{−∞}^{t} P_s(s | normal) ds     (31)
where Ps (s|normal ) is the density of the sensor observations under the hypothesis normal
and is a function of the correlation coefficient, ρ. Assuming a single threshold, T, for all the
sensors, and with the same correlation coefficient, ρ between different sensors, a function
Fn ( T |ρ) = Pr (s1 = 0, s2 = 0, s3 = 0) can be defined.
F_n(T | ρ) = ∫_{−∞}^{∞} F^n( (T − √ρ y) / √(1 − ρ) ) f(y) dy     (32)

where f(y) and F(X) are the standard normal density and cumulative distribution functions,
respectively.
F^n(X) = [F(X)]^n     (33)

and

α_max = 1 − F_3(T | ρ)   for −0.5 ≤ ρ < 1     (34)
With this threshold T, the probability of detection at the fusion unit can be computed as:

TP_min = 1 − ∫_{−∞}^{∞} F^3( (T − S − √ρ y) / √(1 − ρ) ) f(y) dy   for 0 ≤ ρ < 1     (35)

and

TP_min = 1 − F_3(T − S | ρ)   for −0.5 ≤ ρ < 1     (36)
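Equations 32-34 can be evaluated by simple numerical quadrature. The sketch below covers only 0 ≤ ρ < 1 (the √ρ form does not extend to the negative part of the stated range); the step counts and integration limits are illustrative assumptions:

```python
import math

def pdf(x):
    """Standard normal density f(y)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution F(X)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def F_n(T, rho, n=3, steps=2000, lim=8.0):
    """Eq. 32: Pr(all n sensors below T) under common correlation rho,
    by trapezoidal quadrature over the shared factor y (valid for rho >= 0)."""
    if rho == 0.0:
        return cdf(T) ** n      # independent sensors: plain product
    h = 2.0 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        y = -lim + k * h
        w = 0.5 if k in (0, steps) else 1.0
        z = (T - math.sqrt(rho) * y) / math.sqrt(1.0 - rho)
        total += w * (cdf(z) ** n) * pdf(y)
    return total * h

def alpha_max(T, rho, n=3):
    """Eq. 34: system false alarm rate at the fusion center."""
    return 1.0 - F_n(T, rho, n)
```

Evaluating alpha_max over a grid of ρ values reproduces the qualitative trend stated below: positively correlated sensor errors change the system false alarm rate relative to the independent case.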
Equations 33, 34, 35, and 36 clearly show the performance improvement of sensor fusion
when the upper bound on the false positive rate and the lower bound on the detection rate
are fixed. The system performance deteriorates when the correlation between the sensor
errors is positive and increasing, while the performance improves considerably when the
correlation is negative and increasing.
The above analysis was made with the assumption that the prior detection probabilities of
the individual IDSs were known, and hence it is a case of bounded variance. However, when
the IDS performance is not known a priori, it is a case of unbounded variance, and given the
trivial model it is difficult to accurately estimate the underlying decision. This clearly
emphasizes the difficulty of the sensor fusion problem, where it becomes necessary to
understand the individual IDS behavior. Hence the architecture was modified as proposed in
the work of Thomas & Balakrishnan (2008) and shown in Fig. 2 with the model remaining the
same. With this improved architecture using a neural network learner, a clear understanding
of each one of the individual IDSs was obtained. Most other approaches treat the training
data as a monolithic whole when determining the sensor accuracy; however, the accuracy is
expected to vary with the data. This architecture attempts to predict the IDSs that are reliable
for a given sample of data. It is demonstrated to be practically successful, and it reflects the
true situation where the weights are neither completely known nor totally unknown. The
architecture is particularly suited to combination when the IDSs are heterogeneous and differ
in performance. It should be independent of the dataset and the structures employed, and
has to be usable with any real-valued data set.
The Data-dependent Decision fusion architecture has three stages: the IDSs that produce the
alerts as the first stage, the neural network supervised learner determining the weights of the
IDSs' decisions depending on the input as the second stage, and the fusion unit doing the
weighted aggregation as the final stage. The neural network learner can be considered a
pre-processing stage to the fusion unit. A neural network is most appropriate for weight
determination, since it becomes difficult to define the rules clearly as more IDSs are added to
the fusion unit. When a record is correctly classified by one or more detectors, the neural
network accumulates this knowledge as a weight, and with more iterations the weight gets
stabilized. The architecture is independent of the dataset and the structures employed, and
can be used with any real-valued dataset. Thus it is reasonable to make use of a neural
network learner unit to understand the performance and assign weights to the various
individual IDSs in the case of a large dataset.
The weight assigned to any IDS not only depends on the output of that IDS as in the case
of the probability theory or the Dempster-Shafer theory, but also on the input traffic which
causes this output. A neural network unit is fed with the output of the IDSs along with the
respective input for an in-depth understanding of the reliability estimation of the IDSs. The
alarms produced by the different IDSs when they are presented with a certain attack clearly
indicate which sensor generated the more precise result and which attacks are actually occurring in the
network traffic. The output of the neural network unit corresponds to the weights which are
assigned to each one of the individual IDSs. The IDSs can be fused with the weight factor to
produce an improved resultant output.
This architecture refers to a collection of diverse IDSs that respond to an input traffic and the
weighted combination of their predictions. The weights are learned by looking at the response
of the individual sensors for every input traffic connection. The fusion output is represented
as:
s = Fj (wij ( x j , sij ), sij ), (37)
where the weights wij are dependent on both the input x j as well as individual IDS’s output
sij, where the suffix j refers to the class label and the prefix i refers to the IDS index. The
fusion unit gives a value of one or zero depending on whether the weighted aggregation of
the IDSs' decisions is above or below the set threshold.
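The thresholded weighted aggregation described above can be sketched as follows; the weights and threshold in the example are illustrative, and in the actual architecture the weights come from the neural network learner:

```python
def fuse(weights, decisions, threshold=0.5):
    """Fusion unit (cf. eq. 37): weighted aggregation of the binary IDS
    decisions, thresholded to a single 0/1 verdict on the traffic record."""
    score = sum(w * s for w, s in zip(weights, decisions))
    return 1 if score >= threshold else 0
```

A record flagged by the heavily weighted (i.e., locally reliable) IDSs is declared an attack even if a lightly weighted IDS disagrees.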
The training of the neural network unit by back propagation involves three stages: 1) the feed
forward of the output of all the IDSs along with the input training pattern, which collectively
form the training pattern for the neural network learner unit, 2) the calculation and the back
propagation of the associated error, and 3) the adjustments of the weights. After the training,
the neural network is used for the computations of the feedforward phase. A multilayer net-
work with a single hidden layer is sufficient in our application to learn the reliability of the
IDSs to an arbitrary accuracy according to the proof available in Fausett (2007).
Consider the problem formulation where the weights w_1, ..., w_n take on constrained values
satisfying ∑_{i=1}^{n} w_i = 1. Even without any knowledge about the IDS selectivity factors,
the constraint on the weights makes it possible to accurately estimate the underlying decision.
With the weights learnt for any data, this becomes a useful generalization of the trivial model
which was initially discussed. The improved model, with a good learning algorithm, can be
used to find the optimum fusion algorithms for any performance measure.
The classification of the various attacks found in the network traffic is explained in detail in the
thesis work of Kendall (1999) with respect to DARPA intrusion detection evaluation dataset
and is explained here in brief. The attacks fall into four main classes namely, Probe, Denial
of Service(DoS), Remote to Local(R2L) and the User to Root (U2R). The Probe or Scan attacks
automatically scan a network of computers or a DNS server to find valid IP addresses, active
ports, host operating system types and known vulnerabilities. The DoS attacks are designed
to disrupt a host or network service. In R2L attacks, an attacker who does not have an account
on a victim machine gains local access to the machine, exfiltrates files from the machine or
modifies data in transit to the machine. In U2R attacks, a local user on a machine is able to
obtain privileges normally reserved for the unix super user or the windows administrator.
Even with the criticisms by McHugh (2000) and Mahoney & Chan (2003) against the DARPA
dataset, the dataset was extremely useful in the IDS evaluation undertaken in this work. Since
none of the IDSs perform exceptionally well on the DARPA dataset, the aim is to show that
the performance improves with the proposed method. If a system is evaluated on the DARPA
dataset, it cannot claim anything more in terms of its performance on real network traffic;
hence this dataset can be considered the baseline of any research (Thomas & Balakrishnan,
2007). Also, even ten years after its generation, there are still many attacks in the dataset for
which signatures are not available in the databases of even frequently updated signature-based
IDSs like Snort (1999). Real traffic data is difficult to work with, the main reason being the
lack of information regarding the status of the traffic. Even with intense analysis, the
prediction can never be 100 percent accurate, because of the stealthiness and sophistication
of the attacks, the unpredictability of the non-malicious user, and the intricacies of users in
general.
The weight analysis of the IDS data coming from PHAD, ALAD, and Snort was carried out
by the neural network supervised learner before it was fed to the fusion element. The
detectors PHAD and ALAD produce the IP address along with an anomaly score, whereas
Snort produces the IP address along with the severity score of the alert. The alerts produced
by these IDSs are converted to a standard binary form. The neural network learner inputs
these decisions along with the particular traffic input which was monitored by the IDSs.
The neural network learner was designed as a feed-forward back-propagation algorithm with
a single hidden layer of 25 sigmoidal hidden units. Experimental evidence is available for the
best performance of a neural network with the number of hidden units being log(T), where T
is the number of training samples in the dataset (Lippmann, 1987). The values chosen for the
initial weights lie in the range of −0.5 to 0.5, and the final weights after training may be of
either sign. The learning rate was chosen to be 0.02. In order to train the neural network, it is
necessary to expose it to both normal and anomalous data. Hence, during training, the
network was exposed to weeks 1, 2, and 3 of the training data, and the weights were adjusted
using the back-propagation algorithm. An epoch of training consisted of one pass over the
training data. The training proceeded until the total error made during each epoch stopped
decreasing or 1000 epochs had been reached. If the neural network stops learning before
reaching an acceptable solution, a change in the number of hidden nodes or in the learning
parameters will often fix the problem; the other possibility is to start over again with a
different set of initial weights.
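The training procedure described above can be sketched as a minimal single-hidden-layer back-propagation network. The class below is an illustrative toy in plain Python with a handful of hidden units, not the 25-unit learner used in the experiments; the class name and sizes are assumptions:

```python
import math
import random

class FusionLearner:
    """Sketch of the learner unit: one sigmoidal hidden layer, one sigmoidal
    output, trained by plain gradient back-propagation of squared error."""

    def __init__(self, n_in, n_hidden, lr=0.02, seed=1):
        rnd = random.Random(seed)
        self.lr = lr
        # initial weights drawn from [-0.5, 0.5], as in the text
        self.w1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    @staticmethod
    def _sig(x):
        return 1.0 / (1.0 + math.exp(-x))

    def forward(self, x):
        xb = list(x) + [1.0]                      # append bias input
        self.h = [self._sig(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w1]
        hb = self.h + [1.0]
        self.out = self._sig(sum(w * v for w, v in zip(self.w2, hb)))
        return self.out

    def train_step(self, x, target):
        out = self.forward(x)
        d_out = (out - target) * out * (1.0 - out)          # output delta
        # hidden deltas must use the pre-update output weights
        d_h = [d_out * self.w2[j] * self.h[j] * (1.0 - self.h[j])
               for j in range(len(self.h))]
        hb = self.h + [1.0]
        for j in range(len(self.w2)):                       # adjust output layer
            self.w2[j] -= self.lr * d_out * hb[j]
        xb = list(x) + [1.0]
        for j, row in enumerate(self.w1):                   # adjust hidden layer
            for k in range(len(row)):
                row[k] -= self.lr * d_h[j] * xb[k]
```

Fed with per-IDS decisions plus the traffic features, repeated epochs of train_step implement exactly the three stages listed earlier: feed-forward, error back-propagation, and weight adjustment.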
The fusion unit performed the weighted aggregation of the IDS outputs for the purpose of
identifying the attacks in the test dataset. It used binary fusion, giving an output value of one
or zero depending on the value of the weighted aggregation of the various IDS decisions.
The packets were identified by their timestamp on aggregation. A value of one at the output
of the fusion unit indicated that the record was under attack, and a zero indicated the absence
of an attack.
F-score = (2 ∗ P ∗ R) / (P + R)     (38)

where P is the precision and R is the recall. A higher F-score indicates that the IDS is
performing better on recall as well as precision.
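Equation 38 can be computed as:

```python
def f_score(precision, recall):
    """F-score (eq. 38): the harmonic mean of precision P and recall R."""
    return 2.0 * precision * recall / (precision + recall)
```

For the Data-dependent fusion row of Table 6 (P = 0.39, R = 0.68) this gives an F-score of about 0.50, matching the reported figure of merit.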
Table 4 and Fig. 3 show the improvement in performance of the Data-dependent Decision
fusion method over each of the three individual IDSs. The detection rate is acceptably high
for all types of attacks without affecting the false alarm rate.
The real traffic within a protected University campus network was collected during the work-
ing hours of a day. This traffic of around two million packets was divided into two halves,
one for training the anomaly IDSs, and the other for testing. The test data was injected with 45
HTTP attack packets using the HTTP attack traffic generator tool called libwhisker Libwhisker
(n.d.). The test data set was introduced with a base rate of 0.0000225, which is relatively real-
istic. The comparison of the evaluated IDS with various other fusion techniques is illustrated
in table 5 with the real-world network traffic.
The results evaluated in Table 6 show that accuracy (Acc.) and AUC are not good metrics
with imbalanced data, where the attack class is rare compared to the normal class. Accuracy
is heavily biased to favor the majority class; when used as a performance measure, it assumes
the target class distribution to be known and unchanging, and the costs of FP and FN to be
equal. These assumptions are unrealistic. If metrics like accuracy and AUC are to be used,
then the data has to be more balanced in terms of the various classes. If AUC is to be used as
an evaluation metric, a possible solution is to consider only the area under the ROC curve
until the FP-rate reaches the prior probability. The results presented in Table 5 indicate that
the Data-dependent Decision fusion method performs significantly better for the attack class,
with high recall as well as high precision, as against achieving high accuracy alone.
The ROC Semilog curves of the individual IDSs and the DD fusion IDS are given in Fig.
4, which clearly show the better performance of the DD fusion method in comparison to the
three individual IDSs, PHAD, ALAD and Snort. The log-scale was used for the x-axis to iden-
tify the points which would otherwise be crowded on the x-axis.
Detection/Fusion        P     R     Acc.  AUC   F-score
PHAD                    0.35  0.28  0.99  0.64  0.31
ALAD                    0.38  0.32  0.99  0.66  0.35
Snort                   0.09  0.51  0.99  0.75  0.15
Data-Dependent fusion   0.39  0.68  0.99  0.84  0.50
Table 6. Performance Comparison of individual IDSs and the Data-Dependent Fusion method
6. Conclusion
A discussion on the mathematical basis for sensor fusion in IDS is included in this chapter.
This study contributes to the fusion field in several aspects. Firstly, considering zero
knowledge about the detection systems and the traffic data, an attempt is made to show the
improved performance of sensor fusion for the intrusion detection application. The latter half
of the chapter takes into account the analysis of the sensor fusion system with a knowledge
of the data and sensors that are seen in practice. Independent as well as dependent detectors
were considered, and the study clarifies the intuition that independence of detectors is crucial
in determining the success of the fusion operation. If the individual sensors are
complementary and look at different regions of the attack domain, then the data-dependent
decision fusion enriches the analysis of the incoming traffic to detect attacks with appreciably
low false alarms. The approach is tested with the standard DARPA IDS traces, and offers
better performance than any of the individual IDSs. The individual IDSs that are components
of this architecture in this particular work were PHAD, ALAD and Snort, with detection rates
0.28, 0.32 and 0.51 respectively. Although the research discussed in this chapter has thus far
focused on the three
Fig. 4. ROC Semilog curve of individual and combined IDSs (true positive rate against false
positive rate, with the false positive rate on a log scale)
IDSs, namely PHAD, ALAD and Snort, the algorithm works well with any IDS. The result of
the Data-dependent Decision fusion method is better than what was predicted by the Lincoln
Laboratory after the DARPA IDS evaluation. An intrusion detection rate of 68% with a false
positive rate as low as 0.002% is achieved using the DARPA data set, and a detection rate of
60% with a false positive rate as low as 0.002% is achieved using the real-world network
traffic. The figure of merit, the F-score, of the data-dependent decision fusion method has
improved to 0.50 for the DARPA data set and to 0.47 for the real-world network traffic.
7. References
Aalo, V. & Viswanathan, R. (1995). On distributed detection with correlated sensors: Two
examples, IEEE Transactions on Aerospace and Electronic Systems Vol. 25(No. 3): 414–
421.
ALAD (2002). Learning non stationary models of normal network traffic for detecting novel
attacks, SIGKDD.
Baek, W. & Bommareddy, S. (1995). Optimal m-ary data fusion with distributed sensors, IEEE
Transactions on Aerospace and Electronic Systems Vol. 31(No. 3): 1150–1152.
Bass, T. (1999). Multisensor data fusion for next generation distributed intrusion detection
systems, IRIS National Symposium.
Blum, R., Kassam, S. & Poor, H. (1995). Distributed detection with multiple sensors - part ii:
Advanced topics, Proceedings of IEEE pp. 64–79.
Brown, G. (2004). Diversity in neural network ensembles, PhD thesis .
Chair, Z. & Varshney, P. (1986). Optimal data fusion in multiple sensor detection systems,
IEEE Transactions on Aerospace and Electronic Systems Vol. 22(No. 1): 98–101.
DARPA-1999 (1999). http://www.ll.mit.edu/IST/ideval/data/data_index.html.
Drakopoulos, E. & Lee, C. (1995). Optimum multisensor fusion of correlated local, IEEE Trans-
actions on Aerospace and Electronic Systems Vol. 27: 593–606.
Elkan, C. (2000). Results of the kdd’99 classifier learning, SIGKDD Explorations, pp. 63–64.
250 Sensor Fusion and Its Applications
1. Introduction
Recent advances in wireless communication have enabled the diffusion of networked systems
whose capability of acquiring information and acting on wide areas, in a decentralized and
autonomous way, represents an attractive peculiarity for many military and civil applications.
Sensor networks are probably the best known example of such systems: cost reduction in pro-
ducing smart sensors has allowed the deployment of constellations of low-cost low-power
interconnected nodes, able to sense the environment, perform simple computation and com-
municate within a given range (Akyildiz et al., 2002). Another example is mobile robotics,
whose development has further stressed the importance of distributed control and cooperative
task management in formations of agents (Siciliano & Khatib, 2008). A non-exhaustive list
of emerging applications of networked systems encompasses target tracking, environmental
monitoring, smart-building surveillance and supervision, and water quality and bushfire
surveying (Martinez & Bullo, 2006).
The intrinsically distributed nature of measurements acquired by the nodes requires the sys-
tem to perform a fusion of sensor perceptions in order to obtain relevant information from the
environment in which the system is deployed. This is the case of environmental monitoring,
in which the nodes may measure the trend of variables of interest over a geographic region, in
order to give a coherent overview of the observed scenario. As in this last example, most
of the mentioned fields of application require that each node has precise knowledge of its ge-
ometric position for correctly performing information fusion, since actions and observations
are location-dependent. Other cases in which it is necessary to associate a position to each
node are formation control, which is based on the knowledge of agent positions, and location
aware routing, which benefits from the position information for optimizing the flow of data
through the network, to mention but a few.
In this chapter we discuss the problem of network localization, that is the estimation of node
positions from internodal measurements, focusing on the case of pairwise distance measure-
ments. In Section 2 the estimation problem is first introduced, reporting the related literature
on the topic. In Section 2.1 we consider the case of localization from range-only measure-
ments, whereas in Section 3 we formalize the estimation problem at hand. Five approaches
for solving network localization are extensively discussed in Section 4, where we report the
theoretical basis of each technique, the corresponding convergence properties and numeri-
cal experiments in realistic simulation setups. The first three localization methods, namely
a gradient-based method, a Gauss-Newton approach and a trust region method are local, since
they require a reasonable initial guess on node position to successfully estimate the actual net-
work configuration. We then present two global techniques, respectively a global continuation
approach and a technique based on semidefinite programming (SDP), which are demonstrated,
under suitable conditions, to retrieve the actual configuration regardless of the available prior
knowledge on node positions. Several comparative results are presented in Sections 5 and 6.
A brief discussion on distributed localization techniques is reported in Section 7 and conclusions are drawn in Section 8.
2. Network Localization
When dealing with a network with a large number of nodes, manually configuring node
positions during system setup, when possible at all, is an expensive and time-consuming task.
Moreover, in many applications, such as mobile robotics, nodes can move autonomously,
so positions need to be tracked as time evolves. A possible solution consists in equipping
each node with a GPS sensor, hence allowing the nodes to directly measure their location.
Such an approach is often infeasible in terms of cost, weight burden, power consumption, or
when the network is deployed in GPS-denied areas. Since the above-mentioned factors can be
technological barriers, a wide variety of solutions for computing node locations through effective
and efficient procedures has been proposed in the last decade. The so-called indirect methods
aim at determining absolute node positions (with respect to a local or global reference
frame) from partial relative measurements between nodes, that is, each node may measure the
relative position (angle and distance, angle only or distance only) from a set of neighbor nodes,
and the global absolute positions of all nodes need be retrieved. This problem is generically
known as network localization.
If all relative measurements are gathered to some “central elaboration unit” which performs
estimation over the whole network, the corresponding localization technique is said to be cen-
tralized. This is the approach that one implicitly assumes when writing and solving a problem:
all the data that is relevant for the problem description is available to the problem solver. In
a distributed setup, however, each node communicates only with its neighbors, and performs
local computations in order to obtain an estimate of its own position. As a consequence, the
communication burden is equally spread among the network, the computation is decentral-
ized and entrusted to each agent, improving both efficiency and robustness of the estimation
process.
In the most usual situation of planar networks, i.e., networks with nodes displaced in two-
dimensional space, three main variations of the localization problem are typically considered
in the literature, depending on the type of relative measurements available to the nodes. A first
case is when nodes may take noisy measurements of the full relative position (coordinates or,
equivalently, range and angle) of neighbors; this setup has been recently surveyed in (Barooah
& Hespanha, 2007). The localization problem with full position measurements is a linear
estimation problem that can be solved efficiently via a standard least-squares approach, and
the networked nature of the problem can also be exploited to devise distributed algorithms
(such as the Jacobi algorithm proposed in (Barooah & Hespanha, 2007)).
A second case arises, instead, when only angle measurements between nodes are available.
This case, which is often referred to as bearing localization, can be attacked via maximum like-
lihood estimation as described in (Mao et al., 2007). This localization setup was pioneered by
Stanfield (Stanfield, 1947), and further studied in (Foy, 1976).
In the last case, which is probably the most common situation in practice, each node can measure only distances from a subset of other nodes in the formation. This setup, which we shall
name range localization, has quite a long history, dating back at least to the eighties, and it is
closely related to the so-called molecule problem studied in molecular biology, see (Hendrick-
son, 1995). However, it still attracts the attention of the scientific community for its relevance
in several applications; moreover, recent works propose innovative and efficient approaches
for solving the problem, making the topic an open area of research.
3. Problem Statement
We now introduce a formalization of the range-based localization problem. Such a model is the
basis for the application of the optimization techniques presented in the following sections and
allows the network configuration to be estimated from distance measurements.
Let V = {v1 , . . . , vn } be a set of n nodes (agents, sensors, robots, vehicles, etc.), and let
P = { p1 , . . . , pn } denote a corresponding set of positions on the Cartesian plane, where
pi = [ xi yi ] ∈ R2 are the coordinates of the i-th node. We shall call P a configuration of
nodes. Consider a set E of m distinct unordered pairs e1 , . . . , em , where ek = (i, j), and suppose
that we are given a corresponding set of nonnegative scalars d1 , . . . , dm having the meaning of
distances between nodes i and j.
We want to determine (if one exists) a node configuration { p1 , . . . , pn } that matches the given
set of internodal distances, i.e. such that
$\|p_i - p_j\|^2 = d_{ij}^2, \quad \forall (i,j) \in \mathcal{E},$
or, if exact matching is not possible, that minimizes the sum of squared mismatch errors, i.e.,
such that the cost
$$f = \frac{1}{2} \sum_{(i,j)\in\mathcal{E}} \left( \|p_i - p_j\|^2 - d_{ij}^2 \right)^2 \qquad (1)$$
is minimized. When the global minimum of f is zero we say that exact matching is achieved,
otherwise no geometric node configuration can exactly match the given range data, and we
say that approximate matching is achieved by the optimal configuration.
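As an illustration, the cost (1) can be evaluated directly from an edge list and the measured distances. The following minimal sketch (function name and data layout are our own, not from the chapter) returns zero, up to round-off, for a consistent set of distances:

```python
import numpy as np

def localization_cost(p, edges, d):
    """Accumulated squared distance-mismatch error, Eq. (1).

    p     : (n, 2) array of node coordinates
    edges : list of (i, j) index pairs
    d     : array of measured distances, one per edge
    """
    f = 0.0
    for k, (i, j) in enumerate(edges):
        g = np.sum((p[i] - p[j]) ** 2) - d[k] ** 2
        f += 0.5 * g ** 2
    return f

# A consistent triangle: exact matching, so the cost is (numerically) zero
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (0, 2), (1, 2)]
d = np.array([1.0, 1.0, np.sqrt(2.0)])
print(localization_cost(p, edges, d))  # ≈ 0 (exact matching)
```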
The structure of the problem can be naturally described using graph formalism: nodes {v1 , . . . , vn }
represent the vertices of a graph G , and pairs of nodes (i, j) ∈ E between which the internodal
distance is given represent graph edges. The cost function f has thus the meaning of accumu-
lated quadratic distance mismatch error over the graph edges. We observe that in practical
applications the distance values dij come from noisy measurements of actual distances be-
tween node pairs in a real and existing configuration of nodes in a network. The purpose of
network localization is in this case to estimate the actual node positions from the distance mea-
surements. However, recovery of the true node position from distance measurements is only
possible if the underlying graph is generically globally rigid (ggr), (Eren et al., 2004). A network
is said to be globally rigid if it is congruent with any network which shares the same underlying
graph and equal corresponding distance information. Generic global rigidity is a stronger
concept that requires the formation to remain globally rigid also under non-trivial flexes.
Rigidity properties of a network strongly depend on the so-called rigidity matrix
R ∈ R^{m×2n}, in which each row is associated with an edge e_ij; the four nonzero entries of the
row are x_i − x_j, y_i − y_j, x_j − x_i, y_j − y_i (with p_i = [x_i, y_i]), located
respectively in columns 2i − 1, 2i, 2j − 1, 2j. In particular, a planar network is rigid if R has
rank 2n − 3.
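The construction of the rigidity matrix and the rank test can be sketched as follows (a hypothetical helper using 0-based column indexing instead of the 2i − 1, 2i convention of the text):

```python
import numpy as np

def rigidity_matrix(p, edges):
    """Rigidity matrix R in R^{m x 2n}: the row for edge (i, j) holds
    (x_i - x_j, y_i - y_j) in the two columns of node i and the negated
    pair in the two columns of node j (0-based indexing)."""
    n, m = len(p), len(edges)
    R = np.zeros((m, 2 * n))
    for k, (i, j) in enumerate(edges):
        diff = p[i] - p[j]
        R[k, 2 * i:2 * i + 2] = diff
        R[k, 2 * j:2 * j + 2] = -diff
    return R

# A triangle is rigid: rank R = 2n - 3 = 3
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edges = [(0, 1), (0, 2), (1, 2)]
R = rigidity_matrix(p, edges)
print(np.linalg.matrix_rank(R))  # → 3
```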
If a planar network is generically globally rigid, the objective function in (1) has a unique global
minimum if the positions of at least three non-collinear nodes are known and fixed in advance
(anchor nodes), or it has several equivalent global minima corresponding to congruence trans-
formations (roto-translation) of the configuration, if no anchors are specified. If the graph is
not ggr, instead, there exist many different geometric configurations (also called flexes) that
match exactly or approximately the distance data and that correspond to equivalent global
minima of the cost f . In this work we are not specifically interested in rigidity conditions that
render the global minimum of f unique. Instead, we focus on numerical techniques to compute
a global minimum of f , that is, one possible configuration that exactly or approximately
matches the distance data. Clearly, if the problem data fed to the algorithm correspond to a
ggr graph with anchors, then the resulting solution will indeed uniquely identify a geometric
configuration. Therefore, we here treat the problem in full generality, under no rigidity
assumptions. Also, in our approach we treat under the same framework both anchor-based
and anchor-free localization problems. In particular, when anchor nodes are specified at fixed
positions, we just set the respective node position variables to the given values, and eliminate
these variables from the optimization. Therefore, the presence of anchors simply reduces the
number of free variables in the optimization.
how much the initial function changes in the transformation. For large values of the smoothing
parameter the transformed function is convex, whereas smaller values correspond to less
smoothed functions. When the parameter is zero the original cost function is recovered. The
result is that the initial smoothing succeeds in moving the initial guess closer to the global
optimum of the objective function, and a decreasing sequence of smoothing parameters then
ensures that the method reaches the global minimum of the original function. According to the
previous considerations, the method guarantees convergence to the global optimum with high
probability, regardless of the initial guess of the optimization. In the chapter it is shown that the
robustness of the approach entails a further computational effort which may be unsustainable
for nodes with limited computational resources.
Finally we describe a technique which has recently attracted the attention of the research com-
munity. The approach, whose first contributions can be found in (Doherty et al., 2001) and
(Biswas & Ye, 2004), is based on a relaxation of the original optimization problem and solved
using semidefinite programming (SDP). This technique is the most computationally demanding
of the approaches considered, although distributed techniques can be implemented
to spread the computational burden over several nodes.
These centralized approaches for minimizing the cost (1) work iteratively from an initial
guess. As mentioned above, the gradient method, the Gauss-Newton approach and the trust
region method are local, hence the initial guess plays a fundamental role in the solution of the
problem: such techniques may fail to converge to the global optimum if the initial guess is
not close enough to the global solution. In Figure 1 we report an example of node configuration
and a possible initial guess for optimization. The global continuation method iteratively
employs a local approach on a smoothed objective function, and this makes the solution
resilient to perturbations of the initial guess. Finally, the semidefinite programming approach
is proved to retrieve the correct network configuration in the case of exact distance
measurements, although it can be inaccurate in the practical case of noisy measurements.
Fig. 1. Actual node configuration (circles) and initial position guess (asterisks).
$$f(p) = \frac{1}{2} \sum_{(i,j)\in\mathcal{E}} g_{ij}^2(p), \qquad g_{ij}(p) = \|p_i - p_j\|^2 - d_{ij}^2, \qquad (2)$$
and we let p(0) denote the vector of initial position estimates. We next describe the five cen-
tralized methods to determine a minimum of the cost function, starting from p(0) .
$$p^{(\tau+1)} = p^{(\tau)} - \alpha_\tau\, \nabla f(p^{(\tau)}), \qquad (3)$$
where α_τ is the step length, which may be computed at each iteration via exact or approximate
line search, and where the gradient ∇g_ij is a row vector of n blocks, each composed of two
entries (2n entries in total), with the only nonzero terms corresponding to the blocks in
positions i and j:
$$\alpha_\tau = \frac{\|p^{(\tau)} - p^{(\tau-1)}\|^2}{\left(p^{(\tau)} - p^{(\tau-1)}\right)^\top \left(\nabla f(p^{(\tau)}) - \nabla f(p^{(\tau-1)})\right)}, \qquad (5)$$
hence no line searches or matrix computations are required to determine α_τ. In the rest of
the chapter the BB stepsize will be employed for solving the network localization problem with the
gradient method.
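A compact sketch of the gradient iteration (3) with the BB stepsize (5), assuming exact distance data and a reasonable initial guess (the bootstrap step, the stepsize clamp and all names are illustrative choices of ours, not from the chapter):

```python
import numpy as np

def grad_f(p, edges, d):
    """Gradient of the cost (1): sum of g_ij * grad(g_ij), where the only
    nonzero blocks of grad(g_ij) are +/- 2(p_i - p_j)."""
    G = np.zeros_like(p)
    for k, (i, j) in enumerate(edges):
        diff = p[i] - p[j]
        g = diff @ diff - d[k] ** 2
        G[i] += 2.0 * g * diff
        G[j] -= 2.0 * g * diff
    return G

def gradient_bb(p0, edges, d, iters=500):
    """Gradient iteration (3) with the Barzilai-Borwein stepsize (5)."""
    p_old = p0.copy()
    g_old = grad_f(p_old, edges, d)
    p = p_old - 1e-3 * g_old                 # small fixed bootstrap step
    for _ in range(iters):
        g = grad_f(p, edges, d)
        s = (p - p_old).ravel()
        y = (g - g_old).ravel()
        sy = s @ y
        # BB stepsize (5), clamped for numerical safety
        alpha = min((s @ s) / sy, 0.2) if sy > 1e-15 else 1e-3
        p_old, g_old = p, g
        p = p - alpha * g
    return p
```

With exact distances on a small complete graph and an initial guess close to the true configuration, the iterates drive the distance mismatches to (numerically) zero.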
Fig. 2. (a) Example of trilateration graph with nodes in the unit square, [0, 1]²; (b) Example of
geometric random graph with nodes in the unit square.
In practical setups the sensing radius of each node is limited, i.e., edges in the graph may appear
only between nodes whose distance is less than the sensing range R. In order to work on more
realistic graphs in the numerical tests, we hence use random geometric graphs, i.e., graphs
in which nodes are deployed at random in the unit square [0, 1]², and an edge exists between
a pair of nodes if and only if their geometric distance is smaller than R. It has been proved
in (Eren et al., 2004) that if $R > 2\sqrt{2\log(n)/n}$, the graphs produced by the previous technique
are ggr with high probability. An example of geometric graph with R = 0.3 and n = 50 is
shown in Figure 2(b).
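The random geometric graphs used in these tests can be generated as follows (a minimal sketch; the function name and return layout are our own):

```python
import numpy as np

def random_geometric_graph(n, R, rng):
    """Deploy n nodes uniformly at random in the unit square and connect
    every pair of nodes closer than the sensing radius R."""
    p = rng.random((n, 2))
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if np.linalg.norm(p[i] - p[j]) < R]
    d = np.array([np.linalg.norm(p[i] - p[j]) for i, j in edges])
    return p, edges, d

rng = np.random.default_rng(1)
p, edges, d = random_geometric_graph(50, 0.3, rng)
```

The exact internodal distances d serve as the "measured" data in the noiseless experiments; noisy measurements are obtained by perturbing them.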
We consider the configuration generated as previously described as the “true” configuration
(which is of course unknown in practice), and then, we use the distance measurements from
this configuration as the data for the numerical tests. Hence the global optimum is expected to
correspond to a zero value of the objective function. Convergence properties of the gradient
method are evaluated under the settings mentioned above.
According to (Moré & Wu, 1997), we consider p_i^*, i = 1, 2, ..., n, a solution to the network
localization problem (i.e., the gradient algorithm has successfully attained the global optimum of
the objective function) if it satisfies:
Fig. 3. Percentage of convergence tests depending on network size and goodness of the initial
guess for the GM approach.
by the initial guess for the optimization. In particular, we considered five levels of a-priori
knowledge on the configuration:
a) Good prior knowledge: initial guess for the algorithms is drawn from a multivariate
Normal distribution centered at the true node positions, with standard deviation σp =
0.1;
b) Initial guess is drawn from a multivariate Normal distribution with σp = 0.5;
c) Bad prior knowledge: Initial guess is drawn from a multivariate Normal distribution
with σp = 1;
d) Only the area where nodes are deployed is known: initial guess is drawn uniformly
over the unit square;
e) No prior information is available: initial guess is drawn randomly around the origin of
the reference frame.
In Figure 3 we report the percentage of tests in which convergence is observed for different
network sizes and different initial guesses on the non-anchor positions (for each setting we
performed 100 simulation runs). The gradient method shows a high percentage of convergence
when good prior knowledge is available.
The second test is related to the localization performance as a function of the number of
anchors in the network. We consider a realistic setup with 50 non-anchor nodes and a number
of anchors ranging from 3 to 10, displaced in the unit square. Two nodes are connected by an
edge if their distance is smaller than 0.3, and distance measurements are affected by noise in
the form:
$$d_{ij} = \tilde{d}_{ij} + \epsilon_d, \quad \forall (i,j) \in \mathcal{E}, \qquad (7)$$
where $\tilde{d}_{ij}$ is the true distance between nodes i and j, $d_{ij}$ is the corresponding measured
quantity, and $\epsilon_d$ is a zero-mean white noise with standard deviation σ_d. In the following test
we consider σ_d = 5 · 10⁻³. In order to measure the localization effectiveness, we define the
node positioning error φi∗ at node i as the Euclidean distance between the estimated position pi∗
Fig. 4. (a) Localization error for different numbers of anchor nodes, using the gradient method;
(b) Localization error for different standard deviations of the distance measurement noise.
and the true position pi of the node. The localization error Φ∗ is further defined as the mean
value of the local positioning errors of all the nodes in the network:
$$\Phi^* = \frac{1}{n} \sum_{i=1}^{n} \|p_i - p_i^*\|.$$
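The metric Φ* is straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def localization_error(p_true, p_hat):
    """Mean Euclidean positioning error over all nodes (the metric Phi*)."""
    return float(np.mean(np.linalg.norm(p_true - p_hat, axis=1)))

# Two nodes, each estimated 0.1 away from its true position
p_true = np.array([[0.0, 0.0], [1.0, 0.0]])
p_hat = np.array([[0.0, 0.1], [1.0, -0.1]])
print(localization_error(p_true, p_hat))  # ≈ 0.1
```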
It can be seen from Figure 4(a) that the localization error shows low sensitivity to the tested
number of anchors, and the downward slope of the curve is not pronounced (see tests on SDP,
Section 4.5.3, for comparison).
The third test is aimed at studying the localization error for different standard deviations of
the distance noise σd . The results, considering 3 anchor nodes, are reported in Figure 4(b). It
is worth noting that the statistics about the localization error are performed assuming con-
vergence to the global optimum of the technique, hence a good initial guess was used for
optimization in the second and third test. In this way we can disambiguate the effects of
convergence (Figure 3), from the effect of distance noise propagation (Figures 4(a) and 4(b)).
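The measurement model (7) used in these tests can be simulated directly (the distance values below are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_d = 5e-3                                  # noise std used in the tests
d_true = np.array([0.25, 0.12, 0.30])           # hypothetical true distances
# Model (7): measured distance = true distance + zero-mean white noise
d_meas = d_true + sigma_d * rng.standard_normal(d_true.shape)
```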
$$g(p) \approx g(p^{(\tau)}) + R(p^{(\tau)})\,\delta_p(\tau), \qquad (8)$$
where $\delta_p(\tau) \doteq p - p^{(\tau)}$, and
$$R(p) = \begin{bmatrix} \nabla g_{i_1 j_1}(p) \\ \vdots \\ \nabla g_{i_m j_m}(p) \end{bmatrix} \in \mathbb{R}^{m \times 2n}, \qquad (9)$$
where m is the number of node pairs among which a relative distance measurement exists.
Matrix R is usually known as the rigidity matrix of the configuration, see (Eren et al., 2004).
Using the approximation in (8), we thus have that
$$f(p) \approx \frac{1}{2} \sum_{(i,j)\in\mathcal{E}} \left( g_{ij}(p^{(\tau)}) + \nabla g_{ij}(p^{(\tau)})\,\delta_p(\tau) \right)^2 = \frac{1}{2}\, \left\| g(p^{(\tau)}) + R(p^{(\tau)})\,\delta_p(\tau) \right\|^2.$$
The update step is then computed by determining a minimizing solution for the approximated
f , which corresponds to the least-squares solution
The described iterative technique is a version of the classical Gauss-Newton method, for
which convergence to a local minimizer is guaranteed whenever the initial level set { p :
f ( p) ≤ f ( p(0) )} is bounded and R has full rank at all steps; see, e.g., (Nocedal & Wright,
2006).
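One Gauss-Newton iteration can be sketched as follows, using the minimum-norm least-squares solution so that the rank-deficient, anchor-free case is also handled (names are illustrative):

```python
import numpy as np

def gauss_newton_step(p, edges, d):
    """One Gauss-Newton update: solve the linearized least-squares
    problem min ||g(p) + R(p) delta||^2 of Eqs. (8)-(9) and apply delta."""
    n, m = len(p), len(edges)
    g = np.zeros(m)
    R = np.zeros((m, 2 * n))          # rows are the gradients grad(g_ij)
    for k, (i, j) in enumerate(edges):
        diff = p[i] - p[j]
        g[k] = diff @ diff - d[k] ** 2
        R[k, 2 * i:2 * i + 2] = 2.0 * diff
        R[k, 2 * j:2 * j + 2] = -2.0 * diff
    # Minimum-norm least-squares solution (R may be rank deficient)
    delta, *_ = np.linalg.lstsq(R, -g, rcond=None)
    return p + delta.reshape(n, 2)
```

Iterating this step from an initial guess close to the true configuration exhibits the fast local convergence typical of Gauss-Newton.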
Fig. 5. Percentage of convergence tests depending on network size and goodness of the initial
guess for the GN approach.
The technique is prone to incurring in local minima when starting from a poor initial
guess. This issue becomes more critical as the number of nodes increases.
We repeated the localization error tests for different measurement noise levels and numbers of
anchor nodes, obtaining exactly the same results as in the gradient-based case. This is an
intuitive result when the initial guess of the local techniques is sufficiently close to the global
optimum of the objective function: all the techniques simply reach the same minimum, and
the localization errors depend only on the error propagated from the distance measurement
noise.
$$f(p) \approx q_\tau(p) \doteq f(p^{(\tau)}) + \nabla f(p^{(\tau)})\,\delta_p(\tau) + \frac{1}{2}\,\delta_p^\top(\tau)\,\nabla^2 f(p^{(\tau)})\,\delta_p(\tau),$$
where, using the notation introduced in the previous section,
$$\nabla f(p) = 2\, g^\top(p)\, R(p),$$
where the Hessian matrix $\nabla^2 g_{ij}(p) \in \mathbb{R}^{2n \times 2n}$ is composed of n × n blocks of size 2 × 2: all
blocks are zero, except for the four blocks in positions (i, i), (i, j), (j, i), (j, j), which are given by
$2I_2$ in positions $(i,i)$, $(j,j)$ and $-2I_2$ in positions $(i,j)$, $(j,i)$. At each iteration the step is
obtained by solving the trust-region subproblem
$$\min_{\delta_p(\tau) \in \Delta_\tau} q_\tau(p).$$
Let $\delta_p^*(\tau)$ be the resulting optimal solution, and let $p^* = p^{(\tau)} + \delta_p^*(\tau)$. Then, we compute the
ratio between the actual and the approximated function decrease:
$$\rho_\tau = \frac{f(p^{(\tau)}) - f(p^*)}{q_\tau(p^{(\tau)}) - q_\tau(p^*)},$$
and update the solution and trust region according to the following rules:
$$p^{(\tau+1)} = \begin{cases} p^{(\tau)} + \delta_p^*(\tau), & \text{if } \rho_\tau > \eta_0 \\ p^{(\tau)}, & \text{if } \rho_\tau \le \eta_0 \end{cases}$$
$$\xi_{\tau+1} = \begin{cases} \sigma_1 \min\{\|\delta_p^*(\tau)\|, \xi_\tau\}, & \text{if } \rho_\tau < \eta_1 \\ \sigma_2\, \xi_\tau, & \text{if } \rho_\tau \in [\eta_1, \eta_2) \\ \sigma_3\, \xi_\tau, & \text{if } \rho_\tau \ge \eta_2 \end{cases} \qquad (13)$$
where $\xi_{\tau+1}$ is the radius of the trust region $\Delta_{\tau+1}$, and $\eta_0 > 0$, $0 < \eta_1 < \eta_2 < 1$, $0 < \sigma_1 < \sigma_2 < 1 < \sigma_3$
are parameters whose typical values, set by experience, are $\eta_0 = 10^{-4}$, $\eta_1 = 0.25$, $\eta_2 = 0.75$,
$\sigma_1 = 0.25$, $\sigma_2 = 0.5$, $\sigma_3 = 4$. The conjugate gradient method is usually employed to solve
the trust-region subproblem, see (More, 1983) for further implementation details.
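The acceptance test and radius update (13) can be written directly in executable form; the function signature and the toy callables in the test are illustrative choices of ours:

```python
def trust_region_update(p, p_star, f, q, xi, delta_norm,
                        eta0=1e-4, eta1=0.25, eta2=0.75,
                        sigma1=0.25, sigma2=0.5, sigma3=4.0):
    """Acceptance test and radius update of Eq. (13).

    f, q       : callables returning the true cost and the quadratic model
    xi         : current trust-region radius
    delta_norm : norm of the candidate step delta*_p
    """
    # Ratio of actual to predicted decrease
    rho = (f(p) - f(p_star)) / (q(p) - q(p_star))
    p_new = p_star if rho > eta0 else p        # accept or reject the step
    if rho < eta1:
        xi_new = sigma1 * min(delta_norm, xi)  # poor model: shrink radius
    elif rho < eta2:
        xi_new = sigma2 * xi                   # mediocre agreement
    else:
        xi_new = sigma3 * xi                   # good agreement: expand
    return p_new, xi_new
```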
Fig. 6. Percentage of convergence tests depending on network size and goodness of the initial
guess for the TR approach.
approach of (More, 1983), the smoothed version of the cost function is obtained by means of
the Gaussian transform: for a function $f : \mathbb{R}^n \to \mathbb{R}$, the Gaussian transform is defined as
$$\varphi(x) = f_\lambda(x) \doteq \frac{1}{\pi^{n/2}} \int_{\mathbb{R}^n} f(x + \lambda u)\, \exp(-\|u\|^2)\, du. \qquad (14)$$
Intuitively, ϕ( x ) is the average value of f (z) in the neighborhood of x, computed with respect
to a Gaussian probability density. The parameter λ controls the degree of smoothing: large λ
implies a high degree of smoothing, whereas for λ → 0 the original function f is recovered.
The Gaussian transform of the cost function in (1) can be computed explicitly, see Theorem 4.3
in (More, 1983).
Proposition 1 (Gaussian transform of localization cost). Let f be given by (1). Then, the Gaussian
transform of f is given by
$$\varphi_\lambda(p) = f(p) + \gamma + 8\lambda^2 \sum_{(i,j)\in\mathcal{E}} \|p_i - p_j\|^2, \qquad (15)$$
where
$$\gamma = 8m\lambda^4 - 4\lambda^2 \sum_{(i,j)\in\mathcal{E}} d_{ij}^2.$$
It is interesting to observe that, for suitably large value of λ, the transformed function ϕλ ( p)
is convex. This fact is stated in the following proposition.
where we defined $r_{ij}(p) \doteq \|p_i - p_j\|$. Let $h_{ij}(r_{ij}) \doteq (r_{ij}^2 - d_{ij}^2)^2 + 8\lambda^2 r_{ij}^2$. Then
$$h'_{ij} \doteq \frac{d h_{ij}}{d r_{ij}} = 4\, r_{ij}\,(r_{ij}^2 - d_{ij}^2 + 4\lambda^2),$$
$$h''_{ij} \doteq \frac{d^2 h_{ij}}{d r_{ij}^2} = 4\,(3 r_{ij}^2 - d_{ij}^2 + 4\lambda^2).$$
Note that $h'_{ij} > 0$ if $4\lambda^2 > d_{ij}^2 - r_{ij}^2$, and $h''_{ij} > 0$ if $4\lambda^2 > d_{ij}^2 - 3 r_{ij}^2$. Since $r_{ij} \ge 0$ it follows that,
for $4\lambda^2 > d_{ij}^2$, both $h'_{ij}$ and $h''_{ij}$ are positive. Therefore, if
$$\lambda > \frac{1}{2} \max_{(i,j)\in\mathcal{E}} d_{ij} \qquad (17)$$
then the functions $h_{ij}$ are increasing and convex, for all (i, j). Observe next that the function $r_{ij}(p)$ is
convex in p (it is the norm of an affine function of p), therefore by applying the composition
rules to hij (rij ( p)) (see Section 3.2.4 of (Boyd, 2004)), we conclude that this latter function is
convex. Convexity of ϕλ ( p) then immediately follows since the sum of convex functions is
convex.
The key idea in the global continuation approach is to define a sequence {λk } decreasing to
zero as k increases, and to compute a corresponding sequence of points { p∗ (λk )} which are
the global minimizers of functions ϕλk ( p). The following strong result holds.
Proposition 3 (Theorem 5.2 of (More, 1983)). Let {λk } be a sequence converging to zero, and let
{ p∗ (λk )} be the corresponding sequence of global minimizers of ϕλk ( p). If { p∗ (λk )} → p∗ then p∗
is a global minimizer of f ( p).
2. Compute a (hopefully global) minimizer p∗k of ϕλk ( p) using a trust-region method with
initial guess pk−1 ;
3. If λk = 0, exit and return p∗ = p∗k ;
4. Let k = k + 1. Update λ:
$$\lambda_k = \frac{M - k}{M - 1}\, \lambda_1;$$
5. Go to step 2).
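The continuation loop above can be sketched as follows; a plain gradient inner solver stands in for the trust-region method of step 2), and the stage count, step size and iteration budget are illustrative choices of ours:

```python
import numpy as np

def grad_phi(p, edges, d, lam):
    """Gradient of the smoothed cost (15): the smoothing term
    8*lam^2 * sum ||p_i - p_j||^2 contributes 16*lam^2*(p_i - p_j)
    to the block of node i (and its negative to the block of node j)."""
    G = np.zeros_like(p)
    for k, (i, j) in enumerate(edges):
        diff = p[i] - p[j]
        g = diff @ diff - d[k] ** 2
        G[i] += 2.0 * g * diff + 16.0 * lam ** 2 * diff
        G[j] -= 2.0 * g * diff + 16.0 * lam ** 2 * diff
    return G

def global_continuation(p0, edges, d, M=6, inner=400, step=2e-3):
    """Minimize phi_lambda for the decreasing schedule
    lambda_k = lambda_1 * (M - k)/(M - 1), warm-starting each stage at
    the previous minimizer; the last stage (lambda = 0) minimizes f."""
    lam1 = max(d)   # any lambda_1 > max d_ij / 2 makes the first stage convex, cf. (17)
    p = p0.copy()
    for k in range(1, M + 1):
        lam = lam1 * (M - k) / (M - 1)
        for _ in range(inner):
            p = p - step * grad_phi(p, edges, d, lam)
    return p
```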
In step 2) of the algorithm, a quadratic approximation of ϕλk ( p) is needed for the inner iter-
ations of the trust-region method. More precisely, the trust-region algorithm shall work with
the following approximation of ϕλ around a current point p̄:
$$\varphi_\lambda(p) \approx q_\lambda(p) \doteq \varphi_\lambda(\bar{p}) + \nabla \varphi_\lambda(\bar{p})\,\delta_p + \frac{1}{2}\,\delta_p^\top\, \nabla^2 \varphi_\lambda(\bar{p})\,\delta_p,$$
where δ p = p − p̄. Due to the additive structure of (15), the gradient and Hessian of ϕλ are
computed as follows:
$$\nabla \varphi_\lambda(p) = \nabla f(p) + 8\lambda^2 \sum_{(i,j)\in\mathcal{E}} \nabla g_{ij}(p)$$
Fig. 7. Percentage of convergence tests depending on network size and goodness of the initial
guess for the GC approach.
more intensive than the previous two methods (see Section 5), shows instead a remarkable
insensitivity to the initial guess, and is therefore suitable for applications in which little or
no prior knowledge is available. In a few cases the number of converging experiments is less
than 100%, but this issue can be alleviated by increasing the number of major iterations, hence
making the smoothing more gradual. Further results on the computational effort required by the
technique are reported in Section 5.
The previous convex constraints can only impose internodal distances to be less than a given
sensing range, or less than or equal to a measured distance d_ij. However, as stated in Section 3,
we want to impose equality constraints of the form
$$\|p_i - p_j\|^2 = d_{ij}^2.$$
Such constraints are nonconvex, and the SDP network localization approach is based on a
relaxation of the original problem. If only inequality conditions like (18) are used, it is possible
to ensure good localization accuracy only when the non-anchor nodes are in the convex hull of
the anchors, whereas these localization approaches tend to perform poorly when anchors are
placed in the interior of the network (Biswas et al., 2006).
The rest of this section is structured as follows. The SDP relaxation is presented in Section
4.5.1. Then in Section 4.5.2 some relevant properties of the SDP approach are discussed. In
Section 4.5.3 some numerical examples are reported. Finally, a gradient refinement phase is
presented in Section 4.5.4, for the purpose of enhancing localization performance when the
distance measurements are affected by noise.
$$\|p_i - p_j\|^2 = d_{ij}^2, \quad \forall (i,j) \in \mathcal{E}_p$$
$$\|a_k - p_j\|^2 = d_{kj}^2, \quad \forall (k,j) \in \mathcal{E}_a$$
where dij is the distance measurement between non-anchor nodes i and j and dkj is the dis-
tance measurement between non-anchor node j and anchor node k.
If we define the standard unit vector ei as a column vector of all zeros, except a unit entry in
the i-th position, it is possible to write the following equalities:
Then the matrix form of the localization problem can be rewritten as:
$$\begin{aligned} \text{find} \quad & p \in \mathbb{R}^{2 \times n}, \; Y \in \mathbb{R}^{n \times n} \\ \text{s.t.} \quad & (e_i - e_j)^\top Y (e_i - e_j) = d_{ij}^2, \quad \forall (i,j) \in \mathcal{E}_p \\ & \begin{bmatrix} a_k \\ -e_j \end{bmatrix}^\top \begin{bmatrix} I_2 & p \\ p^\top & Y \end{bmatrix} \begin{bmatrix} a_k \\ -e_j \end{bmatrix} = d_{kj}^2, \quad \forall (k,j) \in \mathcal{E}_a \\ & Y = p^\top p. \end{aligned} \qquad (20)$$
Equation (20) can be relaxed to a semidefinite program by simply substituting the constraint
$Y = p^\top p$ with $Y \succeq p^\top p$. According to (Boyd, 2004), the previous inequality is equivalent to:
$$Z = \begin{bmatrix} I_2 & p \\ p^\top & Y \end{bmatrix} \succeq 0.$$
Then the relaxed problem (20) can be stated in standard SDP form:
$$\begin{aligned} \min \quad & 0 \\ \text{s.t.} \quad & (1;0;0)^\top Z\, (1;0;0) = 1 \\ & (0;1;0)^\top Z\, (0;1;0) = 1 \\ & (1;1;0)^\top Z\, (1;1;0) = 2 \\ & (0;\, e_i - e_j)^\top Z\, (0;\, e_i - e_j) = d_{ij}^2, \quad \forall (i,j) \in \mathcal{E}_p \\ & (a_k;\, -e_j)^\top Z\, (a_k;\, -e_j) = d_{kj}^2, \quad \forall (k,j) \in \mathcal{E}_a \\ & Z \succeq 0. \end{aligned} \qquad (21)$$
Problem (21) is a convex feasibility program whose solution can be efficiently retrieved using
interior-point algorithms, see (Boyd, 2004). As specified in Section 4.5.2, the approach is
proved to attain the actual node positions regardless of the initial guess chosen for the optimization.
It is clear that the constraints in (21) can be satisfied only if all distance measurements exactly
match the internodal distances of some network configuration. For example, in the ideal case of
perfect distance measurements, the optimal solution satisfies all the constraints and corresponds to the
actual node configuration. In practical applications, however, the distance measurements are
noisy and, in general, no configuration exists that satisfies all the imposed constraints.
In such a case it is convenient to model the problem so as to minimize the error on constraint
satisfaction, instead of using the stricter feasibility form (21). Hence the objective function can be
rewritten as the sum of the errors between the measured ranges and the distances between the
nodes in the estimated configuration:
It is worth noticing that, if the square of the errors is considered instead of the absolute value,
the problem formulation exactly matches the one presented in Section 3. By introducing slack
variables us and ls , the corresponding optimization problem can be stated as follows:
The previous semidefinite convex program allows the range localization problem to be solved
efficiently; moreover, it has global convergence properties, as we describe in the following section.
Remark 1. Let c = m_p + m_a + 3 be the number of constraints in the SDP formulation (23) of the range
localization problem, and let ε be a positive number. Assume that an ε-solution of (23) is required,
that is, we look for a configuration of the network corresponding to a value of the
objective function that is at most ε above the global minimum. Then the total number of interior-point
algorithm iterations is smaller than $\sqrt{n + c}\, \log\frac{1}{\varepsilon}$, and the worst-case complexity of the SDP approach
is $O\!\left(\sqrt{n + c}\,(n^3 + n^2 c + c^3)\log\frac{1}{\varepsilon}\right)$.
Fig. 8. (a) Localization error for different numbers of anchor nodes, using the SDP approach; (b)
Localization error for different anchor placements, using the SDP approach. Four anchor nodes
are placed on the vertices of a square centered in [0.5, 0.5] with side l.
We consider a network in the unit square, with four anchors placed on the vertices of a smaller
square with side l, centered in [0.5, 0.5]. For small values of l the anchors are in the interior
of the network, whereas as l increases the anchors tend to be placed on the boundary of the
formation. It is possible to observe that in the latter case, i.e., when the non-anchor nodes are in
the convex hull of the anchors, the localization error is remarkably reduced, see Figure 8(b).
in order to have a rough estimate of node positions; (ii) refine the imprecise estimated configuration with a local approach. For example, one can use the gradient method described in Section 4.1, which is simple and at the same time has good convergence properties. Moreover, the gradient approach can be easily implemented in a distributed fashion, as discussed in Section 7. Numerical results on the SDP with local refinement are presented in Section 6.
Fig. 9. (a) Initial position guess; (b) actual node configuration (n = 225).
niques, depending on the initial guess. The latter is obtained from the actual configuration by perturbing each node position with Gaussian noise with covariance matrix Σi = σp² I2. Figure 10 reports the percentage of converging tests (as defined in Section 4.1.2) over 100 simulation runs. For the same experiment we report the computational effort of the four techniques, see Table 1. The effort is expressed in terms of the CPU time required to reach the termination condition of each algorithm. The tests were performed in Matlab on a MacBook, with 2.26 GHz clock frequency and 4 GB RAM.
It is possible to observe that Gauss-Newton is the least resilient when the initial guess is not accurate, although it is fast and converges in a few iterations. The gradient method and the trust region approach have similar convergence properties and require a comparable computational effort: the trust region approach is able to converge in a few iterations, since it
[Fig. 10: percentage of converging tests versus the initial guess standard deviation σp, for the gradient method (GM), Gauss-Newton (GN), trust region (TR) and global continuation (GC).]
Table 1. CPU time for gradient method (GM), Gauss-Newton (GN), trust region (TR) and
global continuation (GC) approach for different values of σp . Time is expressed in seconds.
σp GM GN TR GC
0.01 0.2955 0.0264 0.0751 2.2362
0.05 0.3292 0.0393 0.0635 2.2280
0.1 0.3657 0.0869 0.0930 2.2437
0.5 0.4449 0.7493 0.2316 2.3654
1 0.5217 1.4703 0.3443 2.5524
also uses second-order information on the objective function (i.e., the Hessian). The gradient method requires simpler update steps, but this comes at the price of a larger number of iterations for the technique to converge. Finally, the global continuation approach converged in all the tests performed, although its computational effort is remarkably higher than that of the other approaches. Table 1 also shows that global continuation takes no advantage of good prior knowledge, since the initial smoothing moves the initial guess to the minimizer of the convexified function, regardless of the starting guess of the optimization.
Fig. 11. Localization error of the configuration estimated with global continuation (dotted line
with triangle), SDP (dashed line with circle) and SDP with gradient refinement (dotted line
with cross).
Table 2. CPU time for global continuation (GC) approach, semidefinite programming and SDP
with gradient refinement. Time is expressed in seconds.
σd GC SDP SDP + GM
0.001 2.2942 20.1295 20.3490
0.003 2.3484 18.5254 18.9188
0.005 1.7184 16.5945 16.8349
0.007 1.7191 15.8923 16.1929
0.01 1.7360 15.8138 16.1093
has been addressed with some forms of distributed implementation of the SDP approach, see (Biswas et al., 2006). Some discussion on distributed network localization is reported in the following section.
ing (Costa et al., 2006), and the use of barycentric coordinates for localizing the nodes under
the hypothesis that non-anchor nodes lie in the convex hull of anchors (Khan et al., 2009). An
extension of the SDP framework to distributed network localization can be found in (Biswas
et al., 2006), whereas contributions in the anchor-free setup include (Xunxue et al., 2008). We
conclude the chapter with a brief outline of a distributed extension of the gradient method
presented in Section 4.1. We first notice that the gradient information needed by a node for an update step is local-only. Each node, in fact, can compute its local gradient as:
$$\nabla_i f(p) = \sum_{j \in N_i} (p_i - p_j)\, g_{ij}(p), \qquad (25)$$
where $\nabla_i f(p)$ denotes the i-th $1 \times 2$ block of the gradient $\nabla f(p)$ in (11) and $N_i$ is the set of neighbors of node i. It follows that the portion of gradient $\nabla_i f(p)$ can be computed individually by node i by simply querying the neighbors for their current estimated positions. For iterating the gradient method each node also needs the stepsize $\alpha_\tau$, which depends on some global information. The expression of the stepsize (5), however, is particularly suitable for decentralized computation, as we can see by rewriting (5) in the following form:
$$\alpha_\tau = \frac{\sum_{i=1}^{n} \left\| p_i^{(\tau)} - p_i^{(\tau-1)} \right\|^2}{\sum_{i=1}^{n} \left( p_i^{(\tau)} - p_i^{(\tau-1)} \right)^{\!\top} \left( \nabla_i f(p^{(\tau)}) - \nabla_i f(p^{(\tau-1)}) \right)}, \qquad (26)$$
It is easy to observe that each summand in the numerator and in the denominator of $\alpha_\tau$ is a local quantity available at node i. Hence a distributed averaging method, like the one proposed in (Xiao et al., 2006), allows each node to retrieve the quantities $\frac{1}{n}\sum_{i=1}^{n} \| p_i^{(\tau)} - p_i^{(\tau-1)} \|^2$ and $\frac{1}{n}\sum_{i=1}^{n} ( p_i^{(\tau)} - p_i^{(\tau-1)} )^{\top} ( \nabla_i f(p^{(\tau)}) - \nabla_i f(p^{(\tau-1)}) )$. By simply dividing these quantities each node can obtain the stepsize $\alpha_\tau$ and can locally update its estimated position according to the distributed gradient rule:
$$p_i^{(\tau+1)} = p_i^{(\tau)} - \alpha_\tau \nabla_i f(p^{(\tau)}). \qquad (27)$$
Similar considerations can be drawn about the Gauss-Newton approach. On the other hand
it can be difficult to derive a distributed implementation of the global continuation and trust
region approaches, limiting their effectiveness in solving the network localization problem.
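The gradient iteration (25)-(27) can be exercised on a toy problem. In this sketch the network, the squared-range-error cost and all sizes are our own assumptions, and the two sums defining $\alpha_\tau$ in (26) are computed centrally, where in a truly distributed run an averaging scheme would supply them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 4 nodes on the unit square, every pairwise range measured
# exactly; cost f(p) = sum over ordered pairs (||p_i - p_j|| - d_ij)^2.
p_true = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
n = len(p_true)
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
d = {(i, j): np.linalg.norm(p_true[i] - p_true[j]) for (i, j) in pairs}

def grad(p):
    # Stacked local gradients (eq. 25): node i only needs its neighbors'
    # current position estimates to compute its own 1x2 block.
    g = np.zeros_like(p)
    for (i, j) in pairs:
        diff = p[i] - p[j]
        g[i] += 2.0 * (1.0 - d[(i, j)] / np.linalg.norm(diff)) * diff
    return g

def cost(p):
    return sum((np.linalg.norm(p[i] - p[j]) - d[(i, j)]) ** 2 for (i, j) in pairs)

# Gradient iteration (eq. 27) with the Barzilai-Borwein stepsize (eq. 26).
p = p_true + 0.1 * rng.standard_normal(p_true.shape)  # perturbed initial guess
cost0 = cost(p)
g = grad(p)
alpha = 1e-3  # bootstrap stepsize for the first iteration
for _ in range(50):
    p_next = p - alpha * g
    g_next = grad(p_next)
    dp, dg = p_next - p, g_next - g
    denom = float(np.sum(dp * dg))
    alpha = float(np.sum(dp * dp)) / denom if denom > 1e-12 else 1e-3
    p, g = p_next, g_next
```

With a good initial guess the residual cost drops rapidly; the fallback stepsize when the BB denominator is non-positive is a safeguard of our own, not part of the original scheme.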
8. Conclusion
In this chapter we review several centralized techniques for solving the network localization
problem from range measurements. We first introduce the problem of information fusion
aimed at the estimation of node position in a networked system, and we focus on the case in
which nodes can take pairwise distance measurements. The problem setup is naturally modeled using graph formalism and network localization is expressed as an optimization problem.
Suitable optimization methods are then applied for finding a minimum of the cost function
which, under suitable conditions, corresponds to the actual network configuration. In the
chapter we analyze five numerical techniques for solving the network localization problem
under range-only measurements, namely a gradient method, a Gauss-Newton algorithm, a trust-region method, a global continuation approach and a technique based on semidefinite programming. The methods are described in detail and compared in terms of computational efficiency and convergence properties. Several tests and examples further define possible applications of the presented models, allowing the reader to approach the problem of position estimation in networked systems paying attention to both theoretical and practical aspects. The
Sensor Fusion for Position Estimation in Networked Systems 275
first three techniques (GM, GN and TR) are local, in the sense that these optimization techniques are able to attain the global optimum of the objective function only when some initial guess on the node configuration is available and this guess is sufficiently close to the actual node positions. The convergence properties of these techniques are tested through extensive simulations. The gradient method can be implemented easily and requires only first-order information. In this context we recall a simple and effective procedure for computing the stepsize, called the Barzilai-Borwein stepsize. The Gauss-Newton approach, although being the fastest and most efficient method, is prone to convergence to local minima and is therefore useful only when good a-priori knowledge of the node positions is available. The trust-region method has better convergence properties with respect to the previous techniques, providing a good compromise between numerical efficiency and convergence. We also present two global approaches, a global continuation approach and a localization technique based on semidefinite programming (SDP). Global continuation, although computationally intensive, shows convergence to the global optimum regardless of the initial guess on the node configuration. Moreover, it allows accurate position estimates to be computed also in the presence of noise. Finally, the SDP approach is able to retrieve the exact node positions in the case of noiseless distance measurements, by relaxing the original problem formulation. In the practical case of noisy measurements, the approach tends to be inaccurate, and the localization error heavily depends on the number of anchor nodes and on their placement. In order to improve the localization accuracy we also discuss the possibility of adding a local refinement to the SDP estimate, evaluating this solution in terms of precision and computational effort.
We conclude the chapter by discussing how decentralized implementations of the network localization algorithms can be derived, and by reviewing the state-of-the-art on distributed range-based position estimation.
9. References
Akyildiz, I., Su, W., Sankarasubramaniam, Y. & Cayirci, E. (2002). A survey on sensor networks, IEEE Communications Magazine 40(8): 102–114.
Barooah, P. & Hespanha, J. (2007). Estimation on graphs from relative measurements, IEEE
Control Systems Magazine 27(4): 57–74.
Barzilai, J. & Borwein, J. (1988). Two-point step size gradient methods, IMA J. Numer. Anal.
8: 141–148.
Biswas, P., Lian, T., Wang, T. & Ye, Y. (2006). Semidefinite programming based algorithms for
sensor network localization, ACM Transactions on Sensor Networks (TOSN) 2(2): 220.
Biswas, P. & Ye, Y. (2004). Semidefinite programming for ad hoc wireless sensor network
localization, Proceedings of the Third International Symposium on Information Processing
in Sensor Networks (IPSN), pp. 2673–2684.
Boyd, S. & Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press.
Costa, J., Patwari, N. & Hero, A. (2006). Distributed weighted-multidimensional scaling for
node localization in sensor networks, ACM Transactions on Sensor Networks 2(1): 39–
64.
Doherty, L., Pister, K. & El Ghaoui, L. (2001). Convex position estimation in wireless sensor
networks, IEEE INFOCOM, Vol. 3, pp. 1655–1663.
Eren, T., Goldenberg, D., Whiteley, W., Yang, Y., Morse, A., Anderson, B. & Belhumeur, P. (2004). Rigidity, computation, and randomization in network localization, IEEE INFOCOM, Vol. 4, pp. 2673–2684.
12
M2SIR: A Multi Modal Sequential Importance Resampling Algorithm for Particle Filters
1. Introduction
Multi-sensor based state estimation is still challenging because sensors deliver correct mea-
sures only for nominal conditions (for example the observation of a camera can be identified
for a bright and non smoggy day and illumination conditions may change during the tracking
process). It results that the fusion process must handle with different probability density func-
tions (pdf) provided by several sensors. This fusion step is a key operation into the estimation
process and several operators (addition, multiplication, mean, median,...) can be used, which
advantages and drawbacks.
In a general framework, the state is given by a hidden variable X that defines "what we are looking for" and that generates the observations provided by several sensors. Figure 1 is an illustration of this general framework. Let Z be a random vector that denotes the observations (provided by several sensors). State estimation methods can be divided into two main categories. The first family is based on optimisation theory: the state estimation problem is reformulated as the optimisation of an error criterion in the observation space. The second family proposes a probabilistic framework in which the distribution of the state given the observations has to be estimated (p(X|Z)). Bayes' rule is widely used to do that:
$$p(X|Z) = \frac{p(Z|X)\, p(X)}{p(Z)} \qquad (1)$$
When the state is a continuous random variable, the associated distribution is represented by two principal methods. The first consists in defining an analytic representation of the distribution by a parametric function; a popular solution is given by Gaussian or mixture-of-Gaussians models. The main drawback of this approach is that it assumes that the general shape of the distribution is known (for example a Gaussian, representing a unimodal shape). The second category of methods consists in approximating the distribution by samples, generated in a stochastic way using Monte-Carlo techniques. The resulting model is able to handle nonlinear models and unknown distributions.
This chapter presents the probabilistic framework of state estimation from several sensors
and more specifically, stochastic approaches that approximate the state distribution as a set of
samples. Finally, several simple fusion operators are presented and compared with an original
algorithm called M2SIR, on both synthetic and real data.
Fig. 1. State estimation synoptic: multi-sensor observations are generated by the hidden state to be estimated.
Fig. 2. Probability distribution approximation of the blue curve with unweighted samples (red
balls). (best viewed in color)
$$p(X) \approx \frac{1}{N} \sum_{n=1}^{N} \delta(X - X^n), \quad \text{which is equivalent to} \quad p(X) \approx \{X^n\}_{n=1}^{N} \qquad (2)$$
with δ the Dirac delta function. Figure 3 shows that the same distribution may also be approximated by a sum of N samples with associated weights $\pi^n$, $n \in 1...N$, such that $\sum_{n=1}^{N} \pi^n = 1$:
$$p(X) \approx \sum_{n=1}^{N} \pi^n\, \delta(X - X^n), \quad \text{which is equivalent to} \quad p(X) \approx \{X^n, \pi^n\}_{n=1}^{N} \qquad (3)$$
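The two sample-based representations (2) and (3) can be illustrated numerically. In this sketch the target density, the proposal and all numeric values are our own toy choices; the point is that any expectation under p(X) can be approximated either by a plain average over unweighted samples or by a weighted average over importance samples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target pdf: a Gaussian N(mu, sigma^2).
mu, sigma, N = 2.0, 0.5, 20000

# Unweighted representation (eq. 2): N samples drawn directly from p(X);
# the mean is then a plain sample average.
x_unw = rng.normal(mu, sigma, N)
mean_unw = x_unw.mean()

# Weighted representation (eq. 3): samples from a broader proposal q(X),
# with normalized importance weights pi_n proportional to p(x_n)/q(x_n).
q_sigma = 2.0
x_w = rng.normal(mu, q_sigma, N)
log_w = -0.5 * ((x_w - mu) / sigma) ** 2 + 0.5 * ((x_w - mu) / q_sigma) ** 2
w = np.exp(log_w - log_w.max())
w /= w.sum()                      # weights sum to one, as in eq. (3)
mean_w = np.sum(w * x_w)
```

Both estimates recover the target mean; the weighted representation is what the particle filters below propagate over time.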
M2SIR: A Multi Modal Sequential Importance Resampling Algorithm for Particle Filters 279
Fig. 3. Probability distribution approximation of the blue curve with weighted samples (red ellipsoids, whose area is proportional to the weight). (best viewed in color)
Fig. 4. Online tracking. The current state depends only on the current observation and the
previous state.
Temporal filtering of a state sequence can be formalised with a first-order Markov process (the current state depends only on the current observation and the previous state), as illustrated in Fig. 4. In this recursive Markov framework of sequential probabilistic tracking, the estimation of the posterior state distribution $p(X_t|Z_{1:t})$ at time t is made according to the previous sequence of observations $Z_{1:t}$. Bayes' rule is used to update the current state:
$$p(X_{t-1}|Z_{1:t-1}) \approx \sum_{n=1}^{N} \pi_{t-1}^{n}\, \delta(X_{t-1} - X_{t-1}^{n}), \qquad (7)$$
where $\pi_{t-1}^{n}$ is the weight associated to the n-th sample, $n \in 1...N$, such that $\sum_{n=1}^{N} \pi_{t-1}^{n} = 1$. A discrete approximation of the Chapman-Kolmogorov equation (5) is given by:
$$p(X_t|Z_{1:t-1}) \approx \sum_{n=1}^{N} \pi_{t-1}^{n}\, p(X_t|X_{t-1}^{n}), \qquad (8)$$
where $p(X_t|X_{t-1})$ is the transition distribution of the system. The law defined by equation (8) is a mixture of N components $p(X_t|X_{t-1}^{n})$, weighted by $\pi_{t-1}^{n}$. A discrete form of the recursive Bayesian filter (6) is approximated by:
$$p(X_t|Z_{1:t}) \approx C^{-1}\, p(Z_t|X_t) \sum_{n=1}^{N} \pi_{t-1}^{n}\, p(X_t|X_{t-1}^{n}). \qquad (9)$$
Since no analytical expression of the likelihood $p(Z_t|X_t)$ is available, a sampling strategy is also proposed. An importance sampling algorithm is applied to generate a new set of particles from the previous set $\{X_{t-1}^{n}, \pi_{t-1}^{n}\}_{n=1}^{N}$, using the prediction distribution for each sample. The result is a set of N samples $X_t^n$ generated by:
$$X_t^n \sim q(X_t) = \sum_{n=1}^{N} \pi_{t-1}^{n}\, p(X_t|X_{t-1}^{n}) \qquad (10)$$
For each sample $X_t^n$ the likelihood is estimated as $\pi_t^n = p(Z_t|X_t^n)$. The filter provides a set of N weighted samples $\{X_t^n, \pi_t^n\}_{n=1}^{N}$, which is an approximation of the posterior $p(X_t|Z_{1:t})$ at time t. Figure 5 illustrates the SIR algorithm for a simple one-dimensional state. The algorithm is divided into three steps:
• (a) Importance sampling: draw particles according to their weights from the set of particles at time t − 1. This process duplicates particles with a strong weight and removes particles with a light weight. The resulting set of particles approximates the same distribution as the weighted set from which they are drawn.
• (b) Prediction step: move each particle according to a proposal function p(X∗|X). When no information on the evolution process is available, a random-walk strategy can be used: p(X∗|X) = N(0, σ).
• (c) Estimation step: the weight of each particle is computed according to the likelihood function of the observation $Z_t$, given a sample $X_t^n$: $\pi_t^n = p(Z_t|X_t^n)$.
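The three steps above can be sketched as a minimal scalar SIR filter. The Gaussian observation likelihood and all numeric values here are our own toy choices, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(2)

def sir_step(particles, weights, z, sigma_prop=0.06, sigma_obs=0.1):
    """One SIR iteration on a scalar state: (a) importance resampling,
    (b) random-walk prediction, (c) reweighting by the observation
    likelihood p(z | x), taken Gaussian here as a toy assumption."""
    N = len(particles)
    # (a) draw particles according to their weights (duplicates heavy ones)
    particles = particles[rng.choice(N, size=N, p=weights)]
    # (b) move each particle with the proposal p(x* | x) = N(x, sigma_prop^2)
    particles = particles + rng.normal(0.0, sigma_prop, N)
    # (c) weight by the likelihood of the observation and normalize
    w = np.exp(-0.5 * ((z - particles) / sigma_obs) ** 2)
    w /= w.sum()
    return particles, w

# Track a static state x = 0.5 observed with noise over 20 time steps.
N = 2000
particles = rng.uniform(0.0, 1.0, N)
weights = np.full(N, 1.0 / N)
for _ in range(20):
    z = 0.5 + rng.normal(0.0, 0.1)
    particles, weights = sir_step(particles, weights, z)
estimate = float(np.sum(weights * particles))
```

After a few observations the weighted particle mean settles near the true state, while the resampling step keeps the particle set concentrated in high-likelihood regions.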
The diffusion of particles from t − 1 to t provides filtering properties to the SIR algorithm. The particles are attracted toward high-probability areas of the state space, starting from their previous positions. The main drawback of this algorithm is that it requires a number of particles which grows exponentially with the size of the state vector (Isard & MacCormick, 2001; Smith & Gatica-Perez, 2004). For high-dimensional problems, MCMC methods with marginalized sampling strategies are preferred (see MacKay (2003) for further information).
Fig. 5. Illustration of the diffusion of a set of particles with the SIR algorithm, with a proposal function N(0, σ = 0.06). (0): posterior distribution at t − 1 (blue curve) and its approximation with weighted particles (red ellipsoids with an area proportional to the weight). (a): the same posterior distribution at t − 1 (blue curve) and its approximation with unweighted particles, after the importance resampling algorithm. (b): posterior distribution at t (blue curve) and its approximation with unweighted particles, generated using the proposal function N(0, σ = 0.06). (c): probability distribution at t (blue curve) and its approximation by weighted particles according to the likelihood function at time t.
Kassim (2007) propose to adapt the importance sampling method to the data quality. For a similar application, Du et al. (2007) propose to combine an independent transition kernel with a booster function to get a mixture function. We propose a new importance sampling algorithm that can handle several sources.
3. M2SIR Algorithm
When the observation is provided by several sources, the likelihood associated to each particle results from the fusion of several weights. This fusion is a challenging operation because several operators can be used, each with advantages and drawbacks. We propose to merge observations intrinsically during the resampling step of the particle filter. The resulting algorithm (see Algorithm 2) is a variant of the CONDENSATION algorithm M. Isard & A. Blake (1998). The difference between this algorithm and CONDENSATION is that the weight associated to each particle is a weight vector (composed of weights generated from the observations of each source) and that the sampling step is provided by the M2SIR algorithm developed in the following section.
Estimation: $\hat{X}_t = \frac{1}{N} \sum_{n=1}^{N} X_t^n$
end for
Output: the set of estimated states over the video sequence $\{\hat{X}_t\}_{t=1,...,T_{end}}$
We consider the estimation of the posterior $p(X_t|Z_{0:t})$ at time t by a set of N particles $\{(X_t^n, \pi_t^n)\}_{n=1}^{N}$ with N associated weight vectors $\pi_t^n$. Each weight vector, of size M given by the number of observations (sources), is composed of the weights related to the sources. For readability, we omit the temporal index t in the following equations. The aim of the proposed multi modal sequential importance resampling algorithm (M2SIR) is to generate a new particle with a three-step approach, illustrated in Fig. 6 in the case of three sources:
1. M samples (one for each source) are drawn using an importance sampling strategy. The output of this step is a set of M candidate samples and their associated weight vectors: $\{X^{(i)}, \pi^{(i)}\}_{i=1,...,M}$.
2. A likelihood ratio vector r of size M is then built from likelihood ratios estimated for each candidate sample (see below for more details).
3. The selected candidate sample is finally given by an importance sampling strategy operated on a normalized likelihood ratio vector.
The M likelihood ratios used in step two, called $r_i$ (i = 1, ..., M), are computed by:
$$r_i = \prod_{j=1}^{M} \prod_{k=1}^{M} \frac{\pi_j^i}{\pi_j^k} \qquad (11)$$
where $\pi_j^i$ denotes the weight associated to particle i from sensor j. Equation (11) can be written in a simpler way using log ratios:
$$lr_i = \sum_{j=1}^{M} \sum_{k=1}^{M} \left[ \log(\pi_j^i) - \log(\pi_j^k) \right] \qquad (12)$$
with $\mathbf{1}_{(1\times M)}$ a matrix of one line and M columns filled with ones. If $C_\pi = \frac{1}{M} \sum_{k=1}^{M} l\pi^k$, lr can be written:
$$lr = M \begin{pmatrix} \mathbf{1}_{(1\times M)}\,(l\pi^1 - C_\pi) \\ \mathbf{1}_{(1\times M)}\,(l\pi^2 - C_\pi) \\ \vdots \\ \mathbf{1}_{(1\times M)}\,(l\pi^M - C_\pi) \end{pmatrix} \qquad (15)$$
lr represents an unnormalized log-weight vector and the final normalized weight vector is given by:
$$c = C_c\, \exp(lr) \qquad (16)$$
$$C_c = \mathbf{1}_{(1\times M)}\, lr \qquad (17)$$
r is then used in step three to select a sample from the M candidates with an importance sampling strategy.
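The three-step selection can be sketched directly from the per-sensor weight matrix. This is our own minimal reading of eqs. (11)-(12), with the double sum in (12) written in an equivalent vectorized form; all names and the toy weight values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def m2sir_draw(weight_matrix):
    """Draw the index of one new particle with the M2SIR three-step strategy.
    weight_matrix[j, n] is the normalized weight of particle n under sensor j
    (M sensors, N particles)."""
    M, N = weight_matrix.shape
    # 1) per-sensor importance sampling: one candidate per source
    cand = [rng.choice(N, p=weight_matrix[j]) for j in range(M)]
    # 2) log likelihood ratios lr_i = sum_j sum_k [log pi_j^i - log pi_j^k]
    logw = np.log(weight_matrix[:, cand] + 1e-300)  # (M sensors, M candidates)
    lr = M * logw.sum(axis=0) - logw.sum()          # same double sum, vectorized
    # 3) importance sampling on the normalized ratio vector
    r = np.exp(lr - lr.max())
    r /= r.sum()
    return cand[rng.choice(M, p=r)]

# Three sensors, 100 particles: sensor 1 is blind (uniform weights), while
# sensors 2 and 3 both concentrate half of their mass on particle 70.
N = 100
w_blind = np.full(N, 1.0 / N)
w_peak = np.full(N, 0.5 / (N - 1))
w_peak[70] = 0.5
W = np.vstack([w_blind, w_peak, w_peak])

draws = [m2sir_draw(W) for _ in range(400)]
frac70 = draws.count(70) / 400.0
```

Because the likelihood ratio strongly rewards candidates that score well under the agreeing sensors, the draws concentrate on the shared mode while the blind sensor has little influence, which matches the behaviour reported in the experiments below.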
Fig. 6. Synoptic of the M2SIR algorithm in the case of three sources: 1) three particles are drawn using importance sampling (one for each sensor weight distribution); 2) likelihood ratios are then computed for the three particles; 3) the final particle is drawn with importance sampling from the three ratios.
2. Importance sampling using a product operator, called PSIR. For each particle, a global weight is computed as the product of the weights provided by each sensor:
$$\pi^i = \prod_{j=1}^{M} \pi_j^i \qquad (19)$$
3. Importance sampling using the M2SIR algorithm presented in the previous section.
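The product fusion of the PSIR strategy (19) can be illustrated with a tiny numeric example; the weight values below are our own toy numbers:

```python
import numpy as np

# Per-sensor normalized weights for N = 4 particles and M = 3 sensors
# (one blind, two dissonant); each row sums to one.
w = np.array([
    [0.25, 0.25, 0.25, 0.25],   # sensor 1: blind (uniform)
    [0.70, 0.10, 0.10, 0.10],   # sensor 2 favors particle 0
    [0.10, 0.70, 0.10, 0.10],   # sensor 3 favors particle 1
])

# PSIR global weight (eq. 19): product across sensors, then renormalize.
pi = w.prod(axis=0)
pi /= pi.sum()   # -> [0.4375, 0.4375, 0.0625, 0.0625]
```

The blind sensor's uniform row leaves the ranking unchanged, while the two dissonant peaks are mutually down-weighted into a tie; this conjunctive behaviour is what produces the "ghost" mode discussed for Figure 7.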
Two synthetic sets of three input distributions have been generated:
1. The first sequence illustrates two dissonant sensors (cf. Figure 7). Sensors two and three provide two different Gaussian distributions while sensor one is blind (its distribution follows a uniform random law).
2. The second sequence is an example of two sensors providing the same information (cf. Figure 8). The distributions of sensors two and three follow the same Gaussian law while sensor one is blind.
Figure 7 shows, for the first sequence, the resulting distributions computed by the SSIR, PSIR and M2SIR algorithms. In this example, both the SSIR and M2SIR methods give a resulting pdf reporting the two modes present in the original distributions of sensors two and three. The PSIR method produces a third ghost mode between the modes of sensors 2 and 3. The second example (cf. Fig. 8) shows that the SSIR method generates a noisy distribution, resulting from the blind sensor. PSIR and M2SIR give close distributions, decreasing the variance of sensors 2 and 3.
[Figure 7 panels: pdf. sensor 3, pdf. SSIR, pdf. PSIR, pdf. M2SIR; x-axis 0–20.]
Fig. 7. Illustration of the multi-source sampling algorithms for a three-sensor fusion step. The distribution provided by sensor one is blind (follows a uniform law) while the distributions provided by sensors two and three are dissonant (the maxima of the two distributions are different).
[Figure 8 panels: pdf. sensor 3, pdf. SSIR, pdf. PSIR, pdf. M2SIR; x-axis 0–20.]
Fig. 8. Illustration of the multi-source sampling algorithms for a three-sensor fusion step. The distribution provided by sensor one is blind (follows a uniform law) while the distributions provided by sensors two and three are the same (Gaussian law).
• The likelihood model (or observation function) is divided into two parts. The first observation is provided by a foreground/background algorithm developed in Goyat et al. (2006). The second observation is achieved by a laser sensor.
• The prediction model assumes that the state prediction can be driven by a bicycle model. Details of this application can be found in Goyat et al. (2009).
5.1 Experiments
Experiments have been carried out in order to compare several fusion algorithms on real data. In order to estimate the precision of the algorithms, ground truth has been acquired using an RTKGPS. A set of twenty sequences at different velocities and under different illumination conditions has been acquired with the associated RTKGPS trajectories. A calibration step gives the homography between the image plane and the GPS ground plane, so that an average error can be computed in centimeters in the GPS reference frame. Table 1 shows the estimated precision provided by each sensor without fusion and by the three fusion strategies: PSIR, SSIR and M2SIR. The fusion strategy increases the accuracy of the estimation. Moreover, the results provided by M2SIR are slightly better than those of SSIR and PSIR. Another set of twenty sequences has been acquired with an unplugged sensor which provides constant measurements. Table 2 shows the estimated precision provided by the three fusion strategies. The SSIR fusion strategy provides a poor precision compared to PSIR and M2SIR.
Fig. 9. Synoptic of the fusion based tracking application. A particle filter (SIR) is proposed with a five dimensional state vector, a bicycle evolution model and observations provided by a camera and a laser scanner.
6. Conclusion
Particle filters are widely used algorithms to approximate, in a sequential way, the probability distributions of dynamic systems. However, when observations are provided by several sensors, a data fusion step is necessary to update the system. We have presented several fusion operators and compared them on both synthetic and real data. The M2SIR algorithm is a multi-modal sequential importance resampling algorithm. This method, based on likelihood ratios, can be used easily within a particle filter algorithm. Experiments show that the method deals efficiently with both blind and dissonant sensors.
Fig. 10. Illustration of the observations provided by the two sensors. The reference frame is defined by a GPS antenna on the top of the vehicle. The estimated position is represented by a virtual GPS antenna associated to each sensor (green dash for vision and red dash for laser). The green cube represents the projection of the 3D vehicle model for the estimated state (vision). Red dashes are the projection of laser measures into the image.
Fusion operators have been used for a vehicle tracking application, and experiments have
shown that the sensor fusion increases the precision of the estimation.
7. References
Bar-Shalom, Y. & Fortmann, T. (1988). Tracking and Data Association, New York: Academic Press.
D. Marimon, Y. Maret, Y. Abdeljaoued & T. Ebrahimi (2007). Particle filter-based camera
tracker fusing marker and feature point cues, IS&T/SPIE Conf. on visual Communi-
cations and image Processing, Vol. 6508, pp. 1–9.
Gorji, A., Shiry, S. & Menhaj, B. (2007). Multiple Target Tracking For Mobile Robots using
the JPDAF Algorithm, IEEE International Conference on Tools with Artificial Intelligence
(ICTAI), Greece.
Goyat, Y., Chateau, T., Malaterre, L. & Trassoudaine, L. (2006). Vehicle trajectories evaluation
by static video sensors, 9th International IEEE Conference on Intelligent Transportation
Systems Conference (ITSC 2006), Toronto, Canada.
Goyat, Y., Chateau, T. & Trassoudaine, L. (2009). Tracking of vehicle trajectory by combining a
camera and a laser rangefinder, Springer MVA : Machine Vision and Application online.
Isard, M. & MacCormick, J. (2001). Bramble: A bayesian multiple-blob tracker, Proc. Int. Conf.
Computer Vision, vol. 2 34-41, Vancouver, Canada.
M2SIR: A Multi Modal Sequential Importance Resampling Algorithm for Particle Filters 291
J. Klein, C. Lecomte & P. Miche (2008). Preceding car tracking using belief functions and a
particle filter, ICPR08, pp. 1–4.
Karlsson, R. & Gustafsson, F. (2001). Monte carlo data association for multiple target tracking,
In IEEE Target tracking: Algorithms and applications.
Khan, Z., Balch, T. & Dellaert, F. (2004). An MCMC-based particle filter for tracking multiple
interacting targets, European Conference on Computer Vision (ECCV), Prague, Czech
Republic, pp. 279–290.
M. Isard & A. Blake (1998). Condensation – conditional density propagation for visual track-
ing, IJCV : International Journal of Computer Vision 29(1): 5–28.
MacKay, D. (2003). Information Theory, Inference and Learning Algorithms., Cambridge Univer-
sity Press.
Oh, S., Russell, S. & Sastry, S. (2004). Markov chain monte carlo data association for multiple-
target tracking, IEEE Conference on Decision and Control, Island.
Pérez, P., Vermaak, J. & Blake, A. (2004). Data fusion for visual tracking with particles, Proceedings of the IEEE 92(2): 495–513.
Reid, D. (1979). An algorithm for tracking multiple targets, IEEE Transactions on Automatic Control 24(6): 843–854.
Sarkka, S., Vehtari, A. & Lampinen, J. (2004). Rao-blackwellized particle filter for multiple
target tracking, 7th International Conference on Information Fusion, Italy.
Smith, K. & Gatica-Perez, D. (2004). Order matters: A distributed sampling method for multi-
object tracking, British Machine Vision Conference (BMVC), London, UK.
Vermaak, J., Godsill, J. & Pérez, P. (2005). Monte carlo filtering for multi-target tracking and
data association, IEEE Transactions on Aerospace and Electronic Systems 41: 309–332.
W. Du, Y. Maret & J. Piater (2007). Multi-camera people tracking by collaborative particle
filters and principal axis-based integration, ACCV, pp. 365–374.
Y.D. Wang, J. & A. Kassim (2007). Adaptive particle filter for data fusion of multiple cameras,
The Journal of VLSI Signal Processing 49(3): 363–376.
13
On passive emitter tracking in sensor networks
Darko Mušicki
Hanyang University
Korea
Wolfgang Koch
Fraunhofer FKIE
Germany
1. Introduction
Many applications require fast and accurate localization and tracking of non-cooperative emit-
ters. In many cases, it is advantageous not to conceal the observation process by using active
sensors, but to work covertly with passive sensors. The estimation of the emitter state is
based on various types of passive measurements by exploiting signals emitted by the targets.
In other applications there is no choice but to exploit received signals only. Typical examples
include search and rescue type operations.
Some passive measurements can be taken by single sensors: e.g. bearing measurements
(AOA: Angle of Arrival) and frequency measurements (FOA: Frequency of Arrival). The
emitter state can be estimated based on a set of measurements of a single passive observer.
This problem is called the Target Motion Analysis (TMA) problem which means the process
of estimating the state of a radiating target from noisy incomplete measurements collected by
one or more passive observer(s). The TMA problem includes localization of stationary as well
as tracking of moving emitters. The TMA problem based on a combination of AOA and FOA
measurements is considered by Becker in (Becker, 2001). Becker investigates and discusses
the TMA problem with many characteristic features such as observability conditions, combi-
nation of various types of measurements, etc., (Becker, 1999; 2005) .
Alternatively, measurements can be obtained from a network of several spatially separated sensors. Here, a minimum of two sensors is often needed. Measurements of Time Difference
of Arrival (TDOA) and Frequency Difference of Arrival (FDOA) belong to this group.
TDOA measurements are obtained in the following way: several distributed, time-synchronized
sensors measure the Time of Arrival (TOA) of signals transmitted from the emitter. The dif-
ference between two TOA measurements of the same signal gives one TDOA measurement.
Alternatively, TDOA measurements can be obtained by correlating signals received by the
sensors. A time standard can be used for time synchronization.
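This differencing step can be sketched as follows; the unknown emission time t0 cancels exactly in the difference (positions and t0 are made-up illustrative values):

```python
import numpy as np

C = 299792458.0  # speed of light in m/s

def toa(t0, emitter, sensor):
    """Time of Arrival: emission time plus propagation delay."""
    return t0 + np.linalg.norm(emitter - sensor) / C

emitter = np.array([12000.0, 15000.0])               # made-up position (m)
s1, s2 = np.array([5000.0, 0.0]), np.array([15000.0, 0.0])
t0 = 3.7                                             # unknown emission time (s)

# The difference of the two TOAs no longer depends on t0:
tdoa = toa(t0, emitter, s1) - toa(t0, emitter, s2)
```

Multiplying `tdoa` by c gives the range-difference form used later in the chapter.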
In the absence of noise and interference, a single TDOA measurement localizes the emitter on
a hyperboloid with the two sensors as foci. By taking additional independent TDOA mea-
surements from at least four sensors, the three-dimensional emitter location is estimated from
the intersections of three or more hyperboloids. If sensors and emitter lie in the same plane,
one TDOA measurement defines a hyperbola describing possible emitter locations. There-
fore, the localization using TDOA measurements is called hyperbolic positioning. The sign
of the measurement defines the branch of the hyperbola on which the emitter is located. The
two-dimensional emitter location is found at the intersection of two or more hyperbolae from
at least three sensors. This intersection point can be calculated by analytical solution, see
e.g. (K. C. Ho, 2008; So et al., 2008). Alternatively, a pair of two sensors moving along arbi-
trary but known trajectories can be used for localizing an emitter using TDOA measurements.
In this case, the emitter location can be estimated by filtering and tracking methods based
on further measurements over time. This chapter is focused on the localization of unknown,
non-cooperative emitters using TDOA measurements from a sensor pair. Some results have
already been published in (Kaune, 2009).
The localization and tracking of a non-cooperative emitter can be improved by combining different kinds of passive measurements, particularly in the case of a moving emitter.
One possibility is based on bearing measurements. A pair of one azimuth and one TDOA
measurement is processed at each time step. The additional AOA measurement can solve the
ambiguities appearing in processing TDOA measurements only. Another possibility consid-
ers two sensors measuring the FDOA between two frequencies of arrival (Mušicki et al., 2010;
Mušicki & Koch, 2008). These measurements can be taken by the same sensors as the TDOA
measurements. The TDOA/FDOA measurement pairs can be obtained by using the Complex Ambiguity Function (CAF). The combination of TDOA and FDOA measurements considerably improves the estimation performance.
This chapter gives an overview of the topic of passive emitter tracking. Section 2 describes the
situation of a single passive observer. Important steps of solving the passive emitter tracking
problems are presented. When assessing an estimation task, it is important to know the best
estimation accuracy that can be obtained with the measurements. The Cramér Rao Lower
Bound (CRLB) provides a lower bound on the estimation accuracy for any unbiased estimator
and reveals characteristic features of the estimation problem.
Powerful estimation algorithms must be applied to obtain useful estimates of the emitter state.
For passive emitter tracking, measurements and states are not linearly related. Therefore, only
nonlinear estimation methods are appropriate. Passive emitter tracking is a complex prob-
lem. Depending on the types of measurements, various estimation methods can be applied
showing different localization performance in various scenarios. The goal of this chapter is to
provide a review of the state of the art. The discussion is not restricted to one chosen method
but presents an overview of different methods. The algorithms are not shown in detail; there-
fore, a look at the references is necessary to implement them. In the appendix, a toolbox
of methods makes several estimation methods available which are applied in this chapter.
Firstly, the maximum likelihood estimator (MLE), a direct search method, evaluates the complete measurement dataset at each estimate. Secondly, Kalman filter based solutions
which recursively update the emitter state estimates. The tracking problem is nonlinear; thus
the Extended Kalman Filter (EKF) provides an analytic approximation, while the Unscented
Kalman Filter (UKF) deterministically selects a small number of points and transforms these
points nonlinearly. Thirdly, Gaussian Mixture (GM) filters will be discussed, which approximate the posterior density by a GM (a weighted sum of Gaussian density functions). Addi-
tionally, some basics on the CRLB and the Normalized Estimation Error Squared (NEES) are
presented.
In Sections 3, 4 and 5, passive emitter tracking using TDOA, a combination of TDOA and AOA, and a combination of TDOA and FDOA is investigated, respectively. Finally, conclusions are drawn.
where vk ∼ N (0, Q) means that vk is zero-mean normally distributed with covariance Q. The
measurement model relates the noisy measurements zk ∈ R nz to the state, where nz is the
dimension of the measurement vector. The measurement function h(e) is a function of the
emitter state, nonlinear or linear, and reflects the relations between the emitter state and the
measurements. Thus, the measurement process is modeled by adding white Gaussian noise
uk :
zk = h(ek ) + uk , uk ∼ N (0, R), (2)
where R is the covariance of the measurement noise.
An estimation algorithm must be found to solve the emitter tracking problem. Based on all
available measurements Zk = {z1 , z2 , . . . , zk } up to time tk we seek to estimate the emit-
ter state ek . Therefore, it is required to compute the posterior probability density function
p(ek | Zk ). A short review of available estimation algorithms is given in A.3 and includes:
• As a direct method, maximum likelihood estimation (MLE) evaluates at each time step
the complete measurement dataset. In many cases, a numerical iterative search algo-
rithm is needed to implement MLE.
• Recursive Kalman-type filter algorithms can be used as well. They are Bayesian estima-
tors and construct the posterior density using the Bayes rule. Since the measurement
equation in passive emitter tracking is often nonlinear, nonlinear versions of it must be
used: the Extended Kalman filter (EKF) provides an analytic approximation, while the
Unscented Kalman filter (UKF) deterministically selects a small number of points and
transforms these points according to the nonlinearity.
• Gaussian Mixture (GM) filters approximate the required densities by Gaussian Mix-
tures, weighted sums of Gaussians. The approximation can be made as accurate as de-
sirable by adapting the number of mixture components appropriately, see (Ristic et al.,
2004).
In passive tracking, the emitter may not be observable from the available measurements in some situations. If the observer moves directly toward a stationary emitter, for example, the emitter is not observable from bearing measurements alone. In the literature, necessary and
sufficient observability criteria using angle measurements and using a combination of angle
and frequency measurements have been derived (Becker, 1993; 1996). In general, ambiguities
can be resolved by suitable observer maneuvers, which depend on the type of measurements
and the emitter model as well. A measurement set consisting of different measurement types
often results in less restrictive observability conditions.
In an application, the user should always strive to get the maximum of attainable estima-
tion accuracy. Estimation accuracy can firstly be influenced by the choice of the estimation
algorithm and, secondly, by the choice of the emitter-observer geometry over time, via ob-
server motion. The estimation accuracy highly depends on the emitter-observer geometry.
The emitter-observer geometry may be changed by observer maneuvers. Thus, the final step
in solving the TMA problem is to find an optimal observer maneuver creating a geometry that
maximizes the estimation accuracy. In the literature, several criteria have been used, one of
them is maximizing the determinant of the Fisher Information Matrix (FIM) J.
Already a single bearing measurement provides information on the emitter position. In ad-
dition, or instead of bearing measurements, measurements of the Doppler-shifted frequency
can be taken (Becker, 1992). Frequency measurements depend on the emitter-sensor motion,
more precisely on the radial component of the relative velocity vector. Frequency drift and
frequency hopping have an impact on the quality of frequency measurements and have to be
taken into account. The location methods based on bearing or frequency measurements differ
significantly. The substantial differences between both methods lead to a significant integra-
tion gain when the combined set of bearing and frequency measurements is processed.
The emitter state at time tk is defined as

ek = (xk^T, ẋk^T)^T,   (3)
where xk = ( xk , yk ) T ∈ R2 denotes the position and ẋk = ( ẋk , ẏk ) T ∈ R2 the velocity. Two
sensors with the state vectors
sk^(i) = ( (xk^(i))^T, (ẋk^(i))^T )^T,   i = 1, 2,   (4)
observe the emitter and receive the emitted signal. The sensors are equipped with a navigation system and therefore know their own positions and velocities; their state vectors are known at all times. To simplify, the emitter is assumed to be stationary, i.e. ẋk = 0, while the sensors move along their trajectories at constant speed.
Since the signal propagates at the speed of light c, the TOA measurement at sensor i can be expressed as:

t0 + (1/c) ||xk − xk^(i)||,

where || · || denotes the vector norm, t0 is the emission time of the signal, and ||rk^(i)|| = ||xk − xk^(i)|| is the range between emitter and sensor i, i = 1, 2, at time tk, where rk^(i) denotes the emitter position relative to sensor i.
The TOA measurement consists of the unknown time of emission t0 and the time the signal needs to travel along the relative vector between the emitter and sensor i. Calculating the difference between the two TOA measurements eliminates the unknown time t0 and yields the TDOA measurement at time tk:

ht,k = (1/c) ( ||xk − xk^(1)|| − ||xk − xk^(2)|| ).

The measurement in the range domain is obtained by multiplication with the speed of light c:

hr,k = ||xk − xk^(1)|| − ||xk − xk^(2)||.
The measurement equation is a function of the unknown emitter position xk; the emitter speed does not enter. Furthermore, the sensor positions, which change over time, are parameters of the measurement equation; the sensor speed is irrelevant. The two-dimensional position vector xk of the emitter is to be estimated. Since the emitter is stationary, its position is independent of time and for all time steps tk:

xk = x0.
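As a minimal sketch (made-up geometry; σr = 200 m as assumed later in this chapter), the range-domain measurement function and a noisy measurement can be written as:

```python
import numpy as np

def h_range(x_e, x_s1, x_s2):
    """Range-domain TDOA measurement: ||x - x^(1)|| - ||x - x^(2)||."""
    return np.linalg.norm(x_e - x_s1) - np.linalg.norm(x_e - x_s2)

rng = np.random.default_rng(0)
emitter = np.array([12000.0, 15000.0])                  # stationary emitter (m)
s1, s2 = np.array([5000.0, 0.0]), np.array([15000.0, 0.0])

hr = h_range(emitter, s1, s2)           # noise-free range difference
z = hr + rng.normal(0.0, 200.0)         # noisy measurement, sigma_r = 200 m
```

The sign of `hr` selects the hyperbola branch on which the emitter lies.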
A typical TDOA situation is illustrated in Figure 2. The two sensors move at the edge of the
observation area in an easterly direction indicated by the arrows. They observe a stationary
emitter. A single accurate, i.e. noise-free, TDOA measurement defines a hyperbola of possible emitter locations. In Figure 2, the red curve shows the branch of the hyperbola on which the emitter must lie.
The combination of two measurements of the stationary emitter taken over time leads to an
ambiguity of the emitter position. The two detection results are the true position of the emitter
and the position mirrored along the connecting line between the sensors. This ambiguity can
be resolved in various ways, e.g. by a maneuver of the sensors, the addition of a third sensor, or an additional bearing measurement. Alternatively, sensors that are sensitive in only one hemisphere, and thus able to observe only that half-space, can be used. Here the sensors are positioned at the edge of the observation area, e.g. on a coast for the observation of a ground emitter or on the edge of a hostile territory.
[Figure 2 shows a stationary emitter e observed by two moving sensors s(1) and s(2) near the southern edge of a 25 km × 25 km area; the sensors head east, and r(1), r(2) denote the ranges to the emitter. Axes: km in east and north direction.]
Fig. 2. TDOA scenario
The measurement process is modeled by adding white Gaussian noise to the measurement function. We get the measurement equation in the range domain at time tk:

zr,k = hr,k + ur,k ,   ur,k ∼ N(0, σr²),

where σr denotes the standard deviation of the measurement error in the range domain. The measurement noise ur,k is i.i.d.: the measurement errors are mutually independent from time step to time step and identically distributed.
This shows that the CRLB depends only on the relative position of the sensors and the emitter,
the measurement accuracy and the number of measurements.
The FIM J1 at time t1 will usually be singular since we cannot estimate the full position vector
x from a single TDOA measurement without additional assumptions, see (Van Trees, 1968).
In the present case these assumptions concern the area in which the emitter is supposed to be.
These assumptions about the prior distribution on x are added to the FIM at time t1 .
For visualization, the estimation accuracy is given as the square root of the trace of the 2 × 2
CRLB matrix.
Figure 3 shows a plot of the CRLB in the plane for the two investigated scenarios without taking prior information into account. The initial sensor positions are marked with green triangles, and the red circle designates the position of the emitter. For a grid of possible emitter
Fig. 3. CRLB in the plane, values cut off at 500 m: (a) scenario 1, (b) scenario 2; axes: x and y range in km, colorbar in m.
positions in the plane, the Fisher information J100 after 100 measurements is computed by Equation (6). The associated CRLB J100^−1 is calculated and the square root of the trace is shown.
Values larger than 500 m have been cut off for better visualization. The color bar shows the
localization accuracy in m. The localization accuracy can be read from the figure for any
emitter location in the plane.
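Such a map can be sketched as below. Equation (6) itself is not reproduced in this excerpt; the sketch assumes the standard FIM accumulation for independent measurements, J = Σk Hk^T Hk / σr², with Hk the gradient of the range-difference function. Trajectories and values are made up:

```python
import numpy as np

def tdoa_gradient(x_e, x_s1, x_s2):
    """Gradient of ||x - x^(1)|| - ||x - x^(2)|| w.r.t. the emitter position."""
    d1, d2 = x_e - x_s1, x_e - x_s2
    return d1 / np.linalg.norm(d1) - d2 / np.linalg.norm(d2)

def crlb_trace_sqrt(x_e, s1_track, s2_track, sigma_r=200.0):
    """sqrt(trace(J^-1)) after accumulating the FIM over all sensor positions."""
    J = np.zeros((2, 2))
    for x_s1, x_s2 in zip(s1_track, s2_track):
        H = tdoa_gradient(x_e, x_s1, x_s2).reshape(1, 2)
        J += H.T @ H / sigma_r**2
    return np.sqrt(np.trace(np.linalg.inv(J)))

# Made-up trajectories: 100 steps heading east at 100 m/s, 1 s sampling.
k = np.arange(100.0)
s1_track = np.column_stack([1000.0 + 100.0 * k, np.zeros(100)])
s2_track = np.column_stack([5000.0 + 100.0 * k, np.zeros(100)])

acc = crlb_trace_sqrt(np.array([15000.0, 15000.0]), s1_track, s2_track)
```

Evaluating `crlb_trace_sqrt` on a grid of candidate emitter positions reproduces a map like the one in Figure 3.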
In the first scenario, the emitter lies exactly in the area of optimal approach to the target.
In the second scenario, it is near the region of divergence, which indicates poor localization performance.
3.2.2 Results
For comparison of the estimation methods, the Root Mean Square Error (RMSE), the root of the mean squared distance of the estimates to the true target location xk, is used in Monte Carlo simulations. The squared errors are averaged over N, the number of Monte Carlo runs. Let x̂k^(i) be the estimate of the ith run at time tk. Then, the RMSE at time tk is computed as:

RMSEk = sqrt( (1/N) ∑_{i=1}^{N} (xk − x̂k^(i))^T (xk − x̂k^(i)) ).   (8)
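Equation (8) in code form, applied to toy estimates (the Gaussian error model here is made up purely for illustration):

```python
import numpy as np

def rmse(x_true, estimates):
    """RMSE over N Monte Carlo estimates at one time step, Equation (8)."""
    err = estimates - x_true                     # shape (N, 2)
    return np.sqrt(np.mean(np.sum(err**2, axis=1)))

x_true = np.array([15000.0, 15000.0])
rng = np.random.default_rng(0)
est = x_true + rng.normal(0.0, 100.0, size=(1000, 2))   # toy estimates

r = rmse(x_true, est)   # roughly sqrt(2) * 100 ≈ 141 m for this toy model
```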
Four estimation algorithms which solve the nonlinear emitter localization problem are inves-
tigated and compared.
• The Maximum Likelihood Estimate (MLE) is that value of xk which maximizes the likelihood function (30). Since there is no closed-form ML solution for xk, a numerical iterative search algorithm is needed to find the minimum of the quadratic form, see Equation (42). In our case, the simplex method of Nelder and Mead is used. It is
initialized with a central point from the observation area in scenario 1, in the second
scenario the initialization point is chosen at a distance of up to about 5 km from the
true target position. Being a batch algorithm, the MLE evaluates, at each update, the
complete measurement dataset. It attains the CRLB when properly initialized. One
disadvantage of the ML estimator is the higher computational effort in comparison to
the Kalman filters, as can be seen in Table 1. Table 1 shows the computational efforts of
the different estimation algorithms for a Monte Carlo simulation with 1000 runs for the
first scenario. One advantage of the MLE is the superior performance in comparison to
the Kalman filters.
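The chapter's MLE uses the Nelder–Mead simplex search; as a simple, self-contained stand-in, the sketch below minimizes the same quadratic form by a coarse grid search over the observation area (all positions and noise levels are made up). Restricting the grid to one half-plane also sidesteps the mirror ambiguity discussed above:

```python
import numpy as np

def h_range(x_e, x_s1, x_s2):
    return np.linalg.norm(x_e - x_s1) - np.linalg.norm(x_e - x_s2)

def quadratic_form(x_e, meas, s1_track, s2_track, sigma_r=200.0):
    """Negative log-likelihood (up to constants) over the whole dataset."""
    return sum((z - h_range(x_e, a, b))**2 / sigma_r**2
               for z, a, b in zip(meas, s1_track, s2_track))

def ml_grid_search(meas, s1_track, s2_track, xs, ys):
    """Brute-force minimizer standing in for the Nelder-Mead simplex."""
    best, best_q = None, np.inf
    for x in xs:
        for y in ys:
            q = quadratic_form(np.array([x, y]), meas, s1_track, s2_track)
            if q < best_q:
                best, best_q = np.array([x, y]), q
    return best

# Toy scenario: emitter at (12, 15) km, sensors moving east along y = 0.
rng = np.random.default_rng(1)
emitter = np.array([12000.0, 15000.0])
k = np.arange(50.0)
s1_track = np.column_stack([1000.0 + 100.0 * k, np.zeros(50)])
s2_track = np.column_stack([6000.0 + 100.0 * k, np.zeros(50)])
meas = [h_range(emitter, a, b) + rng.normal(0.0, 200.0)
        for a, b in zip(s1_track, s2_track)]

grid = np.arange(0.0, 25000.0, 500.0)
est = ml_grid_search(meas, s1_track, s2_track, grid, grid)
```

Like the MLE of the text, the search evaluates the complete measurement dataset for each candidate estimate, which is what makes it computationally expensive.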
• The Extended Kalman filter (EKF) approximates the nonlinear measurement equation
by its first-order Taylor series expansion:
Hk = (xk − xk^(1))^T / ||xk − xk^(1)|| − (xk − xk^(2))^T / ||xk − xk^(2)||.   (9)
Then, the Kalman filter equations are applied. The EKF is highly sensitive to the ini-
tialization and works only if the initial value is near the true target position. The EKF
may not reach the CRLB even in the case of a good initialization. Initial values are cho-
sen from a parametric approach similar to the approach described in (Mušicki & Koch,
2008): the first measurement is used for initialization. It defines a hyperbola of possible emitter locations, from which several points are taken. These points initialize an ML estimate which evaluates a sequence of first measurements. The best result provides the initial value for the EKF and the UKF. The computational efforts shown in Table 1 include this phase of initialization.
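A single EKF measurement update built on the linearization of Equation (9) might look as follows (a sketch for the stationary-emitter case, so the prediction step is trivial; geometry, noise and the initial guess are made up):

```python
import numpy as np

def h_range(x_e, x_s1, x_s2):
    return np.linalg.norm(x_e - x_s1) - np.linalg.norm(x_e - x_s2)

def jacobian(x_e, x_s1, x_s2):
    """H_k of Equation (9): difference of the unit vectors toward the sensors."""
    d1, d2 = x_e - x_s1, x_e - x_s2
    return (d1 / np.linalg.norm(d1) - d2 / np.linalg.norm(d2)).reshape(1, 2)

def ekf_update(x, P, z, x_s1, x_s2, sigma_r=200.0):
    """One scalar EKF update; the stationary emitter needs no prediction step."""
    H = jacobian(x, x_s1, x_s2)
    S = H @ P @ H.T + sigma_r**2        # innovation covariance (1x1)
    K = P @ H.T / S                     # Kalman gain (2x1)
    x = x + (K * (z - h_range(x, x_s1, x_s2))).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

rng = np.random.default_rng(2)
emitter = np.array([12000.0, 15000.0])
x, P = np.array([10000.0, 13000.0]), np.eye(2) * 4.0e6   # init near the truth
for k in range(100):
    s1 = np.array([1000.0 + 100.0 * k, 0.0])
    s2 = np.array([6000.0 + 100.0 * k, 0.0])
    z = h_range(emitter, s1, s2) + rng.normal(0.0, 200.0)
    x, P = ekf_update(x, P, z, s1, s2)
```

The sensitivity to initialization noted in the text shows up quickly if the initial `x` is moved far from the emitter.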
• The Unscented Kalman filter (UKF) (see (Julier & Uhlmann, 2004)) uses the Gaus-
sian representation of the posterior density via a set of deterministically chosen sample
points. These sample points are propagated through the Unscented Transform (UT).
Since the nonlinearity is in the measurement equation, the UT is applied in the update
step. Then the KF equations are carried out.
The initialization is the same as in the EKF. Poor initialization values result in divergent
tracks like in the EKF case.
Table 1. Computation time for 1000 Monte Carlo runs (scenario 1):
              EKF   UKF   MLE   GS
Time in sec    49    80   939   90
• The static Gaussian Mixture (GM) filter overcomes the initialization difficulties of Kalman filters such as the EKF and UKF. It approximates the posterior density by a Gaussian Mixture (GM) (Tam et al., 1999), a weighted sum of Gaussian density functions. The
computational effort of finding a good initialization point is omitted here. The first
measurement is converted into a Gaussian sum. The algorithmic procedure for compu-
tation of weights w g , means x g and covariances P g is the same as in (Mušicki & Koch,
2008). The mapping of the TDOA measurement into the Cartesian state space consists
of several steps:
– represent the ±σr hyperbolae in the state space,
– choose the same number of points on each hyperbola,
– inscribe an ellipse in the quadrangle formed by two points on the +σr and two points on the −σr hyperbola,
– the center of the ellipse is the mean, the ellipse the covariance and the square root
of the determinant the weight of the Gaussian summand.
An EKF is started for each mean and covariance; the weights are updated with the posterior probability. The final mean is computed as the weighted sum of the individual EKF means: x̄ = ∑_{g=1}^{n} wg xg, where n is the number of Gaussian terms.
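The final combination step can be sketched as follows (the component means and weights below are hypothetical, standing in for already-updated EKF outputs):

```python
import numpy as np

def gm_mean(weights, means):
    """Weighted sum of the individual EKF means: x_bar = sum_g w_g x_g."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # keep the mixture weights normalized
    return np.einsum('g,gi->i', w, np.asarray(means, dtype=float))

# Three hypothetical mixture components (means in m); the third is an
# unlikely hypothesis whose weight has decayed during the update steps.
means = [[11800.0, 14900.0], [12100.0, 15200.0], [9000.0, 4000.0]]
weights = [0.45, 0.50, 0.05]
x_bar = gm_mean(weights, means)
```

Because the outlier component carries little weight, it barely shifts the combined estimate away from the dominant hypotheses.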
The performance of these four estimation algorithms is investigated in two different tracking
scenarios. In the first scenario, the emitter at (15, 15) km lies in a well-locatable region. MLE
shows good performance. The results of EKF and UKF are shown in Figure 4. They perform
well and the NEES, see appendix A.2, lies in the 95% interval [1.878, 2.126] for both filters, as
can be seen from Figure 4 (b). For this scenario the static GM filter shows no improvement
compared to a single EKF or UKF.
Scenario 2 analyzes a parallel flight of the sensors. The CRLB for the emitter position at (10, 7) km indicates poor estimation accuracy. EKF and UKF have severe initialization problems; both produce a high number of diverging tracks. The MLE also suffers from divergence.
The initialization with a GM results in 9 simultaneously updated EKFs. The sampling from the
GM approximation of the first measurement is presented in Figure 5 (a). The black solid lines
are the ±σr hyperbolae. The sampling points are displayed in blue. They give an intelligent
approximation of the first measurement. In Figure 5 (b) the RMSE of the GM filter and the
MLE are plotted in comparison to the CRLB. In this scenario the GM filter, the bank of 9
EKFs, shows good performance. After an initial phase, it asymptotically approaches the CRLB.
The results of a single KF are unusable; they are higher than 10^5 m and, for better visibility, not presented.

[Figure 4 plots, over 200 s: (a) the RMSE in m of the EKF and UKF against the CRLB, and (b) the NEES of the EKF, UKF and GM for scenario 1.]
Fig. 4. (a) RMSE for EKF and UKF and (b) NEES for scenario 1

The MLE is initialized as described above and produces good results near the CRLB. Its performance is better than the performance of the GM filter. The CRLB is shown with initial assumptions.
[Figure 5 (a) shows the emitter e, the sensors s(1) and s(2) and the sampling points in a 30 km × 30 km area; (b) plots the RMSE in m of the ML and GM estimates against the CRLB over 200 s on a logarithmic scale.]
Fig. 5. (a) Sampling from the GM approximation and (b) RMSE for scenario 2
[Figure 6: the AOA line and the TDOA hyperbola for an emitter e observed by two sensors s(1) and s(2) in a 25 km × 25 km area; axes: km in east and north direction.]
hα,k = arctan( (xk − xk^(1)) / (yk − yk^(1)) ).   (10)

Addition of white noise yields:

zα,k = hα,k + uα,k ,   uα,k ∼ N(0, σα²).
Thus, at each time step a pair of one azimuth and one TDOA measurement can be processed.
The azimuth measurement standard deviation is assumed to be 1 degree and the TDOA mea-
surement standard deviation is assumed to be 200 m in the range domain.
[Figure 7 shows a moving emitter, the stationary sensor s(2) and the moving sensor s(1) in a 6 km × 6 km area; axes: m in east and north direction.]
Fig. 7. Measurement situation
4.2.2 Results
[Figure 8: two panels plotting RMS errors in m over 0–180 s.]
The measurement pair and its associated measurement covariance R = diag[σα², σr²], where diag[·] denotes the diagonal matrix, is processed using the UT. That is, several sigma points in the two-dimensional measurement space are selected and transformed. We obtain an estimate of the emitter state in the Cartesian state space and an associated covariance. A linear Kalman filter is started with the position estimate and the associated covariance. The update is performed in the Cartesian state space by transforming the incoming measurement pair using the unscented transform. This filter uses, as the emitter dynamics model, the model of an inertially moving target. This model does not describe the emitter dynamics correctly, but the addition of white Gaussian process noise can compensate for the model error.
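The unscented transform itself can be sketched generically: sigma points of the measurement-space Gaussian are pushed through a nonlinear map, and the transformed mean and covariance are recovered from the weighted results. The sketch below uses the basic UT weights with a free parameter kappa (here 0) and a made-up azimuth/range example; it is not the exact parameterization of the filter in the text:

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinearity f via the UT."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma = [mean] + [mean + L[:, i] for i in range(n)] \
                   + [mean - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(s) for s in sigma])   # transform each sigma point
    y_mean = w @ y
    d = y - y_mean
    return y_mean, (w[:, None] * d).T @ d

# Example: map an azimuth/range pair into a Cartesian position estimate.
mean = np.array([np.pi / 4.0, 10000.0])              # [azimuth (rad), range (m)]
cov = np.diag([np.radians(1.0)**2, 200.0**2])
to_xy = lambda m: np.array([m[1] * np.sin(m[0]), m[1] * np.cos(m[0])])
xy_mean, xy_cov = unscented_transform(mean, cov, to_xy)
```

For a linear map the UT reproduces mean and covariance exactly; for the polar-to-Cartesian map above it captures the curvature far better than a single-point linearization.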
In Figure 8 the results based on 1000 Monte Carlo runs are presented. Figure 8 (a) shows the comparison between the MLE based on azimuth measurements only and the MLE based on a combination of azimuth and TDOA measurements. The MLE delivers, for each Monte Carlo run, one 7-dimensional estimate of the emitter state, from which the resulting emitter trajectory is computed. The RMS error with respect to the true emitter trajectory is shown. Using the combined measurement set, the performance is significantly better than with the AOA-only results. Figure 8 (b) visualizes the results of the linear KF using the UT. At each time step, an estimate of the emitter state is computed. In spite of the insufficient dynamics model, the emitter state is estimated fairly well in the beginning. But due to the incorrect dynamics model, the localization accuracy at the end is only about 120 m. The MLE based on the combined measurement set shows better performance than the filter using the UT.
[Figure 9: the TDOA curve (hyperbola branch) and the FDOA curve for an emitter e and two sensors in a roughly 35 km × 35 km area, for three sensor headings: (a) tail flight, (b) parallel flight, (c) flight head on.]
hf = (ẋ^(1) − ẋ)^T · r^(1)/||r^(1)|| − (ẋ^(2) − ẋ)^T · r^(2)/||r^(2)||.   (19)
Under the assumption that the measurement noise is uncorrelated from time step to time step and uncorrelated with the TDOA measurements, we obtain the FDOA measurement equation in the velocity domain:

zf = hf + uf ,   uf ∼ N(0, σf²),   (20)
where σf is the standard deviation of the FDOA measurement. The associated TDOA/FDOA measurement pairs may be obtained by using the CAF (Stein, 1981). For each TDOA value the associated FDOA value can be calculated. Nonlinear estimation algorithms are needed to process the pairs of TDOA and FDOA measurements and to estimate the emitter state.
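Equation (19) translates directly into code; the geometry below is made up, with the emitter drifting west as in the scenario of Section 5.2.2:

```python
import numpy as np

def h_fdoa(x_e, v_e, sensor_pos, sensor_vel):
    """Range-rate difference of Equation (19); multiplying by f_c / c would
    convert it into a Doppler-frequency difference in Hz."""
    rates = []
    for p, v in zip(sensor_pos, sensor_vel):
        r = x_e - p                      # emitter position relative to sensor
        rates.append((v - v_e) @ (r / np.linalg.norm(r)))
    return rates[0] - rates[1]

emitter = np.array([15000.0, 15000.0])
v_emitter = np.array([-10.0, 0.0])       # slow constant drift in x-direction
sensor_pos = [np.array([0.0, 0.0]), np.array([5000.0, 0.0])]
sensor_vel = [np.array([100.0, 0.0]), np.array([100.0, 0.0])]

hf = h_fdoa(emitter, v_emitter, sensor_pos, sensor_vel)   # in m/s
```

Only the radial components of the relative velocities enter, which is why the information gain depends so strongly on the sensor headings shown in Figure 9.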
Figure 9 shows the situation for different sensor headings after taking one pair of TDOA and
FDOA measurements. The green curve, i.e. the branch of hyperbola, indicates the ambiguity
after the TDOA measurement. The ambiguity after the FDOA measurement is plotted in ma-
genta. The intersection of both curves represents a gain in information for the emitter location. This gain is very high if the sensors move one behind the other.
in Figure 10. A TDOA standard deviation of 200 m (0.67 µs) and an FDOA standard deviation of 4 m/s are assumed; the latter corresponds to a standard deviation of 40 Hz in the frequency domain, assuming a carrier frequency of about 3 GHz. The color bar shows the values of the localization accuracy in m. In these situations, the maximal gain in localization accuracy is obtained when the sensors fly one after the other. The results for the parallel flight can be improved if the distance between the sensors is increased.
[Figure 10 shows the CRLB in the plane for the sensor headings of Figure 9, including (c) flight head on; axes: km in east and north direction.]
Fig. 10. CRLB for the combination of TDOA and FDOA for one time scan
5.2.2 Results
Both TDOA and FDOA measurement equations are nonlinear. Therefore nonlinear estimation
algorithms are needed to process the combined measurement set. The performance of three
different estimation algorithms is investigated in a scenario with a moving emitter.
The investigated scenario and the results are described in (Mušicki et al., 2010). The emitter is assumed to move with a constant velocity of −10 m/s in the x-direction. For observability reasons, the sensors perform maneuvers; they move with a constant speed of 100 m/s, but not with constant velocity. The results shown here are the product of a Monte Carlo simulation with 1000 runs with a
[Figure 11 (a) shows the start positions and trajectories of the emitter and the sensors; (b) plots the RMSE in m over time for the legend entries TFDOA, static GM and TDOA against the CRLB.]
Fig. 11. (a) Scenario, (b) RMSE of the mobile emitter tracking (©[2010] IEEE)
6. Conclusions
Passive emitter tracking in sensor networks is in general superior to emitter tracking using single sensors. Even a pair of sensors improves the performance considerably. The techniques for solving the underlying tracking problem are the same as in the single sensor case. The first
step should be the investigation of the CRLB to know the optimal achievable estimation ac-
curacy using the available measurement set. It reveals characteristic features of localization
and gives an insight into the parametric dependencies of the passive emitter tracking problem
under consideration. It shows that the estimation accuracy is often strongly dependent on the
geometry. Secondly, a powerful estimation algorithm is needed to solve the localization prob-
lem. In passive emitter tracking, states and measurements are not linearly related. Therefore,
only methods that appropriately deal with nonlinearities can be used. This chapter provides
a review of different nonlinear estimation methods. Depending on the type of measurement
and on different requirements in various scenarios, different estimation algorithms can be the
methods of choice. E.g., to obtain good results near the CRLB the MLE is an appropriate
method. Here, the computational effort is higher compared to alternatives such as Kalman
filters. Tracking from the first measurement is possible using the UT in the TDOA/AOA case or using the GM filter or the GMM-ITS filter. These overcome the initialization difficulties of single Kalman filters. The UT transforms the measurement into the Cartesian state space, while the GM filter and the GMM-ITS filter approximate the first measurement by a Gaussian Mixture, a weighted sum of Gaussian densities: the first measurement is transformed into the Cartesian space and converted into a Gaussian sum. Tracking with the GM filter and the GMM-ITS filter shows good performance, with results near the CRLB.
For passive emitter tracking in sensor networks, different measurement types can be obtained by exploiting the signal coming from the target. Some of them can be taken by single sensors, e.g. bearing measurements. Others can only be obtained in a sensor network, where a minimum of two sensors is needed. The combination of different measurements leads to a significant gain in estimation accuracy.
7. References
Bar-Shalom, Y., Li, X. R. & Kirubarajan, T. (2001). Estimation with Applications to Tracking and
Navigation: Theory Algorithms and Software, Wiley & Sons.
Becker, K. (1992). An Efficient Method of Passive Emitter Location, IEEE Trans. Aerosp. Electron.
Syst. 28(4): 1091–1104.
Becker, K. (1993). Simple Linear Theory Approach to TMA Observability, IEEE Trans. Aerosp. Electron. Syst. 29(2): 575–578.
Becker, K. (1996). A General Approach to TMA Observability from Angle and Frequency Measurements, IEEE Trans. Aerosp. Electron. Syst. 32(1): 487–494.
Becker, K. (1999). Passive Localization of Frequency-Agile Radars from Angle and Frequency Measurements, IEEE Trans. Aerosp. Electron. Syst. 53(4): 1129–1144.
Becker, K. (2001). Advanced Signal Processing Handbook, chapter 9: Target Motion Analysis
(TMA), pp. 1–21.
Becker, K. (2005). Three-Dimensional Target Motion Analysis using Angle and Frequency
Measurements, IEEE Trans. Aerosp. Electron. Syst. 41(1): 284–301.
Julier, S. J. & Uhlmann, J. K. (2004). Unscented Filtering and Nonlinear Estimation, Proc. IEEE
92(3): 401–422.
Ho, K. C. & Yang, L. (2008). On the Use of a Calibration Emitter for Source Localization in the Presence of Sensor Position Uncertainty, IEEE Trans. on Signal Processing 56(12): 5758–5772.
Kaune, R. (2009). Gaussian Mixture (GM) Passive Localization using Time Difference of Ar-
rival (TDOA), Informatik 2009 — Workshop Sensor Data Fusion: Trends, Solutions, Appli-
cations.
Mušicki, D., Kaune, R. & Koch, W. (2010). Mobile Emitter Geolocation and Tracking Using TDOA and FDOA Measurements, IEEE Trans. on Signal Processing 58(3): 1863–1874.
Mušicki, D. & Koch, W. (2008). Geolocation using TDOA and FDOA measurements, Proc. 11th
International Conference on Information Fusion, pp. 1–8.
Oispuu, M. & Hörst, J. (2010). Azimuth-only Localization and Accuracy Study for Piecewise
Curvilinearly Moving Targets, International Conference on Information Fusion.
Ristic, B., Arulampalam, S. & Gordon, N. (2004). Beyond the Kalman Filter, Particle Filters for
Tracking Applications, Artech House.
So, H. C., Chan, Y. T. & Chan, F. K. W. (2008). Closed-Form Formulae for Time-Difference-of-Arrival Estimation, IEEE Trans. on Signal Processing 56(6): 2614–2620.
Stein, S. (1981). Algorithms for Ambiguity Function Processing, IEEE Trans. Acoustic, Speech
and Signal Processing 29(3): 588–599.
Tam, W. I., Plataniotis, K. N. & Hatzinakos, D. (1999). An adaptive Gaussian sum algorithm for radar tracking, Elsevier Signal Processing 77: 85–104.
Van Trees, H. L. (1968). Detection, Estimation and Modulation Theory, Part I, New York: Wiley &
Sons.
methods. The CRLB is calculated from the inverse of the Fisher Information Matrix (FIM) J.
The CR inequality reads:

E[(êk − ek)(êk − ek)^T] ≥ Jk^−1 ,   (28)

Jk = E[∇ek ln p(Zk|ek) (∇ek ln p(Zk|ek))^T] ,   (29)
where ê determines the estimate and E [·] determines the expectation value.
The Fisher information J uses the likelihood function, the conditional probability p(Z_k|e_k), for its calculation:

p(Z_k|e_k) = \frac{1}{\det(2\pi R)^{k/2}} \exp\left(-\frac{1}{2} \sum_{i=1}^{k} (z_i - h(e_i))^T R^{-1} (z_i - h(e_i))\right). (30)

Evaluating (29) with (30) yields

J_k = \sum_{i=1}^{k} \left(\frac{\partial h(e_i)}{\partial e_k}\right)^T R^{-1} \frac{\partial h(e_i)}{\partial e_k}, (31)

where

\frac{\partial h(e_i)}{\partial e_k} = \frac{\partial h(e_i)}{\partial e_i} \frac{\partial e_i}{\partial e_k}. (32)
For the stationary scenario the state vector e of the emitter is the same at each time step. That means:

\frac{\partial h(e_i)}{\partial e_i} = \frac{\partial h(e_i)}{\partial e_k} \quad \forall i. (33)
For the mobile emitter case we obtain, using the dynamic equation of inertial target motion,

e_k = F_{k|k-1} \, e_{k-1}, (34)

where F_{k|k-1} is the evolution matrix which relates the target state from time t_{k-1} to time t_k, the FIM at reference time t_k:

J_k = \sum_{i=1}^{k} F_{k|i}^{-T} \left(\frac{\partial h(e_i)}{\partial e_i}\right)^T R^{-1} \frac{\partial h(e_i)}{\partial e_i} \, F_{k|i}^{-1}. (35)
At time t_1 the FIM J_1 is usually singular and not invertible, because the state vector e_k cannot be estimated from a single measurement without additional assumptions. Thus, we incorporate additional assumptions, which may concern the area in which the emitter is expected to be. This prior information, in the form of a prior distribution of e, can be added to the FIM at time t_1 as an artificial measurement:

J_1^{pr} = J_1 + J_{pr}, (36)

where J_{pr} is the prior Fisher information. Under a Gaussian assumption on e it follows that:

J_{pr} = P_{pr}^{-1}. (37)
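The FIM accumulation of Eqs. (31) and (36) can be sketched numerically. The sensor geometry, the TDOA-style Jacobians and the measurement variance below are illustrative assumptions, not values from the chapter:

```python
import numpy as np

def fim_stationary(jacobians, R):
    """Accumulate J_k = sum_i H_i^T R^{-1} H_i over all measurements (Eq. (31))."""
    R_inv = np.linalg.inv(R)
    J = np.zeros((jacobians[0].shape[1],) * 2)
    for H in jacobians:
        J += H.T @ R_inv @ H
    return J

def crlb(J, J_prior=None):
    """CRLB = (J + J_pr)^{-1}; a prior FIM regularises a singular J (Eq. (36))."""
    if J_prior is not None:
        J = J + J_prior
    return np.linalg.inv(J)

# Illustrative range-difference Jacobians (differences of unit line-of-sight
# vectors) for a stationary 2-D emitter and three hypothetical sensors
emitter = np.array([1000.0, 2000.0])
sensors = [np.array([0.0, 0.0]), np.array([500.0, 0.0]), np.array([0.0, 500.0])]
u = [(emitter - s) / np.linalg.norm(emitter - s) for s in sensors]
H_list = [np.atleast_2d(u[0] - u[1]), np.atleast_2d(u[0] - u[2])]

R = np.array([[25.0]])                   # single-pair measurement variance
J = fim_stationary(H_list, R)
P = crlb(J, J_prior=np.eye(2) * 1e-8)    # weak prior keeps J invertible
```

The diagonal of P lower-bounds the position error variances of any unbiased estimator in this hypothetical geometry.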
On passive emitter tracking in sensor networks 315
A.2 NEES
Consistency is necessary for filter functionality; thus the normalized estimation error squared (NEES) is investigated, see (Bar-Shalom et al., 2001). A consistent estimator adequately describes the size of the estimation error by its associated covariance matrix. Filter consistency is necessary for the practical applicability of a filter.
The computation of the NEES requires the state estimate e_{k|k} at time t_k, its associated covariance matrix P_{k|k} and the true state e_k.
Let \tilde{e}_{k|k} be the error of e_{k|k}: \tilde{e}_{k|k} := e_k - e_{k|k}. The NEES is defined as:

\epsilon_k = \tilde{e}_{k|k}^T P_{k|k}^{-1} \tilde{e}_{k|k}; (38)

thus, \epsilon_k is the squared estimation error \tilde{e}_{k|k}, normalized with its associated covariance P_{k|k}. Under the assumption that the estimation error is approximately Gaussian distributed and the filter is consistent, \epsilon_k is \chi^2 distributed with n_e degrees of freedom, where n_e is the dimension of e: \epsilon_k \sim \chi^2_{n_e}. Then:

E[\epsilon_k] = n_e. (39)
The test is based on the results of N Monte Carlo simulations that provide N independent samples \epsilon_k^i, i = 1, \ldots, N, of the random variable \epsilon_k. The sample average of these N samples is

\bar{\epsilon}_k = \frac{1}{N} \sum_{i=1}^{N} \epsilon_k^i. (40)

If the filter is consistent, N \bar{\epsilon}_k will have a \chi^2 density with N n_e degrees of freedom.
The hypothesis H_0, that the state estimation errors are consistent with the filter-calculated covariances, is accepted if \bar{\epsilon}_k \in [a_1, a_2], where the acceptance interval is determined such that:

P\{\bar{\epsilon}_k \in [a_1, a_2] \,|\, H_0\} = 1 - \alpha. (41)

In this chapter we apply the 95% probability concentration region for \bar{\epsilon}_k, i.e. \alpha = 0.05.
In the TDOA scenario of a stationary emitter, the dimension n_e of the emitter state is 2, so the number of degrees of freedom for the NEES is equal to 2. The test is based on the results of N = 1000 Monte Carlo simulations, giving a total of 2000 degrees of freedom. With the values of the \chi^2 table, the interval [1.878, 2.126] is obtained as the two-sided acceptance interval for 2000 degrees of freedom.
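As a sketch (not code from the chapter), the NEES test of Eqs. (38)-(41) can be reproduced with synthetic estimation errors; the covariance P and all scenario numbers are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def nees(err, P):
    """Normalized estimation error squared for one run (Eq. (38))."""
    return float(err.T @ np.linalg.inv(P) @ err)

rng = np.random.default_rng(0)
n_e, N = 2, 1000
P = np.diag([4.0, 9.0])          # hypothetical filter-reported covariance
L = np.linalg.cholesky(P)

# Consistent case: errors actually drawn from N(0, P)
eps = [nees(L @ rng.standard_normal(n_e), P) for _ in range(N)]
eps_bar = np.mean(eps)

# Two-sided 95% acceptance interval for the sample-average NEES (Eq. (41))
a1 = chi2.ppf(0.025, N * n_e) / N
a2 = chi2.ppf(0.975, N * n_e) / N
consistent = a1 <= eps_bar <= a2
```

With N = 1000 and n_e = 2 the interval evaluates to approximately [1.878, 2.126], matching the values quoted above.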
A.3.1 MLE
The MLE is a direct search method which computes at each time step the optimal emitter state based on the complete measurement dataset. It stores the complete measurement dataset and thus belongs to the batch algorithms. The MLE provides that value of e_k which maximizes the likelihood function, the conditional probability density function (30). This means that the MLE minimizes the quadratic form:

g(e_k) = \sum_{i=1}^{k} (z_i - h(e_i))^T R^{-1} (z_i - h(e_i)) (42)

with respect to e_k. Since there is no closed-form MLE solution for e_k in passive emitter tracking using TDOA, FDOA and AOA, a numerical iterative search algorithm is needed to find the minimum of the quadratic form. Therefore, the application of the MLE suffers from the same problems as the numerical algorithms. The ML method attains the CRLB asymptotically when properly initialized. One disadvantage of the MLE is its high computational effort in comparison to the Kalman filters.
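The minimization of the quadratic form (42) can be sketched with a generic numerical optimizer. The range-difference measurement model h, the sensor geometry and all numbers below are illustrative assumptions, not the chapter's scenario:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sensor pairs producing range differences (TDOA times c)
sensors = [(np.array([0.0, 0.0]), np.array([800.0, 0.0])),
           (np.array([0.0, 0.0]), np.array([0.0, 800.0]))]

def h(e):
    """Stacked range differences for each sensor pair."""
    return np.array([np.linalg.norm(e - s1) - np.linalg.norm(e - s2)
                     for s1, s2 in sensors])

true_e = np.array([1200.0, 900.0])
R = np.eye(2) * 4.0
rng = np.random.default_rng(1)
z = h(true_e) + rng.multivariate_normal(np.zeros(2), R)

def g(e):
    """Quadratic form (42) for a single measurement epoch."""
    r = z - h(e)
    return float(r @ np.linalg.inv(R) @ r)

# Iterative numerical search; a good initial guess is essential
e_hat = minimize(g, x0=np.array([1000.0, 1000.0]), method="Nelder-Mead").x
```

The derivative-free Nelder-Mead search is only one possible choice; as noted above, any such iterative scheme inherits the initialization sensitivity of numerical algorithms.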
A.3.2 EKF
The Extended Kalman Filter (EKF) is a recursive Bayesian estimator which approximates the nonlinearities by linearization. The Bayes theorem, which expresses the posterior probability density function of the state based on all available measurement information, is used to obtain an optimal estimate of the state:

p(e_k|Z_k) = \frac{p(z_k|e_k) \, p(e_k|Z_{k-1})}{p(z_k|Z_{k-1})}, (43)

with p(z_k|Z_{k-1}) = \int p(z_k|e_k) \, p(e_k|Z_{k-1}) \, de_k.
The filter consists of two steps: prediction, using the dynamic equation, and update, using the Bayes theorem to process the incoming measurement. Processing a combination of two measurements is the same as filtering first with one measurement and then processing the result with the other measurement, as shown in (Mušicki et al., 2010).
In passive target tracking using TDOA, angle and FDOA measurements, the nonlinearity lies in the measurement equations. Thus, the EKF approximates the measurement equations by their first-order Taylor series expansions. Here, the TDOA and AOA measurement functions are differentiated with respect to the position coordinates and the FDOA measurement function is differentiated with respect to the position and velocity coordinates:
H_k^t = \frac{(r_k^{(1)})^T}{\|r_k^{(1)}\|} - \frac{(r_k^{(2)})^T}{\|r_k^{(2)}\|}, (44)

H_k^{\alpha} = \frac{1}{\|r_k^{(1)}\|^2} \begin{pmatrix} y_k - y_k^{(1)} \\ -(x_k - x_k^{(1)}) \end{pmatrix}^T, (45)

H_k^f = \begin{pmatrix} D_k^{(1)} - D_k^{(2)} \\[4pt] \dfrac{r_k^{(2)}}{\|r_k^{(2)}\|} - \dfrac{r_k^{(1)}}{\|r_k^{(1)}\|} \end{pmatrix}^T, (46)

where

D_k^{(i)} = \frac{1}{\|r_k^{(i)}\|} \left( (\dot{x}_k - \dot{x}_k^{(i)}) - \frac{(\dot{x}_k - \dot{x}_k^{(i)})^T r_k^{(i)}}{\|r_k^{(i)}\|^2} \, r_k^{(i)} \right), \quad i = 1, 2. (47)
Then the Kalman filter equations are applied. The EKF is highly sensitive to the initialization
and works satisfactorily only if the initial value is near the true target position.
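A single EKF measurement update with the TDOA Jacobian structure of Eq. (44) can be sketched as follows; the sensor positions, initial estimate and covariances are illustrative assumptions:

```python
import numpy as np

def tdoa_h(e, s1, s2):
    """Range-difference (TDOA times c) measurement function."""
    return np.linalg.norm(e[:2] - s1) - np.linalg.norm(e[:2] - s2)

def tdoa_jacobian(e, s1, s2):
    """H_k^t: difference of unit line-of-sight vectors (Eq. (44))."""
    r1, r2 = e[:2] - s1, e[:2] - s2
    return (r1 / np.linalg.norm(r1) - r2 / np.linalg.norm(r2)).reshape(1, 2)

def ekf_update(e, P, z, s1, s2, R):
    H = tdoa_jacobian(e, s1, s2)
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    e_new = e + (K * (z - tdoa_h(e, s1, s2))).ravel()
    P_new = (np.eye(len(e)) - K @ H) @ P
    return e_new, P_new

s1, s2 = np.array([0.0, 0.0]), np.array([1000.0, 0.0])
e, P = np.array([900.0, 1100.0]), np.eye(2) * 1e4   # initial guess near truth
z = tdoa_h(np.array([1000.0, 1000.0]), s1, s2)      # noise-free measurement
e, P = ekf_update(e, P, z, s1, s2, R=np.array([[25.0]]))
```

Starting the update far from the true position would linearize h at a poor point, illustrating the initialization sensitivity noted above.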
A.3.3 UKF
The Unscented Kalman Filter (UKF), see (Julier & Uhlmann, 2004), deterministically selects a small number of sigma points, which are propagated through a nonlinear transformation. Since the nonlinearities in passive target tracking are in the measurement equations, the Unscented Transform (UT) is applied in the update step. In the state space, sample points and their weights are chosen deterministically; they represent the mean and covariance of the density. The sample points are propagated through the UT, which produces the sampling points in the measurement space. Furthermore, a covariance and a cross covariance are computed. Then the filter equations are applied.
Alternatively, the UT can be used to transform measurements into the state space. In this chapter, measurements of the two-dimensional measurement space of TDOA and azimuth measurements and their associated measurement covariances are converted into the Cartesian state space. A position estimate and the associated position covariance in the Cartesian state space are obtained.
The UT algorithm is very simple and easy to apply; no complex Jacobians must be calculated. A proper initialization, however, is essential for good results.
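The UT steps above can be sketched as follows. The scaling parameter kappa and the azimuth-style measurement function are illustrative choices, not the chapter's exact parameterisation:

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Deterministically chosen sample points and weights (2n + 1 of them)."""
    n = len(mean)
    S = np.linalg.cholesky((n + kappa) * cov)
    pts = [mean] + [mean + S[:, i] for i in range(n)] \
                 + [mean - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_transform(pts, w, f):
    """Propagate sigma points through f and recover mean and covariance."""
    y = np.array([f(p) for p in pts])
    mean = w @ y
    cov = sum(wi * np.outer(yi - mean, yi - mean) for wi, yi in zip(w, y))
    return mean, cov

# Propagate a position estimate through a bearing-type measurement function
f = lambda e: np.array([np.arctan2(e[0], e[1])])
m, C = unscented_transform(*sigma_points(np.array([1000.0, 2000.0]),
                                         np.eye(2) * 100.0), f=f)
```

No Jacobian of the bearing function is needed; the nonlinearity is captured entirely by propagating the sigma points.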
from which one can see that multiplying p(z_k|e_k) by any constant will not change the posterior density.
The approximation of the likelihood is performed in the state space and can be made as accurate as desired through the choice of the number of mixture components. The problem is to formulate an algorithmic procedure for the computation of weights, means and covariances. The number of components can increase exponentially over time.
We describe two types of GM filters, a dynamic GM filter and a static GM filter.
Dynamic GM filter
The dynamic GM filter represents both the measurement likelihood p(zk |ek ) and the state esti-
mate p(ek | Zk ) in the form of Gaussian mixtures in the state space. The algorithm is initialized
by approximating the likelihood function after the first measurement in the state space. This Gaussian mixture also provides a model of the state estimate. New incoming TDOA measurements are converted into a Gaussian mixture in the state space. Each component of the state estimate is updated by each measurement component to produce one component of the updated emitter state estimate pdf p(e_k|Z_k). This update process is linear and performed by a standard Kalman filter. The number of emitter state estimate components increases exponentially in time; therefore, their number must be controlled by techniques of pruning and merging.
For each time step the state estimate is obtained from the mean and the covariance:

\hat{e}_k = \sum_{g=1}^{S_k M_k} \xi(g) \, \hat{e}_{k|k}(g), (50)

P_{k|k} = \sum_{g=1}^{S_k M_k} \xi(g) \left[ P_{k|k}(g) + \hat{e}_{k|k}(g) \, \hat{e}_{k|k}^T(g) \right] - \hat{e}_{k|k} \, \hat{e}_{k|k}^T. (51)
The GM filter equations can be applied to all passive emitter tracking situations in this chapter; the measurement likelihoods must be represented by their Gaussian mixtures.
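The moment matching of Eqs. (50)-(51), which collapses a Gaussian mixture to a single mean and covariance, can be sketched as follows; the component weights, means and covariances are illustrative:

```python
import numpy as np

def gm_collapse(weights, means, covs):
    """Collapse a Gaussian mixture to one mean/covariance (Eqs. (50)-(51))."""
    w = np.asarray(weights) / np.sum(weights)      # normalized weights xi(g)
    e_hat = sum(wi * m for wi, m in zip(w, means))                 # Eq. (50)
    P = sum(wi * (C + np.outer(m, m)) for wi, m, C in zip(w, means, covs))
    return e_hat, P - np.outer(e_hat, e_hat)                       # Eq. (51)

means = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
e_hat, P = gm_collapse([0.5, 0.5], means, covs)    # e_hat = [2, 0]
```

Note that the spread-of-the-means term inflates the collapsed covariance beyond the individual component covariances, which is exactly what Eq. (51) accounts for.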
Static GM filter
The static GM filter represents the likelihood function p(z_1|e_1) after taking the first measurement; the representation in the state space is used. Using the Bayesian equation, this likelihood can be used to represent the posterior density p(e_1|z_1). The components of the Gaussian mixture are restricted to the number of components of this Gaussian sum representation. For each new incoming measurement an EKF is performed to update the posterior density.
The algorithmic procedure for the computation of weights w_g, means e_g and covariances P_g of the GM is the same as in the dynamic case. The first measurement is converted into a Gaussian sum; the computational effort of finding a good initialization point for a single KF is thus avoided. An EKF is started for each mean and covariance, and the weights are updated with the probabilities p(z|e). The filter output is the weighted sum of the individual estimates and covariances:

\hat{e}_k = \sum_{g=1}^{n} w(g) \, \hat{e}_{k|k}(g), (52)

P_{k|k} = \sum_{g=1}^{n} w(g) \left[ P_{k|k}(g) + \hat{e}_{k|k}(g) \, \hat{e}_{k|k}^T(g) \right] - \hat{e}_{k|k} \, \hat{e}_{k|k}^T. (53)
14
Fuzzy-Pattern-Classifier Based
Sensor Fusion for Machine Conditioning
Volker Lohweg and Uwe Mönks
Ostwestfalen-Lippe University of Applied Sciences, inIT – Institute Industrial IT,
Lemgo
Germany
1. Introduction
Sensor and information fusion has recently become a major topic, not only in traffic management, military, avionics, robotics, image processing and medical applications, but also, increasingly, in machine diagnosis and conditioning for complex production machines and process engineering. Several approaches for multi-sensor systems exist in the literature (e.g. Hall, 2001; Bossé, 2007).
In this chapter an approach for a Fuzzy-Pattern-Classifier sensor fusion model based on a general framework (e.g. Bocklisch, 1986; Eichhorn, 2000; Schlegel, 2004; Lohweg, 2004; Lohweg, 2006; Hempel, 2008; Herbst, 2008; Mönks, 2009; Hempel, 2010) is described. The fusion method is applied to printing machines; an application to quality inspection and machine conditioning in the area of banknote production is highlighted.
The inspection of banknotes is a highly labour-intensive process, where traditionally every note on every sheet is inspected manually. Machines for the automatic inspection and authentication of banknotes have been on the market for the past 10 to 12 years, but recent developments in technology have enabled a new generation of detectors and machines to be developed. However, as more and more print techniques and new security features are established, total quality and security in banknote printing as well as proper machine condition must be assured (Brown, 2004). This necessitates a broader sensorial concept in general. Such systems can be used to enhance the stability of inspection and condition results for user convenience while improving machine reliability.
During printed product manufacturing, measures are typically taken to ensure a certain
level of printing quality. This is particularly true in the field of security printing, where the
quality standards, which must be reached by the end-products, i.e. banknotes, security
documents and the like, are very high. Quality inspection of printed products is
conventionally limited to the optical inspection of the printed product. Such optical
inspection can be performed as an off-line process, i.e. after the printed product has been
processed in the printing press, or, more frequently, as an in-line process, i.e. on the printing
press, where the printing operation is carried out. Usually only the existence or appearance of colours and their textures is checked by an optical inspection system.
In general, those uni-modal systems have difficulties in detecting slowly developing degradation errors over time (Ross, 2006; Lohweg, 2006). Experienced printing press operators may be capable
of identifying degradation or deviation in the printing press behaviour, which could lead to
the occurrence of printing errors, for instance characteristic noise produced by the printing
press. This ability is however highly dependent on the actual experience, know-how and
attentiveness of the technical personnel operating the printing press. Furthermore, the
ability to detect such changes in the printing press behaviour is intrinsically dependent on
personnel fluctuations, such as staff reorganisation, departure or retirement of key
personnel, etc. Moreover, as this technical expertise is human-based, there is a high risk that
this knowledge is lost over time. The only available remedy is to organize secure storage of
the relevant technical knowledge in one form or another and appropriate training of the
technical personnel.
Obviously, there is need for an improved inspection system which is not merely restricted to
the optical inspection of the printed end-product, but which can take other factors into
account than optical quality criteria. A general aim is to improve the known inspection
techniques and propose an inspection methodology that can ensure a comprehensive
quality control of the printed substrates processed by printing presses, especially printing
presses which are designed to process substrates used in the course of the production of
banknotes, security documents and such like.
Additionally, a second aim is to propose a method, which is suited to be implemented as an
expert system designed to facilitate operation of the printing press. In this context, it is
particularly desired to propose a methodology, which is implemented in an expert system
adapted to predict the occurrence of printing errors and machine condition and provide an
explanation of the likely cause of errors, should these occur. An adaptive learning model for both conditioning and inspection methods, based on sensor fusion and fuzzy interpretation of data measures, is presented here.
The performance Perf of the fused system should be higher than the performance of the two mono-sensory systems, or at least it should be ensured that

Perf(S_1 \oplus S_2) \geq \max\left(Perf(S_1), Perf(S_2)\right),

where S_1 \oplus S_2 denotes the fused system. The fusion process incorporates performance, effectiveness and benefit. With the fusion of different sources, the perceptual capacity and plausibility of a combined result should be increased. It should be pointed out that the above-mentioned terms are not strictly defined as such; moreover, they depend on the specific application, as pointed out by Wald (Wald, 1999):
“Information fusion expresses the means and the tools for the alliance of data origination from
different sources; it aims to obtain information of greater quality, the exact definition of greater
quality will depend on the application.”
The World Model (Luo, 1989) describes the fusion process in terms of a changing environment (cf. Fig. 1). The environment acts on the system, which controls (via weighting factors A_i) a local fusion process based on different sensors S_i. On the basis of sensor models and the behaviour state of the sensors it is possible to predict the statistical characteristics of the environment. In terms of the World Model, the environment stands for a general (printing) production machine. In the best case, the fusion process generates plausible and confident information which is necessary and sufficient for a stable decision.
Fig. 1. World Model flow chart for multi-sensor information fusion (Luo, 1989)
Insofar, a fusion process must create a small amount of data which yields reliable knowledge. The main problems in sensor fusion can usually be described as follows: too much data, poor models, bad features or too many features, and applications that are not analysed properly. One major misbelief is that machine diagnosis can be handled based on the generated data alone: knowledge about the technical, physical, chemical, or other processes involved is indispensable for modelling a multi-sensor system.
Over the last decade many researchers and practitioners have worked on effective multi-sensor fusion systems in many different areas. However, it has to be emphasized that some "golden rules" have formed which should be considered when a multi-sensor fusion system is researched and developed. Among the first to suggest such rules ("dirty secrets") in military applications were Hall and Steinberg (Hall, 2001a). Following their "dirty secrets" list, ten rules for automation systems are stated here as general guidelines.
1. The system designers have to understand the production machine, automation
system, etc. regarding its specific behaviour. Furthermore, the physical, chemical,
biological and other effects must be conceived in detail.
2. Before designing a fusion system, the technical data in a machine must be
measured to clarify which kind of sensor must be applied.
3. A human expert who can interpret measurement results is a must.
4. There is no substitute for an excellent or at least a good sensor. No amount of data
from a not understood or not reliable data source can substitute a single accurate
sensor that measures the effect that is to be observed.
5. Upstream sins still cannot be absolved by downstream processing. Data fusion
processing cannot correct for errors in the pre-processing (or a wrong applied sensor)
of individual data. “Soft” sensors are only useful if the data is known as reliable.
6. Not only may the fused result be worse than that of the best sensor, but failure to address pedigree, information overload, and uncertainty may yield an even worse result.
7. There is no such thing as a magic fusion algorithm. Despite claims to the contrary, no algorithm is optimal under all conditions. Even with the use of agent systems, ontologies, Dempster-Shafer and neuro-fuzzy approaches, just to name a few, the perfect algorithm has not been invented yet. In the end, the application decides which algorithms are necessary.
8. The data are never perfectly de-correlated. Sources are in most cases statistically
dependent.
9. There will never be enough training data available in a production machine.
Therefore, hybrid methods based on models and training data should be used to
apply Machine Learning and Pattern Recognition.
10. Data fusion is not a static process. Fusion algorithms must be designed so that the
time aspect has to be considered.
In practice, uni-modal inspection systems are subject to the following limitations:
1. Raw data noise: Noisy data result from insufficiently mounted or improperly maintained sensors. Illumination units which are not properly maintained can also cause trouble; in general, machine drives and motors can couple different kinds of noise into the system.
2. Intraclass variations: These variations are typically caused by changing the sensory
units in a maintenance process or by ageing of illuminations and sensors over a
period of time.
3. Interclass variations: In a system which has to handle a variety of different
production states over a period of time, there may be interclass similarities in the
feature space of multiple flaws.
4. Nonuniversality: A system may not be able to derive meaningful and stable data or features from a subset of the produced material.
Some of the above mentioned limitations can be overcome by including multiple
information sources. Such systems, known as multimodal systems, are expected to be more
reliable, due to the presence of multiple, partly signal-decorrelated, sensors. They address
the problems of nonuniversality, and in combination with meaningful interconnection of
signals (fusion), the problem of interclass variations. At least, they can inform the user about
problems with intraclass variations and noise.
A generic multi-sensor system consists of four important units: a) the sensor unit which
captures raw data from different measurement modules resp. sensors; b) the feature
extraction unit which extracts an appropriate feature set as a representation for the machine
to be checked; c) the classification unit which compares the actual data with their
corresponding machine data stored in a database; d) the decision unit which uses the
classification results to determine whether the obtained results represent e.g. a good printed
or valid banknote. In multimodal systems information fusion can occur in any of the units.
Generally, three fusion types are possible, depending on the abstraction level. The higher the abstraction level, the more efficient the fusion. However, fusion at a high abstraction level is not necessarily more effective, because data reduction methods are used and information loss will therefore occur (Beyerer, 2006).
1. Signal level fusion – Sensor Association Principle. At signal level all sensor signals are
combined. It is necessary that the signals are comparable in a sense of data amount
resp. sampling rate (adaption), registration, and time synchronisation.
2. Feature level fusion – Feature Association Principle. At feature level all signal
descriptors (features) are combined. This is necessary if the signals are not
comparable or complementary in a sense of data amount resp. sampling rate
(adaption), registration, and time synchronisation. Usually this is the case if images
and 1D-sensors are in use. There is no spatio-temporal coherence between the
sensor signals.
3. Symbol level fusion – Symbol Association Principle. At symbol level all classification results are combined. In this case the reasoning (the decision) is based e.g. on probability or fuzzy membership functions (possibility functions). This is necessary if the signals are not comparable or complementary in the sense of data amount resp. sampling rate (adaption), registration, and synchronisation, or if experts' know-how has to be considered.
It is stated (Ross, 2006) that generic multimodal sensor systems which integrate information
by fusion at an early processing stage are usually more efficient than those systems which
perform fusion at a later stage. Since input signals or features contain more information
about the physical data than score values at the output of classifiers, fusion at signal or
feature level is expected to provide better results. In general, fusion at feature level is critical
under practical considerations, because the dimensionality of different feature sets may not
be compatible. Therefore, the classifiers have the task to adapt the different dimensionalities
onto a common feature space. Fusion in the decision unit is considered to be rigid, due to
the availability of limited information and dimensionality.
Fig. 2. Generic fusion architecture: sensor signals P_1, ..., P_n are preprocessed (e.g. by spectral transforms), classified by a fuzzy classifier and passed to the decision unit
5. Modelling by Fuzzy-Pattern-Classification
Fuzzy set theory, introduced first by Zadeh (Zadeh, 1965), is a framework which adds
uncertainty as an additional feature to aggregation and classification of data. Accepting
vagueness as a key idea in signal measurement and human information processing, fuzzy
membership functions are a suitable basis for modelling information fusion and
classification. An advantage in a fuzzy set approach is that class memberships can be trained
by measured information while simultaneously expert’s know-how can be taken into
account (Bocklisch, 1986).
Fuzzy-Pattern-Classification techniques are used in order to implement the machine
behaviour analysis. In other words, sets of fuzzy-logic rules are applied to characterize the
behaviours of the printing press and model the various classes of printing errors which are
likely to appear on the printing press. Once these fuzzy-logic rules have been defined, they
can be applied to monitor the behaviour of the printing press and identify a possible
correspondence with any machine behaviour which leads or is likely to lead to the
328 Sensor Fusion and Its Applications
5.1 Modified-Fuzzy-Pattern-Classification
The Modified-Fuzzy-Pattern-Classifier (MFPC) is a hardware-optimized derivative of Bocklisch's Fuzzy-Pattern-Classifier (FPC) (Bocklisch, 1986). It is worth mentioning here that Hempel and Bocklisch (Hempel, 2010) showed that even non-convex classes can be modelled within the framework of Fuzzy-Pattern-Classification. The ongoing research on FPC for non-convex classes makes the framework attractive for Support Vector Machine (SVM) advocates.
Inspired by Eichhorn (Eichhorn, 2000), Lohweg et al. examined both the FPC and the MFPC in detail (Lohweg, 2004). The MFPC's general concept of simultaneously calculating a number of membership values and aggregating them can be valuably utilised in many approaches. The authors' intention, which led to the MFPC's optimized structure, was to create a pattern recognition system on a Field Programmable Gate Array (FPGA) which can be applied in high-speed industrial environments (Lohweg, 2009). As the MFPC is well suited for industrial implementations, it has already been applied in many applications (Lohweg, 2006; Lohweg, 2006a; Lohweg, 2009; Mönks, 2009; Niederhöfer, 2009).
Based on membership functions \mu(m, p), the MFPC is employed as a useful approach to modelling complex systems and classifying noisy data. The originally proposed unimodal MFPC fuzzy membership function \mu(m, p) is sketched in Fig. 3.
Fig. 3. Prototype of a unimodal membership function with centre m_0, boundary values B_r, B_f at distances C_r, C_f, and edge parameters D_r, D_f
\mu(m, p) = A \cdot 2^{-d(m, p)}, (3)

d(m, p) = \begin{cases} \left(\dfrac{1}{B_r} - 1\right) \left|\dfrac{m - m_0}{C_r}\right|^{D_r}, & m < m_0, \\[6pt] \left(\dfrac{1}{B_f} - 1\right) \left|\dfrac{m - m_0}{C_f}\right|^{D_f}, & m \geq m_0. \end{cases} (4)
As for Fig. 3, the potential function \mu(m, p) is a function of the amplitude A and the parameter vector p containing the coefficients m_0, B_r, B_f, C_r, C_f, D_r and D_f. A is denoted as the amplitude of this function and is usually set to A = 1 in hardware designs. The coefficient m_0 is the class centre (centre of gravity). The parameters B_r and B_f determine the value of the membership function at the boundaries m_0 - C_r and m_0 + C_f correspondingly; the rising and falling edges of the function are thus described by \mu(m_0 - C_r, p) = B_r and \mu(m_0 + C_f, p) = B_f. The distances from the centre of gravity are given by C_r and C_f. The parameters D_r and D_f depict the decrease in membership with increasing distance from the centre of gravity m_0. Suppose there are M features considered; then Eq. 3 can be reformulated as:

\mu(m, p) = 2^{-\frac{1}{M}\sum_{i=0}^{M-1} d_i(m_i, p_i)}, (5)

which, in the MFPC notation, reads

\mu_{MFPC}(m, p) = 2^{-\frac{1}{M}\sum_{i=0}^{M-1} d_i(m_i, p_i)}, (6)
where

d_i(m_i, p_i) = \left|\frac{m_i - m_{0,i}}{C_i}\right|^{D_i}, \quad m_{0,i} = \frac{1}{2}\left(m_{max_i} + m_{min_i}\right), \quad C_i = (1 + 2 P_{CE}) \cdot \frac{m_{max_i} - m_{min_i}}{2}. (7)

The parameters m_{max} and m_{min} are the maximum and minimum values of a feature in the training set. The parameter m_i is the input feature which is to be classified. Admittedly, similar objects should have similar feature values that are close to each other; in this sense, the resulting value of m_i - m_{0,i} ought to fall into a small interval, representing their similarity. The value P_{CE} is called elementary fuzziness, ranges from zero to one, and can be tuned by experts' know-how. The same applies to D = (2, 4, 8, \ldots). The aggregation is performed by a fuzzy averaging operation with a subsequent normalization procedure.
As an instance of the FPC, the MFPC has been successfully implemented in hardware on banknote sheet inspection machines. The MFPC utilizes the concept of membership functions from fuzzy set theory and is capable of classifying different objects (data) according to their features; the outputs of the membership functions serve as evidence for decision makers to make judgments. In industrial applications, much attention is paid to costs and other practical issues; thus the MFPC is of great importance, particularly because of its capability to model complex systems and its hardware implementability on FPGAs.
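The membership computation of Eqs. (5)-(7) can be sketched as follows. The training ranges and the query vector are illustrative; A = 1 and a common exponent D for all features are assumed, as in the text above:

```python
import numpy as np

def mfpc_params(train, p_ce=0.25, D=2):
    """Learn m0 and C per feature from training-set min/max (Eq. (7))."""
    m_min, m_max = train.min(axis=0), train.max(axis=0)
    m0 = 0.5 * (m_max + m_min)
    C = (1.0 + 2.0 * p_ce) * 0.5 * (m_max - m_min)
    return m0, C, D

def mfpc_membership(m, m0, C, D):
    """mu_MFPC = 2^{-(1/M) sum_i d_i} with d_i = |(m_i - m0_i)/C_i|^D."""
    d = np.abs((m - m0) / C) ** D
    return 2.0 ** (-np.mean(d))

# Hypothetical training set: 3 samples of M = 2 features
train = np.array([[1.0, 10.0], [3.0, 14.0], [2.0, 12.0]])
m0, C, D = mfpc_params(train)
score = mfpc_membership(np.array([2.0, 12.0]), m0, C, D)   # 1.0 at the centre
```

Raising p_ce widens C_i and thus softens the class boundary, which is the "elementary fuzziness" tuning knob mentioned above.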
Y = W \cdot X, (8)

where Y is the output matrix and W is the KLT transform matrix, applied to the data (input) matrix

X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & & & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{pmatrix}. (9)

Furthermore, the expectation value E(\cdot) (average \bar{x}) of the data vectors is needed:

\bar{x} = E(X) = \left(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_M\right)^T, \quad \text{where} \quad \bar{x}_i = E(x_i) = \frac{1}{N} \sum_{j=1}^{N} x_{ij}. (10)
The variables c_{ii} are called variances and the variables c_{ij} covariances of a data set; the correlation coefficients are denoted \rho_{ij}. Correlation is a measure of the relation between two or more variables. Correlation coefficients can range from -1 to +1: the value -1 represents a perfect negative correlation, the value +1 a perfect positive correlation, and the value 0 no correlation. In the next step the eigenvalues \lambda_i and the eigenvectors V of the correlation matrix are computed by Eq. 13, where diag(\lambda) is the diagonal matrix of eigenvalues of C:

\mathrm{diag}(\lambda) = V^{-1} R \, V. (13)

The eigenvectors generate the KLT matrix, and the eigenvalues represent the distribution of the source data's energy among the eigenvectors. The cumulative energy content for the p-th eigenvector is the sum of the energy content across all eigenvectors from 1 through p. The eigenvalues have to be sorted in decreasing order:

\Lambda = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_M \end{pmatrix}, \quad \text{where} \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M. (14)
The corresponding column vectors v_i of the matrix V also have to be sorted in decreasing order of the eigenvalues: v_1 is the first column of matrix V, v_2 the second, and v_M the last. The eigenvector v_1 corresponds to eigenvalue \lambda_1, eigenvector v_2 to eigenvalue \lambda_2, and so forth. The matrix W represents a subset of the column eigenvectors used as basis vectors. The subset should preferably be as small as possible (e.g. two eigenvectors); the energy distribution is a good indicator for choosing the number of eigenvectors. The cumulative energy should map approximately 90 % onto a low number of eigenvectors. The matrix Y (cf. Eq. 8) then represents the Karhunen-Loève transformed data (KLT) of matrix X (Lohweg, 2006a).
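The KLT steps of Eqs. (8)-(14) can be sketched end to end. The data matrix (M = 3 features, N = 200 samples, two of them strongly correlated) and the 90 % energy threshold from the text are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.standard_normal(200)
X = np.vstack([base,
               0.9 * base + 0.1 * rng.standard_normal(200),
               rng.standard_normal(200)])          # M x N data matrix (Eq. (9))

Xc = X - X.mean(axis=1, keepdims=True)             # subtract averages (Eq. (10))
Rcorr = np.corrcoef(Xc)                            # correlation matrix
lam, V = np.linalg.eigh(Rcorr)                     # eigendecomposition (Eq. (13))
order = np.argsort(lam)[::-1]                      # decreasing order (Eq. (14))
lam, V = lam[order], V[:, order]

energy = np.cumsum(lam) / np.sum(lam)              # cumulative energy content
k = int(np.searchsorted(energy, 0.90) + 1)         # smallest subset with ~90 %
W = V[:, :k].T                                     # subset of eigenvectors
Y = W @ Xc                                         # transformed data (Eq. (8))
```

Because two of the three features are nearly collinear, the first eigenvector alone carries roughly two thirds of the energy and two eigenvectors suffice for the 90 % criterion.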
A_{FPC} = \mathrm{diag}(\mu_i) = \begin{pmatrix} \mu_1(m_1, p_1) & 0 & \cdots & 0 \\ 0 & \mu_2(m_2, p_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_M(m_M, p_M) \end{pmatrix}. (15)
The adaptive fuzzy inference system (AFIS) is then described, with a length-M unit vector u = (1, \ldots, 1)^T and the attractor vector A = (A_1, A_2, \ldots, A_M)^T, as

\mu_{AFIS} = \frac{1}{A^T u} \, A^T \mathrm{diag}(\mu_i) \, u, (16)

which can be written as

\mu_{AFIS} = \frac{\sum_{i=1}^{M} A_i \cdot 2^{-d_i}}{\sum_{i=1}^{M} A_i}. (17)
The adaptive Fuzzy-Pattern-Classifier model output \mu_{AFIS} can be interpreted as a score value in the range 0 \ldots 1. If \mu_{AFIS} = 1, a perfect match is reached, which can be taken as a measure for a "good" system state, based on a set of sensor signals. The score value \mu_{AFIS} = 0 represents the overall "bad" decision for a certain trained model. As will be explained in section 6, the weight values of each parameter are taken as the components of eigenvector one (PC1) times the square root of the corresponding eigenvalue:

A_i = v_{1i} \sqrt{\lambda_1}. (18)

The resulting modified adaptive FPC output is then

\mu_{MAFPC} = \frac{\sum_{i=1}^{M} v_{1i} \sqrt{\lambda_1} \cdot 2^{-d_i}}{\sum_{i=1}^{M} v_{1i} \sqrt{\lambda_1}}. (19)
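The weighted aggregation of Eqs. (17)-(19) can be sketched as follows; the per-feature distances, the PC1 components and the eigenvalue are illustrative numbers:

```python
import numpy as np

def mu_afis(d, A):
    """Weighted fuzzy average of per-feature memberships 2^{-d_i} (Eq. (17))."""
    return float(np.sum(A * 2.0 ** (-d)) / np.sum(A))

d = np.array([0.1, 0.4, 0.9])          # per-feature distances d_i
v1 = np.array([0.8, 0.5, 0.3])         # first eigenvector (PC1) components
lam1 = 1.8                             # corresponding eigenvalue
A = v1 * np.sqrt(lam1)                 # attractor weights (Eq. (18))
score = mu_afis(d, A)                  # in (0, 1]; 1 means perfect match
```

Features with large PC1 loadings dominate the score, which is exactly the adaptation mechanism behind Eq. (19).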
functions by estimating the data set's probability distribution and deriving the function's parameters automatically from it. The resulting Probabilistic MFPC (PMFPC) membership function is based on the MFPC approach, but leaves only one degree of freedom, leading to a shorter learning time for obtaining stable and robust classification results (Mönks, 2010).
Before deriving the PMFPC formulation, recall that the membership functions are aggregated using a fuzzy averaging operator in the MFPC approach. Consequently, on the one hand the PMFPC membership functions can substitute the MFPC membership function; on the other hand, the fuzzy averaging operator used in the MFPC can be substituted by any other operator. It is also possible to substitute both parts of the MFPC at the same time (Mönks, 2010), and in all cases the application around the classifier remains unchanged. To allow the MFPC's core parts to be exchanged, its formulation in Eq. 6 is rewritten as
$$\mu_{MFPC}(m, p) = 2^{-\frac{1}{M} \sum_{i=1}^{M} d_i(m_i, p_i)} = \left( \prod_{i=1}^{M} 2^{-d_i(m_i, p_i)} \right)^{\frac{1}{M}}, \qquad (20)$$
revealing that the MFPC incorporates the geometric mean as its fuzzy averaging operator. Also, the unimodal membership function, as introduced in Eq. 3 with A = 1, is clearly isolated; it shall be replaced by the PMFPC membership function described in the following section.
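The equivalence of the two forms of Eq. 20 — the exponential of the mean exponent and the geometric mean of the per-feature memberships — can be checked numerically (a sketch; the function names are ours):

```python
import numpy as np

def mfpc(d):
    """MFPC output, Eq. 20: 2**(-mean(d)) over the exponents d_i."""
    d = np.asarray(d, dtype=float)
    return 2.0 ** (-d.mean())

def mfpc_geometric(d):
    """Equivalent form of Eq. 20: geometric mean of the 2**(-d_i)."""
    mu = 2.0 ** (-np.asarray(d, dtype=float))
    return float(np.prod(mu) ** (1.0 / mu.size))
```

Both functions return 1 when all exponents are zero, i.e. a perfect match on every feature.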
$$\mu(m, p) = 2^{-\operatorname{ld}\left(\frac{1}{B}\right) d(m, p)} \in [0, 1], \qquad (21)$$

where ld denotes the binary logarithm (logarithmus dualis).
D and B are automatically parameterised in the PMFPC approach. P_CE is not yet automated, to preserve the possibility of adjusting the membership function slightly without needing to learn the membership functions from scratch. The algorithms presented here for automatically parameterising D and B are inspired by former approaches: Bocklisch as well as Eichhorn developed algorithms which obtain a value for the (MFPC) potential function's parameter D automatically, based on the training data set used. Bocklisch also proposed an algorithm for the determination of B. For details we refer to (Bocklisch, 1987) and (Eichhorn, 2000). However, these algorithms yield parameters that do not fulfil the constraints connected with them in all practical cases (cf. (Mönks, 2010)). Hence, we propose a probability theory-based alternative, described in the following.
Bocklisch's and Eichhorn's algorithms adjust D after comparing the actual distribution of
objects to a perfect uniform distribution. However, the algorithms tend to change D for
every (small) difference between the actual distribution and a perfect uniform distribution.
This explains why both algorithms do not fulfil the constraints when applied to random
uniform distributions.
We actually stick to the idea of adjusting D with respect to the similarity of the actual
distribution compared to an artificial, ideal uniform distribution, but we use probability
theoretical concepts. Our algorithm basically works as follows: At first, the empirical
cumulative distribution function (ECDF) of the data set under investigation is determined.
Then, the ECDF of an artificial, perfect uniform distribution over the range of the actual distribution is determined, too. The similarity between both ECDFs is expressed by their correlation factor, which is subsequently mapped to D by a parameterisable function.
$$c = \frac{\sum_{i=1}^{k} \left( P_m(x_i) - \bar{P}_m \right) \left( P_u(x_i) - \bar{P}_u \right)}{\sqrt{\sum_{i=1}^{k} \left( P_m(x_i) - \bar{P}_m \right)^2} \sqrt{\sum_{i=1}^{k} \left( P_u(x_i) - \bar{P}_u \right)^2}}, \qquad (22)$$
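The correlation factor c of Eq. 22 between the data's ECDF and the ECDF of an ideal uniform distribution can be sketched as follows (our illustration; the function name and the evaluation of both ECDFs at the sorted sample points are assumptions):

```python
import numpy as np

def ecdf_uniform_correlation(x):
    """Correlation factor c of Eq. 22 between the empirical CDF of x
    and the ECDF of an ideal uniform distribution on [min(x), max(x)]."""
    x = np.sort(np.asarray(x, dtype=float))
    k = x.size
    Pm = np.arange(1, k + 1) / k          # ECDF of the data at x_i
    Pu = (x - x[0]) / (x[-1] - x[0])      # ideal uniform ECDF at x_i
    cm, cu = Pm - Pm.mean(), Pu - Pu.mean()
    return float(np.sum(cm * cu) / np.sqrt(np.sum(cm**2) * np.sum(cu**2)))
```

For data that is itself uniformly spread, c approaches 1; the less uniform the data, the smaller c becomes.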
$$\frac{\partial \mu}{\partial D}(x, D) = \frac{\partial}{\partial D}\, 2^{-x^D} = -\ln(2)\, 2^{-x^D}\, x^D \ln(x). \qquad (23)$$
The locations x represent the distance to the membership function’s mean value m_0; hence x = 0 is the mean value itself, x = 1 is the class boundary m_0 ± C, x = 2 twice the class boundary, and so on. The average influence of D on the membership function,

$$\frac{1}{x_r - x_l} \int_{x_l}^{x_r} \frac{\partial \mu}{\partial D}(x)\, dx,$$

is evaluated for −1 ≤ x ≤ 1: this interval bears the most valuable information, since all feature values of the objects in the training data set are included in this interval, and additionally those of the class members are expected here during the classification process, except for a typically negligible number of outliers. The mapping D : c → [2, 20], which is derived in the following, must take D’s average influence into consideration, which turns out to be exponentially decreasing (Mönks, 2010).
336 Sensor Fusion and Its Applications
$$\mu(m, p) = \begin{cases} 2^{-\operatorname{ld}\left(\frac{1}{B_r}\right) \left( \frac{|m - m_0|}{C_r} \right)^{D_r}}, & m \le m_0 \\[4pt] 2^{-\operatorname{ld}\left(\frac{1}{B_f}\right) \left( \frac{|m - m_0|}{C_f} \right)^{D_f}}, & m > m_0 \end{cases} \qquad (26)$$
where m_0 = (1/M) Σ_{i=1}^{M} m_i, m_i ∈ m, is the arithmetic mean of all feature values. If m_0 were computed as introduced in Eq. 7, the resulting membership function would not describe the underlying feature vector m appropriately for asymmetrical feature distributions. Due to the change to the asymmetrical formulation, a new computation method must therefore also be applied to C_r = m_0 − m_min + P_CE (m_max − m_min) and C_f = m_max − m_0 + P_CE (m_max − m_min).
To compute the remaining parameters, the feature vector must be split into the left-side feature vector m_r = (m_i | m_i ≤ m_0) and the right-side feature vector m_f = (m_i | m_i > m_0), for all m_i ∈ m. The parameters are determined following the algorithms presented in the preceding sections 5.3.1.2 and 5.3.1.3, but using only the feature vector of one side to compute that side’s respective parameter.
Using Eq. 26 as membership function, the Probabilistic Modified-Fuzzy-Pattern-Classifier is
defined as
$$\mu_{PMFPC}(m, p) = \left( \prod_{i=1}^{M} \mu_i(m_i, p_i) \right)^{\frac{1}{M}}, \qquad (27)$$

where each μ_i is the asymmetric membership function of Eq. 26,
keeping in mind that the geometric mean operator can be substituted by any other fuzzy averaging operator. An application is presented in section 6.2.
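A minimal sketch of the asymmetric PMFPC membership function and its geometric-mean fusion (our illustration; function names and the per-feature parameter packing are assumptions):

```python
import numpy as np

def pmfpc_membership(m, m0, Br, Cr, Dr, Bf, Cf, Df):
    """Asymmetric PMFPC membership function, cf. Eq. 26 (sketch).
    The r-parameters describe the rising (left) edge, the f-parameters
    the falling (right) edge of the membership function."""
    if m <= m0:
        B, C, D = Br, Cr, Dr
    else:
        B, C, D = Bf, Cf, Df
    # at |m - m0| = C the membership equals B; at m = m0 it equals 1
    return 2.0 ** (-np.log2(1.0 / B) * (abs(m - m0) / C) ** D)

def pmfpc(ms, params):
    """Eq. 27: geometric-mean fusion of the per-feature memberships;
    params holds one tuple (m0, Br, Cr, Dr, Bf, Cf, Df) per feature."""
    mus = [pmfpc_membership(m, *p) for m, p in zip(ms, params)]
    return float(np.prod(mus) ** (1.0 / len(mus)))
```

Note the boundary behaviour: the exponent −ld(1/B)(|m−m0|/C)^D yields exactly B at the class boundary, which is what makes B interpretable as the boundary membership.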
6. Applications
6.1 Machine Condition Monitoring
The approach presented in sections 4 and 5.1 was tested with an intaglio printing machine in a production process. Notably, print flaws were detected at an early stage by using multi-sensory measurements. In particular, one of the most common types of print flaws (Lohweg, 2006), caused by the wiping unit, was detected at a very early stage.
The following data are used for the model: machine speed, motor current, printing pressure side 1 (PPS1), printing pressure side 2 (PPS2), hydraulic pressure (drying blade), wiping solution flow, drying blade side 1 (DBS1), drying blade side 2 (DBS2), acoustic signal (vertical side 1), acoustic signal (horizontal side 1), acoustic signal (vertical side 2), and acoustic signal (horizontal side 2).
It has been mentioned that it might be desirable to preprocess some of the signals output by
the sensors which are used to monitor the behaviour of the machine. This is particularly true in connection with the sensing of noises and/or vibrations produced by the printing press, whose signals contain a great number of frequency components. The classical approach to processing such signals is to perform a spectral transformation of the signals. The usual
processing such signals is to perform a spectral transformation of the signals. The usual
spectral transformation is the well-known Fourier transform (and derivatives thereof) which
converts the signals from the time-domain into the frequency-domain. The processing of the
signals is made simpler by working in the thus obtained spectrum as periodic signal
components are readily identifiable in the frequency-domain as peaks in the spectrum. The
drawbacks of the Fourier transform, however, reside in its inability to efficiently identify
and isolate phase movements, shifts, drifts, echoes, noise, etc., in the signals. A more
adequate “spectral” analysis is the so-called “cepstrum” analysis. “Cepstrum” is an
anagram of “spectrum” and is the accepted terminology for the inverse Fourier transform of
the logarithm of the spectrum of a signal. Cepstrum analysis is in particular used for
analysing “sounds” instead of analysing frequencies (Bogert, 1963).
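The real cepstrum just described — the inverse Fourier transform of the logarithm of a signal's spectrum (Bogert, 1963) — can be sketched in a few lines (our illustration; the function name and the small epsilon guarding the logarithm are assumptions):

```python
import numpy as np

def real_cepstrum(signal, eps=1e-12):
    """Real cepstrum: inverse Fourier transform of the logarithm of
    the magnitude spectrum of a signal (sketch)."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + eps)  # eps guards against log(0)
    return np.real(np.fft.ifft(log_mag))
```

Periodic structures such as echoes in the signal appear as peaks in the cepstrum, which is what makes it attractive for analysing machine "sounds" rather than raw frequencies.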
A test was performed by measuring twelve different parameters of the printing machine’s condition while the machine was running (data collection) (Dyck, 2006). During this test the wiping pressure was decreased little by little, until the machine was printing only error sheets. The test was performed at a speed of 6,500 sheets per hour and a sample frequency of 7 kHz. During this test 797 sheets were printed, which means the set of data contained more than three million values per signal. In the first step, before calculating the KLT of the raw data, the mean value per sheet was calculated to reduce the amount of data to 797 values per signal. As already mentioned, 12 signals were measured; the four acoustical signals were then divided by cepstrum analysis into six new parameters, so that all in all 14
parameters built up the new input vectors of matrix X. As described above, at first the correlation matrix of the input data was calculated. Some parameters are highly correlated, e.g. PPS1 and PPS2 with a correlation factor of 0.9183, DBS1 and DBS2 with a correlation factor of 0.9421, and so forth. This already suggests that implementing the KLT will be effective in reducing the dimensions of the input data. The classifier model is shown in Fig. 4.
The KLT matrix is given by calculating the eigenvectors and eigenvalues of the correlation matrix, because the eigenvectors build up the transformation matrix. In Fig. 5 the calculated eigenvalues are presented: the variance contribution of the individual eigenvalues in percent is plotted on the ordinate versus the eigenvalue number on the abscissa. The first principal component already contributes almost 60 % of the total variance. Looking at the first seven principal components, which cover nearly 95 % of the total variance, shows that this transformation allows a reduction of important parameters for further use in classification without relevant loss of information. The following implementations focussed only on the first principal component, which represents the machine condition state best.
Fig. 4. The adaptive Fuzzy-Pattern-Classifier Model. The FPC is trained with 14 features,
while the fuzzy inference system is adapted by the PCA output. Mainly the first principal
component is applied.
PCA is not only a dimension-reducing technique, but also a technique for graphical representation of high-dimensional data. Graphical representation of the variables in two dimensions shows which parameters are correlated. The coordinates of a parameter are calculated by weighting the components of the eigenvectors with the square roots of the eigenvalues: the i-th parameter is represented as the point (v_{1i}√λ_1, v_{2i}√λ_2). This weighting is executed for normalisation.
Fig. 5. Eigenvalues (blue) and cumulated eigenvalues (red). The first principal component already contributes almost 60 % of the total normalized variance.
For the parameter “speed” of test B the coordinates are calculated as (v_{1,1}√λ_1, v_{2,1}√λ_2) = (0.24·√7.8, 0.14·√1.6) = (0.67, 0.18).
The x-axis represents the first principal component (PC1) and the y-axis represents the second
principal component (PC2). The values are always between zero and one. Zero means that the parameter’s effect on the machine condition state is close to zero; a value near one shows that the parameter has a strong effect on the machine condition state. Therefore, a good choice for adaptation is the usage of normalized PC1 components.
The acoustical operational parameters sensed by the multiple-sensor arrangement are first analysed with the cepstrum analysis prior to the principal component analysis (PCA). The cepstrum analysis supplies the signal representatives of vibrations or noises produced by the printing press, such as the characteristic noise or vibration patterns of intaglio
printing presses. Thereafter the new acoustical parameters and the remaining operational
parameters have to be fed into the PCA block to calculate corresponding eigenvalues and
eigenvectors. As explained above, the weight-values of each parameter are taken as the
weighted components of eigenvector one (PC1) times the square roots of the corresponding
eigenvalues. Each weight-value is used for weighting the output of a rule in the fuzzy
inference system (Fig. 4). E.g., the parameter “hydraulic pressure” receives the weight 0.05,
the parameter “PPS2” receives the weight 0.39, the parameter “Current” receives the weight
0.94 and so forth (Fig. 6). The sum of all weights in this test is 9.87. All 14 weights are fed
into the fuzzy inference system block (FIS).
Figure 7 shows the score value of test B. The threshold is set to 0.5, i.e. if the score value is equal to or larger than 0.5 the machine condition state is “good”; otherwise the condition state of the machine is “bad” and it is predictable that error sheets will be printed. Figure 7 also shows that the score value passes the threshold earlier than the image signals. That means the machine runs in a bad condition state before error sheets are printed.
Fig. 7. Score value representation for 797 printed sheets. The green curve represents the
classifier score value for wiping error detection, whilst the blue curve shows the results of
an optical inspection system. The score value 0.5 defines the threshold between “good” and
“bad” print.
The incorporated classifier uses both the MFPC and PMFPC membership functions as
introduced in section 5.3. Each membership function represents one of the 17 features
obtained from the images. All membership functions are learned based on the dedicated
training set consisting of 17 images per class. Their outputs, based on the respective feature
values of each of the 746 objects which were investigated, are subsequently fused through
aggregation using different averaging operators by using the classifier framework presented
in (Mönks, 2009). Here, the incorporated aggregation operators are Yager’s family of Ordered
Weighted Averaging (OWA) (Yager, 1988) and Larsen’s family of Andness-directed Importance
Weighting Averaging (AIWA) (Larsen, 2003) operators (applied unweighted here)—which
both can be adjusted in their andness degree—and additionally MFPC’s original geometric
mean (GM). We refer to (Yager, 1988) and (Larsen, 2003) for the definition of OWA and
AIWA operators. As a reference, the data set is also classified using a Support Vector Machine
(SVM) with a Gaussian radial basis function (RBF). Since SVMs are capable of
distinguishing between only two classes, the classification procedure is adjusted to pairwise
(or one-against-one) classification according to (Schölkopf, 2001). Our benchmarking
measure is the classification rate r = n/N, where n is the number of correctly classified objects and N the total number of objects that were evaluated. The best classification rates at
a given aggregation operator’s andness g are summarised in the following Table 2, where
the best classification rate per group is printed bold.
g        Operator   PMFPC            MFPC, D = 2      MFPC, D = 4      MFPC, D = 8      MFPC, D = 16
                    P_CE    r        P_CE    r        P_CE    r        P_CE    r        P_CE    r
0.5000   AIWA       0.255   93.70 %  0.370   84.58 %  0.355   87.67 %  0.310   92.36 %  0.290   92.90 %
0.5000   OWA        0.255   93.70 %  0.370   84.58 %  0.355   87.67 %  0.310   92.36 %  0.290   92.90 %
0.6000   AIWA       0.255   93.16 %  0.175   87.13 %  0.205   91.02 %  0.225   92.36 %  0.255   92.23 %
0.6000   OWA        0.255   93.57 %  0.355   84.58 %  0.365   88.47 %  0.320   92.63 %  0.275   92.76 %
0.6368   GM         0.950   84.45 %  0.155   81.77 %  0.445   82.17 %  0.755   82.44 %  1.000   82.44 %
0.6368   AIWA       0.245   91.42 %  0.135   85.52 %  0.185   90.08 %  0.270   89.81 %  0.315   89.95 %
0.6368   OWA        0.255   93.57 %  0.355   84.72 %  0.355   88.74 %  0.305   92.63 %  0.275   92.76 %
0.7000   AIWA       1.000   83.65 %  0.420   82.71 %  0.790   82.57 %  0.990   82.31 %  1.000   79.22 %
0.7000   OWA        0.280   93.57 %  0.280   84.85 %  0.310   89.01 %  0.315   92.76 %  0.275   92.63 %
Table 2. “OCR” classification rates r for each aggregation operator at andness degree g, with regard to the membership function parameters D and P_CE.
The best classification rates for the “OCR” data set are achieved when the PMFPC membership function is incorporated; they are more than 11 % better than the best rates using the original MFPC. The Support Vector Machine achieved a best classification rate of r = 95.04 % with an RBF kernel parameter of 5.640, which is 1.34 % or 10 objects better than the best PMFPC approach.
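For reference, two standard ingredients of this benchmark — the OWA operator (Yager, 1988) and the classification rate r = n/N — can be sketched as follows; the andness-directed construction of the OWA weight vector is not shown, and the function names are ours:

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Averaging (Yager, 1988): the weights are applied
    to the input values after sorting them in descending order."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), v))

def classification_rate(y_true, y_pred):
    """The benchmarking measure r = n / N used for Table 2."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

With weight vector (1, 0, …, 0) the OWA reduces to the maximum, with uniform weights to the arithmetic mean; intermediate weight vectors realise intermediate andness degrees.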
8. References
Beyerer, J.; Puente León, F.; Sommer, K.-D. (2006). Informationsfusion in der Mess- und Sensortechnik (Information Fusion in Measurement and Sensor Technology), Universitätsverlag Karlsruhe, 978-3-86644-053-1
Bezdek, J.C.; Keller, J.; Krisnapuram, R.; Pal, N. (2005). Fuzzy Models and Algorithms for
Pattern Recognition and Image Processing, The Handbook of Fuzzy Sets, Vol. 4,
Springer, 0-387-24515-4, New York
Bocklisch, S. F. & Priber, U. (1986). A parametric fuzzy classification concept, Proc.
International Workshop on Fuzzy Sets Applications, pp. 147–156, Akademie-
Verlag, Eisenach, Germany
Bocklisch, S.F. (1987). Prozeßanalyse mit unscharfen Verfahren, Verlag Technik, Berlin,
Germany
Bogert et al. (1963). The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-
autocovariance, Cross-Cepstrum, and Saphe Cracking, Proc. Symposium Time
Series Analysis, M. Rosenblatt (Ed.), pp. 209-243, Wiley and Sons, New York
Bossé, É.; Roy, J.; Wark, S. (2007). Concepts, models, and tools for information fusion, Artech
House, 1596930810, London, UK, Norwood, USA
Brown, S. (2004). Latest Developments in On and Off-line Inspection of Bank-Notes during
Production, Proceedings, IS&T/SPIE 16th Annual Symposium on Electronic
Imaging, Vol. 5310, pp. 46-51, 0277-786X, San Jose Convention Centre, CA, January
2004, SPIE, Bellingham, USA
Dujmović, J.J. & Larsen, H.L. (2007). Generalized conjunction/disjunction, In: International
Journal of Approximate Reasoning 46(3), pp. 423–446
Dyck, W. (2006). Principal Component Analysis for Printing Machines, Internal lab report,
Lemgo, 2006, private communications, unpublished
Eichhorn, K. (2000). Entwurf und Anwendung von ASICs für musterbasierte Fuzzy-
Klassifikationsverfahren (Design and Application of ASICs for pattern-based
Fuzzy-Classification), Ph.D. Thesis, Technical University Chemnitz, Germany
Hall, D. L. & Llinas, J. (2001). Multisensor Data Fusion, Second Edition - 2 Volume Set, CRC
Press, 0849323797, Boca Raton, USA
Hall, D. L. & Steinberg, A. (2001a). Dirty Secrets in Multisensor Data Fusion,
http://www.dtic.mil, last download 01/04/2010
Hempel, A.-J. & Bocklisch, S. F. (2008). Hierarchical Modelling of Data Inherent Structures Using Networks of Fuzzy Classifiers, Tenth International Conference on Computer
Modeling and Simulation, 2008. UKSIM 2008, pp. 230-235, April 2008, IEEE,
Piscataway, USA
Hempel, A.-J. & Bocklisch, S. F. (2010). Fuzzy Pattern Modelling of Data Inherent Structures Based on Aggregation of Data with Heterogeneous Fuzziness, In: Modelling, Simulation and Optimization, 978-953-307-048-3, February 2010, SciYo.com
Herbst, G. & Bocklisch, S.F. (2008). Classification of keystroke dynamics - a case study of
fuzzified discrete event handling, 9th International Workshop on Discrete Event
Systems 2008, WODES 2008 , pp.394-399, 28-30 May 2008, IEEE Piscataway, USA
Jolliffe, I.T. (2002). Principal Component Analysis, Springer, 0-387-95442-2, New York
Larsen, H.L. (2003). Efficient Andness-Directed Importance Weighted Averaging Operators.
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
11(Supplement-1) pp. 67–82
Liggins, M.E.; Hall, D. L.; Llinas, J. (2008). Handbook of Multisensor Data Fusion: Theory
and Practice (Electrical Engineering & Applied Signal Processing), CRC Press,
1420053086, Boca Raton, USA
Lohweg, V.; Diederichs, C.; Müller, D. (2004). Algorithms for Hardware-Based Pattern
Recognition, EURASIP Journal on Applied Signal Processing, Volume 2004
(January 2004) pp. 1912-1920, 1110-8657
Lohweg, V.; Dyck, W.; Schaede, J.; Türke, T. (2006a). Information Fusion Application On
Security Printing With Parametrical Fuzzy Classification, Fusion 2006-9th
International Conference on Information Fusion, Florence, Italy
Lohweg, V.; Li, R.; Türke, T.; Willeke, H.; Schaede, J. (2009). FPGA-based Multi-sensor Real
Time Machine Vision for Banknote Printing, Proceedings, IS&T/SPIE 21st Annual
Symposium on Electronic Imaging, Vol. 7251, No. 7251-28, 9780819475015, San Jose
Convention Centre, CA, January 2009, SPIE, Bellingham, USA
Lohweg, V.; Schaede, J.; Türke, T. (2006). Robust and Reliable Banknote Authentication and
Print Flaw Detection with Opto-Acoustical Sensor Fusion Methods, Proceedings,
IS&T/SPIE 18th Annual Symposium on Electronic Imaging, Vol. 6075, No. 6075-02,
0277-786X, San Jose Convention Centre, CA, January 2006, SPIE, Bellingham, USA
Luo, R.C. & Kay, M.G. (1989). Multisensor integration and fusion in intelligent systems, IEEE Transactions on Systems, Man and Cybernetics, vol. 19, no. 5, pp. 901-931, Sep/Oct 1989, IEEE, Piscataway, USA
Mönks, U.; Lohweg, V.; Larsen, H. L. (2009). Aggregation Operator Based Fuzzy Pattern
Classifier Design, Workshop Machine Learning in Real-Time Applications (MLRTA
09), Artificial Intelligence 2009, Paderborn, Germany
Mönks, U.; Petker, D.; Lohweg, V. (2010). Fuzzy-Pattern-Classifier Training with Small Data
Sets, In: Information Processing and Management of Uncertainty in Knowledge-
Based Systems, E. Hüllermeier, R. Kruse and F. Hoffmann (Ed.), Vol. 80, pp. 426 –
435, Springer, 978-3-642-14054-9, Heidelberg
15
Feature extraction: techniques for landmark based navigation system
1. Introduction
A robot is said to be fully autonomous if it is able to build a navigation map. The map is a representation of the robot's surroundings, modelled as 2D geometric features extracted from a proximity sensor such as a laser. It provides a succinct space description that is convenient for environment mapping via data association. In most cases these environments are not known a priori, hence maps need to be generated automatically. This makes feature-based SLAM algorithms attractive and a non-trivial problem. These maps play a pivotal role in robotics since they support various tasks such as mission planning and localization. For decades, the latter has received intense scrutiny from the robotics community. The emergence of the stochastic map, proposed by the seminal papers of (Smith et al., 1986; Moutarlier et al., 1989a; Moutarlier et al., 1989b & Smith et al., 1985), however, saw the birth of joint posterior estimation. This is the complex problem of jointly estimating the robot's pose and the map of the environment consistently (Williams S.B et al., 2000) and efficiently. The emergence of new sensor systems which can provide information at high rates, such as wheel encoders, laser scanners and sometimes cameras, made this possible. The problem has been researched under the name Simultaneous Localization and Mapping (SLAM) (Durrant-Whyte, H et al. 2006, Part I and II) from its inception. That is, to localize a mobile robot, geometric features/
landmarks (2D) are generated from a laser scanner by measuring the depth to obstacles. In an office-like setup, point features (from table legs), line features (walls) and corner features (formed by intersecting walls) make up a repeated, recognisable pattern in the laser data. These landmarks or features can be extracted and used for navigation purposes. A robot's perception of its position relative to these landmarks increases, improving its ability to accomplish a task. In SLAM, feature locations, robot pose estimates, as well as feature-to-robot-pose correlation statistics are stochastically maintained inside an Extended Kalman filter, increasing the complexity of the process (Thorpe & Durrant-Whyte, 2001). It is also important to note that, though a SLAM problem has the same attributes as estimation and tracking problems, it is not fully observable but detectable. This has a huge implication for the solution of the SLAM problem. Therefore, it is important to develop robust algorithms for extracting geometric features from sensor data to aid a robot navigation system.
2. Feature Extraction
Feature extraction forms the lower part of the two-layered procedure of feature detection. The top tier is the data segmentation process, which creates clusters of points deemed to originate from the same obstacle. It groups the measurements of a scan into several clusters according to the distances between consecutive scan points. These segments are then fed to the feature extraction algorithms, where features like corners or lines are considered. These features are well-defined entities which are recognisable and can be repeatedly detected.
In this paper, real laser data from the sensor onboard a robot is processed to extract corner-like features, common in most indoor environments. The robot used for this experiment, called Meer-Cat, was developed in house and is depicted in Figure 1 below.
Fig. 1. Meer-Cat mobile platform equipped with Sick laser scanner. The robot has an upright
board at the top used for tracking purposes via another laser sensor.
The threshold angle between the vectors, as well as the minimum allowable opposite distance c shown in figure 2b below, are set a priori. A corner is normally described by angles less than 120 degrees, while the separation distance is tightly related to the angular resolution of the laser rangefinder. The distance c is set to a very small value; computations greater than this value are passed as corners. If a corner is detected, an ‘inward’ search is conducted. This is done by checking for a corner angle violation/existence between the 2nd and 10th point, the 3rd and 9th, and so on, for a sample sector of 11 data points. This follows from the assumption that a linear fit can be performed on the vectors. The searching routine of this method already demands high computation speed, so the inward search will undoubtedly increase the complexity.
Fig. 2. (a) Sliding window technique. (b) How two vectors centred at the midpoint are derived if a corner is found. The terminal points are at the first and the eleventh point, given that the midpoint of the sector is point 6.
Using the above methods, one runs into the problem of mapping outliers as corners. This has huge implications for real-time implementation because the computation complexity of the SLAM process is quadratic in the number of landmarks mapped. The outliers or ‘ghost’ landmarks corrupt the EKF SLAM process.
$$y = mx + c, \qquad (2)$$

where c and m are the y-intercept and slope of a line, respectively. The shortcoming of this representation is that vertical lines require an infinite gradient m.
where P_i and P_f are respectively the Cartesian coordinates of the initial and end points of a line, while m and b are the parameters of the i-th line. A method proposed by [14] is used to search for the breaking point of a cluster, which occurs at the maximum perpendicular distance to a line. The process starts by connecting the first and last data points of a cluster by a straight line Ax + By + C = 0, where A = y_f − y_i, B = −(x_f − x_i) and C = −(B y_f + A x_f). Then, for all data points between the end points, the perpendicular distance is computed as

$$d_{\perp,k} = \frac{|A x_k + B y_k + C|}{\sqrt{A^2 + B^2}}. \qquad (4)$$
If the tolerance value is violated by d_{⊥,k}, then a break point is determined; this is done recursively until the point before last. The final step is to determine the straight line parameters: an orthogonal regression method (Mathpages 2010-04-23) is applied to determine the linear fit that minimizes the quadratic error. The process is graphically represented by the figure below.
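The recursive splitting step just described can be sketched as follows (our illustration; the function name, the index bookkeeping and the early exit for tiny clusters are assumptions):

```python
import numpy as np

def split_positions(points, tol):
    """Recursive splitting (sketch): return indices of break points at the
    maximum perpendicular distance to the chord, cf. Eqs. 2-4."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return []
    (xi, yi), (xf, yf) = points[0], points[-1]
    # chord through the first and last point: A*x + B*y + C = 0
    A, B = yf - yi, -(xf - xi)
    C = -(B * yf + A * xf)
    d = np.abs(A * points[:, 0] + B * points[:, 1] + C) / np.hypot(A, B)
    k = int(np.argmax(d[1:-1])) + 1          # candidate break point
    if d[k] <= tol:
        return []                            # cluster is already one line
    return (split_positions(points[: k + 1], tol) + [k]
            + [k + j for j in split_positions(points[k:], tol)])
```

For an L-shaped cluster the single break point lands at the corner; collinear clusters return no break points at all.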
To mitigate the infinite slope problem, a polar representation, or Hesse normal form, is used. In this method, each point in the Cartesian coordinate space adds a sinusoid in the (ρ, α) space, as shown in figure 5 below.
Fig. 5. Mapping between the Cartesian space and the polar space.
Here ρ ≥ 0 is the perpendicular distance of the line to the origin. The angle α is bounded by |α| ≤ π and is the angle between the x-axis and the normal of the line, as shown in figure 6 below.
Fig. 6. Fitting line parameters. d is the fitting error we wish to minimize. A line is expressed in polar coordinates (ρ, α); (x, y) are the Cartesian coordinates of a point on the line.
Using the above representation, the split-and-merge algorithm recursively subdivides the scan data into sets of collinear points, approximated as lines in the total least squares sense. The algorithm determines corners by two main computations: line extraction and the collection of endpoints as corners. Initially, the scanned data is clustered into sectors assumed to come from the same objects. The number of data points within a certain cluster, as well as an identification of that cluster, is stored. Clusters are then passed to a line fitting algorithm (Lu & Milios, 1994). When we perform a regression fit of a straight line to a set of (x, y) data points, we typically minimize the sum of squares of the "vertical" distances between the data points and the line (Mathpages 2010-04-23). The aim of the linear regression method here is therefore to minimize the mean squared error

$$d^2 = \sum_{i=1}^{N} \left( \rho - x_i \cos\alpha - y_i \sin\alpha \right)^2, \qquad (6)$$

where (x_i, y_i) are the input points in Cartesian coordinates. The solution for the line parameters can be found by taking the first derivatives of equation 6 above with respect to ρ and α respectively. We assume that
$$\frac{\partial d^2}{\partial \rho} = 0 \quad \text{and} \quad \frac{\partial d^2}{\partial \alpha} = 0. \qquad (7)$$
The line parameters can then be determined by

$$\tan(2\alpha) = \frac{-2 \sum (y_m - y_i)(x_m - x_i)}{\sum \left[ (y_m - y_i)^2 - (x_m - x_i)^2 \right]},$$
$$\alpha = 0.5\, \operatorname{atan2}\!\left( -2 \sum (y_m - y_i)(x_m - x_i),\; \sum \left[ (y_m - y_i)^2 - (x_m - x_i)^2 \right] \right). \qquad (8)$$
If we assume that the centroid is on the line, then ρ can be computed as ρ = x_m cos α + y_m sin α, with

$$x_m = \frac{1}{N} \sum_{i=1}^{N} x_i \quad \text{and} \quad y_m = \frac{1}{N} \sum_{i=1}^{N} y_i, \qquad (10)$$

where (x_m, y_m) are the Cartesian coordinates of the centroid, and N is the number of points in the sector scan we wish to fit line parameters to.
Fig. 7. Fitting lines to a laser scan. A line has more than four sample points.
During the line fitting process, further splitting positions within a cluster are determined by computing the perpendicular distance of each point to the fitted line, as shown in figure 6. A point where the perpendicular distance is greater than the tolerance value is marked as a candidate splitting position. The process is done iteratively until the whole cluster scan is made up of linear sections, as depicted in figure 7 above. The next procedure is the collection of endpoints, i.e. joining the points of lines closest to each other. This is how corner positions are determined by the split-and-merge algorithm. The figure below shows extracted corners defined at positions where two lines meet. These positions (corners) are marked in pink.
Fig. 8. Splitting positions taken as corners (pink marks), viewed from successive robot positions. The first and second extractions show 5 corners. Interestingly, in the second extraction a corner is noted at a new position; in SLAM, the map then has a total of 6 landmarks in the state vector instead of 5. The association algorithm will not associate the corners; hence a new feature is mapped, corrupting the map.
The split-and-merge corner detector brings up many possible corner locations. This has a high probability of corrupting the map because some corners are ‘ghosts’. There is also the issue of the computation burden brought about by the number of landmarks in the map. The standard EKF-SLAM requires time quadratic in the number of features in the map (Thrun, S et al. 2002). This computational burden restricts EKF-SLAM to medium-sized environments with no more than a few hundred features.
$$c^2 = a^2 + b^2 + 2ab\, f(\theta), \quad \text{where} \quad f(\theta) = \frac{c^2 - (a^2 + b^2)}{2ab}, \qquad (11)$$

and f(θ) is the minus cosine of the corner angle θ. The limits of the operating bounds for c can be inferred from the output of f(θ) at the corresponding bound angles; that is, f(θ) increases monotonically with the distance c. Acute angles give negative results because the square of c is less than the sum of the squares of a and b. Figure 9 below shows the angle-to-sides association as well as the corresponding f(θ) results as the angle grows from acuteness to obtuseness.
Fig. 9. The relation of the side lengths of a triangle as the angle increases. Using the minus
cosine function, an indirect relationship is deduced as the angle increases from acute to
obtuse.
The f(θ) function indirectly carries information about the minimum and maximum
allowable opposite distance. From experiment this was found to be within [−0.3436, 0.3515];
that is, any output within this region was considered a corner. For example, at a 90°
angle c² = a² + b², so the f(θ) function outputs zero. As the angle increases past 90°,
acuteness ends and obtuseness starts, and the relation between c² and a² + b² is reversed.
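The criterion of equation (11) can be sketched directly; the side lengths below are illustrative values chosen to show the three regimes (acute, right, obtuse):

```python
import math

def f_theta(a, b, c):
    """f(theta) = -cos(theta), from the law of cosines (equation 11):
    c**2 = a**2 + b**2 + 2*a*b*f(theta)."""
    return (c ** 2 - (a ** 2 + b ** 2)) / (2.0 * a * b)

# the three regimes discussed in the text
right = f_theta(1.0, 1.0, math.sqrt(2.0))   # 90 degrees -> 0
acute = f_theta(1.0, 1.0, 1.0)              # 60 degrees -> negative
obtuse = f_theta(1.0, 1.0, 1.9)             # near 180 degrees -> positive
```

A candidate is then accepted as a corner when f(θ) falls inside the experimentally determined bounds quoted in the text.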
The main aim of this algorithm is to distinguish legitimate corners from those that
are not (outliers). Corner algorithms using the sliding window technique are susceptible to
mapping outliers as corners. This can be shown pictorially by the figure below,
where θ is the change in angle as the algorithm checks consecutively for a corner angle
between points. That is, if there are 15 points in the window and the corner conditions are met,
the corner check process will be done. The procedure checks for corner condition violation/
acceptance between the 2nd & 14th, 3rd & 13th, and lastly between the 4th & 12th data points, as
portrayed in figure 10 above. If θ does not violate the pre-set condition, i.e. corner angles
of at most 120°, then a corner is noted. c is the opposite distance between checking points.
Because this parameter is set to very small values, almost all outlier corner angle checks will
pass the condition: the distances are normally larger than the set tolerance, hence meeting
the condition.
The algorithm we propose uses a simple and effective check: it shifts the mid-point and re-checks
the preset conditions. Figure 11 below shows how this is implemented.
Fig. 11. Shifting the mid-point to the next sample point (e.g. the 7th position for an 11-sample
window) within the window
As depicted in figure 11 above, the two corner angles are almost equal because the angular
resolution of the laser sensor is almost negligible. Hence, shifting the mid-point will give
almost the same corner angles, i.e. they will fall within the f(θ) bounds. Likewise, if a mid-point
coincides with the outlier position and the corner conditions are met, i.e. the θ and c
(or f(θ)) conditions are satisfied, the check procedure is evoked. Shifting a mid-point then gives
the results depicted in figure 12 below.
Fig. 12. If a mid-point is shifted to the next consecutive position, the point will almost
certainly be in line with the other points, forming an obtuse triangle.
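The mid-point-shift test can be sketched as follows. This is an illustrative sketch that assumes an odd-sized window whose first, middle and last points form the test triangle; the bounds are those quoted in the text, while the window geometry in the usage below is invented for illustration.

```python
import math

def f_theta(a, b, c):
    # f(theta) = -cos(theta), from the law of cosines
    return (c ** 2 - (a ** 2 + b ** 2)) / (2.0 * a * b)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def passes(first, mid, last, bounds=(-0.3436, 0.3515)):
    a, b, c = dist(first, mid), dist(mid, last), dist(first, last)
    return bounds[0] <= f_theta(a, b, c) <= bounds[1]

def confirm_corner(window, mid_idx, bounds=(-0.3436, 0.3515)):
    """Shift the mid-point one sample onward and re-test.  A legitimate
    corner keeps f(theta) inside the bounds; an outlier spike forms a
    near-straight (obtuse) triangle, f(theta) ~ 1, and fails."""
    first, last = window[0], window[-1]
    if not passes(first, window[mid_idx], last, bounds):
        return False
    return passes(first, window[mid_idx + 1], last, bounds)
```

A genuine corner at the window centre survives the shift, while a lone off-line spike that happens to pass the first angle test is rejected as a ghost.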
Evidently, the corner check procedure depicted above will violate the corner conditions. We
expect the angle to be close to 180° and the output of the f(θ) function to be almost 1, which
is outside the bounds set. Hence we disregard the corner finding at the mid-point as a ghost,
i.e. the mid-point coincides with an outlier point. The figure below shows an EKF-SLAM
process which uses the standard corner method and maps an outlier as a corner.
Fig. 13. Mapping outliers as corners largely due to the limiting bounds set. Most angle and
opposite distances pass the corner test bounds.
The pseudo code in the figure is able to distinguish outliers from legitimate corner positions.
This has a significant implication for real-time implementation, especially when one maps
large environments. EKF-SLAM's complexity is quadratic in the number of landmarks in the
map. If outliers are mapped, not only will they distort the map, they will also increase the
computational complexity. Using the proposed algorithm, outliers are identified and
discarded as ghost corners. The figure below shows the mapping results when the two
algorithms are used to map the same area.
Fig. 15. Comparison between the two algorithms (mapping the same area)
3. EKF-SLAM
The algorithms developed in the previous sections form part of the EKF-SLAM algorithm. In
this section we discuss the main parts of this process. The EKF-SLAM process consists of a
recursive, three-stage procedure comprising prediction, observation and update steps. The
EKF estimates the pose of the robot, made up of the position (x_r, y_r) and orientation θ_r,
together with the estimates of the positions of the N environmental features x_{f,i},
where i = 1…N, using observations from a sensor onboard the robot (Williams, S.B. et al.,
2001).
SLAM considers that all landmarks are stationary; hence the state transition model for the
i-th feature is given by:

x_{f,i}(k) = x_{f,i}(k−1) = x_{f,i}   (12)

It is important to note that the evolution model for features does not have any uncertainty
since the features are considered static.
be converted into a control input for use in the core navigation system. It would be a bad
idea to simply use a dead-reckoned odometry estimate as a direct measurement of state in a
Kalman Filter (Newman, P, 2006).
Fig. 16. Odometry alone is not ideal for position estimation because of the accumulation of
errors. The top left figure shows an ever increasing 2σ bound around the robot's position.
⊖x_1 = [ −x_1 cos θ_1 − y_1 sin θ_1
          x_1 sin θ_1 − y_1 cos θ_1
         −θ_1 ]   (17)
r_i = √( (x_i − x_r)² + (y_i − y_r)² )   (19)

θ_i = tan⁻¹( (y_i − y_r) / (x_i − x_r) ) − θ_r   (20)

where r_i and θ_i are the range and bearing respectively to the i-th feature in the environment
relative to the vehicle pose.
h(k) = [ r_i  θ_i ]ᵀ   (21)

The strength (covariance) of the observation noise is denoted R:

R = diag( σ_r², σ_θ² )   (22)
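The range-bearing observation model of equations (19) and (20) can be sketched as follows. The bearing normalisation step is an implementation detail not discussed in the text:

```python
import numpy as np

def observe(robot_pose, landmark):
    """Range-bearing observation h of a landmark (x_i, y_i) from the
    robot pose (x_r, y_r, theta_r); see equations (19)-(20)."""
    xr, yr, th = robot_pose
    dx, dy = landmark[0] - xr, landmark[1] - yr
    r = np.hypot(dx, dy)
    bearing = np.arctan2(dy, dx) - th
    # normalise bearing to (-pi, pi]
    bearing = (bearing + np.pi) % (2 * np.pi) - np.pi
    return np.array([r, bearing])
```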
[z₀, R₀] = GetLaserSensorMeasurement()
if (z₀ != 0)
    [z_k, R_k] = GetLaserSensorMeasurement()
    H_k = DoDataAssociation(x_{k|k−1}, P_{k|k−1}, z_k, R_k)
    [x_{k|k}, P_{k|k}] = EKF_Update(x_{k|k−1}, P_{k|k−1}, z_k, R_k, H_k)   {if a feature exists in the map}
    [x_{k|k}, P_{k|k}] = AugmentMap(x_{k|k−1}, P_{k|k−1}, z_k, R_k, H_k)   {if it is a new feature}
if (z_k == 0)
    [x_{k|k}, P_{k|k}] = [x_{k|k−1}, P_{k|k−1}]
end
end
Fig. 17. EKF-SLAM pseudo code
X_0^B = X_r^B = 0   (23)

P_0^B = P_r^B = 0   (24)
This avoids future states of the vehicle’s uncertainty reaching values below its initial
settings, since negative values make no sense. If at any time there is a need to compute the
vehicle location or the map feature with respect to any other reference, the appropriate
transformations can be applied. At any time, the map can also be transformed to use a
feature as base reference, again using the appropriate transformations (Castellanos, J.A et al.
2006).
X_r(k|k−1) = X_r(k−1|k−1) ⊕ u_o(k)   (25)

P_r(k|k−1) = J_1(X_r, u_o) P_r(k−1|k−1) J_1(X_r, u_o)ᵀ + J_2(X_r, u_o) U_o(k) J_2(X_r, u_o)ᵀ   (26)
J_1(X_r, u_o) is the Jacobian of equation (16) with respect to the robot pose X_r, and
J_2(X_r, u_o) is the Jacobian of equation (16) with respect to the control input u_o. Based on
equation (16), the above Jacobians are calculated as follows:
J_1(x_1, x_2) = ∂(x_1 ⊕ x_2) / ∂x_1   (27)

J_1(x_1, x_2) = [ 1   0   −x_2 sin θ_1 − y_2 cos θ_1
                  0   1    x_2 cos θ_1 − y_2 sin θ_1
                  0   0    1 ]   (28)

J_2(x_1, x_2) = ∂(x_1 ⊕ x_2) / ∂x_2   (29)

J_2(x_1, x_2) = [ cos θ_1   −sin θ_1   0
                  sin θ_1    cos θ_1   0
                  0          0         1 ]   (30)
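The prediction step with these Jacobians can be sketched as follows. The `compose` function is an assumed implementation of the pose composition operator of equation (25), not the authors' code:

```python
import numpy as np

def compose(x1, u):
    """Pose composition x1 (+) u used by the odometry model (eq. 25)."""
    x, y, th = x1
    dx, dy, dth = u
    return np.array([x + dx * np.cos(th) - dy * np.sin(th),
                     y + dx * np.sin(th) + dy * np.cos(th),
                     th + dth])

def predict(xr, Pr, u, U):
    """EKF-SLAM prediction, equations (25)-(30): J1 and J2 are the
    Jacobians of the composition w.r.t. the pose and the control input."""
    x, y, th = xr
    dx, dy, _ = u
    J1 = np.array([[1, 0, -dx * np.sin(th) - dy * np.cos(th)],
                   [0, 1,  dx * np.cos(th) - dy * np.sin(th)],
                   [0, 0, 1]])
    J2 = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1]])
    xr_pred = compose(xr, u)
    Pr_pred = J1 @ Pr @ J1.T + J2 @ U @ J2.T   # equation (26)
    return xr_pred, Pr_pred
```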
3.3.3 Observation
Assume that at a certain time k an onboard sensor makes measurements (range and
bearing) to m features in the environment. This can be represented as:

z_m(k) = [ z_1 … z_m ]ᵀ   (31)
3.3.4 Update
The update process is carried out iteratively at every k-th step of the filter. If at a given time
step no observations are available, then the best estimate at time k is simply the
prediction X(k|k−1). If an observation is made of an existing feature in the map, the
state estimate can now be updated using the optimal gain matrix W(k). This gain matrix
provides a weighted sum of the prediction and observation. It is computed using the
innovation covariance S(k), the state error covariance P(k|k−1) and the Jacobian of
the observation model (equation 18), H(k):

W(k) = P(k|k−1) Hᵀ(k) S⁻¹(k)   (32)

where S(k) is given by:

S(k) = H(k) P(k|k−1) Hᵀ(k) + R(k)   (33)

The state and covariance estimates are then updated as:

X(k|k) = X(k|k−1) + W(k) v(k)   (34)

P(k|k) = P(k|k−1) − W(k) S(k) Wᵀ(k)   (35)
The innovation v(k) is the discrepancy between the actual observation z(k) and the
predicted observation ẑ(k|k−1):

v(k) = z(k) − ẑ(k|k−1)   (36)

ẑ(k|k−1) = h( X_r(k|k−1), x_i, y_i )   (37)

X_r(k|k−1) is the predicted pose of the robot and (x_i, y_i) is the position of the observed
map feature.
The function y takes the current state X(k|k) and the observation of the new
feature z(k) as arguments and returns a new, longer state vector with the new feature at
its end (Newman, 2006):

X(k|k)* = y( X(k|k), z(k) )   (38)

X(k|k)* = [ X(k|k)
            x_r + r cos(θ_r + θ)
            y_r + r sin(θ_r + θ) ]   (39)

where the coordinates of the new feature are given by the function g:

g = [ x_r + r cos(θ_r + θ) ] = [ g_1 ]
    [ y_r + r sin(θ_r + θ) ]   [ g_2 ]   (40)

r and θ are the range and bearing to the new feature respectively; (x_r, y_r) and θ_r are the
estimated position and orientation of the robot at time k.
The augmented state vector containing both the state of the vehicle and the state of all
feature locations is denoted:

X(k|k)* = [ X_rᵀ(k)  x_{f,1}ᵀ  …  x_{f,N}ᵀ ]ᵀ   (41)
We also need to transform the covariance matrix P when adding a new feature. The
gradient of the new feature transformation is used for this purpose:

g = [ x_r + r cos(θ_r + θ) ] = [ g_1 ]
    [ y_r + r sin(θ_r + θ) ]   [ g_2 ]   (42)

The complete augmented state covariance matrix is then given by:

P(k|k)* = Y_{x,z} [ P(k|k)  0
                    0       R ] Y_{x,z}ᵀ   (43)

where Y_{x,z} is given by:

Y_{x,z} = [ I_{n×n}                       0_{n×2}
            [G_{Xr}  0_{2×(nstates−n)}]   G_z ]   (44)

where nstates and n are the lengths of the state and robot state vectors respectively.
G_{Xr} = ∂g / ∂X_r   (45)

G_z = ∂g / ∂z   (47)

G_z = [ ∂g_1/∂r  ∂g_1/∂θ ] = [ cos(θ_r + θ)   −r sin(θ_r + θ)
        ∂g_2/∂r  ∂g_2/∂θ ]     sin(θ_r + θ)    r cos(θ_r + θ) ]   (48)
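The augmentation machinery of this subsection can be sketched as follows. The state layout `[x_r, y_r, theta_r, features...]` is an assumption made for illustration:

```python
import numpy as np

def augment(x, P, z, R):
    """Append a new feature g(x_r, z) to the state (eqs. 38-48).
    x = [x_r, y_r, theta_r, ...features], z = (r, theta) in robot frame."""
    xr, yr, th = x[:3]
    r, b = z
    g = np.array([xr + r * np.cos(th + b), yr + r * np.sin(th + b)])
    n = len(x)
    Gx = np.array([[1, 0, -r * np.sin(th + b)],
                   [0, 1,  r * np.cos(th + b)]])           # d g / d x_r
    Gz = np.array([[np.cos(th + b), -r * np.sin(th + b)],
                   [np.sin(th + b),  r * np.cos(th + b)]]) # d g / d z, eq. (48)
    Yxz = np.zeros((n + 2, n + 2))
    Yxz[:n, :n] = np.eye(n)
    Yxz[n:, :3] = Gx
    Yxz[n:, n:] = Gz
    big = np.zeros((n + 2, n + 2))
    big[:n, :n] = P
    big[n:, n:] = R
    x_new = np.append(x, g)
    P_new = Yxz @ big @ Yxz.T              # equation (43)
    return x_new, P_new
```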
H_k = [ j_1  j_2  j_3 …  j_m ],   (49)

where j_i = 0 indicates that the measurement z_i(k) does not come from any feature in the map.
Figure 2 below summarises the data association process described here. Several techniques have
been proposed to address this issue, and more information on some of these techniques can be
found in (Castellanos, J.A. et al., 2006) and (Cooper, A.J., 2005).
Of interest in this chapter is the simple data association problem of finding the
correspondence of each measurement to a map feature. Hence the Individual Compatibility
Nearest Neighbour method will be described.
ẑ_j(k|k−1) = h( X_r(k|k−1), x_j, y_j )   (50)

v_ij(k) = z_i(k) − ẑ_j(k|k−1)   (51)

where H_r is the gradient Jacobian of the observation model with respect to
the robot states and H_{Fj} the gradient Jacobian of the observation model with respect to the
observed map feature:

H(k) = [ H_r  0 … 0  H_{Fj}  0 … 0 ]   (53)

The measurement and a map feature can be considered compatible if the Mahalanobis
distance satisfies:

D²_ij(k) ≤ χ²_{d,1−α}   (55)

where d = dim(v_ij) and 1−α is the desired level of confidence, usually taken to be 95%.
The result of this exercise is a subset of map features that are compatible with a particular
measurement. This is the basis of a popular data association algorithm termed Individual
Compatibility Nearest Neighbour (ICNN). Of the map features that satisfy IC, ICNN chooses the one
with the smallest Mahalanobis distance (Castellanos, J.A. et al., 2006).
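The gating and nearest-neighbour selection can be sketched as follows; the 5.99 gate is the 95% chi-square value for 2 degrees of freedom (range and bearing), an assumption matching equation (55):

```python
import numpy as np

def icnn(v, S_list, gate=5.99):
    """Individual Compatibility Nearest Neighbour: among features whose
    squared Mahalanobis distance (eq. 55) passes the chi-square gate,
    pick the nearest.  Returns the feature index, or None."""
    best, best_d = None, np.inf
    for j, (vij, Sij) in enumerate(zip(v, S_list)):
        d2 = float(vij @ np.linalg.inv(Sij) @ vij)
        if d2 <= gate and d2 < best_d:
            best, best_d = j, d2
    return best
```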
E[ X̃(k) ] = 0   (56)

E[ X̃(k) X̃(k)ᵀ ] = P(k|k−1)   (57)

where the actual state estimation error is given by:

X̃(k) = X(k) − X̂(k|k−1)   (58)

P(k|k−1) is the state error covariance. Equation (57) means that the actual mean square
error matches the state covariance. When the ground truth solution for the state variables is
available, a chi-squared test can be applied to the normalised estimation error squared to
check for filter consistency:

X̃ᵀ(k) P⁻¹(k|k−1) X̃(k) ≤ χ²_{d,1−α}   (59)

where the number of degrees of freedom is equal to the state dimension d = dim x(k) and 1−α is the
desired confidence level. In most cases ground truth is not available, and the consistency of the
estimation is checked using only measurements that satisfy the innovation test.
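The test of equation (59) can be sketched directly; the 7.81 gate below is the 95% chi-square value for a 3-dimensional pose state, an assumption about the state dimension used only for illustration:

```python
import numpy as np

def nees_consistent(x_true, x_est, P, gate=7.81):
    """Normalised estimation error squared (NEES) test, equation (59):
    the filter is consistent when the error, weighted by the inverse
    covariance, stays below the chi-square gate."""
    e = x_true - x_est
    return float(e @ np.linalg.inv(P) @ e) <= gate
```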
When a mapped feature is re-observed, a Kalman filter update is done. This improves the overall
position estimates of the robot as well as the landmark. Consequently, this causes the confidence
ellipse drawn around the map (robot position and corners) to reduce in size (bottom left picture).
Fig. 18. In figure 8, two consecutive corner extractions from the split and merge
algorithm map one corner wrongly; in contrast, our corner extraction algorithm picks
out the same two corners and correctly associates them.
Fig. 19. EKF-SLAM simulation results showing map reconstruction (top right) of an office
space drawn from sensor data logged by the Meer Cat. When a corner is detected, its
position is mapped and a 2σ confidence ellipse is drawn around the feature position. As
the number of observations of the same feature increases, the confidence ellipse collapses (top
right). The bottom right picture depicts the x coordinate estimation error (blue) between the 2σ
bounds (red).
As expected, when the robot revisits its previous position, there is a major decrease in the ellipse,
indicating the robot's high perceptual inference of its position. The far top right picture shows a
reduction in the ellipses around the robot position. The estimation error is within the 2σ bounds,
indicating consistent results (bottom right picture). During the experiment, an extra laser sensor was
used to track the robot position; this provided the absolute robot position. An initial scan of the
environment (background) was taken beforehand by the external sensor. A simple matching is
then carried out to determine the pose of the robot in the background after exploration.
Figure 20 below shows that as the robot closes the loop, the estimated path and the true path are
almost identical, improving the whole map in the process.
Fig. 20. The figure depicts that as the robot revisits previously explored regions, its
positional perception is high. This means improved localization and mapping, i.e. improved
SLAM output.
time operation. The corner detector we developed reduces the chance of mapping dummy
corners and has an improved computation cost. This offline simulation with real data has
allowed us to test and validate our algorithms. The next step will be to test the algorithm's
performance in real time. For large indoor environments, one would employ a
regression method to fit lines to the scan data, because corridors present numerous
possible corners while only a few lines are needed to describe the same space.
6. References
Bailey, T. and Durrant-Whyte, H. (2006). Simultaneous Localisation and Mapping (SLAM):
	Part II State of the Art. Robotics and Automation Magazine, September.
Castellanos, J.A., Neira, J., and Tard´os, J.D. (2004) Limits to the consistency of EKF-based
SLAM. In IFAC Symposium on Intelligent Autonomous Vehicles.
Castellanos, J.A.; Neira, J.; Tardos, J.D. (2006). Map Building and SLAM Algorithms,
Autonomous Mobile Robots: Sensing, Control, Decision Making and Applications, Lewis,
F.L. & Ge, S.S. (eds), 1st edn, pp 335-371, CRC, 0-8493-3748-8, New York, USA
Collier, J, Ramirez-Serrano, A (2009)., "Environment Classification for Indoor/Outdoor
Robotic Mapping," crv, Canadian Conference on Computer and Robot Vision , pp.276-
283.
Cooper, A.J. (2005). A Comparison of Data Association Techniques for Simultaneous
	Localisation and Mapping, Masters Thesis, Massachusetts Institute of Technology
Crowley, J. (1989). World modeling and position estimation for a mobile robot using
ultrasound ranging. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA).
Duda, R. O. and Hart, P. E. (1972) "Use of the Hough Transformation to Detect Lines and
Curves in Pictures," Comm. ACM, Vol. 15, pp. 11–15 ,January.
Durrant-Whyte, H and Bailey, T. (2006). Simultaneous Localization and Mapping (SLAM): Part I
The Essential Algorithms, Robotics and Automation Magazine.
Einsele, T. (2001) "Localization in indoor environments using a panoramic laser range
finder," Ph.D. dissertation, Technical University of München, September.
Hough, P.V.C. (1959). Machine Analysis of Bubble Chamber Pictures. Proc. Int. Conf. High
	Energy Accelerators and Instrumentation.
Li, X. R. and Jilkov, V. P. (2003). Survey of Maneuvering Target Tracking. Part I: Dynamic
	Models. IEEE Trans. Aerospace and Electronic Systems, AES-39(4):1333-1364, October.
Lu, F. and Milios, E.E. (1994). Robot pose estimation in unknown environments by matching
2D range scans. In Proc. of the IEEE Computer Society Conf. on Computer Vision and
Pattern Recognition (CVPR), pages 935–938.
Mathpages, “Perpendicular regression of a line”
http://mathpages.com/home/kmath110.htm. (2010-04-23)
Mendes, A., and Nunes, U. (2004)"Situation-based multi-target detection and tracking with
laser scanner in outdoor semi-structured environment", IEEE/RSJ Int. Conf. on
Systems and Robotics, pp. 88-93.
Moutarlier, P. and Chatila, R. (1989a). An experimental system for incremental environment
modelling by an autonomous mobile robot. In ISER.
Moutarlier, P. and Chatila, R. (1989b). Stochastic multisensory data fusion for mobile robot
	location and environment modelling. In ISRR.
Newman, P.M. (1999). On the structure and solution of the simultaneous localization and
mapping problem. PhD Thesis, University of Sydney.
Newman, P. (2006) EKF Based Navigation and SLAM, SLAM Summer School.
Pfister, S.T., Roumeliotis, S.I., and Burdick, J.W. (2003). Weighted line fitting algorithms for
mobile robot map building and efficient data representation. In Proc. of the IEEE Int.
Conf. on Robotics & Automation (ICRA).
Roumeliotis S.I. and Bekey G.A. (2000). SEGMENTS: A Layered, Dual-Kalman filter
Algorithm for Indoor Feature Extraction. In Proc. IEEE/RSJ International Conference
on Intelligent Robots and Systems, Takamatsu, Japan, Oct. 30 - Nov. 5, pp.454-461.
Smith, R., Self, M. & Cheesman, P. (1985). On the representation and estimation of spatial
uncertainty. SRI TR 4760 & 7239.
Smith, R., Self, M. & Cheesman, P. (1986). Estimating uncertain spatial relationships in
robotics, Proceedings of the 2nd Annual Conference on Uncertainty in Artificial
Intelligence, (UAI-86), pp. 435–461, Elsevier Science Publishing Company, Inc., New
York, NY.
Spinello, L. (2007). Corner extractor, Institute of Robotics and Intelligent Systems, Autonomous
Systems Lab,
http://www.asl.ethz.ch/education/master/mobile_robotics/year2008/year2007,
ETH Zürich
Thorpe, C. and Durrant-Whyte, H. (2001). Field robots. In ISRR’.
Thrun, S., Koller, D., Ghahmarani, Z., and Durrant-Whyte, H. (2002) Slam updates require
constant time. Tech. rep., School of Computer Science, Carnegie Mellon University
Williams, S.B., Newman, P., Dissanayake, M.W.M.G., and Durrant-Whyte, H. (2000).
Autonomous underwater simultaneous localisation and map building. Proceedings
of IEEE International Conference on Robotics and Automation, San Francisco, USA, pp.
1143-1150,
Williams, S.B.; Newman, P.; Rosenblatt, J.; Dissanayake, G. & Durrant-Whyte, H. (2001).
Autonomous underwater navigation and control, Robotica, vol. 19, no. 5, pp. 481-
496.
Sensor Data Fusion for Road Obstacle Detection: A Validation Framework 375
16
France
1. Introduction
Obstacle detection is an essential task for autonomous robots. In particular, in the context of
Intelligent Transportation Systems (ITS), vehicles (cars, trucks, buses, etc.) can be considered
as robots; the development of Advanced Driver Assistance Systems (ADAS), such as
collision mitigation, collision avoidance, pre-crash or Automatic Cruise Control, requires
that reliable road obstacle detection systems are available. To perform obstacle detection,
various approaches have been proposed, depending on the sensor involved: telemeters like
radar (Skutek et al., 2003) or laser scanner (Labayrade et al., 2005; Mendes et al., 2004),
cooperative detection systems (Griffiths et al., 2001; Von Arnim et al., 2007), or vision
systems. In this particular field, monocular vision generally exploits the detection of specific
features like edges, symmetry (Bertozzi et al., 2000), color (Betke & Nguyen, 1998)
(Yamaguchi et al., 2006) or even saliency maps (Michalke et al., 2007). However, most
monocular approaches rely on the recognition of specific objects, like vehicles or pedestrians,
and are therefore not generic. Stereovision is particularly suitable for obstacle detection
(Bertozzi & Broggi, 1998; Labayrade et al., 2002; Nedevschi et al., 2004; Williamson, 1998),
because it provides a tri-dimensional representation of the road scene. A critical point about
obstacle detection for the aimed automotive applications is reliability: the detection rate
must be high, while the false detection rate must remain extremely low. So far, experiments
and assessments of already developed systems show that using a single sensor is not
enough to meet these requirements: due to the high complexity of road scenes, no single
sensor system can currently reach the expected 100% detection rate with no false positives.
Thus, multi-sensor approaches and fusion of data from various sensors must be considered,
in order to improve the performances. Various fusion strategies can be imagined, such as
merging heterogeneous data from various sensors (Steux et al., 2002). More specifically,
many authors proposed cooperation between an active sensor and a vision system, for
instance a radar with mono-vision (Sugimoto et al., 2004), a laser scanner with a camera
(Kaempchen et al., 2005), a stereovision rig (Labayrade et al., 2005), etc. Cooperation
between mono and stereovision has also been investigated (Toulminet et al., 2006).
Our experiments in the automotive context showed that using specifically a sensor to
validate the detections provided by another sensor is an efficient scheme that can lead to a
very low false detection rate, while maintaining a high detection rate. The principle consists
in tuning the first sensor to provide overabundant detections (so as not to miss any
plausible obstacle), and in post-processing with the second sensor to confirm the
existence of the previously detected obstacles. In this chapter, such a validation-based
sensor data fusion strategy is proposed, illustrated and assessed.
The chapter is organized as follows: the validation framework is presented in Section 2. The
next sections show how this framework can be implemented in the case of two specific
sensors, i.e. a laser scanner aimed at providing hypotheses of detection, and a stereovision
rig aimed at validating these detections. Section 3 deals with the laser scanner raw data
processing: 1) clustering of laser points into targets; and 2) a tracking algorithm to estimate
the dynamic state of the objects and to monitor their appearance and disappearance. Section
4 is dedicated to the presentation of the stereovision sensor and of the validation criteria. An
experimental evaluation of the system is given. Eventually, section 5 shows how this
framework can be implemented with other kinds of sensors; experimental results are also
presented. Section 6 concludes.
Fig. 1. Overview of the validation framework: a first sensor outputs hypotheses of detection.
A second sensor validates those hypotheses.
The successive steps of the validation framework are as follows. First, a volume of interest
(VOI) surrounding the targets is built in the 3D space in front of the equipped vehicle, for
each target provided by the first sensor. Then, the second sensor focuses on each VOI, and
evaluates criteria to validate the existence of the targets. The only requirement for the first
sensor is to provide localized targets with respect to the second sensor, so that the VOI can be
computed.
In the next two sections, we will show how this framework can be implemented for two
specific sensors, i.e. a laser scanner, and a stereovision rig; section 5 will study the case of an
optical identification sensor as first sensor, along with a stereovision rig as second sensor. It
is convenient to assume that all the sensors involved in the fusion scheme are rigidly linked
to the vehicle frame, so that, after calibration, they can all refer to a common coordinate
system. For instance, Fig. 2 presents the various sensors taken into account in this chapter,
referring to the same coordinate system.
Fig. 2. The different sensors used located in the same coordinate system Ra.
The chosen distance D_{i,Λ} must comply with the following criteria (Gruyer et al., 2003):
- Firstly, the function D_{i,Λ} must give a result scaled between 0 and 1 if the measurement has
an intersection with the cluster Λ. The value 0 indicates with complete confidence that the
measurement i is the same object as the cluster Λ.
- Secondly, the result must be above 1 if the measurement i is outside the cluster Λ.
- Finally, this distance must have the properties of distance functions.

D_{i,Λ} = ‖X_i − X_Λ‖ / ( ‖X̄_Λ − X_Λ‖ + ‖X̄_i − X_i‖ )   (1)

In the normalizing part, the point X̄_Λ represents the border point of the cluster (centre X_Λ).
This point is located on the straight line between the cluster centre X_Λ and the
measurement i (centre X_i). The same border point X̄_i is used for the measurement.
The computation of X̄_Λ and X̄_i is made with the covariance matrices R_x and P. P and R_x
are respectively the cluster covariance matrix and the measurement covariance matrix. The
measurement covariance matrix is given from its polar covariance representation (Blackman
& Popoli, 1999), with ρ_0 the distance and θ_0 the angle:

R_x = [ σ²_{x0}     σ_{x0 y0}
        σ_{x0 y0}   σ²_{y0} ]   (2)

σ²_{x0} = σ²_{ρ0} cos² θ_0 + ρ_0² σ²_{θ0} sin² θ_0
σ²_{y0} = σ²_{ρ0} sin² θ_0 + ρ_0² σ²_{θ0} cos² θ_0   (3)
σ_{x0 y0} = ½ sin 2θ_0 ( σ²_{ρ0} − ρ_0² σ²_{θ0} )

σ²_{ρ0} and σ²_{θ0} are the variances in both distance and angle of each measurement provided by
the laser scanner. From this covariance matrix, the eigenvalues λ and the eigenvectors V are
extracted. A set of equations for the ellipsoid cluster, the measurement modeling and the line
between the cluster centre and the laser measurement X is then deduced:
x and y give the position of a point on the ellipse and the position of a point on the line. If x
and y are the same in the three equations, then an intersection between the ellipse and the
line exists. The solution of the set of equations (4) gives:

θ = arctan( (λ_1 V_{2,1} + a V_{1,1}) / (λ_2 V_{2,2} + a V_{1,2}) ),  with λ_1 = σ_1², λ_2 = σ_2²   (5)

Then equation (1) is used with X̄ to know whether a laser point belongs to a cluster. Fig. 3 gives a
visual interpretation of the distance used for the clustering process. Fig. 4 gives an example
of a result of autonomous clustering from laser scanner data. Each cluster is characterized by
its position, its orientation, and its size along the two axes (standard deviations).
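The polar-to-Cartesian covariance conversion of equations (2) and (3) can be sketched as follows (the numeric variances in the usage below are illustrative):

```python
import numpy as np

def polar_covariance(rho, theta, var_rho, var_theta):
    """Cartesian covariance R_x of a laser point given its polar
    measurement noise, equations (2)-(3)."""
    s, c = np.sin(theta), np.cos(theta)
    vxx = var_rho * c ** 2 + (rho ** 2) * var_theta * s ** 2
    vyy = var_rho * s ** 2 + (rho ** 2) * var_theta * c ** 2
    vxy = 0.5 * np.sin(2 * theta) * (var_rho - (rho ** 2) * var_theta)
    return np.array([[vxx, vxy], [vxy, vyy]])
```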
m(A) ∈ [0, 1],  ∀A ∈ 2^Θ   (7)

The sum of these masses is equal to 1, and the mass corresponding to the impossible case,
m_{1..n}{X_i}(∅), must be equal to 0.
In order to generalize the Dempster combination rule and thus reduce its
combinatorial complexity, the frame of definition is limited with the constraint
that a perceived object can be connected with one and only one known object.
For example, in order to associate a detected object among three known objects, the
frame of discernment is:

Θ = { Y_1, Y_2, Y_3, Y_* }

where Y_i means that "X and Y_i are supposed to be the same object".
In order to be sure that the frame of discernment is really exhaustive, a last hypothesis noted
"Y_*" is added (Royere et al., 2000). It can be interpreted as "a target has no association
with any of the tracks". In fact each Y_j represents a local view of the world and "Y_*"
represents the rest of the world. In this context, "Y_*" means that "an object is associated with
nothing in the local knowledge set".
In our case, the definition of the bba is directly related to the data association
application. The mass distribution is a local view around a target X_i and a track Y_j. The
bba on the association between X_i and Y_j will be noted m_j{X_i}. It is defined on the frame of
discernment Θ = {Y_1, Y_2, …, Y_n, Y_*} and more precisely on the focal elements {Y_j, ¬Y_j},
where ¬Y_j means not Y_j.
In this mass distribution, the index i denotes the processed perceived object and the index j the
known object (track). If the index is replaced by a set of indices, then the mass is applied to
all targets.
Moreover, if an iterative combination is used, the mass m_j{X_i}(Y_*) is not part of the initial
mass set and appears only after the first combination. It replaces the conjunction of the
combined masses m_j{X_i}(¬Y_j). By observing the behaviour of the iterative combination with
n mass sets, a general behaviour can be seen which enables the final mass set to be expressed
according to the initial mass sets. This enables the final masses to be computed directly, without a
recurrent stage. For the construction of these combination rules, the work and first
formalism given in (Rombaut, 1998) is used. The use of a basic belief assignment generator
relying on the strong hypothesis "an object cannot be at the same time associated and not associated to
another object" allows new rules to be obtained. These rules firstly reduce the influence of the
conflict (the combination of two identical mass sets will not produce a conflict) and,
secondly, the complexity of the combination (Gruyer & Berge-Cherfaoui, 1999a; Gruyer &
Berge-Cherfaoui, 1999b). The rules become:
m_{1..n}{X_i}(Y_j) = m_j{X_i}(Y_j) · Π_{a=1, a≠j}^{n} (1 − m_a{X_i}(Y_a))   (8)

m_{1..n}{X_i}({Y_j, Y_*}) = m_j{X_i}(Θ) · Π_{a=1, a≠j}^{n} m_a{X_i}(¬Y_a)   (9)

m_{1..n}{X_i}({Y_j, Y_k, Y_*}) = m_j{X_i}(Θ) · m_k{X_i}(Θ) · Π_{a=1, a≠j, a≠k}^{n} m_a{X_i}(¬Y_a)   (10)

m_{1..n}{X_i}({Y_j, Y_k, …, Y_l, Y_*}) = m_j{X_i}(Θ) · m_k{X_i}(Θ) · … · m_l{X_i}(Θ) · Π_{a=1, a∉{j,k,…,l}}^{n} m_a{X_i}(¬Y_a)   (11)

m_{1..n}{X_i}(¬Y_j) = m_j{X_i}(¬Y_j) · Π_{a=1, a≠j}^{n} m_a{X_i}(Θ)   (12)

m_{1..n}{X_i}(Θ) = Π_{a=1}^{n} m_a{X_i}(Θ)   (13)

m_{1..n}{X_i}(Y_*) = Π_{a=1}^{n} m_a{X_i}(¬Y_a)   (14)

m_{1..n}{X_i}(∅) = 1 − Π_{a=1}^{n} (1 − m_a{X_i}(Y_a)) − Σ_{a=1}^{n} [ m_a{X_i}(Y_a) · Π_{b=1, b≠a}^{n} (1 − m_b{X_i}(Y_b)) ]   (15)
m_{1..n}{X_i}(Y_*) is the result of the combination of all non-association belief masses for X_i. Indeed,
the apparition of new targets or the loss of tracks, because of field of view limitations or
object occultations, leads to consider the Y_* hypothesis, which models these phenomena,
with attention.
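The one-to-one combination for a single target can be sketched as follows. The mass convention (m_a(Y_a), m_a(¬Y_a), m_a(Θ)) and the closed forms used here are assumptions consistent with the iterative combination described in the text, not a verified transcription of the printed rules:

```python
def combine_target(masses):
    """Combined bba for one target X_i over the frame {Y_1..Y_n, Y*}.
    masses[a] = (m_a(Y_a), m_a(not Y_a), m_a(Theta)); the three terms of
    each source sum to 1.  Closed forms assume one-to-one association."""
    n = len(masses)
    yes = [m[0] for m in masses]
    no = [m[1] for m in masses]
    out = {}
    for j in range(n):
        # "X_i is Y_j": source j says yes, nobody else positively claims it
        p = yes[j]
        for a in range(n):
            if a != j:
                p *= (1.0 - yes[a])
        out[('Y', j)] = p
    # "X_i is associated with nothing": every source says not-associated
    p = 1.0
    for a in range(n):
        p *= no[a]
    out['Y*'] = p
    # conflict: mass placed on two or more simultaneous positive claims
    none_or_one = 1.0
    for a in range(n):
        none_or_one *= (1.0 - yes[a])
    for a in range(n):
        term = yes[a]
        for b in range(n):
            if b != a:
                term *= (1.0 - yes[b])
        none_or_one += term
    out['conflict'] = 1.0 - none_or_one
    return out
```

With two sources both positively claiming their own track, the conflict term is exactly the product of the two positive masses, reflecting the constraint that a target cannot be associated with two tracks at once.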
In fact, a specialized bba can be defined giving a local view of the X-Y association. In order
to obtain a global view, it is necessary to combine the specialized bbas. The combination is
possible when the bbas are defined on the same frame of discernment and for the same
parameter X.
In a first step, a combination of the m_j{X_i}, j ∈ [1..n], is done using equations (8) to (15).
The result of the combination gives a mass m_{1..n}{X_i} defined on 2^Θ. These operations can be
repeated for each X_i to obtain a set of p bbas: m_{1..n}{X_1}, m_{1..n}{X_2}, …, m_{1..n}{X_p},
where p is the number of targets and Θ the frame including the n tracks corresponding to the n
hypotheses for target-to-track association.
m_i{Y_j}(Θ) = m_j{X_i}(Θ): degree of "ignorance on the association between Y_j and X_i".
The same combination, equations (8) to (15), is applied and gives m_{1..p}{Y_j}.
These operations can be repeated for each Y_j to obtain a set of n bbas:
m¹_{1..p}{Y_1}, m²_{1..p}{Y_2}, …, mⁿ_{1..p}{Y_n},
where n is the number of tracks and Θ_j is the frame based on the association hypotheses for the
parameter Y_j. The index j in Θ_j is now useful to distinguish the frames based on
association for one specific track Y_j, j ∈ [1..n].
A second matrix is obtained involving the pignistic probabilities BetP{Y_j}(X_i) about the
tracks.
The last stage of this algorithm consists in establishing the best decision from the previously
computed associations using both pignistic probability matrices, BetP{X_i}(Y_j) and
BetP{Y_j}(X_i). The decision stage is done with the maximum pignistic probability rule.
This rule is applied to each column of both pignistic probability matrices.
With the first matrix, this rule answers the question "which track Y_j is associated with target
X_i?":

∀X_i,  d(X_i) = max_j BetP{X_i}(Y_j)   (16)

With the second matrix, this rule answers the question "which target X_i is associated with the
track Y_j?":

∀Y_j,  d(Y_j) = max_i BetP{Y_j}(X_i)   (17)
Unfortunately, a problem appears when the decision obtained from a pignistic matrix is
ambiguous (this ambiguity quantifies the duality and the uncertainty of a relation) or when
the decisions from the two pignistic matrices are in conflict (this conflict represents
antagonism between two relations, each resulting from a different belief matrix). Both
problems of conflict and ambiguity are solved by using an assignment algorithm known
as the Hungarian algorithm (Kuhn, 1955; Ahuja et al., 1993). This algorithm
has the advantage of ensuring that the decision taken is not merely "good" but "the best". By the
"best", we mean that if a known object is perceived by defective or poor sensors, then the
system is unlikely to know what this object corresponds to, and therefore ensuring that the
association is good is a difficult task. But among all the available possibilities, we must
certify that the decision is the "best" of all possible decisions.
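As a toy illustration in Python (the 3×3 pignistic matrix below is invented for the example), independent row-wise maxima can send two targets to the same track, while a globally optimal one-to-one assignment resolves the conflict. Brute force over permutations is used only to keep the sketch short; the Hungarian algorithm returns the same optimum in polynomial time.

```python
from itertools import permutations

import numpy as np

# Hypothetical 3x3 pignistic matrix: betp[i, j] is the pignistic
# probability that target Xi is associated with track Yj.
betp = np.array([
    [0.60, 0.55, 0.05],
    [0.58, 0.57, 0.10],   # row maxima would send X1 AND X2 to track Y1
    [0.05, 0.10, 0.85],
])

def best_assignment(m):
    """Globally optimal one-to-one assignment, maximizing the summed
    pignistic probability.  Brute force keeps the sketch short; the
    Hungarian algorithm computes the same optimum in O(n^3)."""
    n = m.shape[0]
    return max(permutations(range(n)),
               key=lambda cols: sum(m[i, c] for i, c in enumerate(cols)))

print(best_assignment(betp))  # (0, 1, 2): each target gets its own track
```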
Once the multi-object association has been performed, the Kalman filter associated with each
target is updated using the new position of the target, and so the dynamic state of each
target is estimated, i.e. both its speed and angular speed.
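The chapter does not detail the filter equations, so the following is only a minimal sketch of a standard linear Kalman predict/update cycle for one target, assuming a constant-velocity state (x, y, vx, vy), position-only measurements, and invented noise covariances:

```python
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)   # constant-velocity model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # only position is observed
Q = np.eye(4) * 1e-3                         # process noise (assumed)
R = np.eye(2) * 0.05                         # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle with the associated measurement z."""
    x = F @ x                                # predict state
    P = F @ P @ F.T + Q                      # predict covariance
    y = z - H @ x                            # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = np.zeros(4), np.eye(4)
for t in range(50):                          # target moving at (1.0, 0.5) m/s
    x, P = kalman_step(x, P, np.array([1.0 * t * dt, 0.5 * t * dt]))
print(x[2], x[3])  # estimated velocity, close to (1.0, 0.5)
```

In the multi-target setting, one such filter instance would be maintained per track, fed with the measurement selected by the association stage.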
stereoscopic baseline. Given a point P (Xa, Ya, Za) in the common coordinate system Ra, its
positions (ur, v) and (ul, v) in the stereoscopic images, and the associated disparity Δs, can be
calculated as:

ur = u0 + α (Xa + bs/2) / [ (Ya − Y_S0) sin θs + (Za − Z_S0) cos θs ]   (18)

ul = u0 + α (Xa − bs/2) / [ (Ya − Y_S0) sin θs + (Za − Z_S0) cos θs ]   (19)

v = v0 + α [ (Ya − Y_S0) cos θs − (Za − Z_S0) sin θs ] / [ (Ya − Y_S0) sin θs + (Za − Z_S0) cos θs ]   (20)

Δs = ur − ul = α bs / [ (Ya − Y_S0) sin θs + (Za − Z_S0) cos θs ]   (21)

Conversely, the 3D position of a point can be recovered from its image coordinates and
disparity:

Ya = Y_S0 + bs [ (v − v0) cos θs + α sin θs ] / Δs   (23)

Za = Z_S0 + bs [ α cos θs − (v − v0) sin θs ] / Δs   (24)
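As a numerical sanity check of this projection model, the sketch below implements equations (18)-(24) with invented calibration values (focal length α in pixels, pitch θs, baseline bs, principal point (u0, v0) and sensor origin (Y_S0, Z_S0) are all assumptions, not the chapter's values); signs are chosen for internal consistency, so a point projected into the pair is recovered exactly from (v, Δs):

```python
import numpy as np

# Illustrative calibration values (all assumed, not from the chapter):
# alpha = focal length in pixels, theta = stereo-rig pitch, bs = baseline,
# (u0, v0) = principal point, (YS0, ZS0) = sensor origin in Ra.
alpha, theta, bs = 800.0, 0.05, 0.4
u0, v0, YS0, ZS0 = 320.0, 240.0, 1.2, 0.0

def project(Xa, Ya, Za):
    """Equations (18)-(20): image coordinates of P in the rectified pair."""
    n = (Ya - YS0) * np.sin(theta) + (Za - ZS0) * np.cos(theta)
    ur = u0 + alpha * (Xa + bs / 2) / n                          # (18)
    ul = u0 + alpha * (Xa - bs / 2) / n                          # (19)
    v = v0 + alpha * ((Ya - YS0) * np.cos(theta)
                      - (Za - ZS0) * np.sin(theta)) / n          # (20)
    return ur, ul, v

def backproject(v, disp):
    """Equations (23)-(24): recover Y and Z from v and the disparity
    disp = ur - ul = alpha * bs / n (equation (21))."""
    Ya = YS0 + bs * ((v - v0) * np.cos(theta) + alpha * np.sin(theta)) / disp
    Za = ZS0 + bs * (alpha * np.cos(theta) - (v - v0) * np.sin(theta)) / disp
    return Ya, Za

ur, ul, v = project(1.0, 0.5, 10.0)
Ya, Za = backproject(v, ur - ul)
print(round(Ya, 6), round(Za, 6))  # 0.5 10.0
```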
The coordinate system RΔ = (Ω, ur, v, Δs) defines a 3D space EΔ, denoted the disparity space.
Fig. 6 illustrates this definition. This is equivalent to a region of interest in the right image of
the stereoscopic pair, associated with a disparity range. This definition is useful to distinguish
objects that are connected in the images but located at different longitudinal positions.
To build volumes of interest in the stereoscopic images, a bounding box Vo is constructed in
Ra from the laser scanner targets, as described in Fig. 7 (a). Znear, Xleft and Xright are computed
from the ellipse parameters featuring the laser target. Zfar and Yhigh are then obtained from
a priori knowledge of the size of the obstacles. Fig. 7 (b) shows how the VOI is projected
into the right image of the stereoscopic pair. Equations (18-20) are used for this purpose.
Fig. 7. (a): Conversion of a laser target into bounding box. (b): Projection of the bounding
box (i.e. VOI) into the right image of the stereoscopic pair.
Sensor Data Fusion for Road Obstacle Detection: A Validation Framework 387
1) Local disparity map computation: The local disparity map for each VOI is computed using a
classical Winner-Take-All (WTA) approach (Scharstein & Szeliski, 2002) based on a Zero-mean
Sum of Squared Differences (ZSSD) criterion. A sparse disparity map is used to keep
computation time low; thus, only high-gradient pixels are considered in the process.
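A minimal sketch of this WTA step, assuming a zero-mean SSD cost over a square window and a synthetic image pair shifted by a known disparity:

```python
import numpy as np

def zssd(a, b):
    # zero-mean SSD between two equally sized windows
    return float(np.sum(((a - a.mean()) - (b - b.mean())) ** 2))

def wta_disparity(left, right, x, y, w, dmin, dmax):
    """Winner-take-all: return the disparity whose window cost is lowest."""
    ref = left[y - w:y + w + 1, x - w:x + w + 1]
    costs = {d: zssd(ref, right[y - w:y + w + 1, x - d - w:x - d + w + 1])
             for d in range(dmin, dmax + 1)}
    return min(costs, key=costs.get)

# synthetic pair: the right view is the left view shifted by 3 columns,
# so every pixel has true disparity 3
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -3, axis=1)
d = wta_disparity(left, right, x=20, y=10, w=3, dmin=0, dmax=8)
print(d)  # 3
```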
2) Filtering: Using raw data directly from the local disparity map could lead to a certain
number of errors. Indeed, such maps could contain pixels belonging to the road surface or to
targets located at greater distances, or some noise due to matching errors. Several filtering
operations are implemented to reduce such sources of errors: the cross-validation step helps
to efficiently reject errors located in half-occluded areas (Egnal & Wildes, 2002); the double
correlation method, using both rectangular and sheared correlation windows, provides
instant classification of the pixels corresponding to obstacles or road surface (Perrolaz et al.,
2007), so that only obstacle pixels are kept; the disparity range of the VOI must be taken into
consideration in order to reject pixels located further or closer than the processed volume;
and a median filter rejects impulse noise created by isolated matching errors.
3) Obstacle pixels: Once the local disparity map has been computed and filtered, the VOI
contains an ‘obstacle disparity map’, corresponding to a set of measurement points. For
better clarity, we will call obstacle pixels the measurement points present in the ‘obstacle
disparity map’.
We propose to exploit the obstacle pixels to reject false detections. Before defining the
validation strategy, it is necessary to highlight the major features of what we call 'obstacles'.
These features must be as unrestrictive as possible, to ensure that the validation process
remains generic with respect to the type of obstacles.
Starting from these three hypotheses, let us define three different validation criteria.
1) Number of obstacle pixels: To validate a target according to the first feature, the most
natural method consists in checking that the volume of interest associated with the target
actually contains obstacle pixels. Therefore, our validation criterion consists in counting the
number of obstacle pixels in the volume and comparing it to a threshold.
2) Prevailing alignment criterion: One can also exploit the near-verticality of obstacles, while
the road is almost horizontal. We therefore propose to measure in which direction the obstacle
pixels of the target are aligned. For this purpose, the local disparity map of the target is
projected over the v-disparity plane (Labayrade et al., 2002). A linear regression is then
computed to find the global orientation of the set of obstacle pixels. The parameters of the
extracted straight line are used to confirm the detection.
3) Bottom height criterion: A specific type of false detection by stereovision appears in scenes
with many repetitive structures. Highly correlated false matches can then appear as objects
closer to the vehicle than their actual location. These false matches are very disturbing,
because the validation criteria outlined above assume that matching errors are mainly
uncorrelated; those criteria are therefore irrelevant with respect to such false detections. Among these
errors, the most problematic ones occur when the disparity values are over-evaluated. In
the case of an under-evaluation, the hypothesis of detection is located further away than the actual
object, and is therefore a case of detection failure. When the disparity is significantly over-evaluated,
the height of the bottom of an obstacle can be high and may give the impression that
the target flies without ground support. So the validation test consists in measuring the
altitude of the lowest obstacle pixel in the VOI, and checking that this altitude is low enough.
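The three criteria can be summarized in a short sketch. The thresholds (minimum pixel count, minimum v-disparity slope, maximum bottom altitude) are illustrative placeholders, not the chapter's tuned values, and obstacle pixels are assumed given as (v, disparity, altitude) triples:

```python
import numpy as np

def validate_target(pixels, min_count=50, min_slope=2.0, max_bottom=0.3):
    """Apply the three criteria to one target.  `pixels` is an (N, 3)
    array of (v, disparity, altitude) obstacle pixels; the thresholds
    are illustrative placeholders."""
    if len(pixels) < min_count:                  # 1) enough obstacle pixels
        return False
    v, disp, alt = pixels[:, 0], pixels[:, 1], pixels[:, 2]
    # 2) prevailing alignment: regress v against disparity in the
    # v-disparity plane; near-vertical obstacles have almost constant
    # disparity, hence a steep dv/d(disp) slope
    slope = np.polyfit(disp, v, 1)[0] if np.ptp(disp) > 0 else np.inf
    if abs(slope) < min_slope:
        return False
    # 3) bottom height: the lowest pixel must be close to the ground
    return bool(alt.min() <= max_bottom)

v = np.arange(100.0)
obstacle = np.stack([v, 10 + 0.01 * v, 0.05 + 0.001 * v], axis=1)
road = np.stack([v, v, 0.0 * v], axis=1)     # disparity varies strongly with v
print(validate_target(obstacle), validate_target(road))  # True False
```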
Fig. 8. Detailed architecture of the framework, using a laser scanner as the first sensor, and
stereovision as validation sensor.
Fig. 9. Right image from stereoscopic pair with laser points projected (cross), and resulting
targets (rectangles).
Fig. 10. Common sources of errors in detection using a laser scanner. (a): laser scanning
plane intersects road surface. (b): non planar road is seen as an obstacle. (c): laser temporal
tracking failed. All of these errors are correctly discarded by the stereovision-based
validation step.
Fig. 11. (a): Detection from optical identification system projected in the right image. (b):
Error in detection: ID lamp reflected on the road separating wall. This error is correctly
discarded by the stereovision-based validation step.
6. Conclusion
For the application of obstacle detection in the automotive domain, reliability is a major
consideration. In this chapter, a sensor data fusion validation framework was proposed: an
initial sensor provides hypotheses of detection that are validated by a second sensor.
Experiments demonstrate the efficiency of this strategy when using a stereovision rig as the
validation sensor, which provides rich 3D information about the scene. The framework can
be implemented with any initial device providing hypotheses of detection (either a single
sensor or a detection system), in order to drastically decrease the false alarm rate while having
little influence on the detection rate.
One major improvement of this framework would be the addition of a multi-sensor
combination stage, to obtain an efficient multi-sensor collaboration framework. The choice
of whether to insert this stage before or after validation is still open, and may have a significant
influence on performance.
7. References
Ahuja R. K.; Magnanti T. L. & Orlin J. B. (1993). Network Flows, theory, algorithms, and
applications, Editions Prentice-Hall, 1993.
Bertozzi M. & Broggi, A. (1998). Gold: A parallel real-time stereo vision system for generic
obstacle and lane detection, IEEE Transactions on Image Processing, 7(1), January
1998.
Bertozzi, M., Broggi, A., Fascioli, A. & Nichele, S. (2000). Stereo vision based vehicle
detection, In Proceedings of the IEEE Intelligent Vehicles Symposium, Detroit, USA,
October 2000.
Betke, M. & Nguyen, M. (1998). Highway scene analysis from a moving vehicle under
reduced visibility conditions, Proceedings of the IEEE International Conference on
Intelligent Vehicles, Stuttgart, Germany, October 1998.
Blackman S. & Popoli R. (1999). Modern Tracking Systems, Artech, 1999.
Denoeux, T. & Smets, P. (2006). Classification using Belief Functions: the Relationship
between the Case-based and Model-based Approaches, IEEE Transactions on
Systems, Man and Cybernetics B, Vol. 36, Issue 6, pp 1395-1406, 2006.
Egnal, G. & Wildes, R. P. (2002). Detecting binocular half-occlusions: Empirical comparisons
of five approaches, IEEE Transactions on Pattern Analysis and Machine Intelligence,
24(8):1127–1133, 2002.
Griffiths, P., Langer, D., Misener, J. A., Siegel, M., Thorpe, C. (2001). Sensor-friendly vehicle
and roadway systems, Proceedings of the IEEE Instrumentation and Measurement
Technology Conference, Budapest, Hungary, 2001.
Gruyer, D. & Berge-Cherfaoui V. (1999a). Matching and decision for Vehicle tracking in road
situation, IEEE/RSJ International Conference on Intelligent Robots and Systems, Korea,
1999.
Gruyer, D., & Berge-Cherfaoui V. (1999b). Multi-objects association in perception of
dynamical situation, Fifteenth Conference on Uncertainty in Artificial Intelligence,
UAI’99, Stockholm, Sweden, 1999.
Gruyer, D. Royere, C., Labayrade, R., Aubert, D. (2003). Credibilistic multi-sensor fusion for
real time application. Application to obstacle detection and tracking, ICAR 2003,
Coimbra, Portugal, 2003.
Kaempchen, N.; Buehler, M. & Dietmayer, K. (2005). Feature-level fusion for free-form object
tracking using laserscanner and video, Proceedings of the IEEE Intelligent Vehicles
Symposium, Las Vegas, USA, June 2005.
Kuhn H. W. (1955), The Hungarian method for assignment problem, Nav. Res. Quart., 2, 1955.
Labayrade, R.; Aubert, D. & Tarel, J.P. (2002). Real time obstacle detection on non flat road
geometry through ‘v-disparity’ representation, Proceedings of the IEEE Intelligent
Vehicles Symposium, Versailles, France, June 2002.
Labayrade R.; Royere, C. & Aubert, D. (2005). A collision mitigation system using laser
scanner and stereovision fusion and its assessment, Proceedings of the IEEE
Intelligent Vehicles Symposium, pp 440– 446, Las Vegas, USA, June 2005.
Labayrade R.; Royere C.; Gruyer D. & Aubert D. (2005). Cooperative fusion for multi-
obstacles detection with use of stereovision and laser scanner, Autonomous Robots,
special issue on Robotics Technologies for Intelligent Vehicles, Vol. 19, N°2, September
2005, pp. 117 - 140.
Mendes, A.; Conde Bento, L. & Nunes U. (2004). Multi-target detection and tracking with a
laserscanner, Proceedings of the IEEE Intelligent Vehicles Symposium, University of
Parma, Italy, June 2004.
Michalke, T.; Gepperth, A.; Schneider, M.; Fritsch, J. & Goerick, C. (2007). Towards a human-
like vision system for resource-constrained intelligent Cars, Proceedings of the 5th
International Conference on Computer Vision Systems, 2007.
Nedevschi, S.; Danescu, R.; Frentiu, D.; Marita, T.; Oniga, F.; Pocol, C.; Graf, T. & Schmidt R.
(2004). High accuracy stereovision approach for obstacle detection on non planar
roads, Proceedings of the IEEE Intelligent Engineering Systems, Cluj Napoca, Romania,
September 2004.
Perrollaz, M., Labayrade, R., Gallen, R. & Aubert, D. (2007). A three resolution framework
for reliable road obstacle detection using stereovision, Proceedings of the IAPR
International Conference on Machine Vision and Applications, Tokyo, Japan, 2007.
Rombaut M. (1998). Decision in Multi-obstacle Matching Process using Theory of Belief,
AVCS’98, Amiens, France, 1998.
Royere, C., Gruyer, D., Cherfaoui V. (2000). Data association with believe theory,
FUSION’2000, Paris, France, 2000.
Scharstein, D. & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo
correspondence algorithms, International Journal of Computer Vision, 47(1-3):7–42,
2002.
Shafer G. (1976). A mathematical theory of evidence, Princeton University Press, 1976.
Skutek, M.; Mekhaiel, M. & Wanielik, M. (2003). Precrash system based on radar for
automotive applications, Proceedings of the IEEE Intelligent Vehicles Symposium,
Columbus, USA, June 2003.
Steux, B.; Laurgeau, C.; Salesse, L. & Wautier, D. (2002). Fade: A vehicle detection and
tracking system featuring monocular color vision and radar data fusion, Proceedings
of the IEEE Intelligent Vehicles Symposium, Versailles, France, June 2002.
Sugimoto, S.; Tateda, H.; Takahashi, H. & Okutomi M. (2004). Obstacle detection using
millimeter-wave radar and its visualization on image sequence, Proceedings of the
IAPR International Conference on Pattern Recognition, Cambridge, UK, 2004.
Toulminet, G.; Bertozzi, M.; Mousset, S.; Bensrhair, A. & Broggi, A. (2006). Vehicle detection by
means of stereo vision-based obstacles features extraction and monocular pattern
analysis, IEEE Transactions on Image Processing, 15(8):2364–2375, August 2006.
Von Arnim, A.; Perrollaz, M.; Bertrand, A. & Ehrlich, J. (2007). Vehicle identification using
infrared vision and applications to cooperative perception, Proceedings of the IEEE
Intelligent Vehicles Symposium, Istanbul, Turkey, June 2007.
Williamson, T. (1998). A High-Performance Stereo Vision System for Obstacle Detection. PhD
thesis, Carnegie Mellon University, 1998.
Yamaguchi, K.; Kato, T. & Ninomiya, Y. (2006). Moving obstacle detection using monocular
vision, Proceedings of the IEEE Intelligent Vehicles Symposium, Tokyo, Japan, June
2006.
17. Biometrics Sensor Fusion
1. Introduction
Performance of any biometric system depends entirely on the information that is acquired
from biometric characteristics (Jain et al., 2004). Several biometric systems have been developed
over the last two decades, and most are considered viable biometric tools
used for human identification and verification. However, negative constraints
that are often associated with biometric templates generally degrade the overall
performance and accuracy of biometric systems. In spite of that, many biometric
systems have been developed and implemented over the years and deployed successfully for user
authentication. Modality-based categorization of biometric systems is made on the
basis of the biometric traits used. When a single biometric trait is used for verification or
identification of acquired biometric characteristics/attributes, the system is called a uni-biometric
authentication system; when more than one biometric technology is used in fused
form for identification or verification, it is called multimodal biometrics. It has been seen
that, depending on the application context, mono-modal or multimodal biometric systems
can be used for authentication.
In biometrics, human identity verification systems seek considerable improvement in
reliability and accuracy. Several biometric authentication traits offer reasonable
performance in recognizing and identifying users; however, none
of them achieves hundred percent accuracy. Multibiometric systems remove some of
the drawbacks of uni-modal biometric systems by acquiring multiple sources of
information together in an augmented group, which has richer details. Such
biometric systems rely on more than one physiological or behavioral characteristic for
enrollment and verification/identification. Multimodal biometrics (Jain et al.,
2004) exist with various levels of fusion, namely sensor level, feature level, matching-score level
and decision level. Fusion at low level / sensor level by biometric image fusion is an
emerging area of research for biometric authentication.
emerging area of research for biometric authentication.
A multisensor multimodal biometric system that fuses information at a low level or sensor level of
processing is expected to produce more accurate results than systems that integrate
information at later stages, namely feature level and matching-score level, because of the
availability of richer and more relevant information.
Face and palmprint biometrics have been considered and accepted as among the most widely used
biometric traits, although the fusion of face and palmprint has not been studied at sensor level /
low level when compared with existing multimodal biometric fusion schemes. This is due to the
incompatible characteristics of face and palmprint images: a face image is processed
as holistic texture features on the whole face, or the face is divided into local regions, whereas a
palmprint consists of ridges and bifurcations along with three principal lines, making the two
difficult to integrate at the different levels of fusion in biometrics.
This chapter proposes a novel biometric-sensor-generated evidence fusion of face and
palmprint images using wavelet decomposition and a monotonic-decreasing graph for user
identity verification. Biometric image fusion at sensor level refers to a process that fuses
multispectral images, captured at different resolutions and by different biometric sensors, to
acquire richer and complementary information and produce a fused image in spatially
enhanced form. The SIFT operator is applied for invariant feature extraction from the fused
image, and the recognition of individuals is performed by adjustable structural graph
matching between a pair of fused images, searching corresponding points using a recursive-descent
tree traversal approach. The experimental results show that the proposed method,
with 98.19% accuracy, is found to be better than uni-modal face and palmprint
authentication, having recognition rates of 89.04% and 92.17% respectively, when all methods are
processed in the same feature space, i.e., the SIFT feature space.
The chapter is organized as follows. The next section introduces a few state-of-the-art biometric
sensor fusion methods for user authentication and recognition. Section 3 discusses the
process of multisensor biometric evidence fusion using wavelet decomposition and
transformation. Section 4 presents an overview of feature extraction using SIFT features
from the fused image. Structural graph matching for corresponding-point search is
analyzed in Section 5. Experimental results are discussed in Section 6 and the conclusion is
drawn in the last section.
infrared images for generating a fused face image. 2D log polar Gabor transform and local
binary pattern feature extraction algorithms are applied to the fused face image to extract
global and local facial features, respectively. The corresponding matching scores are fused
using Dezert Smarandache theory of fusion which is based on plausible and paradoxical
reasoning. The efficacy of the proposed algorithm is validated using the Notre Dame and
Equinox databases and is compared with existing statistical, learning, and evidence theory
based fusion algorithms.
Prior to image fusion, wavelet transforms are determined from the face and palmprint images.
The wavelet transform contains the low-high, high-low and high-high bands of the
face and palmprint images at different scales, together with the low-low bands of the images at the
coarsest level. The low-low band has all positive transform values, whereas the remaining bands
have transform values fluctuating around zero. The larger transform values in
these bands correspond to sharper brightness changes and thus to changes of salient
features in the image such as edges, lines, and boundaries. The proposed image fusion rule
selects the larger absolute value of the two wavelet coefficients at each point. A
fused image is therefore produced by performing an inverse wavelet transform based on the integration
of wavelet coefficients corresponding to the decomposed face and palmprint images. More
formally, the wavelet transform decomposes an image recursively into several frequency levels,
each containing transform values. Let I be a gray-scale image; after wavelet
decomposition, the first level would be

I^1 = { I_LL^1, I_LH^1, I_HL^1, I_HH^1 }

Generally, I_LL1 represents the base image, which contains coarse detail of positive transform
values, while the high-frequency details I_LH1, I_HL1 and I_HH1 represent the
vertical, horizontal and diagonal detail of transform values, respectively, and these details
fluctuate around zero. After the nth-level decomposition of the base image
in low frequency, the nth level would be the following:

I^n = { I_LL^n } ∪ { I_LH^k, I_HL^k, I_HH^k : k = 1, ..., n }

So, the nth level of decomposition consists of 3n+1 sub-image sequences. The 3n+1
sub-image sequences are then fused by applying different wavelet fusion rules on the low
and high frequency parts. Finally, an inverse wavelet transformation is performed to restore
the fused image. The fused image possesses good quality of relevant information for the face
and palm images. The generic wavelet-based decomposition and image fusion approach are
shown in Fig. 1 and Fig. 2, respectively.
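A minimal numpy sketch of this fusion scheme, using a single-level Haar transform for simplicity (the chapter does not specify the wavelet family): the detail bands are fused with the max-absolute-coefficient rule described above, while the approximation (low-low) band is averaged, one common choice for the low-frequency rule, assumed here.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform: returns (LL, LH, HL, HH)."""
    a = (img[0::2] + img[1::2]) / 2          # row averages
    d = (img[0::2] - img[1::2]) / 2          # row details
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    HL = (a[:, 0::2] - a[:, 1::2]) / 2
    LH = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2 (exact reconstruction)."""
    a = np.zeros((LL.shape[0], LL.shape[1] * 2))
    d = np.zeros_like(a)
    a[:, 0::2], a[:, 1::2] = LL + HL, LL - HL
    d[:, 0::2], d[:, 1::2] = LH + HH, LH - HH
    out = np.zeros((a.shape[0] * 2, a.shape[1]))
    out[0::2], out[1::2] = a + d, a - d
    return out

def fuse(img1, img2):
    """Average the LL band, keep the larger-|coefficient| detail bands."""
    c1, c2 = haar_dwt2(img1), haar_dwt2(img2)
    fused = [(c1[0] + c2[0]) / 2]
    for b1, b2 in zip(c1[1:], c2[1:]):
        fused.append(np.where(np.abs(b1) >= np.abs(b2), b1, b2))
    return haar_idwt2(*fused)

rng = np.random.default_rng(1)
a, b = rng.random((8, 8)), rng.random((8, 8))
fused = fuse(a, b)
print(fused.shape)  # (8, 8)
```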
scaled by some constant times of the other. To detect the local maxima and minima, each
feature point is compared with its 8 neighbours at the same scale and with its 9 neighbours
one scale up and one scale down. If this value is the minimum or maximum of all these
points, then this point is an extremum. More formally, if a DoG image is given as D(x, y, σ),
then

D(x, y, σ) = L(x, y, k_i σ) − L(x, y, k_j σ)   (3)

where L(x, y, kσ) is the convolution of the original image I(x, y) with the Gaussian blur G(x, y,
kσ) at scale kσ, i.e. L(x, y, kσ) = G(x, y, kσ) * I(x, y), with

G(x, y, σ) = (1 / (2πσ²)) e^{−(x² + y²) / (2σ²)}   (4)

From Equations (3) and (4) it can be concluded that a DoG image between
scales k_iσ and k_jσ is just the difference of the Gaussian-blurred images at scales k_iσ and k_jσ.
For scale-space extrema detection with the SIFT algorithm, the image is first convolved with
Gaussian blurs at different scales. The convolved images are grouped by octave (an octave
corresponds to a doubling of the value of σ), and the value of k_i is selected so that we obtain a
fixed number of convolved images per octave. The Difference-of-Gaussian images are then
taken from adjacent Gaussian-blurred images in each octave. Fig. 3 shows a difference-of-Gaussian
octave.
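One DoG octave can be sketched as follows, with a plain-numpy separable Gaussian blur (σ0 = 1.6 and s = 3 intervals per octave are conventional SIFT defaults, assumed here):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian convolution with reflect padding, numpy only."""
    r = int(3 * sigma + 0.5)
    xs = np.arange(-r, r + 1)
    k = np.exp(-xs ** 2 / (2 * sigma ** 2))
    k /= k.sum()                                   # normalized 1-D kernel
    pad = np.pad(img, r, mode='reflect')
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 0, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, k, 'valid'), 1, tmp)

def dog_octave(img, sigma0=1.6, s=3):
    """s + 2 DoG images from s + 3 progressively blurred images;
    blur levels are sigma0 * k**i with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    blurred = [gaussian_blur(img, sigma0 * k ** i) for i in range(s + 3)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

dogs = dog_octave(np.ones((16, 16)))   # a flat image has (near-)zero DoG
print(len(dogs))  # 5
```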
value of the difference-of-Gaussian pyramid is less than a threshold value, the point is excluded.
If there is a large principal curvature across the edge but a small curvature in the
perpendicular direction in the difference-of-Gaussian function, the poor extremum is localized
and eliminated.
First, for each candidate keypoint, interpolation of nearby data is used to accurately
determine its position. The initial approach was to locate each keypoint at the location and
scale of the candidate keypoint, while the newer approach calculates the interpolated location
of the extremum, which substantially improves matching and stability. The interpolation is
done using the quadratic expansion of the Difference-of-Gaussian scale-space function D(x,
y, σ), with the candidate keypoint as the origin. This Taylor expansion is given by:

D(p) = D + (∂D/∂p)ᵀ p + (1/2) pᵀ (∂²D/∂p²) p   (5)

where D and its derivatives are evaluated at the sample point and p = (x, y, σ)ᵀ is the offset
from this point. The location of the extremum, p̂, is determined by taking the derivative of
this function with respect to p and setting it to zero, giving

p̂ = −(∂²D/∂p²)⁻¹ (∂D/∂p)   (6)

If the offset p̂ is larger than 0.5 in any dimension, then it is an indication that the extremum
lies closer to another candidate keypoint. In this case, the candidate keypoint is changed and
the interpolation is performed about that point instead. Otherwise, the offset is added to the
candidate keypoint to get the interpolated estimate of the location of the extremum.
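Equation (6) amounts to solving a 3×3 linear system whose right-hand side and matrix are the finite-difference gradient and Hessian of D around the sample. A sketch (the array layout and the synthetic test function are illustrative):

```python
import numpy as np

def refine_offset(D, i, j, k):
    """Solve eq. (6): p_hat = -(d2D/dp2)^-1 (dD/dp), with derivatives of
    the DoG stack D (indexed x, y, scale) taken by central differences
    around the sample (i, j, k)."""
    g = 0.5 * np.array([D[i+1, j, k] - D[i-1, j, k],
                        D[i, j+1, k] - D[i, j-1, k],
                        D[i, j, k+1] - D[i, j, k-1]])
    H = np.empty((3, 3))
    H[0, 0] = D[i+1, j, k] - 2 * D[i, j, k] + D[i-1, j, k]
    H[1, 1] = D[i, j+1, k] - 2 * D[i, j, k] + D[i, j-1, k]
    H[2, 2] = D[i, j, k+1] - 2 * D[i, j, k] + D[i, j, k-1]
    H[0, 1] = H[1, 0] = 0.25 * (D[i+1, j+1, k] - D[i+1, j-1, k]
                                - D[i-1, j+1, k] + D[i-1, j-1, k])
    H[0, 2] = H[2, 0] = 0.25 * (D[i+1, j, k+1] - D[i+1, j, k-1]
                                - D[i-1, j, k+1] + D[i-1, j, k-1])
    H[1, 2] = H[2, 1] = 0.25 * (D[i, j+1, k+1] - D[i, j+1, k-1]
                                - D[i, j-1, k+1] + D[i, j-1, k-1])
    return -np.linalg.solve(H, g)

# sanity check: a synthetic quadratic with its extremum at (1.3, 0.8, 1.1),
# i.e. offset (0.3, -0.2, 0.1) from the grid sample (1, 1, 1)
X, Y, S = np.meshgrid(np.arange(3.0), np.arange(3.0), np.arange(3.0),
                      indexing='ij')
D = -((X - 1.3) ** 2 + (Y - 0.8) ** 2 + (S - 1.1) ** 2)
off = refine_offset(D, 1, 1, 1)
print(np.round(off, 3))  # [ 0.3 -0.2  0.1]
```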
m(x, y) = sqrt( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )   (7)
An orientation histogram is formed from the gradient orientations of sample points within a
region around the keypoint. The orientation histogram has 36 bins covering the 360 degree
range of orientations. Each sample added to the histogram is weighted by its gradient
magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times that of
the scale of the keypoint.
Fig. 4. A keypoint descriptor created by the gradient magnitude and the orientation at each
point in a region around the keypoint location.
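The gradient magnitude of Equation (7), the corresponding orientation, and the 36-bin histogram can be sketched as follows (a minimal version; the Gaussian spatial window is left as an optional argument):

```python
import numpy as np

def gradient_mag_ori(L):
    """Gradient magnitude (eq. 7) and orientation in degrees, via the
    same pixel differences L(x+1,y)-L(x-1,y) and L(x,y+1)-L(x,y-1)."""
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[1:-1, :] = L[2:, :] - L[:-2, :]
    dy[:, 1:-1] = L[:, 2:] - L[:, :-2]
    mag = np.sqrt(dx ** 2 + dy ** 2)
    ori = np.degrees(np.arctan2(dy, dx)) % 360.0
    return mag, ori

def orientation_histogram(mag, ori, spatial_weight=None):
    """36-bin (10 degrees each) histogram; every sample is weighted by
    its gradient magnitude and, optionally, a Gaussian spatial window."""
    w = mag if spatial_weight is None else mag * spatial_weight
    hist = np.zeros(36)
    np.add.at(hist, (ori // 10).astype(int) % 36, w)
    return hist

# a pure intensity ramp: all interior gradients share one orientation
L = np.tile(np.arange(10.0), (10, 1))
hist = orientation_histogram(*gradient_mag_ori(L))
print(hist.argmax())  # 9, i.e. the 90-100 degree bin
```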
Fig. 5. (a) Fused image; (b) fused image with extracted SIFT features.
In the proposed work, the fused image is normalized by histogram equalization, and after
normalization invariant SIFT features are extracted from the fused image. Each feature
point is composed of four types of information: spatial location (x, y), scale (S), orientation
(θ) and keypoint descriptor (K). For the sake of the experiment, only the keypoint descriptor
information has been used, which consists of a vector of 128 elements representing the
neighbourhood intensity changes of the current point. More formally, local image gradients are
measured at the selected scale in the region around each keypoint. The measured gradient
information is then transformed into a vector representation containing 128
elements for each keypoint, calculated over the extracted keypoints. These keypoint descriptor
vectors represent local shape distortions and illumination changes. Fig. 5 shows the SIFT features
extracted from the fused image.
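Such 128-element descriptors are commonly compared by Euclidean distance. The sketch below shows plain nearest-neighbour matching with Lowe's ratio test, a simpler baseline than the structural-graph matching actually used in this chapter; the synthetic descriptors are invented for the example.

```python
import numpy as np

def match_descriptors(query, gallery, ratio=0.8):
    """Nearest-neighbour matching of 128-element descriptors with
    Lowe's ratio test: keep a match only when the best distance is
    clearly smaller than the second best."""
    matches = []
    for i, d in enumerate(query):
        dist = np.linalg.norm(gallery - d, axis=1)
        best, second = np.argsort(dist)[:2]
        if dist[best] < ratio * dist[second]:
            matches.append((i, int(best)))
    return matches

rng = np.random.default_rng(2)
gallery = rng.random((5, 128))
query = gallery[1:2] + 0.001          # a slightly perturbed copy of item 1
print(match_descriptors(query, gallery))  # [(0, 1)]
```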
The next section discusses the matching technique, in which a structural graph establishes
correspondence between a pair of fused biometric images by searching a pair of point sets
using a recursive descent tree traversal algorithm (Cheng et al., 1991).
Equation (9) is used for testing the closeness between a pair of edges using the edge threshold ε.
Traversal is possible when p_i corresponds to g_1 and p_j corresponds to g_2, or
conversely, p_j to g_1 and p_i to g_2. Traversal can be initiated from the first
edge (p_i, p_j), and by visiting n feature points we can generate a matching graph
P′ = (p_1′, p_2′, p_3′, ..., p_m′) on the fused probe image, which should be a corresponding
candidate graph of G. In each recursive traversal, a new candidate graph P_i′ is found. At
the end of the traversal algorithm, a set of candidate graphs P_i′ = (p_1i′, p_2i′, p_3i′, ..., p_mi′),
i = 1, 2, ..., m, is found, all of which have an identical number of feature points.
Considering the minimal k-th order error from G, the final optimal
graph P″ can be found from the set of candidate graphs P_i′, and we can write

|P″ − G|_k = Σ_{i=2..m} Σ_{j=1..min(k, i−1)} | d(p_i′, p_{i−j}′) − d(g_i, g_{i−j}) |,  k = 1, 2, 3, ..., m   (11)

Equation (11) denotes the sum of all differences between pairs of edges corresponding to a
pair of graphs. This sum can be treated as the final dissimilarity value for a pair of graphs, and
also for a pair of fused images. It is observed that when k is large, a correspondence with less error
is found. This is not always true as long as we have a good choice of the
edge threshold ε, although for larger k more comparisons are needed. For identity
verification of a person, a client-specific threshold has been determined heuristically for each
user; the final dissimilarity value is then compared with the client-specific threshold and the
decision is made.
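The dissimilarity of Equation (11) can be sketched directly. Here graphs are given as ordered lists of 2-D keypoint locations and d is the Euclidean distance; the edge-threshold traversal that generates the candidate graphs is omitted:

```python
import numpy as np

def graph_dissimilarity(P, G, k=2):
    """|P'' - G|_k of eq. (11): sum, over nodes i >= 2 and j up to
    min(k, i-1), of the absolute difference between corresponding edge
    lengths d(p_i, p_{i-j}) and d(g_i, g_{i-j})."""
    d = lambda a, b: float(np.linalg.norm(np.subtract(a, b)))
    return sum(abs(d(P[i], P[i - j]) - d(G[i], G[i - j]))
               for i in range(1, len(P))
               for j in range(1, min(k, i) + 1))

P = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
G = [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (7.0, 6.0)]   # P translated
print(graph_dissimilarity(P, G))       # 0.0: edge lengths are identical
print(graph_dissimilarity(P, [(2 * x, 2 * y) for x, y in P]) > 0)  # True
```

Because only pairwise distances enter the sum, the measure is invariant to translation and rotation of the whole point set, which is the property the graph matcher relies on.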
6. Experiment Results
The experiment is carried out on a multimodal database of face and palmprint images
collected at IIT Kanpur, which consists of 750 face images and 750 palmprint images of 150
individuals. Face images are captured under a controlled environment with ±20° changes of head
pose, with almost uniform lighting and illumination conditions, and with almost
consistent facial expressions. For the sake of the experiment, the cropped frontal view of the face
has been taken, covering the face portion only. For the palmprint database, the cropped palm portion
has been taken from each palmprint image, containing the three principal lines, ridges and
bifurcations. The proposed multisensor biometric evidence fusion method is considered a
semi-sensor fusion approach with some minor adjustable corrections in terms of cropping
and registration. The face and palmprint images generated by the biometric sensors are fused at low
level using wavelet decomposition and fusion of the decompositions. After fusion of the
cropped face and palmprint images of 200×220 pixels, the resolution of the fused image has
been set to 72 dpi. The fused image is then pre-processed using histogram equalization.
Finally, matching is performed between a pair of fused images via structural graphs
drawn on both the gallery and probe fused images using the extracted SIFT keypoints.
[Fig. 6 plot: accept rate versus false accept rate (log scale) for multisensor biometric fusion, palmprint-SIFT matching and face-SIFT matching.]
Fig. 6. ROC curves (in 'stairs' form) for the different methods.
Matching has been carried out for each method, and the results show that fusion performance
at the semi-sensor level / low level is superior when compared with the two
mono-modal methods, namely palmprint verification and face recognition, drawn on the same
feature space. Multisensor biometric fusion produces 98.19% accuracy, while face
recognition and palmprint recognition produce 89.04% and 92.17% accuracy
respectively, as shown in Fig. 6. The ROC curves in Fig. 6 illustrate the
trade-off between accept rate and false accept rate for each modality, namely
multisensor biometric evidence fusion, palmprint matching and face matching.
7. Conclusion
A novel and efficient method of multisensor biometric image fusion of face and palmprint
for personal authentication has been presented in this chapter. High-resolution multisensor
face and palmprint images are fused using a wavelet decomposition process, and matching is
performed by a monotonic-decreasing graph drawn on invariant SIFT features. For matching,
correspondence has been established by searching feature points on a pair of fused images
using a recursive tree traversal algorithm. To verify the identity of a person,
tests have been performed on the IITK multimodal database consisting of face and palmprint
samples. The results show that the proposed method, operating at the low level / semi-sensor
level, is robust, computationally efficient and less sensitive to unwanted noise, confirming the
validity and efficacy of the system when compared with mono-modal biometric
recognition systems.
Fusion of Odometry and Visual Datas to Localization a Mobile Robot 407
18
1. Introduction
Applications involving wheeled mobile robots have been growing significantly in recent
years thanks to their ability to move freely through the workspace, limited only by obstacles.
Moreover, the wheels allow for convenient transportation in planar environments
and give the robot static stability.
In the context of autonomous navigation of robots, the localization problem stands out.
From accumulated knowledge about the environment and using the current readings of
the sensors, the robot must be able to determine and keep track of its position and orientation
with respect to this environment, even if the sensors are subject to errors and/or noise. In other words,
to localize a robot it is necessary to determine its pose (position and orientation) in the
workspace at a given time.
Borenstein et al. (1997) have classified localization methods into two broad categories:
relative localization methods, which give the robot's pose relative to the initial one, and
absolute localization methods, which indicate the global pose of the robot and do not require
previously calculated poses.
As far as wheeled robots are concerned, it is common to use encoders linked to the wheel rotation
axes, a technique known as odometry. However, the basic idea of odometry is the
integration of motion information over a period of time, which leads to the
accumulation of errors (Park et al., 1998). The techniques of absolute localization use
landmarks to locate the robot. These landmarks can be artificial, introduced into
the environment with the aim of assisting the localization of the robot, or natural, when
they already exist in the environment.
It is important to note that even absolute localization techniques are inaccurate due to noise
from the sensors used. Aiming to obtain the pose of the robot with the smallest error
parameters of the lines detected by the Hough transform directly in the Kalman equations, without any
intermediate calculation stage. Figure 1 shows the scheme of the proposed system.
The Discrete Kalman Filter (DKF) models the system by the linear equations (1):

s_t = A·s_{t-1} + B·u_{t-1} + γ
z_t = C·s_t + ε        (1)

in which s ∈ R^n is the state vector; u ∈ R^l is the vector of input signals; z ∈ R^m is the vector of
measurements; the n × n matrix A is the state transition matrix; B, n × l, is the
input coefficient matrix; the m × n matrix C is the observation matrix; γ ∈ R^n represents the
process noise vector and ε ∈ R^m the measurement error vector. Indexes
t and t-1 represent the present and the previous instants of time.
The Filter operates in a prediction-update cycle, taking into account the statistical
properties of the noise. An internal model of the system is used for the prediction, while a
feedback scheme incorporates the measurements. The prediction and
update phases of the DKF can be described by the Systems of Equations (2) and (3), respectively:

μ̄_t = A·μ_{t-1} + B·u_{t-1}
Σ̄_t = A·Σ_{t-1}·A^T + R        (2)

K_t = Σ̄_t·C^T·(C·Σ̄_t·C^T + Q)^{-1}
μ_t = μ̄_t + K_t·(z_t − C·μ̄_t)
Σ_t = (I − K_t·C)·Σ̄_t        (3)

The Kalman Filter represents the state vector s_t by its mean μ_t and covariance Σ_t. Matrices
R, n × n, and Q, m × m, are the covariance matrices of the process noise (γ) and
measurement noise (ε), respectively, and matrix K, n × m, represents the gain of the system.
For non-linear systems, the Extended Kalman Filter (EKF) is used, with the model written as System (4):

s_t = g(u_{t-1}, s_{t-1}) + γ
z_t = h(s_t) + ε        (4)

in which g(u_{t-1}, s_{t-1}) is a non-linear function representing the model of the system, and h(s_t) is
a non-linear function representing the model of the measurements. The prediction and
update phases can be obtained by the Systems of Equations (5) and (6), respectively:

μ̄_t = g(u_{t-1}, μ_{t-1})
Σ̄_t = G_t·Σ_{t-1}·G_t^T + R        (5)

K_t = Σ̄_t·H_t^T·(H_t·Σ̄_t·H_t^T + Q)^{-1}
μ_t = μ̄_t + K_t·(z_t − h(μ̄_t))
Σ_t = (I − K_t·H_t)·Σ̄_t        (6)

The matrix G, n × n, is the Jacobian that linearizes the model, and H, m × n, is the Jacobian
that linearizes the measurement vector. These matrices are defined by Equations (7) and (8).
G_t = ∂g(u_{t-1}, μ_{t-1}) / ∂s_{t-1}        (7)

H_t = ∂h(μ̄_t) / ∂s_t        (8)
3. Modeling
3.1 Prediction phase: process model
Traditionally, the behavior of the robot motion is described by its dynamic model. Modeling
this type of system is quite complex because there are many variables involved (masses and
moments of inertia, friction, actuators, etc.), and even the most elaborate models cannot
faithfully portray the behavior of the robot motion.
A classic method used to calculate the pose of a robot is odometry. This method uses
sensors, such as optical encoders, which measure the rotation of the robot's wheels.
Using the kinematic model of the robot, its pose is calculated by means of the integration of
its movements from a reference frame.
As encoders are sensors, their readings would normally be incorporated in the update
phase of the Kalman Filter, not in the prediction phase. Thrun et al. (2005) propose that
odometry readings not be treated as sensor measurements; rather, they suggest
incorporating them into the robot's motion model. For this proposal to be implemented, one
must use a kinematic model of the robot that takes the angular displacements of the wheels as
the input signals of the prediction phase of the Kalman Filter.
Consider a robot with differential drive in which the control signals applied to its
actuators are not voltages, but angular displacements, according to Figure 2.
With this idea, and supposing that the speeds are constant within the sampling period, one can
determine the geometric model of the robot's movement (System 9):
x_t = x_{t-1} + (ΔL/Δθ)·[sin(θ_{t-1} + Δθ) − sin(θ_{t-1})]
y_t = y_{t-1} − (ΔL/Δθ)·[cos(θ_{t-1} + Δθ) − cos(θ_{t-1})]
θ_t = θ_{t-1} + Δθ        (9)
To ease the readability of System (9), which represents the odometry model of the
robot, two auxiliary variables have been employed, ΔL and Δθ (Equation 10):

ΔL = (r_R·Δθ_R + r_L·Δθ_L) / 2
Δθ = (r_R·Δθ_R − r_L·Δθ_L) / b        (10)
in which Δθ_R is the reading of the right encoder, functioning as the angular displacement
applied to the right wheel; Δθ_L is the reading of the left encoder, functioning
as the angular displacement applied to the left wheel; b represents the distance from wheel
to wheel of the robot; and r_R and r_L are the radii of the right and left wheels, respectively.
It is important to emphasize that in real applications the angular displacement effectively
realized by a wheel differs from that measured by the encoder. Besides that, the
supposition that the speeds are constant within the sampling period, which has been used to
obtain model (9), is not always true. Hence, there are differences between the actual angular
displacements of the wheels (Δθ'_R and Δθ'_L) and those measured by the encoders (Δθ_R
and Δθ_L). This difference is modeled by Gaussian noise, according to System (11).
Δθ'_R = Δθ_R + ε_R
Δθ'_L = Δθ_L + ε_L        (11)
It is known that odometry has cumulative error. Therefore, the noises ε_R and ε_L
do not have constant variance. It is presumed that these noises have a standard
deviation proportional to the module of the measured displacement. With these new
considerations, System (9) is now represented by System (12):
x_t = x_{t-1} + (ΔL'/Δθ')·[sin(θ_{t-1} + Δθ') − sin(θ_{t-1})]
y_t = y_{t-1} − (ΔL'/Δθ')·[cos(θ_{t-1} + Δθ') − cos(θ_{t-1})]
θ_t = θ_{t-1} + Δθ'        (12)
in which:

ΔL' = (r_R·Δθ'_R + r_L·Δθ'_L) / 2
Δθ' = (r_R·Δθ'_R − r_L·Δθ'_L) / b        (13)
One should observe that this model cannot be used when Δθ' = 0. When that occurs, a
simpler odometry model (System 14), obtained from the limit of System (12) when Δθ' → 0,
is used:

x_t = x_{t-1} + ΔL'·cos(θ_{t-1})
y_t = y_{t-1} + ΔL'·sin(θ_{t-1})
θ_t = θ_{t-1}        (14)
Thrun's idea implies a difference with respect to System (4): the noise is not
additive; rather, it is incorporated into the function that describes the model, as System
(15) shows:

s_t = g(u_{t-1} + ε, s_{t-1})        (15)

in which u_{t-1} = [Δθ_R, Δθ_L]^T holds the encoder readings and ε = [ε_R, ε_L]^T the
wheel-displacement noise of System (11).
It is necessary, however, to change the prediction phase of System (5), resulting
in the System (16) equations:

μ̄_t = g(u_{t-1}, μ_{t-1})
Σ̄_t = G_t·Σ_{t-1}·G_t^T + V_t·M_t·V_t^T        (16)

in which M, l × l, is the covariance matrix of the sensor noise (ε) and V, n × l, is the
Jacobian mapping the sensor noise to the state space. Matrix V is defined by Equation (17):

V_t = ∂g(u_{t-1}, μ_{t-1}) / ∂u_{t-1}        (17)
Making use of the odometry model of the robot described in this section and the definitions
of the matrices used by the Kalman Filter, we have, writing θ' = θ_{t-1} + Δθ':

G = | 1   0   (ΔL'/Δθ')·[cos(θ') − cos(θ_{t-1})] |
    | 0   1   (ΔL'/Δθ')·[sin(θ') − sin(θ_{t-1})] |
    | 0   0   1                                   |        (18)

V = | k1·cos(θ') − k2·[sin(θ') − sin(θ_{t-1})]    −k1·cos(θ') + k3·[sin(θ') − sin(θ_{t-1})] |
    | k1·sin(θ') + k2·[cos(θ') − cos(θ_{t-1})]    −k1·sin(θ') − k3·[cos(θ') − cos(θ_{t-1})] |
    | r/b                                          −r/b                                      |        (19)

M = | (σ_R·|Δθ'_R|)²   0               |
    | 0                (σ_L·|Δθ'_L|)²  |        (20)
Elements m11 and m22 in Equation (20) represent the fact that the standard deviations of
ε_R and ε_L are proportional to the module of the angular displacement. The variables k1, k2
and k3 are given by System (21), considering r_R = r_L = r.
k1 = r·(Δθ'_R + Δθ'_L) / (2·(Δθ'_R − Δθ'_L))
k2 = b·Δθ'_L / (Δθ'_R − Δθ'_L)²
k3 = b·Δθ'_R / (Δθ'_R − Δθ'_L)²        (21)
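The prediction phase with noise injected through the wheel displacements (System 16) can be sketched as follows. The Jacobian expressions coded below are our own derivation from the odometry model, and the radius, axle length and noise parameters are illustrative assumptions:

```python
import numpy as np

def ekf_predict_odometry(mu, Sigma, d_r, d_l, r=0.1, b=0.3,
                         sig_r=0.05, sig_l=0.05):
    """EKF prediction (System 16) for the odometry model; assumes the
    arc-motion case (d_r != d_l) for brevity."""
    x, y, th = mu
    dL = r * (d_r + d_l) / 2.0
    dTh = r * (d_r - d_l) / b
    th2 = th + dTh
    # mean propagation (System 12)
    mu_bar = np.array([x + (dL / dTh) * (np.sin(th2) - np.sin(th)),
                       y - (dL / dTh) * (np.cos(th2) - np.cos(th)),
                       th2])
    # G: Jacobian with respect to the state (Equation 18)
    G = np.array([[1, 0, (dL / dTh) * (np.cos(th2) - np.cos(th))],
                  [0, 1, (dL / dTh) * (np.sin(th2) - np.sin(th))],
                  [0, 0, 1]])
    # V: Jacobian with respect to the wheel displacements (Equation 19)
    k1 = r * (d_r + d_l) / (2 * (d_r - d_l))
    k2 = b * d_l / (d_r - d_l) ** 2
    k3 = b * d_r / (d_r - d_l) ** 2
    V = np.array([[k1 * np.cos(th2) - k2 * (np.sin(th2) - np.sin(th)),
                   -k1 * np.cos(th2) + k3 * (np.sin(th2) - np.sin(th))],
                  [k1 * np.sin(th2) + k2 * (np.cos(th2) - np.cos(th)),
                   -k1 * np.sin(th2) - k3 * (np.cos(th2) - np.cos(th))],
                  [r / b, -r / b]])
    # M: wheel-noise covariance (Equation 20)
    M = np.diag([(sig_r * abs(d_r)) ** 2, (sig_l * abs(d_l)) ** 2])
    Sigma_bar = G @ Sigma @ G.T + V @ M @ V.T
    return mu_bar, Sigma_bar
```

Starting the filter with zero state covariance isolates the V·M·V^T term, which can then be checked numerically against finite differences of the motion model.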
Due to the choice of straight lines as landmarks, the technique adopted to identify them
was the Hough transform [Hough, 1962]. This kind of transform is a method employed to
identify within a digital image a class of geometric forms that can be represented by a
parametric curve [Gonzalez, 2007]. In the case of straight lines, a mapping is
provided between the Cartesian space (X, Y) and the parameter space (ρ, α) in which
the straight line is defined.
Hough defines a straight line using its normal representation, shown in Equation (22), in
which the parameter ρ represents the length of the normal vector and α the angle this vector forms
with the X axis. Figure 4 shows the geometric representation of these parameters.

ρ = x·cos(α) + y·sin(α)        (22)
The robot navigates in an environment where the positions of the lines in the world are known,
and at every step it identifies the descriptors of the lines contained in the image, ρ_I and α_I. These
descriptors are mapped to the plane of a mobile coordinate system, obtaining ρ_M and α_M.
This transformation is simple and relies only on the correct calibration of the camera parameters.
Figure 5 illustrates the coordinate systems used in the mathematical deduction of the sensor
model.
We define a fixed coordinate system (F) and a mobile one (M), attached to the robot, both
illustrated in Figure 5. The origin of the mobile system has coordinates (x_M^F, y_M^F) in the fixed
system, and θ_M^F represents the rotation of the mobile system with respect to the fixed one. One
should note that there is a direct relation between these variables (x_M^F, y_M^F, θ_M^F) and the
robot's pose (x_t, y_t, θ_t), which is given by Equations (23).
x_M^F = x_t        y_M^F = y_t        θ_M^F = θ_t + π/2        (23)
We use the relation between coordinates in the (M) and (F) systems (System 24) and
Equation (22) written in both coordinate systems (Equations 25 and 26):

x^M = (x^F − x_M^F)·cos(θ_M^F) + (y^F − y_M^F)·sin(θ_M^F)
y^M = −(x^F − x_M^F)·sin(θ_M^F) + (y^F − y_M^F)·cos(θ_M^F)        (24)

ρ_F = x^F·cos(α_F) + y^F·sin(α_F)        (25)

ρ_M = x^M·cos(α_M) + y^M·sin(α_M)        (26)
By replacing Equations (24) in Equation (25), making the necessary equivalences with
Equation (26) and replacing some variables using Equations (23), we obtain the Systems (27)
and (28), which represent two possible sensor models h(.) to be used in the filter. To decide
which model to use, we calculate both values of ρ_M and use the model that
generates the value closer to the measured one.
ρ_M = ρ_F − x_t·cos(α_F) − y_t·sin(α_F)
α_M = α_F − θ_M^F        (27)

ρ_M = −ρ_F + x_t·cos(α_F) + y_t·sin(α_F)
α_M = α_F − θ_M^F + π        (28)
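The choice between the two candidate sensor models can be sketched as below; the fixed rotation between the robot heading and the mobile frame (`theta_offset`) is an assumption of this sketch:

```python
import math

def predict_line_measurement(pose, rho_F, alpha_F, theta_offset=math.pi / 2):
    """Two candidate sensor models h(.) (Systems 27 and 28) for a known
    world line (rho_F, alpha_F). theta_offset is the fixed rotation between
    robot heading and mobile frame (Equations 23), assumed pi/2 here."""
    x, y, theta = pose
    theta_M = theta + theta_offset
    # System (27)
    h27 = (rho_F - x * math.cos(alpha_F) - y * math.sin(alpha_F),
           alpha_F - theta_M)
    # System (28): same line with the normal pointing the other way
    h28 = (-rho_F + x * math.cos(alpha_F) + y * math.sin(alpha_F),
           alpha_F - theta_M + math.pi)
    return h27, h28

def choose_model(pose, rho_F, alpha_F, rho_measured):
    """Pick the model whose predicted rho is closer to the measured one."""
    h27, h28 = predict_line_measurement(pose, rho_F, alpha_F)
    if abs(h27[0] - rho_measured) <= abs(h28[0] - rho_measured):
        return h27
    return h28
```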
The sensor model is incorporated into the EKF through the matrix H (Equation 8). The
representation of H obtained from System (27) is given by Equation (29) and, using
System (28), H is described by Equation (30).
H = | −cos(α_F)   −sin(α_F)   x_t·sin(α_F) − y_t·cos(α_F) |
    | 0            0           −1                          |        (29)

H = | cos(α_F)    sin(α_F)    −x_t·sin(α_F) + y_t·cos(α_F) |
    | 0           0           −1                            |        (30)
4. Image Processing
4.1 Detection of lines
Due to the choice of floor lines as landmarks, the technique adopted to identify them was
the Hough transform [Hough, 1962]. The purpose of this technique is to find imperfect
instances of objects within a certain class of shapes by a voting procedure. This voting
procedure is carried out in a parameter space, from which object candidates are obtained as
local maxima in an accumulator grid that is constructed by the algorithm for computing the
Hough transform [Bradski and Kaehler, 2008].
In our case, the shapes are lines described by Equation (22) and the parameter space has
coordinates (ρ, α). The images are captured in grayscale and converted to black and white
using the Canny edge detector [Canny, 1986]. Figure 6.a shows a typical image of the floor,
Figure 6.b shows the image after applying the Canny detector, and Figure 6.c shows the lines
detected by the Hough transform.
Fig. 6. Image processing: (a) original floor image; (b) Canny edges; (c) lines detected by the Hough transform.
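The voting procedure described above can be illustrated with a minimal accumulator-based Hough transform (a didactic sketch, not the OpenCV implementation referenced in the text):

```python
import numpy as np

def hough_lines(edge_points, rho_res=1.0, theta_res=np.pi / 180, rho_max=400):
    """Minimal Hough voting scheme for lines rho = x*cos(a) + y*sin(a).
    edge_points: iterable of (x, y) edge pixels. Returns the accumulator
    and the (rho, alpha) of its global maximum."""
    thetas = np.arange(0, np.pi, theta_res)
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((n_rho, len(thetas)), dtype=np.int32)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in edge_points:
        rhos = x * cos_t + y * sin_t                   # rho for every theta
        idx = np.round((rhos + rho_max) / rho_res).astype(int)
        acc[idx, np.arange(len(thetas))] += 1          # one vote per theta
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, (i * rho_res - rho_max, thetas[j])
```

Collinear edge pixels all vote for the same (ρ, α) cell, which therefore emerges as a local maximum of the accumulator.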
s·[x^M  y^M  1]^T = A·[x^I  y^I  1]^T        (31)
The scale factor s is determined for each point in such a way that the value of the third
element of the vector is always 1. The homography can be calculated off-line by using a
pattern containing 4 or more remarkable points with known coordinates (see Figure 7.a).
After detecting the remarkable points in the image, we have several correspondences
between point coordinates in the mobile coordinate system M and in the image. Replacing
these points in Equation (31), we obtain a linear system from which we can determine the 8
elements of the homography matrix A.
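The linear system mentioned above can be sketched as follows, fixing a33 = 1 so that the remaining 8 elements of A are the unknowns (the helper names are hypothetical, not from the chapter):

```python
import numpy as np

def estimate_homography(img_pts, world_pts):
    """Estimate the 8 unknown elements of A (Equation 31, a33 fixed to 1)
    from >= 4 point correspondences image -> mobile frame."""
    rows, rhs = [], []
    for (xi, yi), (xm, ym) in zip(img_pts, world_pts):
        # s*[xm, ym, 1]^T = A [xi, yi, 1]^T with the scale s eliminated
        rows.append([xi, yi, 1, 0, 0, 0, -xm * xi, -xm * yi]); rhs.append(xm)
        rows.append([0, 0, 0, xi, yi, 1, -ym * xi, -ym * yi]); rhs.append(ym)
    a, *_ = np.linalg.lstsq(np.array(rows, float), np.array(rhs, float),
                            rcond=None)
    return np.append(a, 1.0).reshape(3, 3)

def apply_homography(A, pt):
    """Map an image point to the mobile frame, normalizing by the scale s."""
    v = A @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]
```

With exact correspondences the least-squares solution reproduces the generating homography.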
Fig. 7. Calibration pattern.
Once the homography has been calculated, for each detected line we do the following: a) using
the values of (ρ_I, α_I) obtained by the Hough transform, calculate two points
belonging to the image line; b) convert the coordinates of these two points to the mobile
coordinate system M using A; c) determine the (ρ_M, α_M) of the line that passes through these
two points.
To verify the correctness of the homography found, we calculated the re-projection error
using the points detected in the image and their world counterparts. The average error
was e = 1.5 cm. To facilitate the interpretation of this value, the figure shows a
circle of radius e drawn on the pattern used.
5. Results
The experiments were carried out using the Karel robot, a reconfigurable mobile platform
built in our laboratory that has a webcam and a laptop for information processing
coupled to its structure (Figure 3). The robot has two wheels that are driven by DC motors
with differential steering. Each motor has an optical encoder and a dedicated card based on
a PIC microcontroller that performs local velocity control. The cards communicate with the computer
through a CAN bus, receiving the desired wheel velocities and returning the encoder data.
To validate the proposed system, results were obtained in two different environments: one
containing only one pattern of lines and another containing two patterns of lines. The first
experiment was carried out by making the robot navigate in an environment with
vertical lines on the floor: (I) the robot was commanded to move forward 25 m, (II) rotate
90 degrees around its own axis, (III) move forward 5 m, (IV) rotate 180 degrees around its own axis,
(V) move forward 5 m, (VI) rotate 90 degrees around its own axis and, finally, (VII) move forward
25 m. Figure 9 shows the map of the environment and the task commanded to the robot.
In this experiment, 1962 images were processed during the robot's full navigation, and
the matching process was successful in 93% of the cases. Each line was observed,
on average, 23 times.
In this work the sensors used have different sampling rates. We decided to read the encoders
in coordination with the image capture. The camera captures 640 × 480 images
(Figure 6) and each image is processed in 180 ms, on average. Figure 10 shows the graphs
of the acquisition time (image and encoders), the processing time and the total time of the system,
including acquisition, processing and the calculations of the localization algorithm. The average
acquisition time was 50 ms, the average processing time was 125 ms and the average total time of the
system was 180 ms. The peaks in the graph appear after the first turning
motion of the robot (II), or after it enters a new corridor with different lighting.
Regarding the homography, Figure 7.a shows the pattern that was used at the beginning of the
experiment to calculate it. The camera was positioned so that the
viewing angle covered about twice the size of the robot. It is important to remember that the
camera position is such that the image plane is not parallel to the floor plane. Equation (23)
shows the homography matrix used.
Besides the proposed system, another localization system was also implemented: a localization
system using geometric correction. In this system, at every step, the lines are identified and
used to calculate the robot pose using trigonometry. When no lines are identified, the
robot pose is calculated by odometry. Figure 11 shows the trajectories calculated using the EKF,
Geometric Correction and Odometry. It is easy to see that the behavior of the system based
on the Kalman filter (the proposed system) was more satisfactory. The final error, measured in loco,
was 0.27 m for the system using the EKF, 0.46 m for the geometric correction system and 0.93 m
using only odometry.
In this second experiment, the matching process was successful in 95% of the cases. Considering
the robot's full navigation, 2220 images were processed, and lines were observed in 87% of the
steps (one line in 61% and two lines in 26%). The final error, measured in loco, was 0.16 m,
lower than that found in the first experiment, which allowed us to infer that, for greater precision
of the proposed system, it is not enough to have just many lines in the environment; rather,
they must be concurrent (non-parallel).
7. References
Aiube, F. , Baidya T. and Tito, E. (2006), Processos estocásticos dos preços das commodities:
uma abordagem através do filtro de partículas, Brazilian Journal of Economics,
Vol.60, No.03, Rio de Janeiro, Brasil.
Amarasinghe, D., Mann, G. and Gosine, R. (2009), Landmark detection and localization for
mobile robot applications: a multisensor approach, Robotica Cambridge.
Bezerra, C. G. (2004), Localização de um robô móvel usando odometria e marcos naturais.
Master Thesis, Federal University of Rio Grande do Norte, Natal, RN, Brasil.
Borenstein, J., Everett, H., Feng, L., and Wehe, D. (1997), Mobile robot positioning: Sensors
and techniques. Journal of Robotic Systems, pp. 231–249.
Bradski, G. and Kaehler, A. (2008), Learning OpenCV: Computer Vision with the OpenCV
Library, O'Reilly Media.
Canny, J. (1986), A computational approach to edge detection, IEEE Trans. Pattern Analysis
and Machine Intelligence, pp. 679 -698.
Gonzalez, R. C. and Woods, R. E. (2007), Digital Image Processing. Prentice Hall.
Hough, P. V. C. (1962), Method and means for recognizing complex patterns, US Patent
3069654, Dec. 18.
Kalman, R. E. (1960), A new approach to linear filtering and prediction problems,
Transactions of the ASME, Journal of Basic Engineering.
Kiriy, E. and Buehler, M. (2002), Three-state extended Kalman filter for mobile robot
localization. Report Centre for Intelligent Machines - CIM, McGill University.
Launay, F., Ohya, A., and Yuta, S. (2002), A corridors lights based navigation system
including path definition using a topologically corrected map for indoor mobile
robots. IEEE International Conference on Robotics and Automation, pp.3918-3923.
Marzorati, D., Matteucci, M., Migliore, D. and Sorrenti, D. (2009), On the Use of Inverse
Scaling in Monocular SLAM, IEEE Int. Conference on Robotics and Automation, pp.
2030-2036.
Odakura, V., Costa, A. and Lima, P. (2004), Localização de robôs móveis utilizando
observações parciais, Symposium of the Brazilian Computer Society.
Park, K. C., Chung, D., Chung, H., and Lee, J. G. (1998), Dead reckoning navigation for a mobile
robot using an indirect Kalman filter. Conference on Multi-sensor Fusion and
Integration for Intelligent Systems, pp. 107-118.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics. MIT Press.
Wu, E., Zhou, W., Dail, G. and Wang, Q. (2009), Monocular Vision SLAM for Large Scale
Outdoor Environment, IEEE Int. Conference on Mechatronics and Automation, pp.
2037-2041.
Probabilistic Mapping by Fusion of Range-Finders Sensors and Odometry 423
19
André Santana
Federal University of Piauí
Teresina, PI, Brazil
1. Introduction
One of the main challenges faced by robotics scientists is to provide autonomy to robots. That
is, according to Medeiros (Medeiros, 1998) a robot to be considered autonomous must present
a series of abilities as reaction to environment changes, intelligent behavior, integration of
data provided by sensors (sensor fusion), ability for solving multiple tasks, robustness, op-
eration without failings, programmability, modularity, flexibility, expandability, adaptability
and global reasoning. Yet in the context of autonomy, the navigation problem appears. As
described in Fig. 1, sense, plan and act capabilities have to be previously given to a robot
in order to start thinking on autonomous navigation. These capabilities can be divided into
sub-problems abstracted hierarchically in five levels of autonomy: Environment Mapping, Lo-
calization, Path Planning, Trajectory Generation, and Trajectory Execution (Alsina et. al., 2002).
At the level of Environment Mapping, the robot system has to generate a computational model
containing the main structural characteristics of the environment. In other words, it is necessary
to equip the robot with sensing devices that allow it to perceive its surroundings,
acquiring useful data for producing the information needed to construct the environment map.
Further, in order to get a trustworthy mapping, the system needs to know the position and
orientation of the robot with relation to some fixed world reference frame. This process, which includes
sensory data capture, position and orientation inference, and subsequent processing with the
objective of constructing a computational structure representing the robot's underlying space,
is simply known as Robotic Mapping.
In this work, we propose a mapping method based on probabilistic robotics, with the map
being represented through a modified occupancy grid (Elfes, 1987). The main idea is to let
the mobile robot construct the geometry of its surroundings in a systematic and incremental way
in order to get the final, complete map of the environment. As a consequence, the robot can
move in the environment in a safe mode, based on a trustworthiness value calculated
by its perceptual system using sensory data. The map is represented in a form that is coherent
Fig. 1. Levels of autonomous navigation: Sense (Environment Mapping, Localization), Plan (Path Planning, Trajectory Generation), Act (Trajectory Execution).
with the sensory data, noisy or not, coming from the sensors. The characteristic noise incorporated into
the data is treated by probabilistic modeling in such a way that its effects are visible in the final
result of the mapping process. Experimental tests show the viability of this methodology and
its direct applicability to autonomous robot task execution, this being the main contribution
of this work.
In the following, the formal concepts related to robotic mapping through sensor fusion are
presented. A brief discussion about the main challenges in environment mapping and their
proposed solutions is given, as well as the several manners of representing the mapped
environments. Further, the mapping algorithm proposed in this work, based on probabilistic
modeling on an occupancy grid, is presented. The proposed modeling of the sensory information,
fusing the sonar data used in this work with the odometry data provided by the odometry system
of the robot, is described. Given that odometry is susceptible to different sources of noise
(systematic and/or not), further efforts in modeling these kinds of noise in order to represent
them in the constructed map are described. In this way, the mapping algorithm results in a
representation that is consistent with the sensory data acquired by the robot. Results of the
proposed algorithm for a robot in an indoor environment are presented and, finally,
conclusions showing the main contributions and applications, plus future directions, are given.
2. Robotic Mapping
In order to formalize the robotics mapping problem, some basic hypotheses are established.
The first is that the robot precisely knows its position and orientation inside the environment
in relation to some fixed reference frame, that is, a global coordinate system. This process of
inferring position and orientation of the robot in an environment is known as the localization
problem. The second hypothesis is that the robot has a perceptual system, that is, sensors that
make possible the acquisition of data, of the robot itself and of the environment, such as cameras, sonars
and motor encoders, among others.
With these assumptions, robotic mapping can be defined as the problem of constructing
a spatial model of an environment through a robotic system, based on accurate knowledge
of the position and orientation of the robot in the environment and on the data given by the robot's
perceptual system.
With respect to the model used for representing the map, Thrun (Thrun, 2002) proposes a
classification following two main approaches, the topological and the metric maps. Topological
maps are those computationally (or mathematically) represented by way of a graph, which
is a well known entity in Math. In this representation, in general, the nodes correspond to
spaces or places that are well defined (or known) and the links represent connectivity relations
between these places. Metric maps (or metric representations) reproduce with certain degree
of fidelity the environment geometry. Objects as walls, obstacles and doorway passages are
easily identified in this approach because the map has a topographic relation very close to the
real world. This proposed classification is the most used up to date, although a subtle variation
that adds a class of feature-based maps appears in some works (Choset & Fox, 2004; Rocha,
2006). This category is sometimes treated as a sub-category of the metric representation due to
the storage of certain notable objects or features, for example edges, corners, borders, circles
and other geometric shapes that can be detected by a feature detector.
Fig. 2 illustrates the above mentioned ways of representing a mapped environment. Each one
of these forms of representation has its own advantages and disadvantages. It is easier to
construct and to maintain a map based on the metric approach. It allows recognizing places
with simple processing and facilitates the computation of short paths in the map. However,
it requires high computational effort to be kept and needs to know the precise position and
orientation of the robot at all times, which can be a problem. In its turn, the topological representation
needs little computational effort to be kept and can rely on approximate position and
orientation, besides being a convenient way of solving several classes of high-level problems.
However, it is computationally expensive to construct and maintain this representation, and it
makes the identification or recognition of places difficult.
Several challenges that can be found in the robotics mapping problem are enumerated by
Thrun as (Thrun, 2002):
1. Modeling sensor errors
There are several sources of error causing different types or natures of noise in the
sensory data. Errors can easily be modeled for noises that are statistically independent in
different measurements. However, there is a random dependence that occurs because errors
inherent to robot motion accumulate over time, affecting the way sensory measurements
are interpreted.
2. Environment dimension
Besides the lack of precision of the robot system, a second challenge is the size of the
environment to be mapped: the map gets less precise and more expensive to
build as the environment gets bigger.
3. Data association
This problem is also known as data correspondence (or matching). During the mapping,
it often occurs that the same object or obstacle is perceived several times by the robot
system at different instants. So, it is desirable that an already seen object be recognized
and treated in a different manner than a not yet mapped object. Data association aims
to determine the occurrence of this case in an efficient manner.
4. Environment dynamics
Another challenge is related to the mapping of dynamic environments, for example
places where people are constantly walking. The great majority of mapping algorithms
consider the process as running in static environments.
5. Exploration strategy
The mapping must incorporate a good exploration strategy, which should consider a
partial model of the environment. This task appears as the fifth challenge of the robotics
mapping problem.
Robots can be used to construct maps of indoor (Ouellette & Hirasawa, 2008; Santana &
Medeiros, 2009; Thrun et. al., 2004), outdoor (Agrawal et. al., 2007; Triebel et. al., 2006;
Wolf et. al., 2005), subterranean (Silver et. al., 2004; Thrun et. al., 2003), and underwater
environments (Clark et. al., 2009; Hogue & Jenkin, 2006). With respect to their use, maps can be
employed in the execution of tasks considered simple, such as obstacle avoidance, path planning
and localization. They can also be used in tasks considered more difficult, such as the exploration of
galleries inside coal mines, nuclear installations, toxic garbage cleaning, fire extinguishing,
and the rescue of victims in disasters, among others. It is important to note that these tasks
can be extended to several classes of mobile robots: aerial, terrestrial and aquatic (Krys &
Najjaran, 2007; Santana & Medeiros, 2009; Steder et. al., 2008).
3.1 Localization
As explained previously, localization, that is, inferring the position and orientation of the robot inside its environment, is an important requirement for map construction. Some researchers go further, stating that localization is the fundamental and main
Probabilistic Mapping by Fusion of Range-Finders Sensors and Odometry 427
problem to be treated in order to give autonomy to a robot (Cox, 1991). Likewise, Thrun (Thrun et. al., 2000) treats localization as the key problem for the success of an autonomous robot.
Localization methods generally fall into one of the following approaches: relative, absolute, or multi-sensor fusion. Relative localization (or dead reckoning) is based on the integration of sensory data (generally from encoders) over time. The current localization of the robot is calculated from the previous one plus the displacement/rotation perceived by the sensors in the elapsed time interval. Several sources may generate errors at each time step, so this approach also integrates errors. The calculated localizations are in reality estimates whose precision depends on the amount of accumulated error; in fact, the robot may be lost after some time. Absolute localization gives the actual localization of the robot at a given time. This localization is generally calculated from the detection of objects or landmarks with known positions in the environment, from which the position and orientation of the robot can be computed by triangulation or some other method. Note that GPS, a compass, or a similar method can also be used to obtain the absolute position and orientation of the robot in the environment. Multi-sensor fusion combines relative and absolute localization. For example, a robot relying on its encoders may, after a certain period of time, perform absolute localization in order to rectify its localization from landmarks in the environment. In general, Kalman filters or similar approaches are used in this situation to extend as much as possible the time between absolute re-localizations, since absolute localization is generally time consuming and the robot can do nothing else while localizing itself. We use relative localization in this work, since no prior information about the environment is given to the robot.
One of the most used ways of estimating the robot position and orientation is odometry. Odometry gives an estimate of the current robot localization by integration of the motion of the robot wheels. By counting pulses generated by encoders coupled to the wheel axes (actually, rotation sensors that count the number of turns), the robot system can calculate its linear displacement and orientation at the current instant. Odometry is widely used because of its low cost, relative precision for small displacements, and high sampling rate (Borenstein et. al., 1996). However, the disadvantage of this method is the accumulation of errors, which increases proportionally to the displacement. Propagated errors are either systematic or non-systematic. Systematic errors are due to uncertainty in the parameters of the kinematic model of the robot (different wheel diameters, axle length different from its nominal value, finite sampling rate of the encoders, among others). Non-systematic errors occur due to unexpected situations, such as unforeseen obstacles or wheel slippage (Santana, 2007).
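As an illustration of dead reckoning, the pose update from encoder pulses can be sketched as follows for a differential-drive robot. The wheel radius, axle length, and encoder resolution below are hypothetical values for illustration, not the actual parameters of the robot used in this work:

```python
import math

# Hypothetical parameters, for illustration only
TICKS_PER_REV = 500    # encoder pulses per wheel revolution
WHEEL_RADIUS = 0.1     # metres
AXLE_LENGTH = 0.4      # metres (distance between the wheels)

def integrate_odometry(pose, ticks_left, ticks_right):
    """One dead-reckoning step: accumulate encoder ticks into (x, y, theta).
    Errors in these parameters accumulate over time, as noted in the text."""
    x, y, th = pose
    d_left = 2 * math.pi * WHEEL_RADIUS * ticks_left / TICKS_PER_REV
    d_right = 2 * math.pi * WHEEL_RADIUS * ticks_right / TICKS_PER_REV
    d_centre = (d_left + d_right) / 2.0         # linear displacement
    d_theta = (d_right - d_left) / AXLE_LENGTH  # rotation
    return (x + d_centre * math.cos(th + d_theta / 2.0),
            y + d_centre * math.sin(th + d_theta / 2.0),
            th + d_theta)

# Driving straight: both wheels advance one full revolution
pose = integrate_odometry((0.0, 0.0, 0.0), 500, 500)
```

Any small error in `WHEEL_RADIUS` or `AXLE_LENGTH` biases every step, which is exactly the systematic error source described above.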
In particular, with the objective of modeling the odometry of our robot, a methodology based on empirical data (Chenavier & Crowley, 1992) is used in this work. From experimental data collected in several samples, it was possible to devise a function that approximates the odometry errors. This practical experiment was done in two phases: in the first, the angular and linear errors were modeled for a linear displacement (translation only), and in the second, the angular and linear errors were modeled for an angular displacement (rotation only). From these experiments, it was possible to establish a function that describes, approximately, the behavior of the systematic errors present in the odometry system. Equations 1 and 2 represent these functions (linear and angular, respectively).
428 Sensor Fusion and Its Applications
κll = Var(elin) / µ(∆l)   (5)

κlθ = Var(elin) / µ(∆θ)   (6)

κθθ = Var(eang) / µ(∆θ)   (7)

κθl = Var(eang) / µ(∆l)   (8)
In Equations 5, 6, 7, and 8, Var(·) is the variance, µ(·) is the mean, and elin and eang are the linear and angular errors, respectively, obtained from the comparison between the real displacement values and those estimated by the odometry system. By grouping the two error sources (systematic and non-systematic), a model for the global error is obtained, as given by Equations 9 and 10.
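As a sketch of how the coefficients of Equations 5-8 could be computed from calibration runs; the sample errors and displacements below are hypothetical, not the values actually measured for our robot:

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def error_coefficients(e_lin, e_ang, dl, dtheta):
    """Equations 5-8: ratio of error variance to mean displacement.
    e_lin, e_ang: linear/angular errors vs. ground truth per trial;
    dl, dtheta: linear/angular displacements of each trial."""
    return {
        "k_ll": variance(e_lin) / mean(dl),              # Eq. 5
        "k_ltheta": variance(e_lin) / mean(dtheta),      # Eq. 6
        "k_thetatheta": variance(e_ang) / mean(dtheta),  # Eq. 7
        "k_thetal": variance(e_ang) / mean(dl),          # Eq. 8
    }

# Hypothetical calibration data (metres, radians)
k = error_coefficients(
    e_lin=[0.01, 0.03, 0.02, 0.04],
    e_ang=[0.002, 0.004, 0.003, 0.005],
    dl=[1.0, 1.0, 2.0, 2.0],
    dtheta=[0.5, 0.5, 1.0, 1.0],
)
```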
Dark cells represent objects (or obstacles) detected by the sonar array, clear cells represent free
regions, and gray cells are regions not yet mapped. A spatial model based on an occupancy grid can be directly used in navigation tasks, such as path planning with obstacle avoidance and position estimation (Elfes, 1989). The state values are estimated by interpreting data coming from depth sensors modeled by a probabilistic function. It is possible to update each cell value through Bayesian probabilistic rules every time new readings are taken at different positions in the environment.
Most current research on environment mapping for robotics uses probabilistic techniques, constructing probabilistic models of the robots, sensors, and mapped environments. The popularity of probabilistic techniques comes from the assumed existence of uncertainty in sensory data. With probabilistic techniques, it is possible to treat this problem by explicitly modeling the several sources of noise and their influence on the measurements (Thrun, 2002).
The standard algorithm formalized by Elfes (Elfes, 1989) aims to construct a map based on
sensory data and knowing the robot position and orientation. In our work, we use the
odometry system of the robot for calculating position and orientation. So the occupancy grid
map construction is based on the fusion of data given by the sonars with data provided by
the odometry system of the robot. Equation 11 presents the mathematical formulation that
usually describes the occupancy grid mapping (Elfes, 1987; 1989; Thrun et. al., 2003; 2005).
P(m|z1:t ) (11)
In Equation 11, m represents the acquired map and z1:t is the set of sensory measurements taken up to time instant t. It is important to make clear that the algorithm assumes that the position and orientation of the robot are known. Continuous space is discretized into cells that, together, approximate the environment shape. This discretization corresponds to a planar cut of the 3D environment in the case of a 2D grid, or to a 3D discretization in the case of a 3D grid. This depends on the sensor models and characteristics. For example, sonar allows a 2D
sample of the environment, however stereo vision allows a 3D reconstruction. In this work,
we use sonars.
Considering the discretization of the environment in cells, the map m can be defined as a finite
set of cells mx,y where each cell has a value that corresponds to the probability of it being
occupied. The cells can have values in the interval [0, 1] with 0 meaning empty and 1 meaning
occupied. Since the map is a set of cells, the mapping problem can be decomposed into several problems of estimating the value of each cell in the map. Equation 12 represents an instance of the estimation of the value of a cell mx,y, that is, the probability of cell mx,y being occupied given the sensory measurements z1:t up to instant t.
P(mx,y | z1:t)   (12)

This probability is usually represented in log-odds form (Equation 13):

lt x,y = log [ P(mx,y | z1:t) / (1 − P(mx,y | z1:t)) ]   (13)
The occupancy probability value can be recovered through Equation 14.

P(mx,y | z1:t) = 1 − 1 / (1 + e^(lt x,y))   (14)
The log-odds value can be estimated recursively at any instant t by using Bayes' rule applied to P(mx,y|z1:t) (see Equation 15).
By applying the total probability rule to Equation 19, Equation 20 is obtained. The latter calculates the probability of occupation of cell mx,y based on the probabilistic sensor model P(zt|mx,y) and the previously available occupancy value of the cell, P(mx,y|z1:t−1).
Algorithm 1 occupancy_grid_mapping({lt−1,(x,y) }, xt , zt )
1: for all cells mx,y do
2: if mx,y in perceptual field of zt then
3: lt,(x,y) = lt−1,x,y + inverse_sensor_model(mx,y , xt , zt ) − l0
4: else
5: lt,(x,y) = lt−1,(x,y)
6: end if
7: end for
8: return {lt,(x,y) }
It is important to emphasize that the occupancy values of the cells in Algorithm 1 are calculated through log-odds, that is, the logarithm of the odds of the occupancy probability, in order to avoid numerical instabilities. To recover the probability values, Equation 14 can be used.
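The update loop of Algorithm 1 together with the log-odds recovery of Equation 14 can be sketched as follows. The `inverse_sensor_model` here is a deliberately trivial stand-in for illustration; the chapter's actual model is the Gaussian of Equation 21:

```python
import math

L0 = 0.0  # prior log-odds, corresponding to P = 0.5

def inverse_sensor_model(cell, pose, z_cell):
    """Trivial stand-in: the cell where the beam ended is likely occupied,
    cells traversed by the beam are likely free."""
    p = 0.9 if cell == z_cell else 0.3
    return math.log(p / (1.0 - p))

def occupancy_grid_mapping(logodds, pose, z_cell, perceptual_field):
    # Algorithm 1: update only the cells inside the perceptual field of z
    for cell in perceptual_field:
        logodds[cell] = (logodds.get(cell, L0)
                         + inverse_sensor_model(cell, pose, z_cell) - L0)
    return logodds

def probability(l):
    # Equation 14: recover the occupancy probability from log-odds
    return 1.0 - 1.0 / (1.0 + math.exp(l))

grid = occupancy_grid_mapping({}, pose=(0, 0, 0.0), z_cell=(3, 0),
                              perceptual_field=[(1, 0), (2, 0), (3, 0)])
```

Because the update is a simple sum of log-odds, repeated observations of the same cell accumulate evidence without ever multiplying probabilities close to 0 or 1 directly.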
Based on this algorithm, we implemented the method proposed in this work. The main difference is in the probabilistic modeling of the sensors: our proposed model implements the inverse_sensor_model used in the algorithm.
[Figure 4 sketches the significant regions of a sonar beam: the beam main axis, the aperture angle β, the measured distance z, the sensor orientation θ, the angle θx,y toward a cell, and the significant regions II and III.]
Fig. 4. Significant regions in a sonar beam.
reflected the sound wave may be anywhere inside this region. Region III is the one covered, in theory, by the sonar beam; however, it is not known whether it is empty or occupied. Considering the above regions, the model adopted to represent the sonar is described as a Gaussian distribution, as given by Equation 21.
P(z, θ | dx,y, θx,y) = [1 / (2π σz σθ)] exp{ −(1/2) [ (z − dx,y)² / σ²z + (θ − θx,y)² / σ²θ ] }   (21)
In the above equation, θ is the orientation angle of the sensor with respect to the x axis of the global reference frame (see Fig. 4); θx,y is the angle between the x axis of the global frame and the vector from the sonar to cell mx,y, which may or may not contain an obstacle (see Fig. 4); and σ²z and σ²θ are the variances that express the uncertainty in the measured distance z and in the angle θ, respectively. Fig. 5 illustrates the function that estimates the occupancy for this model.
[Fig. 5: surface plot of the occupancy estimation function over the X-Y grid; values stay close to the prior of 0.5, between about 0.49 and 0.51.]
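Equation 21 can be evaluated directly; a minimal sketch (the function and variable names are ours):

```python
import math

def sonar_likelihood(z, theta, d_xy, theta_xy, sigma_z, sigma_theta):
    """Equation 21: 2D Gaussian likelihood of a sonar reading (z, theta)
    for a cell at distance d_xy and bearing theta_xy from the sonar."""
    norm = 1.0 / (2.0 * math.pi * sigma_z * sigma_theta)
    expo = -0.5 * ((z - d_xy) ** 2 / sigma_z ** 2
                   + (theta - theta_xy) ** 2 / sigma_theta ** 2)
    return norm * math.exp(expo)

# The likelihood peaks when the cell matches the measurement exactly
peak = sonar_likelihood(1.0, 0.0, 1.0, 0.0, 0.05, 0.05)
off = sonar_likelihood(1.0, 0.0, 1.2, 0.0, 0.05, 0.05)
```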
Based on this 2D Gaussian model, in this work we also consider the uncertainties inherent to the odometry system, besides the sonar uncertainties. Using the odometry error models given in Section 3.1, described by Equations 9 and 10, it is possible to establish a relation between the variances σ²z and σ²θ (which model the sonar errors) and the odometry errors as:
σz = z · η + Elin(∆l) + N(0, εlin)   (22)

σθ = β/2 + Eang(∆θ) + N(0, εang)   (23)
In the above equations, z is the measurement given by the sonar, η is an error factor typical of the sonar in use (an error of about 1%), and β is the aperture angle of the sonar beam (see Fig. 4). The variances σ²z and σ²θ can be calculated through Equations 22 and 23, now considering the influences caused by odometry. Equation 22 calculates the uncertainty for a distance z and a linear displacement ∆l; Elin(∆l) is the function used to compute the systematic errors of odometry (Equation 1), and N(0, εlin) is the normal distribution used to compute the non-systematic errors (Equation 3).
Equation 23 gives the uncertainty in the orientation angle θ of the sonar for an angular displacement ∆θ performed by the robot. Eang(∆θ) (Equation 2) describes the systematic error of an angular displacement, and N(0, εang) (Equation 4) is the normal distribution that estimates the non-systematic errors for the same displacement.
[Figure: surface plot of P(mx,y | z) over the X-Y grid for the model including odometry uncertainty; values range between about 0.48 and 0.52.]
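Equations 22 and 23 can be sketched as follows. The systematic-error functions below stand in for Equations 1 and 2 (whose exact form is not reproduced here), and all numeric parameters are hypothetical:

```python
import math
import random

BETA = math.radians(15.0)  # hypothetical sonar beam aperture
ETA = 0.01                 # sonar range error factor (~1%, as in the text)

def e_lin(dl):
    # Stand-in for Equation 1 (systematic linear odometry error)
    return 1e-4 * dl

def e_ang(dtheta):
    # Stand-in for Equation 2 (systematic angular odometry error)
    return 1e-3 * dtheta

def sigma_z(z, dl, eps_lin):
    # Equation 22: range uncertainty enlarged by odometry errors
    return z * ETA + e_lin(dl) + random.gauss(0.0, eps_lin)

def sigma_theta(dtheta, eps_ang):
    # Equation 23: bearing uncertainty enlarged by odometry errors
    return BETA / 2.0 + e_ang(dtheta) + random.gauss(0.0, eps_ang)
```

The resulting σz and σθ feed directly into the Gaussian of Equation 21, so the sonar likelihood widens as the robot accumulates odometry error.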
4. Experiments
The robot used in this work is a Pioneer-3AT, kindly named Galatea, which is designed for locomotion on all terrain (hence the AT suffix) (see Fig. 7).
The Pioneer family is manufactured by ActiveMedia Robotics and designed to support several types of perception devices. The robot comes with an API (Application Program Interface) called ARIA (ActiveMedia Robotics Interface) that has libraries for the C++, Java, and Python languages (in this work we use C++). The ARIA library makes it possible to develop high-level programs to communicate with and control the several robot devices (sensors and actuators), allowing data about the robot to be read at execution time.
The software package comes with a robot simulator (MobileSim) that allows testing some functions without using the real robot. It is possible to construct environments of different shapes to serve as a basis for experiments and tests. Galatea has an embedded computer, a PC104+ with an 800 MHz Pentium III processor, 256 MB of RAM, a 20 GB hard disk, an RS232 communication interface, an Ethernet connection, and a 10/100 wireless network board. Galatea has 2 sonar arrays with 8 sonars each, and encoders coupled to the 2 motor axes that comprise its odometry system. To control all of these, we use the RedHat 7.3 Linux operating system.
The simulated robot mapped part of this environment following the dotted path in Fig. 8. The robot performs the mapping process until the odometry errors degrade the quality of the final map. At this point, the values of the cells no longer define whether a cell is occupied, empty, or not yet mapped; that is, the robot has no basis on which to construct a trustworthy map, due to odometry errors. Part (a) of Fig. 9 illustrates this situation. Galatea is represented by the red point; white regions are empty cells, dark regions are occupied cells, and gray regions are cells not yet mapped. When this situation occurs, we simulated an absolute localization for Galatea, correcting its odometry and consequently indicating that it can continue the mapping without considering the accumulated past errors. Fig. 9 (b) shows the moment at which the robot localization is rectified, and Fig. 9 (c) illustrates the continuation of the mapping after this moment.
measures by circles in such a way that, if a given measure invades the circular region defined by another measure, the measure of the invaded region is eliminated. This technique is called Bubble Circle (BC) Threshold (Lee & Chung, 2006). The results for our work were not convincing, so other alternatives were studied.
After several experiments and observations, we could verify that in environments with rectangular shapes, such as this one formed by narrow corridors rather than big rooms, more consistent maps are constructed by using the side sonars, which form angles of 90° with the walls when the robot is parallel to them, together with the rear and front sonars, which form the smallest angles with respect to the robot's main axis. So we discarded the other sensors, given that they produced false measurements due to their disposition with respect to the walls. In fact, we believe that these other sensors, with oblique angles, were designed to be used when the robot is operating in outdoor environments. We could later verify that the same consideration has been reported in the work of Ivanjko (Ivanjko & Petrovic, 2005), in that case working with the Pioneer-2DX robot.
After solving the above-mentioned problems, we performed another set of experiments using Algorithm 1, which is a modification of the one proposed by Thrun (Thrun et. al., 2005). The main differential of the algorithm proposed here is the inclusion of the probabilistic sensor model that represents the uncertainties inherent to perception in the occupancy grid map. Fig. 11 shows the mapping of the same corridor still at the beginning of the process, but already with degradation caused by the odometry error. The red dotted boundary indicates the actual position of the walls. The red point in the white region indicates the localization of Galatea on the map, and the point at the right extremity of the figure is the last point, where the mapping should stop. Fig. 12 shows the evolution of the mapping. Observe the decrease of quality in the mapping as the robot moves, which, in its turn, increases the odometry errors. The map shown in Fig. 13 presents an enhancement in its quality: at this point, absolute localization was performed because the map was degraded. The odometry error goes to zero here, rectifying the robot position and orientation.
The mapping process goes up to the point shown in Fig. 14, where another correction using absolute localization is necessary, then proceeds to the final point as shown in Fig. 15.
By considering probabilistic modeling of odometry errors and sensor readings to try to diminish error effects in the mapping process, we achieved a substantial enhancement in mapping quality. However, we remark that at some point of the process the map eventually gets corrupted by the effects of non-systematic errors. Fig. 16 shows the situation using the model to attenuate the effect of these errors. In this case, the effect is very small because the errors become very small for the travelled distance.
Fig. 11. Use of the proposed model considering representation of odometry errors in the
mapping.
Fig. 16. Map constructed with the proposed model and with correction of the systematic odometry errors.
Based on the results of the performed experiments, we conclude that the algorithm proposed in this work gives a more realistic manner of representing a mapped environment using the occupancy grid technique. This is because we now know that the data provided by the sensors have errors and we know how much this error can grow; that is, we have an upper limit for the error, controlling its growth. Even with the difficulties imposed by the sonar limitations, our system presents satisfactory results. Other types of depth sensors, such as lasers, can be added to this model or use a similar approach, thus increasing map consistency.
As future work, we intend to study exploration techniques and heuristics so that a robot can perform the mapping process in an autonomous way. Besides, ways to enhance robot localization by incorporating other sensors, which together can improve map quality, will also be studied. Future trials with emphasis on Simultaneous Localization and Mapping (SLAM) will also be done, based on the studies carried out in this work. Fusion of this information with that provided by a visual system (stereo vision) will be done further on. With this, we intend to explore the construction of 3D maps, allowing the use of robots in other higher-level tasks, such as, for example, the analysis of building structures.
6. References
Agrawal, M.; Konolige, K. & Bolles, R.C. (2007). Localization and Mapping for Autonomous
Navigation in Outdoor Terrains : A Stereo Vision Approach. In: IEEE Workshop on
Applications of Computer Vision, WACV ’07.
Alsina, P. J.; Gonçalves, L. M. G.; Medeiros, A. A. D.; Pedrosa, D. P. F. & Vieira, F. C. V. (2002).
Navegação e controle de robôs móveis. In: Mini Curso - XIV Congresso Brasileiro de
Automática, Brazil.
Borenstein, J.; Everett, R. H. & Feng, L. (1996). Where am I? Sensors and Methods for Mobile Robot
Positioning, University of Michigan, USA.
Chenavier, F. & Crowley, J. L. (1992). Position estimation for a mobile robot using vision
and odometry, In: Proceeding of the 1992 IEEE International Conference on Robotics and
Automation, Nice, France.
Choset, H. & Fox, D. (2004). The World of Mapping. In: Proceedings of WTEC Workshop on Review
of United States Research in Robotics, National Science Foundation (NSF), Arlington,
Virginia, USA.
Clark, C. M.; Olstad, C. S.; Buhagiar, K. & Gambin, T. (2009). Archaeology via Underwater
Robots: Mapping and Localization within Maltese Cistern Systems. In: 10th Interna-
tional Conf. on Control, Automation, Robotics and Vision, pp.662 - 667, Hanoi, Vietnam.
Cox, I. J. (1991). Blanche - An Experiment in Guidance and Navigation of an Autonomous
Robot Vehicle, In: IEEE Transactions on Robotics and Automation, Vol. 7, No. 2.
Elfes, A. (1987). Sonar-based real-world mapping and navigation, In: IEEE Journal of Robotics
and Automation, Vol. 3, No. 3 , pp. 249-265.
Elfes, A. (1989). Occupancy Grid: A Probabilistic Framework for Robot Perception and Navi-
gation, PhD Thesis, Carnegie Mellon University, Pittsburg, Pensylvania, USA.
Hogue, A. & Jenkin, M. (2006). Development of an Underwater Vision Sensor for 3D Reef
Mapping, In: Proceedings IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS 2006), pp. 5351-5356.
Ivanjko, E. & Petrovic, I. (2005). Experimental Evaluation of Occupancy Grid Maps Improvement by Sonar Data Correction, In: Proceedings of 13th Mediterranean Conference on Control and Automation, Limassol, Cyprus.
Krys, D. & Najjaran, H. (2007). Development of Visual Simultaneous Localization and Map-
ping (VSLAM) for a Pipe Inspection Robot. In: Proceedings of the 2007 International
Symposium on Computational Intelligence in Robotics and Automation, CIRA 2007. pp.
344-349.
Lee, K. & Chung, W. K. (2006). Filtering Out Specular Reflections of Sonar Sensor Readings, In:
The 3rd International Conference on Ubiquitous Robots and Ambient Intelligent (URAI).
Lee, Y.-C.; Nah, S.-I.; Ahn, H.-S. & Yu, W. (2006). Sonar Map Construction for a Mobile Robot Using a Tethered-robot Guiding System, In: The 3rd International Conference on Ubiquitous Robots and Ambient Intelligent (URAI).
Medeiros, A. A. D. (1998). A Survey of Control Architectures for Autonomous Mobile Robots.
In: JBCS - Journal of the Brazilian Computer Society, Special Issue on Robotics, ISSN
0104-650004/98, vol. 4, n. 3, Brazil.
Ouellette, R. & Hirasawa, K. (2008). Mayfly: A small mapping robot for Japanese office envi-
ronments. In: IEEE/ASME International Conference on Advanced Intelligent Mechatronics
,pp. 880-885.
Rocha, R. P. P. (2006). Building Volumetric Maps with Cooperative Mobile Robots and Useful
Information Sharing: A Distributed Control Approach based on Entropy. PhD Thesis,
FEUP - Faculdade de Engenharia da Universidade do Porto, Portugal.
Santana, A. M. (2007). Localização e Planejamento de Caminhos para um Robô Humanóide e um Robô Escravo com Rodas. Master Thesis, UFRN, Natal, RN, 2007.
Santana, A M. & Medeiros, A. A. D. (2009). Simultaneous Localization and Mapping (SLAM)
of a Mobile Robot Based on Fusion of Odometry and Visual Data Using Extended
Kalman Filter. In: Robotics, Automation and Control, Editor: Vedran Kordic, pp. 1-10.
ISBN 978-953-7619-39-8. In-Tech, Austria.
Silver, D.; Ferguson, D.; Morris, A. C. & Thayer, S. (2004). Features Extraction for Topological
Mine Maps, In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS 2004), Vol. 1, pp. 773-779.
Souza, A. A. S. (2008). Mapeamento com Grade de Ocupação Baseado em Modelagem Proba-
bilística, Master Thesis, UFRN, Natal, Brazil.
Souza, A. A. S.; Santana, A. M.; Britto, R. S.; Gonçalves, L. M. G. & Medeiros, A. A. D. (2008).
Representation of Odometry Errors on Occupancy Grids. In: Proceeding of International
Conference on Informatics in Control, Automation and Robotics (ICINCO2008), Funchal,
Portugal.
Steder, B.; Grisetti, G.; Stachniss, C. & Burgard, W. (2008). Visual SLAM for Flying Vehicles, In:
IEEE Transactions on Robotics, Vol. 24 , Issue 5, pp. 1088-1093.
Triebel, R.; Pfaff, P. & Burgard, W. (2006). Multi-Level Surface Maps for Outdoor Terrain
Mapping and Loop Closing. In: Proceedings of IEEE/RSJ International Conference on
Intelligent Robots and Systems, pp. 2276 - 2282.
Thrun, S. (2002). Robotic mapping: A survey. In: Exploring Artificial Intelligence in the New
Millenium,Ed. Morgan Kaufmann, 2002.
Thrun, S.; Fox, D.; Burgard, W. & Dellaert, F. (2000). Robust Monte Carlo Localization for
Mobile Robots, In: Artificial Inteligence, Vol. 128, No. 1-2, pp. 99-141.
Thrun, S.; Hähnel, D.; Ferguson, D.; Montemerlo, M.; Triebel, R.; Burgard, W.; Baker, C.;
Omohundro, Z.; Thayer, S. & Whittaker, W. (2003). A system for volumetric mapping
20
Abstract
Detrimental residual stresses and microstructure changes are the two major precursors for
future sites of failure in ferrous steel engineering components and structures. Although
numerous Non-Destructive Evaluation (NDE) techniques can be used for microstructure
and stress assessment, currently there is no single technique which would have the
capability to provide a comprehensive picture of these material changes. Therefore the
fusion of data from a number of different sensors is required for early failure prediction.
Electromagnetic (EM) NDE is a prime candidate for this type of inspection, since the
response to electromagnetic excitation can be quantified in several different ways: e.g. eddy
currents, Barkhausen emission, flux leakage, and a few others.
This chapter reviews the strengths of different electromagnetic NDE methods, provides an
analysis of the different sensor fusion techniques such as sensor physical system fusion
through different principles and detecting devices, and/or feature selection and fusion,
and/or information fusion. Two sensor fusion case studies are presented: pulsed eddy
current thermography at sensor level and integrative electromagnetic methods for stress and
material characterisation at feature (parameters) level.
1. Introduction
In recent years, non-destructive testing and evaluation (NDT&E) techniques have been
developed which allow quantitative analysis of the stresses acting on a material; either
through direct measurement of displacement (strain measurement)(1) or measurement of
material properties which interact with stress and can therefore be used to indicate the
material stress state. The second category includes magnetic(2) and electromagnetic
(induction) NDT&E inspection techniques which allow the quantification of material
stresses through magnetic and electrical properties, including magnetic permeability μ,
electrical conductivity σ and domain wall motion. Although magnetic and electromagnetic
techniques are promising candidates for stress measurement, the fact that the stress measurement is performed indirectly means that the relationship between the measured signal and stress is complex and heavily dependent on material microstructure; thus material-specific calibration is almost always required.
Because of the complex nature of the mechanisms which contribute to cracking, degradation
and material stresses, the use of more than one NDE method is often required for a
comprehensive assessment of a given component. The development of fusion techniques to
integrate signals from different sources has the potential to lead to a decrease in inspection
time and also a reduction in cost. Gathering of data from multiple systems coupled with
efficient processing of information can provide great advantages in terms of decision
making, reduced signal uncertainty and increased overall performance. Depending on the
different physical properties measured, fusion techniques have the benefit that each NDE
modality reveals different aspects of the material under inspection. Therefore professional
processing and integration of defect information is essential, in order to obtain a
comprehensive diagnosis of structural health.
With research and development in NDE through a wide range of applications for
engineering and medical sciences, conventional NDT&E techniques have illustrated
different limitations, e.g. ultrasonic NDT&E needs media coupling, eddy current NDT&E
can only be used to inspect surface or near surface defects in metallic or conductive objects,
etc. As industrial applications require inspection and monitoring of large, complex, safety-critical components and subsystems, traditional off-line NDT and quantitative NDE for defect detection cannot meet these needs. On-line monitoring, e.g. structural health monitoring (SHM) for defects as well as for precursors such as abnormal material states, is required for life-cycle assessment and intelligent health monitoring. Recent integrative NDE techniques and fusion methods have been developed to meet these requirements (3).
subtraction, multiplication etc(6, 7); feature selection and combination from sensor data
features(8, 9, 10); information fusion through case studies(10, 11). Signal level data fusion represents fusion at the lowest level, where a number of raw input data signals are combined to produce a single fused signal. Feature level fusion fuses feature and object labels and property descriptor information that have already been extracted from individual input sensors. Finally, at the highest level, decision level fusion refers to the combination of decisions already taken by individual systems. The choice of the fusion level depends mainly upon the application and the complexity of the system.
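The three fusion levels just described can be illustrated with a toy sketch. The combination rules used here (averaging, concatenation, majority vote) are common choices for illustration, not the specific methods of the references cited above:

```python
def signal_level_fusion(raw_a, raw_b):
    # Lowest level: combine raw signals sample by sample (here, averaging)
    return [(a + b) / 2.0 for a, b in zip(raw_a, raw_b)]

def feature_level_fusion(features_a, features_b):
    # Middle level: merge feature vectors already extracted per sensor
    return features_a + features_b

def decision_level_fusion(decisions):
    # Highest level: combine decisions of individual systems (majority vote)
    return max(set(decisions), key=decisions.count)

fused_signal = signal_level_fusion([1.0, 3.0], [3.0, 5.0])
fused_features = feature_level_fusion([0.2, 0.7], [5.1])
verdict = decision_level_fusion(["defect", "no defect", "defect"])
```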
In this chapter, three different applications of electromagnetic NDE sensor fusion are
discussed and the benefits of the amalgamation of different electromagnetic NDE techniques
are examined. In section 2, three kinds of sensor fusion are reported: Section 2.1. introduces
PEC thermography using integrative different modality NDE methods; Section 2.2 looks at
Sensor fusion for electromagnetic stress measurement and material characterisation 445
Figure 1a shows a typical PEC thermography test system. A copper coil is supplied with a
current of several hundred amps at a frequency of 50kHz – 1MHz from an induction heating
system for a period of 20ms – 1s. This induces eddy currents in the sample, which are
diverted when they encounter a discontinuity leading to areas of increased or decreased
heating. The resultant heating is measured using an IR camera and displayed on a PC.
Figure 1b shows a PEC thermography image of a section of railtrack, shown from above. It can
be seen that the technique has the ability to provide a “snapshot” of the complex network of
cracking, due to wear and rolling contact fatigue (RCF) in the part. It is well known that in the
initial stages, RCF creates short cracks that grow at a shallow angle, but these can sometimes
grow to a steep angle. This creates a characteristic surface heat distribution, with the majority
of the heating on one side of the crack only. This is due to two factors, shown in figure 1c; a
high eddy current density in the corner of the area bounded by the crack and an increase in
heating, due to the small area available for diffusion.
Fig. 1. a) PEC thermography system diagram, b) PEC thermography image of gauge corner
cracking on a section of railtrack, c) Eddy current distribution and heat diffusion around
angular crack
This ability to provide an instantaneous image of the test area and any defects which may be
present is an obvious attraction of this technique, but further information can be gained
through the transient analysis of the change in temperature in the material. The sample
shown in figures 2a and 2b is made from titanium 6424 and contains a 9.25mm long
semicircular (half-penny) defect with a maximum depth of around 4.62mm. The crack was formed by a three-point bending technique, and the sample contains a 4mm deep indentation on the opposite side to the crack to facilitate this process. Figure 2d shows the transient
temperature change in five positions in the defect area, defined in figure 2c. It can be seen
from the plot that different areas of the crack experience a very different transient response,
corresponding to the combined effects of differing eddy current distributions around the crack and differing heat diffusion characteristics. This shows that the technique has the
potential to offer both near-instantaneous qualitative defect images and quantitative
information through transient analysis.
Fig. 2. Inspection of Ti 6424 sample; a) Front view, b) Cross section, c) Positions for transient
analysis, d) Transient temperature change in different positions on the sample surface
2.2. Potential for fusion of MBE and MAE for microstructural characterisation
Although MBE and MAE are both based on the sensing of domain wall motion in
ferromagnetic materials in response to a time varying applied magnetic field, the two
techniques have important differences when applied to stress measurement and
microstructural evaluation. Due to the skin effect, MBE is a surface measurement technique
with a maximum measurement depth below 1mm and a strong reduction in sensitivity with
increased depth. As MAE is essentially an acoustic signal, it does not suffer from the same
restrictions as MBE and can be considered to be a bulk measurement technique. The
interpretation of MAE can however, be complex, thus the implementation of a combination
of the two techniques is advisable.
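The skin-effect restriction on MBE measurement depth can be made concrete with the standard electromagnetic skin-depth formula δ = 1/√(πfμσ). The sketch below is illustrative only; the material values (relative permeability, conductivity) are assumptions for a generic mild steel and are not taken from the chapter.

```python
import math

def skin_depth(freq_hz, rel_permeability, conductivity_s_per_m):
    """Standard skin depth: delta = 1 / sqrt(pi * f * mu * sigma)."""
    mu0 = 4e-7 * math.pi  # permeability of free space (H/m)
    mu = mu0 * rel_permeability
    return 1.0 / math.sqrt(math.pi * freq_hz * mu * conductivity_s_per_m)

# Assumed values for a mild steel: mu_r ~ 100, sigma ~ 5e6 S/m.
# At typical MBE analysis frequencies the depth stays well below 1 mm,
# consistent with MBE being a surface measurement technique.
depth = skin_depth(10e3, 100, 5e6)
print(f"skin depth at 10 kHz: {depth * 1e3:.3f} mm")
```

Because the depth falls with the square root of frequency, the higher-frequency content of the MBE signal is confined ever closer to the surface, whereas the acoustic MAE signal is not attenuated in this way.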
Fig. 3. MBE (a) and MAE (b) profiles measured on En36 gear steel samples of varying case
depths
Figure 3 shows the results from a set of tests to quantify the case hardening depth in En36
gear steel. It can be seen from the plot that for case depths >0.64mm, the shape of the MBE
profile remains the same, indicating that the case depth has exceeded the measurement
depth, whereas for MAE, the profile shape continues to change up to the maximum depth of
1.35mm, indicating a greater measurement depth for this technique.
Figure 5c indicates that two different mechanisms are responsible for the change in MBE with stress.
The peaks exhibit opposite behaviour; peak 1 increases with stress, whereas peak 2
decreases with stress. This indicates that each peak is associated with a different
microstructural phase and / or domain configuration, active at a different point in the
excitation cycle.
Fig. 4. Results of PEC measurements on steel under elastic and plastic deformation; a)
Normalised PEC response peak(BNORM) under elastic stress; b) Non-normalised PEC
response max(BNON-NORM) under elastic stress; c) Normalised PEC response peak(BNORM)
under plastic strain; d) Non-normalised PEC response max(BNON-NORM) under plastic strain
Figure 5b shows the change in MBEENERGY for plastic strain. The MBEENERGY exhibits a large
increase in the early stages of plastic deformation indicating a change in the domain
structure due to the development of domain wall pinning sites, followed by a slower
increase in MBEENERGY as applied strain increases. Figure 5d shows the development of the
MBE profile for an increase in plastic stress. It can be seen from the plot that as plastic
deformation increases, the overall amplitude of the MBE profile increases, corresponding to
the increase in MBEENERGY. It can also be seen that the increase in overall amplitude is
coupled with a shift in peak position with respect to the excitation voltage. This change in
the MBE profile is due to the development of material dislocations increasing domain wall
pinning sites, leading to higher energy MBE activity later in the excitation cycle.
Examination of this shift in peak position has shown that it correlates strongly with the
stress/strain curve in the plastic region.
The dependence of the MBE peak position qualitatively agrees with the dependence of
max(BNON-NORM) as a function of strain shown in Figure 4d. Both decrease in line with the
tensile characteristic in the yielding region, so a single value can correspond to two
different strains, which makes it difficult to quantify the plastic deformation (PD).
However, the dependence of peak(BNORM) as a function of strain, shown in Figure 4c,
increases in the same region; this provides complementary information and enables PD
characterisation using two features proportional to the magnetic permeability and
electrical conductivity respectively.
Fig. 5. Results of MBE measurements on steel under elastic and plastic deformation; a)
MBEENERGY for elastic stress, b) MBEENERGY for plastic strain, c) MBE profiles for elastic stress,
d) MBE profiles for plastic stress
These results illustrate the complementary nature of these two electromagnetic NDE
techniques. PEC can be used for simple stress measurement, but to gain a full picture of the
microstructural changes in the material, MBE profile analysis should be employed. Thus,
fusion of PEC and MBE in a single system, with a common excitation device and a combined
MBE/PEC pickup coil has the potential to provide comprehensive material assessment. This
fusion technique has been used for the second Round Robin test organised by UNMNDE
(Universal Network for Magnetic Non-Destructive Evaluation) for the characterisation of
material degradation and ageing.
MBE has the capability to provide stress and microstructure information, but has a low
measurement depth (up to 1 mm), a weak correlation with defects and the determination of
exact correlations between signal features and material properties can be difficult without a
full range of calibration samples; consequently the combination of MBE with other
inspection techniques has received some attention in recent years. Quality Network, Inc.
(QNET), the marketing and services affiliate of the Fraunhofer Institute for Non-Destructive
Testing (IZFP) has introduced the multi-parameter micro-magnetic microstructure testing
system (3MA)(13). The 3MA system is optimised to measure surface and subsurface hardness,
residual stress, case depth and machining defects through simultaneous measurement of
MBE, incremental permeability, tangential magnetic field strength and eddy current
impedance. As 3MA is a commercial system, exact details of the 3MA operational
parameters are not available, but it is implied in the literature that variations in excitation
field strength and frequency are used to control measurement depth, and that the measured
parameters are combined using a multiple regression technique.
Chady et al. have assessed the comparative strengths of MBE, ECT, flux leakage and
Hysteresis loop measurement for the characterisation of fatigue failure through cyclic
dynamic loading of S355J2G3 structural steel (14). Pixel level fusion of the scan results from
the different inspection techniques was performed, and it was found that fusion of all the
signals makes it possible to detect and quantitatively evaluate the level of material
degradation.
Fig. 6. Sensor fusion for comprehensive evaluation of defects and material properties
In addition to the sensor or data fusion above, Figure 6 shows an example of how sensor
fusion can be used to implement a comprehensive material assessment system. A common
excitation device is used to apply an electromagnetic field to the material under assessment
and the response of the material is measured in several different ways. Firstly, a magnetic
field sensor, operating as a pulsed magnetic flux leakage (PMFL)(15) sensing device, is used
to measure the tangential magnetic field. This signal is analysed to extract information and
quantify any surface, subsurface or opposite side defects which may be present. Secondly,
the field at the surface of the material is measured using a coil, the measured signal is then
band-pass filtered to reject the low frequency envelope and isolate the Barkhausen emission
signal. This can then be used to characterise surface material changes, such as surface
residual stresses and microstructural changes, i.e. degradation, corrosion, grinding burn.
Using MBE, these changes can be quantified up to a depth of around 1mm. Bulk
stress/microstructure changes are quantified using a piezoelectric sensor to measure
magneto-acoustic emission, thus by comparing MBE and MAE measurements(16), bulk and
surface changes can be separated and quantified.
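The band-pass separation of the Barkhausen emission from the low-frequency excitation envelope, described above, can be sketched as follows. This is a hedged illustration: the sample rate, cut-off frequencies, and synthetic signal are all assumptions, and a practical system would use a properly designed analogue or digital filter rather than ideal FFT masking.

```python
import numpy as np

def bandpass_fft(signal, fs, f_lo, f_hi):
    """Ideal band-pass: zero every FFT bin outside [f_lo, f_hi] Hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.irfft(spectrum * keep, n=len(signal))

# Synthetic pickup-coil signal: a 10 Hz excitation envelope plus weak
# high-frequency "Barkhausen-like" bursts near the envelope peaks
# (all signal parameters are illustrative assumptions).
np.random.seed(0)
fs = 100_000                    # sample rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of data
envelope = np.sin(2 * np.pi * 10 * t)
bursts = 0.05 * np.random.randn(fs) * (envelope > 0.9)
raw = envelope + bursts

# Reject the low-frequency envelope; keep only the broadband emission band.
mbe = bandpass_fft(raw, fs, f_lo=1_000, f_hi=40_000)
```

After filtering, the large excitation component is removed and only the broadband emission activity remains for feature extraction such as MBEENERGY.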
4. Conclusions
Sensor fusion for electromagnetic NDE at different stages and levels has been discussed and
three case studies for fusion at sensor and feature levels have been investigated. Instead of
applying innovative mathematical techniques to utilise multiple sensors to improve the
fidelity of defect and material characterisation, physics based sensor fusion is investigated. It
has been shown that the three types of sensing system fusion, feature selection and
integration and information combination for decision making in Quantitative NDE and
material characterisation have different complementary strengths. Our future research
efforts will explore the platform of features (parameters) of the signatures from the
multimodal sensor data spaces using physical models and mathematic techniques for
different engineering and medical challenges, including quantitative non-destructive
evaluation, structural health monitoring, target detection and classification, and non-
invasive diagnostics.
Acknowledgement
The Authors would like to thank the EPSRC for funding the work through EP/E005071/1
and EP/F023324/1, and the Royal Academy of Engineering (RAEng) for the Global
Research Award “Global research and round robin tests on magnetic non-destructive evaluation”
awarded to Professor Gui Yun Tian.
5. References
1. P J Withers, M Turski, L Edwards, P J Bouchard and D J Buttle, 'Recent advances in
residual stress measurement', Int. Journal of Pressure Vessels and Piping, Vol. 85,
No. 3, pp. 118-127, 2008.
2. J W Wilson, G Y Tian and S Barrans, 'Residual magnetic field sensing for stress
measurement', Sensors and Actuators A: Physical, Vol. 135, No. 2, pp. 381-387,
2007.
3. X E Gros, Applications of NDT Data Fusion, Kluwer Academic Publishers, 2001.
4. J. Wilson, G.Y. Tian, I.Z. Abidin, Suixian Yang, D. Almond, Pulsed eddy current
thermography: system development and evaluation, Insight - Non-Destructive
Testing and Condition Monitoring, Volume: 52, Issue: 2 February 2010, 87-90
5. J. Wilson, G.Y. Tian, I.Z. Abidin, S. Yang, D. Almond, Modelling and evaluation of eddy
current stimulated thermography, Nondestructive Testing and Evaluation, pp. 1 -
14, 2010.
6. V. Kaftandjian, Y. Min Zhu, O. Dupuis, and D. Babot, ’ The Combined Use of the Evidence
Theory and Fuzzy Logic for Improving Multimodal Nondestructive Testing
Systems’, IEEE Transactions on Instrumentation and Measurement, Vol. 54, No. 5,
2005.
7. X. Gros, Z. Liu, K. Tsukada, and K. Hanasaki, ‘Experimenting with Pixel-Level NDT Data
Fusion Techniques’, IEEE Transactions on Instrumentation and Measurement, Vol.
49, No. 5, 2000.
21
Iterative Multiscale Fusion and Night Vision Colorization of Multispectral Images
1. Introduction
Multispectral images usually present complementary information such as visual-band
imagery and infrared imagery (near infrared or long wave infrared). There is strong
evidence that the fused multispectral imagery increases the reliability of interpretation
(Rogers & Wood, 1990; Essock et al., 2001); whereas the colorized multispectral imagery
improves observer performance and reaction times (Toet et al. 1997; Varga, 1999; Waxman et
al., 1996). A fused image in grayscale can be automatically analyzed by computers (for
target recognition); while a colorized image in color can be easily interpreted by human
users (for visual analysis).
Imagine a nighttime navigation task that may be executed by an aircraft equipped with a
multisensor imaging system. Analyzing the combined or synthesized multisensory data will
be more convenient and more efficient than simultaneously monitoring multispectral
images such as visual-band imagery (e.g., image intensified, II), near infrared (NIR)
imagery, and infrared (IR) imagery. In this chapter, we will discuss how to synthesize the
multisensory data using image fusion and night vision colorization techniques in order to
improve the effectiveness and utility of multisensor imagery. It is anticipated that the
successful applications of such an image synthesis approach will lead to improved
performance of remote sensing, nighttime navigation, target detection, and situational
awareness. This image synthesis approach involves two main techniques, image fusion and
night vision colorization, which are reviewed in turn below.
Image fusion combines multiple-source imagery by integrating complementary data in order
to enhance the information apparent in the respective source images, as well as to increase
the reliability of interpretation. This results in more accurate data (Keys et al., 1990) and
increased utility (Rogers & Wood, 1990; Essock et al., 1999). In addition, it has been reported
that fused data provides far more robust aspects of operational performance such as
increased confidence, reduced ambiguity, improved reliability and improved classification
(Rogers & Wood, 1990; Essock et al., 2001). A general framework of image fusion can be
found in Reference (Pohl & Genderen, 1998). In this chapter, our discussions focus on pixel-
level image fusion. A quantitative evaluation of fused image quality is important for an
objective comparison between the respective fusion algorithms, which measures the amount
of useful information and the amount of artifacts introduced in the fused image.
Two common fusion methods are the discrete wavelet transform (DWT) (Pu & Ni, 2000;
Nunez et al., 1999) and various pyramids (such as Laplacian, contrast, gradient, and
morphological pyramids) (Jahard et al., 1997; Ajazzi et al., 1998), both of which are multiscale
fusion methods. Recently, an advanced wavelet transform (aDWT) method (Zheng et al.,
2004) has been proposed, which incorporates principal component analysis (PCA) and
morphological processing into a regular DWT fusion algorithm. The aDWT method can
produce a better fused image in comparison with pyramid methods and regular DWT
methods. Experiments also reveal an important relationship between the fused image
quality and the wavelet properties. That is, a higher level of DWT decomposition (with
smaller image resolution at a higher scale) or a lower order of wavelets (with a shorter
length) usually results in a more sharpened fused image. This means that we can use the
level of DWT decomposition and the length of a wavelet as the control parameters of an
iterative DWT-based fusion algorithm.
So far, only a few metrics are available for quantitative evaluation of the quality of fused
imagery. For example, the root mean square error (RMSE) may be the natural measure of
image quality if a “ground truth” image is available. Unfortunately, for realistic image
fusion applications there are no ground truths. Piella et al. (2003) presented an image fusion
metric, the image quality index (IQI), which measures how similar the fused image is to
both input images. More recently, Zheng et al. (2007) proposed an image quality metric,
termed as “the ratio of SF error (rSFe)”, which is a relative measurement regardless of the
type of image being analyzed. The rSFe metric is defined upon “spatial frequency” (SF)
(Eskicioglu & Fisher, 1995). In addition, the rSFe value can show the fusion status (i.e.,
under-fused or over-fused). Refer to Section 2.3 for a review of fusion metrics.
On the other hand, a night vision colorization technique can produce colorized imagery with a
naturalistic and stable color appearance by processing multispectral night-vision imagery.
Although appropriately false-colored imagery is often helpful for human observers in
improving their performance on scene classification, and reaction time tasks (Essock et al.,
1999; Waxman et al., 1996), inappropriate color mappings can also be detrimental to human
performance (Toet & IJspeert, 2001; Varga, 1999). A possible reason is lack of physical color
constancy (Varga, 1999). Another drawback with false coloring is that observers need
specific training with each of the unnatural false color schemes so that they can correctly
and quickly recognize objects; whereas with colorized nighttime imagery rendered with
natural colors, users should be able to readily recognize and identify objects.
Toet (2003) proposed a night vision (NV) colorization method that transfers the natural color
characteristics of daylight imagery into multispectral NV images. Essentially, Toet’s natural
color-mapping method matches the statistical properties (i.e., mean and standard deviation)
of the NV imagery to that of a natural daylight color image (manually selected as the
“target” color distribution). However, this color-mapping method colorizes the image
regardless of scene content, and thus the accuracy of the coloring is very much dependent
on how well the target and source images are matched. Specifically, Toet’s method weights
the local regions of the source image by the “global” color statistics of the target image, and
thus will yield less naturalistic results (e.g., biased colors) for images containing regions that
differ significantly in their colored content. Another concern of Toet’s “global-coloring”
method is that the scene matching between the source and target is performed manually. To
address the aforementioned bias problem in global coloring, Zheng et al. (2005; 2008)
presented a “local coloring” method that can colorize the NV images more like daylight
Iterative Multiscale Fusion and Night Vision Colorization of Multispectral Images 457
imagery. The local-coloring method will render the multispectral images with natural colors
segment by segment (also referred to as “segmentation-based”), and also provide automatic
association between the source and target images (i.e., avoiding the manual scene-matching
in global coloring).
Fig. 1. The diagram of image fusion and night vision colorization. The iterative image fusion
(shown within the right dashed box) takes multispectral images (A, B, C) as inputs, and
fuses them into a grayscale image, F. The night vision colorization (shown in the left column)
takes the same multispectral images (A, B, C) and also the fused image F as inputs, and
generates a colored image. Three steps shown inside a dotted rectangle are performed in the
lαβ color space.
In this chapter, a joint approach that incorporates image fusion and night vision colorization
is presented to synthesize and enhance multisensor imagery. This joint approach provides
two sets of synthesized images, fused image in grayscale and colored image in colors using
the image fusion procedure and night vision colorization procedure. As shown in Fig. 1, the
image fusion (shown in the right dashed box) takes multispectral images (A, B, C) as inputs
and fuses them into a grayscale image (F). The night vision colorization (shown in the left
column) takes the same multispectral images (A, B, C) and the fused image (F) as inputs,
and eventually generates a colored image. The image fusion process can take more than
three bands of images; whereas the night vision colorization can accept three (or fewer) bands
of images. If there are more than three bands of images available, (e.g. II, NIR, MWIR
(medium-wave IR) and LWIR (long-wave IR)), we may choose a visual band image (II) and
two bands of IR images for the following colorization (refer to Section 4 for a detailed
discussion). Two procedures are discussed respectively in Sections 2 and 3. Note that in this
chapter, the term “multispectral” is used equivalently to “multisensory”; and by default the
term “IR” means “LWIR” unless specified.
The remainder of this chapter is organized as follows: The multiscale image fusion methods
are discussed in Section 2. Image quality metrics are also reviewed in this section. The night
vision colorization methods are fully described in Section 3. The experiments and
discussions are given in Section 4. Finally, conclusions are made in Section 5.
scale; and the larger absolute value of the detail coefficients (i.e., the high-pass filtered
images) at each transform scale. Then, an inverse DWT is performed to obtain a fused
image. At each DWT scale of a particular image, the DWT coefficients of a 2D image consist
of four parts: approximation, horizontal detail, vertical detail, and diagonal detail. In the
advanced DWT (aDWT) method (Zheng et al., 2004), we apply PCA to the two input images’
approximation coefficients at the highest transform scale. That is, we fuse them using the
principal eigenvector (corresponding to the larger eigenvalue) derived from the two original
images, as described in Eq. (1) below:
CF = (a1·CA + a2·CB) / (a1 + a2) ,  (1)
where CA and CB are approximation coefficients (image matrices) transformed from input
images A and B. CF represents the fused coefficients; a1 and a2 are the elements (scalars) of
the principal eigenvector, which are computed by analyzing the original input images. Note
that the denominator in Eq. (1) is used for normalization so that the fused image has the
same energy distribution as the original input images.
For the detail coefficients (the other three quarters of the coefficients) at each transform
scale, the larger absolute values are selected, followed by neighborhood morphological
processing, which serves to verify the selected pixels using a “filling” and “cleaning”
operation (i.e., the operation fills or removes isolated pixels locally). Such an operation
(similar to smoothing) can increase the consistency of coefficient selection thereby reducing
the distortion in the fused image.
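The two fusion rules above can be sketched in a few lines of numpy. This is a hedged illustration: the DWT decomposition and reconstruction themselves (e.g., via a wavelet library) are assumed to be performed elsewhere, the neighbourhood morphological fill/clean verification is omitted, and the function names are illustrative rather than taken from the aDWT reference.

```python
import numpy as np

def pca_weights(img_a, img_b):
    """Principal-eigenvector weights (a1, a2) from the 2x2 covariance of
    the two original images' pixel values, as used in Eq. (1). The sign of
    the eigenvector is fixed positive so the weights act as a blend."""
    data = np.vstack([img_a.ravel(), img_b.ravel()])
    cov = np.cov(data)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    v = np.abs(eigvecs[:, np.argmax(eigvals)])       # principal eigenvector
    return v[0], v[1]

def fuse_approx(ca, cb, a1, a2):
    """Eq. (1): normalised PCA-weighted average of approximation coefficients."""
    return (a1 * ca + a2 * cb) / (a1 + a2)

def fuse_detail(da, db):
    """Detail rule: keep the coefficient with the larger absolute value
    (the morphological fill/clean verification step is omitted here)."""
    return np.where(np.abs(da) >= np.abs(db), da, db)
```

With a 2-D DWT library, `fuse_approx` would be applied to the approximation coefficients at the highest scale, `fuse_detail` to the three detail sub-bands at every scale, and the inverse DWT would then produce the fused image.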
Note that Q0 ∈ [0, 1] can reflect the correlation (similarity), luminance distortion, and
contrast distortion between vectors x and y, which correspond to the three components
(factors) in Eq. (2). Keep in mind that for the image quality evaluation with Q0, the values xi,
yi are positive grayscale values. The maximum value Q0 = 1 is achieved when x and y are
identical.
Then, the fused image quality metric (i.e., the image quality index) (Wang & Bovik, 2002;
Piella & Heijmans, 2003) can be defined as
Qw = λQ0(IA, IF) + (1−λ) Q0(IB, IF), (3)
where subscripts A, B, and F denote the input images (A, B) and the fused images (F); and
weight λ = S(IA) / [S(IA) + S(IB)]. S(IA) denotes the “saliency” of image A, which may be its
local variance, S(IA) = σ²(IA). Since image signals are generally non-stationary, it is more
appropriate to measure the weighted image quality index Qw over local regions (e.g., using
a sliding window) and then combine the different results into a single measure.
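A minimal numpy sketch of Q0 and the windowed quality index Qw of Eq. (3) follows. It is a sketch under stated assumptions: non-overlapping windows are used instead of a dense sliding window for brevity, population statistics are used throughout, and the window size is an arbitrary choice.

```python
import numpy as np

def q0(x, y):
    """Universal image quality index Q0 (Wang & Bovik, 2002) for two
    equally sized patches of positive grayscale values."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()   # covariance of the two patches
    return 4 * cxy * mx * my / ((vx + vy) * (mx**2 + my**2))

def qw(img_a, img_b, img_f, win=8):
    """Eq. (3): saliency-weighted Q0 over local windows, averaged into a
    single measure; saliency S is taken as the local variance."""
    scores = []
    h, w = img_f.shape
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            a = img_a[i:i+win, j:j+win]
            b = img_b[i:i+win, j:j+win]
            f = img_f[i:i+win, j:j+win]
            sa, sb = a.var(), b.var()
            lam = sa / (sa + sb) if sa + sb > 0 else 0.5
            scores.append(lam * q0(a, f) + (1 - lam) * q0(b, f))
    return float(np.mean(scores))
```

When the fused image equals both inputs, every window scores Q0 = 1, so Qw = 1, its maximum.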
CF = [ (1/MN) Σj=1..N Σi=2..M ( I(i,j) − I(i−1,j) )² ]^(1/2) ;  (5b)

MDF = wd · [ (1/MN) Σi=2..M Σj=2..N ( I(i,j) − I(i−1,j−1) )² ]^(1/2) ,  (5c)

SDF = wd · [ (1/MN) Σj=1..N−1 Σi=2..M ( I(i,j) − I(i−1,j+1) )² ]^(1/2) ;  (5d)
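As a hedged sketch, the directional spatial frequencies of Eqs. (5b)-(5d), the overall SF, and the rSFe metric mentioned in Section 2.3 could be computed as below. The diagonal weight wd = 1/√2 and the normalisation by the full image area are assumptions, and the sign convention of rSFe (negative suggesting under-fusion, positive over-fusion) follows the description in the text.

```python
import numpy as np

WD = 1 / np.sqrt(2)  # diagonal weight w_d (1/sqrt(2) is assumed here)

def spatial_frequency(img):
    """Overall SF from row, column, and two diagonal first differences,
    following the structure of Eqs. (5b)-(5d)."""
    i = img.astype(float)
    rf2 = np.mean((i[:, 1:] - i[:, :-1]) ** 2)               # row differences
    cf2 = np.mean((i[1:, :] - i[:-1, :]) ** 2)               # column, Eq. (5b)
    mdf2 = WD**2 * np.mean((i[1:, 1:] - i[:-1, :-1]) ** 2)   # main diag., (5c)
    sdf2 = WD**2 * np.mean((i[1:, :-1] - i[:-1, 1:]) ** 2)   # secondary, (5d)
    return np.sqrt(rf2 + cf2 + mdf2 + sdf2)

def rsfe(sf_fused, sf_reference):
    """Ratio of SF error: negative suggests under-fusion, positive
    suggests over-fusion."""
    return (sf_fused - sf_reference) / sf_reference
```

A perfectly flat image has SF = 0, and any gradient or edge content raises it, which is why a sharper fused image scores a higher SF.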
Similar to Eq. (4), the overall reference spatial frequency, SFR, can be computed by combining
the four directional reference SFs (SFR is not formulated here).
In the orientation-based fusion algorithm, the Gabor wavelet transforms are performed with
each input image at M spatial frequencies by N orientations, notated as M×N. For a 16×16
GWT, a total of 256 pairs (magnitudes and phases) of filtered images are extracted with 256
Gabor wavelets (also called Gabor kernels, or Gabor filter bank) distributed along 16 bands
(located from low to high frequencies) by 16 orientations (0.00°, 11.25°, 22.50°, ..., 157.50°,
168.75°). The size of each Gabor filter should match the image size being analyzed. If all
input images are of the same size, then the set of 256 Gabor wavelets are only computed
once. Instead of doing spatial convolution, the GWT can be accomplished in frequency
domain by using fast Fourier transforms (FFT) that will significantly speed up the process.
Many GWT coefficients are produced, for example, 512 coefficients (256 magnitudes plus
256 phases) per pixel in a 16×16 GWT. Suppose a set of M×N GWT are performed with two
input images (IA and IB). At each frequency band (b = 1, 2, …, M), the index of maximal GWT
magnitude between two images is selected pixel by pixel; and then two index frequencies,
HA(b) and HB(b), are calculated as its index accumulation along N orientations, respectively.
The final HA and HB are the weighted summations through M bands, where the band
weights (Wb) are given empirically. Eventually, the fused image (IF) is computed as
IF = (IA .* HA + IB .* HB)/( HA + HB), (9)
where ‘.*’ denotes element-by-element product of two arrays; and
HA = Σb=1..M Wb·HA(b) ,  (10a)

HB = Σb=1..M Wb·HB(b) ,  (10b)
where Wb are the band weights decided empirically. The middle frequency bands
(Hollingsworth et al., 2009) in GWT (by suppressing the extreme low and extreme high
frequency bands) usually give a better representation and consistency in image fusion,
especially for noisy input images.
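The combination step of Eqs. (9)-(10) can be sketched as follows. The GWT magnitudes are assumed to be precomputed elsewhere (the Gabor filtering itself is not shown), the array layout (band, orientation, row, column) is an assumption, and ties in magnitude are awarded to image A.

```python
import numpy as np

def orientation_fuse(img_a, img_b, mag_a, mag_b, band_weights):
    """Eqs. (9)-(10): per-pixel winner counts H_A, H_B accumulated over N
    orientations and weighted over M bands, then a pixel-wise blend.

    mag_a, mag_b: GWT magnitudes with assumed shape (M, N, H, W),
    i.e. M frequency bands by N orientations per pixel."""
    wins_a = mag_a >= mag_b                       # index of maximal magnitude
    ha_b = wins_a.sum(axis=1).astype(float)       # accumulate over orientations
    hb_b = (~wins_a).sum(axis=1).astype(float)
    wb = np.asarray(band_weights, float)[:, None, None]
    ha = (wb * ha_b).sum(axis=0)                  # Eq. (10a)
    hb = (wb * hb_b).sum(axis=0)                  # Eq. (10b)
    return (img_a * ha + img_b * hb) / (ha + hb)  # Eq. (9)
```

Emphasising the middle bands, as the text suggests, simply means putting the larger values of `band_weights` on the middle entries.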
The orientation-based fusion algorithm can be further varied by either keeping DC (direct
current) or suppressing DC in GWT. “Keeping DC” will produce a contrast-smooth image
(suitable for contrast-unlike images); while “suppressing DC” (i.e., forcing DC = 0.0) will
result in a sharpened fusion (suitable for contrast-alike images). Color fusion can be achieved by
replacing the red channel of a color image with the fused image of red channel and LWIR
image, which is suitable for poorly illuminated color images.
The target color schemes are grouped by their contents and colors such as plants, mountain,
roads, sky, water, buildings, people, etc. (4) The association between the source region
segments and target color schemes is carried out automatically utilizing a classification
algorithm such as the nearest neighbor paradigm. (5) The color mapping procedures
(statistic-matching and histogram-matching) are carried out to render natural colors onto
the false-colored image segment by segment. (6) The mapped image is then transformed
back to the RGB space. (7) Finally, the mapped image is transformed into HSV (Hue-
Saturation-Value) space and the “value” component of the mapped image is replaced with
the “fused NV image” (a grayscale image). Note that this fused image replacement is
necessary to allow the colorized image to have a proper and consistent contrast.
similar (i.e., Qw(x,y) > TQ (a predefined threshold)), these two clusters will be merged.
Qw(x,y) is a similarity metric (derived from the IQI metric described in Section 2.3.1)
between two clusters, x and y, which is defined in the lαβ color space as follows:
where wk is a given weight for each color component. Qk(x,y) is formulated below:
Qk(x,y) = 4·σxy·μx·μy / [ (μx² + μy²)·(σx² + σy²) ] ,  (15b)
where μx and σx are the mean and the standard deviation of cluster x in a particular
component, respectively. Similar definitions are applied to cluster y. The sizes (i.e., areas) of
two clusters (x and y) are usually unequal. Notice that Qk(x,y) is computed with regard to
the diffused false-color image.
ICk = ( ISk − μSk ) · ( σTk / σSk ) + μTk ,  for k = { l, α, β },  (16)
where IC is the colored image, IS is the source (false-color) image in lαβ space; μ denotes the
mean and σ denotes the standard deviation; the subscripts ‘S’ and ‘T’ refer to the source and
target images, respectively; and the superscript ‘k’ is one of the color components: { l, α, β}.
After this transformation, the pixels comprising the multispectral source image have means
and standard deviations that conform to the target daylight color image in lαβ space. The
color-mapped image is transformed back to the RGB space through the inverse transforms
(lαβ space to the LMS, exponential transform from LMS to LMS, and LMS to RGB, refer to
Eqs. (11-14)) (Zheng & Essock, 2008).
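The statistic-matching step of Eq. (16) amounts to a per-component shift and scale of each segment. In the sketch below, the target mean and standard deviation are assumed to come from the matched target color scheme, and the segment data are hypothetical.

```python
import numpy as np

def statistic_match(src, mu_t, sigma_t):
    """Eq. (16): shift and scale one color component of a source segment so
    its mean and standard deviation match those of the target scheme."""
    mu_s, sigma_s = src.mean(), src.std()
    return (src - mu_s) * (sigma_t / sigma_s) + mu_t

# Hypothetical 'l' component of one false-color segment in lαβ space,
# mapped to an assumed target scheme with mean 0.5 and std 0.1:
seg = np.random.rand(32, 32)
out = statistic_match(seg, mu_t=0.5, sigma_t=0.1)
```

By construction, the mapped segment's statistics conform exactly to the target's, which is what makes the local coloring look natural segment by segment.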
For each case as demonstrated in Figs. 2-6, the IQI values of four fusions are shown in the
figure captions. Actually, there were no iterations in Laplacian pyramid fusion and
orientation based fusion. For the Laplacian pyramid algorithm, a pair of fixed parameters,
(Ld, Lw) = (4, 4) as typically used in literature, were used in all pyramid fusions (shown in
Figs. 2-6). In general, the aDWTi-IQI algorithm converges at larger numbers of Ni and Lw but
a smaller number of Ld; whereas the aDWTi-rSFe algorithm converges at a larger number of
Ld but smaller numbers of Ni and Lw. Furthermore, the aDWTi-IQI algorithm produces a
smooth image, which is especially suitable for noisy images such as multispectral NV
images; whereas the aDWTi-rSFe algorithm yields a sharpened image, which is ideal for
well exposed daylight pictures (like the two-clock image pair). On the other hand, the
orientation-based fusion using Gabor wavelet transform is good for the fusion between
contrast-unlike images such as visible versus IR (thermal) images.
The IQI values (the higher the better) of four fusions, as shown in the figure captions of Figs.
2-6, are used for quantitative evaluations. The IQI results showed that, the orientation-based
fusion is the best in Figs. 2, 4, & 6, while the aDWTi-IQI fusion is the best in Figs. 3 & 5.
Visual perceptions provide the same rank of fused images as the quantitative evaluations.
As shown in Fig. 6, Laplacian fusion (Fig. 6c) is pretty good but the eyes behind the glasses
are not as clear as shown in the orientation fusion (Fig. 6f). Notice that eyes are the most
important facial features in face recognition systems and applications. The iterative fusions
of aDWTi-IQI and aDWTi-rSFe show an overshoot effect, especially around the head boundary.
The IQI values reveal the same rank of different fusions. The 16×16 orientation fusion (16
bands by 16 orientations, Fig. 6f) presents more details and better contrast than other
multiscale fusions (Figs. 6c-e). In an M×N orientation-based fusion, a larger M (number of
bands) is usually beneficial to the detailed images like Fig. 6.
The three pairs of multispectral images were completely analyzed by the presented night
vision colorization algorithm; and the results using local coloring algorithm are illustrated in
Figs. 7-9. The original input images and the fused images used in the coloring process are
shown in Figs. 3-5a, Figs. 3-5b and Figs. 3-5d, respectively. The smooth images (Figs. 3-5d)
fused by the aDWTi-IQI algorithm were used in night vision colorization because they show
better contrast and are less sensitive to noise. The false colored images are shown in Figs. 7-9a,
which were obtained by assigning image intensified (II) images to blue channels, infrared
(IR) images to red channels, and providing averaged II and IR images to green channels. The
rationale of forming a false-color image is to assign a long-wavelength NV image to the red
channel and to assign a short-wavelength NV image to the blue channel. The number of false
colors was reduced with the nonlinear diffusion algorithm, using an AOS (additive operator
splitting, for fast computation) implementation that facilitated the subsequent segmentation.
The segmentation was done in lαβ space through clustering and merging operations (see
Figs. 7-9b). The parameter values used in clustering and merging were NBin = [24 24 24], wk =
[0.25 0.35 0.40], and TQ = 0.90. Relatively larger weights were assigned in wk to emphasize
the two chromatic channels in lαβ space, since they are more distinguishable among segments.
With the segment map, histogram-matching and statistic-matching could be performed
segment by segment (i.e., locally) in lαβ space. The source region segments were
automatically recognized and associated with proper target color schemes (after the training
process was done). The locally colored images (segment by segment) are shown in Figs. 7-9c.
On visual examination, the colored images (Figs. 7-9c) appear very natural, realistic, and
colorful. Comparable colorization results using the global coloring algorithm are presented
in Reference (Zheng & Essock, 2008). This segmentation-based local coloring process is fully
automatic and adapts well to different types of multisensor images. The input images need
not be multispectral NV images, although the illustrations given here use NV
images.
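The false-color composition and the per-segment statistic-matching described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: `false_color` and `match_statistics` are hypothetical helper names, the matching is shown for a single channel, and a real segment mask would come from the lαβ-space clustering and merging:

```python
import numpy as np

def false_color(ii, ir):
    """Stack two NV bands into an RGB false-color image:
    IR -> red, mean(II, IR) -> green, II -> blue."""
    return np.dstack([ir, (ii + ir) / 2.0, ii])

def match_statistics(src, tgt_mean, tgt_std, seg_mask):
    """Statistic-matching for one channel inside one segment:
    shift and scale the segment's pixels so their mean and standard
    deviation match the target color scheme."""
    out = src.copy()
    vals = src[seg_mask]
    s = vals.std()
    if s > 0:  # avoid dividing by zero on a flat segment
        out[seg_mask] = (vals - vals.mean()) / s * tgt_std + tgt_mean
    return out
```

Applying `match_statistics` segment by segment, with a target scheme learned per segment class, is what makes the coloring "local" rather than global.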
A different color fusion is illustrated in Fig. 10f, obtained by replacing the red channel image in Fig. 10a
with the orientation-fused image in Fig. 10e (IQI = 0.7849). The orientation-based fusion
(Fig. 10e) was formed by combining the red channel image of Fig. 10a (visible band) and an
IR (thermal) image (Fig. 10b), and shows a better result than Figs. 10c-d. The colors in Fig.
10f are not as natural as daylight colors but are useful for human perception, especially for
poorly illuminated images. For example, Fig. 10f shows better contrast and more detail
than Fig. 10a and Figs. 10c-e. Note that non-uniform band weights (Wb = [0.0250 0.0250
0.0500 0.0500 0.0875 0.0875 0.0875 0.0875 0.0875 0.0875 0.0875 0.0875 0.0500 0.0500 0.0250
0.0250]) were applied to the noisy input images in order to emphasize the content at
medium frequencies while suppressing noise at high frequencies.
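A minimal sketch of applying such non-uniform band weights follows. Note that the listed Wb values sum to 1, so the weighted merge preserves overall intensity; `weighted_band_merge` is an illustrative helper under that assumption, not the authors' fusion code:

```python
import numpy as np

# Per-band weights from the text: small at the lowest and highest
# frequency bands, larger in the middle (medium frequencies).
WB = np.array([0.0250, 0.0250, 0.0500, 0.0500] + [0.0875] * 8
              + [0.0500, 0.0500, 0.0250, 0.0250])

def weighted_band_merge(bands, weights=WB):
    """Merge a stack of band images (shape: n_bands x H x W) with
    non-uniform weights; contracts the band axis."""
    assert np.isclose(weights.sum(), 1.0)
    return np.tensordot(weights, bands, axes=1)
```

Down-weighting the highest bands is what suppresses high-frequency noise, while down-weighting the lowest bands avoids washing out contrast with the image's DC content.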
The night vision colorization process demonstrated here took two-band multispectral NV
images as inputs. In fact, the local-coloring procedure can accept two or three input
images. If more than three bands of images are available, we may choose the low-light
intensified (visual band) image and two bands of IR images. As for how to choose the two
IR bands, we may use the image fusion algorithm as a screening process: the two IR images
selected for colorization should be the pair that produces the most (maximally) informative
fused image among all possible fusions. For example, given three IR images, IR1, IR2, and
IR3, the two chosen images for colorization, IC1 and IC2, should satisfy the following
equation: Fus(IC1, IC2) = max{Fus(IR1, IR2), Fus(IR1, IR3), Fus(IR2, IR3)}, where Fus
stands for the fusion process and max means selecting the fusion of maximum information.
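The pairwise screening rule can be sketched as below. The chapter does not specify how the "information" of a fused image is measured, so histogram entropy is used here purely as a stand-in, and any fusion algorithm can be supplied as the `fuse` callable; both helper names are hypothetical:

```python
import itertools
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy of the image histogram -- a simple stand-in
    for the 'information' of a fused image."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def select_ir_pair(ir_images, fuse):
    """Screen all IR-band pairs with the fusion algorithm `fuse` and
    return the index pair whose fused image is most informative."""
    pairs = itertools.combinations(range(len(ir_images)), 2)
    return max(pairs,
               key=lambda ij: entropy(fuse(ir_images[ij[0]],
                                           ir_images[ij[1]])))
```

With n candidate bands this evaluates n(n-1)/2 fusions, which is cheap for the three-band case discussed in the text.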
5. Conclusions
The multispectral image fusion and night vision colorization approaches presented in this
chapter can be performed automatically and adaptively regardless of the image contents.
Experimental results with multispectral imagery showed that the fused image is informative
and clear, and the colorized image appears realistic and natural. We anticipate that the
presented fusion and colorization approaches for multispectral imagery will help improve
target recognition and visual analysis, especially for nighttime operations.
Specifically, the proposed approaches can produce two versions of synthesized imagery: a
grayscale image and a color image. The image fusion procedure is based on multiscale
analysis, and the fused image is suited to machine analysis (e.g., target recognition). The
night vision colorization procedure is based on image segmentation, pattern recognition,
and color mapping, and the colorized image is suited to visual analysis (e.g., pilot navigation).
The multispectral imagery synthesized with the proposed approaches should eventually lead to
improved performance in remote sensing, nighttime navigation, and situational awareness.
6. Acknowledgements
This research is supported by the U. S. Army Research Office under grant number W911NF-
08-1-0404.
Iterative Multiscale Fusion and Night Vision Colorization of Multispectral Images 473
7. References
Aiazzi, B.; Alparone, L.; Baronti, S. & Carla, R. (1998). Assessment of pyramid-based
multisensor image data fusion, in Proc. SPIE 3500, 237–248.
Barash, D. & Comaniciu, D. (2004). A common framework for nonlinear diffusion, adaptive
smoothing, bilateral filtering and mean shift, Image Vision Computing 22(1), 73-81.
Burt, P. J. & Adelson, E. H. (1983). The Laplacian pyramid as a compact image code, IEEE
Trans. Commun. Com-31 (4), 532–540.
Burt, P. J. & Adelson, E. H. (1985). Merging images through pattern decomposition, Proc.
SPIE 575, 173–182.
Eskicioglu, A. M. & Fisher, P. S. (1995). Image quality measures and their performance, IEEE
Trans. Commun. 43(12), 2959–2965.
Essock, E. A.; McCarley, J. S.; Sinai, M. J. & DeFord, J. K. (2001). Human perception of
sensor-fused imagery, in Interpreting Remote Sensing Imagery: Human Factors, R. R.
Hoffman and A. B. Markman, Eds., Lewis Publishers, Boca Raton, Florida.
Essock, E. A.; Sinai, M. J. et al. (1999). Perceptual ability with real-world nighttime scenes:
image-intensified, infrared, and fused-color imagery, Hum. Factors 41(3), 438–452.
Fairchild, M. D. (1998). Color Appearance Models, Addison Wesley Longman Inc., ISBN: 0-201-
63464-3, Reading, MA.
Gonzalez, R. C. & Woods, R. E. (2002). Digital Image Processing (Second Edition), Prentice
Hall, ISBN: 0201180758, Upper Saddle River, NJ.
Hollingsworth, K. P.; Bowyer, K. W.; Flynn, P. J. (2009). The Best Bits in an Iris Code, IEEE
Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 964-973.
Jahard, F.; Fish, D. A.; Rio, A. A. & Thompson C. P. (1997). Far/near infrared adapted
pyramid-based fusion for automotive night vision, in IEEE Proc. 6th Int. Conf. on
Image Processing and its Applications (IPA97), pp. 886–890.
Jones J. P. & Palmer, L. A. (1987). The two-dimensional spectral structure of simple receptive
fields in cat striate cortex, Journal of Neurophysiology, vol.58 (6), pp. 1187–1211.
Keys, L. D.; Schmidt, N. J.; & Phillips, B. E. (1990). A prototype example of sensor fusion
used for a siting analysis, in Technical Papers 1990, ACSM-ASPRS Annual Conf.
Image Processing and Remote Sensing 4, pp. 238–249.
Keysers, D.; Paredes, R.; Ney, H. & Vidal, E. (2002). Combination of tangent vectors and
local representations for handwritten digit recognition, Int. Workshop on Statistical
Pattern Recognition, Lecture Notes in Computer Science, Vol. 2396, pp. 538-547,
Windsor, Ontario, Canada.
Li, S.; Kwok, J. T. & Wang, Y. (2001). Combination of images with diverse focuses using the
spatial frequency, Information Fusion 2(3), 169–176.
Nunez, J.; Otazu, X. et al. (1999). Image fusion with additive multiresolution wavelet
decomposition: applications to SPOT+Landsat images, J. Opt. Soc. Am. A 16, 467–474.
Perona, P. & Malik, J. (1990). Scale space and edge detection using anisotropic diffusion,
IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639.
Piella, G. & Heijmans, H. (2003). A new quality metric for image fusion, in Proc. 2003 Int.
Conf. on Image Processing, Barcelona, Spain.
Pohl C. & Genderen J. L. V. (1998). Review article: multisensor image fusion in remote
sensing: concepts, methods and applications, Int. J. Remote Sens. 19(5), 823–854.
Pu T. & Ni, G. (2000). Contrast-based image fusion using the discrete wavelet transform,
Opt. Eng. 39(8), 2075–2082.
Rogers, R. H. & Wood, L. (1990). The history and status of merging multiple sensor data: an
overview, in Technical Papers 1990, ACSM-ASPRS Annual Conf. Image Processing
and Remote Sensing 4, pp. 352–360.
Ruderman, D. L.; Cronin, T. W. & Chiao, C. C. (1998). Statistics of cone responses to natural
images: implications for visual coding, Journal of the Optical Society of America A 15
(8), 2036–2045.
Toet, A. (2003). Natural colour mapping for multiband nightvision imagery, Information
Fusion 4, 155-166.
Toet, A. & IJspeert, J. K. (2001). Perceptual evaluation of different image fusion schemes, in:
I. Kadar (Ed.), Signal Processing, Sensor Fusion, and Target Recognition X, The
International Society for Optical Engineering, Bellingham, WA, pp.436–441.
Toet, A.; IJspeert, J.K.; Waxman, A. M. & Aguilar, M. (1997). Fusion of visible and thermal
imagery improves situational awareness, in: J.G. Verly (Ed.), Enhanced and Synthetic
Vision 1997, International Society for Optical Engineering, Bellingham, WA, pp.177–
188.
Varga, J. T. (1999). Evaluation of operator performance using true color and artificial color in
natural scene perception (Report ADA363036), Naval Postgraduate School,
Monterey, CA.
Wang, Z. & Bovik, A. C. (2002). A universal image quality index, IEEE Signal Processing
Letters 9(3), 81–84.
Waxman, A. M.; Gove, A. N. et al. (1996). Progress on color night vision: visible/IR fusion,
perception and search, and low-light CCD imaging, Proc. SPIE Vol. 2736, pp. 96-
107, Enhanced and Synthetic Vision 1996, Jacques G. Verly, Ed.
Zheng, Y. & Agyepong, K. (2007). Mass Detection with Digitized Screening Mammograms
by Using Gabor Features, Proceedings of the SPIE, Vol. 6514, pp. 651402-1-12.
Zheng, Y. & Essock, E. A. (2008). A local-coloring method for night-vision colorization
utilizing image analysis and image fusion, Information Fusion 9, 186-199.
Zheng, Y.; Essock, E. A. & Hansen, B. C. (2005). An advanced DWT fusion algorithm and its
optimization by using the metric of image quality index, Optical Engineering 44 (3),
037003-1-12.
Zheng, Y.; Essock, E. A. & Hansen, B. C. (2004). An advanced image fusion algorithm based
on wavelet transform—incorporation with PCA and morphological processing,
Proc. SPIE 5298, 177–187.
Zheng, Y.; Essock, E. A.; Hansen, B. C. & Haun, A. M. (2007). A new metric based on
extended spatial frequency and its application to DWT based fusion algorithms,
Information Fusion 8(2), 177-192.
Zheng, Y.; Hansen, B. C.; Haun, A. M. & Essock, E. A. (2005). Coloring Night-vision
Imagery with Statistical Properties of Natural Colors by Using Image Segmentation
and Histogram Matching, Proceedings of the SPIE, Vol. 5667, pp. 107-117.
22
Super-Resolution Reconstruction by Image Fusion
and Application to Surveillance Videos Captured by Small Unmanned Aircraft Systems
1. Introduction
In practice, surveillance video captured by a small Unmanned Aircraft System (UAS) digital
imaging payload is almost always blurred and degraded because of limits of the imaging
equipment and less than ideal atmospheric conditions. Small UAS vehicles typically have
wingspans of less than four meters and payload carrying capacities of less than 50
kilograms, which results in a high vibration environment due to winds buffeting the aircraft
and thus poorly stabilized video that is not necessarily pointed at a target of interest. Super-
resolution image reconstruction can reconstruct a highly-resolved image of a scene from
either a single image or a time series of low-resolution images based on image registration
and fusion between different video frames [1, 6, 8, 18, 20, 27]. By fusing several subpixel-
registered, low-resolution video frames, we can reconstruct a high-resolution panoramic
image and thus improve imaging system performance. There are four primary applications
for super-resolution image reconstruction:
1. Automatic Target Recognition: Targets of interest are hard to identify and recognize
in degraded videos and images. For a series of low-resolution images captured
by a small UAS vehicle flown over an area under surveillance, we need to perform
super-resolution to enhance image quality and automatically recognize targets of
interest.
2. Remote Sensing: Remote sensing observes the Earth and helps monitor vegetation
health, bodies of water, and climate change based on image data gathered by
wireless equipment over time. We can gather additional information on a given
area by increasing the spatial image resolution.
3. Environmental Monitoring: Related to remote sensing, environmental monitoring
helps determine whether an event is unusual or extreme, and assists in the development
of an appropriate experimental design for monitoring a region over time. With the
growth of green industry, these requirements are becoming increasingly important.
4. Medical Imaging: In medical imaging, several images of the same area may be
blurred and/or degraded because of imaging acquisition limitations (e.g., human
respiration during image acquisition). We can recover and improve the medical
image quality through super-resolution techniques.
This chapter proceeds as follows. Section 2 describes the basic modeling of super-resolution
image reconstruction. Our proposed super-resolution algorithm is presented in Section 3,
with experimental results presented in Section 4. We draw conclusions from this research in
Section 5.
update the super-resolution results. From equation (1), the total error for super-resolution
reconstruction in the L2-norm can be represented as

$$ L_2(X) \;=\; \frac{1}{2}\sum_{k=1}^{n} \big\| Y_k - D_k C_k F_k X \big\|_2^2 . \qquad (4) $$
Differentiating $L_2(X)$ with respect to $X$, we have the gradient $\nabla L_2(X)$ of $L_2(X)$ as the
sum of derivatives over the low-resolution input images:

$$ \nabla L_2(X) \;=\; \sum_{k=1}^{n} F_k^T C_k^T D_k^T \big( D_k C_k F_k X - Y_k \big) . \qquad (5) $$
We can then implement an iterative gradient-based optimization technique to reach the
minimum value of $L_2(X)$, such that

$$ X^{t+1} \;=\; X^{t} - \lambda\, \nabla L_2(X^{t}) , \qquad (6) $$

where $\lambda$ is a scalar that defines the step size of each iteration in the direction of the gradient
$\nabla L_2(X)$.
Instead of a summation of gradients over the input images, Zomet [31] calculated $n$ times
the scaled pixel-wise median of the gradient sequence in $\nabla L_2(X)$. That is,

$$ X^{t+1} \;=\; X^{t} - \lambda\, n \cdot \operatorname{median}\big\{ F_1^T C_1^T D_1^T ( D_1 C_1 F_1 X - Y_1 ),\; \ldots,\; F_n^T C_n^T D_n^T ( D_n C_n F_n X - Y_n ) \big\} , \qquad (7) $$
where t is the iteration step number. The median filter is well known to be robust to
outliers, and for a symmetric distribution the median agrees well with the mean given a
sufficient number of samples. Through the median operation in equation (7), we therefore
obtain a robust super-resolution solution. However, this technique requires many
computations: we must not only compute the gradient map for every input image, but also
perform a large number of comparisons to compute the median. Hence, it is not a truly
efficient super-resolution approach.
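Equations (6) and (7) can be contrasted in a minimal sketch. For simplicity we assume the warp F_k and blur C_k are identity and the decimation D is 2×2 block averaging (so its adjoint D^T is nearest-neighbor up-sampling scaled by 1/4); these simplifications are ours, not the chapter's:

```python
import numpy as np

def down(x, s=2):
    # D: block-average decimation by factor s
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def down_T(y, s=2):
    # D^T: adjoint of block averaging (nearest-neighbor up-sampling / s^2)
    return np.repeat(np.repeat(y, s, axis=0), s, axis=1) / (s * s)

def step_sum(x, lows, lam, s=2):
    """Equation (6): gradient descent on the summed L2 error."""
    grad = sum(down_T(down(x, s) - y, s) for y in lows)
    return x - lam * grad

def step_median(x, lows, lam, s=2):
    """Equation (7), Zomet-style: n times the pixel-wise median of
    the back-projected gradients, so one outlier frame cannot
    corrupt the update."""
    grads = np.stack([down_T(down(x, s) - y, s) for y in lows])
    return x - lam * len(lows) * np.median(grads, axis=0)
```

In practice F_k comes from subpixel registration and C_k from the camera PSF; the median step buys robustness at the cost of the per-pixel sorting the text describes.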
3.1 Up-sampling process between an additional frame and the reference frame
Without loss of generality, we assume that i1 is the reference frame. For every additional
frame ik (1 < k ≤ n) in the video sequence, we transform it into the coordinate system of the
reference frame through image registration. Thus, we can create a warped image
4. Experimental Results
The proposed efficient and robust super-resolution image reconstruction algorithm was
tested on two sets of real video data captured by an experimental small UAS operated by
Fig. 3. Test Set #1 super-resolved images, factor 4 (reduced to 60% of original size for
display). Results were computed as follows: (a) Robust super-resolution [31]. (b) Bicubic
interpolation. (c) Iterated back projection [10]. (d) Projection onto convex sets (POCS) [24].
(e) Papoulis-Gerchberg algorithm [8, 19]. (f) Proposed method.
Fig. 6. Test Set #2 super-resolved images, factor 4 (reduced to 60% of original size for
display). Results were computed as follows: (a) Robust super-resolution [31]. (b) Bicubic
interpolation. (c) Iterated back projection [10]. (d) Projection onto convex sets (POCS) [24].
(e) Papoulis-Gerchberg algorithm [8, 19]. (f) Proposed method.
Tables 1, 2, 3, and 4 show the CPU running times in seconds for five established super-
resolution algorithms and our proposed algorithm with up-sampling factors of 2 and 4.
Here, the robust super-resolution algorithm is abbreviated as RobustSR, the bicubic
interpolation algorithm as Interp, the iterated back projection algorithm as IBP, the
projection onto convex sets algorithm as POCS, the Papoulis-Gerchberg algorithm as PG,
and the proposed efficient super-resolution algorithm as MedianESR. From these tables, we
can see that bicubic interpolation gives the fastest computation time, but its visual
performance is rather poor. The robust super-resolution algorithm has the longest running
time and is the most computationally expensive, while the proposed algorithm is
comparatively efficient and delivers good visual performance. In the experiments, all of
the super-resolution algorithms were run with the same estimated motion parameters.
5. Summary
We have presented an efficient and robust super-resolution restoration method that
computes the median over a coarsely-resolved, up-sampled image sequence. In comparison
with other established super-resolution image reconstruction approaches, our algorithm is
not only efficient with respect to the number of computations required, but also achieves an
acceptable level of visual performance. This algorithm should be a step in the right direction
toward real-time super-resolution image reconstruction. In future research, we plan to try
other motion models, such as planar homography and multi-model motion, to determine
whether we can achieve better performance. In addition, we will explore incorporating
natural image characteristics into the evaluation criteria for super-resolution algorithms, so
that super-resolved images exhibit high visual quality with respect to natural image
properties.
6. References
1. S. Borman and R. L. Stevenson, “Spatial Resolution Enhancement of Low-Resolution
Image Sequences – A Comprehensive Review with Directions for Future Research.”
University of Notre Dame, Technical Report, 1998.
2. D. Capel and A. Zisserman, “Computer Vision Applied to Super Resolution.” IEEE
Signal Processing Magazine, vol. 20, no. 3, pp. 75-86, May 2003.
3. M. C. Chiang and T. E. Boult, “Efficient Super-Resolution via Image Warping.” Image
Vis. Comput., vol. 18, no. 10, pp. 761-771, July 2000.
4. M. Elad and A. Feuer, “Restoration of a Single Super-Resolution Image from Several
Blurred, Noisy and Down-Sampled Measured Images.” IEEE Trans. Image Processing,
vol. 6, pp. 1646-1658, Dec. 1997.
5. M. Elad and Y. Hel-Or, “A Fast Super-Resolution Reconstruction Algorithm for Pure
Translational Motion and Common Space Invariant Blur.” IEEE Trans. Image Processing,
vol. 10, pp. 1187-1193, Aug. 2001.
6. S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Advances and Challenges in Super-
Resolution.” International Journal of Imaging Systems and Technology, Special Issue on
High Resolution Image Reconstruction, vol. 14, no. 2, pp. 47-57, Aug. 2004.
7. S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, “Fast and Robust Multi-Frame Super-
resolution.” IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1327-1344, Oct.
2004.
8. R.W. Gerchberg, “Super-Resolution through Error Energy Reduction.” Optica Acta, vol.
21, no. 9, pp. 709-720, 1974.
9. R. C. Gonzalez and P. Wintz, Digital Image Processing. New York: Addison-Wesley, 1987.
10. M. Irani and S. Peleg, “Super Resolution from Image Sequences.” International
Conference on Pattern Recognition, vol. 2, pp. 115-120, June 1990.
11. M. Irani, B. Rousso, and S. Peleg, “Computing Occluding and Transparent Motions.”
International Journal of Computer Vision, vol. 12, no. 1, pp. 5-16, Feb. 1994.
12. M. Irani and S. Peleg, “Improving Resolution by Image Registration.” CVGIP: Graph.
Models Image Processing, vol. 53, pp. 231-239, 1991.
13. A. K. Jain, Fundamentals in Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall,
1989.
14. D. Keren, S. Peleg, and R. Brada, “Image Sequence Enhancement Using Sub-Pixel
Displacements.” In Proceedings of IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR ‘88), pp. 742-746, Ann Arbor, Michigan, June 1988.
15. S. P. Kim and W.-Y. Su, “Subpixel Accuracy Image Registration by Spectrum
Cancellation.” In Proceedings IEEE International Conference on Acoustics, Speech and Signal
Processing, vol. 5, pp. 153-156, April 1993.
16. R. L. Lagendijk and J. Biemond. Iterative Identification and Restoration of Images. Boston,
MA: Kluwer, 1991.
17. L. Lucchese and G. M. Cortelazzo, “A Noise-Robust Frequency Domain Technique for
Estimating Planar Roto-Translations.” IEEE Transactions on Signal Processing, vol. 48, no.
6, pp. 1769–1786, June 2000.
18. N. Nguyen, P. Milanfar, and G. H. Golub, “A Computationally Efficient Image
Superresolution Algorithm.” IEEE Trans. Image Processing, vol. 10, pp. 573-583, April
2001.
19. A. Papoulis, “A New Algorithm in Spectral Analysis and Band-Limited Extrapolation.”
IEEE Transactions on Circuits and Systems, vol. 22, no. 9, pp. 735-742, 1975.
20. S. C. Park, M. K. Park, and M. G. Kang, “Super-Resolution Image Reconstruction: A
Technical Overview.” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21-36, May
2003.
21. S. Peleg, D. Keren, and L. Schweitzer, “Improving Image Resolution Using Subpixel
Motion.” CVGIP: Graph. Models Image Processing, vol. 54, pp. 181-186, March 1992.
22. W. K. Pratt, Digital Image Processing. New York: Wiley, 1991.
23. R. R. Schultz, L. Meng, and R. L. Stevenson, “Subpixel Motion Estimation for Super-
Resolution Image Sequence Enhancement.” Journal of Visual Communication and Image
Representation, vol. 9, no. 1, pp. 38-50, 1998.
24. H. Stark and P. Oskoui, “High-Resolution Image Recovery from Image-Plane Arrays
Using Convex Projections.” Journal of the Optical Society of America, Series A, vol. 6, pp.
1715-1726, Nov. 1989.
25. H. S. Stone, M. T. Orchard, E.-C. Chang, and S. A. Martucci, “A Fast Direct Fourier-
Based Algorithm for Sub-Pixel Registration of Images.” IEEE Transactions on Geoscience
and Remote Sensing, vol. 39, no. 10, pp. 2235-2243, Oct. 2001.
26. L. Teodosio and W. Bender, “Salient Video Stills: Content and Context Preserved.” In
Proc. 1st ACM Int. Conf. Multimedia, vol. 10, pp. 39-46, Anaheim, California, Aug. 1993.
27. R. Y. Tsai and T. S. Huang, “Multiframe Image Restoration and Registration.” In
Advances in Computer Vision and Image Processing, vol. 1, chapter 7, pp. 317-339, JAI
Press, Greenwich, Connecticut, 1984.
28. H. Ur and D. Gross, “Improved Resolution from Sub-Pixel Shifted Pictures.” CVGIP:
Graph. Models Image Processing, vol. 54, pp. 181-186, March 1992.
29. P. Vandewalle, S. Susstrunk, and M. Vetterli, “A Frequency Domain Approach to
Registration of Aliased Images with Application to Super-Resolution.” EURASIP Journal
on Applied Signal Processing, vol. 2006, pp. 1-14, Article ID 71459.
30. B. Zitova and J. Flusser, “Image Registration Methods: A Survey.” Image and Vision
Computing, vol. 21, no. 11, pp. 977-1000, 2003.
31. A. Zomet, A. Rav-Acha, and S. Peleg, “Robust Superresolution.” In Proceedings of IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ‘01), vol. 1,
pp. 645-650, Kauai, Hawaii, Dec. 2001.