Review

Particle filtering without tears: A primer for beginners

A. Tulsyan, R. Bhushan Gopaluni, S.R. Khare
Article history:
Received 12 February 2016
Received in revised form 14 July 2016
Accepted 31 August 2016
Available online 19 September 2016

Keywords:
Monte Carlo method
Particle filter
State estimation
Bayesian inference

Abstract

The main purpose of this primer is to systematically introduce the theory of particle filters to readers with limited or no prior understanding of the subject. The primer is written for beginners and practitioners interested in learning about the theory and implementation of particle filtering methods. Throughout this primer we highlight the common mistakes that beginners and first-time researchers make in understanding and implementing the theory of particle filtering. We also discuss and demonstrate the use of particle filtering in nonlinear state estimation applications. We conclude the primer by providing an implementable version of MATLAB code for particle filters. The code not only aids in improving the understanding of particle filters, it also serves as a template for building and implementing advanced nonlinear state estimation routines.

© 2016 Elsevier Ltd. All rights reserved.
Contents

1. Introduction
2. Perfect sampling
 2.1. Empirical distribution function under perfect sampling
 2.2. Understanding Dirac delta
 2.3. Integral property of Dirac delta
 2.4. Reconstructing CDF under perfect sampling
  2.4.1. Comparing MC probability approximation and histogram
 2.5. The resolution problem
  2.5.1. The inter-sample resolution problem
  2.5.2. The sampling resolution problem
3. Importance sampling (IS)
 3.1. Empirical distribution function under IS
 3.2. Importance weights
 3.3. Reconstructing CDF under importance sampling
4. Sampling importance resampling (SIR)
 4.1. Resampling step
 4.2. Resampling strategy
5. State estimation
 5.1. State space models
 5.2. Bayesian state estimation
 5.3. Filtering methods
 5.4. Particle filtering
6. Implementation
7. Conclusions
References
∗ Corresponding author.
E-mail addresses: tulsyan@mit.edu (A. Tulsyan), bhushan.gopaluni@ubc.ca (R. Bhushan Gopaluni), srkhare@maths.iitkgp.ernet.in (S.R. Khare).
http://dx.doi.org/10.1016/j.compchemeng.2016.08.015
Fig. 1. A computer-generated graphical illustration showing: (a) a bi-variate Gaussian density function; (b) random samples, or particles, distributed according to the bi-variate Gaussian density function. The figure is created in MATLAB using the command mvnrnd (MATLAB, 2010a).

Fig. 2. An illustration showing: (a) perfect i.i.d. random sampling from a standard normal distribution, with random particles denoted by red dots along the axis and the underlying density function represented by the solid black curve; (b) the probability of X assuming values in an infinitesimal interval dx. The probability of X ∈ dx is the blue highlighted area and is approximately equal to p(x)dx.
In our opinion, the varied representations used in different disciplines have left practitioners in a lurch when it comes to fully understanding an empirical distribution function. In this paper, we will assume (2) as the definition of an empirical distribution function; however, readers are cautioned that, without invoking the rigorous ideas of probability and measures, (2) also constitutes an abuse of notation. It is not our aim to revisit the historical development of the empirical distribution, or to establish the rigorous theory behind it. The objective here is to provide insights, and to highlight common misconceptions behind the use of (2) prevalent among practitioners. We begin by looking at the Dirac delta used in (2).

2.2. Understanding Dirac delta

In the context of MC methods, a rigorous way to define the delta in (2) is to think of it as a "measure", or Dirac delta, for a given point x on the real-line ℝ. A Dirac delta accepts as an argument some set A and returns a value as defined by

$$\delta_x(A) = 1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases} \qquad (3)$$

For instance, the representation in (2) is often written, incorrectly, as a density

$$p(x) = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x^{(i)}). \qquad (7)$$

A serious flaw with (7) is that it erroneously suggests that MC methods approximate PDFs. In fact, in the literature it is often reported that MC methods approximate PDFs by counting the number of particles at each location. It is easy to check that this claim is incorrect, since the maximum density value assigned at a particle location in (7) with the Kronecker delta function in (6) is 1, while the true density can be greater than 1.

Thus, while perceiving the delta in (2) as a measure is rigorous and consistent with the theory of MC methods, it is standard practice to refer to it as a function. In our view, this abuse of semantics is serious, for it incites practitioners into accepting the heuristic or incorrect definitions of delta suggested by (4) and (6). In the rest of the paper, any reference to delta is to be construed as a Dirac measure, as defined by (3).

Finally, Fig. 3(a) gives a pedagogical sketch of the representation in (2). As evident from Fig. 3(a), it is intuitive to think of an empirical distribution function as a collection of mass contained in N infinitesimal bins, each of length dx and centered around the particle set {x^{(i)}}_{i=1}^{N}.
The MC approximation of the probability that X takes values in a set L then follows from (2) as

$$\widetilde{\Pr}(X \in L) = \int_L \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}}(dx) = \frac{1}{N}\sum_{i=1}^{N} 1_L(x^{(i)}). \qquad (12b)$$

A histogram, on the other hand, approximates the density in L by counting the number of particles in L, normalized by the total area under the histogram. Thus it is easy to check that the sum of MC probabilities over all given intervals always adds up to 1 (see Fig. 3(c)); whereas the sum of histogram values can be greater than 1. Conversely, it is also true that the area under a histogram is always 1; whereas the area under a MC distribution function need not necessarily be 1.

Setting L = (−∞, a] in (12b) yields a MC approximation of the CDF of X,

$$\tilde{P}_X(a) = \widetilde{\Pr}(X \in (-\infty, a]) = \frac{1}{N}\sum_{i=1}^{N} 1_{(-\infty,a]}(x^{(i)}), \qquad (14)$$

so that, in particular,

$$\tilde{P}_X(+\infty) = \widetilde{\Pr}(X \in (-\infty, +\infty)) = \frac{1}{N}\sum_{i=1}^{N} 1_{(-\infty,+\infty)}(x^{(i)}) = 1. \qquad (15a)$$

Fig. 3(d) gives a schematic of the CDF approximation in (14). The "step-like" approximation of the CDF in Fig. 3(d) comes from the indicator function in (14), which adds a step-height of N⁻¹ for every particle encountered while moving from left to right along the real-line as the CDF is reconstructed. In MATLAB, the MC approximation of a CDF given in (14) can be computed automatically using the command cdfplot. The next example gives a MATLAB code to generate MC and histogram approximations of the distribution function and the PDF, respectively, of a Gaussian random variable.
Example 2.1. Let X ∼ N(·|0, 0.01²) be a Gaussian random variable with mean 0 and standard deviation 0.01. Assuming we sample N = 10,000 i.i.d. particles distributed according to N(·|0, 0.01²), a MC approximation of the underlying distribution function and a histogram approximation of the PDF can be computed in MATLAB using the following code:
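A minimal sketch of such a code is given below; it is an illustrative reconstruction rather than the article's original listing, and it assumes the Statistics Toolbox is available for cdfplot.

% Example 2.1 (sketch): MC approximation of the CDF and a histogram
% approximation of the PDF for X ~ N(0, 0.01^2).
N = 10000;
x = 0 + 0.01*randn(N,1);                      % i.i.d. particles
figure; cdfplot(x);                           % "step-like" MC CDF, cf. (14)
figure; histogram(x, 'Normalization', 'pdf'); % histogram approx. of the PDF
Pa = sum(x <= 0)/N                            % CDF value at a = 0, cf. (14)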
The numerical value reported in this example is based on a code written in MATLAB. Note that the calculated CDF value is "close" to the theoretical value of 0.5.

2.5. The resolution problem

In MC methods, the resolution, or quality, of the CDF approximation is proportional to the number of particles used. The resolution of P̃_X in (14) can thus be arbitrarily improved by simply increasing the number of particles. Since computers only have finite computing capabilities, the practical resolution of (14) is also finite. The finite resolution of (14) leads to two serious problems: (a) finite inter-sample resolution, and (b) finite sampling resolution. These two issues are discussed next.

2.5.1. The inter-sample resolution problem

From Fig. 5(a) it is clear that while the true probability Pr(X ∈ [a, b]) is non-zero, Pr̃(X ∈ [a, b]) = 0. Since (17) holds for any subset contained in the inter-sample interval [x^{(i)}, x^{(j)}), the CDF approximations at inter-sample locations are generally poor.
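A minimal sketch of this effect follows (our illustration; normcdf requires the Statistics Toolbox): the MC approximation assigns zero probability to an interval lying strictly between two adjacent particles, even though the true probability is positive.

% Inter-sample resolution problem (sketch), cf. Fig. 5(a).
N = 20;                                % deliberately few particles
x = sort(randn(N,1));                  % ordered particles from N(0, 1)
a = x(10) + 0.25*(x(11) - x(10));      % [a, b] strictly inside (x(10), x(11))
b = x(10) + 0.75*(x(11) - x(10));
mc_prob   = sum(x >= a & x <= b)/N     % = 0 by construction
true_prob = normcdf(b) - normcdf(a)    % > 0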
Fig. 5. A schematic highlighting resolution problems with the MC approximation of the CDF, P̃_X in (14). P̃_X is denoted by the "step-like" function in red, and the true CDF P_X is represented by the solid black curve. Sampled particles are denoted by solid red balls along the X-axis. (a) We show that for any interval [a, b] ⊂ [x^{(i)}, x^{(j)}), where {x^{(i)}, x^{(j)}} is a pair of adjacent random particles, the probability Pr̃(X ∈ [a, b]) = 0, while Pr(X ∈ [a, b]) > 0. (b) We sketch the principle of the inverse transform sampling method. Let Y ∼ U(·|0, 1) be a uniformly distributed random variable defined on the interval [0, 1]. According to the inverse transform sampling method, for every random number Y = y generated in the interval [a, b], projecting it onto P̃_X selects x^{(j)} as the random sample, whereas projecting it onto P_X selects a different random sample. In the figure, the projection of the numbers a and b onto P_X is illustrated by yellow arrows, and the projection onto P̃_X by black arrows. Now, since the probability Pr(Y ∈ [a, b]) = (b − a) = N⁻¹ (recall that the step-size at each particle location in P̃_X is N⁻¹), we have Pr̃(X = x^{(j)}) = N⁻¹.
3. Importance sampling (IS)

3.1. Empirical distribution function under IS

Assuming we sample N i.i.d. particles {x̃^{(i)}}_{i=1}^{N} distributed according to a sampling density s(·), an empirical distribution function for X under IS is

$$p(x)dx = \sum_{i=1}^{N} w^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx), \qquad (20)$$

where p(x)dx is an N-particle approximation of an empirical distribution function of X over dx, δ_{x̃^{(i)}}(·) is the Dirac delta, and w^{(i)} is the "importance weight" associated with x̃^{(i)}. The choice of an empirical distribution function under IS in (20) can be substantiated by reconsidering the integration in (9). Recall that if p(x) lends itself to perfect sampling, then (10b) yields a MC approximation of (9); otherwise, we can rewrite (9) as

$$H(f(x)) = \int_{\mathbb{R}} f(x)\,\frac{p(x)}{s(x)}\,s(x)\,dx. \qquad (21)$$

Now, since s(x) admits perfect sampling, an empirical distribution function for X̃ can be written as

$$s(x)dx = \frac{1}{N}\sum_{i=1}^{N} \delta_{\tilde{x}^{(i)}}(dx). \qquad (22)$$

Substituting (22) into (21), and using the importance weight defined in (24), yields

$$= \sum_{i=1}^{N} w^{(i)} \int_{\mathbb{R}} f(x)\,\delta_{\tilde{x}^{(i)}}(dx) = \sum_{i=1}^{N} f(\tilde{x}^{(i)})\,w^{(i)}, \qquad (25b)$$

which is the same as (23d). This motivates the use of (20) as the abstract definition of an empirical distribution function under IS. Observe that the importance weight in (24) is a random variable, with W^{(i)} = w^{(i)} denoting its random realization. The ordered set {x̃^{(i)}, w^{(i)}}_{i=1}^{N} is referred to as a "particle system". The use of importance weights in (20) is discussed next.

3.2. Importance weights

The importance weight in (20) reflects the correction involved to ensure that the particles from the sampling density are in fact samples from the target density. This correction involves assigning a weight to each of the sampled particles. As seen in (24), the importance weight of a particle is proportional to the ratio of the target density to the sampling density evaluated at the particle location. Note that in many practical applications, including state estimation, the normalizing factor in the target density is unknown. This implies that the importance weight in (24) is only known up to a constant scaling factor, such that

$$w^{(i)} \propto \frac{p(\tilde{x}^{(i)})}{s(\tilde{x}^{(i)})} \quad \text{for } i = 1, \ldots, N. \qquad (26)$$

A straightforward approach to resolve this issue is to normalize the weights in (26) as follows:

$$\tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{i=1}^{N} w^{(i)}}, \quad \text{for } i = 1, \ldots, N, \qquad (27)$$

where w̃ is the "normalized" weight. With the normalized weights, the empirical distribution in (20) can now be written as

$$p(x)dx = \sum_{i=1}^{N} \tilde{w}^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx). \qquad (28)$$

The normalized weights in (28) satisfy the following properties:

$$\tilde{w}^{(i)} \geq 0 \ \text{ for } i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^{N} \tilde{w}^{(i)} = 1. \qquad (29)$$

3.3. Reconstructing CDF under importance sampling

As under perfect sampling, the probability that X takes values in a set L can be approximated as

$$\widetilde{\Pr}(X \in L) = \int_L \sum_{i=1}^{N} \tilde{w}^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx) \qquad (30a)$$

$$= \sum_{i=1}^{N} \tilde{w}^{(i)} \int_L \delta_{\tilde{x}^{(i)}}(dx) = \sum_{i=1}^{N} \tilde{w}^{(i)}\,1_L(\tilde{x}^{(i)}), \qquad (30b)$$

where Pr̃(·) is an MC approximation of the distribution function Pr(·) under IS. Finally, using (30b), the CDF of X can be approximated by setting L = (−∞, a] in (30b):

$$\tilde{P}_X(a) = \widetilde{\Pr}(X \in (-\infty, a]) = \sum_{i=1}^{N} \tilde{w}^{(i)}\,1_{(-\infty,a]}(\tilde{x}^{(i)}), \qquad (31)$$

where P̃_X(·) is a MC approximation of the CDF. Observe that (29) ensures that P̃_X(+∞) = 1 in (31). Graphically, the CDF approximation in (31) yields a "step-like" function (see Fig. 7 for an illustration), with the step-size at each particle location equal to the particle weight (contrast this with the perfect sampling case, where the steps have a uniform height of N⁻¹).
4. Sampling importance resampling (SIR)

Given the M resampled particles {x^{(j)}}_{j=1}^{M}, the empirical distribution function of X under SIR can be represented as

$$p(x)dx = \frac{1}{M}\sum_{j=1}^{M} \delta_{x^{(j)}}(dx). \qquad (34)$$
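Continuing the sketch above, the resampling step that turns the weighted particle system into the unweighted set in (34) can be written as follows (multinomial resampling is shown for brevity; randsample is a Statistics Toolbox function):

% SIR sketch: resample M = N particles with replacement, with probabilities
% given by the normalized weights; the resampled set is approximately
% distributed according to the target, with uniform weights 1/M as in (34).
M   = numel(xs);
idx = randsample(M, M, true, wt);   % draw particle indices by weight
xr  = xs(idx);                      % resampled (dependent) particles
mean(xr)                            % close to the target mean of 1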
5. State estimation
5.1. State space models

Consider a general probabilistic state space model (SSM) of the form

$$X_0 \sim p(x_0), \qquad (35a)$$
$$X_{t+1}\,|\,(X_t = x_t) \sim p(x_{t+1}|x_t), \qquad (35b)$$
$$Y_t\,|\,(X_t = x_t) \sim p(y_t|x_t), \qquad (35c)$$

where (35a) is the density of the initial latent state. For the sake of brevity, the dependence of (35) on the exogenous inputs and known model parameters is not explicitly shown in this section; however, it is straightforward to include them (see Section 6). Further, this exposition only considers the case with n = 1 and m = 1. Its extension to multi-dimensional systems is straightforward in terms of implementation; however, it should be cautioned that, like many other state estimation algorithms, particle filters suffer from the curse of dimensionality, as sampling is often inefficient in higher-dimensional spaces (order of tens or higher).

The state estimation problem aims at computing an estimate of x_t ∈ X in real-time, using the sequence of measurements denoted as y_{1:t} ≡ {y_1, ..., y_t} for all t ∈ ℕ.
5.2. Bayesian state estimation

In the Bayesian framework, state estimation for the SSM in (35) is solved by recursively computing the state posterior density

$$X_t\,|\,(Y_{1:t} = y_{1:t}) \sim p(x_t|y_{1:t}), \qquad (36)$$

where p(x_t|y_{1:t}) is called a posterior or filtering density. Intuitively, the posterior density is a probabilistic representation of the available statistical information on X_t given y_{1:t}. Using the Markov property of (35) and Bayes' theorem, we can write

$$p(x_t|y_{1:t}) = \frac{p(y_t|x_t)\,p(x_t|y_{1:t-1})}{\int_{X} p(y_t|x_t)\,p(x_t|y_{1:t-1})\,dx_t}. \qquad (38)$$
Further, using marginalization, p(x_t|y_{1:t-1}) can be written as

$$p(x_t|y_{1:t-1}) = \int_{X} p(x_t|x_{t-1})\,p(x_{t-1}|y_{1:t-1})\,dx_{t-1}, \qquad (39)$$

where p(x_t|x_{t-1}) and p(x_{t-1}|y_{1:t-1}) are the transition and posterior densities at t − 1, respectively. Given p(x_t|y_{1:t}), the most common point estimate of x_t is the posterior mean, given by

$$\hat{x}_{t|t} = E[X_t\,|\,(Y_{1:t} = y_{1:t})] = \int_{X} x_t\,p(x_t|y_{1:t})\,dx_t, \qquad (40)$$

where x̂_{t|t} ∈ ℝ^n is an estimate of x_t. Recursively solving (38) and (39) for the posterior density is called the filtering problem, and the solution methods are called filtering methods.

5.3. Filtering methods

In linear SSMs, the state posterior density in (38) is Gaussian and can thus be exactly represented by the Kalman filter (KF) using a finite number of moments (e.g., mean, variance); whereas in nonlinear SSMs, the posterior is non-Gaussian, and, at least in theory, an infinite number of moments are required for an exact representation of the density (Ristic et al., 2004). Thus, with finite computing capabilities, an optimal nonlinear state filter is not realizable (Tulsyan et al., 2013a,b).

In the last few decades, several approximate nonlinear state filters based on statistical and analytical approximations of the optimal nonlinear filter have been developed for state estimation in nonlinear SSMs (Sorenson, 1974; Maybeck, 1982; Tulsyan et al., 2014, 2012). Most of these nonlinear filters can be classified as either Kalman-based filters or sequential Monte Carlo (SMC)-based filters. Both the Kalman and SMC-based filters are tractable in finite computational time and can be used for state estimation in general or specific types of nonlinear SSMs. A detailed exposition of nonlinear filtering methods and related approximations is not included here, but can be found in the handbook of nonlinear filtering (Crisan and Rozovskii, 2011).

The class of SMC-based filtering methods, popularly referred to as particle filters, is an important class of filtering methods for nonlinear SSMs. Some of the popular particle filtering algorithms include the sampling importance resampling (SIR) filter, the auxiliary SIR (ASIR) filter, and the Rao-Blackwellized particle filter (RBPF). The SIR method is the simplest of these particle filtering algorithms. In the next section, we demonstrate the application of the SIR method discussed in Section 4 in solving the recursive filtering solution in (38) and (39).

5.4. Particle filtering

The SIR method is the most basic particle filtering method. In order to use the idea of SIR discussed in Section 4 for state estimation, we first need to identify the target and sampling densities in the filtering problem. Observe that in (40), the state estimates are computed by solving the integral with respect to the posterior density. Recall that a MC approximation of (40) can be computed by generating perfect samples distributed according to p(x_t|y_{1:t}). Now, since p(x_t|y_{1:t}) does not permit perfect sampling, we use the SIR method to generate samples from p(x_t|y_{1:t}). Therefore, in (38), p(x_t|y_{1:t}) is the target density, and p(x_t|y_{1:t-1}) is taken as the sampling density, such that

$$\underbrace{p(x_t|y_{1:t})}_{\text{target density}} \propto p(y_t|x_t)\,\underbrace{p(x_t|y_{1:t-1})}_{\text{sampling density}}. \qquad (41)$$

It is highlighted that while p(x_t|y_{1:t-1}) is selected as the sampling density in (41), this is not a requirement in general. In fact, in many advanced particle filtering algorithms, sampling densities other than the one considered here have been found to be more effective in state estimation applications. The discussion of other sampling density functions is outside the scope of this work; however, readers are referred to Arulampalam et al. (2002), and the references cited therein, for a detailed treatment of the subject. Now, for the choice of the target and sampling densities in (41), we can define importance weights as follows (see (24)):

$$w_{t|t-1}^{(i)} \equiv \frac{1}{N}\,\frac{p(x_{t|t-1}^{(i)}|y_{1:t})}{p(x_{t|t-1}^{(i)}|y_{1:t-1})}, \quad \text{for all } i = 1, \ldots, N, \qquad (42)$$

where {x_{t|t-1}^{(i)}}_{i=1}^{N} and {w_{t|t-1}^{(i)}}_{i=1}^{N} are the i.i.d. samples from p(x_t|y_{1:t-1}) and the unnormalized importance weights, respectively.
In principle, (42) cannot be evaluated, since the target density is unknown; however, using (38), we can rewrite (42) as

$$w_{t|t-1}^{(i)} \propto \frac{1}{N}\,p(y_t|x_{t|t-1}^{(i)}), \quad \text{for } i = 1, \ldots, N, \qquad (43)$$

or, in terms of normalized importance weights, as

$$\tilde{w}_{t|t-1}^{(i)} = \frac{p(y_t|x_{t|t-1}^{(i)})}{\sum_{j=1}^{N} p(y_t|x_{t|t-1}^{(j)})}, \quad \text{for } i = 1, \ldots, N, \qquad (44)$$

where {w̃_{t|t-1}^{(i)}}_{i=1}^{N} are the normalized weights. According to (44), the particle weights depend on the likelihood function p(y_t|x_t). Intuitively, this makes sense, since the likelihood establishes how well a given state explains the measurement. Thus, the better a certain particle explains the measurement, the higher the probability that the particle was in fact sampled from the target density. A prerequisite for computing the weights in (44) is complete access to the particle set {x_{t|t-1}^{(i)}}_{i=1}^{N}. This set can be obtained as follows. Assuming we have particles from the target density p(x_{t-1}|y_{1:t-1}) at the previous time, its empirical distribution can be written as

$$p(x_{t-1}|y_{1:t-1})\,dx_{t-1} = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t-1|t-1}^{(i)}}(dx_{t-1}). \qquad (45)$$

Now substituting (45) into (39) yields

$$\tilde{p}(x_t|y_{1:t-1}) = \int_{X} p(x_t|x_{t-1})\,\frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t-1|t-1}^{(i)}}(dx_{t-1}) \qquad (46a)$$

$$= \frac{1}{N}\sum_{i=1}^{N} p(x_t|x_{t-1|t-1}^{(i)}), \qquad (46b)$$

where p̃(x_t|y_{1:t-1}) is a MC approximation of p(x_t|y_{1:t-1}). In (46b), the sampling density is given as a mixture of N transition densities. Now, since each of the N densities is uniformly weighted, passing the particle set {x_{t-1|t-1}^{(i)}}_{i=1}^{N} through the transition density generates an i.i.d. sample set {x_{t|t-1}^{(i)}}_{i=1}^{N} that is distributed according to the sampling density p(x_t|y_{1:t-1}). Once {x_{t|t-1}^{(i)}}_{i=1}^{N} is generated, the empirical distribution of the sampling density can be written as

$$p(x_t|y_{1:t-1})\,dx_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t|t-1}^{(i)}}(dx_t). \qquad (47)$$

From {x_{t|t-1}^{(i)}}_{i=1}^{N}, particles from the target density are obtained by resampling {x_{t|t-1}^{(i)}, w̃_{t|t-1}^{(i)}}_{i=1}^{N} according to

$$\widetilde{\Pr}(X_{t|t}^{(j)} = x_{t|t-1}^{(i)}) = \tilde{w}_{t|t-1}^{(i)}, \quad \text{for } j = 1, \ldots, N, \qquad (48)$$

with the resampled particle weights reset to

$$w_{t|t}^{(j)} = \frac{1}{N}, \quad \text{for } j = 1, \ldots, N. \qquad (49)$$

Finally, the particle system {x_{t|t}^{(i)}, w_{t|t}^{(i)}}_{i=1}^{N} = {x_{t|t}^{(i)}, N^{-1}}_{i=1}^{N} corresponds to the target density p(x_t|y_{1:t}), with its empirical distribution represented as

$$p(x_t|y_{1:t})\,dx_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t|t}^{(i)}}(dx_t). \qquad (50)$$

Given (50), the state estimate in (40) can be computed as

$$\hat{x}_{t|t} \approx \int_{X} x_t\,\frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t|t}^{(i)}}(dx_t) = \frac{1}{N}\sum_{i=1}^{N} x_{t|t}^{(i)}. \qquad (51)$$

Recall from Section 4.2 that the resampling in (48) takes the i.i.d. particles {x_{t|t-1}^{(i)}}_{i=1}^{N} and delivers dependent particles {x_{t|t}^{(i)}}_{i=1}^{N}. As shown in Ninness (2000), the rate of convergence of (51) to (40) decreases as the correlation in the resampled set {x_{t|t}^{(i)}}_{i=1}^{N} increases. This problem is alleviated by alternatively computing (40) as

$$\hat{x}_{t|t} = \int_{X} x_t\,\frac{p(x_t|y_{1:t})}{p(x_t|y_{1:t-1})}\,p(x_t|y_{1:t-1})\,dx_t \qquad (52a)$$

$$\propto \int_{X} x_t\,p(y_t|x_t)\,\frac{1}{N}\sum_{i=1}^{N} \delta_{x_{t|t-1}^{(i)}}(dx_t) \qquad (52b)$$

$$= \sum_{i=1}^{N} x_{t|t-1}^{(i)}\,\frac{p(y_t|x_{t|t-1}^{(i)})}{\sum_{j=1}^{N} p(y_t|x_{t|t-1}^{(j)})} = \sum_{i=1}^{N} x_{t|t-1}^{(i)}\,\tilde{w}_{t|t-1}^{(i)}. \qquad (52c)$$

From (52a) to (52b), we have used the relation (38) and the empirical distribution (47). Now, since the sum in (52c) involves the independent set {x_{t|t-1}^{(i)}}_{i=1}^{N} instead of the dependent set {x_{t|t}^{(i)}}_{i=1}^{N} in (51), the estimate (52c) is generally more accurate.

Finally, the procedure to recursively compute the MC approximation of the posterior density in (50) is referred to as the particle filtering algorithm, which is outlined in Algorithm 4.

Algorithm 4. Particle filter for state estimation

1: Initialization: generate {x_{0|0}^{(i)}}_{i=1}^{N} ∼ p(x_0), distributed according to the initial state density p(x_0).
2: for t = 1 to T do
3:  Predict: predict {x_{t|t-1}^{(i)}}_{i=1}^{N} according to x_{t|t-1}^{(i)} ∼ p(x_t|x_{t-1|t-1}^{(i)}), for i = 1, ..., N.
4:  Update: compute the importance weights {w̃_{t|t-1}^{(i)}}_{i=1}^{N} according to (44).
5:  State estimation: compute the state estimate as x̂_{t|t} = Σ_{i=1}^{N} x_{t|t-1}^{(i)} w̃_{t|t-1}^{(i)}, cf. (52c).
6:  Resample: generate {x_{t|t}^{(i)}}_{i=1}^{N} by resampling {x_{t|t-1}^{(i)}, w̃_{t|t-1}^{(i)}}_{i=1}^{N} according to (48), and reset the weights as in (49).
7: end for

6. Implementation

In this section, we discuss the implementation of Algorithm 4. The aim of this discussion is to enable beginners and first-time researchers to implement particle filtering for their own state estimation problems. Before we give the implementation, it is worth commenting on some aspects of Algorithm 4. First, observe that the SSM in (35) is a general probabilistic representation of time-series models. In this section, we consider the problem of state estimation in the following class of SSMs:

$$X_0 \sim N(\,\cdot\,|M_0, P_0), \qquad (53a)$$
$$X_{t+1} = f_\theta(X_t, U_t) + V_t, \quad V_t \sim N(\,\cdot\,|0, P_X), \qquad (53b)$$
$$Y_t = g_\theta(X_t, U_t) + W_t, \quad W_t \sim N(\,\cdot\,|0, P_Y), \qquad (53c)$$

where the initial state is Gaussian with mean M_0 and covariance P_0; U_t ∈ ℝ^p are the system inputs; V_t ∈ ℝ^n and W_t ∈ ℝ^m are the additive, mutually independent, zero-mean Gaussian state and measurement noise processes, respectively; and f and g are the state and measurement mapping functions, parametrized by θ ∈ ℝ^k. Here, (53) is called a discrete-time nonlinear SSM with additive Gaussian noise, and it can be probabilistically represented as (35). In fact, the state transition density and the likelihood for (53) can be represented as

$$p(x_{t+1}|x_t, u_t) = N(\,\cdot\,|(x_{t+1} - f_\theta(x_t, u_t)), P_X), \qquad (54a)$$
$$p(y_t|x_t, u_t) = N(\,\cdot\,|(y_t - g_\theta(x_t, u_t)), P_Y). \qquad (54b)$$

The densities in (54a) and (54b) correspond to the mean-shifted state and measurement noise, respectively. Now, given (54a), Step 3 of Algorithm 4 is implemented as follows. First, we generate a sample of the state noise (V_t = v_t^{(i)}) ∼ N(·|0, P_X); then, given x_{t-1|t-1}^{(i)}, we predict x_{t|t-1}^{(i)} as

$$x_{t|t-1}^{(i)} = f_\theta(x_{t-1|t-1}^{(i)}, u_{t-1}) + v_t^{(i)}. \qquad (55)$$
Here, given v_t^{(i)} and x_{t-1|t-1}^{(i)}, the particle x_{t|t-1}^{(i)} is generated deterministically. Similarly, w̃_{t|t-1}^{(i)} in Step 4 is calculated using (54b): since (54b) is Gaussian, w̃_{t|t-1}^{(i)} is obtained by evaluating a Gaussian density at x_{t|t-1}^{(i)}. Finally, a MATLAB implementation of Algorithm 4 for the SSM in (53) with n = 2, m = 1 and p = 2 is given in Table 3. We use the systematic resampling of Algorithm 3 to implement Step 6 of Algorithm 4. The first four arguments to the function StateEstimation are the state equations f1 and f2, the measurement equation g, and the likelihood pe, all defined as MATLAB inline functions. We demonstrate the use of the code in Table 3 in the following example.

Table 3
A MATLAB code for implementing Algorithm 4 for the SSM in (53) with n = 2, m = 1 and p = 2.
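The listing itself is not reproduced here; a minimal sketch of what such a StateEstimation routine could look like is given below, assuming the argument order described above (f1, f2, g, pe as function handles) and illustrative names and shapes for the remaining inputs:

function xhat = StateEstimation(f1, f2, g, pe, u, y, x0, PX, N, T)
% Sketch of Algorithm 4 (SIR particle filter) for the SSM in (53) with
% n = 2 states, m = 1 measurement and p = 2 inputs (illustrative signature).
% u (2 x T): inputs; y (1 x T): measurements; x0 (2 x N): initial particles.
xhat = zeros(2, T);
x    = x0;                                  % particles {x_{0|0}}
L    = chol(PX, 'lower');                   % for sampling V_t ~ N(0, PX)
for t = 1:T
    % Step 3 (Predict): propagate each particle through (55)
    v  = L*randn(2, N);
    xp = [f1(x(1,:), x(2,:), u(1,t), u(2,t));
          f2(x(1,:), x(2,:), u(1,t), u(2,t))] + v;
    % Step 4 (Update): normalized importance weights, cf. (44)
    w = pe(y(t), g(xp(1,:), xp(2,:)));
    w = w/sum(w);
    % Step 5 (State estimate): weighted mean, cf. (52c)
    xhat(:,t) = xp*w(:);
    % Step 6 (Resample): systematic resampling, cf. Algorithm 3 and (48)-(49)
    edges = cumsum(w);
    u0 = rand/N; j = 1; idx = zeros(1, N);
    for i = 1:N
        while j < N && edges(j) < u0 + (i-1)/N
            j = j + 1;
        end
        idx(i) = j;
    end
    x = xp(:, idx);                         % weights implicitly reset to 1/N
end
end

Systematic resampling is used here, rather than multinomial resampling, because it introduces less Monte Carlo variation for the same number of particles.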
Example 6.1. Consider a semi-continuous Baker's fermenter for biomass growth. Assuming Monod kinetics for both biomass growth and substrate consumption, the dynamics of the species inside the fermenter can be described by (Tulsyan et al., 2012)

$$X_{t+1}(1) = 0.1\left[\frac{\theta(1)\,X_t(2)}{\theta(2) + X_t(2)} - U_t(1) - \theta(4)\right]X_t(1) + V_t(1),$$

$$X_{t+1}(2) = 0.1\left[-\frac{\theta(1)\,X_t(2)}{\theta(2) + X_t(2)}\,\frac{X_t(1)}{\theta(3)} + U_t(1)\,(U_t(2) - X_t(2))\right] + V_t(2),$$

$$Y_t = X_t(2) + W_t,$$

where X_t(1) and X_t(2) are the state variables representing the concentrations of biomass growth (g/L) and substrate consumption (g/L) as a function of time t, respectively. The manipulated variables U_t(1) and U_t(2) are the dilution factor (h⁻¹) and the substrate concentration (g/L) in the feed, respectively. The measurements y_t are available only for the substrate consumption. The state noise and measurement noise are given by V_t ∼ N(·|0, P_X) and W_t ∼ N(·|0, P_Y), respectively. Also, θ = [θ(1), θ(2), θ(3), θ(4)]^T are the four model parameters, assumed to be perfectly known a priori. The objective is then to estimate, in real-time, the concentration of biomass in the fermenter using the available noisy substrate measurements.

The first step is to define the model in terms of initial conditions, state and measurement functions, and noise distributions. Filter parameters, such as the number of particles, are also initialized in the first step. The next step is to generate synthetic measurements using the model description. In real-world applications, the measurements are sampled from the process. Once the measurements are available, the StateEstimation routine is invoked. The MATLAB code for state estimation in Example 6.1 is shown in Table 4.

Table 4
A MATLAB code for state estimation in Example 6.1 using Algorithm 4.
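The listing in Table 4 is likewise not reproduced here; a minimal driver script in its spirit could look as follows (the parameter values, noise covariances, inputs and initial conditions are illustrative assumptions, not the article's settings):

% Example 6.1 driver (sketch): define the fermenter model, simulate
% synthetic measurements, and invoke the StateEstimation routine above.
theta = [0.3; 0.1; 0.5; 0.05];             % theta(1..4), assumed known
PX = 1e-4*eye(2);  PY = 1e-2;              % illustrative noise covariances
f1 = @(x1,x2,u1,u2) 0.1*(theta(1)*x2./(theta(2)+x2) - u1 - theta(4)).*x1;
f2 = @(x1,x2,u1,u2) 0.1*(-theta(1)*x2./(theta(2)+x2).*x1/theta(3) ...
                         + u1.*(u2 - x2));
g  = @(x1,x2) x2;                          % only substrate is measured
pe = @(yt,yp) exp(-(yt - yp).^2/(2*PY));   % Gaussian likelihood, cf. (54b)

T = 100; N = 1000;
u  = [0.1*ones(1,T); 5*ones(1,T)];         % dilution factor; feed conc.
xt = [1; 2]; y = zeros(1,T);
Lx = chol(PX, 'lower');
for t = 1:T                                % synthetic data generation
    xt = [f1(xt(1),xt(2),u(1,t),u(2,t));
          f2(xt(1),xt(2),u(1,t),u(2,t))] + Lx*randn(2,1);
    y(t) = g(xt(1),xt(2)) + sqrt(PY)*randn;
end
x0   = repmat([1; 2], 1, N) + 0.1*randn(2, N);   % initial particle cloud
xhat = StateEstimation(f1, f2, g, pe, u, y, x0, PX, N, T);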
Finally, the state estimates computed using Table 4 are shown in Fig. 11. The deviation of the state estimates from the true values in Fig. 11 is due to the noise in the system. From Fig. 11 it is clear that Algorithm 4 is successful in accurately estimating the concentration of the biomass growth.
7. Conclusions

In this primer, we have systematically introduced the theory of particle filtering, highlighting the common mistakes and pitfalls beginners make while studying particle filtering for the first time. Moreover, we have also provided the reader with some intuition as to why the algorithm works and how to implement it in practice. An implementable version of MATLAB code for particle filters is also provided. The code not only aids in improving the understanding of particle filters, it also serves as a template for beginners to build and implement their own advanced state estimation routines.
References
Tulsyan, A., Gopaluni, R.B., 2016. Robust model-based delay timer alarm for non-linear processes. In: Proceedings of the 2016 American Control Conference (ACC), Boston, pp. 2989–2994.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2012. Performance assessment of nonlinear state filters. In: Proceedings of the 8th IFAC Symposium on Advanced Control of Chemical Processes, Singapore, pp. 371–376.
Tulsyan, A., Forbes, J.F., Huang, B., 2012. Designing priors for robust Bayesian optimal experimental design. J. Process Control 22 (2), 450–462.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013. Bayesian identification of non-linear state space models: Part II – Error analysis. In: Proceedings of the 10th International Symposium on Dynamics and Control of Process Systems, Singapore.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013a. On simultaneous on-line state and parameter estimation in non-linear state-space models. J. Process Control 23 (4), 516–526.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013b. A particle filter approach to approximate posterior Cramér-Rao lower bound: the case of hidden states. IEEE Trans. Aerosp. Electron. Syst. 49 (4), 2478–2495.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2014. Performance assessment, diagnosis, and optimal selection of non-linear state filters. J. Process Control 24 (2), 460–478.
Tulsyan, A., Tsai, Y., Gopaluni, R.B., Braatz, R.D., 2016. State-of-charge estimation in lithium-ion batteries: a particle filter approach. J. Power Sources 331, 208–223.
Whitley, D., 1994. A genetic algorithm tutorial. Stat. Comput. 4 (2), 65–85.