Computers and Chemical Engineering 95 (2016) 130–145


Review

Particle filtering without tears: A primer for beginners


Aditya Tulsyan a,∗, R. Bhushan Gopaluni b, Swanand R. Khare c

a Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
b Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
c Department of Mathematics, Indian Institute of Technology Kharagpur, WB 721302, India

Article info

Article history:
Received 12 February 2016
Received in revised form 14 July 2016
Accepted 31 August 2016
Available online 19 September 2016

Keywords:
Monte Carlo method
Particle filter
State estimation
Bayesian inference

Abstract

The main purpose of this primer is to systematically introduce the theory of particle filters to readers with limited or no prior understanding of the subject. The primer is written for beginners and practitioners interested in learning about the theory and implementation of particle filtering methods. Throughout this primer we highlight the common mistakes that beginners and first-time researchers make in understanding and implementing the theory of particle filtering. We also discuss and demonstrate the use of particle filtering in nonlinear state estimation applications. We conclude the primer by providing an implementable version of MATLAB code for particle filters. The code not only aids in improving the understanding of particle filters, it also serves as a template for building and implementing advanced nonlinear state estimation routines.
© 2016 Elsevier Ltd. All rights reserved.

Contents

1. Introduction 131
2. Perfect sampling 131
2.1. Empirical distribution function under perfect sampling 132
2.2. Understanding Dirac Delta 133
2.3. Integral property of Dirac Delta 133
2.4. Reconstructing CDF under perfect sampling 134
2.4.1. Comparing MC probability approximation and histogram 134
2.5. The resolution problem 135
2.5.1. The inter-sample resolution problem 135
2.5.2. The sampling resolution problem 136
3. Importance sampling (IS) 136
3.1. Empirical distribution function under IS 136
3.2. Importance weights 137
3.3. Reconstructing CDF under importance sampling 137
4. Sampling importance resampling (SIR) 138
4.1. Resampling step 138
4.2. Resampling strategy 138
5. State estimation 139
5.1. State space models 139
5.2. Bayesian state estimation 140
5.3. Filtering methods 140
5.4. Particle filtering 140
6. Implementation 141
7. Conclusions 144
References 144

∗ Corresponding author.
E-mail addresses: tulsyan@mit.edu (A. Tulsyan), bhushan.gopaluni@ubc.ca (R. Bhushan Gopaluni), srkhare@maths.iitkgp.ernet.in (S.R. Khare).

http://dx.doi.org/10.1016/j.compchemeng.2016.08.015
0098-1354/© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

The early idea of Monte Carlo methods can be traced back to 1777 and Buffon's needle experiment, where random experiments were used by Buffon to estimate π (Solomon, 1978), and later to the 1930s, when Fermi used them to study a neutron diffusion problem. The modern version of Monte Carlo was invented in the late 1940s at the Los Alamos Scientific Laboratory by John von Neumann, Stanislaw Ulam, Nick Metropolis, Edward Teller and others. In fact, Monte Carlo methods were central to the simulations needed for the Manhattan Project during World War II. Intuitively, the Monte Carlo method is a broad class of computational algorithms that rely on random sampling to learn complex systems or mathematical objects which are otherwise analytically intractable. For example, Monte Carlo methods are efficient in solving complex integration, non-convex optimization, and inverse problems (Geweke, 1989; Rubinstein and Kroese, 2011). Since the 1940s, Monte Carlo methods have been rediscovered independently in many disciplines spanning the sciences, engineering and finance. Several new Monte Carlo techniques, such as the Bayesian bootstrap, sequential Monte Carlo, hybrid Monte Carlo, quasi Monte Carlo and quantum Monte Carlo, have been developed and pioneered. The details of the historical development of Monte Carlo methods are skipped here but can be found in Chen (2003), Metropolis (1987) and the references cited therein.

The implementation of advanced control and monitoring strategies on complex process and manufacturing systems requires measurement of the key process state variables, which are often hidden or unmeasured. The sequential Monte Carlo method, or particle filter, is a popular approach that allows real-time estimation of hidden process states by combining the power of Monte Carlo methods with Bayesian inference. Compared to other Monte Carlo methods, the idea of the particle filter is relatively new, and was formally established by Gordon et al. (1993). Almost immediately, a number of statisticians also independently developed other versions of particle filtering methods, such as the sampling importance resampling (SIR) filter, the Rao-Blackwellized particle filter, the auxiliary particle filter and others. With the advent of high-speed computing technology, the rediscovery and renaissance of particle filters in the mid-1990s led to an explosion in both the development and the use of particle filtering methods for state estimation. In recent years, particle filtering has attracted considerable attention from researchers across multiple disciplines, with many successful applications in applied statistics, machine learning, signal processing, econometrics, computer graphics, automatic control, tracking, computer vision, communications, computational biology, and others (Chen, 2003) (for example, see the March 2001 special issue of the Annals of the Institute of Statistical Mathematics and the February 2002 special issue of the IEEE Transactions on Signal Processing).

The huge popularity of particle filtering methods among researchers across multiple disciplines has led to a propagation of the subject knowledge that is varied both in style and understanding, and in some cases even contradictory. This often leaves practitioners and first-time researchers in a lurch, as they have to grapple with innumerable published articles and book chapters to further their understanding of particle filters as applicable to their discipline of practice or study. The main aim and contribution of this primer is to provide a gentle introduction for practitioners and beginners with limited understanding of the theory or implementation of particle filters. This exposition is presented in a series of systematic steps. First we discuss the perfect sampling strategy and develop the related idea of cumulative distribution function (CDF) approximation under perfect sampling conditions. We then extend the discussion to CDF approximation under importance sampling conditions, followed by the idea of random sampling using the sampling importance resampling (SIR) method. Finally, we discuss the application of the SIR filter in state estimation. We also develop and present an implementable MATLAB code for state estimation using the SIR filter. Throughout this paper, we highlight the common pitfalls and mistakes beginners make while reading about particle filters. We assume that the reader is familiar with the basics of random variables, probability and density functions at the level of Montgomery and Runger (2010). We also assume readers' familiarity with the basics of the state estimation problem.

2. Perfect sampling

Perfect samplers or perfect sampling methods are algorithms that provide samples guaranteed to be distributed according to a density function of interest. The resulting samples from a perfect sampler are called perfect samples. Let us assume that we have an algorithm to generate a pool of N ∈ ℕ independent and identically distributed (i.i.d.) perfect random samples, denoted by {X^(i)}_{i=1}^N ≡ {X^(1), X^(2), ..., X^(N)}, from some probability density function p(·); then the notation

(X^{(i)} = x^{(i)}) \sim p(x^{(i)}), \quad \text{for } i = 1, \ldots, N

implies that each random sample X^(i) is distributed according to p(·). Here {x^(i)}_{i=1}^N ≡ {x^(1), x^(2), ..., x^(N)} denotes a pool of generated samples or "particles". Algorithms to generate perfect random samples from "simple" density functions (e.g., Gaussian, uniform, chi-squared) are well established in the literature, and can be automated on a computer to generate millions of samples in a finite time. There are two main methods to generate perfect samples: (i) inverse transformation and (ii) general transformation (Robert and Casella, 2013). For both of these methods, uniform random variables play a key role in the generation of perfect samples distributed according to other "simple" density functions of interest. This is because many of these "simple" density functions can be represented as a deterministic transformation of a uniform random variable. For example, assuming we have access to perfect samples from the uniform density function, perfect samples from other density functions can be generated using general transformation methods, as shown in Table 1. Similarly, samples from a uniform density can also generate perfect samples from a Gaussian density function using the Box–Muller general transformation algorithm (see Algorithm 1 for an illustration).

Table 1
Perfect samples from various distributions obtained using the general transformation method. Here U ∼ U_{[0,1]} represents a sample distributed according to the uniform density function U_{[0,1]}, defined over the interval [0, 1], and X^(j) denotes an Exp(1) sample obtained from a uniform draw as in the first row (with λ = 1).

Distribution | Transformation
Exponential  | X = -\log(U)/\lambda \sim \text{Exp}(\lambda)
Chi-squared  | Y = 2\sum_{j=1}^{\nu} X^{(j)} \sim \chi^2_{2\nu}
Gamma        | Y = \beta \sum_{j=1}^{a} X^{(j)} \sim \text{Ga}(a, \beta)
Beta         | Y = \sum_{j=1}^{a} X^{(j)} \Big/ \sum_{j=1}^{a+b} X^{(j)} \sim \text{Be}(a, b)
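To make the general transformation method of Table 1 concrete, the following is a minimal MATLAB sketch; the numerical choices (the rate λ = 2, shape a = 3 and scale β = 0.5) are arbitrary illustrative values, and only base MATLAB commands are assumed.

    % Sketch: perfect samples via the general transformation method (Table 1).
    N      = 1e5;                      % number of samples
    lambda = 2;                        % exponential rate (arbitrary choice)
    U      = rand(N,1);                % perfect uniform samples on (0,1)
    xExp   = -log(U)/lambda;           % Exp(lambda) samples; sample mean ~ 1/lambda
    a      = 3; beta0 = 0.5;           % gamma shape (integer) and scale (arbitrary)
    E      = -log(rand(N,a));          % N-by-a matrix of Exp(1) samples
    xGam   = beta0*sum(E,2);           % Ga(a,beta0) samples; sample mean ~ a*beta0
    fprintf('Exp mean: %.3f (theory %.3f)\n',   mean(xExp), 1/lambda);
    fprintf('Gamma mean: %.3f (theory %.3f)\n', mean(xGam), a*beta0);

Comparing the printed sample means against their theoretical values gives a quick check that the transformations indeed deliver perfect samples from the intended densities.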

In MATLAB, i.i.d. random samples from a multi-variate Gaussian random variable can be generated using the command mvnrnd. Fig. 1 shows particles drawn from a bi-variate Gaussian density function using mvnrnd. A list of other density functions permitting perfect sampling is discussed in Robert and Casella (2013), and their corresponding MATLAB commands in MATLAB (2010a). Generating perfect samples from an arbitrary density function is nontrivial in general; however, for the material presented in this section, we assume that perfect samples from the density function of interest are available.

Fig. 1. A computer-generated graphical illustration showing: (a) a bi-variate Gaussian density function; (b) random samples or particles distributed according to the bi-variate Gaussian density function. The figure is created in MATLAB using the command mvnrnd (MATLAB, 2010a).

Algorithm 1. Box–Muller for Gaussian sample generation
1: Generate U^(1) = u^(1), U^(2) = u^(2), i.i.d. random samples from a uniform distribution U_{[0,1]} over [0, 1].
2: Define:
x^{(1)} = \sqrt{-2\log(u^{(1)})}\,\cos(2\pi u^{(2)})
x^{(2)} = \sqrt{-2\log(u^{(1)})}\,\sin(2\pi u^{(2)})
3: Take x^(1) and x^(2) as two independent draws from N(0, 1).

Now given a set of perfectly sampled particles {x^(i)}_{i=1}^N generated from some unknown density function p(·), we are interested in learning (or reconstructing) the underlying distribution (see Fig. 2(a)). In statistics, this problem is referred to as the probability density function (PDF) or cumulative distribution function (CDF) estimation problem, depending on the function being estimated. In practice, CDFs form the basis of applications such as statistical hypothesis testing, numerical integration, and random number generation. CDFs also play a critical role in the Kolmogorov–Smirnov test for checking whether two empirical distributions are different, or whether an empirical distribution is different from an ideal distribution. Recall that a CDF is defined as an integral of the underlying PDF over an interval. The estimation of PDFs from sampled particles is an active area of research, and there are numerous approaches, such as histogram interpolation and kernel estimation methods, for reconstructing the underlying PDF. The standard methods for PDF estimation can be found in Silverman (1986). Unlike the PDF estimation methods, estimating the underlying CDF from particles is relatively well established. For example, a Monte Carlo (MC) method provides an efficient approach to estimate CDFs. Given a reliable CDF estimation method, one can then argue that the same method be used to estimate the PDF by differentiating the approximate CDF. Despite the intuitive appeal, a CDF estimate does not translate well into a PDF estimate due to the resolution problem (see Section 2.5). Having established the primary distinction between the PDF and CDF estimation methods, the rest of the paper is focused only on the latter. Although we only discuss MC methods for CDF approximations, there are other methods that can also be used to approximate CDFs.

Fig. 2. An illustration showing: (a) perfect i.i.d. random sampling from a standard normal distribution, with random particles denoted by red dots along the axis and the underlying density function represented by the solid black curve; (b) the probability of X assuming values in an infinitesimal interval set dx. The probability of X ∈ dx is the blue highlighted area and is approximately equal to p(x)dx. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

In this paper, while we discuss MC methods as a means to approximate CDFs of random variables, these methods are also widely introduced in textbooks and articles to numerically evaluate complex integrals. Since the ability to solve integration problems is a direct result of being able to reconstruct CDFs (this is discussed in Section 2.3), in this expository paper we proceed by introducing MC methods as a means to approximate CDFs first.

Recall that the CDF of a continuous random variable X ∼ p(·) is a continuous function, denoted by P_X(·), and defined as

P_X(a) \equiv \Pr(X \in (-\infty, a]) = \int_{-\infty}^{a} p(x)\,dx, \qquad (1)

where Pr(X ∈ I) describes the probability of X assuming values in an interval I ⊂ ℝ. Thus observe that the construction of P_X(·) solely depends on the ability to approximate Pr(X ∈ (−∞, a]) for all a ∈ ℝ. The MC method to reconstruct CDFs using particles is based on this observation, and is discussed in the following two sections.

2.1. Empirical distribution function under perfect sampling

Intuitively, if Pr(X ∈ dx) ≡ p(x)dx defines the probability of X in an infinitesimal interval of length dx (see Fig. 2(b) for an illustration), then the basic MC method describes its "empirical" approximation as

\Pr_N(X \in dx) = p(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} \delta_{x^{(i)}}(dx), \qquad (2)

where p(x)dx is an N-particle approximation, referred to as the "empirical distribution function" of X over dx, and δ_{x^(i)}(·) is the Dirac delta with support mass at location x^(i).
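Before examining (2) further, the Box–Muller transform of Algorithm 1 is easy to try out. The following is a minimal MATLAB sketch; the sample size is arbitrary, and the printed mean and variance can be compared against the values 0 and 1 expected of a standard normal (or against draws from the built-in randn).

    % Sketch: Box-Muller transform (Algorithm 1) for N(0,1) samples.
    N  = 1e5;                                % number of uniform pairs
    u1 = rand(N,1); u2 = rand(N,1);          % i.i.d. uniform samples on (0,1)
    r  = sqrt(-2*log(u1));                   % radius term
    x1 = r.*cos(2*pi*u2);                    % first N(0,1) draw of each pair
    x2 = r.*sin(2*pi*u2);                    % second, independent N(0,1) draw
    z  = [x1; x2];                           % 2N perfect Gaussian samples
    fprintf('mean %.3f, var %.3f\n', mean(z), var(z));   % close to 0 and 1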

Numerous papers and books on MC methods introduce (2) as the "abstract" definition of an "empirical distribution function" under perfect sampling conditions. Despite the compact representation in (2), an empirical distribution is often the source of constant confusion and ambiguity among practitioners. The reason for this is two-fold: (i) lack of formal training among practitioners in mathematical subjects such as measure and probability theory, and (ii) severe abuse of notation by authors for mathematical convenience and to make MC methods accessible to a wider audience. For example, a list of common representations in the literature for an empirical distribution function is given in Table 2. Note that without any rigorous introduction to the mathematical ideas and framework, all the representations in Table 2, including (2), can be described as "simplistic" at best, with (2) being the most "accurate". This is corroborated in recent publications by the statistics community, wherein there seems to be some consensus on the consistent use of the representation in (2) as the "accurate" abstract definition of an empirical distribution function; see Andrieu et al. (2010) and Kantas et al. (2015), for example. In the literature, the indiscriminate use of different representations for an empirical distribution over the last 15 years has immensely contributed to the confusion among practitioners with limited understanding of the rigorous theory underpinning its use. Even in the statistics community, the representation for an empirical distribution function has evolved since its inception. For example, the representations used by the authors in Doucet and Johansen (2009) and Andrieu et al. (2010) are different. In our opinion, the varied representations used in different disciplines have left practitioners in a lurch when it comes to fully understanding an empirical distribution function. In this paper, we will assume (2) as the definition of an empirical distribution function; however, readers are cautioned that without invoking the rigorous ideas of probability and measures, (2) also constitutes an abuse of notation. It is not our aim to revisit the historical development of an empirical distribution, or to establish the rigorous theory behind it. The objective here is to provide insights, and to highlight common misconceptions behind the use of (2) as prevalent among practitioners. We begin by looking at the Dirac delta used in (2).

Table 2
Notation widely used in books and articles on MC methods to represent an empirical distribution function of a random variable.

p(x)dx = N^{-1}\sum_{i=1}^{N}\delta_{x^{(i)}}(dx)
p(x)dx = N^{-1}\sum_{i=1}^{N}\delta(x - x^{(i)})
p(dx) = N^{-1}\sum_{i=1}^{N}\delta_{x^{(i)}}(dx)
p(x) = N^{-1}\sum_{i=1}^{N}\delta_{x^{(i)}}(dx)
p(x) = N^{-1}\sum_{i=1}^{N}\delta_{x^{(i)}}(x)
p(x) = N^{-1}\sum_{i=1}^{N}\delta(x - x^{(i)})
\Pr(x) = N^{-1}\sum_{i=1}^{N}\delta(x - x^{(i)})

2.2. Understanding Dirac Delta

In the context of MC methods, a rigorous way to define the delta in (2) is to think of it as a "measure", or Dirac delta, for a given point x on the real line ℝ. A Dirac delta accepts as an argument some set A and returns a value as defined by

\delta_x(A) = 1_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}, \qquad (3)

where 1_A(·) is an indicator function of A. Thus, if the Dirac delta in (2) is conceptualized as modeling an idealized point-mass at x, then δ_x(dx) represents the "mass" contained in an infinitesimal set dx. Note that (3) is a simplified representation of a "measure", and the alterations required to obtain a rigorous formulation are obvious.

A major source of ambiguity in (2) is often the use of the delta. In general, the definition of the delta, and its perception as a "measure", "function" or "distribution", varies across scientific and engineering disciplines. The ambiguity surrounding the use of the delta arises out of our interest in seeking a simpler representation of an otherwise complex mathematical construct. For example, in engineering, a delta is often viewed as a "function", characterized "heuristically" on the real line as zero everywhere except at a point, where it is infinite. Mathematically, it is

\delta(x - a) = \begin{cases} +\infty & \text{if } x = a \\ 0 & \text{otherwise} \end{cases}. \qquad (4)

Although perceiving the delta as a "function" as in (4) does not aid much in the understanding of the empirical distribution in (2), its use is often implied in representations such as (see Table 2)

p(x)dx = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x^{(i)}). \qquad (5)

It is easy to see that the representation (5) incorrectly suggests that the probability at the particle location x^(i) is infinity! Another widely accepted notion of the delta in (5) is that of a Kronecker delta function, which is defined as

\delta(x - a) = \begin{cases} 1 & \text{if } x = a \\ 0 & \text{otherwise} \end{cases}; \qquad (6)

while the Kronecker delta alleviates the "infinity" problem in (5), it makes the abstract definition of an empirical distribution open to further misinterpretation. For example, another widely used notation for an empirical distribution is (see Table 2)

p(x) = \frac{1}{N}\sum_{i=1}^{N} \delta(x - x^{(i)}). \qquad (7)

A serious flaw with (7) is that it erroneously suggests that MC methods approximate PDFs. In fact, in the literature it is often reported that MC methods approximate PDFs by counting the number of particles at each location. It is easy to check that this claim is incorrect, since the maximum density value assigned at a particle location in (7) with the Kronecker delta function in (6) is 1, while the true density can be greater than 1.

Thus, while perceiving the delta in (2) as a measure is rigorous and consistent with the theory of MC methods, it is standard practice to refer to it as a function. In our view, this abuse of semantics is serious, for it incites practitioners into accepting the heuristic or incorrect definitions of the delta suggested by (4) and (6). In the rest of the paper, any reference to a delta is to be construed as a Dirac measure, as defined by (3).

Finally, Fig. 3(a) gives a pedagogical sketch of the representation in (2). As evident from Fig. 3(a), it is intuitive to think of an empirical distribution function as a collection of mass contained in N infinitesimal bins, each of length dx and centered around the particle set {x^(i)}_{i=1}^N.
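The "measure" view of the delta in (3) is easy to mimic in code: the mass that the empirical representation (2) assigns to a set A is simply the average of the indicator 1_A over the particles. The MATLAB sketch below is our own small illustration of this point for the arbitrary set A = [−1, 1] and standard normal particles; it anticipates the interval probability approximation introduced later in (12b).

    % Sketch: the Dirac measure delta_x(A) = 1_A(x) and the empirical mass of a set A.
    N   = 1e5;
    x   = randn(N,1);                    % perfect N(0,1) particles
    A   = [-1, 1];                       % an arbitrary set (interval) A
    ind = (x >= A(1)) & (x <= A(2));     % 1_A(x^(i)) for every particle
    massA = mean(ind);                   % N^-1 * sum_i delta_{x^(i)}(A)
    fprintf('empirical mass of A: %.4f\n', massA);   % ~0.683 for a standard normal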


Fig. 3. (a) Empirical distribution function of a random variable constructed using N i.i.d. particles distributed according to p(·) (see (2)). For N = 17, the locations of {x^(i)}_{i=1}^N are denoted by solid red balls along the axis. The height of the stem at each particle location is 1/17. (b) Arbitrary interval sets along the real line. (c) MC approximation of the probability of X ∼ p(·) over different intervals, as given by (12b). (d) The MC approximation of the CDF (denoted by the "step-like" function, with step height 1/17 at each particle location and step width equal to the distance between two consecutive particles) constructed using the particle set {x^(i)}_{i=1}^N. The true underlying CDF is denoted by the solid black curve. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

2.3. Integral property of Dirac Delta

Defining a Dirac delta as in (3) automatically allows us to define integration of complex functions. For example, the integral of f(x) with respect to the measure δ_{x^(i)}(dx) is defined as

\int_{\mathbb{R}} f(x)\,\delta_{x^{(i)}}(dx) = f(x^{(i)}). \qquad (8)

The ability to define integration with respect to a Dirac measure (see (8)) allows us to obtain an MC approximation of integrals. For example, consider the following integration problem

H(f(x)) = \int_{\mathbb{R}} f(x)\,p(x)\,dx. \qquad (9)

Notice that substituting (2) into (9) and using the integral property of the Dirac measure in (8) yields

\tilde{H}(f(x)) = \int_{\mathbb{R}} f(x)\,\frac{1}{N}\sum_{i=1}^{N}\delta_{x^{(i)}}(dx), \qquad (10a)

= \frac{1}{N}\sum_{i=1}^{N}\int_{\mathbb{R}} f(x)\,\delta_{x^{(i)}}(dx) = \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)}), \qquad (10b)

where H̃(f(x)) is an MC approximation of H(f(x)) in (9). Intuitively, MC methods evaluate complex integrals by simply summing the integrand values evaluated at each particle location. The ability of MC methods to compute integrals under density functions is crucial not only for approximating the CDF in (1), but it also plays a critical role in the state estimation applications discussed in Section 5.

2.4. Reconstructing CDF under perfect sampling

Having abstractly constructed an approximate probability of X in an infinitesimal interval dx, (2) can now be readily used to compute the probability in any given interval. The extension from infinitesimal to general intervals is necessary to compute the CDF, which is defined in (1) over intervals. Now if Pr(X ∈ L) denotes the probability of X in an interval L, then we can write

\Pr(X \in L) \equiv \int_{L} p(x)\,dx. \qquad (11)

Now since p(·) is unknown (recall that only random samples {x^(i)}_{i=1}^N distributed according to p(·) are available at our disposal), (11) cannot be evaluated analytically. Thus substituting (2) into (11), and using the Dirac delta integral property in (8), yields an MC approximation of (11) given by

\tilde{\Pr}(X \in L) = \int_{L} \frac{1}{N}\sum_{i=1}^{N}\delta_{x^{(i)}}(dx), \qquad (12a)

= \frac{1}{N}\sum_{i=1}^{N}\int_{L} \delta_{x^{(i)}}(dx) = \frac{1}{N}\sum_{i=1}^{N} 1_{L}(x^{(i)}), \qquad (12b)

where Pr̃(·) is an MC approximation of Pr(·), and 1_L(·) is an indicator function of L. Intuitively, (12b) approximates the probability of the random variable X in any interval L ⊂ ℝ as the fraction of particles contained in L. An illustration of the MC approximation of the probability of X over intervals, denoted generically as I ⊂ ℝ in (12b), is shown in Fig. 3(b) and (c).

2.4.1. Comparing MC probability approximation and histogram

It is instructive to highlight that the MC approximation of probability over intervals in (12b) is often misconstrued as a histogram (see Fig. 3(c)). Recall that while MC methods approximate probabilities, the histogram method approximates a PDF (Castro, 2015). The confusion between the MC and histogram approximations partly arises because both methods only require a set of random particles to construct the underlying distribution, and both "look similar". Mathematically, given {x^(i)}_{i=1}^N, an M-interval, N-particle histogram approximation of the PDF p(·) of X in the j-th interval L_j of width K(L_j) is given by

\tilde{p}(x) = \frac{1}{c}\sum_{i=1}^{N} 1_{L_j}(x^{(i)}), \quad \text{for all } x \in L_j, \qquad (13)

where p̃(x) is the approximate density over the interval L_j and

c = \sum_{j=1}^{M} K(L_j)\sum_{i=1}^{N} 1_{L_j}(x^{(i)})

is the total area under the M intervals. Thus, comparing (12b) and (13), it is clear that while the MC method in (12b) approximates the probability of a random variable in an interval L by the fraction of the total number of particles contained in L, the histogram method approximates the density in L by counting the number of particles in L, normalized by the total area under the histogram.
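As a quick numerical check of the integral approximation in (10b), the following MATLAB sketch estimates H(f(x)) = ∫ f(x)p(x)dx for the arbitrary choice f(x) = x², with p(·) the standard normal density, whose exact value is 1.

    % Sketch: MC approximation of an integral, eq. (10b).
    N  = 1e5;
    x  = randn(N,1);               % perfect particles from p(.) = N(0,1)
    f  = @(x) x.^2;                % integrand f(x)
    Ht = mean(f(x));               % H~(f(x)) = N^-1 * sum_i f(x^(i))
    fprintf('MC estimate %.4f (exact 1)\n', Ht);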

Thus it is easy to check that the sum of the MC probabilities over all given intervals always adds up to 1 (see Fig. 3(c)), whereas the sum of the histogram values can be greater than 1. Conversely, it is also true that the area under a histogram is always 1, whereas the area under an MC distribution function need not necessarily be 1. The next example gives a MATLAB code to generate MC and histogram approximations of the distribution function and PDF, respectively, of a Gaussian random variable.

Example 2.1. Let X ∼ N(·|0, 0.01²) be a Gaussian random variable with mean 0 and standard deviation 0.01. Assuming we sample N = 10,000 i.i.d. particles distributed according to N(·|0, 0.01²), then an MC approximation of the underlying distribution function and a histogram approximation of the PDF can be computed in MATLAB with a few lines of code; a sketch of such a script is given below.

Fig. 4 shows the MC and histogram approximations constructed in MATLAB using the code described in Example 2.1.

Fig. 4. An illustration showing: (a) MC approximation of the probability of X over intervals; (b) histogram approximation of the probability density function. The result shown here is for a Gaussian random variable with mean 0 and standard deviation 0.01. The red curve in (b) describes the true PDF. Notice that while the graphs for the MC and histogram approximations appear "similar", they are approximating two different aspects of a continuous random variable. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Finally, having approximated the distribution function of a random variable using MC methods in (12b), the CDF in (1) can now be approximated by setting L = (−∞, a] in (12b), such that

\tilde{P}_X(a) = \tilde{\Pr}(X \in (-\infty, a]) = \frac{1}{N}\sum_{i=1}^{N} 1_{(-\infty,a]}(x^{(i)}), \qquad (14)

where P̃_X(·) is an MC approximation of the CDF. It is easy to check that (14) satisfies the fundamental property of a CDF, i.e., P̃_X(+∞) = 1, by setting a = +∞ in (14), such that

\tilde{P}_X(+\infty) = \tilde{\Pr}(X \in (-\infty, +\infty)) = \frac{1}{N}\sum_{i=1}^{N} 1_{(-\infty,+\infty)}(x^{(i)}) = 1. \qquad (15a)

It can also be shown that as N → ∞, (14) converges to the true underlying CDF in some probabilistic sense (Doucet and de Freitas, 2001). Note that the convergence result for (14) is established in a probabilistic sense because (14) is a function of a set of random samples. This unwieldy statement has a simple interpretation: for any two given sets of random realizations of {X^(i)}_{i=1}^N, say {x^(i)}_{i=1}^N and {x̃^(i)}_{i=1}^N, the MC approximations in (14) are different; however, they both converge to the same true CDF. The next example illustrates the quality of the CDF approximation in (14) for a uni-variate Gaussian random variable.

Example 2.2. Let X ∼ N(·|0, 1) be a Gaussian random variable with zero mean and unit variance. Assuming N = 10,000, let {X^(i)}_{i=1}^N define a set of i.i.d. random samples generated in MATLAB using the command randn. Then the CDF of X, denoted by P_X(a), at a = 0 can be empirically computed as

\tilde{P}_X(0) = \frac{1}{10{,}000}\sum_{i=1}^{10{,}000} 1_{(-\infty,0]}(x^{(i)}) = 0.4986. \qquad (16a)

The numerical value reported in this example is based on a code written in MATLAB. Note that the calculated CDF value is "close" to the theoretical value of 0.5.

Finally, Fig. 3(d) gives a schematic of the CDF approximation in (14). The "step-like" approximation of the CDF in Fig. 3(d) comes from the indicator function in (14), which adds a step height of N⁻¹ for every particle encountered while moving from left to right along the real line as the CDF is reconstructed. In MATLAB, the MC approximation of a CDF given in (14) can be automatically computed using the command cdfplot.

2.5. The resolution problem

In MC methods, the resolution or quality of the CDF approximation is proportional to the number of particles used. The resolution of P̃_X in (14) can thus be arbitrarily improved by simply increasing the number of particles. Since computers only have finite computing capabilities, the practical resolution of (14) is also finite. The finite resolution of (14) leads to two serious problems: (a) finite inter-sample resolution, and (b) finite sampling resolution. These two issues are discussed next.

2.5.1. The inter-sample resolution problem

An immediate consequence of the finite resolution of P̃_X is the poor CDF approximation in the inter-sample intervals – a problem popularly referred to as the "inter-sample" resolution problem. A schematic of the inter-sample resolution problem is illustrated in Fig. 5(a). As shown in Fig. 5(a), let {x^(i), x^(j)} denote a pair of adjacent particles with x^(i) < x^(j); then the probability Pr̃(X ∈ [a, b]) = 0 for all [a, b] ⊂ [x^(i), x^(j)). This holds because for any [a, b] ⊂ [x^(i), x^(j)) we can write

\tilde{\Pr}(X \in [a, b]) = \tilde{P}_X(b) - \tilde{P}_X(a) = \tilde{P}_X(x^{(i)}) - \tilde{P}_X(x^{(i)}) = 0, \qquad (17)

since P̃_X is flat over the inter-sample interval. From Fig. 5(a) it is clear that while the true probability Pr(X ∈ [a, b]) is non-zero, Pr̃(X ∈ [a, b]) = 0. Since (17) holds for any subset contained in the inter-sample interval [x^(i), x^(j)), the CDF approximations at inter-sample locations are generally poor. In other words, the accuracy of the CDF approximation in (14) is limited to the particle set {x^(i)}_{i=1}^N.
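The following MATLAB sketch is one way to carry out the computations described in Examples 2.1 and 2.2; the interval edges are arbitrary illustrative choices, and the script assumes a MATLAB release that provides histcounts.

    % Sketch for Example 2.1: MC interval probabilities (12b) vs. histogram PDF (13).
    N     = 1e4;
    x     = 0 + 0.01*randn(N,1);          % particles from N(0, 0.01^2)
    edges = -0.04:0.005:0.04;             % arbitrary intervals L_1,...,L_M
    width = diff(edges);                  % K(L_j)
    cnt   = histcounts(x, edges);         % number of particles in each L_j
    probMC  = cnt/N;                      % (12b): fraction of particles in L_j
    pdfHist = cnt/sum(cnt.*width);        % (13): counts normalized so total area is 1

    % Sketch for Example 2.2: empirical CDF (14) of N(0,1) evaluated at a = 0.
    z    = randn(N,1);
    Ptil = mean(z <= 0);                  % ~0.5, cf. the value 0.4986 reported in (16a)
    fprintf('P~_X(0) = %.4f\n', Ptil);

Note that probMC sums (approximately) to 1 while pdfHist integrates to 1 over the chosen intervals, which is exactly the distinction drawn between (12b) and (13) above.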

Fig. 5. A schematic highlighting resolution problems with the MC approximation of the CDF, P̃_X, in (14). P̃_X is denoted by the "step-like" function in red and the true CDF P_X is represented by the solid black curve. Sampled particles are denoted by solid red balls along the X-axis. (a) We show that for any interval [a, b] ⊂ [x^(i), x^(j)), where {x^(i), x^(j)} is a pair of adjacent random particles, the probability Pr̃(X ∈ [a, b]) = 0, while Pr(X ∈ [a, b]) > 0. (b) We sketch the principle of the inverse transform sampling method. Let Y ∼ U(·|0, 1) be a uniformly distributed random variable defined on the interval [0, 1]. Now according to the inverse transform sampling method, for every random number Y = y generated in the interval [a, b], projecting it onto P̃_X selects x^(j) as the random sample, whereas projecting onto P_X selects different random samples. In the figure, the projection of the numbers a and b onto P_X is illustrated by the yellow arrow and the projection onto P̃_X is denoted by the black arrow. Now since the probability Pr(Y ∈ [a, b]) = (b − a) = N⁻¹ (recall that the step size at each particle location in P̃_X is N⁻¹), we have Pr̃(X = x^(j)) = N⁻¹. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

2.5.2. The sampling resolution problem

The finite resolution of the CDF approximation in (14) leads to another limitation, referred to as the "sampling" resolution problem. An immediate consequence of finite sampling resolution is that sampling algorithms working on P̃_X can generate samples only from the set {x^(i)}_{i=1}^N. This is because, given P̃_X, the probability of generating a sample with any sampling algorithm (e.g., the inverse transform sampling method (Robert and Casella, 2013)) is

\tilde{\Pr}(X = x) = \begin{cases} N^{-1} & \text{for } x \in \{x^{(i)}\}_{i=1}^{N}, \\ 0 & \text{elsewhere}. \end{cases} \qquad (18)

In (18), the probability of generating a sample from {x^(i)}_{i=1}^N is N⁻¹, since we can write (see Fig. 5(b) for an illustration)

\tilde{\Pr}(X = x^{(j)}) = \Pr(Y \in [a, b]) = \tilde{P}_X(b) - \tilde{P}_X(a) = N^{-1}, \qquad (19)

where Y ∼ U(·|0, 1) is a uniform random variable on the interval [0, 1]. From (19), it is clear that N⁻¹ in (18) corresponds to the step size of P̃_X at location x. The inability to generate random samples from (14) outside the set {x^(i)}_{i=1}^N is referred to as the sampling resolution problem. Typically, the sampling resolution of (14) can be improved by choosing a large N; however, caution should be exercised, since N has a direct bearing on the computational cost of MC methods.

Finally, given a set of perfectly sampled i.i.d. random particles, an MC method provides an efficient approach to: (1) reconstruct the underlying CDF of the random variable, and (2) evaluate complex integrals under density functions. The assumption underlying the above discussion is that it is possible to generate perfect i.i.d. random samples from the density function of interest, which in practice is often not possible. In order to use the ideas described in this section, we need to be able to generate random particles from complicated distributions that do not lend themselves to perfect sampling. This leads to the idea of importance sampling (IS), which is discussed next.

3. Importance sampling (IS)

In this section, we discuss the key idea behind IS for generating samples from a "target" density of interest, generically denoted here by p(x). Since generating random samples from an arbitrary target density is nontrivial, the idea is to employ an alternative "sampling" density (or importance function, as it is referred to in some of the literature), say s(x), that is simple to draw samples from. The idea of IS is that every time a particle is generated from s(x), we can find how likely it is for the particle to be classified as a sample from p(x). An illustration of the IS method is shown in Fig. 6(a).

Fig. 6. (a) The target density function of a random variable X ∼ p(·) and the sampling density function of a random variable X̃ ∼ s(·) are represented by the solid yellow and black curves, respectively. In the figure, the particles generated from the sampling density are denoted by solid red balls. It is assumed that the support of s(·) includes the support of p(·). (b) The importance weight associated with each sampled particle is represented by the volume of the red ball. In the illustration, bulkier particles represent particles with relatively higher importance weights. The blue highlighted area is the probability that X ∈ dx, and is approximated using the empirical distribution function under IS in (20). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

3.1. Empirical distribution function under IS

Barring a few new concepts, such as that of resampling (discussed in Section 4), the ideas of perfect sampling in Section 2 carry over as is to IS. In fact, IS can be considered a generalization of perfect sampling to arbitrary density functions. Mathematically, let X and X̃ denote the target and sampling random variables, distributed according to p(·) and s(·), respectively.

Let {x̃^(i)}_{i=1}^N denote N i.i.d. particles distributed according to s(·); then an empirical distribution function for X under IS is

p(x)\,dx = \sum_{i=1}^{N} w^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx), \qquad (20)

where p(x)dx is an N-particle approximation of the empirical distribution function of X over dx, δ_{x̃^(i)}(·) is the Dirac delta, and w^(i) is the "importance weight" associated with x̃^(i). The choice of the empirical distribution function under IS in (20) can be substantiated by reconsidering the integration in (9). Recall that if p(x) lends itself to perfect sampling, then (10b) yields an MC approximation of (9); otherwise we can rewrite (9) as

H(f(x)) = \int_{\mathbb{R}} f(x)\,\frac{p(x)}{s(x)}\,s(x)\,dx. \qquad (21)

Now since s(x) admits perfect sampling, an empirical distribution function for X̃ can be written as

s(x)\,dx = \frac{1}{N}\sum_{i=1}^{N}\delta_{\tilde{x}^{(i)}}(dx). \qquad (22)

Now substituting (22) into (21) yields

\tilde{H}(f(x)) = \int_{\mathbb{R}} f(x)\,\frac{p(x)}{s(x)}\,s(x)\,dx, \qquad (23a)

= \int_{\mathbb{R}} f(x)\,\frac{p(x)}{s(x)}\,\frac{1}{N}\sum_{i=1}^{N}\delta_{\tilde{x}^{(i)}}(dx), \qquad (23b)

= \frac{1}{N}\sum_{i=1}^{N}\int_{\mathbb{R}} f(x)\,\frac{p(x)}{s(x)}\,\delta_{\tilde{x}^{(i)}}(dx), \qquad (23c)

= \frac{1}{N}\sum_{i=1}^{N} f(\tilde{x}^{(i)})\,\frac{p(\tilde{x}^{(i)})}{s(\tilde{x}^{(i)})} = \sum_{i=1}^{N} f(\tilde{x}^{(i)})\,w^{(i)}, \qquad (23d)

where

w^{(i)} \equiv \frac{1}{N}\,\frac{p(\tilde{x}^{(i)})}{s(\tilde{x}^{(i)})}, \quad \text{for } i = 1, \ldots, N, \qquad (24)

is the "unnormalized" importance weight for x̃^(i). Conversely, starting with (20) and substituting (20) into (9) yields

\hat{H}(f(x)) = \int_{\mathbb{R}} f(x)\sum_{i=1}^{N} w^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx), \qquad (25a)

= \sum_{i=1}^{N} w^{(i)}\int_{\mathbb{R}} f(x)\,\delta_{\tilde{x}^{(i)}}(dx) = \sum_{i=1}^{N} f(\tilde{x}^{(i)})\,w^{(i)}, \qquad (25b)

which is the same as (23d). This motivates the use of (20) as the abstract definition of an empirical distribution function under IS. Observe that the importance weight in (24) is a random variable, with W^(i) = w^(i) denoting its random realization. The ordered set {x̃^(i), w^(i)}_{i=1}^N is referred to as a "particle system". The use of importance weights in (20) is discussed next.

3.2. Importance weights

The importance weight in (20) reflects the correction involved to ensure that the particles from the sampling density are in fact samples from the target density. This correction involves assigning a weight to each of the sampled particles. As seen in (24), the importance weight of a particle is proportional to the ratio of the target density to the sampling density evaluated at the particle location. Note that in many practical applications, including state estimation, the normalizing factor in the target density is unknown. This implies that the importance weight in (24) is only known up to a constant scaling factor, such that

w^{(i)} \propto \frac{p(\tilde{x}^{(i)})}{s(\tilde{x}^{(i)})}, \quad \text{for } i = 1, \ldots, N. \qquad (26)

A straightforward approach to resolve this issue is to normalize the weights in (26) as follows

\tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{i=1}^{N} w^{(i)}}, \quad \text{for } i = 1, \ldots, N, \qquad (27)

where w̃^(i) is the "normalized" weight. With normalized weights, the empirical distribution in (20) can now be written as

p(x)\,dx = \sum_{i=1}^{N} \tilde{w}^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx). \qquad (28)

The normalized weights in (28) satisfy the following properties

\tilde{w}^{(i)} \geq 0, \ \text{for } i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^{N} \tilde{w}^{(i)} = 1. \qquad (29)

Intuitively, the importance weight in (28) contains information about how probable it is for a particle from the sampling density to be classified as a particle generated from the target density. Thus a higher particle weight implies a higher "acceptance" probability for the corresponding particle. Finally, the quality of (28) depends on both N and the choice of the sampling density. Recognizing that the choice of the sampling density in (28) is non-unique has led to the development of application-specific importance sampling methods with different sampling densities. Despite all their differences, these methods work on the same principle of delivering a particle system {x̃^(i), w̃^(i)}_{i=1}^N to construct (28).

3.3. Reconstructing CDF under importance sampling

Intuitively, the empirical distribution function in (28) approximates the probability of X in dx by counting the total weight of the particles contained in dx (contrast this with the perfect sampling case, wherein the probability is calculated as the fraction of the total number of particles in dx). As previously, extending (28) to general intervals by substituting (28) into (11) yields

\tilde{\Pr}(X \in L) = \int_{L}\sum_{i=1}^{N}\tilde{w}^{(i)}\,\delta_{\tilde{x}^{(i)}}(dx), \qquad (30a)

= \sum_{i=1}^{N}\tilde{w}^{(i)}\int_{L}\delta_{\tilde{x}^{(i)}}(dx) = \sum_{i=1}^{N}\tilde{w}^{(i)}\,1_{L}(\tilde{x}^{(i)}), \qquad (30b)

where Pr̃(·) is an MC approximation of the distribution function Pr(·) under IS. Finally, using (30b), the CDF of X can be approximated by setting L = (−∞, a] in (30b)

\tilde{P}_X(a) = \tilde{\Pr}(X \in (-\infty, a]) = \sum_{i=1}^{N}\tilde{w}^{(i)}\,1_{(-\infty,a]}(\tilde{x}^{(i)}), \qquad (31)

where P̃_X(·) is an MC approximation of the CDF. Observe that (29) ensures that P̃_X(+∞) = 1 in (31). Graphically, the CDF approximation in (31) yields a "step-like" function (see Fig. 7 for an illustration), with the step size at each particle location equal to the particle weight (contrast this with the perfect sampling case, where the steps have a uniform height of N⁻¹).
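To illustrate Section 3, the MATLAB sketch below uses an arbitrary pair of densities: the target is a standard normal known only up to its normalizing constant, and the sampling density is a zero-mean Gaussian with standard deviation 2. It forms the normalized weights (27) and evaluates the weighted CDF approximation (31) at a = 0 together with a weighted estimate of E[X²].

    % Sketch: importance sampling with normalized weights, eqs. (26)-(31).
    N       = 1e5;
    sSig    = 2;                                        % sampling density s(.) = N(0, sSig^2)
    xs      = sSig*randn(N,1);                          % particles x~^(i) ~ s(.)
    pUnnorm = exp(-xs.^2/2);                            % target p(.) known up to a constant
    sDens   = exp(-xs.^2/(2*sSig^2))/(sqrt(2*pi)*sSig); % s(x~^(i))
    w       = pUnnorm./sDens;                           % unnormalized weights, eq. (26)
    wt      = w/sum(w);                                 % normalized weights, eq. (27)
    PX0     = sum(wt.*(xs <= 0));                       % weighted CDF (31) at a = 0, ~0.5
    Ef      = sum(wt.*xs.^2);                           % weighted estimate of E[X^2], ~1
    fprintf('P~_X(0) = %.4f, E[X^2] ~ %.4f\n', PX0, Ef);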

Fig. 7. An illustration showing the MC approximation of the CDF of a random variable X ∼ p(·) constructed using the IS method (see (31)). The CDF approximation is denoted by the "step-like" function in red, while the true CDF is represented by the solid yellow curve. The i.i.d. random particles from the sampling density are represented by solid red balls. The importance weight associated with each sampled particle is represented by the volume of the red balls. In the illustration, bulkier particles represent particles with relatively higher importance weights. In the CDF approximation, the step height at each particle location is equal to its particle weight. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

4. Sampling importance resampling (SIR)

The IS method discussed in Section 3.3 approximates the CDF of the target random variable; however, by itself it does not yield samples from the target density. Recall that in (28) the particles are in fact generated from the sampling density as opposed to the target density. In practice, samples from the target density are often sought in recursive applications, such as the state estimation problem discussed in Section 5. There are three popular sampling methods – sampling importance resampling (SIR), acceptance-rejection sampling, and Metropolis–Hastings sampling. In this section, we briefly discuss the SIR method for generating particles from the target density; for the other methods, the readers are referred to Robert and Casella (2013), Gilks (2005), and Tanizaki and Mariano (1998).

SIR is an extension of the idea of IS discussed in Section 3. In SIR, the random samples from the target density are generated in a two-stage procedure, in which the IS step is followed by another sampling step, called "resampling". SIR is implemented by first constructing a particle system {x̃^(i), w̃^(i)}_{i=1}^N using IS, and then using a resampling step to generate {x^(j)}_{j=1}^M – particles approximately distributed according to the target density. The resampling step is discussed next.

4.1. Resampling step

Resampling is a crucial step in an SIR method that generates particles approximately distributed according to the target density. The resampling step is implemented as follows. Assuming {x̃^(i), w̃^(i)}_{i=1}^N is available using the IS method of Section 3.1, we construct P̃_X(·) as in (31). Next we generate M new particles from P̃_X(·) according to the following probability relation

\tilde{\Pr}(X^{(j)} = \tilde{x}^{(i)}) = \tilde{w}^{(i)}, \quad \text{for } j = 1, \ldots, M. \qquad (32)

According to (32), the particles associated with small normalized importance weights are most likely discarded, whereas the best samples are replicated in proportion to their importance weights. The relation in (32) is similar to (18), except that the sampling probability in (32) is w̃^(i), in contrast to N⁻¹ in (18). The relation in (32) can be validated by recalling that the step size of the CDF approximation under IS at location x̃^(i) is w̃^(i) (see Fig. 7). An algorithm to implement (32) is discussed in Section 4.2. Note that the resampling algorithms in SIR are often implemented under several sampling constraints. For example, it is most natural to keep the particle pool size fixed, such that M = N. Another commonly used constraint aims at reducing the variance of the importance weights. This is critical for improving the statistical properties of an SIR method. Mathematically, this second constraint requires

w^{(j)} = \frac{1}{M}, \quad \text{for } j = 1, \ldots, M, \qquad (33)

where {w^(j)}_{j=1}^M are the importance weights of the resampled particles. Eq. (33) ensures that the resampled particles have uniform importance weights. For a list of other constraints used in resampling, the readers are referred to the paper by Douc and Cappé (2005).

Finally, resampling yields a new particle set {X^(j) = x^(j)}_{j=1}^M that is approximately distributed according to the target density. Further, implementing resampling under (33) ensures w^(j) = M⁻¹ for all j = 1, ..., M, such that {x^(j), M⁻¹}_{j=1}^M denotes the new particle system corresponding to the target density. With {x^(j), M⁻¹}_{j=1}^M available, the empirical distribution function of X under SIR can be represented as

p(x)\,dx = \frac{1}{M}\sum_{j=1}^{M}\delta_{x^{(j)}}(dx), \qquad (34)

where the set {x^(j)}_{j=1}^M are resampled particles approximately distributed according to the target density. Comparing (34) with (20), it is clear that (34) does not require particles from the sampling density in its construction. In fact, the representation in (34) is similar to (2), except that the particles in (34) are only approximately distributed according to the target density, as compared to the perfectly sampled particles used in (2). An outline of the SIR method is given in Algorithm 2.

Algorithm 2. Sampling importance resampling (SIR)
1: Generate N i.i.d. random samples {X̃^(i) = x̃^(i)}_{i=1}^N distributed according to the sampling density s(·).
2: Compute the particle importance weights {W^(i) = w^(i)}_{i=1}^N according to
w^{(i)} \propto \frac{p(\tilde{x}^{(i)})}{s(\tilde{x}^{(i)})}, \quad \text{for } i = 1, \ldots, N,
where p(x) is the target probability density function.
3: Compute the normalized importance weights
\tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{i=1}^{N} w^{(i)}}, \quad \text{for } i = 1, \ldots, N.
4: Generate M new random samples {X^(j)}_{j=1}^M by resampling {x̃^(i)}_{i=1}^N according to the probability relation
\tilde{\Pr}(X^{(j)} = \tilde{x}^{(i)}) = \tilde{w}^{(i)}, \quad \text{for } j = 1, \ldots, M.
5: Using the resampled particle set {X^(j) = x^(j)}_{j=1}^M, represent the empirical distribution of X ∼ p(·) under sampling importance resampling as
p(x)\,dx = \frac{1}{M}\sum_{j=1}^{M}\delta_{x^{(j)}}(dx).

4.2. Resampling strategy

There are several implementations of the resampling step. One way of achieving this is called simple random resampling. The idea of simple random resampling is that, given an MC approximation of the CDF of a target random variable (see Fig. 8), we first generate M uniformly distributed random numbers between [0, 1] along the Y-axis, which denotes the cumulative sum of the normalized importance weights. These M uniformly distributed numbers sampled along the Y-axis are then "matched" with the particles on the X-axis. Finally, the particles on the X-axis generated therefrom are taken as the approximate samples from the target density.

Note that the samples generated in the resampling step are limited to the particles originally used to construct the CDF of the target random variable under IS.
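A minimal MATLAB sketch of the simple random resampling strategy just described is given below; it is a self-contained illustration in which the particle locations and weights are generated arbitrarily, and the matching of uniform numbers against the cumulative weights is done with a plain loop.

    % Sketch: simple random resampling (Section 4.2), cf. Fig. 8.
    N   = 12;
    xt  = randn(N,1);                  % particles x~^(i) from some sampling density
    w   = rand(N,1); wt = w/sum(w);    % arbitrary normalized importance weights
    M   = N;                           % keep the particle pool size fixed (M = N)
    cw  = cumsum(wt);                  % cumulative sum of weights (the Y-axis in Fig. 8)
    u   = rand(M,1);                   % M uniform numbers on [0,1]
    idx = zeros(M,1);
    for j = 1:M
        idx(j) = find(cw >= u(j), 1, 'first');   % match u(j) with a particle index
    end
    xr  = xt(idx);                     % resampled particles, each now carrying weight 1/M

Particles with large weights are selected (and duplicated) often, while particles with small weights tend to be discarded, which is exactly the behavior required by (32) and (33).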

This is again due to the finite resolution of P̃_X constructed under IS in (31). In other words, the particle set {x^(j)}_{j=1}^M generated in resampling is completely constructed from {x̃^(i)}_{i=1}^N, with some particles in the set either discarded or duplicated. This introduces dependence in the resampled particles {x^(j)}_{j=1}^M, even if the original set {x̃^(i)}_{i=1}^N are i.i.d. samples from the sampling density. A graphical illustration of the resampling step is given in Fig. 9.

Fig. 8. An illustration showing the simple random resampling strategy on an MC approximation of the CDF, P̃_X, of a target random variable X ∼ p(·) constructed using {x̃^(i), w̃^(i)}_{i=1}^{12}. The particle locations are denoted by the solid red balls placed along the X-axis, with their normalized importance weights proportional to the volume of the red balls. For convenience, the particle index is denoted by the number running along the X-axis. In the figure, 3 uniformly distributed random numbers are generated between [0, 1] along the Y-axis, which denotes the cumulative sum of the normalized importance weights. These three sample draws are shown by the dashed lines in the figure. The uniformly distributed numbers along the Y-axis are then "matched" with the particle locations on the X-axis. In the figure, particle number 7 is duplicated twice and particle number 3 is chosen once. The rest of the particles are discarded, with {x^(j)}_{j=1}^{3} = {x̃^(3), x̃^(7), x̃^(7)} representing samples distributed according to the target density.

Fig. 9. An illustration showing the principle of resampling. Random samples (red balls along the X-axis) are drawn from the sampling density (black solid curve), and the normalized importance weight associated with a particle is proportional to the volume of the red ball. The target density is denoted by the yellow solid curve. After resampling, all particles have the same importance weight, with some of the original particles either discarded or duplicated. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Other efficient algorithms for resampling include residual sampling (Whitley, 1994; Liu and Chen, 1998), stratified sampling (Fearnhead, 1998; Kitagawa, 1996), and systematic sampling (Arulampalam et al., 2002; Kitagawa, 1996). The efficiency of a resampling algorithm is determined by the resampling quality (in terms of MC variation) and the computational complexity. Comparisons of different resampling algorithms are discussed in Douc and Cappé (2005). Owing to the simplicity and efficiency of systematic resampling, it is often the preferred choice in an SIR method. The pseudo-algorithm for systematic resampling is given in Algorithm 3.

Algorithm 3. Systematic resampling
1: Generate a uniformly distributed random sample y_1 in the interval (0, M⁻¹].
2: Generate M ordered samples according to
y_k = \frac{k-1}{M} + y_1, \quad \text{for } k = 1, \ldots, M.
3: The resampled particles are generated by producing m_i copies of particle x̃^(i), where
m_i = \text{number of } y_k \in \left( \sum_{j=1}^{i-1}\tilde{w}^{(j)}, \; \sum_{j=1}^{i}\tilde{w}^{(j)} \right].

5. State estimation

Recent advances in high-speed computing technology have enabled the process and manufacturing industries to use complex and high-fidelity nonlinear dynamical models, such as in fermentation bioreactors (Chitralekhaa et al., 2010), polymerization reactors (Achilias and Kiparissides, 1992), and petroleum reservoirs (Evensen, 2007). The implementation of advanced control and monitoring strategies on such complex systems requires measurement of the key process state variables, which in many processes are often hidden or unmeasured. These unmeasured states can be estimated within the Bayesian framework by solving a filtering problem, wherein the posterior density of the states is recursively computed at each sampling time, conditioned on the available measurement sequence (Doucet and de Freitas, 2001; Tulsyan et al., 2016; Barazandegan et al., 2015). In this section, we demonstrate the use of the SIR method discussed in Section 4 in solving the state estimation problem.

5.1. State space models

Fig. 10 gives a graphical model of a state space model (SSM), with latent states denoted by x_t and measurements y_t. The latent state x_t is assumed to be a first-order Markov process, such that for a given sequence of past state information x_{0:t−1} ≡ {x_0, ..., x_{t−1}}, the current state x_t only depends on the previous state x_{t−1}. Further, the measurements y_t are assumed to be conditionally independent of each other given x_t. More specifically, the states and observations in Fig. 10 are random processes denoted by X_t ∈ 𝒳 ⊂ ℝ^n and Y_t ∈ 𝒴 ⊂ ℝ^m, respectively. The latent state and the measurement processes are governed by their densities p(x_t|x_{t−1}) and p(y_t|x_t), respectively. Given a state X_{t−1} = x_{t−1}, p(x_t|x_{t−1}) is the density associated with the transition to a new state X_t = x_t at t ∈ ℕ. Similarly, p(y_t|x_t) is the likelihood of X_t = x_t having generated the measurement Y_t = y_t.

Fig. 10. A graphical model of the structure of an SSM.

With these assumptions, SSMs can be probabilistically represented as follows (Tulsyan and Gopaluni, 2016):
Y_t | (X_t = x_t) ∼ p(y_t | x_t),   (35c)

where (35a) is the density of the initial latent state. For the sake of brevity, the dependence of (35) on the exogenous inputs and known model parameters is not explicitly shown in this section; however, it is straightforward to include them (see Section 6). Further, this exposition only considers the case with n = 1 and m = 1. Its extension to multi-dimensional systems is straightforward in terms of implementation; however, it should be cautioned that, like many other state estimation algorithms, particle filters also suffer from the curse of dimensionality, as sampling is often inefficient in higher dimensional spaces (order of tens or higher). The state estimation problem aims at computing an estimate of x_t ∈ X in real-time using a sequence of measurements denoted as y_{1:t} ≡ {y_1, . . ., y_t} for all t ∈ N.

5.2. Bayesian state estimation

In the Bayesian framework, state estimation for SSMs in (35) is solved by recursively computing the state posterior density

X_t | (Y_{1:t} = y_{1:t}) ∼ p(x_t | y_{1:t}),   (36)

where p(x_t | y_{1:t}) is called a posterior or filtering density. Intuitively, the posterior density is a probabilistic representation of the available statistical information on X_t given y_{1:t}. Using the Markov property of (35) and Bayes' theorem, we can write

p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t−1}) / p(y_t | y_{1:t−1}),   (37)

where p(x_t | y_{1:t−1}) and p(y_t | x_t) are the state prior and the likelihood of the data, respectively, and p(y_t | y_{1:t−1}) is a normalizing constant, referred to as the marginal likelihood. In state estimation problems, the normalizing constant is often not explicitly known; hence, ignoring the constant in (37) yields

p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t−1}).   (38)

Further, using marginalization, p(x_t | y_{1:t−1}) can be written as

p(x_t | y_{1:t−1}) = ∫_X p(x_t | x_{t−1}) p(x_{t−1} | y_{1:t−1}) dx_{t−1},   (39)

where p(x_t | x_{t−1}) and p(x_{t−1} | y_{1:t−1}) are the transition and posterior densities at t − 1, respectively. Given p(x_t | y_{1:t}), the most common point estimate of x_t is the posterior mean given by

x̂_{t|t} = E[X_t | (Y_{1:t} = y_{1:t})] = ∫_X x_t p(x_t | y_{1:t}) dx_t,   (40)

where x̂_{t|t} ∈ R^n is an estimate of x_t. Recursively solving (38) and (39) for the posterior density is called the filtering problem, and the solution methods are called the filtering methods.
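Before turning to approximate filters, it may help to see the recursion (38)-(40) carried out numerically by brute force. The following MATLAB sketch is not part of the original primer: it evaluates the recursion on a fixed grid for a toy scalar linear-Gaussian SSM, and the model x_t = 0.9 x_{t−1} + v_t, y_t = x_t + w_t together with all numerical values are arbitrary choices made purely for illustration.

% Grid-based evaluation of the Bayesian filtering recursion (38)-(40) for
% a toy scalar SSM (illustrative sketch only; not from the original primer).
gauss = @(x, m, s) exp(-0.5*((x - m)./s).^2)./(s*sqrt(2*pi));  % Gaussian pdf
xg = linspace(-10, 10, 401);  dx = xg(2) - xg(1);              % grid over X
qv = 1.0;  rw = 0.5;  a = 0.9;  T = 50;                        % toy constants
x = 0;  y = zeros(1, T);                                       % simulate data
for t = 1:T
    x = a*x + sqrt(qv)*randn;
    y(t) = x + sqrt(rw)*randn;
end
post = gauss(xg, 0, 2);  post = post/(sum(post)*dx);           % p(x_0) on grid
xhat = zeros(1, T);
for t = 1:T
    prior = zeros(size(xg));
    for k = 1:numel(xg)                                        % prediction, Eq. (39)
        prior(k) = sum(gauss(xg(k), a*xg, sqrt(qv)).*post)*dx;
    end
    post = gauss(y(t), xg, sqrt(rw)).*prior;                   % Bayes update, Eq. (38)
    post = post/(sum(post)*dx);                                % normalize, cf. (37)
    xhat(t) = sum(xg.*post)*dx;                                % posterior mean, Eq. (40)
end

Because the grid must cover the entire state space, this brute-force approach quickly becomes infeasible as the state dimension grows, which is precisely the motivation for the sampling-based approximations discussed next.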
5.3. Filtering methods

In linear SSMs, the state posterior density in (38) is Gaussian and can thus be exactly represented by the Kalman filter (KF) using a finite number of moments (e.g., mean, variance); whereas, in nonlinear SSMs, the posterior is non-Gaussian and, at least in theory, an infinite number of moments are required for an exact representation of the density (Ristic et al., 2004). Thus, with finite computing capabilities, an optimal nonlinear state filter is not realizable (Tulsyan et al., 2013a, 2013b).

In the last few decades, several approximate nonlinear state filters based on statistical and analytical approximations of the optimal nonlinear filter have been developed for state estimation in nonlinear SSMs (Sorenson, 1974; Maybeck, 1982; Tulsyan et al., 2014, 2012). Most of these nonlinear filters can be classified as either Kalman-based filters or sequential Monte Carlo (SMC)-based filters. Both the Kalman and SMC-based filters are tractable in finite computational time and can be used for state estimation in general or specific types of nonlinear SSMs. A detailed exposition of nonlinear filtering methods and related approximations is not included here, but can be found in the handbook of nonlinear filtering (Crisan and Rozovskii, 2011).

The class of SMC-based filtering methods, popularly referred to as particle filters, is an important class of filtering methods for nonlinear SSMs. Some of the popular particle filtering algorithms include the sampling importance resampling (SIR) filter, the auxiliary SIR (ASIR) filter, and the Rao-Blackwellized particle filter (RBPF). The SIR method is the simplest of all particle filtering algorithms. In the next section, we demonstrate the application of the SIR method discussed in Section 4 in solving the recursive filtering solution in (38) and (39).

5.4. Particle filtering

The SIR method is the most basic particle filtering method. In order to use the idea of an SIR discussed in Section 4 for state estimation, we first need to identify the target and sampling densities in the filtering problem. Observe that in (40), the state estimates are computed by solving the integral with respect to the posterior density. Recall that a MC approximation of (40) can be computed by generating perfect samples distributed according to p(x_t | y_{1:t}). Now, since p(x_t | y_{1:t}) does not permit perfect sampling, we use an SIR method to generate samples from p(x_t | y_{1:t}). Therefore, in (38), p(x_t | y_{1:t}) is the target density, and p(x_t | y_{1:t−1}) is taken as the sampling density, such that

p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t−1}),   (41)

where the left-hand side is the target density and p(x_t | y_{1:t−1}) on the right-hand side is the sampling density.

It is highlighted that while p(x_t | y_{1:t−1}) is selected as the sampling density in (41), it is not a requirement in general. In fact, in many advanced particle filtering algorithms, sampling densities other than the one considered here have been found to be more effective in state estimation applications. The discussion on other sampling density functions is outside the scope of this work; however, readers are referred to Arulampalam et al. (2002), and references cited therein, for a detailed treatment of the subject. Now, for the choice of the target and sampling densities in (41), we can define importance weights as follows (see (24))

w_{t|t−1}^{(i)} ≡ (1/N) p(x_{t|t−1}^{(i)} | y_{1:t}) / p(x_{t|t−1}^{(i)} | y_{1:t−1}),   for all i = 1, . . ., N,   (42)

where {x_{t|t−1}^{(i)}}_{i=1}^{N} and {w_{t|t−1}^{(i)}}_{i=1}^{N} are the i.i.d. samples from p(x_t | y_{1:t−1}) and the unnormalized importance weights, respectively.
In principle, (42) cannot be evaluated since the target density is unknown; however, using (38), we can rewrite (42) as

w_{t|t−1}^{(i)} ∝ (1/N) p(y_t | x_{t|t−1}^{(i)}),   for i = 1, . . ., N,   (43)

or, in terms of normalized importance weights, as

w̃_{t|t−1}^{(i)} = p(y_t | x_{t|t−1}^{(i)}) / Σ_{j=1}^{N} p(y_t | x_{t|t−1}^{(j)}),   for i = 1, . . ., N,   (44)

where {w̃_{t|t−1}^{(i)}}_{i=1}^{N} are the normalized weights. According to (44), the particle weights depend on the likelihood function p(y_t | x_t). Intuitively, this makes sense, since the likelihood establishes how likely a given state explains the measurement. Thus, the better a certain particle explains the measurement, the higher the probability that the particle was in fact sampled from the target density. A prerequisite for computing the weights in (44) is complete access to the particle set {x_{t|t−1}^{(i)}}_{i=1}^{N}. This set can be obtained as follows. Assuming we have particles from the target density p(x_{t−1} | y_{1:t−1}) at the previous time, its empirical distribution can be written as

p(x_{t−1} | y_{1:t−1}) dx_{t−1} = (1/N) Σ_{i=1}^{N} δ_{x_{t−1|t−1}^{(i)}}(dx_{t−1}).   (45)

Now substituting (45) into (39) yields

p̃(x_t | y_{1:t−1}) = ∫_X p(x_t | x_{t−1}) (1/N) Σ_{i=1}^{N} δ_{x_{t−1|t−1}^{(i)}}(dx_{t−1}),   (46a)

                 = (1/N) Σ_{i=1}^{N} p(x_t | x_{t−1|t−1}^{(i)}),   (46b)

where p̃(x_t | y_{1:t−1}) is a MC approximation of p(x_t | y_{1:t−1}). In (46b), the sampling density is given as a mixture of N transition densities. Now, since each of the N densities is uniformly weighted, passing the particle set {x_{t−1|t−1}^{(i)}}_{i=1}^{N} through the transition density generates an i.i.d. sample set {x_{t|t−1}^{(i)}}_{i=1}^{N} that is distributed according to the sampling density p(x_t | y_{1:t−1}). Once {x_{t|t−1}^{(i)}}_{i=1}^{N} is generated, the empirical distribution of the sampling density can be written as

p(x_t | y_{1:t−1}) dx_t = (1/N) Σ_{i=1}^{N} δ_{x_{t|t−1}^{(i)}}(dx_t).   (47)

From {x_{t|t−1}^{(i)}}_{i=1}^{N}, particles from the target density are obtained by resampling {x_{t|t−1}^{(i)}, w̃_{t|t−1}^{(i)}}_{i=1}^{N} according to

Pr(X̃_{t|t}^{(j)} = x_{t|t−1}^{(i)}) = w̃_{t|t−1}^{(i)},   for j = 1, . . ., N,   (48)

with the resampled particle weights reset to

w_{t|t}^{(j)} = 1/N,   for j = 1, . . ., N.   (49)

Finally, the particle system {x_{t|t}^{(i)}, w_{t|t}^{(i)}}_{i=1}^{N} = {x_{t|t}^{(i)}, N^{−1}}_{i=1}^{N} corresponds to the target density p(x_t | y_{1:t}), with its empirical distribution represented as

p(x_t | y_{1:t}) dx_t = (1/N) Σ_{i=1}^{N} δ_{x_{t|t}^{(i)}}(dx_t).   (50)

Given (50), the state estimate in (40) can be computed as

x̂_{t|t} ≈ ∫_X x_t (1/N) Σ_{i=1}^{N} δ_{x_{t|t}^{(i)}}(dx_t) = (1/N) Σ_{i=1}^{N} x_{t|t}^{(i)}.   (51)

Recall from Section 4.2 that resampling in (48) takes i.i.d. particles {x_{t|t−1}^{(i)}}_{i=1}^{N} and delivers dependent particles {x_{t|t}^{(i)}}_{i=1}^{N}. As shown in Ninness (2000), the rate of convergence of (51) to (40) decreases as the correlation in the resampled set {x_{t|t}^{(i)}}_{i=1}^{N} increases. This problem is alleviated by alternatively computing (40) as

x̂_{t|t} = ∫_X x_t [p(x_t | y_{1:t}) / p(x_t | y_{1:t−1})] p(x_t | y_{1:t−1}) dx_t,   (52a)

       ∝ ∫_X x_t p(y_t | x_t) (1/N) Σ_{i=1}^{N} δ_{x_{t|t−1}^{(i)}}(dx_t),   (52b)

       = Σ_{i=1}^{N} x_{t|t−1}^{(i)} [ p(y_t | x_{t|t−1}^{(i)}) / Σ_{j=1}^{N} p(y_t | x_{t|t−1}^{(j)}) ] = Σ_{i=1}^{N} x_{t|t−1}^{(i)} w̃_{t|t−1}^{(i)}.   (52c)

From (52a) to (52b), we have used the relation (38) and the empirical distribution (47). Now, since the sum in (52c) involves an independent set {x_{t|t−1}^{(i)}}_{i=1}^{N} instead of the dependent set {x_{t|t}^{(i)}}_{i=1}^{N} in (51), the estimate (52c) is generally more accurate.

Finally, the procedure to recursively compute the MC approximation of the posterior density in (50) is referred to as the particle filtering algorithm, which is outlined in Algorithm 4.

Algorithm 4. Particle filter for state estimation
1: Initialization: generate {x_{0|0}^{(i)}}_{i=1}^{N} ∼ p(x_0), distributed according to the initial state density p(x_0).
2: for t = 1 to T do
3: Predict: generate {x_{t|t−1}^{(i)}}_{i=1}^{N} according to x_{t|t−1}^{(i)} ∼ p(x_t | x_{t−1|t−1}^{(i)}), for i = 1, . . ., N.
4: Update: compute the normalized importance weights according to w̃_{t|t−1}^{(i)} = p(y_t | x_{t|t−1}^{(i)}) / Σ_{j=1}^{N} p(y_t | x_{t|t−1}^{(j)}), for i = 1, . . ., N.
5: State estimation: compute the state estimate as x̂_{t|t} = Σ_{i=1}^{N} x_{t|t−1}^{(i)} w̃_{t|t−1}^{(i)}.
6: Resample: resample {x_{t|t−1}^{(i)}, w̃_{t|t−1}^{(i)}}_{i=1}^{N} as per Pr(X̃_{t|t}^{(j)} = x_{t|t−1}^{(i)}) = w̃_{t|t−1}^{(i)}, for j = 1, . . ., N. Define the resampled particles as {x_{t|t}^{(i)}}_{i=1}^{N}.
7: end for
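As a bridge between Algorithm 4 and the MATLAB implementation discussed in Section 6, the following sketch performs a single recursion of Algorithm 4 for a scalar SSM. It is our own sketch, not a listing from the primer; it assumes function handles sample_f for drawing from p(x_t | x_{t−1}) and lik for evaluating p(y_t | x_t), and it reuses the systematic_resample helper sketched after Algorithm 3.

% One recursion of Algorithm 4 (a sketch; names are our own). Inputs:
%   xprev    - 1-by-N particles distributed according to p(x_{t-1}|y_{1:t-1})
%   y        - scalar measurement y_t
%   sample_f - handle drawing x_t ~ p(x_t|x_{t-1}) for a vector of particles
%   lik      - handle evaluating p(y_t|x_t) for a vector of particles
function [xpost, xhat] = sir_step(xprev, y, sample_f, lik)
    N     = numel(xprev);
    xpred = sample_f(xprev);           % Step 3: predict, cf. Eq. (46b)
    w     = lik(y, xpred);             % Step 4: unnormalized weights, Eq. (43)
    wn    = w/sum(w);                  % normalized weights, Eq. (44)
    xhat  = sum(xpred.*wn);            % Step 5: state estimate, Eq. (52c)
    idx   = systematic_resample(wn);   % Step 6: resample (see Algorithm 3)
    xpost = xpred(idx);                % particles distributed as p(x_t|y_{1:t})
end

For example, for a Gaussian random-walk state with unit-variance Gaussian measurement noise, one could set sample_f = @(x) x + randn(size(x)) and lik = @(y, x) exp(-0.5*(y - x).^2).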
6. Implementation

In this section, we discuss the implementation of Algorithm 4. The aim of this discussion is to enable beginners and first-time researchers to implement particle filtering for their own state estimation problems. Before we give the implementation, it is worth commenting on some of the aspects of Algorithm 4. First, observe that the SSM in (35) is a general probabilistic representation of time-series models. In this section, we consider the problem of state estimation in the following class of SSMs

X_0 ∼ N(· | M_0, P_0),   (53a)

X_{t+1} = f_θ(X_t, U_t) + V_t,   V_t ∼ N(· | 0, P_X),   (53b)

Y_t = g_θ(X_t, U_t) + W_t,   W_t ∼ N(· | 0, P_Y),   (53c)

where the initial state is Gaussian with mean M_0 and covariance P_0; U_t ∈ R^p are the system inputs; V_t ∈ R^n and W_t ∈ R^m are the additive, mutually independent zero-mean Gaussian state and measurement noise processes, respectively; and f_θ and g_θ are the state and measurement mapping functions parametrized by θ ∈ R^k, respectively. Here, (53) is called a discrete-time nonlinear SSM with additive Gaussian noise and can be probabilistically represented as (35). In fact, the state transition and the likelihood for (53) can be represented as

p(x_{t+1} | x_t, u_t) = N(· | (x_{t+1} − f_θ(x_t, u_t)), P_X),   (54a)

p(y_t | x_t, u_t) = N(· | (y_t − g_θ(x_t, u_t)), P_Y).   (54b)

The densities in (54a) and (54b) correspond to the mean-shifted state and measurement noise, respectively. Now, given (54a), Step 3 of Algorithm 4 is implemented as follows. First we generate a sample of state noise (V_t = v_t^{(i)}) ∼ N(· | 0, P_X), then, given x_{t−1|t−1}^{(i)}, predict x_{t|t−1}^{(i)} as follows

x_{t|t−1}^{(i)} = f_θ(x_{t−1|t−1}^{(i)}, u_{t−1}) + v_t^{(i)}.   (55)
Here, given v_t^{(i)} and x_{t−1|t−1}^{(i)}, the particle x_{t|t−1}^{(i)} is generated deterministically. Similarly, w̃_{t|t−1}^{(i)} in Step 4 is calculated using (54b). Since (54b) is Gaussian, w̃_{t|t−1}^{(i)} is also a Gaussian density evaluated at x_{t|t−1}^{(i)}. Finally, a MATLAB implementation of Algorithm 4 for the SSM in (53) with n = 2, m = 1 and p = 2 is given in Table 3. We use the systematic resampling in Algorithm 3 to implement Step 6 in Algorithm 4. The first four arguments to the function StateEstimation are the state equations f1 and f2, the measurement equation g and the likelihood pe, all defined as MATLAB inline functions. We demonstrate the use of the code in Table 3 on the following example.

Table 3. A MATLAB code for implementing Algorithm 4 for the SSM in (53) with n = 2, m = 1 and p = 2 (listing not reproduced in this version).
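Since the listing in Table 3 is not reproduced here, the following is a minimal sketch of how such a StateEstimation routine for the SSM in (53) with n = 2, m = 1 and p = 2 might be written. The first four arguments mirror the description in the text (f1, f2, g and pe as function handles); the remaining arguments, the function body and the variable names are our own reconstruction, not the published code, and the sketch reuses the systematic_resample helper given earlier.

% A sketch of a particle filter (Algorithm 4) for the SSM in (53) with
% n = 2, m = 1 and p = 2. Inputs:
%   f1, f2 - handles for the two state equations, called as f1(x, u, theta)
%   g      - measurement equation handle, called as g(x, u, theta)
%   pe     - measurement likelihood handle, called as pe(y - g(x, u, theta))
%   Y, U   - 1-by-T measurements and 2-by-T inputs
%   theta  - model parameters; M0 (2-by-1), P0, PX - initial mean/covariance
%            and state noise covariance; N - number of particles
function Xhat = StateEstimation(f1, f2, g, pe, Y, U, theta, M0, P0, PX, N)
    T    = numel(Y);
    Xhat = zeros(2, T);
    % Step 1: initialization, draw particles from p(x_0)
    X = repmat(M0, 1, N) + chol(P0)'*randn(2, N);
    for t = 1:T
        % Step 3: predict each particle through (55)
        V  = chol(PX)'*randn(2, N);            % state noise samples
        Xp = zeros(2, N);
        for i = 1:N
            Xp(1, i) = f1(X(:, i), U(:, t), theta) + V(1, i);
            Xp(2, i) = f2(X(:, i), U(:, t), theta) + V(2, i);
        end
        % Step 4: importance weights from the Gaussian likelihood (54b)
        w = zeros(1, N);
        for i = 1:N
            w(i) = pe(Y(t) - g(Xp(:, i), U(:, t), theta));
        end
        w = w/sum(w);
        % Step 5: state estimate, Eq. (52c)
        Xhat(:, t) = Xp*w';
        % Step 6: systematic resampling (Algorithm 3)
        idx = systematic_resample(w);
        X   = Xp(:, idx);
    end
end

The inputs U are assumed to be aligned so that U(:, t) is the input driving the transition to time t, as in (55).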
Example 6.1. Consider a semi-continuous Baker's fermenter for biomass growth. Assuming the Monod kinetics for both biomass growth and substrate consumption, the dynamics of the species inside the fermenter can be described by (Tulsyan et al., 2012)

X_{t+1}(1) = 0.1 [ θ(1) X_t(2) / (θ(2) + X_t(2)) − U_t(1) − θ(4) ] X_t(1) + V_t(1),

X_{t+1}(2) = 0.1 [ −θ(1) X_t(2) X_t(1) / ((θ(2) + X_t(2)) θ(3)) + U_t(1) (U_t(2) − X_t(2)) ] + V_t(2),

Y_t = X_t(2) + W_t,

where X_t(1) and X_t(2) are the state variables representing the concentrations of biomass growth (g/L) and substrate consumption (g/L) as a function of time t, respectively. The manipulated variables U_t(1) and U_t(2) are the dilution factor (h^{−1}) and the substrate
concentration (g/L) in feed, respectively. The measurements y_t are available only for the substrate consumption. The state noise and measurement noise are given by V_t ∼ N(· | 0, P_X) and W_t ∼ N(· | 0, P_Y), respectively. Also, θ = [θ(1), θ(2), θ(3), θ(4)]^T are the four model parameters, assumed to be perfectly known a priori. The objective is then to estimate in real-time the concentration of biomass in the fermenter using the available noisy substrate measurements.

The first step is to define the model in terms of initial conditions, state and measurement functions, and noise distributions. Filter parameters, such as the number of particles, are also initialized in the first step. The next step is to generate synthetic measurements using the model description. In real-world applications, the measurements are sampled from the process. Once the measurements are available, the StateEstimation routine is invoked. The MATLAB code for state estimation in Example 6.1 is shown in Table 4.

Table 4. A MATLAB code for state estimation in Example 6.1 using Algorithm 4 (listing not reproduced in this version).
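As the listing in Table 4 is likewise not reproduced, a minimal driver along the lines described above might look as follows. The parameter values, noise covariances, initial conditions and input sequence below are arbitrary illustrative placeholders, not the values used in the published Table 4, and the sketch relies on the StateEstimation and systematic_resample sketches given earlier.

% A sketch of a driver script for Example 6.1. All numerical values are
% illustrative placeholders, not the values from the published Table 4.
theta = [0.5; 0.1; 0.4; 0.05];                 % model parameters (illustrative)
PX = diag([1e-4, 1e-4]); PY = 1e-2;            % noise covariances (illustrative)
M0 = [1; 0.5];  P0 = diag([1e-2, 1e-2]);       % initial state density
T = 200;  N = 1000;                            % horizon and number of particles

% State and measurement functions for the fermenter model in Example 6.1
mu = @(x, th) th(1)*x(2)/(th(2) + x(2));                        % Monod kinetics
f1 = @(x, u, th) 0.1*(mu(x, th) - u(1) - th(4))*x(1);
f2 = @(x, u, th) 0.1*(-mu(x, th)*x(1)/th(3) + u(1)*(u(2) - x(2)));
g  = @(x, u, th) x(2);                                          % substrate is measured
pe = @(e) exp(-0.5*e.^2/PY)/sqrt(2*pi*PY);                      % likelihood (54b)

% Generate synthetic data from the model (in practice, Y comes from the plant)
U = [0.1*ones(1, T); 5*ones(1, T)];            % dilution factor and feed concentration
X = zeros(2, T); Y = zeros(1, T); x = M0;
for t = 1:T
    x = [f1(x, U(:, t), theta); f2(x, U(:, t), theta)] + chol(PX)'*randn(2, 1);
    X(:, t) = x;
    Y(t) = g(x, U(:, t), theta) + sqrt(PY)*randn;
end

% Run the particle filter and assess the accuracy via the RMSE in (56)
Xhat = StateEstimation(f1, f2, g, pe, Y, U, theta, M0, P0, PX, N);
RMSE = sqrt(mean(sum((X - Xhat).^2, 1)));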
Finally, the state estimates computed using Table 4 are shown in Fig. 11. The deviation of the state estimates from the true values in Fig. 11 is due to the noise in the system. From Fig. 11 it is clear that Algorithm 4 is successful in accurately estimating the concentration of the biomass growth. Mathematically, the performance of
Algorithm 4 on Example 6.1 can be assessed using the root mean square error (RMSE), which is defined as

RMSE = [ (1/T) Σ_{t=1}^{T} (x_t − x̂_{t|t})^T (x_t − x̂_{t|t}) ]^{1/2},   (56)

where X_t = x_t and x̂_{t|t} represent the true and estimated states. The RMSE for Algorithm 4 with N = 10,000 particles is 5.87 g/L. Note that the RMSE is a function of the number of particles used. Table 5 gives the RMSEs for different N used in Algorithm 4. As expected, the RMSE decreases with the number of particles used. In practice, the optimal choice of N is nontrivial, as it is difficult to estimate a priori the number of particles needed to achieve the desired performance (measured in terms of RMSE, for example). Generally, the performance of Algorithm 4 improves with the size of N; however, caution needs to be exercised, as the computational load of Algorithm 4 scales linearly with N. Table 5 gives the computational time for Algorithm 4 for different values of N.

Fig. 11. The blue dashed line corresponds to the concentration of the biomass growth in Example 6.1 as estimated by Algorithm 4, and the red dashed line corresponds to the true biomass concentration. The RMSE with N = 10,000 particles is 5.87 g/L.

Table 5
Effect of the number of particles on the RMSE and computational time. The results are computed for Example 6.1.

Particles    RMSE (g/L)    Computational time (s)
10           9.57          1.21
50           8.07          1.33
100          7.97          1.61

Finally, it is highlighted that the implementation of Algorithm 4 on Example 6.1 is only intended to illustrate the use of particle filtering in state estimation applications. Note that Algorithm 4 is general and can be used for estimation under much more complicated system settings than the one considered in Example 6.1. For example, in Tulsyan et al. (2013b), the authors discuss the application of particle filters for state estimation in non-Gaussian systems.

7. Conclusions

We have presented a gentle introduction to several of the important Monte Carlo sampling methods, such as perfect sampling, importance sampling, sequential importance resampling and particle filters. This exposition is targeted at beginners and practitioners with limited understanding of particle filtering theory. Throughout this primer we have highlighted some of the common mistakes and pitfalls beginners make when reading about particle filtering for the first time. Moreover, we have also provided the reader with some intuition as to why the algorithm works and how to implement it in practice. An implementable version of MATLAB code for particle filters is also provided. The code not only aids in improving the understanding of particle filters, it also serves as a template for beginners to build and implement their own advanced state estimation routines.

References

Achilias, D.S., Kiparissides, C., 1992. Development of a general mathematical framework for modeling diffusion controlled free-radical polymerization reactions. Macromolecules 25, 3739–3750.
Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 72 (3), 269–342.
Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T., 2002. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50 (2), 174–188.
Barazandegan, M., Ekram, F., Kwok, E., Gopaluni, B., Tulsyan, A., 2015. Assessment of type II diabetes mellitus using irregularly sampled measurements with missing data. Bioprocess Biosyst. Eng. 38 (4), 615–629.
Castro, R., 2015. The Empirical Distribution Function and the Histogram, Lecture Notes, 2WS17 – Advanced Statistics. Department of Mathematics, Eindhoven University of Technology.
Chen, Z., 2003. Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond, Tech. Rep. Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada.
Chitralekhaa, S.B., Prakash, J., Raghavan, H., Gopaluni, R.B., Shah, S.L., 2010. A comparison of simultaneous state and parameter estimation schemes for a continuous fermentor reactor. J. Process Control 20, 934–943.
Crisan, D., Rozovskii, B., 2011. The Oxford Handbook of Non-Linear Filtering. Oxford University Press, Oxford.
Douc, R., Cappé, O., 2005. Comparison of resampling schemes for particle filtering. In: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, Zagreb, Croatia, pp. 64–69.
Doucet, A., de Freitas, N., Gordon, N.J., 2001. Sequential Monte Carlo Methods in Practice, chap. An Introduction to Sequential Monte Carlo Methods. Springer-Verlag.
Doucet, A., Johansen, A.M., 2009. A tutorial on particle filtering and smoothing: fifteen years later. Handb. Nonlinear Filter. 12 (656–704), 3.
Evensen, G., 2007. Data Assimilation: The Ensemble Kalman Filter, chap. Estimation in an Oil Reservoir Simulator. Springer, Berlin, Heidelberg.
Fearnhead, P., 1998. Sequential Monte Carlo Methods in Filter Theory (Ph.D. thesis). University of Oxford.
Geweke, J., 1989. Bayesian inference in econometric models using Monte Carlo integration. Econom. J. Econom. Soc., 1317–1339.
Gilks, W., 2005. Encyclopedia of Biostatistics, chap. Markov Chain Monte Carlo. Wiley Online Library.
Gordon, N.J., Salmond, D.J., Smith, A.F.M., 1993. Novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE Proc. Radar Signal Process. 140 (2), 107–113.
Kantas, N., Doucet, A., Singh, S.S., Maciejowski, J., Chopin, N., et al., 2015. On particle methods for parameter estimation in state-space models. Stat. Sci. 30 (3), 328–351.
Kitagawa, G., 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5 (1), 1–25.
Liu, J., Chen, R., 1998. Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93 (443), 1032–1044.
MATLAB, 2010a. Version 7.10.0 (R2010a). The MathWorks Inc., Natick, MA.
Maybeck, P.S., 1982. Stochastic Models, Estimation and Control, Vol. 2. Academic Press, New York.
Metropolis, N., 1987. The beginning of the Monte Carlo method. Los Alamos Sci. 15 (584), 125–130.
Montgomery, D., Runger, G., 2010. Applied Statistics and Probability for Engineers. John Wiley & Sons, Hoboken, NJ.
Ninness, B., 2000. Strong laws of large numbers under weak assumptions with application. IEEE Trans. Autom. Control 45 (11), 2117–2122.
Ristic, B., Arulampalam, S., Gordon, N., 2004. Beyond the Kalman Filter: Particle Filters for Tracking Applications, chap. A Tutorial on Particle Filters. Artech House, Boston, MA.
Robert, C., Casella, G., 2013. Monte Carlo Statistical Methods. Springer Science & Business Media, New York, NY.
Rubinstein, R., Kroese, D., 2011. Simulation and the Monte Carlo Method. John Wiley & Sons, Hoboken, NJ.
Silverman, B., 1986. Density Estimation for Statistics and Data Analysis, vol. 26. CRC Press, New York, NY.
Solomon, H., 1978. Geometric Probability, chap. Buffon Needle Problem, Extensions and Estimation of π. SIAM, Philadelphia, PA, pp. 1–24.
Sorenson, H.W., 1974. On the development of practical non-linear filters. Inf. Sci. 7 (C), 253–270.
Tanizaki, H., Mariano, R., 1998. Nonlinear and non-Gaussian state-space modeling with Monte Carlo simulations. J. Econom. 83 (1), 263–290.
Tulsyan, A., Gopaluni, R.B., 2016. Robust model-based delay timer alarm for non-linear processes. In: Proceedings of the 2016 American Control Conference (ACC), Boston, pp. 2989–2994.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2012. Performance assessment of nonlinear state filters. In: Proceedings of the 8th IFAC Symposium on Advanced Control of Chemical Processes, Singapore, pp. 371–376.
Tulsyan, A., Forbes, J.F., Huang, B., 2012. Designing priors for robust Bayesian optimal experimental design. J. Process Control 22 (2), 450–462.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013. Bayesian identification of non-linear state space models: Part II – Error analysis. In: Proceedings of the 10th International Symposium on Dynamics and Control of Process Systems, Singapore.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013a. On simultaneous on-line state and parameter estimation in non-linear state-space models. J. Process Control 23 (4), 516–526.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2013b. A particle filter approach to approximate posterior Cramér-Rao lower bound: the case of hidden states. IEEE Trans. Aerosp. Electron. Syst. 49 (4), 2478–2495.
Tulsyan, A., Huang, B., Gopaluni, R.B., Forbes, J.F., 2014. Performance assessment, diagnosis, and optimal selection of non-linear state filters. J. Process Control 24 (2), 460–478.
Tulsyan, A., Tsai, Y., Gopaluni, R.B., Braatz, R.D., 2016. State-of-charge estimation in lithium-ion batteries: a particle filter approach. J. Power Sources 331, 208–223.
Whitley, D., 1994. A genetic algorithm tutorial. Stat. Comput. 4 (2), 65–85.