
Tutorial on Monte Carlo Sampling

Hongshu Chen
Dept. of Chemical & Biomolecular Eng., Ohio State University

May 16, 2005

1 Monte Carlo Approximation


As engineers, we are familiar with integrals like

    y = \int_a^b g(x) f(x) \, dx    (1)

or optimization problems like

    x^* = \arg\max_{x \in (a,b)} f(x)    (2)

If the problem is simple enough, we can solve it analytically. Unfortunately, in most cases there
is no closed-form solution and we need to use numerical methods. Sometimes even numerical
integration or optimization is not convenient or accurate enough; that is the time to think about an
alternative, Monte Carlo approximation.
The basic idea of Monte Carlo approximation is very simple. Still using equations (1) and (2) as
examples, suppose f(x) satisfies two conditions:

1. f(x) \ge 0, \forall x \in (a, b)^1

2. \int_a^b f(x) \, dx = M < \infty

Let

    f^*(x) = \frac{f(x)}{M}    (3)

f^*(x) fulfills the requirements of being a Probability Density Function (PDF) of a distribution.
Then we can consider f^*(x) to be the PDF of a distribution defined on (a, b). Hence, equation (1) is
equivalent to

    y = M \int_a^b g(x) f^*(x) \, dx = M \, E_{f^*}(g(x))    (4)

i.e., the integral is simply the expected value of g(x) over the distribution f^*(x) times the constant
M. Hence, we can draw samples (x_1, x_2, ..., x_n) from f^*(x) and approximate the integral as:

    y \approx \frac{M}{n} \sum_{i=1}^n g(x_i)    (5)
^1 This condition is not necessary as long as we can make a linear transformation of f(x) nonnegative and make
the corresponding changes in the Monte Carlo approximation.

When the integral of f(x) is easy while the integral of g(x)f(x) is hard, equation (5) will be
much easier to evaluate than equation (1).
Solving equation (2) by Monte Carlo approximation is even simpler. Equation (2) can be
rewritten as:

    x^* = \arg\max_{x \in (a,b)} f^*(x)    (6)

i.e., x^* is the mode (the highest-density location) of f^*(x). As long as we can get samples from f^*(x),
it is very easy to find the high-density area of f^*(x) (e.g., by plotting a histogram of the samples),
and thus we can provide an estimate of x^*. Depending on the sampling technique we use, this process
may not even need to calculate M.
As the number of samples n \to \infty, the Monte Carlo approximation approaches
the true value of the integral or the optimum.
From the above examples, we can see that the main issue in Monte Carlo approximation is how
to draw samples according to the distribution we have. Quite a few sampling techniques
have been developed already; in the following sections, some widely used sampling methods will be
introduced.
Now let us look at two simple examples of Monte Carlo approximation. They are incarnations
of equations (1) and (2).

Example 1 (Integration)

    y = \int_0^1 x \exp(x) \, dx    (7)

We can easily get the analytical solution for this problem; the true value of y is 1.
Now we use Monte Carlo approximation to solve it. Noticing that the integration
limits are 0 and 1, this reminds us of the uniform(0, 1) distribution; hence,

    y = E_{uniform(0,1)}(x \exp(x)) \approx \frac{1}{n} \sum_{i=1}^n x_i \exp(x_i)    (8)

where (x_1, x_2, ..., x_n) comes from uniform(0, 1).


When n = 1000, in one simulation, y \approx 1.01.
In this simulation, we can use the random number generator for uniform(0, 1) in Matlab
directly; in the following sections, this example will be revisited to show how other sampling
techniques work.
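The averaging in equation (8) is only a few lines of code. Here is a minimal sketch in Python (standing in for the Matlab generator the tutorial uses); the function name, sample size, and seed are illustrative assumptions:

```python
import math
import random

def mc_integrate_uniform(n=1000, seed=0):
    """Approximate y = integral of x*exp(x) over (0,1) by averaging
    x*exp(x) over n draws from uniform(0,1), as in equation (8)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random()          # x_i ~ uniform(0, 1)
        total += x * math.exp(x)  # g(x_i) = x_i * exp(x_i)
    return total / n

# The true value of the integral is 1; larger n gives a closer estimate.
print(mc_integrate_uniform(n=100000))
```

Increasing n shrinks the Monte Carlo error at the usual 1/\sqrt{n} rate.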

Example 2 (Optimization)

    x^* = \arg\max_{x \in R^2} \left\{ \frac{1}{2\pi \cdot 4\sqrt{1 - (1/4)^2}} \exp\left( -\frac{1}{2(1 - (1/4)^2)} \left[ \frac{(x_1+1)^2}{4} - \frac{(x_1+1)(x_2-1)}{8} + \frac{(x_2-1)^2}{4} \right] \right) \right\}    (9)

This function is actually the PDF of a bivariate normal distribution,
Normal\left( (-1, 1)', \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix} \right).
Hence, x^* = (-1, 1)'. Now, using Monte Carlo approximation, we can directly use the multivariate
normal random number generator in Matlab, draw n samples of x, then find the high-density area
based on the histogram.
Let n = 10^6. In one simulation, we get a histogram (figure (1)); comparing it with the contour of
the function (figure (2)), we can see that the histogram is a good approximation of the contour of
the function value. The center of the histogram bin with the most counts


Figure 1: Histogram of Samples for Example 2
Figure 2: 3-D Contour of Function for Example 2

is (-1.24, 1.25)'. Hence, x^* \approx (-1.24, 1.25)'. Monte Carlo approximation can not only provide a point
estimate, it can also easily give an error bound.
We will revisit this example by using Markov chain Monte Carlo to draw samples from this
distribution.

2 Sampling Methods
2.1 Inverse Transform Sampling
The principle of inverse transform sampling is based on the following theorem:
Theorem 2.1 Let F(x) be the Cumulative Distribution Function (CDF) of random variable x. If random
variable y comes from the uniform(0, 1) distribution, then random variable z = F^{-1}(y) comes from
the distribution of x.

Based on theorem (2.1), as long as we can get the inverse CDF of a distribution, we can easily
sample from it. (Random numbers from uniform(0, 1) are readily available in computer languages.)
Table (1) shows the algorithm of inverse transform sampling. It is very easy to implement as long
as we can get the inverse CDF, but it can only be used for univariate distributions.
Example 1 (ctd. 1) Consider exp(x) as the kernel of the distribution (i.e., f(x) in equation (1));
the integral of exp(x) over (0, 1) is e - 1, hence the PDF of the distribution to sample from is:

    f^*(x) = \frac{\exp(x)}{e - 1} I_{(0,1)}(x)    (10)

The CDF of this distribution is:

    F(x) = \int_0^x \frac{\exp(t)}{e - 1} \, dt = \frac{\exp(x) - 1}{e - 1}    (11)

The inverse CDF of this distribution is:

    F^{-1}(y) = \ln(1 + (e - 1)y), \quad y \in (0, 1)    (12)

Table 1: Algorithm of Inverse Transform Sampling

f(x) is the PDF of x

derive the inverse CDF F^{-1}(y) from f(x)

for i = 1 : n
    draw y_i from uniform(0, 1)
    x_i = F^{-1}(y_i)
end for

(x_1, x_2, ..., x_n) comes from f(x)


Figure 3: Illustration of Inverse Transform Sampling for Example 1

Figure (3) shows how we draw a sample x_i given a random number y_i in this example.
After we get the samples (x_1, x_2, ..., x_n), we can approximate the integral in equation (7) as:

    y \approx \frac{e - 1}{n} \sum_{i=1}^n x_i    (13)

When n = 1000, in one simulation, y \approx 0.998, which is very close to the true value 1.
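Equations (12) and (13) translate directly into code. A sketch in Python, under the same assumptions as before (function names and sample sizes are illustrative):

```python
import math
import random

def inverse_transform_samples(n, seed=0):
    """Draw from f*(x) = exp(x)/(e-1) on (0,1) using the inverse CDF
    F^{-1}(y) = ln(1 + (e-1)y) of equation (12)."""
    rng = random.Random(seed)
    return [math.log(1.0 + (math.e - 1.0) * rng.random()) for _ in range(n)]

def estimate_integral(samples):
    """Equation (13): y is approximately (e-1)/n * sum(x_i)."""
    return (math.e - 1.0) * sum(samples) / len(samples)

xs = inverse_transform_samples(100000)
print(estimate_integral(xs))  # close to the true value 1
```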
Note: Inverse transform sampling is good when the distribution is univariate and the inverse
CDF is relatively easy to get. In many cases, the inverse CDF cannot be solved analytically;
there is a method called griddy sampling [Ritter et al.(1992)] which uses a grid to numerically
evaluate the CDF and inverse CDF, then applies inverse transform sampling.

2.2 Acceptance Rejection Sampling


Acceptance rejection sampling is another widely used method to draw samples from a
univariate distribution. It is common that we can easily draw samples from some distributions but
not so easily from others. Acceptance rejection sampling makes use of those easier distributions
to draw samples from the distribution we need. Suppose f(x) is the distribution we want to draw
samples from, and there is another distribution h(x) which is much easier to sample from. We
can use h(x) as the proposal distribution in acceptance rejection sampling as long as it satisfies the
following condition:

    There exists a constant c > 0 such that f(x) \le c h(x) for any x in the domain of f(x).

This condition implies that wherever f(x) > 0, h(x) must also be greater than zero, i.e., the support
of h(x) must contain the support of f(x).
During sampling, we draw a sample x' from h(x) and accept it with probability f(x')/(c h(x')), until
one x' is accepted. Table (2) shows the algorithm of acceptance rejection sampling. The algorithm
also needs random numbers from uniform(0, 1) to decide whether to accept or reject
a sample. Figure (4) shows how we decide to accept or reject a sample x' given a uniform(0, 1)
random number u and the acceptance probability

    \alpha(x') = \frac{f(x')}{c h(x')}    (14)

We can see that the larger \alpha is, the more likely we are to accept the sample. To make the acceptance
probability large, we should find a proposal distribution whose shape is similar to the distribution we want
to sample from. When c h(x) = f(x), the acceptance probability always equals 1, i.e., we always
accept the samples drawn from h(x). But when the shapes of h(x) and f(x) are quite different, the
acceptance probability can be very low, and acceptance rejection sampling becomes very inefficient.
When we calculate the acceptance probability, f(x) need not be the PDF; it can be
the kernel of the PDF. This implies that we need not calculate the normalization constant (M in
equation (3)) when we use acceptance rejection sampling to draw samples.
Example 1 (ctd. 2) Still consider drawing samples from f^*(x) = \frac{\exp(x)}{e - 1} I_{(0,1)}(x). Use uniform(0, 1)
as the proposal distribution because it has the same support as f^*(x) and is easy to sample from:

    h(x) = I_{(0,1)}(x)    (15)

The maximum of f^*(x) is slightly less than 1.6, hence we can choose c = 1.6, so that f^*(x) \le c h(x).
Figure (5) shows the plots of these functions.
When n = 1000, in one simulation, y \approx 0.984, which is still fairly close to the true value.
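The accept/reject loop of Table (2) for this example can be sketched in Python as follows (c = 1.6 bounds f^*, as in the text; helper names are illustrative):

```python
import math
import random

C = 1.6  # bound with f*(x) <= C * h(x) on (0,1); max of f* is e/(e-1), about 1.582

def f_star(x):
    """Target PDF f*(x) = exp(x)/(e-1) on (0,1)."""
    return math.exp(x) / (math.e - 1.0)

def ar_sample(rng):
    """Acceptance rejection sampling with the uniform(0,1) proposal h(x) = 1."""
    while True:
        x = rng.random()       # candidate from h(x)
        u = rng.random()       # uniform(0,1) helper for the accept decision
        if u < f_star(x) / C:  # accept with probability f*(x)/(c h(x))
            return x

rng = random.Random(0)
xs = [ar_sample(rng) for _ in range(100000)]
print((math.e - 1.0) * sum(xs) / len(xs))  # close to the true integral 1
```

Since c h(x) stays close to f^*(x), roughly 1/c \approx 62% of the candidates are accepted.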

5
Figure 4: Illustration of Accept/Reject a Sample with The Help of Random Number

Table 2: Algorithm of Acceptance Rejection Sampling

f(x) is the PDF of x, h(x) is the proposal PDF

a constant c > 0, s.t. f(x) \le c h(x) everywhere

for i = 1 : n
    u = 2
    do while u > \alpha(x') = f(x')/(c h(x'))
        draw x' from h(x)
        draw u from uniform(0, 1)
    end do
    x_i = x'
end for

(x_1, x_2, ..., x_n) comes from f(x)


Figure 5: Illustration of Acceptance Rejection Sampling for Example 1

2.3 Importance Sampling


Importance sampling is similar to acceptance rejection sampling in that it also needs a proposal distribution
h(x), and we draw samples of f(x) through h(x). The condition on h(x) is:

    Wherever f(x) > 0, h(x) is also greater than zero.

This implies that the support of h(x) must contain the support of f(x), which is a weaker condition
than the one for acceptance rejection sampling. In importance sampling, we do not
reject the samples from h(x) with some probability; instead, we give them corresponding
weights, hence it is easier to implement. Equation (16) shows the principle of importance sampling.

    E_f(g(x)) = \int_a^b g(x) f(x) \, dx
              = \int_a^b g(x) \frac{f(x)}{h(x)} h(x) \, dx
              = E_h\left( g(x) \frac{f(x)}{h(x)} \right)    (16)

\frac{f(x)}{h(x)} is the weight function; let w(x) = \frac{f(x)}{h(x)}, then

    E_f(g(x)) \approx \frac{1}{n} \sum_{i=1}^n g(x_i) w(x_i)    (17)

where (x_1, x_2, ..., x_n) are samples from h(x). When we use these samples for approximation, we
must take the weights of the samples into account at the same time. Table (3) shows the algorithm of
importance sampling. To get a better approximation, the proposal distribution h(x) should be as close to
f(x) as possible [Chen(2002)]; when they are equal, the weights are always 1 and it is the same as
drawing samples directly from f(x). If the proposal distribution is not chosen properly, most weights will be
close to zero and the efficiency of importance sampling will be very low.

Table 3: Algorithm of Importance Sampling

f(x) is the PDF of x, h(x) is the proposal PDF

Wherever f(x) > 0, h(x) > 0

for i = 1 : n
    draw x' from h(x)
    x_i = x'
    calculate weight: w_i = f(x_i)/h(x_i)
end for

(x_1, x_2, ..., x_n) and (w_1, w_2, ..., w_n) are samples and corresponding weights

Example 1 (ctd. 3) Still consider drawing samples from f^*(x) = \frac{\exp(x)}{e - 1} I_{(0,1)}(x). The shape of f^*(x)
is similar to the curve of a normal distribution, hence we can use a SemiNormal(1, 0.25) (half-normal)
distribution as the proposal distribution h(x):

    h(x) = \frac{4}{\sqrt{2\pi}} \exp(-2(x - 1)^2) I_{(-\infty,1)}(x)    (18)

Figure (6) shows the plots of the two functions.
The weight function is:

    w(x) = \frac{\sqrt{2\pi}}{4(e - 1)} \exp(x + 2(x - 1)^2) I_{(0,1)}(x)    (19)

When n = 1000, in one simulation, y \approx 0.989, which is still fairly close to the true value.
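Sampling the half-normal proposal of equation (18) and weighting as in equation (17) can be sketched in Python; drawing from h(x) by folding a normal draw about its mean is an implementation choice, not something prescribed by the tutorial:

```python
import math
import random

def f_star(x):
    """Target PDF f*(x) = exp(x)/(e-1) on (0,1); zero elsewhere."""
    return math.exp(x) / (math.e - 1.0) if 0.0 < x < 1.0 else 0.0

def h(x):
    """Half-normal proposal of equation (18):
    density (4/sqrt(2*pi)) * exp(-2*(x-1)^2) on (-inf, 1]."""
    return 4.0 / math.sqrt(2.0 * math.pi) * math.exp(-2.0 * (x - 1.0) ** 2)

def importance_estimate(n=100000, seed=0):
    """Equation (17) with g(x) = x: y = (e-1) * E_h(x * w(x))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1.0 - abs(rng.gauss(0.0, 0.5))  # draw from h by folding N(1, 0.25)
        total += x * f_star(x) / h(x)       # g(x_i) * w(x_i); w = 0 outside (0,1)
    return (math.e - 1.0) * total / n

print(importance_estimate())  # close to the true value 1
```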

Importance sampling can be used to draw samples from both univariate and multivariate
distributions; because of its simplicity, it is also widely used in solving problems for dynamic
systems [Chen et al.(2004)].

2.4 Markov Chain Monte Carlo


Markov chain Monte Carlo [Gamerman(1997)][Neal(1993)] is a sampling method which draws sam-
ples from a distribution f(x) by creating a Markov chain. To understand how Markov chain Monte
Carlo works, some basic concepts need to be known.

Definition 2.1 A sequence of random variables (x_0, x_1, ..., x_{k-1}, x_k, ...) forms a Markov chain if
P(x_k | x_{k-1}, x_{k-2}, ..., x_1, x_0) = P(x_k | x_{k-1}), k = 1, 2, ..., i.e., the probability of each random
variable depends only on the previous variable in the sequence.

The probability P_k(x_k | x_{k-1}) is called the transition probability distribution at time step k. A Markov
chain has stationary transition probabilities when the transition probability distributions are the same
for all time steps. For a Markov chain with stationary transition probabilities, we can drop the
subscript of the transition probability distribution and denote it as P(x|y).


Figure 6: Illustration of Importance Sampling for Example 1

Definition 2.2 For a Markov chain with a stationary transition probability distribution P(x|y),
if there exists a distribution \pi(x) such that \pi(x) = \int P(x|y) \pi(y) \, dy, then \pi(x) is a stationary
distribution of the Markov chain.

If the stationary distribution \pi(x) is unique, then

    E_\pi(g(x)) \approx \frac{1}{n} \sum_{i=1}^n g(x_i)    (20)

where (x_1, x_2, ..., x_n) forms a Markov chain with \pi(x) as its stationary distribution. In practice,
to eliminate the effect of x_0, we often use only the samples after some number of time steps for the
Monte Carlo approximation; the discarded samples are called burn-ins. Obviously, as long as we can
create a Markov chain with the desired stationary distribution, we can simply use the elements of the
Markov chain as samples from that distribution. This is the principle behind Markov chain Monte Carlo.
To construct such a Markov chain, two algorithms are widely used: Metropolis-Hastings
sampling and Gibbs sampling. They will be introduced separately in the following sections.

2.4.1 Metropolis-Hastings Sampling


In Metropolis-Hastings sampling, we need to choose a proposal distribution P(x|y) for creating the
Markov chain. This is a conditional distribution which depends on the last element in the Markov
chain. We draw a sample x' for time step k from the conditional distribution P(x|x_{k-1}),
and, similar to acceptance rejection sampling, we accept x' with the acceptance probability
\alpha(x', x_{k-1}), which is calculated as:

    \alpha(x', x_{k-1}) = \min\left\{ 1, \frac{f(x') P(x_{k-1}|x')}{f(x_{k-1}) P(x'|x_{k-1})} \right\}    (21)

So we also need random numbers from uniform(0, 1) to help make the decision, as illustrated
in figure (4). But unlike acceptance rejection sampling, if the sample is rejected, we do not draw

Table 4: Algorithm of Metropolis-Hastings Sampling

f(x) is the PDF of x, P(x|y) is the proposal PDF

for i = 1 : n
    draw x' from P(x|x_{i-1})
    draw u_i from uniform(0, 1)
    if u_i < \alpha(x', x_{i-1}) = \min\{1, f(x') P(x_{i-1}|x') / (f(x_{i-1}) P(x'|x_{i-1}))\}
        x_i = x'
    else
        x_i = x_{i-1}
    end if
end for

(x_1, x_2, ..., x_n) are samples from f(x)

a new sample; instead, we let x_k = x_{k-1} and move to the next time step. Table (4) shows the
algorithm of Metropolis-Hastings sampling.
One big issue in Metropolis-Hastings sampling is how to find a proper proposal distribution.
Based on the type of proposal distribution, there are different types of Markov chains. Among them,
the random walk chain and the independence chain are the most popular.

Random walk chain constructs the Markov chain in a manner similar to a random walk
process. We draw the sample for time step k as follows:

    x' = x_{k-1} + z    (22)

where the random number z comes from a distribution h(z). Hence, the proposal distribution is
P(x|y) = h(x - y). The distribution h(z) could be normal, uniform, Student's t, or another distribu-
tion. When we choose the normal distribution, this Markov chain is very similar to a Brownian motion
process except that there is an additional step to accept/reject the move of the particle.
When h(z) is symmetric about 0, the acceptance probability reduces to:

    \alpha(x', x_{k-1}) = \min\left\{ 1, \frac{f(x')}{f(x_{k-1})} \right\}    (23)

Interestingly, we can find a counterpart of the random walk chain in optimization: random
search optimization [Edgar et al.(2001)]. In random search optimization, the direction and length
of the step from one point to the next are randomly chosen. The new point is accepted only when it is
better than the previous point. In a random walk Markov chain, however, the new point is accepted
according to the acceptance probability. So the random walk chain is able to reach locations where
random search optimization will never go, and it is thus more capable of dealing with local optima
or multimodality. Moreover, random search optimization can only give us a search route and


Figure 7: First 50 Samples in Random Walk Chain for Example 2
Figure 8: Search Route of Random Search Optimization for Example 2

then use the final point as the estimate, while the random walk chain can provide the whole approximate
contour of the function, which contains much more information; the estimate
is also based on all the samples, and it is very easy to give an error bound. These two points are
also advantages of sampling methods in general over optimization methods. We can compare the
two methods in the following example.

Example 2 (ctd. 1) Let us draw samples by random walk chain from the bivariate normal dis-
tribution in example (2). The sample for time step k is drawn by:

    x' = x_{k-1} + z    (24)

where z also comes from a bivariate distribution. Here, we let:

    z \sim Normal\left( (0, 0)', \begin{pmatrix} 0.25 & 0 \\ 0 & 0.25 \end{pmatrix} \right)    (25)

i.e., the two components of z are independently normally distributed. Because z is symmetric about
(0, 0)', the acceptance probability is calculated as:

    \alpha(x', x_{k-1}) = \min\left\{ 1, \frac{f(x')}{f(x_{k-1})} \right\}    (26)

where f(x) is the function to be optimized [Tierney(1994)].


As for random search optimization, we let the direction be chosen uniformly from (0, 2\pi), the
step length be chosen uniformly from (0, 0.5), and only a better point be accepted.
Let the two methods both start from (4, 3)'; figures (7) and (8) show how the points in the
two methods move differently.
Generating a random walk chain of length 10000, treating the first 100 samples as burn-ins,
and then keeping one sample out of every five, we get the scatter plot and the histogram shown
in figures (9) and (10).
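The random walk chain of this example can be sketched in Python; the unnormalized log-density below encodes equation (9), and working with log-kernels rather than raw densities is an implementation choice for numerical stability, not part of the original tutorial:

```python
import math
import random

def log_f(x1, x2):
    """Log-kernel of the bivariate normal in equation (9):
    mean (-1, 1), covariance [[4, 1], [1, 4]]; the normalizer is dropped."""
    d1, d2 = x1 + 1.0, x2 - 1.0
    return -(2.0 * d1 * d1 - d1 * d2 + 2.0 * d2 * d2) / 15.0

def random_walk_chain(n, start=(4.0, 3.0), seed=0):
    """Random walk Metropolis: x' = x + z with z ~ Normal(0, 0.25 I)
    (equation (24)), accepted with prob min{1, f(x')/f(x)} (equation (26))."""
    rng = random.Random(seed)
    x1, x2 = start
    chain = []
    for _ in range(n):
        y1 = x1 + rng.gauss(0.0, 0.5)   # component std 0.5, variance 0.25
        y2 = x2 + rng.gauss(0.0, 0.5)
        if math.log(rng.random()) < log_f(y1, y2) - log_f(x1, x2):
            x1, x2 = y1, y2             # accept the move
        chain.append((x1, x2))          # on rejection, keep the old point
    return chain

chain = random_walk_chain(20000)
kept = chain[100::5]                    # drop 100 burn-ins, keep every fifth
print(sum(p[0] for p in kept) / len(kept),
      sum(p[1] for p in kept) / len(kept))  # sample means, near (-1, 1)
```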


Figure 9: Scatter Plot of Samples in Random Walk Chain for Example 2
Figure 10: Histogram of Samples in Random Walk Chain for Example 2

Independence chain uses a proposal distribution P(x|y) = P(x), i.e., it is not a conditional
distribution; the sample drawn from this distribution does not depend on the sample at the last
time step. However, this does not mean an independence chain is not a Markov chain, because
there is an additional accept/reject step and the acceptance probability still depends on the
sample at the last time step.
In an independence chain, the acceptance probability is calculated as:

    \alpha(x', x_{k-1}) = \min\left\{ 1, \frac{f(x') P(x_{k-1})}{f(x_{k-1}) P(x')} \right\}
                        = \min\left\{ 1, \frac{f(x')/P(x')}{f(x_{k-1})/P(x_{k-1})} \right\}    (27)

where f(x)/P(x) is calculated the same way as the weights in importance sampling [Tierney(1994)].
Hence, the probability of accepting a new sample depends on the ratio of the weights of the new sample
and the sample at the last time step.
The benefit of using an independence chain is that it is not likely to be trapped in one mode
due to insufficient random walk distance. But the choice of the proposal distribution is crucial to
the success of an independence chain, just as for importance sampling.

Example 2 (ctd. 2) Now we use an independence chain to draw samples from the bivariate normal
distribution in example (2).
We can choose Normal\left( (0, 0)', \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} \right) as the proposal distribution; then the acceptance
probability is calculated as:

    \alpha(x', x_{k-1}) = \min\left\{ 1, \frac{f(x')/P(x')}{f(x_{k-1})/P(x_{k-1})} \right\}    (28)

where f(x) is the function to be optimized and P(x) is the proposal distribution.
Let the chain start from (4, 3)'; figure (11) shows the first 50 samples in the inde-
pendence chain. As we can see, the jumps of the samples are much larger than in the random walk chain.
Generating an independence chain of length 10000, treating the first 100 samples as burn-ins,
and then keeping one sample out of every five, we get the scatter plot and the histogram shown
in figures (12) and (13).
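An independence chain for the same target only changes the proposal and the acceptance ratio. A Python sketch under the same assumptions as the random walk sketch (log-kernels, illustrative names):

```python
import math
import random

def log_f(x1, x2):
    """Log-kernel of the target: mean (-1, 1), covariance [[4, 1], [1, 4]]."""
    d1, d2 = x1 + 1.0, x2 - 1.0
    return -(2.0 * d1 * d1 - d1 * d2 + 2.0 * d2 * d2) / 15.0

def log_p(x1, x2):
    """Log-kernel of the fixed proposal Normal((0, 0)', 4 I)."""
    return -(x1 * x1 + x2 * x2) / 8.0

def independence_chain(n, start=(4.0, 3.0), seed=0):
    """Independence Metropolis-Hastings: proposals ignore the current state;
    accept with prob min{1, (f(x')/P(x')) / (f(x)/P(x))} (equation (28))."""
    rng = random.Random(seed)
    x1, x2 = start
    chain = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 2.0)        # candidate from the fixed proposal
        y2 = rng.gauss(0.0, 2.0)
        log_ratio = (log_f(y1, y2) - log_p(y1, y2)) \
                  - (log_f(x1, x2) - log_p(x1, x2))
        if math.log(rng.random()) < log_ratio:
            x1, x2 = y1, y2
        chain.append((x1, x2))
    return chain

kept = independence_chain(20000)[100::5]
print(sum(p[0] for p in kept) / len(kept),
      sum(p[1] for p in kept) / len(kept))  # near (-1, 1)
```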


Figure 11: First 50 Samples in Independence Chain for Example 2


Figure 12: Scatter Plot of Samples in Independence Chain for Example 2
Figure 13: Histogram of Samples in Independence Chain for Example 2

Table 5: Algorithm of Gibbs Sampling

f(x) is the PDF of x, x = (x^1, x^2, ..., x^m)' is m-dimensional

for i = 1 : n
    for j = 1 : m
        draw x_i^j from f(x^j | x_i^1, x_i^2, ..., x_i^{j-1}, x_{i-1}^{j+1}, ..., x_{i-1}^m)
    end for
end for

(x_1, x_2, ..., x_n) are samples from f(x)

Note: Besides the random walk chain and independence chain, we can use other types of chains,
such as the autoregressive chain, rejection sampling chain, grid-based chain, and so on [Tierney(1994)].

2.4.2 Gibbs Sampling


Gibbs sampling is a special type of Metropolis-Hastings sampling. It is extremely useful for sampling
from high-dimensional distributions. If x is m-dimensional, then in each time step Gibbs
sampling draws samples of the elements of x in sequence. The proposal distributions are the full
conditional distributions of each element of x. In this manner, the acceptance probability always equals
1, so samples are always accepted and we can ignore the accept/reject step. This is a
big advantage because we need not worry about the high rejection rate caused by updating all the
elements at the same time, as in Metropolis-Hastings sampling. Table (5) shows the algorithm of Gibbs
sampling.
Gibbs sampling also has a counterpart in optimization, called univariate search opti-
mization. In univariate search optimization [Edgar et al.(2001)], for a multivariate function to be
optimized, we search along one dimension at a time while fixing the others, sequentially.

Example 2 (ctd. 3) To use Gibbs sampling to draw samples from the bivariate normal distribution in
example (2), we need to derive the full conditional distributions of the two elements of x. In this
example, the conditional distributions are easy to get:

    x_1 | x_2 \sim Normal\left( -1 + \frac{x_2 - 1}{4}, \frac{15}{4} \right)    (29)

    x_2 | x_1 \sim Normal\left( 1 + \frac{x_1 + 1}{4}, \frac{15}{4} \right)    (30)

So we draw the two elements in sequence from the above distributions, each conditional on the newest
value of the other element.
As for univariate search, we jump in one dimension from the previous point to the mode of the
distribution given the value of the other dimension; it converges to the mode quickly.
Let the two methods both start from (4, 3)'; figures (14) and (15) show how the points in the
two methods move differently.
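The Gibbs sampler for this example alternates draws from equations (29) and (30); every draw is accepted, so the loop body contains no accept/reject test. A Python sketch (function names and sizes are illustrative):

```python
import math
import random

def gibbs_chain(n, start=(4.0, 3.0), seed=0):
    """Gibbs sampling for the bivariate normal of example 2, cycling through
    the full conditionals of equations (29) and (30)."""
    rng = random.Random(seed)
    sd = math.sqrt(15.0) / 2.0  # conditional standard deviation, sqrt(15/4)
    x1, x2 = start
    chain = []
    for _ in range(n):
        x1 = rng.gauss(-1.0 + (x2 - 1.0) / 4.0, sd)  # eq (29), newest x2
        x2 = rng.gauss(1.0 + (x1 + 1.0) / 4.0, sd)   # eq (30), newest x1
        chain.append((x1, x2))
    return chain

kept = gibbs_chain(10000)[100::5]
print(sum(p[0] for p in kept) / len(kept),
      sum(p[1] for p in kept) / len(kept))  # near the true mean (-1, 1)
```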


Figure 14: First 10 Samples in Gibbs Sampling for Example 2
Figure 15: Search Route of Univariate Search Optimization for Example 2


Figure 16: Scatter Plot of Samples in Gibbs Sampling for Example 2
Figure 17: Histogram of Samples in Gibbs Sampling for Example 2

Generating a Gibbs sampling chain of length 10000, treating the first 100 samples as burn-ins,
and then keeping one sample out of every five, we get the scatter plot and the histogram shown
in figures (16) and (17).

Note: In practice, it is often not easy to draw samples from the full conditional dis-
tributions. Some methods developed from acceptance rejection sampling,
called Adaptive Rejection Sampling (ARS) [Gilks et al.(1992)] and Adaptive Rejection Metropolis
Sampling (ARMS) [Gilks et al.(1995)], are widely used to assist Gibbs sampling.

References
[Chen(2002)] W.-s. Chen, 2002, Tutorial on Sequential Monte Carlo Sampling, Technical Report,
Department of Chemical & Biomolecular Engineering, Ohio State University.

[Chen et al.(2004)] W.-s. Chen, B. R. Bakshi, P. K. Goel, S. Ungarala, 2004, Bayesian Estimation of
Unconstrained Nonlinear Dynamic Systems via Sequential Monte Carlo Sampling, Industrial
and Engineering Chemistry Research, 43(14):4012-4025.

[Edgar et al.(2001)] T. F. Edgar, D. M. Himmelblau, L. S. Lasdon, 2001, Optimization of Chemical
Processes, second edition, McGraw-Hill.

[Gamerman(1997)] D. Gamerman, 1997, Markov Chain Monte Carlo, Chapman & Hall.

[Gilks et al.(1995)] W. R. Gilks, N. G. Best, K. K. C. Tan, 1995, Adaptive Rejection Metropolis
Sampling within Gibbs Sampling, Applied Statistics, 44(4):455-472.

[Gilks et al.(1992)] W. R. Gilks, P. Wild, 1992, Adaptive Rejection Sampling for Gibbs Sampling,
Applied Statistics, 41(2):337-348.

[Neal(1993)] R. M. Neal, 1993, Probabilistic Inference Using Markov Chain Monte Carlo Methods,
Technical Report, Department of Computer Science, University of Toronto.

[Ritter et al.(1992)] C. Ritter, M. A. Tanner, 1992, Facilitating the Gibbs Sampler: The Gibbs
Stopper and the Griddy-Gibbs Sampler, Journal of the American Statistical Association,
87(419):861-868.

[Tierney(1994)] L. Tierney, 1994, Markov Chains for Exploring Posterior Distributions, The Annals
of Statistics, 22(4):1701-1762.

