Beruflich Dokumente
Kultur Dokumente
RonaldCools
DirkNuyens Editors
Monte Carlo
and QuasiMonte Carlo
Methods
MCQMC, Leuven, Belgium, April 2014
Editors
123
Editors
Ronald Cools
Department of Computer Science
KU Leuven
Heverlee
Belgium
Dirk Nuyens
Department of Computer Science
KU Leuven
Heverlee
Belgium
ISSN 2194-1009
ISSN 2194-1017 (electronic)
Springer Proceedings in Mathematics & Statistics
ISBN 978-3-319-33505-6
ISBN 978-3-319-33507-0 (eBook)
DOI 10.1007/978-3-319-33507-0
Library of Congress Control Number: 2016937963
Mathematics Subject Classication (2010): 11K45, 11K38, 65-06, 65C05, 65D30, 65D18, 65C30,
65C35, 65C40, 91G60
Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microlms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specic statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Preface
vi
Preface
This conference continued the tradition of biennial MCQMC conferences initiated by Harald Niederreiter, held previously at the following places:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
The next conference will be held at Stanford University, USA, in August 2016.
The proceedings of these previous conferences were all published by
Springer-Verlag, under the following titles:
Monte Carlo and Quasi-Monte Carlo Methods in Scientic Computing
(H. Niederreiter and P.J.-S. Shiue, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 1996 (H. Niederreiter,
P. Hellekalek, G. Larcher and P. Zinterhof, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 1998 (H. Niederreiter and
J. Spanier, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2000 (K.-T. Fang,
F.J. Hickernell and H. Niederreiter, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2002 (H. Niederreiter, ed.)
Monte Carlo and Quasi-Monte Carlo Methods 2004 (H. Niederreiter and
D. Talay, eds.)
Preface
vii
Monte Carlo and Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and
H. Niederreiter, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2008 (P. LEcuyer and A. Owen,
eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota and
H. Woniakowski, eds.)
Monte Carlo and Quasi-Monte Carlo Methods 2012 (J. Dick, F.Y. Kuo,
G.W. Peters and I.H. Sloan, eds.)
The program of the conference was rich and varied with 207 talks. Highlights
were the invited plenary talks, the tutorials and a public lecture. The plenary talks
were given by Steffen Dereich (Germany, Westflische Wilhelms-Universitt
Mnster), Peter Glynn (USA, Stanford University), Wenzel Jakob (Switzerland,
ETH Zrich), Makoto Matsumoto (Japan, Hiroshima University), Harald
Niederreiter (Austria, Austrian Academy of Sciences), Erich Novak (Germany,
Friedrich-Schiller-Universitt Jena), Christian Robert (France, Universit
Paris-Dauphine and UK, University of Warwick) and Raul Tempone (Saudi Arabia,
King Abdullah University of Science and Technology). The tutorials were given by
Mike Giles (UK, Oxford University) and Art Owen (USA, Stanford University),
and the public lecture was by Jos Leys.
The papers in this volume were carefully refereed and cover both theory and
applications of Monte Carlo and quasi-Monte Carlo methods. We thank the
reviewers for their extensive reports.
We gratefully acknowledge nancial support from the KU Leuven, the city of
Leuven, the US National Science Foundation and the FWO Scientic Research
Community Stochastic Modelling with Applications in Financial Markets.
Leuven
December 2015
Ronald Cools
Dirk Nuyens
Contents
Part I
Invited Papers
29
87
Contributed Papers
ix
Contents
Contents
xi
List of Participants
xiv
List of Participants
List of Participants
xv
xvi
List of Participants
List of Participants
xvii
xviii
List of Participants
Part I
Invited Papers
1 Introduction
The numerical computation of expectations E[G(X )] for solutions X = (X t )t[0,T ]
of stochastic differential equations (SDE) is a classical problem in stochastic analysis and numerous numerical schemes were developed and analysed within the last
twenty to thirty years, see for instance the textbooks by Kloeden and Platen [19]
and Glasserman [15]. Recently, a new very efficient class of Monte Carlo algorithms
was introduced by Giles [14], see also Heinrich [17] for an earlier variant of the
computational concept. Central to these multilevel Monte Carlo algorithms is the
use of whole hierarchies of approximations in numerical simulations.
S. Dereich (B) S. Li
Institut Fr Mathematische Statistik, Westflische Wilhelms-Universitt Mnster,
Orlans-Ring 10, 48149 Mnster, Germany
e-mail: steffen.dereich@wwu.de
S. Li
e-mail: li.sangmeng@wwu.de
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_1
S. Dereich and S. Li
that G(x) depends on the marginals, integrals and/or supremum of the path x D(R).
Before we state the results we introduce the underlying numerical schemes.
(s,Ys ) ,
where denotes the Dirac delta function and xt = xt xt for x D(R) and
t (0, T ]. It has intensity (0,T ] , where (0,T ] denotes Lebesgue measure on
(0, T ]. Further, let be the compensated variant of that is the random signed
measure on (0, T ] (R\{0}) given by
= (0,T ] .
The process (Yt )t[0,T ] admits the representation
Yt = bt + Wt + lim
0
(0,t]B(0,)c
x d (s, x),
(2)
= bt + Wt +
(0,t]B(0,h)c
x d (s, x),
(3)
S. Dereich and S. Li
and the latter one by the (compensated) small jumps only, that is
Yth = lim
0
(0,t](B(0,h)\B(0,))
x d (s, x).
(4)
We apply an Euler scheme with two sets of update times for the coefficient. We
enumerate the times
in increasing order and consider the Euler approximation X h,, = ( X th,, )t[0,T ]
given as the unique process with X 0h,, = x0 that is piecewise constant on [Tn1 , Tn )
and satisfies
h
h
= X Th,,
+ a( X Th,,
) (YThn YThn1 ) + 1 Z (Tn ) a( X Th,,
X Th,,
) (Y T Y T ),
n
n
n
n1
n1
n
(5)
for n = 1, 2, . . . . Note that the coefficient in front of (Yth ) is updated at all times in
{T0 , T1 , . . . } and the coefficient in front of (Yth ) at all times in {0, , 2 , . . . , T }
{T0 , T1 , . . . }. Hence two kinds of updates are used and we will consider schemes
where in the limit the second kind is in number negligible to the first kind. The parameter h serves as a threshold for jumps being considered large that entail immediate
updates on the fine scale. The parameters and control the regular updates on the
fine and coarse scale.
We call X h,, piecewise constant approximation with parameter (h, , ). We
will also work with the continuous approximation X h,, = (X th,, )t[0,T ] defined
for n = 1, 2, . . . and t [Tn1 , Tn ) by
+ a( X Th,,
)(Yth YThn1 ).
X th,, = X Th,,
n1
n1
Note that for this approximation the evolution Y h takes effect continuously.
and denote by
S(G) the random output that is obtained when estimating the individual
expectations E[G(X 1 )], E[G(X 2 ) G(X 1 )], . . . , E[G(X L ) G(X L1 )] independently by classical Monte-Carlo with n 1 , . . . , n L iterations and summing up the
individual estimates. More explicitly, a multilevel scheme
S associates to each measurable G a random variable
nk
n1
L
1
1
G(X k,i, f ) G(X k1,i,c ) ,
G(X 1,i ) +
S(G) =
n 1 i=1
n
k=2 k i=1
(6)
where the pairs of random variables (X k,i, f , X k1,i,c ), resp. the random variables X 1,i ,
appearing in the sums are all independent with identical distribution as (X k , X k1 ),
resp. X 1 . Note that the entries of the pairs are not independent of each other and the
superscript f and c refer to the fine and coarse simulation, respectively!
We note that (ML3) and (ML4) are conditions that entail that our approximations
have the same quality as the ones that one obtains when doing adapted Euler with
update times {T0 , T1 , . . . }. Condition (ML2) implies that the number of updates
caused by large jumps is negligible in comparison to the regular updates at times in
N0 [0, T ]. This will be in line with our examples and entails that the error process
is of a particularly simple form.
Let (X k : k N) be a family of path approximation for X depending on ((h k , k ,
k ) : k N) and assume that is a parameter greater or equal to 1/2 such that
S. Dereich and S. Li
lim n E G(X n ) G(X ) = 0.
(7)
The maximal level and iteration numbers: We specify the family of multilevel
schemes. For each (0, 1) we denote by
S the multilevel scheme which has
maximal level
log 1
L() =
log M
and iteration numbers
n k () = 2 L() k1
for k = 1, 2, . . . , L().
The error process: The error estimate will make use of an additional process
which can be interpreted as idealised description of the difference between two
consecutive levels, the so called error process. We equip the points (s, Ys ) of the
Poisson point process with two independent marks s2 and s , the former one being
} and the latter one being standard normal.
uniformly distributed on {0, M1 , . . . , M1
M
The error process U = (Ut )t[0,T ] is defined as the solution of the integral equation
Ut =
1
1
2
M
a (X s )Us dYs +
0
+
s s (aa )(X s ) Ys ,
2
(8)
s(0,t]:Ys =0
1
1
1
Bt + lim
s s Ys
0
2
M
s(0,t]:|Y |
s
a (X s )Us dYs +
(aa )(X s ) dZ s .
Central limit theorem: We cite an error estimate from [12]. We assume as before
that the driving process Y is a square integrable Lvy process and that the coefficient a
is a continuously differentiable Lipschitz function. Additionally we assume that 2
is strictly positive.
Suppose that G : D(R) R is of the form
G(x) = g(Ax)
lim 2 E (
S (G) E[G(X )])2 = 2 .
0
Remark 1 1. The theorem is a combination of Theorem 1.6, 1.8, 1.9, 1.10 of [12].
2. One of the assumptions requires a control on the bias, see (7). We note that the
assumptions imposed on G in the theorem imply validity of (7) for = 21 . In
general, research on weak approximation of SDEs suggests that (7) is typically
valid for < 1, see [3] for a corresponding result concerning diffusions.
3. If
T
xs ds ,
Ax = x T ,
0
then the statement of the previous theorem remains true for the multilevel scheme
based on piecewise constant approximations with the same terms appearing in
the limit.
4. For k = 1, 2, . . . the expected number of Euler steps to generate X k (at the
discrete time skeleton of update times) is T (k1 + (B(0, h k )c )). Taking as cost
for a joint simulation of G(X k ) G(X k1 ) the expected number of Euler steps
we assign one simulation of
S (G) the cost
10
S. Dereich and S. Li
T 11 + T (B(0, h 1 )c + T
L()
1
n k ()(k1 + k1
+ (B(0, h k )c + (B(0, h k1 )c )
k=2
T (M + 1) 2
2
(log 1/)2 ,
(log M)2
5.
6.
7.
8.
Ajx =
and generally suppose that a (X s )Ys = 1 for all s [0, T ], almost surely. Then
there exists a constant depending on G and the underlying SDE, but not on M such
that the variance 2 satisfies
=
1
1
.
2
M
11
where is a measure on R\{0} with x 2 ( dx) < . In practise, the measure is
given and we need an effective algorithm for sampling from F.
(10)
This approximation converges fast to the cdf, provided that satisfies certain assumptions. We cite an error estimate from [8].
Theorem 3 Suppose there exist positive reals d , d+ such that
is analytic in the space {z C; im(z) (d , d+ )},
d
d+ |(u + i y)| dy 0, as u ,
:= lim0 R |(u i(d ))| du < +.
If there exist constants , c, > 0 such that,
|(z)| exp(c|z| ), for z R,
then
|G(x) F,K (x)|
for x R.
e2d /xd
e2d+ /+xd+
+
+
2d
/
2 d (1 e
)
2 d+ (1 e2d+ / )
4
1
+
ec(K )
+
2 K
c(K )
12
S. Dereich and S. Li
xmax xmin
and yk = re(F,K (xk ))
N
and, for each k = 1, . . . , N , the unique parabola pk that coincides with re F,K in
the points xk1 , (xk1 + xk )/2, xk . We suppose that F is strictly increasing and note
that by choosing a sufficiently accurate approximation F,K we can guarantee that
each parabola pk is strictly increasing on [xk1 , xk ] and thus has a unique inverse
pk1 when restricted to the domain [xk1 , xk ].
We assume that N is of the form 2d+1 1 with d N and arrange the N entries
y0 , . . . , y N in a binary search tree of depth d.
Sampling: Sampling is achieved by carrying out the following steps:
generation of an on [y0 , y N ] uniformly distributed random number u,
identification of an index k {1, . . . , N } with u [yk1 , yk ] based on the binary
search tree,
output of pk1 (u).
+
h,
( dx) =
(11)
This class of processes has a scaling property similar to stable processes which is
particularly useful in simulations. It will allow us to do the precomputation for one
infinitely divisible distribution only and use the scaling property to do simulations
of different levels. For applications of truncated stable processes, we refer the reader
to [23, 26].
13
3.1 Preliminaries
Proposition 1 Let > 0 and (Yt ) be a truncated stable process with parameters
, h, c+ , c . The process (Yt/ ) is a truncated stable process with parameters
, h, c+ , c .
Proof The process (Yt/ ) is a Lvy process with
E[e
i zxYt/
t
c+
c
(eizx izx 1) 1(0,h] (x) 1+ + 1[h,0) (x) 1+ dx
|x|
|x|
c+
c
= exp t (ei zy i zy 1) 1(0,h] (y) 1+ + 1[h,0) (x) 1+ dy
|y|
|y|
] = exp
for t 0 and z R.
t i zh
i zh
c
e
+
c
e
(c
+
c
)
(c
c
)i
zh
+
h
t
i z c+ ei zh c ei zh (c+ c )
1
( 1)h
th 2
z 2 c+ 1 F1 (2 , 3 , i zh)
( 1)(2 )
+ c 1 F1 (2 , 3 , i zh) ,
z
F
(2
,
3
,
i
zh)
+
F
(2
,
3
,
i
zh))
(
1
1
1
1
( 1)(2 )h 2
Proof It suffices to prove the statement for c+ = 1 and c = 0. All other cases can
be deduced from this case via scaling, reflection and superposition. Recall that
h
1
(ei zx 1 i zx) 1+ dx .
E[ei zYt ] = exp t
x
0
14
S. Dereich and S. Li
(ei zx 1 i zx)
1
x 1+
1
1 h
i z h i zx
1
dx = (ei zx 1 i zx)
+
(e 1) dx
x 0+
0
x
(e
i zx
ei zx 1i zx
x
(12)
= 0. Doing an additional partial
h
1
1
1
1 h
iz
i zx
(e 1) 1
1) dx =
+
ei zx 1 dx
0+
x
1
x
1 0
x
h
1
cos(zx)
1
i
z
(ei zh 1) 1 +
=
dx
(13)
1
h
1 0 x 1
h
z
sin(zx)
dx.
1 0 x 1
h
0
1
cos(zx)
cos(zhx)
2
dx = h
dx
1
x
x 1
0
h 2
=
(1 F1 (2 , 3 , i zh) + 1 F1 (2 , 3 , i zh))
2(2 )
and
h
0
1
sin(zx)
sin(zhx)
2
dx
=
h
dx
1
x
x 1
0
i h 2
=
(1 F1 (2 , 3 , i zh) 1 F1 (2 , 3 , i zh)).
2(2 )
Inserting this into (13) and then inserting the result into (12) finishes the proof.
R\{0}
(ei zx i zx 1) ( dx) .
Then the assumptions of Theorem 3 are satisfied for arbitrary d = d+ = d > 0 and
one has
( 1)
c+ + c
h
|z| + 2
|(z)| exp (c+ + c ) () sin
2
15
2
,
1 e 2 |d| 2 1 +
2
where
1 := exp
1
2
ehd d 2
c+ + c 2
c+ + c hd
h
h e
and 2 := edh > 0.
+2
2
Proof Fix d > 0 and take z = u + i y with u R and y [d, d]. Using that
ei(u+i y)x 1 i(u + i y)x = eiuxyx 1 iux + yx
= eyx (eiux 1 iux) + eyx 1 + yx + iux(eyx 1)
(eyx 1 + yx)( dx)
e (e 1 iux)( dx) exp
exp
iux(eyx 1)( dx) =: 1 (z) 2 (z) 3 (z).
yx
iux
1 | | 2
e , for R.
2
1
2
ehd d 2
1
c+ + c 2
h
.
x 2 ( dx) = exp ehd d 2
2
2
Finally we estimate 1 (z). Note that Re(eiux 1 iux) 0 and eyx edh if
|x| h. Hence,
Re eyx (eiux 1 iux) ( dx) edh Re(eiux 1 iux) ( dx)
so that
dh
|1 (z)| exp e
Re(eiux 1 iux) ( dx) .
16
S. Dereich and S. Li
c+ +c
1[h,h] (x) |x|(1+)
2
dx we have
Re(eiux 1 iux) ( dx) = (eiux 1 iux) ( dx)
c+ + c (1+)
c+ + c (1+)
|x|
|x|
dx
(eiux 1 iux)
dx
= (eiux 1 iux)
c
2
2
[h,h]
=|u| (symm. stable)
|u| + 2
c+ + c
h ,
1
2
ehd d 2
(14)
c+ + c 2
c+ + c hd
h
h e
+2
2
2 := edh > 0.
Equation (14) implies that all assumptions of Theorem 3 are satisfied. If additionally
the imaginary part y of z is zero, then 2 (z) = 3 (z) = 1 and using the estimate for
1 gives that
c+ + c
h
.
|(z)| exp |z| + 2
|u|+|d|
2
2
|(u + i(u ))| du 1 e2 (u +|d| ) 2 du 1 e2 ( 2 ) du
R
R
R
2
1 e 2 (|u| +|d| ) du
R
2 |u|
2 |u|
2 |d|
1 e 2
e 2 du +
e 2 du
B(0,1)
2
2+1
,
1 e 2 |d| 2 +
2
B(0,1)c
and letting 0 we get that indeed + satisfies the inequality of the statement. A
similar computation shows that also satisfies the same inequality.
17
+
( dx) =
( dx) = H,
(15)
g
h
T, )
(16)
for k N.
Proposition 4 If the parameters ((h k , k , k ) : k N) are chosen as above, then
properties (ML1)(ML4) are satisfied.
Proof One has for k
(B(0, h k )c ) k
M k
c+ + c
k h k
0
=
log2 k 0,
k
M k
M 2/
since M 2/ > M. Hence, (ML2) and (ML4) are satisfied. Property (ML3) follows
analogously
k
1 c+ + c 2 k
1
hk
x 2 ( dx) log2 1 + =
log2 1 +
k B(0,h k )
k
2
k
k
M 2k/
log2 k
M k
18
S. Dereich and S. Li
As we show in the next proposition the fact that h k /k 1/ is constant in k allows
us to do the sampling of the process constituted by the small increments with the
help of only one additional distribution for which we have to do a precomputation
in advance.
Proposition 5 Suppose that is a real random variable with
c+ 1(0,1] (x) + c 1[1,0) (x)
(ei zx 1 i zx)
dx
.
E[ei z ] = exp
x 1+
(0,1]
For every k N the increments of (Yth k )t0 over intervals of length k are independent
and distributed as h k .
Proof We note that the increments of (Yt1 ) over intervals of length one are equally
1. (Yt
2. (Yt
h k+1
) on the set of times k+1
Z [0, T ]
we can compute ( X tk+1 ) via the Euler update rule (5). The increments of the process
in (2) are independent and distributed as h k+1 so that the simulation is straightforward. To simulate the process in (1) we first simulate the random set of discontinuities
{(s, Ys ) : s (0, T ], |Ys | h k+1 } =: {(S1 , D1 ), (S2 , D2 ), . . . }.
Here the points are ordered in such a way that S1 , S2 , . . . is increasing. These points
constitute a Poisson point process with intensity |(0,T ] | B(0,h k+1 )c . When considering an infinite time horizon the random variables ((Sk Sk1 , Dk ) : k N) (with
S0 = 0) are independent and identically distributed with both components being
independent of each other, the first being exponentially distributed with parameter
({x : |x| h k+1 }) and the second having distribution
1{|x|h k+1 }
( dx).
({|x| h k+1 })
19
YTkk+1 = BTk +
Dk + Tk b
i:Si Tk
|x|h k+1
x ( dx)
Di t
i:Si t
|x|(h k+1 ,h k ]
x ( dx)
to generate (Yth k ) on k Z [0, T ]. To generate the former process we note that since
k k+1 N the set {T0 , T1 , . . . } is a subset of {T0 , T1 , . . . } so that we can use that
YThk
k
= B +
Tk
Dk + Tk b
|x|h k
x ( dx) .
We stress that all integrals can be made explicit due to the particular choice of .
20
S. Dereich and S. Li
In Sect. 3.3.2 we optimise over the multiplier M appearing in the multilevel scheme.
The answer depends on a parameter that depends in a subtle way on the underlying
problem and the implementation. We conduct tests for a volatility model.
In Sect. 3.3.3 we numerically analyse the error and runtime of the multilevel
schemes for the volatility model introduced there.
3.3.1
|x y| p d(x, y)
1/ p
: is a coupling of and
for two distributions and . For further details concerning the Wasserstein metric we refer the reader to [13, 29]. For real distributions optimal couplings can
be given in terms of quantiles which leads to an alternative representation for the
Wasserstein metric that is particularly suited for explicit computations. We denote
by F : (0, 1) R the generalised right continuous inverse of the cdf F of , that
is
F (u) = inf{t R : F (t) u},
and we use analogous notation for the measure replaced by . One has
W p (, ) =
0
|F (u) F (u)| p du
1/ p
(17)
W2
d
d
d
d
= 11
=9
=7
=5
W2
d
d
d
d
= 11
=9
=7
=5
W2
d
d
d
d
= 11
=9
=7
=5
21
= 1.2
with 106 iterations. Preliminary numerical tests showed that for the following parameters the approximate distribution function has an error of about the machine precision
for reals on the supporting points of the distribution function: K = 400, = 0.02
and
11, if = 1.2,
xmin = xmax = 13, if = 1.5,
20, if = 1.8.
Since these parameters only effect the precomputation we choose them as above and
only vary d (and N ) in the following test that is depicted in Table 1. There the term
following is twice the estimated standard deviation. The results show that one
achieves machine precision for about d = 9.
3.3.2
When generating a pair of levels (X k , X k+1 ) we need to carry out an expected number
of T /k+1 + T (B(0, h k+1 )) Euler steps for the simulation of X k+1 and an expected
number of T /k + T (B(0, h k )) Euler steps for the simulation of X k . By assumption
(B(0, h k+1 )) = o(k1 ) is asymptotically negligible and hence it is natural to assign
one simulation of G(X k+1 ) G(X k ) the cost
T /k+1 + T /k = (M + 1)T /k .
22
S. Dereich and S. Li
A corresponding minimisation of the parameter M is carried out in [14, Sect. 4.1] for
diffusions. The number of Euler steps is only an approximation to the real runtime
caused by the algorithm. In general, the dominating cost is caused by computations
1
and we make the Ansatz that the computational cost for
being of order k1 and k+1
k+1
one simulation of G(X ) G(X k ) is
Ck = (1 + o(1))cost (M + )/k ,
(18)
where cost and are positive constants that do not depend on the choice of M. The
case where one restricts attention to the number of Euler steps is the one with = 1.
We note that for the numerical schemes as in Theorem 2 one has for F as in the
latter theorem
2
S (G) E[G(X )]) N 0, err
(1
1 (
1
)
M
2
(M 1)(M + ) 2
cost err
(log 1 )2 .
2
M(log M)2
dX t =
dt =
1
1
(X t )t dWt + 10
10
1
1
10
dt + 10
dYt ,
23
dt
(19)
0.2941
0.3102
0.3401
0.3030
0.3049
0.3053
0.3215
0. 3039
0.3286
0.3029
0.3169
0.3162
0.3094
0.3238
0.3153
0.3002
0.3187
0.3169
0.3087
0.3386
0.3217
0.3110
0.3220
0.3169
0.3574
0.3590
0.3478
0.3568
0.3573
0.3594
0.3582
0.3576
0.3545
0.3573
0.3562
0.3592
0.3581
0.3595
0.3591
0.3481
0.3581
0.3599
0.3588
0.3604
0.3656
0.3610
0.3563
0.3600
24
S. Dereich and S. Li
25
noticed that varies strongly with the implementation and the choice of the stochastic
differential equation. In most tests we observed to be between 0.2 and 0.6.
3.3.3
In this section we numerically test the error of our multilevel schemes in the volatility
model (19). We adopt the same setting as described in the lines following (19). Further
1
we choose M = 4 and M = M 3 + 3 in the calibration of the scheme.
26
S. Dereich and S. Li
References
1. Applebaum, D.: Lvy processes and stochastic calculus. Cambridge Studies in Advanced Mathematics, vol. 116. Cambridge University Press, Cambridge (2009)
2. Asmussen, S., Rosinski, J.: Approximations of small jumps of Lvy processes with a view
towards simulation. J. Appl. Probab. 38(2), 482493 (2001)
3. Bally, V., Talay, D.: The law of the Euler scheme for stochastic differential equations. I. Convergence rate of the distribution function. Probab. Theory Relat. Fields 104(1), 4360 (1996)
4. Becker, M.: Exact simulation of final, minimal and maximal values of Brownian motion and
jump-diffusions with applications to option pricing. Comput. Manag. Sci. 7(1), 117 (2010)
5. Ben Alaya, M., Kebaier, A.: Central limit theorem for the multilevel Monte Carlo Euler method.
Ann. Appl. Probab. 25(1), 211234 (2015)
6. Bertoin, J.: Lvy Processes. Cambridge University Press, Cambridge (1996)
7. Bruti-Liberati, N., Nikitopoulos-Sklibosios, C., Platen, E.: First order strong approximations
of jump diffusions. Monte Carlo Methods Appl. 12(34), 191209 (2006)
8. Chen, Z.S., Feng, L.M., Lin, X.: Simulating Lvy processes from their characteristic functions
and financial applications. ACM Trans. Model. Comput. Simul. 22(3), 14 (2012)
9. Dereich, S.: The coding complexity of diffusion processes under supremum norm distortion.
Stoch. Process. Appl. 118(6), 917937 (2008)
10. Dereich, S.: Multilevel Monte Carlo algorithms for Lvy-driven SDEs with Gaussian correction. Ann. Appl. Probab. 21(1), 283311 (2011)
11. Dereich, S., Heidenreich, F.: A multilevel Monte Carlo algorithm for Lvy-driven stochastic
differential equations. Stoch. Process. Appl. 121(7), 15651587 (2011)
12. Dereich, S., Li, S.: Multilevel Monte Carlo for Lvy-driven SDEs: central limit theorems for
adaptive Euler schemes. Ann. Appl. Probab. 26(1), 136185 (2016)
13. Dobrushin, R.L.: Prescribing a system of random variables by conditional distributions. Theory
Probab. Appl. 15(3), 458486 (1970)
14. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607617 (2008)
15. Glasserman, P.: Monte Carlo methods in financial engineering. Applications of Mathematics
(New York). Stochastic Modelling and Applied Probability, vol. 53. Springer, New York (2004)
16. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic, New York
(1980)
17. Heinrich, S.: Multilevel Monte Carlo methods. Lect. Notes Comput. Sci. 2179, 5867 (2001)
18. Jacod, J., Kurtz, T.G., Mlard, S., Protter, P.: The approximate Euler method for Lvy driven
stochastic differential equations. Ann. Inst. H. Poincar Probab. Statist. 41(3), 523558 (2005).
doi:10.1016/j.anihpb.2004.01.007
19. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations, Applications
of Mathematics (New York), vol. 23. Springer, Berlin (1992)
20. Kohatsu-Higa, A., Tankov, P.: Jump-adapted discretization schemes for Lvy-driven SDEs.
Stoch. Process. Appl. 120(11), 22582285 (2010)
21. Li, S.: Multilevel Monte Carlo simulation for stochastic differential equations driven by Lvy
processes. Ph.D. dissertation, Westflische Wilhelms-Universitt (2015)
27
22. Maghsoodi, Y.: Mean square efficient numerical solution of jump-diffusion stochastic differential equations. Sankhya Ser. A 58(1), 2547 (1996)
23. Menn, C., Rachev, S.T.: Smoothly truncated stable distributions, GARCH-models, and option
pricing. Math. Methods Oper. Res. 69(3), 411438 (2009)
24. Mordecki, E., Szepessy, A., Tempone, R., Zouraris, G.E.: Adaptive weak approximation of
diffusions with jumps. SIAM J. Numer. Anal. 46(4), 17321768 (2008)
25. Platen, E.: An approximation method for a class of It processes with jump component. Litovsk.
Mat. Sb. 22(2), 124136 (1982)
26. Quek, T., De La Roche, G., Gven, I., Kountouris, M.: Small Cell Networks: Deployment,
PHY Techniques, and Resource Management. Cambridge University Press, Cambridge (2013)
27. Rubenthaler, S.: Numerical simulation of the solution of a stochastic differential equation driven
by a Lvy process. Stoch. Process. Appl. 103(2), 311349 (2003)
28. Sato, K.: Lvy processes and infinitely divisible distributions. Cambridge Studies in Advanced
Mathematics, vol. 68. Cambridge University Press, Cambridge (1999)
29. Vasershtein, L.N.: Markov processes over denumerable products of spaces describing large
system of automata. Problemy Peredaci Informacii 5(3), 6472 (1969)
Abstract A formal mean square error expansion (MSE) is derived for Euler
Maruyama numerical solutions of stochastic differential equations (SDE). The error
expansion is used to construct a pathwise, a posteriori, adaptive time-stepping Euler
Maruyama algorithm for numerical solutions of SDE, and the resulting algorithm is
incorporated into a multilevel Monte Carlo (MLMC) algorithm for weak approximations of SDE. This gives an efficient MSE adaptive MLMC algorithm for handling
a number of low-regularity approximation problems. In low-regularity numerical
example problems, the developed adaptive MLMC algorithm is shown to outperform
the uniform time-stepping MLMC algorithm by orders of magnitude, producing output whose error with high probability is bounded by TOL > 0 at the near-optimal
MLMC cost rate O TOL2 log(TOL)4 that is achieved when the cost of sample
generation is O(1).
Keywords Multilevel monte carlo Stochastic differential equations Euler
Maruyama method Adaptive methods A posteriori error estimation Adjoints
1 Introduction
SDE models are frequently applied in mathematical finance [12, 28, 29], where an
observable may, for example, represent the payoff of an option. SDE are also used
to model the dynamics of multiscale physical, chemical or biochemical systems
H. Hoel (B)
Department of Mathematics, University of Oslo, P.O. Box 1053,
0316 Blindern, Oslo, Norway
e-mail: haakonah@math.uio.no
H. Hoel J. Hppl R. Tempone
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division,
King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
e-mail: juho.happola@kaust.edu.sa
R. Tempone
e-mail: raul.tempone@kaust.edu.sa
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_2
29
30
H. Hoel et al.
[11, 25, 30, 32], where, for instance, concentrations, temperature and energy may
be sought observables.
Given a filtered, complete probability space (, F , (Ft )0tT , P), we consider
the It SDE
dXt = a(t, Xt )dt + b(t, Xt )dWt ,
t (0, T ],
X0 = x0 ,
(1)
A := A A
inf
AA
A{
| AA}
P A = 0 .
The contributions of this work are twofold. First, an a posteriori adaptive timestepping algorithm for computing numerical realizations of SDEs using the Euler
Maruyama method is developed. And second, for a given observable g : Rd1 R,
we construct a mean square error (MSE) adaptive time-stepping multilevel
Monte
Carlo (MLMC) algorithm for approximating the expected value, E g(XT ) , under
the following constraint:
P E g(XT) A TOL 1 .
(2)
31
Sect. 4, we compare the performance of the MSE adaptive and uniform MLMC
algorithms in a couple of numerical examples, one of which is a low-regularity SDE
problem. Finally, we present brief conclusions followed by technical proofs and the
extension of the main result to higher-dimensional problems in the appendices.
(3)
Regarding ordinary differential equations (ODE), the theory for numerical integrators
of different orders for scalar SDE is vast. Provided sufficient regularity, higher order
integrators generally yield higher convergence rates [22]. With MC methods it is
straightforward
to determine that the goal (2) is fulfilled at the computational cost
O TOL21/ , where 0 denotes the weak convergence rate of the numerical
method, as defined in Eq. (5).
As a method of temporal discretization, the EulerMaruyama scheme is given by
X tn+1 = X tn + a(tn , X tn )tn + b(tn , X tn )Wn ,
X 0 = x0 ,
(4)
(5)
32
H. Hoel et al.
(6)
cf. [22]. For multi-dimensional SDE problems, higher order schemes are generally
less applicable, as either the diffusion coefficient matrix has to fulfill a rigid commutativity condition, or Levy areas, required in higher order numerical schemes, have to
be accurately approximated to achieve better convergence rates than those obtained
with the EulerMaruyama method [22].
1.2.1
MLMC Notation
M
L
gm
=0 m=1
M
(7)
where
{0}
g X T ,
if = 0,
{1}
{}
gm :=
g X
, otherwise.
m,T g X m,T
Here, the positive integer, L, denotes the final level of the estimator, M is the number
{}
{1}
of sample realizations on the th level, and the realization pair, X m,T and X m,T , are
copies of the by the EulerMaruyama method (4) approximations of the SDE using
the same Wiener path, Wm , sampled on the respective meshes, t {} and t {1} ,
(cf. Fig. 1).
33
Fig. 1 (Left) A sample Wiener path, W , generated on the coarse mesh, t {0} , with uniform step
size 1/10 (blue line). The path is thereafter Brownian bridge interpolated onto a finer mesh, t {1} ,
which has uniform step size of 1/20 (green line). (Right) EulerMaruyama numerical solutions of the
OrnsteinUhlenbeck SDE problem, dXt = 2(1 Xt )dt + 0.2dWt , with initial condition X0 = 3/2,
are computed on the meshes t {0} (blue line) and t {1} (green line) using Wiener increments from
the respective path resolutions
1.2.2
In the uniform time-stepping MLMC introduced in [8], the respective SDE realiza{}
tions {X T } are constructed on a hierarchy of uniform meshes with geometrically
decaying step size, min t {} = max t {} = T /N , and N = c N0 with c N\{1}
and N0 an integer. For simplicity, we consider the uniform time-stepping MLMC
method with c = 2.
1.2.3
By construction,
multilevel estimator is telescoping in expectation, i.e.,
the
{L}
E AML = E g X T . Using this property, we may conveniently bound the multilevel approximation error:
E g(XT) A E g(XT) g X {L} + E g X {L} A
.
ML
ML
T
T
=:ET
=:ES
The approximation goal (2) is then reached by ensuring that the sum of the bias, ET ,
and the statistical error, ES , is bounded from above by TOL, e.g., by the constraints
ET TOL/2 and ES TOL/2, (see Sect. 3.2 for more details on the MLMC error
control). For the MSE error goal,
2
TOL2 ,
E E g(XT) AML
the following theorem states the optimal computational cost for MLMC:
34
H. Hoel et al.
(ii) Var( g) = O N ,
(iii) Cost( g) = O N .
Then, for any TOL < e1 , there exists an L and a sequence {M }L=0 such that
2
TOL2 ,
E AML E g(XT)
(8)
and
2
,
if > ,
O TOL
2
2
log(TOL)
,
if
= ,
O
TOL
Cost(AML ) =
2+
O TOL
,
if < .
(9)
In comparison,
the computational cost of achieving the goal (8) with single-level
MC is O TOL2 / . Theorem 1 thus shows that for any problem with > 0,
MLMC will asymptotically be more efficient than single-level MC. Furthermore,
the performance gain of MLMC over MC is particularly apparent in settings where
. The latter property is linked to the contributions of this work. In low-regularity
SDE problems, e.g., Example 6 below and [1, 35], the uniform time-stepping Euler
Maruyama results in convergence rates for which < . More sophisticated integrators can preserve rates such that .
Remark 1 Similar accuracy versus complexity results to Theorem 1, requiring
slightly stronger moment bounds, have also been derived for the approximation
goal (2) in the asymptotic setting when TOL 0, cf. [5, 16].
1.2.4
35
as it often will lead to improved convergence rates, (since Var( g) E g2 ),
which by Theorem 1 may reduce the computational cost of MLMC. In Theorem 2,
we derive the following error expansion for the MSE of EulerMaruyama numerical
solutions of the SDE (1):
N1
2
2
2
=E
n tn + o tn ,
E g(XT ) g X T
(10)
n=0
where the error density, n , is a function of the local error and sensitivities from the
dual solution of the SDE problem, as defined in (24). The error expansion (10) is an
a posteriori error estimate for the MSE, and in our adaptive algorithm, the mesh is
refined by equilibration of the expansions error indicators
r n := n tn2 , for n = 0, 1, . . . , N 1.
1.2.5
(11)
Using the described MSE adaptive algorithm, we construct an MSE adaptive MLMC
{}
algorithm in Sect. 3. The MLMC algorithm generates SDE realizations, {X T } , on
a hierarchy of pathwise adaptively refined meshes, {t {} } . The meshes are nested,
i.e., for all realizations ,
t {0} () t {1} () . . . t {} () . . . ,
with the constraint that the number of time steps in t {} , t {} , is bounded by 2N :
{}
t < 2N = 2+2 N1 .
Here, N1 denotes the pre-initial number of time steps; it is an integer set in advance
of the computations. This corresponds to the hierarchy setup for the uniform timestepping MLMC algorithm in Sect. 1.2.2.
The potential efficiency gain of adaptive MLMC is experimentally illustrated in
this work using the drift blow-up problem
dXt =
rXt
dt + Xt dWt , X0 = 1.
|t |p
This problem is addressed in Example 6 for the three different singularity exponents
p = 1/2, 2/3 and 3/4, with a pathwise, random singularity point U(1/4, 3/4),
an observable g(x) = x, and a final time T = 1. For the given singularity exponents, we observe experimental deteriorating convergence rates, = (1 p) and
= 2(1 p), for the uniform time-stepping EulerMaruyama integrator, while for
36
H. Hoel et al.
TOL2
TOL2
TOL2
TOL2
TOL3
TOL4
1.2.6
Gaines and Lyons work [7] is one of the seminal contributions on adaptive algorithms for SDE. They present an algorithm that seeks to minimize the pathwise error
of the mean and variation of the local error conditioned on the -algebra generated by
(i.e., the values at which the Wiener path has been evaluated in order to numerically
integrate the SDE realization) {Wtn }Nn=1 . The method may be used in combination
with different numerical integration methods, and an approach to approximations of
potentially needed Levy areas is proposed, facilitated by a binary tree representation
of the Wiener path realization at its evaluation points. As for a posteriori adaptive
algorithms, the error indicators in Gaines and Lyons algorithm are given by products of local errors and weight terms, but, unlike in a posteriori methods, the weight
terms are computed from a priori estimates, making their approach a hybrid one.
Szepessy et al. [31] introduced a posteriori weak error based adaptivity for
the EulerMaruyama algorithm with numerically computable error indicator terms.
Their development of weak error adaptivity took inspiration from Talay and Tubaros
seminal work [33], where an error expansion for the weak error was derived for the
EulerMaruyama algorithm when uniform time steps were used. In [16], Szepessy
et al.s weak error adaptive algorithm was used in the construction of a weak error
adaptive MLMC algorithm. To the best of our knowledge, the present work is the
first on MSE a posteriori adaptive algorithms for SDE both in the MC- and MLMC
setting.
Among other adaptive algorithms for SDE, many have refinement criterions based
only or primarily on estimates of the local error. For example in [17], where the stepsize depends on the size of the diffusion coefficient for a MSE EulerMaruyama
adaptive algorithm; in [23], the step-size is controlled by the variation in the size of
the drift coefficient in the constructed EulerMaruyama adaptive algorithm, which
preserves the long-term ergodic behavior of the true solution for many SDE problems;
and in [19], a local error based adaptive Milstein algorithm is developed for solving
multi-dimensional chemical Langevin equations.
37
(12)
The derivation of our adaptive algorithm consists of two steps. First, an error expansion for the MSE is presented in Theorem 2. Based on the error expansion, we
thereafter construct a mesh refinement algorithm. At the end of the section, we apply
the adaptive algorithm to a few example problems.
a(u, Xu )du +
b(u, Xu )dWu ,
s [t, T ],
(13)
and in light of this notation, Xt is shorthand for Xtx0 ,0 . For a given observable g, the
payoff-of-flow map function is defined by (t, x) = g(XTx,t ). We also make use of
the following function space notation
C(U) := {f : U R | f is continuous},
Cb (U) := {f : U R | f is continuous and bounded},
dj
Cbk (R) := f : R R | f C(R) and j f Cb (R) for all integers 1 j k ,
dx
k1 ,k2
Cb ([0, T ] R) := f : [0, T ] R R | f C([0, T ] R) and
j
t 1 xj2 f Cb ([0, T ] R) for all integers j1 k1 and 1 j1 + j2 k2 .
38
H. Hoel et al.
We are now ready to present our mean square expansion result, namely,
Theorem 2 (1D MSE leading-order error expansion) Assume that drift and diffusion
coefficients and input data of the SDE (1) fulfill
(R.1) a, b Cb2,4 ([0, T ] R),
(R.2) there exists a constant C > 0 such that
|a(t, x)|2 + |b(t, x)|2 C(1 + |x|2 ),
x R and t [0, T ],
p
E tn2p c3 E tn2 .
Then, as N increases,
(bx b)2
2
N1
(tn , X tn )tn2 + o tn2 ,
=
E g(XT ) g X T
E x tn , X tn
2
n=0
(14)
2
2
N1
2 (bx b)
(tn , X tn )tn2 + o tn2 .
E g(XT ) g X T
=
E x,n
2
n=0
(15)
We present the proof to the theorem in Appendix Error Expansion for the MSE
in 1D
Remark 2 In condition (M.2) of the above theorem we have introduced N to denote
the deterministic upper bound for the number of time steps in all mesh realizations.
Moreover, from this point on the mesh points {tn }n and time steps {tn }n are defined
with the natural extension tn = T and tn = 0 for all
for all indices {0, 1, . . . , N}
In addition to ensuring an upper bound on the complexity of a
n {N + 1, . . . , N}.
39
2.1.1
(16)
where ax denotes the partial derivative of a with respect to its spatial argument. To
describe conditions under which the terms g (Xsx,t ) and x Xsx,t are well defined, let us
first recall that if Xsx,t solves the SDE (13) and
E
t
|Xsx,t |2 ds
< ,
then we say that there exists a solution to the SDE. If a solution Xsx,t exists and all
solutions
Xsx,t satisfy
P sup Xsx,t
Xsx,t > 0 = 0,
s[t,T ]
40
H. Hoel et al.
Lemma 1 Assume the regularity assumptions (R.1), (R.2), (R.3), and (R.4)
in
Theorem 2 hold, and that for any fixed t [0, T ], x is Ft -measurable and E |x|2p < ,
for all p N. Then there exist pathwise unique solutions Xsx,t and x Xsx,t to the respective SDE (13) and (16) for which
!
"
x,t 2p
x,t 2p
max E sup Xs
, E sup x Xs
< , p N.
s[t,T ]
s[t,T ]
E |x (t, x) |2p < , p N.
We leave the proof of the Lemma to Appendix Variations of the flow map.
To obtain an all-terms-computable error expansion in Theorem 2, which will be
needed to construct an a posteriori adaptive algorithm, the first variation of the flow
map, x , is approximated by the first variation of the EulerMaruyama numerical
solution,
x,n := g (X T )X tn X T .
Here, for k > n, x X
(x X
X tn ,tn
X tn ,tn
X tn ,tn
X tn ,tn
X tn ,tn
)tj Wj ,
(17)
= 1, which is cou-
Lemma 2 If the assumptions (R.1), (R.2), (R.3), (R.4), (M.1) and (M.2) in Theorem 2
hold, then the numerical solution X of (4) converges in mean square sense to the
solution of the SDE (1),
2p
1/2p
max E X tn Xtn
C N 1/2 ,
(18)
2p
max E X tn < , p N.
(19)
1nN
and
1nN
For any fixed instant of time tn in the mesh, 1 n N, the numerical solution X tn X
of (17) converges in mean square sense to x X Xtn ,tn ,
#
2p $1/2p
X tk ,tk
Xtn ,tn
x Xtk
C N 1/2 .
max E x X
nkN
(20)
41
2p
X tk ,tk
max E x X
< , p N.
nkN
and
(21)
E |x,n |2p < , p N.
(22)
x X T n
N1
%
1 + ax (tk , X tk )tk + bx (tk , X tk )Wk ,
k=n
(23)
N1
%
cx (tk , X tk )
k=n
= cx (tn , X tn )g (X T )
N1
%
cx (tk , X tk )
k=n+1
(bx b)2
(tn , X tn ), n = 0, 1, . . . , N 1,
2
(24)
42
H. Hoel et al.
and, for representing the numerical solutions error contribution from the time interval
(tn , tn+1 ), the error indicators
r n := n tn2 , n = 0, 1, . . . , N 1.
(25)
2
N1
=
E r n + o tn2 .
E g(XT ) g X T
(26)
n=0
(s)t(s)ds + (
ds N),
t(s)
(27)
(s)t(s)ds
T
0
ds = N,
t(s)
n tn2
2
g(XT ) g X T
=
, n = 0, 1, . . . , N 1.
N
(28)
, (29)
43
.
T
E
N
(s) ds .
(30)
A consequence of observations (29) and (30) is that for many low-regularity problems, for instance, if (s) = sp with p [1, 2), adaptive time-stepping Euler
Maruyama methods may produce more accurate solutions (measured in the MSE)
than are obtained using the uniform time-stepping EulerMaruyama method under
the same computational budget constraints.
2.2.1
(31)
(32)
n +2
44
H. Hoel et al.
(C.3) Go to step (C.4) if Nrefine mesh refinements have been made; otherwise, return
to step (C.1).
(C.4) (Postconditioning) Do a last sweep over the mesh and refine by
every
halving
1
denotes
time step that is strictly larger than tmax , where tmax = O N
the maximum allowed step size.
The postconditioning step (C.4) ensures that all time steps become infinitesimally
small as the number of time steps N with such a rate of decay that condition
(M.2) in Theorem 2 holds and is thereby one of the necessary conditions from
Lemma 2 to ensure strong convergence for the numerical solutions of the MSE
adaptive EulerMaruyama algorithm. However, the strong convergence result should
primarily be interpreted as a motivation for introducing the postconditioning step
(C.4) since Theorem 2s assumption (M.1), namely that the mesh points are stopping
times tn measurable with respect to Ftn1 , will not hold in general for our adaptive
algorithm.
2.2.2
When a time step is refined, as described in (32), the Wiener path must be refined
correspondingly. The value of the Wiener path at the midpoint between Wtn and
Wtn +1 can be generated by Brownian bridge interpolation,
W
tnnew
+1
Wt + Wtn +1
+
= n
2
tn
,
2
(33)
where N(0, 1), cf. [27]. See Fig. 1 for an illustration of Brownian bridge interpolation applied to numerical solutions of an OrnsteinUhlenbeck SDE.
2.2.3
After the refinement of an interval, (tn , tn +1 ), and its Wiener path, error indicators
must also be updated before moving on to determine which interval is next in line for
refinement. There are different ways of updating error indicators. One expensive but
more accurate option is to recompute the error indicators completely by first solving
the forward problem (4) and the backward problem (23). A less costly but also less
accurate alternative is to update only the error indicators locally at the refined time
step by one forward and backward numerical solution step, respectively:
new
X tn +1 = X tn + a(tn , X tn )tnnew
+ b(tn , X tn )Wnnew
,
new
new
x,n
)x,n +1 .
+1 = cx (tn , X t new
n
(34)
45
new 2
r n +1 = new
n +1 tn +1 .
(35)
As a compromise between cost and accuracy, we here propose the following mixed
approach to updating error indicators post refinement: With Nrefine denoting the prescribed number of refinement iterations of the input mesh, let all error indicators
= O(log(Nrefine ))th iteration, whereas for the
be completely recomputed every N
iterations, only local updates of the error indicators are comremaining Nrefine N
puted. Following this approach, the computational cost of
refining a mesh holding
N time steps into a mesh of 2N time steps becomes O N log(N)2 . Observe that
the asymptotically dominating cost is to sort the meshs error indicators O(log(N))
times. To anticipate the computational cost for the MSE adaptive MLMC algorithm, this implies
that
the cost of generating an MSE adaptive realization pair is
Cost( g) = O 2 2 .
2.2.4
Pseudocode
The mesh refinement and the computation of error indicators are presented in Algorithms 1 and 2, respectively.
Algorithm 1 meshRefinement
Input: Mesh t, Wiener path W , number of refinements Nrefine , maximum time step tmax
Output: Refined mesh t and Wiener path W .
= O (log(Nrefine )) and
Set the number of re-computations of all error indicators to a number N
, = Nrefine /N.
compute the refinement batch size N
do
for i = 1 to N
Completely update the error density by applying
[r, X, x , ] = computeErrorIndicators(t, W ).
, then
if Nrefine > 2N
,
Set the below for-loop limit to J = N.
else
Set J = Nrefine .
end if
for j = 1 to J do
Locate the largest error indicator r n using Eq. (31).
Refine the interval (tn , tn +1 ) by the halving (32), add a midpoint value Wnnew
+1 to the Wiener
path by the Brownian bridge interpolation (33), and set Nrefine = Nrefine 1.
Locally update the error indicators rnnew
and rnnew
46
H. Hoel et al.
Algorithm 2 computeErrorIndicators
Input: mesh t, Wiener path W .
Output: error indicators r, path solutions X and x , error density .
Compute the SDE path X using the EulerMaruyama algorithm (4).
Compute the first variation x using the backward algorithm (23).
Compute the error density and error indicators r by the formulas (24) and (25), respectively.
E (XT X T )2 = min!, N given,
(36)
at the final time, T = 1, (cf. the goal (B.1)). One may derive that the dual solution
of this problem is of the form
x (Xt , t) = Xt XTXt ,t =
XT
,
Xt
X2
(bx b)2 (Xt , t) (x (Xt , t))2
= T.
2
2
47
W0 = 0,
X0 = 0.
Here, we seek to minimize the MSE E (XT X T )2 for the observable
XT =
Wt dWt
0
= Xt Xt +
Ws dWs
= 1,
it follows from the error density in multi-dimensions in Eq. (65) that (t) = 21 . We
conclude that uniform time-stepping is optimal for this problem as well.
Example 3 Next, we consider the three-dimensional (3D) SDE problem
dWt(1) = 1dWt(1) ,
W0(1) = 0,
dWt(2) = 1dWt(2) ,
W0(2) = 0,
X0 = 0,
1
0
1 ,
b((Wt , Xt ), t) = 0
(1)
Wt Wt(2)
48
H. Hoel et al.
T
= Xt Xt +
(Ws(1) dWs(2) Ws(2) dWs(1) ),
= 1,
it follows from Eq. (65) that (t) = 1. We conclude that uniform time-stepping is
optimal for computing Levy areas.
Example 4 As the last example, we consider the 2D SDE
dWt = 1dWt ,
dXt =
3(Wt2
W0 = 0,
t)dWt ,
X0 = 0.
We seek to minimize the MSE (36) at the final time T = 1. For this problem, it
may be shown by It calculus that the pathwise exact solution is XT = WT3 3WT T .
Representing the diffusion matrix by
b((Wt , Xt ), t) =
3(Wt2 t)
Equation (65) implies that (t) = 18Wt2 . This motivates the use of discrete error
indicators, r n = 18Wt2n tn2 , in the mesh refinement criterion. For this problem, we
may not directly conclude that the error expansion (67) holds since the diffusion
coefficient does not fulfill the assumption in Theorem 3. Although we will not include
j
the details here, it is easy to derive that x XTx,t = 0 for all j > 1 and to prove that
the MSE leading-order error expansion also holds for this particular problem by
following the steps of the proof of Theorem 2. In Fig. 2, we compare the uniform
and adaptive time-stepping EulerMaruyama algorithms in terms of MSE versus the
100
102
103
101
102
103
104
49
number of time steps, N. Estimates for the MSE for both algorithms are computed
by MC sampling using M = 106 samples. This is a sufficient sample size to render
the MC estimates statistical error negligible. For the adaptive algorithm, we have
used the following input parameter in Algorithm 1: uniform input mesh, t, with
step size 2/N (and tmax = 2/N). The number of refinements is set to Nrefine = N/2.
We observe that the algorithms have approximately equal convergence rates, but, as
expected, the adaptive algorithm is slightly more accurate than the uniform timestepping algorithm.
P E g(XT) AML TOL 1 .
We denote the multilevel estimator by
AML :=
M
L
gm
,
M
=0 m=1
=:A ( g;M )
where
gX m,T ,
1 if = 0,
gm :=
g X m,T g X m,T , else.
(37)
50
3.1.1
H. Hoel et al.
A realization, g i, , is generated on a nested pair of mesh realizations
. . . t {1} (i, ) t {} (i, ).
Subsequently, mesh realizations are generated step by step from a prescribed and
deterministic input mesh, t {1} , holding N1 uniform time steps. First, t {1} is
refined into a mesh, t {0} , by applying Algorithm 1, namely
[t {0} , W {0} ] = meshRefinement t {1} , W {1} , Nrefine = N1 , tmax = N01 .
The mesh refinement process is iterated until meshes t {1} and t {1} are produced, with the last couple of iterations being
1
,
[t {1} , W {1} ] = meshRefinement t {2} , W {2} , Nrefine = N2 , tmax = N1
and
[t {} , W {} ] = meshRefinement t {1} , W {1} , Nrefine = N1 , tmax = N1 .
{1}
{}
is thereafter
The output realization for the difference gi = g X i g X i
generated on the output temporal mesh and Wiener path pairs, (t {1} , W {1} ) and
(t {} , W {} ).
For later estimates of the computational cost of the MSE adaptive MLMC algorithm, it is useful to have upper bounds on the growth of the number of time steps
in the mesh hierarchy, {t {} } , as increases. Letting |t| denote the number of
time steps in a mesh, t (i.e., the cardinality of the set t = {t0 , t1 , . . .}), the
following bounds hold
N t {} < 2N
N0 .
The lower bound follows straightforwardly from the mesh hierarchy refinement procedure described above. To show the upper bound, notice the maximum number of
mesh refinements going from a level 1 mesh, t {1} to a level mesh, t {} is
2N1 1. Consequently,
|t {} | |t {1} | +
1
j=0
N1 + 2
j=0
51
{}
to hold, it is not
Remark 4 For the telescoping property E AML = E g X T
required that the adaptive mesh hierarchy is nested, but non-nested meshes make it
more complicated to compute Wiener path pairs (W {1} , W {} ). In the numerical
tests leading to this work, we tested both nested and non-nested adaptive meshes and
found both options performing satisfactorily.
E g(XT) A
ML
E g(XT) g X {L} + E g X {L} A
ML
T
T
=:ET
=:ES
and
TOL = TOLT + TOLS ,
(38)
(39)
(40)
Under the moment assumptions stated in [6], Lindebergs version of the Central
Limit Theorem yields that as TOL 0,
{L}
AML E g X T
D
1
N(0, 1).
Var AML
D
Here,
denotes convergence in distribution. By construction, we have
L
Var( g)
Var AML =
.
M
=0
52
H. Hoel et al.
(41)
CC
CC ()
()ex
/2
dx = (1 ),
(42)
C M .
=0
An optimization of the number of samples at each level can then be found through
minimization of the Lagrangian
L (M0 , M1 , . . . , ML , ) =
# L
Var( g)
=0
M
TOLS 2
CC 2 ()
$
+
C M ,
=0
yielding
2
M =
CC 2 ()
TOLS 2
4
L
Var( g) *
C Var( g) , = 0, 1, . . . , L.
C
=0
Since the cost of adaptively refining a mesh, t {} , is O N log(N )2 , as noted in
Sect.
2.2.3, the cost of generating an SDE realization, is of the same order: C =
O N log(N )2 . Representing the cost by its leading-order term and disregarding the
logarithmic factor, an approximation to the level-wise optimal number of samples
becomes
3
4
2
L
CC 2 () Var( g) *
(43)
N Var( g) , = 0, 1, . . . , L.
M =
N
TOLS 2
=0
53
Remark 5 In our MLMC implementations, the variances, Var( g), in Eq. (43)
are approximated by sample variances. To save memory in our parallel computer
implementation,
the maximum permitted batch size for a set of realizations,
, samples,
{ g i, }i , is set to 100,000. For the initial batch consisting of M = M
the sample variance is computed by the standard approach,
M
1
( g i, A ( g; M ))2 .
V ( g; M ) =
M 1 i=1
+M
Thereafter, for every new batch of realizations, { g i, }M
i=M +1 (M here denotes
an arbitrary natural number smaller or equal to 100,000), we incrementally update
the sample variance,
M
V ( g; M )
M + M
M
+M
1
( g i, A ( g; M + M))2 ,
+
(M + M 1) i=M +1
V ( g; M + M) =
3.2.2
To control the time discretization error, we assume that a weak order convergence
rate, > 0, holds for the given SDE problem when solved with the EulerMaruyama
method, i.e.,
{L}
E g(XT) g X T = O NL ,
and we assume that the asymptotic rate is reached at level L 1. Then
{L}
E L g
.
E g E L g
2
=
E g(XT) g X T =
2 1
=L+1
=1
In our implementation, we assume
54
H. Hoel et al.
max 2 |A (L1 g; ML1 )| , |A (L g; ML )|
TOLT ,
2 1
(44)
(cf. Algorithm 3). Here we implicitly assume that the statistical error in estimating
the bias condition is not prohibitively large.
A final level L of order log(TOLT 1 ) will thus control the discretization error.
3.2.3
Computational Cost
Under the convergence rate assumptions stated in Theorem 1, it follows that the cost
of generating
an adaptive
MLMC
estimator, AML , fulfilling the MSE approximation
CML
O TOL2 ,
if > 1,
2
4
=
M C O TOL log(TOL) , if = 1,
=0
O TOL2+ log(TOL)2 , if < 1.
(45)
55
Algorithm 3 mlmcEstimator
Input: TOLT , TOLS , confidence , initial mesh t {1} , initial number of mesh steps N1 , input
,
weak rate , initial number of samples M.
Output: Multilevel estimator AML .
Compute the confidence parameter CC () by (42).
Set L = 1.
while L < 2 or (44), using the input for the weak rate, is violated do
Set L = L + 1.
L
,
Set ML = M,
generate a set of realizations { g i, }M
by
i=1
adaptiveRealizations(t {1} ).
for = 0 to L do
Compute the sample variance V ( g; Ml ).
end for
for = 0 to L do
Determine the number of samples M by (43).
if new value of M is larger than the old value then
Mnew
by
Compute
additional
realizations
{ g i, }i=M
+1
adaptiveRealizations(t {1} ).
end if
end for
end while
Compute AML from the generated samples by using formula (7).
applying
applying
56
H. Hoel et al.
Algorithm 4 adaptiveRealization
Input: Mesh t {1} .
Outputs: One realization g()
Generate a Wiener path W {1} on the initial mesh t {1} .
for j = 0 to do
Refine the mesh by applying
[t {j} , W {j} ] = meshRefinement(t {j1} , W {j1} , Nrefine = Nj1 , tmax = Nj1 ).
end for
{1}
{}
Compute EulerMaruyama realizations (X T
, X T )() using the mesh pair (t {1} , t {} )()
{1}
{}
and Wiener path pair (W
, W )(), cf. (4), and return the output
{}
{1}
g() = g X T () g X T
() .
ing for both examples, all of the assumptions in Theorem 2 are not fulfilled for our
adaptive algorithm, when applied to either of the two examples. We are therefore not
able to prove theoretically that our adaptive algorithm converges in these examples.
For reference, the implemented MSE adaptive MLMC algorithm is described in
Algorithms 14, the standard form of the uniform time-stepping MLMC algorithm
that we use in these numerical comparisons is presented in Algorithm 5, Appendix A
Uniform Time Step MLMC Algorithm, and a summary of the parameter values used
in the examples is given in Table 2. Furthermore, all average properties derived from
the MLMC algorithms that we plot for the considered examples in Figs. 3, 4, 5, 6, 7,
8, 9, 10, 11 and 12 below are computed from 100 multilevel estimator realizations,
and, when plotted, error bars are scaled to one sample standard deviation.
Example 5 We consider the geometric Brownian motion
dXt = Xt dt + Xt dWt , X0 = 1,
where we seek to fulfill the weak approximation goal (2) for
the observable, g(x) = x,
at the final time, T = 1. The reference solution is E g(XT) = eT . From Example 1,
we recall that the MSE minimized in this problem by using uniform time steps.
However, our a posteriori MSE adaptive MLMC algorithm computes error indicators
from numerical solutions of the path and the dual solution, which may lead to slightly
non-uniform output meshes. In Fig. 3, we study how
to uniform the MSE
close
t {} /N , where we recall
adaptive
meshes
are
by
plotting
the
level-wise
ratio,
E
that t {} denotes the number of time steps in the mesh, t {} , and
that a uniform
mesh on level has N time steps. As the level, , increases, E t {} /N converges
to 1, and to interpret this result, we recall from the construction of the adaptive mesh
57
Table 2 List of parameter values used by the MSE adaptive MLMC algorithm and (when required)
the uniform MLMC algorithm for the numerical examples in Sect. 4
Parameter
Description of parameter
Example 5
Example 6
TOL
TOLS
TOLT
t {1}
N0
N()
tmax ()
tmin
,
M
0.1
[103 , 101 ]
TOL/2
TOL/2
1/2
0.1
[103 , 101 ]
TOL/2
TOL/2
1/2
log(+2)
log(2)
log(+2)
log(2)
N1
251
N1
251
100
20
(1 p)
hierarchy in Sect. 3 that if t {} = N , then the mesh, t {} , is uniform. We thus
conclude that for this problem, the higher the level, the more uniform the MSE
adaptive mesh realizations generally become.
Since adaptive mesh refinement is costly and since this problem has sufficient
regularity for the first-order weak and MSE convergence rates (5) and (6) to hold,
respectively, one might expect that MSE adaptive MLMC will be less efficient than
the uniform MLMC. This is verified in Fig. 5, which shows that the runtime of the
MSE adaptive MLMC algorithm grows slightly faster than the uniform MLMC algorithm and that the cost ratio is at most roughly 3.5, in favor of uniform MLMC. In
Fig. 4, the accuracy of the MLMC algorithms is compared, showing that both algorithms fulfill the goal (2) reliably. Figure 6 further shows that
algorithms have
both
roughly first-order convergence rates for the weak error E g and the variance
Var( g), and that the decay rates for Ml are close to identical. We conclude that
58
H. Hoel et al.
Number of time steps ratio E[|t{} |]/N
1.010
1.008
1.006
1.004
1.002
1.000
0
10
12
Level
Fig. 3 The ratio of the level-wise mean number of time steps E t {} /N , of MSE adaptive
mesh realizations to uniform mesh realizations for Example 6
Fig. 4 For a set of TOL values, 100 realizations of the MSE adaptive multilevel estimator
are
computed using both MLMC algorithms for Example 5. The errors |AML (i ; TOL, ) E g(XT) |
are respectively plotted as circles (adaptive MLMC) and triangles (uniform MLMC), and the
number
of multilevel estimator realizations failing the constraint |AML (i ; TOL, ) E g(XT) | < TOL
is written above the (TOL1 , TOL) line. Since the confidence parameter is set to = 0.1 and less
than 10 realizations fail for any of the tested TOL values, both algorithms meet the approximation
goal (37)
although MSE adaptive MLMC is slightly more costly than uniform MLMC, the
algorithms perform comparably in terms of runtime for this example.
Remark 7 The reason why we are unable to prove theoretically that the numerical
solution of this problem computed with our adaptive algorithm asymptotically converges to the true solution is slightly subtle. The required smoothness conditions in
Theorem 2 are obviously fulfilled, but due to the local update of the error indicators
in our mesh refinement procedure, (cf. Sect. 2.2.3), we cannot prove that the mesh
points will asymptotically be stopping times for which tn is Ftn1 -measurable for all
n {1, 2, . . . , N}. If we instead were to use the version of our adaptive algorithm
that recomputes all error indicators for each mesh refinement, the definition of the
error density (24) implies that, for this particular problem, it would take the same
59
104
adaptive MLMC
uniform MLMC
c1 TOL2 log(TOL)2
103
2
Runtime [s]
10
101
100
101
102
103
101
102
103
1
TOL
Fig. 5 Average runtime versus TOL1 for the two MLMC algorithms solving Example 5
Adaptive MLMC
Uniform MLMC
10
101
102
A(g ; M )
103
A( g; M )
104
c2
105
101
100
101
102
V(g ; M )
103
V( g; M )
c2
10
1010
109
108
c2
10
106
105
104
Level
10
12 0
10
12
Level
60
H. Hoel et al.
7
2
value, n = N1
k=0 cx (tk , X tk ) /2, for all indices, n {0, 1, . . . , N}. The resulting
adaptively refined mesh would then become uniform and we could verify convergence, for instance, by using Theorem 2. Connecting this to the numerical results for
the adaptive algorithm that we have implemented
here, we notice that the level-wise
mean number of time steps ratio, E t {} /N , presented in Fig. 3 seems to tend
towards 1 as increases, a limit ratio that is achieved only if t {} is indeed a uniform
mesh.
Example 6 We next consider the two-dimensional SDE driven by a one-dimensional
Wiener process
dXt = a(t, Xt ; )dt + b(t, Xt ; )dWt
X0 = [1, ]T ,
(46)
with the low-regularity drift coefficient, a(t, x) = [r|t x (2) |p , 0]T , interest rate,
r = 1/5, and volatility b(t, x) = [, 0]T with, = 0.5, and observable, g(x) = x, at
the final time T = 1. The in the initial condition is distributed as U(1/4, 3/4)
and it is independent from the Wiener process, W . Three different blow-up exponent
test cases are considered, p = (1/2, 2/3, 3/4), and to avoid blow-ups in the numerical
integration of the drift function component, f (; ), we replace the fully explicit
EulerMaruyama integration scheme with the following semi-implicit scheme:
X tn+1 = X tn +
rf (tn ; )X tn tn + X tn Wn ,
if f (tn ; ) < 2f (tn+1 ; ),
(47)
rf (tn+1 ; )X tn tn + X tn Wn , else,
where we have dropped the superscript for the first component of the SDE, writing
out only the first component, since the evolution of the second component is trivial.
For p [1/2, 3/4] it may be shown that for any singularity point, any path integrated
by the scheme (47) will have at most one drift-implicit integration step. The reference
mean for the exact solution is given by
E[XT ] = 2
3/4
exp
1/4
r(x 1p + (1 x)1p )
dx,
1p
2.5
{2}
{6}
X t ()
X t ()
{4}
{8}
X t ()
2.0
61
X t ()
1.5
1.0
0.5
0.0
101
102
103
104
105
106
107
108
109
1010
1011
1012
1013
1014
1015
0.0
0.2
0.4
0.6
0.8
1.00.0
0.2
0.4
0.6
0.8
1.00.0
0.2
0.4
0.6
0.8
time t
time t
time t
t{2} ()
t{6} ()
t{4} ()
t{8} ()
0.2
0.4
0.6
time t
0.8
1.00.0
0.2
0.4
0.6
time t
0.8
1.00.0
0.2
0.4
0.6
0.8
1.0
1.0
time t
Fig. 7 (Top) One MSE adaptive numerical realization of the SDE problem (46) at different mesh
hierarchy levels. The blow-up singularity point is located at 0.288473 and the realizations
are computed for three singularity exponent values. We observe that as the exponent, p, increases,
the more jump at t = becomes more pronounced. (Bottom) Corresponding MSE adaptive mesh
realizations for the different test cases
N1
2
N(at + ax a)2 tn2 + (bx b)2 (tn , X tn ; ) 2
2
tn .
E X T XT E
x,n
2
n=0
This is the error expansion we use for the adaptive mesh refinement (in Algorithm 1)
in this example. In Fig. 7, we illustrate the effect that the singularity exponent, p, has
on SDE and adaptive mesh realizations.
Implementation Details and Observations
Computational tests for the uniform and MSE adaptive MLMC algorithms are implemented with the input parameters summarized in Table 2. The weak convergence
rate, , which is needed in the MLMC implementations stopping criterion (44), is
estimated experimentally as (p) = (1 p) when using the EulerMaruyama integrator with uniform time steps, and roughly = 1 when using the EulerMaruyama
integrator with adaptive time steps, (cf. Fig. 8). We further estimate the variance convergence rate to (p) = 2(1 p), when using uniform time-stepping, and roughly
62
H. Hoel et al.
p = 0.5, TOL = 103
100
101
102
103
A(g ; M )
A( g; M )
104
c2
105
Level
10 1
Level
10 1
Level
10
100
101
102
A(g ; M )
103
10
c2 TOL0.5
10
Level
A( g; M )
c20.33
15
20
10
15
Level
c20.25
20
25
10
15
Level
20
25
Fig. 8 (Top) Average errors E g for Example 6 solved with the MSE adaptive MLMC algorithm for three singularity exponent values. (Bottom) Corresponding average errors for the uniform
MLMC algorithm
to = 1 when using MSE adaptive time-stepping, (cf. Fig. 9). The low weak convergence rate for uniform MLMC implies that the number of levels L in the MLMC
estimator will be become very large, even with fairly high tolerances. Since computations of realizations on high levels are extremely costly, we have, for the sake of
, = 20, for the initial number
computational feasibility, chosen a very low value, M
of samples in both MLMC algorithms. The respective estimators use of samples,
M , (cf. Fig. 10), shows that the low number of initial samples is not strictly needed
for the the adaptive MLMC algorithm, but for the sake of fair comparisons, we have
chosen to use the same parameter values in both algorithms.
From the rate estimates of and , we predict the computational cost of reaching
the approximation goal (37) for the respective MLMC algorithms to be
1
Costadp (AML ) = O log(TOL)4 TOL2 and Costunf (AML ) = O TOL 1p ,
63
101
100
101
102
103
V(g ; M )
104
V( g; M )
10
106
c2
2
Level
10 1
Level
10 1
Level
10
101
100
101
102
103
104
105
10
V(g ; M )
c21
106
108
c2 TOL0.5
V( g; M )
0.67
c2
5
10
Level
15
20
10
15
Level
20
25
10
15
Level
20
25
Fig. 9 (Top) Variances Var( g) for for Example 6 solved with the MSE adaptive MLMC algorithm
for three singularity exponent values. (Bottom) Corresponding variances for the uniform MLMC
algorithm. The more noisy data on the highest levels is due to the low number used for the initial
= 20, and only a subset of the generated 100 multilevel estimator realizations reached
samples, M
the last levels
by using the estimate (45) and Theorem 1 respectively. These predictions fit well
with the observed computational runtime for the respective MLMC algorithms,
(cf. Fig. 11). Lastly, we observe that the numerical results are consistent with both
algorithms fulfilling the goal (37) in Fig. 12.
Computer Implementation
The computer code for all algorithms was written in Java and used the Stochastic
Simulation in Java library to sample the random variables in parallel from threadindependent MRG32k3a pseudo random number generators, [24]. The experiments
were run on multiple threads on Intel Xeon(R) CPU X5650, 2.67GHz processors
and the computer graphics were made using the open source plotting library Matplotlib, [18].
64
H. Hoel et al.
p = 0.5
109
107
E[M ]
p = 0.67
TOL = 102.11
TOL = 103
c2
108
p = 0.75
TOL = 101.56
TOL = 102
TOL = 101.06
TOL = 101.5
106
105
104
103
102
101
10 1
p = 0.5
108
10 1
p = 0.67
Level
10
p = 0.75
1.56
TOL = 10
TOL = 103
c21
106
Level
2.11
107
E[M ]
Level
TOL = 101.06
TOL = 101.5
c20.75
TOL = 10
TOL = 102
c20.83
105
104
103
102
101
10
Level
15
20
10
15
Level
20
25
10
15
Level
20
25
Fig. 10 (Top) Average number of samples M for for Example 6 solved with the MSE adaptive
MLMC algorithm for three singularity exponent values. (Bottom) Corresponding average number of
samples for the uniform MLMC algorithm. The plotted decay rate reference lines, c2(((p)+1)/2) ,
for M follow implicitly from Eq. (43) (assuming that (p) = 2(1 p) is the correct variance decay
rate)
104
Runtime [s]
10
c2 TOL3
Adaptive MLMC
c1 TOL2
c2 TOL4
Uniform MLMC
102
101
100
101
102
101
102
TOL1
103
101
102
TOL1
100.50
101
101.50
TOL1
Fig. 11 Average runtime versus TOL1 for the two MLMC algorithms for three singularity exponent values in Example 6
p=0.5
TOL
Adaptive MLMC
100
101
102
103
65
5 3
1 2
0 1
0 0
0 0
p=0.75
0 1 0
1 1 0
0 0 0
0
3 1 2
0 1 0
2 1 0
1
104
105
101
102
103
103
101
102
100.50
101
TOL1
TOL1
p=0.5
p=0.67
p=0.75
TOL
Uniform MLMC
100
101
102
TOL1
7 7
8 4
4 6
2 4
5 3
2 0 3 5
2 4 1 2
0 2
101.50
0 1 0 1 0 0 0
0 0 0
104
105
101
102
TOL1
103
101
102
TOL1
100.50
101
TOL1
Fig. 12 Approximation errors for both of the MLMC algorithms solving Example 6. At every
TOL value, circles and triangles represent the errors from 100 independent multilevel estimator
realizations of the respective algorithms
5 Conclusion
We have developed an a posteriori, MSE adaptive EulerMaruyama time-stepping
algorithm and incorporated it into an MSE adaptive MLMC algorithm. The MSE
error expansion presented in Theorem 2 is fundamental to the adaptive algorithm.
Numerical tests have shown that MSE adaptive time-stepping may outperform uniform time-stepping, both in the single-level MC setting and in the MLMC setting,
(Examples 4 and 6). Due to the complexities of implementing adaptive time-stepping,
the numerical examples in this work were restricted to quite simple, low-regularity
SDE problems with singularities in the temporal coordinate. In the future, we aim
to study SDE problems with low-regularity in the state coordinate (preliminary tests
and analysis do however indicate that then some ad hoc molding of the adaptive
algorithm is required).
Although a posteriori adaptivity has proven to be a very effective method for
deterministic differential equations, the use of information from the future of the
numerical solution of the dual problem makes it a somewhat unnatural method to
extend to It SDE: It can result in numerical solutions that are not Ft -adapted,
which consequently may introduce a bias in the numerical solutions. [7] provides
an example of a failing adaptive algorithm for SDE. A rigorous analysis of the
convergence properties of our developed MSE adaptive algorithm would strengthen
the theoretical basis of the algorithm further. We leave this for future work.
66
H. Hoel et al.
Acknowledgments This work was supported by King Abdullah University of Science and Technology (KAUST); by Norges Forskningsrd, research project 214495 LIQCRY; and by the University
of Texas, Austin Subcontract (Project Number 024550, Center for Predictive Computational Science). The first author was and the third author is a member of the Strategic Research Initiative on
Uncertainty Quantification in Computational Science and Engineering at KAUST (SRI-UQ). The
authors would like to thank Arturo Kohatsu-Higa for his helpful suggestions for improvements in
the proof of Theorem 2.
Theoretical Results
Error Expansion for the MSE in 1D
In this section, we derive a leading-order error expansion for the MSE (12) in the 1D
setting when the drift and diffusion coefficients are respectively mappings of the form
a : [0, T ] R R and b : [0, T ] R R. We begin by deriving a representation
of the MSE in terms of products of local errors and weights.
Recalling the definition of the flow map, (x, t) := g(XTx,t ), and the first variation
of the flow map and the path itself given in Sect. 2.1.1, we use the Mean Value
Theorem to deduce that
g(XT) g X T = (0, x0 ) (0, X T )
=
N1
n=0
N1
X tn ,tn
(tn+1 , X tn+1 )
tn+1 , Xtn+1
(48)
n=0
N1
x tn+1 , X tn+1 + sn en en ,
n=0
X ,t
tn n
X tn+1 and sn [0, 1]. It expansion
where the local error is given by en := Xtn+1
of the local error gives the following representation:
en =
tn+1
n
tn+1
n
+
t
tn+1
n
tn+1
X ,t
) a(tn , X tn ) dt +
b(t, Xt tn n ) b(tn , X tn ) dWt
tn
X tn ,tn
a(t, Xt
an
bn
tn+1 t
axx 2
X ,t
X ,t
b )(s, Xs tn n ) ds dt +
(at + ax a +
(ax b)(s, Xs tn n ) dWs dt
2
tn
tn
tn
t
|n
=:a
8n
=:a
tn+1 t
bxx 2
X ,t
X ,t
b )(s, Xs tn n )ds dWt +
(bt + bx a +
(bx b)(s, Xs tn n )dWs dWt .
2
tn
tn
tn
t
|n
=:b
8n
=:b
(49)
67
By Eq. (48) we may express the MSE by the following squared sum
N1
2
E g(XT ) g X T
= E
x tn+1 , X tn+1 + sn en en
n=0
N1
n,k=0
This is the first step in deriving the error expansion in Theorem 2. The remaining
steps follow in the proof below.
Proof of Theorem 2. The main tools used in proving this theorem are Taylor and
ItTaylor expansions, It isometry, and truncation of higher order terms. For errors
8 n , (cf. Eq. (49)), we do detailed
attributed to the leading-order local error term, b
calculations, and the remainder is bounded by stated higher order terms.
We begin by noting that under the assumptions in Theorem 2 Lemmas 1 and 2
respectively verify then the existence and uniqueness of the solution of the SDE X
and the numerical solution X, and provide higher order moment bounds for both.
Furthermore, due to the assumption of the mesh points being stopping times for
which tn is Ftn1 -measurable for all n, it follows also that the numerical solution is
adapted to the filtration, i.e., X tn is Ftn -measurable for all n.
We further need to extend the flow map and the first variation notation from
x,tk
Sect. 2.1.1. Let X tn for n k denote the numerical solution of the EulerMaruyama
scheme
x,tk
x,tk
x,tk
x,tk
(50)
x,tk
that E |x|2p < for all p N, x is Ftk -measurable and provided the assumptions
of Lemma 2 hold, it is straightforward to extend the proof of the lemma to verify
x,tk
x,tk
that (X , x X ) converges strongly to (X x,tk , x X x,tk ) for t [tk , T ],
#
2p
x,tk
k
E X tn Xtx,t
max
n
1/2p
#
2p
x,tk
x,tk
E x X tn x Xtn
max
1/2p
knN
knN
and
C N 1/2 , p N
C N 1/2 , p N
x,tk 2p
x,tk 2p
max max E X tn , E x X tn
knN
< , p N.
(51)
68
H. Hoel et al.
In addition to this, we will also make use of moment bounds for the second and
third variation of the flow map in the proof, i.e., xx (t, x) and xxx (t, x). The second
variation is described in Section Variations of the flow
where it is shown in
map,
Lemma 3 that provided that x is Ft -measurable and E |x|2p < for all p N, then
max E |xx (t, x)|2p , E |xxx (t, x)|2p , E |xxxx (t, x)|2p < , p N.
Considering the MSE error contribution from the leading order local error terms
8 n , i.e.,
b
8 k b
8n ,
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en b
(52)
we have for k = n,
2
8 2n
E x tn+1 , X tn+1 + xx tn+1 , X tn+1 + sn en sn en b
2
8 2n + o tn2 .
= E x tn+1 , X tn+1 b
The above o tn2 follows from Youngs and Hlders inequalities,
8 2n
E 2x tn+1 , X tn+1 xx tn+1 , X tn+1 + sn en sn en b
#
$
8 4n
2 3
e2n b
C E x tn+1 , X tn+1 xx tn+1 , X tn+1 + sn en tn + E
tn3
2
3
C E E x tn+1 , X tn+1 xx tn+1 , X n+1 + sn en
Ftn tn
2 4
2 4
6
| b
| 2n b
8n
8 4n
8n
8
8 n b
a
a
b
b
n
n
+E
+
E
+
E
+
E
tn3
tn3
tn3
tn3
3
3
!
1
| 4n |Ftn 1 + E E a
8 4n |Ftn
C E tn3 +
E E a
tn
tn
3
3
3
1
1 "
4
4
8
| |Ft 1 + E E b
8
8
+ E E b
|F
E
E
|F
b
t
t
n
n
n
n
n
n
tn
tn
tn5
= E o(tn2 )
(53)
where the last inequality is derived by applying the moment bounds for multiple
It integrals described in [22, Lemma 5.7.5] and under the assumptions (R.1), (R.2),
(M.1), (M.2) and (M.3). This yields
69
axx 2 4
X tn ,tn
b (s, Xs
sup at + ax a +
) Ftn tn8 ,
2
s[tn ,tn+1 )
4
4
X
,t
8 n |Ftn CE
sup |ax b| (s, Xs tn n ) Ftn tn6 ,
E a
| 4n |Ftn CE
E a
s[tn ,tn+1 )
bxx 2 4
X tn ,tn
b (s, Xs
sup bt + bx a +
) Ftn tn6 ,
2
s[tn ,tn+1 )
4
4
X
,t
8 n |Ftn CE
sup |bx b| (s, Xs tn n ) Ftn tn4 ,
E b
4
| |Ft CE
E b
n
n
8
8 n |Ftn CE
E b
s[tn ,tn+1 )
sup
s[tn ,tn+1 )
|bx b|
(s, XsX tn ,tn ) Ftn
(54)
tn8 .
8 2n CE tn4 .
E xx X tn+1 + sn en , tn+1 sn2 e2n b
For achieving independence between forward paths and dual solutions in the expectations, an ItTaylor expansion of x leads to the equality
2
8 2n = E x tn+1 , X tn 2 b
8 2n + o tn2 .
E x tn+1 , X tn+1 b
Introducing the null set completed -algebra
,n = ({Ws }0stn ) ({Ws Wtn+1 }tn+1 sT ) (X0 ),
F
2
,n measurable by construction, (cf. [27, Appenwe observe that x tn+1 , X tn is F
dix B]). Moreover, by conditional expectation,
2
2 n
,
8 2n = E x tn+1 , X tn 2 E b
8 n |F
E x tn+1 , X tn b
2
2
tn2
2
+ o tn ,
= E x tn+1 , X tn (bx b) (tn , X tn )
2
where the last equality follows from using Its formula,
$
t #
b2 2
X ,t
2
t + ax + x (bx b) (s, Xs tn n ) ds
2
tn
t
X ,t
bx (bx b)2 (s, Xs tn n ) dWs , t [tn , tn+1 ),
+
X ,t
(bx b)2 (t, Xt tn n ) = (bx b)2 (tn , X tn ) +
tn
70
H. Hoel et al.
to derive that
2
n
,
8
E bn |F = E
tn+1
tn
tn
dWt
X tn
(bx b)2 (tn , X tn ) 2
tn + o tn2 .
2
2
Here, the higher order o tn terms are bounded in a similar fashion as the terms in
inequality (53), by using [22, Lemma 5.7.5].
For the terms in (52) for which k < n, we will show that
=
N1
N1
8n =
8 k b
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en b
E o tn2 ,
n=0
k,n=0
(55)
which means that the contribution to the MSE from these terms is negligible to
leading order. For the use in later expansions, let us first observe by use of the chain
rule that for any Ftn -measurable y with bounded second moment,
x (tk+1 , y) = g (XT
y,tk+1
Xt
y,tk+1
)x XT
y,tk+1
+sm ek ,tk+1
Xt
= g (XT k+1
)x XT n+1
y,t
y,t
= x tn+1 , Xtn+1k+1 x Xtn+1k+1 ,
,tn+1
y,t
x Xtn+1k+1
and that
Xt
x Xtn+1k+1
+sk ek ,tk+1
+
Xt
= x Xtn k+1
tn+1
+sk ek ,tk+1
X tk+1 +sk ek ,tk+1
ax (s, Xs
tn
tn+1
)x Xs
bx (s, Xs
ds
)x Xs
dWs .
tn
,k,n and ItTaylor expand the x functions in (55) about center points that are F
measurable:
X t +sk ek ,tk+1
X t +sk ek ,tk+1
x Xtn+1k+1
x tk+1 , X tk+1 + sk ek = x tn+1 , Xtn+1k+1
X t +sk ek ,tk+1
X tk ,tk+1
X tk ,tk+1
X t ,tk+1
+ xx tn+1 , Xtn
Xtn+1k+1
= x tn+1 , Xtn
Xtn k
71
X t +sk ek ,tk+1
X t ,tk+1
Xtn+1k+1
Xtn k
X t ,tk+1
+ xxx tn+1 , Xtn k
2
X t +sk ek ,tk+1
X t ,tk+1
+ xxxx tn+1 , (1 sn )Xtn k
+ sn Xtn+1k+1
Xt
(Xtn+1k+1
+sk ek ,tk+1
X t ,tk+1 2
Xtn k
X t ,tk+1
X t ,tk+1
x Xtn k
+ xx Xtn k
(a(tk , X tk )tk + b(tk , X tk )Wk + sk ek )
X t +`sk (a(tk ,X tk )tk +(b(tk ,X tk )Wk +sk ek ),tk+1
+ xxx Xtn k
tn+1
ax (s, Xs
)x Xs
ds
tn
tn+1
X t +sk ek ,tk+1
X t +sk ek ,tk+1
bx (s, Xs k+1
)x Xs k+1
dWs
,
(56)
tn
where
Xt
k+1
Xtn+1
+sk ek ,tk+1
t
n+1
tn
X t ,tk+1
Xtn k
a(s, Xs
)ds +
t
n+1
tn
+ x Xtn k
b(s, Xs
)dWs
and
X t ,tk+1
x tn+1 , X tn+1 + sn en = x tn+1 , X tn k
+ xx
X t ,tk+1
tn+1 , X tn k
X k ,tk+1
k,n + xxx tn+1 , X n
2
k,n
2
3
k,n
X t ,tk+1
,
+ xxxx tn+1 , (1 sn )X tn k
+ sn (X tn+1 + sn en )
6
(57)
with
k,n := a(tn , X tn )tn + b(tn , X tn )Wn + sn en
X t +sk (a(tk ,X tk )tk +b(tk ,X tk )Wk ),tk+1
+ x X tn k
72
H. Hoel et al.
8n ,
8 k b
E x tk+1 , X k+1 + sk ek x tn+1 , X n+1 + sn en b
the summands in the resulting expression that only contain products of the first
variations vanishes,
X t ,tk+1
X t ,tk+1
X t ,tk+1
8 k b
8n
x Xtn k
x tn+1 , X tn k+1
E x tn+1 , Xtn k
b
,n = 0,
8 n |F
E b
and
X tk ,tk+1
X tk ,tk+1
X tk ,tk+1
8
8
E x tn+1 , Xtn
x Xtn
b(tn , X tn )Wn x tn+1 , X tn
bk bn
X tk ,tk+1
X tk ,tk+1
n
,
8
8
x tn+1 , X tn
= 0.
= E x tn+1 , Xtn+1
bk b(tn , X tn )E bn Wn |F
From these observations, assumption (M.3), inequality (54), and, when necessary,
,k additional expansions of integrands to render the leading order integrand either F
n
,
or F -measurable and thereby sharpen the bounds (an example of such an expansion is
tn+1 t
8n =
(bx b)(s, XsX tn ,tn )dWs dWt
b
tn
tn+1
=
tn
tn
t
tn
X t ,tk+1
k
(bx b) s, Xs tn
,tn
73
We derive after a laborious computation which we will not include here that
E x tk+1 , X t
k+1
8n
8 k b
+ sk ek x tn+1 , X tn+1 + sn en b
N1
8 k b
8n
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en b
k,n=0,k=n
E tk2 E tn2
N1
C N 3/2
k,n=0,k=n
N1
1
C N 3/2
E tn2
n=0
C N 1/2
N1
E tn2 ,
n=0
N1
2 (bx b)2
=E
(tn , X tn )tn2 + o tn2 . (58)
x tn+1 , X tn
2
n=0
| n , can also be
| n , a
8 n and b
The MSE contribution from the other local error terms, a
m,n
,
bounded using the above approach with ItTaylor expansions, F -conditioning
and It isometries. This yields that
| k a
|n
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en a
at + ax a + axx b2 /2
(tk , X tk )
= E x X tk , tk x tn , X tn
2
a + a a + a b2 /2
t
x
xx
(tn , X tn )tk2 tn2 + o tk2 tn2 ,
2
(59)
74
H. Hoel et al.
8n
8 k a
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en a
E x tn , X t 2 (ax b)2 (tn , X t )t 3 + o t 3 , if k = n,
n
n
n
n
2
=
O N 3/2 E t 3 E t 3 1/2 ,
if k = n,
n
k
and
| k b
|n
E x tk+1 , X tk+1 + sk ek x tn+1 , X tn+1 + sn en b
E x tn , X t 2 (bt +bx a+bxx b2 /2)2 (tn , X t )t 3 + o t 3 , if k = n,
n
n
n
n
3
=
O N 3/2 E t 3 E t 3 1/2 ,
if k = n.
n
k
Moreover, conservative bounds for error contributions involving products of different
| k b
8 n , can be induced from the above bounds and Hlders
local error terms, e.g., a
inequality. For example,
N1
E
|
8
a
b
t
,
X
+
s
e
,
X
+
s
e
t
x
t
x
t
n
n
n
n+1
k+1
k
k
k
n+1
k+1
k,n=0
N1
N1
| k
8 n
= E
x tk+1 , X tk+1 + sk ek a
x tn+1 , X tn+1 + sn en b
k=0
k=0
'
(
2
(
( N1
(
| k
)E
x tk+1 , X t
+ sk ek a
k+1
k=0
'
(
2
(
( N1
8 n
(
x tn+1 , X tn+1 + sn en b
)E
n=0
= O N 1/2
N1
E tn2 .
n=0
X ,t
X ,t
X ,t
X ,t
E |x tn+1 , X tn x tn , X tn | = E g (XT tn n+1 )x XT tn n+1 g (XT tn n )x XT tn n
X ,t
X ,t
X ,t
E (g (XT tn n+1 ) g (XT tn n ))x XT tn n+1
X ,t
X ,t
X ,t
+ E g (XT tn n )(x XT tn n+1 x XT tn n )
= O N 1/2 .
75
Here, the last equality follows from the assumptions (M.2), (M.3), (R.2), and (R.3),
and Lemmas 1 and 2,
X ,t
X ,t
X ,t
E g (XT tn n+1 ) g (XT tn n ) x XT tn n+1
'
(
2
(
2
X tn ,tn
Xtn+1
,tn+1
( X tn ,tn+1
X tn ,tn+1
)
C E XT
XT
E x XT
4 1/4
X tn ,tn
(1sn )X tn +sn Xtn+1 ,tn+1
C E x XT
#
E
tn+1
tn
+
tn+1
tn
C E sup |a(s, XsX tn ,tn )|4 tn4 +
tn stn+1
= O N 1/2 ,
4 $1/4
1/4
tn stn+1
and that
'
(
2
(
X tn ,tn
X tn ,tn+1
X tn ,tn
X ,t
X ,t
E g (XT
)(x XT
x XT
) C )E x XT tn n+1 x XT tn n
'
(
2
X ,tn
(
Xt tn ,tn+1
(
X ,t
X tn ,tn
= C )E x XT tn n+1 x XT n+1
x Xtn+1
'
(
X ,tn
(
Xt tn ,tn+1
X ,t
C )E x XT tn n+1 x XT n+1
'
(
tn+1
tn+1
X ,tn
(
Xt tn ,tn+1
(
X ,t
X ,t
+ )E x XT n+1
ax (s, Xs tn n )ds +
bx (s, Xs tn n )dWs
tn
tn
2
'
(
tn+1
tn+1
X tn ,tn
(
(1sn )X tn +sn Xtn+1
,tn+1
(
X ,t
X ,t
C )E xx XT
ax (s, Xs tn n )ds +
bx (s, Xs tn n )dWs
tn
tn
2
+ O N 1/2
= O N 1/2 .
The last step is to replace the first variation of the exact path x tn , X tn with the
X t ,tn
76
H. Hoel et al.
X t ,tn
X ,t
X ,t
E x,n x tn , X tn = E g (X T )x X T n g (XT tn n )x XT tn n
X t ,tn
X ,t
X ,t
X ,t
E |g (X T )| x X T n x XT tn n + E g (X T ) g (XT tn n ) x XT tn n
= O N 1/2 .
dYu
(2)
(1)
(1)
(2)
(1)
(2)
dYu
(60)
77
defined for u (t, T ] with the initial condition Yt = (x, 1, 0, 0, 0). The first component of the vector coincides with Eq. (13), whereas the second one is the first variation
of the path from Eq. (16). The last three components can be understood as the second,
third and fourth variations of the path, respectively.
Making use of the solution of SDE (60), we also define the second, third and
fourth variations as
xx (t, x) = g (XTx,t )xx XTx,t + g (XTx,t )(x XTx,t )2 ,
xxx (t, x) = g (XTx,t )xxx XTx,t + + g (XTx,t )(x XTx,t )3 ,
xxxx (t, x) = g
+ + g
(61)
In the sequel, we prove that the solution to Eq. (60) when understood in the integral
sense that extends (13) is a well defined random variable with bounded moments.
Given sufficient differentiability of the payoff g, this results in the boundedness of
the higher order variations as required in Theorem 2.
Proof of Lemma 1. By writing (Ys(1) , Ys(2) ) := (Xsx,t , x Xsx,t ), (13) and (16) together
form an SDE:
dYs(1) = a(s, Ys(1) )ds + b(s, Ys(1) )dWs
(62)
dYs(2) = ax (s, Ys(1) )Ys(2) ds + bx (s, Ys(1) )Ys(2) dWs
for s (t, T ] and with initial condition Yt = (x, 1). As before, ax stands for the
partial derivative of the drift function with respect to its spatial argument. We note
that (62) has such a structure that dynamics of Ys(2) depends on Ys(1) , that, in turn, is
independent of Ys(2) . By the Lipschitz continuity of a(s, Ys(1) ) and the linear growth
bound of the drift and diffusion coefficients a(s, Ys(1) ) and b(s, Ys(1) ), respectively,
there exists a pathwise unique solution of Ys(1) that satisfies
E sup
s[t,T ]
|Ys(1) |2p
< , p N,
(cf. [22, Theorems 4.5.3 and 4.5.4 and Exercise 4.5.5]). As a solution of an It SDE,
XTx,t is measurable with respect to FT it generates.
Note that Theorem [20, Theorem 5.2.5] establishes that the solutions of (62) are
pathwise unique. Kloeden and Platen [22, Theorems 4.5.3 and 4.5.4] note that the
existence and uniqueness theorems for SDEs they present can be modified in order
to account for looser regularity conditions, and the proof below is a case in point.
Our approach below follows closely presentation of Kloeden and Platen, in order to
prove the existence and moment bounds for Ys(2) .
(2)
, n N by
Let us define Yu,n
(2)
Yu,n+1
=
t
(2)
ax (s, Ys(2) )Ys,n
ds
+
t
(2)
bx (s, Ys(2) )Ys,n
dWs ,
78
H. Hoel et al.
(2)
with Yu,1
= 1, for all u [t, T ]. We then have, using Youngs inequality, that
2
u
(1)
(2)
bx (s, Ys )Ys,n dWs
+ 2E
t
t
u
u
2
2
(2)
(2)
2(u t)E
ax (s, Ys(1) )Ys,n
ds + 2E
bx (s, Ys(1) )Ys,n
ds .
(2) 2
E Yu,n+1 2E
2
(2)
ax (s, Ys(1) )Ys,n
ds
Boundedness of the partial derivatives of the drift and diffusion terms in (62) gives
(2) 2
E Yu,n+1
C(u t + 1)E
(2) 2
ds .
1 + Ys,n
n N.
tuT
(2)
(2)
(2)
Now, set Yu,n
= Yu,n+1
Yu,n
. Then
(2) 2
E Yu,n
2E
u
t
2
(2)
ax (s, Ys(1) )Ys,n1 ds + 2E
u
t
2
(2)
bx (s, Ys(1) )Ys,n1 dWs
u
(2) 2
(2) 2
2(u t)
E ax (s, Ys(1) )Ys,n1 ds + 2
E bx (s, Ys(1) )Ys,n1 ds
t
t
u
(2) 2
C1
E Ys,n1 ds.
C1n1
(n 1)!
(u s)
n1
(2) 2
E Ys,1 ds.
(2) 2
Next, let us show that E Ys,1 is bounded. First,
(2) 2
E Yu,1 = E
(2)
ax (s, Ys(1) )Ys,2
ds
+
t
2
(3)
bx (s, Ys(1) )Yu,2
dWs
(2) 2
C(u t + 1) sup E Ys,2
.
s[t,u]
E Yu,n
,
n!
C n (T t)n
(2) 2
sup E Yu,n
.
n!
u[t,T ]
79
Define
(2)
,
Zn = sup Yu,n
tuT
(2)
(2)
ax (s, Ys(1) )Ys,n+1 ax (s, Ys(1) )Ys,n
ds
t
u
(2)
(1)
(1)
(2)
+ sup
bx (s, Ys )Ys,n+1 bx (s, Ys )Ys,n dWs .
Zn
tuT
E |Zn |2 2(T t)
C n (T t)n
,
n!
for some C R. Using the Markov inequality, we get
n4 C n (T t)n
.
P Zn > n2
n!
n=1
n=1
The right-hand side of the equation above converges by the ratio test, whereas the
BorelCantelli Lemma guarantees the (almost sure) existence of K N, such that
(2)
Zk < k 2 , k > K . We conclude that Yu,n
converges uniformly in L 2 (P) to the limit
&
(2)
(2)
(2)
Yu = n=1 Yu,n and that since {Yu,n }n is a sequence of continuous and Fu -adapted
processes, Yu(2) is also continuous and Fu -adapted. Furthermore, as n ,
u
u
u
(3)
(1) (3)
(1) (3)
(3)
C
a
(s,
Y
)Y
ds
a
(s,
Y
)Y
ds
Ys,n Ys ds 0, a.s.,
x
x
s
s,n
s
s
t
and, similarly,
(3)
bx (s, Ys(1) )Ys,n
dWs
0, a.s.
80
H. Hoel et al.
Having established that Yu(2) solves the relevant SDE and that it has a finite second
moment, we may follow the principles laid out in [22, Theorem 4.5.4] and show that
all even moments of
u
u
ax (t, Ys(1) )Ys(2) ds +
bx (t, Ys(1) )Ys(2) dWs
Yu(2) = +
t
are finite. By Its Lemma, we get that for any even integer l,
(3) l
Y =
u
2
l(l 1) (2) l2
Ys
bx (s, Ys(1) )Ys(2) ds
2
t u
(2) l2 (2)
Y Y
+
bx (s, Ys(1) )Ys(2) dWs .
s
s
u
+E
t
l2
l(l 1) Ys(2)
(1)
(2) 2
bx (s, Ys )Ys
ds .
2
E |Y2,u |l ds
+E
t
l2
(2) 2
l(l 1) Ys(2)
(1)
bx s, Ys Ys
ds .
2
By the same treatment for the latter integral, using that bx is bounded,
(2) l
C
E Y
u
l
E Yu(2) ds.
l
Thus, by Grnwalls inequality, E Y (2) < .
i{1,2,...,5}
2p
sup E Yu(i)
< ,
u[t,T ]
p N.
81
Furthermore, the higher variations as defined by Eq. (61) satisfy are FT -measurable
and for all p N,
@
?
max E |x (t, x)|2p , E |xx (t, x)|2p , E |xxx (t, x)|2p , E |xxxx (t, x)|2p < .
Proof We note that (60) shares with (62) the triangular dependence structure. That
(j) 1
for d1 < 5 has drift and diffusion functions a :
is, the truncated SDE for {Yu }dj=1
(j)
d1
d1
[0, T ] R R and b : [0, T ] Rd1 Rd1 d2 that do not depend on Yu for
j d1 .
This enables verifying existence of solutions for the SDE in stages: first for
(Y (1) , Y (2) ), thereafter for (Y (1) , Y (2) , Y (3) ), and so forth, proceeding iteratively to
add the next component Y (d1 +1) of the SDE. We shall also exploit this structure
for proving the result of bounded moments for each component. The starting point
for our proof is Lemma 1, which guarantees existence, uniqueness and the needed
moment bounds for the first two components Y (1) , and Y (2) . As one proceeds to Y (i) ,
i > 2, the relevant terms in (64) feature derivatives of a and b of increasingly high
order. The boundedness of these derivatives is guaranteed by assumption (R.1).
(3)
, n N by
Defining a successive set of approximations Yu,n
(3)
Yu,n+1
=
t
2
(3)
axx (s, Ys(1) ) Ys(2) + ax (s, Ys(2) )Ys,n
ds
u
2
(3)
+
bxx (s, Ys(1) ) Ys(2) + bx (s, Ys(2) )Ys,n
dWs ,
t
(3)
= 0, for all u [t, T ]. Let us denote by
with the initial approximation defined by Yu,1
Q=
t
2
axx (s, Ys(1) ) Ys(1) ds +
u
t
2
bxx (s, Ys(1) ) Ys(2) dWs
(63)
(3)
. We then have, using
the terms that do not depend on the, highest order variation Yu,n
Youngs inequality, that
2
(3) 2
E Yu,n+1 3E |Q| + 3E
2
u
(1) (3)
bx (s, Ys )Ys,n dWs
+ 3E
t
t
u
u
2
2
(3)
(3)
3E |Q|2 + 3(u t)E
ax (s, Ys(1) )Ys,n
ds + 3E
bx (s, Ys(1) )Ys,n
ds .
u
2
(3)
ax (s, Ys(1) )Ys,n
ds
The term Q is bounded by Lemma 1 and the remaining terms can be bounded by
the same methods as in the proof of 1. Using the same essential tools: Youngs
and Doobs inequalities, Grnwalls lemma, Markov inequality and BorelCantelli
(3)
converges. This limit
Lemma, we can establish the existence of a limit to which Yu,n
(3)
is the solution of of Yu , and has bounded even moments through arguments that are
straightforward generalisations of those already presented in the proof of Lemma 1.
82
H. Hoel et al.
t (0, T ]
(64)
tn+1
tn
tn
tn+1
tn
tn
t
tn+1
1
at(i) + ax(i)j a(j) + ax(i)j xk (bbT )(j,k)
2
bt
tn
tn
tn
tn+1
tn
ds dt,
1
(j)
+ bx(i,j)
a(k) + bx(i,j)
(bbT )(k,) ds dWt ,
k
2 k x
(j)
bx(i,j)
b(k,) dWs() dWt ,
k
where all the above integrand functions in all equations implicitly depend on the
X ,t
X ,t
state argument Xs tn n . In flow notation, at(i) is shorthand for at(i) (s, Xs tn n ).
Under sufficient regularity, a tedious calculation similar to the proof of Theorem 2
verifies that, for a given smooth payoff, g : Rd1 R,
N1
2
2
2
E g(XT) g X T
E
n tn + o tn ,
n=0
where
n :=
(i,j)
1
xi ,n (bbT )(k,) (bxk bxT )
(tn , X tn )xj ,n .
2
83
(65)
In the multi-dimensional setting, the ith component of first variation of the flow map,
x = (x1 , x2 , . . . , xd1 ), is given by
y,t (j)
y,t
xi (t, y) = gxj (XT )xi XT
.
The first variation is defined as the second component to the solution of the SDE,
dYs(1,i) = a(i) s, Ys(1) ds + b(i,j) s, Ys(1) dWs(j)
s, Ys(1) Ys(2,k,j) dWs() ,
dYs(2,i,j) = ax(i)k s, Ys(1) Ys(2,k,j) ds + bx(i,)
k
where s (t, T ] and the initial conditions are given by Yt(1) = x Rd1 , Yt(2) = Id1 ,
with Id1 denoting the d1 d1 identity matrix. Moreover, the extension of the numerical method for solving the first variation of the 1D flow map (23) reads
xi ,n = cx(j)i (tn , X tn ) xj ,n+1 , n = N 1, N 2, . . . 0.
(66)
xi ,N = gxi (X T ),
with the jth component of c : [0, T ] Rd1 Rd1 defined by
(j)
c(j) tn , X tn = X tn + a(j) (tn , X tn )tn + b(j,k) (tn , X tn )Wn(k) .
Let U and V denote subsets of Euclidean spaces and let us introduce the
multi-index
7 partial derivatives of order
& = (1 , 2 , . . . , d ) to represent spatial
|| := dj=1 j on the following short form x := dj=1 xj . We further introduce
the following function spaces.
C(U; V ) := {f : U V | f is continuous},
Cb (U; V ) := {f : U V | f is continuous and bounded},
dj
Cbk (U; V ) := f : U V | f C(U; V ) and j f Cb (U; V )
dx
for all integers 1 j k ,
Cbk1 ,k2 ([0, T ] U; V ) := f : [0, T ] U V | f C([0, T ] U; V ), and
j
t f Cb ([0, T ] U; V ) for all integers j k1 and 1 j + || k2 .
Theorem 3 (MSE leading order error expansion in the multi-dimensional setting)
Assume that drift and diffusion coefficients and input data of the SDE (64) fulfill
84
H. Hoel et al.
(R.1) a Cb2,4 ([0, T ] Rd1 ; Rd1 ) and b Cb2,4 ([0, T ] Rd1 ; Rd1 d2 ),
(R.2) there exists a constant C > 0 such that
|a(t, x)|2 + |b(t, x)|2 C(1 + |x|2 ),
p
E tn2p c3 E tn2 .
Then, as N increases,
2
E g(XT ) g X T
(i,j)
N1
xj (tn , X tn )
xi (bbT )(k,) (bxk bxT )
tn2 + o(tn2 ),
= E
2
n=0
where we have dropped the arguments of the first variation as well as the diffusion
matrices for clarity.
Replacing the first variation xi tn , X n by the numerical approximation xi ,n ,
as defined in (66) and using the error density notation from (65), we obtain the
following to leading order all-terms-computable error expansion:
N1
2
2
2
=E
n tn + o(tn ) .
E g(XT ) g X T
(67)
n=0
85
Algorithm 5 mlmcEstimator
Input: TOLT , TOLS , confidence , input mesh t {1} , input mesh intervals N1 , inital number
, weak convergence rate , SDE problem.
of samples M,
Output: Multilevel estimator AML .
Compute the confidence parameter CC () by (42).
Set L = 1.
while L < 3 or (44), using the input rate , is violated do
Set L = L + 1.
, generate a set of (EulerMaruyama) realizations { g i, }ML on mesh and
Set ML = M,
i=1
Wiener path pairs (t {L1} , t {L} ) and (W {L1} , W {L} ), where the uniform mesh pairs have
step sizes t {L1} = T /NL1 and t {L} = T /NL ), respectively.
for = 0 to L do
Compute the sample variance V ( g; Ml ).
end for
for = 0 to L do
Determine the number of samples by
3
2
4
L
CC 2 () Var( g) *
M =
N
Var(
g)
.
N
TOLS 2
=0
References
1. Avikainen, R.: On irregular functionals of SDEs and the Euler scheme. Financ. Stoch. 13(3),
381401 (2009)
2. Bangerth, W., Rannacher, R.: Adaptive Finite Element Methods for Differential Equations.
Lectures in Mathematics ETH Zrich. Birkhuser, Basel (2003)
3. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comput. Math. 89(18), 24792498 (2012)
4. Cliffe, K.A., Giles, M.B., Scheichl, R., Teckentrup, A.L.: Multilevel Monte Carlo methods and
applications to elliptic PDEs with random coefficients. Comput. Vis. Sci. 14(1), 315 (2011)
5. Collier, Nathan, Haji-Ali, Abdul-Lateef, Nobile, Fabio, von Schwerin, Erik, Tempone, Ral:
A continuation multilevel Monte Carlo algorithm. BIT Numer. Math. 55(2), 399432 (2014)
6. Durrett, R.: Probability: Theory and Examples, 2nd edn. Duxbury Press, Belmont (1996)
7. Gaines, J.G., Lyons, T.J.: Variable step size control in the numerical solution of stochastic
differential equations. SIAM J. Appl. Math. 57, 14551484 (1997)
8. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607617 (2008)
9. Giles, M.B.: Multilevel Monte Carlo methods. Acta Numerica 24, 259328 (2015)
10. Giles, M.B., Szpruch, L.: Antithetic multilevel Monte Carlo estimation for multi-dimensional
SDEs without Lvy area simulation. Ann. Appl. Probab. 24(4), 15851620 (2014)
86
H. Hoel et al.
11. Gillespie, D.T.: The chemical Langevin equation. J. Chem. Phys. 113(1), 297306 (2000)
12. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Applications of Mathematics
(New York), vol. 53. Springer, New York (2004). Stochastic Modelling and Applied Probability
13. Haji-Ali, A.-L., Nobile, F., von Schwerin, E., Tempone, R.: Optimization of mesh hierarchies
in multilevel Monte Carlo samplers. Stoch. Partial Differ. Equ. Anal. Comput. 137 (2015)
14. Heinrich, S.: Monte Carlo complexity of global solution of integral equations. J. Complex.
14(2), 151175 (1998)
15. Heinrich, S., Sindambiwe, E.: Monte Carlo complexity of parametric integration. J. Complex.
15(3), 317341 (1999)
16. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an
adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 141 (2014)
17. Hofmann, N., Mller-Gronbach, T., Ritter, K.: Optimal approximation of stochastic differential
equations by adaptive step-size control. Math. Comp. 69(231), 10171034 (2000)
18. Hunter, J.D.: Matplotlib: a 2d graphics environment. Comput. Sci. Eng. 9(3), 9095 (2007)
19. Ilie, S.: Variable time-stepping in the pathwise numerical solution of the chemical Langevin
equation. J. Chem. Phys. 137(23), 234110 (2012)
20. Karatzas, I., Shreve, S.E.: Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, vol. 113, 2nd edn. Springer, New York (1991)
21. Kebaier, A.: Statistical Romberg extrapolation: a new variance reduction method and applications to option pricing. Ann. Appl. Probab. 15(4), 26812705 (2005)
22. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Applications
of Mathematics (New York). Springer, Berlin (1992)
23. Lamba, H., Mattingly, J.C., Stuart, A.M.: An adaptive Euler-Maruyama scheme for SDEs:
convergence and stability. IMA J. Numer. Anal. 27(3), 479506 (2007)
24. LEcuyer, P., Buist, E.: Simulation in Java with SSJ. In: Proceedings of the 37th conference on
Winter simulation, WSC 05, pages 611620. Winter Simulation Conference (2005)
25. Milstein, G.N., Tretyakov, M.V.: Quasi-symplectic methods for Langevin-type equations. IMA
J. Numer. Anal. 23(4), 593626 (2003)
26. Mishra, S., Schwab, C.: Sparse tensor multi-level Monte Carlo finite volume methods for
hyperbolic conservation laws with random initial data. Math. Comp. 81(280), 19792018
(2012)
27. ksendal, B.: Stochastic Differential Equations. Universitext, 5th edn. Springer, Berlin (1998)
28. Platen, E., Heath, D.: A Benchmark Approach to Quantitative Finance. Springer Finance.
Springer, Berlin (2006)
29. Shreve, S.E.: Stochastic Calculus for Finance II. Springer Finance. Springer, New York (2004).
Continuous-time models
30. Skeel, R.D., Izaguirre, J.A.: An impulse integrator for Langevin dynamics. Mol. Phys. 100(24),
38853891 (2002)
31. Szepessy, A., Tempone, R., Zouraris, G.E.: Adaptive weak approximation of stochastic differential equations. Comm. Pure Appl. Math. 54(10), 11691214 (2001)
32. Talay, D.: Stochastic Hamiltonian systems: exponential convergence to the invariant measure,
and discretization by the implicit Euler scheme. Markov Process. Relat. Fields 8(2), 163198
(2002). Inhomogeneous random systems (Cergy-Pontoise, 2001)
33. Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic
differential equations. Stoch. Anal. Appl. 8(4), 483509 (1990)
34. Teckentrup, A.L., Scheichl, R., Giles, M.B., Ullmann, E.: Further analysis of multilevel Monte
Carlo methods for elliptic PDEs with random coefficients. Numer. Math. 125(3), 569600
(2013)
35. Yan, L.: The Euler scheme with irregular coefficients. Ann. Probab. 30(3), 11721194 (2002)
Abstract A new family of digital nets called Vandermonde nets was recently
introduced by the authors. We generalize the construction of Vandermonde nets
with a view to obtain digital nets that serve as stepping stones for new constructions
of digital sequences called Vandermonde sequences. Another new family of Vandermonde sequences is built from global function fields, and this family of digital
sequences has asymptotically optimal quality parameters for a fixed prime-power
base and increasing dimension.
Keywords Low-discrepancy point sets and sequences
sequences Digital point sets and sequences
(t, m, s)-nets
(t, s)-
87
88
with integers di 0 and 0 ai < bdi for 1 i s and with volume btm contains
exactly bt points of P.
The number t is called the quality parameter of a (t, m, s)-net in base b and it
should be as small as possible in order to get strong uniformity properties of the net.
It was shown in [7] (see also [8, Theorem 4.10]) that in the nontrivial case m 1,
the star discrepancy D N (P) of a (t, m, s)-net P in base b with N = bm satisfies
D N (P) B(b, s)bt N 1 (log N )s1 + O bt N 1 (log N )s2 ,
(1)
where B(b, s) and the implied constant in the Landau symbol depend only on b and
s. The currently best values of B(b, s) are due to Kritzer [6] for odd b and to Faure
and Kritzer [3] for even b.
Most of the known constructions of (t, m, s)-nets are based on the digital method
which was introduced in [7]. Although the digital method works for any base b 2,
we focus in the present paper on the case where b is a prime power. In line with
standard notation, we write q for a prime-power base. The construction of a digital
net over Fq proceeds as follows. Given a prime power q, a dimension s 1, and an
integer m 1, we let Fq be the finite field of order q and we choose m m matrices
C (1) , . . . , C (s) over Fq . We write Z q = {0, 1, . . . , q 1} Z for the set of digits in
base q. Then we define the map m : Fqm [0, 1) by
m (b ) =
m
(b j )q j
j=1
(2)
By letting b range over all q m column vectors in Fqm , we arrive at a digital net
consisting of q m points in [0, 1)s .
Definition 2 If the digital net P over Fq consisting of the q m points in (2) with b
ranging over Fqm is a (t, m, s)-net in base q for some value of t, then P is called a
digital (t, m, s)-net over Fq . The matrices C (1) , . . . , C (s) are the generating matrices
of P.
89
This construction of digital nets can be generalized somewhat by employing further bijections between Fq and Z q (see [8, p. 63]), but this is not needed for our
purposes since our results depend only on the generating matrices of a given digital
net. Note that a digital net over Fq consisting of q m points in [0, 1)s is always a digital
(t, m, s)-net over Fq with t = m.
A new family of digital nets called Vandermonde nets was recently introduced
by the authors in [5]. In the present paper, we extend the results in [5] in several
directions. Most importantly, we show how to obtain not only new (t, m, s)-nets, but
also new (t, s)-sequences from our approach. It seems reasonable to give the name
Vandermonde sequences to these (t, s)-sequences.
The rest of the paper is organized as follows. In Sect. 2, we briefly review the construction of digital nets in [5]. We generalize this construction in Sect. 3, as a preparation for the construction of Vandermonde sequences. Finally, the constructions of
new (t, s)-sequences and more generally of (T, s)-sequences called Vandermonde
sequences are presented in Sects. 4 and 5.
90
L(a) = h G q,m
s1
Hq,m
s
h i (i ) = 0
i=1
and L
(a) = L(a)\{0}. The following figure of merit was defined in [5, Definition 2.1]. We use the standard convention that an empty sum is equal to 0.
Definition 3 If L
(a) is nonempty, we define the figure of merit
(a) = min
hL (a)
deg (h 1 ) +
s
deg(h i ) .
i=2
91
by an averaging argument (see [5, Sect. 3]). Again for a prime q, suitable s-tuples
a Fqs m yielding this improved discrepancy bound can be obtained by a componentby-component algorithm (see [5, Sect. 5]).
We comment on the relationship between Vandermonde nets and other known
families of digital nets. A broad class of digital nets, namely that of hyperplane
nets, was introduced in [17] (see also [1, Chap. 11]). Choose 1 , . . . , s Fq m not
all 0. Then for the corresponding hyperplane net relative to a fixed ordered basis
1 , . . . , m of Fq m over Fq , the matrix C = ( j(i) )1is, 1 jm at the beginning of this
section is given by j(i) = i j for 1 i s and 1 j m (see [1, Theorem 11.5]
and [10, Remark 6.4]). Thus, this matrix C is also a structured matrix, but the structure
is in general not a Vandermonde structure. Consequently, Vandermonde nets are in
general not hyperplane nets relative to a fixed ordered basis of Fq m over Fq . The wellknown family of polynomial lattice point sets (see [1, Chap. 10] and [15]) belongs
to the family of hyperplane nets by [16, Theorem 2] (see also [1, Theorem 11.7]),
and so Vandermonde nets are in general not polynomial lattice point sets.
92
s1
:
(h i gi ) 0 (mod f )
L(g, f ) = h G q,m Hq,m
i=1
and L
(g, f ) = L(g, f )\{0}.
Definition 4 Let q be a prime power and let s, m N. Let f Fq [X ] with deg( f ) =
s
. If L
(g, f ) is nonempty, we define the figure of merit
m and let g G q,m
(g, f ) =
min
hL (g, f )
deg (h 1 ) +
s
deg(h i ) .
i=2
j=0
m1
j
b j g1 + bm g2 = 0 Fqm .
j=0
m1
m1
j=0
b j g1 + bm g2 0 (mod f ). If we put
b j X j , h 2 (X ) = bm X, h i (X ) = 0 for 3 i s,
j=0
s1
is a nonzero s-tuple belonging to L(g, f ).
then h = (h 1 , . . . , h s ) G q,m Hq,m
93
t = m (g, f ).
Proof The case (g, f ) = 0 is trivial, and so in view of Remark 1 we can assume that
1 (g, f ) m. According to a well-known result for digital nets (see [1, Theorem 4.52]),
for any nonnegative integers d1 , . . . , ds
s it suffices to show the following:(i)
with i=1
di = (g, f ), the row vectors c j Fqm , 1 j di , 1 i s, of the
generating matrices of V (g, f ) are linearly independent over Fq . Suppose, on the
contrary, that we had a linear dependence relation
di
s
m
bi, j c(i)
j = 0 Fq ,
i=1 j=1
where all bi, j Fq and not all of them are 0. By the definition of the c(i)
j and the
Fq -linearity of f we obtain
f
d1
j1
b1, j g1
j=1
s
d1
i=1 (h i
di
s
bi, j gi
= 0 Fqm .
i=2 j=1
gi ) 0 (mod f ), where
b1, j X j1 G q,m , h i (X ) =
j=1
di
j=1
and so h = (h 1 , . . . , h s ) L
(g, f ). Furthermore, by the definitions of the degree
functions deg and deg in Sect. 2, we have deg (h 1 ) < d1 and deg(h i ) di for 2
i s. It follows that
deg (h 1 ) +
s
i=2
deg(h i ) <
s
di = (g, f ),
i=1
Now we generalize the explicit construction of Vandermonde nets in [5, Sect. 4].
Let q be a prime power and let s and m be integers with 1 s q + 1 and m 2. Put
g1 (X ) = X G q,m . If s 2, then we choose s 1 distinct elements c2 , . . . , cs of Fq ;
this is possible since s 1 q. Furthermore, let f Fq [X ] be such that deg( f ) =
m. If s 2, then suppose that f (ci ) = 0 for 2 i s (for instance, this condition
is automatically satisfied if f is a power of a nonlinear irreducible polynomial over
Fq ). For each i = 2, . . . , s, we have gcd(X ci , f (X )) = 1, and so there exists a
uniquely determined gi G q,m with
gi (X )(X ci ) 1 (mod f (X )).
(4)
94
=
deg
(h
)
and
d
=
deg(h
)
for
2
s.
Now
h
L
(g,
f
)
where
d
1
1
i
i
s implies that
s
dk
i=1 (h i gi ) 0 (mod f ), and multiplying this congruence by
k=2 (X ck )
we get
h 1 (X )
s
(X ck ) +
dk
k=2
(h i gi )(X )
i=2
If we write h i (X ) =
(h i gi )(X )
s
s
di
j=1
s
k=2
(X ck )dk =
di
k=2
h i, j gi (X )
s
j=1
di
(X ck )dk
k=2
j
h i, j gi (X )
(X ci )
j=1
di
s
(X ck )dk
k=2
k =i
h i, j (X ci )di j
j=1
di
s
(X ck )dk (mod f (X ))
k=2
k =i
by (4), and so
h 1 (X )
s
k=2
(X ck ) +
dk
di
s
i=2
j=1
h i, j (X ci )di j
s
k=2
k =i
Let f 0 Fq [X ] denote
sthe left-hand side of the preceding
s congruence. The first term
di m 1. In the sum i=2
in
expression
f0 , a
of f 0 has degree i=1
the
for
s
s
di 1 i=1
di
term appears only if di 1 and such a term has degree i=2
m 1 since d1 = deg (h 1 ) 1. Altogether we have deg( f 0 ) m 1 < deg( f ).
But f divides f 0 according to the congruence above, and so f 0 = 0 Fq [X ]. If we
assume that dr 1 for some r {2, . . . , s}, then substituting X = cr in f 0 (X ) we
obtain
0 = f 0 (cr ) =
dr
j=1
h r, j (cr cr )dr j
s
95
k=2
k =r
s
(cr ck )dk .
k=2
k =r
Since the last product is nonzero, we deduce that h r,dr = 0. This is a contradiction to
deg(h r ) = dr . Thus we have shown that di = 0 for 2 i s, and so h i = 0 Fq [X ]
for 2 i s. Since f 0 = 0 Fq [X ], it follows that also h 1 = 0 Fq [X ]. This is
the final contradiction since h L
(g, f ) means in particular that h is a nonzero
s-tuple.
In the case where f Fq [X ] with deg( f ) = m 2 is irreducible over Fq , this
construction of Vandermonde (0, m, s)-nets over Fq is equivalent to that in [5,
Sect. 4]. The construction is best possible in terms of the condition on s since it is well
known that if m 2, then s q + 1 is a necessary condition for the existence of a
(0, m, s)-net in base q (see [8, Corollary 4.21]). The fact that we can explicitly construct Vandermonde (0, m, s)-nets over Fq for all dimensions s q + 1 represents
an advantage over polynomial lattice point sets since explicit constructions of good
polynomial lattice point sets are known only for s = 1 and s = 2 (see [8, Sect. 4.4]
and also [1, p. 305]).
96
we restrict the attention to the case of a prime-power base b = q. For a given dimension s 1, the generating matrices are now matrices C (1) , . . . , C (s) over
Fq , where by an matrix we mean a matrix with denumerably many rows
and columns. Let Fq be the sequence space over Fq , viewed as a vector space of
column vectors over Fq of infinite length. We define the map : Fq [0, 1] by
(e) =
(e j )q j
j=1
a j (n)q j1 ,
n=
j=1
with all a j (n) Z q and a j (n) = 0 for all sufficiently large j, be the unique digit
expansion of n in base q. With n we associate the column vector
n = ((a1 (n)), (a2 (n)), . . .) Fq ,
where : Z q Fq is a given bijection with (0) = 0. Now we define the sequence
S by
xn = (C (1) n), . . . , (C (s) n) [0, 1]s
for n = 0, 1, . . . .
Note that the matrix-vector products C (i) n for i = 1, . . . , s are meaningful since
n has only finitely many nonzero coordinates. The sequence S is called a digital
sequence over Fq .
Definition 6 If the digital sequence S over Fq is a (T, s)-sequence in base q for
some function T : N N0 with T(m) m for all m N, then S is called a digital
(T, s)-sequence over Fq . Similarly, if S is a (t, s)-sequence in base q for some
integer t 0, then S is called a digital (t, s)-sequence over Fq .
For i = 1, . . . , s and any integer m 1, we write Cm(i) for the left upper m m
submatrix of the generating matrix C (i) of a digital sequence over Fq . The following well-known result serves to determine a suitable function T for a given digital
sequence over Fq (see [1, Theorem 4.84]).
Lemma 1 Let S be a digital sequence over Fq with generating matrices C (1) , . . . ,
C (s) and let T : N N0 with T(m) m for all m N. Then S is a digital (T, s)sequence over Fq if the following property holds: for any integer m 1 and any
s
m
di = m T(m), the vectors c(i)
integers d1 , . . . , ds 0 with i=1
j,m Fq , 1 j
(i)
di , 1 i s, are linearly independent over Fq , where c j,m denotes the jth row
vector of Cm(i) .
97
di = (m + r ) (T(m) + r ) = m T(m),
i=1
m+r
the vectors c(i)
, 1 j di , 1 i s, are linearly independent over Fq .
j,m+r Fq
But this is obvious since any nontrivial linear dependence relation between the latter
vectors would yield, by projection onto the first m coordinates of these vectors, a
m
nontrivial linear dependence relation between the vectors c(i)
j,m Fq , 1 j di ,
1 i s.
Now we show how to obtain digital (T, s)-sequences over Fq from the Vandermonde nets in Theorem 2. Let k and s be integers with k 2 and 1 s q + 1. Let
f Fq [X ] be such that deg( f ) = k. If s 2, then let c2 , . . . , cs Fq be distinct
and suppose that f (ci ) = 0 for 2 i s. For any integer e 1, we consider the
modulus f e Fq [X ]. We have again f e (ci ) = 0 for 2 i s, and so Theorem 2
yields a Vandermonde net V (g e , f e ) which is a digital (0, ek, s)-net over Fq . We
write
s
for all e N.
g e = (g1,e , . . . , gs,e ) G q,ek
Then we have the compatibility property
g e+1 g e (mod f e )
for all e N,
(5)
98
other coordinates, the congruence follows from the fact that gi G q,m is uniquely
determined by (4).
Recall that V (g e , f e ) depends also on the choice of an ordered basis Be of the
vector space Fq [X ]/( f e ) over Fq (see Sect. 3). We make these ordered bases Be
for e N compatible by choosing them as follows. Let B1 consist of the residue
classes of 1, X, . . . , X k1 modulo f (X ), let B2 consist of the residue classes of
1, X, . . . , X k1 , f (X ), X f (X ), . . . , X k1 f (X ) modulo f 2 (X ), and so on in an obvious manner. For the maps f , f 2 , . . . in Sect. 3, this has the pleasant effect that for
any e N and any h Fq [X ] we have
f e (h) = (e+1)k,ek ( f e+1 (h)),
(6)
where (e+1)k,ek : Fq(e+1)k Fqek is the projection onto the first ek coordinates of a
vector in Fq(e+1)k .
Finally, we construct the generating matrices C (1) , . . . , C (s) of an sdimensional digital sequence over Fq . We do this by defining certain left upper square
submatrices of each C (i) and by showing that these submatrices are compatible.
(i)
Concretely, for i = 1, . . . , s and any e N, the left upper (ek) (ek) submatrix Cek
(i)
e
of C is defined as the ith generating matrix of the Vandermonde net V (g e , f ).
For this to make sense, we have to verify the compatibility condition that for each
(i)
is equal to
i = 1, . . . , s and e N, the left upper (ek) (ek) submatrix of C(e+1)k
(i)
Cek . In the notation of Lemma 1, this means that we have to show that
(i)
c(i)
j,ek = (e+1)k,ek (c j,(e+1)k )
by (6) and (5), and obvious modifications show the analogous identity for i = 1. This
completes the construction of the Vandermonde digital sequence S over Fq with
generating matrices C (1) , . . . , C (s) .
Theorem 3 Let q be a prime power and let k and s be integers with k 2 and 1
s q + 1. Let f Fq [X ] be such that deg( f ) = k. If s 2, then let c2 , . . . , cs
Fq be distinct and suppose that f (ci ) = 0 for 2 i s. Then the Vandermonde
sequence S constructed above is a digital (T, s)-sequence over Fq with T(m) =
rk (m) for all m N, where rk (m) is the least residue of m modulo k.
Proof It suffices to show that S is a digital (T0 , s)-sequence over Fq with T0 (m) = 0
if m 0 (mod k) and T0 (m) = m otherwise. The rest follows from Lemma 2.
Now let m 0 (mod k), say m = ek with e N. Then for m = ek, we have to
verify the linear independence property in Lemma 1 for the left upper (ek) (ek)
(1)
(s)
(1)
(s)
of S , with the
submatrices
ek , . . . , C ek of the generating matrices C , . . . , C
C
s
condition i=1 di = ek in Lemma 1. By the construction of the latter generating
99
(1)
(s)
matrices, the submatrices Cek
, . . . , Cek
are the generating matrices of the Vandere
e
monde net V (g e , f ). Now V (g e , f ) is a digital (0, ek, s)-net over Fq by Theorem 2, and this implies the desired linear independence property in Lemma 1 for
(1)
(s)
, . . . , Cek
.
Cek
100
nP P
PP F
with n P Z for all P P F and all but finitely many n P = 0. We write also n P =
P (D). The finite set of all places P of F with P (D) = 0 is called the support of
D. The degree deg(D) of a divisor D is defined by
deg(D) =
n P deg(P) =
PP F
P (D) deg(P).
PP F
Divisors are added and subtracted term by term. We say that a divisor D of F is
positive if P (D) 0 for all P P F . The principal divisor div( f ) of f F is
defined by
div( f ) =
P ( f ) P.
PP F
(i) $\kappa_j^{(i)} \in \mathcal{L}(D + jP_i - jP_1) \setminus \big(\mathcal{L}(D + jP_i - (j+1)P_1) \cup \mathcal{L}(D + (j-1)P_i - jP_1)\big)$,
(ii) $\nu_{P_1}(\kappa_j^{(i)}) = j - \nu_{P_1}(D)$,
(iii) $\nu_{P_i}(\kappa_j^{(i)}) = -j$,
(iv) $\nu_{P_h}(\kappa_j^{(l)}) \ge 0$
for $j \ge 1$, for $2 \le i \le s$, and for $2 \le h \le s$ and $1 \le l \le s$ with $h \ne l$.
Proof We first observe that obviously
$$\deg\big(D + (j-1)P_1 - (j-1)P_2\big) = 2g, \tag{9}$$
$$\deg\big(D + jP_i - (j+1)P_1\big) = 2g - 1, \tag{10}$$
$$\deg\big(D + (j-1)P_i - jP_1\big) = 2g - 1, \tag{11}$$
$$\mathcal{L}\big(D + jP_i - (j+1)P_1\big) \cap \mathcal{L}\big(D + (j-1)P_i - jP_1\big) = \{0\}, \tag{12}$$
where we used the Riemann–Roch theorem together with (9), (10), (11), and (12).
The results (i), (ii), (iii), and (iv) are now obtained from the choice of the $\kappa_j^{(i)}$ for $1 \le i \le s$ and $j \ge 1$ and from the given properties of the divisor $D$.
Example 2 If $F$ is the rational function field $\mathbb{F}_q(X)$, then the elements $\kappa_j^{(i)} \in F$ in Lemma 4 can be given explicitly. For this $F$ we have the so-called infinite place (which is a rational place of $F$), and the remaining places of $F$ are in one-to-one correspondence with the monic irreducible polynomials over $\mathbb{F}_q$ (see [14, Sect. 1.5]). For an integer $s$ with $2 \le s \le q + 1$, let $P_1$ be the infinite place of $F$ and for $i = 2, \ldots, s$ let $P_i$ be the rational place of $F$ corresponding to the polynomial $X - c_i \in \mathbb{F}_q[X]$, where $c_2 = 0$ and $c_3, \ldots, c_s$ are distinct elements of $\mathbb{F}_q^*$. Let $D$ be the zero divisor of $F$. Then the elements $\kappa_j^{(1)} = X^j$ for $j \ge 1$ and $\kappa_j^{(i)} = (X - c_i)^{-j}$ for $2 \le i \le s$ and $j \ge 1$ satisfy all properties in Lemma 4 (note that no choice of $P_\infty$ is needed for Lemma 4). There is an obvious relationship between these elements $\kappa_j^{(i)}$ and the construction of Vandermonde sequences in Sect. 4 (compare also with the construction leading to Theorem 2).
A trick that was used in [20] for the construction of good digital sequences comes in handy now. We first determine a basis $\{w_1, \ldots, w_g\}$ of the vector space $\mathcal{L}(D - P_1)$ with dimension $\ell(D - P_1) = g$ as follows. By the Riemann–Roch theorem and Lemma 3, we know the dimensions $\ell(D - P_1) = g$ and $\ell(D - P_1 - 2g P_\infty) = 0$. Hence there exist integers $0 \le n_1 < \cdots < n_g < 2g$ such that
$$\ell(D - P_1 - n_r P_\infty) = \ell(D - P_1 - (n_r + 1) P_\infty) + 1 \quad \text{for } 1 \le r \le g.$$
Now we choose $w_r \in \mathcal{L}(D - P_1 - n_r P_\infty) \setminus \mathcal{L}(D - P_1 - (n_r + 1) P_\infty)$ to obtain the basis $\{w_1, \ldots, w_g\}$ of $\mathcal{L}(D - P_1)$. Note that $\nu_{P_\infty}(w_r) = n_r$, $\nu_{P_1}(w_r) \ge 1 - \nu_{P_1}(D)$, and $\nu_{P_i}(w_r) \ge 0$ for all $2 \le i \le s$, $1 \le r \le g$.
Lemma 5 With the notation above, the system $\{w_1, \ldots, w_g\} \cup \{\kappa_j^{(i)}\}_{1 \le i \le s,\, j \ge 1}$ is linearly independent over $\mathbb{F}_q$.
Proof The linear independence of $\{\kappa_j^{(i)}\}_{j \ge 1}$ for every fixed $i = 1, \ldots, s$ is obvious from the known values of the valuations in Lemma 4. Suppose that
$$\sum_{r=1}^{g} a_r w_r + \sum_{i=1}^{s} \sum_{j=1}^{v} b_j^{(i)} \kappa_j^{(i)} = 0.$$
Isolating the terms belonging to an index $h$ with $2 \le h \le s$ gives
$$\sum_{j=1}^{v} b_j^{(h)} \kappa_j^{(h)} = -\sum_{r=1}^{g} a_r w_r - \sum_{\substack{i=1 \\ i \ne h}}^{s} \sum_{j=1}^{v} b_j^{(i)} \kappa_j^{(i)},$$
and, after the coefficients $b_j^{(h)}$ for $2 \le h \le s$ have been shown to vanish,
$$\sum_{j=1}^{v} b_j^{(1)} \kappa_j^{(1)} = -\sum_{r=1}^{g} a_r w_r.$$
$$\sum_{k=0}^{\infty} a_{j,k}^{(i)} z^k$$
$$\sum_{i=1}^{s} \sum_{j=1}^{m} b_j^{(i)} c_{j,m}^{(i)} = 0 \in \mathbb{F}_q^m. \tag{13}$$
$$\sum_{i=1}^{s} \sum_{j=1}^{d_i} b_j^{(i)} \kappa_j^{(i)} - \sum_{i=1}^{s} \sum_{j=1}^{d_i} b_j^{(i)} \sum_{r=1}^{g} a_{j,n_r}^{(i)} w_r =: \sum_{\substack{k=0 \\ k \ne n_1, \ldots, n_g}}^{\infty} \Big( \sum_{i=1}^{s} \sum_{j=1}^{d_i} b_j^{(i)} a_{j,k}^{(i)} \Big) z^k,$$
$$\sum_{k=m+g}^{\infty} \Big( \sum_{i=1}^{s} \sum_{j=1}^{d_i} b_j^{(i)} a_{j,k}^{(i)} \Big) z^k.$$
for $1 \le j \le d_1$, and
$$\kappa_j^{(i)} \in \mathcal{L}(D + jP_i - jP_1) \subseteq \mathcal{L}(D + d_i P_i - P_1) \subseteq \mathcal{L}\big(D + (d_1 - 1)P_1 + d_2 P_2 + \cdots + d_s P_s\big)$$
for all $m \in \mathbb{N}$,
References
1. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
2. Faure, H.: Discrépance de suites associées à un système de numération (en dimension s). Acta Arith. 41, 337–351 (1982)
3. Faure, H., Kritzer, P.: New star discrepancy bounds for (t, m, s)-nets and (t, s)-sequences. Monatsh. Math. 172, 55–75 (2013)
4. Hofer, R., Niederreiter, H.: Explicit constructions of Vandermonde sequences using global function fields. Preprint available at http://arxiv.org/abs/1311.5739
5. Hofer, R., Niederreiter, H.: Vandermonde nets. Acta Arith. 163, 145–160 (2014)
6. Kritzer, P.: Improved upper bounds on the star discrepancy of (t, m, s)-nets and (t, s)-sequences. J. Complex. 22, 336–347 (2006)
7. Niederreiter, H.: Point sets and sequences with small discrepancy. Monatsh. Math. 104, 273–337 (1987)
8. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
9. Niederreiter, H.: (t, m, s)-nets and (t, s)-sequences. In: Mullen, G.L., Panario, D. (eds.) Handbook of Finite Fields, pp. 619–630. CRC Press, Boca Raton (2013)
10. Niederreiter, H.: Finite fields and quasirandom points. In: Charpin, P., Pott, A., Winterhof, A. (eds.) Finite Fields and Their Applications: Character Sums and Polynomials, pp. 169–196. de Gruyter, Berlin (2013)
11. Niederreiter, H., Özbudak, F.: Low-discrepancy sequences using duality and global function fields. Acta Arith. 130, 79–97 (2007)
12. Niederreiter, H., Xing, C.P.: Low-discrepancy sequences and global function fields with many rational places. Finite Fields Appl. 2, 241–273 (1996)
13. Niederreiter, H., Xing, C.P.: Rational Points on Curves over Finite Fields: Theory and Applications. Cambridge University Press, Cambridge (2001)
14. Niederreiter, H., Xing, C.P.: Algebraic Geometry in Coding Theory and Cryptography. Princeton University Press, Princeton (2009)
15. Pillichshammer, F.: Polynomial lattice point sets. In: Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 189–210. Springer, Berlin (2012)
16. Pirsic, G.: A small taxonomy of integration node sets. Österreich. Akad. Wiss. Math.-Naturw. Kl. Sitzungsber. II 214, 133–140 (2005)
17. Pirsic, G., Dick, J., Pillichshammer, F.: Cyclic digital nets, hyperplane nets, and multivariate integration in Sobolev spaces. SIAM J. Numer. Anal. 44, 385–411 (2006)
18. Schürer, R.: A new lower bound on the t-parameter of (t, s)-sequences. In: Keller, A., Heinrich, S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 623–632. Springer, Berlin (2008)
19. Stichtenoth, H.: Algebraic Function Fields and Codes, 2nd edn. Springer, Berlin (2009)
20. Xing, C.P., Niederreiter, H.: A construction of low-discrepancy sequences using global function fields. Acta Arith. 73, 87–102 (1995)
1 Introduction
The central goal of light transport algorithms in computer graphics is the generation
of renderings, two-dimensional images that depict a simulated environment as if
photographed by a virtual camera. Driven by the increasing demand for photorealism,
computer graphics is currently undergoing a substantial transition to physics-based
rendering techniques that compute such images while accurately accounting for the
interaction of light and matter.
These methods require a detailed model of the scene including the shape and
optical properties of all objects including light sources; the final rendering is then
generated by a simulation of the relevant physical laws, specifically transport and
scattering, i.e., the propagation of light and its interaction with the materials that
comprise the objects. In this article, we present a high-level overview of the underlying physics and analyze how this leads to a high-dimensional integration problem
that is typically handled using Monte Carlo methods.
W. Jakob (B)
Realistic Graphics Lab, EPFL, Lausanne, Switzerland
e-mail: wenzel.jakob@epfl.ch
Section 2 begins with a discussion of the geometric optics framework used in computer graphics. After defining the necessary notation and physical units, we state
the energy balance equation that characterizes the interaction of light and matter.
Section 3 presents a simple recursive Monte Carlo estimator that solves this equation, though computation time can be prohibitive if accurate solutions are desired.
Section 4 introduces path space integration, which offers a clearer view of the underlying light transport problem. This leads to a large class of different estimators that
can be combined to improve convergence. Section 5 introduces MCMC methods in
rendering. Section 6 covers an MCMC method that explores a lower-dimensional
manifold of light paths, and Sect. 7 discusses extensions to cases involving interreflection between glossy objects. Section 8 concludes with a discussion of limitations and unsolved problems.
This article is by no means a comprehensive treatment of rendering; the selection of topics is entirely due to the author's personal preference. It is intended that the discussion will be helpful to readers who are interested in obtaining an understanding of recent work on path-space methods and applications of MCMC methods in rendering.
$$\{\omega \in S^2 : \omega \cdot N(x) > 0\} \quad\text{and}\quad \{\omega \in S^2 : \omega \cdot N(x) < 0\}.$$
With the help of these definitions, we can introduce the surface energy balance equation that describes the relation between the incident and outgoing radiance based on the material properties at $x$:
$$L_o(x, \omega) = \int_{S^2} L_i(x, \omega')\, f(x, \omega' \to \omega)\, |\omega' \cdot N(x)|\, d\omega' + L_e(x, \omega), \qquad x \in \mathcal{M}. \tag{1}$$
The integration domain $S^2$ is the unit sphere and $f$ is the bidirectional scattering distribution function (BSDF) of the surface, which characterizes the surface's response
Fig. 2 Illustration of the energy balance Eq. (1) on surfaces. Here, it is used to compute the pixel color of the surface location highlighted in white (only the top hemisphere is shown in the figure); the labeled terms are the incident radiance, the reflectance function, the foreshortening factor, the emitted radiance, and the final pixel color
Fig. 3 An overview of common material types. The left side of each example shows a 2D illustration of the underlying scattering process for light arriving from the direction highlighted in red. The right side shows a corresponding rendering of a material test object
Models based on microfacet theory [4, 27, 32] are a popular choice in particular: they model the interaction of light with random surfaces composed of tiny microscopic facets that are oriented according to a statistical distribution. Integration over this distribution then leads to simple analytic expressions that describe the expected reflection and refraction properties at a macroscopic scale. In this article, we assume that the BSDFs are provided as part of the input scene description and will not discuss their definitions in detail.
3 Path Tracing
We first discuss how Eq. (1) can be solved using Monte Carlo integration, which leads to a simple method known as Path Tracing [12]. For this, it will be convenient to establish some further notation: we define the distance to the next surface encountered by the ray $(x, \omega) \in \mathbb{R}^3 \times S^2$ as
$$d_{\mathcal{M}}(x, \omega) := \inf\,\{d > 0 \mid x + d\omega \in \mathcal{M}\},$$
where $\inf \emptyset = \infty$. Based on this distance, we can define a ray-casting function $r$:
$$r(x, \omega) := x + d_{\mathcal{M}}(x, \omega)\, \omega. \tag{2}$$
Due to the preservation of radiance along unoccluded rays, the ray-casting function can be used to relate the quantities $L_i$ and $L_o$:
$$L_i(x, \omega) = L_o\big(r(x, \omega), -\omega\big).$$
In other words, to find the incident radiance along a ray $(x, \omega)$, we need only determine the nearest surface visible in this direction and evaluate its outgoing radiance in the opposite direction. Using this relation, we can eliminate $L_i$ from the energy balance Eq. (1):
$$L_o(x, \omega) = \int_{S^2} L_o\big(r(x, \omega'), -\omega'\big)\, f(x, \omega' \to \omega)\, |\omega' \cdot N(x)|\, d\omega' + L_e(x, \omega). \tag{3}$$
Although the answer is still not given explicitly, the equation is now in a form
that is suitable for standard integral equation solution techniques. However, this is
made difficult by the ill-behaved nature of the integrand, which is generally riddled
with singularities and discontinuities caused by visibility changes in the ray-casting
function r. Practical solution methods often rely on a Neumann series expansion of the
underlying integral operator, in which case the resulting high number of dimensions
rules out standard deterministic integration rules requiring an exponential number
of function evaluations. Monte Carlo methods are resilient to these issues and hence
see significant use in rendering.
To obtain an unbiased MC estimator based on Eq. (3), we replace the integral with a single sample of the integrand at a random direction $\omega'$ and divide by its probability density $p(\omega')$, i.e.
$$L_o(x, \omega) \approx \frac{L_o\big(r(x, \omega'), -\omega'\big)\, f(x, \omega' \to \omega)\, |\omega' \cdot N(x)|}{p(\omega')} + L_e(x, \omega). \tag{4}$$
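A minimal Python sketch of the resulting recursive estimator is given below. The scene interface (emitted, sample_direction, ray, bsdf, cos_theta) is hypothetical and stands in for the black box functions discussed in this article; for brevity, recursion is terminated by a fixed depth cutoff rather than Russian roulette, which a production implementation would use to remain unbiased.

```python
# Sketch of the one-sample recursive estimator (4); the scene API is hypothetical
# and directions are assumed to be vectors supporting negation.

def radiance(scene, x, w, depth=0, max_depth=10):
    """Estimate L_o(x, w) with one sample of the scattering integral per bounce."""
    L = scene.emitted(x, w)                      # L_e(x, w)
    if depth >= max_depth:
        return L
    wp, pdf = scene.sample_direction(x)          # random w' and its density p(w')
    y = scene.ray(x, wp)                         # closest surface point r(x, w')
    if y is None or pdf == 0.0:
        return L
    fr = scene.bsdf(x, wp, w)                    # f(x, w' -> w)
    cos = abs(scene.cos_theta(x, wp))            # |w' . N(x)|
    return L + radiance(scene, y, -wp, depth + 1, max_depth) * fr * cos / pdf
```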
This equation can be expressed compactly using the transport operator $T$, defined as
$$(T h)(x, \omega) := \int_{S^2} h\big(r(x, \omega'), -\omega'\big)\, f(x, \omega' \to \omega)\, |\omega' \cdot N(x)|\, d\omega', \tag{5}$$
so that Eq. (3) becomes $L_o = T L_o + L_e$. Expanding this equation into a Neumann series yields
$$L_o = L_e + T L_e + T^2 L_e + \cdots, \tag{6}$$
which intuitively expresses the property that the outgoing radiance is equal to the emitted radiance plus radiance that has scattered one or more times (the sum converges since the energy of the multiply scattered illumination tends to zero).
Rather than explicitly computing the radiance function $L_o$, the objective of rendering is usually to determine the response of a simulated camera to illumination that reaches its aperture. Suppose that the sensitivity of a pixel $j$ in the camera is given by a sensitivity profile function $W_e^{(j)} : \mathcal{M} \times S^2 \to \mathbb{R}$ defined on ray space. The intensity $I_j$ of the pixel is given by
$$I_j = \int_{\mathcal{M}} \int_{S^2} W_e^{(j)}(x, \omega)\, L_i(x, \omega)\, |\omega \cdot N(x)|\, d\omega\, dA(x), \tag{7}$$
which integrates over its sensitivity function weighted by the outgoing radiance on surfaces that are observed by the camera. The spherical integral in the above expression involves an integrand that is evaluated at the closest surface position as seen from the ray $(x, \omega)$. It is convenient to switch to a different domain involving only area integrals. We can transform the above integral into this form using the identity
$$\int_{S^2} q\big(r(x, \omega)\big)\, |\omega \cdot N(x)|\, d\omega = \int_{\mathcal{M}} q(y)\, G(x \leftrightarrow y)\, dA(y), \tag{8}$$
where the geometric term
$$G(x \leftrightarrow y) := V(x \leftrightarrow y)\, \frac{|\omega_{xy} \cdot N(x)|\, |\omega_{xy} \cdot N(y)|}{\|x - y\|^2} \tag{9}$$
is expressed in terms of the unit-length direction $\omega_{xy}$ from $x$ to $y$ and the visibility function
$$V(x \leftrightarrow y) := \begin{cases} 1, & \text{if } \{\alpha x + (1 - \alpha) y \mid 0 < \alpha < 1\} \cap \mathcal{M} = \emptyset \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$
Applying this change of variables to Eq. (7) yields
$$I_j = \int_{\mathcal{M}^2} W_e^{(j)}(x, \omega_{xy})\, L_o(y, \omega_{yx})\, G(x \leftrightarrow y)\, dA(x, y). \tag{11}$$
We can now substitute $L_o$ given by Eq. (6) into the above integral, which is a power series of the $T$ operator (i.e. increasingly nested spherical integrals). Afterwards, we apply the change of variables once more to convert all nested spherical integrals into nested surface integrals. This is tedious but straightforward and leads to an explicit expression of $I_j$ in terms of an infinite series of integrals over increasing Cartesian powers of $\mathcal{M}$.
These nested integrals over surfaces are due to the propagation of light along straight lines and changes of direction at surfaces, which leads to the concept of a light path. This can be thought of as the trajectory of a particle carrying an infinitesimal portion of the illumination. It is a piecewise linear curve $\bar{x} = x_1 \cdots x_n$ with endpoints $x_1$ and $x_n$ and intermediate scattering vertices $x_2, \ldots, x_{n-1}$. The space of all possible light paths is a union consisting of paths with just the endpoints, paths that have one intermediate scattering event, and so on. More formally, we define path space as
$$\mathcal{P} := \bigcup_{n=2}^{\infty} \mathcal{P}_n, \quad\text{and}\quad \mathcal{P}_n := \{x_1 \cdots x_n \mid x_1, \ldots, x_n \in \mathcal{M}\}. \tag{12}$$
The nested integrals which arose from our manipulation of Eq. (11) are simply integrals over light paths of different lengths, i.e.
$$I_j = \int_{\mathcal{P}_2} \Phi(x_1 x_2)\, dA(x_1, x_2) + \int_{\mathcal{P}_3} \Phi(x_1 x_2 x_3)\, dA(x_1, x_2, x_3) + \cdots. \tag{13}$$
Because some paths carry more illumination from the light source to the camera than others, the integrand $\Phi : \mathcal{P} \to \mathbb{R}$ is needed to quantify their light-carrying capacity; its definition varies based on the number of input arguments and is given by Eq. (15). The total illumination $I_j$ arriving at the camera is often written more compactly as an integral of $\Phi$ over the entire path space, i.e.:
$$I_j =: \int_{\mathcal{P}} \Phi(\bar{x})\, dA(\bar{x}). \tag{14}$$
For a path with $n$ vertices, the contribution function reads
$$\Phi(x_1 \cdots x_n) = L_e(x_1 \to x_2) \left[ \prod_{k=2}^{n-1} G(x_{k-1} \leftrightarrow x_k)\, f(x_{k-1} \to x_k \to x_{k+1}) \right] G(x_{n-1} \leftrightarrow x_n)\, W_e^{(j)}(x_{n-1} \to x_n), \tag{15}$$
where the arrow notation indicates that a term takes a positional argument $x_i$ followed by a directional argument $x_i \to x_{i+1}$. Figure 4 shows an example light path and the different weighting terms. We summarize their meaning once more:
Fig. 4 Illustration of a simple light path with four vertices and its corresponding weighting function
- $L_e(x_1 \to x_2)$ is the emission profile of the light source. This term expresses the amount of radiance emitted from position $x_1$ traveling towards $x_2$. It is equal to zero when $x_1$ is not located on a light source.
- $W_e^{(j)}(x_{n-1} \to x_n)$ is the sensitivity profile of pixel $j$ of the camera; we can think of the pixel grid as an array of sensors, each with its own profile function.
- $G(x \leftrightarrow y)$ is the geometric term (Eq. 9), which specifies the differential amount of illumination carried along segments of the light path. Among other things, it accounts for visibility: when there is no unobstructed line of sight between $x$ and $y$, $G$ evaluates to zero.
- $f(x_{k-1} \to x_k \to x_{k+1})$ is the BSDF, which specifies how much of the light that travels from $x_{k-1}$ to $x_k$ is then scattered towards position $x_{k+1}$. This function essentially characterizes the material appearance of an object (e.g., whether it is made of wood, plastic, concrete, etc.).
Over the last 40 years, considerable research has investigated realistic expressions for the $L_e$, $W_e$, and $f$ terms. In this article, we do not discuss their definitions and prefer to think of them as black box functions that can be queried by the rendering algorithm. This is similar to how rendering software is implemented in practice: a scene description might reference a particular material (e.g., car paint) whose corresponding function $f$ is provided by a library of material implementations. The algorithm accesses it through a high-level interface shared by all materials, but without specific knowledge about its internal characteristics.
Fig. 5 A bidirectional path tracer finds light paths by generating partial paths starting at the camera
and light sources and connecting them in every possible way. The resulting statistical estimators
tend to have lower variance than unidirectional techniques. Modeled after a scene by Eric Veach. a
Path tracer, 32 samples/pixel. b Bidirectional path tracer, 32 samples/pixel
generate vertex $x_{i+1}$ from $x_i$ and $x_{i-1}$, which leads to a method referred to as light tracing or particle tracing. This method sends out particles from the light source (thus avoiding problems with the enclosure) and records the contribution to rendered pixels when they hit the aperture of the camera.
4.2.1 Bidirectional Path Tracing
The bidirectional path tracing method (BDPT) [17, 29] computes radiance estimates via two separate random walks from the light sources and the camera. The resulting two partial paths are connected for every possible vertex pair, creating many complete paths of different lengths, which supplies this method with an entire family of path sampling strategies. A path with $n$ vertices can be created in $n + 1$ different ways, which is illustrated by Fig. 6 for a simple path with 3 vertices (2 endpoints and 1 scattering event). The labels $s$ and $t$ indicate the number of sampling steps from the camera and the light source. In practice, each of the strategies is usually successful at dealing with certain types of light paths, while being a poor choice for others (Fig. 7).
4.2.2 Multiple Importance Sampling
Because all strategies are defined on the same space (i.e. path space), and because each has a well-defined density function on this space, it is possible to evaluate and compare these densities to determine the most suitable strategy for sampling particular types of light paths. This is the key insight of multiple importance sampling (MIS) [30], which BDPT uses to combine multiple sampling strategies in a provably good way to minimize variance in the resulting rendering (bottom of Fig. 7).
Fig. 6 The four different ways in which bidirectional path tracing can create a path with one scattering event: a Standard path tracing, b Path tracing variant: connect to sampled light source positions, c Standard light tracing, d Light tracing variant: connect to sampled camera positions. Solid lines indicate sampled rays which are intersected with the geometry, whereas dashed lines indicate deterministic connection attempts which must be validated by a visibility test
Suppose two statistical estimators of the pixel intensity $I_j$ are available. These estimators can be used to generate two light paths $\bar{x}_1$ and $\bar{x}_2$, which have path space probability densities $p_1(\bar{x}_1)$ and $p_2(\bar{x}_2)$, respectively. The corresponding MC estimates are given by
$$I_j^{(1)} = \frac{\Phi(\bar{x}_1)}{p_1(\bar{x}_1)} \quad\text{and}\quad I_j^{(2)} = \frac{\Phi(\bar{x}_2)}{p_2(\bar{x}_2)}.$$
A natural way of combining them is the average
$$I_j^{(3)} := \tfrac{1}{2}\big(I_j^{(1)} + I_j^{(2)}\big).$$
However, this is not a good idea, since the combination is affected by the variance of the worst ingredient estimator (BDPT generally uses many estimators, including ones that have very high variance). Instead, MIS combines estimators using weights that are related to the underlying sample density functions:
$$I_j^{(4)} := w_1(\bar{x}_1)\, I_j^{(1)} + w_2(\bar{x}_2)\, I_j^{(2)}, \quad\text{where}\quad w_i(\bar{x}) := \frac{p_i(\bar{x})}{p_1(\bar{x}) + p_2(\bar{x})}. \tag{16}$$
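In code, the two-strategy combination of Eq. (16) with these balance-heuristic weights reads as follows; Phi, p1 and p2 are hypothetical callables standing in for the contribution function and the two path densities.

```python
# Sketch of the MIS combination (16); xbar1 ~ p1 and xbar2 ~ p2 are one sample
# from each strategy, and Phi, p1, p2 are hypothetical callables.

def mis_estimate(Phi, p1, p2, xbar1, xbar2):
    def weighted(xbar, p_own, p_other):
        pa, pb = p_own(xbar), p_other(xbar)
        if pa == 0.0:
            return 0.0
        w = pa / (pa + pb)            # balance heuristic weight w_i(x)
        return w * Phi(xbar) / pa     # w_i(x) * Phi(x) / p_i(x)
    return weighted(xbar1, p1, p2) + weighted(xbar2, p2, p1)
```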
Fig. 7 The individual sampling strategies that comprise the previous BDPT rendering, both a without and b with multiple importance sampling. Each row corresponds to light paths of a certain length, and the top row matches the four strategies from Fig. 6. Almost every strategy has deficiencies of some kind; multiple importance sampling re-weights samples to use strategies where they perform well
While not optimal, Veach proves that no other choice of weighting functions can
significantly improve on Eq. (16). He goes on to propose a set of weighting heuristics that combine many estimators (i.e., more than two), and which yield perceptually
better results. The combination of BDPT and MIS often yields an effective method
that addresses many of the flaws of the path tracing algorithm. Yet, even this combination can fail in simple cases, as we will discuss next.
Fig. 8 Illustration of the difficulties of sequential path sampling methods when rendering LSDSE caustic patterns at the bottom of a swimming pool. a Path tracing from the light source. b Path tracing from the camera. a, b Unidirectional techniques sample light paths by executing a random walk consisting of alternating transport and scattering steps. The only way to successfully complete a path in this manner is to randomly hit the light source or camera, which happens with exceedingly low probability. c Bidirectional techniques trace paths from both sides, but in this case they cannot create a common vertex at the bottom of the pool to join the partial light paths
Figure 8b shows the behavior of the path tracing method, which generates paths in the reverse direction but remains extremely inefficient: in order to construct a complete light path $\bar{x}$ with $\Phi(\bar{x}) > 0$, the path must reach the other end by chance, which happens with exceedingly low probability. Assuming for simplicity that rays leave the pool with a uniform distribution in Fig. 8b, the probability of hitting the sun with an angular diameter of 0.5° is on the order of $10^{-5}$.
BDPT traces paths from both sides, but even this approach is impractical here:
vertices on the water surface cannot be used to join two partial paths, since the
resulting pair of incident and outgoing directions would not satisfy Snell's law. It is
possible to generate two vertices at the bottom of the pool as shown in the figure,
but these cannot be connected: the resulting path edge would be fully contained in a
surface rather than representing transport between surfaces.
In this situation, biased techniques would connect the two vertices at the bottom
of the pool based on a proximity criterion, which introduces systematic errors into
the solution. We will only focus on unbiased techniques that do not rely on such
approximations.
The main difficulty in scenes like this is that caustic paths are tightly constrained: they must start on the light source, end on the aperture, and satisfy Snell's law in two places. Sequential sampling approaches are able to satisfy all but one constraint and
run into issues when there is no way to complete the majority of paths.
Paths like the one examined in Fig. 8 lead to poor convergence in other settings as well; they are collectively referred to as specular-diffuse-specular (SDS) paths due to the occurrence of this sequence of interactions in their path classification.
SDS paths occur in common situations such as a tabletop seen through a drinking
glass standing on it, a bottle containing shampoo or other translucent liquid, a shop
window viewed and illuminated from outside, as well as scattering inside the eye of
a virtual character. Even in scenes where these paths do not cause dramatic effects,
their presence can lead to excessively slow convergence in rendering algorithms that
attempt to account for all transport paths. It is important to note that while the SDS
class of paths is a well-studied example case, other classes (e.g., involving glossy
interactions) can lead to many similar issues. It is desirable that rendering methods
are robust to such situations. Correlated path sampling techniques based on MCMC
offer an attractive way to approach such challenges. We review these methods in the
remainder of this article.
function Record($\bar{x}_i$), which first determines the pixel associated with the current iteration's light path $\bar{x}_i$ and then increases its brightness by a fixed amount.
These MCMC methods all sample light paths proportional to the amount they contribute to the pixels of the final rendering; by increasing the pixel brightness in this way during each iteration, these methods effectively compute a 2D histogram of the marginal distribution of $\Phi$ over pixel coordinates. This is exactly the image to be rendered, up to a global scale factor, which can be recovered using a traditional MC sampling technique such as BDPT. The main difference among these algorithms is the underlying state space, as well as the employed set of mutation rules.
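The shared skeleton of these methods can be sketched in a few lines of Python; Phi, mutate (a symmetric proposal) and pixel_of are hypothetical stand-ins for the quantities described above, and the seed path is assumed to satisfy Phi(x) > 0.

```python
import random

# Sketch of the histogram-accumulating Metropolis-Hastings loop described above.

def mcmc_render(x0, Phi, mutate, pixel_of, image, iterations):
    x = x0                                      # seed path with Phi(x0) > 0
    for _ in range(iterations):
        y = mutate(x)                           # symmetric proposal
        if random.random() < min(1.0, Phi(y) / Phi(x)):
            x = y                               # accept
        px, py = pixel_of(x)                    # Record(x): brighten one pixel
        image[py][px] += 1.0                    # fixed amount per iteration
    return image  # the rendering, up to a global scale factor
```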
$$I_j = \int_{[0,1]^n} (\Phi \circ \Psi)(u)\, du. \tag{17}$$
Fig. 9 Primary sample space MLT performs mutations in an abstract random number space. A deterministic mapping induces corresponding mutations in path space. a Primary sample space view. b Path space view
The key idea of PSSMLT is to compute Eq. (17) using MCMC integration on primary sample space, which leads to a trivial implementation, as all complications involving light paths and other rendering-specific details are encapsulated in the black box mapping $\Psi$ (Fig. 9).
One missing detail is that the primary sample space dimension $n$ is unknown ahead of time. This can be solved by starting with a low-dimensional integral and extending the dimension on demand when additional samples are requested by $\Psi$.
PSSMLT uses two types of Mutate functions. The first is an independence sampler, i.e., it forgets the current state and switches to a new set of pseudorandom variates. This is needed to ensure that the Markov Chain is ergodic. The second is a local (e.g. Gaussian or similar) proposal centered around the current state $u_i \in [0, 1]^n$. Both are symmetric so that the proposal density $T$ cancels in the acceptance ratio (Line 5 in Algorithm 2).
PSSMLT uses independent proposals to find important light paths that cannot be reached using local proposals. When it finds one, local proposals are used to explore neighboring light paths, which amortizes the cost of the search. This can significantly improve convergence in many challenging situations and is an important advantage of MCMC methods in general when compared to MC integration.
Another advantage of PSSMLT is that it explores light paths through a black box mapping $\Psi$ that already makes internal use of sophisticated importance sampling techniques for light paths, which in turn leads to an easier integration problem in primary sample space. The main disadvantage of this method is that its interaction with $\Psi$ is limited to a stream of pseudorandom numbers. It has no direct knowledge of the generated light paths, which prevents the design of more efficient mutation rules based on the underlying physics.
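For concreteness, the two proposal types can be sketched as follows; the step size sigma is an assumed parameter, and wrapping keeps local proposals inside [0, 1)^n.

```python
import random

# Sketch of the two PSSMLT mutations on primary sample space [0,1)^n.

def large_step(u):
    """Independence sampler: forget the state, draw fresh uniform variates."""
    return [random.random() for _ in u]

def small_step(u, sigma=0.01):
    """Local proposal: perturb each coordinate by a Gaussian and wrap around."""
    return [(ui + random.gauss(0.0, sigma)) % 1.0 for ui in u]
```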
Fig. 10 Analysis of the Multiplexed MLT (MMLT) technique [7] (used with permission): the top row shows weighted contributions from different BDPT strategies $(t, s) = (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)$ in a scene with challenging indirect illumination [18, 28]. The intensities in the middle row visualize the time spent on each strategy using the MMLT technique: they are roughly proportional to the weighted contribution in the first row. The rightmost column visualizes the dominant strategies (3, 4), (4, 3), and (5, 2) using RGB colors. PSSMLT (third row) cannot target samples in this way and thus produces almost uniform coverage
Fig. 11 MLT operates on top of path space, which permits the use of a variety of mutation rules that are motivated by important physical scattering effects. The top row illustrates ones that are useful when rendering a scene involving a glass object on top of a diffuse table. The bottom row is the swimming pool example from Fig. 8. In each example, the original path is black, and the proposal is highlighted in blue. a Lens perturbation. b Caustic perturbation. c Multi-chain perturbation. d Manifold perturbation
Fig. 12 A motivating example in two dimensions: specular reflection in a mirror
events as a specular chain of length $k$. A specular chain of length 1 from the light source to the camera is shown in the figure.
Reflections in the mirror must satisfy the law of specular reflection. Assuming that the space of all specular chains in this simple scene can be parameterized using the horizontal coordinates $x_1$, $x_2$, and $x_3$, it states that
$$x_2 = \frac{x_1 + x_3}{2}, \tag{18}$$
i.e., the $x$ coordinate of the second vertex must be exactly half-way between the endpoints. Note that this equation can also be understood as the implicit definition of a plane in $\mathbb{R}^3$ ($x_1 - 2x_2 + x_3 = 0$).
When interpreting the set of all candidate light paths as a three-dimensional space $\mathcal{P}_3$ of coordinate tuples $(x_1, x_2, x_3)$, this constraint then states that the subset of relevant paths has one dimension less and is given by the intersection of $\mathcal{P}_3$ and the plane of Eq. (18). With this extra knowledge, it is now easy to sample valid specular chains, e.g. by generating $x_1$ and $x_3$ and solving for $x_2$.
Given general non-planar shapes, the problem becomes considerably harder, since
the equations that have to be satisfied are nonlinear and may admit many solutions.
Prior work has led to algorithms that can find solutions even in such cases [21, 33]
but these methods are closely tied to the representation of the underlying geometry,
and they become infeasible for specular chains with lengths greater than one. Like these works, ME finds valid specular chains, but because it does so within the neighborhood of a given path, it avoids the complexities of a full global search and does not share these limitations.
ME is also related to the analysis of reflection geometry presented by Chen and Arvo [2], who derived a second-order expansion of the neighborhood of a path. The main difference is that ME solves for paths exactly and is used as part of an unbiased MCMC rendering algorithm.
$$\int \Phi(x_1 \cdots x_5)\, dA(x_1, x_3, x_5).$$
Note the absence of the specular vertices $x_2$ and $x_4$ in the integral's area product measure. The contribution function still has the same form: a product of terms corresponding to vertices and edges of the path. However, singular reflection functions at specular vertices are replaced with (unitless) specular reflectance values, and the geometric terms are replaced by generalized geometric terms over specular chains that we will denote $G(x_1 \to x_2 \to x_3)$ and $G(x_3 \to x_4 \to x_5)$.
The standard geometric term $G(x \leftrightarrow y)$ for a non-specular edge computes the ratio of an (infinitesimally) small surface patch at one vertex and its projection onto the projected solid angle domain as seen from the other vertex. The generalized geometry factor is defined analogously: the ratio of solid angle at one end of the specular chain with respect to area at the other end of the chain, considering the path as a function of the positions of the endpoints.
For specular reflection, the half direction vector, i.e. the normalized sum of the incident and outgoing directions $\omega_i + \omega_o$, is equal to the surface normal, i.e., $h(\omega_i, \omega_o) = n$. In the case of refraction, the relationship of these directions is explained by Snell's law. Using a generalized definition of the half direction vector which includes weighting by the incident and outgoing indices of refraction [32], i.e.,
$$h(\omega_i, \omega_o) := \frac{\eta_i \omega_i + \eta_o \omega_o}{\|\eta_i \omega_i + \eta_o \omega_o\|}, \tag{20}$$
Fig. 13 In-plane view of the surface normal $n$ and incident and outgoing directions $\omega_i$ and $\omega_o$ at a surface marking a transition between indices of refraction $\eta_i$ and $\eta_o$ (left: specular reflection; right: specular refraction)
we are able to use a single constraint $h(\omega_i, \omega_o) = n$ which subsumes both Snell's law and the law of specular reflection (in which case $\eta_i$ equals $\eta_o$). Each specular vertex $x_i$ of a path $\bar{x}$ must satisfy this generalized constraint involving its own position and the positions of the preceding and following vertices. Note that this constraint involves unit vectors with only two degrees of freedom. We can project (20) onto a two-dimensional subspace to reflect its dimensionality:
$$c_i(\bar{x}) = T(x_i)^{\mathsf{T}}\, h(x_i \to x_{i-1},\, x_i \to x_{i+1}), \tag{21}$$
Unfortunately, the theorem does not specify how to compute $q$; it only guarantees the existence of such a function. It does, however, provide an explicit expression for the derivative of $q$, which contains all the information we need to compute a basis for the tangent space at the path $\bar{x}$, which corresponds to the origin in local coordinates. This involves the Jacobian of the constraint function, $\nabla c(0)$, which is a matrix of $(k-2) \times k$ two-by-two blocks with a block tridiagonal structure (Fig. 14).
Fig. 14 The linear system used to compute the tangent space and its interpretation as a derivative of a specular chain. a An example path. b Associated constraints. c Constraint Jacobian. d Tangent space
This matrix is $(k-2) \times 2$ blocks in size, and each block represents the derivative of one vertex with respect to one endpoint.
This construction computes tangents with respect to a graph parameterization of the manifold, which is guaranteed to exist for a suitable choice of independent variables. Because we always use the endpoint vertices for this purpose, difficulties arise when one of the endpoints is located exactly at the fold of a caustic wavefront, in which case $\nabla c$ becomes rank-deficient and $A$ fails to be invertible. This happens rarely in practice and is not a problem for our method, which allows for occasional parameterization failures. In other contexts where this is not acceptable, the chain could be parameterized by a different pair of vertices when a non-invertible matrix is detected.
These theoretical results about the structure of the specular manifold can be used in an algorithm to solve for specular paths, which we discuss next.
Fig. 15 Manifold walks use a Newton-like iteration to locally parameterize the specular manifold. The extrapolation operation takes first-order steps based on the local manifold tangents, which are subsequently projected back onto the manifold
The key observation is that MCMC explores the space of light paths using localized steps, which is a perfect match for the local parameterization of the path manifold provided by Manifold Exploration.
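The control flow of such a walk can be sketched as follows; tangent_step, project and dist are hypothetical helpers (the projection re-traces the chain through the scene geometry), so this conveys only the structure of the iteration, not an actual implementation.

```python
# Sketch of a manifold walk (Fig. 15): extrapolate along the local tangents,
# then project back onto the specular manifold; all helpers are hypothetical.

def manifold_walk(chain, target, tangent_step, project, dist,
                  max_iter=20, tol=1e-7):
    for _ in range(max_iter):
        chain = project(tangent_step(chain, target))  # extrapolate + project
        if chain is None:          # projection may fail (occlusion, fold, ...)
            return None            # caller treats this as a rejected proposal
        if dist(chain, target) < tol:
            return chain           # converged onto a valid specular chain
    return None
```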
6.5 Results
Figures 17 and 18 show comparisons of several MCMC rendering techniques for an interior scene containing approximately 2 million triangles with shading normals and a mixture of glossy, diffuse, and specular surfaces and some scattering volumes. One hour of rendering time was allotted to each technique; the results are intentionally unconverged to permit a visual analysis of the convergence behavior. By reasoning about the geometry of the specular and offset specular manifolds for the paths it encounters, the ME perturbation strategy is more successful at rendering certain paths, such as illumination that refracts from the bulbs into the butter dish and then to the camera (6 specular vertices), that the other methods struggle with.
135
(a)
(b)
(c)
(d)
Fig. 17 This interior scene shows chinaware, a teapot containing an absorbing medium, and a butter
dish on a glossy silver tray. Illumination comes from a complex chandelier with glass-enclosed bulbs.
Prior methods have difficulty in finding and exploring relevant light paths, which causes noise and
other convergence artifacts. Equal-time renderings on an eight-core Intel Xeon X5570 machine at
1280 720 pixels in 1 h. a MLT [28]. b ERPT [3]. c PSSMLT [14]. d ME [11]
Fig. 18 This view of a different part of the room, now lit through windows using a spherical environment map surrounding the scene, contains a scattering medium inside the glass egg. Equal-time renderings at 720 × 1280 pixels in 1 h. a MLT [28]. b ERPT [3]. c PSSMLT [14]. d ME [11]
path space near the specular manifold can be explored without stepping out of this
thin band of near-specular light transport.
Fig. 20 In the above example, ME (top) constrains the half vectors of two glossy chains $x_1 \ldots x_4$ and $x_4 \ldots x_6$ and solves for an updated configuration after perturbing the position of $x_4$. HSLT (bottom) instead adjusts all half vectors at once and solves for suitable vertex positions with this configuration. This proposal is effective for importance sampling the material terms and leads to superior convergence when dealing with transport between glossy surfaces. Based on a figure by Kaplanyan et al. [13] (used with permission)
Fig. 21 The natural constraint formulation [13] is a parameterization of path space in the half vector domain. It has the interesting property of approximately decoupling the influence of the individual scattering events on $\Phi$. The figure shows a complex path where the half vector $h_3$ is perturbed at vertex $x_3$. The first column shows a false-color plot of $\Phi$ over the resulting paths for different values of $h_3$ and two roughness values. The second column shows a plot of the BSDF value at this vertex, which is approximately proportional to $\Phi$. Based on a figure by Kaplanyan et al. [13] (used with permission)
Fig. 22 Equal-time rendering (30 min each, MEMLT versus HSLT+MLT) of an interior kitchen scene with many glossy reflections. Based on a figure by Kaplanyan et al. [13] (used with permission)
8 Conclusion
This article presented an overview of the physics underlying light transport simulations in computer graphics. After introducing relevant physical quantities and the
main energy balance equation, we showed how to compute approximate solutions
using a simple Monte Carlo estimator. Following this, we introduced the concept of
path space and examined the relation of path tracing, light tracing, and bidirectional path tracing, including their behavior given challenging input that causes these methods to become impracticably slow. The second part of this article reviewed several MCMC methods that compute path space integrals using proposal distributions
defined on sets of light paths. To efficiently explore light paths involving specular
materials, we showed how to implicitly define and locally parameterize the associated paths using a root-finding iteration. Finally, we reviewed recent work that
aims to generalize this approach to glossy scattering interactions. Most of the methods that were discussed are implemented in the Mitsuba renderer [9], which is a
research-oriented open source rendering framework.
MCMC methods in rendering still suffer from issues that limit their usefulness in certain situations. Most importantly, they require an initialization or mutation rule that provides well-distributed seed paths to the perturbations, as they can only explore connected components of path space. Bidirectional Path Tracing and the Bidirectional Mutation are reasonably effective but run into issues when there are many disconnected components of path space; this becomes increasingly problematic as their number grows. Ultimately, as the number of disconnected components exceeds the number of samples that can be generated, local exploration of path space becomes ineffective; future algorithms could be designed to attempt exploration only in sufficiently large path space components.
Furthermore, all of the perturbation rules discussed made assumptions about specific path configurations or material properties, which limits their benefits when rendering scenes that contain a wide range of material types. To efficiently deal with light paths
References
1. Arvo, J.R.: Analytic methods for simulated light transport. Ph.D. thesis, Yale University (1995)
2. Chen, M., Arvo, J.: Theory and application of specular path perturbation. ACM Trans. Graph. 19(4), 246–278 (2000)
3. Cline, D., Talbot, J., Egbert, P.: Energy redistribution path tracing. ACM Trans. Graph. 24(3), 1186–1195 (2005)
4. Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. ACM Trans. Graph. 1(1), 7–24 (1982)
5. Doucet, A., Johansen, A., Tadic, V.: On solving integral equations using Markov Chain Monte Carlo methods. Appl. Math. Comput. 216(10), 2869–2880 (2010)
6. Grünschloß, L., Raab, M., Keller, A.: Enumerating quasi-Monte Carlo point sequences in elementary intervals. In: Plaskota, L., Woźniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010. Springer Proceedings in Mathematics and Statistics, vol. 23, pp. 399–408. Springer, Berlin (2012)
7. Hachisuka, T., Kaplanyan, A.S., Dachsbacher, C.: Multiplexed Metropolis light transport. ACM Trans. Graph. 33(4), 100:1–100:10 (2014)
8. Heckbert, P.S.: Adaptive radiosity textures for bidirectional ray tracing. In: Proceedings of SIGGRAPH '90 on Computer Graphics (1990)
9. Jakob, W.: Mitsuba renderer. http://www.mitsuba-renderer.org (2010)
10. Jakob, W.: Light transport on path-space manifolds. Ph.D. thesis, Cornell University (2013)
11. Jakob, W., Marschner, S.: Manifold exploration: a Markov Chain Monte Carlo technique for rendering scenes with difficult specular transport. ACM Trans. Graph. 31(4), 58:1–58:13 (2012)
12. Kajiya, J.T.: The rendering equation. In: Proceedings of SIGGRAPH '86 on Computer Graphics, pp. 143–150 (1986)
13. Kaplanyan, A.S., Hanika, J., Dachsbacher, C.: The natural-constraint representation of the path space for efficient light transport simulation. ACM Trans. Graph. (Proc. SIGGRAPH) 33(4), 1–13 (2014)
14. Kelemen, C., Szirmay-Kalos, L., Antal, G., Csonka, F.: A simple and robust mutation strategy for the Metropolis light transport algorithm. Comput. Graph. Forum 21(3), 531–540 (2002)
15. Keller, A.: Quasi-Monte Carlo Image Synthesis in a Nutshell. Springer, Heidelberg (2014)
16. Kollig, T., Keller, A.: Efficient Bidirectional Path Tracing by Randomized Quasi-Monte Carlo Integration. Springer, Heidelberg (2002)
17. Lafortune, E.P., Willems, Y.D.: Bi-directional path tracing. In: Proceedings of Compugraphics '93. Alvor, Portugal (1993)
18. Lehtinen, J., Karras, T., Laine, S., Aittala, M., Durand, F., Aila, T.: Gradient-domain Metropolis light transport. ACM Trans. Graph. 32(4), 1 (2013)
19. Manzi, M., Rousselle, F., Kettunen, M., Lehtinen, J., Zwicker, M.: Improved sampling for gradient-domain Metropolis light transport. ACM Trans. Graph. 33(6), 1–12 (2014)
20. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)
21. Mitchell, D.P., Hanrahan, P.: Illumination from curved reflectors. In: Proceedings of SIGGRAPH '92 on Computer Graphics, pp. 283–291 (1992)
22. Nicodemus, F.E.: Geometrical Considerations and Nomenclature for Reflectance, vol. 160. US Department of Commerce, National Bureau of Standards, Washington (1977)
23. Pauly, M., Kollig, T., Keller, A.: Metropolis light transport for participating media. In: Rendering Techniques 2000: 11th Eurographics Workshop on Rendering, pp. 11–22 (2000)
24. Pharr, M., Humphreys, G., Jakob, W.: Physically Based Rendering: From Theory to Implementation, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2016)
25. Preisendorfer, R.: Hydrologic Optics. US Department of Commerce, Washington (1976)
26. Spivak, M.: Calculus on Manifolds. Addison-Wesley, Boston (1965)
27. Torrance, K.E., Sparrow, E.M.: Theory for off-specular reflection from roughened surfaces. JOSA 57(9), 1105–1112 (1967)
28. Veach, E.: Robust Monte Carlo methods for light transport simulation. Ph.D. thesis, Stanford University (1997)
29. Veach, E., Guibas, L.: Bidirectional estimators for light transport. In: Proceedings of the Fifth Eurographics Workshop on Rendering (1994)
30. Veach, E., Guibas, L.J.: Optimally combining sampling techniques for Monte Carlo rendering. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '95, pp. 419–428. ACM (1995)
31. Veach, E., Guibas, L.J.: Metropolis light transport. In: Proceedings of SIGGRAPH '97 on Computer Graphics, pp. 65–76 (1997)
32. Walter, B., Marschner, S.R., Li, H., Torrance, K.E.: Microfacet models for refraction through rough surfaces. In: Rendering Techniques 2007: 18th Eurographics Workshop on Rendering, pp. 195–206 (2007)
33. Walter, B., Zhao, S., Holzschuch, N., Bala, K.: Single scattering in refractive media with triangle mesh boundaries. ACM Trans. Graph. 28(3), 92 (2009)
$$I(f) := \int_{[0,1)^s} f(x)\, dx.$$
M. Matsumoto (B)
Graduate School of Sciences, Hiroshima University, Hiroshima 739-8526, Japan
e-mail: m-mat@math.sci.hiroshima-u.ac.jp
R. Ohori
Fujitsu Laboratories Ltd., Kanagawa 211-8588, Japan
e-mail: ohori.ryuichi@jp.fujitsu.com
We choose a finite point set $P \subset [0, 1)^s$, whose cardinality is called the sample size and denoted by $N$. The quasi-Monte Carlo (QMC) integration of $f$ by $P$ is the value
$$I(f; P) := \frac{1}{N} \sum_{x \in P} f(x),$$
i.e., the average of $f$ over the finite point set $P$, which approximates $I(f)$. The QMC integration error is defined by
$$\mathrm{Error}(f; P) := |I(f) - I(f; P)|.$$
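In code, with the point set stored as an array, both quantities are one-liners; the integrand below is an arbitrary example chosen so that $I(f)$ is known.

```python
import numpy as np

# I(f; P): the average of f over P; Error(f; P) follows immediately.

def qmc_integrate(f, P):
    """P has shape (N, s); returns the QMC estimate I(f; P)."""
    return float(np.mean(f(P)))

# Example: f(x) = x_1 * x_2 on [0,1)^2 has I(f) = 1/4.
f = lambda P: P[:, 0] * P[:, 1]
P = np.random.random((1024, 2))          # plain MC points, for illustration
print(abs(0.25 - qmc_integrate(f, P)))   # Error(f; P)
```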
If $P$ consists of $N$ independently, uniformly and randomly chosen points, the QMC integration is nothing but the classical Monte Carlo (MC) integration, where the integration error is expected to decrease with the order of $N^{-1/2}$ as $N$ increases, if $f$ has a finite variance.
The main purpose of QMC integration is to choose good point sets so that the integration error decreases faster than for MC. There is an enormous body of studies in diverse directions; see for example [7, 19].
In applications, we often know little about the integrand $f$, so we want point sets which work well for a wide class of $f$. An inequality of the form
$$\mathrm{Error}(f; P) \le V(f)\, D(P) \tag{1}$$
holds, where point sets with $W_\alpha(P) = O(N^{-\alpha}(\log N)^{s\alpha})$ are constructible from $(t, m, s)$-nets (called higher order digital nets). The definition of $W_\alpha(P)$ is given later in Sect. 5.3. We omit the definition of $\|f\|_\alpha$, which depends on all partial mixed derivatives up to the $\alpha$-th order in each variable; when $s = 1$, it is defined by
$$\|f\|_\alpha^2 := \sum_{i=0}^{\alpha-1} \left( \int_0^1 f^{(i)}(x)\, dx \right)^2 + \int_0^1 \big( f^{(\alpha)}(x) \big)^2\, dx.$$
2.1 Discretization
Although the following notions are naturally extended to $\mathbb{Z}/b$ or even any finite abelian group [29], we treat only the case of base $b = 2$ for simplicity.
Let $\mathbb{F}_2 := \{0, 1\} = \mathbb{Z}/2$ be the two-element field. Take $n$ large enough, and approximate the unit interval $I = [0, 1)$ by the set of $n$-bit integers $I_n := \mathbb{F}_2^n$ through the inclusion $I_n \to I$, $x$ (considered as an $n$-bit integer) $\mapsto x/2^n + 1/2^{n+1}$.
More precisely, we identify the finite set $I_n$ with the set of half-open intervals obtained by partitioning $[0, 1)$ into $2^n$ pieces; namely
$$I_n := \{[i \cdot 2^{-n}, (i+1) \cdot 2^{-n}) \mid 0 \le i \le 2^n - 1\}.$$
Example 1 In the case $n = 3$ and $I_3 = \{0, 1\}^3$, $I_3$ is the set of 8 intervals in Fig. 1.
The $s$-dimensional hypercube $I^s$ is approximated by the set $I_n^s$ of $2^{ns}$ hypercubes, which is identified with $I_n^s = (\mathbb{F}_2^n)^s = M_{s,n}(\mathbb{F}_2) =: V$. In sum,
Fig. 1 $\{0, 1\}^3$ is identified with the set of 8 segments $I_3$
1 See Sect. 2.3 for a definition of digital nets; there we use the italic
For example, the element of $V$ with rows $(100)$ and $(011)$ corresponds to $[0.100, 0.101) \times [0.011, 0.100)$ (in binary notation).
As an approximation of $f : I^s \to \mathbb{R}$, define
$$f_n : I_n^s = V \to \mathbb{R}, \quad B \mapsto f_n(B) := \frac{1}{\mathrm{Vol}(B)} \int_B f\, dx.$$
Then
$$\int_{I^s} f(x)\, dx = \frac{1}{|V|} \sum_{B \in V} f_n(B).$$
For $A = (a_{ij}), B = (b_{ij}) \in V$, we define the pairing
$$(A, B) := \sum_{1 \le i \le s,\, 1 \le j \le n} a_{ij} b_{ij} \in \mathbb{F}_2 \pmod{2}.$$
2 If $f$ has Lipschitz constant $C$, namely, satisfies $|f(x) - f(y)| < C|x - y|$, then the error is bounded by $C\sqrt{s}\, 2^{-n}$ [16, Lemma 2.1].
The discrete Fourier transform of a function $g : V \to \mathbb{R}$ is defined by
$$\hat{g}(A) := \frac{1}{|V|} \sum_{B \in V} g(B)\, (-1)^{(B,A)}.$$
Thus
$$\hat{f}_n(0) = \frac{1}{|V|} \sum_{B \in V} f_n(B) = I(f).$$
Remark 1 The value $\hat{f}_n(A)$ coincides with the $A$-th Walsh coefficient of the function $f$ defined as follows. Let $A = (a_{ij})$. Define an integer $c_i := \sum_{j=1}^{n} a_{ij} 2^{j-1}$ for each $i = 1, \ldots, s$. Then the $A$-th Walsh coefficient of $f$ is defined as the standard multi-indexed Walsh coefficient $\hat{f}_{c_1, \ldots, c_s}$.
For an $\mathbb{F}_2$-linear subspace $P \subseteq V$ with perpendicular space³ $P^\perp := \{A \in V \mid (A, B) = 0 \text{ for all } B \in P\}$, we have
$$\frac{1}{|P|} \sum_{B \in P} f_n(B) = \sum_{A \in P^\perp} \hat{f}_n(A), \tag{2}$$
where the right equality (called the Poisson summation formula) follows from
$$\sum_{A \in P^\perp} \hat{f}_n(A) = \sum_{A \in P^\perp} \frac{1}{|V|} \Big( \sum_{B \in V} f_n(B)\, (-1)^{(B,A)} \Big) = \frac{1}{|V|} \sum_{B \in V} f_n(B) \sum_{A \in P^\perp} (-1)^{(B,A)} = \frac{1}{|V|} \sum_{B \in P} f_n(B)\, |P^\perp| = \frac{1}{|P|} \sum_{B \in P} f_n(B).$$
3 The perpendicular space is called the dual space in most of the literature on QMC and coding theory. However, in pure algebra, the dual space to a vector space $V$ over a field $k$ means $V^* := \mathrm{Hom}_k(V, k)$, which is defined without using an inner product. In this paper, we use the term perpendicular, going against the tradition in this area.
148
| fn (A)|. (3)
AP {0}
To each $A = (a_{ij}) \in V$ we associate the Dick weight
$$\mu(A) := \sum_{1 \le i \le s,\, 1 \le j \le n} j\, a_{ij},$$
for example
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \;\Rightarrow\; \mu(A) = (1 + 0 + 0 + 4) + (0 + 2 + 3 + 4) + (0 + 0 + 3 + 0) = 17.$$
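Computing $\mu(A)$ takes a few lines; applied to the matrix of the example above, the sketch returns 17.

```python
# Dick weight mu(A) = sum of j * a_ij over all entries, with the column index j
# starting at 1.

def dick_weight(A):
    return sum(j * aij for row in A for j, aij in enumerate(row, start=1))

A = [[1, 0, 0, 1],
     [0, 1, 1, 1],
     [0, 0, 1, 0]]
print(dick_weight(A))  # 17, as in the worked example
```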
The Walsh figure of merit of $P$ is defined as follows [16]:
Definition 4 (WAFOM) Let $P \subseteq V$. The WAFOM of $P$ is defined by
$$\mathrm{WF}(P) := \sum_{A \in P^\perp \setminus \{0\}} 2^{-\mu(A)}.$$
By plugging this definition and Dick's Theorem 1 into (3), we have an inequality of Koksma–Hlawka type:
$$\mathrm{Error}(f_n; P) \le C(s, n)\, \|f\|_n\, \mathrm{WF}(P). \tag{4}$$
Table 1 Toy examples of WAFOM for the 3-digit discretization, integrating $x$, $x^2$ and $x^3$

$V = \{000, 001, 010, 011, 100, 101, 110, 111\}$
$(100)^\perp = \{000, 001, 010, 011\}$   $(010)^\perp = \{000, 001, 100, 101\}$
$(110)^\perp = \{000, 001, 110, 111\}$   $(001)^\perp = \{000, 010, 100, 110\}$
$(101)^\perp = \{000, 010, 101, 111\}$   $(011)^\perp = \{000, 011, 100, 111\}$
$(111)^\perp = \{000, 011, 101, 110\}$

P          | mu(A) for A in P^perp \ {0} | WF(P)  | Error for x | Error for x^2 | Error for x^3
V          | (none)                      | 0      | 0           | -0.0013       | -0.0020
(001)^perp | 0+0+3                       | 2^-3   | -0.0625     | -0.0638       | -0.0637
(101)^perp | 1+0+3                       | 2^-4   | 0           | +0.0299       | +0.0449
(011)^perp | 0+2+3                       | 2^-5   | 0           | +0.0143       | +0.0215
(111)^perp | 1+2+3                       | 2^-6   | 0           | -0.0013       | -0.0137
Thus, the listed errors include both the discretization errors and the QMC integration errors for $f_n$. For the first line, $P = V$ implies no QMC integration error for $f_n$ ($n = 3$), so the values show the discretization error exactly. The error bound (4) is proportional to $\mathrm{WF}(P)$ for a fixed integrand. The table shows that, for these test functions, the actual errors are well reflected in the WAFOM values.
Here is a loose interpretation of $\mathrm{WF}(P)$. For an $\mathbb{F}_2$-linear $P$:
- $A \in P^\perp \setminus \{0\}$ is a linear relation satisfied by $P$.
- $\mu(A)$ measures the complexity of $A$.
- $\mathrm{WF}(P) = \sum_{A \in P^\perp \setminus \{0\}} 2^{-\mu(A)}$ is small if all relations have high complexity, and hence $P$ is close to uniform.
- The weight $j$ in the sum $\sum j\, a_{ij}$ in the definition of $\mu(A)$ means that the $j$th digit below the decimal point is counted with complexity $2^j$.
$$\ldots = E\, N^{-C \log_2 N / s + D}.$$
For an $\mathbb{F}_2$-linear $P$, WAFOM admits the closed form
$$\mathrm{WF}(P) = \frac{1}{|P|} \sum_{B \in P} \left[ \prod_{1 \le i \le s,\, 1 \le j \le n} \big(1 + (-1)^{b_{i,j}}\, 2^{-j}\big) - 1 \right].$$
This is computable in $O(nsN)$ steps of arithmetic operations in real numbers, where $N = |P|$. Compared with most other discrepancies, this is relatively easily computable. This allows us to do a random search for low-WAFOM point sets.
Remark 2 1. The above equality holds only for an $\mathbb{F}_2$-linear $P$. Since the left hand side is non-negative, so is the right sum in this case. It seems impossible to define WAFOM for a general point set by using this formula, since for a general (i.e. non-linear) $P$, the sum on the right hand side is sometimes negative and thus will never give a bound on the integration error.
2. The right sum may be interpreted as the QMC integration of a function (whose definition is given in the right hand side of the equality) by $P$. The integration of this function over the total space $V$ is zero. Hence, the above equality indicates that, to have a best $\mathbb{F}_2$-linear $P$ from the viewpoint of WAFOM, it suffices to have a best $P$ for QMC integration of a single specified function. This is in contrast to the definition of the star-discrepancy, where all rectangle characteristic functions are used as test functions, and the supremum of their QMC integration errors is taken.
3. Harase-Ohori [11] give a method to accelerate this computation by a factor of 30, using a look-up table. Ohori-Yoshiki [25] give a faster and simpler method to compute a good approximation of WAFOM, using the fact that the Walsh coefficients of an exponential function approximate the Dick weight $\mu$. More precisely, $\mathrm{WF}(P)$ is well-approximated by the QMC error of the function $\exp\big(-2 \sum_{i=1}^{s} x_i\big)$, whose value is easy to evaluate on modern CPUs.
4 Experimental Results
4.1 Random Search for Low WAFOM Point Sets
We fix the precision $n = 30$. We consider two cases of the dimension, $s = 4$ and $s = 8$. For each $d = 8, 9, 10, \ldots, 16$, we generate a $d$-dimensional subspace $P \subset V = (\mathbb{F}_2^{30})^s$ 10000 times, by the uniformly random choice of $d$ elements as its basis. Let $P_{d,s}$ be the point set with the lowest WAFOM among them. For comparison, let $Q_{d,s}$ be the point set with the 100th lowest WAFOM.
Fig. 2 WAFOM values for: (1) best WAFOM among 10000, (2) the 100th best WAFOM, (3) Niederreiter-Xing, (4) Sobol', of size $2^d$ with $d = 8, 9, \ldots, 16$. The vertical axis shows $\log_2$ of the WAFOM value, and the horizontal axis $\log_2$ of the size of the point sets. The left figure is for dimension $s = 4$, the right for $s = 8$
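A toy version of this random search, reusing the wafom function from the earlier sketch, could look as follows; it is feasible only for small $d$, since the subspace is enumerated explicitly, and a degenerate (linearly dependent) basis, which occurs with small probability, is simply kept here.

```python
import itertools, random

# Toy sketch of the random search: draw a basis of d random elements of
# V = (F_2^n)^s, enumerate the generated subspace, keep the best candidate.

def random_basis(d, s, n):
    return [[[random.randint(0, 1) for _ in range(n)] for _ in range(s)]
            for _ in range(d)]

def span(basis, s, n):
    pts = []
    for coeffs in itertools.product([0, 1], repeat=len(basis)):
        B = [[0] * n for _ in range(s)]
        for c, v in zip(coeffs, basis):
            if c:
                for i in range(s):
                    for j in range(n):
                        B[i][j] ^= v[i][j]
        pts.append(B)
    return pts

def search(trials, d, s, n):
    return min((span(random_basis(d, s, n), s, n) for _ in range(trials)),
               key=wafom)
```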
Fig. 3 QMC integration errors for (1) best WAFOM among 10000, (2) the 100th best WAFOM, (3) Niederreiter-Xing, (4) Sobol', (5) Monte Carlo, using six Genz functions on the 4-dimensional unit cube. The vertical axis shows $\log_2$ of the errors, the horizontal $\log_2$ of the size of the point sets. The error is the mean square error over 100 randomly digitally shifted point sets
Oscillatory $f_1(x) = \cos\big(2\pi u_1 + \sum_{i=1}^{s} a_i x_i\big)$
Product Peak $f_2(x) = \prod_{i=1}^{s} \big(a_i^{-2} + (x_i - u_i)^2\big)^{-1}$
Corner Peak $f_3(x) = \big(1 + \sum_{i=1}^{s} a_i x_i\big)^{-(s+1)}$
Gaussian $f_4(x) = \exp\big(-\sum_{i=1}^{s} a_i^2 (x_i - u_i)^2\big)$
Continuous $f_5(x) = \exp\big(-\sum_{i=1}^{s} a_i |x_i - u_i|\big)$
Discontinuous $f_6(x) = \begin{cases} 0 & \text{if } x_1 > u_1 \text{ or } x_2 > u_2, \\ \exp\big(\sum_{i=1}^{s} a_i x_i\big) & \text{otherwise.} \end{cases}$
This selection is copied from [22, p. 91] [11]. The parameters $a_1, \ldots, a_s$ are selected so that (1) they are in an arithmetic progression, (2) $a_s = 2a_1$, and (3) the average of $a_1, \ldots, a_s$ coincides with the average of $c_1, \ldots, c_{10}$ in [22, Eq. (10)] for each test function. The parameters $u_i$ are generated randomly by [15].
Figure 3 shows the QMC integration errors for the six test functions with five methods, for dimension $s = 4$. The error for Monte Carlo is of order $N^{-1/2}$. The best WAFOM point sets (WAFOM) and Niederreiter-Xing (NX) are comparable. For the function Oscillatory, whose higher derivatives grow relatively slowly, WAFOM point sets perform better than NX and Sobol', and the convergence rate seems to be of order $N^{-2}$. For Product Peak and Gaussian, WAFOM and NX are comparable; this coincides with the fact that the higher derivatives of these test functions grow rapidly, but still we observe a convergence rate of $N^{-1.6}$. For Corner Peak, WAFOM performs better than NX. It is somewhat surprising that the convergence rate is almost $N^{-1.8}$ for WAFOM point sets. For Continuous, NX performs better than WAFOM. Since the test function is not differentiable, $\|f\|_n$ is unbounded and hence the inequality (4) has no meaning. Still, for Continuous, the convergence rate of WAFOM is almost $N^{-1.2}$. For Discontinuous, NX and Sobol' perform better than WAFOM. Note that, except for Discontinuous, the large/small values of WAFOM of NX for $d = 14, 15$ observed in the left of Fig. 2 seem to be reflected in the five graphs.
We conducted similar experiments for dimension $s = 8$, but we omit the results, since the differences in WAFOM are small and the QMC rules show little difference. We report that we still observe a convergence rate of $N^{-\alpha}$ with $\alpha > 1.05$ for the five test functions other than Discontinuous, for the WAFOM-selected points and NX.
Remark 3 The convergence rate of the integration error is even faster than that of the WAFOM values, for WAFOM-selected point sets and NX for $s = 4$, while the Sobol' sequence converges with rate $N^{-1}$. We feel that these results go against our intuition, so we checked the code and compared with MC. We do not know why NX and WAFOM work so well.
5.1 t-Value
Let $P \subset I^s = [0, 1)^s$ be a finite set of cardinality $2^m$. Let $n_1, n_2, \ldots, n_s \ge 0$ be integers. Recall that $I_{n_i}$ is the set of $2^{n_i}$ intervals partitioning $I$. Then, $\prod_{i=1}^{s} I_{n_i}$ is a set of $2^{n_1 + n_2 + \cdots + n_s}$ boxes. We want to make the QMC integration error 0 in computing the volume of every such box. A trivial bound is $n_1 + n_2 + \cdots + n_s \le m$, since at least one point must fall in each box. The point set $P$ is called a $(t, m, s)$-net if the QMC integration error for each box is zero, for any tuple $(n_1, \ldots, n_s)$ with
$$n_1 + n_2 + \cdots + n_s \le m - t.$$
Thus, a smaller $t$-value is preferable.
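For small parameters, the definition can be checked by brute force: the sketch below verifies, for every tuple $(n_1, \ldots, n_s)$ with $n_1 + \cdots + n_s = m - t$, that each elementary interval contains exactly $2^t$ of the $2^m$ points (points given as coordinate tuples in $[0, 1)^s$).

```python
import itertools

# Brute-force test of the (t, m, s)-net property in base 2: every elementary
# interval of volume 2^(t-m) must contain exactly 2^t points.

def is_tms_net(points, t, m, s):
    assert len(points) == 2 ** m
    for split in itertools.product(range(m - t + 1), repeat=s):
        if sum(split) != m - t:
            continue
        counts = {}
        for x in points:
            # Index of the elementary interval containing x under this split.
            key = tuple(int(x[i] * 2 ** split[i]) for i in range(s))
            counts[key] = counts.get(key, 0) + 1
        if any(c != 2 ** t for c in counts.values()):
            return False
    return True
```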
Fig. 4 Left: Hellekalek's function $f(x) = \big(x_1^{1.1} - \frac{1}{1+1.1}\big)\big(x_2^{1.7} - \frac{1}{1+1.7}\big)\big(x_3^{2.3} - \frac{1}{1+2.3}\big)\big(x_4^{2.9} - \frac{1}{1+2.9}\big)$; right: Hamukazu's function $f(x) = \{5x_1\}\{7x_2\}\{11x_3\}\{13x_4\}$, where $\{x\} := x - [x]$. The horizontal axis is for the category, the vertical for the $\log_2$ of the error. $\bullet$: WAFOM, $+$: $t$-value
Then, we sort the same $10^6$ point sets by WAFOM. We categorize them into 10 classes from the smallest WAFOM, so that the $i$th class has the same frequency as the $i$th class by $t$-value. Thus, the same $10^6$ point sets are categorized in two ways. For a given test integrand function, we compute the mean square error of the QMC integral in each category, both for those graded by $t$-value and for those graded by WAFOM.
Figure 4 shows $\log_2$ of the mean square integration error, for each category corresponding to $3 \le t \le 12$ for the $t$-value ($+$), and for the categories sorted by WAFOM value ($\bullet$). The smooth test function on the left-hand side comes from Hellekalek [12], and the non-continuous function on the right-hand side was communicated by Kimikazu Kato (referred to as Hamukazu according to his established Twitter handle). From the left figure, for $t = 3$, the average error for the best 63 point sets with the smallest $t$-value 3 is much larger than the average for the best 63 point sets selected by WAFOM. Thus, the experiments show that, for this test function, WAFOM seems to work better than the $t$-value in selecting good point sets. We have no explanation why the error decreases for $t \ge 9$. In the right figure, for Hamukazu's non-continuous test function, the $t$-value works better in selecting good points.
Thus, it is expected that digital nets that have small $t$-value and small WAFOM would work well for smooth functions and be robust for non-smooth functions. Harase [10] noticed that Owen's linear scrambling [7, Sect. 13] [26] preserves the $t$-value but changes the WAFOM value, so one may search among linearly scrambled versions of a good digital net for those with small WAFOM. In terms of the dual space $P^\perp$ (cf. [20]), WAFOM is the sum
$$\sum_{A \in P^\perp \setminus \{0\}} 2^{-\mu(A)}, \qquad (5)$$
the $t$-value is governed by the minimum weight
$$\min_{A \in P^\perp \setminus \{0\}} \mu_1(A), \qquad (6)$$
and the mean square version of [9] is based on
$$\sum_{A \in P^\perp \setminus \{0\}} 2^{-2\mu(A)}.$$
7 Variants of WAFOM

As mentioned in the previous section, [9] defined $\mathrm{WF}_{\mathrm{r.m.s.}}(P)$. As another direction, the following generalization of WAFOM is proposed by Yoshiki [30] and Ohori [24]: replace the weight exponent by
$$\sum_{1 \le i \le s,\ 1 \le j \le n} (j + \alpha)\, a_{ij}$$
for any (even negative) real number $\alpha$ (note that this definition of $\alpha$ is different from the earlier one, but we could not find a better notation). Then Definition 4 gives $\mathrm{WF}_\alpha(P)$. The case where $\alpha = 1$ is dealt with in [30]. A weak point of the original WAFOM is that the WAFOM value does not vary enough, and consequently it is not useful in grading point sets for a large $s$; see Fig. 2, the $s = 8$ case. By choosing a suitable $\alpha$, we obtain $\mathrm{WF}_\alpha(P)$ that varies for large $s$ (even for $s = 16$) and is useful in choosing a good point set [24].
A table of bases of such point sets is available from Ohori's GitHub Pages: http://majiang.github.io/qmc/index.html. These point sets are obtained by Ohori, using Harase's method based on linear scrambling, from NX sequences. Thus, they have small $t$-values and small WAFOM values. Experiments show their good performance [18].
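As a rough illustration of how such figures of merit are evaluated in practice, here is a sketch in Python. It assumes the inversion-type formula $\mathrm{WF}_\alpha(P) = -1 + \frac{1}{|P|}\sum_{x \in P}\prod_{i,j}\bigl(1 + (-1)^{x_{ij}}\,2^{-(j+\alpha)}\bigr)$, which follows from the dual-sum definition by a standard character-sum identity; the function name and array layout are our own choices, not from the paper.

```python
import numpy as np

def wafom_alpha(points_bits, alpha=1.0):
    """Sketch of WF_alpha(P) for an F_2-linear point set P.

    points_bits has shape (N, s, n): points_bits[b, i, j] is the
    (j+1)-st binary digit of coordinate i of point b.
    Assumed formula:
        WF_alpha(P) = -1 + (1/N) * sum_{x in P} prod_{i,j}
                      (1 + (-1)^{x_ij} * 2^{-(j+alpha)}),
    with the digit index j running from 1 to n.
    """
    N, s, n = points_bits.shape
    j = np.arange(1, n + 1)                  # digit index 1..n
    w = 2.0 ** (-(j + alpha))                # weights 2^{-(j+alpha)}
    signs = 1.0 - 2.0 * points_bits          # (-1)^{x_ij}
    factors = np.prod(1.0 + signs * w[None, None, :], axis=(1, 2))
    return factors.mean() - 1.0

# Toy usage on a random bit array (a real use would pass the digit
# expansion of an F_2-linear point set), here N=2^10, s=4, n=16:
rng = np.random.default_rng(0)
value = wafom_alpha(rng.integers(0, 2, size=(1024, 4, 16)), alpha=0.5)
```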
8 Conclusion

The Walsh figure of merit (WAFOM) [16] for $\mathbb{F}_2$-linear point sets is discussed as a quality measure for a QMC rule. Since WAFOM satisfies a Koksma–Hlawka type inequality (4), its effectiveness for very smooth functions is assured. Through the experiments on QMC integration, we observed that low-WAFOM point sets show higher order convergence such as $O(N^{-1.2})$ for several test functions (including a non-smooth one) in dimension four, and $O(N^{-1.05})$ in dimension eight.
Acknowledgments The authors are deeply indebted to Josef Dick, who patiently and generously informed us of beautiful research in this area, and to Harald Niederreiter for leading us to this research. They thank the members of the Komaba Applied Algebra Seminar (KAPALS) for their indispensable help: Takashi Goda, Shin Harase, Shinsuke Mori, Syoiti Ninomiya, Mutsuo Saito, Kosuke Suzuki, and Takehito Yoshiki. We are thankful to the referees, who suggested numerous improvements to the manuscript. The first author is partially supported by JST CREST and JSPS/MEXT Grant-in-Aid for Scientific Research No. 21654017, No. 23244002, No. 24654019, and No. 15K13460. The second author is partially supported by the Program for Leading Graduate Schools, MEXT, Japan.
References
1. Baldeaux, J., Dick, J., Leobacher, G., Nuyens, D., Pillichshammer, F.: Efficient calculation of the worst-case error and (fast) component-by-component construction of higher order polynomial lattice rules. Numer. Algorithms 59, 403–431 (2012)
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary high order. SIAM J. Numer. Anal. 46, 1519–1553 (2008)
3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bull. Austral. Math. Soc. 80, 430–453 (2009)
4. Dick, J.: On quasi-Monte Carlo rules achieving higher order convergence. In: Monte Carlo and Quasi-Monte Carlo Methods 2008, pp. 73–96. Springer, Berlin (2009)
5. Dick, J., Kritzer, P., Pillichshammer, F., Schmid, W.: On the existence of higher order polynomial lattices based on a generalized figure of merit. J. Complex. 23, 581–593 (2007)
6. Dick, J., Matsumoto, M.: On the fast computation of the weight enumerator polynomial and the t value of digital nets over finite abelian groups. SIAM J. Discret. Math. 27, 1335–1359 (2013)
7. Dick, J., Pillichshammer, F.: Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
8. Genz, A.: A package for testing multiple integration subroutines. In: Numerical Integration: Recent Developments, Software and Applications, pp. 337–340. Springer, Berlin (1987)
9. Goda, T., Ohori, R., Suzuki, K., Yoshiki, T.: The mean square quasi-Monte Carlo error for digitally shifted digital nets. In: Cools, R., Nuyens, D. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2014, vol. 163, pp. 331–350. Springer, Heidelberg (2016)
10. Harase, S.: Quasi-Monte Carlo point sets with small t-values and WAFOM. Appl. Math. Comput. 254, 318–326 (2015)
11. Harase, S., Ohori, R.: A search for extensible low-WAFOM point sets. arXiv:1309.7828
12. Hellekalek, P.: On the assessment of random and quasi-random point sets. In: Random and Quasi-Random Point Sets, pp. 49–108. Springer, Berlin (1998)
13. Joe, S., Kuo, F.: Constructing Sobol' sequences with better two-dimensional projections. SIAM J. Sci. Comput. 30, 2635–2654 (2008). http://web.maths.unsw.edu.au/~fkuo/sobol/new-joe-kuo-6.21201
14. Kakei, S.: Development in Discrete Integrable Systems - Ultra-discretization, Quantization. RIMS, Kyoto (2001)
15. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998). http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
16. Matsumoto, M., Saito, M., Matoba, K.: A computable figure of merit for quasi-Monte Carlo point sets. Math. Comput. 83, 1233–1250 (2014)
17. Matsumoto, M., Yoshiki, T.: Existence of higher order convergent quasi-Monte Carlo rules via Walsh figure of merit. In: Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 569–579. Springer, Berlin (2013)
18. Mori, S.: A fast QMC computation by low-WAFOM point sets. In preparation
19. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF, Philadelphia (1992)
20. Niederreiter, H., Pirsic, G.: Duality for digital nets and its applications. Acta Arith. 97, 173–182 (2001)
21. Niederreiter, H., Xing, C.P.: Low-discrepancy sequences and global function fields with many rational places. Finite Fields Appl. 2, 241–273 (1996)
22. Novak, E., Ritter, K.: High-dimensional integration of smooth functions over cubes. Numer. Math. 75, 79–97 (1996)
23. Nuyens, D.: The magic point shop of QMC point generators and generating vectors. http://people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/. Home page
24. Ohori, R.: Efficient quasi-Monte Carlo integration by adjusting the derivation-sensitivity parameter of Walsh figure of merit. Master's Thesis (2015)
25. Ohori, R., Yoshiki, T.: Walsh figure of merit is efficiently approximable. In preparation
26. Owen, A.B.: Randomly permuted (t,m,s)-nets and (t,s)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods 1994, pp. 299–317. Springer, Berlin (1995)
27. Pirsic, G.: A software implementation of Niederreiter–Xing sequences. In: Monte Carlo and Quasi-Monte Carlo Methods 2000 (Hong Kong), pp. 434–445 (2002)
28. Suzuki, K.: An explicit construction of point sets with large minimum Dick weight. J. Complex. 30, 347–354 (2014)
29. Suzuki, K.: WAFOM on abelian groups for quasi-Monte Carlo point sets. Hiroshima Math. J. 45, 341–364 (2015)
30. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma–Hlawka type inequality for quasi-Monte Carlo integration. arXiv:1504.03175
31. Yoshiki, T.: A lower bound on WAFOM. Hiroshima Math. J. 44, 261–266 (2014)
E. Novak, Mathematisches Institut, University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany. e-mail: erich.novak@uni-jena.de

$$S_d(f) = \int_{D_d} f(x)\, dx \qquad (1)$$

over an open subset $D_d \subset \mathbb{R}^d$ of Lebesgue measure $\lambda_d(D_d) = 1$ for integrable functions $f: D_d \to \mathbb{R}$. The main interest is on the behavior of the minimal number of function values that are needed in the worst case setting to achieve an error of at most $\varepsilon > 0$. Note that classical examples of domains $D_d$ are the unit cube $[0,1]^d$ and the normalized Euclidean ball (with volume 1), which are closed. However, we work with their interiors for definiteness of certain derivatives.
We state our problem. Let $F_d$ be a class of integrable functions $f: D_d \to \mathbb{R}$. For $f \in F_d$, we approximate the integral $S_d(f)$, see (1), by algorithms of the form
$$A_n(f) = \phi_n\bigl(f(x_1), f(x_2), \ldots, f(x_n)\bigr),$$
where $x_j \in D_d$ can be chosen adaptively and $\phi_n: \mathbb{R}^n \to \mathbb{R}$ is an arbitrary mapping. Adaption means that the selection of $x_j$ may depend on the already computed values $f(x_1), f(x_2), \ldots, f(x_{j-1})$. We define $N: F_d \to \mathbb{R}^n$ by $N(f) = (f(x_1), \ldots, f(x_n))$. The (worst case) error of the algorithm $A_n$ is defined by
$$e(A_n) = \sup_{f \in F_d} |S_d(f) - A_n(f)|.$$
The information complexity $n(\varepsilon, F_d)$ is the minimal number of function values which is needed to guarantee that the error is at most $\varepsilon$, i.e.,
$$n(\varepsilon, F_d) = \min\{ n \mid \exists\, A_n \text{ such that } e(A_n) \le \varepsilon \}.$$
We minimize $n$ over all choices of adaptive sample points $x_j$ and mappings $\phi_n$. In this paper we give an overview of some of the basic results that are known about the numbers $e(n, F_d)$ and $n(\varepsilon, F_d)$. Hence we concentrate on complexity issues and leave aside other important questions such as implementation issues.
It was proved by Smolyak and Bakhvalov that as long as the class $F_d$ is convex and balanced we may restrict the minimization of $e(A_n)$ to nonadaptive choices of $x_j$ and linear mappings $\phi_n$, i.e., it is enough to consider $A_n$ of the form
$$A_n(f) = \sum_{i=1}^{n} a_i f(x_i). \qquad (2)$$

Theorem 0 (Bakhvalov [6]) Assume that the class $F_d$ is convex and balanced. Then
$$e(n, F_d) = \inf_{x_1, \ldots, x_n}\ \sup_{f \in F_d,\; N(f) = 0} S_d(f) \qquad (3)$$
and for the infimum in the definition of $e(n, F_d)$ it is enough to consider linear and nonadaptive algorithms $A_n$ of the form (2).
In this paper we only consider convex and balanced $F_d$ and then we can use the last formula for $e(n, F_d)$.

Remark 0 (a) For a proof of Theorem 0 see, for example, [87, Theorem 4.7]. This result is not really about complexity (hence it got its number), but it helps to prove complexity results.
(b) A linear algorithm $A_n$ is called a quasi-Monte Carlo (QMC) algorithm if $a_i = 1/n$ for all $i$, and a positive quadrature formula if $a_i > 0$ for all $i$. In general it may happen that optimal quadrature formulas have some negative weights and, in addition, we cannot say much about the position of good points $x_i$.
(c) More on the optimality of linear algorithms and on the power of adaption can be found in [14, 77, 87, 112, 113]. There are important classes of functions that are not balanced and convex, and where Theorem 0 cannot be applied; see also [13, 94].
The optimal order of convergence plays an important role in numerical analysis. We start with a classical result of Bakhvalov (1959) for the class
$$F_d^k = \{ f: [0,1]^d \to \mathbb{R} \mid \|D^\alpha f\|_\infty \le 1,\ |\alpha| \le k \},$$
where $k \in \mathbb{N}$ and $|\alpha| = \sum_{i=1}^d \alpha_i$ for $\alpha \in \mathbb{N}_0^d$, and $D^\alpha f$ denotes the respective partial derivative. For two sequences $a_n$ and $b_n$ of positive numbers we write $a_n \asymp b_n$ if there are positive numbers $c$ and $C$ such that $c < a_n/b_n < C$ for all $n \in \mathbb{N}$.

Theorem 1 (Bakhvalov [5])
$$e(n, F_d^k) \asymp n^{-k/d}. \qquad (4)$$
Remark 1 (a) For such a complexity result one needs to prove an upper bound (for a particular algorithm) and a lower bound (for all algorithms). For the upper bound one can use tensor product methods based on a regular grid, i.e., one can use the $n = m^d$ points $x_i$ with coordinates from the set $\{1/(2m), 3/(2m), \ldots, (2m-1)/(2m)\}$. The lower bound can be proved with the technique of bump functions: One can construct $2n$ functions $f_1, \ldots, f_{2n}$ with disjoint supports such that all $2^{2n}$ functions of the form $\sum_{i=1}^{2n} \varepsilon_i f_i$ are contained in $F_d^k$, where $\varepsilon_i = \pm 1$ and $S_d(f_i) \ge c_{d,k}\, n^{-k/d-1}$. Since an algorithm $A_n$ can only compute $n$ function values, there are two functions $f^+ = \sum_{i=1}^{2n} f_i$ and $f^- = f^+ - 2 \sum_{k=1}^{n} f_{i_k}$ (where the $f_{i_k}$ are bumps whose supports contain no sample points) such that $f^+, f^- \in F_d^k$ and $A_n(f^+) = A_n(f^-)$, but $|S_d(f^+) - S_d(f^-)| \ge 2n\, c_{d,k}\, n^{-k/d-1}$. Hence the error of $A_n$ must be at least $c_{d,k}\, n^{-k/d}$. For the details see, for example, [78].
(b) Observe that we cannot conclude much on $n(\varepsilon, F_d^k)$ if $\varepsilon$ is fixed and $d$ is large, since Theorem 1 contains hidden factors that depend on $k$ and $d$. Actually the lower bound is of the form
$$e(n, F_d^k) \ge c_{d,k}\, n^{-k/d},$$
where the $c_{d,k}$ decrease with $d$ and tend to zero.
(c) The proof of the upper bound (using tensor product algorithms) is easy since we assumed that the domain is $D_d = [0,1]^d$. The optimal order of convergence is known for much more general spaces (such as Besov and Triebel–Lizorkin spaces) and arbitrary bounded Lipschitz domains, see [85, 115, 118]. Then the proof of the upper bounds is more difficult, however.
(d) Integration on fractals was recently studied by Dereich and Müller-Gronbach [18]. These authors also obtain an optimal order of convergence $n^{-k/\rho}$. The definition of $S_d$ must be modified, and $\rho$ coincides, under suitable conditions, with the Hausdorff dimension of the fractal.
By the curse of dimensionality we mean that $n(\varepsilon, F_d)$ is exponentially large in $d$. That is, there are positive numbers $c$, $\varepsilon_0$ and $\gamma$ such that
$$n(\varepsilon, F_d) \ge c\, (1+\gamma)^d \quad \text{for all } \varepsilon \le \varepsilon_0. \qquad (5)$$
If, on the other hand, $n(\varepsilon, F_d)$ is bounded by a polynomial in $d$ and $\varepsilon^{-1}$, then we say that the problem is polynomially tractable. If $n(\varepsilon, F_d)$ is bounded by a polynomial in $\varepsilon^{-1}$ alone, i.e., $n(\varepsilon, F_d) \le C \varepsilon^{-\beta}$ for $\varepsilon < 1$, then we say that the problem is strongly polynomially tractable.
From the proof of Theorem 1 we cannot conclude whether the curse of dimensionality holds for the classes $F_d^k$ or not; see Theorem 11. Possibly Maung Zho Newn and Sharygin [124] were the first who published (in 1971) a complexity result for arbitrary $d$ with explicit constants and so proved the curse of dimensionality for Lipschitz functions.
Theorem 2 (Maung Zho Newn and Sharygin [124]) Consider the class
$$F_d = \{ f: [0,1]^d \to \mathbb{R} \mid |f(x) - f(y)| \le \max_i |x_i - y_i| \}.$$
Then
$$e(n, F_d) = \frac{d}{2d+2}\, n^{-1/d}$$
for $n = m^d$ with $m \in \mathbb{N}$.

Remark 2 One can show that for $n = m^d$ the regular grid (points $x_i$ with coordinates from the set $\{1/(2m), 3/(2m), \ldots, (2m-1)/(2m)\}$) and the midpoint rule $A_n(f) = n^{-1} \sum_{i=1}^{n} f(x_i)$ are optimal; a small sketch of this rule is given below. See also [3, 4, 12, 107] for this result and for generalizations to similar function spaces.
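A minimal sketch of the optimal midpoint rule of Remark 2 (the function name and the example integrand are our own illustrative choices):

```python
import numpy as np
from itertools import product

def midpoint_rule(f, d, m):
    """Tensor-product midpoint rule on [0,1]^d with n = m^d points whose
    coordinates lie in {1/(2m), 3/(2m), ..., (2m-1)/(2m)}."""
    grid_1d = (2 * np.arange(m) + 1) / (2 * m)
    total = 0.0
    for x in product(grid_1d, repeat=d):   # all m^d grid points
        total += f(np.array(x))
    return total / m ** d

# Example: f(x) = max_i x_i is Lipschitz w.r.t. the max-norm.
approx = midpoint_rule(lambda x: x.max(), d=3, m=8)
```

Note that the cost $n = m^d$ already exhibits the exponential dependence on $d$ that the curse of dimensionality formalizes.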
2 Randomized Algorithms
The integration problem is difficult for all deterministic algorithms if the classes Fd
of inputs are too large, see Theorem 2. One may hope that randomized algorithms
make this problem much easier.
For a randomized algorithm $A = (A_\omega)_\omega$ the quantity
$$e^{\mathrm{ran}}(A) = \sup_{f \in F} \left( \int \bigl( S(f) - \phi_\omega(N_\omega(f)) \bigr)^2 \, d\mu(\omega) \right)^{1/2}$$
is the error of $A$. By
$$e^{\mathrm{ran}}(n, F_d) = \inf\{ e^{\mathrm{ran}}(A) : n(A) \le n \}$$
we denote the minimal error of randomized algorithms that use at most $n$ function values. If $A: F \to G$ is a (measurable) deterministic algorithm, then $A$ can also be treated as a randomized algorithm with respect to a Dirac (atomic) measure $\mu$. In this sense we can say that deterministic algorithms are special randomized algorithms. Hence the inequality
$$e^{\mathrm{ran}}(n, F_d) \le e(n, F_d) \qquad (6)$$
is trivial. The number $e^{\mathrm{ran}}(0, F_d)$ is called the initial error in the randomized setting. For $n = 0$, we do not sample $f$, and $A_\omega(f)$ is independent of $f$, but may depend on $\omega$.
It is easy to check that for a linear $S$ and a balanced and convex set $F$, the best we can do is to take $A_\omega = 0$ and then
$$e^{\mathrm{ran}}(0, F_d) = e(0, F_d).$$
This means that for linear problems the initial errors are the same in the worst case and randomized settings.
The main advantage of randomized algorithms is that the curse of dimensionality is not present even for certain large classes of functions. With the standard Monte Carlo method we obtain
$$e^{\mathrm{ran}}(n, F_d) \le \frac{1}{\sqrt{n}},$$
when $F_d$ is the unit ball of $L_p([0,1]^d)$ and $2 \le p \le \infty$. Mathé [72] proved that this is almost optimal and the optimal algorithm is
$$A_n(f) = \frac{1}{n + \sqrt{n}} \sum_{i=1}^{n} f(X_i)$$
with i.i.d. random variables $X_i$ that are uniformly distributed on $[0,1]^d$. It also follows that
$$e^{\mathrm{ran}}(n, F_d) = \frac{1}{1 + \sqrt{n}},$$
when $F_d$ is the unit ball of $L_p([0,1]^d)$ and $2 \le p \le \infty$. In the case $1 \le p < 2$ one can only achieve the rate $n^{-1+1/p}$; for a discussion see [50].
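A minimal sketch of the Mathé estimator (the helper name and the test integrand are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def mathe_mc(f, d, n):
    """Shrunken Monte Carlo estimate (1/(n + sqrt(n))) * sum_i f(X_i)
    with X_i i.i.d. uniform on [0,1]^d, as in Mathé's optimal algorithm."""
    x = rng.random((n, d))
    return np.sum(f(x)) / (n + np.sqrt(n))

# Standard MC divides by n; the slightly larger denominator shrinks the
# estimate toward 0, which is optimal in the worst case over the unit ball.
est = mathe_mc(lambda x: np.cos(2 * np.pi * x).prod(axis=1), d=5, n=10_000)
```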
Bakhvalov [5] found the optimal order of convergence already in 1959 for the class
$$F_d^k = \{ f: [0,1]^d \to \mathbb{R} \mid \|D^\alpha f\|_\infty \le 1,\ |\alpha| \le k \},$$
where $k \in \mathbb{N}$ and $|\alpha| = \sum_{i=1}^d \alpha_i$ for $\alpha \in \mathbb{N}_0^d$:
$$e^{\mathrm{ran}}(n, F_d^k) \asymp n^{-k/d - 1/2}. \qquad (7)$$
Remark 3 A proof of the upper bound can be given with a technique that is often called separation of the main part, or also control variates. For $n = 2m$ use $m$ function values to construct a good $L_2$ approximation $f_m$ of $f \in F_d^k$ by a deterministic algorithm. The optimal order of convergence is
$$\| f - f_m \|_2 \asymp m^{-k/d}.$$
Then use the remaining $m$ values for the estimator
$$A_n(f) = S_d(f_m) + \frac{1}{m} \sum_{i=1}^{m} (f - f_m)(X_i)$$
with i.i.d. random variables $X_i$ that are uniformly distributed on $[0,1]^d$. See, for example, [73, 78] for more details. We add in passing that the optimal order of convergence can be obtained for many function spaces (Besov spaces, Triebel–Lizorkin spaces) and for arbitrary bounded Lipschitz domains $D_d \subset \mathbb{R}^d$; see [85], where the approximation problem is studied. To obtain an explicit randomized algorithm with the optimal rate of convergence one needs a random number generator for the set $D_d$. If it is not possible to obtain efficiently random samples from the uniform distribution on $D_d$ one can work with Markov chain Monte Carlo (MCMC) methods, see Theorem 5.
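A schematic version of the separation-of-the-main-part estimator, assuming a deterministic approximation `f_m` with exactly known integral `S_fm` is available (both are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

def main_part_mc(f, f_m, S_fm, sample_uniform, m):
    """Separation of the main part: S(f) ~ S(f_m) + MC average of f - f_m.
    f_m is a deterministic approximation of f with known integral S_fm."""
    x = sample_uniform(m)
    return S_fm + np.mean(f(x) - f_m(x))

# Toy check on [0,1]: f(x) = exp(x), crude approximation f_m(x) = 1 + x
# with known integral 3/2; the MC part only sees the small residual.
est = main_part_mc(
    f=np.exp,
    f_m=lambda x: 1.0 + x,
    S_fm=1.5,
    sample_uniform=lambda m: rng.random(m),
    m=10_000,
)
```

The variance of the Monte Carlo part is driven by $\|f - f_m\|_2$, which is exactly why a good deterministic approximation boosts the rate by $n^{-1/2}$.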
All known proofs of lower bounds use the idea of Bakhvalov (also called Yao's Minimax Principle): study the average case setting with respect to a probability measure on $F$ and use the theorem of Fubini. For details see [45–47, 73, 78, 88].
We describe a problem that was studied by several colleagues and solved by Hinrichs [58] using deep results from functional analysis. Let $H(K_d)$ be a reproducing kernel Hilbert space of real functions defined on a Borel measurable set $D_d \subset \mathbb{R}^d$. Its reproducing kernel $K_d: D_d \times D_d \to \mathbb{R}$ is assumed to be integrable with respect to a probability density $\varrho_d$ on $D_d$,
$$C_d^{\mathrm{init}} := \left( \int_{D_d} \int_{D_d} K_d(x, y)\, \varrho_d(x)\, \varrho_d(y)\, dx\, dy \right)^{1/2} < \infty.$$
The integration problem is
$$S_d(f) = \int_{D_d} f(x)\, \varrho_d(x)\, dx \quad \text{for all } f \in H(K_d),$$
and we study importance sampling algorithms
$$A_{n,d,\omega_d}(f) = \frac{1}{n} \sum_{j=1}^{n} \frac{f(x_j)\, \varrho_d(x_j)}{\omega_d(x_j)},$$
where the sample points $x_j$ are i.i.d. with density $\omega_d$. The error of such an algorithm is
$$e^{\mathrm{ran}}(A_{n,d,\omega_d}) = \sup_{\|f\|_{H(K_d)} \le 1} \left( \mathbb{E} \bigl| S_d(f) - A_{n,d,\omega_d}(f) \bigr|^2 \right)^{1/2},$$
where the expectation is with respect to the random choice of the sample points $x_j$. For $n = 0$ we formally take $A_{0,d,\omega_d} = 0$ and then
$$e^{\mathrm{ran}}(0, H(K_d)) = C_d^{\mathrm{init}}.$$

Theorem 4 (Hinrichs [58]) Assume additionally that $K_d(x, y) \ge 0$ for all $x, y \in D_d$. Then there exists a positive density $\omega_d$ such that
$$e^{\mathrm{ran}}(A_{n,d,\omega_d}) \le \left( \frac{\pi}{2} \right)^{1/2} \frac{1}{\sqrt{n}}\; e^{\mathrm{ran}}(0, H(K_d)).$$
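A generic importance sampling estimator of this form might look as follows (all function arguments are hypothetical placeholders; the optimal density $\omega_d$ of Theorem 4 is not constructed here):

```python
import numpy as np

rng = np.random.default_rng(2)

def importance_sampling(f, rho, omega, sample_omega, n):
    """Estimate S(f) = int f(x) rho(x) dx by the importance sampling
    average (1/n) sum_j f(x_j) rho(x_j) / omega(x_j), x_j ~ omega i.i.d."""
    x = sample_omega(n)
    return np.mean(f(x) * rho(x) / omega(x))

# Toy check on [0,1]: target density rho = 1, sampling density omega = 2x
# (sampled by inversion as sqrt(U)), integrand f(x) = x**2; exact value 1/3.
est = importance_sampling(
    f=lambda x: x ** 2,
    rho=lambda x: np.ones_like(x),
    omega=lambda x: 2.0 * x,
    sample_omega=lambda n: np.sqrt(rng.random(n)),
    n=100_000,
)
```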
Assume now that we want to compute the mean
$$S(f) = \int_{D} f(x)\, \pi(dx)$$
with respect to a target distribution $\pi$. Then Markov chain Monte Carlo (MCMC) methods are a very versatile and widely used tool. We use an average of a finite Markov chain sample as approximation of the mean, i.e., we approximate $S(f)$ by
$$S_{n,n_0}(f) = \frac{1}{n} \sum_{j=1}^{n} f(X_{j+n_0}),$$
where $n_0$ is the burn-in. The individual error is measured by
$$e_\nu(S_{n,n_0}, f) = \left( \mathbb{E}_{\nu,K} \bigl| S_{n,n_0}(f) - S(f) \bigr|^2 \right)^{1/2},$$
where $\nu$ and $K$ indicate the initial distribution and the transition kernel of the chain; we work with the spaces $L_p = L_p(\pi)$. For the proof of the following error bound we refer to [98, Theorem 3.34 and Theorem 3.41].
Theorem 5 (Rudolf [98]) Let $(X_n)_{n \in \mathbb{N}}$ be a Markov chain with reversible transition kernel $K$, initial distribution $\nu$, and transition operator $P$. Further, let
$$\Lambda = \sup\{ |\lambda| : \lambda \in \mathrm{spec}(P - S) \},$$
where $\mathrm{spec}(P - S)$ denotes the spectrum of the operator $(P - S): L_2 \to L_2$, and assume that $\Lambda < 1$. Then
$$\sup_{\|f\|_p \le 1} e_\nu(S_{n,n_0}, f)^2 \le \frac{2}{n(1 - \Lambda)} + \frac{2\, C\, \beta^{n_0}}{n^2 (1 - \Lambda)^2} \qquad (8)$$
holds in the following two cases: for $p = 2$, if the transition kernel is $L_1$-exponentially convergent with $(\beta, M)$, i.e.,
$$\| P^n - S \|_{L_1 \to L_1} \le M \beta^n$$
for all $n \in \mathbb{N}$, and $C = M \bigl\| \tfrac{d\nu}{d\pi} - 1 \bigr\|_\infty$; for $p = 4$, if $\tfrac{d\nu}{d\pi} \in L_2$ and $\beta = \| P - S \|_{L_2 \to L_2} < 1$, where $C = 64 \bigl\| \tfrac{d\nu}{d\pi} - 1 \bigr\|_2$.
Remark 5 Let us discuss the results. First observe that we assume that the so-called spectral gap $1 - \Lambda$ is positive; in general we only know that $|\Lambda| \le 1$. If the transition kernel is $L_1$-exponentially convergent, then we have an explicit error bound for integrands $f \in L_2$ whenever the initial distribution $\nu$ has a density $\tfrac{d\nu}{d\pi} \in L_\infty$. However, in general it is difficult to provide explicit values $\beta$ and $M$ such that the transition kernel is $L_1$-exponentially convergent with $(\beta, M)$. This motivates considering transition kernels which satisfy a weaker convergence property, such as the existence of an $L_2$-spectral gap, i.e., $\|P - S\|_{L_2 \to L_2} < 1$. In this case we have an explicit error bound for integrands $f \in L_4$ whenever the initial distribution has a density $\tfrac{d\nu}{d\pi} \in L_2$.
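A minimal sketch of the estimator $S_{n,n_0}$, using a random-walk Metropolis step for a standard normal target as a stand-in for the abstract transition kernel $K$ (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def mcmc_mean(f, step, x0, n, n0):
    """S_{n,n0}(f) = (1/n) * sum_{j=1}^n f(X_{j+n0}): average along a
    Markov chain after discarding a burn-in of length n0."""
    x = x0
    for _ in range(n0):              # burn-in
        x = step(x)
    total = 0.0
    for _ in range(n):
        x = step(x)
        total += f(x)
    return total / n

def rw_step(x, h=1.0):
    """Random-walk Metropolis step for a standard normal target:
    accept y with probability min(1, exp((x^2 - y^2)/2))."""
    y = x + h * rng.standard_normal()
    accept = np.log(rng.random()) < 0.5 * (x * x - y * y)
    return y if accept else x

# E[X^2] = 1 under the N(0,1) target.
est = mcmc_mean(lambda x: x * x, rw_step, x0=0.0, n=20_000, n0=1_000)
```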
The Frolov algorithm is of the form
$$A_n(f) = \frac{|\det A|}{a^d} \sum_{m \in \mathbb{Z}^d} f\!\left( \frac{A m}{a} \right),$$
where $A$ is a suitable matrix that does not depend on $k$ or $n$, and $a > 0$. Of course the sum is finite since we use only the points $\frac{Am}{a}$ in $(0,1)^d$.
This algorithm is similar to a lattice rule but is not quite a lattice rule, since the points do not form an integration lattice. The sum of the weights is roughly 1, but not quite. Therefore this algorithm is not really a quasi-Monte Carlo algorithm. The algorithm $A_n$ can be modified to obtain the optimal order of convergence for the whole space $W_p^{k,\mathrm{mix}}([0,1]^d)$. The modified algorithm uses different points $x_i$ but still positive weights $a_i$. For a tutorial on this algorithm see [116]. Error bounds for Besov spaces are studied in [35]. Triebel–Lizorkin spaces and the case of small smoothness are studied in [117] and [74].
For the Besov–Nikolskii classes $S_{p,q}^{r} B(T^d)$ with $1 \le p, q \le \infty$ and $1/p < r < 2$, the optimal rate is
$$n^{-r} (\log n)^{(d-1)(1-1/q)}$$
and can be obtained constructively with QMC algorithms, see [63]. The lower bound was proved by Triebel [115].
The Frolov algorithm can be used as a building block for a randomized algorithm that is universal in the sense that it has the optimal order of convergence (in the randomized setting as well as in the worst case setting) for many different function spaces, see [65].

A famous algorithm for tensor product problems is the Smolyak algorithm, also called the sparse grids algorithm. We can mention just a few papers and books that deal with this topic: The algorithm was invented by Smolyak [106] and, independently, by several other colleagues and research groups. Several error bounds were proved by Temlyakov [108, 110]; explicit error bounds (without unknown constants) were obtained by Wasilkowski and Wozniakowski [121, 123]. Novak and Ritter [80–82] studied the particular Clenshaw–Curtis Smolyak algorithm. A survey is Bungartz and Griebel [9] and another one is [88, Chap. 15]. For recent results on the order of convergence see Sickel and T. Ullrich [99, 100] and Dinh Dung and T. Ullrich [36]. The recent paper [62] contains a tractability result for the Smolyak algorithm applied to very smooth functions. We display only one recent result on the Smolyak algorithm.
Theorem 7 (Sickel and T. Ullrich [100]) For the classes $W_2^{k,\mathrm{mix}}([0,1]^d)$ one can construct a Smolyak algorithm with the order of the error
$$n^{-k} (\log n)^{(d-1)(k+1/2)}. \qquad (9)$$
Remark 7 (a) The bound (9) is valid even for $L_2$ approximation instead of integration, but it is not known whether this upper bound is optimal for the approximation problem. Using the technique of control variates one can obtain the order
$$n^{-k-1/2} (\log n)^{(d-1)(k+1/2)}$$
for the integration problem in the randomized setting. This algorithm is not often used since it is not easy to implement and its arithmetic cost is rather high. In addition, the rate can be improved by the algorithm of [65] to $n^{-k-1/2} (\log n)^{(d-1)/2}$.
(b) It is shown in Dinh Dung and T. Ullrich [36] that the order (9) cannot be improved when restricting to Smolyak grids.
(c) We give a short description of the Clenshaw–Curtis Smolyak algorithm for the computation of integrals $\int_{[-1,1]^d} f(x)\, dx$ that often leads to almost optimal error bounds, see [81]; a runnable sketch follows the description.
We assume that for $d = 1$ a sequence of formulas
$$U^i(f) = \sum_{j=1}^{m_i} a_j^i f(x_j^i)$$
is given. In the case of numerical integration the $a_j^i$ are just numbers. The method $U^i$ uses $m_i$ function values and we assume that $U^{i+1}$ has smaller error than $U^i$ and $m_{i+1} > m_i$. Define then, for $d > 1$, the tensor product formulas
$$(U^{i_1} \otimes \cdots \otimes U^{i_d})(f) = \sum_{j_1=1}^{m_{i_1}} \cdots \sum_{j_d=1}^{m_{i_d}} a_{j_1}^{i_1} \cdots a_{j_d}^{i_d}\, f\bigl( x_{j_1}^{i_1}, \ldots, x_{j_d}^{i_d} \bigr).$$
The Smolyak formulas are the linear combinations
$$A(q, d) = \sum_{q-d+1 \le |i| \le q} (-1)^{q-|i|} \binom{d-1}{q-|i|} \bigl( U^{i_1} \otimes \cdots \otimes U^{i_d} \bigr),$$
where $q \ge d$ and $|i| = i_1 + \cdots + i_d$. Specifically, we use, for $d > 1$, the Smolyak construction and start, for $d = 1$, with the classical Clenshaw–Curtis formula
$$U^i(f) = \sum_{j=1}^{m_i} a_j^i f(x_j^i)$$
with $m_1 = 1$, $m_i = 2^{i-1} + 1$ for $i > 1$, and the knots
$$x_j^i = \cos\left( \frac{\pi (j-1)}{m_i - 1} \right), \qquad j = 1, \ldots, m_i$$
(and $x_1^1 = 0$). Hence we use nonequidistant knots. The weights $a_j^i$ are defined in such a way that $U^i$ is exact for all (univariate) polynomials of degree at most $m_i$.
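The following sketch implements the Clenshaw–Curtis Smolyak construction just described on $[-1,1]^d$; the univariate weights are obtained here by solving for exactness on Chebyshev polynomials, which is a convenient (not the fastest) choice, and all names are ours:

```python
import numpy as np
from itertools import product
from math import comb

def cc_nodes_weights(i):
    """Clenshaw-Curtis rule U^i on [-1,1]: m_1 = 1, m_i = 2^(i-1)+1,
    nodes x_j = cos(pi*(j-1)/(m_i - 1)); weights chosen for polynomial
    exactness via the Chebyshev moments int T_k = 2/(1-k^2) (k even)."""
    if i == 1:
        return np.array([0.0]), np.array([2.0])   # midpoint rule
    m = 2 ** (i - 1) + 1
    j = np.arange(m)
    x = np.cos(np.pi * j / (m - 1))
    V = np.cos(np.outer(np.arange(m), np.pi * j / (m - 1)))  # T_k(x_j)
    moments = np.array([2.0 / (1 - k * k) if k % 2 == 0 else 0.0
                        for k in range(m)])
    return x, np.linalg.solve(V, moments)

def smolyak(f, d, q):
    """A(q,d) = sum_{q-d+1 <= |i| <= q} (-1)^(q-|i|) C(d-1, q-|i|)
    (U^{i_1} x ... x U^{i_d})(f), evaluated by direct enumeration."""
    total = 0.0
    for ivec in product(range(1, q - d + 2), repeat=d):
        k = sum(ivec)
        if not (q - d + 1 <= k <= q):
            continue
        coeff = (-1) ** (q - k) * comb(d - 1, q - k)
        rules = [cc_nodes_weights(i) for i in ivec]
        for nodes_w in product(*(zip(x, w) for x, w in rules)):
            xs = np.array([nw[0] for nw in nodes_w])
            ws = np.prod([nw[1] for nw in nodes_w])
            total += coeff * ws * f(xs)
    return total

# Example: integrate cos(x1)*cos(x2) over [-1,1]^2 (exact: (2 sin 1)^2).
val = smolyak(lambda x: np.prod(np.cos(x)), d=2, q=6)
```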
It turns out that many tensor product problems are still intractable and suffer from the curse of dimensionality; for a rather exhaustive presentation see [87, 88, 90]. Sloan and Wozniakowski [103] describe a very interesting idea that was further developed in hundreds of papers; the paper [103] is most important and influential. We can describe here only the very beginnings of a long ongoing story; we present just one example instead of the whole theory.
The rough idea is that $f: [0,1]^d \to \mathbb{R}$ may depend on many variables, $d$ is large, but some variables or groups of variables are more important than others. Consider, for $d = 1$, the inner product
$$\langle f, g \rangle_{1,\gamma} = \int_0^1 f\, dx \int_0^1 g\, dx + \frac{1}{\gamma} \int_0^1 f'(x)\, g'(x)\, dx,$$
where $\gamma > 0$. If $\gamma$ is small then $f$ must be almost constant if it has small norm.
A large $\gamma$ means that $f$ may have a large variation and still the norm is relatively small. Now we take tensor products of such spaces with weights $\gamma_1 \ge \gamma_2 \ge \ldots$ and consider the complexity of the integration problem for the unit ball $F_d$ with respect to this weighted norm. The kernel $K$ of the tensor product space $H(K)$ is of the form
$$K(x, y) = \prod_{i=1}^{d} K_{\gamma_i}(x_i, y_i),$$
and the expected squared error of a QMC algorithm with $n$ i.i.d. uniformly distributed random points is $n^{-1}$ times
$$\int_{[0,1]^d} K(x, x)\, dx - \int_{[0,1]^{2d}} K(x, y)\, dx\, dy.$$
This expectation is of the form $C_d\, n^{-1}$ and the sequence $C_d$ is bounded if and only if $\sum_i \gamma_i < \infty$. The lower bound in [103] is based on the fact that the kernel $K$ is always non-negative; this leads to lower bounds for QMC algorithms or, more generally, for algorithms with positive weights.
As already indicated, Sloan and Wozniakowski [103] was continued in many directions. Much more general weights and many different Hilbert spaces were studied. By the probabilistic method one only obtains the existence of good QMC algorithms but, in the meanwhile, there exist many results about the construction of good algorithms. In this paper the focus is on the basic complexity results and therefore we simply list a few of the most relevant papers: [8, 11, 26–28, 53–55, 66–69, 92, 93, 102, 104, 105]. See also the books [23, 71, 75, 88] and the excellent survey paper [29].
Theorem 9 The integration problem just described is strongly polynomially tractable if and only if $\sum_{i=1}^{\infty} \gamma_i < \infty$.
Remark 9 Due to the known upper bound of Theorem 8, to prove Theorem 9 it is
enough to prove a lower bound for arbitrary algorithms. This is done via the technique
of decomposable kernels that was developed in [86], see also [88, Chap. 11].
We do not describe this technique here and only remark that we need for this
technique many non-zero functions f i in the Hilbert space Fd with disjoint supports.
Therefore this technique usually works for functions with finite smoothness, but not
for analytic functions.
Tractability of integration can be proved for many weighted spaces, and one may ask whether there are also unweighted spaces where tractability holds as well. A famous example is given by integration problems that are related to the star discrepancy.
For $x_1, \ldots, x_n \in [0,1]^d$ define the star discrepancy by
$$D^*(x_1, \ldots, x_n) = \sup_{t \in [0,1]^d} \left| t_1 \cdots t_d - \frac{1}{n} \sum_{i=1}^{n} 1_{[0,t)}(x_i) \right|.$$
For the QMC algorithm $Q_n(f) = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$ and functions with
$$\| f \| := \left\| \frac{\partial^d f}{\partial x_1\, \partial x_2 \cdots \partial x_d} \right\|_1 \le 1,$$
the Koksma–Hlawka inequality bounds the integration error by $D^*(x_1, \ldots, x_n)\, \|f\|$; hence the star discrepancy is a worst case error bound for integration. We define
$$n(\varepsilon, F_d) = \min\{ n \mid \exists\, x_1, \ldots, x_n \text{ with } D^*(x_1, \ldots, x_n) \le \varepsilon \}.$$
The following result shows that this integration problem is polynomially tractable and the complexity is linear in the dimension.

Theorem 10 ([51])
$$n(\varepsilon, F_d) \le C\, d\, \varepsilon^{-2} \qquad (10)$$
and
$$n(1/64, F_d) \ge 0.18\, d.$$
Remark 10 This result was modified and improved in various ways and we mention some important results. Hinrichs [57] proved the lower bound
$$n(\varepsilon, F_d) \ge c\, d\, \varepsilon^{-1} \quad \text{for } \varepsilon \le \varepsilon_0.$$
Aistleitner [1] proved that the constant $C$ in (10) can be taken as 100. Aistleitner and Hofer [2] proved more on upper bounds. Already the proof in [51] showed that
$$\mathbb{E}\bigl( D^*(x_1, \ldots, x_n) \bigr) \le C \sqrt{d/n} \quad \text{for } n \ge d$$
for i.i.d. uniform points.
Since the upper bounds are proved with the probabilistic method, we only know the existence of points with small star discrepancy. The existence results can be transformed into (more or less explicit) constructions, and the problem is, of course, to minimize the computing time as well as the discrepancy. One of the obstacles is that already the computation of the star discrepancy of given points $x_1, x_2, \ldots, x_n$ is very difficult; a brute-force sketch is given below. We refer the reader to [19, 24, 25, 31–34, 42, 59].
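To illustrate the difficulty, here is a brute-force computation of the star discrepancy for small $n$ and $d$ (a sketch; the grid-enumeration argument is standard, but the code and its names are ours):

```python
import numpy as np
from itertools import product

def star_discrepancy_exact(points):
    """Exact star discrepancy of a small point set in [0,1)^d.
    The sup over boxes [0,t) is attained at t-vectors whose coordinates
    come from point coordinates (or 1), so we enumerate that grid and
    test both open and closed boxes -- cost of order n^d, which already
    shows why this computation is hard in higher dimension."""
    pts = np.asarray(points)
    n, d = pts.shape
    coords = [np.unique(np.append(pts[:, j], 1.0)) for j in range(d)]
    disc = 0.0
    for t in product(*coords):
        t = np.array(t)
        vol = t.prod()
        frac_open = np.mean((pts < t).all(axis=1))      # x in [0, t)
        frac_closed = np.mean((pts <= t).all(axis=1))   # x in [0, t]
        disc = max(disc, abs(vol - frac_open), abs(vol - frac_closed))
    return disc

rng = np.random.default_rng(4)
print(star_discrepancy_exact(rng.random((32, 2))))
```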
Recently Dick [20] proved a tractability result for another unweighted space that is defined via an $L_1$-norm and consists of periodic functions; we denote Fourier coefficients by $\hat{f}(k)$, where $k \in \mathbb{Z}^d$. Let $0 < \alpha \le 1$ and $1 \le p \le \infty$ and
$$F_{\alpha,p,d} = \left\{ f: [0,1]^d \to \mathbb{R} \;\middle|\; \sum_{k \in \mathbb{Z}^d} |\hat{f}(k)| + \sup_{x,h} \frac{|f(x+h) - f(x)|}{\|h\|_p^\alpha} \le 1 \right\}.$$
Dick proved an upper bound for $e(n, F_{\alpha,p,d})$ that depends only polynomially on $d$ and decays polynomially in $n$, so the integration problem on this unweighted class is polynomially tractable.
We finally return to the computation of
$$\int_{D_d} f(x)\, dx \qquad (11)$$
up to some error $\varepsilon > 0$, where $D_d \subset \mathbb{R}^d$ has Lebesgue measure 1. The results hold for arbitrary sets $D_d$; the standard example of course is $D_d = [0,1]^d$. For convenience we consider functions $f: \mathbb{R}^d \to \mathbb{R}$. This makes the function class a bit smaller and the result a bit stronger, since our emphasis is on lower bounds.
It has not been known if the curse of dimensionality is present for probably the most natural class, which is the unit ball of $k$ times continuously differentiable functions,
$$F_d^k = \{ f \in C^k(\mathbb{R}^d) \mid \|D^\alpha f\|_\infty \le 1 \text{ for all } |\alpha| \le k \},$$
where $k \in \mathbb{N}$.

Theorem 11 ([60]) The curse of dimensionality holds for the classes $F_d^k$ with the super-exponential lower bound
$$n(\varepsilon, F_d^k) \ge c_k (1 - \varepsilon)\, d^{\,d/(2k+3)} \quad \text{for all } d \in \mathbb{N} \text{ and } \varepsilon \in (0,1),$$
where $c_k > 0$ depends only on $k$.
Remark 11 In [60, 61] we also prove that the curse of dimensionality holds for even smaller classes of functions $F_d$ for which the norms of arbitrary directional derivatives are bounded proportionally to $1/\sqrt{d}$.
We start with the fooling function
$$f_0(x) = \min\left\{ 1,\ \frac{1}{\sqrt{d}}\, \mathrm{dist}(x, P_\varepsilon) \right\} \quad \text{for all } x \in \mathbb{R}^d,$$
where
$$P_\varepsilon = \bigcup_{i=1}^{n} B_d^\varepsilon(x_i)$$
and $B_d^\varepsilon(x_i)$ is the ball with center $x_i$ and radius $\varepsilon \sqrt{d}$. The function $f_0$ is Lipschitz. By a suitable smoothing via convolution we construct a smooth fooling function $f_k \in F_d^k$ with $f_k|_{P_0} = 0$.
Important elements of the proof are volume estimates (in the spirit of Elekes [38] and Dyer, Füredi and McDiarmid [37]), since we need that the volume of a neighborhood of the convex hull of $n$ arbitrary points is exponentially small in $d$.
Also classes of $C^\infty$-functions were studied recently. We still do not know whether the integration problem suffers from the curse of dimensionality for the classes
$$F_d = \{ f: [0,1]^d \to \mathbb{R} \mid \|D^\alpha f\|_\infty \le 1 \text{ for all } \alpha \in \mathbb{N}_0^d \};$$
this is Open Problem 2 from [87]. We know from Vybíral [119] and [61] that the curse is present for somewhat larger spaces and that weak tractability holds for smaller classes; this can be proved with the Smolyak algorithm, see [62].
We now consider univariate oscillatory integrals for the standard Sobolev spaces $H^s$ of periodic and non-periodic functions with an arbitrary integer $s \ge 1$. We study the approximate computation of Fourier coefficients
$$I_k(f) = \int_0^1 f(x)\, e^{-2\pi i k x}\, dx, \qquad i = \sqrt{-1},$$
where $k \in \mathbb{Z}$ and $f \in H^s$. There are several recent papers about the approximate computation of highly oscillatory univariate integrals with the weight $\exp(2\pi i k x)$, where $x \in [0,1]$ and $k$ is an integer (or $k \in \mathbb{R}$) which is assumed to be large in absolute value; see Huybrechs and Olver [64] for a survey. We study the Sobolev space $H^s$ for a finite $s \in \mathbb{N}$, i.e.,
$$H^s = \{ f: [0,1] \to \mathbb{C} \mid f^{(s-1)} \text{ is abs. cont.},\ f^{(s)} \in L_2 \} \qquad (12)$$
with the inner product
$$\langle f, g \rangle_s = \sum_{\ell=0}^{s-1} \int_0^1 f^{(\ell)}(x)\, dx\ \overline{\int_0^1 g^{(\ell)}(x)\, dx} + \bigl\langle f^{(s)}, g^{(s)} \bigr\rangle_0, \qquad (13)$$
where $\langle f, g \rangle_0 = \int_0^1 f(x)\, \overline{g(x)}\, dx$, and norm $\| f \|_{H^s} = \langle f, f \rangle_s^{1/2}$.
For the periodic case, an algorithm that uses $n$ function values at equally spaced points is nearly optimal, and its worst case error is bounded by $C_s (n + |k|)^{-s}$ with $C_s$ exponentially small in $s$. For the non-periodic case, we first compute successive derivatives up to order $s - 1$ at the end-points $x = 0$ and $x = 1$. These derivative values are used to periodize the function, and this allows us to obtain error bounds similar to those of the periodic case. Asymptotically in $n$, the worst case error of the algorithm is of order $n^{-s}$, independently of $k$, for both the periodic and the non-periodic case.
Theorem 12 ([91]) Consider the integration problem $I_k$ defined over the space $H^s$ of non-periodic functions with $s \in \mathbb{N}$. Then
$$\frac{c_s}{(n + |k|)^s} \le e(n, k, H^s) \le \left( \frac{3}{2} \right)^{s} \frac{2}{(n + |k| - 2s + 1)^s}.$$
References
1. Aistleitner, Ch.: Covering numbers, dyadic chaining and discrepancy. J. Complex. 27, 531–540 (2011)
2. Aistleitner, Ch., Hofer, M.: Probabilistic discrepancy bounds for Monte Carlo point sets. Math. Comput. 83, 1373–1381 (2014)
3. Babenko, V.F.: Asymptotically sharp bounds for the remainder for the best quadrature formulas for several classes of functions. Math. Notes 19(3), 187–193 (1976)
4. Babenko, V.F.: Exact asymptotics of the error of weighted cubature formulas optimal for certain classes of functions. Math. Notes 20(4), 887–890 (1976)
5. Bakhvalov, N.S.: On the approximate calculation of multiple integrals. Vestnik MGU, Ser. Math. Mech. Astron. Phys. Chem. 4, 3–18 (1959, in Russian). English translation: J. Complex. 31, 502–516 (2015)
6. Bakhvalov, N.S.: On the optimality of linear methods for operator approximation in convex classes of functions. USSR Comput. Math. Math. Phys. 11, 244–249 (1971)
7. Baldeaux, J., Gnewuch, M.: Optimal randomized multilevel algorithms for infinite-dimensional integration on function spaces with ANOVA-type decomposition. SIAM J. Numer. Anal. 52, 1128–1155 (2014)
8. Baldeaux, J., Dick, J., Leobacher, G., Nuyens, D., Pillichshammer, F.: Efficient calculation of the worst-case error and (fast) component-by-component construction of higher order polynomial lattice rules. Numer. Algorithms 59, 403–431 (2012)
9. Bungartz, H.-J., Griebel, M.: Sparse grids. Acta Numer. 13, 147–269 (2004)
10. Bykovskii, V.A.: On the correct order of the error of optimal cubature formulas in spaces with dominant derivative, and on quadratic deviations of grids. Akad. Sci. USSR, Vladivostok, Computing Center Far-Eastern Scientific Center (preprint, 1985)
11. Chen, W.W.L., Skriganov, M.M.: Explicit constructions in the classical mean squares problem in irregularities of point distribution. J. für Reine und Angewandte Mathematik (Crelle) 545, 67–95 (2002)
12. Chernaya, E.V.: Asymptotically exact estimation of the error of weighted cubature formulas optimal in some classes of continuous functions. Ukr. Math. J. 47(10), 1606–1618 (1995)
13. Clancy, N., Ding, Y., Hamilton, C., Hickernell, F.J., Zhang, Y.: The cost of deterministic, adaptive, automatic algorithms: cones, not balls. J. Complex. 30, 21–45 (2014)
14. Creutzig, J., Wojtaszczyk, P.: Linear vs. nonlinear algorithms for linear problems. J. Complex. 20, 807–820 (2004)
15. Creutzig, J., Dereich, S., Müller-Gronbach, Th., Ritter, K.: Infinite-dimensional quadrature and approximation of distributions. Found. Comput. Math. 9, 391–429 (2009)
16. Daun, T., Heinrich, S.: Complexity of Banach space valued and parametric integration. In: Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 297–316. Springer (2013)
17. Daun, T., Heinrich, S.: Complexity of parametric integration in various smoothness classes. J. Complex. 30, 750–766 (2014)
18. Dereich, S., Müller-Gronbach, Th.: Quadrature for self-affine distributions on R^d. Found. Comput. Math. 15, 1465–1500 (2015)
19. Dick, J.: A note on the existence of sequences with small star discrepancy. J. Complex. 23, 649–652 (2007)
20. Dick, J.: Numerical integration of Hölder continuous, absolutely convergent Fourier-, Fourier cosine-, and Walsh series. J. Approx. Theory 183, 14–30 (2014)
21. Dick, J., Gnewuch, M.: Optimal randomized changing dimension algorithms for infinite-dimensional integration on function spaces with ANOVA-type decomposition. J. Approx. Theory 184, 111–145 (2014)
22. Dick, J., Gnewuch, M.: Infinite-dimensional integration in weighted Hilbert spaces: anchored decompositions, optimal deterministic algorithms, and higher order convergence. Found. Comput. Math. 14, 1027–1077 (2014)
23. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
24. Dick, J., Pillichshammer, F.: Discrepancy theory and quasi-Monte Carlo integration. In: Chen, W., Srivastav, A., Travaglini, G. (eds.) A Panorama of Discrepancy Theory. Lecture Notes in Mathematics 2107, pp. 539–619. Springer (2014)
25. Dick, J., Pillichshammer, F.: The weighted star discrepancy of Korobov's p-sets. Proc. Am. Math. Soc. 143, 5043–5057 (2015)
26. Dick, J., Sloan, I.H., Wang, X., Wozniakowski, H.: Liberating the weights. J. Complex. 20, 593–623 (2004)
27. Dick, J., Sloan, I.H., Wang, X., Wozniakowski, H.: Good lattice rules in weighted Korobov spaces with general weights. Numer. Math. 103, 63–97 (2006)
28. Dick, J., Larcher, G., Pillichshammer, F., Wozniakowski, H.: Exponential convergence and tractability of multivariate integration for Korobov spaces. Math. Comput. 80, 905–930 (2011)
29. Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013)
30. Doerr, B.: A lower bound for the discrepancy of a random point set. J. Complex. 30, 16–20 (2014)
31. Doerr, B., Gnewuch, M.: Construction of low-discrepancy point sets of small size by bracketing covers and dependent randomized rounding. In: Keller, A., Heinrich, S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006, pp. 299–312. Springer (2008)
32. Doerr, B., Gnewuch, M., Kritzer, P., Pillichshammer, F.: Component-by-component construction of low-discrepancy point sets of small size. Monte Carlo Methods Appl. 14, 129–149 (2008)
33. Doerr, B., Gnewuch, M., Wahlström, M.: Algorithmic construction of low-discrepancy point sets via dependent randomized rounding. J. Complex. 26, 490–507 (2010)
34. Doerr, C., Gnewuch, M., Wahlström, M.: Calculation of discrepancy measures and applications. In: Chen, W.W.L., Srivastav, A., Travaglini, G. (eds.) A Panorama of Discrepancy Theory. Lecture Notes in Mathematics 2107, pp. 621–678. Springer (2014)
35. Dubinin, V.V.: Cubature formulas for Besov classes. Izvestiya Mathematics 61(2), 259–283 (1997)
36. Dung, D., Ullrich, T.: Lower bounds for the integration error for multivariate functions with mixed smoothness and optimal Fibonacci cubature for functions on the square. Mathematische Nachrichten 288, 743–762 (2015)
37. Dyer, M.E., Füredi, Z., McDiarmid, C.: Random volumes in the n-cube. DIMACS Ser. Discret. Math. Theor. Comput. Sci. 1, 33–38 (1990)
38. Elekes, G.: A geometric inequality and the complexity of computing volume. Discret. Comput. Geom. 1, 289–292 (1986)
39. Frolov, K.K.: Upper bounds on the error of quadrature formulas on classes of functions. Doklady Akademy Nauk USSR 231, 818–821 (1976). English translation: Soviet Mathematics Doklady 17, 1665–1669 (1976)
40. Frolov, K.K.: Upper bounds on the discrepancy in L_p, 2 <= p < infinity. Doklady Akademy Nauk USSR 252, 805–807 (1980). English translation: Soviet Mathematics Doklady 18(1), 37–41 (1977)
41. Gnewuch, M.: Infinite-dimensional integration on weighted Hilbert spaces. Math. Comput. 81, 2175–2205 (2012)
42. Gnewuch, M.: Entropy, randomization, derandomization, and discrepancy. In: Plaskota, L., Wozniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp. 43–78. Springer (2012)
43. Gnewuch, M.: Lower error bounds for randomized multilevel and changing dimension algorithms. In: Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 399–415. Springer (2013)
44. Gnewuch, M., Mayer, S., Ritter, K.: On weighted Hilbert spaces and integration of functions of infinitely many variables. J. Complex. 30, 29–47 (2014)
45. Heinrich, S.: Lower bounds for the complexity of Monte Carlo function approximation. J. Complex. 8, 277–300 (1992)
46. Heinrich, S.: Random approximation in numerical analysis. In: Bierstedt, K.D., et al. (eds.) Functional Analysis, pp. 123–171. Dekker (1994)
47. Heinrich, S.: Complexity of Monte Carlo algorithms. In: The Mathematics of Numerical Analysis, Lectures in Applied Mathematics 32, AMS-SIAM Summer Seminar, pp. 405–419. Park City, American Mathematical Society (1996)
48. Heinrich, S.: Quantum summation with an application to integration. J. Complex. 18, 1–50 (2001)
49. Heinrich, S.: Quantum integration in Sobolev spaces. J. Complex. 19, 19–42 (2003)
50. Heinrich, S., Novak, E.: Optimal summation and integration by deterministic, randomized, and quantum algorithms. In: Fang, K.-T., Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2000, pp. 50–62. Springer (2002)
51. Heinrich, S., Novak, E., Wasilkowski, G.W., Wozniakowski, H.: The inverse of the star-discrepancy depends linearly on the dimension. Acta Arithmetica 96, 279–302 (2001)
52. Heinrich, S., Novak, E., Pfeiffer, H.: How many random bits do we need for Monte Carlo integration? In: Niederreiter, H. (ed.) Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 27–49. Springer (2004)
53. Hickernell, F.J., Wozniakowski, H.: Integration and approximation in arbitrary dimension. Adv. Comput. Math. 12, 25–58 (2000)
54. Hickernell, F.J., Wozniakowski, H.: Tractability of multivariate integration for periodic functions. J. Complex. 17, 660–682 (2001)
55. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On strong tractability of weighted multivariate integration. Math. Comput. 73, 1903–1911 (2004)
56. Hickernell, F.J., Müller-Gronbach, Th., Niu, B., Ritter, K.: Multi-level Monte Carlo algorithms for infinite-dimensional integration on R^N. J. Complex. 26, 229–254 (2010)
57. Hinrichs, A.: Covering numbers, Vapnik–Červonenkis classes and bounds for the star discrepancy. J. Complex. 20, 477–483 (2004)
58. Hinrichs, A.: Optimal importance sampling for the approximation of integrals. J. Complex. 26, 125–134 (2010)
59. Hinrichs, A.: Discrepancy, integration and tractability. In: Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 129–172. Springer (2013)
60. Hinrichs, A., Novak, E., Ullrich, M., Wozniakowski, H.: The curse of dimensionality for numerical integration of smooth functions. Math. Comput. 83, 2853–2863 (2014)
61. Hinrichs, A., Novak, E., Ullrich, M., Wozniakowski, H.: The curse of dimensionality for numerical integration of smooth functions II. J. Complex. 30, 117–143 (2014)
62. Hinrichs, A., Novak, E., Ullrich, M.: On weak tractability of the Clenshaw–Curtis Smolyak algorithm. J. Approx. Theory 183, 31–44 (2014)
63. Hinrichs, A., Markhasin, L., Oettershagen, J., Ullrich, T.: Optimal quasi-Monte Carlo rules on order 2 digital nets for the numerical integration of multivariate periodic functions. Numer. Math. 134, (2015)
64. Huybrechs, D., Olver, S.: Highly oscillatory quadrature. Lond. Math. Soc. Lect. Note Ser. 366, 25–50 (2009)
65. Krieg, D., Novak, E.: A universal algorithm for multivariate integration. Found. Comput. Math. Available at arXiv:1507.06853 [math.NA]
66. Kritzer, P., Pillichshammer, F., Wozniakowski, H.: Multivariate integration of infinitely many times differentiable functions in weighted Korobov spaces. Math. Comput. 83, 1189–1206 (2014)
67. Kritzer, P., Pillichshammer, F., Wozniakowski, H.: Tractability of multivariate analytic problems. In: Uniform Distribution and Quasi-Monte Carlo Methods, pp. 147–170. De Gruyter (2014)
68. Kuo, F.Y.: Component-by-component constructions achieve the optimal rate of convergence for multivariate integration in weighted Korobov and Sobolev spaces. J. Complex. 19, 301–320 (2003)
69. Kuo, F.Y., Wasilkowski, G.W., Waterhouse, B.J.: Randomly shifted lattice rules for unbounded integrands. J. Complex. 22, 630–651 (2006)
70. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Wozniakowski, H.: Liberating the dimension. J. Complex. 26, 422–454 (2010)
71. Leobacher, G., Pillichshammer, F.: Introduction to Quasi-Monte Carlo Integration and Applications. Springer, Berlin (2014)
72. Mathé, P.: The optimal error of Monte Carlo integration. J. Complex. 11, 394–415 (1995)
73. Müller-Gronbach, Th., Novak, E., Ritter, K.: Monte-Carlo-Algorithmen. Springer, Berlin (2012)
74. Nguyen, V.K., Ullrich, M., Ullrich, T.: Change of variable in spaces of mixed smoothness and numerical integration of multivariate functions on the unit cube (in preparation)
75. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM (1992)
76. Niu, B., Hickernell, F., Müller-Gronbach, Th., Ritter, K.: Deterministic multi-level algorithms for infinite-dimensional integration on R^N. J. Complex. 27, 331–351 (2011)
77. Novak, E.: On the power of adaption. J. Complex. 12, 199–237 (1996)
78. Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis. Lecture Notes in Mathematics 1349. Springer, Berlin (1988)
79. Novak, E.: Quantum complexity of integration. J. Complex. 17, 2–16 (2001)
80. Novak, E., Ritter, K.: High dimensional integration of smooth functions over cubes. Numer. Math. 75, 79–97 (1996)
81. Novak, E., Ritter, K.: The curse of dimension and a universal method for numerical integration. In: Nürnberger, G., Schmidt, J.W., Walz, G. (eds.) Multivariate Approximation and Splines, vol. 125, pp. 177–188. ISNM, Birkhäuser (1997)
82. Novak, E., Ritter, K.: Simple cubature formulas with high polynomial exactness. Constr. Approx. 15, 499–522 (1999)
83. Novak, E., Rudolf, D.: Computation of expectations by Markov chain Monte Carlo methods. In: Dahlke, S., et al. (eds.) Extraction of Quantifiable Information from Complex Systems. Springer, Berlin (2014)
84. Novak, E., Sloan, I.H., Wozniakowski, H.: Tractability of tensor product linear operators. J. Complex. 13, 387–418 (1997)
85. Novak, E., Triebel, H.: Function spaces in Lipschitz domains and optimal rates of convergence for sampling. Constr. Approx. 23, 325–350 (2006)
86. Novak, E., Wozniakowski, H.: Intractability results for integration and discrepancy. J. Complex. 17, 388–441 (2001)
87. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems, vol. I: Linear Information. European Mathematical Society (2008)
88. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems, vol. II: Standard Information for Functionals. European Mathematical Society (2010)
89. Novak, E., Wozniakowski, H.: Lower bounds on the complexity for linear functionals in the randomized setting. J. Complex. 27, 1–22 (2011)
90. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems, vol. III: Standard Information for Operators. European Mathematical Society (2012)
91. Novak, E., Ullrich, M., Wozniakowski, H.: Complexity of oscillatory integration for univariate Sobolev spaces. J. Complex. 31, 15–41 (2015)
92. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1 lattice rules in shift invariant reproducing kernel Hilbert spaces. Math. Comput. 75, 903–920 (2006)
93. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1 lattice rules with a non-prime number of points. J. Complex. 22, 4–28 (2006)
94. Plaskota, L., Wasilkowski, G.W.: The power of adaptive algorithms for functions with singularities. J. Fixed Point Theory Appl. 6, 227–248 (2009)
95. Plaskota, L., Wasilkowski, G.W.: Tractability of infinite-dimensional integration in the worst case and randomized settings. J. Complex. 27, 505–518 (2011)
96. Roth, K.F.: On irregularities of distributions. Mathematika 1, 73–79 (1954)
97. Roth, K.F.: On irregularities of distributions IV. Acta Arithmetica 37, 67–75 (1980)
98. Rudolf, D.: Explicit error bounds for Markov chain Monte Carlo. Dissertationes Mathematicae 485 (2012)
99. Sickel, W., Ullrich, T.: Smolyak's algorithm, sampling on sparse grids and function spaces of dominating mixed smoothness. East J. Approx. 13, 387–425 (2007)
100. Sickel, W., Ullrich, T.: Spline interpolation on sparse grids. Appl. Anal. 90, 337–383 (2011)
101. Skriganov, M.M.: Constructions of uniform distributions in terms of geometry of numbers. St. Petersburg Math. J. 6, 635–664 (1995)
102. Sloan, I.H., Reztsov, A.V.: Component-by-component construction of good lattice rules. Math. Comput. 71, 263–273 (2002)
103. Sloan, I.H., Wozniakowski, H.: When are quasi-Monte Carlo algorithms efficient for high dimensional integrals? J. Complex. 14, 1–33 (1998)
104. Sloan, I.H., Kuo, F.Y., Joe, S.: On the step-by-step construction of quasi-Monte Carlo integration rules that achieve strong tractability error bounds in weighted Sobolev spaces. Math. Comput. 71, 1609–1640 (2002)
105. Sloan, I.H., Wang, X., Wozniakowski, H.: Finite-order weights imply tractability of multivariate integration. J. Complex. 20, 46–74 (2004)
106. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Doklady Akademy Nauk SSSR 4, 240–243 (1963)
107. Sukharev, A.G.: Optimal numerical integration formulas for some classes of functions. Sov. Math. Dokl. 20, 472–475 (1979)
108. Temlyakov, V.N.: Approximate recovery of periodic functions of several variables. Mathematics USSR Sbornik 56, 249–261 (1987)
109. Temlyakov, V.N.: On a way of obtaining lower estimates for the error of quadrature formulas. Math. USSR Sb. 181, 1403–1413 (1990, in Russian). English translation: Mathematics USSR Sbornik 71, 247–257 (1992)
110. Temlyakov, V.N.: On approximate recovery of functions with bounded mixed derivative. J. Complex. 9, 41–59 (1993)
111. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex. 19, 352–391 (2003)
112. Traub, J.F., Wozniakowski, H.: A General Theory of Optimal Algorithms. Academic Press, Cambridge (1980)
113. Traub, J.F., Wasilkowski, G.W., Wozniakowski, H.: Information-Based Complexity. Academic Press, Cambridge (1988)
114. Traub, J.F., Wozniakowski, H.: Path integration on a quantum computer. Q. Inf. Process. 1, 365–388 (2003)
115. Triebel, H.: Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration. European Mathematical Society (2010)
116. Ullrich, M.: On upper error bounds for quadrature formulas on function classes by K.K. Frolov. In: Cools, R., Nuyens, D. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2014, vol. 163, pp. 571–582. Springer, Heidelberg (2016)
117. Ullrich, M., Ullrich, T.: The role of Frolov's cubature formula for functions with bounded mixed derivative. SIAM J. Numer. Anal. 54(2), 969–993 (2016)
118. Vybíral, J.: Sampling numbers and function spaces. J. Complex. 23, 773–792 (2007)
119. Vybíral, J.: Weak and quasi-polynomial tractability of approximation of infinitely differentiable functions. J. Complex. 30, 48–55 (2014)
120. Wasilkowski, G.W.: Average case tractability of approximating infinity-variate functions. Math. Comput. 83, 1319–1336 (2014)
121. Wasilkowski, G.W., Wozniakowski, H.: Explicit cost bounds of algorithms for multivariate tensor product problems. J. Complex. 11, 1–56 (1995)
122. Wasilkowski, G.W., Wozniakowski, H.: On tractability of path integration. J. Math. Phys. 37, 2071–2088 (1996)
123. Wasilkowski, G.W., Wozniakowski, H.: Weighted tensor-product algorithms for linear multivariate problems. J. Complex. 15, 402–447 (1999)
124. Zho Newn, M., Sharygin, I.F.: Optimal cubature formulas in the classes D_2^{1,c} and D_2^{1,l_1}. In: Problems of Numerical and Applied Mathematics, pp. 22–27. Institute of Cybernetics, Uzbek Academy of Sciences (1991, in Russian)
Approximate Bayesian Computation: A Survey on Recent Results
Christian P. Robert
1 ABC Basics
"Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset." (D. Rubin, 1984)
Rubin's accept-reject representation returns as an accepted value a draw generated exactly from the posterior distribution, $\pi(\theta \mid x_0)$. When compared with Rubin's representation, ABC produces an approximate solution, replacing the above acceptance step with the tolerance condition
$$d(x, x_0) < \varepsilon$$
in order to handle both continuous and large finite sampling spaces $X$, but this early occurrence in [56] is definitely worth signalling. It is also relevant that Rubin does not promote this simulation method in situations where the likelihood is not available, but rather as an intuitive way to understand posterior distributions from a frequentist perspective, because $\theta$'s from the posterior are those that could have generated the observed data. (The issue of the zero probability of the exact equality between simulated and observed data is not addressed in Rubin's paper, maybe because the notion of a match between simulated and observed data is not clearly defined.) Another just as early occurrence of an ABC-like algorithm was proposed by [19].
Algorithm 2 ABC (basic version)
Given an observation $x_0$
for t = 1 to N do
  repeat
    Generate $\theta'$ from the prior $\pi(\theta)$
    Generate $x'$ from the model $f(\cdot \mid \theta')$
    Compute the distance $\rho(x_0, x')$
    Accept $\theta'$ if $\rho(x_0, x') < \varepsilon$
  until acceptance
end for
return the N accepted values of $\theta'$
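A compact Python rendering of Algorithm 2 on a toy Gaussian model (the model, prior, distance and all names are illustrative choices, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(4)

def abc_basic(x0, prior_sample, model_sample, dist, eps, N):
    """Basic ABC rejection sampler (Algorithm 2): keep a proposed theta'
    whenever the data simulated from it lands within eps of x0."""
    accepted = []
    while len(accepted) < N:
        theta = prior_sample()
        x = model_sample(theta)
        if dist(x0, x) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: N(theta, 1) data, N(0, 10^2) prior, |mean difference| distance.
x0 = rng.normal(2.0, 1.0, size=50)
draws = abc_basic(
    x0,
    prior_sample=lambda: rng.normal(0.0, 10.0),
    model_sample=lambda th: rng.normal(th, 1.0, size=50),
    dist=lambda a, b: abs(a.mean() - b.mean()),
    eps=0.1,
    N=500,
)
```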
In the time series (toy) example of [39], the signal-to-noise ratio produced by selecting $\theta$'s such that $\rho(x_0, x') < \varepsilon$ falls dramatically as the dimension of $x_0$ increases for a fixed value of $\varepsilon$. This means a corresponding increase in either the total number of simulations $N$ or in the tolerance $\varepsilon$ is required to preserve a positive acceptance rate. In practice, it is thus paramount to first summarise the data in a so-called summary statistic before computing a proximity index. Thus enters the notion of summary statistics that is central to operational ABC algorithms, as well as the subject of much debate, as discussed in [12, 39] and below. A more realistic version of the ABC algorithm is produced in Algorithm 3, where $S(\cdot)$ denotes the summary statistic.
Algorithm 3 ABC (version with summary)
Given an observation $x_0$
for t = 1 to N do
  Generate $\theta^{(t)}$ from the prior $\pi(\theta)$
  Generate $x^{(t)}$ from the model $f(\cdot \mid \theta^{(t)})$
  Compute $d_t = \rho(S(x_0), S(x^{(t)}))$
end for
Order the distances $d_{(1)} \le d_{(2)} \le \ldots \le d_{(N)}$
return the values $\theta^{(t)}$ associated with the $k$ smallest distances
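Algorithm 3 in the same sketch style (again with illustrative names; `summary` and the selection of the $k$ smallest distances follow the pseudo-code above):

```python
import numpy as np

rng = np.random.default_rng(5)

def abc_knn(x0, prior_sample, model_sample, summary, dist, N, k):
    """ABC with a summary statistic (Algorithm 3): simulate N parameter
    and data pairs, keep the k draws whose summaries are nearest to S(x0)."""
    s0 = summary(x0)
    thetas, ds = [], []
    for _ in range(N):
        theta = prior_sample()
        ds.append(dist(s0, summary(model_sample(theta))))
        thetas.append(theta)
    keep = np.argsort(ds)[:k]                 # k smallest distances
    return np.array(thetas)[keep]

# Same toy Gaussian setting as above, with the sample mean as summary.
x0 = rng.normal(2.0, 1.0, size=50)
draws = abc_knn(
    x0,
    prior_sample=lambda: rng.normal(0.0, 10.0),
    model_sample=lambda th: rng.normal(th, 1.0, size=50),
    summary=lambda x: x.mean(),
    dist=lambda a, b: abs(a - b),
    N=20_000,
    k=500,
)
```

Note that the tolerance is now implicit: $\varepsilon$ is the $k/N$ quantile of the simulated distances, which matches the practice described below.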
For a general introduction to ABC methods, I refer the reader to our earlier survey [39] and to [60], the latter constituting the original version of the Wikipedia page on ABC [69], first published in PLoS One. The presentation made in that page is comprehensive and correct, rightly putting stress on the most important aspects of the method. The authors also include the proper level of warning about the need to assess the assumptions behind, and calibrations of, the method. For concision's sake, I will not cover here recent computational advances, like those linked with sequential Monte Carlo [4, 65] and the introduction of Gaussian processes in the approximation [72].
An important question that arises in the wake of defining this approximate algorithm is whether or not it constitutes a valid approximation to the posterior distribution $\pi(\theta \mid S(y_0))$, if not of the original $\pi(\theta \mid y_0)$. (This is what we will call consistency of the ABC algorithm in the following section, meaning that the Monte Carlo approximation provided by the algorithm converges to the posterior when the number of simulations grows to infinity. The more standard notion of statistical consistency will also be invoked to justify the approximation.) In case it does not converge to the posterior, a subsequent question is whether or not the ABC output constitutes a proper form of Bayesian inference. Answers to the latter vary according to one's perspective:
- asymptotically, an infinite computing power allows for a zero tolerance, hence for a proper posterior conditioning on $S(y_0)$;
- the outcome of Algorithm 3 is an exact posterior distribution when assuming an error-in-variable model with scale $\varepsilon$ [70];
- ...and $\pi(\theta \mid d(Y, y_0) \le \varepsilon)$ may exhibit the opposite feature. (In the above, we introduce the tolerance $\varepsilon_S$ to stress the dependence of the choice of the tolerance on the summary statistics.) A related difficulty with ABC is that the approximation error (of using $\pi(\theta \mid d(S(Y), S(y_0)) \le \varepsilon_S)$ instead of $\pi(\theta \mid S(y_0))$ or the original $\pi(\theta \mid y_0)$) is unknown unless one is ready to run costly simulations.
2 ABC Consistency
ABC was first treated with suspicion by the mainstream statistical community (as
well as some population geneticists, see the fierce debate between [63, 64] and [5, 8])
because it sounded like a rudimentary version of standard Monte Carlo methods like
MCMC algorithms [53]. However, the perspective later changed, due to representations of the ABC posterior distribution as (i) a genuine posterior distribution [71] and
of ABC as an auxiliary variable method [71], (ii) a non-parametric technique [10,
11], connected with both indirect inference [20] and k-nearest neighbour estimation
[9]. This array of interpretations helped to turn ABC into an acceptable (if not fully
accepted) component of Bayesian computational methods, albeit requiring caution
and calibration [69]. The following entries cover some of the advances made in the
statistical analysis of the method. While some of the earlier justifications are about
computational consistency, namely a converging approximation when the computing
power grows to infinity, the more recent analyses are mostly focused on statistical
consistency. This perspective shift signifies that ABC is increasingly considered as
an inference method per se.
with the way it is truly implemented. In practice, as in the DIYABC software [16], the tolerance bound $\epsilon$ is determined as in Algorithm 3: a quantile of the simulated distances, say the 10% or the 1% quantile, is chosen as $\epsilon$. This means in particular that the interpretation of $\epsilon$ as a non-parametric density estimation bandwidth, while interesting and prevalent in the literature (see, e.g., [10, 24]), is only an approximation of the actual practice.
The focus of [9] is on the mathematical foundations of this practice, an advance
obtained by (re)analysing ABC as a k-nearest neighbour (knn) method. Using generic
knn results, they derive a consistency property for the ABC algorithm by imposing
some constraints upon the rate of decrease of the quantile $k_N$ as a function of N. More specifically, provided
$$k_N/\log\log N \longrightarrow \infty \qquad\text{and}\qquad k_N/N \longrightarrow 0$$
when $N \to \infty$, for almost all $s_0$ (with respect to the distribution of $S(Y)$), with probability 1, convergence occurs, i.e.
$$\frac{1}{k_N}\sum_{j=1}^{k_N} \varphi(\theta_j) \longrightarrow \mathbb{E}\big[\varphi(\theta_j) \mid S = s_0\big].$$
The setting is restricted to the use of sufficient statistics or, equivalently, to a distance over the whole sample. The issue of summary statistics is not addressed in the
paper. The paper also contains a rigorous proof of the convergence of ABC when
the tolerance $\epsilon$ goes to zero. The mean integrated square error consistency of the
conditional kernel density estimate is established for a generic kernel (under usual
assumptions). Further assumptions (both on the target and on the kernel) allow the
authors to obtain precise convergence rates (as a power of the sample size), derived
from classical k-nearest neighbour regression, like
$$k_N \approx N^{(p+4)/(m+p+4)}$$
in dimensions m larger than 4 (where N is the simulation size). The paper [9] is theoretical and highly mathematical; however, this work clearly constitutes a major reference for the justification of ABC. In addition, it creates a link with machine-learning techniques, where ABC is still at an early stage of development.
In [17], an ABC scheme is derived in such a way that the ABC simulated sequence remains an HMM, the conditional distribution of the observables given the latent Markov chain being modified by the ABC acceptance ball. This means that conducting maximum likelihood (or Bayesian) estimation based on the ABC sample is equivalent to exact inference under the perturbed HMM scheme. In this sense, this equivalence also connects with the perspectives of [24, 71] on exact ABC. While the paper provides an asymptotic bias for a fixed value of the tolerance $\epsilon$, it also proves that an arbitrary level of accuracy can be attained with enough data and a small enough $\epsilon$. The authors of the paper show in addition (as in [24]) that ABC inference based on noisy observations $y_1 + \epsilon z_1, \ldots, y_n + \epsilon z_n$, with the same tolerance $\epsilon$, is equivalent to regular inference based on the original data $y_1, \ldots, y_n$, hence the consistency of Algorithm 4. Furthermore, the asymptotic variance of the ABC version is shown to always be larger than the asymptotic variance of the standard MLE, and to decrease as $\epsilon^2$. The paper also contains an illustration on an HMM with $\alpha$-stable observables. Notice that the restriction to summary statistics that preserve the HMM structure is paramount for the results in the paper to apply, hence prevents the use of truly summarising statistics that would not grow linearly in dimension with the size of the HMM series.
notion that simulating pseudo-data (à la ABC) with a known parameter value allows for a Monte Carlo evaluation of the genuine coverage of a credible interval, hence for a calibration of the tolerance $\epsilon$. The delicate issue is about the generation of these known parameters. For instance, if the pair $(\theta, y)$ is generated from the joint distribution made of prior times likelihood, and if the credible region is also based on the true posterior, the average coverage is the nominal one. On the other hand, if the credible interval is based on a poor (ABC) approximation to the posterior, the average coverage should differ from the nominal one. Given that ABC is only an approximation, however, this approach may fail to return a powerful diagnostic. In their implementation, the authors end up approximating the p-value $P(\theta_0 < \theta)$ and checking for uniformity.
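The idea can be rendered schematically as follows, reusing the abc_rejection sketch given after Algorithm 3; the normal model and prior are again hypothetical, and a Kolmogorov–Smirnov test stands in for whatever uniformity check one prefers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def coverage_pvalues(n_rep=200):
    """Schematic coverage check: draw a 'known' theta_0 from the prior,
    simulate pseudo-data, run ABC, and record the p-value P(theta_0 < theta)
    under the ABC output; for an exact posterior these are uniform on (0,1)."""
    pvals = []
    for _ in range(n_rep):
        theta0 = rng.normal(0.0, 10.0)               # "known" parameter
        y0 = rng.normal(theta0, 1.0, size=50)        # pseudo-observation
        draws = abc_rejection(                        # sketch given earlier
            y0,
            prior_sample=lambda: rng.normal(0.0, 10.0),
            simulate=lambda th: rng.normal(th, 1.0, size=50),
            summary=np.mean,
            distance=lambda s, s0: abs(s - s0),
            N=2000, k=50,
        )
        pvals.append(np.mean(theta0 < draws))
    return stats.kstest(pvals, "uniform")             # uniformity diagnostic

print(coverage_pvalues())
```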
where $y^0 = (y_1^0, \ldots, y_T^0)$ is the actual observation. The ABC approximation indeed retains the same likelihood structure and allows for derivations of consistency properties (in the number of observations) of the ABC estimates. In particular, using such a distance in the algorithm allows for the approximation to converge to the genuine posterior when the tolerance $\epsilon$ goes to zero [9]. This is the setting where [24] (see also [17]) show that noisy ABC is well-calibrated, i.e. has asymptotically proper convergence properties. Most of the results obtained by Jasra and co-authors are dedicated to specific classes of models, from iid models [17, 24, 31] to observation-driven time series [31] to other forms of HMMs [17, 21, 41], mostly for MLE consistency results. The constraint mentioned above leads to computational difficulties as the acceptance rate quickly decreases with n (unless the tolerance $\epsilon$ increases with n). The authors of [31] then suggest raising the number of pseudo-observations to average indicators in the above product, and making it random in order to ensure a fixed number of acceptances. Moving to ABC-SMC (for sequential Monte Carlo,
see [4] and Algorithm 5), [32] establish unbiasedness and convergence within this
framework, in connection with the alive particle filter [35].
Algorithm 5 ABC-SMC
Given an observation $x^0$, $0 < \alpha < 1$, and a proposal distribution $q_0(\theta)$
Set $\epsilon_0 = +\infty$ and $i = 0$
repeat
    for t = 1 to N do
        Generate $\theta_t$ from the proposal $q_i(\theta)$
        Generate $x$ from the model $f(\cdot \mid \theta_t)$
        Compute the distance $d_t = \rho(x, x^0)$ and the weight $\omega_t = \pi(\theta_t)/q_i(\theta_t)$
    end for
    Set $i = i + 1$
    Update $\epsilon_i$ as the weighted $\alpha$-quantile of the $d_t$'s and $q_i$ based on the weighted $\theta_t$'s
until $\epsilon$ is stable
return the N weighted values $(\theta_t, \omega_t)$
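A minimal Python rendering of Algorithm 5 might look as follows; the Gaussian proposal update fitted to the weighted particles is an assumption made for brevity (adaptive kernels as in [4] are the usual choice), and the stopping rule for "$\epsilon$ is stable" is equally illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_quantile(x, w, q):
    """q-quantile of samples x under normalized weights w."""
    order = np.argsort(x)
    cum = np.cumsum(w[order])
    return x[order][min(np.searchsorted(cum, q), len(x) - 1)]

def abc_smc(x0, prior_pdf, prior_sample, simulate, distance,
            N=1000, alpha=0.5, max_iter=20, rtol=1e-2):
    """Sketch of Algorithm 5 (ABC-SMC) for a scalar parameter."""
    propose, prop_pdf = prior_sample, prior_pdf   # q_0 = prior, weights = 1
    eps = np.inf
    for _ in range(max_iter):
        thetas = np.array([propose() for _ in range(N)])
        dists = np.array([distance(simulate(th), x0) for th in thetas])
        weights = np.array([prior_pdf(th) / prop_pdf(th) for th in thetas])
        weights /= weights.sum()
        new_eps = weighted_quantile(dists, weights, alpha)
        if np.isfinite(eps) and abs(new_eps - eps) < rtol * eps:
            break                                  # "until eps is stable"
        eps = new_eps
        # Fit q_{i+1} to the particles inside the new ball (one simple choice).
        keep = dists <= eps
        wk = weights[keep] / weights[keep].sum()
        m = np.sum(wk * thetas[keep])
        s = np.sqrt(np.sum(wk * (thetas[keep] - m) ** 2)) + 1e-12
        propose = lambda m=m, s=s: rng.normal(m, 2.0 * s)
        prop_pdf = lambda th, m=m, s=s: np.exp(
            -((th - m) ** 2) / (8.0 * s ** 2)) / (2.0 * s * np.sqrt(2.0 * np.pi))
    return thetas, weights, eps
```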
The empirical likelihood [44, 45] of a parameter $\theta$ given observations $x_1, \ldots, x_n$ and an estimating function h is defined as
$$L_{el}(\theta \mid x_1, \ldots, x_n) = \max \prod_{i=1}^{n} p_i,$$
where the maximum is taken over all p's on the simplex of $\mathbb{R}^n$ such that
$$\sum_{i=1}^{n} p_i\, h(x_i, \theta) = 0.$$
As such, empirical likelihood is a non-parametric approach in the sense that the distribution of the data does not need to be specified, only some of its characteristics.
Econometricians have developed this kind of approach over the years, see e.g. [26].
However, this empirical likelihood technique can also be seen as a convergent
approximation to the likelihood and hence able to be exploited for cases when the exact likelihood cannot be derived. For instance, [43] propose using it as a substitute for the exact likelihood in Bayes' formula, as sketched in Algorithm 6.
Algorithm 6 ABC (with empirical likelihood)
Given an observation $x^0$
for i = 1 to N do
    Generate $\theta_i$ from the prior $\pi(\cdot)$
    Set the weight $\omega_i = L_{el}(\theta_i \mid x^0)$
end for
return $(\theta_i, \omega_i)$, i = 1, …, N
Use the weighted sample as in importance sampling
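For the simplest case where $h(x, \theta) = x - \theta$ constrains the mean, the empirical likelihood has a one-dimensional Lagrangian dual and Algorithm 6 fits in a few lines. The sketch below is a toy instantiation under that assumed constraint, not the general BCel implementation of [43].

```python
import numpy as np
from scipy.optimize import brentq

def log_el(theta, x):
    """Log empirical likelihood of theta for the mean constraint
    h(x_i, theta) = x_i - theta: maximize sum(log p_i) over the simplex
    subject to sum p_i (x_i - theta) = 0, via its Lagrangian dual."""
    z = x - theta
    if z.min() >= 0.0 or z.max() <= 0.0:
        return -np.inf                    # theta outside the convex hull
    g = lambda lam: np.sum(z / (1.0 + lam * z))
    lam = brentq(g, -1.0 / z.max() + 1e-10, -1.0 / z.min() - 1e-10)
    p = 1.0 / (len(z) * (1.0 + lam * z))  # optimal simplex weights
    return np.sum(np.log(p))

def bcel(x0, prior_sample, N=5000):
    """Sketch of Algorithm 6: weight prior draws by empirical likelihood."""
    thetas = np.array([prior_sample() for _ in range(N)])
    logw = np.array([log_el(th, x0) for th in thetas])
    w = np.exp(logw - logw[np.isfinite(logw)].max())  # stabilised weights
    return thetas, w / w.sum()            # weighted sample, as in IS
```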
The paper also suggests a further regularisation of [6] by ridge regression, although an $L_1$ penalty à la Lasso would, in my opinion, be more appropriate for removing extraneous summary statistics. Unsurprisingly, ridge regression does better than plain regression in the comparison experiment when there are many almost collinear summary statistics, but an alternative conclusion could be that regression analysis is not that appropriate with many summary statistics. Indeed, summary statistics are not quantities of interest but data-summarising tools towards a better approximation of the posterior at a given computational cost.
when evaluated at the maximum composite likelihood at the observed data point. In the specific (but unrealistic) case of an exponential family, an ABC based on the score is asymptotically (i.e., as the tolerance $\epsilon$ goes to zero) exact. Working with a composite likelihood thus leads to a natural summary statistic. As with the empirical likelihood approach, the composite likelihoods that are available for computation are usually restricted in number, thus leading to an almost automated choice of a summary statistic.
An interesting (common) feature in most examples found in this paper is that
comparisons are made between ABC using the (truly) sufficient statistic and ABC
based on the pairwise score function, which essentially relies on the very same
statistics. So the difference, when there is a difference, pertains to the choice of
a different combination of the summary statistics or, somewhat equivalently, to the choice of a different distance function. One of the examples starts from the MA(2) toy example of [39]. The composite likelihood is then based on the consecutive triplet
marginal densities.
In a related vein, [40] offer a new perspective on ABC based on pseudo-scores. For one thing, the paper concentrates on the selection of summary statistics from a more econometric point of view than usual, defining asymptotic sufficiency in this context and demonstrating that both asymptotic sufficiency and Bayes consistency can be achieved when using maximum likelihood estimators of the parameters of an auxiliary model as summary statistics. In addition, the proximity to (asymptotic) sufficiency yielded by the MLE is replicated by the score vector. Using the score instead of the MLE as a summary statistic allows for huge gains in terms of speed. The method is then applied to a continuous time state space model, using as auxiliary
The method is then applied to a continuous time state space model, using as auxiliary
model an augmented unscented Kalman filter. The various state space models tested therein demonstrate that the ABC approach based on the marginal [likelihood] score performs quite well, including against the approach of [24]. It strongly supports the idea of using such a generic object as the unscented Kalman filter for state space models, even when it is not a particularly accurate representation of the true model. Another appealing feature is found in the connections made with indirect inference.
in an altogether different way, namely as a tool for assessing the goodness of fit of a given model. The fundamental idea is to process $\epsilon$ as an additional parameter of the model, simulating from a joint posterior distribution
$$f(\epsilon, \theta \mid x^0) \propto \xi(\epsilon \mid x^0, \theta)\, \pi(\theta)\, \pi(\epsilon),$$
where $x^0$ is the data and $\xi(\epsilon \mid x^0, \theta)$ plays the role of the likelihood. (The $\pi$'s are obviously the priors on $\theta$ and $\epsilon$.) In fact, $\xi(\epsilon \mid x^0, \theta)$ is the prior predictive density of $\rho(S(x), S(x^0))$ given $\theta$ and $x^0$ when x is distributed from $f(x \mid \theta)$. The authors then derive an ABC algorithm they call ABC$\mu$ to simulate an MCMC chain targeting this joint distribution, replacing $\xi(\epsilon \mid x^0, \theta)$ with a non-parametric kernel approximation. For each model under comparison, the marginal posterior distribution on the error $\epsilon$ is then used to assess the fit of the model, the logic being that this posterior should include 0 in a reasonable credible interval. (Contrary to other ABC papers, $\epsilon$ can be negative and multidimensional in this paper.)
Given the wealth of innovations contained in the paper, let me add here that, while the authors stress they use the data once (a point that has always been uncertain to me), they also define the above target by simultaneously using a prior distribution on $\epsilon$ and a conditional distribution on the same $\epsilon$, which they interpret as the likelihood in $(\theta, \epsilon)$. The product being most often defined as a density in $(\theta, \epsilon)$, it can be simulated from, but this is hardly a regular Bayesian problem, especially because it seems the prior on $\epsilon$ significantly contributes to the final assessment. Further and more developed criticisms are published as [55], along with a reply by the authors [50]. Let me stress one more time how original this paper is and deplore the lack of follow-up in the subsequent literature for a practical method that should be implemented in existing ABC software.
$$\frac{g_1(x)}{g_2(x)}\, B_{12}^S(x),$$
where the last term is the limiting ABC Bayes factor. Therefore, in the favourable
case of the existence of a sufficient statistic, using only the sufficient statistic induces
a difference in the result that fails to converge with the number of observations or
simulations. Quite the opposite, it may diverge one way or another as the number
of observations increases. Again, this is in the favourable case of sufficiency. In the
realistic setting of insufficient statistics, things deteriorate even further. This practical
situation implies a wider loss of information compared with the exact inferential
approach, hence a wider discrepancy between the exact Bayes factor and the quantity
produced by an ABC approximation. The paper is thus intended as a warning to the
community about the dangers of this approximation, especially when considering the
rapidly increasing number of applications using ABC for conducting model choice
and hypothesis testing.
This paper stresses a fundamental and even foundational distinction between ABC
point (and confidence) estimation, and ABC model choice, namely that the problem
stands at another level for Bayesian model choice (using posterior probabilities).
When doing point estimation with insufficient statistics, the information content is poorer, but unless one uses very degraded (i.e., ancillary) summary statistics, inference converges. The posterior distribution stays different from the true posterior in this case but, at least, increasing the number of observations brings more information about the parameter (and convergence when this number goes to infinity). For model
choice, this is not guaranteed if we use summary statistics that are not inter-model
sufficient, as shown by the Poisson and normal examples in [54]. Furthermore, except
for very specific cases such as Gibbs random fields [28], it is almost always impossible to derive inter-model sufficient statistics, beyond the raw sample. The paper
includes a realistic and computationally costly population genetic illustration, where
it exhibits a clear divergence in the numerical values of both approximations of the
posterior probabilities. The error rates in using the ABC approximation to choose
between two scenarios, labelled 1 and 2, are 14.5 and 12.5 % (under scenarios 1 and 2),
respectively.
A quite related if less pessimistic paper is [18], also concerned with the limiting
behaviour for the ratio,
$$B_{12}(x) = \frac{g_1(x)}{g_2(x)}\, B_{12}^S(x).$$
Indeed, the authors reach the opposite conclusion from ours, namely that the problem
can be solved by a sufficiency argument. Their point is that, when comparing models
within exponential families (which is the natural realm for sufficient statistics), it
is always possible to build an encompassing model with a sufficient statistic that
to the sum of the observables into a large sufficient statistic produces a ratio $g_1/g_2$ that is equal to 1, hence avoids any discrepancy. However, this encompassing property only applies to exponential families. Looking at what happens in the limiting case when one relies on a common sufficient statistic is a formal study that sheds no light on the (potentially huge) discrepancy between the ABC-based Bayes factor and the true Bayes factor in the typical case.
in practical cases. Nonetheless, this paper comes as a third if not last step in a series of papers on the issue of ABC model choice. Indeed, we first identified a sufficiency property [28], then realised that this property was a quite rare occurrence, and we finally made the theoretical advance in [38]. This last step characterises when a statistic is good enough to conduct model choice, with a clear answer that the ranges of the mean of the summary statistic under each model should not intersect. From a methodological point of view, only the conclusion should be taken into account, as it is then straightforward to come up with quick simulation devices to check whether a summary statistic behaves differently under both models, taking advantage of the reference table already available (instead of having to run Monte Carlo experiments with ABC steps). The paper [38] includes a $\chi^2$ check about the relevance of a given summary statistic.
In [59], the authors consider summary statistics for ABC model choice in hidden Gibbs random fields. The move to a hidden Markov random field means that the original approach of [28] does not apply: there is no dimension-reducing sufficient statistic in that case. The authors introduce a small collection of (four!) focused statistics to discriminate between Potts models. They further define a novel misclassification rate, conditional on the observed value and derived from the ABC reference table. It is the predictive error rate
$$P^{\mathrm{ABC}}\big(\widehat{m}(Y) \neq m \mid S(y^{\mathrm{obs}})\big),$$
integrating out both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observations, more precisely given the transform of the observations by the summary statistic S. In a simulation experiment, this paper shows that the predictive error rate significantly decreases by including 2 or 4 geometric summary statistics on top of the no-longer-sufficient concordance statistics.
6 Conclusion
This survey reflects upon the diversity and the many directions of progress in this field of ABC research. The overall message is that the ongoing research has led both to ABC being considered part of the statistical tool-kit and to the envisioning of different approaches to statistical modelling, where a complete representation of the whole world is no longer feasible. Over the evolution of ABC in the past fifteen years we have thus moved from approximate methods to approximate models, which is a positive move in my opinion.
Acknowledgments The author is most grateful to an anonymous referee for her or his help with
the syntax and grammar of this survey. He also thanks the organisers of MCQMC 2014 in Leuven
for their kind invitation.
References
1. Allingham, D., King, R., Mengersen, K.: Bayesian estimation of quantile distributions. Stat. Comput. 19, 189–201 (2009)
2. Andrieu, C., Roberts, G.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37(2), 697–725 (2009)
3. Beaumont, M.: Approximate Bayesian computation in evolution and ecology. Annu. Rev. Ecol. Evol. Syst. 41, 379–406 (2010)
4. Beaumont, M., Cornuet, J.-M., Marin, J.-M., Robert, C.: Adaptive approximate Bayesian computation. Biometrika 96(4), 983–990 (2009)
5. Beaumont, M., Nielsen, R., Robert, C., Hey, J., Gaggiotti, O., Knowles, L., Estoup, A., Mahesh, P., Corander, J., Hickerson, M., Sisson, S., Fagundes, N., Chikhi, L., Beerli, P., Vitalis, R., Cornuet, J.-M., Huelsenbeck, J., Foll, M., Yang, Z., Rousset, F., Balding, D., Excoffier, L.: In defense of model-based inference in phylogeography. Mol. Ecol. 19(3), 436–446 (2010)
6. Beaumont, M., Zhang, W., Balding, D.: Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002)
7. Belle, E., Benazzo, A., Ghirotto, S., Colonna, V., Barbujani, G.: Comparing models on the genealogical relationships among Neandertal, Cro-Magnoid and modern Europeans by serial coalescent simulations. Heredity 102(3), 218–225 (2008)
8. Berger, J., Fienberg, S., Raftery, A., Robert, C.: Incoherent phylogeographic inference. Proc. Natl. Acad. Sci. 107(41), E57 (2010)
9. Biau, G., Cérou, F., Guyader, A.: New insights into approximate Bayesian computation. Annales de l'IHP (Probab. Stat.) 51, 376–403 (2015)
10. Blum, M.: Approximate Bayesian computation: a non-parametric perspective. J. Am. Stat. Assoc. 105(491), 1178–1187 (2010)
11. Blum, M., François, O.: Non-linear regression models for approximate Bayesian computation. Stat. Comput. 20, 63–73 (2010)
12. Blum, M.G.B., Nunes, M.A., Prangle, D., Sisson, S.A.: A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28(2), 189–208 (2013)
13. Bollerslev, T., Chou, R., Kroner, K.: ARCH modeling in finance. A review of the theory and empirical evidence. J. Econom. 52, 5–59 (1992)
14. Calvet, C., Czellar, V.: Accurate methods for approximate Bayesian computation filtering. J. Econom. (2014, to appear)
15. Cornuet, J.-M., Ravigné, V., Estoup, A.: Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinform. 11, 401 (2010)
16. Cornuet, J.-M., Santos, F., Beaumont, M., Robert, C., Marin, J.-M., Balding, D., Guillemaud, T., Estoup, A.: Inferring population history with DIYABC: a user-friendly approach to approximate Bayesian computation. Bioinformatics 24(23), 2713–2719 (2008)
17. Dean, T., Singh, S., Jasra, A., Peters, G.: Parameter inference for hidden Markov models with intractable likelihoods. Scand. J. Stat. (2014, to appear)
18. Didelot, X., Everitt, R., Johansen, A., Lawson, D.: Likelihood-free estimation of model evidence. Bayesian Anal. 6, 48–76 (2011)
19. Diggle, P., Gratton, R.: Monte Carlo methods of inference for implicit statistical models. J. R. Stat. Soc. Ser. B 46, 193–227 (1984)
20. Drovandi, C., Pettitt, A., Faddy, M.: Approximate Bayesian computation using indirect inference. J. R. Stat. Soc. Ser. A 60(3), 503–524 (2011)
21. Ehrlich, E., Jasra, A., Kantas, N.: Gradient free parameter estimation for hidden Markov models with intractable likelihoods. Methodol. Comput. Appl. Probab. (2014, to appear)
22. Excoffier, L., Leuenberger, C., Wegmann, D.: Bayesian computation and model selection in population genetics (2009)
23. Fagundes, N., Ray, N., Beaumont, M., Neuenschwander, S., Salzano, F., Bonatto, S., Excoffier, L.: Statistical evaluation of alternative models of human evolution. Proc. Natl. Acad. Sci. 104(45), 17614–17619 (2007)
24. Fearnhead, P., Prangle, D.: Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Stat. Soc.: Ser. B (Stat. Method.) 74(3), 419–474 (2012). (With discussion)
25. Ghirotto, S., Mona, S., Benazzo, A., Paparazzo, F., Caramelli, D., Barbujani, G.: Inferring genealogical processes from patterns of bronze-age and modern DNA variation in Sardinia. Mol. Biol. Evol. 27(4), 875–886 (2010)
26. Gouriéroux, C., Monfort, A.: Simulation Based Econometric Methods. CORE Lecture Series. CORE, Louvain (1995)
27. Gouriéroux, C., Monfort, A., Renault, E.: Indirect inference. J. Appl. Econom. 8, 85–118 (1993)
28. Grelaud, A., Marin, J.-M., Robert, C., Rodolphe, F., Taly, J.-F.: Likelihood-free methods for model choice in Gibbs random fields. Bayesian Anal. 3(2), 427–442 (2009)
29. Guillemaud, T., Beaumont, M., Ciosi, M., Cornuet, J.-M., Estoup, A.: Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data. Heredity 104(1), 88–99 (2009)
30. Jasra, A.: Approximate Bayesian computation for a class of time series models. e-prints (2014)
31. Jasra, A., Kantas, N., Ehrlich, E.: Approximate inference for observation driven time series models with intractable likelihoods. TOMACS (2014, to appear)
32. Jasra, A., Lee, A., Yau, C., Zhang, X.: The alive particle filter. e-prints (2013)
33. Jasra, A., Singh, S., Martin, J., McCoy, E.: Filtering via approximate Bayesian computation. Stat. Comput. 22, 1223–1237 (2012)
34. Joyce, P., Marjoram, P.: Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 7(1), Article 26 (2008)
35. Le Gland, F., Oudjane, N.: A sequential particle algorithm that keeps the particle system alive. Lecture Notes in Control and Information Sciences, vol. 337, pp. 351–389. Springer, Berlin (2006)
36. Lehmann, E., Casella, G.: Theory of Point Estimation, revised edn. Springer, New York (1998)
37. Leuenberger, C., Wegmann, D.: Bayesian computation and model selection without likelihoods. Genetics 184(1), 243–252 (2010)
38. Marin, J., Pillai, N., Robert, C., Rousseau, J.: Relevant statistics for Bayesian model choice. J. R. Stat. Soc. Ser. B 76(5), 833–859 (2014)
39. Marin, J., Pudlo, P., Robert, C., Ryder, R.: Approximate Bayesian computational methods. Stat. Comput. 21(2), 279–291 (2011)
40. Martin, G.M., McCabe, B.P.M., Maneesoonthorn, W., Robert, C.P.: Approximate Bayesian computation in state space models. e-prints (2014)
41. Martin, J., Jasra, A., Singh, S., Whiteley, N., Del Moral, P., McCoy, E.: Approximate Bayesian computation for smoothing. Stoch. Anal. Appl. 32(3) (2014)
42. McKinley, T., Ross, J., Deardon, R., Cook, A.: Simulation-based Bayesian inference for epidemic models. Comput. Stat. Data Anal. 71, 434–447 (2014)
43. Mengersen, K., Pudlo, P., Robert, C.: Bayesian computation via empirical likelihood. Proc. Natl. Acad. Sci. 110(4), 1321–1326 (2013)
44. Owen, A.B.: Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249 (1988)
45. Owen, A.B.: Empirical Likelihood. Chapman & Hall, Boca Raton (2001)
46. Patin, E., Laval, G., Barreiro, L., Salas, A., Semino, O., Santachiara-Benerecetti, S., Kidd, K., Kidd, J., Van Der Veen, L., Hombert, J., et al.: Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set. PLoS Genet. 5(4), e1000448 (2009)
47. Prangle, D., Blum, M.G.B., Popovic, G., Sisson, S.A.: Diagnostic tools of approximate Bayesian computation using the coverage property. e-prints (2013)
48. Pritchard, J., Seielstad, M., Perez-Lezaun, A., Feldman, M.: Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 1791–1798 (1999)
49. Ramakrishnan, U., Hadly, E.: Using phylochronology to reveal cryptic population histories: review and synthesis of 29 ancient DNA studies. Mol. Ecol. 18(7), 1310–1330 (2009)
50. Ratmann, O., Andrieu, C., Wiuf, C., Richardson, S.: Reply to Robert et al.: Model criticism informs model choice and model comparison. Proc. Natl. Acad. Sci. 107(3), E6–E7 (2010)
51. Ratmann, O., Andrieu, C., Wiuf, C., Richardson, S.: Model criticism based on likelihood-free inference, with an application to protein network evolution. Proc. Natl. Acad. Sci. USA 106, 16 (2009)
52. Robert, C.: Discussion of "Constructing summary statistics for approximate Bayesian computation" by Fearnhead, P., Prangle, D. J. R. Stat. Soc. Ser. B 74(3), 447–448 (2012)
53. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
54. Robert, C., Cornuet, J.-M., Marin, J.-M., Pillai, N.: Lack of confidence in ABC model choice. Proc. Natl. Acad. Sci. 108(37), 15112–15117 (2011)
55. Robert, C., Mengersen, K., Chen, C.: Model choice versus model criticism. Proc. Natl. Acad. Sci. 107(3), E5 (2010)
56. Rubin, D.: Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 12, 1151–1172 (1984)
57. Ruli, E., Sartori, N., Ventura, L.: Approximate Bayesian computation with composite score functions. e-prints (2013)
58. Stephens, M., Donnelly, P.: Inference in molecular population genetics. J. R. Stat. Soc.: Ser. B (Stat. Method.) 62(4), 605–635 (2000)
59. Stoehr, J., Pudlo, P., Cucala, L.: Adaptive ABC model choice and geometric summary statistics for hidden Gibbs random fields. Stat. Comput., pp. 1–13 (2014)
60. Sunnåker, M., Busetto, A., Numminen, E., Corander, J., Foll, M., Dessimoz, C.: Approximate Bayesian computation. PLoS Comput. Biol. 9(1), e1002803 (2013)
61. Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550 (1987)
62. Tavaré, S., Balding, D., Griffiths, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145, 505–518 (1997)
63. Templeton, A.: Statistical hypothesis testing in intraspecific phylogeography: nested clade phylogeographical analysis vs. approximate Bayesian computation. Mol. Ecol. 18(2), 319–331 (2008)
64. Templeton, A.: Coherent and incoherent inference in phylogeography and human evolution. Proc. Natl. Acad. Sci. 107(14), 6376–6381 (2010)
65. Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6(31), 187–202 (2009)
66. van der Vaart, A.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
67. Verdu, P., Austerlitz, F., Estoup, A., Vitalis, R., Georges, M., Théry, S., Froment, A., Le Bomin, S., Gessain, A., Hombert, J.-M., Van der Veen, L., Quintana-Murci, L., Bahuchet, S., Heyer, E.: Origins and genetic diversity of pygmy hunter-gatherers from Western Central Africa. Curr. Biol. 19(4), 312–318 (2009)
68. Wegmann, D., Excoffier, L.: Bayesian inference of the demographic history of chimpanzees. Mol. Biol. Evol. 27(6), 1425–1435 (2010)
69. Wikipedia: Approximate Bayesian computation. Wikipedia, The Free Encyclopedia (2014)
70. Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Technical Report (2008)
71. Wilkinson, R.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)
72. Wilkinson, R.D.: Accelerating ABC methods using Gaussian processes. e-prints (2014)
Part II
Contributed Papers
Abstract We propose Monte Carlo (MC), single level Monte Carlo (SLMC) and multilevel Monte Carlo (MLMC) methods for the numerical approximation of statistical solutions to the viscous, incompressible Navier–Stokes equations (NSE) on a bounded, connected domain $D \subset \mathbb{R}^d$, d = 1, 2, with no-slip or periodic boundary conditions on the boundary $\partial D$. The MC convergence rate of order 1/2 is shown to hold independently of the Reynolds number, with a constant depending only on the mean kinetic energy of the initial velocity ensemble. We discuss the effect of space-time discretizations on the MC convergence. We propose a numerical MLMC estimator, based on finite samples of numerical solutions with finite mean kinetic energy in a suitable function space, and give sufficient conditions for mean-square convergence to a (generalized) moment of the statistical solution. We provide in particular error bounds for MLMC approximations of statistical solutions to the viscous Burgers equation in space dimension d = 1 and to the viscous, incompressible Navier–Stokes equations in space dimension d = 2 which are uniform with respect to the viscosity parameter. For a more detailed presentation and proofs we refer the reader to Barth et al. (Multilevel Monte Carlo approximations of statistical solutions of the Navier–Stokes equations, 2013, [6]).
Keywords Multilevel Monte Carlo method · Navier–Stokes equations · Statistical solutions · Finite volume
A. Barth (B)
SimTech, University of Stuttgart, Pfaffenwaldring 5a, 70569 Stuttgart, Germany
e-mail: andrea.barth@mathematik.uni-stuttgart.de
C. Schwab
Seminar für Angewandte Mathematik, ETH Zürich, Rämistrasse 101,
8092 Zurich, Switzerland
e-mail: schwab@math.ethz.ch
J. Šukys
Computational Science Laboratory, ETH Zürich, Clausiusstrasse 33,
8092 Zurich, Switzerland
e-mail: jonas.sukys@mavt.ethz.ch
© Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_8
$$\partial_t u - \nu\,\Delta u + (u \cdot \nabla)u + \nabla p = f, \qquad \nabla \cdot u = 0, \tag{1}$$
with the kinematic viscosity $\nu \ge 0$ and with a given initial velocity field $u(0) = u_0$. In space dimension d = 1, i.e. for D = (0, 1), the NSE reduce to the (viscous, for $\nu > 0$) Burgers equation. We focus here on Eq. (1) with periodic or no-slip boundary conditions. We provide numerical examples for periodic boundary conditions, but emphasize that the theory of statistical solutions extends also to other boundary conditions (see [7]). Apart from not exhibiting viscous boundary layers, homogeneous statistical solutions to the NSE with periodic boundary conditions appear in certain physical models [7, Chaps. IV and V].
Statistical solutions aim at describing the evolution of ensembles of solutions through their probability distribution. In space dimension $d \le 2$, for no-slip boundary conditions we define the function space
$$H_{nsp} = \big\{ v \in L^2(D)^d : \nabla \cdot v = 0 \text{ in } H^{-1}(D),\ v \cdot n|_{\partial D} = 0 \text{ in } H^{-1/2}(\partial D) \big\},$$
where n is the unit outward-pointing normal vector on $\partial D$. For $D = (0, 1)^2$ and periodic boundary conditions, we denote the corresponding space of functions with vanishing average over D by $H_{per}$. We remark that $H_{per}$ coincides with the space H(L) in [7, Chap. V.1.2] of L-periodic functions with vanishing average, with period L = 1. Whenever we discuss generic statements valid for either boundary condition, we write $H \in \{H_{nsp}, H_{per}\}$.
We assume given a probability measure $\mu_0$ on H, where H is equipped with the Borel $\sigma$-algebra $\mathcal{B}(H)$. Statistical solutions to the NSE as defined in [7, 8] are parametric families of probability measures on H. Rather than being restricted to one single initial condition, a (Foias–Prodi) statistical solution to the NSE is a one-parameter family of probability measures which describes the evolution of statistics of initial velocity ensembles. Individual solutions of Eq. (1) are special cases of statistical solutions, for an initial measure $\mu_0$ charging one initial velocity $u_0 \in H$. In general, the initial distribution $\mu_0$ is defined via an underlying probability space $(\Omega, \mathcal{F}, P)$. The distribution of initial velocities is assumed to be given as the image measure under an H-valued random variable with distribution $\mu_0$. This random variable X is defined as a mapping from the measurable space $(\Omega, \mathcal{F})$ into the measurable space $(H, \mathcal{B}(H))$ such that $\mu_0 = X_* P$. Consider the NSE (1) in space dimension d = 2 with viscosity $\nu > 0$ without forcing, i.e. with $f \equiv 0$. In this case, the solution to the NSE is unique and the initial-data-to-solution map is a semigroup $S_\nu = (S_\nu(t, 0),\ t \in J)$ on H [7, Chap. III.3.1]. Then, a (unique) time-dependent family of probability measures $\mu = (\mu_t,\ t \in J)$ on H is given by [7, Chap. IV.1.2]
$$\mu_t(E) = \mu_0\big(S_\nu(t, 0)^{-1} E\big), \qquad E \in \mathcal{B}(H), \tag{2}$$
i.e., for every $t \ge 0$ and every $E \in \mathcal{B}(H)$, $P(\{u(t) \in E\}) = P(\{u_0 \in S_\nu(t, 0)^{-1} E\}) = \mu_0((S_\nu(t, 0))^{-1} E)$. We remark that for nonzero, time-dependent forcing f, $S_\nu$ in general does not define a semigroup on H [7, Chap. V.1.1]. For any time $t \in J$, we may then define the generalized moment
$$\int_H \Phi(w)\, d\mu_t(w)$$
for test functionals $\Phi$ in the class $\mathcal{C}$ of Definition 1 below, and the measures $\mu_t$ satisfy the moment equation
$$\frac{d}{dt}\int_H \Phi(v)\, d\mu_t(v) = \int_H \big(F(t, v), \Phi'(v)\big)\, d\mu_t(v), \qquad F(t, v) = f - \nu A v - B(v, v), \tag{3}$$
where A denotes the Stokes operator and B the associated bilinear form.
Definition 1 The class $\mathcal{C}$ consists of cylindrical test functionals
$$\Phi(v) = \varphi\big((v, g_1)_H, \ldots, (v, g_k)_H\big), \qquad \Phi'(v) = \sum_{i=1}^{k} \partial_i \varphi\big((v, g_1)_H, \ldots, (v, g_k)_H\big)\, g_i, \tag{4}$$
with $\varphi$ continuously differentiable with compact support and $g_1, \ldots, g_k \in H$.
Energy equalities are central for statistical solutions to Eq. (1); we integrate Eq. (3), which leads, in space dimension d = 2 and for all $t \in J$, to (cp. [7, Eq. V.1.9])
$$\int_H \|v\|_H^2\, d\mu_t(v) + 2\nu \int_0^t \int_V \|v\|_V^2\, d\mu_s(v)\, ds = 2\int_0^t \int_V (f(s), v)_H\, d\mu_s(v)\, ds + \int_H \|v\|_H^2\, d\mu_0(v). \tag{5}$$
Equations (3) and (5) motivate the definition of statistical solutions to Eq. (1).
Definition 2 ([7, Definitions V.1.2, V.1.4]) In space dimension d = 1, 2, a one-parametric family $\mu = (\mu_t,\ t \in J)$ of Borel probability measures on H is a statistical solution to Eq. (1) on the time interval J if
1. the initial Borel probability measure $\mu_0$ on H has finite mean kinetic energy, i.e., $\int_H \|v\|_H^2\, d\mu_0(v) < \infty$,
2. $f \in L^2(J; H)$ and the Borel probability measures $\mu_t$ satisfy Eq. (3) for all $\Phi \in \mathcal{C}$ and Eq. (5) holds.
We note that in space dimension d = 3 the notion of statistical solution is more delicate, cp. [8]. We recall an existence (and, in space dimension d = 2, uniqueness) result (see [7, Theorems V.1.2, V.1.3, V.1.4], [8]): if $\mu_0$ is supported in $B_H(R)$ for some $0 < R < \infty$, and if the forcing term $f \in H$ is time-independent, the statistical solution is unique and given by Eq. (2).
2 Discretization Methods
Our goal is the numerical approximation of (generalized) moments of the statistical solution $(\mu_t,\ t \in J)$ for a given initial distribution $\mu_0$ on H. We achieve this by approximating, for given $\Phi \in \mathcal{C}$ (with $\mathcal{C}$ as in Definition 1) and for given $\mu_0$ with finite mean kinetic energy on H, the expression
$$e_t(\Phi) = \int_H \Phi(w)\, d\mu_t(w), \qquad t \in J.$$
As a first approach, we assume that we can sample from the exact initial distribution $\mu_0$. Since $\mu_0$ is a distribution on the infinite-dimensional space H, this is, in general, a simplifying assumption. However, if the probability measure $\mu_0$ is supported on a finite-dimensional subspace of H, the assumption is no constraint. We discuss an appropriate approximation of the initial distribution in Sect. 3. We generate $M \in \mathbb{N}$ independent copies $(w_i,\ i = 1, \ldots, M)$ of $u_0$, where $u_0$ is $\mu_0$-distributed. Assume for
now that for each draw $w_i \in H$, distributed according to $\mu_0$, we can solve $u^i(t) = S_\nu(t, 0)w_i$ exactly, and that we can evaluate the (real-valued) functional $\Phi(u^i(t))$ exactly. Then
$$e_t(\Phi) \approx E_M^t\big(\Phi(u(t))\big) := \frac{1}{M}\sum_{i=1}^{M} \Phi(u^i(t)) = \frac{1}{M}\sum_{i=1}^{M} \Phi\big(S_\nu(t, 0)w_i\big), \tag{6}$$
where we denoted by $(E_M^t,\ M \in \mathbb{N})$ the sequence of MC estimators which approximate the (generalized) expectation $e_t(\Phi)$ for $\Phi \in \mathcal{C}$. To state the error bound on the variance of the MC estimator given in Eq. (6), we assume for simplicity that the right hand side of Eq. (1) is equal to zero, i.e., $f \equiv 0$ (all results that follow have an analog for nonzero forcing $f \in L^2(D)$).
Proposition 1 Let $\Phi \in \mathcal{C}$ be a test function. Then, an error bound on the mean-square error of the Monte Carlo estimator $E_M^t$, for $M \in \mathbb{N}$, is given by
$$\big\|e_t(\Phi) - E_M^t(\Phi(u(t)))\big\|_{L^2((H,\mu_t);\mathbb{R})} = \left(\frac{1}{M}\,\mathrm{Var}_{\mu_t}(\Phi)\right)^{1/2} \le C\left(\frac{1}{M}\left(1 + \int_H \|w\|_H^2\, d\mu_0(w)\right)\right)^{1/2}.$$
For $\nu > 0$, the latter inequality is strict. Here, we used the notation $\mathrm{Var}_P(X) = e_P(\|e_P(X) - X\|_E^2)$ for a square-integrable, E-valued random variable X under the measure P. We define, further, $L^2((\Omega, P); E)$ as the set of square-summable (with respect to the measure P) random variables taking values in the separable Banach space E, and equip it with the norm $\|X\|_{L^2((\Omega,P);E)} := e_P(\|X\|_E^2)^{1/2}$. Test functions in $\mathcal{C}$ fulfill, for some constant C > 0, the linear growth condition $|\Phi(w)| \le C(1 + \|w\|_H)$ for all $w \in H$. We remark that the MC error estimate in Proposition 1 is uniform with respect to $\nu > 0$ (see [6]). With $E_M^t$ being a convex combination of individual Leray–Hopf solutions, by [8, Theorem 4.2] the MC estimator $E_M^t$ converges as $M \to \infty$ (in the sense of sequential convergence of measures, and uniformly on bounded time intervals) to a Vishik–Foursikov statistical solution as defined in [8].
Space and Time Discretization
The MC error bounds in Proposition 1 are semi-discrete in the sense that they assume the availability of an exact Leray–Hopf solution to the NSE for each initial velocity sample drawn from $\mu_0$, and they pertain to bulk properties of the flow in the sense that they depend on the H-norm of the individual flows. We have, therefore, to perform additional space and time discretizations in order to obtain computationally feasible approximations of (generalized) moments of statistical solutions. In MLMC sampling strategies such as those proposed subsequently, we consider a sequence of (space and time) discretizations which are indexed by a level index $\ell \in \mathbb{N}_0$. We consider a dense, nested family of finite dimensional subspaces $\mathcal{V} = (V_\ell,\ \ell \in \mathbb{N}_0)$ of V and therefore of H. Associated to the subspaces $V_\ell$, we have the refinement levels with mesh widths $h_\ell$ and time steps $\Delta t_\ell$, with the corresponding uniform time grids $\{t^i = i\,\Delta t_\ell : i = 0, \ldots, T/\Delta t_\ell\}$.
We view the fully-discrete solution to Eq. (1) as the solution to a nonlinear dynamical system according to
$$D_t(u_\ell) = F_\ell(t, u_\ell),$$
where $D_t$ denotes the weak derivative with respect to time and the right hand side is
$$F_\ell(t, v) = f - \nu A_\ell v - B_\ell(v, v).$$
Here, $A_\ell$ denotes the discrete Stokes operator and $B_\ell$ the associated bilinear form. We denote by $S_{\nu,\ell} = (S_{\nu,\ell}(t^i, 0),\ i = 0, \ldots, T/\Delta t_\ell)$ the fully-discrete solution operator that maps $u_0$ into $u_\ell = (u_\ell(t^i),\ i = 0, \ldots, T/\Delta t_\ell)$. We assume that the spaces in $\mathcal{V}$ and the time discretizations are chosen such that the following error bound holds.
Assumption 1 The sequence of fully-discrete solutions $(u_\ell,\ \ell \in \mathbb{N}_0)$ converges to the solution u to Eq. (1). The space and time discretization error is bounded, for $\ell \in \mathbb{N}$ and $t \in J$, with $h_\ell \simeq \Delta t_\ell$, either by
1. $$\|u(t) - u_\ell(t)\|_H = \big\|S_\nu(t, 0)u_0 - S_{\nu,\ell}(t, 0)u_0\big\|_H \le C\big(h_\ell^s + (\Delta t_\ell)^s\big) \le C\, h_\ell^s, \tag{7}$$
with a constant C > 0 independent of the viscosity $\nu$ (robust case), or by
2. $$\|u(t) - u_\ell(t)\|_H \le C_\nu\big(h_\ell^\eta + (\Delta t_\ell)^\eta\big) \le C_\nu\, h_\ell^\eta, \tag{8}$$
with a constant $C_\nu$ that depends on $\nu$.
The fully-discrete single level Monte Carlo (SLMC) estimator reads
$$E_M^t\big(\Phi(u_\ell(t))\big) = \frac{1}{M}\sum_{i=1}^{M} \Phi(u_\ell^i(t)) = \frac{1}{M}\sum_{i=1}^{M} \Phi\big(S_{\nu,\ell}(t, 0)w_i\big).$$
We require the test functionals to be Lipschitz continuous,
$$|\Phi(v) - \Phi(w)| \le C\,\|v - w\|_H. \tag{9}$$
We remark that Eq. (9) follows from $\varphi$ being continuously differentiable and with compact support. The constant C depends on the maximum of $\nabla\varphi$ and on the H-norms of $g_1, \ldots, g_k$. Under Eq. (9), the SLMC estimator admits the following mean-square error bound (see [6]).
Proposition 2 If, for $\Phi \in \mathcal{C}$ fulfilling Eq. (9) and $\ell \in \mathbb{N}_0$, the generalized moment of the statistical solution fulfills Assumption 1, for some $s \in [0, 1]$ or some $\eta > 0$ and $h_\ell \simeq \Delta t_\ell$, then the fully-discrete single level Monte Carlo estimator $E_M^t(\Phi(u_\ell))$ admits, for $t \in J$, the bound
$$\big\|e_t(\Phi) - E_M^t(\Phi(u_\ell))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le \left(\frac{1}{M}\,\mathrm{Var}_{\mu_t}(\Phi)\right)^{1/2} + \big\|e_t(\Phi - \Phi(u_\ell))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le C\left(\frac{1}{\sqrt{M}} + \psi(h_\ell)\right).$$
For robust discretizations, $\psi(z) = z^s$, with C > 0 independent of $\nu$, $h_\ell$ and of $\ell$. The error bound for the SLMC estimator consists of two additive components: the approximation error of the spatial and temporal discretization and the MC sampling error. Although we only established an upper bound, one can show that this error is, indeed, of an additive nature. This, in turn, indicates that a lack of scale resolution in the spatial and temporal approximation, i.e. if the discretization underresolves the scale of viscous cut-off, can partly (in a mean-square sense) be offset by increasing the number of samples on the mesh-level $\ell$ in the MC approximation. This is in line with similar findings for MLMC Galerkin discretizations for elliptic homogenization problems in [2]. To ensure that the total error in Proposition 2 is smaller than a prescribed tolerance $\varepsilon > 0$, we require
$$\left(\frac{1}{M}\,\mathrm{Var}_{\mu_t}(\Phi)\right)^{1/2} + \big\|e_t(\Phi - \Phi(u_\ell))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le \varepsilon.$$
A sufficient condition for this is, for some $\theta \in (0, 1)$,
$$\left(\frac{1}{M}\,\mathrm{Var}_{\mu_t}(\Phi)\right)^{1/2} \le \theta\varepsilon \qquad\text{and}\qquad \big\|e_t(\Phi - \Phi(u_\ell))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le (1 - \theta)\varepsilon.$$
The MLMC estimator is based on the telescoping expansion
$$e_t\big(\Phi(u_L)\big) = e_t\big(\Phi(u_0)\big) + \sum_{\ell=1}^{L} e_t\big(\Phi(u_\ell) - \Phi(u_{\ell-1})\big).$$
Then we approximate the expectation in each term on the right hand side with a SLMC estimator with a level dependent number of samples, so that we may write
$$E^{L,t}\big(\Phi(u_L)\big) = E_{M_0}^t\big(\Phi(u_0)\big) + \sum_{\ell=1}^{L} E_{M_\ell}^t\big(\Phi(u_\ell) - \Phi(u_{\ell-1})\big).$$
We call $E^{L,t}$ the MLMC estimator for discretization level $L \in \mathbb{N}_0$. The MLMC estimator has the following mean-square error bound.
Proposition 3 If, for $\Phi \in \mathcal{C}$ fulfilling Eq. (9) and $L \in \mathbb{N}_0$, the generalized moment of the statistical solution fulfills Assumption 1, for $\ell = 0, \ldots, L$ with $s \in [0, 1)$ or $\eta > 0$ and $h_\ell \simeq \Delta t_\ell$, the error of the fully-discrete multilevel Monte Carlo estimator $E^{L,t}(\Phi(u_L))$ admits, for $t \in J$, the bound
$$\big\|e_t(\Phi) - E^{L,t}(\Phi(u_L))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le \big\|e_t(\Phi - \Phi(u_L))\big\|_{L^2((H,\mu_t);\mathbb{R})} + \sum_{\ell=0}^{L}\left(\frac{1}{M_\ell}\,\mathrm{Var}_{\mu_t}\big(\Phi(u_\ell) - \Phi(u_{\ell-1})\big)\right)^{1/2}$$
$$\le C\left(\psi(h_L) + \frac{1}{\sqrt{M_0}}\big(1 + \psi(h_0)\big) + \sum_{\ell=1}^{L}\frac{1}{\sqrt{M_\ell}}\big(\psi(h_\ell) + \psi(h_{\ell-1})\big)\right),$$
where $\Phi(u_{-1}) \equiv 0$, $\psi(z) = z^s$ or $\psi(z) = z^\eta$, and $z \in [0, 1]$. If, further, for all $\ell = 1, \ldots, L$, it holds that $h_\ell \simeq \Delta t_\ell$ and that $h_\ell \le \rho\, h_{\ell-1}$, with some reduction factor $0 < \rho < 1$ independent of $\ell$, then there exists $C(\rho) > 0$, independent of L, such that there holds the error bound
$$\big\|e_t(\Phi) - E^{L,t}(\Phi(u_L))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le C(\rho)\left(\psi(h_L) + \frac{1}{\sqrt{M_0}} + \sum_{\ell=0}^{L}\frac{1}{\sqrt{M_\ell}}\,\psi(h_\ell)\right).$$
A proof can be found in [6]. This result leads again to the question of how to choose the sample numbers $(M_\ell,\ \ell = 1, \ldots, L)$ that yield a given (mean kinetic energy) error threshold $\varepsilon$. We have, if we assume that $\theta_L \in (0, 1)$, the requirement
$$\big\|e_t(\Phi - \Phi(u_L))\big\|_{L^2((H,\mu_t);\mathbb{R})} \le (1 - \theta_L)\varepsilon \qquad\text{and}\qquad \sum_{\ell=0}^{L}\left(\frac{1}{M_\ell}\,\mathrm{Var}_{\mu_t}\big(\Phi(u_\ell) - \Phi(u_{\ell-1})\big)\right)^{1/2} \le \theta_L\,\varepsilon.$$
If we split the statistical error tolerance into level-wise contributions $\varepsilon_\ell$ then, to equilibrate the error for each level $\ell = 1, \ldots, L$, we choose the sample sizes
$$M_\ell = \mathrm{Var}_{\mu_t}\big(\Phi(u_\ell) - \Phi(u_{\ell-1})\big)\,(\theta_L\,\varepsilon_\ell)^{-2}, \tag{10}$$
so that the statistical error
is bounded by the prescribed tolerance $\varepsilon > 0$. This is only possible if the convergence requirement is fulfilled for level L, since then we can choose $\varepsilon_L$ accordingly to satisfy a preset error bound. However, the convergence requirement might not be fulfilled for all $\ell < L$; hence, for those levels we have to sample accordingly. In particular, denote by $\ell_0$ the first level where the solution is scale-resolved. Then $\mathrm{Var}_{\mu_t}(\Phi(u_\ell) - \Phi(u_{\ell-1}))$ might be large, as might be $\psi(h_\ell)$; thus $\varepsilon_\ell$ has to be chosen accordingly. Since it is infeasible to determine the values $\mathrm{Var}_{\mu_t}(\Phi(u_\ell) - \Phi(u_{\ell-1}))$, we estimate sample numbers from the second (more general) bound in Proposition 3. We refer to [4] for an analysis of the computational complexity of MLMC estimators in the case of weak or strong errors of SPDEs.
We proceed to determine the numbers $M_\ell$ of SLMC samples. To this end, we continue to work under Assumption 1. We either assume Eq. (7) or we work with Eq. (8) under the assumption that at least on the finest level the scale resolution requirement is fulfilled, i.e., $h_L < \sqrt{\nu}$. For the latter, we consider the case where the scale resolution requirement is not fulfilled for all levels up to level $\ell(\nu)$. In this case, for $0 \le \ell(\nu) < L$ (meaning $h_{\ell(\nu)} \ge \sqrt{\nu}$ and $h_{\ell(\nu)+1} < \sqrt{\nu}$), we choose on the first level the sample number
$$M_0 = O\Big(\varepsilon^{-2}\,\big(\psi(h_L)\big)^{-1}\Big) \tag{11}$$
to equilibrate the statistical and the discretization error contributions. Here, and in what follows, all constants implied in the Landau symbols $O(\cdot)$ are independent of $\nu$. According to this convergence analysis, the SLMC sample numbers $M_\ell$, for discretization levels $\ell = 1, \ldots, \ell(\nu), \ldots, L$, should be chosen according to
$$M_\ell = O\Big(\varepsilon^{-2}\,\psi(h_\ell)\,\big(\psi(h_L)\big)^{-1}\,\ell^{2(1+\delta)}\Big), \tag{12}$$
for $\delta > 0$ arbitrary (with the constant implied in O depending on $\delta$). Note that $\psi(h_\ell)$ might be large for underresolved discretization levels. This choice of sample numbers is in line with Eq. (10) for one particular sequence $(\varepsilon_\ell,\ \ell = 1, \ldots, L)$.
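The allocation rules (11) and (12), combined with the telescoping estimator $E^{L,t}$, translate directly into a generic driver. The following Python sketch is illustrative only: solve, phi and sample_u0 are hypothetical placeholders for a fully-discrete solver $S_{\nu,\ell}$, a test functional $\Phi$ and a sampler of $\mu_0$, and the constant implied in (12) is set to 1.

```python
import numpy as np

def sample_numbers(h, eps, s=0.5, delta=0.1):
    """Level-wise sample numbers following Eq. (12) with psi(z) = z^s
    (robust case); the implied constant is set to 1 for illustration."""
    psi = np.asarray(h, dtype=float) ** s
    ell = np.arange(1, len(h) + 1)
    M = eps ** -2 * psi / psi[-1] * ell ** (2 * (1 + delta))
    return np.maximum(np.ceil(M), 1).astype(int)

def mlmc_estimate(solve, phi, sample_u0, M):
    """MLMC estimator E^{L,t}: sum of level-wise MC estimates of the
    telescoping differences Phi(u_l) - Phi(u_{l-1}), with the same initial
    sample w used on the fine and the coarse level of each difference."""
    est = 0.0
    for l, Ml in enumerate(M):
        acc = 0.0
        for _ in range(Ml):
            w = sample_u0()
            diff = phi(solve(w, l))
            if l > 0:
                diff -= phi(solve(w, l - 1))
            acc += diff
        est += acc / Ml
    return est
```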
3 Numerics
We describe numerical experiments in the unit interval D = (0, 1) in space dimension d = 1, i.e. for the viscous Burgers equation, and in space dimension d = 2, in D = (0, 1)², with periodic boundary conditions and with stochastic initial data. As indicated in Sect. "Space and Time Discretization", in space dimension d = 1, i.e. for scalar problems, the bound in Assumption 1 holds with s = 1/2 and with a constant C > 0 independent of $\nu$ (see [10]). If the mesh used for the space discretization resolves the viscous scale, the first order Finite Volume method even converges with rate s = 1 in $L^1(D)$ due to the high spatial regularity of the solution u, albeit with constants which blow up as the viscosity $\nu$ tends to zero. Specifically, we consider
Eq. (1) with periodic boundary conditions in the physical domain D = [0, 1], i.e.
$$\partial_t u + \frac{1}{2}\,\partial_x(u^2) = \nu\,\partial_x^2 u + f, \qquad \text{for all } x \in D,\ t \in [0, T],\ \omega \in \Omega, \tag{13}$$
with random initial data modelled via a Gaussian random variable X on H, expanded as
$$X = m + \sum_{i \in \mathbb{N}} \sqrt{\lambda_i}\,\eta_i\, w_i,$$
where $((\lambda_i, w_i),\ i \in \mathbb{N})$ is a complete orthonormal system in H and consists of eigenvalues and eigenfunctions of Q. The sequence $(\eta_i,\ i \in \mathbb{N})$ consists of real-valued, independent, (standard) normally distributed random variables. With $\kappa$-term truncations of Karhunen–Loève expansions we define a sequence of random variables $(X^\kappa,\ \kappa \in \mathbb{N})$ given by $X^\kappa = m + \sum_{i=1}^{\kappa} \sqrt{\lambda_i}\,\eta_i\, w_i$, with mean $m \in H$ and covariance operator $Q^\kappa$. The sequence of truncated sums $X^\kappa$ converges P-a.s. to X in the H-norm as $\kappa \to +\infty$. Then, we have the following lemma (see [5] for a proof).
Lemma 1 ([5]) If the eigenvalues $(\lambda_i,\ i \in \mathbb{N})$ of the covariance operator Q of the Gaussian random variable X on H have a rate of decay $\lambda_i \le C\, i^{-\alpha}$ for some $\alpha > 1$, then the sequence $(X^\kappa,\ \kappa \in \mathbb{N})$ converges to X in $L^2(\Omega; H)$ and the error is bounded by
$$\|X - X^\kappa\|_{L^2(\Omega;H)} \le C\left(\frac{\kappa^{1-\alpha}}{\alpha - 1}\right)^{1/2}.$$
For the numerical realization of the MLMC method, and in particular for the numerical experiments ahead, we need to draw samples from the initial distribution. As an example we therefore introduce a Gaussian distribution on $H = L^2_{per}(D)$, where D = (0, 1). In the univariate case, the condition $\nabla \cdot u = 0$ in (1) becomes void and $L^2_{per}(D) = \{u \in L^2(D) : \int_D u = 0\}$. A basis of $L^2_{per}(D)$ is given by $(w_i,\ i \in \mathbb{N})$, where
$w_i(x) = \sin(2\pi i x)$. Then the covariance operator Q is, by Mercer's theorem, defined for $\varphi \in L^2_{per}(D)$ as
$$Q\varphi(x) = \int_D q(x, y)\,\varphi(y)\, dy,$$
where the kernel is $q(x, y) = \sum_{i \in \mathbb{N}} \lambda_i\, w_i(x)\, w_i(y) = \sum_{i \in \mathbb{N}} \lambda_i \sin(2\pi i x)\sin(2\pi i y)$. Now, we may choose any sequence $(\lambda_i,\ i \in \mathbb{N})$ with $\sum_{i \in \mathbb{N}} \lambda_i < \infty$ to define a covariance operator Q on H which is trace class. One possible choice would be $\lambda_i \simeq i^{-\alpha}$, for $\alpha > 2$. In our numerical experiments, we choose as eigenvalues $\lambda_i = i^{-2.5}$ for $i \le 8$ and zero otherwise, and the mean field $m \equiv 0$, i.e.
$$u_0(x, \omega) = \sum_{i=1}^{8} \frac{1}{i^{5/4}}\,\sin(2\pi i x)\, Y_i(\omega). \tag{14}$$
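For instance, a draw from (14) can be generated with a few lines of Python. This is a plain transcription of the truncated Karhunen–Loève expansion above, for illustration only, and not the implementation used in the experiments.

```python
import numpy as np

def sample_u0(x, rng, kappa=8):
    """Sample the initial datum (14): a truncated KL expansion with
    eigenvalues lambda_i = i^{-2.5}, so sqrt(lambda_i) = i^{-5/4},
    modes w_i(x) = sin(2*pi*i*x) and iid standard normal Y_i."""
    i = np.arange(1, kappa + 1)
    Y = rng.standard_normal(kappa)
    modes = np.sin(2.0 * np.pi * np.outer(i, x)) / i[:, None] ** 1.25
    return Y @ modes

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 513)
u0 = sample_u0(x, rng)   # one realization of the random initial velocity
```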
The kinematic viscosity is chosen to be $\nu = 10^{-3}$ and the source term is set to $f \equiv 0$. All simulations reported below were performed with the recently developed massively parallel code ALSVID-UQ [1, 13, 15] on the Cray XE6 at CSCS (see [14]), with 1496 AMD Interlagos 2 × 16-core 64-bit CPUs (2.1 GHz), 32 GB DDR3 memory per node, and a 10.4 GB/s Gemini 3D torus interconnect, with a theoretical peak performance of 402 TFlops.
The initial data in Eq. (14) and the reference solution $u_{\mathrm{ref}}$ at time t = 2 are depicted in Fig. 1. The solid line represents the mean $E_t(u_{\mathrm{ref}})$ and the dashed lines represent the mean plus/minus the standard deviation $(\mathrm{Var}_t(u_{\mathrm{ref}}))^{1/2}$ of the (random) solution $u_{\mathrm{ref}}$ at every point $x \in D$. The variance and therefore the standard deviation can easily be calculated by $\mathrm{Var}_0(u_0(x)) = \sum_{i=1}^{8} \big(\frac{1}{i^{5/4}}\sin(2\pi i x)\big)^2$, for $x \in D$. The solution is computed with a standard first-order Finite Volume scheme using the Rusanov HLL solver on a spatial grid in D of size 32768 cells and explicit forward Euler time stepping (see [12]) with the CFL number set to 0.9. The number of levels of refinement is 9 (the coarsest level has 64 cells). The number of samples is chosen according to the analysis in Sect. "Space and Time Discretization" with s = 1, i.e.
Fig. 1 Reference solution computed using the MLMC finite volume method
$$\Phi(u)(t, \omega) = \big(u(\cdot, t, \omega),\, g_1\big)_H. \tag{15}$$
Note that formally the function $\varphi$ is not compactly supported. However, for one-dimensional problems, there holds an energy bound (we refer to the results in [12]) with respect to the initial data $u_0(\cdot, \omega)$, i.e. $\|u(\cdot, t, \omega)\|_{L^2(D)} \le \|u_0(\cdot, \omega)\|_{L^2(D)}$. Since the values of the inner product can be bounded for every t and P-a.e. $\omega$ by
$$\big|\big(u(\cdot, t, \omega), g_1\big)_H\big| \le \|u(\cdot, t, \omega)\|_{L^2(D)}\,\|g_1\|_{L^2(D)} \le \|u_0(\cdot, \omega)\|_{L^2(D)}\,\|g_1\|_{L^2(D)} < \infty,$$
the function $\varphi(\cdot)$ may be modified for large values, enforcing the required compact support of $\varphi$ in Definition 1. We note that such a modification is $\omega$-dependent, and hence a more stringent bound on the $L^\infty(\Omega; L^2(D))$-norm of the initial data is required instead, i.e. we require that $\|u_0(\cdot, \omega)\|_{L^2(D)} < C$ holds P-a.s. for some constant $C < \infty$. Such a bound holds for a uniformly distributed initial condition; however, it does not hold for the Gaussian distributed initial condition considered here. In the following numerical experiment, we choose the function $g_1$ in Eq. (15) to be $g_1(x) = (x - 0.5)^3$. With this choice it is easily verified that $\Phi$ in Eq. (15) fulfills the Lipschitz condition in Eq. (9).
Using MLMC Finite Volume approximations for the mean $E_t(\Phi(u_{\mathrm{ref}}))$ and the variance $\mathrm{Var}_t(\Phi(u_{\mathrm{ref}}))$ from Fig. 1 as a reference solution, we compute approximate solutions $u_\ell$ using both SLMC Finite Volume and MLMC Finite Volume methods, on a family of meshes with spatial resolutions ranging from $n_0 = 64$ cells up to $n_L = 2048$ cells. We monitor the convergence of the errors in $E^{L,t}(\Phi(u_L))$ and $\mathrm{Var}_L^t(\Phi(u_L))$,
$$\Delta_L^E = \big\|E_t(\Phi(u_{\mathrm{ref}})) - E^{L,t}(\Phi(u_L))\big\|, \qquad \Delta_L^V = \big\|\mathrm{Var}_t(\Phi(u_{\mathrm{ref}})) - \mathrm{Var}_L^t(\Phi(u_L))\big\|.$$
The number of samples on the finest mesh is set to $M_L = 4$. The number of levels for the MLMC Finite Volume method is chosen so that the coarsest level contains 64 cells. Since $1/64 \approx 0.015 < \sqrt{\nu} = 10^{-1.5} \approx 0.03$, the viscous cut-off scale (which, in the present problem, coincides with the scale of the viscous shock profile) of the solution u is resolved on every mesh resolution level $\ell = 0, \ldots, L$.
Since the solution is a random field, the discretization error $\Delta_L$ is a random quantity as well. For the error convergence analysis we therefore compute a statistical estimator by averaging estimated discretization errors from several independent runs. We compute the error in Proposition 3 by approximating the $L^2((H,\mu_t);\mathbb{R})$-norm by MC
sampling. Let $\Phi(u_{\mathrm{ref}})$ denote the reference solution and $((\Phi(u_L))^{(k)},\ k = 1, \ldots, K)$ be a sequence of independent approximate solutions obtained by running the SLMC Finite Volume or MLMC Finite Volume solver $K \in \mathbb{N}$ times. The $L^2(\Omega; H)$-based relative percentage error estimators are defined to be
$$R_L^E = 100\cdot\frac{\Big(E_K\big[(\Delta_L^{E,(k)})^2\big]\Big)^{1/2}}{\big|E_t(\Phi(u_{\mathrm{ref}}))\big|}, \qquad R_L^V = 100\cdot\frac{\Big(E_K\big[(\Delta_L^{V,(k)})^2\big]\Big)^{1/2}}{\big|\mathrm{Var}_t(\Phi(u_{\mathrm{ref}}))\big|}.$$
In order to obtain an accurate estimate of $R_L^E$ and $R_L^V$, the number K must be large enough to ensure a sufficiently small (<0.1) relative variance $\sigma^2(R_L^E)$ and $\sigma^2(R_L^V)$. We found K = 30 to be sufficient for our numerical experiments. Next, we analyze the relative percentage error convergence plots of the mean and the variance.
In Fig. 2, we plot the error $\Delta_L^E$ against the number of cells on the finest discretization level L in the left subplot and versus the computational work (runtime) in the right subplot. The coarsest level stays the same when we increase the finest discretization level L to obtain a convergence plot. Both SLMC and MLMC methods give similar relative percentage errors for the same spatial resolution. However, there is a significant difference in the runtime: MLMC methods are two orders of magnitude faster than plain SLMC methods. The lower dashed line in the top-right corner of each plot in Fig. 2 (and all subsequent figures) indicates the expected convergence rate of the MLMC method obtained in Proposition 3. These expected convergence rates coincide with the observations in the numerical experimental data. In Fig. 3, we plot the error $\Delta_L^V$ versus the number of cells on the finest discretization level L in the left subplot and versus the computational work (runtime) in the right subplot. Analogously to the plots for the expectation, both SLMC and MLMC methods give similar errors for the same spatial resolution. In terms of the required computational work for one percent error, MLMC methods are, in this example, two orders of magnitude faster than plain SLMC methods.
We repeat the error convergence analysis for the Burgers equation, but this time with much fewer cells on the coarsest mesh resolution in the MLMC estimator. In
Fig. 2 Convergence of the error $\Delta_L^E$ of the mean $E_t(\Phi)$ of the viscous Burgers equation
Fig. 3 Convergence of the error $\Delta_L^V$ of the variance $\mathrm{Var}_t(\Phi)$ of the viscous Burgers equation
particular, instead of taking 64 cells on the coarsest mesh resolution, we will take only 8 cells, i.e. adding three more levels of mesh refinement. Since in this case $1/8 > \sqrt{\nu} = 10^{-1.5} \approx 0.03$, the viscous cut-off length scale of the solution u is not resolved on every mesh resolution level; in particular, it is resolved only on the mesh resolution levels $\ell = 3, \ldots, L$, and it is under-resolved on $\ell = 0, 1, 2$. Notice that
the number of cells on the finer mesh resolutions stays the same as in the previous experiment, where $n_3 = 64, \ldots, n_L = 2048$. Note also that, by the theory in [10], the presently used numerical scheme converges robustly in H with order s = 1/2, meaning that the constant in the convergence bound is independent of $\nu$. In Fig. 4, we plot the error $\Delta_L^E$ against the number of cells $n_L$ in the left subplot and versus the computational work (runtime) in the right subplot for the case of 8 cells on the coarsest resolution. Even in the presence of multiple under-resolved levels, the error convergence of the MLMC Finite Volume method is faster than in the previous setup (compare with Fig. 2). In Fig. 5, we plot the error $\Delta_L^V$ versus the number of cells $n_L$ in the left subplot and versus the computational work (runtime) in the right subplot for the case of 8 cells on the coarsest resolution. Again, even in the presence of multiple under-resolved levels, the error convergence of the MLMC Finite Volume method is faster than in the previous setup (compare with Fig. 3).
Fig. 4 Convergence of the error $\Delta_L^E$ of the mean $E_t(\Phi)$ of the viscous Burgers equation
Fig. 5 Convergence of the error $\Delta_L^V$ of the variance $\mathrm{Var}_t(\Phi)$ of the viscous Burgers equation
In terms of the (scalar in space dimension d = 2) vorticity $\omega(t)$, Eq. (1) becomes the viscous vorticity equation: in the periodic setting, for $s \ge 0$, given $\nu > 0$, find $\omega \in X^s := L^2(J; H^{s+1}_{per}(D)) \cap H^1(J; H^{s-1}_{per}(D))$ such that there holds Eq. (17) and
$$\omega_t + u \cdot \nabla\omega = \nu\,\Delta\omega \ \text{ in } L^2(J; H^{s-1}_{per}(D)), \quad u = \mathrm{rot}^{-1}\omega \ \text{ in } L^2(J; H^{s+1}_{per}(D)), \quad \omega|_{t=0} = \omega_0 \ \text{ in } H^s_{per}(D). \tag{18}$$
The relations Eqs. (16) and (17) are bijective in certain scales of (Sobolev) spaces of D-periodic functions, so that Eqs. (16)–(18) and (1) are equivalent. Moreover, the isomorphisms rot and $\mathrm{rot}^{-1}$ in Eqs. (16) and (17) allow to transfer the statistical solutions $\mu = (\mu_t,\ t \ge 0)$ equivalently to a one-parameter family $\rho = (\rho_t,\ t \ge 0)$ of probability measures on sets of admissible vorticities, defined for every ensemble F of $\rho_0$-measurable initial vorticities $\omega_0$ by
$$\rho_t(F) = \rho_0\big((T_\nu(t))^{-1}(F)\big), \qquad T_\nu(t)\,\omega_0 := \big(\mathrm{rot}\; S_\nu(t, 0)\; \mathrm{rot}^{-1}\big)\,\omega_0.$$
Fig. 6 $L^2$ error of the mean for different viscosities with SLMC and MLMC, with respect to the mesh width h and wall clock time
Here, we defined $\rho_0(F) := (\mu_0 \circ \mathrm{rot}^{-1})(F)$. Existence and uniqueness of the velocity statistical solutions $\mu$ imply existence and uniqueness of the vorticity statistical solutions $\rho$. We refer to [11] for further details, and also for a detailed description of the Finite Volume discretization and convergence analysis of Eq. (18) (Fig. 6).
In the ensuing numerical experiments, we consider a probability measure 0
concentrated on initial vorticities of the form:
0 (x; ) = 0 (x) + Y1 ()1 (x)
1
(D) denotes the mean initial vorticwith Y1 U (1, 1) and where 0 (x) Hper
1
ity, and the fluctuation is given by 1 (x) := sin(2 x1 ) sin(2 x2 ) Hper
(D). We
choose as the mean vorticity 0 (x) := x1 (1 x1 )x2 (1 x2 ). Note that then 0 ()
1
(D) P-a.s.
Hper
The ensuing numerical results are obtained using a forward in time, central in space
(FTCS), vorticity solver, described in detail in [11]. In this case, for small data, the
individual Leray-Hopf solutions converge, as 0, to the unique incompressible,
inviscid Euler flow (see [3, Chap. 13], [17]) in C([0, T ]; L 2 (D)). Contrary to the
one-dimensional setting, in space dimension d = 2 and for sufficiently regular initial
data, incompressible, inviscid Euler flow solutions do not form shocks. To construct
a reference solution, we approximate the ensemble average by 1-dimensional Gauss
Legendre quadrature (using 20 nodes) and a fine discretization in space and time. This
is sufficient to accurately resolve the mean of the statistical solution. This solution,
computed with a space discretization on 10242 equal sized cells, is used as a reference
solution for the error convergence analysis of the SLMC and MLMC Finite Volume
discretization error for the 1-parametric random initial data. Simulations of individual
solutions are performed up to final time T = 1. We compare SLMC and MLMC
approximations. We select the sample numbers on the discretization levels so that
the sampling error and the discretization errors remain balanced. Due to the absence
of boundary layers, for periodic boundary conditions, and of shocks in solutions of the
226
A. Barth et al.
limiting problem, we are in the setting of Assumption 1, with s = 1. Then, the SLMC
error behaves like O(M 1/2 ) + O(h ) with O() independent of . A sufficient choice
of the sample numbers for a first order numerical scheme on individual solutions
is M = h 2 . For MLMC, with the choice M = 22s(Ll) we achieve an asymptotic
error bound of O(hL log(hL )). On the finest meshes we choose ML = 10 samples
in order to remove sampling fluctuations. Concerning the computational work, the
computational cost of a single deterministic simulation behaves like WDET hL3
(in two spatial dimensions and one temporal dimension). We remark, that Multigrid
methods allow for implicit time-stepping for the viscous part and for the velocity
reconstruction in work and memory of O(hL2 ) per time step. For SLMC, we perform
O(hL2 ) deterministic runs. This yields a scaling of the overall work of WSLMC hL5 .
With MLMC we require M = O(h 2s /hL2s ) simulations per level, for a total work of:
WMLMC
L
l=0
h 3 h 2s /hL2s
hL2
L
h 1 hL3 ,
=0
neglecting the logarithmic term. That is, for SLMC with the mentioned choices of
1/5
1/3
sample numbers M, we obtain WSLMC ErrSLMC , whereas for MLMC, WMLMC
ErrMLMC (see Fig. 6). From the discussion above and from the numerical results,
SLMC has prohibitive complexity for small space and timesteps. As predicted by the
theoretical analysis, MLMC exhibits, in terms of work vs. accuracy, a performance
which is comparable to that of one individual numerical solution on the finest mesh.
As in the one-dimensional setting, for the computation of the error, a sample of
K = 10 experiments was generated and the error is estimated by the sample average.
The number K of repetitions of experiments is chosen in such a way that the variance
of the relative error is sufficiently small.
Acknowledgments The research of Ch. S. and A. B. is partially supported under ERC AdG 247277.
The research of J. . was supported by ETH CHIRP1-03 10-1 and CSCS production project ID
S366. The research of A.B. leading to these results has further received funding from the German
Research Foundation (DFG) as part of the Cluster of Excellence in Simulation Technology (EXC
310/2) at the University of Stuttgart, and it is gratefully acknowledged. The research of A. B. and
J. . partially took place at the Seminar fr Angewandte Mathematik, ETH Zrich. The authors
thank S. Mishra and F. Leonardi for agreeing to cite numerical tests from [11] in space dimension
d = 2.
References
1. ALSVID-UQ. Version 3.0. http://www.sam.math.ethz.ch/alsvid-uq
2. Abdulle, A., Barth, A., Schwab, Ch.: Multilevel Monte Carlo methods for stochastic elliptic
multiscale PDEs. Multiscale Model. Simul. 11(4), 10331070 (2013)
3. Bahouri, H., Chemin, J.-Y., Danchin, R.: Fourier Analysis and Nonlinear Partial Differential Equations. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of
Mathematical Sciences], vol. 343. Springer, Heidelberg (2011)
4. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comput. Math. 89(18), 24792498 (2012)
227
5. Barth, A., Lang, A.: Simulation of stochastic partial differential equations using finite element
methods. Stochastics 84(23), 217231 (2012)
6. Barth, A., Schwab, Ch., ukys, J.: Multilevel Monte Carlo approximations of statistical solutions of the NavierStokes equations. Research report 2013-33, Seminar for Applied Mathematics, ETH Zrich (2013)
7. Foias, C., Manley, O., Rosa, R., Temam, R.: Navier-Stokes equations and turbulence. Encyclopedia of Mathematics and its Applications, vol. 83. Cambridge University Press, Cambridge
(2001)
8. Foias, C., Rosa, R., Temam, R.: Properties of time-dependent statistical solutions of the threedimensional Navier-Stokes equations. Annales de lInstitute Fourier 63(6), 25152573 (2013)
9. Heywood, J.G., Rannacher, R.: Finite element approximation of the nonstationary NavierStokes problem. I. Regularity of solutions and second-order error estimates for spatial discretization. SIAM J. Numer. Anal. 19(2), 275311 (1982)
10. Karlsen, K.H., Koley, U., Risebro, N.H.: An error estimate for the finite difference approximation to degenerate convection-diffusion equations. Numer. Math. 121(2), 367395 (2012)
11. Leonardi, F., Mishra, S., Schwab, Ch.: Numerical Approximation of Statistical Solutions of
Incompressible Flow. Research report 2015-27, Seminar for Applied Mathematics, ETH Zrich
(2015)
12. LeVeque, R.: Numerical Solution of Hyperbolic Conservation Laws. Cambridge Press, Cambridge (2005)
13. Mishra, S., Schwab, Ch., ukys, J.: Multi-level Monte Carlo Finite Volume methods for nonlinear systems of conservation laws in multi-dimensions. J. Comput. Phys. 231(8), 33653388
(2012)
14. Rosa (Cray XE6). Swiss National Supercomputing Center (CSCS), Lugano. http://www.
cscs.ch
15. ukys, J., Mishra, S., Schwab, Ch.: Static load balancing for Multi-Level Monte Carlo finite
volume solvers. PPAM 2011, Part I, LNCS, vol. 7203, pp. 245254. Springer, Heidelberg
(2012)
16. Temam, R.: Navier-stokes equations and nonlinear functional analysis. CBMS-NSF Regional
Conference Series in Applied Mathematics, vol. 41. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1983)
17. Yudovic, V.I.: A two-dimensional non-stationary problem on the flow of an ideal incompressible
fluid through a given region. Mat. Sb. (N.S.) 64(106), 562588 (1964)
Fourier transform
Levy
1 Introduction
Nowadays Monte Carlo simulation becomes an influential tool in financial applications such as derivative pricing and risk management; see Glasserman [12] for a
comprehensive overview, Staum [25] and Chen and Hong [8] for introductory tutorials of the topic. A standard MC procedure typically starts with using some general
methods of random number generation, such as inverse transform and acceptancerejection, to sample from descriptive probabilistic distributions of market variables.
D. Belomestny (B)
Duisburg-Essen University, Thea-Leymann-Str. 9, Essen, Germany
e-mail: denis.belomestny@uni-due.de
D. Belomestny
IITP RAS, Moscow, Russia
N. Chen Y. Wang
The Chinese University of Hong Kong, Hong Kong, China
e-mail: nchen@se.cuhk.edu.hk
Y. Wang
e-mail: ywwang@se.cuhk.edu.hk
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_9
229
230
D. Belomestny et al.
Therefore, explicit knowledge about the functional forms of the underlying distributions is a prerequisite for the applications of MC technique.
However, a growing literature of Lvy-driven processes and their applications in
finance calls for research to investigate how to simulate from a distribution whose
cumulative probability function or probability density function may not be available in explicit form. As an important building block of asset price modeling, Lvy
processes can capture well discontinuous price changes and thus are widely used to
model the skewness/smile of implied volatility curves in the option market; see, e.g.
Cont and Tankov [26], for the modeling issues of Lvy process. According to the
celebrated Lvy-Khintchine representation, the joint distribution of the increments
of a Lvy process is analytically characterized by its Fourier transform. Utilizing this
fact, we can evaluate the price function of options written on the underlying assets
modelled by a Lvy process in two steps. First, we apply the Fourier transform (with
some suitable adjustments) on the risk-neutral presentation of option prices in order
to obtain an explicit form of the transformed price function. Second, we numerically
invert the transform to recover the original option price. This research line can be
traced back to Carr and Madan [5], which proposed Fast Fourier Transform (FFT) to
accelerate the computational speed of the method. One may also refer to Lewis [20],
Lee [19], Lord and Kahl [21] and Kwok et al. [17] for more detailed discussion and
extension of FFT. Kou et al. [16] used a trapezoidal rule approximation developed
by Abate and Whitt [1] to invert Laplace transforms for the purpose of option pricing
under the double exponential jump diffusion model, a special case of Lvy process.
Feng and Linetsky [11] introduced Hilbert transform to simplify Fourier transform
of discretely monitored barrier option by backward induction. More recently, Biagini
et al. [4] and Hurd and Zhou [14] extended the Fourier-transform based method to
price options on several assets, including basket options, spread options and catastrophe insurance derivatives.
The numerical inversion of Fourier transforms turns out to be the computational
bottleneck of the above approach. It essentially involves using a variety of numerical discretization schemes to evaluate one- or multi-dimensional integrals. Hence,
such methods will suffer seriously from the curse of dimensionality as the problem dimension increases when we try to price options written on multiple assets.
Monte Carlo method, as a competitive alternative for calculating integrals in a high
dimensional setting, thus becomes a natural choice in addressing this difficulty. To
overcome the barrier that explicit forms of the distribution functions for Lvy driven
processes are absent, some early literature relies on somehow ad hoc techniques to
derive upper bounds for the underlying distribution for the purpose of applying the
acceptance-rejection principle (see, e.g. Glynn [2] and Devroye [18]). More recently,
some scholars, such as Glasserman and Liu [13] and Chen et al. [9], proposed to
numerically invert the transformed distributions to tabulate the original distribution
on a uniform grid so that they can simulate from. Both directions work well in one
dimension. Nevertheless, it is difficult for them to be extended to simulate high
dimensional distributions.
In this paper we propose a novel approach for computing high-dimensional integrals with respect to distributions with explicitly known Fourier transforms based on
231
Qn =
g(x)p(x)dx =
1
(2 )d
Rd
F [g](u)F [p](u)du.
Take, for example, g(x) = (max{x, 0}) with some (0, ), then Parsevals identity implies
g(x)
[x p (x)] dx
x
1
F [x p (x)](u)F [g(x)/x](u) du.
=
2
Q=
According to
F [x p (x)](u) = i
and
F [g(x)/x](u) =
d
F [p ](u) = isign(u)|u|1 exp(|u| )
du
()
(cos(/2) + isign(u) sin(/2)),
|u|
we have
() sin(/2)
Q=
0
u1 exp(u ) du.
(1)
232
D. Belomestny et al.
1
exp(|x| ), < x < +
2 (1 + 1/)
g (x) =
2 General Framework
Let g be a real-valued function on Rd and let p be a probability density on Rd . Our
aim is to compute the integral of g with respect to p :
V=
g(x)p(x) dx.
Rd
Rd
g(x)ex,R p(x)ex,R dx =
1
(2 )d
Rd
(2)
Let q be a probability density function with the property that q(x) = 0 whenever
|F [p](u iR)| = 0, | | denoting the complex modulus. That is, q has the same
support as |F [p](u iR)|. Then we can write
V=
where
1
(2 )d
Rd
F [g](iR u)
F [p](u iR)
q(u) du = eq [h(X)] ,
q(u)
(3)
h(x) =
233
1
F [p](x iR)
.
F [g](iR x)
d
(2 )
q(x)
1
2
2d
Rd
|F [g](iR u)|2
|F [p](u iR)|2
du V 2 .
q(u)
Note that the function |F [p](u iR)| is, up to a constant, a probability density and
in order to minimize the variance, we need to find a density q, that minimizes the
ratio
|F [p](u iR)|
q(u)
and that we are able to simulate from. In the next section, we discuss how to get a
tight upper bound for |F [p](iR u)| in the case of an infinitely divisible distribution
p, corresponding to the marginal distributions of Lvy processes. Such a bound can
be then used to find a density q leading to small values of variance Varq [h(X)].
3 Lvy Processes
Let (Zt ) be a pure jump d-dimensional Levy process with the characteristic exponent
, that is
E eiu,Zt = et(u) , u Rd .
Consider the process Xt = Zt , where is a real m d matrix. Let a vector R Rm
.
be such that R (dz) = e R,z (dz) is again a Lvy measure, i.e.
2
|z| 1 R (dz) < .
Suppose that there exist a constant C > 0 and a real number (0, 2), such
that, for sufficiently small > 0, the following estimate holds
{zR:|z,h|}
(4)
The above condition is known as Oreys condition in the literature (see Sato [24]). It
is usually used to ensure that the process admits continuous transition densities. The
value is called by the BlumenthalGetoor index of the process. Under it, we have
Lemma 1 Suppose that (4) holds, then there exists constant AR > 0 such that, for
any u Rm and sufficiently large | u|,
234
D. Belomestny et al.
2tC
(5)
(dz)
e R,z 1 cos u, z
exp t
Rd
1 cos u, z R (dz) ,
= AR exp t
Rd
where
AR = exp t
Rd
R,z
e
1 R, z1{|z|1} (dz) < ,
since
R,z
1 R, z1{|z|1}
C1 ( R) |z|2 1{|z|1} + C2 ( R)e R,z 1{|z|>1} .
First, note that the condition (4) is equivalent to the following one
{zR:|z,k|1}
for sufficiently large k Rd , say |k| c0 . To see this, it is enough to change in (4)
the vector h to the vector k. Fix u Rm with |u| 1 and | u| c0 , then using
the inequality 1 cos(x) 22 |x|2 , |x| , we find
2
1 cos u, z R (dz) 2
u, z2 R (dz)
{zR:| u,z|1}
Rd
2C
2
u
.
235
2tC
q(u) := C exp 2
u
,
we know from Lemma 1 that our simulation scheme will have a finite variance.
Discussion
The condition (4) is not very restrictive. We can show that it is true for many commonly used Lvy models in financial applications, such as CGMY, NIG and -stable
models. Below we discuss a special case, which can be viewed as a generalization
of -stable processes.
For simplicity we take R = 0. Clearly, if (Zt ) is a d-dimensional -stable process
which is rotation invariant ((h) = c |h| , for h Rd ), then (4) holds. Consider now
general -stable processes. It is known that Z is -stable if and only if its components
Z 1 , . . . , Z d are -stable and if the Levy copula C of Z is homogeneous of order 1
(see Cont and Tankov [26]), i.e.
C (r 1 , . . . , r d ) = r C (1 , . . . , d )
for all = (1 , . . . , d ) Rd and r > 0. As an example of such homogeneous Levy
copula one can consider
1/
d
j
1 ...
C (1 , . . . , d ) = 22d
1
d 0
(1 )11 ...d <0 ,
j=1
where > 0 and [0, 1]. If the marginal tail integrals given by
j (xj ) = R, . . . , I (xj ), . . . R sgn(xj )
with
I (x) =
(x, ),
x 0,
(, x], x < 0,
are absolutely continuous, we can compute the Lvy measure for the Lvy copula
C by differentiation as follows:
(dx1 , . . . , dxd ) = 1 . . . d C |1 =1 (x1 ),...,d =d (xd ) 1 (dx1 ) . . . d (dxd ),
where 1 (dx1 ), . . . , d (dxd ) are the marginal Lvy measures.
Suppose that the marginal Lvy measures are absolutely continuous with a stablelike behaviour:
j (dxj ) = kj (xj ) dxj =
lj (|xj |)
dxj , j = 1, . . . , d,
|xj |1+
236
D. Belomestny et al.
lj (rxj )
|xj |
, j (xj , r) = 1{xj 0}
1+
xj
xj
k j (s, r) ds.
Since the function G is homogeneous with order 1 d, we get for (0, 1),
{zR:|z,h|}
z, h2 (dz) = 2
{zR:|y,h|1}
y, h2 G 1 (y1 , ), . . . , d (yd , )
h: |h|=1 {zR:|z,h|1}
If for some R = (R1 , . . . , Rd ) the functions exRi li (x), i = 1, . . . , d, are bounded, the
.
condition (4) holds for R (dz) = eR,z (dz).
Of course, the power exponential distribution may not be a proper candidate for
q(u) if the condition (4) fails to hold. Nevertheless, we need to stress that the principle
behind Parsevals identity still applies here and thus our unbiased simulation should
work in that case.
For example, for the variance gamma process Xt with parameters , drift
and volatility of Brownian motion and variance of the subordinator, the Fourier
transform is
u2 2
t
i u) .
E[eiuXt ] = (1 +
2
There exists some constant 1 < <
2t
,
providing
iuX
E[e t ]
<
2t
1
(1 + |u|)
237
q(u) =
g(x)p(x) dx.
Rd
F [g](x)F [p](x) dx.
Rd
Rd
g2 (x)p(x) dx V 2
and
p(0)
|F [g](x)|2 F [p](x) dx V 2
(2 )d Rd
= p(0)
(g g)(x)p(x) dx V 2 ,
Var p [g (X)] =
Rd
238
D. Belomestny et al.
where
(g g)(x) =
As a result,
Rd
2
g (x) p(0)(g g)(x) p(x) dx.
Note that if p(0) > 0 is small, then it is likely that Var p [g(X)] > Var p [g (X)].
This means that estimating V under p with Monte Carlo can be viewed as a variance
reduction method in this case. Apart from the variance reduction effect, the density
p may has in many cases (for example, for infinitely divisible distributions) much
simpler form than p and therefore is easy to simulate from.
5 Numerical Examples
5.1 European Put Option Under CGMY Model
The CGMY process {Xt }t0 with drift is a pure jump process with the Lvy measure
(see Carr et al. [6])
exp(Gx)
exp(Mx)
CGMY (x) = C
1x<0 +
1x>0 , C, G, M > 0, 0 < Y < 2.
|x|1+Y
x 1+Y
As can be easily seen, the Lvy measure CGMY satisfies the condition (4) with = Y .
The characteristic function of XT is given by
(u) = e[eiuXT ] = exp iuT + TC (Y )[(M iu)Y M Y + (G + iu)Y GY ] ,
where
= r C (Y )[(M 1)Y M Y + (G + 1)Y GY
ensures that {ert eXt }t0 is a martingale. Suppose the stock price follows the model
St = S0 eXt ,
then due to (2), for any R < 0, the price of the European put option is given by
rT
e
where
erT
e[(K ST ) ] =
2
+
F [g](iR u)F [p](u iR)du,
(6)
F [g](iR u) =
239
K 1R eiu ln K
, F [p](u iR) = ei(uiR) ln S0 e[ei(uiR)XT ].
(iu + R 1)(iu + R)
To ensure the finiteness of F [p](u iR), we have to select an R such that G <
R < 0. In fact, under such R,
eRx CGMY (x)dx < +,
|x|1
which is equivalent to E[eRXT ] < + (see Sato [24], Theorem 25.17). Therefore,
|F [p](u iR)| eR ln S0 E[eRXT ] < +.
Lemma 1 implies that we can find constants , A, and such that Y , A > 0,
> 0, and
|F [p](u iR)| Ae
|u|
1
1
2 (1 + 1 )
|u|
G<R<0,,
Eq
|F [g](iR U)|2 |F [p](U iR))|2
, U q().
q2 (U)
Since the expectation usually does not have the explicit form, we propose the
following stochastic optimization algorithm to solve the problem.
.
Step 1 Noting that W = |U| is gamma distributed with the density
qW (w) =
( 1 )
w 1 e , w > 0,
1
240
D. Belomestny et al.
arg
min
G<R<0,,
N
1 |F [g](iR Ui )|2 |F [p](Ui iR))|2
N i=1
q2 (Ui )
10.3073
400,000
10.2999
1,600,000
10.2970
100,000(PT)
100,000(KM)
11.6421
10.2938
[10.2896,
10.3251]
[10.2910,
10.3088]
[10.2926,
10.3014]
[9.3455, 13.9387]
[10.2016,
10.3861]
0.0091
0.06
0.0045
0.27
0.0023
1.05
3.5172
0.0471
0.03
13096.13
10.3074
400,000
10.2990
1,600,000
10.2958
[10.2705,
10.3444]
[10.2805,
10.3175]
[10.2866,
10.3050]
Time (s)
Time (s)
0.0188
0.07
0.0094
0.30
0.0047
1.07
241
10.3062
400,000
10.2925
1,600,000
10.2961
[10.2614,
10.3511]
[10.2701,
10.3150]
[10.2849,
10.3073]
Time (s)
0.0229
0.07
0.0114
0.29
0.0057
1.08
= r q ( a2 2 a2 ( + 1)2 )
ensures the martingale condition. Then for any a < R < 0, the price of European put option is given by
rT
erT
e[(K ST ) ] =
2
+
F [g](iR u)F [p](u iR)du,
where
F [g](iR u) =
K 1R eiu ln K
, F [p](u iR) = ei(uiR)(ln S0 +T ) e[ei(uiR)XT ].
(iu + R 1)(iu + R)
1 |u|
e
2
as the importance sampling density, where the parameter can be chosen by minimizing the simulated second moment.
242
D. Belomestny et al.
4.5900
400,000
4.5896
1,600,000
4.5897
[4.5879,
4.5922]
[4.5886,
4.5907]
[4.5891,
4.5902]
RMSE
(direct)
Time
0.0011
0.06
0.0238
0.04
0.0006
0.23
0.0119
0.13
0.0003
0.92
0.0059
0.49
ium ium1
iu2 iu2
m
(uj ) (um+1 iR).
j=1
1
1/1
21
(1 +
1
)
1
|u1 |1
1
1
1/2
22
(1 +
1
)
2
|u2 |2
2
243
Table 5 Barrier option in CGMY model (R = 1.1, 1 = 1.4, 1 = 0.9, 2 = 0.7, 2 = 0.2)
No. of simulation Price
95 %-interval
RMSE
Time (s)
100,000
400,000
1,600,000
1.2235
1.2260
1.2264
[1.2164, 1.2305]
[1.2225, 1.2295]
[1.2247, 1.2282]
0.0036
0.0018
0.0009
0.18
0.61
2.34
References
1. Abate, J., Whitt, W.: The fourier-series method for inverting transforms of probability distributions. Queueing Syst. 10(12), 587 (1992)
2. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis, vol. 57. Springer
Science & Business Media, New York (2007)
3. Barndorff-Nielsen, O.E.: Processes of normal inverse gaussian type. Financ. Stoch. 2(1), 4168
(1997)
4. Biagini, F., Bregman, Y., Meyer-Brandis, T.: Pricing of catastrophe insurance options written
on a loss index with reestimation. Insur.: Math. Econ. 43(2), 214222 (2008)
5. Carr, P., Madan, D.: Option valuation using the fast Fourier transform. J. Comput. Financ. 2(4),
6173 (1999)
6. Carr, P., Geman, H., Madan, D.B., Yor, M.: The fine structure of asset returns: an empirical
investigation. J. Bus. 75(2), 305333 (2002)
7. Chambers, J.M., Mallows, C.L., Stuck, B.: A method for simulating stable random variables.
J. Am. Stat. Assoc. 71(354), 340344 (1976)
8. Chen, N., Hong, L.J.: Monte Carlo simulation in financial engineering. In: Proceedings of the
39th Conference on Winter Simulation, pp. 919931. IEEE Press (2007)
9. Chen, Z., Feng, L., Lin, X.: Simulating Lvy processes from their characteristic functions and
financial applications. ACM Trans. Model. Comput. Simul. (TOMACS) 22(3), 14 (2012)
10. Feng, L., Lin, X.: Inverting analytic characteristic functions and financial applications. SIAM
J. Financ. Math. 4(1), 372398 (2013)
11. Feng, L., Linetsky, V.: Pricing discretely monitored barrier options and defaultable bonds in
Lvy process models: a fast Hilbert transform approach. Math. Financ. 18(3), 337384 (2008)
12. Glasserman, P.: Monte Carlo Methods in Financial Engineering, vol. 53. Springer, New York
(2004)
13. Glasserman, P., Liu, Z.: Sensitivity estimates from characteristic functions. Oper. Res. 58(6),
16111623 (2010)
14. Hurd, T.R., Zhou, Z.: A Fourier transform method for spread option pricing. SIAM J. Financ.
Math. 1(1), 142157 (2010)
15. Kawai, R., Masuda, H.: On simulation of tempered stable random variates. J. Comput. Appl.
Math. 235(8), 28732887 (2011)
16. Kou, S., Petrella, G., Wang, H.: Pricing path-dependent options with jump risk via Laplace
transforms. Kyoto Econ. Rev. 74(1), 123 (2005)
244
D. Belomestny et al.
17. Kwok, Y.K., Leung, K.S., Wong, H.Y.: Efficient options pricing using the fast Fourier transform.
Handbook of Computational Finance, pp. 579604. Springer, Heidelberg (2012)
18. LEcuyer, P.: Non-uniform random variate generations. International Encyclopedia of Statistical Science, pp. 991995. Springer, New York (2011)
19. Lee, R.W., et al.: Option pricing by transform methods: extensions, unification and error control.
J. Comput. Financ. 7(3), 5186 (2004)
20. Lewis, A.L.: A simple option formula for general jump-diffusion and other exponential Lvy
processes. Available at SSRN 282110 (2001)
21. Lord, R., Kahl, C.: Optimal Fourier inversion in semi-analytical option pricing (2007)
22. Poirot, J., Tankov, P.: Monte Carlo option pricing for tempered stable (CGMY) processes.
Asia-Pac. Financ. Mark. 13(4), 327344 (2006)
23. Rudin, W.: Real and Complex Analysis. Tata McGraw-Hill Education, New York (1987)
24. Sato, K.I.: Lvy Processes and Infinitely Divisible Distributions. Cambridge University Press,
Cambridge (1999)
25. Staum, J.: Monte Carlo computation in finance. Monte Carlo and Quasi-Monte Carlo Methods
2008, pp. 1942. Springer, New York (2009)
26. Tankov, P.: Financial Modelling with Jump Processes, vol. 2. CRC Press, Boca Raton (2004)
1 Introduction
Many models from physics, chemistry or biology involve stochastic systems for
different purposes: taking into account uncertainty with respect to data parameters,
C.-E. Brhier (B)
Universit Paris-Est, CERMICS (ENPC), 6-8-10 Avenue Blaise Pascal,
Cit Descartes, 77455 Marne-la-valle, France
e-mail: brehierc@cermics.enpc.fr
C.-E. Brhier
INRIA Paris-Rocquencourt, Domaine de Voluceau - Rocquencourt,
B.P. 105, 78153 Le Chesnay, France
L. Goudenge
Fdration de Mathmatiques de lcole Centrale Paris, CNRS,
Grande voie des Vignes, 92295 Chtenay-Malabry, France
e-mail: goudenege@math.cnrs.fr
L. Tudela
Ensae ParisTech, 3 Avenue Pierre Larousse, 92240 Malakoff, France
e-mail: loic.tudela@ensae-paristech.fr
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_10
245
246
2 1 dWt ,
associated with a potential function V with several local minima. Here W denotes a
d-dimensional standard Wiener process. When the inverse temperature increases,
the transitions become rare events (their probability decreases exponentially fast).
In this paper, we adopt a numerical point of view, and analyze a method which
outperforms a pure Monte-Carlo method for a given computational effort in the small
probability regime (in terms of relative error). Two important families of methods
have been introduced in the 1950s and next have been extensively developed, in order
to efficiently address this rare event estimation problem: importance sampling, and
importance/multilevel splittingsee [11], and [9] for a more recent treatment. We
refer for instance to [12] for a more general presentation.
The method we study in this work is a multilevel splitting algorithm. The main
advantage of this kind of methods is that they are non-intrusive: the model does not
need to be modified in order to obtain a more efficient Monte-Carlo method. The
method we study has an additional feature: adaptive computations (of levels) are
made on-the-fly. To explain more precisely the algorithm and its properties, from
now on we only focus on a simpler, generic setting for the rare event estimation
problem.
Let X be a real random variable, and a be a given threshold. We want to estimate
the tail probability p := P(X > a). The splitting strategy, in the regime when
a becomes large, consists in introducing the following decomposition of p, as a
product of conditional probabilities:
P(X > a) = P(X > an |X > an1 ) . . . P(X > a2 |X > a1 )P(X > a1 ),
for a sequence of levels a1 < . . . < an1 < an = a. The common interpretation of
this formula is that the event that X > a is split in n conditional probabilities for X ,
which are each much larger than p, and are thus easier to estimate.
To optimize the variance, the levels must be chosen such that all the conditional
probabilities are equal to p 1/n , with n as large as possible. However, levels satisfying
this condition are not known a priori in practical cases.
Notice that, in principle, to apply this splitting strategy, one needs to know how to
sample according to the conditional distributions appearing in the splitting formula.
If this condition holds, we say that we are in an idealized setting.
Adaptive techniques based on multilevel splitting, where the levels are computed
on-the-fly, have been introduced in the 2000s in various contexts, under different
names: Adaptive Multilevel Splitting (AMS) [57], Subset simulation [2] and Nested
sampling [13] for instance.
247
In this paper, we focus on the versions of AMS algorithms studied in [3], following
[5]. Such algorithms depend on two parameters: a number of (interacting) replicas
n, and a fixed integer k {1, . . . , n 1}, such that a proportion k/n of replicas are
killed and resampled at each iteration. The version with k = 1 has been studied in
[10], and is also (in the idealized setting) a special case of the Adaptive Last Particle
Algorithm of [14].
A family of estimators ( p n,k )n2,1kn1 is introduced in [3]see (2) and (3).
The main property established there is unbiasedness: for all values n and k the
equality E[ p n,k ] = p holds truenote that this statement is not an asymptotic
result. Moreover, an analysis of the computational cost is provided there, in the
regime n +, with fixed k. However, comparisons, when k changes, are made
using a cumbersome procedure: M independent realizations of the algorithm are
necessary to define a new estimator, as an empirical mean of p 1n,k , . . . , p n,k
M , and
finally one studies the limit when M +. The aim of this paper is to remove this
procedure: we prove directly an asymptotic normality result for the estimator p n,k ,
when n +, with fixed k. Such a result allows to directly rely on asymptotic
Gaussian confidence intervals.
Note that other Central Limit Theorems for Adaptive Multilevel Splitting
estimators (in different parameter regimes for n and k) have been obtained in
[4, 5, 8].
The main result of this paper is Theorem 1: if k and a are fixed, under the assumption that the cumulative
function of X is continuous, when n +,
distribution
the random variable n p n,k p converges in law to a centered Gaussian random
variable, with variance p 2 log( p) (independent of k).
The main novelty of the paper is the treatment of the case k > 1: indeed when
k = 1 (see [10]) the law of the estimator is explicitly known (it involves a Poisson
random variable with parameter n log( p)): the asymptotic normality of log( p n,1 ) is
a consequence of straightforward computation, and the central limit theorem for p n,1
easily follows using the delta-method. When k > 1, the law is more complicated
and not explicitly known; the key idea is to prove that the characteristic function
of log( p n,k ) satisfies a functional equation, following the strategy in [3]; the basic
ingredient is a decomposition according to the first step of the algorithm.
One of the main messages of this paper is thus that the functional equation technique is a powerful tool in order to prove several key properties of the AMS algorithm
in the idealized setting: unbiasedness and asymptotic normality.
The paper is organized as follows. In Sect. 2, we introduce the main objects:
the idealized setting (Sect. 2.1) and the AMS algorithm (Sect. 2.2). Our main result
(Theorem 1) is stated in Sect. 2.3. Section 3 is devoted to the detailed proof of this
result. Finally Sect. 4 contains a numerical illustration of the Theorem.
248
249
0
Define Z 1 = X (k)
, the kth order statistics of the sample X 0 = (X 10 , . . . , X n0 ), and
1 the (a.s.) unique associated permutation: X 0 1 (1) < . . . < X 0 1 (n) .
Set j = 1.
Iterations (on j 1): While Z j < a:
j
n,k
(x) = C
n,k
k
(x) 1
n
J n,k (x)
(2)
with
C n,k (x) =
n,k
1
Card i ; X iJ (x) a .
n
(3)
(4)
250
Notice that the asymptotic variance does not depend on k. As a consequence of this
result, one can define asymptotic Gaussian confidence intervals, for one realization
of the algorithm and n +. However, the speed of convergence is not known
and may depend on the estimated probability p, and on the parameter k.
Thanks to Theorem 1, we can study the cost of the use of one realization of
the AMS algorithm to obtain a given accuracy when n +. In [3], the cost was
analyzed when using a sample of M independent realizations of the algorithm, giving
an empirical estimator, and the analysis was based on an asymptotic analysis of the
variance in the large n limit.
Let be some fixed tolerance error, and > 0. Denote r such that P(Z
[r , r ]) = 1 , where Z is a standard Gaussian random variable.
Then for n large,an asymptotic confidence
interval with level 1 , centered
around p, is [ p
p2 log( p)
,
n
p2 log( p)
].
n
p2 log( p)r2
.
2
p+
251
We emphasize again that even if the exponential case appears as a specific example
(Assumption 2 obviously implies Assumption 1), giving a detailed proof of Proposition 1 is sufficient, thanks to Corollary 3.4 in [3], to obtain our main general result
Theorem 1. Since the exponential case is more convenient for the computations below,
in the sequel we work under Assumption 2. Moreover, we abuse notation: we use the
general notations from Sect. 2, even under Assumption 2.
The following notations will be useful:
f (z) = exp(z)1z>0 (resp. F(z) = 1 exp(z) 1z>0 ) is the density (resp. the
cumulative distribution function) of the exponential law E (1) with parameter 1.
nk
is the density of the kth order statistics
f n,k (z) = k nk F(z)k1 f (z) 1 F(z)
X (k) of a sample (X 1 , . . . , X n ), where the X i are independent and exponentially
distributed, with parameter 1.
252
f n,k (z)dz,
(7)
Straightforward computations (see also [3]) yield the following useful formulae:
d
f n,1 (y; x) = n f n,1 (y; x).
dx
d
(8)
(9)
A natural idea is to introduce the characteristic function of p n,k (x), and to follow the strategy developed in [3]. Nevertheless, we are not able to derive a useful functional equation with respect to the x variable. The strategy we adopt is to
study the asymptotic normality of the logarithm log( p n,k (x)) of the estimator, and
to use a particular case of the delta-method (see for instance [15], Sect. 3): if for
asequence
of real random variables
(n )nN and a real number R one has
n n ) N (0, 2 ), then n exp(n ) exp( ) N 0, exp(2 ) 2 ,
n
n
where convergence is in distribution.
We thus introduce for any t R and any 0 x a
n,k (t, x) := E exp it n log( p n,k (x)) log(P(x)) .
(10)
(11)
253
for which Lemma 1 states a functional equation, with respect to the variable x
[0, a]. By Lvys Theorem, Proposition 1 is a straightforward consequence (choosing
x = 0) of Proposition 2 below.
Proposition 2 For any k N , any 0 x a and any t R
t 2 (x a)
.
n,k (t, x) exp
n+
2
(12)
The rest of this section is devoted to the statement and the proof of four lemmas,
and finally to the proof of Proposition 2.
Lemma 1 (Functional Equation) For any n N and any k {1, . . . , n 1}, and
for any t R, the function x n,k (t, x) is solution of the following functional
equation (with unknown ): for any 0 x a
(t, x) = e
it n log(1 nk )
(13)
k1
eit
n log(1 nl )
(14)
l=0
where (S(x)nj )1 jn are iid with law L (X |X > x) and where S(x)n(l) is the lth order
statistics of this sample (with convention S(x)n(0) = x).
Proof The idea (like in the proof of Proposition 4.2 in [3]) is to decompose the
0
. On the event
expectation according to the value of the first level Z 1 = X (k)
1
n,k
nl
Z > a = J (x) = 0 , the algorithm stops and p n,k (x) = n for the unique
l {0, . . . , k 1} such that S(x)n(l) < a S(x)n(l+1) . Thus
E[eit
1 J n,k (x)=0 ] =
k1
eit
n log(1 nl )
l=0
If Z 1 < a, for the next iteration the algorithm restarts from Z 1 , and
E[eit
1 J n,k (x)>0 ]
n,k
it n log C n,k (x)(1 nk ) J (x)1
it n log(1 nk )
1
E[e
|Z ]1 Z 1 <a
=E e
k
n,k
1
(16)
= eit n log(1 n ) E E[eit n log( p (Z )) |Z 1 ]1 Z 1 <a
k
= eit n log(1 n ) E n,k (t, Z 1 )1 Z 1 <a
a
it n log(1 nk )
=e
n,k (t, y) f n,k (y; x) dy.
x
Then (13) follows from (15), (16) and the definition (11) of n,k .
254
We exploit the functional equation (13) for x n,k (t, x), to prove that this
function is solution of a Linear Ordinary Differential Equation (ODE).
Lemma 2 (ODE) Let n and k {1, . . . , n 2} be fixed. There exist real numbers
n,k and (rmn,k )0mk1 , depending only on n and k, such that for all t R, the
function x n,k (t, x) satisfy the following Linear Ordinary Differential Equation
(ODE) of order k: for x [0, a]
k1
dm
dk
it n log(1 nk ) n,k
(t,
x)
=
e
(t,
x)
+
rmn,k m n,k (t, x).
n,k
n,k
k
dx
dx
m=0
(17)
The coefficients n,k and (rmn,k )0mk1 satisfy the following properties:
k1
n,k = (1)k n . . . (n k + 1)
rmn,k m = ( n) . . . ( n + k 1) for all R.
(18)
m=0
Observe that the ODE (17) is linear and that the coefficients are constant (with
respect to the variable x [0, a], for fixed parameters n, k and t). This nice property
is the main reason why we consider the function n,k (given by (11)) instead of
n,k (given by (10)); moreover it is also the reason why we study the characteristic
function of log( p n,k (x)), instead of the one of p n,k (x).
Proof The proof follows the same lines as Proposition 6.4 in [3]. We introduce
n,k (t, x) :=
k1
eit
n log(1 nl )
l=0
Then by recursion, using the second line in (8), for 0 l k 1 and for any x a
and t R
a
dl
n,k it n log(1 nk )
(t,
x)
(t,
x)
=
e
n,k (t, y) f n,kl (y; x) dy
n,k
n,k
l
d xl
x
l1
m
n,k d
n,k (t, x) n,k (t, x) , (19)
rm,l
+
m
dx
m=0
with the associated recursion
n,k
n,k
k + l + 1)ln,k ;
n,k 0 = 1, l+1 = (nn,k
n,k
rl,l = 1.
(20)
255
Using (19) for l = k 1 and the first line of (8), one eventually obtains, by differentiation, an ODE of order k:
dk
n,k it n log(1 nk )
(t,
x)
(t,
x)
=
e
n,k (t, x)
n,k
n,k
dxk
k1
dm
rmn,k m n,k (t, x) n,k (t, x) , (21)
+
dx
m=0
n,k
n,k
with n,k := n,k
k and r m := r m,k .
It is key to observe that the coefficients n,k and (rmn,k )0mk1 are defined by the
same recursion as in [3]. In particular, they do not depend on the parameter t R.
To see a proof of (18), we refer to Sect. 6.4 in [3].
It is clear that the polynomial equality in (18) is equivalent to the following
identity: for all j {0, . . . , k 1}
k1
dm
dk
exp
k
+
j
+
1)(x
a))
=
rmn,k m exp ((n k + j + 1)(x a)) .
((n
k
dx
dx
m=0
Due to the definition of the cumulative distribution functions of order statistics (7),
one easily checks that n,k (t, .) is a linear combination of the exponential functions
x exp(nx), . . . , exp((n k + 1)x); therefore
k1
m
dk
n,k d
(t,
x)
=
r
n,k (t, x).
n,k
m
dxk
dxm
m=0
Thus the terms depending on n,k in (21) cancel out, and thus (17) holds true.
The next steps are to give an explicit expression of the solution of (17) as a linear
combination of exponential functions, and to study the coefficients and the modes in
the asymptotic regime n +. Since the ODE is of order k, in order to uniquely
determine the solution, more information is required: we need to know the derivatives
of order 0, 1, . . . , k 1 of x n,k (t, x) at some point. We choose the terminal
point x = a (notice that by the change of variable x a x the ODE (17) can then
be seen as an ODE with an initial condition). This is the content of Lemma 3 below.
Lemma 3 (Terminal condition) For any fixed k {1, . . . , } and any t R, we have
n,k (t, a) = 1
dm
(t, x)
d x m n,k
= O( 1n )n m
x=a n
if m {1, . . . , k 1} .
(22)
Proof The equality n,k (t, a) = 1 is trivial, since p n,k (a) = 1. Equations (19) and
(21), immediately imply (by recursion) that for 1 m k 1
256
dm
dm
(t,
x)
=
(t,
x)
.
n,k
n,k
x=a
x=a
dxm
dxm
Introduce the following decomposition
n,k (t, x) =
k1
l
(eit n log(1 n ) P(S(x)n(l) < a S(x)n(l+1) )
l=0
k1
eit
n log(1 nl )
1 Fn,l (a; x) Fn,l+1 (a; x)
l=0
k1
l=0
F
(a;
x)
= 0.
n,k
x=a
dxm
k
l=1
l
n,k
(t)en,k (t)(xa) ,
l
(24)
257
1n,k (t) = it n +
n
1
n,k
(t)
n
and for 2 l k
t2
2
+ o(1),
(25)
1;
i2(l1)
k
l
(t)
n,k
n
),
(26)
0.
Proof We denote by (ln,k (t))1lk the roots of the characteristic equation associated
with the linear ODE with constant coefficient (17) (with unknown C): thanks to
(18)
(n )...(n k + 1 )
k
eit n log(1 n ) = 0
n...(n k + 1)
By the continuity property of the roots of a complex polynomial of degree k with
respect to its coefficients, we have
l
n,k (t) :=
ln,k (t)
n
where ( (t))1lk are the roots of (1 )k = 1: thus 1n,k (t) and = o(n),
n
i2(l1)
k
).
To study more precisely the asymptotic behavior of 1n,k (t), we postulate an ansatz
1
itk
t 2k2
it n log(1 nk )
e
+o
.
= 1
n
2n
n
n
In particular, for n large enough, (ln,k (t))1lk are pairwise distinct, and (24) follows.
l
(t))1lk are solutions of the following linear system
Then the coefficients (n,k
of equations of order k:
258
k
1
1
k1
k
k1
1
1 d k1
k
+ ... + n,k
(t) n,k (t)
= n k1
(t, a).
n,k (t) n,k (t)
d x k1 n,k
(27)
l
Using Cramers rule, we express each n,k
(t) as a ratio of determinants (the denominator is a Vandermonde determinant and is non zero when n is large enough). For
l {2, . . . , k}, we have
l
(t) =
n,k
l
det(Mn,k
(t))
1
0,
n+
l
Mn,k
(t)
1
n,k (t)
2
n,k (t)
..
..
.
.
k1 2
k1
1
n,k (t)
n,k (t)
... 1 ...
1
k
. . . O( 1n ) . . . n,k (t)
..
..
..
..
.
.
.
.
k
k1
1
. . . O( n ) . . . n,k (t)
l
is such that det(Mn,k
(t)) 0 (since n,k (t) 0), while the denominator is the
n+
1
1
k
k
Vandermonde determinant V n,k (t), . . . , n,k (t)
V (t), . . . , (t)
= 0.
!k
l
1
(t) = 1 l=2
n,k
(t)
Finally, n,k
Lemma 4.
n+
n+
We
are
now in position to prove Proposition 2. Indeed, recall that n,k (t, x) =
exp it n(x a) n,k (t, x) thanks to (10) and (11). Then taking the limit n +
thanks to Lemma 4 gives the convergence of the characteristic function n,k .
4 Numerical Results
In this section, we provide numerical illustration of the Central Limit Theorem 1. We
apply the algorithm with an exponentially distributed random variable with parameter
1this is justified by the discussion in Sect. 3.1.
In the simulations below, the estimated probability is e6 ( 2.48 103 ).
In Fig. 1, we fix the value k = 10, and we show histograms for n = 102 , 103 , 104 ,
with different values for the number M independent realizations of the algorithm,
such that n M = 108 (we thus have empirical variance of the same order for all
cases). In Fig. 1, we give the associated Q-Q plots, where the empirical quantiles of
259
Fig. 1 Histograms for k = 10 and p = exp(6): n = 102 , 103 , 104 from left to right
Fig. 2 Q-Q plot for k = 10 and p = exp(6): n = 102 , 103 , 104 from left to right
Fig. 3 Histograms for n = 104 and p = exp(6): k = 1, 10, 100 from left to right
the sample are compared with the exact quantiles of the standard Gaussian random
variable (after normalization).
In Fig. 3, we show histograms for M = 104 independent realizations of the AMS
algorithm with n = 104 and k {1, 10, 100}; we also provide associated Q-Q plots
in Fig. 4.
From Figs. 1 and 2, we observe that when n increases, the normality of the estimator is confirmed. Moreover, from Figs. 3 and 4, no significant difference when k
varies is observed.
260
Fig. 4 Q-Q plot for n = 104 and p = exp(6): k = 1, 10, 100 from left to right
Acknowledgments C.-E. B. would like to thank G. Samaey, T. Lelivre and M. Rousset for the
invitation to give a talk on the topic of this paper at the 11th MCQMC Conference, in the special
session on Mathematical aspects of Monte Carlo methods for molecular dynamics. We would also
like to thank the referees for suggestions which improved the presentation of the paper.
References
1. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Springer, New
York (2007)
2. Au, S.K., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset
simulation. J. Probab. Eng. Mech. 16, 263277 (2001)
3. Brhier, C.E., Lelivre, T., Rousset, M.: Analysis of adaptive multilevel splitting algorithms in
an idealized case. ESAIM Probab. Stat., to appear
4. Crou, F., Del Moral, P., Furon, T., Guyader, A.: Sequential Monte Carlo for rare event estimation. Stat. Comput. 22(3), 795808 (2012)
5. Crou, F., Guyader, A.: Adaptive multilevel splitting for rare event analysis. Stoch. Anal. Appl.
25(2), 417443 (2007)
6. Crou, F., Guyader, A.: Adaptive particle techniques and rare event estimation. In: Conference
Oxford sur les mthodes de Monte Carlo squentielles, ESAIM Proceedings, vol. 19, pp. 6572.
EDP Sci., Les Ulis (2007)
7. Crou, F., Guyader, A., Lelivre, T., Pommier, D.: A multiple replica approach to simulate
reactive trajectories. J. Chem. Phys. 134, 054108 (2011)
8. Crou, F., Guyader, A., Del Moral, P., Malrieu, F.: Fluctuations of adaptive multilevel splitting.
e-preprints (2014)
9. Glasserman, P., Heidelberger, P., Shahabuddin, P., Zajic, T.: Multilevel splitting for estimating
rare event probabilities. Oper. Res. 47(4), 585600 (1999)
10. Guyader, A., Hengartner, N., Matzner-Lber, E.: Simulation and estimation of extreme quantiles and extreme probabilities. Appl. Math. Optim. 64(2), 171196 (2011)
11. Kahn, H., Harris, T.E.: Estimation of particle transmission by random sampling. Natl. Bur.
Stand. Appl. Math. Ser. 12, 2730 (1951)
12. Rubino, G., Tuffin, B.: Rare Event Simulation using Monte Carlo Methods. Wiley, Chichester
(2009)
13. Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal. 1(4), 833859
(2006)
14. Simonnet, E.: Combinatorial analysis of the adaptive last particle method. Stat. Comput. (2014)
15. van der Vaart, A.W.: Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic
Mathematics, vol. 3. Cambridge University Press, Cambridge (1998)
1 Introduction
In this paper we compare two classes of low discrepancy sequences which have been
introduced relatively recently and which have an interesting overlap.
We are interested in -adic van der Corput sequences, which have been introduced
in [3, 22, 23]. Their motivation stems from algebraic arguments and is related to the
-adic representation of real numbers introduced by [24]. They have been studied
I. Carbone (B)
Department of Mathematics and Informatics, University of Calabria,
Ponte P. Bucci Cubo 30B, 87036, Arcavacata di Rende, Cosenza, Italy
e-mail: i.carbone@unical.it
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_11
261
262
I. Carbone
quite extensively. For good references on the subject we suggest [3, 18, 22, 23]. For
the original definition of van der Corput sequence see [25].
The other, more recent, class is represented by the L S-sequences which have been
introduced in [4] and have been object of several papers. These sequences have a
more geometric motivation and are related to a generalization of an idea of Kakutani
([20]), which appeared in [26]. For another generalization of the Kakutani splitting
procedure to the multidimensional setting see [8]; for two possible generalizations
of L S-sequences to dimension 2, see [6]. Other papers dedicated to the subject are
[911, 19].
As it has been shown in [4], each L S-sequence of points corresponds to the
reordering by a suitable algorithm of the points defining a specific sequence of
partitions of [0, 1[, which depends on two nonnegative integers L and S, such that
L + S 2, L 1 and S 0. These sequences will be defined in the next section.
An interesting role is played by the (1, 1)-sequence, called in [4] the Kakutani
Fibonacci sequence of points, which has also been studied from an ergodic point of
view in [7, 18].
Each -adic van der Corput sequence is associated to a characteristic equation
x d = a0 x d1 + a1 x d2 + + ad1 , with some restrictions on the coefficients.
These restrictions imply, when d = 2, that a1 = a0 or a1 = a0 + 1. In the latter case
-adic sequences are nothing else but the classical van der Corput sequences with
base b = a0 1.
This paper is concerned with the study of the interesting overlap between -adic
sequences of order two (with a0 = a1 ) and the corresponding L S-sequences for
L = S.
It should be noted that both families of sequences are much richer: the -adic
sequences can be defined for any order d 2 (with appropriate restrictions on the
coefficients), while the L S-sequences can be defined for any pair of positive integers
L , S and have low discrepancy whenever L S.
The main result is Theorem 2, which states that for L = S the (L , L)-sequence
and the -adic van der Corput sequence, which corresponds to the positive root of
x 2 L x L, can be obtained from each other by a permutation. In particular, when
L = S = 1, the KakutaniFibonacci (1, 1)-sequence coincides with the -adic
van der Corput sequence where = is the golden ratio, i.e. the positive root of
x 2 x 1 (see also [18] for more details).
-adic sequences and L S-sequences provide, in dimension 1, low-discrepancy
sequences.
Having new low-discrepancy sequences at our disposal, it is important to obtain
a more complete understanding of their behavior in order to use them in the Quasi
Monte Carlo method, pairing them la Halton (as it has been done in [16] and in
[17]). This is the main motivation of this paper.
For the L S-sequences the problem has been posed for the first time in [6]. It should
be noted that partial negative results have been obtained quite recently by [2]. This is
one of the most interesting open problems concerning L S-sequences. On the other
hand, a recent result ([18]) proved uniform distribution of the Halton type sequences
for -adic van der Corput sequences.
263
264
I. Carbone
(1)
M
ak (n) bk ,
(2)
k=0
with ak (n) {0, 1, . . . , b 1} for all 0 k M, and M = logb n (here and in the
sequel denotes the integer part). The expression (2) leads to the representation
in base b of n
(3)
[n]b = a M (n)a M1 (n) . . . a0 (n) .
If n = 0, we write [0]b = 0. The representation of n in base b given by (2) is used
to define the radical-inverse function b on N which associates the number
b (n) =
M
ak (n)bk1
265
(4)
k=0
to the string of digits (3), whose representation in base b is 0.a0 (n)a1 (n) . . . a M (n).
Of course 0 b (n) < 1 for all n 0.
The sequence {b (n)}n0 is the van der Corput sequence in base b.
Definition 4 (Carbone, [5]) Let be the positive solution of the equation Sx 2 +
L x = 1. We denote by N L ,S the set of all positive integers n, ordered by magnitude,
/ E L ,S for all
with [n] L+S = a M (n) a M1 (n) . . . a0 (n) such that (ak (n), ak+1 (n))
0 k M 1. If S = 0, we have N L ,S = N. For all n N L ,S we define the
L S-radical inverse function as follows:
L ,S (n) =
M
a k (n) k+1 ,
(5)
k=0
266
I. Carbone
1,1 (n) =
M
ak (n) k+1 ,
(6)
k=0
with the same coefficients ak (n) of the representation of n given by (3) for b = 2.
n
}nN or {1,1 (n)}nNL ,S for the KakutaniFibonacci
We will use the notation {1,1
sequence of points.
We conclude this section with some basic notions on numeration systems with
respect to a linear recurrence base sequence (for more details see [13]).
If G = {G n }n0 is an increasing sequence of natural numbers with G 0 = 1, any
n N can be expanded with respect to this sequence as follows:
n=
k (n)G k .
(7)
k=0
N
This expansion is finite and unique if for every N N we have k=0
k (n)G k <
G N +1 . G is called numeration system and (7) the G-expansion of n. The digits k
can be computed by the greedy algorithm (see, for instance, [14]).
Let us consider now a special numeration system, where the base sequence is a
linear recurrence of order d 1, namely
G n+d = a0 G n+d1 + + ad1 G n ,
n 0,
(8)
(9)
(n) =
k (n) k1 ,
(10)
k=0
267
3 Results
In order to compare L S-sequences and -adic van der Corput sequences, let us recall
that the sequence { (n)}n0 defined by (10) is not necessarily contained and dense
in [0, 1[. A partial answer can be found in [3], where it is proved that if is the Pisot
root of the characteristic Eq. (9) associated to the numeration system G defined by
(8), where a0 = = ad1 , then the sequence { (n)}n0 is uniformly distributed
in [0, 1[ and has low discrepancy. In this case, the sequence is called the Multinacci
sequence.
A complete answer has been given very recently by [18], where the authors proved
the following result.
Lemma 1 (HoferIacTichy, [18]) Let a = (a0 , . . . , ad1 ), where the integers
a0 , . . . , ad1 0 are the coefficients of the numeration system G and assume that
the corresponding characteristic root satisfies (9). Furthermore, assume that there
is no b = (b0 , . . . , bk1 ) with k < d such that is the characteristic root of
the polynomial defined by b. Then (N) [0, 1[ and (N) [0, x[ for some
0 < x < 1 if and only if a can be written either as
a = (a0 , . . . , a0 )
(11)
a = (a0 , a0 1, . . . , a0 1, a0 ),
(12)
or
where a0 > 0.
We notice that the above lemma does not require the assumption of decreasing
coefficients. In [18] it is also observed that, if the condition that d has to be minimal
is dropped, then there exist two more cases in which the above theorem is satisfied.
We are interested in the following case:
a = (a0 , . . . , a0 , a0 + 1).
(13)
From now on we shall restrict our attention to the case d = 2, and consequently
to (11) and (12). Let us consider the numeration system G = {G n }n0 defined by the
linear recurrence of order d = 2
G n+2 = a0 G n+1 + a1 G n , n 0,
with the initial conditions
(14)
268
I. Carbone
G 0 = 1 and G 1 = a0 + 1.
(15)
is
the
does not contain two consecutive digits equal to 1. Moreover, = 51
2
solution of the equation + 2 = 1. If we consider now the linear recurrence (14),
namely G n+2 = G n+1 + G n with the initial conditions (15)given by G 0 = 1 and
G 1 = 2, we have already noticed that the golden ratio = 1+2 5 is the solution of the
equation 2 = + 1 and that 1 = . Furthermore, it is clear that {G n }n0 = {tn }n0 ,
where tn is the total number of intervals of the nth partition of the KakutaniFibonacci
n
} defined in Sect. 2, which satisfies tn+2 = tn+1 + tn , with
sequence of partitions {1,1
0
t0 = 1 and t1 = 2. Here tn (0) = 1 corresponds to 1,1
= [0, 1[.
The coefficients k (n) of the related -adic van der Corput sequence { (n)}n0
defined by (10) can be evaluated with the greedy algorithm: it is very simple to
see that k (n) {0, 1} and that the expansion (7) does not contain two consecutive
coefficients equal to 1. In both representations, the -adic Monna map and the (1, 1)radical inverse function coincide on their domain and the proof is complete.
This result appears also in [18].
Now we prove the statement of the theorem in the case L = S 2, showing that
the set of the images of the radical inverse function L ,L (n) defined by (5) coincides
with the set of the images of the -adic Monna map (n) defined by (10).
More precisely, we consider n N L ,L . According to Definition 4, n has a representation [n]2L = a M (n) a M1 (n) . . . a0 (n) in base 2L such that (ak (n), ak+1 (n))
M
a k (n) k+1 ,
(16)
k=0
269
We now restrict our attention to the digits a k (n) in the case L ak (n) 2L 1.
If we put ak (n) = L + m, with 0 m L 1, we can write a k (n) = L + m .
Consequently, we have a k (n) k+1 = L k+1 + m k+2 .
From the condition (ak (n), ak+1 (n))
/ E L ,L we derive that ak+1 (n) must be equal
to 0, and that ak1 (n) has to belong to the set {0, 1, . . . , L 1}. Three consecutive
powers of can be grouped in the partial sum
a k1 (n) k + a k (n) k+1 + a k+1 (n) k+2 = ak1 (n) k + L k+1 + m k+2 ,
and in (16) we also admit two consecutive digits belonging to the set {L}{1, . . . , L
1}.
Taking the set E L ,L into account, (16) can be written with new coefficients ak
(n),
(n))
/ E L
,L , where
which are nonnegative integer numbers such that (ak
(n), ak+1
E L
,L = E L ,L \ {L} {0, 1, . . . , L 1} =
= {L + 1, . . . , 2L 1} {1, . . . , 2L 1} {L} {L , . . . , 2L 1} . (17)
Now we consider the -adic van der Corput sequence { (n)}n0 , where
(n) =
k (n) k1 ,
k=0
References
1. Aistleitner, C., Hofer, M.: Uniform distribution of generalized Kakutanis sequences of partitions. Annali di Matematica Pura e Applicata (4). 192(4), 529538 (2013)
2. Aistleitner, C., Hofer, M., Ziegler, V.: On the uniform distribution modulo 1 of multidimensional
L S-sequences. Annali di Matematica Pura e Applicata (4). 193(5), 13291344 (2014)
3. Barat, G., Grabner, P.: Distribution properties of G-additive functions. J. Number Theory 60,
103123 (1996)
270
I. Carbone
4. Carbone, I.: Discrepancy of L S sequences of partitions and points. Annali di Matematica Pura
e Applicata (4). 191(4), 819844 (2012)
5. Carbone, I.: Extension of van der Corput algorithm to L S-sequences. Appl. Math. Comput.
255, 2072013 (2015)
6. Carbone, I., Iac, M.R., Volcic, A.: L S-sequences of points in the unit square. submitted
arXiv:1211.2941 (2012)
7. Carbone, I., Iac, M.R., Volcic, A.: A dynamical system approach to the Kakutani-Fibonacci
sequence. Ergod. Theory Dyn. Syst. 34(6), 17941806 (2014)
8. Carbone, I., Volcic, A.: Kakutani splitting procedure in higher dimension. Rendiconti
dellIstituto Matematico dellUniversit di Trieste 39, 119126 (2007)
9. Carbone, I., Volcic, A.: A von Neumann theorem for uniformly distributed sequences of partitions. Rendiconti del Circolo Matematico di Palermo 60(12), 8388 (2011)
10. Chersi, F., Volcic, A.: -equidistributed sequences of partitions and a theorem of the de BruijnPost type. Annali di Matematica Pura e Applicata 4(162), 2332 (1992)
11. Drmota, M., Infusino, M.: On the discrepancy of some generalized Kakutanis sequences of
partitions. Unif. Distrib. Theory 7(1), 75104 (2012)
12. Drmota, M., Tichy, R.F.: Sequences Discrepancies and Applications. Lecture Notes in Mathematics. Springer, Berlin (1997)
13. Fraenkel, A.S.: Systems of numeration. Am. Math. Mon. 92(2), 105114 (1985)
14. Frougny, C., Solomyak, B.: Finite beta-expansions. Ergod. Theory Dyn. Syst. 12, 713723
(1992)
15. Grabner, P., Hellekalek, P., Liardet, P.: The dynamical point of view of low-discrepancy
sequences. Unif. Distrib. Theory 7(1), 1170 (2012)
16. Halton, J.H.: On the efficiency of certain quasi-random sequences of points in evaluating multidimensional integrals. Numerische Mathematik 2, 8490 (1960)
17. Hammersley, J.M.: Monte-Carlo methods for solving multivariate problems. Ann. N. Y. Acad.
Sci. 86, 844874 (1960)
18. Hofer, M., Iac, M.R., Tichy, R.: Ergodic properties of the -adic Halton sequences. Ergod.
Theory Dyn. Syst. 35, 895909 (2015)
19. Infusino, M., Volcic, A.: Uniform distribution on fractals. Unif. Distrib. Theory 4(2), 4758
(2009)
20. Kakutani, S.: A problem on equidistribution on the unit interval [0, 1[. In: Measure theory
(Proc. Conf., Oberwolfach, 1975), Lecture Notes in Mathematics 541, pp. 369375. Springer,
Berlin (1976)
21. Kuipers, L., Niederreiter, H.: Unif. Distrib. Seq. Pure and Applied Mathematics. Wiley, New
York (1974)
22. Ninomiya, S.: Constructing a new class of low-discrepancy sequences by using the -adic
transformation. IMACS Seminar on Monte Carlo Methods (Brussels, 1997). Math. Comput.
Simul. 47(25), 403418 (1998)
23. Ninomiya, S.: On the discrepancy of the -adic van der Corput sequence. J. Math. Sci. 5,
345366 (1998)
24. Rnyi, A.: Representations for real numbers and their ergodic properties. Acta Mathematica
Academiae Scientiarum Hungaricae 8, 477493 (1957)
25. van der Corput, J.G.: Verteilungsfunktionen. Proc. Koninklijke Nederlandse Akademie Van
Wetenschappen 38, 813821 (1935)
26. Volcic, A.: A generalization of Kakutanis splitting procedure. Annali di Matematica Pura e
Applicata (4). 190(1), 4554 (2011)
271
272
1 Introduction
The efficient approximation of high-dimensional integrals is a core task in many areas
of scientific computing. We mention only uncertainty quantification, computational
finance, computational physics and chemistry, and computational biology. In particular, high-dimensional integrals arise in the computation of statistical quantities of
solutions to partial differential equations with random inputs.
In addition to efficient spatial and temporal discretizations of partial differential
equation models, it is important to devise high-dimensional quadrature schemes that
are able to exploit an implicitly lower-dimensional structure in parametric input data
and solutions of such PDEs. The rate of convergence of Monte Carlo (MC) methods is
dimension-robust, i.e. the convergence rate bound holds with constants independent
of the problem dimension provided that the variances are bounded independent of
the dimension, but it is limited to 1/2. Thus it is important to devise integration
methods which converge of higher order than 1/2, independent of the dimension of
the integration domain.
In recent years, numerous approaches to achieve this type of higher-order convergence have been proposed; we mention only quasi Monte-Carlo integration, adaptive
Smolyak quadrature, adaptive polynomial chaos discretizations, and related methods.
In the present paper, we consider the realization of novel higher-order interlaced
polynomial lattice rules introduced in [6, 10, 11], which allow an integrand-adapted
construction of a quasi-Monte Carlo quadrature rule that exploits sparsity of the
parameter-to-solution map. We consider in what follows the problem of integrating
a function f : [0, 1)s R of s variables y1 , . . . , ys over the s-dimensional unit
cube,
f (y1 , . . . , ys ) dy1 dys .
(1)
I [ f ] :=
[0,1)s
N 1
1
f (x (n) ).
N n=0
273
In Sect. 2 we first define in more detail the structure of the point set P considered
throughout and derive worst-case error bounds for integrand functions which belong
to certain weighted spaces of functions introduced in [13]. Then, the component-bycomponent construction is reviewed and the worst-case error reformulated to allow
efficient computation. The main contribution of this paper is found in Sects. 4 and 5,
which mention some practical considerations required for efficient implementation
and application of these rules. In Sect. 5, we give measured convergence results for
several model integrands, showing the applicability of these methods.
2.1 Definitions
For a given prime number b, let Zb denote the finite field of order b and Zb [x] the set
of polynomials with coefficients in Zb . Let P Zb [x] be an irreducible polynomial
of degree m. Then, the finite field of order bm is isomorphic to the residue class
(Zb [x]/P, +, ), where both operations are carried out in Zb [x] modulo P. We denote
by G b,m = ((Zb [x]/P) , ) the cyclic group formed by the nonzero elements of the
residue class together with polynomial multiplication modulo P.
Throughout, we frequently interchange an integer n, 0 n < N = bm , with its
associated polynomial n(x) = 0 + 1 x + 2 x 2 + . . . + m1 x m1 , the coefficients
of which are given by the b-adic expansion n = 0 + 1 b + 2 b2 + . . . + m1 bm1 .
Given a generating vector q G sb,m , we have the following expression for the
ith component of the nth point x(n) [0, 1)s of a polynomial lattice point set P:
xi(n) = vm
n(x)q (x)
i
, i = 1, . . . , s, n = 0, . . . , N 1,
P(x)
1
where the mapping
any integer w by the
vm :Zb ((xm)) [0, 1)is given for 1
=
b
,
and
Z
((x
)) denotes the set of
expression vm
b
=w
=min(1,w)
k
formal Laurent series
a
x
with
a
Z
for
some
integer
w.
k
b
k=w k
274
A key ingredient for obtaining QMC formulas which afford higher-order convergence rates is the interlacing of lattice point sets, as introduced in [1, 2]. We define
the digit interlacing function, which maps points in [0, 1) to one point in [0, 1).
Definition 1 (Digit Interlacing Function) We define the digit interlacing function
D with interlacing factor N acting on the points {x j [0, 1), j = 1, . . . , }
by
D (x1 , . . . , x ) =
j,a b j(a1) ,
a=1 j=1
xi
n(x)q(i1)+1 (x)
n(x)q(i1)+ (x)
, . . . , vm
, i = 1, . . . , s,
= D vm
P(x)
P(x)
i.e. the ith coordinate of the nth point is obtained by interlacing a block of coordinates.
2.2.1
In order to derive a worst-case error (WCE) bound, consider the higher-order unanchored Sobolev space Ws,, ,q,r := { f L 1 ([0, 1)s ) : f s,, ,q,r < } which is
defined in terms of the higher order unanchored Sobolev norm
f s,, ,q,r :=
u{1:s}
|v|
[0,1]
uq
vu u\v {1:}|u\v|
[0,1]s|v|
( ,
,0)
( y v u\v
r/q
1/r
f )( y) d y{1:s}\v d yv
,
(2)
275
2.2.2
Error Bound
The worst-case error eWC (P, W ) of a point set P = { y(0) , . . . , y(b 1) } over the
function space W is defined by the following supremum over the unit ball in W :
m
Assume that 1 r, r
with 1/r + 1/r
= 1 and , s N with > 1. Define a
collection of positive weights = (u )uN . Then, by [6, Theorem 3.5], we have the
following bound on the worst-case error in the space Ws,, ,q,r ,
sup
f Ws,, ,q,r 1
|I [ f ] QP [ f ]| es,, ,r
(P),
es,, ,r
(P) =
=u{1:s}
|u|
C,b
u
r
1/r
b (ku ) .
(3)
ku Du
276
bounding the Walsh coefficients of functions in Sobolev spaces, see [3, Thm.14] for
details. Here, it has the value
1
2
C,b = max
,
max
(2 sin b ) z=1,...,1 (2 sin b )z
1
2 2b + 1
1
.
(4)
3+ +
1+ +
b b(b + 1)
b
b1
The bound (3) holds for general digital nets; however, we wish to restrict ourselves
to polynomial lattice rules. We additionally choose r
= 1 (and thus r = , i.e. the
norm over the sequence indexed by u {1 : s} in the norm s,, ,q,r ). We
a point set in s dimensions, and use in the following the definition
denote by P
logb y(1) b 1
where (0) = bb1
(y) = bb1
b b
b . Using [6, Theorem 3.9], we
b b
bound the sum over the dual net Du in (3) by a computationally amenable expression,
b 1
1
es,, ,1 (P) E s (q) = m
v
(y (n)
j ),
b n=0 v{1:s} jv
m
y(n) P,
(5)
v=
n(x)q j (x)
where y (n)
depends on the jth component of the generating vector,
j = vm
P(x)
v , v {1 : s} depends on the choice of weights u .
q j (x), and the auxiliary weight
Assume given a sequence ( j ) j p (N) for 0 < p < 1 and denote by u(v) {1 :
s} an indicator set containing a dimension i {1, . . . , s} if any of the corresponding
dimensions {(i 1) + 1, . . . , i} is in v {1 : s}. This can be given explicitly
by u(v) = { j/ : j v}. For product weights, we define
v =
j = C,b b(1)/2
j,
!2(,) j ,
(6)
=1
ju(v)
u=
(7)
u(v)=u
u(v) {1:}|u(v)|
| u(v) |!
ju(v)
277
v=
{1:}|u(v)|
||!
j ( j )
ju(v)
(y (n)
)
.
j
(9)
jv
3 Component-by-Component Construction
The component-by-component construction (CBC) [12, 18, 19] is a simple but nevertheless effective algorithm for computing generating vectors for rank-1 lattice rules,
of both standard and polynomial type. In each iteration of the algorithm, the worstcase error is computed for all candidate elements of the generating vector, and the one
with minimal WCE is taken as the next component. After s iterations, a generating
vector of length s is obtained, which can then be used for QMC quadrature.
Nuyens and Cools reformulated in [15, 16] the CBC construction to exploit the
cyclic structure inherent in the point sets for standard lattice rules when the number
of points N is a prime number. This leads to the so-called fast CBC algorithm based
on the fast Fourier transform (FFT) which speeds up the computation drastically. It
is also the basis for the present construction.
Fast CBC is based on reformulating (7) and (9): instead of iterating over the index
d = 1, . . . , smax , we iterate over the dimension s = 1, . . . , smax and for each s
over t = 1, . . . , . Thus, the index d above is replaced by the pair s, t through
d = (s 1) + t and we write
(n)
y (n)
j,i = y( j1)+i ,
j = 1, . . . , smax , i = 1, . . . , .
(10)
In order to obtain an efficient algorithm we further reformulate (7) and (9) such that
only intermediate quantities are updated instead of recomputing E d (q) in (7) and (9).
(11)
278
(n)
b 1
s
(n)
+ m
(ys,t
)Vs,t1 (n)Ys1 (n),
b n=1
m
(12)
(n)
. This reformulation permits
where only (12) depends on the unknown qs,t through ys,t
efficient computation of the worst-case error bound E s,t during the CBC construction
by updating intermediate quantities.
||=
j >0
jv
(y (n)
j ).
(13)
279
1
bm
bm 1 s
n=0
s
j ( j )
1 + (y (n)
j,i ) 1 .
{0:}s j=1
||= j >0
i=1
Proceeding as in the product weight case, we separate out the E s1, (q s1, ) term,
E s,t (q) = E s1, (q s1, )
bm 1
t
s min(,)
1
!
(n)
Us1,s (n) .
+ m
(1 + (ys,i )) 1
s (s )
b
( s )!
n=0
=1 s =1
i=1
min(,)
!
Defining Vs,t (n) as above and with Ws (n) = s
s (s ) (
Us1,s
s =1
=1
s )!
(n), we again aim to isolate the term depending on the unknown qs,t . This yields
E s,t (q) = E s1, (q s1, ) +
1 b 1 t
1 Ws (0)
m
b
b b
b 1
1
(Vs,t1 (n) 1)Ws (n)
+ m
b n=1
(14)
b 1
1
(n)
+ m
Vs,t1 (n)Ws (n)(ys,t
),
b n=1
(15)
(n)
.
where only the last sum (15) depends on qs,t through ys,t
The remaining terms can be ignored, since the error E(q d1 , z) is shifted by the
same amount for all candidates z G b,m . This optimization saves O(N ) operations
due to the omission of the sum (14). An analogous optimization is possible in the
product weight case. Since the value of the error bound E smax , (q) is sometimes a
useful quantity, one may choose to compute the full bounds given above.
280
n(x)q(x)
:= vm
P(x)
1nbm 1
qG b,m
(16)
with the vector consisting of the component-wise product Vs,t1 (n)Ws (n) 1nbm 1 .
The elements of depend on n(x)q(x), which is a product of polynomials in G b,m .
Since the nonzero elements of a finite field form a cyclic group under multiplication,
there exists a primitive element g that generates the group, i.e. every element of G b,m
can be given as some exponent of g.
By using the so-called Rader transform, originally developed in [17], the rows
and columns of can be permuted to obtain a circulant matrix perm . Application
of the fast Fourier transform allows the multiplications (12) and (15) to be executed
in O(N log N ) operations. This technique was applied to the CBC algorithm in [16];
we also mention the exposition in [8, Chap. 10.3].
The total work complexity is O(s N log N + 2 s 2 N ) for SPOD weights and
O(s N log N ) for product weights [6, Theorems 3.1, 3.2]. In Sect. 5, we show measurements of the CBC construction time that indicate that the constants in these
asymptotic estimates are small, allowing these methods to be applied in practice.
3.4 Algorithms
In Algorithms 1 and 2 below, V, W, Y, U() and X() denote vectors of length N . E
is a vector of length N 1 and E old , E 1 , E 2 are scalars. By we denote componentwise multiplication and z,: denotes the zth row of .
Algorithm 1 CBC_product(b, m, , smax , {1 , . . . , s })
Y 1 bm , E old 0
for s = 1, . . . , smax do
V1
for t = 1, .
. . , do
1 t
E 1 s bb b
1 Y(0)
bm 1
E 2 s n=1 V(n) 1 Y(n)
E s (V Y) + (E old + E 1 + E 2 ) 1
qs,t argminqG b,m E(q)
V (1 + qs,t ,: ) V
end for
Y 1 + s (V 1) Y
E old E(qs, )
end for
return q, E old
281
4 Implementation Considerations
4.1 Walsh Coefficient Bound
The definition of the auxiliary weights (6) and (8) contain powers of the Walsh
constant bound C,b defined in (4), which for b = 2 is bounded from below by
2
C,2 = 29 53
29 . For base b = 2, it was recently shown in [20] that C,2 can
be replaced by C = 1. Large values of the worst-case error bounds (7) and (9) have
been found to lead to generating vectors with bad projections. For integrand functions
with small Walsh coefficients, C,b may be replaced with a tighter bound C; this will
yield a worst-case error bound better adapted to the integrand and a generating vector
with the desired properties. Since additionally C,b is increasing in for fixed b, this
becomes more important as the order of the quadrature rule increases.
282
4.2 Pruning
For large values of the WCE, the elements of the generating vector can repeat, leading
to very bad projections in certain dimensions. For polynomial lattice rules, if qs,k =
qs,k k = 1, . . . , for two dimensions s and s , the corresponding components of the
quadrature points will be identical, xs(n) = xs(n) for all values of n = 0, . . . , bm 1.
Thus, in the projection onto the (s, s )-plane, only points on the diagonal are obtained,
which is obviously a very bad choice. One way this problem could be avoided is to
consider a second error criterion, as in [4]. We propose here a simpler method that
requires only minor modification of the CBC iteration.
To alleviate this effect, we formulate a pruning procedure that incorporates this
observation into the construction of the generating vector. We impose the additional
condition that the newest element of the generating vector is unique, i.e. is not
equal to a previously constructed component of q. This can be achieved in the CBC
construction by replacing the minimization of E(q) over all possible bm 1 values
of the new component by the restricted version
qd =
zG b,m ,
z {q
/ 1 ,...,qd1 }
(17)
5 Results
We present several tests of an implementation of Algorithms 1 and 2, and of the
resulting higher order QMC quadrature rules. Rather than solving concrete application problems, the purpose of the ensuing numerical experiments is a) to verify
the validity of the asymptotic (as s, N ) complexity estimates and QMC error
bounds, in particular to determine the range where the asymptotic complexity bounds
give realistic descriptions of the CBC constructions performance; b) to investigate
the quantitative effect of (not) pruning the generating vector on the accuracy and convergence rates of the QMC quadratures, and c) to verify the necessity of the weighted
spaces Ws,, ,q,r and the norms in (2) for classifying integrand function regularity. We
283
remark that, due to the limited space of these proceedings, only few representative
simulations can be presented in detail; for further results and a complete description
of our implementation, we refer to [9].
f ,s, ( y) = 1 +
s
1
aj yj
, a j = j , N .
(18)
j=1
||+1
We have the differential y f ,s, ( y) = (1)|| ||! f ,s, ( y) sj=1 (a j ) j , leading
to the bound | y f ,s, ( y)| C f ||! sj=1 j j for all {0, 1, . . . , }s and for a
C f 1 with the weights j given by j = a j = j , j = 1, . . . , s. Additionally,
for s , we have ( j ) j p (N) with p > 1 and thus =
1/ p + 1 = .
Therefore, by Theorem 3.2 of [6], an interlaced polynomial lattice rule of order
with N = bm points (b prime, m 1) and point set P N can be constructed such
that the QMC quadrature error fulfills
|I [ f ,s, ] QP N [ f ,s, ]| C(, , b, p)N 1/ p ,
(19)
for a constant C(, , b, p) independent of s and N . Convergence rates were computed with respect to a reference value of the integral I [ f ,s, ] obtained with
dimension-adaptive Smolyak quadrature with tolerance 1014 . We also consider separable integrand functions, which, on account of their separability, trivially belong
to the product weight class. They are given by
g,s, ( y) = exp
s
j=1
a j y j , a j = j ,
(20)
284
and satisfy y g( y) = g( y) sk=1 (ak )k . Under the assumption that there exists
> 0 that is independent of s and such that g( y) C
for all y
a constant C
s
s
[0, 1] , which holds here with C = exp( j=1 j ), > 1, we have the bound
sj=1 j j , for all {0, 1, . . . , }s with the weights j given
| y g,s, ( y)| C
by j = a j = j for j = 1, . . . , s. We have the following analytically given
formula for the integral
s
s
j
j
, (21)
exp( j ) 1 = exp
I [g,s, ] =
log
( + 1)!
=0
j=1
j=1
and have an error bound of the form (19), with a different value for C(, , b, p).
(a)
(b)
(c)
(d)
285
Fig. 1 CPU time required for the construction of generating vectors of varying order = 2, 3, 4
for product and SPOD weights with j = j versus the dimension s in a and b and versus the
number of points N = 2m in c and d
(a)
(b)
Fig. 2 Effect of pruning the generating vectors: convergence of QMC approximation for the SPOD
integrand (18) with = 4, s = 100, base b = 2 and = 2, 3, 4, with and without pruning. Results
a obtained with Walsh constant (4). In b, the Walsh constant C = 1 and pruning are theoretically
justified in [5] and [20], respectively
286
(a)
(b)
(c)
(d)
Fig. 3 Convergence of QMC approximation to (21) for the product weight integrand (20) in s =
100, 1000 dimensions with interlacing parameter = 2, 3, 4 with pruning. a s = 100, = 2, b
s = 100, = 4, c s = 1000, = 2, d s = 1000, = 4
(a)
(b)
Fig. 4 Convergence of QMC approximation for the SPOD weight integrand (18) in s = 100
dimensions with interlacing parameter = 2, 3, 4 with pruning. a = 2. b = 4
287
References
1. Dick, J.: Explicit constructions of quasi-Monte Carlo rules for the numerical integration of highdimensional periodic functions. SIAM J. Numer. Anal. 45(5), 21412176 (2007) (electronic).
doi:10.1137/060658916
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary
high order. SIAM J. Numer. Anal. 46(3), 15191553 (2008). doi:10.1137/060666639
3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bull. Aust. Math. Soc. 80(3),
430453 (2009). doi:10.1017/S0004972709000392
4. Dick, J.: Random weights, robust lattice rules and the geometry of the cbcr c algorithm.
Numerische Mathematik 122(3), 443467 (2012). doi:10.1007/s00211-012-0469-5
5. Dick, J., Kritzer, P.: On a projection-corrected component-by-component construction. J. Complex. (2015). doi:10.1016/j.jco.2015.08.001
6. Dick, J., Kuo, F.Y., Le Gia, Q.T., Nuyens, D., Schwab, C.: Higher order QMC PetrovGalerkin
discretization for affine parametric operator equations with random field inputs. SIAM J.
Numer. Anal. 52(6), 26762702 (2014)
288
7. Dick, J., Le Gia, Q.T., Schwab, C.: Higher-order quasi-Monte Carlo integration for holomorphic, parametric operator equations. SIAM/ASA J. Uncertain. Quantif. 4(1), 4879 (2016).
doi:10.1137/140985913
8. Dick, J., Pillichshammer, F.: Digital nets and sequences. Cambridge University Press, Cambridge (2010). doi:10.1017/CBO9780511761188
9. Gantner, R. N.: Dissertation ETH Zrich (in preparation)
10. Goda, T.: Good interlaced polynomial lattice rules for numerical integration in weighted Walsh
spaces. J. Comput. Appl. Math. 285, 279294 (2015). doi:10.1016/j.cam.2015.02.041
11. Goda, T., Dick, J.: Construction of interlaced scrambled polynomial lattice rules of arbitrary
high order. Found. Comput. Math. (2015). doi:10.1007/s10208-014-9226-8
12. Kuo, F.Y.: Component-by-component constructions achieve the optimal rate of convergence
for multivariate integration in weighted Korobov and Sobolev spaces. J. Complexity 19(3),
301320 (2003). doi:10.1016/S0885-064X(03)00006-2
13. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for high-dimensional integration: the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53, 137 (2011).
doi:10.1017/S1446181112000077
14. Niederreiter, H.: Random number generation and quasi-Monte Carlo methods. CBMS-NSF
Regional Conference Series in Applied Mathematics, vol. 63. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA (1992). doi:10.1137/1.9781611970081
15. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules in shift-invariant reproducing kernel Hilbert spaces. Math. Comp. 75(254), 903
920 (2006) (electronic). doi:10.1090/S0025-5718-06-01785-6
16. Nuyens, D., Cools, R.: Fast component-by-component construction, a reprise for different
kernels. Monte Carlo and quasi-Monte Carlo methods 2004, pp. 373387. Springer, Berlin
(2006). doi:10.1007/3-540-31186-6_22
17. Rader, C.: Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE
3(3), 12 (1968)
18. Sloan, I.H., Kuo, F.Y., Joe, S.: Constructing randomly shifted lattice rules in weighted Sobolev
spaces. SIAM J. Numer. Anal. 40(5), 16501665 (2002). doi:10.1137/S0036142901393942
19. Sloan, I.H., Reztsov, A.V.: Component-by-component construction of good lattice rules. Math.
Comp. 71(237), 263273 (2002). doi:10.1090/S0025-5718-01-01342-4
20. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma- Hlawka
type inequality for Quasi-Monte Carlo integration (2015)
Abstract New methods are derived for the computation of multivariate normal
probabilities defined for hyper-rectangular probability regions. The methods use conditioning with a sequence of truncated bivariate probability densities. A new approximation algorithm based on products of bivariate probabilities will be described.
Then a more general method, which uses sequences of simulated pairs of bivariate
normal random variables, will be considered. Simulations methods which use Monte
Carlo, and quasi-Monte Carlo point sets will be described. The new methods will be
compared with methods which use univariate normal conditioning, using tests with
random multivariate normal problems.
Keywords Multivariate normal probabilities Bivariate conditioning
1 Introduction
Many problems in applied statistical analysis require the computation of multivariate
normal (MVN) probabilities in the form
1
(a, b; ) =
|| (2 )n
b1
a1
...
bn
e 2 x
1 t
dx,
an
289
290
n1
m=1
n1
cnm ym )/cnn .
m=1
i1
(a, b; ) =
1
(2 )n
b1
a1
y12
e 2
b2
a2
y22
e 2
i1
bn1
an1
e 2 dy.
we
(1)
This conditioned form for MVN probabilities has been used as the basis for several
numerical approximation and simulation methods (see Genz and Bretz [4, 5]).
291
M
EM =
PM )2 2
.
M(M 1)
k=1 (Pk
(2)
The scaled standard error is used to provide error estimates for PM . If QMC points
are used instead of the u i U (0, 1) MC points, the result is a QMC algorithm,
with faster convergence to (a, b; ) (see Hickernell [8], where the use of lattice
rule QMC point sets is analyzed). Sndor and Andrs [12] also showed how QMC
point sets can provide faster convergence than MC point sets for this problem, and
compared several types of QMC point sets.
a
x2
xe 2 d x/((b) (a)).
292
The GGE variable prioritization method first chooses the outermost integration variable by selecting the variable i so that
bi
ai
.
i = argmin
ii
ii
1in
The integration limits and the rows and columns of for variables 1 and i are interchanged. Then the first column of the Cholesky decomposition C of is computed
using c11 = 11 and ci1 = c11i1 for i = 2, . . . , n. Letting a 1 = ca111 , b1 = cb111 , we set
1 = E(a 1 , b1 ).
At stage j, the jth integration variable is chosen by selecting a variable i so that
j1
j1
a
b
i
im
m
i
im
m
m=1
m=1
.
i = argmin
j1 2
j1 2
jin
ii m=1 cim
ii m=1 cim
The integration limits, rows and columns of , and partially completed rows of C for
variables j and i are interchanged. Then the jth column of C is computed using The
integration limits, rows and columns of , and partially completed rows of C for
variables
j and i are interchanged. Then the jth column of C is computed using
cjj =
j1
j1
2
ii m=1 cim
and ci j = (i j m=1 cim c jm )/c j j , for i = j + 1, . . . , n.
j1
j1
Letting a j = (a j m=1 c jm m )/c j j , and b j = (b j m=1 c jm m )/c j j , we set
j = E(a j , b j ). The algorithm finishes when j = n, and then the final Cholesky
factor C and permuted integration limits a and b are used for the Pk computations
in (2).
Tests of the univariate conditioned simulation algorithm, with this variable
reordering algorithm show that the resulting Pk have smaller variation, reducing
the overall variation for the MVN estimates (see Genz and Bretz [4]). This (variable
prioritized) algorithm is widely used with QMC or deterministic us for implementations in Matlab, R, and Mathematica.
293
3.1 L DL t Decomposition
In order to transform the MVN problem into a sequence of conditioned BVN integrals, we define k = n2 and use the covariance matrix decomposition = L DL t .
If n is even this decomposition of has
I2 O2
L 21 . . . . . .
L=
. .
.. . . I2
L k1 . . . L k,k1
D1 O 2
O2
..
..
.
, D = O2 .
. .
.. . .
O2
I2
O2 . . .
O2
.
..
. ..
,
Dk1 O2
O 2 Dk
1 0 0
2 1 1 1 2
0 1 0
1 2 1 1 2
=
1 1 4 3 1 , L = 1 1 1
1 1 0
1 1 3 4 1
2 2 1
2 2 1 1 16
21 0 00
00
1 2 0 0 0
0 0
0 0, D =
0 0 2 1 0 .
0 0 1 2 0
10
00 0 02
11
This
block t decomposition
using the partitioning
can be recursivelycomputed
D1 O
I2 O
1,1 R
=
, and D =
, where 1,1 is a 2 2
, with L =
R
M L
O D
matrix. Then D1 = 1,1 , M = R D11 , D = M D1 M t , and the decomposition
procedure continues by applying the same operations to the (n 2) (n 2) matrix
This is a 2 2 block form for the standard Cholesky decomposition algorithm
.
(see Golub and Van Loan [6]).
e 2 y2k Dk y2k
|D| (2 )n 1 2
2k1
2k
dy
if n = 2k;
n 2d1 yn2
nn
e
dy
if
n = 2k + 1.
n
294
b1
a1
b2
a2
1 t
2 |12 | 2
1
2
e 2 z2 12 z2
b2k1
a2k1
b2k
a2k
2 |2k1,2k | 2
dz
if n = 2k,
21 z n2
e
dz if n = 2k + 1.
(3)
bn
an
!
1 k
, k = d2k1,2k / d2k1,2k1 d2k,2k , and
k 1
(a , b )i = (, )i / dii .
The bivariate approximation algorithm begins with the computation outermost
BVN probability P1 = ((a1 , a2 ), (b1 , b2 ); 12 ). We then use explicit formulas,
derived by Muthn [11], for truncated BVN moments 1 and 2 : using q1 =
with 2k1,2k =
1 12 ,
1
2q12
(u, v)e
dvdu.
=
2 P1 q1 a1 a2
The Muthn formula for 1 is
1 = 1
+
(a2 )
P1
(a1 )
P1
"
"
(b2 )
P1
(b1 )
P1
"
"
#
,
(4)
using the univariate (a, b) = (b) (a). The 2 formula is the same, except
for the interchanges a1 a2 and b1 b2 . Note that the i formulas depend only
on easily computed univariate pdf and cdf values.
Now, approximate the second BVN by P2 = ((a 3 , a 4 ), (b3 , b4 ); 3,4 ), where
a i , bi , are ai , bi , with z 1 , z 2 replaced by 1 , 2 . Then, compute (3 , 4 ) =
E((a 3 , a 4 ), (b3 , b4 ); 2 ). At the ith stage we compute
Pi = ((a 2i1 , a 2i ), (b2i1 , b2i ); 2i1,2i ),
with a i , bi , computed ai , bi , with z 1 , ..., z 2i2 replaced by the expected values 1 ,
..., 2i2
After k stages the bivariate conditioning approximation is
(a, b; )
k
$
i=1
Pi
1
if n = 2k;
(a n , bn ) if n = 2k + 1.
295
This algorithm was proposed and studied by Trinh and Genz [15], where the
BVN conditioned approximations were found to be more accurate than approximations using univariate means with conditioning. In that paper variable reorderings
were also studied, where a natural strategy is to reorder the variables at stage i to
minimize the Pi . But this strategy uses O(n 2 ) BVN values overall, which can take
a lot more time than the strategy described previously which uses only UVN values. Tests by Trinh and Genz showed that UVN value prioritization results provided
approximations which were usually as accurate, or almost as accurate as the BVN
prioritized approximations.
e 2 z2 12 z2
1 t
2 P1 |12 | 2
e 2 zi 2i1,2i zi
1 t
2 Pi |2i1,2i | 2
After k stages
(a, b; )
k
$
i=1
Pi
1
if n = 2k;
(a n , bn ) if n = 2k + 1.
(5)
296
The k stages in the algorithm are repeated and the results are averaged to approximate
(a, b; ). The primary complication with this algorithm compared to the algorithm
for univariate simulation is the truncated BVN simulation. In contrast to the univariate
simulation, there is no direct inversion formula for truncated BVN simulation.
Currently, the most efficient methods for truncated BVN simulation use an algorithm derived by Chopin [1], with variations for special cases. The basic algorithm is
an Acceptance-Rejection (AR) algorithm which we now describe. At each stage in
the BVN conditioned simulation algorithm, we need to simulate x, y from a truncated
BVN. We consider a generic BVN problem
with
truncated region (a, b) (c, d) and
1
correlation coefficient . Using =
, we first define
1
P=
1
1
e 2 z z dz 2 dz 1 =
1 t
e 2 x
2
1
dx
1 2
cx
2 || 2 a c
a
1 2
b 1 x2
dx 1 y 2
2
2
e
1 2 e
dy.
cx
2
2
a
2
e 2 y
d yd x
2
1
The AR algorithm first simulates x (using AR) from the (a, b) truncated density
h(x) =
1 2
e 2 x
2 P
1 2
using the (a, b) truncated Normal density g(x) = e 2 x /( 2 ((b, a)). This x is
accepted if u < h(x)/Cg(x), where u U (0, 1), and where the AR constant C
is given by C = max x[a, b] h(x/g(x)). Now h(x)/g(x) = f (x)(a, b)/P, so C is
given by the x [a, b] which maximizes f (x).
it can be shown
Using basic analysis,
),
b
,
so
we
define f =
that a unique maximum occurs at x = min max(a, c+d
2
4. Return (x,y).
The notation (x, y) B N ((a, b), (c, d); ) will be used to denote an (x, y) pair produced by this algorithm. We need n1
(x, y) pairs for each approximate (a, b, )
2
computation (5). We will present some test results using this MC algorithm in
Sect. 4.4.
297
e 2 x
2
1
dx
1 2
cx
1 2
e 2 y
y ),
F(x, y)d yd x P F(x,
2
1
(6)
with (x,
y ) B N ((a, b), (c, d); ), and we used AR to determine x.
In order to use
a smoothed AR simulation for x,
we rewrite the BVN integral as
P=
a
e 2 x f (x)
dx
f
f
2
1
a
e 2 x
f
2
1
0
298
where r (x) = f (x)/ f , and I (s) is the indicator function (with value 1 if s is true and
0 otherwise). This setup can be used for MC or QMC simulations (first simulate x
N (a, b) by inversion from U (0, 1), then use u U (0, 1)), but the nonsmooth I (s)
is not expected to lead to an efficient algorithm. However, we tested this unsmoothed
(USAR) algorithm, where the approximation which replaces P in (6) is
P = (a, b)) f I (r (x) < u).
These approximations, which are sometimes zero, are used to replace the Pi values
in (5), and the primary problem is that the USAR simulation algorithm can often
have zero value for (5).
Smoothed AR replaces I (r < u) with a smoother function wr (u) which satisfies
1
the condition 0 wr (u)du = r . After some experimentation and consideration of
the possibilities discussed by Wang [16], we chose to replace I (r (x) < u) by the
continuous
(x)
u, if u r (x);
1 1r
r (x)
wr (x) (u) =
r (x)
(1 u), otherwise.
1r (x)
1
0
e 2 x
f (x)d x
2
1
e 2 x
f
2
1
0
w f (x) (u)dud x.
f
additional weight for that stage. The resulting contribution to the product for each
(a, b, ) approximation in (5) is
Pi = (a, b) f wr (x) (u)
instead of Pi . Notice that Pi is not needed for the SAR algorithm, and the algorithm
is similar to the univariate conditioned algorithm which uses
%
(a, b)
c x d x
!
,!
1 2
1 2
&
k
$
i=1
Pi
1
if n = 2k;
(a n , bn ) if n = 2k + 1.
(7)
299
As with AR, the k stages in the algorithm are repeated and the results are averaged
to produce the final approximation to (a, b; ).
The SAR algorithm requires one additional u U (0, 1) for each stage so, assuming that x,
and y are both computed using truncated univariate Normal inversion of
U (0, 1)s, the total number of U (0, 1)s is m = 3n/2 1 for each approximation
to (a, b; ) for an MC SAR algorithm. For a QMC SAR algorithm, m-dimensional
QMC vectors with components from (0, 1) replace the m-dimensional U (0, 1) component vectors for the MC algorithm.
All of the algorithm used the GGE univariate variable prioritization algorithm
described in Sect. 2.1. We used the Matlab mvncdf function to compute exact
values for each .
The results in Table 1 show, as was expected, that USAR is clearly not competitive
with any of the other algorithms. Somewhat surprisingly, the AR(MC) algorithm had
average errors that were somewhat smaller than the SAR errors, and (2 3) smaller
than the univariate conditioned MC algorithm. The AR(QMC) algorithm had errors
(5 10) smaller than the AR(MC) algorithm and were similar to the UV(QMC)
algorithm errors.
Table 2 provides some test results for times for the six algorithms using Matlab
with a 3.5Ghz processor Linux workstation. The results in Table 2 show that the
300
0.000039
0.000042
0.000040
0.000056
0.000052
0.000039
0.000066
0.000045
0.000046
0.000036
0.000050
0.000026
0.000285
0.000282
0.000370
0.000279
0.000341
0.000335
0.000324
0.000278
0.000298
0.000316
0.000354
0.000406
0.000054
0.000097
0.000066
0.000071
0.000075
0.000094
0.000224
0.000073
0.000101
0.000072
0.000079
0.000066
0.000008
0.000010
0.000008
0.000007
0.000007
0.000007
0.000005
0.000003
0.000005
0.000004
0.000003
0.000006
0.000008
0.000005
0.000005
0.000005
0.000005
0.000005
0.000006
0.000004
0.000003
0.000003
0.000003
0.000003
0.486
0.899
1.072
1.478
1.649
2.069
2.226
2.626
2.800
3.208
3.380
3.784
0.479
0.657
0.829
1.007
1.183
1.357
1.519
1.689
1.864
2.067
2.204
2.405
0.471
0.656
0.836
1.014
1.195
1.378
1.553
1.725
1.910
2.087
2.269
2.449
0.509
0.926
1.096
1.519
1.686
2.107
2.271
2.695
2.862
3.284
3.440
3.865
0.007
0.009
0.011
0.013
0.015
0.016
0.018
0.020
0.022
0.024
0.026
0.028
UV(MC)
0.000125
0.000137
0.000154
0.000109
0.000111
0.000138
0.000126
0.000113
0.000107
0.000100
0.000106
0.000099
UV(MC)
0.008
0.011
0.013
0.016
0.018
0.021
0.023
0.026
0.029
0.032
0.034
0.037
AR algorithms takes more time (the difference increasing with dimension) compared to the approximately equal time USAR and SAR algorithms; these AR versus
SAR/USAR time difference are caused by the time needed by AR extra random
number generation and acceptance testing. The UV algorithms take much less time
( 1/100) because these algorithms can easily be implemented in Matlab in a vectorized form which allows large sets of (a, b; ) approximations to be computed
simultaneously.
301
5 Conclusions
The Monte Carlo MVN simulation methods described in this paper which use bivariate conditioning are more accurate than the univariate conditioned Monte Carlo simulation methods that we tested. However, there is a significant additional time cost
for the bivariate algorithms because there is no simple algorithm for simulation from
truncated BVN distributions.
We also considered the use of QMC methods with bivariate conditioned MVN
computations, but the lack of a direct algorithm for truncated BVN simulation does
not allow the straightforward use of QMC point sequences. But we did test a simple
QMC algorithm which replaces the MC vectors for the truncated BVN AR simulations with QMC vectors and this algorithm was significantly more accurate than
the MC algorithm, with error levels comparable to the univariate conditioned QMC
algorithm. We also derived a smoothed AR algorithm which could be used with a
QMC sequence for truncated BVN simulation. But, when this algorithm was combined in the bivariate conditioned MVN algorithm, the testing showed this smoothed
AR BVN conditioned algorithm had larger errors than the MC AR BVN conditioned
algorithm. The complete algorithm was not as accurate as a univariate conditioned
QMC algorithm. The bivariate conditioned algorithms also require significantly more
time than the (easily vectorized) univariate conditioned algorithms. Unfortunately,
the goal of finding a bivariate conditioned QMC MVN algorithm has not been satisfied. It is possible that a more direct algorithm for truncated BVN simulation could
lead to a more efficient MVN computation algorithm based on bivariate conditioning
with QMC sequences, but this is a subject for future research.
References
1. Chopin, N.: Fast simulation of truncated Gaussian distributions. Stat. Comput. 21, 275288
(2011)
2. Drezner, Z., Wesolowsky, G.O.: On the computation of the bivariate normal integral. J. Stat.
Comput. Simul. 3, 101107 (1990)
3. Genz, A.: Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Stat. Comput. 14, 151160 (2004)
4. Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput.
Graph. Stat. 11, 950971 (2002)
5. Genz, A., Bretz, F.: Computation of Multivariate Normal and t Probabilities. Lecture Notes in
Statistics, vol. 195. Springer, New York (2009)
6. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press,
Baltimore (2012)
7. Gibson, G.J., Glasbey, C.A., Elston, D.A.: Monte Carlo evaluation of multivariate normal
integrals and sensitivity to variate ordering. In: Dimov, I.T., Sendov, B., Vassilevski, P.S. (eds.)
Advances in Numerical Methods and Applications, pp. 120126. World Scientific Publishing,
River Edge (1994)
8. Hickernell, F.J.: Obtaining O(N 2+ convergence for lattice quadrature rules. In: Fang, K.T.,
Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2000,
pp. 274289. Springer, Berlin (2002)
302
9. Nuyens, D., Cools, R.: Fast algorithms for component-by-component construction of rank-1
lattice rules in shift-invariant Reproducing Kernel Hilbert Spaces. Math. Comput. 75, 903920
(2006)
10. Moskowitz, B., Caflish, R.E.: Smoothness and dimension reduction in quasi-Monte Carlo
methods. Math. Comput. Model. 23, 3754 (1996)
11. Muthn, B.: Moments of the censored and truncated bivariate normal distribution. Br. J. Math.
Stat. Psychol. 43, 131143 (1991)
12. Sndor, Z., Andrs, P.: Alternative sampling methods for estimating multivariate normal probabilities. J. Econ. 120, 207234 (2002)
13. Schervish, M.J.: Algorithm AS 195: multivariate normal probabilities with error bound. J.
Royal Stat. Soc. Series C 33, 8194 (1984), correction 34,103104 (1985)
14. Stewart, G.W.: The efficient generation of random orthogonal matrices with an application to
condition estimators. SIAM J. Numer. Anal. 17(3), 403409 (1980)
15. Trinh, G., Genz, A.: Bivariate conditioning approximations for multivariate normal probabilities. Stat. Comput. (2014). doi:10.1007/s11222-014-9468-y
16. Wang, X.: Improving the rejection sampling method in quasi-Monte Carlo methods. J. Comput.
Appl. Math. 114, 231246 (2000)
17. Zhu, H., Dick, J.: Discrepancy bounds for deterministic acceptance-rejection samplers. Electron. J. Stat. 8, 687707 (2014)
Abstract This paper shows that it is relatively easy to incorporate adaptive timesteps
into multilevel Monte Carlo simulations without violating the telescoping sum on
which multilevel Monte Carlo is based. The numerical approach is presented for
both SDEs and continuous-time Markov processes. Numerical experiments are given
for each, with the full code available for those who are interested in seeing the
implementation details.
Keywords multilevel Monte Carlo
Markov process
L
E[P P1 ],
(1)
=1
expressing the expected value on the finest level as the expected value on the coarsest
level of approximation plus a sum of expected corrections.
303
304
L
Y ,
Y =
N1
=0
N
(n)
P(n) P1
n=1
L
N1 V
=0
/ 2, and the number of samples N so that the variance sum is less than 2 /2.
If C is the cost of a single sample P P1 , then a constrained optimisation,
minimising the computational cost for a fixed total variance, leads to
N = 2 2
V /C
L
=0
V C .
> ,
O(2 ),
C = O(2 (log 1 )2 ), = ,
O(2( )/ ), < .
The above is a quick overview of the multilevel Monte Carlo (MLMC) approach.
In the specific context of outputs which are functionals of the solution of an SDE, most
MLMC implementations use a set of levels with exponentially decreasing uniform
timesteps, i.e. on level the uniform timestep is
h = M h 0
where M is an integer. When using the EulerMaruyama approximation it is usually
found that the optimum value for M is in the range 48, whereas for higher order
strong approximations such as the Milstein first order approximation it is found that
M = 2 is best.
The MLMC implementation is then very straightforward. In computing a single
correction sample P P1 , one can first generate the Brownian increments for the
fine path simulation which leads to the output P . The Brownian increments can then
be summed in groups of size M to provide the Brownian increments for the coarse
305
path simulation which yields the output P1 . The strong convergence properties of
the numerical approximation ensure that the difference between the fine and coarse
path simulations decays exponentially as , and therefore the output difference
P P1 also decays exponentially; this is an immediate consequence if the output
is a Lipschitz functional of the path solution, but in other cases it requires further
analysis.
In the computational finance applications which have motivated a lot of MLMC
research, it is appropriate to use uniform timesteps on each level because the drift
and volatility in the SDEs does not vary significantly from one path to another, or
from one time to another. However, in other applications with large variations in
drift and volatility, adaptive timestepping can provide very significant reductions in
computational cost for a given level of accuracy [15]. It can also be used to address
difficulties with SDEs such as
dSt = St3 dt + dWt ,
which have a super-linear growth in the drift and/or the volatility, which otherwise
lead to strong instabilities when using uniform timesteps [11].
The most significant prior research on adaptive timestepping in MLMC has been
by Hoel, von Schwerin, Szepessy and Tempone [9] and [10]. In their research, they
construct a multilevel adaptive timestepping discretisation in which the timesteps
used on level are a subdivision of those used on level 1, which in turn are
a subdivision of those on level 2, and so on. By doing this, the payoff P on
level is the same regardless of whether one is computing P P1 or P+1 P ,
and therefore the MLMC telescoping summation, (1), is respected. Another notable
aspect of their work is the use of adjoint/dual sensitivities to determine the optimal
timestep size, so that the adaptation is based on the entire path solution.
In this paper, we introduce an alternative approach in which the adaptive timesteps
are not nested, so that the timesteps on level do not correspond to a subdivision
of the timesteps on level 1. This leads to an implementation which is perhaps a
little simpler, and perhaps a more natural extension to existing adaptive timestepping
methods. The local adaptation is based on the current state of the computed path, but
it would also work with adjoint-based adaptation based on the entire path. We also
show that it extends very naturally to continuous-time Markov processes, extending
ideas due to Anderson and Higham [1, 2]. The key point to be addressed is how to
construct a tight coupling between the fine and coarse path simulations, and at the
same time ensure that the telescoping sum is fully respected.
306
Fig. 1 Simulation times for multilevel Monte Carlo with adaptive timesteps
Algorithm 1 Outline of the algorithm for a single MLMC sample for > 0 for a
scalar Brownian SDE with adaptive timestepping for the time interval [0, T ].
t := 0; t c := 0; t f := 0
h c := 0; h f := 0
W c := 0; W f := 0
while (t < T ) do
told := t
t := min(t c , t f )
W := N (0, t told )
W c := W c + W
W f := W f + W
if t = t c then
update coarse path using h c and W c
compute new adapted coarse path timestep h c
h c := min(h c , T t c )
t c := t c + h c
W c := 0
end if
if t = t f then
update fine path using h f and W f
compute new adapted fine path timestep h f
h f := min(h f , T t f )
t f := t f + h f
W f := 0
end if
end while
compute P P1
For Brownian diffusion SDEs, level uses an adaptive timestep of the form
h = M H (Sn ), where M > 1 is a real constant, and H (S) is independent of
level. This automatically respects the telescoping summation, (1), since the adaptive timestep on level is the same regardless of whether it is the coarser or finer of
the two paths being computed. On average, the adaptive timestepping leads to simulations on level having approximately M times as many timesteps as level 1, but
it also results in timesteps which are not naturally nested, so the simulation times for
the coarse path do not correspond to simulation times on the fine path. It may appear
that this would cause difficulties in the strong coupling between the coarse and fine
307
paths in the MLMC implementation, but it does not. As usual, what is essential to
achieve a low multilevel correction variance V is that the same underlying Brownian
path is used for both the fine and coarse paths. Figure 1 shows a set of simulation
times which is the union of the fine and coarse path times. This defines a set of intervals, and for each interval we generate a Brownian increment with the appropriate
variance. These increments are then summed to give the Brownian increments for
the fine and coarse path timesteps.
An outline implementation to compute a single sample of P P1 for > 0 is
given in Algorithm 1. This could use either an EulerMaruyama discretisation of the
SDE, or a first order Milstein discretisation for those SDEs which do not require the
simulation of Lvy area terms.
Adaptive timestepping for continuous-time Markov processes works in a very
similar fashion. The evolution of a continuous-time Markov process can be described
by
t
j Pj
j (Ss ) ds
St = S0 +
0
where the summation is over the different reactions, j is the change due to reaction
j
j (the number of molecules of each species which are created or destroyed), the P
are independent unit-rate Poisson processes, and j is the propensity function for
the j th reaction, meaning that j (St ) dt is the probability of reaction j taking place
in the infinitesimal time interval (t, t +dt).
j (St ) should be updated after each individual reaction, since it changes St , but in
the tau-leaping approximation [7] j is updated only at a fixed set of update times.
This is the basis for the MLMC construction due to Anderson and Higham [1].
Using nested uniform timesteps, with h c = 2 h f , each coarse timestep is split into
two fine timesteps, and for each
appropriate
of the fine timesteps one has
to compute
f
c f
f
Poisson increments P j j h for the coarse path and P j j h for the fine path.
To achieve a tight coupling between the coarse and fine paths, they use the fact that
f
f
j
f
min(cj , j )
|cj
f
j |
1c < f ,
j
)
h
)
and
P(|
|
Poisson variates P(min(
j
j
j
j h ) and use these to give the
Poisson variates for the coarse and fine paths over the same fine timestep.
As outlined in Algorithm 2, the extension of adaptive timesteps to continuoustime Markov processes based on the tau-leaping approximation is quite natural. The
Poisson variates are computed for each time interval in the time grid formed by the
union of the coarse and fine path simulation times. At the end of each coarse timestep,
the propensity functions c are updated, and a new adapted timestep h c is defined.
Similarly, f and h f are updated at the end of each fine timestep.
308
Algorithm 2 Outline of the algorithm for a single MLMC sample for a continuoustime Markov process with adaptive timestepping for the time interval [0, T ].
t := 0; t c := 0; t f := 0
c := 0; f := 0
h c := 0; h f := 0
while (t < T ) do
told := t
t := min(t c , t f )
h := t told
c , f ) h), P(|
c f | h),
for each reaction, generate Poisson variates P(min(
use Poisson variates to update fine and coarse path solutions
if t = t c then
update coarse path propensities c
compute new adapted coarse path timestep h c
h c := min(h c , T t c )
t c := t c + h c
end if
if t = t f then
update fine path propensities f
compute new adapted fine path timestep h f
h f := min(h f , T t f )
t f := t f + h f
end if
end while
compute P P1
The telescoping sum is respected because, for each timestep of either the coarse or
fine path simulation, the sum of the Poisson variates for the sub-intervals is equivalent
in distribution to the Poisson variate for the entire timestep, and therefore the expected
value E[P ] is unaffected.
3 Numerical Experiments
3.1 FENE SDE Kinetic Model
A kinetic model for a dilute solution of polymers in a fluid considers each molecule as
a set of balls connected by springs. The balls are each subject to random forcing from
the fluid, and the springs are modelled with a FENE (finitely extensible nonlinear
elastic) potential which increases without limit as the length of the bond approaches
a finite value [3].
309
In the case of a molecule with just one bond, this results in the following 3D SDE
for the vector length of the bond:
dqt =
4
qt dt + 2 dWt
1qt 2
4h n
qn + 2 Wn
1qn 2
and because the volatility is constant, one would expect this to give first order strong
convergence. The problem is that this discretisation leads to qn+1 > 1 with positive
probability, since Wn is unbounded.
-5
-2
-4
log 2 |mean|
log 2 variance
-10
-15
-20
Pl
-8
-10
Pl
-12
Pl - P l-1
-25
-6
-14
Pl - P l-1
level l
10 6
10
Std MC
MLMC
=0.0005
=0.001
=0.002
=0.005
=0.01
10
Nl
Cost
10 4
level l
10 2
10 0
level l
10 -1
10-3
accuracy
Fig. 2 MLMC results for the FENE model using adaptive timesteps
10-2
310
This problem is addressed in two ways. The first isto use adaptive timesteps which
become much smaller as qn 1. Since Wn = h Z n , where the component of
Z n in the direction normal to the boundary is a standard Normal random variable
which is very unlikely to take a value with magnitude greater than 3, we choose the
timestep so that
6 h n 1 qn
so the stochastic term is highly unlikely to take across the boundary. In addition, the
drift term is singular at the boundary and therefore for accuracy we want the drift
term to be not too large relative to the distance to the boundary so that it will not
change by too much during one timestep. Hence, we impose the restriction
2h n
1qn .
1qn
Combining these two gives the adaptive timestep
H (qn ) =
(1qn )2
,
max(2, 36)
qn+1
:=
1
qn+1
qn+1
if qn+1 > 1 , with typically chosen to be 105 , which corresponds to an adaptive timestep of order 1010 for the next timestep. Numerical experiments suggest
that this value for does not lead to any significant bias in the output of interest.
The output of interest in the initial experiments is E[q2 ] at time T = 1, having
started from initial data q = 0 at time t = 0. Figure 2 presents the MLMC results,
showing first order convergence for the weak error (top right plot) and second order
convergence for the multilevel correction variance (top left plot). Thus, in terms of
the standard MLMC theory we have = 1, = 2, = 1, and hence the computational
cost for RMS accuracy is O(2 ); this is verified in the bottom right plot, with the
bottom left plot showing the number of MLMC samples on each level as a function
of the target accuracy.
311
,
R1 : S1
R2 : S2 S3 ,
1/500
(2)
1/2
R3 : S1 + S1 S2 , R4 : S2 S1 + S1 .
and the corresponding propensity functions for the 4 reactions are
1 = S1 ,
2 = (1/25) S2 ,
3 = (1/500) S1 (S1 1), 4 = (1/2) S2 ,
(3)
10 4
10
Transient phase
10 4
10
Long phase
S
Copy number
Copy number
0.01
0.02
Time
0.03
10
20
30
Time
Fig. 3 The temporal evolution of a single sample path of reaction system (2) on two different
time-scales. Reaction rates are given in (3) and initial conditions are as described in the text
312
to ensure that there is no more than a 25 % change in any species in one timestep,
the timestep on the coarsest level is taken to be
Si + 1
H = 0.25 min
.
i
| j i j j |
(4)
15
14
12
log 2 |mean|
log 2 variance
10
10
8
6
Pl - P l-1
Pl - P l-1
-5
0
level l
level l
106
10 8
Std MC
MLMC
=1
=2
=5
=10
=20
10 7
Nl
Cost
104
102
100
10 6
level l
10 0
10 1
accuracy
Fig. 4 MLMC results for the continuous-time Markov process using adaptive timesteps
313
Note that these numerical results do not include a final multilevel correction which
couples the tau-leaping approximation on the finest grid level to the unbiased Stochastic Simulation Algorithm which simulates each individual reaction. This additional coupling is due to Anderson and Higham [1], and the extension to adaptive
timestepping is discussed in [12]. Related research on adaptation has been carried out
by [13, 14].
4 Conclusions
This paper has just one objective, to explain how non-nested adaptive timesteps
can be incorporated very easily within multilevel Monte Carlo simulations, without
violating the telescoping sum on which MLMC is based.
Outline algorithms and accompanying numerical demonstrations are given for
both SDEs and continuous-time Markov processes. For those interested in learning
more about the implementation details, the full MATLAB code for the numerical
examples is available with other example codes prepared for a recent review paper
[5, 6].
Future papers will investigate in more detail the FENE simulations, including
results for molecules with multiple bonds and the interaction with fluids with nonuniform velocity fields, and the best choice of adaptive timesteps for continuous-time
Markov processes [12].
The adaptive approach could also be extended easily to Lvy processes and other
processes in which the numerical approximation comes from the simulation of increments of a driving process over an appropriate set of time intervals formed by a union
of the simulation times for the coarse and fine path approximations.
Acknowledgments MBGs research was funded in part by EPSRC grant EP/H05183X/1, and CL
and JW were funded in part by a CCoE grant from NVIDIA. In compliance with EPSRCs open
access initiative, the data in this paper, and the MATLAB codes which generated it, are available from
doi:10.5287/bodleian:s4655j04n. This work has benefitted from extensive discussions
with Ruth Baker, Endre Sli, Kit Yates and Shenghan Ye.
References
1. Anderson, D., Higham, D.: Multi-level Monte Carlo for continuous time Markov chains with
applications in biochemical kinetics. SIAM Multiscale Model. Simul. 10(1), 146179 (2012)
2. Anderson, D., Higham, D., Sun, Y.: Complexity of multilevel Monte Carlo tau-leaping. SIAM
J. Numer. Anal. 52(6), 31063127 (2014)
3. Barrett, J., Sli, E.: Existence of global weak solutions to some regularized kinetic models for
dilute polymers. SIAM Multiscale Model. Simul. 6(2), 506546 (2007)
4. Giles, M.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607617 (2008)
5. Giles, M.: Matlab code for multilevel Monte Carlo computations. http://people.maths.ox.ac.
uk/gilesm/acta/ (2014)
314
6. Giles, M.: Multilevel Monte Carlo methods. Acta Numer. 24, 259328 (2015)
7. Gillespie, D.: Approximate accelerated stochastic simulation of chemically reacting systems.
J. Chem. Phys. 115(4), 17161733 (2001)
8. Heinrich, S.: Multilevel Monte Carlo methods. In: Multigrid Methods. Lecture Notes in Computer Science, vol. 2179, pp. 5867. Springer, Heidelberg (2001)
9. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Adaptive multilevel Monte Carlo
simulation. In: Engquist, B., Runborg, O., Tsai, Y.H. (eds.) Numerical Analysis of Multiscale
Computations, vol. 82, pp. 217234. Lecture Notes in Computational Science and Engineering.
Springer, Heidelberg (2012)
10. Hoel, H., von Schwerin, E., Szepessy, A., Tempone, R.: Implementation and analysis of an
adaptive multilevel Monte Carlo algorithm. Monte Carlo Methods Appl. 20(1), 141 (2014)
11. Hutzenthaler, M., Jentzen, A., Kloeden, P.: Divergence of the multilevel Monte Carlo method.
Ann. Appl. Prob. 23(5), 19131966 (2013)
12. Lester, C., Yates, C., Giles, M., Baker, R.: An adaptive multi-level simulation algorithm for
stochastic biological systems. J. Chem. Phys. 142(2) (2015)
13. Moraes, A., Tempone, R., Vilanova, P.: A multilevel adaptive reaction-splitting simulation
method for stochastic reaction networks. Preprint arXiv:1406.1989 (2014)
14. Moraes, A., Tempone, R., Vilanova, P.: Multilevel hybrid Chernoff tau-leap. SIAM J. Multiscale
Model. Simul. 12(2), 581615 (2014)
15. Mller-Gronbach, T.: Strong approximation of systems of stochastic differential equations.
Habilitation thesis, TU Darmstadt (2002)
16. Tian, T., Burrage, K.: Binomial leap methods for simulating stochastic chemical kinetics. J.
Chem. Phys. 121(10), 356 (2004)
315
316
D. Ginsbourger et al.
Keywords Gaussian processes Sensitivity analysis Kriging Covariance functions Conditional simulations
317
fu ,
(1)
uI
(1)|u||u |
f (x1 , . . . , xd ) u (dxu ),
u u
(2)
= (x i )iI \u . As developed in [19], Eq. (2) is
and
x
where u =
j
u
jI \u
a special case of a decomposition relying on commuting projections. Denoting by
P j : f F f d j the orthogonal projector onto the subspace F j of f F
not depending on x j , the identity on F can be expanded as
IF =
d
(IF P j ) + P j =
(IF P j )
Pj .
j=1
uI
ju
(3)
jI \u
P
)
P
j
j . Finally, the squared norm
ju F
j u
/
2
of f decomposes by orthogonality as f = uI Tu ( f )2 and the influence of
each (group of) variable(s) on f can be quantified via the Sobol indices
Su ( f ) =
Tu ( f T ( f ))2
Tu ( f )2
=
, u
= .
f T ( f )2
f T ( f )2
(4)
318
D. Ginsbourger et al.
u, v I i u j v
We have
ku,v (x, y) =
(1)|u|+|v||u ||v |
k(x, y) u (dxu ) v (dyv ).
u u v v
(6)
Moreover, ku,v may be written concisely as ku,v = [Tu Tv ]k.
(b) Suppose that D is compact and k is a continuous s.p.d. kernel. Then, for any
d
(u )uI R2 , the following function is also a s.p.d. kernel:
(x, y) D D
319
u v ku,v (x, y) R.
(7)
uI vI
x2
2
y+
y2
2
+ 13 .
Example 2 Consider the very common class of tensor product kernels: k(x, y) =
d
i=1 ki (x i , yi ) where the ki s are 1-dimensional symmetric kernels. It turns out that
Eq. (6) boils down to a sum depending on 1- and 2-dimensional integrals, since
k(x, y)du (xu )dv (yv ) =
ki (xi , yi )
ki (xi , )di
ki (, yi )di
ki d(i i ).
iuv
iu\v
iv\u
i uv
/
(8)
By symmetry of k, Eq. (8) solely depends on the integrals ki d(i i ) and integral
functions t ki (, t)di , i = 1, . . . , d. We refer to Sect. 7 for explicit calculations
using typical ki s. A particularly convenient case is considered next.
Corollary 1 Let ki(0) : Di Di R (1 i d) be argumentwise centred, i.e.
such that ki(0) (, t)di = ki(0) (s, )di = 0 for all i I and s, t Di , and
d
consider k(x, y) = i=1 (1 + ki(0) (xi , yi )). Then the KANOVA decomposition of k
consists of the terms [Tu Tu ]k(x, y) = iu ki(0) (xi , yi ) and [Tu Tv ]k = 0 if
u
= v.
d
(1+ki(0) (xi , yi )), where ki0 are s.p.d., we recover
Remark 1 By taking k(x, y) = i=1
the so-called ANOVA kernels [6, 38, 39]. Corollary 1 guarantees for argumentwise
centred ki(0) (see, e.g., [6, Sect. 2]) that the associated k has a simple KANOVA
decomposition, with analytically tractable ku,u and vanishing ku,v terms (for u
= v),
as also reported in [4] where a GRF model with this structure is postulated.
320
D. Ginsbourger et al.
and that Z has continuous sample paths. The latter can be guaranteed by a weak
condition on the covariance kernel; see [1], Theorem 1.4.1. For r N \ {0} write
Cb (D, Rr ) for the space of (bounded) continuous functions D Rr equipped
with the supremum norm, and set in particular Cb (D) = Cb (D, R). We reinterpret
Tu as maps Cb (D) Cb (D), which are still bounded linear operators, and set
Z x(u) = (Tu Z )x .
Theorem 2 The 2d -dimensional vector-valued random field (Z x(u) , u I )xD is
Gaussian, centred, and has continuous sample paths again. Its matrix-valued covariance function is given by
Cov(Z x(u) , Z y(v) ) = [Tu Tv ]k (x, y).
(9)
Example 3 Continuing from Example 1, let B = (Bx )x[0,1] be the Brownian motion
on D = [0, 1], which is a centred GRF with continuous paths. Theorem 2 yields that
1
1
(T B, T{1} B) = ( 0 Bu du, Bx 0 Bu du)xD is a bivariate random field on D, where
T B is a N (0, 1/3)-distributed random variable, while (T{1} Bx ) is a centred GRF
2
2
with covariance kernel k{1},{1} (x, y) = min(x, y) x + x2 y + y2 + 13 . The cross2
covariance function of the components is given by Cov(T B, T{1} Bx ) = x x2 13 .
Remark 2 Under our conditions on Z and using the notation
from the proof of
Theorem 1, we have a KarhunenLove expansion Z x = i=1
i i i (x), where
= (i )iN\{0} is a standard Gaussian white noise sequence and the series converges
uniformly (i.e. in Cb (D)) with probability 1 (and in L 2 ()); for d = 1 see [1, 18].
Thus by the continuity of Tu , we can expand the projected random field as
Z x(u)
= Tu
i i i (x) =
i=1
i i Tu (i ) (x),
(10)
i=1
where the series converges uniformly in x with probability 1 (and in L 2 ()). This
is the basis for an alternative proof of Theorem 2. We can also verify Eq. (9) under
2
these
Using
conditions.
the left/right-continuity of cov in L (), we obtain indeed
(u)
(v)
cov Z x , Z y = i=1 i Tu (i )(x) Tv (i )(y) = ku,v (x, y).
Corollary 2 (a) For any u I the following statements are equivalent:
(i)
(ii)
(iii)
(iv)
321
Remark 3 A consequence of Corollary 2 is that choosing a kernel without u component in GRF-based GSA will lead to a posterior distribution without u component
whatever the conditioning observations, i.e. P(Z (u) = 0 | Z x1 , . . . , Z xn ) = 1 (a.s.).
However, the analogous result does not hold for cross-covariances between Z (u)
and Z (v) for u
= v. Let us take for instance D = [0, 1], arbitrary, and Z t = U + Yt ,
where U N (0, 2 ) ( > 0) and (Yt ) is a centred Gaussian process with argumentwise centred covariance kernel k (0) . Assuming that U and Y are independent, it is
clear that (T Z )s = U and (T{1} Z )t = Yt , so Cov((T Z )s , (T{1} Z )t ) = 0. If in addition Z was observed at a point r D, Eq. (9) yields Cov((T Z )s , (T{1} Z )t |Z r ) =
(T T{1} )(k(,
) k(, r )k(r,
)/k(r, r ))(s, t), where k(s, t) = 2 + k (0) (s, t)
is the covariance kernel of Z . By Eq. (6) we obtain Cov((T Z )s , (T{1} Z )t |Z r ) =
2 k (0) (r, t)/( 2 + k (0) (r, r )), which in general is nonzero.
Remark 4 Coming back to the ANOVA kernels discussed in Remark 1, Corollary 2(b) implies that for a centred
continuous sample paths and covariance
d GRF with
kernel of the form k(x, y) = i=1
(1 + ki(0) (xi , yi )), where ki(0) is argumentwise centred, the FANOVA effects Z (u) , u I , are actually independent.
To close this section, let us finally touch upon the distribution of Sobol indices
of GRF sample paths, relying on Theorem 2 and Remark 2.
Corollary 3 For u I , u
= , we can represent the Sobol indices of Z as
Su (Z ) =
Q u (, )
,
v
= Q v (, )
,
where
g
=
i j Tu i , Tu j .
Su (Z ) = Su (Z ) = i,j=1 gi j i j
i=1 i i
ij
Truncating both series above at K N, applying the theorem in Sect. 2 of [29] and
then Lebesgues theorem for K , we obtain
ESu (Z ) =
gii
i=1
ESu (Z )2 =
(1 + 2i t)3/2
(gii g j j + 2gi j 2 )
i=1 j=1
(1 + 2l t)1/2
1
dt,
l
=i
1
t (1 + 2i t)3/2
(1 + 2l t)1/2
dt.
l {i,
/ j}
322
D. Ginsbourger et al.
(11)
From Theorem 1(b), and also from the fact that knew is the covariance function of
Z (u) where Z is a centred GRF with covariance function kold , it is clear that such
kernels are s.p.d.; however, they will generally not be strictly positive definite.
d
Going one step further, one obtains a richer class of 22 symmetric positive definite
kernels by considering parts of P(I ), and designing kernels accordingly. Taking
U P(I ), we obtain a further class of projected kernels as follows:
knew = U kold with U = TU TU =
Tu Tv , where TU =
uU vU
Tu . (12)
uU
The resulting kernel is again s.p.d., which follows from Theorem 1(b) by choosing
u = 1 if u U and u = 0 otherwise, or again by noting that knew is the covariance
function of uU Z (u) where Z is a centred GRF with covariance function kold .
Such a kernel contains not only the covariances of the effects associated with the
different subsets of U , but also cross-covariances between these effects. Finally,
another relevant class of positive definite projected kernels can be designed by taking
knew = U
kold with U
=
Tu Tu .
(13)
uU
This kernel corresponds to the one of a sum of independent random fields with same
individual distributions as the Z (u) (u U ). In addition, projectors of the form
323
U1 ,U
2 (U1 , U2 P(I )) can be combined (e.g. by sums or convex combinations)
in order to generate a large class of s.p.d. kernels, as illustrated here and in Sect. 6.
Example 4 Let us consider A = {, {1}, {2}, . . . , {d}} and O, the complement of A
in P(I ). While A corresponds to the constant and main effects forming the additive component in the FANOVA decomposition, O corresponds to all higher-order
terms, referred to as ortho-additive component in [21]. Taking A k = (T A T A )k
amounts to extracting the additive component of k with cross-covariances between
the various
main effects (including the constant); see Fig. 1(c). On the other hand,
A
k = uA u k retains these main effects without their possible cross-covariances;
see Fig. 1(b). In the next theorem (proven in [21]), analytical formulae are given for
A k and related terms for the class of tensor product kernels.
d
Theorem 3 Let Di = [ai , bi ] (ai < bi ) and k = i=1
ki , where the ki are s.p.d.
kernels on Di such that ki (xi , yi ) > 0 for all xi , yi Di . Then, the additive and
ortho-additive components of k with their cross-covariances are given by
( A k)(x, y) =
a(x)a(y)
E
+E
ki (xi , yi )
i=1
Ei
E i (xi )E i (yi )
Ei2
d
k j (x j , y j )
(TO T A k)(x, y) = (T A TO k)(y, x) = E(x) 1 d +
( A k)(x, y)
E j (x j )
j=1
b
b
d
where E i (xi ) = ai i ki (xi , yi ) dyi , E(x) = i=1
E i (xi ), Ei = ai i E i (xi )i (dxi ),
d Ei (xi )
d
.
Ei , and a(x) = E 1 d + i=1
E = i=1
Ei
6 Numerical Experiments
We consider 30-dimensional numerical experiments where we compare the prediction abilities of sparse kernels obtained from the KANOVA decomposition of
k(x, y) = exp(||x y||2 ),
x, y [0, 1]30 .
(14)
k A
= |u|1 u k
kinter = |u|2 u k
k A
+O = k A
+ O k
ksparse = ( + {1} + {2} + {2,3} + {4,5} )k.
(15)
324
D. Ginsbourger et al.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
Fig. 1 Schematic representations of a reference kernel k and various projections or sums of projections. The expressions of these kernels are detailed in Sect. 6 (Eq. 15)
A schematic representation of these kernels can be found in Fig. 1. Note that the
tensor product structure of k allows to use Theorem 3 in order to get more tractable
expressions for all kernels above. Furthermore, the integrals appearing in the E i and
Ei terms can be calculated analytically as detailed in appendix.
We now compare kriging predictions based on paths simulated from centred GRFs,
selecting any combination of two of the kernels in Fig. 1 and using one for simulation
(generating kernel) and one for prediction (prediction kernel). Each prediction
is performed at n test = 200 locations based on observations of an individual path
at n train = 500 locations. We judge the performance of the prediction by averaging
over n path = 200 sample paths for each combination of kernels. Whenever the kernel
used for prediction is not the same as the one used for simulation, a Gaussian observation noise with variance 2 is assumed in the models used in prediction, where
2 is chosen so as to reflect the part of variance that cannot be approximated by the
model. For simplicity, only one n train -point training set and one n test -point test set
are considered for the whole experiment. For both, design points are chosen by maximizing the minimal interpoint distance among random Latin hypercube designs [28]
using DiceDesign [7, 11]. For each path y
(
= 1, . . . , n path ), the criterion used for
quantifying prediction accuracy is:
n test
(y
,i y
,i )2
C
= 1 i=1n test 2
i=1 y
,i
(16)
where y
,i and y
,i are the actual and predicted values of the
th path at the i th
test point. While C
= 1 means a null prediction error, C
= 0 means that y
predicts as badly as the null function. Average values of C
are summarized in
325
Z inter
Z A
ZA
Z sparse
Mean
kfull
kdiag
k A
+O
k A+O
kinter
k A
kA
ksparse
0.06
0.05
0.05
0.06
0.33
0.67
0.69
0.75
0.33
0.05
0.05
0.04
0.06
0.37
0.76
0.77
0.83
0.37
0.06
0.05
0.05
0.06
0.34
0.71
0.71
0.8
0.35
0.05
0.05
0.04
0.06
0.37
0.75
0.77
0.78
0.36
0.05
0.04
0.04
0.05
0.7
0.96
0.96
0.95
0.47
0.03
0.03
0.03
0.04
0.28
1
1
0.9
0.41
0.04
0.03
0.03
0.04
0.28
1
1
0.9
0.42
0.01
0.01
0.01
0.01
0.07
0.2
0.18
1
0.19
Rows correspond to generating GRF models (characterized by generating kernels) while columns
correspond to prediction kernels. The four last rows of the kinter column are in bold blue to highlight
the superior performances of that prediction kernel when the class of generating GRF models is as
sparse or sparser than Z inter
Table 1 for all couples of generating versus prediction kernel. Note that Table 1 was
slightly perturbed but the conclusions unchanged when replicating the training and
test designs.
First, this example illustrates that, unless the correlation range is increased, predicting a GRF based on 500 points in dimension 30 is hopeless when the generating
kernel is full or close to full (first four rows of Table 1) no matter what prediction
kernel is chosen. However, for GRFs with a sparser generating kernel, prediction
performances are strongly increased (last four rows of Table 1).
Second, still focusing on the four last lines of Table 1, kinter seems to offer a
nice compromise as it works much better than other prediction kernels on Z inter and
achieves very good performances on sample paths of sparser GRFs. Besides this, it
is not doing notably worse than the best prediction kernels on rows 14.
Third, neglecting cross-correlations has very little or no influence on the results,
so that the Gaussian kernel appears to have a structure relatively close to what we
refer to as diagonal (diag) here. This point remains to be studied analytically.
326
D. Ginsbourger et al.
ortho-additive kernels extracted from tensor product kernels, for which a closed form
formula was given. Besides this, a 30-dimensional numerical experiment supports
our claim that KANOVA may be a useful approach to designing kernels for highdimensional kriging, as the performances of the interaction kernel suggest. Perspectives include analytically calculating the norm of terms appearing in the KANOVA
decomposition to better understand the structure of common GRF models. From a
practical point of view, a next challenge will be to parametrize decomposed kernels
adequately so as to recover from data which terms of the FANOVA decomposition
are dominating and to automatically design adapted kernels from this.
Acknowledgments The authors would like to thank Dario Azzimonti for proofreading, as well as
the editors and an anonymous referee for their valuable comments and suggestions.
Proofs
Theorem 1 (a) The first part and the concrete solution (6) follow directly from the
corresponding statements in Sect. 2. Having established (6), it is easily seen that
[Tu Tv ]k = Tu(1) Tv(2) k coincides with ku,v .
(b) Under these conditions Mercers theorem applies (see [34] for an overview and
recent extensions). So there exist a non-negative sequence (i )iN\{0} , and continuous
representatives (i )iN\{0} of an orthonormal basis of L2 () such that k(x, y) =
u v ku,v (x, y) =
i i (x)i (y),
(17)
i=1
where i = uI u (Tu i ). Thus the considered function is indeed s.p.d.
d
Corollary 1 Expand the product l=1
(1 + kl(0) (xl , yl )) and conclude by unique
(0)
ness of the KANOVA decomposition, noting that
lu kl (xl , yl )i (dx i ) =
(0)
lu kl (xl , yl ) j (dy j ) = 0 for any u I and any i, j u.
Theorem 2 Sample path continuity implies product-measurability of Z and Z (u) ,
respectively, as can be shown by an approximation argument; see e.g. Prop. A.D.
kernel k is continuous, hence
in [31]. Due to Theorem 3 in [35], the covariance
1/2
E|Z
|
(dx
)
(
k(x,
x)
(dx
))
<
for any u I and by
x u
u
u
u
D
D
CauchySchwarz D D E|Z x Z y | u (dxu )v (dyv ) < for any u, v I .
Replacing f by Z in Formula (2), taking expectations and using Fubinis theorem
yields that Z (u) is centred again. Combining (2), Fubinis theorem, and (6) yields
u u
327
Z y v (dyv )
Z x u (dxu ),
v v
(19)
328
D. Ginsbourger et al.
Additional Examples
Here we give useful expressions to compute the KANOVA decomposition of some
tensor product kernels with respect to the uniform measure on [0, 1]d . For simplicity
we denote the 1-dimensional kernels on which they are based by k (corresponding
to the notation ki in Example 2). The uniform measure on [0, 1] is denoted by .
, then:
Example 5 (Exponential kernel) If k(x, y) = exp |xy|
1
0 k(., y)d = [2 k(0, y) k(1, y)]
[0,1]2 k(., .)d( ) = 2 (1 + e1/ )
Example 6 (Matrn kernel, = p + 21 ) Define for = p +
p! ( p + i)!
k(x, y) =
(2 p)! i=0 i!( p i)!
p
Then, denoting p =
1
0
,
2
|x y|
/ 8
pi
1
2
( p N):
|x y|
.
exp
/ 2
we have:
y
1y
p!
Ap
,
2c p,0 A p
k(., y)d = p
(2 p)!
p
p
p
u
p
where A p (u) =
with c p,
=
!1 i=0 ( p+i)!
2 pi . This generalizes
=0 c p,
u e
i!
Example 5, corresponding to = 1/2. Also, this result can be
written more explicitly
for the commonly selected value = 3/2 ( p = 1, 1 = / 3):
exp |xy|
k(x, y) = 1 + |xy|
1
1
1
0 k(., y)d = 1 4 A1 y1 A1 1y
with A1 (u) = (2 + u)eu
1
[0,1]2 k(., .)d( ) = 21 2 31 + (1 + 31 )e1/1
1
0 k(., y)d = 2 1y
+ y 1
2
[0,1]2 k(., .)d( ) = 2(e1/(2 ) 1) + 2 2 1 1
where denotes the cdf of the standard normal distribution.
329
References
1. Adler, R., Taylor, J.: Random Fields and Geometry. Springer, Boston (2007)
2. Antoniadis, A.: Analysis of variance on function spaces. Statistics 15, 5971 (1984)
3. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, Boston (2004)
4. Chastaing, G., Le Gratiet, L.: ANOVA decomposition of conditional Gaussian processes for
sensitivity analysis with dependent inputs. J. Stat. Comput. Simul. 85(11), 21642186 (2015)
5. Durrande, N., Ginsbourger, D., Roustant, O.: Additive covariance kernels for high-dimensional
Gaussian process modeling. Ann. Fac. Sci. Toulous. Math. 21, 481499 (2012)
6. Durrande, N., Ginsbourger, D., Roustant, O., Carraro, L.: ANOVA kernels and RKHS of zero
mean functions for model-based sensitivity analysis. J. Multivar. Anal. 115, 5767 (2013)
7. Dupuy, D., Helbert, C., Franco, J.: DiceDesign and DiceEval: Two R packages for design and
analysis of computer experiments. J. Stat. Softw. 65(11): 138 (2015)
8. Duvenaud, D.: Automatic model construction with Gaussian processes. Ph.D. thesis, Department of Engineering, University of Cambridge (2014)
9. Duvenaud, D., Nickisch, H., Rasmussen, C.: Additive Gaussian Processes. NIPS conference.
(2011)
10. Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9, 586596 (1981)
11. Franco, J., Dupuy, D., Roustant, O., Damblin, G., Iooss, B.: DiceDesign: Designs of computer
experiments. R package version 1.7 (2015)
12. Gikhman, I.I., Skorokhod, A.V.: The theory of stochastic processes. Springer, Berlin (2004).
Translated from the Russian by S. Kotz, Reprint of the 1974 edition
13. Hoeffding, W.: A class of statistics with asymptotically normal distributions. Ann. Math. Stat.
19, 293325 (1948)
14. Jan, B., Bect, J., Vazquez, E., Lefranc, P.: approche baysienne pour lestimation dindices de
Sobol. In 45mes Journes de Statistique - JdS 2013. Toulouse, France (2013)
15. Janon, A., Klein, T., Lagnoux, A., Nodet, M., Prieur, C.: Asymptotic Normality and Efficiency
of Two Sobol Index Estimators. Probability And Statistics, ESAIM (2013)
16. Kaufman, C., Sain, S.: Bayesian functional ANOVA modeling using Gaussian process prior
distributions. Bayesian Anal. 5, 123150 (2010)
17. Kre, P.: Produits tensoriels complts despaces de Hilbert. Sminaire Paul Kre Vol 1, No. 7
(19741975)
18. Kuelbs, J.: Expansions of vectors in a Banach space related to Gaussian measures. Proc. Am.
Math. Soc. 27(2), 364370 (1971)
19. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Wozniakowski, H.: On decompositions of multivariate functions. Math. Comput. 79, 953966 (2010)
20. Le Gratiet, L., Cannamela, C., Iooss, B.: A Bayesian approach for global sensitivity analysis
of (multi-fidelity) computer codes. SIAM/ASA J. Uncertain. Quantif. 2(1), 336363 (2014)
21. Lenz, N.: Additivity and ortho-additivity in Gaussian random fields. Masters thesis, Departement of Mathematics and Statistics, University of Bern (2013). http://hal.archives-ouvertes.fr/
hal-01063741
22. Marrel, A., Iooss, B., Laurent, B., Roustant, O.: Calculations of Sobol indices for the Gaussian
process metamodel. Reliab. Eng. Syst. Saf. 94, 742751 (2009)
23. Muehlenstaedt, T., Roustant, O., Carraro, L., Kuhnt, S.: Data-driven Kriging models based on
FANOVA-decomposition. Stat. Comput. 22(3), 723738 (2012)
24. Oakley, J., OHagan, A.: Probabilistic sensitivity analysis of complex models: a Bayesian
approach. J. R. Stat. Soc. 66, 751769 (2004)
25. Rajput, B.S., Cambanis, S.: Gaussian processes and Gaussian measures. Ann. Math. Stat. 43,
19441952 (1972)
26. Rasmussen, C.R., Williams, C.K.I.: Gaussian Processes for Machine Learning. Cambridge,
MIT Press (2006)
27. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global sensitivity analysis: the primer. Wiley Online Library (2008)
330
D. Ginsbourger et al.
28. Santner, T., Williams, B., Notz, W.: The design and analysis of computer experiments. Springer,
New York (2003)
29. Sawa, T.: The exact moments of the least squares estimator for the autoregressive model. J.
Econom. 8(2), 159172 (1978)
30. Scheuerer, M.: A comparison of models and methods for spatial interpolation in statistics and
numerical analysis. Ph.D. thesis, Georg-August-Universitt Gttingen (2009)
31. Schuhmacher, D.: Distance estimates for poisson process approximations of dependent thinnings. Electron. J. Probab. 10(5), 165201 (2005)
32. Sobol, I.: Multidimensional Quadrature Formulas and Haar Functions. Nauka, Moscow
(1969). (In Russian)
33. Sobol, I.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo
estimates. Math. Comput. Simul. 55(13), 271280 (2001)
34. Steinwart, I., Scovel, C.: Mercers theorem on general domains: on the interaction between
measures, kernels, and RKHSs. Constr. Approx. 35(3), 363417 (2012)
35. Talagrand, M.: Regularity of Gaussian processes. Acta Math. 159(12), 99149 (1987)
36. Tarieladze, V., Vakhania, N.: Disintegration of Gaussian measures and average-case optimal
algorithms. J. Complex. 23(46), 851866 (2007)
37. Touzani, S.: Response surface methods based on analysis of variance expansion for sensitivity
analysis. Ph.D. thesis, Universit de Grenoble (2011)
38. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
39. Wahba, G.: Spline Models for Observational Data. Siam, Philadelphia (1990)
40. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting, and computer experiments. Technometrics 34, 1525 (1992)
T. Goda
Graduate School of Engineering, The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
e-mail: goda@frcer.t.u-tokyo.ac.jp
R. Ohori
Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki,
Kanagawa 211-8588, Japan
e-mail: ohori.ryuichi@jp.fujitsu.com
K. Suzuki (B) T. Yoshiki
School of Mathematics and Statistics, The University of New South Wales,
Sydney, NSW 2052, Australia
e-mail: kosuke.suzuki1@unsw.edu.au
K. Suzuki T. Yoshiki
Graduate School of Mathematical Sciences, The University of Tokyo,
3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
e-mail: takehito.yoshiki1@unsw.edu.au
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_16
331
332
T. Goda et al.
1 Introduction
Quasi-Monte Carlo (QMC) integration is one of the well-known methods for
high-dimensional numerical integration [5, 11]. Let P be a point set in the
s-dimensional unit cube [0, 1)s with finite cardinality |P|, and f : [0, 1)s R a
Riemann integrable
function. The QMC integration by P gives
an approximation
of I ( f ) := [0,1)s f (x) d x by the average IP ( f ) := |P|1 xP f (x).
Let Zb = Z/bZ be the residue class ring modulo b, which is identified with the set
the set of s n matrices over Zb for a positive integer n. The
{0, . . . , b 1}, and Zsn
b
is
an
additive
group
with respect to the operation +, the usual summation
set Zsn
b
of matrices over Zb . As QMC point sets, we consider digital nets defined as follows.
Definition 1 Let m, n
be positive integers. Let 0 k bm 1 be an integer with
m
i bi1 . Let Ci Znm
. For 1 i s and 1 j n,
b-adic expansion k = i=1
b
define yi, j,k Zb by (yi,1,k, , . . . , yi,n,k ) = Ci (1 , . . . , m ) . Then we define
xi,k =
yi,2,k
yi,1,k
yi,n,k
+ 2 + + n [0, 1)
b
b
b
for 1 i s. In this way we obtain the k-th point x k = (x1,k , . . . , xs,k ). We call the
set P := {x 0 , . . . , x bm 1 } (P is considered as a multiset) a digital net over Zb with
precision n generated by C1 , . . . , Cs , or simply a digital net.
Recently, the discretization f n of a function f : [0, 1)s R has been introduced
to analyze QMC integration in the framework of digital computation [9]. We define
R by
the n-digit discretization f n : Zsn
b
f n (X ) :=
1
Vol(In (X ))
In (X )
f (x) d x,
s n
xi, j b j , nj=1 xi, j b j + bn ).
for X = (xi, j ) Zsn
i=1 [
j=1
b . Here In (X ) :=
We denote the true integral of f n by I ( f n ) := bsn X Zbsn f n (X ), which indeed
s
equals I ( f ). Define a function : Zsn
[0, 1)s by (X ) := ( nj=1 xi, j b j )i=1
b
sn
for X = (xi, j ) Zb , where xi, j is considered to be an integer and the sum is taken
in R. Then it is easy to check that for any digital net P there exists a subgroup
such that P = (P). Thus, in discretized setting, our main concern is
P Zsn
b
is a subgroup. By abuse of terminology, a subgroup of Zsn
the case that P Zsn
b
b
is also called a digital net in this paper.
In [9], Matsumoto, Saito and Matoba
treat the QMC integration of the n-th discrete approximation I P ( f n ) := |P|1 X P f n (X ) for b = 2. They consider the discretized integration error Err( f n ; P) := I P ( f n ) I ( f n ) instead of the usual integration error Err( f ; (P)) := I(P) ( f ) I ( f ). The difference between them, which
is equal to I(P) ( f ) I P ( f n ), is called the discretization error and bounded by
sup X Zbsn , xIn (X ) | f (x) f n (X )|. If f is continuous with Lipschitz constant K ,
333
practice (say n = 30) [9, Lemma 2.1]. Hence, in this case, we have Err( f n ; P)
Err( f ; (P)), which is a part of their setting we adopt.
Assume that f : [0, 1)s R is a function whose mixed partial derivatives up to
order n in each variable are continuous and P Zsn
is a subgroup. Matsumoto
b
et al. [9] proved the KoksmaHlawka type inequality for Err( f n ; P);
|Err( f n ; P)| Cb,s,n || f ||n WAFOM(P),
(1)
Z
.
Here
is chosen uniformly and randomly. Randomized
P Zsn
b
b
QMC integration by P + of the n-digit discretization f n gives the approximation
I P+ ( f n ) of I ( f n ). By adding a random element , it becomes possible to obtain
some statistical estimate on the integration error. Such an estimate is not available
for deterministic digital nets.
We note that randomized QMC integration using digitally shifted digital nets has
already been studied in previous works, see for instance [1, 7] among many others,
where a digital shift is chosen from [0, 1)s and the QMC integration using P
is considered to give the approximation of I ( f ). Here denotes digitwise addition
modulo b applied componentwise. It is known that the estimator IP ( f ) is an
unbiased estimator of I ( f ), so that the mean square QMC error for a function f
with respect to [0, 1)s equals the variance of the estimator.
In the n-digit discretized setting which we consider in this paper, it is also possible to show that the estimator I P+ ( f n ) is an unbiased estimator of I ( f n ), so that
equals the
the mean square QMC error for a function f n with respect to Zsn
b
variance of the estimator, see Proposition 2. For our case, where the discretization
error is negligible, we also have Var [0,1)s [I(P) ( f )] Var Zbsn [I(P+) ( f )]
Var Zbsn [I P+ ( f n )].
The variance Var Zbsn [I(P+) ( f )] is for practical computation where each
real number in [0, 1) is represented as a finite-digit binary fraction. The estima-
334
T. Goda et al.
tor I(P+) ( f ) of I ( f ) has so small a bias that the variance Var Zbsn [I(P+) ( f )] is
a good approximation of the mean square error EZbsn [(I(P+) ( f ) I ( f ))2 ].
From the above justifications of the n-digit discretization for digitally shifted
point sets, we focus on analyzing the variance Var Zbsn [I P+ ( f n )] of the estimator
I P+ ( f n ). As the main result of this paper, in Sect. 4 below, we give a Koksma
Hlawka type inequality to bound the variance:
Var Zbsn [I P+ ( f n )] Cb,s,n f n W (P; ),
(2)
where Cb,s,n and f n are the same as in (1), denotes the Dick weight defined
later in Definition 3, and W (P; ) is a quantity which depends only on P and can
be computed in O(sn|P|) steps. Thus, similarly to WAFOM(P), W (P; ) can be a
useful measure for the quality of digital nets.
The remainder of this paper is organized as follows. We give some preliminaries
in Sect. 2. In Sect. 3, we consider the randomized QMC integration over Zsn
b . For
R, a subgroup P Zsn
and an element Zsn
a function F : Zsn
b
b
b , we first
prove the unbiasedness of the estimator I P+ (F) as mentioned above, and then that
the variance Var Zbsn [I P+ (F)] can be written in terms of the discrete Fourier coefficients of F, see Theorem 2. In Sect. 4, we apply a bound on the Walsh coefficients
for sufficiently smooth functions to the variance Var Zbsn [I P+ ( f n )], and obtain a
quality measure W (P; ) which satisfies a KoksmaHlawka type inequality on the
root mean square error. By using the MacWilliams-type identity given in [13], we
give a computable formula for W (P; ) in Sect. 5. Finally, in Sect. 6, we conduct
two types of experiments to show that our new quality measure is of use for finding
digital nets which show good convergence behavior of the root mean square error
for smooth integrands.
2 Preliminaries
Throughout this paper, we use the following notation. Let N be the set of positive
integers and N0 := N {0}. For a set S, we denote by |S| the cardinality
of S. For
z C, we denote by z the complex conjugate of z. Let b = exp(2 1/b).
In the following, we recall the notion of the discrete Fourier transform and see
the correspondence of discrete Fourier coefficients to Walsh coefficients.
hg
define the pairing as g h := b . We also define the pairing
For g, h Zb , we
sn
with
on Zb as A B := 1is,1 jn ai j bi j for A = (ai j ) and B = (bi j ) in Zsn
b
1 i s, 1 j n. We note the following properties used in this paper:
A B = (A B)1 = (A) B and A (B + C) = (A B)(A C).
We now define the discrete Fourier transform.
335
C,
is
defined
by
f
(A)
=
b
f : Zsn
BZb
b
b .
Each value f (A) is called a discrete Fourier coefficient.
We assume that P Zsn
is a digital net. We define the dual net of P as
b
|
A
B
=
1
for
all B P}. Several important properties of the
P := {A Zsn
b
discrete Fourier transform are summarized below (for a proof, see [13] for example).
Lemma 1 We have
AB =
AZbsn
bsn if B = 0,
0 if B = 0.
C be a function and
Theorem 1 (Poisson summation formula) Let f : Zsn
b
C
its
discrete
Fourier
transform.
Then
we
have
f : Zsn
b
1
f (B) =
f (A).
|P| BP
AP
Walsh functions and Walsh coefficients are widely used to analyze QMC integration using digital nets, and are defined as follows. Let f : [0, 1)s R and
k = (k1 , . . . , ks ) Ns0 . We define the k-th Walsh function wal k by
wal k (x) :=
j1
i, j i, j
i=1
where for 1 i s, we write the b-adic expansion of ki by ki = j1 i, j b j1
and xi by xi = j1 i, j b j , where for each i, infinitely many of the digits i, j are
different from b 1. By using Walsh functions, we define the k-th Walsh coefficient
F ( f )(k);
F ( f )(k) :=
[0,1)s
We refer to [5, Appendix A] for general information on Walsh functions. We denote the kth Walsh coefficient of f by F ( f )(k), while it is denoted by
f (k)
in [5, Appendix A]. The relationship between Walsh coefficients and discrete
Fourier coefficients is stated in the following proposition (for a proof, see [13,
sn
Ns0 by
Lemma 2]).
(ai, j ) Zsn
b . We define the function : Zb
nLet A = j1
sn
s
(A) := ( j=1 ai, j b )i=1 for A = (ai, j ) Zb . Note that each element of
(A) is strictly less than bn .
Proposition 1 Let A = (ai, j ) Zsn
and assume that f : [0, 1)s R is integrable.
b
Then we have
F ( f )((A)) =
f n (A).
336
T. Goda et al.
I P+ (F) = bsn
Zbsn
Zbsn
1
1 sn
F(B + ) =
b
|P|
|P|
BP
BP
1
I (F) = I (F),
|P|
F(B + )
Zbsn
BP
and thus we have the following proposition, showing that randomized QMC integration using a digitally shifted point set P + gives an unbiased estimator I P+ (F)
of I (F).
Proposition 2 For an arbitrary subset P Zsn
b , we have
EZbsn [I P+ (F)] = I (F).
It follows from this proposition that the mean square QMC error equals the variance Var Zbsn [I P+ (F)], namely we have
EZbsn [(I P+ (F) I (F))2 ] = Var Zbsn [I P+ (F)].
is a subgroup of Zsn
Hereafter we assume that P Zsn
b
b .
Lemma 2 Let P Zsn
be a subgroup. Then we have
b
I P+ (F) =
(A )1 F(A).
AP
Proof Let F (B) := F(B + ). Then for A Zsn
b , we can calculate F (A) as
F (A) = bsn
F (B)(A B)
BZbsn
= (A ())bsn
BZbsn
= (A )1 F(A),
F(B + )(A (B + ))
337
where we use the definition of F(A)
in the last equality. Thus by Theorem 1 we have
I P+ (F) =
1
F (B) =
(A )1 F(A),
F (A) =
|P| BP
AP
AP
Zbsn
= bsn
|I P+ (F) I (F)|2
Zbsn
2
sn
1
=b
(A ) F(A)
Zbsn AP \{0}
)
= bsn
(A ) F(A)
(A ) F(A
Zbsn AP \{0}
AP \{0}
A P \{0}
= bsn
=
A P \{0}
F(A
)
F(A)
((A A) )
Zbsn
F(A)
2 ,
AP \{0}
F(A)
2 .
AP \{0}
In particular, we immediately obtain the following corollary for the most important
case.
be a subgroup, i.e., a digital net over Zb , and f n be the
Corollary 1 Let P Zsn
b
n-digit discretization of f : [0, 1)s R. Then we have
Var Zbsn [I P+ ( f n )] =
2
f n (A) .
AP \{0}
Our results obtained in this section can be regarded as the discretized version of
known results [1, 7].
338
T. Goda et al.
(A) :=
j (ai, j ),
1is
1 jn
(3)
f n :=
uS S\u {0,...,n1}s|u|
|u|
[0,1]
[0,1]s|u|
1/2
2
f ( S\u ,nu ) (x) d x S\u d x u ,
where we used the following notation: Let S := {1, . . . , s}, x = (x1 , . . . , xs ), and for
u S let x u = (x j ) ju . ( S\u , nu ) denotes a sequence ( j ) j with j = n for j u
/ u. Moreover, we write f (n 1 ,...,n s ) = n 1 ++n s f /x1n 1 xsn s .
and j = j for j
Another upper bound on the Walsh coefficients of f has been shown by Yoshiki
[14] for b = 2. Applying Proposition 1, we also have the following;
Lemma 4 (Yoshiki) Let f : [0, 1]s R and define Ni := |{ j = 1, . . . , n | ai, j =
0}| and N := (Ni )1is Ns0 for A = (ai, j ) Zsn
2 . If the Nth mixed partial derivN1
(N)
N1 ++Ns
Ns
=
f /x1 xs of f exists and is continuous, then we have
ative f
f n (A) f (N) 2((A)+h(A)) ,
where h(A) :=
i, j
(4)
339
b2(A) ,
AP \{0}
W (P; + h) :=
b2((A)+h(A)) .
AP \{0}
(N)
Var Z2sn [I P+ ( f n )] max f W (P; + h)
0Nn
N=0
holds where the condition for the maximum is denoted by a multi-index, i.e., the
maximum value is taken over N = (N1 , . . . , Ns ) such that 0 Ni n for all i and
Ni = 0 for some i.
Proof Since the proofs of these inequalities are almost identical, we only show the
latter. Apply Lemma
4 to each term in the right-hand side of the result in Corollary 1.
For the factor f (N) , note that N depends only on A, that A runs through all
non-zero elements of P , and that Ni n for all i. Then we have
Var Zbsn [I P+ ( f n )]
AP \{0}
2
(N)
max f 22((A)+h(A))
0Nn
N=0
340
T. Goda et al.
(A) =
i, j (ai, j ),
1is
1 jn
Note that the Dick weight is given by i, j = j and the Hamming weight h is given
by i, j = 1. The key to the formula [9, (4.2)] for WAFOM is the discrete Fourier
transform. In order to obtain a formula for W (P; ), we use a MacWilliams-type
identity [13], which is also based on the discrete Fourier transform.
Let X := {xi, j (l)} be a set of indeterminates for 1 i s, 1 j n, and l Zb .
The complete weight enumerator polynomial of P , in a standard sense [8, Chap. 5],
is defined by
xi, j (ai, j ).
GW P (X ) :=
AP 1is
1 jn
xi, j (bi, j ),
BP 1is
1 jn
341
GW P (X ) =
1
GW P (Z ),
|P|
(5)
where in the right hand side every xi, j (g) X is substituted by z i, j (g) Z , which
is defined by
z i, j (g) :=
(l g)xi, j (l).
lZb
By substituting Y into X for (5), we have the following result. Since the result
follows in the same way as in [13, Corollary 2], we omit the proof.
be a subgroup. Then we have
Theorem 4 Let P Zsn
b
1
W (P; ) = 1 +
(1 + (bi, j )b2i, j ),
|P| BP 1is
1 jn
W (P; ) = 1 +
(1 + (bi, j )b2 j ),
|P| BP 1is
1 jn
1
W (P; + h) = 1 +
(1 + (bi, j )b2( j+1) ),
|P| BP 1is
1 jn
6 Numerical Experiments
To show that W works as a useful bound on the root mean square error we conduct
two types of experiments. The first one is to generate many point sets at random, and
342
T. Goda et al.
to observe the distribution of the criterion W and the standard deviation E . The other
one is to search for low-W point sets and to compare with digital nets consisting of
the first terms of a known low-discrepancy sequence.
In this section we consider only the case b = 2. The dimension of a digital net P
is denoted by m, i.e., |P| = 2m . We set s = 4, 12 and
as a subvector space of Zsn
2
use the following eight test functions for x = (xi )1is :
Polynomial
f 0 (x) = ( i xi
)6 ,
Exponential
f j (x) = exp(a
i xi ) (a = 2/3 for j = 1 and a = 3/2 for j = 2),
Oscillatory
f 3 (x) = cos( i xi ),
exp( i xi2 ),
Gaussian
f 4 (x) =
Product peak
f 5 (x) = i (xi2 + 1)1 ,
Continuous
f 6 (x) = i T (xi ) where T (x) = minkZ |3x 2k|,
Discontinuous f 7 (x) = i C(xi ) where C(x) = (1)3x .
Assuming that the discretization error is negligible, we have that I(P+) ( f ) is a
practically unbiased
estimator of I ( f ). Thus we may say that if the standard deviation E ( f ; P) := Var Z2sn [I(P+) ( f )] of the quasi-Monte Carlo integration is
small then the root mean square error EZ2sn [(I(P+) ( f ) I ( f ))2 ] is as small as
E (
f ; P). From the same assumption we also have that E ( f ; P) is well approximated
by Var Z2sn [I P+ ( f n )], on which we have a bound in Theorem 3.
In this section we implicitly use the weight + h so W (P) denotes W (P; + h).
The aim of the experiments is to establish that if W (P) is small then so is E ( f ; P).
For this we
compute W by the inversion formula in Corollary 2 and approximate
uniE ( f ; P) = Var Z2sn [I(P+) ( f )] by sampling 210 digital shifts Zsn
2
formly, randomly and independently of each other. We shall observe both the criterion
W and the variance E in binary logarithm, which is denoted by lg.
343
0.9861
0.9907
0.9897
0.9794
0.9723
0.9421
0.3976
0.0220
0.9920
0.9901
0.9887
0.9818
0.9599
0.9144
0.3218
0.0102
12
12
0.9821
0.9842
0.9821
0.8900
0.9975
0.9912
0.4077
0.0208
0.9776
0.9866
0.9851
0.8916
0.9951
0.9839
0.3258
0.0171
2
3
4
5
6
+
7
+++
+
+
+
+
+
8
+
++ +
+
+
+
+
+
++
++
lg E 9
+
+
+
++
+
+
+
+
+
++
+
+
+
10
++
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
11
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
++
+
+
+
+
+
+
+
12
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
13
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
14
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
15 +
+
++
+
16
15 1413121110 9 8 7 6
+
+
+
+ +
+
5 4 3 2 1
lg W
+
++
+
9
10
lg E 11
12
13
14
15
16
10
+++
+ +
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
++
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+++
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
9
lg W
344
T. Goda et al.
13
14
lg E
15
16
8
+
+
+
+
+
+++
+
+ +
+
+
+
+
+ +
+
++ ++
+++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ++++
+
+
+
+
+
+
+
+++
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
++
+
+
+
+
+
+
+
+
++ +
+
+ ++
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
++
+
++ +
+++
+
+
+
++
+
+
+
+++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+ +
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++ ++
++++
++++
+
+
+
+
lg W
+ +
+
4
+ +
+++
+ + +
+
+
+ +
+ +
+
++
+
+
+ ++
+
+
+
+
+
+
+ +
+ +
+++
+ +
++ ++ +
++
+++
+
+
++
+
+
+ +
+
+ ++
++
+
++++
++
+ ++ +
+
+
+
+
+
+
+
+ +
+
+
++
+
+
++
+
++
+
+ ++
+ +
+
++
+
++
+
+
+
+
+
+
6
+
++
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
++
+
+
+
+
+ +
+ +++
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+ ++
+
+
+
+
+
+
+
+
+
+++
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ++ +
+
+++
+++
+
+
+
+
++
+
+
+
+
+
+
+
+
+
++
+
+
+
+
++++
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+++
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+ +
+
+
+
+
+
+ +
+
+
+
+
+
+
+
+
+
+++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ++
+
+
+
+
+
+
7 +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
++
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+
++
+
+
+
++ +
+ ++
+
+ + +
+
+
+++ + +
+
5
lg E
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ +
+
+ + +
+
191817161514131211109 8 7 6 5 4 3
lg W
345
W (P)
W (Q)
1/Tl
,1 .
6.3 Discussion
The first experiment shows that W works as a useful bound on E for some of the
functions tested above. The other experiment shows that point sets with low W
values are easy enough to find and perform better for smooth test functions, while
these point sets work as badly as the Niederreiter-Xing sequence for non-smooth or
discontinuous functions.
lg W (PNX )
lg W (P)
lg E ( f 0 ; PNX )
lg E ( f 0 ; P)
$ lg E ( f 1 ; PNX )
lg E ( f 1 ; P)
lg E ( f 2 ; PNX )
lg E ( f 2 ; P)
lg E ( f 3 ; PNX )
lg E ( f 3 ; P)
lg E ( f 4 ; PNX )
lg E ( f 4 ; P)
lg E ( f 5 ; PNX )
lg E ( f 5 ; P)
lg E ( f 6 ; PNX )
lg E ( f 6 ; P)
lg E ( f 7 ; PNX )
lg E ( f 7 ; P)
lg W (PNX )
lg W (P)
lg E ( f 0 ; PNX )
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
12
12
12
10.31
12.59
0.19
2.14
9.81
12.74
3.76
5.25
10.93
13.13
12.44
13.16
13.24
13.81
9.77
8.93
4.32
4.53
5.18
6.16
9.95
m=8
12.40
14.39
2.17
3.99
11.99
14.72
5.60
6.87
13.62
14.91
14.57
15.69
15.39
16.24
11.23
10.31
4.96
4.12
6.07
6.93
8.89
9
12.90
16.39
3.22
6.03
12.07
16.54
6.67
8.82
14.14
17.00
15.00
17.26
15.57
17.89
11.54
11.70
5.70
5.25
6.68
7.89
8.00
10
12.98
17.91
3.45
7.51
12.12
18.62
6.93
10.20
14.47
18.57
15.14
18.05
15.67
18.30
12.13
9.55
6.17
5.68
6.82
8.67
7.84
11
15.74
19.50
5.93
9.35
15.01
20.58
9.42
11.55
16.84
20.17
17.88
19.75
18.48
20.66
12.20
11.88
6.47
6.21
6.92
9.66
7.80
12
Table 2 Comparison between NiederreiterXing sequences (PNX ) and low-W point sets (P) in lg W and lg E .
13
15.77
21.82
5.98
11.95
15.00
23.09
9.50
13.51
16.84
22.40
17.97
21.43
18.55
21.79
14.57
14.85
6.65
7.40
6.98
10.73
7.76
14
15.77
23.67
5.94
13.63
14.98
24.82
9.46
15.34
16.86
24.28
17.95
24.32
18.55
25.12
15.92
15.56
8.06
7.05
11.52
11.67
1.39
15
(continued)
23.20
26.00
12.75
16.40
23.26
27.47
15.92
17.45
24.03
27.04
25.30
24.46
26.47
24.66
17.60
17.19
9.22
8.84
12.01
12.64
0.09
346
T. Goda et al.
lg E ( f 0 ; P)
lg E ( f 1 ; PNX )
lg E ( f 1 ; P)
lg E ( f 2 ; PNX )
lg E ( f 2 ; P)
lg E ( f 3 ; PNX )
lg E ( f 3 ; P)
lg E ( f 4 ; PNX )
lg E ( f 4 ; P)
lg E ( f 5 ; PNX )
lg E ( f 5 ; P)
lg E ( f 6 ; PNX )
lg E ( f 6 ; P)
lg E ( f 7 ; PNX )
lg E ( f 7 ; P)
12
12
12
12
12
12
12
12
12
12
12
12
12
12
12
Table 2 (continued)
8.09
0.57
2.20
11.02
10.58
6.14
7.18
10.56
11.54
10.69
12.00
13.64
13.87
4.06
4.00
m=8
7.19
1.60
3.05
10.20
9.91
7.34
8.01
11.52
12.36
11.70
12.86
14.31
14.16
4.45
4.51
9
6.05
2.43
4.12
9.77
9.07
8.32
9.01
12.07
13.28
12.33
13.97
14.93
14.83
4.93
5.00
10
4.98
2.60
5.07
9.54
8.53
8.64
10.16
12.27
14.09
12.62
14.86
15.65
15.48
5.50
5.52
11
4.15
2.64
5.97
9.40
7.53
8.97
10.78
12.39
14.82
12.70
15.17
16.11
15.97
6.01
5.96
12
2.46
2.69
7.35
9.25
6.80
9.27
11.86
12.41
16.17
12.71
16.99
16.62
16.45
6.48
6.50
13
1.49
8.27
8.36
6.00
5.84
12.74
12.90
16.99
17.10
18.09
17.90
17.10
17.30
7.02
6.95
14
0.31
8.99
9.61
5.45
5.18
13.51
13.76
17.47
18.20
18.69
19.34
17.54
18.09
7.48
7.52
15
348
Fig. 5 W values for s = 4
T. Goda et al.
10
11
12
13
14
15
16
17
lg W 18
19
20
21
22
23
24
25
26
Niederreiter-Xing sequence
Low-W digital nets
+
+
10
11
12
13
14
15
dimension/F2
10
11
12
13
lg W
Niederreiter-Xing sequence
Low-W digital nets
10
11
12
13
14
15
dimension/F2
13
14
15
16
17
18
19
lg E 20
21
22
23
24
25
26
27
Niederreiter-Xing sequence
Low-W digital nets
+
+
10
11
12
dimension/F2
13
14
15
349
+
+
Niederreiter-Xing sequence
Low-W digital nets
lg E 6
8
8
10
11
12
13
14
15
dimension/F2
Acknowledgments The authors would like to thank Prof. Makoto Matsumoto for helpful discussions and comments. The work of T.G. was supported by Grant-in-Aid for JSPS Fellows No.24-4020.
The works of R.O., K.S. and T.Y. were supported by the Program for Leading Graduate Schools,
MEXT, Japan. The work of K.S. was partially supported by Grant-in-Aid for JSPS Fellows Grant
number 15J05380.
References
1. Baldeaux, J., Dick, J.: QMC rules of arbitrary high order: reproducing kernel Hilbert space
approach. Constr. Approx. 30(3), 495527 (2009)
2. Dick, J.: Walsh spaces containing smooth functions and quasi-Monte Carlo rules of arbitrary
high order. SIAM J. Numer. Anal. 46(3), 15191553 (2008)
3. Dick, J.: The decay of the Walsh coefficients of smooth functions. Bulletin of the Australian
Mathematical Society 80(3), 430453 (2009)
4. Dick, J.: On quasi-Monte Carlo rules achieving higher order convergence. In: Monte Carlo and
Quasi-Monte Carlo Methods 2008, pp. 7396. Springer, Berlin (2009)
5. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and quasi-Monte
Carlo integration. Cambridge University Press, Cambridge (2010)
6. Harase, S., Ohori, R.: A search for extensible low-WAFOM point sets (2013)
7. LEcuyer, P., Lemieux, C.: Recent advances in randomized quasi-Monte Carlo methods.
Modeling uncertainty. International Series in Operations Research and Management Science,
vol. 46, pp. 419474. Kluwer Academic Publishers, Boston, MA (2002)
8. MacWilliams, F.J., Sloane, N.J.A.: The theory of error-correcting codes. I. North-Holland
Mathematical Library, North-Holland Publishing Co., Amsterdam (1977)
9. Matsumoto, M., Saito, M., Matoba, K.: A computable figure of merit for quasi-Monte Carlo
point sets. Math. Comput. 83(287), 12331250 (2014)
10. Matsumoto, M., Yoshiki, T.: Existence of higher order convergent quasi-Monte Carlo rules via
Walsh figure of merit. In: Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 569579.
Springer, Heidelberg (2013)
11. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF
Regional Conference Series in Applied Mathematics, vol. 63. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA (1992)
350
T. Goda et al.
12. Nuyens, D.: The magic point shop of QMC point generators and generating vectors, http://
people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/
13. Suzuki, K.: WAFOM on abelian groups for quasi-Monte Carlo point sets. Hiroshima Math. J.
45(3), 341364 (2015)
14. Yoshiki, T.: Bounds on Walsh coefficients by dyadic difference and a new Koksma-Hlawka
type inequality for Quasi-Monte Carlo integration (2015)
Abstract Pricing of weather derivatives often requires a model for the underlying
temperature process that can characterize the dynamic behavior of daily average
temperatures. The comparison of different stochastic models with a different number
of model parameters is not an easy task, especially in the absence of a liquid weather
derivatives market. In this study, we consider four widely used temperature models
in pricing temperature-based weather derivatives. The price estimates obtained from
these four models are relatively similar. However, there are large variations in their
estimates with respect to changes in model parameters. To choose the most robust
model, i.e., the model with smaller sensitivity with respect to errors or variation in
model parameters, the global sensitivity analysis of Sobol is employed. An empirical
investigation of the robustness of models is given using temperature data.
Keywords Weather derivatives Sobol sensitivity analysis Model robustness
1 Introduction
Weather related risks exist in many economic sectors, especially in agriculture,
tourism, energy, and construction. Hanley [10] reports that about one-seventh of
the industrialized economy is sensitive to weather. The weather related risks can be
A. Gnc
Xian Jiaotong Liverpool University, Suzhou 215123, China
e-mail: Ahmet.Goncu@xjtlu.edu.cn
Y. Liu
Hydrogeology Department, Earth Sciences Division,
Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
e-mail: yaningliu@lbl.gov
G. kten (B) M.Y. Hussaini
Florida State University, Tallahassee, FL 32306, USA
e-mail: okten@math.fsu.edu
M.Y. Hussaini
e-mail: yousuff@fsu.edu
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_17
351
352
A. Gnc et al.
hedged via weather derivatives, which is a relatively new form of a financial instrument that has contingent payoffs with respect to possible weather events or indices.
The market for weather derivatives was established in the USA in 1997 following
the deregulation of the energy market. The Weather Risk Management Association
(WRMA) reported that as of 2011 the weather derivatives market has grown to 12
billion US dollars. The Chicago Mercantile Exchange (CME) trades standardized
weather derivatives with the highest trading volume in temperature-based weather
derivatives; this type of derivatives is the focus of this study.
There are different approaches to price weather derivatives, such as, historical
burn analysis, index modeling, and stochastic modeling of daily average temperatures ([13]). In the stochastic modelling approach, a mean-reverting process such as
the OrnsteinUhlenbeck process is often used for modelling the evolution of daily
average temperatures at a particular measurement station. Amongst others, some
examples of studies that follow this approach are given by Alaton et. al. [1], Benth
and Benth [3], Brody et al. [4], Cao and Wei [6], Platen and West [20], Huang et al.
[12], and Gnc [8]. Some studies suggest the superiority of daily temperature modelling over the index modelling approach (Oetomo and Stevenson [18], Schiller et al.
[25]). Another important modeling approach uses time series to model daily average
temperatures. An example is the model of Campbell and Diebold [5], which forecasts
daily average temperatures using an autoregressive conditional heteroscedasticity
(ARCH) model.
Within the class of dynamic models of daily temperatures, four models that are
highly cited in the literature (see, for example, the survey by Schiller et. al. [25]) and
widely used in the weather derivatives industry are given by Alaton et al. [1], Benth
and Benth [3], Brody et al. [4] and Campbell and Diebold [5]. In the study by Gnc
[9] these four models are compared in terms of their forecasting power of the futures
prices for different locations. Different models come with different parameters that
need to be estimated from the historical data, and although we may know how accurately a certain parameter can be estimated, the question of the impact of the parameter estimation error on the overall model has not been investigated in the literature. In
this paper, we propose a framework based on global sensitivity analysis to assess the
robustness of a model with respect to the uncertainties in its parameters. We apply our
methodology to the four different temperature models given in [1], [35].
The paper is organized as follows. In Sect. 2, we describe the dataset utilized,
introduce the temperature models investigated, and present estimation results of
each model. Section 3 discusses the global sensitivity analysis employed and Sect. 4
presents numerical results and conclusions.
353
(1)
where = 2/365. The sine and cosine functions capture the seasonality of daily
temperatures, whereas the linear term captures the trend in temperatures which might
be due to global warming or urbanization effects. The parameters A, B, C, D can
be estimated from the data by a linear regression. An improvement in the fit can
be obtained by increasing the number of sine and cosine functions in the above
representation. However, in our dataset, we did not observe any significant improvements by adding more terms. Our dataset consists of daily average temperatures1
and HDD/CDD monthly futures prices for the measurement station at New York La
Guardia International Airport. Daily average temperature data for the period between
01/01/1997 and 01/21/2012 is used to estimate the parameters of each model considered. In Fig. 1, the historical temperatures for New York are plotted.
1 Daily
average temperatures are measured by the Earth Satellite Corporation and our dataset is
provided by the Chicago Mercantile Exchange (CME).
354
A. Gnc et al.
Fahrenheit
100
90
80
70
60
50
40
30
20
10
0
0
1000
2000
3000
4000
5000
6000
dTtm
+ a(Ttm Tt ) dt + t dWt ,
dt
(2)
where Tt is the temperature at time t, a is the mean reversion parameter, t is a piecewise constant volatility function, Wt is P-Brownian motion (the physical probability
measure) and Ttm is the long-term mean temperature given by Eq. (1).
The volatility of daily temperatures t is assumed to be constant for each month
of the year. We will not discuss the estimation of model parameters since they are
explained in [1]. We estimate the piecewise constant volatility function for our dataset
using the regression and quadratic variation methods. Figure 2 plots these results,
9
(t)
7
6
5
4
3
2
0
50
100
150
200
Time (t)
250
300
350
400
355
Table 1 Estimated parameters for the model by Alaton, Djehiche, and Stillberger (standard errors
of estimators in parenthesis)
A
B
C
D
a
3.0 104
(5.6 105 )
55.7952
(0.1849)
8.7965
(0.1307)
20.0178
(0.1307)
0.3491
(0.01)
Table 2 Estimated monthly volatility (t ) for each month of the year, for the model by Alaton,
Djehiche, and Stillberger (standard errors of estimators in parenthesis)
Jan
Feb
Mar
Apr
May
Jun
Volatility
6.36 (0.76)
5.84 (0.64)
5.82 (0.64)
5.52 (0.57)
4.69 (0.41)
4.53 (0.39)
Jul
Aug
Sep
Oct
Nov
Dec
Volatility
3.61 (0.25)
3.53 (0.23)
4.03 (0.30)
4.67 (0.41)
5.00 (0.47)
5.96 (0.67)
together with the empirical daily volatility and its Fourier series fit. Tables 1 and 2
display the estimated model parameters (including the parameters for Eq. (1)) for
our dataset with the standard errors given in parenthesis.
I1
ci sin(it) +
i=1
J1
d j cos(jt),
(3)
j=1
1.2162 (0.7708)
d1
d2
d4
9.3381 (0.7708)
d3
1.1303 (0.7708)
356
A. Gnc et al.
2
(T)
H=0.64
H=0.50
1.5
Slope = 0.36
(T)
1
0.5
0
0.5
1
1.5
2.5
3.5
4.5
5.5
6.5
log (T)
dTtm
m
+ a(Tt Tt ) dt + t dWtH .
dt
(4)
357
L
l sin(lt) + l cos(lt) +
p Tt p + t t ,
(5)
p=1
l=1
t2 = 0 +
P
Q
R
2
q sin(qt) + q cos(qt) +
r tr
,
(6)
r =1
q=1
Table 4 Estimated parameters for the model by Campbell and Diebold (standard errors of estimators in parenthesis)
1
2
1
1
1
2
3
15.2851
(0.8534)
0.0001
(3.6105 )
1.0969
(0.1424)
5.9156
(0.3247)
0.8820
(0.0137)
0.3184
(0.0184)
0.1193
(0.0187)
10
0.0149
(0.0189)
0.0160
(0.0192)
0.0185
(0.0189)
0.0019
(0.0186)
0.0066
(0.0189)
0.0207
(0.0183)
0.0017
(0.0134)
Table 5 Estimated parameters for the model by Campbell and Diebold, contd. (standard errors of
estimators in parenthesis)
0
1
1
1
2
3
16.4401
(0.9091)
2.2933
(0.6893)
7.3571
(0.7528)
0.0294
(0.0133)
0.0366
(0.0133)
0.0110
(0.0133)
0.0465
(0.0133)
0.0505
(0.0133)
0.0114
(0.0133)
0.0151
(0.0133)
0.0611
(0.0133)
0.0043
(0.0133)
358
A. Gnc et al.
assume a Gaussian noise term after removing the effects of trend, seasonality, and
heteroscedasticity in the daily temperatures, whereas the model by Brody et. al. [4]
captures the long-memory effects by using the fractional Brownian motion different
from the other models. For option pricing of short term weather contracts it is possible to assume a simpler form of heteroscedasticity in the volatility which would
be sufficient to price monthly weather options (see [9]). The model by Campbell
and Diebold [5] might be prone to pricing errors due to the large number of ARCH
coefficients to be estimated, whereas the model by Brody et. al. [4] suffers from the
difficulty of estimating the Hurst exponent and long-term sensitivity with respect to
this parameter. These issues are investigated in the next section.
359
tion f (x), called the total variance, is 2 = [0,1]d f (x)2 dx f 2 . The total variance
can be written as the sum of all partial variances: 2 = u{1,...,d} u2 . Based on
the ANOVA decomposition, Sobol [26] introduced
two types of global
sensitivity
indices (GSI) for an index set u: S u = 12 vu v2 and S u = 12 vu= v2 . The
sensitivity index S u sums all the normalized variances whose index sets are subsets
of u, and S u sums all those whose index sets have non-empty intersections with u.
Clearly, S u S u , and hence they can be used as the lower and upper bounds for
the sensitivity measures on the parameters x u . The GSI with respect to singletons,
S {i} , for instance, represents the impact on the output of parameter xi alone, and
S {i} considers the individual impact as well as the cooperative impact of xi and the
other parameters. In this sense, S {i} and S {i} are called main effects and total effects,
respectively. In the general case, S u and S u are also called lower and upper Sobol
indices. The main effects S {i} can be used to prioritize the model parameters in terms
of their importance, while the total effects S {i} can be used as a tool to reduce model
complexity. If S {i} is relatively small, then the corresponding parameter can be frozen
at its nominal value.
4 Numerical Results
In our global sensitivity analysis, the model output is the estimate of the HDD call
option price that is calculated by averaging the payoff in Definition 2. The model
inputs are the temperature model parameters, which are estimated from the historical
temperatures. In our numerical results, the pricing of the weather derivatives is done
under the physical probability measure. We estimate the price of an HDD call option
on December 31, 20122 with strike price 800 HDDs. The contract period is January
1-31, 2012. We will refer to the four weather derivatives models considered in Sect. 2
by simply using the name of the first author. The parameters of the weather derivatives models can be classified into six groups: trend, seasonality, volatility, mean
reversion, Hurst parameters, and ARCH parameters. Trend, seasonality and volatility are common to Alatons, Benths and Brodys models. Brodys model assumes a
fractional Brownian motion and thus involves the additional Hurst parameter. Campbells model considers an AR(P) process for the temperatures and an ARCH(R) for
the volatility process. Least squares regression is used to obtain the mean of each
estimate and its standard error. The detailed grouping is listed in Table 6. We apply
global sensitivity analysis to these groups of parameters. Table 7 shows the Sobol
indices S with respect to groups of parameters for all models. The Sobol indices
are computed using a sample size of 20,000, and the price of the derivative is computed using a randomly permuted random-start Halton sequence ([19]) of sample
size 10,000.
2 Our historical data starts from 1/1/1997, which corresponds to t
360
A. Gnc et al.
A, B
A, B
C, D
C, D
i , i = 1, . . . , 12 c, ci , di ,
i = 1, . . . , 4
a
a
N/A
N/A
0.8240
0.1053
0.0736
0.0040
N/A
N/A
0.8794
0.1148
0.0019
0.0027
N/A
N/A
Campbell
A, B
1 , 2
C, D
1 , 1
i , i = 1, . . . , 12 0 , 1 , ...9
a
H
1 , ...10
N/A
Brody
Campbell
0.6317
0.0823
0.2666
0.0118
0.0134
N/A
0.2073
0.0278
0.00001
N/A
N/A
0.8313
The sample sizes used for sensitivity analysis and for calculating the prices are 20,000 and 10,000,
respectively. M = 31, t0 = 5475, and regression standard errors are chosen as standard deviations
For all models, the sum of the upper Sobol indices is approximately 1, indicating
that the secondary interactions between groups of parameters are small. From Table 7,
we see that the largest sensitivity in the models by Alaton, Benth, and Brody are
due to the trend parameters. The sensitivities of the mean reversion parameters are
negligible. For Campbells model, the ARCH parameters are the most sensitive,
while the seasonality and volatility parameters are the most insensitive.
We first compare Alatons, Benths and Brodys models due to their similarities.
Note that the trend and seasonality parameters are the same for the three models and
the characterization of volatility by Benth is different from Alaton. Despite the fact
that Brodys model considers volatility the same way as Alatons model, the use of
fractional Brownian motion changes the behavior of the underlying stochastic process
and thus changes the volatility part as well. We keep the uncertainties of all groups
of parameters, excluding volatility, fixed at their regression standard errors. We vary
the uncertainty of the volatility group by increasing the coefficient of variation (CoV,
defined as the ratio of standard deviation to the mean) for each parameter in the
volatility group from 1 to 35 %. For example, when the CoV is 1 % for the firstmonth volatility parameter 1 in Alatons model, then 1 is modeled as a normal
distribution with mean 6.36, and standard deviation 0.01 6.36. (The estimated
mean for 1 is 6.36, as shown in Table 2.)
Figure 4a shows that as the CoV of volatility increases, Sobols non-normalized
upper index 2 S V , which represents the sum of all the partial variances of groups
(b)
(a)
Alaton
Benth
Brody
250
Alaton
Benth
Brody
110
105
2 S { V
200
2 S { V
361
150
100
100
95
90
50
85
80
0
0.05
0.1
0.15
0.2
0.25
CoV of volatility
0.3
0.35
0.05
0.1
0.15
0.2
0.25
0.3
0.35
CoV of volatility
Fig. 4 Model robustness using Sobol indices. a Sobols upper index for the volatility parameters
against the coefficient of variation of volatility; b Sobols lower index for the compliment of volatility
parameters against the coefficient of variation of volatility
362
A. Gnc et al.
(b)
(a)
Benth
Campbell
150000
15000
2 S { T }
2 S { T }
120000
90000
60000
30000
0
0
Benth
Campbell
18000
12000
9000
6000
3000
0.05 0.1
0.15 0.2
0.25 0.3
0
0
0.35
0.05
0.1
0.15
0.2
0.25 0.3
0.35
CoV of trend
CoV of trend
Fig. 5 a Sobols upper index for the trend parameters against the coefficient of variation of trend;
b Sobols lower index for the compliment of trend parameters against the coefficient of variation
of trend
(a)
(b)
25000
24000
Benth
Campbell
20000
18000
2 S { S }
2 S { S }
Benth
Campbell
21000
15000
10000
15000
12000
9000
6000
5000
3000
0
0
0
0.05
0.1
0.15
0.2
0.25
CoV of seasonality
0.3
0.35
0.05
0.1
0.15
0.2
0.25
0.3
0.35
CoV of seasonality
Fig. 6 a Sobols upper index for the seasonality parameters against the coefficient of variation
of seasonality; b Sobols lower index for the compliment of seasonality parameters against the
coefficient of variation of seasonality
Next we compare Benths model with Campbells time series model. Figure 5a
shows that as the CoV of the trend parameters increases, the non-normalized upper
Sobol index 2 S T increases monotonically in a similar pattern for both models.
However, when we examine the lower Sobol index 2 S{T } plot in Fig. 5b, we
observe that Campbells model has significantly larger sensitivity for components
other than the trend. This also means that the total variance of the model output for
Campbells model is much larger. Figure 6 plots the sensitivity for the seasonality
parameters. The upper Sobol index increases at a similar rate for both Benths
and Campbells models. However, the lower Sobol index for Campbells model
(a)
(b)
25000
Benth
Campbell
Benth
Campbell
20000
2 S { V
2 S { V
363
4
3
15000
10000
2
5000
1
0
0
0
0.05
0.1
0.15
0.2
0.25
0.3
0.05
CoV of volatility
0.1
0.15
0.2
0.25
0.3
CoV of volatility
Fig. 7 a Sobols upper index for the volatility parameters against the coefficient of variation of
volatility; b Sobols lower index for the compliment of volatility parameters against the coefficient
of variation of volatility
140000
trend
seasonality
120000
volatility
100000
80000
60000
40000
20000
0
0
0.05
0.1
0.15
0.2
0.25
0.3
CoV
364
A. Gnc et al.
106.69
108.04
104.86
104.16
Brody
Campbell
118.95
114.61
140.70
20337.33
The sample sizes used for sensitivity analysis and for calculating the prices are 20,000 and 10,000,
respectively. M = 31, t0 = 5475, and regression standard errors are chosen as standard deviations
did in Table 8 for the four weather derivative models considered in the paper. From
this table, one can deduce the models by Alaton, Benth and Brody perform equally
well, and the model by Campbell is unsatisfactory. However, the information in this
table does not reveal how the variances will change as the models are recalibrated
with different input, resulting in different standard errors for the input parameters.
In other words, the total variance information does not explain how robust a model
is with respect to its input parameter(s). Our qualitative analysis computes Sobol
sensitivity indices for each model, with inputs (or input groups) that match across
models, and compares the growth of the sensitivity indices as the estimation error in
the input parameters (CoV) increases. Based on our empirical results, we conclude
Benths model is the most robust; the model that has the smallest rate of increase in
the sensitivity indices as a function of input parameter error. In future work, we will
investigate developing a quantitative approach to define the robustness of a model.
References
1. Alaton, P., Djehiche, B., Stillberger, D.: On modelling and pricing weather derivatives. Appl.
Math. Financ. 9, 120 (2002)
2. Alexanderian, A., Winokur, J., Sraj, I., Srinivasan, A., Iskandarani, M., Thacker, W.C., Knio,
O.M.: Global sensitivity analysis in an ocean general circulation model: a sparse spectral
projection approach. Comput. Geosci. 16, 757778 (2012)
3. Benth, F.E., Benth, J.S.: The volatility of temperature and pricing of weather derivatives. Quant.
Financ. 7, 553561 (2007)
4. Brody, D.C., Syroka, J., Zervos, M.: Dynamical pricing of weather derivatives. Quant. Financ.
3, 189198 (2002)
5. Campbell, S., Diebold, F.X.: Weather forecasting for weather derivatives. J. Am. Stat. Assoc.
100, 616 (2005)
6. Cao, M., Wei, J.: Weather derivatives valuation and market price of weather risk. J. Futur. Mark.
24, 10651089 (2004)
7. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of variance of united
kingdom inflation. Econometrica 50, 9871008 (1982)
8. Gnc, A.: Pricing temperature-based weather derivatives in China. J. Risk Financ. 13, 3244
(2011)
9. Gnc, A.: Comparison of temperature models using heating and cooling degree days futures.
J. Risk Financ. 14, 159178 (2013)
10. Hanley, M.: Hedging the force of nature. Risk Prof. 1, 2125 (1999)
11. Hrdle, W.K., Cabrera, B.L.: The Implied Market Price of Weather Risk. Appl. Math. Financ.
19, 5995 (2012)
365
12. Huang, H.-H., Shiu, Y.-M., Lin, P.-S.: HDD and CDD option pricing with market price of
weather risk for Taiwan. J. Futu. Mark. 28, 790814 (2008)
13. Jewson, S.: Weather Derivative Valuation: The Meteorological, Statistical, Financial and Mathematical Foundations. Cambridge University Press, Cambridge (2005)
14. Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation
of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 94, 11351148 (2009)
15. Kucherenko, S., Feil, B., Shah, N., Mauntz, W.: The identification of model effective dimensions
using global sensitivity analysis. Reliab. Eng. Syst. Saf. 96, 440449 (2011)
16. Liu, R., Owen, A.: Estimating mean dimensionality of analysis of variance decompositions. J.
Am. Stat. Assoc. 101, 712721 (2006)
17. Liu, Y., Jimenez, E., Hussaini, M.Y., kten, G., Goodrick, S.: Parametric uncertainty quantification in the Rothermel model with randomized quasi-Monte Carlo methods. Int. J. Wildland
Fire 24, 307316 (2015)
18. Oetomo, T., Stevenson, M.: Hot or Cold? a comparison of different approaches to the pricing
of weather derivatives. J. Emerg. Mark. Financ. 4, 101133 (2005)
19. kten, G., Shah, M., Goncharov, Y.: Random and deterministic digit permutations of the Halton
sequence. In: Plaskota, L., Wozniakowski, H. (eds.) 9th International Conference on Monte
Carlo and Quasi-Monte Carlo Methods in Scientific Computing, Warsaw, Poland, August
1520, pp. 589602. Springer, Berlin (2012)
20. Platen, E., West, J.: A fair pricing approach to weather derivatives. Asian-Pac. Financ. Mark.
11, 2353 (2005)
21. Rohmer, J., Foerster, E.: Global sensitivity analysis of large-scale numerical landslide models
based on Gaussian-Process meta-modeling. Comput. Geosci. 37, 917927 (2011)
22. Saltelli, A., Tarantola, S., Chan, K.P.-S.: A quantitative model-independent method for global
sensitivity analysis of model output. Technometrics 41, 3956 (1999)
23. Saltelli, A.: Making best use of model evaluations to compute sensitivity indices. Comput.
Phys. Commun. 145, 80297 (2002). doi:10.1016/S0010-4655(02)00280-1
24. Saltelli, A.: Global Sensitivity Analysis: The Primer. Wiley, New Jersey (2008)
25. Schiller, F., Seidler, G., Wimmer, M.: Temperature models for pricing weather derivatives.
Quant. Financ. 12, 489500 (2012)
26. Sobol, I.M.: Sensitivity estimates for non-linear mathematical models. Math. Model. Comput.
Exp. 1, 407414 (1993)
27. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their
Monte Carlo estimates. Math. Comput. Simul. 55, 271280 (2001). doi:10.1016/S03784754(00)00270-6
28. Sobol, I.M., Kucherenko, S.: Derivative based global sensitivity measures and their link with
global sensitivity indices. Math. Comput. Simul. 79, 30093017 (2009)
Abstract Quasi-Monte Carlo cubature methods often sample the integrand using
Sobol (or other digital) sequences to obtain higher accuracy than IID sampling. An
important question is how to conservatively estimate the error of a digital sequence
cubature so that the sampling can be terminated when the desired tolerance is reached.
We propose an error bound based on the discrete Walsh coefficients of the integrand
and use this error bound to construct an adaptive digital sequence cubature algorithm.
The error bound and the corresponding algorithm are guaranteed to work for integrands lying in a cone defined in terms of their true Walsh coefficients. Intuitively,
the inequalities defining the cone imply that the ordered Walsh coefficients do not
dip down for a long stretch and then jump back up. An upper bound on the cost of our
new algorithm is given in terms of the unknown decay rate of the Walsh coefficients.
Keywords Quasi-Monte Carlo methods Multidimensional integration Digital
sequences Sobol sequences Adaptive algorithms Automatic algorithms
1 Introduction
Quasi-Monte Carlo cubature rules approximate multidimensional integrals over the
unit cube by an equally weighted sample average of the integrand values at the first n
F.J. Hickernell Ll.A. Jimnez Rugama (B)
Department of Applied Mathematics, Illinois Institute of Technology,
10 W. 32nd Street, E1-208, Chicago, IL 60616, USA
e-mail: ljimene1@hawk.iit.edu
F.J. Hickernell
e-mail: hickernell@iit.edu
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_18
367
368
i=0
n1
The discrepancy, D({z i }i=0
), measures how far the empirical distribution of the first
n nodes differs from the uniform distribution. The variation, V ( f ), is some seminorm of the integrand, f . The definitions of the discrepancy and variation are linked
to each other. Examples of such error bounds are given by [3, Chaps. 23], [4], [11,
Sect. 5.6], [12, Chaps. 23], and [14, Chap. 9].
A practical problem is how large to choose n so that the absolute error is smaller
than some user-defined tolerance, . Error bounds of the form (1) do not help in this
regard because it is too hard to compute V ( f ), which is typically defined in terms
of integrals of mixed partial derivatives of f .
This article addresses the challenge of reliable error estimation for quasi-Monte
Carlo cubature based on digital sequences, of which Sobol sequences are the most
popular example. The vector space structure underlying these digital sequences facilitates a convenient expression for the error in terms of the (Fourier)-Walsh coefficients
of the integrand. Discrete Walsh coefficients can be computed efficiently, and their
decay provides a reliable cubature error estimate. Underpinning this analysis is the
assumption that the integrands lie in a cone defined in terms of their true Walsh
coefficients; see (13).
The next section introduces digital sequences and their underlying algebraic structure. Section 3 explains how the cubature error using digital sequences as nodes can
be elegantly formulated in terms of the Walsh series representation of the integrand.
Our contributions begin in Sect. 4, where we derive a reliable data-based cubature
error bound for a cone of integrands, (16), and an adaptive cubature algorithm based
on that error bound, Algorithm 2. The cost of the algorithm is also represented in
terms of the unknown decay of the Walsh series coefficients and the error tolerance
in Theorem 1. A numerical example and discussion then conclude this article. A
parallel development for cubature based on lattice rules is given in [9].
2 Digital Sequences
The integrands considered here are defined over the half open d-dimensional unit
cube, [0, 1)d . For integration problems on other domains one may often transform the
integration variable so that the problem is defined on [0, 1)d . See [1, 58] for some
discussion of variable transformations and the related error analysis. The example in
Sect. 5 also employs a variable transformation.
369
x=
=1
d
x j b
xt =
, t=
x=
t j b
,
j=1
=1
=1
j=1
(mod 1)
[x j mod b]b
x t := x (t),
j=1
=1
x j , t j Fb := {0, . . . , b 1},
ax := x x a Fb .
a times
j=1
We do not have associativity for all of [0, 1)d . For example, for b = 2,
1/6 = 2 0.001010 . . . , 1/3 = 2 0.010101 . . . , 1/2 = 2 0.1000 . . .
1/3 1/3 = 2 0.00000 . . . = 0, 1/3 1/6 = 2 0.011111 . . . = 1/2,
(1/3 1/3) 1/6 = 0 1/6 = 1/6, 1/3 (1/3 1/6) = 1/3 1/2 = 5/6.
This lack of associativity comes from the possibility of digitwise addition resulting
in an infinite trail of digits b 1, e.g., 1/3 1/6 above.
Define the Boolean operator that checks whether digitwise addition of two points
does not result in an infinite trail of digits b 1:
true, min j=1,...,d sup{ : [(x j + t j ) mod b] = b 1} = ,
ok(x, t) =
(2)
false, otherwise.
If P [0, 1)d is some set that is closed under and ok(x, t) = true for all x, t P,
then associativity holds for all points in P. Moreover, P is an Abelian group and
also a vector space over the field Fb .
=0
i z b ,
where i =
i b N0 , i Fb .
(3a)
(3b)
=0
b 1
is a subspace
Such a P is called a digital sequence. Moreover, any Pm := {z i }i=0
of P and is called a digital net. From this definition it is clear that
m
370
(a)
(b)
0.75
0.75
0.5
0.5
0.25
0.25
0.25
0.5
0.75
0.25
0.5
0.75
Fig. 1 a 256 Sobol points, b 256 scrambled and digitally shifted Sobol points
(z 1 ) j1 (z b ) j1 (z b2 ) j1
(z 1 ) j2 (z b ) j2 (z b2 ) j2
C j = (z 1 ) j3 (z b ) j3 (z b2 ) j3
..
..
..
.
.
.
..
.
for j = 1, . . . , d.
The Sobol sequence works in base b = 2 and makes a careful choice of the basis
{z 1 , z 2 , z 4 , . . .} so that the points are evenly distributed. Figure 1a displays the initial
points of the two-dimensional Sobol sequence. In Fig. 1b the Sobol sequence has
been linearly scrambled to obtain another digital sequence and then digitally shifted.
3 Walsh Series
Non-negative integer vectors are used to index the Walsh series for the integrands.
The set Nd0 is a vector space under digitwise addition, , and the field Fb . Digitwise
addition and negation are defined as follows for all k, l Nd0 :
k=
d
k j b
=0
, l=
d
l j b
=0
j=1
kl =
k=
371
k j , l j Fb ,
j=1
d
=0
d
(b k j )b
=0
,
j=1
ak := k
k a Fb .
j=1
a times
k, x :=
d
k j x j,+1
(mod b).
(4a)
j=1 =0
For all points t, x [0, 1)d , wavenumbers k, l Nd0 , and a Fb , it follows that
k, 0 =
0, x = 0,
k, ax t = a
k, x +
k, t (mod b) if ok(ax, t)
ak l, x = a
k, x +
l, x (mod b),
k, x = 0 k
Nd0
= x = 0.
(4b)
(4c)
(4d)
(4e)
k, z i = 0 i N0 = k = 0.
(5)
Defining N0,m := {0, . . . , bm 1}, the dual net corresponding to the net Pm is
the set of all wavenumbers for which
k, maps the whole net to 0:
Pm := {k Nd0 :
k, z i = 0, i N0,m }
= {k Nd0 :
k, z b = 0, = 0, . . . , m 1}.
The properties of the bilinear transform defined in (4) imply that the dual nets Pm
are subspaces of each other:
= {0}.
P0 = Nd0 P1 P
The integrands are assumed to belong to some subset of L 2 ([0, 1)d ), the space of
square integrable functions. The L 2 inner product is defined as
f, g2 =
[0,1)d
f (x)g(x) dx.
372
(6)
f(k)e2 1
k,x/b , where f(k) := f, e2 1
k,/b ,
f (x) =
2
kNd0
and the L 2 inner product of two functions is the 2 inner product of their Walsh series
coefficients:
f, g2 =
.
f(k)g(k)
kNd
0
kNd0
Since the digital net Pm is a group under , one may derive a useful formula for
the average of a Walsh function sampled over a net. For all wavenumbers k Nd0
and all x Pm one has
b 1
1 2 1
k,zi /b
[e
e2 1
k,zi x/b ]
0= m
b i=0
m
b 1
1 2 1
k,zi /b
= m
[e
e2 1{
k,zi +
k,x}/b ] by (4c)
b i=0
m
= [1 e
2 1
k,x/b
b 1
1 2 1
k,zi /b
] m
e
.
b i=0
m
By this equality it follows that the average of the sampled Walsh function values is
either one or zero, depending on whether the wavenumber is in the dual net or not:
bm 1
1 2 1
k,zi /b
1, k Pm
e
= 1Pm (k) =
m
b i=0
0, k Nd0 \ Pm .
(7)
Multivariate integrals may be approximated by the average of the integrand sampled over a digitally shifted digital net, namely,
b 1
1
f (z i ).
Im ( f ) := m
b i=0
m
(8)
Under the assumption that ok(z i , ) = true (see (2)) for all i N0 , it follows that
the error of this cubature rule is the sum of the Walsh coefficients of the integrand
over those wavenumbers in the dual net:
[0,1)d
373
2 1
k,/b
f (x) dx Im ( f ) = f (0)
f (k) Im e
kNd0
= f(0)
f(k)1Pm (k)e2 1
k,/b
kNd0
=
f(k)e2
.
1
k,/b
kPm \{0}
(9)
Adaptive Algorithm 2 that we construct in Sect. 4 works with this expression for the
cubature error in terms of Walsh coefficients.
Although the true Walsh series coefficients are generally not known, they can be
estimated by the discrete Walsh transform, defined as follows:
bm 1
1 2 1
k,zi /b
2 1
k,/b
f m (k) := Im e
f () = m
e
f (z i )
b i=0
bm 1
1 2 1
k,zi /b
e
= m
f (l)e2 1
l,zi /b
b i=0
d
lN0
lNd0
1
f(l) m
b
f(l)e2
lNd0
f(l)e2
m
b
1
e2
= f(k) +
b 1
1 2 1
lk,zi /b
e
bm i=0
m
1
lk,/b
1
lk,/b
f(k l)e2
lPm
1
lk,z i /b
i=0
lNd0
1Pm (l k)
1
l,/b
f(k l)e2
1
l,/b
k Nd0 .
(10)
lPm \{0}
The discrete transform, fm (k) is equal to the true Walsh transform, f(k), plus aliasing
terms proportional to f(k l) where l is a nonzero wavenumber in the dual net.
374
k Nd0 : k k()
Pm , k k(),
z bm = a , a = 1, . . . , b 1,
but not necessarily in that order.
There is some flexibility in the choice of this map. One might choose k to map
smaller values of to smaller values of k based on some standard measure of size
such as that given in [3, (5.9)]. The motivation is that larger should generally lead
to smaller f( k()).
We use Algorithm 3 below to construct this map implicitly.
To illustrate the initial steps of Algorithm 1, consider the Sobol points in dimension 2. In this case, z 1 = (1/2, 1/2), z 2 = (1/4, 3/4) and z 4 = (1/8, 5/8). For
m = = 0, one needs
P0 , k k(0),
z 1 = 1 = k Nd0 :
k, z 1 = 1 .
k(1)
k Nd0 : k k(0)
k(2)
k Nd0 : k k(0)
P1 , k k(0),
z2 = 1
= k Nd0 : k P1 ,
k, z 2 = 1 .
k(3)
k Nd0 : k k(1)
P1 , k k(1),
z2 = 1 ,
375
2
f+bm e
m
1 k(+b
) k(),
/b
(11)
=1
[0,1)d
m
2 1 k(b
), /b
m
f
f (x) dx Im ( f ) =
fbm e
b .
=1
(12)
=1
We will use the discrete transform, fm, , to estimate true Walsh coefficients, f , for
m significantly larger than logb ().
m
b
1
f ,
b
1
S,m ( f ) =
=bm1
f+bm ,
=b1 =1
Sm ( f ) = S0,m ( f ) + + Sm,m ( f ) =
f ,
!
S,m ( f ) =
=bm
b
1
fm, .
=b1
The first three sums, Sm ( f ), S,m ( f ), and Sm ( f ), cannot be observed because they
involve the true series coefficients. But, the last sum, !
S,m ( f ), is defined in terms of
the discrete Walsh transform and can easily be computed in terms of function values.
The details are described in the Appendix.
We now make critical assumptions about how certain sums provide upper bounds
on others. Let N be some fixed integer and and be some non-negative valued
)S ( f ), m}.
(13)
This is a cone because f C = a f C for all real a.
376
The first inequality asserts that the sum of the larger indexed Walsh coefficients
bounds a partial sum of the same coefficients. For example, this means that S0,12 , the
sum of the values of the large black dots in Fig. 2, is no greater than some factor times
S12 ( f ), the sum of the values of the gray . Possible choices of are (m) = 1
or (m) = Cbm for some C > 1 and 0 1. The second inequality asserts
that the sum of the smaller indexed coefficients provides an upper bound on the sum
of the larger indexed coefficients. In other words, the fine scale components of the
integrand are not unduly large compared to the gross scale components. In Fig. 2 this
means that S12 ( f ) is no greater than some
factor times S8 ( f ), the sum of the values
of the black squares. This implies that f does not dip down and then bounce back
up too dramatically as . The reason for enforcing the second inequality only
for is that for small , one might have a coincidentally small S ( f ), while
Sm ( f ) is large.
The cubature error bound in (12) can be bounded in terms of Sl ( f ), a certain
finite sum of the Walsh coefficients for integrands f in the cone C . For , m N,
m, it follows that
[0,1)d
fbm = S0,m ( f )
f (x) dx Im ( f )
by (12)
=1
(m) Sm ( f ) (m)(m
)S ( f ).
(14)
Thus, the faster S ( f ) decays as , the faster the cubature error must decay.
Unfortunately, the true Walsh coefficients are unknown. Thus we must bound
S,m ( f ). This
S ( f ) in terms of the observable sum of the approximate coefficients, !
is done as follows:
S ( f ) =
1
b
377
f
=b1
m
fm,
+bm e2 1 k(+b )k(), /b
f
=1
=b1
1
b
1
b
fm, +
=b1
1
b
by (11)
f+bm = !
S,m ( f ) + S,m ( f )
=b1 =1
)S ( f )
by (13),
!
S,m ( f ) + (m )(m
!
S,m ( f )
S ( f )
provided that (m )(m
) < 1.
1 (m )(m
)
(15)
Combining (14) with (15) leads to the following conservative upper bound on the
cubature error for , m N, m:
[0,1)d
!
S,m ( f )(m)(m
)
.
f (x) dx Im ( f )
1 (m )(m
)
(16)
378
Theorem 1 If the integrand, f , lies in the cone, C , then the Algorithm 2 is successful:
[0,1)d
f (x)dx Im ( f ) .
The number of integrand values required to obtain this answer is bm , where the
following upper bound on m depends on the tolerance and unknown decay rate of
the Walsh coefficients.
)]Sm r ( f ) }
m min{m + r : C(m )[1 + (r )(r
The computational cost of this algorithm beyond that of obtaining the integrand
values is O(mbm ) to compute the discrete Walsh transform.
Proof The success of this algorithm comes from applying (16). To bound the number
of integrand values required note that argument leading to (15) can be modified to
provide an upper bound on !
S,m ( f ) in terms of S ( f ):
!
S,m ( f ) =
b
1
fm,
=b1
m
2 1 k(+b
) k(),
/b
=
f +bm e
f +
=1
=b1
b
1
b
1
f +
=b1
b
1
by (11)
f+bm = S ( f ) + S,m ( f )
=b1 =1
[1 + (m )(m
)]S ( f )
by (13).
Thus, the upper bound on the error in Step 2 of Algorithm 2, is itself bounded above
by C(m)[1 + (r )(r
)]Smr ( f ). Therefore, the stopping criterion in Step 2 must be
satisfied no later than when this quantity falls below .
The computation of the discrete Walsh transform and !
Smr,m ( f ) is described in
Algorithm 3 in the Appendix. The cost of this algorithm is O(mbm ) operations.
5 Numerical Experiments
Algorithm 2 has been implemented in MATLAB code as the function cubSobol_g.
It is included in our Guaranteed Automatic Integration Library (GAIL) [2]. Our
cubSobol_g utilizes MATLABs built-in Sobol sequences, so b = 2. The default
algorithm parameters are
10
0.2
0.4
0.6
0.8
0.8
1
10
0.6
0.4
10
0.2
3
10
0
6
10
= 6,
379
r = 4,
10
10
10
10
10
C(m) = 5 2m ,
Rd
[0,1)d
"
# d
#1
cos $
[ 1 (x j )]2 dx,
2 j=1
(17)
where is the standard Gaussian distribution function (Fig. 3). We generated 1000
IID random values of the dimension d = e D with D being uniformly distributed
between 0 and log(20). Each time cubSobol_g was run, a different scrambled and
shifted Sobol sequence was used. The tolerance was met about 97 % of the time
and failures were more likely among the higher dimensions. For those cases where
the tolerance was not met, mostly the larger dimensions, the integrand lay outside
the cone C . Our choice of k via Algorithm 3 depends somewhat on the particular
scrambling and digital shift, so the definition of C also depends mildly on these.
6 Discussion
There are few quasi-Monte Carlo cubature algorithms available that adaptively determine the sample size needed based on integrand values. The chief reason is that
reliable error estimation for quasi-Monte Carlo is difficult. Quasi-standard error has
serious drawbacks, as explained in [15]. Internal replications have no explicit theory.
380
IID replications of randomized quasi-Monte Carlo rules are sometimes used, but one
does not know how many replications are needed.
The proposed error bound and adaptive algorithm here are practical and have
theoretical justification. The conditions imposed on the sums of the (true) Fourier
Walsh coefficients make it possible to bound the cubature error in terms of discrete
FourierWalsh coefficients. The set of integrands satisfying these conditions is a nonconvex cone (13), thereby placing us in a setting where adaption has the opportunity
to be beneficial.
Problems requiring further consideration include how to choose the default parameters for Algorithm 2. We would also like to extend our algorithm and theory to
the case of relative error.
Acknowledgments This work was partially supported by US National Science Foundation grants
DMS-1115392, DMS-1357690, and DMS-1522687. The authors thank Ronald Cools and Dirk
Nuyens for organizing MCQMC 2014. We thank Sergei Kucherenko and Art Owen for organizing
the special session in honor of Ilya M. Sobol. We are grateful for Professor Sobols many contributions to MCQMC and related fields. The suggestions made by Sou-Cheng Choi, Yuhan Ding, Lan
Jiang, and the anonymous referees to improve this manuscript are greatly appreciated.
e2 1 =0 i /b yi ,
b i=0
b i =0
i =0
m
Y(m)
m1
:=
381
Next, we relate the Y to the discrete Walsh transform of the integrand f . For
, let
every k Nd0 and every digital sequence P = {z i }i=0
!
0 (k) := 0,
!
m (k) :=
m1
k, z b b N0,m , m N.
(18)
=0
If we set yi = f (z i + ), and if !
m (k) = , then
b 1
1 2 1
k,zi /b
e
yi
fm (k) = m
b i=0
m
=
=
=
=
e2
m
1
1
k,/b b
bm
e2
m
1
1
k,/b b
m
1
1
k,/b b
m
1
1
k,/b b
1
k,/b
yi
by (4c)
%
2 1 k, m1
j=0 i j z b j /b
yi
by (3)
e2
%m1
j=0
i j
k,z b j /b
yi
by (4c)
i=0
bm
= e2
1
k,z i /b
i=0
bm
e2
i=0
bm
e2
e2
e2
%m1
=0
i /b
yi
by (18)
i=0
Y(m) .
(19)
Using the notation in Sect. 4, for all m N0 define a pointer m : N0,m N0,m
as m () := !
m ( k()).
It follows that
= e2
fm, = fm ( k())
!
S,m ( f ) =
b
1
=b1
1
k,/b
Y(m)
,
m ()
b
1
(m)
fm, =
Ym () .
(20)
=b1
The quantity !
Smr,m ( f ) is the key to the stopping criterion in Algorithm 2.
If the map k : N0 Nd0 defined in Algorithm 1 is known explicitly, then specifying m is straightforward. However, in practice the bookkeeping involved in constructing k might be tedious, so we take a data-dependent approach to constructing
the pointer m () for N0,m directly, which then defines k implicitly.
Algorithm 3 Let r N be fixed. Given the input m N0 , the discrete Walsh coefficients Y(m) for N0,m , and also the pointer m1 () defined for N0,m1 ,
provided m > 0,
382
m
m
:= {k Nd0 : !
m (k) = m ()} for N0,m , m N0 , where m
Lemma 1 Let Pm,
is given by Algorithm 3. Then we implicitly have defined the map k in the sense
= 0 Pm,0
, and k()
Pm, for
that any map k : N0,m Nd0 that chooses k(0)
m
all = 1, . . . , b 1 gives the same value of Smr,r ( f ). It is also consistent with
Algorithm 1 for N0,mr .
k Pm,
, l Pm,+ab
= k l P
and that
Pm+1,
Pm,
(22)
Pm,
= for some
values in Step 3 also preserves (21). Step 3 may cause Pm1,
larger values of , but the constraint on the values of in Step 3 mean that (22) is
preserved.
References
1. Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 149 (1998)
2. Choi, S.C.T., Ding, Y., Hickernell, F.J., Jiang, L., Jimnez Rugama, Ll.A., Tong, X., Zhang,
Y., Zhou, X.: GAIL: Guaranteed Automatic Integration Library (versions 1.02.1). MATLAB
software (20132015). https://github.com/GailGithub/GAIL_Dev
3. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
383
4. Hickernell, F.J.: A generalized discrepancy and quadrature error bound. Math. Comput. 67,
299322 (1998)
5. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On strong tractability of weighted multivariate
integration. Math. Comput. 73, 19031911 (2004)
6. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration for
certain Banach spaces of functions. In: Niederreiter [13], pp. 5171
7. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration over
bounded and unbounded regions in Rs . Math. Comput. 73, 18851901 (2004)
8. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: The strong tractability of multivariate integration using lattice rules. In: Niederreiter [13], pp. 259273
9. Jimnez Rugama, Ll.A., Hickernell, F.J.: Adaptive multidimensional integration based on
rank-1 lattices. In: Cools, R., Nuyens, D., (eds.) Monte Carlo and Quasi-Monte Carlo Methods
2014, vol. 163, pp. 407422. Springer, Heidelberg (2016)
10. Keister, B.D.: Multidimensional quadrature algorithms. Comput. Phys. 10, 119122 (1996)
11. Lemieux, C.: Monte Carlo and quasi-Monte Carlo Sampling. Springer Science+Business Media
Inc, New York (2009)
12. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF
Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)
13. Niederreiter, H. (ed.): Monte Carlo and Quasi-Monte Carlo Methods 2002. Springer, Berlin
(2004)
14. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems Volume II: Standard Information for Functionals. No. 12 in EMS Tracts in Mathematics. European Mathematical Society,
Zrich (2010)
15. Owen, A.B.: On the Warnock-Halton quasi-standard error. Monte Carlo Methods Appl. 12,
4754 (2006)
Quasi-Monte Carlo
Optimal quadrature
1 Introduction
Quasi-Monte Carlo (QMC) rules are equal-weight quadrature rules which can be
used to approximate integrals defined on the d-dimensional unit cube [0, 1)d
[0,1)d
f (x) dx
N
1
f (x i ),
N i=1
385
386
N
1
f (x) dx
f (x i ) .
[0,1)d
N i=1
To study the behavior of this error as N increases for f from a Banach space (H , )
one considers the worst case error
N
1
wce(H , P N ) = sup
f (x) dx
f (x i ) .
d
N
[0,1)
f H
i=1
f 1
Particularly nice examples of such function spaces are reproducing kernel Hilbert
1
spaces [1]. Here, we will consider the reproducing kernel Hilbert space Hmix
of
1-periodic functions with mixed smoothness. Details on these spaces are given in
Sect. 2. The reproducing kernel is a tensor product kernel of the form
K d, (x, y) =
d
j=1
N
K d, (x i , x j ).
i, j=1
There is a general connection between the discrepancy of a point set and the worst case
error of integration. Details can be found in [11, Chap. 9]. In our case, the relevant
notion is the L 2 -norm of the periodic discrepancy. We describe the connection in
detail in Sect. 2.3.
There are many results on the rate of convergence of worst case errors and of
the optimal discrepancies for N , see e.g. [9, 11], but results on the optimal
point configurations for fixed N and d > 1 are scarce. For discrepancies, we are
only aware of [21], where the point configurations minimizing the standard L -stardiscrepancy for d = 2 and N = 1, 2, . . . , 6 are determined, [14], where for N = 1
the point minimizing the standard L - and L 2 -star discrepancy for d 1 is found,
and [6], where this is extended to N = 2.
It is the aim of this paper to provide a method which for d = 2 and N > 2
yields the optimal points for the periodic L 2 -discrepancy and worst case error in
1
Hmix
. Our approach is based on a decomposition of the global optimization problem
into exponentially many local ones which each possess unique solutions that can be
approximated efficiently by a nonlinear block GauSeidel method. Moreover, we
387
use the symmetries of the two-dimensional torus to significantly reduce the number
of local problems that have to be considered.
It turns out that in the case that N is a (small) Fibonacci number, the Fibonacci
lattice yields the optimal point configuration. It is common wisdom, see e.g.
[3, 10, 1518], that the Fibonacci lattice provides a very good point set for integrating periodic functions. Now our results support the conjecture that they are actually
the best points.
These results may suggest that the optimal point configurations are integration
lattices or at least lattice point sets. This seems to be true for some numbers N of
points, for example for Fibonacci numbers, but not always. However, it can be shown
1
, P N ). Moreover, our
that integration lattices are always local minima of wce(Hmix
numerical results also suggest that for small the optimal points are always close
to a lattice point set, i.e. N -point sets of the form
i (i)
,
N N
: i = 0, . . . , N 1 ,
1 (T2 )
2 Quasi-Monte Carlo Integration in Hmix
kZ
|2 k|2 fk2 =
2
T
f (x) dx
+
f (x)2 dx
(1)
388
for a function f in the univariate Sobolev space H 1 (T) = W 1,2 (T) L 2 (T) of
functions with first weak derivatives bounded in L 2 gives a Hilbert space norm
f H 1, on H 1 (T) depending on the parameter > 0. The corresponding inner
product is given by
( f, g) H 1, (T) =
f (x) dx
g(x) dx +
We denote the Hilbert space H 1 (T) equipped with this inner product by H 1, (T).
Since H 1, (T) is continuously embedded in C 0 (T) it is a reproducing
kernel Hilbert space (RKHS), see [1], with a symmetric and positive definite kernel
K 1, : T T R, given by [20]
K 1, (x, y) := 1 +
kZ\{0}
(2)
= 1 + k(|x y|),
where k(t) = 21 (t 2 t + 16 ) is the Bernoulli polynomial of degree two divided by
two.
This kernel has the property that it reproduces point evaluations in H 1 , i.e.
f (x) = ( f (), K (, x)) H 1, for all f H 1 . The reproducing kernel of the tensor
1,
product space Hmix (T2 ) := H 1 (T) H 1 (T) C(T2 ) is the product of the univariate kernels, i.e.
K 2, (x, y) = K 1, (x1 , y1 ) K 1, (x2 , y2 )
= 1 + k(|x1 y1 |) + k(|x2 y2 |) + 2 k(|x1 y1 |)k(|x2 y2 |).
(3)
i ig
,
N N
mod 1 : i = 0, . . . , N 1
389
nth Fibonacci number. It is well known that the Fibonacci lattices yield the optimal
rate of convergence in certain spaces of periodic functions.
In the setting of a reproducing kernel Hilbert space with kernel K on a general
domain D, the worst case error of the QMC-rule Q N can be computed as
wce(H , P N )2 =
K (x, y) dx d y
N
N
2
1
K (x i , y) dy + 2
K (x i , x j ),
N
N
i=1 D
i, j=1
which is the norm of the error functional, see e.g. [4, 11]. For the kernel K 2, we
obtain
1,
wce(Hmix (T2 ), P N )2 = 1 +
N
N
1
K 2, (x i , x j ).
N 2 i=1 j=1
and
#P N B
vol(B).
N
Finally, the periodic L 2 -discrepancy of P N is the L 2 -norm of the discrepancy function taken over all periodic boxes B = B(x, y), i.e.
390
D2 (P N ) =
1/2
D(P N , B(x, y)) d y dx
2
[0,1)d
[0,1)d
It turns out, see [11, p. 43] that the periodic L 2 -discrepancy can be computed as
D2 (P N )2 = 3d +
1
N2
K d (x, y)
x, yP N
1,6
(Td ), P N )2 ,
= 3d wce(Hmix
wce(Hmix (T2 ), P N )2 = 1 +
N 1
1
K 1, (xi , x j ) K 1, (yi , y j )
N2
i, j=0
=1+
=
=
N2
1
N2
N
1
N
1
1 + k(|xi x j |) + k(|yi y j |) + 2 k(|xi x j |)k(|yi y j |)
i, j=0
k(|xi x j |) + k(|yi y j |) + k(|xi x j |)k(|yi y j |)
i, j=0
(2k(0) + k(0)2 )
N
N 2 N 1
2
+ 2
k(|xi x j |) + k(|yi y j |) + k(|xi x j |)k(|yi y j |)
N
i=0 j=i+1
391
1,
N 1
N 2
k(|xi x j |) + k(|yi y j |) + k(|xi x j |)k(|yi y j |)
i=0 j=i+1
(4)
or
G (x, y) :=
N 1
(5)
i, j=0
For theoretical considerations we will sometimes use G , while for the numerical
implementation we will use F as objective function, since it has less summands.
Let , S N be two permutations of {0, 1, . . . , N 1}. Define the sets
D,
x
x (1) x (N 1)
= x [0, 1) , y [0, 1) : (0)
y (0) y (1) y (N 1)
N
(6)
on which all points maintain the same order in both components and hence it holds
|xi x j | = si, j (xi x j ) for si, j {1, 1}. It follows that the restriction of F to
D, , i.e. F (x, y)|D, , is a polynomial of degree 4 in (x, y). Moreover, F |D, is
convex for sufficiently small .
Proposition 1 F (x, y)|D, and G (x, y)|D, are convex if [0, 6].
Proof It is enough to prove the claim for
G (x, y) =
N 1
i, j=0
1 + k(s) = 1 +
2
1
s s+
6
2
1 2
> s
.
2
392
But this is elementary to check for 0 < 6 and s [0, 1]. In the case = 6
the determinant of H ( f ) = 0 and some additional argument is necessary which we
omit here.
Since
[0, 1) N [0, 1) N =
D, ,
(, )S N S N
0 = x0 x1 . . . x N 1
,
D = x [0, 1) N , y [0, 1) N :
0 = y0 y (1) y (N 1)
393
(7)
for
0 i, j N 1
(8)
(0) = 0
d( (1), (2)) d(0, (N 1))
(1) = min {d( (i), (i + 1)) | i = 0, 1, . . . , N 1}
is lexicographically smaller than 1 .
N 1
i, j=0
394
Now we use that the sum of a convex and a strictly convex function is again strictly
convex. Hence it is enough to show that the function
f (x1 , . . . , x N 1 , y1 , . . . , y N 1 ) =
N 1
i=1
N 1
i=1
m
f i (z i )
i=1
is strictly convex.
3.3 Minimizing F on D
Our strategy will be to compute the local minimum of F on each region
D [0, 1) N [0, 1) N for all semi-canonical permutations C N S N and
determine the global minimum by choosing the smallest of all the local ones.
This gives for each C N the constrained optimization problem
min F (x, y)
(x, y)D
and
wi ( y) = y (i) y (i1)
for i = 1, . . . , N 1. (10)
In order to use the necessary (and due to local strict convexity also sufficient) conditions for local minima
F (x, y) = 0
xk
F (x, y) = 0
yk
and
395
for k = 1, . . . , N 1
N 1
k1
N 1
F (x, y)|D = yk
ci,k
ci,k yi +
ci,k si,k
ck, j sk, j ,
yk
2 i=0
i=0
i=0
j=k+1
N 1
i=k
i=k
(11)
where si, j = sgn(yi y j ) and ci, j := 1 + k(|xi x j |) = c j,i .
Interchanging x and y the same result holds for the partial derivatives with respect
to x with the obvious modification to ci, j and the simplification that si, j = 1.
The second order derivatives with respect to y are given by
N 1
k1
2
i=0 ci,k + i=k+1 ci,k
F(x, y)|D =
yk y j
ck, j
for j = k
, k, j {1, . . . , N 1}
for j = k
(12)
Again, the analogue for xk x j F(x, y)|D is obtained with the obvious modification
ci, j = 1 + k(|yi y j |).
2
Proof We prove the claim for the partial derivative with respect to y:
1
N
2 N
F (x, y) =
k(|yi y j |) 1 + k(|xi x j |) +
k(|xi x j |)
yk
yk
yk
i=0 j=i+1
N
2 N
1
=:ci, j
ci, j
i=0 j=i+1
1
N
2 N
ci, j
i=0 j=i+1
N
1
ck, j sk, j
j=k+1
k(|yi y j |)
yk
for i = k
si, j
k (si, j (yi y j )) si, j for j = k
0
else
k1
1
1
sk, j (yk y j )
ci,k si,k si,k (yi yk )
2
2
i=0
1
1
k1
N
1
N
N
1
= yk
ci,k
ci,k yi +
ci,k si,k
ck, j sk, j .
2
i=0
i=k
i=0
i=k
i=0
j=k+1
396
N 1
(i vi (x) + i wi ( y))
(13)
(14)
i=1
(x, y)D
and
, 0 (component-wise).
(15)
1 1 0 . . . 0 0
0 1 1 . . . 0 0
.. R(N 1)(N 1) .
..
B := ...
(16)
.
.
0
. . . 0 1 1
0
...
0 1
Then the partial derivatives of L F with respect to x and y are given by
1 2
..
.
x L F (x, y, , ) = x F(x, y)
= x F(x, y) B
N 2 N 1
N 1
(17)
397
and
(1) (2)
..
.
y L F (x, y, , ) = y F(x, y)
= y F(x, y) BP .
(N 2) (N 1)
(N 1)
(18)
This leads to the following theorem.
Theorem 1 For C N and > 0 let the point ( x , y ) D fulfill
F( x , y ) =
xk
F( x , y ) =
yk
and
for k = 1, . . . , N 1.
(19)
Then
F(x, y) F( x , y )
N 1
(N i) vi ( x ) + (N i)wi ( y )
i=1
2
> F( x , y ) N
(20)
(21)
and
= P1 B 1 y F( x , y )
(22)
x F( x , y) = B
and
y F( x , y) = BP .
(23)
yields
B 1
1 1 ...
0 1 . . .
:= .
.. 0 . . .
0 ... 0
1
1
(N 1)(N 1)
,
.. R
.
1
which yields y, > 0 and hence by Wolfe duality gives (20). The second inequality
(21)
follows from
noting that both |vi (x)| and |wi ( y)| are bounded by 1 and
Nthen
1
N 1
(N i) = 2 i=1
i = (N 1)(N 2) < N 2 .
2 i=1
Now, suppose we had some candidate (x , y ) D for an optimal point set. If we
can find for all other C N points ( x , y ) that fulfills (19) and
F( x , y ) N 2 F (x , y )
398
for some > 0, we can be sure that D is (up to torus symmetry) the unique domain
D that contains the globally optimal point set.
(x, y)D
(24)
where the inner minimum has a unique solution due to Proposition 2. Moreover, since
D is a convex domain we know that the local minimum of F (x, y)|D is not on
the boundary. Hence we can restrict our search for optimal point sets to the interior
of D , where F is differentiable.
Instead of directly employing a local optimization technique, we will make use
of the special structure of F . While F (x, y)|D is a polynomial of degree four, the
functions
(25)
x F (x, y0 )|D and y F (x 0 , y)|D ,
where one coordinate direction is fixed, are quadratic polynomials, which have unique
minima in D . We are going to use this property within an alternating minimization
approach. This means, that the objective function F is not minimized along all coordinate directions simultaneously, but with respect to certain successively alternating
blocks of coordinates. If these blocks have size one this method is usually referred
to as coordinate descent [7] or nonlinear GauSeidel method [5]. It is successfully employed in various applications, like e.g. expectation maximization or tensor
approximation [8, 19].
In our case we will alternate between minimizing F (x, y) along the first coordinate block x (0, 1) N 1 and the second one y (0, 1) N 1 , which can be done
exactly due to the quadratic polynomial property of the partial objectives (25). The
method is outlined in Algorithm 1, which for threshold-parameter = 0 approximates the local minimum of F on D . For > 0 it obtains feasible points that
399
1
N
,...,
N 1
N )
(1)
(N 1)
).
N ,...,
N
repeat
N
N
1. compute H x := xi x j F (x (k) , y(k) i, j=1 and x = xi F (x (k) , y(k) i=1 by (12) and (11).
2. Update x (k+1) := H 1
( x + 1) via Cholesky factorization.
x
N
N
3. compute H y := yi y j F (x (k+1) , y(k) i, j=1 and y = yi F (x (k+1) , y(k) i=1 .
4. Update y(k+1) := H 1
y + 1 via Cholesky factorization.
y
5. k := k + 1.
until x 2 + y 2 <
Output: point set (x, y) D with x F (x, y) 1 and y F (x, y) 1.
(x, y)D
400
4.3 Results
In Figs. 1 and 2 the optimal point sets for N = 2, . . . , 16 and both = 1 and = 6
are plotted. It can be seen that they are close to lattice point sets, which justifies using
them as start points in Algorithm 1. The distance to lattice points seems to be small
if is small.
In Table 1 we list the permutations for which D contains an optimal set of
cubature points. In the second column the total number of semi-canonical permutations C N that had to be considered is shown. It grows approximately like 21 (N 2)!.
Moreover, we computed the minimal worst case error and periodic L 2 -discrepancies.
In some cases we found more than one semi-canonical permutation for which
D contained a point set which yields the optimal worst case error. Nevertheless, they
represent equivalent permutations. In the following list, the torus symmetries used
to show the equivalency of the permutations are given. All operations are modulo 1.
N = 7: (x, y) (1 y, x)
N = 9: (x, y) (y 2/9, x 1/9)
N = 11: (x, y) (y + 5/11, x 4/11)
N = 14: (x, y) (x 4/14, y + 6/14)
N = 15: (x, y) (y + 3/15, x + 2/15), (y 2/15, 12/15 x), (y 6/15,
4/15 x)
N = 16: (x, y) (1/16 x, 3/16 y)
In all the examined cases N {2, . . . , 16} Algorithm 2 produced sets N which
contained exactly the permutations that were previously obtained by Algorithm 1
and are listed in Table 1. Thus we can be sure, that the respective D contained
minimizers of F , which on each D are unique. Hence we know that our numerical
approximation of the minimum is close to the true global minimum, which (modulo
torus symmetries) is unique. In the cases N = 1, 2, 3, 5, 7, 8, 12, 13 the obtained
global minima are integration lattices.
401
402
403
Table 1 List of semi-canonical permutations , such that D contains an optimal set of cubature
points for N = 1, . . . , 16
1,1
N
|CN |
wce(Hmix
, P N ) D2 (P N )
Lattice
1
2
3
4
5
6
7
0
1
1
2
5
13
57
0.416667
0.214492
0.146109
0.111307
0.0892064
0.0752924
0.0650941
0.372678
0.212459
0.153826
0.121181
0.0980249
0.0850795
0.0749072
8
9
282
1,862
0.056846
0.0512711
0.0651562
0.0601654
10
14,076
0.0461857
0.054473
11
124,995
0.0422449
0.050152
12
1,227,562
0.0370732
0.0456259
13
13,481,042
0.0355885
0.0421763
14
160,456,465
0.0333232
0.0400524
15
2,086,626,584
0.0312562
0.0379055
16
29,067,602,676
0.0294507
0.0359673
(0)
(0 1)
(0 1 2)
(0 1 3 2)
(0 2 4 1 3)
(0 2 4 1 5 3)
(0 2 4 6 1 3 5), (0
3 6 2 5 1 4)
(0 3 6 1 4 7 2 5)
(0 2 6 3 8 5 1 7 4),
(0 2 7 4 1 6 3 8 5)
(0 3 7 1 4 9 6 2 8
5)
(0 3 8 1 6 10 4 7 2
9 5), (0 3 9 5 1 7
10 4 8 2 6)
(0 5 10 3 8 1 6 11
4 9 2 7)
(0 5 10 2 7 12 4 9
1 6 11 3 8)
(0 5 10 2 8 13 4
11 6 1 9 3 12 7),
(0 5 10 3 12 7 1 9
4 13 6 11 2 8)
(0 4 9 13 6 1 11 3
8 14 5 10 2 12 7),
(0 5 11 2 7 14 9 3
12 6 1 10 4 13 8),
(0 5 11 2 8 13 4
10 1 6 14 9 3 12
7), (0 5 11 2 8 13
6 1 10 4 14 7 12 3
9)
(0 3 11 5 14 9 1 7
12 4 15 10 2 6 13
8), (0 3 11 6 13 1
9 4 15 7 12 2 10 5
14 8)
404
5 Conclusion
In the present paper we computed optimal point sets for quasi-Monte Carlo cubature
of bivariate periodic functions with mixed smoothness of order one by decomposing
the required global optimization problem into approximately (N 2)!/2 local ones.
Moreover, we computed lower bounds for each local problem using arbitrary precision rational number arithmetic. Thereby we obtained that our approximation of the
global minimum is in fact close to the real solution.
In the special case of N being a Fibonacci number our approach showed that for
N {1, 2, 3, 5, 8, 13} the Fibonacci lattice is the unique global minimizer of the
1
. We strongly conjecture that this is true for all
worst case integration error in Hmix
Fibonacci numbers. Also in the cases N = 7, 12, the global minimizer is the obtained
integration lattice.
In the future we are planning to prove that optimal points are close to lattice
r
, i.e. Sobolev spaces with dominating
points. Moreover, we will investigate Hmix
mixed smoothness of order r 2 and other suitable kernels and discrepancies.
Acknowledgments The authors thank Christian Kuske and Andr Uschmajew for valuable hints
and discussions. Jens Oettershagen was supported by the Sonderforschungsbereich 1060 The Mathematics of Emergent Effects of the DFG.
References
1. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337404 (1950)
2. Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., Windham, M.P.: Local convergence
analysis of a grouped variable version of coordinate descent. J. Optim. Theory Appl. 54(3),
471477 (1987)
3. Bilyk, D., Temlyakov, V.N., Yu, R.: Fibonacci sets and symmetrization in discrepancy theory.
J. Complex. 28, 1836 (2012)
4. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
5. Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gau-Seidel method
under convex constraints. Oper. Res. Lett. 26(3), 127136 (2000)
6. Larcher, G., Pillichshammer, F.: A note on optimal point distributions in [0, 1)s . J. Comput.
Appl. Math. 206, 977985 (2007)
7. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 735 (1992)
8. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley series in probability
and statistics. Wiley, New York (1997)
9. Niederreiter, H.: Quasi-Monte Carlo Methods and Pseudo-Random Numbers, Society for
Industrial and Applied Mathematics (1987)
10. Niederreiter, H., Sloan, I.H.: Integration of nonperiodic functions of two variables by Fibonacci
lattice rules. J. Comput. Appl. Math. 51, 5770 (1994)
11. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems. Volume II: Standard
Information for Functionals. European Mathematical Society Publishing House, Zrich (2010)
12. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
13. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables,
Society for Industrial and Applied Mathematics (1987)
405
14. Pillards, T., Vandewoestyne, B., Cools, R.: Minimizing the L 2 and L star discrepancies of a
single point in the unit hypercube. J. Comput. Appl. Math. 197, 282285 (2006)
15. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford University Press, New
York and Oxford (1994)
16. Ss, V.T., Zaremba, S.K.: The mean-square discrepancies of some two-dimensional lattices.
Stud. Sci. Math. Hung. 14, 255271 (1982)
17. Temlyakov, V.N.: Error estimates for Fibonacci quadrature formulae for classes of functions.
Trudy Mat. Inst. Steklov 200, 327335 (1991)
18. Ullrich, T., Zung, D.: Lower bounds for the integration error for multivariate functions with
mixed smoothness and optimal Fibonacci cubature for functions on the square. Math. Nachr.
288(7), 743762 (2015)
19. Uschmajew, A.: Local convergence of the alternating least squares algorithm for canonical
tensor approximation. SIAM J. Matrix Anal. Appl. 33(2), 639652 (2012)
20. Wahba, G.: Smoothing noisy data with spline functions. Numer. Math. 24(5), 383393 (1975)
21. White, B.E.: On optimal extreme-discrepancy point sets in the square. Numer. Math. 27, 157
164 (1977)
22. Zinterhof, P.: ber einige Abschtzungen bei der Approximation von Funktionen mit Gleichverteilungsmethoden. sterreich. Akad. Wiss. Math.-Naturwiss. Kl. S.-B. II 185, 121132
(1976)
Abstract Quasi-Monte Carlo methods are used for numerically integrating multivariate functions. However, the error bounds for these methods typically rely on
a priori knowledge of some semi-norm of the integrand, not on the sampled function values. In this article, we propose an error bound based on the discrete Fourier
coefficients of the integrand. If these Fourier coefficients decay more quickly, the
integrand has less fine scale structure, and the accuracy is higher. We focus on rank-1
lattices because they are a commonly used quasi-Monte Carlo design and because
their algebraic structure facilitates an error analysis based on a Fourier decomposition of the integrand. This leads to a guaranteed adaptive cubature algorithm with
computational cost O(mbm ), where b is some fixed prime number and bm is the
number of data points.
Keywords Quasi-Monte Carlo methods Multidimensional integration
lattices Adaptive algorithms Automatic algorithms
Rank-1
1 Introduction
Quasi-Monte Carlo (QMC) methods use equally weighted sums of integrand values
at carefully chosen nodes to approximate multidimensional integrals over the unit
cube,
n1
1
f (z i )
f (x) dx.
n i=0
[0,1)d
Ll.A. Jimnez Rugama (B) F.J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology,
10 W 32nd Street, E1-208, Chicago, IL 60616, USA
e-mail: ljimene1@hawk.iit.edu
F.J. Hickernell
e-mail: hickernell@iit.edu
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_20
407
408
z bm = b (z bm1 + am ) = b
a0 {1, . . . , b 1}d ,
am + + b
m1
a0 ,
am
(1a)
Fdb ,
m N. (1b)
409
(2)
=0
i z b mod 1 =
m1
i z b mod 1 =
=0
m1
=0
= j z bm1 mod 1,
where j =
m1
i bm1 , (3)
=0
where (2) was used. This means that node set Pm defined above may be written as
the integer multiples of the generating vector z bm1 since
m1
Pm := {z i }iFbm = z bm1
i bm1 mod 1 : i 0 , . . . , i m1 Fb
=0
k Zd , x [0, 1)d .
(4)
This bilinear operation has the following properties: for all t, x [0, 1)d , k, l Zd ,
and a Z, it follows that
k, 0
= 0, x
= 0,
k, ax mod 1 t
= (a k, x
+ k, t
) mod 1
(5a)
(5b)
410
ak + l, x
= (a k, x
+ l, x
) mod 1,
k, x
= 0 k Z
(5c)
= x = 0.
(5d)
(6)
The bilinear operation defined in (4) is also used to define the dual lattice corresponding to Pm :
Pm := {k Zd : k, z i
= 0, i Fbm }
= {k Zd : k, z bm1
= 0}
(7)
By this definition P0 = Zd , and the properties (2), (4), and (6), imply also that the
Pm are nested subgroups with
= {0}.
Zd = P0 Pm P
(8)
Analogous to the dual lattice definition, for j Fbm one can define the dual
cosets as Pm, j := {k Zd : bm k, z bm1 = j}. Hence, a similar extended property (8)
applies:
Pm, j =
b1
, j+abm
Pm+1
, j+abm
= Pm, j Pm+1
, a Fb , j Fbm .
(9)
a=0
, j+abm b1
}a=0
The overall dual cosets structure can be represented as a tree, where {Pm+1
, j
are the children of Pm .
(a)
(b)
20
15
0.8
10
0.6
0.4
0
10
0.2
15
0
0
0.2
0.4
0.6
0.8
20
20
10
10
20
411
Figure 1 shows an example of a rank-1 lattice node set with 64 points in dimension
2 and its dual lattice. The parameters defining this node set are b = 2, m = 6, and
z 32 = (1, 27)/64. It is useful to see how Pm = Pm1 {Pm1 + z 2m1 mod 1}.
3 Fourier Series
The integrands considered here are absolutely continuous periodic functions. If the
integrand is not initially periodic, it may be periodized as discussed in [4, 12], or
[13, Sect. 2.12]. More general box domains may be considered, also by using variable
transformations, see e.g., [7, 8].
The L 2 ([0, 1)d ) inner product is defined as f, g 2 = [0,1)d f (x)g(x) dx. The
f (x) =
(10)
f(k)e2 1k,x
, where f(k) = f, e2 1k,
,
2
kZd
and the inner product of two functions in L 2 ([0, 1)d ) is the 2 inner product of their
series coefficients:
f, g
2 =
.
f(k)g(k)
d
kZ
2
kZd
(11)
This property of the dual lattice is used below to describe the absolute error of a
shifted rank-1 lattice cubature rule in terms of the Fourier coefficients for wavenumbers in the dual lattice. For fixed [0, 1)d , the cubature rule is defined as
b 1
1
Im ( f ) := m
f (z i ),
b i=0
m
m N0 .
(12)
412
[0,1)d
f (x) dx Im ( f ) =
f(k)e2
1k,
kPm \{0}
f (k) . (13)
kPm \{0}
fm (k) := Im e2 1k,
f ()
2
1k,
2
1l,
f(l)e
= Im e
=
lZd
f(l) Im e2 1lk,
lZd
f(l)e2
1lk,
lZd
(14a)
f(k + l)e2
lPm
= f(k) +
1Pm (l k)
1l,
f(k + l)e2
1l,
k Zd .
(14b)
lPm \{0}
Thus, the discrete transform fm (k) equals the integral transform f(k), defined in
(10), plus aliasing terms corresponding to f(k + l) scaled by the shift, , where
l Pm \ {0}.
To facilitate the calculation of fm (k), we define the map
m : Zd Fbm as follows:
0 (k) := 0,
m (k) := bm k, z bm1
, m N.
, j
(15)
k, z i
= k,
m1
i z b mod 1 =
=0
m1
413
i k, z b mod 1
=0
m1
i
+1 (k)b1 mod 1. (16)
=0
The map
m depends on the choice of the embedded rank-1 lattice node sets
defined in (1) and (3). We can confirm that the right hand side of this definition lies
in Fbm by appealing to (1) and recalling that the a are integer vectors:
bm k, z bm1
= bm [(b1 k T am1 + + bm k T a0 ) mod 1]
= (bm1 k T am1 + + k T a0 ) mod bm Fbm , m N.
Moreover, note that for all m N
m+1 (k)
m (k) = bm+1 k, z bm
bm k, z bm1
= bm [b k, z bm
k, z bm1
]
= bm [a + k, bz bm mod 1
k, z bm1
], for some a Fb
= bm [a + k, z bm1
k, z bm1
], by (2)
= abm for some a Fb .
(17)
(18)
= 1, . . . , m.
(19)
1 2 1k,zi
2 1k,
f m (k) := Im e
f () = m
e
yi
b i=0
= e2
1k,
Ym (
m (k)),
m N0 , k Zd ,
(20)
= Ym ( m ).
414
The quantity Ym (), Fbm , which is essentially the discrete Fourier transform, can
be computed efficiently via some intermediate quantities. For p {0, . . . , m 1},
m, N0 define Ym,0 (i 0 , . . . , i m1 ) := yi0 ++im1 bm1 and let
Ym,m p (, i m p , . . . , i m1 )
:=
b1
1
bm p
i m p1 =0
b1
m p1
i 0 =0
i +1 b1 .
=0
b1
1
bm p
1
b
i m p1 =0
b1
b1
p1
m
1
i 0 =0
=0
i m p1 =0
For each p one must perform O(bm ) operations, so the total computational cost to
obtain Ym () for all Fbm is O(mbm ).
5 Error Estimation
As seen in Eq. (13), the absolute error is bounded by a sum of the absolute value of
the Fourier coefficients in the dual lattice. Note that increasing the number of points
in our lattice, i.e. increasing m, removes wavenumbers from the set over which this
summation is defined. However, it is not obvious how fast is this error decreasing
with respect to m. Rather than deal with a sum over the vector wavenumbers, it is
more convenient to sum over scalar non-negative integers. Thus, we define another
mapping k : N0 Zd .
Set k(0)
=0
For m N0
For Fbm ,
(i) If a = 0, choose k( + ab ) {k Z :
m+1 (k) =
m ( k())}.
m
d
(ii) Choose k( + a b ) {k Z :
m+1 (k) =
m ( k()) + a bm },
for a {1, . . . , b 1}\{a}.
415
Definition 1 is intended to reflect the embedding of the dual cosets described in (8)
, j+abm
= j. In (i), if k()
Pm+1
with a > 0,
and (9). For clarity, consider
m ( k())
, j
m
m+1 (k) =
m ( k())
+
It remains to be shown that for any Fbm , {k Zd :
m
a b } is nonempty for all a Fb with a = a. Choose l such that l, z 1
= b1 .
This is possible because z 1 = b1 a0 = 0. For any m N0 , Fbm , and a Fb ,
note that
k()
+ a bm l, z bm = k(),
by (5c)
z bm + a bm l, z bm
mod 1
= [bm1
m+1 ( k())
+ a l, bm z bm mod 1 ] mod 1
by (5b) and (15)
m ( k())
+ ab1 + a l, z 1
] mod 1
= [b
m1
= [b
m1
by (2)
m ( k())
+ (a + a )b ] mod 1,
+ a bm l) =
m ( k())
+ (a + a mod b)bm
m+1 ( k()
by (15).
By choosing a such that a = (a + a mod b), we have shown that the set Fbm ,
m+1 (k) =
m ( k())
+ a bm } is nonempty.
{k Zd :
To illustrate the initial steps of a possible mapping, consider the lattice in Fig. 1
and Table 1. For m = 0, {0} and a = 0. This skips i) and implies k(1)
{k
Continuing, we may take k(4) := (1, 1), k(5) := (0, 1), k(6) := (1, 1) and
k(7)
:= (0, 1).
Lemma 1 The map in Definition 1 has the property that for m N0 and Fbm ,
+ bm )} = {l Zd : k()
l Pm }.
{ k(
=0
416
Table 1 Values
1 ,
2 and
3 for some wavenumbers and a possible assignment of k()
k()
1( k())
=
(
k())
=
(
k())
=
2
3
2 k(),
4 k(),
8 k(),
(1, 27)/2
(1, 27)/4
(1, 27)/8
(0, 0)
(1, 1)
(1, 1)
(1, 1)
(1, 0)
(1, 0)
(0, 1)
(0, 1)
(1, 1)
0
4
2
6
1
3
7
5
0
0
0
0
1
1
1
1
0
0
0
2
2
3
1
1
3
0
0
4
2
6
7
1
5
3
4
Proof This statement holds trivially for m = 0 and = 0. For m N it is noted that
by (7)
k l Pm k l, z bm1
= 0
by (5c)
k, z bm1
= l, z bm1
bm
m (k) = bm
m (l)
m (k) =
m (l).
by (15)
(21)
m (l) =
m ( k())}
= {l Zd : k()
l Pm }.
{l Zd :
(22)
+ bm )}b1 {k Zd :
m+1 (k) =
m ( k())
+ abm , a Fb }
{ k(
=0
= {k Zd :
m (k) =
m ( k())}.
Applying property (19) on the right side,
))},
+ bm )}b1 {k Zd :
(k) =
( k(
{ k(
=0
= 1, . . . , m.
Because one can say the above equation holds = 1, . . . , n < m, the left hand side
can be extended,
+ bm )} {k Zd :
{ k(
m (k) =
m ( k())}.
=0
(23)
417
follows that
m ( k(
m ( k(
m (l) =
m ( k()).
Since m and
+ bm )} . Thus,
are both in Fbm , this implies that m = , and so l { k(
=0
+ bm )} {k Zd :
{ k(
m (k) =
m ( k())},
and the lemma is proved.
=0
m
f (x) dx Im ( f )
f b ,
[0,1)d
(24)
=1
2
f+bm e
1 k(+b
) k(),
(25)
=1
m
b
1
f ,
b
1
S,m ( f ) =
=bm1
f+bm ,
=b1 =1
f ,
Sqm ( f ) =
S0,m ( f ) + +
Sm,m ( f ) =
=bm
S,m ( f ) =
b
1
fm, .
=b1
Note that
S,m ( f ) is the only one that can be observed from data because it
involves
the
coefficients. In fact, from (20) one can identify
discrete transform
and our adaptive algorithm will be based on this sum bound fm, = Ym (
m ( k()))
ing the other three, Sm ( f ),
S,m ( f ), and Sqm ( f ), which cannot be readily observed.
and be some bounded non-negative
Let N be some fixed integer and
valued functions. We define a cone, C , of absolutely continuous functions whose
Fourier coefficients decay according to certain inequalities:
S,m ( f )
(m ) Sqm ( f ), m,
C := { f AC([0, 1)d ) :
Sqm ( f ) (m
=
We also require the existence of r such that
(r )(r
) < 1 and that limm (m)
0. This set is a cone, i.e. f C = a f C a R, but it is not convex. A wider
discussion on the advantages and disadvantages of designing numerical algorithms
for cones of functions can be found in [2].
418
b
1
=b1
b
1
b
1
f =
=b1
fm, +
=b1
b
1
m
2 1 k(+b
) k(),
f+bm e
f m,
=1
f+bm =
S,m ( f ) +
S,m ( f )
=b1 =1
(m )(m
)S ( f )
S,m ( f ) +
and provided that
(m )(m
) < 1,
(27)
S ( f )
S,m ( f )
.
1
(m )(m
)
419
(28)
By (24) and the cone conditions, (28) implies a data-based error bound:
[0,1)d
fbm =
S0,m ( f )
f (x) dx Im ( f )
(m) Sqm ( f )
=1
(m)(m
)S ( f )
(m)(m
)
S,m ( f ).
1
(m )(m
)
(29)
(m)(r
)
.
1
(r )(r
)
The choice of the parameter r is important. Larger r means a smaller C(m), but it
also makes the error bound more dependent on smaller indexed Fourier coefficients.
Algorithm 1 (Adaptive Rank-1 Lattice Cubature, cubLattice_g) Fix r and ,
and describing C in (26). Given a tolerance, , initialize m = + r and do:
Step 1. According to Sect. 4, compute
Smr,m ( f ).
Step 2. Check whether C(m) Smr,m ( f ) . If true, return Im ( f ) defined in (12).
If not, increment m by one, and go to Step 1.
Theorem 1 For m = min{m + r : C(m )
Sm r,m ( f ) }, Algorithm 1 is successful whenever f C ,
[0,1)d
f (x)dx Im ( f ) .
(r )(r
)]Sm r ( f ) }, we also have bm bm . This means that the
C(m )[1 +
computational cost can be bounded,
Im , f, $( f )bm + cm bm
cost
where $( f ) is the cost of evaluating f at one data point.
420
Proof By construction, the algorithm must be successful. Recall that the inequality
used for building the algorithm is (29).
To find the upper bound on the computational cost, a similar result to (27) provides
S,m ( f ) =
b
1
=b1
b
1
=b1
b
1
m
2 1 k(+b
) k(),
f +
fm, =
f +bm e
=b1
b
1
f +
=1
f+bm = S ( f ) +
S,m ( f )
=b1 =1
[1 +
(m )(m
)]S ( f ).
Replacing
S,m ( f ) in the error bound in (29) by the right hand side above proves that
the choice of m needed to satisfy the tolerance is no greater than m defined above.
In Sect. 4, the computation of
Smr,m ( f ) is described in terms of O(mbm ) operations. Thus, the total cost of Algorithm 1 is,
Im , f, $( f )bm + cm bm
cost
7 Numerical Example
Algorithm 1 has been coded in MATLAB as cubLattice_g in base 2, and is
part of GAIL, [1]. To test it, we priced an Asian call with geometric Brownian
motion, S0 = K = 100, T = 1 and r = 3 %. The test is performed on 500 samples
whose dimensions are chosen IID uniformly among 1, 2, 4, 8, 16, 32, and 64, and
the volatility also IID uniformly from 10 to 70 %. Results, in Fig. 3, show 97 % of
success meeting the error tolerance.
The algorithm cone parametrization was = 6, r = 4 and C(m) = 5 2m . In
addition, each replication used a shifted lattice with U (0, 1). However, results
are strongly dependent on the generating vector that was used for creating the rank-1
lattice embedded node sets. The vector applied to this example was found with the
latbuilder software from Pierre LEcuyer and David Munger [9], obtained for
226 points, d = 250 and coordinate weights j = j 2 , optimizing the P2 criterion.
For this particular example, the choice of C(m) does not have a noticeable impact
on the success rate or execution time. In other cases such as discontinuous functions, it is more sensitive. Being an adaptive algorithm, if the Fourier coefficients
10 2
Time (seconds)
Fig. 3 Empirical
distribution functions
obtained from 500 samples,
for the error (continuous
line) and time (slashed-doted
line). Quantiles are specified
on the right and top axes
respectively. The tolerance
of 0.02 (vertical dashed line)
is an input of the algorithm
and will be a guaranteed
bound on the error if the
function lies inside the cone
0.2
421
0.4
0.6
0.8
10 1
0.8
0.6
-1
0.4
10 -2
0.2
10
10
10
-3
10
-5
10
-4
10
-3
10
-2
10
-1
10
Error
decrease quickly, cone conditions have a weaker effect. One can see that the number
of summands involving
Smr,m ( f ) is 2mr 1 for a fixed r . Thus, in order to give a
uniform weight to each wavenumber, we chose C(m) proportional to 2m .
422
Acknowledgments The authors thank Ronald Cools and Dirk Nuyens for organizing MCQMC
2014 and greatly appreciate the suggestions made by Sou-Cheng Choi, Frances Kuo, Lan Jiang,
Dirk Nuyens and Yizhi Zhang to improve this manuscript. In addition, the first author also thanks
Art B. Owen for partially funding traveling expenses to MCQMC 2014 through the US National
Science Foundation (NSF). This work was partially supported by NSF grants DMS-1115392, DMS1357690, and DMS-1522687.
References
1. Choi, S.C.T., Ding, Y., Hickernell, F.J., Jiang, L., Jimnez Rugama, Ll.A., Tong, X., Zhang,
Y., Zhou, X.: GAIL: Guaranteed Automatic Integration Library (versions 1.02.1). MATLAB
software. https://github.com/GailGithub/GAIL_Dev (20132015)
2. Clancy, N., Ding, Y., Hamilton, C., Hickernell, F.J., Zhang, Y.: The cost of deterministic,
adaptive, automatic algorithms: cones, not balls. J. Complex. 30(1), 2145 (2014)
3. Dick, J., Kuo, F., Sloan, I.H.: High dimensional integration the Quasi-Monte Carlo way. Acta
Numer. 22, 133288 (2013)
4. Hickernell, F.J.: Obtaining O(N 2+ ) convergence for lattice quadrature rules. In: Fang, K.T.,
Hickernell, F.J., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2000,
pp. 274289. Springer, Berlin (2002)
5. Hickernell, F.J., Jimnez Rugama, Ll.A.: Reliable adaptive cubature using digital sequences.
In: Cools, R., Nuyens, D., (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2014, vol. 163,
pp. 367383. Springer, Heidelberg (2016)
6. Hickernell, F.J., Niederreiter, H.: The existence of good extensible rank-1 lattices. J. Complex.
19, 286300 (2003)
7. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: On tractability of weighted integration over
bounded and unbounded regions in Rs . Math. Comput. 73, 18851901 (2004)
8. Hickernell, F.J., Sloan, I.H., Wasilkowski, G.W.: The strong tractability of multivariate integration using lattice rules. In: Niederreiter, H. (ed.) Monte Carlo and Quasi-Monte Carlo Methods
2002, pp. 259273. Springer, Berlin (2004)
9. LEcuyer, P., Munger, D.: Algorithm xxx: A general software tool for constructing rank-1 lattice
rules. ACM Trans. Math. Softw. (2016). To appear, http://www.iro.umontreal.ca/~lecuyer/
myftp/papers/latbuilder.pdf
10. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF
Regional Conference Series in Applied Mathematics. SIAM, Philadelphia (1992)
11. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems Volume II: Standard Information for Functionals. No. 12 in EMS Tracts in Mathematics. European Mathematical Society,
Zrich (2010)
12. Sidi, A.: A new variable transformation for numerical integration. In: Brass, H., Hmmerlin,
G. (eds.) Numerical Integration IV, No. 112 in International Series of Numerical Mathematics,
pp. 359373. Birkhuser, Basel (1993)
13. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford University Press, Oxford
(1994)
1 Introduction
Modeling with physical entities like cameras, light sources, and materials on top of
a scene surface stored in a computer, light transport simulation may deliver photorealistic images. Due to complex discontinuities and the curse of dimension, analytic
solutions are out of reach. Thus simulation algorithms have to rely on sampling path
space and summing up the contributions of light transport paths that connect camera sensors and light sources. Depending on the complexity of the modeled scene,
the inherent noise of sampling may vanish only slowly with the progression of the
computation.
This noise may be efficiently reduced by smoothing the contribution of light transport paths before reconstructing the image. So far, intermediate approximations were
A. Keller (B) K. Dahm N. Binder
NVIDIA, Fasanenstr. 81, 10623 Berlin, Germany
e-mail: keller.alexander@gmail.com
K. Dahm
e-mail: ken.dahm@gmail.com
N. Binder
e-mail: nikolaus.binder@gmail.com
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_21
423
424
A. Keller et al.
computed for this purpose. However, removing frequent persistent visual artifacts
due to insufficient approximation then forces simulation from scratch. In addition,
optimizing the numerous parameters of such methods in order to increase efficiency
has been challenging.
We therefore propose a simple and efficient deterministic algorithm that has fewer
parameters. Furthermore, visual artifacts are guaranteed to vanish by progressive
computation and the consistency of the scheme, which in addition overcomes tedious
parameter tuning. While the algorithm unites the advantages of previous work, it also
provides the desired noise reduction as shown by many practical examples.
(a)
425
(b)
Fig. 1 Illustration of connecting path segments in light transport simulation: a Segments of light
transport paths are generated by following photon trajectories from the light source L and tracing
paths from the camera. End points of path segments then are connected either if they are mutually
visible (dashed line, shadow ray) or if they are sufficiently close (indicated by the dashed circle).
b Complementary to these connection techniques, path space filtering is illustrated by the green
part of the schematic: The contribution ci to the vertex xi of a light transport path is replaced by
a smoothed contribution ci resulting from averaging contributions csi + j to vertices inside the ball
B(n). This averaged contribution ci then is multiplied by the throughput i of the path segment
towards the camera and accumulated on the image plane P. In order to guarantee a consistent
algorithm, the radius r (n) of the ball B(n) must vanish with an increasing number n of samples
Photon mapping connects end points of path segments that are less than a specified radius apart. Decreasing such a radius r (n) with the increasing number n of
sampled light transport paths as introduced by progressive photon mapping [5], the
scheme became consistent: In the limit it in fact becomes equivalent to shadow ray
connections. A consistent and numerically robust quasi-Monte Carlo method for
progressive photon mapping has been developed in [12], while the references in this
article reflect the latest developments in photon mapping as well. Similar to stochastic progressive photon mapping [4], the computation is processing consecutive
batches of light transport paths. Depending on the low discrepancy sequence used,
some block sizes are preferable over others and we stick to integer block sizes of the
form bm as derived in [12]. Note that b is fixed by the underlying low discrepancy
sequence.
426
A. Keller et al.
Considering the ith out of a current total of n light transport paths, selecting a
vertex xi suitable for filtering the radiance contribution ci of the light path segment
towards xi also determines the throughput i along the path segment towards the
camera (see Fig. 1b). While any or even multiple vertices of a light transport path
may be selected, a simple and practical choice is the first vertex along the path from
the camera whose optical properties are considered sufficiently diffuse.
As mentioned before, one low discrepancy sequence is transformed to sample
path space in contiguous batches of bm N light transport paths, where for each
path one selected tuple (xi , i , ci ) is stored for path space filtering. As the memory
consumption is proportional to the batch size bm and given the size of the tuples
and the maximum size of a memory block, it is straightforward to determine the
maximum natural number m.
Processing the batch of bm paths starting at index si := bim bm , the image is
formed by accumulating i ci , where
B(n) xsi + j xi wi, j csi + j
bm 1
j=0 B(n) x si + j x i wi, j
bm 1
ci :=
j=0
(1)
is the weighted average of the contributions csi + j of all vertices xsi + j in a ball B(n)
of radius r (n) centered in xi normalized by the sum of weights wi, j as illustrated in
Fig. 1. While the weights will be detailed in Sect. 2.1, for the moment it is sufficient
to postulate wi,i = 0.
Centered in xi , the characteristic function B(n) always includes the ith path (as
opposed to for example [28]). Therefore, given an initial radius r0 (see Sect. 2.2 for
details), and a radius (see [12])
r (n) =
r0
for (0, 1)
n
(2)
vanishing with the total number n of paths guarantees limn ci = ci and thus
consistency. As a consequence, all artifacts visible during progressive computation
must be transient, even if they may vanish slowly. However, selecting a small radius
to hide the transient artifacts is a goal competing with a large radius to include as
many as possible contributions in the weighted average.
Given the path space samples of a path tracer with next event estimation and
implicit multiple importance sampling [11, 19], Fig. 2 illustrates progressive path
space filtering, especially its noise reduction, transient artifacts, and consistency
for an increasing number n of light transport paths. The lighting consists of a high
dynamic range environment map. The first hit points as seen from the camera are
stored as the vertices xi , where the range search and filtering takes place.
In spite of the apparent similarity of Eq. 1, methods used for scattered data interpolation [14, 25], and weighted uniform sampling [23, 27], there are principal differences: First, an interpolation property ci = ci would inhibit any averaging right from
the beginning and second, bm , as bm is proportional to the required amount of
memory to store light transport paths. Nevertheless, the batch size bm should be cho-
427
Fig. 2 The series of images illustrates progressive path space filtering. Each image shows the
unfiltered input above and the accumulation of weighted averages below the diagonal. As more and
more batches of paths are processed, the splotchy artifacts vanish due to the consistency of the
algorithm as guaranteed by the decreasing range search radius r (n). Model courtesy M. Dabrovic
and Crytek
sen as large as memory permits, because the efficiency results from simultaneously
filtering as many vertices as possible.
Caching samples of irradiance and interpolating them to increase the efficiency
of light transport simulation [32] has been intensively investigated [18] and has been
implemented in many renderers (see Fig. 5b). Scintillation in animations is the key
artifact of this method, which appears due to interpolating cached irradiance samples
that are noisy [17, Sect. 6.3.2] and cannot be placed in a coherent way over time. Such
artifacts require to adjust a set of multiple parameters followed by simulation from
scratch, because the method is not consistent.
Other than irradiance interpolation, path space filtering efficiently can filter across
discontinuities such as detailed geometry (for examples, see Fig. 6). It overcomes
the necessity of excessive trajectory splitting to reduce noise in the cached samples,
too, which enables path tracing using the fire-and-forget paradigm as required for
efficient parallel light transport simulation. This in turn fits the observation that
with an increasing number of simulated light transport paths trajectory splitting
becomes less efficient. In addition, reducing artifacts in a frame due to consistency
only requires to continue computation instead of starting over from scratch.
The averaging process defined in Eq. 1 may be iterated within a batch of light
transport paths, i.e. computing ci from ci and so on. This yields a further dramatic
428
A. Keller et al.
Fig. 3 Iterated weighted averaging very efficiently smooths the solution by relaxation at the cost
of losing some detail. Obviously, path space filtering replaces the black pixels of the input with the
weighted averages, which brightens up the image in the expected way. Model courtesy G. M. Leal
LLaaguno
speed up at the cost of some blurred illumination detail as can be seen in Fig. 3. Note
that such an iteration is consistent, too, because the radius r (n) decreases with the
number of batches.
429
Fig. 4 The effect of weighting: The top left image was rendered by a forward path tracer at 16 path
space samples per pixel. The bottom left image shows the same algorithm with path space filtering.
The improvement is easy to see in the enlarged insets. The right column illustrates the effect of the
single components of the weights. From top to bottom: Using uniform weights, the image looks
blurred and light is transported around corners. Including only samples with similar surface normals
(middle), removes a lot of blur resulting in crisp geometry. The image at the bottom right in addition
reduces texture blur by not filtering contributions with too different local throughput by the surface
reflectance properties. Finally, the bottom left result adds improvements on the shadow boundaries
by excluding contributions that have too different visibility. Model courtesy M. Dabrovic and Crytek
In situations where this evaluation is too costly or not feasible, the algorithm has
to rely on data stored during path segment generation. Such data usually includes
a color term, which is the bidirectional scattering distribution function (BSDF)
multiplied by the ratio of the cosine between surface normal and direction of
incidence and the probability density function (pdf) evaluated for the directions
of transport. For the example of cosine distributed samples on diffuse surfaces
only the diffuse albedo remains, because all other terms cancel. If a norm of the
difference of these terms in xsi + j and xi is below a threshold ( 2 < 0.05 in our
implementation), the contribution of xsi + j is included in the average. Unless the
surface is diffuse, the similarity of the directions of observation must be checked
as well to avoid incorrect in-scattering on glossy materials. Including more and
430
A. Keller et al.
more heuristics of course excludes more and more candidates, decreasing the
potential of noise reduction. In the real-time implementation of path space filtering [2], the weighted average is computed for each component resulting from a
decomposition of path space induced by the basis functions used to represent the
optical surface properties.
Blurred shadows: Given a point light source, its visibility as seen from xi and xsi + j
may be either identical or different. In order to avoid sharp shadow boundaries
to be blurred, contributions may be only included upon identical visibility. For
ambient occlusion and illumination by an environment map, blur can be reduced
by comparing the lengths of each one ray shot into the hemisphere at xi and xsi + j
by thresholding their difference.
Using only binary weights that are either zero or one, the denominator of the ratio in
Eq. 1 amounts to the number of included contributions. Although seemingly counterintuitive, using the norms to directly weight the contributions results in higher variance. This effect already has been observed in an article [1] on efficient anti-aliasing:
Having other than uniform weights, the same contribution may be weighted differently in neighboring queries, which in turn results in increased noise. In a similar
way, using kernels (for examples see [26] or kernels used in the domain of smoothed
particles hydrodynamics (SPH)) other than the characteristic function B(n) to weight
contributions by their distance to the query location xi increases the variance.
431
hand, it indicates that the initial radius needs to be selected sufficiently large in order
to include a meaningful number of contributions in the weighted averages from Eq. 1.
The initial radius r0 also may depend on the query location xi . For example, it
r 2
may be derived from the definition of the solid angle := d 20 of a disk of radius
r0 in xi perpendicular to a ray at a distance d from the ray origin. For a fixed solid
angle , the initial radius
d 2
r0 =
d
432
A. Keller et al.
edges. While this does not happen for the weighted average of path space filtering,
contrast may be reduced (see the foliage rendering in Fig. 6). In addition, so-called
fire flies that actually are rarely sampled peaks of the integrand, are attenuated by
the weighted average and therefore may look more like splotches instead of single
bright pixels. Since both progressive photon mapping and path space filtering are
consistent, all of these artifacts must be transient.
(a)
trajectory splitting
(b)
irradiance interpolation
(c)
(d)
super-sampling
Fig. 5 In order to determine the radiance in xi as seen by the long ray, a many rays are shot
into the hemisphere to sample the contributions. As this becomes too expensive due to the large
number of rays, b irradiance interpolation interpolates between cached irradiance samples that were
smoothed by trajectory splitting. c Path space filtering mimics trajectory splitting by averaging the
contributions of paths in the proximity. d Supersampling the information provided by the paths used
for path space filtering is possible by tracing additional path segments from the camera. Note that
then xi does not have an intrinsic contribution
(a)
433
(b)
ambient occlusion
(c)
shadows
(d)
(e)
complex geometry
(d)
transluscent material
Fig. 6 The split image comparisons show how path space filtering can remove substantial amounts
of noise in various example settings. Models courtesy S. Laine, cgtrader, Laubwerk, Stanford
Computer Graphics Laboratory, and G.M. Leal LLaaguno
(see Fig. 6f), multiple views of a scene, rendering for light field displays, or an
animation of a static scene can greatly benefit as vertices can be shared among all
frames to be rendered.
Motion blur: Identical to [11], the consistent simulation of motion blur may be realized by averaging images at distinct points in time. As an alternative, extending
the range search to include proximity in time allows for averaging across vertices with different points in time. In cases where linear motion is a sufficient
approximation and storing linear motion vectors is affordable, reconstructing the
visibility as introduced in [20, 21] may improve the speed of convergence.
Spectral rendering: The consistent simulation of spectral light transport may be realized by averaging monochromatic contributions ci associated to a wavelength i .
The projection onto a suitable color system may happen during the averaging
process, where the suitable basis functions are multiplied as factors to the weights.
One example of such a set of basis functions are the CIE XYZ response curves.
434
A. Keller et al.
Fig. 7 The left image shows a fence rendered with one light transport path per pixel. The image on
the right shows the result of anti-aliasing by path space filtering using the paths from the left image
and an additional three camera paths per pixel. Model courtesy Chris Wyman
4 Conclusion
Path space filtering is simple to implement on top of any sampling-based rendering
algorithm and has low overhead. The progressive algorithm efficiently reduces variance and is guaranteed to converge without persistent artifacts due to consistency.
It will be interesting to explore the principle applied to integro-approximation problems other than computer graphics and to investigate how the method fits into the
context of multilevel Monte Carlo methods.
References
1. Ernst, M., Stamminger, M., Greiner, G.: Filter importance sampling. In: Proceedings of the
IEEE/EG Symposium on Interactive Ray Tracing, pp. 125132 (2006)
435
2. Gautron, P., Droske, M., Wchter, C., Kettner, L., Keller, A., Binder, N., Dahm, K.: Path space
similarity determined by Fourier histogram descriptors. In: ACM SIGGRAPH 2014 Talks,
SIGGRAPH14, pp. 39:139:1. ACM (2014)
3. Georgiev, I., Krivnek, J., Davidovic, T., Slusallek, P.: Light transport simulation with vertex
connection and merging. ACM Trans. Graph. (TOG) 31(6), 192:1192:10 (2012)
4. Hachisuka, T., Jensen, H.: Stochastic progressive photon mapping. In: SIGGRAPH Asia09:
ACM SIGGRAPH Asia Papers, pp. 18. ACM (2009)
5. Hachisuka, T., Ogaki, S., Jensen, H.: Progressive photon mapping. ACM Trans. Graph. 27(5),
130:1130:8 (2008)
6. Hachisuka, T., Pantaleoni, J., Jensen, H.: A path space extension for robust light transport
simulation. ACM Trans. Graph. (TOG) 31(6), 191:1191:10 (2012)
7. Jensen, H.: Realistic Image Synthesis Using Photon Mapping. AK Peters, Natick (2001)
8. Jensen, H., Buhler, J.: A rapid hierarchical rendering technique for translucent materials. ACM
Trans. Graph. 21(3), 576581 (2002)
9. Keller, A.: Instant radiosity. In: SIGGRAPH97: Proceedings of the 24th Annual Conference
on Computer Graphics and Interactive Techniques, pp. 4956 (1997)
10. Keller, A.: Quasi-Monte Carlo Methods for Photorealistic Image Synthesis. Ph.D. thesis, University of Kaiserslautern, Germany (1998)
11. Keller, A.: Quasi-Monte Carlo image synthesis in a nutshell. In: Dick, J., Kuo, F., Peters, G.,
Sloan, I. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 203238. Springer,
Heidelberg (2013)
12. Keller, A., Binder, N.: Deterministic consistent density estimation for light transport simulation.
In: Dick, J., Kuo, F., Peters, G., Sloan, I. (eds.) Monte Carlo and Quasi-Monte Carlo Methods
2012, pp. 467480. Springer, Heidelberg (2013)
13. Keller, A., Droske, M., Grnschlo, L., Seibert, D.: A divide-and-conquer algorithm for simultaneous photon map queries. Poster at High-Performance Graphics
in Vancouver. http://www.highperformancegraphics.org/previous/www_2011/media/Posters/
HPG2011_Posters_Keller1_abstract.pdf (2011)
14. Knauer, E., Brz, J., Mller, S.: A hybrid approach to interactive global illumination and soft
shadows. Vis. Comput.: Int. J. Comput. Graph. 26(68), 565574 (2010)
15. Kollig, T., Keller, A.: Efficient bidirectional path tracing by randomized quasi-Monte Carlo
integration. In: Niederreiter, H., Fang, K., Hickernell, F. (eds.) Monte Carlo and Quasi-Monte
Carlo Methods 2000, pp. 290305. Springer, Berlin (2002)
16. Kontkanen, J., Rsnen, J., Keller, A.: Irradiance filtering for Monte Carlo ray tracing. In: Talay,
D., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2004, pp. 259272.
Springer, Berlin (2004)
17. Krivnek, J.: Radiance caching for global illumination computation on glossy surfaces. Ph.D.
thesis, Universit de Rennes 1 and Czech Technical University in Prague (2005)
18. Krivnek, J., Gautron, P.: Practical Global Illumination with Irradiance Caching. Synthesis
lectures in computer graphics and animation. Morgan & Claypool, San Rafael (2009)
19. Lafortune, E.: Mathematical Models and Monte Carlo Algorithms for Physically Based Rendering. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium (1996)
20. Lehtinen, J., Aila, T., Chen, J., Laine, S., Durand, F.: Temporal light field reconstruction for
rendering distribution effects. ACM Trans. Graph. 30(4), 55:155:12 (2011)
21. Lehtinen, J., Aila, T., Laine, S., Durand, F.: Reconstructing the indirect light field for global
illumination. ACM Trans. Graph. 31(4), 51 (2012)
22. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM,
Philadelphia (1992)
23. Powell, M., Swann, J.: Weighted uniform sampling - a Monte Carlo technique for reducing
variance. IMA J. Appl. Math. 2(3), 228236 (1966)
24. Schwenk, K.: Filtering techniques for low-noise previews of interactive stochastic ray tracing.
Ph.D. thesis, Technische Universitt Darmstadt (2013)
25. Shepard, D.: A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of the 23rd ACM National Conference, pp. 517524. ACM (1968)
436
A. Keller et al.
26. Silverman, B.: Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC,
London (1986)
27. Spanier, J., Maize, E.: Quasi-random methods for estimating integrals using relatively small
samples. SIAM Rev. 36(1), 1844 (1994)
28. Suykens, F., Willems, Y.: Adaptive filtering for progressive Monte Carlo image rendering. In:
WSCG 2000 Conference Proceedings (2000)
29. Teschner, M., Heidelberger, B., Mller, M., Pomeranets, D., Gross, M.: Optimized spatial
hashing for collision detection of deformable objects. In: Proceedings of VMV03, pp. 4754.
Munich, Germany (2003)
30. Veach, E.: Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. thesis, Stanford
University (1997)
31. Wald, I., Kollig, T., Benthin, C., Keller, A., Slusallek, P.: Interactive global illumination using
fast ray tracing. In: Debevec, P., Gibson, S. (eds.) Rendering Techniques (Proceedings of the
13th Eurographics Workshop on Rendering), pp. 1524 (2002)
32. Ward, G., Rubinstein, F., Clear, R.: A ray tracing solution for diffuse interreflection. Comput.
Graph. 22, 8590 (1988)
33. Wyman, C., Sloan, P., Shirley, P.: Simple analytic approximations to the CIE XYZ color matching functions. J. Comput. Graph. Tech. (JCGT) 2, 111 (2013). http://jcgt.org/published/0002/
02/01/
1 Introduction
In this paper we study multivariate integration Is (f ) = [0,1]s f (x) dx in reproducing
kernel Hilbert spaces H (K) of functions f : [0, 1]s R, equipped with the norm
H (K) , where K denotes the reproducing kernel. We refer to Aronszajn [1] for
an introduction to the theory of reproducing kernel Hilbert spaces. Without loss of
generality, see, e.g., [19, 23], we can restrict ourselves to approximating Is (f ) by
means of linear algorithms QN,s of the form
QN,s (f , P) :=
N1
qk f (xk ),
k=0
437
438
sup
f H (K)
f H (K) 1
Is (f ) QN,s (f , P) .
where the infimum is extended over all N-element point sets in [0, 1)s . Additionally,
the initial error e(0, s) is defined as the worst-case error of the zero algorithm,
e(0, s) =
sup
f H (K)
f H (K) 1
|Is (f )|
and is used as a reference value. In this paper we are interested in the dependence
of the worst-case error on the dimension s. To study this dependence systematically
we consider the so-called information complexity defined as
Nmin (, s) = min{N N0 : e(N, s) e(0, s)},
which is the minimal number of points required to reduce the initial error by a factor
of , where > 0.
We would like to avoid cases where the information complexity Nmin (, s) grows
exponentially or even faster with the dimension s or with 1 . To quantify the behavior
of the information complexity we use the following notions of tractability.
We say that the integration problem in H (K) is
Nmin (,s)
weakly QMC-tractable, if lims+1 logs+
= 0;
1
polynomially QMC-tractable, if there exist non-negative numbers c, p, and q such
that Nmin (, s) csq p ;
strongly polynomially QMC-tractable, if there exist non-negative numbers c and
p such that Nmin (, s) cp .
439
tractability theory is summarized in the three volumes of the book of Novak and
Wozniakowski [1921] which we refer to for extensive information on this subject
and further references. Most of these investigations have in common that reproducing
kernel Hilbert spaces are tensor products of one-dimensional spaces whose kernels
are all of the same type (but maybe equipped with different weights). In this paper
we consider the case where the reproducing kernel is a tensor product of spaces
with kernels of different type. We call such spaces hybrid spaces. Some results on
tractability in general hybrid spaces can be found in the literature. For example, in
[20] multivariate integration is studied for arbitrary reproducing kernels Kd without
relation to Kd+1 . Here we consider as a special instance the tensor product of Walsh
and Korobov spaces. As far as we are aware of, this specific problem has not been
studied in the literature so far. This paper is a first attempt in this direction.
In particular, we consider the tensor product of an s1 -dimensional weighted Walsh
space and an s2 -dimensional weighted Korobov space (the exact definitions will be
given in the next section). The study of such spaces could be important in view of the
integration of functions which are periodic with respect to some of the components
and, for example, piecewise constant with respect to the remaining components.
Moreover, it has been pointed out by several scientists (see, e.g., [11, 17]) that
hybrid integration problems may be relevant for certain integration problems in
applications. Indeed, communication with the authors of [11] and [17] have motivated
our idea for considering function spaces, where we may have very different properties
of the integrands with respect to different components, as for example regarding
smoothness.
From the analytical point of view, it is very challenging to deal with integration
in hybrid spaces. The reason for this is the rather complex interplay between the
different analytic and algebraic structures of the kernel functions. In the present study
we are concerned with Fourier analysis carried out simultaneously with respect to
the Walsh and the trigonometric function system. The problem is also closely related
to the study of hybrid point sets which received much attention in recent times (see,
for example, [5, 6, 810, 1315]).
The paper is organized as follows. In Sect. 2 we introduce the Hilbert space under
consideration in this paper. The main result states necessary and sufficient conditions
for various notions of tractability and is stated in Sect. 3. In Sect. 4 we prove the
necessary conditions and in Sect. 5 the sufficient ones.
440
significant nonzero digit of k, we define the kth Walsh function walk : [0, 1) C
(in base b) by
1 0 + + a+1 a
,
walk (x) := e
b
where e(v) := exp(2 iv). For dimension s 2 and vectors k = (k1 , . . . , ks ) Ns0
and x = (x1 , . .
. , xs ) [0, 1)s we define the kth Walsh function walk : [0, 1)s C
by walk (x) := sj=1 walkj (xj ).
Furthermore, for l Zs and y Rs we define the lth trigonometric function by
el (y) := e(l y), where denotes the usual dot product.
We define two functions r (1) , r (2) : let > 1 and > 0 be reals and let = (j )j1
be a sequence of positive reals.
For integer b 2, and k N0 let
(1)
r,
(k)
:=
1
if k = 0,
logb k
b
if k = 0.
(1)
(1)
For k = (k1 , . . . , ks ) Ns0 let r,
(k) := sj=1 r,
(kj ). Even though the parameter
j
(1)
b occurs in the definition of r, , we do not explicitly include it in our notation as
the choice of b will usually be clear from the context.
For l Z let
1
if l = 0,
(2)
(l) :=
r,
|l|
if l = 0.
(2)
For l = (l1 , . . . , ls ) Zs let r,
(l) :=
s
(2)
j=1 r,j (lj ).
441
for (x, y), (x , y ) [0, 1]s1 +s2 (to be more precise, we should write x, x [0, 1]s1
and y, y [0, 1]s2 ; from now on, when using the notation (x, y) [0, 1]s1 +s2 , we
shall always tacitly assume that x [0, 1]s1 and y [0, 1]s2 ).
Note that Ks,, can be written as
Kor
Ks,, ((x, y), (x , y )) = KsWal
(1) (x, x )Ks , , (2) (y, y ),
1 ,1 ,
2 2
(1)
where KsWal
(1) is the reproducing kernel of a Hilbert space based on Walsh functions.
1 ,1 ,
This space is defined as
(1) <
(k)wal
:
f
f
=
,
H (KsWal
f
(1) ) :=
wal
k
s
,
,
1
1
,
,
1 1
s1
kN0
where the
fwal (k) :=
[0,1]s1
f s1 ,1 , (1) =
1/2
s
kN01
r(1)
(1) (k)
1 ,
|
fwal (k)|2
This so-called Walsh space was first introduced and studied in [3]. The kernel
KsWal
(1) can be written as (see [3, p. 157])
1 ,1 ,
KsWal
(1) (x, x ) =
1 ,1 ,
s
r(1)
(1) (k)walk (x)walk (x )
1 ,
kN01
s1
1+
j=1
s1
j(1)
walk (xj xj )
kN
b1 logb k
(2)
(3)
j=1
where denotes digit-wise subtraction modulo b, and where the function wal,1 is
defined as in [3, p. 170], where it is also noted that 1 + j wal,1 (u, v) 0 for any
u, v as long as j 1.
Furthermore, KsKor
(2) is the reproducing kernel of a Hilbert space based on
2 ,2 ,
trigonometric functions. This second function space is defined as
(2) <
(l)e
:
f
f
=
,
H (KsKor
f
(2) ) :=
trig
l
s
,
,
2
2
,
,
2 2
s2
lZ0
442
where the
ftrig (l) :=
[0,1]s2
f s2 ,2 , (2) =
1/2
lZs2
r(2)
(2) (l)
2 ,
|
ftrig (l)|
This so-called Korobov space is studied in many papers. We refer to [20, 22] and
the references therein for further information. The kernel KsKor
(2) can be written as
2 ,2 ,
(see [22])
KsKor
(2) (y, y ) =
2 ,2 ,
r(2)
(2) (l)el (y)el (y )
2 ,
lZs2
s2
j=1
s2
1 +
el (yj yj )
j(2)
lZ\{0}
|l|2
(4)
cos(2 l(yj yj ))
(2)
1 + 2j
j=1
l 2
l=1
(5)
Note that Ks2 ,2 , (2) (y, y ) 0 as long as j(2) (2 (2 ))1 for all j 1, where is
the Riemann zeta function.
Furthermore, [1, Part I, Sect. 8, Theorem I, p. 361] implies that Ks,, is the reproKor
ducing kernel of the tensor product of the spaces H (KsWal
(1) ) and H (Ks , , (2) ),
1 ,1 ,
2 2
i.e., of the space
Kor
H (Ks,, ) = H (KsWal
(1) ) H (Ks , , (2) ).
1 ,1 ,
2 2
The elements of H (Ks,, ) are defined on [0, 1]s1 +s2 , and the space is equipped with
the norm
||f ||s,, =
s
kN01
where
f (k, l) :=
that
[0,1]s1 +s2
s
lZ02
1/2
1
(2)
r(1)
(1) (k) r , (2) (l)
1 ,
2
|
f (k, l)|2
f (x, y)walk (x)el (y) dx dy. From (1), (3) and (5) it follows
s1
s2
cos(2
l(y
y
))
j
j
(1)
(2)
.
1 + 2j
= (1 + j wal,1 (xj , xj ))
2
l
j=1
j=1
l=1
443
In particular, if j(1) 1 and j(2) (2 (2 ))1 for all j 1, then the kernel Ks,,
is nonnegative.
We study the problem of numerically integrating a function f H (Ks,, ), i.e.,
we would like to approximate
Is (f ) =
[0,1]s1
f (x, y) dx dy.
[0,1]s2
s1 +s2
We use a QMC rule based on a point set SN,s = ((xn , yn ))N1
, so we
n=0 [0, 1)
approximate Is (f ) by
N1
1
f (xn , yn ).
N n=0
Using [4, Proposition 2.11] we obtain that e(0, s1 + s2 ) = 1 for all s1 , s2 and
e2 (H (Ks,, ), SN,s ) = 1 +
N1
1
Ks,, ((xn , yn ), (xn , yn )).
N 2 n,n =0
(6)
lim
(s1 +s2 )
s1
s2
j(1) +
j(2) < .
j=1
(7)
j=1
j=1
(s1 +s2 )
j(1)
log+ s1
s2
+
j=1
j(2)
log+ s2
< ,
(8)
lim
(s1 +s2 )
s1
s2
(1)
(2)
j +
j s1 + s2 = 0.
j=1
j=1
(9)
444
The necessity of the conditions in Theorem 1 will be proven in Sect. 4 and the
sufficiency in Sect. 5. In the latter section we will see that the notions of tractability
can be achieved by using so-called hybrid point sets made of polynomial lattice point
sets and of classical lattice point sets. We will construct these by a component-bycomponent algorithm.
e2 (H (Ks,, ), SN,s ) 1 +
where () :=
b (b1)
b b
s1
s2
1
(1 + j(1) (1 )) (1 + 2j(2) (2 )),
N j=1
j=1
1
,
2 (2 )
respectively, for j 1. This imposes no loss of generality due to the fact that if we
decrease product weights, then the problem becomes easier. Under the assumption
on the weights we know from Sect. 2.2 that Ks,, is nonnegative. Now, taking only
the diagonal elements in (6), and from the representations of the kernels in (1), (3)
and (5) we obtain
1
Ks,, ((xn , yn ), (xn , yn ))
N 2 n=0
s1
s2
1
(1 + j(1) (1 )) (1 + 2j(2) (2 )) ,
= 1 +
N j=1
j=1
N1
e2 (H (Ks,, ), SN,s ) 1 +
s1
s2
1
(1)
(2)
Nmin (, s1 + s2 )
(1 + j (1 )) (1 + 2j (2 )) .
1 + 2 j=1
j=1
445
Now the two products can be analyzed in the same way as it was done in [3] and [22],
respectively. This finally leads to the necessary conditions (7) and (8) in Theorem 1.
Now assume that we have weak QMC-tractability. Then for = 1 we have
1
2
1
log(1 + j(1) (1 )) +
log(1 + 2j(2) (2 ))
+
2
j=1
j=1
and
s1
lim
j=1
log(1 + j(1) (1 )) +
s2
j=1
log(1 + 2j(2) (2 ))
s1 + s2
(s1 +s2 )
= 0.
This implies that limj j(k) = 0 for k {1, 2}. For small enough x > 0 we have
log(1 + x) cx for some c > 0. Hence, for some j1 , j2 N and s1 j1 and s2 j2
we have
s1
log(1 + j(1) (1 )) +
j=1
s2
log(1 + 2j(2) (2 ))
j=1
c1 (1 )
s1
j(1)
+ c2 2 (2 )
j=j1
s2
j(2)
j=j2
c1 (1 )
s1
(s1 +s2 )
j=j1
j(1) + c2 2 (2 )
s1 + s2
s2
j=j2
j(2)
= 0.
446
yn =
nz
1
,...,
nz
s2
for all 0 n N 1,
where {} denotes the fractional part of a number. Note that it suffices to choose
z ZNs2 , where
ZN := {z {0, 1, . . . , N 1} : gcd(z, N) = 1}.
Polynomial lattice point sets (according to Niederreiter [18]). Let Fb be the
finite field of prime order b. Furthermore let Fb [x] be the set of polynomials
over Fb , and let Fb ((x 1 )) be the field of formal Laurent series over Fb . The
latter contains the field of rational functions as a subfield. Given m N, set
Gb,m := {a Fb [x] : deg(a) < m} and define a mapping m : Fb ((x 1 )) [0, 1)
by
m
l
m
tl x
tl bl .
:=
l=z
l=max(1,z)
Let f Fb [x] with deg(f ) = m and g = (g1 , . . . , gs1 ) Fb [x]s1 . The polynomial
lattice point set (xh )hGb,m with generating vector g, consisting of bm points in
[0, 1)s1 , is defined by
h(x)g1 (x)
h(x)gs1 (x)
, . . . , m
for all h Gb,m .
xh := m
f (x)
f (x)
A QMC rule using a (polynomial) lattice point set is called (polynomial) lattice rule.
1
N2
N1
n,n =0
s2
j=1
s1
walk (xn,j xn ,j )
(1)
1 + j
j=1
1 + j(2)
b1 logb k
el (yn,j yn ,j )
,
|l|2
kN
lZ\{0}
(10)
447
We now proceed to our construction algorithm. Note that we state the algorithm
in a way such that we exclude the cases s1 = 0 or s2 = 0, as these are covered by
the results in [2] and [16]. For s N let [s] := {1, . . . , s}.
Algorithm 1 Let s1 , s2 , m N, a prime number b, and an irreducible polynomial
f Fb [x] with deg(f ) = m be given. We write N = bm .
1. For d1 = 1, choose g1 = 1 Gb,m .
2. For d2 = 1, choose z1 ZN such that e2(1,1),, (g1 , z1 ) is minimized as a function
of z1 .
3. For d1 [s1 ] and d2 [s2 ], assume that gd1 = (g1 , . . . , gd1 ) and zd2 =
(z1 , . . . , zd2 ) are given. If d1 < s1 and d2 < s2 go to either Step (3a) or (3b). If
d1 = s1 and d2 < s2 go to Step (3b). If d1 < s1 and d2 = s2 , go to Step (3a). If
d1 = s1 and d2 = s2 , the algorithm terminates.
a. Choose gd1 +1 Gb,m such that e2(d1 +1,d2 ),, ((gd1 , gd1 +1 ), zd2 ) is minimized
as a function of gd1 +1 . Increase d1 by 1 and repeat Step 3.
b. Choose gd2 +1 ZN such that e2(d1 ,d2 +1),, (gd1 , (zd2 , zd2 +1 )) is minimized as
a function of zd2 +1 . Increase d2 by 1 and repeat Step 3.
Remark 1 As pointed out in, e.g., [22] and [3], the infinite sums in (10) can be
represented in closed form, so the construction cost of Algorithm 1 is of order
O(N 3 (s1 + s2 )2 ). Of course it would be desirable to lower this cost bound. If s1 = 0
or s2 = 0 one can use the fast CBC approach based on FFT as done by Cools and
Nuyens to reduce the construction cost to O(sN log N), where s {s1 , s2 }. It is not
yet clear if these ideas also apply to the hybrid case.
Theorem 3 Let d1 [s1 ] and d2 [s2 ] be given. Then the generating vectors gd1
and zd2 constructed by Algorithm 1 satisfy
d1
d2
2
(1)
(2)
e2(d1 ,d2 ),, (gd1 , zd2 )
1 + j 2(1 )
1 + j 4 (2 ) .
N j=1
j=1
The proof of Theorem 3 is deferred to the appendix.
s1
s2
2
(1)
(2)
1 + j 2(1 )
1 + j 4 (2 ) .
e2 (N, s1 + s2 )
N j=1
j=1
448
(1 +
j=1
(1)
exp
j (1 ) =: C1 (1 , (1) ).
j(1) (1 ))
j=1
j=1
s2
j=1 (1
2
C(, )
C1 (1 , (1) )C2 (2 , (2) ) =:
.
N
N
s1
s2
1 + j(1) 2(1 )
1 + j(2) 4 (2 )
Nmin (, s1 + s2 ) 22
.
j=1
j=1
Hence
log Nmin (, s1 + s2 ) log 4 + 2 log 1 + 2(1 )
s1
j=1
j(1) + 4 (2 )
s2
j(2) ,
j=1
6 Open Questions
The findings of this paper naturally lead to the following two open problems:
Study tractability for general algorithms (not only QMC rules) and compare the
tractability conditions with the one given in Theorem 1.
From Theorem 3 we obtain a convergence rate of order O(N 1/2 ) for the worstcase error which is the same as for plain Monte Carlo. Improve this convergence
rate.
449
Acknowledgments The authors would like to thank the anonymous referees for their remarks
which helped to improve the presentation of this paper. P. Kritzer is supported by the Austrian
Science Fund (FWF), Projects P23389-N18 and F05506-26. The latter is part of the Special Research
Program Quasi-Monte Carlo Methods: Theory and Applications. F. Pillichshammer is supported
by the Austrian Science Fund (FWF) Project F5509-N26, which is part of the Special Research
Program Quasi-Monte Carlo Methods: Theory and Applications.
N1
wal
(x
(1)
x
(1))
1(2)
k1 n,1
n ,1
(1,1) (z1 ) := 2
1 + 1(1)
N n,n =0
b1 logb k1
k1 N
l1 Z\{0}
2
1 + 1(1) (1 ) .
N
N1
walk (xn,1 (1) xn ,1 (1))
1(2)
(1)
1
1 + 1
= 2
N n,n =0
b1 logb k1
(1,1) (z1 )
k1 N
|l1 |2
(N) zZ
N l1 Z\{0}
1(2) 1 + 1(1) (1 ) B ,
(11)
450
where
2i(nn )zl1 /N
1
e
1
B := 2
|l1 |2
N n=0 n =0 (N) zZ
N l1 Z\{0}
N
1 1 e2inzl1 /N
,
=
|l1 |2
N n=1 (N) zZ
N1
N1
N l1 Z\{0}
since the inner sum in the second line always has the same value. We now use [16,
Lemmas 2.1 and 2.3] and obtain B 4 (2 )N 1 , where we used that N has only
one prime factor. Hence we obtain
(1,1) (z1 )
1(2)
1 + 1(1) (1 ) 4 (2 ).
N
(12)
Combining Eqs. (11) and (12) yields the desired bound for (g1 , z1 ).
Let us now assume d1 [s1 ] and d2 [s2 ] and that we have already found
generating vectors gd1 and zd2 such that the bound in Theorem 3 is satisfied.
In what follows, we are going to distinguish two cases: In the first case, we assume
that d1 < s1 and add a component gd1 +1 to gd1 , and in the second case, we assume
that d2 < s2 and add a component zd2 +1 to zd2 . In both cases, we will show that the
corresponding bounds on the squared worst-case errors hold.
Let us first consider the case where we start from (gd1 , zd2 ) and add, by Algorithm
1, a component gd1 +1 to gd1 . According to Eq. (10), we have
e2(d1 +1,d2 ),, ((gd1 , gd1 +1 ), zd2 ) = e2(d1 ,d2 ),, (gd1 , zd2 ) + (d1 +1,d2 ) (gd1 +1 ),
where
(d1 +1,d2 ) (gd1 +1 )
d1
N1
d(1)
wal
(x
(g
)
x
(g
))
k n,j j
n ,j j
+1
:= 1 2
1 + j(1)
N n,n =0 j=1
b1 logb k
kN
d2
e
(y
(z
)
y
(z
))
l n,j j
n ,j j
1 + j(2)
2
|l|
j=1
lZ\{0}
walk (xn,d
1 +1
kN
451
d1
d2
2
(1)
1 + j 2(1 )
1 + j(2) 4 (2 ) .
N j=1
j=1
(13)
d
d2
1
1 + j(1) (1 )
1 + j(2) 2 (2 ) C ,
j=1
j=1
where
walk (xn,d1 +1 (g) xn ,d1 +1 (g))
1
1
C := 2
N n,n =0 N gG
b1 logb k
b,m kN
N1 N1
1 1 walk (xnn ,d1 +1 (g))
= 2
N n=0 n =0 N gG
b1 logb k
b,m kN
N1
1 1 walk (xn,d1 +1 (g))
=
,
N n=0 N gG
b1 logb k
b,m kN
N1
N n=1 N gG
b1 logb k
kN
b,m kN
N1
(1 )
1 1 walk (xn,d1 +1 (g))
=
+
.
N
N n=1 N gG
b1 logb k
b,m kN
452
b,m
kN
N gG
b1 logb k
kN
k0(N)
kN
k0(N)
b,m
b,m
=:C,n,1 + C,n,2 .
By results in [2],
C,n,1 =
kN
k0(N)
1
b1 logb k
(1 )
(1 )
.
m
b
N
Furthermore,
C,n,2 =
kN
k0(N)
b1 logb k
1
walk (xn,d1 +1 (g))
N gG
b,m
b 1
g
1
,
wal
k
b1 logb k
N g=0
bm
m
kN
k0(N)
gGb,m
gGb,m
gGb,m
n(x)g(x)
walk m
f (x)
m
b
1
g
g(x)
=
walk m
walk m ,
f (x)
b
g=0
453
d1 +1
d2
2
(1)
1 + 2j (1 )
1 + j(2) 4 (2 ) .
N j=1
j=1
The case where we start from (gd1 , zd2 ) and add, by Algorithm 1, a component
zd2 +1 to zd2 can be shown by a similar reasoning. We just sketch the basic points:
According to Eq. (10), we have
e2(d1 ,d2 +1),, (gd1 , (zd2 , zd2 +1 )) = e2(d1 ,d2 ),, (gd1 , zd2 ) + (d1 ,d2 +1) (zd2 +1 ),
where e2(d1 ,d2 ),, (gd1 , zd2 ) satisfies (13) and where
d1
d2
1 + j(1) (1 )
1 + j(2) 2 (2 ) D ,
(d1 ,d2 +1) (zd2 +1 ) d(1)
2 +1
j=1
j=1
with
N1
1 1 e2inzl/N 4 (2 )
D =
,
|l|2
N n=0 (N) zZ
N
N lZ\{0}
according to [16, Lemmas 2.1 and 2.3]. This implies
(d1 ,d2 +1) (zd2 +1 )
d1
4 (2 )
d(1)
2 +1
1 + j(1) (1 )
j=1
d2
1 + j(2) 2 (2 ) .
j=1
References
1. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337404 (1950)
2. Dick, J., Kuo, F.Y., Pillichshammer, F., Sloan, I.H.: Construction algorithms for polynomial
lattice rules for multivariate integration. Math. Comput. 74, 18951921 (2005)
3. Dick, J., Pillichshammer, F.: Multivariate integration in weighted Hilbert spaces based on Walsh
functions and weighted Sobolev spaces. J. Complex. 21, 149195 (2005)
4. Dick, J., Pillichshammer, F.: Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte
Carlo Integration. Cambridge University Press, Cambridge (2010)
5. Hellekalek, P.: Hybrid function systems in the theory of uniform distribution of sequences. In:
Plaskota, L., Wozniakowski, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2010, pp.
435449. Springer, Berlin (2012)
6. Hellekalek, P., Kritzer, P.: On the diaphony of some finite hybrid point sets. Acta Arithmetica
156, 257282 (2012)
454
1 Introduction
Global sensitivity analysis (GSA) is the study of how the uncertainty in the model
output is apportioned to the uncertainty in model inputs [9, 14]. GSA can provide
valuable information regarding the dependence of the model output to its input parameters. The variance-based method of global sensitivity indices developed by Sobol
[11] became very popular among practitioners due to its efficiency and easiness of
S. Kucherenko (B) S. Song
Imperial College London, SW7 2AZ, London, UK
e-mail: s.kucherenko@imperial.ac.uk
S. Song
e-mail: shufangsong@nwpu.edu.cn
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_23
455
456
interpretation. There are two types of Sobol sensitivity indices: the main effect
indices, which estimate the individual contribution of each input parameter to the
output variance, and the total sensitivity indices, which measure the total contribution
of a single input factor or a group of inputs [3]. The total sensitivity indices are used
to identify non-important variables which can then be fixed at their nominal values
to reduce model complexity [9]. For high-dimensional models the direct application
of variance-based GSA measures can be extremely time-consuming and impractical.
A number of alternative SA techniques have been proposed. In this paper we
present derivative based global sensitivity measures (DGSM) and their link with
Sobol sensitivity indices. DGSM are based on averaging local derivatives using
Monte Carlo or Quasi Monte Carlo sampling methods. These measures were briefly
introduced by Sobol and Gershman in [12]. Kucherenko et al. [6] introduced some
other derivative-based global sensitivity measures (DGSM) and coined the acronym
DGSM. They showed that the computational cost of numerical evaluation of DGSM
can be much lower than that for estimation of Sobol sensitivity indices which later
was confirmed in other works [5]. DGSM can be seen as a generalization and formalization of the Morris importance measure also known as elementary effects [8].
Sobol and Kucherenko [15] proved theoretically that there is a link between DGSM
and the Sobol total sensitivity index Sitot for the same input. They showed that DGSM
can be used as an upper bound on total sensitivity index Sitot . They also introduced
modified DGSM which can be used for both a single input and groups of inputs [16].
Such measures can be applied for problems with a high number of input variables
to reduce the computational time. Lamboni et al. [7] extended results of Sobol and
Kucherenko for models with input variables belonging to the class of Boltzmann
probability measures.
The numerical efficiency of the DGSM method can be improved by using the automatic differentiation algorithm for calculation DGSM as was shown in [5]. However,
the number of required function evaluations still remains to be proportional to the
number of inputs. This dependence can be greatly reduced using an approach based
on algorithmic differentiation in the adjoint or reverse mode [1]. It allows estimating all derivatives at a cost at most 46 times of that for evaluating the original
function [4].
This paper is organised as follows: Sect. 2 presents Sobol global sensitivity
indices. DGSM and lower and upper bounds on total Sobol sensitivity indices
for uniformly distributed variables and random variables are presented in Sects. 3
and 4, respectively. In Sect. 5 we consider test cases which illustrate an application
of DGSM and their links with total Sobol sensitivity indices. Finally, conclusions
are presented in Sect. 6.
f (x) = f 0 +
d
f i (xi ) +
i=1
where f 0 =
Hd
d
d
457
(1)
i=1 j>i
(2)
Hd
are satisfied for all different groups of indices x1 , , xs such that 1 i 1 < i 2 < ...
< i s n. These conditions guarantee that all terms in (1) are mutually orthogonal
with respect to integration.
The variances of the terms in the ANOVA decomposition add up to the total
variance:
n
n
2
2
f (x)dx f 0 =
Di1 ...is ,
D=
Hd
s=1 i 1 <<i s
where Di1 ...is = H d f i21 ...is (xi1 , ..., xis )d xi1 , ..., xis are called partial variances.
Total partial variances account for the total influence of the factor xi :
Di1 ...is ,
Ditot =
<i>
<i>
fying condition 1 i 1 < i 2 < ... < i s d, 1 s d, where one of the indices is
equal to i. The corresponding total sensitivity index is defined as
Sitot = Ditot D.
Denote u i (x) the sum of all terms in ANOVA decomposition (1) that depend on xi :
u i (x) = f i (xi ) +
d
j=1, j=i
(3)
458
Hd
u i2 (x)dx
Hd
Denote z = (x1 , ..., xi1 , xi+1 , ..., xd ) the vector of all variables but xi , then
x (xi , z) and f (x) f (xi , z). The ANOVA decomposition of f (x) in (1) can be
presented in the following form
f (x) = u i (xi , z) + v(z),
where v(z) is the sum
of terms independent of xi . Because of (2) and (3) it is easy to
show that v(z) = H d f (x)d xi . Hence
u i (xi , z) = f (x)
f (x)d xi .
(4)
Hd
(5)
We note that in the case of independent random variables all definitions of the
ANOVA decomposition remain to be correct but all derivations should be considered
in probabilistic sense as shown in [14] and presented in Sect. 4.
(6)
459
i =
Hd
f (x)
xi
2
dx.
(8)
(9)
Hd
f (x)
xi (1 xi )
xi
2
dx.
(10)
2
We note that i is in fact the mean value of f xi . We also note that
u i
f
=
.
xi
xi
(11)
Hd
2
< Sitot .
(12)
u i (x)
u i (x)
dx.
xi
(13)
u i (x)
u i (x)
dx
xi
Hd
u i2 (x)dx
Hd
u i (x)
xi
2
dx.
(14)
It is easy to prove that the left and right parts of this inequality cannot be equal.
should be linearly dependent.
Indeed, for them to be equal functions u i (x) and uix(x)
i
For simplicity consider a one-dimensional case: x [0, 1]. Lets assume
u(x)
= Au(x),
x
460
where A is a constant. The general solution to this equation u(x) = B exp(Ax), where
B is a constant. It is easy to see that this solution is not consistent with condition (3)
which should
be imposed on function u(x).
dx can be transformed as
Integral H d u i (x) uix(x)
i
1 u i2 (x)
u i (x) uix(x)
dx =
dx
d
i
2 H xi
1
2
2
=
d1 u i (1, z) u i (0, z) dz
2 H
1
=
d1 (u i (1, z) u i (0, z)) (u i (1, z) + u i (0, z)) dz
2 H
1
=
d ( f (1, z) f (0, z)) ( f (1, z) + f (0, z) 2v(z)) dz.
2 H
Hd
(15)
All terms in the last integrand are independent of xi , hence we can replace integration with respect to dz to integration with respect to dx and substitute v(z) for
f (x) in the integrand due to condition (3). Then (15) can be presented as
Hd
u i (x)
u i (x)
1
dx =
xi
2
Hd
Hd
Hd
2
< Sitot
(17)
xim u i (x)dx.
(18)
2
Hd
xim u i (x)dx
Hd
xi2m dx
Hd
u i2 (x)dx.
(19)
461
It is easy to see that equality in (19) cannot be attained. For this to happen functions
u i (x) and xim should be linearly dependent. For simplicity consider a one-dimensional
case: x [0, 1]. Lets assume
u(x) = Ax m ,
where A = 0 is a constant. This solution does not satisfy condition (3) which should
be imposed on function u(x).
Further we use the following transformation:
Hd
xim+1 u i (x)
u i (x)
dx = (m + 1)
xim u i (x)dx +
xim+1
dx
xi
xi
Hd
Hd
(20)
We notice that
Hd
xi2m dx =
1
.
(2m + 1)
(21)
(2m + 1)
Hd
2
< Sitot .
(m + 1)2 D
(22)
(2m + 1)
Hd
+1)
2
(23)
(24)
462
We note that both lower and upper bounds can be estimated by a set of derivative
based measures:
i = {i , wi(m) }, m > 0.
(25)
i
.
2D
(26)
i
,
D
(27)
u2d x
ud x
0
1
2
x(1 x)u 2 d x.
(28)
f (x)
,
xi
i = 1, ..., d. Evaluation of
can be done analytically for explicitly given easilydifferentiable functions or numerically.
In the case of straightforward numerical estimations of all partial derivatives and
computation of integrals using MC or QMC methods, the number of required function
evaluations for a set of all input variables is equal to N (d + 1), where N is a number
of sampled points. Computing LB1 also requires values of f (0, z) , f (1, z), while
computing LB2 requires only values of f (1, z). In total, numerical computation of
463
i =
f (x)
xi
Rd
2
d F(x).
(29)
Rd
f (x)
d F(x).
xi
(30)
i2 wi2
Sitot .
D
Proof Consider
obtain
Rd
Rd
(31)
xi u i (x)d F(x)
Rd
xi2 d F(x)
Rd
u i2 (x)d F(x).
(32)
464
Equality in (32) can be attained if functions u i (x) and xi are linearly dependent. For
simplicity consider a one-dimensional case. Lets assume
u(x) = A(x ),
where A = 0 is a constant. This solution satisfies condition (3) for normally distributed variable x with the mean value : R d u(x)d F(x) = 0.
For normally distributed variables the following equality is true [2]:
2
Rd
xi u i (x)d F(x)
=
Rd
xi2 d F(x)
Rd
u i (x)
d F(x).
xi
(33)
By definition R d xi2 d F(x) = i2 . Using (32) and (33) and dividing the resulting
inequality by D we obtain the lower bound (31).
(34)
i2
i .
D
(35)
5 Test Cases
In this section we present the results of analytical and numerical estimation of Si ,
Sitot , LB1, LB2 and UB1, UB2. The analytical values for DGSM and Sitot were calculated and compared with numerical results. For text case 2 we present convergence
plots in the form of root mean square error (RMSE) versus the number of sampled
465
points N . To reduce the scatter in the error estimation the values of RMSE were
averaged over K = 25 independent runs:
i =
K
I0 2
1 Ii,k
K k=1
I0
21
.
Here Ii can be either numerically computed Sitot , LB1, LB2 or UB1, UB2, I0 is
the corresponding analytical value of Sitot , LB1, LB2 or UB1, UB2. The RMSE can
be approximated by a trend line cN . Values of () are given in brackets on the
plots. QMC integration based on Sobol sequences was used in all numerical tests.
Example 1 Consider a linear with respect to xi function:
f (x) = a(z)xi + b(z).
1
2
2
For this function Si = Sitot , Ditot = 12
d1 a (z)dz, i =
H
H d1 a (z)dz, L B1 =
2
2
2
2
2
a
(z)2a
(z)x
dzd
x
(2m+1)m
a(z)dz
( H d1
) . A maximum value
( Hd (
i)
i)
= 0 and (m) =
4(m+2)2 (m+1)2 D
4D d1 a 2 (z)dz
H
0.0401
(m) is attained at m =3.745, when (m )
=
D
2
a(z)dz . The lower and upper bounds are L B 0.48Sitot . U B1 1.22Sitot .
1
1
tot
2
U B2 = 12D
0 a(z) dz = Si . For this test function UB2 < UB1.
of
Example 2 Consider the so-called g-function which is often used in GSA for illustration purposes:
f (x) =
d
gi ,
i=1
where gi =
|4xi 2|+ai
1+ai
d
total variance is D = 1 +
1+
j=1
1/3
(1+a j )2
j=1, j=i
Table 1 The analytical expressions for Si , Sitot and LB2 for g-function
Si
1/3
(1 + ai )2 D
Sitot
1/3
(1+ai )2
d
j=1, j=i
1+
1/3
(1+a j )2
(m)
(2m + 1) 1
2
4 1(1/2)m+1
m+2
(1 + ai )2 (m + 1)2 D
466
(m)
0.0772
By solving equation ddm
= 0, we find that m = 9.64, (m ) = (1+a
.
2
i) D
depend
on
a
,i
=
1,
2,
...,
d
and
d.
In the
It is interesting to note that m does not
i
Si
0.257
1
tot
, UB1 and
all i, S(mtot ) (4/3)
d1 , S tot (4/3)d1 . The analytical expression for Si
i
i
UB2 are given in Table 2.
2
2
Sitot
Sitot
For this test function UB1
= 48 , UB2
= 41 , hence UB2
= 12 < 1. Values of Si ,
UB1
Sitot , UB and LB2 for the case of a = [0, 1, 4.5, 9, 99, 99, 99, 99], d = 8 are given
in Table 3 and shown in Fig. 1. We can conclude that for this test the knowledge of
LB2 and UB1, UB2 allows to rank correctly all the variables in the order of their
importance.
Figure 2 presents RMSE of numerical estimations of Sitot , UB1 and LB2. For an
individual input LB2 has the highest convergence rate, following by Sitot , and UB1
in terms of the number of sampled points. However, we recall that computation of
all indices requires N FL B = N (3d + 1) function evaluations for LB, while for Sitot
this number is N FS = N (d + 1) and for UB it is also N FU B = N (d + 1).
4
n
Example 3 Hartmann function f (x) = ci exp
i j (x j pi j )2 , xi
i=1
j=1
[0, 1]. For this test case a relationship between the values LB1, LB2 and Si varies
with the change of input (Table 4, Fig. 3): for variables x2 and x6 LB1> Si > LB2,
while for all other variables LB1< LB2 <Si . LB* is much smaller than Sitot for all
inputs. Values of m* vary with the change of input. For all variables but variable 2
UB1 > UB2.
Table 2 The analytical expressions for Sitot UB1 and UB2 for g-function
Sitot
1/3
(1+ai )2
d
j=1, j=i
1+
1/3
(1+a j )2
U B1
16
d
j=1, j=i
1+
1/3
(1+a j )2
(1 + ai )2 2 D
U B2
d
4
j=1, j=i
1+
1/3
(1+a j )2
3(1 + ai )2 D
Table 3 Values of LB*, Si , Sitot , UB1 and UB1. Example 2, a = [0, 1, 4.5, 9, 99, 99, 99, 99],
d =8
x1
x2
x3
x4
x5 ...x8
L B
Si
Sitot
U B1
U B2
0.166
0.716
0.788
3.828
3.149
0.0416
0.179
0.242
1.178
0.969
0.00549
0.0237
0.0343
0.167
0.137
0.00166
0.00720
0.0105
0.0509
0.0418
0.000017
0.0000716
0.000105
0.000501
0.00042
467
Si
tot
Si
UB
LB
log (RMSE)
log (N)
2
Fig. 1 Values of Si , Sitot , LB2 and UB1 for all input variables. Example 2, a = [0, 1, 4.5,
9, 99, 99, 99, 99], d = 8
(b) 4
UB1(0.962)
LB2(1.134)
4
6
8
10
(c) 14
15
tot
Si (0.953)
log2 (RMSE)
tot
Si (0.977)
log 2(RMSE)
log2 (RMSE)
(a) 0
UB1(0.844)
LB2(1.048)
8
10
12
14
12
16
4
10
11
12
10
11
12
tot
Si (0.993)
16
17
18
19
20
21
22
23
UB1(0.894)
LB2(0.836)
10
11 12
log (N)
log2(N)
log 2(N)
Fig. 2 RMSE of Sitot , UB and LB2 versus the number of sampled points. Example 2, a = [0, 1, 4.5,
9, 99, 99, 99, 99], d = 8. Variable 1 (a), variable 3 (b) and variable 5 (c)
Table 4 Values of m , LB1, LB2, UB1, UB2, Si and Sitot for all input variables
L B1
L B2
m
L B
Si
Sitot
U B1
U B2
x1
x2
x3
x4
x5
x6
0.0044
0.0515
4.6
0.0515
0.115
0.344
1.089
1.051
0.0080
0.0013
10.2
0.0080
0.00699
0.398
0.540
0.550
0.0009
0.0011
17.0
0.0011
0.00715
0.0515
0.196
0.150
0.0029
0.0418
5.5
0.0418
0.0888
0.381
1.088
0.959
0.0014
0.0390
3.6
0.0390
0.109
0.297
1.073
0.932
0.0357
0.0009
19.9
0.0357
0.0139
0.482
1.046
0.899
468
i
tot
Si
UB
LB1
LB2
log2(RMSE)
0.5
1
1.5
2
2.5
3
3.5
log2(N)
Fig. 3 Values of Si , Sitot , UB1, LB1 and LB2 for all input variables. Example 3
6 Conclusions
We can conclude that using lower and upper bounds based on DGSM it is possible
in most cases to get a good practical estimation of the values of Sitot at a fraction of
the CPU cost for estimating Sitot . Small values of upper bounds imply small values
of Sitot . DGSM can be used for fixing unimportant variables and subsequent model
reduction. For linear function and product function, DGSM can give the same variable
ranking as Sitot . In a general case variable ranking can be different for DGSM and
variance based methods. Upper and lower bounds can be estimated using MC/QMC
integration methods using the same set of partial derivative values. Partial derivatives
can be efficiently estimated using algorithmic differentiation in the reverse (adjoint)
mode.
We note that all bounds should be computed with sufficient accuracy. Standard
techniques for monitoring convergence and accuracy of MC/QMC estimates should
be applied to avoid erroneous results.
Acknowledgments The authors would like to thank Prof. I. Sobol his invaluable contributions
to this work. Authors also gratefully acknowledge the financial support by the EPSRC grant
EP/H03126X/1.
469
References
1. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic
Differentiation. SIAM Philadelphia, Philadelphia (2008)
2. Hardy, G.H., Littlewood, J.E., Polya, G.: Inequalities, 2nd edn. Cambridge University Press,
Cambridge (1973)
3. Homma, T., Saltelli, A.: Importance measures in global sensitivity analysis of model output.
Reliab. Eng. Syst. Saf. 52(1), 117 (1996)
4. Jansen, K., Leovey, H., Nube, A., Griewank, A., Mueller-Preussker, M.: A first look at quasiMonte Carlo for lattice field theory problems. Comput. Phys. Commun. 185, 948959 (2014)
5. Kiparissides, A., Kucherenko, S., Mantalaris, A., Pistikopoulos, E.N.: Global sensitivity analysis challenges in biological systems modeling. J. Ind. Eng. Chem. Res. 48(15), 71687180
(2009)
6. Kucherenko, S., Rodriguez-Fernandez, M., Pantelides, C., Shah, N.: Monte Carlo evaluation of
derivative based global sensitivity measures. Reliab. Eng. Syst. Saf. 94(7), 11351148 (2009)
7. Lamboni, M., Iooss, B., Popelin, A.L., Gamboa, F.: Derivative based global sensitivity measures: general links with Sobols indices and numerical tests. Math. Comput. Simul. 87, 4554
(2013)
8. Morris, M.D.: Factorial sampling plans for preliminary computational experiments. Technometrics 33, 161174 (1991)
9. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M.,
Tarantola, S.: Global Sensitivity Analysis: The Primer. Wiley, New York (2008)
10. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S.: Variance based
sensitivity analysis of model output: design and estimator for the total sensitivity index. Comput.
Phys. Commun. 181(2), 259270 (2010)
11. I.M. Sobol Sensitivity estimates for nonlinear mathematical models. Matem. Modelirovanie ,
2: 112-118, 1990 (in Russian). English translation: Math. Modelling and Comput. Experiment,
1(4):407414, 1993
12. Sobol, I.M., Gershman, A.: On an altenative global sensitivity estimators. Proc SAMO, Belgirate 1995, 4042 (1995)
13. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte
Carlo estimates. Math. Comput. Simul. 55(13), 271280 (2001)
14. Sobol, I.M., Kucherenko, S.: Global sensitivity indices for nonlinear mathematical models.
Rev. Wilmott Mag. 1, 5661 (2005)
15. Sobol, I.M., Kucherenko, S.: Derivative based global sensitivity measures and their link with
global sensitivity indices. Math. Comput. Simul. 79(10), 30093017 (2009)
16. Sobol, I.M., Kucherenko, S.: A new derivative based importance criterion for groups of variables and its link with the global sensitivity indices. Comput. Phys. Commun. 181(7), 1212
1217 (2010)
Abstract We are interested in lower bounds for the approximation of linear operators
between Banach spaces with algorithms that may use at most n arbitrary linear
functionals as information. Lower error bounds for deterministic algorithms can
easily be found by Bernstein widths; for mappings between Hilbert spaces it is already
known how Bernstein widths (which are the singular values in that case) provide
lower bounds for Monte Carlo methods. Here, a similar connection between Bernstein
numbers and lower bounds is shown for the Monte Carlo approximation of operators
between arbitrary Banach spaces. For non-adaptive algorithms we consider the
average case setting with the uniform distribution on finite dimensional balls and in
this way we obtain almost optimal prefactors. By combining known results about
Gaussian measures and their connection to the Monte Carlo error we also cover
adaptive algorithms, however with weaker constants. As an application, we find
that for the L approximation of smooth functions from the class C ([0, 1]d ) with
uniformly bounded partial derivatives, randomized algorithms suffer from the curse
of dimensionality, as it is known for deterministic algorithms.
Keywords Monte Carlo Lower error bounds Bernstein numbers Approximation
of smooth functions Curse of dimensionality
471
472
R.J. Kunsch
(1)
where all functionals Lk are chosen at once. In that case N is a linear mapping for
fixed . For adaptive information N the choice of the functionals may depend
on previously obtained information, we assume that the choice of the k-th functional
) Lk;y
is a measurable mapping (; y1 , . . . , yk1
(2)
For randomized algorithms An = (An ()) this can be generalized as the expected
error at f
(3)
e(An , S, f ) := E S(f ) An (f ) G ,
however some authors prefer the root mean square error
e2 (An , S, f ) :=
E S(f ) An (f ) 2G .
(4)
(The expectation E is written for the integration over all with respect to P.)
Since e(An , S, f ) e2 (An , S, f ), for lower bounds we may stick to the first version.
The global error of an algorithm An is defined as the error for the worst input
from the input set F
F, we write
e(An , S, F) := sup e(An , S, f ).
f F
(5)
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
473
For technical purposes we also need the average error, which is defined for any
(sub-)probability measure (the so-called input distribution) on the input space
F,
e(An , S, ) :=
e(An , S, f ) d(f ).
(6)
(A sub-probability measure on
F is a positive measure with 0 < (
F) 1.)
The difficulty of a problem within a particular setting refers to the error of optimal
algorithms, we define
e, (n, S, F) :=
An A ,
inf e(An , S, ),
An A ,
where {ran, det} and {ada, nonada}. These quantities are inherent properties
of the problem S, so eran, (n, S, F) is called the Monte Carlo error, edet, (n, S, F) the
worst case error, and edet, (n, S, ) the -average case error of the problem S.
Since adaption and randomization are additional features for algorithms we have
eran, (n, S, ) edet, (n, S, ) and e,ada (n, S, ) e,nonada (n, S, ),
(7)
E e(A
n , S, f ) d(f ) = E
Fubini
inf
A n Andet,
e(An , S, f ) d(f )
e(A n , S, ).
In the last step we used that for any fixed elementary event the realization An
can be seen as a deterministic algorithm.
We will prove lower bounds for the Monte Carlo error by considering particular
average case situations where we have to deal with only deterministic algorithms.
We have some freedom to choose a suitable distribution .
For more details on error settings and types of information see [11].
474
R.J. Kunsch
(8)
(9)
(10)
2
2n (S).
(11)
The new result for operators between arbitrary Banach spaces (see Theorem 1) reads
quite similar, for non-adaptive algorithms we have:
eran,nonada (n, S, F)
1
b2n+1 (S).
2
(12)
For adaptive algorithms we get at least the existence of a constant c 1/215 such
that
(13)
eran,ada (n, S, F) c b2n (S).
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
475
Proposition 2 (Structure of unit balls) Let (V, ) be a normed vector space over
the reals with its closed unit ball B := {x V : x 1}. Then
for any finite-dimensional subspace U V the intersection B U is compact
and has a non-empty interior with respect to the standard topology of U as a
finite-dimensional vector space, i.e. B U U is a d-dimensional body, where
d := dim U,
B is symmetric, i.e. if x B then x B,
B is convex, i.e. for x, y B and any (0, 1) it contains the convex combination
(1 )x + y B.
If conversely a given set B fulfills those properties, it induces a norm by
x B := inf{r 0 | x r B}, x V,
(14)
476
R.J. Kunsch
(15)
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
477
mn
bm (S).
m+1
1
b2n+1 (S).
2
mn
bm (S),
m+1
(16)
which by Proposition 1 (Bakhvalovs technique) provides a lower bound for nonadaptive Monte Carlo methods.
Within the first step we rewrite the integral in (16) as an integral of local average
errors over the information. The set of inputs F and the information mapping N
match the situation of Corollary 1 with m = n + d, each d-dimensional slice Fy :=
F N 1 (y) represents all inputs with the same information y Rn . Since is
the uniform distribution on F, the uniform distribution on Fy is a version of the
conditional distribution of given y = N(f ), which we denote by y . Therefore we
can write the integral from (16) as
478
R.J. Kunsch
S(f ) (N(f )) G d(f ) =
S(f ) (y) G dy (f ) d N 1 (y),
(17)
The size of the slices Fy compared to the central slice F0 (where y = 0) shall be
1/d
. The function R(y)d is a quasi-density
described by R(y) := Vold (Fy )/ Vold (F0 )
for the distribution N 1 of the information y Rn . Further, by subsequent
Lemma 1 we have a lower bound for the inner integral, which we call the local
average error:
S(f ) (y) G dy (f )
d
R(y) bm (S).
d+1
Therefore the integral (17) is bounded from below by an expression that only depends
on the volumes of the parallel slices Fy :
R(y)d+1 dn y
d
N(F)
bm (S),
S(f ) (N(f )) G d(f )
d + 1 N(F) R(y)d dn y
(18)
(19)
where we have cancelled the factor n. For all directions k N(F) the ratio of the
integrands with respect to k is globally bounded from below, in detail
1
d+1 n1
r
dr
d+1
d+1
0 R(r k)
=
,
1
d
n1
d
+
n
+
1
m
+1
dr
0 R(r k) r
where the function r R(r k) [0, 1] is concave on [0, 1] and R(0) = 1. For the
solution of this univariate variational problem see subsequent Lemma 2. It follows
d+1
as well, which along with (18) proves the
that (19) is bounded from below by m+1
theorem.
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
479
The following lemma is about local average errors. Its quintessence is that ballshaped slices S(Fy ) G (with respect to the norm in G) are optimal. For the general
notion of local average radius of information see [11, pp. 197204].
Lemma 1 (Local average error) Let S : Rm G be an injective linear mapping
between Banach spaces, where F Rm is the unit ball with respect to an arbitrary norm on Rm , and let be the uniform distribution on F. Let N : Rm Rn
be a linear surjective information mapping, where for y Rn the conditional
measure y is the uniform distribution on the slice Fy := F N 1 (y). With
1/d
and d := m n for the local average error we
R(y) := Vold (Fy )/ Vold (F0 )
have
d
R(y) bm (S).
inf S(f ) g G dy (f )
gG
d+1
Proof Since S : Rm G is linear and bm (S) > 0, the exact solutions S(Fy ) for
inputs with the same information y Rn in each case form (convex) sets within
d-dimensional affine subspaces Uy := S(N 1 (y)) of the output space G. We compare the volume of subsets within different parallel affine subspaces, i.e. any ddimensional Lebesgue-like measure on U0 is also defined for subsets of the affine
subspaces Uy just as in Corollary 2. The linear mapping S preserves the ratio of
volumes, i.e.
Vold (Fy )
(S(Fy ))
= R(y)d =
.
(20)
Vold (F0 )
(S(F0 ))
Therefore for each information y N(F) the image measure y S 1 is the uniform
distribution on S(Fy ) with respect to . This means that for any g G we need to
show the inequality
1
x g G d (x)
S(f ) g G dy (f ) =
(S(Fy )) S(Fy )
d
R(y) bm (S).
d+1
(21)
(22)
480
R.J. Kunsch
(23)
d
v , for v V ().
V ()
(24)
d
v , for v V ().
(25)
If (S(Fy )) V () we obtain
(25)
S(Fy )
x g G d (x) (S(Fy ))
d
(S(Fy ))(d+1)/d .
d+1
(26)
(27)
x g G y , for x Cy \ S(Fy ).
This enables us to carry out a symmetrization:
S(Fy )
x g G d (x)
(27)
x g G d (x)
C y
V () +
(S(Fy ))
V ()
(v) dv
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
(24),(25)
(S(Fy ))
481
v1/d dv
(28)
d
(S(Fy ))(d+1)/d .
d+1
S(Fy )
x g G d (x)
d
d
R(y) (S(F0 ))1/d
R(y) bm (S),
d+1
d+1
which is (21). For the second inequality we have used the definition of the Bernstein
number, i.e. Bbm (S) (0) S(Rm ) S(F) and therefore Bbm (S) (0) U0 S(F0 ) which
with our scaling (22) implies bm (S)d (S(F0 )).
Remark 1 (Alternative to Bernstein numbers) In the very end of the above proof
we have replaced (S(F0 )) by an expression using the Bernstein number bm (S). In
fact, due to the scaling of , the expression (S(F0 )) is a volume comparison of an
(m n)-dimensional slice of the image of the input set S(F) and the unit ball in G.
We could replace the Bernstein numbers within Theorem 1 by new quantities
km,n (S) := sup inf
Xm Ymn
1/(mn)
,
(29)
where Xm
F and Ymn S(Xm ) are linear subspaces with dimension dim(Xm ) =
dim(S(Xm )) = m and dim(Ymn ) = m n, further BG denotes the unit ball in G and
for each choice of Ymn the volume measure Volmn may be any (mn)-dimensional
Lebesgue measure, since we are only interested in the ratio of volumes.
Lemma 2 (Variational problem) For d, n N consider the variational problem of
minimizing the functional
1
R(r)d+1 r n1 dr
F[R(r)] := 0 1
,
d n1 dr
0 R(r) r
where R : [0, 1] [0, 1] is concave and R(0) = 1. Then
F[R(r)]
d+1
d+n+1
1
0
(1 r)p1 r n1 dr =
(n 1) 1
,
p (p + n 1)
482
R.J. Kunsch
which is a special value of the beta function (see for Example [12, p. 103]). Knowing
the value of this integral we get
F[1 r] =
d+1
.
d+n+1
(30)
(1 (1 )r)d+1 r n1 dr
F[(1 r) + r] = 0 1
d n1 dr
0 (1 (1 )r) r
1
(1 x)d+1 x n1 dx
[x=(1)r] 0
,
=
1
(1 x)d x n1 dx
0
where we have cancelled the factor (1 )n . We can express this as a conditional
expectation using a random variable X [0, 1] with quasi density (1 x)d x n1 :
F[(1 r) + r] = E[(1 X) | X (1 )] = E[(1 X) | (1 X) ],
which obviously is monotonically increasing in .
For any nonlinear concave function R : [0, 1] [0, 1] with R(0) = 1 there exists
exactly one linear function
R(r) = (1 r) + r with
R(r)d r n1 dr
R(r)d r n1 dr = 0.
(31)
R(r)d+1
R(r)d+1 > R(r0 ) R(r)d
R(r)d > 0.
0 > R(r)d+1
R(r)d+1 > R(r0 ) R(r)d
R(r)d .
Therefore
1
1
d+1 n1
R(r)
r
dr
R(r)d+1 r n1 dr
0
0
1
1
(31)
d n1
d n1
> R(r0 )
R(r) r
dr
dr = 0,
R(r) r
0
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
483
m
Remark 2 (Quality of the prefactor) Consider the identity idm1 : m
1 1 with
Bernstein number bm (idm1 ) = 1. (For notation see Sect. 3.1.) For any J {1, . . . , m}
being an index set containing n indices define the deterministic algorithm
AJ (x) :=
xi ei , x m
1,
iJ
where ei = (ij )m
j=1 are the vectors of the standard basis. With being the uniform
distribution on the unit ball B1m m
1 , for the average case setting this type of
algorithm is optimal.
We add some randomness to the above method. Let J = J() be uniformly
distributed on the system of index sets {I {1, ..., m} | #I = n} and define the
Monte Carlo algorithm An = (An ) by
An (x) :=
xi ei ,
iJ()
The error is
e(An , idm1 , x) = E x An (x) 1 =
m
mn
x 1 .
P(i
/ J()) |xi | =
m
i=1
484
R.J. Kunsch
truncated Gaussian measures with Lewiss theorem which gives us a way to choose
a suitable Gaussian measures for our average case setting.
Theorem 2 (Adaptive Monte Carlo methods) Let S :
F G be a compact linear
operator and F
F be the closed unit ball in F. Then for n < m for adaptive Monte
Carlo methods we obtain
eran,ada (n, S, F) c
where the constant can be chosen as c =
mn
bm (S),
m
12e1
16
1
.
108
Remark 3 The given constant can be directly extracted from the proof in Heinrich [3].
However by optimizing some parts of the proof one can show that the theorem is
1
still valid with c = 16
. When restricting to homogenious algorithms (i.e. An ( f ) =
An (f ) for R) we may show the above result with the optimal constant c = 1
(see also Remark 2). The proofs for these statements will be published in future work.
Proof (Theorem 2) As before we assume
F = Rm . We start with the existence of in
some sense optimal Gaussian measures on
F. Let x be a standard Gaussian random
vector in Rm . Then (J) := E Jx F defines a norm on the set of linear operators
J : Rm Rm . By Lewis Theorem (see for example [10, Theorem 3.1]) there exists
a linear mapping J : Rm Rm with maximal determinant subject to (J) = 1,
and tr(J 1 T ) m (T ) for any linear mapping T : Rm Rm . In particular with
T = JP for any rank-(m n) projection P within Rm this implies
E JPx F
mn
.
m
(32)
For the average setting let denote the Gaussian measure for the distribution of
the rescaled random vector c
Jx, where c
= 81 , and let be the truncated measure,
i.e. (A) = (A
F) for measurable sets A Rm . Note that is no probability
measure, but a sub-probability measure with (F) < 1, which is sufficient for the
purpose of lower bounds.
Then by Heinrich [3, Proposition 2] we have
edet,ada (n, S, ) c
c
inf E SJPx G ,
P
(33)
where
the infimum is taken over orthogonal rank-(m n) projections P and c
=
1
1
e . (The conditional measure y for given the information y = N(f ) can
2
be represented as the distribution of c
JPy x with a suitable orthogonal projection Py .)
With SJPx G JPx F bm (S) and (32) and c = c
c
we obtain the theorem.
Note that we consider Monte Carlo algorithms with fixed information cost n,
whereas in [3] n denotes the average information cost En() which leads to slightly
different bounds, like 4c b4n (S) instead of 2c b2n (S).
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
485
3 Applications
3.1 Recovery of Sequences
We compare the results we obtain by Theorems 1 and 2 with some asymptotic lower
bounds of Heinrich [3] for the Monte Carlo approximation of the identity
id : Np Nq .
Here Np denotes RN equipped with the p-norm x p = (|x1 |p + . . . + |xN |p )1/p for
p < , or x = maxi=1,...,N |xi | for p = , the input set is the unit ball BpN of Np .
Since the identity is injective, Bernstein widths and Bernstein numbers coincide.
Proposition 4 (Heinrich 1992) Let 1 p, q and n N. Then
1/q1/p
n
,
1,
if
if
if
if
1 p, q < ,
1 p < q = ,
1 q < p = ,
p = q = .
bm (id : 2m
p
1/q1/p
, if 1 p q or 1 q p 2,
m
2m
1/q1/2
q ) m
, if 1 q 2 p ,
1
if 2 q p .
Combining this with Theorem 2 for m = 2n one may obtain a result similar
to Proposition 4, though without the logarithmic factor for 1 p < q = and
even with a weaker polynomial order for 1 q < p if p > 2. However for
the non-adaptive setting with Theorem 1 we can use the quantities km,n (S) defined
in Remark 1. The following result on volume ratios due to Meyer and Pajor [5] is
relevant to the problematic case 1 q < p .
Proposition 5 (Meyer, Pajor 1988) For every d-dimensional subspace Yd Rm
and for 1 q p we have
Vold (Bpm Yd )
Vold (Bqm Yd )
Vold (Bpd )
Vold (Bqd )
486
R.J. Kunsch
Note that by this for the case 1 q < p = we even have stronger lower bounds
than in Proposition 4, namely without the logarithmic term, however this only holds
for non-adaptive algorithms. On the other hand, for the case 1 p < q = this
result is weaker by a logarithmic factor compared to Heinrichs result.
Proof (Corollary 3) For 1 p q we apply Theorem 1 using the Bernstein
numbers from Lemma 3 with m = 2n.
For 1 q p let m = 4n and d = m n = 3n. By Proposition 5 we have
km,n (id :
m
p
m
q)
= inf m
Yd R
Vold (Bpm Yd )
Vold (Bqm Yd )
1/d
Vold (Bpd )
Vold (Bqd )
1/d
.
(34)
The volume of the unit Ball in dp can be found e.g. in [10, Eq. (1.17)], it is
d
2 1 + 1p
.
Vold (Bpd ) =
1 + dp
For 1 p < we apply Stirlings formula to the denominator
d/p
d
p
d
d
d
,
= 2
1+
e(d/p) where 0
p
p ep
p
12d
and by this we obtain the asymptotics (Vold (Bpd ))1/d d 1/p . For p = we simply
d
have (Vold (B
))1/d = 2. Putting this into (34), by Remark 1 together with Theorem 1
we obtain the corollary.
Finally observe that in the case 1 p q Proposition 5 provides upper
m
bounds for the quantities km,n (id : m
p q ). By this we see that taking these
quantities instead of the Bernstein numbers will not change the order of the lower
4n
4n
bounds for the error eran,nonada (n, id : 4n
p q , Bp ).
(35)
Bernstein Numbers and Lower Bounds for the Monte Carlo Error
487
(36)
Nd0
f F := sup D f .
(37)
Nd0
of
F with dim Vd = 2d/2 . For f Vd one can show D f f for all
multi-indices Nd0 , i.e. f F = f . Therefore with m = 2d/2 and Xm = Vd
we obtain bm (S) = 1. Since the sequence of Bernstein numbers is decreasing, we
know the first 2d/2 Bernstein numbers.
Knowing this, by Theorems 1 and 2 we directly obtain the following result for
randomized algorithms.
Corollary 4 (Curse of dimensionality) For the problems Sd we have
eran,nonada (n, Sd , Fd )
and
1
for n 2d/21 1,
2
(39)
488
R.J. Kunsch
Note that if we do not collect any information about the problem, the best algorithm
would simply return 0 and the so-called initial error is
e(0, Sd , Fd ) = 1.
Even after evaluating exponentially many (in d) information functionals, with nonadaptive algorithms we only halve the initial error, if at all. The problem suffers from
the curse of dimensionality. For more details on tractability notions see [8].
Acknowledgments I want to thank E. Novak and A. Hinrichs for all the valuable hints and their
encouragements during the process of compiling this work.
In addition I would like to thank S. Heinrich for his crucial hint on Bernstein numbers and
Bernstein widths.
Last but not least I would like to express my gratitude to Brown Universitys ICERM for its support
with a stimulating research environment and the opportunity of having scientific conversations that
finally inspired the solution of the adaptive case during my stay in fall 2014.
References
1. Bakhvalov, N.S.: On the approximate calculation of multiple integrals. Vestnik MGU, Ser.
Math. Mech. Astron. Phys. Chem., 4:318: in Russian. English translation: Journal of Complexity 31(502516), 2015 (1959)
2. Gardner, R.J.: The Brunn-Minkowski inequality. Bulletin of the AMS 39(3), 355405 (2002)
3. Heinrich, S.: Lower bounds for the complexity of Monte Carlo function approximation. J.
Complex. 8, 277300 (1992)
4. Li, Y.W., Fang, G.S.: Bernstein n-widths of Besov embeddings on Lipschitz domains. Acta
Mathematica Sinica, English Series 29(12), 22832294 (2013)
5. Meyer, M., Pajor, A.: Sections of the unit ball of np . J. Funct. Anal. 80, 109123 (1988)
6. Naor, A.: The surface measure and cone measure on the sphere of np . Trans. AMS 359, 1045
1079 (2007)
7. Novak, E.: Optimal linear randomized methods for linear operators in Hilbert spaces. J. Complex. 8, 2236 (1992)
8. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems, Linear Information, vol.
I. European Mathematical Society, Europe (2008)
9. Novak, E., Wozniakowski, H.: Approximation of infinitely differentiable multivariate functions
is intractable. J. Complex. 25, 398404 (2009)
10. Pisier, G.: The Volume of Convex Bodies and Banach Space Geometry. Cambridge University
Press, Cambridge (1989)
11. Traub, J.F., Wasilkowski, G..W., Wozniakowski, H.: Information-Based Complexity. Academic
Press, New York (1988)
12. Wang, Z.X., Guo, D.R.: Special Functions. World Scientific, Singapore (1989)
1 Introduction
Since the publication of Giles articles about multilevel Monte Carlo methods [8, 9],
which applied an earlier idea of Heinrich [10] to stochastic differential equations, an
enormous amount of literature on the application of multilevel Monte Carlo schemes
to various applications has been published. For an overview of the state of the art
in the area, the reader is referred to the scientific program and the proceedings of
MCQMC14 in Leuven.
This note is intended to show the consequences of the availability of different
types of convergence results for stochastic partial differential equations of It type
(SPDEs for short in what follows). Here we consider so called strong and weak
A. Lang (B)
Department of Mathematical Sciences, Chalmers University of Technology
and University of Gothenburg, SE-412 96 Gothenburg, Sweden
e-mail: annika.lang@chalmers.se
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_25
489
490
A. Lang
+
In the context of this note, H denotes a separable Hilbert space. The sequence is said
to converge weakly to Y if
lim |E[(Y )] E[(Y )]| = 0
+
491
N
1 i
Y
N i=1
converges P-almost surely to E[Y ] for N +. In the following lemma we see that
it also converges in mean square to E[Y ] if Y is square integrable, i.e., Y L 2 (; B)
with
L 2 (; B) := v : B, v strongly measurable, vL2 (;B) < + ,
where
vL2 (;B) := E[v2B ]1/2 .
In contrast to the almost sure convergence of EN [Y ] derived from the strong law of
large numbers, a convergence rate in mean square can be deduced from the following
lemma in terms of the number of samples N N.
Lemma 1 For any N N and for Y L 2 (; B), it holds that
1
1
E[Y ] EN [Y ]L2 (;B) = Var[Y ]1/2 Y L2 (;B) .
N
N
The lemma is proven in, e.g., [6, Lemma 4.1]. It shows that the sequence of socalled Monte Carlo estimators (EN [Y ], N N) converges with rate O(N 1/2 ) in
mean square to the expectation of Y .
Next let us assume that (Y , N0 ) is a sequence of approximations of Y , e.g.,
Y V , where (V , N0 ) is a sequence of finite dimensional subspaces of B. For
given L N0 , it holds that
492
A. Lang
YL = Y0 +
L
(Y Y1 )
=1
L
E[Y Y1 ].
=1
A possible way to approximate E[YL ] is to approximate E[Y Y1 ] with the corresponding Monte Carlo estimator EN [Y Y1 ] with a number of independent
samples N depending on the level . We set
E L [YL ] := EN0 [Y0 ] +
L
=1
and call E L [YL ] the multilevel Monte Carlo estimator of E[YL ]. The following lemma
gives convergence results for the estimator depending on the order of weak convergence of (Y , N0 ) to Y and the convergence of the variance of (Y Y1 , N).
If neither estimates on weak convergence rates nor on the convergence of the variances are available, one can usethe in general slowerstrong convergence rates.
Lemma 2 Let Y L 2 (; B) and let (Y , N0 ) be a sequence in L 2 (; B), then,
for L N0 , it holds that
E[Y ] E L [YL ]L2 (;B)
E[Y YL ]B + E[YL ] E L [YL ]L2 (;B)
1/2
L
1
1
= E[Y YL ]B + N0 Var[Y0 ] +
N Var[Y Y1 ]
Y YL L2 (;B) + 2
=1
L
=0
1/2
where Y1 := 0.
Proof This is essentially [3, Lemma 2.2] except that the square root is kept outside
the sum. Therefore it remains to show the property of the multilevel Monte Carlo
estimator that
E[YL ] E L [YL ]2L2 (;B) = N01 Var[Y0 ] +
L
=1
493
=1
and that all summands are independent, centered random variables by the construction of the multilevel Monte Carlo estimator. Thus [7, Proposition 1.12] implies
that
E[E[YL ] E L [YL ]2B ]
= E[E[Y0 ] EN0 [Y0 ]2B ] +
L
=1
494
A. Lang
where denotes the Riemann zeta function, i.e., E[Y ] E L [YL ]L2 (;B) has the
same order of convergence as E[Y YL ]B .
Assume further that the work WB of one calculation of Y Y1 , 1, is
bounded by C4 a for a constant C4 and > 0, that the work to calculate Y0 is
bounded by a constant C5 , and that the addition of the Monte Carlo estimators costs
C6 aL for some 0 and some constant C6 . Then the overall work WL is bounded
by
L
(2) 1+
+ C6 aL .
WL aL2 C5 + C4
a
=1
If furthermore (a , N0 ) decreases polynomially, i.e., there exists a > 1 such that
a = O(a ), then the bound on the computational work simplifies to
WL =
max{2,}
O(aL
)
if < 2,
(2+2) 2+
L , aL }) if 2.
O(max{aL
Proof First, we calculate the error of the multilevel Monte Carlo estimator. It holds
with the made assumptions that
N01 Var[Y0 ] C3 aL2
and, for = 1, . . . , L, that
2 (1+)
a = C2 aL2 (1+) .
2
L
(1+) C2 aL2 (1 + ),
=1
where denotes the Riemann zeta function. To finish the calculation of the error we
apply Lemma 2 and assemble all estimates to
E[Y ] E L [YL ]L2 (;B) (C1 + (C3 + C2 (1 + ))1/2 ) aL .
Next we calculate the necessary work to achieve this error. The overall work consists
of the work WB to compute Y Y1 times the number of samples N on all levels
= 1, . . . , L, the work W0B on level 0, and the addition of the Monte Carlo estimators
2
in the end. Therefore, using the observation that N aL2 a 1+ , = 1, . . . , L,
2
and N0 aL with equality if the right hand side is an integer, we obtain that
WL C5 N0 + C4
L
495
N a + C6 aL
=1
C5 aL2 + C4
L
aL2 a 1+ a + C6 aL
2
=1
L
(2) 1+
+ C6 aL ,
aL2 C5 + C4
a
=1
which proves the first claim of the theorem on the necessary work.
If < 2 and additionally (a , N0 ) decreases polynomially, the sum on the
right hand side is absolutely convergent and therefore
max{2,}
).
WL aL2 (C5 + C4 aL
(2+2) 2+
= O(max{aL
) + C6 aL
, aL }).
We remark that the computation of the sum over different levels of the Monte
Carlo estimators does not increase the computational complexity if Y V for all
N0 and (V , N0 ) is a sequence of nested finite dimensional subspaces of B.
496
A. Lang
(1)
as Hilbert-space-valued stochastic differential equation on the finite time interval (0, T ], T < +, with deterministic initial condition X(0) = X0 . We pose
the following assumptions on the parameters, which ensure the existence of a mild
solution and some properties of the solution which are necessary for the derivation
and convergence of approximation schemes.
Assumption 1 Assume that the parameters of (1) satisfy the following:
1. Let A be a negative definite, linear operator on H such that (A)1 L(H) and
A is the generator of an analytic semigroup (S(t), t 0) on H.
2. The initial value X0 is deterministic and satisfies (A) X0 H for some
[0, 1].
3. The covariance operator Q satisfies (A)(1)/2 LHS < + for the same as
above.
4. The drift F : H H is twice differentiable in the sense that F Cb1 (H; H)
Cb2 (H; H 1 ), where H 1 denotes the dual space of the domain of (A)1/2 .
Under Assumption 1, the SPDE (1) has a continuous mild solution
X(t) = S(t)X0 +
S(t s)F(X(s)) ds +
S(t s) dW (s)
(2)
for t [0, T ], which is in L p (; H) for all p 2 and satisfies for some constant C
that
sup X(t)Lp (;H) C(1 + X0 H ).
t[0,T ]
497
for 0 and
for [0, 1/2] uniformly in , n N0 . Furthermore they satisfy for all [0, 2],
[, min{1, 2 }], and k = 1, . . . , N(n),
k
)(A)/2 L(H) C(h + (t n )/2 )(tkn )(+)/2 .
(S(tkn ) S,n
k
S,n
X0
+ t
k
j=1
kj+1
n
S,n F(X,n (tj1
))
j=1
tjn
n
tj1
kj+1
S,n
dW (s).
(3)
We remark here that we do not approximate the noise which might cause problems in
implementations. One way to treat this problem is to truncate the KarhunenLove
expansion of the Q-Wiener process depending on the decay of the spectrum of Q
(see [2, 5]).
The theory on strong convergence of the introduced approximation scheme is
already developed for some time and the convergence rates are well-known and
stated in the following theorem.
Theorem 2 (Strong convergence [1]) Let the stochastic evolution Eq. (1) with mild
solution X and the sequence of its approximations (X,n , , n N0 ) given by (3)
satisfy Assumptions 1 and 2 for some (0, 1]. Then, for every (0, ), there
exists a constant C > 0 such that for all , n N0 ,
max
k=1,...,N(n)
498
A. Lang
It should be remarked at this point that the order of strong convergence does not
exceed 1/2 although we are considering additive noise since the regularity of the
parameters of the SPDE are assumed to be rough. Under smoothness assumptions
the rate of strong convergence attains one for additive noise since the higher order
Milstein scheme is equal to the EulerMaruyama scheme. Nevertheless, under the
made assumptions on the regularity of the initial condition X0 and the covariance
operator Q of the noise, this does not happen in the considered case.
The purpose of the multilevel Monte Carlo method is to approximate expressions
of the form E[(X(t))] efficiently, where : H R is a sufficiently smooth
functional. Therefore weak error estimates of the form |E[(X(tkn ))]E[(X,n (tkn ))]|
are of importance. Before we state the convergence theorem from [1], we specify the
necessary properties of in the following assumption.
Assumption 3 The functional : H R is twice continuously Frchet differentiable and there exists an integer m 2 and a constant C such that for all x H and
j = 1, 2,
mj
(j) (x)L[m] (H;R) C(1 + xH ),
where (j) (x)L[m] (H;R) is the smallest constant K > 0 such that for all u1 , . . . , um
H,
| (j) (x)(u1 , . . . , um )| Ku1 H um H .
Combining this assumption on the functional with Assumptions 1 and 2 on the
parameters and approximation of the SPDE, we obtain the following result, which
was proven in [1] using Malliavin calculus.
Theorem 3 (Weak convergence [1]) Let the stochastic evolution equation (1) with
mild solution X and the sequence of its approximations (X,n , , n N0 ) given by (3)
satisfy Assumptions 1 and 2 for some (0, 1]. Then, for every : H R
satisfying Assumption 3 and all [0, ), there exists a constant C > 0 such that
for all , n N0 ,
max
k=1,...,N(n)
499
since we are in general not able to compute the expectation exactly. Going back to
Sect. 2, we recall that the first approach to approximate the expected value is to do a
(singlelevel) Monte Carlo approximation. This leads to the overall error given in the
following corollary, which is proven similarly to [3, Corollary 3.6] and included for
completeness.
Corollary 1 Let the stochastic evolution equation (1) with mild solution X and the
sequence of its approximations (X,n , , n N0 ) given by (3) satisfy Assumptions 1
and 2 for some (0, 1]. Then, for every : H R satisfying Assumption 3 and
all [0, ), there exists a constant C > 0 such that for all , n N0 , the error of
the Monte Carlo approximation is bounded by
1
2
E[(X(tkn ))] EN [(X,n (tkn )))]L2 (;R) C h + (t n ) +
k=1,...,N(n)
N
max
for N N.
Proof By the triangle inequality we obtain that
E[(X(tkn ))]EN [(X,n (tkn )))]L2 (;R)
E[(X(tkn ))] E[(X,n (tkn )))]L2 (;R)
+ E[(X,n (tkn )))] EN [(X,n (tkn )))]L2 (;R) .
The first term is bounded by the weak error in Theorem 3 while the second one is
the Monte Carlo error in Lemma 1. Putting these two estimates together yields the
claim.
The errors are all converging with the same speed if we couple and n such that
4
h2 t n as well as the number of Monte Carlo samples N for N0 by N h .
This implies for the overall work that
(d+2+4 )
),
500
A. Lang
Corollary 2 (Strong convergence) Let the stochastic evolution equation (1) with
mild solution X and the sequence of its approximations (X,n , , n N0 ) given by (3)
satisfy Assumptions 1 and 2 for some (0, 1]. Furthermore couple and n such
2
2 2
that t n h2 and for L N0 , set N0 hL as well as N hL h 1+ for
all = 1, . . . , L and arbitrary fixed > 0. Then, for every : H R satisfying
Assumption 3 and all [0, ), there exists a constant C > 0 such that for all
, n N0 , the error of the multilevel Monte Carlo approximation is bounded by
max
k=1,...,N(nL )
where nL is chosen according to the coupling with L. If the work of one computation
in space is bounded by WH = O(hd ) for = 0, . . . , L and fixed d 0, which
includes the summation of different levels, the overall work will be bounded by
WL = O(hL(d+2) L 2+ ).
Proof We first observe that
max
k=1,...,N(nL )
by Theorem 2 and the coupling of the space and time discretizations. Furthermore it
holds that
|E[(X(tknL ))]E[(XL,nL (tknL ))]|
max
k=1,...,N(nL )
max
k=1,...,N(nL )
max
k=1,...,N(nL )
ChL ,
Ch .
501
is made precise in the following corollary and the computations for given accuracy
afterwards.
Corollary 3 (Weak convergence) Let the stochastic evolution equation (1) with mild
solution X and the sequence of its approximations (X,n , , n N0 ) given by (3)
satisfy Assumptions 1 and 2 for some (0, 1]. Furthermore couple and n such
4
4 2
that t n h2 and for L N0 , set N0 hL as well as N hL h 1+ for
all = 1, . . . , L and arbitrary fixed > 0. Then, for every : H R satisfying
Assumption 3 and all [0, ), there exists a constant C > 0 such that for all
, n N0 , the error of the multilevel Monte Carlo approximation is bounded by
max
k=1,...,N(nL )
where nL is chosen according to the coupling with L. If the work of one computation
in space is bounded by WH = O(hd ) for = 0, . . . , L and fixed d 0, which
includes the summation of different levels, the overall work will be bounded by
(d+2+2 ) 2+
WL = O(hL
).
Proof The proof is the same as for Corollary 2 except that we obtain
max
k=1,...,N(nL )
directly from Theorem 3 and therefore set a = h , = 1/2, and the sample
numbers according to these choices in Theorem 1.
If we take regular subdivisions of the grids, i.e., we set, up to a constant, h := 2
for N0 and rescale both corollaries such that the convergence rates are the same,
2
i.e., the errors are bounded by O(h ), we obtain that for a given accuracy L on
level L N, Corollary 2 leads to computational work
2+ 2 + (d+2)/
| log2 L |
WL = O 2
2 L
while the estimators in Corollary 3 can be computed in
WL = O
2 + ((d+2)/(2 )+1)
| log2 L | .
L
2
Therefore the availability of weak convergence rates implies a reduction of the computational complexity of the multilevel Monte Carlo estimator which depends on the
regularity and d referring to the dimension of the problem in space. For large d, the
work using strong convergence rates is essentially the squared work that is needed
with the knowledge of weak rates. Additionally, for all d 0, the rates are better and
3/(2 )+1
3/
for the weak rates versus L ,
especially in dimension d = 1 we obtain L
502
A. Lang
Table 1 Computational work of different Monte Carlo type approximations for a given precision L
Monte Carlo
MLMC with strong conv.
MLMC with weak conv.
General
(d+2)/
((d+2)/(2 )+2)
22+ 2+
2 L
(d/2+3)
= 1, omitting L
const.
(d+2)
| log2 L |
| log2 L |
2+ ((d+2)/(2 )+1)
| log2 L |
2 L
(d/2+2)
L
| log2 L |
where (0, 1). Nevertheless, one should also mention that Corollary 2 already
reduces the work for 4 > d + 2 compared to a (singlelevel) Monte Carlo approximation according to weak convergence rates. The results are put together in Table 1
for a quick overview.
5 Simulation
In this section simulation results of the theory of Sect. 4 are shown, where it has
to be admitted that the chosen example fits better the framework of [6] since we
estimate the expectation of the solution instead of the expectation of a functional
of the solution. Simulations that fit the conditions of Sect. 4 are under investigation.
Here we simulate similarly to [4] and [5] the heat equation driven by additive Wiener
noise
dX(t) = X(t) dt + dW (t)
on the space interval (0, 1) and the time interval [0, 1] with initial condition X(0, x) =
sin( x) for x (0, 1). In contrast to previous simulations, the noise is assumed to be
white in space to reduce the strong convergence rate of the scheme to (essentially) 1/2.
The solution to the corresponding deterministic system with u(t) = E[X(t)] for
t [0, 1]
du(t) = u(t) dt
is in this case u(t, x) = exp( 2 t) sin( x) for x (0, 1) and t [0, 1].
The space discretization is done with a finite element method and the hat function
basis, i.e., with the spaces (Sh , h > 0) of piecewise linear, continuous polynomials
(see, e.g., [6, Example 3.1]). The numbers of multilevel Monte Carlo samples are
calculated according to Corollaries 2 and 3 with = 1 to compare the convergence
and complexity properties with and without the availability of weak convergence
rates. In the left graph in Fig. 1, the multilevel Monte Carlo estimator E L [XL,2L (1)]
was calculated for L = 1, . . . , 5 for available weak convergence rates as in Corollary 3 while just for L = 1, . . . , 4 in the other case to finish the simulations in a
reasonable time on an ordinary laptop. The plot shows the approximation of
503
1/2
i.e.,
e1 (XL,2L ) :=
m
1/2
1
(exp( 2 ) sin( xk ) E L [XL,2L (1, xk )])2
.
m
k=1
N
1
i
e1 (XL,2L
)2
1/2
i=1
i
where (XL,2L
, i = 1, . . . , N) is a sequence of independent, identically distributed
samples of XL,2L and N = 10. The simulation results confirm the theory. In Fig. 2 the
computational costs per level of the simulations on a laptop using matlab are shown
for both frameworks. It is obvious that the computations using weak convergence
rates are substantially faster. One observes especially that the computations with
weak rates on level 5 take less time than the ones with strong rates on level 4. The
computing times match the bounds of the computational work that were obtained in
Corollaries 3 and 2.
10
strong, = 1
strong, = 0
weak, = 1
weak, = 0
10
10
strong, = 1
strong, = 0
weak, = 1
weak, = 0
10
L2 error
10
L error
O(h )
O(h )
2
10
10
10
10
10
0
10
10
10
10
10
10
Fig. 1 Mean square error of the multilevel Monte Carlo estimator with samples chosen according
to Corollaries 2 and 3
504
A. Lang
10
10
10
10
strong, = 1
strong, = 0
6
10
O(h )
weak, = 1
weak, = 0
O(h )
10
4
0
10
10
10
Fig. 2 Computational work of the multilevel Monte Carlo estimator with samples chosen according
to Corollaries 2 and 3
Finally, Figs. 1 and 2 include besides = 1 also simulation results for the border
case = 0 in the choices of sample sizes per level. One observes in the left graph in
Fig. 1 that the variance of the errors for = 0 in combination with Corollary 2 is high,
which is visible in the nonalignment of the single simulation results. Furthermore
the combination of Figs. 1 and 2 shows that = 0 combined with Corollary 3 and
= 1 with Corollary 2 lead to similar errors, but that the first choice of sample sizes
is essentially less expensive in terms of computational complexity. Therefore the
border case = 0, which is not included in the theory, might be worth to consider
in practice.
Acknowledgments This research was supported in part by the Knut and Alice Wallenberg foundation as well as the Swedish Research Council under Reg. No. 621-2014-3995. The author thanks
Lukas Herrmann, Andreas Petersson, and two anonymous referees for helpful comments.
References
1. Andersson, A., Kruse, R., Larsson, S.: Duality in refined Sobolev-Malliavin spaces and weak
approximations of SPDE. Stoch. PDE: Anal. Comp. 4(1), 113149 (2016). doi:10.1007/
s40072-015-0065-7
2. Barth, A., Lang, A.: Milstein approximation for advection-diffusion equations driven by multiplicative noncontinuous martingale noises. Appl. Math. Opt. 66(3), 387413 (2012). doi:10.
1007/s00245-012-9176-y
505
3. Barth, A., Lang, A.: Multilevel Monte Carlo method with applications to stochastic partial
differential equations. Int. J. Comp. Math. 89(18), 24792498 (2012). doi:10.1080/00207160.
2012.701735
4. Barth, A., Lang, A.: Simulation of stochastic partial differential equations using finite element
methods. Stochastics 84(23), 217231 (2012). doi:10.1080/17442508.2010.523466
5. Barth, A., Lang, A.: L p and almost sure convergence of a Milstein scheme for stochastic
partial differential equations. Stoch. Process. Appl. 123(5), 15631587 (2013). doi:10.1016/j.
spa.2013.01.003
6. Barth, A., Lang, A., Schwab, Ch.: Multilevel Monte Carlo method for parabolic stochastic
partial differential equations. BIT Num. Math. 53(1), 327 (2013). doi:10.1007/s10543-0120401-5
7. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. In: Encyclopedia of
Mathematics and Its Applications. Cambridge University Press, Cambridge (1992). doi:10.
1017/CBO9780511666223
8. Giles, M.B.: Improved multilevel Monte Carlo convergence using the Milstein scheme. In:
Alexander, K., et al. (eds.) Monte Carlo and Quasi-Monte Carlo methods 2006. Selected Papers
Based on the presentations at the 7th International Conference Monte Carlo and quasi-Monte
Carlo Methods in Scientific Computing, Ulm, Germany, August 1418, 2006, pp. 343358.
Springer, Berlin (2008). doi:10.1007/978-3-540-74496-2_20
9. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607617 (2008). doi:10.
1287/opre.1070.0496
10. Heinrich, S.: Multilevel Monte Carlo methods. In: Margenov, S., Wasniewski, J., Yalamov, P.Y.
(eds.) arge-Scale Scientific Computing, Third International Conference, LSSC 2001, Sozopol,
Bulgaria, June 6-10, 2001, Revised Papers. Lecture notes in computer science, pp. 5867.
Springer, Heidelberg (2001). doi:10.1007/3-540-45346-6_5
11. Jentzen, A., Kurniawan, R.: Weak convergence rates for Euler-type approximations of semilinear stochastic evolution equations with nonlinear diffusion coefficients (2015)
1 Introduction
Monte Carlo simulation is a very convenient method to solve problems arising in
physics like the advectiondiffusion equation with a Dirichlet boundary condition
507
508
L. Lentre
a well-posed problem [4, 5] and to be able to use later the theory of stochastic
differential equations, we required that satisfies an ellipticity condition and has its
coefficients at least in C 2 (D), and that v is bounded and in C 1 (D).
Interesting computations involving the solution c(t, x) are the moments
Mk (T ) =
(2)
The above expectation is nothing more than an average of the positions at time T of particles that move according to a scheme associated with the process X. Computing it requires a large number of such particles. For linear equations, the particles do not interact with each other and move according to a Markovian process.
The great advantage of the Monte Carlo method is that its rate of convergence is not affected by the curse of dimensionality. Nevertheless, the slow rate dictated by the central limit theorem can be considered a drawback: the computation of the moments requires a large number of particles to achieve a reliable approximation. Thus, the use of supercomputers and parallel architectures becomes a key ingredient to obtain reasonable computational times. However, the main difficulty when one deals with parallel architectures is to manage the random numbers such that the particles are not correlated; otherwise the approximation of the moments is biased.
In this paper, we investigate the parallelization of the Monte Carlo method for the computation of (2). We consider two implementation strategies where the total number of particles is divided into batches distributed over the Floating Point Units (FPUs):
1. SAF: the Strategy of Attachment to the FPUs, where each FPU receives a Virtual Random Number Generator (VRNG), which is either one of several independent Random Number Generators (RNGs) or a copy of the same RNG in a different state [10]. In this strategy, the random numbers are generated on demand and do not bear any attachment to the particles.
2. SAO: the Strategy of Attachment to the Object, where each particle carries its own Virtual Random Number Generator (a minimal sketch follows this list).
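The following minimal Python sketch contrasts the two attachments; it uses NumPy's SeedSequence spawning as a stand-in for the streams and substreams of RNGStreams. PALMTREE itself is written in C++, so all names below are illustrative assumptions, not the library's API:

    import numpy as np

    root = np.random.SeedSequence(2014)
    n_particles = 10
    particle_seeds = root.spawn(n_particles)   # one independent stream per particle

    # SAO: each particle carries its own generator, recoverable from its seed.
    particle_rngs = [np.random.default_rng(s) for s in particle_seeds]

    def replay_particle(i, n_steps, dt=0.001):
        """Recompute the full path of particle i from its own stream (SAO):
        replay amounts to re-seeding, with no global re-run or number logging."""
        rng = np.random.default_rng(particle_seeds[i])
        steps = rng.normal(0.0, np.sqrt(dt), size=n_steps)
        return np.cumsum(steps)

Under the SAF, by contrast, the generators would be attached to the FPUs, and replaying particle i would require logging which numbers it consumed.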
Both strategies clearly guarantee the non-correlation of the particles, provided that all the drawn random numbers are sufficiently independent, which is a property of the underlying RNGs.
Sometimes particles with a singular behavior are encountered, and the examination of the full paths of such particles is necessary. With the SAF, a particle replay requires either re-running the simulation with a condition to record only the positions of this particle, or keeping track of the random numbers used for this particle. In both cases, this would drastically increase the computational time and add unnecessary complications to the code. On the contrary, a particle replay is straightforward with the SAO.
The present paper is organized in two sections. The first one describes the SAF and the SAO. It also describes the work done in PALMTREE, a library we developed with the generator RNGStreams [11] and which contains an implementation of the SAO. The second section presents two numerical experiments which illustrate the performance of PALMTREE [17] and the SAO. Characteristic curves such as speedup and efficiency are provided for both experiments.
Each batch is attached to an MPI process in the Launcher. Then the particles distributed on each MPI process are simulated, drawing the random numbers from the attached VRNG.
Sometimes a selective replay may be necessary to capture some singular paths, in order to enable a physical understanding or for debugging purposes. However, recording the path of every particle is a memory-intensive task, as is keeping track of the random numbers used by each particle. This constitutes a major drawback of this strategy; the SAO is preferred in that case.
For the m-th particle, the generator must jump directly to the ((m − 1) · 200,000 + 1)-st substream so that we can compete with the speed of the SAF.
A naive algorithm using a loop containing the default function that passes through each substream one at a time is clearly too slow. As a result, we decided to modify the algorithm for MRG32k3a proposed in [3].
The current state of the generator RNGStreams is a sequence of six numbers; suppose that {s1, s2, s3, s4, s5, s6} is the start of a substream. With the vectors Y1 = (s1, s2, s3) and Y2 = (s4, s5, s6), the two jump matrices A1 and A2 of MRG32k3a (given in [3]), and the moduli m1 = 4294967087 and m2 = 4294944443, the jump from one substream to the next is performed with the computations

    X1 = A1 Y1 mod m1   and   X2 = A2 Y2 mod m2,

with X1 and X2 the states providing the first number of the next substream.
As we said above, it is too slow to run these computations n times to make a jump from the 1st substream to the n-th substream. Instead, we propose to use the algorithm developed in [3], based on the storage in memory of already computed matrices and the base-8 decomposition

    s = Σ_{j=0}^{k} g_j 8^j,   g_j ∈ {0, …, 7},

for any s ∈ N.
Since a stream contains 2^51 = 8^17 substreams, we decide to store only the already computed matrices

    A_i^{g · 8^j} mod m_i,   g = 1, …, 7,   j = 0, …, 16,

for i = 1, 2, with A1 and A2 as above. Thus we can reach any substream s with the formula

    A_i^s Y_i = ( Π_{j=0}^{k} A_i^{g_j 8^j} ) Y_i mod m_i.
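A minimal Python sketch of this precompute-and-multiply jump; the 3 × 3 matrix A, the modulus and the helper names are placeholders for the actual MRG32k3a data of the C++ implementation:

    import numpy as np

    def precompute_powers(A, m, base=8, levels=17):
        """Table T[j][g-1] = A^(g * base^j) mod m, g = 1..base-1, j = 0..levels-1."""
        identity = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=object)
        table, B = [], A % m                  # B = A^(base^j), starting at j = 0
        for j in range(levels):
            row, P = [], identity
            for g in range(1, base):
                P = (P @ B) % m               # P = A^(g * base^j)
                row.append(P)
            table.append(row)
            B = (P @ B) % m                   # advance to A^(base^(j+1))
        return table

    def jump_substreams(Y, s, table, m, base=8):
        """Advance the 3-vector state Y by s substreams via the base-8 digits of s."""
        j = 0
        while s > 0:
            s, g = divmod(s, base)
            if g:
                Y = (table[j][g - 1] @ Y) % m
            j += 1
        return Y

Exact integer arithmetic (dtype=object) is used on purpose: intermediate products approach m² ≈ 2^64 and would overflow 64-bit integers.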
This solution provides a process that can be completed with a complexity less than O(log₂ p), which is much faster [3] than the naive solution. Figure 4 illustrates this idea: we clearly see that the second FPU receives a stream and then performs a jump from the initial position of this stream to the first random number of the (n + 1)-st substream of this exact same stream.
    ∂_t ρ(x, t, y) = ∇_x · (σ ∇_x ρ)(x, t, y) − ∇_x · (v ρ)(x, t, y),   (x, t, y) ∈ D × [0, T] × D,
    ρ(x, t, y) = 0,   t ∈ [0, T],  y ∈ D,  x ∈ ∂D.                     (3)
This parabolic partial differential equation derived from (1) is often called the Kolmogorov forward equation or the Fokker–Planck equation. Probability theory provides us with the existence of a unique Feller process X = (X_t)_{t≥0} whose transition density is the solution of the adjoint of (3), that is
    ∂_t ρ(x, t, y) = ∇_y · (σ ∇_y ρ)(x, t, y) + v · ∇_y ρ(x, t, y),   (x, t, y) ∈ D × [0, T] × D,
    ρ(x, t, y) = 0,   t ∈ [0, T],  x ∈ D,  y ∈ ∂D,                     (4)

with the stochastic representation

    dX_t = v(X_t) dt + σ(X_t) dB_t,                                    (5)
starting at the position y and killed on the boundary ∂D. Here, (B_t)_{t≥0} is a d-dimensional Brownian motion with respect to the filtration (F_t)_{t≥0} satisfying the usual conditions [18].
The path of such a process can be simulated step by step with a classical Euler scheme. Therefore a Monte Carlo algorithm for the simulation of the center of mass simply consists in computing a large number of paths up to time T and averaging the final positions of all simulated particles still inside the domain.
As we are mainly interested in computational time and efficiency, the numerical experiments that follow are performed in free space. Working on a bounded domain would only require setting the appropriate stopping condition which, as a direct consequence of the Feynman–Kac formula, is to terminate the simulation of a particle when it leaves the domain.
For the Brownian motion, the scheme reads

    X_{t_{n+1}} = X_{t_n} + ΔB_n,                                      (6)

with ΔB_n = B_{t_{n+1}} − B_{t_n}. In this case, the Euler scheme presents the advantage of being exact.
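A sketch of the resulting sampler for this first experiment, under the assumptions of the text (free space, exact Gaussian increments); the function name is ours:

    import numpy as np

    def final_positions(n_paths, T=1.0, dt=0.001, seed=0):
        """Simulate Brownian paths with the exact scheme (6) and return X_T."""
        rng = np.random.default_rng(seed)
        x = np.zeros(n_paths)
        for _ in range(int(round(T / dt))):
            x += rng.normal(0.0, np.sqrt(dt), size=n_paths)   # X += Delta B_n
        return x

    # First moment (center of mass) estimated from the final positions.
    print(final_positions(100_000).mean())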
Since the Brownian motion is easy to simulate, we choose to sample 10,000,000 paths starting from the position 0 until time T = 1 with a time step of 0.001. We compute the speedup S and the efficiency E, which are defined as

    S = T_1 / T_p   and   E = T_1 / (p T_p) × 100,

where T_1 is the sequential computational time with one MPI process and T_p is the time in parallel using p MPI processes.
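For instance, with the timings of Table 1, p = 12 gives S = 4842/454 ≈ 10.66 and E = 10.66/12 × 100 ≈ 88.9 %.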
The speedup and efficiency curves, together with the values used to plot them, are respectively given in Fig. 5 and Table 1. The computations were realized on the supercomputer Lambda from the Igrida Grid of the INRIA Research Center Rennes Bretagne Atlantique. This supercomputer is composed of 11 nodes with 2 × 6-core Intel Xeon(R) E5647 CPUs at 2.40 GHz on the Westmere-EP architecture. Each node possesses 48 GB of Random Access Memory and is connected to the others through InfiniBand. We chose GCC 4.7.2 as C++ compiler and used the MPI library OpenMPI 1.6, as we prefer open-source and portable software. These tests include the time used to write the output file for the speedup computation, so that we also show the power of the HDF5 library.
Table 1 clearly illustrates PALMTREE's performance. It appears that the SAO does not suffer a significant loss of efficiency although it requires a complex
Fig. 5 Brownian motion: (a) the dashed line represents the linear acceleration and the black curve shows the speedup; (b) the dashed line represents the 100 % efficiency and the black curve shows PALMTREE's efficiency
Table 1 The values used to plot the curves in Fig. 5

Processes    1     12     24     36     48     60     72     84     96     108    120
Time (s)     4842  454    226    154    116    93     78     67     59     53     48
Speedup      —     10.66  21.42  31.44  41.74  52.06  62.07  72.26  82.06  91.35  100.87
Efficiency   100   88.87  89.26  87.33  86.96  86.77  86.21  86.03  85.48  84.59  84.06
preprocessing. Moreover, the data show that the optimum efficiency (89.26 %) is obtained with 24 MPI processes.
As we mentioned in Sect. 2.2, the independence between the particles is guaranteed by the non-correlation of the random numbers generated by the RNG. Moreover, Fig. 6 shows that the sum of the squares of the positions of the particles at T = 1 follows a χ² distribution in two different cases: (a) between substreams i and i + 1 for i = 0, …, 40,000 of the first stream; (b) between substreams i of the first and second streams for i = 0, …, 10,000.
For the second experiment, a constant diffusion with an affine drift ax + b, the update is exact as well:

    X_{t_{n+1}} = e^{aΔt} X_{t_n} + (b/a)(e^{aΔt} − 1) + sqrt( (e^{2aΔt} − 1) / (2a) ) · N(0, 1).
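A sketch of this exact update in Python, with unit diffusion as in the display above (the function name is ours):

    import numpy as np

    def exact_affine_step(x, a, b, dt, rng):
        """One exact transition X_{t+dt} for the drift a*x + b and unit diffusion."""
        ead = np.exp(a * dt)
        mean = ead * x + (b / a) * (ead - 1.0)
        std = np.sqrt((np.exp(2.0 * a * dt) - 1.0) / (2.0 * a))
        return mean + std * rng.standard_normal(np.shape(x))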
Fig. 6 χ² test: (a) between substreams i and i + 1 for i = 0, …, 40,000 of the first stream; (b) between substreams i of the first and second streams for i = 0, …, 10,000
Fig. 7 Constant diffusion with an affine drift: (a) the dashed line represents the linear acceleration and the black curve shows the speedup; (b) the dashed line represents the 100 % efficiency and the black curve shows PALMTREE's efficiency
Table 2 The values used to plot the curves in Fig. 7

Processes    1      12     24     36     48     60     72     84     96     108    120
Time (s)     19020  1749   923    627    460    355    302    273    248    211    205
Speedup      —      10.87  20.60  30.33  41.34  53.57  62.98  69.67  76.69  90.14  92.78
Efficiency   100    90.62  85.86  84.26  86.14  89.29  87.47  82.94  79.88  83.46  73.31
4 Conclusion
The parallelization of stochastic Lagrangian solvers relies on a careful and efficient management of the random numbers. In this paper, we proposed a strategy based on the attachment of the Virtual Random Number Generators to the Objects.
The main advantage of our strategy is the possibility to easily replay some particle paths. This strategy is implemented in the PALMTREE software. PALMTREE uses RNGStreams to benefit from the native split of the random numbers into streams and substreams.
We have shown the efficiency of PALMTREE on two examples in dimension one: the simulation of the Brownian motion in the whole space and the simulation of an advection–diffusion problem with an affine drift term. Independence of the paths was also checked.
Our current work is to perform more tests with various parameters and to link PALMTREE to the platform H2OLAB [16], dedicated to simulations in hydrogeology. In H2OLAB, the drift term is computed in parallel, so that the drift data are split over the MPI processes. The challenge is that the computation of the paths will move from one MPI process to another, which raises issues about communications, good workload balance and an advanced management of the VRNGs in PALMTREE.
Acknowledgments I start by thanking S. Maire and M. Simon, who offered me the possibility to present this work at MCQMC. I thank J. Erhel and G. Pichot for the numerous discussions on Eulerian methods. I am also grateful to T. Dufaud and L.-B. Nguenang for the instructive talks on the MPI library. C. Deltel and G. Andrade-Barroso of IRISA were of great help for the deployment on supercomputers and for understanding the latest C++ standards. Many thanks to G. Landurein for his help in the implementation of PALMTREE. I am indebted to P. L'Ecuyer and B. Tuffin for the very interesting discussions about RNGStreams. I am grateful to D. Imberti for his help with the English language during the writing of this article. I finish with a big thanks to A. Lejay. This work was partly funded by a grant from ANR (H2MNO4 project).
References

1. The C++ Programming Language. https://isocpp.org/std/status (2014)
2. Aronson, D.G.: Non-negative solutions of linear parabolic equations. Annali della Scuola Normale Superiore di Pisa - Classe di Scienze 22(4), 607–694 (1968)
3. Bradley, T., du Toit, J., Giles, M., Tong, R., Woodhams, P.: Parallelization techniques for random number generators. GPU Comput. Gems Emerald Ed. 16, 231–246 (2011)
4. Evans, L.C.: Partial Differential Equations. Graduate Studies in Mathematics, 2nd edn. American Mathematical Society, Providence (2010)
5. Friedman, A.: Partial Differential Equations of Parabolic Type. Dover Books on Mathematics Series. Dover Publications, New York (2008)
6. Gardiner, C.: Stochastic Methods: A Handbook for the Natural and Social Sciences. Springer Series in Synergetics, 4th edn. Springer, Heidelberg (2009)
7. Hundsdorfer, W., Verwer, J.G.: Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Springer Series in Computational Mathematics. Springer, Heidelberg (2003)
8. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Stochastic Modelling and Applied Probability. Springer, Heidelberg (1992)
9. L'Ecuyer, P.: TestU01. http://simul.iro.umontreal.ca/testu01/tu01.html
10. L'Ecuyer, P., Munger, D., Oreshkin, B., Simard, R.: Random numbers for parallel computers: requirements and methods, with emphasis on GPUs. In: Mathematics and Computers in Simulation, revision submitted (2015)
11. L'Ecuyer, P., Simard, R., Chen, E.J., Kelton, W.D.: An object-oriented random-number package with many long streams and substreams. Oper. Res. 50(6), 1073–1075 (2002)
12. Mascagni, M., Srinivasan, A.: Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Trans. Math. Softw. 26(3), 436–461 (2000)
13. Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998)
14. Nash, J.: Continuity of solutions of parabolic and elliptic equations. Am. J. Math. 80(4), 931–954 (1958)
15. Øksendal, B.: Stochastic Differential Equations. Universitext. Springer, Heidelberg (2003)
16. Project-team Sage: H2OLAB. https://www.irisa.fr/sage/research.html
17. Lenôtre, L., Pichot, G.: PALMTREE Library. http://people.irisa.fr/Lionel.Lenotre/software.html
18. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der mathematischen Wissenschaften, 3rd edn. Springer, Berlin (1999)
19. Zheng, C., Bennett, G.D.: Applied Contaminant Transport Modelling. Wiley, New York (2002)
1 Introduction
The need for simulation of truncated multivariate normal distributions appears in many fields, like Bayesian inference for truncated parameter spaces [10, 11],
H. Maatouk (B) · X. Bay
École Nationale Supérieure des Mines de St-Étienne, 158 Cours Fauriel,
Saint-Étienne, France
e-mail: hassan.maatouk@mines-stetienne.fr
X. Bay
e-mail: bay@emse.fr
H. Maatouk
Institut Camille Jordan, Université de Lyon, UMR 5208, F-69622
Villeurbanne Cedex, France
H. Maatouk
Institut de Radioprotection et de Sûreté Nucléaire (IRSN),
92260 Fontenay-aux-Roses, France
Suppose that g is another density function close to f such that, for some finite constant c ≥ 1 called the rejection constant,

    f(x) ≤ c g(x),   ∀x ∈ R^d.                                         (1)

Let f̃ and g̃ be the unnormalized versions of f and g restricted to C, i.e.

    f(x) = f̃(x) / ∫_C f̃(t) dt   and   g(x) = g̃(x) / ∫_C g̃(t) dt.      (2)

If f̃(x) ≤ k g̃(x) on C, then

    f(x) = f̃(x) / ∫_C f̃(t) dt ≤ k g̃(x) / ∫_C f̃(t) dt
         = k ( ∫_C g̃(t) dt / ∫_C f̃(t) dt ) g(x) = c g(x),              (3)

and so (1) holds with c = k ∫_C g̃(t) dt / ∫_C f̃(t) dt.
The target is the multivariate normal density

    f(x | μ, Σ) = 1 / ( (2π)^{d/2} |Σ|^{1/2} ) exp( −½ (x − μ)^⊤ Σ^{−1} (x − μ) ),   x ∈ R^d,   (4)

truncated to the convex set C, the mode of the truncated distribution is

    μ = arg min_{x ∈ C} ½ x^⊤ Σ^{−1} x,                                 (5)

and the proposal is the shifted density

    g(x | μ, Σ) = 1 / ( (2π)^{d/2} |Σ|^{1/2} ) exp( −½ (x − μ)^⊤ Σ^{−1} (x − μ) ).   (6)
Then we prove in the next theorem and corollary that g can be used as a proposal
pdf for rejection sampling on C , and we derive the optimal constant.
Theorem 1 Let f̃ and g̃ be the unnormalized density functions defined as

    f̃(x) = f(x | 0, Σ) 1_{x∈C}   and   g̃(x) = g(x | μ, Σ) 1_{x∈C},

where f and g are defined respectively in (4) and (6). Then there exists a constant k such that

    f̃(x) ≤ k g̃(x).                                                    (7)
Proof Let us start with the one-dimensional case. Without loss of generality, we suppose that C = [μ, +∞[, where μ is positive and Σ = σ². In this case, the condition f̃(x) ≤ k g̃(x) is written

    ∀x ≥ μ,   e^{−x²/(2σ²)} ≤ k e^{−(x−μ)²/(2σ²)},

and so

    k = max_{x ≥ μ} e^{(μ² − 2μx)/(2σ²)} = e^{μ²/(2σ²)} max_{x ≥ μ} e^{−μx/σ²} = e^{μ²/(2σ²)} e^{−μ²/σ²} = e^{−μ²/(2σ²)}.

In the multidimensional case, the same computation gives k = exp( −½ μ^⊤ Σ^{−1} μ ), provided that

    ∀x ∈ C,   x^⊤ Σ^{−1} μ ≥ μ^⊤ Σ^{−1} μ,

that is, (x − μ)^⊤ Σ^{−1} μ ≥ 0 for all x ∈ C.
The angle between the gradient vector Σ^{−1}μ of the function ½ x^⊤ Σ^{−1} x at the mode μ and the dashed vector (x − μ) is acute for all x in C since C is convex (see Fig. 1). Therefore, (x − μ)^⊤ Σ^{−1} μ is non-negative for all x in C.
We can now write the RSM algorithm as follows:
Corollary 1 (RSM Algorithm) Let f̃ and g̃ be the unnormalized density functions defined as

    f̃(x) = f(x | 0, Σ) 1_{x∈C}   and   g̃(x) = g(x | μ, Σ) 1_{x∈C},

where f and g are defined by (4)–(6). Then the random vector X resulting from the following algorithm is distributed according to f̃:

1. Generate X with unnormalized density g̃.
2. Generate U uniformly on [0, 1]. If U ≤ exp( (μ − X)^⊤ Σ^{−1} μ ), accept X; otherwise go back to step 1.
Proof The proof is done by applying Proposition 1 with the optimal constant k of
Theorem 1.
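A minimal Python sketch of the RSM algorithm of Corollary 1; the membership test for C and all names are ours, and the mode mu is assumed to be precomputed:

    import numpy as np

    def rsm_sample(Sigma, mu, in_C, n, seed=0):
        """Draw n samples of N(0, Sigma) truncated to the convex set C by
        rejection sampling from the mode mu (Corollary 1)."""
        rng = np.random.default_rng(seed)
        Sinv_mu = np.linalg.solve(Sigma, mu)        # Sigma^{-1} mu
        out = []
        while len(out) < n:
            x = rng.multivariate_normal(mu, Sigma)  # step 1: shifted proposal
            if not in_C(x):                         # crude rejection onto C
                continue
            if rng.uniform() <= np.exp((mu - x) @ Sinv_mu):  # step 2: RSM test
                out.append(x)
        return np.array(out)

On C the acceptance ratio never exceeds one, since (x − μ)^⊤ Σ^{−1} μ ≥ 0 there by the convexity argument above.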
Remark 1 In practice, we use a crude rejection method to simulate X with unnormalized density g̃ in the RSM algorithm. So if 0 ∈ C, RSM degenerates to crude rejection sampling, since then μ = 0 and f̃ = g̃. Therefore, the RSM method can be seen as a generalization of naive rejection sampling.
Remark 2 Our method only requires the maximum-likelihood point of the pdf restricted to the acceptance region, which is the mode of the truncated multivariate normal distribution. Its numerical calculation is a standard convex quadratic programming problem, see e.g. [14].
3 Performance Comparisons
To investigate the performance of the RSM algorithm, we compare it with existing rejection algorithms. Robert [24], for example, proposed a rejection sampling method in the one-dimensional case. To compare the acceptance rates of RSM with Robert's method, we consider a standard normal variable truncated between μ− and μ+, with μ− fixed to 1. In Robert's method, the average acceptance rate is high when the acceptance interval is small (see Table 2.2 in [24]). In the proposed algorithm, simulating from the shifted distribution (first step of the RSM algorithm) makes the average acceptance rate higher when the acceptance interval is large. As expected, the advantage of the proposed algorithm appears when we have a large gap between μ− and μ+, as shown in Table 1. Thus the RSM algorithm can be seen as complementary to Robert's one.
The performance of the method appears when the probability of being inside the acceptance region is low. In Table 2, we consider the one-dimensional case d = 1 and we only change the position of μ, where the acceptance region is C = [μ, +∞[.
Table 1 Comparison of the average acceptance rates of Robert's method [24] and RSM under the variability of the distance between μ− and μ+

μ+ − μ−   Robert's method (%)   Rejection sampling from the mode (%)   Gain
0.5       77.8                  18.0                                   0.2
1         56.4                  21.2                                   0.3
2         35.0                  27.4                                   0.7
5         11.6                  28.2                                   2.4
10        7.0                   28.4                                   4.0

Table 2 Comparison between crude rejection sampling and RSM under the variability of the position of μ, with C = [μ, +∞[

μ     Crude rejection (%)   RSM (%)   Gain
0.5   30.8                  34.9      1.1
1.0   15.8                  26.2      1.6
1.5   6.7                   20.5      3.0
2.0   2.2                   16.8      7.4
2.5   0.6                   14.2      23.1
3.0   0.1                   12.2      92.0
3.5   0.0                   10.6      455.6
4.0   0.0                   9.3       2936.7
4.5   0.0                   8.4       14166.0
From the last column, we observe that our algorithm outperforms crude rejection sampling. For instance, the proposed algorithm is approximately 14,000 times faster than crude rejection sampling when the acceptance region is [4.5, +∞[. Note also that the acceptance rate remains stable for large μ (near 10 %) for the RSM method, whereas it decreases rapidly to zero for crude rejection sampling.
Now we investigate the performance of the RSM algorithm using a convex set in two dimensions. To do this, we consider a zero-mean bivariate Gaussian random vector x with covariance matrix Σ equal to

    Σ = [ 4    2.5
          2.5  2   ].

Assume that the convex set C ⊂ R² is defined by the following inequality constraints:

    0 ≤ x2 ≤ 10,   x1 ≤ 15,   5x1 − x2 − 15 ≥ 0.

It is the acceptance region used in Figs. 2 and 3. By minimizing a quadratic form subject to linear constraints, we find the mode

    μ = arg min_{x ∈ C} ½ x^⊤ Σ^{−1} x ≈ (3.4, 2.0),

and then we compare crude rejection sampling to RSM.
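A sketch of this mode computation with SciPy's general-purpose solver, used here as a stand-in for the dual quadratic programming method of [14]; the constraint signs follow the reconstruction above:

    import numpy as np
    from scipy.optimize import minimize

    Sigma = np.array([[4.0, 2.5], [2.5, 2.0]])
    Sinv = np.linalg.inv(Sigma)

    # Linear constraints of C, each written as fun(x) >= 0.
    cons = [{'type': 'ineq', 'fun': lambda x: x[1]},                  # x2 >= 0
            {'type': 'ineq', 'fun': lambda x: 10.0 - x[1]},           # x2 <= 10
            {'type': 'ineq', 'fun': lambda x: 15.0 - x[0]},           # x1 <= 15
            {'type': 'ineq', 'fun': lambda x: 5*x[0] - x[1] - 15.0}]  # 5x1 - x2 >= 15

    mode = minimize(lambda x: 0.5 * x @ Sinv @ x, x0=np.array([5.0, 5.0]),
                    constraints=cons).x
    print(mode)   # approximately (3.4, 2.0), as reported in the text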
In Fig. 2, we use crude rejection sampling in 2000 simulations of a N(0, Σ). Given the small number of points in C (black points), it is clear that the algorithm is not efficient. The reason is that the mean of the bivariate normal distribution is outside the acceptance region. In Fig. 3, we first simulate from the shifted distribution centered at the mode μ with the same covariance matrix (step one of the RSM algorithm). In the second step of the RSM algorithm, we then have two types of points (black and gray ones) in the convex set C. The gray points are in C but do not satisfy the inequality condition of the RSM algorithm (see Corollary 1). The black points are in C, and
Table 3 Comparison between crude rejection sampling and RSM with respect to the dimension d

Dimension d            1      2      3      4      5
μ                      2.33   1.29   0.79   0.48   0.25
Crude rejection (%)    1.0    1.0    1.0    1.0    1.0
RSM (%)                15.0   5.2    2.5    1.5    1.2
Gain                   15.0   5.2    2.5    1.5    1.2
respect this inequality constraint. We observe that RSM outperforms crude rejection sampling, with an acceptance rate of 21 % against 3 %.
Now we investigate the influence of the problem dimension d. We simulate a standard multivariate normal distribution X restricted to C = [μ, +∞[^d, where μ is chosen such that P(X ∈ C) = 0.01. The mean of the multivariate normal distribution is then outside the acceptance region. Simulation of truncated normal distributions in multidimensional cases is a difficult problem for rejection algorithms. As shown in Table 3, the RSM algorithm is interesting up to dimension three. However, simulation of the truncated multivariate normal distribution in high dimensions is a difficult problem for exact rejection methods; in that case, an adaptive rejection sampling for Gibbs sampling is needed, see e.g. [13]. From Table 3, we can also remark that when the dimension increases, the parameter μ tends to zero. Hence, the mode μ = (μ, …, μ) tends to the zero mean of the Gaussian vector X, and so the acceptance rate of the proposed method converges to that of crude rejection sampling.
4 Conclusion
In this paper, we develop a new rejection technique, called RSM, to simulate a truncated multivariate normal distribution restricted to a convex set. The proposed method only requires finding the mode of the target probability density function restricted to the convex acceptance region. The proposal density function in the RSM algorithm is the shifted target distribution centered at the mode. We provide a theoretical formula for the optimal constant such that the proposal density function is as close as possible to the target density. An illustrative example comparing RSM with crude rejection sampling is included. The simulation results show that rejection sampling from the mode is more efficient than crude rejection sampling. A comparison with Robert's method in the one-dimensional case is also discussed: the RSM method outperforms Robert's method when the acceptance interval is large and the probability of the normal distribution being inside it is low. The proposed rejection method has been applied in the case where the acceptance region is a convex subset of R^d, and can be extended to non-convex regions by using the convex hull. Note that it is an exact method and it is easy to implement, since the mode is calculated as a Bayesian estimator in many applications. For instance, the proposed algorithm has been used to simulate a conditional Gaussian process with inequality constraints (see [20]). An adaptive rejection sampling for Gibbs sampling would be needed to improve the acceptance rate of the proposed method.
Acknowledgments This work has been conducted within the frame of the ReDice Consortium, gathering industrial (CEA, EDF, IFPEN, IRSN, Renault) and academic (École des Mines de Saint-Étienne, INRIA, and the University of Bern) partners around advanced methods for Computer Experiments. The authors wish to thank Olivier Roustant (EMSE), Laurence Grammont (ICJ, Lyon 1) and Yann Richet (IRSN, Paris) for helpful discussions, as well as the anonymous reviewers for constructive comments and the participants of the MCQMC 2014 conference.
References

1. Botts, C.: An accept-reject algorithm for the positive multivariate normal distribution. Comput. Stat. 28(4), 1749–1773 (2013)
2. Breslaw, J.: Random sampling from a truncated multivariate normal distribution. Appl. Math. Lett. 7(1), 1–6 (1994)
3. Casella, G., George, E.I.: Explaining the Gibbs sampler. Am. Stat. 46(3), 167–174 (1992)
4. Chopin, N.: Fast simulation of truncated Gaussian distributions. Stat. Comput. 21(2), 275–288 (2011)
5. Da Veiga, S., Marrel, A.: Gaussian process modeling with inequality constraints. Annales de la faculté des sciences de Toulouse 21(3), 529–555 (2012)
6. Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
7. Ellis, N., Maitra, R.: Multivariate Gaussian simulation outside arbitrary ellipsoids. J. Comput. Graph. Stat. 16(3), 692–708 (2007)
8. Emery, X., Arroyo, D., Peláez, M.: Simulating large Gaussian random vectors subject to inequality constraints by Gibbs sampling. Math. Geosci. 1–19 (2013)
9. Freulon, X., Fouquet, C.: Conditioning a Gaussian model with inequalities. In: Soares, A. (ed.) Geostatistics Tróia '92, Quantitative Geology and Geostatistics, vol. 5, pp. 201–212. Springer, Netherlands (1993)
10. Gelfand, A.E., Smith, A.F.M., Lee, T.M.: Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. J. Am. Stat. Assoc. 87(418), 523–532 (1992)
11. Geweke, J.: Exact inference in the inequality constrained normal linear regression model. J. Appl. Econom. 1(2), 127–141 (1986)
12. Geweke, J.: Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints and the evaluation of constraint probabilities. In: Proceedings of the 23rd Symposium on the Interface Computing Science and Statistics, pp. 571–578 (1991)
13. Gilks, W.R., Wild, P.: Adaptive rejection sampling for Gibbs sampling. J. R. Stat. Soc. Series C (Applied Statistics) 41(2), 337–348 (1992)
14. Goldfarb, D., Idnani, A.: A numerically stable dual method for solving strictly convex quadratic programs. Math. Progr. 27(1), 1–33 (1983)
15. Griffiths, W.E.: A Gibbs sampler for the parameters of a truncated multivariate normal distribution. Department of Economics - Working Papers Series 856, The University of Melbourne (2002)
16. Hörmann, W., Leydold, J., Derflinger, G.: Automatic Nonuniform Random Variate Generation. Statistics and Computing. Springer, Berlin (2004)
17. Kotecha, J.H., Djuric, P.: Gibbs sampling approach for generation of truncated multivariate Gaussian random variables. IEEE Int. Conf. Acoust. Speech Signal Process. 3, 1757–1760 (1999)
18. Laud, P.W., Damien, P., Shively, T.S.: Sampling some truncated distributions via rejection algorithms. Commun. Stat. - Simulation Comput. 39(6), 1111–1121 (2010)
19. Li, Y., Ghosh, S.K.: Efficient sampling method for truncated multivariate normal and student t-distribution subject to linear inequality constraints. http://www.stat.ncsu.edu/information/library/papers/mimeo2649_Li.pdf
20. Maatouk, H., Bay, X.: Gaussian process emulators for computer experiments with inequality constraints (2014). https://hal.archives-ouvertes.fr/hal-01096751
21. Martino, L., Miguez, J.: An adaptive accept/reject sampling algorithm for posterior probability distributions. In: IEEE/SP 15th Workshop on Statistical Signal Processing, SSP '09, pp. 45–48 (2009)
22. Martino, L., Miguez, J.: A novel rejection sampling scheme for posterior probability distributions. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ICASSP, pp. 2921–2924 (2009)
23. Philippe, A., Robert, C.P.: Perfect simulation of positive Gaussian distributions. Stat. Comput. 13(2), 179–186 (2003)
24. Robert, C.P.: Simulation of truncated normal variables. Stat. Comput. 5(2) (1995)
25. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, Berlin (2004)
26. Rodriguez-Yam, G., Davis, R.A., Scharf, L.L.: Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression (2004). http://www.stat.columbia.edu/~rdavis/papers/CLR.pdf
27. Von Neumann, J.: Various techniques used in connection with random digits. J. Res. Nat. Bur. Stand. 12, 36–38 (1951)
28. Yu, J.-W., Tian, G.-L.: Efficient algorithms for generating truncated multivariate normal distributions. Acta Mathematicae Applicatae Sinica, English Series 27(4), 601 (2011)
Abstract This work investigates the star discrepancies and squared integration errors of two quasi-random point constructions using a one-dimensional generator sequence and the Hilbert space-filling curve. This recursive fractal is proven to maximize locality and passes uniquely through all points of the d-dimensional unit cube. The van der Corput and the golden ratio generator sequences are compared for randomized integro-approximations of both Lipschitz continuous and piecewise constant functions. We found that the star discrepancy of the construction using the van der Corput sequence reaches the theoretically optimal rate when the number of samples is a power of two, while the one using the golden ratio sequence performs optimally for Fibonacci numbers. Since the Fibonacci sequence increases at a slower rate than the exponential in base 2, the golden ratio sequence is preferable when the budget of samples is not known beforehand. Numerical experiments confirm this observation.
Keywords Quasi-random points · Hilbert curve · Discrepancy · Golden ratio sequence · Numerical integration
C. Schretter (B)
ETRO Department, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
e-mail: cschrett@vub.ac.be
C. Schretter
iMinds, Gaston Crommenlaan 8, Box 102, 9050 Ghent, Belgium
Z. He
Tsinghua University, Haidian Dist., Beijing 100084, China
M. Gerber
Université de Lausanne, 1015 Lausanne, Switzerland
N. Chopin
Centre de Recherche en Économie et Statistique, ENSAE, 92245 Malakoff, France
H. Niederreiter
RICAM, Austrian Academy of Sciences, Altenbergerstr. 69, 4040 Linz, Austria
1 Introduction
The Hilbert space-filling curve in two dimensions [1], first described in 1891 by David Hilbert, is a recursively defined fractal path that passes uniquely through all points of the unit square. The Hilbert curve generalizes naturally to higher dimensions and presents interesting potential for the construction of quasi-random point sets and sequences. In particular, its construction ensures the bijectivity, adjacency and nesting properties that we define in the following.
For integers d ≥ 2 and m ≥ 0, let

    I_m^d = { I_m^d(k) := [k, k + 1] · 2^{−dm} }_{k=0}^{2^{dm} − 1}     (1)

be the splitting of [0, 1] into closed intervals of equal size 2^{−dm}, and let S_m^d be the splitting of [0, 1]^d into 2^{dm} closed hypercubes of volume 2^{−dm}. First, writing H : [0, 1] → [0, 1]^d for the Hilbert space-filling curve mapping, the set S_m^d(k) := H(I_m^d(k)) is a hypercube that belongs to S_m^d (bijectivity property). Second, for any k ∈ {0, …, 2^{dm} − 2}, S_m^d(k) and S_m^d(k + 1) have at least one edge in common (adjacency property). Finally, if we split I_m^d(k) into the 2^d successive closed intervals I_{m+1}^d(k_i), k_i = 2^d k + i with i ∈ {0, …, 2^d − 1}, then the S_{m+1}^d(k_i) are simply the splitting of S_m^d(k) into 2^d closed hypercubes of volume 2^{−d(m+1)} (nesting property).
The Hilbert space-filling curve has already been applied to many problems in
computer science such as clustering points [2] and optimizing cache coherence for
efficient database access [3]. The R*-tree data structure has also been proposed
for efficient searches of points and rectangles [4]. Similar space-filling curves have
been used to heuristically propose approximate solutions to the traveling salesman
problem [5]. In computer graphics, the Hilbert curve has been used to define strata
prior to stratified sampling [6]. Very recently, the inverse Hilbert mapping has also
been applied to sequential quasi-Monte Carlo methods [7].
Fig. 1 First three steps of the recursive construction of the Hilbert space-filling curve in two dimensions. The dots snap to the closest vertex on an implicit Cartesian grid that covers the space with an arbitrary precision, increasing with the recursion order of the mapping calculations
The recursive definition of the Hilbert space-filling curve provides levels of detail for approximations of a continuous mapping from 1-D to d-D with d ≥ 2, up to any arbitrary numerical precision. An illustration of the generative process of the curve with increasing recursion order is shown in Fig. 1. Efficient computer implementations exist for computing Hilbert mappings, both in two dimensions [8, 9] and up to 32 or 64 dimensions [10]. Therefore, the Hilbert space-filling curve allows fast constructions of point sets and sequences using a given generator set of coordinates in the unit interval. The remainder of this work focuses on comparing the efficiency of two integro-approximation constructions, using either the van der Corput sequence or the golden ratio sequence [11].
2 Integro-Approximations
Let f(·) be a d-dimensional function that is not analytically integrable on the unit cube [0, 1]^d. We aim at estimating the integral

    μ = ∫_{[0,1]^d} f(X) dX.                                           (2)

Given a one-dimensional sequence x_0, …, x_{n−1} in [0, 1), we obtain a corresponding sequence of points P_0, …, P_{n−1} in [0, 1)^d in the domain of integration via the mapping function H : [0, 1] → [0, 1]^d, which sends samples into the d-dimensional unit cube. The integral can therefore be estimated by the average

    μ̂ = (1/n) Σ_{i=0}^{n−1} f(H(x_i)).                                 (3)
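A sketch of the estimator (3) in Python; hilbert_map is a hypothetical helper standing in for any of the published Hilbert-mapping implementations [8-10]:

    import numpy as np

    def estimate_integral(f, xs, hilbert_map, d=2):
        """Estimate the integral (2) by the average (3): map each 1-D sample
        x_i to the point P_i = H(x_i) in [0, 1)^d, then average f(P_i)."""
        return float(np.mean([f(hilbert_map(x, d)) for x in xs]))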
Recent prior work by He and Owen [12] studied such approximations with the van der Corput sequence as the one-dimensional input for the Hilbert mapping function H. To define the van der Corput sequence, let

    i = Σ_{k=1}^{∞} d_k(i) b^{k−1}                                      (4)

be the digit expansion in base b ≥ 2 of the integer i ≥ 0. Then, the i-th element of the van der Corput sequence is defined as

    x_i = Σ_{k=1}^{∞} d_k(i) b^{−k}.                                    (5)
Fig. 2 The first 13 coordinates generated by the van der Corput (top) and the golden ratio (bottom)
sequences. For this specific choice of number of samples, the points are more uniformly spread on
the unit interval with the golden ratio sequence and the maximum distance between the two closest
coordinates is smaller than in the van der Corput sequence
Fig. 3 The first hundred (top row) and thousand (bottom row) points generated by marching along
the Hilbert space-filling curve with distances given by the van der Corput sequence (left) and the
golden ratio sequence (right). In contrast to using the golden ratio number, the van der Corput
construction generates points that are implicitly aligned on a regular Cartesian grid
The golden ratio sequence is defined by the fractional parts

    x_i = {s + i φ},                                                   (6)

where {t} denotes the fractional part of the real number t, s is an arbitrary shift, and φ is the golden ratio (or golden section) number

    φ = (1 + √5) / 2 ≈ 1.6180339887…;                                  (7)

however, since only fractional parts are retained, we can as well substitute φ by the golden ratio conjugate number

    φ̄ = φ − 1 = 1/φ ≈ 0.6180339887….                                   (8)
In prior work, we explored applications of these golden ratio sequences for generating randomized integration quasi-lattices [14] and for non-uniform sampling [15]. Figure 2 compares the first elements of the van der Corput generator and the golden ratio sequence with s = 0. Figure 3 shows their images in two dimensions through the Hilbert space-filling curve mapping. It is worth pointing out that both the van der Corput and the golden ratio sequences are extensible, while the latter spans the unit interval over a larger range.
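Both generators take only a few lines in Python (a minimal sketch; the function names are ours):

    import math

    def van_der_corput(i, b=2):
        """i-th element of the base-b van der Corput sequence, Eqs. (4)-(5)."""
        x, inv = 0.0, 1.0 / b
        while i > 0:
            i, d = divmod(i, b)   # next digit d_k(i) of i in base b
            x += d * inv
            inv /= b
        return x

    PHI_CONJ = (math.sqrt(5.0) - 1.0) / 2.0   # golden ratio conjugate, Eq. (8)

    def golden_ratio(i, s=0.0):
        """i-th element of the golden ratio sequence, Eq. (6), with shift s."""
        return (s + i * PHI_CONJ) % 1.0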
3 Star Discrepancy
A key corollary of the strong irrationality of the golden ratio is that the coordinates of the golden ratio sequence will not align on any regular grid. Therefore, we could expect that irregularities in the generated sequence of point samples could be advantageous in case the function to integrate contains regular alignments or self-repeating structures. In order to compare their potential performance for integro-approximation problems, we use the star discrepancy to measure the uniformity of the resulting sequence P = (P_0, …, P_{n−1}).
For a = (a_1, …, a_d) ∈ [0, 1]^d, let [0, a) be the anchored box Π_{i=1}^{d} [0, a_i). The star discrepancy of P is

    D*_n(P) = sup_{a ∈ [0,1)^d} | A(P, [0, a)) / n − λ_d([0, a)) |      (9)

with the counting function A giving the number of points from the set P that belong to [0, a), and λ_d being the d-dimensional Lebesgue measure, i.e., the area for d = 2.
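Computing (9) exactly is combinatorial, but a grid search over anchored boxes gives a simple lower-bound estimate (a sketch under that assumption; not the evaluation method used for Fig. 4):

    import numpy as np

    def star_discrepancy_lb(P, grid=64):
        """Lower-bound estimate of D*_n(P), Eq. (9), from the local discrepancy
        over a regular grid of anchored boxes [0, a)."""
        n, d = P.shape
        best = 0.0
        for idx in np.ndindex(*(grid,) * d):
            a = (np.asarray(idx) + 1.0) / grid       # corner a in (0, 1]^d
            count = np.sum(np.all(P < a, axis=1))    # A(P, [0, a))
            best = max(best, abs(count / n - np.prod(a)))
        return best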
Fig. 4 A comparison of the star discrepancies of the dyadic van der Corput (VDC) and the golden ratio (GR) sequences. The dots are evaluated at n = 2^k, k = 1, …, 12 for the VDC construction and at n = F(k), k = 1, …, 18 for the GR construction. The reference line is n^{−1}
For the golden ratio sequence, the corresponding discrepancy bound (10) holds with n = F(k), k ≥ 1.
From the result above, we can see that in most cases the star discrepancy of the golden ratio sequence is smaller than that of the van der Corput sequence. It is also of interest to compare the performance of the resulting point sequences P generated by the van der Corput and golden ratio sequences. For the former, one can prove that the star discrepancy of P is O(n^{−1/d}) [12].
More generally, for an arbitrary one-dimensional point set x_0, …, x_{n−1} in [0, 1], the following result provides a bound for the star discrepancy of the resulting d-dimensional point set P:

Theorem 1 Let x_0, …, x_{n−1} be n ≥ 1 points in [0, 1] and P = {H(x_0), …, H(x_{n−1})}. Then

    D*_n(P) ≤ c ( D*_n( {x_i}_{i=0}^{n−1} ) )^{1/d}                     (11)

for a constant c depending only on d.
Proof For the sake of simplicity we assume that the Hilbert curve starts at (0, …, 0) ∈ [0, 1]^d. Let m ≥ 0 be an arbitrary integer and a ∈ [0, 1)^d be such that S_m^d(0) ⊆ B := [0, a). Let S_m^B = {W ∈ S_m^d : W ⊆ B}, B̄ = ∪ S_m^B and D_m^B = {W ∈ S_m^d : W ∩ (B \ B̄) ≠ ∅}. Then, let D̃_m^B be a set of #D_m^B disjoint subsets of [0, 1]^d such that

    1. ∀W̃ ∈ D̃_m^B, ∃W ∈ D_m^B such that W̃ ⊆ W;   2. ∪ D̃_m^B = ∪ D_m^B;   3. B̄ ∩ (∪ D̃_m^B) = ∅.   (12)

Note that D̃_m^B is obtained by removing boundaries of the elements in D_m^B such that the above conditions 2 and 3 are satisfied. Then, we have

    | A(P, B)/n − λ_d(B) | ≤ | A(P, B̄)/n − λ_d(B̄) | + Σ_{W̃ ∈ D̃_m^B} | A(P, W̃ ∩ B)/n − λ_d(W̃ ∩ B) |.   (13)

To bound the first term on the right-hand side, let S̃_m^B = {S_m^d(0)} ∪ {S_m^d(k) ∈ S_m^d : k ≥ 1, S_m^d(k) ⊆ B, S_m^d(k − 1) ∩ B^c ≠ ∅}, so that B̄ contains #S̃_m^B non-consecutive runs of hypercubes belonging to S_m^d. By the property of the Hilbert curve, consecutive hypercubes in S_m^d correspond to consecutive intervals in I_m^d (adjacency property). Therefore, h(B̄) = H^{−1}(B̄) contains at most #S̃_m^B non-consecutive runs of intervals that belong to I_m^d, so that there exist disjoint closed intervals I_j ⊆ [0, 1], j = 1, …, #S̃_m^B + 1, such that h(B̄) = ∪_{j=1}^{#S̃_m^B + 1} I_j. Hence, since the point set {x_i}_{i=0}^{n−1} is in [0, 1) we have, using Proposition 2.4 of [16],

    | A(P, B̄)/n − λ_d(B̄) | = | A({x_i}, h(B̄))/n − λ_1(h(B̄)) | ≤ 2 (#S̃_m^B + 1) D*_n( {x_i}_{i=0}^{n−1} ).   (14)
To bound #S̃_m^B, let m_1 ≤ m be the smallest positive integer such that S_{m_1}^d(0) ⊆ B and let k_{m_1} be the maximal number of hypercubes in S̃_{m_1}^B. Note that k_{m_1} = 2^{m_1(d−1)}. Indeed, by the definition of m_1, the only way for B̄ to be made of more than one hypercube in S_{m_1}^d is to stack such hypercubes in at most (d − 1) dimensions; otherwise, we could reduce m_1 to (m_1 − 1) due to the nesting property of the Hilbert curve. In each dimension we can stack at most 2^{m_1} hypercubes that belong to S̃_{m_1}^B, so that k_{m_1} = 2^{m_1(d−1)}.

Let m_2 = (m_1 + 1) and B_{m_2} = B \ ∪ S̃_{m_1}^B. Then,

    #S̃_{m_2}^{B_{m_2}} ≤ k_{m_2} := 2^d 2^{m_2(d−1)}.   (15)

More generally, for m_1 ≤ m_k ≤ m, we define B_{m_k} := B_{m_{k−1}} \ ∪ S̃_{m_{k−1}}^{B_{m_{k−1}}}, and #S̃_{m_k}^{B_{m_k}} is bounded by k_{m_k} := 2^d 2^{m_k(d−1)}. Summing over all levels,

    Σ_{j=m_1}^{m} k_j = 2^d 2^{m(d−1)} + 2^d 2^{m_1(d−1)} ( 2^{(m−m_1)(d−1)} − 1 ) / ( 2^{d−1} − 1 ) ≤ 4^d 2^{m(d−1)},   (16)

so that

    | A(P, B̄)/n − λ_d(B̄) | ≤ 2 ( 1 + 4^d 2^{m(d−1)} ) D*_n( {x_i}_{i=0}^{n−1} ).   (17)
For the second term of (13), take W̃ ∈ D̃_m^B and note that W̃ ⊆ S_m^d(k) for some k ∈ {0, …, 2^{dm} − 1}. Then,

    | A(P, W̃ ∩ B)/n − λ_d(W̃ ∩ B) | ≤ A(P, S_m^d(k))/n + λ_d(S_m^d(k))
                                     = A({x_i}, I_m^d(k))/n + λ_1(I_m^d(k))
                                     ≤ 2 λ_1(I_m^d(k)) + 2 D*_n( {x_i}_{i=0}^{n−1} )
                                     = 2 ( 2^{−dm} + D*_n( {x_i}_{i=0}^{n−1} ) ),   (18)

where the last inequality uses the fact that the x_i's are in [0, 1) as well as Proposition 2.4 in [16]. Thus,

    Σ_{W̃ ∈ D̃_m^B} | A(P, W̃ ∩ B)/n − λ_d(W̃ ∩ B) | ≤ 2^d 2^{−m} + 2^d 2^{m(d−1)} D*_n( {x_i}_{i=0}^{n−1} ),   (19)

and, combining (13), (17) and (19),

    | A(P, B)/n − λ_d(B) | ≤ c′ ( 2^{−m} + 2^{m(d−1)} D*_n( {x_i}_{i=0}^{n−1} ) )   (20)

for a constant c′ depending only on d. Finally, if a ∈ [0, 1)^d is such that S_m^d(0) ⊄ [0, a), we proceed exactly as above, but now B̄ is empty and therefore the first term in (13) disappears. To conclude the proof, we choose the optimal value of m such that 2^{−m} ≈ 2^{(d−1)m} D*_n( {x_i}_{i=0}^{n−1} ). Hence, D*_n(P) ≤ c ( D*_n( {x_i}_{i=0}^{n−1} ) )^{1/d} for a constant c depending only on d.
Compared to the result obtained for the van der Corput sequence, which only relies on the Hölder property of the Hilbert curve [12], it is worth noting that Theorem 1 is based on its three key geometric properties: bijectivity, adjacency and nesting.
Theorem 1 is of key importance in this work as it says that the discrepancy of the point set is monotonically related to the discrepancy of the generator sequence. From this point of view, we can see that the star discrepancy of P generated by the golden ratio sequence is O(n^{−1/d} log(n)^{1/d}) for n ≥ 2. Numerical experiments will compare the van der Corput and the golden ratio generator sequences and highlight practical implications for computing the cubatures of four standard test functions.
4 Numerical Experiments
For the scrambled van der Corput sequences, the mean squared error (MSE) for integration of Lipschitz continuous integrands is in O(n^{−1−2/d}) [12]. Additionally, it is also shown in [12] that for discontinuous functions whose boundary of discontinuities has bounded (d − 1)-dimensional Minkowski content, one can get an MSE of O(n^{−1−1/d}). We will compare the two quasi-Monte Carlo constructions using randomized sequences in our following numerical experiments.
We consider first two smooth functions that were studied in [17, 18] and are shown in the first row of Fig. 5. The additive function

    f_1(X) = X_1 + X_2,   X = (X_1, X_2) ∈ [0, 1]^2,                   (21)

and the smooth function f_2,

    X = (X_1, X_2) ∈ [0, 1]^2.                                         (22)
Fig. 5 The four test functions used for the integro-approximation experiments. The smooth functions in the first row are fairly predictable as their variations are locally coherent. However, the functions in the second row contain sharp changes that are difficult to capture with discrete sampling
The cusp function f_3 and the discontinuous function f_4 are defined on the same domain:

    X = (X_1, X_2) ∈ [0, 1]^2,                                         (23)
    X = (X_1, X_2) ∈ [0, 1]^2.                                         (24)
Fig. 6 A comparison of the mean squared errors (MSEs) of the randomized van der Corput and the golden ratio sequences for the smooth functions f_1 ("Additive", top) and f_2 ("Smooth", bottom). The reference line is n^{−2}
Fig. 7 A comparison of the mean squared errors (MSEs) of the randomized van der Corput and golden ratio sequences for the functions f_3 ("Cusp", top) and f_4 ("Discontinuous", bottom). The reference line is n^{−1.5} for the discontinuous step function and n^{−2} for the continuous function
5 Conclusions
This work evaluated the star discrepancy and squared integration error of two constructions of quasi-random points using the Hilbert space-filling curve. We found that using the fractional parts of integer multiples of the golden ratio number often leads to improved results, especially when the number of samples is close to a Fibonacci number. The discrepancy of the point sets increases monotonically with the discrepancy of the generator one-dimensional sequence; therefore the van der Corput sequence leads to optimal results in the specific cases when the generating coordinates are equally spaced.
In future work, we plan to investigate generalizations of the Hilbert space-filling curve in higher dimensions. A deterioration of the discrepancy is expected as the dimension increases, an effect linked to the curse of dimensionality. Since the Hilbert space-filling curve admits a pseudo-inverse operator, the problem of constructing quasi-random samples is reduced to choosing a suitable generator one-dimensional sequence. We therefore hope that the preliminary observations presented here may spark subsequent research towards designing adapted generator sequences, given specific integration problems at hand.
Acknowledgments The authors thank Art Owen for suggesting the experimental comparisons presented here, for his insightful discussions and for his reviews of the manuscript.
References

1. Bader, M.: Space-Filling Curves - An Introduction with Applications in Scientific Computing. Texts in Computational Science and Engineering, vol. 9. Springer, Berlin (2013)
2. Moon, B., Jagadish, H.V., Faloutsos, C., Saltz, J.H.: Analysis of the clustering properties of Hilbert space-filling curve. Technical report, University of Maryland, College Park, MD, USA (1996)
3. Terry, J., Stantic, B., Terenziani, P., Sattar, A.: Variable granularity space filling curve for indexing multidimensional data. In: Proceedings of the 15th International Conference on Advances in Databases and Information Systems, ADBIS'11, pp. 111–124. Springer (2011)
4. Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, pp. 322–331 (1990)
5. Platzman, L.K., Bartholdi III, J.J.: Spacefilling curves and the planar travelling salesman problem. J. ACM 36(4), 719–737 (1989)
6. Steigleder, M., McCool, M.: Generalized stratified sampling using the Hilbert curve. J. Graph. Tools 8(3), 41–47 (2003)
7. Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. J. R. Stat. Soc. Ser. B 77(3), 509–579 (2015)
8. Butz, A.: Alternative algorithm for Hilbert's space-filling curve. IEEE Trans. Comput. 20(4), 424–426 (1971)
9. Jin, G., Mellor-Crummey, J.: SFCGen: a framework for efficient generation of multi-dimensional space-filling curves by recursion. ACM Trans. Math. Softw. 31(1), 120–148 (2005)
10. Lawder, J.K.: Calculation of mappings between one and n-dimensional values using the Hilbert space-filling curve. Research report BBKCS-00-01, University of London (2000)
11. Coxeter, H.S.M.: The golden section, phyllotaxis, and Wythoff's game. Scr. Math. 19, 135–143 (1953)
12. He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space-filling curve. e-print (2014)
13. Franek, V.: An algorithm for QMC integration using low-discrepancy lattice sets. Comment. Math. Univ. Carolin 49(3), 447–462 (2008)
14. Schretter, C., Kobbelt, L., Dehaye, P.O.: Golden ratio sequences for low-discrepancy sampling. J. Graph. Tools 16(2), 95–104 (2012)
15. Schretter, C., Niederreiter, H.: A direct inversion method for non-uniform quasi-random point sequences. Monte Carlo Methods Appl. 19(1), 1–9 (2013)
16. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
17. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Clarendon Press, Oxford (1994)
18. Owen, A.B.: Local antithetic sampling with scrambled nets. Ann. Stat. 36(5), 2319–2343 (2008)
1 Introduction
There are many practical applications for which we need to approximate integrals of multivariate functions. The number of variables d in many applications is huge. It is desirable to know the minimal number of function evaluations that is needed to approximate the integral to within ε, and how this number depends on ε^{−1} and d.
In this paper we consider weighted integration. We restrict ourselves to product weights which control the importance of successive variables and groups of
variables. We consider weighted integration defined over two Sobolev spaces. One
space consists of smooth functions defined over the whole Euclidean space, whereas
P. Siedlecki (B)
Faculty of Mathematics, Informatics and Mechanics, University of Warsaw,
Banacha 2, 02-097 Warszawa, Poland
e-mail: psiedlecki@mimuw.edu.pl
the second one is an anchored space of functions defined on the unit cube that are once differentiable with respect to all variables.
We find necessary and sufficient conditions on product weights to obtain uniform weak tractability for weighted integration. This problem is solved by first establishing a relation between uniform weak tractability and so-called T-tractability. Then we apply known results on T-tractability from [4].
We compare the necessary and sufficient conditions on uniform weak tractability with the corresponding conditions on strong polynomial, polynomial, quasi-polynomial and weak tractability. All these conditions require some specific decay of the product weights. For different notions of tractability the decay is usually different.
We also briefly consider (s, t)-weak tractability, introduced recently in [6]. This notion holds if the minimal number of function evaluations is not exponential in ε^{−s} and d^t. We stress that now s and t can be arbitrary positive numbers. We show that as long as t > 1, weighted integration is (s, t)-weakly tractable for a general tensor product Hilbert space whose reproducing univariate kernel is finitely integrable over its diagonal. This means that as long as we accept the possibility of an exponential dependence on d^τ with τ < t, we do not need decaying product weights and may consider even the case where all product weights are the same.
2 Multivariate Integration
Assume that for every d ∈ N we have a Borel measurable subset D_d of R^d, and ϱ_d : D_d → R_+ is a Lebesgue probability density function, ∫_{D_d} ϱ_d(x) dx = 1. Let F_d be a reproducing kernel Hilbert space of real integrable functions defined on a common domain D_d with respect to the measure μ_d(A) = ∫_A ϱ_d(x) dx defined on all Borel subsets of D_d.
Multivariate integration is the problem INT = {INT_d} with

    INT_d : F_d → R : f ↦ ∫_{D_d} f(x) ϱ_d(x) dx   for every d ∈ N.
We approximate INT_d(f) for f ∈ F_d by algorithms which use only partial information about f. The information about f consists of a finite number of function values f(t_j) at sample points t_j ∈ D_d. In general, the points t_j can be chosen adaptively, that is, the choice of t_j may depend on f(t_i) for i = 1, 2, …, j − 1. The approximation of INT_d(f) is then

    Q_{n,d}(f) = φ_n( f(t_1), f(t_2), …, f(t_n) )

for some, not necessarily linear, function φ_n : R^n → R.
The worst case error of Q_{n,d} is defined as

    e(Q_{n,d}) = sup_{‖f‖_{F_d} ≤ 1} | INT_d(f) − Q_{n,d}(f) |.
Since the use of adaptive information does not help, we can restrict ourselves to considering only non-adaptive algorithms, i.e., the t_j can be given simultaneously, see [1]. It is also known that the best approximations can be achieved by means of linear functions, i.e., φ_n can be chosen as a linear function. This is the result of Smolyak, which can be found in [1]. Therefore, without loss of generality, we only need to consider non-adaptive and linear algorithms of the form

    Q_{n,d}(f) = Σ_{j=1}^{n} a_j f(t_j).

Let n(ε, INT_d) denote the minimal number n of function values for which some algorithm Q_{n,d} has worst case error at most ε. We say that INT = {INT_d} is T-tractable iff there are nonnegative numbers C and t such that

    n(ε, INT_d) ≤ C T(ε^{−1}, d)^t   ∀ε ∈ (0, 1], d ∈ N.

We say that INT = {INT_d} is strongly T-tractable iff there are nonnegative numbers C and t such that

    n(ε, INT_d) ≤ C T(ε^{−1}, 1)^t   ∀ε ∈ (0, 1], d ∈ N.
Examples of T-tractability include polynomial tractability (PT) and strong polynomial tractability (SPT) if T(x, y) = x y, and quasi-polynomial tractability (QPT) if T(x, y) = exp((1 + ln x)(1 + ln y)).
We say that INT = {INT_d} is weakly tractable (WT) iff

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_d) / (ε^{−1} + d) = 0.

As in [5], we say that INT = {INT_d} is uniformly weakly tractable (UWT) iff

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_d) / (ε^{−α} + d^β) = 0   ∀α, β > 0.
it follows that T_{α,β}(x, y) := exp(x^α + y^β) is a generalized tractability function for every α, β ∈ (0, 1).

Lemma 1 INT is uniformly weakly tractable iff it is T_{α,β}-tractable for every α, β ∈ (0, 1).

Proof Suppose that INT is uniformly weakly tractable, i.e.,

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_d) / (ε^{−α} + d^β) = 0   ∀α, β > 0.

Thus, for arbitrary but fixed α, β ∈ (0, 1), there exists t > 0 such that

    ln n(ε, INT_d) ≤ t (ε^{−α} + d^β)   ∀ε ∈ (0, 1], d ∈ N.

Hence

    n(ε, INT_d) ≤ exp( t (ε^{−α} + d^β) )   ∀ε ∈ (0, 1], d ∈ N,

which means that INT is T_{α,β}-tractable.
Assume now that INT is T_{α,β}-tractable for every α, β ∈ (0, 1). That is, for all α, β ∈ (0, 1) there are positive C(α, β) and t(α, β) such that

    n(ε, INT_d) ≤ C(α, β) exp( t(α, β)(ε^{−α} + d^β) )   ∀ε ∈ (0, 1], d ∈ N.

Take now arbitrary positive α and β, which may be larger than 1. Obviously there exist α_0, β_0 ∈ (0, 1) such that α_0 < α and β_0 < β. Since INT is T_{α_0,β_0}-tractable,

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_d) / (ε^{−α} + d^β)
      ≤ lim_{ε^{−1}+d → ∞} ( ln C(α_0, β_0) + t(α_0, β_0)(ε^{−α_0} + d^{β_0}) ) / (ε^{−α} + d^β) = 0.

Hence

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_d) / (ε^{−α} + d^β) = 0   ∀α, β > 0,

i.e., INT is uniformly weakly tractable.
We add that Lemma 1 holds not only for multivariate integration but also for all
multivariate problems.
The reproducing kernel of the first space has the form

    K_d(x, t) = Σ_{u ⊆ {1,2,…,d}} γ_{d,u} Π_{j∈u} R(x_j, t_j),

where

    R(x, t) = 1_M(x, t) min(|x|, |t|)   for x, t ∈ R,

and

    M = {(x, t) ∈ R² : x t ≥ 0}.

We assume that the weights γ are bounded product weights, i.e., γ_{d,∅} = 1 and

    γ_{d,u} = Π_{j∈u} γ_{d,j},                                          (1)
where the γ_{d,j} satisfy 0 ≤ γ_{d,j} < ∞, and the probability density ψ satisfies ∫_R ψ(t) dt = 1.

Theorem 1 Consider the weighted integration problem INT defined above for bounded product weights. Then INT is uniformly weakly tractable iff

    lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β = 0   for all β > 0.
d
Proof Lemma 1 implies that it is sufficient to prove that INT is T, -tractable for
every , (0, 1). Here T, is defined as in Sect. 3. From [4, Corollary 12.4] we
know that INT is T, -tractable iff the following two conditions hold:
ln 1
< ,
ln T, (1 , 1)
0
d
j=1 d, j
lim lim sup
< .
1 d
ln T, (1 , d)
(2)
lim sup
(3)
Since

    lim sup_{ε→0} ln ε^{−1} / ln T_{α,β}(ε^{−1}, 1) = lim_{ε→0} ln ε^{−1} / (ε^{−α} + 1) = 0,

the first condition is satisfied for every α, β ∈ (0, 1) regardless of the choice of the weights γ. Note that for the second condition on T_{α,β}-tractability we have the following equivalence:

    lim_{ε→1} lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} / (ε^{−α} + d^β) < ∞   ⟺   lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β < ∞.
Hence, INT is T_{α,β}-tractable for every α, β ∈ (0, 1) iff

    lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β < ∞   ∀β > 0.              (4)

This is equivalent to

    lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β = 0   ∀β > 0.                  (5)

Indeed, if (4) holds then, for every β > 0,

    lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β = lim_{d→∞} (1/d^{β/2}) ( Σ_{j=1}^{d} γ_{d,j} / d^{β/2} ) = 0.

Since (5) obviously implies (4), we have shown that the weighted integration INT is uniformly weakly tractable iff the condition (5) is satisfied.
After obtaining a necessary and sufficient condition on uniform weak tractability of the weighted integration INT, it is interesting to compare it with the conditions on other types of tractability, which were obtained in [4, Corollary 12.4].
The weighted integration INT is:

- strongly polynomially tractable iff   lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} < ∞,
- polynomially tractable iff   lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} / ln d < ∞,
- quasi-polynomially tractable iff   lim sup_{d→∞} Σ_{j=1}^{d} γ_{d,j} / ln d < ∞,
- uniformly weakly tractable iff   lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β = 0 ∀β > 0,
- weakly tractable iff   lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d = 0.

Note that, depending on the weights γ, the weighted integration INT can satisfy one or several types of tractability.
For example, let γ_{d,j} = 1/j. Then Σ_{j=1}^{d} γ_{d,j} ≈ ln d, so INT is polynomially (and hence uniformly weakly) tractable but not strongly polynomially tractable.
The second space F_{d,γ} is a reproducing kernel Hilbert space on [0, 1]^d with kernel

    K_{d,γ}(x, t) = Σ_{u ⊆ {1,…,d}} γ_{d,u} Π_{j∈u} R(x_j, t_j),

where

    R(x, t) = 1_M(x, t) min(|x − a|, |t − a|)   for x, t ∈ [0, 1],

and γ_{d,u} = Π_{j∈u} γ_{d,j} for non-negative γ_{d,j}.
The weighted integration problem INT_γ = {INT_{d,γ}} is given as in [4, Sect. 12.6.1]:

    INT_{d,γ} : F_{d,γ} → R : f ↦ ∫_{[0,1]^d} f(t) dt.
Theorem 2 Consider the weighted integration problem INT_γ for product weights. Then, for both the absolute and normalized error criteria, INT_γ is uniformly weakly tractable iff

    lim_{d→∞} Σ_{j=1}^{d} γ_{d,j} / d^β = 0   for all β > 0.
The conditions of [4, Corollary 12.4] for this space have the same form as those used in the proof of Theorem 1. Therefore we can repeat the reasoning used in the proof of Theorem 1 to obtain the same condition on uniform weak tractability of the presently considered weighted integration problem.
Assume that the univariate reproducing kernel K is finitely integrable over its diagonal:

    ∫_D K(x, x) ψ(x) dx < ∞.                                           (6)

Consider the tensor product space F_d = ⊗_{j=1}^{d} H(K_{1,d,j}). Then the initial error satisfies

    ‖INT_{d,γ}‖ = Π_{j=1}^{d} ( 1 + γ_{d,j} ∫_{D²} K(x, t) ψ(x) ψ(t) dx dt )^{1/2}.

Hence ‖INT_{d,γ}‖ ≥ 1, and the absolute error criterion is harder than the normalized error criterion.
A standard Monte Carlo argument gives

    n(ε, INT_γ) ≤ ε^{−2} Π_{j=1}^{d} ( 1 + γ_{d,j} ∫_D K(x, x) ψ(x) dx ).

Therefore, using ln(1 + x) ≤ x and the boundedness of the product weights,

    lim_{ε^{−1}+d → ∞} ln n(ε, INT_γ) / (ε^{−s} + d^t)
      ≤ lim_{ε^{−1}+d → ∞} 2 ln ε^{−1} / (ε^{−s} + d^t)
        + lim_{ε^{−1}+d → ∞} ( ∫_D K(x, x) ψ(x) dx ) Σ_{j=1}^{d} γ_{d,j} / (ε^{−s} + d^t)
      ≤ lim_{ε^{−1}+d → ∞} 2 ln ε^{−1} / (ε^{−s} + d^t)
        + lim_{ε^{−1}+d → ∞} c ( ∫_D K(x, x) ψ(x) dx ) d / (ε^{−s} + d^t) = 0

for every s > 0 and t > 1. Hence, we have (s, t)-weak tractability for INT_γ.
From Theorems 1, 2 and 3 we see that strong polynomial, polynomial and weak tractability for weighted integration require some decay conditions on the product weights even for specific Hilbert spaces, whereas (s, t)-weak tractability for t > 1, which is the weakest notion of tractability considered here, holds for all bounded product weights and for general tensor product Hilbert spaces for which the univariate reproducing kernel satisfies (6).
Acknowledgments I would like to thank Henryk Woźniakowski for his valuable suggestions. This project was financed by the National Science Centre of Poland based on the decision number DEC-2012/07/N/ST1/03200. I gratefully acknowledge the support of ICERM during the preparation of this manuscript.
References

1. Bakhvalov, N.S.: On the optimality of linear methods for operator approximation in convex classes of functions. USSR Comput. Math. Math. Phys. 11, 244–249 (1971)
2. Gnewuch, M., Woźniakowski, H.: Quasi-polynomial tractability. J. Complex. 27, 312–330 (2011)
3. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, vol. I. European Mathematical Society, Zürich (2008)
4. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume II: Standard Information for Functionals. European Mathematical Society, Zürich (2010)
5. Siedlecki, P.: Uniform weak tractability. J. Complex. 29, 438–453 (2013)
6. Siedlecki, P., Weimar, M.: Notes on (s, t)-weak tractability: a refined classification of problems with (sub)exponential information complexity. J. Approx. Theory 200, 227–258 (2015)
1 Introduction
The paper provides some progress in the fundamental problem of algorithmic construction of good methods of approximation and numerical integration. Numerical
integration seeks good ways of approximating an integral
f (x)d
m
j f ( j ), = ( 1 , . . . , m ), j ,
j = 1, . . . , m.
(1)
j=1
It is clear that we must assume that f is integrable and defined at the points 1 , . . . , m .
The expression (1) is called a cubature formula (, ) (if Rd , d 2) or a
quadrature formula (, ) (if R) with knots = ( 1 , . . . , m ) and weights
V. Temlyakov (B)
University of South Carolina, Columbia, SC, USA
e-mail: temlyakovv@gmail.com
V. Temlyakov
Steklov Institute of Mathematics, Moscow, Russia
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_30
557
558
V. Temlyakov
f d m ( f, )|.
(2)
There are many different ways to construct good deterministic cubature formulas
beginning with heuristic guess of good knots for a specific class and ending with finding a good cubature formula as a solution (approximate solution) of the optimization
problem
m (W, ).
inf
1 ,..., m ;1 ,...,m
Clearly, the way of solving the above optimization problem is the preferable one.
However, in many cases this problem is very hard (see a discussion in [11]). It was
observed in [10] that greedy-type algorithms provide an efficient way for deterministic constructions of good cubature formulas for a wide variety of function classes.
This paper is a follow up to [10]. In this paper we discuss in detail a greedy-type
algorithmIncremental Algorithmthat was not discussed in [10]. The main advantage of the Incremental Algorithm over the greedy-type algorithms considered in [10]
is that it provides better control of weights of the cubature formula and gives the same
rate of decay of the integration error.
We remind some notations from the theory of greedy approximation in Banach
spaces. The reader can find a systematic presentation of this theory in [12], Chap. 6.
Let X be a Banach space with norm . We say that a set of elements (functions) D
from X is a dictionary if each g D has norm less than or equal to one (g 1) and
the closure of D coincides with X . We note that in [9] we required in the definition
of a dictionary normalization of its elements (g = 1). However, it is pointed out
in [11] that it is easy to check that the arguments from [9] work under assumption
g 1 instead of g = 1. In applications it is more convenient for us to have an
assumption g 1 than normalization of a dictionary.
For an element f X we denote by Ff a norming (peak) functional for f :
F f = 1,
F f ( f ) = f .
559
(2) Define
i,
i,
G i,
m := (1 1/m)G m1 + m /m.
(3) Let
f mi, := f G i,
m .
We show how the Incremental Algorithm can be used in approximation and
numerical integration. We begin with a discussion of the approximation problem. A
detailed discussion, including historical remarks, is presented in Sect. 2. For simplicity, we illustrate how the Incremental Algorithm works in approximation of univariate
trigonometric polynomials.
An expression
m
c j g j , g j D, c j R,
j = 1, . . . , m
j=1
is called m-term polynomial with respect to D. The concept of best m-term approximation with respect to D
m ( f, D) X :=
inf
{c j },{g j D}
f
m
c j g j X
j=1
N
N
(ak cos k2 x + bk sin k2 x) A := |a0 | +
(|ak | + |bk |).
k=1
k=1
560
V. Temlyakov
An advantage of the IA() over other greedy-type algorithms is that the IA() gives
precise control of the coefficients of the approximant. For all approximants G i,
m we
=
1.
Moreover,
we
know
that
all
nonzero
coefficients
of
have the property G i,
m A
the approximant have the form a/m where a is a natural number. In Sect. 2 we prove
the following result.
Theorem 2 For any t RT (N ) the IA(, L p , RT N ) with an appropriate schedule , applied to f := t/t A , provides after m iterations an m-term trigonometric
polynomial G m (t) := G i,
m ( f )t A with the following approximation property
t G m (t) Cm 1/2 (ln N )1/2 t A , G m (t) A = t A ,
with an absolute constant C.
Comparing Theorems 1 and 2 we see that the error bound in Theorem 1 is better
than in Theorem 2ln(1 + N /m) versus lnN . It is important in applications in the
m-term approximation of smoothness classes. The proof of Theorem 1 is based on
the Weak Chebyshev Greedy Algorithm (WCGA). The WCGA is the most powerful
and the most popular in applications greedy-type algorithm. Its Hilbert space version
is known in signal processing under the name Weak Orthogonal Matching Pursuit.
For this reason for the readers convenience we discuss the WCGA in some detail in
Sect. 2 despite the fact that we do not obtain any new results on the WCGA in this
paper.
We note that the implementation of the IA() depends on the dictionary and the
ambient space X . The IA() from Theorem 2 acts with respect to the real trigonometric system 1, cos 2 x, sin 2 x, . . . , cos N 2 x, sin N 2 x in the space X = L p
with p lnN . Relation p lnN means that there are two positive constants C1 and
C2 , which do not depend on N , such that C1 N p C2 N .
We now proceed to results from Sect. 3 on numerical integration. As in [10] we
define a set Kq of kernels possessing the following properties. Let K (x, y) be
a measurable function on x y . We assume that for any x x K (x, )
L q ( y ), for any y y the K (, y) is integrable over x and x K (x, )dx
L q ( y ), 1 q .
For a kernel K K p we define the class
W pK
:= { f : f =
K (x, y)dx.
561
D := {K (x, ), x x }
and define a Banach space X (K , p ) as the L p ( y )-closure of span of D. In Sect. 3
the following theorem is proved.
Theorem 3 Let W pK be a class of functions defined above. Assume that K K p
satisfies the condition
K (x, ) L p ( y ) 1, x x , |x | = 1
and JK X (K , p ). Then for any m there exists (provided by an appropriate Incremental Algorithm) a cubature formula m (, ) with = 1/m, = 1, 2, . . . , m,
and
m (W pK , ) C( p 1)1/2 m 1/2 , 1 < p 2.
Theorem 3 provides a constructive way of finding for a wide variety of classes
W pK cubature formulas that give the error bound similar to that of the Monter Carlo
method. We stress that in Theorem 3 we do not assume any smoothness of the kernel
K (x, y).
gD
(2) Define
m := m := span{ cj }mj=1 ,
and define G cm := G c,
m to be the best approximant to f from m .
(3) Denote
f mc := f mc, := f G cm .
The term weak in this definition means that at the step (1) we do not shoot for
the optimal element of the dictionary, which realizes the corresponding supremum,
562
V. Temlyakov
but are satisfied with weaker property than being optimal. The obvious reason for
this is that we do not know in general that the optimal one exists. Another, practical
reason is that the weaker the assumption the easier to satisfy it and, therefore, easier
to realize in practice.
We consider here approximation in uniformly smooth Banach spaces. For a
Banach space X we define the modulus of smoothness
(u) :=
sup
x=y=1
1
( (x + uy + x uy) 1).
2
The uniformly smooth Banach space is the one with the property
lim (u)/u = 0.
u0
It is well known (see for instance [3], Lemma B.1) that in the case X = L p ,
1 p < we have
u p/ p
if 1 p 2,
(u)
2
( p 1)u /2 if 2 p < .
(3)
p :=
q
,
q 1
(4)
(5)
Both R.S. Ismagilov [5] and V.E. Maiorov [6] used constructive methods to get
their estimates (4) and (5). V.E. Maiorov [6] applied number theoretical methods
563
based on Gaussian sums. The key point of that technique can be formulated in terms
of best m-term approximation of trigonometric polynomials. Let as above RT (N )
be the subspace of real trigonometric polynomials of order N . Using the Gaussian
sums one can prove (constructively) the estimate
m (t, RT ) C N 3/2 m 1 t1 , t RT (N ).
(6)
Denote as above
a0 +
N
N
(ak cos k2 x + bk sin k2 x) A := |a0 | +
(|ak | + |bk |).
k=1
k=1
(7)
Thus (7) is stronger than (6). The following estimate was proved in [1]
m (t, RT ) Cm 1/2 (ln(1 + N /m))1/2 t A , t RT (N ).
(8)
In a way (8) is much stronger than (7) and (6). The proof of (8) from [1] is not
constructive. The estimate (8) has been proved in [1] with the help of a nonconstructive theorem of Gluskin [4]. In [11] we gave a constructive proof of (8). The key
ingredient of that proof is the WCGA. In the paper [2] we already pointed out that
the WCGA provides a constructive proof of the estimate
m ( f, RT ) p C( p)m 1/2 f A ,
p [2, ).
(9)
The known proofs (before [2]) of (9) were nonconstructive (see discussion in [2],
Sect. 5). Thus, the WCGA provides a way of building a good m-term approximant.
However, the step (2) of the WCGA makes it difficult to control the coefficients of
the approximantthey are obtained through the Chebyshev projection of f onto
m . This motivates us to consider the IA() which gives explicit coefficients of the
approximant. We note that the IA() is close to the Weak Relaxed Greedy Algorithm (WRGA) (see [12], Chap. 6). Contrary to the IA(), where we build the
mth approximant G m as a convex combination of the previous approximant G m1
and the newly chosen dictionary element m with a priori fixed coefficients: G m =
(1 1/m)G m1 + m /m, in the WRGA we build G m = (1 m )G m1 + m m
with m [0, 1] chosen from an optimization problem, which depends on f and m.
564
V. Temlyakov
For more detailed comparison of the IA() and the WRGA in application in numerical
integration see [12], pp. 402403.
Second, we proceed to a discussion and proof of Theorem 2. In order to be able
to run the IA() for all iterations we need existence of an element mi, D at the
step (1) of the algorithm for all m. It is clear that the following condition guarantees
such existence.
Condition B. We say that for a given dictionary D an element f satisfies Condition
B if for all F X we have
F( f ) sup F(g).
gD
It is well known (see, for instance, [12], p. 343) that any f A1 (D) satisfies
Condition B. For completeness we give this simple argument here. Take any f
A1 (D). Then for any > 0 there exist g1 , . . . , g N D and numbers a1 , . . . , a N
such that ai > 0, a1 + + a N = 1 and
f
N
ai gi .
i=1
Thus
F( f ) F + F(
N
i=1
A1 (D)
gD
p=
q
, n = 1, 2, . . . .
q 1
m = 1, 2 . . . .
565
In the case f A1 (D) this theorem is proved in [11] (see also [12], Chap. 6). As
we mentioned above Condition B is equivalent to f A1 (D).
We now give some applications of Theorem 5 in the construction of special polynomials. We begin with a general result.
Theorem 6 Let X be a uniformly smooth Banach space with modulus of smoothness
(u) u q , 1 < q 2. For any n elements 1 , 2 , . . . , n , j 1, j = 1, . . . , n,
there exists a subset [1, n] of cardinality || m < n and natural numbers a j ,
j such that
n
aj
1
j X C 1/q m 1/q1 ,
j
n j=1
m
j
a j = m.
m
1
jk X C 1/q m 1/q1 ,
m
k=1
m
1
k=1 m jk
can be
n
aj
1
j C(b)(ln m)1/2 m 1/2 .
j
n j=1
m
j
(10)
n
a j ( p)
1
j p C p 1/2 m 1/2 ,
j
n j=1
m
j( p)
with |( p)| m.
j( p)
a j ( p) = m,
(11)
566
V. Temlyakov
Second, by the Nikolskii inequality (see [7], Chap. 1, S2): for a trigonometric
polynomial t of order N one has
t p C N 1/q1/ p tq ,
we obtain from (11)
C N 1/ p
1 q < p ,
n
a j ( p)
1
j
j
n j=1
m
j( p)
n
a j ( p)
1
j p C p 1/2 N 1/ p m 1/2 .
j
n j=1
m
j( p)
aj
j
j C N 1/ p t
aj
j
j p C p 1/2 N 1/ p m 1/2 .
567
sup
L p ( y ) 1
= J ()
J (y)
m
K ( , y) (y)dy| =
=1
m
K ( , ) L p ( y ) .
(13)
=1
Define the error of optimal cubature formula with m knots for a class W
m (W ) :=
inf
1 ,...,m ; 1 ,..., m
m (W, ).
inf
1 ,...,m ; 1 ,..., m
J ()
m
K ( , ) L p ( y ) .
=1
Thus, the problem of finding the optimal error of a cubature formula with m knots
for the class W pK is equivalent to the problem of best m-term approximation of a
special function J with respect to the dictionary D = {K (x, ), x x }.
Consider a problem of numerical integration of functions K (x, y), y y , with
respect to x, K Kq :
x
K (x, y)dx
m
K ( , y).
=1
K (x, y)dx
m
K ( , y) L q ( y ) .
=1
The above definition of the (K , q)-discrepancy implies right a way the following
relation.
568
V. Temlyakov
Proposition 2
inf
1 ,...,m ; 1 ,..., m
inf
1 ,...,m ; 1 ,..., m
D(m , K , q)
J ()
m
K ( , ) L q ( y ) .
=1
m
K ( , ) L p ( y ) .
=1
h(y)(y)dy.
y
Therefore, the functions |h(y)K (x, y)| and h(y)K (x, y) are integrable on x y
and by Fubinis theorem
F(JK ) =
h(y)
y
K (x, y)dx =
y
=
which proves the Condition B. Applying Theorem 5 and taking into account (3) we
complete the proof.
Proposition 2 and the above proof imply the following theorem on (K , q)discrepancy.
569
[0,1]d
Thus the IA() should find at a step m an approximate solution to the following
optimization problem (over x [0, 1]d )
[0,1]d
i,
i,
| f m1
(y)|q2 f m1
(y)K (x, y)dy
max.
References
1. DeVore, R.A., Temlyakov, V.N.: Nonlinear approximation by trigonometric sums. J. Fourier
Anal. Appl. 2, 2948 (1995)
2. Dilworth, S.J., Kutzarova, D., Temlyakov, V.N.: Convergence of some Greedy Algorithms in
Banach spaces. J. Fourier Anal. Appl. 8, 489505 (2002)
3. Donahue, M., Gurvits, L., Darken, C., Sontag, E.: Rate of convex approximation in non-Hilbert
spaces. Constr. Approx. 13, 187220 (1997)
4. Gluskin, E.D.: Extremal properties of orthogonal parallelpipeds and their application to the
geometry of Banach spaces. Math USSR Sbornik 64, 8596 (1989)
5. Ismagilov, R.S.: Widths of sets in normed linear spaces and the approximation of functions by
trigonometric polynomials, Uspekhi Mat. Nauk, 29 (1974), 161178; English transl. in Russian
Math. Surveys, 29 (1974)
6. Maiorov, V.E.: Trigonometric diameters of the Sobolev classes W pr in the space L q . Math.
Notes 40, 590597 (1986)
7. Temlyakov, V.N.: Approximation of Periodic Functions, Nova Science Publishers, Inc., New
York (1993)
8. Temlyakov, V.N.: Weak greedy algorithms. Adv. Comput. Math. 12, 213227 (2000)
9. Temlyakov, V.N.: Greedy algorithms in Banach spaces. Adv. Comput. Math. 14, 277292
(2001)
570
V. Temlyakov
10. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex.
19, 352391 (2003)
11. Temlyakov, V.N.: Greedy-type approximation in Banach spaces and applications. Constr.
Approx. 21, 257292 (2005)
12. Temlyakov, V.N.: Greedy Approximation. Cambridge University Press, Cambridge (2011)
Abstract This is a tutorial paper that gives the complete proof of a result of Frolov
(Dokl Akad Nauk SSSR 231:818821, 1976, [4]) that shows the optimal order
of convergence for numerical integration of functions with bounded mixed derivatives. The presentation follows Temlyakov (J Complex 19:352391, 2003, [13]),
see also Temlyakov (Approximation of periodic functions, 1993, [12]).
Keywords Frolov cubature Numerical Integration Sobolev space Tutorial
1 Introduction
We study cubature formulas for the approximation of the d-dimensional integral
I( f ) =
[0,1]d
f (x) dx
for functions f with bounded mixed derivatives. For this, let D f , Nd0 , be the
usual (weak) partial derivative of a function f and define the norm
f 2s,mix :=
D f 2L 2 ,
(1)
Nd0 : s
where s N. In the following we will study the class (or in fact the unit ball)
Hds,mix :=
f C sd ([0, 1]d ) : f s,mix 1 ,
(2)
i.e. the closure in C([0, 1]d ) (with respect to s,mix ) of the set of sd-times continuously differentiable functions f with f s,mix 1. Note that these well-studied
classes of functions often appear with different notations, like M W2s , S2s W or S2s H .
M. Ullrich (B)
Johannes Kepler Universitt, 4040 Linz, Austria
e-mail: mario.ullrich@jku.at
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_31
571
572
M. Ullrich
f Hds,mix : supp( f ) (0, 1)d .
(3)
n
a j f (x j )
(4)
j=1
j
for a given set of nodes {x j }nj=1 , x j = (x1 , . . . , xd ) [0, 1]d , and weigths (a j )nj=1 ,
a j R, i.e. the algorithm Q n uses at most n function evaluations of the input function.
The worst case error of Q n in the function class H is defined as
e(Q n , H ) = sup |I ( f ) Q n ( f )|.
f H
573
s,mix
Remark 2 There is a natural generalization of the spaces H ds,mix , say H d,
p , where
the L 2 -norm in (1) is replaced by an L p -norm, 1 < p < . The same lower bounds
as mentioned in Remark 1 are valid also in this case, see [13, Theorem 3.2]. Obviously,
the upper bounds from Theorem 1 hold for these spaces if p 2, since the spaces get
smaller for larger p. For 1 < p < 2 it was proven by Skriganov [10, Theorem 2.1]
that the same algorithm satisfies the optimal order. We refer to [13] and references
therein for more details on this and the more delicate case p = 1.
Remark 3 Besides the cubature rule of Frolov that is analyzed in this paper, there
are several other constructions. Two prominent examples are the Smolyak algorithm
and (higher order) digital nets, see [9, Chap. 15] and [1], respectively. However, it is
proven that the Smolyak algorithm cannot achieve the optimal order of convergence
for the function classes under consideration, see [2, Theorem 5.2], and that the upper
bounds on the error for digital nets are (at the moment) restricted to small smoothness,
see e.g. [6]. In this sense Frolovs cubature is universal, i.e. the same cubature rule
gives the optimal order of convergence for every choice of the parameters s and d.
This is also true in the more general setting of Besov and Triebel-Lizorkin spaces,
see [14].
2 Proof of Theorem 1
2.1 The Algorithm
We start with the construction of the nodes of our cubature rule. See Sloan and Joe [11]
for a more comprehensive introduction to this topic. In the setting of Theorem 1 the
set X [0, 1)d of nodes will be a subset of a lattice X Rd , i.e. x, y X implies
x y X. In fact, we take all points inside the unit cube.
The lattice X will be d-dimensional, i.e. there exists a non-singular matrix
T Rdd such that
(5)
X := T (Zd ) = T x : x Zd .
The matrix T is called the generator of the lattice X. Obviously, every multiple of
X, i.e. cX for some c R, is again a lattice and note that while X is a lattice, it is
not necessarily an integration lattice, i.e. in general we do not have X Zd .
In the following we will fix a generator T and consider all points inside the cube
[0, 1)d of the shrinked lattice a 1 T (Zd ), a > 1, as nodes for our cubature rule for
functions from H ds,mix . That is, we will use the set of points
X ad := a 1 X [0, 1)d ,
where X is given by (5).
a > 1,
(6)
574
M. Ullrich
For the construction of the nodes it remains to present a specific generator matrix
T that is suitable for our purposes. For this, define the polynomials
d
Pd (t) :=
t 2 j + 1 1,
t R.
(7)
j=1
Obviously, the polynomial Pd has only integer coefficients, and it is easy to check
that it is irreducible1 (over Q) and has d different real roots. Let 1 , . . . , d R be
the roots of Pd . Using these roots we define the d d-matrix B by
d
d
j1
B = Bi, j i, j=1 := i
i, j=1
(8)
This matrix is a Vandermonde matrix and hence invertible and we define the generator
matrix of our lattice by
T = (B
)1 ,
(9)
where B
is the transpose of B. It is well known that X := B(Zd ) is the dual lattice
associated with X = T (Zd ), i.e. y X if and only if x, y
Z for all x X.
We define the cubature rule for functions f from H ds,mix by
Q a ( f ) := a d det(T )
f (x),
a > 1.
(10)
xX ad
In the next subsection we will prove that Q a has the optimal order of convergence
for H ds,mix .
Note that Q a ( f ) uses |X ad | function values of f and that the weights of this
algorithm are equal, but do not (in general) sum up to one, i.e. Q a is not a quasiMonte Carlo method. While the number |X ad | of points can be estimated in terms of the
determinant of the corresponding generator matrix, it is in general not equal. In fact, if
a 1 X would be an integration lattice, then it is well known that |X ad | = a d det(T 1 ),
see e.g. [11]. For the general lattices that we consider, we know, however, that these
numbers are of the same order, see Skriganov [10, Theorem 1.1].2
Lemma 1 Let X = T (Zd ) Rd be a lattice with generator T of the form (9), and
let X ad be given by (6). Then there exists a constant C T that is independent of a such
that
d
|X | a d det(T 1 ) C T lnd1 1 + a d
a
polynomial P is called irreducible over Q if P = G H for two polynomials G, H with rational
coefficients implies that one of them has degree zero. This implies that all roots of P must be irra
tional. In fact, every polynomial of the form dj=1 (x b j ) 1 with different b j Z is irreducible,
but has not necessarily d different real roots.
2 Skriganov proved this result for admissible lattices. The required property will be proven in
Lemma 3, see also [10, Lemma 3.1(2)].
1A
575
|X ad |
= 1.
a d det(T 1 )
Remark 4 It is still not clear if the corresponding QMC algorithm, i.e. the cubature
rule (10) with a d det(T ) replaced by |X ad |1 , has the same order of convergence. If
true, this would imply the optimal order of the L p -discrepancy, p < , of a (deterministic) modification of the set X ad , see [5, 10]. We leave this as an open problem. In
fact, Skriganov [10, Corollary 2.1] proved that for every a > 0 there exists a vector
z a Rd such that the translated set X ad z a satisfies the above conditions.
In the remaining subsection we prove the crucial property of these nodes. For
this we need the following corollary of the Fundamental Theorem of Symmetric
Polynomials, see, [3, Theorem 6.4.2].
Lemma 2 Let P(x) = dj=1 (x j ) and G(x1 , . . . , xd ) be polynomials with integer coefficients. Additionally, assume that G(x1 , . . . , xd ) is symmetric in x1 , . . . , xd ,
i.e. invariant under permutations of x1 , . . . , xd . Then, G(1 , . . . , d ) Z.
We obtain that the elements of the dual lattice B(Zd ) satisfy the following.
Lemma 3 Let 0 = z = (z 1 , . . . , z d ) B(Zd ) with B from (8). Then, dj=1 z i
Z \ 0.
Proof Fix m = (m 1 , . . . , m d ) Zd such that Bm = z. Hence,
zi =
d
m j i
j1
j=1
depends only on i . This implies that dj=1 z i is a symmetric polynomial in 1 , . . . , d
with integer coefficients. By Lemma 2, we have dj=1 z i Z.
to prove z i = 0 for i = 1, . . . , d. Define the polynomial R1 (x) :=
dIt remains
j1
m
x
and assume that z = R1 ( ) = 0 for some = 1, . . . , d. Then there
j
j=1
exist unique polynomials G and R2 with rational coefficients such that
Pd (x) = G(x)R1 (x) + R2 (x),
where degree(R2 ) < degree(R1 ). By assumption, R2 ( ) = 0. If R2 0 this is a
contradiction to the irreducibility of Pd . If not, divide Pd by R2 (instead of R1 ).
Iterating this procedure, we will eventually find a polynomial R with degree(R ) >
0 (since it has a root) and rational coefficients that divides Pd : a contradiction to the
irreducibility. This completes the proof of the lemma.
We finish the subsection with a result on the maximal number of nodes in the dual
lattice that lie in an axis-parallel box of fixed volume.
576
M. Ullrich
Corollary 2 Let B be the matrix from (8) and a > 0. Then, for each axis-parallel
box Rd we have
a B(Zd ) a d vold () + 1.
Proof Assume first that vold () < a d . If contains 2 different points z, z
a B(Zd ), then, using that this implies z = z z a B(Zd ), we obtain
vold ()
|z i z i | =
i=1
|z i | a d
i=1
Pd (x) = 2 cos d arccos(x/2) ,
cf. the Chebyshev polynomials. The roots of this polynomial are given by
(2i 1)
,
i = 2 cos
2d
i = 1, . . . , d.
Hence, the construction of the lattice X that is based on this polynomial is completely
explicit. For a suitable polynomial if 2d + 1 is prime, see [7]. We didnt try to find
a completely explicit construction in the intermediate cases.
577
For this we need the following two lemmas. Recall that the Fourier transform of
an integrable function f L 1 (Rd ) is given by
f(y) :=
with y, x
:=
s (y) =
d
j=1
Rd
y Rd ,
y j x j . Furthermore, let
s
d
j=1
|2 y j |
2
=0
|2 y j |2 j ,
y Rd .
(11)
Nd0 : s j=1
Clearly,
s (y)| f(y)|2 =
Nd0 : s
2
d
j
2 i y,x
(2
i
y
)
f
(x)
e
dx
j
d
R j=1
2
D f (y)
Nd0 : s
d
d
M B := # m Z : B ([0, 1] ) m + (0, 1) = . Then, for each f H ds,mix ,
s N, we have
MB
f 2s,mix .
s (z)| f(z)|2
det(B)
d
zB(Z )
f (T (m + x)),
x [0, 1]d .
mZd
Clearly, at most M B of the summands are not zero and g is 1-periodic. Hence, we
obtain by Parsevals identity and Jensens inequality that
578
M. Ullrich
s (z)| f(z)|2 =
2
f (z) =
D
s zB(Zd )
zB(Zd )
= det(T )2
s yZd
Rd
2
D f (x) e2 i By,x
dx
2
D f (T x) e2 i y,x
dx
s yZd
Rd
s yZd
[0,1]d
2
2
2
i
y,x
= det(T )
D
f
(T
(m
+
x))
e
dx
d
[0,1]
s yZd mZd
2
= det(T )2
D g(x) e2 i y,x
dx
= det(T )2
[0,1]d
= det(T )2 M B2
det(T )2 M B2
= det(T )2 M B
D g(x)2 dx
s
[0,1]d
[0,1]d
Rd
2
1
dx
D
f
(T
(m
+
x))
M
B mZd
2
1
D f (T (m + x)) dx
MB
d
mZ
D f (T x)2 dx = det(T ) M B f 2
s,mix
as claimed.
f (x) =
f(y).
yX
xX[0,1)d
f (x) =
xX
f (x) =
xZd
f (T x) =
g(x).
xZd
f(y) =
yX
f(By) =
yZd
= det(T )
= det(T )
yZd
yZd
Rd
f (x) e2 i By,x dx =
yZd
f (T z) e2 i y,z dz = det(T )
Rd
yZd
579
f (x) e2 i y,B
Rd
dx
g(z) e2 i y,z dz
Rd
g(y),
yZd
where we performed the substitution x = T z. (Here, we need that the lattice is fulldimensional.) In particular, the series on the left hand side converges if and only
if the right hand side does. For the proof of this convergence note that f H ds,mix ,
s 1, implies g1,mix gs,mix < . We obtain by Lemma 4 that
2
1 (y)|g(y)|
M B g21,mix <
yZd
1/2
1/2
2
|g(y)|
|1 (y)|1
1 (y) |g(y)|
< ,
y=0
yZd
y=0
g(y)
yZd
yZd
g(z) e2 i y,z dz
Rd
yZd
[0,1]d
g(m + z) e2 i y,z dz =
mZd
g(m).
mZd
The
last equality is simply d the evaluation of the Fourier series of the function
mZd g(m + x), x [0, 1] , at the point x = 0. It follows from the absolute convergence of the left hand side that this Fourier series is pointwise convergent.
By Lemma 5 we can write the algorithm Q a , a > 1, as
Q a ( f ) = a d det(T )
xX ad
f (x) =
f(z),
f H ds,mix ,
za B(Zd )
where a B (see (8)) is the generator of the dual lattice of a 1 T (Zd ) (see (9)) and
X ad = (a 1 X) [0, 1)d . Since I ( f ) = f(0) we obtain
580
M. Ullrich
1/2
1/2
|s (z)|1
s (z) | f(z)|2 .
za B(Zd )\0
za B(Zd )\0
with s from (11). We bound both sums separately. First, note that Lemma 4 implies
that
s (z) | f(z)|2 C(a, B) f 2s,mix
za B(Zd )\0
(12)
za B(Zd )\0
This follows from the fact that Ma B is the number of unit cubes that are necessary
to cover the set a B
([0, 1]d ), and det(a B) is its volume.
Now we treat the first sum. Define, for m = (m 1 , . . . , m d ) Nd0 , the sets
(m) := {x Rd : 2m j 1 |x j | < 2m j for j = 1, . . . , d}.
and note that dj=1 |x j | < 2m1 for all x (m). Recall from Lemma 3 that
d
d
d
d
j=1 z j Z \ 0 for all z B(Z ) \ 0 and, consequently,
j=1 |z j | a for z
d
d
d
a B(Z ) \ 0. This shows that |(a B(Z ) \ 0) (m)| = 0 for all m N0 with m1 <
d log2 (a) =: r . Hence, with |z | := dj=1 max{1, 2 |z j |}, we obtain
za B(Zd )\0
|s (z)|1
za B(Zd )\0
|z |2s =
|z |2s .
Note that for z (m) we have |z | dj=1 max{1, 2 2m j 1 } 2m1 . Since (m)
is a union of 2d axis-parallel boxes each with volume less than 2m1 , Corollary 2
implies that (a B(Zd ) \ 0) (m) 2d (a d 2m1 + 1) 2d+2 2m1 r for m with
m1 r . Additionally, note that {m Nd0 : m1 = } = d+1
< ( + 1)d1 .
We obtain
|s (z)|1
581
(a B(Zd ) \ 0) (m) 22sm1
=r m:m1 =
za B(Zd )\0
2d+2
2m1 r 22sm1
=r m:m1 =
2d+2
( + 1)d1 2r 22s = 2d+2
(t + r + 1)d1 2t 22s(t+r )
=r
d1
log2 a d
1+
t=0
d1
2d+2 a 2sd log2 a d
t=0
t +2
d log2 (a)
d1
2(12s)t
t=0
where we have used that d log2 (a) r < d log2 (a) + 1. Clearly, the last series converges iff a > e1/(2s1) and, in particular, it is bounded by 23 for a e2 and all
s N.
So, all together
d1
e( Q a , H ds,mix ) 2d/2+3 a sd log2 a d 2
(13)
for a > 1 large enough. From Lemma 1 we know that the number of nodes used by
Q a is proportional to a d . This proves Theorem 1.
Remark 6 It is interesting to note that the proof of Theorem 1 is to a large extent
independent of the domain of integration. For an arbitrary Jordan measurable set
Rd we can consider
Q a from (10) with the set of nodes X ad
1the algorithm
d
d
replaced by X a () = a T (Z ) . The only difference in the estimates would
be that C(a, B), cf. (12), converges to vold () instead of 1.
References
1. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy theory and quasi-Monte
Carlo integration. Cambridge University Press, Cambridge (2010)
2. Dung, D., Ullrich, T.: Lower bounds for the integration error for multivariate functions
with mixed smoothness and optimal Fibonacci cubature for functions on the square, Math.
Nachrichten (2015) (to appear)
3. Fine, B., Rosenberger, G.: The fundamental theorem of algebra. Springer-Verlag, New York,
Undergraduate Texts in Mathematics (1997)
4. Frolov, K.K.: Upper error bounds for quadrature formulas on function classes. Dokl. Akad.
Nauk SSSR 231, 818821 (1976)
5. Frolov, K.K.: Upper bound of the discrepancy in metric L p , 2 p < . Dokl. Akad. Nauk
SSSR 252, 805807 (1980)
582
M. Ullrich
6. Hinrichs, A., Markhasin, L., Oettershagen, J., Ullrich, T.: Optimal quasi-Monte Carlo rules
on higher order digital nets for the numerical integration of multivariate periodic functions.
e-prints (2015)
7. Lee, C.-L., Wong, K.B.: On Chebyshevs polynomials and certain combinatorial identities.
Bull. Malays. Math. Sci. Soc. 2(34), 279286 (2011)
8. Nguyen, V.K., Ullrich, M. Ullrich, T.: Change of variable in spaces of mixed smoothness and
numerical integration of multivariate functions on the unit cube (2015) (preprint)
9. Novak, E., Wozniakowaski, H.: Tractability of Multivariate Problems, Volume II: Standard
Information for Functionals EMS Tracts in Mathematics, Vol. 12, Eur. Math. Soc. Publ. House,
Zrich (2010)
10. Skriganov, M.M.: Constructions of uniform distributions in terms of geometry of numbers.
Algebra i Analiz 6, 200230 (1994)
11. Sloan, I.H., Joe, S.: Lattice Methods for Multiple Integration. Oxford Science Publications,
New York (1994)
12. Temlyakov, V.N.: Approximation of Periodic Functions. Computational Mathematics and
Analysis Series. Nova Science Publishers Inc, NY (1993)
13. Temlyakov, V.N.: Cubature formulas, discrepancy, and nonlinear approximation. J. Complex.
19, 352391 (2003)
14. Ullrich, M., Ullrich, T.: The role of Frolovs cubature formula for functions with bounded
mixed derivative, SIAM J. Numer. Anal. 54(2), 969993 (2016)
d
1 2 + 2 K (x , t ) for all x, t Rd .
=1
The [0, 1] are scale parameters, and the > 0 are sometimes called shape parameters. The reproducing kernel K corresponds to some Hilbert space of functions
d generalizes the anisotropic Gaussian reproducing kerdefined on R. The kernel K
nel, whose tractability properties have been established in the literature. We present
sufficient conditions on { }
=1 under which function approximation problems on
Hd are polynomially tractable. The exponent of strong polynomial tractability arises
from bounds on the eigenvalues of positive definite linear operators.
Keywords Function approximation Tractability Product kernels
1 Introduction
This article addresses the problem of function approximation. In a typical application
we are given data of the form yi = f (x i ) or yi = L i ( f ) for i = 1, . . . , n. That is,
a function f is sampled at the locations {x 1 , . . . , x n }, usually referred to as the data
583
584
sites or the design, or more generally we know the values of n linear functionals
L 1 , . . . , L n applied to f . Here we assume that the domain of f is a subset of Rd . The
goal is to construct An f , a good approximation to f that is inexpensive to evaluate.
Algorithms for function approximation based on symmetric positive definite kernels have arisen in both the numerical computation literature [3, 5, 13, 18], and the
statistical learning literature [1, 4, 7, 12, 1417]. These algorithms go by a variety
of names, including radial basis function methods [3], scattered data approximation
[18], meshfree methods [5], (smoothing) splines [17], kriging [15], Gaussian process
models [12] and support vector machines [16].
Many kernels commonly used in practice are associated with a sequence of shape
parameters = { }
=1 , which allows more flexibility in the function approximation problem. Examples of such kernels include the Matrn, the multiquadrics, the
inverse multiquadrics, and the extensively studied Gaussian kernel (also known as the
squared exponential kernel). The anisotropic stationary Gaussian kernel, is given by
d (x, t) := e12 (x1 t1 )2 d2 (xd td )2 =
K
d
e (x t )
2
for all x, t Rd ,
(1)
=1
wor
(An ) :=
sup
f Hd 1
f An f L2 ,
f L2 :=
1/2
f (t) d (t) dt
2
Rd
, (2)
585
Table 1 Error decay rates for the Gaussian kernel as a function of sample size n
Data available
Absolute error criterion
Normalized error criterion
Linear functionals
Function values
n max(r ( ),1/2)
n max(r ( )/[1+1/(2r ( ))],1/4)
n r ( ) , if r ( ) > 0
n r ( )/[1+1/(2r ( ))] , if r ( ) > 1/2
1/
r ( ) := sup > 0
< .
(3)
=1
The kernel studied in this article has the more general product form given below:
d,, (x, t) :=
d (x, t) = K
K
d
(4)
=1
x, t R.
(5)
We assume that we know the eigenpair expansion of the kernel K for univariate
functions in terms of its shape parameter . Many kernels in the numerical integration
and approximation literature take the form of (4), where governs the vertical scale
of the kernel across the th dimension. In particular, taking = 1 for all and
K (x, t) = exp( 2 (x t)2 ) recovers the anisotropic Gaussian kernel (1).
The goal of this paper is to extend the results in Table 1 to the kernel in (4). In
essence we are able to replace r ( ) by r (, ), defined as
1/
( ) < = r { }N ,
r (, ) := sup > 0
(6)
=1
with the convention that the supremum of the empty set is taken to be zero.
The known eigenpair expansion of K does not give us explicit formulae for the
, is a convex
, . However, since K
eigenvalues and eigenfunctions of the kernel K
combination of the constant kernel and a kernel with a known eigenpair expansion,
, by approximatwe can derive upper and lower bounds on the eigenvalues of K
ing the corresponding linear operators by finite rank operators and applying some
inequalities for eigenvalues of matrices. These bounds then imply bounds on the
d , which is of tensor product form. Bounds on the eigenvalues of
eigenvalues of K
K d lead to tractability results for function approximation on Hd .
586
2 Function Approximation
2.1 Reproducing Kernel Hilbert Spaces
d ) denote a reproducing kernel Hilbert space of real functions
Let Hd = H ( K
d
defined on R . The goal is to approximate any function in Hd given a finite number
d : Rd Rd R is symmetric and positive
of data. The reproducing kernel K
definite. It takes the form (4), where K satisfies the unit trace condition:
R
(7)
j=1
inf
An with L j
2.2 Tractability
While typical numerical analysis focuses on the rate of convergence, it does not take
into consideration the effects of d. The study of tractability arises in informationbased complexity and it considers how the error depends on the dimension, d, as
well as the number of data, n.
In particular, we would like to know how ewor (n, Hd ) depends not only on n
but also on d. Because of the focus on d-dependence, the absolute and normalized
error criteria mentioned in Table 1 may lead to different answers. For a given positive
(0, 1) we want to find an algorithm An with the smallest n for which the error does
587
not exceed for the absolute error criterion, and does not exceed ewor (0, Hd ) =
Id for the normalized error criterion. That is,
n
wor
(, Hd ) = min n | e
wor
,
= abs,
(n, Hd )
Id , = norm,
.
(8)
If q = 0 above then we say that I is strongly polynomially tractable and the infimum
of p satisfying the bound above is the exponent of strong polynomial tractability.
The essence of polynomial tractability is to guarantee that a polynomial number
of linear functionals is enough to solve the function approximation problem up to an
error at most . Obviously, polynomial tractability depends on which class, all or
std , is considered and whether the absolute or normalized error is used.
The property of strong polynomial tractability is especially challenging since then
the number of linear functionals needed for an -approximation is independent of d.
Nevertheless, we provide here positive results on strong polynomial tractability.
Rd
d is a positive definite
It is known that W is self-adjoint and positive definite if K
kernel. Moreover (7) implies that W is compact. Let us define the eigenpairs of W
by (d, j , d, j ), where the eigenvalues are ordered, d,1 d,2 , and
W d, j = d, j d, j with d, j , d,i
Hd = i, j for all i, j N.
Note also that for any f Hd we have
f, d, j
L2 = d, j f, d, j
Hd .
Taking f = d,i we see that {d, j } is a set of orthogonal functions in L2 . Letting
1/2
d, j = d, j d, j for all j N,
588
d (x, t) =
K
d, j (x) d, j (t) =
j=1
j=1
To standardize the notation, we shall always write the eigenvalues of the linear
d,, in (4) in a weakly decreasing order
operator corresponding to the kernel K
d,, ,1 d,, ,2 . We drop the dependency on the dimension d to denote the
,
eigenvalues of the linear operator corresponding to the one-dimensional kernel K
in (5) by , ,1 , ,2 . Similarly the eigenvalues of the linear operator corresponding to the one-dimensional kernel K (x, t) are denoted by ,1 ,2 .
A useful relation between the sum of the th power of the multivariate eigenvalues
d,, , j and the sums of the th powers of the univariate eigenvalues , , j is given
by [6, Lemma 3.1]:
j=1
d,,
,j
=
, , j
=1
> 0.
j=1
We are interested in the high dimensional case where d is large, and we want
to establish convergence and tractability results when and/or tend to zero
as . According to [10], strong polynomial tractability holds if the sum of
some powers of eigenvalues are bounded. The following lemma provides us with
some useful inequalities on eigenvalues of the linear operators corresponding to
reproducing kernels.
Lemma 1 Let H (K A ), H (K B ), H (K C ) L2 (R, 1 ) be Hilbert spaces with
symmetric positive definite reproducing kernels K A , K B and K C such that
R
(9)
Let the eigenvalues of the operators be sorted in a weakly decreasing order, i.e.
,1 ,2 . Then these eigenvalues satisfy
C,i+ j+1 a A,i+1 + b B, j+1 , i, j = 1, 2, . . .
(10)
(11)
589
n
x, u j
u j , x L2 (R, 1 ).
j=1
590
, j =
j=1
R
> 0.
(12)
We want to verify whether polynomial tractability holds, namely whether (8) holds.
1
2r (,
)
, j
C3
C2
2
j=2
(14)
hold for all 0 < < sup{ | N}, then it follows that
I is strongly polynomially tractable with exponent
p all = min 2,
1
.
r (, )
c2 := sup
dN
591
1/
d,,
,j
< ,
(15)
j=c1
d,, , j
d
d
=
, , j =
[1 2 + 2 K (t, t)]1 (t) dt
=1
j=1
=1
j=1
d
1 2 + 2 = 1.
=1
2
,
, j 1 + C U ( ) ,
(16)
j=1
where the constant CU does not depend on or . Since all the eigenvalues of K
are non-negative, we clearly have for the first eigenvalue of K ,
, ,1 1.
(17)
,
On the other hand, (13) gives the lower bound of the first eigenvalue of K
, (x, t)1 (x)1 (t) dtdx =
1 2 + 2 K (x, t) 1 (x)1 (t) dtdx
K
R2
R2
2
2
=1 +
K (x, t)1 (x)1 (t) dtdx 1 C1 ( )2 .
(18)
, ,1
R2
(19)
(20)
592
2
,
,j
j=3
, j1 C3 ( )2
(21)
j=3
by (14). Combining (17), (19) and (21) gives (16), where the constant CU = C1 +C3 .
The lower bound we want to establish is that for < 1/(2r (, )),
,
,j
1 + CL ( )
if <
j=1
C2
2C1
1/[2(1 )]
(22)
, ,1 1 C1 ( )2 .
,
,1
(23)
j = 2, 3, . . .
2
,
,j
j=2
, j C2 ( )2 ,
(24)
j=2
where the last inequality follows from (14). Inequalities (23) and (24) together give
2
2
,
1 + (C2 /2)( )2
, j 1 C 1 ( ) + C 2 ( )
j=1
j=1
d,
j
1 + CU ( )2
=
, , j
=1
= exp
j=1
=1
ln 1 + CU ( )
=1
exp CU
=1
( )
< . (25)
593
We now consider the lower bound in the multivariate case and define the set A by
C2 1/[2(1 )]
A =
<
.
2C1
Then
sup
dN
d,,
, , j =
, , j
, , j .
,j
=1
j=1
j=1
A
j=1
N\A
j=1
We want to show that this supremum is infinite for < 1/(2r (, )). We do this by
proving that the first product on the right is infinite. Indeed for < 1/(2r (, )),
1 + CL ( )2 1 + CL
, , j
( )2 = .
A
A
j=1
A
Therefore, p all 1/r (, ), which establishes the formula for p all . The estimates on
ewor-all (n, Hd ) and n wor-abs-all (, Hd ) follow from the definition of strong tractability.
Finally, the exponent of strong tractability is 2 for the isotropic kernel because
r (, ) = 0 in this case. To prove that strong polynomial tractability is equivalent
to polynomial tractability, it is enough to show that polynomial tractability implies
strong polynomial tractability. From [10, Theorem 5.1] we know that polynomial
tractability holds if and only if there exist numbers c1 > 0, q1 0, q2 0 and > 0
such that
1/
d, j
< .
c2 := sup d q2
dN
q1
j=C1 d
If so, then
d q2 (c1 1),
,
,j
,1 c2 < .
j=1
This implies that 1. On the other hand, for = 1 we can take q1 = q2 = 0 and
arbitrarily small C1 , and obtain strong tractability. This completes the proof.
Theorem 1 states that the exponent of strong polynomial tractability is at most
2, while for all shape parameters for which r (, ) > 1/2 the exponent is smaller
than 2. Again, although the rate of convergence of ewor-all (n, Hd ) is always excellent,
594
the dependence on d is eliminated only at the expense of the exponent which must
be roughly 1/ p all . Of course, if we take an exponentially decaying sequence of
the products of scale parameters and shape parameters, say, = q for some
q (0, 1), then r (, ) = and p all = 0. In this case, we have an excellent rate
of convergence without any dependence on d.
(1 + 1 + 2 )2
worabsstd
n
(, Hd )
.
4
wor-std
For the isotropic kernel with = and = for all , the exponent of
strong tractability is at least 2 and strong polynomial tractability is equivalent to
polynomial tractability.
Furthermore if r (, ) > 1/2, then
I is strongly polynomially tractable with exponent of strong polynomial tractability at most
1
1
1
+ 2
= p all + ( p all )2 < 4.
p std =
r (, ) 2r (, )
2
For all d N we have
ewor-std (n, Hd ) n 1/ p = n r (, )/[1+1/(2r (, ))] n ,
std
n wor-abs-std (, Hd ) p
std
0.
Proof The same proofs as for [6, Theorems 5.3 and 5.4] can be used. We only need
to show that the assumption of [9, Theorem 5], which is used in [6, Theorem 5.4], is
satisfied. It is enough to show that there exists p > 1 and B > 0 such that for any
n N,
d,, ,n
595
B
.
np
(26)
Take = 1/(2r (, )). Since the eigenvalues ,n are ordered, we have for n 2,
,n
C3 2
1
1
,
, j
, j
n 1 j=2
n 1 j=2
n1
n
where the last inequality follows from (14). Raising to the power 1/ gives
,n 2
C3
n1
1/
.
C3
n2
1/
1/
= 2 2 C3
n
n2
1/
n 1/
2 2 (3C3 )1/
.
n 1/
C4
,
np
{std, all}.
596
n wor-abs-all (, Hd ) p
all
0.
Proof From [10, Theorem 5.2] we know that strong polynomial tractability holds if
and only if there exits a positive number such that
c2 := sup
d
d,, , j
d,, ,1
j=1
= sup
d
d,,
,1
d,,
,j
j=1
< .
If so, then n wor-nor-all (, Hd ) c2 2 for all (0, 1) and d N, and the exponent
of strong polynomial tractability
infimum of 2 for which c2 < .
# is the
r (, )) from (25).
For all d N, we have
j=1 d,, , j < for = 1/(2
}
<
if
and
only
if
sup
It remains to note that supd {1/d,,
d {1/d,, ,1 } < .
,1
Furthermore note that (18) implies that
sup
d
1
d,, ,1
=1
1
.
1 C1 ( )2
597
#
2
Clearly, r (, ) 1/2 implies that
=1 ( ) < , which yields c2 < .
all
1/r (, ). The estimates on ewor-all (n, Hd ) and
This also proves that p
wor-nor-all
(, Hd ) follow from the definition of strong tractability.
n
598
References
1. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic, Boston (2004)
2. Bernstein, D.S.: Matrix Mathematics. Princeton University, New Jersey (2008)
3. Buhmann, M.D.: Radial Basis Functions. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge (2003)
4. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge
Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge (2007)
5. Fasshauer, G.E.: Meshfree Approximation Methods with Matlab, Interdisciplinary Mathematical Sciences, vol. 6. World Scientific Publishing Co., Singapore (2007)
6. Fasshauer, G.E., Hickernell, F.J., Wozniakowski, H.: On dimension-independent rates of convergence for function approximation with Gaussian kernels. SIAM J. Numer. Anal. 50, 247271
(2012). doi:10.1137/10080138X
7. Hastie, T., Tibshirani, R., Friedman, J.: Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, 2nd edn. Springer Science+Business Media
Inc, New York (2009)
8. Knutson, A., Tao, T.: Honeycombs and sums of Hermitian matrices. Not. AMS 482, 175186
(2001)
9. Kuo, F.Y., Wasilkowski, G.W., Wozniakowski, H.: On the power of standard information for
multivariate approximation in the worst case setting. J. Approx. Theory 158, 97125 (2009)
10. Novak, E., Wozniakowski, H.: Tractability of Multivariate Problems Volume I: Linear Information. EMS Tracts in Mathematics, vol. 6. European Mathematical Society, Zrich (2008)
11. Pietsch, A.: Operator Ideals. North-Holland Publishing Co., Amsterdam (1980)
12. Rasmussen, C.E., Williams, C.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006). http://www.gaussianprocess.org/gpml/
13. Schaback, R., Wendland, H.: Kernel techniques: from machine learning to meshless methods.
Acta Numer. 15, 543639 (2006)
14. Schlkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization,
Optimization, and Beyond. MIT Press, Cambridge (2002)
15. Stein, M.L.: Interpolation of Spatial Data: Some theory for Kriging. Springer, New York (1999)
16. Steinwart, I., Christmann, A.: Support Vector Machines. Springer Science+Business Media,
Inc., New York (2008)
17. Wahba, G.: Spline Models for Observational Data, CBMS-NSF Regional Conference Series
in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
18. Wendland, H.: Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics, vol. 17. Cambridge University Press, Cambridge (2005)
Discrepancy Estimates
For Acceptance-Rejection Samplers
Using Stratified Inputs
Houying Zhu and Josef Dick
Acceptance-rejection sampler
Discrepancy
1 Introduction
The acceptance-rejection algorithm is one of the widely used techniques for sampling
from a distribution when direct simulation is not possible or is expensive. The idea
of this method is to determine a good choice of proposal density (also known as
hat function), and then sample from the proposal density with low cost. For a given
target density : D R+ and a well-chosen proposal density H : D R+ , one
assumes that there exists a constant L < such that (x) < L H (x) for all x in the
domain D. Let u have uniform distribution in the unit interval, i.e. u U ([0, 1]).
Then the plain acceptance-rejection algorithm works in the following way. One first
,
draws X H and u U ([0, 1]), then accepts X as a sample of if u L(X)
H (X)
H. Zhu (B) J. Dick
School of Mathematics and Statistics, The University of New South Wales,
Sydney NSW 2052, Australia
e-mail: houying.zhu@unsw.edu.au
J. Dick
e-mail: josef.dick@unsw.edu.au
Springer International Publishing Switzerland 2016
R. Cools and D. Nuyens (eds.), Monte Carlo and Quasi-Monte Carlo Methods,
Springer Proceedings in Mathematics & Statistics 163,
DOI 10.1007/978-3-319-33507-0_33
599
600
otherwise reject this sample and repeat the sampling step. Note that by applying
this algorithm, one needs to know the value of L. However, in many situations, this
constant is known for the given function or can be estimated.
Devroye [6] gave a construction method of a proposal density for log-concave
densities and Hrmann [17] proposed a rejection procedure, called transformed density rejection, to construct a proposal density. Detailed summaries of this technique
and some extensions can be found in the monographs [3, 18]. For many target densities finding a good proposal density is difficult. To improve efficiency one can also
determine a better choice of driver sequence having the designated proposal density,
which yields a deterministic type of acceptance-rejection method.
The deterministic acceptance-rejection algorithm has been discussed by
Moskowitz and Caflisch [22], Wang [31, 32] and Nguyen and kten [23], where
empirical evidence and a consistency result were given. Two measurements included
therein are the empirical root mean square error (RMSE) and the empirical standard
deviation. However, the discrepancy of samples has not been directly investigated.
Motivated by those papers, in [33] we investigated the discrepancy properties of
points produced by a totally deterministic acceptance-rejection method. We proved
that the discrepancy of samples generated by the acceptance-rejection sampler using
(t, m, s)nets as driver sequences is bounded from above by N 1/s , where the target
density function is defined in (s 1)-dimension and N is the number of samples
generated by the deterministic acceptance-rejection sampler. A lower bound shows
that for any given driver sequence, there always exists a target density such that the
star-discrepancy is bounded below by cs N 2/(s+1) , where cs is a constant depending
only on s.
Without going into details, in the following we briefly review known results in
the more general area of deterministic Markov chain quasi-Monte Carlo.
601
602
show that if the graph can be covered by a small number of elementary intervals,
then an improved rate of convergence can be achieved using (t, m, s)-nets as driver
sequence. In general, this strategy does not work with stratified sampling, unless one
knows the elementary intervals explicitly.
The paper is organized as follows. In Sect. 2 we provide the needed notation and
background. Section 3 introduces the proposed acceptance-rejection sampler using
stratified inputs, followed by the theoretical results including an upper bound on the
star-discrepancy and the L q -discrepancy. Numerical tests are presented in Sect. 3.3
together with a discussion of the results in comparison with the theoretical bounds of
Theorems 1 and 2. For comparison purpose only we do the numerical tests also with
pseudo-random inputs. Section 4 illustrates an improved rate of convergence when
using (t, m, s)-nets as driver sequences. The paper ends with concluding remarks.
2 Preliminaries
We are interested in the discrepancy properties of samples generated by the
acceptance-rejection sampler. We consider the L q -discrepancy and the stardiscrepancy.
Definition 1 (L q -discrepancy) Let 1 q be a real number. For a point set
PN = {x 0 , . . . , x N 1 } in [0, 1)s , the L q -discrepancy is defined by
L q,N (PN ) =
N 1
q 1/q
1
1[0,t) (x n ) ([0, t)) dt
,
[0,1]s N n=0
1, if x n [0, t),
, [0, t) = sj=1 [0, t j ) and is the Lebesgue
0, otherwise.
measure, with the obvious modification for q = . The L ,N -discrepancy is called
the star-discrepancy which is also denoted by D N (PN ).
where 1[0,t) (x n ) =
Later we will consider the discrepancy of samples associated with a density function.
The acceptance-rejection algorithm accepts all points below the graph of the
density function. In order to prove bounds on the discrepancy, we assume that the
set below the graph of the density function admits a so-called Minkowski content.
Definition 2 (Minkowski content) For a set A Rs , let A denote the boundary of
A and let
(( A) )
,
M ( A) = lim
0
2
where ( A) = {x Rs |x y for y A} and denotes the Euclidean
norm. If M ( A) (abbreviated as M A ) exists and is finite, then A is said to admit
an (s 1)dimensional Minkowski content.
603
For simplicity, we consider the Minkowski content associated with the boundary
of a given set, however one could define it in more general sense. Ambrosio et al.
[1] present a detailed discussion of general Minkowski content.
604
(( A) )
< .
2
Thus by the definition of the limit, for any fixed > 2, there exists 0 > 0 such that
(( A) ) M A whenever 0 < 0 .
s c j c j +1
, the largest diagBased on the form of the subcube given by i=1
,
M 1/s M 1/s
1/s
s
onal length
is s M
. We can assume that M > ( s/0 ) , then s M 1/s =:
< 0 and iJ Q i ( A) , where J is the index set for the sets Q i which satisfy
Q i A = . Therefore
|J |
M A
(( A) )
= sM A M 11/s .
1
(Q i )
M
605
Without loss of generality, we can set = 3. Note that the number of boxes Q i
which intersect Jt is bounded by the number of boxes Q i which intersect A,
which completes the proof.
Remark 1 Ambrosio et al. [1] found that for a closed set A Rs , if A has a Lipschitz
boundary, then A admits an (s 1)-dimensional Minkowski content. In particular,
a convex set A [0, 1]s has an (s 1)-dimensional Minkowski content. Note that
the surface area of a convex set in [0, 1]s is bounded by the surface area of the unit
cube [0, 1]s , which is 2s and it was also shown by Niederreiter and Wills [25] that 2s
is best possible. It follows that the Minkowski content M A 2s when A is a convex
set in [0, 1]s .
Lemma 4 Suppose that all the assumptions of Lemma 3 are satisfied. Let N be the
number of points accepted by Algorithm 1. Then we have
M((A) 3s 1/2 M A M 1/s ) N M((A) + 3s 1/2 M A M 1/s ).
Proof The number of points we accept in Algorithm 1 is a random number since the
driver sequence given by stratified inputs is random. Let E(N ) be the expectation
of N . The number of Q i which have non-empty intersection with A is bounded by
l = 3s 1/2 M A M 11/s from Lemma 3. Thus
E[N ] l N E[N ] + l.
Further we have
E[N ] =
M1
i=0
(Q i A)
= M(A).
(Q i )
(1)
(2)
Combining (1) and (2) and substituting l = 3s 1/2 M A M 11/s , one obtains the desired
result.
Before we start to prove the upper bound on the star-discrepancy, our method
requires the well-known BernsteinChernoff inequality.
Lemma 5 [2, Lemma 2] Let 0 , . . . , l1 be independent random variables with
E(i ) = 0 and |i | 1 for all 0 i l 1. Denote by i2 the variance of i , i.e.
2 1/2
i2 = E(i2 ). Set = ( l1
. Then for any > 0 we have
i=0 i )
l1
2e /4 , if 2 ,
P
i
2
2
2e /4 , if 2 .
i=0
606
D N , (Y N(s1) ) =
where C =
[0,1]s1
N 1
1
1
sup
1[0,t) ( yn )
(z)d z ,
N
C
s1
[0,t)
t[0,1]
n=0
(z) dz and s 2.
D N , (Y N(s1) )
s4
2
1
1
2s 2
6M A
((A))
1
1
2 2s
log N
N
1
1
2 + 2s
2(A)
,
N
(3)
Proof Let Jt = ([0, t) [0, 1]) A, where t = (t1 , . . . , ts1 ). Using the notation
from Algorithm 1, let yn be the first s1 coordinates of z n A, for n = 0, . . . , N 1.
We have
M1
N 1
1 Jt (x n ) =
1[0,t) ( yn ).
n=0
n=0
Therefore
N 1
1
1 M1
1
1
(Jt ).
1[0,t) ( yn )
(z)d z =
1 Jt (x n )
N n=0
C [0,t)
N n=0
(A)
(4)
It is noted that
M1
1 Jt (x n )
n=0
M1
N
N
(Jt )
1 Jt (x n ) M(Jt ) + (Jt ) M
(A)
(A)
n=0
M1
1 Jt (x n ) M(Jt ) + M(A) N
n=0
M1
M1
1 Jt (x n ) M(Jt ) + M(A)
1 A (x n )
n=0
n=0
M1
2 sup
1 Jt (x n ) M(Jt ).
t[0,1]s
n=0
(5)
607
(V )
= M(V ),
(Q i )
iJ
1, if z i Q i Jt ,
/ Q i Jt .
0, if z i
By definition,
l1
l1
M1
1 Jt (x n ) M(Jt ) =
i M
(Q i Jt ).
n=0
i=0
(6)
i=0
(7)
(8)
i=0
608
(9)
i=1
see, for instance, [11, Lemma 3.1] and [16, Section 2.1]. This means that we can
restrict ourselves to the elements of 1/M .
In view of (9)
s1
2L N
2
2
+1
P (Ri ; z 1 , . . . , z N ) |1/M |2N 4 2N 4 (2e)s1
< 1,
C
for = 2 2s and N 8e
+ 2.
C
Case 2: On the other hand, if 2 , then by Lemma 5 we obtain
P (Jt ; z 1 , . . . , z N ) l 1/2 (log N )1/2
l
l 1/2 (log N )1/2
4
=P
(i Ei ) 2e
.
(10)
i=1
(2e)s1
2L N
C
s1
+1
< 1,
609
1
1
2s 1/2 N 2 2s (log N )1/2 + 1/M.
1
q 1/q
N
N
1[0,t) ( yn )
(z) dz) dt
,
C [0,t)
[0,1)s1 n=0
where Y N(s1) is the sample set associated with the density function .
Theorem 2 Let the unnormalized density function : [0, 1]s1 R+ satisfy all
the assumptions stated in Theorem 1. Let Y N(s1) be the samples generated by the
acceptance-rejection sampler using stratified inputs in Algorithm 1. Then we have
for 2 q ,
1/q
4 2C((A))(11/s)(11/q)
(11)
Proof Let Jt = ([0, t) [0, 1]) A, where t = (t1 , . . . , ts1 ) [0, 1]s1 . Let
i (t) = 1 Q i Jt (x i ) (Q i Jt )/(Q i ),
where Q i for 0 i M 1 is a disjoint covering of [0, 1)s with (Q i ) = 1/M.
Then E(i (t)) = 0 since we have E[1 Q i Jt (x i )] = M(Q i Jt ). Hence for any
t [0, 1]s1 ,
E[i2 (t)] = E[(1 Q i Jt (x i ) M(Q i Jt ))2 ]
= E[1 Q i Jt (x i )] 2M(Q i Jt )E[1 Q i Jt (x i )] + M 2 2 (Q i Jt )
= M(Q i Jt )(1 M(Q i Jt )) 1/4.
610
i2 (t) =
i=0
l1
i=0
N 1
2 1/2
N
1[0,t) ( yn )
(z) dz) dt
C [0,t)
[0,1)s1 n=0
M1
N (Jt ) 2 1/2
= E
1 Jt (x n )
dt
(A)
[0,1)s1 n=0
M1
E(N )(Jt )
N (Jt ) 2 1/2
E
1 Jt (x n ) M(Jt ) +
dt
(A)
(A)
[0,1)s1
n=0
2 1/2
2 (Jt )
M1
(E(N ) N ) dt
2 E
1 Jt (x n ) M(Jt ) +
,
(A)
[0,1)s1 n=0
where we use (a + b)2 2(a 2 + b2 ).
Then we have
(s1)
E N 2 L 22,N (Y N
1/2
2 E
= 2
= 2
[0,1]s1
[0,1]s1
M1
2
i (t) dt +
i=0
M1
i=0
l1
[0,1]s1 i=0
2 1/2
1
E(N ) N
2
((A))
M1
L 2 2 1/2
i2 (t) dt + 2
i (1)
C
E[i2 (t)]dt +
i=0
l1
L2
C2
i2 (1)
1/2
i=0
1
(L 2 + C 2 )1/2 1/2
L 2 l 1/2
2 + 2
=
l .
4
C 4
2C
PM [0,1]s
(s1)
|N D N (Y N
)| =
sup
sup
PM [0,1]s t[0,1]s1
sup
sup
M1
i (t) =
i=0
sup
l1
i (t)
l1
i (t) l/4.
sup
611
Therefore, for $2 \le q \le \infty$,
$$\Big(E\big[N^q L_{q,N}^q\big(Y_N^{(s-1)}\big)\big]\Big)^{1/q} \le \left(\frac{(L^2+C^2)^{1/2}}{\sqrt{2}\,C}\, l^{1/2}\right)^{2/q} l^{\,1-2/q} \le \frac{(L^2+C^2)^{1/2}}{\sqrt{2}\,C}\, l^{\,1-1/q},$$
which is a consequence of the log-convexity of $L_p$-norms, i.e. $\|f\|_p \le \|f\|_{p_0}^{1-\theta}\|f\|_{p_1}^{\theta}$, where $1/p = (1-\theta)/p_0 + \theta/p_1$. In our case, $p_0 = 2$ and $p_1 = \infty$.
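The interpolation step can be spelled out as follows. This is a short verification of our own, under the assumption $L \ge C$ (which holds since $C = \int \psi \le \sup \psi \le L$); it is not taken verbatim from the paper.

```latex
% Interpolation step spelled out (our own working).
% Take f(t) = N * (local discrepancy at t), p_0 = 2, p_1 = \infty and
% \theta = 1 - 2/q, so that 1/q = (1-\theta)/2. Then
\|f\|_q \le \|f\|_2^{2/q}\,\|f\|_\infty^{1-2/q}
       \le \left(\frac{(L^2+C^2)^{1/2}}{\sqrt{2}\,C}\, l^{1/2}\right)^{2/q} l^{1-2/q}
       \le \frac{(L^2+C^2)^{1/2}}{\sqrt{2}\,C}\, l^{1-1/q}.
% The last inequality uses (L^2+C^2)^{1/2}/(\sqrt{2}\,C) \ge 1 (since L \ge C),
% so raising this factor from the power 2/q \le 1 to the power 1 only
% increases the bound.
```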
Additionally, following from Lemma 4, we have $M \le 2LN/C$ whenever $M > (6 L s^{1/2} M_A / C)^s$. Hence we obtain the desired result by substituting $l = 3 s^{1/2} M_A M^{1-1/s}$ and replacing $M$ in terms of $N$.
Remark 2 It would also be interesting to find out whether (11) still holds for 1 <
q < 2. See Heinrich [15] for a possible proof technique.
We leave it as an open problem.
As a test case we consider the unnormalized density function
$$\psi(x_1, x_2, x_3, x_4) = \frac{1}{4}\big(e^{-x_1} + e^{-x_2} + e^{-x_3} + e^{-x_4}\big), \qquad (x_1, x_2, x_3, x_4) \in [0,1]^4.$$
We consider the $L_q$-discrepancy
$$L_q\big(Y_N^{(s-1)}, \psi\big) = \left(\int_{[0,1]^{s-1}} \left|\frac{1}{N}\sum_{n=0}^{N-1} 1_{[0,t)}(y_n) - \frac{1}{C}\int_{[0,t)} \psi(z)\,\mathrm{d}z\right|^q \mathrm{d}t\right)^{1/q}, \qquad (12)$$
where $C = \int_{[0,1]^{s-1}} \psi(z)\,\mathrm{d}z$ and $t = (t_1, \dots, t_{s-1})$. One can write down a precise formula for the squared $L_2$-discrepancy for the $\psi$ given in this example, which is
$$L_2\big(Y_N^{(s-1)}, \psi\big)^2 = \int_{[0,1]^{s-1}} \Delta_t^2\,\mathrm{d}t = \frac{1}{N^2}\sum_{m,n=0}^{N-1} \prod_{j=1}^{s-1} \big(1 - \max\{y_{m,j}, y_{n,j}\}\big) + \frac{1}{4C^2}\left(\frac{71}{54e^2} - \frac{16}{27e} + \frac{7}{108}\right)$$
$$- \frac{1}{16NC}\sum_{i=0}^{N-1} \sum_{j=1}^{4} \frac{\prod_{k=1}^{4}\big(1 - y_{i,k}^2\big)}{1 - y_{i,j}^2}\Big(1 + e^{-1} - y_{i,j} - e^{-y_{i,j}}\Big),$$
where $\Delta_t$ denotes the integrand of (12) for $q = 2$ and $C = 1 - 1/e$.
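This closed-form expression is straightforward to evaluate. The following sketch (our own, for illustration; the function name is an assumption) computes it with $O(N^2)$ pairwise operations:

```python
# Sketch (ours) evaluating the closed-form squared L2-discrepancy stated
# above for psi(x) = (e^{-x_1}+e^{-x_2}+e^{-x_3}+e^{-x_4})/4, C = 1 - 1/e.
import numpy as np

def l2_discrepancy_sq(y):
    """y: (N, 4) array of accepted samples in [0,1)^4."""
    N = y.shape[0]
    C = 1.0 - np.exp(-1.0)
    # pairwise term: (1/N^2) sum_{m,n} prod_j (1 - max(y_mj, y_nj))
    pair = np.prod(1.0 - np.maximum(y[:, None, :], y[None, :, :]), axis=2).sum()
    pair /= N**2
    # constant term
    const = (71.0/(54.0*np.e**2) - 16.0/(27.0*np.e) + 7.0/108.0) / (4.0 * C**2)
    # cross term with leave-one-out products prod_{k != j} (1 - y_ik^2)
    full = np.prod(1.0 - y**2, axis=1, keepdims=True)
    ratio = full / (1.0 - y**2)
    bracket = 1.0 + np.exp(-1.0) - y - np.exp(-y)
    cross = (ratio * bracket).sum() / (16.0 * N * C)
    return pair + const - cross
```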
Theorem 1 shows that Algorithm 1 can yield a point set satisfying the discrepancy bound (3). To test this result numerically and to compare it to the acceptance-rejection algorithm using random inputs, we performed the following numerical test. We generated 100 independent stratified inputs and 100 independent pseudo-random inputs for the acceptance-rejection algorithm. From the sample sets obtained from the acceptance-rejection algorithm we chose those samples which yielded the fastest rate of convergence for stratified inputs and also for pseudo-random inputs.
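For readers who want to reproduce an experiment of this kind, the following is a minimal sketch of acceptance-rejection with stratified inputs, in the spirit of Algorithm 1; the grid construction, function names, and parameter choices are our own simplifications, not the paper's exact formulation.

```python
# Minimal sketch (our own simplification) of acceptance-rejection sampling
# with stratified inputs: [0,1]^s is split into M = m^s congruent cells,
# one uniform driver point is drawn per cell, and a point x = (x', x_s) is
# accepted when x_s <= psi(x') / L with L >= sup psi.
import numpy as np

def stratified_ar_sample(psi, L, s, m, rng):
    cells = np.stack(np.meshgrid(*[np.arange(m)] * s, indexing="ij"),
                     axis=-1).reshape(-1, s)
    x = (cells + rng.random((m**s, s))) / m     # one stratified point per cell
    accept = x[:, -1] <= psi(x[:, :-1]) / L     # under the scaled density graph
    return x[accept, :-1]                       # first s-1 coords of accepted points

# usage with the example density psi(x) = (e^{-x_1}+...+e^{-x_4})/4, sup psi = 1
rng = np.random.default_rng(7)
psi = lambda x: 0.25 * np.exp(-x).sum(axis=1)
y = stratified_ar_sample(psi, L=1.0, s=5, m=6, rng=rng)   # ~ (1-1/e)*6^5 accepted
```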
Theorem 1 suggests a convergence rate of order $N^{-1/2-1/(2s)} = N^{-0.6}$ for stratified inputs. The numerical results in this test show an empirical convergence rate of order $N^{-0.62}$, see Fig. 1. In comparison, the same test carried out with the stratified inputs replaced by pseudo-random inputs shows a convergence rate of order $N^{-0.55}$. As expected, stratified inputs outperform random inputs.
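The empirical rates quoted here can be obtained by a least-squares fit of $\log(\text{discrepancy})$ against $\log N$; the following sketch is our own illustration of that step, with synthetic data standing in for actual experiment output.

```python
# Sketch (ours) of the rate estimation: the slope of the log-log fit is
# the empirical convergence order, exp(intercept) the fitted constant.
import numpy as np

def fit_rate(Ns, errs):
    slope, intercept = np.polyfit(np.log(Ns), np.log(errs), deg=1)
    return np.exp(intercept), slope     # errs ~ const * N**slope

# sanity check with synthetic data following the Fig. 1 legend fit:
Ns = np.array([1e2, 1e3, 1e4, 1e5])
print(fit_rate(Ns, 2.03 * Ns**-0.62))   # -> approximately (2.03, -0.62)
```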
We also performed numerical experiments to test Theorem 2. For $q = \infty$, the left-hand side in (11) is the $\infty$-moment, i.e. the essential supremum, of the random variable $N L_{\infty,N}(Y_N^{(s-1)})$. Theorem 2 suggests a convergence rate of order $N^{-1/s} = N^{-0.2}$. To compare this result with the numerical performance in our example, we used again 100 independent runs, but now chose the one with the worst convergence rate for each case. With stratified inputs, we get a convergence rate of order $N^{-0.55}$ in this case (see Fig. 1), which may suggest that Theorem 2 is too pessimistic. Note that Theorem 2 only requires very weak smoothness assumptions on the target density, whereas the density in our example is very smooth. This may also explain the difference between the theoretical and numerical results.
We also test Theorem 2 for the case $q = 2$. In this case, the left-hand side of (11) is an $L_2$ average of $N L_{2,N}(Y_N^{(s-1)})$. Theorem 2 with $q = 2$ suggests a convergence rate of $L_{2,N}(Y_N^{(s-1)})$ of order $N^{-1/2-1/(2s)} = N^{-0.6}$.

Fig. 1 Star-discrepancy against the number of points $N$ (log-log scale). Fitted rates: Random-worst $0.74\,N^{-0.45}$; Random-best $1.99\,N^{-0.55}$; Stratified-worst $0.98\,N^{-0.55}$; Stratified-best $2.03\,N^{-0.62}$

Fig. 2 $L_2$-discrepancy against the number of points $N$ (log-log scale, $N = 10^1, \dots, 10^5$). Fitted rates: L2-Stratified $0.26\,N^{-0.59}$; L2-Random $0.24\,N^{-0.50}$

The numerical experiment in Fig. 2 yields a convergence rate of order $N^{-0.59}$, roughly in agreement with Theorem 2 for $q = 2$. For random inputs we get a convergence rate of order $N^{-0.50}$, as one would expect.
$$\prod_{i=1}^{s}\left[\frac{a_i}{b^{d_i}}, \frac{a_i+1}{b^{d_i}}\right)$$
with integers $0 \le a_i < b^{d_i}$ and $d_i \ge 0$ for all $1 \le i \le s$. If $d_1, \dots, d_s$ are such that $d_1 + \dots + d_s = k$, then we say that the elementary interval is of order $k$.
Definition 4 (fair sets) For a given set $P_N = \{x_0, x_1, \dots, x_{N-1}\}$ consisting of $N$ points in $[0,1)^s$, we say that a subset $J$ of $[0,1)^s$ is fair with respect to $P_N$ if
$$\frac{1}{N}\sum_{n=0}^{N-1} 1_J(x_n) = \lambda(J),$$
where $\lambda$ denotes the Lebesgue measure.
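Fairness is easy to test numerically for a concrete point set. The following sketch is our own illustration (function and variable names are assumptions); it checks the defining equality for an anchored box up to floating-point tolerance.

```python
# Sketch (ours) checking Definition 4 numerically: a box J in [0,1)^s is
# fair w.r.t. P_N if the fraction of points in J equals its volume.
import numpy as np

def is_fair(points, lower, upper, tol=1e-12):
    """points: (N, s) array; J = prod_i [lower_i, upper_i)."""
    inside = ((points >= lower) & (points < upper)).all(axis=1).mean()
    volume = np.prod(np.asarray(upper) - np.asarray(lower))
    return abs(inside - volume) <= tol

# Example: the regular grid {0, 1/4, 1/2, 3/4}^2 is fair for dyadic
# elementary intervals of order <= 2 that align with the grid.
g = np.arange(4) / 4
P = np.array([(a, b) for a in g for b in g])
print(is_fair(P, lower=(0.0, 0.5), upper=(0.5, 1.0)))   # True
```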
Recall that $C = \int_{[0,1]^{s-1}} \psi(z)\,\mathrm{d}z$ and $J_t = ([0,t) \times [0,1]) \cap A$. Let $\partial J_t$ denote the boundary of $J_t$ and $\partial [0,1]^s$ the boundary of $[0,1]^s$. For $k \in \mathbb{N}$ we define the covering number
$$\Gamma_k(\psi) = \sup_{t \in [0,1]^{s-1}} \min\Big\{v : U_1, \dots, U_v \in \mathcal{E}_k \text{ with } \partial J_t \setminus \partial[0,1]^s \subseteq \bigcup_{i=1}^{v} U_i\Big\}, \qquad (13)$$
where $\mathcal{E}_k$ denotes the family of elementary intervals of order $k$ and $U_i \cap U_{i'} = \emptyset$ for $1 \le i < i' \le v$. Let $V_1, \dots, V_z \in \mathcal{E}_{m-t}$ with $V_i \subseteq J_t$, $V_i \cap V_{i'} = \emptyset$ for all $1 \le i < i' \le z$ and $V_i \cap U_{i'} = \emptyset$, such that $\bigcup_{i=1}^{z} V_i \cup \bigcup_{i=1}^{v} U_i \supseteq J_t$. We define
$$W = \bigcup_{i=1}^{z} V_i \cup \bigcup_{i=1}^{v} U_i \qquad \text{and} \qquad W^o = \bigcup_{i=1}^{z} V_i.$$
Then $W$ and $W^o$ are fair with respect to the $(t,m,s)$-net, $W^o \subseteq J_t \subseteq W$ and
$$\lambda(W \setminus J_t),\ \lambda(J_t \setminus W^o) \le \lambda(W \setminus W^o) = \sum_{i=1}^{v} \lambda(U_i) = v\, b^{-k}.$$
The proof of the result now follows by the same arguments as the proofs of [33,
Lemma 1 & Theorem 1].
From Lemma 3 we have that if $A$ admits an $(s-1)$-dimensional Minkowski content, then
$$\Gamma_k(\psi) \le c_s\, b^{(1-1/s)k}.$$
This yields a convergence rate of order $N^{-1/s}$ in Lemma 6. Another known example is the following. Assume that $\psi$ is constant. Since the graph of $\psi$ can then be covered by just one elementary interval of order $m - t$, this is the simplest possible case.
results from [24, Sect. 3] (see also [8, pp. 184190] for an exposition in dimensions
s = 1, 2, 3) imply that k () Cs k s1 for some constant Cs which depends only
on s. This yields the convergence rate of order (log N )s1 N 1 in Lemma 6. Thus, in
general, there are constants cs, and Cs, depending only on s and such that
cs, k s1 k () Cs, b(11/s)k ,
(14)
where the lower bound is attained, for example, for numbers of the form
$$\frac{1}{b^{\,l-1}} + \frac{1}{b^{\,2(l-1)}} + \frac{1}{b^{\,3(l-1)}} + \cdots.$$
Let $t \in [0,1)$. In the following we define elementary intervals of order $k \in \mathbb{N}$ which cover $\partial J_t \setminus \partial[0,1]^2$. Assume first that $k$ is a multiple of $\alpha$, and let $g = k/\alpha$. Then we define the following elementary intervals of order $k = g\alpha$:
$$\left[\frac{a_1}{b} + \cdots + \frac{a_{g-1}}{b^{g-1}} + \frac{a_g}{b^{g}},\ \frac{a_1}{b} + \cdots + \frac{a_{g-1}}{b^{g-1}} + \frac{a_g+1}{b^{g}}\right) \times \left[\frac{a_1}{b^{\alpha-1}} + \cdots + \frac{a_{g-1}}{b^{(g-1)(\alpha-1)}} + \frac{a_g}{b^{g(\alpha-1)}},\ \frac{a_1}{b^{\alpha-1}} + \cdots + \frac{a_{g-1}}{b^{(g-1)(\alpha-1)}} + \frac{a_g+1}{b^{g(\alpha-1)}}\right). \qquad (15)$$
Write the $b$-adic expansion of $t$ as
$$t = \frac{t_1}{b} + \cdots + \frac{t_g}{b^{g}} + \frac{t_{g+1}}{b^{g+1}} + \cdots.$$
For $u \ge 1$ we additionally use the elementary intervals of order $k$ given by
$$\left[\frac{t_1}{b} + \cdots + \frac{t_{g+u-1}}{b^{g+u-1}} + \frac{c_u}{b^{g+u}},\ \frac{t_1}{b} + \cdots + \frac{t_{g+u-1}}{b^{g+u-1}} + \frac{c_u+1}{b^{g+u}}\right) \times \left[\frac{d_1}{b} + \cdots + \frac{d_{g(\alpha-1)-u}}{b^{g(\alpha-1)-u}},\ \frac{d_1}{b} + \cdots + \frac{d_{g(\alpha-1)-u}}{b^{g(\alpha-1)-u}} + \frac{1}{b^{g(\alpha-1)-u}}\right). \qquad (16)$$
5 Concluding Remarks
In this paper, we study an acceptance-rejection sampling method using stratified inputs. We examine the star-discrepancy and the $L_q$-discrepancy and show that the star-discrepancy is bounded by a quantity of order $N^{-1/2-1/(2s)}$, which is slightly better than the rate of plain Monte Carlo. A bound on the $L_q$-discrepancy is given through an estimation of $\big(E[N^q L_{q,N}^q]\big)^{1/q}$. It is established that $\big(E[N^q L_{q,N}^q]\big)^{1/q}$ achieves an order of convergence of $N^{(1-1/s)(1-1/q)}$ for $2 \le q \le \infty$. Unfortunately, our arguments do not yield an improvement for the case $1 < q < 2$. From our numerical experiments we can see that using stratified inputs in the acceptance-rejection sampler outperforms the original algorithm. The numerical results are roughly in agreement with the upper bounds in Theorems 1 and 2.
We also find that the upper bound for the star-discrepancy using a deterministic driver sequence can be improved to a rate of order $N^{-\delta}$ for $1/s < \delta < 1$ under some assumptions. An example illustrates these theoretical results.
Acknowledgments The work was supported by Australian Research Council Discovery Project
DP150101770. We thank Daniel Rudolf and the anonymous referee for many very helpful comments.
References
1. Ambrosio, L., Colesanti, A., Villa, E.: Outer Minkowski content for some classes of closed sets. Math. Ann. 342, 727–748 (2008)
2. Beck, J.: Some upper bounds in the theory of irregularities of distribution. Acta Arith. 43, 115–130 (1984)
3. Botts, C., Hörmann, W., Leydold, J.: Transformed density rejection with inflection points. Stat. Comput. 23, 251–260 (2013)
4. Chen, S.: Consistency and convergence rate of Markov chain quasi Monte Carlo with examples. Ph.D. thesis, Stanford University (2011)
5. Chen, S., Dick, J., Owen, A.B.: Consistency of Markov chain quasi-Monte Carlo on continuous state spaces. Ann. Stat. 39, 673–701 (2011)
6. Devroye, L.: A simple algorithm for generating random variates with a log-concave density. Computing 33, 247–257 (1984)
7. Devroye, L.: Nonuniform Random Variate Generation. Springer, New York (1986)
8. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
9. Dick, J., Rudolf, D.: Discrepancy estimates for variance bounding Markov chain quasi-Monte Carlo. Electron. J. Probab. 19, 1–24 (2014)
10. Dick, J., Rudolf, D., Zhu, H.: Discrepancy bounds for uniformly ergodic Markov chain quasi-Monte Carlo. http://arxiv.org/abs/1303.2423 [stat.CO], submitted (2013)
11. Doerr, B., Gnewuch, M., Srivastav, A.: Bounds and constructions for the star-discrepancy via δ-covers. J. Complex. 21, 691–709 (2005)
12. Gerber, M., Chopin, N.: Sequential quasi-Monte Carlo. J. R. Stat. Soc. B 77, 1–44 (2015)
13. Gnewuch, M.: Bracketing numbers for axis-parallel boxes and application to geometric discrepancy. J. Complex. 24, 154–172 (2008)
14. He, Z., Owen, A.B.: Extensible grids: uniform sampling on a space-filling curve. J. R. Stat. Soc. B 1–15 (2016)
15. Heinrich, S.: The multilevel method of dependent tests. In: Balakrishnan, N., Melas, V.B., Ermakov, S.M. (eds.) Advances in Stochastic Simulation Methods, pp. 47–62. Birkhäuser (2000)
16. Heinrich, S., Novak, E., Wasilkowski, G.W., Woźniakowski, H.: The inverse of the star-discrepancy depends linearly on the dimension. Acta Arith. 96, 279–302 (2001)
17. Hörmann, W.: A rejection technique for sampling from T-concave distributions. ACM Trans. Math. Softw. 21, 182–193 (1995)
18. Hörmann, W., Leydold, J., Derflinger, G.: Automatic Nonuniform Random Variate Generation. Springer, Berlin (2004)
19. Kuipers, L., Niederreiter, H.: Uniform Distribution of Sequences. Wiley, New York (1974)
20. L'Ecuyer, P., Lécot, C., Tuffin, B.: A randomized quasi-Monte Carlo simulation method for Markov chains. Oper. Res. 56, 958–975 (2008)
21. Morokoff, W.J., Caflisch, R.E.: Quasi-Monte Carlo integration. J. Comput. Phys. 122, 218–230 (1995)
22. Moskowitz, B., Caflisch, R.E.: Smoothness and dimension reduction in quasi-Monte Carlo methods. Math. Comput. Model. 23, 37–54 (1996)
23. Nguyen, N., Ökten, G.: The acceptance-rejection method for low discrepancy sequences (2014)
24. Niederreiter, H.: Point sets and sequences with small discrepancy. Monatshefte für Mathematik 104, 273–337 (1987)
25. Niederreiter, H., Wills, J.M.: Diskrepanz und Distanz von Maßen bezüglich konvexer und Jordanscher Mengen (German). Mathematische Zeitschrift 144, 125–134 (1975)
26. Owen, A.B.: Monte Carlo Theory, Methods and Examples. http://www-stat.stanford.edu/~owen/mc/. Last accessed Apr 2016
27. Robert, C., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
28. Roberts, G.O., Rosenthal, J.S.: Variance bounding Markov chains. Ann. Appl. Probab. 18, 1201–1214 (2008)
29. Tribble, S.D.: Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences. Ph.D. thesis, Stanford University (2007)
30. Tribble, S.D., Owen, A.B.: Constructions of weakly CUD sequences for MCMC sampling. Electron. J. Stat. 2, 634–660 (2008)
31. Wang, X.: Quasi-Monte Carlo integration of characteristic functions and the rejection sampling method. Comput. Phys. Commun. 123, 16–26 (1999)
32. Wang, X.: Improving the rejection sampling method in quasi-Monte Carlo methods. J. Comput. Appl. Math. 114, 231–246 (2000)
33. Zhu, H., Dick, J.: Discrepancy bounds for deterministic acceptance-rejection samplers. Electron. J. Stat. 8, 678–707 (2014)
Index
B
Barth, Andrea, 209
Bay, Xavier, 521
Belomestny, Denis, 229
Binder, Nikolaus, 423
Bréhier, Charles-Edouard, 245
C
Carbone, Ingrid, 261
Chen, Nan, 229
Chopin, Nicolas, 531
D
Dahm, Ken, 423
Dereich, Steffen, 3
Dick, Josef, 599
Durrande, Nicolas, 315
G
Gantner, Robert N., 271
Genz, Alan, 289
Gerber, Mathieu, 531
Giles, Michael B., 303
Ginsbourger, David, 315
Goda, Takashi, 331
Göncü, Ahmet, 351
Goudenège, Ludovic, 245
H
He, Zhijian, 531
Hickernell, Fred J., 367, 407, 583
Hinrichs, Aicke, 385
Hoel, Håkon, 29
Hofer, Roswitha, 87
Hussaini, M. Yousuff, 351
Häppölä, Juho, 29
J
Jakob, Wenzel, 107
Jiménez Rugama, Lluís Antoni, 367, 407
K
Keller, Alexander, 423
Kritzer, Peter, 437
Kucherenko, Sergei, 455
Kunsch, Robert J., 471
L
Lang, Annika, 489
Lenôtre, Lionel, 507
Lenz, Nicolas, 315
Lester, Christopher, 303
Li, Sangmeng, 3
Liu, Yaning, 351
M
Maatouk, Hassan, 521
Matsumoto, Makoto, 143
N
Niederreiter, Harald, 87, 531
Novak, Erich, 161
O
Oettershagen, Jens, 385
Ohori, Ryuichi, 143, 331
Ökten, Giray, 351
P
Pillichshammer, Friedrich, 437
R
Robert, Christian P., 185
Roustant, Olivier, 315
S
Schretter, Colas, 531
Schuhmacher, Dominic, 315
Schwab, Christoph, 209, 271
Siedlecki, Paweł, 545
Song, Shufang, 455
Šukys, Jonas, 209
Suzuki, Kosuke, 331
T
Temlyakov, Vladimir, 557
Tempone, Raúl, 29
Trinh, Giang, 289
Tudela, Loïc, 245
U
Ullrich, Mario, 571
W
Wang, Yiwei, 229
Whittle, James, 303
Y
Yoshiki, Takehito, 331
Z
Zhou, Xuan, 583
Zhu, Houying, 599