by
SIMON McFADZEAN FERGUSON
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF
THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
(Institute of Applied Mathematics)
(Department of Mathematics)
THE UNIVERSITY OF BRITISH COLUMBIA
March 1995
Simon McFadzean Ferguson
In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department of Mathematics
The University of British Columbia
Vancouver, Canada
Abstract
Geostatistics involves the statistical estimation of erratic surfaces, similar to those found in geology, using sample data. It has been my experience that there are few texts in geostatistics written for people who are new to the subject and who have not been immersed in it since its inception in the 1960s. To prevent other people from becoming confused by the changing notation and unspoken assumptions, I provide an overview of this subject, with the aim of providing a clearer understanding of the concepts involved, the assumptions made, and the motivation behind each type of estimator. I then concentrate on the more general form of estimation that assumes a nonhomogeneous trend, called Universal Kriging.
I explain in detail how this estimator can be found in an accurate and computationally efficient way. Using the information gained from robustness studies of this estimator, I then attempt to apply it to real surfaces, for different methods of covariance estimation and trend orders.
Table of Contents
Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgements
General Notation
1 Introduction
2 The Evolution of Geostatistics
2.1 Derivation of Matheron's Ordinary Kriging (OK) estimator, $\hat{Z}_{OK}(x)$
2.2 The Histogram
2.4 Nonlinear estimators
2.4.1 Disjunctive Kriging (DK)
2.4.2 Indicator Kriging (IK)
3 Kriging with a nonhomogeneous trend
3.1.1 Discussion
3.2.1 Cholesky factorisation
3.2.2 QR Decomposition
3.3.1 Finding $\phi$ explicitly
3.3.2 Finding $R_K(x)$ explicitly
Robust Kriging
5 A Numerical Application
5.3 Results
5.5 Conclusions
Conclusion
Bibliography
List of Figures
1.1 An idealised (co)variogram under the second order stationarity assumption
2.2 Transformation of the sample values to N(0,1) using CDFs
5.3 An experimental semivariogram
5.7 Plots of residuals versus fitted values for the AD and SD data sets
5.10 Density and normal qq plots for LS − WLS and LS − LIN differences for stationary trend surface
List of Tables
5.1
5.2
Acknowledgements
First and foremost I would like to express my gratitude to Dr. Ruben Zamar for taking a particular interest in my work in his sampling theory course, and for offering to supervise me for the duration of my Master's thesis. I thank him for his continuous support and guidance during the course of this research.
I am also extremely grateful to Professor Don Ludwig for serving on my advisory committee, for carefully reading drafts of this thesis, and for his many helpful comments during various stages of this work. I also thank him for his generous support as my course advisor.
General Notation
$Z(x)$ the theoretical regionalised variable for the ore grade at the point $x$
$z(x)$ the ore grade measured at the point $x$, seen as a realisation of the variable $Z(x)$
$\hat{Z}(x)$ an estimator of $Z(x)$
$\gamma(x, y)$ the semivariogram for $Z(x)$, with the same notation changes as for $C(x, y)$
$\phi$ a Lagrange multiplier
$\beta$ the $((L+1) \times 1)$ vector of trend constants $\beta_l$, such that $\mu(x) = \beta^T f(x)$
$K_0$ an initial estimate of $K$
$k(x)$ the $(n \times 1)$ vector of covariances between the ore grade variable $Z(x)$ at $x$ and those at the sample points, $Z(x_1), \ldots, Z(x_n)$
$r(x_i)$ the actual residual estimate at the sample point $x_i$, seen as a realisation of the variable $R(x_i)$, such that $r(x_i) = z(x_i) - m(x_i)$
$\mathbf{r}$ the $(n \times 1)$ vector of estimated residuals $r(x_i)$ at the sample points $x_1, \ldots, x_n$
$\theta$ the covariance parameters for a chosen covariance model $C(x, y; \theta)$
Chapter 1
Introduction
Geostatistics was initially created to provide better estimates of ore reserves. To understand the concepts of geostatistics, it is necessary to look more closely at the types of
surfaces of the ore deposits involved.
The quality of ore, or
the type of ore deposit. Bedded sedimentary ore deposits can have quite regular spatial ore grade distributions. For these deposits, the ore reserve can be adequately estimated using conventional techniques such as fitting a deterministic function by least squares.
gold mines in South Africa can have extremely erratic spatial ore grade distributions, even within a very small area.
surfaces, they are so erratic in their spatial behaviour that points on the surface that are relatively far apart appear to be statistically independent. This idea of ore grade being somewhere between a deterministic function and a random variable led to the concept of the regionalised variable. As the types of surface found in geology could not be realistically fitted by any form of deterministic function without solving for a huge number of parameters, geologists were forced to look at statistical models for these surfaces.
As with traditional statistical models, geostatistics considers surface values $z(x)$ and $z(y)$,
and $Z(y)$
$$E(Z(x)) = \mu, \quad \forall x \tag{1.1}$$
$$\mathrm{Var}(Z(x)) = \sigma^2, \quad \forall x \tag{1.2}$$
$$\mathrm{Var}(Z(x) - Z(y)) = \mathrm{Var}(Z(x)) + \mathrm{Var}(Z(y)) = 2\sigma^2. \tag{1.3}$$
Equations (1.1) and (1.2) state that the surface variables at all points have the same expected value and variance; they are commonly referred to as the stationarity under translation, or homogeneity, assumptions in both the mean and the variance. These assumptions allow the estimation of statistical properties that can be assumed representative of the whole surface, or population, over which stationarity is assumed, rather than of a particular location. Under these assumptions we can view each sample value as an outcome of the
Therefore, under this basic model, the geological surface $z(x)$ is modelled as a realisation of an infinite set of independent (normal) random variables, $Z(x)$,
$x$.
random function, as it assigns a random variable to each point $x$. Even though the sample values are used to estimate the common mean,
of the sampled points is not used in estimating the surface value at a specifically placed unsampled point. Bearing in mind that the geological surface $z(x)$ that we are trying to model as a random function is actually
as entirely
Instead, the closer $x$ and $y$ are to each other, the more likely $Z(x)$ and $Z(y)$
that however randomly the points $x$ and $y$ are chosen, they are not independent, but spatially correlated.
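$$\left.\begin{aligned} E(Z(x)) &= \mu, \quad \forall x \\ \mathrm{Var}(Z(x) - Z(y)) &= 2\gamma(h) \end{aligned}\right\} \tag{1.4}$$
$$\left.\begin{aligned} E(Z(x)) &= \mu, \quad \forall x \\ \mathrm{Cov}(Z(x), Z(y)) &= C(h) \end{aligned}\right\} \tag{1.5}$$
where $h = x - y$.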
The function $2\gamma(h)$ is called the variogram, and is an increasing function of $|h|$. The idea behind this is that the closer the points $x$ and $y$ are, the more alike the variables will be, and so the smaller the variance of their difference will be.
Thus, under this geostatistical theory, the geological surface, $z(x)$, is modelled as a realisation of an infinite set of random variables, $Z(x)$,
distances, and are correlated over short distances, dependent on their variogram $2\gamma(h)$ or covariance function $C(h)$, otherwise known as the
is referred to as a regionalised variable.
class of IS models strictly contains those satisfying SOS. In practice, I have found that this distinction is lost, because one must assume stationarity in the variance in order to estimate either the variogram or the covariance function. Under this unifying assumption, the variogram and covariance function are directly related in the following way:
$$2\gamma(h) = \mathrm{Var}(Z(x) - Z(y)) = \mathrm{Var}(Z(x)) + \mathrm{Var}(Z(y)) - 2\,\mathrm{Cov}(Z(x), Z(y)) = 2\sigma^2 - 2C(h). \tag{1.6}$$
From now on, I use the abbreviation (co)variogram to refer to both the variogram and the covariance function. In geostatistics, people refer to the semivariance, or semivariogram, $\gamma(h)$, rather than the variogram, where
$$\gamma(h) = \sigma^2 - C(h) = C(0) - C(h). \tag{1.7}$$
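As a small numerical check of (1.6) and (1.7), the sketch below computes a semivariogram from a covariance function. The exponential model $C(h) = \sigma^2 e^{-|h|/r}$ and the function names `exp_covariance` and `semivariogram_from_covariance` are illustrative assumptions, not models prescribed in this thesis.

```python
import numpy as np

def exp_covariance(h, sill=1.0, scale=1.0):
    """Hypothetical isotropic exponential model C(h) = sill * exp(-|h| / scale)."""
    return sill * np.exp(-np.abs(h) / scale)

def semivariogram_from_covariance(h, cov):
    """Equation (1.7): gamma(h) = C(0) - C(h), valid under second order stationarity."""
    return cov(0.0) - cov(h)

h = np.linspace(0.0, 5.0, 6)
gamma = semivariogram_from_covariance(h, exp_covariance)
variogram = 2.0 * gamma          # 2 gamma(h) = 2 sigma^2 - 2 C(h), equation (1.6)
print(np.round(gamma, 3))
```

The semivariogram starts at zero and rises towards the sill $\sigma^2$, mirroring the idealised shape in Figure 1.1.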
suggest that the covariance function should indicate more correlation in one direction than another, it may also be assumed to be stationary under rotation, or isotropic, to make estimation of the covariance function easier:
$$C(\mathbf{h}) \rightarrow C(h), \quad \text{where } h = |\mathbf{h}|.$$
Having outlined the philosophy and assumptions behind basic geostatistical theory, I now
concentrate on more specific developments within this subject of spatial estimation.
In Chapter 2, I describe the evolution of ideas in mainstream geostatistics. Throughout this chapter, the random function $Z(x)$ is assumed to have a homogeneous trend. Firstly,
Figure 1.1: An idealised (co)variogram under the second order stationarity assumption. $C(h)$ is the covariogram and $\gamma(h)$ is the semivariogram. Sample values whose interlocation distance is greater than $a$, the range of influence, are considered to be uncorrelated and thus independent.
$\hat{Z}_{OK}(x)$,
tribution of the sample variables, which can be estimated by forming a histogram of the data. Under the assumption that the data are jointly gaussian, the optimal estimator can be shown to be linear. Upon the failure of this assumption, I then outline the alternative methods
estimators, outlining their differing motivations and assumptions, and pointing out their practical failings.
In Chapter 3, I then outline the attempts that have been made in geostatistics to estimate random functions $Z(x)$ with a
$\hat{Z}_{UK}(x)$
Pointing out the weaknesses in applying the original form of the estimator, I describe in detail an
Chapter 2
The Evolution of Geostatistics
Even though the theory of regionalised variables and the original kriging estimators were developed in the early to late 1960s, much of this work was written in French, and it was not until the publications in English of Michel David's
and André Journel's and C. Huijbregts'
of these geostatistical ideas became widespread. Since then, numerous papers and books have been published on all aspects of geostatistics.
It is impossible to summarise adequately this wealth of ideas, recent estimators, improved versions of old estimators, and differences of opinion in a single chapter. Therefore I outline the 'backbone' of modern geostatistics, highlighting the motivations behind each type of estimator. I begin by deriving Matheron's
$Z(x)$,
$Z(x)$
at an unsampled point $x$, and are chosen as a weighted sum of the regionalised variables $Z(x_1), \ldots, Z(x_n)$ at the sample points $x_1, \ldots, x_n$. These sampled points do not have to be in any specific configuration. The weights are chosen so that $\hat{Z}(x)$ is unbiased, and the variance of the difference between $Z(x)$
and $Z(y)$ is known for any two points $x$ and $y$ in the area over which we want to estimate $z(x)$:
$$K = \begin{pmatrix} C(x_1, x_1) & \cdots & C(x_1, x_n) \\ \vdots & \ddots & \vdots \\ C(x_n, x_1) & \cdots & C(x_n, x_n) \end{pmatrix}, \tag{2.8}$$
and $C(x, y) = \mathrm{Cov}(Z(x), Z(y))$.
4. The mean of $Z(x)$ is the same for all points $x$ in the area over which we want to
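2.1 Derivation of Matheron's Ordinary Kriging (OK) estimator, $\hat{Z}_{OK}(x)$
$\hat{Z}_{OK}(x)$
of the surface variable $Z(x)$ at point $x$ as a linear function of the surface variables $Z(x_1), \ldots, Z(x_n)$ at
$$\hat{Z}_{OK}(x) = \sum_{i=1}^{n} \lambda_i(x)\, Z(x_i) = \lambda(x)^T \mathbf{Z}, \tag{2.9}$$
where
$$\mathbf{Z} = \begin{pmatrix} Z(x_1) \\ \vdots \\ Z(x_n) \end{pmatrix}, \qquad \lambda(x) = \begin{pmatrix} \lambda_1(x) \\ \vdots \\ \lambda_n(x) \end{pmatrix}. \tag{2.10}$$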
$\hat{Z}_{OK}(x)$
between the ore grade $z(x)$ that would be measured at $x$ and its estimate $\hat{Z}_{OK}(x)$, to be zero:
$$E\big(Z(x) - \hat{Z}_{OK}(x)\big) = E(Z(x)) - E(\hat{Z}_{OK}(x)) = \mu - \sum_{i=1}^{n} \lambda_i(x)\,\mu = \mu\Big(1 - \sum_{i=1}^{n} \lambda_i(x)\Big) = 0. \tag{2.11}$$
Therefore, for equation (2.11) to hold independently of the value of the homogeneous mean $\mu$, we require
$$\sum_{i=1}^{n} \lambda_i(x) = 1, \quad \text{otherwise written as} \quad \lambda(x)^T \mathbf{1} = 1.$$
$$\sigma_E^2(x) = \mathrm{Var}\big(Z(x) - \hat{Z}_{OK}(x)\big) = \mathrm{Var}(Z(x)) + \mathrm{Var}(\hat{Z}_{OK}(x)) - 2\,\mathrm{Cov}\big(Z(x), \hat{Z}_{OK}(x)\big) \tag{2.12}$$
$$= \mathrm{Var}(Z(x)) + \mathrm{Var}(\lambda(x)^T \mathbf{Z}) - 2\,\mathrm{Cov}(Z(x), \lambda(x)^T \mathbf{Z}) = \sigma^2(x) + \lambda(x)^T K \lambda(x) - 2\lambda(x)^T k(x), \tag{2.13}$$
where
$$k(x) = \begin{pmatrix} C(x, x_1) \\ \vdots \\ C(x, x_n) \end{pmatrix}. \tag{2.14}$$
$\sigma_E^2(x)$ subject
$$H = \sigma_E^2(x) + 2\big(1 - \lambda(x)^T \mathbf{1}\big)\phi \tag{2.15}$$
with respect to $\lambda(x)$ and $\phi$. The factor of two in the Lagrange component of (2.15) is only for convenience in the working, and has no effect on the final solution.
As $H$ is quadratic in $\lambda(x)$ and $K$ is nonnegative definite, when we differentiate with respect to $\lambda(x)$, the stationary point will necessarily be a minimum:
$$\frac{\partial H}{\partial \lambda(x)} = 2K\lambda(x) - 2k(x) - 2\phi\mathbf{1} = 0, \qquad \frac{\partial H}{\partial \phi} = 2\big(1 - \lambda(x)^T \mathbf{1}\big) = 0,$$
which give
$$K\lambda(x) = k(x) + \phi\mathbf{1}, \tag{2.16}$$
$$\lambda(x)^T \mathbf{1} = 1. \tag{2.17}$$
These are referred to as the kriging equations. By first solving for $\phi$, and using the assumption that $K$ is positive definite, it can be shown that a unique solution for $\lambda(x)$ exists.
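The kriging equations (2.16) and (2.17) can be solved together as one $(n+1) \times (n+1)$ linear system in $(\lambda(x), \phi)$. The sketch below does this with NumPy; the one-dimensional sample locations, the data values and the exponential covariance model are illustrative assumptions only. It also uses the fact that, at the solution, $\lambda^T K \lambda = \lambda^T k + \phi$, so the minimised error variance (2.13) reduces to $\sigma^2(x) - \lambda(x)^T k(x) + \phi$.

```python
import numpy as np

def ordinary_kriging_weights(K, k):
    """Solve K lam = k + phi*1 and 1^T lam = 1, equations (2.16)-(2.17)."""
    n = K.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K
    A[:n, n] = -1.0          # phi * 1 moved to the left-hand side
    A[n, :n] = 1.0           # unbiasedness constraint (2.17)
    rhs = np.append(k, 1.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]

def cov(a, b, scale=2.0):
    """Hypothetical exponential covariance with unit sill, for 1-D locations."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / scale)

xs = np.array([0.0, 1.0, 3.0, 4.5])    # illustrative sample locations
z = np.array([1.2, 0.8, 1.5, 1.1])     # illustrative sample values
x0 = np.array([2.0])                   # unsampled point

K = cov(xs, xs)
k = cov(xs, x0)[:, 0]
lam, phi = ordinary_kriging_weights(K, k)
z_ok = lam @ z                                   # estimate (2.9)
sigma2_E = cov(x0, x0)[0, 0] - lam @ k + phi     # minimised error variance (2.13)
```

Setting $x$ equal to a sample location returns weight one on that sample and zero elsewhere, so the estimator reproduces the data there.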
$$\hat{Z}_{OK}(x) = \mu + \sum_{i=1}^{n} \lambda_i(x)\big(Z(x_i) - \mu\big), \tag{2.18}$$
which is already unbiased. Therefore an unbiasedness constraint is not required, and the kriging equation for $\lambda(x)$ is simply
$$K\lambda(x) = k(x). \tag{2.19}$$
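When the mean $\mu$ is known, (2.19) is a single symmetric positive definite system, so a Cholesky factorisation is a natural way to solve it. A minimal sketch, assuming an illustrative exponential covariance, made-up data and an assumed known mean; the helper name `simple_kriging` is hypothetical.

```python
import numpy as np

def simple_kriging(K, k, z, mu):
    """Known-mean estimator (2.18) with weights from K lam = k, equation (2.19)."""
    L = np.linalg.cholesky(K)                          # K = L L^T, K positive definite
    lam = np.linalg.solve(L.T, np.linalg.solve(L, k))
    estimate = mu + lam @ (z - mu)
    return estimate, lam

xs = np.array([0.0, 1.0, 3.0, 4.5])    # illustrative sample locations
z = np.array([1.2, 0.8, 1.5, 1.1])     # illustrative sample values
mu = 1.0                               # assumed known mean
K = np.exp(-np.abs(xs[:, None] - xs[None, :]) / 2.0)   # hypothetical covariance, unit sill
k = np.exp(-np.abs(xs - 2.0) / 2.0)                    # covariances with the point x = 2
z_sk, lam = simple_kriging(K, k, z, mu)
sigma2_E = 1.0 - lam @ k               # error variance sigma^2 - lam^T k
```

Unlike the OK weights, these need not sum to one; the shortfall $1 - \sum_i \lambda_i(x)$ is effectively assigned to the known mean.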
As mentioned, $\hat{Z}_{OK}(x)$
and $\hat{Z}_{OK}(x)$
Assuming $E(Z(x)^2) < \infty$ (i.e. $Z(x)$ has a finite variance), the optimal estimator of $Z(x)$ by any measurable function $g(Z(x_1), \ldots, Z(x_n))$
$$\hat{Z}_{CE}(x) = E\big(Z(x) \mid Z(x_1), \ldots, Z(x_n)\big) = \int z(x)\, \frac{f_{Z(x), Z(x_1), \ldots, Z(x_n)}\big(z(x), z(x_1), \ldots, z(x_n)\big)}{f_{Z(x_1), \ldots, Z(x_n)}\big(z(x_1), \ldots, z(x_n)\big)}\, dz(x) \tag{2.20}$$
is optimal in that it minimises the error variance over all estimators which can
practice. For this reason, we are forced to restrict our choice of estimators to those that require information that we know, or can at least estimate from the known data values $z(x_1), \ldots, z(x_n)$.
If $(Z(x), Z(x_1), \ldots, Z(x_n))$
tional assumption, only linear functions of the sample variables need be considered. Of course, if $\mu$ is unknown, then $\hat{Z}_{CE}$ cannot be found exactly. In this case, $\hat{Z}_{OK}(x)$ should
$\hat{Z}_{CE}$.
The first problem is how to verify this gaussian assumption using the correlated sample values. Up to now this has been done using the simplest of tools: the histogram.
2.2 The Histogram
tribution from only the sample values $(z(x_1), \ldots, z(x_n))$, seen as a single realisation of $(Z(x_1), \ldots, Z(x_n))$.
of inference possible. Therefore, we can only verify that the marginal distributions of $(Z(x), Z(x_1), \ldots, Z(x_n))$
difficult due to the correlation in the sample values. In order for it to represent the true distribution of $Z(x)$, the
$i = 1, \ldots, n$
The problem is that, in reality, samples are far from independent. Miners drill where they think the ore grade is high, and then cluster their drilling about this high grade region. So treating these samples as independent realisations of $Z(x)$
being overrepresented, leading to a
grade.
A very simple technique called
sample values to form the histogram has proved to reduce this bias considerably, and is far more practical than the other methods devised to resolve this problem (David, 1988). One drawback of this technique is that, depending on the grid size, different histograms may be found. This leads to confusion over which histogram to use, demanding a subjective benefit function over which to optimise.
My objection to this technique is that it attempts to remove the bias by choosing the sample points randomly within each square. This does not agree with the theory. As described in Chapter 1, geostatistics is based on the premise that however randomly you choose your samples, they will not be independent but correlated as dictated by the covariance function. For this reason we must ensure independence by making sure that the samples used in the histogram are further apart than the range, $a$, of the covariance function. Using the declustering technique, we could still end up with samples close together despite their being in separate squares, which would then bias the histogram.
A method more in line with the theory would be to assume a gaussian distribution, estimate the (co)variogram from the data, and from this, estimate $a$. Then, by omitting the samples that are within distance $a$ of each other, we could form a histogram and see if our original assumption is valid. The (co)variogram estimators, discussed later in Chapter 4, are again only optimal under the gaussian assumption.
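The separation rule suggested above can be sketched as a greedy pass that keeps a sample only if it is more than $a$ from every sample already kept. The greedy order (here, the order in which the samples are listed) is an assumption of this sketch: the text does not specify which of two nearby samples to drop, and a different order can leave a different subset. Coordinates, grades and the range $a$ are made up.

```python
import numpy as np

def thin_to_min_separation(coords, a):
    """Indices of a subset of samples that are pairwise more than distance a apart (greedy)."""
    kept = []
    for i, p in enumerate(coords):
        if all(np.linalg.norm(p - coords[j]) > a for j in kept):
            kept.append(i)
    return np.array(kept, dtype=int)

coords = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [3.0, 0.4], [6.0, 6.0]])
values = np.array([2.3, 2.9, 1.1, 1.4, 0.7])     # illustrative ore grades
a = 1.0                                           # range estimated from the (co)variogram
kept = thin_to_min_separation(coords, a)
counts, edges = np.histogram(values[kept], bins=3)
```

With few samples, the thinned histogram may rest on very few values, which is the practical cost of enforcing independence in this way.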
Another source of bias in the histogram is the occurrence of outliers. Again, numerous solutions have been put forward, but an even more basic problem still lies in defining an outlier. A common method to correct this bias is to decide on a particular model for the distribution from the histogram and then correct the sample values accordingly to fit the model. The problem with this approach is that in practice relatively small samples are taken, and so the sample values could very well be accurate but just not numerous enough to provide a good representation of the distribution. In this situation it would be very unwise to alter the sample values so as to fit an estimated histogram, and thus risk biasing the results. Instead, estimation techniques that are robust to outliers should be used.
It is a well-known fact that linear estimators are extremely sensitive to outliers. For this reason, histogram estimation, and outlier detection in particular, is an essential and skilled job that takes great experience to do well. If the corrected histogram is skewed from normal, it could indicate either the existence of a nonhomogeneous trend or a different distribution. The first case will be discussed in Chapter 3. In the second case, if the distribution is
Lognormal Kriging can be used, due to its close connection to the gaussian distribution. Otherwise, $\hat{Z}_{CE}$
estimators.
2.3
is a linear
this can easily be shown using multivariate gaussian theory (Rendu 1979).
$$Y(x) = \ln(Z(x)), \tag{2.21}$$
course this must be transformed back in order to find the estimator for $Z(x)$:
$$\hat{Z}_{LK}(x) = D\, e^{\hat{Y}_{OK}(x)}, \tag{2.22}$$
where $D$ is a constant derived in (Journel 1980) to ensure that the resulting estimator is unbiased. In deriving $D$, Journel uses the fact that $\hat{Y}_{OK}(x)$ is normally distributed, and so $e^{\hat{Y}_{OK}(x)}$ is lognormally distributed.
is
instead of $\hat{Y}_{OK}(x)$,
unbiased approximation to $\hat{Z}_{CE}$.
The use of LK has been somewhat controversial, due to its lack of robustness compared to ordinary kriging. OK depends only on the relative values of the covariance function $C(h)$ at different $h$, i.e. its shape, whereas LK is very sensitive to local variations in the value of $C(h)$ of the transformed data. This can be seen by looking at the OK kriging equations (2.16) and (2.17), which can be written as
$$\sum_{j=1}^{n} \lambda_j(x)\, C(x_j, x_i) = C(x, x_i) + \phi, \quad i = 1, \ldots, n, \tag{2.23}$$
$$\sum_{j=1}^{n} \lambda_j(x) = 1. \tag{2.24}$$
$$a \sum_{j=1}^{n} \lambda_j(x)\, C(x_j, x_i) + \beta \sum_{j=1}^{n} \lambda_j(x) = a\, C(x, x_i) + \beta + \phi, \quad i = 1, \ldots, n. \tag{2.25}$$
Therefore, by using equation (2.24), we can cancel $\beta$ from both sides of the equation. Then, by dividing throughout by $a$, equation (2.25) becomes
$$\sum_{j=1}^{n} \lambda_j(x)\, C(x_j, x_i) = C(x, x_i) + \frac{\phi}{a}, \quad i = 1, \ldots, n. \tag{2.26}$$
of $\lambda_j(x)$.
$\big(\sum_{j=1}^{n} \lambda_j(x)\, C(x_j, x_i)$
equations. Therefore, the covariance function cannot be horizontal. So it is clear that any constant nonzero factor or constant term in the covariance function is filtered out by the kriging equations, and thus has no effect on the kriging parameters $\lambda(x)$ and the final estimator $\hat{Z}_{OK}(x)$.
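The filtering argument of (2.23) to (2.26) is easy to confirm numerically: replacing $C$ by $aC + \beta$ (with $a > 0$, $\beta \ge 0$) leaves the OK weights unchanged and multiplies $\phi$ by $a$. The covariance model, locations and constants below are illustrative assumptions.

```python
import numpy as np

def ok_weights(K, k):
    """Solve the OK system (2.16)-(2.17) for the weights lam and multiplier phi."""
    n = len(k)
    A = np.block([[K, -np.ones((n, 1))],
                  [np.ones((1, n)), np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(k, 1.0))
    return sol[:n], sol[n]

def C(h):
    """Hypothetical exponential covariance with unit sill."""
    return np.exp(-np.abs(h) / 2.0)

xs = np.array([0.0, 1.0, 3.0, 4.5])
x0 = 2.0
K, k = C(xs[:, None] - xs[None, :]), C(xs - x0)

a, beta = 3.0, 0.7
lam1, phi1 = ok_weights(K, k)
lam2, phi2 = ok_weights(a * K + beta, a * k + beta)   # covariance a*C(h) + beta
```

The same experiment with the lognormal back-transform would not be invariant, because $D$ in (2.22) depends on the covariance values themselves.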
However, for LK, the unbiasedness constant $D$ depends upon the exact covariance function, and thus the estimator loses this filtering property. This makes LK far more dependent on the estimation of the covariance function, which is already a very difficult task. It is demonstrated very clearly in (David 1988), using a couple of examples, that by slightly altering the covariance function you can get a great variety of estimates. For this reason, LK is a very risky estimator to use unless the practitioner is very experienced and some form of cross-validation technique is used, such as David's method (Kim and Knudsen, 1978).
value at a time and using the other sample values and the proposed covariance function to estimate it. By repeating this for every sample value, we get a collection of estimation errors which we can compare with the model predictions to see if the proposed covariance is appropriate.
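A generic leave-one-out check of the kind described above can be sketched as follows, using ordinary kriging on illustrative one-dimensional data. The exponential covariance, its scale parameter and the summary statistics (mean and variance of standardised errors) are assumptions of this sketch; it is not a transcription of David's method as documented by Kim and Knudsen (1978).

```python
import numpy as np

def ok_weights(K, k):
    """Solve the OK system (2.16)-(2.17)."""
    n = len(k)
    A = np.block([[K, -np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(k, 1.0))
    return sol[:n], sol[n]

def leave_one_out(xs, z, cov):
    """Krige each sample from the others; return errors and predicted error variances."""
    n = len(z)
    errors, variances = np.empty(n), np.empty(n)
    for i in range(n):
        o = np.delete(np.arange(n), i)
        K = cov(xs[o][:, None] - xs[o][None, :])
        k = cov(xs[o] - xs[i])
        lam, phi = ok_weights(K, k)
        errors[i] = z[i] - lam @ z[o]
        variances[i] = cov(0.0) - lam @ k + phi      # OK error variance at x_i
    return errors, variances

xs = np.array([0.0, 0.8, 1.9, 3.0, 4.2, 5.5, 6.1])
z = np.array([1.1, 1.3, 0.9, 1.6, 1.4, 0.8, 1.0])     # illustrative data
proposed_cov = lambda h: np.exp(-np.abs(h) / 1.5)    # covariance model under test
err, var = leave_one_out(xs, z, proposed_cov)
std_err = err / np.sqrt(var)
print(std_err.mean(), std_err.var())
```

If the proposed covariance is appropriate, the standardised errors should have mean near 0 and variance near 1; with only seven samples, as here, such summaries are very noisy.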
2.4 Nonlinear estimators
The advantage of using
are computationally
For these reasons the original linear kriging estimators such as $\hat{Z}_{OK}(x)$ are still in great use in the mining industry, despite the recent development of nonlinear estimators. For the same reasons, it is also worthwhile to use linear estimators, even when the variable $Z(x)$ does not depend on the data in a linear way.
into subsections, where the kriging assumptions of a homogeneous trend and a jointly gaussian or lognormal sample distribution are more likely to be satisfied, and use a different estimator for each subsection. As linear estimators are extremely sensitive to outliers, dividing the estimation in this way would also limit the effect of the outliers to the sections in which they lie. Of course, the extent of this division would be limited by the number of samples, considering that a covariance function and trend function must
nonlinear estimators of $Z(x)$.
The first nonlinear estimator to be developed was Disjunctive Kriging. Devised by Matheron (1976), it is still one of the most popular nonlinear estimators, at least in research if not in practice, and for this reason I shall describe it in more detail.
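2.4.1 Disjunctive Kriging (DK)
$$\hat{Z}_{DK}(x) = \sum_{i=1}^{n} f_i(Z(x_i)) \tag{2.27}$$
$$E\big(\hat{Z}_{DK}(x) \mid Z(x_j)\big) = E\big(Z(x) \mid Z(x_j)\big), \quad j = 1, \ldots, n. \tag{2.28}$$
disjunctive nature of the estimator. The reason for this will become clear later on in this section.
The functions $f_i$ must be measurable for $E[f_i(Z(x_i))]$
$\hat{Z}_{DK}(x)$
solved in general, more assumptions must be made to simplify the problem. From now on, for ease of notation, $x$ shall be denoted by $x_0$.
If we can assume that $Z(x_i)$,
$Z(x_i) = \omega(Y(x_i))$,
$$F_{ij}(dy_i, dy_j) = \Big(\sum_{k=0}^{\infty} T_k(x_i, x_j)\, \eta_k(y_i)\, \eta_k(y_j)\Big) F(dy_i)\, F(dy_j), \quad \forall\, i, j \tag{2.29}$$
where $\eta_k$, $k = 0, 1, \ldots$, are orthogonal functions in that, $\forall i$, $E[\eta_{k_1}(Y_i)\, \eta_{k_2}(Y_i)] = 0$ for $k_1 \neq k_2$,
$$h_i(\cdot) = f_i(\omega(\cdot)). \tag{2.30}$$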
For a general isofactorial model, the following theory can become unnecessarily obscured by notation.
specific model. The following steps are very similar to those in the general case.
If the sample variables $Z(x_i)$
distribution function $F$ is $N(0,1)$, and whose joint distribution functions $F_{ij}$ are bivariate normal with marginal means 0, marginal variances 1, and correlation coefficient $\rho_{ij} = E(Y_i Y_j)$ (which in this case is also the covariance), then the $Y_i$ can be shown to form an isofactorial model with the $\eta_k(Y_i)$ being the normalised Hermite polynomials:
$$\eta_k(Y_i) = H_k(Y_i), \quad \text{where} \quad H_k(y) = \frac{(-1)^k}{\sqrt{k!}}\, e^{y^2/2}\, \frac{d^k}{dy^k} e^{-y^2/2}. \tag{2.31}$$
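The isofactorial property behind (2.29) and (2.31) can be checked numerically. The sketch below builds the normalised Hermite polynomials from the standard three-term recurrence $H_{k+1}(y) = \big(y H_k(y) - \sqrt{k}\, H_{k-1}(y)\big)/\sqrt{k+1}$ and uses Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`, which integrates against $e^{-y^2/2}$) to confirm that, for standard bivariate normal $(Y_i, Y_j)$ with correlation $\rho$, $E[H_k(Y_i) H_l(Y_j)] = \delta_{kl}\, \rho^k$, which is equivalent to $E(H_k(Y_i) \mid Y_j) = \rho^k H_k(Y_j)$. The value $\rho = 0.6$ and the truncation at $k = 5$ are arbitrary choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def normalised_hermite(y, kmax):
    """Rows k = 0..kmax of H_k(y) = He_k(y) / sqrt(k!), via the three-term recurrence."""
    y = np.asarray(y, dtype=float)
    H = np.zeros((kmax + 1,) + y.shape)
    H[0] = 1.0
    if kmax >= 1:
        H[1] = y
    for k in range(1, kmax):
        H[k + 1] = (y * H[k] - np.sqrt(k) * H[k - 1]) / np.sqrt(k + 1)
    return H

# Nodes/weights integrate against exp(-y^2/2); divide by sqrt(2*pi) to get N(0,1) moments
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2 * np.pi)

kmax = 5
H = normalised_hermite(nodes, kmax)
gram = (H * weights) @ H.T               # E[H_k(Y) H_l(Y)] for Y ~ N(0,1)

# Write Y_j = rho*Y_i + sqrt(1 - rho^2)*W with Y_i, W independent N(0,1)
rho = 0.6
yi, w = np.meshgrid(nodes, nodes, indexing="ij")
wij = np.outer(weights, weights)
yj = rho * yi + np.sqrt(1 - rho**2) * w
Hi = normalised_hermite(yi, kmax)
Hj = normalised_hermite(yj, kmax)
cross = np.einsum("kab,lab,ab->kl", Hi, Hj, wij)   # E[H_k(Y_i) H_l(Y_j)]
```

Because the integrands are polynomials of low degree, the 40-point rule integrates them exactly up to rounding.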
$$\hat{Z}_{DK}(x) = \sum_{i=1}^{n} f_i(Z(x_i)) = \sum_{i=1}^{n} f_i(\omega(Y_i)) = \sum_{i=1}^{n} h_i(Y_i), \tag{2.32}$$
are measurable, each element in the sum can be written as an expansion of Hermite polynomials in the following way:
$$h_i(Y_i) = \sum_{k=0}^{\infty} \lambda_{ik}\, H_k(Y_i), \quad i = 1, \ldots, n. \tag{2.33}$$
Similarly,
$$Z(x_0) = \omega(Y_0) = \sum_{k=0}^{\infty} b_k\, H_k(Y_0). \tag{2.34}$$
Under the assumption that we know the transformation $\omega$, we also know the coefficients $b_k$, $k = 0, 1, \ldots$. Now, in order to solve for $h_i$, $i = 1, \ldots, n$, or more specifically, for
$$\sum_{i=1}^{n} E\big(h_i(Y_i) \mid Y_j\big) = E\big(Z(x_0) \mid Y_j\big), \quad j = 1, \ldots, n. \tag{2.35}$$
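By inserting (2.33) and (2.34) into (2.35), we get
$$\sum_{k=0}^{\infty} \sum_{i=1}^{n} \lambda_{ik}\, E\big(H_k(Y_i) \mid Y_j\big) = \sum_{k=0}^{\infty} b_k\, E\big(H_k(Y_0) \mid Y_j\big), \quad j = 1, \ldots, n, \tag{2.36}$$
which, using $E(H_k(Y_i) \mid Y_j) = \rho_{ij}^{k} H_k(Y_j)$ for all triples $(i, j, k)$, becomes
$$\sum_{k=0}^{\infty} \sum_{i=1}^{n} \lambda_{ik}\, \rho_{ij}^{k}\, H_k(Y_j) = \sum_{k=0}^{\infty} b_k\, \rho_{0j}^{k}\, H_k(Y_j), \quad j = 1, \ldots, n. \tag{2.37}$$
Equating the coefficients of each $H_k(Y_j)$ then gives
$$\sum_{i=1}^{n} \lambda_{ik}\, \rho_{ij}^{k} = b_k\, \rho_{0j}^{k}, \quad j = 1, \ldots, n, \quad k = 0, 1, \ldots \tag{2.38}$$
Thus, for each $k$, we now have $n$ equations and $n$ unknowns, allowing us to solve for each set of $\lambda_{ik}$, $i = 1, \ldots, n$, separately: hence the name Disjunctive Kriging.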
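A minimal sketch of solving (2.38) for each $k$ separately and forming the truncated estimator (2.39) follows. Two assumptions are worth flagging: the correlation model and data are illustrative, and the transform is taken to be $\omega(y) = e^{\sigma y}$, whose Hermite coefficients are known in closed form, $b_k = e^{\sigma^2/2}\sigma^k/\sqrt{k!}$. Note also that for $k = 0$ every $\rho_{ij}^0 = 1$, so the $n$ equations collapse to the single condition $\sum_i \lambda_{i0} = b_0$; the sketch simply splits $b_0$ equally, which does not affect the estimate because $H_0 \equiv 1$.

```python
import math
import numpy as np

def hermite(y, kmax):
    """Normalised Hermite polynomials H_0..H_kmax at y (rows), via the three-term recurrence."""
    y = np.asarray(y, dtype=float)
    H = np.zeros((kmax + 1,) + y.shape)
    H[0] = 1.0
    if kmax >= 1:
        H[1] = y
    for k in range(1, kmax):
        H[k + 1] = (y * H[k] - math.sqrt(k) * H[k - 1]) / math.sqrt(k + 1)
    return H

def dk_coefficients(rho, rho0, b):
    """lam[i, k] solving sum_i lam[i,k] rho[i,j]**k = b[k] rho0[j]**k, equation (2.38)."""
    n, K1 = len(rho0), len(b)
    lam = np.zeros((n, K1))
    lam[:, 0] = b[0] / n                  # k = 0: only the sum b[0] is determined
    for k in range(1, K1):
        lam[:, k] = np.linalg.solve(rho ** k, b[k] * rho0 ** k)   # rho**k is elementwise
    return lam

xs = np.array([0.0, 1.0, 2.5, 4.0])                 # illustrative locations
y = np.array([0.3, -0.5, 1.1, 0.2])                  # illustrative Gaussian scores Y_i
corr = lambda h: np.exp(-np.abs(h) / 2.0)            # hypothetical correlation model
rho = corr(xs[:, None] - xs[None, :])
rho0 = corr(xs - 1.7)                                # target point x_0 = 1.7

sigma, K = 0.5, 15                                   # omega(y) = exp(sigma*y), truncation K
b = np.array([math.exp(sigma**2 / 2) * sigma**k / math.sqrt(math.factorial(k))
              for k in range(K + 1)])
lam = dk_coefficients(rho, rho0, b)
z_dk = np.sum(lam * hermite(y, K).T)                 # truncated estimator (2.39)
```

Placing the target at a sample location reproduces $\omega(y)$ there up to truncation error, a convenient sanity check.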
In practice, the Hermite polynomial expansions must be truncated to form a finite sum, so that our DK estimator becomes
$$\hat{Z}_{DK}(x) = \sum_{i=1}^{n} \sum_{k=0}^{\infty} \lambda_{ik}\, H_k(Y_i) \approx \sum_{i=1}^{n} \sum_{k=0}^{K} \lambda_{ik}\, H_k(Y_i), \tag{2.39}$$
where the coefficients $\lambda_{ik}$ are found by solving the sets of $n$ equations defined in (2.38), for $k = 0, \ldots, K$.
At this point it must also be noted that for $Y(x)$ to have an invariant distribution with respect to $x$, as required for an isofactorial model, so must $Z(x)$, under the common transformation $\omega$. As a consequence, $Z(x)$ is required to be stationary in both its mean and variance. Another major drawback is that this model makes no allowance for an anisotropic covariance function.
DK
innate to the model, which in many cases render DK an inapplicable form of estimation. DK also poses some difficult problems when put to practical use, the most important being the estimation of $\omega$. The only documented method of estimation seems to be in (David, 1988).
$N(0,1)$ random variable is plotted on the same axes. Each sample value $z(x_i)$ is transformed to the corresponding value $y(x_i)$ on the $N(0,1)$ curve, chosen as the value below which the same proportion of the distribution lies. This is illustrated in Figure 2.2.
Figure 2.2: An illustration of how the sample values $z(x_1), \ldots, z(x_n)$ are transformed to corresponding values $y(x_1), \ldots, y(x_n)$ from a $N(0,1)$ distribution. $F(z)$ is the CDF of the sample values, and $F(y)$ is the CDF of a $N(0,1)$ random variable; each $y_i$ is chosen so that $F(y_i) = F(z_i)$.
By then plotting $z(x_i)$ against $y(x_i)$, we can fit a function $\omega$ of the form
$$\omega(y) = \sum_{k=0}^{K} b_k\, H_k(y) \tag{2.40}$$
to a $N(0,1)$ random
locations. Because there is only one realisation of each random variable $Y_i$, no inference can be made about these bivariate distributions, so we must assume that the estimated $\omega$ also provides normal bivariate distributions $F_{ij}$, $\forall\, i, j$. Also, accurate estimation of the CDF from correlated data is directly related to histogram estimation which, as discussed earlier, is a difficult task. Disjunctive Kriging is also computationally heavy in comparison with the previous linear estimators, and, as Hawkins and Cressie (1984) point out, the procedure for estimating the functions $f_i$ is highly nonrobust. Estimation of the correlation (covariance) function $\rho$ and the coefficients $b_k$ is completely dependent upon the transformed data $y(x_i)$, $i = 1, \ldots, n$. A bias in the estimation of $\omega$, from misspecification of the CDF say, could lead to a compounded bias in the resulting estimator.
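The graphical transform of Figure 2.2 and the fit of (2.40) can be sketched together. Two assumptions are made here that the text does not fix: the empirical CDF is evaluated at the plotting positions $(r_i - 0.5)/n$, where $r_i$ is the rank of $z(x_i)$, so that no score is infinite; and the coefficients $b_k$ are fitted by ordinary least squares on the first few normalised Hermite polynomials. The data are made up.

```python
import math
from statistics import NormalDist
import numpy as np

def normal_scores(z):
    """y_i with Phi(y_i) = (rank_i - 0.5) / n: the matching of CDFs shown in Figure 2.2."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    ranks = np.empty(n)
    ranks[np.argsort(z, kind="stable")] = np.arange(1, n + 1)
    return np.array([NormalDist().inv_cdf((r - 0.5) / n) for r in ranks])

def hermite(y, kmax):
    """Normalised Hermite polynomials H_0..H_kmax at y (rows)."""
    y = np.asarray(y, dtype=float)
    H = np.zeros((kmax + 1,) + y.shape)
    H[0] = 1.0
    if kmax >= 1:
        H[1] = y
    for k in range(1, kmax):
        H[k + 1] = (y * H[k] - math.sqrt(k) * H[k - 1]) / math.sqrt(k + 1)
    return H

z = np.array([2.1, 0.4, 5.0, 1.3, 0.9, 3.2, 0.7])    # illustrative ore grades
y = normal_scores(z)
kmax = 3
b, *_ = np.linalg.lstsq(hermite(y, kmax).T, z, rcond=None)   # least-squares b_k in (2.40)
omega = lambda t: b @ hermite(t, kmax)                       # fitted transform
```

Because only one realisation is available, nothing in this fit checks the bivariate normality that DK also requires, which is exactly the limitation noted above.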
Most other nonlinear estimators have been developed specifically to estimate recoverable reserves rather than total reserves: instead of estimating the reserves directly from the sample variables $Z(x_i)$, indicator variables of the following type are defined,
$$I(x; z) = \begin{cases} 1 & \text{if } Z(x) \le z, \quad \text{w.p. } p(z), \\ 0 & \text{otherwise}, \quad \text{w.p. } 1 - p(z), \end{cases} \tag{2.41}$$
one for each sample variable, where $z$ is a cutoff grade below which the ore is not recoverable.
assumed throughout.
2.4.2 Indicator Kriging (IK)
Indicator Kriging, or IK,
mation, arguing that in nature variables are rarely gaussian, especially the uncontrolled spatial distributions found in the earth sciences. Therefore, Journel proposes to use the data to model uncertainty in the value at an unknown point $x$ in the form of a conditional distribution function $F(x; z \mid n)$ of $Z(x)$.
distribution and using the data to find an optimal estimator. This conditional distribution function is modelled as a weighted linear function of the indicator functions at the data points:
$$F(x; z \mid n) = \sum_{i=1}^{n} \lambda_i(x; z)\, I(x_i; z) \tag{2.42}$$
tance related covariance function, just as in distance related moving average estimators, where $D$ is a constant. Then, by repeating this for $z_k$, $k = 1, \ldots, K$, a discrete approximation of $F(x; z \mid n)$ can be found. This is of course subject to the condition that the resulting estimate is a valid distribution function, i.e. $F \in [0, 1]$ and $F(z) \ge F(z')$ when $z > z'$. In order to provide an estimate of $Z(x)$, a value $\hat{z}_{IK}$ is chosen that minimises:
$$E\big[L(\hat{z}_{IK} - Z(x)) \mid Z(x_1), \ldots, Z(x_n)\big] = \int L(\hat{z}_{IK} - z)\, dF(x; z \mid n) \approx \sum_{k=1}^{K-1} L(\hat{z}_{IK} - z'_k)\big[F(x; z_{k+1} \mid n) - F(x; z_k \mid n)\big] \tag{2.43}$$
where $z'_k = (z_k + z_{k+1})/2$.
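A small sketch of (2.41) to (2.43) follows. For simplicity it uses the same nonnegative, normalised weights at every cutoff (inverse squared distance, an arbitrary illustrative choice rather than kriging weights), enforces the order relations by clipping to $[0, 1]$ and taking a running maximum, and then minimises the discretised expected loss of (2.43) for squared loss $L(e) = e^2$, whose minimiser is the probability-weighted mean of the class midpoints $z'_k$. With fixed nonnegative weights the corrections change nothing; they only matter when the weights differ between cutoffs or are negative, as kriging weights can be. All data and cutoffs are made up.

```python
import numpy as np

def indicators(z, cutoff):
    """Indicator transform (2.41): 1 where z <= cutoff, else 0."""
    return (np.asarray(z) <= cutoff).astype(float)

def ik_cdf(z, weights, cutoffs):
    """Discrete F(x; z_k | n) from (2.42), with order-relation corrections applied."""
    F = np.array([w @ indicators(z, c) for w, c in zip(weights, cutoffs)])
    F = np.clip(F, 0.0, 1.0)            # F must lie in [0, 1]
    return np.maximum.accumulate(F)     # F must be non-decreasing in z

def e_type_estimate(F, cutoffs):
    """Minimiser of (2.43) with squared loss: weighted mean of class midpoints z'_k."""
    mids = 0.5 * (cutoffs[:-1] + cutoffs[1:])
    p = np.diff(F)
    return mids @ p / p.sum()

z = np.array([0.5, 1.2, 2.0, 3.1, 0.8])          # illustrative sample grades
d = np.array([1.0, 2.0, 1.5, 3.0, 2.5])          # distances from x to each sample
w = (1.0 / d**2) / np.sum(1.0 / d**2)            # illustrative weights, summing to 1
cutoffs = np.array([0.0, 0.75, 1.5, 2.5, 3.5])
F = ik_cdf(z, [w] * len(cutoffs), cutoffs)
z_ik = e_type_estimate(F, cutoffs)
```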
Multigaussian Kriging
Chapter 3
Kriging with a nonhomogeneous trend
In practice, most ore surfaces are not homogeneous in their mean; instead, the mean follows a systematic trend. This can sometimes be diagnosed by looking at the estimated covariance function. If the correct covariance function is fitted, and is significantly nonzero for large values of $h$, this implies that surface variables that are far apart are still significantly correlated.
deterministic function, the trend surface $\mu(x)$, relating the two variables. The way in which geostatistical theory deals with this nonhomogeneity in the mean is to model the surface values relative to this trend, rather than the surface values themselves. This leads to the model
$$Z(x) = \mu(x) + \epsilon(x), \tag{3.44}$$
where
$\mu(x)$ = trend surface, assumed nonrandom,
$\epsilon(x)$ = small scale regionalised variable incorporating the randomness in $Z(x)$,
and
$$E(Z(x)) = \mu(x), \ \forall x \quad \Longrightarrow \quad E(\epsilon(x)) = 0, \ \forall x. \tag{3.45}$$
So now, as long as the trend surface is known or can be well estimated, the new regionalised variable $\epsilon(x)$ is stationary, and so can be modelled using regular geostatistical theory. The covariance function for $\epsilon(x)$ is of course the same as for $Z(x)$, as the trend surface is assumed to be nonrandom.
The problem with the above model, which will be discussed in greater detail later, lies in its practical application. The surface value, $z(x)$,
realisation of the surface variable $Z(x)$. When the mean is stationary, we can use every sample value to estimate the mean $\mu$, whereas now we have only one value with which to estimate $\mu(x)$ for each point $x$. Even if we were to know $z(x)$ for the whole surface, which would then make estimation unnecessary, we still could not estimate $\mu(x)$ exactly. So this trend surface really exists only as a theoretical tool to make the regionalised variables stationary; in practice it is impossible to find.
Despite these practical drawbacks, Universal Kriging, as devised by Matheron (1969), is still the only form of estimation in geostatistical theory that incorporates a nonhomogeneous trend. For this reason, much has been written on this estimator, and in particular on unbiased estimation of the covariance function using sample values from nonhomogeneous sample variables.
In this chapter, I first outline the original derivation of the Universal Kriging (UK)
Chapter 5.
Finally, I discuss the problem of estimating the covariance function unbiasedly from nonhomogeneous sample variables. I describe how this led to the theory of intrinsic random functions and a proposed form of unbiased estimation of the generalised covariance function. I also describe other forms of unbiased covariance estimation that have been developed since then.
Before deriving the Universal Kriging estimator $\hat{Z}_{UK}(x)$,
UK model assumptions
1. The basic model for each regionalised variable $Z(x)$ at a point $x$ is as defined in equations (3.44) and (3.45).
2. The trend surface $\mu(x)$ is unknown, yet we know its shape, in that it is of the form
$$\mu(x) = \sum_{l=0}^{L} \beta_l f_l(x) = \beta^T f(x), \tag{3.46}$$
where
$$\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_L \end{pmatrix} \quad \text{and} \quad f(x) = \begin{pmatrix} f_0(x) \\ \vdots \\ f_L(x) \end{pmatrix}.$$
The trend component functions, $f_l(x)$, defining the shape of the trend, are known, but the parameters $\beta_l$ are unknown.
3. The matrix $F$, defined as
$$F = \begin{pmatrix} f_0(x_1) & \cdots & f_L(x_1) \\ \vdots & \ddots & \vdots \\ f_0(x_n) & \cdots & f_L(x_n) \end{pmatrix}, \tag{3.47}$$
becomes
ill-conditioned for relatively small $L$ and will cause roundoff errors to occur when calculated. Numerical solutions to this problem will be discussed later.
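3.1 [...] $Z_{UK}(\mathbf{x})$
The UK estimator is a linear combination of the sample variables at the sampled points $\mathbf{x}_1, \ldots, \mathbf{x}_n$, such that:
$$Z_{UK}(\mathbf{x}) = \sum_{i=1}^{n} \lambda_i(\mathbf{x})Z(\mathbf{x}_i) = Z_n^T\boldsymbol{\lambda}(\mathbf{x}), \qquad (3.48)$$
where $Z_n$ and $\boldsymbol{\lambda}(\mathbf{x})$ are as defined in equation (2.10), and the weights $\lambda_i(\mathbf{x})$ are unknown.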
n
For ZIJK(*)
such that:
E(Z(x) 
Z (x)) =
UK
E(Z(x)) n
E(Z (x))
UK
= ^( )  E *( M i)
X
i=l
(3.49)
31
f(x) = f > ( x ) f ( ) ,
l = 0,...,L,
X i
i=i
written simply as
f(x)
= A(x) F.
(3.50)
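We now want to find the optimal weights which minimise the error variance, $\sigma^2_{UK}(\mathbf{x})$, subject to the unbiasedness constraints (3.50). Introducing a vector $\boldsymbol{\phi}$ of $L+1$ Lagrange multipliers, we want to minimise:
$$\sigma^2_{UK}(\mathbf{x}) + 2\big(\mathbf{f}(\mathbf{x})^T - \boldsymbol{\lambda}(\mathbf{x})^TF\big)\boldsymbol{\phi}. \qquad (3.51)$$
Setting the derivatives with respect to $\boldsymbol{\lambda}(\mathbf{x})$ and $\boldsymbol{\phi}$ to zero gives
$$K\boldsymbol{\lambda}(\mathbf{x}) = \mathbf{k}(\mathbf{x}) + F\boldsymbol{\phi}, \qquad (3.52)$$
$$2\big(\mathbf{f}(\mathbf{x})^T - \boldsymbol{\lambda}(\mathbf{x})^TF\big) = 0 \;\Rightarrow\; \mathbf{f}(\mathbf{x})^T = \boldsymbol{\lambda}(\mathbf{x})^TF. \qquad (3.53)$$
Multiplying (3.52) by $K^{-1}$, and then by $F^T$,
$$\boldsymbol{\lambda}(\mathbf{x}) = K^{-1}\mathbf{k}(\mathbf{x}) + K^{-1}F\boldsymbol{\phi}, \qquad (3.54)$$
$$F^T\boldsymbol{\lambda}(\mathbf{x}) = F^TK^{-1}\mathbf{k}(\mathbf{x}) + F^TK^{-1}F\boldsymbol{\phi}, \qquad (3.55)$$
so that, by (3.53),
$$F^TK^{-1}\mathbf{k}(\mathbf{x}) + F^TK^{-1}F\boldsymbol{\phi} = \mathbf{f}(\mathbf{x}), \qquad (3.56)$$
$$\boldsymbol{\phi} = (F^TK^{-1}F)^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big). \qquad (3.57)$$
By substituting this solution for $\boldsymbol{\phi}$ back into (3.52), and multiplying by $K^{-1}$, we have
$$\boldsymbol{\lambda}(\mathbf{x}) = K^{-1}\Big(\mathbf{k}(\mathbf{x}) + FA^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big)\Big), \qquad A = F^TK^{-1}F. \qquad (3.58)$$
Thus, by substituting (3.58) back into (3.48), the Universal Kriging estimator, $Z_{UK}(\mathbf{x})$, of $Z(\mathbf{x})$ is
$$Z_{UK}(\mathbf{x}) = Z_n^TK^{-1}\Big(\mathbf{k}(\mathbf{x}) + FA^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big)\Big). \qquad (3.59)$$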
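In order to calculate the kriging estimate, we need to define the trend component functions, $f_l(\mathbf{x})$, in equation (3.46). [...] homogeneous. Therefore we have little idea of what the trend component functions should be. In most cases, the trend surface is assumed to be a polynomial of some general order $p$, where, for $\mathbf{x} = (x, y)$,
$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_0(\mathbf{x}) \\ \vdots \\ f_L(\mathbf{x}) \end{pmatrix} = \begin{cases} 1 & \text{if } p = 0, \\ (1, x, y)^T & \text{if } p = 1, \\ (1, x, y, x^2, xy, y^2)^T & \text{if } p = 2, \end{cases} \qquad \text{and} \qquad L = \begin{cases} 0 & \text{if } p = 0, \\ 2 & \text{if } p = 1, \\ 5 & \text{if } p = 2. \end{cases}$$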
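To make the ordering of the monomials concrete, here is a minimal Python/NumPy sketch (the helper `poly_trend` is a hypothetical name, and two-dimensional coordinates $\mathbf{x} = (x, y)$ are assumed) that builds the rows $\mathbf{f}(\mathbf{x})^T$ of $F$ for any order $p$:

```python
import numpy as np

def poly_trend(X, p):
    """Return the matrix whose rows are f(x)^T for all monomials x^a y^b, a + b <= p.

    X is an (n, 2) array of coordinates (x, y); columns are ordered by total
    degree: 1, x, y, x^2, xy, y^2, ...
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    x, y = X[:, 0], X[:, 1]
    cols = [x ** (d - b) * y ** b for d in range(p + 1) for b in range(d + 1)]
    return np.column_stack(cols)

# L + 1 = (p + 1)(p + 2) / 2 columns, so L = 0, 2, 5 for p = 0, 1, 2
for p in range(3):
    print(p, poly_trend([[2.0, 3.0]], p).shape[1] - 1)
```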
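If $p = 0$, the UK [...]
Since $Z_{UK}(\mathbf{x})$ is a (linear) unbiased estimator that minimises the variance of the error, it must necessarily estimate the exact values at the sample points with zero error variance. Therefore, assuming there to be no measurement error in the sample values, UK is an interpolator. This can be very difficult to see by looking at the UK estimator. We must go back to the original UK equations defined in (3.52) and (3.53). By letting $\mathbf{x}$ be a general sampled point $\mathbf{x}_i$, and noticing that $\mathbf{k}(\mathbf{x}_i)$ is the $i$th column of $K$ and $\mathbf{f}(\mathbf{x}_i)^T$ is the $i$th row of $F$, (3.52) and (3.53) can be written as:
$$K\boldsymbol{\lambda}(\mathbf{x}_i) = K\mathbf{u}_i + F\boldsymbol{\phi}, \qquad (3.60)$$
$$F^T\boldsymbol{\lambda}(\mathbf{x}_i) = F^T\mathbf{u}_i, \qquad (3.61)$$
where $\mathbf{u}_i$ is the $i$th unit vector. These are satisfied by
$$\lambda_i(\mathbf{x}_i) = 1, \qquad \lambda_j(\mathbf{x}_i) = 0 \quad \forall j \neq i,$$
and $\boldsymbol{\phi} = \mathbf{0}$. As we know that this system of equations has a unique solution, this must be it. Therefore $Z_{UK}(\mathbf{x}_i) = Z(\mathbf{x}_i)$ for all $\mathbf{x}_i$. By substituting $\boldsymbol{\lambda}(\mathbf{x}_i)$ into equation (2.13) for the error variance, it is easy to see that the terms cancel to leave a zero error variance, as expected.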
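$Z_{UK}(\mathbf{x})$ [...] then the predicted surface becomes discontinuous at the data points. As an example, if we assume that the trend is homogeneous and the surface variables at all points on the surface are completely uncorrelated, such that $p = 0$, and:
$$C(h) = \begin{cases} \sigma^2 & \text{if } h = 0, \\ 0 & \text{otherwise,} \end{cases} \qquad (3.62)$$
then away from the sample points $\mathbf{k}(\mathbf{x}) = \mathbf{0}$, $K = \sigma^2 I$ and $F = (1, \ldots, 1)^T$, so that (3.59) reduces to $\frac{1}{n}\sum_{i=1}^{n} z(\mathbf{x}_i)$, the mean of the sample values, which becomes discontinuous at the sample points.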
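Both properties can be checked numerically. The following Python/NumPy sketch implements (3.58) and (3.59) with explicit inverses, as written; the exponential covariance model, its range, the linear trend and the random data are all assumptions chosen for illustration. It confirms that UK reproduces a sample value exactly at a sample point, and that under the pure nugget model (3.62) with $p = 0$ the estimate away from the samples is the sample mean.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=10.0):
    """Assumed exponential covariance model C(h) = sill * exp(-h / a)."""
    return sill * np.exp(-h / a)

def linear_trend(X):
    """Rows f(x)^T = (1, x, y): a p = 1 polynomial trend."""
    X = np.atleast_2d(X)
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1]])

def uk_matheron(X, z, x0, cov=exp_cov, trend=linear_trend):
    """UK estimate (3.59) and weights (3.58) at x0, using explicit inverses."""
    K = cov(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))  # n x n
    k = cov(np.linalg.norm(X - x0, axis=1))                          # k(x0)
    F, f = trend(X), trend(x0)[0]                                    # F and f(x0)
    Kinv = np.linalg.inv(K)
    A = F.T @ Kinv @ F
    lam = Kinv @ (k + F @ np.linalg.inv(A) @ (f - F.T @ Kinv @ k))
    return lam @ z, lam

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(30, 2))
z = 0.05 * X[:, 0] + rng.normal(size=30)

est, lam = uk_matheron(X, z, X[4])             # predict at a sample point
print(np.isclose(est, z[4]))                   # UK interpolates

# Pure nugget model (3.62) with a constant trend (p = 0)
nugget = lambda h: np.where(h == 0, 1.0, 0.0)
const = lambda P: np.ones((len(np.atleast_2d(P)), 1))
est0, _ = uk_matheron(X, z, np.array([50.5, 50.5]), cov=nugget, trend=const)
print(np.isclose(est0, z.mean()))              # the sample mean
```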
3.1.1 Discussion
It is a well-known numerical fact that, though products and transposes of large matrices can be calculated fairly easily and accurately, inverses are a completely different proposition. As $K$ is an $n \times n$ matrix and $A$ an $(L+1) \times (L+1)$ matrix, calculating $K^{-1}$ and $A^{-1}$ [...] for any extensive data analysis that one uses one of the more numerically-based packages such as Matlab.
I wrote a program in Matlab to estimate a surface from a set of samples using the above explicit form of the UK estimator. Although it was no problem to calculate [...] accurately, $A^{-1}$ [...] $K^{-1}$ [...] square of this order of magnitude difference in its components, this will result in it having a near-zero determinant, or more precisely a very large condition number, and thus an inaccurate calculation of its inverse.
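The effect described above is easy to reproduce. This Python/NumPy sketch computes the condition number of $A = F^TK^{-1}F$ for a quadratic trend ($p = 2$) when the coordinates are large, hypothetical map coordinates in metres, and again after centring and rescaling them by one common factor (which leaves $K$ unchanged). The exponential covariance with range 200 m is an assumption; rescaling coordinates is a common remedy, shown only for comparison, not the approach taken in this thesis.

```python
import numpy as np

def poly_trend(X, p):
    """Rows f(x)^T: all monomials x^a y^b with a + b <= p."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x ** (d - b) * y ** b for d in range(p + 1) for b in range(d + 1)])

def cond_A(X, p=2, a=200.0):
    """Condition number of A = F^T K^{-1} F for an assumed exponential covariance."""
    K = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / a)
    F = poly_trend(X, p)
    return np.linalg.cond(F.T @ np.linalg.solve(K, F))

rng = np.random.default_rng(1)
X = rng.uniform(500_000, 501_000, size=(50, 2))   # hypothetical map coordinates (m)
raw = cond_A(X)
s = X.std(axis=0).mean()                          # one common scale keeps K unchanged
scaled = cond_A((X - X.mean(axis=0)) / s, a=200.0 / s)
print(f"raw: {raw:.1e}, centred and scaled: {scaled:.1e}")
```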
3.2 [...]
[...] $Z(\mathbf{x}_i)$, Ripley approaches the problem from a point of view analogous to that in time series analysis. He estimates the trend $\mu(\mathbf{x})$ directly and then models the residuals, $R(\mathbf{x}) = Z(\mathbf{x}) - \hat{\mu}(\mathbf{x})$, just as we did for $Z(\mathbf{x})$ before, but now assuming they are stationary with mean zero. The same basic model assumptions and notation as defined for Matheron's estimator still hold.
Before proceeding with the derivation, I define some well-known numerical decompositions which are used in the analysis:
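3.2.1 Cholesky factorisation
Any symmetric positive definite matrix $X$ can be factorised as
$$X = MM^T, \qquad (3.63)$$
where $M$ is lower triangular.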
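3.2.2 QR Decomposition
QR decomposition states that any $N \times (L+1)$ matrix $X$ can be decomposed into the product of an $N \times N$ orthogonal matrix $Q$ and an $N \times (L+1)$ upper triangular matrix $R$:
$$X = QR, \qquad (3.64)$$
where
$$Q^TQ = QQ^T = I \quad \text{and} \quad R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \qquad (3.65)$$
with $R_1$ an $(L+1) \times (L+1)$ upper triangular matrix.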
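3.2.3 Singular Value Decomposition
$$X = U\Sigma V^T, \qquad U^TU = UU^T = I, \qquad V^TV = VV^T = I, \qquad (3.66)$$
where $\Sigma$ is an $N \times (L+1)$ diagonal matrix of the singular values of $X$. [...] be found, for instance, in Moler (1967). The advantages of using SVD are discussed in Section 3.2.4.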
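For reference, all three decompositions are available in NumPy. This sketch (random test matrices, sizes chosen arbitrarily) checks the defining properties (3.63) to (3.66):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(6, 6))
X_spd = B @ B.T + 6 * np.eye(6)          # symmetric positive definite, like a covariance matrix
X_rect = rng.normal(size=(6, 3))          # an N x (L + 1) matrix with N = 6, L = 2

# Cholesky (3.63): X = M M^T with M lower triangular
M = np.linalg.cholesky(X_spd)

# QR (3.64)-(3.65): Q is N x N orthogonal, R = [R_1; 0] with R_1 upper triangular
Q, R = np.linalg.qr(X_rect, mode="complete")

# SVD (3.66): X = U Sigma V^T with U, V orthogonal and Sigma diagonal
U, s, Vt = np.linalg.svd(X_rect)          # full_matrices=True: U is 6x6, Vt is 3x3
Sigma = np.zeros((6, 3))
Sigma[:3, :3] = np.diag(s)

print(np.allclose(M @ M.T, X_spd), np.allclose(Q @ R, X_rect), np.allclose(U @ Sigma @ Vt, X_rect))
```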
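3.2.4 [...]
If the trend is assumed to be of the form
$$\mu(\mathbf{x}) = \boldsymbol{\beta}^T\mathbf{f}(\mathbf{x}), \qquad (3.67)$$
with the components of $\mathbf{f}(\mathbf{x})$ assumed known, then under the basic model, we arrive at the following vector equation for the ore grade variables at the sample points:
$$Z_n = F\boldsymbol{\beta} + \mathbf{e}_n, \quad \text{where} \quad \mathbf{e}_n = \begin{pmatrix} e(\mathbf{x}_1) \\ e(\mathbf{x}_2) \\ \vdots \\ e(\mathbf{x}_n) \end{pmatrix}, \qquad (3.68)$$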
where $Z_n$, $F$, and $\mathbf{f}(\mathbf{x})$ are as defined in the last section, and the residuals, $\mathbf{e}_n$, are assumed [...]
$$\min_{\mathbf{b}} (Z_n - F\mathbf{b})^TK^{-1}(Z_n - F\mathbf{b}) = \min_{\mathbf{b}} \mathbf{e}_n^TK^{-1}\mathbf{e}_n = \min_{\mathbf{b}} \sum_{i=1}^{n}\sum_{j=1}^{n} e(\mathbf{x}_i)\,(K^{-1})_{ij}\,e(\mathbf{x}_j), \qquad (3.69)$$
where $\mathbf{e}_n = (e(\mathbf{x}_1), \ldots, e(\mathbf{x}_n))^T$ is the vector of errors. The idea behind this is that if the ore grade variables $Z(\mathbf{x}_i)$ and $Z(\mathbf{x}_j)$ at the sampled points $\mathbf{x}_i$ and $\mathbf{x}_j$ are highly correlated, this reduces the worth of their combined information. Therefore, the product of their associated errors, $e(\mathbf{x}_i)$ and $e(\mathbf{x}_j)$, will have a reduced weighting by being multiplied by $(K^{-1})_{ij}$, where $K_{ij} = \mathrm{Cov}(Z(\mathbf{x}_i), Z(\mathbf{x}_j))$. This will reduce the effect that a cluster of points has on the polynomial fit, thus providing a better fit to the more isolated points.
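As $K$ is a covariance matrix, it is non-negative definite by construction, and positive definite whenever it is non-singular. Therefore we can use Cholesky factorisation as defined earlier to write $K^{-1}$ in terms of $M$:
$$K^{-1} = (MM^T)^{-1} = (M^T)^{-1}M^{-1} = M^{-T}M^{-1}, \qquad (3.70)$$
so that
$$\min_{\mathbf{b}} (Z_n - F\mathbf{b})^TM^{-T}M^{-1}(Z_n - F\mathbf{b}) = \min_{\mathbf{b}} (M^{-1}Z_n - M^{-1}F\mathbf{b})^T(M^{-1}Z_n - M^{-1}F\mathbf{b}). \qquad (3.71)$$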
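[...] $Z'(\mathbf{x}_i)$, and thus the residuals $e(\mathbf{x}_i)$. The new residuals of the transformed problem, $\varepsilon(\mathbf{x}_i)$, are uncorrelated random variables (independent under a gaussian assumption) with variances equal to one:
$$\mathrm{Cov}(\boldsymbol{\varepsilon}_n) = \mathrm{Cov}(Y_n) = \mathrm{Cov}(M^{-1}Z_n) = M^{-1}\mathrm{Cov}(Z_n)M^{-T} = M^{-1}KM^{-T} = M^{-1}MM^TM^{-T} = I. \qquad (3.72)$$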
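It then becomes clear that the generalised least squares estimator $\hat{\boldsymbol{\beta}}$ is the standard least squares estimator of the transformed problem:
$$M^{-1}Z_n = M^{-1}F\boldsymbol{\beta} + M^{-1}\mathbf{e}_n, \qquad (3.73)$$
$$Y_n = G\boldsymbol{\beta} + \boldsymbol{\varepsilon}_n, \qquad (3.74)$$
where
$$Y_n = M^{-1}Z_n, \qquad G = M^{-1}F \qquad \text{and} \qquad \boldsymbol{\varepsilon}_n = M^{-1}\mathbf{e}_n. \qquad (3.75)$$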
Although the transformed variables $Y(\mathbf{x}_i)$ are uncorrelated among themselves, they remain correlated with the unknown ore grade variable $Z(\mathbf{x})$ that we are trying to estimate, with new covariance $\mathbf{k}_Y(\mathbf{x})$:
$$\mathbf{k}_Y(\mathbf{x}) = \mathrm{Cov}(Z(\mathbf{x}), Y_n) = \mathrm{Cov}(Z(\mathbf{x}), M^{-1}Z_n) = M^{-1}\mathrm{Cov}(Z(\mathbf{x}), Z_n) = M^{-1}\mathbf{k}(\mathbf{x}). \qquad (3.76)$$
Notice that since $M$ is a lower triangular matrix, the transformations as defined in (3.75) can be carried out using a sequential substitution method. For example, $Y_n$ can be found by writing
$$MY_n = Z_n, \qquad (3.77)$$
that is,
$$\begin{aligned} M_{11}Y_1 &= Z(\mathbf{x}_1) \\ M_{21}Y_1 + M_{22}Y_2 &= Z(\mathbf{x}_2) \\ M_{31}Y_1 + M_{32}Y_2 + M_{33}Y_3 &= Z(\mathbf{x}_3) \\ &\;\;\vdots \\ M_{n1}Y_1 + M_{n2}Y_2 + \cdots + M_{nn}Y_n &= Z(\mathbf{x}_n). \end{aligned}$$
By solving the first equation for $Y_1$ and substituting it into the second equation, it becomes obvious how $Z_n$ [...]
As stated earlier, we can now use standard least squares methods to estimate $\boldsymbol{\beta}$ from the transformed data.
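The sequential substitution above can be written as a short loop. This Python/NumPy sketch (the function name and the exponential covariance used for the test are assumptions) computes $Y_n = M^{-1}Z_n$ without forming $M^{-1}$, and uses the same routine twice to confirm (3.72), that $M^{-1}KM^{-T} = I$:

```python
import numpy as np

def forward_substitution(M, b):
    """Solve M y = b for lower-triangular M row by row.

    b may be a vector (e.g. Z_n) or a matrix (e.g. F, giving G = M^{-1} F).
    """
    b = np.asarray(b, dtype=float)
    y = np.zeros_like(b)
    for i in range(len(b)):
        # M_i1 y_1 + ... + M_ii y_i = b_i, with y_1, ..., y_{i-1} already known
        y[i] = (b[i] - M[i, :i] @ y[:i]) / M[i, i]
    return y

rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, size=(20, 2))
K = np.exp(-np.linalg.norm(pts[:, None] - pts[None], axis=-1) / 0.3)  # assumed covariance
M = np.linalg.cholesky(K)
Z = rng.normal(size=20)

Y = forward_substitution(M, Z)                                 # Y_n = M^{-1} Z_n
W = forward_substitution(M, forward_substitution(M, K).T)      # M^{-1} K M^{-T}
print(np.allclose(M @ Y, Z), np.allclose(W, np.eye(20)))
```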
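[...] problem
Following from equation (3.71), we now want to find the standard least squares (LS) estimator of $\boldsymbol{\beta}$ that satisfies
$$\min_{\mathbf{b}} H = \min_{\mathbf{b}} (Y_n - G\mathbf{b})^T(Y_n - G\mathbf{b}). \qquad (3.78)$$
Expanding,
$$H = Y_n^TY_n - 2\mathbf{b}^TG^TY_n + \mathbf{b}^TG^TG\mathbf{b}, \qquad (3.79)$$
and setting the derivative to zero,
$$\frac{\partial H}{\partial \mathbf{b}} = -2G^TY_n + 2G^TG\mathbf{b} = 0 \;\Rightarrow\; \hat{\boldsymbol{\beta}} = (G^TG)^{-1}G^TY_n. \qquad (3.80)$$
As $G^TG = F^TK^{-1}F = A$, in calculating $\hat{\boldsymbol{\beta}}$, as defined above, we run into exactly the same problem as before, of finding the inverse of an ill-conditioned matrix. In this [...] developed to avoid this problem, where the columns of $G$ are orthogonalised using the QR decomposition:
$$G = QR, \qquad (3.81)$$
where [...]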
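$$\min_{\mathbf{b}} (Y_n - G\mathbf{b})^T(Y_n - G\mathbf{b}) = \min_{\mathbf{b}} (Y_n - G\mathbf{b})^TQQ^T(Y_n - G\mathbf{b}) = \min_{\mathbf{b}} \big(Q^T(Y_n - G\mathbf{b})\big)^T\big(Q^T(Y_n - G\mathbf{b})\big) = \min_{\mathbf{b}} (Q^TY_n - R\mathbf{b})^T(Q^TY_n - R\mathbf{b}). \qquad (3.82)$$
Now if we define the vector
$$Q^TY_n = \begin{pmatrix} J_1 \\ J_2 \end{pmatrix}, \qquad (3.83)$$
where $J_1$ has $L+1$ components, (3.82) becomes
$$\min_{\mathbf{b}} \left\| \begin{pmatrix} J_1 - R_1\mathbf{b} \\ J_2 \end{pmatrix} \right\|^2 = \min_{\mathbf{b}} \|J_1 - R_1\mathbf{b}\|^2 + \|J_2\|^2, \qquad (3.84)$$
which is minimised by the $\hat{\boldsymbol{\beta}}$ satisfying
$$J_1 = R_1\hat{\boldsymbol{\beta}}. \qquad (3.85)$$
Since $R_1$ is an upper triangular matrix, $\hat{\boldsymbol{\beta}}$ can be found by applying the sequential substitution method as before, working upwards from the last equation.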
We can summarise the above in the following steps:
1) [...]
2) [...]
3) [...] functions.
4) Transform the sample data $Z_n$ to $Y_n$ by solving $MY_n = Z_n$ for $Y(\mathbf{x}_i)$.
5) [...]
6) [...]
7) Calculate $Q^TY_n$ [...]
8) [...] to find $\hat{\boldsymbol{\beta}}$.
Thus we have found the GLS estimator $\hat{\boldsymbol{\beta}}$ and have an estimate $\hat{\mu}(\mathbf{x}) = \hat{\boldsymbol{\beta}}^T\mathbf{f}(\mathbf{x})$ of the trend.
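The Cholesky and QR route to $\hat{\boldsymbol{\beta}}$ derived above can be sketched in Python/NumPy as follows. This is a minimal illustration, not the author's Matlab program: the function name, the exponential covariance and the simulated data are assumptions, and `np.linalg.solve` is used for the triangular systems for brevity (a dedicated triangular solver such as SciPy's `scipy.linalg.solve_triangular` would exploit the structure). The result is compared with the explicit form $A^{-1}F^TK^{-1}Z_n$ on well-conditioned data.

```python
import numpy as np

def gls_trend_qr(K, F, z):
    """GLS estimate of beta via Cholesky whitening and a QR decomposition."""
    M = np.linalg.cholesky(K)                 # K = M M^T
    Y = np.linalg.solve(M, z)                 # Y_n = M^{-1} Z_n
    G = np.linalg.solve(M, F)                 # G = M^{-1} F
    Q, R = np.linalg.qr(G, mode="complete")   # G = Q R, R = [R_1; 0]
    m = F.shape[1]                            # L + 1
    J1 = (Q.T @ Y)[:m]                        # first L + 1 components of Q^T Y_n, (3.83)
    return np.linalg.solve(R[:m], J1)         # R_1 beta = J_1, (3.85)

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(40, 2))
K = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / 0.25)   # assumed covariance
F = np.column_stack([np.ones(40), X])                               # p = 1 trend: 1, x, y
z = F @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=40)

beta_qr = gls_trend_qr(K, F, z)
Kinv = np.linalg.inv(K)
beta_direct = np.linalg.solve(F.T @ Kinv @ F, F.T @ Kinv @ z)      # A^{-1} F^T K^{-1} Z_n
print(np.allclose(beta_qr, beta_direct))
```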
At this point it is worth noting that the above method of avoiding roundoff error for the ill-conditioned LS problem is only one of many. An alternative method called singular value decomposition (SVD), as defined in Section 3.2.3, is probably the most widely used, as it is not only very stable, but it also decomposes $G$ in such a way that it reveals a great deal about its structure and stability. By allowing the practitioner to keep an eye on the condition number of $G^TG$, [...] are so corrupted by roundoff error that they have the effect of pulling $\hat{\boldsymbol{\beta}}$ away from the LS solution. The final SVD solution $\boldsymbol{\beta}^*$ for the linear system $G\boldsymbol{\beta} = Y_n$ is:
$$\boldsymbol{\beta}^* = V\Sigma_*^{-1}U^TY_n, \qquad (3.86)$$
where $\Sigma_*^{-1}$ is $\Sigma^{-1}$ with [...]
It is also extremely versatile in its application. If $G$ is square and nonsingular, then $\boldsymbol{\beta}^*$ is the exact solution. If $G$ is singular, whether it is square or not, then if $Y_n$ is in the range of $G$, there are an infinite number of solutions, and $\boldsymbol{\beta}^*$ is the one with the smallest length $\boldsymbol{\beta}^{*T}\boldsymbol{\beta}^*$, for increased stability. Lastly, and most importantly, if $Y_n$ [...]
Despite the versatility and stability of this method, Ripley's method is more than adequate for the LS problems that I encounter in Chapter 5.
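A minimal Python/NumPy sketch of the SVD solution (3.86) follows; the relative tolerance `rtol` used to decide which singular values count as zero is an assumption. For a full-rank $G$ it agrees with ordinary least squares, and for a rank-deficient $G$ it returns the solution of smallest length.

```python
import numpy as np

def svd_solve(G, Y, rtol=1e-10):
    """beta* = V Sigma_*^{-1} U^T Y, with reciprocals of negligible singular values set to zero.

    rtol is an assumed relative tolerance for deciding which singular values are negligible.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rtol * s[0]                     # singular values are sorted in decreasing order
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ Y))

rng = np.random.default_rng(5)
G = rng.normal(size=(10, 3))
Y = rng.normal(size=10)
print(np.allclose(svd_solve(G, Y), np.linalg.lstsq(G, Y, rcond=None)[0]))  # full rank: LS solution

a = rng.normal(size=10)
G2 = np.column_stack([a, a])                   # singular: two identical columns
print(svd_solve(G2, 2 * a))                    # smallest-length solution, close to [1, 1]
```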
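3.2.5 Step 2: Modelling the residuals, $R(\mathbf{x})$
Once the trend has been estimated, we then find the residuals, $r(\mathbf{x}_i)$, such that
$$r(\mathbf{x}_i) = z(\mathbf{x}_i) - \hat{\mu}(\mathbf{x}_i) = z(\mathbf{x}_i) - \hat{\boldsymbol{\beta}}^T\mathbf{f}(\mathbf{x}_i), \qquad \forall \mathbf{x}_i, \qquad (3.87)$$
and
$$\mathbf{r}_n = \begin{pmatrix} r(\mathbf{x}_1) \\ r(\mathbf{x}_2) \\ \vdots \\ r(\mathbf{x}_n) \end{pmatrix}. \qquad (3.88)$$
The residual $r(\mathbf{x}_i)$ for each sample is seen as a realisation of the residual variable $R(\mathbf{x}_i) = Z(\mathbf{x}_i) - \hat{\mu}(\mathbf{x}_i)$. Under the assumption that $\hat{\mu}(\mathbf{x})$ is the actual trend, $R(\mathbf{x})$ is equal to the theoretical residual random function $e(\mathbf{x})$ defined earlier. Therefore, we assume that $R(\mathbf{x})$ is a homogeneous random function such that
$$E(R(\mathbf{x})) = 0 \quad \forall \mathbf{x}, \qquad \mathrm{Var}(R(\mathbf{x})) = \mathrm{Var}(Z(\mathbf{x})) = \sigma^2(\mathbf{x}) \quad \forall \mathbf{x},$$
[...]
$$\hat{R}_{OK}(\mathbf{x}) = \sum_{i=1}^{n} \eta_i(\mathbf{x})R(\mathbf{x}_i) = R_n^T\boldsymbol{\eta}(\mathbf{x}), \qquad (3.89)$$
where
$$K\boldsymbol{\eta}(\mathbf{x}) = \mathbf{k}(\mathbf{x}). \qquad (3.90)$$
Since $K = MM^T$, we can solve $M\mathbf{v}(\mathbf{x}) = \mathbf{k}(\mathbf{x})$ for $\mathbf{v}(\mathbf{x})$, and then solve $M^T\boldsymbol{\eta}(\mathbf{x}) = \mathbf{v}(\mathbf{x})$ for $\boldsymbol{\eta}(\mathbf{x})$, using the [...]
1) [...]
2) [...]
3) [...]
4) Solve $M^T\boldsymbol{\eta}(\mathbf{x}) = \mathbf{v}(\mathbf{x})$ for $\boldsymbol{\eta}(\mathbf{x})$ using the reversed substitution method
[...]
$$Z_{UK}(\mathbf{x}) = \hat{\mu}(\mathbf{x}) + \hat{R}_{OK}(\mathbf{x}) = \hat{\boldsymbol{\beta}}^T\mathbf{f}(\mathbf{x}) + R_n^T\boldsymbol{\eta}(\mathbf{x}). \qquad (3.91)$$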
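The two triangular solves can be sketched as follows (Python/NumPy; the function name, the exponential covariance and the stand-in residuals are assumptions). It checks that the resulting weights satisfy (3.90):

```python
import numpy as np

def ripley_residual_prediction(K, k, r):
    """R_hat(x) = r_n^T eta(x), with K eta(x) = k(x) solved through the Cholesky factor."""
    M = np.linalg.cholesky(K)            # K = M M^T
    v = np.linalg.solve(M, k)            # M v(x) = k(x): forward substitution step
    eta = np.linalg.solve(M.T, v)        # M^T eta(x) = v(x): reversed substitution step
    return r @ eta, eta

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(25, 2))
x0 = np.array([0.3, 0.7])
K = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / 0.25)   # assumed covariance
k = np.exp(-np.linalg.norm(X - x0, axis=1) / 0.25)
r = rng.normal(size=25)                                              # stand-in residuals r_n
R_hat, eta = ripley_residual_prediction(K, k, r)
print(np.allclose(K @ eta, k))
```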
3.3 [...]
It is far from obvious that Ripley's estimator, as defined in equation (3.91), is equivalent to Matheron's estimator, as defined in equation (3.59). Therefore I now show that the two forms of UK estimation are equivalent.
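3.3.1 Finding $\hat{\boldsymbol{\beta}}$ explicitly
$$R_1^TR_1 = R^TR = R^TQ^TQR = G^TG = A \;\Rightarrow\; R_1^{-1} = A^{-1}R_1^T, \qquad (3.92)$$
so that
$$\hat{\boldsymbol{\beta}} = R_1^{-1}J_1 = A^{-1}R_1^TJ_1 = A^{-1}R^T\begin{pmatrix} J_1 \\ J_2 \end{pmatrix} = A^{-1}R^T(Q^TY_n) = A^{-1}R^T(Q^TM^{-1}Z_n). \qquad (3.93)$$
By noting that $R = Q^TG = Q^TM^{-1}F$, so that $R^T = F^TM^{-T}Q$, in equation (3.93), we obtain
$$\hat{\boldsymbol{\beta}} = A^{-1}F^TM^{-T}QQ^TM^{-1}Z_n = A^{-1}F^TM^{-T}M^{-1}Z_n = A^{-1}F^TK^{-1}Z_n. \qquad (3.94)$$
3.3.2 Finding $\hat{R}_{OK}(\mathbf{x})$ explicitly
As $\hat{R}_{OK}(\mathbf{x})$ is defined as
$$\hat{R}_{OK}(\mathbf{x}) = R_n^T\big(K^{-1}\mathbf{k}(\mathbf{x})\big), \qquad (3.95)$$
[...] (3.96)
[...] explicitly as
$$R_n^T = Z_n^T - Z_n^TK^{-1}FA^{-1}F^T. \qquad (3.97)$$
By substituting these into equation (3.91), we get an explicit expression for Ripley's UK estimator that is identical to Matheron's estimator:
$$\begin{aligned} Z_{UK}(\mathbf{x}) &= \hat{\mu}(\mathbf{x}) + \hat{R}_{OK}(\mathbf{x}) \\ &= Z_n^TK^{-1}FA^{-1}\mathbf{f}(\mathbf{x}) + Z_n^T\big(I - K^{-1}FA^{-1}F^T\big)K^{-1}\mathbf{k}(\mathbf{x}) \\ &= Z_n^TK^{-1}\Big(\mathbf{k}(\mathbf{x}) + FA^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big)\Big). \end{aligned}$$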
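The equivalence can also be checked numerically. This Python/NumPy sketch computes the estimate once from Matheron's explicit form (3.59) and once by Ripley's route (GLS trend via Cholesky and QR, then kriging of the residuals); the covariance model, trend and data are illustrative assumptions.

```python
import numpy as np

def matheron_uk(K, F, z, k, f):
    """Explicit form (3.59)."""
    Kinv = np.linalg.inv(K)
    A = F.T @ Kinv @ F
    return z @ Kinv @ (k + F @ np.linalg.solve(A, f - F.T @ Kinv @ k))

def ripley_uk(K, F, z, k, f):
    """Two-step form (3.91): GLS trend via Cholesky/QR, then kriging of the residuals."""
    M = np.linalg.cholesky(K)
    Q, R1 = np.linalg.qr(np.linalg.solve(M, F))               # reduced QR of G = M^{-1} F
    beta = np.linalg.solve(R1, Q.T @ np.linalg.solve(M, z))   # R_1 beta = J_1
    r = z - F @ beta                                          # residuals (3.87)
    eta = np.linalg.solve(M.T, np.linalg.solve(M, k))         # K eta = k, (3.90)
    return f @ beta + r @ eta

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(30, 2))
x0 = np.array([0.45, 0.55])
cov = lambda h: np.exp(-h / 0.3)                              # assumed covariance model
K = cov(np.linalg.norm(X[:, None] - X[None], axis=-1))
k = cov(np.linalg.norm(X - x0, axis=1))
F = np.column_stack([np.ones(30), X])
f = np.array([1.0, *x0])
z = F @ np.array([1.0, 2.0, -3.0]) + rng.normal(size=30)

print(np.isclose(matheron_uk(K, F, z, k, f), ripley_uk(K, F, z, k, f)))
```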
3.4 Calculating $\sigma^2_{UK}(\mathbf{x})$, the error variance of $Z_{UK}(\mathbf{x})$
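The advantage of having a statistical rather than deterministic estimator of the ore grade, such as $Z_{UK}(\mathbf{x})$, is that one can estimate its error variance, $\sigma^2_{UK}(\mathbf{x}) = \mathrm{Var}(Z(\mathbf{x}) - Z_{UK}(\mathbf{x}))$, and thus have some idea of the magnitude of the estimation error.
The UK error variance is as derived in equation (2.13):
$$\sigma^2_{UK}(\mathbf{x}) = \sigma^2(\mathbf{x}) + \boldsymbol{\lambda}(\mathbf{x})^TK\boldsymbol{\lambda}(\mathbf{x}) - 2\boldsymbol{\lambda}(\mathbf{x})^T\mathbf{k}(\mathbf{x}), \qquad (3.98)$$
where $\boldsymbol{\lambda}(\mathbf{x})$ is as defined in equation (3.58). By substituting $\boldsymbol{\lambda}(\mathbf{x})$ into (3.98), and simplifying the expanded equation using the fact that $F^TK^{-1}F = A$, we obtain
$$\sigma^2_{UK}(\mathbf{x}) = \sigma^2(\mathbf{x}) - \mathbf{k}(\mathbf{x})^TK^{-1}\mathbf{k}(\mathbf{x}) + \big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big)^TA^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TK^{-1}\mathbf{k}(\mathbf{x})\big). \qquad (3.99)$$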
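For ease of calculation, we can write $\sigma^2_{UK}(\mathbf{x})$ in terms of $G$, $\mathbf{k}_Y(\mathbf{x})$ and $R$, which have already been calculated:
$$\begin{aligned} \sigma^2_{UK}(\mathbf{x}) &= \sigma^2(\mathbf{x}) - \mathbf{k}(\mathbf{x})^TM^{-T}M^{-1}\mathbf{k}(\mathbf{x}) + \big(\mathbf{f}(\mathbf{x}) - F^TM^{-T}M^{-1}\mathbf{k}(\mathbf{x})\big)^TA^{-1}\big(\mathbf{f}(\mathbf{x}) - F^TM^{-T}M^{-1}\mathbf{k}(\mathbf{x})\big) \\ &= \sigma^2(\mathbf{x}) - \mathbf{k}_Y(\mathbf{x})^T\mathbf{k}_Y(\mathbf{x}) + \big(\mathbf{f}(\mathbf{x}) - G^T\mathbf{k}_Y(\mathbf{x})\big)^T(R_1^TR_1)^{-1}\big(\mathbf{f}(\mathbf{x}) - G^T\mathbf{k}_Y(\mathbf{x})\big) \qquad (3.100) \\ &= \sigma^2(\mathbf{x}) - \mathbf{k}_Y(\mathbf{x})^T\mathbf{k}_Y(\mathbf{x}) + \mathbf{h}^T\mathbf{h}, \qquad (3.101) \end{aligned}$$
where
$$\mathbf{h} = R_1^{-T}\big(\mathbf{f}(\mathbf{x}) - G^T\mathbf{k}_Y(\mathbf{x})\big),$$
which can be found by sequential substitution from $R_1^T\mathbf{h} = \mathbf{f}(\mathbf{x}) - G^T\mathbf{k}_Y(\mathbf{x})$.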
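As a check on the algebra, this Python/NumPy sketch evaluates $\sigma^2_{UK}(\mathbf{x})$ both from (3.99) and from the Cholesky/QR form (3.101), using an assumed exponential covariance with sill $\sigma^2 = 1$. The two agree, and the variance vanishes at a sample point, as shown in Section 3.1.

```python
import numpy as np

def uk_error_variance(K, F, k, f, sigma2):
    """sigma^2_UK(x) computed from (3.99) and from the Cholesky/QR form (3.101)."""
    Kinv_k = np.linalg.solve(K, k)
    d = f - F.T @ Kinv_k
    A = F.T @ np.linalg.solve(K, F)
    v99 = sigma2 - k @ Kinv_k + d @ np.linalg.solve(A, d)
    M = np.linalg.cholesky(K)
    G = np.linalg.solve(M, F)
    kY = np.linalg.solve(M, k)                      # k_Y(x) = M^{-1} k(x), (3.76)
    R1 = np.linalg.qr(G, mode="r")                  # R_1, with R_1^T R_1 = G^T G = A
    h = np.linalg.solve(R1.T, f - G.T @ kY)         # R_1^T h = f(x) - G^T k_Y(x)
    v101 = sigma2 - kY @ kY + h @ h
    return v99, v101

rng = np.random.default_rng(8)
X = rng.uniform(0, 1, size=(30, 2))
cov = lambda h: np.exp(-h / 0.3)                    # assumed model with sill 1
K = cov(np.linalg.norm(X[:, None] - X[None], axis=-1))
F = np.column_stack([np.ones(30), X])
x0 = np.array([0.2, 0.9])
v99, v101 = uk_error_variance(K, F, cov(np.linalg.norm(X - x0, axis=1)), np.array([1.0, *x0]), 1.0)
print(np.isclose(v99, v101), v99 > 0)
at_sample = uk_error_variance(K, F, K[:, 5], F[5], 1.0)   # x = x_6, a sample point
print(np.allclose(at_sample, 0.0, atol=1e-8))
```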
3.5 [...]
[...] data. For this reason, the UK estimator stated earlier is somewhat misleading. Before going any further, I shall define $C(\mathbf{x}, \mathbf{y}; \theta)$ as a parametric class of models assumed to contain the true covariance function, and I shall restate Ripley's version of the estimator in equation (3.91) so as to include this information:
$$Z_{UK}(\mathbf{x}) = \hat{\boldsymbol{\beta}}(\hat{\theta})^T\mathbf{f}(\mathbf{x}) + R_n(\hat{\theta})^T\boldsymbol{\eta}(\mathbf{x}; \hat{\theta}), \qquad (3.102)$$
[...] estimated directly from the sample data using, for example, Matheron's estimator (Matheron 1969):
$$2\hat{\gamma}(h) = \frac{1}{|N(h)|}\sum_{N(h)} \big[z(\mathbf{x}_i) - z(\mathbf{x}_j)\big]^2, \qquad (3.103)$$
where $N(h)$ is the set of sample pairs $(\mathbf{x}_i, \mathbf{x}_j)$ separated by the lag $h$, and $|N(h)|$ is the number of pairs in $N(h)$.
There are other more robust estimators which will be discussed in Chapter 5, but the important fact is that the estimation is unbiased.
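A minimal Python/NumPy sketch of Matheron's estimator (3.103) follows. For scattered data the pairs are pooled into a distance class $\big|\,|\mathbf{x}_i - \mathbf{x}_j| - h\,\big| \leq$ `tol`; this omnidirectional binning is an assumption, as is the toy transect used to exercise it.

```python
import numpy as np

def matheron_variogram(X, z, h, tol):
    """2*gamma_hat(h) of (3.103), pooling all pairs whose separation is within tol of h."""
    i, j = np.triu_indices(len(z), k=1)                 # each pair (x_i, x_j) once
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    in_class = np.abs(dist - h) <= tol                   # the pairs making up N(h)
    diffs = z[i[in_class]] - z[j[in_class]]
    return float(np.mean(diffs ** 2)), int(in_class.sum())   # (2*gamma_hat(h), |N(h)|)

# A transect of 10 equally spaced points on the linear surface z = x
X = np.column_stack([np.arange(10.0), np.zeros(10)])
z = np.arange(10.0)
print(matheron_variogram(X, z, h=1.0, tol=0.5))   # (1.0, 9)
print(matheron_variogram(X, z, h=2.0, tol=0.5))   # (4.0, 8)
```

On this purely linear surface the estimate grows like $h^2$, illustrating how an unremoved trend contaminates the experimental variogram.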
assumptions, and using the filtering property of the kriging equations, the covariance
function is equivalent to the negative of the variogram for the purposes of finding the
optimal kriging weights. Therefore, assuming an appropriate parametric model is fitted
to the experimental variogram, estimation with a homogeneous trend is unbiased.
On the other hand, when the trend is nonhomogeneous, as for UK, estimation of the covariance function poses a larger problem. The process of estimating the trend, calculating the residual estimates at the sample points, and from these estimating $C(h)$, introduces a substantial bias (Starks and Fang 1982a). Even if we were to know the correct covariance matrix $K$, the process of estimating the trend by the optimal unbiased GLS estimator $\hat{\mu}(\mathbf{x})$, and estimating the covariance function from the resulting residuals $R_n$, is biased. This is demonstrated by the following:
$$\hat{\mu}(\mathbf{x}) = \mathbf{f}(\mathbf{x})^T\hat{\boldsymbol{\beta}} = \mathbf{f}(\mathbf{x})^TA^{-1}F^TK^{-1}Z_n,$$
$$R_n = Z_n - F\hat{\boldsymbol{\beta}} = (I - FA^{-1}F^TK^{-1})Z_n = BZ_n.$$
Therefore,
$$E(R_n) = BF\boldsymbol{\beta} = F\boldsymbol{\beta} - FA^{-1}(F^TK^{-1}F)\boldsymbol{\beta} = \mathbf{0}, \qquad (3.104)$$
but
$$\begin{aligned} \mathrm{Cov}(R_n) &= B\,\mathrm{Cov}(Z_n)B^T = (I - FA^{-1}F^TK^{-1})K(I - K^{-1}FA^{-1}F^T) \\ &= K - FA^{-1}F^T - FA^{-1}F^T + FA^{-1}F^TK^{-1}FA^{-1}F^T \\ &= K - FA^{-1}F^T \neq K = \mathrm{Cov}(Z_n). \end{aligned} \qquad (3.105)$$
Therefore, by using an estimator of the trend, we have introduced a bias, such that the variance of the new residual process $R(\mathbf{x})$ is different from that of the original data $Z_n$.
[...] $= I\sigma^2$, [...]
[...] function converges. It has been demonstrated (Cressie, 1987) that this back-substitution will not remove the bias in the covariance function, concluding that the best any iteration method can do is converge to a biased estimator.
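The bias in (3.105) can be confirmed numerically even when $K$ is known exactly. This Python/NumPy sketch (assumed exponential covariance, linear trend, random sample locations) forms $B$ and checks (3.104), (3.105) and the resulting underestimation of every residual variance, since $FA^{-1}F^T$ has a positive diagonal:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 25
X = rng.uniform(0, 1, size=(n, 2))
K = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / 0.3)   # assumed true covariance
F = np.column_stack([np.ones(n), X])                                # p = 1 trend

Kinv = np.linalg.inv(K)
A = F.T @ Kinv @ F
B = np.eye(n) - F @ np.linalg.solve(A, F.T @ Kinv)   # R_n = B Z_n
cov_R = B @ K @ B.T                                  # Cov(R_n) = B Cov(Z_n) B^T

print(np.allclose(B @ F, 0))                                   # E(R_n) = B F beta = 0, (3.104)
print(np.allclose(cov_R, K - F @ np.linalg.solve(A, F.T)))     # (3.105)
print(np.all(np.diag(cov_R) < np.diag(K)))                     # residual variances too small
```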
This problem with UK was recognised soon after its development in 1969, and by 1973 Matheron had already come up with an apparent solution to this problem in the form of Intrinsic Random Functions of order $p$ (Matheron 1973), where $p$ is the order of the [...]
3.5.1 [...]
Even today, UK and IRFp are referred to as two different methods of estimation, but in fact, as best linear unbiased estimators, they are identical (Christensen 1990). It is best to think of IRFp as just a different way of looking at the UK problem, resulting in the unbiased estimation of the generalised covariance function (GCF) $K_p(h)$, rather than [...] increments of the sample variables are found such that the resulting [...]
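[...] $\mathbf{x}_1, \ldots, \mathbf{x}_m$, a generalised increment (GI) of order $p$ [...]
$$\sum_{i=1}^{m} \nu_iZ(\mathbf{x}_i) = \boldsymbol{\nu}^TZ \quad \text{such that} \quad \sum_{i=1}^{m} \nu_if_l(\mathbf{x}_i) = 0, \qquad l = 0, \ldots, L, \qquad (3.106)$$
where [...] $f_l(\mathbf{x})$, and $p$ is the order of the trend. Some $\nu_i$, but not all, can be zero valued.
These GIs have the property of filtering out any linear combination of the trend component functions, $\sum_{l=0}^{L} d_lf_l(\mathbf{x})$:
$$\sum_{i=1}^{m} \nu_i\sum_{l=0}^{L} d_lf_l(\mathbf{x}_i) = \sum_{l=0}^{L} d_l\sum_{i=1}^{m} \nu_if_l(\mathbf{x}_i) = 0.$$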
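Any GI of a given order can be generated from the null space of $F^T$. This Python/NumPy sketch (the function name and tolerance are assumptions) finds a basis for all GIs of order 1 on eight random points via the SVD, and checks that each one filters out an arbitrary linear trend:

```python
import numpy as np

def generalised_increments(F, rtol=1e-10):
    """Columns spanning {nu : F^T nu = 0}, i.e. every GI of the order defined by F (3.106)."""
    _, s, Vt = np.linalg.svd(F.T)                  # F^T is (L + 1) x m
    rank = int(np.sum(s > rtol * s[0]))
    return Vt[rank:].T                              # m x (m - rank)

rng = np.random.default_rng(10)
pts = rng.uniform(0, 1, size=(8, 2))
F = np.column_stack([np.ones(8), pts])             # order p = 1: f = (1, x, y)
N = generalised_increments(F)
print(N.shape)                                      # (8, 5)

trend = F @ rng.normal(size=3)                      # any linear combination of 1, x, y
print(np.allclose(N.T @ trend, 0))                  # every GI filters it out
```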
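An intrinsic random function of order $p$ (IRFp) [...] $Z(\mathbf{x})$ [...] $\sum_{i} \nu_iZ(\mathbf{x}_i)$ [...]
$$\sum_{i=1}^{m} \nu_iZ(\mathbf{x}_i + \mathbf{x}) \qquad (3.107)$$
[...] and variance, for any configuration of sample points $\mathbf{x}_1, \ldots, \mathbf{x}_m$, and any GI vector $\boldsymbol{\nu}$:
$$\begin{aligned} \sum_{i=1}^{m} \nu_iZ(\mathbf{x}_i + \mathbf{x}) &= \sum_{i=1}^{m} \nu_i\Big(\sum_{l=0}^{L} \beta_lf_l(\mathbf{x}_i + \mathbf{x}) + e(\mathbf{x}_i + \mathbf{x})\Big) \\ &= \sum_{l=0}^{L}\sum_{s=0}^{L} \beta_lb_{ls}(\mathbf{x})\sum_{i=1}^{m} \nu_if_s(\mathbf{x}_i) + \sum_{i=1}^{m} \nu_ie(\mathbf{x}_i + \mathbf{x}) \\ &= \sum_{i=1}^{m} \nu_ie(\mathbf{x}_i + \mathbf{x}), \end{aligned} \qquad (3.108)$$
which is thus stationary in both its mean and variance, for any configuration of sample points $\mathbf{x}_1, \ldots, \mathbf{x}_m$ and any GI vector $\boldsymbol{\nu}$. This is using the fact that the monomial trend component functions $f_l$ are closed under translation, so that $f_l(\mathbf{x}_i + \mathbf{x}) = \sum_{s=0}^{L} b_{ls}(\mathbf{x})f_s(\mathbf{x}_i)$ for some coefficients $b_{ls}(\mathbf{x})$, as used above. Thus $Z(\mathbf{x})$ [...] the class of IRFp:
$$\mathrm{Var}\Big(\sum_{i=1}^{m} \nu_iZ(\mathbf{x}_i)\Big) = \mathrm{Var}\Big(\sum_{i=1}^{m} \nu_ie(\mathbf{x}_i)\Big) = \sum_{i=1}^{m}\sum_{j=1}^{m} \nu_i\nu_jK_p(\mathbf{x}_i - \mathbf{x}_j) \qquad (3.109)$$
[...] is finally clarified.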
[...] equivalent functions whose covariance functions could be used by UK to give the same results. He points out that in forming the IRFp, the unbiasedness conditions essentially divide the covariance function $C(\mathbf{x}, \mathbf{y})$ into an effective part, the GCF $K_p(\mathbf{x}, \mathbf{y})$, and a redundant part $R(\mathbf{x}, \mathbf{y})$:
$$C(\mathbf{x}, \mathbf{y}) = K_p(\mathbf{x}, \mathbf{y}) + R(\mathbf{x}, \mathbf{y}), \qquad (3.110)$$
where $K_p(\mathbf{x}, \mathbf{y})$ [...] each function. By filtering out the polynomial trend, the GIs essentially filter out this [...]
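[...] $\mathbf{x}_1, \ldots, \mathbf{x}_n$ for UK, with $\mathbf{x}_0$ being [...]
$$Z(\mathbf{x}_0) - Z_{UK}(\mathbf{x}_0) = Z(\mathbf{x}_0) - \sum_{i=1}^{n} \lambda_i(\mathbf{x}_0)Z(\mathbf{x}_i) = \sum_{i=0}^{n} \nu_iZ(\mathbf{x}_i), \qquad (3.111)$$
where
$$\nu_i = \begin{cases} 1 & \text{if } i = 0, \\ -\lambda_i(\mathbf{x}_0) & \text{if } i \neq 0. \end{cases} \qquad (3.112)$$
[...] $l = 0, \ldots, L$, can be written as:
$$\sum_{i=0}^{n} \nu_if_l(\mathbf{x}_i) = 0, \qquad l = 0, \ldots, L. \qquad (3.113)$$
[...] $\sigma^2_{UK}(\mathbf{x}_0)$ [...] with respect to the unbiasedness constraints, as for UK, producing the same set of equations for the kriging weights $\lambda_i(\mathbf{x}_0)$ and the Lagrangian parameters $\phi_l$, but with the covariance function replaced by the GCF. Thus, for kriging purposes, instead of finding the specific covariance function $C$ of the random function $Z(\mathbf{x})$, [...] characterising the class of random functions IRFp.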
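The claim that the kriging error is a GI can be checked directly. This Python/NumPy sketch (assumed exponential covariance, linear trend, random data) computes the UK weights from (3.58), forms $\boldsymbol{\nu}$ as in (3.112), and verifies (3.113):

```python
import numpy as np

rng = np.random.default_rng(11)
X = rng.uniform(0, 1, size=(20, 2))
x0 = np.array([0.5, 0.5])
cov = lambda h: np.exp(-h / 0.3)                       # assumed covariance model
K = cov(np.linalg.norm(X[:, None] - X[None], axis=-1))
k = cov(np.linalg.norm(X - x0, axis=1))
F = np.column_stack([np.ones(20), X])
f = np.array([1.0, *x0])

# UK weights (3.58)
Kinv_k, Kinv_F = np.linalg.solve(K, k), np.linalg.solve(K, F)
lam = Kinv_k + Kinv_F @ np.linalg.solve(F.T @ Kinv_F, f - F.T @ Kinv_k)

# Kriging error as a GI (3.111)-(3.112): nu_0 = 1, nu_i = -lambda_i(x_0)
nu = np.concatenate([[1.0], -lam])
F_all = np.vstack([f, F])                             # rows f(x_0)^T, f(x_1)^T, ..., f(x_n)^T
print(np.allclose(F_all.T @ nu, 0))                   # (3.113)
```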
[...] function which satisfies (3.109) for any GI satisfying (3.111) and (3.112). Assuming $N$ such GIs, $I_1, \ldots, I_N$, can be found, where $I_i = \sum_{j=1}^{n} \nu_{ij}Z(\mathbf{x}_j)$, $K_p(h)$ is estimated by a [...]
$$\begin{aligned} R &= \sum_{i=1}^{N}\big[I_i^2 - E(I_i^2)\big]^2 \\ &= \sum_{i=1}^{N}\Big[\Big(\sum_{j=1}^{n} \nu_{ij}Z(\mathbf{x}_j)\Big)^2 - \sum_{j=1}^{n}\sum_{k=1}^{n} \nu_{ij}\nu_{ik}K_p(\mathbf{x}_j - \mathbf{x}_k)\Big]^2. \end{aligned} \qquad (3.114)$$
This uses the fact that the GIs now have zero mean, so that $E(I_i^2) = \mathrm{Var}(I_i)$. $K_p$ is assumed to be stationary and linear in its parameters so that the regression simplifies to the unique solution of a set of linear equations; otherwise the regression would be very difficult to solve. Another restriction is that it must be positive definite for it to be a valid covariance function. It has been proven (Matheron 1973) that a valid model for a positive definite polynomial GCF is:
$$K_p(h) = \begin{cases} a_0\delta(h) - a_1|h| & \text{for } p = 0, \\ a_0\delta(h) - a_1|h| + a_3|h|^3 & \text{for } p = 1, \\ a_0\delta(h) - a_1|h| + a_3|h|^3 - a_5|h|^5 & \text{for } p = 2, \end{cases} \qquad (3.115)$$
with $a_0, a_1, a_5 \geq 0$ and, in two dimensions, $a_3 \geq -\tfrac{10}{3}\sqrt{a_1a_5}$.
A more recent improvement on the form of these functions can be found in (Delfiner et al., 1978), where the Green's function $|h|^2\log|h|$ is added to $K_p(h)$ defined above for both [...] $\delta(h)$ is the Dirac delta function, allowing for a discontinuity at $h = 0$, the nugget effect. This can arise from measurement error, incurring variance in the value measured at a point.
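The requirement that (3.109) be non-negative for every GI can be checked numerically for a chosen model. In this Python/NumPy sketch the Dirac term $a_0\delta(h)$ is represented by the value $a_0$ at $h = 0$ (a discrete stand-in for the nugget), and the coefficients are arbitrary values satisfying the constraints above; the smallest eigenvalue of $N^TK_pN$, over an orthonormal basis $N$ of order-1 GIs, should be non-negative.

```python
import numpy as np

def poly_gcf(h, a0=0.0, a1=1.0, a3=0.0, a5=0.0):
    """Polynomial GCF (3.115); the nugget a0*delta(h) is taken as a0 when h == 0."""
    h = np.abs(h)
    return a0 * (h == 0) - a1 * h + a3 * h ** 3 - a5 * h ** 5

def gi_basis(F):
    """Orthonormal basis of the GIs {nu : F^T nu = 0}."""
    _, s, Vt = np.linalg.svd(F.T)
    return Vt[int(np.sum(s > 1e-10 * s[0])):].T

rng = np.random.default_rng(12)
pts = rng.uniform(0, 1, size=(12, 2))
D = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
N = gi_basis(np.column_stack([np.ones(12), pts]))        # GIs of order p = 1

Kp = poly_gcf(D, a0=0.2, a1=1.0, a3=0.5)                 # a valid p = 1 model
print(np.linalg.eigvalsh(N.T @ Kp @ N).min() >= -1e-10)
```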
[...] initial estimate of $K_p(h)$ can be found, and the whole process is repeated, similarly to the iterative process described earlier for $C(h)$.
[...] of $K_p(h)$. [...] $K_p(h)$ [...] is in practice easier [...] is its highly restricted choice of models. Another problem lies in the form of estimation of its parameters. The fact that LS estimation allows very little interaction by the practitioner can be a major disadvantage, as LS estimation is highly sensitive to outliers. Journel (1989) also states that the LS estimation often results in $a_0$, the nugget effect, being dominant, resulting in little or no spatial [...] equations.
So [...]
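3.5.2 [...]
Since 1973, when the above form of unbiased covariance estimation was first proposed, other methods have been developed.
One example is the maximum likelihood estimator $\hat{\theta}$ (Mardia and Marshall 1984) which, assuming $(Z(\mathbf{x}_1), \ldots, Z(\mathbf{x}_n))$ to be jointly gaussian, maximises the likelihood:
$$L(\theta) \propto |K(\theta)|^{-1/2}\exp\Big\{-\tfrac{1}{2}\big(\mathbf{z} - F\boldsymbol{\beta}(\theta)\big)^TK(\theta)^{-1}\big(\mathbf{z} - F\boldsymbol{\beta}(\theta)\big)\Big\}. \qquad (3.116)$$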
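A minimal Python/NumPy sketch of the gaussian log-likelihood in (3.116), with $\boldsymbol{\beta}(\theta)$ taken as the GLS estimate for each $\theta$, follows. The exponential model with unit sill, the grid of ranges and the simulated data are assumptions; a real implementation would maximise over all covariance parameters with a numerical optimiser rather than a grid.

```python
import numpy as np

def gaussian_loglik(z, F, K, beta):
    """log L = -(n log 2pi + log|K| + (z - F beta)^T K^{-1} (z - F beta)) / 2."""
    M = np.linalg.cholesky(K)
    e = np.linalg.solve(M, z - F @ beta)            # whitened residuals M^{-1}(z - F beta)
    logdet = 2.0 * np.log(np.diag(M)).sum()         # log|K| from the Cholesky factor
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + e @ e)

def profile_loglik(z, F, X, a):
    """Log-likelihood at the GLS beta(theta) for an assumed exponential model with range a."""
    K = np.exp(-np.linalg.norm(X[:, None] - X[None], axis=-1) / a)
    Kinv_F = np.linalg.solve(K, F)
    beta = np.linalg.solve(F.T @ Kinv_F, Kinv_F.T @ z)    # A^{-1} F^T K^{-1} z
    return gaussian_loglik(z, F, K, beta), beta, K

rng = np.random.default_rng(13)
X = rng.uniform(0, 1, size=(40, 2))
F = np.column_stack([np.ones(40), X])
z = F @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=40)

ranges = [0.05, 0.1, 0.2, 0.4]
lls = [profile_loglik(z, F, X, a)[0] for a in ranges]
print(ranges[int(np.argmax(lls))])                  # crude grid search for theta_hat
```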
[...] $= \sum_{j=1}^{m} K_j\theta_j$ [...]
$$\hat{\theta}_j = Z_n^TB_jZ_n, \qquad j = 1, \ldots, m. \qquad (3.117)$$
The matrices $B_j$ are $n \times n$ [...] (3.118) [...] the addition of any linear combination of the trend component functions to the surface, and are unbiased. The optimal matrices $B_j$ are found by minimising the error variance of each $\hat{\theta}_j$ under these constraints, just as for UK.
Despite the apparent completeness of the theory, many problems arise when trying to apply it in practice. Firstly, as the error variance involves the fourth moments of the sample values, a distributional assumption for $(Z(\mathbf{x}_1), \ldots, Z(\mathbf{x}_n))$ is required. Secondly, an initial estimate $K_0$ [...] addition of a trend is withheld by using $K_0$, the resulting estimators $\hat{\theta}_j$ may not necessarily be minimum variance. Also, if this method is used iteratively to find $\hat{\theta}$ by substituting the estimate $\hat{K}$ for $K_0$, it may become biased, as the matrices $B_j$ become functions of the sample values. Similarly, if constraints for $\hat{K}$ to be a positive definite matrix are included, then the matrices $B_j$ will be functions of the sample values and may again interfere with the unbiasedness of $\hat{\theta}$.
Other examples of this form of unbiased covariance estimation are
restricted maximum
Chapter 4
[...] $Z_{UK}(\mathbf{x})$
Although much effort and thought has been put into the development of the various forms of kriging estimator, relatively little has been put into analysing the robustness of these estimators to changes in the assumptions, and providing robust estimators to cope with these changes.
With respect to the UK estimator, $Z_{UK}(\mathbf{x})$, these changes could take the form of outliers in the data; misspecification of the trend component functions; misrepresentation of the sampling configuration; and misspecification of the (co)variogram model and associated parameters.
As extremely little has been written on the robustness of the supposedly unbiased forms of covariance estimation discussed in Section 3.5.2, I shall concentrate on the other forms of covariance estimation, where the trend function is estimated to find estimates of the residuals. The estimation process, for a specific sample and sample configuration, can be divided into four distinct steps:
1. choosing the trend component functions $\mathbf{f}(\mathbf{x})$ appropriate to the surface of estimation, and finding initial estimates of the trend coefficients $\boldsymbol{\beta}$.
2. calculating the estimated residuals $R_n$, and estimating the experimental (co)variogram nonparametrically.
3. choosing an appropriate parametric (co)variogram model to be fitted to the experimental (co)variogram, and estimating the model parameters $\theta$.
Each step in the estimation process could be affected by the above changes. As each step is also dependent on the previous steps, the UK estimator could potentially be extremely unstable to these changes.
In this chapter I describe how each step of the estimation process is affected by some of these changes, and how this may have an effect on the final estimator. In some cases, robust estimators have been devised to lessen these effects.
4.1 Step 1: The trend component functions
The problem of estimating the trend coefficients $\boldsymbol{\beta}$ unbiasedly, given that the appropriate trend component functions are known, has been discussed at length, yet very little is known about how to choose these functions, or more specifically, the order of the trend, assuming it to be polynomial.
Starks and Fang (1982b) suggest first seeing if a trend is evident in the geology of the surrounding area, or in the data. Then, after fitting the (co)variogram, they suggest using the cross-validation technique discussed earlier to obtain errors and predicted error variances. On looking at the normal quantile-quantile plot of the standardised errors, an alternative model should be fitted if it is far from a $N(0,1)$ distribution. The idea is that if the correct trend and regionalised variable models are fitted, then the errors will be independent $N(0,1)$ random variables. The problem with this approach is that, assuming first that the kriging model is appropriate for the surface, it is extremely unlikely that the correct one will be chosen. Therefore, these standardised errors will be correlated and lead to a biased estimation of the distribution. A second problem is that if the points are evenly spaced, then when a point is taken out and estimated from the rest, the nearest sample point will be about twice the normal distance from the point of estimation. This will lead to less accurate estimation than usual of the surface values and error variances, and could again bias the error distribution. Also, due to the large number of steps in the estimation process, it is very difficult to pinpoint misspecification of the trend as being the sole cause.
[...] different models to the data and the comparison of their errors and predicted error variances at the sample points is very sensible, though computationally expensive.
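The cross-validation procedure can be sketched as follows (Python/NumPy). Each sample is left out in turn, predicted by UK from the rest, and its error divided by the predicted standard error from (3.99). The covariance model, trend and data are assumptions; the resulting standardised errors would then be compared with $N(0,1)$, for instance on a normal quantile-quantile plot.

```python
import numpy as np

def uk_predict(X, z, x0, cov, trend):
    """UK prediction (3.59) and error variance (3.99) at x0 from samples (X, z)."""
    K = cov(np.linalg.norm(X[:, None] - X[None], axis=-1))
    k = cov(np.linalg.norm(X - x0, axis=1))
    F, f = trend(X), trend(x0[None])[0]
    Kinv_k, Kinv_F = np.linalg.solve(K, k), np.linalg.solve(K, F)
    A = F.T @ Kinv_F
    d = f - F.T @ Kinv_k
    lam = Kinv_k + Kinv_F @ np.linalg.solve(A, d)
    return lam @ z, cov(0.0) - k @ Kinv_k + d @ np.linalg.solve(A, d)

def loo_standardised_errors(X, z, cov, trend):
    """Leave each sample out in turn and standardise its prediction error."""
    errs = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        pred, var = uk_predict(X[keep], z[keep], X[i], cov, trend)
        errs.append((z[i] - pred) / np.sqrt(var))
    return np.array(errs)

rng = np.random.default_rng(14)
X = rng.uniform(0, 1, size=(40, 2))
z = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.3, size=40)
cov = lambda h: 0.09 * np.exp(-h / 0.2)                    # assumed covariance model
trend = lambda P: np.column_stack([np.ones(len(P)), P])   # p = 1
e = loo_standardised_errors(X, z, cov, trend)
print(e.mean(), e.var())                                   # compare with 0 and 1
```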
Another possible way of choosing the order of the polynomial trend model from the sample data is to use stepwise regression and other commonly used methods in linear regression for model choice. A good reference for these techniques is Weisberg (1985). Forward selection, one particular form of stepwise regression, begins with a simple regression model and at each step, one variable (or group of variables) is added to the model. This variable must provide the largest reduction in the sum of squares of all the possible variables not in the model, when the new regression model is fitted to the sample data. Equivalently, the variable with the largest $F$-value is chosen. This adding of variables is continued until a stopping rule is met.
[...] of variables in the model have reached a chosen limit; the $F$-values for the rest of the possible variables to be added are below a chosen minimum; or the addition of the next variable would make the matrix $F^TF$ [...] in the LS fit of the model. In this particular case, we must add all the monomials of each order at once, in order for the trend model to retain its invariance to rigid motions. As the coordinate system is only chosen for convenience, the trend model should not depend on it. For this reason, this stepwise method reduces to adding groups of monomials of successively higher order until either the maximum number of terms is reached or the $F$-value is below the minimum. As only one group of monomials is considered at each step, it is only sensible to choose the minimum for the $F$-value to be the point of significance for the specific $F$ statistic. As long as the order of the trend does not get too high, the previously discussed methods in Section 3.2 should solve the problem of roundoff error.
One drawback to this method is that the sample variables must be uncorrelated, such that the residuals can be assumed to be independent $N(0, \sigma^2)$ random variables. Under geostatistical theory, the sample values are correlated, so they must first be made approximately uncorrelated by using a technique such as declustering.
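The grouped forward selection described above can be sketched as follows (Python/NumPy). Whole groups of monomials of order $p = 1, 2, \ldots$ are added while the partial $F$-value stays at or above `f_min`. The fixed threshold of 4.0 is an assumption standing in for the significance point of the $F$ distribution (which could be obtained from, e.g., `scipy.stats.f.ppf`), and, as noted above, the ordinary LS fit ignores the spatial correlation of the samples.

```python
import numpy as np

def poly_trend(X, p):
    """Rows f(x)^T: all monomials x^a y^b with a + b <= p."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x ** (d - b) * y ** b for d in range(p + 1) for b in range(d + 1)])

def rss(F, z):
    """Residual sum of squares of the ordinary LS fit of z on the columns of F."""
    beta, *_ = np.linalg.lstsq(F, z, rcond=None)
    r = z - F @ beta
    return float(r @ r)

def forward_select_order(X, z, max_order=3, f_min=4.0):
    """Add whole groups of monomials of order 1, 2, ... while the partial F-value >= f_min."""
    order, F_old = 0, poly_trend(X, 0)
    rss_old = rss(F_old, z)
    for p in range(1, max_order + 1):
        F_new = poly_trend(X, p)
        q = F_new.shape[1] - F_old.shape[1]     # monomials of order p added as one group
        df = len(z) - F_new.shape[1]            # residual degrees of freedom
        rss_new = rss(F_new, z)
        f_value = ((rss_old - rss_new) / q) / (rss_new / df)
        if f_value < f_min:
            break
        order, F_old, rss_old = p, F_new, rss_new
    return order

rng = np.random.default_rng(15)
X = rng.uniform(0, 1, size=(60, 2))
z = 1 + 2 * X[:, 0] - X[:, 1] + 3 * X[:, 0] ** 2 + rng.normal(scale=0.05, size=60)
print(forward_select_order(X, z))               # a quadratic trend should be detected
```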
Alternatively, various studies have been done on the effects of choosing the wrong order of the trend, or [...] a form of perturbation analysis, that overspecification of the trend would increase the potential instability of the UK estimator by increasing the bound on the relative error in the kriging weights. Cressie and Zimmerman (1992) correctly state that overspecification of the trend would not cause any bias in the final estimator, but due to the sample values being used to estimate the extra trend coefficients, the [...] small-scale variation of the residuals, then if the trend is underspecified, the resulting residuals should contain the extra variance unaccounted for by the trend. Therefore the variogram of the residuals should naturally be overfit and thus correct the bias to some degree.
4.2 [...] (co)variogram
$\hat{C}(h)$ [...]
$$2\hat{\gamma}(h) = \frac{1}{|N(h)|}\sum_{N(h)} \big(R(\mathbf{x}_i) - R(\mathbf{x}_j)\big)^2 \qquad (4.119)$$
For second-order stationarity, both the variogram estimator $2\hat{\gamma}(h)$ and the covariogram estimator $\hat{C}(h)$ can be used to estimate the variogram and covariance function respectively, but for intrinsic stationarity only the variogram estimator can be used, due to nonstationarity in the variance. Both of these estimators are minimum variance and unbiased under the gaussian assumption of $R(\mathbf{x})$, but Cressie and Grondona (1992) show that in terms of the bias, the variogram estimator is more stable to misspecification of the trend than the covariogram estimator. Therefore I advise using $2\hat{\gamma}(h)$ rather than $\hat{C}(h)$ when assuming second-order stationarity.
If the residual data $R_n$ [...] robust variogram estimator. One proposed by Armstrong and Delfiner (1980) uses a scale estimator devised by Huber (1964) rather than the sample variance. Cressie and Hawkins (1980) have devised a similar estimator to Matheron's but instead take the fourth power of the mean rooted difference:
$$2\bar{\gamma}(h) = \frac{\bar{D}(h)^4}{0.457 + 0.494/|N(h)|}, \qquad \bar{D}(h) = \frac{1}{|N(h)|}\sum_{N(h)} |R(\mathbf{x}_i) - R(\mathbf{x}_j)|^{1/2}. \qquad (4.121)$$
The justification for this robust estimator is that the rooted differences $|R(\mathbf{x}_i) - R(\mathbf{x}_j)|^{1/2}$ are very close to normal even when the differences $R(\mathbf{x}_i) - R(\mathbf{x}_j)$ follow a symmetric distribution about zero with heavy tails contaminated by outliers. Another robust estimator, $2\tilde{\gamma}(h)$, is proposed by Cressie and Hawkins, replacing the mean of the rooted differences $\bar{D}(h)$ in equation (4.121) by their median $\tilde{D}(h)$, which is a more robust statistic to outliers than the mean. Hawkins and Cressie (1984) found that for data containing outliers, all of the above estimators had a positive bias, but that $2\tilde{\gamma}(h)$ had a smaller bias than $2\bar{\gamma}(h)$, which again had a smaller bias than Matheron's estimator in equation (4.119). With respect to their variances, the order of $2\tilde{\gamma}(h)$ and $2\bar{\gamma}(h)$ was reversed, but Matheron's estimator proved to have the smallest variance as the outlier effect became negligible, confirming its minimum variance property under gaussian conditions.
Therefore one must use either $2\bar{\gamma}(h)$ or $2\tilde{\gamma}(h)$ as an estimator of the experimental variogram when the data are thought to contain outliers.
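The three estimators can be compared on a toy example. This Python/NumPy sketch implements (4.119), (4.121) and the median-based version; the constant 0.457 in the median version is an assumed large-sample correction, and the transect with a single gross outlier is invented for illustration:

```python
import numpy as np

def lag_differences(X, r, h, tol):
    """Differences R(x_i) - R(x_j) over the pairs in N(h) (distance within tol of h)."""
    i, j = np.triu_indices(len(r), k=1)
    in_class = np.abs(np.linalg.norm(X[i] - X[j], axis=1) - h) <= tol
    return r[i[in_class]] - r[j[in_class]]

def variogram_estimates(X, r, h, tol):
    """Matheron (4.119), Cressie-Hawkins mean (4.121) and median-based estimates of 2*gamma(h)."""
    diff = lag_differences(X, r, h, tol)
    n_h = len(diff)
    root = np.sqrt(np.abs(diff))                               # rooted differences
    matheron = np.mean(diff ** 2)
    ch_mean = np.mean(root) ** 4 / (0.457 + 0.494 / n_h)
    ch_median = np.median(root) ** 4 / 0.457                   # assumed correction constant
    return matheron, ch_mean, ch_median

# A transect of zero residuals with one gross outlier
X = np.column_stack([np.arange(10.0), np.zeros(10)])
r = np.zeros(10)
r[5] = 100.0
print(variogram_estimates(X, r, h=1.0, tol=0.5))   # Matheron ~2222, mean-based ~48, median-based 0
```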
4.3 Step 3: Fitting an appropriate parametric model, $C(\mathbf{x}, \mathbf{y}; \theta)$
This step has been given the most attention with respect to robustness, as it is considered to be the most sensitive part of the estimation process. Many studies have been conducted on robustness of variogram models to misspecification, or more exactly, to inappropriate choice of parametric model or wrong estimation of the parameters.
Although kriging predictions for linear unbiased estimators have minimum error variance given that the underlying assumptions hold, if the (co)variogram is misspecified, the error variance can become very large. Diamond and Armstrong (1984) have investigated the effect of small perturbations of the variogram on the kriging predictions for a fixed set of observations. [...] variogram models, they found that the gaussian model gave very unstable behaviour for small perturbations in the variogram, whereas the spherical model was far more stable in comparison. Bardossy (1988) proposes a different measure to Diamond and Armstrong's, which is also sensitive to the location of the estimation point $\mathbf{x}$, a factor which Bardossy shows is a major influence on the kriging weights. Using this new measure, Bardossy also found the gaussian model to give more unstable behaviour to small changes in the parameters than either the spherical or exponential models. For this reason, Stein (1989) recommends that if a model must be fitted that is quadratic at the origin, then the gaussian should be replaced by a more robust model such as
$$C(h; \theta) = \theta_1e^{-\theta_2h}(1 + \theta_2h).$$
As the number of observations increases, Stein and Handcock (1989) show that as long as the variogram model used is [...] their final prediction and their kriging variances become negligible. The term compatible is strictly defined by Stein and Handcock, but it can be loosely defined as having the same behaviour near the origin. Although this is an encouraging fact, I think that it is [...] final [...]
Therefore the total of these negligible weights could sum to a weight that has a significant effect on the final prediction. Alternatively, if the process is truly stochastic, then only the closer observations should be included in the estimator anyway (i.e. we should see the observation weights for the further points as exceeding a confidence interval in some way and therefore being ignored).
4.4
Although much has been written about the effects of misspecification of the trend and the (co)variogram on the final UK estimator $\hat{Z}_{UK}(x)$,
In conclusion, the UK estimator has an inherent stability in its estimation process due to the model's decomposition of $Z(x)$ into the large scale variation of the trend and the small scale variation of the residuals. As long as the estimates of the trend coefficients $\beta$ and the covariance parameters $\theta$ are relatively good then, as discussed earlier, an error in one is naturally corrected by an opposite error in the other. Therefore, as long as each step in the estimation process is carried out carefully, and an appropriate covariance model is chosen that is compatible and relatively stable to misspecification of its parameters, the UK estimator should be fairly robust to errors in the estimation process.
Although every effort has been made to make each step as robust as possible to outliers, $\hat{Z}_{UK}(x)$
of the data. Therefore these outliers must in some way be isolated and edited so as not to over-influence the predictions made by $\hat{Z}_{UK}(x)$.
4.4.1
Robust Kriging
3. use these weights $\alpha_{ij}$ and the sample values neighbouring $x_i$ to get a robust prediction $\hat{z}(x_i)$ of $z(x_i)$ for each sample point $x_i$, using a weighted median method:
if gridded data, then choose the nearest eight points, requiring only two different kriging weights to be used throughout.
solve
$$\sum_{j \neq i} \alpha_{ij}\, \operatorname{sgn}\{z(x_j) - \hat{z}(x_i)\} = 0;$$
if there are multiple roots and/or intervals of roots to the equation, then there will always be an odd number of them, so choose $\hat{z}(x_i)$ to be the middle one, and if it is an interval, choose
by using the robust estimate of the variogram to calculate the kriging weights $\lambda_i$ as usual, but by replacing the sample values $z(x_i)$
$$\hat{z}_{RK}(x) = \sum_{i} \lambda_i\, z^{w}(x_i) \qquad (4.123)$$
This method of smoothing the data is very similar to Huber's smoothing process used in time series (Huber 1977). The crux of the method is in the fourth step, when the data are edited by Winsorizing. The larger the estimated error variance at $x_i$, the more erratic the surface is assumed to be. Therefore the bound about the weighted median of its surrounding values, within which $z(x_i)$ is left unchanged, is also larger. The constant $c$ also controls the width of this bound, and is thus an important parameter in the estimation process. A relatively high value of 2 to 2.5 is usually chosen so as not to alter the data unless a sample value is extremely different from the weighted median of its surrounding sample values.
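The weighted median and Winsorizing steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Hawkins and Cressie's code. The weights $\alpha_{ij}$ are assumed nonnegative. An interval of minimisers is resolved by taking its midpoint, because the rule for that case is lost from the text above. The edit is assumed to have the form "clip $z(x_i)$ to $\hat{z}(x_i) \pm c\,\sigma_i$", with $\sigma_i$ the estimated kriging standard error at $x_i$. The names `weighted_median` and `winsorize` are hypothetical.

```python
import numpy as np


def weighted_median(values, weights):
    """Minimiser of sum_j w_j * |z_j - m| for nonnegative weights w_j.

    If the minimisers form an interval, its midpoint is returned (an assumption).
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(v)
    v, w = v[order], w[order]
    cum = np.cumsum(w)
    half = 0.5 * cum[-1]
    k = int(np.searchsorted(cum, half))      # first index with cumulative weight >= half
    if np.isclose(cum[k], half) and k + 1 < len(v):
        return 0.5 * (v[k] + v[k + 1])       # flat stretch between v[k] and v[k+1]
    return float(v[k])


def winsorize(z, z_hat, sigma, c=2.0):
    """Pull z back to within z_hat +/- c*sigma; values inside the bound are unchanged."""
    return float(np.clip(z, z_hat - c * sigma, z_hat + c * sigma))


# Hypothetical neighbourhood of one sample point x_i: neighbour values and weights alpha_ij.
neighbours = np.array([10.0, 11.0, 12.0, 13.0, 50.0])
alpha = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
z_tilde = weighted_median(neighbours, alpha)         # robust prediction of z(x_i)
edited = winsorize(40.0, z_tilde, sigma=1.5, c=2.0)  # an observed z(x_i) = 40 is pulled in
print(z_tilde, edited)
```

The single outlying neighbour (50.0) has no effect on the weighted median, which is the property that makes it a reasonable centre for the editing bound.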
Therefore, the essential difference between this form of robust kriging and the previous methods is that the edited sample values are used in the final estimate instead of the original sample values. The error variances $\sigma_i^2$ are not estimated robustly, and they may be extremely inaccurate, as discussed in section 4.1. This method may therefore not be an effective way of identifying outliers. Instead, the editing of the data may only succeed in biasing the final estimate.
An alternative method is to measure in some way the influence that a particular sample value has on the final estimates, and then to decide whether the highly influential sample values are outliers or important points that should be left untouched. Christensen, Johnson, and Pearson (1992) do this by using
of the UK estimator which allows for measurement errors at the sample points, the actual values at the sample points are estimated ($\hat{z}$) using the $n$ measured sample values,
similar to
$\hat{z}_{(-i)}$),
is used to
measure the relative influence a sample has on the resulting estimates. In this way, the most influential sample values can be found and edited if thought to be outliers.
Chapter 5
A Numerical Application
5.1
As previously mentioned in Chapter 4, there are two general types of method by which the (co)variogram can be estimated from the data. The first requires initial estimation of the trend.
(co)variogram model and the validity of the many assumptions made about the data and their distribution. As pointed out in section 3.5.2, if these assumptions are not satisfied, it could lead to biased estimators just as for the first method. Therefore, it would be unwise to rely on these apparent unbiasedness properties. Also, as stated in section 4.3, a biased (co)variogram estimator does not necessarily lead to a biased kriging estimator.
The methods of the second type also require a highly restricted choice of (co)variogram model, usually linear in the parameters. These parameters, $\theta$, are very difficult to interpret. As a result, it is difficult to evaluate whether the estimates, $\hat{\theta}$, are reasonable for the considered data set. This property, and the very nature of the estimation process, allows little interaction by the practitioner. Also, the final parameter estimates can often lead to an invalid (co)variogram.
On the other hand, the (co)variogram models for the first type of method are very versatile and naturally produce positive definite (co)variograms. The individual parameters are also very easy to interpret as, say, the variance of $Z(x)$,
(co)variogram at $h = 0$. This allows the practitioner to use his or her skill and experience in choosing the (co)variogram model and the method of fitting it to the experimental (co)variogram.
Lastly, and most importantly, there seems to be absolutely nothing documented on the robustness of these 'unbiased' estimators. With respect to Delfiner's LS method, discussed in section 3.5.1, it is well known that LS estimation is highly sensitive to outliers. By contrast, much has been written about the sensitivity of estimators of the first method to outliers and to misspecification of the trend and (co)variogram, as previously summarised in Chapter 4. For this major reason, and the others discussed above, I choose to use
5.2
For a stationary trend, the experimental (co)variogram can be estimated directly from the data. For a first or second order trend, however, we face the problem of estimating the trend from correlated data without previous knowledge of the (co)variogram. As there is still no generally accepted way of doing this, I simply decluster the sample points. Then, assuming them to be independent, I fit a trend surface of first or second order by LS. This method should be fairly stable, as the sample data are exact, with no measurement or positional errors. I use the declustering technique discussed in section 2.2, using a grid spacing of 4 metres. This generally reduces the sample from 50 to 43 or 44 values.
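The trend step can be sketched in Python as follows. The sketch assumes, hypothetically, that declustering keeps the first sample met in each 4 m grid cell; the actual technique is the one described in section 2.2. The trend surface is a polynomial of order 0, 1 or 2 in the coordinates, fitted by LS to the declustered points. Residuals are then taken at all sample points, as the next paragraph describes. All function names and the synthetic data are illustrative.

```python
import numpy as np


def decluster(coords, cell=4.0):
    """Indices of declustered points: the first sample met in each square grid cell.

    Hypothetical stand-in for the section 2.2 technique (4 m cells in the text).
    """
    cells = np.floor(np.asarray(coords, dtype=float) / cell).astype(int)
    _, first = np.unique(cells, axis=0, return_index=True)
    return np.sort(first)


def design_matrix(coords, order):
    """Columns 1; 1, x, y; or 1, x, y, x^2, xy, y^2 for trend order 0, 1 or 2."""
    x, y = np.asarray(coords, dtype=float).T
    return np.column_stack([x**a * y**(d - a) for d in range(order + 1) for a in range(d, -1, -1)])


def fit_trend(coords, z, order, cell=4.0):
    """LS trend from the declustered points; residuals returned at every sample point."""
    z = np.asarray(z, dtype=float)
    keep = decluster(coords, cell)
    F = design_matrix(coords, order)
    beta, *_ = np.linalg.lstsq(F[keep], z[keep], rcond=None)
    return beta, z - F @ beta, keep


rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 60.0, size=(50, 2))          # hypothetical sample locations (m)
z = 5.0 + 0.3 * pts[:, 0] - 0.1 * pts[:, 1] + rng.normal(0.0, 1.0, size=50)
beta, resid, keep = fit_trend(pts, z, order=1)
print(len(keep), "declustered points; trend coefficients", beta)
```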
I then calculate the residuals between the sample values and the fitted trend surface. I assume these residuals to have originated from normally distributed random variables with zero mean. I then use them to estimate the experimental (co)variogram by exactly the same methods as for a homogeneous trend, but using the estimated residuals $\hat{R}(x_i)$, $i = 1, \dots, n$.
I also choose Matheron's variogram estimator over the covariogram estimator for its stability to misspecification of the trend. As $|N(h)|$ is extremely small or zero-valued for most values of $h$, it is standard practice to divide the $h$ scale into intervals or discrete lags. This provides only one semivariogram estimate, $\hat{\gamma}(h_j)$, per lag, where $N(h_j) = \{(x_i, x_k) : x_i - x_k = h \text{ and } h \text{ is in the } j\text{th lag}\}$.
length but it seems common to divide the $h$ scale into the order of 10 to 20 intervals. I use 15 intervals of equal length, with the estimators evaluated at the midpoints. An example of an experimental semivariogram of this form is shown in Figure 5.3.
Figure 5.3: An experimental semivariogram after dividing the $h$ scale into 15 discrete lags, providing only one estimator per lag.
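Matheron's estimator on binned lags, $\hat{\gamma}(h_j) = \frac{1}{2|N(h_j)|}\sum_{N(h_j)} \{z(x_i) - z(x_k)\}^2$, can be sketched in Python as follows. Two choices here are assumptions: the binning is by distance only (ignoring direction), and the largest pairwise distance is the default upper limit. The 15 equal-width lags evaluated at their midpoints follow the text. The function name and the synthetic residuals are hypothetical.

```python
import numpy as np


def matheron_semivariogram(coords, values, n_lags=15, max_dist=None):
    """Matheron's estimator on n_lags equal-width distance lags.

    Returns lag midpoints, gamma_hat (NaN for empty lags) and the pair counts |N(h_j)|.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, k = np.triu_indices(len(values), k=1)               # each pair counted once
    d = np.linalg.norm(coords[i] - coords[k], axis=1)
    sq = (values[i] - values[k]) ** 2
    if max_dist is None:
        max_dist = d.max()
    inside = d <= max_dist
    d, sq = d[inside], sq[inside]
    width = max_dist / n_lags
    lag = np.minimum((d / width).astype(int), n_lags - 1)  # lag index 0 .. n_lags-1
    counts = np.bincount(lag, minlength=n_lags)
    sums = np.bincount(lag, weights=sq, minlength=n_lags)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma_hat = sums / (2.0 * counts)
    midpoints = (np.arange(n_lags) + 0.5) * width
    return midpoints, gamma_hat, counts


rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 60.0, size=(50, 2))
resid = rng.normal(0.0, 1.0, size=50)       # stand-in for the residuals R_hat(x_i)
h, gamma_hat, counts = matheron_semivariogram(pts, resid)
print(np.column_stack([h, gamma_hat, counts]))
```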
The next step is to fit a valid semivariogram model to the experimental semivariogram. The common models considered are linear, exponential, spherical and gaussian, although many other variations of these could be applied. A comparison of these models can be found in Figure 5.4. As mentioned in Section 3.1.1, the UK equations are constructed in such a way that any constant term or factor in the covariance function is filtered out. Therefore, all linear models of the semivariogram are equivalent under UK, so there would be no point in fitting one to the experimental semivariogram under these methods.
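The filtering of constant terms and factors can be checked numerically. The Python sketch below solves the UK system with a first order trend for two linear models $C(h) = s^2 - b\,h$ that use different $s^2$ and $b$. This mirrors the relation $C(h) = \sigma^2 - \gamma(h)$ used later in this chapter. The helper `uk_weights` and the random configuration are hypothetical. Because the trend contains a constant, the weights sum to one. Adding a constant to $C$ therefore leaves the equations for $\lambda$ unchanged, and multiplying $C$ by a positive factor only rescales the Lagrange multipliers.

```python
import numpy as np


def uk_weights(coords, x0, cov, order=1):
    """Kriging weights from [[C, F], [F^T, 0]] [lam; mu] = [c0; f0].

    `cov` maps distances to covariances; the trend is 1 (order 0) or 1, x, y (order 1).
    """
    coords = np.asarray(coords, dtype=float)
    x0 = np.asarray(x0, dtype=float)

    def trend(p):
        p = np.atleast_2d(p)
        cols = [np.ones(len(p))] + ([p[:, 0], p[:, 1]] if order >= 1 else [])
        return np.column_stack(cols)

    n = len(coords)
    F = trend(coords)
    q = F.shape[1]
    A = np.zeros((n + q, n + q))
    A[:n, :n] = cov(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    A[:n, n:] = F
    A[n:, :n] = F.T
    rhs = np.concatenate([cov(np.linalg.norm(coords - x0, axis=1)), trend(x0)[0]])
    return np.linalg.solve(A, rhs)[:n]


rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 60.0, size=(30, 2))
x0 = np.array([25.0, 30.0])
# Two linear models C(h) = s2 - b*h with arbitrary, different constants s2 and b.
w1 = uk_weights(pts, x0, lambda h: 100.0 - 1.0 * h)
w2 = uk_weights(pts, x0, lambda h: 5000.0 - 7.5 * h)
print(np.max(np.abs(w1 - w2)))
```

The same invariance holds with `order=0`, i.e. for Ordinary Kriging, since that trend also contains the constant term.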
Figure 5.4: A comparison of semivariogram models with the same sill or variance (except for the linear model), and the same gradient at $h = 0$ (except for the gaussian model, which always has a zero gradient at $h = 0$). Curves: Linear, S = Spherical, E = Exponential, Gaussian.
Of the nonlinear models in Figure 5.4, the most appropriate one to fit to the experimental semivariogram seems to be the spherical model, for every pairing of surface and sample configuration that I choose. Also, as mentioned in section 4.3, the spherical model is relatively stable to misspecification of $\theta$. So, in order to automate the estimation process, I assume that the spherical model is the most appropriate nonlinear model to fit in all cases. Most of the experimental semivariograms indicate negligible discontinuity at $h = 0$, which is expected as there is no measurement error in the data. I therefore also decide not to allow for a nugget effect in the model. The final form of the spherical model that I fit to the experimental semivariograms is as follows:
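$$\gamma(h \mid S, R) = \begin{cases} S\left(\dfrac{3h}{2R} - \dfrac{h^3}{2R^3}\right), & h \le R, \\ S, & h > R. \end{cases} \qquad (5.124)$$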
for the parameters cannot be found. Instead, some form of Newton-Raphson iteration technique must be applied to the LS statistic to find a local minimum. As there may be many local minima, initial values for the parameters must be specified. For the spherical model, the parameters are $S$ and $R$.
$S$ is the horizontal asymptote or sill of the model, and in theory should be equal to the variance of $Z(x)$, assuming the variance to be stationary. Therefore, the sill $S$ can be estimated by an estimate of $\sigma^2$, such as the sample variance
$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} \{Z(x_i) - \bar{Z}\}^2.$$
Of course, if a trend is fitted to the surface, then the estimated residuals $\hat{R}(x_i)$ should be used instead of the data in the above formula. Cressie (1985) shows that $\hat{\sigma}^2$ will always have a negative bias due to the estimation of $\mu$ by $\bar{Z}$, just as in Section 3.1 when estimating the trend for higher dimensions. Cressie and Glonek (1984) propose that this bias can be much reduced by using the median of the sample values, rather than the sample mean $\bar{Z}$, as an estimate of $\mu$. Also $\hat{\sigma}^2$ is based on the assumption that the data
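lags with a small number of pairs. He proposes finding the model parameters $\theta$ that minimise
$$\sum_{j=1}^{L} |N(h_j)| \left\{ \frac{\hat{\gamma}(h_j)}{\gamma(h_j \mid \theta)} - 1 \right\}^2$$
rather than the ordinary LS criterion
$$\sum_{j=1}^{L} \{ \hat{\gamma}(h_j) - \gamma(h_j \mid \theta) \}^2,$$
where $L$ is the number of lags used and $\gamma(h_j \mid \theta)$ is the semivariogram model. Cressie states that this is a vast improvement over LS. Cressie also proposes that this estimator of $\theta$ could be used as the starting value of an iterative generalised LS approach.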
When comparing Cressie's weighted LS method with normal LS for fitting the spherical model, I find that there is practically no difference between them when the model is a good fit to the experimental semivariogram. Otherwise, I find that Cressie's method fits the early lags far better, and these are the most important lags, as discussed in section 4.3. An example of this is shown in Figure 5.5.
Also, Cressie's method gives more weight to the middle lags, which have the highest values of $|N(h_j)|$. This has the effect of bending the normal LS fit towards these weighted values. In Figure 5.5 this effect is shown to turn an almost linear LS fit into a more appropriate spherical fit. In general, though, this weighting of the middle lags has the effect of changing the sill $S$ and the range $R$ without altering the general LS fit too much. An example of this is shown in Figure 5.6.
Lastly, the values of $|N(h_j)|$ for lags $j = 13$, 14 and 15 are generally far below those of the other lags, so I do not include these lags in the LS and weighted LS fits. A similar 'rule' has been devised by Journel and Huijbregts (1978) for fitting semivariograms.
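The fitting procedure can be sketched in Python as below. Two substitutions are worth flagging. SciPy's `least_squares` (a trust-region solver) stands in for a hand-written Newton-Raphson iteration. The starting values $S_0$ and $R_0$ are left to the caller; the text suggests the sample variance of the residuals for $S_0$. The residual functions implement the two criteria above, restricted to the first 12 lags. The function names and the synthetic check are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def spherical(h, S, R):
    """Spherical semivariogram without nugget, as in equation (5.124)."""
    r = np.minimum(np.asarray(h, dtype=float) / R, 1.0)
    return S * (1.5 * r - 0.5 * r**3)


def fit_spherical(h, gamma_hat, counts, S0, R0, weighted=False, n_use=12):
    """Fit (S, R) to the first n_use lags by ordinary LS or Cressie's weighted LS."""
    h = np.asarray(h, dtype=float)[:n_use]
    g = np.asarray(gamma_hat, dtype=float)[:n_use]
    N = np.asarray(counts, dtype=float)[:n_use]
    keep = N > 0                      # skip empty lags
    h, g, N = h[keep], g[keep], N[keep]

    def residuals(p):
        model = spherical(h, p[0], p[1])
        if weighted:
            # squares sum to sum_j |N(h_j)| (gamma_hat / gamma_model - 1)^2
            return np.sqrt(N) * (g / model - 1.0)
        # squares sum to sum_j (gamma_hat - gamma_model)^2
        return g - model

    fit = least_squares(residuals, x0=[S0, R0], bounds=([1e-8, 1e-8], [np.inf, np.inf]))
    return fit.x


# Synthetic check: exact spherical values on 15 lags of width 4 (hypothetical numbers).
h = (np.arange(15) + 0.5) * 4.0
counts = np.full(15, 50)
gamma_hat = spherical(h, 100.0, 31.0)
S_ls, R_ls = fit_spherical(h, gamma_hat, counts, S0=98.0, R0=30.0)
S_w, R_w = fit_spherical(h, gamma_hat, counts, S0=98.0, R0=30.0, weighted=True)
print(S_ls, R_ls, S_w, R_w)
```

On exact data both criteria share the same minimiser; on real experimental semivariograms they differ in the way described above, since the weighted criterion emphasises lags with many pairs and penalises relative rather than absolute misfit.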
Figure 5.5: A comparison between LS and Cressie's weighted LS method for fitting a spherical model to the first 12 lags of the experimental semivariogram in Figure 5.3.
To summarise, I apply UK with a homogeneous, first order and second order trend surface. In each case I use both a linear semivariogram and a spherical semivariogram, the latter fitted to the experimental semivariogram using LS and weighted LS. Finally, I use the relation $C(h) = \sigma^2 - \gamma(h)$
$C(h)$.
As UK filters out any constant terms or factors in the covariance function, $\sigma^2$ and $S$ could be any constants that make $C(h)$ a valid positive definite function, i.e. any numbers large enough to ensure that $C(h)$ is nonnegative in the required range of $h$. However, it is best to estimate these parameters as accurately as possible, so that $C(h)$ can be used to estimate the error variance.
5.3
Results
20 randomly chosen geological surfaces are estimated by UK for 20 different sample configurations of 50 randomly chosen points; 3 different methods of (co)variogram estimation
Figure 5.6: An example of how Cressie's weighted LS method alters the sill of the LS fit without altering the general LS fit too much. (Curves: LS and WLS fits.)
$y_{ijkn}$
$s_{ijkn}$, $a_{ijkn}$
overall mean $\mu$ plus the sum of these effects and their interactions:
$$y_{ijkn} = \mu + M_i + T_j + S_k + (MT)_{ij} + (TS)_{jk} + (MS)_{ik} + (MTS)_{ijk} + \epsilon_{ijkn} \qquad (5.127)$$
configuration $k$. $n$ is the surface coefficient. The rest of the terms in equation (5.127) are the interaction effects between the different factors. $\epsilon_{ijkn}$
using the
By modelling the response variable in this way, I can carry out an analysis of variance (ANOVA) study on the data to compare the relative effects. Good references for ANOVA tests and related diagnostic techniques are Montgomery (1991) and Hicks (1993). As the sample configurations are randomly chosen, a mixed ANOVA model must be used. The method and trend effects, $M$ and $T$, and their interactions, $MT$, are called fixed effects. The sample effects, $S$, and their associated interactions, $TS$, $MS$ and $MTS$, are called random effects. For each random class, $S$, $TS$, $MS$ and $MTS$, the effects are assumed to be normally distributed with zero mean and constant variance for all $i$, $j$ and $k$. For each fixed class, $M$, $T$ and $MT$, the null hypothesis is that all the effects are zero, and the alternative hypothesis is that there is at least one nonzero effect. For each random class, the null hypothesis is that the common variance is zero, and the alternative hypothesis is that it is nonzero. Assume that the model is appropriate and that the residuals are independent normally distributed random variables with constant variance $\sigma^2$. Then, under the null hypothesis, the F-values for each class follow an F distribution with their respective degrees of freedom. In each case, the alternative hypothesis only replaces the null hypothesis if the F-value is significantly large. Therefore the upper tail is the critical region, requiring a one-sided test.
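The F values in Table 5.1 follow from the mixed-model structure: each mean square is divided by the mean square of a different term. The Python sketch below recomputes the AD column from the Df and Sum of Sq entries of that table. The pairing of numerator and denominator terms is the one that reproduces the reported F values. Fixed main effects are tested against their interaction with sample, trend:method against trend:method:sample, and random terms against the residual. Small discrepancies in the second or third decimal arise because the tabulated sums of squares are rounded. The function and variable names are hypothetical.

```python
from scipy.stats import f as f_dist

# Df and sums of squares for the AD response, copied from Table 5.1.
anova = {
    "trend": (2, 3629.0),
    "method": (2, 1225.0),
    "sample": (19, 42887.0),
    "trend:method": (4, 996.0),
    "trend:sample": (38, 1319.0),
    "method:sample": (38, 433.0),
    "trend:method:sample": (76, 312.0),
    "Residuals": (3420, 534749.0),
}
# Denominator term for each F-ratio in the mixed model (fixed M, T; random S).
denominator = {
    "trend": "trend:sample",
    "method": "method:sample",
    "sample": "Residuals",
    "trend:method": "trend:method:sample",
    "trend:sample": "Residuals",
    "method:sample": "Residuals",
    "trend:method:sample": "Residuals",
}


def f_table(anova, denominator):
    """Return {effect: (F, upper-tail p-value)} from Df and sums of squares."""
    ms = {name: ss / df for name, (df, ss) in anova.items()}
    out = {}
    for effect, den in denominator.items():
        F = ms[effect] / ms[den]
        p = f_dist.sf(F, anova[effect][0], anova[den][0])   # one-sided, upper tail
        out[effect] = (F, p)
    return out


for effect, (F, p) in f_table(anova, denominator).items():
    print(f"{effect:22s} F = {F:8.3f}  p = {p:.3f}")
```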
Before analysing the ANOVA tables for the AD and SD data sets closely, the residuals of their respective fits to the above model should be analysed to ensure that they satisfy the above assumptions. Plots of the residuals versus the fitted values for both the AD and SD data sets are shown in Figure 5.7.
Figure 5.7: Plots of residuals versus fitted values for the mean absolute difference (AD) and the mean squared difference (SD) data sets. (Panels: AD Data and SD Data; horizontal axis: Fitted Value.)
The variance of the residuals for AD shows only a slight increase with increasing magnitude of the fitted values. By contrast, the variance of the residuals for SD shows a marked increase, indicating the need for a transformation of the data.
Log(SD) provides the most stable residual variance of all the transformations in the natural set $\{\mathrm{SD}, \mathrm{SD}^{1/2}, \log(\mathrm{SD}), \mathrm{SD}^{-1/2}\}$, but at the expense of an asymmetric histogram of the residuals. $\mathrm{SD}^{1/2}$, on the other hand, provides a slightly increasing residual variance similar to that for AD, but a near perfect histogram and normal quantile-quantile plot. The quantile-quantile plot for AD is also relatively straight, except for a slight kink in the negative tail between -10 and -30. This is reflected in the histogram, where there is a slight dip around -20.
Both the histograms and normal quantile-quantile plots for AD and $\mathrm{SD}^{1/2}$ can be found in Figure 5.8.
Figure 5.8: Histograms and normal quantile-quantile plots for both the AD and $\mathrm{SD}^{1/2}$ residuals. A straight normal quantile-quantile plot indicates that the residuals follow a normal distribution.
These results, and further plots of residuals versus method, trend and sample, support my conclusion that neither the AD nor the $\mathrm{SD}^{1/2}$ model fit violates the above assumptions so seriously as to affect any conclusions made from the respective ANOVA tables.
By comparing the ANOVA tables for AD and $\mathrm{SD}^{1/2}$ in Table 5.1, it is clear that they provide exactly the same information. For both response variables, the following have an extremely high probability of being nonzero: the two fixed factor effects, the variance of the sample effect and the trend:method interaction effects. The other interaction effects are almost definitely zero.
AD
Effect                Df   Sum of Sq   Mean Sq   F Value   p Value
trend                  2        3629   1814.49    52.261     0.000
method                 2        1225    612.56    53.733     0.000
sample                19       42887   2257.24    14.436     0.000
trend:method           4         996    249.12    60.613     0.000
trend:sample          38        1319     34.72     0.222     1.000
method:sample         38         433     11.40     0.073     1.000
trend:method:sample   76         312      4.11     0.026     1.000
Residuals           3420      534749    156.36
$\mathrm{SD}^{1/2}$
Effect                Df   Sum of Sq   Mean Sq   F Value   p Value
trend                  2       18286   9143.20    44.326     0.000
method                 2        4575   2287.66    52.277     0.000
sample                19      157588   8294.08    26.255     0.000
trend:method           4        3359    839.73    51.708     0.000
trend:sample          38        7838    206.27     0.653     0.951
method:sample         38        1663     43.76     0.139     1.000
trend:method:sample   76        1235     16.24     0.051     1.000
Residuals           3420     1080426    315.91
The trend:method interaction plots for AD and $\mathrm{SD}^{1/2}$ in Figure 5.9 are also very similar. They both indicate that increasing the order of the trend only magnifies the differences between the methods, without changing their ordering in any way.
Only
Figure 5.9:
mean AD.
          mean                      s.d.
trend     LS      WLS     LIN       LS      WLS     LIN
0         36.00   36.08   36.11     11.84   11.84   11.82
I         37.72   37.06   36.35     12.80   12.92   12.00
II        39.85   38.81   36.86     14.50   13.82   12.51
Table 5.2: Mean and standard deviations of the AD data for the different method/trend combinations, estimated over all the combinations of surface and sample configuration.
Firstly, I test the differences between the AD means of the different (co)variogram methods, keeping the trend order fixed. For both the first and second order trend surfaces (I and II), both the paired t-test and the paired Wilcoxon signed rank test give extremely small p-values for each difference. Therefore, although these differences seem small in magnitude, they are statistically highly significant. This implies that, for estimating a general surface of this type from 50 randomly chosen sample values with a polynomial trend surface of order one or two, LIN is better than WLS, which is itself better than LS. For a stationary trend (0), the two tests agree that the difference between WLS and LIN is insignificant, but they disagree on the other two differences, WLS - LS and LIN - LS. The respective p-values for the paired t-test are 0.263 and 0.314, suggesting that the two differences are insignificant. The values for the Wilcoxon test are 0.016 and 0.000, suggesting the opposite conclusion. I examined the density plots and normal quantile-quantile plots of the two differences in Figure 5.10. Despite the heavy tails, I feel that the distributions are nearly normal and suggest an insignificant difference from zero for the respective means. Therefore I conclude that the two differences are insignificant, in agreement with the paired t-tests. As these differences are so small, this conclusion is less important than those for the first and second order trends.
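A minimal Python sketch of the two paired tests follows, using SciPy's `ttest_rel` and `wilcoxon`. Each pair is one surface and sample configuration estimated by two methods. The helper name is hypothetical, and the AD values below are synthetic toy numbers, not the thesis data, chosen only to show the calls.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon


def compare_methods(ad_a, ad_b):
    """Return the p-values of the paired t-test and the Wilcoxon signed rank test.

    ad_a[m] and ad_b[m] are the AD values for the same surface and sample
    configuration estimated with two (co)variogram methods.
    """
    ad_a = np.asarray(ad_a, dtype=float)
    ad_b = np.asarray(ad_b, dtype=float)
    t_p = ttest_rel(ad_a, ad_b).pvalue
    w_p = wilcoxon(ad_a - ad_b).pvalue
    return float(t_p), float(w_p)


# Toy numbers only (400 = 20 surfaces x 20 sample configurations), not the thesis data.
rng = np.random.default_rng(3)
ad_lin = rng.gamma(9.0, 4.0, size=400)
ad_wls = ad_lin + rng.normal(0.7, 0.5, size=400)   # small but consistent extra error
t_p, w_p = compare_methods(ad_wls, ad_lin)
print(f"paired t-test p = {t_p:.3g}, Wilcoxon p = {w_p:.3g}")
```

The two tests can disagree when the differences are heavy tailed, as for the stationary trend above. The t-test is driven by the mean difference, whereas the signed rank test depends only on the signs and ranks of the differences.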
Secondly, I test the differences between the AD means for the different trend orders, keeping the (co)variogram method fixed. For each (co)variogram method, the two tests give extremely small p-values for every difference. This implies that, for any (co)variogram method, when estimating a general surface of this type from 50 randomly chosen sample values, the lower the trend order the better. Exactly the same conclusions are found by analysing the $\mathrm{SD}^{1/2}$ data in this way. These conclusions are also reflected in the mean and median factor plots for AD in Figure 5.11.
These factor plots also point out that, however significant the differences may be between the (co)variogram methods and trend orders, the differences in the means for the various sample configurations are far greater. This strongly indicates that the choice of sample configuration is very important in the estimation of a surface, and suggests that more effort should be put into choosing the best sampling scheme. The problem with choosing the best sampling scheme is that it is entirely dependent on the surface, of which we know extremely little at this stage in the estimation process. Based on this little knowledge, various sampling schemes such as clustered, stratified, systematic or random sampling can be applied (Ripley 1981).
5.4
In this final section, the problem of choosing the order of the trend surface is considered.
A third method, also discussed in Section 4.1, could be to fit all three trend models to the declustered data by LS and calculate the residual sum of squares at the sample points, as in stepwise regression. F-tests can then be performed to test whether the reductions in the sum of squares are significant, both when moving from a stationary to a first order model and when moving from a first order to a second order model.
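The F statistic for a nested pair of trend surfaces is $F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(n - p_1)}$, where $p_0 < p_1$ are the numbers of trend coefficients (1, 3 and 6 for orders 0, 1 and 2). A Python sketch follows; the helper names are hypothetical. With $n = 43$ declustered points this gives the F(2,40) and F(3,37) reference distributions used in the results that follow.

```python
import numpy as np
from scipy.stats import f as f_dist


def design_matrix(coords, order):
    """Monomials 1, x, y, x^2, xy, y^2 up to the given trend order."""
    x, y = np.asarray(coords, dtype=float).T
    return np.column_stack([x**a * y**(d - a) for d in range(order + 1) for a in range(d, -1, -1)])


def rss(coords, z, order):
    """Residual sum of squares of the LS trend surface, and its number of coefficients."""
    F = design_matrix(coords, order)
    beta, *_ = np.linalg.lstsq(F, z, rcond=None)
    r = z - F @ beta
    return float(r @ r), F.shape[1]


def nested_f_test(coords, z, small, big):
    """F statistic, degrees of freedom and upper-tail p-value for trend order small -> big."""
    z = np.asarray(z, dtype=float)
    rss0, p0 = rss(coords, z, small)
    rss1, p1 = rss(coords, z, big)
    df1, df2 = p1 - p0, len(z) - p1
    F = ((rss0 - rss1) / df1) / (rss1 / df2)
    return F, (df1, df2), f_dist.sf(F, df1, df2)


# p-values for the two statistics reported below for surface 13.
print(f_dist.sf(29.71, 2, 40), f_dist.sf(35.10, 3, 37))
```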
The results of these methods when applied to surface 13 for the first sample configuration are as follows:
The cross validation kriging errors and associated error variances are estimated for a homogeneous, first order and second order trend surface, using a linear covariance function. Note that the parameters of the linear model now need to be estimated in order to estimate the kriging error variances. The normal quantile-quantile plots of the standardised errors for each trend model are in Figure 5.12. It is clear that the best fit to a N(0,1) distribution is the second order trend model, with the stationary model being the worst. As expected, none of them are very good fits.
Next, I calculate the mean squared difference and mean absolute difference of the kriging errors. I find the ordering of the trend models to be the same for both measures, with the second order model having the lowest value and the first order model the highest.
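Leave-one-out cross validation with a linear covariance can be sketched as follows. The sketch assumes that the linear model is used through the generalised covariance $K(h) = -b\,h$, so that $K(0) = 0$ and the kriging variance is $-\lambda^T k_0 - \mu^T f_0$. The helper names and the slope $b$ are hypothetical. The predictions do not depend on $b$, but the error variances scale linearly with it. This is why $b$ must be estimated before the standardised errors can be compared with N(0,1).

```python
import numpy as np


def design_matrix(p, order):
    """Monomials 1, x, y, x^2, xy, y^2 up to the given trend order (0, 1 or 2)."""
    x, y = np.atleast_2d(np.asarray(p, dtype=float)).T
    return np.column_stack([x**a * y**(d - a) for d in range(order + 1) for a in range(d, -1, -1)])


def uk_predict(coords, z, x0, b, order):
    """UK prediction and error variance at x0 with generalised covariance K(h) = -b*h."""
    n = len(z)
    F = design_matrix(coords, order)
    q = F.shape[1]
    A = np.zeros((n + q, n + q))
    A[:n, :n] = -b * np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    A[:n, n:] = F
    A[n:, :n] = F.T
    k0 = -b * np.linalg.norm(coords - x0, axis=1)
    f0 = design_matrix(x0, order)[0]
    sol = np.linalg.solve(A, np.concatenate([k0, f0]))
    lam, mu = sol[:n], sol[n:]
    return lam @ z, -(lam @ k0) - mu @ f0        # uses K(0) = 0


def cross_validate(coords, z, b, order):
    """Leave-one-out kriging errors, error variances and standardised errors."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    errors, variances = [], []
    for i in range(len(z)):
        others = np.arange(len(z)) != i
        pred, var = uk_predict(coords[others], z[others], coords[i], b, order)
        errors.append(z[i] - pred)
        variances.append(var)
    errors, variances = np.array(errors), np.array(variances)
    return errors, variances, errors / np.sqrt(variances)


# Hypothetical data: 30 random locations and a smooth surface plus small noise.
rng = np.random.default_rng(6)
pts = rng.uniform(0.0, 60.0, size=(30, 2))
z = 0.2 * pts[:, 0] + np.sin(pts[:, 1] / 10.0) + rng.normal(0.0, 0.1, size=30)
for order in (0, 1, 2):
    err, var, std = cross_validate(pts, z, b=1.0, order=order)
    print(order, "MSD:", np.mean(err**2), "MAD:", np.mean(np.abs(err)))
```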
Lastly, the three trend models are fitted to the 43 declustered sample values by LS. The F statistic for moving from the stationary to the first order trend model is 29.71, which is highly significant for an F(2,40) distribution. The F statistic for moving from the first order to the second order trend model is 35.10, which is also highly significant for an F(3,37) distribution. Therefore, again, it is concluded that the second order trend model is the most appropriate, then the first order, and lastly the stationary model.
In fact, the trend model that provides the best fit to surface 13 for the first sample configuration is the first order model, with the second order model coming in last place. Therefore, these methods can clearly be misleading and result in choosing the worst rather than the best trend model for a particular surface and sample configuration.
5.5
Conclusions
geneous trend, there is no significant difference between the three forms of variogram
estimation.
Finally, I conclude that the methods for detecting the optimal trend order for a specific sample configuration and surface can be seriously misleading.
Therefore, for a sample configuration of 50 randomly chosen sample points and a general surface of the type estimated in this example, it is best to choose the simplest trend surface and the simplest method of variogram estimation. I suggest that this is a result of my inexperience in finding an initial estimate for the trend and in fitting a variogram model.
For these reasons, I conclude that UK should always be applied with a homogeneous trend, i.e. Ordinary Kriging, for which the choice of variogram estimation is unimportant. The exception is when the practitioner has experience in applying this type of estimator and in using the above methods for detecting the optimal trend order.
Figure 5.11 panels: factor plots of AD against sample, trend (0, I, II) and method (LS, WLS, LIN).
Figure 5.11: Mean and median factor plots for the AD data set. The horizontal lines in the mean and median plots mark the overall mean and median respectively.
Homogeneous trend
Figure 5.12: Normal quantile-quantile plots for the cross validation standardised residuals. The standardised residuals for the second order trend provide the best fit to a N(0,1) distribution.
Chapter 6
Conclusion
Since the original linear estimators of Ordinary and Universal Kriging were first developed in the 1960's, a multitude of alternative estimators have been put forward. These are based on far more complex mathematical and statistical theory, yet few, if any, have proved to be practically superior. This is supported by the fact that Ordinary Kriging is still being practised in many ore mines around the world, including the gold mines in South Africa, where statistical techniques for ore estimation were first applied. So, as is often the case, the simplest model proves to be the best.
Although Universal Kriging (UK) is the most general of the linear estimators, it fails in its attempt to model the surface as having a
robust to errors at each stage in its estimation process. Despite all this work, UK with a nonhomogeneous trend is still considered to be too unreliable to be of any real practical use. Another problem is that, being a linear estimator, it is also very sensitive to outliers in the data. This problem is difficult to avoid. Therefore, UK is very dependent upon the prior detection and editing of these outliers, which is itself a very difficult and unreliable procedure. In situations where outliers are known to exist, UK should be replaced by
find that the linear (co)variogram model, which demands no interaction at all by the practitioner, performs better than the spherical model fitted by weighted LS, which itself performs better than the spherical model fitted by LS. This again indicates that the less interaction by an inexperienced practitioner the better, as it is unlikely that a linear model is more appropriate than a spherical model for a (co)variogram when both are fitted correctly. This observation leads me to some suggestions for future research.
for different surfaces of varying irregularity and different numbers of sample values, and try to find some definition of the type of surface and number of sample values that UK is best suited for. It is known that the interpolating spline is equivalent to UK with a first order trend and a fixed covariance function equal to the Green's function, $h^2 \log h$ (Dubrule 1983). The theoretical advantage that UK has over spline interpolation is that it is more versatile, allowing the trend order and the covariance function to change to suit the surface being fitted. It would be of interest to see whether this translates into an advantage in practice.
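The equivalence can be checked numerically. The Python sketch below solves the UK equations with a first order trend and $K(h) = h^2 \log h$. It then compares the result with SciPy's `RBFInterpolator`, using its `thin_plate_spline` kernel and a degree 1 polynomial, as an independent implementation of the interpolating thin plate spline. The helper names and the test configuration are hypothetical. The check covers one random configuration, not the general result.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator


def tps_kernel(r):
    """Green's function h^2 log h, taking the value 0 at h = 0."""
    r = np.asarray(r, dtype=float)
    return r**2 * np.log(np.where(r > 0, r, 1.0))


def uk_tps_predict(coords, z, targets):
    """UK with a first order trend and the fixed covariance K(h) = h^2 log h."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(coords)
    F = np.column_stack([np.ones(n), coords])           # trend terms 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    A[:n, n:] = F
    A[n:, :n] = F.T
    preds = []
    for x0 in np.atleast_2d(np.asarray(targets, dtype=float)):
        k0 = tps_kernel(np.linalg.norm(coords - x0, axis=1))
        lam = np.linalg.solve(A, np.concatenate([k0, [1.0], x0]))[:n]
        preds.append(lam @ z)
    return np.array(preds)


rng = np.random.default_rng(7)
pts = rng.uniform(0.0, 10.0, size=(25, 2))
z = np.sin(pts[:, 0]) + np.cos(pts[:, 1])
targets = rng.uniform(0.0, 10.0, size=(5, 2))
uk = uk_tps_predict(pts, z, targets)
spline = RBFInterpolator(pts, z, kernel="thin_plate_spline", degree=1)(targets)
print(np.max(np.abs(uk - spline)))
```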
Thirdly, I think less attention should be put towards finding unbiased forms of estimation and more towards finding estimators that are robust to outliers. As these estimators are generally applied to ore deposits, where there are likely to be errors in measurement, I feel that the effort in this field of research has been somewhat misplaced.
Lastly, my numerical example strongly indicates that the choice of sample configuration is extremely important in the estimation of a surface, far outweighing the effects of the different trend orders and forms of (co)variogram estimation. Therefore, I feel that more effort should be put towards finding methods of choosing an optimal sampling scheme.
Bibliography
DAVID, M., 1988,
BLUEPAK3D Manual:
Univ. Press, Baltimore.
HAWKINS, D. M., AND CRESSIE, N., 1984, Robust kriging - a proposal: Math. Geol., 16, 3-18.
Geostatistics:
London,
Model criticism
Math.
A simple
of Linear Algebraic
Systems:
Prentice
Final project, CS 542a, Univ. of B.C.
NEUMAN, S. P., AND JACOBSEN, E. A., 1984, Analysis of nonintrinsic spatial variability by residual kriging with application to regional groundwater levels: Math. Geol., 16, 499-521.
PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., AND VETTERLING, W. T., 1989, 702p.
STARKS, T., AND FANG, J., 1982a, The effect of drift on the experimental semivariogram: J. Int. Assoc. Math. Geol., 14, 309-319.
STARKS, T., AND FANG, J., 1982b, On the estimation of the generalised covariance function: J. Int. Assoc. Math. Geol., 14, 57-64.
STEIN, M. L., 1989, The loss of efficiency in kriging prediction caused by misspecification of the covariance structure, in Armstrong, M. et al. (Eds.), Geostatistics, vol. 1: Kluwer, 273-282.
STEIN, M. L., AND HANDCOCK, M. S., 1989, Some asymptotic properties of kriging when the covariance function is misspecified: Math. Geol., 21, 171-190.
SULLIVAN, J., 1984, Conditional recovery estimation through Probability Kriging, in Verly, G. et al. (Eds.), Geostatistics for Natural Resources Characterisation: Reidel, Dordrecht, 365-384.
VERLY, G., 1983, The Multigaussian approach and its application to the estimation of local reserves: Math. Geol., 15, 259-286.
WEISBERG, S., 1985,
Papadakis estimator and other nonlinear estimators of treatment contrasts in field-plot experiments: Biometrika, 76, 253-259.