
Computers & Geosciences 25 (1999) 627–639

An expanded GSLIB cokriging program allowing for two Markov models
Xianlin Ma*, Andre G. Journel
Petroleum Engineering Department, Stanford University, Stanford, CA 94305-2220, USA
Received 13 August 1998; accepted 1 January 1999

Abstract
The introduction of the two Markov models, MM1 and MM2, corresponding to two symmetric data screening
hypotheses, has considerably facilitated the practice of modeling the matrix of (cross) semivariograms needed for
cokriging. Program newcokb3D expands the GSLIB code cokb3D to take advantage of the colocated cokriging
option under either the linear model of coregionalization (general case) or the MM1 or the MM2 model. A case
study using the public domain GSLIB data set demonstrates application of program newcokb3D and analyzes the
impact of the alternative models of coregionalization. © 1999 Elsevier Science Ltd. All rights reserved.
Keywords: Colocated cokriging; Markov models; Locally varying mean; GSLIB

1. Introduction and recalls


The classical cokriging estimate of a primary variable $Z_1(\mathbf{u})$ by a linear combination of both primary and secondary data is recalled here in its simple cokriging version; see, for example, Goovaerts (1997, p. 205):

$$Z_1^*(\mathbf{u}) - m_1 = \sum_{k=1}^{K} \sum_{\alpha_k=1}^{n_k} \lambda_{\alpha_k}(\mathbf{u}) \left[ Z_k(\mathbf{u}_{\alpha_k}) - m_k \right], \qquad (1)$$

where $m_k$, $k=1,\ldots,K$, are the stationary means of the $K$ covariates involved; $\mathbf{u}_{\alpha_k}$, $\alpha_k=1,\ldots,n_k$, are the locations of the $n_k$ data related to covariate $Z_k$; and the $\lambda_{\alpha_k}$'s are the corresponding cokriging weights.
The corresponding cokriging system requires determination of a permissible matrix of (cross) covariances or, equivalently, of (cross) variograms modeling the pattern of cross correlation between the $K$ variables included (Christakos, 1984; Goovaerts, 1997, p. 108). This matrix involves up to $K^2$ different (cross) covariance models $C_{kk'}(\mathbf{h}) = \mathrm{Cov}\{Z_k(\mathbf{u}), Z_{k'}(\mathbf{u}+\mathbf{h})\}$.

Code available at http://www.iamg.org/CGEditor/index.htm
* Corresponding author. Tel.: +1-650-723-8064; fax: +1-650-725-2099.
E-mail address: ma@pangea.stanford.edu (X. Ma)
1.1. Colocated cokriging
The influence and modeling of the (cross) covariance matrix has been the single most important obstacle to the practice of cokriging. A marked simplification, and a corresponding growth in the use of cokriging, occurred with the introduction of the concept of colocated cokriging with a Markov-type model of coregionalization (Almeida and Journel, 1994; Goovaerts, 1997, p. 235). In the presence of dense secondary data (a case often found in practice), it is argued that the single closest sample of the covariate best correlated to the primary variable being estimated carries most of the covariate information; in this case the cokriging estimator Eq. (1) takes the reduced form

$$Z_1^*(\mathbf{u}) - m_1 = \sum_{\alpha_1=1}^{n_1} \lambda_{\alpha_1}(\mathbf{u}) \left[ Z_1(\mathbf{u}_{\alpha_1}) - m_1 \right] + \lambda_0\, Y(\mathbf{u}'), \qquad (2)$$

where $Y(\mathbf{u})$ is the covariate best related to the primary variable $Z_1(\mathbf{u})$, and $\mathbf{u}'$ is the $y$-sample location closest to the unsampled location $\mathbf{u}$. Note that $Y(\mathbf{u}')$ could be some function, not necessarily linear, of the actual covariates $Z_k(\mathbf{u}')$, $k>1$.

0098-3004/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.
PII: S0098-3004(99)00009-6
In many applications, the secondary data grid is as dense as or denser than the estimation grid, in which case $\mathbf{u}' = \mathbf{u}$ exactly; otherwise one can relocate the secondary datum $y(\mathbf{u}')$ to the nearest grid node, which again amounts to considering $\mathbf{u}' = \mathbf{u}$. In both cases there is one secondary datum colocated with each node $\mathbf{u}$ being estimated, hence the name colocated cokriging.
1.2. The MM1 model
The reduced cokriging Eq. (2) utilizes only a single secondary datum $y(\mathbf{u}')$; hence the secondary covariance function $C_Y(\mathbf{h})$ is not called for in the corresponding cokriging system, only the stationary secondary variance $C_Y(0)$. The Markov model proposed by Almeida and Journel (1994), hereafter denoted MM1, is such that modeling of $C_Y(\mathbf{h})$ is not required; only the primary covariance $C_1(\mathbf{h})$ is needed. Under MM1 the cross correlogram is written

$$\rho_{12}(\mathbf{h}) = \rho_{12}(0)\,\rho_1(\mathbf{h}), \qquad (3)$$

where $\rho_1(\mathbf{h}) = C_1(\mathbf{h})/C_1(0)$ is the primary correlogram and $\rho_{12}(0)$ is the colocated correlation coefficient inferred from pairs of data $\{z_1(\mathbf{u}_\alpha), y(\mathbf{u}_\alpha)\}$.

The extreme simplicity of the coregionalization model Eq. (3) explains its success, notwithstanding its rather restrictive Markov-type screening assumption

$$E\{Y(\mathbf{u}) \mid Z_1(\mathbf{u}) = z_1,\ Z_1(\mathbf{u}+\mathbf{h}) = z_1'\} = E\{Y(\mathbf{u}) \mid Z_1(\mathbf{u}) = z_1\}, \quad \forall\, \mathbf{u}, \mathbf{h}, z_1, z_1'. \qquad (4)$$

In words, the colocated primary datum $Z_1(\mathbf{u}) = z_1$ screens the influence of any further away primary data on the secondary variable $Y(\mathbf{u})$. Note that the screening applies to the secondary variable $Y(\mathbf{u})$, not to the primary variable being estimated. Such a screening hypothesis makes sense if the volume support of the primary variable $Z_1(\mathbf{u})$ is larger than and includes that of the secondary variable $Y(\mathbf{u})$.
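As a minimal illustration of how relation (3) enters a colocated cokriging system, the sketch below solves the smallest possible case: one primary datum at lag h plus the colocated secondary datum, with both variables standardized to unit variance. The function names and the two-datum setup are assumptions made for illustration; they are not part of newcokb3D.

```python
def mm1_cross_correlogram(rho12_0: float, rho1_h: float) -> float:
    """MM1 relation (3): rho_12(h) = rho_12(0) * rho_1(h)."""
    return rho12_0 * rho1_h


def colocated_weights_one_datum(rho12_0: float, rho1_h: float) -> tuple:
    """Solve a 2 x 2 standardized simple colocated cokriging system.

    Unknowns: lam1 (weight of the primary datum at lag h) and
    lam0 (weight of the colocated secondary datum):

        [ 1    c12 ] [lam1]   [ rho1_h  ]
        [ c12  1   ] [lam0] = [ rho12_0 ]

    where c12 = rho_12(h) is the primary-to-secondary cross correlation
    supplied by the MM1 model.
    """
    c12 = mm1_cross_correlogram(rho12_0, rho1_h)
    det = 1.0 - c12 * c12
    if det <= 0.0:
        raise ValueError("singular system: the two data are fully redundant")
    # Cramer's rule for the 2 x 2 system above.
    lam1 = (rho1_h - c12 * rho12_0) / det
    lam0 = (rho12_0 - c12 * rho1_h) / det
    return lam1, lam0


if __name__ == "__main__":
    # Hypothetical inputs: colocated correlation 0.493, primary correlogram 0.5.
    print(colocated_weights_one_datum(0.493, 0.5))
```

When the primary datum is uncorrelated with the unknown (rho1_h = 0), the system reduces to a regression on the colocated secondary datum alone, with weight rho12_0.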
1.3. The MM2 model
In applications involving secondary data stemming from remote sensing measurement devices, the volume support of the secondary data $Y(\mathbf{u})$ or $Z_k(\mathbf{u})$, $k>1$, is typically larger than that of the primary variable $Z_1(\mathbf{u})$. In such cases, one would expect a screening effect reverse from that expressed in relation Eq. (4), more precisely

$$E\{Z_1(\mathbf{u}) \mid Y(\mathbf{u}) = y,\ Y(\mathbf{u}+\mathbf{h}) = y'\} = E\{Z_1(\mathbf{u}) \mid Y(\mathbf{u}) = y\}, \quad \forall\, \mathbf{u}, \mathbf{h}, y, y'. \qquad (5)$$

The following 'reverse' cross-correlation model would then be appropriate:

$$\rho_{12}(\mathbf{h}) = \rho_{12}(0)\,\rho_Y(\mathbf{h}), \qquad (6)$$

with $\rho_Y(\mathbf{h}) = C_Y(\mathbf{h})/C_Y(0)$ being the secondary correlogram model.


This latter model Eq. (6) is precisely the Markov model 2, denoted MM2, recently introduced by Journel (1998). As opposed to the MM1 model Eq. (3), MM2 requires inference of the secondary covariance $C_Y(\mathbf{h})$, but this should not be a difficult task since secondary data are typically more abundant.

Because the cokriging system under the MM2 model requires both the primary and secondary covariances, $C_1(\mathbf{h})$ and $C_Y(\mathbf{h})$, these two must be consistent with the MM2 cross covariance Eq. (6). For this purpose, Journel (1998) suggests modeling the primary covariance as a linear combination of the secondary covariance model (typically easy to obtain) and any other permissible covariance model $\rho_R(\mathbf{h})$; more precisely

$$C_1(\mathbf{h}) = C_1(0)\,\rho_1(\mathbf{h}), \qquad \rho_1(\mathbf{h}) = \rho_{12}^2(0)\,\rho_2(\mathbf{h}) + \left[1 - \rho_{12}^2(0)\right] \rho_R(\mathbf{h}), \qquad (7)$$

where $\rho_2(\mathbf{h}) = \rho_Y(\mathbf{h})$ is the secondary correlogram and $\rho_R(\mathbf{h})$ provides the degrees of freedom necessary to model $C_1(\mathbf{h})$.
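In terms of semivariograms, Eq. (7) is written as

$$\gamma_1(\mathbf{h}) = C_1(0) \left[ \rho_{12}^2(0)\, \frac{\gamma_2(\mathbf{h})}{C_2(0)} + \left(1 - \rho_{12}^2(0)\right) \gamma_R(\mathbf{h}) \right], \qquad (8)$$

with $C_1(0)$ and $C_2(0)$ being the primary and secondary variances and $\gamma_R(\mathbf{h})$ any permissible semivariogram model with unit sill.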
Both MM1 and MM2 are permissible models of coregionalization because it can be shown that they are the (cross) correlograms of specific pairs of random functions verifying the constitutive screening assumptions Eq. (4) or Eq. (5).
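The MM2 construction of the primary model can be sketched in a few lines. The two functions below transcribe the second relation of Eq. (7) and Eq. (8); their names and argument order are hypothetical choices made for this illustration.

```python
def mm2_primary_correlogram(rho2_h: float, rho_r_h: float, rho12_0: float) -> float:
    """Second relation of Eq. (7): rho_1 = rho12^2 * rho_2 + (1 - rho12^2) * rho_R."""
    r2 = rho12_0 ** 2
    return r2 * rho2_h + (1.0 - r2) * rho_r_h


def mm2_primary_semivariogram(gamma2_h: float, gamma_r_h: float,
                              c1_0: float, c2_0: float, rho12_0: float) -> float:
    """Eq. (8): primary semivariogram built from gamma_2 and the unit-sill gamma_R."""
    r2 = rho12_0 ** 2
    return c1_0 * (r2 * gamma2_h / c2_0 + (1.0 - r2) * gamma_r_h)


if __name__ == "__main__":
    # Hypothetical variances and correlation, chosen only to exercise the functions.
    c1_0, c2_0, rho12_0 = 10.0, 4.0, 0.6
    # At the sill (gamma_2 = C2(0), gamma_R = 1) the model returns C1(0).
    print(mm2_primary_semivariogram(c2_0, 1.0, c1_0, c2_0, rho12_0))
```

Because the weights $\rho_{12}^2(0)$ and $1-\rho_{12}^2(0)$ sum to one, the resulting primary model reaches the sill $C_1(0)$ whenever $\gamma_2$ reaches $C_2(0)$ and $\gamma_R$ reaches 1.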
2. Program newcokb3D
The documentation of the GSLIB 2.0 cokriging program cokb3d mentions the concept of colocated cokriging, but this option is not implemented (Deutsch and Journel, 1998, p. 100). The new code proposed here, newcokb3D, includes the option of colocated cokriging under both Markov models MM1 and MM2 and the general linear model of coregionalization (LMC). The code newcokb3D also allows the useful concept of cokriging with a locally varying mean, with the cokriging estimator written as (Goovaerts, 1997, p. 190)

$$Z_1^*(\mathbf{u}) - m_1(\mathbf{u}) = \sum_{\alpha_1=1}^{n_1} \lambda_{\alpha_1} \left[ Z_1(\mathbf{u}_{\alpha_1}) - m_1(\mathbf{u}_{\alpha_1}) \right] + \sum_{k \neq 1}^{K} \sum_{\alpha_k=1}^{n_k} \lambda_{\alpha_k} \left[ Z_k(\mathbf{u}_{\alpha_k}) - m_k \right], \qquad (9)$$

where the locally varying primary mean values $m_1(\mathbf{u})$ are given by an input file, lvmfl.dat in the parameter file of Fig. 1. These locally varying mean values can be used to introduce information related to the primary variable $Z_1$. The option of utilizing a locally varying primary mean can be used with or without the option of colocated cokriging.

Fig. 1. Parameter file of the newcokb3D code.
2.1. Input options
• Local varying mean option: this relates only to the primary variable. The primary means $m_1(\mathbf{u})$ at all nodes $\mathbf{u}$ of the estimation grid are provided as an input file. The primary mean $m_1(\mathbf{u}_\alpha)$ at any primary data location $\mathbf{u}_\alpha$ is identified with the mean $m_1(\mathbf{u})$ at the node $\mathbf{u}$ nearest to $\mathbf{u}_\alpha$.
• Colocated cokriging option: under this option, only one secondary variable is allowed.
  • If the Markov model MM1 is used, enter the colocated correlation coefficient $\rho_{12}(0)$, the variance $C_Y(0)$ of the secondary variable and only the primary semivariogram model $\gamma_1(\mathbf{h}) = C_1(0) - C_1(\mathbf{h})$ as $\gamma_{11}$.
  • If the Markov model MM2 is used, enter the colocated correlation coefficient $\rho_{12}(0)$ and the variance $C_1(0)$ of the primary variable; enter as $\gamma_{11}$ the residual semivariogram model $\gamma_R(\mathbf{h}) = 1 - \rho_R(\mathbf{h})$ defined by the second relation of Eq. (7); last, enter as $\gamma_{22}$ the secondary semivariogram model $\gamma_2(\mathbf{h}) = C_2(0) - C_2(\mathbf{h})$.

Fig. 2. Pixel maps of the reference primary and secondary data sets. Exhaustive and sample scattergrams of primary vs. secondary. (The white dots in Fig. 2a are the sample locations.)
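The nearest-node rule of the local varying mean option can be sketched as follows. This is an illustration only, not the newcokb3D source: it assumes a regular 2D grid described by the center of its first cell (`xmn`, `ymn`) and its cell sizes (`xsiz`, `ysiz`), with 0-based indices, and all function and parameter names are hypothetical.

```python
import math


def nearest_node_index(x: float, xmn: float, xsiz: float, nx: int) -> int:
    """0-based index of the grid node nearest to coordinate x along one axis.

    xmn is the coordinate of the first node (cell center), xsiz the node
    spacing and nx the number of nodes. Locations outside the grid are
    clamped to the closest edge node.
    """
    i = math.floor((x - xmn) / xsiz + 0.5)
    return min(max(i, 0), nx - 1)


def local_mean_at(x: float, y: float, lvm, xmn: float, ymn: float,
                  xsiz: float, ysiz: float) -> float:
    """Local mean assigned to a datum at (x, y): the value at the nearest node.

    lvm is a list of rows, lvm[iy][ix], holding m1(u) at every grid node.
    """
    ny, nx = len(lvm), len(lvm[0])
    ix = nearest_node_index(x, xmn, xsiz, nx)
    iy = nearest_node_index(y, ymn, ysiz, ny)
    return lvm[iy][ix]


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid of local means with unit cells centered at 0.5, 1.5.
    lvm = [[1.0, 2.0], [3.0, 4.0]]
    print(local_mean_at(1.4, 0.6, lvm, 0.5, 0.5, 1.0, 1.0))
```

Clamping out-of-grid locations to the edge node is one possible design choice; a program could equally reject such data.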

3. Case study
The GSLIB primary and secondary reference data sets in files true.dat and ydata.dat (Deutsch and Journel, 1998, pp. 37–38) are used. These data sets consist of 2500 values on a regular 50 × 50 grid; see the pixel plot maps in Fig. 2. The entire secondary data set (file ydata.dat) and the 29 primary sample data (file data.dat) are considered the information available for estimation of the remaining primary values. Fig. 2 also gives the scattergrams of primary vs. secondary data using all reference data (2500 pairs) and the sample primary data (29 pairs). The sample correlation $\hat\rho_{12}(0) = 0.493$ is seen to be less than the actual (reference) correlation value 0.64. All later cross correlation modeling uses the sample correlation value 0.493.
Fig. 3 gives the experimental omnidirectional semivariograms calculated from the sample information (29 primary + 2500 secondary data) and their fit by the linear model of coregionalization (LMC), for $|\mathbf{h}| > 0$:

$$\begin{aligned}
\gamma_1(\mathbf{h}) &= 14.0 + 13.0\,\mathrm{Sph}(|\mathbf{h}|/30) \\
\gamma_{12}(\mathbf{h}) &= 2.0 + 5.2\,\mathrm{Sph}(|\mathbf{h}|/30) \\
\gamma_Y(\mathbf{h}) = \gamma_2(\mathbf{h}) &= 0.3 + 6.9\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.8\,\mathrm{Gauss}(|\mathbf{h}|/6),
\end{aligned} \qquad (10)$$

where $\mathrm{Sph}(h/a)$ and $\mathrm{Gauss}(h/a)$ designate unit sill spherical and Gaussian structures with (practical) range $a$.

Note that the colocated correlation coefficient corresponding to this LMC model is $\rho_{12} = 7.2/\sqrt{27 \times 8} = 0.49$, a value close to the sample correlation 0.493.

Fig. 3. Sample (29) omnidirectional semivariograms (dashed line) and LMC model fit (solid line): (a) primary, (b) secondary, (c) cross-semivariogram.

3.1. MM1 model

One starts by modeling the primary semivariogram $\gamma_1(\mathbf{h})$ as in model Eq. (10). Then the cross semivariogram is given by relation Eq. (3), that is

$$\gamma_{12}(\mathbf{h}) = C_{12}(0) - C_{12}(\mathbf{h}) = \sqrt{C_1(0)\,C_2(0)}\;\rho_{12}(0)\left[1 - \rho_1(\mathbf{h})\right] = \sqrt{\frac{C_2(0)}{C_1(0)}}\;\rho_{12}(0)\,\gamma_1(\mathbf{h}),$$

where $C_1(0) = 27$ is the sill of the $\gamma_1(\mathbf{h})$ model. Since in MM1 the secondary semivariogram is not modeled, the value $C_2(0)$ is identified with the sample variance, $C_2(0) = 7.32$. Finally the MM1 model is written, see model fit in Fig. 4, for $|\mathbf{h}| > 0$:

$$\begin{aligned}
\gamma_1(\mathbf{h}) &= 14.0 + 13.0\,\mathrm{Sph}(|\mathbf{h}|/30) \\
\gamma_{12}(\mathbf{h}) &= 0.26\,\gamma_1(\mathbf{h}) = 3.6 + 3.3\,\mathrm{Sph}(|\mathbf{h}|/30).
\end{aligned} \qquad (11)$$

The resulting colocated correlation coefficient is $\rho_{12} = 6.9/\sqrt{27 \times 7.32} = 0.49$, a value equal to that of LMC and close to the sample correlation 0.493. Note the larger nugget effect (3.6) of the MM1 cross semivariogram model compared to that (2.0) of LMC: this is because the MM1 cross semivariogram $\gamma_{12}(\mathbf{h})$ is modeled from $\gamma_1(\mathbf{h})$, which has a large nugget effect; compare Fig. 3c and Fig. 4. Recall that MM1 does not include a model for $\gamma_2(\mathbf{h})$.

Fig. 4. Sample (29) omnidirectional cross-semivariogram (dashed line) and MM1 model fit (solid line). The MM1 model for the primary semivariogram is the same as LMC.

3.2. MM2 model

The MM2 model is more difficult to use. One starts by modeling the secondary semivariogram $\gamma_2(\mathbf{h})$; the same model as in Eq. (10) is used. Next the cross semivariogram model is given by the MM2 relation Eq. (6) as

$$\gamma_{12}(\mathbf{h}) = \sqrt{\frac{C_1(0)}{C_2(0)}}\;\rho_{12}(0)\,\gamma_2(\mathbf{h}).$$

With $C_1(0) = 25.8$ identified with the primary sample variance, $C_2(0) = 8.0$ identified with the sill of model $\gamma_2(\mathbf{h})$ in expression (10), and $\hat\rho_{12}(0) = 0.493$, one obtains

$$\gamma_{12}(\mathbf{h}) = 0.9\,\gamma_2(\mathbf{h}) = 0.3 + 6.1\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.7\,\mathrm{Gauss}(|\mathbf{h}|/6).$$
This MM2 cross semivariogram model has a much lower nugget effect than the model provided by either LMC or MM1, see Eqs. (10) and (11), because it is modeled from $\gamma_2(\mathbf{h})$, which has a low nugget effect. All three models, LMC, MM1 and MM2, for the cross semivariogram $\gamma_{12}(\mathbf{h})$ provide equally good fits of the experimental values; compare Figs. 3c, 4 and 5c. The sampling fluctuations of $\hat\gamma_{12}(\mathbf{h})$ due to too few data pairs do not allow one to decide which of the three models is best; this decision will have to be based on a qualitative evaluation of the Markov hypotheses underlying the MM1 and MM2 models.
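The colocated correlation coefficients and proportionality factors quoted for the three cross semivariogram models can be re-derived from the model sills with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the helper name `colocated_correlation` is hypothetical.

```python
import math


def colocated_correlation(cross_sill: float, c1_0: float, c2_0: float) -> float:
    """rho_12(0) implied by a model: cross sill over sqrt(C1(0) * C2(0))."""
    return cross_sill / math.sqrt(c1_0 * c2_0)


RHO_SAMPLE = 0.493  # sample colocated correlation

# LMC: sills read from Eq. (10).
rho_lmc = colocated_correlation(2.0 + 5.2, 14.0 + 13.0, 0.3 + 6.9 + 0.8)

# MM1: gamma_12 = sqrt(C2(0)/C1(0)) * rho_12(0) * gamma_1, with C2(0) = 7.32.
factor_mm1 = math.sqrt(7.32 / 27.0) * RHO_SAMPLE
rho_mm1 = colocated_correlation(3.6 + 3.3, 27.0, 7.32)

# MM2: gamma_12 = sqrt(C1(0)/C2(0)) * rho_12(0) * gamma_2, with C1(0) = 25.8.
factor_mm2 = math.sqrt(25.8 / 8.0) * RHO_SAMPLE

if __name__ == "__main__":
    print(f"LMC rho12 = {rho_lmc:.2f}")
    print(f"MM1 factor = {factor_mm1:.2f}, rho12 = {rho_mm1:.2f}")
    print(f"MM2 factor = {factor_mm2:.1f}")
```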

Last, the experimental primary semivariogram $\hat\gamma_1(\mathbf{h})$ is to be modeled according to Eq. (8), i.e. using a combination of the standardized model $\gamma_2(\mathbf{h})/C_2(0)$ and the degree of freedom provided by $\gamma_R(\mathbf{h})$. Recall that any permissible semivariogram model with unit sill can be used for $\gamma_R$. In practice, $\gamma_R(\mathbf{h})$ is modeled from the experimental difference values:

$$\hat\gamma_R(\mathbf{h}) = \frac{1}{1 - \hat\rho_{12}^2(0)} \left[ \frac{\hat\gamma_1(\mathbf{h})}{\hat C_1(0)} - \hat\rho_{12}^2(0)\, \frac{\gamma_2(\mathbf{h})}{C_2(0)} \right],$$

where $\hat\gamma_1(\mathbf{h})/\hat C_1(0)$ is the standardized experimental primary semivariogram, $\gamma_2(\mathbf{h})/C_2(0)$ is the standardized secondary semivariogram model taken from Eq. (10), and $\hat\rho_{12}(0) = 0.493$ is the sample correlation. Fig. 5a gives the experimental difference values $\hat\gamma_R(\mathbf{h})$ and the fit by the unit sill model, for $|\mathbf{h}| > 0$:

$$\gamma_R(\mathbf{h}) = 0.3 + 0.7\,\mathrm{Sph}(|\mathbf{h}|/30).$$

Fig. 5. Sample (29) omnidirectional semivariogram (dashed line) and MM2 model fit (solid line): (a) residual, (b) primary, (c) cross-semivariogram. The MM2 model for the secondary semivariogram is the same as LMC.

Fig. 6. Ordinary kriging maps and their scattergrams with the reference true primary values: (a) LMC, (b) MM1, (c) MM2.

Finally, the MM2 model for the primary semivariogram is written according to Eq. (8), see model fit in Fig. 5b, for $|\mathbf{h}| > 0$:

$$\begin{aligned}
\gamma_1(\mathbf{h}) &= \frac{25.8}{8.0}\,0.493^2\,\gamma_2(\mathbf{h}) + 25.8\left(1 - 0.493^2\right)\gamma_R(\mathbf{h}) \\
&= 0.78\,\gamma_2(\mathbf{h}) + 19.5\,\gamma_R(\mathbf{h}) \\
&= 6.1 + 19.0\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.6\,\mathrm{Gauss}(|\mathbf{h}|/6).
\end{aligned}$$

When compared to the LMC and MM1 Eqs. (10) and (11), MM2 results in a $\gamma_1$-model with much less nugget effect.

In summary, the MM2 model is written, see Fig. 5b and c, for $|\mathbf{h}| > 0$:

$$\begin{aligned}
\gamma_1(\mathbf{h}) &= 6.1 + 19.0\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.6\,\mathrm{Gauss}(|\mathbf{h}|/6) \\
\gamma_{12}(\mathbf{h}) &= 0.9\,\gamma_2(\mathbf{h}) = 0.3 + 6.1\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.7\,\mathrm{Gauss}(|\mathbf{h}|/6) \\
\gamma_Y(\mathbf{h}) = \gamma_2(\mathbf{h}) &= 0.3 + 6.9\,\mathrm{Sph}(|\mathbf{h}|/30) + 0.8\,\mathrm{Gauss}(|\mathbf{h}|/6).
\end{aligned} \qquad (12)$$

The corresponding correlation coefficient is $\rho_{12}(0) = 7.2/\sqrt{25.7 \times 8.0} = 0.50$, a value close to the sample correlation 0.493.
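The arithmetic behind the MM2 primary model can be retraced with a short sketch. It assumes, as the printed coefficients suggest, that the two multipliers are first rounded to 0.78 and 19.5 and then applied structure by structure; the dictionary keys and the helper `residual_semivariogram` are hypothetical names introduced for this illustration.

```python
C1_0 = 25.8   # primary sample variance quoted in the text
C2_0 = 8.0    # sill of the secondary model gamma_2 in Eq. (10)
RHO = 0.493   # sample colocated correlation


def residual_semivariogram(gamma1_hat: float, gamma2: float,
                           c1_0: float = C1_0, c2_0: float = C2_0,
                           rho: float = RHO) -> float:
    """Experimental difference value used to model the unit-sill residual gamma_R."""
    r2 = rho ** 2
    return (gamma1_hat / c1_0 - r2 * gamma2 / c2_0) / (1.0 - r2)


# Multipliers of gamma_2 and gamma_R in Eq. (8), rounded as printed in the text.
coef_gamma2 = round(C1_0 / C2_0 * RHO ** 2, 2)
coef_gamma_r = round(C1_0 * (1.0 - RHO ** 2), 1)

# Structure contributions (nugget, spherical range 30, Gaussian range 6).
gamma2_model = {"nugget": 0.3, "sph30": 6.9, "gauss6": 0.8}
gamma_r_model = {"nugget": 0.3, "sph30": 0.7, "gauss6": 0.0}

gamma1_model = {
    name: coef_gamma2 * gamma2_model[name] + coef_gamma_r * gamma_r_model[name]
    for name in gamma2_model
}

if __name__ == "__main__":
    print(coef_gamma2, coef_gamma_r)
    for name, value in gamma1_model.items():
        print(f"{name}: {value:.1f}")
```

Under that rounding assumption the three contributions come out as 6.1, 19.0 and 0.6, matching the primary model of Eq. (12).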

3.3. Ordinary kriging


Using the 29 primary sample values and the 2500 (exhaustive) secondary data, colocated ordinary cokriging was used to estimate all 2500 reference primary z-values. The three previous coregionalization models were used, namely LMC, MM1 and MM2. The same data search strategy was used, hence any difference in the resulting kriging maps is due to the model differences.

Fig. 6 gives the three kriging maps and corresponding scattergrams of estimated values vs. the reference true primary values; compare to Fig. 2a. All three krigings are unbiased in that they reasonably reproduce the mean (3.38) of the 29 samples, and they all show the characteristic smoothing effect (less variance) of all regression-type estimators. The sample size (29) is too small to perform any significant cross validation, hence from the present results we would conclude that there are insignificant differences between the three models. It would have been preferable to consider three different data sets, each leading to a different model; we decided to use the GSLIB data set to limit the size of the paper and also because that data set is widely known and immediately accessible.

Fig. 7. Data locations for estimating the primary value at location u=(9.5,24.5). The seven primary data values are grayscale
coded. The colocated secondary data value at location u is 0.5.


3.4. Detailed analysis


A useful aspect of GSLIB programs, often overlooked, is the debug option, which lists the program's intermediate operations and numerical results. The debug option level 3 (highest) of program newcokb3D has been set 'on' for the ordinary cokriging of the single value $z(\mathbf{u})$ at location $\mathbf{u} = (9.5, 24.5)$ from seven primary data and the single colocated secondary datum $y(\mathbf{u}) = 0.5$. Fig. 7 gives the data configuration map, grayscale coded for the primary data values. Figs. 8 to 10 give the three debug printouts corresponding to the three models used, LMC, MM1 and MM2.

First the (9 × 9) cokriging matrix is listed, starting on the left with the right-hand side data-to-unknown covariance values r(.), then following with the data-to-data covariance values a(.). The last row corresponds to the ordinary kriging condition of all eight kriging weights summing up to 1. Recall the standardized ordinary cokriging estimate expression (Deutsch and Journel, 1998, p. 74)

$$Z^*(\mathbf{u}) = \sum_{\alpha_1=1}^{7} \lambda_{\alpha_1} Z(\mathbf{u}_{\alpha_1}) + \lambda_0 \left[ Y(\mathbf{u}) - m_2 + m_1 \right],$$

with $\sum_{\alpha_1=1}^{7} \lambda_{\alpha_1} + \lambda_0 = 1$; $m_1 = 3.38$ is the primary variable mean and $m_2 = 2.32$ the secondary variable mean; thus, $y(\mathbf{u}) - m_2 + m_1 = 1.560$.

Following the covariance matrix, the debug printout gives the Lagrange parameter value, the (3D) coordinates, values and kriging weights of the eight data retained. Last, the estimated value $z^*(\mathbf{u})$ and its kriging variance $\sigma_K^2(\mathbf{u})$ are listed, e.g. for LMC: $z^*(\mathbf{u}) = 1.78$, $\sigma_K^2(\mathbf{u}) = 18.04$, see Fig. 8.

Fig. 8. Debug output for standardized ordinary cokriging using LMC model.

Fig. 9. Debug output for standardized ordinary cokriging using MM1 model.
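The standardized ordinary cokriging estimate recalled above can be evaluated with a small helper once the weights are known. This is a sketch only: the function name, its signature and the weights in the usage example are hypothetical and are not taken from the debug printouts; only the means 3.38 and 2.32 and the datum y(u) = 0.5 come from the text.

```python
def standardized_ock_estimate(primary_weights, primary_values, lam0,
                              y_colocated, m1, m2, tol=1e-6):
    """Standardized ordinary cokriging estimate with one colocated secondary datum.

    The secondary datum is rescaled to the primary mean, y - m2 + m1, and the
    single unbiasedness constraint requires all weights to sum to 1.
    """
    if len(primary_weights) != len(primary_values):
        raise ValueError("one weight per primary datum is required")
    if abs(sum(primary_weights) + lam0 - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    y_rescaled = y_colocated - m2 + m1
    primary_part = sum(w * z for w, z in zip(primary_weights, primary_values))
    return primary_part + lam0 * y_rescaled


if __name__ == "__main__":
    # Hypothetical single primary datum (value 2.0, weight 0.6) plus the
    # colocated secondary datum y(u) = 0.5 with weight 0.4.
    print(standardized_ock_estimate([0.6], [2.0], 0.4, 0.5, m1=3.38, m2=2.32))
```

The tolerance `tol` is there because weights copied from a printout are rounded and may not sum to 1 exactly.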

4. Results
The LMC and MM1 models give very similar results; this is as expected from their close cross semivariogram expressions, recall Eqs. (10) and (11). The smaller nugget constant (2.0) of the LMC model for $\gamma_{12}(\mathbf{h})$ results in greater cross correlation (redundancy) between the secondary datum $y(\mathbf{u})$ and the seven primary data; this explains the lesser weight (0.405) given to that secondary datum by the LMC model (that weight is 0.426 for MM1). The MM2 model has a significantly smaller nugget for $\gamma_{12}(\mathbf{h})$: 0.3 vs. 2.0 for LMC and 3.6 for MM1, see the second expressions of relations Eqs. (10)–(12); consequently MM2 gives the least weight (0.169) to the secondary datum $y(\mathbf{u})$, deemed more redundant with the seven primary data.

MM2 also has the smallest nugget (6.1) for the primary semivariogram $\gamma_1(\mathbf{h})$, hence provides the best correlation between the primary data and the primary unknown: this compounds the previous effect and increases the weights given to the primary data, in particular that (0.131) given to the fourth primary and highest datum value $z(\mathbf{u}_4) = 8.03$. Both these effects compound to deliver the highest estimated value $z^*(\mathbf{u}) = 1.82$ when using the MM2 model, with a significantly smaller kriging variance $\sigma_K^2(\mathbf{u}) = 11.98$.

Fig. 10. Debug output for standardized ordinary cokriging using MM2 model.
Although the global results displayed in Fig. 6 may not show significant differences between the three models, analysis of the debug output shows that the decision underlying the choice of any of the three previous models could be significant for local estimation. In practice, because of data sparsity, the decision to adopt one model rather than another often relies on expert judgment: in this case one should evaluate the appropriateness of the screening hypotheses underlying the MM1 and MM2 models.
5. Conclusions
Cokriging, that is, estimation of a primary variable from primary data and secondary data originating from one or more secondary variables, is possibly one of the most valuable geostatistical algorithms. The single major hurdle in the practice of cokriging is the determination of the set of (cross) semivariograms which model the pattern of (cross) correlation between each pair of variables. The introduction of Markov models, MM1 and more recently MM2, has considerably facilitated this modeling. In short, if the primary variable has a large volume support that can screen the influence of distant secondary data, the MM1 model should be considered: then the cross semivariogram $\gamma_{12}(\mathbf{h})$ is proportional to the primary semivariogram model $\gamma_1(\mathbf{h})$. In the reverse case where the secondary variable with large volume support can screen the influence of distant primary data, the MM2 model is relevant: then the model $\gamma_{12}(\mathbf{h})$ is proportional to the secondary semivariogram model $\gamma_2(\mathbf{h})$.

Program newcokb3D extends GSLIB cokb3D to include the option of colocated cokriging under the general linear model of coregionalization (LMC) or the MM1 or the MM2 model.

A case study using the GSLIB reference data sets demonstrates application of these new options. An analysis of the debug output file shows the impact of all three models when estimating the same location under identical data configuration: if the secondary variable has larger volume support than the primary variable and yet the MM1 model is used, the redundancy of the secondary data with the primary data is understated, resulting in too much weight being given to the secondary data.

We took advantage of this modification of GSLIB program cokb3D to add the valuable option of cokriging with a locally varying primary mean. Information about the primary trend may be available from secondary data.
References
Almeida, A., Journel, A.G., 1994. Joint simulation of multiple variables with a Markov-type coregionalization model. Mathematical Geology 26 (5), 565–588.
Christakos, G., 1984. On the problem of permissible covariance and variogram models. Water Resources Research 20 (2), 252–265.
Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical Software Library and User's Guide, 2nd ed. Oxford University Press, New York, 369 pp.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York, 483 pp.
Journel, A.G., in press. Markov models for cross covariances. SCRF report No. 11, Stanford University, California.
