
Geoderma 262 (2016) 174-186

Contents lists available at ScienceDirect

Geoderma
journal homepage: www.elsevier.com/locate/geoderma

A one-step approach for modelling and mapping soil properties based on profile data sampled over varying depth intervals
T.G. Orton a,b,*, M.J. Pringle b, T.F.A. Bishop a

a Faculty of Agriculture and Environment, The University of Sydney, 1 Central Avenue, Australia Technology Park, Eveleigh, NSW 2015, Australia
b EcoSciences Precinct, Department of Science, Information Technology and Innovation, GPO Box 5078, Brisbane, QLD 4001, Australia

Article info

Article history:
Received 16 April 2015
Received in revised form 31 July 2015
Accepted 10 August 2015
Available online xxxx
Keywords:
Geostatistics
Spatial variability
Soil profile
Depth function
Area-to-point kriging

Abstract
Datasets for modelling and mapping soil properties often consist of samples from many spatial locations, collected from several different soil depth intervals. However, interest may lie in the spatial distribution of the property for a particular target depth interval, which may or may not correspond to the sampled intervals. It is the task of the data analyst to put the data together in such a way that useful and reliable conclusions can be drawn for the soil depths of practical interest. Previous studies to tackle this problem include multi-stage approaches and point-data-based 3-dimensional geostatistical approaches. One disadvantage of a multi-stage approach (for example, first fitting splines to the data for sampled profiles, then imputing new data for the target interval, before considering a spatial analysis with the imputed data) is that the imputation generally ignores any uncertainty in the imputed data, which might give misleading conclusions. Point geostatistical methods, on the other hand, assume that the data represent the value of the target variable at a specific point in the profile, rather than its average over a sampling interval; this too could give misleading estimates. In this work, we present a statistical method that properly deals with the sample support of soil profile data so that all data can be considered in a single geostatistical analysis. The approach is based on the area-to-point kriging framework, which can be used to represent the uncertainty from data that are averages over non-negligible sample supports (in our case, the different sampled depth intervals). We combine a covariance model for the increment-averaged data in the vertical domain with another model for the horizontal variation. This enables us to (i) process all data in a single analysis, and (ii) calculate predictions for any target depth and support based on the same statistical model. We test the approach on data from the Murray-Darling basin in eastern Australia, where interest lies in mapping various soil properties that could have an effect on water salinity of the nearby Muttama Creek; we illustrate the methodology for predicting clay content. Finally, we discuss a number of possible extensions of the methodology to broaden its applicability, which should provide the basis of further studies.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction
Soil properties vary significantly both across the landscape and through the soil profile, and interest lies in characterizing and mapping this variation to provide land users with useful information. Datasets often consist of samples from many spatial locations, at several different depth intervals. Within a particular study, these depth intervals may be fixed (e.g. 0-10 cm, 10-20 cm, and 20-30 cm). Other studies may consider different fixed intervals, or sampling intervals that are defined according to soil horizons and therefore vary between locations within the study. It is then the task of the data analyst to draw useful and reliable conclusions for soil depths of practical interest. For example, the GlobalSoilMap project specifications (Arrouays et al., 2014) dictate that soil properties should be mapped for depth intervals of 0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, 60-100 cm and 100-200 cm.
* Corresponding author.
E-mail address: Thomas.Orton@dsiti.qld.gov.au (T.G. Orton).

http://dx.doi.org/10.1016/j.geoderma.2015.08.013
0016-7061/© 2015 Elsevier B.V. All rights reserved.

Various approaches have been adopted previously to perform this task. A common approach is to fit splines to the profile data for each site (Bishop et al., 1999), use the spline to impute data for the soil property over the depth interval of interest, and then proceed with the analysis as if the value were known without error (e.g. Malone et al., 2009, 2011a; Adhikari et al., 2013; Orton et al., 2014; Bishop et al., 2015). We refer to this as a spline-then-krige (STK) approach. This process does not account for the uncertainty in the values inferred from the spline, and could yield misleading conclusions.
Another possible approach to the problem is 3-dimensional (3-D) geostatistics. However, this has been applied as if the data collected from soil depth intervals were concentrated at a single point (e.g. Hengl et al., 2014), at either one of the bounds of the sampling interval, or at the interval's mid-point. This approach also fails to properly represent the support on which the data were originally collected (over an interval, rather than from a point), and could again yield misleading conclusions. Breidt et al. (2007) recognized the dangers of using midpoint assignment to represent increment-averaged data, and proposed a mixed-model approach for estimating depth functions whilst properly accounting for the interval support of the data; their focus was on the estimation of depth profiles, whereas our focus is more on the use of such data for modelling and mapping using spatial datasets of soil horizon data. Other 3-D approaches (e.g. Poggio and Gimona, 2014; Veronesi et al., 2012) generally suffer the same drawback: all data are assumed to have identical vertical support, which ignores their different uncertainties.
In a geostatistical framework, the sample support of data that are
averages of an attribute over non-negligible areal units can be dealt
with by area-to-point kriging (ATP kriging; Kyriakidis, 2004). This
method allows the sampling units and prediction supports to all have
different sizes and shapes. It has been applied in several case studies
in recent years to analyse areal-averaged data (Kyriakidis and Yoo,
2005; Kerry et al., 2012; Schirrmann et al., 2012; Truong et al., 2014).
Although usually carried out to account for the horizontal support (i.e. the data are areal averages), there is no reason why the same methodology cannot be applied to the vertical support of soil profile data (i.e. to data that are measurements of the average value of a soil property over depth intervals). This was noted by Heuvelink (2014), although we are unaware of any studies that have implemented such an approach.
In this work, we combine the ATP approach for the vertical distribution with standard kriging approaches for the horizontal distribution.
Thus, a statistical model for the complete dataset (all spatial locations
and all depth intervals) is defined, with the support of each datum (a
combination of spatial location and depth interval) properly represented. We refer to this model for increment-averaged data, and the predictions built on the model, as increment-averaged kriging (IAK). We
propose that this all-in-one model should provide a better assessment
of prediction uncertainty compared with a two-stage approach, or an
approach that represents interval data by their mid-points (although
comparison of the different methodologies is not undertaken in the
current study).
We consider the methodology in the framework of a linear mixed
model (LMM; Lark et al., 2006). Thus, part of the variation of the target
variable can be explained by a collection of explanatory variables, with
the remainder being modelled as spatially dependent (i.e. data close
to each other in horizontal space and at similar depths are more likely
to be similar than data far apart in space and at different depths). We
allow interactions between depth and the spatial explanatory variables, so that different relationships can be modelled at different depths in the profile. We also allow the variance parameters of residuals to depend on depth, which provides a mechanism to represent different uncertainties at different depths in the profile.
Usually in ATP-kriging studies, the average covariances must be calculated numerically (by a discretization approach), due to the complex nature of the areal data units in 2-D space. However, for our increment-averaged data, the average covariances can be computed analytically. We derive an expression for the covariance of the increment-averaged data, based on an exponential model for the point covariances. This significantly reduces the computational load of maximum likelihood methods compared with numerical procedures. Nonetheless, for large datasets (when the total number of data is more than a few thousand), likelihood approximation techniques may have to be used (e.g. Stein et al., 2004; Eidsvik et al., 2014), although we do not consider these here.
We test the proposed IAK approach on data from the Murray-Darling basin in eastern Australia, where interest lies in mapping soil properties that could have an effect on water salinity of the nearby Muttama Creek. Soil cores (to a depth of 1 m) were collected from 55 spatial locations over the Muttama catchment, and each core was divided into horizons, giving a total of 192 samples. We use this case study to illustrate the IAK approach, mapping clay content and its attendant uncertainty based on the data from these samples.


2. Theory
Throughout the following, we will assume that the horizontal
support of the data and of the prediction is point support. The method
can be extended to deal with data that are both areal- and depth-wise
averages, if this were to be required in another study. We begin our
presentation of the methodology with a simple stationary model for
the point covariances. We then extend this model to a more realistic
one, allowing variances to depend on depth, before describing how
this relates to the average covariances required to model the variation
of increment-averaged data.
2.1. IAK model: initial stationary model for point covariances
We begin our development towards a statistical model for the analysis of depth interval-averaged data by considering a 3-D model for point data (i.e. with depths, $d$, taken to be fixed points):

$$Y(\mathbf{x},d) = \mu(\mathbf{x},d) + \varepsilon(\mathbf{x},d), \qquad (1)$$

where $\mathbf{x}$ are the horizontal coordinates. We use a linear model of some known covariates to give the trend function, $\mu(\mathbf{x},d)$, which can be written as:

$$\mu(\mathbf{x},d) = X(\mathbf{x},d)\,\beta, \qquad (2)$$

where $X(\mathbf{x},d)$ contains the known values of the covariates and $\beta$ is the vector of associated parameters (to be estimated). This is known as the fixed-effect function, and $X(\mathbf{x},d)$ constitutes a row of the fixed-effect design matrix. We assume that the residuals, $\varepsilon(\mathbf{x},d)$, follow a multivariate normal distribution with mean zero and covariances depending only on the horizontal and vertical separation distances (this assumption will be relaxed in Section 2.2). As a first approach, we assume a separable (product) covariance model (De Iaco et al., 2011):

$$\operatorname{Cov}[Y(\mathbf{x},d),Y(\mathbf{x}',d')] = \sigma^{2}\,\rho_x(h_x;\theta_x)\,\rho_d(h_d;\theta_d), \qquad (3)$$

where $\rho_x(h_x;\theta_x)$ is a correlation function for any pair of observations separated by distance $h_x = |\mathbf{x}' - \mathbf{x}|$, and $\rho_d(h_d;\theta_d)$ is another correlation function of the vertical separation distances, $h_d = |d' - d|$. The parameter $\sigma^2$ is the variance. This simple model assumes that the covariances can be written as the product of a function that depends only on the horizontal separation distances and one that depends only on the vertical separation distances. Although this can be restrictive, it provides a useful starting point, and we suggest possible alternatives in the discussion.
In the product covariance model, we can choose any permissible correlation functions (see e.g. Webster and Oliver, 2001) for $\rho_x(h_x;\theta_x)$ and $\rho_d(h_d;\theta_d)$. However, as Truong et al. (2014) point out, there is no information in areal-averaged data (or in our case, depth interval-averaged data) to define a nugget effect. We therefore assume that the depth-wise correlation function, $\rho_d(h_d;\theta_d)$, has zero nugget, and model it with a single spatial autocorrelation structure. We can still include a nugget effect for the horizontal variation, though, and we write $\rho_x(h_x;\theta_x)$ as the sum of a nugget component and $N_m$ spatial autocorrelation structures:
$$\rho_x(h_x;\theta_x) = s_0\,\rho_{x,0}(h_x) + \sum_{i=1}^{N_m} s_i\,\rho_{x,i}(h_x;\theta_{x,i}), \qquad (4)$$

where:

$$\rho_{x,0}(h_x) = \begin{cases} 1 & \text{if } h_x = 0,\\ 0 & \text{otherwise} \end{cases}$$

is the nugget correlation function; $\rho_{x,i}(h_x;\theta_{x,i})$, $i = 1,\ldots,N_m$, are $N_m$ spatial correlation functions, with parameter vectors $\theta_{x,i}$; parameters $s_i$, $i = 1,\ldots,N_m$, are the proportions of variance associated with each of the $N_m$ spatial correlation functions; and the parameter $s_0 = 1 - \sum_{i=1}^{N_m} s_i$ gives the proportion associated with the nugget variation.
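To make Eq. (4) concrete, the following minimal sketch evaluates the horizontal correlation function as a nugget plus $N_m$ spatial structures. It assumes, purely for illustration, that every structure $\rho_{x,i}$ is exponential, $\exp(-h_x/a_i)$; the function names and the choice of exponential structures are hypothetical, and any permissible correlation function could be substituted.

```python
import math


def exponential_corr(h: float, a: float) -> float:
    """Exponential correlation exp(-h/a); a is the distance parameter."""
    return math.exp(-h / a)


def horizontal_corr(h: float, props: list[float], dist_params: list[float]) -> float:
    """Eq. (4): s_0 * rho_{x,0}(h) + sum_i s_i * rho_{x,i}(h).

    props       -- s_1..s_Nm, variance proportions of the spatial structures
    dist_params -- a_1..a_Nm, one distance parameter per (exponential) structure
    """
    if len(props) != len(dist_params):
        raise ValueError("need one distance parameter per structure")
    s0 = 1.0 - sum(props)  # proportion of variance assigned to the nugget
    if s0 < 0:
        raise ValueError("the proportions s_i must sum to at most 1")
    nugget = 1.0 if h == 0 else 0.0  # rho_{x,0}: 1 at zero separation, else 0
    spatial = sum(s * exponential_corr(h, a) for s, a in zip(props, dist_params))
    return s0 * nugget + spatial


# One exponential structure carrying 70% of the variance (illustrative values).
print(horizontal_corr(0.0, [0.7], [500.0]))   # correlation at zero separation (1)
print(horizontal_corr(1e-6, [0.7], [500.0]))  # just below 0.7: the nugget jump
```

The drop between zero separation and a tiny positive separation equals the nugget proportion $s_0$, which is exactly the discontinuity that the depth-wise function $\rho_d$ is not allowed to have.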

Reparameterizing in terms of $c_i = s_i\sigma^2$, $i = 0,\ldots,N_m$ (i.e. the variances associated with the nugget and each of the $N_m$ spatial correlation functions), we can write the full product covariance model as:

$$\operatorname{Cov}[Y(\mathbf{x},d),Y(\mathbf{x}',d')] = \begin{cases} \left(\sum_{i=0}^{N_m} c_i\right)\rho_d(h_d;\theta_d) & \text{if } h_x = 0,\\[4pt] \left(\sum_{i=1}^{N_m} c_i\,\rho_{x,i}(h_x;\theta_{x,i})\right)\rho_d(h_d;\theta_d) & \text{otherwise.} \end{cases} \qquad (5)$$

We work with this form herein.
For the vertical correlation, we will assume an exponential function. One property of the exponential function that we will make use of in this work is that it can be integrated analytically over finite depth intervals; this will allow us to derive an analytical function for the average covariances (which represent the correlation between observations of depth-interval averages), rather than requiring a numerical procedure to approximate them.

2.2. IAK model: non-stationary variances
The stationarity of the variance assumed in the previous section might be particularly restrictive for analysing 3-D data that vary both horizontally and vertically. In particular, we might expect different horizontal (spatial) variability at different depths (e.g. for a ploughed field, we might expect smooth variability in clay content in the cultivated topsoil, and a larger degree of variability at depth). The stationary model given in Eq. (5) can be generalized to accommodate variances, $c_i(\mathbf{x},d,\mathbf{x}',d')$, that depend on the spatial locations and depths, which provides a mechanism to model such effects.
We follow various authors (e.g. Lark, 2009; Haskard and Lark, 2009; Marchant et al., 2009) in assuming that variance terms can be modelled by the product of a (positive) function applied at the two locations in question:

$$c_i(\mathbf{x},d,\mathbf{x}',d') = f_i(\mathbf{x},d)\,f_i(\mathbf{x}',d'). \qquad (6)$$

Although the $f_i(\mathbf{x},d)$ functions can in theory be chosen to model variances that depend on spatially varying covariates, here we consider only the dependence on depth. In particular, we will consider:

$$f_i(\mathbf{x},d) = f_i(d) = P_{r_i}(d;\gamma_i), \qquad (7)$$

where $P_{r_i}(d;\gamma_i)$ denotes the polynomial in $d$ of order $r_i$, with parameter vector $\gamma_i$ of length $r_i + 1$. In this work, we take each polynomial to be of order $r_i = 2$ (although there is no theoretical reason for not including higher-order terms, if deemed to be necessary). Lower-degree polynomials are achieved by setting the higher-order coefficients to zero. In order for the resulting covariance function to be positive definite, we must ensure that $f_i(\mathbf{x},d) > 0$ for all $i$ and for all locations in the study area (both data and prediction locations).
Substituting these polynomial functions into Eq. (5) gives:

$$\operatorname{Cov}[Y(\mathbf{x},d),Y(\mathbf{x}',d')] = \begin{cases} \left(\sum_{i=0}^{N_m} P_{r_i}(d;\gamma_i)\,P_{r_i}(d';\gamma_i)\right)\rho_d(h_d;\theta_d) & \text{if } h_x = 0,\\[4pt] \left(\sum_{i=1}^{N_m} P_{r_i}(d;\gamma_i)\,P_{r_i}(d';\gamma_i)\,\rho_{x,i}(h_x;\theta_{x,i})\right)\rho_d(h_d;\theta_d) & \text{otherwise.} \end{cases} \qquad (8)$$

We write the complete set of covariance parameters as $\theta = \{\theta_x, \theta_d, \gamma\}$, where $\gamma$ contains all of the $\gamma_i$'s and $\theta_x$ contains all of the $\theta_{x,i}$'s.

2.3. IAK model: the LMM for depth interval-averaged data
Eq. (1) presented the statistical model for the point-support variable, with mean function given by Eq. (2) and covariance function Eq. (8). However, our data are given as depth interval averages, and we must use these point-support models to calculate expectations and covariances for the interval support. In the geostatistical literature, this process is known as regularization (e.g. Goovaerts, 2008), with its reverse (the use of interval-support data to infer point-support models) known as deconvolution.
We assume that the measurement of variable $Y$ for an interval $I = [u, l]$ (where $l > u$) represents the arithmetic mean of point values of $Y$ within this interval. The depth interval-averaged variable is then a linear combination of multivariate normal variables, and is also multivariate normal (see e.g. Kyriakidis and Yoo, 2005). Its mean and covariance matrix are given by interval averages of the respective statistics for the point-support variable.
First, the expectation on interval support is:

$$\mu(\mathbf{x};I) = \bar{X}(\mathbf{x};I)\,\beta, \qquad (9)$$

where $\bar{X}(\mathbf{x};I)$ is the average of the point-support design matrix, $X(\mathbf{x},d)$, for all $d \in I$. We write $\bar{X}$ as shorthand for the matrix consisting of $\bar{X}(\mathbf{x};I)$ for all data. We will consider spatial covariates that do not depend on depth. However, we will allow interactions between these spatial covariates and depth, as detailed in Section 3.2; for example, to model the interaction between a spatial covariate $w$ and depth squared ($d^2$), we would include the values

$$\overline{wd^2} = w\,\overline{d^2} = \frac{w\,(u^2 + ul + l^2)}{3}$$

in $\bar{X}(\mathbf{x};I)$. Note that this is not the same as using depth-interval mid-points to define the fixed effects, which would give

$$w\,\bar{d}^{\,2} = w\left(\frac{u+l}{2}\right)^2 = \frac{w\,(u^2 + 2ul + l^2)}{4}.$$

By using average values of $X(\mathbf{x},d)$ over the sample supports here, trend calibration will explicitly account for the change of support: the expectation $X(\mathbf{x},d)\beta$ will be relevant for a point-support variable and $\bar{X}(\mathbf{x};I)\beta$ relevant for a variable on increment-averaged support.
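The difference between the interval mean of $d^2$ and the squared midpoint is easy to check numerically. The short sketch below (function names are illustrative, not from the paper) computes both for an interval $[u, l]$. Their difference is always $(l-u)^2/12$, the variance of a uniform distribution on the interval, so midpoint assignment understates $\overline{d^2}$ most for thick horizons. The linear term is unaffected, because the interval mean of $d$ is exactly the midpoint.

```python
def mean_d(u: float, l: float) -> float:
    """Interval mean of d over [u, l]; equals the midpoint."""
    return (u + l) / 2


def mean_d2(u: float, l: float) -> float:
    """Interval mean of d**2 over [u, l]: (u^2 + u*l + l^2) / 3."""
    return (u * u + u * l + l * l) / 3


def midpoint_d2(u: float, l: float) -> float:
    """Squared midpoint: the approximation that midpoint assignment would use."""
    return mean_d(u, l) ** 2


# A 0-0.3 m horizon (depths in metres) with an illustrative covariate value w = 2.0.
u, l, w = 0.0, 0.3, 2.0
print(w * mean_d2(u, l))      # w * mean(d^2): the entry used in the averaged design row
print(w * midpoint_d2(u, l))  # w * midpoint^2: what midpoint assignment would use
```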
Second, the covariance for a pair of observations on interval support is given by the average of the covariances from the point-support model over the respective depth intervals for the two observations in question. That is:

$$\operatorname{Cov}[Y(\mathbf{x};I),Y(\mathbf{x}';I')] = \frac{1}{|I|\,|I'|}\int_{d\in I}\int_{d'\in I'} \operatorname{Cov}[Y(\mathbf{x},d),Y(\mathbf{x}',d')]\,\mathrm{d}d'\,\mathrm{d}d, \qquad (10)$$

where $|I|$ and $|I'|$ are the lengths of the intervals $I$ and $I'$, respectively (Kyriakidis, 2004). We use $C$ as shorthand to denote the full data covariance matrix with elements defined by Eq. (10).
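Eq. (10) can always be approximated the way ATP kriging usually does it, by discretization. The minimal sketch below averages the point covariance of Eq. (8) over a grid of cell midpoints in each interval (a midpoint-rule approximation to the double integral). It assumes exponential structures for both $\rho_{x,i}$ and $\rho_d$, depths in metres, and hypothetical function and parameter names; it is meant as a cross-check, not as the analytical route taken in the paper.

```python
import numpy as np


def poly(coefs, d):
    """P_r(d; gamma) = gamma_0 + gamma_1*d + gamma_2*d**2 + ... (Horner's rule)."""
    d = np.asarray(d, dtype=float)
    out = np.zeros_like(d)
    for g in reversed(coefs):
        out = out * d + g
    return out


def point_cov(hx, d, dp, gammas, x_dists, a_d):
    """Eq. (8) with exponential rho_{x,i} and rho_d (an illustrative assumption).

    gammas[0] holds the nugget SD polynomial; gammas[i] (i >= 1) holds the SD
    polynomial of spatial structure i, whose distance parameter is x_dists[i-1].
    """
    rho_d = np.exp(-np.abs(d - dp) / a_d)
    if hx == 0:
        total = sum(poly(g, d) * poly(g, dp) for g in gammas)
    else:  # the nugget term vanishes away from h_x = 0
        total = sum(poly(g, d) * poly(g, dp) * np.exp(-hx / a)
                    for g, a in zip(gammas[1:], x_dists))
    return total * rho_d


def interval_cov_numeric(hx, I, J, gammas, x_dists, a_d, n=200):
    """Midpoint-rule approximation of Eq. (10) for intervals I=(u1,l1), J=(u2,l2)."""
    (u1, l1), (u2, l2) = I, J
    s = u1 + (np.arange(n) + 0.5) * (l1 - u1) / n  # cell midpoints within I
    t = u2 + (np.arange(n) + 0.5) * (l2 - u2) / n  # cell midpoints within J
    S, T = np.meshgrid(s, t, indexing="ij")
    # Equal-width cells, so the plain mean approximates (1/|I||J|) * double integral.
    return float(np.mean(point_cov(hx, S, T, gammas, x_dists, a_d)))


# Nugget SD 0.5 (constant); spatial SD 1 + 2d (grows with depth); a_d = 0.3 m.
gammas = [[0.5], [1.0, 2.0]]
print(interval_cov_numeric(0.0, (0.0, 0.1), (0.5, 0.6), gammas, [1500.0], 0.3))
```

Every element of $C$ computed this way costs $n^2$ point-covariance evaluations, which is the computational burden that the analytical result for the exponential model avoids inside a likelihood optimizer.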
In ATP kriging, these average covariances are usually computed numerically, by discretizing the areal units into a number of points, calculating covariances between these points, and averaging the values. Such a numerical procedure is necessary because the integrals (to compute these averages) of the point covariance function over irregular two-dimensional areal units are analytically intractable. However, in our case, the integrals are in one dimension (depth) and as a result are analytically tractable for certain point-covariance functions. In this work, we consider the exponential model for the depth-wise correlation function, $\rho_d(h_d;\theta_d)$, where $\theta_d = a_d$ is a single distance parameter (approximately one third of the effective range of correlation), which allows analytical results for the 1-D interval-averaged covariance. This significantly reduces the computational load compared with a numerical procedure, particularly when maximum likelihood methods are used for parameter estimation. The interval-averaged exponential covariance function is derived in the Supplementary material and presented in Appendix A. Herein, we refer to the methods described in this section for increment-averaged data, and the predictions built on these methods (which follow standard practices as described in the following section), as increment-averaged kriging (IAK).
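For the exponential depth correlation $\rho_d(h) = \exp(-h/a_d)$, one way to obtain a closed form (a derivation sketched here, which may be arranged differently from the paper's Eq. (A6)) uses the even function $K(h) = a_d^2 e^{-|h|/a_d} + a_d|h|$, whose second derivative is $\rho_d(h)$. For $I = [u_1, l_1]$ and $I' = [u_2, l_2]$,

$$\int_{I}\int_{I'} \rho_d(|s-t|)\,\mathrm{d}t\,\mathrm{d}s = K(l_1 - u_2) + K(u_1 - l_2) - K(u_1 - u_2) - K(l_1 - l_2),$$

and dividing by $|I|\,|I'|$ gives the interval-averaged correlation. The sketch below covers only the case of depth-constant variances; with the polynomial standard deviations of Eq. (8), the integrand also carries polynomial factors, which this sketch does not handle. Function names are hypothetical.

```python
import math


def _K(h: float, a: float) -> float:
    """Even function whose second derivative is exp(-|h|/a)."""
    return a * a * math.exp(-abs(h) / a) + a * abs(h)


def interval_avg_exp_corr(I, J, a: float) -> float:
    """Average of exp(-|s - t|/a) over s in I = (u1, l1) and t in J = (u2, l2)."""
    (u1, l1), (u2, l2) = I, J
    if l1 <= u1 or l2 <= u2:
        raise ValueError("intervals must have positive length")
    double_integral = _K(l1 - u2, a) + _K(u1 - l2, a) - _K(u1 - u2, a) - _K(l1 - l2, a)
    return double_integral / ((l1 - u1) * (l2 - u2))


# Example: 0-0.1 m and 0.5-0.6 m horizons with a_d = 0.3 m (illustrative values).
print(interval_avg_exp_corr((0.0, 0.1), (0.5, 0.6), 0.3))
```

The formula holds for disjoint, touching and overlapping intervals alike. For intervals much thinner than $a_d$ the four terms nearly cancel, so very thin supports would call for a series expansion; for horizons a centimetre or more thick and $a_d$ of a few tenths of a metre, double precision is ample.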

3. Methods

3.1. IAK parameter estimation and model selection methods
Parameter estimation for kriging is often carried out by a method-of-moments approach. For ATP kriging, Goovaerts (2008) presents an iterative method-of-moments approach to perform deconvolution and estimate parameters of a point-support variogram. For point-support data, Lark (2000) demonstrated theoretical advantages of maximum likelihood (compared to method-of-moments) to estimate parameters, and this approach was suggested by Kyriakidis (2004) as an alternative for ATP-kriging parameter estimation. A further improvement over maximum likelihood is residual maximum likelihood (REML), introduced by Patterson and Thompson (1971) to reduce the bias in variance parameters as a result of the unknown fixed-effect parameters (Lark et al., 2006). We fit parameters for the IAK model using REML; the REML formula is exactly the same as in the usual case, but with $\bar{X}(\mathbf{x};I)$ in place of $X(\mathbf{x},d)$ to give the fixed-effect design matrix $\bar{X}$, and Eq. (A6) used to calculate the elements of the covariance matrix, $C$:

$$\ln R(\theta \mid \mathbf{y}) = -\tfrac{1}{2}\ln|C| - \tfrac{1}{2}\ln\left|\bar{X}^{T}C^{-1}\bar{X}\right| - \tfrac{1}{2}\,\mathbf{y}^{T}\left(C^{-1} - C^{-1}\bar{X}\left(\bar{X}^{T}C^{-1}\bar{X}\right)^{-1}\bar{X}^{T}C^{-1}\right)\mathbf{y} + k, \qquad (11)$$

where $k$ is a constant (which can be ignored). Two models fitted by REML with the same fixed-effects structure can be compared using the Akaike information criterion (AIC; Akaike, 1974), and we select covariance models based on this criterion.
The fixed-effect parameters can be estimated, conditionally on the REML-estimated covariance parameters, as:

$$\hat{\beta} = \left(\bar{X}^{T}C^{-1}\bar{X}\right)^{-1}\bar{X}^{T}C^{-1}\mathbf{y}. \qquad (12)$$

The covariance matrix for the uncertainty of these parameters is:

$$\operatorname{var}[\hat{\beta}] = \left(\bar{X}^{T}C^{-1}\bar{X}\right)^{-1}. \qquad (13)$$

Eqs. (12) and (13) can be used in Wald tests to determine the significance of particular covariates.

3.2. IAK fixed-effect and covariance model selection algorithm
We model the horizontal and vertical trends by considering interactions between the spatial covariates and depth. By doing this, different spatial trends can be represented at different depths. We assume that our spatial covariates vary in space only; thus, the only mechanism to represent different trends with depth is some kind of interaction between these spatial covariates and depth. Throughout the following, we reserve the term "predictors" to refer to the columns of the fixed-effect design matrix (which may include interactions with depth), and use "covariates" or "input variables" to refer to the original spatial covariates (without interactions with depth). For example, if a model includes predictors based on a categorical input variable with three classes (without depth interactions), then removing this single input variable from the model would reduce the number of predictors by two.
We begin with a large pool of potential spatial covariates, from which we initially remove redundant (highly correlated) covariates. Following Bishop et al. (2015), we identify pairs of (continuous) covariates with correlation ($|r|$) greater than 0.85, and remove from the pool the one with the least explanatory power for the target variable; that is, we formulate the multiple linear regression model:

$$y_i = \beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \beta_3 w_i + \beta_4 d_i w_i + \beta_5 d_i^2 w_i + \varepsilon_i, \qquad i = 1,\ldots,N, \qquad (14)$$

based on each of the pair of identified covariates, $w_i = w_i^{(1)}$ and $w_i = w_i^{(2)}$ (using depth-interval midpoints as a crude approximation for each $d_i$), and remove the one giving the smaller $R^2$.
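The redundancy filter can be sketched as follows. Several details are assumptions rather than statements from the text: pairs are processed greedily, most correlated first, and an ordinary least-squares fit of Eq. (14) supplies each $R^2$. The function and argument names are hypothetical.

```python
import numpy as np


def eq14_design(d, w):
    """Columns of Eq. (14): 1, d, d^2, w, d*w, d^2*w."""
    return np.column_stack([np.ones_like(d), d, d**2, w, d * w, d**2 * w])


def r_squared(X, y):
    """Coefficient of determination of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def drop_redundant(W, y, d, threshold=0.85):
    """Return indices of the covariate columns of W kept after the |r| filter.

    W -- (N, p) continuous covariates at each sample's location
    y -- (N,) target values (e.g. clay content) for each sample
    d -- (N,) depth-interval midpoints, the crude stand-in for d_i
    """
    keep = list(range(W.shape[1]))
    while len(keep) > 1:
        r = np.abs(np.corrcoef(W[:, keep], rowvar=False))
        np.fill_diagonal(r, 0.0)
        i, j = np.unravel_index(np.argmax(r), r.shape)
        if r[i, j] <= threshold:
            break  # no remaining pair exceeds the threshold
        a, b = keep[i], keep[j]
        r2_a = r_squared(eq14_design(d, W[:, a]), y)
        r2_b = r_squared(eq14_design(d, W[:, b]), y)
        keep.remove(a if r2_a < r2_b else b)  # drop the weaker covariate of the pair
    return keep
```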
Based on this reduced pool of $n_c$ spatial covariates, a full fixed-effect design matrix is formulated with interactions between the $n_c$ covariates and depth, $d$, and $d^2$. (Here, we do not consider interactions between the spatial covariates themselves.) For a datum sampled on interval $[u, l]$, the mean of $d$ (equal to the interval midpoint) and the mean of $d^2$ (equal to $(u^2 + ul + l^2)/3$) are used to compute elements of the design matrix. This gives a design matrix with $3n_c + 3$ columns (the predictors).
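Assembling the full $(3n_c + 3)$-column design matrix from interval bounds might look like the following sketch. It assumes each column of `W` is already numeric (a categorical input with $k$ classes would contribute $k-1$ dummy columns, each counted in $n_c$ here), and the function name is hypothetical.

```python
import numpy as np


def iak_design_matrix(W, u, l):
    """Interval-averaged design matrix with columns 1, mean(d), mean(d^2),
    then w_k, w_k*mean(d), w_k*mean(d^2) for each covariate k.

    W    -- (N, n_c) covariate values at each sample's location (depth-invariant)
    u, l -- (N,) upper and lower bounds of each sample's depth interval, u < l
    """
    u = np.asarray(u, dtype=float)
    l = np.asarray(l, dtype=float)
    d_bar = (u + l) / 2                 # interval mean of d (the midpoint)
    d2_bar = (u**2 + u * l + l**2) / 3  # interval mean of d^2
    cols = [np.ones_like(d_bar), d_bar, d2_bar]
    for k in range(W.shape[1]):
        w = W[:, k]
        # Covariates do not vary with depth, so mean(w * d^j) = w * mean(d^j).
        cols += [w, w * d_bar, w * d2_bar]
    return np.column_stack(cols)


# Three samples from two sites (the first site contributes two horizons).
W = np.array([[1.2, 0.0], [1.2, 0.0], [0.4, 1.0]])
u = np.array([0.0, 0.1, 0.0])
l = np.array([0.1, 0.45, 0.3])
X_bar = iak_design_matrix(W, u, l)
print(X_bar.shape)  # (3, 9): 3 * n_c + 3 columns with n_c = 2
```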
It is at this point that we consider the choice of an appropriate covariance model structure. We fit several alternative covariance models with the full fixed-effect design matrix, and select the one giving the smallest AIC. For these alternatives, we consider exponential and Gaussian correlation models, each with nugget and spatial standard deviations given by polynomials in $d$ of order $r_i \le 2$, $i = 0, 1$ ($i = 0$ for the nugget component, $i = 1$ for the spatial component; Eq. (7)). We also consider a pure nugget model, again with $r_0 \le 2$; this gives 21 models to fit and compare.
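The count of 21 candidate covariance models follows from the combinations just listed: for each of the two correlation families, the nugget and spatial standard-deviation polynomials can each have order 0, 1 or 2 ($2 \times 3 \times 3 = 18$), plus three pure-nugget models. A small enumeration (the dictionary keys are illustrative) confirms the arithmetic.

```python
from itertools import product


def candidate_covariance_models(max_order: int = 2) -> list[dict]:
    """Enumerate the covariance structures compared by AIC."""
    models = []
    for family in ("exponential", "gaussian"):
        # r0: order of the nugget SD polynomial; r1: order of the spatial SD polynomial.
        for r0, r1 in product(range(max_order + 1), repeat=2):
            models.append({"family": family, "r0": r0, "r1": r1})
    for r0 in range(max_order + 1):  # pure nugget: no spatial structure
        models.append({"family": "pure_nugget", "r0": r0})
    return models


print(len(candidate_covariance_models()))  # 21
```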
The full design matrix is then reduced based on an iterative procedure with Wald tests. The selected covariance model is fitted with the full $(3n_c + 3)$-column design matrix. Wald tests are then applied, and the least significant of these $3n_c + 3$ predictors is removed if its p-value exceeds 0.05. Note that we only allow removal of the highest-order term, so, for instance, $dw$ cannot be removed if $d^2w$ is still in the model, and $d^2$ cannot be removed if the model still contains any interactions between $d^2$ and the spatial covariates. Also note that categorical predictors based on an input variable with more than two categories occupy more than one column of the design matrix, and these columns are considered for removal together. This process is repeated, refitting the covariance model each time a column is removed from the design matrix, until all remaining variables have a significance of p < 0.05.
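The quantities used at each step of this procedure can be sketched with NumPy: the restricted log-likelihood of Eq. (11) (without the constant $k$), the GLS estimates and their covariance from Eqs. (12) and (13), two-sided Wald p-values for single predictors, and AIC for comparing covariance models. This is a minimal sketch under stated assumptions: $C$ is taken as given (in practice it would be rebuilt from Eq. (10) for each trial parameter vector inside an optimizer), the grouped Wald test for multi-column categorical predictors (a chi-squared test) and the highest-order-term rule are not shown, and all names are hypothetical.

```python
import math

import numpy as np


def reml_loglik(y, X, C):
    """Eq. (11) without the constant k."""
    L = np.linalg.cholesky(C)  # fails if C is not positive definite
    logdet_C = 2.0 * np.sum(np.log(np.diag(L)))
    Ci_X = np.linalg.solve(C, X)  # C^{-1} X
    Ci_y = np.linalg.solve(C, y)  # C^{-1} y
    A = X.T @ Ci_X                # X^T C^{-1} X
    _, logdet_A = np.linalg.slogdet(A)
    beta = np.linalg.solve(A, X.T @ Ci_y)
    quad = y @ Ci_y - (X.T @ Ci_y) @ beta  # y^T (C^-1 - C^-1 X A^-1 X^T C^-1) y
    return -0.5 * (logdet_C + logdet_A + quad)


def gls(y, X, C):
    """Eqs. (12)-(13): GLS estimates and their covariance matrix."""
    Ci_X = np.linalg.solve(C, X)
    cov_beta = np.linalg.inv(X.T @ Ci_X)  # Eq. (13)
    beta = cov_beta @ (Ci_X.T @ y)        # Eq. (12); uses symmetry of C
    return beta, cov_beta


def wald_p_values(beta, cov_beta):
    """Two-sided p-values of z = beta_j / se_j under a standard normal."""
    z = beta / np.sqrt(np.diag(cov_beta))
    return np.array([math.erfc(abs(zj) / math.sqrt(2.0)) for zj in z])


def aic(loglik, n_cov_params):
    """AIC for REML fits that share the same fixed-effect design."""
    return 2.0 * n_cov_params - 2.0 * loglik
```

In the elimination loop, `wald_p_values` would be applied after each refit, and the predictor with the largest p-value above 0.05 (subject to the hierarchy rule) dropped.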
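3.3. IAK prediction
With the methods described in Section 2 used to define the design matrix, $\bar{X}$, and covariance matrix, $C$, prediction of the primary soil property at unsampled locations follows the standard universal kriging equations (see e.g. Webster and Oliver, 2001). That is, the prediction is:

$$\hat{y}_0 = \bar{X}_0\hat{\beta} + C_{0,d}\,C_{d,d}^{-1}\left(\mathbf{y} - \bar{X}_d\hat{\beta}\right), \qquad (15)$$

and variance:

$$\operatorname{var}[\hat{y}_0] = C_{0,0} - C_{0,d}C_{d,d}^{-1}C_{d,0} + \left(\bar{X}_0 - C_{0,d}C_{d,d}^{-1}\bar{X}_d\right)\left(\bar{X}_d^{T}C_{d,d}^{-1}\bar{X}_d\right)^{-1}\left(\bar{X}_0 - C_{0,d}C_{d,d}^{-1}\bar{X}_d\right)^{T}, \qquad (16)$$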

where 0 and $d$ index the rows of $\bar{X}$ corresponding to the prediction and data, respectively, and $C_{0,d}$ refers to the submatrix of $C$ representing covariances between the prediction and data (similarly for $C_{d,d}$, $C_{d,0}$, and $C_{0,0}$). Note that the final additive term in Eq. (16) accounts for uncertainty in the estimated trend parameters, $\hat{\beta}$. We must also consider the desired prediction support in both the horizontal and vertical dimensions. Here, we consider point-support predictions in the horizontal dimension and interval support in the vertical dimension. For validation, we choose the vertical supports of the validation data, and for producing maps we choose the vertical supports of 0-10 cm, 50-60 cm, and 90-100 cm for illustrative purposes. All predictions are calculated using all estimation data (i.e. a global search window is used).
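Eqs. (15) and (16) translate directly into linear algebra. In the minimal sketch below, the covariance blocks are assumed to have been computed already from Eq. (10) for the required supports (for example, point support horizontally and a 0-10 cm interval vertically), and the function name is hypothetical. It returns the full prediction covariance matrix, whose diagonal holds the kriging variances.

```python
import numpy as np


def universal_kriging(X0, Xd, C00, C0d, Cdd, y):
    """Eq. (15) predictions and Eq. (16) prediction covariance.

    X0  -- (m, p) averaged design rows for the prediction supports
    Xd  -- (n, p) averaged design rows for the data
    C00 -- (m, m), C0d -- (m, n), Cdd -- (n, n) covariance blocks
    y   -- (n,) data
    """
    Ci_Xd = np.linalg.solve(Cdd, Xd)
    A = Xd.T @ Ci_Xd                       # X_d^T C_dd^-1 X_d
    beta = np.linalg.solve(A, Ci_Xd.T @ y)  # GLS trend estimate, Eq. (12)
    W = np.linalg.solve(Cdd, C0d.T).T      # C_0d C_dd^-1 (Cdd is symmetric)
    pred = X0 @ beta + W @ (y - Xd @ beta)  # Eq. (15)
    R = X0 - W @ Xd                        # X_0 - C_0d C_dd^-1 X_d
    cov = C00 - W @ C0d.T + R @ np.linalg.solve(A, R.T)  # Eq. (16)
    return pred, cov


# Toy check on a 1-D transect with an exponential covariance and a constant trend.
pts = np.array([0.0, 1.0, 2.5])
y = np.array([10.0, 12.0, 11.0])
Cdd = np.exp(-np.abs(pts[:, None] - pts[None, :]))
Xd = np.ones((3, 1))
target = np.array([1.0])  # coincides with a data location
C0d = np.exp(-np.abs(target[:, None] - pts[None, :]))
pred, cov = universal_kriging(np.ones((1, 1)), Xd, np.eye(1), C0d, Cdd, y)
print(pred, np.diag(cov))  # reproduces y[1] with (numerically) zero variance
```

Without a nugget and with identical supports, kriging is an exact interpolator, which is what the toy check shows; with a horizontal nugget or different prediction and data supports, the variance at a data location stays positive.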

3.4. Interpreting fixed-effect model coefficients
When input variables are defined on their original scales, it can be difficult to interpret and compare the estimated fixed-effect coefficients. For a simple means of comparing regression coefficients, the continuous input variables can be standardized to have means of 0 and standard deviations of 1 (Schielzeth, 2010). Gelman (2008) proposed standardizing to have standard deviations of 0.5 (i.e. by dividing by two standard deviations), so that coefficients for binary variables are more comparable to those for the continuous predictors, and we follow this approach here. The magnitude of the standardized coefficients then provides some indication of the importance of each covariate in the fixed-effect function (albeit with some reservations about collinearity of predictors).

4. Case study

4.1. Site details and data
Stream and land salinity is a major issue in the Murray-Darling basin, eastern Australia, due to its threat to ecosystem health and agricultural productivity. Several sub-catchments have previously been flagged as having high salt exports and high stream salinities (Department of Environment and Climate Change NSW, 2009), and one of these, the 1025 km² Muttama Creek sub-catchment of the Murrumbidgee River (Fig. 1), was selected as the focus of the current study. Knowledge and understanding of the key causes of salt mobilisation from landscapes, and its spatial variability, are vital for salinity control and effective management of the land. Many soil variables affect the release of salt into waterways; here we focus on soil texture, in particular clay content.

Fig. 1. The study area in the Murray-Darling basin, eastern Australia. Coordinates are relative to an origin south west of the study area. The fifty estimation data profiles are shown by crosses and the five numbered validation data profiles by open circles. Note the close proximity of four of the validation locations to data points, so that their symbols overlap.
Soil cores were collected in 2013 from 55 locations across the study area (Fig. 1). Each soil core was taken to a depth of 1 m (or less where shallower soil did not permit this), and divided into horizons. All locations provided between three and six horizons, giving a total of 192 samples available for laboratory analysis. Amongst a number of soil properties, clay content was measured using the hydrometer method. Fig. 2 shows histograms summarizing these data at three depths in the profile (based on the midpoints of sampling intervals, $d$, for display purposes only): upper ($d \le 0.15$ m), middle ($0.15 < d \le 0.5$ m), and lower ($d > 0.5$ m). The data at each depth appear reasonably symmetric, and we proceed with analysis under the assumption that clay content is a Gaussian random variable. The right-hand panel of this figure shows all of the data plotted with the lengths of lines indicating the sampling intervals. This shows the range of sampling intervals in the dataset, and the increasing trend in clay content down the profile. Thicker bars occur where multiple data are very similar.
The aim of this study is to model the clay content data in all samples and map it over the study area for any required depth interval of interest. As detailed previously, we choose the three depths of 0-10 cm, 50-60 cm, and 90-100 cm for mapping, to illustrate differences in the spatial distribution of clay content down the soil profile. For modelling and mapping, we utilize spatial covariate data on 29 covariates, as listed in Table 1. For further details of these covariates, we refer to Bishop et al. (2015). We acknowledge here that this dataset of just 55 spatial locations does not provide the sternest test of the methodology. At this stage, the aim of the case study is more to illustrate the potential of the methodology to deal with a dataset of various sampled depth intervals, modelling trends and spatially correlated residuals in both the horizontal and vertical domains.
4.2. Validation and mapping
Validation is also an important part of any geostatistical analysis. The support at which validation is carried out should ideally represent the support on which predictions are ultimately required (Bishop et al., 2015). For us, the required support is point support in the horizontal domain and interval supports of 0-10 cm, 50-60 cm, and 90-100 cm vertically. However, since our data were sampled based on soil horizons, the depth intervals are irregular between spatial locations; indeed, only six, one and three of the 55 locations produced data directly for the three mapping depth intervals, respectively. Furthermore, the data are clustered in horizontal space, which provides another complication for calculating validation statistics (Brus et al., 2011). Therefore, validation (either by data-splitting or by a full cross-validation exercise) using our dataset (of clustered data from a non-probability sampling design) would not answer the specific questions relevant to our case study. Brus et al. (2011) recommended that when calibration data are a non-probability sample, validation of digital soil maps should be through an additional probability sample, which would be useful in our case to calculate model-free estimates of validation statistics for specific depth intervals of interest. However, without such an independent validation set, we consider the following data-splitting exercise, and stress that this serves more as an illustration than as a full validation exercise.
We split the data into estimation and validation data as follows. We remove five of the profiles, one selected randomly from each of five subregions defined by a k-means clustering on the spatial coordinates of the entire study area. A stratified random sample was preferred
Fig. 2. Histogram plots (left) of the clay content data in the upper (sample midpoint d ≤ 0.15 m), middle (0.15 m < d ≤ 0.5 m) and lower (d > 0.5 m) soil profiles. The right-hand plot shows all data plotted with vertical lines representing the sample depth intervals.


T.G. Orton et al. / Geoderma 262 (2016) 174–186

Table 1
Summary information of the available covariates.

Category | Variable names and descriptions | Spatial support/scale | Source
Spatial coordinates | Eastings, Northings, relative to an origin south west of the study area | n/a | n/a
Digital terrain attributes | Elevation, slope, aspect, plan curvature, profile curvature, wetness index, altitude above channel network, length-slope factor, multi-resolution valley bottom flatness index, multi-resolution ridge-top flatness index, topographic position index (11 variables) | 90-m raster | NASA
Radiometrics | Potassium (K, %), thorium (Th, ppm), uranium (U, ppm), ratio Th:K, ratio U:K, ratio U:Th, ratio U²:Th, dose rate (terrestrial sources of radiation), total dose (terrestrial and cosmic sources of radiation), weathering intensity index (10 variables) | 105-m raster | Minty et al. (2009)
Land use | Three categories (cropping/grazing/other) | 1:100 000 polygons | ABARES
Geology | Three categories (felsic/mafic/other) | 1:250 000 polygons | Geoscience Australia

here to a purely random sample so as to spread the limited validation data over the study area. The locations of the five removed profiles are shown by the open circles in Fig. 1; the remaining 50 profiles (with a total of 174 horizons) are used as estimation data and are shown as asterisks in Fig. 1. Four of the validation locations fall very close to locations in the estimation dataset (between 15 m and 50 m from their nearest estimation data locations). The other location is on the edge of the study area, 1.7 km from its nearest neighbour.
To provide a numerical evaluation of errors for each predicted profile, we calculate a root mean squared error (RMSE) for each of the five validation profiles, with contributions weighted according to the thickness of each sampled depth interval:

$$\mathrm{RMSE}_i = \sqrt{\frac{1}{l_{in_i} - u_{i1}} \sum_{j=1}^{n_i} \left(l_{ij} - u_{ij}\right)\left(\hat{y}_{ij} - y_{ij}\right)^2} \tag{17}$$

where $\hat{y}_{ij}$ is the prediction of $y_{ij}$ (the validation datum for the jth of the $n_i$ layers of validation profile i), $u_{ij}$ and $l_{ij}$ are the bounds of its sampled depth interval, $u_{i1}$ is the upper bound of the top layer for profile i (0 in each case), and $l_{in_i}$ is the lower bound of the bottom layer for profile i.

5. Results

5.1. Spatial covariate pool
The full pool of spatial covariates (Table 1) consisted of 23 continuous variables (horizontal coordinates, DTM-derived and radiometrics variables) and two categorical variables (land use and geology, both with three classes). Highly correlated covariates were removed according to the algorithm detailed in Section 3.2. This resulted in five of the 23 continuous covariates being removed from the predictor pool (total dose, Th and U from the radiometrics variables because of their high correlations with dose rate; ratio U:Th because of its high correlation with ratio U²:Th; and length-slope factor because of its correlation with slope). A full design matrix was formulated for IAK based on interactions between these spatial covariates and both depth, d, and d². (Recall that interactions between the spatial covariates themselves were not considered.) This matrix contained 174 rows (for the 50 spatial locations with data for three to six horizons at each) and 69 columns: 22 (18 columns for continuous predictors + 4 columns for categorical predictors) multiplied by 3 (no interaction, interaction with d, interaction with d²), plus 3 (terms for the constant, d and d²).

5.2. Spatial covariance model selection
Twenty-one different covariance models were fitted, using the full design matrix to give the fixed effects. Exponential and Gaussian spatial covariance models were compared through their AICs, with polynomials in d of different orders used to give the nugget and spatial standard deviations (Table 2). A pure nugget model, representing no spatial correlation, was also compared. The results suggest that the Gaussian model was best, with orders r0 = 2 and r1 = 1 selected for the nugget and spatial standard deviations, respectively (the black and grey dotted lines in Fig. 3). Both components were smallest at the top of the profile. The nugget standard deviation, represented by a quadratic function of depth, reached a maximum at around 60 cm before decreasing, whilst the spatial standard deviation continued to increase down the profile. The selected Gaussian model gave a smaller AIC than the three pure nugget models, indicating that there is some spatial correlation in the residuals from the fitted trend model. We work with the Gaussian spatial correlation model with r0 = 2 and r1 = 1 for IAK herein.

5.3. Fixed-effect model selection
Wald tests were used to remove predictors from the full design matrix that did not contain useful information for predicting the target variable. The procedure presented in Section 3.2 resulted in a design matrix with 21 columns (reduced from 69), based on 8 different spatial variables (4 digital terrain model variables, 3 radiometrics variables, and the geological classes; Fig. 4). We report the fitted fixed-effect model applied for three depth intervals: 0–10 cm, 50–60 cm and 90–100 cm (Table 3a). To apply the model for depth interval [u, l] when the model contains, for example, the three terms $\beta_1\,\mathrm{elev}$, $\beta_2\,d\,\mathrm{elev}$ and $\beta_3\,d^2\,\mathrm{elev}$ (where elev is the elevation), we present the coefficient $\beta_1 + \beta_2\bar{d} + \beta_3\overline{d^2}$, where $\bar{d}$ is the expected value of d in the interval [u, l] (= (u + l)/2) and $\overline{d^2}$ is the expected value of d² in [u, l] (= (u² + ul + l²)/3). The changing coefficients demonstrate the ability of the model to represent different relationships at different depths.
For instance, elevation had a small positive effect on clay content in the topsoil, but a larger negative effect lower down the profile.
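Because the trend is linear in d and d², and the covariates do not vary with depth at a fixed location, averaging the trend over [u, l] simply replaces d and d² by their interval means. The following Python sketch illustrates this under those assumptions; the function names are hypothetical, depths are in metres as in Table 3, and the column layout (constant, d, d², then each covariate alone and interacted with d and d²) is our reading of Section 5.1. It also reproduces the 69-column count of the full design matrix.

```python
def interval_moments(u, l):
    """Return (E[d], E[d^2]) for depth d uniform on the interval [u, l]."""
    return (u + l) / 2.0, (u * u + u * l + l * l) / 3.0


def design_row(covariates, u, l):
    """Hypothetical design-matrix row for an observation averaged over [u, l].

    Covariates are constant in depth at one location, so averaging a trend
    that is linear in d and d^2 replaces d and d^2 by their interval means.
    """
    d_bar, d2_bar = interval_moments(u, l)
    row = [1.0, d_bar, d2_bar]  # constant, d, d^2
    for x in covariates:
        row += [x, x * d_bar, x * d2_bar]  # no interaction, with d, with d^2
    return row


def interval_coefficient(b1, b2, b3, u, l):
    """Effective coefficient b1 + b2*E[d] + b3*E[d^2] of a covariate on [u, l]."""
    d_bar, d2_bar = interval_moments(u, l)
    return b1 + b2 * d_bar + b3 * d2_bar


# 18 continuous columns + 4 dummy columns for the two three-class factors = 22 inputs
row = design_row([0.0] * 22, 0.0, 0.1)
print(len(row))  # 69
```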
The coefficients presented in Table 3a were calculated with input variables defined on their original scales. To allow some comparison of the effects of each variable, we re-estimate the fixed-effect function with the continuous input variables standardized to have means of 0 and standard deviations of 0.5 (Table 3b). The three variables deemed most important at each depth interval are highlighted, suggesting that geology and the weathering intensity index are important in the topsoil, whilst geology, potassium and the radiometrics dose rate are important for mapping in the 0.9–1.0 m depth interval.
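Standardizing to a standard deviation of 0.5 amounts to dividing each centred input by two standard deviations (Gelman, 2008). A minimal Python sketch follows; it is not the authors' code, it assumes the sample standard deviation is used, and the helper names and elevation values are hypothetical. The second function shows why standardized coefficients are comparable: the coefficient on the rescaled input equals the original coefficient multiplied by twice the input's standard deviation, so fitted values are unchanged.

```python
import statistics


def standardize_half_sd(values):
    """Centre a continuous input and divide by two standard deviations,
    giving mean 0 and standard deviation 0.5."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [(v - mean) / (2.0 * sd) for v in values]


def standardized_coefficient(beta, values):
    """Coefficient on the rescaled input giving the same fitted values as
    beta on the original input (the intercept absorbs the centring)."""
    return beta * 2.0 * statistics.stdev(values)


elev = [312.0, 355.0, 298.0, 401.0, 376.0]  # hypothetical elevations (m)
z = standardize_half_sd(elev)
print(statistics.mean(z), statistics.stdev(z))
```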

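Table 2
AICs of the 21 tested covariance models (three pure nugget models, and nine models each with exponential and Gaussian spatial correlation); r1 is the order of the polynomial used to model the square root of the spatial variance, and r0 is the order for the square root of the nugget variance. The selected model is marked with an asterisk (*).

r0 | Pure nugget | Exponential, r1 = 0 | Exponential, r1 = 1 | Exponential, r1 = 2 | Gaussian, r1 = 0 | Gaussian, r1 = 1 | Gaussian, r1 = 2
0 | 691.9 | 680.5 | 664.4 | 661.4 | 679.2 | 664.5 | 661.2
1 | 666.1 | 665.7 | 658.1 | 657.9 | 661.7 | 657.0 | 657.8
2 | 662.6 | 664.3 | 656.6 | 657.8 | 660.1 | 654.2* | 655.8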
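The selection in Section 5.2 amounts to picking the smallest AIC in Table 2. As a quick check, the Python sketch below transcribes the table (the dictionary layout is our own, hypothetical choice) and confirms that the Gaussian model with r0 = 2 and r1 = 1 has the smallest AIC, about 8.4 units below the best pure nugget model.

```python
# AIC values from Table 2. Rows are r0 = 0, 1, 2; columns are r1 = 0, 1, 2.
PURE_NUGGET = [691.9, 666.1, 662.6]  # r1 does not apply
SPATIAL = {
    "Exponential": [[680.5, 664.4, 661.4],
                    [665.7, 658.1, 657.9],
                    [664.3, 656.6, 657.8]],
    "Gaussian": [[679.2, 664.5, 661.2],
                 [661.7, 657.0, 657.8],
                 [660.1, 654.2, 655.8]],
}


def aic_table():
    """Flatten Table 2 into {(model, r0, r1): AIC}; r1 is None for pure nugget."""
    table = {("Pure nugget", r0, None): v for r0, v in enumerate(PURE_NUGGET)}
    for name, rows in SPATIAL.items():
        for r0, row in enumerate(rows):
            for r1, v in enumerate(row):
                table[(name, r0, r1)] = v
    return table


aic = aic_table()
best = min(aic, key=aic.get)
print(len(aic), best, aic[best])  # 21 ('Gaussian', 2, 1) 654.2
print(round(min(PURE_NUGGET) - aic[best], 1))  # 8.4
```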


The residuals from the IAK fixed-effect function were modelled with a Gaussian correlation model (with an effective range of correlation of 12 km) with a nugget effect. The variance parameters of this model depended on depth, as shown in Fig. 3 (the black and grey solid lines for the nugget and spatial standard deviations, respectively); the functions are very similar to those fitted with the full fixed-effect design matrix. The increasing standard deviations down the profile reflect larger uncertainty with depth. The vertical correlation model had an effective range of 70 cm.
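To give a feel for these effective ranges, the sketch below converts them into correlations at chosen separations. It assumes the common convention that the effective range is the distance at which correlation falls to about 0.05, so that an exponential model exp(−h/a) has a = r/3 and a Gaussian model exp(−(h/a)²) has a = r/√3; the exact Gaussian parameterization is not restated in this section, so this is an assumption, as are the function names. The vertical model is taken as exponential, following Eq. (A1) in Appendix A.

```python
import math


def exponential_corr(h, eff_range):
    """exp(-h/a) with a = eff_range/3, so the correlation at eff_range is exp(-3)."""
    return math.exp(-3.0 * h / eff_range)


def gaussian_corr(h, eff_range):
    """exp(-(h/a)^2) with a = eff_range/sqrt(3), so the correlation at eff_range is exp(-3)."""
    return math.exp(-3.0 * (h / eff_range) ** 2)


# Vertical: exponential, effective range 70 cm; horizontal: Gaussian, 12 km.
for dz in (10, 30, 70):
    print(f"vertical {dz} cm: {exponential_corr(dz, 70):.3f}")
for dx in (1, 5, 12):
    print(f"horizontal {dx} km: {gaussian_corr(dx, 12):.3f}")
```

Under this convention, two depths 10 cm apart would still have a residual correlation of roughly 0.65.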
5.4. Validation

Fig. 3. Fitted functions for the nugget (f0(d), black lines) and spatial (f1(d), grey lines) standard deviations (see Eqs. (6) and (7)). Dotted lines are the functions fitted with the full fixed-effect design matrix; solid lines are fitted after removal of insignificant fixed effects.

The predicted clay content profiles at the five validation locations are shown in Fig. 5. The shapes of the continuous predicted profiles (on 1-cm average prediction support) show reasonable agreement with the validation data (shaded horizontal bars). The plots also show the IAK predictions at the support of the validation data (vertical solid black lines) and their associated 95% prediction intervals (vertical dotted lines). The RMSE was smallest for locations 1 and 4 and largest for location 5, the validation location farthest from its nearest estimation datapoint. The 95% prediction intervals captured the true clay content for 12 of the 18 validation horizons (67%), whereas a proportion close to 95% would be expected. However, with just five validation profiles, little can be read into numerical measures of the adequacy of predictions and their uncertainty assessments. One would usually expect smaller prediction variances when the target depth interval is wide. However, this effect is not apparent in our prediction intervals, because we allowed the variance parameters to be functions of depth, which had a larger effect on prediction variances in our study.
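Eq. (17) and the coverage proportion quoted above can be computed as follows. This is a minimal Python sketch with hypothetical function names. Each layer is given as (upper bound, lower bound, observed value, predicted value), and the example numbers are invented for illustration, not taken from the case study.

```python
import math


def thickness_weighted_rmse(layers):
    """Eq. (17) for one profile. layers: list of (u, l, y_obs, y_pred),
    ordered from the top layer to the bottom layer."""
    total = layers[-1][1] - layers[0][0]  # l_{i n_i} - u_{i1}
    sse = sum((l - u) * (y_pred - y_obs) ** 2 for u, l, y_obs, y_pred in layers)
    return math.sqrt(sse / total)


def coverage(checks):
    """Fraction of (y_obs, lower, upper) triples whose interval contains y_obs."""
    return sum(lower <= y <= upper for y, lower, upper in checks) / len(checks)


# Hypothetical profile: depths in cm, clay in %.
profile = [(0, 10, 20.0, 22.0), (10, 30, 30.0, 27.0), (30, 60, 40.0, 41.0)]
print(round(thickness_weighted_rmse(profile), 3))  # 2.041
checks = [(30.0, 25.0, 35.0), (50.0, 20.0, 45.0), (40.0, 35.0, 45.0)]
print(round(coverage(checks), 3))  # 0.667
```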
Fig. 5 also illustrates the coherence of the IAK predictions: within each validation interval, the area to the left of the continuous prediction line is equal to the area to the left of the corresponding vertical black bar. Coherence (also known as the pycnophylactic property) of ATP-kriging point predictions with areal data is a property of the methodology (Kyriakidis, 2004),

Fig. 4. The eight selected covariates. aacn: altitude above channel network; mrrtf: multi-resolution ridge-top flatness index; wndx: weathering intensity index.


Table 3
Coefficients and standardized coefficients of the fitted fixed-effect function, presented for three illustrative depths. The three largest standardized coefficients are highlighted for each depth.
Depth, m

Int

(a)
0–0.1
0.5–0.6
0.9–1.0

Fixed-effect function coefficients


50.4
3.62
0.0463
67.5
21.5
0.666
127
35.7
1.24

dosef

wndx

(b)
0–0.1
0.5–0.6
0.9–1.0

Fixed-effect function standardized coefficients


26.4
4.16
1.78
9.07
52.2
24.7
25.7
9.07
58.6
41.1
47.6
9.07

5.87
5.87
5.87

elev

mrrtf

0.0121
0.0193
0.207
1.81
2.89
31.0

0.608
4.64
8.84
1.08
8.22
15.6

aacn
0.0112
0.365
0.666

slope

g1

g2

46.1
132
275

5.92
30.6
50.3

14.5
20.0
24.3

5.92
30.6
50.3

14.5
20.0
24.3

0.373
12.2
22.3

3.41
9.79
20.4

Int: intercept, mean for reference geological class, other; K: potassium; dosef: dose rate; wndx: weathering intensity index; elev: elevation; mrrtf: multi-resolution ridge-top flatness index; aacn: altitude above channel network; slope: slope; g1: mean for felsic geological class in comparison to reference class, other; g2: mean for mafic geological class in comparison to reference class, other.

which guarantees the agreement between predictions at different supports demonstrated by the IAK approach in this study. This coherence would seem to be a desirable property of a method that is to be employed for prediction over various scales (in terms of the width of the target interval).
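Coherence follows from the linearity of kriging in the averaged covariances: the average covariance between the data and a target interval is the width-weighted average of the covariances with its sub-intervals, so the interval prediction is the corresponding average of the sub-interval predictions. The Python sketch below illustrates this for simple kriging of a single profile with a constant-variance exponential vertical covariance (the constant-variance special case of Eq. (A2) in Appendix A). It is a toy check under stated assumptions (known mean, hypothetical data and parameter values), not the REML-based IAK implementation. The prediction for 50–60 cm equals the mean of the ten 1-cm predictions, up to rounding.

```python
import math


def avg_exp_cov(i1, i2, a, sill=1.0):
    """Average of sill * exp(-|d - d'|/a) over d in i1 = (u, l) and d' in
    i2 = (u', l'); constant-variance special case of Eq. (A2)."""
    (u, l), (u2, l2) = i1, i2

    def c(p, q):
        return math.exp(-abs(p - q) / a)

    overlap = max(0.0, min(l, l2) - max(u, u2))
    s = 2 * a * overlap + a * a * (-c(l, l2) - c(u, u2) + c(u, l2) + c(l, u2))
    return sill * s / ((l - u) * (l2 - u2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def simple_krige(data, target, a, mean):
    """Simple-kriging prediction of the average over `target` from
    interval-averaged data [(interval, value), ...] with a known mean."""
    ivs = [iv for iv, _ in data]
    C = [[avg_exp_cov(p, q, a) for q in ivs] for p in ivs]
    c0 = [avg_exp_cov(p, target, a) for p in ivs]
    w = solve(C, c0)
    return mean + sum(wi * (y - mean) for wi, (_, y) in zip(w, data))


# Hypothetical horizon data for one profile (depths in cm, clay in %).
data = [((0, 10), 22.0), ((10, 35), 30.0), ((35, 70), 41.0), ((70, 100), 45.0)]
a, mean = 20.0, 32.0
whole = simple_krige(data, (50, 60), a, mean)
parts = [simple_krige(data, (50 + k, 51 + k), a, mean) for k in range(10)]
print(round(whole, 6), round(sum(parts) / len(parts), 6))
```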

5.5. Mapping
Fig. 6 (left) shows maps of the predicted clay contents at depths of 0–10 cm, 50–60 cm and 90–100 cm, and Fig. 6 (right) shows the associated widths of the 95% prediction intervals. For presentation, values less than 0% or greater than 100% were truncated to these limits. The topsoil map is relatively homogeneous for large parts of the study area. Table 3 suggested that geology and the weathering intensity index were the most important predictors for defining the spatial distribution in the 0–10 cm depth interval, and the effects of geology can clearly be seen in the resultant map. Lower in the profile, the predicted clay contents are more variable, with more extreme predictions; geology, potassium and the radiometrics dose rate were suggested (Table 3) as being the most important for the 90–100 cm depth. The maps of prediction interval widths show the smallest uncertainties in the topsoil and the largest uncertainties lower in the profile.

6. Discussion
We have presented a framework for analysis of soil profile data in which averages of the soil property over depth intervals (horizons or fixed depth intervals), rather than observations at exact points in the profile, are measured. The increment-averaged kriging (IAK) approach, built on the methodology of area-to-point kriging (Kyriakidis, 2004), properly accounts for the vertical support of the data. This is in contrast to approaches built on the assumption that samples were collected from interval midpoints, which ignore the thicknesses of sampling intervals (e.g. Hengl et al., 2014), and to multi-stage approaches (e.g. spline-then-krige, STK; Malone et al., 2011a) that fail to account for the uncertainty in the interval-sampled data when formulating prediction models.

6.1. Variance with depth
We allowed the covariance parameters for the IAK approach to depend on depth. By doing this, different patterns of spatial variability in the residuals could be modelled at different depths in the soil profile. This led to larger variances, with a greater proportion of spatial variance, being fitted for the horizontal variability deeper in the soil profile.
Malone et al. (2011b) validated predictions and prediction intervals from a spline-then-krige (STK) approach, in which prediction uncertainty was estimated empirically from the residuals (after model fitting) by a fuzzy k-means approach. Their results suggested underestimation of variance for deeper soils. A possible reason for this underestimation is the generally wider sampling intervals at depth, meaning that the imputed data in an STK approach should convey less information about the soil property in the 90–100 cm interval than a true measurement of the 90–100 cm interval would. This uncertainty in the values extracted from the spline is not propagated into model fitting in an STK approach. The IAK approach presented here could address this, and it would be interesting to compare the two approaches in a more extensive validation exercise.
Fig. 5. Validation data (shaded bars) and predictions for the five validation locations (left to right). Continuous lines show predictions of 1-cm averages. Solid and dashed vertical lines show predictions and 95% prediction intervals, respectively, at the support of the validation data.

Fig. 6. Predictions (left) and widths of 95% prediction intervals (right) for clay content (%) at depths 0–10 cm (top), 50–60 cm (middle) and 90–100 cm (bottom).

6.2. Extensions of the covariance model
In contrast to the variance parameters, the range of spatial correlation was assumed constant for all depths. Haskard and Lark (2009) present a spectral tempering approach, which allows the range of a covariance model to depend on covariates. This could be applied to extend the covariance modelling approach used here so that the range can depend on depth. Also, we used a separable covariance model to model the variation with horizontal separation distance and vertical separation distance. This simple model provided a reasonable starting point, but has been criticised for being based on unrealistic assumptions when applied in space–time analyses (e.g. Stein, 2005). Relaxations of this assumption are possible; for instance, product–sum covariance models have been suggested as a flexible generalization for modelling spatio-temporal data (e.g. De Iaco et al., 2001). Another possibility is a sum-metric model (e.g. Heuvelink and Griffith, 2010), although implementation of this model for increment-averaged data might require numerical rather than analytical calculation of average covariances, which could prove challenging for large datasets. Nonetheless, such extensions could be implemented and may improve covariance modelling in our context.
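The difference between the separable model and a product–sum alternative is easiest to see from their formulas. The sketch below is illustrative only. It uses exponential correlations in both directions for simplicity (the case study used a Gaussian horizontal model), the parameter names and values are hypothetical, and the product–sum form C = k1·Cx·Cd + k2·Cx + k3·Cd follows De Iaco et al. (2001), for which k1 > 0, k2 ≥ 0 and k3 ≥ 0 is, to our understanding, a sufficient condition for validity. A separable model satisfies C(hx, hd)·C(0, 0) = C(hx, 0)·C(0, hd); the product–sum model generally does not, and this is what gives it extra flexibility.

```python
import math


def exp_corr(h, a):
    """Exponential correlation exp(-|h|/a)."""
    return math.exp(-abs(h) / a)


def separable_cov(hx, hd, sill, ax, ad):
    """Separable model: sill * rho_x(hx) * rho_d(hd)."""
    return sill * exp_corr(hx, ax) * exp_corr(hd, ad)


def product_sum_cov(hx, hd, k1, k2, k3, ax, ad):
    """Product-sum model: k1*Cx*Cd + k2*Cx + k3*Cd, with unit-sill Cx and Cd."""
    cx, cd = exp_corr(hx, ax), exp_corr(hd, ad)
    return k1 * cx * cd + k2 * cx + k3 * cd


def separability_gap(cov, hx, hd):
    """C(hx, hd) * C(0, 0) - C(hx, 0) * C(0, hd); zero for a separable model."""
    return cov(hx, hd) * cov(0, 0) - cov(hx, 0) * cov(0, hd)


def sep(hx, hd):
    return separable_cov(hx, hd, 100.0, 4.0, 0.3)  # hypothetical parameters


def ps(hx, hd):
    return product_sum_cov(hx, hd, 60.0, 25.0, 15.0, 4.0, 0.3)  # hypothetical


print(separability_gap(sep, 2.0, 0.2), separability_gap(ps, 2.0, 0.2))
```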
We assumed that the vertical component of the covariance model had zero nugget variance, since such a component has no effect on the likelihood of increment-averaged data (all white-noise variation in the vertical dimension is averaged out). However, it is likely that there is variation in the soil property occurring over finer scales than the sampling interval widths, and the data contain no information to model this. Truong et al. (2014) suggested that expert opinion could be used to define a nugget variance in such situations, and this could be used if predictions are required on a point support. In our case, predictions are only required on a block (or increment-average) support, so the nugget variance is also averaged out of the prediction variance. Nonetheless, similar ideas might be used to define a short-range spatially-correlated component of variation (perhaps by assuming an exponential model for this short-range component with a range of 5 cm, and eliciting information about the expected difference between values of the soil property at 5-cm intervals). Prediction variances might be sensitive to the amount of such short-range variation.
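How much of a short-range component survives increment averaging can be quantified directly. For an exponential correlation exp(−h/a) averaged over a single increment of width L, the average correlation is 2a/L − 2(a/L)²(1 − e^(−L/a)). This tends to 0 as a shrinks (the white-noise, nugget limit) and to 1 as a grows. The Python sketch below evaluates this formula and checks it against a midpoint-rule double sum. Taking the distance parameter a as 5 cm, following the 5-cm range mentioned above, is an assumption made purely for illustration, as are the function names.

```python
import math


def avg_corr_one_increment(width, a):
    """Average of exp(-|d - d'|/a) over d, d' in one increment of the given width."""
    r = a / width
    return 2 * r + 2 * r * r * math.expm1(-width / a)  # expm1(x) = e^x - 1


def avg_corr_numeric(width, a, n=400):
    """Midpoint-rule check of the same double average."""
    pts = [(k + 0.5) * width / n for k in range(n)]
    return sum(math.exp(-abs(p - q) / a) for p in pts for q in pts) / n ** 2


# Fraction of a short-range component's point variance retained after
# averaging, with an assumed distance parameter a = 5 cm.
for width in (1.0, 10.0, 50.0):
    print(width, round(avg_corr_one_increment(width, 5.0), 3))
```

With these assumptions, roughly 0.94, 0.57 and 0.18 of the short-range variance is retained for 1-, 10- and 50-cm increments, respectively.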

6.3. Extensions of the trend model


In this study we used a multiple linear regression model to give the trend, allowing interactions between the spatial covariates and depth (and depth squared) so that different spatial trends would be modelled at different depths. The use of a linear model of the covariates in the spatial domain could be a limitation. Incorporating interactions between the spatial covariates provides one possible remedy. Other non-linear terms in d could be included if, for example, an exponential change in a soil property was expected down the soil profile, although some trend parameters would then have to be fitted numerically, along with the covariance parameters, by maximum likelihood (rather than REML). Alternatively, machine-learning techniques (e.g. artificial neural networks, random forests and other regression tree methods) have demonstrated the ability to model non-linear relationships between predictors and target variables. A common approach is to use these techniques in a two-stage methodology, first fitting the trend model with the machine-learning technique assuming independence of model residuals, and second performing a spatial analysis of the residuals and kriging these (e.g. Malone et al., 2009; Lacarce et al., 2012; Martin et al., 2014). The trend-fitting step may overfit, because of the assumption of independence made at this stage. To combat this, regression tree methods could be implemented as more of a one-stage approach by performing the model fitting whilst accounting for spatial correlation of residuals. For instance, the output of a regression tree analysis is a collection of splits of the covariates and the predicted value for each branch end (or terminal node) of the tree. The collection of covariate splits effectively defines a design matrix for the data (and for predictions at unsampled locations), so that the means for the terminal nodes (the vector $\hat{\boldsymbol{\beta}}$) can be refitted by REML in the IAK framework. By adopting this approach, the trend would be (partly) fitted whilst accounting for both the spatial correlation in the residuals and the varying uncertainties arising from the different sampled depth intervals. However, this is beyond the scope of the current work.
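The idea of reusing regression-tree splits as a design matrix can be sketched as follows. The tree, its split thresholds and the covariate layout are hypothetical. Each row of the resulting matrix indicates the terminal node that a location falls into, so the matching coefficient vector holds the terminal-node means, which a REML fit in the IAK framework could then re-estimate. With independent residuals, ordinary least squares on this matrix would simply return the node-wise means of the data.

```python
# Hypothetical fitted tree: internal nodes are (covariate_index, threshold, left, right);
# leaves are integer terminal-node labels 0, 1, 2.
TREE = (0, 250.0, (1, 0.3, 0, 1), 2)
N_LEAVES = 3


def terminal_node(x, node=TREE):
    """Drop covariate vector x down the tree and return its terminal node."""
    while not isinstance(node, int):
        j, threshold, left, right = node
        node = left if x[j] <= threshold else right
    return node


def tree_design_matrix(rows):
    """One indicator column per terminal node (no separate intercept)."""
    X = []
    for x in rows:
        leaf = terminal_node(x)
        X.append([1.0 if k == leaf else 0.0 for k in range(N_LEAVES)])
    return X


# Covariates per location: (elevation in m, a radiometric ratio); values invented.
locations = [(200.0, 0.1), (240.0, 0.5), (310.0, 0.2)]
for row in tree_design_matrix(locations):
    print(row)
```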
An additional advantage of regression-tree approaches is the discretization of covariates into classes, so that extrapolation to extreme values of the covariates is not an issue. In this work, in which measured clay contents ranged from 8 to 83%, many predictions fell outside this range (particularly for the deepest soil). A regression tree approach, as described above, would deal with this, as would curtailing the covariates to the ranges observed in the estimation dataset.
6.4. Transformations
For modelling soil-texture variables, an additional criterion comes into play: each compositional variable (a percentage) must be between 0 and 100%, and if sand, silt and clay contents are modelled simultaneously, their sum must be 100%. Lark and Bishop (2007) suggested the additive log-ratio transform as an appropriate transformation for analysing such soil texture data. This consists of analysing the two transformed variables, $y_1 = \ln(\mathrm{clay}/\mathrm{sand})$ and $y_2 = \ln(\mathrm{silt}/\mathrm{sand})$, as Gaussian variables, before back-transforming predictions of these two variables; predicted values for the three fractions are then guaranteed to be between 0 and 100% and to sum to 100%. To consider such an analysis within the framework of the IAK approach, we would have to consider the scale of averaging. For instance, we assumed in this study that the clay content for an interval [u, l] represented the arithmetic mean of point values within this interval. This is no longer a valid assumption for the transformed variables, y1 and y2, and the same is true for log-transformed variables. Orton et al. (in press) considered the effects of the scale of averaging on composite-sampled soil data (i.e. samples that are formed by aggregating a number of basic soil aliquots before these composite samples are measured). This approach could be combined with the IAK approach presented here to deal with lognormal, or other transformed, variables, and is something that we plan to investigate in further work.
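The additive log-ratio transform, and the averaging problem it raises, can be illustrated in a few lines. In the Python sketch below (hypothetical function names, invented layer values), sand is the reference fraction, and back-transformed values always lie in (0, 100) and sum to 100. The last lines show that back-transforming the average of the transformed values for two equally thick layers does not reproduce their arithmetic mean clay content. This is why the assumption of arithmetic averaging over [u, l] no longer holds on the transformed scale.

```python
import math


def alr(clay, silt, sand):
    """Additive log-ratio transform with sand as the reference fraction."""
    return math.log(clay / sand), math.log(silt / sand)


def alr_inverse(y1, y2):
    """Back-transform to (clay, silt, sand) percentages summing to 100."""
    denom = 1.0 + math.exp(y1) + math.exp(y2)
    return 100.0 * math.exp(y1) / denom, 100.0 * math.exp(y2) / denom, 100.0 / denom


# Two equally thick layers (clay, silt, sand in %); values are invented.
top, bottom = (20.0, 40.0, 40.0), (60.0, 20.0, 20.0)
arith_clay = (top[0] + bottom[0]) / 2  # 40.0
(a1, a2), (b1, b2) = alr(*top), alr(*bottom)
back = alr_inverse((a1 + b1) / 2, (a2 + b2) / 2)
print(arith_clay, [round(v, 1) for v in back])
```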

6.5. Modelling abrupt changes in the soil profile
Adhikari et al. (2013) considered mapping soil texture in Denmark using an STK approach, in which they expected more or less uniform soil texture in the top ploughed layer of agricultural soils. To deal with this, they introduced artificial data at the top and bottom of the ploughed layer, both of thickness 1 cm and with the same measurement as the ploughed layer. This forced the splines of soil texture variables to be constant in the top layer, and only change below the second 1-cm datum. A similar approach could be applied with the IAK method presented here. For example, to deal with texture-contrast soils (which are common in many parts of Australia), in which there is an abrupt change between the A and B horizons, 1-cm thick data could be imputed at the bottom of the A and top of the B horizons. This would have the effect of forcing the profile to have an abrupt change, which should be damped to some degree as prediction locations move further from this data location.
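The data-augmentation idea for texture-contrast soils can be sketched as follows. The function name, the tuple layout (upper depth, lower depth, value, in cm) and the example horizons are hypothetical. The sketch simply appends two 1-cm pseudo-observations on either side of the A/B boundary, each copying the value of the horizon it lies in; these would then enter the IAK analysis as additional increment-averaged data.

```python
def add_boundary_pseudo_data(horizons, boundary, thickness=1.0):
    """Append 1-cm pseudo-observations just above and just below an abrupt
    boundary. horizons: list of (u, l, value) with depths in cm."""
    above = next(v for u, l, v in horizons if u < boundary <= l)
    below = next(v for u, l, v in horizons if u <= boundary < l)
    return horizons + [
        (boundary - thickness, boundary, above),  # bottom of the A horizon
        (boundary, boundary + thickness, below),  # top of the B horizon
    ]


profile = [(0.0, 20.0, 15.0), (20.0, 50.0, 45.0), (50.0, 80.0, 50.0)]  # clay, %
print(add_boundary_pseudo_data(profile, 20.0))
```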
An alternative solution was proposed by Kempen et al. (2011), who dealt with the issue by constructing piecewise depth functions; the parameters of these functions were interpolated and then applied to model the depth function at unsampled locations. Such a piecewise model might be incorporated in the trend component of our framework, perhaps allowing the horizon thicknesses to be predicted as functions of environmental covariates. This would give a non-linear trend function, so its parameters would have to be estimated numerically by maximum likelihood (rather than REML). This seems a more elegant solution than the insertion of artificial data, but may also be more computationally demanding because of the number of non-linear parameters. The general effectiveness of these approaches for dealing with texture-contrast soils warrants further investigation.
6.6. Use of the method as an alternative imputation approach
Finally, although the main message of this paper has been that we want to avoid multi-stage procedures, we note that the IAK method also has potential to be used for imputation in place of the spline stage in STK approaches. This could be done by fitting the parameters of a pure-nugget covariance model, with correlation only in the vertical domain, to all of the available soil-profile data. This requires at least two parameters to be fitted based on all soil-profile data: a variance parameter vector, $\boldsymbol{\theta}_0$, of length at least one (modelling the effect of depth on the standard deviation), and the distance parameter for the vertical correlation, $a_d$. In contrast, the spline approach requires just one parameter to be fitted to all profile data, the smoothness parameter, $\lambda$. Predictions over increments within sampled soil profiles are then very similar to those of equal-area splines. One benefit of IAK imputation is its natural assessment of uncertainty in the imputed values, which offers the opportunity to propagate this uncertainty into model fitting, if the model-fitting methodology allows. For instance, when modelling soil carbon stocks, we must consider bulk densities. Missing bulk density data in parts of the profile can lead to problems, so a reliable means of imputing these values whilst accounting for their uncertainty would be advantageous. Clifford et al. (2014) considered a non-parametric simulation approach for imputing missing values in soil profiles whilst accounting for uncertainty. The work presented here provides an alternative in the framework of the linear mixed model.

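7. Conclusions
We have presented a statistical model to effectively harmonize data sampled over various depth increments. This allows all data to be incorporated into a single statistical analysis, whilst properly accounting for their differing uncertainties. A number of extensions have been suggested, and we believe the work constitutes an interesting avenue of research as an alternative to the commonly applied multi-stage (spline-then-krige) procedures for multi-depth soil mapping.

Acknowledgements
This research was supported under an Australian Postgraduate Award (APA) for International Postgraduate Research Scholarship (IPRS) recipients, funded by the Commonwealth Department of Innovation, Industry, Science and Research (DIISR). We would like to acknowledge the NSW Department of Agriculture for funding to support the field work to collect the soil samples used in this study. We would also like to thank staff and students at the University of Sydney who assisted with the field and lab work for the dataset presented in this work, in particular Ana Horta, Farzina Akter and Dipangkar Kundu.

Supplementary material
Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.geoderma.2015.08.013.

Appendix A
Here we present an expression for the average covariance between Y(x, I) and Y(x′, I′), where I = [u, l] is the depth interval (of an observation or prediction) at horizontal (point) location x, and I′ = [u′, l′] is the depth interval at x′. We assume that the point covariances, i.e. for Y(x, d) and Y(x′, d′), are given by Eq. (8), with order-2 polynomial functions (i.e. quadratic equations) of depth for the standard deviations associated with the nugget and spatial variances (see Eqs. (6) and (7)). We also assume that $\rho_d(h_d; \theta_d)$ is the exponential correlation function:

$$\rho_d(h_d; \theta_d) = \rho_d(h_d; a_d) = \exp\left(-\frac{h_d}{a_d}\right) \tag{A1}$$

and define:

$$
\begin{aligned}
S_i(I, I'; \theta_i, \theta_d) ={}& a_d\, H(l - u')\, H(l' - u) \left\{ P_5\left(\min\{l, l'\}; \boldsymbol{\xi}_i\right) - P_5\left(\max\{u, u'\}; \boldsymbol{\xi}_i\right) \right\} \\
&+ a_d^2 \Big[ - P_2\left(\max\{l, l'\}; \boldsymbol{\psi}_i\right) P_2\left(\min\{l, l'\}; \boldsymbol{\phi}_i\right) \exp\left(-\frac{|l' - l|}{a_d}\right) \\
&\qquad - P_2\left(\max\{u, u'\}; \boldsymbol{\psi}_i\right) P_2\left(\min\{u, u'\}; \boldsymbol{\phi}_i\right) \exp\left(-\frac{|u' - u|}{a_d}\right) \\
&\qquad + P_2\left(\max\{u, l'\}; \boldsymbol{\psi}_i\right) P_2\left(\min\{u, l'\}; \boldsymbol{\phi}_i\right) \exp\left(-\frac{|l' - u|}{a_d}\right) \\
&\qquad + P_2\left(\max\{l, u'\}; \boldsymbol{\psi}_i\right) P_2\left(\min\{l, u'\}; \boldsymbol{\phi}_i\right) \exp\left(-\frac{|u' - l|}{a_d}\right) \Big]
\end{aligned}
\tag{A2}
$$

where i = 0 is for the nugget variance and i = 1, …, N_m are for the N_m spatial variance functions (N_m = 1 in our case study). The function

$$H(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

is the Heaviside function. Writing the quadratic standard-deviation function of component i as $\gamma_{i0} + \gamma_{i1} d + \gamma_{i2} d^2$, the parameter vectors for the polynomial terms in Eq. (A2) are:

$$\boldsymbol{\psi}_i = \begin{bmatrix} \gamma_{i0} + \gamma_{i1} a_d + 2\gamma_{i2} a_d^2 \\ \gamma_{i1} + 2\gamma_{i2} a_d \\ \gamma_{i2} \end{bmatrix} \tag{A3}$$

$$\boldsymbol{\phi}_i = \begin{bmatrix} \gamma_{i0} - \gamma_{i1} a_d + 2\gamma_{i2} a_d^2 \\ \gamma_{i1} - 2\gamma_{i2} a_d \\ \gamma_{i2} \end{bmatrix} \tag{A4}$$

$$\boldsymbol{\xi}_i = \begin{bmatrix} \gamma_{i0}\psi_{i0} + \gamma_{i0}\phi_{i0} \\ (\gamma_{i0}\psi_{i1} + \gamma_{i1}\psi_{i0})/2 + (\gamma_{i0}\phi_{i1} + \gamma_{i1}\phi_{i0})/2 \\ (\gamma_{i0}\psi_{i2} + \gamma_{i1}\psi_{i1} + \gamma_{i2}\psi_{i0})/3 + (\gamma_{i0}\phi_{i2} + \gamma_{i1}\phi_{i1} + \gamma_{i2}\phi_{i0})/3 \\ (\gamma_{i1}\psi_{i2} + \gamma_{i2}\psi_{i1})/4 + (\gamma_{i1}\phi_{i2} + \gamma_{i2}\phi_{i1})/4 \\ \gamma_{i2}\psi_{i2}/5 + \gamma_{i2}\phi_{i2}/5 \end{bmatrix} \tag{A5}$$

where $\psi_{ik}$ and $\phi_{ik}$ (k = 0, 1, 2) are the elements of $\boldsymbol{\psi}_i$ and $\boldsymbol{\phi}_i$.

With these definitions, the average covariances are given by:

$$
\operatorname{Cov}\{Y(x, I), Y(x', I')\} =
\begin{cases}
\dfrac{1}{(l - u)(l' - u')} \displaystyle\sum_{i=0}^{N_m} S_i(I, I'; \theta_i, \theta_d) & \text{if } h_x = 0 \\[2ex]
\dfrac{1}{(l - u)(l' - u')} \displaystyle\sum_{i=1}^{N_m} \rho_{x,i}(h_x; \theta_{x,i})\, S_i(I, I'; \theta_i, \theta_d) & \text{otherwise}
\end{cases}
\tag{A6}
$$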
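Because Eq. (A2) is easy to mistype, a numerical check is useful. The Python sketch below implements the structure of Eqs. (A2)–(A5) for a single component i and compares it with a brute-force midpoint-rule double integral of f(d) f(d′) exp(−|d − d′|/a_d). It assumes that P_2(z; v) = v_0 + v_1 z + v_2 z² and that the five elements of ξ_i multiply z, z², …, z⁵ in P_5. These readings of the polynomial notation, the function names and the parameter values are our assumptions. Dividing the result by (l − u)(l′ − u′) gives the corresponding term of Eq. (A6).

```python
import math


def poly(coefs, z):
    """Evaluate coefs[0] + coefs[1]*z + coefs[2]*z**2 + ... at z."""
    return sum(c * z ** k for k, c in enumerate(coefs))


def s_closed_form(u, l, u2, l2, g, a):
    """Double integral of f(d) f(d') exp(-|d - d'|/a) over [u, l] x [u2, l2],
    where f(d) = g[0] + g[1]*d + g[2]*d**2, following Eqs. (A2)-(A5)."""
    g0, g1, g2 = g
    psi = (g0 + g1 * a + 2 * g2 * a * a, g1 + 2 * g2 * a, g2)  # Eq. (A3)
    phi = (g0 - g1 * a + 2 * g2 * a * a, g1 - 2 * g2 * a, g2)  # Eq. (A4)
    # Coefficients of f(z) * (psi(z) + phi(z)), integrated term by term;
    # xi[k] multiplies z**(k + 1), as in Eq. (A5).
    prod = [0.0] * 5
    for j in range(3):
        for k in range(3):
            prod[j + k] += g[j] * (psi[k] + phi[k])
    xi = [prod[k] / (k + 1) for k in range(5)]

    def p5(z):
        return sum(xi[k] * z ** (k + 1) for k in range(5))

    def corner(p, q):
        # The deeper end point takes psi, the shallower one takes phi.
        return poly(psi, max(p, q)) * poly(phi, min(p, q)) * math.exp(-abs(p - q) / a)

    diag = 0.0
    if l >= u2 and l2 >= u:  # Heaviside terms H(l - u') H(l' - u)
        diag = a * (p5(min(l, l2)) - p5(max(u, u2)))
    corners = -corner(l, l2) - corner(u, u2) + corner(u, l2) + corner(l, u2)
    return diag + a * a * corners


def s_numeric(u, l, u2, l2, g, a, n=300):
    """Midpoint-rule approximation of the same double integral, for checking."""
    h1, h2 = (l - u) / n, (l2 - u2) / n
    d1 = [u + (i + 0.5) * h1 for i in range(n)]
    d2 = [u2 + (j + 0.5) * h2 for j in range(n)]
    f1 = [poly(g, d) for d in d1]
    f2 = [poly(g, d) for d in d2]
    total = 0.0
    for x, fx in zip(d1, f1):
        for y, fy in zip(d2, f2):
            total += fx * fy * math.exp(-abs(x - y) / a)
    return total * h1 * h2


# Hypothetical quadratic standard-deviation coefficients and a_d (depths in m).
g, a = (1.0, 2.0, -1.5), 0.25
for iv in [(0.0, 0.3, 0.2, 0.6), (0.0, 0.2, 0.5, 0.9), (0.1, 0.7, 0.1, 0.7)]:
    print(iv, s_closed_form(*iv, g, a), s_numeric(*iv, g, a))
```

The three test cases cover partially overlapping, disjoint and identical intervals, which together exercise both the Heaviside-gated diagonal term and all four corner terms.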
References
Adhikari, K., Kheir, R.B., Greve, M.B., Bocher, P.K., Malone, B.P., Minasny, B., McBratney,
A.B., Greve, M.H., 2013. High-resolution 3-D mapping of soil texture in Denmark.
Soil Sci. Soc. Am. J. 77, 860876.
Akaike, H., 1974. A new look at the statistical model identication. IEEE Trans. Autom.
Control 19, 716723.
Arrouays, D., McBratney, A.B., Minasny, B., Hempel, J.W., Heuvelink, G.B.M., MacMillan,
R.A., Hartemink, A.E., Lagacherie, P., McKenzie, N.J., 2014. The GlobalSoilMap project
specications. In: GlobalSoilMap: basis of the global spatial soil information system.
In: Arrouays, D., McKenzie, N.J., Hempel, J.W., Richer de Forges, A.C., McBratney,
A.B. (Eds.).
Bishop, T.F.A., McBratney, A.B., Laslett, G.M., 1999. Modelling soil attribute depth functions
with equal-area quadratic smoothing splines. Geoderma 91, 2745.
Bishop, T.F.A., Horta, A., Karunaratne, S.B., 2015. Validation of digital soil maps at different
spatial supports. Geoderma 241242, 238249.

186

T.G. Orton et al. / Geoderma 262 (2016) 174186

Breidt, F.J., Hsu, N.-J., Ogle, S., 2007. Semiparametric mixed models for incrementaveraged data with application to carbon sequestration in agricultural soils. J. Am.
Stat. Assoc. 102, 803812.
Brus, D.J., Kempen, B., Heuvelink, G.B.M., 2011. Sampling for validation of digital soil
maps. Eur. J. Soil Sci. 62, 394407.
Clifford, D., Dobbie, M.J., Searle, R., 2014. Non-parametric imputation of properties for soil
proles with sparse observations. Geoderma 232, 1018.
De Iaco, S., Myers, D.E., Posa, D., 2001. Spacetime analysis using a general productsum
model. Stat. Probab. Lett. 52, 2128.
De Iaco, S., Myers, D.E., Posa, D., 2011. Strict positive deniteness of a product of covariance functions. Commun. Stat. 40, 44004408.
Department of Environment and Climate Change NSW, 2009. Salinity Audit: Upland
catchments of the New South Wales MurrayDarling Basin Available at: www.
environment.nsw.gov.au/resources/salinity/09153SalinityAudit.pdf.
Eidsvik, J., Shaby, B.A., Reich, B.J., Wheeler, M., Niemi, J., 2014. Estimation and prediction in
spatial models with block composite likelihoods. J. Comput. Graph. Stat. 23, 295315.
Gelman, A., 2008. Scaling regression inputs by dividing by two standard deviations. Stat.
Med. 27, 28652873.
Goovaerts, P., 2008. Kriging and semivariogram deconvolution in the presence of irregular
geographical units. Math. Geosci. 40, 101128.
Haskard, K.A., Lark, R.M., 2009. Modelling non-stationary variance of soil properties by
tempering an empirical spectrum. Geoderma 153, 1828.
Hengl, T., de Jesus, J.M., MacMillan, R.A., Batjes, N.H., Heuvelink, G.B.M., Ribeiro, E.,
Samuel-Rosa, A., Kempen, B., Leenaars, J.G.B., Walsh, M.G., Gonzalez, M.R., 2014.
SoilGrids1km global soil information based on automated mapping. PLoS ONE 9,
e105992.
Heuvelink, G.B.M., 2014. Uncertainty quantication of GlobalSoilMap products. In:
GlobalSoilMap: basis of the global spatial soil information system. In: Arrouays, D.,
McKenzie, N.J., Hempel, J.W., Richer de Forges, A.C., McBratney, A.B. (Eds.).
Heuvelink, G.B.M., Grifth, D.A., 2010. Spacetime geostatistics for geography: a case
study of radiation monitoring across parts of Germany. Geogr. Anal. 42, 161179.
Kempen, B., Brus, D.J., Stoorvogel, J.J., 2011. Three-dimensional mapping of soil organic
matter content using soil type-specic depth functions. Geoderma 162, 107123.
Kerry, R., Goovaerts, P., Rawlins, B.G., Marchant, B.P., 2012. Disaggregation of legacy soil
data using area to point kriging for mapping soil organic carbon at the regional
scale. Geoderma 170, 347358.
Kyriakidis, P.C., 2004. A geostatistical framework for area-to-point spatial interpolation.
Geogr. Anal. 36, 259289.
Kyriakidis, P.C., Yoo, E.-H., 2005. Geostatistical prediction and simulation of point values from areal data. Geogr. Anal. 37, 124–151.
Lacarce, E., Saby, N.P.A., Martin, M.P., Marchant, B.P., Boulonne, L., Meersmans, J., Jolivet, C., Bispo, A., Arrouays, D., 2012. Mapping soil Pb stocks and availability in mainland France combining regression trees with robust geostatistics. Geoderma 170, 359–368.
Lark, R.M., 2000. Estimating variograms of soil properties by the method-of-moments and maximum likelihood. Eur. J. Soil Sci. 51, 717–728.
Lark, R.M., 2009. Kriging a soil variable with a simple nonstationary variance model. J. Agric. Biol. Environ. Stat. 14, 301–321.
Lark, R.M., Bishop, T.F.A., 2007. Cokriging particle size fractions of the soil. Eur. J. Soil Sci. 58, 763–774.
Lark, R.M., Cullis, B.R., Welham, S.J., 2006. On spatial prediction of soil properties in the presence of a spatial trend: the empirical best linear unbiased predictor (E-BLUP) with REML. Eur. J. Soil Sci. 57, 787–799.
Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152.
Malone, B.P., McBratney, A.B., Minasny, B., 2011a. Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma 160, 614–626.
Malone, B.P., de Gruijter, J.J., McBratney, A.B., Minasny, B., Brus, D.J., 2011b. Using additional criteria for measuring the quality of predictions and their uncertainties in a digital soil mapping framework. Soil Sci. Soc. Am. J. 75, 1032–1043.
Marchant, B.P., Newman, S., Corstanje, R., Reddy, K.R., Osborne, T.Z., Lark, R.M., 2009. Spatial monitoring of a non-stationary soil property: phosphorus in a Florida water conservation area. Eur. J. Soil Sci. 60, 757–769.
Martin, M.P., Orton, T.G., Lacarce, E., Meersmans, J., Saby, N.P.A., Paroissien, J.B., Jolivet, C., Boulonne, L., Arrouays, D., 2014. Evaluation of modelling approaches for predicting the spatial distribution of soil organic carbon stocks at the national scale. Geoderma 223–225, 97–107.
Minty, B., Franklin, R., Milligan, P., Richardson, M., Wilford, J., 2009. The radiometric map of Australia. Explor. Geophys. 40, 325–333.
Orton, T.G., Pringle, M.J., Page, K.L., Dalal, R.C., Bishop, T.F.A., 2014. Spatial prediction of soil organic carbon stock using a linear model of coregionalisation. Geoderma 230–231, 119–130.
Orton, T.G., Pringle, M.J., Allen, D.E., Dalal, R.C., Bishop, T.F.A., in press. A geostatistical method to account for the number of aliquots in composite samples for normal and lognormal variables. Eur. J. Soil Sci. http://dx.doi.org/10.1111/ejss.12297.
Patterson, H.D., Thompson, R., 1971. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
Poggio, L., Gimona, A., 2014. National scale 3D modelling of soil organic carbon stocks with uncertainty propagation – an example from Scotland. Geoderma 232, 284–299.
Schielzeth, H., 2010. Simple means to improve the interpretability of regression coefficients. Methods Ecol. Evol. 1, 103–113.
Schirrmann, M., Herbst, R., Wagner, P., Gebbers, R., 2012. Area-to-point kriging of soil phosphorus composite samples. Commun. Soil Sci. Plant Anal. 43, 1024–1041.
Stein, M.L., 2005. Space–time covariance functions. J. Am. Stat. Assoc. 100, 310–321.
Stein, M.L., Chi, Z., Welty, L.J., 2004. Approximating likelihoods for large spatial datasets. J. R. Stat. Soc. Ser. B 66, 275–296.
Truong, P.N., Heuvelink, G.B.M., Pebesma, E., 2014. Bayesian area-to-point kriging using expert knowledge as informative priors. Int. J. Appl. Earth Obs. Geoinf. 30, 128–138.
Veronesi, F., Corstanje, R., Mayr, T., 2012. Mapping soil compaction in 3D with depth functions. Soil Tillage Res. 124, 111–118.
Webster, R., Oliver, M.A., 2001. Geostatistics for environmental scientists. John Wiley & Sons, Chichester, UK.