
Computers & Geosciences 26 (2000) 361–371
Geostatistical classification for remote sensing: an introduction
P.M. Atkinson a,*, P. Lewis b
a Department of Geography, University of Southampton, Highfield, Southampton SO17 1BJ, UK
b Department of Geography, University College London, 26 Bedford Way, London WC1H 0AP, UK
Received 15 August 1998; accepted 13 January 1999
Abstract

Traditional spectral classification of remotely sensed images applied on a pixel-by-pixel basis ignores the potentially useful spatial information between the values of proximate pixels. For some 30 years the spatial information inherent in remotely sensed images has been employed, albeit by a limited number of researchers, to enhance spectral classification. This has been achieved primarily by filtering the original imagery to (i) derive texture 'wavebands' for subsequent use in classification or (ii) smooth the imagery prior to (or after) classification. Recently, the variogram has been used to represent formally the spatial dependence in remotely sensed images and used in texture classification in place of simple variance filters. However, the variogram has also been employed in soil survey as a smoothing function for unsupervised classification. In this review paper, various methods of incorporating spatial information into the classification of remotely sensed images are considered. The focus of the paper is on the variogram in classification both as a measure of texture and as a guide to choice of smoothing function. In the latter case, the paper focuses on the technique developed for soil survey and considers the modification that would be necessary for the remote sensing case. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Variogram; Classification; Geostatistics; Smoothing; Texture
* Corresponding author. Tel.: +44-1703-594617; fax: +44-1703-593295. E-mail address: pma@soton.ac.uk (P.M. Atkinson).
0098-3004/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0098-3004(99)00117-X

1. Introduction

This paper reviews geostatistical techniques for the classification of remotely sensed images. Geostatistics is a set of techniques for the analysis of spatial data. All geostatistical techniques are characterised by their dependence on a model of the spatial covariance function or, more frequently, the variogram. Since the techniques described in the main body of this paper depend on the variogram, it is introduced here briefly.

For continuous variables, such as reflectance in a given waveband, the experimental semivariance is defined as half the average squared difference between values separated by a given lag $h$, where $h$ is a vector in both distance and direction. Thus, the experimental variogram $\gamma_v(h)$ (sometimes referred to as the semivariogram, but more generally abbreviated to variogram) may be obtained from $\alpha = 1, 2, \ldots, P(h)$ pairs of observations $\{z_v(x_\alpha), z_v(x_\alpha + h)\}$ defined on a support $v$ at locations $\{x, x + h\}$ separated by a fixed lag $h$, as given in Eq. (1).
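The verbal definition above (half the average squared difference between values a fixed lag apart) can be sketched numerically. This is a minimal illustration only, assuming a regularly sampled one-dimensional transect and a lag expressed as a whole number of samples; the function name `experimental_semivariance` and the transect values are hypothetical and do not come from the paper.

```python
import numpy as np

def experimental_semivariance(z, lag):
    """Half the mean squared difference between values `lag` samples apart."""
    z = np.asarray(z, dtype=float)
    if not 1 <= lag < z.size:
        raise ValueError("lag must satisfy 1 <= lag < len(z)")
    # All P(h) = len(z) - lag pairs {z(x), z(x + h)} along the transect
    diffs = z[:-lag] - z[lag:]
    return 0.5 * np.mean(diffs ** 2)

# Hypothetical transect of pixel values (assumption, for illustration only)
transect = [2.0, 4.0, 6.0, 8.0, 10.0]
gamma = [experimental_semivariance(transect, h) for h in (1, 2, 3)]
```

For this steadily increasing transect the semivariance grows with lag, which is the behaviour expected when proximate pixels are more alike than distant ones. A two-dimensional image would need the same calculation repeated per direction, since the lag is a vector.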
$$\gamma_v(h) = \frac{1}{2P(h)} \sum_{\alpha=1}^{P(h)} \left[ z_v(x_\alpha) - z_v(x_\alpha + h) \right]^2 \qquad (1)$$

Curran (1998) and Woodcock et al. (1988a,b) provide readable introductions to the variogram while Dungan (1998) reviews geostatistical techniques for estimation and simulation in a remote sensing context.

Geostatistical techniques that utilise spatial information in classification can be split into two distinct groups. In the first, spatial information is used to provide data on texture. It is implicit in such approaches that texture varies spatially across the image, and particularly between the classes of interest, so that data on texture can be used to inform classification. In the second group, spatial information is used to smooth the classified image. The rationale for smoothing is that inaccuracies that arise from simple spectral classification applied on a pixel-by-pixel basis can be reduced using the spatial dependence between neighbouring pixels. Proximate pixels are likely to be similar (where the spatial resolution is fine relative to the scale of variation) and this dependence can be formalised (e.g., in a modelled variogram) and utilised to increase classification accuracy. The goal is to choose a smoothing function based upon this variogram model.

The two approaches (texture and smoothing) to using the variogram in multivariate classification are considered separately in the following sections. First, the principles of supervised spectral classification are revisited to provide a foundation and context within which to regard geostatistical approaches.

2. Supervised classification revisited

In this paper, model-based multivariate supervised (e.g., maximum likelihood) classification is the primary interest. However, it may be instructive to consider briefly empirical approaches for univariate classification, following Carr's (1996) example.

2.1. Empirical approaches

One of the simplest approaches to supervised classification is the minimum-distance-to-mean classifier, also referred to as the k-means or (here) c-means classifier. This classifier utilises the Euclidean distances in (spectral) feature space between (i) the pixels to be classified and (ii) the class means (obtained from training data). In the univariate case, a training site is selected which belongs to a desired class and the values for all pixels at the site are averaged to obtain the class mean. This procedure is repeated for all other classes of interest. Each pixel in the remainder of the image may then be allocated to the class mean to which it is nearest in univariate feature space. In a simple empirical approach presented by Carr (1996) this allocation is executed through the prior construction of a simple look-up table such that any pixel value will have associated with it the destination class without the need to compare the distances to all class means. This procedure is readily extended to multivariate feature space where the distances to class means may be obtained by Euclidean geometry.

A second popular method of supervised classification is maximum likelihood (ML) classification based on Bayes' Theorem. ML classification proceeds by selecting the largest posterior probability rather than the minimum distance. In the univariate case, the simple empirical method for ML classification described by Carr (1996) proceeds as for c-means classification except that for each training site the complete distribution of values is retained and a histogram computed in place of the mean. For 8-bit remotely sensed imagery, each histogram (per class) has 256 bins. The number of occurrences in each bin relative to the total number of occurrences determines the conditional probability distribution (conditional because it is per class). Once the conditional probability distributions are computed, the simplified equation for determining class membership probability, as derived from Bayes' Theorem, is:

$$p(c \mid z) = \frac{p(z \mid c)}{\sum_{r=1}^{t} p(z \mid r)} \qquad (2)$$

where $p(c \mid z)$ is the conditional probability of having class $c$ given pixel value $z$, $p(z \mid c)$ is the conditional probability of observing pixel value $z$ given class $c$ and there are $r = 1, \ldots, t$ classes. A pixel is then allocated to the class with which it has the highest posterior probability of membership. This empirical approach depends on there being sufficient training data to characterise fully the probability distributions for each class. In practice, it is more common to replace the probability distribution with a probability density function with parameters (mean, variance and covariance between wavebands) estimated from the training data. The model-based approach is described in the next section.

2.2. Model-based approaches

In remote sensing the observation vectors are commonly treated as random variables characterised by a multivariate Gaussian distribution. A multivariate mean is determined for each class and the Euclidean distance metric used in c-means classification is replaced by the Mahalanobis distance, a
standardized statistical distance from which the a posteriori probability of class membership can be obtained using Bayes' Theorem. The maximum likelihood classification is achieved by allocating each pixel to the class with which it has the highest a posteriori probability of membership.

It is useful to view multivariate, and in particular ML, classification as a series of stages from the initial remotely sensed image to the classified image, each of which contains potentially useful spatial information for classification purposes (see Fig. 1). For example, start with the remotely sensed image of $K$ wavebands. The information in the image can be viewed in (spectral) feature space (each waveband providing a dimension or axis) instead of geographical space. In this feature space it is possible to compute Euclidean distances from each pixel to each of the class means (obtained from training data) using simple geometry. This could provide an image of $t$ distances to each class mean. Since the distributions of the training data are likely to form ellipsoids in feature space the Euclidean metric is often replaced with the Mahalanobis distance $M_{ci}$ which takes into account the variance–covariance matrix $V_c$ associated with a given class $c$:

$$M_{ci} = \left( z_k(x_i) - u_{kc} \right)^T V_c^{-1} \left( z_k(x_i) - u_{kc} \right) \qquad (3)$$

where $z_k(x_i)$ is the vector of $\{k = 1, \ldots, K\}$ waveband values at pixel location $x_i$ and $u_{kc}$ is the vector of means in $K$ wavebands for class $c$. In Fig. 1, the variance–covariance matrices per class are represented graphically by probability contours. It is clear that whereas the pixel to be classified is closer to the Vegetation class mean, it is more likely to be classified as Tarmac2 when the variance–covariance matrix is
Fig. 1. Stages in maximum likelihood classification: (1) original imagery, (2) Mahalanobis distances, (3) probabilities derived from probability density function (pdf), (4) a posteriori probabilities derived from Bayes' Theorem and (5) final classified image.
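The stages listed in the caption of Fig. 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ml_classify`, the two class means, the covariance matrices, the equal priors and the single test pixel are all hypothetical values chosen to mimic the Vegetation/Tarmac2 situation described in the text, and a multivariate Gaussian density per class is assumed.

```python
import numpy as np

def ml_classify(pixels, means, covs, priors):
    """Return hard labels (N,) and a posteriori probabilities (N, t).

    pixels: (N, K) waveband values; means: (t, K); covs: (t, K, K); priors: (t,).
    """
    pixels = np.asarray(pixels, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    n, k = pixels.shape
    t = means.shape[0]
    density = np.empty((n, t))
    for c in range(t):
        d = pixels - means[c]
        v_inv = np.linalg.inv(covs[c])
        # Stage 2: Mahalanobis distance (quadratic form) per pixel
        m = np.einsum("ij,jk,ik->i", d, v_inv, d)
        # Stage 3: Gaussian probability density per class
        norm = (2.0 * np.pi) ** (k / 2.0) * np.sqrt(np.linalg.det(covs[c]))
        density[:, c] = np.exp(-0.5 * m) / norm
    # Stage 4: a posteriori probabilities via Bayes' Theorem
    joint = density * np.asarray(priors, dtype=float)
    posterior = joint / joint.sum(axis=1, keepdims=True)
    # Stage 5: allocate each pixel to its most probable class
    return posterior.argmax(axis=1), posterior

# Hypothetical two-class example: class 0 ~ "Vegetation", class 1 ~ "Tarmac2"
means = np.array([[0.0, 0.0], [3.0, 0.0]])
covs = np.array([0.1 * np.eye(2), 4.0 * np.eye(2)])
priors = np.array([0.5, 0.5])
pixels = np.array([[1.4, 0.0]])

# Minimum Euclidean distance to class means, for comparison
nearest_mean = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2).argmin(axis=1)
labels, posterior = ml_classify(pixels, means, covs, priors)
```

With these assumed values the pixel is nearer to the class 0 mean in Euclidean terms but is allocated to class 1 once the variance–covariance matrices are taken into account. Returning `posterior` alongside `labels` keeps the intermediate image of a posteriori probabilities available, which is what a fuzzy classification would report.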
taken into account. This provides a second image of $t$ Mahalanobis distances (Fig. 1). From the Mahalanobis distances it is possible to determine an estimated (Gaussian) probability density function for each class of interest. This can be expressed as in Eq. (4):

$$p(z(x_i) \mid c) = \frac{1}{(2\pi)^{K/2} \, |V_c|^{1/2}} \exp\left( -\tfrac{1}{2} M_{ci} \right) \qquad (4)$$

where $p(z(x_i) \mid c)$ is the probability density for pixel $z(x_i)$ at location $x_i$ as a member of class $c$ (Foody et al., 1992; Thomas et al., 1987). This provides a third image, this time of $t$ probabilities, one for each class.

To use the above probabilities in maximum likelihood classification (and particularly in the presence of non-equal a priori probabilities) these must be converted to a posteriori probabilities using Bayes' Theorem. The a posteriori probability of a pixel $z(x_i)$ belonging to class $c$, $L(c \mid z(x_i))$, may be obtained from Bayes' Theorem as before using Eq. (2) which may now be written more fully to include the a priori probabilities (Eq. (5)):

$$L(c \mid z(x_i)) = \frac{P_c \, p(z(x_i) \mid c)}{\sum_{r=1}^{t} P_r \, p(z(x_i) \mid r)} \qquad (5)$$

where $P_c$ is the a priori probability of membership of class $c$. This creates a fourth image of $t$ a posteriori probabilities (Fig. 1). The maximum likelihood classification is achieved by allocating pixels to their most likely class of membership. This results in a final fifth image of $\{c = 1, \ldots, t\}$ classes.

It is notable that fuzzy approaches to classification are equivalent to omitting the final step, presenting the investigator with the posterior probabilities of membership (or alternatively fuzzy memberships) to each class for each pixel. Fuzzy classification has become increasingly popular in remote sensing in recent years as more remote sensing researchers realise the ubiquity of the solution offered (see, for example, Foody, 2000; Smith et al., 2000). However, the accuracy and utility of fuzzy classification can also be increased using spatial information: geostatistical approaches are not restricted to hard classification.

Each of the five different images described above contains useful information for classification purposes and we shall return to this point later in the discussion of this paper. We now turn our attention to the first of the two approaches for incorporating spatial information into the classification procedure, referred to as texture classification.

3. Texture classification

There are many possible approaches to texture classification (see, for example, Chen et al., 1997a; Haralick et al., 1973). These generally involve the computation of further image layers or 'wavebands' of texture information using an image filter. Thus, for example, the variance within a local moving window could be computed and used as an additional image layer to increase the accuracy of multivariate classification (see Haack and Bechdol, 2000). A range of techniques (e.g., neural networks) can then be used to perform the classification (Chen et al., 1997b; Raghu et al., 1995).

Recently, research has focused on the variogram computed within a local window as a measure of texture. Carr and Miranda (1998) compare this measure with more 'traditional' co-occurrence-based measures. They found that the texture measure that achieved greatest accuracy depended upon the nature of the data and texture. Variogram approaches provide the focus of this section. In general, two approaches for texture classification which utilise the variogram may be distinguished. In the first, the actual values of the sample variogram at discrete lags are used directly. In the second, the variogram is modelled and the model coefficients are used. These two approaches are reviewed briefly in this section.

3.1. The sample variogram in texture classification

The use of actual semivariance estimates in texture classification was made popular in the field of remote sensing by Miranda et al. (1992, 1996, 1998), Miranda and Carr (1994) and Carr (1996). In all of these papers variations on the same algorithm were implemented, referred to generally as the semivariogram textural classifier (STC). The basis of the approach is described below.

A window of fixed size is defined and is used to extract representative $l$ by $m$ areas for each class for use in training the classifier. Miranda and Carr (1994) used a window of 22 × 22 pixels. Within each window a kernel of $r$ by $s$ pixels is defined and is moved over the window allowing semivariances for lags of up to $r - 1$ or $s - 1$ (whichever is the larger) to be estimated per window. Miranda and Carr found that, for their particular study, a kernel of 7 × 7 pixels allowed sufficiently accurate estimation of the semivariance while keeping the risk of the kernel straddling a class boundary small. The mean semivariance (and the standard deviation of semivariance) per lag computed within the 7 × 7 pixel kernel over the entire image provided the information on which class allocation was based. This information was provided as $\max(r - 1, s - 1)$ 'wavebands' (features), one for each lag. To these features, the original image wavebands could be added allowing a combined spectral and textural classification.

Several different methods of supervised classification could be used to allocate pixels on the basis of the texture information provided by the semivariances at different lags. Miranda and Carr (1994) used a parallelepiped classifier because it was simple and computationally efficient. They used the standard deviation to define decision boundaries for each class. Where boundaries overlapped a c-means classifier was induced as described earlier. The approach was extended to maximum likelihood classification in Carr (1996). Also, Miranda and Carr (1994) and Carr (1996) chose to adopt the simple empirical look-up table classifier described earlier in this paper. The approach could readily be extended to the parametric equivalent, which is also described above.

Carr (1996) summarises the earlier work involving the STC and provides Fortran 77 code (two programs, MXTEXT for univariate classification and MXMULT for multivariate classification) to execute several variations of the STC classifier including empirical c-means and ML classification. Lark (1996) applied this approach to aerial photographs, while Chica-Olmo and Abarca-Hernández (2000) provide an example in this issue.

3.2. Variogram model coefficients in texture classification

The second approach to using the variogram in texture classification involves assuming some kind of form or model for the sample variogram and obtaining the coefficients of the model for use as features in classification in place of the semivariances themselves. Ramstein and Ray (1989) were the first to automate such an approach in a remote sensing context, while Herzfeld and Higginson (1996) have automated a similar approach for the classification of elevation on the mid-Atlantic seafloor. The general approach is described briefly below.

Ramstein and Ray (1989) selected a kernel of size 11 × 11 pixels for their particular study and used it to estimate a non-linear parameter $a'$ (equal to approximately one third of the effective range) for every pixel. They suggested that least squares model fitting for every pixel would be possible, although prohibitive in terms of computer time. Instead they proposed an approximation of $a'$ based on the exponential form of model given by Eq. (6):

$$\gamma(h) = c_0 + c_1 \left[ 1 - \exp(-h/a') \right], \qquad \gamma(0) = 0 \qquad (6)$$

where $h$ is the lag distance and $c_0$ is the nugget variance, $c_1$ is the structured variance and $a'$ is the non-linear parameter: the parameters of the exponential model. Given the theoretical value of the covariance $C$ (equal to $c_0 + c_1$) where the variation is second-order stationary (Eq. (7)):

$$C = \lim_{h \to \infty} \gamma(h) = \lim_{h \to \infty} \frac{1}{2} \left[ \underset{x \in V}{\operatorname{mean}} \left( z(x_i + h) - z(x_i) \right)^2 \right] = \underset{x \in V}{\operatorname{mean}} \; z^2(x_i) \qquad (7)$$

From Eqs. (6) and (7) one obtains Eq. (8):

$$a' = -h \Big/ \log \left[ 1 - \frac{\gamma(h)}{\underset{x \in w}{\operatorname{mean}} \; z^2(x_i)} \right] \qquad (8)$$

Ramstein and Ray used the approximation given by Eq. (8) to estimate $a'$ for each pixel in a Landsat Thematic Mapper channel 3 image of the urban area of Strasbourg, France. The resulting univariate feature (one waveband representing $a'$) was then used as a single discriminating variable in a classification procedure. Ramstein and Ray found that the non-linear coefficient provided much better discrimination between land cover classes than other linear coefficients such as $C$. The utility of this approach depends on the appropriateness of the exponential model for all pixel-centred kernel locations. However, Ramstein and Ray found that it produced visually appealing results when used to classify land cover in the urban area of Strasbourg. Ramstein and Ray's research suggests that approaches based solely on semivariance values may miss vital information: approaches based on the range should not be overlooked.

Herzfeld (1993) adopted the modelling approach to texture classification and developed it specifically for seafloor classification. This amounted to defining, in the first instance, several different target seafloor classes such as 'sediment pond', 'inside corner mountains' and 'abyssal hill terrain'. The sample variograms obtained for these training areas representing each class were then examined for differences between classes. The analysis involved both the calculation of anisotropic variograms and residual variograms (from a trend $m(h)$), also referred to as the centred or detrended variogram. A similar kind of analysis was performed by Wallace et al. (2000) to discriminate between vegetation communities in the Mojave desert.

It appeared from Herzfeld (1993) that sufficient differences existed between the variograms of different classes to merit automated classification. Herzfeld and Higginson (1996) automated a classification procedure based on several estimated variogram coefficients which were designed specifically to exploit the differences most relevant to seafloor classification and, in
particular, several coefficients which exploited class-specific periodicity in the variogram. Herzfeld and Higginson combined linear and non-linear coefficients to form a feature vector for each pixel upon which classification was based.

The uses of variogram model coefficients noted above all relate the texture measure empirically to class properties to provide a classification. It is worth noting that such measures have also been related both empirically, and using physically-based scene models, to continuous variables such as tree size and density in forested areas. St-Onge and Cavayas (1995), for example, related forest structural parameters empirically to directional variogram properties. Jupp (1997) describes a physically-based model for variance and variograms as a function of viewing and illumination angles based on geometric optics. Any such mapping from variogram characteristics to continuous variables could be applied to classification. There are two particular points of note in such work: (i) directional (rather than omnidirectional) variograms are typically used and (ii) physically-based models can describe how the variogram changes as the viewing and illumination angles change.

3.3. Problems with variogram-based texture classification

The main problem with using the variogram as a measure of texture is that the homogeneous regions of different texture within the image must be sufficiently large to allow computation of the variogram up to a reasonable number of lags. In many cases, the parcels of interest in the image are too small relative to the spatial resolution of the imagery. Berberoglu et al. (2000) address this issue by computing texture within parcels defined a priori using vector data.

The main problem with variogram model-based approaches to texture classification is that automatic fitting of (non-linear) models to variograms is unreliable. Thus, the choice of variogram model may be inappropriate for certain regions of the image or for certain classes and the coefficients of the model fitted to the local variogram may be misleading.

4. Smoothing the classification

In the preceding section the variogram was used to provide information on texture for use in classification. This is different to the common use of the variogram in geostatistics, that is, as a spatial weighting function in kriging (Matheron, 1965, 1971). In kriging, the objective is to smooth or average local values based on the variogram (or other structure function) to estimate optimally an unknown value. In kriging, the variogram is used to determine optimal weights $\lambda_i$ to apply to $n$ sample data $\{z(x_i), i = 1, 2, 3, \ldots, n\}$ to form a weighted linear combination $\hat{z}(x_0)$ which estimates some unknown value $z(x_0)$, thus:

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i z(x_i) \qquad (9)$$

The advantage and main attraction of kriging is that it estimates optimally by referring to the variogram (model) estimated from the data themselves. It would be attractive also if this advantage could be transferred to multivariate classification. Two sets of authors have used the variogram in the kriging sense as a spatial weighting function in unsupervised, multivariate classification (Bourgaullt et al., 1992; Oliver and Webster, 1989). While supervised classification is the primary goal, these two papers are reviewed briefly before considering the potential problems of applying the approach to supervised classification in remote sensing.

4.1. Oliver and Webster's original idea

Some 10 years ago Oliver and Webster (1989) proposed and demonstrated a geostatistical basis for the spatial weighting of multivariate classification for application in soil survey. Their approach was particularly suited to sample data provided as a regular lattice or complete cover in the form of a raster array. It would seem, therefore, that their approach might also be suitable for classification in remote sensing.

Oliver and Webster's (1989) proposal built on the work of others, notably Webster and Burrough (1972) who first introduced the idea of modifying the dissimilarity matrix by a non-linear function of separating distance in geographical space. However, whereas Webster and Burrough chose non-linear functions (inverse distance square and exponential) arbitrarily, Oliver and Webster's proposal was to use a function obtained from the data themselves. The method is described briefly as follows.

The similarity matrix may be constructed for all pairs of observations $i$ and $j$ (pixels) on which $K$ properties (wavebands) have been measured using a similarity coefficient such as Gower's (1971) coefficient (Eq. (10)):

$$s_{ij} = \frac{\sum_{k=1}^{K} \left( 1 - |z_{ik} - z_{jk}| / r_k \right) w_{ijk}}{\sum_{k=1}^{K} w_{ijk}} \qquad (10)$$

where $s_{ij}$ is a measure of the similarity between pixels $i$ and $j$, $z_{ik}$ is the pixel value at $i$ for property (waveband) $k$, $r_k$ is a waveband-specific constant (in Gower's coefficient, the range of property $k$) and $w_{ijk}$ is a weight. This matrix may then be converted to dissimilarity by (Eq. (11)):

$$d_{ij} = \left[ 2 (1 - s_{ij}) \right]^{1/2} \qquad (11)$$

The objective is to modify this dissimilarity matrix to take account of both the geographical proximity between pixels and the form of spatial variation. This may be achieved by multiplying the dissimilarity by a function of geographical distance (Eq. (12)):

$$d^{*}_{ij} = d_{ij} \, f(x_i - x_j) \qquad (12)$$

If the function were the exponential model (Eq. (6)) then the modification would take the following form (Eq. (13)):

$$d^{*}_{ij} = \frac{c_1}{c_0 + c_1} \, d_{ij} \left[ 1 - \exp(-h_{ij}/a') \right] + \frac{c_0}{c_0 + c_1} \, d_{ij} \qquad (13)$$

The modified dissimilarity matrix may be used in unsupervised classification whether that be based on hierarchical clustering or non-hierarchical dynamic clustering (Oliver and Webster, 1989).

Oliver and Webster applied their method to three small data sets on soil properties to demonstrate its utility as a tool in classification, particularly for soil management purposes. The unsupervised algorithm chosen was a form of non-hierarchical dynamic clustering that operates on orthogonal principal coordinates rather than directly on the original dissimilarity matrix. Because of the need to extract a single spatial weighting function from the multivariate feature space the modelled variogram of the first (or first few) principal component(s) was used in the modification (Eqs. (12) and (13)). This seems to undermine somewhat the unbiasedness desired for such an approach. Further ambiguity was introduced by the suggestion of the authors that the range $a$ of the model could be varied to achieve different amounts of smoothing as desired. For example, a larger range (which yields smoother results) may be more appropriate for management purposes. Again, this falls somewhat short of the objective of an unbiased and even optimal solution as is the case with kriging.

4.2. Modifications to the method

Bourgaullt et al. (1992) proposed to replace the variogram of the first (few) principal component(s) with the multivariate variogram (Eq. (14)) and the multivariate covariogram (or covariance function) (Eq. (15)) defined as follows:

$$2\Gamma(h) = E\left\{ [Z(x) - Z(x + h)] \, M \, [Z(x) - Z(x + h)]^T \right\} \qquad (14)$$

$$K(h) = E\left\{ [Z(x) - m] \, M \, [Z(x + h) - m]^T \right\} \qquad (15)$$

where $Z(x)$ is a row vector of $p$ second-order stationary random functions, $m = E[Z(x)]$, and $M$ is a $p$ by $p$ positive definite symmetric matrix used as metric in the calculation of (dis)similarities.

Bourgaullt et al. also based the dissimilarities matrix on the Mahalanobis distance (Eq. (3)). Unfortunately, despite offering what would appear to be a less ambiguous method for selecting a variogram model, Bourgaullt et al. (1992) introduced a further ambiguity in that the multivariate variogram and covariogram, despite being computed from common data, produced different classification results.

Although not discussed by Bourgaullt et al., the above can be attributed to what might be termed the 'local' and 'global' operation of the two measures. Essentially, modification of the dissimilarity by the (multivariate) variogram does not modify the contribution of points that lie beyond the range of spatial correlation. In the method of Bourgaullt et al., a mean spatially modified dissimilarity is calculated: if all points considered in the calculation of this were further than the range from a candidate point then the weighted dissimilarity would be the same as the unweighted value. If a mean spatially modified similarity measure was used, however, with the spatial modification performed using a multivariate covariogram which tends to zero beyond the range then only points within the range would contribute positively to the estimate of the mean. These measures are non-linearly related, so no linear transformation of modified similarity to modified dissimilarity will produce equivalent results using the two methods in the general case. The former is 'global' in the sense that it includes contributions from all points while the latter is 'local' in that only points within the range contribute. Further, in the 'global' case, the effect of spatial modification will depend on the image extent considered, as it will vary according to the proportion of observations which lie within the range.

4.3. Alternative smoothing approaches

Many alternative approaches to smoothing in supervised classification have been employed. These range from simple low-pass filtering of remotely sensed imagery to more intricate processing (e.g., see the graph-theoretic approach adopted by Barr and Barnsley, 2000). Some of these alternative approaches are discussed in this section.

An important alternative to the spatial weighting of multivariate classification based on the variogram is the Gibbs sampler. Based on Bayes' Theorem, it allows one to incorporate information in neighbouring pixels into the spectral classification procedure. The Gibbs
sampler updates iteratively the predicted class of a pixel (chosen at random) conditional upon the previous values of all other pixels (and particularly neighbouring pixels). The iteration is stopped when the sequence converges to a (sufficiently) stable solution. In some sense then, this solution may be regarded as optimal (Schröder et al., 2000). However, it is only optimal in that the fit achieved using the Gibbs sampler attains maximum pseudo likelihood for the given choice of spatial weighting function and kernel size (Augustin et al., 1996). In most instances, it is necessary to experiment with different choices for the weighting function and kernel size to search for a generally optimal solution.

van der Meer (1996, 1999) used indicator kriging (Goovaerts, 1997) applied to multivariate data to obtain a classification for all pixels in a remotely sensed image. The approach involved defining indicator variables for each feature (waveband) in an image and obtaining variograms for each indicator. These variograms were then used in block indicator kriging to estimate the average value of each indicator for a block or area of pixels centred on the pixel to be classified. This amounts to smoothing of the traditional classifier. However, importantly the spatial information (weighting function) incorporated into the classification via the variogram is derived from the form of spatial variation in the variable itself as is desired. The ambiguity in this approach would appear to come from the initial selection of indicator cut-offs and the size of blocks (i.e., the amount of smoothing). Further, extrapolation at the tails of the distribution can alter the results substantially.

The above indicator approach is similar to an approach known as regionalised classification (Bohling, 1997; Harf and Davis, 1990; Moline and Bahr, 1995). Regionalised classification is, in fact, nothing more than the interpolation to unobserved sites of the inputs to or the outputs from some traditional classifier applied to sparse data. However, papers on regionalised classification have been useful for pointing out that different stages in the classification process can be interpolated (or in the present case smoothed). For example, one could interpolate the feature vectors and then proceed with classification at all sites. Alternatively, one could interpolate the Mahalanobis distances, or the probabilities for use in ML classification. This choice, which is entirely general, represents an important decision for investigators wishing to devise

for differences in class homogeneity. There is much overlap between such an approach and those discussed above.

There are many algorithms available for segmenting remotely sensed images (Haralick and Shapiro, 1985). While a review of this literature is beyond the scope of the present paper, it is worth pointing out that segmentation is fundamentally different to traditional spectral classification because spatial contiguity is an explicit goal of segmentation whereas it is only implicit in classification. There are many approaches to segmentation including those based on edge detection (most often based on some high-pass filter) and those based on region-growing (based on the growth of homogeneous regions conditional upon similarities between the pixel to be merged and previously merged pixels). There exist many examples of segmentation applied to remotely sensed imagery (Janssen and Molenaar, 1995; Khodja and Mengue, 1996; Lemoigne and Tilton, 1995; Lobo et al., 1996; Ryherd and Woodcock, 1996). Segmentation routines involving region-growing algorithms may be based on the similarities between variograms or the coefficients of variogram models (Lloyd and Atkinson, 1998) resulting in segmentation based on variogram texture.

5. Discussion: spatial weighting for remote sensing classification

5.1. Selecting an appropriate space

One of the first decisions facing the analyst wishing to use a spatial weighting in smoothing a classifier for remote sensing is in which space should the spatial (geographical) weighting be determined and applied? The work on regionalised classification of Harf and Davis (1990), Moline and Bahr (1995) and Bohling (1997) (among others) illustrates clearly that there are many possible spaces in which the smoothing can take place. These include (see Fig. 1):

1. multivariate feature space (for example, the actual values in the wavebands),
2. Euclidean distance-to-means in multivariate feature space,
3. Mahalanobis distance-to-means in multivariate fea-
strategies for incorporating spatial information into a
classi®cation. ture space,
4. probabilities (per class),
Following on from the above theme, Palubinskas et
5. a posteriori probabilities (per pixel) and
al. (1995) applied a smoothing algorithm to the fuzzy
(for example, a posteriori Bayesian probabilities) out- 6. the classi®ed image.
puts from the classi®cation of a remotely sensed The spatial weighting function could be estimated in
image. The algorithm was modi®ed locally to account any one of these spaces and used to aid classi®cation.
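As an illustration of smoothing in one of these spaces, the following minimal sketch applies a spatial weighting function to an image of a posteriori probabilities (space 5) and then allocates each pixel to its most probable class. It is not the method of any of the papers cited here. The function names (`exponential_kernel`, `smooth_probabilities`, `classify`), the exponential form of the weights and the parameters `radius` and `range_param` are all assumptions made for the example; in practice the weights would be derived from a variogram model fitted to the data.

```python
import numpy as np


def exponential_kernel(radius, range_param):
    """Build a square kernel of weights exp(-h / range_param).

    Assumption: an exponential correlation model stands in for a
    weighting function derived from a fitted variogram.
    """
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    lag = np.hypot(dy, dx)              # lag distance h, in pixels
    weights = np.exp(-lag / range_param)
    return weights / weights.sum()      # normalise so the weights sum to 1


def smooth_probabilities(prob, kernel):
    """Smooth a (rows, cols, classes) stack of per-class probabilities."""
    radius = kernel.shape[0] // 2
    rows, cols, _ = prob.shape
    # Replicate edge pixels so the output has the same size as the input.
    padded = np.pad(
        prob, ((radius, radius), (radius, radius), (0, 0)), mode="edge"
    )
    smoothed = np.zeros(prob.shape, dtype=float)
    # Weighted sum of shifted copies of the image (a direct convolution).
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            smoothed += kernel[i, j] * padded[i:i + rows, j:j + cols, :]
    return smoothed


def classify(prob):
    """Allocate each pixel to the class with the largest probability."""
    return np.argmax(prob, axis=2)
```

Because the kernel is normalised, the smoothed values at each pixel still sum to one across classes whenever the input probabilities do. The same routine could be applied to images of distances-to-class-means (spaces 2 and 3) by replacing `np.argmax` with `np.argmin`; which space is most appropriate remains the analyst's decision.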
5.2. A geostatistical basis for supervised classification?

For several reasons Oliver and Webster's (1989) approach without modification is unlikely to be of much utility for the classification of remotely sensed imagery. First, in remote sensing unsupervised classification is used less frequently than supervised classification and for the latter there is no dissimilarity matrix. The equivalent of the dissimilarity matrix, the set of distances between the pixel values and the class means expressed in terms of some distance metric in feature space, does not have a spatial dimension (the distances are not expressed as a function of lag). Each class mean is defined over the whole image, not for a single pixel located in geographical space. These distances are, therefore, less readily modified by a function defined in geographical space and the variogram cannot be employed in the same way as for unsupervised classification. The most straightforward approach would be to use the (multivariate) variogram as a smoothing function directly on the images of distances-to-class-means (Fig. 1). However, the simple application of the variogram as a smoothing function (that is, convolution of the images of distances with a kernel-based weighting function) is unlikely to result in optimal use of the spatial correlation in the image. The Gibbs sampler discussed above is attractive in this regard because of the lack of an explicit structure function.

Second, the data sets must be small for the algorithm to be effective (for example, one of the data sets analysed by Oliver and Webster consisted of 6 × 14 cells) and in remote sensing the data sets are typically large (commonly in excess of 1000 × 1000 pixels). Despite the problems discussed above, supervised classification does have the advantage of removing the need to compare distances between all pixels and all other pixels (involving nearly 500,000,000,000 comparisons for an image of 1000 × 1000 pixels, which would be prohibitive in most cases). Therefore, small data sets are not a prerequisite for supervised classification.

Finally, the variogram is defined as a parameter of an RF model which is stationary in the squared differences between pairs of locations separated by a given lag h (referred to as the intrinsic hypothesis). For many remotely sensed images the entire scene is not readily modelled in such a way. In particular, for landscapes affected by human activity (for example, agricultural fields, forest stands, urban areas and so on) the objective of classifying an entire image using a single variogram is ultimately flawed because the stationary RF model is unjustified. Where a stationary variogram model cannot reasonably be considered, but rather the model should be allowed to vary smoothly across the image, the objective should be to adopt (probably non-parametric) approaches which do not depend on a stationary variogram. It should be noted that the Gibbs sampler does not necessarily help in this regard. The Gibbs sampler is a generally applicable fitting procedure most useful for Markov chains. It does not necessarily allow for non-stationarity although it can be used to fit models which do so to some extent (for example, Brunsdon, 2000). In most implementations, the result of the Gibbs sampler will be a maximum pseudo-likelihood fit for a selected smoothing function and kernel size fixed over the entire image. A potential solution is provided by a per-parcel classifier that utilises a priori knowledge in the form of digital vector boundaries to segment the region of interest into distinct parcels prior to classification. Within each parcel the spatial weighting could be applied independently. The per-parcel classifier would also naturally limit the number of data to a tolerable value.

6. Conclusions

Most techniques that use the variogram to classify remotely sensed images have their shortcomings. For example, texture classifiers based on the variogram work only where the regions of each class in the image are sufficiently large, homogeneous and texturally different between classes. Even then there is no guarantee that the extra data on texture will yield useful information (above that provided by the original imagery). Approaches that use the variogram for smoothing in classification may be divided into two groups: those based on simple filtering of the image (at any stage in the classification process) and the method of Oliver and Webster (1989) in which the variogram is used to modify the dissimilarities in unsupervised classification. The former approaches do not deliver the optimality associated with geostatistical techniques such as kriging. The latter approach holds much promise, but problems will need to be overcome if it is to be applied in remote sensing.

Acknowledgements

This paper was written while PMA was on leave at the School of Mathematics, University of Wales, Cardiff. The authors thank Professor Giles Foody for useful information relating to this paper and are grateful to the referees for their comments.
References

Augustin, N.H., Mugglestone, M.A., Buckland, S.T., 1996. An autologistic model for the spatial distribution of wildlife. Applied Ecology 33 (2), 339–347.
Barr, S., Barnsley, M., 2000. Reducing structural clutter in land cover classification of very high spatial resolution remotely sensed images for urban land use mapping. Computers & Geosciences 26 (4), 433–449.
Berberoglu, S., Lloyd, C.D., Atkinson, P.M., Curran, P.J., 2000. The integration of spectral and textural information using neural networks for land cover mapping in the Mediterranean. Computers & Geosciences 26 (4), 385–396.
Bohling, G.C., 1997. GSLIB-style programs for discriminant analysis and regionalized classification. Computers & Geosciences 23 (7), 725–761.
Bourgault, G., Marcotte, D., Legendre, P., 1992. The multivariate (co) variogram as a spatial weighting function in classification methods. Mathematical Geology 24 (5), 463–478.
Brunsdon, C., 2000. A Bayesian approach to schools catchment-based modelling. Geographical and Environmental Modelling (in press).
Carr, J.R., 1996. Spectral and textural classification of single and multiple band digital images. Computers & Geosciences 22 (8), 849–865.
Carr, J.R., Miranda, F.P., 1998. The semivariogram in comparison to the co-occurrence matrix for classification of image texture. IEEE Transactions on Geoscience and Remote Sensing 36 (6), 1945–1952.
Chen, Y.Q., Nixon, M.S., Thomas, D.W., 1997a. On texture classification. International Journal of Systems Science 28 (7), 669–682.
Chen, K.S., Yen, S.K., Tsay, D.W., 1997b. Neural classification of SPOT imagery through integration of intensity and fractal information. International Journal of Remote Sensing 18 (4), 763–783.
Chica-Olmo, M., Abarca-Hernández, F., 2000. Computing geostatistical image texture for remotely sensed data classification. Computers & Geosciences 26 (4), 373–383.
Curran, P.J., 1988. The semi-variogram in remote sensing: an introduction. Remote Sensing of Environment 24 (3), 493–507.
Dungan, J., 1998. Spatial prediction of vegetation quantities using ground and image data. International Journal of Remote Sensing 19 (2), 267–285.
Foody, G.M., 2000. Estimation of sub-pixel land cover composition in the presence of untrained classes. Computers & Geosciences 26 (4), 469–478.
Foody, G.M., Campbell, N.A., Trodd, N.M., Wood, T.F., 1992. Derivation and applications of probabilistic measures of class membership from the maximum likelihood classification. Photogrammetric Engineering and Remote Sensing 58 (9), 1335–1341.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, Oxford.
Gower, J.C., 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857–871.
Haack, B., Bechdol, M., 2000. Integrating multisensor data and RADAR texture measures for land cover mapping. Computers & Geosciences 26 (4), 411–421.
Haralick, R.M., Shanmugam, K., Dinstein, I., 1973. Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics 3, 610–621.
Haralick, R.M., Shapiro, L.G., 1985. Image segmentation techniques. Computer Vision, Graphics and Image Processing 29 (1), 100–132.
Harf, J., Davis, J.C., 1990. Regionalization in geology by multivariate classification. Mathematical Geology 22 (5), 573–588.
Herzfeld, U.C., 1993. A method for seafloor classification using directional variograms, demonstrated for data from the western flank of the Mid-Atlantic Ridge. Mathematical Geology 25 (7), 901–924.
Herzfeld, U.C., Higginson, C.A., 1996. Automated geostatistical seafloor classification-principles, parameters, feature vectors, and discrimination criteria. Computers & Geosciences 22 (1), 35–52.
Janssen, L.L.F., Molenaar, M., 1995. Terrain objects, their dynamics and their monitoring by the integration of GIS and remote sensing. IEEE Transactions on Geoscience and Remote Sensing 33 (3), 749–758.
Jupp, D.L.B., 1997. Modelling directional variance and variograms using geometric optics. Journal of Remote Sensing (China) 1, 94–101.
Khodja, A., Mengue, A., 1996. Improvement in the thematic contribution of an XS spot image using spatial processes and a segmentation method: application to the region of Lagdo (Northern Cameroon). International Journal of Remote Sensing 17 (5), 879–886.
Lark, R.M., 1996. Geostatistical description of texture on an aerial photograph for discriminating classes of land-cover. International Journal of Remote Sensing 17 (11), 2115–2133.
Lemoigne, J., Tilton, J.C., 1995. Refining image segmentation by integration of edge and region data. IEEE Transactions on Geoscience and Remote Sensing 33 (3), 605–615.
Lobo, A., Chic, O., Casterad, A., 1996. Classification of Mediterranean crops with multisensor data: per-pixel versus per-object statistics and image segmentation. International Journal of Remote Sensing 17 (12), 2385–2400.
Lloyd, C.D., Atkinson, P.M., 1998. The effect of scale-related issues on the geostatistical analysis of Ordnance Survey (R) digital elevation data at the national scale. In: Gomez-Hernandez, J., Soares, A., Froidevaux, R. (Eds.), GeoENV II: Geostatistics for Environmental Applications. Kluwer, Dordrecht.
Matheron, G., 1965. Les Variables Regionalisees et Leur Estimation. Masson, Paris.
Matheron, G., 1971. The Theory of Regionalized Variables and its Applications. Centre de Morphologie Mathématique de Fontainebleau. Ecole des Mines de Paris, Fascicule 5, Fontainebleau.
Miranda, F.P., Carr, J.R., 1994. Application of the semivariogram textural classifier (STC) for vegetation discrimination using SIR-B data of the Guiana Shield, Northwestern Brazil. Remote Sensing Reviews 10, 155–168.
Miranda, F.P., Fonseca, L.E.N., Carr, J.R., Taranik, J.V., 1996. Analysis of JERS-1 (FUYO-1) SAR data for vegetation discrimination in Northwestern Brazil using the semivariogram textural classifier (STC). International Journal of Remote Sensing 17 (17), 3523–3529.
Miranda, F.P., Fonseca, L.E.N., Carr, J.R., 1998. Semivariogram textural classification of JERS-1 (Fuyo-1) SAR data obtained over a flooded area of the Amazonian rainforest. International Journal of Remote Sensing 19 (3), 549–556.
Miranda, F.P., Macdonald, J.A., Carr, J.R., 1992. Application of the semivariogram textural classifier (STC) for vegetation discrimination using SIR-B data of Borneo. International Journal of Remote Sensing 13 (12), 2349–2354.
Moline, G.R., Bahr, J.M., 1995. Estimating spatial distributions of heterogenous subsurface characteristics by regionalized classification of electrofacies. Mathematical Geology 27 (1), 3–22.
Oliver, M.A., Webster, R., 1989. A geostatistical basis for spatial weighting in multivariate classification. Mathematical Geology 21 (1), 15–35.
Palubinskas, G., Lucas, R.M., Foody, G.M., Curran, P.J., 1995. An evaluation of fuzzy and texture-based approaches for mapping regenerating tropical forest classes from Landsat-TM data. International Journal of Remote Sensing 16 (4), 747–759.
Raghu, P.P., Poongodi, R., Yegnanarayana, B., 1995. A combined neural-network approach for texture classification. Neural Networks 8 (6), 975–987.
Ramstein, G., Ray, M., 1989. Analysis of the structure of radiometric remotely-sensed images. International Journal of Remote Sensing 10 (6), 1049–1073.
Ryherd, S., Woodcock, C., 1996. Combining spectral and textural data in the segmentation of remotely sensed images. Photogrammetric Engineering and Remote Sensing 62 (2), 181–194.
St-Onge, B.A., Cavayas, F., 1995. Estimating forest stand structure from high-resolution imagery using the directional variogram. International Journal of Remote Sensing 16 (11), 1999–2021.
Schröder, M., Walessa, M., Rehrauer, H., Seidel, K., Datcu, M., 2000. Gibbs random field models: a toolbox for information extraction. Computers & Geosciences 26 (4), 423–432.
Smith, G.R., Woodward, J.C., Heywood, D.I., Gibbard, P.L., 2000. Interpreting pleistocene glacial features from SPOT HRV data using fuzzy techniques. Computers & Geosciences.
Thomas, I.L., Benning, V.M., Ching, N.P., 1987. Classification of Remotely Sensed Images. Adam Hilger, Bristol.
van der Meer, F., 1996. Classification of remotely-sensed imagery using an indicator kriging approach-application to the problem of calcite-dolomite mineral mapping. International Journal of Remote Sensing 17 (6), 1233–1249.
van der Meer, F., 1999. Geostatistical approaches for image classification and assessment of uncertainty in geologic processing. In: Atkinson, P.M., Tate, N.J. (Eds.), Advances in Remote Sensing and GIS Analysis. Wiley, Chichester, pp. 147–166.
Wallace, C.S.A., Watts, J.M., Yool, S.R., 2000. Characterizing the spatial structure of vegetation communities in the Mojave desert using geostatistical techniques. Computers & Geosciences 26 (4), 397–410.
Webster, R., Burrough, P.A., 1972. Computer-based soil mapping of small areas from sample data. II. Classification smoothing. Journal of Soil Science 23, 222–234.
Woodcock, C.E., Strahler, A.H., Jupp, D.L.B., 1998a. The use of variograms in remote sensing I: Scene models and simulated images. Remote Sensing of Environment 25 (3), 323–348.
Woodcock, C.E., Strahler, A.H., Jupp, D.L.B., 1998b. The use of variograms in remote sensing II: real digital images. Remote Sensing of Environment 25 (3), 349–379.
