www.elsevier.com/locate/rse
Received 24 January 2005; received in revised form 23 August 2005; accepted 29 August 2005
Abstract
The number of training samples per class (n) required for accurate Maximum Likelihood (ML) classification is known to be affected by the
number of bands (p) in the input image. However, the general rule that n should be 10p to 30p is often enforced universally in
remote sensing without questioning its relevance to the complexity of the specific discrimination problem. Furthermore, identifying this many
training samples is often problematic when many classes and/or many bands are used. It is important, then, to test how this generally accepted rule
matches common remote sensing discrimination problems because it could be unnecessarily restrictive for many applications. This study was
primarily conducted in order to test whether the general rule defining the relationship between n and p was well-suited for ML classification of a
relatively simple remote sensing-based discrimination problem. To summarise the mean response of n-to-p for our study site, a Monte Carlo
procedure was used to randomly stack various numbers of bands into thousands of separate image combinations that were then classified using an
ML algorithm. The bands were randomly selected from a 119-band Enhanced Thematic Mapper-plus (ETM+) dataset comprised of 17 images
acquired during the 2001-2002 southern hemisphere summer agricultural growing season over an irrigation area in south-eastern Australia.
Results showed that the number of training samples needed for accurate ML classification was much lower than the current widely accepted rule.
Due to the asymptotic nature of the relationship, we found that 95% of the accuracy attained using n = 30p samples could be achieved by using
approximately 2p to 4p samples, or about 1/7th the currently recommended value of n. Our findings show that the number of training samples needed
for a simple discrimination problem is much less than that defined by the general rule and therefore the rule should not be universally enforced; the
number of training samples needed should also be determined by considering the complexity of the discrimination problem.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Crop classification; Dimensionality; Training sample; Time-series; Multi-temporal; Maximum likelihood
1. Introduction
The 'curse of dimensionality' is the tendency for model accuracy to initially increase as the number of variables (e.g., bands, p) used increases, but then reach a limit beyond which accuracy decreases: the point where the model is overfit (Hand, 1981; Hughes, 1968; Pal & Mather, 2003). This phenomenon is called 'peaking' in the pattern recognition literature (Jain & Waller, 1978) and in the remote sensing literature has been referred to as the Hughes phenomenon (Hughes, 1968).
Fig. 1. Location of the CIA in New South Wales, Australia. The overlapping rectangles represent two ETM+ scenes (P92/R84 and P93/R84), where P is Path and R is Row of the Landsat World Reference System-2 (WRS-2). The dashed lines through the study site represent the Hyperion swath.
Table 1
The ETM+ dataset

Date            DS1O   Image number   Channel numbers
08 Oct. 2001    007    1              1-7
17 Oct. 2001    016    2              8-14
02 Nov. 2001    032    3              15-21
09 Nov. 2001    039    4              22-28
25 Nov. 2001    055    5              29-35
04 Dec. 2001    064    6              36-42
05 Jan. 2002    096    7              43-49
12 Jan. 2002    103    8              50-56
13 Feb. 2002    135    9              57-63
22 Feb. 2002    144    10             64-70
10 Mar. 2002    160    11             71-77
17 Mar. 2002    167    12             78-84
02 Apr. 2002    183    13             85-91
11 Apr. 2002    192    14             92-98
18 Apr. 2002    199    15             99-105
27 Apr. 2002    208    16             106-112
04 May 2002     215    17             113-119

The southern hemisphere summer growing season at the CIA lasts from around October to May. Days since 1 October (DS1O) represent the number of days since the nominal start of the summer growing season.
The Bhattacharyya distance (B) and the Jeffries-Matusita (JM) distance between classes i and j were calculated as

$$B = \frac{1}{8}\left(m_i - m_j\right)^{T}\left(\frac{\Sigma_i + \Sigma_j}{2}\right)^{-1}\left(m_i - m_j\right) + \frac{1}{2}\ln\left(\frac{\left|\left(\Sigma_i + \Sigma_j\right)/2\right|}{\sqrt{\left|\Sigma_i\right|\left|\Sigma_j\right|}}\right) \tag{2}$$

$$JM = 2\left(1 - e^{-B}\right) \tag{3}$$

where $m_i$ and $m_j$ are the class mean vectors and $\Sigma_i$ and $\Sigma_j$ are the class covariance matrices.
The JM distance was calculated for the six possible
combinations of the four classes. These were calculated using
the entire validation dataset (described in Section 2.2, above)
for each class in order to summarise the entire class response.
This was performed for all 17 dates in turn.
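The JM calculation is straightforward to reproduce. The following is a minimal Python/NumPy sketch, assuming each class is supplied as an array of validation-pixel reflectances for the bands of a single date; the array names are illustrative only, not the authors' code.

```python
import numpy as np

def jeffries_matusita(x_i, x_j):
    """Jeffries-Matusita distance between two classes.

    x_i, x_j : arrays of shape (n_samples, n_bands) holding the
    reflectances of all validation pixels in each class.
    """
    m_i, m_j = x_i.mean(axis=0), x_j.mean(axis=0)
    c_i = np.cov(x_i, rowvar=False)
    c_j = np.cov(x_j, rowvar=False)
    c_mean = (c_i + c_j) / 2.0

    diff = m_i - m_j
    # Bhattacharyya distance (Eq. 2)
    term1 = 0.125 * diff @ np.linalg.inv(c_mean) @ diff
    term2 = 0.5 * np.log(np.linalg.det(c_mean) /
                         np.sqrt(np.linalg.det(c_i) * np.linalg.det(c_j)))
    b = term1 + term2
    # JM distance (Eq. 3): 0 = identical distributions, 2 = fully separable
    return 2.0 * (1.0 - np.exp(-b))

# e.g., one value per date for each of the six class pairs:
# jm = jeffries_matusita(rice_pixels, maize_pixels)
```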
2.5. Determining the relationship between n and p
Three methods were used to summarise the relationship
between n and p. Each of these measured how well multiple
training data pdfs matched associated validation data pdfs.
These methods included: (1) assessing the mean and standard deviation (SD) of the ratio of the training to validation reflectances; (2) calculating the probability (P) that the validation data would be assigned as a member of the correct class based on the relevant training data; and (3) determining the Kappa classification accuracy (K) for each crop (the training data were used to classify the image and the validation data were used to assess accuracy). The K method was calculated in two ways: (i) the response of K to varying p (K_p), for set values of n; or (ii) the response of K to varying n (K_n), for set values of p. A Monte Carlo procedure was used to randomly select and 'stack' bands from the available ETM+ dataset; the training and validation pixels of these random band combinations formed the basis of these three comparisons. The general Monte Carlo procedure will be described first, followed by the three summaries of the n-to-p relationship.
2.5.1. Monte Carlo procedure
A Monte Carlo procedure involves two aspects: (1)
randomisation; and (2) integration or averaging. A Monte
Carlo procedure was used to summarise the general response
between n and p as described above. The Monte Carlo
procedure consisted of 8800 iterations in total (to clarify, here 'iteration' simply refers to the repetition of a process, not to providing a closer approximation of a solution to an equation). Within an iteration, a predetermined number of bands was randomly combined into a single image stack (see Fig. 2).
[Fig. 2 flow chart: the recoverable steps include stacking randomly selected bands, intersecting the training and validation points with the stacked image, and, after 100 iterations, calculating the mean and SD of reflectance, probability, and accuracy, as well as the training-to-validation ratios for reflectance.]

Fig. 2. An example of the Monte Carlo procedure is shown for a unique n-p combination. In this example, a 5-band image stack (made up of bands 11, 33, 40, 61, and 107) is classified into rice (R), maize (M), sorghum (Sr) and soybeans (Sy). Note, cells in steps 2, 3, and 5 denote fields, not individual pixels.
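The workflow in the figure is easy to emulate. The sketch below shows one possible form of a single Monte Carlo iteration for a given n-p combination in Python/NumPy, using a standard Gaussian maximum-likelihood discriminant with equal priors; full_train, valid_pixels and valid_labels are illustrative names for per-class pixel arrays drawn from the 119-band dataset, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ml_classify(train_by_class, pixels):
    """Gaussian maximum-likelihood classification with equal priors.

    train_by_class : dict of class name -> (n, p) training reflectance array
    pixels         : (m, p) array of pixels to be labelled
    """
    names, scores = list(train_by_class), []
    for name in names:
        x = train_by_class[name]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)          # becomes singular when n <= p
        diff = pixels - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * (logdet + maha))  # log-likelihood up to a constant
    return np.array(names)[np.argmax(scores, axis=0)]

# One iteration for a given (n, p) combination, e.g. n = 20 samples, p = 5 bands.
n, p = 20, 5
bands = rng.choice(119, size=p, replace=False)          # random band stack
train = {}
for c in ('rice', 'maize', 'sorghum', 'soybeans'):
    rows = rng.choice(len(full_train[c]), size=n, replace=False)
    train[c] = full_train[c][rows][:, bands]             # n training pixels, p bands
predicted = ml_classify(train, valid_pixels[:, bands])   # classify validation pixels
accuracy = np.mean(predicted == valid_labels)            # or a per-class Kappa
```

Repeating such a draw 100 times for each n-p combination, then averaging the resulting reflectance ratios, probabilities, and accuracies, reproduces the per-combination summaries described above.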
[Fig. 3: JM distance (y-axis, 0 to 2) for the six class pairs Maize/Sorghum, Maize/Soybeans, Sorghum/Soybeans, Rice/Maize, Rice/Sorghum, and Rice/Soybeans, plotted against acquisition date (Oct. to May; 0 to 225 days since 1 October).]
Fig. 4. Ratio of training reflectance means to validation reflectance means (solid line) and training reflectance standard deviations to validation reflectance standard deviations (dashed line) for rice (a), maize (b), sorghum (c), and soybeans (d) are shown for the initial 6000 Monte Carlo iterations. Note, ratios for n = 20 are offset by +0.20, ratios for n = 30 are offset by +0.40, and ratios for n = 40 are offset by +0.60 (i.e., if offsets were not applied, the lines would overlay near 1.0 on the y-axis).
Fig. 5. P of classifying the entire class mean vectors based on the training data for rice (a), maize (b), sorghum (c), and soybeans (d) is shown for the initial 6000 Monte Carlo iterations. Note, the curves representing P for the various values of n overlap prior to the relationship breaking down.
relationships do not break down, then classifiers that use first-order statistics (e.g., minimum distance, or spectral angle
mapper) would not suffer from the peaking phenomenon and
would tend to cumulatively increase in classification accuracy
with the addition of more bands.
3.2.2. Probability (P) assessment
The results of the P analysis are shown in Fig. 5. The point
where P decreased exponentially defined where the training
data pdf no longer represented the rest of the relevant class.
This happened in every case just prior to p = n (see Fig. 5a-d).
This was where the classification was overfit and after which poor accuracies would be expected. In this case, P was largely a function of the Mahalanobis distance, which has an inverse relationship to P; the Mahalanobis distance continues to increase
as p increases (for a constant n). The critical breakpoints for
each of the curves in Fig. 5 were associated with an opposing
increase in the Mahalanobis distance, which in turn caused
P → 0 (or approximately <10^-300 for double-precision floating-point values; see Fig. 5).
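This collapse is easy to reproduce outside the study data. The sketch below uses simulated p-dimensional Gaussian samples (an illustrative assumption, not the CIA imagery) to show how, for a fixed n, the Mahalanobis distance based on the estimated training covariance grows with p, and the corresponding multivariate-normal density evaluated at the true class mean underflows towards zero as p approaches n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # fixed number of training samples per class

for p in (5, 10, 15, 18, 19):            # increasing number of bands
    train = rng.normal(size=(n, p))      # simulated class samples (standard normal)
    class_mean = np.zeros(p)             # true mean of the simulated class

    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)    # nearly singular as p approaches n
    diff = class_mean - mean
    maha2 = diff @ np.linalg.inv(cov) @ diff
    _, logdet = np.linalg.slogdet(cov)
    # log multivariate-normal density of the class mean under the training estimate
    log_p = -0.5 * (p * np.log(2 * np.pi) + logdet + maha2)
    print(f"p={p:2d}  Mahalanobis^2={maha2:10.1f}  P~{np.exp(log_p):.3g}")
```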
The P analysis provided direct evidence that the classification model became overfit, and from this we were able to determine approximately where this occurred (just before p = n, near the point where the covariance matrix becomes singular). To represent this relationship in a way more meaningful to users, two classification accuracy assessments were run: (1) analysing the response of K_p (varying p, given certain values of n); and (2) analysing the response of K_n (varying n, given certain values of p).
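The per-crop Kappa values used in both assessments can be derived from a confusion matrix in the usual way; the exact per-crop form used by the authors is not given here, so the conditional Kappa below is only one plausible reading. A minimal sketch, assuming a square confusion matrix with reference classes on the rows and mapped classes on the columns:

```python
import numpy as np

def overall_kappa(conf):
    """Overall Kappa from a confusion matrix (rows = reference, cols = mapped)."""
    total = conf.sum()
    p_observed = np.trace(conf) / total
    p_chance = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)

def conditional_kappa(conf, i):
    """Per-class (conditional) Kappa for class i, e.g. one of the four crops."""
    total = conf.sum()
    row, col = conf.sum(axis=1)[i], conf.sum(axis=0)[i]
    return (total * conf[i, i] - row * col) / (total * row - row * col)
```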
Fig. 6. Per-field K_p for ML classifications are shown for n = 10, 20, 30, and 40 for rice (a), maize (b), sorghum (c), and soybeans (d). The points p_opt (empty circle), p_90% (filled circle) and p_75% (empty square) are shown on each curve. See the text for detailed definitions of these three terms.
Table 2
Critical break points defined from the accuracy statistics for the 4 crop types and training sample curves shown in Fig. 6, including p_opt, p_90% and p_75%; see text for detailed definitions

Class      n = 10                 n = 20                 n = 30                 n = 40
           p_opt  p_90%  p_75%    p_opt  p_90%  p_75%    p_opt  p_90%  p_75%    p_opt  p_90%  p_75%
Rice       5      7      8        10     16     18       20     26     28       25     36     38
Maize      5      7      8        10     16     18       15     26     27       25     35     38
Sorghum    6      7      8        10     15     17       15     25     27       20     35     37
Soybeans   5      7      8        10     16     18       15     26     28       25     36     38

Because of the rule defined for non-rapid areas of change at the end of Section 2.5.1, the p_opt statistic for n = 20, n = 30, and n = 40 is sampled to the nearest 5th increment. The p_75% and p_90% values in each case had 'single-channel' precision since the gap between the last two 5-channel iterations was always backfilled (again, see the end of Section 2.5.1 for full details).
Fig. 7. Per-field K_n for ML classifications are shown for p = 5, 10, 15, and 20 for rice (a), maize (b), sorghum (c), and soybeans (d). The seven symbols on each curve represent, in order, n = 1p, n = 2p, n = 3p, n = 4p, n = 5p, n = 10p, and n = 30p.
Table 3
Summary of the relationship between the accuracy of various n for set values of p

Class      p = 5                  p = 10                 p = 15                 p = 20
           n_90%  n_95%  n_99%    n_90%  n_95%  n_99%    n_90%  n_95%  n_99%    n_90%  n_95%  n_99%
Rice       3p     4p     30p      <2p    3p     10p      <2p    2p     5p       <2p    2p     4p
Maize      4p     5p     30p      3p     4p     10p      <2p    3p     10p      <2p    2p     5p
Sorghum    2p     3p     4p       <2p    2p     3p       <2p    2p     3p       <2p    2p     3p
Soybeans   3p     4p     10p      <2p    3p     10p      <2p    2p     5p       <2p    2p     5p

The values of n reported in the table relate to the accuracy determined using n = 30p; see the text for full details. Numbers of n are represented relative to p in the body of the table.
As p increased from 5 to 20, the ratio of n-to-p needed to attain the same relative proportion of accuracy decreased (Table 3). Also, the three different metrics (i.e., n_90%, n_95%, or n_99%) resulted in different ranges of recommended numbers of training samples: n_99% ranged from 4p to 30p, n_95% from 2p to 4p, and n_90% from 2p to 3p (Table 3). This meant that developing a single definition of the n-to-p relationship was difficult because it varied with both the value considered to represent satisfactorily high accuracy and the number of bands used. For example, the current rule of n needing to be 10p to 30p was about right if n_99% was considered for p = 5 or p = 10. However, if either n_90% or n_95% was considered to represent satisfactorily high accuracy for the same p, then a smaller n would be required (i.e., n = 2p to 4p). Likewise, if p = 15 to 20, then for n_90%, n_95%, or n_99%, the n required would also be far less than 10p to 30p (i.e., n = 2p to 5p).
The 'optimum' point along each curve was defined here as the point of sufficiently high accuracy after which little gain in accuracy was attained per extra training sample added. We selected n_95% as it was positioned where the curves were: (1) not varying dramatically, and (2) not yet showing large diminishing returns in accuracy (Fig. 7 and Table 3). Based on the n_95% metric, the ideal number of samples ranged from about 2p to 4p. This showed that 95% of the accuracy was retained at our study site by using about 1/7th of the number of training samples recommended by the previously accepted rule (based on n = 30p). There was no metric for which n = 10p to 30p was consistently needed across the range of p values tested.
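For example, at p = 15 bands the conventional n = 30p rule implies 450 training samples per class, whereas the n_95% range of roughly 2p to 4p corresponds to only 30 to 60 samples per class, at most around 1/7th of the recommended number.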
3.2.5. Similarity analysis instead of significance testing
Performing a statistically rigorous analysis of the main results of the two classification analyses above (K_p and K_n) could greatly strengthen the interpretation of the results. For example, in the K_p analysis, it would be interesting to know how similar p_opt was to p_75%. This would show the impact on classification accuracy of using more bands than optimal (for a given n). Likewise, for the K_n analysis, as we previously recommended using n_95% to define how many training samples are required, the most interesting comparison would be whether this 95% accuracy level (n_95%) was very similar (statistically) to the accuracy defined by the recommended rule (n = 30p). This would show the impact of using fewer training samples than recommended for a given p. If they were very similar, it would strengthen the argument that, for our dataset, attaining n = 30p samples was not necessary.
The similarity analysis was tested on four null hypotheses; the first two gauged the impact of using more bands than was optimal given a set value of n, while the remaining two assessed the impact of using fewer training samples than the recommended n = 30p. Specifically, they are:

(1) H0: p_opt ~ p_90%, or how similar each of the 100 accuracies from the n-to-p combination representing p_90% was to the mean (minus 1, 2, or 3 SDs) of the 100 accuracies from the n-to-p combination representing p_opt;

(2) H0: p_opt ~ p_75%, or how similar each of the 100 accuracies from the n-to-p combination representing p_75% was to the mean (minus 1, 2, or 3 SDs) of the 100 accuracies from the n-to-p combination representing p_opt;

(3) H0: n_30p ~ n_95%, or how similar each of the 100 accuracies from the n-to-p combination representing n_95% was to the mean (minus 1, 2, or 3 SDs) of the 100 accuracies when n = 30p; and

(4) H0: n_30p ~ n_90%, or how similar each of the 100 accuracies from the n-to-p combination representing n_90% was to the mean (minus 1, 2, or 3 SDs) of the 100 accuracies when n = 30p.

The similarity was summarised as the probability that a given condition would arise if H0 were true, averaged over the 100 Monte Carlo iterations:

$$S = \frac{1}{100}\sum_{i=1}^{100}\mathrm{prob}\left(\mathrm{cond}_x \mid H_0\right)_i \tag{7a, 7b}$$

$$\mathrm{cond}_x:\quad\begin{cases} x = 0: & \text{expected} = \mu\!\left(K_{\mathrm{ref}}\right)\\ x = 1: & \text{expected} = \mu\!\left(K_{\mathrm{ref}}\right) - 1\,\mathrm{SD}\!\left(K_{\mathrm{ref}}\right)\\ x = 2: & \text{expected} = \mu\!\left(K_{\mathrm{ref}}\right) - 2\,\mathrm{SD}\!\left(K_{\mathrm{ref}}\right)\\ x = 3: & \text{expected} = \mu\!\left(K_{\mathrm{ref}}\right) - 3\,\mathrm{SD}\!\left(K_{\mathrm{ref}}\right) \end{cases} \tag{7c}$$

where K_ref denotes the 100 Kappa accuracies of the reference n-to-p combination in each hypothesis.
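The similarity probabilities can be approximated empirically from the Monte Carlo output. The sketch below is one possible reading of Eq. (7), assuming k_ref and k_test are arrays of the 100 Kappa accuracies from the reference and candidate n-p combinations; the names are illustrative, not from the paper.

```python
import numpy as np

def similarity(k_test, k_ref, x):
    """Fraction of candidate accuracies meeting the condition
    'expected >= mean(K_ref) - x * SD(K_ref)' for x = 0, 1, 2, or 3."""
    threshold = k_ref.mean() - x * k_ref.std(ddof=1)
    return float(np.mean(k_test >= threshold))

# e.g., H0: n_30p ~ n_95%  ->  compare k_test (n = n_95%) against k_ref (n = 30p)
# probabilities = [similarity(k_test, k_ref, x) for x in (0, 1, 2, 3)]
```

Under this reading, higher values of the resulting probability indicate greater similarity between the two combinations, mirroring the behaviour shown in Fig. 8.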
Since we showed in Section 3.2.4 that the number of training samples needed to attain 95% of the accuracy was approximately 15% of that needed when using n = 30p, the third null hypothesis analysis was particularly useful as it helped define whether acquiring the remaining 85% of the training samples made a difference to the results. The results of these four analyses are shown in Fig. 8; the responses for the four classes (i.e., rice, maize, sorghum and soybeans) were averaged to achieve a single response for each H0 tested.
Fig. 8 reveals an indirect relationship between similarity and
both the size of n (Fig. 8a and b) and the size of p (Fig. 8c and
d). This was expected as the mean accuracy increased and the
Fig. 8. Probabilities that cond_x would arise given that H0 is true are shown for H0: p_opt ~ p_90% (a), H0: p_opt ~ p_75% (b), H0: n_30p ~ n_95% (c), and H0: n_30p ~ n_90% (d). Higher probabilities are indicative of higher similarity between the reference and candidate accuracy distributions.