Beruflich Dokumente
Kultur Dokumente
2004
doi:10.1016/j.icesjms.2003.11.002
Harbitz, A., and Pennington, M. 2004. Comparison of shortest sailing distance through
random and regular sampling points. e ICES Journal of Marine Science, 61: 140e147.
The shortest sailing distance through n sampling points is calculated for simple theoretical
sampling domains (square and circle) as well as for a rather irregular and concavely shaped
real sampling domain in the Barents Sea. The sampling sites are either located at the nodes
of a square grid (regular sampling) or they are randomly distributed. For n less than ten, the
exact shortest sailing distance is derived. For larger n, a traveling salesman algorithm
(simulated annealing) was applied, and its bias (distance from true minimum) was estimated
based on a case where the true minimum distance was known. In general, the average
minimum sailing distance based on random sampling was considerably shorter than for
regular sampling, and the difference increased with sample size until an asymptotic value
was reached at about n ¼ 60 for a square domain. For the sampling domain in the Barents
Sea used for shrimp (Pandalus borealis) abundance surveys (n ¼ 118 stations), the cruise-
track lengths based on random sampling were approximately normally distributed. The
mean sailing distance was 18% shorter than the cruise track for regular sampling and the
standard deviation equalled 2.6%.
Ó 2003 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.
Keywords: abundance surveys, random and regular sampling, shrimp, simulated annealing,
traveling salesman.
Received 20 February 2003; accepted 3 November 2003.
A. Harbitz: Institute of Marine Research, PO Box 6404, N-9294 Tromsø, Norway.
M. Pennington: Institute of Marine Research, PO Box 1870, Nordnes, N-5817 Bergen,
Norway; tel: +47 55236309; fax: +47 55238531; e-mail: michael.pennington@imr.no.
Correspondence to A. Harbitz; tel: +47 77609731; fax: +47 77609701; e-mail:
alf.harbitz@imr.no
1054-3139/$30 Ó 2003 International Council for the Exploration of the Sea. Published by Elsevier Ltd. All rights reserved.
Comparison of shortest sailing 141
In this paper, a TSP algorithm based on simulated The regular square grid can also be applied to a stratified
annealing (Kirkpatrick et al., 1983) is used to find the sampling design, e.g., by choosing different orientations
shortest cruise track. The algorithm is taken from Press et al. and different closest neighbor distances in each stratum.
(1994) and translated to Matlab code that is freely available The theoretical minimum distance is then the sum of the
from either of the authors. minimum distances within each stratum or
X
m pffiffiffiffiffiffiffiffiffiffi
dregmin ¼ ni Ai ; ð2Þ
Length of cruise track i¼1
The minimum sailing distance, drnd, through n random sites, close to the actual number of trawl hauls conducted in
locations in a closed cycle was determined exactly for n!10. each survey. Recently, some modifications were made by
For larger n, it was estimated using the TSP algorithm (see applying the diagonals in the NeS grid as a regular square
Appendix A). To compare random and regular sampling, grid in the least abundant area (south-western part), i.e.
simulations were used to estimate the distribution of the a regular square grid oriented NWeSE with ca. 28 nm
random variable f ¼ drnd =dreg or f ¼ drnd =dregmin . Because spacing between neighbor sampling sites.
the iterative simulated annealing algorithm chose permuta- In several surveys, one and the same location has been
tions of the chronological order of sample point visits in sampled several times in order to study diurnal effects. An
a random fashion, repetitions of the algorithm on a given ideal coverage of the entire study area has also rarely been
realization of n random sampling points provide different obtained, due to ice coverage in the north, bad weather,
results. The estimated variance of drnd (and f) therefore will gear problems, or other complications. In 1996, a pilot
contain one component due to different sampling point study was performed by applying a regular square NeS
realizations, and one component due to the stochastic nature grid in part of the abundant Hopen area (northern stratum)
of the simulated annealing procedure. with 10 nm distances between closest neighbors.
Three major properties of the TSP algorithm examined At each station, a standard trawl haul is conducted with
were the bias (distance from true minimum distance), the a vessel speed of 3 nm/h (kn) for 20 min, and the trawling
standard deviation of minimum traveling distance found by operation typically takes about 1 h, varying with depth.
repeating the TSP algorithm on a given realization of n This includes the slowing down of the vessel from the
random points, and the efficiency in terms of CPU time. cruising speed of 12 kn between stations, the setting,
The bias of the algorithm was estimated as the difference towing, and retrieval of the trawl and the speeding up to 12
between the mean and the minimum shortest distance found kn towards the next station. The vessel is heading towards
by a number of TSP-runs for the same realization of the next station during the trawl operation with an average
sampling points. In order to examine how many runs were speed of approximately 3 kn. Thus for each station ap-
needed in order to obtain the exact minimum, a test case proximately 3 nm of the sailing distance is covered. The
with 100 random locations in a square where the exact remaining average distance of 17 nm takes 1 h 25 min. The
solution is known was run (Reinelt, 1991). The success rate, time spent on cruising thus dominates the time spent on
i.e. the relative number of simulations where the exact sampling, but the latter could not be neglected.
solution was found was then used as a rough guideline for The following sampling schemes were examined with
how many repeated simulations for a given realization were n ¼ 118 sampling points:
needed in order to obtain the exact minimum, or a minimum
1. Regular distance. The minimum regular sailing dis-
negligibly larger than the exact solution. Note that the bias
tance, dreg, that is possible as compared to the
can be reduced by adjusting the parameters in the TSP
theoretical minimum distance.
algorithm at the cost of increased CPU time. A rough
2. Random sampling. The distribution of f ¼ drnd =dreg
goodness criterion to assess the found minimum is to verify
when the sampling points are randomly distributed
that the tour does not intersect itself, which is inconsistent
over the entire sampling domain (one stratum).
with the optimal solution (Flood, 1956).
3. Cell simulation. The distribution of f ¼ drnd =dreg when
With regard to dispersion, the standard deviation of
one random location is chosen within each of the 118
minimum sailing distance due to different realizations of
regular unit cells in the sampling domain.
the n random locations is of major interest. It is assumed
4. Transport included. The ratio between shortest random
that this standard deviation is independent of the standard
and shortest regular distance when the transport
deviation of the TSP algorithm for a given realization. It is
distance to and from the port (Tromsø) is included.
further assumed that the standard deviation of the TSP
5. Stratified sampling. The distribution of f ¼ drnd =dregmin
algorithm is independent of realization. By subtracting the
with six strata and the number of sampling sites within
variance of the TSP algorithm from the variance of mini-
each stratum proportional to stratum area and average
mum distance found by a number of different realizations
stratum standard deviation of historical shrimp (Pan-
with one TSP simulation per realization, the standard devia-
dalus borealis) samples. The stratification is ignored in
tion due only to different realizations can be estimated.
the simulations of shortest sailing distance.
Case studies 1 through 4 were based on 200 realizations and
The Barents Sea shrimp survey case 5 was based on 400 realizations. For case 1, some small
This trawl-based survey has been conducted in April/May random values were added to the 118 regular sites to obtain
on an annual basis with a fixed regular sampling design only one unique solution. In this regular case, it is easier to
since 1992. The square sampling grid is oriented in an NeS judge subjectively if the exact minimum distance is
(and EeW) direction with 20 nautical miles (nm) between obtained, compared to the situation with random locations.
closest neighbors. A rather irregular border encloses the Case study 3 was chosen to examine if random sampling
concavely shaped domain, including 118 regular sampling gives shorter sailing distance than regular sampling under
Comparison of shortest sailing 143
the strong restriction of having one random location in each duration T, n will be larger for a random survey than for
regular unit cell. This option can be seen as a hybrid a regular survey when qrnd ¼ drnd =dreg !1. Whether this
combining regular and random sampling. increase in sample size will make a random survey more
Case study 4 includes the traveling distance to and from the precise than a regular survey will depend on the spatial
sampling field. If the traveling distance dominates the sailing structure of the stock.
pA random survey is more precise than
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
distance within the field, the f-ratio would be very close to a regular survey if nrnd =nreg is larger than CVrnd/CVreg,
one (no difference between minimum sailing distance by where the first ratio is calculated for fixed T and the latter is
random and regular sampling). However, the sailing distance calculated for fixed nznreg .
within the sampling field may be much larger than the Note that Equations (3) and (4) can be slightly modified to
transport distance to and from the field, even if the latter is take into account other factors. For example, to examine the
comparable to the extension of the sampling field. This is the effect of different constant tow durations, t0 can be replaced
case for the Barents Sea sampling domain, where the by c1 þ t, where c1 is the average trawl operation time
transport distance is about 15% of the total sailed distance. excluding the tow duration, t (Pennington and Vølstad,
As noted previously, stratified sampling is expected to 1991). If the vessel is heading towards the next station during
provide shorter regular sailing distance than non-stratified the sampling process with an average speed of v0, and the
sampling. This is also the expected result for random ‘‘sampling distance’’ v0t0 is not negligible compared with the
sampling, but it is hard to deduce theoretically how stratified average distance between adjacent stations, then this can be
sampling will influence the f-ratio. This is the reason why accounted for by replacing t0 with ð1 v0 =vÞt0 in Equations
case 5 is included. (3) and (4).
a) mean values
1.00
circular domain
0.95 square domain
mean(drnd / dreg)
0.90
0.85
0.80
asymptotic estimate = 0.772
0.75
4 5 6 7 8 9 20 30 40 50 60 70 80 90100
sample size, n
b) standard deviations
0.25
circular domain
0.20 square domain
std(drnd / dreg)
0.15
0.10
0.05
0.00
4 5 6 7 8 9 20 30 40 50 60 70 80 90100
sample size, n
Figure 2. The mean of the estimated distribution of the minimum distance through n random points divided by minimum distance based
on n regular points (a) and the estimated standard deviation of the distribution (b) for a square and a circle vs. sample size, n. The exact
minimum is calculated for n!10, and minimum distances were estimated by simulated annealing for n > 10. The asymptotic estimate is
based on bias estimates of the simulated annealing procedure.
randomly distributed within the sampling domain, the random distance tended to be shorter than the regular one.
minimum sailing distance through the 118 points tended to An example is shown in Figure 4b, where the random
be considerably smaller than for regular sampling. An distance was 11.4% shorter than the regular one. Based on
example is shown in Figure 4a, where the minimum distance 200 random realizations, the distance with one random
through the random points is 25% smaller than the regular point per cell was estimated on average to be 6.5% shorter
minimum distance shown in Figure 3. Based on 200 random than the regular distance, when no bias correction of the
realizations, the minimum sailing distance was found to be TSP-procedure was applied. When applying the same TSP-
approximately normally distributed and the mean was 18% bias correction as for unconstrained random locations, the
smaller than the regular minimum distance. The estimated average random distance was estimated to be 8.1% shorter
standard deviation of the sampling distribution equalled than the regular distance.
2.6%, after a minor adjustment that took the dispersion of the Including the transport distance to and from the sample
TSP algorithm into account. The bias Df of the TSP algorithm field appeared to have a small effect on the relative difference
was in this case estimated to be 0.0161, i.e. considerably between random and regular sampling, even though the
smaller than for the square domain. This is probably due to transport distance was comparable with the extension of the
the use of more iterations in the TSP algorithm in the Barents area. An example is shown in Figure 4c, where the random
Sea simulations. As a rough estimate, each TSP simulation distance is 21.0% shorter than the regular one. On average,
required approximately 3 min CPU time in the Barents Sea based on 200 realizations, a 16.4% shorter sailing distance
case, as compared to about half the time required for the was obtained with random sampling, applying the same
square domain setting. TSP-bias correction as before.
Even when one random point was chosen within each The last factor examined was stratified sampling, where
square cell defined by the regular sampling scheme, the the number of sample points in each stratum was chosen
Comparison of shortest sailing 145
Figure 3. Minimum regular sailing distance, dreg, through the n ¼ 118 regular sampling sites in the Barents Sea, as foundpby the use of the
ffiffiffiffiffiffiffi
simulated annealing procedure. Note that this minimum distance is 3.86% larger than the theoretical minimum distance A n. The dotted
contour shows the actual sampling domain, while the solid contour shows the approximation by assigning one square cell to each regular
sampling site.
approximately proportional to stratum area and mean of sampling. If instead of 118 stations, 143 stations were
stratum standard deviation over years. For the six strata, the sampled randomly, then the average CV would be
pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
number of sample points were: 10 (18), 9 (11), 14 (12), 14 8:2% 118=143 ¼ 7:4%. Given the validity of the assump-
(24), 51 (30), and 20 (23) in stratum 1 through 6, tions underlying the geostatistic estimates, it would then
respectively, with the sample size in the non-stratified case appear that the regular design is the preferred design for
in parentheses. In this case, the theoretical regular minimum these surveys at least for estimating abundance.
distance by applying Equation (2) is 97.6% of the theoretical
non-stratified regular minimum distance. An example is
shown in Figure 4d, where the random distance was 21.2% Discussion
shorter than the theoretical regular minimum. On average,
based on 200 realizations, a 14.0% shorter sailing distance One may ask why the drdn/dreg results incorporated the bias
was obtained by random vs. regular sampling. To determine of the TSP algorithm, because in practice a bias can hardly
the actual shortest distance compared to the theoretical one be avoided for realistic sampling sizes and one must live
in the stratified case is rather tedious and was not done, but with the fact that the exact minimum distance normally will
a factor larger than 1.0386 obtained for the non-stratified not be found. Note, however, that to provide the results in
case is expected. If the factor 1.0386 is applied, the average the present study, several realizations of the n sampling
random distance is 17.3% shorter than the regular distance, locations were needed (typically 200 realizations were run
i.e. very close to the non-stratified case. for the different settings). In a practical situation, only one
To find out if a random design provides a smaller CV than realization is needed, and thus far more extensive TSP-runs
a regular design, t0 in Equations (3) and (4) was replaced by can be performed. It is therefore realistic to expect that the
ð1 v0 =vÞt0 ¼ 0:75 h. If nreg ¼ 118 stations are sampled, bias in practice is negligible, at least for sample sizes that
then C ¼ 285 h (Equation (3)). For a random survey of are not much larger than n ¼ 118 used in this work. pffiffiffiffiffiffiffi
the same duration, q ¼ 0:82, and therefore, nrnd ¼ 143 The existence of the asymptotic result drnd ¼ k A n for
(Equation (4)). sufficiently large n, where k is a universal constant, was
Using geostatistic methods, Harbitz and Aschan (in proven by Beardwood et al. (1958), but the exact value of k is
press) estimated that the coefficient of variation (CV) for to our knowledge unknown. As outlined in the present paper,
these shrimp surveys would be on average 8.2% if random the theoretical minimum distance for regularpsampling ffiffiffiffiffiffiffi
sampling were employed (nrnd ¼ 118) and 6.4% for regular points (at the nodes in a square grid) is dregmin ¼ A n. This
146 A. Harbitz and M. Pennington
100 nm 100 nm
c) transport d) stratified
included random
100 nm
100 nm
Figure 4. Examples of ratios between random and regular cruise tracks in the Barents Sea with n ¼ 118 sampling points. (a) Random
sampling, f ¼ drnd =dreg ¼ 0:7514, (b) one random location per unit cell, f ¼ drnd =dreg ¼ 0:8859, (c) random sampling with transport to
and from field included, f ¼ drnd =dreg ¼ 0:7899, and (d) stratified sampling, drnd =dregmin ¼ 0:7879.
is also an asymptotic result, at least for all realistic sampling not applied, and also rather independent of including the
domains. Thus asymptotically, f ¼ drnd =dreg converges to traveling distance to and from the sampling area. Note,
the constant k, and k!1 implies a shorter minimum sailing however, that if the vessel starts in one port and ends in
distance by random than by regular locations. another, the sailing route is no longer closed, and the
Though the exact value of k is not known, it has been traveling salesman algorithm needs to be modified. How this
estimated based on simulations. Beardwood et al. (1958) will influence the f-ratio is a task for future investigations,
estimated that k is equal to 0.75 with 95% confidence along with the influence of, e.g., particular constraints, such
interval (0.66, 0.84). Christofides and Eilon (1969) graph- as the minimum time required to handle the sample at each
ically show that their simulations fit quite well with k ¼ 0:75 sampling site before a new sample is taken. The simulated
for n in the range of 70e100, the interval in which the results annealing algorithms in general are quite flexible with regard
for a circular, square, and triangular square coincided. to including constraints (Press et al., 1994).
An asymptotic result by itself does not say how large n It appears that for the shrimp surveys, a regular survey
should be in order to be close to its asymptotic limit. For would generate more precise abundance estimates than
n ¼ 118 and the Barents Sea sampling domain, the estimated random surveys, even though considerably more stations
mean f based on 200 realizations was 0.85 when corrected could be sampled randomly. Note, however, that de-
for bias, which is rather far from the asymptotic estimate of termining an appropriate estimate of the CV for systematic
k ¼ 0:77. When adjusting the estimate for the minimum designs is a complicated issue. Even if a correct model for
realizable regular distance, the mean f-ratio was 0.82. the correlation structure is applied, it is often a great
It is interesting that our results appear to be rather challenge to find appropriate estimators of the model
independent of whether a stratified sampling design is or is parameters. For example, a critical parameter for the
Comparison of shortest sailing 147
geostatistic estimator is the nugget parameter, which Thompson, S. K. 1992. Sampling. Wiley, New York.
determines how much of the sampling variance is caused Thompson, S. K., and Seber, G. A. F. 1996. Adaptive Sampling.
Wiley, New York.
by random variation and how much is due to the spatial
structure. A regular design will not have any sampling pairs
at zero or small distances. Such pairs are often crucial to
estimate adequately the nugget effect and, therefore, the Appendix A
estimated CV for a pure regular survey may be rather
The traveling salesman algorithm
imprecise and biased. In contrast, for random surveys it is
straightforward to estimate the precision of the abundance The applied traveling salesman algorithm is taken from
estimates, and these CV-estimates will tend to be more Press et al. (1994). Two simple techniques for changing the
precise than CV-estimates obtained for a regular design. order of the n sampling locations were used:
To conclude, the major goal of this work has been to 1. Reverse order: choose a random segment of succeeding
document that random sampling provides considerably sites along the present route, and reverse the order of
shorter sailing distances than regular sampling, and that this the segment.
is one of the several aspects that should be considered in the 2. Transport: choose a random segment of succeeding
much wider challenge of finding an optimal sampling sites along the present route studied, and transport the
design. whole segment between two succeeding sites outside
the segment, one of them chosen randomly.