Hengl 06 Finding The Right Pixel Size

ARTICLE IN PRESS
Computers & Geosciences 32 (2006) 1283–1298 www.elsevier.com/locate/cageo
Finding the right pixel size

Tomislav Hengl
European Commission, Directorate General JRC, Institute for Environment and Sustainability, Soil and Waste Unit, TP 280, Via E. Fermi 1, I-21020 Ispra (VA), Italy
Received 6 May 2005; received in revised form 23 November 2005; accepted 24 November 2005
Abstract
This paper discusses empirical and analytical rules to select a suitable grid resolution for output maps, based on the inherent properties of the input data. The choice of grid resolution was related to cartographic and statistical concepts: scale, computer processing power, positional accuracy, size of delineations, inspection density, spatial autocorrelation structure and complexity of terrain. These were further related to concepts from general statistics and information theory, such as the Nyquist frequency concept from signal processing and equations to estimate the probability density function. Selection of grid resolution was demonstrated using four datasets: (1) GPS positioning data, where the grid resolution was related to the area of the circle described by the error radius; (2) a map of agricultural plots, where the grid resolution was related to the size of the smallest and narrowest plots; (3) a point dataset from soil mapping, where the grid resolution was related to the inspection density, nugget variation and range of spatial autocorrelation; and (4) a contour map used for production of a digital elevation model, where the grid resolution was related to the spacing between the contour lines, i.e. the complexity of terrain. It was concluded that no ideal grid resolution exists, but rather a range of suitable resolutions. One should at least try to avoid resolutions that do not comply with the effective scale or inherent properties of the input dataset. Three standard grid resolutions for output maps were finally recommended: (a) the coarsest legible grid resolution, which is the largest resolution that we should use in order to respect the scale of work and properties of a dataset; (b) the finest legible grid resolution, which is the smallest grid resolution that represents 95% of spatial objects or topography; and (c) the recommended grid resolution, a compromise between the two.
Objective procedures to derive the true optimal grid resolution, which maximizes the predictive capabilities or information content of a map, are further discussed. This methodology can now be integrated within a GIS package to help inexperienced users select a suitable grid resolution without extensive data preprocessing. © 2005 Elsevier Ltd. All rights reserved.
Keywords: Grid resolution; Scale; Inspection density; Point pattern analysis; Variogram; Terrain complexity
1. Introduction
A grid cell, popularly known as a pixel, is the fundamental spatial entity in a raster-based GIS (Gatrell, 1991; DeMers, 2001). Although there is practically no difference between a pixel and a grid cell, geoinformation scientists like to emphasize that a pixel is a technology and a grid is a model (De By,
Detailed instructions to derive a suitable grid resolution are available at http://hengl.pfos.hr/PIXEL/
Tel.: +39 0332 785535; fax: +39 0332 786394. E-mail address: tomislav.hengl@jrc.it.
0098-3004/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2005.11.008
2001). A grid implies ideal properties (an orthogonal matrix and a fixed resolution), which a raster image does not necessarily fit. For example, an aerial photo first needs to be ortho-rectified and then resampled to a regular grid to (approximately) fit the grid model (Rossiter and Hengl, 2002). The grid cell can also be related to (but should not be confused with) the support size, which is typically a fixed area or volume of the land that is being sampled. Support size can be increased by using composite samples or by averaging point-sampled values belonging to the same blocks of land. In geostatistics, one can also control the support size of the output models by averaging multiple predictions per regular block of land, which is known as block kriging (Heuvelink and Pebesma, 1999). This means that we can sample at point locations and then make predictions for blocks of 10 × 10 m. The latter often confuses GIS users because we can produce predictions at regular point locations (point kriging) and then display them using a raster map, but we can also make predictions for blocks of land (block kriging) and display them using the same raster model (Bishop et al., 2001). This distinction is especially important for the validation of spatial prediction models because it can lead to serious misconceptions: validating a point
model (support size of a few centimeters) at 1 km support, or vice versa, can be quite discouraging (Stein et al., 2001).
Although the raster structure has a number of serious disadvantages, such as under- and oversampling in different parts of the study area and large data storage requirements, it will remain the most popular format for spatial modelling in the coming years (DeMers, 2001). What makes it especially attractive is that most of the technical characteristics are controlled by a single measure: grid resolution, expressed as ground resolution in meters. Enlarging the grid cell size leads to aggregation or upscaling, and reducing it leads to disaggregation or downscaling. As the grid becomes coarser, the overall information content of the map progressively decreases, and vice versa (McBratney, 1998; Kuo et al., 1999; Stein et al., 2001). In cartography, coarser grid resolutions are connected with smaller scales and larger study areas, and finer grid resolutions are connected with larger scales and smaller study areas. This convention often confuses non-cartographers because a bigger pixel means a smaller scale, which usually means a larger study area (Fig. 1). Note in Fig. 1 that both aggregation and disaggregation can be done before or after geo-computation. If the
Fig. 1. Upscaling and downscaling in a grid-based GIS. S indicates scale: S− are smaller scales and S+ are larger scales. Based on McBratney (1998). Diagram: base maps, model and derived maps are shown at three scale levels (S−, S, S+); aggregation (upscaling) leads to coarser grids and smaller scales, disaggregation (downscaling) leads to finer grids and larger scales.
model is linear, the two routes should yield the same results (Heuvelink and Pebesma, 1999); if not, there can be serious differences. Aggregation is a fairly useful procedure to reduce small-scale variation and get a better idea of the general pattern (Stein et al., 2001). In contrast, if our objective is to locate extremes (hot spots), then aggregation is something we should avoid.
Many researchers have investigated the effects of grid resolution on the accuracy of their models. Applications range from mapping of soil properties (Florinsky and Kuryakova, 2000) and modelling of surface runoff (Kuo et al., 1999; Molnár and Julien, 2000) to sea currents (Davies et al., 2000) and modelling of meteorological data (McQueen et al., 1995; Noda and Niino, 2003). In the case of terrain data, coarsening the grid will seriously degrade the accuracy of terrain parameters (Weihua and Montgomery, 1994; Dietrich et al., 1995; Thompson et al., 2001). Wilson et al. (2000) demonstrated the impact of grid resolution on hydrological analysis: as the grid cell size increased from 30 to 200 m, the flow-path length and specific catchment area maps changed drastically. The grid resolution can also be crucial for the accuracy of simulation models such as surface runoff or erosion models (Sanchez Rojas, 2002). Kienzle (2004) gives a systematic overview of the effects of various grid resolutions on the reliability of terrain parameters, suggesting finer grid resolutions of 5–20 m. Liang et al. (2004), on the other hand, observed the impacts of different spatial resolutions on modelling surface runoff and concluded that the resolution needs to be improved only up to a critical level, after which the model will not necessarily perform better. Weihua and Montgomery (1994) also obtained a substantial improvement with 10 m grid resolution over 30 and 90 m data, but 2 or 4 m data gave only marginal improvements.
Florinsky and Kuryakova (2000) focused specifically on the importance of the grid resolution of terrain parameters for the efficiency of spatial prediction of soil variables. They plotted correlation coefficients against different grid resolutions and looked for the grid size with the highest prediction efficiency. Bishop et al. (2001) suggested using Shannon's information criterion to select the optimal block size for block kriging. All these experiments clearly show two things: (1) grid resolution plays an important role in the efficiency of mapping and (2) its selection can be optimized, to a certain level, to satisfy both processing capabilities and representation of spatial variability.
Although much has been published on the effect of grid resolution on the accuracy of spatial modelling, the choice of grid resolution is seldom based on the inherent spatial variability of the input data (Vieux and Needham, 1993; Bishop et al., 2001). In fact, in most GIS projects, grid resolution is selected without any scientific justification. In ESRI's ArcGIS package, for example, the default output cell size is suggested by the system using a trivial rule: when point data are interpolated in Spatial Analyst, the system takes the shortest side of the study area and divides it by 250 to estimate the cell size (ESRI, 2002). Obviously, such pragmatic rules do not have a sound scientific background. This motivated me to produce methodological guides to select a suitable grid resolution for output maps based on the inherent properties of the input data. I tried to relate the choice of grid resolution to measurable cartographic and statistical concepts such as scale, processing power, positional accuracy, inspection density, spatial dependence structure and complexity of terrain. I will first recommend some general rules of thumb to select the grid resolution and then demonstrate how to select a legible grid resolution for real datasets.
2. Methods
2.1. Grid resolution and cartographic concepts
Although we live in a digital era where we do not necessarily work with hard copy maps, spatial resolution and extent are still strongly related to traditional cartographic concepts (Quattrochi and Goodchild, 1997; Goodchild, 2001). For example, in traditional soil cartography the scale of an existing map is commonly assessed by estimating either the maximum location accuracy (MLA) or the average size area (ASA) of the polygons on the ground (Rossiter, 2003). These cartographic definitions can also be used to estimate the suitable grid resolution for a given mapping scale. As a rule of thumb, Rossiter (2003) suggests that four grid cells can be considered equivalent to the minimum legible delineation (MLD), which is the smallest area that we map. According to the definition of Vink (1975) the MLD is 0.25 cm² on the map, so the suitable grid resolution can be estimated based on
the scale number $\mathrm{SN}$:

$$p \le \sqrt{\frac{\mathrm{MLD}}{4}} = \sqrt{\frac{\mathrm{SN}^2 \cdot 0.000025}{4}} = \mathrm{SN} \cdot 0.0025 \tag{1}$$

where $p$ is the grid (pixel) size and MLD is the minimum legible delineation area on the ground in m². This means that for a 1:50 K scale, the MLD is 6.25 ha and the suitable grid resolution is 125 m, which seems fairly coarse. Somewhat larger grid resolutions, from 0.5 to 3 mm on the map, have also been recommended by Valenzuela and Baumgardner (1990). The grid resolution can also be related to the MLA, which commonly ranges from 0.25 mm down to, at best, 0.1 mm on the map (Vink, 1975). This gives the smallest legible resolutions

$$p \ge \mathrm{SN} \cdot \mathrm{MLA} = \mathrm{SN} \cdot 0.00025 \quad (\mathrm{SN} \cdot 0.0001) \tag{2}$$

So for a 1:50 K scale, the smallest legible grid resolution is 12.5 m (5 m). Resolutions finer than 5 m do not make sense, as it would be hard to visualize or print them at this scale of work. Following notions in microscopy and the Nyquist frequency concept from signal processing (Shannon, 1949), which states that the original signal can be reconstructed if the sampling frequency is two times the original frequency, McBratney et al. (2003, Table 1) suggested that there should be at least 2 × 2 pixels to represent the smallest rounded objects of interest and at least two pixels to represent the width of elongated objects. The smallest objects are typically of size 1 × 1 mm on the map, so that the grid resolution can be determined using the $p = 0.5$ mm rule (Fig. 2a). The latter can be used as the universal rule of thumb to relate scale to grid resolution.

2.2. Grid resolution and computer processing power
The grid resolution can also be related to the size of the area and the processing power of our computer. Although we might insist on using the finest grid resolution possible, the calculation time will increase steeply (cubically) with the total number of pixels in a map (McBratney, 1998). This means that the grid resolution needs to match the capabilities of our computer and the time available to complete a GIS project. Following the popular Moore's law, Lagacherie and McBratney (2005) discussed the relationship between the grid resolution and the processing capabilities of standard desktop PCs and found the following rough relationship between the log of the image size and the current year (Fig. 3a):

$$\log_{10}(m) = 0.14 \cdot (Y - 1955) \tag{3}$$
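As a rough numerical check of Eqs. (1) to (3), the following Python sketch reproduces the figures quoted in the text for a 1:50 K map. The function names are illustrative assumptions and do not belong to any GIS package; the constants are the ones given above (an MLD of 0.25 cm² and an MLA of 0.25 mm or 0.1 mm on the map).

```python
import math

MLD_MAP_M2 = 0.000025  # 0.25 cm2 on the map, expressed in m2 (Vink, 1975)


def coarsest_resolution(scale_number, mld_map_m2=MLD_MAP_M2):
    """Eq. (1): four pixels per minimum legible delineation."""
    mld_ground_m2 = scale_number ** 2 * mld_map_m2
    return math.sqrt(mld_ground_m2 / 4.0)


def finest_resolution(scale_number, mla_map_m=0.00025):
    """Eq. (2): pixel no smaller than the maximum location accuracy."""
    return scale_number * mla_map_m


def standard_image_size(year):
    """Eq. (3): log10(m) = 0.14 * (year - 1955), returned as pixels."""
    return 10.0 ** (0.14 * (year - 1955))


sn = 50_000  # scale number of a 1:50 K map
print(f"coarsest: {coarsest_resolution(sn):.1f} m")          # about 125 m
print(f"finest:   {finest_resolution(sn):.1f} m")            # about 12.5 m
print(f"finest (MLA 0.1 mm): {finest_resolution(sn, 0.0001):.1f} m")  # about 5 m
print(f"image size in 2005: {standard_image_size(2005):.2e} pixels")
```

For a scale number of 50 000 the sketch returns about 125 m, 12.5 m and 5 m, matching the values derived from Eqs. (1) and (2).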
Here, $\log_{10}(m)$ is the logarithm of the image size in pixels and $Y$ is the current year. According to this formula, the standard image size in the year 2005 would be about $10^7$ pixels, which makes an image of about 3000 × 3000 pixels. The grid resolution can also be estimated by dividing the size of the study area by the number of pixels that the computer can handle. For example, if the size of the area is about 100 000 km² and the computer can handle $10^7$ pixels, we should probably work with resolutions of 100 m or coarser. Based on this principle, we can also derive the coarsest global standard resolution. This is the resolution of an
Fig. 2. Popular cartographic rules to select grid resolution: (a) relationship between grid resolution and cartographic scale; (b) relationship between grid resolution and density of observation points for soil mapping applications. Axes: (a) scale number against grid resolution (m); (b) observations per km² against grid resolution (m), with curves for 1, 2.5 and 4 observations per cm² of map.
Fig. 3. (a) Growth of the standard image size in pixels also follows Moore's law. (b) By the year 2040, images of almost all of Earth should be available at resolutions of 25 m or better. Note the inflection point at year 2026. Axes: (a) image size (pixels) against year; (b) year against standard grid resolution (m).
image covering the whole Globe. The surface of Earth is about $5.10 \times 10^8$ km² (Yoder, 1995), which means that the coarsest global standard resolution in the year 2005 is about 7 km. If Eq. (3) is correct, the coarsest global standard resolution in the year 2040 will be about 25 m (Fig. 3b). At that time computers will be so powerful that they will be able to handle images of a million by a million pixels! These are of course rather simple models, and real figures might differ from application to application.

2.3. Grid resolution and GPS positioning
If the scale of a project is unknown or non-standard, we can assess it by analyzing the mapping methodology. For example, the choice of grid resolution can be related to the positional accuracy of our field positioning method. This is especially important for the integration of GPS with remote sensing images and aerial photos, where we should be certain that our GPS reading will (most probably) fall inside the right pixel. To ensure this, we first need to estimate the confidence radius of the positioning method using some control points. The confidence radius is the radius of a circle within which we expect most ($P = 95\%$) of the points to appear (Arnaud and Flori, 1998). It is normally evaluated using the error vector $r_E$, which is the distance between the measured location $(X_{GPS}, Y_{GPS})$ and the true location of the control point $(X_T, Y_T)$:

$$r_E = \sqrt{(X_{GPS} - X_T)^2 + (Y_{GPS} - Y_T)^2} \tag{4}$$
As a rule of thumb, one should select the grid resolution so that the area of the circle described by the error radius is equal to or smaller than the area of a pixel (Fig. 4b):

$$p^2 \ge \pi \cdot \bar{r}_E^2 \tag{5}$$
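The error radius and the area rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the fixes are made-up coordinates in metres relative to a control point at the origin, the function names are hypothetical, and the 95% radius is read from the empirical distribution with the standard-library `statistics.quantiles` rather than from a fitted log-normal curve.

```python
import math
import statistics


def error_radii(fixes, true_xy):
    """Eq. (4): distance of each GPS fix from the known control point."""
    x_t, y_t = true_xy
    return [math.hypot(x - x_t, y - y_t) for x, y in fixes]


def pixel_for_radius(radius):
    """Eq. (5) taken as an equality: pixel area equals the error circle area."""
    return math.sqrt(math.pi) * radius


# Hypothetical fixes (m) scattered around a control point at (0, 0).
fixes = [(3.0, 4.0), (-6.0, 8.0), (0.0, -5.0), (12.0, 5.0), (-1.0, 2.0)]
radii = error_radii(fixes, (0.0, 0.0))

mean_radius = statistics.mean(radii)
# n=20 yields 19 cut points; the last one is the empirical 95% quantile.
radius_95 = statistics.quantiles(radii, n=20, method="inclusive")[-1]

print(f"mean radius {mean_radius:.1f} m -> pixel of at least {pixel_for_radius(mean_radius):.1f} m")
print(f"95% radius  {radius_95:.1f} m -> pixel of about {pixel_for_radius(radius_95):.1f} m")
```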
Here, $\bar{r}_E$ is the average error radius and $p$ is the grid resolution. The recommended grid resolution is the one where most of the points (95%) would fall within the pixel:

$$p = \sqrt{\pi \cdot r_{E(P=95\%)}^2} \approx 1.8 \cdot r_{E(P=95\%)} \tag{6}$$

where $r_{E(P=95\%)}$ is the 95% probability error radius, or confidence radius, derived from the cumulative distribution of the error vectors from a set of measurements (Fig. 4a). Garmin (www.garmin.com), for example, claims that their handheld receivers (selective availability turned off) achieve a horizontal error of not more than 15 m 95% of the time. This would then be compatible with a grid resolution of 27 m.

2.4. Grid resolution and remote sensing systems
As with raster-based GIS, grid resolution controls many aspects of the remote sensing systems used for mapping. Characteristics of objects are scale dependent, and the total number, area, average size or perimeter of the objects being mapped will differ for different grid resolutions (Lillesand and Kiefer, 2000, pp. 598–603). In fact, one will never be able to absolutely determine how many islands or lakes are
Fig. 4. Some important concepts for evaluation of positioning methods: (a) the distribution of error vectors follows a log-normal distribution; (b) the confidence radius, within which most of the measurements fall, should not cover a larger area than a grid node. Diagram labels: error vector, confidence radius, grid resolution.
in the world and what their perimeters are. Although we cannot actually change the grid resolution of remote sensing images, we should at least be certain that the images are adequate for our mapping application. We can do that by inspecting the size of the smallest spatial objects that are being mapped and then selecting the appropriate grid resolution, i.e. remote imaging system. Again, there should be at least four pixels to represent the smallest objects and at least two pixels to represent the narrowest objects, which can be expressed mathematically as

$$p \le \begin{cases} \sqrt{\dfrac{a_{MLD}}{4}} & \text{if } S < 3 \\[2ex] \dfrac{w_{MLD}}{2} & \text{if } S > 3 \end{cases} \tag{7}$$

where $a_{MLD}$ is the area of the smallest objects, $w_{MLD}$ is the width of the narrowest objects and $S$ is the shape complexity index, derived as the ratio of the polygon perimeter to the perimeter of a circle with the same area:

$$S = \frac{P}{2 r \pi}, \qquad r = \sqrt{\frac{a}{\pi}} \tag{8}$$

where $P$ is the perimeter of the polygon, $a$ is the area of the polygon and $r$ is the radius of the circle with the same surface area (Hole, 1953). An arbitrary value of 3 is used to differentiate between compact and narrow/long polygons. This means that we can first sample the size of spatial objects (e.g. crop plots, water bodies, forest patches, roads) in part of the study area and then decide on a suitable sensor. To be more certain, one can plot the histogram with cumulative
distributions and derive the 5% probability area, i.e. the 5% probability width, of the reference objects (Garbrecht and Martz, 1994). For example, if the objects of our mapping project are agricultural plots, and if the smallest plots are about 1 ha, the grid resolution should be 50 m or finer, which means that we should use Landsat or similar imagery. If the smallest plots are about 0.05 ha (which corresponds to the scale 1:5000), we will need images with resolutions of 10 m and finer.
In addition to the size of the smallest objects, one also needs to take into account how strongly an object of interest contrasts with its surroundings. The local contrast between adjacent objects can be assessed by statistically comparing the reflectance values of the target object $\rho_T$ and the surrounding object $\rho_S$. Obviously, the higher the difference, the less strict we have to be about the grid resolution. For example, consider centerline road markers 20 cm wide. Such markers will be visible even on 60 cm resolution QuickBird images because of their high contrast with the surrounding objects. The impact of local contrast in the case of aggregation is illustrated in Fig. 5b: if the contrast is fairly low, resampling will diminish point or line objects so much that they cannot be distinguished from the surroundings.

2.5. Grid resolution and point samples
In many mapping projects, a map is made out of the point samples collected in the field and then used
Fig. 5. Impact of local contrast on visibility of objects for point, line and polygon features: (a) high contrast between target object and surroundings ($\rho_T = 0.4\%$; $\rho_S = 99.6\%$); and (b) low contrast between target object and surroundings ($\rho_T = 78.4\%$; $\rho_S = 94.1\%$). Fine resolution images have been resampled to coarser grids using bilinear resampling.
to make predictions. To be consistent, every mapping project should have an approximately equal density of samples per area, also called inspection density. Obviously, the denser the observation points, the larger the scale of mapping. A cartographic rule, used for example in soil mapping, is that there should be at least one (ideally four) observation per 1 cm² of the map (Avery, 1987, Table 1). This principle can be used to estimate the effective scale of a data set consisting of sampled points only. For example, 10 observations per km² corresponds to a scale of about 1:50 K. The same principle can also be expressed mathematically:

$$\mathrm{SN} = \sqrt{4 \cdot \frac{A}{N}} \cdot 10^2 \;\ldots\; \mathrm{SN} = \sqrt{\frac{A}{N}} \cdot 10^2 \tag{9}$$

where $A$ is the surface of the study area in m² and $N$ is the total number of observations. Remember from Eq. (1) that the scale number can be used to estimate the grid resolution. If we take the intermediate number of 2.5 observations per cm² and combine it with the $p = 0.5$ mm on the map rule of thumb, we get, after simplification, a simple formula:

$$p = 0.0791 \cdot \sqrt{\frac{A}{N}} \tag{10}$$

So, for example, if we deal with 100 samples and the size of the area is 10 km², a recommended grid
resolution would be 25 m (see also Fig. 2b), which can also be expressed as 160 pixels per point sample.
Grid resolution can also be related to the geometry of point patterns, i.e. the distance between the sampled points (Boots and Getis, 1988). Following, again, the Nyquist frequency concept (Shannon, 1949), the grid resolution should be at most half the average spacing between the closest point pairs:

$$p \le \frac{\bar{h}_{ij}}{2} \tag{11}$$
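The rules in Eqs. (9) to (11) are easy to check numerically. The Python sketch below is only an illustration: the function names are hypothetical, the `obs_per_cm2` parameter generalises Eq. (9) to any density between one and four observations per cm² of map, and the mean shortest distance is computed by brute force over a made-up regular sample.

```python
import math


def effective_scale_number(area_m2, n_points, obs_per_cm2=2.5):
    """Eq. (9), generalised to any number of observations per cm2 of map."""
    return math.sqrt(obs_per_cm2 * area_m2 / n_points) * 100.0


def recommended_pixel(area_m2, n_points):
    """Eq. (10): p = 0.0791 * sqrt(A / N)."""
    return 0.0791 * math.sqrt(area_m2 / n_points)


def mean_shortest_distance(points):
    """Mean distance from every point to its nearest neighbour (brute force)."""
    nearest = [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


# 100 samples over 10 km2, as in the example above.
print(f"Eq. (10): {recommended_pixel(10e6, 100):.1f} m")          # about 25 m
# 10 observations per km2 at 2.5 observations per cm2 of map.
print(f"Eq. (9):  1:{effective_scale_number(1e6, 10):.0f}")       # about 1:50 000

# Hypothetical regular 5 x 5 grid of samples, 100 m apart.
grid = [(100.0 * i, 100.0 * j) for i in range(5) for j in range(5)]
print(f"Eq. (11): {mean_shortest_distance(grid) / 2.0:.1f} m")    # half of 100 m
```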
Here, $\bar{h}_{ij}$ is the average distance between the closest point pairs, also called the mean shortest distance. In the case of regular point samples, the formula simplifies to

$$p = 0.5 \cdot \sqrt{\frac{A}{N}} \tag{12}$$

or 4 pixels per point sample. Note that the difference between the factors in Eqs. (10) and (12) is fairly big. Keep in mind that Eq. (12) is valid only if we are dealing with (spatially) absolutely regular point samples. If the point samples show a more random or clustered distribution, we need to be somewhat more strict about the grid resolution. If we are dealing with random point samples, then the average spacing between the closest point pairs is approximately half the spacing between closest point pairs in regular point samples (Fig. 6b). This
is because random sampling has equal probability of producing totally clustered and totally regular samples (Fig. 6c). See also the supplementary materials (http://hengl.pfos.hr/PIXEL/) for further argumentation. The factor then changes to

$$p = 0.25 \cdot \sqrt{\frac{A}{N}} \tag{13}$$

We can be even more strict and look for a grid resolution where more than 95% of the closest point pairs do not fall into the same pixel. This can be done using well-known point pattern analysis algorithms such as those described in Boots and Getis (1988) and Rowlingson and Diggle (1993). We first need to derive the probability of finding the first point pair at different distances for a given data set. We can then select the 5% probability smallest distance (see Fig. 6c):

$$p \ge h_{ij(P=5\%)} \tag{14}$$

Fig. 6. Selection of grid resolution in relation to different point sample patterns: (a) in the situation of regular sampling, the coarsest grid resolution is half the average spacing to the closest point pair; (b) in the situation of random sampling, the coarsest grid resolution should be about two times finer; (c) probability of finding one point in the neighborhood for regular and random point samples. Axes in (c): probability of finding one point neighbour against distance (m), with curves for regular and random point sampling.

In addition, we can inspect the spatial correlation structure of the point data set and use this information to select the grid resolution. Here the key is to estimate the range of spatial dependence. Obviously, a variable that is spatially auto-correlated at shorter distances requires a finer grid resolution, and vice versa. Selection of grid spacing can be related to estimation of the optimal bin size for estimating the probability density function, used for example in statistics to display histograms (Izenman, 1991). This gives a rather simple formula:

$$p = \frac{h_R}{\sqrt[3]{m}} \tag{15}$$
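The point-pattern rules in Eqs. (12) to (14) can be sketched as follows. This is an illustration only: the function names and the list of nearest-neighbour distances are hypothetical, and the 5% threshold is taken as an empirical quantile with `statistics.quantiles`, which is one simple stand-in for the point pattern analysis algorithms cited above.

```python
import math
import statistics


def pixel_regular(area_m2, n_points):
    """Eq. (12): regular sampling, p = 0.5 * sqrt(A / N)."""
    return 0.5 * math.sqrt(area_m2 / n_points)


def pixel_random(area_m2, n_points):
    """Eq. (13): random sampling, p = 0.25 * sqrt(A / N)."""
    return 0.25 * math.sqrt(area_m2 / n_points)


def shortest_distance_quantile(nn_distances):
    """Eq. (14): empirical 5% quantile of the nearest-neighbour distances."""
    # n=20 yields 19 cut points; the first one is the 5% quantile.
    return statistics.quantiles(nn_distances, n=20, method="inclusive")[0]


area_m2, n_points = 10e6, 100  # 100 samples over 10 km2
print(f"regular sampling: {pixel_regular(area_m2, n_points):.0f} m")  # about 158 m
print(f"random sampling:  {pixel_random(area_m2, n_points):.0f} m")   # about 79 m

# Hypothetical nearest-neighbour distances (m) for ten sampled points.
nn = [35.0, 60.0, 80.0, 95.0, 110.0, 120.0, 150.0, 170.0, 210.0, 260.0]
print(f"5% shortest distance: {shortest_distance_quantile(nn):.1f} m")
```

The gap between the factor 0.0791 in Eq. (10) and the factors 0.5 and 0.25 used here is what the text refers to as a fairly big difference: the cartographic rule is far more conservative than the spacing rules.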
After we have determined the range of spatial dependence ($h_R$), we need to count the number of point pairs ($m$) within that range and then derive a suitable grid resolution that can be used to represent the spatial dependence structure. Another benefit of estimating the spatial dependence structure is that we get an estimate of the nugget variance, i.e. the variation that occurs at zero distance. This is a part of the variation that we would typically like to remove or diminish, which can be achieved by using, for example, block kriging (Heuvelink and Pebesma, 1999; Bishop et al., 2001). Because the nugget variance refers to microscale variation (in fact, infinitely small distances), any block size with positive area would average out short-range variation. To be on the safe side, we can increase the size of the block $B$ depending on the amount of nugget variation. A satisfactory block size can be estimated by comparing the derived estimation variance with the original variation in the data, the so-called normalized estimation error:

$$\sigma_{E\%} = \frac{\sigma_E}{\sigma_z} \tag{16}$$

where $\sigma_E$ is the estimation error (precision) and $\sigma_z$ is the total variation in the spatial data. This ratio plays a role comparable to the coefficient of multiple determination $R^2$ used in regression analysis to show how much of the variation has been explained by a model (approximately $R^2 = 1 - \sigma_{E\%}^2$). If $\sigma_{E\%} \le 40\%$, the prediction model has explained about 85% or more of the variation in the data. In contrast, when $\sigma_{E\%}$ gets close to 100%, the prediction model explains nothing ($R^2 = 0$). In practice, we can run block-kriging predictions for different block sizes and then select the block size for which the normalized estimation error is good enough to map the whole area (e.g. $\sigma_{E\%} = 40\%$). Similarly,
there is no gain in using larger block sizes if there is no nugget variation or if 95% of the variation is already explained by the prediction model ($\sigma_{E\%} = 23\%$; $R^2 = 0.95$).

2.6. Grid resolution and terrain analysis
The key problem when selecting a grid resolution for terrain analysis is that there can be a significant difference between the surface elevation on a coarser grid and the actual topography, meaning that some peaks and channels might disappear in a raster DEM. In general, an increase in the detail of the DEM will also mean more accurate terrain parameters. This increase, however, depends on the general variability of the landscape. For example, a generally simple and smooth landscape might not need a fine resolution DEM. Moreover, if the grid resolution is too fine it might introduce local artefacts or slow down the computation of terrain parameters. Obviously, we need a grid resolution that optimally reflects the variability of the elevation surface and is able to represent the majority of geomorphic features (Borkowski and Meier, 1994; Kienzle, 2004).
A suitable grid resolution can be derived for given sampled elevations (e.g. contours), based on the complexity of the (sampled) terrain. Imagine a one-dimensional topography with a number of inflection points (Fig. 7). We can again connect the problem of grid resolution with the Nyquist–Shannon sampling theorem (Shannon, 1949; Tempfli, 1999). In this one-dimensional example, terrain is the signal and its frequency is determined by the density of inflection points. Hence, the grid resolution should be no coarser than half the average spacing between the inflection points:

$$p \le \frac{l}{2 \cdot n_{\delta z}} \tag{17}$$
where $l$ is the length of a transect and $n_{\delta z}$ is the number of inflection points observed. We can also be more strict and look for the 5% probability spacing between the inflection points. In the example in Fig. 7 there are 20 inflection points and the average spacing between them is 0.8 m, so the grid resolution should be no coarser than 0.4 m. The cumulative distribution of distances between the inflection points shows that the 5% threshold of the smallest spacing between the inflection points is 0.2 m. Hence, a grid resolution no coarser than 0.4 m and no finer than 0.2 m is recommended.
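The one-dimensional rule in Eq. (17) can be sketched in Python. Two assumptions are made here and are not taken from the paper: inflection points are detected as sign changes of the second difference of the sampled profile, and the test profile is a synthetic sine curve. The transect length is taken as 20 inflection points times the quoted average spacing of 0.8 m.

```python
import math


def count_inflections(z):
    """Count sign changes in the second difference of an elevation profile."""
    second = [z[i - 1] - 2.0 * z[i] + z[i + 1] for i in range(1, len(z) - 1)]
    signs = [s > 0 for s in second if s != 0.0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def coarsest_pixel_1d(transect_length, n_inflections):
    """Eq. (17): p <= l / (2 * n)."""
    return transect_length / (2.0 * n_inflections)


# Figures quoted for Fig. 7: 20 inflection points, about 0.8 m apart.
print(coarsest_pixel_1d(20 * 0.8, 20))   # 0.4 m

# Hypothetical profile: roughly two sine waves sampled every 0.1 m.
profile = [math.sin(0.1 * i) for i in range(126)]
n = count_inflections(profile)
print(n, coarsest_pixel_1d(12.5, n))
```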
Fig. 7. Schematic example showing the effect of grid resolution on representation of topography: a too coarse grid resolution ($p = 2.5$ m) will misrepresent topography, whereas a finer grid resolution ($p = 0.5$ m) will be more effective in representing all peaks and channels.
In the 2D case, the suitable grid resolution can be estimated from the total length of contours. Here the contours represent mapped changes of fixed elevations. We do not actually have a map of inflection points, but we can approximate them using the contour map. The suitable grid resolution is then

$$p = \frac{A}{2 \cdot \sum l} \tag{18}$$
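Eq. (18) can be illustrated with a small Python sketch. The contour geometry below is made up (ten straight, parallel contours across a 1 × 1 km area) and the function names are hypothetical; in practice the vertex lists would come from digitized contour lines.

```python
import math


def polyline_length(vertices):
    """Length of a digitized contour given as a list of (x, y) vertices."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))


def pixel_from_contours(area_m2, contours):
    """Eq. (18): p = A / (2 * total contour length)."""
    total_length = sum(polyline_length(c) for c in contours)
    return area_m2 / (2.0 * total_length)


# Hypothetical 1 x 1 km area crossed by ten straight contours, 100 m apart.
contours = [[(0.0, 50.0 + 100.0 * k), (1000.0, 50.0 + 100.0 * k)]
            for k in range(10)]
print(pixel_from_contours(1000.0 * 1000.0, contours))   # 50.0
```

In this synthetic case the ratio of area to total contour length equals the 100 m spacing between contours, so the result is half that spacing, in line with the one-dimensional rule of Eq. (17).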
Here, $A$ is the total size of the study area and $\sum l$ is the total cumulative length of all digitized contours. A more strict approach is to evaluate the density of contour lines in an area and derive the 5% probability smallest spacing between contours.

3. Case studies
3.1. Example 1: GPS positioning
In this example, I will first demonstrate how to select a grid resolution based on the evaluation of the GPS positioning method. In this case, 100 positioning fixes were recorded using the single-fix GPS positioning method (Arnaud and Flori, 1998) at a control point with a known location. The fluctuation of the GPS readings can be seen in Fig. 8a. The errors ranged from 0.7 to 23.9 m; the average error was 8.5 m with a standard deviation of 5.2 m. The error vectors seem to follow the
log-normal distribution (Fig. 8b). The theoretical log-normal distribution gave a 95% probability radius of 19.1 m, while the experimental distribution shows a somewhat higher value (20.4 m). Eq. (6) gives us a suitable grid resolution of 34.4 m. If this grid resolution is selected, most (95%) of the GPS fixes will fall within the right pixels. This number, for example, roughly corresponds to the resolution of Landsat imagery. A more accurate positioning method would be needed to locate points within finer grid resolutions. For example, August et al. (1994) showed that the 95% radius of a standard GPS method can be reduced by up to 2.5 times by averaging 300 replicate fixes. If we would like to use GPS positioning with grid resolutions of about 15 m, then we would need to use GPS positioning with averaging (5 min per point). Higher positional accuracy (5–20 times) can be achieved by using differential correction, which can improve accuracy to less than three meters on average. Such accuracy would be compatible with grid resolutions within the range 2–10 m (SPOT or IKONOS imagery).

3.2. Example 2: monitoring agricultural plots
I will now demonstrate how to select a remote sensing sensor based on the size of agricultural plots (Fig. 9a). The polygon map consists of 121 polygons in total. The smallest polygon is 0.005 ha, the
Fig. 8. Selecting grid resolution based on confidence radius of positioning method: (a) 100 single-fix GPS measurements around the true location of the point; (b) histogram of error vectors (frequency against error in m, with theoretical and experimental distributions), average error vector and 95% probability confidence radius.
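The conversion from error radius to grid resolution in Example 1 can be checked numerically. The sketch below assumes the relation listed for GPS positioning in Table 1, i.e. a grid resolution of about 1.8 times the 95% probability error radius. The factor is close to $\sqrt{\pi} \approx 1.77$, which is what one obtains by equating the pixel area with the area of the error circle. The function name and its default factor are illustrative and are not taken from the paper.

```python
import math


def pixel_size_from_error_radius(radius_m, factor=1.8):
    """Grid resolution (m) for a positioning method with the given error radius.

    factor=1.8 is the coefficient assumed from Table 1; math.sqrt(math.pi)
    (about 1.77) is the exact equal-area value.
    """
    return factor * radius_m


# 95% probability radii reported in Example 1
theoretical_r95 = 19.1   # m, fitted log-normal distribution
experimental_r95 = 20.4  # m, experimental distribution

print(round(pixel_size_from_error_radius(theoretical_r95), 1))   # 34.4
print(round(pixel_size_from_error_radius(experimental_r95), 1))  # 36.7
# Equal-area variant: pixel area = area of the error circle
print(round(pixel_size_from_error_radius(theoretical_r95, math.sqrt(math.pi)), 1))  # 33.9
```

With the theoretical radius of 19.1 m the sketch returns the 34.4 m quoted in the text; the experimental radius of 20.4 m would give a slightly coarser value.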
Fig. 9. Selection of grid resolution based on average size of objects observed: (a) agricultural plots; (b) distribution of surfaces for compact plots ($S < 3$) and related grid resolution (histogram of polygon size and cumulative probability against grid resolution, with the 5% and 50% probability levels marked).
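The pixel sizes quoted in Example 2 can be reproduced with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the values in the text (about 20 m and 70 m) match the square root of the plot area, i.e. one pixel per plot, while requiring four pixels per plot would halve them. The parameter `pixels_per_object` and both function names are hypothetical and are used only for illustration.

```python
import math

HA_TO_M2 = 10_000.0  # square meters per hectare


def pixel_size_from_area(area_ha, pixels_per_object=1):
    """Pixel size (m) such that an object of area_ha covers pixels_per_object pixels."""
    return math.sqrt(area_ha * HA_TO_M2 / pixels_per_object)


def pixel_size_from_width(width_m):
    """Pixel size (m) for narrow objects: half of their width."""
    return width_m / 2.0


print(round(pixel_size_from_area(0.046), 1))     # 21.4, the "about 20 m" in the text
print(round(pixel_size_from_area(0.5), 1))       # 70.7, the "70 m" in the text
print(round(pixel_size_from_area(0.046, 4), 1))  # 10.7, four pixels per plot
print(pixel_size_from_width(16.0))               # 8.0, narrow plots
```

Seen this way, the recommended range of 10 to 70 m spans the four-pixel value for the 5% smallest compact plots and the one-pixel value for the median plot.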
biggest is 6.903 ha, and the average size of polygons is 0.824 ha with a standard deviation of 1.005 ha. The polygons were then separated into two groups according to the shape complexity index. In this case only six polygons were classified as narrow polygons ($S > 3$). For each of these, an average width was estimated by taking regular measurements (10 per polygon). A histogram of areas of compact polygons and a cumulative theoretical log-normal distribution can be seen in Fig. 9b. I then derived the 5% inverse cumulative distribution value assuming the log-normal distribution. I got 0.046 ha, which means that the pixel size should be about 20 m. The coarsest legible grid resolution for this data set would be 70 m ($A_{P=50\%} = 0.5$ ha). If resolutions coarser than 70 m are used to monitor agriculture in this area, then for more than 50% of the plots there will be fewer than four pixels per polygon. Note that in this case we are not using the true smallest polygon size but a somewhat higher figure (0.046 ha), because the smallest value (0.005 ha) is not representative. Further inspection of the widths showed that the average width of the narrow polygons is about 16 m, which gives a somewhat stricter grid resolution of about 8 m. However, the narrow polygons occupy only 0.9% of the total study area,
so we do not have to be as strict. Finally, I would recommend that satellite imagery with resolutions in the range from 10 to 70 m be used to monitor agriculture in this study area.

3.3. Example 3: point data interpolation

In this example, I will use the Wesepe point data previously used in numerous soil mapping applications (De Gruijter et al., 1997). The dataset consists of 552 profile observations in which various soil variables have been described. The target variable, used here to analyze the spatial dependence structure, is the membership value to the enk earth soil type. The values range from 0 to 1, with an average of 0.232 and a standard deviation of 0.322. The total size of the area is 12.1 km², which gives a sampling density of about 45 observations per km². This corresponds to a scale of 1:25 K, i.e. a grid resolution of 12 m (Eq. (10)). If we inspect the spreading of the points, we see that the average spacing between the closest point pairs is about 120 m, which is fairly close to the spacing for regular point sampling (148 m for this data set). The cumulative distribution showed that 95% of points are at distances of 5 m or more. This means that the legible grid resolutions are between 5 and 150 m (Eq. (12)).
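The inspection density figures above can be reproduced as follows. This is a minimal sketch that assumes the coefficients listed for inspection density in Table 1 (0.1, 0.05 and 0.0791 times the square root of the area per observation); the function name and the dictionary keys are illustrative only.

```python
import math


def inspection_density_resolutions(area_m2, n_points):
    """Grid resolutions (m) derived from the area available per observation."""
    spacing = math.sqrt(area_m2 / n_points)  # spacing of a regular sampling grid
    return {
        "regular_spacing_m": spacing,
        "coarsest_m": 0.1 * spacing,
        "finest_m": 0.05 * spacing,
        "recommended_m": 0.0791 * spacing,
    }


area_m2 = 12.1e6  # 12.1 km2
n_points = 552

print(round(n_points / 12.1, 1))  # 45.6 observations per km2
for name, value in inspection_density_resolutions(area_m2, n_points).items():
    print(name, round(value, 1))
```

For the Wesepe data this gives a regular spacing of about 148 m and a recommended resolution of about 11.7 m, in line with the 148 m and 12 m quoted in the text.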
Automated fitting of the variogram using a global model further gave a nugget parameter ($C_0$) of 0.042, a sill ($C_0 + C_1$) of 0.097 and a range parameter ($R$) of 175.2 m. For the exponential model the effective range is about three times the range parameter, which means that the variable is correlated up to a distance of about 525 m ($h_R$). There are 11 807 point pairs within this range, which finally means that the optimal lag/grid size would be about 23 m. The pattern analysis of the point data set further shows that there is clear regularity in the point geometry: most of the distances are grouped at 180 m. The final interpolated map at a resolution of 10 m can be seen in Fig. 10d. If we apply block kriging, the nugget variation, which seems to be significant in this case ($C_0 = 0.042$), would be diminished and the model would explain 71% of the variation at a support size of 10 m and 78% at a support size of 60 m. Note
that although further increasing the block size would result in a somewhat better normalized estimation error, we would unnecessarily lose detail by increasing the block size above 120 m, because more than 50% of point pairs would then fall within the same blocks. Also note that this sampling density corresponds to the scale of 1:25 K, which means that the support size should not exceed 25 m (Fig. 11).

3.4. Example 4: contour data for terrain modelling

In this case study, I will demonstrate how a grid resolution can be selected from a map of contours, i.e. a dataset consisting of lines digitized from a topo-map. The study area is described in detail in Hengl et al. (2004). Contour lines were extracted
Fig. 10. Selection of grid resolution based on point pattern analysis: (a) a set of 552 soil profile observations; (b) probability of finding one point in neighborhood and graph of distances to closest point; (c) variogram and parameters fitted automatically in gstat (http://www.gstat.org): exponential model, nugget = 0.042, $C_0 + C_1$ = 0.097, $R$ = 175.2 m; (d) interpolated map using ordinary kriging at grid resolution of 10 m.
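The lag/grid size derived from the variogram in Example 3 can be checked with a short calculation. The sketch below rests on two assumptions: that the effective range of the exponential model is three times its range parameter, and that the lag size follows $p = h_R / \sqrt[3]{m}$, a relation that reproduces the 23 m quoted in the text. The function names are hypothetical.

```python
def effective_range_exponential(range_parameter_m):
    """Effective (practical) range of an exponential variogram model, in m."""
    return 3.0 * range_parameter_m


def lag_size(effective_range_m, n_point_pairs):
    """Lag/grid size (m): effective range divided by the cube root of the pair count."""
    return effective_range_m / n_point_pairs ** (1.0 / 3.0)


h_r = effective_range_exponential(175.2)  # range parameter R from the fitted model
print(round(h_r, 1))                      # 525.6
print(round(lag_size(h_r, 11807), 1))     # about 23 m
```

The same two functions could be applied to any fitted exponential variogram, provided the number of point pairs within the effective range is known.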
Fig. 11. Kriging predictions (above) and the kriging estimation error (below) for different block sizes: B = 0 ($R^2$ = 31%), B = 10 ($R^2$ = 71%) and B = 60 ($R^2$ = 78%). Increasing the block size (support) typically leads to lower estimation error and total reduction of the nugget variation. Compare with the punctual estimation (B = 0) on the left.
from the 1:50 K topo-map (Fig. 12a), with a contour interval of 10 m and supplementary 5 m contours in areas of low relief. The total area is 13.69 km² and elevations range from 80 to 240 m. There were 127.6 km of contour lines in total, which means that the average spacing between the contours is 107 m. The grid resolution should therefore be 53.5 m or finer (half the average spacing) to represent most of the mapped changes in relief. I then derived the distance from the contours map using the 5 m grid and displayed the histogram of the distances (Fig. 12b) to derive the 5% probability distance. The absolute shortest distance between the contours is 7 m, and the 5% probability distance is 12.0 m. Finally, I can conclude that the legible resolution for this data set is within the range 12.0–53.5 m. Resolutions finer than 12 m are unnecessary for the given complexity of terrain. Note that selection of the most suitable grid resolution based on contour maps is scale dependent. For the contour lines digitized from the 1:5 K topo-maps (Fig. 12c), the average spacing between the lines is 26.6 m and the 5% probability
distance is 1.6 m. This means that, at the 1:5 K scale, the recommended resolutions are between 1.6 and 13.3 m.

4. Discussion and conclusions

There are three important concepts brought out in this paper that need to be further emphasized. First, principles from general statistics and information theory, such as the Nyquist frequency concept from signal processing and equations to estimate the probability density function, can be closely related to the selection of the grid resolution. Second, there are three standard grid resolutions that can be derived for each input dataset:

Coarsest legible grid resolution: this is the largest resolution that we should use given a specific scale, positional accuracy, size of objects and/or complexity of terrain. Using resolutions coarser than the coarsest legible resolution means that we are not respecting the scale of work,
Fig. 12. Selection of grid resolution based on complexity of terrain: (a) and (c) contours from 1:50 K and 1:5 K topographic maps; (b) and (d) histograms of distances between contours for the two scales.
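The contour-based resolution in Example 4 follows from the study area and the total contour length. A minimal sketch, assuming that the average contour spacing is the area divided by the cumulative contour length and that the grid resolution is half of that spacing; the function names are illustrative.

```python
def contour_spacing(area_m2, total_contour_length_m):
    """Average spacing (m) between contour lines."""
    return area_m2 / total_contour_length_m


def pixel_size_from_contours(area_m2, total_contour_length_m):
    """Grid resolution (m): half of the average contour spacing."""
    return contour_spacing(area_m2, total_contour_length_m) / 2.0


area_m2 = 13.69e6   # 13.69 km2
length_m = 127.6e3  # 127.6 km of contours from the 1:50 K map

print(round(contour_spacing(area_m2, length_m), 1))           # 107.3
print(round(pixel_size_from_contours(area_m2, length_m), 1))  # 53.6
```

The text reports 53.5 m because it halves the rounded spacing of 107 m. The same calculation applied to the 1:5 K spacing of 26.6 m gives 13.3 m.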
positional accuracy, inspection density, size of objects being mapped or the complexity of terrain.

Finest legible grid resolution: this is the smallest grid resolution that represents most (95%) of the spatial objects or topography. This is the finest meaningful resolution, which corresponds to the concept of the maximum location accuracy. Resolutions finer than the finest legible resolution are probably just a waste of memory.

Recommended grid resolution: this is a compromise resolution, usually set as an intermediate number between the coarsest and finest resolutions.

Third, the choice of the grid resolution needs to be considered in relation to a number of inherent properties of a dataset or study area. The parameters and summary equations to derive the coarsest and finest grid resolutions discussed in this paper are given in Table 1. These formulas can now be integrated within a GIS package to help inexperienced surveyors derive grid maps without doing extensive data preprocessing. A simple calculator can be found via the website listed at the beginning of the paper. Of all the statistical approaches discussed to determine the true optimal pixel size, two seem to be most promising. The first is to select the pixel size that yields the best predictive properties. For example, if we wish to use terrain parameters to predict soil properties over the entire study area, we can test terrain parameters derived at various resolutions and then select the one that offers the highest correlation coefficient with the target variable (Florinsky and Kuryakova, 2000). A problem of this approach is
Table 1
Summary equations to select grid resolution: $SN$ is the scale factor, $r_E$ is the positioning error, $\bar{r}_E$ is the average positioning error, $\bar{a}$ is the average size of delineations, $a_{MLD}$ is the area of the minimum legible delineation, $w_{MLD}$ is the width of the narrowest legible delineation, $A$ is the surface of the study area, $N$ is the number of sampled points in the study area, $h_{ij}$ is the spacing between closest point pairs, $\bar{h}_{ij}$ is the average spacing between closest points, $h_R$ is the range of spatial dependence, $m$ is the number of point pairs within the range of spatial dependence, and $l$ is the total length of contours

Aspect | Coarsest legible resolution | Finest legible resolution | Recommended compromise
Working scale | $p \le SN \cdot 0.0025$ | $p \ge SN \cdot 0.0001$ | $p = SN \cdot 0.0005$
GPS positioning error | $p \le 1.8 \cdot r_{E(P=99\%)}$ | $p \ge \bar{r}_E \cdot \sqrt{\pi}$ | $p = 1.8 \cdot r_{E(P=95\%)}$
Size of reference objects | $p \le \sqrt{\bar{a}/4}$ | $p \ge w_{MLD}/2$ | $p = \sqrt{a_{MLD}/4}$
Inspection density | $p \le 0.1 \cdot \sqrt{A/N}$ | $p \ge 0.05 \cdot \sqrt{A/N}$ | $p = 0.0791 \cdot \sqrt{A/N}$
Distance between points | $p \le \bar{h}_{ij}/2$ | $p \ge h_{ij(P=5\%)}$ | $p = (0.25\text{–}0.5) \cdot \sqrt{A/N}$
Spatial dependence structure | $p \le h_R/2$ | $p \ge h_{ij(P=5\%)}$ | $p = h_R/\sqrt[3]{m}$
Complexity of terrain | $p \le A/\sum l$ | $p \ge w_{MLD}/2$ | $p = A/(2 \cdot \sum l)$
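As an illustration of how the summary equations could be wrapped into a small calculator, the sketch below implements only the working-scale row of Table 1. The function name and the returned keys are hypothetical; the three coefficients are the ones listed for the working scale.

```python
def resolutions_from_scale(scale_number):
    """Coarsest, finest and recommended grid resolution (m) for a 1:scale_number map."""
    return {
        "coarsest_m": scale_number * 0.0025,
        "finest_m": scale_number * 0.0001,
        "recommended_m": scale_number * 0.0005,
    }


for sn in (5_000, 25_000, 50_000):
    res = resolutions_from_scale(sn)
    print(sn, {name: round(value, 2) for name, value in res.items()})
```

For a 1:25 K map this gives a recommended resolution of 12.5 m, which agrees with the 12 m quoted for the Wesepe data in Example 3. The other rows of the table could be added in the same way.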
that it can be fairly time consuming to test all possible combinations of different grid resolutions. In addition, the graph of prediction power versus grid resolution might show different peaks for different target variables, so that we cannot select a single optimal grid resolution. Finally, this grid resolution is then valid only for this study area and its effects might be different outside the area. The second analytical approach to select the resolution is to derive the information content for different block size values and then select the one that offers the richest amount of information per unit area (Bishop et al., 2001). In this case, only a single combination of the prediction model and block size should yield the highest information content. Such principles still need to be refined and tested in various case studies to see if they really lead to optimal grid resolutions with maximum information content for a given level of detail.

Selection of the right pixel size will remain an issue that is relative to application type and project objectives. Moreover, the standard grid resolutions with which we work today (20–200 m in most cases) will soon shift toward finer and finer values, which means that we also need to consider grid resolution in the context of time. No absolute ideal pixel size exists, that is for sure. One should at least try to avoid using resolutions that do not comply with the inherent properties of the input datasets.

References
Arnaud, M., Flori, A., 1998. Bias and precision of different sampling methods for GPS positions. Photogrammetric Engineering & Remote Sensing 6 (June), 597–600.
August, P., Michaud, J., Labash, C., Smith, C., 1994. GPS for environmental applications: accuracy and precision of locational data. Photogrammetric Engineering & Remote Sensing 60 (1), 41–45.
Avery, B., 1987. Soil survey methods: a review. Technical Monograph No. 18. Soil Survey & Land Resource Centre, Silsoe, Bedfordshire, England, 86pp.
Bishop, T.F.A., McBratney, A.B., Whelan, B.M., 2001. Measuring the quality of digital soil maps using information criteria. Geoderma 103 (1), 95–111.
Boots, B., Getis, A., 1988. Point Pattern Analysis. Scientific Geography Series, vol. 8. Sage, Newbury Park, CA, 224pp.
Borkowski, A., Meier, S., 1994. A procedure for estimating the grid cell size of digital terrain models derived from topographic maps. Geo-Informations-System 7 (1), 2–5.
Davies, A.M., Kwong, S.C., Flather, R., 2000. On determining the role of wind wave turbulence and grid resolution upon computed storm driven currents. Continental Shelf Research 20 (14), 1825–1888.
De By, R. (Ed.), 2001. Principles of Geographical Information Systems. ITC Educational Textbook Series, vol. 1. ITC, Enschede, The Netherlands, 466pp.
De Gruijter, J.J., Walvoort, D.J.J., van Gaans, P.F.M., 1997. Continuous soil maps: a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma 77 (2–4), 169–195.
DeMers, M.N., 2001. GIS Modeling in Raster. GIS & Remote Sensing. Wiley, West Sussex, England, 208pp.
Dietrich, W., Reiss, R., Hsu, M., Montgomery, D., 1995. A process-based model for colluvial soil depth and shallow landsliding using digital elevation data. Hydrological Processes 9 (3–4), 383–400.
ESRI, 2002. Spatial Analyst Help Documentation. ArcGIS Users Guide. ESRI Inc., Redlands, CA, 563pp.
Florinsky, I., Kuryakova, G., 2000. Determination of grid size for digital terrain modelling in landscape investigations exemplified by soil moisture distribution at a micro-scale. International Journal of Geographical Information Science 14 (8), 815–832.
Garbrecht, J., Martz, L., 1994. Grid size dependency of parameters extracted from digital elevation models. Computers & Geosciences 20 (1), 85–87.
Gatrell, A., 1991. Concepts of space and geographical data. In: Longley, P., Goodchild, M., Maguire, D., Rhind, D. (Eds.), Geographic Information Systems, first ed., vol. 1. Wiley, West Sussex, UK, pp. 119–134.
Goodchild, M., 2001. Metrics of scale in remote sensing and GIS. International Journal of Applied Earth Observation and Geoinformation 3 (2), 114–120.
Hengl, T., Gruber, S., Shrestha, D., 2004. Reduction of errors in digital terrain parameters used in soil-landscape modelling. International Journal of Applied Earth Observation and Geoinformation (JAG) 5 (2), 97–112.
Heuvelink, G.B.M., Pebesma, E.J., 1999. Spatial aggregation and soil process modelling. Geoderma 89 (1–2), 47–65.
Hole, F., 1953. Suggested terminology for describing soil as three-dimensional bodies. Soil Science Society of America Proceedings 17, 131–135.
Izenman, A., 1991. Recent developments in nonparametric density estimation. Journal of the American Statistical Association 86 (413), 205–224.
Kienzle, S., 2004. The effect of DEM raster resolution on first order, second order and compound terrain derivatives. Transactions in GIS 8 (1), 83–112.
Kuo, W.-L., Steenhuis, T., McCulloch, C., Mohler, C., Weinstein, D., DeGloria, S., Swaney, D., 1999. Effect of grid size on runoff and soil moisture for a variable-source-area hydrology model. Water Resources Research 35 (11), 3419–3428.
Lagacherie, P., McBratney, A., 2005. Spatial soil information systems and spatial soil inference systems: perspectives for digital soil mapping. In: Lagacherie, P., McBratney, A., Voltz, M. (Eds.), Proceedings of the Global Workshop on Digital Soil Mapping, Montpellier 14–17 September 2004. Developments in Soil Science Series. Elsevier, INRA, Montpellier, pp. 1–15.
Liang, X., Guo, J., Leung, R., 2004. Assessment of the effects of spatial resolutions on daily water flux simulations. Journal of Hydrology 298 (1–4), 287–310.
Lillesand, T., Kiefer, R., 2000. Remote Sensing and Image Interpretation, fourth ed. Wiley, New York, NY, 715pp.
McBratney, A., 1998. Some considerations on methods for spatially aggregating and disaggregating soil information. Nutrient Cycling in Agroecosystems 50, 51–62.
McBratney, A., Mendonça Santos, M., Minasny, B., 2003. On digital soil mapping. Geoderma 117 (1–2), 3–52.
McQueen, J., Draxler, R., Rolph, G., 1995. Influence of grid size and terrain resolution on wind field predictions from an operational mesoscale model. Journal of Applied Meteorology 34 (10), 2166–2181.
Molnár, D.K., Julien, P.Y., 2000. Grid-size effects on surface runoff modeling. Journal of Hydrologic Engineering 5 (1), 8–16.
Noda, A., Niino, H., 2003. Critical grid size for simulating convective storms: a case study of the Del City supercell storm. Geophysical Research Letters 30 (16), 1–4.
Quattrochi, D., Goodchild, M. (Eds.), 1997. Scale in Remote Sensing and GIS. Mapping Sciences Series. Lewis Publishers, Boca Raton, FL, 432pp.
Rossiter, D.G., 2003. Methodology for Soil Resource Inventories, third ed. ITC Lecture Notes SOL.27. ITC, Enschede, The Netherlands, 110pp.
Rossiter, D.G., Hengl, T., 2002. Technical note: creating geometrically-correct photo-interpretations, photomosaics, and base maps for a project GIS. Technical Report, ITC, Department of Earth System Analysis, Enschede, The Netherlands, 29pp.
Rowlingson, B., Diggle, P., 1993. Splancs: spatial point pattern analysis code in S-plus. Computers & Geosciences 19 (5), 627–655.
Sanchez Rojas, R., 2002. GIS-based upland erosion modeling, geovisualization and grid size effects on erosion simulations with CASC2D-SED. Ph.D. Dissertation, Colorado State University, Fort Collins, CO.
Shannon, C.E., 1949. Communication in the presence of noise. Proceedings of the Institute of Radio Engineers 37 (1), 10–21.
Stein, A., Riley, J., Halberg, N., 2001. Issues of scale for environmental indicators. Agriculture, Ecosystems & Environment 87 (2), 215–232.
Tempfli, K., 1999. DTM accuracy assessment. In: ASPRS Annual Conference. Portland, p. 11.
Thompson, J., Bell, J., Butler, C., 2001. Digital elevation model resolution: effects on terrain attribute calculation and quantitative soil-landscape modeling. Geoderma 100, 67–89.
Valenzuela, C., Baumgardner, M., 1990. Selection of appropriate cell sizes for thematic maps. ITC Journal 3, 219–224.
Vieux, B., Needham, S., 1993. Nonpoint-pollution model sensitivity to grid-cell size. Journal of Water Resources Planning and Management 119 (2), 141–157.
Vink, A., 1975. Land Use in Advancing Agriculture, vol. X. Springer, New York, NY, 394pp.
Weihua, Z., Montgomery, D., 1994. Digital elevation model grid size, landscape representation, and hydrologic simulations. Water Resources Research 30 (4), 1019–1028.
Wilson, J., Repetto, P., Snyder, R., 2000. Effect of data source, grid resolution, and flow-routing method on computed topographic attributes. In: Wilson, J.P., Gallant, J.C. (Eds.), Terrain Analysis: Principles and Applications. Wiley, New York, NY, pp. 133–161.
Yoder, C., 1995. Astrometric and geodetic properties of earth and the solar system. In: Global Earth Physics. A Handbook of Physical Constants, vol. AGU Reference Shelf 1. American Geophysical Union, Washington, DC.
