
Designing field studies in soil science

D. J. Pennock
Department of Soil Science, University of Saskatchewan, 51 Campus Drive, Saskatoon, Saskatchewan, Canada S7N 5A8 (e-mail: pennock@sask.usask.ca). Received 22 May 2003, accepted 1 December 2003.
Pennock, D. J. 2004. Designing field studies in soil science. Can. J. Soil Sci. 84: 1–10. Field research in soil science ranges from modal profile descriptions in support of soil survey to elaborate manipulative experimental designs. All of these field approaches make a valuable contribution to soil science, but researchers who do not use either classical manipulative experimental or geostatistical designs have little guidance (or encouragement) available to them. Well-designed field research of any type requires a clear definition of the research question; a thorough review of the literature to establish the state of knowledge; definition of the population under study and the elements that comprise it; and choice of appropriate scales for sampling support, spacing, and study extent based on an understanding of the underlying processes. For studies where hypothesis testing is appropriate, the hypotheses should be based on sound biological or physical reasoning, and sufficient replicates should be taken to ensure a reliable test. The major challenge in field research design is the development of landscape-scale research designs to examine complex interactions among hydrological, climatic, chemical, and biological processes at scales relevant for environmental management.

Key words: Research design, landscape-scale, soil genesis, pattern studies, hypothesis testing, spatial statistics, sampling, cesium

Pennock, D. J. 2004. La conception d'études sur le terrain en science des sols. Can. J. Soil Sci. 84: 1–10. Dans la science des sols, la recherche sur le terrain va de la description modale de profils pour faciliter les levés en pédologie à la conception d'expériences exigeant des manipulations complexes. Ces approches apportent une contribution utile à cette science, mais les chercheurs qui ne font pas appel aux manipulations classiques ou aux analyses géostatistiques ont peu pour les guider (voire les encourager). Une expérience sur le terrain bien conçue nécessite une définition claire du but recherché, un dépouillement méticuleux de ce qui s'est écrit afin d'établir l'état des connaissances, la circonscription de la population à l'étude et des éléments qui la composent et, enfin, le choix d'échelles d'une grandeur convenable pour planifier plus aisément l'échantillonnage, l'espacement des prélèvements et l'envergure de l'expérience en regard de nos connaissances sur les processus sous-jacents. Quand l'étude part d'une hypothèse qu'il faut vérifier, cette dernière devrait s'appuyer sur un solide raisonnement biologique ou physique, et le nombre de répétitions devrait suffire à garantir la fiabilité de l'expérience. La principale difficulté lorsqu'on conçoit une étude sur le terrain consiste à dresser un plan expérimental à l'échelle du relief qui permettra l'étude des interactions complexes entre les processus hydrologiques, climatiques, chimiques et biologiques à l'œuvre, à une échelle adaptée à la gérance de l'environnement.

Mots clés: Plan expérimental, échelle du relief, genèse des sols, études de motifs, vérification d'hypothèse, statistiques spatiales, échantillonnage, césium

"Statistical analysis and interpretation are the least critical aspects of experimentation, in that if purely statistical or interpretative errors are made, the data can be reanalyzed. On the other hand, the only complete remedy for design or execution errors is repetition of the experiment." (Hurlbert 1984, p. 189)

Can. J. Soil. Sci. Downloaded from pubs.aic.ca by 187.78.55.223 on 03/08/12 For personal use only.

Field research in soil science encompasses a great range of activities yet guidance on the design of field research programs is confined to only a few of these activities. Our colleagues who deal with classical agronomic trials have research design and statistical guidance available at every level from the simple cookbook to the most esoteric of explorations. In more recent times, pedometricians have established a range of appropriate field-sampling designs for geostatistical and related spatial statistical problems. Many of the rest of us, however, are somewhere between these two well-charted research pathways. Often the objects we wish to study cannot be readily manipulated and imposed as treatments, which greatly complicates the application of classical agronomic research designs. The rigorous lay-out and high sample number requirements for many pedometric sampling strategies can be daunting, and there are many research questions these approaches are ill-suited to explore. My purpose in this paper is to categorize the full range
of field studies undertaken by soil scientists and then to discuss the design of the field-sampling programs appropriate for each of the categories. The main emphasis is placed on studies outside of the classical agronomic and pedometric frameworks. Although critical evaluation of field research studies has not been a major focus in soil science, it has engaged field ecologists, hydrologists and physical geographers to a much greater degree. The structure of this paper closely reflects the influential paper by Eberhardt and Thomas (1991) on the design of field studies in environmental science. They use two main criteria to classify field studies. The first criterion is the kind of events that are observed, which they divide into those where a distinct perturbation occurs (e.g., a major flood, toxic spill, or the establishment of a polluting industry in a region) and those where no distinct event occurs. The second, and more important, criterion is the degree of control exerted by the observer. In a controlled experiment,

CANADIAN JOURNAL OF SOIL SCIENCE

Table 1. Categories of field research in soil science

| Initiator | Category of study | Sampling strategy | Description |
|---|---|---|---|
| Private sector | Contaminant survey | Judgment | Delineation of type and severity of contamination. Sampling strategy may be dictated by appropriate regulatory requirement |
| Public sector | Soil survey | Judgment (free survey) | Mapping of soil taxonomic unit distribution accompanied by descriptive summaries of modal soil profiles |
| Public sector | Monitoring | Judgment and probability | Determination of current status and trends for natural resources |
| Researcher | Soil geomorphological/pedological | Judgment | Interpretation of soil or landform evolution based on description and analysis of high resolution soil stratigraphical sections |
| Researcher | Pedometric/geostatistical | Model-based | Values of property modeled as a stochastic process; used to define spatial means, variances, and in interpolation of values at unsampled positions |
| Researcher | Perturbation | Design-based/probability | Examination of the effects of a distinct perturbation (e.g., flooding, clear-cut forest harvest) |
| Researcher | Pattern | Design-based/probability | Hypothesis generation and testing based on analysis of spatial and temporal patterns |
| Researcher | Model support | Judgment, model and design-based | Studies undertaken to estimate parameters of physically based models, to develop functional relationships between variables (e.g., pedotransfer functions) or to test simulations produced by models |
| Researcher | Comparative mensurative | Design-based/probability | Hypothesis testing for differences and correlations among classes that cannot be randomized by the researcher (e.g., topographical position, soil textural class) |
| Researcher | Manipulative | Design-based/probability | Hypothesis testing based on comparisons among treatments that are imposed by the researcher (e.g., fertilizer rate and formulation) |

Table 2. Number of samples required to achieve a power of 0.80 in a two-sample, two-tailed t-test. The effect size is expressed as the real difference (in percent) between the means of the two groups at a constant coefficient of variation of 20%. Columns give the probability of type I error (α); entries are the number of samples required for power of 0.80.

| Effect size (% real difference between means) | α = 0.01 | α = 0.05 | α = 0.10 | α = 0.20 |
|---|---|---|---|---|
| 5 | 376 | 253 | 199 | 145 |
| 10 | 96 | 64 | 51 | 37 |
| 15 | 44 | 29 | 23 | 17 |
| 20 | 26 | 17 | 14 | 10 |
| 25 | 17 | 12 | 9 | 7 |
| 30 | 13 | 9 | 7 | 5 |
| 40 | 8 | 6 | 4 | 3 |
| 50 | 6 | 4 | 3 | 3 |
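The sample numbers in Table 2 can be approximated with a short calculation. The sketch below is illustrative only and is not necessarily the method used to build the table: it assumes the tabulated values are samples per group, uses the normal approximation for a two-sample, two-tailed test, and adds a commonly used small-sample correction term (z²/4). The function name `samples_per_group` and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_pct, cv_pct=20.0, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sample, two-tailed t-test.

    effect_pct: real difference between the two means, as % of the mean.
    cv_pct:     coefficient of variation (%), assumed equal in both groups.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)              # quantile for the desired power
    d = effect_pct / cv_pct                          # difference in standard-deviation units
    # Normal approximation plus a small-sample correction (an assumption of this sketch).
    n = 2.0 * (z_alpha + z_power) ** 2 / d ** 2 + z_alpha ** 2 / 4.0
    return ceil(n)


for alpha in (0.01, 0.05, 0.10, 0.20):
    print(alpha, samples_per_group(5, alpha=alpha))
```

For a 5% difference between means the sketch returns 376, 253, 199, and 145 samples for α = 0.01, 0.05, 0.10, and 0.20, in agreement with the first row of Table 2. The approximation becomes unreliable when only a handful of samples are required, so it should not be expected to reproduce every entry at the largest effect sizes.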

treatments are imposed by the researcher to measure the response of the property under study. In uncontrolled studies, the treatments are typically a selection of classes or strata that differ in some characteristic that is not under the control of the researcher (different soil textural classes or landform positions, for example). This distinction corresponds to the widely cited distinction between manipulative and mensurative experiments made by Hurlbert (1984).

Classifying research studies is much like classifying soils: there are a few textbook examples, but typically the profiles or studies encountered seem to fit into two or more classes. My intent here is to develop categories of studies that have different sampling approaches associated with them, rather than to attempt to classify individual research papers into one category or another.

CATEGORIES OF FIELD RESEARCH IN SOIL SCIENCE

An initial distinction can be drawn between research whose primary objective is the advancement of knowledge through publication in peer-reviewed journals and research that is undertaken to meet specific public or private sector objectives. In the latter case the research design is often dictated (at least in part) by the initiating agency.

PENNOCK DESIGNING FIELD STUDIES IN SOIL SCIENCE

The first three categories of research (contaminant survey, soil survey, and monitoring studies) are typically initiated by private or public sector agencies (Table 1). The remainder are researcher-initiated or curiosity-driven projects, assuming, of course, that you can find a funding agency that shares your curiosity!

(1) Contaminant Surveys

These types of field studies are most typically undertaken by private-sector environmental consultants, and the specific objective may range from an initial evaluation of the extent of contamination to the final stage of remediation of the problem. Laslett (1997) states that consultants who undertake these surveys almost always employ judgment sampling (also called purposive sampling). Typically they prefer to place their samples where their experience and knowledge of site history tell them the contamination might be located. Judgment sampling can result in accurate estimates of population parameters such as means and totals, but cannot provide a measure of the accuracy of these estimates (Gilbert 1987). In many jurisdictions the sampling design may also be constrained by the appropriate regulatory framework.

(2) Soil Survey

The objective of these studies is to map the distribution of soil taxonomic units and to provide descriptive summaries of the main properties of the soils (or, more simply, soil survey). Basher (1997) observed that government-funded soil survey is in decline throughout much of the developed world due to the combined effects of budgetary cuts and (often) the lack of an obvious user group for the products. Special-purpose inventories of soils are one of the core activities in soils-related environmental consulting. In soil survey or inventory studies the association between soil classes and landscape units is established in the field by judicious selection of sampling points and thorough description of the profiles (often termed the free survey approach).
When exercised by a soil scientist whose experience allows a wise selection and an informed interpretation of the profile, this type of judgment sampling can be an extremely efficient way of completing the inventory. In soil survey, judgment-based sampling is often complemented with probability sampling, such as systematic grid or transect sampling in selected units, as a quality control measure and to generate variates of selected properties.

(3) Monitoring Studies

The goal of these studies is to assess the current status and trends of different natural resources. Olsen et al. (1999) surveyed the statistical issues surrounding the major national inventory programs in the United States. They found that a considerable range of sampling designs was being used and that generalization was difficult due to the wide range in objectives among the agencies that initiated the monitoring studies. Given the diversity of possible designs, they emphasized the need to have a clear and concise statement of monitoring objectives prior to selecting a specific design.

(4) Pedological or Soil Geomorphic Studies

The focus in these studies is on past events, more specifically on the processes that formed the soil properties or landscapes under study and the environments that controlled the rates of these processes. Two branches of this approach were evident in the past: pedon-scale studies and soil geomorphic or landscape evolution studies. Pedon-scale studies were closely associated with the development of soil taxonomic systems and (like the classification systems themselves) focus on vertical, intrapedon processes. The roots of the soil-geomorphic approach are in Quaternary geology and soil science, and soil geomorphologists focused on lateral transfer processes and historical landscape evolution, the latter often through the tools of paleopedology (Basher 1997).

The geomorphologist Stanley Schumm summarized the need for historical reconstruction in the earth sciences in his examination of fluvial geomorphology: "Thus it is possible to view the fluvial system as a physical system or a historical system. In actuality the fluvial system is a physical system with a history. Hence the objective of the geomorphologist is to understand not only the physics and chemistry of the landscape, but its alteration and evolution through time" (Schumm 1977, p. 10). Equally, there are very few soil properties that can be explained in isolation from their historical and landscape context: "A soil sample in the laboratory is nothing more than a bag of dirt. That bag of dirt becomes a useful research sample only if we know the field relations it represents" (Daniels 1988, p. 1518). Given the great interest in the effect of current and predicted climate change on soils, it is unfortunate that the effect of past climates on soil and landscape evolution is so little studied now by North American soil scientists.
In pedological or soil geomorphic studies, modal profiles or highly resolved exposures are located, and their pedological or sedimentological attributes are carefully described and sampled. For the soil geomorphic studies, the stratigraphical extent of the formation is assessed and (ideally) relative or absolute dating techniques are used to develop a chronology of events. The field observations are combined with laboratory analyses of stable soil properties to develop an interpretation of the soil-forming processes. The oft-cited book by Birkeland (1999) provides a thorough examination of this approach, as does the review by Wysocki et al. (2000). As Phillips (2001) points out in his article on contingency and generalization in pedology, there is rarely a single, definitive interpretation that emerges in these studies: multiple working hypotheses are not only intellectually desirable but may also all be true.

(5) Geostatistical/Pedometric Studies

This category includes a wide range of approaches (Yates and Warrick 2002; McBratney et al. 2002). Typically these studies are undertaken to quantify the spatial pattern of soil properties, to use this quantification in the interpolation of values at unsampled locations, to assess the suitability of different spatial models for processes, or to design efficient sampling programs.

Brus and de Gruijter (1997) argued that there is a fundamental distinction between geostatistical studies and those based on more classical statistical survey principles (including the remaining categories of studies defined in this paper). Geostatistical studies use model-based sampling strategies where the pedogenic process that has led to the field of values for a given property is modeled as a stochastic process. The specific realization of the population that is sampled is but one of an infinite set of populations that could be produced by the model. For model-based sampling strategies randomness is inherent to the modelling process, and non-probability-based sampling designs can be accommodated in the analysis. In the alternative, design-based sampling strategies, the field of values that we are to sample is assumed to be fixed (at least at the specific time of sampling), and stochasticity is due solely to the probability-based sampling design used and to any measurement error. The sampling design used confers a design-specific independence on the data regardless of the spatial continuity of the underlying processes. Brus and de Gruijter (1997) stated that the independence of sample data required by classical statistical analysis is met by this design-independence, which is contrary to information contained in many published sources.

(6) Perturbation Studies

Eberhardt and Thomas (1991) categorized studies triggered by a distinct perturbation as intervention or perturbation studies. In ecology, perturbation studies often focus on the response of population characteristics to the perturbation using time series analysis. Ideally, these studies are carried out before and after the perturbation, and the post-perturbation results are compared to some control or unperturbed system (Eberhardt and Thomas 1991). These studies are uncommon in the research literature in soil science but are central to environmental pollution monitoring.
Gilbert (1987) provides an excellent overview of sampling methods in environmental pollution monitoring.

(7) Pattern Studies

These studies are undertaken to assess and explain the spatial or temporal pattern of properties (Eberhardt and Thomas 1991). Two major (and overlapping) themes exist in pattern studies: the quantification of the spatial and temporal variability in properties, and hypothesis generation and testing using point patterns (Underwood et al. 2000). The former are (typically) done with the explicit purpose of evaluating variability; the latter are often much less clearly designed, and the hypotheses being tested are (too often) fuzzy and/or undefined. Compared to the rich literature on the use of geostatistics/pedometrics, soil scientists have paid little attention to the hypothesis generation and testing potential of pattern studies. Underwood et al. (2000) stated that pattern studies occupy a clear niche in the evolution of ecological research: "There is no possible doubt that observations of patterns or lack of patterns are the fundamental starting-blocks for ecological study. Until patterns have been described, there is no basis for invoking explanatory models about processes. Nor is there any mechanism for understanding the scale and scope of any process that may be operating" (p. 108).

In some cases, the hypothesis testing may involve formal testing, but in many others the comparison between measured and hypothesized patterns may be made using other criteria, including simple visual comparisons (Grayson and Blöschl 2000). This multi-faceted role of pattern studies is evident in many areas of soil science. For example, Govers et al. (1999) pointed out that spatial pattern studies on soil erosion using 137Cs redistribution techniques were critical in the discovery of tillage redistribution of soil by soil scientists and geomorphologists. Multiple studies in cultivated fields found that consistently high rates of soil loss were associated with slope segments that had convex downslope profiles. Soil loss from these segments could not be readily explained by water or wind erosion, and ultimately tillage redistribution (which had been previously documented and explored in the agricultural engineering literature) was recognized as the most probable explanation for the observed pattern (Govers et al. 1999).

The nitrous oxide emission study of Wagner-Riddle et al. (1997) is an example of a temporal pattern study. They used micrometeorological techniques and a Tunable Diode Laser Trace Gas Analyzer to generate a quasi-continuous record of N2O emissions from four fields that had different crops over a 28-mo period. The crops are unreplicated, and therefore an unambiguous causal linkage between the crops and emissions cannot be made; however, Groffman et al. (2000) argued that these multi-year, quasi-continuous studies are essential to challenge our existing ideas about the factors that control annual emissions.

Hypothesis testing using patterns in hydrology has recently been explored in detail in the volume edited by Grayson and Blöschl (2000).
One of their main objectives is the comparison of measured patterns of hydrological properties to simulated patterns produced by process modeling. The methods used for comparisons include purely visual comparisons and point-by-point comparisons based on direct comparison of points and mapping of residuals between observed and simulated patterns. They suggest that more refined and insightful use of spatial patterns is vital for progressing the science of hydrology and for better constraining the uncertainty in our predictions of the future (p. 78).

Underwood et al. (2000) pointed out that pattern studies can supply necessary support for a hypothesis but cannot provide sufficient evidence for its acceptance, because other, alternative processes could often lead to the same prediction. This indeterminacy of field observations is also emphasized by Phillips (2001). For example, the study of Wagner-Riddle et al. (1997) clearly indicated the importance of freeze-thaw processes for annual N2O emissions but could not identify the specific mechanism involved. The pattern correlation study of Pennock et al. (1992) implies a causal linkage between properties, but uses a technique (correlation analysis) which, by itself, cannot establish causation (Webster 1989). This reinforces the initial point of Underwood et al. (2000) that these studies have a clear role to play at an early stage of the development of a particular area of study, but the hypotheses generated by them must be more rigorously tested using other designs.

Fig. 1. Schematic diagram illustrating the concepts of support, spacing, and extent for a grid sampling design on a near-level landscape [adapted from Blöschl and Sivapalan (1995)].

(8) Modelling Support Studies

Eberhardt and Thomas (1991) discussed a distinct type of study (sampling for modelling) that is undertaken for efficient estimation of parameters for models. Their definition is a strict one, but this category is used in a more general way in my paper. In soil science (and most particularly in soil physics and hydrology) many field studies are undertaken to support modelling: to estimate parameters required in physically based models, to develop functional relations (Webster 1989) between variables, or to test simulations produced by models. The development of pedotransfer functions (Bouma 1989) is a particular category of this type of study. The sampling designs used in these studies range from judgment selection of points through to very elaborate manipulative designs.

The final categories of experiments are all primarily concerned with the drawing of comparisons between groups. They differ in the ability of the observer to impose the treatments, and correspond to the well-known mensurative/manipulative distinction made by Hurlbert (1984).

(9) Comparative Mensurative Studies

In comparative mensurative experiments, comparisons are made between classes that the researcher defines but cannot control: sites grouped by different soil textures, soil zones, landform positions, soil taxonomic classes, and drainage classes are all examples. Their location cannot be randomized by the researcher, unlike treatments such as tillage type or fertilizer rates.

Eberhardt and Thomas (1991) distinguish between studies where the whole population is available for sampling and those where a deliberate selection of contrasting parts of the population is made. In soil science, this latter type would typically be called indicator studies. Studies on landform and soil erosion relationships in Saskatchewan using 137Cs can be used to illustrate this distinction. Initial spatial pattern studies of 137Cs-estimated soil erosion suggested that distinct ranges of soil loss or gain were associated with quantitatively defined landform elements (Martz and de Jong 1987; Pennock and de Jong 1987). Whole-population studies using sampling grids that spanned the range of landforms at the sampled sites were used to test this association between landform elements and soil redistribution (Pennock and de Jong 1990). These studies consistently showed that doubly convex slope segments [termed divergent shoulder (DSH) elements] had the highest rates of soil loss. This knowledge was used in Pennock et al. (1995) to compare maximum erosion rates of five parent materials by only sampling DSH elements. Their results provide a relative ranking or indication of the different parent materials' susceptibility to erosion, but only if the assumptions of the whole-population studies are valid.

(10) Manipulative Experiments

In manipulative studies the treatments can be imposed by the researcher, ideally as fixed amounts that are applied exactly (or at least as exactly as the implementation allows). Eberhardt and Thomas (1991) argued that these are the only type of research design to which the term experiment should be attached. Any issue of the Canadian Journal of Soil Science will contain several such studies, and many agricultural research design texts give excellent guidance for the selection and implementation of these methods. The main issues facing soil scientists when selecting these techniques are associated with replication. These issues are also common to the comparative mensurative techniques and will be discussed below.

CONSIDERATIONS IN THE DESIGN OF FIELD PROGRAMS

Selection of the Appropriate Category

Selection of the appropriate approach can be made only after formulation of a clear research question. If my research question is a soil geomorphic one, the success of the project will hinge on my ability to find, describe, and convincingly interpret soil stratigraphical sections that yield a highly resolved environmental record. Choice of an appropriate field design for other research questions will depend in large part on the information already available. For a newly discovered property [e.g., amino sugar N (Khan et al. 2001) or charcoal in soil organic matter (Skjemstad et al. 2002)], inventory or pattern studies in a new region would be appropriate. On the other hand, it is difficult to see what an inventory or single-site pattern study would add to our understanding of soil nitrate or organic carbon in almost any agricultural region in Canada. After a thorough review of the literature you may pose questions about these properties that an innovative manipulative design could successfully address; indeed the development of such designs is the hallmark of critical science and the making of great scientists. A thorough review of what is known is essential to determine how to design research on what is (as yet) unknown.

Definition of Population and Experimental Units

The research question also defines the population that the study's findings will pertain to, sometimes referred to as the inference space (at least when using inferential statistics). A clear definition of the population is essential for successful sampling design.
The population consists of all possible objects that share some common characteristic; the sampling design or experimental design will specify how a subset of those objects will be drawn from the population (Steel and Torrie 1980). In manipulative studies, the treatments are imposed on individual experimental units or experimental plots that can be randomly situated at the site; the size, shape and arrangement of the experimental units are decided upon by the researchers (Steel and Torrie 1980).

For pattern studies and comparative mensurative studies, the population is composed of objects whose placement cannot be controlled by the researcher. Often the area that a single observation pertains to is much less obvious than in a manipulative study. The range of spatial dependence can be quantitatively assessed using geostatistics, and sufficient studies have been completed to give the approximate range for many properties [see Mulla and McBratney (2000) for a recent summary]. The use of geostatistics prior to the design of a sampling program would be ideal but is a logistical and funding obstacle for many researchers.

For many field research projects in soil science, hydrology, and plant ecology, the elements of the population are topographically defined (Rowe 1984; Hudson 1992; Grayson and Blöschl 2000). Grayson and Blöschl (2000) discuss four main types of topographically defined elements, which they generally call model elements. Many approaches to landform segmentation have been developed, which divide landforms into quantitatively defined landform elements (Speight 1968). In other studies, the population may be composed of soil map polygons, textural groupings, or drainage classes. Based on the terminology of researchers such as Speight (1968) and Grayson and Blöschl (2000), the general term population elements will be used in the remainder of this paper for the objects that comprise the population. The term sample will pertain to the specific population elements that are drawn from the population in a particular sampling design; the sampling design specifies how the specific elements are selected (following well-established usage of these terms).

Scale Issues

Scale refers to a characteristic length (or time) of a process, observation, or model (Blöschl and Sivapalan 1995). Grayson and Blöschl (2000) examine three aspects of scale (Fig. 1). Support is a geostatistical term and refers to the volume of a spatial sample (e.g., core diameter) or the measurement length of a temporal sample. This is also referred to as the sampling unit (Steel and Torrie 1980) or grain (McBratney 1998). Spacing pertains to the distance (or elapsed time) between successive sampling units and, more generally, the layout of the samples in the study area. The extent is the total duration of the temporal series of samples or the length (or area) sampled in a spatial sampling program. Defined in this way, extent is the spatial or temporal definition of the population and follows directly from the definition of the population.

Issues of extent are relevant for the design of all field studies. A common theme in the ecological (e.g., Carpenter et al. 1998; Gustafson 1998; Schindler 1998; Underwood et al. 2000; Groffman et al.
2000; Haag and Matschonat 2001) and hydrological (Seyfried and Wilcox 1995; Western et al. 1999) literature is the need to match the extent of the study with the scale of the process (or processes) under study. Seyfried and Wilcox (1995) provided several examples of the connection between extent and process. At one extreme, the processes of surface runoff and infiltration at one study site were controlled by shrub spacing, and the relevant scale was between 0.3 and 8 m. At the coarsest scale, variability in snowfall distribution at different elevations in a large catchment was due to the effect of prevailing wind direction during storms, and the relevant extent for this study was between 2000 and 15 000 m. Hence, the appropriate extent of a study must be carefully matched to the processes that the researcher believes are operating in the area.

Replication Issues

Questions about scale are closely linked to the issue of replication in hypothesis-testing experiments. In a manipulative study, replication means the repeated imposition of a set of treatments. In a mensurative or a pattern study, the repeated, unbiased selection and sampling of population elements constitutes replication [albeit, as Eberhardt and Thomas (1991) point out, of a very different kind than in manipulative experiments]. Many classical agronomic experiments involve both types of replication; for example, fertilizer trials at sites with different textures involve imposed treatments (fertilizer rates) and inherent treatments or site characteristics (soil texture).

Replication is used to provide an estimate of experimental error, to improve precision by reducing the standard deviation of treatment or class means, to increase the scope of inference, and to effect control of the error variance (Steel and Torrie 1980). The estimation of the experimental error is required for tests of significance. In a standard, small-plot experimental design, lack of sufficient replication is inexcusable, and studies that fail to have sufficient replication should not be published.

For comparative mensurative, pattern or model support studies, the question of what constitutes a replicate is decided by the research question and the definition of the population. For example, a study on the relationship between convex profile curvature and rates of soil loss could be carried out in a single field if the field contained multiple hills with pronounced convex profile curvature (i.e., shoulder elements). The researcher could use intensive field sampling and gamma spectroscopy to measure the 137Cs concentration in each of the shoulder elements, and then use conversion relationships to provide a time-integrated 40-yr measurement of soil loss due to erosional processes. The correlation between soil loss and measured curvature at each sampled point could then be assessed or a regression relationship developed. If, however, the researcher assumed that the results pertained to all fields in the study region or to even larger units such as soil zones, then they would be guilty of pseudoreplication (Hurlbert 1984).
Pseudoreplication is a consequence of "the actual physical space over which samples are taken or measurements made being smaller or more restricted than the inference space implicit in the hypothesis being tested" (Hurlbert 1984, p. 190). It occurs when researchers claim that their work demonstrates a more general effect when, in fact, the error terms used in the statistical analysis reflect only the variation in a treatment or class replicate (Raffaelli and Moller 2000; Underwood 1997). The 137Cs study discussed above provides a time-integrated measure of the soil redistribution pattern in the sampled field, but each field in the region has its own history of erosion; for example, catastrophic rainfall events may affect adjacent fields very differently, depending on the crop or the residue cover on the surface during the event. Each field constitutes one element of the population of possible erosional responses; sampling of multiple fields (i.e., population elements) will allow the parameters of that population to be estimated. Multiple cores from one field are a sub-sample of one field with a particular erosional history; assuming that these sub-samples are independent replicates of the range of possible erosional histories is pseudoreplication. Clearly, the way to avoid pseudoreplication is to have a clear definition of the population to which the study will pertain and of the elements that comprise that population.
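The distinction between cores and fields as replicates can be illustrated with a small numerical sketch. The soil loss values, field labels, and the helper name `standard_error` below are invented for illustration and do not come from any study cited here. The sketch compares the standard error obtained when every core is treated as an independent replicate with the standard error obtained when each field (the population element) contributes a single mean.

```python
import math
import statistics

# Hypothetical 137Cs-derived soil loss estimates (Mg ha^-1 yr^-1):
# five cores from each of four fields. All values are invented.
cores_by_field = {
    "field_A": [12.1, 11.4, 12.8, 11.9, 12.3],
    "field_B": [20.5, 21.2, 19.8, 20.9, 20.1],
    "field_C": [6.3, 5.9, 6.8, 6.1, 6.4],
    "field_D": [15.2, 14.7, 15.9, 15.4, 14.8],
}


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Pseudoreplicated analysis: every core is treated as an independent replicate.
all_cores = [value for cores in cores_by_field.values() for value in cores]
naive_se = standard_error(all_cores)

# Field-level analysis: each field is one population element, so the
# replicates are the field means and n is the number of fields.
field_means = [statistics.mean(cores) for cores in cores_by_field.values()]
field_se = standard_error(field_means)

print(f"cores as replicates:  n = {len(all_cores)}, SE = {naive_se:.2f}")
print(f"fields as replicates: n = {len(field_means)}, SE = {field_se:.2f}")
```

With these invented values the core-based standard error is smaller than the field-based one, because the 20 cores inflate the apparent number of replicates although only four erosional histories were sampled. Inference to the region should rest on the field-level error term.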

Hurlbert's (1984) definition of pseudoreplication also included a requirement for replicates to be spatially independent. This requirement for spatial independence has been challenged by subsequent authors (Underwood 1997; Raffaelli and Moller 2000). They argue that spatially non-independent replicates can be analyzed if the non-independence is explicitly incorporated into the analysis used, and Underwood's (1997) textbook deals at length with suitable methods. Moreover, Brus and de Gruijter (1997) stated that the need for spatial independence has been misunderstood: the design-independence inherent to specific probability-based sampling designs confers the required independence on the samples.

A final, but very important, issue concerning replication is that there are some things that cannot be replicated. An obvious category is catastrophes such as floods, pestilence, or major pollution events that will (one hopes) not be repeated. Another category is the post-establishment effects of major projects such as smelters or dams. The two cases differ insofar as the occurrence of the catastrophes cannot be foreseen, whereas the establishment of a power plant or dam could be. Both of these are types of perturbation analyses, and Eberhardt and Thomas (1991) suggest that comparison of the affected area with well-selected control areas (i.e., unaffected by the perturbation) is a viable approach in these types of studies.

There are also objects that cannot be replicated by virtue of their size or complexity. Schindler (1998) and Carpenter et al. (1998) argued that true replication is impossible among even small lakes such as those in the Experimental Lakes Area of northwestern Ontario. These authors make a strong case that we should not attempt to replicate systems at this level of complexity, but should instead have well-designed manipulations of contrasting systems and then analyze them using appropriate statistical techniques.
Schindler (1998) is especially critical of the use of replicated microcosms as surrogates for whole-ecosystem manipulations because of the inability of microcosms to include ecosystem-scale transfer and exchange processes. He concludes that proper upscaling from microcosm-type experiments is impossible, and that experiments at less than ecosystem scales are inappropriate for making ecosystem-scale predictions. In soil science the term "landscape-scale studies" has been commonly used instead of "ecosystem-scale studies". True landscape-scale studies require inclusion of land use effects, yet the extent of the landscape clearly limits our ability to truly replicate them and to impose treatments on them. Fortunately, the recent increase in publications by soil scientists in this area (e.g., Van Kessel and Wendroth 2001) seems to indicate a desire to grapple with these issues.

Once the researcher has clearly defined what constitutes a replicate, they are immediately confronted with the question of how many replicates are required. In the simplest type of hypothesis testing, two hypotheses are constructed: the null hypothesis (H0) of no difference between the two groups, and the alternative hypothesis of a significant difference occurring. The researcher chooses an α level to control the probability of rejecting the null hypothesis when it is actually true (i.e., of finding a difference between the two groups when none, in fact, existed in nature, or a Type I error).


For conservation research, Peterman (1990) argued that the consequences of committing a Type II error (i.e., of failing to reject the null hypothesis when it is, in fact, false) can be graver than those of a Type I error. The probability of failing to reject H0 when it is, in fact, false is designated as β, and the power of a test equals (1 − β). Low-power tests of hypotheses are unlikely to detect differences between two or more groups even when a difference does, in fact, exist in nature. Although the appropriate power level for a reliable test must reflect the gravity of the issue under examination, Peterman (1990) suggested that, in general, studies should be designed to achieve a β of at most 0.2 and hence a power of at least 0.80. Issues about the selection of these error probabilities for environmental pollution studies are explored in detail by Gilbert (1987).

The power of a test is a function of four factors: the α level chosen, the variance of the sampled groups, the number of samples (N) in each group, and the effect size (i.e., the actual difference between the groups). First, the smaller the probability of committing a Type I error (α), the lower the power (holding the other factors constant). Peterman (1990) questioned the uncritical acceptance of an α of 0.05 or 0.01 for conservation work, and argued that an α of 0.10 or greater is more appropriate for some designs. Second, the higher the variance in the groups, the lower the power. Soil properties tend to be inherently variable, and coefficients of variation (CV) range from about 10% for pH and porosity to 20% for SOC and texture and to greater than 100% for many water or solute transport properties (Mulla and McBratney 2000). This inherent variation must be taken into account in the selection of appropriate sample numbers. Power also decreases when the difference between the means of the sampled groups is small, or when the sample size is small. An assessment of the effects of the four factors that control power can (and should) be a required part of sampling design.
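The dependence of power on these four factors can be explored with a short calculation. The sketch below is a minimal illustration that assumes a normal approximation to the two-sample, two-tailed test with equal group sizes and equal variances; the function names `approx_power` and `samples_per_group` are hypothetical, and the input values are those of the no-till example discussed in this section (mean SOC of 50 Mg ha⁻¹, CV of 20%, expected gain of 5 Mg ha⁻¹, α of 0.10, target power of 0.80).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(effect, sd, n_per_group, alpha):
    """Approximate power of a two-sample, two-tailed test (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_diff = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    # Probability of exceeding the critical value in the direction of the
    # true difference; the negligible opposite tail is ignored.
    return _STD_NORMAL.cdf(abs(effect) / se_diff - z_crit)


def samples_per_group(effect, sd, alpha, power):
    """Smallest n per group that reaches the target power (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


mean_soc = 50.0        # Mg ha^-1, assumed starting SOC in both fields
cv = 0.20              # assumed coefficient of variation for SOC
sd = cv * mean_soc     # standard deviation implied by the CV (10 Mg ha^-1)
effect = 5.0           # Mg ha^-1, expected SOC gain under no-till

n_required = samples_per_group(effect, sd, alpha=0.10, power=0.80)
print(f"samples per group: {n_required}")
print(f"power with 25 per group: {approx_power(effect, sd, 25, 0.10):.2f}")
```

Under this approximation the sketch returns 50 samples per group. A calculation based on the t distribution gives a slightly larger number, which is consistent with the 51 samples per group reported for the no-till example. Lowering α or increasing the standard deviation in `approx_power` lowers the computed power, as described above.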
For example, a researcher wants to determine the effects of adoption of no-till on soil organic carbon. Two fields are being compared, and the researcher wants to know how many samples must be taken to detect a statistical difference at an α of 0.10 using a two-sample, two-tailed t-test. She believes that the two fields had SOC levels of about 50 Mg ha⁻¹ in the upper 15 cm before imposition of no-till on one of the fields 10 years previously. She also assumes that the CV for SOC is about 20%. Published summaries of SOC change due to no-till adoption in the Canadian prairies suggest that SOC gains may be as high as 5 Mg ha⁻¹ over the 10-year period (Janzen et al. 1998), yielding an effect size of approximately 10%. A power analysis indicates that she requires 51 samples from each group to detect a difference for this effect size at her chosen α of 0.10 (Table 1). If she had used only 25 samples per group, the P value of the t-test would equal about 0.24 and she would fail to reject the null hypothesis. Where the aim is to detect small changes in SOC content with time, well-structured samplings of the same sampling unit can allow much smaller differences to be detected over short periods (Ellert et al. 2002), and these specialized designs should be considered for this research question.

This example illustrates the well-deserved skepticism that many statisticians have concerning significance testing of differences. Webster (2001) pointed out that "if you take large enough samples you can establish that any soil is different from almost any other for whatever property of them that you care to choose" (p. 335) and that rejection of the null hypothesis does not mean that the difference you have detected is important or physically or biologically meaningful. The researcher should initially establish the differences among groups that would have real significance from a biological, physical, or (for many agronomic or environmental issues) economic perspective. Given what is known of the inherent variation of the property under study, an adequate number of samples can then be taken to ensure reliable difference testing of the hypotheses.

Sample Spacing and Layout

Several recent, comprehensive reviews have presented the options available for sampling design (Mulla and McBratney 2000; Pennock and Appleby 2002; de Gruijter 2003). Systematic sampling designs are commonly used in the soil and earth sciences. They are often criticized by statisticians [for reasons discussed in Eberhardt and Thomas (1991)], but the ease with which they can be used and the efficiency with which they gather information make them popular in the earth sciences. For example, Wolcott and Church (1991) found that for sampling of river gravels up to 500 randomly chosen points were required in order to yield the same quality of statistical information as 100 systematic grid samples. The major caution in the use of systematic sampling with a constant spacing is that the objects to be sampled must not be arranged in an orderly manner that might correspond to the spacing along a transect or grid.

Geostatistical studies ideally require observations to be made at two or more scales such that the spatial dependence can be evaluated at different spacings (McBratney 1998). For example, Lettner et al. (2000) used three grids with areas of 10 000 m², 100 m², and 1 m², each of which contained 81 sampling points, for a study on the spatial variability of 137Cs.
The smaller area grids were nested within the larger grid, such that 235 samples were taken in all.

The use of non-stratified, systematic designs may be very inefficient for mensurative sampling designs. Appropriate sample numbers for mensurative experiments can be gathered efficiently by an a priori placement of points into the relevant groups or strata, and then a random selection of points within each stratum until the desired number is reached. For example, Slobodian et al. (2002) laid out a grid on the surface of their sites and then classified each point in the grid into one of three landform element classes. Grid points were then randomly selected until 10 samples in each of the three groups were chosen. The summary statistics for the site as a whole can then be calculated by using design-specific estimators (de Gruijter 2002).

CONCLUSIONS AND RECOMMENDATIONS

The issues that should be considered in the design of a successful field research program can be summarized as:

1. A clear definition of the research question is the initial (and most critical) step. This definition dictates the type of research design that is appropriate and the specific design issues associated with different research types.
2. The appropriateness of a given research design can be judged only after a thorough review of what is known about the research question. Exploratory pattern studies can be very informative at an early stage of research, but yield little new information for well-established research topics. Equally, the imposition of a set of treatments when little is known of the processes controlling responses is unlikely to produce comprehensive interpretations.
3. There is never a good reason for haphazard sampling; the rationale for selecting sampling points in pedological, soil geomorphic, or inventory studies should be clearly stated.
4. A clear definition of the population and the elements that comprise the population under study is very important.
5. The definition of the population dictates the extent of the study and the physical or temporal space that the results pertain to, which is critical to avoid pseudoreplication.
6. The sample support, spacing, and extent of the study must be consistent with what is known of the processes controlling the phenomena being studied.
7. The construction of hypotheses for formal testing should be based on sound physical or biological reasoning, and sufficient samples should be taken to allow reliable testing of the alternative hypotheses.
8. The exclusion of phenomena because they cannot be replicated is inherently limiting to the expansion of our knowledge of soils. Innovative approaches must continue to be developed and applied so that we can expand the scale at which field studies can be undertaken.

This final point is a key one for the future of field research in soil science. The recognition of the importance of tillage translocation of soil can again illustrate this point. Erosion had been studied for decades using a well-established, widely disseminated manipulative research design: the standard erosion plots used to establish and revise the Universal Soil Loss Equation.
These plots had yielded a vast store of knowledge on the relationship between water erosion and site factors such as slope gradient, management, and soil factors. Yet the standard design of the experimental plots (rectangular plots located on rectilinear slope segments) excluded elements with downslope curvature where soil movement by tillage translocation is most evident (Govers et al. 1999). Recognition of the importance of tillage translocation could only occur when innovative field sampling techniques were coupled with 137Cs analysis of soils to yield our current understanding of the spatial pattern of redistribution. Progress in many areas of soil science awaits similar innovations in field research design.

A major limitation to innovation (and source of frustration for field researchers) is the uncritical application by reviewers or editors of research standards suitable for small-plot, manipulative studies to more complex field settings, for example, demands for replication of unreplicable, complex landscapes or for randomization of fixed landscape elements. If the authors of a manuscript have clearly documented their rationale for the design they selected (following the criteria presented at the beginning of this section) and have developed a coherent interpretation of the data gathered in the study, the research should be published. Subsequent testing of the interpretation by other researchers using ever more rigorous designs is the best measure of the ultimate importance of the research.

ACKNOWLEDGMENTS

My understanding of field research has been greatly improved by discussions with colleagues past and present, particularly Darwin Anderson, Chris Van Kessel, Richard Farrell, and Tom Veldkamp. My thanks also to the three reviewers of this paper, who greatly improved the manuscript and suggested a number of fine literature sources for citation.

REFERENCES

Basher, L. R. 1997. Is pedology dead and buried? Aust. J. Soil Res. 35: 979-994.
Birkeland, P. W. 1999. Soils and geomorphology. Oxford University Press, Oxford, UK. 430 pp.
Blöschl, G. and Sivapalan, M. 1995. Scale issues in hydrological modelling: A review. Hydro. Proc. 9: 251-290.
Bouma, J. 1989. Land qualities in space and time. Pages 3-13 in J. Bouma and A. K. Bregt, eds. Land qualities in space and time. PUDOC, Wageningen, The Netherlands.
Brus, D. J. and de Gruijter, J. J. 1997. Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil (with discussion). Geoderma 80: 1-44.
Carpenter, S. R., Cole, J. J., Essington, T. E., Hodgson, J. R., Houser, J. N., Kitchell, J. F. and Pace, M. L. 1998. Evaluating alternative explanations in ecosystem experiments. Ecosystems 1: 335-344.
Daniels, R. B. 1988. Pedology, a field or laboratory science? Soil Sci. Soc. Am. J. 52: 1518-1519.
de Gruijter, J. J. 2002. Sampling. Pages 45-79 in J. H. Dane and G. C. Topp, eds. Methods of soil analysis. Part 4. Physical methods. SSSA, Inc., Madison, WI.
Eberhardt, L. L. and Thomas, J. M. 1991. Designing environmental field studies. Ecol. Monogr. 61: 53-73.
Ellert, B. H., Janzen, H. H. and Entz, T. 2002. Assessment of a method to measure temporal change in soil carbon storage. Soil Sci. Soc. Am. J. 66: 1687-1695.
Gilbert, R. O. 1987. Statistical methods for environmental pollution monitoring. Van Nostrand Reinhold, New York, NY. 320 pp.
Govers, G., Lobb, D. A. and Quine, T. A. 1999. Tillage erosion and translocation: emergence of a new paradigm in soil erosion research. Soil Tillage Res. 51: 167-174.
Grayson, R. B. and Blöschl, G. 2000. Spatial patterns in catchment hydrology. Observations and modelling. Cambridge University Press, Cambridge, UK. 404 pp.
Groffman, P. M., Brumme, R., Butterbach-Bahl, K., Dobbie, K. E., Mosier, A. R., Ojima, D. S., Papen, H., Parton, W. J., Smith, K. A. and Wagner-Riddle, C. 2000. Evaluating annual nitrous oxide fluxes at the ecosystem scale. Global Biogeo. Cycles 14: 1061-1070.
Gustafson, E. J. 1998. Quantifying landscape spatial pattern: What is the state of the art? Ecosystems 1: 143-156.
Haag, D. and Matschonat, G. 2001. Limitations of controlled experimental systems as models for natural systems: a conceptual assessment of experimental practices in biogeochemistry and soil science. Sci. Total Environ. 277: 199-216.
Hudson, B. D. 1992. The soil survey as paradigm-based science. Soil Sci. Soc. Am. J. 56: 836-841.
Hurlbert, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecol. Monogr. 54: 187-211.
Janzen, H. H., Campbell, C. A., Izaurralde, R. C., Ellert, B. H., Juma, N., McGill, W. B. and Zentner, R. P. 1998. Management effects on soil C storage on the Canadian prairies. Soil Tillage Res. 47: 181-195.
Khan, S. A., Mulvaney, R. L. and Hoeft, R. G. 2001. A simple test for detecting sites that are nonresponsive to nitrogen fertilization. Soil Sci. Soc. Am. J. 65: 1751-1760.
Laslett, G. M. 1997. Discussion of: D. J. Brus and J. J. de Gruijter, Random sampling or geostatistical modelling. Geoderma 80: 45-49.
Lettner, H., Bossew, P. and Hubmer, A. K. 2000. Spatial variability of fallout Caesium-137 in Austrian Alpine regions. J. Environ. Radioact. 47: 71-82.
Martz, L. W. and de Jong, E. 1987. Using cesium-137 to assess the variability of net soil erosion and its association with topography in a Canadian Prairie landscape. Catena 14: 439-451.
McBratney, A. B. 1998. Some considerations on methods for spatially aggregating and disaggregating soil information. Nutr. Cycl. Agroecosyst. 50: 51-62.
McBratney, A. B., Anderson, A. N., Lark, R. M. and Odeh, I. O. 2002. Newer application techniques. Pages 159-198 in J. H. Dane and G. C. Topp, eds. Methods of soil analysis. Part 4. Physical methods. SSSA, Inc., Madison, WI.
Mulla, D. J. and McBratney, A. B. 2000. Soil spatial variability. Pages A-321 to A-352 in M. E. Sumner, ed. Handbook of soil science. CRC Press, Boca Raton, FL.
Olsen, A. R., Sedransk, J., Edwards, D., Gotway, C. A., Liggett, W., Rathbun, S., Reckhow, K. H. and Young, L. J. 1999. Statistical issues for monitoring ecological and natural resources in the United States. Environ. Monitor. Assess. 54: 1-45.
Pennock, D. J. and Appleby, P. G. 2002. Site selection and sampling design. Pages 15-40 in F. Zapata, ed. Handbook for the assessment of soil erosion and sedimentation using environmental radionuclides. Kluwer Academic Publishers, Amsterdam, The Netherlands.
Pennock, D. J. and de Jong, E. 1987. The influence of slope curvature on soil erosion and deposition in hummocky terrain. Soil Sci. 144: 209-218.
Pennock, D. J. and de Jong, E. 1990. Rates of soil redistribution associated with soil zones and slope classes in Southern Saskatchewan. Can. J. Soil Sci. 70: 325-334.
Pennock, D. J., Lemmen, D. S. and de Jong, E. 1995. Cesium-137-measured erosion rates for soils of five parent-material groups in southwestern Saskatchewan. Can. J. Soil Sci. 75: 205-210.
Pennock, D. J., van Kessel, C., Farrell, R. E. and Sutherland, R. A. 1992. Landscape-scale variations in denitrification. Soil Sci. Soc. Am. J. 56: 770-776.
Peterman, R. M. 1990. Statistical power analysis can improve fisheries research and management. Can. J. Fish Aquat. Sci. 47: 2-15.
Phillips, J. D. 2001. Contingency and generalization in pedology, as exemplified by texture-contrast soils. Geoderma 102: 347-370.
Raffaelli, D. and Moller, H. 2000. Manipulative field experiments in animal ecology: Do they promise more than they can deliver? Pages 299-338 in A. H. Fitter and D. G. Raffaelli, eds. Advances in ecological research. Volume 30. Academic Press, San Diego, CA.
Rowe, J. S. 1984. Forestland classification: limitations of the use of vegetation. Pages 132-148 in J. G. Bockheim, ed. Forest land classification: Experiences, problems, perspectives. Department of Soil Science, University of Wisconsin, Madison, WI.
Schindler, D. W. 1998. Replication versus realism: The need for ecosystem-scale experiments. Ecosystems 1: 323-334.
Schumm, S. A. 1977. The fluvial system. John Wiley and Sons, New York, NY. 338 pp.
Seyfried, M. S. and Wilcox, B. P. 1995. Scale and the nature of spatial variability: Field examples having implications for hydrologic modeling. Water Res. Res. 31: 173-184.
Skjemstad, J. O., Reicosky, D. C., Wilts, A. R. and McGowan, J. A. 2002. Charcoal carbon in U.S. agricultural soils. Soil Sci. Soc. Am. J. 66: 1249-1255.
Slobodian, N., Van Rees, K. and Pennock, D. J. 2002. Cultivation-induced effects on belowground biomass and organic carbon. Soil Sci. Soc. Am. J. 66: 924-930.
Speight, J. G. 1968. Parametric description of landform. Pages 239-250 in G. A. Stewart, ed. Land evaluation. MacMillan of Australia, Sydney, Australia.
Steel, R. G. D. and Torrie, J. H. 1980. Principles and procedures of statistics. A biometrical approach. McGraw-Hill Book Company, New York, NY. 633 pp.
Underwood, A. J. 1997. Experiments in ecology. Their logical design and interpretation using analysis of variance. Cambridge University Press, Cambridge, UK. 504 pp.
Underwood, A. J., Chapman, M. G. and Connell, S. D. 2000. Observations in ecology: you can't make progress on processes without understanding the patterns. J. Exp. Marine Biol. Ecol. 250: 97-115.
Van Kessel, C. and Wendroth, O. 2001. Landscape research: exploring ecosystem processes and their relationship at different scales in space and time. Soil Tillage Res. 58: 97-98.
Wagner-Riddle, C., Thurtell, G. W., Kidd, G. K., Beauchamp, E. G. and Sweetman, R. 1997. Estimates of nitrous oxide emissions from agricultural fields over 28 months. Can. J. Soil Sci. 77: 135-144.
Webster, R. 1989. Is regression what you really want? Soil Use Manage. 5: 47-53.
Webster, R. 2001. Statistics to support soil research and their presentation. Eur. J. Soil Sci. 52: 331-340.
Western, A. W., Grayson, R. B., Blöschl, G., Willgoose, G. R. and McMahon, T. A. 1999. Observed spatial organization of soil moisture and its relation to terrain indices. Water Res. Res. 35: 797-810.
Wolcott, J. and Church, M. 1991. Strategies for sampling spatially heterogeneous phenomena: The example of river gravels. J. Sediment. Petrol. 61: 534-543.
Wysocki, D. A., Schoeneberger, P. J. and LaGarry, H. E. 2000. Geomorphology of soil landscapes. Pages E-5 to E-39 in M. E. Sumner, ed. Handbook of soil science. CRC Press, Boca Raton, FL.
Yates, S. R. and Warrick, A. W. 2002. Geostatistics. Pages 81-118 in J. H. Dane and G. C. Topp, eds. Methods of soil analysis. Part 4. Physical methods. SSSA, Inc., Madison, WI.
