**A protocol for data exploration to avoid common statistical problems**
Alain F. Zuur*1,2, Elena N. Ieno1,2 and Chris S. Elphick3

1Highland Statistics Ltd, Newburgh, UK; 2Oceanlab, University of Aberdeen, Newburgh, UK; and 3Department of Ecology and Evolutionary Biology and Center for Conservation Biology, University of Connecticut, Storrs, CT, USA

Summary

1. While teaching statistics to ecologists, the lead authors of this paper have noticed common statistical problems. If a random sample of their work (including scientific papers) produced before doing these courses were selected, half would probably contain violations of the underlying assumptions of the statistical techniques employed.

2. Some violations have little impact on the results or ecological conclusions; yet others increase type I or type II errors, potentially resulting in wrong ecological conclusions. Most of these violations can be avoided by applying better data exploration. These problems are especially troublesome in applied ecology, where management and policy decisions are often at stake.

3. Here, we provide a protocol for data exploration; discuss current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables; and provide advice on how to address these problems when they arise. We also address misconceptions about normality, and provide advice on data transformations.

4. Data exploration avoids type I and type II errors, among other problems, thereby reducing the chance of making wrong ecological conclusions and poor recommendations. It is therefore essential for good quality management and policy based on statistical analyses.

Key-words: collinearity, data exploration, independence, transformations, type I and II errors, zero inflation

*Correspondence author. E-mail: highstat@highstat.com

Correspondence site: http://www.respond2articles.com/MEE/

© 2009 The Authors. Journal compilation © 2009 British Ecological Society, Methods in Ecology & Evolution, 1, 3–14

Introduction

The last three decades have seen an enormous expansion of the statistical tools available to applied ecologists. A short list of available techniques includes linear regression, generalized linear (mixed) modelling, generalized additive (mixed) modelling, regression and classification trees, survival analysis, neural networks, multivariate analysis with all its many methods such as principal component analysis (PCA), canonical correspondence analysis (CCA), (non-)metric multidimensional scaling (NMDS), various time series and spatial techniques, etc. Although some of these techniques have been around for some time, the development of fast computers and freely available software such as R (R Development Core Team 2009) makes it possible to routinely apply sophisticated statistical techniques on any type of data. This paper is not about these methods. Instead, it is about the vital step that should, but frequently does not, precede their application.

All statistical techniques have in common the problem of 'rubbish in, rubbish out'. In some methods, for example, a single outlier may determine the final results and conclusions. Heterogeneity (differences in variation) may cause serious trouble in linear regression and analysis of variance models (Fox 2008), and with certain multivariate methods (Huberty 1994).

When the underlying question is to determine which covariates are driving a system, then the most difficult aspect of the analysis is probably how to deal with collinearity (correlation between covariates), which increases type II errors (i.e. failure to reject the null hypothesis when it is untrue). In multivariate analysis applied to data on ecological communities, the presence of double zeros (e.g. two species being jointly absent at various sites) contributes towards similarity in some techniques (e.g. PCA), but not others. Yet other multivariate techniques are sensitive to species with clumped distributions and low abundance (e.g. CCA). In univariate analysis techniques like generalized linear modelling (GLM) for count data, zero inflation of the response variable may cause biased parameter estimates (Cameron & Trivedi 1998). When multivariate techniques use permutation methods to obtain P-values, for example in CCA and redundancy analysis (RDA, ter Braak & Verdonschot 1995), or the Mantel test (Legendre & Legendre 1998), temporal or spatial correlation between observations can increase type I errors (rejecting the null hypothesis when it is true).

The same holds with regression-type techniques applied on temporally or spatially correlated observations. One of the most used, and misused, techniques is without doubt linear regression. Often, this technique is associated with linear patterns and normality; both concepts are often misunderstood. Linear regression is more than capable of fitting nonlinear relationships, e.g. by using interactions or quadratic terms (Montgomery & Peck 1992). The term 'linear' in linear regression refers to the way parameters are used in the model and not to the type of relationships that are modelled. Knowing whether we have linear or nonlinear patterns between response and explanatory variables is crucial for how we apply linear regression and related techniques. We also need to know whether the data are balanced before including interactions. For example, Zuur, Ieno & Smith (2007) used the covariates sex, location and month to model the gonadosomatic index (the weight of the gonads relative to total body weight) of squid. However, both sexes were not measured at every location in each month due to unbalanced sampling. In fact, the data were so unbalanced that it made more sense to analyse only a subset of the data, and refrain from including certain interactions.

With this wealth of potential pitfalls, ensuring that the scientist does not discover a false covariate effect (type I error), wrongly dismiss a model with a particular covariate (type II error) or produce results determined by only a few influential observations, requires that detailed data exploration be applied before any statistical analysis. The aim of this paper is to provide a protocol for data exploration that identifies potential problems (Fig. 1). In our experience, data exploration can take up to 50% of the time spent on analysis.

Fig. 1. Protocol for data exploration.

Although data exploration is an important part of any analysis, it is important that it be clearly separated from hypothesis testing. Decisions about what models to test should be made a priori based on the researcher's biological understanding of the system (Burnham & Anderson 2002). When that understanding is very limited, data exploration can be used as a hypothesis-generating exercise, but this is fundamentally different from the process that we advocate in this paper. Using aspects of a data exploration to search out patterns ('data dredging') can provide guidance for future work, but the results should be viewed very cautiously and inferences about the broader population avoided. Instead, new data should be collected based on the hypotheses generated and independent tests conducted. When data exploration is used in this manner, both the process used and the limitations of any inferences should be clearly stated.

Throughout the paper we focus on the use of graphical tools (Chatfield 1998; Gelman, Pasarica & Dodhia 2002), but in some cases it is also possible to apply tests for normality or homogeneity. The statistical literature, however, warns against certain tests and advocates graphical tools (Montgomery & Peck 1992; Draper & Smith 1998; Quinn & Keough 2002). Läärä (2009) gives seven reasons for not applying preliminary tests for normality, including: most statistical techniques based on normality are robust against violation; for larger data sets the central limit theorem implies approximate normality; for small samples the power of the tests is low; and for larger data sets the tests are sensitive to small deviations (contradicting the central limit theorem).

All graphs were produced using the software package R (R Development Core Team 2008). All R code and data used in this paper are available in Appendix S1 (Supporting Information) and from http://www.highstat.com.

Step 1: Are there outliers in Y and X?

In some statistical techniques the results are dominated by outliers; other techniques treat them like any other value. For example, outliers may cause overdispersion in a Poisson GLM or binomial GLM when the outcome is not binary (Hilbe 2007). In contrast, in NMDS using the Jaccard index (Legendre & Legendre 1998), observations are essentially viewed as presences and absences, hence an outlier does not influence the outcome of the analysis in any special way. Consequently, it is important that the researcher understands how a particular technique responds to the presence of outliers. For the moment, we define an outlier as an observation that has a relatively large or small value compared to the majority of observations.

A graphical tool that is typically used for outlier detection is the boxplot. It visualizes the median and the spread of the data. Depending on the software used, the median is typically presented as a horizontal line with the 25% and 75% quartiles forming a box around the median that contains half of the observations. Lines are then drawn from the boxes, and any points beyond these lines are labelled as outliers. Some researchers routinely (but wrongly) remove these observations. Figure 2a shows an example of such a graph using 1295 observations of a morphometric variable (wing length of the saltmarsh sparrow Ammodramus caudacutus; Gjerdrum, Elphick & Rubega 2008). The graph leads one to believe (perhaps wrongly, as we will see in a moment) that there are seven outliers.

Another very useful, but highly neglected, graphical tool to visualize outliers is the Cleveland dotplot (Cleveland 1993). This is a graph in which the row number of an observation is plotted vs. the observation value, thereby providing much more detailed information than a boxplot. Points that stick out on the right-hand side, or on the left-hand side, are observed values that are considerably larger, or smaller, than the majority of the observations, and require further investigation. If such observations exist, it is important to check the raw data for errors and assess whether the observed values are reasonable. Figure 2b shows a Cleveland dotplot for the sparrow wing length data; note that the observations identified by the boxplot are not especially extreme after all. The 'upward' trend in Fig. 2b simply arises because the data in the spreadsheet were sorted by weight. There is one observation of a wing length of about 68 mm that stands out to the left about half way up the graph. This value is not considerably larger than the other values, so we cannot say yet that it is an outlier.

Fig. 2. (a) Boxplot of wing length for 1295 saltmarsh sparrows. The line in the middle of the box represents the median, and the lower and upper ends of the box are the 25% and 75% quartiles respectively. The lines indicate 1.5 times the size of the hinge, which is the 75% minus 25% quartiles. (Note that the interval defined by these lines is not a confidence interval.) Points beyond these lines are (often wrongly) considered to be outliers. In some cases it may be helpful to rotate the boxplot 90° to match the Cleveland dotplot. (b) Cleveland dotplot of the same data. The horizontal axis represents the value of wing length (mm), and the vertical axis corresponds to the order of the data, as imported from the data file (in this case sorted by the bird's weight).

Figure 3 shows a multi-panel Cleveland dotplot for all of the morphometric variables measured; note that some variables have a few relatively large values. Such extreme values could indicate true measurement errors (e.g. some fit the characteristics of 'observer distraction' sensu Morgan 2004, whereby the observer's eye is drawn to the wrong number on a measurement scale). Note that one should not try to argue that such large values could have occurred by chance. If they had, then intermediate values should also have been generated by chance, but none were. (A useful exercise is to generate, repeatedly, an equivalent number of random observations from an appropriate distribution, e.g. the Normal distribution, and determine how the number of extreme points compares to the empirical data.) When the most likely explanation is that the extreme observations are measurement (observer) errors, they should be dropped because their presence is likely to dominate the analysis. For example, we applied a discriminant analysis on the full sparrow data set to see whether observations differed among observers, and found that the first two axes were mainly determined by the outliers.

So far, we have loosely defined an 'outlier' as an observation that sticks out from the rest. A more rigorous approach is to consider whether unusual observations exert undue influence on an analysis (e.g. on estimated parameters). We make a distinction between influential observations in the response variable and in the covariates. An example of the latter is when species abundances are modelled as a function of temperature, with nearly all temperature values between 15 and 20 °C, but one of 25 °C. In general, this is not an ideal sampling design because the range 20–25 °C is inadequately sampled. In a field study, however, there may have been only one opportunity to sample the higher temperature. With a large sample size, such observations may be dropped, but with relatively small data sets the consequent reduction in sample size may be undesirable, especially if other observations have outliers for other explanatory variables. If omitting such observations is not an option, then consider transforming the explanatory variables.

In regression-type techniques, outliers in the response variables are more complicated to deal with. Transforming the data is an option, but as the response variable is of primary interest, it is better to choose a statistical method that uses a probability distribution that allows greater variation for large mean values (e.g. gamma for continuous data; Poisson or negative binomial for count data) because doing this allows us to work with the original data. For multivariate analyses, this approach is not an option because these methods are not based on probability distributions. Instead, we can use a different measure of association. For example, the Euclidean distance is rather sensitive to large values because it is based on Pythagoras' theorem, whereas the Chord distance down-weights large values (Legendre & Legendre 1998).
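The contrast between the two association measures mentioned above can be illustrated with a short Python sketch. It assumes the usual definition of the chord distance as the Euclidean distance between site vectors after each has been scaled to unit length; the function names and the abundance values are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line (Pythagorean) distance between two abundance vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chord(a, b):
    """Euclidean distance between the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return euclidean([x / norm_a for x in a], [y / norm_b for y in b])


# Hypothetical abundances of three species at two sites (illustrative numbers only).
site_1 = [10, 20, 30]
site_2 = [100, 200, 300]  # same relative composition, ten times the abundance

print(euclidean(site_1, site_2))  # large: dominated by the big counts
print(chord(site_1, site_2))      # approximately zero: composition is identical
```

In this toy case the Euclidean distance is driven entirely by the difference in total abundance, whereas the chord distance responds only to relative composition, which is why a few very large counts have less influence on it.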

Fig. 3. Multi-panel Cleveland dotplot for six morphometric variables taken from the sparrow data (wing length, tarsus length, head length, culmen length, nalospi to bill tip, and weight), after sorting the observations from heaviest to lightest (hence the shape of the weight graph). The horizontal axis of each panel is the value of the variable and the vertical axis is the order of the data from the text file. Axis labels were suppressed to improve visual presentation. Note that some variables have a few unusually small or large values. Observations also can be plotted, or mean values superimposed, by subgroup (e.g. observer or sex) to see whether there are differences among subsets of the data.

Some statistical packages come with a whole series of diagnostic tools to identify influential observations. For example, the Cook statistic in linear regression (Fox 2008) gives information on the change in regression parameters as each observation is sequentially, and individually, omitted. The problem with such tools is that when there are multiple 'outliers' with similar values, they will not be detected. Hence, one should investigate the presence of such observations using the graphical tools discussed in this paper, before applying a statistical analysis.

Ultimately, it is up to the ecologist to decide what to do with outliers. Outliers in a covariate may arise due to poor experimental design, in which case dropping the observation or transforming the covariate are sensible options. Observer and measurement errors are a valid justification for dropping observations. But outliers in the response variable may require a more refined approach, especially when they represent genuine variation in the variable being measured. Taking detailed field or experiment notes can be especially helpful for documenting when unusual events occur, and thus providing objective information with which to re-examine outliers. Regardless of how the issue is addressed, it is important to know whether there are outliers and to report how they were handled; data exploration allows this to be done.

Step 2: Do we have homogeneity of variance?

Homogeneity of variance is an important assumption in analysis of variance (ANOVA), other regression-related models and in multivariate techniques like discriminant analysis. Figure 4 shows conditional boxplots of the food intake rates of Hudsonian godwits (Limosa haemastica), a long-distance migrant shorebird, on a mudflat in Argentina (E. Ieno, unpublished data). To apply an ANOVA on these data to test whether mean intake rates differ by sex, time period or a combination of these two variables (i.e. an interaction), we have to assume that (i) variation in the observations from the sexes is similar; (ii) variation in observations from the three time periods is similar; and (iii) variation between the three time periods within the sexes is similar. In this case, there seems to be slightly less variation in the winter data for males and more variation in the male data from the summer. However, such small differences in variation are not something to worry about. More serious examples of violation can be found in Zuur et al. (2009a). Fox (2008) shows that for a simplistic linear regression model heterogeneity seriously degrades the least-square estimators when the ratio between the largest and smallest variance is 4 (conservative) or more.

Fig. 4. Multi-panel conditional boxplots for the godwit foraging data (intake rate by migration period: summer, pre-migration and winter; one panel each for females and males). The three boxplots in each panel correspond to three time periods. We are interested in whether the mean values change between sexes and time periods, but need to assume that variation is similar in each group.

In regression-type models, verification of homogeneity should be done using the residuals of the model; i.e. by plotting residuals vs. fitted values, and making a similar set of conditional boxplots for the residuals. In all these graphs the residual variation should be similar. The solution to heterogeneity of variance is either a transformation of the response variable to stabilize the variance, or applying statistical techniques that do not require homogeneity (e.g. generalized least squares; Pinheiro & Bates 2000; Zuur et al. 2009a).

Step 3: Are the data normally distributed?

Various statistical techniques assume normality, and this has led many of our postgraduate course participants to produce histogram after histogram of their data (e.g. Fig. 5a). It is important, however, to know whether the statistical technique to be used does assume normality, and what exactly is assumed to be normally distributed. For example, a PCA does not require normality (Jolliffe 2002). Linear regression does assume normality, but is reasonably robust against violation of the assumption (Fitzmaurice, Laird & Ware 2004). If you want to apply a statistical test to determine whether there is significant group separation in a discriminant analysis, however, normality of observations of a particular variable within each group is important (Huberty 1994). Simple t-tests also assume that the observations in each group are normally distributed; hence histograms for the raw data of every group should be examined.

In linear regression, we actually assume normality of all the replicate observations at a particular covariate value (Fig. 6; Montgomery & Peck 1992), an assumption that cannot be verified unless one has many replicates at each sampled covariate value. However, normality of the raw data implies normality of the residuals. Therefore, we can make histograms of residuals to get some impression of normality (Quinn & Keough 2002; Zuur et al. 2007), even though we cannot fully test the assumption.

Even when the normality assumption is apparently violated, the situation may be more complicated than it seems. The shape of the histogram in Fig. 5a, for example, indicates skewness, which may suggest to one that data transformation is needed. Figure 5b shows a multi-panel histogram for the same variable except that the data are plotted by month; this lets us see that the skewness of the original histogram is probably caused by sparrow weight changes over time. Under these circumstances, it would not be advisable to transform the data as differences among months may be made smaller, and more difficult to detect.

Fig. 5. (a) Histogram of the weight (g) of 1193 sparrows (only the June, July and August data were used). Note that the distribution is skewed. (b) Histograms for the weight of the sparrows, broken down by month. Note that the centre of the distribution is shifting, and this is causing the skewed distribution for the aggregated data shown in (a).

Step 4: Are there lots of zeros in the data?

Elphick & Oring (1998, 2003) investigated the effects of straw management on waterbird abundance in flooded rice fields. One possible statistical analysis is to model the number of birds as a function of time, water depth, farm, field management method, temperature, etc. Because this analysis involves modelling a count, GLM is the appropriate analysis. Figure 7 shows a frequency plot illustrating how often each value for total waterbird abundance occurred. The extremely high number of zeros tells us that we should not apply an ordinary Poisson or negative binomial GLM as these would produce biased parameter estimates and standard errors. Instead one should consider zero inflated GLMs (Cameron & Trivedi 1998; Zuur et al. 2009a).
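A rough numerical companion to the frequency plot is to compare the observed share of zeros with the share that a Poisson distribution with the same mean would produce. The sketch below is only a crude screen under that assumption, not a formal test for zero inflation (overdispersion alone can also generate many zeros); the function name `zero_summary` and the counts are hypothetical and are not the rice field data.

```python
import math


def zero_summary(counts):
    """Observed share of zeros and the share a Poisson with the same mean would give."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)  # P(Y = 0) for a Poisson distribution with this mean
    return observed, expected


# Hypothetical counts: many zeros plus a few large flocks (illustrative numbers only).
counts = [0] * 14 + [1, 2, 3, 15, 40, 65]
observed, expected = zero_summary(counts)
print(f"observed share of zeros: {observed:.2f}")
print(f"share expected under a Poisson with the same mean: {expected:.4f}")
```

When the observed share is far above the Poisson expectation, as in this invented example, it is worth looking at the frequency plot more closely and considering the zero inflated models cited above.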

Fig. 6. Visualization of two underlying assumptions in linear regression: normality and homogeneity. The dots represent observed values and a regression line is added. At each covariate value, we assume that observations are normally distributed with the same spread (homogeneity). Normality and homogeneity at each covariate value cannot be verified unless many (>25) replicates per covariate value are taken, which is seldom the case in ecological studies. In practice, a histogram of pooled residuals should be made, but this does not provide conclusive evidence for normality. The same limitation holds if residuals are plotted vs. fitted values to verify homogeneity.

Fig. 7. Frequency plot showing the number of observations with a certain number of waterbirds for the rice field data; 718 of 2035 observations equal zero. Plotting data for individual species would result in even higher frequencies of zeros.

One can also analyse data for multiple species simultaneously using multivariate techniques. For such analyses, we need to consider what it means when two species are jointly absent. This result could say something important about the ecological characteristics of a site, for example that it contains conditions that are unfavourable to both species. By extension, when two sites both have the same joint absences, this might mean that the sites are ecologically similar. On the other hand, if a species has a highly clumped distribution, or is simply rare, then joint absences might arise through chance and say nothing about the suitability of a given site for a species, the similarity among the habitat needs of species or the ecological similarity of sites. A high frequency of zeros, thus, can greatly complicate interpretation of such analyses. Irrespective of our attitude to joint absences, we need to know whether there are double zeros in the data. This means that for each species-pair, we need to calculate how often both had zero abundance for the same observation (e.g. site). We can either present this information in a table, or use advanced graphical tools like a corrgram (Fig. 8; Sarkar 2008). In our waterbird example, the frequency of double zeros is very high. All the blue circles correspond to species that have more than 80% of their observations jointly zero. This result is consistent with the biology of the species studied, most of which form large flocks and have highly clumped distributions. A PCA would label such species as similar, although their ecological use of habitats is often quite different (e.g. Elphick & Oring 1998). Alternative multivariate analyses that ignore double zeros are discussed in Legendre & Legendre (1998) and Zuur et al. (2007).

Fig. 8. A corrgram showing the frequency with which pairs of waterbird species both have zero abundance. The colour and the amount that a circle has been filled correspond to the proportion of observations with double zeros. The diagonal running from bottom left to top right represents the percentage of observations of a variable equal to zero. Four-letter acronyms represent different waterbird species. The top bar relates the colours in the graph to the proportion of zeros.
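The species-pair tally described above (how often two species are both absent from the same observation) is simple to compute before drawing any graph. The following Python sketch shows one way to do it; the function name `double_zero_share` is hypothetical, and although the species codes are borrowed from Fig. 8, the counts are invented for illustration.

```python
from itertools import combinations


def double_zero_share(abundance):
    """Map each species pair to the proportion of observations where both are absent.

    `abundance` maps a species name to its list of counts, one per observation.
    """
    shares = {}
    for a, b in combinations(sorted(abundance), 2):
        pairs = list(zip(abundance[a], abundance[b]))
        both_zero = sum(1 for x, y in pairs if x == 0 and y == 0)
        shares[(a, b)] = both_zero / len(pairs)
    return shares


# Hypothetical counts at five sites (numbers invented, not the rice field data).
abundance = {
    "MALL": [0, 0, 12, 0, 3],
    "COOT": [0, 0, 0, 0, 7],
    "KILL": [1, 0, 0, 0, 0],
}
for pair, share in double_zero_share(abundance).items():
    print(pair, f"{share:.1f}")
```

The resulting table of proportions is the information that a corrgram such as Fig. 8 displays graphically.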

The number of saltmarsh sparrows captured in a study plot is modelled as a function of covariates that describe the relative abundance of various plant species (for details, see Gjerdrum, Elphick & Rubega 2005; Gjerdrum et al. 2008). The second column of Table 1 gives the estimated P-values of the t-statistics for each regression parameter when all covariates are included in the model. Note that only one covariate, that for the per cent cover of the rush Juncus gerardii, is weakly significant at the 5% level.

In linear regression, an expression for the variances of the parameters $\beta_j$ is given by (Draper & Smith 1998; Fox 2008):

$$\mathrm{Variance}(\beta_j) = \frac{1}{1 - R_j^2} \times \frac{\sigma^2}{(n - 1)\,S_j^2}$$

The term $S_j$ depends on covariate values, $n$ is the sample size and $\sigma^2$ is the variance of the residuals, but these terms are not relevant to the current discussion (and therefore their mathematical formulation is not given here). It is the first expression that is important. The term $R_j^2$ is the $R^2$ from a linear regression model in which covariate $X_j$ is used as a response variable, and all other covariates as explanatory variables. A high $R^2$ in such a model means that most of the variation in covariate $X_j$ is explained by all other covariates, which means that there is collinearity. The price one pays for this situation is that the standard errors of the parameters are inflated with the square root of $1/(1 - R_j^2)$, also called the variance inflation factor (VIF), which means that the P-values get larger, making it more difficult to detect an effect. This phenomenon is illustrated in Table 1; the third column of the table gives the VIF values for all covariates and shows that there is a high level of collinearity.

One strategy for addressing this problem is to sequentially drop the covariate with the highest VIF, recalculate the VIFs and repeat this process until all VIFs are smaller than a pre-selected threshold. Montgomery & Peck (1992) used a value of 10, but a more stringent approach is to use values as low as 3, as we did here. High, or even moderate, collinearity is especially problematic when ecological signals are weak. In that case, even a VIF of 2 may cause nonsignificant parameter estimates, compared to the situation without collinearity. Following this process caused three variables to be dropped from our analysis: that for the tall Spartina alterniflora, and those for plant height and stem density. With the collinearity problem removed, the Juncus variable is shown to be highly significant (Table 1). Sequentially dropping further nonsignificant terms one at a time gives a model with only the Juncus and Shrub variables, but with little further change in P-values, showing how dropping collinear variables can have a bigger impact on P-values than dropping nonsignificant covariates.

Other ways to detect collinearity include pairwise scatterplots comparing covariates, correlation coefficients or a PCA biplot (Jolliffe 2002) applied on all covariates. Collinearity can also be expected if temporal (e.g. month, year) or spatial variables (e.g. latitude, longitude) are used together with covariates like temperature, rainfall, etc. Therefore, one should always plot all covariates against temporal and spatial covariates. The easiest way to solve collinearity is by dropping collinear covariates. The choice of which covariates to drop can be based on the VIFs, or perhaps better, on common sense or biological knowledge. An alternative consideration, especially when future work on the topic will be done, is how easy alternative covariates are to measure in terms of effort and cost. For a discussion of collinearity in combination with measurement errors on the covariates, see Carroll et al. (2006).

Whenever two covariates X and Z are collinear, and Z is used in the statistical analysis, then the biological discussion in which the effect of Z is explained should include mention of the collinearity, and recognize that it might well be X that is driving the system (cf. Gjerdrum et al. 2008).

Table 1. P-values of the t-statistic for three linear regression models and variance inflation factor (VIF) values for the full model. In the full model, the number of banded sparrows, which is a measure of how many birds were present, is modelled as a function of the covariates listed in the first column. In the second and third columns, the P-values and VIF values for the full model are presented (note that no variables have been removed yet). In the fourth column P-values are presented for the model after collinearity has been removed by sequentially deleting each variable for which the VIF value was highest until all remaining VIFs were below 3. In the last column, only variables with significant P-values remain, giving the most parsimonious explanation for the number of sparrows in a plot

| Covariate | P-value (full model) | VIF | P-value (collinearity removed) | P-value (reduced model) |
|---|---|---|---|---|
| % Juncus gerardii | 0·0203 | 44·9953 | 0·0001 | 0·00004 |
| % Shrub | 0·9600 | 2·7818 | 0·0568 | 0·0727 |
| Height of thatch | 0·9989 | 1·6712 | 0·8263 | |
| % Spartina patens | 0·0640 | 159·3506 | 0·3312 | |
| % Distichlis spicata | 0·0527 | 53·7545 | 0·2538 | |
| % Bare ground | 0·0666 | 12·0586 | 0·8908 | |
| % Other vegetation | 0·0730 | 5·8170 | 0·9462 | |
| % Phragmites australis | 0·0715 | 3·7490 | 0·2734 | |
| % Tall sedge | 0·2160 | 4·4093 | 0·4313 | |
| % Water | 0·0568 | 17·0677 | 0·6942 | |
| % Spartina alterniflora (short) | 0·0549 | 121·4637 | 0·2949 | |
| % Spartina alterniflora (tall) | 0·0960 | 159·3828 | | |
| Maximum vegetation height | 0·2432 | 6·1200 | | |
| Vegetation stem density | 0·7219 | 3·2064 | | |

© 2009 The Authors. Journal compilation © 2009 British Ecological Society, Methods in Ecology & Evolution, 1, 3–14
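The sequential VIF procedure described in this step can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses NumPy rather than the R code that accompanies the paper, the helper names `vif` and `drop_collinear` are hypothetical, and the three covariates are simulated (they are not the sparrow data). Each VIF is computed as $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing covariate $j$ on all the others.

```python
import numpy as np

def vif(X):
    """Return 1 / (1 - R_j^2) for every column j of the covariate matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    values = np.empty(p)
    for j in range(p):
        y = X[:, j]  # covariate j plays the role of the response
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        values[j] = 1.0 / (1.0 - r2)
    return values

def drop_collinear(X, names, threshold=3.0):
    """Drop the covariate with the highest VIF, recompute, and repeat
    until every remaining VIF is below the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names, vif(X)

# Simulated covariates (hypothetical): x3 is almost a copy of x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

full_vif = vif(X)
kept, kept_vif = drop_collinear(X, names, threshold=3.0)
print([round(float(v), 1) for v in full_vif])
print(kept, [round(float(v), 2) for v in kept_vif])
```

The threshold of 3 mirrors the value used in the text; which of two nearly identical covariates is dropped is decided purely by the larger VIF here, whereas the text suggests that biological knowledge may be a better guide.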

Step 6: What are the relationships between Y and X variables?

Another essential part of data exploration, especially in univariate analysis, is plotting the response variable vs. each covariate (Fig. 9). Note that the variable for the per cent of tall sedge in a plot (%Tall sedge) should be dropped from any analysis as it has only one non-zero value. This result shows that the boxplots and Cleveland dotplots should not only be applied on the response variable but also on covariates (i.e. we should not have calculated the VIFs with %Tall sedge included in the previous section). There are no clear patterns in Fig. 9 between the response and explanatory variables, except perhaps for the amount of Juncus (see also Table 1). Note that the absence of clear patterns does not mean that there are no relationships; it just means that there are no clear two-way relationships. A model with multiple explanatory variables may still provide a good fit.

Fig. 9. Multi-panel scatterplots between the number of banded sparrows and each covariate. A LOESS smoother was added to aid visual interpretation. (Vertical axis: Banded; horizontal axes: the covariates listed in Table 1, one per panel.)

Besides visualizing relationships between variables, scatterplots are also useful to detect observations that do not comply with the general pattern between two variables. Figure 10 shows a multi-panel scatterplot (also called a pair plot) for the 1295 saltmarsh sparrows for which we have morphological data. Any observation that sticks out from the black cloud needs further investigation; these may be different species, measurement errors, typing mistakes or they may be correct values after all. Note that the large wing length observation that we picked up with the Cleveland dotplot in Fig. 2b has average values for all other variables, suggesting that it is indeed something that should be checked. The lower panels in Fig. 10 contain Pearson correlation coefficients, which can be influenced by outliers, meaning that outliers can even contribute to collinearity.

Fig. 10. Multi-panel scatterplot of morphometric data for the 1295 saltmarsh sparrows. The upper/right panels show pairwise scatterplots between each variable, and the lower/left panels contain Pearson correlation coefficients. The font size of the correlation coefficient is proportional to its value. Note that there are various outliers. (Variables: Wing chord, Tarsus length, Head length, Culmen length, Nalospi to bill tip, Weight.)

Step 7: Should we consider interactions?

Staying with the sparrow morphometric data, suppose that one asks whether the relationship between wing length and weight changes over the months and differs between sexes. A common approach to this analysis is to apply a linear regression model in which weight is the response variable and wing length (continuous), sex (categorical) and month (categorical) are covariates.
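Before fitting a model of this kind, one inexpensive check is to estimate the slope of weight on wing length separately within each sex-by-month group. Clearly different slopes hint at a possible interaction, and groups with too few observations to support a slope become visible immediately. The following is a minimal sketch under stated assumptions: the function name `slopes_by_group`, the record layout and all numbers are hypothetical and are not taken from the sparrow data.

```python
from collections import defaultdict

def slopes_by_group(records):
    """records: iterable of (group, x, y) tuples.
    Returns {group: (n, slope)}; slope is None when it cannot be estimated."""
    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))
    result = {}
    for group, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        # A least-squares slope needs at least two distinct x values.
        result[group] = (n, sxy / sxx if n >= 2 and sxx > 0 else None)
    return result

# Hypothetical records: (sex, month), wing length (mm), weight (g).
records = [
    (("Male", "Jun"), 56, 18.0), (("Male", "Jun"), 58, 19.0), (("Male", "Jun"), 60, 20.0),
    (("Female", "Jun"), 54, 17.0), (("Female", "Jun"), 56, 19.0), (("Female", "Jun"), 58, 21.0),
    (("Female", "Sep"), 55, 18.5),  # a single bird: no slope can be estimated
]
summary = slopes_by_group(records)
for group, (n, slope) in sorted(summary.items()):
    print(group, n, slope)
```

Differences in slope seen this way are only suggestive; whether an interaction is supported still has to be judged from the regression model itself.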

Results showed that the three-way interaction is significant, indicating that the relationship between weight and wing length is indeed changing over the months and between sexes. However, there is a problem with this analysis. Figure 11 shows the data in a coplot, which is an excellent graphical tool to visualize the potential presence of interactions. The graph contains multiple scatterplots of wing length and weight, one for each month and sex combination. A bivariate linear regression line is added to each scatterplot; if all lines are parallel, then there is probably no significant interaction (although only the regression analysis can tell us whether this is indeed the case). In our example, lines have different slopes, indicating the potential presence of interactions. In some months, however, the number of observations is very small, and there are no data at all from males in September. A sensible approach would be to repeat the analysis for only the June–August period.

Fig. 11. Coplot for the sparrow data. The lower left panel shows a scatterplot between wing length and weight for males in May, and the upper right panel for females in September. On each panel, a bivariate linear regression model was fitted to aid visual interpretation. (Horizontal axis: Wing length (mm); vertical axis: Weight (g); conditioning variables: month, May to Sep, and sex, Male and Female.)

Step 8: Are observations of the response variable independent?

A crucial assumption of most statistical techniques is that observations are independent of one another (Hurlbert 1984), meaning that information from any one observation should not provide information on another after the effects of other variables have been accounted for. This concept is best explained with examples.

The observations from the sparrow abundance data set were taken at multiple locations. If birds at locations close to each other have characteristics that are more similar to each other than to birds from locations separated by larger distances, then we would violate the independence assumption. Another example is when multiple individuals of the same family (e.g. all of the young from one nest) are sampled; these individuals might be more similar to each other than random individuals in the population, because they share a similar genetic make-up and similar parental provisioning history.

When such dependence arises, the statistical model used to analyse the data needs to account for it, for example, by modelling any spatial or temporal relationships, or by nesting data in a hierarchical structure (e.g. nestlings could be nested within nests). Testing for independence, however, is not always easy. In Zuur et al. (2009a) a large number of data sets were analysed in which dependence among observations played a role. Examples include the amount of bioluminescence at sites along an oceanic depth gradient, nitrogen isotope ratios in whale teeth as a function of age, pH values in Irish rivers, the number of amphibians killed by cars at various locations along a road, feeding behaviour of different godwits on a beach, the number of disease-causing spores affecting larval honey bees from multiple hives and the number of calls from owl chicks upon arrival of a parent. Another commonly encountered situation where non-independence must be addressed is when there is phylogenetic structure (i.e. dependence due to shared ancestry) within a data set.

There are many ways to include a temporal or spatial dependence structure in a model for analysis. These include using lagged response variables as covariates (Brockwell & Davis 2002), mixed effects modelling (Pinheiro & Bates 2000), imposing a residual correlation structure using generalized least squares (Zuur et al. 2009a) or allowing regression parameters to change over time (Harvey 1989). It is also possible to fit a model with and without a correlation structure, and compare the models using a selection criterion or hypothesis test (Pinheiro & Bates 2000). The presence of a dependence structure in the raw data may be modelled with a covariate such as month or temperature, or the inclusion of a smoothing function of time or a two-dimensional smoother of spatial coordinates (Wood 2006). Regardless of the method used, the model residuals should not contain any dependence structure. Quite often a residual correlation structure is caused by an important covariate that was not measured. If this is the case, it may not be possible to resolve the problem.

When using regression techniques, the independence assumption is rather important and violation may increase the type I error. For example, Ostrom (1990) showed that ignoring auto-correlation may give P-values that are 400% inflated.
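For a regularly spaced series, temporal dependence of the kind discussed in this step can be screened by correlating the series with lagged copies of itself (an auto-correlation function, ACF). The sketch below is illustrative only: the function name `acf` and the counts are hypothetical, and it uses the standard sample estimator (one overall mean and variance), which is close to, but not identical with, the Pearson correlation of the overlapping segments. The band of ±1.96/√n is the usual large-sample approximation for an approximately 95% band around zero, stated here as an assumption rather than taken from the text.

```python
import math

def acf(series, max_lag):
    """Sample auto-correlation at lags 0..max_lag for a regularly spaced series."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares; makes acf[0] equal to 1
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# Hypothetical counts at 12 equally spaced sampling occasions.
counts = [0, 0, 0, 0, 0, 0, 10, 10, 10, 10, 10, 10]
r = acf(counts, max_lag=3)
band = 1.96 / math.sqrt(len(counts))  # approximate 95% band around zero

for lag, value in enumerate(r):
    flag = "outside band" if lag > 0 and abs(value) > band else ""
    print(lag, round(value, 3), flag)
```

Lags whose auto-correlation falls outside the band suggest that values at time t carry information about values at earlier times, so the dependence should be dealt with in the model.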

Hence, it is important to check whether there is dependence in the raw data before doing the analysis, and also the residuals afterwards. These checks can be made by plotting the response variable vs. time or spatial coordinates. Any clear pattern is a sign of dependence. This approach is more difficult if there is no clear sequence to the observations (e.g. multiple observations on the same object), but in this case one can include a dependence structure using random effects (Pinheiro & Bates 2000; Fitzmaurice et al. 2004; Brown & Prescott 2006; Zuur et al. 2009a). Figure 12a,c shows a short time series illustrating the observed abundance of two bird species on a mudflat in Argentina over a 52 week period (E. Ieno, unpublished data). The first time series shows high numbers of white-rumped sandpipers Calidris fuscicollis during the first 20 weeks, followed by zeros (because the species migrates), and then an abundance increase again after 38 weeks. The second time series does not show a clear pattern in the abundance of kelp gulls (Larus dominicanus).

A more formal way to assess the presence of temporal dependence is to plot auto-correlation functions (ACF) for regularly spaced time series, or variograms for irregularly spaced time series and spatial data (Schabenberger & Pierce 2002). An ACF calculates the Pearson correlation between a time series and the same time series shifted by k time units. Figures 12b,d show the auto-correlation of the time series in panels (a) and (c). Panel (b) shows a significant correlation with a time lag of k = 1 and k = 2. This means that abundances at time t depend on abundances at time t − 1 and t − 2, and any of the methods mentioned above could be applied. For the L. dominicanus time series, there is no significant auto-correlation.

Fig. 12. (a) Number of Calidris fuscicollis plotted vs. time (1 unit = 2 weeks). (b) Auto-correlation function for the C. fuscicollis time series showing a significant correlation at time lags of 2 and 4 weeks (1 time lag = 2 weeks). (c) Number of Larus dominicanus vs. time. (d) Auto-correlation function for L. dominicanus showing no significant correlation. Dotted lines in panels (b) and (d) are c. 95% confidence bands. The auto-correlation with time lag 0 is, by definition, equal to 1. (Axes: abundance vs. Time (2 weeks) in panels a and c; ACF vs. Lag in panels b and d.)

Discussion

All of the problems described in this paper, and the strategies to address them, apply throughout ecological research, but they are particularly relevant when results are to be used to guide management decisions or public policy because of the repercussions of making a mistake. Increasing attention has been paid in recent years to the body of data supporting particular management practices (Roberts, Stewart & Pullin 2006; Pullin & Knight 2009), and applied ecologists have become increasingly sophisticated in the statistical methods that they use (e.g. Ellison 2004; Stephens et al. 2005; Robinson & Hamann 2008; Koper & Manseau 2009; Law et al. 2009; Sonderegger et al. 2009). But more fundamental questions about the appropriateness of the underlying data for a given analysis can be just as important to ensuring that the best policies are derived from ecological studies.

In this paper, we have discussed a series of pitfalls that can seriously influence the results of an analysis. Some of these problems are well known, some less so, but even the well-known assumptions continue to be violated frequently in the ecological literature. In all cases, the problems can lead to statistical models that are wrong. Such problems can be avoided only by applying a systematic data exploration before embarking on the analysis (Fig. 1).

Although we have presented our protocol as a linear sequence, it should be used flexibly. Not every data set requires each step. For example, some statistical techniques do not require normality (e.g. PCA), and therefore there is no point in making histograms. The best order to apply the steps may also depend on the specific data set. And for some techniques, assumptions can be verified only by applying data exploration steps after the analysis has been performed. For example, in linear regression, normality and homogeneity should be verified using the residuals produced by the model. Rather than simplistically following through the protocol, ticking off each point in order, we would encourage users to treat it as a series of questions to be asked of the data. Once satisfied that each issue has been adequately addressed in a way that makes biological sense, the data set should be ready for the main analysis.
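The point that some assumptions can be checked only after fitting can be illustrated with a simple regression. The sketch below is a crude numerical stand-in for a residuals-versus-covariate plot, not a formal test: it fits a line by least squares, computes the residuals and compares their spread in the lower and upper halves of the covariate range. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """Observed minus fitted values from the least-squares line."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def spread_ratio(x, res):
    """Mean squared residual in the upper half of x divided by that in the
    lower half; values far from 1 hint at heterogeneity of variance."""
    ordered = [r for _, r in sorted(zip(x, res))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[half:]
    mean_sq = lambda values: sum(v * v for v in values) / len(values)
    return mean_sq(upper) / mean_sq(lower)

# Hypothetical data: the scatter around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
y = [2 * a + e for a, e in zip(x, noise)]

res = residuals(x, y)
ratio = spread_ratio(x, res)
print(round(ratio, 1))
```

In practice a plot of the residuals against fitted values and against each covariate is more informative than a single ratio; the number above only shows that the check has to be made on the residuals rather than on the raw response.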

Ecological field data tend to be noisy, field conditions unpredictable and prior knowledge often limited. In the applied realm, changes in funding, policy, and research priorities further complicate matters. This situation is especially so for long-term studies, where the initial goals often change with circumstances (e.g. the use of many data sets to examine species responses to climate change). For all these reasons, the idealized situation whereby an ecologist carefully designs their analysis a priori and then collects data may be compromised or irrelevant. Having the analytical flexibility to adjust one's analyses to such circumstance is an important skill for an applied ecologist, but it requires a thorough understanding of the constraining assumptions imposed by a given data set.

When problems arise, the best solutions vary. Frequently, however, ecologists simply transform data to avoid assumption violations. There are three main reasons for a transformation: to reduce the effect of outliers (especially in covariates), to stabilize the variance and to linearize relationships. However, using more advanced techniques like GLS and GAMs, heterogeneity and nonlinearity problems can be solved, making transformation less important. Zuur et al. (2009a) showed how the use of a data transformation resulted in different conclusions about long-term trends compared to an appropriate analysis using untransformed data; hence it may be best to avoid transforming response variables. If a transformation is used, automatic selection tools such as Mosteller and Tukey's bulging rule (Mosteller & Tukey 1977) should be used with great caution because these methods ignore the effects of covariates. Another argument against transformations is the need to subsequently back-transform values to make predictions; it may not always be clear how to do this and still be able to interpret results on the original scale of the response variable. It is also important to ensure that the transformation actually solves the problem at hand; even commonly recommended transformations do not always work. The bottom line is that the choice of a specific transformation is a matter of trial and error.

It is a given fact that data exploration should not be used to define the questions that a study sets out to test. Every step of the exploration should be reported, and any outlier removed should be justified and mentioned. Reasons for data transformations need to be justified based on the exploratory analysis (e.g. evidence that model assumptions were violated and that the transformation rectified the situation). Applying data exploration (e.g. scatterplots to visualize relationships between response and explanatory variables) to create hypotheses and then using the same data to test these hypotheses should be avoided. If one has limited a priori knowledge, then a valid approach is to create two data sets; apply data exploration on the first data set to create hypotheses and use the second data set to test the hypotheses. Such a process, however, is only practical for larger data sets. Regardless of the specific situation, the routine use and transparent reporting of systematic data exploration would improve the quality of ecological research and any applied recommendations that it produces.

Acknowledgements

We thank Anatoly Saveliev, and two anonymous reviewers for comments on an earlier draft.

References

Brockwell, P.J. & Davis, R.A. (2002) Introduction to Time Series and Forecasting, 2nd edn. Springer-Verlag, New York.
Brown, H. & Prescott, R. (2006) Applied Mixed Models in Medicine, 2nd edn. John Wiley and Sons, New York.
Burnham, K.P. & Anderson, D.R. (2002) Model Selection and Multimodel Inference. A Practical Information–Theoretic Approach, 2nd edn. Springer, New York.
Cameron, A.C. & Trivedi, P.K. (1998) Regression Analysis of Count Data. Cambridge University Press, Cambridge, UK.
Carroll, R.J., Ruppert, D., Stefanski, L.A. & Crainiceanu, C.M. (2006) Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. Chapman & Hall, Boca Raton, FL.
Chatfield, C. (1998) Problem Solving: A Statistician's Guide. Chapman & Hall, Boca Raton, FL.
Cleveland, W.S. (1993) Visualizing Data. Hobart Press, Summit, NJ.
Draper, N.R. & Smith, H. (1998) Applied Regression Analysis, 3rd edn. John Wiley and Sons, New York.
Ellison, A.M. (2004) Bayesian inference in ecology. Ecology Letters, 7, 509–520.
Elphick, C.S. & Oring, L.W. (1998) Winter management of Californian rice fields for waterbirds. Journal of Applied Ecology, 35, 95–108.
Elphick, C.S. & Oring, L.W. (2003) Conservation implications of flooding rice fields on winter waterbird communities. Agriculture, Ecosystems and Environment, 94, 17–29.
Fitzmaurice, G.M., Laird, N.M. & Ware, J.H. (2004) Applied Longitudinal Analysis. John Wiley & Sons, Hoboken, NJ.
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, 2nd edn. Sage Publications, CA.
Gelman, A., Pasarica, C. & Dodhia, R. (2002) Let's practice what we preach: turning tables into graphs in statistic research. The American Statistician, 56, 121–130.
Gjerdrum, C., Elphick, C.S. & Rubega, M. (2005) What determines nest site selection and nesting success in saltmarsh breeding sparrows? Condor, 107, 849–862.
Gjerdrum, C., Elphick, C.S. & Rubega, M.A. (2008) How well can we model numbers and productivity of saltmarsh sharp-tailed sparrows (Ammodramus caudacutus) using habitat features? Auk, 125, 608–617.
Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge, UK.
Hilbe, J.M. (2007) Negative Binomial Regression. Cambridge University Press, Cambridge, UK.
Hurlbert, S.H. (1984) Pseudoreplication and the design of ecological field experiments. Ecological Monographs, 54, 187–211.
Jolliffe, I.T. (2002) Principal Component Analysis, 2nd edn. Springer, New York.
Koper, N. & Manseau, M. (2009) Generalized estimating equations and generalized linear mixed-effects models for modelling resources selection. Journal of Applied Ecology, 46, 590–599.
Läärä, E. (2009) Statistics: reasoning on uncertainty, and the insignificance of testing null. Annales Zoologici Fennici, 46, 138–157.
Law, R., Illian, J., Burslem, D.F.R.P., Gratzer, G., Gunatilleke, C.V.S. & Gunatilleke, I.A.U.N. (2009) Ecological information from spatial patterns of plants: insights from point process theory. Journal of Ecology, 97, 616–628.
Legendre, P. & Legendre, L. (1998) Numerical Ecology. Second English Edition. Elsevier, Amsterdam.
Montgomery, D.C. & Peck, E.A. (1992) Introduction to Linear Regression Analysis. Wiley, New York.
Morgan, J.H. (2004) Remarks on the taking and recording of biometric measurements in bird ringing. The Ring, 26, 71–78.
Mosteller, F. & Tukey, J.W. (1977) Data Analysis and Regression: A Second Course in Statistics. Addison Wesley, Reading, MA.
Ostrom, C.W. (1990) Time Series Analysis: Regression Techniques, 2nd edn. Sage Publications Inc, Thousand Oaks/Newbury Park, CA.
Pinheiro, J. & Bates, D. (2000) Mixed Effects Models in S and S-Plus. Springer-Verlag, New York.
Pullin, A.S. & Knight, T.M. (2009) Doing more good than harm – building an evidence-base for conservation and environmental management. Biological Conservation, 142, 931–934.

Quinn, G.P. & Keough, M.J. (2002) Experimental Design and Data Analysis for Biologists. Cambridge University Press, Cambridge, UK.
R Development Core Team (2009) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0. URL http://www.R-project.org.
Roberts, P.D., Stewart, G.B. & Pullin, A.S. (2006) Are review articles a reliable source of evidence to support conservation and environmental management? A comparison with medicine. Biological Conservation, 132, 409–423.
Robinson, A.P. & Hamann, J.D. (2008) Correcting for spatial autocorrelation in sequential sampling. Journal of Applied Ecology, 45, 1221–1227.
Sarkar, D. (2008) Lattice: Multivariate Data Visualization with R. Springer, New York.
Schabenberger, O. & Pierce, F.J. (2002) Contemporary Statistical Models for the Plant and Soil Sciences. CRC Press, Boca Raton, FL.
Sonderegger, D.L., Wang, H., Clements, W.H. & Noon, B.R. (2009) Using SiZer to detect thresholds in ecological data. Frontiers in Ecology and the Environment, 7, 190–195.
Stephens, P.A., Buskirk, S.W., Hayward, G.D. & Martínez del Rio, C. (2005) Information theory and hypothesis testing: a call for pluralism. Journal of Applied Ecology, 42, 4–12.
ter Braak, C.J.F. & Verdonschot, P.F.M. (1995) Canonical correspondence analysis and related multivariate methods in aquatic ecology. Aquatic Science, 57, 225–289.
Wood, S.N. (2006) Generalized Additive Models. An Introduction with R. Chapman Hall/CRC, Boca Raton, FL.
Zuur, A.F., Ieno, E.N. & Smith, G.M. (2007) Analysing Ecological Data. Springer, New York.
Zuur, A.F., Ieno, E.N., Walker, N.J., Saveliev, A.A. & Smith, G.M. (2009a) Mixed Effects Models and Extensions in Ecology with R. Springer, New York.
Zuur, A.F., Ieno, E.N. & Meesters, E.H.W.G. (2009b) A Beginner's Guide to R. Springer, New York.

Received 13 August 2009; accepted 8 October 2009
Handling Editor: Robert P. Frecklenton

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Appendix S1. Data sets and R code used for analysis.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
