Beruflich Dokumente
Kultur Dokumente
To cite this article: Rosana Aguilera, David M. Livingstone, Rafael Marcé, Eleanor Jennings,
Jaume Piera & Rita Adrian (2016) Using dynamic factor analysis to show how sampling resolution
and data gaps affect the recognition of patterns in limnological time series, Inland Waters, 6:3,
284-294, DOI: 10.1080/IW-6.3.948
Article
Abstract
Currently, many water quality variables can be automatically monitored at subdaily frequencies, but most environmen-
tal datasets still rely heavily on comparatively low-frequency (e.g., monthly) sampling campaigns. Taking advantage of
the long-term ecological data available from polymictic Lake Müggelsee (Germany) at weekly temporal scales, we
tested whether dynamic factor analysis (DFA), a dimension-reduction technique especially designed for time series
with data gaps, was able to reproduce the same underlying patterns if the resolution of the observations was artificially
reduced from weekly to once every 2 or 4 weeks (2-weekly or 4-weekly, respectively) and whether the same results
were obtained for different variables (water temperature, water chemistry, and plankton). As expected, the reduction in
data resolution and the addition of gaps increased the amount of variance in our case studies; however, water
temperature patterns, which generally vary only slowly if considered at broader temporal scales, were represented well
in the 4-weekly data model. By contrast, total algal biomass and nutrient dynamics, which usually fluctuate more
rapidly, were not. We further assessed the representativeness of spot samples taken by selection at 4-weekly and
seasonal intervals compared to samples obtained by averaging the weekly data over these 2 intervals. In this case, the
resulting observation variance was more pronounced in DFA models of the samples taken by selection than of those
obtained by averaging. The introduction of additional artificial gaps in one univariate and one multivariate model
decreased the amount of information that could be explained by the pattern(s) extracted by DFA. Overall, we found that
a too low sampling frequency could be a limiting factor when studying interactions and responses to changing
conditions in aquatic systems, particularly when biological variables are involved.
Key words: dynamic factor analysis, Müggelsee, sampling resolution, time series
than the Nyquist frequency is interpreted as appearing at wavelet Z-transform (WWZ) method can handle unevenly
frequencies lower than the Nyquist frequency, which sampled data, but large gaps in the data would complicate
results not only in a loss of information about the system the identification of temporal patterns (Foster 1996).
but also distorts the information obtained. In limnology, Another example is Seasonal Kendall trend analysis,
these signals might represent processes and interactions which can cope with missing values (Hirsch et al. 1982).
between variables within a given ecosystem or region. Nevertheless, commonly used methods such as spectral
Processes in lakes are complex, involving overlapping analysis, wavelet analysis, and Box-Jenkins modeling
temporal and trophic-level scales. Thus, when measuring usually require time series to be stationary and evenly
ecological variables in lakes, the sampling frequency and sampled (Zuur et al. 2003a). In addition, these methods are
the lengths of the time series obtained are critical for not particularly suitable for unraveling patterns and inter-
capturing and understanding patterns of temporal evolution actions between variables over time.
in these variables. These patterns include (but are not Dynamic factor analysis (DFA) is a dimension-reduc-
limited to) long-term trends and seasonal variability, which tion method designed for time series (Zuur et al. 2003a)
can usually be identified using lower frequency data (e.g., used mainly to estimate underlying common patterns in a
monthly) as well as high-frequency fluctuations, which set of time series. In this method, observations are related
typically require finer data resolution. to the hidden patterns or states by the state-space model.
In the past few decades, studies relying mainly on This framework provides a means for modeling nonlinear
weekly to once every 4 week (4-weekly) spot samples effects and structural change in time series and is able to
(Halliday et al. 2012) have revealed the complex nature of cope with missing observations (Harvey 1989). DFA has
ecosystems response to environmental change and the been used widely in econometric and psychological studies
time scales involved (review in Adrian et al. 2012). Such and has recently been applied to fisheries datasets (Zuur et
studies have encouraged the measurement of long-term, al. 2003b, Zuur and Pierce 2004, Vilizzi 2012), groundwa-
high-frequency datasets (Parr et al. 2003). The recent ter-quality data (Muñoz-Carpena et al. 2005, Kuo et al.
development and deployment of high-frequency sensor 2013), and river water-quality data (Aguilera et al. 2015).
monitoring equipment and the increasing duration of ob- To address the issue of the temporal scale of sampling
servational time series available from lakes have in turn and gaps in limnological time series, we took advantage
allowed more detailed study of lake responses and of an ongoing long-term research program at Müggelsee,
processes (Jennings et al. 2012, Solomon et al. 2013, Rose a shallow, polymictic lake in Berlin, Germany, in which
et al. 2014). These technological developments also data on physicochemical variables and plankton are
guarantee that the sampling frequency will normally collected weekly. We chose a shallow, polymictic lake
exceed the Nyquist frequency, avoiding any numerical because this type of lake is known to be more strongly
artifacts associated with aliasing. influenced by short-term ambient weather conditions than
Despite such advances, most limnological datasets are are dimictic lakes (Gerten and Adrian 2000). The ultimate
still prone to uneven sampling intervals and missing ob- goal of this study was to assess the potential usefulness of
servations. These sampling campaigns are, in most cases, DFA in analyzing patterns in limnological time series
subject to logistical and financial limitations and often do with differing temporal sampling resolutions and data
not take into account the ideal frequency requirements gaps. To achieve this, we specified 4 objectives. (1) We
necessary to obtain adequate sample representativeness of tested whether DFA was able to reproduce the same
the system being studied. These commonly encountered underlying patterns in the data when the sampling
problems can represent a limitation in the study of the resolution, in univariate analyses, was reduced from
underlying patterns in ecosystem responses to climatic and weekly to 4-weekly. (2) We compared and contrasted
anthropogenic forcing. analyses of physicochemical and biological variables
Irregularities in sampling frequency and the presence because these 2 types of variables may differ with regard
of data gaps are often the first challenges encountered to their frequency spectra. (3) We aimed to determine
when selecting an appropriate method for analyzing time whether the complex dynamics of a set of time series
series. Consequently, the choice of analytical methodology with well-known interactions could be described using
becomes crucial to taking advantage of all available DFA and differing sampling frequencies. (4) Finally, we
information. Of the various time-series analysis techniques assessed the ability of DFA to cope with missing observa-
that can be used to study responses and patterns, few are tions in 2 ways: (a) by introducing artificial gaps into the
able to cope with these challenges. For instance, Schoell- time series and investigating the effect of this on the
hamer (2001) applied a modified singular-spectrum patterns extracted; and (b) by resampling the available
analysis algorithm to obtain spectral estimates from weekly data to obtain lower-frequency (i.e., seasonal and
records with a large fraction of missing data. The weighted 4-weekly) datasets.
Table 1. Time period and number of observations for the Müggelsee variables included in the study. (Note that for pH the number of observa-
tions was substantially lower than for the other variables).
Number of observations Date
Variable Weekly 2-weekly 4-weekly Start End
Water temperature 1748 874 437 24 Jul 1978 16 Jan 2012
Dissolved oxygen 1722 861 431 22 Jan 1979 16 Jan 2012
pH 1027 514 257 18 May 1992 16 Jan 2012
Ammonium 1721 861 431 01 Jan 1979 19 Dec 2011
Total algal biomass 1721 861 431 01 Jan 1979 19 Dec 2011
Copepod species case study 1614 807 404 21 Jan 1980 20 Dec 2010
formulation, and thus DFA, also allows a univariate time
Methods series to be decomposed into a pattern and an error term.
DFA was performed using the Multivariate Autoregres-
Study site and time-series data sive State-Space (MARSS) R-package 3.4 (Holmes et al.
2013), run in R 2.15.1 (R Core Team 2012). Each time series
Müggelsee is a shallow, polymictic lake in Berlin was standardized by subtracting its mean and dividing by its
(Germany) with a surface area of 7.3 km2, a mean depth of standard deviation to facilitate the interpretation and
4.9 m, and a water retention time of ~6–8 weeks. Data comparison of resulting patterns and factor loadings. The
collection in the lake has been undertaken weekly (from basis of the DFA formulation in MARSS includes a process
spring to autumn) or once every 2 weeks (2-weekly; during model (x) and an observation model (y):
winter) since 1979 (Köhler et al. 2005). A comprehensive
description of the characteristics of Müggelsee and its sur- xt = xt-1 + wt wt ~ MVN(0,Q)
roundings can be found in Driescher et al. (1993). Physico- yt = Zxt + vt vt ~ MVN(0,R). (1)
chemical and biological variables from the Müggelsee
dataset were selected to carry out DFA. Variables included The observations (yt) are modeled as a linear combination of
water temperature, dissolved oxygen (DO), and pH (all hidden patterns (xt) and factor loadings (Z; Holmes et al.
measured at 1 m depth), nutrient concentrations 2012). The terms wt and vt describe the errors associated with
(ammonium), and the abundances of phytoplankton (both the hidden patterns and the observations, respectively,
taken from the upper 4 m of the water column) and normally distributed with expectation 0 and with covariance
zooplankton (taken from the entire water column). A matrices Q and R. The Q matrix is set to an identity matrix,
detailed description of sampling and sample processing is and the R matrix is usually taken to be diagonal. The diagonal
given in Driescher et al. (1993) and in Gerten and Adrian values in the R matrix contain the variances of the time series
(2000). involved, and the off-diagonal values contain the covariances
The time spans covered by the time series ranged from between the time series. The MARSS algorithm uses the
~19 yr (pH) to 33 yr (water temperature; Table 1). The finest Kalman filter and smoother to compute the best estimate for
resolution was weekly. The reduction in resolution from the hidden patterns, given a set of assumed values for the
weekly to 2-weekly and 4-weekly was accomplished by unknown parameters Z, Q, and R. The output of the Kalman
simple decimation (i.e., by selecting every second or every filter is then used to compute the likelihood of the data given
fourth value from the original weekly dataset). the better estimates for the set of parameters. MARSS
estimates the maximum likelihood using the expectation–
Dynamic factor analysis (DFA) maximization (EM) method (Holmes 2013). The number of
iterations necessary is determined by the log-likelihood of
Dynamic factor analysis is based on structural time-series reaching convergence (slope tolerance for log-log test = 0.1;
models viewed from within the state-space model framework absolute tolerance test = 0.001).
(Harvey 1989). In DFA, all common patterns in a set of time A considerable advantage of the state-space approach is
series are estimated simultaneously using the maximum the ease of addressing missing observations (Durbin and
likelihood method. These patterns are modeled as random Koopman 2012). Holmes (2013) performed a series of
walks and thus are allowed to be stochastic, meaning that modifications in the EM update equations for missing
their shape can change over time and is not necessarily values in MARSS models, based on the work of Shumway
restricted to a straight-line trend or a cosine-function cyclic or and Stoffer (1982, 2010) and Zuur et al. (2003a), and
seasonal component (Zuur et al. 2007). The state-space presented a thorough explanation of the methodology. The
main disadvantage of DFA is its intensive computations, Effect of gaps on pattern extraction in time-series analysis
translating into long computational times ranging from Gaps were introduced in one representative univariate
hours to days and weeks, depending on the complexity of time series (total algal biomass) and in the multivariate
the model. In our study, the High-Performance Computing copepod species case described earlier (Scharfenberger et
Cluster Undarius allowed the simultaneous computation of al. 2013). Gaps were systematically introduced by
multiple independent analyses. Nevertheless, the deleting 3 months of available data per year. The deleted
computation time for the multivariate analyses (based on data consisted of a large gap comprising 2 consecutive
the copepod species and cryptophytes case study described spring months (Apr and May) and a shorter gap in winter
below) ranged from 1 to 4 weeks. (Dec). Three different analyses with different sampling
resolutions (weekly, 2-weekly, and 4-weekly) were
Effect of sampling resolution performed to determine changes to the observed patterns
(previous models without artificial gaps) when data gaps
Univariate time-series analysis are larger or evenly distributed across time (i.e.,
Univariate time-series analyses were performed so that systematic missing observations at specific times of the
a pattern plus an error term would be extracted from year). In the copepod species case study, gaps were
each variable along a trophic hierarchy (water introduced only in the driving variable (i.e., cryptophyte
temperature, nutrients, DO, pH, phytoplankton, availability).
zooplankton) and at different data sampling resolutions
(weekly, 2-weekly, and 4-weekly). Thus, not only the Effect of sampling representativeness
effect of differences in the temporal resolution of the Using the same univariate time series (total algal
time-series data was assessed, but also the behavior of biomass) and multivariate copepod species examples
different system levels across the trophic hierarchy, mentioned earlier, we compared the results of DFA
which is affected by biotic interactions and thus models for 4-weekly and seasonal observations taken at
typically incorporates processes of both high-frequency those specific intervals (i.e., discrete, equidistant data
and low-frequency variability. points measured every 4 weeks and every season, respec-
tively) with the results of models that averaged the
Multivariate time-series analysis weekly data over 4-weekly and seasonal intervals. This
A well-established and well-studied relationship simple exercise aimed to assess how representative a
between variables measured in Müggelsee was selected sample could be in the case of discrete, low-frequency
for the application of DFA to a set of relevant time sampling compared to the case in which all available
series. The case study dealt with the abrupt increase in information (i.e., weekly data) was used by simply
the abundance of a copepod species, Cyclops kolensis, averaging the observations over the 4-weekly or seasonal
when the abundance of a competing, larger species, intervals.
Cyclops vicinus, fell below a critical threshold triggered
by, for example, a change in the abundance of crypto- DFA model evaluation
phytes, a major food source for the juvenile stages of Comparing weekly, 2-weekly, and 4-weekly data-resolu-
both Cyclops species (Scharfenberger et al. 2013). tion models might not be straightforward because these
A set of models was evaluated for the copepod models seem to extract different temporal patterns,
species case study to test different numbers of common including long-term trends and seasonal and other cyclic
patterns (m) ranging from 1 to n − 1 (where n is the patterns, although the patterns extracted might vary
number of time-series involved in a particular analysis, depending on the type of variable. The resulting models
in this case 3) as well as different structures for the for differing data resolutions can first be assessed
covariance matrix R. Because the number of model visually; because DFA is analogous to linear regression,
parameters increases drastically when using an uncon- the same tools can be applied for model validation (Zuur
strained R matrix, we assumed the time series involved et al. 2007). The first tool plots the resulting best fits to
in our multivariate analyses had either an equal the observed values to identify the extent to which the
(diagonal and equal) or different (diagonal and unequal) model can capture the patterns in the original time series.
observation variance but no covariance between time The MARSS R-package assesses different model fits
series. The best model was then selected based on the by means of AIC. AIC was used in the copepod species
Akaike information criterion (AIC; Akaike 1974) as case study, where the best model fit with m patterns had
well as on model inspection of plotted patterns versus to be selected from among a set of different models. In
observed values and a visualization of the magnitudes addition, factor loadings and values in the covariance
and signs of factor loadings. matrix R (i.e., the observation error matrix containing
Results
Fig. 1. Pattern (line) derived from dynamic factor analysis fitted to Fig. 3. Patterns (lines) derived from dynamic factor analyses fitted to
the standardized 4-weekly data points (dots) for near-surface water the standardized (a) weekly and (b) 4-weekly data points (dots) for
temperature in Müggelsee. dissolved oxygen concentration in Müggelsee.
Reducing sampling resolution in multivariate of Pattern 1. The diagonal values in the R matrix indicated
time-series analysis that the selected model satisfactorily explained the
variability in the time series of the 2 copepod species (0.54
In the case study of the coexisting copepod species C. for C. vicinus and 0.12 for C. kolensis) but not the
vicinus and C. kolensis and the cryptophyte biomass variability in the cryptophyte biomass time series, which
(Scharfenberger et al. 2013), model selection based on AIC had a high value of unexplained variance in the covariance
implied that both weekly and 2-weekly data were best matrix R (0.76).
described by 2 common patterns and a diagonal and The 4-weekly data yielded one common pattern; C.
unequal covariance matrix. According to the factor vicinus and the cryptophyte biomass followed a decreasing
loadings, C. vicinus and the cryptophyte biomass (Fig. 5b) trend with time, and C. kolensis followed an increasing
seemed to be best described by Pattern 1, which decreased trend related to a negative factor loading (dotted line in Fig.
across time. By contrast, C. kolensis (Fig. 5b) was best 6a). The factor loadings of the 4-weekly resolution model
characterized by Pattern 2, which tended to be the inverse (Fig. 6b) were about one order of magnitude lower than
those in the weekly and 2-weekly resolution models.
Overall, the variance reflected in the 4-weekly resolution
model (average diagonal values in R matrix = 0.92) was
about twice that in the weekly model, indicating that a
significant amount of information was lost by using
4-weekly biomass data rather than data of higher resolution.
Fig. 5. The best dynamic factor analysis model for the weekly data
of the copepod species case study yielded 2 common patterns. (a)
Temporal variability of the 2 extracted patterns. (b) Pattern 1 (dark
Fig. 4. Patterns (lines) derived from dynamic factor analyses fitted to grey) mainly describes the time-series of C. vicinus and crypto-
the standardized (a) weekly, (b) 2-weekly, and (c) 4-weekly data phytes, whereas Pattern 2 (black) describes the temporal variability
points (dots) for total algal biomass in Müggelsee. of the abundance of C. kolensis, based on factor loading magnitudes.
Table 2. Amount of unexplained information in univariate and multivariate DFA models for data of weekly, 2-weekly, and 4-weekly sampling
resolution with and without artificially introduced gaps.
Weekly
2-weekly 4-weekly
DFA Model m R m R m R
Total algal biomass (no artificial gaps) 1 0.25 1 0.51 1 0.84
Total algal biomass (with artificial gaps) 0.35 0.65 0.88
Copepod abundances (no artificial gaps) 2 0.54 2 0.57 1 0.92
Copepod abundances (with artificial gaps) 0.78 0.82 0.93
m = number of patterns; R = amount of unexplained information (variance value in covariance matrix). Copepod
abundance data from Scharfenberger et al. (2013).
covariance matrix) in the weekly resolution model with weekly and 2-weekly data resolution models, respectively
artificial gaps increased by 40% (weekly data) compared to (Table 2). The 4-weekly model increased the amount of
the previous weekly resolution model without such gaps unexplained information by only 1.1% because the initial
(Table 2). The increase in unexplained information was variance was already high (from R = 0.84 to 0.88). The
lower in the case of the 2-weekly (27.5%) and 4-weekly total length of gaps introduced in all 3 temporal resolution
(4.8%) resolution models. These sampling resolutions cases decreased the number of observations by an average
initially yielded a large amount of unexplained information of 24.6% (range 24.3–25.0%) compared to the correspond-
in the previous models (0.51 and 0.84; Table 2), and ing models of the same case study without artificial gaps.
therefore the difference in the corresponding artificial gap Despite these differences, the extracted patterns and
models was not as drastic as in the case of weekly data. resulting best fits for models with and without artificial
Nevertheless, the explanatory power of the 2-weekly and gaps did not differ substantially from one another; in other
4-weekly data was much lower than that of the weekly words, the introduction of gaps in the cryptophyte time
data. series did not substantially alter the shape and behavior of
In the multivariate copepod species model, the the patterns extracted by DFA.
covariance R value increased by 44.4% and 43.9% for the
Unexplained information and sampling
representativeness
Discussion
Table 3. Amount of unexplained information in univariate and multivariate DFA models with differing resampling strategies (i.e., data
selection as opposed to data averaging).
4-weekly Seasonal
DFA Model m R m R
Total algal biomass (data selection) 0.84 0.81
1 1
Total algal biomass (data averaging) 0.78 0.64
Copepod abundances (data selection) 0.92 0.90
1 1
Copepod abundances (data averaging) 0.88 0.81
m = number of patterns; R = amount of unexplained information (variance value in covariance
matrix). Copepod abundance data from Scharfenberger et al. (2013).
scales. For instance, automated in situ measurements seasonality. This was not the case for the pH and DO
such as those of the Global Lake Ecological Observatory models, which presumably show more high-frequency
Network (GLEON) or Networking Lake Observatories variability than does water temperature when considered
in Europe (NETLAKE) operate on subhourly temporal at broader temporal scales. The information contained in
scales (e.g., Hamilton et al. 2014). Most environmental the time series of ammonium concentration and total
datasets, however, rely heavily on lower-frequency algal biomass, however, was best explained by the
sampling campaigns such as that conducted at weekly data model. The aliasing effect caused by under-
Müggelsee. Furthermore, time series usually contain sampling is much stronger in processes with a significant
gaps of different lengths, distributed irregularly with power of high frequency components. Algal growth
time. The role of the temporal resolution of long-term (Reynolds 1984) and nutrient dynamics operate on
data compared with the distribution of gaps determines temporal scales of days or less (Lampert and Sommer
what kind of questions can be addressed to better 2007), indicating the need, in these cases, for a higher
understand system dynamics and their response to envi- sampling frequency to avoid or reduce the effect of
ronmental change in the long term. Here, we addressed aliasing.
the question of the temporal resolution needed to capture Overall, in the univariate time-series analyses of the
the variability in driving and response variables or the chemical and biological variables, we observed that
interaction between them. Within an ecological context, 4-weekly resolution might not suffice to describe the
the ability to deal with irregular sampling data in variability that occurs within short time windows.
univariate or multivariate time series would be Addressing temporal scale is especially relevant in
considered an important improvement in time-series polymictic lakes, which, unlike dimictic and monomictic
analysis methods (Cazelles et al. 2008). DFA, which lakes, are subject to frequent changes in thermal regime
provides the means to analyze time series with varying during summer. In Müggelsee, for instance, although 79%
sampling resolutions and missing observations, of all stratification events last <1 day, the frequency and
succeeded in capturing general long-term trends in period of stratification are predicted to increase (Wilhelm
physical, chemical, and biological limnological variables and Adrian 2008). This change in stratification has
but failed to capture the seasonal variability and potential consequences for plankton development in the
associated interactions between driving and response lake; stratification periods exceeding 3 weeks, for instance,
variables, particularly when biological data were have been shown to favor the proliferation of cyanobacte-
involved. ria (Wagner and Adrian 2009), and severe climate-related
changes in plankton diversity have occurred on temporal
Sampling resolution reduction in scales of 2–3 weeks (Wagner and Adrian 2011).
physicochemical and biological variables
Sampling resolution reduction in multivariate
Predictably, decreasing the data resolution from weekly analysis
to 4-weekly lowered the capability of the DFA models to
explain the underlying short-term patterns; however, the In the multivariate copepod species case, the best DFA
decrease in the amount of unexplained information models usually yielded fewer common patterns for the
varied. In the case of an abiotic variable such as near- 4-weekly data resolution, indicating a reduced capability
surface water temperature, which is usually dominated of the models to capture interactions apparent in data
by low-frequency variability, 4-weekly data were sampled at higher resolution (i.e., weekly and 2-weekly).
sufficient to capture the main long-term pattern and The 4-weekly data were thus unable to capture the
underlying common patterns among the 3 interacting larger portion of the available information, whereas the
time series of copepod species abundances and practice of selecting observations at specific target
cryptophyte biomass. Note that the overall long-term intervals excludes much of the available information. We
pattern in copepod abundances seemed to coincide with acknowledge that the weekly sampling frequency in this
the results of Scharfenberger et al. (2013), who stated study might be below the Nyquist frequency for many
that the decline in C. vicinus followed the driver biotic and abiotic processes in limnology. Despite this
threshold scenario of regime shift theory (Andersen et al. limitation, however, we strongly recommend that the
2009), caused by an abrupt change in a driver (in this sampling frequency be tailored to include the potentially
case the decline in cryptophyte prey availability), distorting effects of aliasing when designing monitoring
whereas concomitantly C. kolensis benefited from this schemes. Also noteworthy is that, although high-fre-
situation and increased in abundance in the decades quency (e.g., daily, hourly) measurements of biotic
following the shift event. The failure to capture the variables would be preferred to guarantee sampling rep-
dynamics in zooplankton development and interactions resentativeness, most measurements of biotic variables
critically depends on species-specific life history traits. rely on spot-sample campaigns, which are undertaken at
Fast-growing species such as cladocerans and rotifers lower frequencies.
exhibit life-history durations of only a couple of days Several studies have suggested that long-term routine
compared to >4 weeks for copepods (Adrian et al. 2006). monitoring, even at low sampling frequencies, is
Thus, the population dynamics of cladocerans and important with respect to slow ecological processes, rare
rotifers are more easily missed if sampling is at low or episodic phenomena, highly variable processes, and
temporal resolution. Nevertheless, for copepods with subtle or complex phenomena. For instance, monthly
long and complex life histories, entire generations can be averaged stream nitrate data in the tropical forest of the
missed if sampling intervals are in the range of a month Luquillo Mountains (Puerto Rico) allowed researchers to
(Adrian et al. 2006). Despite the relative dearth of quantify the magnitude of the impact of Hurricane Hugo
information obtained by analyzing 4-weekly observa- on water chemistry and to assess the speed at which the
tions, DFA could still extract long-term, general patterns system approached background conditions after the
and, in some cases, intraannual and seasonal patterns in hurricane hit the island in 1989 (Schaefer et al. 2000,
our data (e.g., Fig. 4c and 6a). Dodds et al. 2012). In an analysis of watershed charac-
teristics, Halliday et al. (2012) found that 4-weekly data
Dealing with data gaps and sampling allowed the analysis of long-term trends and provided
representativeness new insights into the changing amplitude and phase of
the seasonality of water quality variables. Furthermore,
The introduction of additional artificial gaps into the data they observed that low-frequency (e.g., monthly) data
used for the univariate total algal biomass models and the provided background information on their study basin
multivariate copepod species case study models decreased and on the processes occurring necessary to interpret the
the amount of information that could be explained by the complexity of the high-frequency data.
pattern(s) extracted by DFA (Table 2). The increased Overall, low-frequency data allow the detection of
observation variance, however, varied according to the long-term trends and seasonality in those cases in which
temporal resolution of the data and was more pronounced the power in the low-frequency components of the
in the weekly data models in both the univariate and mul- processes is much higher than that in the high-frequency
tivariate cases. In real terms, the introduction of gaps for 2 components, specifically in cases in which the aliasing
consecutive spring months would be especially critical in effect does not significantly affect characterization of the
real sampling programs for most biological variables such large-scale processes. High-frequency data are key in the
as algal or cladoceran biomass because spring is a key identification of extremes, short-term trends, and
time of year when rapid, short-term changes in external subdaily variability in limnological variables. High-fre-
driving variables such as air temperature and day length quency data are also needed when the power in the high-
usually take place. Nevertheless, even in winter (a season frequency components is comparable to that in the low
much neglected because of the notion that not much frequency components; in this case the aliasing effect
happens), variability in the phytoplankton can be high at caused by undersampling is much stronger. A
short temporal scales (Özkundakci et al. 2016). All of this compromise between financial and logistical resources
was reflected in the loss of information found in the and sample representativeness must be reached to
weekly models with data gaps. maximize the information obtained from the observa-
Intuitively, averaging the highest-frequency observa- tions taken at the chosen sampling frequency.
tions to rescale the data to a lower frequency retains a
Schoellhamer DH. 2001. Singular spectrum analysis for time series Wilhelm S, Adrian R. 2008. Impact of summer warming on the thermal
with missing data. Geophys Res Lett. 28:3187–3190. characteristics of a polymictic lake: consequences for oxygen,
Shumway RH, Stoffer DS. 1982. An approach to time series smoothing nutrients and phytoplankton. Freshwater Biol. 53:226 –237.
and forecasting using the EM algorithm. J Time Ser Anal. 3:253–264. Zayed A. 1993. Advances in Shannon’s sampling theory. Taylor &
Shumway RH, Stoffer DS. 2010. Time series analysis and its applica- Francis.
tions: with R examples. New York (NY): Springer. Zuur AF, Fryer RJ, Jolliffe IT, Dekker R, Beukema JJ. 2003a.
Solomon CT, Bruesewitz DA, Richardson DC, Rose KC, Van de Estimating common trends in multivariate time series using dynamic
Bogert MC, Hanson PC, et al. 2013. Ecosystem respiration: drivers factor analysis. Environmetrics. 14:665–685.
of daily variability and background respiration in lakes around the Zuur AF, Tuck ID, Bailey N. 2003b. Dynamic factor analysis to
globe. Limnol Oceanogr. 58:849–866. estimate common trends in fisheries time series. Can J Fish Aquatic
Vilizzi L. 2012. Abundance trends in floodplain fish larvae: the role of Sci. 60:542–552.
annual flow characteristics in the absence of overbank flooding. Fund Zuur AF, Pierce GJ. 2004. Common trends in northeast Atlantic squid
Appl Limnol Arch Hydrobiol. 181:215–227. time series. J Sea Res. 52:57–72.
Wagner C, Adrian R. 2009. Cyanobacteria dominance: quantifying the Zuur AF, Ieno EN, Smith GM. 2007. Analysing ecological data (Vol.
effects of climate change. Limnol Oceanogr. 54:2460–2468. 680). New York: Springer.
Wagner C, Adrian R. 2011. Consequences of changes in thermal regime
for plankton diversity and trait composition in a polymictic lake: a
matter of temporal scale. Freshwater Biol. 56:1949 –1961.