
Inland Waters

ISSN: 2044-2041 (Print) 2044-205X (Online) Journal homepage: http://www.tandfonline.com/loi/tinw20

Using dynamic factor analysis to show how sampling resolution and data gaps affect the
recognition of patterns in limnological time series

Rosana Aguilera, David M. Livingstone, Rafael Marcé, Eleanor Jennings, Jaume Piera & Rita Adrian

To cite this article: Rosana Aguilera, David M. Livingstone, Rafael Marcé, Eleanor Jennings,
Jaume Piera & Rita Adrian (2016) Using dynamic factor analysis to show how sampling resolution
and data gaps affect the recognition of patterns in limnological time series, Inland Waters, 6:3,
284-294, DOI: 10.1080/IW-6.3.948

To link to this article: https://doi.org/10.1080/IW-6.3.948

Published online: 02 Jan 2018.


Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=tinw20

Article

Using dynamic factor analysis to show how sampling resolution and data gaps affect the recognition of patterns in limnological time series
Rosana Aguilera,1*† David M. Livingstone,2 Rafael Marcé,1 Eleanor Jennings,3 Jaume Piera,4 and Rita Adrian5,6
1 Catalan Institute for Water Research (ICRA), Girona, Spain
2 Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
3 Department of Applied Sciences, Dundalk Institute of Technology, Dundalk, Ireland
4 Physical and Technological Oceanography Department, Institute of Marine Sciences (ICM-CSIC), Barcelona, Spain
5 Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
6 Free University Berlin, Department of Biology, Chemistry and Pharmacy, Berlin, Germany

† Current address: Marine Science Institute, University of California Santa Barbara, Santa Barbara, CA, USA
* Corresponding author: raguilera@icra.cat

Received 11 November 2015; accepted 23 February 2016; published 17 May 2016

Abstract

Currently, many water quality variables can be automatically monitored at subdaily frequencies, but most environmental
datasets still rely heavily on comparatively low-frequency (e.g., monthly) sampling campaigns. Taking advantage of
the long-term ecological data available from polymictic Lake Müggelsee (Germany) at weekly temporal scales, we
tested whether dynamic factor analysis (DFA), a dimension-reduction technique especially designed for time series
with data gaps, was able to reproduce the same underlying patterns if the resolution of the observations was artificially
reduced from weekly to once every 2 or 4 weeks (2-weekly or 4-weekly, respectively) and whether the same results
were obtained for different variables (water temperature, water chemistry, and plankton). As expected, the reduction in
data resolution and the addition of gaps increased the amount of unexplained variance in our case studies; however, water
temperature patterns, which generally vary only slowly if considered at broader temporal scales, were represented well
in the 4-weekly data model. By contrast, total algal biomass and nutrient dynamics, which usually fluctuate more
rapidly, were not. We further assessed the representativeness of spot samples taken by selection at 4-weekly and
seasonal intervals compared to samples obtained by averaging the weekly data over these 2 intervals. In this case, the
resulting observation variance was more pronounced in DFA models of the samples taken by selection than of those
obtained by averaging. The introduction of additional artificial gaps in one univariate and one multivariate model
decreased the amount of information that could be explained by the pattern(s) extracted by DFA. Overall, we found that
too low a sampling frequency could be a limiting factor when studying interactions and responses to changing
conditions in aquatic systems, particularly when biological variables are involved.

Key words: dynamic factor analysis, Müggelsee, sampling resolution, time series

Introduction

The Nyquist-Shannon sampling theorem implies that a continuous signal can be properly sampled only if it contains no power at frequencies above the Nyquist frequency, defined as half the sampling frequency (Zayed 1993). In other words, the sampling frequency should be at least twice the highest frequency contained in the signal. An inadequate sampling rate results in the phenomenon known as aliasing, in which power at frequencies higher

DOI: 10.5268/IW-6.3.948 Inland Waters (2016) 6, pp.284-294


© International Society of Limnology 2016

than the Nyquist frequency is interpreted as appearing at frequencies lower than the Nyquist frequency, which results not only in a loss of information about the system but also distorts the information obtained. In limnology, these signals might represent processes and interactions between variables within a given ecosystem or region. Processes in lakes are complex, involving overlapping temporal and trophic-level scales. Thus, when measuring ecological variables in lakes, the sampling frequency and the lengths of the time series obtained are critical for capturing and understanding patterns of temporal evolution in these variables. These patterns include (but are not limited to) long-term trends and seasonal variability, which can usually be identified using lower-frequency data (e.g., monthly), as well as high-frequency fluctuations, which typically require finer data resolution.

In the past few decades, studies relying mainly on weekly to once every 4 weeks (4-weekly) spot samples (Halliday et al. 2012) have revealed the complex nature of ecosystem responses to environmental change and the time scales involved (review in Adrian et al. 2012). Such studies have encouraged the measurement of long-term, high-frequency datasets (Parr et al. 2003). The recent development and deployment of high-frequency sensor monitoring equipment and the increasing duration of observational time series available from lakes have in turn allowed more detailed study of lake responses and processes (Jennings et al. 2012, Solomon et al. 2013, Rose et al. 2014). These technological developments also guarantee that the sampling frequency will normally exceed the Nyquist frequency, avoiding any numerical artifacts associated with aliasing.

Despite such advances, most limnological datasets are still prone to uneven sampling intervals and missing observations. These sampling campaigns are, in most cases, subject to logistical and financial limitations and often do not take into account the ideal frequency requirements necessary to obtain adequate sample representativeness of the system being studied. These commonly encountered problems can represent a limitation in the study of the underlying patterns in ecosystem responses to climatic and anthropogenic forcing.

Irregularities in sampling frequency and the presence of data gaps are often the first challenges encountered when selecting an appropriate method for analyzing time series. Consequently, the choice of analytical methodology becomes crucial to taking advantage of all available information. Of the various time-series analysis techniques that can be used to study responses and patterns, few are able to cope with these challenges. For instance, Schoellhamer (2001) applied a modified singular-spectrum analysis algorithm to obtain spectral estimates from records with a large fraction of missing data. The weighted wavelet Z-transform (WWZ) method can handle unevenly sampled data, but large gaps in the data would complicate the identification of temporal patterns (Foster 1996). Another example is Seasonal Kendall trend analysis, which can cope with missing values (Hirsch et al. 1982). Nevertheless, commonly used methods such as spectral analysis, wavelet analysis, and Box-Jenkins modeling usually require time series to be stationary and evenly sampled (Zuur et al. 2003a). In addition, these methods are not particularly suitable for unraveling patterns and interactions between variables over time.

Dynamic factor analysis (DFA) is a dimension-reduction method designed for time series (Zuur et al. 2003a) used mainly to estimate underlying common patterns in a set of time series. In this method, observations are related to the hidden patterns or states by the state-space model. This framework provides a means for modeling nonlinear effects and structural change in time series and is able to cope with missing observations (Harvey 1989). DFA has been used widely in econometric and psychological studies and has recently been applied to fisheries datasets (Zuur et al. 2003b, Zuur and Pierce 2004, Vilizzi 2012), groundwater-quality data (Muñoz-Carpena et al. 2005, Kuo et al. 2013), and river water-quality data (Aguilera et al. 2015).

To address the issue of the temporal scale of sampling and gaps in limnological time series, we took advantage of an ongoing long-term research program at Müggelsee, a shallow, polymictic lake in Berlin, Germany, in which data on physicochemical variables and plankton are collected weekly. We chose a shallow, polymictic lake because this type of lake is known to be more strongly influenced by short-term ambient weather conditions than are dimictic lakes (Gerten and Adrian 2000). The ultimate goal of this study was to assess the potential usefulness of DFA in analyzing patterns in limnological time series with differing temporal sampling resolutions and data gaps. To achieve this, we specified 4 objectives. (1) We tested whether DFA was able to reproduce the same underlying patterns in the data when the sampling resolution, in univariate analyses, was reduced from weekly to 4-weekly. (2) We compared and contrasted analyses of physicochemical and biological variables because these 2 types of variables may differ with regard to their frequency spectra. (3) We aimed to determine whether the complex dynamics of a set of time series with well-known interactions could be described using DFA and differing sampling frequencies. (4) Finally, we assessed the ability of DFA to cope with missing observations in 2 ways: (a) by introducing artificial gaps into the time series and investigating the effect of this on the patterns extracted; and (b) by resampling the available weekly data to obtain lower-frequency (i.e., seasonal and 4-weekly) datasets.
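The aliasing effect described at the start of the Introduction can be illustrated with a short calculation. The sketch below is not part of the original analysis; the function names and the 3-week example period are assumptions chosen for illustration. It folds a frequency above the Nyquist frequency back into the resolvable band to show where it would appear after sampling.

```python
def nyquist_frequency(sampling_interval):
    """Nyquist frequency (cycles per unit time) for a given sampling interval."""
    sampling_frequency = 1.0 / sampling_interval
    return sampling_frequency / 2.0


def apparent_frequency(true_frequency, sampling_interval):
    """Frequency at which a sinusoid appears after regular sampling.

    Frequencies above the Nyquist frequency fold back into [0, f_Nyquist].
    """
    sampling_frequency = 1.0 / sampling_interval
    nearest_multiple = round(true_frequency / sampling_frequency)
    return abs(true_frequency - nearest_multiple * sampling_frequency)


# Hypothetical example: a fluctuation with a 3-week period sampled every 4 weeks.
true_period_weeks = 3.0
alias = apparent_frequency(1.0 / true_period_weeks, sampling_interval=4.0)
apparent_period_weeks = 1.0 / alias  # the fluctuation masquerades as a slower cycle
```

Under these assumptions, a 3-week cycle sampled every 4 weeks would be indistinguishable from a 12-week cycle, whereas a cycle slower than the 8-week Nyquist period would keep its true frequency.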


Table 1. Time period and number of observations for the Müggelsee variables included in the study. (Note that for pH the number of observations was substantially lower than for the other variables.)

                              Number of observations           Date
Variable                      Weekly   2-weekly   4-weekly     Start          End
Water temperature             1748     874        437          24 Jul 1978    16 Jan 2012
Dissolved oxygen              1722     861        431          22 Jan 1979    16 Jan 2012
pH                            1027     514        257          18 May 1992    16 Jan 2012
Ammonium                      1721     861        431          01 Jan 1979    19 Dec 2011
Total algal biomass           1721     861        431          01 Jan 1979    19 Dec 2011
Copepod species case study    1614     807        404          21 Jan 1980    20 Dec 2010
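The 2-weekly and 4-weekly counts in Table 1 are consistent with simple decimation of the weekly series (selecting every second or every fourth value), as described in the Methods. The following sketch checks this; it assumes that decimation starts at the first observation, which the text does not state explicitly, and the function name is illustrative.

```python
def decimate(values, step):
    """Keep every step-th value, starting with the first observation (assumed)."""
    return values[::step]


# Weekly observation counts taken from Table 1.
weekly_counts = {
    "Water temperature": 1748,
    "Dissolved oxygen": 1722,
    "pH": 1027,
    "Ammonium": 1721,
    "Total algal biomass": 1721,
    "Copepod species case study": 1614,
}

# For each variable: (number of 2-weekly values, number of 4-weekly values).
reduced_counts = {
    name: (len(decimate(list(range(n)), 2)), len(decimate(list(range(n)), 4)))
    for name, n in weekly_counts.items()
}
```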
Methods

Study site and time-series data

Müggelsee is a shallow, polymictic lake in Berlin (Germany) with a surface area of 7.3 km², a mean depth of 4.9 m, and a water retention time of ~6–8 weeks. Data collection in the lake has been undertaken weekly (from spring to autumn) or once every 2 weeks (2-weekly; during winter) since 1979 (Köhler et al. 2005). A comprehensive description of the characteristics of Müggelsee and its surroundings can be found in Driescher et al. (1993). Physicochemical and biological variables from the Müggelsee dataset were selected to carry out DFA. Variables included water temperature, dissolved oxygen (DO), and pH (all measured at 1 m depth), nutrient concentrations (ammonium), and the abundances of phytoplankton (both taken from the upper 4 m of the water column) and zooplankton (taken from the entire water column). A detailed description of sampling and sample processing is given in Driescher et al. (1993) and in Gerten and Adrian (2000).

The time spans covered by the time series ranged from ~19 yr (pH) to 33 yr (water temperature; Table 1). The finest resolution was weekly. The reduction in resolution from weekly to 2-weekly and 4-weekly was accomplished by simple decimation (i.e., by selecting every second or every fourth value from the original weekly dataset).

Dynamic factor analysis (DFA)

Dynamic factor analysis is based on structural time-series models viewed from within the state-space model framework (Harvey 1989). In DFA, all common patterns in a set of time series are estimated simultaneously using the maximum likelihood method. These patterns are modeled as random walks and thus are allowed to be stochastic, meaning that their shape can change over time and is not necessarily restricted to a straight-line trend or a cosine-function cyclic or seasonal component (Zuur et al. 2007). The state-space formulation, and thus DFA, also allows a univariate time series to be decomposed into a pattern and an error term.

DFA was performed using the Multivariate Autoregressive State-Space (MARSS) R-package 3.4 (Holmes et al. 2013), run in R 2.15.1 (R Core Team 2012). Each time series was standardized by subtracting its mean and dividing by its standard deviation to facilitate the interpretation and comparison of resulting patterns and factor loadings. The basis of the DFA formulation in MARSS includes a process model (x) and an observation model (y):

$x_t = x_{t-1} + w_t, \quad w_t \sim \mathrm{MVN}(0, Q)$
$y_t = Z x_t + v_t, \quad v_t \sim \mathrm{MVN}(0, R)$    (1)

The observations ($y_t$) are modeled as a linear combination of hidden patterns ($x_t$) and factor loadings ($Z$; Holmes et al. 2012). The terms $w_t$ and $v_t$ describe the errors associated with the hidden patterns and the observations, respectively, normally distributed with expectation 0 and with covariance matrices Q and R. The Q matrix is set to an identity matrix, and the R matrix is usually taken to be diagonal. The diagonal values in the R matrix contain the variances of the time series involved, and the off-diagonal values contain the covariances between the time series. The MARSS algorithm uses the Kalman filter and smoother to compute the best estimate for the hidden patterns, given a set of assumed values for the unknown parameters Z, Q, and R. The output of the Kalman filter is then used to compute the likelihood of the data given the better estimates for the set of parameters. MARSS estimates the maximum likelihood using the expectation–maximization (EM) method (Holmes 2013). The number of iterations necessary is determined by the log-likelihood of reaching convergence (slope tolerance for log-log test = 0.1; absolute tolerance test = 0.001).

A considerable advantage of the state-space approach is the ease of addressing missing observations (Durbin and Koopman 2012). Holmes (2013) performed a series of modifications in the EM update equations for missing values in MARSS models, based on the work of Shumway and Stoffer (1982, 2010) and Zuur et al. (2003a), and presented a thorough explanation of the methodology. The


main disadvantage of DFA is its intensive computations, translating into long computational times ranging from hours to days and weeks, depending on the complexity of the model. In our study, the High-Performance Computing Cluster Undarius allowed the simultaneous computation of multiple independent analyses. Nevertheless, the computation time for the multivariate analyses (based on the copepod species and cryptophytes case study described below) ranged from 1 to 4 weeks.

Effect of sampling resolution

Univariate time-series analysis

Univariate time-series analyses were performed so that a pattern plus an error term would be extracted from each variable along a trophic hierarchy (water temperature, nutrients, DO, pH, phytoplankton, zooplankton) and at different data sampling resolutions (weekly, 2-weekly, and 4-weekly). Thus, not only the effect of differences in the temporal resolution of the time-series data was assessed, but also the behavior of different system levels across the trophic hierarchy, which is affected by biotic interactions and thus typically incorporates processes of both high-frequency and low-frequency variability.

Multivariate time-series analysis

A well-established and well-studied relationship between variables measured in Müggelsee was selected for the application of DFA to a set of relevant time series. The case study dealt with the abrupt increase in the abundance of a copepod species, Cyclops kolensis, when the abundance of a competing, larger species, Cyclops vicinus, fell below a critical threshold triggered by, for example, a change in the abundance of cryptophytes, a major food source for the juvenile stages of both Cyclops species (Scharfenberger et al. 2013).

A set of models was evaluated for the copepod species case study to test different numbers of common patterns (m) ranging from 1 to n − 1 (where n is the number of time-series involved in a particular analysis, in this case 3) as well as different structures for the covariance matrix R. Because the number of model parameters increases drastically when using an unconstrained R matrix, we assumed the time series involved in our multivariate analyses had either an equal (diagonal and equal) or different (diagonal and unequal) observation variance but no covariance between time series. The best model was then selected based on the Akaike information criterion (AIC; Akaike 1974) as well as on model inspection of plotted patterns versus observed values and a visualization of the magnitudes and signs of factor loadings.

Effect of gaps on pattern extraction in time-series analysis

Gaps were introduced in one representative univariate time series (total algal biomass) and in the multivariate copepod species case described earlier (Scharfenberger et al. 2013). Gaps were systematically introduced by deleting 3 months of available data per year. The deleted data consisted of a large gap comprising 2 consecutive spring months (Apr and May) and a shorter gap in winter (Dec). Three different analyses with different sampling resolutions (weekly, 2-weekly, and 4-weekly) were performed to determine changes to the observed patterns (previous models without artificial gaps) when data gaps are larger or evenly distributed across time (i.e., systematic missing observations at specific times of the year). In the copepod species case study, gaps were introduced only in the driving variable (i.e., cryptophyte availability).

Effect of sampling representativeness

Using the same univariate time series (total algal biomass) and multivariate copepod species examples mentioned earlier, we compared the results of DFA models for 4-weekly and seasonal observations taken at those specific intervals (i.e., discrete, equidistant data points measured every 4 weeks and every season, respectively) with the results of models that averaged the weekly data over 4-weekly and seasonal intervals. This simple exercise aimed to assess how representative a sample could be in the case of discrete, low-frequency sampling compared to the case in which all available information (i.e., weekly data) was used by simply averaging the observations over the 4-weekly or seasonal intervals.

DFA model evaluation

Comparing weekly, 2-weekly, and 4-weekly data-resolution models might not be straightforward because these models seem to extract different temporal patterns, including long-term trends and seasonal and other cyclic patterns, although the patterns extracted might vary depending on the type of variable. The resulting models for differing data resolutions can first be assessed visually; because DFA is analogous to linear regression, the same tools can be applied for model validation (Zuur et al. 2007). The first tool plots the resulting best fits to the observed values to identify the extent to which the model can capture the patterns in the original time series.

The MARSS R-package assesses different model fits by means of AIC. AIC was used in the copepod species case study, where the best model fit with m patterns had to be selected from among a set of different models. In addition, factor loadings and values in the covariance matrix R (i.e., the observation error matrix containing


variance values) were used to assess the strength of a model in extracting underlying patterns in our univariate and multivariate time-series analyses.
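To make concrete how a state-space model copes with missing observations, the following is a minimal sketch of a Kalman filter for the univariate case of Equation 1 (one random-walk pattern, one observed series). It is not the MARSS implementation and omits the smoother and the EM estimation of the parameters; the function name, the default parameter values, and the initial state are assumptions chosen for illustration.

```python
def kalman_filter_random_walk(observations, loading=1.0, process_var=1.0,
                              obs_var=0.5, initial_mean=0.0, initial_var=10.0):
    """Filter a univariate series under x_t = x_{t-1} + w_t, y_t = Z x_t + v_t.

    Missing observations are given as None: the update step is skipped, so the
    state estimate is carried forward and its uncertainty grows.
    """
    means, variances = [], []
    mean, var = initial_mean, initial_var
    for y in observations:
        # Prediction step: the random walk keeps the mean and adds process variance.
        var = var + process_var
        if y is not None:
            # Update step: weigh the new observation against the prediction.
            innovation = y - loading * mean
            innovation_var = loading * loading * var + obs_var
            gain = var * loading / innovation_var
            mean = mean + gain * innovation
            var = (1.0 - gain * loading) * var
        means.append(mean)
        variances.append(var)
    return means, variances


# Hypothetical standardized series with a two-step gap in the middle.
means, variances = kalman_filter_random_walk([0.5, None, None, 0.7])
```

During the gap the estimated pattern stays at its last value while its variance increases by the process variance at every step; the next available observation pulls the estimate back and reduces the variance again.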

Results

Effect of reducing sampling resolution in univariate time-series

The reduction in resolution (from weekly to 2-weekly to 4-weekly) did not significantly change the DFA results for near-surface water temperature (Fig. 1, 4-weekly data), which exhibited relatively little high-frequency variability (at least at weekly sampling resolution).
For the ammonium concentration data, the analysis at a 4-weekly data resolution revealed a slight decrease over time since the beginning of the sampling program (Fig. 2a), but neither interannual nor seasonal variability could be captured (Fig. 2b). The amount of unexplained information was fairly large in the 4-weekly resolution model, which was also evident in the fit (Fig. 2b). For DO, the 4-weekly resolution data analysis resulted in a constant zero pattern across the 3 decades of data (Fig. 3), also the case for pH (results not shown). The amount of unexplained information, as reflected in the covariance matrices, was thus significantly larger for the 4-weekly data models.

Fig. 2. Patterns (lines) derived from dynamic factor analyses fitted to the standardized (a) weekly and (b) 4-weekly data points (dots) for ammonium concentration in Müggelsee (measured at 1 m depth).
The analysis of total algal biomass in Müggelsee
revealed a drastic change in the overall pattern of the
time series after the resolution was reduced from weekly
(Fig. 4a) to 2-weekly (Fig. 4b), and 4-weekly (Fig. 4c).
Further, the observation variance was higher for the 4-weekly
resolution model (0.84) than for the 2-weekly (0.51) and
weekly (0.25) models; that is, the observation
variance in the 4-weekly resolution model was more than 3
times that in the weekly resolution model.

Fig. 1. Pattern (line) derived from dynamic factor analysis fitted to the standardized 4-weekly data points (dots) for near-surface water temperature in Müggelsee.

Fig. 3. Patterns (lines) derived from dynamic factor analyses fitted to the standardized (a) weekly and (b) 4-weekly data points (dots) for dissolved oxygen concentration in Müggelsee.


Reducing sampling resolution in multivariate time-series analysis

In the case study of the coexisting copepod species C. vicinus and C. kolensis and the cryptophyte biomass (Scharfenberger et al. 2013), model selection based on AIC implied that both weekly and 2-weekly data were best described by 2 common patterns and a diagonal and unequal covariance matrix. According to the factor loadings, C. vicinus and the cryptophyte biomass (Fig. 5b) seemed to be best described by Pattern 1, which decreased across time. By contrast, C. kolensis (Fig. 5b) was best characterized by Pattern 2, which tended to be the inverse of Pattern 1. The diagonal values in the R matrix indicated that the selected model satisfactorily explained the variability in the time series of the 2 copepod species (0.54 for C. vicinus and 0.12 for C. kolensis) but not the variability in the cryptophyte biomass time series, which had a high value of unexplained variance in the covariance matrix R (0.76).

The 4-weekly data yielded one common pattern; C. vicinus and the cryptophyte biomass followed a decreasing trend with time, and C. kolensis followed an increasing trend related to a negative factor loading (dotted line in Fig. 6a). The factor loadings of the 4-weekly resolution model (Fig. 6b) were about one order of magnitude lower than those in the weekly and 2-weekly resolution models. Overall, the variance reflected in the 4-weekly resolution model (average diagonal values in R matrix = 0.92) was about twice that in the weekly model, indicating that a significant amount of information was lost by using 4-weekly biomass data rather than data of higher resolution.
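The AIC-based selection among candidate models can be sketched as follows. The log-likelihoods and parameter counts below are hypothetical placeholders, not values from this study; they serve only to show how candidates with different numbers of patterns (m) and R structures would be ranked, with the lowest AIC preferred.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower values indicate a preferred model."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


# Hypothetical candidate models (all numbers invented for illustration).
candidates = [
    {"patterns": 1, "R": "diagonal and equal", "loglik": -4210.0, "k": 4},
    {"patterns": 1, "R": "diagonal and unequal", "loglik": -4195.0, "k": 6},
    {"patterns": 2, "R": "diagonal and equal", "loglik": -4150.0, "k": 6},
    {"patterns": 2, "R": "diagonal and unequal", "loglik": -4120.0, "k": 8},
]

# Rank the candidates and keep the one with the smallest AIC.
best = min(candidates, key=lambda c: aic(c["loglik"], c["k"]))
```

With these placeholder numbers the 2-pattern model with a diagonal and unequal R matrix would be selected, because its gain in log-likelihood outweighs the penalty for its additional parameters.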

Unexplained information in time-series analysis with additional data gaps

In the univariate model of total algal biomass, the amount of unexplained information (diagonal values in the R

Fig. 4. Patterns (lines) derived from dynamic factor analyses fitted to the standardized (a) weekly, (b) 2-weekly, and (c) 4-weekly data points (dots) for total algal biomass in Müggelsee.

Fig. 5. The best dynamic factor analysis model for the weekly data of the copepod species case study yielded 2 common patterns. (a) Temporal variability of the 2 extracted patterns. (b) Pattern 1 (dark grey) mainly describes the time-series of C. vicinus and cryptophytes, whereas Pattern 2 (black) describes the temporal variability of the abundance of C. kolensis, based on factor loading magnitudes.


Table 2. Amount of unexplained information in univariate and multivariate DFA models for data of weekly, 2-weekly, and 4-weekly sampling resolution with and without artificially introduced gaps.

                                             Weekly       2-weekly     4-weekly
DFA Model                                    m    R       m    R       m    R
Total algal biomass (no artificial gaps)     1    0.25    1    0.51    1    0.84
Total algal biomass (with artificial gaps)        0.35         0.65         0.88
Copepod abundances (no artificial gaps)      2    0.54    2    0.57    1    0.92
Copepod abundances (with artificial gaps)         0.78         0.82         0.93

m = number of patterns; R = amount of unexplained information (variance value in covariance matrix). Copepod abundance data from Scharfenberger et al. (2013).
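The relative increases in unexplained information reported in the text follow directly from the R values in Table 2. The short check below recomputes them; the dictionary layout and function name are illustrative choices.

```python
# R values from Table 2: (without artificial gaps, with artificial gaps).
table2 = {
    ("Total algal biomass", "weekly"): (0.25, 0.35),
    ("Total algal biomass", "2-weekly"): (0.51, 0.65),
    ("Total algal biomass", "4-weekly"): (0.84, 0.88),
    ("Copepod abundances", "weekly"): (0.54, 0.78),
    ("Copepod abundances", "2-weekly"): (0.57, 0.82),
    ("Copepod abundances", "4-weekly"): (0.92, 0.93),
}


def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


# Percentage increase in unexplained information caused by the artificial gaps.
increases = {
    key: round(percent_increase(before, after), 1)
    for key, (before, after) in table2.items()
}
```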

covariance matrix) in the weekly resolution model with artificial gaps increased by 40% compared to the previous weekly resolution model without such gaps (Table 2). The increase in unexplained information was lower in the case of the 2-weekly (27.5%) and 4-weekly (4.8%) resolution models. These sampling resolutions initially yielded a large amount of unexplained information in the previous models (0.51 and 0.84; Table 2), and therefore the difference in the corresponding artificial gap models was not as drastic as in the case of weekly data. Nevertheless, the explanatory power of the 2-weekly and 4-weekly data was much lower than that of the weekly data.

In the multivariate copepod species model, the R value increased by 44.4% and 43.9% for the weekly and 2-weekly data resolution models, respectively (Table 2). The 4-weekly model increased the amount of unexplained information by only 1.1% because the initial variance was already high (from R = 0.92 to 0.93; Table 2). The total length of gaps introduced in all 3 temporal resolution cases decreased the number of observations by an average of 24.6% (range 24.3–25.0%) compared to the corresponding models of the same case study without artificial gaps. Despite these differences, the extracted patterns and resulting best fits for models with and without artificial gaps did not differ substantially from one another; in other words, the introduction of gaps in the cryptophyte time series did not substantially alter the shape and behavior of the patterns extracted by DFA.
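The systematic gaps described in the Methods (removing April, May, and December of every year) can be sketched as a simple date mask. The example assumes a perfectly regular weekly grid over the total algal biomass period in Table 1, which is a simplification of the actual sampling calendar; the function names are illustrative.

```python
from datetime import date, timedelta


def weekly_dates(start, end):
    """Regular weekly sampling dates from start to end (inclusive)."""
    dates, current = [], start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


def mask_months(dates, months_to_remove):
    """Drop all dates that fall in the given calendar months."""
    return [d for d in dates if d.month not in months_to_remove]


# Assumed regular weekly grid spanning the total algal biomass record (Table 1).
sampling_dates = weekly_dates(date(1979, 1, 1), date(2011, 12, 19))
kept_dates = mask_months(sampling_dates, {4, 5, 12})  # Apr, May, Dec removed
removed_fraction = 1.0 - len(kept_dates) / len(sampling_dates)
```

On such a regular grid, removing 3 of 12 months discards roughly a quarter of the observations, in line with the average reduction of 24.6% reported above.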
Unexplained information and sampling representativeness

For both univariate and multivariate data, DFA models based on data obtained by averaging the original time-series over either 4-weekly or seasonal intervals outperformed DFA models based on data points simply selected from the original time series at the same 2 intervals. The values of variance in the R matrices for the data selection models are considerably larger than those for the data-averaging models (Table 3). These differences in variance are notably larger for the seasonal data than for the 4-weekly data. Overall, our results confirm that, as expected, averaging the available observations in a time series to obtain lower-frequency measurements is preferable to selecting individual values from the time series at specific target intervals.
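The difference between the two resampling strategies can be seen in a toy example. The weekly values below are invented for illustration (a slow rise with a strong week-to-week fluctuation superimposed), and the function names are not from the original study.

```python
def select_every(values, block):
    """Data selection: keep one spot sample per block (the first value)."""
    return values[::block]


def block_means(values, block):
    """Data averaging: mean of all weekly values within each block."""
    means = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical weekly series: two 4-week blocks with alternating fluctuations.
weekly = [1.0, 3.0, 1.0, 3.0, 2.0, 4.0, 2.0, 4.0]
selected = select_every(weekly, 4)   # spot samples taken every 4 weeks
averaged = block_means(weekly, 4)    # 4-weekly averages of the weekly data
```

Here the spot samples happen to fall on the low phase of the fluctuation in both blocks, whereas the block means use all weekly values and recover the level of each block.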

Fig. 6. The best model after dynamic factor analysis for the 4-weekly data of the copepod species case study yielded one common pattern. (a) The dotted line represents the inverse pattern related to the negative factor loading of C. kolensis shown in (b).

Discussion

The responses of lakes to environmental change are complex, occur on a variety of temporal scales, and can be associated with changes that occur within critical short-term time windows within a season (Adrian et al. 2012). Many past and ongoing long-term research programs provide high-quality data at short temporal


Table 3. Amount of unexplained information in univariate and multivariate DFA models with differing resampling strategies (i.e., data selection as opposed to data averaging).

                                          4-weekly     Seasonal
DFA Model                                 m    R       m    R
Total algal biomass (data selection)      1    0.84    1    0.81
Total algal biomass (data averaging)      1    0.78    1    0.64
Copepod abundances (data selection)       1    0.92    1    0.90
Copepod abundances (data averaging)       1    0.88    1    0.81

m = number of patterns; R = amount of unexplained information (variance value in covariance matrix). Copepod abundance data from Scharfenberger et al. (2013).

scales. For instance, automated in situ measurements such as those of the Global Lake Ecological Observatory Network (GLEON) or Networking Lake Observatories in Europe (NETLAKE) operate on subhourly temporal scales (e.g., Hamilton et al. 2014). Most environmental datasets, however, rely heavily on lower-frequency sampling campaigns such as that conducted at Müggelsee. Furthermore, time series usually contain gaps of different lengths, distributed irregularly with time. The role of the temporal resolution of long-term data compared with the distribution of gaps determines what kind of questions can be addressed to better understand system dynamics and their response to environmental change in the long term. Here, we addressed the question of the temporal resolution needed to capture the variability in driving and response variables or the interaction between them. Within an ecological context, the ability to deal with irregular sampling data in univariate or multivariate time series would be considered an important improvement in time-series analysis methods (Cazelles et al. 2008). DFA, which provides the means to analyze time series with varying sampling resolutions and missing observations, succeeded in capturing general long-term trends in physical, chemical, and biological limnological variables but failed to capture the seasonal variability and associated interactions between driving and response variables, particularly when biological data were involved.

Sampling resolution reduction in physicochemical and biological variables

Predictably, decreasing the data resolution from weekly to 4-weekly lowered the capability of the DFA models to explain the underlying short-term patterns; however, the increase in the amount of unexplained information varied among variables. In the case of an abiotic variable such as near-surface water temperature, which is usually dominated by low-frequency variability, 4-weekly data were sufficient to capture the main long-term pattern and seasonality. This was not the case for the pH and DO models, which presumably show more high-frequency variability than does water temperature when considered at broader temporal scales. The information contained in the time series of ammonium concentration and total algal biomass, however, was best explained by the weekly data model. The aliasing effect caused by undersampling is much stronger in processes with significant power in high-frequency components. Algal growth (Reynolds 1984) and nutrient dynamics operate on temporal scales of days or less (Lampert and Sommer 2007), indicating the need, in these cases, for a higher sampling frequency to avoid or reduce the effect of aliasing.

Overall, in the univariate time-series analyses of the chemical and biological variables, we observed that 4-weekly resolution might not suffice to describe the variability that occurs within short time windows. Addressing temporal scale is especially relevant in polymictic lakes, which, unlike dimictic and monomictic lakes, are subject to frequent changes in thermal regime during summer. In Müggelsee, for instance, although 79% of all stratification events last <1 day, the frequency and period of stratification are predicted to increase (Wilhelm and Adrian 2008). This change in stratification has potential consequences for plankton development in the lake; stratification periods exceeding 3 weeks, for instance, have been shown to favor the proliferation of cyanobacteria (Wagner and Adrian 2009), and severe climate-related changes in plankton diversity have occurred on temporal scales of 2–3 weeks (Wagner and Adrian 2011).

Sampling resolution reduction in multivariate analysis

In the multivariate copepod species case, the best DFA models usually yielded fewer common patterns for the 4-weekly data resolution, indicating a reduced capability of the models to capture interactions apparent in data sampled at higher resolution (i.e., weekly and 2-weekly). The 4-weekly data were thus unable to capture the

DOI: 10.5268/IW-6.3.948 Inland Waters (2016) 6, pp.284-294


292 Aguilera et al.

underlying common patterns among the 3 interacting larger portion of the available information, whereas the
time series of copepod species abundances and practice of selecting observations at specific target
cryptophyte biomass. Note that the overall long-term intervals excludes much of the available information. We
pattern in copepod abundances seemed to coincide with acknowledge that the weekly sampling frequency in this
the results of Scharfenberger et al. (2013), who stated study might be below the Nyquist frequency for many
that the decline in C. vicinus followed the driver biotic and abiotic processes in limnology. Despite this
threshold scenario of regime shift theory (Andersen et al. limitation, however, we strongly recommend that the
2009), caused by an abrupt change in a driver (in this sampling frequency be tailored to include the potentially
case the decline in cryptophyte prey availability), distorting effects of aliasing when designing monitoring
whereas concomitantly C. kolensis benefited from this schemes. Also noteworthy is that, although high-fre-
situation and increased in abundance in the decades quency (e.g., daily, hourly) measurements of biotic
following the shift event. The failure to capture the variables would be preferred to guarantee sampling rep-
dynamics in zooplankton development and interactions resentativeness, most measurements of biotic variables
critically depends on species-specific life history traits. rely on spot-sample campaigns, which are undertaken at
Fast-growing species such as cladocerans and rotifers lower frequencies.
exhibit life-history durations of only a couple of days Several studies have suggested that long-term routine
compared to >4 weeks for copepods (Adrian et al. 2006). monitoring, even at low sampling frequencies, is
Thus, the population dynamics of cladocerans and important with respect to slow ecological processes, rare
rotifers are more easily missed if sampling is at low or episodic phenomena, highly variable processes, and
temporal resolution. Nevertheless, for copepods with subtle or complex phenomena. For instance, monthly
long and complex life histories, entire generations can be averaged stream nitrate data in the tropical forest of the
missed if sampling intervals are in the range of a month Luquillo Mountains (Puerto Rico) allowed researchers to
(Adrian et al. 2006). Despite the relative dearth of quantify the magnitude of the impact of Hurricane Hugo
information obtained by analyzing 4-weekly observa- on water chemistry and to assess the speed at which the
tions, DFA could still extract long-term, general patterns system approached background conditions after the
and, in some cases, intraannual and seasonal patterns in hurricane hit the island in 1989 (Schaefer et al. 2000,
our data (e.g., Fig. 4c and 6a). Dodds et al. 2012). In an analysis of watershed charac-
teristics, Halliday et al. (2012) found that 4-weekly data
Dealing with data gaps and sampling allowed the analysis of long-term trends and provided
representativeness new insights into the changing amplitude and phase of
the seasonality of water quality variables. Furthermore,
The introduction of additional artificial gaps into the data they observed that low-frequency (e.g., monthly) data
used for the univariate total algal biomass models and the provided background information on their study basin
multivariate copepod species case study models decreased and on the processes occurring necessary to interpret the
the amount of information that could be explained by the complexity of the high-frequency data.
pattern(s) extracted by DFA (Table 2). The increased Overall, low-frequency data allow the detection of
observation variance, however, varied according to the long-term trends and seasonality in those cases in which
temporal resolution of the data and was more pronounced the power in the low-frequency components of the
in the weekly data models in both the univariate and mul- processes is much higher than that in the high-frequency
tivariate cases. In real terms, the introduction of gaps for 2 components, specifically in cases in which the aliasing
consecutive spring months would be especially critical in effect does not significantly affect characterization of the
real sampling programs for most biological variables such large-scale processes. High-frequency data are key in the
as algal or cladoceran biomass because spring is a key identification of extremes, short-term trends, and
time of year when rapid, short-term changes in external subdaily variability in limnological variables. High-fre-
driving variables such as air temperature and day length quency data are also needed when the power in the high-
usually take place. Nevertheless, even in winter (a season frequency components is comparable to that in the low
much neglected because of the notion that not much frequency components; in this case the aliasing effect
happens), variability in the phytoplankton can be high at caused by undersampling is much stronger. A
short temporal scales (Özkundakci et al. 2016). All of this compromise between financial and logistical resources
was reflected in the loss of information found in the and sample representativeness must be reached to
weekly models with data gaps. maximize the information obtained from the observa-
Intuitively, averaging the highest-frequency observa- tions taken at the chosen sampling frequency.
tions to rescale the data to a lower frequency retains a

© International Society of Limnology 2016 DOI: 10.5268/IW-6.3.948


Acknowledgements

R. Aguilera benefited from a Short-Term Scientific Mission at the Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, funded by the ESSEM COST Action ES1201 NETLAKE (Networking Lake Observatories in Europe). D.M. Livingstone benefited from time spent as guest scientist at the Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, funded by the IGB Fellowship Program in Freshwater Science. R. Adrian was supported through the German Science Foundation project LakeRisk (AD 91/13-1).

References

Adrian R, Wilhelm S, Gerten D. 2006. Life-history traits of lake plankton species may govern their phenological responses to climate warming. Glob Change Biol. 12:652–661.
Adrian R, Gerten D, Huber V, Wagner C, Schmidt S. 2012. Windows of change: temporal scale of analysis is decisive to detect ecosystem responses to climate change. Mar Biol. 159:2533–2542.
Aguilera R, Marcé R, Sabater S. 2015. Detection and attribution of global change effects on river nutrient dynamics in a large Mediterranean basin. Biogeosciences. 12:4085–4098.
Akaike H. 1974. A new look at the statistical model identification. IEEE T Automat Cont. 19:716–23.
Andersen T, Carstensen J, Hernandez-Garcia E, Duarte CM. 2009. Ecological thresholds and regime shifts: approaches to identification. Trend Ecol Evolut. 24:49–57.
Cazelles B, Chavez M, Berteaux D, Ménard F, Vik JO, Jenouvrier S, Stenseth NC. 2008. Wavelet analysis of ecological time series. Oecologia. 156:287–304.
Driescher E, Behrendt H, Schellenberger G, Stellmacher R. 1993. Lake Müggelsee and its environment—natural conditions and anthropogenic impacts. Int Rev Gesamt Hydrobiol Hydrogr. 78:327–343.
Dodds WK, Robinson CT, Gaiser EE, Hansen GJ, Powell H, Smith JM, Morse NB, Johnson SL, Gregory SV, Bell T, Kratz TK. 2012. Surprises and insights from long-term aquatic data sets and experiments. BioScience. 62:709–721.
Durbin J, Koopman SJ. 2012. Time series analysis by state space methods (No. 38). Oxford University Press.
Foster G. 1996. Wavelets for period analysis of unevenly sampled time series. Astronom J. 112:1709.
Gerten D, Adrian R. 2000. Climate-driven changes in spring plankton dynamics and the sensitivity of shallow polymictic lakes to the North Atlantic Oscillation. Limnol Oceanogr. 45:1058–1066.
Halliday SJ, Wade AJ, Skeffington RA, Neal C, Reynolds B, Rowland P, Neal M, Norris D. 2012. An analysis of long-term trends, seasonality and short-term dynamics in water quality data from Plynlimon, Wales. Sci Total Environ. 434:186–200.
Hamilton DP, Carey C, Arvola L, Arzberger P, Brewer C, Cole JC, Gaiser E, Hanson PC, Ibelings BW, Jennings E, et al. 2014. A Global Lake Ecological Observatory Network (GLEON) for synthesising high-frequency sensor data for validation of deterministic ecological models. Inland Waters. 5:49–56.
Harvey AC. 1989. Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Hirsch RM, Slack JR, Smith RA. 1982. Techniques of trend analysis for monthly water-quality data. Water Resour Res. 18:107–121.
Holmes EE. 2013. Derivation of the EM algorithm for constrained and unconstrained multivariate autoregressive state-space (MARSS) models. Technical Report. arXiv:1302.3919.
Holmes EE, Ward EJ, Wills K. 2012. MARSS: Multivariate Autoregressive State-Space models for analyzing time-series data. R Journal. 4:11–19.
Holmes EE, Ward EJ, Wills K. 2013. MARSS: Multivariate Autoregressive State-Space modeling. R package 3.4.
Jennings E, Jones S, Arvola L, Staehr PA, Gaiser E, Jones ID, Weathers KC, Weyhenmeyer GA, Chiu C, De Eyto E. 2012. Effects of weather-related episodic events in lakes: an analysis based on high-frequency data. Freshwater Biol. 57:589–601.
Köhler J, Hilt S, Adrian R, Nicklisch A, Kozerski HP, Walz N. 2005. Long-term response of a shallow, moderately flushed lake to reduced external phosphorus and nitrogen loading. Freshwater Biol. 50:1639–1650.
Kuo YM, Jang CS, Yu HL, Chen SC, Chu HJ. 2013. Identifying nearshore groundwater and river hydrochemical variables influencing water quality of Kaoping River Estuary using dynamic factor analysis. J Hydrol. 486:39–47.
Lampert W, Sommer U. 2007. Limnoecology: the ecology of lakes and streams. 2nd ed. Oxford University Press.
Muñoz-Carpena R, Ritter A, Li YC. 2005. Dynamic factor analysis of groundwater quality trends in an agricultural area adjacent to Everglades National Park. J Contam Hydrol. 80:49–70.
Özkundakci D, Gsell AS, Hintze T, Täuscher H, Adrian R. 2016. Winter severity determines functional trait composition of phytoplankton in seasonally ice covered lakes. Glob Change Biol. 22:284–298.
Parr TW, Siera ARJ, Battarbee RW, Mackay A, Burgess J. 2003. Detecting environmental change: science and society—perspectives on long-term research and monitoring in the 21st century. Sci Total Environ. 310:1–8.
R Core Team. 2012. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (Austria). ISBN 3-900051-07-0. http://www.R-project.org/
Reynolds CS. 1984. The ecology of freshwater phytoplankton. Cambridge University Press.
Rose KC, Winslow LA, Read JS, Read EK, Solomon CT, Adrian R, Hanson PC. 2014. Improving the precision of lake ecosystem metabolism estimates by identifying predictors of model uncertainty. Limnol Oceanogr Meth. 12:303–312.
Schaefer DA, McDowell WH, Scatena F, Asbury CE. 2000. Effects of hurricane disturbance on stream water concentrations and fluxes in eight tropical forest watersheds of the Luquillo Experimental Forest, Puerto Rico. J Trop Ecol. 16:189–207.
Scharfenberger U, Mahdy A, Adrian R. 2013. Threshold-driven shifts in two copepod species: testing ecological theory with observational data. Limnol Oceanogr. 58:741–752.


Schoellhamer DH. 2001. Singular spectrum analysis for time series with missing data. Geophys Res Lett. 28:3187–3190.
Shumway RH, Stoffer DS. 1982. An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal. 3:253–264.
Shumway RH, Stoffer DS. 2010. Time series analysis and its applications: with R examples. New York (NY): Springer.
Solomon CT, Bruesewitz DA, Richardson DC, Rose KC, Van de Bogert MC, Hanson PC, et al. 2013. Ecosystem respiration: drivers of daily variability and background respiration in lakes around the globe. Limnol Oceanogr. 58:849–866.
Vilizzi L. 2012. Abundance trends in floodplain fish larvae: the role of annual flow characteristics in the absence of overbank flooding. Fund Appl Limnol Arch Hydrobiol. 181:215–227.
Wagner C, Adrian R. 2009. Cyanobacteria dominance: quantifying the effects of climate change. Limnol Oceanogr. 54:2460–2468.
Wagner C, Adrian R. 2011. Consequences of changes in thermal regime for plankton diversity and trait composition in a polymictic lake: a matter of temporal scale. Freshwater Biol. 56:1949–1961.
Wilhelm S, Adrian R. 2008. Impact of summer warming on the thermal characteristics of a polymictic lake: consequences for oxygen, nutrients and phytoplankton. Freshwater Biol. 53:226–237.
Zayed A. 1993. Advances in Shannon’s sampling theory. Taylor & Francis.
Zuur AF, Fryer RJ, Jolliffe IT, Dekker R, Beukema JJ. 2003a. Estimating common trends in multivariate time series using dynamic factor analysis. Environmetrics. 14:665–685.
Zuur AF, Tuck ID, Bailey N. 2003b. Dynamic factor analysis to estimate common trends in fisheries time series. Can J Fish Aquatic Sci. 60:542–552.
Zuur AF, Pierce GJ. 2004. Common trends in northeast Atlantic squid time series. J Sea Res. 52:57–72.
Zuur AF, Ieno EN, Smith GM. 2007. Analysing ecological data (Vol. 680). New York: Springer.
