FRANCISCO MARTÍNEZ-ÁLVAREZ* Pablo de Olavide University of Seville, Spain, fmaralv@upo.es ALICIA TRONCOSO Pablo de Olavide University of Seville, Spain, ali@upo.es JOSÉ C. RIQUELME University of Seville, Spain, riquelme@us.es
Data mining has become an essential tool during the last decade to analyze large sets of data. The variety of techniques it includes and the successful results obtained in many application fields make this family of approaches powerful and widely used. In particular, this work explores the application of these techniques to time series forecasting. Although classical statistical-based methods provide reasonably good results, the results of the application of data mining outperform those of classical methods. Hence, this work faces two main challenges: i) to provide a compact mathematical formulation of the most widely used techniques; ii) to review the latest works on time series forecasting and, as a case study, those related to electricity price. Categories and Subject Descriptors: A.1 [Introductory and Survey]; G.3 [Probability and Statistics]: Time series analysis; H.2.8 [Database Management]: Database Applications: Data mining Additional Key Words and Phrases: Time series, forecasting, data mining
1. INTRODUCTION
The prediction of the future has fascinated human beings since their early existence. Actually, many of these efforts can be noticed in everyday events such as weather, currency exchange rates, pollution, and so forth. Accurate predictions are essential in economic activities, as remarkable forecasting errors in certain areas may involve large losses of money. Given this situation, the successful analysis of temporal data has been a challenging task for many researchers during the last decades and, indeed, it is difficult to figure out any scientific branch with no time-dependent variables. A thorough review of the existing techniques devoted to forecasting time series is provided in this survey. Although a description of the classical Box-Jenkins methodology is also discussed, this text is particularly focused on those methodologies that
*Corresponding author Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. © 2011 ACM 0000-0000/2011/0000-0001 $5.00
ACM Computing Surveys, Vol. V, No. N, Month 2011, Pages 10??.
F. Martínez-Álvarez et al.
make use of data mining techniques. Moreover, a family of energy-related time series is examined due to the scientific relevance it has shown during the last decade: electricity price time series. These series have been chosen since they present some peculiarities, such as non-constant mean and variance, high volatility or the presence of outliers, which turn the forecasting process into a particularly difficult task to fulfil.

Actually, the electric power markets have become competitive markets due to the deregulation carried out in recent years, allowing the participation of all producers, investors, traders or qualified buyers. Thus, the price of electricity is determined on the basis of this buying/selling system. Consequently, a desire to obtain optimized bidding strategies has arisen in electricity-producer companies [Plazas et al. 2005], which need both insight into future electricity prices and an assessment of the risk of trusting predicted prices.

Some works have already reviewed electricity price time series forecasting techniques [Aggarwal et al. 2009]. However, none of them is focused on data mining techniques, as they mainly explored the accuracy of linear models. Hence, this review aims to fill this gap and to highlight all the capabilities these techniques exhibit nowadays. In short, this survey provides the reader with a general overview of current data mining techniques used in time series analysis and, as a case study, their application to a real-world energy-related set of series.

The remainder of this work is structured as follows. Section 2 provides a formal description of a time series and describes its main features. As for the rest of the sections, every section is devoted to exploring one forecasting technique. However, the reader must consider that these techniques are divided into two main groups: linear regressions and non-linear approaches.
Therefore, all sections consist of a brief mathematical description of the technique analyzed and a review of the five most representative works. In particular, Section 3 describes the approaches based on linear methods. Classical Box and Jenkins-based methods such as AR, MA, ARMA, ARIMA or VAR are thus reviewed. In Section 4, rule-based forecasting methods are analyzed, providing a brief explanation of what a decision rule is and revisiting the latest and most relevant works in this domain. As for Section 5, it is a compendium of the non-linear forecasting techniques currently in use in the data mining domain. In this way, artificial neural networks, support vector machines, genetic programming and wavelet transforms are described in this section, as well as the most relevant improvements achieved by means of these techniques. Section 6 provides an overview of the local forecasting methods, focusing on the nearest neighbors technique and its variants. A compilation of several works that cannot be classified into any of the aforementioned groups is described in Section 7. Thus, forecasting approaches based on Markov processes, on Grey models or on manifold dimensionality reduction are there detailed. Finally, Figure 1 depicts the methods studied along with the section where they are introduced.
2. TIME SERIES

This section describes the features of temporal data as well as providing a mathematical description for such a kind of data. Thus, a time series can be understood as a sequence of values observed over time and chronologically ordered. Time is a continuous variable; however, in practice, samples are recorded at constant intervals. When time is considered as a continuous variable, the discipline is commonly referred to as functional data analysis [Ramsay and Silverman 2005]. The description of this category is out of the scope of this survey.

Let y_t, t = 1, 2, ..., T, be the historical data of a given time series. This series is thus formed by T samples, where each y_i represents the recorded value of the variable y at time i. Therefore, the forecasting process consists in estimating the value ŷ_{T+1} and the goal is to minimize the error, which is typically represented as a function of ŷ_{T+1} − y_{T+1}. This estimation can be extended when the horizon of prediction is greater than one, that is, when the objective is to predict a sample at a time T + h, y_{T+h}. In this situation, the best prediction is reached when a function of Σ_{i=1}^{h} (ŷ_{T+i} − y_{T+i}) is minimized.

Time series can be graphically represented. In particular, the x-axis identifies the time (t = 1, 2, ..., T) whereas the y-axis shows the values recorded at particular time stamps (y_t). This representation allows the visual detection of the most salient features of a series, such as the amplitude of the oscillations, the existing seasons and cycles, or the existence of anomalous data or outliers. Figure 2 illustrates, as an example, the price evolution for a particular period of 2006 in the Spanish electricity market.

A usual strategy to analyze time series is to decompose them into three main components [Brockwell and Davis 2002; Shumway and Stoffer 2011]: trend, seasonality and irregular components, also known as residuals.

(1) Trend. It is the general movement that the variable exhibits during the observation period, without considering seasonality and irregulars. Some authors prefer to refer to the trend as the long-term movement that a time series shows. Trends can present different profiles, such as linear, exponential or parabolic.

(2) Seasonality. This component typically represents the periodical fluctuations of the variable subjected to analysis. It consists of effects that are reasonably stable along the observation period.
Fig. 2. Electricity price evolution, with the price on the y-axis and time (hours) on the x-axis.
Fig. 3. Decomposition of a time series into trend, seasonality and residuals (magnitude versus time).
(3) Residuals. This is the irregular component that remains once the trend and seasonality have been removed from the series.
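As a rough sketch of this additive decomposition (plain Python with hypothetical toy data, not an example from the survey), a centered moving average can estimate the trend, per-position means of the detrended series can estimate the seasonality, and whatever remains is the residual:

```python
# Additive decomposition sketch: y_t = trend_t + season_t + residual_t
# Toy series: a linear trend plus a period-4 seasonal pattern (illustrative only).
period = 4
y = [t * 0.5 + [2.0, -1.0, 0.5, -1.5][t % period] for t in range(24)]

def moving_average_trend(series, window):
    """Centered moving average; None where the window does not fit."""
    half = window // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        trend[t] = sum(series[t - half:t + half + 1]) / window
    return trend

trend = moving_average_trend(y, period + 1)  # odd window centered on t

# Detrend, then average the detrended values at each seasonal position
detrended = [y[t] - trend[t] for t in range(len(y)) if trend[t] is not None]
positions = [t % period for t in range(len(y)) if trend[t] is not None]
season = [
    sum(d for d, p in zip(detrended, positions) if p == s)
    / max(1, sum(1 for p in positions if p == s))
    for s in range(period)
]

# Residual: what neither trend nor seasonality explains
residual = [y[t] - trend[t] - season[t % period]
            for t in range(len(y)) if trend[t] is not None]
```

By construction the residuals average out to zero at each seasonal position; for a real series they would carry the irregular component discussed above.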
Figure 3 depicts how a time series can be decomposed into the components described above.

Obviously, real-world time series present a meaningful irregular component, which makes their prediction an especially hard task to fulfil. Some forecasting techniques are focused on detecting trend and seasonality (especially traditional classical methods); however, the residuals are the most challenging component to be predicted. The effectiveness of one technique or another is assessed according to its capability of forecasting this particular component. It is in the analysis of this component where data mining-based techniques have been shown to be particularly powerful, as this survey will attempt to show in the next sections.

There exist real complex phenomena that cannot be represented by means of linear difference equations since they are not fully deterministic. Therefore, it may be desirable to insert a random component in order to allow a higher flexibility in their analysis.

3. LINEAR METHODS

Linear forecasting methods are those that try to model a time series behavior by means of a linear function. From all the existing techniques, five of them are quite popular: AR, MA, ARMA, ARIMA and VAR. These models follow a common methodology, whose application to time series analysis was first introduced by Box and Jenkins. The original work has been extended and republished many times since its first appearance in 1970, but the newest version can be found in [Box and Jenkins 2008]. Autoregressive AR(p), moving average MA(q), mixed ARMA(p, q) and autoregressive integrated moving average ARIMA(p, d, q) models were described following this idea, where p is the number of autoregressive parameters, q is the number of moving average parameters and d is the number of differentiations for the series to be stationary. Vector autoregressive models VAR(p) are the natural extension of AR models to multivariate time series, where p denotes the number of lags considered in the system.
There are some other models, such as ARCH (autoregressive conditional heteroscedasticity, first presented in [Engle 1982]) or GARCH (generalized ARCH, introduced in [Bollerslev 1986]), which follow a similar methodology. However, these two methods are especially designed to cope with volatile time series, that is, with series that exhibit high volatility and outlying data. As this particular kind of data deserves a thorough study, the authors have decided not to include any further information about them (for detailed information refer to [Xekalaki and Degiannakis 2010; Francq and Zakoian 2010]).

An autoregressive process (AR) is denoted by AR(p), where p is the order of the AR process. This process assumes that every y_t can be expressed as a linear combination of some past values. It is a simple model, but one that adequately describes many real complex phenomena. The generalized AR model of order p is described by:

y_t = Σ_{i=1}^{p} φ_i · y_{t−i} + ε_t    (1)

where φ_i are the coefficients that model the linear combination, ε_t the adjustment error, and p the order of the model.
When the error is small compared to the actual values, a future value can be estimated as follows:

ŷ_t = Σ_{i=1}^{p} φ_i · y_{t−i}    (2)
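As a minimal sketch (plain Python over synthetic data; not code from the survey), the coefficient of an AR(1) model can be estimated by least squares and then plugged into Equation (2) for a one-step-ahead forecast:

```python
import random

random.seed(0)

# Simulate an AR(1) process: y_t = phi * y_{t-1} + eps_t
phi_true = 0.7
y = [0.0]
for _ in range(500):
    y.append(phi_true * y[-1] + random.gauss(0.0, 1.0))

# Least-squares estimate of phi: minimizes sum of (y_t - phi * y_{t-1})^2,
# whose closed-form solution is the ratio below.
num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi_hat = num / den

# One-step-ahead forecast, Equation (2) with p = 1
forecast = phi_hat * y[-1]
```

For p > 1 the same idea leads to a p-dimensional linear system (the normal or Yule-Walker equations) instead of a single ratio.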
Vector autoregressive models (VAR) are the natural extension of the univariate AR to multivariate time series. VAR models have been shown to be especially useful to describe dynamic behaviors in time series and, therefore, to forecast. In a VAR process of order p with N variables, VAR(p), N different equations are estimated. In each equation, a regression of the target variable over p lags is carried out. Unlike the univariate case, VAR allows each series to be related to its own lags and to the lags of the other series that form the system. For instance, in a two-time-series system there are two equations, one for each variable. This two-series system (VAR(1), N = 2) can be mathematically expressed as follows:

y_{1,t} = φ_{11} · y_{1,t−1} + φ_{12} · y_{2,t−1} + ε_{1,t}    (3)

y_{2,t} = φ_{21} · y_{1,t−1} + φ_{22} · y_{2,t−1} + ε_{2,t}    (4)

where y_{i,t} for i = 1, 2 are the series to be modeled, ε_{i,t} the error terms, and the φ's the coefficients to be estimated. Note that the selection of an optimum length of the lag is a critical task for VAR processes and, for this reason, it has been widely discussed in the literature [Yang 2002].
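A one-step VAR(1) forecast for the two-series system of Equations (3) and (4) reduces to a small matrix-vector product once the coefficients are known. The coefficients below are illustrative placeholders, not values estimated from real data:

```python
# Illustrative VAR(1) coefficients phi[i][j] for a two-series system
phi = [[0.5, 0.2],
       [0.1, 0.4]]
y_prev = [1.0, 2.0]  # [y1_{t-1}, y2_{t-1}]

# y_{i,t} = phi_{i1} * y1_{t-1} + phi_{i2} * y2_{t-1}  (noise term omitted)
y_t = [sum(phi[i][j] * y_prev[j] for j in range(2)) for i in range(2)]
```

Each forecast thus mixes both series, which is exactly what distinguishes VAR from fitting two independent univariate AR models.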
When the error cannot be assumed to be negligible, AR processes are not valid. In this situation it is practical to use the moving average (MA) process, where the series is represented as a linear combination of the error values:

y_t = ε_t + Σ_{i=1}^{q} θ_i · ε_{t−i}    (6)

where q is the order of the MA model and θ_i are the coefficients of the linear combination. As observed, it is not necessary to make explicit use of past values of y to estimate its future value. Finally, MA processes are seldom used alone in practice.
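For illustration (plain Python with a hypothetical coefficient), an MA(1) series in the sense of Equation (6) is generated purely from a linear combination of error terms, with no lagged values of y itself:

```python
import random

random.seed(1)

theta = 0.6  # illustrative MA(1) coefficient
eps = [random.gauss(0.0, 1.0) for _ in range(200)]

# y_t = eps_t + theta * eps_{t-1}   (Equation (6) with q = 1)
y = [eps[t] + theta * eps[t - 1] for t in range(1, len(eps))]
```

A quick check of such a series is its lag-1 autocorrelation, which for MA(1) should be close to theta / (1 + theta^2) and near zero at all larger lags.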
Autoregressive and moving average models are combined in order to generate better approximations than those of Wold's representation [Wold 1954]. This hybrid model is called the autoregressive moving average process (ARMA) and is denoted by ARMA(p, q). Formally:

y_t = Σ_{i=1}^{p} φ_i · y_{t−i} + Σ_{i=1}^{q} θ_i · ε_{t−i} + ε_t    (7)

Again, ARMA assumes that ε_t is small compared to y_t in order to estimate future values ŷ_t. The estimates of the past errors at time t−i can be obtained from the past actual values y_t and their estimates ŷ_t:

ε̂_{t−i} = y_{t−i} − ŷ_{t−i}    (8)

Thus, ŷ_t is calculated as follows:

ŷ_t = Σ_{i=1}^{p} φ_i · y_{t−i} + Σ_{i=1}^{q} θ_i · ε̂_{t−i}    (9)

where ε̂_{t−i} is the estimate of the error ε_{t−i} and is one of the terms in the prediction equation.

These models follow a common methodology, whose application to time series analysis was first introduced by Box and Jenkins [Box and Jenkins 2008]. Thus, this methodology proposes an iterative process formed by four main steps, as illustrated in Figure 4.
(1) Identification of the model. A tentative structure is selected for the series under study, that is, the orders p, d and q are determined.

(2) Estimation of the parameters. Once the model has been identified, its parameters are estimated by following non-linear strategies. From all of them, three stand out: the Gauss-Newton method, the least squares minimization and the maximum likelihood.

(3) Validation of the model. Once the ARIMA model has been estimated, several hypotheses have to be validated. Thus, the fitness of the model, the residual values or the significance of the coefficients forming the model are forced to agree with some requirements. In cases in which this step is not fulfilled, the process begins again and the parameters are recalculated.

(4) Forecasts. Finally, if the parameters have been properly determined and validated, the system is ready to perform forecasts.
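The recursion implied by Equations (8) and (9) can be sketched directly: run through the series computing residual estimates, then issue the next one-step forecast. The coefficients and data below are illustrative placeholders, not values fitted with the methodology above:

```python
# Illustrative ARMA(1,1) one-step forecasting via Equations (8) and (9)
phi, theta = 0.5, 0.3   # hypothetical, fixed coefficients (not estimated)
y = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 1.0]

eps_hat = [0.0]      # the error before the first sample is unknown; start at 0
forecasts = [y[0]]   # no information before t = 0; bootstrap with the actual value
for t in range(1, len(y)):
    # Equation (9): prediction from the previous value and previous residual
    y_hat = phi * y[t - 1] + theta * eps_hat[t - 1]
    forecasts.append(y_hat)
    # Equation (8): residual estimate = actual minus predicted
    eps_hat.append(y[t] - y_hat)

# Forecast for the sample following the observed series
next_forecast = phi * y[-1] + theta * eps_hat[-1]
```

In a full Box-Jenkins cycle, phi and theta would come from step (2) and the eps_hat sequence would feed the residual checks of step (3).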
Ghosh presented an ARIMA variation in [Ghosh and Das 2004], which was used to predict the maximum monthly demand in the state of Maharashtra, India. He used a database that gathers data from April 1980 to June 1999 and made a prediction for the eighteen subsequent months. The reported results were good. A similar approach was applied in another work, but this time the variable to be predicted was the inbound air travel arrivals to Taiwan.

The authors in [Conejo et al. 2005] used the wavelet transform and ARIMA models to predict the day-ahead electricity price. Indeed, they first used the wavelet transform to split the available historical data into constitutive series. Then, specific ARIMA models were applied to these series and the forecasts were obtained by applying the inverse wavelet transform to the forecasts of these constitutive series. This approach improved former strategies that they had also published [Jiménez and Conejo 1999; Nogales et al. 2002; Contreras et al. 2003].

In [García-Martos et al. 2007], ARIMA models, selected by means of the Bayesian Information Criterion, were proposed to obtain the forecasts of electricity prices in the Spanish market. In addition, the work analyzed the optimal number of samples used to build the prediction models.

Weron et al. [Weron and Misiorek 2008] presented twelve parametric and semi-parametric time series models to predict electricity prices for the next day. Moreover, in this work forecasting intervals were provided and evaluated taking into account the conditional and unconditional coverage. They concluded that the intervals obtained by semi-parametric models are better than those of parametric models.
4. RULE-BASED FORECASTING
Prediction based on decision rules usually makes reference to the expert system developed by Collopy and Armstrong in 1992 [Collopy and Armstrong 1992]. The initial approach consisted of 99 rules that combined four extrapolation-based forecasting methods: linear regression, Holt-Winters' exponential smoothing, Brown's exponential smoothing and random walk. During the prediction process, 28 features were extracted in order to characterize the time series. Consequently, this strategy assumed that a time series can be reliably identified by some features. Nevertheless, just eight features were obtained by the system itself, since the remaining ones were selected by means of experts' inspections. This fact implies high inefficiency insofar as too much time is taken, the ability of the analyst plays an important (and subjective) role, and it shows a medium reliability.

4.1 Fundamentals

Formally, an association rule (AR) can be expressed as a sentence such that: If A Then B, with A a logic predicate over the attributes whose fulfillment involves classifying the elements with the label B. Rule-based learning tries to find rules involving the highest number of attributes and samples. ARs were first defined by Agrawal et al. [Agrawal et al. 1993] as follows. Let I = {i_1, i_2, ..., i_n} be a set of n items, and D = {tr_1, tr_2, ..., tr_N} a set of N transactions, where each tr_j contains a subset of items. Thus, a rule can be defined as X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. Finally, X and Y are called the antecedent (or left side of the rule) and the consequent (or right side of the rule), respectively.

When the domain is continuous, the association rules are known as quantitative association rules (QAR). In this context, let F = {F_1, ..., F_n} be a set of features with values in R. Let A and C be two disjoint subsets of F, that is, A ⊂ F, C ⊂ F, and A ∩ C = ∅. A QAR is a rule X ⇒ Y, in which the features in A belong to the antecedent X, and the features in C belong to the consequent Y, such that:

X = ⋀_{F_i ∈ A} F_i ∈ [l_i, u_i]    (10)
Y = ⋀_{F_j ∈ C} F_j ∈ [l_j, u_j]    (11)

where l_i and l_j represent the lower limits of the intervals for F_i and F_j, respectively, and u_i and u_j the upper ones. For instance, a QAR could be numerically expressed as:
F_1 ∈ [l_1, u_1] and F_3 ∈ [l_3, u_3] ⇒ F_2 ∈ [l_2, u_2] and F_5 ∈ [l_5, u_5]    (12)
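A quantitative association rule of the form in Equation (12) can be evaluated over a table of feature values by checking interval membership. The intervals and data rows below are entirely illustrative:

```python
# Rows of feature values F1..F5 (hypothetical data, for illustration only)
rows = [
    {"F1": 13.0, "F2": 4.1, "F3": 25.0, "F4": 0.0, "F5": 14.0},
    {"F1": 10.0, "F2": 3.5, "F3": 26.0, "F4": 1.0, "F5": 13.0},
    {"F1": 14.5, "F2": 6.2, "F3": 27.0, "F4": 2.0, "F5": 20.0},
]

# Antecedent X and consequent Y as {feature: (lower, upper)} intervals
antecedent = {"F1": (12.0, 15.0), "F3": (24.0, 28.0)}
consequent = {"F2": (3.0, 5.0), "F5": (12.0, 18.0)}

def holds(row, intervals):
    """True when every feature lies inside its [lower, upper] interval."""
    return all(lo <= row[f] <= hi for f, (lo, hi) in intervals.items())

# Support: fraction of rows satisfying both sides of the rule;
# confidence: among rows satisfying X, the fraction also satisfying Y.
matches_x = [r for r in rows if holds(r, antecedent)]
matches_xy = [r for r in matches_x if holds(r, consequent)]
support = len(matches_xy) / len(rows)
confidence = len(matches_xy) / len(matches_x) if matches_x else 0.0
```

Support and confidence are the usual quality measures for such rules; a rule-mining algorithm searches for interval combinations that maximize them.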
In addition, the system was implemented in order to deal with stationary and trended time series.

5. NON-LINEAR METHODS

5.1 Artificial neural networks

An artificial neural network (ANN) is formed by interconnected neurons, in which every input is multiplied by a weight. Inputs can excite the neuron, if a positive weighted synapsis is carried out, or, on the contrary, they can inhibit it if the weight is negative. Finally, if the sum of the weighted inputs is equal to or greater than a certain threshold, the neuron is activated. Neurons present, consequently, binary results: activation or no activation. Figure 5 illustrates a usual structure of an ANN.

There are three main features that characterize a neural network: topology, learning paradigm and the representation of the information. A brief description of them is now provided.

(1) Topology of the ANN. The architecture of a neural network consists in the organization and position of the neurons with regard to the input or output of the network. In this sense, the fundamental parameters of the network are the number of layers, the number of neurons per layer, the connection grade and the type of connections among neurons. With reference to the number of layers, ANNs can be classified into monolayer or multilayer networks (MLP). The first ones only have one input layer and one output layer, whereas the multilayer networks [Rosenblatt 1958] are a generalization of the monolayer ones, which add intermediate or hidden layers between the input and the output. With reference to the type of connections, networks are called feedforward if the propagation is produced in just one way and, therefore, they do not have a memory, or recurrent if they keep feedback links between neurons in different layers, neurons in the same layer or within the same neuron. Finally, regarding the connection grade, networks can be totally connected, if all neurons in a layer are connected with the neurons in the next layer (feedforward networks) or with the neurons in the previous layer (recurrent networks), or partially connected, in cases where there is no total connection among neurons from different layers.

(2) Learning paradigm. Learning is a process that consists in modifying the weights of the ANN according to the input information. The changes that can be carried out during the learning process are removing (the weight is set to zero), adding (conversion of a weight equal to zero into a weight different from zero) or modifying connections between neurons. The learning process is said to be finished or, in other words, the network is said to have learnt, when the values assigned to the weights remain unchanged. The learning in ANN can be either supervised (perceptron and backpropagation [Rumelhart et al. 1986] techniques) or unsupervised, from which the self-organized Kohonen maps (SOM) [Kohonen 1986] stand out.

(3) Representation of the input/output information. ANNs can also be classified according to the way in which information relative to both input and output data is represented. Thus, in a great number of networks, input and output data are analog, which entails activation functions that are also analog, either linear or sigmoidal. In contrast, there are some networks that only allow discrete or even binary values as input data. In this situation, the neurons are activated by means of a step function. Finally, hybrid ANNs can be found in which input data may accept continuous values and output data would provide discrete values, or vice versa.
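The neuron activation described above, a weighted sum compared against a threshold, fits in a few lines of plain Python; the weights and threshold here are illustrative, not taken from any network in the survey:

```python
def neuron(inputs, weights, threshold):
    """Binary threshold unit: fires (1) when the weighted sum of the
    inputs reaches the threshold, stays inactive (0) otherwise.
    Positive weights excite the neuron; negative weights inhibit it."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Illustrative case: two excitatory inputs and one inhibitory input
out_active = neuron([1.0, 1.0, 0.0], [0.6, 0.5, -1.0], threshold=1.0)
out_inhibited = neuron([1.0, 1.0, 1.0], [0.6, 0.5, -1.0], threshold=1.0)
```

With the third (negatively weighted) input switched on, the weighted sum drops below the threshold and the neuron stays inactive; an MLP stacks layers of such units, usually replacing the hard threshold with a sigmoid.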
Many references proposing the use of ANNs, or a variation of them, as a powerful tool to forecast time series can be found in the literature. The most important works are detailed below. Furthermore, the creation of hybrid methods that highlight most of the strengths of each technique is currently one of the most popular lines of work among researchers and, from all of them, the combination of ANN and fuzzy logic stands out.
In [Abraham and Nath 2001] the performance of ANNs, fuzzy networks and ARIMA models was evaluated to forecast the electricity demand time series in the State of Victoria (Australia). The results showed that the fuzzy neural network outperformed the plain ANN and ARIMA models.

The detection of spike values is an arduous task usually faced by researchers. Hence, Saini and Soni [Saini and Soni 2003] implemented a multilayer perceptron with two hidden layers in order to predict peaks of demand in the Indian energy system, by using conjugate gradient methods. In order to avoid the trapping of the network into a state of local minima, the optimization of the learning rate and error was performed.

Rodríguez and Anders [Rodríguez and Anders 2004] presented a method to predict electricity prices by means of an ANN and fuzzy logic, as well as a combination of both. The basic selected network configuration consisted of a back propagation neural network with one hidden layer that used a sigmoid transfer function and a one-neuron output layer with a linear transfer function. They also reported the results of applying different regression-based techniques over the Ontario market.

A hybrid model that combines ANNs and fuzzy logic was introduced in [Amjady 2006]. As regards the neural network presented, it had a feed-forward architecture and three layers, where the hidden nodes of the proposed fuzzy neural network performed the fuzzification process. The approach was tested over the Spanish electricity price market and was shown to be better than many other techniques such as ARIMA, wavelet or MLP models.

Taylor et al. [Taylor 2006] compared six univariate time series methods to forecast electricity load for the Rio de Janeiro and the England and Wales markets. These methods were an ARIMA model and an exponential smoothing model (both for double seasonality), an artificial neural network, a regression model with a previous principal component analysis and two naive approaches as reference methods. The best method was the proposed exponential smoothing, and the regression model showed a good performance for the England and Wales demand.

Another neural network-based approach was introduced in [Catalao et al. 2007], in which multiple combinations were considered. These combinations consisted of networks with different numbers of hidden layers, different numbers of units in each layer and several types of transfer functions. The authors evaluated the accuracy of the approach, reporting the results from the electricity markets of mainland Spain and California.

The use of ANN for forecasting electricity prices in the Spanish market was also proposed in [Pino et al. 2008]. The main novelty of this work lies in the proposed training method for the ANN, which is based on making a previous selection of the MLP training samples using an ART-type [Zurada 1992] neural network.

In [El-Telbany and El-Karmi 2008], the authors discussed and presented results obtained by using an ANN, trained by a particle swarm optimization technique, to forecast the Jordanian electricity demand. They also showed the performance obtained by using a back propagation algorithm and autoregressive moving average models.

In 2009, the authors in [Yao 2009] integrated generalized linear auto-regression
and ANN for forecasting the coal demand of China. To evaluate the quality of that hybrid approach, they compared the performance of their own method with those of the two models used separately.

5.2 Genetic programming

A genetic algorithm (GA) [Goldberg 1989] is a kind of stochastic search algorithm based on natural selection procedures. Such algorithms try to imitate the biological evolutionary process, since they combine the survival of the best individuals in a set by means of a structured and random process of information exchange. Every time the process iterates, a new set of data structures is generated, gathering just the best individuals of older generations. Thus, GAs are evolutionary algorithms due to their capacity to efficiently exploit the information relating to past generations. This fact allows the speculation about new search points in the solution space, trying to obtain better models thanks to this evolution.

Many genetic operators can be defined. However, selection, crossover and mutation are the most relevant and most used, and they are now briefly described.

(1) Selection. During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as this process may be very time-consuming. Most functions are stochastic and designed so that a small proportion of less fit solutions are selected. This helps keep the diversity of the population large, preventing premature convergence on poor solutions. Popular and well-studied selection methods include roulette wheel selection and tournament selection.

(2) Crossover. Just after two parents are selected by any selection method, crossover takes place. Crossover is an operator that mates these two parents to produce offspring. The newborn individuals may be better than their parents, and the evolution process may thus continue. In most crossover operators, two individuals are randomly selected and recombined with a crossover probability, p_c. That is, a uniform random number r is generated and, if r ≤ p_c, the two randomly selected individuals undergo recombination. Otherwise, the offspring are exact copies of their parents. The value of p_c can either be set experimentally or be set based on schema-theorem principles [Goldberg 1989].

(3) Mutation. Mutation is the genetic operator that randomly changes one or more of the individuals' genes. The purpose of the mutation operator is to prevent the genetic population from converging to a local minimum and to introduce new possible solutions to the population.

Genetic programming (GP) is a natural evolution of GA and its first appearance in the literature dates from 1992 [Koza 1992]. It is a specialization of genetic algorithms where each individual is a computer program. Therefore, it is used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task. Hence, specialized
genetic operators that generalize crossover and mutation are used for tree-structured programs. The main steps to be followed when using GP are now summarized. Obviously, depending on the type of application, these steps may change in order to be adapted to the particular problem to be dealt with.

(1) Random generation of an initial population, that is, of programs.

(2) Iterative execution, until the stop condition to be determined in each situation is fulfilled, of the following steps:
  (a) To execute each program of the population and to assign it an aptitude value, according to its behavior in relation to the problem.
  (b) To create new programs by applying different primary operations to the programs:
    i. To copy an existing program into the new generation.
    ii. To create two programs from two existing ones, genetically and randomly recombining some chosen parts of both programs, making use of the crossover operator, which will also be randomly applied for each program.
    iii. To create a program from another randomly chosen one by randomly changing a gene.

(3) The program identified as possessing the best aptitude (the best of the last generation) is the designated result of the GP run.

The viability of forecasting the electricity demand via linear GP is analyzed in [Bhattacharya et al. 2002]. Hence, the authors considered load demand patterns for ten consecutive months, observed every thirty minutes, for the Victoria State of Australia. The performance was compared with an ANN and a neuro-fuzzy system, and the proposed system delivered the best results in terms of accuracy and computational cost.

When applying forecasting techniques, static environments are usually assumed, which makes them unsuitable for many real-world time series. In [Wagner et al. 2007] a new model of dynamic GP is developed, specifically devoted to non-static environments. It incorporates features that allow it to adapt to changing environments automatically, as well as to retain knowledge learned from previously encountered environments.
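The selection, crossover and mutation operators described earlier can be sketched over bit-string individuals. Everything here, including the one-max fitness function and the probability settings, is an illustrative toy, not a forecasting GA from the reviewed works:

```python
import random

random.seed(42)

def fitness(bits):
    """Illustrative fitness: number of ones (the 'one-max' toy problem)."""
    return sum(bits)

def tournament_select(pop, k=2):
    """Tournament selection: pick the fittest of k randomly sampled individuals."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b, pc=0.9):
    """One-point crossover with probability pc; otherwise copy the parents."""
    if random.random() < pc:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(bits, pm=0.05):
    """Flip each gene independently with probability pm."""
    return [1 - g if random.random() < pm else g for g in bits]

# One generational step over a small random population
pop = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
new_pop = []
while len(new_pop) < len(pop):
    c1, c2 = crossover(tournament_select(pop), tournament_select(pop))
    new_pop.extend([mutate(c1), mutate(c2)])
```

GP replaces the bit strings with program trees and adapts the operators accordingly, but the generational loop keeps this same shape.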
The authors tested the accuracy over the USA's gross domestic product (GDP) and consumer price inflation (CPI).

The work in [Xie et al. 2007] analyzed the ability of GP to forecast the price inflation in New Zealand. By using non-preprocessed temporal data in short intervals, the reported results showed that GP can perform accurate forecasts of price inflation over short and medium terms.

Another price forecasting strategy was proposed in [Amjady and Keynia 2009]. In fact, the authors presented a mutual information-based feature selection technique in which the prediction part was a cascaded neuro-evolutionary algorithm. The accuracy was extensively evaluated, since they compared their results, obtained from the Pennsylvania-New Jersey-Maryland and Spanish electricity markets, with those of seven different models.

Another linear GP was used in [Guven 2009] with the aim of predicting daily flow rate time series at an American station. In addition, the authors proposed two
ANN models that were calibrated with some training sets. The study showed that the performance of the linear GP was moderately better than that of the ANN. An evolutionary technique applied to the optimal short-term scheduling of electric energy production was presented in [Troncoso et al. 2004]. The equations that define the problem lead to a nonlinear mixed-integer programming problem with a large number of real and integer variables. The heuristics introduced to ensure the feasibility of the constraints are analyzed, along with a brief description of the proposed GA. Results from the Spanish power system were reported. An integrated algorithm for forecasting oil production based on a GA with variable parameters was presented in [Azadeh et al. 2009]. The algorithm identified the best model based on minimum absolute percentage error and variance analysis. In addition, the dynamic structure of the proposed GA identified conventional time series as the best model for oil production prediction. The method was validated on Russian, Brazilian, Indian and USA oil production over the period from 2001 to 2006.

5.3 Support vector machines

The support vector machine (SVM) model, as it is understood nowadays, initially appeared in 1992 at the Computational Learning Theory (COLT) Conference and has been subsequently studied and extended [Cortes and Vapnik 1995; Vapnik 1998]. The interest in this learning model is continuously increasing, and it is considered an emerging and successful technique; indeed, it has become a widely accepted standard in the machine learning and data mining disciplines. The learning process in SVM is a constrained optimization problem that can be solved by means of quadratic programming. Its convexity guarantees a unique solution, which is an advantage with respect to the classical ANN model. Furthermore, current implementations are reasonably efficient for real-world problems with thousands of samples and attributes.

Support vector machines aim at separating points by means of a hyperplane, that is, a linear separator in a high-dimensional space whose functions are defined according to different kernels. Formally, a hyperplane in a D-dimensional space is defined as follows:

h(x) = <w, x> + b    (13)

where x is the sample, w ∈ R^D is the vector orthogonal to the hyperplane (the weight vector), b ∈ R is the bias or decision threshold, and <w, x> denotes the scalar product in R^D. If a binary classifier is required, the equation can be reformulated as:

f(x) = sign(h(x))    (14)

where the sign function is defined as:

sign(x) = +1 if x ≥ 0; −1 if x < 0    (15)
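The decision rule in equations (13)–(15) can be sketched in a few lines. The weight vector and bias below are arbitrary illustrative values, not the result of any training procedure:

```python
import numpy as np

def h(x, w, b):
    """Hyperplane function h(x) = <w, x> + b (equation 13)."""
    return np.dot(w, x) + b

def classify(x, w, b):
    """Binary decision f(x) = sign(h(x)) (equations 14-15), with sign(0) = +1."""
    return 1 if h(x, w, b) >= 0 else -1

# Illustrative hyperplane in D = 2: w is orthogonal to the separator, b shifts it.
w = np.array([1.0, 1.0])
b = -1.0

print(classify(np.array([2.0, 2.0]), w, b))    # point on the positive side -> +1
print(classify(np.array([-1.0, -1.0]), w, b))  # point on the negative side -> -1
```

Training consists precisely of choosing (w, b); the next paragraphs describe the criterion SVM uses for that choice.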
There exist many algorithms aimed at creating hyperplanes (w, b) for a linearly separable dataset. These algorithms guarantee convergence to a solution hyperplane, although the particularities of each of them lead to slightly different solutions. Note that infinitely many hyperplanes may perform an adequate separation. The key problem for the SVM is therefore to choose the best hyperplane, in other words, the hyperplane that maximizes the minimum distance (or geometric margin) between the samples in the dataset and the hyperplane itself. Another peculiarity of SVM is that it only takes into consideration the points lying on the frontier of the decision region, that is, the points that do not clearly belong to one class or the other. Such points are called support vectors. Figure 6 illustrates a bidimensional representation of a hyperplane equidistant to two classes, as well as the support vectors and the existing margin.

Fig. 6. Hyperplane (w, b) equidistant to two classes, margin and support vectors.

If a nonlinear transformation is carried out from the input space to the feature space, learning based on nonlinear separators is achieved with SVM. Kernel functions are then used to compute the scalar product of two vectors in the feature space. Consequently, the choice of an adequate kernel function is crucial, and a priori knowledge of the problem is required for a proper application of SVM. Nevertheless, the samples may not be linearly separable even in the feature space (see Figure 7). Trying to classify all the samples correctly can seriously compromise the generalization ability of the classifier. This problem is known as overfitting. In such situations it is desirable to accept that some samples will be misclassified in exchange for more promising and general separators. This behavior is achieved by introducing a soft margin in the model, whose objective function is the sum of two terms: the geometric margin and the regularization term. The relative importance of both terms is weighted by means of a parameter typically called C.
ACM Computing Surveys, Vol. V, No. N, Month 2011.
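A minimal sketch of the role of the soft-margin parameter C, assuming full-batch subgradient descent on the objective 0.5·||w||² + C·Σ hinge losses; real SVM solvers use quadratic programming as described above, and all data, step sizes and iteration counts here are illustrative:

```python
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, iters=1000):
    """Minimize 0.5*||w||^2 + C * sum of hinge losses by subgradient descent.
    Larger C penalizes margin violations more; smaller C favors a wider margin."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        viol = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two linearly separable illustrative clusters.
X = np.array([[2.0, 2.0], [3.0, 2.5], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -2.5], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_soft_margin(X, y, C=1.0)
pred = np.sign(X @ w + b)
print(pred)
```

Only the samples with margin below one contribute to the gradient, which mirrors the fact that the solution depends on the support vectors alone.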
This model appeared in 1999 [Vapnik 1999], and it is the model that really enabled the practical use that SVMs have nowadays, since it provided robustness against noise. Many works have focused on forecasting time series by applying SVM.
F. Martínez-Álvarez et al.
Hence, the authors in [Huang et al. 2005] investigated the predictability of financial movement direction with SVM by forecasting the weekly movement direction of the NIKKEI 225 index. In order to evaluate the accuracy of the proposed SVM, they compared the reported results with those of linear and quadratic analysis, as well as with a back-propagation ANN. Further, they proposed a mixed model composed of a SVM and other classifiers, weighting each classifier according to expert experience. The study carried out in [Hong 2005] analyzed the suitability of applying SVM to forecast the electric load of the Taiwanese market. The results were compared with those of linear regressions and ANN. The same type of time series, but related to the Chinese market, was forecasted in [Guo et al. 2006], in which the authors reached a globally optimized prediction by applying a SVM. Fan et al. [Fan et al. 2006] proposed a hybrid machine learning model based on Bayesian classifiers and SVM. First, Bayesian clustering techniques were used to split the input data into 24 subsets. Then, SVM methods were applied to each subset to obtain the forecasts of the hourly electricity load. The occurrence of outliers (also called spike prices), or prices significantly larger than the expected values, is a usual feature found in these time series. With the aim of dealing with this feature, the authors in [Zhao et al. 2007] proposed a data mining framework based on both SVM and probability classifiers. The research published in [Wang and Wang 2008] proposed a new prediction approach based on SVM and rough set techniques, with a previous selection of features from the datasets by using an evolutionary method. The approach improved the forecasting quality and reduced the convergence time and the computational cost with regard to a conventional SVM and a hybrid model formed by a SVM and simulated annealing algorithms. The use of SVM to forecast the ratios of key gases in power transformer oil to detect failures early was proposed in [Fei et al. 2009]. In that research, a GA was used to determine the free parameters of the SVM. The experimental results showed that the new method achieved greater accuracy than the Grey model and ANN when the training set was small.
The continuous wavelet transform (CWT) of a discrete series x_{n'} at scale s can be written as:

CWT_x(n, s) = (1/√s) Σ_{n'=0}^{N−1} x_{n'} ψ*((n' − n)δt / s),  with n = 0, . . . , N − 1    (16)

where ψ* denotes the complex conjugate of the mother wavelet and δt the sampling interval.
As this product has to be computed N times for each scale s considered, if N is too large it is faster to estimate the result using the FFT than by means of the definition. By the convolution theorem [Proakis and Manolakis 1998], the CWT can be obtained as the inverse fast Fourier transform (IFFT) of the product of the Fourier transforms of the time series and of the wavelet:
CWT_x(n, s) = IFFT[ x̂_k ψ̂*(s ω_k) ]    (17)

where x̂_k is the discrete Fourier transform of the series and ψ̂ the Fourier transform of the wavelet at angular frequency ω_k.
Since s is the only parameter on which the transform depends, the estimation of the CWT can be carried out by means of FFT algorithms for each scale, simultaneously for all the points of the series. Conejo et al. [Conejo et al. 2005] proposed a new approach to predict day-ahead electricity prices based on the wavelet transform and ARIMA models. Thus, they decomposed the time series into a set of better-behaved constitutive series by applying
the wavelet transform. Then, the future values of these new series were forecast using ARIMA models, followed by the application of the inverse wavelet transform. A time series segmentation methodology was proposed in [Tseng et al. 2006]. It combined clustering techniques, wavelet transforms and GP in order to automatically search for segments and patterns in time series. To be precise, the GP was used to find the most suitable segmentation points from which patterns may be found. Aggarwal et al. [Aggarwal et al. 2008] also forecasted electricity prices. For this purpose, they divided each day into segments and applied a multiple linear regression to the original series or to the constitutive series obtained by the wavelet transform, depending on the segment. Moreover, the regression model used different input variables for each segment. Pindoriya et al. [Pindoriya et al. 2008] proposed an adaptive wavelet-based neural network for short-term electricity price forecasting in the Spanish and Californian markets. As for the neural network, the output of the hidden-layer neurons was based on wavelets that adapted their shape to the training data. The authors concluded that their approach converged at a higher rate and outperformed other methods in forecasting electricity prices, due to its ability to model non-stationary and high-frequency signals. A new method devoted to analyzing fatigue road loading can be found in [Zaharim et al. 2009]. In it, wavelet transforms and GA were combined, clustering the data into a sequence of nested partitions. The main goal was to address numerically bounded variables, dynamically focalized search, dynamic control of diversity and feasible region analysis, which are very common issues in evolutionary computation.
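The decompose-forecast-reconstruct scheme discussed above rests on an invertible decomposition. A minimal sketch with a one-level Haar transform (the surveyed works use other wavelet families, and forecast each constitutive series before inverting):

```python
import numpy as np

def haar_decompose(x):
    """One-level Haar DWT: split x (even length) into a smooth approximation
    series and a detail series, each half the original length."""
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_reconstruct(approx, detail):
    """Inverse of one Haar level: perfect reconstruction of the original series."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x
```

The approximation series is smoother (better behaved) than the original, which is why models such as ARIMA are fitted to the components rather than to the raw prices.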
this rule can be extended by weighting the importance of the neighbors, giving a larger weight to the truly nearest ones. The nearest neighbor search process can be defined as follows:
Definition. Given a dataset P in a metric space X equipped with a distance d, two different types of queries are to be answered:
Nearest neighbor: given a query point q ∈ X, find the point p ∈ P closest to q.
Range: given a query point q ∈ X and a radius r, find all the points p ∈ P that satisfy d(p, q) ≤ r.
Figure 8 illustrates an example in which k is set to three (the three nearest neighbors are searched for) and a Euclidean metric is used.
Formally, the classification rule is formulated as follows:

Definition. Let D = {e1, . . . , eN} be a dataset with N labeled examples, in which each example ei has m attributes (ei1, . . . , eim) belonging to the metric space E^m and a class Ci ∈ {C1, . . . , Cd}. The classification of each new example e' fulfills:

e' ∈ Ci  ⟺  d(e', ei) < d(e', ej) for all j ≠ i    (18)
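A minimal sketch of the nearest-neighbor rule with a Euclidean metric; the training points and labels below are purely illustrative, and the rule extends to k > 1 by majority vote:

```python
import numpy as np

def nn_classify(query, examples, labels):
    """1-NN rule: assign the label of the training example nearest to `query`
    under the Euclidean metric (the choice of metric matters, as discussed)."""
    dists = np.linalg.norm(examples - query, axis=1)
    return labels[np.argmin(dists)]

# Illustrative 2-dimensional training set with two classes.
examples = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = np.array(["A", "A", "B", "B"])

print(nn_classify(np.array([0.1, 0.2]), examples, labels))  # -> A
print(nn_classify(np.array([4.8, 5.2]), examples, labels))  # -> B
```

Replacing the norm with a weighted distance, as some of the works below do, only changes the `dists` line.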
where e' ∈ Ci indicates the assignment of the class label Ci to the example e', and d expresses a distance defined in the m-dimensional space E^m. An example is thus labeled according to its nearest neighbor's class. This closeness is defined by means of the distance d, which makes the choice of this metric essential, since different metrics will most likely generate different classifications. As a consequence, the choice of the metric is widely discussed in the literature, as shown in [Wang et al. 2007]. Note that the other main drawback of this technique is the selection of the number of neighbors to consider [Wang et al. 2006]. In [Troncoso et al. 2002] a forecasting algorithm based on nearest neighbors was introduced. The selected metric was the weighted Euclidean distance, and the weights were calculated by means of a GA. The authors forecasted electricity demand time series in the Spanish market, and the reported results were compared to those of an ANN. The same algorithm was tested on electricity price time series in [Troncoso et al. 2002]. In [?], the authors proposed a methodology based on weighted nearest neighbors (WNN) techniques. The proposed approach was applied to the 24-hour load forecasting problem, and an alternative model was built by means of a conventional dynamic regression technique, in which the parameters are estimated by solving a least squares problem, in order to perform a comparative analysis. A modification of the WNN methodology was proposed in [Troncoso et al. 2007]. To be precise, they explained how the relevant parameters, namely the window length of the time series and the number of neighbors, are chosen. Then, the approach weighted the nearest neighbors in order to improve the prediction
accuracy. The methodology was evaluated on the Spanish electricity price time series. The study presented in [Dimri et al. 2008] proposed a simple but effective NN-based method to forecast the probability of precipitation occurrence and its quantity in the western Himalayas. The model was able to give projections three days in advance with an accuracy verging on 85%. Although the forecast of the probability of precipitation occurrence was well performed, the quantitative precipitation forecast remained subject to large error rates.

7. OTHER MODELS

Despite the vast description of methods provided in the prior sections, some authors have proposed forecasting approaches that cannot be classified into any of the aforementioned categories. For this reason, this section is devoted to introducing these works. Hence, transfer function models (known as dynamic econometric models in the economics literature) based on past electricity prices and demand were proposed to forecast day-ahead electricity prices by Nogales et al. in [Nogales and Conejo 2006], although the prices of all 24 hours of the previous day were not known. They used the median as error measure due to the presence of outliers, and they stated that the model in which the demand was considered yielded better forecasts. The authors in [Pezzulli et al. 2006] focused on one-year-ahead electricity demand prediction for winter seasons by defining a new Bayesian hierarchical model. They provided the marginal posterior distributions of demand peaks. Also in [Cottet and Smith 2003], Bayesian models were used to forecast electricity demand. Moreover, a multiple linear regression model to forecast electricity consumption using input variables such as the gross domestic product, the price of electricity and the population was proposed in [Mohamed and Bodger 2005]. A novel non-parametric model using the manifold learning methodology was proposed in [Chen et al. 2007] in order to predict electricity price time series.
For this purpose, the authors used cluster analysis based on the embedded manifold of the original dataset. To be precise, they applied manifold-based dimensionality reduction to curve modeling, showing that the day-ahead curve can be represented by a low-dimensional manifold. Recently, a fuzzy inference system, adopted due to its transparency and interpretability, combined with traditional time series methods was proposed for day-ahead electricity price forecasting [Li et al. 2007]. Another different proposal can be found in [Wang and Meng 2008], where a forecasting algorithm based on Grey models was introduced to predict the load of Shanghai. In the Grey model, the original data series was transformed to reduce its noise, and the accuracy was improved by using Markov chain techniques.

8. CONCLUSIONS
ACKNOWLEDGMENTS
This research has been partially supported by the Spanish Ministry of Science and Technology under project TIN2007-68084-C-00, and by the Junta de Andalucía under
REFERENCES

Aggarwal, S. K., Saini, L. M., and Kumar, A. 2008. Price forecasting using wavelet transform and LSE based mixed model in Australian electricity market. International Journal of Energy Sector Management 2, 4, 521–546.
Aggarwal, S. K., Saini, L. M., and Kumar, A. 2009. Electricity price forecasting in deregulated markets: A review and evaluation. International Journal of Electrical Power and Energy Systems 31, 1, 13–22.
Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. 207–216.
Amjady, N. 2006. Day-ahead price forecasting of electricity markets by a new fuzzy neural network. IEEE Transactions on Power Systems 21, 2, 887–896.
Amjady, N. and Keynia, F. 2009. Day-ahead price forecasting of electricity markets by mutual information and cascaded neuro-evolutionary algorithm. IEEE Transactions on Power Systems 24, 1, 306–318.
Azadeh, A., et al. 2009. An integrated GA-time series algorithm for forecasting oil production estimation: USA, Russia, India, and Brazil. International Journal of Industrial and Systems Engineering 4, 4, 368–387.
Aznarte, J. L., Benítez, J. M., and Castro, J. L. 2008. Smooth transition autoregressive models and fuzzy rule-based systems: functional equivalence and consequences. Fuzzy Sets and Systems 158, 2734–2745.
Bhattacharya, M., Abraham, A., and Nath, B. 2002. A linear genetic programming approach for modelling electricity demand prediction in Victoria. In Proceedings of the International Workshop on Hybrid Intelligent Systems. 379–394.
Bollerslev, T. 1986. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. Review of Economics and Statistics 72, 498–505.
Box, G. and Jenkins, G. 2008. Time Series Analysis: Forecasting and Control. John Wiley and Sons.
Brockwell, P. J. and Davis, R. A. 2002. Introduction to Time Series and Forecasting. Springer.
Catalao, J. P. S., Mariano, S. J. P. S., Mendes, V. M. F., and Ferreira, L. A. F. M. 2007. Short-term electricity prices forecasting in a competitive market: a neural network approach. Electric Power Systems Research 77, 1297–1304.
Chen, C. F., Chang, Y. H., and Chang, Y. W. 2009. Seasonal ARIMA forecasting of inbound air travel arrivals to Taiwan. International Journal of Forecasting 5, 2, 25140.
Chen, J., Deng, S. J., and Huo, X. 2007. Electricity price curve modeling by manifold learning. IEEE Transactions on Power Systems 15, 723–736.
Cheong, C. W. 2009. Modeling and forecasting crude oil markets using ARCH-type models. Energy Policy 37, 6, 2346–2355.
Collopy, F. and Armstrong, J. S. 1992. Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations. Management Science 38, 1392–1414.
Conejo, A. J., Plazas, M. A., Espínola, R., and Molina, B. 2005. Day-ahead electricity price forecasting using the wavelet transform and ARIMA models. IEEE Transactions on Power Systems 20, 2, 1035–1042.
Contreras, J., Espínola, R., Nogales, F. J., and Conejo, A. J. 2003. ARIMA models to predict next-day electricity prices. IEEE Transactions on Power Systems 18, 3, 1014–1020.
Cortes, C. and Vapnik, V. 1995. Support-vector networks. Machine Learning 20, 3, 273–297.
Cottet, R. and Smith, M. 2003. Bayesian modeling and forecasting of intra-day electricity load. Journal of the American Statistical Association 98, 464, 839–849.
Cover, T. M. and Hart, P. E. 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27.
El-Telbany, M. and El-Karmi, F. 2008. Short-term forecasting of Jordanian electricity demand using particle swarm optimization. Electric Power Systems Research 78, 425–433.
Engle, R. F. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1008.
Fan, S., Mao, C., Zhang, J., and Chen, L. 2006. Forecasting electricity demand by hybrid machine learning model. Lecture Notes in Computer Science 4233, 952–963.
Fei, S. W., Liu, C. L., and Miao, Y. B. 2009. Support vector machine with genetic algorithm for forecasting of key-gas ratios in oil-immersed transformer. Expert Systems with Applications 36, 3, 6326–6331.
Francq, C. and Zakoian, J. M. 2010. GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley.
García, R. C., Contreras, J., van Akkeren, M., and García, J. B. 2005. A GARCH forecasting model to predict day-ahead electricity prices. IEEE Transactions on Power Systems 20, 2, 867–874.
García-Martos, C., Rodríguez, J., and Sánchez, M. J. 2007. Mixed models for short-run forecasting of electricity prices: application for the Spanish market. IEEE Transactions on Power Systems 22, 2, 544–552.
Ghosh, S. and Das, A. 2004. Short-run electricity demand forecasts in Maharashtra. Applied Economics 34, 8, 1055–1059.
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Massachusetts, USA.
Guirguis, H. S. and Felder, F. A. 2004. Further advances in forecasting day-ahead electricity prices using time series models. KIEE International Transactions on PE 4-A, 3, 159–166.
Guo, Y., Niu, D., and Chen, Y. 2006. Support-vector machine model in electricity load forecasting. In International Conference on Machine Learning and Cybernetics. 2892–2896.
Guven, A. 2009. Linear genetic programming for time-series modelling of daily flow rate. Journal of Earth System Science 118, 2, 137–146.
Hong, W. C. 2005. Electricity load forecasting by using SVM with simulated annealing algorithm. In Proceedings of the World Congress of Scientific Computation, Applied Mathematics and Simulation. 113–120.
Huang, W., Nakamori, Y., and Wang, S. Y. 2005. Forecasting stock market movement direction with support vector machine. Computers and Operations Research 32, 10, 2513–2522.
Jiménez, N. and Conejo, A. J. 1999. Short-term hydro-thermal coordination by Lagrangian relaxation: Solution of the dual problem. IEEE Transactions on Power Systems 14, 1, 89–95.
Kohonen, T. 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 59–69.
Koza, J. R. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
Li, G., Liu, C. C., Mattson, C., and Lawarree, J. 2007. Day-ahead electricity price forecasting in a grid environment. IEEE Transactions on Power Systems 22, 1, 266–274.
Liu, H. T. 2009. An integrated fuzzy time series forecasting system. Expert Systems with Applications 36, 6, 10045–10053.
Malo, P. and Kanto, A. 2006. Evaluating multivariate GARCH models in the Nordic electricity markets. Communications in Statistics: Simulation and Computation 35, 1, 117–148.
Maronna, R. A., Martin, R. D., and Yohai, V. J. 2007. Robust Statistics: Theory and Methods. Wiley.
Daubechies, I. 1992. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics.
Dimri, A. P., Joshi, P., and Ganju, A. 2008. Precipitation forecast over western Himalayas using k-nearest neighbour method. International Journal of Climatology 28, 1, 21–27.
Martínez-Ballesteros, M., Martínez-Álvarez, F., Troncoso, A., and Riquelme, J. C. ...
McCulloch, W. S. and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133.
Mohamed, Z. and Bodger, P. 2005. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 30, 1833–1843.
Nam, H., Lee, K., and Lee, D. 2009. ... microarray data sets. BMC Bioinformatics 10, 3, 21–28.
Nogales, F. J. and Conejo, A. J. 2006. Electricity price forecasting through transfer function models. Journal of the Operational Research Society 57, 350–356.
Nogales, F. J., Contreras, J., Conejo, A. J., and Espínola, R. 2002. Forecasting next-day electricity prices by time series models. IEEE Transactions on Power Systems 17, 2, 342–348.
Pezzulli, S., Frederic, P., Majithia, S., Sabbagh, S., Black, E., Sutton, R., and Stephenson, D. 2006. The seasonal forecast of electricity demand: a hierarchical Bayesian model with climatological weather generator. Applied Stochastic Models in Business and Industry 22, 113–125.
Pindoriya, N. M., Singh, S. N., and Singh, S. K. 2008. An adaptive wavelet neural network-based energy price forecasting in electricity markets. IEEE Transactions on Power Systems 23, 3, 1423–1432.
Plazas, M. A., Conejo, A. J., and Prieto, F. J. 2005. Multimarket optimal bidding for a power producer. IEEE Transactions on Power Systems 20, 4, 2041–2050.
Proakis, J. G. and Manolakis, D. G. 1998. Digital Signal Processing. Prentice Hall.
Ramsay, J. O. and Silverman, B. W. 2005. Functional Data Analysis. Springer.
Rodríguez, C. P. and Anders, G. J. 2004. Energy price forecasting in the Ontario competitive power system market. IEEE Transactions on Power Systems 19, 1, 366–374.
Rosenblatt, F. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65, 386–408.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. Learning Internal Representations by Error Propagation. MIT Press.
Saini, L. M. and Soni, M. K. 2002. ... conjugate gradient methods. IEEE Transactions on Power Systems 18, 1, 99–105.
Shumway, R. H. and Stoffer, D. S. 2011. Time Series Analysis and Its Applications (with R Examples). Springer.
Stepnicka, M., Pavliska, V., Novak, V., Perfilieva, I., Vavrickova, L., and Tomanova, I. 2009. Time series analysis and prediction based on fuzzy rules and the fuzzy transform. In Proceedings of the Fuzzy Systems Association World Congress. 483–488.
Taylor, J. 2006. Density forecasting for the efficient balancing of the generation and consumption of electricity. International Journal of Forecasting 22, 707–724.
Troncoso, A., Riquelme, J. C., Riquelme, J. M., Martínez, J. L., and Gómez, A. 2002. Electricity market price forecasting: Neural networks versus weighted-distance k nearest neighbours. In Lecture Notes in Computer Science.
Troncoso, A., et al. 2004. ... Lecture Notes in Computer Science.
Troncoso, A., et al. 2007. Electricity market price forecasting based on weighted nearest neighbours techniques. IEEE Transactions on Power Systems 22, 3.
Vapnik, V. 1998. Statistical Learning Theory. Wiley.
Vapnik, V. 1999. ...
Wang, J., Neskovic, P., and Cooper, L. N. 2006. Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recognition 39, 417–423.
Wang, J., Neskovic, P., and Cooper, L. N. 2007. Improving nearest neighbor rule with a simple adaptive distance measure. Pattern Recognition Letters 28, 207–213.
Wang, J. M. and Wang, L. P. 2008. ... In Proceedings of the 7th International Conference on Machine Learning and Cybernetics. 1244–1248.
Wang, X. and Meng, M. 2008. ...
Wold, H. 1954. A Study in the Analysis of Stationary Time Series.
Xekalaki, E. and Degiannakis, S. 2010. ARCH Models for Financial Applications. Wiley.
Xie, H., Zhang, M., and Andreae, P. 2007. ... In Proceedings of the IEEE Congress on Evolutionary Computation. 2538–2545.
Yang, M. 2002. Lag length and mean break in stationary VAR models. The Econometrics Journal 5, 374–386.
Yao, P. 2009. ... prediction. Lecture Notes in Computer Science 5551, 993–1001.
Zurada, J. M. 1992. Introduction to Artificial Neural Systems. West Publishing Company.