
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Energy Time Series Forecasting Based on Pattern Sequence Similarity
Francisco Martínez-Álvarez, Alicia Troncoso, José C. Riquelme and Jesús S. Aguilar-Ruiz

Abstract: This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences. First, clustering techniques are used with the aim of grouping and labeling the samples from a dataset. Then, the prediction of a data point is provided as follows. First, the pattern sequence prior to the day to be predicted is extracted. Then, this sequence is searched in the historical data and the prediction is calculated by averaging all the samples immediately after the matched sequence. The main novelty is that only the labels associated with each pattern are considered to forecast the future behavior of the time series, avoiding the use of real values of the time series until the last step of the prediction process. Results from several energy time series are reported and the performance of the proposed method is compared to that of recently published techniques, showing a remarkable improvement in the prediction.

Keywords: Time series, forecasting, patterns.

Francisco Martínez-Álvarez, Alicia Troncoso and Jesús S. Aguilar-Ruiz are with the Pablo de Olavide University, Seville, Spain (e-mail: fmaralv@upo.es; ali@upo.es; aguilar@upo.es) and José C. Riquelme is with the Department of Computer Science, University of Seville, Seville, Spain (e-mail: riquelme@lsi.us.es). The financial support from the Spanish Ministry of Science and Technology, project TIN2007-68084-C-00, and from the Junta de Andalucía, project P07-TIC-02611, is acknowledged.

Digital Object Identifier 10.1109/TKDE.2010.227 1041-4347/10/$26.00 © 2010 IEEE

I. Introduction

The analysis of temporal data and the prediction of future values of time series are among the most important problems that data analysts face in many fields, ranging from finance and economics to production operations management or telecommunications.

A forecast is a prediction of some future event(s). Forecasting problems are often classified as short-term, medium-term and long-term. Short-term forecasting problems involve predicting events only a few time periods (days, weeks, months) into the future. Medium-term forecasts extend from one to two years and long-term forecasting problems can extend beyond that by many years.

Time series data can be defined as a chronological sequence of observations on a variable of interest. Most forecasting problems imply the use of such data, whose analysis has traditionally been done by means of classical statistical tools. Nowadays, data mining techniques are acquiring great relevance due to the large number of samples forming the time series in multiple areas.

A new approach, called Pattern Sequence-based Forecasting (PSF), is presented here in order to forecast time series. This work can be considered a generalization of the algorithm introduced in [33], which is based on nearest neighbors techniques. Nevertheless, the new approach makes predictions using only labels generated by means of clustering techniques. This fact discretizes and simplifies the process of prediction, since during the whole process the PSF algorithm deals with sequences of labels instead of sets of real values. Although a naive use of labels to predict time series was presented in [21], the PSF includes a new methodology to automate the obtaining of the labels, providing rules to assign them to the samples of real values. The sensitivity of the key parameter involved in the selection of the number of underlying patterns is also analyzed in order to study the robustness of the method. The number of labels comprising the pattern sequence, used in each prediction process, is systematically determined in this work.

The PSF algorithm aims to be a general-purpose forecasting procedure. However, electricity-related problems are addressed in this work. To be precise, two major groups of time series are forecasted: electricity prices and electricity demand. These groups belong to three different markets: the Spanish Electricity Market Operator (OMEL), the New York Independent System Operator (NYISO) and Australia's National Electricity Market (ANEM). Therefore, the overall experimentation consists of six independent time series, thus showing the adaptability of the PSF to miscellaneous time series. Moreover, in order to facilitate the comparison of the obtained results, all the data sets analyzed are available on-line [19], [25], [26].

The rest of the paper is organized as follows. Section II presents an exhaustive review of the state of the art on electricity prices and demand time series forecasting. Section III introduces the proposed methodology and the description of the PSF algorithm, which can be applied to time series of any nature. Section IV shows the results obtained by the PSF approach in the electric energy markets of Spain, Australia and New York for the whole year 2006, including measures of their quality. In Section V comparisons between the proposed method and other techniques are provided. Finally, Section VI summarizes the main conclusions achieved and gives clues for future work.

II. Related work

The forecasting of energy time series has been widely studied in the literature, as described in Sections II-A and II-B.

A. Electricity prices time series forecasting

The electric power markets have become competitive markets due to the deregulation carried out in recent years, allowing the participation of all producers, investors, traders or qualified buyers. Thus, the price of electricity is determined on the basis of this buying/selling system. Consequently, a desire to obtain optimized bidding strategies has arisen in the electricity-producer companies [29], which need both insight into future electricity prices and an assessment of the risk of trusting predicted prices.

Electricity price time series present some peculiarities, such as nonconstant mean and variance as well as the presence of outliers, that turn the forecasting into an especially difficult task. Due to this fact, the accomplishment of accurate forecasting has motivated research works by many authors [2], [37].

The authors in [7] used the wavelet transform and autoregressive integrated moving average (ARIMA) models to predict the day-ahead electricity price. Indeed, they first used the wavelet transform to split the available historical data into constitutive series. Then, specific ARIMA models were applied to these series and the forecasts were obtained by applying the inverse wavelet transform to the forecasts of these constitutive series. In [15] ARIMA models, selected by means of Bayesian Information Criteria, were proposed to obtain the forecasts of the prices. In addition, the work analyzed the optimal number of samples used to build the prediction models. Aggarwal et al. [3] divided each day into segments and applied a multiple linear regression to the original series or to the constitutive series obtained by the wavelet transform, depending on the segment. Moreover, the regression model used different input variables for each segment.

Equally noticeable was the approach proposed by García et al. [14], in which a forecasting technique based on a generalized autoregressive conditional heteroskedasticity (GARCH) model was presented. Hence, this paper focused on day-ahead forecast of electricity prices with high volatility periods.

Transfer function models based on past electricity prices and demand were proposed to forecast day-ahead electricity prices by Nogales et al. in [24], but the prices of all 24 hours of the previous day were not known. They used the median as measure due to the presence of outliers and they stated that the model in which the demand was considered presented better forecasts.

Weron et al. [38] presented twelve parametric and semiparametric time series models to predict electricity prices for the next day. Moreover, in this work forecasting intervals were provided and evaluated taking into account the conditional and unconditional coverage. They concluded that the intervals obtained by semiparametric models are better than those of parametric models.

A hybrid model that combined artificial neural networks (ANN) and fuzzy logic was introduced in [4]. The neural network presented had a feed-forward architecture and three layers, where the hidden nodes of the proposed fuzzy neural network performed the fuzzification process. Following this technique, another neural network-based approach was introduced in [5] in which multiple combinations were considered. These combinations consisted of networks with different numbers of hidden layers, different numbers of units in each layer and several types of transfer functions. Recently, Pindoriya et al. [28] proposed an artificial neural network in which the output of the hidden layer neurons was based on wavelets that adapted their shape to training data.

A modification of the weighted nearest neighbors (WNN) methodology is proposed in [33]. To be precise, the approach weighted the nearest neighbors in order to improve the prediction accuracy.

The occurrence of outliers (also called spike prices), that is, prices significantly higher than the expected values, is a usual feature found in these time series. With the aim of dealing with this feature, the authors in [41] proposed a data mining framework based on both support vector machines (SVM) and a probability classifier.

Recently, a fuzzy inference system (adopted due to its transparency and interpretability) combined with traditional time series methods was proposed for day-ahead electricity price forecasting [18].

B. Electricity demand time series forecasting

The process of forecasting the quantity of electricity required for a specific geographical area during a time period is called load forecasting or demand forecasting. This process is key since current technology allows only a small amount of electricity to be stored in batteries. Therefore, demand forecasting plays an important role for electricity power suppliers because both excess and insufficient energy production may lead to large costs and a significant reduction of benefits.

Load forecasting has been widely studied [31], [37]. The existing procedures are usually divided into two main groups [13]. The first one gathers traditional approaches such as regression, data smoothing techniques or Box-Jenkins models. Thus, the authors in [27] focused on the one year-ahead prediction for winter seasons by defining a new Bayesian hierarchical model. They provide the marginal posterior distributions of demand peaks. Also in [8] Bayesian models are used to forecast electricity demand. Moreover, a multiple linear regression model to forecast electricity consumption using input variables such as the gross domestic product, the price of electricity and the population was proposed in [23].

Taylor et al. [32] compared six univariate time series methods to forecast electricity load for the Rio de Janeiro and the England and Wales markets. These methods were an ARIMA model and an exponential smoothing (both for double seasonality), an artificial neural network, a regression model with a previous principal component analysis and two naive approaches as reference methods. The best method was the proposed exponential smoothing, and the regression model showed a good performance for the England and Wales demand.

The second main group gathers artificial intelligence techniques, among which expert systems, neural networks and fuzzy theory are the most popular [22]. In [11], the authors discussed and presented results by using an ANN to forecast the Jordanian electricity demand, which is trained by a particle swarm optimization
[Figure 1: flowchart. Top level: Data, CLUSTERING, PREDICTION; while more days remain (YES), the predicted sample is inserted into the data, otherwise (NO) the process ends. CLUSTERING panel: data normalization, obtain K (Silhouette, Dunn and Davies-Bouldin), select clustering (K-means), labeled data. PREDICTION panel: labeled data and data, obtain W, search for equal pattern sequences, estimate sample, forecasts.]

Fig. 1. Illustration of the proposed methodology. The clustering and prediction stages are further detailed.

technique. They also showed the performance obtained by using a back-propagation algorithm and autoregressive moving average (ARMA) models. An ANN-based forecasting technique can also be found in [30]. Another proposal can be found in [36], where a forecasting algorithm based on Grey Models was introduced to predict the load of Shanghai. In the Grey model the original data series was transformed to reduce the noise of the data series and the accuracy was improved by using Markov chain techniques. Fan et al. [12] proposed a hybrid machine learning model based on Bayesian classifiers and SVM. First, Bayesian clustering techniques were used to split the input data into 24 subsets. Then, SVM methods were applied to each subset to obtain the forecasts of the hourly electricity load. In [34], the authors proposed a methodology based on WNN techniques. The proposed approach was applied to the 24-hour load forecasting problem and they built an alternative model by means of a conventional dynamic regression technique to perform a comparative analysis. In [1] the performance of ANN, fuzzy networks and ARIMA models was evaluated to forecast the electricity demand time series in Victoria and the results showed that the fuzzy neural network outperformed the plain ANN and ARIMA models. Finally, [35] proposed a new prediction approach based on SVM techniques with a previous selection of features from data sets by using an evolutionary method. The creation of hybrid methods that highlight most of the strengths of each technique is currently the most popular line of work among researchers. And, among all hybrid methods, the combination of ANN and fuzzy set theory has become a new tool to be explored.

III. The proposed methodology

The proposed methodology is divided into two clearly differentiated phases. In a first step, a clustering technique is performed and, secondly, the phase of forecasting is applied by using the information provided by this clustering. The PSF algorithm is focused on predicting samples framed in a time series, either one-dimensional or multi-dimensional, previously labeled with clustering techniques. As soon as the clustering is applied, the algorithm only processes the number of the cluster (the label) associated with each pattern assigned to the samples, regardless of whether they had more than one feature.

With the PSF method, the horizon of prediction can be as long as desired. Hence, more than one sample can be predicted, making predictions of non-restricted length. This is possible because it is implemented with a closed loop that feeds the prediction of a sample back into the data set in order to predict the following sample. As a consequence, the PSF approach is able to insert the predicted samples in the data set with the aim of forecasting further samples. Therefore, in case the horizon of prediction is longer than one day, every predicted sample is inserted into the data set and considered to be a regular sample. This feature is especially useful when the prediction has to cover several days or a long-term prediction is required.
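The closed loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `closed_loop_forecast`, its parameters and the stand-in predictor `mean_of_last_two` are hypothetical names introduced only for illustration, and any one-step predictor (for instance one that re-clusters the enlarged data set before predicting) could be plugged in.

```python
def closed_loop_forecast(history, horizon, predict_next):
    """Forecast `horizon` samples, feeding every prediction back into the data.

    history      : list of past samples (each sample is a list of values).
    horizon      : number of samples to predict.
    predict_next : hypothetical one-step predictor; it receives the whole
                   (enlarged) data set and returns the next sample.
    """
    data = [list(sample) for sample in history]   # work on a copy
    forecasts = []
    for _ in range(horizon):
        next_sample = list(predict_next(data))
        forecasts.append(next_sample)
        data.append(next_sample)   # the prediction becomes a regular sample
    return forecasts


def mean_of_last_two(data):
    """Toy stand-in predictor (an assumption, not PSF): mean of the last two samples."""
    return [(a + b) / 2 for a, b in zip(data[-2], data[-1])]
```

With this design the forecasting horizon is not bounded by the predictor itself, at the cost of later predictions depending on earlier predicted values.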

Fig. 1 shows the basic idea behind the proposed methodology. All the steps composing this methodology are described in the subsequent subsections.

A. Data normalization

The first task to be completed is the normalization of data, which is only used for the clustering process. It can be assumed that the prices increase all along the year following a tendency in accordance with the intra-annual inflation. That is, the original trend is smoothed from the initial data. The transformation applied is,

$$x_j \leftarrow \frac{x_j}{\frac{1}{N}\sum_{i=1}^{N} x_i} \qquad (1)$$

where $x_j$ is the price/demand of the $j$-th hour of a day and $N$ is equal to 24 since each value represents one hour of the day.

B. Clustering technique

Given the database of hourly prices/demand, the clustering problem consists of identifying K groups or clusters such that the prices/demand curves of the days belonging to a cluster are similar to each other and dissimilar to the curves of those days belonging to other clusters, according to a distance. Clustering is a difficult task due to the great number of possible geometric shapes for the clusters and distances that can be considered.

As a consequence, the dimensionality of the database is drastically reduced from its initial 24 features (equivalent to the 24 hours of the day) to only one dimension (the label of the cluster to which the day belongs).

To achieve this, two questions should be answered: which clustering technique should be chosen? And, if appropriate, how many clusters should be created?

These two topics have been widely discussed in the literature [39]. Nevertheless, it seems that there is no unique answer because it depends on sensitive factors.

Crisp and fuzzy clustering are the two main branches of non-supervised classification. The discussion of choosing one technique or another can be found in [20], in which the well-known K-means algorithm was the optimal method to classify this kind of data set. For this reason, the K-means algorithm is the clustering technique used in this work during the whole process of prediction.

The K-means algorithm requires that the user provides the number of clusters to be created. However, this number is a priori unknown and its selection and later evaluations of the results obtained by the clustering are crucial for most engineering applications. Thus, the most challenging problem of the clustering realm has been to select the right number of clusters for data sets.

For all these reasons, three well-known validity indices have been applied to data in order to decide how many groups the original data set has to be split into: the silhouette index [16], the Davies-Bouldin index [9] and the Dunn index [10]. The three of them share a common feature: the new data structure obtained by the clustering algorithm is evaluated to test the validity of the partition.

The procedure to select the number of clusters to be generated, K, is now discussed. From the application of these three indices (see subsections III-B.1, III-B.2, III-B.3), two possible situations can appear: at least two indices (the majority) coincide in selecting the same K (the K eventually chosen) or none of them coincide. When the second situation occurs, the second best values of the three indices are also considered (together with the first best values). The K selected is, then, the one pointed to by the majority of all the cases. Further best values are included and analyzed until one K has more votes than the others.

B.1 The silhouette index

The silhouette function provides a quality measure of separation among the clusters obtained by using a clustering technique. The average distance of the object $i$ belonging to the cluster $A$ to all the objects in $A$ is denoted by $a(i)$, and the average distance of $i$ to all objects of a cluster $C \neq A$ is called $d(i, C)$. For every cluster $C \neq A$, $d(i, C)$ is computed and the smallest one is selected as follows,

$$b(i) = \min_{C \neq A} d(i, C) \quad \text{with } i \in A \qquad (2)$$

The value $b(i)$ represents the dissimilarity of the object $i$ to its nearest neighbor cluster. Thus, the silhouette values, $silh(i)$, are given by the following equation,

$$silh(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (3)$$

The value $silh(i)$ can range from $-1$ to $+1$, where $+1$ and $-1$ mean that the object $i$ belongs to an adequate or inadequate cluster, respectively. If the silhouette value of the object $i$ belonging to the cluster $A$ is close to zero, it means that the object $i$ could also be in the nearest neighbor cluster to $A$. If cluster $A$ is a set with only one element, the silhouette value of the object $i$ is not defined and, in this case, it is set to zero by convention. The objective function is the average of $silh(i)$ over the number of objects to be classified, and the best clustering is reached when this function is maximized.

B.2 The Dunn index

One of the most cited indices was proposed in [10]. The Dunn index (DU) aims to identify clusters with high inter-cluster distance and low intra-cluster distance. The Dunn index for K clusters $C_i$ with $i = 1, ..., K$ is defined by,

$$DU_K = \min_{i} \min_{j \neq i} f_{i,j} \qquad (4)$$

where

$$f_{i,j} = \frac{d(C_i, C_j)}{\max_{m} diam(C_m)} \qquad (5)$$
[Figure 2: sequence of daily labels 4 1 1 1 2 6 3 3 … 3 1 1 2 6 3 1 … 4 1 1 2 6 3 X, with three highlighted windows of length W = 5; X marks the day to be predicted. The 24-hour curves of the days that follow the matched windows (horizontal axis: hour of the day; vertical axis: values from 2.50 to 6.50) are combined in a third panel labeled AVERAGE.]

Fig. 2. PSF algorithm.
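The matching-and-averaging step illustrated in Fig. 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `psf_predict_day` and its arguments are hypothetical, days are indexed from 0, and the labels are assumed to have been produced beforehand by a clustering step. The window is shortened by one label whenever no earlier occurrence of the pattern is found.

```python
def psf_predict_day(labels, days, window):
    """Forecast the day that follows the history (one prediction step).

    labels : list with the cluster label of every historical day.
    days   : list of daily curves (lists of equal length, e.g. 24 values),
             aligned with `labels`.
    window : initial window length W.
    Returns (forecast, used_window, matches), where `matches` holds the index
    of the last day of every matched window.
    """
    n = len(labels)
    for w in range(min(window, n - 1), 0, -1):     # shrink W if nothing matches
        pattern = list(labels[n - w:])             # the w most recent labels
        # Candidate windows end at day j; day j + 1 must exist in the history.
        matches = [j for j in range(w - 1, n - 1)
                   if list(labels[j - w + 1:j + 1]) == pattern]
        if matches:
            followers = [days[j + 1] for j in matches]
            # Hour-by-hour average of the days following each match.
            forecast = [sum(hour) / len(followers) for hour in zip(*followers)]
            return forecast, w, matches
    raise ValueError("no earlier occurrence of the pattern, even with W = 1")
```

As a usage example, concatenating the label fragments shown in Fig. 2 into one toy history (dropping the ellipses, which is an assumption made only for illustration) and calling the function with `window=5` matches the two earlier occurrences of the pattern 1 1 2 6 3 and averages the curves of the days that follow them.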

$d(C_i, C_j)$ is the dissimilarity between clusters $C_i$ and $C_j$ defined by,

$$d(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} \|x - y\| \qquad (6)$$

and $diam(C)$ is the intra-cluster function or diameter of the cluster defined by this equation,

$$diam(C) = \max_{x, y \in C} \|x - y\| \qquad (7)$$

where $\|\cdot\|$ represents a norm.

In short, the existence of compact and well separated clusters is guaranteed if the Dunn index reaches high values. Therefore, the maximum is observed for the most probable number of clusters in the dataset.

B.3 The Davies-Bouldin index

The Davies-Bouldin index identifies as good clusters those compact clusters which are far from each other. The Davies-Bouldin index (DB) for K clusters $C_i$ with $i = 1, ..., K$ is defined according to,

$$DB_K = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} f_{i,j} \qquad (8)$$

where

$$f_{i,j} = \frac{diam(C_i) + diam(C_j)}{d(C_i, C_j)} \qquad (9)$$

and, in this case, the diameter of a cluster is defined as,

$$diam(C_i) = \left( \frac{1}{n_i} \sum_{x \in C_i} \|x - z_i\|^2 \right)^{1/2} \qquad (10)$$

with $n_i$ the number of points and $z_i$ the centroid of cluster $C_i$.

The existence of high-quality clusters is guaranteed if the Davies-Bouldin index reaches small values. Therefore, the optimal number of clusters is found when this index is minimized for the dataset.

C. The PSF algorithm

Given the hourly prices/demand recorded in the past, up to day $d - 1$, the forecasting problem aims at predicting the 24 hourly prices/demand corresponding to day $d$.

Let $X(i) \in \mathbb{R}^{24}$ be a vector composed of the 24 hourly energy prices/demand corresponding to a certain day $i$,

$$X(i) = [x_1, x_2, \ldots, x_{24}] \qquad (11)$$

Let $L_i \in \{1, ..., K\}$ be the label of the prices/demand of the day $i$ obtained as a previous step to the forecasting by using a clustering technique, where K is the number of clusters. Let $S_W^i$ be the sequence of labels of the prices/demand of the W consecutive days, from day $i$ backward, as follows,

$$S_W^i = [L_{i-W+1}, L_{i-W+2}, \ldots, L_{i-1}, L_i] \qquad (12)$$

where the length of the window, W, is a parameter to be determined (see Section III-D).

The PSF algorithm for the prediction of the hourly prices/demand of the day $d$ first searches for the sequences of labels which are exactly equal to $S_W^{d-1}$ in the database, providing the equal subsequences set, $ES_d$, defined by this equation,

$$ES_d = \left\{ j \text{ such that } S_W^j = S_W^{d-1} \right\} \qquad (13)$$

In case of finding no sequences in the database equal to $S_W^{d-1}$, the procedure searches for the sequences of labels which are
exactly equal to $S_{W-1}^{d-1}$, and so on successively. That is, the length of the window composed of the sequence of labels is decreased by one unit. This strategy guarantees that at least some sequences will be found when W is equal to one.

According to the PSF approach, the 24 hourly values of the time series for the day $d$ are predicted by averaging the values of the days following those in $ES_d$,

$$\hat{X}(d) = \frac{1}{size(ES_d)} \cdot \sum_{j \in ES_d} X(j + 1) \qquad (14)$$

where $size(ES_d)$ is the number of elements that belong to the set $ES_d$.

The full procedure of the PSF algorithm is detailed in Fig. 2 and a general scheme is presented in Fig. 3. In Fig. 3, "append" denotes insertion at the end.

In case of a medium or long-term prediction, in which the forecasting of more than one sample is required, the following tasks have to be carried out. First of all, the values of the predicted sample are linked to the whole data set. Second, the clustering process is repeated with the enlarged data set and, finally, the prediction step is performed (see Fig. 1).

    Input: Dataset D, number of clusters K, labeled dataset
           [L_1, L_2, ..., L_{d-2}, L_{d-1}], length of the window W and Test Set T
    Output: Forecasts X̂(d) for all days of T

    PSF()
      for each day d ∈ T
        ES_d ← {}
        X̂(d) ← 0
        S_W^{d-1} ← [L_{d-W}, L_{d-W+1}, ..., L_{d-2}, L_{d-1}]
        for each j such that X(j) ∈ D
          S_W^j ← [L_{j-W+1}, L_{j-W+2}, ..., L_{j-1}, L_j]
          if (S_W^j = S_W^{d-1})
            ES_d ← append(ES_d, j)
        for each j ∈ ES_d
          X̂(d) ← X̂(d) + X(j + 1)
        X̂(d) ← X̂(d) / size(ES_d)
        D ← append(D, X̂(d))
        [L_1, L_2, ..., L_{d-1}, L_d] ← clustering(D, K)
        d ← d + 1
      return X̂(d) for all days of T

Fig. 3. A general scheme of the algorithm PSF.

D. Determining the size of the window

The previous clustering generates a sequence of labels associated with every day (in Fig. 2 the sequence of numbers are these labels). Now, a sequence of labels is taken into consideration for further steps; concretely, if the day $d$ has to be predicted, the sequence of labels $S_W^{d-1} = [L_{d-W}, L_{d-W+1}, \ldots, L_{d-2}, L_{d-1}]$ is extracted from the data set and is used as a pattern of search, where W is the length of this sequence (or window).

The selection of W depends on the case under study but it can be systematically tuned. Thus, it is compulsory to perform a training phase to find an adequate value of W before applying the PSF approach.

The optimal number of labels comprising the window (parameter W) that will be used as a pattern of search to find all equal sequences of labels in the dataset is determined by minimizing the forecasting error when the PSF method is applied to a training set.

Mathematically, that means finding the value of W that minimizes the following function:

$$\sum_{d \in TS} \left\| \hat{X}(d) - X(d) \right\| \qquad (15)$$

where $\hat{X}(d)$ are the forecasted prices/demand for day $d$, according to the PSF method, $X(d)$ are the actual recorded prices/demand and $TS$ refers to the training set. Notice that, according to (14), $\hat{X}(d)$ is an implicit function of the discrete variable W. Hence, the application of standard mathematical programming methods is not possible when searching for W.

In practice, W is calculated by means of cross-validation. Cross-validation was originally defined as: "the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subsets are retained for subsequent use in confirming and validating the initial analysis" [17].

In this work, n-fold cross-validation is used to obtain the optimal value of W. In n-fold cross-validation, the original dataset is split into n subsets. From all the n subsets, one subset is used to validate the model, which is generated by the remaining n − 1 subsets. This process is repeated n times, using each of the n subsets exactly once to validate. The n results are then combined (usually averaged) in order to generate the final estimation. The advantage of this method lies in the use of all samples for both training and validation.

For the training phase, twelve folds have been used in this work (n = 12), where each fold represents a month of the year under study. The 12-fold cross-validation is then evaluated. The forecasting errors are calculated in every fold by varying the length of W. These monthly errors are denoted by $e_{month}\{W = j\}$ for $j = 1 \ldots W_{max}$, where $W_{max} = 10$, as is shown empirically in Section IV-B.2. Then, the average errors are calculated for each window size as follows,

$$A_j = \frac{1}{n} \sum_{i=1}^{n} e_{month}\{W = j\} \qquad (16)$$

where n = 12 and month = {Jan, . . . , Dec}.

The W selected is the one that minimizes the average error corresponding to the 12 folds (months) evaluated.

$$W = \arg\min\{A_j\} \quad \text{with } j = 1, ..., W_{max} \qquad (17)$$

IV. Results

The above described methodology has been applied to the electricity prices and demand of the Spanish [25], Australian [19] and New York [26] markets. These six data sets have been selected due to the great amount of forecasting
results published in the literature. These results will be used to establish a comparison with those of the method proposed in this work.

This section is structured as follows. First, the accuracy of the predictions is validated. Thus, the usual quality parameters are presented. Second, the PSF approach is trained in order to produce accurate predictions and, for this reason, the selection of both W and K is discussed here. Third, the prediction of the year 2006 is provided. Finally, a sensitivity analysis of the proposed method with regard to the number of clusters is presented.

A. Parameters of quality

In order to assess the performance of the PSF approach, several measures have been considered:

• Mean error relative to $\bar{x}$ (MER).

$$MER = 100 \cdot \frac{1}{N} \sum_{h=1}^{N} \frac{|\hat{x}_h - x_h|}{\bar{x}} \qquad (18)$$

where $\hat{x}_h$ and $x_h$ are the predicted and current prices/demand at hour $h$ respectively, $\bar{x}$ is the mean price/demand for the period of interest (a day or a week in this work) and $N$ is the number of predicted hours. Note that the mean price/demand is used in the denominator of (18) to avoid the effect of prices close to zero.

• Mean absolute error (MAE).

$$MAE = \frac{1}{N} \sum_{h=1}^{N} |\hat{x}_h - x_h| \qquad (19)$$

• Standard deviation of MER/MAE ($\sigma$).

$$\sigma = \sqrt{\frac{1}{N} \sum_{h=1}^{N} (e_h - \bar{e})^2} \qquad (20)$$

where

$$e_h = \frac{\hat{x}_h - x_h}{[\ldots]} \qquad (21)$$

and $\bar{e}$ is the mean of the hourly errors.

B. Training the PSF algorithm

In this subsection the number of clusters to be generated, as well as the length of the window comprising the sequence of labels that has to be searched along the time series, are presented.

B.1 Selecting the number of clusters

First of all, the number of clusters K has to be chosen and, for this purpose, the twelve months of the year 2005 are considered for training the algorithm.

In order to validate the quality of the clusters produced

TABLE I
Number of clusters selected for all the markets.

    Market           Silhouette   DU      DB      Selection
    prices   OMEL    4 (3)        6 (4)   5 (4)   4
             NYISO   5            5       5       5
             ANEM    3            4       3       3
    demand   OMEL    8            8       7       8
             NYISO   4            4       3       4
             ANEM    5            5       5       5

Table I summarizes the results obtained by the three indices of cluster validity, in which the numbers in brackets represent the second best values, as described in Section III-B. A further study about the influence of the number of clusters on the results of the prediction is made in Section IV-D.

The values of the indices when K varies from 2 to 20 are depicted in Figs. 4, 5, and 6. Fig. 4(a) shows the results of all the three indices when varying K in the Spanish electricity price market. Apparently, all of them have different optimum values since the silhouette and the Dunn indices reach the maximum values in K = 4 and K = 6 (0.3536 and 0.0010 respectively), while the Davies-Bouldin index reaches its optimum value when K = 5 (0.7417). Nevertheless, a thorough analysis of all the values reveals that for K = 4 both Dunn and Davies-Bouldin indices have the second best result (0.0010 and 0.7703), with values really close to the optimum values. For these reasons, the number of clusters selected for this time series is K = 4. On the other hand, Fig. 4(b) illustrates the results of the Spanish demand market. As can be appreciated, both silhouette and Dunn indices select K = 8, reaching the maximum values in 0.5872 and 1.964E-06, respectively. On the contrary, the Davies-Bouldin index reaches its minimum and, consequently, its optimum value when K = 7. However, when K = 8 this index also presents a low value close to its global minimum (0.8983 versus 0.8208, respectively).

With reference to the New York electricity prices, the situation shown in Fig. 5(a) reveals an easy selection of the number of clusters. Actually, the three methods select K = 5 since both silhouette and Dunn indices reach the maximum values (0.5324 and 0.0004, respectively) while the Davies-Bouldin index reaches its minimum value (0.6602). The selection of K for the New York electricity demand time series is shown in Fig. 5(b). As can be noticed, both silhouette and Dunn indices reach the optimum values in K = 4 (0.7895 and 0.0002, respectively), but the Davies-Bouldin index reaches its minimum when K = 3 (0.8193). Even if the value in K = 4 is not especially low (0.9739), it can be considered a globally low value since this index reaches values such as 1.3125 in K = 6.

Fig. 6(a) shows the results obtained in the Australian electricity price market. As can be noticed, both silhouette and Davies-Bouldin indices reach the maximum and
by K-means algorithm, the silhouette, Dunn and Davies- minimum values (0.4858 and 0.6931 respectively) in K=3
Bouldinthree indices have been used in the experiments. . Oppositely, the Dunn index reaches its maximum and,
Thus, the optimal value of K is selected from a system consequently, its optimum value when K = 4. However,
based on majority votes. when K =3 this index also presents a high value verging
[Figure 4: mean silhouette, Dunn index and Davies-Bouldin index plotted against the number of clusters (2 to 20). Panel (a): OMEL - price. Panel (b): OMEL - demand.]

Fig. 4. Selecting the optimal number of clusters in OMEL time series.

[Figure 5: mean silhouette, Dunn index and Davies-Bouldin index plotted against the number of clusters (2 to 20). Panel (a): NYISO - price. Panel (b): NYISO - demand.]

Fig. 5. Selecting the optimal number of clusters in NYISO time series.

on its global maximum (0.0003 and 0.0002, respectively). With regard to the Australian electricity demand, the situation shown in Fig. 6(b) is conclusive. The three methods agree in selecting K = 5, since both the silhouette and Dunn indices reach their maximum values (0.5673 and 0.0178, respectively) while the Davies-Bouldin index reaches its minimum value (0.9800).
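The majority-vote selection discussed above can be illustrated with a short sketch. It is not the authors' implementation: the function name `select_k` and the tie-break rule (adding the second best values of Table I to the vote when no K wins a majority) are assumptions made for illustration, since the exact procedure is the one described in Section III-B.

```python
from collections import Counter


def select_k(best, second_best=None):
    """Pick the number of clusters K by majority vote.

    best: dict mapping an index name to the K at which it is optimal.
    second_best: optional dict mapping an index name to the K of its
        second best value; only used when no K wins a majority
        (assumed tie-break rule, not taken from the paper).
    Returns the selected K, or None if the vote stays undecided.
    """
    votes = Counter(best.values())
    k, count = votes.most_common(1)[0]
    if count > len(best) / 2:
        return k
    if second_best:
        # Let the second best values vote as well.
        votes.update(second_best.values())
        ranked = votes.most_common()
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            return ranked[0][0]
    return None


# OMEL demand: silhouette and Dunn agree on K = 8.
print(select_k({"silhouette": 8, "dunn": 8, "davies_bouldin": 7}))

# OMEL price: all three differ, second best values point to K = 4.
print(select_k({"silhouette": 4, "dunn": 6, "davies_bouldin": 5},
               second_best={"dunn": 4, "davies_bouldin": 4}))
```

With the values reported in the text, the sketch returns K = 8 for the Spanish demand and K = 4 for the Spanish prices, in agreement with the choices made above.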
[Figure 6: mean silhouette, Dunn index and Davies-Bouldin index plotted against the number of clusters (2 to 20). Panel (a): ANEM - price. Panel (b): ANEM - demand.]

Fig. 6. Selecting the optimal number of clusters in ANEM time series.
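Curves such as those in Figs. 4 to 6 are obtained by evaluating each validity index for every candidate K. As an illustration, the following sketch computes the Dunn index of a given labeling, assuming its classical definition (smallest distance between points of different clusters divided by the largest cluster diameter). The function name `dunn_index` is hypothetical and the exact variant used in the experiments is the one defined in Section III-B.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Classical Dunn index of a clustering (higher is better).

    points: sequence of equal-length numeric tuples (e.g. 24 hourly values).
    labels: cluster label of each point; at least two clusters are expected.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b) for c in clusters.values() for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
print(dunn_index(points, [0, 0, 1, 1]))  # compact, well separated clusters
print(dunn_index(points, [0, 1, 0, 1]))  # poorly chosen clusters
```

Repeating this computation on the labels produced by K-means for K = 2, ..., 20 gives one curve per index, from which the optimum is read.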

B.2 Selecting the length of the window

Once the number of clusters has been decided, the next step is to select the optimal length of the window W. Thus, this step is focused on finding the W that obtains the minimum prediction error in the training set.

Therefore, it is required to evaluate the performance of the PSF algorithm when W varies according to the methodology presented in Section III-D.

Table II shows how the prediction error varies in accordance with the number of patterns considered in the window. Note that the symbol '-' means that similar sequences of length W were not found when K clusters were considered in the training set. Finally, the W that yields the lowest prediction error is the value chosen for further forecasting on real data. It can be concluded that the optimal lengths of the windows are W = 5, W = 3 and W = 6 for the OMEL, NYISO and ANEM price time series, since they reach the lowest prediction errors (2.23%, 3.27% and 5.81%, respectively), and W = 2, W = 5 and W = 3 for the OMEL, NYISO and ANEM demand time series (2.87%, 4.99% and 3.43%, respectively).

C. Forecasting results

In this subsection the results obtained from the three different markets are provided. Precisely, Tables III, IV and V show the MER and the MAE (and the standard deviations σ in brackets) in the Spanish, New York and Australian electricity time series, both for prices and demand, for the whole year 2006. Although the average MER for the year 2006 in the OMEL prices time series is greater than that corresponding to NYISO (6.15% versus 5.53%), it can be noticed that the mean standard deviation of the MER in the Spanish market is lower than that of the New York market (0.27% versus 1.94%). This means that the maximum errors in the OMEL prices time series are closer to the average errors than the maximum errors obtained when the prices are predicted in the NYISO market. The standard deviation for the Australian market is the highest (4.40%) due to the many peak prices, considered as outliers, that occur in this prices time series.

TABLE III
Performance of the PSF algorithm for the year 2006 in OMEL time series.

        PRICES                         DEMAND
Month   MER (σ)        MAE (σ)        MER (σ)        MAE (σ) in MW
Jan.    7.26% (0.25)   0.53 (0.07)    3.12% (1.86)   744.32 (82.12)
Feb.    4.93% (0.19)   0.36 (0.04)    4.21% (2.26)   1033.97 (109.21)
Mar.    5.88% (0.22)   0.43 (0.05)    5.07% (4.17)   1001.71 (98.85)
Apr.    3.62% (0.18)   0.28 (0.03)    4.18% (1.28)   1006.30 (107.58)
May     8.11% (0.21)   0.64 (0.05)    5.90% (2.33)   1129.76 (96.37)
Jun.    3.76% (0.24)   0.29 (0.05)    2.89% (1.81)   693.60 (84.60)
Jul.    4.30% (0.23)   0.33 (0.04)    2.34% (1.19)   585.88 (63.17)
Aug.    5.37% (0.34)   0.42 (0.06)    3.61% (2.17)   792.21 (86.94)
Sep.    6.41% (0.31)   0.50 (0.06)    3.15% (1.55)   757.02 (75.39)
Oct.    7.89% (0.29)   0.58 (0.08)    2.89% (3.40)   1121.43 (149.75)
Nov.    8.30% (0.40)   0.64 (0.05)    4.72% (2.39)   982.19 (120.52)
Dec.    8.02% (0.36)   0.59 (0.07)    6.21% (3.82)   1503.44 (198.49)
Mean    6.15% (0.27)   0.47 (0.05)    4.02% (2.35)   945.99 (106.08)

Fig. 7 illustrates several prediction curves obtained for the Spanish market for the year 2006. Specifically, Fig. 7(a) shows the best and worst predictions generated by the PSF algorithm when electricity price curves were considered. With regard to the prices, the best prediction occurred on June 23rd, for which the MER was 3.10% and the MAE 0.12 c€/kWh, while the worst took place on May 8th, for which the MER was 9.39% and the MAE 0.80 c€/kWh. Note that these curves are expressed in euro cents per
TABLE II
MER obtained with the PSF algorithm for all the markets.

Market                   W=1      W=2     W=3     W=4     W=5     W=6      W=7      W=8     W=9     W=10   Selected W
OMEL - price (K = 4)     10.32%   8.44%   8.21%   4.39%   2.23%   2.89%    -        -       -       -      5
OMEL - demand (K = 8)    3.11%    2.87%   -       -       -       -        -        -       -       -      2
NYISO - price (K = 5)    7.09%    5.98%   3.27%   6.98%   4.45%   13.20%   10.31%   -       -       -      3
NYISO - demand (K = 4)   5.16%    6.21%   5.68%   5.02%   4.99%   6.23%    7.14%    6.90%   8.91%   -      5
ANEM - price (K = 3)     9.58%    7.91%   6.26%   6.17%   7.33%   5.81%    6.04%    9.12%   -       -      6
ANEM - demand (K = 5)    3.45%    4.17%   3.43%   6.10%   5.89%   4.02%    7.11%    -       -       -      3
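The choice of W from a row of Table II amounts to taking the window length with the lowest training MER among those for which a matching sequence was found. A minimal sketch follows; the function name `select_window` and the use of `None` for the '-' entries are illustrative assumptions.

```python
def select_window(mer_by_w):
    """Return the window length W with the lowest training MER.

    mer_by_w: dict mapping W to the MER in percent, or to None when no
        matching sequence of that length was found ('-' in Table II).
    """
    valid = {w: err for w, err in mer_by_w.items() if err is not None}
    if not valid:
        raise ValueError("no window length produced a prediction")
    return min(valid, key=valid.get)


# Row "OMEL - price (K = 4)" of Table II.
omel_price = {1: 10.32, 2: 8.44, 3: 8.21, 4: 4.39, 5: 2.23, 6: 2.89,
              7: None, 8: None, 9: None, 10: None}
print(select_window(omel_price))
```

For the row shown, the sketch selects W = 5, the value listed in the last column of the table.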

TABLE IV
Performance of the PSF algorithm for the year 2006 in NYISO time series.

        PRICES                         DEMAND
Month   MER (σ)        MAE (σ)        MER (σ)        MAE (σ) in MW
Jan.    4.45% (2.07)   2.25 (0.34)    5.05% (1.95)   53.21 (6.03)
Feb.    5.53% (1.52)   3.02 (0.28)    6.88% (2.62)   83.76 (9.19)
Mar.    6.30% (2.52)   3.97 (0.43)    5.31% (2.42)   59.63 (6.13)
Apr.    4.94% (1.47)   3.51 (0.61)    4.97% (2.22)   52.18 (7.21)
May     7.59% (2.13)   4.63 (0.43)    6.18% (2.39)   61.12 (5.74)
Jun.    3.34% (1.92)   2.31 (0.29)    3.75% (2.06)   44.17 (4.86)
Jul.    3.93% (1.68)   2.28 (0.20)    3.41% (1.78)   37.54 (4.01)
Aug.    5.37% (1.87)   3.49 (0.41)    3.99% (2.13)   39.86 (5.52)
Sep.    6.24% (1.74)   4.49 (0.53)    4.83% (2.16)   54.14 (6.87)
Oct.    7.43% (2.33)   4.23 (0.49)    5.37% (2.25)   65.08 (8.01)
Nov.    5.19% (2.09)   3.53 (0.30)    4.86% (1.99)   50.25 (5.11)
Dec.    6.04% (1.99)   3.08 (0.33)    6.80% (2.40)   82.55 (9.97)
Mean    5.53% (1.94)   3.40 (0.39)    5.97% (2.20)   56.96 (6.55)

TABLE V
Performance of the PSF algorithm for the year 2006 in ANEM time series.

        PRICES                           DEMAND
Month   MER (σ)          MAE (σ)        MER (σ)        MAE (σ) in MW
Jan.    5.58% (1.34)     1.51 (0.52)    4.74% (3.54)   412.38 (58.02)
Feb.    8.59% (3.24)     5.15 (2.21)    4.98% (2.98)   445.12 (43.29)
Mar.    7.84% (2.98)     1.73 (0.34)    5.02% (5.27)   430.00 (38.90)
Apr.    9.92% (3.90)     1.98 (0.63)    6.03% (7.46)   519.73 (57.71)
May     12.85% (4.03)    3.21 (1.02)    4.17% (2.72)   373.22 (43.28)
Jun.    22.04% (12.34)   6.81 (2.89)    5.67% (3.84)   561.44 (60.32)
Jul.    17.11% (10.58)   8.16 (3.42)    4.91% (5.84)   481.18 (53.67)
Aug.    11.71% (5.08)    3.32 (0.40)    5.88% (6.01)   555.07 (65.05)
Sep.    8.23% (2.45)     2.34 (0.23)    3.99% (2.74)   350.88 (32.41)
Oct.    7.66% (2.89)     1.92 (0.11)    4.04% (3.34)   340.73 (48.91)
Nov.    6.76% (1.94)     2.09 (0.34)    6.12% (5.90)   504.16 (70.42)
Dec.    6.42% (2.01)     1.41 (0.28)    3.91% (3.22)   329.26 (33.81)
Mean    10.39% (4.40)    3.30 (1.03)    4.96% (4.41)   441.93 (50.48)

[Figure 7: real versus forecast hourly curves over 24 hours. Panel (a): best and worst predictions for the Spanish electricity price (price in c€/kWh). Panel (b): best and worst predictions for the Spanish electricity demand (MW).]

Fig. 7. Best and worst predictions for the Spanish electricity market in 2006.

kilowatt-hour (c€/kWh). Fig. 7(b) presents the best and worst predictions when the electricity demand was analyzed. The best one took place on May 16th, for which the MER was 1.16% and the MAE 253.49 MW. On the other hand, the worst prediction had a MER of 8.67% and a MAE equal to 1759.03 MW, and it took place on December 12th. Note that these curves are expressed in megawatts (MW).

The results obtained for the New York market are illustrated in Fig. 8. Fig. 8(a) presents the best and worst prediction curves obtained for the New York electricity prices, which took place on July 8th (with a MER of 2.76% and a MAE of 1.41 $/MWh) and on May 12th (with a MER of 8.89% and a MAE of 6.89 $/MWh), respectively. Note that these curves are expressed in dollars per megawatt-hour ($/MWh). Similarly, Fig. 8(b) shows the best prediction (December 10th, with a MER and MAE equal to 2.67% and 28.47 MW, respectively) and the worst prediction (February 15th, with a MER and MAE equal to 10.56% and 97.89 MW, respectively) in the demand time series.

With respect to the Australian market, it is important to remark that it shows the information structured in different areas. Thus, the National Electricity Market in Australia is comprised of five jurisdictions: Queensland, New South Wales, Victoria, Tasmania and South Australia. The results in Table V refer to the Queensland market.

Fig. 9 shows the best and worst prediction curves obtained for the Australian market in the year 2006 for both the electricity prices and demand markets. Fig. 9(a) illustrates the best and worst prediction curves obtained for the electricity prices, which took place on May 12th (with associated MER and MAE of 3.66% and 0.98 $/MWh, respectively) and on July 20th (with associated MER and MAE of 65.60% and 28.39 $/MWh, respectively). The Australian electricity price market is characterized by the existence of many spike prices during the year. Indeed, many authors have studied how to perform accurate predictions in that market [41]. Even if the PSF algorithm is not able to find the real magnitude of such peaks, it is able to forecast their existence. This fact justifies the higher value of the MER obtained for that day. It can be observed how the proposed algorithm captures the trend of the prices
[Figure 8: real versus forecast hourly curves over 24 hours. Panel (a): best and worst predictions for the New York electricity price (price in $/MWh). Panel (b): best and worst predictions for the New York electricity demand (MW).]

Fig. 8. Best and worst predictions for the New York electricity market in 2006.

[Figure 9: real versus forecast hourly curves over 24 hours. Panel (a): best and worst predictions for the Australian electricity price (price in $/MWh). Panel (b): best and worst predictions for the Australian electricity demand (MW).]

Fig. 9. Best and worst predictions for the Australian electricity market in 2006.

time series in the Australian market, detecting a peak (an outlier) at 7:00 pm. Note that all the curves are expressed in Australian dollars per megawatt-hour ($/MWh).

Finally, Fig. 9(b) shows the best prediction (December 8th, with a MER and MAE equal to 2.67% and 322.82 MW, respectively) and the worst prediction (November 19th, with a MER and MAE equal to 10.56% and 638.92 MW, respectively) in the demand time series.

D. Sensitivity to the parameter K

In this subsection an a posteriori analysis of sensitivity to the parameter K is carried out in order to show the good performance of the three indices of cluster validity presented for these six time series and the robustness of the proposed method with regard to this parameter.

Fig. 10 shows the MER provided by the PSF algorithm in 2006 for the prices and demand time series when the number of clusters K ranges from 2 to 15.

From Table I and Fig. 10, the following can be stated.

Remark 1. The MER is minimum when five clusters (K = 5) are considered for the prices in New York and for the demand in Australia. All indices (silhouette, DU and DB) coincided in the optimal selection of this parameter for the aforementioned time series.

Remark 2. Two of the three indices selected the same number of clusters for the demand time series of the Spanish and New York markets (silhouette and DU, with K = 8 and K = 4, respectively) and for the prices time series of the Australian market (silhouette and DB, with K = 3). Indeed, the global minima of the MER for these three time series are reached at these values.

Remark 3. The three indices of cluster validity provided different values of the parameter K for the OMEL prices time series (K = 4, K = 6, and K = 5 for the silhouette, DU and DB indices, respectively). However, the MER obtained does not present significant differences when the number of clusters is equal to any of these three values. Nevertheless, for this time series, the global minimum is reached at the number of clusters selected by the proposed system based on majority votes.

[Figure 10: MER (%) versus K (2 to 15) for the OMEL, NYISO and ANEM markets. Left panel: sensitivity of K for prices. Right panel: sensitivity of K for demand.]

Fig. 10. Sensitivity of the PSF approach to the K parameter.

This analysis highlights the validity of the methodology followed in order to select K, since it reveals that the MER is minimized when the three indices agreed, that optimality is guaranteed when two of them (a majority) agreed and, finally, that the MER does not vary significantly when all of them are different and K is selected as described in subsection III-B.
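All the MER and MAE figures reported in this section follow the definitions (18) and (19) of Section IV-A. The sketch below shows one way to compute them for a single forecast. The function names are hypothetical, and because the normalization of the hourly error e_h in (21) is not recoverable here, `error_std` exposes it as a `scale` argument that defaults to 1, which is an assumption.

```python
from math import sqrt


def mer(predicted, actual):
    """Mean error relative to the mean of the actual values, in percent (18)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 100.0 * total / (n * mean_actual)


def mae(predicted, actual):
    """Mean absolute error, in the units of the series (19)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def error_std(predicted, actual, scale=1.0):
    """Standard deviation of the hourly errors (20).

    scale is the assumed normalization of each hourly error.
    """
    errors = [(p - a) / scale for p, a in zip(predicted, actual)]
    mean_error = sum(errors) / len(errors)
    return sqrt(sum((e - mean_error) ** 2 for e in errors) / len(errors))


predicted = [12.0, 18.0, 33.0]
actual = [10.0, 20.0, 30.0]
print(mer(predicted, actual), mae(predicted, actual),
      error_std(predicted, actual))
```

Dividing by the mean of the actual values rather than by each hourly value keeps the MER finite when individual prices are close to zero, which is the motivation given for (18).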
TABLE VI
MER for some weeks of the year 2002 (OMEL - price).

Week                 Naive    ANN      ARIMA    Mixed models   WNN      PSF
18th-24th Feb 2002   7.68%    5.23%    6.32%    6.15%          5.15%    5.98%
20th-26th May 2002   7.27%    6.36%    6.36%    4.46%          4.34%    4.51%
19th-25th Aug 2002   27.30%   11.40%   13.39%   14.90%         10.89%   9.11%
18th-24th Nov 2002   19.98%   13.65%   13.78%   11.68%         11.83%   10.07%
Average              15.56%   9.16%    9.96%    9.30%          8.05%    7.42%

V. Comparative analysis of PSF

A comparison between the results obtained by the PSF method and the most representative approaches reported in the literature is provided in this section, showing that the proposed approach improves on the aforementioned techniques. Thus, in order to validate the accuracy of the proposed algorithm, it has been applied to the specific periods of time in which other authors evaluated their own approaches.

This section is divided into two subsections. The first one gathers the forecasting results related to the electricity price markets, while the second one points out the enhancements achieved in electricity demand forecasting with the proposed methodology.

A. Electricity prices time series

A.1 The Spanish electricity prices market

The Spanish electricity prices market has been widely analyzed. Many authors have evaluated their own approaches over the time series for the year 2002 and, as a consequence, the literature offers multiple results for this year. The PSF algorithm is compared to four published approaches: ARIMA [7], ANN [5], mixed models [15] and WNN [33]. Finally, it is also compared to the naive Bayes classifier as a reference method.

As can be observed in Table VI, the proposed method has improved most of the MER rates. However, there are some exceptions, such as the week of February 18th-24th, in which the ANN obtained an error of 5.23% and the WNN 5.15% versus the 5.98% provided by the PSF method. The mixed models and the WNN method also obtained lower errors in the week of May 20th-26th (4.46% and 4.34%, respectively, versus 4.51%). Apart from these two weeks, the PSF algorithm was more accurate than the others. The mean error improved by more than 0.5 percentage points on the best of the compared methods (7.42% for the PSF versus 8.05% for the WNN).

The authors in [15] also forecasted a week of the year 2000. The comparison of the MER rates is shown in Table VII. The average MER for this week is 5.46% when the PSF method was applied, whereas the mixed models and ARIMA models yield an average of 7.04% and 8.17%, respectively. For this week, the average result is about 1.5 percentage points better than those obtained by the other methods. Therefore, the improvement reached by the proposed algorithm is considered successful.

TABLE VII
MER for one week of the year 2000 (OMEL - price).

Day       ARIMA    Mixed models   PSF
Day 1     4.30%    4.80%          3.74%
Day 2     7.99%    7.30%          6.91%
Day 3     4.57%    5.40%          3.45%
Day 4     10.81%   4.60%          5.21%
Day 5     6.12%    5.10%          4.48%
Day 6     17.34%   14.90%         9.63%
Day 7     6.05%    7.20%          4.81%
Average   8.17%    7.04%          5.46%

A.2 The New York electricity prices market

As for the New York electricity prices time series, the authors in [6] compared some forecasting algorithms with their own approach, called STR. They applied manifold-based dimensionality reduction to electricity price curve modeling and showed that there exists a low-dimensional manifold representation for the price curve in the New York electricity market. They compared it with an ARIMA model and with naive Bayes as a reference method.

Table VIII presents the MER obtained for the one-week-ahead electricity price forecasting for the second week of every month from February 2005 to January 2006. The last row shows the average errors when the horizon of prediction is 24 hours. It can be noticed that the PSF approach provides better predictions in most months. There are just two cases in which the STR outperforms the PSF algorithm: February 2005 and May 2005 (7.65% and 7.53% for STR versus 7.89% and 7.58% for PSF, respectively). Note that in these two cases the MER obtained by the PSF is not significantly higher. In addition, when the average error is evaluated, all the approaches obtained worse results than those of the PSF algorithm, which improves the result of STR by about 1.5 percentage points. An increment of approximately 2 percentage points can be observed when the horizon of prediction is one week instead of one day.

A.3 The Australian electricity prices market

The prices in Australia's National Electricity Market have also been predicted in [41]. It is remarkable that this market presents a special behavior, since many spike prices are observed. Although the authors in [41] have developed techniques based on SVM in order to deal with these particular days, the PSF algorithm does not make any assumption about the nature of the days to be predicted, insofar as it uses unsupervised learning and, consequently, no a priori information is known about the data.

Table IX shows the MER obtained for, precisely, these days of the year 2004 with peak prices. It can be observed
that the proposed method outperforms all the predictions produced by both the ARIMA and SVM approaches.

TABLE VIII
MER for the year 2005 (NYISO - price).

Month                     ARIMA    Naive    STR      PSF
Feb 2005                  14.57%   12.19%   7.65%    7.89%
Mar 2005                  13.28%   12.33%   10.19%   9.12%
Apr 2005                  10.68%   14.59%   10.53%   8.23%
May 2005                  14.21%   6.71%    7.53%    7.58%
Jun 2005                  21.64%   26.68%   13.88%   8.85%
Jul 2005                  14.63%   14.44%   10.41%   9.21%
Aug 2005                  9.49%    10.28%   6.42%    5.56%
Sep 2005                  10.36%   13.17%   7.31%    6.59%
Oct 2005                  11.84%   11.57%   10.11%   8.03%
Nov 2005                  11.24%   15.18%   8.99%    7.89%
Dec 2005                  21.78%   23.94%   13.30%   11.21%
Jan 2006                  26.01%   11.52%   13.28%   10.77%
Average                   14.98%   14.38%   9.95%    8.41%
Average (one-day-ahead)   7.39%    16.07%   7.10%    6.11%

TABLE IX
MER for some days of the year 2004 (ANEM - price).

Day (2004)   ARIMA    SVM      PSF
5th June     32.31%   18.09%   16.72%
17th June    29.09%   13.31%   8.31%
20th June    33.73%   17.11%   14.23%
21st June    24.18%   19.20%   18.93%
Average      29.82%   16.93%   14.55%

In [40], four weeks were predicted with different methods: a discrete wavelet transform (DWT), a multi-layer perceptron (MLP) and an SVM approach. Table X presents the MER provided by the PSF method and the aforementioned techniques when the horizon of prediction is one week. The average MER of the PSF algorithm is lower than that of all these methods.

TABLE X
MER for some weeks of the year 2004 (ANEM - price).

Week                DWT      MLP      SVM      PSF
Second of January   12.94%   25.81%   23.37%   15.62%
First of July       12.23%   8.36%    15.03%   9.12%
First of August     16.17%   15.85%   36.18%   13.98%
Third of December   10.01%   47.41%   33.74%   10.23%
Average             12.84%   24.36%   27.08%   12.23%

B. Electricity demand time series

B.1 The Spanish electricity demand

In order to compare the performance of the proposed approach in the Spanish electricity demand time series, the results provided in [34] are analyzed.

Table XI shows the comparison between a dynamic regression (DR), a method based on nearest neighbors techniques (kNN) and the PSF algorithm for the period from June to November of the year 2001. As can be noticed, the proposed algorithm obtained better predictions not only for MER but also for MAE when it was compared to the other methods considered in the literature.

Although the kNN performed well, the PSF was able to reduce the MER from 2.30% to 1.89%.

TABLE XI
MER, MAE and σ_MER for several months of the year 2001 (OMEL - demand).

Parameter   DR      kNN-based   PSF
MER         2.82%   2.30%       1.89%
σ_MER       0.019   0.015       0.014
MAE (MW)    572     471         454

B.2 The New York electricity demand

The authors in [12] used a machine learning model called MLF to predict the electricity demand for the next day. The forecasted period was January of the year 2004. Moreover, two methods were used in order to validate the forecasting: an SVM-based model and the prediction provided by the New York Independent System Operator (NYISO) itself.

Table XII shows the results of comparing PSF and the aforementioned methods in the same period. Given the difficulty of predicting demand time series, a relative improvement of about 5% (2.39% for PSF versus 2.51% for MLF) can be considered a remarkable enhancement.

TABLE XII
MER and MAE for January of the year 2004 (NYISO - demand).

Parameter   NYISO    SVM      MLF      PSF
MAE (MW)    214.40   226.87   178.21   176.03
MER         3.16%    3.27%    2.51%    2.39%

B.3 The Australian electricity demand

To compare the results provided by the proposed method for the Australian market, the work in [1] was considered. Two days were predicted (October 1st and 2nd of the year 1998) and three methods were used: fuzzy and plain neural networks and an ARIMA model.

As can be observed in Table XIII, the PSF algorithm improves the prediction error with respect to the other methods, including the very accurate fuzzy neural network.

TABLE XIII
MER for some days of the year 1998 (ANEM - demand).

Parameter   ARIMA   ANN     Fuzzy-ANN   PSF
MER         4.23%   3.23%   0.92%       0.90%

VI. Conclusions

In this paper, a new forecasting algorithm has been proposed to predict real-world time series. As a previous step to the prediction, a clustering technique has been used to label 24-dimensional time series. The main novelty lies in the exclusive use of the labels obtained by the clustering to forecast the future behavior of the time series, avoiding the use of the real values of the time series until the last step of the prediction process. Moreover, an automation of the selection of the critical parameters K and W has been proposed.
The algorithm has been successfully applied to the electricity price and demand time series of the Spanish, Australian and New York markets, providing very competitive results. The performance was accurate in all of them, thus showing the robustness and adaptability of the proposed approach for time series of different nature. This fact is especially remarkable since the approaches found in the literature are usually focused on only one specific time series.

Future work is focused on adjusting the model with dynamic window lengths and on smoothing the matching sequence criterion.

References

[1] A. Abraham and B. Nath. A neuro-fuzzy approach for forecasting electricity demand in Victoria. Applied Soft Computing Journal, 1(2):127-138, 2001.
[2] S. K. Aggarwal, L. M. Saini, and A. Kumar. Electricity price forecasting in deregulated markets: A review and evaluation. International Journal of Electrical Power and Energy Systems, 31(1):13-22, 2009.
[3] S. K. Aggarwal, L. M. Saini, and Ashwani Kumar. Price forecasting using wavelet transform and LSE based mixed model in Australian electricity market. International Journal of Energy Sector Management, 2(4):521-546, 2008.
[4] N. Amjady. Day-ahead price forecasting of electricity markets by a new fuzzy neural network. IEEE Transactions on Power Systems, 21(2):887-896, 2006.
[5] J. P. S. Catalao, S. J. P. S. Mariano, V. M. F. Mendes, and L. A. F. M. Ferreira. Short-term electricity prices forecasting in a competitive market: a neural network approach. Electric Power Systems Research, 77:1297-1304, 2007.
[6] J. Chen, S. J. Deng, and X. Huo. Electricity price curve modeling by manifold learning. IEEE Transactions on Power Systems, 15:723-736, 2007.
[7] A. J. Conejo, M. A. Plazas, R. Espínola, and B. Molina. Day-ahead electricity price forecasting using the wavelet transform and ARIMA models. IEEE Transactions on Power Systems, 20(2):1035-1042, 2005.
[8] R. Cottet and M. Smith. Bayesian modeling and forecasting of intraday electricity load. Journal of the American Statistical Association, 98(464):839-849, 2003.
[9] D. L. Davies and D. W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(4):224-227, 2000.
[10] J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95-104, 1974.
[11] M. El-Telbany and F. El-Karmi. Short-term forecasting of Jordanian electricity demand using particle swarm optimization.
[20] F. Martínez-Álvarez, A. Troncoso, J. C. Riquelme, and J. M. Riquelme. Partitioning-clustering techniques applied to the electricity price time series. Lecture Notes in Computer Science, 4881:990-999, 2007.
[21] F. Martínez-Álvarez, A. Troncoso, J. C. Riquelme, and J. S. Aguilar Ruiz. LBF: A labeled-based forecasting algorithm and its application to electricity price time series. In Proceedings of the eighth IEEE International Conference on Data Mining, pages 453-461, 2008.
[22] K. Metaxiotis, A. Kagiannas, D. Askounis, and J. Psarras. Artificial intelligence in short term electric load forecasting: A state-of-the-art survey for the researcher. Energy Conversion and Management, 44:1525-1534, 2003.
[23] Z. Mohamed and P. Bodger. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy, 30:1833-1843, 2005.
[24] F. J. Nogales and A. J. Conejo. Electricity price forecasting through transfer function models. Journal of the Operational Research Society, 57:350-356, 2006.
[25] Spanish Electricity Price Market Operator. http://www.omel.es.
[26] The New York Independent System Operator. http://www.nyiso.com.
[27] S. Pezzulli, P. Frederic, S. Majithia, S. Sabbagh, E. Black, R. Sutton, and D. Stephenson. The seasonal forecast of electricity demand: a hierarchical Bayesian model with climatological weather generator. Applied Stochastic Models in Business and Industry, 22:113-125, 2006.
[28] N. M. Pindoria, S. N. Singh, and S. K. Singh. An adaptative wavelet neural network-based energy price forecasting in electricity markets. IEEE Transactions on Power Systems, 23(3):1423-1432, 2008.
[29] M. A. Plazas, A. J. Conejo, and F. J. Prieto. Multimarket optimal bidding for a power producer. IEEE Transactions on Power Systems, 20(4):2041-2050, 2005.
[30] J. M. Riquelme, J. L. Martínez, A. Gómez, and D. Cros. Load pattern recognition and load forecasting by artificial neural networks. International Journal of Power and Energy Systems, 22(1):74-79, 2002.
[31] L. F. Sugianto and X. B. Lu. Demand forecasting in the deregulated market: a bibliography survey. In Proceedings of the Australasian Universities Power Engineering Conference, pages 1-6, 2002.
[32] J. W. Taylor, L. M. de Menezes, and P. E. McSharry. A comparison of univariate methods for forecasting electricity demand up to a day ahead. International Journal of Forecasting, 22:1-16, 2006.
[33] A. Troncoso, J. C. Riquelme, J. M. Riquelme, J. L. Martínez, and A. Gómez. Electricity market price forecasting based on weighted nearest neighbours techniques. IEEE Transactions on Power Systems, 22(3):1294-1301, 2007.
[34] A. Troncoso, J. M. Riquelme, J. C. Riquelme, A. Gómez, and J. L. Martínez. Time-series prediction: Application to the short term electric energy demand. Lecture Notes in Artificial Intelligence
Electric power systems research , 78:425433, 2008. , 3040:577586, 2004.
[35] J. Wang and L. Wang. A new method for short-term electricity
[12] S. Fan, C. Mao, J. Zhang, and L. Chen. Forecasting electricity
Lecture Notes in load forecasting. Transactions of the Institute of Measurement
demand by hybrid machine learning model.
Computer Science , 4233:952963, 2006.
and Control , 30(3):331344, 2008.

[13] E. A. Feinberg and D. Genethliou. Applied Mathematics for Re- [36] X. Wang and M. Meng. Forecasting electricity demand using
Proceedings of the Seventh International
structured Electric Power Systems, Chapter 12 . Springer, 2005.
grey-markov model. In
Conference on Machine Learning and Cybernetics , pages 1244
[14] R. C. García, J. Contreras, M. van Akkeren, and J. B. García.
1248, 2008.
A GARCH forecasting model to predict day-ahead electricity
IEEE Transactions on Power Systems [37] R. Modeling and Forecasting Electricity Loads and
Weron.
prices.
2005.
, 20(2):867874,
Prices . Wiley, 2006.
[38] R. Weron and A. Misiorek. Forecasting spot electricity prices: A
[15] C. García-Martos, J. Rodríguez, and M. J. Sánchez. Mixed mod-
comparison of parametric and semiparametric time series mod-
els for short-run forecasting of electricity prices:
for the spanish market. IEEE Transactions on Power Systems
Application
,
els.International Journal of Forecasting , 24:744763, 2008.
[39] R. Xu and D. C. Wunsch II. Survey of clustering algorithms.
22(2):544552, 2007.
Finding groups in Data: an IEEE Transactions on Neural Networks , 16(3):645678, 2005.
[16] L. Kaufman and P. J. Rousseeuw.
Introduction to Cluster Analysis [40] Z. Xu, Z. Y. Dong, and W. Liu. Neural Networks Applications
[17]
. Wiley, 1990.
R. Kohavi. A study of cross-validation and bootstrap for accu-
in Information Technology and Web Engineering, Chapter 22 .

racy estimation and model selection. InProceedings of Interna- Borneo Publishing, 2005.

tional Joint Conference on Articial Intelligence , pages 1137


[41] J. H. Zhao, Z. Y. Dong, X. Li, and K. P. Wong.
for electricity price spike analysis with advanced data mining
A framework

[18]
1143, 1995.
G. Li, C. C. Liu, C. Mattson, and J. Lawarrée. Day-ahead elec-
methods. IEEE Transactions on Power Systems , 22(1):376385,

tricity price forecasting in a grid environment. IEEE Transac- 2007.

tions on Power Systems , 22(1):266274, 2007.


[19] Australia's National Electricity Market.
http://www.nemmco.com.au .
