
MLP Predictive Models to Forecast Electricity Consumption
James Foot* Valeriu Mihai**
*Faculty of Science and Technology, University of the Algarve (e-mail: a40650@ualg.pt).
** Faculty of Science and Technology, University of the Algarve (e-mail: a41635@ualg.pt).
Abstract: In this paper, we present a different approach to short-term prediction of the electricity load
demand (ELD) for the Portuguese power grid. Specifically, we apply a multilayer perceptron (MLP) artificial
neural network (ANN) to forecast the ELD in a one-step-ahead fashion, i.e., every 15 minutes, based on
historical (altered) data from the Portuguese power grid company Redes Electricas Nacionais (REN). When
designing an ANN-MLP for time-series forecasting, the key design variables are the numbers of input,
hidden, and output neurons. There is no analytical method to determine these parameters; they must be
found through an iterative process. To obtain the best prediction performance, ANN models therefore
require an experimental approach that explores the ANN design space and different training strategies.
The NN models are trained with the Levenberg-Marquardt algorithm. Different experiments were carried
out to show which parameters are crucial for good prediction accuracy using a non-linear autoregressive
(NAR) predictive model.
Keywords: Electricity load demand; Power grid; Multilayer perceptron; Artificial Neural Network;
Prediction; Forecast; Modeling; Non-linear autoregressive predictive model.

1. INTRODUCTION
Electricity load demand forecasting is an important aspect of system management for any modern energy
company. Load forecasting can be used for scheduling maintenance, reducing spinning reserve capacity,
and scheduling individual plant production, which improves the reliability of the grid and reduces costs
for the company and the end consumer. There are several forecast lengths, depending on the objectives:
1. Long-term - typically a forecast from 1 to 10 years. This is used for major planning and investment,
e.g., if the ELD increases significantly, the planning and construction of a new power plant could take
up to several years.

2. Medium-term - typically a forecast from a couple of months to a year. This is used to ensure that
capacity constraints are met in the medium term.

3. Short-term - typically a forecast from a few minutes up to a day. This is used to assist planning and
to manage electricity production.

In this paper we'll focus on short-term prediction in order to better manage the electricity production
for the grid. This project is based on a paper (Ferreira et al., 2010) where the authors, using Radial
Basis Function ANNs, worked on creating a model to forecast, within a period of 24 to 48 hours, the ELD
for REN.

We were provided with a file eld180dias.txt containing the values of the ELD, measured every 15 minutes
during a period of 180 days.
1.1 Characteristics of ELD
There are several variables, such as time and random effects, that can affect the normal variation of the
ELD and so make a short-term prediction harder. As one can imagine, the electricity load demand differs
from day to night, and the demand at the weekend is different from the demand during weekdays, but all
these differences have a cyclic nature, i.e., the ELD at 12 pm on a Tuesday should be similar to the ELD
from the previous Tuesday at 12 pm, and so on. However, the occurrence of a public holiday, the shift to
and from daylight saving time, and even the start of a school year can cause changes to these cycles.
Random effects are another source of disturbance to the regular ELD; anything like heavy machinery in a
factory being used, widespread strikes, or special events can affect the load. Since we've only been
given data for the ELD values, we can't consider any of the variables mentioned above.
The paper is organized as follows: in section 2 we give a brief overview of multilayer perceptrons (MLPs)
and the Levenberg-Marquardt algorithm; in section 3 we describe the data set; in section 4 we describe
the experiments and the procedure to create and train the network; in section 5 we show the results; and
in section 6 we analyse them and discuss future work.

2. MODEL IDENTIFICATION PROCEDURE
As previously mentioned, the data available for this project is a series of historical measurements,
which limits the type of model structure available. A Non-linear Auto-Regressive (NAR) structure is one
in which the inputs are delayed values of the output, i.e., if y is the output of the ANN then

y(k) = f(y(k-1), y(k-2), y(k-3), ..., y(k-n))

where n is the number of delays. The ANN model is trained by the Levenberg-Marquardt (LM) algorithm.
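To make the regressor concrete, below is a minimal Matlab sketch (illustrative, not from the paper; the
vector y and the value n = 3 are assumptions) of how the delayed inputs and one-step-ahead targets are
paired:

    n = 3;                          % number of delays
    N = numel(y);                   % y is the measured ELD series
    X = zeros(N - n, n);            % one row of delayed values per sample
    t = zeros(N - n, 1);            % one-step-ahead targets
    for k = n+1:N
        X(k-n, :) = y(k-1:-1:k-n);  % inputs: y(k-1), y(k-2), y(k-3)
        t(k-n)    = y(k);           % target: y(k)
    end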
2.1 Multilayer Perceptron
The MLP is a subset of ANNs, defined as a system of massively distributed parallel processors (consisting
of simple processing units called neurons) that has a natural tendency for storing and utilizing
experiential knowledge (Yassin, 2008). Generally, the MLP learns the relationship between a set of inputs
and outputs by updating internal interconnections, called weights, using the back-propagation algorithm.
In an MLP, the units are arranged in interconnected layers: one input layer, one (or more) hidden layers,
and one output layer; this can be seen in Fig. 1 below. The numbers of input and output units are
typically fixed, since they depend on the input and desired output(s). However, the training algorithm
and the number of hidden units are adjustable, and can be set so as to maximize the performance of the
MLP.

Fig. 1. Configuration of a multilayer perceptron. The hidden component can have more than one layer.

A common problem in MLP training is over-generalization, referring to a condition where the MLP has been
trained until it has memorized the data it is given, rendering it unable to adapt and generalize to new
cases.
In order to obtain the optimum MLP generalization, the Early Stopping (ES) method divides the dataset
into three sets: the training set, and independent validation and testing sets. The training set is used
to update the MLP weights during the training phase, and the error in the independent validation set is
monitored. Since the validation set does not participate in the training process, it can be used as a
performance gauge to measure the generalization capabilities of the ANN when it encounters previously
unseen cases. If the training error continues to decrease, but the validation set error has started to
increase, this indicates that over-generalization has occurred, and training is stopped. ES is widely
used because it is simple to implement and understand, and has been reported to be superior to
regularization methods in many cases.
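In the Matlab NN toolbox (used in section 4), ES comes down to a couple of network properties; a minimal
sketch, assuming a network object net created as in section 4.1, is:

    net.divideFcn = 'divideblock';   % split the series into contiguous blocks
    net.trainParam.max_fail = 6;     % stop after 6 consecutive increases of
                                     % the validation error (toolbox default)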
2.2 Levenberg-Marquardt
The Levenberg-Marquardt algorithm, independently developed by Kenneth Levenberg and Donald Marquardt,
provides a numerical solution to the problem of minimizing a nonlinear function (Yu and Wilamowski, 2011;
Hagan and Menhaj, 1994). It is fast and has stable convergence. In the artificial neural network field
this algorithm is suitable for small- and medium-sized problems. The Levenberg-Marquardt update can be
written as

W_{k+1} = W_k - (J_k^T J_k + \mu I)^{-1} J_k^T e_k

where W_k are the weights at iteration k, J_k is the Jacobian of the error vector e_k with respect to the
weights, \mu is a positive scalar called the combination coefficient, and I is the identity matrix. Large
values of \mu make the update behave like gradient descent with a small step, while \mu approaching zero
recovers the Gauss-Newton method.
3. DATA SET
The data set used in the experiments corresponds to an altered version of the ELD data for the Portuguese
power grid, measured every 15 minutes during a period of 180 days, corresponding to 17280 values. The
complete time series is presented in Fig. 2. As mentioned in the previous section, we used 3 data sets: a
training set, a testing set, and a validation set. Out of the 17280 values, we used the first 70% for
training, then 15% for the testing set and another 15% for validation.

Fig. 2. Plot of the values in the file eld180dias.txt.

Table 1. Conversion table

Time                 Number of measurements
1 hour               4
24 hours (1 day)     96
48 hours (2 days)    192
1 week (7 days)      672


Table 2. Data set used in experiments

Data set            Training   Validation   Testing
Percentage (%)      70         15           15
Number of points    12096      2592         2592
Number of days      126        27           27

4. EXPERIMENT
As explained in the previous sections, there are three parameters that can be changed in order to improve
the performance: the number of hidden layers, the number of neurons in each layer, and the number of
delays. With these parameters we conducted three groups of experiments, in each of which we altered only
one of the parameters. Before starting the experiments, we need to import the data from the .txt file
into a Matlab column vector in order to pre-process it, i.e., normalize it between -1 and 1 (see the
sketch after this paragraph). The next phase was to decide the values for the different parameters. Our
control experiment, experiment A, has 1 hidden layer with 4 neurons and n = 3 delays.
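A minimal sketch of this pre-processing step, assuming the file holds one ELD value per line:

    raw = load('eld180dias.txt');                      % 17280-by-1 column vector
    [eldNorm, normSettings] = mapminmax(raw', -1, 1);  % scale to [-1, 1]
    eldNorm = eldNorm';                                % back to a column vector
    % normSettings is kept so that mapminmax('reverse', ...) can restore
    % the original scale after simulation (see section 4.2)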
Table 3. Parameters for each experiment

Experiment     Parameter                  Variations
B, C           Number of hidden layers    2, 3
D, E, F, G     Number of neurons/layer    8, 16, 32, 64
H, I, J        Number of delays           6, 9, 96

4.1 Training
After that we want to start training our ANN. To do this we used some functions from the Matlab NN
toolbox. First we create the network using the function narnet (Beale et al., 2014), which takes as
inputs the number of delays and the hidden layer topology (number of hidden layers and number of neurons
in each layer). Then we use the preparets (Beale et al., 2014) function to prepare the values for
training and simulation. After that we divide the data into the 3 sets mentioned earlier, 70% for the
training set, 15% for validation and 15% for testing, using the function divideblock (Beale et al.,
2014). Finally we use the train (Beale et al., 2014) function to carry out the training of the ANN; this
training function uses the LM algorithm.
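Putting the steps above together, a sketch of the procedure for experiment A (1 hidden layer, 4 neurons,
3 delays) could look as follows; eldNorm is the normalized series from the previous sketch:

    net = narnet(1:3, 4);                   % 3 feedback delays, 4 hidden neurons
    net.divideFcn = 'divideblock';          % contiguous 70/15/15 split
    net.divideParam.trainRatio = 0.70;
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    net.trainFcn = 'trainlm';               % Levenberg-Marquardt
    T = num2cell(eldNorm');                 % time series as a 1-by-N cell array
    [Xs, Xi, Ai, Ts] = preparets(net, {}, {}, T);  % delayed inputs and targets
    [net, tr] = train(net, Xs, Ts, Xi, Ai); % tr records the MSE per iteration

For the experiments with more hidden layers, the second argument of narnet becomes a vector, e.g.,
narnet(1:3, [4 4]) for two hidden layers of 4 neurons each.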
4.2 Outputs
The outputs of the training function include a series of plots: the performance of the ANN, i.e., the
MSE per iteration; the time-series response, which shows the error between the target and the output for
each of the 3 data sets; and the weight values for each of the connections between the neurons. We also
compute the root-mean-square error (RMSE), which measures the difference between the values predicted by
the model and the values actually observed.
At the end, all the data is restored to its original scale, which makes the results easier to interpret.
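A sketch of this evaluation step, assuming net, Xs, Xi, Ai and Ts from the training sketch and
normSettings from the pre-processing sketch:

    Y = net(Xs, Xi, Ai);                    % one-step-ahead predictions
    mseNorm = perform(net, Ts, Y);          % MSE on the normalized scale
    yHat  = mapminmax('reverse', cell2mat(Y),  normSettings);  % original units
    yTrue = mapminmax('reverse', cell2mat(Ts), normSettings);
    rmse  = sqrt(mean((yHat - yTrue).^2));  % RMSE on the original scale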

5. RESULTS
We ran each experiment 3 times and calculated the average in order to improve the reliability of the
results.
Table 4. Results for experiment A

Test      MSE          RMSE      Iterations
1         0.00081489   55.078    11
2         0.00077090   53.571    61
3         0.00080887   54.875    5
Average   0.00079822   54.508    26

Table 5. Results for experiment B

Test      MSE          RMSE      Iterations
1         0.00077030   53.784    34
2         0.00079379   54.361    65
3         0.00080769   54.835    12
Average   0.00079059   54.327    37

Table 6. Results for experiment C

Test      MSE          RMSE      Iterations
1         0.00078439   54.038    88
2         0.00075733   53.098    430
3         0.00078205   53.957    14
Average   0.00077459   53.698    177

Table 7. Results for experiment D

Test      MSE          RMSE      Iterations
1         0.00077628   53.758    22
2         0.00077807   53.820    29
3         0.00077769   53.807    62
Average   0.00077735   53.795    38

Table 8. Results for experiment E

Test      MSE          RMSE      Iterations
1         0.00078601   54.094    28
2         0.00079043   54.246    18
3         0.00079298   54.333    5
Average   0.00078907   54.224    17

Table 9. Results for experiment F

Test      MSE          RMSE      Iterations
1         0.00073054   52.150    431
2         0.00073731   52.391    114
3         0.00073797   52.415    57
Average   0.00073527   52.319    201

Table 10. Results for experiment G

Test      MSE          RMSE      Iterations
1         0.00073008   52.134    130
2         0.00073290   52.234    56
3         0.00073913   52.456    51
Average   0.00073404   52.275    79

Table 11. Results for experiment H

Test      MSE          RMSE      Iterations
1         0.00072018   51.779    106
2         0.00073590   52.341    135
3         0.00072338   51.894    83
Average   0.00072649   52.005    108

Table 12. Results for experiment I

Test      MSE          RMSE      Iterations
1         0.00073024   52.139    49
2         0.00073409   52.277    64
3         0.00075043   52.855    49
Average   0.00073825   52.424    54

Table 13. Results for experiment J

Test      MSE          RMSE      Iterations
1         0.00038865   38.037    73
2         0.00039017   38.112    72
3         0.00041169   39.149    40
Average   0.00039687   38.433    62

After analysing the results, we concluded that experiment J showed the best performance, and for that
reason we display more detailed graphs from that experiment. Fig. 3 shows the time-series response, i.e.,
the difference between the target and the output of the NN. Fig. 4 shows a plot of the target values (in
blue) and the estimated output values (in red) in a window of 48 hours; as we can observe, the difference
between them is very small. Fig. 5 shows the regression values for each of the data sets, and Fig. 6
shows the performance of the ANN.

Fig. 3. Time-series response of experiment J.

Fig. 4. Estimated outputs over a 48-hour window of experiment J.

Fig. 5. Regression of values for the training, validation, and test sets of experiment J.

Fig. 6. Performance for experiment J.

6. CONCLUSIONS
Regarding the experiments made in the previous section, we can conclude that the results are acceptable
for the problem of ELD forecasting. All the experiments produced valuable insight into the workings of
the MLP ANN and could be important for future work. Starting with the first three experiments (A, B and
C), changing the number of hidden layers did improve the performance of the NN, but not in a very
significant way, while on the other hand making the network more complex, as can be seen from the
increased number of iterations needed. Next we changed the number of neurons in the single hidden layer
(experiments D, E, F, G) to see what would happen. Again, compared with experiment A, the performance
improved slightly, but the complexity grew. In the last experiments (H, I, J) we changed the number of
delays; here we observed that when we used one day's worth of delays (experiment J) the error dropped
considerably, and although the number of iterations is higher than in experiment A, it is still
acceptable.
After the analysis done above, we can see that there is clearly still room for improvement. For this
paper we only did a small number of experiments, which were perhaps not enough to reach stronger
conclusions.
One suggestion to improve on our results would be to use a genetic algorithm to generate the network
topology with the smallest error possible. Another would be to use additional inputs such as indicators
for weekdays, weekends and holidays, and weather or temperature data.
REFERENCES
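Beale, M. H., Hagan, M. T., Demuth, H. B., (2014), Neural Network Toolbox User's Guide, The MathWorks,
Inc., Natick, MA.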
Ferreira, P. M., Ruano, A. E., Pestana, R., (2010), Evolving RBF predictive models to forecast the
Portuguese electricity consumption, IFAC Conference on Control Methodologies and Technology for
Energy Efficiency.
Hagan, M. T., Menhaj, M. B., (1994), Training feedforward networks with the Marquardt algorithm, IEEE
Transactions on Neural Networks, vol. 5, pages 989-993.
Møller, M. F., (1993), A scaled conjugate gradient algorithm for fast supervised learning, Neural
Networks, vol. 6, no. 4, pages 525-533.
Ruano, A. E., Artificial Neural Networks, Centre for Intelligent Systems, University of Algarve,
pages 7-119.
Yassin, I. M., (2008), Face detection using artificial neural network trained on compact features and
optimized using particle swarm optimization, M.S. thesis, Faculty of Electrical Engineering,
Universiti Teknologi MARA, Shah Alam.
Yu, H., Wilamowski, B. M., (2011), Levenberg-Marquardt training, The Industrial Electronics Handbook,
vol. 5, pages 1-15.
