Conference Paper in IEEE Network · October 2017

Hybrid Support Vector Regression in Electric Load During National Holiday Season

Rezzy Eko Caraka1,2*, Sakhinah Abu Bakar1†, Bens Pardamean2,3#, Arif Budiarto2‡

1 School of Mathematical Sciences, Faculty of Science and Technology,
  The National University of Malaysia
2 Bioinformatics and Data Science Research Center,
  Bina Nusantara University, Jakarta, Indonesia, 11480
3 Computer Science Department, BINUS Graduate Program – Master of Computer Science,
  Bina Nusantara University, Jakarta, Indonesia, 11480

Email: *rezzy.caraka@binus.edu, P90633@siswa.ukm.edu.my, †sakhinah@ukm.edu.my, #bpardamean@binus.edu, ‡abudiarto@binus.edu

Abstract—This paper studies a non-parametric time series approach to electric load during national holiday seasons, based on historical hourly data from the state electric company of Indonesia covering the Northern Sumatra and the South and Central Sumatra electricity load. Given a baseline for forecasting performance, we apply our hybrid models and computation platform, combining kernel parameters. To facilitate comparison of our results, we highlight MAPE-based and R2-based measures. To obtain more accurate results, we need to improve, investigate, and develop the appropriate statistical tools. Electric load forecasting is a fundamental input to infrastructure development decisions and can reduce the energy usage of the nation.

Keywords—electric load, Support Vector Regression, Hybrid, Kernel, Time Series

I. INTRODUCTION

The electrical system must be developed to match the growth in customers' electricity needs, and the electricity supply should be developed as a priority according to the principles of effectiveness and efficiency. It is therefore an essential matter for electric service providers [1][2]. Forecasting plays a vital role as an absolute requirement for the service provider company: proper electricity forecasting estimates electricity needs in a particular period. Such predictions help the provider companies make the right decisions, so that the planning of electricity operation becomes efficient. Electrical prediction with high precision can reduce the imbalance of electrical power between the supply side and the demand side [1].

Forecasting can reduce the imbalance between the supply and demand of electricity, provide a proper foundation for the stability and power of the network, avoid waste of resources in scheduling, and improve operations. It is essential for the operation of the system, as it provides information that supports safe system operation. With accurate forecasting, the electricity system also gains dynamic stability, quality, and manageability. Forecasts for electric service providers fall into three categories; short-term forecasting applies to predictions from within a day up to one week ahead. Electric load forecasting is complicated, and it sometimes reveals cyclic changes due to cyclic economic activities or seasonal climate, such as the hourly peak in a working day, the weekly peak in a business week, and the monthly peak in a planning year [3].
service provider company. One of the important things is the

Fig. 1. An illustration of electric load analysis using hybrid SVR: electric power load data is modeled with traditional methods (ARIMA, feed-forward neural network, generalized regression neural network) and machine learning methods (SVR and hybrid SVR with Gaussian, radial basis, and polynomial kernels) to capture forecasting insight.


Having reduced the cost, we need to decrease the probability of accidents; accurate fault prediction is a goal pursued by researchers working on system testing and maintenance. Most traditional fault forecasting methods are not suitable for online prediction and real-time processing [4]. Moreover, neural networks are widely used for modeling stock market time series due to their universal approximation property [5]. Researchers have indicated that the neural network, which implements the empirical risk minimization principle, outperforms traditional statistical models. However, neural networks suffer from difficulties in determining the hidden layer size and learning rate, and from local minimum traps. To address this, Vapnik proposed Support Vector Regression (SVR), which exhibits better prediction owing to its risk minimization principle and has a global optimum [6].

II. METHOD

Time series analysis is used when the research data are ordered in time, so that there is correlation between current observations and those of the preceding period; current events are affected by events in the preceding period.

A. Traditional ARIMA

ARIMA (autoregressive integrated moving average) models are the general class of models for forecasting a time series that can be made stationary by transformations such as differencing. The first step in the ARIMA procedure is to ensure that the series is stationary [7].

The combined AR(p) and MA(q) process can be written as follows:

Z_t = \phi_1 Z_{t-1} + \cdots + \phi_p Z_{t-p} + a_t + \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q}

An ARMA(p, q) process can also be written using the backshift operator B. For example, for ARMA(1,1) and ARMA(1,2):

ARMA(1,1): Z_t = \phi_1 Z_{t-1} + a_t + \theta_1 a_{t-1}
           Z_t - \phi_1 Z_{t-1} = a_t + \theta_1 a_{t-1}
           (1 - \phi_1 B) Z_t = (1 + \theta_1 B) a_t
           \phi_1(B) Z_t = \theta_1(B) a_t                                   (1)

ARMA(1,2): Z_t = \phi_1 Z_{t-1} + a_t + \theta_1 a_{t-1} + \theta_2 a_{t-2}
           (1 - \phi_1 B) Z_t = (1 + \theta_1 B + \theta_2 B^2) a_t
           \phi_1(B) Z_t = \theta_2(B) a_t

So the general equation for an ARMA(p, q) process with the backshift operator is:

\phi_p(B) Z_t = \theta_q(B) a_t                                              (2)

The ARIMA(p, d, q) model is a nonstationary time series which, after being differenced d times to become stationary, has an autoregressive component of order p and a moving average component of order q. Here is the ARIMA(1,1,1) process:

ARIMA(1,1,1): (1 - B)(1 - \phi_1 B) Z_t = (1 + \theta_1 B) a_t               (3)

In general, the ARIMA(p, d, q) process can be written as follows:

\phi_p(B) (1 - B)^d Z_t = \theta_q(B) a_t                                    (4)

where \phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p is the stationary AR operator of order p and \theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^q is the invertible MA operator.

B. Machine Learning

A crucial aspect of applying kernel methods to time series data is to find a good kernel similarity to distinguish between time series [8]; classical machine learning algorithms cannot be applied directly to time series classification. Kernel methods for dealing with time series data have received considerable attention in the machine learning community. An easy way is to treat the time series as static vectors, ignoring the time dependence, and to directly employ a linear kernel, a Gaussian radial basis kernel, or a polynomial kernel. The Support Vector Machine (SVM) was developed by Boser, Guyon, and Vapnik and was first presented in 1992 at the Annual Workshop on Computational Learning Theory. This machine learning method aims to find the best separating function (hyperplane) between two classes in the input space [9].

Given a finite set of n example/label pairs belonging to X \times Y, where Y = \{-1, 1\}, our task is to find a model f(\cdot) that accurately predicts a new label y for some input x. Specifically, the SVM builds a linear model f(x) = w^T x + b when X is a vector space, with f(x_i) \geq 1 for examples where y_i = 1 and f(x_i) \leq -1 for y_i = -1. We also wish \|w\| to be as small as possible, because this generalizes to new examples better than if we allow \|w\| to be large. Intuitively, if a new example x is perturbed by a small amount, then a model with a small \|w\| is less likely to move its prediction, which is the sign of f(x), across the decision boundary to the other class. With a convex objective [11], i.e., minimizing \|w\|, and convex constraints, the SVM solves a convex program. Often the objective is given as \frac{1}{2}\|w\|_2^2 so that it is differentiable, and sometimes as \|w\|_1 to encourage sparsity (classifying on fewer features of the input). In the quadratic case, this is a quadratic program (QP) [12][13]:

min  \frac{1}{2}\|w\|_2^2
s.t. y_i (w^T x_i + b) \geq 1, \forall i \in [1..n]                          (5)

The dual to this program is the following:

max  \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j
s.t. \sum_i \alpha_i y_i = 0, \alpha_i \geq 0, \forall i \in [1..n]          (6)

An SVM is also called a maximum margin classifier because it maximizes the space between positive and negative training examples. This form has the drawback that if the training examples are not linearly separable, the SVM cannot find a model that fits the training data. It is called a hard margin SVM because no examples may lie in the margin. The margin can be "softened" by adding a loss function:

min  \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \ell(f(x_i), y_i)               (7)

When the loss is the hinge loss, \ell(z, y) = \max(0, 1 - yz), this yields the formulation of Cortes and Vapnik [14][15]:

min  \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i
s.t. y_i (w^T x_i + b) \geq 1 - \xi_i, \forall i \in [1..n], \xi_i \geq 0    (8)

The squared hinge loss (replace \xi_i with \xi_i^2) is also common. Soft-margin classifiers allow examples to lie inside the margin or even in the "wrong" part of the model; in many cases this still trains a model that generalizes well. These forms of SVM also have dual forms, usually simple additions to the constraints on \alpha. Unfortunately, soft margins are still not enough to fit good models to some datasets. SVMs allow for a nice "trick" when the dataset does not allow for a good fit.

C. Hybrid SVR

Researchers have implemented a variety of models and theories to improve prediction performance, and different techniques have been combined with single machine learning algorithms. Rasel et al. [16] combine SVR and windowing operators, so the proposed models are named Win-SVR models; three basic models are built using three different windowing operators, namely the normal rectangular, flatten, and de-flatten windowing operators. Bai et al. [17] applied SVR to train a static model, with the optimal structural parameters determined by ten-fold cross-validation, for forecasting daily natural gas (NG) consumption, contributing to the modeling of the dynamic features of a nonlinear, time-varying system. Yasin and Caraka [18] describe the application of Localized Multiple Kernel Support Vector Regression (LMKSVR) to predict daily stock prices; the model performs well, with all MAPE values below 2%.

In this paper, we combine SVR with the traditional ARIMA time series model and compare the result with a feed-forward neural network (FFNN) and a generalized regression neural network (GRNN). We selected several optimization techniques for searching for optimum parameters, such as MOSEK, QUADPROG, and cross-validation, and picked three kinds of kernel for the SVR method: Gaussian, radial basis, and polynomial. Combining different prediction techniques has been investigated widely in the literature; for short-range prediction, combining various methods is more useful.

A new trend has emerged in the machine learning community: using models that capture the temporal information in the time series data as representations, with kernels subsequently defined on the fitted models, for example autoregressive kernels. Autoregressive (AR) kernels [9] are probabilistic kernels for time series data. In an AR kernel, the vector autoregressive model class is used to generate an infinite family of features from the time series [20]. Given a time series s of dimension d and length L, the time series is supposed to be generated according to the following vector autoregressive (VAR) model of order p:

s(t) = \sum_{i=1}^{p} A_i s(t - i) + \epsilon(t), \quad t = p + 1, \ldots, L    (9)

where A_i \in R^{d \times d} are the coefficient matrices and \epsilon is a centered Gaussian noise with covariance matrix V \in R^{d \times d}. Then the likelihood function p_\theta(s) can be formulated as follows:

p_\theta(s) = \frac{1}{(2\pi|V|)^{d(L-p)/2}} \prod_{t=p+1}^{L} \exp\Bigl(-\frac{1}{2}\bigl(s(t) - \sum_{i=1}^{p} A_i s(t-i)\bigr)^T V^{-1} \bigl(s(t) - \sum_{i=1}^{p} A_i s(t-i)\bigr)\Bigr)    (10)

For a given time series s, the likelihood function p_\theta(s) is evaluated across all possible parameter settings (under a matrix normal-inverse). Following standard SVM practice, the primal problem will be transformed into its (more manageable) dual formulation. The Lagrangian for the primal problem can be formulated as (11):

\mathcal{L} = \frac{1}{2}\|w\|^2 + \frac{\rho}{2}\sum_{j=1}^{J-1}\|w_j^*\|^2
  + C \sum_{j=1}^{J-1}\sum_{k=1}^{J}\sum_{i=1}^{n_k}\bigl(w_j^* \cdot \Phi^*(x_i^{*k}) + b_j^*\bigr)
  - \sum_{j=1}^{J-1}\sum_{k=1}^{j}\sum_{i=1}^{n_k}\alpha_{ki}^{j}\bigl(-1 + w_j^* \cdot \Phi^*(x_i^{*k}) + b_j^* - w \cdot \Phi(x_i^{k}) + b_j\bigr)
  - \sum_{j=1}^{J-1}\sum_{k=j+1}^{J}\sum_{i=1}^{n_k}\alpha_{ki}^{j}\bigl(-1 + w_j^* \cdot \Phi^*(x_i^{*k}) + b_j^* + w \cdot \Phi(x_i^{k}) - b_j\bigr)
  - \sum_{j=1}^{J-1}\sum_{k=1}^{J}\sum_{i=1}^{n_k}\beta_{ki}^{j}\bigl(w_j^* \cdot \Phi^*(x_i^{*k}) + b_j^*\bigr)    (11)

where \alpha_{ki}^{j} and \beta_{ki}^{j} are non-negative Lagrange multipliers. The KKT conditions [15] for the primal problem require the following to hold:

\frac{\partial\mathcal{L}}{\partial w} = 0, \quad \frac{\partial\mathcal{L}}{\partial w_j^*} = 0,
\frac{\partial\mathcal{L}}{\partial b_j} = -\sum_{k=1}^{j}\sum_{i=1}^{n_k}\alpha_{ki}^{j} + \sum_{k=j+1}^{J}\sum_{i=1}^{n_k}\alpha_{ki}^{j} = 0, \quad \forall j,
\frac{\partial\mathcal{L}}{\partial b_j^*} = \sum_{k=1}^{J}\sum_{i=1}^{n_k}\bigl(\alpha_{ki}^{j} + \beta_{ki}^{j} - C\bigr) = 0, \quad \forall j,    (12)
\alpha_{ki}^{j} \geq 0, \quad \beta_{ki}^{j} \geq 0, \quad \forall i, \forall j.

Solving the stationarity conditions (13)-(15) for w and w_j^* yields:

w = \sum_{k,i}\Bigl(\sum_{j=1}^{k-1}\alpha_{ki}^{j} - \sum_{j=k}^{J-1}\alpha_{ki}^{j}\Bigr)\Phi(x_i^{k})    (14)

w_j^* = \frac{1}{\rho}\sum_{k=1}^{J}\sum_{i=1}^{n_k}\bigl(\alpha_{ki}^{j} + \beta_{ki}^{j} - C\bigr)\Phi^*(x_i^{*k})    (15)

By substituting equations (14) and (15) into (11), all bias terms cancel through the conditions in (12), and the dual depends on the multipliers only through kernel evaluations \kappa(\cdot,\cdot):

\mathcal{L}^{D} = \sum_{k,i,j}\alpha_{ki}^{j}
 - \frac{1}{2}\sum_{k,i}\sum_{k',i'}\Bigl(\sum_{j=1}^{k-1}\alpha_{ki}^{j} - \sum_{j=k}^{J-1}\alpha_{ki}^{j}\Bigr)\Bigl(\sum_{j=1}^{k'-1}\alpha_{k'i'}^{j} - \sum_{j=k'}^{J-1}\alpha_{k'i'}^{j}\Bigr)\kappa\bigl(x_i^{k}, x_{i'}^{k'}\bigr)
 - \frac{1}{2\rho}\sum_{j=1}^{J-1}\sum_{k,i}\sum_{k',i'}\bigl(\alpha_{ki}^{j} + \beta_{ki}^{j} - C\bigr)\bigl(\alpha_{k'i'}^{j} + \beta_{k'i'}^{j} - C\bigr)\kappa\bigl(x_i^{*k}, x_{i'}^{*k'}\bigr)    (16)

In the experiments, to prove the validity of the proposed method, we used the R^2 performance measure to compare the electric load forecast with the actual data:

R^2 = \Bigl(1 - \frac{\sum_{i}^{n}(y_i - \hat{y}_i)^2}{\sum_{i}^{n}(y_i - \bar{y}_i)^2}\Bigr) \times 100\%    (17)

with \hat{y}_i as the fitted value and y_i as the actual value.

III. RESULT AND DISCUSSION

In this paper, we use a dataset that follows the Indonesian government calendar for the national holiday season. The datasets from 2012, 2013, and 2014 were the training data, and the dataset from 2015 was the testing data. Note that there were 19 national holidays in 2012 and 20 in 2013. This dataset is aggregated by hour within each month. To make the data more representative, we use the mean value for every hour, giving 144 observations (2012-2014) for training and 30 observations (2015) for testing, as can be seen in Table I.

As a first step, we used the classical ARIMA time series model on the electricity data from 2012 to 2014, using the forecast package in R, which allows the user either to specify the order of the model explicitly with the arima() function or to generate an optimal (p, d, q) with auto.arima(). In effect, this function searches through combinations of order parameters and picks the set that optimizes the model fit criteria. We then compare with non-parametric techniques such as the generalized regression neural network (GRNN) [20][21][22] and the feed-forward neural network [23].

For the FFNN, standard six-input-layer and twelve-input-layer architectures are adopted. To examine the effect of different architectures on performance, we vary the number of hidden nodes, performing the analysis with 1, 2, 3, ..., 12 input nodes and 1, 2, 3, ..., 12 hidden nodes and four different activation functions: semi-linear, sigmoid, bipolar sigmoid, and hyperbolic tangent. The forecasting performance obtained by the FFNN with different numbers of hidden nodes is depicted in Fig. 4; from this figure, we can see that the FFNN requires different numbers of hidden nodes for different datasets to obtain good performance.
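The AIC-driven order search that auto.arima() performs can be sketched in Python. This is a minimal sketch under stated assumptions: pure-numpy least-squares AR fits on the once-differenced series stand in for full ARIMA maximum likelihood, and the simulated load series (coefficients 1.2 and -0.6, base level 1500) is hypothetical, not the PLN data.

```python
import numpy as np

def ar_aic(x, p):
    """Least-squares fit of AR(p) to series x; returns the AIC."""
    n = len(x)
    y = x[p:]
    # Design matrix of lagged values: column j holds x[t-1-j].
    X = np.column_stack([x[p - 1 - j : n - 1 - j] for j in range(p)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return len(y) * np.log(rss / len(y)) + 2 * (p + 1)

# Simulate an integrated AR(2) load series (hypothetical coefficients).
rng = np.random.default_rng(42)
e = rng.normal(0.0, 1.0, 400)
z = np.zeros(400)
for t in range(2, 400):
    z[t] = 1.2 * z[t - 1] - 0.6 * z[t - 2] + e[t]
load = 1500.0 + np.cumsum(z)     # integrated once, so d = 1 recovers z

diff = np.diff(load)             # difference once, as in ARIMA(p, 1, q)
aics = {p: ar_aic(diff, p) for p in range(1, 5)}
best_p = min(aics, key=aics.get) # AR order with the lowest AIC
print(best_p)
```

On this simulated series the search should favor an AR order of at least 2, mirroring how auto.arima() settled on ARIMA(2,1,0) for the load data.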

TABLE I. EXAMPLE OF THE ELECTRIC LOAD DATASET DURING THE NATIONAL HOLIDAY SEASON 2012

Date        00:30     01:00     01:30     02:00     02:30    ...  22:00     22:30     23:00     23:30     00:00
2-Jan-12    1,444.8   1,408.3   1,381.2   1,356.4   1,335.0  ...  1,845.7   1,743.0   1,632.3   1,591.6   1,555.6
9-Jan-12    1,462.4   1,452.9   1,408.2   1,376.7   1,351.0  ...  1,873.5   1,779.6   1,703.8   1,629.6   1,541.6
6-Feb-12    1,455.4   1,428.7   1,378.4   1,372.6   1,355.8  ...  1,853.5   1,749.6   1,646.2   1,583.3   1,550.1
...
26-Nov-12   1,579.0   1,534.6   1,504.4   1,488.6   1,468.3  ...  1,958.8   1,826.2   1,780.5   1,729.8   1,649.8
3-Dec-12    1,551.5   1,538.9   1,523.9   1,475.5   1,484.0  ...  1,995.8   1,882.4   1,762.2   1,705.8   1,653.2
10-Dec-12   1,637.9   1,594.4   1,566.5   1,488.7   1,497.2  ...  2,061.5   1,881.1   1,789.7   1,754.9   1,703.1
17-Dec-12   1,641.2   1,598.0   1,594.3   1,560.7   1,545.1  ...  1,859.8   1,752.3   1,661.4   1,594.8   1,553.8
31-Dec-12   1,551.1   1,536.9   1,460.3   1,452.3   1,446.5  ...  1,919.4   1,836.6   1,765.6   1,747.9   1,717.4
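Averaging the half-hourly columns of a matrix like Table I produces the representative per-slot profiles used for training. A minimal numpy sketch of that aggregation; the load matrix here is synthetic (10 hypothetical holiday dates), not the PLN data:

```python
import numpy as np

# Synthetic stand-in for Table I: rows = holiday dates, columns = the 48
# half-hour slots from 00:30 to 00:00.
rng = np.random.default_rng(7)
base = 1500.0 + 200.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 48))
loads = base + rng.normal(0.0, 20.0, size=(10, 48))

# Mean value for every slot across dates -> one representative profile.
profile = loads.mean(axis=0)
print(profile.shape)  # → (48,)
```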

TABLE II. COMPARING MODELS

MODEL                                  Type of Parameter and Optimization                                       Accuracy
Classical Time Series                  ARIMA(2,1,0), AR1 = 0.8030, AR2 = -0.2008                                R2 = 60.44%
HYBRID-SVR-AR(1)                       Gaussian + radial basis kernels, quadratic optimization (0.0219)         R2 = 90.44%
HYBRID-SVR-AR(1)                       Polynomial + radial basis kernels, cross-validation
                                       (Cost = 10; Epsilon = 0.0001)                                            R2 = 98.74%
HYBRID-SVR-AR(2)                       Gaussian + polynomial kernels, SMO optimization                          R2 = 97.19%
HYBRID-SVR-AR(2)                       Polynomial + radial basis kernels, MOSEK optimization                    R2 = 95.27%
Neural Network (12 input, 12 hidden)   Bipolar sigmoid, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 96.56%
Neural Network (12 input, 12 hidden)   Semi-linear, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 95.17%
Neural Network (12 input, 12 hidden)   Sigmoid, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 80.09%
Neural Network (12 input, 12 hidden)   Hyperbolic tangent, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 87.86%
Neural Network (6 input, 6 hidden)     Bipolar sigmoid, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 84.69%
Neural Network (6 input, 6 hidden)     Semi-linear, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 85.74%
Neural Network (6 input, 6 hidden)     Sigmoid, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 83.73%
Neural Network (6 input, 6 hidden)     Hyperbolic tangent, cross-validation (validation ratio 0.2;
                                       PQ threshold 1.5; strip 5)                                               R2 = 85.24%
Generalized Regression Neural
Network (GRNN) – AR(1)                 Radial basis function                                                    R2 = 96.76%
Generalized Regression Neural
Network (GRNN) – AR(2)                 Radial basis function                                                    R2 = 89.88%
Modified Generalized Regression
Neural Network (M-GRNN) – AR(1)        Radial basis function                                                    R2 = 96.43%
Modified Generalized Regression
Neural Network (M-GRNN) – AR(2)        Radial basis function                                                    R2 = 87.77%
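The R2 values in Table II follow Eq. (17). A small helper, with made-up load and fitted values for illustration:

```python
import numpy as np

def r_squared(actual, fitted):
    """Coefficient of determination per Eq. (17), as a percentage."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    ss_res = np.sum((actual - fitted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return (1.0 - ss_res / ss_tot) * 100.0

y_actual = np.array([1444.8, 1408.3, 1381.2, 1356.4, 1335.0])
y_fitted = np.array([1450.0, 1400.0, 1385.0, 1360.0, 1330.0])
print(round(r_squared(y_actual, y_fitted), 2))  # → 98.01
```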

We aim to demonstrate not only that SVR performs well with the number of examples, but also that it performs well against the number of kernels and the activation functions in the neural network. Kernels are useful for training a model, but there is one flaw: we do not know what the model should return for an example it has not seen yet. Time series data are not generated independently: their dispersion varies in time, they have cyclic components, and they are often governed by a trend. Statistical procedures that assume independent and identically distributed data are therefore excluded from the analysis of time series.

After combining the methods, we find that HYBRID-SVR-AR(1) with the combination of the polynomial and radial basis kernels, optimized by cross-validation, is the best model for forecasting the electric load in 2015, together with the modified generalized regression neural network (MGRNN). The main idea of cross-validation (CV) is to divide the data into two parts (once or several times): one (the training set) used to train a model and the other (the validation set) used to estimate the error of the model. CV selects the parameter among a group of candidates with the smallest CV error, where the CV error is the average of the multiple validation errors. Normally, N-fold, leave-one-out, or repeated random sub-sampling procedures are used for CV. Efficient kernels have been proposed to tackle the challenges in time series through kernel machines. Based on Table II, it was found that the combination of the polynomial and radial basis kernels has excellent performance with cost C = 10 and epsilon = 0.0001. Once the kernel function is specified, its parameters are used to map the training data. The polynomial kernel function with d = 1 can be defined as

\psi(x, x_i) = (1 + x^T x_i)^1, with x = [x_1]^T and x_i = [x_{1i}]^T.

Moreover, we get the kernel equation used for mapping the training data:

(1 + x^T x_i) = (1 + x_1 x_{1i})

In predicting with the SVR equation, we find the beta values subject to the tolerance beta > C × 10^-6. Suppose at the first point the beta value, 0.4974 × 10^-6, exceeds the tolerance; then the first point is called a support vector and is used in forming the prediction equation. The number of support vectors formed is 144; that is, all 144 training points are support vectors and are used in the equation to predict the electrical power load. The beta values are then used in the SVR equation to predict the testing data. Furthermore, the resulting values \beta_i and y_i are incorporated into the equation f(x) = \sum_{i=1}^{144} \beta_i y_i \varphi(x_i) \cdot \varphi(x) + bias, with bias = 0. From this we get the equation f(x) for subsequent use on the testing data as the hybrid support vector regression prediction.

We can see in Fig. 2 that the plot of prediction vs. actual illustrates a suitable fit (indicated by the blue and red lines); Fig. 3 shows the actual data (2015) vs. the prediction (2015).

Fig. 2. Actual vs. prediction, multi-kernel SVR (electric power load over the full training range).

Fig. 3. Actual data (2015) vs. prediction (2015), multi-kernel SVR.
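One way to realize a multi-kernel SVR like the one above is to precompute a combined Gram matrix and hand it to an off-the-shelf solver. A hedged sketch with scikit-learn on toy data: the equal kernel weights, polynomial degree 2, gamma = 20, and the C = 10 / epsilon = 0.0001 settings echo Table II but are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import SVR

def poly_kernel(A, B, degree=2):
    """Polynomial kernel (1 + x.x')^degree, as used in the hybrid."""
    return (1.0 + A @ B.T) ** degree

def rbf_kernel(A, B, gamma=20.0):
    """Gaussian radial basis kernel exp(-gamma * ||x - x'||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 1.0, size=(60, 1)), axis=0)
y = np.sin(2.0 * np.pi * X[:, 0]) + rng.normal(0.0, 0.05, 60)

# Hybrid kernel: equal-weight sum of the polynomial and radial basis Gram matrices.
K = 0.5 * poly_kernel(X, X) + 0.5 * rbf_kernel(X, X)
model = SVR(kernel="precomputed", C=10.0, epsilon=0.0001).fit(K, y)
pred = model.predict(K)
# Points whose coefficient survives the solver's tolerance are the support vectors.
print(len(model.support_))
```

Predicting new points only requires the cross Gram matrix between test and training inputs, built with the same kernel combination.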


Fig. 4. Forecasting performance on the 30 testing points (2015): actual data vs. ARIMA(2,1,0), HYBRID SVR-AR(1) POL-RADBAS, HYBRID SVR-AR(2) GAUSS-POL, HYBRID SVR-AR(2) POL-RADBAS, GRNN-AR(1), MGRNN-AR(1), NN-Bipolar Sigmoid, and NN-Semi Linear.
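The FFNN architecture sweep behind Fig. 4 can be sketched with scikit-learn's MLPRegressor. This is a stand-in sketch: "logistic" and "tanh" approximate the paper's sigmoid and hyperbolic-tangent activations (semi-linear and bipolar sigmoid are not built into scikit-learn), and the six lag features and their weights are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(150, 6))          # six lagged load inputs (synthetic)
w = np.array([0.8, -0.2, 0.1, 0.0, 0.05, -0.05])   # hypothetical lag weights
y = X @ w + rng.normal(0.0, 0.05, 150)

scores = {}
for hidden in (6, 12):                             # hidden-node counts to compare
    for act in ("logistic", "tanh"):
        net = MLPRegressor(hidden_layer_sizes=(hidden,), activation=act,
                           max_iter=3000, random_state=0).fit(X, y)
        scores[(hidden, act)] = net.score(X, y)    # in-sample R^2

best = max(scores, key=scores.get)                 # best (hidden size, activation)
print(best)
```

As in the paper's experiments, the best hidden size and activation depend on the data; the sweep simply ranks the candidate architectures by their R2.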

IV. CONCLUSION

To mine knowledge from the data and improve the accuracy of the forecast, we combine the traditional ARIMA time series technique with machine learning methods: the feed-forward neural network (FFNN), the generalized regression neural network (GRNN), and support vector regression with combinations of the Gaussian, polynomial, and radial basis kernels. We obtain high accuracy, with R2 above 80%. Although the multi-kernel combination provides good results, it requires high computational complexity. This work could be extended by using the Group Method of Data Handling (GMDH) for prediction and short-term forecasting.

ACKNOWLEDGMENT

This research was supported by the School of Mathematical Sciences, The National University of Malaysia (GP-K014101), and the Bioinformatics and Data Science Research Center (BDSRC), Bina Nusantara University.

REFERENCES

[1] T. Hong, "Short Term Electric Load Forecasting," North Carolina State University, 2010.
[2] D. Genethliou, "Statistical Approaches to Electric Load Forecasting," Stony Brook University, 2015.
[3] W. Y. Zhang, W. C. Hong, Y. Dong, G. Tsai, J. T. Sung, and G. F. Fan, "Application of SVR with chaotic GASA algorithm in cyclic electric load forecasting," Energy, vol. 45, no. 1, pp. 850–858, 2012.
[4] L. Datong, P. Yu, and P. Xiyuan, "Fault prediction based on time series with online combined kernel SVR methods," in 2009 IEEE Instrumentation and Measurement Technology Conference (I2MTC 2009), 2009, pp. 1167–1170.
[5] M. Karasuyama and R. Nakano, "Optimizing SVR hyperparameters via fast cross-validation using AOSVR," in IEEE International Conference on Neural Networks - Conference Proceedings, 2007, pp. 1186–1191.
[6] R. E. Caraka, H. Yasin, and A. W. Basyiruddin, "Peramalan Crude Palm Oil (CPO) Menggunakan Support Vector Regression Kernel Radial Basis," Matematika, vol. 7, no. 1, pp. 43–57, 2017.
[7] A. I. McLeod, H. Yu, and E. Mahdi, "Time Series Analysis with R," Handb. Stat., vol. 30, pp. 1–61, 2011.
[8] A. Harvey and V. Oryshchenko, "Kernel density estimation for time series data," Int. J. Forecast., vol. 28, no. 1, pp. 3–14, 2012.
[9] S. Rüping, "SVM Kernels for Time Series Analysis," Univ. Dortmund, 2001.
[10] W. Chu and S. S. Keerthi, "Support vector ordinal regression," Neural Comput., vol. 19, no. 3, pp. 792–815, 2007.
[11] A. J. Smola and B. Schölkopf, "A Tutorial on Support Vector Regression," Stat. Comput., vol. 14, no. 3, pp. 199–222, 2004.
[12] J. Nocedal and S. J. Wright, "Quadratic Programming," in Numerical Optimization, ch. 18, pp. 448–496, 2006.
[13] L. Bai, J. E. Mitchell, and J. S. Pang, "Using quadratic convex reformulation to tighten the convex relaxation of a quadratic program with complementarity constraints," Optim. Lett., vol. 8, no. 3, pp. 811–822, 2014.
[14] H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support Vector Regression Machines," Neural Inf. Process. Syst., vol. 1, pp. 155–161, 1996.
[15] C. Cortes and V. Vapnik, "Support-Vector Networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[16] R. I. Rasel, N. Sultana, and P. Meesad, "An efficient modelling approach for forecasting financial time series data using support vector regression and windowing operators," Int. J. Comput. Intell. Stud., vol. 4, no. 2, 2015.
[17] Y. Bai and C. Li, "Daily natural gas consumption forecasting based on a structure-calibrated support vector regression approach," Energy Build., vol. 127, pp. 571–579, 2016.
[18] H. Yasin, R. E. Caraka, Tarno, and A. Hoyyi, "Prediction of crude oil prices using support vector regression (SVR) with grid search - cross validation algorithm," Glob. J. Pure Appl. Math., vol. 12, no. 4, 2016.
[19] F. Tang, "Kernel Methods For Time Series Data," The University of Birmingham, 2015.
[20] R. E. Caraka, "Pemodelan General Regression Neural Network (GRNN) Dengan Peubah Data Input Return Untuk Peramalan Indeks Hangseng," in Trusted Digital Identity and Intelligent System, 2014, pp. 283–288.
[21] R. Caraka and H. Yasin, "Prediksi Produksi Gas Bumi Dengan General Regression Neural Network (GRNN)," Jurusan Statistika Unpad, Bandung, 2015.
[22] R. E. Caraka, H. Yasin, and P. A, "Pemodelan General Regression Neural Network (GRNN) Dengan Peubah Input Data Return Untuk Peramalan Indeks Hangseng," no. Snik, 2014.
[23] R. E. Caraka, "Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Genetic Algorithm (GA)," Int. J. Eng. Bus. Manag., vol. 1, no. 1, pp. 62–67, 2017.
